Research IT Services
Data Management
Research data management (RDM) is a term that describes the organization, storage, preservation, and sharing of data collected and used in a research project. It involves the everyday management of research data throughout the lifecycle of a research project. UC San Diego researchers have an abundance of resources to guide them through all phases of data management in the data lifecycle.
Data Literacy: The Data Literacy Project describes data literacy as "the ability to read, work with, analyze, and argue with data," while Tableau defines data literacy as "the ability to explore, understand, and communicate with data." Expanding on those definitions, Carlson et al. (2011) stress that being data literate also means being a *critical consumer* of data and statistics, noting that "data literacy involves understanding what data mean, including how to read charts appropriately, draw correct conclusions from data, and recognize when data are being used in misleading or inappropriate ways."
Data Science: Data science is an interdisciplinary field that uses algorithms, procedures, and processes to examine large amounts of data in order to uncover hidden patterns, generate insights, and direct decision-making. Data scientists are the analytics professionals responsible for collecting, organizing, analyzing, interpreting, and presenting data to help drive decision-making.
Data Lifecycle: Data lifecycle management (DLM) is a holistic approach to moving data through an interconnected sequence of phases during the execution of a research project. From data creation to data destruction, the adoption of tools, services, and leading practices in each phase prioritizes data protection and disaster recovery while maximizing data value and integrity.
Research Data Lifecycle at UC San Diego - This infographic associates specific UC San Diego services and resources with the various stages of the data lifecycle.
The Research Data Curation Program provides data curation tools and services through the Library. Analysts are available to assist researchers through all phases of the data lifecycle; they have expertise in data management, sharing and discovery, digital preservation, and the UC San Diego Library Digital Collections.
Researchers assess their project’s data needs, select tools and standards, define workflows, and outline storage, documentation, and metadata strategies to ensure a solid foundation.
Planning and designing your data strategy early sets the stage for a successful project. It ensures accuracy, supports reproducibility and compliance, streamlines workflows, and enables effective collaboration. Thoughtful planning also increases the long-term value of your data, making it easier to share, reuse, and preserve, ultimately extending the impact of your research.
Managing references is a core part of research design. Reference management tools (citation or bibliographic software) help you collect, organize, and format sources. They simplify switching citation styles, generating bibliographies, and maintaining large libraries of references — streamlining your workflow from the start.
Resource: University of Oxford, Bodleian Libraries: Choosing a Reference Manager
Resource: EndNote
Resource: Zotero
Resource: Mendeley
Most major funders, including NIH and NSF, now require a Data Management and Sharing Plan. More than a compliance document, a DMSP serves as a blueprint for how you will collect, manage, share, and preserve your data.
A strong plan:
Anticipates preservation, access, and reuse needs
Uses funder-approved templates and institutional guidance
Documents tools, standards, and workflows
Builds in support from campus resources and services
NIH requires a two-page DMSP covering: data types, related software/code, standards, preservation, access/distribution, reuse, and oversight.
Resource: University of California Office of the President, California Digital Library: DMPTool
Resource: NIH: Data Management and Sharing policy (effective January 25, 2023)
Good design includes documenting data so it’s usable by others. Metadata — “data about data” — should be stored with your datasets, often in a README file. Supporting documentation may include data dictionaries, protocols, or lab notebooks.
Standards guide how data is collected, represented, named, and shared, and are often specific to a particular type of experiment or field of study. Using discipline-specific standards ensures your data is interoperable and reusable.
Resource: Fairsharing.org is a great resource for metadata standards and other data policies.
Resource: NIH Common Data Elements are specific to NIH institutes and fields of study
Resource: University of Denver offers its DMP Data and Metadata Guidelines
Organize your data early on; this cannot be overemphasized. Cleaning up a messy directory structure later, or hunting for files scattered under a poor naming convention, costs time and can lead to misplaced information. This becomes even more important when many team members work from a common repository. When structuring the hierarchy, best practices include:
Structuring folders by workflow or meaningful categories
Documenting the system in a README file
Assigning PI/project owner control of top-level folders
Recording roles, responsibilities, and workflows for contributors
README files should explain the folder hierarchy, naming conventions, variable definitions, and include links to public or derived data sources. A data dictionary can further describe dataset contents, types, units, and values.
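As a concrete, hedged illustration of these practices, the short Python sketch below scaffolds a hypothetical project directory and writes a README stub recording the hierarchy and naming convention. The folder names, README fields, and naming pattern are assumptions chosen for the example, not a prescribed UC San Diego layout.

```python
from pathlib import Path

# Hypothetical top-level layout; adjust folder names to your own workflow.
FOLDERS = ["raw_data", "processed_data", "code", "docs", "results"]

README_TEMPLATE = """Project: <project name>
PI / project owner: <name>

Folder hierarchy:
  raw_data/        original, unmodified data (treat as read-only)
  processed_data/  cleaned or derived datasets
  code/            analysis scripts
  docs/            protocols, data dictionary, lab notes
  results/         figures and tables

Naming convention (example): YYYYMMDD_experiment_sample_version
  e.g. 20240115_assay1_s03_v02.csv
"""

def scaffold(project_root: str) -> None:
    """Create the folder hierarchy and a README stub under project_root."""
    root = Path(project_root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():          # never overwrite an existing README
        readme.write_text(README_TEMPLATE)

if __name__ == "__main__":
    scaffold("my_project")           # placeholder project name
```

Keeping the scaffold in a script makes it easy to reproduce the same structure for each new project or collaborator.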
Resource: Stony Brook University: File Names and Variables
Resource: University of Ottawa: File Naming and Organization of Data
Resource: Harvard University: File Naming Conventions
For projects involving many contributors, clearly define roles of participants and provide a detailed description of their contributions. Roles should include:
Checklist Manager (assigns team members to various checklist items)
DMP Manager (chronicles changes, milestones, and accomplishments)
Data Workflows Manager (documents and diagrams data workflows and files)
Resource: Center for Open Science: Onboarding Checklist
A data dictionary is a set of information describing the contents, format, and structure of a dataset and the relationships between its elements (a minimal machine-readable sketch follows the list below). A data dictionary may include:
Variable names and descriptions
Data Types (such as date or integer)
Units of measurement
Possible values
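The sketch referenced above writes a minimal machine-readable data dictionary as a CSV file; the variables, types, units, and allowed values describe a hypothetical survey dataset and are purely illustrative.

```python
import csv

# Illustrative data dictionary for a hypothetical survey dataset.
DATA_DICTIONARY = [
    {"variable": "subject_id", "description": "Anonymized participant identifier",
     "type": "string", "units": "", "allowed_values": "S0001-S9999"},
    {"variable": "visit_date", "description": "Date of study visit",
     "type": "date (YYYY-MM-DD)", "units": "", "allowed_values": ""},
    {"variable": "weight", "description": "Body weight at visit",
     "type": "float", "units": "kg", "allowed_values": "30-200"},
    {"variable": "smoker", "description": "Current smoking status",
     "type": "integer", "units": "", "allowed_values": "0 = no, 1 = yes"},
]

# Store the dictionary alongside the dataset so collaborators can interpret each column.
with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=DATA_DICTIONARY[0].keys())
    writer.writeheader()
    writer.writerows(DATA_DICTIONARY)
```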
Resource: UC Merced: What is a Data Dictionary?
Resource: Lucidchart
Resource: Microsoft O365
Resource: Google Workspace
Resource: Office of Research Affairs
Resource: Office of IRB Administration
Resource: Research Service Core (RSC)
Resource: Sponsored Research Administration (SPO)
Resource: Dryad’s good data practices
Resource: Data Management: Software Carpentry video (8:32)
Resource: UC San Diego Research Data Access, Use and Management
Resource: UC San Diego Unfunded Research Agreements including Data Use Agreements
Resource: Overleaf
Resource: LaTeX
Researchers gather, clean, and document data, apply consistent formats, and organize files while ensuring quality, security, and compliance with ethical or regulatory requirements.
The Collect & Create phase is where research data begins to take shape. Careful collection and documentation ensure accuracy, reliability, and reproducibility while reducing errors and inefficiencies. Thoughtful design at this stage also lays the groundwork for effective collaboration, secure storage, and future analysis, maximizing the long-term value of your data.
Data rarely stays in one place throughout a project. It often moves among devices and systems as research progresses. For example:
Collected on a lab computer or instrument
Processed on a laptop or workstation
Transferred to an institutional server, external hard drive, or cloud service for storage
Deposited into a public repository for sharing
Tracking where your data lives at each stage is critical. Plan ahead for associated costs, keep multiple secure copies, and consider geographically dispersed backups to guard against hardware failure or natural disaster. Always maintain at least one offline, read-only copy in a secure location.
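As one hedged illustration of keeping an offline, read-only copy, the Python sketch below copies a file to a backup location, removes write permission, and verifies the copy with a SHA-256 checksum. The paths are placeholders, and the sketch is not a substitute for institutional backup or archival services.

```python
import hashlib
import shutil
import stat
from pathlib import Path

def sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_readonly(source: str, backup_dir: str) -> None:
    """Copy source into backup_dir, mark the copy read-only, and verify it."""
    src = Path(source)
    dest = Path(backup_dir) / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)                      # copy2 preserves timestamps
    dest.chmod(stat.S_IRUSR | stat.S_IRGRP)      # owner/group read-only
    if sha256(src) != sha256(dest):
        raise RuntimeError(f"Checksum mismatch after copying {src}")

if __name__ == "__main__":
    # Placeholder paths; point these at your own data and backup location.
    backup_readonly("raw_data/20240115_assay1_s03_v02.csv", "/mnt/offline_backup")
```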
Divisional or departmental data storage options may be available to you as well.
Cloud storage offers scalability and redundancy, but requires careful planning:
Costs – Active storage is more expensive than archival tiers; design your workflow to optimize costs.
Compliance – Some funders require data to remain on campus, in-state, or within national borders.
Workflow design – Use tiered storage to balance accessibility with long-term archiving needs.
Protecting your data is just as important as collecting it. Research data can include sensitive personal information, intellectual property, or commercially valuable results — all of which require appropriate safeguards.
SecureConnect - an initiative that builds on university-wide cybersecurity standards and expectations. Similar measures exist across UC and peer institutions.
Protected health information (PHI) must be collected and stored in compliance with HIPAA and IRB guidelines.
Patents and commercial data may have special requirements for confidentiality.
Protection levels – Determine which Protection Level Classification your data falls within under UC and institutional policies so you can apply the right security standards, and understand the UC Policies on Intellectual Property.
Lock lab and office computers when unattended.
Follow and stay current with campus and institutional security policies and recommendations.
Resource: UC San Diego Security Website
Resource: Contact the UC San Diego Office of Information Assurance
San Diego Supercomputer Center (SDSC)
Altman Clinical & Translational Research Institute (ACTRI)
Electronic Lab Notebooks (ELNs) - A comparison matrix
Researchers explore and transform data, test hypotheses, create visualizations, and work with team members to interpret results, document methods, and generate actionable insights.
Collaboration and analysis turn raw data into meaningful insights. Effective teamwork, clear communication, and documented workflows ensure transparency and reproducibility, while robust analysis and visualization uncover patterns, test hypotheses, and generate actionable conclusions that advance knowledge and support decision-making.
Analysis is the bridge between data collection and decision-making. Well-executed analysis allows teams to:
Identify trends, correlations, and relationships
Test hypotheses rigorously
Generate insights that inform next steps or further research
Ensure transparency and reproducibility through clear documentation
Collaboration is central during this phase. Sharing data, methods, and findings with team members or external collaborators improves efficiency, quality, and reproducibility. Effective collaboration relies on:
Shared storage solutions for data and scripts
Version control systems (e.g., Git, GitHub) to track changes
Clear communication and documentation practices
Harvard University: highlights the importance of collaborative efforts in analyzing and interpreting complex data sets.
DataONE: Analyze – best practices to keep in mind during the Analyze stage.
Data Exploration: Researchers explore datasets using statistical methods, machine learning, or algorithms to understand distributions, identify clusters or outliers, and uncover hidden patterns.
Hypothesis Testing: Formulate and test hypotheses to determine whether specific relationships or trends exist in the data.
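A minimal sketch of exploration and hypothesis testing with pandas and SciPy, assuming a hypothetical CSV file with a numeric outcome column and a two-level group column; the file name and column names are placeholders, and a real analysis will involve more careful checks of assumptions.

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset; replace the path and column names with your own.
df = pd.read_csv("processed_data/experiment_results.csv")

# Exploration: summary statistics and group sizes.
print(df.describe())
print(df["group"].value_counts())

# Hypothesis test: do the two groups differ in mean outcome?
control = df.loc[df["group"] == "control", "outcome"]
treated = df.loc[df["group"] == "treated", "outcome"]
t_stat, p_value = stats.ttest_ind(control, treated, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```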
Data Visualization: Visualizations translate complex data into actionable insights. Charts, graphs, dashboards, and interactive tools like Tableau or Power BI help illustrate trends and communicate findings effectively.
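Continuing the same hypothetical dataset, a short matplotlib sketch turns the group comparison into a simple static figure; dashboard tools such as Tableau or Power BI serve the same communication goal interactively.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same hypothetical dataset as the exploration sketch above.
df = pd.read_csv("processed_data/experiment_results.csv")

# Box plot of outcome by group: a compact view of distributions and outliers.
df.boxplot(column="outcome", by="group")
plt.title("Outcome by group")
plt.suptitle("")                       # drop the automatic pandas super-title
plt.ylabel("Outcome (units)")
plt.savefig("outcome_by_group.png", dpi=300)
```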
Reporting & Documentation: Documenting methods, cleaning procedures, and analysis steps is critical for reproducibility and transparency. Comprehensive records support peer review, regulatory compliance, and future research use.
Insight Generation: The final goal is drawing conclusions and identifying actionable opportunities. Insights may inform decision-making, guide interventions, or suggest new research directions.
High-Performance & High-Throughput Computing
The Carpentries (Python, R, Git, Bash, OpenRefine Training)
Researchers deposit datasets into repositories, assign persistent identifiers, document methods and metadata, and preserve data in secure, long-term storage to ensure discoverability and compliance.
Publishing and archiving safeguard your research for today and the future. Making data discoverable and citable increases visibility and scholarly impact, while long-term preservation ensures integrity, compliance, and continued usability. These practices protect the investment in your research and maximize its contribution to the broader scientific community.
Publishing research data typically involves depositing it into a trusted repository.
Institutional Repository – Research Data Curation Program analysts at UC San Diego can guide you and assist with data ingestion into our university’s institutional repositories.
Domain-Specific Repositories – Some fields have well-established domain-specific repositories (e.g., GenBank, ICPSR, Dryad).
Generalist Repositories – Options such as Zenodo or figshare are useful when no domain-specific repository exists.
Apply a suitable license to clarify reuse rights.
Ensure sensitive or restricted data complies with IRB, HIPAA, or other regulatory requirements.
Archiving secures your data and metadata for the long term.
Storage Duration – Our university guarantees preservation for at least X years in the institutional archive.
Formats – Use open, non-proprietary formats when possible (e.g., CSV, TXT, TIFF).
Data Integrity – Files are checked regularly to prevent data loss or corruption (a simple checksum-manifest sketch follows this list).
Access Controls – Options for open, embargoed, or restricted access.
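As a hedged sketch of the kind of fixity checking described under Data Integrity, the snippet below records a SHA-256 checksum for every file in an archive folder; rerunning it later and comparing manifests reveals silent corruption. Repositories and archives typically perform these checks for you, so this is illustrative rather than required.

```python
import hashlib
import json
from pathlib import Path

def checksum_manifest(archive_dir: str, manifest_path: str = "manifest.json") -> None:
    """Record a SHA-256 checksum for every file under archive_dir.

    Comparing manifests generated at different times reveals silent
    corruption or accidental modification of archived files.
    """
    manifest = {}
    for path in sorted(Path(archive_dir).rglob("*")):
        if path.is_file():
            manifest[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

if __name__ == "__main__":
    checksum_manifest("archive/")      # placeholder folder name
```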
Resource: Scholarly Communications
Resource: Research Data Curation Program
Resource: FAIR Data Office
Resource: Office of Research and Innovation
Resource: ORCiD
Researchers make data accessible to colleagues or the broader community, provide clear documentation and metadata, and apply licenses or permissions to enable responsible reuse.
Sharing and reusing data extends its reach and impact. Making datasets accessible to others promotes transparency, enables replication, and fosters collaboration across disciplines. Thoughtful sharing also ensures that data can be reused responsibly, preserving its value for new research, discoveries, and long-term knowledge building.
Sharing starts with ensuring your data is findable, understandable, and accessible. This often means:
Depositing your data in a repository where it can be discovered and cited.
Providing clear documentation and metadata to make your dataset usable.
Applying licenses that define how others can use your data.
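A minimal sketch of the descriptive metadata and license statement that typically accompany a deposited dataset; the field names loosely mirror common repository deposit forms, and every value below is a placeholder rather than a required schema.

```python
import json

# Illustrative metadata record for a hypothetical deposited dataset.
dataset_record = {
    "title": "Example assay measurements, 2024",
    "creators": [{"name": "Lastname, Firstname", "orcid": "0000-0000-0000-0000"}],
    "description": "Cleaned measurements from the 2024 assay experiments.",
    "keywords": ["assay", "example", "research data"],
    "license": "CC-BY-4.0",            # clarifies reuse rights for others
    "related_publication": "doi:10.xxxx/placeholder",
    "access": "open",                  # or "embargoed" / "restricted"
}

# Save the record alongside the dataset or paste it into the repository deposit form.
with open("dataset_metadata.json", "w") as f:
    json.dump(dataset_record, f, indent=2)
```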
Researchers can choose to share openly, restrict access, or provide controlled sharing depending on data sensitivity and compliance requirements.
Reusing data requires trust in its quality and clarity. Well-documented datasets allow researchers to:
Replicate or validate original findings.
Combine multiple datasets to generate new insights.
Apply data to novel research questions or cross-disciplinary studies.
Our institutional repository and trusted domain repositories ensure data is curated for long-term reuse, with metadata and identifiers to support reliable citation.
Dryad
Discipline-Specific Repositories
Digital Collections
Chronopolis
California Digital Library
ORA
FAIR Data Office
RDCP