Research IT Services
Data Management
Research data management (RDM) is a term that describes the organization, storage, preservation, and sharing of data collected and used in a research project. It involves the everyday management of research data throughout the lifecycle of a research project. UC San Diego researchers have an abundance of resources to guide them through all phases of data management in the data lifecycle.
Data Literacy: The Data Literacy Project describes data literacy as "the ability to read, work with, analyze, and argue with data," while Tableau defines data literacy as "the ability to explore, understand, and communicate with data." Expanding on those definitions, Carlson et al. (2011) stress that being data literate also means being a *critical consumer* of data and statistics, noting that "data literacy involves understanding what data mean, including how to read charts appropriately, draw correct conclusions from data, and recognize when data are being used in misleading or inappropriate ways."
Data Science: Data science is an interdisciplinary field that uses algorithms, procedures, and processes to examine large amounts of data in order to uncover hidden patterns, generate insights, and direct decision-making. Data scientists are the analytics professionals responsible for collecting, organizing, analyzing, interpreting, and presenting data to help drive decision-making.
Data Lifecycle: Data lifecycle management (DLM) is a holistic approach to moving data through an interconnected sequence of phases during the execution of a research project. From data creation to data destruction, the adoption of tools, services, and leading practices in each phase prioritizes data protection and disaster recovery while maximizing data value and integrity.
Research Data Lifecycle at UC San Diego - This infographic associates specific UC San Diego services and resources with the various stages of the data lifecycle.
The Research Data Curation Program provides data curation tools and services through the Library. Analysts are available to assist researchers through all phases of the data lifecycle; they have expertise in data management, sharing and discovery, digital preservation, and the UC San Diego Library Digital Collections.
Researchers assess their project’s data needs, select tools and standards, define workflows, and outline storage, documentation, and metadata strategies to ensure a solid foundation.
Planning and designing your data strategy early sets the stage for a successful project. It ensures accuracy, supports reproducibility and compliance, streamlines workflows, and enables effective collaboration. Thoughtful planning also increases the long-term value of your data, making it easier to share, reuse, and preserve, ultimately extending the impact of your research.
Managing references is a core part of research design. Reference management tools (citation or bibliographic software) help you collect, organize, and format sources. They simplify switching citation styles, generating bibliographies, and maintaining large libraries of references — streamlining your workflow from the start.
Resource: University of Oxford, Bodleian Libraries: Choosing a Reference Manager
Resource: EndNote
Resource: Zotero
Resource: Mendeley
Most major funders, including NIH and NSF, now require a Data Management and Sharing Plan. More than a compliance document, a DMSP serves as a blueprint for how you will collect, manage, share, and preserve your data.
A strong plan:
Anticipates preservation, access, and reuse needs
Uses funder-approved templates and institutional guidance
Documents tools, standards, and workflows
Builds in support from campus resources and services
NIH requires a two-page DMSP covering: data types, related software/code, standards, preservation, access/distribution, reuse, and oversight.
Resource: University of California Office of the President, California Digital Library: DMPTool
Resource: NIH: Data Management and Sharing policy (effective January 25, 2023)
Good design includes documenting data so it’s usable by others. Metadata — “data about data” — should be stored with your datasets, often in a README file. Supporting documentation may include data dictionaries, protocols, or lab notebooks.
Standards guide how data is collected, represented, named, and shared, and are often specific to a particular type of experiment or field of study. Using discipline-specific standards ensures your data is interoperable and reusable.
Resource: Fairsharing.org is a great resource for metadata standards and other data policies.
Resource: NIH Common Data Elements are specific to NIH institutes and fields of study
Resource: University of Denver offers its DMP Data and Metadata Guidelines
Organize your data early on; this cannot be overemphasized. Cleaning up a messy directory structure later, or hunting for files scattered under a poor naming convention, costs time and can lead to misplaced information. This becomes even more important when many team members work from a common repository. When structuring the hierarchy, best practices include:
Structuring folders by workflow or meaningful categories
Documenting the system in a README file
Assigning PI/project owner control of top-level folders
Recording roles, responsibilities, and workflows for contributors
README files should explain the folder hierarchy, naming conventions, variable definitions, and include links to public or derived data sources. A data dictionary can further describe dataset contents, types, units, and values.
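As a concrete, hedged illustration of these practices, the short Python sketch below scaffolds a hypothetical project directory and writes a README stub recording the hierarchy and naming convention. The folder names, README fields, and naming pattern are assumptions chosen for the example, not a prescribed UC San Diego layout.

```python
from pathlib import Path

# Hypothetical top-level layout; adjust folder names to your own workflow.
FOLDERS = ["raw_data", "processed_data", "code", "docs", "results"]

README_TEMPLATE = """Project: <project name>
PI / project owner: <name>

Folder hierarchy:
  raw_data/        original, unmodified data (treat as read-only)
  processed_data/  cleaned or derived datasets
  code/            analysis scripts
  docs/            protocols, data dictionary, lab notes
  results/         figures and tables

Naming convention (example): YYYYMMDD_experiment_sample_version
  e.g. 20240115_assay1_s03_v02.csv
"""

def scaffold(project_root: str) -> None:
    """Create the folder hierarchy and a README stub under project_root."""
    root = Path(project_root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():          # never overwrite an existing README
        readme.write_text(README_TEMPLATE)

if __name__ == "__main__":
    scaffold("my_project")           # placeholder project name
```

Keeping the scaffold in a script makes it easy to reproduce the same structure for each new project or collaborator.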
Resource: Stony Brook University: File Names and Variables
Resource: University of Ottawa: File Naming and Organization of Data
Resource: Harvard University: File Naming Conventions
For projects involving many contributors, clearly define roles of participants and provide a detailed description of their contributions. Roles should include:
Checklist Manager (assigns team members to various checklist items)
DMP Manager (chronicles changes, milestones, and accomplishments)
Data Workflows Manager (documents and diagrams data workflows and files)
Resource: Center for Open Science: Onboarding Checklist
A data dictionary is a set of information describing the contents, format, and structure of a dataset and the relationships between its elements (a minimal machine-readable sketch follows the list below). A data dictionary may include:
Variable names and descriptions
Data Types (such as date or integer)
Units of measurement
Possible values
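The sketch referenced above writes a minimal machine-readable data dictionary as a CSV file; the variables, types, units, and allowed values describe a hypothetical survey dataset and are purely illustrative.

```python
import csv

# Illustrative data dictionary for a hypothetical survey dataset.
DATA_DICTIONARY = [
    {"variable": "subject_id", "description": "Anonymized participant identifier",
     "type": "string", "units": "", "allowed_values": "S0001-S9999"},
    {"variable": "visit_date", "description": "Date of study visit",
     "type": "date (YYYY-MM-DD)", "units": "", "allowed_values": ""},
    {"variable": "weight", "description": "Body weight at visit",
     "type": "float", "units": "kg", "allowed_values": "30-200"},
    {"variable": "smoker", "description": "Current smoking status",
     "type": "integer", "units": "", "allowed_values": "0 = no, 1 = yes"},
]

# Store the dictionary alongside the dataset so collaborators can interpret each column.
with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=DATA_DICTIONARY[0].keys())
    writer.writeheader()
    writer.writerows(DATA_DICTIONARY)
```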
Resource: UC Merced: What is a Data Dictionary?
Resource: Lucidchart
Resource: Microsoft O365
Resource: Google Workspace
Resource: Office of Research Affairs
Resource: Office of IRB Administration
Resource: Research Service Core (RSC)
Resource: Sponsored Research Administration (SPO)
Resource: Dryad’s good data practices
Resource: Data Management: Software Carpentry video (8:32)
Resource: UC San Diego Research Data Access, Use and Management
Resource: UC San Diego Unfunded Research Agreements including Data Use Agreements
Resource: Overleaf
Resource: LaTeX
Researchers gather, clean, and document data, apply consistent formats, and organize files while ensuring quality, security, and compliance with ethical or regulatory requirements.
The Collect & Create phase is where research data begins to take shape. Careful collection and documentation ensure accuracy, reliability, and reproducibility while reducing errors and inefficiencies. Thoughtful design at this stage also lays the groundwork for effective collaboration, secure storage, and future analysis, maximizing the long-term value of your data.
Data rarely stays in one place throughout a project. It often moves among devices and systems as research progresses. For example:
Collected on a lab computer or instrument
Processed on a laptop or workstation
Transferred to an institutional server, external hard drive, or cloud service for storage
Deposited into a public repository for sharing
Tracking where your data lives at each stage is critical. Plan ahead for associated costs, keep multiple secure copies, and consider geographically dispersed backups to guard against hardware failure or natural disaster. Always maintain at least one offline, read-only copy in a secure location.
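As one hedged illustration of keeping an offline, read-only copy, the Python sketch below copies a file to a backup location, removes write permission, and verifies the copy with a SHA-256 checksum. The paths are placeholders, and the sketch is not a substitute for institutional backup or archival services.

```python
import hashlib
import shutil
import stat
from pathlib import Path

def sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_readonly(source: str, backup_dir: str) -> None:
    """Copy source into backup_dir, mark the copy read-only, and verify it."""
    src = Path(source)
    dest = Path(backup_dir) / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)                      # copy2 preserves timestamps
    dest.chmod(stat.S_IRUSR | stat.S_IRGRP)      # owner/group read-only
    if sha256(src) != sha256(dest):
        raise RuntimeError(f"Checksum mismatch after copying {src}")

if __name__ == "__main__":
    # Placeholder paths; point these at your own data and backup location.
    backup_readonly("raw_data/20240115_assay1_s03_v02.csv", "/mnt/offline_backup")
```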
Divisional or departmental data storage options may be available to you as well.
Cloud storage offers scalability and redundancy, but requires careful planning:
Costs – Active storage is more expensive than archival tiers; design your workflow to optimize costs.
Compliance – Some funders require data to remain on campus, in-state, or within national borders.
Workflow design – Use tiered storage to balance accessibility with long-term archiving needs.
Protecting your data is just as important as collecting it. Research data can include sensitive personal information, intellectual property, or commercially valuable results — all of which require appropriate safeguards.
SecureConnect - an initiative that builds on university-wide cybersecurity standards and expectations. Similar measures exist across UC and peer institutions.
Protected health information (PHI) must be collected and stored in compliance with HIPAA and IRB guidelines.
Patents and commercial data may have special requirements for confidentiality.
Protection levels – Determine which Protection Level Classification your data falls within under UC and institutional policies so you can apply the right security standards, and understand the UC Policies on Intellectual Property.
Lock lab and office computers when unattended.
Follow and stay current with campus and institutional security policies and recommendations.
Resource: UC San Diego Security Website
Resource: Contact the UC San Diego Office of Information Assurance
San Diego Supercomputer Center (SDSC)
Altman Clinical & Translational Research Institute (ACTRI)
Electronic Lab Notebooks (ELNs) - A comparison matrix
Researchers explore and transform data, test hypotheses, create visualizations, and work with team members to interpret results, document methods, and generate actionable insights.
Collaboration and analysis turn raw data into meaningful insights. Effective teamwork, clear communication, and documented workflows ensure transparency and reproducibility, while robust analysis and visualization uncover patterns, test hypotheses, and generate actionable conclusions that advance knowledge and support decision-making.
Analysis is the bridge between data collection and decision-making. Well-executed analysis allows teams to:
Identify trends, correlations, and relationships
Test hypotheses rigorously
Generate insights that inform next steps or further research
Ensure transparency and reproducibility through clear documentation
Collaboration is central during this phase. Sharing data, methods, and findings with team members or external collaborators improves efficiency, quality, and reproducibility. Effective collaboration relies on:
Shared storage solutions for data and scripts
Version control systems (e.g., Git, GitHub) to track changes
Clear communication and documentation practices
Harvard University: highlights the importance of collaborative efforts in analyzing and interpreting complex data sets.
DataONE: Analyze – best practices to keep in mind during the Analyze stage.
Data Exploration: Researchers explore datasets using statistical methods, machine learning, or algorithms to understand distributions, identify clusters or outliers, and uncover hidden patterns.
Hypothesis Testing: Formulate and test hypotheses to determine whether specific relationships or trends exist in the data.
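A minimal sketch of exploration and hypothesis testing with pandas and SciPy, assuming a hypothetical CSV file with a numeric outcome column and a two-level group column; the file name and column names are placeholders, and a real analysis will involve more careful checks of assumptions.

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset; replace the path and column names with your own.
df = pd.read_csv("processed_data/experiment_results.csv")

# Exploration: summary statistics and group sizes.
print(df.describe())
print(df["group"].value_counts())

# Hypothesis test: do the two groups differ in mean outcome?
control = df.loc[df["group"] == "control", "outcome"]
treated = df.loc[df["group"] == "treated", "outcome"]
t_stat, p_value = stats.ttest_ind(control, treated, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```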
Data Visualization: Visualizations translate complex data into actionable insights. Charts, graphs, dashboards, and interactive tools like Tableau or Power BI help illustrate trends and communicate findings effectively.
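Continuing the same hypothetical dataset, a short matplotlib sketch turns the group comparison into a simple static figure; dashboard tools such as Tableau or Power BI serve the same communication goal interactively.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same hypothetical dataset as the exploration sketch above.
df = pd.read_csv("processed_data/experiment_results.csv")

# Box plot of outcome by group: a compact view of distributions and outliers.
df.boxplot(column="outcome", by="group")
plt.title("Outcome by group")
plt.suptitle("")                       # drop the automatic pandas super-title
plt.ylabel("Outcome (units)")
plt.savefig("outcome_by_group.png", dpi=300)
```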
Reporting & Documentation: Documenting methods, cleaning procedures, and analysis steps is critical for reproducibility and transparency. Comprehensive records support peer review, regulatory compliance, and future research use.
Insight Generation: The final goal is drawing conclusions and identifying actionable opportunities. Insights may inform decision-making, guide interventions, or suggest new research directions.
High-Performance & High-Throughput Computing
The Carpentries (Python, R, Git, Bash, OpenRefine Training)
Researchers deposit datasets into repositories, assign persistent identifiers, document methods and metadata, and preserve data in secure, long-term storage to ensure discoverability and compliance.
Publishing and archiving safeguard your research for today and the future. Making data discoverable and citable increases visibility and scholarly impact, while long-term preservation ensures integrity, compliance, and continued usability. These practices protect the investment in your research and maximize its contribution to the broader scientific community.
Publishing research data typically involves depositing it into a trusted repository.
Institutional Repository – Research Data Curation Program analysts at UC San Diego can guide you and assist with data ingestion into our university’s institutional repositories.
Domain-Specific Repositories – Some fields have well-established domain-specific repositories (e.g., GenBank, ICPSR, Dryad).
Generalist Repositories – Options such as Zenodo or figshare are useful when no domain-specific repository exists.
Apply a suitable license to clarify reuse rights.
Ensure sensitive or restricted data complies with IRB, HIPAA, or other regulatory requirements.
Archiving secures your data and metadata for the long term.
Storage Duration – Our university guarantees preservation for at least X years in the institutional archive.
Formats – Use open, non-proprietary formats when possible (e.g., CSV, TXT, TIFF).
Data Integrity – Files are checked regularly to prevent data loss or corruption (a simple checksum-manifest sketch follows this list).
Access Controls – Options for open, embargoed, or restricted access.
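As a hedged sketch of the kind of fixity checking described under Data Integrity, the snippet below records a SHA-256 checksum for every file in an archive folder; rerunning it later and comparing manifests reveals silent corruption. Repositories and archives typically perform these checks for you, so this is illustrative rather than required.

```python
import hashlib
import json
from pathlib import Path

def checksum_manifest(archive_dir: str, manifest_path: str = "manifest.json") -> None:
    """Record a SHA-256 checksum for every file under archive_dir.

    Comparing manifests generated at different times reveals silent
    corruption or accidental modification of archived files.
    """
    manifest = {}
    for path in sorted(Path(archive_dir).rglob("*")):
        if path.is_file():
            manifest[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

if __name__ == "__main__":
    checksum_manifest("archive/")      # placeholder folder name
```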
Resource: Scholarly Communications
Resource: Research Data Curation Program
Resource: FAIR Data Office
Resource: Office of Research and Innovation
Resource: ORCiD
Researchers make data accessible to colleagues or the broader community, provide clear documentation and metadata, and apply licenses or permissions to enable responsible reuse.
Sharing and reusing data extends its reach and impact. Making datasets accessible to others promotes transparency, enables replication, and fosters collaboration across disciplines. Thoughtful sharing also ensures that data can be reused responsibly, preserving its value for new research, discoveries, and long-term knowledge building.
Sharing starts with ensuring your data is findable, understandable, and accessible. This often means:
Depositing your data in a repository where it can be discovered and cited.
Providing clear documentation and metadata to make your dataset usable.
Applying licenses that define how others can use your data.
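A minimal sketch of the descriptive metadata and license statement that typically accompany a deposited dataset; the field names loosely mirror common repository deposit forms, and every value below is a placeholder rather than a required schema.

```python
import json

# Illustrative metadata record for a hypothetical deposited dataset.
dataset_record = {
    "title": "Example assay measurements, 2024",
    "creators": [{"name": "Lastname, Firstname", "orcid": "0000-0000-0000-0000"}],
    "description": "Cleaned measurements from the 2024 assay experiments.",
    "keywords": ["assay", "example", "research data"],
    "license": "CC-BY-4.0",            # clarifies reuse rights for others
    "related_publication": "doi:10.xxxx/placeholder",
    "access": "open",                  # or "embargoed" / "restricted"
}

# Save the record alongside the dataset or paste it into the repository deposit form.
with open("dataset_metadata.json", "w") as f:
    json.dump(dataset_record, f, indent=2)
```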
Researchers can choose to share openly, restrict access, or provide controlled sharing depending on data sensitivity and compliance requirements.
Reusing data requires trust in its quality and clarity. Well-documented datasets allow researchers to:
Replicate or validate original findings.
Combine multiple datasets to generate new insights.
Apply data to novel research questions or cross-disciplinary studies.
Our institutional repository and trusted domain repositories ensure data is curated for long-term reuse, with metadata and identifiers to support reliable citation.
Dryad
Discipline-Specific Repositories
Digital Collections
Chronopolis
California Digital Library
ORA
FAIR Data Office
RDCP