Data Sharing Approaches

Get familiar with how and when NIH expects data to be shared.

Data Sharing Approaches

Under NIH’s 2003 data sharing policy, NIH encourages investigators to use the sharing approaches most appropriate for their data. Investigators may choose an approach for sharing based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.

Below are some examples of data sharing approaches to consider:

Depositing data in a data archive, a place where machine-readable data are acquired, manipulated, documented, and finally distributed to the scientific community for further analysis.

  • Transferring data to a data archive facility may help an investigator distribute data more widely to interested users, maintain associated documentation, and meet reporting requirements.
  • Data archives can be particularly attractive for investigators concerned about a large volume of requests or providing technical assistance to users seeking help with analyses.

Depositing data in a data enclave, a secure environment in which eligible researchers can perform analyses using restricted or controlled data resources.

  • If an investigator is working with datasets that cannot be distributed to the general public for privacy, security, or other reasons, they can consider distributing instead through a data enclave.

Distributing data under the auspices of the investigator, who is responsible for storing, managing, and sharing the data. For example, the investigator may mail a CD or post the data on a personal or institutional website for downloading. The investigator also vets and makes decisions on access requests.

  • Some investigators sharing under their own auspices may choose to form collaborations with other investigators.

Mixed mode sharing, any combination of the above approaches.

Data Repositories

Need help finding or choosing a data repository? See Selecting a Data Repository.

Data Use Agreement

Investigators sharing under their own auspices should consider using a data use agreement to impose appropriate limitations on users. Agreements usually include elements such as:

  • Criteria for data access
  • Conditions for research use
  • Privacy and confidentiality standards to ensure data security and to prohibit attempts at identifying subjects
  • Whether transfer of the data to other users is prohibited, or the conditions under which transfer is allowed
  • Penalties for violating the agreement

Examples of data use agreements for specific datasets:

Timelines for Data Sharing

Because the value of data often depends on their timeliness, NIH expects the release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset.

The nature of the data collected will affect how quickly a dataset can be released. Data from small studies can be analyzed and submitted for publication relatively quickly. Data from large epidemiologic or longitudinal studies, which are collected over several discrete time periods or waves, can be released in waves as data become available or as findings from the waves of the data are published.

NIH recognizes that the investigators who collected the data have a legitimate interest in benefiting from their investment of time and effort. NIH continues to expect that the initial investigators may benefit from first and continuing use but not from prolonged exclusive use.

Sharing Data from Human Research Participants 

The rights and privacy of human research participants who participate in NIH-sponsored research must be protected at all times. It is the responsibility of the investigators, their institution, and the reviewing Institutional Review Board (IRB) to protect the rights of research participants and the confidentiality of the data.

If research participants are promised that their data will not be shared with other researchers, the application should explain the reasons for such promises. Such promises should not be made routinely and without adequate justification. In general, it is inappropriate for the initial investigator to place limits on the research questions or methods other investigators might pursue with the data. It is also not appropriate for the investigator who produced the data to require authorship as a condition for sharing the data.

Investigators who are planning to share data obtained from research involving human participants should discuss the potential risks posed by data sharing, and the steps taken to address those risks, with research participants as part of the informed consent process. Plans for protecting privacy can be discussed in the Human Subjects and Clinical Trials Information Form.

Below are several ways for investigators to protect the privacy of human participants when sharing data:

  • Prior to sharing, remove identifiers from the data.
    • In addition to removing direct identifiers such as name, address, telephone numbers, and Social Security Numbers, researchers should consider removing indirect identifiers and other information that could reveal participants’ identities. The risk of identification from indirect identifiers is higher for participants from small geographic areas or rare populations, and in linked datasets.
  • Adopt strategies to minimize risks of unauthorized disclosure of personal identifiers.
  • Although not required, investigators may reduce the risk of subject identification by withholding part of the data. Alternatively, they may statistically alter the data in ways that will not compromise secondary analyses but will protect individual subjects’ identities.
  • When entering into a data use agreement, include conditions for protecting confidentiality and privacy.
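
The first steps in the list above (removing direct identifiers and reducing the precision of indirect ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`name`, `ssn`, `zip_code`, `age`, and so on), the ZIP truncation, and the 10-year age bands are hypothetical choices, not NIH requirements, and a real dataset needs its own disclosure review.

```python
# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "ssn"}


def remove_direct_identifiers(record):
    """Return a copy of the record without direct identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def coarsen_indirect_identifiers(record):
    """Reduce the precision of fields that could identify someone indirectly."""
    out = dict(record)
    if "zip_code" in out:
        # Assumption: keep only the first three digits of the ZIP code so
        # that small geographic areas are harder to single out.
        out["zip_code"] = str(out["zip_code"])[:3] + "XX"
    if "age" in out:
        # Assumption: replace the exact age with a 10-year band.
        low = (int(out["age"]) // 10) * 10
        out["age"] = f"{low}-{low + 9}"
    return out


def deidentify(records):
    """Apply both steps to every record and return new records."""
    return [
        coarsen_indirect_identifiers(remove_direct_identifiers(r))
        for r in records
    ]


participants = [
    {"name": "A. Example", "ssn": "000-00-0000", "zip_code": "20892",
     "age": 47, "outcome": 1},
]
shared = deidentify(participants)
```

Here `shared` keeps the analytic variable (`outcome`) while the name and Social Security Number are gone and the ZIP code and age are coarsened. Whether this level of coarsening is sufficient depends on the dataset, particularly for rare populations or linked data.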

Sharing Data from Clinical Trials

Some study designs may give greater privacy protection to subjects than others. Before starting their research, investigators who are planning clinical trials and intend to share the resulting data should carefully consider the potential privacy risks to research participants associated with the study design and the data that will be collected. They should describe these risks, as well as mitigation approaches, in the informed consent process and consent forms. As a resource, the U.S. Department of Health & Human Services (HHS) provides a video on the informed consent process.

NIH recognizes that the sharing of data from clinical trials and other situations may sometimes require anonymizing the data. Alternatively, investigators may share data through a restricted data enclave, which would only give access to researchers who agree to preserve the privacy of subjects.

Investigators who work for or who are themselves covered entities under the federal Health Insurance Portability and Accountability Act (HIPAA) must be aware of HIPAA Privacy Rule requirements for the protection of protected health information. For more information, refer to the HHS page, The HIPAA Privacy Rule.

Generating large-scale genomic data? NIH’s Genomic Data Sharing (GDS) policy may also apply to your research. See our GDS Policy Overview page to learn more.

Methods for Sharing Scientific Data

Under the 2023 Data Management and Sharing (DMS) policy, NIH encourages investigators to use an established repository.

When selecting a repository, investigators should choose based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.

Need help finding or choosing a data repository? See Selecting a Data Repository.

Timeliness of Data Sharing

Scientific data should be made accessible as soon as possible.

More specifically, the DMS Policy expects scientific data to be shared by the earlier of these two timepoints:

The time of an associated publication: Scientific data underlying peer-reviewed journal articles should be made accessible no later than the date on which the article is first made available in print or electronic format.

The end of the performance period: Scientific data underlying findings not disseminated through peer-reviewed journal articles should be shared by the end of the performance period unless the grant enters into a no-cost extension. If a no-cost extension is permitted, then the recipient should share the data by the end of the extended performance period. In addition, researchers should be aware that some preprint servers may require the sharing of data upon preprint posting, and repositories storing data may similarly require public release of data upon preprint posting.

For data sharing expectations for those receiving SBIR/STTR awards, please consult the FAQ “Do SBIR/STTR projects have to share scientific data under the DMS Policy?”

Note that individual funding agreements or other policies that apply to a specific project may have earlier expectations for data sharing timelines.
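
As a rough illustration of the "earlier of these two timepoints" rule, the sketch below computes a latest expected sharing date from a publication date and the end of the performance period. The function name and parameters are hypothetical, and the logic is a simplification: it does not model SBIR/STTR awards, preprint or repository requirements, or earlier expectations set by an individual funding agreement.

```python
from datetime import date
from typing import Optional


def sharing_deadline(
    performance_period_end: date,
    publication_date: Optional[date] = None,
    extended_period_end: Optional[date] = None,
) -> date:
    """Return the earlier of the publication date and the period end.

    If a no-cost extension was granted, the extended end date replaces
    the original end of the performance period.
    """
    period_end = extended_period_end or performance_period_end
    if publication_date is None:
        # No associated publication: the period end is the only timepoint.
        return period_end
    return min(publication_date, period_end)


# Data underlying an article published before the award ends.
published_early = sharing_deadline(
    performance_period_end=date(2026, 6, 30),
    publication_date=date(2025, 3, 1),
)

# Unpublished data on an award with a no-cost extension.
extended = sharing_deadline(
    performance_period_end=date(2026, 6, 30),
    extended_period_end=date(2027, 6, 30),
)
```

In the first case the publication date governs. In the second, with no publication, the extended end of the performance period does.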

Data Preservation

Researchers are encouraged to consider relevant requirements and expectations (e.g., data repository policies, award record retention requirements, journal policies) as guidance for the minimum time frame that scientific data should be made available, which researchers may extend.