Data Sharing Approaches

Get familiar with how and when NIH expects data to be shared and learn how to safeguard the privacy of human participants while sharing scientific data.

Under NIH’s 2003 data sharing policy, NIH encourages investigators to use the sharing approaches most appropriate for their data. Investigators may choose an approach for sharing based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.

Below are some examples of data sharing approaches to consider:

Depositing data in a data archive, a place where machine-readable data are acquired, manipulated, documented, and finally distributed to the scientific community for further analysis.

  • Transferring data to an archive may help an investigator distribute data more widely to interested users, maintain associated documentation, and meet reporting requirements.
  • Data archives can be particularly attractive for investigators concerned about a large volume of requests or providing technical assistance to users seeking help with analyses.

Depositing data in a data enclave, a secure environment in which eligible researchers can perform analyses using restricted or controlled data resources.

  • If an investigator is working with datasets that cannot be distributed to the general public for privacy, security, or other reasons, they can consider distributing instead through a data enclave.

Distributing data under the auspices of the investigator, who is responsible for storing, managing, and sharing the data. For example, the investigator may mail a CD or post the data on a personal or institutional website for downloading. The investigator also vets access requests and decides whether to grant them.

  • Some investigators sharing under their own auspices may choose to form collaborations with other investigators.

Mixed mode sharing, any combination of the above approaches.

Data Repositories

Need help finding or choosing a data repository? See Selecting a Data Repository.

Data Use Agreement

Investigators sharing under their own auspices should consider using a data use agreement to impose appropriate limitations on users. Agreements usually include elements such as:

  • Criteria for data access
  • Conditions for research use
  • Privacy and confidentiality standards to ensure data security and to prohibit attempts at identifying subjects
  • Whether or not it is prohibited to transfer the data to other users, or conditions on the transfer of data
  • Penalties for violating the agreement

Examples of data use agreements for specific datasets:

Timelines for Data Sharing

Because the value of data often depends on their timeliness, NIH expects the release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset.

The nature of the data collected will affect how quickly a dataset can be released. Data from small studies can be analyzed and submitted for publication relatively quickly. Data from large epidemiologic or longitudinal studies, which are collected over several discrete time periods or waves, can be released in waves as data become available or as findings from the waves of the data are published.

NIH recognizes that the investigators who collected the data have a legitimate interest in benefiting from their investment of time and effort. NIH continues to expect that the initial investigators may benefit from first and continuing use but not from prolonged exclusive use.

Sharing Data from Human Research Participants 

The rights and privacy of people who participate in NIH-sponsored research must be protected at all times. It is the responsibility of the investigators, their institution, and the reviewing Institutional Review Board (IRB) to protect the rights of research participants and the confidentiality of the data.

If research participants are promised that their data will not be shared with other researchers, the application should explain the reasons for such promises. Such promises should not be made routinely and without adequate justification. In general, it is inappropriate for the initial investigator to place limits on the research questions or methods other investigators might pursue with the data. It is also not appropriate for the investigator who produced the data to require authorship as a condition for sharing the data.

Investigators who are planning to share data obtained from research involving human participants should, as part of the informed consent process, discuss with research participants the potential risks posed by data sharing and the steps taken to address those risks. Plans for protecting privacy can be discussed in the Human Subjects and Clinical Trials Information Form.

Below are several ways for investigators to protect the privacy of human participants when sharing data:

  • Prior to sharing, remove identifiers from the data.
    • In addition to removing direct identifiers such as names, addresses, telephone numbers, and Social Security Numbers, researchers should consider removing indirect identifiers and other information that could reveal participants’ identities. The risk of identification from indirect identifiers is higher for participants from small geographic areas or rare populations, and for participants whose data appear in linked datasets.
  • Adopt strategies to minimize risks of unauthorized disclosure of personal identifiers.
  • Although not required, investigators may reduce the risk of subject identification by withholding part of the data. Alternatively, they may statistically alter the data in ways that will not compromise secondary analyses but will protect individual subjects’ identities.
  • When entering into a data use agreement, include conditions for protecting confidentiality and privacy.
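
The first two steps in the list above can be sketched in code. The following is a minimal illustration, not an NIH-endorsed procedure: the field names, the helper functions, and the threshold of 5 are all assumptions made for the example, and a real dataset would need its own review of direct and indirect identifiers.

```python
from collections import Counter

# Hypothetical field names for direct identifiers (an assumption for this sketch).
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "ssn"}


def remove_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct-identifier fields dropped.

    The input records are not modified.
    """
    return [
        {field: value for field, value in record.items()
         if field.lower() not in identifiers}
        for record in records
    ]


def small_cells(records, field, threshold=5):
    """List values of an indirect identifier shared by fewer than `threshold` records.

    Rare values (for example, a ZIP code with one participant) may need to be
    coarsened or withheld. The threshold is an illustrative assumption.
    """
    counts = Counter(record[field] for record in records)
    return sorted(value for value, count in counts.items() if count < threshold)
```

Calling `remove_direct_identifiers` on a list of participant records and then `small_cells` on a field such as a ZIP code flags values that only a few participants share, which is where the risk of identification from indirect identifiers tends to be highest.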

Sharing Data from Clinical Trials

Some study designs may give greater privacy protection to subjects than others. Before starting their research, investigators who are planning clinical trials and intend to share the resulting data should carefully consider the potential privacy risks to research participants associated with the study design and the data that will be collected. They should describe these risks, as well as mitigation approaches, in the informed consent process and consent forms. As a resource, the U.S. Department of Health & Human Services (HHS) provides a video on the informed consent process.

NIH recognizes that the sharing of data from clinical trials and other situations may sometimes require anonymizing the data. Alternatively, investigators may share data through a restricted data enclave, which would only give access to researchers who agree to preserve the privacy of subjects.

Investigators who work for or who are themselves covered entities under the federal Health Insurance Portability and Accountability Act (HIPAA) must be aware of HIPAA Privacy Rule requirements for the protection of protected health information. For more information, refer to the HHS page, The HIPAA Privacy Rule.

Generating large-scale genomic data? NIH’s Genomic Data Sharing (GDS) policy may also apply to your research. See our GDS Policy Overview page to learn more.

Data Preservation and Sharing Timelines

Shared scientific data should be made accessible as soon as possible, and no later than the time of an associated publication or the end of the performance period, whichever comes first.
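
The "whichever comes first" rule amounts to taking the earlier of two dates. A minimal sketch follows, assuming both dates are known as calendar dates; the function name and parameters are hypothetical and chosen only for illustration.

```python
from datetime import date


def latest_sharing_date(performance_period_end, publication_date=None):
    """Return the latest date by which shared data should be accessible.

    This is the earlier of the associated publication date and the end of
    the performance period. If there is no publication yet, the end of the
    performance period applies.
    """
    if publication_date is None:
        return performance_period_end
    return min(publication_date, performance_period_end)
```

For example, with a performance period ending on `date(2026, 6, 30)` and a publication on `date(2025, 3, 1)`, the function returns the publication date, because it comes first.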

Researchers are encouraged to consider relevant requirements and expectations (e.g., data repository policies, award record retention requirements, journal policies) as guidance for the minimum time frame that scientific data should be made available, which researchers may extend.

Methods for Sharing Scientific Data

Under the 2023 Data Management and Sharing (DMS) policy, NIH encourages investigators to use an established repository.

When selecting a repository, investigators should choose based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.

Need help finding or choosing a data repository? See Selecting a Data Repository.

Sharing Data from Human Participants 

For research involving human participants, NIH has specific requirements for research staff, and policies regarding research conduct, safety monitoring, and reporting of information about research progress. Below are some of NIH’s expectations. Applicants need to follow all applicable federal, Tribal, state, and local laws, regulations, statutes, guidance, and institutional policies that govern research involving human participants and the sharing and use of scientific data derived from human participants. NIH also respects Tribal sovereignty, even in the absence of written Tribal laws or policies.

The DMS Policy is consistent with federal regulations for the protection of human research participants and other NIH expectations for the use and sharing of scientific data derived from human participants.

Award recipients must comply with any applicable laws, regulations, statutes, guidance, or institutional policies related to research with human participants and that protect participants’ privacy. The DMS Policy encourages respect for participants by encouraging researchers and award recipients to:

  • Address data management and sharing plans during the informed consent process to ensure prospective participants understand how their data will be managed and shared;
  • Outline steps they will take for protecting the privacy, rights, and confidentiality of prospective participants (e.g., through de-identification, Certificates of Confidentiality, and other protective measures);
  • Assess limitations on subsequent use of data and communicate these limitations to the individuals or entities (e.g., repositories) preserving and sharing the data; and
  • Consider whether access to shared scientific data derived from humans should be controlled, even if de-identified and lacking explicit limitations on subsequent use. Sharing via controlled access may be specified by certain funding opportunity announcements (FOAs) or the funding NIH Institutes or Centers.

NIH strongly encourages investigators to plan for how data management and sharing will be addressed in the informed consent process. Investigators should communicate with prospective participants about how their scientific data are expected to be used and shared. Investigators should also consider whether scientific data derived from humans, even if de-identified and lacking explicit limitations on subsequent use, should be controlled.

In addition, NIH expects that in drafting their DMS plans, researchers will attempt to maximize scientific data sharing, but may acknowledge that certain factors (e.g., ethical, legal, or technical) may necessitate limiting sharing to some extent. Foreseeable limitations should be described when drafting DMS plans. As outlined in the NIH Guide Notice Supplemental Policy Information: Elements of an NIH Data Management and Sharing Plan, a compelling rationale for limiting scientific data sharing should be provided and will be assessed by NIH.

Need help developing informed consent documents for data sharing? See our new sample language and points to consider in the resource Informed Consent for Secondary Research with Data and Biospecimens.

Examples of reasons that would generally not justify limiting scientific data sharing include:

  • Data are considered too small
  • Data are not anticipated to be widely used
  • Data are not thought to have a suitable repository

NIH respects and recognizes Tribal sovereignty and American Indian and Alaska Native (AI/AN) communities’ data sharing concerns, and NIH has proposed additional considerations when working with Tribes and AI/AN communities.

Generating large-scale genomic data? NIH’s Genomic Data Sharing (GDS) policy may also apply to your research. See our GDS Policy Overview page to learn more.