Controlled-access data users must protect the privacy of the human participants from whom the datasets were generated. Learn about the responsibilities that come with receiving access to human genomic data from NIH.
NIH expects users of both controlled and unrestricted/open-access human genomic data to manage and secure the data in a way that protects the privacy of human participants.
Expectations for Unrestricted/Open-Access Data Users
Investigators who download unrestricted/open-access data from NIH-designated data repositories should:
- Not attempt to identify individual human research participants from whom the data were obtained; and
- Acknowledge in all oral or written presentations, disclosures, or publications the specific dataset(s) or applicable accession number(s) and the NIH-designated data repositories through which the investigator accessed any data.
Expectations for Controlled-Access Data Users
Once an individual has been approved to download data, that individual's institution is also responsible for maintaining the confidentiality, integrity, and availability of the human genomic data accessed through dbGaP. Approved data users, Institutional Signing Officials, and IT Directors can review the NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy for a detailed technical description of data management and security expectations.
Investigators who are approved by NIH to download controlled-access data agree to:
- Use datasets only for the research project described in the approved Data Access Request for each dataset;
- Make no attempt to identify or contact individual participants or groups from whom data were collected, or generate information that could allow participants’ identities to be discovered, without appropriate approvals from the institution that submitted the dataset to dbGaP;
- Maintain the confidentiality of the data and not distribute them to anyone other than those specified in the approved Data Access Request;
- Adhere to the NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy and ensure that only approved users can gain access to data files;
- Acknowledge the Intellectual Property terms as specified in the Data Use Certification Agreement;
- Provide appropriate acknowledgement in any dissemination of research findings, including the investigator(s) who generated the data, the funding source, the accession numbers of the dataset, and the data repository from which the data were accessed; and
- Report any inadvertent data release, breach of data security, or other data management incidents in accordance with the terms specified in the Data Use Certification Agreement.
If an investigator plans to use cloud computing systems to store or analyze controlled-access data, NIH expects the cloud systems to meet the same standards as those outlined in the NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy. NIH will hold the institution, not the cloud service provider, responsible for any failure in oversight when cloud computing services are used for controlled-access data.
Before accessing any data, users must agree to abide by NIH's Genomic Data User Code of Conduct as well as the terms of the approved data access request, which include any dataset-specific data use limitations and the terms of the Data Use Certification Agreement. Violating these terms is considered a data management incident and may result in loss of access privileges to controlled-access data.
The Data Use Certification Agreement, co-signed by the investigator requesting the data, the investigator's institution (as represented by the Institutional Signing Official), and NIH, specifies the terms for appropriate secondary research use of controlled-access data, including:
- Using the data only for the research use stated in the approved data access request;
- Protecting data confidentiality;
- Following, as appropriate, all applicable national, tribal, and state laws and regulations, as well as relevant institutional policies and procedures for handling genomic data;
- Not attempting to identify individual participants from whom the data were obtained;
- Not selling any of the data obtained from NIH-designated data repositories;
- Not sharing any of the data obtained from controlled-access NIH-designated data repositories with individuals other than those listed in the data access request;
- Agreeing to the listing of a summary of approved research uses in dbGaP along with the investigator’s name and organizational affiliation;
- Agreeing to report any violation of the GDS Policy to the appropriate data access committee(s) as soon as it is discovered;
- Reporting research progress using controlled-access datasets through annual access renewal requests or project close-out reports;
- Acknowledging in all oral or written presentations, disclosures, or publications the contributing investigator(s) who conducted the original study, the funding organization(s) that supported the work, the specific dataset(s) and applicable accession number(s), and the NIH-designated data repositories through which the investigator accessed any data.
Patenting NIH-Funded Genomic Data
NIH encourages the broad use of NIH-funded genomic data in ways that are consistent with responsible management of any intellectual property that results from the data. For that reason, NIH encourages patents that can lead to products that address public needs without hindering research. NIH discourages the use of patents to block the use of, or access to, genomic or genotype/phenotype data developed with NIH support.
In addition, naturally occurring DNA sequences are not patentable in the United States. As a result, basic DNA sequence data and related information, such as genotypes, are pre-competitive. When investigators use these types of data from NIH repositories, the data, as well as any conclusions drawn directly from the data, should remain freely accessible without any licensing requirements.