Accessing Genomic Data from NIH Repositories

The repositories on this page are currently being reviewed for potential updates to comply with Administrative directives and agency priorities.

NIH hosts many genomic and phenotypic data repositories. Learn about some of these repositories and the types of available datasets.

NIH maintains a number of human and non-human genomic data repositories at the National Center for Biotechnology Information (NCBI). In addition, some NIH Institutes or Centers (ICs) maintain repositories aligned with their area of interest. Some repositories store both genomic and non-genomic (for example, imaging) data.

The process for requesting a dataset depends on how a repository manages access to its stored data:

Open or Unrestricted Access: Some repositories store open-access (unrestricted) genomic data, so no special credentials are required to download it. These datasets are available to the public, and individuals who download them are expected to use them responsibly.
Registered Access: Some repositories allow access to their data only if users are registered with the repository. The repository might also monitor usage.
Controlled Access: Some repositories, such as the Database of Genotypes and Phenotypes (dbGaP), require credentialed users to apply for access to data. Learn how to request a dataset from dbGaP.
Mixed: Some repositories contain both open- and controlled-access datasets.
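
The four access models above can be sketched as a small lookup. This is only an illustrative sketch: the names `AccessLevel`, `REQUIRED_STEPS`, `steps_for`, and `may_need_application` are hypothetical, and the step descriptions paraphrase this page rather than quote any official NIH procedure.

```python
from enum import Enum
from typing import Dict, List


class AccessLevel(Enum):
    """Access models described on this page (names are illustrative)."""
    OPEN = "open"
    REGISTERED = "registered"
    CONTROLLED = "controlled"
    MIXED = "mixed"


# Hypothetical paraphrase of the steps each access model implies.
REQUIRED_STEPS: Dict[AccessLevel, List[str]] = {
    AccessLevel.OPEN: [
        "download directly; no special credentials required",
    ],
    AccessLevel.REGISTERED: [
        "register with the repository",
        "download; usage might be monitored",
    ],
    AccessLevel.CONTROLLED: [
        "obtain credentials",
        "apply for access to the dataset",
        "download after approval",
    ],
    AccessLevel.MIXED: [
        "check the access level of each dataset individually",
    ],
}


def steps_for(level: AccessLevel) -> List[str]:
    """Return a copy of the illustrative steps for one access level."""
    return list(REQUIRED_STEPS[level])


def may_need_application(level: AccessLevel) -> bool:
    """True when at least some datasets may require an access request."""
    return level in (AccessLevel.CONTROLLED, AccessLevel.MIXED)
```

For example, `steps_for(AccessLevel.REGISTERED)` returns the two registration steps, and `may_need_application(AccessLevel.OPEN)` returns `False`.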

Developer Access Materials

Per NOT-OD-24-157, “Implementation Update for Data Management and Access Practices Under the Genomic Data Sharing Policy,” Lead Developers seeking access are expected to submit a request containing a Developer Use Statement (DUS) to the NIH Developer Data Access Committee (DAC) ([email protected]). The request is due no later than Just-in-Time (JIT) for grants and cooperative agreements, with the proposal provided by the offeror for contracts, or with the application for funding for Other Transactions. The DUS, the DUS Renewal, and the DUS Close-Out templates are available upon request from the NIH Program Official named on the Notice of Funding Opportunity.

Trans-NIH Genomic Data Repositories

NCBI hosts repositories that contain genomic data from humans as well as many other organisms. The table below lists several frequently used repositories along with the type of data hosted at the repository, how the repository manages access, and a link to the repository’s access portal.

NIH Repository	Repository Description	Access Level/Type	Access Portal
Database of Genotypes and Phenotypes (dbGaP)	An archive and distribution center for the description and results of studies that investigate the interaction of genotype and phenotype. These studies include genome-wide association studies (GWAS), medical resequencing, and molecular diagnostic assays, as well as studies of association between genotype and non-clinical traits.	Controlled: Summary level data is open. Credentialed users must apply for access to individual level data.	dbGaP Authorized Access System
Database of Short Genetic Variations (dbSNP)	dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.	Open	dbSNP home page
Database of Genomic Structural Variation (dbVar)	dbVar is NCBI’s database of human genomic Structural Variation — large variants >50 bp including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants.	Open	dbVar home page
GenBank	GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.	Open	GenBank access portal
Gene Expression Omnibus (GEO)	The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community.	Open	Gene Expression Omnibus (GEO) access portal
Sequence Read Archive (SRA)	The Sequence Read Archive (SRA) is NIH’s primary archive of high-throughput sequencing data.	Open	Sequence Read Archive (SRA) download portal
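
For the open repositories in the table above, one possible programmatic route is NCBI's Entrez E-utilities search endpoint, which this page does not itself describe. The sketch below only builds a search URL and performs no network request. The helper `build_search_url` and the `ENTREZ_DB` mapping are hypothetical names, and the database codes are assumptions that should be checked against NCBI's E-utilities documentation before use.

```python
from urllib.parse import urlencode

# NCBI Entrez E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed Entrez database codes for the open repositories in the table;
# verify them against the E-utilities documentation before relying on them.
ENTREZ_DB = {
    "dbSNP": "snp",
    "dbVar": "dbvar",
    "GenBank": "nuccore",
    "GEO": "gds",
    "SRA": "sra",
}


def build_search_url(repository: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL for an open repository."""
    params = {
        "db": ENTREZ_DB[repository],  # raises KeyError for unknown names
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("SRA", "Homo sapiens[Organism]", retmax=5))
```

Controlled-access data such as individual-level dbGaP records are deliberately left out of the mapping, because they require an approved access request rather than a public query.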

NIH Institute and Center Supported Repositories

Some individual NIH Institutes and Centers (ICs) support repositories that contain human genomic as well as other types of data that are relevant to their specific area of interest.

The table below is a non-exhaustive list of repositories currently supported by individual institutes, centers, or offices. The table also lists who supports the repository, what type of data is hosted at the repository, how the repository manages access, and a link to the repository’s access portal.

IC Repository	Institute, Center, or Office	Repository Description	Access	Access Portal
AccessClinicalData@NIAID	National Institute of Allergy and Infectious Diseases	AccessClinicalData@NIAID is an NIAID cloud-based, secure data platform that enables sharing of and access to anonymized individual, patient level clinical data sets from NIAID sponsored clinical trials to harness the power of data to generate new knowledge to understand, treat, and prevent infectious diseases such as COVID-19.	Controlled: Summary level data is open. Researchers must apply for access to individual level data.	Accessing NIAID Clinical Trials Data
All of Us	NIH Office of the Director	The All of Us Research Program is part of an effort to advance individualized health care by enrolling one million or more participants to contribute their health data over many years.	Controlled: Summary level data is open. Researchers must apply for access to individual level data.	All of Us Research Hub
AnVIL	National Human Genome Research Institute	The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space, or AnVIL, provides a cloud environment for the analysis of large genomic and related datasets.	Mixed	AnVIL Data Portal
BioData Catalyst	National Heart, Lung, and Blood Institute	NHLBI BioData Catalyst is a cloud-based platform providing tools, applications, and workflows in secure workspaces.	Mixed	Accessing BioData Catalyst Data
Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Data and Specimen Hub (DASH)	Eunice Kennedy Shriver National Institute of Child Health and Human Development	The NICHD Data and Specimen Hub (DASH) is a centralized resource that allows researchers to share and access de-identified data from studies funded by NICHD. DASH also serves as a portal for requesting biospecimens from selected DASH studies.	Mixed	DASH Data Request Tutorial
FaceBase	National Institute of Dental and Craniofacial Research	FaceBase is a collaborative NIDCR-funded project that houses comprehensive data in support of advancing research into craniofacial development and malformation.	Mixed	FaceBase: Request Access to Controlled Data
GWAS Catalog	National Human Genome Research Institute	The GWAS Catalog provides a consistent, searchable, visualizable and freely available database of SNP-trait associations.	Open	GWAS Catalog submission
Kids First	NIH Office of Strategic Coordination - The Common Fund	The Gabriella Miller Kids First Data Resource Center (Kids First DRC) is a new, collaborative, pediatric research effort with the goal of understanding the genetic causes and links between childhood cancer and structural birth defects.	Mixed	Kids First Data Resource Center: Getting Started
NCI Cloud Resources: Broad Institute FireCloud	National Cancer Institute	FireCloud is an open, standards-based platform for performing production-scale data analysis in the cloud. Built on the Google Cloud Platform, FireCloud empowers analysts, tool developers, and production managers to run large-scale analysis and to share results with collaborators.	Mixed	Terra Support
NCI Cloud Resources: Institute for Systems Biology ISB Cloud	National Cancer Institute	The ISB Cancer Genomics Cloud, leveraging many aspects of the Google Cloud Platform, allows scientists to interactively define and compare cohorts, examine underlying molecular data for specific genes and pathways, and share insights with collaborators.	Mixed	ISB Cancer Genomics Cloud guide
NCI Cloud Resources: Seven Bridges Cancer Genomics Cloud	National Cancer Institute	The Seven Bridges Cancer Genomics Cloud, hosted on Amazon, has a rich user interface that allows researchers to find data of interest and combine it with their own private data. Data can be analyzed using more than 200 preinstalled, curated bioinformatics tools and workflows.	Mixed	Seven Bridges Cancer Genomics Cloud Access Guide
National Institute on Aging (NIA) Genetics of Alzheimer's Disease Data Storage Site (NIAGADS)	National Institute on Aging	NIAGADS is the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site. NIAGADS is a national genetics repository created by NIA to facilitate access by qualified investigators to genotypic data for the study of genetics of late-onset Alzheimer's disease.	Controlled: Summary level data is open. Credentialed users must apply for access to individual level data.	NIAGADS access request portal
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository	National Institute of Diabetes and Digestive and Kidney Diseases	The NIDDK Central Repository enables scientists to test new hypotheses without the need to collect any new data or biospecimens, and provides the opportunity to pool data across several studies to increase the power of statistical analyses. In addition, most NIDDK-funded studies are collecting genetic biospecimens and carrying out high-throughput genotyping, making it possible for other scientists to use Central Repository resources to match genotypes to phenotypes and to perform informative genetic analyses.	Controlled: Summary level data is open. Credentialed users must apply for access to individual level data.	NIDDK Central Repository data request instructions
National Institute of Mental Health Data Archive (NDA)	National Institute of Mental Health	The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains. NDA provides infrastructure for sharing research data, tools, methods, and analyses enabling collaborative science and discovery. De-identified human subjects data, harmonized to a common standard, are available to qualified researchers. Summary data are available to all.	Mixed	NDA access portal

Related Resources

How to Request and Access Datasets from dbGaP

Using Genomic Data Responsibly