Accessing Genomic Data from NIH Repositories

NIH hosts many genomic and phenotypic data repositories. Learn about some of these repositories and the types of available datasets.

NIH maintains a number of human and non-human genomic data repositories at the National Center for Biotechnology Information (NCBI). In addition, some NIH Institutes or Centers (ICs) maintain repositories aligned with their area of interest. Some repositories store both genomic and non-genomic (for example, imaging) data. 

The process for requesting a dataset depends on how a repository manages access to its stored data:

  • Open or Unrestricted Access: Some repositories store open-access (unrestricted) genomic data, so no special credentials are required to download them. These datasets are publicly available, and individuals who download them are expected to use them responsibly.
  • Registered Access: Some repositories allow access to their data only to users who are registered with the repository. The repository might also monitor usage.
  • Controlled Access: Some repositories, such as the Database of Genotypes and Phenotypes (dbGaP), require credentialed users to apply for access to data. Learn how to request a dataset from dbGaP.
  • Mixed: Some repositories contain both open- and controlled-access datasets.
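
The four access types above can be encoded as a small lookup, which may help when a script needs to decide whether a dataset can be downloaded directly. This is a minimal sketch: the names `AccessLevel`, `PREREQUISITES`, `prerequisites`, and `can_download_immediately` are hypothetical and do not correspond to any NIH or NCBI API, and the step descriptions are paraphrased from the list above.

```python
from enum import Enum
from typing import List


class AccessLevel(Enum):
    """Access types described in the list above (names are illustrative)."""
    OPEN = "open"
    REGISTERED = "registered"
    CONTROLLED = "controlled"
    MIXED = "mixed"


# Steps a requester should expect to complete before downloading,
# paraphrased from the descriptions above.
PREREQUISITES = {
    AccessLevel.OPEN: [],
    AccessLevel.REGISTERED: ["register with the repository"],
    AccessLevel.CONTROLLED: [
        "hold the required credentials",
        "apply for access to the data",
    ],
    AccessLevel.MIXED: ["check the access type of each dataset"],
}


def prerequisites(level: AccessLevel) -> List[str]:
    """Return a copy of the steps required for the given access type."""
    return list(PREREQUISITES[level])


def can_download_immediately(level: AccessLevel) -> bool:
    """True only when no steps are required before downloading."""
    return not PREREQUISITES[level]
```

For example, `can_download_immediately(AccessLevel.OPEN)` is true, while a mixed repository always returns a reminder to check each dataset individually.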

Trans-NIH Genomic Data Repositories

NCBI hosts repositories that contain genomic data from humans as well as many other organisms. The table below lists several frequently used repositories along with the type of data hosted at the repository, how the repository manages access, and a link to the repository’s access portal.

| NIH Repository | Repository Description | Access Level/Type | Access Portal |
| --- | --- | --- | --- |
| Database of Genotypes and Phenotypes (dbGaP) | An archive and distribution center for the description and results of studies that investigate the interaction of genotype and phenotype. These studies include genome-wide association studies (GWAS), medical resequencing, molecular diagnostic assays, and associations between genotype and non-clinical traits. | Controlled: Summary-level data are open. Credentialed users must apply for access to individual-level data. | dbGaP Authorized Access System |
| Database of Short Genetic Variations (dbSNP) | dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations. | Open | dbSNP home page |
| Database of Genomic Structural Variation (dbVar) | dbVar is NCBI’s database of human genomic structural variation: large variants (>50 bp) including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants. | Open | dbVar home page |
| GenBank | GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. | Open | GenBank access portal |
| Gene Expression Omnibus (GEO) | GEO is a public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. | Open | Gene Expression Omnibus (GEO) access portal |
| Sequence Read Archive (SRA) | SRA is NIH’s primary archive of high-throughput sequencing data. | Open | Sequence Read Archive (SRA) download portal |
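
Besides the web portals listed above, the open repositories can usually be searched programmatically. The sketch below builds a search URL for NCBI's Entrez E-utilities `esearch` endpoint. E-utilities is not mentioned on this page, so treat its use here as an assumption; the function name `build_esearch_url` and the `ENTREZ_DB` mapping are hypothetical, and the database codes should be checked against the current NCBI documentation before use.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed mapping from the open repositories in the table above to
# Entrez database codes; verify against NCBI documentation before relying on it.
ENTREZ_DB = {
    "dbSNP": "snp",
    "dbVar": "dbvar",
    "GenBank": "nuccore",
    "GEO": "gds",
    "SRA": "sra",
}


def build_esearch_url(repository: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns up to `retmax` record IDs as JSON."""
    if repository not in ENTREZ_DB:
        raise ValueError(f"No Entrez database known for {repository!r}")
    params = {
        "db": ENTREZ_DB[repository],
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    # urlencode percent-encodes the query and turns spaces into '+'.
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

Calling `build_esearch_url("SRA", "Homo sapiens", retmax=5)` returns a URL that can be opened with `urllib.request.urlopen` or in a browser. The sketch only constructs the request, so it can be inspected before any network traffic is sent. Controlled-access data such as individual-level dbGaP records still require an approved application and are not retrievable this way.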

NIH Institute and Center Supported Repositories

Some individual NIH Institutes and Centers (ICs) support repositories that contain human genomic as well as other types of data that are relevant to their specific area of interest. 

The table below is a non-exhaustive list of repositories currently supported by individual institutes, centers, or offices. The table also lists who supports the repository, what type of data is hosted at the repository, how the repository manages access, and a link to the repository’s access portal. 

| IC Repository | Institute, Center, or Office | Repository Description | Access | Access Portal |
| --- | --- | --- | --- | --- |
| AccessClinicalData@NIAID | National Institute of Allergy and Infectious Diseases | AccessClinicalData@NIAID is an NIAID cloud-based, secure data platform that enables sharing of and access to anonymized, individual patient-level clinical datasets from NIAID-sponsored clinical trials, with the aim of generating new knowledge to understand, treat, and prevent infectious diseases such as COVID-19. | Controlled: Summary-level data are open. Researchers must apply for access to individual-level data. | Accessing NIAID Clinical Trials Data |
| All of Us | NIH Office of the Director | The All of Us Research Program is part of an effort to advance individualized health care by enrolling one million or more participants to contribute their health data over many years. | Controlled: Summary-level data are open. Researchers must apply for access to individual-level data. | All of Us Research Hub |
| AnVIL | National Human Genome Research Institute | The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space, or AnVIL, provides a cloud environment for the analysis of large genomic and related datasets. | Mixed | AnVIL Data Portal |
| BioData Catalyst | National Heart, Lung, and Blood Institute | NHLBI BioData Catalyst is a cloud-based platform providing tools, applications, and workflows in secure workspaces. | Mixed | Accessing BioData Catalyst Data |
| Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Data and Specimen Hub (DASH) | Eunice Kennedy Shriver National Institute of Child Health and Human Development | The NICHD Data and Specimen Hub (DASH) is a centralized resource that allows researchers to share and access de-identified data from studies funded by NICHD. DASH also serves as a portal for requesting biospecimens from selected DASH studies. | Mixed | DASH Data Request Tutorial |
| FaceBase | National Institute of Dental and Craniofacial Research | FaceBase is a collaborative NIDCR-funded project that houses comprehensive data in support of advancing research into craniofacial development and malformation. | Mixed | FaceBase: Request Access to Controlled Data |
| GWAS Catalog | National Human Genome Research Institute | The GWAS Catalog provides a consistent, searchable, visualizable, and freely available database of SNP-trait associations. | Open | GWAS Catalog submission |
| Kids First | NIH Office of Strategic Coordination - The Common Fund | The Gabriella Miller Kids First Data Resource Center (Kids First DRC) is a new, collaborative, pediatric research effort with the goal of understanding the genetic causes and links between childhood cancer and structural birth defects. | Mixed | Kids First Data Resource Center: Getting Started |
| NCI Cloud Resources: Broad Institute FireCloud | National Cancer Institute | FireCloud is an open, standards-based platform for performing production-scale data analysis in the cloud. Built on the Google Cloud Platform, FireCloud empowers analysts, tool developers, and production managers to run large-scale analysis and to share results with collaborators. | Mixed | Terra Support |
| NCI Cloud Resources: Institute for Systems Biology ISB Cloud | National Cancer Institute | The ISB Cancer Genomics Cloud, leveraging many aspects of the Google Cloud Platform, allows scientists to interactively define and compare cohorts, examine underlying molecular data for specific genes and pathways, and share insights with collaborators. | Mixed | ISB Cancer Genomics Cloud guide |
| NCI Cloud Resources: Seven Bridges Cancer Genomics Cloud | National Cancer Institute | The Seven Bridges Cancer Genomics Cloud, hosted on Amazon, has a rich user interface that allows researchers to find data of interest and combine it with their own private data. Data can be analyzed using more than 200 preinstalled, curated bioinformatics tools and workflows. | Mixed | Seven Bridges Cancer Genomics Cloud Access Guide |
| National Institute on Aging (NIA) Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) | National Institute on Aging | NIAGADS is the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site. NIAGADS is a national genetics repository created by NIA to facilitate access by qualified investigators to genotypic data for the study of the genetics of late-onset Alzheimer's disease. | Controlled: Summary-level data are open. Credentialed users must apply for access to individual-level data. | NIAGADS access request portal |
| National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository | National Institute of Diabetes and Digestive and Kidney Diseases | The NIDDK Central Repository enables scientists to test new hypotheses without the need to collect any new data or biospecimens, and provides the opportunity to pool data across several studies to increase the power of statistical analyses. In addition, most NIDDK-funded studies are collecting genetic biospecimens and carrying out high-throughput genotyping, making it possible for other scientists to use Central Repository resources to match genotypes to phenotypes and to perform informative genetic analyses. | Controlled: Summary-level data are open. Credentialed users must apply for access to individual-level data. | NIDDK Central Repository data request instructions |
| National Institute of Mental Health Data Archive (NDA) | National Institute of Mental Health | The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains. NDA provides infrastructure for sharing research data, tools, methods, and analyses, enabling collaborative science and discovery. De-identified human subjects data, harmonized to a common standard, are available to qualified researchers. Summary data are available to all. | Mixed | NDA access portal |