Accessing Genomic Data from NIH Repositories

The repositories on this page are currently being reviewed for potential updates to comply with Administrative directives and agency priorities.

NIH hosts many genomic and phenotypic data repositories. Learn about some of these repositories and the types of available datasets.

NIH maintains a number of human and non-human genomic data repositories at the National Center for Biotechnology Information (NCBI). In addition, some NIH Institutes or Centers (ICs) maintain repositories aligned with their area of interest. Some repositories store both genomic and non-genomic (for example, imaging) data. 

The process for requesting a dataset depends on how a repository manages access to its stored data:

  • Open or Unrestricted Access: Some repositories store open-access (unrestricted) genomic data, so no special credentials are required to download it. These datasets are publicly available, and individuals who download them are expected to use them responsibly.
  • Registered Access: Some repositories allow access to their data only to users who have registered with the repository. The repository might also monitor usage.
  • Controlled Access: Some repositories, such as the Database of Genotypes and Phenotypes (dbGaP), require credentialed users to apply for access to data. Learn how to request a dataset from dbGaP.
  • Mixed: Some repositories contain both open- and controlled-access datasets.
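
The four access levels listed above can be modelled as a small lookup, which may help when cataloguing repositories in a script. The following is only an illustrative sketch: the names `AccessLevel`, `STEPS`, and `access_steps` are hypothetical, and the step descriptions are paraphrased from this page rather than taken from any NIH system or API.

```python
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    OPEN = "open"              # no special credentials required
    REGISTERED = "registered"  # user must be registered; usage may be monitored
    CONTROLLED = "controlled"  # credentialed user must apply for access
    MIXED = "mixed"            # repository holds both open and controlled datasets


# Hypothetical summary of what a requester should expect at each level.
STEPS = {
    AccessLevel.OPEN: ["download directly"],
    AccessLevel.REGISTERED: ["register with the repository", "download"],
    AccessLevel.CONTROLLED: ["obtain credentials", "apply for access",
                             "download once approved"],
}


def access_steps(repository_level: AccessLevel,
                 dataset_level: Optional[AccessLevel] = None) -> list[str]:
    """Return the expected steps for obtaining a dataset.

    For a MIXED repository the answer depends on the individual dataset,
    so dataset_level must be given and must not itself be MIXED.
    """
    level = repository_level
    if level is AccessLevel.MIXED:
        if dataset_level is None or dataset_level is AccessLevel.MIXED:
            raise ValueError("a mixed repository needs a per-dataset access level")
        level = dataset_level
    return list(STEPS[level])  # copy, so callers cannot mutate the lookup


if __name__ == "__main__":
    print(access_steps(AccessLevel.CONTROLLED))
    print(access_steps(AccessLevel.MIXED, AccessLevel.OPEN))
```

Treating "Mixed" as a property of the repository rather than of a dataset mirrors the list above: the actual requirement is only known once a specific dataset is chosen.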

Developer Access Materials 

Per NOT-OD-24-157, “Implementation Update for Data Management and Access Practices Under the Genomic Data Sharing Policy,” Lead Developers seeking access are expected to submit a request containing a Developer Use Statement (DUS) to the NIH Developer Data Access Committee (DAC) ([email protected]). The request is due no later than Just-in-Time (JIT) for grants and cooperative agreements, with the proposal provided by the offeror for contracts, or with the application for funding for Other Transactions. The DUS, DUS Renewal, and DUS Close-Out templates are available upon request from the NIH Program Official named on the Notice of Funding Opportunity.

 

Trans-NIH Genomic Data Repositories

NCBI hosts repositories that contain genomic data from humans as well as many other organisms. The table below lists several frequently used repositories along with the type of data hosted at the repository, how the repository manages access, and a link to the repository’s access portal.

| NIH Repository | Repository Description | Access Level/Type | Access Portal |
| --- | --- | --- | --- |
| Database of Genotypes and Phenotypes (dbGaP) | An archive and distribution center for the description and results of studies which investigate the interaction of genotype and phenotype. These studies include genome-wide association (GWAS), medical resequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits. | Controlled: Summary level data is open. Credentialed users must apply for access to individual level data. | dbGaP Authorized Access System |
| Database of Short Genetic Variations (dbSNP) | dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations. | Open | dbSNP home page |
| Database of Genomic Structural Variation (dbVar) | dbVar is NCBI’s database of human genomic structural variation: large variants (>50 bp) including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants. | Open | dbVar home page |
| GenBank | GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. | Open | GenBank access portal |
| Gene Expression Omnibus (GEO) | The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. | Open | Gene Expression Omnibus (GEO) access portal |
| Sequence Read Archive (SRA) | The Sequence Read Archive (SRA) is NIH’s primary archive of high-throughput sequencing data. | Open | Sequence Read Archive (SRA) download portal |

NIH Institute and Center Supported Repositories

Some individual NIH Institutes and Centers (ICs) support repositories that contain human genomic as well as other types of data that are relevant to their specific area of interest. 

The table below is a non-exhaustive list of repositories currently supported by individual institutes, centers, or offices. The table also lists who supports the repository, what type of data is hosted at the repository, how the repository manages access, and a link to the repository’s access portal. 

| IC Repository | Institute, Center, or Office | Repository Description | Access | Access Portal |
| --- | --- | --- | --- | --- |
| AccessClinicalData@NIAID | National Institute of Allergy and Infectious Diseases | AccessClinicalData@NIAID is an NIAID cloud-based, secure data platform that enables sharing of and access to anonymized individual, patient level clinical data sets from NIAID sponsored clinical trials to harness the power of data to generate new knowledge to understand, treat, and prevent infectious diseases such as COVID-19. | Controlled: Summary level data is open. Researchers must apply for access to individual level data. | Accessing NIAID Clinical Trials Data |
| All of Us | NIH Office of the Director | The All of Us Research Program is part of an effort to advance individualized health care by enrolling one million or more participants to contribute their health data over many years. | Controlled: Summary level data is open. Researchers must apply for access to individual level data. | All of Us Research Hub |
| AnVIL | National Human Genome Research Institute | The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space, or AnVIL, provides a cloud environment for the analysis of large genomic and related datasets. | Mixed | AnVIL Data Portal |
| BioData Catalyst | National Heart, Lung, and Blood Institute | NHLBI BioData Catalyst is a cloud-based platform providing tools, applications, and workflows in secure workspaces. | Mixed | Accessing BioData Catalyst Data |
| Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Data and Specimen Hub (DASH) | Eunice Kennedy Shriver National Institute of Child Health and Human Development | The NICHD Data and Specimen Hub (DASH) is a centralized resource that allows researchers to share and access de-identified data from studies funded by NICHD. DASH also serves as a portal for requesting biospecimens from selected DASH studies. | Mixed | DASH Data Request Tutorial |
| FaceBase | National Institute of Dental and Craniofacial Research | FaceBase is a collaborative NIDCR-funded project that houses comprehensive data in support of advancing research into craniofacial development and malformation. | Mixed | FaceBase: Request Access to Controlled Data |
| GWAS Catalog | National Human Genome Research Institute | The GWAS Catalog provides a consistent, searchable, visualizable and freely available database of SNP-trait associations. | Open | GWAS Catalog submission |
| Kids First | NIH Office of Strategic Coordination - The Common Fund | The Gabriella Miller Kids First Data Resource Center (Kids First DRC) is a new, collaborative, pediatric research effort with the goal of understanding the genetic causes and links between childhood cancer and structural birth defects. | Mixed | Kids First Data Resource Center: Getting Started |
| NCI Cloud Resources: Broad Institute FireCloud | National Cancer Institute | FireCloud is an open, standards-based platform for performing production-scale data analysis in the cloud. Built on the Google Cloud Platform, FireCloud empowers analysts, tool developers, and production managers to run large-scale analysis and to share results with collaborators. | Mixed | Terra Support |
| NCI Cloud Resources: Institute for Systems Biology ISB Cloud | National Cancer Institute | The ISB Cancer Genomics Cloud, leveraging many aspects of the Google Cloud Platform, allows scientists to interactively define and compare cohorts, examine underlying molecular data for specific genes and pathways, and share insights with collaborators. | Mixed | ISB Cancer Genomics Cloud guide |
| NCI Cloud Resources: Seven Bridges Cancer Genomics Cloud | National Cancer Institute | The Seven Bridges Cancer Genomics Cloud, hosted on Amazon, has a rich user interface that allows researchers to find data of interest and combine it with their own private data. Data can be analyzed using more than 200 preinstalled, curated bioinformatics tools and workflows. | Mixed | Seven Bridges Cancer Genomics Cloud Access Guide |
| National Institute on Aging (NIA) Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) | National Institute on Aging | NIAGADS is the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site. NIAGADS is a national genetics repository created by NIA to facilitate access by qualified investigators to genotypic data for the study of genetics of late-onset Alzheimer's disease. | Controlled: Summary level data is open. Credentialed users must apply for access to individual level data. | NIAGADS access request portal |
| National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository | National Institute of Diabetes and Digestive and Kidney Diseases | The NIDDK Central Repository enables scientists to test new hypotheses without the need to collect any new data or biospecimens, and provides the opportunity to pool data across several studies to increase the power of statistical analyses. In addition, most NIDDK-funded studies are collecting genetic biospecimens and carrying out high-throughput genotyping, making it possible for other scientists to use Central Repository resources to match genotypes to phenotypes and to perform informative genetic analyses. | Controlled: Summary level data is open. Credentialed users must apply for access to individual level data. | NIDDK Central Repository data request instructions |
| National Institute of Mental Health Data Archive (NDA) | National Institute of Mental Health | The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains. NDA provides infrastructure for sharing research data, tools, methods, and analyses enabling collaborative science and discovery. De-identified human subjects data, harmonized to a common standard, are available to qualified researchers. Summary data are available to all. | Mixed | NDA access portal |