Genomic research data that results from NIH-supported studies is expected to be shared in an appropriate database. Find information about frequently used repositories where human and non-human genomic data may be submitted.
Submitting Human Genomic Data
All studies generating human genomic data that fall within the scope of the NIH GDS policy must first register in the Database of Genotypes and Phenotypes (dbGaP)—NIH's central repository for human genomic and associated phenotypic data—even if the data will be submitted elsewhere.
The table below describes several frequently used genomic repositories and provides links to submission portals.
Investigators who have questions about where their data may be deposited should consult with their program officer (Extramural, grants and contracts) or scientific director (NIH Intramural).
Data Privacy & Security
If an investigator chooses to submit data to a repository outside of NIH, their institution should ensure that appropriate data security measures are in place and that confidentiality, privacy, and data use measures are consistent with the GDS Policy.
In addition, NIH encourages non-NIH-funded investigators and institutions submitting large-scale human genomic datasets to dbGaP to seek a Certificate of Confidentiality as an additional safeguard to prevent compelled disclosure of any personally identifiable information they may hold. NIH-funded studies are automatically covered under such a certificate.
Examples of Frequently Used Repositories for Human Genomic Data
When deciding which repository is most appropriate for your data, be sure to consult instructions in the funding opportunity announcement and any IC-specific expectations. The table below describes examples of existing, commonly used repositories and provides links to submission portals.
Repository | Repository Description | Submission Guides & Portals |
---|---|---|
AnVIL | The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space, or AnVIL, provides a cloud environment for the analysis of large genomic and related datasets. | AnVIL Data Portal |
ArrayExpress | The ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments and makes these data available to the research community for reuse. | ArrayExpress submission |
BioData Catalyst | NHLBI BioData Catalyst is a cloud-based platform providing tools, applications, and workflows in secure workspaces. | Accessing BioData Catalyst Data |
Bionimbus | The Bionimbus Protected Data Cloud is a secure biomedical cloud for analyzing and sharing protected datasets. It is operated at the FISMA Moderate level as infrastructure as a service (IaaS) and has NIH Trusted Partner status. | Apply for access to Bionimbus |
Database of Genotypes and Phenotypes (dbGaP) | An archive and distribution center for the descriptions and results of studies that investigate the interaction of genotype and phenotype. These studies include genome-wide association studies (GWAS), medical resequencing, and molecular diagnostic assays, as well as studies of association between genotype and non-clinical traits. | dbGaP submission portal |
Database of Short Genetic Variations (dbSNP) | dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations. | dbSNP submission portal |
Database of Genomic Structural Variation (dbVar) | dbVar is NCBI's database of human genomic structural variation: large variants (>50 bp) including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants. | dbVar submission portal |
DNA Data Bank of Japan (DDBJ) | DDBJ provides freely available nucleotide sequence data and a supercomputer system to support research activities in the life sciences. | DDBJ submission |
European Nucleotide Archive (ENA) | ENA is an open, supported platform for the management, sharing, integration, archiving and dissemination of sequence data. | ENA submission portal |
GenBank | GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. | GenBank submission portal |
Gene Expression Omnibus (GEO) | GEO is a public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. | GEO submission |
National Cancer Institute (NCI) Genomic Data Commons | The NCI's Genomic Data Commons provides the cancer research community with a unified repository and cancer knowledge base that enables data sharing across cancer genomic studies in support of precision medicine. | NCI Genomic Data Commons submission portal |
NCI Cloud Resources: Broad Institute FireCloud | FireCloud is an open, standards-based platform for performing production-scale data analysis in the cloud. Built on the Google Cloud Platform, FireCloud empowers analysts, tool developers, and production managers to run large-scale analysis and to share results with collaborators. | Terra Support |
NCI Cloud Resources: Institute for Systems Biology ISB Cloud | The ISB Cancer Genomics Cloud, leveraging many aspects of the Google Cloud Platform, allows scientists to interactively define and compare cohorts, examine underlying molecular data for specific genes and pathways, and share insights with collaborators. | ISB Cancer Genomics Cloud guide |
NCI Cloud Resources: Seven Bridges Cancer Genomics Cloud | The Seven Bridges Cancer Genomics Cloud, hosted on Amazon, has a rich user interface that allows researchers to find data of interest and combine it with their own private data. Data can be analyzed using more than 200 preinstalled, curated bioinformatics tools and workflows. | Seven Bridges Cancer Genomics Cloud data submission guide |
National Institute of Mental Health Data Archive (NDA) | NDA makes available human subjects data collected from hundreds of research projects across many scientific domains. NDA provides infrastructure for sharing research data, tools, methods, and analyses enabling collaborative science and discovery. | NDA submission portal |
National Institute on Aging (NIA) Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) | NIAGADS is a national genetics repository created by NIA to give qualified investigators access to genotypic data for the study of the genetics of late-onset Alzheimer's disease. | NIAGADS submission portal |
Sequence Read Archive (SRA) | SRA is NIH's primary archive of high-throughput sequencing data. | SRA submission |
Submitting Non-human Genomic Data
Non-human genomic data may be submitted to any widely used repository. The table below gives examples of some commonly used databases; data may also be submitted to other widely used repositories not on this list.
In addition, the National Library of Medicine provides a Submission Portal to help find the appropriate NIH repository for sequence data.
Examples of Frequently Used Repositories for Non-human Genomic Data
Repository | Repository Description | Submission Portal |
---|---|---|
ArrayExpress | The ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments and makes these data available to the research community for reuse. | ArrayExpress submission portal |
DNA Data Bank of Japan (DDBJ) | DDBJ provides freely available nucleotide sequence data and a supercomputer system to support research activities in the life sciences. | DDBJ submission portal |
European Nucleotide Archive (ENA) | ENA is an open, supported platform for the management, sharing, integration, archiving and dissemination of sequence data. | ENA submission portal |
FlyBase | FlyBase is a database of Drosophila genes and genomes. | FlyBase |
GenBank | GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. | GenBank submission portal |
Gene Expression Omnibus (GEO) | GEO is a public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. | GEO submission portal |
Influenza Research Database (IRD) | The mission of IRD is to provide a resource for the influenza virus research community that will facilitate an understanding of the influenza virus and how it interacts with the host organism, leading to new treatments and preventive actions. | IRD submission portal |
Mouse Genome Informatics (MGI) | MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. | MGI submission portal |
Rat Genome Database (RGD) | RGD is the premier site for genetic, genomic, phenotype, and disease-related data generated from rat research. | RGD submission portal |
Sequence Read Archive (SRA) | SRA is NIH's primary archive of high-throughput sequencing data. | SRA submission portal |
WormBase | WormBase is an international consortium of biologists and computer scientists providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes. | WormBase submission portal |
Xenbase | Xenbase is a web-accessible resource that integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research. | Xenbase submission portal |
Zebrafish Information Network (ZFIN) | ZFIN is the database of genetic and genomic data for the zebrafish (Danio rerio) as a model organism. | ZFIN submission portal |