Data Submission and Release Expectations

Under the NIH’s genomic data sharing policy, investigators are expected to share genomic data as well as relevant associated data generated using NIH funding.

Large-scale genomic experiments can generate multiple types of data, which can then undergo multiple levels of processing and analysis. The NIH genomic data sharing (GDS) policy designates five levels of processing (Level 0 through Level 4) and associated expectations for data submission and release. Relevant associated data (for example, phenotype and exposure data) should also be shared in a timely manner.

In general, NIH will release data submitted to NIH-designated data repositories no later than six months after the initial data submission begins, or at the time of acceptance of the first publication, whichever occurs first, without restrictions on publication or other dissemination.
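The "whichever occurs first" rule above can be illustrated with a short date calculation. This is a minimal sketch, not an NIH tool: the function names `add_months` and `latest_release_date` are hypothetical, and it assumes "six months" means six calendar months from the start of submission, which the policy text does not define precisely.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months.

    The day is clamped to the last day of the target month
    (for example, 31 August + 6 months becomes 28 or 29 February).
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def latest_release_date(
    submission_start: date,
    publication_accepted: Optional[date] = None,
) -> date:
    """Return the latest expected release date under the general rule.

    Assumption: release is due at the earlier of (a) six calendar months
    after submission begins and (b) acceptance of the first publication.
    """
    six_month_mark = add_months(submission_start, 6)
    if publication_accepted is None:
        return six_month_mark
    return min(six_month_mark, publication_accepted)
```

For example, a submission that begins on 15 January 2024 with no accepted publication would have a latest release date of 15 July 2024 under these assumptions; an acceptance on 1 March 2024 would move that date forward to 1 March 2024.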

Genomic Data Submission and Release Expectations

NIH’s GDS policy sets forth specific expectations for the submission of data according to the type of data and level of processing. Some studies will generate multiple levels of data; each data level has a different expectation for whether and when it is to be submitted to a repository and when the data should be released for access. 

The tables below describe specific expectations on each of those levels of data.

 

Level 0 - Data Submission & Release Expectations
Data Description: Level 0 - Raw data generated directly from the instrument platform
Data Examples: Images captured by the instrument
Example File Formats: TIFF
Data Submission Expectations

Human data: No submission expected.

Non-human data: No submission expected.

Data Release Expectations

Human data: NA.

Non-human data: NA.

 
Level 1 - Data Submission & Release Expectations
Data Description: Level 1 - Initial sequence reads, the most fundamental form of the data after the basic translation of raw input
Data Examples:
  • DNA sequencing reads
  • ChIP-Seq reads
  • RNA-Seq reads
  • SNP arrays
  • Array CGH
Example File Formats: TXT, CEL, FASTQ
Data Submission Expectations

Human data: No submission expected. If investigators choose to submit Level 1 human data to an NIH-designated data repository, it is the submitting institution’s responsibility to protect participant privacy by ensuring that data submission is consistent, as appropriate, with all applicable national, tribal, and state laws and regulations, relevant institutional policies, and the GDS Policy.

Non-human data: No submission expected, except for de novo sequence data (unless it is included with Level 2 aligned sequence files). Submission of de novo sequence data is expected no later than the time of initial publication.

Data Release Expectations

Human data: NA.

Non-human data: Data release is expected no later than the time of initial publication; an earlier release date may be designated for certain data types or NIH projects. Investigators should consult their Program Officer (PO) with further questions.

 
Level 2 - Data Submission & Release Expectations
Data Description: Level 2 - Data after an initial round of analysis or computation to clean the data and assess basic quality measures
Data Examples:
  • DNA sequence alignments to a reference sequence or de novo assembly
  • RNA expression profiling
Example File Formats: Sequence Alignment Map (SAM), Binary Alignment Map (BAM)
Data Submission Expectations

Human data: Data submission is expected after data cleaning and quality control, which is generally within 3 months after data have been generated. The timeline depends on the individual project and the specifications set forth in the genomic data sharing plan. Investigators should consult their PO with further questions.

Non-human data: Data submission is expected no later than the time of initial publication; an earlier submission date may be designated for certain data types or NIH projects.

Data Release Expectations

Human data: Data release is expected no later than 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first.

Non-human data: Data release is expected no later than the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.

 
Level 3 - Data Submission & Release Expectations
Data Description: Level 3 - Analysis to identify genetic variants, gene expression patterns, or other features of the dataset
Data Examples:
  • SNP or structural variant calls
  • Expression peaks
  • Epigenomic features
Example File Formats: TXT, BED, WIG, VCF, MAF, PED
Data Submission Expectations

Human data: Data submission is expected after cleaning and quality control, which is generally within 3 months after data have been generated. The timeline depends on the individual project and the specifications set forth in the genomic data sharing plan. Investigators should consult their PO with further questions.

Non-human data: Data submission is expected no later than the time of initial publication; an earlier submission date may be designated for certain data types or NIH projects. The timeline depends on the individual project and the specifications set forth in the genomic data sharing plan. Investigators should consult their PO with further questions.

Data Release Expectations

Human data: Data release is expected no later than 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first.

Non-human data: Data release is expected no later than the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.

 
Level 4 - Data Submission & Release Expectations
Data Description: Level 4 - Final analysis that relates the genomic data to phenotype or other biological states
Data Examples:
  • Genotype-phenotype relationships
  • Relationships of RNA expression or epigenomic patterns to biological state
Example File Formats: TXT
Data Submission Expectations

Human data: Data submission is expected as analyses are completed, and prior to publication.

Non-human data: Data submission is expected no later than the time of initial publication.

Data Release Expectations

Human data: Data release is expected with publication.

Non-human data: Data release is expected no later than the time of initial publication.
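For quick reference, the expectations in the tables above can be collected into a single lookup structure. The following is a minimal sketch only: the names `Expectation`, `GDS_EXPECTATIONS`, and `lookup` are hypothetical, the summaries are condensed paraphrases of the tables, and the policy text and the project's genomic data sharing plan remain the authoritative source.

```python
from typing import Dict, NamedTuple, Tuple


class Expectation(NamedTuple):
    submission: str  # when data should be submitted to a repository
    release: str     # when submitted data should be released


# Shared wording, paraphrased from the tables above.
_NONE = Expectation("No submission expected", "NA")
_BY_PUBLICATION = Expectation(
    "No later than initial publication",
    "No later than initial publication",
)
_HUMAN_L2_L3 = Expectation(
    "After cleaning and quality control, generally within 3 months "
    "of data generation",
    "Within 6 months of the start of submission, or at acceptance of "
    "initial publication, whichever occurs first",
)

# Keys are (processing level, data type).
GDS_EXPECTATIONS: Dict[Tuple[int, str], Expectation] = {
    (0, "human"): _NONE,
    (0, "non-human"): _NONE,
    (1, "human"): _NONE,
    (1, "non-human"): Expectation(
        "No submission expected, except de novo sequence data, "
        "due no later than initial publication",
        "No later than initial publication",
    ),
    (2, "human"): _HUMAN_L2_L3,
    (2, "non-human"): _BY_PUBLICATION,
    (3, "human"): _HUMAN_L2_L3,
    (3, "non-human"): _BY_PUBLICATION,
    (4, "human"): Expectation(
        "As analyses are completed, and prior to publication",
        "With publication",
    ),
    (4, "non-human"): _BY_PUBLICATION,
}


def lookup(level: int, data_type: str) -> Expectation:
    """Return the summarized expectation for a level and data type."""
    key = (level, data_type.strip().lower())
    if key not in GDS_EXPECTATIONS:
        raise ValueError(f"No entry for level {level!r}, type {data_type!r}")
    return GDS_EXPECTATIONS[key]
```

Calling `lookup(4, "human").release` returns the Level 4 human release summary, and an unknown level or data type raises `ValueError` rather than guessing.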
