Designating Scientific Data for Controlled Access

The DMS Policy expects researchers to consider whether access to scientific data from participants should be controlled, even if the data are de-identified and lack explicit limitations on subsequent use. Controlled access refers to measures such as requiring data requesters to verify their identity and the appropriateness of their proposed research use before they can access protected data. The points below are intended to assist researchers when considering whether controlled-access repositories may be needed to protect participant privacy. Note that controls may be needed for data at any level of processing (e.g., raw or fully cleaned data), from any source (e.g., research, clinical, or public health data), and for all types of research data (e.g., quantitative, qualitative, imaging, sensor-based). The framework provided by the operational principles and best practices should still be considered when deciding whether to designate scientific data for controlled access. Researchers should consider sharing participants’ scientific data through controlled-access repositories if data:

  1. Have explicit limitations on subsequent use, such as those imposed by laws, regulations, policies, informed consent, and agreements.

  2. Could be considered sensitive, for example because they include information about potentially stigmatizing traits, illegal behaviors, or other information that could be perceived as causing group harm or used for discriminatory purposes. Sensitive data may also include data from individuals, groups, or populations with unique attributes that increase the risk of re-identification. Even if data are sensitive, it may be possible to de-identify them in ways that would allow appropriate sharing. When possible, researchers are encouraged to engage with communities affected by sharing sensitive data to discuss approaches for appropriate use and risk mitigation.

  3. Cannot be de-identified to established standards, or carry a risk of re-identification that cannot be sufficiently reduced. For example, datasets de-identified to regulatory standards may still contain information that allows inferences to be made about participants (discussed above in the Best Practice on De-identification); such datasets may not be suitable for open sharing. Access controls, among other measures, may be appropriate to further mitigate the risk of re-identification.

  4. Could pose risks to participant privacy if released without controls on access, owing to previously unanticipated approaches or technologies that become known. When such risks are identified before the scientific data are shared and were not outlined in the original Data Management and Sharing Plan, any changes to the Plan should be communicated to NIH consistent with the DMS Policy.

Other risk-mitigation measures that repositories can employ are discussed in Selecting a Data Repository. Awardees can also employ strategies found in NIST’s Privacy Framework.
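Items 2 and 3 turn on whether participants with unusual combinations of attributes could be singled out even after direct identifiers are removed. One widely used, though incomplete, way to gauge this is k-anonymity: k is the size of the smallest group of records that share the same values for a chosen set of quasi-identifiers. The sketch below is illustrative only. The field names, the made-up records, and the choice of quasi-identifiers are hypothetical assumptions, and k-anonymity is not a measure named by the DMS Policy.

```python
from collections import Counter

# Hypothetical quasi-identifiers: attributes that are not direct identifiers
# but could be combined with outside information to single out a participant.
QUASI_IDENTIFIERS = ["age_band", "zip3"]


def group_counts(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group (0 if there are no records)."""
    counts = group_counts(records, quasi_identifiers)
    return min(counts.values()) if counts else 0


def unique_combinations(records, quasi_identifiers):
    """Return quasi-identifier combinations that match exactly one record."""
    counts = group_counts(records, quasi_identifiers)
    return [combo for combo, n in counts.items() if n == 1]


# Made-up records with direct identifiers already removed (illustrative only).
records = [
    {"age_band": "30-39", "zip3": "208", "diagnosis": "A"},
    {"age_band": "30-39", "zip3": "208", "diagnosis": "B"},
    {"age_band": "40-49", "zip3": "208", "diagnosis": "A"},
    {"age_band": "30-39", "zip3": "209", "diagnosis": "C"},
    {"age_band": "30-39", "zip3": "209", "diagnosis": "A"},
]

print(smallest_group_size(records, QUASI_IDENTIFIERS))  # → 1
print(unique_combinations(records, QUASI_IDENTIFIERS))  # → [('40-49', '208')]
```

A k of 1 means at least one participant is unique on those attributes, which is the kind of residual risk that may point toward further generalization (for example, wider age bands) or toward controlled access. A large k does not by itself rule out inferences about participants, so a check like this supplements institutional judgment rather than replacing it.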


In certain cases, it may be appropriate to share scientific data without access controls. Factors to consider when choosing whether to share data openly include the following:

  1. Participants explicitly consent to share scientific data openly without restrictions.
  2. Scientific data are de-identified and institutional review has determined that they pose very low risk when shared and used, including any risks posed by the presence of information that can allow inferences to be made about a participant’s identity when combined with other information.
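The two lists in this section can be read together as a screening checklist. The sketch below records yes/no answers to each factor and returns a suggestion for discussion. It is a hypothetical aid, not an NIH tool: the class, field, and function names are invented for illustration, and the rule that any controlled-access factor takes precedence over the open-sharing factors is an assumption of this sketch, not a statement of the DMS Policy.

```python
from dataclasses import dataclass


@dataclass
class SharingAssessment:
    """Hypothetical yes/no answers to the factors listed in this section."""
    # Factors that suggest controlled access (items 1-4 of the first list)
    explicit_use_limitations: bool = False
    sensitive: bool = False
    insufficiently_deidentified: bool = False
    emerging_privacy_risk: bool = False
    # Factors that may support sharing without access controls
    consented_to_open_sharing: bool = False
    deidentified_and_review_found_very_low_risk: bool = False


def controlled_access_reasons(a: SharingAssessment) -> list:
    """List every controlled-access factor that applies."""
    checks = [
        (a.explicit_use_limitations, "explicit limitations on subsequent use"),
        (a.sensitive, "sensitive data"),
        (a.insufficiently_deidentified, "re-identification risk not sufficiently reduced"),
        (a.emerging_privacy_risk, "privacy risk from newly known approaches or technologies"),
    ]
    return [reason for flagged, reason in checks if flagged]


def suggest_access(a: SharingAssessment) -> tuple:
    """Return a (suggestion, reasons) pair for review, not a final determination."""
    reasons = controlled_access_reasons(a)
    if reasons:
        # Assumption: any controlled-access factor outweighs open-sharing factors.
        return "consider controlled access", reasons
    if a.consented_to_open_sharing or a.deidentified_and_review_found_very_low_risk:
        return "open sharing may be appropriate", []
    # Nothing recorded either way: the decision needs more information.
    return "needs further review", []


print(suggest_access(SharingAssessment(sensitive=True, consented_to_open_sharing=True)))
# → ('consider controlled access', ['sensitive data'])
```

Returning the list of triggering factors, rather than a bare yes/no, keeps the reasoning visible so it can be documented in the Data Management and Sharing Plan or revisited if new risks emerge.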