About data access


Although all data available through the BioData Catalyst powered by Seven Bridges have been stripped of direct identifiers, DNA information is inherently unique to an individual.

Two data access tiers, Open Data and Controlled Data, have been put in place to balance making the data as widely available as possible with ensuring that the rights of study participants are well protected.

Data access tiers

Open Data

Open Data includes information that is not unique to an individual, such as:

  • De-identified clinical and demographic data
  • Gene expression data
  • Copy number alterations in regions of the genome
  • Epigenetic data
  • Summaries of data across individuals

All Platform users have access to Open Data as long as they agree to the data use and publication guidelines for all relevant datasets at sign-up.

Controlled Data

Controlled Data includes information that is unique to an individual. This covers most raw data files and some processed data, such as:

  • Primary sequencing data (CRAM files) from DNA, RNA, miRNA or bisulfite sequencing studies
  • Variant calls for an individual (VCF files)

Each researcher who needs access to Controlled Data must have their dataset approvals listed on the NHLBI BioData Catalyst whitelist.

Researchers must be approved by the NCBI Database of Genotypes and Phenotypes (dbGaP) for the TOPMed studies they wish to access on BioData Catalyst.
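The access rules above can be summarized as a small decision function. This is a purely illustrative sketch: `Tier`, `Researcher`, `can_access`, and the study labels are hypothetical names, not part of any Seven Bridges or BioData Catalyst API. Actual authorization is enforced by the Platform and dbGaP, not by user code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


@dataclass
class Researcher:
    # Hypothetical model of a Platform user; not a real API object.
    agreed_to_guidelines: bool
    # Hypothetical study labels whose dbGaP approval for this researcher
    # appears on the BioData Catalyst whitelist.
    whitelisted_studies: set[str] = field(default_factory=set)


def can_access(user: Researcher, tier: Tier, study: str) -> bool:
    """Apply the two tier rules described in this article (simplified)."""
    if tier is Tier.OPEN:
        # Open Data: agreeing to the data use and publication
        # guidelines at sign-up is sufficient.
        return user.agreed_to_guidelines
    # Controlled Data: access is granted per study, and only for studies
    # whose dbGaP approval is on the whitelist. This sketch checks only
    # the whitelist rule.
    return study in user.whitelisted_studies


# Usage with hypothetical study labels:
alice = Researcher(agreed_to_guidelines=True, whitelisted_studies={"study-A"})
print(can_access(alice, Tier.OPEN, "study-B"))        # True
print(can_access(alice, Tier.CONTROLLED, "study-A"))  # True
print(can_access(alice, Tier.CONTROLLED, "study-B"))  # False
```

Modeling Controlled Data access as a per-study membership check mirrors the requirement that approval is granted separately for each TOPMed study, rather than once for all Controlled Data.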

Accessing dbGaP controlled data

To access dbGaP controlled data, first make sure that you have an eRA Commons account as well as access permissions through dbGaP.

Next, create a new account on the Platform using your eRA Commons credentials.