NHLBI studies metadata

Overview

Metadata is data that describes other data. This page details the NHLBI studies metadata available for viewing and filtering NHLBI studies data in the Data Browser. NHLBI studies metadata on the BioData Catalyst powered by Seven Bridges consists of properties that describe entities.

  • Entities are particular resources such as files, samples, subjects, and studies.

  • Properties can either describe an entity or relate that entity to another entity. For instance, properties include a subject's sex, a file's data format, or a study's biospecimen repository.

Entities for NHLBI studies

The following are entities for NHLBI studies. They represent clinical data, biospecimen data and data about NHLBI studies files. Learn more about NHLBI studies data.

  • Study
  • Subject
  • Sample
  • File

Read more about these entities and related properties below.

Study

The Study entity represents the NHLBI study.

Property | Description
Study name | TOPMed-assigned short study name (1:1 with phs number).
Study design | Study design is a process wherein the trial methodology and statistical analysis are organized to ensure that the null hypothesis is either accepted or rejected and the conclusions arrived at reflect the truth.
Biospecimen repository | A biorepository is a biological materials repository that collects, processes, stores, and distributes biospecimens to support future scientific investigation.
Study disease | Disease that is being investigated.
dbGap Accession | dbGaP study accession number.

Subject

The Subject entity represents the person from whom the sample was taken and analyzed.

Property | Description
Consent | Consent group as determined by Data Access Committee (DAC).
DataStage Subject ID | DataSTAGE subject identifier used across datasets and systems, with the following form: StudyIdentifier(with version)_submitted subject ID (e.g. phs001218.v1_GS86970684).
DbGap Subject ID | The dbGaP Subject ID is a dbGaP-assigned accession for the submitted Subject ID.
Study Accession | Each study or sub-study is assigned an ID with a "phs" prefix, a version suffix and a participant suffix (e.g. phs000946.v4.p1).
Study Accession With Consent | (e.g. phs000946.v3.p1.c1)
Study With Consent | Defines specific consent group (e.g. phs000946.c1).
Sex | Self-reported sex or gender identity.
Organism | A living thing, such as an animal, a plant, a bacterium, or a fungus. (NCI Thesaurus Code: C14250)
Subject is affected | Case or control status of the subject.
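
The identifier forms described in the table above can be split into their parts with a short script. The following is a minimal sketch in Python, not part of the platform: the names StudyAccession, parse_study_accession and split_datastage_subject_id are hypothetical, and the pattern is inferred only from the examples on this page (a "phs" prefix followed by digits, then optional ".v", ".p" and ".c" suffixes), so real identifiers may need a looser or stricter rule.

```python
import re
from typing import NamedTuple, Optional, Tuple


class StudyAccession(NamedTuple):
    """Parts of an accession such as phs000946.v3.p1.c1 (hypothetical helper type)."""
    study: str                      # e.g. "phs000946"
    version: Optional[int]          # number after ".v", if present
    participant_set: Optional[int]  # number after ".p", if present
    consent: Optional[int]          # number after ".c", if present


# Assumption: pattern inferred from the examples on this page only.
_ACCESSION_RE = re.compile(
    r"^(?P<study>phs\d+)"
    r"(?:\.v(?P<version>\d+))?"
    r"(?:\.p(?P<participant>\d+))?"
    r"(?:\.c(?P<consent>\d+))?$"
)


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_study_accession(text: str) -> StudyAccession:
    """Split an accession string into study, version, participant set and consent."""
    match = _ACCESSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognized study accession: {text!r}")
    return StudyAccession(
        study=match["study"],
        version=_to_int(match["version"]),
        participant_set=_to_int(match["participant"]),
        consent=_to_int(match["consent"]),
    )


def split_datastage_subject_id(text: str) -> Tuple[StudyAccession, str]:
    """Split 'StudyIdentifier(with version)_submitted subject ID' at the first underscore."""
    prefix, separator, submitted_id = text.partition("_")
    if not separator or not submitted_id:
        raise ValueError(f"not a recognized DataStage Subject ID: {text!r}")
    return parse_study_accession(prefix), submitted_id


accession = parse_study_accession("phs000946.v3.p1.c1")
study_part, submitted_subject_id = split_datastage_subject_id("phs001218.v1_GS86970684")
```

Splitting at the first underscore keeps any underscores inside the submitted subject ID intact, since the accession prefix itself contains none in the examples shown here.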

Sample

The Sample entity represents an analyte or biological specimen sampled from a subject (e.g. DNA from blood).

Property | Description
BioSample ID | A biosample ID assigned by dbGaP. These are unique across all studies in dbGaP.
Body Site | Body site where sample was collected.
Analyte type | Analyte type (e.g. DNA, RNA).
Histological type | Cell or tissue type or subtype (e.g. melanocytes, buccal cells, embryonic stem cells).
Is tumor | The tumor status.
DbGap Sample ID | The dbGaP Sample ID is a dbGaP-assigned accession for the submitted Sample ID.
Sra Sample ID | SRA samples are given independent IDs at different stages of data processing, handling, and archiving for different purposes (most of the SRA samples distributed through dbGaP have submitted_sample_id, sra_accession, sra_sample_id, and dbgap_sample_id).

File

The File entity refers to the files in the TOPMed project produced by aliquot analysis. Find the properties of the File entity below.

Property | Description
Access level | A Boolean value indicating Controlled Data or Open Data. Controlled Data is data from public studies that has limitations on use and requires approval by dbGaP. Open Data is data from public studies that doesn't have limitations on its use.
Assembly name | The reference assembly (such as HG19 or GRCh37) to which the nucleotide sequence of a case can be aligned.
Platform | The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying.
Molecular data type | Molecular data type (e.g. SNP/CNV Genotypes).
Freeze | TOPMed WGS genotype call sets.
Sequencing center | Name of the center which conducted sequencing.
Assay type | DNA sequencing technique that was applied (e.g. WGS, WES).
Instrument | Type of instrument that was used for sequencing.
Data format | Data format (e.g. CRAM, VCF).
Library name | Name of the library.
Library layout | Library layout of a project (SINGLE or PAIRED end reads).
Library selection | The method used to select and/or enrich the material being sequenced.
Library source | The type of source material that is being sequenced.
Alignment provider | Information about who did the alignment.
Release date | Date when the sequencing data was released to SRA.
Consent | Consent group as determined by Data Access Committee (DAC).
Coverage | The number of times the sequencing machine sequences a genome. The more times the genome is sequenced (i.e. the higher the coverage), the more accurate the data will be.
Data type | Data type (e.g. Aligned Reads, Simple Germline Variation, Variant Call...)
GUID | Unique file identifier.
Study Accession | Each study or sub-study is assigned an ID with a "phs" prefix, a version suffix and a participant suffix (e.g. phs000946.v4.p1).
Study Accession With Consent | (e.g. phs000946.v3.p1.c1)
Study With Consent | Defines specific consent group (e.g. phs000946.c1).
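
To illustrate how file properties such as these lend themselves to filtering, here is a small Python sketch that selects records by exact property match. It is an in-memory illustration only and does not use or represent the Data Browser or any platform API: the filter_files function, the dictionary layout and the sample records (including the GUID values) are all hypothetical, with keys borrowed from the property names in the table above.

```python
from typing import Dict, List

# Assumption: each file is represented as a plain dict keyed by property name.
FileRecord = Dict[str, str]


def filter_files(files: List[FileRecord], criteria: Dict[str, str]) -> List[FileRecord]:
    """Return the records whose properties equal every value in criteria."""
    return [
        record
        for record in files
        if all(record.get(name) == value for name, value in criteria.items())
    ]


# Hypothetical records; the values are made up for illustration.
files = [
    {"GUID": "example-guid-1", "Data format": "CRAM", "Assay type": "WGS"},
    {"GUID": "example-guid-2", "Data format": "VCF", "Assay type": "WGS"},
    {"GUID": "example-guid-3", "Data format": "CRAM", "Assay type": "WES"},
]

# Keep only WGS files stored as CRAM.
wgs_crams = filter_files(files, {"Data format": "CRAM", "Assay type": "WGS"})
```

A record that lacks one of the requested properties is excluded, because record.get returns None for a missing key and None never equals a requested value.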