NHLBI studies metadata

Overview

Metadata is data that describes other data. This page details the NHLBI studies metadata available for viewing and filtering NHLBI studies data in the Data Browser. On BioData Catalyst powered by Seven Bridges, NHLBI studies metadata consists of properties that describe entities.

Entities are particular resources such as files, samples, subjects, and studies.
Properties either describe an entity or relate that entity to another entity. For instance, a subject's sex, a file's data format, and a study's biospecimen repository are all properties.
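To make the difference between descriptive and relational properties concrete, here is a minimal Python sketch of the entity model. It is not the platform's implementation or API. The `Entity` class, the `follow` and `filter_entities` helpers, and all IDs and values are hypothetical. Only the entity types and property names come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """A metadata entity such as a Study, Subject, Sample, or File (hypothetical model)."""
    entity_type: str
    entity_id: str
    # Properties that describe the entity itself (e.g. a file's data format).
    properties: Dict[str, str] = field(default_factory=dict)
    # Properties that relate the entity to another entity (e.g. a sample's subject).
    relations: Dict[str, "Entity"] = field(default_factory=dict)


def follow(entity: Entity, *path: str) -> Optional[Entity]:
    """Walk relational properties, e.g. File -> Sample -> Subject -> Study."""
    current: Optional[Entity] = entity
    for relation in path:
        if current is None:
            return None
        current = current.relations.get(relation)
    return current


def filter_entities(entities: List[Entity], criteria: Dict[str, str]) -> List[Entity]:
    """Keep entities whose descriptive properties match every criterion."""
    return [
        e for e in entities
        if all(e.properties.get(name) == value for name, value in criteria.items())
    ]


# Hypothetical records; the IDs and values are illustrative only.
study = Entity("Study", "phs000946.v4.p1")
subject = Entity("Subject", "subject-1", {"Sex": "female"}, {"Study": study})
sample = Entity("Sample", "sample-1", {"Analyte type": "DNA"}, {"Subject": subject})
cram = Entity("File", "file-1", {"Data format": "CRAM"}, {"Sample": sample})
vcf = Entity("File", "file-2", {"Data format": "VCF"}, {"Sample": sample})

print([f.entity_id for f in filter_entities([cram, vcf], {"Data format": "CRAM"})])  # ['file-1']
print(follow(cram, "Sample", "Subject", "Study").entity_id)  # phs000946.v4.p1
```

`filter_entities` only loosely mirrors the idea of filtering by property value. It says nothing about how the Data Browser itself stores or queries metadata.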

Entities for NHLBI studies

The following entities are defined for NHLBI studies. Together they represent clinical data, biospecimen data, and data about NHLBI studies files. Learn more about NHLBI studies data.

Study
Subject
Sample
File

Read more about these entities and related properties below.

Study

The Study entity represents an NHLBI study.

Property	Description
Study name	Short study name assigned by TOPMed; each study name corresponds to exactly one phs number.
Study design	The design of the study: how the trial methodology and statistical analysis are organized so that the null hypothesis can be tested and the conclusions reached are valid.
Biospecimen repository	The biorepository (biological materials repository) that collects, processes, stores, and distributes the study's biospecimens to support future scientific investigation.
Study disease	Disease that is being investigated.
dbGap Accession	dbGaP study accession number.

Subject

The Subject entity represents the person from whom a sample was taken and analyzed.

Property	Description
Consent	Consent group as determined by Data Access Committee (DAC).
DataStage Subject ID	DataSTAGE subject identifier used across datasets and systems, in the form <study identifier with version>_<submitted subject ID> (e.g. phs001218.v1_GS86970684).
DbGap Subject ID	Accession assigned by dbGaP to the submitted subject ID.
Study Accession	Each study or sub-study is assigned an ID with a “phs” prefix, a version suffix, and a participant set suffix (e.g. phs000946.v4.p1).
Study Accession With Consent	Study accession followed by a consent group suffix (e.g. phs000946.v3.p1.c1).
Study With Consent	Study ID followed by a consent group suffix, identifying a specific consent group (e.g. phs000946.c1).
Sex	Self-reported sex or gender identity.
Organism	A living thing, such as an animal, a plant, a bacterium, or a fungus. (NCI Thesaurus Code: C14250)
Subject is affected	Case or control status of the subject.
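The identifier formats above can be checked and taken apart with a regular expression and a string split. The following Python sketch assumes the formats are exactly those shown in the examples on this page: `phs` followed by digits, then `.v<n>.p<n>`, and optionally `.c<n>`. dbGaP may issue identifiers that these hypothetical helpers (`parse_study_accession`, `study_with_consent`, `split_datastage_subject_id`) do not handle.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed pattern, inferred from the examples on this page:
# phs000946.v4.p1 (Study Accession) and phs000946.v3.p1.c1 (with consent).
ACCESSION_RE = re.compile(
    r"(?P<study>phs\d+)\.v(?P<version>\d+)\.p(?P<participant_set>\d+)"
    r"(?:\.c(?P<consent>\d+))?"
)


class StudyAccession(NamedTuple):
    study: str              # e.g. "phs000946"
    version: int            # number after ".v"
    participant_set: int    # number after ".p"
    consent: Optional[int]  # number after ".c", or None if absent


def parse_study_accession(text: str) -> StudyAccession:
    """Parse a Study Accession, with or without a consent suffix."""
    match = ACCESSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a study accession: {text!r}")
    consent = match.group("consent")
    return StudyAccession(
        study=match.group("study"),
        version=int(match.group("version")),
        participant_set=int(match.group("participant_set")),
        consent=int(consent) if consent is not None else None,
    )


def study_with_consent(accession: StudyAccession) -> str:
    """Build the Study With Consent form (e.g. phs000946.c1) by dropping version and participant set."""
    if accession.consent is None:
        raise ValueError("accession has no consent suffix")
    return f"{accession.study}.c{accession.consent}"


def split_datastage_subject_id(subject_id: str) -> Tuple[str, str]:
    """Split '<study identifier with version>_<submitted subject ID>'.

    Splits at the first underscore, assuming the study identifier never
    contains one while the submitted subject ID might.
    """
    study_part, sep, submitted_id = subject_id.partition("_")
    if not sep or not study_part or not submitted_id:
        raise ValueError(f"not a DataStage subject ID: {subject_id!r}")
    return study_part, submitted_id


acc = parse_study_accession("phs000946.v3.p1.c1")
print(acc)
# StudyAccession(study='phs000946', version=3, participant_set=1, consent=1)
print(study_with_consent(acc))                                # phs000946.c1
print(split_datastage_subject_id("phs001218.v1_GS86970684"))  # ('phs001218.v1', 'GS86970684')
```

Using `fullmatch` rather than `match` rejects strings with trailing characters, so a Study With Consent value such as `phs000946.c1` is not mistaken for a full accession.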

Sample

The Sample entity represents an analyte or biological specimen sampled from a subject (e.g. DNA from blood).

Property	Description
BioSample ID	A biosample ID assigned by dbGaP. These are unique across all studies in dbGaP.
Body Site	Body site where sample was collected.
Analyte type	Type of analyte (e.g. DNA, RNA).
Histological type	Cell or tissue type or subtype (e.g. melanocytes, buccal cells, embryonic stem cells).
Is tumor	Tumor status of the sample.
DbGap Sample ID	Accession assigned by dbGaP to the submitted sample ID.
Sra Sample ID	Sample ID assigned by the Sequence Read Archive (SRA). SRA samples receive independent IDs at different stages of data processing, handling, and archiving; most SRA samples distributed through dbGaP have a submitted_sample_id, sra_accession, sra_sample_id, and dbgap_sample_id.

File

The File entity represents the files in the TOPMed project produced by aliquot analysis. Its properties are listed below.

Property	Description
Access level	A Boolean value indicating whether the file is Controlled Data or Open Data. Controlled Data comes from public studies, has limitations on use, and requires approval by dbGaP. Open Data comes from public studies and has no limitations on use.
Assembly name	The reference assembly (such as GRCh37, also known as hg19) to which the sample's nucleotide sequence is aligned.
Platform	The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying.
Molecular data type	Molecular data type (e.g. SNP/CNV Genotypes).
Freeze	TOPMed whole-genome sequencing (WGS) genotype call set (data freeze).
Sequencing center	Name of the center that conducted the sequencing.
Assay type	DNA sequencing technique applied (e.g. WGS, WES).
Instrument	Type of instrument used for sequencing.
Data format	Data format (e.g. CRAM, VCF).
Library name	Name of the library.
Library layout	Library layout: SINGLE (single-end reads) or PAIRED (paired-end reads).
Library selection	The method used to select and/or enrich the material being sequenced.
Library source	The type of source material that is being sequenced.
Alignment provider	Who performed the alignment.
Release date	Date when the sequencing data was released to SRA.
Consent	Consent group as determined by Data Access Committee (DAC).
Coverage	The average number of times each base of the genome is read during sequencing (sequencing depth, e.g. 30x). The more times the genome is sequenced (i.e. the higher the coverage), the more accurate the data generally are.
Data type	Data type (e.g. Aligned Reads, Simple Germline Variation, Variant Call...)
GUID	Unique file identifier.
Study Accession	Each study or sub-study is assigned an ID with a “phs” prefix, a version suffix, and a participant set suffix (e.g. phs000946.v4.p1).
Study Accession With Consent	Study accession followed by a consent group suffix (e.g. phs000946.v3.p1.c1).
Study With Consent	Study ID followed by a consent group suffix, identifying a specific consent group (e.g. phs000946.c1).
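This page does not say how the Coverage value is calculated. A common estimate of mean coverage, from the Lander–Waterman model, is C = N × L / G, where N is the number of reads, L the read length, and G the genome length. The Python sketch below assumes that definition. The function names and read counts are hypothetical, and the Coverage value reported on the platform may be computed differently (for example, from aligned reads after duplicate removal).

```python
def mean_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Mean coverage C = N * L / G (Lander-Waterman estimate).

    For paired-end data, count each mate as a separate read.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


def reads_needed(target_coverage: int, read_length: int, genome_length: int) -> int:
    """Smallest read count N with N * L / G >= target (integer ceiling division)."""
    return -(-target_coverage * genome_length // read_length)


# Hypothetical run: 600 million 150 bp reads over a ~3 Gb human genome.
print(mean_coverage(600_000_000, 150, 3_000_000_000))  # 30.0
print(reads_needed(30, 150, 3_000_000_000))            # 600000000
```

Because coverage is an average, some bases are still covered by fewer than C reads. This is one reason higher coverage tends to produce more reliable variant calls.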
