Projects on BioData Catalyst powered by Seven Bridges

Overview

Projects are the core building blocks of BioData Catalyst powered by Seven Bridges. Each project corresponds to a distinct scientific investigation and serves as a container for its data, analysis workflows, and results. Multiple workflow executions can be carried out within a project.

Access to a project is restricted to the collaborators in the investigation. Each project has at least one administrator, who controls the project members' permissions to execute analyses.

You can be a member of multiple projects, each with a different team of researchers.

Controlled- and Open-Data Project types

BioData Catalyst powered by Seven Bridges hosts both open- and controlled-data, which require different levels of access permissions. There are two types of projects: Open Data and Controlled Data projects.

To protect controlled data in the investigation (for example, if you are using any data that requires dbGaP authorization), you should choose the Controlled Data option at project creation.

Open Data Projects

Open Data Projects are designed to host both Open Data and your private data.

Open Data is available to all users on the Platform upon sign-up. Open Data contains data that is not unique to an individual, such as de-identified clinical data, gene expression data, copy number alterations in regions of the genome, epigenetic data, and summaries of data compiled across individuals.

Note that you cannot copy Controlled Data to an Open Data Project.

Controlled Data Projects

Controlled Data Projects can host both open- and controlled-data as well as your private data.
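
The hosting rules for the two project types can be summarized as a small lookup. The following Python sketch is illustrative only: the names `DataClass`, `ProjectType`, `HOSTABLE`, and `can_host` are hypothetical and are not part of the Platform or its API.

```python
from enum import Enum


class DataClass(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"
    PRIVATE = "private"


class ProjectType(Enum):
    OPEN_DATA = "open_data"
    CONTROLLED_DATA = "controlled_data"


# Hypothetical lookup table mirroring the rules described in the text:
# Open Data Projects host open and private data; Controlled Data Projects
# host open, controlled, and private data.
HOSTABLE = {
    ProjectType.OPEN_DATA: {DataClass.OPEN, DataClass.PRIVATE},
    ProjectType.CONTROLLED_DATA: {
        DataClass.OPEN,
        DataClass.CONTROLLED,
        DataClass.PRIVATE,
    },
}


def can_host(project_type, data_class):
    """Return True if a project of this type can hold data of this class."""
    return data_class in HOSTABLE[project_type]
```

Under this sketch, `can_host(ProjectType.OPEN_DATA, DataClass.CONTROLLED)` evaluates to `False`, which corresponds to the rule that Controlled Data cannot be copied to an Open Data Project.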

Access to controlled-data must be obtained through dbGaP. After obtaining permission, controlled data users need to register for BioData Catalyst powered by Seven Bridges with their eRA Commons credentials and agree to the data use and publication guidelines for the datasets. Learn more about signing up for the Platform or about dbGaP controlled data access.

Controlled-data contains data which may allow individuals to be identified, such as primary sequence data (CRAM files) and VCFs.

The Platform restricts access to controlled-data following dbGaP's model. This security ensures that data is as widely available as possible while protecting the privacy of study participants. Only users with Database of Genotypes and Phenotypes (dbGaP) permissions can access Controlled Data.

Controlled-data Projects are labeled CONTROLLED with a red tag and a lock symbol so you can recognize them easily.

> 🚧 Losing dbGaP controlled-access
>
> If a collaborator loses dbGaP controlled data access at any point, all controlled-data Project resources will become read-only. Collaborators can see project resources and file metadata but cannot access or copy files or execute analyses.
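
The read-only state described above can be pictured as a simple permission check. This is a minimal sketch under assumed names: the action strings and the `is_permitted` function are hypothetical and do not reflect how the Platform implements access control.

```python
# Hypothetical action names, chosen to match the wording of the text.
VIEW_ACTIONS = {"view_resources", "view_file_metadata"}
WRITE_ACTIONS = {"access_files", "copy_files", "execute_analyses"}


def is_permitted(action, read_only):
    """Return True if the action is allowed in the given project state.

    In a read-only controlled-data project, only viewing project resources
    and file metadata remains possible.
    """
    if action in VIEW_ACTIONS:
        return True
    if action in WRITE_ACTIONS:
        return not read_only
    raise ValueError(f"unknown action: {action}")
```

In this sketch, viewing remains possible in both states, while accessing files, copying files, and executing analyses are allowed only when the project is not read-only.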

Project locations

BioData Catalyst powered by Seven Bridges currently works with these cloud providers: Amazon Web Services (AWS) and Google Cloud Platform (GCP). Learn more about project locations.