Release notes - April 19, 2021

RAS-BioDataCatalyst Integration Phase 1 Completed

The Researcher Auth Service (RAS), sponsored by the Office of Data Science Strategy, is a service provided by NIH’s Center for Information Technology (CIT) to facilitate access to NIH’s open and controlled data assets and repositories in a consistent and user-friendly manner.

The RAS initiative advances the data infrastructure and ecosystem goals defined in the NIH Strategic Plan for Data Science. RAS has adopted the Global Alliance for Genomics and Health (GA4GH) standards for integrating researcher-focused applications and data repositories using the OpenID Connect (OIDC) protocol.

The goal of this effort is to coordinate all cloud stacks so that RAS is used identically across systems. The NCI CRDC (Cancer Research Data Commons) stack was chosen for the pilot as part of a phased approach toward the larger goal of federated data access using GA4GH Passports, with a focus on how this fits in with NIH data in general.

Phase 1 is now complete on BioData Catalyst and introduces a change to the login flow when using eRA Commons:

  • When you choose to log in with eRA Commons on BioData Catalyst powered by Seven Bridges, you are now redirected to the NIH RAS login screen instead of iTrust.
  • Other than the login flow change, the user experience on the Platform remains the same.

Recently published apps

The following apps were published in CWL 1.x:

  • Single Cell Multi Sample Pairwise Differential Expression Workflow - a pipeline that performs differential expression analysis on single-cell data between pairs of user-defined conditions.
  • Minimap2 v2.17 - a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database, tailored for use with long-read sequencing technologies.
  • fastqValidator 0.1.1 - checks format correctness of paired-end and single-end FASTQ files.
  • FastP 0.20.1 - ultra-fast FASTQ preprocessor with useful quality control and data-filtering features, including adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of FASTQ data.
  • SBG convert SRA/BAM to FASTQ - an all-in-one tool that converts SRA/SAM/BAM/CRAM files into FASTQ format.
  • SBG Create Expression Matrix - creates aggregated matrices from various types of inputs, most typically from abundance estimates produced by tools like RSEM, Salmon, or Kallisto.
  • SHAPEIT 4.2.1 - phasing tool for sequencing and SNP array data.
  • Regenie 2.0.1 - tool for whole-genome regression analysis.
  • UMI-tools 1.1.1 - tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single-cell RNA-Seq cell barcodes.