Release note - August 26, 2019

Access to task secondary files via the API

You can now use our sevenbridges-python client to access secondary files for task inputs and outputs.

New and improved functionality:

  1. API users can now see exactly which files were used as secondary files for task inputs and outputs
  2. The Python client can now retrieve those files with a single call, as shown in the example below
  3. All of this is also supported for CWL 1.x tools and workflows, where secondary files can be defined as JavaScript expressions

An example using the sevenbridges-python API client:

import sevenbridges as sb

# Load credentials from the 'default' profile and create the API client
config = sb.Config(profile='default')
api = sb.Api(config=config)

# Fetch the task, then read secondary files directly from its inputs and outputs
task = api.tasks.get('439221a0-27c8-47a3-bcac-fcc5f44f82a8')
output_secondary_files = task.outputs['my_output'].secondary_files
input_secondary_files = task.inputs['my_input'].secondary_files
print(output_secondary_files)
print(input_secondary_files)

Please note that secondary files are captured from a task's inputs and outputs, not from the file system. This means that the secondary_files property is available only on a file object pulled from the task itself. It is not available when the file is reloaded, or when it is fetched directly from the file system via api.files.get(<FILE_ID>) or a similar call. The only supported way of getting secondary files is the one shown above: read them from the task's input or output file object as soon as you obtain it from the task, before the file is reloaded or fetched again.
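Because secondary files must be read from the task's own file objects, it can help to collect them all in one pass right after fetching the task. The following is a minimal sketch, not part of sevenbridges-python: collect_secondary_files is a hypothetical helper, and it assumes that task.inputs and task.outputs behave like dictionaries and that a port may hold either a single value or a list of values.

```python
def collect_secondary_files(task):
    """Gather secondary files per port for a task's inputs and outputs.

    Hypothetical helper, not part of sevenbridges-python. Assumes that
    task.inputs and task.outputs behave like dicts, and that file values
    pulled from the task expose a `secondary_files` attribute.
    """
    collected = {}
    for section in ("inputs", "outputs"):
        ports = getattr(task, section, None) or {}
        per_port = {}
        for name, value in ports.items():
            # A port may hold a single value or a list of values.
            values = value if isinstance(value, list) else [value]
            found = []
            for item in values:
                # Non-file values (strings, numbers) have no secondary files.
                found.extend(getattr(item, "secondary_files", None) or [])
            if found:
                per_port[name] = found
        collected[section] = per_port
    return collected
```

Calling collect_secondary_files(task) on the task fetched in the example above would return a dictionary with "inputs" and "outputs" keys, each mapping port names to the secondary files found on that port. Ports without secondary files are omitted.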

Learn more about the sevenbridges-python client.

Whole Genome Sequencing - Quality Control - CWL1.0 Workflow

Data quality control (QC) is an important component of NGS projects, especially for relatively costly whole genome sequencing (WGS). Timely QC can identify and account for issues with the starting biological material (DNA contamination or sample swaps), the sequencing process, or the bioinformatics pipelines used for processing.

Whole Genome Sequencing - Quality Control - CWL1.0 Workflow is intended as a general-purpose QC workflow for users processing WGS data, regardless of the number of samples. It is meant to provide plots that end users can easily inspect visually, as well as structured data output suitable for aggregation and parsing in an automated setup. Because large-scale sequencing projects may need to keep the cost and duration of single-sample tasks to a minimum, the workflow is designed to be modular: nodes can be turned on or off on request, and whole segments can be skipped (for example, based on input data availability).