About the Common Workflow Language
Overview
Tool specifications entered using the Tool Editor are automatically transcribed into the Common Workflow Language (CWL). CWL is a community-developed, open specification for describing reproducible data analyses and workflows. Once described in CWL, an analysis can be executed locally or in high-performance cloud or cluster environments by any CWL-conformant execution engine. Learn more about CWL from its official website.
To develop and test CWL apps locally on your desktop before deploying them on the Platform, use the Rabix toolkit. Local development gives faster results, because you do not have to acquire a cloud instance each time you want to test the workflow.
To get your first hands-on experience with CWL, read the Common Workflow Language User Guide, which takes you from writing your first simple tool in CWL to creating a workflow that contains several interconnected steps. After reading the guide, you should understand that each CWL task is isolated and has an explicit definition of its inputs and outputs. This explicitness and isolation make tools and workflows described with CWL flexible, portable across CWL implementations and CWL-compliant execution engines, and scalable from simple local execution to large-scale, complex execution environments.
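As an illustration of what an explicit definition of inputs and outputs looks like, here is a minimal sketch of a CWL v1.0 tool in the style of the User Guide's introductory example. The input name `message`, the output name `result`, and the file name `output.txt` are placeholders chosen for this sketch.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo          # the command that is executed

inputs:
  message:                 # placeholder input name
    type: string
    inputBinding:
      position: 1          # passed as the first command-line argument

stdout: output.txt         # placeholder file that captures standard output

outputs:
  result:                  # placeholder output name
    type: stdout           # the captured file is returned as the output
```

Everything the tool reads or produces is declared under `inputs` and `outputs`, which is what lets different execution engines run the same description.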
CWL implementation on BioData Catalyst powered by Seven Bridges
BioData Catalyst powered by Seven Bridges supports the following versions of CWL:
sbg:draft-2
sbg:draft-2 is the first implementation of CWL on the Platform. It is essentially the Draft 2 version of the Common Workflow Language, with the addition of several extensions specific to the Seven Bridges execution environment. The extensions add features that were not natively supported in the Draft 2 specification of CWL but are commonly needed in bioinformatics analyses.
The following optional features (extensions) were implemented in sbg:draft-2:
- Resource hints - Define the minimum number of CPU cores and megabytes of RAM required for execution of an app.
- Stage input - Make inputs available in the tool's working directory.
- File metadata - Set metadata values for files produced as outputs of an app.
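As a rough illustration of how the resource hints and file metadata extensions in the list above might appear in an sbg:draft-2 tool description, consider the fragment below. The field names (`sbg:CPURequirement`, `sbg:MemRequirement`, `sbg:metadata`) are given as commonly seen in Seven Bridges app descriptions and are not confirmed by this page; the ids and values are hypothetical. Verify the fragment against an app exported from the Platform before relying on it.

```yaml
# Fragment of an sbg:draft-2 tool description (not a complete document).
hints:
  - class: sbg:CPURequirement   # assumed extension name: minimum CPU cores
    value: 2
  - class: sbg:MemRequirement   # assumed extension name: minimum RAM in megabytes
    value: 4000

outputs:
  - id: "#report"               # hypothetical output id
    type: File
    outputBinding:
      glob: "*.txt"
      sbg:metadata:             # assumed extension name: metadata set on the output file
        sample_id: "sample_1"   # hypothetical metadata key and value
```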
Some of the currently available public apps on BioData Catalyst powered by Seven Bridges are described in accordance with the sbg:draft-2 CWL specification. Also, all apps that are created using the Legacy Tool Editor on the Platform are described using the same CWL version. Such apps can be executed in any execution environment that supports the Seven Bridges extensions, such as those using the Rabix Executor, but are not guaranteed to execute successfully otherwise.
Note that all your existing tools and workflows will continue to work on the Platform just as they did before and will continue to be supported in the future.
CWL v1.0
CWL v1.0 is the CWL version that is widely accepted by the CWL community. Since the CWL v1.0 specification natively supports the custom extensions in the sbg:draft-2 CWL version, CWL v1.0 apps are also portable and executable in any other execution environment when using CWL v1.0-conformant executors such as the Rabix Executor from Seven Bridges.
Learn about CWL v1.0 improvements over sbg:draft-2.
Extensions in CWL v1.0
The features provided by the custom sbg:draft-2 extensions listed above are handled in CWL v1.0 as follows:
- Resource hints - These are an integral part of the CWL v1.0 specification (http://www.commonwl.org/v1.0/CommandLineTool.html#ResourceRequirement) and allow you to specify basic hardware resource requirements. At the moment, the supported requirements are the number of CPU cores and the megabytes of RAM required for execution of an app.
- Stage input - Implemented as InitialWorkDirRequirement, which covers the use case that was handled by the Stage Input extension in sbg:draft-2. The following example compares the use of Stage Input in sbg:draft-2 with the use of InitialWorkDirRequirement in CWL v1.0.
sbg:draft-2:

    id: input
    type:
      type: array
      items: File
    sbg:stageInput: link

CWL v1.0:

    inputs:
      input:
        type:
          type: array
          items: File
    requirements:
      - class: InlineJavascriptRequirement
      - class: InitialWorkDirRequirement
        listing:
          - $(inputs.input)

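The resource hints item in the list above can be sketched in a similar way. The fragment below uses the `coresMin` and `ramMin` fields that the CWL v1.0 specification defines for `ResourceRequirement`; the values are arbitrary and only illustrative.

```yaml
# Fragment of a CWL v1.0 tool description (not a complete document).
requirements:
  - class: ResourceRequirement
    coresMin: 2      # minimum number of CPU cores
    ramMin: 4000     # minimum RAM (the specification counts this in mebibytes)
```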
BioData Catalyst powered by Seven Bridges provides support for the execution of CWL v1.0 apps. Apps described using this CWL version can be added to a project on the Platform through the tool editor, through the API using raw CWL or by using the Rabix Composer.
CWL v1.0 support on BioData Catalyst powered by Seven Bridges
Not all CWL v1.0 features are currently supported on the Platform. Future implementations will address this. The following features are not supported in the current implementation:
- Document preprocessing is not supported. Code from included external files will not be resolved within the supplied CWL document.
- Storage space requirements are not taken into consideration when selecting computation instances for a task. Instance selection is based on CPU and memory requirements only.
- File formats are not resolved based on ontology.
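To make the limitations above concrete, the following fragment sketches constructs that are valid CWL v1.0 but, according to the list above, would not have their usual effect on the Platform. The file name `run.sh`, the input name `reads`, and the numeric values are hypothetical, and the `edam` prefix is assumed to be declared under `$namespaces` elsewhere in the document.

```yaml
# Fragment of a CWL v1.0 tool description (not a complete document).
requirements:
  - class: ResourceRequirement
    coresMin: 2
    ramMin: 4000
    outdirMin: 10000          # storage request: not used for instance selection
  - class: InitialWorkDirRequirement
    listing:
      - entryname: run.sh     # hypothetical script name
        entry:
          $include: run.sh    # preprocessing directive: the external file is not resolved

inputs:
  reads:                      # hypothetical input name
    type: File
    format: edam:format_1930  # ontology term for FASTQ: not resolved or checked
```

A practical workaround for the first limitation is to paste the contents of external files directly into the CWL document before adding it to the Platform.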
Mixed CWL v1.0 and sbg:draft-2 apps
The Platform also supports the execution of workflows that combine tools described using CWL v1.0 with tools described using sbg:draft-2. Such workflows are either CWL v1.0 workflows that contain sbg:draft-2 tool(s) or sbg:draft-2 workflows that contain CWL v1.0 tool(s).
Key differences between sbg:draft-2 and CWL v1.0 on the Platform
See the table below for an overview of the currently available options for the two CWL versions on the Platform:
Option | sbg:draft-2 | CWL v1.0 | Mixed sbg:draft-2 and CWL v1.0 |
---|---|---|---|
Can be executed on the Platform | |||
Editable on the Platform | |||
Fully portable to other execution environments | |||
Can be added to the Platform through the API | |||
Can be added to the Platform through the visual interface | |||
Can be added to the Platform via Rabix Composer | |||
Can be edited in Rabix Composer | |||
CWL v1.1
Version 1.1 of the Common Workflow Language brings a number of changes and improvements compared to CWL v1.0. The full tool changelog and workflow changelog are available on the official CWL website. The most important improvements for working with CWL on the Seven Bridges Platform are:
- Maximum execution time of a command line tool can now be defined.
- Secondary files can now be explicitly marked as required or optional. Before starting a task, the Seven Bridges Platform validates that the required secondary files for task inputs are available in the project. Validation is currently performed only if the secondary file is not defined using a JavaScript expression, and it is not performed for workflow steps during execution.
- Memoization (WorkReuse) can now be enabled or disabled at the level of a single tool in a workflow.
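The three improvements listed above can be sketched together in a single CWL v1.1 tool description. `ToolTimeLimit`, `WorkReuse`, and the `pattern`/`required` form of `secondaryFiles` come from the CWL v1.1 specification; the command, the input name, and the values are hypothetical and chosen only for illustration.

```yaml
cwlVersion: v1.1
class: CommandLineTool
baseCommand: [samtools, idxstats]   # hypothetical command for this sketch

requirements:
  ToolTimeLimit:
    timelimit: 3600        # maximum execution time, in seconds
  WorkReuse:
    enableReuse: false     # disable memoization for this tool only

inputs:
  alignments:              # hypothetical input name
    type: File
    secondaryFiles:
      - pattern: .bai      # index file expected next to the input
        required: true     # explicitly marked as required
    inputBinding:
      position: 1

stdout: idxstats.txt       # hypothetical output file name

outputs:
  stats:
    type: stdout
```

Because the secondary file pattern here is a plain string rather than a JavaScript expression, it is the kind of definition that the validation described above applies to.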
Mixed CWL v1.0 and v1.1 apps
The Platform allows creation of workflows that contain both CWL v1.0 and v1.1 tools. It is also possible to mix CWL v1.1 tools with those wrapped using the sbg:draft-2 version, but this is not recommended as such workflows would not be portable to other execution environments.
CWL v1.2
CWL v1.2 introduces additional improvements compared to previous CWL versions, with full tool changelog and workflow changelog available on the official CWL website. The most important addition to the CWL specification in CWL v1.2 is conditional execution of workflow steps, based on conditional expressions set in the step CWL code. For more information, see our detailed explanation of conditional execution and how to set it up on BioData Catalyst powered by Seven Bridges. Additionally, read about conditional execution in the CWL v1.2 specification.
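As a minimal sketch of conditional execution, the workflow below runs a step only when a boolean workflow input is true. The `when` field is part of the CWL v1.2 specification; the tool file `qc.cwl`, the input and output names, and the step id are hypothetical.

```yaml
cwlVersion: v1.2
class: Workflow

inputs:
  run_qc: boolean          # hypothetical switch that controls the step
  reads: File

steps:
  qc:
    run: qc.cwl            # hypothetical tool description
    when: $(inputs.run_qc) # the step runs only if this evaluates to true
    in:
      run_qc: run_qc       # must be passed in so that `when` can reference it
      reads: reads
    out: [report]

outputs:
  qc_report:
    type: File?            # optional, because a skipped step produces null
    outputSource: qc/report
```

Declaring the workflow output as optional matters here: when the condition is false, the step is skipped and its outputs are null.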
Mixed apps containing CWL v1.2 and other CWL versions
CWL v1.2 tools can be used in workflows together with tools that were described using CWL v1.1 or CWL v1.0. It is also possible to mix CWL v1.2 tools with those wrapped using the sbg:draft-2 version of CWL, but this is not recommended as such workflows would not be portable to other execution environments.