About app input and output types
Overview
One of the key factors in successfully wrapping apps for use on BioData Catalyst powered by Seven Bridges is a proper understanding and configuration of app inputs and outputs and their types. A well-designed set of inputs and outputs makes the app easier and faster to run, and easier to use in automated scenarios through the API or one of the API libraries.
Available types of app inputs and outputs on the Platform correspond to CWL types, but also apply to converted Nextflow and WDL apps and are classified as follows:
- Primitive types:
  - null
  - boolean
  - int
  - long
  - float
  - double
  - string
- Special types:
  - Any
  - File
  - Directory
- Complex types:
  - record
  - array
  - enum
Primitive types
These types correspond to their counterpart data types in most well-known programming languages. The following table explains each of the types:
Type | Description |
---|---|
null | No value |
boolean | A binary value |
int | 32-bit signed integer |
long | 64-bit signed integer |
float | Single precision (32-bit) IEEE 754 floating-point number |
double | Double precision (64-bit) IEEE 754 floating-point number |
string | Unicode character sequence |
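
The difference between int and long matters when an app accepts large counts or genomic coordinates. The following is a minimal Python sketch, not part of the Platform, that derives the value ranges from the bit widths listed in the table and reports the narrowest integer type that can hold a value. The function name `smallest_integer_type` is hypothetical.

```python
# Inclusive value ranges implied by the bit widths in the table above.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1      # 32-bit signed integer
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1    # 64-bit signed integer


def smallest_integer_type(value):
    """Return 'int' or 'long', whichever is the narrowest type that holds value."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an integer value")
    if INT_MIN <= value <= INT_MAX:
        return "int"
    if LONG_MIN <= value <= LONG_MAX:
        return "long"
    raise OverflowError("value does not fit in a 64-bit signed integer")
```

For example, a value of 3,000,000,000 exceeds the 32-bit range, so an input that may receive it would need to be declared as long.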
Here is an example of how these would be configured when defining an input schema for an app:

    inputs:
      - id: use_index_file
        type: boolean
        inputBinding:
          position: 1
          prefix: -f
      - id: output_file_name
        type: string
        inputBinding:
          position: 3
          prefix: -o
      - id: threads
        type: int
        inputBinding:
          position: 2
          prefix: -t
      - id: index_file
        type: File?
        inputBinding:
          prefix: -i
          position: 4

The code above represents a section of a YAML file where different app inputs are defined. This will be a part of the complete app description written in CWL, or a part of the configuration file when optimizing Nextflow or WDL apps for use on the Platform. For details about parameters available under inputBinding, please refer to the CWL documentation.
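
To make the role of position and prefix more concrete, here is a simplified Python sketch of how a command line could be assembled from the four inputs defined above. It assumes standard CWL behavior: bindings are ordered by position, a true boolean adds only its prefix, a false boolean or an unset optional input adds nothing, and other values follow their prefix as a separate argument. The `BINDINGS` list and the `build_arguments` function are hypothetical, and File values are represented as plain path strings for brevity.

```python
# Hypothetical, simplified mirror of the four inputs defined above:
# (input id, position, prefix)
BINDINGS = [
    ("use_index_file", 1, "-f"),
    ("output_file_name", 3, "-o"),
    ("threads", 2, "-t"),
    ("index_file", 4, "-i"),
]


def build_arguments(values):
    """Assemble command-line arguments from input values, ordered by position."""
    args = []
    for input_id, _position, prefix in sorted(BINDINGS, key=lambda b: b[1]):
        value = values.get(input_id)
        if value is None or value is False:
            # Unset optional inputs and false booleans add nothing.
            continue
        if value is True:
            # A true boolean contributes its prefix only.
            args.append(prefix)
        else:
            # Other values follow the prefix as a separate argument.
            args.extend([prefix, str(value)])
    return args
```

With `use_index_file` set to true, `threads` set to 4, `output_file_name` set to `out.bam` and no `index_file`, the sketch yields `-f -t 4 -o out.bam`, even though the inputs were declared in a different order.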
Primitive types can also be defined for app outputs:

    outputs:
      - id: id_tumor
        outputSource:
          - gatk_collectreadcounts_tumor/entity_id
        type: int
      - id: entity_id_normal
        outputSource:
          - gatk_collectreadcounts_normal/entity_id
        type: string?

Special types
Available special types are Any, File, and Directory.
Any
The Any type validates for any non-null value.
File
File is one of the most common input and output types in bioinformatics analyses.
File inputs
File inputs have a number of properties that provide metadata about the file. Some of the most important ones are:
- path: The local path to the file within the execution environment. When running tasks through the Seven Bridges API, set path to the platform file ID.
- basename: The name of the file without any leading directory path.
- secondaryFiles: A list of additional files or directories that are associated with the primary file and must be transferred alongside the primary file.
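
When preparing task inputs for the API, a File value can be represented as a small object whose path holds the platform file ID, as described above. The following Python sketch only builds that structure. The `class` key follows the CWL File object convention and is an assumption about the request body, and the input ID `bam_file` and the `<file_id>` placeholder are illustrative only.

```python
import json


def file_input(file_id):
    """Build a File value whose path is a platform file ID."""
    # "class": "File" follows the CWL File object convention (assumption).
    return {"class": "File", "path": file_id}


# Hypothetical input ID and placeholder file ID, for illustration only.
task_inputs = {"bam_file": file_input("<file_id>")}
print(json.dumps(task_inputs, indent=2))
```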
Here is an example of a File input inside a YAML file containing the app description:

    inputs:
      - id: bam_file
        type: 'File?'
        secondaryFiles:
          - .bai
        inputBinding:
          prefix: --file=
          separate: false
          position: 1

For more details about file inputs and available properties, please refer to CWL documentation.
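
The secondaryFiles pattern in the example above is resolved relative to the primary file. As a rough Python sketch of the CWL rule: a plain suffix such as `.bai` is appended to the primary file path, while each leading caret (`^`), which is not used in the example above, first removes one extension. The function name `resolve_secondary_file` is hypothetical.

```python
import posixpath


def resolve_secondary_file(primary_path, pattern):
    """Apply a secondaryFiles pattern to the primary file path."""
    path = primary_path
    # Each leading caret strips one extension from the primary path.
    while pattern.startswith("^"):
        path, _extension = posixpath.splitext(path)
        pattern = pattern[1:]
    # The rest of the pattern is appended as a suffix.
    return path + pattern


def file_basename(path):
    """Return the file name without any leading directory path."""
    return posixpath.basename(path)
```

For a primary file at `/data/sample.bam`, the pattern `.bai` resolves to `/data/sample.bam.bai`, whereas `^.bai` would resolve to `/data/sample.bai`.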
File outputs
Files are also the most common app output type. File outputs can be produced in two ways:
- By getting the output file directly from an output port of a tool (node) in a workflow, by specifying outputSource in the <node_id>/<output_id> format:

      outputs:
        - id: out_tumor
          outputSource: sbg_group_outputs_tumor/out_file
          type: File

- By using a glob expression that matches the needed files in a tool:

      outputs:
        - id: out_archive
          outputBinding:
            glob: 'samples.tar.gz'
          type: File

File outputs can also be configured as arrays.
Directory
This input and output type represents a directory that is passed to the app or provided as an output of the app.
Directory inputs
When used as an input type, a Directory has several properties that provide additional data about the input:
- path: The local path to the directory prior to app execution.
- basename: The base name of the directory, without any leading directory path.
- listing: List of files or subdirectories contained in this directory.
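
To illustrate how these three properties relate to one another, here is a small Python sketch that describes a local directory as a dictionary with path, basename and a one-level listing. It is only an illustration of the structure, not how the Platform builds it. The names `describe_directory` and `demo` are hypothetical, and the `class` key follows the CWL object convention.

```python
import tempfile
from pathlib import Path


def describe_directory(path):
    """Describe a directory with its path, basename and one-level listing."""
    root = Path(path)
    listing = []
    for entry in sorted(root.iterdir()):
        kind = "Directory" if entry.is_dir() else "File"
        listing.append({"class": kind, "basename": entry.name, "path": str(entry)})
    return {
        "class": "Directory",
        "path": str(root),
        "basename": root.name,
        "listing": listing,
    }


def demo():
    """Create a throwaway 'samples' directory and describe it."""
    with tempfile.TemporaryDirectory() as tmp:
        samples = Path(tmp) / "samples"
        samples.mkdir()
        (samples / "a.txt").write_text("example")
        (samples / "sub").mkdir()
        return describe_directory(str(samples))
```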
Here is an example of a directory input inside a YAML file containing the app description:

    inputs:
      - id: samples
        type: Directory
        basename: "samples"
        inputBinding:
          prefix: --samples=
          separate: false
          position: 2

Directory outputs
To configure a directory output, use the following syntax:
- If you are getting the output directory from an output port of a tool (node) in a workflow, specify outputSource in the <node_id>/<output_id> format:

      outputs:
        - id: normalized_samples
          type: Directory
          outputSource: sbg_group_outputs_tumor/out_dir

- If you are configuring a directory output in a command-line tool, use a glob expression that matches the needed directory:

      outputs:
        - id: normalized_samples
          type: Directory
          outputBinding:
            glob: 'samples'

For more details about directory inputs and outputs and their available properties, please refer to the CWL documentation.
Complex types
Array
Arrays are used to provide multiple values in a single input or output parameter. All items in an array share the declared item type, which can be a primitive type such as string or a special type such as File.
Array inputs
To define an input array in an app description, use one of the following two options:
- Define an input whose type is array, then define the data type of the items that the array can contain:

      inputs:
        - id: samples
          type:
            type: array
            items: File
            inputBinding:
              prefix: -F
          inputBinding:
            position: 2

- Add brackets [] after the type name to indicate that the input parameter is an array of that type:

      inputs:
        - id: sample_ids
          type: string[]
          inputBinding:
            prefix: -C=
            itemSeparator: ","
            separate: false
            position: 4

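
The two array definitions above also differ in how the values end up on the command line. The Python sketch below mimics the standard CWL behavior under simplified assumptions: a binding nested inside the array type repeats its prefix before every item, while a binding on the parameter itself with itemSeparator and separate set to false joins all items into a single argument. Both function names are hypothetical.

```python
def per_item_prefix(prefix, items):
    """Binding nested inside the array type: the prefix precedes every item."""
    args = []
    for item in items:
        args.extend([prefix, str(item)])
    return args


def joined_items(prefix, items, item_separator):
    """Binding with itemSeparator and separate: false: one joined argument."""
    return [prefix + item_separator.join(str(item) for item in items)]
```

For instance, two files passed to the first input would produce `-F a.bam -F b.bam`, while three sample IDs passed to the second input would produce `-C=s1,s2,s3`.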
Array outputs
To create an output that will produce an array of values, use the following syntax:

    outputs:
      output:
        type:
          type: array
          items: File
        outputBinding:
          glob: '*.txt'

This specific output will return all files that match the *.txt glob.
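
As a quick way to reason about which files a glob such as `*.txt` would pick up, the following Python sketch filters a list of file names with the standard `fnmatch` module. This only approximates glob matching (for example, it does not treat leading dots specially), and the function name `match_outputs` is hypothetical.

```python
from fnmatch import fnmatchcase


def match_outputs(file_names, pattern):
    """Return the file names that match a glob-style pattern, sorted by name."""
    # fnmatchcase is case-sensitive on every operating system.
    return sorted(name for name in file_names if fnmatchcase(name, pattern))
```

Given a working directory containing `b.txt`, `a.txt`, `run.log` and `samples.tar.gz`, the pattern `*.txt` selects `a.txt` and `b.txt`, while the literal pattern `samples.tar.gz` selects only that one file.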
Record
Record inputs
Records are complex input types that are used to combine multiple arguments (fields) in a single input. They are useful when additional information needs to be passed along with primary data in an input. Here's how records are defined in an app description:

    inputs:
      - id: record_input
        type:
          - 'null'
          - type: record
            fields:
              file:
                type: File
                inputBinding:
                  prefix: -f
              sample_id:
                type: string
                inputBinding:
                  prefix: -s
            name: record_input
        inputBinding:
          position: 0

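
A value for the record input defined above is a single object that carries both fields. The Python sketch below shows one way to check such a value and turn it into arguments. It assumes that both fields are required whenever the record is provided, represents the File field as a plain path string, and orders fields by name for determinism. `RECORD_FIELDS` and `record_arguments` are hypothetical names.

```python
# Field name -> command-line prefix, mirroring the record definition above.
RECORD_FIELDS = {"file": "-f", "sample_id": "-s"}


def record_arguments(record):
    """Turn a record value into arguments; a missing record adds nothing."""
    if record is None:
        # The input is optional ('null' is one of its allowed types).
        return []
    missing = [name for name in RECORD_FIELDS if name not in record]
    if missing:
        raise ValueError("missing record fields: " + ", ".join(missing))
    args = []
    for name in sorted(RECORD_FIELDS):
        args.extend([RECORD_FIELDS[name], str(record[name])])
    return args
```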
Enum
Enum inputs
Enum consists of a set of predefined values (symbols). When used as an input, an enum is defined as follows:

    inputs:
      - id: format
        type:
          - 'null'
          - type: enum
            symbols:
              - bam
              - sam
              - bam_mapped
              - sam_mapped
              - fastq
        inputBinding:
          position: 2
          prefix: '--format'

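
Because an enum restricts an input to its listed symbols, a value can be checked before a task is submitted. Here is a minimal Python sketch of that check for the format input above. The names `FORMAT_SYMBOLS` and `format_arguments` are hypothetical, and the null case reflects the input being optional.

```python
# Symbols copied from the enum definition above.
FORMAT_SYMBOLS = ("bam", "sam", "bam_mapped", "sam_mapped", "fastq")


def format_arguments(value):
    """Validate an enum value and return its command-line arguments."""
    if value is None:
        # The input is optional, so no value means no argument.
        return []
    if value not in FORMAT_SYMBOLS:
        raise ValueError(
            "unsupported format %r, expected one of: %s"
            % (value, ", ".join(FORMAT_SYMBOLS))
        )
    return ["--format", value]
```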
Input optionality
Inputs can either be required or optional. When an input is required, a value must be provided to it in order for the app to be executed. On the other hand, an optional input may not have a value (more precisely, the value of the input is null) but still allow the app to be executed normally. There are two ways to define an input as optional in the app description:
- By adding a question mark next to the type definition:

      inputs:
        - id: threads
          type: int?
          inputBinding:
            position: 2
            prefix: -t

  In this example, type is defined as int?, meaning that the input type is integer, while the question mark defines the input as optional.

- By adding null alongside the actual type of the input, as shown in the example below:

      inputs:
        - id: threads
          type:
            - int
            - 'null'
          inputBinding:
            position: 2
            prefix: -t

  In this example, the type key contains an array of two values, where the first one is int, defining the actual type of the input, and the second one is 'null', which specifies that the input is optional.
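
The two notations are equivalent: in CWL, the question mark is shorthand for a list that includes null, in the same way that the [] suffix is shorthand for an array definition. The Python sketch below expands both shorthands and checks whether a type is optional. It is an illustration of the rule rather than a full CWL type parser, and the function names are hypothetical.

```python
def expand_type(type_spec):
    """Expand the '?' and '[]' shorthands into their long forms."""
    if not isinstance(type_spec, str):
        # Lists and dictionaries are already in long form.
        return type_spec
    if type_spec.endswith("?"):
        return ["null", expand_type(type_spec[:-1])]
    if type_spec.endswith("[]"):
        return {"type": "array", "items": expand_type(type_spec[:-2])}
    return type_spec


def is_optional(type_spec):
    """Return True if the type accepts null, which makes the input optional."""
    expanded = expand_type(type_spec)
    return expanded == "null" or (isinstance(expanded, list) and "null" in expanded)
```

Under this sketch, `int?` and the two-item list of int and 'null' are both reported as optional, while a plain `int` is reported as required.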