
Pipeline Overview

In the most basic sense, a pipeline (or "workflow") is an analysis process that consists of a set of discrete but connected tasks.

Pipelines in Bioinformatics

In bioinformatics, the term "workflow" has taken on a more specialized meaning: a computational analysis made up of multiple discrete tasks that can be organized into a directed acyclic graph, with the outputs of some tasks providing the inputs to others.
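As a rough illustration of tasks connected by their inputs and outputs, the following is a minimal Nextflow sketch of a two-step workflow. The process names (TRIM, COUNT), the shell commands, and the params.reads parameter are hypothetical and chosen only for illustration; they are not taken from any Cirro pipeline.

```nextflow
// Minimal sketch: two tasks forming a tiny directed acyclic graph.
// All names, commands, and parameters here are illustrative assumptions.
nextflow.enable.dsl = 2

process TRIM {
    input:
    path reads

    output:
    path "trimmed.fastq"

    script:
    """
    head -n 4000 ${reads} > trimmed.fastq
    """
}

process COUNT {
    input:
    path trimmed

    output:
    path "count.txt"

    script:
    """
    wc -l < ${trimmed} > count.txt
    """
}

workflow {
    // Hypothetical parameter pointing at one or more input files
    reads_ch = Channel.fromPath(params.reads)

    // The output of TRIM becomes the input of COUNT (one edge in the graph)
    TRIM(reads_ch)
    COUNT(TRIM.out)
}
```

Because COUNT only consumes what TRIM produces, the workflow manager can infer the execution order from the data dependencies rather than from the order in which the steps are written.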

Terminology Note: You may see the various terms 'workflow', 'process', and 'pipeline' used interchangeably. While 'workflow' has taken on the above meaning in bioinformatics, the Cirro platform uses the term 'process' to describe an analysis which can be executed on a dataset, and which invokes a workflow on the back-end. Because the term 'process' is rather generic, the term 'pipeline' will be used frequently in the documentation for ease of understanding.

A handful of software projects have been developed to support this approach to computing, most notably Galaxy, Snakemake, Cromwell, and Nextflow.

With the increased adoption of workflow management software, many bioinformaticians have taken on the role of "Workflow Developer", focusing their efforts on developing reproducible computational analysis workflows that can be executed by many different users across institutions and infrastructures.

Pipelines in Cirro

Cirro was built to support researchers who need robust, reproducible, and publishable analysis of complex datasets.

Curated no-code analysis pipelines:

Predicting analysis cost with AWS HealthOmics Ready2Run:

Deploying custom Nextflow/WDL pipelines:

Design principles of workflows for cloud computing:

If you have any questions about how to best analyze your data in Cirro, please contact our support team.

Computing with GPUs

An increasing number of analysis pipelines use Graphics Processing Units (GPUs), which can provide highly efficient analysis for certain specialized applications. To run a workflow that requires GPUs, first make sure that the Project has been set up to provide GPU-accelerated analysis nodes as part of its compute environments. By default, Projects are not set up with any GPU-accelerated nodes, but they can be added by editing the Project attributes in the administrator menu.

Note: If GPUs are being added to a project for the first time, it may take 1-2 business days for the quota increase request to be granted by the AWS support team.

For workflow authors using Nextflow, GPUs may be requested for specific tasks using the accelerator directive in the process definition.
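A minimal sketch of such a request is shown here. The process name, resource values, and command are hypothetical; the example also assumes that the task's container image provides the GPU tooling it calls and that the Project already has GPU-accelerated nodes enabled.

```nextflow
// Minimal sketch: request one GPU for each task of this process.
// GPU_TASK, the resource values, and the command are illustrative assumptions.
process GPU_TASK {
    accelerator 1      // number of GPUs requested per task
    cpus 4
    memory '16 GB'

    output:
    path "gpu_info.txt"

    script:
    """
    # Assumes nvidia-smi is available inside the task's container
    nvidia-smi > gpu_info.txt
    """
}

workflow {
    GPU_TASK()
}
```

Only processes that declare the directive request a GPU; other processes in the same workflow continue to request CPU and memory alone.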

Note that when GPU-accelerated nodes are added to a project, those nodes may be used for non-GPU tasks if and only if there are no CPU-only nodes available in the compute environment.