Pipeline Overview
In the most basic sense, a pipeline (or "workflow") is an analysis process which consists of a set of discrete but connected tasks.
Pipelines in Bioinformatics
In bioinformatics, the term "workflow" has taken on the more specialized meaning of a computational analysis made up of multiple discrete tasks that can be organized into a directed acyclic graph, with the outputs of some tasks providing the inputs to others.
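To make the directed acyclic graph idea concrete, here is a minimal Python sketch. It is only an illustration and not how Cirro or any particular workflow manager is implemented. The task names (trim_reads, align_reads, call_variants, qc_report) are hypothetical. Each task lists the tasks whose outputs it consumes, and a topological sort gives one order in which every task runs only after its inputs exist.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks whose outputs it needs.
dependencies = {
    "trim_reads": set(),
    "align_reads": {"trim_reads"},
    "call_variants": {"align_reads"},
    "qc_report": {"trim_reads", "align_reads"},
}


def execution_order(deps):
    """Return one valid order in which each task follows all of its inputs."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the tasks form a DAG, False if any dependency cycle exists."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)
```

More than one order can be valid for the same graph (here, qc_report and call_variants do not depend on each other), which is what allows a workflow manager to run independent tasks in parallel.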
Terminology Note: You may see the various terms 'workflow', 'process', and 'pipeline' used interchangeably. While 'workflow' has taken on the above meaning in bioinformatics, the Cirro platform uses the term 'process' to describe an analysis which can be executed on a dataset, and which invokes a workflow on the back-end. Because the term 'process' is rather generic, the term 'pipeline' will be used frequently in the documentation for ease of understanding.
A handful of software projects have been developed to support this approach to computing, most notably Galaxy, Snakemake, Cromwell, and Nextflow.
With the increased adoption of workflow management software, many bioinformaticians have taken on the role of "Workflow Developer", focusing their efforts on the development of reproducible computational analysis workflows which can be executed by many different users across institutions and infrastructures.
Pipelines in Cirro
Cirro was built to support researchers who need robust, reproducible, and publishable analysis of complex datasets.
Curated no-code analysis pipelines:
- DNA Sequencing Pipelines
- RNA Sequencing Pipelines
- Single-Cell Sequencing Pipelines
- Flow Cytometry Analysis Pipelines
- Image Analysis
- Microbial Analysis Pipelines
- Nanopore Sequencing
- Proteomics Pipelines
- Protein Structure Pipelines
- Spatial Analysis
- Statistical Analysis Pipelines
- Targeted Sequencing Pipelines
- Quality Control Pipelines
Predicting analysis cost with AWS HealthOmics Ready2Run:
- Learn about AWS HealthOmics in Cirro
Deploying custom Nextflow/WDL pipelines:
- Learn how to Add Your Own Workflows to Cirro
Design principles of workflows for cloud computing:
- Read our philosophy on Developing Reproducible Workflows
If you have any questions about how to best analyze your data in Cirro, please contact our support team.
Computing with GPUs
An increasing number of analysis pipelines use Graphics Processing Units (GPUs), which can provide highly efficient analysis for certain specialized applications. To run a workflow which requires GPUs, first make sure that the Project has been set up to provide GPU-accelerated analysis nodes as part of its compute environments. By default, Projects are not set up with any GPU-accelerated nodes, but they can be added by editing the Project attributes in the administrator menu.
Note: If GPUs are being added to a project for the first time, it may take 1-2 business days for the quota increase request to be granted by the AWS support team.
For workflow authors using Nextflow, GPUs may be requested for specific tasks using the accelerator directive in the process definition (for example, accelerator 1 to request a single GPU).
Note that when GPU-accelerated nodes are added to a project, those nodes may be used for non-GPU tasks if and only if there are no CPU-only nodes available in the compute environment.
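The placement rule in the note above can be written out as a small decision function. This is a sketch of the rule as stated, not the actual scheduler logic used by Cirro or AWS. The function name, its parameters, and the "cpu"/"gpu" labels are hypothetical, and "available" is assumed to mean that nodes of that type exist in the compute environment.

```python
def eligible_node_type(task_needs_gpu, cpu_nodes_available, gpu_nodes_available):
    """Return "cpu", "gpu", or None (no suitable node) for a single task.

    Hypothetical model of the stated rule: a non-GPU task lands on a GPU node
    if and only if the compute environment has no CPU-only nodes.
    """
    if task_needs_gpu:
        # GPU tasks can only run on GPU-accelerated nodes.
        return "gpu" if gpu_nodes_available else None
    if cpu_nodes_available:
        # CPU-only nodes always take non-GPU tasks when they exist.
        return "cpu"
    # No CPU-only nodes: fall back to GPU nodes if the project has them.
    return "gpu" if gpu_nodes_available else None


# A non-GPU task in a project with both node types stays on CPU-only nodes.
print(eligible_node_type(False, cpu_nodes_available=True, gpu_nodes_available=True))
```

In practice this means that adding GPU nodes to a project should not move ordinary tasks onto more expensive hardware as long as CPU-only nodes remain part of the compute environment.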