AWS HealthOmics Ready2Run Pipelines

HealthOmics is a suite of AWS services that can be used to analyze large-scale genomics data. The HealthOmics Ready2Run pipelines are a set of curated workflows that are designed to run on AWS infrastructure, and are available for use in Cirro.

The primary difference between Ready2Run pipelines and other pipelines in Cirro is that the cost of running a Ready2Run workflow is predictable and is determined before the analysis starts.

Predictable Analysis Costs

When running scientific workflows in the cloud, it is typically difficult to predict the exact cost of the analysis before it is run. In addition to differences in input data size and complexity, the cost of running a workflow can vary with the Spot Market price of EC2 instances, which is based on the supply and demand of compute resources in the region at a particular point in time.

AWS HealthOmics Ready2Run pipelines solve this problem by offering a fixed price for running the workflow. To account for differences in input data size and complexity, the Ready2Run pipelines are priced in tiers based on the analysis type. For example, the analysis of genome sequencing data is tiered by sequencing depth (e.g. 5X, 30X, 50X), and protein folding analysis is tiered by polypeptide length (e.g. < 600aa vs. 600 - 1,200aa).
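
To make the tiering idea concrete, here is a minimal Python sketch that maps an input to one of the example tiers quoted above. It is an illustration only: the function names are hypothetical, the tier lists contain just the examples from the text (the real tiers for a given pipeline may differ), and the rule that a sample falls into the smallest tier covering its depth is an assumption, not documented HealthOmics or Cirro behavior.

```python
from bisect import bisect_left

# Example tiers quoted in the text; the actual tier list for each
# pipeline may differ (assumption for illustration only).
DEPTH_TIERS_X = [5, 30, 50]


def depth_tier(depth_x: float) -> str:
    """Return the smallest example tier that covers the sequencing depth.

    Assumption: a sample is assigned to the smallest tier whose depth
    is greater than or equal to the sample's depth.
    """
    if depth_x <= 0:
        raise ValueError("sequencing depth must be positive")
    index = bisect_left(DEPTH_TIERS_X, depth_x)
    if index == len(DEPTH_TIERS_X):
        raise ValueError(f"{depth_x}X exceeds the largest example tier")
    return f"{DEPTH_TIERS_X[index]}X"


def protein_tier(length_aa: int) -> str:
    """Return the example protein folding tier for a polypeptide length."""
    if length_aa <= 0:
        raise ValueError("polypeptide length must be positive")
    if length_aa < 600:
        return "< 600aa"
    if length_aa <= 1200:
        return "600 - 1,200aa"
    raise ValueError(f"{length_aa}aa exceeds the largest example tier")
```

Under these assumptions, `depth_tier(28)` resolves to the 30X tier and `protein_tier(450)` resolves to the `< 600aa` tier. Because the tier is known from the input alone, the price can be known before the run starts.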

The fixed cost of a Ready2Run pipeline for a particular dataset may be higher or lower than running an analogous non-Ready2Run pipeline, depending on the specifics of the analysis and the current Spot Market price, but it will always be predictable.
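
The comparison above can be sketched in a few lines of Python. This is a simplified model, not a pricing calculator: the function name, the idea of estimating the variable cost as instance-hours times an hourly Spot price, and all numbers in the usage note are hypothetical.

```python
def compare_fixed_to_variable(
    fixed_price: float,
    instance_hours: float,
    spot_price_low: float,
    spot_price_high: float,
) -> str:
    """Compare a fixed price with a range of estimated variable costs.

    Assumption: the variable cost is approximated as
    instance_hours * hourly Spot price, evaluated at the low and high
    ends of the expected Spot price range.
    """
    if spot_price_low > spot_price_high:
        raise ValueError("spot_price_low must not exceed spot_price_high")
    low_cost = instance_hours * spot_price_low
    high_cost = instance_hours * spot_price_high
    if fixed_price <= low_cost:
        return "lower"    # fixed price is cheaper across the whole range
    if fixed_price >= high_cost:
        return "higher"   # fixed price is more expensive across the range
    return "depends"      # outcome depends on the Spot price at run time
```

With made-up inputs such as a fixed price of 10.0, 20 instance-hours, and an hourly Spot price between 0.30 and 0.80, the variable cost spans roughly 6 to 16, so the function returns `"depends"`. In every case the fixed price itself is known up front.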

Available Ready2Run Pipelines

| Analysis Type | Name |
| --- | --- |
| Protein Folding | AlphaFold |
| Protein Folding | ESMFold |
| DNA Sequencing | Element AVITI Bases2FASTQ |
| DNA Sequencing | GATK Best Practices FQ2BAM |
| DNA Sequencing | GATK Best Practices Germline BAM2VCF |
| DNA Sequencing | GATK Best Practices Somatic WES BAM2VCF |
| DNA Sequencing | GATK Best Practices Germline FQ2VCF |
| DNA Sequencing | NVIDIA Parabricks FQ2BAM WGS |
| DNA Sequencing | NVIDIA Parabricks BAM2FQ2BAM WGS |
| DNA Sequencing | NVIDIA Parabricks Germline HaplotypeCaller WGS |
| DNA Sequencing | NVIDIA Parabricks Germline DeepVariant WGS |
| DNA Sequencing | NVIDIA Parabricks Somatic Mutect2 WGS |
| DNA Sequencing | Sentieon Germline WES/WGS |
| DNA Sequencing | Sentieon Somatic WES/WGS |
| DNA Sequencing | Sentieon LongRead for PacBio HiFi |
| DNA Sequencing | Sentieon LongRead for ONT |
| DNA Sequencing | Ultima Genomics DeepVariant |
| Single-Cell Sequencing | nf-core/scRNAseq |

Implementation Notes

Wall Clock Timeout: Each run of a Ready2Run pipeline is limited to 7.5 hours (450 minutes) of wall clock time.
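
As a rough illustration of working with this limit, the Python sketch below expresses the 7.5 hour ceiling as a `timedelta` and computes how much wall clock time a run has left. The helper names are hypothetical and the code does not call any Cirro or AWS API; it only does the arithmetic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Wall clock limit stated in the documentation: 7.5 hours (450 minutes).
WALL_CLOCK_LIMIT = timedelta(hours=7.5)


def remaining_time(started_at: datetime, now: Optional[datetime] = None) -> timedelta:
    """Return the wall clock time left before the limit is reached.

    Both datetimes are expected to be timezone-aware. The result is
    clamped at zero once the limit has passed.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    elapsed = now - started_at
    return max(WALL_CLOCK_LIMIT - elapsed, timedelta(0))


def fits_within_limit(expected_runtime: timedelta) -> bool:
    """Return True if an expected runtime does not exceed the limit."""
    return expected_runtime <= WALL_CLOCK_LIMIT
```

For example, a run that started two hours ago has 5.5 hours remaining, and an analysis expected to take eight hours would not fit within the limit.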