System Custom Pipelines

The Custom Pipelines page contains all the pipelines that have been uploaded by users in the organization. Organization administrators can access this page directly through the Pipelines page in the System section, or by navigating into a project's Pipelines page.

New custom pipelines can be added to Cirro from the "Add Pipeline" button at the top of the page. Pipelines can be viewed, re-synced, or edited by selecting the pipeline from the table.

Once a custom pipeline is ready for use, it can be added to projects either by the pipeline author or an organization admin. This can be done when first adding the pipeline or by editing it later and updating the "Project Visibility" options. After being added to a project, the pipeline will become visible on the regular Pipelines page and can be used by all members of the project.

Adding a Custom Pipeline

Click on the "Add Pipeline" button to open a new page and fill out the following information to add a custom pipeline.

custom pipelines

Process Type: The type of process you want to add. There are three options:
- Ingest: A process that runs when new data is ingested into Cirro. This process type is used to define dataset types that can be used as inputs for other processes.
- Nextflow: A process that runs a pipeline using Nextflow.
- Cromwell: A process that runs a pipeline using Cromwell (WDL).
Name: The name of the process, which is visible to the user.
Process ID: The unique ID of the process. This field is auto-generated from the name, but can be edited. This cannot be changed once the process is created.
Description: The description of the process, which is visible to the user.
Data Type: (Optional) The type of data this pipeline will output. This is only used as helper text for the output dataset to provide clarity for users. If omitted, users will just see the name of the pipeline. (Example: Aligned Reads)
Documentation URL: (Optional) A link to the documentation of the process. This could be an external website, a public repository's README, etc.
Category: (Optional) Select which group the process belongs to. (Example: Microbial Analysis)
Input Processes: (Optional) The processes that produce output files which can be used as inputs for this new process. These are also known as the "parent" processes.
Output Processes: (Optional) The processes that consume input files that come from this new process. These are also known as the "child" processes.
Pipeline supports multiple input datasets: Check the box if the process is set up to handle users selecting more than one dataset to run through the analysis together and produce a single result.
Pipeline supports Cirro sample sheet: Check the box if the process is set up to use the Cirro sample sheet. This is the recommended behavior for most pipelines as it allows the user to select samples, along with metadata, rather than individual files. See Preprocess Script for more information on how to read the sample sheet.

Define which projects will have access to use your pipeline using the "Project Visibility" section. Any user with Contributor-level permissions for an allowed project will be able to execute the workflow.

Available to All Projects: (Only available with requisite permissions) Allow pipeline to be available for all projects across you organization.
Available to Projects: (Optional) Select all Cirro project(s) that will have access to view and run this pipeline.

If the process type is not "Ingest", you will also need to fill out the following section about the repository that contains the pipeline's configuration files:

Configuration Repository Name: The GitHub repository which contains the pipeline configuration files, including process-form.json, process-input.json, and process-output.json. It can also contain preprocess.py and process-compute.config where applicable. (e.g. organization/repository).
Configuration Repository Type: If the repository with the pipeline code is private or public.
Configuration Repository Version: The branch, tag, or commit hash of the GitHub repository which contains the version of the configuration to be used.
Configuration Repository Folder: The folder, often named .cirro, which contains the configuration files for the workflow. If the files live in the top of the repository, leave this section blank.
Pipeline code is in a separate repository: Check this box if the repository that contains the pipeline definition and configuration files, named above, does not also contain the pipeline code. If checked, the options below will become available.

If the pipeline code is flagged as being in another repository, you will also need to fill out the following section on the repository with the pipeline code:

Pipeline Repository Name: The GitHub repository which contains the code for the pipeline itself. (e.g. organization/repository).
Pipeline Repository Type: If the repository with the pipeline code is private or public.
Pipeline Repository Version: The branch, tag, or commit hash of the repository which contains the version of the pipeline that should be used.

No matter the location of the pipeline code, please also provide:

Pipeline Entrypoint File: The file which will be run by the chosen workflow executor, like main.nf or main.wdl. See Relative Imports in WDL for more information on WDL-specific behavior.

*Note: When adding repository paths, do not include the https://github.com in https://github.com/organization/repository -- just organization/repository. If you repository is private, please make sure our GitHub app is installed so we can read your repository. At this time, only repositories hosted on GitHub are supported.

You will have the option to define any "Expected Files". For ingest processes, these are the files users are expected to upload. For nextflow or cromwell processes, these are the files output from the pipelines. Defining these files allows Cirro to parse sample information out from the file names using Cirro's auto-population feature and save out individual samples. For ingest processes, users will be able to override the file name requirements by using a samplesheet.

Add New File Type*: (Optional) Click the button to add information about the files that will be generated by this process. You can add as many expected file types as you require.
File Type Description: The description of the file type, which is visible to the user. (Example: Unaligned BAMs)
Minimum Files Expected: The minimum number of files required for this file type. If the file type is optional, set this value to 0.
Maximum Files Expected (Optional) The maximum number of files that can be uploaded for this file type. If there is no maximum, you can leave this blank.
Regex Pattern: Define the regular expression (or regex) pattern of file name that is expected to be uploaded. Be careful to use regex and not glob, as they are often mistaken for each other. Optionally, if you need to capture certain information from the file name and save it as file-level metadata in Cirro, you can use the table below for the most common capture patterns. (Example: If you have a file that starts with the sample name and then ends either in .fq.gz or .fastq.gz, the appropriate regex would be (?<sampleName>[\S ]*).(fq|fastq).gz). Use Pythex to test your regex patterns.

Variable	Regex Patterns	Explanation
SampleName	`(?<sampleName>[\S ]*)`	The variable "sampleName" will be set to the characters (including whitespace) present at the specified location in the string.
ReadType	`(?<readType>R\\|I)`	The variable "readType" will be set to be either R or I if one of those values is present at the specified location in the string.
Read	`(?<read>1\\|2\\|3)`	The variable "read" will be set to either the value of 1, 2, or 3 if one of those values is present at the specified location in the string. (You may add whatever numbers you expect, separated by the \| character.)

Add Alternative Pattern: Click the button to add more allowed options for how the input file names can be formatted for the same file type. These file patterns are combined with the OR operator, so any of them may be used by the user for the same result. You can add as many expected file name patterns to your file type as you require.

Updating a Custom Pipeline

After a pipeline has been added, it can be modified in terms of both user access and configuration.

Clicking on the row in the table for a single custom pipeline will open a pop up with information on the pipeline including the repository it was loaded from and the exact code commit used.

To update the pipeline configuration in Cirro to reflect the newest code available in that GitHub repository, click on the "Sync Pipeline" button.

Note: Any changes made to the GitHub repository containing the pipeline configuration files will not be applied in Cirro until the "Sync Pipeline" button is clicked. Only the pipeline author can sync with the source repository. Changes to the pipeline code will be reflected in whatever the version is set to in the pipeline configuration.

sync custom pipeline

You can also use the "Edit Pipeline" button to view and update all the options listed above. This includes the list of projects in which the pipeline can be run, which is under the "Project Visibility" options.

Learn More

Learn more about the entire process on creating pipelines in our pipeline documentation.