
Pipeline Catalog: Single-Cell Sequencing

Uploading Data

Single-cell sequencing datasets should be uploaded as a collection of paired-end FASTQ files (gzip-compressed). There are two options for how the data can be formatted for upload:

  1. GEX-only: Use the sample IDs encoded in the FASTQ file names, or
  2. Others: Use a samplesheet CSV to specify the sample IDs for each pair of FASTQs.

Parsing sample IDs from FASTQ file names

When sample IDs are parsed from the FASTQ file names, the standard format used by the CellRanger demultiplexing software is used to assign the sample name ("SampleName" in the example below) to each of the FASTQ files.

Note that each sample must have four files corresponding to Read 1 (R1), Read 2 (R2), Index 1 (I1), and Index 2 (I2).

Naming Pattern:

SampleName_S1_L001_R1_001.fastq.gz
┃           ┃    ┃ ┃┃┗━ Extension must be '_001.fastq.gz'
┃           ┃    ┃ ┃┗━━ Read/index pair: '1' (with matching '2')
┃           ┃    ┃ ┗━━━ Read ('R') or Index ('I')
┃           ┃    ┗━━━━━ Lane on Illumina sequencer
┃           ┗━━━━━━━━━━ Sample index number
┗━━━━━━━━━━━━━━━━━━━━━━ Sample identifier ('SampleName' in this case)
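
The naming pattern above can be checked with a short script before upload. The following is a minimal sketch in Python, not the parser used by Cirro or CellRanger; the function names `parse_fastq_name` and `missing_files` are hypothetical, and the regular expression assumes a three-digit lane number as in the example.

```python
import re
from typing import Optional

# Pattern for names such as SampleName_S1_L001_R1_001.fastq.gz
# (assumption: the lane is always written with three digits, e.g. L001)
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<kind>[RI])(?P<pair>[12])_001\.fastq\.gz$"
)

REQUIRED = {"R1", "R2", "I1", "I2"}


def parse_fastq_name(file_name: str) -> Optional[dict]:
    """Return the parts of a matching FASTQ name, or None if it does not match."""
    match = FASTQ_NAME.match(file_name)
    return match.groupdict() if match else None


def missing_files(file_names) -> dict:
    """Map each sample ID to the read/index files (R1, R2, I1, I2) not found for it.

    Lanes are pooled per sample in this sketch.
    """
    found = {}
    for name in file_names:
        parts = parse_fastq_name(name)
        if parts is not None:
            found.setdefault(parts["sample"], set()).add(parts["kind"] + parts["pair"])
    return {s: sorted(REQUIRED - kinds) for s, kinds in found.items() if REQUIRED - kinds}


parts = parse_fastq_name("SampleName_S1_L001_R1_001.fastq.gz")
print(parts["sample"], parts["kind"] + parts["pair"])  # prints: SampleName R1
```

Calling `missing_files` on the list of file names in a folder returns an empty dictionary when every sample has all four files.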

Organizing data with a sample sheet

The advantages of using a sample sheet when uploading data are that (a) the file names do not have to follow the naming pattern described above, and (b) additional sample metadata can be added en masse. To use a sample sheet, create a file named samplesheet.csv in the folder containing the data to be uploaded, using the format:

sample,fastq_1,fastq_2
SampleA,SampleA.R1.fastq.gz,SampleA.R2.fastq.gz
SampleB,SampleB.R1.fastq.gz,SampleB.R2.fastq.gz

Note that the file names do not need to match any particular pattern.
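
For folders with many files, the sample sheet can be generated rather than typed by hand. The sketch below uses only the Python standard library; the function name `write_samplesheet` is hypothetical, and the list of file pairs is assumed to be supplied by the user since the file names carry no required pattern.

```python
import csv
import io


def write_samplesheet(rows, handle):
    """Write (sample, fastq_1, fastq_2) tuples in the samplesheet.csv layout."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    for sample, fastq_1, fastq_2 in rows:
        writer.writerow([sample, fastq_1, fastq_2])


rows = [
    ("SampleA", "SampleA.R1.fastq.gz", "SampleA.R2.fastq.gz"),
    ("SampleB", "SampleB.R1.fastq.gz", "SampleB.R2.fastq.gz"),
]

# Written to an in-memory buffer here so the result can be inspected
buffer = io.StringIO()
write_samplesheet(rows, buffer)
print(buffer.getvalue())
```

To write the file to disk instead, pass a handle opened with `open("samplesheet.csv", "w", newline="")`.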

Any additional metadata can be added as columns to the sample sheet. For example, multimodal analysis with the CellRanger pipeline can use information on grouping and feature_types for each sample, as described below.

Example (GEX + VDJ):

sample,fastq_1,fastq_2,grouping,feature_types
SampleA_GEX,SampleA_GEX.R1.fastq.gz,SampleA_GEX.R2.fastq.gz,SampleA,Gene Expression
SampleA_VDJ,SampleA_VDJ.R1.fastq.gz,SampleA_VDJ.R2.fastq.gz,SampleA,VDJ

Visualization of Gene Expression

In addition to the summary images produced by CellRanger, the results of single-cell gene expression analysis can be visualized directly in Cirro using an interactive display. Visualizations which are available for a dataset can be opened by clicking on the button immediately above the file browser:

[Image: select-visualization]

After selecting the dataset of interest, the visualization will load directly in the browser:

[Image: display-visualization]

10X Single Cell Sequencing (cellranger)

Data Type

Single-cell sequencing data produced by the 10X platform can be analyzed using the CellRanger software suite from 10X Genomics. Demultiplexed FASTQ files produced by the 10X platform can be uploaded as a Single-cell sequencing data (10X) dataset, and the sample name will be inferred automatically from the file names.

Single-cell sequencing datasets can be analyzed for:

When uploading data which includes Gene Expression combined with another modality, it is strongly recommended to upload a samplesheet.csv along with the FASTQ files which indicates the modality used for each individual sample.

Example (GEX + VDJ):

sample,fastq_1,fastq_2,grouping,feature_types
SampleA_GEX,SampleA_GEX.R1.fastq.gz,SampleA_GEX.R2.fastq.gz,SampleA,Gene Expression
SampleA_VDJ,SampleA_VDJ.R1.fastq.gz,SampleA_VDJ.R2.fastq.gz,SampleA,VDJ

User Guide

Sample Metadata

If you feel comfortable with JSON files, you can follow the instructions below to edit the JSON schemas for metadata. If not, contact the Cirro team for assistance.

For datasets which contain both GEX and V(D)J information, the user must specify which samples should be grouped together, and which correspond to each sequencing type.

To supply this metadata, include a samplesheet.csv with the uploaded FASTQ files. The columns required for this samplesheet.csv are (example shown above):

  • sample: Unique identifier used for each sequencing library
  • fastq_1: Read 1 sequences in FASTQ format
  • fastq_2: Read 2 sequences in FASTQ format
  • grouping: Group label indicating which sequencing libraries (GEX, VDJ, etc.) should be combined
  • feature_types: Defined label indicating the analysis modality

Options for feature_types

  • "Gene Expression"
  • "VDJ"
  • "VDJ-T"
  • "VDJ-B"
  • "Antibody Capture"
  • "CRISPR Guide Capture"
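
A sample sheet can be checked against the required columns and the feature_types options listed above before it is uploaded. This is a minimal sketch, not a validator provided by Cirro; the function name `check_samplesheet` and the wording of the messages are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2", "grouping", "feature_types"]
FEATURE_TYPES = {
    "Gene Expression",
    "VDJ",
    "VDJ-T",
    "VDJ-B",
    "Antibody Capture",
    "CRISPR Guide Capture",
}


def check_samplesheet(text):
    """Return a list of human-readable problems found in samplesheet text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    seen = set()
    # Data rows start on line 2 because line 1 is the header
    for line_number, row in enumerate(reader, start=2):
        if row["sample"] in seen:
            problems.append(f"line {line_number}: duplicate sample {row['sample']!r}")
        seen.add(row["sample"])
        if row["feature_types"] not in FEATURE_TYPES:
            problems.append(
                f"line {line_number}: unknown feature_types {row['feature_types']!r}"
            )
    return problems


sheet = (
    "sample,fastq_1,fastq_2,grouping,feature_types\n"
    "SampleA_GEX,SampleA_GEX.R1.fastq.gz,SampleA_GEX.R2.fastq.gz,SampleA,Gene Expression\n"
    "SampleA_VDJ,SampleA_VDJ.R1.fastq.gz,SampleA_VDJ.R2.fastq.gz,SampleA,VDJ\n"
)
print(check_samplesheet(sheet))  # prints: []
```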

To allow for easy editing of the information provided in the samplesheet.csv directly in the Samples page (using the defined metadata schema), add fields for grouping and feature_types to the metadata.schema.json for the project as shown here:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "grouping": {
      "type": "string"
    },
    "feature_types": {
      "type": "string",
      "enum": [
        "Gene Expression",
        "VDJ",
        "VDJ-T",
        "VDJ-B",
        "Antibody Capture",
        "CRISPR Guide Capture",
        ""
      ]
    }
  }
}
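
If the project already has a metadata.schema.json with other fields, the two properties can be merged into it programmatically. The sketch below is an illustration only; the function name `add_cellranger_fields` and the pre-existing `tissue` field are hypothetical.

```python
import copy
import json

CELLRANGER_FIELDS = {
    "grouping": {"type": "string"},
    "feature_types": {
        "type": "string",
        "enum": [
            "Gene Expression",
            "VDJ",
            "VDJ-T",
            "VDJ-B",
            "Antibody Capture",
            "CRISPR Guide Capture",
            "",
        ],
    },
}


def add_cellranger_fields(schema):
    """Return a copy of a metadata schema with grouping and feature_types added."""
    updated = copy.deepcopy(schema)
    updated.setdefault("properties", {}).update(copy.deepcopy(CELLRANGER_FIELDS))
    return updated


# Hypothetical existing schema with a single 'tissue' field
existing = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {"tissue": {"type": "string"}},
}
print(json.dumps(add_cellranger_fields(existing), indent=2))
```

The input schema is left unmodified, so the original can be kept as a backup.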

Next, edit the sample metadata so that samples which were processed in parallel from the same batch of cells have the same unique identifier in the grouping column. Also make sure that the appropriate values are selected for each of the feature types, indicating if they are GEX or a type of V(D)J sequencing.

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

CellRanger Gene Expression

Data Type

CellRanger gene expression analysis can be performed for any sequencing data produced by Chromium Single Cell Gene Expression. The output of this analysis is a set of files summarizing the relative expression of each gene detected across each of the cells prepared in the sample.

Parameters:

  • Transcriptome: Select the appropriate reference genome (human, mouse, or combined)
  • Include Introns: Retain reads which align to intronic regions of genes
  • CellRanger Version: Specify the CellRanger software version to be used for analysis

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

CellRanger V(D)J

Data Type

CellRanger V(D)J analysis can be performed for any sequencing data produced by Chromium Single Cell 5' V(D)J libraries. The output of this analysis is a set of files summarizing the reconstructed V(D)J alleles from each cell.

Parameters:

  • Genome: Select the appropriate reference genome (human or mouse)
  • CellRanger Version: Specify the CellRanger software version to be used for analysis

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

CellRanger Multi Analysis

Data Type

CellRanger multi analysis can be performed for any sequencing data which combines Chromium Single Cell Gene Expression with other modalities, including 5' V(D)J, Antibody Capture, or CRISPR Guide Capture.

The implementation of cellranger multi provided in this workflow does not currently support CMO multiplexing. Please reach out to support@cirro.bio if you have an interest in CMOs.

Uploading Data:

Input FASTQ data must be annotated to indicate which files contain each type of library using the samplesheet.csv approach shown above. This information will be used to automatically construct the config CSV required by cellranger multi.
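
To illustrate how sample sheet rows could map onto a cellranger multi config CSV, here is a rough sketch. It is not the code used by the workflow. The function name `build_multi_config` and all paths are hypothetical, and the sketch assumes the FASTQ files have been staged in one folder with names that begin with the sample ID, and that no path contains a comma.

```python
def build_multi_config(rows, transcriptome, fastq_dir,
                       vdj_reference=None, feature_reference=None):
    """Return config CSV text for the libraries belonging to one grouping."""
    lines = ["[gene-expression]", f"reference,{transcriptome}", ""]
    if vdj_reference:
        lines += ["[vdj]", f"reference,{vdj_reference}", ""]
    if feature_reference:
        lines += ["[feature]", f"reference,{feature_reference}", ""]
    lines += ["[libraries]", "fastq_id,fastqs,feature_types"]
    for row in rows:
        # One library per sample sheet row, labeled with its feature type
        lines.append(f"{row['sample']},{fastq_dir},{row['feature_types']}")
    return "\n".join(lines) + "\n"


# Rows from the GEX + VDJ sample sheet example, all sharing grouping 'SampleA'
rows = [
    {"sample": "SampleA_GEX", "grouping": "SampleA", "feature_types": "Gene Expression"},
    {"sample": "SampleA_VDJ", "grouping": "SampleA", "feature_types": "VDJ"},
]
config = build_multi_config(
    rows,
    transcriptome="/refs/transcriptome",  # hypothetical path
    fastq_dir="/data/fastqs",             # hypothetical path
    vdj_reference="/refs/vdj",            # hypothetical path
)
print(config)
```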

Feature References:

When analyzing Antibody Capture or CRISPR Guide Capture data, you must provide the appropriate Feature Reference CSV file. First upload that file to Cirro from the References page as the CellRanger Feature Reference (CSV) file type. Make sure to provide an appropriate name for the reference CSV which you upload. Then, when launching an analysis of CellRanger Multi, select the reference appropriate for that dataset.

Parameters:

  • Transcriptome: Select the appropriate reference genome (human, mouse, or combined)
  • V(D)J Reference: Reference genome used for alignment of V(D)J data (human or mouse)
  • Include Introns: Retain reads which align to intronic regions of genes
  • Feature Reference: Select the Feature Reference CSV which has been uploaded to the References page
  • CellRanger Version: Specify the CellRanger software version to be used for analysis

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

CellRanger Flex

Data Type

Fixed RNA Profiling data can be analyzed using the CellRanger software suite provided by 10X Genomics.

Barcodes

When analyzing 10X data produced by Fixed RNA Profiling, the barcode used for each sample must be listed at the time of analysis. This information will be used to automatically populate a configuration CSV used by cellranger multi for analyzing this sample type.
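
As an illustration of the barcode-to-sample information involved, the sketch below turns a mapping of barcodes to sample names into the lines of a [samples] section of a cellranger multi config. It is not the workflow's own code; the function name `build_samples_section` and the sample names are hypothetical, and barcode labels are assumed to follow the BC001, BC002 form.

```python
import re

# Assumption: barcode labels look like BC001, BC002, ...
BARCODE = re.compile(r"^BC\d{3}$")


def build_samples_section(barcode_to_sample):
    """Return the lines of a [samples] section mapping sample IDs to probe barcodes."""
    by_sample = {}
    for barcode, sample in barcode_to_sample.items():
        if not BARCODE.match(barcode):
            raise ValueError(f"unexpected barcode label: {barcode!r}")
        by_sample.setdefault(sample, []).append(barcode)
    lines = ["[samples]", "sample_id,probe_barcode_ids"]
    for sample, barcodes in by_sample.items():
        # Several barcodes assigned to one sample are joined with '|'
        lines.append(f"{sample},{'|'.join(sorted(barcodes))}")
    return lines


print("\n".join(build_samples_section({"BC001": "liver", "BC002": "kidney"})))
```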

Probe Sets

By default, the Chromium Probe Set v1.0.1 will be used for analysis. Custom probe sets are also supported by this analysis workflow. First upload the probe set CSV provided by 10X Genomics as a Pipeline Reference, selecting the type "CellRanger Probe Set (CSV)". Then select that uploaded probe set when analyzing the associated sequencing data.

Parameters:

  • Reference Genome: Select the appropriate reference genome (human or mouse)
  • Custom Probe Set: Optionally select a custom probe set to use for analysis
  • Samples: List the samples used for each barcode (BC001, BC002, etc.)
  • CellRanger Version: Specify the CellRanger software version to be used for analysis

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

CellRanger Multiome ATAC + GEX

Data Type

Analyze Single Cell Multiome ATAC + Gene Expression data using the CellRanger ARC software suite provided by 10X Genomics.

Input Data:

After processing a sample for Multiome ATAC + Gene Expression, raw sequencing data will be produced in the form of FASTQ files both for chromatin accessibility (ATAC) as well as gene expression (GEX) information. In order to analyze these datasets, the FASTQ files must be marked appropriately according to the data type that they represent.

The best way to annotate the input FASTQ data is using a samplesheet.csv with columns for:

  • sample: Identifier for the sequencing library
  • fastq_1: Name of the R1 (or I1) FASTQ file
  • fastq_2: Name of the R2 (or I2) FASTQ file
  • grouping: Identifier for the sample analyzed for both GEX and ATAC
  • feature_types: Either Gene Expression, or Chromatin Accessibility

Example:

sample,fastq_1,fastq_2,grouping,feature_types
sampleA_gex,sampleA_gex_S1_L001_R1_001.fastq.gz,sampleA_gex_S1_L001_R2_001.fastq.gz,sampleA,Gene Expression
sampleA_atac,sampleA_atac_S2_L001_R1_001.fastq.gz,sampleA_atac_S2_L001_R2_001.fastq.gz,sampleA,Chromatin Accessibility
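
Since each grouping is expected to carry both a GEX and an ATAC library, a sample sheet can be checked for incomplete groupings before upload. This is a minimal sketch with a hypothetical function name, `incomplete_groupings`, and is not part of the workflow.

```python
import csv
import io

EXPECTED = {"Gene Expression", "Chromatin Accessibility"}


def incomplete_groupings(text):
    """Map each grouping to the feature types it lacks (empty dict if all are complete)."""
    seen = {}
    for row in csv.DictReader(io.StringIO(text)):
        seen.setdefault(row["grouping"], set()).add(row["feature_types"])
    return {g: sorted(EXPECTED - types) for g, types in seen.items() if EXPECTED - types}


sheet = (
    "sample,fastq_1,fastq_2,grouping,feature_types\n"
    "sampleA_gex,sampleA_gex_S1_L001_R1_001.fastq.gz,"
    "sampleA_gex_S1_L001_R2_001.fastq.gz,sampleA,Gene Expression\n"
    "sampleA_atac,sampleA_atac_S2_L001_R1_001.fastq.gz,"
    "sampleA_atac_S2_L001_R2_001.fastq.gz,sampleA,Chromatin Accessibility\n"
)
print(incomplete_groupings(sheet))  # prints: {}
```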

Note: The file described above should be named samplesheet.csv and can be uploaded either (1) along with the FASTQ files during initial upload or (2) by clicking on the "Upload Samplesheet" button for a previously-uploaded dataset.

Parameters:

  • Reference Genome: Select the appropriate reference genome (human or mouse)
  • CellRanger ARC Version: Specify the CellRanger ARC software version to be used for analysis

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

Single-Cell Azimuth Projection

Data Type

Single-cell sequencing provides an incredibly detailed description of the genes being expressed by individual cells from a complex tissue. However, it can sometimes be difficult to interpret this information in a coordinated way across multiple specimens, analysis batches, or sampling modalities. The Satija lab published a reference-based approach (Azimuth), which projects newly collected datasets into the multidimensional space established for a set of curated tissue-specific atlases. This approach can be used to quickly annotate cell types and align UMAP ordinations for new datasets to facilitate rapid comparison.

The Azimuth analysis can be run on single-cell gene expression datasets in Cirro. It produces an updated Seurat (h5seurat) or Scanpy (h5ad) object which can be used for further downstream analysis.

Azimuth Human Motor Cortex Reference Atlas:

[Image: human brain umap]

Azimuth References:

Citation:

  • Hao, Yuhan et al. “Integrated analysis of multimodal single-cell data.” Cell vol. 184,13 (2021): 3573-3587.e29. doi:10.1016/j.cell.2021.04.048

Aggregate CellRanger Outputs

Data Type

Combine the results from multiple datasets with CellRanger aggr.

Many experiments generate data for multiple samples. Depending on the experimental design, these could be replicates from the same set of cells, cells from different tissues or time points from the same individual, or cells from different individuals. Samples could be processed through different Gel Bead-in Emulsion (GEM) wells or multiplexed within the same GEM well on Chromium instruments. The cellranger aggr pipeline can be used to aggregate samples from these scenarios into a single feature-barcode matrix.

When a single dataset in Cirro contains results from multiple samples, those results can be combined into a single set of outputs. This can be particularly useful when there is a need to project cells from multiple samples into the same t-SNE ordination and gene expression clusters.

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

Visium: Spatial Transcriptomics

Data Type

Spatial Transcriptomics datasets generated on the Visium platform with custom probes can be analyzed on Cirro using the Space Ranger software provided by 10X.

Probe Reference:

Before running the analysis, build an analysis reference which includes the custom probes used in the analysis.

Steps:

  1. Upload the Visium sequencing data as a "10X Single-Cell (FASTQ)" dataset
  2. Upload the image files generated for those samples (see note below)
  3. Open the "Analyze Visium Spatial Transcriptomics" pipeline
  4. Run using the appropriate datasets for (a) FASTQs, (b) images, and (c) analysis reference

Images:

When uploading images, make sure to provide a samplesheet.csv file which matches up the images to the appropriate sequencing dataset. Use columns slide and area to include details on Visium Slide Serial Number and Capture Area. If those columns are omitted, then Space Ranger will be run with the --unknown-slide parameter.

For example:

sample,file,slide,area
sampleA,sampleA_image.tif,V19L01-041,A1
sampleB,sampleB_image.tif,V19L01-041,B1
sampleC,sampleC_image.tif,V19L01-041,D1
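
The image sample sheet can be inspected to see whether slide details will be available. The sketch below is illustrative only; the function name `slide_details` is hypothetical, and it simply reports the slide and area per sample, or None when either column is absent (the case in which, as described above, Space Ranger is run with --unknown-slide).

```python
import csv
import io


def slide_details(text):
    """Return {sample: (slide, area)}, or None when either column is absent."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if "slide" not in header or "area" not in header:
        return None  # no slide details available for this sheet
    return {row["sample"]: (row["slide"], row["area"]) for row in reader}


sheet = (
    "sample,file,slide,area\n"
    "sampleA,sampleA_image.tif,V19L01-041,A1\n"
    "sampleB,sampleB_image.tif,V19L01-041,B1\n"
)
print(slide_details(sheet))
print(slide_details("sample,file\nsampleA,sampleA_image.tif\n"))  # prints: None
```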

Images can be provided as one of:

  • CytAssist brightfield image
  • Colorized image (TIFF or JPEG)
  • Multi-channel, dark-background fluorescence image (TIFF)
  • Single H&E brightfield image (TIFF or JPEG)

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations:

Visium: Custom Probes

Data Type

When analyzing Visium Spatial Transcriptomics data which has been prepared using custom probes, the reference genome must be combined with those probes prior to running that analysis.

Steps:

  1. Upload the custom probe(s) as a "Nucleotide Sequences (FASTA)" dataset (in either a single file or multiple files)
  2. Open the "Build SpaceRanger Reference" pipeline and select the uploaded probes
  3. Select the appropriate reference genome (human or mouse) to combine with those probes
  4. Provide a memorable name for the custom reference and click "Run"
  5. Once the custom reference is finished building, it can be used with the "Visium Spatial Transcriptomics (Custom Probes)" pipeline

Technical Details

Workflow Repository: github.com/FredHutch/nf-cellranger-tools

Citations: