VirScan
The presence of antibodies against specific epitopes can be estimated by measuring the degree of antibody binding to complex phage display libraries, a process generally called Phage Immunoprecipitation Sequencing (PhIP-Seq). The VirScan process uses PhIP-Seq to detect antibodies binding to a panel of viral epitopes, integrating antibody binding data across many peptides to infer past exposure to specific pathogens.
User Guide
References:
Reference libraries for VirScan analysis consist of two files, one containing the list of epitope sequences to search, and one containing a list of "public" epitopes which are more commonly detected across individuals. Any reference library may be used which conforms to the format shown here for the Vir3 library:
After formatting the necessary reference CSV files, upload them to Cirro from the References page using the VirScan Library type.
Make sure to upload both the library and public epitopes CSV to the same reference.
When analyzing VirScan data, select the appropriate reference using the name which was provided at the time of upload.
Uploading Data:
When uploading sequencing data (in FASTQ format) from VirScan assays, it is important that the experimental replicates are marked appropriately. The VirScan analysis process compares the degree of antigen recognition across experimental replicates to better identify high-confidence predictions.
To ensure that experimental replicates are marked appropriately, the best approach is to use a sample sheet (uploading a file named samplesheet.csv) which assigns a sample name to each FASTQ file.
The sample names should go in a column named sample, while the FASTQ files should be listed in a column named fastq_1, with one line per file.
In addition, the sample sheet should include a column called control_status which indicates whether the sample is a control sample (beads_only) or an experimental sample (empirical).
Every batch of analysis should include at least one beads_only control and one empirical sample (although greater numbers of controls are better).
An example samplesheet.csv (with two samples, two controls, and two replicates each) may look like this:
sample,fastq_1,control_status
sample1,sample1_rep1_S1_R1_001.fastq.gz,empirical
sample1,sample1_rep2_S2_R1_001.fastq.gz,empirical
sample2,sample2_rep1_S3_R1_001.fastq.gz,empirical
sample2,sample2_rep2_S4_R1_001.fastq.gz,empirical
control1,control1_rep1_S5_R1_001.fastq.gz,beads_only
control1,control1_rep2_S6_R1_001.fastq.gz,beads_only
control2,control2_rep1_S7_R1_001.fastq.gz,beads_only
control2,control2_rep2_S8_R1_001.fastq.gz,beads_only
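Before uploading, it can be useful to check a sample sheet against the rules described above. The following is a minimal sketch, not part of Cirro or the VirScan workflow: the function name validate_samplesheet and the wording of its messages are hypothetical, and the duplicate-file check is an assumption based on the "one line per file" rule.

```python
import csv
import io

REQUIRED_COLUMNS = {"sample", "fastq_1", "control_status"}
ALLOWED_STATUSES = {"beads_only", "empirical"}


def validate_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    seen_statuses = set()
    seen_files = set()
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        status = (row["control_status"] or "").strip()
        fastq = (row["fastq_1"] or "").strip()
        if status in ALLOWED_STATUSES:
            seen_statuses.add(status)
        else:
            problems.append(
                f"line {line_number}: unexpected control_status {status!r}"
            )
        # Assumption: "one line per file" means a FASTQ may appear only once.
        if fastq in seen_files:
            problems.append(f"line {line_number}: {fastq} is listed more than once")
        seen_files.add(fastq)

    # Every batch needs at least one control and one empirical sample.
    for status in sorted(ALLOWED_STATUSES - seen_statuses):
        problems.append(f"no {status} sample listed")
    return problems


if __name__ == "__main__":
    example = (
        "sample,fastq_1,control_status\n"
        "sample1,sample1_rep1_S1_R1_001.fastq.gz,empirical\n"
        "control1,control1_rep1_S5_R1_001.fastq.gz,beads_only\n"
    )
    print(validate_samplesheet(example))
```

An empty list means that none of these checks failed. It does not guarantee that the upload will be accepted.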
Note: File names which match the pattern shown above (SAMPLE_REP_SN_R1_001.fastq.gz) can be uploaded without a sample sheet and will be parsed appropriately. However, that approach is not recommended because it does not support sample names with underscores, and control_status will need to be filled in manually.
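To illustrate why underscores in sample names are a problem for name-based parsing, here is a small sketch. The regular expression and the function parse_fastq_name are hypothetical and are not Cirro's actual parser. They assume only that the sample name and the replicate label are the first two underscore-separated fields of the file name.

```python
import re

# Hypothetical pattern for names like SAMPLE_REP_SN_R1_001.fastq.gz, where
# neither the sample name nor the replicate label may contain an underscore.
FASTQ_NAME = re.compile(
    r"^(?P<sample>[^_]+)_(?P<replicate>[^_]+)_S(?P<number>\d+)_R1_001\.fastq\.gz$"
)


def parse_fastq_name(file_name):
    """Return (sample, replicate) for a matching file name, otherwise None."""
    match = FASTQ_NAME.match(file_name)
    if match is None:
        return None
    return match.group("sample"), match.group("replicate")


if __name__ == "__main__":
    # A name with exactly the expected fields parses cleanly.
    print(parse_fastq_name("sample1_rep1_S1_R1_001.fastq.gz"))
    # An extra underscore in the sample name adds a field, so the name no
    # longer fits the pattern and cannot be split unambiguously.
    print(parse_fastq_name("my_sample_rep1_S1_R1_001.fastq.gz"))
```

A sample sheet avoids this ambiguity because the sample name is stated explicitly instead of being inferred from the file name.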
Sample Metadata:
If you feel comfortable with JSON files, you can follow the instructions below to edit the JSON schemas for metadata. If not, contact the Cirro team for assistance.
After uploading the samples, the user must indicate which were generated from beads-only controls, and which were generated from empirical samples.
If a sample sheet was used to automatically annotate control status at upload time, it is still helpful to follow the steps below so that the sample annotation page can be used to view and edit control status.
To add the appropriate field in the sample annotation page, upload the following metadata.schema.json:
{
  "$id": "https://json-schema.org/draft/2020-12/schema",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "control_status": {
      "type": "string",
      "title": "Treatment Status",
      "description": "Indicates whether a sample is a treatment or a control",
      "enum": [
        "beads_only",
        "empirical"
      ],
      "enumNames": [
        "Control (beads only)",
        "Empirical sample"
      ]
    }
  }
}
After updating the metadata schema, you will be able to mark each of the uploaded samples as either a beads-only control or an empirical sample.
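If you edit the schema by hand, a quick local check can catch malformed JSON or a mismatch between the stored values and their display labels before upload. The sketch below is only an illustration: the helper control_status_labels is hypothetical, and the embedded schema is a trimmed copy of the control_status field shown above.

```python
import json

# Trimmed copy of the control_status property from metadata.schema.json.
SCHEMA_TEXT = """
{
  "type": "object",
  "properties": {
    "control_status": {
      "type": "string",
      "enum": ["beads_only", "empirical"],
      "enumNames": ["Control (beads only)", "Empirical sample"]
    }
  }
}
"""


def control_status_labels(schema_text):
    """Map each allowed control_status value to its display label."""
    schema = json.loads(schema_text)  # raises json.JSONDecodeError if malformed
    field = schema["properties"]["control_status"]
    values, labels = field["enum"], field["enumNames"]
    if len(values) != len(labels):
        raise ValueError("enum and enumNames must have the same number of entries")
    return dict(zip(values, labels))


if __name__ == "__main__":
    for value, label in control_status_labels(SCHEMA_TEXT).items():
        print(f"{value}: {label}")
```

The values in enum (beads_only and empirical) are the same ones used in the control_status column of the sample sheet, so they should not be renamed when editing the labels.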
Parameters:
- Read / Peptide Length: The length of the sequence generated from each epitope listed in the library CSV (in most cases the read length and the peptide length should be the same)
- Num. Mismatches: The maximum number of mismatches allowed between a sequence read and a reference sequence for the read to count as a hit
- Z-score Threshold: Used in the CPM Enrichment Analysis to identify peptides that are significantly enriched in each empirical sample relative to the beads-only controls
- Enrichment Modeling: While the CPM Enrichment Analysis should work even with low numbers of beads-only controls, the Negative Binomial Modeling is only expected to perform well with larger numbers (>=10) of controls
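To give a feel for what the Num. Mismatches and Z-score Threshold parameters control, the sketch below counts mismatches between a read and a reference sequence, and computes a simple per-peptide z-score of counts per million (CPM) against beads-only controls. This is a simplified illustration only and is not the phip-flow implementation. All function names are hypothetical, and the z-score shown (sample CPM minus the control mean, divided by the control standard deviation) is an assumed textbook form.

```python
from statistics import mean, stdev


def count_mismatches(read, reference):
    """Count positions at which two equal-length sequences differ."""
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    return sum(a != b for a, b in zip(read, reference))


def is_hit(read, reference, max_mismatches):
    """A read counts as a hit if it has at most max_mismatches mismatches."""
    return count_mismatches(read, reference) <= max_mismatches


def counts_per_million(counts):
    """Scale raw peptide counts so that they sum to one million."""
    total = sum(counts.values())
    return {peptide: n * 1_000_000 / total for peptide, n in counts.items()}


def enriched_peptides(sample_counts, control_counts_list, z_threshold):
    """Return peptides whose CPM z-score against the controls meets the threshold.

    Needs at least two controls, because a standard deviation is undefined
    for a single value.
    """
    sample_cpm = counts_per_million(sample_counts)
    control_cpms = [counts_per_million(c) for c in control_counts_list]
    enriched = []
    for peptide, value in sample_cpm.items():
        background = [cpm.get(peptide, 0.0) for cpm in control_cpms]
        spread = stdev(background)
        if spread == 0:
            continue  # z-score is undefined when the controls do not vary
        if (value - mean(background)) / spread >= z_threshold:
            enriched.append(peptide)
    return enriched


if __name__ == "__main__":
    controls = [
        {"p1": 10, "p2": 10, "p3": 80},
        {"p1": 12, "p2": 8, "p3": 80},
    ]
    sample = {"p1": 60, "p2": 10, "p3": 30}
    print(is_hit("ACGT", "ACGA", max_mismatches=1))
    print(enriched_peptides(sample, controls, z_threshold=5.0))
```

In this toy version a higher Z-score Threshold reports fewer peptides, and with only a few controls the estimated spread is unstable. That fits the advice to include more beads-only controls where possible.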
Workflow Repository: github.com/matsengrp/phip-flow
Citations:
- PhIP-Seq: Mohan D, Wansley DL, Sie BM, Noon MS, Baer AN, Laserson U, Larman HB. PhIP-Seq characterization of serum antibodies using oligonucleotide-encoded peptidomes. Nat Protoc. 2018 Sep;13(9):1958-1978. doi: 10.1038/s41596-018-0025-6. Erratum in: Nat Protoc. 2018 Oct 25;: PMID: 30190553; PMCID: PMC6568263.
- VirScan: Xu GJ, Kula T, Xu Q, Li MZ, Vernon SD, Ndung'u T, Ruxrungtham K, Sanchez J, Brander C, Chung RT, O'Connor KC, Walker B, Larman HB, Elledge SJ. Viral immunology. Comprehensive serological profiling of human populations using a synthetic human virome. Science. 2015 Jun 5;348(6239):aaa0698. doi: 10.1126/science.aaa0698. PMID: 26045439; PMCID: PMC4844011.