Microbial Pangenome Anvi'o
Because microbial genomes are highly diverse, the term "pangenome" is used to refer to the aggregate gene content of a collection of genomes. When inspecting a bacterial pangenome, it is common to find large groups of genes that are present in only a subset of the genomes. These blocks of genomic content often define clades or subspecies, but they can also be associated with horizontal gene transfer.
The anvi'o pangenome analysis tool is widely used in the field to summarize the content of a collection of microbial genomes. The analysis includes:
- Identifying gene clusters (similar genes across genomes)
- Estimating relationships of genomes based on gene clusters
- Annotating the likely function of amino acid sequences
- Identifying gene clusters or functions enriched in a subset of genomes
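As a rough illustration of the first two steps above, the sketch below builds a presence/absence profile for each genome from a set of gene clusters, separates clusters found in every genome from those found in only some, and computes a Euclidean distance between genome profiles. The cluster IDs, genome names, and data layout are hypothetical and are not anvi'o output; anvi'o performs these steps internally with its own data structures.

```python
from math import sqrt

# Hypothetical toy data: gene cluster ID -> genomes in which it was found.
gene_clusters = {
    "GC_0001": {"genome_A", "genome_B", "genome_C"},
    "GC_0002": {"genome_A", "genome_B"},
    "GC_0003": {"genome_C"},
    "GC_0004": {"genome_A", "genome_B", "genome_C"},
}

genomes = sorted(set().union(*gene_clusters.values()))
cluster_ids = sorted(gene_clusters)

# Presence/absence vector per genome (1 = cluster present, 0 = absent).
presence = {
    g: [1 if g in gene_clusters[c] else 0 for c in cluster_ids]
    for g in genomes
}

# Clusters found in every genome versus clusters found in only a subset.
core = [c for c in cluster_ids if len(gene_clusters[c]) == len(genomes)]
accessory = [c for c in cluster_ids if c not in core]


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Pairwise distances between genome profiles.
distances = {
    (g1, g2): euclidean(presence[g1], presence[g2])
    for i, g1 in enumerate(genomes)
    for g2 in genomes[i + 1:]
}

print("core:", core)
print("accessory:", accessory)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

In this toy data, genome_A and genome_B share an identical profile, so their distance is zero, while genome_C differs from both in two clusters.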
The anvi'o software suite also provides an interactive viewer which can be installed on your computer to provide a rich visualization interface using the output from the pangenome analysis tool.
This workflow supports combining multiple input datasets in a single analysis.
User Guide
Analysis Parameters:
- mcl_inflation: Gene Clustering Threshold (mcl_inflation)
  - Ranges from 2 (approximately species-level) to 10 (approximately strain-level)
  - default: 2
- minbit: Sequence Similarity Threshold (minbit)
  - Threshold of sequence similarity used to group genes (ranges 0-1)
  - default: 0.5
- min_occurrence: Minimum Occurrence
  - Filter out any genes which are found in fewer than this number of genomes
  - default: 1
- min_alignment_fraction: Minimum Alignment Fraction
  - Any pairwise genome ANI scores below this threshold will be set to zero
  - default: 0.0
- category_name: Compare Genomes By
  - Optional genome metadata attribute used to compare genomes
- gene_enrichment: Calculate Gene-Level Enrichment
  - Instead of using functional groupings, enrichment scores can be calculated on a per-gene basis
  - default: false
- distance: Distance Metric
  - Metric used to summarize genome similarity
  - default: euclidean
- linkage: Linkage Method
  - Method used to build tree of genomes
  - default: ward
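The parameters and defaults above can be summarized as a small Python sketch that merges user overrides onto the listed defaults and checks the documented ranges. The helper names (`DEFAULTS`, `resolve_params`, `minbit_score`) are hypothetical and are not part of the workflow or of anvi'o. The bounds on `min_occurrence` and `min_alignment_fraction` are assumptions, since the list does not state them. The `minbit_score` function illustrates the minbit heuristic as it is commonly described for anvi'o: the bit score of an alignment between two sequences divided by the smaller of their two self-alignment bit scores.

```python
# Defaults copied from the parameter list; category_name has no listed
# default, so it is treated here as optional (None).
DEFAULTS = {
    "mcl_inflation": 2,
    "minbit": 0.5,
    "min_occurrence": 1,
    "min_alignment_fraction": 0.0,
    "category_name": None,
    "gene_enrichment": False,
    "distance": "euclidean",
    "linkage": "ward",
}


def resolve_params(overrides=None):
    """Merge overrides onto DEFAULTS and check the documented ranges."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}

    # Ranges stated in the parameter list.
    if not 2 <= params["mcl_inflation"] <= 10:
        raise ValueError("mcl_inflation should be between 2 and 10")
    if not 0 <= params["minbit"] <= 1:
        raise ValueError("minbit should be between 0 and 1")

    # Assumed bounds (not stated in the parameter list).
    if params["min_occurrence"] < 1:
        raise ValueError("min_occurrence should be at least 1")
    if not 0 <= params["min_alignment_fraction"] <= 1:
        raise ValueError("min_alignment_fraction should be between 0 and 1")
    return params


def minbit_score(bitscore_ab, self_bitscore_a, self_bitscore_b):
    """Minbit heuristic: pairwise bit score over the smaller self bit score."""
    return bitscore_ab / min(self_bitscore_a, self_bitscore_b)


# Example: tighten clustering for closely related genomes.
params = resolve_params({"mcl_inflation": 10})

# Example: a hit with bit score 100 between sequences whose self-alignment
# bit scores are 200 and 250 passes the default threshold only marginally.
score = minbit_score(100, 200, 250)
passes = score >= params["minbit"]
```

Validating the values before launching a run is a design choice that surfaces out-of-range inputs early; the workflow itself may apply different or additional checks.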
Workflow Repository: github.com/FredHutch/nf-anvio-pangenome
Citations:
- Eren, A.M., Kiefl, E., Shaiber, A. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat Microbiol 6, 3–6 (2021). https://doi.org/10.1038/s41564-020-00834-3