Abstract
Metagenomics has facilitated the sequencing of viral communities inhabiting niches from across the globe. Genomes of novel viral species, particularly those in high abundance, have been successfully excavated directly from the metagenomes of complex communities. Discovery of such viral genomes often relies heavily on manual curation. Prior studies have employed a variety of different criteria when sifting through sequence data.
To provide an automated and comprehensive means of viral genome discovery, we developed the tool virMine. Key features of this tool are its ease of use and flexibility, allowing researchers to select from a variety of filters for their search. In addition to benchmark tests with synthetic communities of sequences, virMine was used to examine viral metagenomic data sets from several studies of: (1) freshwater viromes, (2) urinary viromes, and (3) gut viromes, identifying several novel viral genomes. The virMine tool provides a robust and expedient means for viral genome sequence discovery from complex community sequence data.

Methods
Viral genomes are identified & annotated by virMine via the following steps:
STEP 1: Assembly
  Sickle [1] for raw read quality control
  SPAdes [2], metaSPAdes [3], or MEGAHIT [4] for sequence assembly
STEP 2: Filter contigs
  Options include size, coverage, presence of genes or sequences of interest (e.g., CRISPR spacers)
STEP 3: Open Reading Frame (ORF) Prediction
  GLIMMER [5] scripts have been modified to better predict viral coding regions
STEP 4: Classify contigs by ORFs
  BLASTx or BLASTp against viral & non-viral data sets
  BLAST scores used to categorize contigs as viral, bacterial or unknown
OUTPUT: Contigs predicted to be viral or unknown (putative novel taxa) are written to file, as are their ORF predictions and BLAST results.
Fig. 1: Overview of virMine pipeline.
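
As an illustration of Steps 2 and 4, the following is a minimal Python sketch, not virMine's actual code. It assumes SPAdes-style contig headers (`NODE_<n>_length_<L>_cov_<C>`) and a hypothetical decision rule that compares the summed BLAST bit scores of a contig's ORFs against the viral and non-viral data sets. The function names, the thresholds, and the rule itself are assumptions made for this example; the poster does not specify how BLAST scores are combined.

```python
import re
from collections import defaultdict

# Assumed SPAdes-style header, e.g. "NODE_1_length_5000_cov_12.5".
HEADER = re.compile(r"length_(\d+)_cov_([\d.]+)")


def passes_filters(header, min_length=1000, min_coverage=3.0):
    """Step 2 sketch: keep a contig only if it meets both thresholds.

    The default thresholds are illustrative, not virMine defaults.
    """
    match = HEADER.search(header)
    if match is None:
        return False  # no length/coverage metadata to filter on
    length = int(match.group(1))
    coverage = float(match.group(2))
    return length >= min_length and coverage >= min_coverage


def classify_contigs(orf_hits, all_contigs):
    """Step 4 sketch: label each contig viral, bacterial, or unknown.

    orf_hits is an iterable of (contig_id, database, bit_score) tuples,
    where database is "viral" or "nonviral". The summed-score comparison
    below is a hypothetical rule chosen for illustration.
    """
    totals = defaultdict(lambda: {"viral": 0.0, "nonviral": 0.0})
    for contig_id, database, bit_score in orf_hits:
        totals[contig_id][database] += bit_score

    calls = {}
    for contig_id in all_contigs:
        viral = totals[contig_id]["viral"]
        nonviral = totals[contig_id]["nonviral"]
        if viral == 0 and nonviral == 0:
            calls[contig_id] = "unknown"  # no hits in either data set
        elif viral > nonviral:
            calls[contig_id] = "viral"
        else:
            calls[contig_id] = "bacterial"
    return calls


hits = [("c1", "viral", 200.0), ("c1", "nonviral", 50.0), ("c2", "nonviral", 300.0)]
print(classify_contigs(hits, ["c1", "c2", "c3"]))
```

Under this rule, contigs labeled viral or unknown would be the ones written to the output files, matching the OUTPUT stage described above.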
Benchmarking
Synthetic data sets were created from a single non-viral sequence (Pseudomonas aeruginosa UW4 (NC_019670.1)) and a single viral sequence (Pseudomonas phage PB1 (NC_011810.1)) at various concentrations using MetaSim [6]. The synthetic data sets were generated both with and without mutation; in many cases the mutated sequence sets could not be assembled. Contigs were classified using the set of all annotated RefSeq phage genes and bacterial COGs [7] (less those annotated as belonging to the mobilome; code X).
Only one filter was used in our search: contigs were required to meet a coverage threshold of 3. Fig. 2 shows the results of this analysis. When 50% or more of the reads were from the PB1 genome, the full PB1 genome was retrieved. As the N50 values for each of the assemblies (written above the corresponding bars) show, the viral genome assembled by virMine is longer than PB1's annotated genome (65,764 bp); this is an artifact of the terminal repeats in the PB1 sequence. The contigs classified as unknown were further investigated and found to correspond to phage-associated coding regions within the P. aeruginosa genome.
Fig. 2: Benchmark statistics.
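
For readers unfamiliar with the N50 statistic reported in Fig. 2, here is a minimal Python sketch of the standard definition: the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name is illustrative and is not part of virMine.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative length first reaches half of
    the total. An empty assembly returns 0.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Total length is 100; the two longest contigs (40 + 30) already cover half.
print(n50([10, 20, 30, 40]))  # 30
```

When an assembly is dominated by one contig spanning a whole genome, the N50 equals that contig's length, which is why the N50 values can be compared directly with the 65,764 bp PB1 reference.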
All of these data sets are from the study of Santiago-Rodriguez et al. [11]. virMine results demonstrate how little is currently known about the urinary microbiota. Contigs identified here often have little to no sequence similarity to known sequences (the nt database).

This table represents a subset of the data generated by Qin et al. [14] that was examined by virMine. A disparity can be seen between the number of contigs and BLAST hits. Further investigation of the BLAST hits revealed homologies to characterized prophages.
References:
[1] Joshi NA, Fass JN. (2011) (Version 1.33) [Software].
[2] Bankevich et al. (2012) J Comput Biol. 19: 455-77.
[3] Nurk et al. (2017) Genome Res. 27: 824-834.
[4] Li et al. (2015) Bioinformatics. 31: 1674-6.
[5] Delcher et al. (1999) Nucleic Acids Res. 27: 4636-41.
[6] Richter et al. (2008) PLoS One. 3: e3373.
[7] Galperin et al. (2015) Nucleic Acids Res. 43: D261-9.
[8] Rihtman et al. (2016) PeerJ. 4: e2055.
[9] Sible et al. (2015) Data Brief. 5: 9-12.
[10] Skvortsov et al. (2016) PLoS One. 11: e0150361.
[11] Santiago-Rodriguez et al. (2015) Front Microbiol. 6: 14.
[12] Reyes et al. (2010) Nature. 466: 334-8.
[13] Dutilh et al. (2014) Nat Commun. 5: 4498.
[14] Qin et al. (2010) Nature. 464: 59-65.