Gabriel Valiente
University of Zaragoza
18 May 2012
Abstract
Phylum Streptophyta
Class Streptophytina
Order Solanales
Family Solanaceae
Genus Solanum
Phylum Chordata
Class Mammalia
Order Primates
Family Hominidae
(Schloss, 2009)
Simulating metagenomic samples I
Different metagenomic analysis pipelines produce different results,
and standardized simulated data are essential to their evaluation.
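[Figure: the subtree Ti, with leaf sets Ni and Mi and the regions FPi and TPi.]
Precision P and recall R:
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}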
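The two ratios above can be evaluated directly from counts. The following is a minimal sketch rather than code from the slides: the function names `precision` and `recall` are illustrative, and returning 0.0 for an empty denominator is an assumed convention that the slides do not specify.

```python
def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP).

    Assumption: returns 0.0 when TP + FP is zero (no positive assignments).
    """
    total = tp + fp
    return tp / total if total else 0.0


def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN).

    Assumption: returns 0.0 when TP + FN is zero (nothing to recover).
    """
    total = tp + fn
    return tp / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(precision(tp=3, fp=1))
    print(recall(tp=3, fn=2))
```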
Assigning sequence reads V
Given a reference taxonomy T, a set R of sequence reads, and a
threshold value k of sequence similarity,
• Let Tij be the subtree of T rooted at the jth node of Ti
• Let Mij be the leaves of Tij matching Ri with up to k
mismatches
• Let Nij be the leaves of Tij not matching Ri with up to k
mismatches
For the ith read and the jth node of Ti, the leaves of Ti can be
partitioned into the following four subsets:
• TPij = Mij (true positives)
• FPij = Nij (false positives)
• TNij = Ni \ Nij (true negatives)
• FNij = Mi \ Mij (false negatives)
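The four-way partition can be sketched with plain set operations. This is an illustrative sketch, not the implementation behind the slides. It assumes that Mi and Ni are the leaves of Ti that do and do not match read Ri with up to k mismatches (the analogue of Mij and Nij for the whole of Ti), and that the matching step has already been carried out elsewhere. The names `Partition` and `partition_leaves` and the leaf labels are hypothetical.

```python
from typing import Iterable, NamedTuple


class Partition(NamedTuple):
    tp: frozenset  # TPij = Mij
    fp: frozenset  # FPij = Nij
    tn: frozenset  # TNij = Ni \ Nij
    fn: frozenset  # FNij = Mi \ Mij


def partition_leaves(
    leaves_ti: Iterable[str],
    leaves_tij: Iterable[str],
    matching: Iterable[str],
) -> Partition:
    """Partition the leaves of Ti with respect to the subtree Tij.

    leaves_ti  -- all leaves of Ti
    leaves_tij -- leaves of the subtree Tij (must be a subset of leaves_ti)
    matching   -- leaves that match the read with up to k mismatches
                  (assumed to be computed beforehand)
    """
    ti = frozenset(leaves_ti)
    tij = frozenset(leaves_tij)
    if not tij <= ti:
        raise ValueError("leaves of Tij must be leaves of Ti")

    m_i = ti & frozenset(matching)  # Mi (assumed definition)
    n_i = ti - m_i                  # Ni (assumed definition)
    m_ij = tij & m_i                # Mij
    n_ij = tij & n_i                # Nij

    return Partition(tp=m_ij, fp=n_ij, tn=n_i - n_ij, fn=m_i - m_ij)


if __name__ == "__main__":
    # Hypothetical leaf labels for a five-leaf subtree Ti.
    part = partition_leaves(
        leaves_ti={"a", "b", "c", "d", "e"},
        leaves_tij={"a", "b", "c"},
        matching={"a", "b", "d"},
    )
    print({name: sorted(group) for name, group in part._asdict().items()})
```

The four sets are pairwise disjoint and their union is the leaf set of Ti, so their sizes can serve as the TP, FP, TN, and FN counts for the jth node.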
Assigning sequence reads VI
[Figure: the subtree Tij within Ti, with the leaf sets Ni, Nij, Mij, and Mi.]
Phylum Crenarchaeota
Class Thermoprotei
Order
Family
Genus
Species
References I
D. Alonso-Alemany, J. C. Clemente, J. Jansson, and G. Valiente.
Taxonomic assignment in metagenomics with TANGO.
EMBnet.journal, 17(2):46–50, 2011.
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J.
Lipman. Basic Local Alignment Search Tool. J. Mol. Biol., 215
(3):403–410, 1990.
K. E. Ashelford, N. A. Chuzhanova, J. C. Fry, A. J. Jones, and
A. J. Weightman. At least 1 in 20 16S rRNA sequence records
currently held in public repositories is estimated to contain
substantial anomalies. Appl. Environ. Microbiol., 71(12):
7724–7736, 2005.
J. G. Caporaso, J. Kuczynski, J. Stombaugh, et al. QIIME allows
analysis of high-throughput community sequencing data. Nat.
Methods, 7(5):335–336, 2010.
References II
J. C. Clemente, J. Jansson, and G. Valiente. Accurate taxonomic
assignment of short pyrosequencing reads. In Proc. 15th Pacific
Symp. Biocomputing, volume 15, pages 3–9. World Scientific,
2010.
J. C. Clemente, J. Jansson, and G. Valiente. Flexible taxonomic
assignment of ambiguous sequencing reads. BMC
Bioinformatics, 12:8, 2011.
J. R. Cole, Q. Wang, E. Cardenas, J. Fish, B. Chai, R. J. Farris,
A. S. Kulam-Syed-Mohideen, D. M. McGarrell, T. Marsh, G. M.
Garrity, and J. M. Tiedje. The Ribosomal Database Project:
Improved alignments and new tools for rRNA analysis. Nucleic
Acids Res., 37(D):141–145, 2009.
J. Fischer and D. H. Huson. New common ancestor problems in
trees and directed acyclic graphs. Inform. Process. Lett., 110
(8–9):331–335, 2010.
References III
D. H. Huson, A. F. Auch, J. Qi, and S. C. Schuster. MEGAN
analysis of metagenomic data. Genome Res., 17(3):377–386,
2007.
National Research Council. The New Science of Metagenomics:
Revealing the Secrets of Our Microbial Planet. The National
Academies Press, Washington, DC, 2007.
P. Ribeca and G. Valiente. Computational challenges of sequence
classification in microbiomic data. Brief. Bioinform., 12(6):
614–625, 2011.
D. C. Richter, F. Ott, A. F. Auch, R. Schmid, and D. H. Huson.
MetaSim: A sequencing simulator for genomics and
metagenomics. PLoS ONE, 3(10):e3373, 2008.
P. D. Schloss. A high-throughput DNA sequence aligner for
microbial ecology studies. PLoS ONE, 4(12):e8230, 2009.
References IV
P. D. Schloss, S. L. Westcott, T. Ryabin, et al. Introducing
mothur: Open-source, platform-independent,
community-supported software for describing and comparing
microbial communities. Appl. Environ. Microbiol., 75(23):
7537–7541, 2009.
Y. Van de Peer, P. De Rijk, J. Wuyts, T. Winkelmans, and R. De
Wachter. The European small subunit ribosomal RNA database.
Nucleic Acids Res., 28(1):175–176, 2000.