Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
net/publication/301335465
CITATIONS READS
26 274
8 authors, including:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Anita Kloss-Brandstätter on 19 April 2016.
Received February 12, 2016; Revised March 25, 2016; Accepted April 2, 2016
* To whom correspondence should be addressed. Tel: +43 512 9003 70579; Fax: +43 512 9003 73561; Email: sebastian.schoenherr@i-med.ac.at
†
These authors contributed equally to the paper as first authors.
Present address: Dr Sebastian Schönherr, Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Medical
University of Innsbruck, Innsbruck 6020, Austria.
C The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which
permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
journals.permissions@oup.com
2 Nucleic Acids Research, 2016
BAM Hetero-
Files Variants
plasmies
BAM
Files
QC Haplo- Annotated
Report groups Heteroplasmies
Parallel analysis of the BAM file and variant detection mic variants for both profiles into consideration, render-
ing the classification more reliable, and helping tracing the
For homoplasmic as well as heteroplasmic variant detec-
origin of the sample mix-up. Thereby homoplasmic vari-
tion, HadoopBAM (22) is applied to split input BAM files.
ants are shared haplogroup defining polymorphisms on the
For each chunk, mtDNA-Server filters reads with a map-
main branch from the rCRS to the mitochondrial Eve (mt-
ping quality score <20 (Phred score) and a read length <25
MRCA). Subsequently, heteroplasmic variants are divided
(9). BAM reads marked as duplicates are filtered within
into major and minor profiles, which are segregated into dif-
this step. Additionally, mtDNA-Server excludes all reads
ferent branches in case of contamination (see Supplemen-
with an alignment Phred score ≤30 and applies per-BAQ
tary Figure S1).
to all reads by default. For this purpose, the GATK (23)
BAQ implementation has been adapted to work with the
circular nature of mitochondrial genomes. For each passed
read, all bases with a quality Phred score <20 are filtered. Web service
Since mtDNA-Server detects point heteroplasmies, recali- mtDNA-Server uses Cloudgene as the underlying platform
bration and re-alignment did not affect the result quality to build a scalable web service including a graphical user
and has been excluded. Finally, all passed bases for each interface. Cloudgene describes a high-level workflow sys-
site are counted per strand (A, C, G, T, N (unknown base) tem for Apache Hadoop designed as a web application. The
or d (deletion)). For heteroplasmy detection, several fil- API provides methods for the execution and monitoring of
ters and methods are applied: first, mitochondrial hotspots MapReduce jobs and is conceived as an additional layer be-
around 309 and 315 as well as 3107 according to the rCRS tween Hadoop and the web client. mtDNA-Server has been
Data output currently available web services were not capable of an-
alyzing our mix-ups due to their file size (>50 MB), we
The final output generated by mtDNA-Server provides sev-
compared mtDNA-Server with LoFreq (30), a software for
eral downloadable files and interactive reports. This in-
ultra-sensitive variant detection. As Tables 2 and 3 show,
cludes (i) an interactive HTML report summarizing all find-
both LoFreq and mtDNA-Server report heteroplasmies re-
ings, (ii) detected heteroplasmic and homoplasmic sites, (iii)
liable. They show a similar level of specificity and pre-
the HaploGrep input and final classification file and (iv) the
cision, but LoFreq is slightly outperformed by mtDNA-
raw pileup file including base position counts per sample.
Server regarding sensitivity on both NGS devices. Further-
The HTML report itself presents several quality control
more, mtDNA-Server classifies mix-ups into haplogroups
(QC) measures: (I) a check for the selected base-quality