
Articles

Evolution and transmission of drug-resistant tuberculosis in a Russian population
Nicola Casali1, Vladyslav Nikolayevskyy1, Yanina Balabanova1, Simon R Harris2, Olga Ignatyeva3,
Irina Kontsevaya3, Jukka Corander4, Josephine Bryant2, Julian Parkhill2, Sergey Nejentsev5, Rolf D Horstmann6,
Timothy Brown1 & Francis Drobniewski1,7
2014 Nature America, Inc. All rights reserved.

The molecular mechanisms determining the transmissibility and prevalence of drug-resistant tuberculosis in a population
were investigated through whole-genome sequencing of 1,000 prospectively obtained patient isolates from Russia. Two-thirds
belonged to the Beijing lineage, which was dominated by two homogeneous clades. Multidrug-resistant (MDR) genotypes
were found in 48% of isolates overall and in 87% of the major clades. The most common rpoB mutation was associated with
fitness-compensatory mutations in rpoA or rpoC, and a new intragenic compensatory substitution was identified. The proportion
of MDR cases with extensively drug-resistant (XDR) tuberculosis was 16% overall, with 65% of MDR isolates harboring eis
mutations, selected by kanamycin therapy, which may drive the expansion of strains with enhanced virulence. The combination
of drug resistance and compensatory mutations displayed by the major clades confers clinical resistance without compromising
fitness and transmissibility, showing that, in addition to weaknesses in the tuberculosis control program, biological factors drive
the persistence and spread of MDR and XDR tuberculosis in Russia and beyond.

Tuberculosis is the second leading cause of death from an infectious disease after HIV. In 2011, there were an estimated 8.7 million new cases and 1.4 million deaths from the disease1. The increasing prevalence of drug resistance is a major public health concern that threatens the progress made in controlling drug-sensitive tuberculosis2. Globally, 4% of new cases and 20% of previously treated cases are estimated to have MDR tuberculosis, defined as resistance to at least rifampicin and isoniazid, with the highest proportions in Eastern Europe and Central Asia1. MDR cases require prolonged, costly treatment with toxic second-line drugs, and rates of treatment failure and mortality are high2–4. XDR strains, which are MDR strains with additional resistance to any fluoroquinolone and at least one second-line injectable agent (kanamycin, amikacin or capreomycin), have now been found in every region of the world1. Globally, the proportion of MDR tuberculosis cases with XDR tuberculosis has reached 9%. In the UK, 26 XDR cases were reported between 1995 and 2011; 45% (9/20) of the patients with known country of birth originated from Eastern Europe5. Recent years have seen an ominous accumulation of reports of totally drug-resistant strains, which are not susceptible to any tested drugs6.

Prevalence of drug-resistant tuberculosis is dependent on the rate of acquisition of resistance-conferring mutations (acquired resistance) and the rate of transmission of drug-resistant strains (primary resistance). In Mycobacterium tuberculosis, drug resistance arises through chromosomal mutation, typically resulting in a fitness cost seen as a reduced growth rate in vitro7. Fitness cost generally inversely correlates with the frequency of a mutation in clinical isolates8. Compensatory mutations mitigating the deleterious effects of resistance-conferring mutations are also important in determining the transmissibility of specific genotypes9. Studies using molecular epidemiological clustering rates to assess the transmission cost of resistance-associated genotypes report varying results10,11, suggesting that fitness costs may be affected by epistasis: that is, the phenotypic effect of a mutation depends on the presence or absence of other mutations in the same genome12.

Whole-genome sequencing offers the power to track the evolutionary mechanisms that promote the development and transmission of drug resistance within pathogen populations with unparalleled resolution. Evidence for adaptive selection in response to antibiotic therapy can be found by identifying homoplasies (mutations arising independently multiple times within a phylogeny) or loci that are subject to frequent mutation13–15. By identifying all changes that occur in a genome, it is possible to uncover co-occurring polymorphisms that contribute to resistance phenotypes or signify epistatic effects12.

The high incidence of MDR and XDR tuberculosis in Samara3,16, a region with over 3 million citizens in Russia, afforded an opportunity to study the emergence and spread of antibiotic resistance within a population. In this prospective study, the largest of its kind

1Public Health England (PHE) National Mycobacterium Reference Laboratory, Clinical TB and HIV Group, Blizard Institute, Queen Mary University of London,
London, UK. 2Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. 3Samara Oblast Tuberculosis Dispensary, Samara, Russian
Federation. 4Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland. 5Department of Medicine, University of Cambridge, Cambridge,
UK. 6Department of Molecular Medicine, Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany. 7Department of Infectious Diseases, Imperial College,
London, UK. Correspondence should be addressed to F.D. (f.drobniewski@qmul.ac.uk).

Received 16 September 2013; accepted 2 January 2014; published online 26 January 2014; doi:10.1038/ng.2878

Nature Genetics ADVANCE ONLINE PUBLICATION 



Figure 1 Coverage of the population of patients with tuberculosis. (a) The locations of Samara Oblast in Russia (red) and the Baltic States (Lithuania (Li), Latvia (Lv) and Estonia (E)) are shown. (b,c) The number of sequenced patient isolates from each territory (green) and city (blue; Samara City (Sm), Togliatti (T) and Syzran (Sz)) of Samara Oblast (b) or district of Samara City (c) is shown inside the corresponding circle. The area of each circle reflects coverage of the region (the number of isolates sequenced relative to the number of tuberculosis cases notified).

yet reported, we used whole-genome sequencing to investigate the molecular mechanisms underlying resistance, fitness compensation and positive epistasis that together determine the transmissibility and prevalence of drug-resistant strains. Comparative analysis with XDR isolates from the UK addressed the origin of highly drug-resistant tuberculosis in this low-prevalence country.
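The MDR and XDR definitions given in the introduction can be written as a small rule. The sketch below is illustrative only: the function name `classify` and the fluoroquinolone names are assumptions made for the example (the text says only "any fluoroquinolone"), and it works on a set of drug names rather than on the genotype calls used in the study.

```python
# Illustrative drug sets. The injectable agents are those named in the text;
# the fluoroquinolone names are assumptions added for this example.
FLUOROQUINOLONES = {"ofloxacin", "moxifloxacin", "levofloxacin"}
SECOND_LINE_INJECTABLES = {"kanamycin", "amikacin", "capreomycin"}


def classify(resistant_to):
    """Return 'XDR', 'MDR' or 'non-MDR' for a collection of drug names."""
    drugs = {d.lower() for d in resistant_to}
    # MDR: resistance to at least rifampicin and isoniazid.
    if not {"rifampicin", "isoniazid"} <= drugs:
        return "non-MDR"
    # XDR: MDR plus any fluoroquinolone and at least one injectable agent.
    has_fluoroquinolone = bool(drugs & FLUOROQUINOLONES)
    has_injectable = bool(drugs & SECOND_LINE_INJECTABLES)
    return "XDR" if has_fluoroquinolone and has_injectable else "MDR"


profile = classify({"rifampicin", "isoniazid", "kanamycin", "ofloxacin"})
```

Under these rules an isolate resistant to rifampicin, isoniazid and kanamycin but to no fluoroquinolone is classified as MDR, not XDR.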

RESULTS
Population structure
During a 2-year period (2008–2010), 2,348 M. tuberculosis isolates were prospectively collected from individual patients living in Samara, Russia. The genomes of 1,000 isolates were sequenced. Comparison of patient epidemiological data indicated that this sample was representative of the entire population and covered the distribution of patients with tuberculosis across the whole region (Fig. 1 and Supplementary Table 1). For comparison, we selected 28 sequences from a study of over 2,000 London-based patients with tuberculosis, originating from 90 different countries, as representatives of the global population17,18. We included five phenotypically XDR strains isolated from UK patients in 2011, as well as an isolate from Estonia representing the clone that dominates the M. tuberculosis population of this country19.

Mapping reads for each isolate against the reference sequence H37Rv identified a total of 32,445 SNPs in nonrepetitive regions of the genome, including 238 SNPs associated with drug resistance. These variable sites were used to reconstruct a maximum-likelihood phylogeny (Fig. 2 and Supplementary Data Set 1). Tree topology was consistent with the global phylogenetic structure of M. tuberculosis sensu stricto comprising four main lineages20,21. Of the 1,000 Samaran isolates, 642 belonged to the Beijing lineage, 355 belonged to the Euro-American lineage, 2 belonged to the Central Asian (CAS) lineage and a single isolate belonged to the East African–Indian (EAI) lineage, reflecting the proportions seen in the whole Samaran patient population.

In comparison to the disparate Beijing isolates from the UK, the majority of Samaran Beijing sequences formed a monophyletic group that we term the East European sublineage18 (Supplementary Fig. 1), consistent with a single relatively recent expansion of this lineage in the region. Bayesian population genetic clustering22 defined the two largest clades nested within the Beijing lineage, clade A and clade B, comprising 264 and 119 isolates, respectively (Supplementary Fig. 2). Outside of these clades, the majority of the remaining Beijing isolates were members of smaller clusters; 60% (387/642) of all Beijing isolates differed from their last common ancestor by 5 or fewer SNPs (Supplementary Fig. 3). The western part of Samara is geographically isolated by the River Volga. Comparison of the isolate population in the west to that in the rest of the region showed a significant reduction in the proportion of clade A isolates (18% versus 28%, P = 0.04; Supplementary Fig. 4 and Supplementary Table 2).

The Euro-American lineage showed significantly higher nucleotide diversity (π = 0.0042, s.d. = 0.00018, 95% confidence interval (CI) = 0.0038–0.0045) than the Beijing lineage (π = 0.0022, s.d. = 0.00039, 95% CI = 0.0015–0.0030). Fewer Euro-American isolates were members of closely related clusters; 39% (137/355) differed from their last common ancestor by 5 or fewer SNPs (P < 0.001). Although global diversity within the Euro-American lineage has not yet been well characterized, Homolka et al.23 identified eight SNP-based sublineages that were broadly concordant with molecular fingerprint–based classifications. We identified four of these sublineages in Samara (Haarlem, LAM, Ural and S-type), whereas 36% of isolates could not be classified by this scheme (Fig. 2).

Short genetic distances between isolates have been used to infer the likelihood of direct transmission24. In our data set, four pairs of patients with sequenced isolates shared households. Within three

Figure 2 Maximum-likelihood phylogeny of 1,035 M. tuberculosis isolates based on 32,445 variable sites. The four M. tuberculosis lineages (Beijing, CAS, Euro-American and EAI) are indicated. The Euro-American SNP-defined sublineages23 and the major Beijing clades are shaded. The ancestral node of the Beijing East European sublineage is indicated with a star. Radial dotted lines show the positions of isolates from the UK; those with an XDR phenotype are marked with white circles. The Estonian strain is indicated by a filled blue circle. The position of the reference sequence H37Rv is marked by R. The East European sublineage, clade A and clade B had 100% bootstrap support (Supplementary Data Set 1).
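The lineage comparison in the text rests on nucleotide diversity (π). This section does not say how π was computed, so the following is only a sketch of the textbook definition (mean number of pairwise differences per site over aligned sequences); the function name and the toy sequences are made up for illustration.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for every unordered pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy alignment (not study data): three sequences of eight sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
pi = nucleotide_diversity(toy_alignment)
```

For the toy alignment the three pairs differ at 1, 2 and 1 sites, so π is 4 differences over 3 pairs × 8 sites. A lineage made up of near-identical isolates, as described for the Beijing clades, would give a lower value than a more diverse lineage.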


Figure 3 Phylogenetic distribution of resistance-conferring and compensatory genotypes. The phylogeny of 1,000 Russian isolates is depicted on the left; lineages are colored as in Figure 2. The first 16 columns depict drug resistance loci. P followed by a subscript gene name indicates mutations affecting the promoter region of the gene. For the 16S rRNA gene, rrsstr refers to the 530 stem loop and 915 regions involved in streptomycin resistance49, and rrsinj refers to downstream regions associated with resistance to second-line injectable agents36. Colored bands represent different polymorphisms and include previously identified and new mutations described in this study. The last three columns show nonsynonymous SNPs in the RNA polymerase genes rpoABC, excluding those in the RRDR. Genotypes shown are provided in full in Supplementary Table 4.

of these households, the patient isolates were almost identical, with zero, two and three SNPs separating each pair, consistent with intrahousehold transmission or infection from a common source24. Patients in the fourth household were infected with unrelated isolates (183 SNPs different). In addition to the household pair, 30 further pairs of isolates, 3 clusters of 3 isolates and 1 cluster of 5 isolates had no SNP differences within the cluster. Patients with identical isolates were resident in the same region of Samara in only 20 of these 35 clusters. By mapping patient addresses, we found that patients with identical strains lived up to 136 km apart (Supplementary Fig. 5).

XDR tuberculosis isolates from four UK patients who originated from the Baltic States were members of the East European sublineage, with two belonging to clade B and one belonging to clade A, suggesting that infections may have been acquired during stays in their country of origin or within sympatric communities in the UK (Fig. 2). In support of this conjecture, the fifth XDR isolate, from a Chinese patient, was not a member of the East European sublineage. The Estonian isolate was also a member of clade B. Correlation of variable-number tandem repeat (VNTR) fingerprint data indicated that this strain is the same one that predominates across northwestern Russia18,19,25. The East European isolates were remarkably closely related, with isolates from different countries separated by as few as 13 SNPs (Supplementary Table 3).

Prevalence of MDR genotypes
A maximum-likelihood phylogeny of the 1,000 Russian isolates was reconstructed and annotated with drug resistance genotypes (Fig. 3 and Supplementary Tables 4–6). The most commonly mutated drug resistance locus, in codon 315 of katG, which confers high-level resistance to isoniazid26, was substituted in 74% (478/642) of Beijing isolates and in 30% (106/355) of Euro-American isolates (P < 0.001; Fig. 4 and Supplementary Table 7). A new nonsense SNP in codon 668 also mediated resistance, consistent with the requirement of KatG for activation of the pro-drug. All (478) of the Beijing katG mutants encoded a p.Ser315Thr substitution, whereas 11% (12/106) of Euro-American isolates encoded 1 of 3 alternative substitutions in codon 315 (P < 0.001).

A total of 70 isolates had mutations in the promoter of the fabG1-inhA operon that confer low-level cross-resistance to isoniazid and its structural analogs, ethionamide and prothionamide27. As determined using the phylogenetic reconstruction, the inhA mutation arose in a katG mutant in 44 of the isolates. It is improbable that isoniazid therapy would select for mutations conferring low-level resistance in the presence of a SNP conferring high-level resistance, indicating that the majority of these promoter SNPs were acquired in response to therapy with ethionamide or prothionamide.

Mutations in the 81-bp rifampicin resistance–determining region (RRDR) of rpoB are accurate predictors of rifampicin resistance in M. tuberculosis28. Within this region, we identified 20 nonsynonymous SNPs and 2 small deletions, which affected 70% (430/642) of Beijing and 19% (67/355) of Euro-American isolates (P < 0.001). The most common rifampicin resistance genotype, encoding a p.Ser450Leu substitution, was found in 90% (390/435) of Beijing isolates with RRDR mutations and in 67% (45/67) of Euro-American isolates (P < 0.001).


Figure 4 Prevalence of drug resistance mutations and association with lineage. The proportion of isolates harboring polymorphisms at each drug resistance locus was subdivided by lineage (Beijing and Euro-American). Asterisks indicate significant differences between lineages (P < 0.05; Supplementary Table 7). Data are based on the polymorphisms detailed in Supplementary Table 4.

On the basis of these genotypes, 66% (422/642) of Beijing and 17% (61/355) of Euro-American isolates have a predicted MDR phenotype (P < 0.001). The proportion of isolates that were MDR in clade A and clade B combined was significantly higher than for the rest of the Beijing lineage (332/383 versus 90/259, P < 0.001).

Compensatory mutations in MDR isolates
We investigated the occurrence of compensatory mutations in rpoA and rpoC18 in isolates carrying rifampicin resistance mutations. We identified 14 different nonsynonymous SNPs in rpoA, 11 of which were found in isolates containing the rpoB mutation encoding p.Ser450Leu (equivalent to the p.Ser531Leu substitution in Escherichia coli), emerging significantly more frequently in this genetic background (16/435 versus 4/565, P = 0.001). Mutations affecting the most commonly substituted residue, Thr187, were acquired independently at least seven times, providing strong evidence that rpoA is subject
Figure 5 Distribution of rifampicin resistance and compensatory amino acid substitutions in the RNA polymerase subunits RpoA, RpoB and RpoC. Resistance-conferring mutations are clustered in the RRDR region of rpoB. All other substitutions depicted are putative compensatory alterations that co-occurred with the p.Ser450Leu alteration in RpoB.
Substitutions labeled in the figure, by subunit:
RpoA: G31S, G31A, K177M, T181A, V183G, E184D, T187P, T187A, D190G, L304R, S307L
RpoB (RRDR): S428R, L430P, L430R, S431G, Q432P, M434V, D435Y, D435V, D435G, H445N, H445C, H445D, H445L, H445Y, H445Q, H445R, S450P, S450L, S450W, L452P, G456S
RpoB (outside the RRDR): L42F, P45S, E82G, T399A, P479T, I488V, I491V, V496M, V496L, F503S, D571A, H723Y, L731P, E761D, R827C, H835P, H835R
RpoC: G332S, G332R, G332C, N416T, N416S, S428A, V431M, G433S, G433C, P434Q, P434R, K445R, L449V, F452C, V483A, V483G, W484G, D485N, D485Y, I491V, I491T, L516P, V517L, G519D, A521D, H525Q, L527V, R572H, P678R, N698H, N698S, N698K, A734V, D747A, Q761R, R770H, N826K, I832V, S838C, L847R, Y849C, D943G, G945V, P1040T, P1040S, P1040R, I1046M, V1252L
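Many results in this section compare two proportions and report a P value, for example 332/383 MDR isolates in clades A and B versus 90/259 in the rest of the Beijing lineage. This section does not name the statistical test used, so the sketch below assumes a two-sided Fisher's exact test purely for illustration; the function name is hypothetical and only the Python standard library is used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small tolerance guards against floating-point ties.
    total = sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Counts from the text: MDR versus non-MDR isolates in clades A and B
# combined (332 of 383) and in the rest of the Beijing lineage (90 of 259).
p_value = fisher_exact_two_sided(332, 383 - 332, 90, 259 - 90)
```

With these counts the sketch gives a P value far below 0.001, in line with the reported significance. It should not be read as a reproduction of the authors' analysis.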


to positive selective pressure. Clustering of SNP sites within three small regions suggests that the encoded residues are important for interaction with the rifampicin-binding pocket (Fig. 5). Eighteen of the 58 nonsynonymous SNPs identified in rpoC were homoplasic. The nonsynonymous SNPs in rpoC were significantly more likely to arise in isolates harboring an rpoB mutation encoding p.Ser450Leu than in those with wild-type RRDR or other resistance-conferring mutations (76/435 versus 11/565, P < 0.001; Supplementary Table 8). In total, 36 of the 59 amino acid substitutions in RpoA or RpoC found to be associated with the p.Ser450Leu alteration in RpoB had not previously been reported18,29.

Overall, 47% (170/390) of isolates harboring the mutation encoding p.Ser450Leu had a putative compensatory mutation in rpoA or rpoC. However, such compensatory mutations were remarkably infrequent in clade A (Fig. 3 and Supplementary Fig. 6); 89% (150/169) of non–clade A Beijing isolates harboring p.Ser450Leu had a nonsynonymous SNP in rpoA or rpoC compared to 9% (20/221) of clade A isolates (P < 0.001). This difference in frequency was unexpected given the obvious epidemiological success of this clade. Thus, we predicted that the large distal cluster of clade A isolates carrying an rpoB mutation encoding p.Ser450Leu harbored alternative compensatory mutations that restored fitness. By inspection of the phylogeny, we deduced the branch on which this putative mutation likely occurred; one of the four SNPs on this branch resulted in a p.Glu761Asp substitution in RpoB (Supplementary Fig. 6). Intragenic compensatory mutations in rpoB harboring resistance-conferring mutations have been observed in experimentally evolved E. coli30, Pseudomonas aeruginosa31 and Salmonella enterica32. We surmise that the p.Glu761Asp substitution provides an analogous fitness benefit in M. tuberculosis strains carrying the rpoB mutation encoding p.Ser450Leu.

In addition to the p.Glu761Asp substitution and excluding RRDR polymorphisms, a further 26 nonsynonymous SNPs were identified in rpoB. Sixteen of these co-occurred with a mutation encoding p.Ser450Leu (Fig. 5) and were significantly more likely to be found in isolates without alternative compensatory mutations (24/37 versus 2/396, P < 0.001). Multiple substitution events in codons 496 and 835 provide confirmation that regions of rpoB other than the RRDR are under selective pressure. Including all putative compensatory SNPs, 97% (421/435) of isolates carrying an rpoB mutation encoding p.Ser450Leu harbored additional rpoABC mutations, which did not appear in any isolates with wild-type RRDR and may mitigate the deleterious effect of this resistance-conferring mutation.

Sherman et al.33 proposed that loss of catalase-peroxidase function in isoniazid-resistant katG mutants was compensated by upregulation of alkyl hydroperoxidase, ahpC. We identified four polymorphic sites within the ahpC regulatory region, including two homoplasies. Of the nine isolates harboring ahpC SNPs, four carried a katG mutation encoding p.Ser315Thr, four had a rare katG mutation (encoding p.Ser315Gly or p.Trp668*) and one had wild-type katG. The ahpC SNPs were significantly more likely to arise in isolates with unusual katG mutations (2/12 versus 3/572, P = 0.004), supporting the theory that the p.Ser315Thr alteration has low or no fitness cost34.

Prevalence of XDR genotypes
We ascertained that 17% (71/422) of Beijing MDR and 7% (4/61) of Euro-American MDR isolates had genotypes predicting an XDR phenotype (P = 0.046).

Mutations in the eis promoter that confer kanamycin resistance35 were found in 66% (317/483) of all MDR isolates. Eight different sites were polymorphic, including five homoplasies. eis mutations were significantly more common in isolates belonging to clade A or clade B compared to the rest of the Beijing lineage (275/332 versus 21/90, P < 0.001). Mutations in the drug target rrs, which confer resistance to amikacin and capreomycin, as well as to kanamycin36, were found in only 40 isolates, 4 of which had a preexisting SNP in the eis promoter.

Fluoroquinolones target the DNA gyrases GyrA and GyrB, and resistance is conferred by mutations affecting the quinolone resistance–determining regions (QRDRs) that interact with the drugs37. Eighty-six isolates harbored mutations affecting the QRDR of GyrA, and 11 Beijing isolates had SNPs affecting the GyrB QRDR. Substitutions in gyrAB arose relatively more often in isolates with rrs mutations than in those with eis promoter mutations (15/40 versus 62/317, P = 0.009; Supplementary Fig. 7).

In addition to isolates with fluoroquinolone resistance genotypes, we noted a significant number of isolates with ambiguous base calls within the sequences encoding QRDRs (Supplementary Table 9). This phenomenon was almost never observed at other drug resistance loci. For 18 isolates with ambiguous gyrA genotypes, ambiguity was often apparent at more than one of the codons (codons 90, 91 and 94) that most commonly confer resistance (Supplementary Fig. 8). In inspecting raw sequencing reads mapping to this region, we did not find multiple substitutions that were present on a single read, indicating that multiple fluoroquinolone-resistant clones coexisted in a single patient. Fluoroquinolone treatment of non-tuberculosis infections could drive the acquisition of fluoroquinolone resistance in patients chronically infected with tuberculosis38. If this were the case, we would expect to see resistance in non-MDR isolates; however, we found no resistant or heterogeneous QRDR genotypes in non-MDR isolates, indicating that fluoroquinolone exposure occurred as part of tuberculosis therapy.

Adaptive selection at other drug resistance loci
Repeated independent acquisition of SNPs, identified by phylogenetic homoplasy, provides strong evidence of selection13 (Supplementary Table 10). Farhat et al.14 recently identified 22 new genomic regions that were targets of positive selection in drug-resistant strains (excluding repetitive regions), including 4 that harbored homoplasies in our data set. Using a complementary approach, Zhang et al.15 identified 98 new regions that were enriched for SNPs in drug-resistant versus drug-sensitive strains, including 11 in which we identified homoplasies.

embB harbored homoplasic mutations in codons 306, 406 and 497 that are commonly associated with ethambutol resistance, although discordance with susceptibility testing is reported39. We identified homoplasic mutations in five additional embB codons (Table 1). Surprisingly, the most frequent homoplasic embB substitution was p.Asp354Ala, which affected a large cluster of clade A isolates as well as two unrelated isolates. This unusual alteration was associated with phenotypic ethambutol resistance in 50% (83/166) of the isolates tested. Multiple acquisitions of SNPs in the region upstream of embAB provide evidence that operon upregulation also confers resistance40. Promoter and coding SNPs frequently co-occurred, and isolates with two mutations were more often phenotypically resistant (27/28 versus 199/375, P < 0.001; Supplementary Table 5), offering an explanation for the poor concordance between embB codon 306 mutations and phenotypic resistance39.

The pyrazinamide resistance gene pncA41 was the most variable gene in the genome (Table 2 and Supplementary Fig. 9). In addition, its promoter harbored five different mutations. gidB was the second most polymorphic gene, indicating that it is a target of selective

Table 1 Ethambutol resistance mutations

Locus | Mutation | Substitution | Number of isolates | Number of acquisitions^b | Additional alterations^c
PembAB | c.-16C>T, c.-16C>A, c.-16C>G | | 15, 7, 1 | 9, 3, 1 | p.Met306Ile (4), p.Asp354Ala (9)
PembAB | c.-15C>G | | 1 | 1 |
PembAB | c.-12C>T | | 9 | 8 | p.Met306Val (1), p.Asp354Ala (2), p.Gln497Arg (1)
PembAB | c.-11C>A | | 1 | 1 |
PembAB | c.-8C>A | | 7 | 1 | p.Asp354Ala (7)
embB | c.916A>G, c.916A>C | p.Met306Val^a, p.Met306Leu^a | 114, 4 | 29, 2 | c.-12C>T (1), p.Asp354Ala (2)
embB | c.918G>A, c.918G>C, c.918G>T | p.Met306Ile^a | 23, 18, 11 | 14, 9, 2 | p.Gln497Arg (4), c.-16C>T (3), c.-16C>A (1)
embB | c.956A>C, c.956A>G | p.Tyr319Ser, p.Tyr319Cys | 3, 3 | 1, 2 |
embB | c.1061A>C | p.Asp354Ala | 213 | 3 | c.-8C>A (7), c.-12C>T (2), c.-16C>T (3), c.-16C>A (5), c.-16C>G (1), p.Met306Val (2)
embB | c.1133A>C | p.Glu378Ala | 2 | 2 |
embB | c.1217G>A, c.1217G>C | p.Gly406Asp^a, p.Gly406Ala | 16, 16 | 6, 4 |
embB | c.1489C>A | p.Gln497Lys | 9 | 3 |
embB | c.1490A>G | p.Gln497Arg^a | 27 | 11 | c.-12C>T (1), p.Met306Ile (4)
embB | c.3005A>G | p.His1002Arg | 3 | 3 |
embB | c.3070G>A | p.Asp1024Asn | 3 | 2 |

^a High-confidence mutation in TB Drug Resistance Mutation Database (see URLs). ^b Number of times the mutation independently arose. ^c The number of isolates with the additional alteration is given in parentheses.
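Table 1 reports, for each mutation, both the number of isolates carrying it and the number of times it arose independently. Dividing one by the other gives a rough sense of how far each acquisition spread. The sketch below does this for three embB substitutions copied from the table; the ratio is an illustrative summary chosen for this example, not a statistic reported by the authors.

```python
# (substitution, number of isolates, number of independent acquisitions),
# copied from Table 1 for three embB substitutions.
TABLE1_ROWS = [
    ("p.Met306Val", 114, 29),
    ("p.Asp354Ala", 213, 3),
    ("p.Gln497Arg", 27, 11),
]


def isolates_per_acquisition(rows):
    """Map each substitution to its mean number of isolates per acquisition."""
    return {name: isolates / acquisitions for name, isolates, acquisitions in rows}


ratios = isolates_per_acquisition(TABLE1_ROWS)
# Substitutions ordered from most to fewest isolates per acquisition.
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

p.Asp354Ala stands out with 213 isolates from 3 acquisitions (71 per acquisition), consistent with the text's description of a large clade A cluster carrying this substitution. p.Met306Val arose far more often but averages only about four isolates per acquisition.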

pressure and supporting its proposed role in streptomycin resistance42. Discounting the common sublineage-defining SNPs43, gidB mutations were relatively more common in Euro-American than in Beijing isolates (47/355 versus 16/642, P < 0.001) and were less likely to be concomitant with rpsL or rrs mutations (1/47 versus 13/16, P < 0.001). This skewed distribution suggests that gidB mutations have a lineage-specific effect. ethA, which catalyzes activation of the pro-drug ethionamide44, was also among the most highly variable genes in the population. A homoplasic SNP 7 bp upstream of ethA was independently acquired at least twice, both times in clade A. Positive selection of this SNP suggests that it functions clinically in resistance, although only 33% (56/172) of the promoter mutants tested were phenotypically resistant. In total, 84% (409/483) of all MDR isolates carried mutations in ethA or its upstream region.

Transmissibility of drug resistance
Previous studies relied upon fingerprint clustering to estimate the transmission dynamics of drug resistance genotypes10,11. Employing a similar principle but applying the improved resolution of whole-genome sequencing, we investigated transmissibility by using the phylogeny to estimate the number of isolates that independently acquired a SNP versus the number of isolates that inherited that SNP from an inferred common ancestor (indicating primary resistance).

SNPs conferring resistance to rifampicin, isoniazid, streptomycin and ethambutol were significantly more likely to be found in phylogenetic clusters than not (P < 0.05; Fig. 6 and Supplementary Table 11). By inferring the order of SNP acquisition events from the phylogenetic tree, we determined that, for 97% (481/495) of isolates with RRDR SNPs, a katG mutation affecting Ser315 occurred before or on the same branch as the RRDR SNP. Hence, phylogenetic clustering of rpoB SNPs is essentially a surrogate for the clustering of MDR genotypes.

The most frequently acquired resistance genotype was for pyrazinamide; the modal number of isolates harboring each pncA coding or promoter polymorphism was 1 (65 of 106 clusters), and only 1 mutation was shared by a phylogenetic cluster of more than 7 isolates. The common pncA nonsynonymous SNP, encoding the conservative substitution p.Ile6Leu, was found in 157 clade A isolates. This SNP was not associated with phenotypic pyrazinamide resistance in vitro (Supplementary Table 5); however, no secondary pncA mutations were identified in these isolates.

SNPs in gyrA were significantly less likely to be found in clusters (P = 0.006; Supplementary Table 11); the largest cluster contained 5 isolates, and 59% (52/88) of resistant isolates did not cluster. In contrast, eis promoter SNPs were typically found in large clusters, the most notable of which was a cluster of 207 clade A isolates.

Figure 6 Transmissibility of drug resistance genotypes. The number of isolates within clusters sharing a genotypic marker was estimated by maximum-likelihood placement of the polymorphisms on the phylogeny. A cluster size of one suggests acquired resistance, whereas larger clusters are indicative of primary transmitted resistance. (Bars show the number of isolates per locus, subdivided by cluster size: 1, 2–5, 6–20 and >20.)

Table 2 Mutations in highly polymorphic genes implicated in drug resistance

Locus | Drug resistance | Gene length (bp) | Nonsynonymous SNPs | Nonsense SNPs | Indels^a | Large deletions
pncA | Pyrazinamide | 561 | 79 | 1 | 16 | 5
gidB | Streptomycin | 675 | 28 | 2 | 9 | 1
ethA | Ethionamide | 1,470 | 33 | 6 | 19 | 2

Mutation frequency was determined by calculating the number of nonsynonymous SNPs per gene, adjusted for gene length. ^a All indels resulted in frameshifts.
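The Table 2 footnote says that mutation frequency was the number of nonsynonymous SNPs per gene, adjusted for gene length. A minimal sketch of one such adjustment, assuming a simple per-kilobase density, is shown below using the three rows of Table 2. The exact normalization the authors used is not given in this section, and the names here are illustrative.

```python
# Gene length (bp) and nonsynonymous SNP counts copied from Table 2.
TABLE2 = {
    "pncA": (561, 79),
    "gidB": (675, 28),
    "ethA": (1470, 33),
}


def snps_per_kb(length_bp, snp_count):
    """Nonsynonymous SNPs per kilobase of gene sequence."""
    return 1000 * snp_count / length_bp


density = {gene: snps_per_kb(length, snps) for gene, (length, snps) in TABLE2.items()}
# Genes ordered from most to least polymorphic under this normalization.
by_density = sorted(density, key=density.get, reverse=True)
```

On this measure pncA ranks first, followed by gidB and then ethA, matching the text's statements that pncA was the most variable gene and gidB the second most polymorphic.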


DISCUSSION

In the largest bacterial whole-genome sequencing project reported so far, we provide a region-wide snapshot of the M. tuberculosis population in Samara, Russia. Circulating strains belonged mainly to two lineages: Beijing and Euro-American. In concordance with other Russia-based studies, the Beijing lineage was dominant and accounted for two-thirds of isolates16,25. Relative to isolates from the Beijing lineage, Euro-American isolates were phylogenetically diverse, and tree topology supports the division of this lineage into multiple sublineages23. Samaran Beijing isolates were essentially monophyletic with respect to isolates representing the global population, forming a group we term the East European sublineage, which was dominated by two clades of extremely limited diversity. Short genetic distance between tuberculosis isolates has been used to infer transmission links24. In this population, we found large geographic distances within even identical clusters, suggesting that short SNP distances did not always reflect transmission events or that the importance of casual contact may be underestimated.

Genotypes conferring drug resistance were extremely common; 48% of isolates had an MDR genotype, and 16% of these were XDR. In comparison to a small microbiologically based study conducted in Samara in 2001 (ref. 45), the proportion of MDR isolates resistant to amikacin has remained relatively stable (7.2% versus 8.5%), whereas the frequency of fluoroquinolone resistance has risen substantially from 4.3% to 23.8%. However, fluoroquinolone resistance was significantly more likely to be acquired than other resistance genotypes, indicating that gyrAB mutants may have impaired transmission fitness, impeding the spread of XDR clones. Current rates of resistance support the continued use of fluoroquinolones together with amikacin or capreomycin, rather than kanamycin, for MDR therapy. Whereas a fluoroquinolone in combination with prothionamide was highly effective in the treatment of MDR strains in Lithuania, which has a comparable MDR tuberculosis problem and a similar historical treatment strategy4, the frequency of ethA mutations in Russian isolates means that thioamides may be ineffective in this population.

Drug resistance has previously been associated with the Beijing lineage16. Here we provide evidence that Beijing isolates are more likely to harbor the isoniazid resistance genotype, conferred by a katG mutation encoding p.Ser315Thr, that has a negligible fitness cost34, and the rifampicin resistance genotype, conferred by an rpoB mutation encoding p.Ser450Leu, that we find strongly associated with putative compensatory mutations within the RNA polymerase genes. Other studies have shown that rpoC mutations restored fitness in competitive growth assays29 and were significantly more common in isolates belonging to a fingerprint cluster than in non-clustered isolates9. These observations explain, at least in part, the predominance of Beijing MDR isolates. The widespread use of kanamycin in MDR tuberculosis therapy may further exacerbate the spread of MDR strains by selecting strains with eis mutations that both confer resistance35 and increase bacterial multiplication in host macrophages46 by disrupting the protective immune response47. Notably, eis mutations were significantly associated with the dominant Beijing clades.

The epidemiological success of clade A strains was particularly striking, and the comb-like structure of the tree, apparent in the distal portion of the clade, suggests a highly infectious clone. The incomplete dispersion of clade A to western Samara supports its recent and rapid spread. From this success, we deduced the presence of unidentified fitness-enhancing mutations, which led to the discovery of new compensatory mutations in rpoB associated with rifampicin resistance. In addition, we surmised that a new SNP upstream of ethA likely confers low-level clinical resistance to ethionamide, not detected in vitro, through promoter disruption or enhanced binding of the EthR repressor48. Typically, the most common resistance-conferring SNPs are associated with the least fitness cost8, implying that successful clones would carry these SNPs. Thus, the discovery of this new ethA promoter SNP and the rare embB nonsynonymous SNP encoding p.Asp354Ala in clade A was unexpected.

The majority of clade A isolates encoded a conservative p.Ile6Leu substitution in the pyrazinamide resistance gene pncA, which does not confer resistance in vitro and is the only nonsynonymous SNP in pncA found in a large cluster. The deduced frequency of acquired pyrazinamide resistance in the sequenced population, together with the small cluster size associated with primary resistance, suggests that mutants with non-functional PncA have impaired transmission efficiency. Given the prevalence of pyrazinamide resistance in the population, it is difficult to reconcile the epidemiological success of clade A with a pyrazinamide-sensitive phenotype. We speculate that the p.Ile6Leu substitution results in intermediate PncA activity that manifests as pyrazinamide sensitivity in vitro but as clinical resistance in vivo and allows retention of sufficient nicotinamidase activity for efficient transmission.

We propose that the unusual combination of resistance-conferring and compensatory mutations acquired by clade A comprise a perfect storm, providing clinical drug resistance without compromising fitness and transmissibility.

Preventing tuberculosis transmission relies on accurate and rapid diagnosis. Molecular methods that reduce the time taken to detect drug-resistant tuberculosis strains expedite the institution of effective therapy and efficient infection control measures, thus minimizing the infectious period. Whole-genome sequencing offers the potential for rapid, unambiguous determination of all existing clinically relevant drug resistance mutations, and decreasing costs have neutralized a major argument over the value of targeted sequencing relative to whole-genome sequencing. We have reported the existence of multiple mutations in some resistance loci, as well as lineage- and clade-specific association of certain mutations that are suggestive of undiscovered epistatic interactions. These observations indicate that drug resistance may be more multifactorial than previously appreciated, which may in some cases explain discordance between phenotypes and genotypes. As more resistance-conferring loci are identified and the phenotypic effects of multiple mutations and strain background are elucidated, the public health value of routine whole-genome sequencing for the diagnosis of drug resistance will increase, although it may vary depending on prevalence and likely exposure to resistant strains.

We have reported, as have others, the extensive prevalence of MDR Beijing isolates in Russia and the former Soviet states of Eastern Europe1,4,16, which shared a common treatment and Bacille Calmette-Guérin (BCG) vaccination strategy. In addition to programmatic and clinical weaknesses, we have identified plausible biological mechanisms contributing to the devastating MDR tuberculosis situation in the region. The current dominance of clade B across Eastern Europe19,25 and the isolation of both clade A and clade B strains in the UK indicate that the situation throughout the European Union could follow that in the East.

URLs. TB Drug Resistance Mutation Database, http://www.tbdreamdb.com/; SMALT, http://www.sanger.ac.uk/resources/software/smalt; FigTree, http://tree.bio.ed.ac.uk/software/figtree.

Methods
Methods and any associated references are available in the online version of the paper.

Accession codes. Raw sequence data have been submitted to the European Nucleotide Archive under accession ERP000192.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments
We are grateful to members of the Public Health England National Mycobacterium Reference Laboratory and the Samara Regional Tuberculosis Laboratory for bacteriological work, particularly M. Stone, X. Gonzalo and A. Broda. We would also like to thank the Samara Tuberculosis Service, particularly I. Fedorin, as well as V. Kulichenko. We thank R. Hooper for expert statistical advice, S. Hoffner for bacteriological advice and S. Bentley for sequencing advice. This study was supported by European Union Framework Programme 7 (grant 201483; TB-EUROGEN), with sequencing funded by the Wellcome Trust (grant 098051) and EUROGEN. S.N. is a Wellcome Trust Senior Research Fellow in Basic Biomedical Science (095198/Z/10/Z) and is also supported by European Research Council Starting Grant 260477.

AUTHOR CONTRIBUTIONS
N.C., V.N., Y.B. and F.D. designed the study. O.I., I.K., V.N. and Y.B. recruited patients and collected epidemiological data. O.I. and I.K. performed laboratory work. N.C., S.R.H. and J.C. conducted sequence analysis. N.C., T.B., V.N. and F.D. interpreted the data. Y.B., O.I. and F.D. performed statistical comparisons. N.C. and F.D. drafted the manuscript. T.B., V.N., Y.B., O.I., I.K., S.R.H., J.P., J.B., S.N. and R.D.H. provided critical analysis and reviewed the manuscript. All authors approved the final draft.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. World Health Organization. Global Tuberculosis Report (World Health Organization, Geneva, 2012).
2. Gandhi, N.R. et al. Multidrug-resistant and extensively drug-resistant tuberculosis: a threat to global control of tuberculosis. Lancet 375, 1830–1843 (2010).
3. Balabanova, Y. et al. Survival of civilian and prisoner drug-sensitive, multi- and extensive drug-resistant tuberculosis cohorts prospectively followed in Russia. PLoS ONE 6, e20531 (2011).
4. Balabanova, Y. et al. Survival of drug resistant tuberculosis patients in Lithuania: retrospective national cohort study. BMJ Open 1, e000351 (2011).
5. Health Protection Agency. Tuberculosis in the UK: 2012 Report (Health Protection Agency, London, 2012).
6. Udwadia, Z.F. MDR, XDR, TDR tuberculosis: ominous progression. Thorax 67, 286–288 (2012).
7. Andersson, D.I. & Hughes, D. Antibiotic resistance and its cost: is it possible to reverse resistance? Nat. Rev. Microbiol. 8, 260–271 (2010).
8. Böttger, E.C. & Springer, B. Tuberculosis: drug resistance, fitness, and strategies for global control. Eur. J. Pediatr. 167, 141–148 (2008).
9. de Vos, M. et al. Putative compensatory mutations in the rpoC gene of rifampicin-resistant Mycobacterium tuberculosis are associated with ongoing transmission. Antimicrob. Agents Chemother. 57, 827–832 (2013).
10. Dye, C., Williams, B.G., Espinal, M.A. & Raviglione, M.C. Erasing the world's slow stain: strategies to beat multidrug-resistant tuberculosis. Science 295, 2042–2046 (2002).
11. Cohen, T., Sommers, B. & Murray, M. The effect of drug resistance on the fitness of Mycobacterium tuberculosis. Lancet Infect. Dis. 3, 13–21 (2003).
12. Borrell, S. & Gagneux, S. Strain diversity, epistasis and the evolution of drug resistance in Mycobacterium tuberculosis. Clin. Microbiol. Infect. 17, 815–820 (2011).
13. Parkhill, J. & Wren, B.W. Bacterial epidemiology and biology - lessons from genome sequencing. Genome Biol. 12, 230 (2011).
14. Farhat, M.R. et al. Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis. Nat. Genet. 45, 1183–1189 (2013).
15. Zhang, H. et al. Genome sequencing of 161 Mycobacterium tuberculosis isolates from China identifies genes and intergenic regions associated with drug resistance. Nat. Genet. 45, 1255–1260 (2013).
16. Drobniewski, F. et al. Drug-resistant tuberculosis, clinical virulence, and the dominance of the Beijing strain family in Russia. J. Am. Med. Assoc. 293, 2726–2731 (2005).
17. Brown, T., Nikolayevskyy, V., Velji, P. & Drobniewski, F. Associations between Mycobacterium tuberculosis strains and phenotypes. Emerg. Infect. Dis. 16, 272–280 (2010).
18. Casali, N. et al. Microevolution of extensively drug-resistant tuberculosis in Russia. Genome Res. 22, 735–745 (2012).
19. Krüner, A. et al. Spread of drug-resistant pulmonary tuberculosis in Estonia. J. Clin. Microbiol. 39, 3339–3345 (2001).
20. Baker, L., Brown, T., Maiden, M.C. & Drobniewski, F. Silent nucleotide polymorphisms and a phylogeny for Mycobacterium tuberculosis. Emerg. Infect. Dis. 10, 1568–1577 (2004).
21. Gagneux, S. et al. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. USA 103, 2869–2873 (2006).
22. Cheng, L., Connor, T.R., Sirén, J., Aanensen, D.M. & Corander, J. Hierarchical and spatially explicit clustering of DNA sequences with BAPS software. Mol. Biol. Evol. 30, 1224–1228 (2013).
23. Homolka, S. et al. High resolution discrimination of clinical Mycobacterium tuberculosis complex strains based on single nucleotide polymorphisms. PLoS ONE 7, e39855 (2012).
24. Walker, T.M. et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect. Dis. 13, 137–146 (2013).
25. Mokrousov, I. et al. Mycobacterium tuberculosis population in Northwestern Russia: an update from Russian-EU/Latvian border region. PLoS ONE 7, e41318 (2012).
26. Zhang, Y., Heym, B., Allen, B., Young, D. & Cole, S. The catalase-peroxidase gene and isoniazid resistance of Mycobacterium tuberculosis. Nature 358, 591–593 (1992).
27. Banerjee, A. et al. inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium tuberculosis. Science 263, 227–230 (1994).
28. Telenti, A. et al. Detection of rifampicin-resistance mutations in Mycobacterium tuberculosis. Lancet 341, 647–650 (1993).
29. Comas, I. et al. Whole-genome sequencing of rifampicin-resistant Mycobacterium tuberculosis strains identifies compensatory mutations in RNA polymerase genes. Nat. Genet. 44, 106–110 (2012).
30. Reynolds, M.G. Compensatory evolution in rifampin-resistant Escherichia coli. Genetics 156, 1471–1481 (2000).
31. Hall, A.R., Griffiths, V.F., MacLean, R.C. & Colegrave, N. Mutational neighbourhood and mutation supply rate constrain adaptation in Pseudomonas aeruginosa. Proc. R. Soc. B Biol. Sci. 277, 643–650 (2010).
32. Brandis, G., Wrande, M., Liljas, L. & Hughes, D. Fitness-compensatory mutations in rifampicin-resistant RNA polymerase. Mol. Microbiol. 85, 142–151 (2012).
33. Sherman, D.R. et al. Compensatory ahpC gene expression in isoniazid-resistant Mycobacterium tuberculosis. Science 272, 1641–1643 (1996).
34. Pym, A.S., Saint-Joanis, B. & Cole, S.T. Effect of katG mutations on the virulence of Mycobacterium tuberculosis and the implication for transmission in humans. Infect. Immun. 70, 4955–4960 (2002).
35. Zaunbrecher, M.A., Sikes, R.D., Metchock, B., Shinnick, T.M. & Posey, J.E. Overexpression of the chromosomally encoded aminoglycoside acetyltransferase eis confers kanamycin resistance in Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. USA 106, 20004–20009 (2009).
36. Maus, C.E., Plikaytis, B.B. & Shinnick, T.M. Molecular analysis of cross-resistance to capreomycin, kanamycin, amikacin, and viomycin in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 49, 3192–3197 (2005).
37. Maruri, F. et al. A systematic review of gyrase mutations associated with fluoroquinolone-resistant Mycobacterium tuberculosis and a proposed gyrase numbering system. J. Antimicrob. Chemother. 67, 819–831 (2012).
38. Ginsburg, A.S., Grosset, J.H. & Bishai, W.R. Fluoroquinolones, tuberculosis, and resistance. Lancet Infect. Dis. 3, 432–442 (2003).
39. Shen, X. et al. Association between embB codon 306 mutations and drug resistance in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 51, 2618–2620 (2007).
40. Ramaswamy, S.V. et al. Molecular genetic analysis of nucleotide polymorphisms associated with ethambutol resistance in human isolates of Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 44, 326–336 (2000).
41. Scorpio, A. & Zhang, Y. Mutations in pncA, a gene encoding pyrazinamidase/nicotinamidase, cause resistance to the antituberculous drug pyrazinamide in tubercle bacillus. Nat. Med. 2, 662–667 (1996).
42. Wong, S.Y. et al. Mutations in gidB confer low-level streptomycin resistance in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 55, 2515–2522 (2011).
43. Spies, F.S. et al. Streptomycin resistance and lineage-specific polymorphisms in Mycobacterium tuberculosis gidB gene. J. Clin. Microbiol. 49, 2625–2630 (2011).
44. Morlock, G.P., Metchock, B., Sikes, D., Crawford, J.T. & Cooksey, R.C. ethA, inhA, and katG loci of ethionamide-resistant clinical Mycobacterium tuberculosis isolates. Antimicrob. Agents Chemother. 47, 3799–3805 (2003).
45. Balabanova, Y. et al. Multidrug-resistant tuberculosis in Russia: clinical characteristics, analysis of second-line drug resistance and development of standardized therapy. Eur. J. Clin. Microbiol. Infect. Dis. 24, 136–139 (2005).
46. Wu, S. et al. Activation of the eis gene in a W-Beijing strain of Mycobacterium tuberculosis correlates with increased SigA levels and enhanced intracellular growth. Microbiology 155, 1272–1281 (2009).
47. Shin, D.-M. et al. Mycobacterium tuberculosis Eis regulates autophagy, inflammation, and cell death through redox-dependent signaling. PLoS Pathog. 6, e1001230 (2010).
48. Engohang-Ndong, J. et al. EthR, a repressor of the TetR/CamR family implicated in ethionamide resistance in mycobacteria, octamerizes cooperatively on its operator. Mol. Microbiol. 51, 175–188 (2004).
49. Finken, M., Kirschner, P., Meier, A., Wrede, A. & Böttger, E.C. Molecular basis of streptomycin resistance in Mycobacterium tuberculosis: alterations of the ribosomal protein S12 gene and point mutations within a functional 16S ribosomal RNA pseudoknot. Mol. Microbiol. 9, 1239–1246 (1993).


ONLINE METHODS
Study population and whole-genome sequencing. From October 2008 to 2010, 2,348 patients with pulmonary disease and culture-proven tuberculosis were recruited from all 18 civilian tuberculosis dispensaries located across Samara. M. tuberculosis isolates were prospectively archived at the Samara Tuberculosis Service. Anonymized epidemiological data were stored on a password-protected Access database. Informed consent was obtained from all subjects. The study was approved by the Samara Medical Ethics Committee, the Queen Mary Research Ethics Committee and the University of Cambridge Research Ethics Committee.

Isolates were cultured on Middlebrook medium for 4–6 weeks at 37 °C. Sweeps of colonies were harvested and lysed by vortexing with glass beads, and genomic DNA was purified using a DNeasy Blood and Tissue kit (Qiagen). Paired-end multiplex libraries with a mean insert size of 200 bp were prepared as previously described50. Sequencing was performed at the Wellcome Trust Sanger Institute on the Illumina Genome Analyzer GAII or HiSeq 2000 platform, generating reads of 54 bp, 75 bp or 100 bp.

Molecular fingerprinting and microbiological testing. Isolates were characterized by spoligotyping according to standard methods61. Susceptibility to the first-line drugs rifampicin, isoniazid, streptomycin, ethambutol and pyrazinamide was determined using the absolute concentration method on Lowenstein-Jensen slopes62 or by using the automated Mycobacterial Growth Indicator Tube (MGIT) 960 system (Becton Dickinson)63. For MDR isolates, susceptibility to the second-line drugs amikacin, capreomycin, ofloxacin, moxifloxacin and prothionamide was also determined using the MGIT system61.

A quality assurance procedure was implemented to ensure that isolate metadata corresponded to the appropriate sequence. Data for isolates belonging to the SNP-defined Beijing lineage that did not exhibit the characteristic Beijing spoligotype and, conversely, for isolates in other SNP-defined lineages that shared the Beijing spoligotype were excluded from further analysis (n = 45). The specificity of the katG mutation encoding p.Ser315Thr for the prediction of isoniazid resistance is >99% (ref. 64); thus, microbiological data for isolates with this genotype but a sensitive phenotype were also excluded (n = 58).
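
The two quality assurance exclusions described under molecular fingerprinting and microbiological testing could be expressed roughly as below. This is only an illustrative sketch: the record layout, field names and function names are hypothetical, and the text does not describe how the checks were actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Isolate:
    # Hypothetical record; field names are assumptions for illustration.
    snp_lineage: str            # lineage assigned from whole-genome SNPs
    beijing_spoligotype: bool   # True if the characteristic Beijing spoligotype was seen
    katg_s315t: bool            # carries the katG mutation encoding p.Ser315Thr
    isoniazid_phenotype: str    # "resistant" or "sensitive"


def lineage_discordant(iso: Isolate) -> bool:
    """SNP-defined lineage and spoligotype disagree about Beijing membership."""
    return (iso.snp_lineage == "Beijing") != iso.beijing_spoligotype


def phenotype_discordant(iso: Isolate) -> bool:
    """katG p.Ser315Thr genotype paired with an isoniazid-sensitive phenotype."""
    return iso.katg_s315t and iso.isoniazid_phenotype == "sensitive"
```

In this sketch, isolates flagged by `lineage_discordant` would be dropped from further analysis, whereas `phenotype_discordant` would flag only the microbiological data for exclusion.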

Sequence analysis. Sequence reads were aligned to the corrected H37Rv reference genome18,51 with SMALT (see URLs), and GATK indel realignment was applied52. Pindel53 was used to predict the positions of indels and structural variants; these variants were visually checked in the mapping files. Candidate SNPs were identified using SAMtools54. At each mapped position, alleles were considered to be valid if they were supported by greater than 70% of mapped reads, including at least five in each direction and a minimum mapping quality of 45. SNPs located within repetitive regions were excluded from analysis18. Mixed base calls in nonrepetitive regions of the genome were considered valid if they had mapping quality of ≥45, both calls were supported by at least five reads on each strand, and P values for strand bias, base quality bias, map quality bias and tail distance bias were ≥0.001.

To assess data consistency, nine isolates that were sequenced with 54-bp paired-end reads were recultured, and 100-bp paired-end sequence was generated. For each of these technical replicates, there were no inconsistencies in the bases called at variant sites that passed quality filters in both sequences.

Phylogenetic and population genetic analyses. A maximum-likelihood phylogeny was reconstructed with RAxML55 using a general time-reversible model with gamma correction for among-site rate variation. Calculation of 100 bootstrap replicates provided support for nodes on the tree. The phylogenetic tree was visualized with FigTree (see URLs). Ancestral sequences were reconstructed onto each node of the phylogeny using PAML56. From these ancestral sequences, SNPs were reconstructed onto branches of the tree.

To statistically define population structure, we used BAPS (Bayesian Analysis of Population Structure) software57,58, in particular its hierBAPS module22, which delineates population structure using nested clustering. Three nested levels of molecular variation were fitted to the data using 10 independent runs of the stochastic optimization algorithm with the a priori upper bound of the number of clusters varying over the interval of 50–300 across the runs.

To estimate nucleotide diversity (ref. 59) for the Euro-American and Beijing clades, the functions available in the PGEToolbox60 were used in parallel on a cluster computing environment. The very large number of sequences in each clade would require excessive amounts of computing resources when analyzing all the sequences in a single process, and, hence, to allow for more economical calculations, 50 random subsets of 100 strains were sampled from each clade, and inference was performed using 100 bootstraps for each of them, as described by Cai et al.60, and averaging the results. Confidence intervals for estimates were computed using the normal distribution approximation with s.d. derived by the bootstrap procedure.

Statistical methods. Simple descriptive statistics were used to compare patient data for the sample population and the remaining population and to characterize the prevalence of mutations and the geographic distribution of sublineages; 95% confidence intervals (CIs) were established. In addition, the sampled population was evaluated by attributive risk analysis. The significance of differences between studied groups of variables was calculated using two-sample tests of proportions: Pearson χ² or Fisher's exact test when any expected group size was less than five. Statistical tests were two-sided at α = 0.05. Analysis was performed using STATA (version 12.1, StataCorp).

50. Harris, S.R. et al. Evolution of MRSA during hospital transmission and intercontinental spread. Science 327, 469–474 (2010).
51. Cole, S.T. et al. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544 (1998).
52. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
53. Ye, K., Schulz, M.H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
54. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
55. Stamatakis, A., Ludwig, T. & Meier, H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–463 (2005).
56. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
57. Corander, J., Marttinen, P., Sirén, J. & Tang, J. Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics 9, 539 (2008).
58. Tang, J., Hanage, W.P., Fraser, C. & Corander, J. Identifying currents in the gene pool for bacterial populations using an integrative approach. PLoS Comput. Biol. 5, e1000455 (2009).
59. Nei, M. Molecular Evolutionary Genetics (Columbia University Press, New York, 1987).
60. Cai, J.J. PGEToolbox: a Matlab toolbox for population genetics and evolution. J. Hered. 99, 438–440 (2008).
61. Kamerbeek, J. et al. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J. Clin. Microbiol. 35, 907–914 (1997).
62. Canetti, G. et al. Mycobacteria: laboratory methods for testing drug sensitivity and resistance. Bull. World Health Organ. 29, 565–578 (1963).
63. Krüner, A., Yates, M.D. & Drobniewski, F.A. Evaluation of MGIT 960-based antimicrobial testing and determination of critical concentrations of first- and second-line antimicrobial drugs with drug-resistant clinical strains of Mycobacterium tuberculosis. J. Clin. Microbiol. 44, 811–818 (2006).
64. Ling, D.I., Zwerling, A.A. & Pai, M. GenoType MTBDR assays for the diagnosis of multidrug-resistant tuberculosis: a meta-analysis. Eur. Respir. J. 32, 1165–1174 (2008).
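
As an illustration of the allele validity rule given under sequence analysis (support from more than 70% of mapped reads, at least five supporting reads in each direction and a minimum mapping quality of 45), a minimal sketch follows. The per-position summary record and all names are hypothetical; the study applied these thresholds to SAMtools output, and this is not that pipeline.

```python
from dataclasses import dataclass


@dataclass
class AlleleSupport:
    # Hypothetical per-position summary; field names are assumptions.
    forward_reads: int      # supporting reads on the forward strand
    reverse_reads: int      # supporting reads on the reverse strand
    total_reads: int        # all reads mapped at the position
    mapping_quality: float  # mapping quality reported for the position


def allele_is_valid(s: AlleleSupport,
                    min_fraction: float = 0.70,
                    min_per_strand: int = 5,
                    min_mapq: float = 45) -> bool:
    """Apply the three thresholds stated in the text to one candidate allele."""
    if s.total_reads <= 0:
        return False
    supporting = s.forward_reads + s.reverse_reads
    return (supporting / s.total_reads > min_fraction   # strictly greater than 70%
            and s.forward_reads >= min_per_strand        # at least five forward
            and s.reverse_reads >= min_per_strand        # at least five reverse
            and s.mapping_quality >= min_mapq)           # minimum mapping quality
```

Keeping the thresholds as keyword arguments makes it easy to see how sensitive a call set is to each cutoff.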

doi:10.1038/ng.2878 Nature Genetics
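
The subsampling scheme used for nucleotide diversity (50 random subsets of 100 strains, 100 bootstraps each, a normal-approximation confidence interval) can be sketched in plain Python. This is not PGEToolbox: the bootstrap here resamples alignment columns and the per-subset s.d. values are simply averaged, both of which are assumptions made for illustration, as are all function names.

```python
import random
import statistics
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal length")
    diffs = sum(sum(a != b for a, b in zip(s1, s2))
                for s1, s2 in combinations(seqs, 2))
    return diffs / ((n * (n - 1) // 2) * length)


def bootstrap_sd(seqs, n_boot, rng):
    """s.d. of diversity over bootstrap replicates (columns resampled; assumed scheme)."""
    length = len(seqs[0])
    values = []
    for _ in range(n_boot):
        cols = rng.choices(range(length), k=length)
        values.append(nucleotide_diversity(
            ["".join(s[c] for c in cols) for s in seqs]))
    return statistics.stdev(values)


def clade_diversity(seqs, n_subsets=50, subset_size=100, n_boot=100,
                    seed=0, z=1.96):
    """Average diversity over random subsets with a normal-approximation 95% CI."""
    rng = random.Random(seed)
    k = min(subset_size, len(seqs))
    estimates, sds = [], []
    for _ in range(n_subsets):
        subset = rng.sample(seqs, k)
        estimates.append(nucleotide_diversity(subset))
        sds.append(bootstrap_sd(subset, n_boot, rng))
    pi = statistics.mean(estimates)
    sd = statistics.mean(sds)
    return pi, (pi - z * sd, pi + z * sd)
```

Fixing the seed keeps the subsets reproducible; the pairwise calculation is quadratic in the number of sequences, which is the cost that subsampling is meant to contain.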
