
Lepr Rev (2009) 80, 246–249

COMMENTARY

Molecular epidemiology of Mycobacterium leprae: a solid beginning


BARRY G. HALL
Bellingham Research Institute, Bellingham, USA

Accepted for publication 15 June 2009

Until I was invited to present a workshop on phylogenetic methods at an IDEAL consortium meeting in March of 2007, I knew virtually nothing about Mycobacterium leprae. I was both astonished and awed as I learned about the many constraints that leprae investigators labour under; constraints that would drive most microbiologists and molecular biologists whimpering into a corner. The simple matter of growing a culture in order to prepare a sample of highly purified DNA for molecular strain typing, something that most of us think of as taking a few hours, becomes a prodigious effort often requiring months to years, and the clonality of that culture is always uncertain. Likewise, we take for granted the high level of genetic variability that permits the straightforward application of modern DNA technology to the problem of strain typing for epidemiological purposes. The apparent paucity of variability has forced leprae epidemiologists to turn to the variability that is inherent in short polynucleotide repeats (variable number tandem repeats, VNTR) as a basis for strain typing. At that March 2007 meeting it was evident that the field lacked a sufficient number of VNTR loci to reliably type strains or to estimate relationships among those strains. The variability in VNTR loci is with respect to the number of repeats of a short sequence, and the genotype of each allele is the repeat number.

The papers in this special issue represent a concerted effort to provide a practical, reliable basis for molecular typing of M. leprae. The Gillis et al. paper identifies 16 potentially useful VNTR loci, provides a standard operating protocol (SOP) for determining the repeat number at each locus, and rigorously evaluates the reliability of each locus and its stability through time in a single infection. Six of the seven remaining papers employ that set of VNTR loci and apply the SOP in six different countries as a practical field trial of the value of those loci and the SOP.
Taken together, these papers represent both an amazing and an outstanding effort to create and validate the tools of a modern epidemiological system. The Gillis et al. paper sets rigorous standards for certification of a locus and its SOP: the PCR conditions must permit reliable amplification, generating a full-length PCR product, from 10 cells, and it must be possible to sequence the product reliably. All 16 loci meet that standard. It turns out that determining the repeat number is not as simple as just reading the output sequence file. Slippage during the PCR amplification process can potentially result in
Correspondence to: Barry G. Hall, Bellingham Research Institute, 218 Chuckanut Point Re, Bellingham, WA 98229, USA (Tel: 360 752 1422; e-mail: barryhall@zeninternet.com)


0305-7518/09/064053+04 $1.00 © Lepra


mixed products in which the repeat number is unclear. Applying an uncommonly high standard, Gillis et al. sequenced multiple independent amplifications in both directions and had two independent readers determine the repeat number in order to rank the reliability of each locus. Two loci, AT15 and TA18, ranked at the bottom for reliability and were not recommended for use. Gillis et al. judged, quite correctly, that unreliable data are worse than no data at all. None of the six surveys, conducted in Colombia, Brazil, India, the Philippines, China and Thailand, excluded those two loci. By including those unreliable loci they provided the opportunity to determine, under field conditions, whether they are, in fact, any less reliable than the other loci. Inexplicable (or at least unexplained) was the decision in three studies to exclude the TA10 locus and in one to exclude the 18-8 locus, both of which ranked at the top of the Gillis et al. reliability scale. Those decisions leave 14 loci (including the two that Gillis et al. judged unreliable) that are common to all six survey studies. Taken together, those six studies include (by my manual count) 386 samples, and they afford the opportunity to evaluate the reliability of those loci and the SOP in the field. One of the most unusual, important, and very refreshing aspects of these six studies is that they report all of the results, not just the positive outcomes. For reasons that I will make clear below, we can consider a sample to have succeeded if it was possible to determine the repeat number at all 14 of the loci that were common to all six studies. On that basis, 208, or 53.9%, of the samples succeeded. The success rates ranged from 91.2% in the Brazil study down to 22% in the Thailand study. Two types of failure occurred: a failure to amplify enough product for sequencing, or an inability to determine the repeat number, usually meaning that two possible repeat lengths were observed.
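The success criterion used here can be written down in a few lines. The following is only a minimal sketch, assuming each sample is stored as a mapping from locus name to repeat number with `None` marking a locus that could not be typed; the function names and the data layout are hypothetical and do not come from the surveys themselves.

```python
from typing import Optional

# Hypothetical record layout: repeat number per locus, None where typing failed.
Profile = dict[str, Optional[int]]


def is_complete(profile: Profile, loci: list[str]) -> bool:
    """A sample succeeds only if every common locus has a repeat number."""
    return all(profile.get(locus) is not None for locus in loci)


def success_rate(profiles: list[Profile], loci: list[str]) -> float:
    """Percentage of samples that were typed at all of the given loci."""
    if not profiles:
        raise ValueError("no samples supplied")
    succeeded = sum(is_complete(p, loci) for p in profiles)
    return 100.0 * succeeded / len(profiles)


# The overall figure quoted in the text: 208 complete samples out of 386.
print(round(100.0 * 208 / 386, 1))  # 53.9
```

Note that a locus absent from the record is treated the same way as a locus recorded as `None`, which matches the all-or-nothing criterion described above.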
The 208 successful samples provide a large set of common samples that can be analysed in terms of geographical distributions of alleles, and patterns of those distributions, as discussed below. The 178 failed samples, those that were not readable at every locus, provide the opportunity for an equally important analysis. The 14 loci can be ranked in terms of reliability. Does unreliability in the laboratory tell us anything about unreliability in the field? In some cases there were interesting patterns of failures; i.e. most of the samples from one particular region failed at the same locus. Why? Those same loci, in the hands of the same investigators, produced reliable results in many other regions. While the problem might be as simple as a contaminant in the local water supply, it is also possible that sequence variation at a common site near the 3′ end of one of the primers interfered with amplification. If so, the site of another SNP may have accidentally been revealed. Appropriate analysis of the failed samples might suggest whether it is worthwhile designing some alternative primers for those loci.

It may seem unduly harsh to consider samples in which the repeat number at a single locus was not determined as failed, but that really is not the case. Each locus is a character, and the state of each character is the repeat number. Fourteen is a tiny number of characters to distinguish among isolates, and every absent character significantly degrades the resolution of VNTR-based strain typing. However, the most serious problem is not resolution, but the effect of missing data on the ability to estimate relationships among the isolates. The ability to understand issues such as sources of infection, transmission, incubation period, modes of transmission and importance of contact patterns depends completely upon accurate estimation of the relationships among isolates.
Based on a simulation study, I estimated that about 25 VNTR loci would be required to achieve 90% accuracy of VNTR-based phylogenetic trees. With only 14 loci that accuracy is reduced to about 77%, and it is reduced by about another 2% for each additional locus excluded (Hall, unpublished results).
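As a rough worked illustration of those figures, the sketch below treats the quoted loss as two percentage points per excluded locus, starting from about 77% at 14 loci. Both the linear form and the percentage-point reading are assumptions made for illustration; the function name is hypothetical, and the underlying simulation results are unpublished, so this is not a substitute for them.

```python
def approx_tree_accuracy(loci_typed: int,
                         full_set: int = 14,
                         accuracy_full: float = 77.0,
                         loss_per_locus: float = 2.0) -> float:
    """Approximate tree accuracy (%) when some of the 14 loci are missing.

    Assumes a linear loss of `loss_per_locus` percentage points for each
    locus excluded; only meaningful when a few loci are missing.
    """
    if not 0 <= loci_typed <= full_set:
        raise ValueError("loci_typed must be between 0 and full_set")
    excluded = full_set - loci_typed
    return accuracy_full - loss_per_locus * excluded


print(approx_tree_accuracy(14))  # 77.0
print(approx_tree_accuracy(12))  # 73.0
```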


While estimation of a phylogenetic tree is the most common way to estimate relationships among isolates, there is serious question about the validity of phylogenetic analysis as applied to estimating relationships within a bacterial species. Valid phylogenetic analysis depends completely on the assumption that the taxa are genetically isolated from one another. It is now well understood that, when there is significant recombination within a species, the relationships among isolates cannot validly be represented as a tree.1,2 For most microbial species studied there is sufficient recombination to significantly degrade phylogenetic signal,3 and my comparison of sequenced Mycobacterium tuberculosis genomes suggests considerable recombination even within that species (Hall, to be published elsewhere). There is insufficient information to estimate the extent of recombination within M. leprae, but it would be unwise to assume an absence of recombination, and thus unwise to estimate relationships by phylogenetic analysis. The problem of taking recombination into account led to an alternative method, eBURST, for estimating relationships from multi-locus sequence typing (MLST) data.4 eBURST considers all alleles of a locus to be equidistant from each other whether they differ by one or several nucleotides. As a consequence, recombination, which may introduce many substitutions in a single event, and mutation are given equal weight. eBURST does not seek to represent the historical relationships among all isolates on a single diagram (a tree); instead it seeks to cluster closely related isolates into groups based only upon identity by state, not identity by descent. eBURST connects the most closely related isolates to each other, and generates diagrams that are ideal for epidemiological investigations. MLST and eBURST analysis are by now the standard tools for molecular epidemiology of bacteria. Epidemiology based on VNTR loci is ideally suited to analysis by eBURST.
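The idea of grouping by identity by state can be illustrated with a short sketch. This is not the eBURST program; it is a simplified stand-in that assumes isolates are linked when their profiles differ at no more than one locus, and that groups are the connected sets formed by those links. The function names and the sample profiles are hypothetical.

```python
from itertools import combinations

# A profile is the tuple of repeat numbers at the typed loci, in a fixed order.
Profile = tuple[int, ...]


def loci_differing(a: Profile, b: Profile) -> int:
    """Count loci whose alleles differ; the size of the difference is ignored."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def slv_groups(profiles: dict[str, Profile]) -> list[set[str]]:
    """Group isolates linked by chains of at most single-locus differences."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if loci_differing(profiles[a], profiles[b]) <= 1:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Illustrative data only: s1 and s2 differ at one locus (by four repeats),
# s2 and s3 differ at one locus, and s4 differs from all others at every locus.
example = {
    "s1": (3, 7, 12, 5),
    "s2": (3, 7, 12, 9),
    "s3": (3, 8, 12, 9),
    "s4": (10, 2, 20, 1),
}
print(sorted(sorted(g) for g in slv_groups(example)))
# [['s1', 's2', 's3'], ['s4']]
```

Because alleles are compared only for equality, a change of four repeats at one locus counts the same as a change of one repeat, which mirrors the equidistance of alleles described above.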
Where MLST represents each unique sequence of a locus by an allele number, the allele number of a VNTR locus is simply the repeat number. MLST defines an allele profile as the set of allele numbers at the loci under consideration, and assigns to each unique allele profile a sequence type (ST) number. For VNTR loci the equivalent would be a VT number that represents a unique profile of the repeat lengths at all 14 loci. The VT thus represents the VNTR-based genotype as a single number, from which the repeat number at all 14 loci can be deduced. That sort of representation is ideal for constructing a database of M. leprae genotypes. However, if the repeat number of a single locus is missing, it is impossible to assign a VT and impossible for the eBURST algorithm to place the isolate in a cluster. Thus, samples missing the repeat number for a single locus have, indeed, failed. Some of the surveys included SNP data for the samples they reported; however, those data appear, at first glance, to contribute little to the resolution of the typing scheme. They do, however, afford the opportunity to ask whether SNP analysis potentially adds anything to epidemiological studies of leprae: for samples where the SNP data are available, can we draw any conclusions with the SNP information that could not be drawn without it? That is an important question because it addresses the issue of efficiency. It requires as much effort to determine the status of a single polymorphic site (which can have only four possible states) as it does to determine the status of a VNTR locus that can have many possible states. A comparison of the two completely sequenced M. leprae genomes shows that there are only about 125 polymorphic sites in the genes that the two strains have in common (some of which are at VNTR loci). If SNPs contribute little of value to epidemiological information, it may not be cost effective to try to identify more polymorphic sites.
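A VT scheme of the kind described could be sketched as a simple two-way lookup. The class below is hypothetical: it assumes VT numbers are handed out sequentially as new complete profiles are encountered, which is one possible convention rather than an agreed scheme, and the locus names in the usage lines are placeholders.

```python
from typing import Optional


class VTRegistry:
    """Hypothetical registry mapping complete VNTR profiles to VT numbers."""

    def __init__(self, loci: list[str]) -> None:
        self.loci = tuple(loci)
        self._vt_by_profile: dict[tuple[int, ...], int] = {}
        self._profile_by_vt: dict[int, tuple[int, ...]] = {}

    def assign(self, repeats: dict[str, Optional[int]]) -> Optional[int]:
        """Return the VT for a sample, or None if any locus is missing."""
        values = [repeats.get(locus) for locus in self.loci]
        if any(v is None for v in values):
            return None  # an incomplete profile cannot be given a VT
        profile = tuple(values)
        if profile not in self._vt_by_profile:
            vt = len(self._vt_by_profile) + 1  # assumed sequential numbering
            self._vt_by_profile[profile] = vt
            self._profile_by_vt[vt] = profile
        return self._vt_by_profile[profile]

    def profile(self, vt: int) -> dict[str, int]:
        """Recover the repeat number at every locus from a VT number."""
        return dict(zip(self.loci, self._profile_by_vt[vt]))


# Placeholder locus names and repeat numbers, for illustration only.
registry = VTRegistry(["locA", "locB", "locC"])
print(registry.assign({"locA": 3, "locB": 7, "locC": 12}))    # 1
print(registry.assign({"locA": 3, "locB": 7, "locC": 12}))    # 1
print(registry.assign({"locA": 3, "locB": None, "locC": 12})) # None
print(registry.profile(1))  # {'locA': 3, 'locB': 7, 'locC': 12}
```

The `None` return for an incomplete sample reflects the point made above: without every repeat number there is no profile to look up, so no VT can be assigned.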
The common goal of the papers in this special issue was to establish a common set of loci for strain typing and a common protocol for the application of those loci to investigating


the epidemiology of leprae. To accomplish that goal it is important to agree upon and use a common terminology. These papers fail to reflect such a common terminology. VNTR loci are referred to as VNTR, STR, micro-satellites or mini-satellites (depending on repeat length). Whatever the personal preferences of the authors, they should agree on a single terminology. The data in these papers represent a rich epidemiological resource, but effective exploitation of that resource requires the development of a database that is open to all and that includes tools such as eBURST for analysing VNTR data. Development of that database should be given a high priority in the near future, for without it the results of these studies in six countries will remain effectively isolated from each other. A database, of course, requires agreement not only upon a common set of loci, but also upon what other information will be associated with each strain. It requires development of the equivalent of an MLST scheme, a group to host the database, computer hardware, and a person to curate the database. In short, it requires both funds and commitment. When allocating scarce resources it should be fully understood that the development of such a database is far more important than additional or expanded surveys of the sort that are reported in these papers. In the end there are really only three ways to distinguish among strains based on their DNA content: sequence differences among their core genes (SNPs), differences in repeat numbers of VNTRs, and differences in the presence or absence of genes. The two sequenced genomes differ only in the presence or absence of 10 genes. One strain has three genes the other lacks, and the second has seven genes the first lacks. This proportion of distributed genes is the smallest among the 22 species that have been studied (Hall, unpublished results). Only two M. leprae genomes have been sequenced so far.
When another 10 or so have been sequenced we will have a much better understanding of leprae genome dynamics, but at this point the prospects for molecular epidemiology based on gene sequence differences do not appear to be good. It seems likely that analysis of VNTR loci will be, for the foreseeable future, as good as it gets for leprae molecular epidemiology. That makes the continued investment in developing and improving VNTR analysis very worthwhile. The work reported here is both important and very valuable, but it is only a beginning: an impressive and solid beginning, but just that, a beginning.

References
1 Feil EJ, Holmes EC, Bessen DE et al. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci U S A, 2001; 98: 182–187.
2 Didelot X, Falush D. Inference of bacterial microevolution using multilocus sequence data. Genetics, 2007; 175: 1251–1266.
3 Perez-Losada M, Browne EB, Madsen A et al. Population genetics of microbial pathogens estimated from multilocus sequence typing (MLST) data. Infect Genet Evol, 2006; 6: 97–112.
4 Feil EJ, Li BC, Aanensen DM et al. eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data. J Bacteriol, 2004; 186: 1518–1530.
