Computers and Chemistry 26 (2002) 547-548
www.elsevier.com/locate/compchem

Book review
Mathematics of Genome Analysis
Jerome K. Percus, Cambridge University Press, Cambridge, 2002. Paperback, ISBN 0-521-58526-0, 139 pp.; Price USD 17.95

The title of the book promises an insight into the mathematics of genome sequence analysis and into the mathematical methods of genome sequencing technologies. The chapter and section titles, as well as the author's formal credentials, appear promising too. The book begins with a general preface that indicates the existence of molecular biology and the fact that mathematical methods of both computer science and physics can easily be used for studies of linear sequences of symbols to which, according to the author, biologically important macromolecules can be reduced. The preface also states that there exist so many mathematical methods designed for studies of DNA, RNA and proteins that it is not easy to define methodological standards. The author does not mention who should read the book, but he refers to it as a textbook. In addition, the content of this compact book is loaded with (unnumbered!) mathematical formulas that require advanced knowledge of mathematical disciplines as diverse as functional analysis, spectral analysis, spherical harmonics (Bessel functions), combinatorics, probability theory, communication theory and theoretical mechanics.

The idea that molecular biology can be described in approximately 30 sentences and seven illustrations is not a very good one, but it is precisely what the author attempted to do in chapter 1, entitled "Decomposing DNA". The entire chapter is full of metaphors, hand-waving and misconceptions. The section introducing DNA sequences is contaminated by an unnecessary description of the chemical structure of sugars, heterocyclic compounds (purines and pyrimidines) and the double-helical 3-D structure of DNA.
The text largely misrepresents empirical facts concerning protein biosynthesis, and I am surprised that a prominent publisher (Cambridge University Press) accepted this publication for printing without a substantive review by a biologist. The author has managed to commit all these mistakes on five poorly written pages, which do not seem to be devoted to any comprehensible explanation. The author just parades his own knowledge without any pedagogical purpose visible to this reviewer. The two sections of chapter 1 that contain mathematics are much better, but not really precise despite being quite condensed and formal. Newcomer and expert alike will have serious problems understanding what message the author is trying to convey. In addition, the references are chosen randomly and most of them will not suffice to compensate for the pedagogical deficiencies of this chapter.

Chapter 2, entitled "Recomposing DNA", is somewhat better as far as mathematics is concerned. However, the writing is again full of the technical jargon of gel-running DNA sequencers, without sufficient tutorial explanation of the basic concepts and techniques of genome mapping and sequencing. Because the mathematical requirements for this chapter are formidable, and because gifted mathematicians usually do not do gel electrophoresis, only a very few (if any at all) mathematically competent readers will have a chance to understand this chapter. [On the other hand, readers who are not good mathematicians will certainly not understand this chapter even if they are familiar with techniques of genome mapping and sequencing.] A positive thing about this chapter is that three formulas in it (2.1, 2.2 and 2.3) are numbered. All other formulas in the entire book (unless I have a defective copy) are not numbered and are therefore difficult either to find or to refer to.
Chapter 3, entitled "Sequence Statistics", is really very confusing because the author appears not to have working experience with statistical sequence analysis. As a result, his selection of topics is haphazard. References to, and treatment of, the controversial (and according to some reports erroneous) past work concerning long-range correlations in DNA sequences are exaggerated, while a vast body of solid statistical methodology developed over the last three decades is completely neglected. In particular, the methods that were not originally presented in a final, formalized form are either misrepresented or not mentioned at all. Again, the reader has no chance to learn by herself/himself because the references are chosen in a haphazard manner from those published in either physics journals or large-circulation general-reader journals. [Most of the fundamental sequence analysis methods were originally published in journals specialized in structural biology, computational biology, and molecular evolution by authors who were not well-known physicists or mathematicians.] It should be added that the terminology the author uses is not always understandable. That is why it takes a lot of reading to determine whether the author discusses genome segmenting and function-associated annotation. Apparently he does, but in a different chapter (chapter 4) and in a terminologically unusual, and thereby obscure, way. The exposition of material in chapter 3 is very formal (with unnumbered formulas!), but the language used for description and interpretation is by no means precise. Overuse of metaphors, hand-waving and jargon makes the text obscure not only in this chapter but in the entire book.

0097-8485/02/$ - see front matter © 2002 Published by Elsevier Science Ltd. PII: S0097-8485(02)00026-8

Chapter 4, entitled "Sequence Comparison", is again very condensed and formal but not really precise. The lack of precision is again due to unusual terminology that emerges from the author's private language rather than from the established parlance of computational and molecular biologists. For instance, a statement typical of this book, "In attempting to understand the language of DNA, we know that there are broad categories of function that constitutes the basic structure", is so general that it must be logically and factually true (page 108). Yet from a pragmatic point of view this sentence does not make any sense to biologists (even computational ones) because they are used to a conceptual distinction between the structure of biopolymers and the biological function of their fragments. When a biologist is faced with a fully determined 3-D structure of a protein, she/he in general cannot tell, based on the structure alone, what the biological function of this protein (or its parts) could be. As a matter of fact, it usually takes years of research to determine the biological function of a protein, with or without a known structure.
Knowledge of structure can be helpful in the determination of function, but in general structure and function need to be observed and determined independently of each other. (In a more physical spirit, one could say that structure and function are complex but mutually distinct observables of the same system.) Because of this pragmatic independence of structure and function, biologists are likely to consider the author's phrase "categories of function that constitutes the basic structure" to be an oxymoron. The preceding example sentence is one of many bizarre passages in the book. Because of the great abundance of terms and phrases formed in the author's very unclear private language, it would be a disservice to students (particularly non-biologists) to force them to learn sequence analysis from this book. In fact, it is a pity that the writing itself is so deficient, because chapters 3 and 4 contain a lot of potentially interesting mathematical presentations.

The final chapter 5 is devoted to a brief exposé of the biophysics of DNA. The Hamiltonian for DNA in thermal equilibrium is written down. So is the energy partition function. A potentially interesting (for physicists) discussion of the dynamics of DNA at low temperatures is again ruined by the intrusion of the author's private language (a relevant sample: "the essence of DNA lies in its heterogeneity"; page 126). Needless to say, neither the author nor the publishers comfort the reader by numbering the formulas. Not even the four-line-long ones!

My general impression is that potential readers who are not seasoned (and talented) mathematicians should avoid this book. Contrary to the author's implicit intention, poor exposition and imprecise language make the book useless as a textbook for students of bioinformatics or computational biology. Readers who are at Fields-medal-winning peak form in mathematics could understand this book, provided that their minds were open enough to read and comprehend basic textbooks of general biology, molecular biology, organic chemistry, statistical mechanics and polymer chemistry. Experienced computational biologists with an interest in mathematics will benefit from having this book as a nonessential reference that covers the mathematical intricacies of routine methods of sequence analysis. Mathematicians who are curious about genome analysis but have no intention of becoming bioinformaticians will likely benefit from reading this book as well.
However, I hasten to warn that even mathematically literate readers will need to exercise caution in order to distinguish solid, well-established methodologies from the controversial or potentially erroneous ones.

Andrzej K. Konopka
Center for Advanced Sequence Analysis, BioLingua Research Inc., 10331 Battleridge Place, Gaithersburg, MD 20886, USA
E-mail: akk@blingua.org
