Reprint from Molecular Strategies in Biological Evolution, Volume 870 of the Annals of the New York Academy of Sciences, May 18, 1999

The Linguistics of DNA: Words, Sentences, Grammar, Phonetics, and Semantics

SUNGCHUL JI
Department of Pharmacology and Toxicology, Rutgers University, Piscataway, New Jersey 08855, USA

There are theoretical reasons to believe that biologic systems and processes cannot be fully accounted for in terms of the principles and laws of physics and chemistry alone, but that they require in addition the principles of semiotics, the science of symbols and signs, including linguistics. For convenience, we may refer to the belief, common among contemporary molecular biologists, that the laws of physics and chemistry are necessary and sufficient to account for life as the PC (physics and chemistry) paradigm, and to the alternative view, that principles of semiotics are additionally required for a complete understanding of living systems and processes, as the PCS (physics, chemistry, and semiotics) paradigm.

It was von Neumann who first recognized the necessity for symbolic self-representation of organisms as a prerequisite for efficient self-replication. In view of the fundamental importance of this insight for biology, we may refer to this notion as the von Neumann doctrine. This doctrine was further elaborated and developed by Pattee into what may be called the theory of matter-symbol complementarity. The linguistic theory of DNA presented here can be viewed as a natural extension of the von Neumann doctrine and Pattee's theory of matter-symbol complementarity to the structure and function of DNA. (See Note Added in Proof.)

Since the discovery of the DNA double helix in 1953, many biologists have employed language as a useful metaphor to describe certain aspects of molecular biologic phenomena.
But recently it was postulated that language is more than just a metaphor and that linguistics provides a fundamental principle to account for the structure and function of the cell. This conclusion is supported by the facts (1) that cells use a language, called cell language or cellese, defined as "a self-organizing system of molecules, some of which encode, act as signs for, or trigger, gene-directed cell processes," and (2) that cell language has molecular counterparts to 10 of the 13 design features of human language (humanese) characterized by Hockett and Lyon, thus suggesting an isomorphism between cellese and humanese. Because cellese must be transmitted from one generation to the next, it must be encoded in DNA. Therefore, the main objective of this communication is to characterize the structure and function of DNA based on linguistic principles.

ISOMORPHISM BETWEEN CELL AND HUMAN LANGUAGES

Both human and cell languages can be treated as a 6-tuple (L, W, S, G, P, M), where L is the alphabet (i.e., a set of basic symbols called provosemaa), W is the vocabulary or lexicon (i.e., a set of words), S is an arbitrary set of sentences, G is a set of the rules governing the formation of sentences from words (the first articulation) as well as the formation of words from letters (the second articulation), P is a set of physical mechanisms realizing and implementing a language, and finally M is a set of objects (both symbolic and material) or processes referred to by words and sentences. Table 1 summarizes a comparison between sound-based and visual signal-based human language and molecule-based cell language with respect to these categories of linguistic features. The table is self-explanatory, and newly appearing terms are explained in the accompanying footnotes.
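As a reading aid, the 6-tuple can be written down as a small record type. The sketch below is only an illustration: the class name `Language` and its field names are assumptions introduced here, and the field values are short paraphrases of the corresponding Table 1 entries, not a formal model from the paper.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Language:
    """Illustrative record for the 6-tuple (L, W, S, G, P, M).

    The class and field names are hypothetical; only the six
    components themselves come from the text.
    """
    alphabet: str   # L: set of basic symbols
    lexicon: str    # W: set of words
    sentences: str  # S: set of sentences
    grammar: str    # G: rules for the first and second articulations
    phonetics: str  # P: physical mechanisms realizing the language
    semantics: str  # M: objects or processes referred to


humanese = Language(
    alphabet="letters",
    lexicon="words",
    sentences="strings of words",
    grammar="rules of sentence formation",
    phonetics="physiologic structures and processes of phonation and audition",
    semantics="meaning of words and sentences",
)

cellese = Language(
    alphabet="4 nucleotides (or 20 amino acids)",
    lexicon="structural genes (or polypeptides)",
    sentences="sets of genes expressed coordinately in space and time",
    grammar="physicochemical laws determining DNA folding patterns",
    phonetics="conformational dynamics of DNA",
    semantics="gene-directed cell processes",
)

# Pair up corresponding components, mirroring rows 1-6 of Table 1.
for symbol, human, cell in zip("LWSGPM", astuple(humanese), astuple(cellese)):
    print(f"{symbol}: {human}  <->  {cell}")
```

Treating both languages as instances of one type is simply a compact way of stating the claimed isomorphism: each component of humanese has a counterpart in the same position in cellese.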
TABLE 1. Comparison between Human and Cell Languages

1. Alphabet (L)
   Human language: Letters
   Cell language: 4 nucleotides (or 20 amino acids)

2. Lexicon (W)
   Human language: Words
   Cell language: Structural genes (or polypeptides)

3. Sentences (S)
   Human language: Strings of words
   Cell language: Sets of genes expressed coordinately in space and time under the control of spatiotemporal genes (a)

4. Grammar (G)
   Human language: Rules of sentence formation
   Cell language: Laws of chemistry and physics of nucleic acids that determine the folding patterns of DNA according to nucleotide sequences and microenvironmental conditions. Only a small subset of grammatically folded (hence syntactically correct) chromatin structures is selected by evolution and hence carries genetic (i.e., semantic) information.

5. Phonetics (P)
   Human language: Physiologic structures and processes underlying phonation, audition, and interpretation
   Cell language: Conformational dynamics of DNA that enables the expression of genetic information through input of free energy via protein binding and/or ATP-dependent supercoiling of DNA

6. Semantics (M)
   Human language: Meaning of words and sentences
   Cell language: Gene-directed cell processes driven by conformons (b) and intracellular dissipative structures (IDSs) (c)

7. First articulation
   Human language: Formation of sentences from words
   Cell language: Organization of gene expression in space and time (through noncovalent interactions (d))

8. Second articulation
   Human language: Formation of words from letters
   Cell language: Organization of nucleotides (amino acids) into genes (polypeptides) (through covalent interactions (e))

(a) Genes that control the spatiotemporal evolution of the expression of structural genes by regulating the time- and space-dependent folding patterns of chromosomes.
(b) Conformational strains of biopolymers that carry free energy (to do work) and information (to control work).
(c) Dissipative structures of Prigogine (or attractors) localized within the cell.
(d) Molecular interactions that do not implicate any breaking or forming of covalent bonds.
(e) Molecular interactions that involve changes in covalent bonds, namely, alterations in valence electronic configurations.

The isomorphism between cell and human languages evident in Table 1 suggests the existence of three distinct categories of genetic information in DNA, here called the lexical, syntactic, and semantic. To visualize the relation among these three categories of genetic information, it is convenient to use a loaded carousel as a metaphor for DNA, with the alignment as shown in Table 2.

Just as a grammar constrains mentally the word order in sentences, so a carousel constrains physically the positioning of slides into a linear array, any linear array. The genetic analog of this constraint is referred to as the syntactic genetic code, identified with the physicochemical constraints of nucleic acids that control the folding patterns of chromatins in response to microenvironmental conditions such as the presence of transcription factors, pH, ions, and mechanical stresses of nuclear scaffolding. Please note that there are a large number (i.e., n!) of ways of arranging n slides into n slots in a carousel. But only one, or at most a few, of these linear arrays will actually be utilized by a given speaker. The information needed (log2 n! bits) to select these few arrangements out of the large number of possible arrangements derives from the brain of the speaker. But in the case of DNA, the information determining the temporal order in which a set of genes is expressed must be encoded in DNA itself (in the form of semantic genetic code), in regions that were previously called spatiotemporal genes and postulated to be located in noncoding DNA. In the human genome, structural genes account for approximately 3% of the total DNA mass, whereas the remaining 97% of DNA is noncoding and was once thought to be without any biologic function.
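The counting argument in the carousel metaphor (n! possible orderings of n slides, hence log2 n! bits to single out one of them) can be checked numerically. The following is a minimal sketch; the function names are illustrative, and it assumes that all n! orderings are equally likely beforehand, which the text does not state explicitly.

```python
import math


def arrangement_count(n: int) -> int:
    """Number of ways to place n distinct slides into n slots: n!."""
    return math.factorial(n)


def selection_bits(n: int) -> float:
    """Bits needed to pick one ordering out of n! equally likely ones.

    Computes log2(n!) via the log-gamma function, ln(n!) = lgamma(n + 1),
    so large n does not require forming the huge integer n! first.
    """
    return math.lgamma(n + 1) / math.log(2)


if __name__ == "__main__":
    for n in (3, 10, 80):
        print(f"n = {n:3d}: log2(n!) = {selection_bits(n):.1f} bits")
```

For n = 3 there are 3! = 6 orderings, so selecting one requires log2 6, about 2.58 bits. The required information grows roughly as n log2 n, which is why the ordering information for a large set of genes is a substantial quantity that has to be stored somewhere.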
But impressive amounts of empirical data were recently accumulated in the literature, indicating that noncoding regions, particularly "repetitive sequences," play an important role in genetic control processes. Consistent with these developments, it is postulated here that these noncoding regions regulate the spatiotemporal evolution of the expression of structural genes and thus contain genetic information analogous to the semantic information of sentences. The genetic information that determines the spatiotemporal organization of gene expression is referred to as "semantic genetic code." It is thought that semantic genetic information is a subset of syntactic genetic information, just as semantically meaningful sentences constitute but a small subset of grammatically correct sentences in human language. The syntactic genetic information is distributed over the whole DNA molecule in that every aspect of the physics and chemistry of DNA affects the dynamics of DNA. Therefore, the sum of all the genetic information encoded in DNA is 200% (Table 2). This makes sense only if we can assume that DNA structures encode more than one kind of information within identical sequences and that different kinds of genetic information can overlap in DNA, in agreement with the multiple genetic code hypothesis of Trifonov (reference 14). The present result is also consistent with the view that DNA possesses dual or complementary aspects: dynamic and semiotic, or material and symbolic.

TABLE 2. The "Loaded Carousel" Model of DNA Structure and Function

Slides
   Molecular representation: Structural genes in coding DNA
   Genetic code (% of DNA mass involved): Lexical genetic code (3%)

Carousel
   Molecular representation: Sugar-phosphate backbone, Watson-Crick base pairing, chemistry and physics of DNA
   Genetic code (% of DNA mass involved): Syntactic genetic code (100%)

Order of slides
   Molecular representation: Space- and time-dependent gene expression, made possible by space- and time-dependent foldings of chromatins exposing right genes at right times, all regulated by spatiotemporal genes located in noncoding DNA
   Genetic code (% of DNA mass involved): Semantic genetic code (97%)
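The 200% total follows directly once the three codes are allowed to overlap on the same stretch of DNA. The toy bookkeeping below is a sketch under stated assumptions: a made-up "genome" of 100 positions stands in for 100% of DNA mass, and the first 3 positions are arbitrarily assigned to structural genes. Only the percentages (3%, 97%, 100%) come from Table 2.

```python
# Hypothetical toy genome: 100 positions standing in for 100% of DNA mass.
GENOME = range(100)

lexical = set(range(3))        # structural genes in coding DNA (3%)
semantic = set(range(3, 100))  # spatiotemporal genes in noncoding DNA (97%)
syntactic = set(GENOME)        # physicochemical constraints act on all DNA (100%)


def percent(code: set) -> float:
    """Share of the toy genome involved in a given code, in percent."""
    return 100 * len(code) / len(GENOME)


total = percent(lexical) + percent(semantic) + percent(syntactic)

# Count how many codes each position participates in.
codes_per_position = [
    (p in lexical) + (p in semantic) + (p in syntactic) for p in GENOME
]

print(f"lexical {percent(lexical)}% + semantic {percent(semantic)}% "
      f"+ syntactic {percent(syntactic)}% = {total}%")
print("every position carries two codes:",
      all(c == 2 for c in codes_per_position))
```

In this toy picture the total exceeds 100% only because every position is counted twice, once for the syntactic code and once for either the lexical or the semantic code, which is the overlap the paragraph above requires.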
The syntactic genetic code represents the dynamic or material aspect of DNA obeying the laws of physics and chemistry, while lexical and semantic genetic codes constitute the symbolic (or sign) aspect that obeys the rules forged by biologic evolution. This interpretation fits nicely with the notion of the matter-symbol complementarity (or more generally matter-sign complementarity; see Note Added in Proof) as the most fundamental distinguishing feature of biology vis-à-vis physics and chemistry.

Indirect evidence for the existence of spatiotemporal genes (carrying semantic genetic code) was recently provided by Amano et al. (reference 15). Their data from Figure 8 can be replotted in a graph of the percentage of noncoding bases per genome versus the relative amount of structural genes in the form of transcription factors per genome to obtain two lines, one with a zero slope passing through five species of unicellular organisms (Mycoplasma genitalium, Haemophilus influenzae, Methanococcus jannaschii, Synechocystis sp., and Escherichia coli) and the other with a slope of about 20 passing through three species (Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens), two of which are multicellular organisms. Interestingly, these lines intersect in the neighborhood of E. coli and S. cerevisiae. Two conclusions may be drawn from this plot: (1) the amount of noncoding DNA increases abruptly with the multicellularity of organisms (most likely due to the fact that noncoding regions act as "spatiotemporal genes" regulating the development of multicellular organisms), and (2) of the two mechanisms for regulating gene expression, trans mechanisms mediated by transcription factors and cis mechanisms mediated by noncoding regions, the latter contributes to a greater extent (due to the slope being greater than 1) than the former as the complexity of multicellular organisms increases.
The role of noncoding DNA strongly suggested by these data is difficult to accommodate within the traditional view that the final referents or meaning of genes are polypeptides. However, the data are consistent with the so-called DNA-polypeptide-IDS hypothesis, which claims that the final products of genes (i.e., structural genes under the control of spatiotemporal genes) are not polypeptides but dynamic processes collectively called intracellular dissipative structures (IDSs), whose generation is catalyzed by enzymes encoded in structural genes. IDSs include ionic gradients in the cytosol or across biomembranes, and mechanical stress gradients in biopolymers including cytoskeletons and DNA, all of which together act as the proximal or immediate causes for cell functions (see also Fig. 1).

According to some linguists, the phenomenon of double articulation or duality (see seventh and eighth rows in Table 1) is the most fundamental aspect of all human languages. The cell language theory is based on the basic assumption that the cell-linguistic counterpart of double articulation is the duality of covalent and noncovalent interactions in the cell. Just as the first and second articulations are both essential in human language, so it is postulated that both covalent and conformational interactions are fundamental in cell language (enabling intercellular communication and signal transduction). This postulate appears to provide the first explicit rationale for the fundamental role of conformational interactions in molecular biology, as observed in ligand-protein interactions, protein foldings, and chromatin reorganizations during the cell cycle.

FIGURE 1. The Bhopalator, a molecular model of the living cell. The cell is a molecular machine in that its moving parts are made out of molecules, some of which act as molecular motors driven by conformational strains (called conformons) generated from chemical reactions or ligand-binding reactions. The final form of expression of genes is not polypeptides, as usually thought, but dissipative structures of Prigogine (or attractors) (see the rectangle), namely gradients of chemical concentrations and mechanical stresses in the cell. These dissipative structures act as the direct causes for all cell functions. Solid arrows indicate the direction of information flow: from DNA to mRNA to proteins to dissipative structures of Prigogine (also called intracellular dissipative structures, or IDSs), and back to DNA. Dotted arrows indicate feedback interactions. The cell receives inputs from its surroundings (Step 19), processes them according to the genetic information stored in DNA (Steps 5-11 and 1-4), and outputs the result (Step 20). Because IDSs can influence the rate of mutations and recombinations of DNA (Step 10), DNA can guide its own evolution. That is, DNA is a self-evolving molecule driven by conformons and IDSs. For more information, see reference 9, particularly p. 178.

THE BHOPALATOR: A MOLECULAR MACHINE THAT ACCEPTS CELL LANGUAGE

Since the founding of the cell theory in the mid-nineteenth century, there had been no rigorous and comprehensive theoretic model of the living cell available in the literature until 1983, when the Bhopalator model of the cell was proposed in a meeting held in Bhopal, India. The name of the model reflects the convention that mechanisms of self-organizing chemical reaction-diffusion systems are named "X-ators," where X is the name of a city connected in some way with the model.
Two concepts are novel in this model: (1) conformons, sequence-specific conformational strains of biopolymers that provide free energy and control information for driving all molecular motors in the cell, and (2) intracellular dissipative structures (IDSs), gradients of chemical concentrations and mechanical stresses in the cell that mediate information transfer from the nucleus to the cytosol and from the cytosol to the extracellular space. With conformons and IDSs (synonymous with "attractors"), the cell can not only "read" genetic messages encoded in DNA but also "implement" and "reify" these messages into molecular processes and actions constituting cell functions. In other words, the Bhopalator can be viewed as a molecular machine that accepts cell language encoded in DNA, just as the Turing machine acts as an abstract machine that accepts and defines a formal language encoded on a tape. It is to be noted that these molecular entities that drive the cell, namely conformons and IDSs, can be viewed as the microscopic embodiments of the matter-symbol complementarity discussed by Pattee.

PREDICTIONS

1. The cell language theory predicts that DNA of higher eukaryotes contains two kinds of genes: structural genes located in coding regions (accounting for ~3% of the human genomic mass) and spatiotemporal genes located in noncoding regions (~97% of the human genomic mass).
2. Spatiotemporal genes encode the information controlling the timing of gene expression.
3. The timing information encoded in spatiotemporal genes is retrieved through space- and time-dependent chromatin folding and unfolding processes driven by ATP-dependent topoisomerases and free energy-releasing binding interactions between transcription factors and DNA.

CONCLUSION

The cell language theory and the Bhopalator model of the living cell provide the first comprehensive and rigorous theoretic framework for molecular and cell biology.
As such, ‘they may find important applications in functional genomics in the coming decades. [Nove apne 1s pxoor: It was recently suggested elsewhere (S. Ji, “The cell as the ‘smallest DNA-based molecular computer” BioSystems, in press) thatthe ideas of J. von Neumann (ie. the necessity of self-representation for seli-teproduction) and H. Pattee ‘namely, the matter-symbol complementarity as an essential feature of ll sel-eproducing systems) be combined into what may be called the “von Newnann-Patte principle of mat- tersign complementarity” The term ‘symbol is replaced with the more general term, ‘sign, since according to CS. Peirce (1839-1914), signs include symbols along with icons and indexes (J.J. Liszka, “A General Induction o the Semeiotc of Charles Sand= ets Peirce, Indiana University Press, Bloomington, 1996). The essential content of the vor II: LINGUISTICS OF DNA 47 Neumann-Patte principle of maver-sign complementarity is that all sef-reproducing $yS- tems embody two complementary aspects—the physical law-governed materiaVenergetic aspect and the evolutionary rul-governet sign aspects. The dual role of DNA revealed in ‘Table 2, namely the fact that [syntactic (100%)] + [lexical (39%) + semantic (979)] = 200%, finds «rigorous theoretical rationale in the von Neumann-Pattee principle of the ‘matter (syntacti)-sign (lexical & semantic) complementarity] REFERENCES, 1, Pass, HLHL 1968. The physical basis of coting and reliability in biological evolution. fn “Towards a Theoretical Biolog. |. Prolegomena. CH. Waddington, Ed 67-93. Aldine Pub- lishing Co. Chicago. 2. Perm, HH. 1970. The problem of biological hierarchy. Jn Towards a Theoretical Biology. 3. Drafls.CH. Waddington, Ed 117-136. Aldine Publishing. Chicago. 3. Perms, Hil. 1972, Laws and constrains, symbols and languages. In Towards a Theoretical ‘Biology. 4 Essays. CH. Wadditon, Ed.” 248-258, Edinburgh University Press. Einar 4. Vox Neth, J. 1966. Theory of Self Reproducing Automata. A.W. 
Burks, Bd: 122-123. Uai- Versi of Minos Pres. Uta 5, Parr, HL. 1982, Cell Psychology: An Evolutionary Approsch tothe Symbol Matter Prob- ‘em. Cog. Brain Theor §:325-341 6, Parmin, HI, 1995, Evolving Seif-Reference: Mater, Symbols, and Serande Closure, Inte. ‘Study Arie, Intll, Cogn. Sei. Appl. Epstemo. 12: 9-77 17, Seamso, ML 1991. Four Analogies Betseen Biological abd CultualLinguistc Evoluon. 3 "Theoret, Biol. 151: 467-507, 8, GasciBauuioo, A. 1984, Towards a Genctic Grammar. An English version of “Hacia una ‘Granuica Gensica,” Real Acadeaia de Ciencias Exacas, isis y Naturales, 9. 8, $1991 Biocybereies: A Machine Theory of Biology. In Molecular Theories of Cell Life ‘and Deas, S11, Ed: 1-237, Rutgers Universiy res. New Brunswick. 10, 4,8. 1997. Isomerphism between cell and human languages: Molecular biological, bioinfor- ‘matic and linistc implications. BioSystems 4: 17-89, 11, J, 5.1998, Cell Language (Cells): Implications for Biology, Linguistics and Philosophy. “international Workshop om the Linguistics of Biology a the Biology of Language, CHEN, Universidad Nacional Auténoma de Mexico, Cuemsvaca, Mésico, March 23-27, Fr abstract, see hp:fwwrcfn nam mComputationl_Biology998l 12, Mateus S. 1967. Algebraic Linguistics; Analycal Models. Academic Press. New York 15, Braioucnanl SK, G. Mines, Ps, Saskate P-Baisctnewoonrny, J. Tapas, 8 Rasta, U. Stauioran & S. Parasia 1995. Simple repetitive sequences inthe genome: Stecture and funcional significance. Electophoresis 16: 1708-1714 14, Thurow, EN, 1989, The multiple codes of nucleotide sequences. Bil. Math, Bio. 1: 417 2, 15. Asano, N.Y. Ouruks de M, Suk, 1997, Genomes and DNA conformation. Bil, Chem. 378: 1397-1404.
