Aminoacidos Cod Binarios

Downloaded from http://rsta.royalsocietypublishing.
org/ on May 22, 2018
The non-power model of the

genetic code: a paradigm for
rsta.royalsocietypublishing.org
interpreting genomic
information
Research Diego Luis Gonzalez1,2 , Simone Giannerini1
Cite this article: Gonzalez DL, Giannerini S, and Rodolfo Rosa2
Rosa R. 2016 The non-power model of the
1 Dipartimento di Scienze Statistiche, Università di Bologna,
genetic code: a paradigm for interpreting
genomic information. Phil. Trans. R. Soc. A 374: Via delle Belle Arti 41, 40126 Bologna, Italy
2 CNR-IMM, Sezione di Bologna, Via Gobetti 101, 40129 Bologna, Italy
20150062.
http://dx.doi.org/10.1098/rsta.2015.0062 DG, 0000-0001-9646-9900; SG, 0000-0002-0710-668X
Accepted: 27 October 2015 In this article, we present a mathematical framework

based on redundant (non-power) representations of
integer numbers as a paradigm for the interpretation
One contribution of 21 to a theme issue of genomic information. The core of the approach
‘DNA as information’. relies on modelling the degeneracy of the genetic code.
The model allows one to explain many features and
Subject Areas: symmetries of the genetic code and to uncover hidden
biomathematics, biophysics symmetries. Also, it provides us with new tools for
the analysis of genomic sequences. We review briefly
Keywords: three main areas: (i) the Euplotid nuclear code, (ii)
genetic code, numeration systems, the vertebrate mitochondrial code, and (iii) the main
coding/decoding strategies used in the three domains
degeneracy, symmetry
of life. In every case, we show how the non-power
model is a natural unified framework for describing
Author for correspondence: degeneracy and deriving sound biological hypotheses
Simone Giannerini on protein coding. The approach is rooted on number
e-mail: simone.giannerini@unibo.it theory and group theory; nevertheless, we have kept
the technical level to a minimum by focusing on key
concepts and on the biological implications.
1. Introduction
In this article, we present an innovative mathematical
description of the genetic code that opens new avenues in
the interpretation of genetic information. The approach
is based on number theory and has been proposed
for the first time in [1]. Since then, new routes have
2016 The Author(s) Published by the Royal Society. All rights reserved.
Downloaded from http://rsta.royalsocietypublishing.org/ on May 22, 2018
been explored and led to several advancements but the full potential of the approach has yet to be
2
exploited. The main idea relies on modelling the degeneracy of the genetic code. Degeneracy is
a universal property shared by all the variants of the genetic code and it might be associated
rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

.........................................................
to key biological functions. Our model is based on redundant numeration systems where an
integer number can have more than one binary representation. We establish a correspondence
between codons and length-6 binary strings, on the one side, and between amino acids and integer
numbers, on the other side.
The article is structured as follows. In §2, we present a brief historical account of the
mathematical attempts to understand the mechanisms of protein coding from an informational
point of view since the discovery of DNA in 1953. After more than 50 years, the challenge is
still open and indicates clearly that the quest for mathematical regularities is important since it
is universally accepted that these are associated to physico-chemical properties. In §3, we outline
the three main aspects of the problem. First, we show that the non-power model describes exactly
the degeneracy distribution of the Euplotid nuclear genetic code. The analysis allows also the
uncovering of hidden symmetries in its structure. The mathematical properties of the model have
a biochemical interpretation that leads to the definition of dichotomic classes, nonlinear operators
that allow one to code the genetic information in a new fashion and provide innovative tools for
the statistical analysis of genomic sequences. The second aspect concerns the structure of the
vertebrate mitochondrial code. We present a non-power model of the degeneracy distribution
and a biological hypothesis on the origin of amino acid coding based on ancient tRNA adaptors
acting on a special set of four-base codons that we call tesserae. Symmetry, again, is the key feature
for connecting the mathematical model with the chemical and biological features of the genetic
code. The third aspect concerns the coding strategies used in the different domains of life. Each
strategy has an observed minimal number of adaptors needed to implement it. We show that
such numbers can be obtained exactly by means of our non-power modelling. In §4, we present
the conclusion and further perspectives.
2. Background and history

Since the discovery of the double helix structure of DNA by Watson and Crick in 1953, many
mathematical hypotheses regarding how proteins are coded have been proposed [2]. Among the
most remarkable are those of George Gamow (crystal model [3]) and Francis Crick (comma-free
codes [4]). Despite the fact that both hypotheses were experimentally proved incorrect [5], Crick’s
proposal indicated clearly that coding theory is ‘the’ framework for interpreting the mechanisms
of protein synthesis. We quote here the words of Gamow in a letter to Watson and Crick [6, p. 10]:
[I] think that this brings biology over into the group of ‘exact’ sciences [. . .]. If your point
of view is correct, and I am sure it is at least in its essentials, each organism will be
characterized by a long number written in quadrucal1 system [. . .]
Gamow trusted that this digital representation of genetic information would have opened a new
era in the understanding of life.
Somehow, in the following years, the interest in a mathematical description of the genetic code
faded away, and subsequent studies were more targeted to explain features of the code mainly
on a biological and molecular basis. However, after more than 50 years, we think that the original
expectation of Gamow still makes sense. The mathematical description of the regularities of the
genetic code has important theoretical and practical consequences. From a theoretical point of
view, it provides information about the origin and evolution of the code that it is difficult to
obtain otherwise. From a practical point of view, it might be useful to improve bioinformatics
algorithms and, in medicine, for the diagnosis and therapy of genetic diseases. Indeed, many
1
That is quaternary (authors’ note).
Table 1. Left: Euplotid nuclear genetic code. Right: vertebrate version of the mitochondrial genetic code. The differences with
3
the Euplotid nuclear code are evidenced in grey (yellow online). (Online version in colour.)

.........................................................
U C A G U C A G
UUU Phe UCU Ser UAU Tyr UGU Cys U UUU Phe UCU Ser UAU Tyr UGU Cys U
UUC Phe UCC Ser UAC Tyr UGC Cys C UUC Phe UCC Ser UAC Tyr UGC Cys C
U UUA Leu UCA Ser UAA Stop UGA Cys A U UUA Leu UCA Ser UAA Stop UGA Trp A
UUG Leu UCG Ser UAG Stop UGG Trp G UUG Leu UCG Ser UAG Stop UGG Trp G
CUU Leu CCU Pro CAU His CGU Arg U CUU Leu CCU Pro CAU His CGU Arg U
CUC Leu CCC Pro CAU His CGC Arg C CUC Leu CCC Pro CAU His CGC Arg C
C CUA Leu CCA Pro CAA Gln CGA Arg A C CUA Leu CCA Pro CAA Gln CGA Arg A
CUG Leu CCG Pro CAG Gln CGG Arg G CUG Leu CCG Pro CAG Gln CGG Arg G
AUU Ile ACU Thr AAU Asn AGU Ser U AUU Ile ACU Thr AAU Asn AGU Ser U
AUC Ile ACC Thr AAC Asn AGC Ser C AUC Ile ACC Thr AAC Asn AGC Ser C
A AUA Ile ACA Thr AAA Lys AGA Arg A A AUA Met ACA Thr AAA Lys AGA Stop A
AUG Met ACG Thr AAG Lys AGG Arg G AUG Met ACG Thr AAG Lys AGG Stop G
GUU Val GCU Ala GAU Asp GGU Gly U GUU Val GCU Ala GAU Asp GGU Gly U
GUC Val GCC Ala GAC Asp GGC Gly C GUC Val GCC Ala GAC Asp GGC Gly C
G GUA Val GCA Ala GAA Glu GGA Gly A G GUA Val GCA Ala GAA Glu GGA Gly A
GUG Val GCG Ala GAG Glu GGG Gly G GUG Val GCG Ala GAG Glu GGG Gly G
regularities in the genetic code were clear from the very beginning (e.g. [7,8]).2 For example, in
most cases, codons sharing the first two nucleotides code for the same amino acid. Furthermore,
some biochemical properties of amino acids are correlated with specific nucleotides in different
codon positions; in fact, codons with the same nucleotide in the second position are associated to
similar amino acids; also, there is a correlation between the first base of codons and the precursors
from which the encoded amino acids should be synthesized [9,10].
The genetic code is a translation table that connects the world of nucleic acids, where biological
information is stored, to the world of proteins, the chemical bricks of cellular metabolism. The
genetic information is stored in double helix DNA molecules. Part of this information is converted
into the single helix messenger RNA (mRNA) through a process called transcription. In this
process, thymine (T), one of the four bases thymine (T), cytosine (C), adenine (A), and guanine
(G) that compose DNA, is replaced by uracil (U). Counting from the start signal, every group of
three contiguous bases in mRNA forms a codon. The genetic code assigns each of the 64 possible
codons to the 20 amino acids (plus stop signals) so as to form the protein polymeric chain. Such
process is called translation. Since there are 64 codons and 20 amino acids, the genetic code is
not a one-to-one mapping, that is, more than one codon can code for the same amino acid. For
this reason, amino acids are called degenerate and codons are called redundant or synonymous.
The degeneracy distribution of amino acids is the number of amino acids that share a given
degeneracy. All the known variants of the genetic code are degenerate, and each one has its
peculiar degeneracy distribution which can be grouped into two main classes, i.e. nuclear and
mitochondrial variants. In table 1, we show the Euplotid nuclear and the vertebrate mitochondrial
genetic codes; these two variants are the most symmetric within their classes. The corresponding
degeneracy distributions are shown in table 2. In the following, we will show that there is an
underlying strong mathematical structure and several hidden symmetries that can be uncovered
by means of a unifying approach based on number theory.
2
For the first time, the three articles by Rumer on the symmetry of the genetic code have been translated from Russian and
are included in this issue.
Table 2. Degeneracy distribution (inside quartets) of the Euplotid version of the genetic code (left) and of the vertebrate
4
mitochondrial genetic code (right).

.........................................................
Euplotid nuclear mitochondrial
degeneracy no. amino acids degeneracy no. amino acids

4 8 4 8
..........................................................................................................................................................................................................
3 2 2 16
..........................................................................................................................................................................................................
2 12
...................................................................................................
1 2
..........................................................................................................................................................................................................
3. A mathematical model of the genetic code

In order to model a genetic code and its degeneracy distribution, we need to find a non-bijective
mapping that describes it. We use numeration systems where each amino acid is represented
by an integer number and a codon by its 6-bit binary representation (26 = 64). For instance,
the integer number 18 has the binary representation 010010 since 0 · 25 + 1 · 24 + 0 · 23 + 0 · 22 +
1 · 21 + 0 · 20 = 16 + 2 = 18. However, usual number representation systems (like the binary
system) are not redundant since each integer number has a unique representation (i.e. a unique
binary string). One way of creating a redundant numeration system is to replace the powers of
the base 2 with a series that grows more slowly than 2n [11]. A known example is the binary
Fibonacci non-power system where the powers of 2 are replaced by the Fibonacci numbers,
i.e. 1 1 2 3 5 8 (note that this series grows more slowly than the powers of 2). In this system,
number 18 has two representations, 111011 and 111100: 1 · 8 + 1 · 5 + 1 · 3 + 0 · 2 + 1 · 1 + 1 · 1 =
18 = 1 · 8 + 1 · 5 + 1 · 3 + 1 · 2 + 0 · 1 + 0 · 1. The degeneracy distribution of the Fibonacci non-
power system is different from that of all the known versions of the genetic code. In the following,
we show the unique non-power solutions that describe exactly both the Euplotid nuclear and the
vertebrate mitochondrial genetic codes. For a brief review on redundant numeration systems, see
appendix A.
(a) The Euplotid nuclear genetic code

The degeneracy distribution inside quartets of the Euplotid nuclear genetic code is shown in
table 2(left). We chose this version of the genetic code because it differs only in one codon
assignment from the standard code and represents the most symmetric version of all nuclear
codes; we will discuss in the second section the crucial role of symmetry in our approach. A
quartet is a set of four codons that share the first two letters, for example CTT, CTC, CTA and
CTG. The main motivation for the use of quartets is that the genetic code is implemented by
means of tRNA adaptors that match anti-codons to amino acids. Because of wobble pairing,
tRNAs can recognize up to four codons of the same quartet. For this reason, amino acids with
degeneracy 6 (Arg, Leu, Ser) are composed by (at least) two elements, one of degeneracy 2 and
one of degeneracy 4.
The key result is that the set of non-power weights (8, 7, 4, 2, 1, 1) is the unique solution that
describes exactly the degeneracy distribution of the Euplotid nuclear genetic code of table 2(left).
It is possible to show that no other non-trivial solution exists. The study of the symmetries allows
one to establish connections between the genetic code and the non-power system so as to obtain
a proper mathematical model. The final result is the model shown in figure 1 where each codon
is assigned a 6-bit binary string and each amino acid is assigned an integer number between 0
and 23. For example, the amino acid Asn (asparagine) is coded by AAT and AAC (degeneracy 2).
The model associates Asn to number 18, and the two strings that represent it, 110110 and 110101,
to codons AAT and AAC, respectively. The model possesses all the observed symmetries of the
U C A G 5
1 000001 Phe 14 011110 Ser 5 001001 Tyr 7 001101 Cys U

.........................................................
1 000010 Phe 14 011101 Ser 5 001010 Tyr 7 001110 Cys C
U
4 000111 Leu 14 101100 Ser 21 111011 Ter 7 010000 Cys A
11 011000 Leu 14 101011 Ser 21 111100 Ter 0 000000 Trp G
11 100101 Leu 8 010010 Pro 3 000101 His 12 011010 Arg U
11 100110 Leu 8 010001 Pro 3 000110 His 12 011001 Arg C
C
4 001000 Leu 8 100000 Pro 17 110100 GIn 19 110111 Arg A
11 010111 Leu 8 001111 Pro 17 110011 GIn 12 101000 Arg G
16 110010 IIe 9 100001 Thr 18 110110 Asn 22 111110 Ser U
16 110001 IIe 9 100010 Thr 18 110101 Asn 22 111101 Ser C
A
16 101111 IIe 9 010011 Thr 2 000100 Lys 19 111000 Arg A
23 111111 Met 9 010100 Thr 2 000011 Lys 12 100111 Arg G
13 101001 Val 15 101101 Ala 20 111010 Asp 10 010110 Gly U
13 101010 Val 15 101110 Ala 20 111001 Asp 10 010101 Gly C
G
13 011100 Val 15 011111 Ala 6 001011 Glu 10 100011 Gly A
13 011011 Val 15 110000 Ala 6 001100 Glu 10 100100 Gly G
Figure 1. Non-power model of the Euplotid nuclear genetic code: each codon is assigned to a binary string and each amino acid
is assigned to an integer number between 0 and 23. The 32 codons associated to amino acids with degeneracy 4 (Rumer’s class
1) are coloured in green. The 32 codons in white have Rumer’s class 0, i.e. their amino acids have degeneracy 1, 2 or 3. (Online
version in colour.)
Table 3. String representation of the symmetry with respect to the YR character of the third base of a codon.
string codon
8 7 4 2 1 1
..........................................................................................................................................................................................................
x x x x 1 0
.......................................................................................................................................... NNY
x x x x 0 1
..........................................................................................................................................................................................................
x x x x 1 1
.......................................................................................................................................... NNR
x x x x 0 0
..........................................................................................................................................................................................................
genetic code. For instance, the pyrimidine exchange (T ↔ C) in the third letter of the codon does
not change the coded amino acid. This symmetry is particularly remarkable since it is shared by
all the known variants of genetic codes (both nuclear and mitochondrial). Indeed, in the model,
strings of the kind xxxx01 and xxxx10 represent the same integer number. This implies also that
strings ending in 01 or 10 represent codons ending in C or T (pyrimidine), whereas strings ending
in 00 or 11 represent codons ending with a purine (table 3). Even more striking, the analysis of
the model allows one to uncover hidden symmetries in the genetic code. For a detailed analysis
of the model and the rationale of its assignations, see [1,12].
In [1,12–14], we have shown that the mathematical properties of the binary strings are linked
to the chemical properties of the bases of the codons. The analysis led us to the definition of
dichotomic classes that divide the 64 codons in two halves. The first dichotomic class is the parity
of a codon, defined as the parity of the associated binary string. It is possible to show that this
mathematical operation on the binary string coincides with a chemical algorithm acting on the
last two bases of the codon. Note that nucleotides—U,C,A and G—can be classified according to
(a) (b) (c)

6
b1 b2 b3 b1 b2 b3 a1 a2 a3 b1 b2 b3

.........................................................
Y R KM WS
MK A G W S C A R Y C G R Y
0 1 1 0 0 1 1 0 0 1 0 1 0 1
Figure 2. Algorithmic representation of the dichotomic classes: (a) parity, (b) Rumer and (c) hidden. (Online version in colour.)
their chemical character as follows:
{Purine; Pyrimidine} R = {A, G}; Y = {C, U}

{Keto; Amino} K = {U, G}; M = {A, C}
{Strong; Weak} S = {C, G}; W = {A, U}.
The algorithmic representation of the parity is shown in figure 2a. The algorithm can be described
as follows. If the last letter of the codon is a purine (R = A, G), then the parity of the binary
string is derived directly: A → odd, G → even. If the last letter is a pyrimidine (Y = U, C), then
the parity is inferred from the chemical class of the second base of the codon: Amino (M = A,
C) → even, Keto (K = U, G) → odd. In addition to the parity, there are two more dichotomic
classes. The three of them are based on chemical algorithms where the chemical characters
involved are always the same in the three bases: YR on the third base, KM on the second base
and SW on the first base. Notably, if we shift the parity algorithm so as to act on the first
two bases of the codon, we obtain a dichotomic class that coincides exactly with Rumer’s class
(figure 2b). It is a bi-partition of the genetic code observed by the theoretical physicist Rumer in
the 1960s and is related to the degeneracy of amino acids inside quartets: 32 codons code for
amino acids with degeneracy 4, whereas the other 32 code for amino acids with degeneracy
non-4 (1, 2 or 3 in this case) (see also figure 1). Rumer also found out that the class is anti-
symmetric with respect to the KM transformation (i.e. if we apply the KM transformation to
a codon, the Rumer class changes). Note that the parity class is anti-symmetric with respect
to the YR transformation. At this point, a complete framework for dichotomic classes can be
derived: if we shift the algorithm one base to the 5 end of the nucleic acid, we can define a third
dichotomic class which we called hidden class (figure 2c). The hidden class can be defined either
inside a codon (first and third base) or between two contiguous codons and is anti-symmetric
with respect to the SW transformation. It is possible to derive dichotomic classes in terms of
matrix operators and show that they are nonlinear functions of the chemical properties of a
dinucleotide. Moreover, they are related to the Klein V group of transformations. Dichotomic
classes provide a new, non-trivial way for recoding the information of nucleotide sequences in
a binary code. This has several important implications for the statistical analysis of genomic
data and for bioinformatics. In [14], we have shown that coding sequences are characterized
by universal short-range correlations that show up only by looking at dichotomic classes.
Moreover, in [15], we have shown that dichotomic classes can be used to design algorithms for
frame retrieval.
Table 4. Non-power weights for the vertebrate mitochondrial and Euplotid nuclear genetic codes.
7
type non-power weights

.........................................................
mitochondrial 8,8,4,2,1,0
..........................................................................................................................................................................................................
Euplotid nuclear 8,7,4,2,1,1

..........................................................................................................................................................................................................
(b) The mitochondrial genetic code

The main motivation that led to the mathematical model of the Euplotid nuclear code was the
search for informational error detection/correction mechanisms. Indeed, we found the main
ingredients of error detection/correction codes. For instance, there are connections between
dichotomic classes, orthogonal arrays and finite groups. Also, redundancy/degeneracy is related
to discrete symmetry groups. Can we hope to crack the hypothesized mechanisms of error
detection/correction? The problem is a very difficult one but if these mechanisms are universally
based on the genetic codes, each one with its own degeneracy, then it is natural to focus on the
simplest life system that possesses a genetic code: the mitochondrion. The genetic code of the
mitochondrion is the simplest and most symmetric of all code variants and has been proposed
as a model for the early code [16,17], the progenitor of the universal genetic code of LUCA (Last
Universal Common Ancestor).
The vertebrate mitochondrial genetic code differs from the standard nuclear genetic code in
just four codons (table 1). However, the degeneracy distribution of the mitochondrial code is
much simpler and more symmetric (table 2(right)). Here, amino acids have either degeneracy
2 or 4. Remarkably, it is possible to show that, also for the mitochondrial code, there is a
unique non-power representation system that describes exactly its degeneracy distribution. The
representation is determined by the six non-power weights (8,8,4,2,1,0). In table 4, we show
the non-power weights for the two genetic codes. Notably, the simplification in the degeneracy
distribution of the mitochondrial code is associated to the presence of a 0 weight (i.e. there are no
elements with degeneracy 1). Since the 0 weight does not enter in the additive decomposition
of a number, its effect is that of a duplication label. This implies that the representation is
split in two identical halves. This fact has profound consequences in the description of the
associated degeneracy and led us to a biological hypothesis on the origin of degeneracy in protein
coding [18].
Similarly to the mathematical model, the biological hypothesis is based on symmetry
properties. Symmetry is the key feature for connecting the mathematical model with the chemical
and biological features of the genetic code. In brief, the main result is that the degeneracy
distribution of the vertebrate mitochondrial genetic code can be exactly described by primeval
tRNA adaptors acting on a particular set of four-base codons that we called tesserae (from
the Greek tessera = four). These adaptors possess the reverse and the self-complementary
symmetries. These symmetries imply the invariance of the interaction Hamiltonian with respect
to such spatial transformations. An interesting possibility to be explored is that conserved
quantities associated to these symmetries might have contributed heavily, in evolutionary terms,
to the shape of present codes. The tessera set is presented in table 5.
Our results on the study of the mitochondrial genetic code have important implications
regarding: (i) molecular evolution and the origin of protein coding, (ii) the information-based
error detection/correction mechanisms in the synthesis machinery, and (iii) the description of
decoding strategies in the three domains of life in terms of wobble symmetries and redundant
representations systems. In evolutionary terms, the proposed decoding system is placed before
the early code, the code that preceded the universal genetic code of LUCA. The early code has
been hypothesized to have the same symmetry as the mitochondrial genetic code [17,19,20].
Our theory complements this hypothesis since our tessera code, possessing the same symmetry
as the present mitochondrial genetic code, can be seen as a pre-early code, an ancestor of the
early code. The tessera code exhibits error-detecting capabilities (complete immunity to point
Table 5. Complete table of tesserae for the mitochondrial genetic code in the form of four-bases-length codons with particular
8
symmetry properties.

.........................................................
AAAA AAUU AAGG AACC
UUUU UUAA UUCC UUGG
CCCC CCGG CCUU CCAA
GGGG GGCC GGAA GGUU
AUUA AUAU AUCG AUGC
UAAU UAUA UAGC UACG
CGGC CGCG CGAU CGUA
GCCG GCGC GCUA GCAU
AGGA AGCU AGAG AGUC
UCCU UCGA UCUC UCAG
CUUC CUAG CUCU CUGA
GAAG GAUC GAGA GACU
ACCA ACGU ACUG ACAC
UGGU UGCA UGAC UGUG
CAAC CAUG CAGU CACA
GUUG GUAC GUCA GUGU
errors) and +1 frame-shift immunity. Such features explain why the evolutionary pressure might
have led to the selection of this mechanism. In fact, accuracy in protein synthesis needs to
have been preserved to some extent in actual organisms and the preservation of the degeneracy
distribution over geological times might be the molecular evidence of such origin. Note that it
has been proposed that extant triplet codes derive from primeval codes having codons with
more than three nucleotides [21]. This could ensure the necessary bonding stability for direct
decoding without ribosomes. Moreover, the feasibility of decoding codons of four letters has been
demonstrated with evolved extant ribosomes [22]. In [23], ribosomes have been made evolving to
reduce premature termination; in [24], such ribosomes further evolved so as to obtain the same
fidelity and efficiency as with triplet codons. These ribosomes were termed ribo-Q due to their
capability to efficiently read quadruplet codons.
Regarding the connection with extant decoding strategies, both the non-power model and
the hypothesis of primeval symmetric tRNAs provide an exact prediction about the number of
adaptors needed to implement the code. Note that this number is 22, the minimal number used
in all extant forms of life. Amino acid decoding is implemented by tRNA adaptors, so that one
might state that the actual degeneracy distribution is observed at the level of tRNAs. The case of
the mitochondrial genetic code is a peculiar one, not only because it uses the minimal number of
adaptors, but also because the degeneracy distribution of the code coincides with the degeneracy
distribution of its tRNAs, namely, each element of the degeneracy distribution is described by
exactly one adaptor (each amino acid is associated to exactly one adaptor). This is an extreme
case of the decoding strategies used in extant forms of life.
(c) Coding strategies

A hypothesis about the evolution of the genetic code from the putative early code is that the
decoding of all codons was performed with a minimum number of tRNA species [17]. Since post-
transcriptional modifications of the first base of the anti-codon differ widely between organism
lines, such modifications are supposed to have been introduced after the birth of the early code.
Table 6. Minimal number of adaptors associated to the three decoding strategies presented in [25] and the corresponding
9
number predicted through the non-power approach. The last row reports the associated non-power weights.

.........................................................
strategy
I II III
min. number observed 46 33 22
.........................................................................................................................................................
of adaptors predicted (non-power) 46 33 22

..........................................................................................................................................................................................................
non-power weights (16,16,8,4,2,1) (16,8,4,2,1,0) (8,8,4,2,1,0)

..........................................................................................................................................................................................................
However, the constraint of the minimum number of adaptors also holds for different extant
decoding strategies, developed after the introduction of post-transcriptional modifications. The
work in [25] provides a summary of the coding strategies in the different domains of life and
associates the minimal number of adaptors to each strategy. Remarkably, in all cases, this number
can be predicted by the non-power modelling approach (table 6). In order to see this, we show
how to predict the minimal number of adaptors needed to decoding the 61 codons (64 − 3 stop
codons) plus one additional tRNA for the elongation of Met (recall that Met indicates also the
start of protein synthesis). In the mitochondrial case, this elongation tRNA is not present and
we have two groups of two stop codons. Thus, the non-power representation for the vertebrate
mitochondrial code (8,8,4,2,1,0) codes a total of 24 elements and predicts 22 tRNA adaptors
when the two groups of stop codons are subtracted (24 − 2 = 22). In [25], this is denoted as
strategy III and allows the decoding of an entire family (super wobble) by using a non-modified
U in position 34, the main wobble position (first base) of the anti-codon. This strategy is used
only in bacteria and mitochondria and implies that, inside a quartet, amino acids have either
degeneration 4 or 2 + 2. In strategy I, codons ending in a pyrimidine (T or C) are decoded by
a post-transcriptionally modified G in position 34 for Archea and Bacteria, or a modified A in
Eukarya. Codons ending in a purine (A or G) are recognized separately by a modified U or C
in the same position of the anti-codon. In this strategy, the minimum number of adaptors is 46
and the degeneracy distribution inside quartets is of the type 2 + 1 + 1. We describe it with the
non-power representation (16,16,8,4,2,1) that gives a total of 48 elements. Now, if we subtract
three elements corresponding to the stop codons (they all end with a purine base so that they
correspond to a single adaptor) and add an additional tRNA for the elongation Met, we have
exactly the minimal number of adaptors for strategy I, i.e. 48 − 3 + 1 = 46. Finally, in strategy
II, a family is decoded by means of two tRNAs, i.e. post-transcriptionally modified U, and G,
in position 34 for Archea and Bacteria, or post-transcriptionally modified A, and U, for Eukarya.
The minimal number of adaptors for such strategy is 33. In the general case, the degeneracy inside
quartets can be described as 2+2. Thus, we can represent the degeneracy associated to the strategy
through the non-power weights: (16,8,4,2,1,0). Similar to the representation of the mitochondrial
genetic code, there is a 0 weight. Here, the total number of elements is 32 but if we take into
account the stop codons and the special decoding boxes (Ile-Met and Cys-Stop-Trp) plus one
additional tRNA for elongation Met, we obtain 33, which is exactly the minimal predicted value
for this strategy.
4. Conclusion
In this article, we have shown that both the nuclear and the mitochondrial variants of the genetic
code can be modelled by using non-power representation systems of integer numbers. The model
describes exactly many salient features of the genetic code, including its symmetry properties.
Note that there is a wide literature on the idea that the genetic code is related to symmetry
properties and, eventually, to symmetry breaking (see for instance the seminal work of Hornos
& Hornos [26]). Such paper originated a series of attempts to apply continuous and discrete
symmetry groups and algebras to the problem [27–29]. However, this approach has not been
10
received well by biologists; as Maddox states [30]: ‘The application of the theory of mathematical
groups to the origin of the genetic code will startle molecular biologists, but is best regarded as a

.........................................................
valuable exercise in classification’. For further works about symmetry and symmetry breaking,
see among others [31–37]. We argue that the above criticism does not apply to our model
since it describes naturally the degeneracy of the actual genetic code without recourse to an
improbable chain of symmetry breaking steps. Moreover, it allows a clear biological interpretation
of the results.
The mitochondrial genetic code has been proposed as a model that originated the universal
genetic code (the code characterizing the LUCA). This pre-universal code has been called the
early code. The non-power representation of the vertebrate mitochondrial genetic code allows
one to infer a biological hypothesis for the organization of the code that predated the early code
and for the origin of protein coding based on the symmetries of ancient tRNA adaptors and a set
of four-base codons (tesserae) [18]. This paradigmatic genetic code exhibits strong properties of
error detection and correction which might be responsible of its selection in evolutionary terms
(synthesis accuracy and expression fitness). Furthermore, the model provides an explanation
of why the present genetic code is characterized precisely by 64 codons and 20 + 2 amino
acids. The non-power approach (i) describes the evolution starting from putative ancient codes
characterized by a small number of amino acids and coding words and (ii) supports an origin
of protein synthesis with adaptors containing oligonucleotides longer than three bases and a
ribosome-less direct template synthesis.
Finally, we recall that it is now widely accepted that the genetic code is optimized for
conveying more information than the linear coding of proteins [38,39]. The degeneracy of amino
acids allows the use of synonymous codons for conveying additional information. This in
turns affects the distribution of codons in coding sequences and produces the so-called ‘codon
bias’ [40–42]. Codon bias correlates with translation efficiency (accuracy and speed of protein
synthesis) and many other key cellular processes from differential protein production to protein
folding [43,44]. The mathematical structure of the model and its evolutionary implications lead
naturally to the problem of protein synthesis accuracy and efficiency related to mechanisms of
error detection/correction in terms of point mutations and frame-shift. In this respect, there are
connections with the theory of comma-free and circular codes. These are important to study
the problem of retrieving and maintaining of the correct reading frame [45–48]. The non-power
approach allows one to study sequences of DNA and mRNA, including their codon bias, under
a new perspective [13–15,49] and can lead to new bioinformatics tools and new applications in
medicine.
Authors’ contributions. All the authors contributed equally to this work.
Competing interests. We declare we have no competing interests.
Funding. We received no funding for this study.
Appendix A. Redundant numeration systems and the degeneracy

of the genetic code
Historically, the notion of number has been introduced to count the elements of a set or to compare
quantitatively two or more sets. A numeration system is a way of expressing a number by using
a string of other numbers, called digits, as to facilitate their management, for instance for the
implementation of arithmetic operations. Nowadays, a proper choice of a numeration system may
be crucial for solving specific problems and developing and improving mathematical models and
algorithms [11]. A positional numeration system allows one to represent numbers by means of
a set of digits di and a set of weights wi . A given integer number N has the following additive
representation:
N = dk wk + dk−1 wk−1 + · · · + d0 w0 .
In other words, a system is positional if each digit di is weighted with a different value
11
wi according to its position. For practical reasons, e.g. commerce, accounting, etc., these
representations have always been univocal since every number has a unique representation. One

.........................................................
notable exception is the Maya representation system called serpent number used in their Long
Count calendar to describe astronomical times [50].
(a) Power numeration systems

Usual numeration systems are based on the additive decomposition of a number using the powers
of a base b as weights, i.e. wi = bi . Each integer number 0 ≤ N ≤ bk − 1 has a unique representation
of the kind
N = dk bk + dk−1 bk−1 + · · · + d0 b0 0 ≤ di ≤ b − 1.
It is easy to prove that power numeration systems are univocal since they satisfy the following
two conditions:
(i) The digits di range from 0 to b − 1: 0 ≤ di ≤ b − 1.

(ii) The weights are the power of the base b: wi = bi .
k
i=0 (b − 1)b = b
Note that the two conditions imply that i k+1 − 1, which is the maximum
representable number, the minimum being zero. Here, we focus on numeration systems that allow
one to represent all the integer numbers in this interval (for instance, if wi > bi some numbers do
not have a representation).
Example A.1. Representation of the number seventeen both in the binary and in the decimal
system.
binary (b = 2) decimal (b = 10)

0 1 0 0 0 1 di 1 7 di
25 24 23 22 21 20 bi 101 100 bi
. 16 . . . 1 10 7
string: 010001 string: 17
— In both cases, seventeen has a unique representation/string (degeneracy 1).
Note that in example A.1, we used the term ‘seventeen’ (in letters) on purpose in order to
distinguish between the concept of a number (seventeen) and its representations (strings).
The binary system plays a central role since it is used by computers to represent numbers.
Hence, as in the example above, from now on we adopt the 6-bit binary system and the usual
decimal system in place of the letters so that, for instance, we use 17 in place of seventeen. In the
binary example above, the digits range from 0 to b − 1 = 1, i.e. di ∈ {0, 1}. Also, there are 26 = 64

represented numbers that range from 000000 = 0 to 111111 = 5i=0 wi = 5i=0 bi = 26 − 1 = 63. In
order to build a redundant numeration system, we need to relax at least one of the two conditions
above as in the following two examples.
(b) Signed-digits numeration systems

In this system, the digits di do not satisfy the condition 0 ≤ di ≤ b − 1.
Example A.2. Representation of the number seventeen in the signed binary system (wi = 2i )
12
with di ∈ {−1, 0, 1}.
−1

.........................................................
0 1 0 0 1 di 0 1 0 0 0 1
25 24 23 22 21 20 bi 25 24 23 22 21 20
0 16 0 0 2 −1 = 17 = 0 16 0 0 0 1
string number
010011̄ −→ 17
010001 −→ 17
— 17 has two representations, i.e. degeneracy 2 (we use 1̄ to denote −1).
(c) Non-power numeration systems

In non-power systems, the weights wi grow more slowly than the power of the base b, i.e. wi ≤ bi .
Example A.3. Representation of the number 17 in the binary Fibonacci system: wi = Fi = Fi−1 +
Fi−2 , with F0 = 1.
1 1 1 0 1 0 di 1 1 0 1 1 1
8 5 3 2 1 1 Fi 8 5 3 2 1 1
8 5 3 0 1 0 = 17 = 8 5 0 1 1 1
string number
111010 −→ 17
110111 −→ 17
— 17 has two representations (degeneracy 2).

— Note that the Fibonacci sequence 1, 1, 2, 3, 5, 8 grows more slowly than the powers of two,
1, 2, 4, 8, 16, 32.
(d) Degeneracy distribution

Every numeration system possesses a degeneracy table that records the degeneracy of each
number. The degeneracy table of the 6-bit binary system presented in example A.1 is
N 0 1 ... 63
D(N) 1 1 ... 1
The table is trivial: every number from 0 to 63 has a unique representation. The degeneracy table
of the binary Fibonacci system presented in example A.3 is more interesting:
N 0 1 2 3 4 5 6 7 8 9 10
20 19 18 17 16 15 14 13 12 11
D(N) 1 2 2 3 3 3 4 3 4 5 4
It is easy to show that D(N) = D(K − N), i.e. N and K − N have the same degeneracy, where

K = 5i=0 wi is the maximum representable number. In the binary Fibonacci system above K = 20
and, for example, 1 and 19 have the same degeneracy (i.e. 2). This implies that the degeneracy
table is symmetric. From the degeneracy table, one can derive the degeneracy distribution that
counts how many numbers have the same degeneracy. The degeneracy distribution of univocal
numeration systems is trivial since all the numbers have a unique representation (degeneracy 1).
In the following, we show the degeneracy distribution of the three binary systems shown in the
above examples.
Example A.4. Degeneracy distribution of the numeration systems of examples A.1, A.2
13
and A.3.

.........................................................
(a) power (b) signed-digit (c) Fibonacci (non-power)
degeneracy no. integers degeneracy no. integers degeneracy no. integers
1 64 5 2 5 2
4 2 4 5
3 4 3 8
2 3 2 4
1 5 1 2
(e) The non-power model of the genetic code

None of the three systems above matches the degeneracy distributions inside quartets of either
the nuclear or the mitochondrial genetic code shown in table 2. In both cases, the unique exact
non-power solution is given by the following set of weights:
type weights wi
Euplotid nuclear 874211
mitochondrial 884210
The complete non-power representation for the two systems is given in table 7. The associated
degeneracy distributions below match exactly those of the genetic codes of table 2:
874211
degeneracy no. integers 884210
4 8 degeneracy no. integers
3 2 4 8
2 12 2 16
1 2
For example, in the Euplotid nuclear code, there are two amino acids (Ile, Cys) that are
represented by three codons each (table 1). In the same way, in the non-power model, numbers 7
and 16 have degeneracy 3 since they are represented by three binary strings each (in green).
We have seen that the degeneration of both the Euplotid nuclear and of the mitochondrial
genetic code admit a non-power representation. Is this a fortunate coincidence?
The degeneracy distributions that admit a non-power representation are extremely rare. In
order to see this, note that symmetry of the degeneracy table is a necessary condition for a
existence of a non-power representation. The ratio between the number of symmetric degeneracy
tables over the total number of possible tables gives a coarse assessment of this fact under the
assumption that every table occurs with the same probability.
Counting the number of degeneracy tables is analogous to what in physics is called the Bose–
Einstein statistics, that is, counting the ways one can distribute k indistinguishable balls into n
labelled urns. In our 6-bit binary system, we have 64 strings/balls and 24 numbers/urns. Hence,
the total number of tables is

n+k−1 64 + 23
= = 6.42 × 1020 .
k−1 23
The number of symmetric tables is

⎛ ⎞
n k
⎜2 + − 1⎟ 43
⎝ k 2 ⎠= = 5.75 × 109
−1 11
2
Table 7. Complete representation of the two non-power models of the Euplotid nuclear and of the mitochondrial genetic code.
14
(Online version in colour.)

.........................................................
Euplotid nuclear mitochondrial
number 874211 874211 874211 874211 884210 884210 884210 884210

0 000000 000000 000001
..........................................................................................................................................................................................................
1 000001 000010 000010 000011

..........................................................................................................................................................................................................
2 000100 000011 000100 000101

..........................................................................................................................................................................................................
3 000101 000110 000110 000111

..........................................................................................................................................................................................................
4 001000 000111 001000 001001

..........................................................................................................................................................................................................
5 001001 001010 001010 001011

..........................................................................................................................................................................................................
6 001100 001011 001100 001101

..........................................................................................................................................................................................................
7 001101 010000 001110 001110 001111

..........................................................................................................................................................................................................
8 100000 010001 001111 010010 100000 100001 010000 010001

..........................................................................................................................................................................................................
9 100001 010100 100010 010011 100010 100011 010010 010011

..........................................................................................................................................................................................................
10 100100 010101 100011 010110 100100 100101 010100 010101

..........................................................................................................................................................................................................
11 100101 011000 100110 010111 100110 100111 010110 010111

..........................................................................................................................................................................................................
12 101000 011001 100111 011010 101000 101001 011000 011001

..........................................................................................................................................................................................................
13 101001 011100 101010 011011 101010 101011 011010 011011

..........................................................................................................................................................................................................
14 101100 011101 101011 011110 101100 101101 011100 011101

..........................................................................................................................................................................................................
15 101101 110000 101110 011111 101110 101111 011110 011111

..........................................................................................................................................................................................................
16 110001 101111 110010 110000 110001

..........................................................................................................................................................................................................
17 110100 110011 110010 110011

..........................................................................................................................................................................................................
18 110101 110110 110100 110101

..........................................................................................................................................................................................................
19 111000 110111 110110 110111

..........................................................................................................................................................................................................
20 111001 111010 111000 111001

..........................................................................................................................................................................................................
21 111100 111011 111010 111011

..........................................................................................................................................................................................................
22 111101 111110 111100 111101

..........................................................................................................................................................................................................
23 111111 111110 111111

..........................................................................................................................................................................................................
and the ratio results in 8.95 × 10−12 . Note that this is just a necessary condition and the actual
number of degeneracy tables that admit a non-power representation might be much smaller. This
is an important indication that the non-power representation of the genetic code cannot be seen
as a sheer product of chance. More importantly, the genetic code and the non-power model share
a number of symmetry properties that extend far beyond the degeneracy distribution [12] and
that reflect a deep mathematical connection between them.
References
1. Gonzalez DL. 2004 Can the genetic code be mathematically described? Med. Sci. Monit. 10,
11–17.
2. Hayes B. 1998 The invention of the genetic code. Am. Sci. 86, 8–14. (doi:10.1511/1998.17.3338)
3. Gamow G. 1954 Possible relation between deoxyribonucleic acid and protein structures.
Nature 173, 318. (doi:10.1038/173318a0)
4. Crick FHC, Griffith JS, Orgel LE. 1957 Codes without commas. Proc. Natl Acad. Sci. USA 43,
15
416–421. (doi:10.1073/pnas.43.5.416)
5. Nirenberg MW, Matthaei JH. 1961 The dependence of cell-free protein synthesis in E. coli upon

.........................................................
naturally occurring or synthetic polyribonucleotides. Proc. Natl Acad. Sci. USA 47, 1588–1602.
(doi:10.1073/pnas.47.10.1588)
6. Yockey HP. 2005 Information theory, evolution, and the origin of life. Cambridge, UK: Cambridge
University Press.
7. Rumer YuB. 1966 About the codon’s systematization in the genetic code. Proc. Acad. Sci. USSR
(Doklady) 167, 1393–1394. [In Russian.]
8. Fimmel E, Strüngmann L. 2016 Yury Borisovich Rumer and his ‘biological papers’ on the
genetic code. Phil. Trans. R. Soc. A 374, 20150228. (doi:10.1098/rsta.2015.0228)
9. Koonin EV, Novozhilov AS. 2009 Origin and evolution of the genetic code: the universal
enigma. IUBMB Life 61, 99–111. (doi:10.1002/iub.146)
10. Copley SD, Smith E, Morowitz HJ. 2005 A mechanism for the association of amino acids with
their codons and the origin of the genetic code. Proc. Natl Acad. Sci. USA 102, 4442–4447.
(doi:10.1073/pnas.0501049102)
11. Fraenkel AS. 1989 The use and usefulness of numeration systems. Inform. Comput. 81, 46–61.
(doi:10.1016/0890-5401(89)90028-X)
12. Gonzalez DL. 2008 The mathematical structure of the genetic code. In The codes of life: the rules
of macroevolution (eds M Barbieri, J Hoffmeyer). Biosemiotics, vol. 1, pp. 111–152. Dordrecht,
The Netherlands: Springer. (doi:10.1007/978-1-4020-6340-4_6)
13. Gonzalez DL, Giannerini S, Rosa R. 2006 Detecting structure in parity binary sequences:
error correction and detection in DNA. IEEE Eng. Med. Biol. Mag. 25, 69–81. (doi:10.1109/
MEMB.2006.1578666)
14. Gonzalez DL, Giannerini S, Rosa R. 2008 Strong short-range correlations and dichotomic
codon classes in coding DNA sequences. Phys. Rev. E 78, 051918. (doi:10.1103/PhysRevE.
78.051918)
15. Giannerini S, Gonzalez DL, Rosa R. 2012 DNA, dichotomic classes and frame synchronization:
a quasi-crystal framework. Phil. Trans. R. Soc. A 370, 2987–3006. (doi:10.1098/rsta.2011.0387)
16. Jukes TH. 1983 Genetic code: mitochondrial codes and evolution. Nature 301, 19–20.
(doi:10.1038/301019a0)
17. Ohama T, Inagaki Y, Bessho Y, Osawa S. 2008 Evolving genetic code. Proc. Jpn Acad. B 84,
58–74. (doi:10.2183/pjab.84.58)
18. Gonzalez DL, Giannerini S, Rosa R. 2012 On the origin of the mitochondrial genetic
code: towards a unified mathematical framework for the management of genetic
information. Available from Nature Precedings. See http://dx.doi.org/10.1038/npre.2012.
7136.1.
19. Osawa S. 1995 Evolution of the genetic code. Oxford, UK: Oxford Science Publications.
20. Watanabe K, Yokobori S. 2014 How the early genetic code was established? Inference from the
analysis of extant animal mitochondrial decoding systems. In Chemical biology of nucleic acids:
fundamentals and clinical applications (eds VA Erdmann, WT Markiewicz, J Barciszewski). RNA
Technologies, ch. 2, pp. 25–40. Berlin, Germany: Springer. (doi:10.1007/978-3-642-54452-1_2)
21. Baranov PV, Venin M, Provan G. 2009 Codon size reduction as the origin of the triplet genetic
code. PLoS ONE 4, e5708. (doi:10.1371/journal.pone.0005708)
22. Chen IA, Schindlinger M. 2010 Quadruplet codons: one small step for a ribosome, one giant
leap for proteins. BioEssays 32, 650–654. (doi:10.1002/bies.201000051)
23. Wang K, Neumann H, Peak-Chew SY, Chin JW. 2007 Evolved orthogonal ribosomes enhance
the efficiency of synthetic genetic code expansion. Nat. Biotechnol. 25, 770–777. (doi:10.1038/
nbt1314)
24. Neumann H, Wang K, Davis L, Garcia-Alai M, Chin JW. 2010 Encoding multiple unnatural
amino acids via evolution of a quadruplet-decoding ribosome. Nature 464, 441–444.
(doi:10.1038/nature08817)
25. Grosjean H, de Crécy-Lagard V, Marck C. 2010 Deciphering synonymous codons in the
three domains of life: co-evolution with specific tRNA modification enzymes. FEBS Lett. 584,
252–264. (doi:10.1016/j.febslet.2009.11.052)
26. Hornos JEM, Hornos YMM. 1993 Algebraic model for the evolution of the genetic code. Phys.
Rev. Lett. 71, 4401–4404. (doi:10.1103/PhysRevLett.71.4401)
27. Frappat L, Sciarrino A, Sorba P. 2001 Crystalizing the genetic code. J. Biol. Phys. 27, 1–34.
16
(doi:10.1023/A:1011874407742)
28. Forger M, Hornos YMM, Hornos JEM. 1997 Global aspects in the algebraic approach to the

.........................................................
genetic code. Phys. Rev. E 56, 7078–7082. (doi:10.1103/PhysRevE.56.7078)
29. Bashford JD, Tsohantjis I, Jarvis PD. 1998 A supersymmetric model for the evolution of the
genetic code. Proc. Natl Acad. Sci. USA 95, 987–992. (doi:10.1073/pnas.95.3.987)
30. Maddox J. 1994 The genetic code by numbers. Nature 367, 111. (doi:10.1038/367111a0)
31. Jose MV, Morgado ER, Sanchez R, Govezensky T. 2012 The 24 possible algebraic
representations of the standard genetic code in six or in three dimensions. Adv. Stud. Biol.
4, 119–152.
32. Jungck JR. 1978 The genetic code as a periodic table. J. Mol. Evol. 11, 211–224. (doi:10.1007/
BF01734482)
33. Karasev VA, Stefanov VE. 2001 Topological nature of the genetic code. J. Theor. Biol. 209,
303–317. (doi:10.1006/jtbi.2001.2265)
34. Sánchez R, Morgado E, Grau R. 2005 A genetic code Boolean structure. I. The meaning of
Boolean deductions. Bull. Math. Biol. 67, 1–14. (doi:10.1016/j.bulm.2004.05.005)
35. Tlusty T. 2010 A colorful origin for the genetic code: information theory, statistical mechanics
and the emergence of molecular codes. Phys. Life Rev. 7, 362–376. (doi:10.1016/j.plrev.2010.
06.002)
36. shCherbak V. 2008 The arithmetical origin of the genetic code. In The codes of life (eds
M Barbieri, J Hoffmeyer). Biosemiotics, vol. 1, pp. 153–185. Dordrecht, The Netherlands:
Springer. (doi:10.1007/978-1-4020-6340-4_7)
37. Findley GL, Findley AM, McGlynn SP. 1982 Symmetry characteristics of the genetic code.
Proc. Natl Acad. Sci. USA 79, 7061–7065. (doi:10.1073/pnas.79.22.7061)
38. Itzkovitz S, Alon U. 2007 The genetic code is nearly optimal for allowing additional
information within protein-coding sequences. Genome Res. 17, 405–412. (doi:10.1101/gr.
5987307)
39. Bollenbach T, Vetsigian K, Kishony R. 2007 Evolution and multilevel optimization of the
genetic code. Genome Res. 17, 401–404. (doi:10.1101/gr.6144007)
40. Bennetzen JL, Hall BD. 1982 Codon selection in yeast. J. Biol. Chem. 257, 3026–3031.
41. Sharp PM, Rogers MS, McConnell DJ. 1985 Selection pressures on codon usage in the complete
genome of bacteriophage t7. J. Mol. Evol. 21, 150–160. (doi:10.1007/BF02100089)
42. Sharp PM, Li W-H. 1987 The codon adaptation index—a measure of directional synonymous
codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295. (doi:10.1093/
nar/15.3.1281)
43. Gingold H, Pilpel Y. 2011 Determinants of translation efficiency and accuracy. Mol. Syst. Biol.
7, 481. (doi:10.1038/msb.2011.14)
44. Quax TEF, Claassens NJ, Söll D, van der Oost J. 2015 Codon bias as a means to fine-tune gene
expression. Mol. Cell 59, 149–161. (doi:10.1016/j.molcel.2015.05.035)
45. Arquès DG, Michel CJ. 1996 A complementary circular code in the protein coding genes.
J. Theor. Biol. 182, 45–58. (doi:10.1006/jtbi.1996.0142)
46. Gonzalez DL, Giannerini S, Rosa R. 2011 Circular codes revisited: a statistical approach.
J. Theor. Biol. 275, 21–28. (doi:10.1016/j.jtbi.2011.01.028)
47. Michel CJ. 2015 The maximal C3 self-complementary trinucleotide circular code x in genes
of bacteria, eukaryotes, plasmids and viruses. J. Theor. Biol. 380, 156–177. (doi:10.1016/j.jtbi.
2015.04.009)
48. Fimmel E, Giannerini S, Gonzalez DL, Strüngmann L. 2015 Circular codes, symmetries and
transformations. J. Math. Biol. 70, 1623–1644. (doi:10.1007/s00285-014-0806-7)
49. Gonzalez DL, Giannerini S, Rosa R. 2009 The mathematical structure of the genetic code: a tool
for inquiring on the origin of life. Statistica LXIX, 143–157. (doi:10.6092/issn.1973-2201/3553)
50. Freitas PJ, Shell-Gellasch A. 2012 When a number system loses uniqueness: the case of the
Maya. Convergence 9. (doi:10.4169/loci003883)

Aminoacidos Cod Binarios

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Aminoacidos Cod Binarios

Caricato da

Copyright:

Formati disponibili

Downloaded from http://rsta.royalsocietypublishing.

org/ on May 22, 2018

The non-power model of the

Accepted: 27 October 2015 In this article, we present a mathematical framework

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

2. Background and history

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

degeneracy no. amino acids degeneracy no. amino acids

3. A mathematical model of the genetic code

(a) The Euplotid nuclear genetic code

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

(a) (b) (c)

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

their chemical character as follows:

{Purine; Pyrimidine} R = {A, G}; Y = {C, U}

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

Euplotid nuclear 8,7,4,2,1,1

(b) The mitochondrial genetic code

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

(c) Coding strategies

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

of adaptors predicted (non-power) 46 33 22

non-power weights (16,16,8,4,2,1) (16,8,4,2,1,0) (8,8,4,2,1,0)

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

Appendix A. Redundant numeration systems and the degeneracy

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

(a) Power numeration systems

(i) The digits di range from 0 to b − 1: 0 ≤ di ≤ b − 1.

binary (b = 2) decimal (b = 10)

string: 010001 string: 17

— In both cases, seventeen has a unique representation/string (degeneracy 1).

(b) Signed-digits numeration systems

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

— 17 has two representations, i.e. degeneracy 2 (we use 1̄ to denote −1).

(c) Non-power numeration systems

— 17 has two representations (degeneracy 2).

(d) Degeneracy distribution

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

(e) The non-power model of the genetic code

The number of symmetric tables is

rsta.royalsocietypublishing.org Phil. Trans. R. Soc. A 374: 20150062

number 874211 874211 874211 874211 884210 884210 884210 884210

1 000001 000010 000010 000011

2 000100 000011 000100 000101

3 000101 000110 000110 000111

4 001000 000111 001000 001001

5 001001 001010 001010 001011

6 001100 001011 001100 001101

7 001101 010000 001110 001110 001111

8 100000 010001 001111 010010 100000 100001 010000 010001

9 100001 010100 100010 010011 100010 100011 010010 010011

10 100100 010101 100011 010110 100100 100101 010100 010101

11 100101 011000 100110 010111 100110 100111 010110 010111

12 101000 011001 100111 011010 101000 101001 011000 011001

13 101001 011100 101010 011011 101010 101011 011010 011011

14 101100 011101 101011 011110 101100 101101 011100 011101

15 101101 110000 101110 011111 101110 101111 011110 011111

16 110001 101111 110010 110000 110001

17 110100 110011 110010 110011

18 110101 110110 110100 110101

19 111000 110111 110110 110111