
2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing

Central Dogma of Molecular Biology - New Paradigm in Evolutionary Computation

Corina Rotar
Computer Science Department
1 Decembrie 1918 University
Alba Iulia, Romania
crotar@uab.ro

Abstract. The aim of this study is to develop a new evolutionary computation paradigm in terms of molecular biology. Standard genetic algorithms are heuristics inspired by a simplified model of natural evolution and genetics. The latest discoveries and innovations in molecular biology, related to the conventional central dogma of molecular biology, create the need to update genetic algorithms, even though these have been successfully applied to various complex tasks. In this direction, research in Evolutionary Computation requires a reconsideration of the concepts and theories underlying the development of these popular optimization techniques. Since the emergence of new features is important in evolution, the DNA code needs to progress. Evolutionary Computation, which is based on mutation and natural selection, can be reconsidered in terms of protein synthesis and reverse transcription. From the computational perspective, a biological phenomenon may be interpreted in various forms in order to obtain reliable computational techniques.

Keywords: evolutionary computation paradigm; protein synthesis; central dogma of biology; optimization

I. INTRODUCTION

In the light of the discoveries and hypotheses from biology related to the popular central dogma of molecular biology, genetic algorithms seem to be outdated, although they have been successfully applied to various difficult problems. In this direction, research in Evolutionary Computation requires a reconsideration of the concepts, processes and theories underlying the development of these popular optimization techniques. Imitating biological phenomena clearly requires transformations in order to obtain effective computational techniques. These adjustments are dictated by the very nature of the elements being manipulated. In molecular biology and genetics, the fundamental issues are the chemical compounds and the relationships through which they interact. Computer techniques, by contrast, work with numerically coded candidate solutions that are precisely controlled by computational tools. This work exploits specific structures, processes and concepts inspired by biology, specifically by the central dogma of molecular biology. On that basis, optimization problems are reformulated as the search for the DNA code that produces the most efficient protein or proteome [7] (the entire set of proteins expressed by the genetic material of an organism) consistent with the problem's objectives.

Protein synthesis is the fundamental biological process by which cells build their specific proteins. The central dogma of biology brought to the foreground the simple formula of gene flow from DNA to protein. However, modern hypotheses establish other routes of genetic transfer, for example from RNA to DNA, by a process called reverse transcription. This mechanism is interesting for our study as a potential new source of inspiration in Evolutionary Computation. It also applies to optimization tools in which solutions are created through an iterative process that improves the outcomes until the desired accuracy is reached. The aim of developing a new biologically inspired computational technique is not to copy the natural phenomenon strictly, but to obtain the best solutions to the problem. Certain adjustments of the original paradigm are therefore justified.

II. FROM NATURE TO COMPUTATION

In the generous landscape of nature-inspired computation, a computational model built on the central dogma of molecular biology has been overlooked. In [1], a modern perspective is suggested, according to which nature-inspired computational tools need revisions in both theory and applications. Moreover, some criticisms of the simplified model underlying the standard genetic algorithm were mentioned in [2]. The algorithms of Evolutionary Computation are criticized for their inability to solve optimization problems other than those for which they were designed [3]. Multi-criteria, dynamic or multimodal optimization problems require major adjustments of the evolutionary algorithm in order to generate accurate solutions. The preceding statements are intended as arguments for a new perspective on Evolutionary Computation, one that results in a general computational tool inspired by the central dogma of molecular biology.

A detailed analysis of Natural Computation, given in [20], defines it as the combination of three major research directions:
1. computing inspired by nature;
2. simulation and emulation of nature in computers;
3. computing with natural materials.

Among these directions, computing inspired by nature (e.g. evolutionary computation, neural networks, artificial immune systems, swarm intelligence) seems to be the most popular. It offers many computational methods that have been successfully applied to various problems. The proposed approach fits the first category and focuses on the biological model covered by the Central Dogma of Molecular Biology. It tries to identify the features that can become sources of inspiration for designing suitable optimization techniques. Within the nature-inspired computing landscape, the considered biological paradigm is an interesting and fertile source of inspiration. In the following paragraphs we illustrate how a protein synthesis-inspired algorithm can be extended with minimal effort to solve various optimization problems (single- and multi-objective).

978-1-4799-8448-0/15 $31.00 © 2015 IEEE
DOI 10.1109/SYNASC.2014.46

Fig. 1. Graphical description of the DNA, RNA, Protein and genetic transfer
Recent advances in molecular biology have attracted the interest of Natural Computing researchers. This rich source of inspiration has already been noticed, and several methods have been developed from it, such as DNA Computing [23] and Gene Expression Programming [24]. DNA Computing uses DNA molecules to solve complex problems. Gene Expression Programming, basically a hybrid of the genetic algorithm and genetic programming, is an evolutionary technique that generates computer programs. Unlike these approaches, our research presents the protein synthesis metaphor as a source of inspiration for computational techniques that solve numerical optimization problems. Moreover, we design a straightforward method that can easily be extended to various types of optimization problems.

III. PROTEIN SYNTHESIS - FROM BIOLOGICAL PARADIGM TO COMPUTATIONAL PROCEDURE

The following paragraphs describe several concepts from molecular biology on which computational techniques for solving optimization problems can be built.

Protein synthesis is one of the most fundamental biological processes, by which individual cells construct their specific proteins. The central dogma of molecular biology [4], [6] brought to the foreground the simple formula of gene flow from DNA to protein. Later discoveries [5] completed the original theory by bringing up reverse transcription from RNA to DNA. This contradicts the simple interpretation that genetic information flows only from DNA to RNA to protein. The two-way transfer between DNA and RNA is a fertile source of inspiration. The central dogma of molecular biology is questionable because it does not account for the backward genetic transfer from RNA to DNA (e.g. in retroviruses). Therefore, the artificial pattern exploits a specific reverse transcription module in order to evolve the DNA code.

A. Central dogma of molecular biology
The central dogma [4] states that biological information is transferred mostly in the following directions:
- DNA can be copied to DNA (DNA replication);
- DNA information can be copied into RNA (transcription);
- proteins can be synthesized using the information in RNA as a template (translation).

B. Protein - meaning, function
A protein is a large biological molecule consisting of chains of amino acids. Proteins perform a vast array of functions within living organisms, control all cell reactions and are responsible for the physical appearance (phenotype). The genotype, which is the set of genetic information that defines an organism, controls the phenotype because genes direct the production of proteins. Proteins, in turn, dictate every reaction in the cell and are therefore directly responsible for the observable characteristics (phenotype). Protein synthesis is one of the most remarkable phenomena observed in the cell. The complex processes by which genetic information is transmitted and converted into a particular form can be summed up in the rule Genotype -> Phenotype. The entire set of proteins expressed within a cell, tissue, or organism at a certain time, under defined conditions, is called the proteome [7]. The name derives from the words protein and genome. Protein synthesis occurs in every cell of the body in two distinct phases: transcription and translation (Fig. 1).

C. Transcription. From DNA to RNA
The DNA in the genome does not perform protein synthesis itself, but uses RNA as an intermediary. When the cell needs a particular protein, a segment of the long DNA molecule is first copied into RNA (transcription). Next, these RNA copies of DNA segments are used as templates to direct the synthesis of the protein (translation). The flow of genetic information in cells is therefore from DNA to RNA to protein. Every cell expresses its genetic information according to this principle, which is so fundamental that it is named the central dogma of molecular biology.
Single-stranded RNA molecules produced by transcription are released from the DNA template. In addition, because they are copied from only a limited segment of the DNA, RNA molecules are shorter than DNA molecules. Unlike DNA, RNA does not permanently store the genetic information in cells. In transcription, copying errors are less significant than in DNA replication. Replication and, to a lesser extent, transcription have been shown to affect mutation rates in a variety of genomes; transcription can be a major source of mutations in nondividing cells [8]. Recent studies have shown that transcription is more than a plain mechanism for copying genes. The authors of [9] describe several ways in which transcription endangers genome integrity and emphasize that transcription is a source of genomic instability. Therefore, transcription may be regarded as a potential cause of DNA variation.

D. Translation: From RNA to PROTEIN
Translation converts the information carried by RNA into the information required to build proteins. During translation, the RNA transcribed from DNA is decoded by ribosomes for protein synthesis. The RNA leaves the cell nucleus and passes into the cytoplasm. There, the "protein factory" of the cell, the ribosome, reads the message encoded in the RNA template and produces proteins according to this pattern. Ribosomes (Palade, 1974) are large and complex molecular machines whose main function lies in the translation process. During translation, ribosomes link amino acids together in the specific order indicated by the RNA molecules. Through the ribosomes, the genetic message is "read" and the proteins are synthesized.

E. Gene Regulation. Transcriptional Factors
The DNA chain stores the complete information for the synthesis of all proteins required by an organism. During transcription, RNA copies a fragment of the genes in the DNA code, and this fragment encodes a specific protein. This phenomenon, called gene expression, is supported by a suite of factors that facilitate copying or ignoring certain genes. Essentially, gene regulation is the natural process of switching genes on or off. Gene regulation occurs during the transcription phase and involves the activation of specific proteins according to different signals received from the environment. These proteins are called transcription factors, and they are required to initiate or regulate the transcription process. Transcription factors bind to regulatory regions of a gene and control the genetic flow from DNA to RNA.

F. Reverse Transcription
It is now established that some viruses [10], whose genetic material is an RNA molecule, are able to induce, at the cell level, the synthesis of DNA with which to replicate. This phenomenon is called reverse transcription [11], [12]. In [13], reverse transcription is discussed from an evolutionary viewpoint rather than a biological one. There it is stated that reverse transcription is a unique tool for transmitting information from the dynamic RNA to the more inert DNA, and that it has been instrumental in shaping extant genomes [13]. The virus as a computational concept is exploited in various fields. For the purpose of the present work, the virus, more precisely the retrovirus, is the special tool through which reverse transcription from RNA to DNA is possible. Using the mechanism of reverse transcription in the bio-inspired procedure allows the DNA to evolve during the search process. Without reverse transcription, the system would be less dynamic and would not converge rapidly towards the solutions of the addressed problem.

In a simple definition, reverse transcription is the process by which DNA is synthesized from an RNA template. Genetic information can therefore be transported in the reverse direction, from RNA to DNA, contrary to the central dogma of molecular biology.

IV. ARTIFICIAL PROTEIN SYNTHESIS
DNA changes through mutations. At the macroscopic level, in evolution, mutation frequency is relatively low. In evolutionary computation, by contrast, mutations are produced with high frequency in order to obtain the necessary diversity in the search process. Mutation and natural selection are the processes that generate evolution. This extremely popular theory is not entirely proven, and the driving force behind evolution is not fully understood. Besides mutations produced during evolution, DNA can change by other means, for example by reverse transcription. Nevertheless, natural evolution takes a long time. For solving optimization problems, a basic imitation of natural evolutionary processes is inefficient. Evolutionary algorithms therefore routinely involve particular strategies, such as:
- limiting the exploited resources (small finite populations, a limited number of generations);
- exaggerating features of natural evolution (e.g. a much higher mutation rate);
- selection procedures that favor the best individuals;
- crossover operators that accelerate convergence;
- hybridization with traditional computational techniques.

Evolutionary algorithms include extra ingenious mechanisms, without much concern for biological authenticity, in order to accelerate convergence and obtain a better approximation of the solutions. In the same manner, the proposed optimization method inspired by the central dogma exploits an artificial procedure, namely DNA shuffling.

A. DNA shuffling
Recent advances in biology have made possible the laboratory reconstruction of the DNA-RNA-protein path. For practical reasons, the artificially driven process, in vitro or in vivo, is supplemented by mechanisms that accelerate evolution towards the desired goal, so that useful structures or desired characteristics can be obtained rapidly. These techniques are framed within a powerful engineering tool, namely Directed Evolution [19]. In short, directed evolution can be defined as evolving proteins toward a user-defined goal. It is an iterative process that involves generating a set of biological entities of interest and screening or selecting them to identify the variants that display better properties. The best mutants of each iteration serve as templates for subsequent iterations of diversification and selection. The process is repeated until the desired improvement is achieved.

It is striking that the evolutionary computation metaphor is basically closer to the Directed Evolution pattern. Acceleration of evolution is desired in both settings: in artificial and natural protein synthesis, and in evolving a population of candidate individuals. Therefore, diversification processes, especially DNA shuffling [14], prove attractive.

Directed Evolution requires a mechanism for introducing genetic variation. DNA shuffling is a method of recombining homologous genes that provides diversity. The strength of iterative homologous

recombination in computational simulation is mentioned in [14]. For these reasons, a DNA shuffling-inspired computational procedure is proposed for producing better diversity.

B. Protein synthesis process versus optimization
As described above, the central dogma of molecular biology, through its structures and inner processes, is an unusually complex biological model. It deserves more attention and prefigures a valuable paradigm in Evolutionary Computation. The biological model contains several features that must be emphasized:

1. DNA is the genetic signature of an organism. The information stored in DNA is partially transferred into proteins, using RNA as an intermediary. The same DNA code is responsible for multiple outputs (proteins). Accordingly, the DNA structure is analogous to the set of candidate solutions from the search space.
2. The observable characteristics of an organism (phenotype) are generated through gene expression in two stages: transcription and translation. The protein synthesis procedure is comparable to the mapping of the search space into the objective space.
3. Proteins guide the reactions in each cell and are directly responsible for the phenotype; proteins are therefore the fundamental factors of the phenotype. From the computational point of view, the required output of the procedure depends on the category of the optimization problem: either the complete proteome or only a specific protein. The protein structure corresponds to the solution in the objective space.
4. Transcriptional factors influence transcription, the process by which DNA is copied into RNA. RNA is not a clone of the DNA. Transcription is responsible for gene expression and hence for the expressed output of the protein synthesis process. In the bio-inspired computational technique, the transcriptional factors represent the probabilities of transcriptional errors during transcription.
5. Through reverse transcription, information is transmitted backwards, from RNA to DNA, contrary to the central dogma of molecular biology. In the optimization technique, the reverse transcription procedure infuses new promising zones of the search space into the DNA. The protein synthesis procedure then creates improved outputs according to the problem's objectives.
6. DNA shuffling is an artificial technique designed to achieve a diverse genetic pool in Directed Evolution. It can be adapted to our purpose because it alters the genetic code and therefore improves genetic diversity.

V. PROTEIN SYNTHESIS ALGORITHM FOR OPTIMIZATION
The Protein Synthesis Algorithm (PSA) is inspired by the combination of two important processes: the biological paradigm of protein synthesis and the DNA shuffling technique used in Directed Evolution. The method is valuable because it can address a wide range of optimization problems.
The general pattern for re-interpreting an optimization process as a protein synthesis mechanism consists of the following steps:
0. DNA shuffling - the process by which the DNA code varies; it is an artificial process designed to achieve a diverse genetic pool.
1. Transcription - the process by which the genetic code is transcribed into RNA code.
2. Translation - the process of producing the proteins from the RNA code.
3. Regulation - the process that controls transcription.
4. Reverse transcription - the process by which RNA code is reverse transcribed into DNA genetic code.

A. PSA for single objective optimization
In single objective optimization (SOP), the goal is to determine the best possible solutions in the search space. From this point of view, the biological paradigm described in the previous paragraphs can be understood as an iterative process for synthesizing a protein with the desired features from a diverse library of genes. The DNA code resulting from the DNA shuffling process is viewed as a set of candidate solutions in the search space. Each iteration aims to produce proteins that are increasingly fit with respect to the given objective. To this end, each iteration applies three main procedures: transcription, translation and reverse transcription.

B. DNA, RNA and PROTEIN for SOP
Let n be the number of decision variables (the dimension of the search space) and f the single objective. A solution from the search space is represented by a gene of the DNA chain. Each gene encodes a candidate solution $(x_1, x_2, \ldots, x_n)$. The DNA chain encodes solutions in the search space and is given by the vector
$$\mathrm{DNA} = \{(x_1^1, x_2^1, \ldots, x_n^1), (x_1^2, x_2^2, \ldots, x_n^2), \ldots, (x_1^l, x_2^l, \ldots, x_n^l)\},$$
where l is the length of the DNA chain.
The RNA chain has the same size and structure, being a transcription of the DNA template:
$$\mathrm{RNA} = \{(y_1^1, y_2^1, \ldots, y_n^1), (y_1^2, y_2^2, \ldots, y_n^2), \ldots, (y_1^l, y_2^l, \ldots, y_n^l)\},$$
where l is the length of the RNA chain and each copied gene corresponds to one specific protein. Each protein is the objective value of one gene of the RNA chain.
The proteins are the RNA code translated into the objective space:
$$\mathrm{Proteins} = \{f(y_1^1, y_2^1, \ldots, y_n^1), f(y_1^2, y_2^2, \ldots, y_n^2), \ldots, f(y_1^l, y_2^l, \ldots, y_n^l)\}.$$

C. PSA for multi-objective optimization
In a multi-objective optimization problem (MOP), the aim is to find the set of Pareto optimal solutions. In this sense, approaching the multi-criteria

problem in terms of the protein synthesis procedure proceeds as follows. The DNA (the genotype) is considered a set of candidate solutions in the search space, each solution being encoded as a gene. The aim is to design the proteome (phenotype) that best satisfies the conditions of the problem. Thus, the requirement is to generate all expressed proteins that correspond to the best approximation of the Pareto front.

Fig. 2. Multiobjective optimization. Computational model: Protein synthesis, structures and processes

D. DNA, RNA and Protein for MOP
Let n be the number of decision variables (the dimension of the search space). A solution from the search space is represented by a gene of the DNA chain. Each gene encodes a candidate solution $(x_1, x_2, \ldots, x_n)$.
With l the length of the DNA chain, the vector DNA encodes a set of candidate solutions in the search space and is represented by the following array:
$$\mathrm{DNA} = \{(x_1^1, x_2^1, \ldots, x_n^1), \ldots, (x_1^l, x_2^l, \ldots, x_n^l)\}.$$
The output of PSA is the final DNA chain, which contains an approximation of the Pareto set. PSA usually terminates when the maximum number of iterations is reached, not when the chain contains only Pareto non-dominated solutions. Therefore, the final DNA may still contain residual dominated solutions, which can be removed afterwards.

The RNA chain has the same size and structure, being a transcription of the DNA template:
$$\mathrm{RNA} = \{(y_1^1, y_2^1, \ldots, y_n^1), \ldots, (y_1^l, y_2^l, \ldots, y_n^l)\},$$
where l is the length of the RNA chain and each copied gene corresponds to one specific protein.
The Proteome (the totality of expressed proteins) contains an approximation of the Pareto front, considering the DNA chain, and is given by
$$\mathrm{Proteome} = \{(f_1^1, f_2^1, \ldots, f_m^1), \ldots, (f_1^l, f_2^l, \ldots, f_m^l)\},$$
where m is the number of objectives.

E. Protein synthesis as a general optimization tool
In the single objective case, protein synthesis is seen as the iterative search for the best gene or genes, which correspond to the desired protein and thus maximize the value of the single objective. In the multi-objective case, protein synthesis is the procedure by which an approximation of the Pareto optimal set converts the genetic material (information) into an approximation of the Pareto front (proteome). As in the corresponding biological paradigm, the core of the protein synthesis mechanism consists of three stages:
1. Transcription copies the genetic material (DNA chain). It also applies a series of small, regulated artificial alterations to the DNA chain in order to create the RNA templates.
2. Translation transforms RNA into proteins. This mechanism is simple because, in the multi-objective setting, it is a straightforward mapping of each solution from the search space into the objective space.
3. Reverse transcription inserts variation into the original DNA chain.

The genetic material needs to evolve, and reverse transcription alone is not powerful enough to give the desired speed of evolution. An artificial procedure is therefore also used, namely DNA shuffling.

F. DNA Shuffling
DNA shuffling [14] is a widely used method in protein engineering for creating a diverse gene library. It is one of the most representative diversification processes in Directed Evolution.
Since computational techniques aim to determine the solutions of the problem in reasonably short time, the DNA shuffling process is worth considering. The core of protein synthesis, inspired by the central dogma (DNA -> RNA -> Protein), is enclosed in an iterative process in which DNA evolution is accelerated by applying DNA shuffling. Thus, each gene of the DNA chain blends with its homologous gene. The homologous gene is the closest distinct gene in the search space; for simplicity, we use the Euclidean distance as the similarity measure.
The resulting gene replaces the original gene in the DNA chain only if it is superior to it. The performance of genes is measured differently depending on the type of problem. For a single-objective maximization problem, the parent and descendant genes are compared by their objective function values. For optimization problems with two or more objectives, Pareto dominance is used. Therefore, the operator > in the following procedure has different meanings depending on the problem type.
Procedure DNA_Shuffling:
  Input: DNA = {(x_1^1, x_2^1, ..., x_n^1), ..., (x_1^l, x_2^l, ..., x_n^l)}
  For each i = 1 to l
    Select the jth gene, the homologous gene of the ith gene
    (y_1^i, y_2^i, ..., y_n^i) = Shuffling((x_1^i, x_2^i, ..., x_n^i), (x_1^j, x_2^j, ..., x_n^j))
    If (y_1^i, y_2^i, ..., y_n^i) > (x_1^i, x_2^i, ..., x_n^i) then
      (x_1^i, x_2^i, ..., x_n^i) = (y_1^i, y_2^i, ..., y_n^i)
  Output: DNA = {(x_1^1, x_2^1, ..., x_n^1), ..., (x_1^l, x_2^l, ..., x_n^l)}
EndProcedure
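The DNA_Shuffling procedure can be illustrated with the minimal Python sketch below. It is not the author's implementation and rests on several assumptions:
- The problem is single-objective maximization, so `>` compares objective values.
- Genes are lists of floats.
- The homologous gene is the nearest gene (by Euclidean distance) that differs from the current one.
- The Shuffling operator is discrete crossover in its common form, where each variable is inherited from either parent with probability 0.5. The exact variant used in [21] may differ.

The helper names `homologous_index`, `discrete_crossover` and `dna_shuffling` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Gene = List[float]


def homologous_index(dna: Sequence[Gene], i: int) -> int:
    """Index of the closest gene that differs from dna[i], or -1 if none exists."""
    best_j, best_d = -1, math.inf
    for j, other in enumerate(dna):
        if j == i or other == dna[i]:
            continue  # the homologous gene must be a *different* gene
        d = math.dist(dna[i], other)  # Euclidean distance as the similarity measure
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def discrete_crossover(a: Gene, b: Gene, rng: random.Random) -> Gene:
    """Assumed Shuffling operator: each variable is taken from a or b with p = 0.5."""
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]


def dna_shuffling(dna: List[Gene], f: Callable[[Gene], float],
                  rng: random.Random) -> List[Gene]:
    """One DNA_Shuffling pass for single-objective maximization (updates dna in place)."""
    for i in range(len(dna)):
        j = homologous_index(dna, i)
        if j < 0:
            continue  # every gene equals dna[i]; there is no partner to blend with
        child = discrete_crossover(dna[i], dna[j], rng)
        if f(child) > f(dna[i]):  # keep the child only if it is superior
            dna[i] = child
    return dna


if __name__ == "__main__":
    def neg_sphere(x: Gene) -> float:
        return -(x[0] ** 2 + x[1] ** 2)  # maximized at the origin

    rng = random.Random(0)
    dna = [[rng.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(10)]
    dna_shuffling(dna, neg_sphere, rng)
    print(max(neg_sphere(g) for g in dna))
```

Because a gene is replaced only by a superior child, no gene's objective value can decrease during a pass. For a multi-objective problem, the `f(child) > f(dna[i])` test would be replaced by a Pareto-dominance check, as described above.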

The Shuffling procedure corresponds to the crossover operator of the GA paradigm. In our study we used discrete crossover as described in [21].

G. Transcription
Transcription is the process by which the RNA chain is created from the DNA chain. Each gene of the DNA has its own probability of being either copied as it is or altered before it is passed to the RNA. The probabilities of altering the genes are given by the transcriptional factors, which are computed with respect to the quality of the genetic sequence.
Given the array of transcriptional factors $TF = \{\tau_1, \ldots, \tau_l\}$, the RNA is synthesized by the following algorithm:
Procedure Transcription:
  Input: TF = {τ_1, τ_2, ..., τ_l}
         DNA = {(x_1^1, x_2^1, ..., x_n^1), ..., (x_1^l, x_2^l, ..., x_n^l)}
  For each i = 1 to l
    For each j = 1 to n
      If rnd < τ_i then
        y_j^i = x_j^i + rnd * (max_j - x_j^i),  if rnd <= 0.5
        y_j^i = x_j^i - rnd * (x_j^i - min_j),  otherwise
      Else
        y_j^i = x_j^i
  Output: RNA = {(y_1^1, y_2^1, ..., y_n^1), ..., (y_1^l, y_2^l, ..., y_n^l)}
EndProcedure
The values min_j and max_j are the limits of the search interval for the jth variable.

H. Translation
Translation is the protein synthesis process. Using the RNA chain obtained by the transcription procedure, translation constructs the protein or the set of proteins (the proteome) by computing the objective values of the solutions encoded in the RNA.
Procedure Translation:
  Input: RNA = {(y_1^1, y_2^1, ..., y_n^1), ..., (y_1^l, y_2^l, ..., y_n^l)}
  For each i = 1 to l
    For each j = 1 to m   /* m is 1 for single objective problems */
      f_j^i = jth objective of the ith solution (y_1^i, y_2^i, ..., y_n^i)
  Output: Proteins/Proteome = {(f_1^1, f_2^1, ..., f_m^1), ..., (f_1^l, f_2^l, ..., f_m^l)}
EndProcedure

I. Regulation. Transcriptional factors
The main point we take from the biological model is that transcriptional factors are indispensable elements in the transfer of information from DNA into RNA. Put simply, their role is defined by their effect on copying the genetic information from DNA into the intermediate RNA. We ignore the intricate biological details of how transcriptional factors act. Instead, we develop a computational procedure in which these elements are reinterpreted as extents of gene expression during transcription. For these reasons, transcriptional factors (TF) can represent activation or inhibition thresholds for gene expression during transcription from DNA to RNA.
The biological model is stricter about the action of transcriptional factors: they act either as activators, triggering gene transcription, or as inhibitors, causing some DNA sequences to be omitted. The computational model instead mimics the action of TFs with continuous values. These values affect the accuracy of copying (accuracy in transcription) rather than whether a gene occurs in the RNA template.
The values TF represent the degrees of dissimilarity between the original sequence of genes in the DNA chain and the genes in the RNA chain. A null value indicates that the original DNA sub-sequence is cloned into the corresponding location of the RNA. For a non-zero value, the original genes are altered and then copied into the RNA. These transcriptional errors further sustain diversity in the search space.
With l the length of the DNA chain, the transcriptional factors are computed as follows:
$$\tau_i = \alpha \, \frac{\mathrm{rank}_i - 1}{l} + (1 - \alpha) \, \frac{\sigma_i}{l}, \quad i \in \{1, \ldots, l\},$$
where $\alpha$ is randomly generated in [0, 1], and:
- $\mathrm{rank}_i$ is, in the multi-objective context, the non-domination rank [15] of the ith gene among the other genes of the DNA chain, taking into account the given objectives. For single objective optimization problems, it is the rank of the ith gene according to the single objective value.
- $\sigma_i$ is the number of genes of the DNA chain that lie in the vicinity of the ith gene. The radius $\delta$ of the neighborhood is the average of the Euclidean distances over all distinct pairs of genes:
$$\sigma_i = \sum_{j=1}^{l} g(i, j), \qquad g(i, j) = \begin{cases} 1, & \text{if } d(i, j) \le \delta, \\ 0, & \text{otherwise,} \end{cases}$$
where d(i, j) is the Euclidean distance between the ith and the jth genes.
Consequently, the transcriptional factor is a convex combination of the rank and the crowding degree. The Regulation procedure computes the transcriptional factors using the following measures:
- (in genotype) the crowding of the solutions encoded as genes in the RNA template;
- (in phenotype) the ranks, in the objective space, of the solutions encoded in the corresponding proteins.

Transcriptional factors are therefore affected by both genotype and phenotype. They are then used in the transcription procedure as transcriptional error probabilities.
Procedure Regulation:
  Input: RNA = {(y_1^1, y_2^1, ..., y_n^1), ..., (y_1^l, y_2^l, ..., y_n^l)}
         Proteins (Proteome) = {(f_1^1, f_2^1, ..., f_m^1), ..., (f_1^l, f_2^l, ..., f_m^l)}
  For each i = 1 to l

290
289
    Compute the transcriptional factor $\tau_i$
Output: TF = $\{\tau_1, \tau_2, \ldots, \tau_l\}$
EndProcedure

J. Reverse transcription. DNA synthesis

In simple terms, reverse transcription is the process by which DNA is synthesized from an RNA template. The biological process of reverse transcription proves to be a suitable mechanism for our purpose: since the intention is to create the best possible DNA chain, one that codifies the proteins/proteome, the DNA chain must be transformed repeatedly. From this point of view, reverse transcription is a proper technique for evolving the artificial DNA chain that corresponds to the problem's solutions. The role of the reverse transcription procedure is therefore to adjust the DNA chain so that it generates high-quality proteins: the RNA chain is copied into the DNA chain wherever it improves on it, and the newly transformed DNA becomes the basis for the next protein synthesis. The importance of reverse transcription lies in its power to transform the otherwise inert DNA chain. Through this procedure, better solutions are inserted backward into the DNA chain and contribute to the evolution.

Procedure ReverseTranscription:
Input: RNA = $\{(y_1^1, y_2^1, \ldots, y_n^1), \ldots, (y_1^l, y_2^l, \ldots, y_n^l)\}$
For each i = 1 to l
    If $(y_1^i, y_2^i, \ldots, y_n^i)$ is better than $(x_1^i, x_2^i, \ldots, x_n^i)$ then
        $(x_1^i, x_2^i, \ldots, x_n^i) = (y_1^i, y_2^i, \ldots, y_n^i)$
Output: DNA = $\{(x_1^1, x_2^1, \ldots, x_n^1), \ldots, (x_1^l, x_2^l, \ldots, x_n^l)\}$
EndProcedure

Depending on the problem type (single- or multi-objective optimization), the comparison between the RNA sequence and the original DNA is made according to the value of the single objective or according to Pareto dominance, respectively.

The outline of the PSA algorithm is given next:

Algorithm PSA is
    DNA initialization
    Loop
        DNA_Shuffling
        TRANSCRIPTION
        TRANSLATION
        REGULATION
        REVERSE_TRANSCRIPTION
    Until StoppingCondition is True
End.

VI. RESULTS

The proposed algorithm is tested on both single-objective and multi-objective optimization problems. For each type of problem, several popular test functions are chosen, and comparisons are conducted against popular algorithms: the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) [22] for single-objective optimization problems, and the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [15] for multiobjective optimization problems.

A. Single-objective optimization

For single-objective optimization, we consider the following test functions:

Schaffer's F6:
$$f_1(x,y) = 0.5 + \frac{\sin^2\sqrt{x^2 + y^2} - 0.5}{\left(1 + 10^{-3}(x^2 + y^2)\right)^2}, \qquad (x,y) \in [-100,100]^2, \quad \text{global minimum: } f_1^*(0,0) = 0.$$

Ackley:
$$f_2(x,y) = -20\exp\left(-0.2\sqrt{0.5\,(x^2 + y^2)}\right) - \exp\left(0.5\,(\cos 2\pi x + \cos 2\pi y)\right) + 20 + e, \qquad (x,y) \in [-30,30]^2, \quad \text{global minimum: } f_2^*(0,0) = 0.$$

Rastrigin:
$$f_3(x,y) = 20 + x^2 - 10\cos(2\pi x) + y^2 - 10\cos(2\pi y), \qquad (x,y) \in [-5,5]^2, \quad \text{global minimum: } f_3^*(0,0) = 0.$$

Each of GA, PSO and PSA was run 20 times, with the following parameters and setups for each run:
- PSA: 100 generations, DNA size = 50, RNA size = 50;
- GA: 100 chromosomes, 100 iterations, standard convex crossover (pc = 0.9), random uniform mutation (pm = 0.1), fitness-proportional selection;
- PSO: 100 particles, 100 iterations, inertia weight decreasing linearly from 0.9 to 0.4, learning coefficients = 2.0, maximum velocity = 10.

TABLE I. PSA VERSUS GA, PSO. COMPARISON RESULTS.

Algorithm   Statistic   Schaffer F6   Ackley     Rastrigin
PSA         Mean        9.3E-16       8.3E-03    1.55E-13
            Stddev      7.8E-16       2.8E-03    2.5E-13
            Minimum     2.22E-16      8.28E-03   1.42E-14
PSO         Mean        2.55E-03      1.18E-03   3.83E-04
            Stddev      4.14E-03      9.05E-04   8.17E-04
            Minimum     4.21E-07      1.07E-04   3.62E-06
GA          Mean        6.72E-04      2.94E-02   2.78E-04
            Stddev      5.46E-04      2.66E-02   4.66E-04
            Minimum     1.59E-04      9.22E-04   2.51E-07

Table I presents the statistics (mean, standard deviation and minimum over the 20 runs) of the best proteins found by each algorithm. The results show that PSA performs better than the standard GA on all three test functions, and better than PSO on Schaffer's F6 and Rastrigin. For the Ackley function, however, PSO offers better results than both PSA and GA.

B. Multiobjective optimization

To investigate the performance of PSA on multiobjective problems, we used several well-known test problems: ZDT1, ZDT2 and ZDT3 [17] as two-objective test functions with 30 variables, and DTLZ1, DTLZ2 and DTLZ3 [18] as three-objective test functions with 7 variables (DTLZ1) and 12 variables (DTLZ2 and DTLZ3). For performance assessment, we compute the hypervolume metric (HV) [16]. The hypervolume corresponds to the size of the region of the objective space, bounded by a reference point, that is Pareto-dominated by at least one member of the solution set.
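The hypervolume definition above can be illustrated for the two-objective (ZDT) case. What follows is a minimal sketch, not the assessment code behind Table II. It assumes that both objectives are minimized and that the user supplies a reference point `ref`, because the reference point and any normalization used in the experiments are not stated here. `hypervolume_2d` is a hypothetical helper name. The three-objective DTLZ problems would need a general-dimension algorithm instead of this sweep.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area of objective space dominated by `points` and bounded by `ref`.

    Assumption: both objectives are minimized, and `ref` is a user-chosen
    reference point (hypothetical; not taken from the experiments above).
    """
    # Keep only points that strictly improve on the reference point in both objectives.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest second objective seen so far in the sweep
    # Sweep the points in increasing order of the first objective.
    for f1, f2 in pts:
        if f2 < best_f2:  # otherwise the point is dominated by an earlier one
            # Add the horizontal slab between f2 and best_f2, spanning f1..ref[0].
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]  # last point is dominated
    print(hypervolume_2d(front, ref=(4.0, 4.0)))  # prints 6.0
```

Sorting makes the sweep O(N log N), and dominated points add nothing to the area. A larger HV indicates a front that is closer to the optimum and better spread, which is why higher values are better in Table II.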
Independent-samples t-tests were conducted to compare the PSA and NSGA-II results on the ZDT and DTLZ test functions. The hypervolume values of PSA and NSGA-II differ significantly at the 99% confidence level for every problem. The results summarized in Table II show that PSA performs better than NSGA-II on all the considered problems except DTLZ2.

TABLE II. PSA VERSUS NSGA2: STATISTICAL RESULTS FOR HV

Test function   HV (PSA) Mean (SD)    HV (NSGA2) Mean (SD)   T_Stat     p
ZDT1            8.71E-01 (4.57E-02)   7.77E-01 (2.02E-02)    5.38       1.12E-04
ZDT2            7.80E-01 (7.36E-03)   5.08E-01 (7.50E-02)    1.12E+01   1E-06
ZDT3            6.52E-01 (1.44E-03)   5.78E-01 (2.40E-02)    9.06       8.82E-06
DTLZ1           9.38E-01 (7.22E-04)   7.14E-01 (2.15E-02)    4.75       3.90E-04
DTLZ2           8.84E-01 (1.17E-05)   8.90E-01 (1.24E-05)    -4.05      3.44E-04
DTLZ3           2.69E-01 (2.51E-03)   1.05E-01 (1.61E-03)    8.08       1.59E-07

VII. CONCLUSIONS

Concepts and ideas flow in both directions, from nature toward computation and vice versa, so that the artificial and natural contexts can be said to merge harmoniously. The field of nature-inspired computation is growing rapidly and includes numerous bio-inspired paradigms that serve as practical models for various techniques. Meanwhile, discoveries in molecular biology accumulate ever more rapidly, so that the two directions appear unsynchronized. For these reasons, a model inspired by the latest discoveries and technologies in genetics and molecular biology is needed. In our study, we emphasize the richness of such a model and propose the paradigm of protein synthesis as a new perspective in the Evolutionary Computation landscape.

The algorithm inspired by the natural process of protein synthesis and by artificial DNA shuffling is intended as a general optimization technique that can be applied easily to different problems. This paper presents preliminary experimental results obtained with PSA on optimization problems with one, two or three objectives. These experiments cover various test problems and compare the performance of the proposed approach with popular standard techniques: GA, PSO and NSGA-II. The results show that PSA is a suitable optimization technique, offering better solutions in most of the cases considered.

The major advantages of the described method are its performance, comparable to that of existing algorithms, and the ease with which it can address different optimization problems. As further research, the approach should be extended to optimization problems in dynamic environments, and more numerical experiments are required to prove the strength of the proposed paradigm.

Our study encourages a deeper exploitation of the protein synthesis metaphor, analyzing complex biological structures and processes in order to design new computational techniques.

REFERENCES

[1] W. Banzhaf et al., "Guidelines: From artificial evolution to computational evolution: A research agenda," Nature Reviews Genetics, vol. 7, no. 9, pp. 729-735, 2006.
[2] D. S. Burke, K. A. De Jong, J. J. Grefenstette, C. L. Ramsey, and A. S. Wu, "Putting more genetics into genetic algorithms," Evolutionary Computation, vol. 6, no. 4, pp. 387-410, Dec. 1998.
[3] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67-82, 1997.
[4] F. Crick, "Central dogma of molecular biology," Nature, vol. 227, no. 5258, pp. 561-563, Aug. 1970.
[5] H. Temin and D. Baltimore, "RNA-directed DNA synthesis and RNA tumor viruses," in Advances in Virus Research, vol. 17, K. M. Smith, M. A. Lauffer, and F. B. Bang, Eds. New York: Academic Press, 1972, pp. 129-186.
[6] J. D. Watson and F. H. Crick, "Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid," Nature, vol. 171, no. 4356, pp. 737-738, Apr. 1953.
[7] M. R. Wilkins et al., "From proteins to proteomes: Large scale protein identification by two-dimensional electrophoresis and amino acid analysis," Bio/Technology, vol. 14, pp. 61-65, 1996.
[8] P. Polak and P. F. Arndt, "Transcription induces strand-specific mutations at the 5' end of human genes," Genome Research, vol. 18, pp. 1216-1223, 2008.
[9] N. Kim and S. Jinks-Robertson, "Transcription as a source of genome instability," Nature Reviews Genetics, vol. 13, pp. 204-214, 2012.
[10] R. A. Weiss, "The discovery of endogenous retroviruses," Retrovirology, vol. 3, no. 1, p. 67, 2006.
[11] H. M. Temin and S. Mizutani, "RNA-dependent DNA polymerase in virions of Rous sarcoma virus," Nature, vol. 226, no. 5252, pp. 1211-1213, Jun. 1970.
[12] D. Baltimore, "RNA-dependent DNA polymerase in virions of RNA tumour viruses," Nature, vol. 226, no. 5252, pp. 1209-1211, Jun. 1970.
[13] T. Mourier, "Reverse transcription in genome evolution," Cytogenetic and Genome Research, vol. 110, no. 1-4, pp. 56-62, 2005.
[14] W. P. Stemmer, "Rapid evolution of a protein in vitro by DNA shuffling," Nature, vol. 370, no. 6488, pp. 389-391, Aug. 1994.
[15] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multi-objective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, 2002.
[16] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da Fonseca, "Performance assessment of multiobjective optimizers: An analysis and review," IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117-132, 2003.
[17] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, vol. 8, no. 2, pp. 173-195, 2000.
[18] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, "Scalable multi-objective optimization test problems," in Proc. 2002 Congress on Evolutionary Computation (CEC 2002), IEEE Press, 2002, pp. 825-830.
[19] R. E. Cobb, R. Chao, and H. Zhao, "Directed evolution: Past, present, and future," AIChE Journal, vol. 59, pp. 1432-1440, 2013, doi: 10.1002/aic.13995.
[20] L. N. de Castro, Fundamentals of Natural Computing: Basic Concepts, Algorithms, and Applications. CRC Press, 2006.
[21] H.-G. Beyer and H.-P. Schwefel, "Evolution strategies: A comprehensive introduction," Natural Computing, vol. 1, no. 1, pp. 3-52, 2002.
[22] Y. Shi and R. Eberhart, "A modified particle swarm optimizer," in Proc. 1998 IEEE International Conference on Evolutionary Computation (IEEE World Congress on Computational Intelligence), May 1998, pp. 69-73.
[23] L. M. Adleman, "Molecular computation of solutions to combinatorial problems," Science, vol. 266, no. 5187, pp. 1021-1024, 1994.
[24] C. Ferreira, "Gene expression programming: A new adaptive algorithm for solving problems," Complex Systems, vol. 13, no. 2, pp. 87-129, 2001.
