
Search for Maximal Snake-in-the-Box Using New Genetic Algorithm
Kim-Hang Ruiz
International MIS
1065 Waltons Pass
Evans, GA 30809

kimhangruiz@aol.com
ABSTRACT

1. INTRODUCTION

The Snake-In-The-Box (SIB) problem is a challenging


combinatorial search problem to find the longest constrained open
path (k-spread snake) in n-dimensional hypercube (Qn). In
addition to constructive techniques, many search algorithms such
as Depth First Search (DFS), Genetic Algorithm (GA), hybrid
Evolutionary Computation algorithm (EC), and Nested Monte-Carlo Search (NMCS) have been used to tackle this problem. To
get better results and to speed up the process, these techniques
often used a long snake as the starting point for the search
(priming/seeding).

The n-dimensional hypercube Qn is a graph whose vertices consist


of all binary strings b_{n-1} ... b_1 b_0, where each vertex connects to n
adjacent vertices. Two vertices are adjacent if and only if their
binary strings differ by exactly one bit. By definition a k-spread
snake is an induced path in an undirected graph G, such that every
pair of vertices at least distance k apart in the path are also at least
distance k apart in graph G. The Snake-in-the-Box (SIB) problem, to
find the longest k-spread snakes in Qn (k-Sn), has been a challenge
for researchers over the past fifty years.

This paper reviews the hypercube fundamentals, then presents a


new search technique, Mitosis Genetic Algorithm (MGA), which
was applied in search for the four different spread snakes (spread
2 to 5) in eight different dimensional hypercubes (Q6 to Q13).
The MGA found three new record-breaking 3-spread snakes in
Q10, Q11 and Q13, all the previously known optimal snakes from
spread 2 to spread 5, and the best previous known maximal 3-S9
snake of length 63. It is remarkable that it found those within
minutes to hours without priming, significantly shorter than days
to weeks needed in the other techniques.

The properties of the SIB have been applied in error-detecting


codes, in analog-to-digital encoders and converters [5, 7],
modulation schemes in multi-level flash memories [16], efficient
resource distributions in high speed computer clusters [11], and in
identifying gene regulation networks in embryonic development
[17]. The accuracy of error detection in certain analog-to-digital
conversion systems increases as the length of the snake increases;
the greater the spread, the greater is the error-detection capability.
Accordingly, there has been much interest in finding the longest
maximal k-Sn.
When a snake cannot be extended at either end, it is maximal.
The number of maximal snakes varies with the hypercube
dimension and the spread. The longest maximal k-Sn is the
optimal.
Many mathematicians have suggested several
constructive techniques [1, 2, 12, 15] to build the longest maximal
k-Sn. Other scholars have either applied heuristic search such as
GA [2, 13], EC hybrids [3], and NMCS [8], or exhaustive search
such as DFS [6, 10] to find the optimal or the longest maximal k-Sn. These search techniques can find the optimal in the small
dimensional hypercubes Qn; but not in the large ones. Most
record-breaking k-Sn (for large Qn) reported in literature have not
been proven to be the optimal. The GA holds the record in
finding the maximal 2-S8 of length 98 [2] and the NMCS in 2-S10,
2-S11 and 2-S12 of lengths 370, 695 and 1274 respectively [8].
Only constructive techniques and DFS were used to build or to
search for snakes with spreads higher than 2. To get better results
and to speed up the search process, a longer snake is often used as
the starting point for search (priming) in these techniques.

Categories and Subject Descriptors


I.2.8 [Artificial Intelligence]: Problem Solving, Control
Methods, and Search - backtracking, graph and tree search
strategies, heuristic methods, plan execution, formation, and
generation.

General Terms
Algorithms, Theory

Keywords
Genetic Algorithm, Mitosis Genetic Algorithm, Heuristics, Snake,
Hypercube

Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by
others than the author(s) must be honored. Abstracting with credit is permitted. To
copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from
Permissions@acm.org.
GECCO '14, July 12-16, 2014, Vancouver, BC, Canada.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2662-9/14/07$15.00.
http://dx.doi.org/10.1145/2576768.2598296

This paper reviews the hypercube fundamentals in section 3 and


then introduces a new search technique: Mitosis Genetic
Algorithm (MGA) in section 4. The MGA implementation and
how the experiment was set up in search for the longest maximal
k-Sn snakes are described in section 5. The search results for the
four different spread snakes (spread 2 to 5) in eight different
dimensional hypercubes (Q6 to Q13) are discussed in section 6 and
section 7. The MGA found three new record-breaking 3-S10, 3-


2. TERMINOLOGY AND DEFINITIONS

Spread k is the distance between two nodes that are at least k


units apart.
A transition sequence of a k-Sn snake is the sequence of
integers ranging from 0 to (n-1), which corresponds to the bit
positions (index) in which successive n-code-words differ.
The Vicinity Vk consists of all nodes in Qn that have a distance
k from the snake path.
Unavailable nodes are nodes that have distances less than k
units from any snake node (in the vicinity V1 or Vk-1).
Unavailable transition is the transition that leads to an
unavailable node.

The following standard abbreviations and terms are used in this


paper.

2.3 Font Style

S11 and 3-S13 of lengths 125, 158, and 509, longer than the best
previously known value, 103, 157 and 493 respectively [6]. It
also found all the previously known optimal snakes from spread 2
to spread 5; and the known longest 3-S9 snake of length 63
without priming. It is remarkable that the search times to find
these optimal and longest maximal k-Sn snakes took minutes or
hours, significantly shorter than the days to weeks required by the
other techniques. A list of transition sequences of three record-breaking snakes is provided in an appendix (section 9).

Italic font represents variable name in programming or in


calculating the search space and the search time of the
algorithm.
Bold underlined italic font represents definitions.

2.1 Terminology
k         spread.
k-Sn      n-dimensional k-spread snake, e.g., 3-S8 denotes the 8-dimensional 3-spread snake. In the literature Sn is used for 2-Sn.
n         number of dimensions.
Qn        n-dimensional hypercube, e.g., Q9 denotes the 9-dimensional hypercube.
O(n)      search space.
OGA(t)    search time for each Genetic Algorithm in the MGA.
O(t)      search time for the Mitosis Genetic Algorithm.
Ok(t)     search time for k-spread snakes.
Ok-1(t)   search time for (k-1)-spread snakes.

3. HYPERCUBE FUNDAMENTALS
One of the most important properties of the hypercube is that the
number of vertices in Qn is always twice as much as in Qn-1.
Figure-1 shows how the sole node in Q0 duplicates itself by
translating in coordinate 0 (transition 0), to form a line in Q1
which continues to duplicate in transitions 1, 2, and 3 to form Q2,
Q3, and Q4 respectively.
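The doubling property can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `build_hypercube` is hypothetical, and vertices are encoded as integers whose bit t is coordinate t.

```python
def build_hypercube(n):
    """Return an adjacency map of Qn, built by repeated duplication."""
    adjacency = {0: set()}                 # Q0: a single node, no edges
    for t in range(n):                     # transition t adds coordinate t
        offset = 1 << t
        # Duplicate the current graph, shifting every label by 2^t.
        copy = {v + offset: {u + offset for u in nbrs}
                for v, nbrs in adjacency.items()}
        adjacency.update(copy)
        # Join every duplicated node to its original along coordinate t.
        for v in copy:
            adjacency[v].add(v - offset)
            adjacency[v - offset].add(v)
    return adjacency


q4 = build_hypercube(4)
print(len(q4))                                        # vertex count of Q4
print(all(len(nbrs) == 4 for nbrs in q4.values()))    # every vertex has n neighbors
```

Each pass of the loop doubles the vertex count, so Qn ends up with 2^n vertices, each connected to n others.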

2.2 Definitions

Available nodes are nodes that have distances at least equal to


spread k from any snake node except (k-1) nodes from the tail
(or the head).
Available transition is the transition that leads to an available
node.
Canonical form of a snake is a type that requires each
transition number to first appear in a transition sequence only
after all smaller transition numbers have appeared. For
example, the transition sequence 01203123 is in canonical form
but the transition sequence 10321032 is not.
Chromosomes represent possible solutions to the genetic
algorithm problem at hand.
Free neighbors are neighbors that are available nodes.
Fitness is the evaluation of a chromosome as a solution to the
given genetic algorithm problem.
An induced path in an undirected graph G, is a sequence of
nodes P, where every pair of non-consecutive vertices in P are
non-adjacent in G.
A maximal snake is a snake that is not a subsequence of any other snake;
it cannot be extended at either end without violating the
non-chord constraint.
The n-dimensional k-spread snake k-Sn is an open induced
(chord free) path in an undirected graph G such that every pair
of vertices at distance at least k in the path are also at distance
at least k in G.
A node sequence of a k-Sn snake is the sequence of n-code-words, which consist of all binary strings b_{n-1} ... b_1 b_0; two
words are adjacent if and only if their binary strings differ by
exactly one bit.
The optimal snake is the longest maximal snake proven.
Population is a collection of chromosomes in a genetic
algorithm problem.
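Several of these definitions can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, nodes are encoded as integers starting from node 0, and `is_k_spread_snake` follows a literal reading of the k-Sn definition (non-consecutive path nodes are non-adjacent, and nodes at path distance at least k are at hypercube distance at least k).

```python
def to_nodes(transitions, start=0):
    """Map a transition sequence to its node sequence (nodes as integers)."""
    nodes = [start]
    for t in transitions:
        nodes.append(nodes[-1] ^ (1 << t))    # flip bit t of the last node
    return nodes


def is_canonical(transitions):
    """True if each transition first appears only after all smaller ones."""
    next_new = 0
    for t in transitions:
        if t > next_new:
            return False
        if t == next_new:
            next_new += 1
    return True


def is_k_spread_snake(transitions, k):
    """Literal check of the induced-path and spread-k conditions."""
    nodes = to_nodes(transitions)
    for i in range(len(nodes)):
        for j in range(i + 2, len(nodes)):
            cube_distance = bin(nodes[i] ^ nodes[j]).count("1")
            needed = k if j - i >= k else 2
            if cube_distance < needed:
                return False
    return True


print(is_canonical([0, 1, 2, 0, 3, 1, 2, 3]))     # the sequence 01203123
print(is_canonical([1, 0, 3, 2, 1, 0, 3, 2]))     # the sequence 10321032
print(is_k_spread_snake([0, 1, 2, 3, 0, 4], 3))
```

The quadratic pair check is meant for clarity, not speed.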

Each vertex in Qn connects to n other vertices. If there is no


induced path in the graphs, every vertex has an equal chance to be
selected as a snake node based on the isomorphic symmetries of
the hypercube. If, however, there is an induced path (snake) in
the graph already, only some nodes can be selected to extend the
path. The conditions for a node to be selected are that, it must be
available and adjacent to the end nodes (tail or head). For
example, an induced path 3-S6 in Q6 with transition sequence
0123 is illustrated with the thick black line in Figure-2. Red
circle vertices represent unavailable nodes in vicinity V1 that are
one distance away from snake nodes. Green triangle vertices are
unavailable nodes in vicinity V2. Black square vertices are
available nodes that can be added to the path. There are 26
available nodes in Figure 2 but only 3 of them (marked e0, e4,
and e5) are adjacent to the tail and can extend the path. Thus the
next transition for the induced path (snake) can only be either 0, 4
or 5.
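The counts in this example can be reproduced with a small Python sketch. It assumes one reading of the definition of available nodes: a node is available if it is not on the snake and lies at distance at least k from every snake node except the last (k-1) nodes at the tail. The function names are hypothetical.

```python
def hamming(a, b):
    """Hypercube distance between two nodes encoded as integers."""
    return bin(a ^ b).count("1")


def available_nodes(n, k, snake):
    """Nodes at distance >= k from all snake nodes except the last k-1."""
    checked = snake[:-(k - 1)] if k > 1 else snake
    return [v for v in range(1 << n)
            if v not in snake and all(hamming(v, s) >= k for s in checked)]


def available_transitions(n, k, snake):
    """Transitions from the tail that lead to an available node."""
    free = set(available_nodes(n, k, snake))
    tail = snake[-1]
    return [t for t in range(n) if tail ^ (1 << t) in free]


snake = [0, 1, 3, 7, 15]                      # nodes of transition sequence 0123
print(len(available_nodes(6, 3, snake)))      # 26
print(available_transitions(6, 3, snake))     # [0, 4, 5]
```

Under this reading the sketch agrees with the figure: 26 available nodes, of which only those reached by transitions 0, 4, and 5 are adjacent to the tail.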
The left half of the graph can be considered as Q5 (Qn-1) where the
most significant bit of all nodes is equal to 0. The right half of the


graph is also considered Q5 (Qn-1) with the most significant bit


equal to 1. Some of the unavailable nodes can be connected to
form some paths which replicate part of the snake path. Notice
that:

process in biology where DNA replicates to prepare for cell


duplication. A cell with good DNA grows into many more cells
with good DNA. Imitating this natural process, Mitosis Genetic
Algorithm (MGA) was developed with the assumption that a long
k-Sn-1 can extend into a long k-Sn.

4. MITOSIS GENETIC ALGORITHM


Mitosis Genetic Algorithm is a heuristic search technique
applying a series of Genetic Algorithms to many parts of the
problem at hand to find the optimal or near optimal solutions for
each part which will then be combined or extended to ultimately
find the optimal solution to the problem. Genetic Algorithms are
heuristic search methods modeled from the theory of natural
evolution. The process in GA involves generating a population of
chromosomes (possible solutions to the problem) which will then
be modified by the genetic operators, namely, selection,
crossover, and mutation, to create a new generation. This
evolutionary process is repeatedly carried out a predetermined
number of times for each GA.

4.1 Encoding Schemes


Depending on the nature of the given problem, many encoding
techniques can be used. Normally, the chromosomes in the
population can be represented as strings of binary digits, integers,
real numbers, or ordered list. The bit string representations are
more commonly used because they are simple to create and easy
to manipulate. It is recommended that the representations in all
GAs are the same to ease the later process of combining or
extending partial solutions into the solution of the problem.

4.2 Objective Function

1) Path of unavailable nodes in vicinity V1 (red lines connecting


red circles) on the right half of figure replicates the snake path
on the left half minus 1 node.
2) Paths of unavailable nodes in vicinity V2 (green lines
connecting green triangles) on the right half of figure
replicates the path of unavailable nodes in vicinity V1 in the
left half of figure minus 1 node.
3) There are no unavailable nodes on the right half figure
corresponding to the unavailable nodes in vicinity V2 on the
left half. This is because the nodes on the right half are in
vicinity V3 which are available nodes according to the
definition of 3-Sn.
4) There is one available node in the right half figure
corresponding to each available node in the left half figure.

Objective Function is a formula used to evaluate the fitness of a


chromosome as a solution to the problem at hand. The formula
can be mathematical or non-mathematical depending on the
problem. Choosing an appropriate and effective objective
function in each GA is very crucial for MGA to find the efficient
solution of any given problem.

4.3 Selection
Based on the fitness which is evaluated by the objective function,
individual chromosomes with a higher value will have a better
chance to be selected as a parent to produce offspring of the next
generation. There are many different methods to select, the two
most common are: weighted Roulette-Wheel, and Tournament. In
Roulette-Wheel selection, the probability of a member being
selected is proportional to its fitness. In Tournament selection,
four or six members in the current population are randomly
selected; and the one best fitted will become parent
(Tournament-4 or Tournament-6).
Two selected parent
chromosomes may or may not be replaced back to the current
population. If it is, the selection is with replacement, otherwise,
selection without replacement. In the selection with replacement,
the fitness of the parent is unchanged after being selected. In the
selection without replacement, the fitness of the parent is set to 0
to eliminate the chance to be selected again. It is also
recommended that the selection types in all GAs be the same.
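A minimal Python sketch of Tournament selection with and without replacement is given below. The function names and the list-of-fitness-values interface are assumptions made for illustration; they are not the VB.net implementation used in this study.

```python
import random


def tournament_select(fitness, size=4, rng=random):
    """Return the index of the fittest of `size` randomly drawn members."""
    contenders = rng.sample(range(len(fitness)), size)
    return max(contenders, key=lambda i: fitness[i])


def select_parents(fitness, size=4, replacement=True, rng=random):
    """Pick two parent indices by Tournament-`size` selection."""
    fitness = list(fitness)                   # work on a copy
    parents = []
    for _ in range(2):
        winner = tournament_select(fitness, size, rng)
        parents.append(winner)
        if not replacement:
            fitness[winner] = 0               # selected parent cannot win again
    return parents


rng = random.Random(0)
scores = [46, 47, 48, 47, 49, 45, 44, 43]
print(select_parents(scores, size=4, replacement=False, rng=rng))
```

Passing a seeded `random.Random` instance makes a run repeatable, which helps when comparing Tournament-4 against Tournament-6.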

The first two observations indicate that a tight k-Sn-1 snake with fewer
unavailable nodes in vicinity V1 would be more likely to produce
a longer k-Sn snake. The third observation indicates that the number of
unavailable nodes in vicinity Vk-1 can be used to distinguish the
ability of a k-Sn-1 snake to extend into a longer k-Sn snake. If
two k-Sn-1 snakes have the same length but different numbers of
unavailable nodes in vicinity Vk-1, the snake with the larger
number would have a higher ability to extend into a longer snake
(since nodes in Vk-1 will become available nodes in Qn). The
fourth observation indicates that there are 2 available nodes in Qn for
every available node in Qn-1. These important properties will be
incorporated in calculating the fitness function in the MGA.
The replication of the snake path and unavailable paths in the left
half of the hypercube (Qn-1) into the right half, in preparing for
extending k-Sn snakes in Qn is analogous to the natural mitosis

4.4 Crossover
Each time two parents are selected, their chromosomes are
interchanged to generate new chromosomes that are different but


retain some of their characteristics (crossover operator). A


number from 0 to 1 is randomly generated and the crossover
operator is then carried out only if the generated number is
smaller than the predetermined crossover probability. There are
two basic crossover techniques: one-point crossover and two-point crossover, which can be utilized by MGA. Depending on
the type of crossover, one or two numbers between 1 and
the chromosome length are randomly selected as the crossover point
or points. In one-point crossover, the parts of the two parent chromosomes
beyond the crossover point are swapped, rendering two children.
In two-point crossover, the part of the chromosome between the
two crossover points is swapped to create two children. The two
children chromosomes will then be copied to the new population.
If the crossover operation is not carried out, the two parent
chromosomes will be copied instead. The type of crossover can
be different for each GA. Using two-point crossover in MGA
tends to find the efficient solution to the problem more often than
using one-point crossover.
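The two crossover techniques can be sketched in Python as follows. Chromosomes are assumed to be equal-length lists of genes, and all function names are hypothetical.

```python
import random


def one_point(parent_a, parent_b, point):
    """Swap the parts of the two parents beyond the crossover point."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def two_point(parent_a, parent_b, lo, hi):
    """Swap the parts of the two parents between the two crossover points."""
    return (parent_a[:lo] + parent_b[lo:hi] + parent_a[hi:],
            parent_b[:lo] + parent_a[lo:hi] + parent_b[hi:])


def crossover(parent_a, parent_b, probability, two_points=True, rng=random):
    """Apply crossover with the given probability, else copy the parents."""
    if rng.random() >= probability:
        return parent_a[:], parent_b[:]
    if two_points:
        lo, hi = sorted(rng.sample(range(1, len(parent_a)), 2))
        return two_point(parent_a, parent_b, lo, hi)
    return one_point(parent_a, parent_b, rng.randrange(1, len(parent_a)))


a, b = list("aaaaaa"), list("bbbbbb")
print(one_point(a, b, 2))
print(two_point(a, b, 2, 4))
```

The crossover points are drawn from 1 to length-1 so that each child always receives genes from both parents.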

dimensional hypercube that is different than Qn-1. For example, a


series of GA can be applied to an initial population of transition
sequences of k-Sn-2 instead. In this case, the replications and
extensions must be repeated twice and the GA must be applied
three times to evolve the populations of transition sequences of k-Sn-2, k-Sn-1 and k-Sn.

The predetermined crossover probability can be different in each


GA, but it is often set to be the same.

To avoid adjacency violations during crossover and mutation


operators, the transition sequence is used as the representation for
each individual chromosome in the population. Each transition
sequence is stored in a one-dimension array whose length is at
least 5 units longer than the best previously known value. For
example, the best previously known values for 3-S8 and 4-S8
snakes are 35 and 19, thus the length of chromosomes for these
snakes are set at 40 and 24 respectively. When a new record-breaking k-Sn is found, the number of available nodes will be
determined. If there are m available nodes left, the array length
for this k-Sn will be reset to the sum of the length of the longest
found maximal plus m. For example, the array length for 3-S10 is
initially set at 108. When a maximal with length 108 is found,
there are 17 available nodes left. Thus the array length is reset to
125 and the MGA is rerun with the same settings.

5.1 Representation

4.5 Mutation
A common single point mutation is used in MGA. It involves a
probability that an arbitrary gene in the chromosomes of the new
population will be changed from its original state. Each gene in
the chromosomes of the new population is visited by the
algorithm, and a number from 0 to 1 is randomly generated. If the
generated number is smaller than the predetermined mutation
probability, the gene will be replaced with a different gene. The
predetermined mutation probability can be different in each GA;
but it also is often set to be the same.
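A minimal Python sketch of this mutation operator follows. It assumes that genes are transition numbers in the range 0 to n-1 and that a mutated gene is replaced by a uniformly chosen different value; the function name `mutate` is hypothetical.

```python
import random


def mutate(chromosome, n, probability, rng=random):
    """Visit every gene and replace it with a different one with the given probability."""
    mutated = list(chromosome)
    for i, gene in enumerate(mutated):
        if rng.random() < probability:
            choices = [t for t in range(n) if t != gene]
            mutated[i] = rng.choice(choices)
    return mutated


print(mutate([0, 1, 2, 3, 0, 4], n=6, probability=0.02, rng=random.Random(7)))
```

With the low probabilities typically used, most chromosomes pass through unchanged.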

4.6 Repeating Time


Since the random function is called many times in GA
procedures, it is likely to produce a different solution in every
run. In order to increase the reliability of producing the same
result and the chance to get the optimal solution, the MGA
procedure was set to repeat a predetermined number of runs. The
best solution found in each run is stored in a variable local-best.
The best of all the local-best values is the solution. For example,
if the repeating time is set to 5 and the local best values are 46,
47, 48, 47, and 49, then the solution is 49. Thus the higher
the repeating time, the higher the reliability, but the longer the
searching time.

5.2 Initial Population


By definition a k-spread snake is an induced path in an undirected
graph G such that every pair of vertices of at least distance k in
the path are also at least distance k in G. This indicates that if C is
the transition sequence for a k-Sn snake with length L > 2k, then
all the numbers in every block of (k + 1) consecutive numbers of
C are distinct (Douglas Remark [4]). For example, the four
transitions of the 3-S6 snake in Figure 2 are 0123, the integer that
can be assigned to the fifth gene can only be either 0, 4, or 5. If
the integer 0 is chosen for the fifth, then the integer that can be
assigned to the sixth must be either 4 or 5. Let us assume that the
transition sequence 012304 is formed. It can be seen that every
four consecutive numbers in the transition sequence are distinct.
Thus the following procedure is used to generate each
chromosome in an initial population of snakes in canonical form.
1) Assign integers from 0 to k to the first (k + 1) genes.
2) Randomly assign an integer that is different than k previous
genes for the next gene.
3) Repeat step 2 until all genes are assigned.
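The three steps above can be sketched in Python as follows. The function name `random_chromosome` is hypothetical and the sketch assumes n > k. It follows the steps literally, so it guarantees that every block of (k + 1) consecutive genes is distinct but does not separately enforce canonical ordering for genes after the first (k + 1).

```python
import random


def random_chromosome(n, k, length, rng=random):
    """Generate one transition sequence for the initial population."""
    genes = list(range(k + 1))[:length]            # step 1: genes 0..k
    while len(genes) < length:                     # steps 2 and 3
        recent = set(genes[-k:])                   # the k previous genes
        genes.append(rng.choice([t for t in range(n) if t not in recent]))
    return genes


print(random_chromosome(n=6, k=3, length=12, rng=random.Random(5)))
```

A population is then a list of such chromosomes, for example `[random_chromosome(6, 3, 18) for _ in range(500)]`.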

A hypercube Qn consists of many lower dimensional hypercubes,


thus finding the optimal k-spread snake in Qn can be viewed as
finding many optimal k-spread snakes in many lower dimensional
hypercubes Qn-2 or Qn-1 which will then be combined or extended
to obtain the optimal k-spread snake in Qn. Thus MGA is well
suited to tackle SIB problem.

5. MGA IMPLEMENTATION
Figure 3 illustrates the MGA procedure in searching for the
longest maximal k-Sn. First, an initial population of transition
sequences of k-Sn-1 is generated. The GA is applied to it to find a
population of near optimal k-Sn-1 which will then be replicated
and extended to form an initial population of transition sequences
of k-Sn. The GA will be applied again to this new initial
population to find the longest maximal k-Sn, the solution to the
problem. The first initial population can be from any lower

5.3 GA Operators
In this study, both Tournament-4 and Tournament-6 selections
with and without replacement are used. Both one-point crossover
and two-point crossover are utilized. Since the representation of
the chromosome is the transition sequence, no special
work is needed during or after the crossover operators to
keep the node adjacency intact. Single point mutation is used.


selected. If the snake is extended with the extension H, the


transition 5 will be selected. If the snake is extended with the
extension R, either 0, 4 or 5 will be selected (based on a generated
random number). The transition 0 leads to the node e0 that has 2
free neighbors n0, both transitions 4 and 5 lead to nodes e4 and e5
that have 3 free neighbors n4 and n5 respectively. Thus if the
snake is extended with the extension F, the transition 0 will be
selected since it has the lowest free neighbors.
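The four extension rules can be sketched in Python for the example snake 0123 in Q6. The sketch assumes the same reading of availability as in the definitions (distance at least k from every snake node except the last (k-1) nodes at the tail) and counts free neighbors against the current snake. All function names are hypothetical.

```python
import random


def hamming(a, b):
    return bin(a ^ b).count("1")


def is_available(v, k, snake):
    """Node v is off the snake and >= k away from all but the last k-1 nodes."""
    blocked = snake[:-(k - 1)] if k > 1 else snake
    return v not in snake and all(hamming(v, s) >= k for s in blocked)


def free_neighbors(v, n, k, snake):
    """Number of neighbors of v that are available nodes."""
    return sum(is_available(v ^ (1 << t), k, snake) for t in range(n))


def choose_transition(ext, n, k, snake, rng=random):
    """Pick the next transition by Extension F, H, L, or R."""
    tail = snake[-1]
    options = [t for t in range(n) if is_available(tail ^ (1 << t), k, snake)]
    if not options:
        return None                                # the snake is maximal
    if ext == "L":
        return min(options)
    if ext == "H":
        return max(options)
    if ext == "R":
        return rng.choice(options)
    if ext == "F":
        return min(options,
                   key=lambda t: free_neighbors(tail ^ (1 << t), n, k, snake))
    raise ValueError("unknown extension type: " + ext)


snake = [0, 1, 3, 7, 15]                           # transition sequence 0123
for ext in "LHF":
    print(ext, choose_transition(ext, 6, 3, snake))
```

Under these assumptions the sketch selects transition 0 for Extension L, 5 for Extension H, and 0 for Extension F, since e0 has 2 free neighbors while e4 and e5 have 3 each.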

There are two different procedures to carry out these GA


operators. First procedure uses the selection with replacement,
and all GA operators are carried out repeatedly until the entire
new population is created. Second procedure uses the selection
without replacement to select members from the current
population and then copies them directly to the first sixty percent
of the new population. The single point mutation is then applied
to this part of the new population. After that, each pair of mutated
members will be selected in order from the top down to be
crossed over and copied to the remaining 40 percent of the new
population. For example, the crossover will be carried out on the
first and the second members in the new population to create the
61st and 62nd members. Ultimately, the fitness function is carried
out before and after crossover as well as mutation in the second
procedure.

The variable count increases by an increment each time an


available transition is selected. The extension stops when
available transition is no longer found and the value of the
variable count is reported as length L in the fitness function.
Since the MGA is a series of GA applied to different populations
of k-Sn-1 and k-Sn transition sequences in search for the longest
maximal k-Sn, the fitness function for each GA will be different.
Depending on the population where the GA is applied to, either
one of the two fitness functions below is used:

5.4 Fitness Function


After a new population is created, a transition sequence of each
member is mapped to node sequence before calculating snake
length. The function to calculate the snake length is outlined
below:

Fitness = length + E          (1)

Fitness = length              (2)

Where E is the least potential extended snake nodes in the next-higher-dimensional hypercube.
Function (1) is used in GAs to evolve k-Sn-1 populations in lower
dimensional hypercubes, which will be used further to extend into
k-Sn population. Thus this function must account for the ability of
k-Sn-1 to extend into k-Sn. Each time that one more node is added
to the snake path in the extension procedure, (n-k)*(k-1) nodes
will become unavailable. Based on the observations in the hypercube
fundamentals above, the total number of available nodes in Qn is equal to the
sum of all unavailable nodes in vicinity Vk-1 plus two times the
available nodes in Qn-1. The least number of potential extended snake nodes
in the next-higher-dimensional hypercube, E, is roughly equal to the
ratio of the total available nodes in Qn to (n-k)*(k-1).
Therefore

1. Set the one-dimensional array Status of length 2^n to zero, which
   represents available nodes.
2. Mark Status of the first node in the chromosome as n, which
   represents snake nodes. Set the variable count, which represents
   snake length, to 0.
3. Check whether Status of the next node in the chromosome is
   available. If it is not, return count; otherwise, increase count.
4. Mark Status of the next node as n (represents snake node)
   and of its neighbors within (k-1) distances to an integer that
   represents its distance from the snake node, e.g. mark 1 for 1
   distance, (k-1) for (k-1) distances.
5. Return to step 3.
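The length calculation can be sketched in Python as follows. This is not the Status-array bookkeeping described in the steps: for brevity it swaps in a direct distance check of each candidate node against the earlier snake nodes, with the same stopping rule (return count at the first transition that leads to an unavailable node). The function name `snake_length` is hypothetical, and the snake is assumed to start at node 0.

```python
def snake_length(transitions, k):
    """Count the transitions accepted before the first k-spread violation."""
    nodes = [0]
    count = 0
    for t in transitions:
        candidate = nodes[-1] ^ (1 << t)
        # `back` is the path distance from the candidate to an earlier node.
        for back, earlier in enumerate(reversed(nodes), start=1):
            if back == 1:
                continue                      # the tail is adjacent by construction
            needed = k if back >= k else 2    # spread rule, else chord-free rule
            if bin(candidate ^ earlier).count("1") < needed:
                return count                  # next node is unavailable
        nodes.append(candidate)
        count += 1
    return count


print(snake_length([0, 1, 2, 3, 0, 4], 3))    # all six transitions are valid
print(snake_length([0, 1, 2, 3, 1, 4], 3))    # stops at the fifth transition
```

The direct check costs more per node than a Status array would, but it is easier to verify against the definition of k-Sn.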

The snake length calculation stops when the next transition in the
chromosome leads to an unavailable node. Since the subject of
this study is to find the longest maximal k-Sn snake, an extension
procedure will be applied to the snake to determine whether it
could be extended and if so what the extended length would be.
The extension procedure begins with the calculation of the
number of available transitions from the snake tail. If the number
of available transitions is zero, the snake is maximal (cannot be
extended), and the length of the maximal k-Sn snake can be
reported in the fitness function. If the number of available
transition is greater than zero, replace the unavailable transition
with an available transition. Either one of four extension
procedures listed below can be used to select an available
transition:

E = (2 * available nodes + unavailable nodes in Vk-1) / [(n-k)*(k-1)]


Function (1) will distinguish between two equal-length k-Sn-1
snakes that can extend into two different length k-Sn snakes by
using E.
Function (2) is used in the last GA to evolve the k-Sn population
which contains the solution to SIB problem. Thus Function (2)
only utilizes the length of the longest maximal k-Sn.
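The two fitness functions can be written as a short Python sketch. The function and parameter names are hypothetical, the sketch assumes n > k > 1, and the node counts in the usage line are illustrative inputs rather than measured values.

```python
def extension_estimate(n, k, available, unavailable_vk1):
    """E: rough number of snake nodes a k-Sn-1 snake could gain in Qn."""
    return (2 * available + unavailable_vk1) / ((n - k) * (k - 1))


def fitness(length, final_stage, n=None, k=None, available=0, unavailable_vk1=0):
    """Function (2) in the last GA, Function (1) in the earlier GAs."""
    if final_stage:
        return length
    return length + extension_estimate(n, k, available, unavailable_vk1)


# Illustrative inputs only: a k-Sn-1 snake of length 13 being scored for Q7, k = 3.
print(fitness(13, final_stage=False, n=7, k=3, available=26, unavailable_vk1=10))
```

Two snakes of equal length then receive different scores whenever their counts of available nodes or of unavailable nodes in Vk-1 differ.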

5.5 Experiment Settings


In this study, the population sizes were initially set at 500, 1000,
1500, 2500, and 3000; and the number of generations at 50, 70,
80, 90, 110, 120, and 150. When the length of the k-Sn being
searched for was found to be shorter than the best previously known
value, the population sizes were increased to 5000, 10000, 20000,
and 30000; and the number of generations to 170, 200, and 300 in
an attempt to extend a few more nodes. This action was taken
based on the assumption that the larger the population and the
number of generations, the better is the chance to find the longest
maximal. The crossover probability was set at 0.6 and 0.8. The
mutation probabilities were set at 0.02, 0.01, 0.005, 0.009, and
0.0001. The number of generations was set to 50 in GAs in
search for better k-Sn-1 populations (partial solutions). The
repeating time was initially set at 20, 50, 100 and 200. Later,
only the repeating time of 20 was used in high Qn (n > 10).

Extension F: selects the available transition that has the lowest


number of free neighbors.
Extension H: selects the available transition that has the highest
coordinate.
Extension L: selects the available transition that has the lowest
coordinate.
Extension R: randomly selects one available transition from the
set of available transitions.
For example, the snake with the transition sequence 0123 in
Figure 2 can only be extended with either the available transition
0, 4 or 5. The available transition 0 has the lowest index
(coordinate) and the transition 5 has the highest index. If the
snake is extended with the extension L, the transition 0 will be


selection with replacement than the selection without
replacement, and especially in high dimensional Qn.

The MGA was written in VB.net (using Microsoft Visual Studio


2008 package) and run on five processors with 2.0 GHz Pentium
Dual-Core CPU.

Even though the MGA found the optimal 2-S7 of length 50 and
the maximal 2-S8 of length 97 within 5 minutes, it could not find
the maximal 2-Sn snakes for n > 8 near the previously known records. It
found 2-S9, 2-S10, 2-S11, 2-S12 and 2-S13 of lengths 185, 350, 595,
1033, and 1887 while the previously established records are 190,
370, 695, 1274, and 2466 respectively. This indicates that in order
to be more effective in search for these 2-spread snakes, the
parameter settings in GAs, the fitness functions and/or the MGA
procedure should be modified.

6. RESULTS
A summary of the results is given in Table 1, where the lengths of
the best known k-Sn values for dimension n ≤ 13 and spread k ≤ 5
are listed. The values in parentheses are the best previously
known values published in [2, 6, 8, 15]. Recent tests at
http://ai.uga.edu/sib/sibwiki/doku.php/records discover longer 2-S11, 2-S12 and 2-S13, but the results have not been published yet.
Table 1. Longest known k-Sn snakes. (Best previously known values in parentheses)

Dimension n   Spread k = 2   Spread k = 3   Spread k = 4   Spread k = 5
6             (26*)          (13*)          (8*)           (7*)
7             (50*)          (21*)          (11*)          (9*)
8             (98)           (35*)          (19*)          (11*)
9             (190)          (63)           (28*)          (19*)
10            (370)          125 (103)      (47*)          (25*)
11            (695)          158 (157)      (68)           (39*)
12            (1274)         (286)          (104)          (56)
13            (2466)         509 (493)      (181)          (79)
* optimal value

7. DISCUSSION
It is noticed that the extension type, the length of chromosomes,
the number of replications, the population size, the number of
generations, the probability of crossover and mutation, all affect
the MGA's ability to find the longest k-Sn snakes.

7.1 The Effect of the Extension Types


The results show that different extension procedures have
different effects in finding the longest snakes in different spreads.
Extension R (randomly selects an available transition to extend)
can find longer 2-S8 snakes than the other extensions can.
However, for Qn with n > 8, the extension F (extending with the
available transition that has the lowest number of free neighbors)
can find longer 2-spread snakes than the others can.

The Mitosis Genetic Algorithm found three new record-breaking


3-S10, 3-S11 and 3-S13 of lengths 125, 158 and 509 within 7, 17
and 87 minutes of search time respectively. The previous best
known values are 103, 157 and 493 which were found by starting
from a longest known coil and backtracking for seven days [2]. It
is worth noting that the MGA was able to find these records
relatively quickly without seeding. A list of transition sequences of
these three record-breaking snakes is provided in the appendix.

Extension L or H (extending with the available transition that has


the lowest or highest index respectively) can find much longer 3-Sn snakes than Extension F or R can. The record-breaking 3-S10
was found by either using Extension L or H. The record-breaking
3-S11, and 3-S13 were found by using Extension L only. It is
noticed that a special repeating pattern (a _ b _ a _ b) occurs in the
transition sequences of the 3-S10, and 3-S13 produced by Extension
L and H, but does not occur by Extension F and R (where a ≠ b
and the underscore _ is different from a and b).

The MGA also found all the previously known optimal k-Sn
snakes and the best previously known maximal 3-S9 of length 63.
Any optimal snake of length shorter than 26 was found within a
few seconds and longer ones within a few minutes. DFS took
only a few milliseconds to find the former and days to find the
latter [2, 8]. The MGA found the optimal 2-S7 of length 50
within 5 minutes, much shorter than the days needed by the GA in
[13]. These results indicate that the DFS is best suited in search
for the optimal(s) shorter than 27 and the MGA for the optimal
and longest snakes longer than 26.

The effect of the extension types on spread-4 was similar to that on spread-2: Extension R finds the longest 4-Sn snakes in low-dimensional Qn and Extension F in high-dimensional Qn. In all cases, Extension R tends to find longer 5-Sn snakes than the other extensions can.
In summary, Extension F works best for spread-2 and spread-4, Extensions H and L for spread-3, and Extension R for spread-5 in the search for the longest maximal k-spread snakes in high Qn.
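The four extension rules differ only in how one transition is picked from those still available. The sketch below illustrates that choice under stated assumptions: the function name, the `free_neighbors` mapping and the tie-break for Extension F (lowest index) are hypothetical and are not taken from the MGA implementation.

```python
import random


def choose_transition(available, free_neighbors, rule, rng=random):
    """Pick the next transition according to extension rule R, F, L or H.

    available      -- iterable of transition indices that can extend the snake
    free_neighbors -- dict: transition -> free neighbors of the node it leads to
    rule           -- "R" (random), "F" (fewest free neighbors),
                      "L" (lowest index) or "H" (highest index)
    """
    candidates = list(available)
    if not candidates:
        return None  # the snake cannot be extended any further
    if rule == "R":
        return rng.choice(candidates)
    if rule == "L":
        return min(candidates)
    if rule == "H":
        return max(candidates)
    if rule == "F":
        # Assumed tie-break: prefer the lowest index among equal counts.
        return min(candidates, key=lambda t: (free_neighbors[t], t))
    raise ValueError(f"unknown extension rule: {rule!r}")
```

With illustrative inputs `available = [0, 4, 5]` and `free_neighbors = {0: 2, 4: 3, 5: 3}`, rules L and F both select transition 0 and rule H selects transition 5.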

The MGA found values nearest to the best previously known maximal 2-S8, 4-S11, 4-S12, 5-S10, 5-S11 and 5-S12 within minutes to hours. In an attempt to reach the best previously known values, the search for these snakes was repeated with much larger population sizes and larger numbers of generations. The results showed that the snake lengths did not improve and in some cases were actually shorter. Thus, the assumption that a larger population size and a larger number of generations give a better chance of finding the longest maximal snake is not always true, especially for excessively large population sizes. This might be due to convergence in the selection operator. When the population size is too large, duplicate members are more likely to occur. Duplicates of better-fit members will quickly flood the next population, which causes premature convergence, and a snake close to the longest will consequently be found instead of the longest. This phenomenon happens more often in the

In most cases, Extension H tends to find shorter k-Sn snakes than the others, except in searching for 3-Sn. To understand why, let's examine the snake with the transition sequence 0123 in Figure 2. If Extension H is applied, transition 5 will be selected and the next extending node will be e5, which has 3 free n5 neighbors. One of these n5 nodes (marked n5 and n4) will subsequently become a snake node in the next extension and the other two nodes will become unavailable. Similarly, if the snake is extended with Extension L or F, transition 0 will be selected and the next extending node will be e0, which has 2 free n0 neighbors. One of these n0 nodes (marked n0 and n4) will subsequently become a snake node and the other will become an unavailable node. At this state, the two 3-Sn snakes have the same length but different numbers of available nodes. The snake extended with H has one fewer available node than the others. If the extension repeats many times, as often happens in high-dimensional Qn, H will end up with far fewer available nodes. Thus, the snake repeatedly extended by H will more likely be shorter, since the snake's ability to extend depends on the number of available nodes.

7.2 The Effect of the Number of Replications

In this experiment, the optimal 2-S6 of length 26 and the optimal 2-S7 of length 50 can only be found when the number of replications is set to 0 and 1, respectively. The maximal 2-S8 of length 97 and 2-S9 of length 185 can only be found when the number of replications is set to 2. This correlation between the number of replications and the length of the snake being searched for is also observed in other spreads. In general, in order to find the optimal or the longest maximal k-Sn snakes, the number of replications should be set based on the length of the snake being searched for: zero for lengths less than 26, one for lengths between 27 and 65, and greater than one for lengths greater than 65.

7.3 Search Time

In general, the search time for k-Sn snakes is longer than the search time for k-$S_{n-1}$ snakes because k-Sn snakes are longer and the number of vertices in Qn is twice that of $Q_{n-1}$. However, the results show that the search time for k-$S_{n-1}$ snakes with a large population size is generally longer than the search time for k-Sn snakes with a lower population size. The results also show that for the same settings, the search time for k-Sn snakes is more than double the search time for k-$S_{n-1}$ snakes, even though the number of vertices in Qn is only twice as large. For example, with the population size set at 5000 and the number of generations at 180, the search time [O(t)] for finding 2-S13 is about 15 hours, while the search time for finding 2-S12 is 4 hours. It is almost four times more.

Let's look at the search space and the search time of the MGA to understand why. The search space is clearly proportional to the population size p, the number of generations g, and the repeating time r. Thus

$$O(n) = p g r$$

Each time a new generation is built, the selection, crossover and mutation operators in the GA are carried out; therefore the search time in the GA can be formulated as follows:

$$O_{GA}(t) = p r g (s + c + m)$$

where
p is the population size,
g is the number of generations,
r is the repeating time,
s is the search time during the selection operator,
c is the search time during the crossover operator, and
m is the search time during the mutation operator.

If the selection is Tournament-6, the operator will locate the best fit chromosome out of six randomly selected chromosomes. Thus the search time for the selection is

$$s = 6p$$

Accordingly, s = 4p for Tournament-4 selection.

During the crossover operator, each gene in the two parent chromosomes will be copied to the next generation regardless of the type of crossover operator. Thus the search time for c is proportional to the length of the chromosome l and the population size p.

$$c = l p$$

During mutation, each gene is checked to determine whether it is to be mutated or not; therefore the search time for m is also proportional to the length of the chromosome l and to the population size p.

$$m = l p$$

Replacing s, c and m in $O_{GA}(t)$:

$$O_{GA}(t) = p r g (6p + lp + lp)$$
$$O_{GA}(t) = p r g (6p + 2lp)$$
$$O_{GA}(t) = 2 p^2 r g (3 + l)$$

When l is large, $3 + l \approx l$, thus

$$O_{GA}(t) = 2 r g p^2 l$$

The MGA repeats the GA operations R times (R represents the number of replications). Since the number of generations is usually set to 50 for any dimension less than n, the search time for the MGA is

$$O(t) = (50R + g)\, 2 r p^2 l \qquad (3)$$

For any Qn, the maximum length of k-Sn snakes must be less than the depth of the search tree, $2^n / k$. Thus

$$O(t) < (50R + g)\, 2 r p^2 (2^n / k)$$
$$O(t) < (50R + g)\, r p^2 (2^{n+1} / k) \qquad (4)$$

Functions (3) and (4) indicate that the MGA search time is linearly proportional to the number of generations, the repeating time and the length of the chromosomes, is proportional to the square of the population size, and grows exponentially with n. Thus, when n is raised by one increment, the number of vertices doubles, while the search time grows exponentially with n rather than simply doubling.

The results show that the real search time for 4-S10 is 35 minutes, much longer than the search time for 2-S7 of 5 minutes, even though the length of the chromosome representing 4-S10 (52) is slightly shorter than that of the chromosome representing 2-S7 (55) and the population size in the search for 4-S10 (3000) is smaller than that in the search for 2-S7 (5000). The search time functions above do not include the run time needed to mark unavailable nodes each time a snake node is assigned or extended in the chromosome. In general, the higher the spread, the more unavailable nodes need to be marked. The number of unavailable nodes that need to be marked in the MGA is:

(n - 1) for spread-2,
(n - 1)(n - 2) / 2 for spread-3,
(n - 1)(n - 2)(n - 3) / 4 for spread-4, and
(n - 1)(n - 2)(n - 3)(n - 4) / 8 for spread-5.

Thus, for approximately the same length of chromosomes,

$$O_k(t) = \frac{n - k + 1}{2}\, O_{k-1}(t)$$

where $O_k(t)$ and $O_{k-1}(t)$ are the search times for k-spread snakes and (k-1)-spread snakes, respectively. Thus, the search time for higher-spread snakes is much longer than that for lower-spread snakes, even though higher-spread snakes are slightly shorter than lower-spread snakes.

The facts discussed in this section have been used to modify the MGA procedure and/or alter the experiment settings to improve the speed as well as the results in the search for k-Sn snakes, which is the subject of another paper. For example, by applying Function (3), the search time for the 2-S8 snake of length 97 can be improved from 5 minutes to 1 minute by changing the population size from 5000 to 500 and the number of generations from 150 to 400. Preliminary results are promising but more tests need to be done.
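To see how Function (3) and the node-marking counts scale with the settings, the quantities in this section can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, the result of `mga_search_time` is a relative operation count rather than a time in minutes, and `marked_nodes` simply encodes the four counts listed for spread-2 to spread-5.

```python
def mga_search_time(p, g, r, R, l):
    """Relative MGA cost from Function (3): (50R + g) * 2 * r * p^2 * l.

    p -- population size, g -- number of generations, r -- repeating time,
    R -- number of replications, l -- chromosome length
    """
    return (50 * R + g) * 2 * r * p ** 2 * l


def marked_nodes(n, k):
    """Unavailable nodes to mark in Qn for spread k (listed cases k = 2..5)."""
    if not 2 <= k <= 5:
        raise ValueError("counts are only listed for spread 2 to 5")
    product = 1
    for i in range(1, k):
        product *= n - i
    return product / 2 ** (k - 2)


# Compare the two 2-S8 settings mentioned above. R = 2 follows Section 7.2;
# r = 1 and l = 100 are placeholders that cancel out in the ratio.
old = mga_search_time(p=5000, g=150, r=1, R=2, l=100)
new = mga_search_time(p=500, g=400, r=1, R=2, l=100)
print(old / new)
print([marked_nodes(10, k) for k in range(2, 6)])
```

For these settings the estimate drops by a factor of 50, whereas the reported run time drops from 5 minutes to 1 minute, so the function is better read as a guide to how the settings scale than as a timing prediction. For n = 10 the marking counts are 9, 36, 126 and 378.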

7.4 Advantages of Using MGA

MGA is a very versatile genetic algorithm. It has many advantages over other GAs. It functions the same as a GA when the number of replications is set to 0, while a GA cannot function as an MGA. In solving the same optimization problem, such as SIB, MGA can utilize many fitness functions in different parts of the problem to get better results in a relatively quick run time. To find the optimal solution to the problem, a GA can use only one fitness function for the whole search. Generally, the use of a GA requires longer processing times and may achieve the same results as the MGA.

8. CONCLUSIONS
The MGA found three new record-breaking 3-spread snakes in Q10, Q11 and Q13, all the previously known optimal snakes from spread 2 to spread 5, and the known longest maximal 3-S9 snake of length 63. It is remarkable that it found those within minutes to hours without using any longest previously known snake as a seed. This shows that MGA is a very effective technique for tackling the SIB problem. Modifications to the MGA procedure or settings have been researched to improve its search for 2-spread snakes in dimensions higher than 8. Preliminary results are promising but more tests need to be done before those results can be reported.

9. APPENDIX

The transition sequence for each new record discovered by this study is listed below. The symbols for the transition sequences are 0 to 9 for bit positions 0 to 9, and A to C for bit positions 10 to 12.

A 3-spread snake of length 125 in Q10 is presented below:

81748675837486728175847683758470827684738576847182738
67485738679817486758374867281758476837584708276847385
7684718273867485738

A 3-spread snake of length 158 in Q11 is presented below:

01234015206130281072401320516024103290152701320416250
13208140271032501426013291A0241032801742013251420612
30917201530241056201328041720132504236029150271032410

A 3-spread snake of length 509 in Q13 is presented below:

0712051304120516071203150412031906150214031502170A15
04120315041807120513041205160712031504120319061502140
3150217061504120315041B071205130412051607120315041203
1806150214031502170A15041203150419071205130412051607
120315041203180615021403150217061504120315041C0712051
304120516071203150412031906150214031502170A150412031
50418071205130412051607120315041203190615021403150217
061504120315041B0712051304120516071203150412031806150
214031502170A150412031504190712051304120516071203150
412031806150214031502170615041203150

10. REFERENCES
[1] Abbott, H. L. and Katchalski, M., On the construction of snake in the box codes, Utilitas Mathematica, 40, (1991), 97-116.
[2] Carlson, B. P. and Hougen, D. F., Phenotype feedback genetic algorithm operators for heuristic encoding of snakes within hypercubes, in Proceedings of GECCO '10, Portland, Oregon, ACM, (2010), 791-798.
[3] Casella, D. and Potter, W., New lower bounds for the snake-in-the-box problem: Using evolutionary techniques to hunt for snakes and coils, in Proceedings of the Florida Artificial Intelligence Research Society Conference, (2005).
[4] Douglas, R. D., Some Results on the Maximum Length of Circuits of Spread K in the d-Cube, Journal of Combinatorial Theory 6, (1969), 323-339.
[5] Hiltgen, A. P. and Paterson, K. G., Single-track circuit codes, IEEE Transactions on Information Theory, 47, (2000), 2587-2595.
[6] Hood, S., Recoskie, D., Sawada, J., and Wong, C., Snakes, coils, and single-track circuit codes with spread k, Journal of Combinatorial Optimization (May 2013).
[7] Kautz, W. H., Unit-distance error-checking codes, IRE Trans. Electronic Computers, (1958), 179-180.
[8] Kinny, D., A New Approach to the Snake-In-The-Box Problem, European Conference on Artificial Intelligence (ECAI), (2012).
[9] Klee, V., The use of circuit codes in analog-to-digital conversion, in Graph Theory and its Applications, B. Harris, Ed., New York: Academic, (1970).
[10] Kochut, J., Snake-in-the-box codes for dimension 7, Combinatorial Mathematics and Combinatorial Computations, 20, (1996), 175-185.
[11] Livingston, M. and Stout, Q., Distributed resources in hypercube computers, in Proceedings 3rd Conference on Hypercube Concurrent Computers and Applications, ACM, (1988), 222-231.
[12] Paterson, K. G. and Tuliani, J., Some new circuit codes, IEEE Transactions on Information Theory, 44(3), (May 1998), 1305-1309.
[13] Potter, W., Robinson, R., Miller, J., Kochut, K., and Redys, D., Using the genetic algorithm to find snake-in-the-box codes, in Proceedings of the 7th International Conference on Industrial & Engineering Applications of Artificial Intelligence and Expert Systems, (1994), 307-314.
[14] Rajan, D. and Shende, A., Maximal and reversible snakes in hypercubes, 24th Annual Australasian Conference on Combinatorial Mathematics and Combinatorial Computation, (1999).
[15] Wynn, E., Constructing circuit codes by permuting initial sequences, arXiv:1201.1647v, (Jan. 2012).
[16] Yehezkeally, Y. and Schwartz, M., Snake-in-the-box codes for rank modulation, (2011). Available at http://arxiv.org/abs/1107.3372
[17] Zinovik, I., Chebiryak, Y. and Kroening, D., Periodic orbits and equilibria in glass models for gene regulatory networks, IEEE Transactions on Information Theory, 56(2), (Feb 2010), 805-820.