
Codes and ciphers

Although the words are often used loosely we shall distinguish between codes and
ciphers. In a code common phrases, which may consist of one or more letters,
numbers, or words, are replaced by, typically, four or five letters or numbers,
called code groups, taken from a code-book. For particularly common phrases or
letters there may be more than one code group provided with the intention that
the user will vary his choice, to make identification of the common phrases more
difficult. For example, in a four-figure code the word Monday might be given three
alternative code groups such as 1538 or 2951 or 7392. We shall deal with codes in
Chapter 6. Codes are a particular type of cipher system but not all cipher systems
are codes so we shall use the word cipher to refer to methods of encipherment
which do not use code-books but produce the enciphered message from the
original plaintext according to some rule (the word algorithm is nowadays preferred
to rule, particularly when computer programs are involved). The distinction
between codes and ciphers can sometimes become a little blurred, particularly for
simple systems. The Julius Caesar cipher could be regarded as using a one-page
code-book where opposite each letter of the alphabet is printed the letter three
positions further on in the alphabet. However, for most of the systems we shall be
dealing with the distinction will be clear enough. In particular the Enigma, which is
often erroneously referred to as the Enigma code, is quite definitely a cipher
machine and not a code at all.
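The "one-page code-book" view of the Julius Caesar cipher described above can be
sketched in a few lines of Python. This is only an illustration: the function
names are invented here, and the shift of three is the one mentioned in the text.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A..Z


def caesar_table(shift=3):
    """Build the one-page 'code-book': each letter paired with the
    letter `shift` positions further on in the alphabet."""
    return {p: ALPHABET[(i + shift) % 26] for i, p in enumerate(ALPHABET)}


def encipher(text, shift=3):
    """Replace each letter via the table; other characters pass through."""
    table = caesar_table(shift)
    return "".join(table.get(ch, ch) for ch in text.upper())


def decipher(text, shift=3):
    """Deciphering is enciphering with the opposite shift."""
    return encipher(text, -shift)


print(encipher("ENIGMA"))   # HQLJPD
print(decipher("HQLJPD"))   # ENIGMA
```

Because the whole system is determined by a single rule (the shift), it behaves
like a cipher, even though the table itself looks like a tiny code-book.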

Characteristics of codes

As was mentioned in Chapter 1, the distinction between
codes and ciphers is not always clear, but one might reasonably say that whereas
most codes tend to be static most ciphers are dynamic. That is to say that a letter
or phrase enciphered simply by means of a code will produce the same cipher each
time the code is used, whereas a letter or phrase enciphered by a cipher system
will generally produce different cipher text at different times. This is because most
cipher systems involve one or more parameters, such as keywords or, as we shall
see later, wheel settings, which are changed at regular or irregular intervals and so
cause the cipher outputs from the same plaintext to be different. The basic
mechanism, or algorithm, for generating the cipher does not change, but the
parameters do. In general, a code has no such parameters though the entire code
may itself be changed, in which case it becomes a different code. In practice this is
achieved by issuing a new code-book every now and then. Using this criterion the
Julius Caesar cipher would be classed as a code, because the encipherment of a
fixed letter across many messages is invariably the same. We can, however, say
that there is a parameter associated with Julius Caesar ciphers, namely the shift,
which gives us 25 different ciphers and if the value of the shift is somehow
incorporated in the message, i.e. in the indicator, the Julius Caesar system can
reasonably be considered as a cipher, not as a code.

Example 6.1

Although there
are much earlier examples of codes the one devised by Samuel Morse (1791-1872)
in 1832 for the purpose of transmitting messages by telegraphy is probably the
best-known. In this code the letters of the alphabet are represented by up to four
dots and dashes, the digits 0 to 9 by five and certain punctuation symbols by
six. To transmit a dot the telegraph key is depressed for about one 24th of a
second; for a dash the key is depressed for about one 8th of a second; the interval
between the components of a letter is the same as that for a dot and the interval
between letters is equal to that for a dash. The Morse code was designed so that
the most frequent letters in English had shorter transmission times than the less
frequent letters. Thus E was represented by a single dot and T by a single dash
whereas J required four symbols, dot dash dash dash. The reason for this was to try
to minimise the time required to transmit a message. The international wireless
version of the letters of the Morse code is shown in Table 6.1.
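
The timing rules quoted above can be turned into a small calculation. The sketch
below is illustrative only: it uses just the three letters whose patterns are
given in the text (E, T and J), takes the stated durations of one 24th and one
8th of a second at face value, and the names DOT, DASH and letter_time are
invented for the example.

```python
from fractions import Fraction

DOT = Fraction(1, 24)    # key-down time for a dot, in seconds (from the text)
DASH = Fraction(1, 8)    # key-down time for a dash, in seconds (from the text)

# Only the patterns quoted in the text; the full alphabet is in Table 6.1.
MORSE = {"E": ".", "T": "-", "J": ".---"}


def letter_time(letter):
    """Time to send one letter: key-down time for every dot and dash,
    plus a dot-length pause between the components of the letter."""
    symbols = MORSE[letter]
    keyed = sum(DOT if s == "." else DASH for s in symbols)
    pauses = DOT * (len(symbols) - 1)
    return keyed + pauses


for letter in "ETJ":
    print(letter, letter_time(letter))
```

On these assumptions J takes 13 times as long to send as E, which is the kind
of saving the design aimed at for frequent letters.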
The Morse code was not, of course, designed to protect the secrecy of a message
but merely to provide a means for transmitting it efficiently. A good wireless
operator using this code would be able to transmit about 30 average words per
minute. As was mentioned in Chapter 1, there are other codes which are designed
to ensure the accuracy of messages or data rather than to preserve the secrecy of
their contents. Among such codes are those used to transmit data from spacecraft
or to store data in computer-readable form. If secrecy is not needed the details of
the code will usually be available to anyone who wants them. If secrecy as well as
accuracy is required the details may not be made public and some form of
encryption of the data will also be applied.

One-part and two-part codes

Most
codes involve the use of a code-book, which may contain thousands of code
groups. A code used by the military would typically represent letters, numbers or
phrases by code groups consisting of four or five letters or digits. It is not
necessary that all the code groups contain the same number of symbols; the
famous Zimmermann telegram of January 1917, which was deciphered by British
cryptanalysts and which was a major factor in America's decision to
enter the War, had a mixture of code groups of both four and five digits [6.1]. The
main advantage of a code is that it can provide many code groups; up to 10
thousand for a four-digit code and nearly 12 million for a five-letter code. The
disadvantages are that (i) it is necessary for the users to carry the code book(s)
with them and (ii) if the enemy acquire a copy of the book, either by capture or by
breaking the code, reading of future messages is straightforward. For these
reasons codes are more likely to be employed by embassies or large military units,
such as ships, than by individuals. Breaking of a code is made very much easier if
the code is a one-part code which means that the same code-book is used for both
encipherment and decipherment. If this is so the code groups for words or phrases
which are close in a dictionary will be close to each other numerically. Thus a
section of a four-digit one-part codebook might look like Table 6.2.
Table 6.2
A 0001
ABLE 0013
AFTER 0023
AM 0051
AN 0075
AND 0078
ANY 0081
AS 0083
ASK 0091
AT 0097
Not all of the 10 thousand possible code groups would normally be used. Gaps
would probably be left which would allow other words or phrases to be inserted
later if desired.

From a cryptographic point of view a one-part code offers the cryptanalyst too
great an advantage by enabling him to guess at the meanings of as yet
unidentified code groups simply by looking for plausible words in a dictionary
which are close to identified words. This weakness can be removed by making the
numerical ordering of the code groups unrelated to the alphabetical or numerical
order of the codewords. We then have a two-part code and the users need two
code-books, one for encipherment and one for decipherment. The codewords
above might then appear like Table 6.3 in the encipherment book, whilst the
decipherment book might begin as in Table 6.4 and so on.

Table 6.3
A 5832
ABLE 2418
AFTER 6941
AM 9075
AN 6948
AND 4729
ANY 8532
AS 4271
ASK 2163
AT 1894

Table 6.4
0005 TOMORROW
0009 ATTACK
0014 COME
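
The difference between the two kinds of code-book can be sketched in Python.
The dictionaries below simply copy the word and code-group pairs printed for
Tables 6.2 and 6.3; the function names are invented for this illustration and
do not come from the text.

```python
# One-part book (Table 6.2): code groups rise with alphabetical order.
ONE_PART = {"A": "0001", "ABLE": "0013", "AFTER": "0023", "AM": "0051",
            "AN": "0075", "AND": "0078", "ANY": "0081", "AS": "0083",
            "ASK": "0091", "AT": "0097"}

# Encipherment half of a two-part book (Table 6.3): groups are scrambled.
TWO_PART = {"A": "5832", "ABLE": "2418", "AFTER": "6941", "AM": "9075",
            "AN": "6948", "AND": "4729", "ANY": "8532", "AS": "4271",
            "ASK": "2163", "AT": "1894"}


def is_one_part(book):
    """True if listing the words alphabetically also lists the code
    groups in increasing order (the weakness described in the text)."""
    groups = [book[word] for word in sorted(book)]
    return groups == sorted(groups)


def decipher_book(book):
    """Build the second book of a two-part code: the same pairs,
    re-ordered by code group so the recipient can look groups up."""
    return dict(sorted((group, word) for word, group in book.items()))


print(is_one_part(ONE_PART))   # True
print(is_one_part(TWO_PART))   # False
print(decipher_book(TWO_PART))
```

With a one-part book a single listing serves both purposes; with a two-part
book the decipherment listing has to be prepared separately.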

In all cases, it is likely that very common codewords would be allocated more than
one code group and the users instructed to use each of the alternatives in a
random manner. Although codes which have not been subjected to further
encipherment do not offer a high level of security they have been used in wartime;
the Italian Navy used a one-part code, known as Mengarini [6.2], for some very low
grade messages, and the Japanese Navy used a two-part code, known to them as
OTSU and to the British as JN4 [6.3], during the Second World War. A somewhat
different code using two letters followed by four digits was used by U-boats of the
German Navy to report their positions in the Atlantic and to receive instructions for
attacking Allied convoys. The letters were subjected to digraph substitution tables
and the digits could also be modified [6.4].

Code plus additive

No matter how many code groups a code contains, a
cryptanalyst, given enough messages, will eventually find certain groups occurring
more than once, even when the same plaintext word or phrase has several
alternative code groups allocated to it. Also, if the code-book is captured by the
enemy, decryption of all messages becomes trivial. To overcome these weaknesses
the code groups themselves are usually enciphered. A standard way of doing this is
to apply an additive key to the code groups using non-carrying, or modular,
addition. Although this has been mentioned before, to remind ourselves
how it is done, let us look at an example. Suppose that we have the code group
6394 and that the key to be applied to it is 2798; then the code group is written
down, the key placed directly underneath it and corresponding digits are added
without carrying so that when we add the last digits of the code and key, 4 and 8,
the sum is written as 2, not 12 (that is: we are adding digit by digit (mod 10)). So
we have
Code group 6394
Key 2798
Sum 8082
and the cipher text is 8082. The key would not be the same for the other groups
since in practice the key either would not repeat at all or, if it did repeat, would
only do so after many digits. Since encryption involves adding the key to the code
groups the person receiving the message would have to subtract the key digit by
digit (mod 10) from the cipher in order to recover the code groups and so decipher
the message; thus:
Cipher 8082
Key 2798
Code group 6394
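
The non-carrying addition and subtraction just illustrated can be written as a
short Python sketch. The function names are invented for this example; the code
group 6394 and key 2798 are the ones used in the text.

```python
def add_no_carry(group, key):
    """Add key to code group digit by digit (mod 10), with no carrying."""
    return "".join(str((int(g) + int(k)) % 10) for g, k in zip(group, key))


def subtract_no_carry(cipher, key):
    """Subtract key from cipher digit by digit (mod 10), with no borrowing."""
    return "".join(str((int(c) - int(k)) % 10) for c, k in zip(cipher, key))


cipher = add_no_carry("6394", "2798")
print(cipher)                              # 8082
print(subtract_no_carry(cipher, "2798"))   # 6394
```

Working on strings of digits rather than on integers keeps leading zeros, which
matter because a code group such as 0051 is not the same as 51.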

Obviously the code groups are now disguised, and the security of the system is
substantially increased provided that the key does not repeat for a sufficiently long
period. The question of how to produce sequences of digits which do not repeat
until many thousands have been generated is one of considerable interest to
mathematicians and cryptographers, and we consider it more fully in Chapter 8. In
the meantime, by way of illustration, here is a very simple method, which
generates a sequence which repeats after 60 digits.

Example 6.2

Generate a
sequence of digits (mod 10) by starting with the digits 3 and 7 and forming each
new digit by adding together the two previous digits (mod 10).

Solution

The sequence starts 3 7
so the next digit is (3+7) = 10, which is 0 (mod 10), the 4th digit is (7+0) = 7 and
the 5th digit is (0+7) = 7. Continuing in this way we find that the sequence which is
generated is

3 7 0 7 7 4 1 5 6 1 7 8 5 3 8 1 9 0 9 9 8 7 5 2 7 9 6 5 1 6 7 3 0 3 3 6 9 5 4 9 3 2 5 7
2 9 1 0 1 1 2 3 5 8 3 1 4 5 9 4 3 7 0 ...

and the sequence begins to repeat after 60 digits: the last 3 digits shown
are the same as the first 3. Since each digit is the sum
(mod 10) of the previous 2 the key will begin to repeat when 2 digits occur which
have already occurred in the same order earlier in the sequence. It follows that any
sequence generated (mod 10) in this way cannot have a period longer than 100,
since there are only 100 pairs of digits (mod 10). The sequence of the example,
with a period of 60, is the longest available sequence in this case. Had we begun
with the first 2 terms both equal to 0 we would have produced an all-zero
sequence which, if used as a key, would leave the code groups unaltered. Although
it is the longest available by this simple method the key sequence of the example
unfortunately has certain numerical properties which make it undesirable from a
cryptographic point of view. One that is particularly bad is that two-thirds of the
digits are odd and only one-third are even, instead of there being approximately
equal numbers of each. This is because the sequence has a very simple odd-even
pattern, as we can see immediately, viz: odd odd even odd odd even.... Another
property is that double digits (77, 99, 33 and 11) occur regularly, 15 places apart.
This particular sequence is very well-known and is a special case of what is
probably the most intensively studied sequence in mathematics. The usual form of
it starts with 0 and 1 as the first 2 values and continues as in the example but
without reduction (mod 10), i.e. the terms are added normally. The sequence then
begins
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597...
The terms grow at a tremendous rate (the mathematical expression is that they
grow exponentially); each term after the 5th is more than 1.5 times bigger than
its predecessor and so, for example, the 100th term in the sequence contains 21
digits. If we reduce all these numbers (mod 10), which is the same as replacing
each of them by its last digit, we get
0 1 1 2 3 5 8 3 1 4 5 9 4 3 7 0 7 7...
which is the same as the sequence in the example but starting at position 48.
The sequence beginning 0 1 1... is known as the Fibonacci sequence; it was
introduced into mathematics by Leonardo of Pisa, also known as Fibonacci, in the
thirteenth century. For further information on this famous sequence see M5.
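
The properties claimed for the sequence of Example 6.2 are easy to check by
machine. The following Python sketch is one possible way of doing so; the
function names are invented here and the starting digits 3 and 7 are those of
the example.

```python
def additive_sequence(first, second, modulus=10, count=60):
    """Each new term is the sum of the two previous terms (mod modulus)."""
    seq = [first % modulus, second % modulus]
    while len(seq) < count:
        seq.append((seq[-1] + seq[-2]) % modulus)
    return seq[:count]


def period(first, second, modulus=10):
    """Number of steps before the starting pair of terms comes round again."""
    start = (first % modulus, second % modulus)
    a, b = start
    steps = 0
    while True:
        a, b = b, (a + b) % modulus
        steps += 1
        if (a, b) == start:
            return steps


key = additive_sequence(3, 7)
print(key[:10])                          # [3, 7, 0, 7, 7, 4, 1, 5, 6, 1]
print(period(3, 7))                      # 60
print(sum(d % 2 for d in key), "odd digits out of", len(key))
print(key[47:50])                        # [0, 1, 1], i.e. positions 48 to 50
```

The count of odd digits confirms the two-thirds bias, and the last line shows
where the ordinary Fibonacci sequence (mod 10) joins the cycle.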
Problem 6.1

(1) Generate the key (mod 10) as above but starting with 0 and 2 as
the first 2 terms. What is the period and why would this key be rejected by a
cryptographer?
(2) What is the period when we start with 1 and 3 as the first 2
terms?
The Fibonacci sequence is the simplest of a class of sequences that can be used for
the generation of keys for use in cryptography, although whether any particular
sequence provides good keys, in that all possible key values are equally likely to
occur, is a question which can only be answered by using some advanced
mathematics [1.2, 1.3]. A fairly obvious generalisation of the Fibonacci sequence is
obtained by forming each new term by adding together (mod 10) the 3 preceding
terms which may produce sequences of longer cycle length thus:

Example 6.3

Starting with 0, 1, 1 as the first 3 terms generate the sequence obtained (mod 10)
by adding together the previous 3 terms at each stage.

Solution

The sequence begins 0 1 1 2 4 7 3 4 4 1... and repeats after 124 terms.
The frequencies of the individual digits are slightly non-uniform; each should occur
12 or 13 times but 3 occurs only 6 times whereas 4 and 9 both occur 18 times.
Verification of these facts is left to the reader. If we choose three different terms for
starting the sequence we may get shorter cycle lengths: 0, 1, 2 produces a cycle of
length 62 whilst 0, 5, 0 would be a very poor choice since it generates a sequence
of cycle length just 2. This method can be used to generate keys to any modulus.
For example, the sequence which starts 0, 1, 1 to moduli 2, 3, 5 and 7 produces
cycles of length 4, 13, 31 and 48 respectively. From a purely mathematical point of
view this is interesting but, in general, cryptologists would probably use 2, 10 or
100 as the modulus. If we generate the same sequence (mod 100) it
begins 00 01 01 02 04 07 13 24 44 81 49 74 ... and is found to repeat after 1240
terms. It is possible to modify the Fibonacci sequence so that the odd-even ratio is
somewhat reduced, making it slightly better for encryption. A modest step in this
direction is shown by the following example.

Example 6.4

Generate 20 terms of the Fibonacci sequence
(mod 100) starting with 13 and 21 as the first 2 terms then interchange the second
and third digits in each group of four to give 20 terms of a two-digit key stream.

Solution

The first 20 terms of the Fibonacci sequence (mod 100) starting with 13
and 21 are

13 21 34 55 89 44 33 77 10 87 97 84 81 65 46 11 57 68 25 93

We interchange the second and third digits in each group of four digits (that
is, in each pair of consecutive terms)

12 31 35 45 84 94 37 37 18 07 98 74 86 15 41 61 56 78 29 53

and this is the resultant key. The bias
of odd:even numbers has now been reduced (from 2 : 1 to about 7 : 5) and the key,
though still unsatisfactory, is stronger for that.

Problem 6.2

A two-digit code
represents the letters of the alphabet as follows: A=17, B=20, C=23, ..., Z=92, each
number being 3 more than the one before it. A message is then enciphered using
this code and the additive key (12 31 35...) obtained in the example above, the
addition being digit by digit with no carrying. The resultant cipher text is 86 69 42
19 60 35 08 13 76 48 23 02 50 91. Decrypt the message.
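
The generators of Examples 6.3 and 6.4 can be sketched in Python as follows.
This is only an illustration under stated assumptions: the function names are
invented, the "group of four" in Example 6.4 is taken to mean two consecutive
two-digit terms written side by side, and the starting values are those given
in the examples.

```python
def lagged_sum_sequence(seed, modulus, count):
    """Each new term is the sum (mod modulus) of the previous len(seed) terms."""
    k = len(seed)
    seq = [s % modulus for s in seed]
    while len(seq) < count:
        seq.append(sum(seq[-k:]) % modulus)
    return seq[:count]


def cycle_length(seed, modulus):
    """Number of steps before the starting block of terms recurs."""
    start = tuple(s % modulus for s in seed)
    state = start
    steps = 0
    while True:
        state = state[1:] + (sum(state) % modulus,)
        steps += 1
        if state == start:
            return steps


def swap_middle_digits(terms):
    """Pair up two-digit terms into four-digit groups and interchange the
    second and third digits of each group, as in Example 6.4."""
    key = []
    for a, b in zip(terms[0::2], terms[1::2]):
        group = f"{a:02d}{b:02d}"
        swapped = group[0] + group[2] + group[1] + group[3]
        key += [swapped[:2], swapped[2:]]
    return key


print(lagged_sum_sequence((0, 1, 1), 10, 10))   # Example 6.3 opening terms
print(cycle_length((0, 1, 1), 10))              # 124
print(cycle_length((0, 5, 0), 10))              # 2
fib100 = lagged_sum_sequence((13, 21), 100, 20)
print(" ".join(swap_middle_digits(fib100)))     # key of Example 6.4
```

Calling cycle_length with other seeds or moduli gives a quick way of exploring
the cycle lengths quoted above.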
Example 7.3

Convert the alphabet to numbers beginning with A=0, B=1 etc. Then
add together the two texts below (mod 26) and re-convert the resulting numbers
to letters.

Text 1 THEXCURFEWXTOLLSX
Text 2 ONCEXUPONXAXTIMEX

Solution

We repeat Table 1.1 as Table 7.1.

Table 7.1
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

We first convert the texts to numbers by using the table:

Text 1     T  H  E  X  C  U  R  F  E  W  X  T  O  L  L  S  X
Numbers   19  7  4 23  2 20 17  5  4 22 23 19 14 11 11 18 23

Text 2     O  N  C  E  X  U  P  O  N  X  A  X  T  I  M  E  X
Numbers   14 13  2  4 23 20 15 14 13 23  0 23 19  8 12  4 23

Now we add them and then reduce them (mod 26):

Text 1    19  7  4 23  2 20 17  5  4 22 23 19 14 11 11 18 23
Text 2    14 13  2  4 23 20 15 14 13 23  0 23 19  8 12  4 23
Sum       33 20  6 27 25 40 32 19 17 45 23 42 33 19 23 22 46
(mod 26)   7 20  6  1 25 14  6 19 17 19 23 16  7 19 23 22 20

Finally we convert the numbers back into letters, using the table:

Cipher     H  U  G  B  Z  O  G  T  R  T  X  Q  H  T  X  W  U

and this is the cipher text that would be sent. The recipient would, of
course, need to subtract Text 2 from the cipher (mod 26) in order to recover
Text 1, viz:

Cipher     H  U  G  B  Z  O  G  T  R  T  X  Q  H  T  X  W  U
Text 2     O  N  C  E  X  U  P  O  N  X  A  X  T  I  M  E  X

Convert:

Cipher     7 20  6  1 25 14  6 19 17 19 23 16  7 19 23 22 20
Text 2    14 13  2  4 23 20 15 14 13 23  0 23 19  8 12  4 23

Subtract (mod 26), i.e. if the result is negative, add 26:

Result    19  7  4 23  2 20 17  5  4 22 23 19 14 11 11 18 23

Re-convert to letters:

Result     T  H  E  X  C  U  R  F  E  W  X  T  O  L  L  S  X

which is Text 1, the original message.

Obviously it would be a very tedious and error-prone process to have to convert
the texts to numbers, add them, subtract 26 where necessary, and re-convert to
letters every time a message was to be enciphered so it is very worthwhile having
two tables, one for enciphering and one for deciphering, from which the result of
applying these processes can be read off immediately. Experienced users would
not need such tables since they would soon learn to add the letters at sight but
for others the tables save a lot of time and effort. They are given in Tables 7.2 and
7.3. Notice that in the encipher table (Table 7.2) it makes no difference whether we
call the message text Text 1 and the key text Text 2 or vice versa since adding the two
texts gives the same result either way ('addition is commutative' is the
mathematical phrase to describe this), but in the decipher table (Table 7.3) the
cipher and the key must be correctly identified since to get the plaintext we must
subtract the key from the cipher and not vice versa. This is evident when we recall
that to encipher: cipher = key + text and so to decipher: text = cipher - key where the
additions and subtractions are carried out (mod 26) of course.

Table 7.2 Encipher table for a book cipher

(TABLE)
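
The conversion, addition and subtraction steps of Example 7.3 can also be
sketched in Python. The function names are invented for this illustration; the
numbering A=0, B=1, ..., Z=25 and the two texts are those of the example.

```python
A = ord("A")


def to_numbers(text):
    """Convert letters to numbers with A=0, B=1, ..., Z=25."""
    return [ord(ch) - A for ch in text]


def to_letters(numbers):
    """Convert numbers 0..25 back to letters."""
    return "".join(chr(n + A) for n in numbers)


def encipher(text, key):
    """cipher = key + text, letter by letter (mod 26)."""
    return to_letters((t + k) % 26
                      for t, k in zip(to_numbers(text), to_numbers(key)))


def decipher(cipher, key):
    """text = cipher - key, letter by letter (mod 26)."""
    return to_letters((c - k) % 26
                      for c, k in zip(to_numbers(cipher), to_numbers(key)))


text1 = "THEXCURFEWXTOLLSX"
text2 = "ONCEXUPONXAXTIMEX"
cipher = encipher(text1, text2)
print(cipher)                     # HUGBZOGTRTXQHTXWU
print(decipher(cipher, text2))    # THEXCURFEWXTOLLSX
```

Swapping the two arguments of encipher gives the same cipher, whereas swapping
those of decipher does not recover the message, which is the point made about
the encipher and decipher tables.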

Problem 7.2

A message has been enciphered using a book cipher. The book used
was The Poems of Rupert Brooke and the key for the message was the passage
beginning

STANDSXTHEXCHURCHXCLOCKXATXTENXTOXTHREE

The cipher text was

LAEKV MPILG QZOUJ ZTLXP RZDLX EFOIE MHCIQ

Decrypt the message.

Letter frequencies in book ciphers

The frequencies of the letters of the alphabet in
the cipher produced by a book cipher when the key is a passage of English text will
be different from those of unenciphered English. Whilst it will certainly be true that
some letters, such as those letters used for space, E and T, will no longer occur
very much more frequently than others such as Z or J, it is also true that the letters
will not be equally represented. It is possible to estimate the frequencies of the
cipher letters and we find that although the frequencies of the individual letters do
not vary so much as they do in samples of normal (unenciphered) English they are
still by no means uniform and observation of this type of variation would alert the
cryptanalyst to the possibility that a book cipher was being used. This variation can
be seen in Table 7.4. In the left-hand column are the frequencies of the 26 letters
of the alphabet, and a 27th letter which covers all punctuation marks and space,
as they occur in a typical sample of normal English and, in the right-hand column,
the frequencies of the same letters as they are predicted to occur in a passage
enciphered by a book cipher using English texts. The sample size in both cases is
1000 so that on average each letter should occur about 37 times. It will be seen
that this is very far from being the case in the unenciphered text, and even in the
enciphered text there is considerable variation in the frequencies, though not a lot
more than we would expect at random (for further comments see M6). A book
cipher might be regarded as an extreme case of a Vigenère cipher in which the key
length is the same as the length of the message itself. Evidently a book cipher
ought to be more secure than a Vigenère since the latter uses a key of fixed length.
An alternative to using a key which is an English text is to use a key in a different
language but which uses an alphabet of not more than 26 letters. Diacriticals, such
as accents or umlauts, would be ignored if French or German were used. This
would make life somewhat harder for the cryptanalyst at least until he realised
what was going on.
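
One way to estimate the cipher-letter frequencies mentioned above is sketched
below in Python. It rests on an assumption not stated in the text, namely that
the message letter and the key letter at each position are independent; the
chance of a given cipher letter is then the sum, over all pairs of letters that
add up to it, of the product of their probabilities. The function name and the
four-letter toy distribution are invented purely for illustration and are not
real letter frequencies.

```python
def cipher_distribution(text_probs, key_probs):
    """Probability of each cipher letter when cipher = text + key (mod n),
    assuming text and key letters are drawn independently."""
    n = len(text_probs)
    out = [0.0] * n
    for i, p in enumerate(text_probs):
        for j, q in enumerate(key_probs):
            out[(i + j) % n] += p * q
    return out


# Hypothetical, deliberately uneven distribution over a 4-letter alphabet.
toy = [0.5, 0.3, 0.1, 0.1]
result = cipher_distribution(toy, toy)
print([round(x, 2) for x in result])   # [0.32, 0.32, 0.2, 0.16]
```

The output is flatter than the input but still not uniform, which mirrors the
behaviour described for Table 7.4. Supplying measured frequencies for a 26- or
27-letter alphabet in place of the toy values would give predicted figures of
the kind discussed in the text.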
https://plus.maths.org/content/cracking-codes
https://readandlaugh.files.wordpress.com/2014/12/thecodebook.pdf
http://www.ik4hdq.net/codici_cifr.pdf
https://nrich.maths.org/2197
http://www.wikihow.com/Decipher-a-Secret-Code
