
Classical Cryptography

Xu Jia, XueMingQiang, Zhang Weiqi, Zhu Qian

School of Computing, National University of Singapore

{xujia, xuemingq, zhangwei, zhuqian}@comp.nus.edu.sg

Abstract

This paper is a survey of classical cryptography, focusing mainly on classical ciphers. Starting from the mathematical and information-theoretic background, we analyze the two major categories of ciphers, substitution and transposition. For each we cover the encryption and decryption algorithms, cryptanalysis and some applications. Finally, we discuss some of the machines used in the early age of cryptography.

Key words: Cryptography, cipher, encryption, decryption, key, plaintext, ciphertext

Introduction

Classical cryptography has a history of at least 4000 years, as far as we know¹. Over the centuries it was used mainly in diplomacy and war. Modern cryptography is mainly used for computer security today, and by comparison most classical ciphers are considered vulnerable to today's powerful computers.

We wrote this paper because classical cryptography was useful in history and is still useful in some recent applications. Classical ciphers illustrate the basic ideas of confusion² and diffusion³, which are properties of secure modern ciphers. They also show how cryptographic theory developed over time.

1 From http://williamstallings.com/Extras/Security-Notes/lectures/classical.html

2 Confusion: making the relationship between the key and the ciphertext as complex as possible, so that the original message is unrecognizable.

3 Diffusion: the principle that a change in one part of the plaintext affects many parts of the ciphertext.

In the following sections we survey the classical ciphers, starting from the mathematical background and information theory. Our analysis uses English (26 letters) as the template for most of the ciphers. A similar analysis can be carried out for languages with a different alphabet size, for example German (30 letters) or Swedish (29 letters).

Background Information

Mathematical Background

XOR operation

XOR (exclusive OR) is a binary bitwise operator. It takes two operands, each either 0 or 1 (False or True), and outputs 1 if exactly one operand is 1 and 0 otherwise.

XOR satisfies the following four rules:

Table 1: Truth table for XOR

A   B   A XOR B
0   0   0
0   1   1
1   0   1
1   1   0
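As a quick check of Table 1, the short Python sketch below prints the truth table using Python's `^` operator. It also shows the property the one-time pad relies on later: XOR-ing with the same key bit twice restores the original bit. The helper name `xor_bit` is ours, chosen only for illustration.

```python
def xor_bit(a: int, b: int) -> int:
    """Return A XOR B for two bits (0 or 1)."""
    return a ^ b

# Reproduce Table 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_bit(a, b))

# XOR is its own inverse: (m XOR k) XOR k == m for every bit m and key bit k.
assert all(xor_bit(xor_bit(m, k), k) == m for m in (0, 1) for k in (0, 1))
```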

Modular Arithmetic

Mod is a binary operator that takes two integers as its operands. The result of the mod operation is the remainder of the integer division of the left operand by the right operand.

Examples of the mod operation:

5 mod 3 = 2

9 mod 5 = 4

Congruence is a mathematical concept closely related to mod. Integers a and b are called congruent modulo a non-zero integer n, denoted a ≡ b (mod n), iff a mod n = b mod n. An equivalent definition is that a and b are congruent modulo n iff their difference a - b is divisible by n.

For example, 63 and 83 are congruent modulo 10: 63 mod 10 = 83 mod 10 = 3, and 83 - 63 = 20 is divisible by 10.
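In Python the `%` operator computes this remainder. For a positive modulus it always returns a value in 0..n-1, even when the left operand is negative, which is convenient for the cipher formulas later. The helper name `congruent` is illustrative, not taken from any library.

```python
def congruent(a: int, b: int, n: int) -> bool:
    """True iff a ≡ b (mod n), using the definition a mod n == b mod n."""
    if n == 0:
        raise ValueError("modulus must be non-zero")
    return a % n == b % n

print(5 % 3, 9 % 5)            # 2 4
print(congruent(63, 83, 10))   # True
# Equivalent definition: n divides a - b.
print((83 - 63) % 10 == 0)     # True
# Python's % stays non-negative for a positive modulus: (3 - 5) mod 26 == 24.
print((3 - 5) % 26)            # 24
```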

Information theory background

Index of coincidence

The index of coincidence was introduced by William F. Friedman in his article “The Index of Coincidence and its Applications in Cryptography”, Riverbank Publications Number 22.

The index of coincidence (IC) of a text is the probability that two letters selected at random from the text are identical. For the ciphertext of a polyalphabetic cipher, the IC becomes smaller as the key length (the number of alphabets) grows. The formula is:

$$IC = \frac{\sum_{x=a}^{z} \mathrm{Freq}(x)\,\bigl(\mathrm{Freq}(x) - 1\bigr)}{N\,(N - 1)}$$

Here IC stands for the index of coincidence, Freq(x) is the number of occurrences of symbol x in the text, and N is the length of the text.

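The IC ranges from about 0.038 (1/26) to about 0.068. The low end is a polyalphabetic substitution whose ciphertext letters are perfectly flat (uniformly distributed). The high end is a monoalphabetic substitution of ordinary English text, which keeps the letter frequencies of English.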
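A direct transcription of the formula above into Python might look like the sketch below. It assumes that only the letters A to Z are counted and that case and all other characters are ignored.

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """IC = sum(Freq(x) * (Freq(x) - 1)) / (N * (N - 1)) over the letters A..Z."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    freq = Counter(letters)
    return sum(f * (f - 1) for f in freq.values()) / (n * (n - 1))

print(index_of_coincidence("AAAA"))  # 1.0: every pair of letters matches
print(index_of_coincidence("ABCD"))  # 0.0: no pair matches
```

On a long English ciphertext from a monoalphabetic cipher the result should come out close to the 0.068 quoted above.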

Table 2: Number of Enciphering Alphabets Versus Index of Coincidence

Number of alphabets   1      2      3      4      5      10     Large
IC                    0.068  0.052  0.047  0.044  0.044  0.041  0.038

Unicity Distance

Unicity distance is the minimum length of ciphertext for which there is only one possible plaintext decryption. Usually, the larger the unicity distance, the better the cryptosystem. For example, the unicity distance of a simple substitution cipher on English text is about 27 letters. This means that a ciphertext about 27 letters long is, in principle, enough to determine a unique meaningful plaintext.

$$U \approx \frac{\log K}{R \log P}$$

(K is the size of the key space, R is the redundancy of the language expressed as a fraction, and P is the size of the alphabet used.)
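To see where a figure like 27 can come from, the sketch below evaluates the formula for simple substitution on English, with K = 26! and P = 26. The redundancy R = 0.7 is our assumption for English and is not a value given in this paper. The base of the logarithm cancels out, so base 2 is used.

```python
import math

def unicity_distance(key_space: float, redundancy: float, alphabet_size: int) -> float:
    """U = log(K) / (R * log(P)); the log base cancels, base 2 is used here."""
    return math.log2(key_space) / (redundancy * math.log2(alphabet_size))

# Simple substitution on English: K = 26! keys, P = 26 letters.
# R = 0.7 is an assumed redundancy for English, not a value from this paper.
u = unicity_distance(math.factorial(26), 0.7, 26)
print(round(u, 1))  # roughly 27 under these assumptions
```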

Frequency Analysis

In most languages, certain letters, letter groups and words appear with characteristic frequencies if the text is long enough. Frequency analysis is based on this observation. For example, 'e' is the most frequently used letter in English text. The difference between high-frequency and low-frequency letters can be used to analyze ciphertext. The appendix gives statistics for the most commonly used letters, digrams and trigrams.
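Counting letters and n-grams is the first step of any frequency analysis. A minimal sketch follows; the function names are ours, and it only counts, without comparing against a reference table of English frequencies.

```python
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Relative frequency of each letter, most common first."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(ch, n / total) for ch, n in counts.most_common()]

def top_ngrams(text: str, n: int = 2, k: int = 5) -> list[tuple[str, int]]:
    """The k most common n-letter groups (digrams for n=2, trigrams for n=3)."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    grams = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    return grams.most_common(k)

print(letter_frequencies("the theme of these sentences")[:3])
print(top_ngrams("the theme of these sentences", n=3, k=2))
```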

Substitution Ciphers

In a substitution cipher, each letter is replaced by another letter or symbol. There are many categories of substitution ciphers. In this section we discuss monoalphabetic substitution ciphers, homophonic ciphers, polygraphic ciphers, polyalphabetic substitution ciphers and the one-time pad.

Monoalphabetic Substitution ciphers

In the monoalphabetic substitution cipher, also called the simple substitution cipher, each character of the plaintext is replaced by the corresponding character of a fixed cipher alphabet. The cipher alphabet can be reversed, shifted or scrambled.

The number of possible keys is very large (26! - 1 for English). Even so, this cipher is not strong and is easily broken by frequency analysis. Its advantage is that it works by direct table lookup, so the time to encrypt a message of n characters is proportional to n.

We look at the Caesar cipher in detail and briefly introduce some other ciphers, such as the affine cipher and the Atbash cipher.
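The "direct lookup" property shows up clearly in Python, where `str.maketrans` builds a translation table once and `str.translate` applies it character by character. In the sketch below, generating a scrambled key from a seeded shuffle is an illustrative assumption. A real key would be chosen secretly, and `random` is not a cryptographic generator.

```python
import random
import string

def make_key(seed: int) -> str:
    """A scrambled cipher alphabet; the seed-based key generation is illustrative only."""
    letters = list(string.ascii_uppercase)
    random.Random(seed).shuffle(letters)
    return "".join(letters)

def encrypt(plaintext: str, key: str) -> str:
    # One table lookup per character, so the time is proportional to the length.
    return plaintext.upper().translate(str.maketrans(string.ascii_uppercase, key))

def decrypt(ciphertext: str, key: str) -> str:
    return ciphertext.translate(str.maketrans(key, string.ascii_uppercase))

key = make_key(1)
c = encrypt("ATTACK AT DAWN", key)
print(key, c, decrypt(c, key))
```

A reversed alphabet ("ZYX…A") or a shifted one can be passed as `key` in the same way.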

Caesar Cipher

The Caesar cipher is one of the simplest encryption methods: it shifts the alphabet by a fixed number of positions.

For example, in English, with a shift of 23 positions:

Plain:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: XYZABCDEFGHIJKLMNOPQRSTUVW

The encryption can also be expressed with modular arithmetic. Letters are mapped to numbers: A = 0, B = 1, C = 2, D = 3 … Z = 25. Encrypting a letter x with a shift of n is E(x) = (x + n) mod 26, and decryption is D(x) = (x - n) mod 26. Thus English has 25 different (non-trivial) Caesar ciphers, and a language whose alphabet has m letters has m - 1.
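The two formulas E(x) = (x + n) mod 26 and D(x) = (x - n) mod 26 translate almost literally into Python. The sketch below assumes English letters A to Z and leaves other characters untouched, which is a common convention but not something the text prescribes.

```python
def caesar_encrypt(text: str, n: int) -> str:
    """E(x) = (x + n) mod 26 for letters A..Z; other characters are kept."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + n) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def caesar_decrypt(text: str, n: int) -> str:
    """D(x) = (x - n) mod 26 is simply encryption with shift -n."""
    return caesar_encrypt(text, -n)

print(caesar_encrypt("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 23))  # XYZABCDEFGHIJKLMNOPQRSTUVW
```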

General history of Caesar Cipher

The Caesar cipher is said to have been invented by Julius Caesar to communicate with his army. He is often cited as one of the earliest known users of encryption for securing messages. Although the cipher is easy to break today, it was unlikely to be broken at that time.

Cryptanalysis of Caesar Cipher

The Caesar cipher is easy to break. In English the key space contains only 25 keys, so the cipher can be broken by hand in at most 25 tries (exhaustive key search). One simply tries each shift and checks whether the decoded text is readable English. With frequency analysis of English letters⁴, breaking the cipher becomes even easier and faster.

We can roughly match the letter frequency distribution of the ciphertext (for example, with the letters sorted by frequency) against that of normal English. Doing so reveals the shift and recovers readable English text. This method works especially well for long messages.
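Both attacks can be sketched in a few lines. Exhaustive search lists all 25 candidate decryptions for a human to inspect. The frequency shortcut assumes that the most frequent ciphertext letter stands for plaintext 'E'. That is a simplification of the distribution matching described above and only works reliably on reasonably long texts. The helper names are ours.

```python
from collections import Counter

def shift(text: str, n: int) -> str:
    """Caesar-shift the letters A..Z by n; other characters are kept."""
    return "".join(chr((ord(c) - 65 + n) % 26 + 65) if "A" <= c <= "Z" else c
                   for c in text.upper())

def all_candidates(ciphertext: str) -> list[tuple[int, str]]:
    """Exhaustive key search: decrypt with every key 1..25."""
    return [(k, shift(ciphertext, -k)) for k in range(1, 26)]

def guess_key(ciphertext: str) -> int:
    """Assume the most frequent ciphertext letter is the image of plaintext 'E'."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    top, _ = Counter(letters).most_common(1)[0]
    return (ord(top) - ord("E")) % 26

c = shift("MEET ME HERE AT THE TREE", 3)
k = guess_key(c)
print(k, shift(c, -k))  # 3 MEET ME HERE AT THE TREE
```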

Another way to break the cipher is to recognize short, commonly used words. For example, “the”, “and” and “of” appear regularly in English⁵. When the ciphertext keeps the spaces between words, two- and three-letter words stand out and repeat. Trying common two- and three-letter words (and common digrams and trigrams) makes it possible to decode the cipher easily.

Besides short words, consecutive and repeated letters also give hints to break the cipher. In

English, “tt”, “ss” and “ee” are the ones commonly repeated consecutively.

Application

ROT13

ROT13 is a self-reversing Caesar cipher popularly used on Usenet and other online forums. It masks joke punchlines, movie and story spoilers, and offensive expressions from a casual glance⁶.

The name “ROT13” stands for “Rotate by 13 places”. Since there are 26=2*13 letters in English,

the ROT13 function is its own inverse:

4 Discussed in the background section

5 From http://www.all-science-fair-projects.com/science_fair_projects_encyclopedia/Caesar_cipher

6 From http://www.fact-index.com/r/ro/rot13.html

$$\mathrm{ROT}_{13}(\mathrm{ROT}_{13}(x)) = \mathrm{ROT}_{26}(x) = x \quad \text{for any text } x$$

To apply ROT13 to a piece of text, simply shift every English letter by 13 places, leaving numbers, symbols and other characters unchanged.

ROT13 is not intended to be secure. Instead of protecting the message, ROT13 protects readers from material in the forums that they may not wish to see. Only readers who consciously choose to decipher the message with the rotate-by-13 scheme will read it.
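A small ROT13 sketch shows the self-inverse property directly. It preserves letter case and leaves digits and punctuation alone, as described above. As a cross-check it compares against Python's built-in `rot13` text codec from the standard `codecs` module.

```python
import codecs

def rot13(text: str) -> str:
    """Shift letters by 13, preserving case; digits and symbols are unchanged."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

msg = "Hello, World!"
print(rot13(msg))                                  # Uryyb, Jbeyq!
assert rot13(rot13(msg)) == msg                    # ROT13 is its own inverse
assert rot13(msg) == codecs.encode(msg, "rot13")   # agrees with the built-in codec
```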

Affine Cipher

The affine cipher is a special case of the monoalphabetic substitution cipher.

Its encryption function is e(x) = (ax + b) mod m, where m is the size of the alphabet and a and m are relatively prime.

The decryption function is d(x) = a⁻¹(x - b) mod m, where a⁻¹ is the multiplicative inverse of a modulo m.

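The cipher is weak in one specific way. If a cryptanalyst can discover the plaintext of two ciphertext characters, the key (a, b) can be obtained by solving a system of two linear congruences. This works provided the difference of the two plaintext letters is coprime to m.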
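The sketch below implements e(x) and d(x) for the 26-letter alphabet, assuming uppercase input A to Z only. It also shows the known-plaintext attack just described: two letter pairs give a = (y1 - y2)(x1 - x2)⁻¹ and b = y1 - a·x1 modulo m. It uses Python 3.8+'s `pow(x, -1, m)` for the modular inverse, which raises `ValueError` when no inverse exists.

```python
A = ord("A")

def affine_encrypt(text: str, a: int, b: int, m: int = 26) -> str:
    """e(x) = (a*x + b) mod m; a must be coprime to m."""
    pow(a, -1, m)  # raises ValueError if a has no inverse modulo m
    return "".join(chr((a * (ord(c) - A) + b) % m + A) for c in text)

def affine_decrypt(text: str, a: int, b: int, m: int = 26) -> str:
    """d(x) = a^-1 * (x - b) mod m."""
    a_inv = pow(a, -1, m)
    return "".join(chr(a_inv * (ord(c) - A - b) % m + A) for c in text)

def recover_key(p1: str, c1: str, p2: str, c2: str, m: int = 26) -> tuple[int, int]:
    """Solve c = a*p + b (mod m) from two known plaintext/ciphertext letter pairs."""
    x1, y1, x2, y2 = (ord(ch) - A for ch in (p1, c1, p2, c2))
    a = (y1 - y2) * pow((x1 - x2) % m, -1, m) % m
    b = (y1 - a * x1) % m
    return a, b

c = affine_encrypt("AFFINECIPHER", 5, 8)
print(c)                             # IHHWVCSWFRCP
print(affine_decrypt(c, 5, 8))       # AFFINECIPHER
print(recover_key("A", "I", "F", "H"))  # (5, 8)
```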

Atbash Cipher

Table 3: Atbash Cipher (Hebrew letters in transliteration)

Plain:  A  B  G  D  H  V  Z  Ch T  Y  K  L  M  N  S  O  P  Tz Q  R  Sh Th
Cipher: Th Sh R  Q  Tz P  O  S  N  M  L  K  Y  T  Ch Z  V  H  D  G  B  A

The Atbash cipher is a simple substitution cipher originally used for the Hebrew alphabet. It substitutes the first letter with the last one, the second letter with the second-to-last one, and so on, as shown in Table 3.
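The same first-to-last idea can be applied to the English alphabet; that adaptation is our illustration, since Table 3 is Hebrew. In numbers it maps x to 25 - x. This makes it the affine cipher with a = b = 25, because 25 ≡ -1 (mod 26). It is also its own inverse.

```python
def atbash(text: str) -> str:
    """Map the i-th letter to the (25 - i)-th: A<->Z, B<->Y, ...; other chars kept."""
    return "".join(chr(ord("Z") - (ord(c) - ord("A"))) if "A" <= c <= "Z" else c
                   for c in text.upper())

print(atbash("HELLO, WORLD"))          # SVOOL, DLIOW
print(atbash(atbash("HELLO, WORLD")))  # HELLO, WORLD
```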

Homophonic Substitution Ciphers

Homophonic ciphers were invented to make frequency analysis attacks on substitution ciphers harder. They disguise letter frequencies by homophony: high-frequency letters are usually assigned more ciphertext symbols, and low-frequency letters fewer. Thus, unlike a monoalphabetic cipher, one plaintext letter can be mapped to more than one ciphertext symbol.
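A minimal homophonic sketch follows. The key here is hypothetical: the six letters in `HIGH` are assumed to be "high frequency" and get three two-digit symbols each, and every other letter gets one. Encryption picks one of a letter's homophones at random, so repeated letters need not produce repeated symbols. Decryption only needs the reverse mapping.

```python
import random
import string

HIGH = "ETAOIN"  # assumed high-frequency letters for this sketch

def make_key(seed: int = 0) -> dict[str, list[int]]:
    """Hypothetical key: distinct two-digit symbols, three per HIGH letter, one otherwise."""
    symbols = list(range(10, 100))
    random.Random(seed).shuffle(symbols)
    key = {}
    for ch in string.ascii_uppercase:
        count = 3 if ch in HIGH else 1
        key[ch] = [symbols.pop() for _ in range(count)]
    return key

def encrypt(plaintext: str, key: dict[str, list[int]], rng: random.Random) -> list[int]:
    # Each occurrence of a letter picks one of its homophones at random.
    return [rng.choice(key[ch]) for ch in plaintext.upper() if ch in key]

def decrypt(symbols: list[int], key: dict[str, list[int]]) -> str:
    reverse = {s: ch for ch, homophones in key.items() for s in homophones}
    return "".join(reverse[s] for s in symbols)

key = make_key()
c = encrypt("ATTACKATDAWN", key, random.Random(1))
print(c, decrypt(c, key))
```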

Book Cipher

The key of a book cipher is the identity of a book. Each plaintext word is encrypted as the location of that word in the book. One problem is that a plaintext word may not appear in the book. An alternative is to encode each plaintext letter by the location of that letter in the book. However, encoding a long message this way takes a long time.
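The word-level version can be sketched as follows, with two assumptions. The "book" is a hypothetical one-line text, and a location is simply the word's index in that text. Choosing a random occurrence when a word appears several times makes the cipher homophonic. A word missing from the book raises an error, which is the problem mentioned above.

```python
import random

def index_book(book: str) -> dict[str, list[int]]:
    """Map each word of the book to every position (word number) where it occurs."""
    positions: dict[str, list[int]] = {}
    for i, word in enumerate(book.lower().split()):
        positions.setdefault(word, []).append(i)
    return positions

def book_encrypt(message: str, book: str, rng: random.Random) -> list[int]:
    positions = index_book(book)
    cipher = []
    for word in message.lower().split():
        if word not in positions:
            raise ValueError(f"{word!r} does not appear in the book")
        cipher.append(rng.choice(positions[word]))  # any occurrence will do
    return cipher

def book_decrypt(cipher: list[int], book: str) -> str:
    words = book.lower().split()
    return " ".join(words[i] for i in cipher)

BOOK = "it was the best of times it was the worst of times"  # hypothetical book
c = book_encrypt("the worst of times", BOOK, random.Random(0))
print(c, book_decrypt(c, BOOK))
```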

Straddling checkerboard

The Straddling checkerboard is a device to convert letters into digits.

An example checkerboard is given in Table 4⁷:

Table 4: Straddling checkerboard

     0  1  2  3  4  5  6  7  8  9
     E  T     A  O  N     R  I  S
2    B  C  D  F  G  H  J  K  L  M
6    P  Q     U  V  W  X  Y  Z  .

From the table, A = 3, B = 20, C = 21 …, Z = 68, so the plaintext A T T A C K A T D A W N becomes 3 1 1 3 21 27 3 1 22 3 65 5. Then a secret key number (0452), repeated as needed, is added by non-carrying addition (each digit is added modulo 10):

  3 1 1 3 2 1 2 7 3 1 2 2 3 6 5 5
+ 0 4 5 2 0 4 5 2 0 4 5 2 0 4 5 2
= 3 5 6 5 2 5 7 9 3 5 7 4 3 0 0 7

Then the same checkerboard turns the digits back into letters (the digits 2 and 6 start two-digit codes):

3 5 65 25 7 9 3 5 7 4 3 0 0 7
A N W  H  R S A N R O A E E R

Decryption simply reverses the process. Convert the letters to digits, subtract the key digit by digit modulo 10, and convert the result back into letters.

7 From http://en.wikipedia.org/wiki/Straddling_checkerboard
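The worked example can be reproduced with the sketch below, which hard-codes the checkerboard of Table 4. It leaves the unused cells (2 and 6 in the top row, 62 in row 6) empty. A digit string that would land on such a cell, or that ends with a dangling row prefix, is rejected rather than guessed. Because the checkerboard codes form a prefix code, re-encoding the ciphertext letters reproduces the same digit string, and that is what makes decryption work.

```python
# Table 4: single-digit codes in the top row; rows "2" and "6" use two digits.
# None marks the unused cells (2 and 6 in the top row, 62 in row 6).
TOP = ["E", "T", None, "A", "O", "N", None, "R", "I", "S"]
ROWS = {
    "2": ["B", "C", "D", "F", "G", "H", "J", "K", "L", "M"],
    "6": ["P", "Q", None, "U", "V", "W", "X", "Y", "Z", "."],
}

CODES = {ch: str(col) for col, ch in enumerate(TOP) if ch}
for prefix, cells in ROWS.items():
    CODES.update({ch: prefix + str(col) for col, ch in enumerate(cells) if ch})

def letters_to_digits(text: str) -> str:
    return "".join(CODES[ch] for ch in text)

def digits_to_letters(digits: str) -> str:
    out, i = [], 0
    while i < len(digits):
        if digits[i] in ROWS:  # row prefix: the next digit selects the column
            if i + 1 == len(digits):
                raise ValueError("digit string ends with a dangling row prefix")
            ch = ROWS[digits[i]][int(digits[i + 1])]
            i += 2
        else:
            ch = TOP[int(digits[i])]
            i += 1
        if ch is None:
            raise ValueError("digits hit an unused cell of the checkerboard")
        out.append(ch)
    return "".join(out)

def add_key(digits: str, key: str, sign: int = 1) -> str:
    """Non-carrying (digit-by-digit, mod 10) addition or subtraction of a repeated key."""
    return "".join(str((int(d) + sign * int(key[i % len(key)])) % 10)
                   for i, d in enumerate(digits))

def encrypt(plaintext: str, key: str) -> str:
    return digits_to_letters(add_key(letters_to_digits(plaintext), key))

def decrypt(ciphertext: str, key: str) -> str:
    return digits_to_letters(add_key(letters_to_digits(ciphertext), key, -1))

print(letters_to_digits("ATTACKATDAWN"))  # 3113212731223655
print(encrypt("ATTACKATDAWN", "0452"))    # ANWHRSANROAEER
print(decrypt("ANWHRSANROAEER", "0452"))  # ATTACKATDAWN
```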

Polygraphic Substitution Ciphers

Instead of substituting individual plaintext letters, polygraphic substitution ciphers substitute larger groups of letters. This makes it more difficult for a cryptanalyst to break the cipher by frequency analysis. However, a given language still has frequency patterns for larger letter groups.

Playfair cipher

The Playfair cipher is the earliest practical polygraphic substitution cipher. It uses a 5 by 5 table and a key. The table is filled with the letters of the key (omitting repeated letters) followed by the remaining letters of the alphabet, with I and J usually sharing one cell. To use the cipher, one needs to remember the key and the four rules below⁸:

1. If the letters of a pair are the same (or only one letter is left), add an "X" after the first letter. Encrypt the new pair and continue.

2. If the letters appear on the same row of the table, replace them with the letters immediately to their right respectively. Wrap around to the left side of the row if a letter in the original pair was on the right side of the row.

3. If the letters appear in the same column of the table, replace them with the letters immediately below respectively. Wrap around to the top of the column if a letter in the original pair was at the bottom of the column.

4. If the letters are not on the same row or column, replace them with the letters on the same row respectively, but at the other pair of corners of the rectangle defined by the original pair.

To decrypt the message, apply the inverse of these rules: shift left instead of right and up instead of down. The rectangle rule is its own inverse.
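The four rules translate into the sketch below. Several details are common conventions that we assume here rather than take from the text. The square holds the key letters first and then the rest of the alphabet. J is merged into I, X is used as the filler, and a `step` of +1 encrypts while -1 decrypts. The test key "playfair example" is a commonly used textbook example.

```python
def make_square(key: str) -> list[str]:
    """5x5 table in row-major order: key letters first, then the rest; J is merged into I."""
    square = []
    for ch in (key + "ABCDEFGHIKLMNOPQRSTUVWXYZ").upper().replace("J", "I"):
        if ch.isalpha() and ch not in square:
            square.append(ch)
    return square

def digraphs(text: str) -> list[tuple[str, str]]:
    """Rule 1: split into pairs, inserting X after a doubled first letter or a lone last letter."""
    letters = [c for c in text.upper().replace("J", "I") if c.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else "X"
        if a == b:
            pairs.append((a, "X"))
            i += 1
        else:
            pairs.append((a, b))
            i += 2
    return pairs

def playfair(text: str, key: str, step: int = 1) -> str:
    """step=1 encrypts (right/down); step=-1 decrypts (left/up)."""
    sq = make_square(key)
    pos = {ch: divmod(i, 5) for i, ch in enumerate(sq)}
    out = []
    for a, b in digraphs(text):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:      # Rule 2: same row
            out += [sq[ra * 5 + (ca + step) % 5], sq[rb * 5 + (cb + step) % 5]]
        elif ca == cb:    # Rule 3: same column
            out += [sq[(ra + step) % 5 * 5 + ca], sq[(rb + step) % 5 * 5 + cb]]
        else:             # Rule 4: opposite corners of the rectangle
            out += [sq[ra * 5 + cb], sq[rb * 5 + ca]]
    return "".join(out)

c = playfair("hide the gold in the tree stump", "playfair example")
print(c)                                         # BMODZBXDNABEKUDMUIXMMOUVIF
print(playfair(c, "playfair example", step=-1))  # HIDETHEGOLDINTHETREXESTUMP
```

The decrypted text still contains the filler X inserted by rule 1 ("TREXESTUMP"); the reader removes it by context.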

8 From http://www.fact-index.com/p/pl/playfair_cipher.html

Hill Cipher⁹

The Hill cipher is a polygraphic substitution cipher that uses linear algebra to combine much larger groups of letters simultaneously. Each letter is treated as a digit in base 26: A = 0, B = 1, and so on. A block of n letters is then considered as an n-dimensional vector and multiplied by an n × n matrix, modulo 26. The entries of the matrix are the key. They should be random, provided that the matrix is invertible modulo 26, i.e. its determinant is coprime to 26; otherwise decryption is impossible.
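A small Hill cipher sketch follows, restricted to 2 × 2 key matrices for brevity. The encryption step works for any n × n matrix, but only the 2 × 2 inverse is implemented. The key matrix below is hypothetical; it works because its determinant, 9, is coprime to 26. Padding the last block with X is our assumption.

```python
from math import gcd

A = ord("A")

def to_nums(text: str) -> list[int]:
    return [ord(c) - A for c in text.upper() if "A" <= c <= "Z"]

def to_text(nums: list[int]) -> str:
    return "".join(chr(n + A) for n in nums)

def hill_apply(nums: list[int], matrix: list[list[int]]) -> list[int]:
    """Multiply each block of n letters (as a vector) by the n x n matrix, mod 26."""
    n = len(matrix)
    if len(nums) % n:
        nums = nums + [ord("X") - A] * (n - len(nums) % n)  # pad with X
    out = []
    for i in range(0, len(nums), n):
        block = nums[i:i + n]
        out += [sum(matrix[r][c] * block[c] for c in range(n)) % 26 for r in range(n)]
    return out

def inverse_2x2(matrix: list[list[int]]) -> list[list[int]]:
    """Inverse of a 2 x 2 key matrix modulo 26 (exists iff gcd(det, 26) == 1)."""
    (a, b), (c, d) = matrix
    det = (a * d - b * c) % 26
    if gcd(det, 26) != 1:
        raise ValueError("key matrix is not invertible modulo 26")
    det_inv = pow(det, -1, 26)
    return [[d * det_inv % 26, -b * det_inv % 26],
            [-c * det_inv % 26, a * det_inv % 26]]

KEY = [[3, 3], [2, 5]]  # hypothetical key; det = 9, which is coprime to 26
c = to_text(hill_apply(to_nums("HELP"), KEY))
print(c)                                                  # HIAT
print(to_text(hill_apply(to_nums(c), inverse_2x2(KEY))))  # HELP
```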

Polyalphabetic substitution ciphers

To make substitution ciphers more secure, more than one cipher alphabet can be used to encrypt the letters of the plaintext. Such ciphers are called polyalphabetic substitution ciphers. Because the same plaintext letter can be encrypted to different ciphertext letters, frequency analysis becomes much more difficult.

Leon Battista Alberti invented the first published polyalphabetic cipher around 1467.[1] At first, a good polyalphabetic substitution cipher was extremely hard to break. This changed in the mid-1800s, when Friedrich Kasiski published the first procedure for attacking polyalphabetic ciphers, in particular the Vigenère cipher¹⁰.

Vigenere cipher

This cipher is named after the Frenchman Blaise de Vigenère. Encryption and decryption use a key and a table called the Vigenère tableau (tabula recta). A Vigenère tableau is a collection of 26 permutations of the 26 English letters; in the standard tableau, each row is the alphabet shifted by one more position than the row above. The permutations are written as a square matrix indexed by a pair of English letters, with all 26 letters appearing in each row and each column.

9 From http://en.wikipedia.org/wiki/Polygraphic#Polygraphic

10 After the book “Die Geheimschriften und die Dechiffrierkunst” (“Secret Writing and the Art of Deciphering” in English) was published, polyalphabetic ciphers were no longer considered secure.

Suppose the key is K = <k(0), k(1), k(2), k(3), …, k(d-1)>, where each k(i) is a symbol from the alphabet used, typically an English letter. The length of the key is d. For example, if the key is “BAD”, then d = 3, k(0) = “B”, k(1) = “A” and k(2) = “D”. Because the key is often shorter than the plaintext, it is repeated as many times as needed.

Suppose the plaintext is P = <p(0), p(1), p(2), …, p(n-1)>, where n is the length of P and each p(i) is a symbol from the alphabet.

Suppose the ciphertext obtained by encrypting P with key K using the Vigenère cipher is C = <c(0), c(1), c(2), c(3), …, c(n-1)>, where each c(i) is a letter. Note that the ciphertext C and the plaintext P have the same length n; this is a characteristic of the Vigenère cipher.

Denote the Vigenère tableau by Vigenere_table. Then Vigenère encryption can be described as:

c(i) = Vigenere_table[k(i mod d)][p(i)], 0 <= i < n.

The standard Vigenère tableau is symmetric, i.e. Vigenere_table[i][j] = Vigenere_table[j][i] for all pairs i and j. The formula can therefore be written equivalently as:

c(i) = Vigenere_table[p(i)][k(i mod d)].

If we code the 26 letters A to Z as the integers 0 to 25 respectively, the encryption rule can be written mathematically as:

c(i) = (p(i) + k(i mod d)) mod 26.

Example. For the message COMPUTER SECURITY and the keyword LUCKY, we carry out the encryption as follows:

Table 5: Keyword repeated over the message

Key:     L U C K Y L U C K Y L U C K Y L
Message: C O M P U T E R S E C U R I T Y

For each letter of the message, we use the corresponding letter of the keyword to select a row. We then go across that row to the column headed by the message letter. Using the tableau (Table 6), the first two letters "CO" of the message are encoded as "NI": row L, column C gives N, and row U, column O gives I.

Table 6: Vigenere Cipher Scheme

Continuing in this way, we obtain the encoded message shown in Table 7.

Table 7: Encryption of Vigenere Cipher

Key:        L U C K Y L U C K Y L U C K Y L
Message:    C O M P U T E R S E C U R I T Y
Ciphertext: N I O Z S E Y T C C N O T S R J

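The message can be decrypted with the formula

p(i) = (c(i) - k(i mod d)) mod 26.

Alternatively, reverse the table lookup used for encryption: in the row of the key letter, find the ciphertext letter; the heading of its column is the plaintext letter.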
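The arithmetic form c(i) = (p(i) + k(i mod d)) mod 26 makes a full tableau unnecessary in code, as the sketch below shows. Following Table 7, it assumes that spaces and other non-letters are dropped from the message.

```python
A = ord("A")

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """c(i) = (p(i) + k(i mod d)) mod 26; decryption subtracts instead."""
    sign = -1 if decrypt else 1
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    d = len(key)
    return "".join(
        chr((ord(p) - A + sign * (ord(key[i % d].upper()) - A)) % 26 + A)
        for i, p in enumerate(letters)
    )

c = vigenere("COMPUTER SECURITY", "LUCKY")
print(c)                                   # NIOZSEYTCCNOTSRJ
print(vigenere(c, "LUCKY", decrypt=True))  # COMPUTERSECURITY
```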

Beaufort cipher

The Beaufort cipher is another polyalphabetic cipher, very similar to the Vigenère cipher. The only difference is that the Beaufort cipher uses reversed alphabets.

For encryption we use c(i) = (k(i mod d) - p(i)) mod 26, and for decryption p(i) = (k(i mod d) - c(i)) mod 26. Since both formulas have the same form, the Beaufort cipher is reciprocal: the same operation both encrypts and decrypts.
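Because encryption and decryption share one formula, a single function serves for both. A minimal sketch, again assuming letters only:

```python
A = ord("A")

def beaufort(text: str, key: str) -> str:
    """c(i) = (k(i mod d) - p(i)) mod 26; the same call also decrypts."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    d = len(key)
    return "".join(chr((ord(key[i % d].upper()) - ord(p)) % 26 + A)
                   for i, p in enumerate(letters))

c = beaufort("COMPUTER", "LUCKY")
print(c)                     # JGQVESQL
print(beaufort(c, "LUCKY"))  # COMPUTER
```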

Running key cipher

The running key cipher is a type of polyalphabetic substitution cipher in which a text, typically from a book, provides a very long key stream. Generally, the book is agreed on ahead of time, while the passage used as the key is chosen randomly for each message. Nobody except the sender knows the key unless its location is indicated somewhere in the message.

Like the Vigenère cipher, the running key cipher uses the Vigenère tableau. Unlike it, the key is not repeated: the key stream is as long as the message. A predefined, secure way is therefore needed to tell the recipient where to find the running key for the message in the book.

Perhaps surprisingly, the running key cipher is not as secure as one might imagine. Both the plaintext and the key are natural-language text with low entropy per character. The most obvious and easiest way to improve its security is to use a predefined table of mixed alphabets instead of the tabula recta (Vigenère tableau).
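In code, the running key cipher is the Vigenère calculation with a non-repeating key. The sketch below uses a hypothetical "book" passage as the key stream. It refuses to run if the passage is shorter than the message, since repeating the key would turn it back into a Vigenère cipher.

```python
A = ord("A")

def running_key(text: str, key_passage: str, decrypt: bool = False) -> str:
    """Vigenère tableau arithmetic with a non-repeating key taken from a passage."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    key = [c for c in key_passage.upper() if "A" <= c <= "Z"]
    if len(key) < len(letters):
        raise ValueError("the running key must be at least as long as the message")
    sign = -1 if decrypt else 1
    return "".join(chr((ord(p) - A + sign * (ord(k) - A)) % 26 + A)
                   for p, k in zip(letters, key))

PASSAGE = "THE QUICK BROWN FOX"  # hypothetical passage from the agreed book
c = running_key("ATTACK", PASSAGE)
print(c)                                      # TAXQWS
print(running_key(c, PASSAGE, decrypt=True))  # ATTACK
```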

Autokey cipher

An autokey cipher incorporates the message into the key. It is also called a self-synchronizing stream cipher. There are two kinds of autokey ciphers. In key-autokey ciphers, the next element of the key is determined by the previous elements of the key stream. In text-autokey ciphers, it is determined by the previous message letters.

Girolamo Cardano, an Italian mathematician, invented the first autokey cipher.

Vigenère also invented a kind of autokey cipher. His innovation was to append the message to the keyword to form the real key, so it is a text-autokey cipher. This text-autokey cipher remained unbroken for over 200 years, until Charles Babbage discovered a means of breaking it.
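A text-autokey sketch in the style described above uses the keyword followed by the plaintext itself as the key stream. During decryption the receiver extends the key stream with each letter as it is recovered. Letters-only input is assumed.

```python
A = ord("A")

def autokey_encrypt(plaintext: str, keyword: str) -> str:
    """Text autokey: the key stream is the keyword followed by the plaintext."""
    p = [c for c in plaintext.upper() if "A" <= c <= "Z"]
    stream = list(keyword.upper()) + p
    return "".join(chr((ord(x) - A + ord(k) - A) % 26 + A) for x, k in zip(p, stream))

def autokey_decrypt(ciphertext: str, keyword: str) -> str:
    stream = list(keyword.upper())
    out = []
    for i, c in enumerate(ciphertext.upper()):
        x = chr((ord(c) - ord(stream[i])) % 26 + A)
        out.append(x)
        stream.append(x)  # each recovered letter extends the key stream
    return "".join(out)

c = autokey_encrypt("attack at dawn", "QUEENLY")
print(c)                              # QNXEPVYTWTWP
print(autokey_decrypt(c, "QUEENLY"))  # ATTACKATDAWN
```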

Cryptanalysis of Polyalphabetic Substitutions

To break a polyalphabetic cipher, we determine the number of alphabets used, split the ciphertext into pieces that were enciphered with the same alphabet, and solve each piece as a monoalphabetic substitution. Two tools help with messages encrypted with a large number of alphabets. The Kasiski method determines when the pattern of encryption permutations repeats. The index of coincidence predicts the number of alphabets used for substitution.

The Kasiski Method for Repeated Patterns

The Kasiski method, named after its developer Friedrich Kasiski, a Prussian military officer, is a way of finding the number of alphabets that were used for encryption.

The method relies on the regularity of English: not only letters but also letter groups and words are repeated. Examples are -th, -ing, -ed, -ion, -tion, -ation, im-, in-, un-, re-, -eek-, -oot- and -our-, and words such as of, and, to, with, are, is and that.

The Kasiski method follows this rule. Suppose a message is encrypted with n alphabets (e.g. a key of length n for the Vigenère cipher), and a particular word or letter group appears k times in the plaintext. Then about k/n of those occurrences, and at least ⌈k/n⌉¹¹ of them, are encrypted with the same alphabet; this follows from the pigeonhole principle¹². Occurrences encrypted with the same alphabet produce the same ciphertext pattern. The distance between them is a multiple of the key length, that is, of the number of alphabets used.

The Kasiski method works as follows:

1. Identify repeated patterns of three or more letters.

2. For each pattern, calculate the distance between the starting positions of successive instances of the pattern.

3. Determine the greatest common divisor (GCD) of all distances obtained in step 2.

4. If polyalphabetic substitution is used, the key length should be a factor of the GCD obtained in step 3.
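Steps 1 to 3 can be sketched as below. For brevity the sketch uses every repeated trigram, with overlapping occurrences included. In practice, accidental repeats can drive the GCD down to 1, and one then looks at the most common factors of the distances instead. The test string is artificial, chosen so that its repeats are exactly 6 apart.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_pattern_distances(ciphertext: str, length: int = 3) -> list[int]:
    """Steps 1-2: distances between successive starts of each repeated pattern."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    distances = []
    for pos in positions.values():
        distances += [b - a for a, b in zip(pos, pos[1:])]
    return distances

def kasiski_gcd(ciphertext: str, length: int = 3):
    """Step 3: GCD of all distances (None if nothing repeats)."""
    d = repeated_pattern_distances(ciphertext, length)
    return reduce(gcd, d) if d else None

print(kasiski_gcd("ABCXYZABCXYZABC"))  # 6: the key length should divide 6
```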

Short repeated patterns, such as two-letter patterns, are often accidental, so considering them causes more trouble than ignoring them. Any repeated pattern of more than three characters is almost certainly not accidental. The probability that two four-letter patterns match by chance, rather than coming from the same plaintext segment, is about 1/26⁴ [security in computing]. The distance between two repeated patterns should be divisible by the key length. So if the distance is calculated between two non-successive instances, the number of candidates for the key length becomes larger.

11 Ceiling is a mathematical function that takes a real number as its argument and outputs the least integer greater than or equal to it.

12 If you have fewer pigeonholes than pigeons and you put every pigeon in a pigeonhole, then at least one pigeonhole must contain more than one pigeon.

To use the index of coincidence method, we calculate the IC of the ciphertext and look up the corresponding number of alphabets (key length) in Table 2¹³.
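The table lookup can be automated as in the sketch below, which returns every Table 2 entry closest to a measured IC. Since 4 and 5 alphabets share the value 0.044, the lookup can return both, which reflects the limited precision of the IC method. Representing the "Large" column as the string "large" is our choice.

```python
# Table 2 as (number of alphabets, IC); "large" stands for many alphabets.
TABLE_2 = [(1, 0.068), (2, 0.052), (3, 0.047), (4, 0.044), (5, 0.044),
           (10, 0.041), ("large", 0.038)]

def estimate_alphabets(ic: float) -> list:
    """Return every Table 2 entry whose IC is closest to the measured IC."""
    best = min(abs(ic - value) for _, value in TABLE_2)
    return [n for n, value in TABLE_2 if abs(ic - value) - best < 1e-9]

print(estimate_alphabets(0.068))  # [1]
print(estimate_alphabets(0.044))  # [4, 5]
```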

One Time Pad

The one-time pad encrypts the message with a random key. It is called a one-time pad because each key is used only once and never used again. Encryption is a simple XOR of the message with the key.

Example:

Message:    COMPUTER
Key:        SECURITY

COMPUTER:   01000011 01001111 01001101 01010000 01010101 01010100 01000101 01010010
SECURITY:   01010011 01000101 01000011 01010101 01010010 01001001 01010100 01011001
Ciphertext: 00010000 00001010 00001110 00000101 00000111 00011101 00010001 00001011
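The example can be checked byte by byte in Python. The sketch also shows how a real pad would be generated, using the standard `secrets` module for cryptographically strong random bytes. An ASCII word such as SECURITY is only an illustration and is not a random pad.

```python
import secrets

def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the key byte at the same position; the same call decrypts."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

# Reproduce the example (an ASCII word is NOT a random pad; illustration only).
c = otp_xor(b"COMPUTER", b"SECURITY")
print(" ".join(f"{b:08b}" for b in c))

# A real pad: cryptographically random bytes, used once and then destroyed.
pad = secrets.token_bytes(8)
assert otp_xor(otp_xor(b"COMPUTER", pad), pad) == b"COMPUTER"
```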

Each encryption is independent of every other, so no pattern can be detected. The unicity distance¹⁴ of the one-time pad is infinite, because the key is at least as long as the text. Thus it is the only cipher that has been proven to be perfectly secure in theory.

However, the key length is an obvious drawback of the one-time pad: the key must be at least as long as the plaintext to be encrypted. Moreover, the users must agree on a key in advance, which raises the problem of transmitting the key securely.

Cryptanalysis

The one-time pad is said to turn the problem of message transmission into a problem of key transmission. For the one-time pad to be secure in practice, the key must be truly random. As long as the key is random, used only once and kept safe, the one-time pad is perfectly secure.

13 Refer to the background section.

14 Refer to the background section for unicity distance

Transposition Ciphers

Transposition is a classical cryptography technique that differs from substitution. Transposition reorders the elements of the plaintext according to a rule agreed on by the sender and the receiver, which makes the message unrecognizable to adversaries.

The main property of a transposition cipher is that each element occurs the same number of times in the ciphertext as in the plaintext, because elements are only reordered, not substituted. Thus the frequency distribution of single letters is preserved. However, the digram and trigram frequencies usually differ from those of the original language. From this we can detect that a ciphertext was produced by a transposition cipher. Simple transposition is not safe, because a modern computer can quickly try the possible rearrangements systematically.

Simple Example of Transposition cipher:

plaintext: ILOVECOMPUTERSECURITY

ciphertext: YTIRUCESRETUPMOCEVOLI

If you read carefully, you will see that the plaintext is simply reversed; reversing the ciphertext recovers the original text.
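The reversal example can be sketched in Python, together with the frequency-preservation property that is used to recognize transposition ciphers. `Counter` equality checks that both texts contain exactly the same letters with the same counts.

```python
from collections import Counter

def reverse_transposition(text: str) -> str:
    """The simplest transposition: write the text backwards (it is its own inverse)."""
    return text[::-1]

plain = "ILOVECOMPUTERSECURITY"
cipher = reverse_transposition(plain)
print(cipher)  # YTIRUCESRETUPMOCEVOLI
# Letter counts are preserved, which is how transposition can be recognized.
assert Counter(cipher) == Counter(plain)
assert reverse_transposition(cipher) == plain
```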

Examples of Transposition ciphers

Most transposition ciphers apply some bijective function (a permutation of positions) to the plaintext, so the encoding procedure can be reversed to decode the message.

Rail Fence Cipher

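Write the plaintext into a matrix row by row, and read the ciphertext out column by column. The key of this cipher is the number of letters in a row. Strictly speaking, this row-and-column scheme is a simple columnar transposition; the classical rail fence cipher writes the plaintext in a zigzag across a number of "rails".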

Example:

Message: WEAREDOINGCOMPUTERSECURITYASSIGNMENT

Key (row length): 6

Matrix:

W E A R E D
O I N G C O
M P U T E R
S E C U R I
T Y A S S I
G N M E N T

Cipher: WOMSTGEIPEYNANUCAMRGTUSEECERSNDORIIT
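The row-by-row/column-by-column scheme of this section fits in one line of Python: column c consists of the letters at positions c, c + k, c + 2k, … for row length k. The decryption sketch also handles a last row that is not full. In that case the first `extra` columns are one letter longer. The function names follow the section's terminology and are ours.

```python
def rail_fence_encrypt(text: str, row_length: int) -> str:
    """Write row by row (row_length letters per row), read column by column."""
    return "".join(text[c::row_length] for c in range(row_length))

def rail_fence_decrypt(cipher: str, row_length: int) -> str:
    rows, extra = divmod(len(cipher), row_length)  # first `extra` columns are longer
    cols, i = [], 0
    for c in range(row_length):
        size = rows + (1 if c < extra else 0)
        cols.append(cipher[i:i + size])
        i += size
    return "".join(cols[c][r] for r in range(rows + 1)
                   for c in range(row_length) if r < len(cols[c]))

msg = "WEAREDOINGCOMPUTERSECURITYASSIGNMENT"
c = rail_fence_encrypt(msg, 6)
print(c)                         # WOMSTGEIPEYNANUCAMRGTUSEECERSNDORIIT
print(rail_fence_decrypt(c, 6))  # WEAREDOINGCOMPUTERSECURITYASSIGNMENT
```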

Columnar transposition

To complicate the route of the rail fence cipher, we can permute the columns to enhance security: the columns are read in the alphabetical order of the key letters.

Message: WEAREDOINGCOMPUTERSECURITYASSIGNMENT

Key: BIRDAY (the columns are read in the order A -> B -> D -> I -> R -> Y)

Matrix:

B I R D A Y
W E A R E D
O I N G C O
M P U T E R
S E C U R I
T Y A S S I
G N M E N T

Cipher: ECERSNWOMSTGRGTUSEEIPEYNANUCAMDORIIT
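The keyed version only changes the order in which the columns are read. In the sketch below, `column_order` sorts column indices by key letter, breaking ties between repeated letters left to right; that tie-breaking rule is our assumption. Decryption cuts the ciphertext back into columns, taking into account that the first columns may be one letter longer.

```python
def column_order(key: str) -> list[int]:
    """Column indices in alphabetical order of the key letters (ties left to right)."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))

def columnar_encrypt(text: str, key: str) -> str:
    k = len(key)
    return "".join(text[c::k] for c in column_order(key))

def columnar_decrypt(cipher: str, key: str) -> str:
    k = len(key)
    rows, extra = divmod(len(cipher), k)  # the first `extra` columns are one letter longer
    cols, i = [""] * k, 0
    for c in column_order(key):
        size = rows + (1 if c < extra else 0)
        cols[c] = cipher[i:i + size]
        i += size
    return "".join(cols[c][r] for r in range(rows + 1) for c in range(k) if r < len(cols[c]))

msg = "WEAREDOINGCOMPUTERSECURITYASSIGNMENT"
c = columnar_encrypt(msg, "BIRDAY")
print(column_order("BIRDAY"))         # [4, 0, 3, 1, 2, 5]: A, B, D, I, R, Y
print(c)                              # ECERSNWOMSTGRGTUSEEIPEYNANUCAMDORIIT
print(columnar_decrypt(c, "BIRDAY"))  # WEAREDOINGCOMPUTERSECURITYASSIGNMENT
```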

Double transposition

Double transposition applies columnar transposition twice to enhance security. With a single transposition, an adversary could try all possible key lengths and recover the plaintext; double transposition complicates this considerably. A single transposition needs one key, while double transposition needs two.
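Double transposition is simply the columnar step applied twice, and decryption undoes the second step first. The sketch repeats a compact columnar transposition so that it runs on its own. The second key "CODE" is hypothetical.

```python
def column_order(key: str) -> list[int]:
    return sorted(range(len(key)), key=lambda i: (key[i], i))

def columnar_encrypt(text: str, key: str) -> str:
    return "".join(text[c::len(key)] for c in column_order(key))

def columnar_decrypt(cipher: str, key: str) -> str:
    k = len(key)
    rows, extra = divmod(len(cipher), k)
    cols, i = [""] * k, 0
    for c in column_order(key):
        size = rows + (1 if c < extra else 0)
        cols[c], i = cipher[i:i + size], i + size
    return "".join(cols[c][r] for r in range(rows + 1) for c in range(k) if r < len(cols[c]))

def double_encrypt(text: str, key1: str, key2: str) -> str:
    return columnar_encrypt(columnar_encrypt(text, key1), key2)

def double_decrypt(cipher: str, key1: str, key2: str) -> str:
    # Undo the second transposition first, then the first one.
    return columnar_decrypt(columnar_decrypt(cipher, key2), key1)

msg = "WEAREDOINGCOMPUTERSECURITYASSIGNMENT"
c = double_encrypt(msg, "BIRDAY", "CODE")  # "CODE" is a hypothetical second key
print(c)
print(double_decrypt(c, "BIRDAY", "CODE"))  # WEAREDOINGCOMPUTERSECURITYASSIGNMENT
```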

Other transposition ciphers include the ADFGVX cipher (which combines a substitution step with a columnar transposition) and the grille.

Machines and Rotors

In practice, encryption and decryption can be performed by a rotor machine, a device that implements the encoding and decoding algorithms of classical ciphers. Conceptually, it is constructed by connecting 26 switches to 26 light bulbs.

To make the device an encipherer, we proceed in steps. First, turning on any one of the switches lights up a corresponding light bulb.

Second, we replace the switches with the keys of a typewriter and label the light bulbs with letters as well. For example, pressing key “A” lights up bulb “A”. This is not yet encryption; we first need to make it a monoalphabetic encipherer.

Third, to turn it into an encryption system, we simply change the wiring so that each key lights up a different bulb. For example, typing “A” lights up bulb “X”. When we type a message, the sequence of lit bulbs is the encrypted message. This is a single-alphabet (monoalphabetic) substitution system, which is insecure and easy to break by frequency analysis.

Since this kind of simple substitution is not safe, how can we make the rotor machine more secure? The solution is to turn it into a polyalphabetic substitution system by placing a rotor in the machine and rotating it.

Because the rotor turns, a new substitution is produced every time the same letter is pressed. For example, the first time you press “A”, bulb “X” lights up. The second time, bulb “S” lights up, the third time some other letter lights up, and so on. A website¹⁵ simulates the “Enigma machine” (an example of a rotor machine); there you can press letters on the keyboard and watch how the light bulbs light up.

The algorithm involved here is “use the next alphabet with every key press”¹⁶. The rotor generates the key by rotating, while the key itself is hidden in the wiring of the disk. The first setting is very important, because the long key used to encrypt the following letters is generated by rotating from it. That starting position could be labelled with either a number or a letter.

The number of rotors is also an important factor in the machine's security. If a machine with a single rotor is not considered secure enough, the security level can be increased simply by adding more rotors. One rotor gives a polyalphabetic substitution system with 26 keys, while two rotors give 26 × 26 = 676 keys.

With more than one rotor, each additional rotor advances one position after the previous rotor has made a complete turn. For example, after the first rotor has turned from position “A” through “Z”, the second rotor moves from “A” to “B”. With three rotors, the third rotor advances one position after the second rotor has made a complete turn, and so on.

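This is how encryption is done with a rotor machine. To turn the rotor machine into a decipherer, we can use a symmetrical approach: starting from the same initial rotor positions, each letter is passed through the rotor wirings in the reverse direction.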

Enigma Machine

15 The website is : http://www.ugrad.cs.jhu.edu/~russell/classes/enigma/

16 From http://www.fact-index.com/r/ro/rotor_machine.html

The Enigma machine is a typical example of a rotor machine. Other examples of rotor machines are Fialka, the Hebern rotor machine, HX-63, KL-7, Lucida, NEMA, SIGABA and Typex.

The Enigma machine is a rotor machine with three rotors and, as its distinctive feature, a reflector.

The mechanism of the Enigma machine implements a complex algorithm, but the tasks of encoding and decoding are carried out mechanically. The Enigma machine was developed in the early 1920s and was used during World War II, most famously by Nazi Germany¹⁷ ¹⁸.

17 From http://webhome.idirect.com/~jproc/crypto/enigma.html

Other machines

There are other machines used for encryption and decryption. The algorithms and principles behind them vary from one machine to another.

Jefferson Cylinder (1790)

This is a wooden cylinder 15 cm long and 4 cm across¹⁹. The cylinder is cut into slices, each 5 mm wide. Around the side of each slice, the 26 letters are written at equal size in a random order.

An important feature of the Jefferson cylinder is that the person who receives the secret message must have a cylinder with exactly the same arrangement as the person who encrypts and sends the message. In other words, two identical cylinders are needed to carry out the encryption and decryption process.

18 From http://en.wikipedia.org/wiki/Enigma_machine

19 From http://williamstallings.com/Extras/Security-Notes/lectures/classical.html

The encryption process is carried out as follows. First, you turn the wheels of the cylinder until the letters of the secret message line up along one side of the cylinder. Then another, randomly chosen line of letters (in which the order of the letters also appears quite random) is copied. The letters of that line form the ciphertext to be sent to the receiver.

On the receiving side, the receiver arranges the letters of the ciphertext along one line of his own cylinder. Since the cylinder used to decrypt the message is identical to the one used to encrypt it, when he turns the cylinder around he will be able to find a line of letters that is meaningful, and that line is the plaintext.
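The Jefferson Cylinder procedure can be sketched in Python. The sketch assumes 30 disks (a 15 cm cylinder cut into 5 mm slices) and models the two identical cylinders with a shared random seed; the seed, the function names and the row offset `shift` are hypothetical. Messages are assumed to consist of uppercase letters only and to be no longer than the number of disks.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def make_cylinder(num_disks: int, seed: int) -> list[str]:
    """Build num_disks slices, each carrying the 26 letters in random order.
    Sender and receiver use the same (hypothetical) seed, so their cylinders match."""
    rng = random.Random(seed)
    disks = []
    for _ in range(num_disks):
        letters = list(ALPHABET)
        rng.shuffle(letters)
        disks.append("".join(letters))
    return disks


def row(disks: list[str], text: str, shift: int) -> str:
    """Line `text` up along the cylinder and read the row `shift` places further round."""
    if len(text) > len(disks):
        raise ValueError("message is longer than the number of disks")
    return "".join(d[(d.index(ch) + shift) % 26] for d, ch in zip(disks, text))


def encrypt(disks: list[str], plaintext: str, shift: int) -> str:
    """Sender: set the plaintext, then copy another row as the ciphertext."""
    return row(disks, plaintext, shift)


def candidate_rows(disks: list[str], ciphertext: str) -> list[str]:
    """Receiver: set the ciphertext, then list every row to look for a meaningful one."""
    return [row(disks, ciphertext, s) for s in range(26)]
```

The receiver does not need to know `shift`: among the 26 rows returned by `candidate_rows`, the plaintext appears as row `26 - shift`, and the receiver recognizes it as the only readable line.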

Wheatstone Disk (1870)

Consider the case where we have two concentric wheels, each with 26 letters around its edge. By rotating the two wheels, every letter on the inner wheel is aligned with a letter on the outer wheel. This is similar to the Caesar cipher. The encryption, however, generates a polyalphabetic cipher. The construction of the Wheatstone disk is similar to a clock. There are two hands on the disk, a large one and a small one, which look like the minute and hour hands of a clock. The two hands are connected by gears. When the large hand points to a plaintext letter, the small hand points to the corresponding ciphertext letter. That is how encryption is done using the Wheatstone disk. Note that when the gears are rearranged, the encryption changes, which means that the small hand will not point to the same position when the large hand points to the same letter.
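The following Python sketch models the disk under explicit assumptions. If two rings of equal size advanced by equal steps, the offset between the hands would never change and the result would be a single fixed substitution. The sketch therefore follows the commonly described form of Wheatstone's cryptograph, in which the outer ring has 27 positions (26 letters plus a blank) while the inner ring has 26, so the hands drift apart by one letter per revolution. The inner alphabet `INNER`, the starting positions and the rule for doubled letters are hypothetical; in this model the secret lies in the order of the inner alphabet.

```python
import string

# Assumed layout: outer ring = 26 letters plus a blank (27 positions),
# inner ring = 26 letters in a hypothetical mixed order that acts as the key.
OUTER = string.ascii_uppercase + " "
INNER = "QWERTYUIOPASDFGHJKLZXCVBNM"
START_BIG, START_SMALL = OUTER.index(" "), 0  # both hands start at the top


def wheatstone_encrypt(plaintext: str) -> str:
    big, small = START_BIG, START_SMALL
    out = []
    for ch in plaintext.upper():
        target = OUTER.index(ch)
        steps = (target - big) % 27  # the large hand always moves clockwise
        if steps == 0:
            raise ValueError("doubled letter: replace the repeat before enciphering")
        big = target
        small = (small + steps) % 26  # gearing: small hand moves the same number of divisions
        out.append(INNER[small])
    return "".join(out)


def wheatstone_decrypt(ciphertext: str) -> str:
    big, small = START_BIG, START_SMALL
    out = []
    for ch in ciphertext:
        steps = (INNER.index(ch) - small) % 26 or 26  # recover 1..26 divisions
        small = (small + steps) % 26
        big = (big + steps) % 27
        out.append(OUTER[big])
    return "".join(out)
```

For example, `wheatstone_encrypt("ABAB")` gives `"WEER"`: the two A's become W and E, so the same plaintext letter is enciphered differently depending on where the hands stand.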

Conclusion

In this paper, we have discussed various kinds of ciphers and some of their applications. We have seen that most of the ciphers are based on substituting or rearranging characters or streams of characters, and that most of them are symmetric: once you know how to encode a message, you know how to decode it. From the analysis, we have seen that most of the classical encryption methods are vulnerable and can easily be attacked with today's technology. That is why they are seldom used in computer security nowadays. However, these encryption methods have given us clues on how cryptography can be done. The basic theories and concepts of classical cryptography are important to the development of modern cryptography and will be important to the development of cryptography in the future.

References

[1] Definition of substitution cipher http://www.wordiq.com/definition/Substitution_cipher#Polyalphabetic http://rinkworks.com/words/letterfreq.shtml

[2] English words frequency table http://www.edict.com.hk/TextAnalyser/default.htm

[3] Classic Cryptography and Digraphic Substitution http://www.thinkquest.org/library/site_sum.html?tname=27158&url=27158/concept1_13.html

[4] Codes and Ciphers: Wheatstone Disk http://www.otr.com/ciphers.shtml

[5] Computer Security Website https://www.maths.uwa.edu.au/~praeger/teaching/3CC/WWW/chapter7.html#tth_chAp7

[6] Enigma http://webhome.idirect.com/~jproc/crypto/enigma.html

[7] Historical Cryptography http://starbase.trincoll.edu/~crypto/

[8] Frequency Analysis http://www.fact-index.com/f/fr/frequency_analysis.html

[9] Rotor Machine http://www.fact-index.com/r/ro/rotor_machine.html

[10] Rotor Machines http://raphael.math.uic.edu/~jeremy/crypt/rotor.html

[11] Secret Language http://www.exploratorium.edu/ronh/secret/secret.html

[12] U-boot Enigma Simulation http://www.u-boot-greywolf.de/uenigmasimulation.htm

[13] Unicity Distance http://www.u-boot-greywolf.de/uenigmasimulation.htm

[14] Encryption - Wikipedia http://en.wikipedia.org/wiki/Cipher

[15] Book reference: "Security in Computing" by Charles P. Pfleeger

Appendix

Table I: English word frequency table. Words listed by frequency: the first 2000 most frequent words from the Brown Corpus (1,015,945 words). These lists reflect general non-academic English as it is used in newspapers, magazines and books.

Rank  Word   Instances  % Frequency

1     the    69970      6.8872

2     of     36410      3.5839

3     and    28854      2.8401

4     to     26154      2.5744

5     a      23363      2.2996

6     in     21345      2.1010

7     that   10594      1.0428

8     is     10102      0.9943

9     was    9815       0.9661

10    he     9542       0.9392

11    for    9489       0.9340

12    it     8760       0.8623

13    with   7290       0.7176

14    as     7251       0.7137

15    his    6996       0.6886

16    on     6742       0.6636

17    be     6376       0.6276

18    at     5377       0.5293

19    by     5307       0.5224

20    I      5180       0.5099

21    this   5146       0.5065

22    had    5131       0.5050

23    not    4610       0.4538

24    are    4394       0.4325

25    but    4381       0.4312

26    from   4370       0.4301

27    or     4207       0.4141

28    have   3942       0.3880

29    an     3748       0.3689

30    they   3619       0.3562

(From http://www.edict.com.hk/TextAnalyser/default.htm)
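The % Frequency column of Table I is the number of instances of a word divided by the total number of words in the corpus. The sketch below shows how such a table can be computed from a text. The function name `word_frequency_table` and its simple regular-expression tokenizer are our own assumptions; the Brown Corpus was tokenized differently, so the sketch will not reproduce its counts exactly.

```python
import re
from collections import Counter


def word_frequency_table(text: str, top: int = 30) -> list[tuple[int, str, int, float]]:
    """Return (rank, word, instances, % of all words) for the `top` most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (an assumption)
    total = len(words)
    counts = Counter(words)
    return [
        (rank, word, n, round(100 * n / total, 4))
        for rank, (word, n) in enumerate(counts.most_common(top), start=1)
    ]
```

For the table above, the percentage for "the" is 100 × 69970 / 1015945, which rounds to 6.8872.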

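Table II: English letter frequency table. Analysis of 45,406 common words.

This table analyzes a pool of words that includes plurals and words with common suffixes. For each letter, # of Occurrences is the total count of the letter, and # of Words is the number of words that contain it, with its share of the 45,406 words in parentheses.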

Letter  # of Occurrences  # of Words

e       42689 (11.74%)    30254 (66.63%)

i       31450 (8.65%)     23875 (52.58%)

s       29639 (8.15%)     22697 (49.99%)

a       28965 (7.97%)     23408 (51.55%)

r       27045 (7.44%)     22642 (49.87%)

n       26975 (7.42%)     21644 (47.67%)

t       24599 (6.76%)     20040 (44.14%)

o       21588 (5.94%)     17776 (39.15%)

l       19471 (5.35%)     16289 (35.87%)

c       15002 (4.13%)     13142 (28.94%)

d       13849 (3.81%)     12334 (27.16%)

u       11715 (3.22%)     10894 (23.99%)

g       10339 (2.84%)     9426 (20.76%)

p       10063 (2.77%)     8952 (19.72%)

m       9803 (2.70%)      8871 (19.54%)

h       7808 (2.15%)      7372 (16.24%)

b       7368 (2.03%)      6880 (15.15%)

y       6005 (1.65%)      5881 (12.95%)

f       4926 (1.35%)      4385 (9.66%)

v       3971 (1.09%)      3884 (8.55%)

k       3209 (0.88%)      3091 (6.81%)

w       3073 (0.85%)      2997 (6.60%)

z       1631 (0.45%)      1555 (3.42%)

x       1053 (0.29%)      1046 (2.30%)

j       727 (0.20%)       727 (1.60%)

q       682 (0.19%)       681 (1.50%)

(From http://www.edict.com.hk/TextAnalyser/default.htm)
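Table II has two kinds of counts per letter: how often the letter occurs in total, and in how many words it occurs at least once. A minimal sketch of how both columns can be computed from a word list follows; the function name and the rounding to two decimals are our own choices, and the original word pool is not reproduced here.

```python
from collections import Counter


def letter_frequency_table(words: list[str]) -> list[tuple[str, int, float, int, float]]:
    """For each letter: occurrences, % of all letters, words containing it, % of words."""
    words = [w.lower() for w in words]
    occurrences = Counter(ch for w in words for ch in w if ch.isalpha())
    # set(w) counts each letter at most once per word.
    containing = Counter(ch for w in words for ch in set(w) if ch.isalpha())
    total_letters = sum(occurrences.values())
    rows = []
    for letter, n in occurrences.most_common():
        rows.append((
            letter,
            n,
            round(100 * n / total_letters, 2),
            containing[letter],
            round(100 * containing[letter] / len(words), 2),
        ))
    return rows
```

Applied to the 45,406 words of Table II, the last column for "e" would be 100 × 30254 / 45406, which rounds to 66.63%.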