Selection of Hashing Algorithms: Gary Fisher

SELECTION OF HASHING ALGORITHMS
Tim Boland
Gary Fisher
JUNE 30, 2000
INTRODUCTION
The National Software Reference Library (NSRL) Reference Data Set (RDS) is built on
file signature generation technology that is used primarily in cryptography. The selection
of the specific file signature generation routines is based on customer requirements and
the necessity to provide a level of confidence in the reference data that will allow it to be
used in the U.S. Courts. This document gives an overview of the various hashing
algorithms considered, as well as implementations of those algorithms. It also gives
factors regarding their selection and use.
Hashing is an extremely good way to verify the integrity of a sequence of data bits (e.g.,
to make sure the contents of the sequence haven't been changed inadvertently). The
sequence might make up a character string, a file, a directory, or a message representing
data (binary 1s or 0s) stored in a computer system. The word "hash" means to "chop into
small pieces" (REF1). A hashing algorithm is a mathematical function (or a series of
functions) taking as input the aforementioned sequence of bits and generating as output a
code (value) produced from the data bits and possibly including both code and data bits.
Two files with exactly the same bit patterns should hash to the same code using the same
hashing algorithm. If a hash for a file stays the same, there is only an extremely small
probability that the file has been changed. On the other hand, if the hashes for the files do
not match, then the files are not the same. Thus, hashes could be used as a primary
verification tool to find identical files. The output code of the hash function should have a
"random" property, so that different sequences of bits hash to different values as much as
possible. Hashes are used in "scatter storage" systems, in digital signature applications,
and recently in computer forensics applications, to determine whether the contents of a
suspect machine have been modified maliciously. Hashing algorithms can be efficiently
implemented on modern computers.
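As a rough illustration of using hashes to find identical files, the following Python sketch hashes two files and compares the resulting codes. It is not part of the NSRL tooling; the function names file_digest and same_contents, the chunk size, and the default choice of SHA-1 are assumptions made for this example.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def same_contents(path_a, path_b, algorithm="sha1"):
    """True when both files hash to the same code under the same algorithm."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

A mismatch proves the files differ; a match only makes identical contents overwhelmingly likely, which is why the choice of algorithm matters.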
Hashing algorithms fall within the realm of error detection techniques. In a general sense,
the aim of an error detection technique is to enable the receiver of a message transmitted
through a noisy (error-introducing) channel to determine whether the message has been
corrupted. Some hashing algorithms perform complex transformations on the message to
inject it with redundant information, while others leave the data intact and append a hash
value on the end of a message. In any case, the transmitter may construct a hash value
that is a function of the message. The receiver can then use the same hashing algorithm to
compute the hash value of the received message and compare it with the transmitted hash
value to see if the message was correctly received. If the hash values match, then the
message was correctly received; if not, then there must have been an error in one or more
of the data bits of the message.
The National Institute of Standards and Technology (NIST) of the U.S. Department of
Commerce has been asked to investigate commonly used hashing algorithms in support
of the National Software Reference Library (NSRL). There are several algorithms
available, differing in complexity, robustness, ease of use, and machine efficiency.
Generally, "hardware (physical device)" methods of computing hashes involve extensive
bit manipulations, and are relatively inefficient for a software (programmatic)
computation, so the algorithms discussed contain software methodologies to streamline
the hashing process. What is involved in all the algorithms is a method to break up the
input into manageable portions, and manipulate the input in a systematic way over and
over (iteratively). The algorithms generally differ in the degree to which they do this, and
the number of iterations involved.

Given the above, NIST investigated available implementations of four different hashing
algorithms and tested the algorithm output on some test data. The purpose of this exercise
was to define some "reference" implementations for verifying the correctness of entries
in the NSRL Reference Data Set (RDS). Multiple algorithms were considered because of
the need for "double-checking" results and because many facilities use multiple hashing
algorithms simultaneously. Performance metrics mentioned above were used in
evaluating candidate implementations.
For each algorithm, the authorities (official sources and sanctions) of the source programs
and tests used for testing accuracy will be described. Algorithms mentioned in this report
may have limitations (clashes found, performance, etc.), which will be mentioned as
appropriate. All implementations evaluated are freely available from the Internet. Each of
the four algorithms will be described below, starting with a high-level overview and
progressing to more detail as appropriate.
CRC32
The cyclic redundancy code (CRC) algorithm is the simplest of the four hashing
algorithm choices, but also the least robust. The name means that the algorithm operates
in repetitive (cyclic) redundant cycles to produce an output hash code. The "32" indicates
the number of bits being considered to produce the hash code (explained below). The
CRC algorithm is a key component in the error-detecting capabilities of many
communications protocols. In a CRC algorithm, the transmitter of a message constructs a
value (called the checksum) and appends it to the message. The receiver can then use the
same function to compute the checksum of the received message and compare it with the
appended checksum to see if the message was correctly received. For example, if we
chose a checksum function which was the sum of the decimal numbers in a message, it
might go something as follows: Message: 1 2 3; Message with checksum: 1 2 3 6 (6 is the
sum of 1 and 2 and 3); Message after transmission: 1 2 4 6. Here the third decimal
number was corrupted from 3 to 4, and the receiver can detect this by computing the
checksum (7=1+2+4) from the message and comparing it with the transmitted checksum of
6. Obviously, both sender and receiver must be using the same algorithm to be consistent.
If the checksum itself is corrupted, a correctly transmitted message might be incorrectly
identified as a corrupted one. However, this is a safe-side failure. A dangerous-side
failure occurs where the message and/or checksum is corrupted in a manner that results in
a transmission that is internally consistent. Unfortunately, this possibility is completely
unavoidable and the best that can be done is to minimize its probability by increasing the
amount of information in the checksum (REF2).
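The decimal-sum example above can be replayed in a few lines of Python. This is only a sketch of the toy checksum, not of CRC; the function names are hypothetical, and the third transmission (two digits swapped) is an added illustration of a dangerous-side failure that the sum cannot detect.

```python
def sum_checksum(digits):
    """Toy checksum: the plain sum of the decimal numbers in the message."""
    return sum(digits)


def append_checksum(digits):
    """Transmitter side: the message followed by its checksum."""
    return list(digits) + [sum_checksum(digits)]


def is_consistent(transmission):
    """Receiver side: recompute the checksum and compare with the last item."""
    *message, checksum = transmission
    return sum_checksum(message) == checksum


sent = append_checksum([1, 2, 3])   # message with checksum: 1 2 3 6
corrupted = [1, 2, 4, 6]            # third number corrupted from 3 to 4
swapped = [1, 3, 2, 6]              # two numbers exchanged in transit

print(is_consistent(sent))       # True
print(is_consistent(corrupted))  # False: 1 + 2 + 4 = 7, not 6
print(is_consistent(swapped))    # True: corruption goes undetected
```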
The above example is obviously very simple, and would not suffice for rigorous error
detection. A more complex checksum function is needed. While addition is clearly not
strong enough to form an effective checksum, it turns out that division is, so long as the
divisor (number to divide by) is about as wide as the checksum register (place to store the
checksum value).

The basic idea behind CRC algorithms is simply to treat the message as an enormous
binary number, to divide it by another fixed binary number, get a quotient, and make the
remainder from this division the checksum. Upon receipt of the message, the receiver can
perform the same division and compare the remainder with the "checksum" (transmitted
remainder). For example, when dividing decimal 11 (message) by 4 (divisor) we get a
value of 2 (quotient) with a remainder of 3.
With CRC division, instead of viewing the numbers mentioned above as positive
integers, they are viewed as polynomials with binary coefficients. This is done by treating
each number as a bit-string whose bits are the coefficients of a polynomial. For example,
the ordinary number 23 (decimal) is 10111 (binary) and so it corresponds to the
polynomial x**4 + x**2 + x**1 + x**0. Polynomials are used because they provide
useful mathematical machinery in the calculations. CRC arithmetic is primarily about
XORing (exclusive-ORing) particular values at various shifting offsets, which has the
effect of doing the binary division. An exclusive-OR function produces 1 if the two input
bits are different; otherwise it produces 0.
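The polynomial view and the XOR-based division can be sketched in Python as follows. This is a minimal illustration of carry-less division only; it applies none of the CRC32 conventions discussed later, and the function names poly_terms and crc_remainder are hypothetical.

```python
def poly_terms(number):
    """Exponents of the polynomial whose coefficients are the bits of number."""
    return [i for i in range(number.bit_length() - 1, -1, -1) if (number >> i) & 1]


def crc_remainder(message, divisor):
    """Remainder of message / divisor in CRC arithmetic (XOR instead of subtract)."""
    if divisor == 0:
        raise ValueError("divisor must be non-zero")
    width = divisor.bit_length()
    while message.bit_length() >= width:
        # Line the divisor up under the highest set bit of the message and XOR it in.
        shift = message.bit_length() - width
        message ^= divisor << shift
    return message


print(poly_terms(23))                   # [4, 2, 1, 0]
print(crc_remainder(0b10111, 0b101))    # 3, i.e. x**1 + x**0
```

In the second call, x**4 + x**2 + x**1 + x**0 divided by x**2 + x**0 leaves x**1 + x**0, because x**4 + x**2 is exactly x**2 times the divisor.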
The CRC algorithm can be applied to messages of different widths (12, 16, or 32 bits).
We are considering the 32-bit (CRC32) algorithm here because it is the most robust.
In this case the polynomial is 32 bits wide and the CRC32 checksum is also 32 bits.
This also simplifies the calculation on most modern computers. Other CRC polynomials
used besides CRC32 are CRC12, CRC16, and CRC-CCITT, from the Consultative
Committee for Telephone and Telegraph (CCITT).

On PCs one can deal with binary numbers of only 32 bits or fewer, so one must break up
the enormous binary number mentioned above into manageable chunks. That's exactly
what the two CRC algorithms mentioned below do. In order to speed up the process, the
algorithms use a pre-calculated look-up table; the table contains a CRC for each character
code between 0 and 255, so that the calculation doesn't need to be repeated as the text
strings are processed. This process has the effect of performing the division of the
enormous binary number by the generator polynomial, but in increments, due to the
limitations of modern computing. In other words, instead of computing the CRC bit by
bit, a 256-element lookup table can be used to perform the equivalent of 8 bit operations
at a time.

To perform a CRC calculation, the user needs to choose a divisor. Generally the divisor is
called the "generator polynomial" or simply the "polynomial", and is a key parameter of
any CRC algorithm. One can choose any polynomial and come up with a CRC algorithm.
However, some polynomials are better than others. An example of a polynomial used
might be 79,764,919 decimal, or 0x04c11db7 hexadecimal. Theoretical mathematicians
have calculated certain polynomials to provide the least duplications in remainders.
CRC Implementation
To implement a CRC algorithm is to implement CRC division. There are two reasons
why the divide instruction of the host machine cannot be used. The first is that the
division must be in CRC arithmetic. The second is that the dividend might be ten
megabytes (1 byte=8 bits) long, and today's processors do not have registers large enough
to hold a dividend of this size. To implement CRC division, we have to feed the message
in smaller chunks through a division register.
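Feeding a long message through the register in pieces can be illustrated with Python's standard zlib module, whose crc32 function accepts a running value as its second argument. This sketch is not one of the candidate C implementations; the function name crc32_in_chunks and the chunk size are assumptions for the example.

```python
import zlib


def crc32_in_chunks(data, chunk_size=4096):
    """Compute a CRC32 by passing the running value from one chunk to the next."""
    register = 0
    for start in range(0, len(data), chunk_size):
        register = zlib.crc32(data[start:start + chunk_size], register)
    return register


message = bytes(range(256)) * 100
# The chunked result matches the CRC of the whole message computed in one call.
print(crc32_in_chunks(message, chunk_size=7) == zlib.crc32(message))  # True
```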
Originally there were seven candidate CRC32 implementations (using C or C++ high-level
programming languages) under consideration (which represents nearly all of the
researched CRC32 implementations publicly available). Performance metrics used to
evaluate these implementations were the following (in no particular order of importance):
speed of execution, ease of use, accuracy, ability to operate on entire files, and choice of
generator polynomial. One implementation was rejected because it did not produce
accurate results, two were not set up to operate on entire files (only text strings), and two
were slow (because they were not "table-driven"). Only two were reasonably fast,
produced accurate information, were table-driven, and used generally accepted generator
polynomials. Both are table-driven, but one uses a polynomial from an American
National Standards Institute (ANSI) X3 Committee, while the other polynomial is not
explicitly specified in code, but the table entries are the same as compared to the other.
The two implementations are about the same number of programming statements. Slight
preference was given for the algorithm that computes values for directories of files as
well as individual files.
The test data used to verify the routines was from commonly used PKZIP (REF3) and
WINZIP (REF4) products, and other various test character strings and file directories.
Since these products are commonly used and routinely generate CRCs, they would be
valid benchmarks of accuracy. The CRC outputs are in hex. Both implementations
verified correctly against the data. There are no apparent limitations in the
implementations, other than the inherent CRC32 limitations, although one
implementation produces more cursory output on only one file at a time. More
information on each of these implementations is given below.
The first candidate CRC program (using the C language) computes the 32-bit CRC used
as the frame check (error-detection) sequence in FIPS 71 (REF5). This source code is
from the Snippets file collection (REF6). It consists of a header file, crc.h, and a main
program crc_32.c. For this driver routine, first the polynomial itself and its table of
feedback terms is provided. The polynomial is:
x**32 + x**26 + x**23 + x**22 + x**16 + x**12 + x**11 + x**10 + x**8 +
x**7 + x**5 + x**4 + x**2 + x**1 + x**0.
The polynomial is taken backwards and the highest-order term is placed in the
lowest-order bit. The x**32 term is "implied"; the least significant bit is the x**31 term, etc. The
x**0 term (usually shown as "+1") results in the most significant bit being 1. A hardware
shift register implementation shifts bits into the lowest-order term (to the right). It is
optimized here by shifting eight-bit chunks at a time. The calculated CRC must be
transmitted in order from highest-order term to lowest-order term. The feedback terms
table consists of 256 32-bit entries. The feedback terms represent the results of eight
shift/XOR operations for all combinations of data and CRC register values. The CRC
accumulation logic is the same for all CRC polynomials; the appropriate table just needs
to be chosen. The table can also be generated at runtime. The values must be right-shifted
by eight bits by the logic in the updateCRC routine called from the main program; the
shift must be unsigned (masked with zeroes in the high-order bits). On some hardware
the shift could probably be optimized by using byte-swap instructions. Unsigned
variables need to be used consistently.
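The relationship between the polynomial as written and its "backwards" form can be checked with a short Python sketch. The helper names terms and reverse_bits are hypothetical, and the constant 0x04C11DB7 is the example polynomial quoted earlier in this report, assumed here to be the same one used by the first candidate program.

```python
NORMAL_POLY = 0x04C11DB7  # the x**32 term is implied and not stored


def terms(poly, width=32):
    """Exponents of the polynomial, including the implied x**width term."""
    return [width] + [i for i in range(width - 1, -1, -1) if (poly >> i) & 1]


def reverse_bits(value, width=32):
    """Mirror a width-bit value so the highest-order bit becomes the lowest."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


REVERSED_POLY = reverse_bits(NORMAL_POLY)

print(terms(NORMAL_POLY))   # [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]
print(hex(REVERSED_POLY))   # 0xedb88320
```

In the reversed value the most significant bit is 1 (the x**0 term) and the least significant bit is 0 (there is no x**31 term), which matches the description above.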
The second candidate CRC program (also using the C language) computes a composite
CRC that is not dependent on the endian type of the machine executing the program.
"Endian" refers to the order in which the bytes of a multi-byte value are stored. This means the composite
CRC-32 can be used to test the transfer of a set of files, when transferred in binary mode,
between machines of different architecture. It is adapted from the "charcnt.c" program
and "crc16.u" unit modified to include a CRC32 table from Microsoft Systems Journal
(MSJ) (REF16). "crc32" gives the same values as the PKZIP utility, and has been verified
using (1) Borland C/C++ (REF14) and (2) Sun C++ (REF15). This source code was
copyrighted by Earl F. Glynn in 1998 and is available from efg's Computer Lab
Mathematics site (REF7). This implementation consists of several files: a driver program
crc32.c, and several header files. The driver program first defines a table used for
byte-wise calculation of CRC32. The routine in the main program executes very quickly as
follows: (1) the input byte is XORed with the low-order byte of the CRC "register" to get
an index into the table, (2) the CRC "register" is shifted eight bits to the right, and (3) the
CRC register is XORed with the contents of Table[Index]. Steps (1) through (3) are
executed for all input bytes. The result in the CRC register is the CRC.
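Steps (1) through (3) can be sketched in Python as shown below. This is not the crc32.c source. The table is generated at runtime from the reversed polynomial 0xEDB88320, and the register preset of all ones and the final inversion are assumptions taken from the PKZIP convention, which the text above does not spell out. The names make_table and crc32 are hypothetical.

```python
import zlib


def make_table(reversed_poly=0xEDB88320):
    """Build the 256-entry table: eight shift/XOR operations per byte value."""
    table = []
    for code in range(256):
        value = code
        for _ in range(8):
            value = (value >> 1) ^ reversed_poly if value & 1 else value >> 1
        table.append(value)
    return table


TABLE = make_table()


def crc32(data, table=TABLE):
    """Table-driven CRC32 over a bytes object."""
    register = 0xFFFFFFFF                  # assumed preset (PKZIP convention)
    for byte in data:
        index = (register ^ byte) & 0xFF   # step (1): XOR with low-order byte
        register >>= 8                     # step (2): shift right eight bits
        register ^= table[index]           # step (3): XOR with Table[Index]
    return register ^ 0xFFFFFFFF           # assumed final inversion


print(hex(crc32(b"123456789")))                         # 0xcbf43926
print(crc32(b"hashing") == zlib.crc32(b"hashing"))      # True
```

Python integers are unbounded and non-negative here, so the right shift is automatically the unsigned shift that a C implementation has to enforce.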
In sum, CRC hashes are the simplest of those considered, but are also the weakest, in that
CRC values can be compromised in terms of verification and error detection. (The final
decision of which form of the algorithm to use will be based on compatibility with
software provided by the customer. The software is still an unknown in this equation.)
MD4
MD4 (message digest level 4) is a one-way hash function designed by Ron Rivest.
One-way hash functions (see REF8) have certain characteristics, in addition to the
characteristic of taking an arbitrary-length input and returning an output of fixed length;
they are able to provide a "fingerprint" of a message that is unique. Thus, the MD4
algorithm takes as input a message of arbitrary length; the algorithm produces as output a
128-bit hash, or message digest, of the input message. MD4 is more complex than
CRC32 mentioned above, so it is more "computer-intensive," but it performs
transformations on the data itself instead of just appending a checksum as in CRC32, so it
is more robust, and provides a greater verification and error-detection ability at the price
of greater complexity. The design goal was that it would be computationally infeasible to
find two messages that hashed to the same value, or produced the same message digest. It
would also be computationally infeasible to produce any message having a given
pre-specified target message digest.
The MD4 message-digest algorithm provides a "fingerprint" of a message of arbitrary
length. The difficulty of coming up with two messages having the same message digest is
on the order of 2**64 operations, and the difficulty of coming up with any message
having a given message digest is on the order of 2**128 operations.
The MD4 algorithm is intended for digital signature applications, where a large file must
be "compressed" in a secure manner before being encrypted with a private (secret) key
under a public-key cryptosystem. This involves disguising the contents of a file so that it
may be read only by intended recipients. The MD4 algorithm is designed to be quite fast
on 32-bit machines. In addition, the MD4 algorithm does not require any large
substitution tables; the algorithm can be coded compactly. MD4's security is not based on
any assumption, such as the difficulty of factoring. MD4 is suitable for high-speed
software implementations; it is based on a simple set of bit manipulations on 32-bit
operands.
MD4 is as simple as possible, without large data structures or a complicated program, and
is optimized for microprocessor architectures. The MD4 implementation (in the C
language) chosen was from Internet Engineering Task Force (IETF) RFC1320 (REF9). It
was chosen because it faithfully reproduces the MD4 algorithm (also found in RFC1320)
and is portable. In summary form, the MD4 algorithm works as follows: (1) padding bits
are appended to the file, (2) the length is appended, (3) the MD buffer is initialized, (4) the
message is processed in 16-word blocks, and (5) output is produced. A detailed
description of these steps follows.
Suppose there is a b-bit message as input, and desired message digest from that input as
output. Here b is an arbitrary nonnegative integer, and it may be arbitrarily large. Thus
the message may be written down as follows: m_0 m_1 ... m_(b-1).
For (1) above, the message is then padded (extended) so that its length (in bits) is just 64
bits shy of being a multiple of 512 bits long. Padding is performed as follows: a single
"1" bit is appended to the message, and then "0" bits are appended subject to the
requirement above. In all, at least 1 bit and at most 512 bits are appended.
For (2) above, a 64-bit representation of b (the length of the message before the padding
bits were added) is appended to the result of the previous step. At this point, the resulting
message (after padding with bits and with b) has a length that is an exact multiple of 512
bits (meaning that the message length divided by 512 is an integer). Equivalently, this
message has a length that is an exact multiple of 16 (32-bit) words.
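Steps (1) and (2) can be sketched in Python for the common case of a message made up of whole bytes. This is an illustration, not the RFC1320 reference code; the function name md_pad is hypothetical, and the low-order-byte-first encoding of the 64-bit length follows the convention of the RFC, which the text above does not state.

```python
def md_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message as in steps (1) and (2) above."""
    bit_length = len(message) * 8
    # Step (1): a single "1" bit (0x80 is that bit followed by seven "0" bits),
    # then "0" bits until the length is 64 bits shy of a multiple of 512 bits.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step (2): the original length b as a 64-bit value, low-order byte first.
    padded += (bit_length % 2**64).to_bytes(8, "little")
    return padded


print(len(md_pad(b"")))          # 64 bytes = 512 bits
print(len(md_pad(b"a" * 55)))    # 64: only the single "1" bit group is added
print(len(md_pad(b"a" * 56)))    # 128: padding spills into a second block
```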
For (3) above, a four-word buffer (A, B, C, D) is used to compute the message digest. Here
each of A, B, C, and D is a 32-bit register (high-speed storage unit), initialized as follows
(in hexadecimal, low-order bytes first):
Word A: 01 23 45 67, Word B: 89 ab cd ef, Word C: fe dc ba 98, Word D: 76 54 32 10.
For (4) above, three auxiliary functions (F, G, H) are defined that each take as input three
32-bit words and produce as output one 32-bit word. In each bit position F acts as a
conditional: if X then Y else Z. In each bit position G acts as a majority function: if at
least two of X, Y, and Z are on, then G has a "1" bit in that bit position; otherwise, G has
a "0" bit. The function H is the bit-wise XOR or "parity (error checking)" function; it has
properties similar to those of F and G. Then each 16-word block is processed in three
rounds as outlined below. Each round uses a different operation 16 times, and each round
involves one of the functions F, G, or H (e.g., round 1 uses F, round 2 uses G, and round 3
uses H). Each operation performs a function on three of A, B, C, and D. Then it adds that
result to the fourth variable, a sub-block of the text, and a constant (which could be 0). It
then rotates that result to the left a variable number of bits. Then the result replaces one
of A, B, C, or D. Finally, each of the four registers A, B, C, D mentioned above is
incremented by the values held before processing. This concludes the processing of a
16-word block; then it's time to move on to the next 16-word block until there are none left.
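The three auxiliary functions and the rotation can be written down directly in Python. This is a sketch of the building blocks only, not a complete MD4; the 32-bit mask is needed because Python integers are not fixed-width, and the helper name rotate_left is hypothetical.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits


def F(x, y, z):
    """Conditional: in each bit position, if x then y else z."""
    return ((x & y) | (~x & z)) & MASK


def G(x, y, z):
    """Majority: a bit is 1 when at least two of x, y, z have it set."""
    return (x & y) | (x & z) | (y & z)


def H(x, y, z):
    """Parity: bit-wise XOR of the three inputs."""
    return x ^ y ^ z


def rotate_left(value, count):
    """Circular left shift of a 32-bit value by count bits (0 < count < 32)."""
    return ((value << count) | (value >> (32 - count))) & MASK


print(bin(G(0b1100, 0b1010, 0b0110)))   # 0b1110
print(hex(rotate_left(0x80000001, 1)))  # 0x3
```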
For (5) above, the resultant message digest produced is the values contained in A, B, C,
and D. These values are "concatenated", such that the digest starts with the first bit of A
and ends with the last bit of D.
In summary, the MD4 algorithm produces a "fingerprint," or message digest of a message
of arbitrary length. It has been carefully scrutinized for weaknesses; however, further
security analysis is justified. It may be used if a greater degree of error detection than
CRC32 is desired.
The MD4 implementation chosen consists of four files: global.h, md4.h, md4c.c, and
mddriver.c. The driver compiles for MD5 by default but can compile for MD2 or MD4 if
the symbol MD is defined on the C compiler command lines as 2 or 4. The file global.h
defines data types and constants. The file md4.h is a header file for md4c.c. In md4c.c,
other constants are defined, and then steps (1) through (5) are implemented as described
above. The program mddriver.c is a test driver for MD4.
The implementation is portable and should work on many different platforms. It is not
difficult to optimize the implementation on particular platforms.
MD4 test results were compared against sample data from the Handbook of Applied
Cryptography (REF10), as well as test data in the program from IETF RFC1320, and
were found to be in agreement with test data. There are no known limitations of the
implementation other than inherent limitations in the MD4 algorithm. Researchers have
shown that clashes, or duplicate hash strings, can be generated from different files.
MD5
Message Digest level 5 (MD5) is an improved version of MD4. Although more complex
than MD4, it is similar in design and also produces a 128-bit hash. After some initial
processing, MD5 processes the input text in 512-bit blocks, divided into 16 32-bit
sub-blocks. The output of the algorithm is a set of four 32-bit blocks, which concatenate to
form a single 128-bit hash value. The MD5 algorithm potentially offers a greater degree
of error detection than does MD4, at the price of slightly more complication.
The MD5 message-digest algorithm takes as input a message of arbitrary length and
produces as output a 128-bit "fingerprint" or "message digest" of the input. It is
computationally infeasible to produce two messages having the same message digest, or
to produce any message having a given pre-specified target message digest. The MD5
algorithm is intended for digital signature applications, where a large file must be
"compressed" in a secure manner before being encrypted with a private (secret) key
under a public-key cryptosystem. This involves disguising the contents of a file so that it
is recognizable only by intended recipients.
The MD5 algorithm is designed to be very fast on 32-bit machines. In addition, the MD5
algorithm does not require any large substitution tables; the algorithm can be coded quite
compactly.
The MD5 algorithm is an extension of the MD4 message-digest algorithm. MD5 is
slightly slower than MD4, but is more "conservative" in design. MD5 was designed
because it was felt that MD4 was perhaps being adopted for use more quickly than
justified by the existing critical review; because MD4 was designed to be exceptionally
fast, it was "at the edge" in terms of risking successful cryptanalytic attack (meaning that
someone, if they tried enough times, could "break" the code, or recognize a message
from its code). MD5 backs off somewhat, giving up a little in speed for a much greater
likelihood of ultimate security. It incorporates some suggestions made by various
reviewers, and contains additional optimizations.
The MD5 implementation (using the C language) chosen was from IETF RFC1321
(REF11). It was chosen because it faithfully reproduces the MD5 algorithm (also found
in IETF RFC1321) and is portable.
The MD5 message-digest algorithm performs the following five steps: (1) append
padding bits, (2) append length, (3) initialize MD buffer, (4) process message in 16-word
blocks, and (5) generate output. These steps are described in detail below.
Suppose there is a b-bit message as input, and desired message digest from that input as
output. Here b is an arbitrary nonnegative integer, and it may be arbitrarily large. Thus
the message may be written down as follows: m_0 m_1 ... m_(b-1).
For (1) above, the message is then padded (extended) so that its length (in bits) is just 64
bits shy of being a multiple of 512 bits long. Padding is performed as follows: a single
"1" bit is appended to the message, and then "0" bits are appended subject to the
requirement above. In all, at least 1 bit and at most 512 bits are appended.
For (2) above, a 64-bit representation of b (the length of the message before the padding
bits were added) is appended to the result of the previous step. At this point, the resulting
message (after padding with bits and with b) has a length that is an exact multiple of 512
bits (meaning that the message length divided by 512 is an integer). Equivalently, this
message has a length that is an exact multiple of 16 (32-bit) words.
For (3) above, a four-word buffer (A, B, C, and D) is used to compute the message digest.
Here each of A, B, C, and D is a 32-bit register (high-speed storage unit), initialized as
follows:
Word A: 01 23 45 67, Word B: 89 ab cd ef, Word C: fe dc ba 98, Word D: 76 54 32 10.
For (4) above, four auxiliary functions (F, G, H, and I) are defined such that each takes as
input three 32-bit words and produces as output one 32-bit word. In each bit position, F
acts as a conditional: if X then Y else Z. The functions G, H, and I are similar to the
function F.
The function H is the bit-wise XOR (exclusive-OR, or "parity") function of its inputs.
This step uses a 64-element table T constructed from the sine trigonometric function. Then
each 16-word block is processed in four rounds as outlined below. Each round involves
using a different operation 16 times, and each round involves one of the functions F, G,
H, or I, plus the table T (e.g., round 1 uses F and T, round 2 uses G and T, round 3 uses H
and T, and round 4 uses I and T). Each operation performs a nonlinear function on three
of A, B, C, and D. Then it adds that result to the fourth variable, a sub-block of the text
and a constant. It then rotates that result to the left a variable number of bits and adds
the result to one of A, B, C, or D. The result replaces one of A, B, C, or D. Finally, each
of the four registers A, B, C, and D is incremented by the values held before processing.
This concludes the processing of the 16-word block, and it's time to move to the next one
until there are none left.
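The 64-element sine table can be generated in a single line of Python. The formula used here, the integer part of 2**32 times the absolute value of sin(i) with i in radians and running from 1 to 64, is the one given in RFC1321; it is not spelled out in the text above, and the variable name T is simply borrowed from it.

```python
import math

# Entry i (counting from 1) is the integer part of 2**32 * abs(sin(i)),
# with i in radians. The list is zero-based, so T[0] holds the entry for i = 1.
T = [int(abs(math.sin(i)) * 2**32) for i in range(1, 65)]

print(len(T))        # 64
print(hex(T[0]))     # 0xd76aa478
print(hex(T[63]))    # 0xeb86d391
```

Because each of the 64 operations adds a different entry of T, every step has its own additive constant.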
For (5) above, the resultant message digest produced is A, B, C, D. These values are
"concatenated" such that the digest starts with the first bit of A and ends with the last bit
of D.
The following are the differences between MD4 and MD5: (1) a fourth round has been
added in MD5; (2) each step in MD5 has a unique additive constant; (3) the function G in
round 2 was changed to make G less symmetric (balanced); (4) each step in MD5 adds in
the result of the previous step (promoting a faster "avalanche effect"); (5) the order in
which input words are accessed in rounds 2 and 3 is changed, to make these patterns less
like each other; and (6) the shift amounts in each round have been approximately
optimized to yield a faster "avalanche effect", and the shifts in different rounds are
distinct.
The MD5 message-digest algorithm is relatively simple to implement, and provides a
"fingerprint" or message digest of a message of arbitrary length. The difficulty of coming
up with two messages having the same message digest is on the order of 2**64 operations,
and the difficulty of coming up with any message having a given message digest is
on the order of 2**128 operations.
The MD5 implementation chosen consists of four files: global.h, md5.h, md5c.c, and
mddriver.c. The file global.h defines common data types and constants. The file md5.h is
a header file for md5c.c. The file md5c.c defines other constants and then performs the
processing defined in steps (1) through (5) above. The file mddriver.c is a test driver for
MD5. The implementation is portable and should work on many different platforms. It is
not difficult to optimize the implementation on particular platforms.
MD5 test results were compared against sample data from the Handbook of Applied
Cryptography, as well as test data in the program from IETF RFC1321, and were found
to be in agreement with this test data. There are no known limitations of the
implementation other than the inherent limitations of the MD5 algorithm. The MD5
algorithm has been susceptible to sustained attacks in the past, but it still is the most
robust choice so far.
SHA-1
NIST, along with the National Security Agency (NSA), designed the Secure Hash
Algorithm Revision 1 (SHA-1) for use with the Digital Signature Standard (DSS)
(REF12); this standard is the Secure Hash Standard; SHA-1 is the algorithm used in the
standard. Additionally, for applications not requiring a digital signature, the SHA-1 can
be used whenever a secure hash algorithm is required. The SHA-1 is specified by NIST
Federal Information Processing Standard (FIPS) 180-1 (REF13). The SHA-1 is a
technical revision of the SHA (specified by FIPS 180, which has been superseded). The
SHA-1 can be used to generate a condensed representation of a message called a message
digest. The SHA-1 is used by both the transmitter and intended receiver of a message in
computing and verifying a digital signature.
When a message of any length less than 2**64 bits is input, the SHA-1 produces a 160-bit
output called a message digest; this output is longer than that of MD5. The message
digest is usually much smaller in size than the message. The SHA-1 is called secure
because it is computationally infeasible to find a message which corresponds to a given
message digest, or to find two different messages which produce the same message
digest. Any change to a message in transit will, with a very high probability, result in a
different message digest. The SHA-1 is based on principles similar to those used in the
development of MD4, and is closely modeled after MD4. The SHA-1 may be
implemented in software, firmware, hardware, or any combination thereof.
Implementations of the SHA-1 may be validated by NIST in accordance with a reference
implementation produced at NIST. It is this reference implementation that is selected in
this study.
In summary, the SHA-1 is more complex than the choices considered thus far, but
presents a more robust solution than the other hashing algorithms considered. Thus far
the SHA-1 has been impervious to compromise.
Input to the SHA-1 should be considered to be a bit string. The length of the message is
the number of bits in the message. The purpose of message padding is to make the total
length of a padded message a multiple of 512. The SHA-1 sequentially processes blocks
of 512 bits when computing the message digest. A sequence of logical functions f(0),
f(1), ... , f(79) is used in the SHA-1. The message digest is computed using the final
padded message. The computation uses two buffers, each consisting of five 32-bit words,
and a sequence of eighty 32-bit words. A more detailed description of the SHA-1
algorithm is given below.
First, the message is padded to make it a multiple of 512 bits long. Padding is exactly the
same as in MD5: first append a one, then as many zeros as necessary to make it 64 bits
short of a multiple of 512, and finally a 64-bit representation of the length of the message
before padding. Then five 32-bit variables (in contrast to the four for MD5) are initialized
as follows (in hexadecimal):
A=0x67452301, B=0xefcdab89, C=0x98badcfe, D=0x10325476, and
E=0xc3d2e1f0.
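The padding rule and the initial values can be sketched as follows. This is an illustration
only, not the NIST code: the names pad_message and INITIAL_STATE are hypothetical, the
sketch assumes the message is a whole number of bytes, and it stores the 64-bit length
most significant byte first, as FIPS 180-1 specifies for the SHA-1.

```python
import struct

# Initial values of the five 32-bit variables A, B, C, D, E from the text.
INITIAL_STATE = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def pad_message(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 bytes)."""
    bit_length = len(message) * 8
    # Append a single one bit (0x80 is a one bit followed by seven zero bits).
    padded = message + b"\x80"
    # Append zeros until the length is 64 bits (8 bytes) short of a multiple of 64 bytes.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the original length in bits as a 64-bit big-endian integer.
    padded += struct.pack(">Q", bit_length)
    return padded


padded = pad_message(b"abc")
print(len(padded) * 8)  # total length of the padded message in bits
```

A message of 56 bytes or more in its final block spills into an additional 512-bit block,
because the one bit and the 64-bit length no longer fit.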
The main loop of the algorithm then begins. It processes the message 512 bits at a time
and continues for as many 512-bit blocks as are in the message. First the five variables
are copied into different variables: a gets A, b gets B, c gets C, d gets D, and e gets E.
The main loop has four rounds of 20 operations each (in contrast to MD5, which has four
rounds of 16 operations each). Each operation performs a nonlinear function on three of
a, b, c, d, and e, and then does shifting and adding similar to MD5. Shifting the variables
accomplishes the same purpose as in MD5 by using different variables in different
locations. After all of this, a, b, c, d, and e are added to A, B, C, D, and E respectively,
and the algorithm continues with the next block of data. The final output is the
concatenation of A, B, C, D, and E.
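The main loop described above can be sketched in a few lines. This is a minimal
illustration under stated assumptions, not the NIST reference implementation: the names
sha1 and rotl are hypothetical, the input is assumed to be a whole number of bytes, and the
four round constants and the expansion of each block into eighty words are taken from
FIPS 180-1, since the text does not list them.

```python
import struct


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1(message: bytes) -> str:
    # Five 32-bit variables A, B, C, D, E with their initial values.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Padding: a one bit, zeros, then the 64-bit message length in bits.
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack(">Q", len(message) * 8)

    for offset in range(0, len(data), 64):          # one 512-bit block at a time
        # Expand the sixteen words of the block into eighty 32-bit words.
        w = list(struct.unpack(">16I", data[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h                           # copy A..E into a..e
        for t in range(80):                         # four rounds of 20 operations
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl(b, 30), c, d   # shift the variables

        # Add a..e back into A..E before moving to the next block.
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    # The output is the concatenation of A, B, C, D, and E.
    return "".join(f"{word:08x}" for word in h)


print(sha1(b"abc"))
```

Each nonlinear function f operates on b, c, and d only, which matches the statement that an
operation uses three of the five variables.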
The SHA is MD4 with the addition of an expanded transformation, an extra round, and
better "avalanche" effect. At the time of writing, there are no known cryptographic
attacks against the SHA-1. Because it produces a 160-bit hash, it is more resistant to
brute-force attacks than 128-bit hash functions.
The NIST SHA-1 implementation (using the C language) consists of nine files: Sha.h,
Main.c, Config.h, Shautil.obj, Sha.c, Shautil.c, Sha.obj, makefile, and Main.obj. The file
Main.c is the driver program for the implementation; it indicates whether to test against
FIPS 180, to hash a string, or to hash one or more files. The routine Shautil.c merely
computes the SHA of the string "abc" and checks the result against the known hash of
"abc". The file Sha.c is the routine that actually implements the SHA. The
implementation is conformant to FIPS 180-1, and was tested against reference data in the
FIPS, as well as test data from the Handbook of Applied Cryptography. The results were
verified against all test data.
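A known-answer check of the kind described above can be sketched as follows. This is not
the NIST C code: it exercises Python's hashlib as a stand-in implementation, the names
TEST_VECTORS and self_test are hypothetical, and the three expected digests are the
sample values published in the appendices of FIPS 180-1.

```python
import hashlib

# Sample messages and expected SHA-1 digests from the FIPS 180-1 appendices.
TEST_VECTORS = [
    (b"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"),
    (b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
     "84983e441c3bd26ebaae4aa1f95129e5e54670f1"),
    (b"a" * 1_000_000, "34aa973cd4c4daa4f61eeb2bdbad27316534016f"),
]


def self_test(hash_hex=lambda data: hashlib.sha1(data).hexdigest()) -> bool:
    """Return True if the given hash function reproduces every known digest."""
    return all(hash_hex(message) == expected for message, expected in TEST_VECTORS)


print("self-test passed" if self_test() else "self-test FAILED")
```

Passing a different callable as hash_hex allows the same vectors to be run against another
implementation of the SHA-1.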
Conclusion
In this document we have provided information concerning our selection of
implementations for the hashing algorithms CRC32, MD4, MD5, and SHA-1, as well as
background and information on the hashing algorithms themselves. Various products use
one or more of the hashing algorithms MD4, MD5, SHA-1, and CRC32. In summary,
moving from CRC32 to MD4 to MD5 to SHA-1, complexity increases, but so do
robustness and degree of error detection. Which algorithm to use (and which
implementation of an algorithm) depends on a number of factors, including degree of
security required and machine limitations.
Bibliography
(REF1) Webster's Ninth New Collegiate Dictionary, 1984, Merriam Webster Inc.
(REF2) A Painless Guide to CRC Error Detection Algorithms Index V3.00, September
1996, http://www.repairfaq.org/filipg/LINK/F_crc_v3.html
(REF3) PKZIP Program, http://www.pkzip.com
(REF4) WINZIP Program, http://www.winzip.com
(REF5) NIST FIPS 71, May 1980
(REF6) Snippets Collection, http://www.brokersys.com/snippets/
(REF7) efg's Computer Lab Mathematics, Cyclic Redundancy Code Calculator,
http://www.efg2.com/lab/Mathematics/CRC.htm
(REF8) Schneier, Bruce: Applied Cryptography, Second Edition, 1996, John Wiley &
Sons.
(REF9) IETF RFC1320, R.L.Rivest, "The MD4 Message Digest Algorithm", April 1992
(REF10) A.J.Menezes, P. van Oorschot, S.Vanstone, The Handbook of Applied
Cryptography, CRC Press, October 1996
(REF11) IETF RFC1321, R.L.Rivest, "The MD5 Message Digest Algorithm", April 1992
(REF12) NIST FIPS 186, Digital Signature Standard, U.S. Department of Commerce,
May 1994
(REF13) NIST FIPS 180-1, Secure Hash Standard, U.S. Department of Commerce, April
1995 (supersedes FIPS 180)
(REF14) Borland Corporation, http://www.borland.com
(REF15) Sun Microsystems Corporation, http://www.sun.com
(REF16) Microsoft Systems Journal, http://www.microsoft.com/msj
