By
DEPARTMENT OF
SESSION 2010-2011
CERTIFICATE
This is to certify that Mr. Akshay Sharma, Ms. Riddhi Surana and Mr. Swapnil Bhatnagar
the students of B.Tech(Information Technology) Sixth(VIth) Semester have submitted their
Seminar entitled “MD5 Encryption” under my guidance.
(Guide)
CERTIFICATE OF COMPLETION
This is to certify that Mr. Akshay Sharma, Ms. Riddhi Surana and Mr. Swapnil Bhatnagar
the students of B.Tech(Information Technology) Sixth(VIth) Semester have presented and have
successfully completed their Seminar entitled “MD5 Encryption” in the presence of
undersigned dignitaries.
(H.O.D)
ACKNOWLEDGEMENT
It is our profound privilege to express our deep sense of gratitude towards our institute, Sir Padampat Singhania University, Udaipur. We would also like to thank Prof. P.C. Deka, Vice-Chancellor (SPSU), and Prof. Arun Kumar, H.O.D. (CSE/IT), for having permitted us to carry out this project work. We take immense pleasure in thanking Mr. Deepak Gour and Mr. Sandeep Chourasia, major seminar in charge, for their encouragement and appreciation.
We wish to extend our sincere gratitude to our internal guide, Mr. Avinash Panwar, and the other faculty members for their able guidance and useful suggestions, which helped us to complete the report in time.
Finally, yet importantly, we would like to express our heartfelt thanks to our beloved parents for their blessings, and to our friends and classmates for their help and wishes for the successful completion of this report.
Before MD5, MD2 and MD4 were developed. MD2 was designed for 8-bit processors, whereas MD4 and MD5 were optimized for the 32-bit processors that are widely used in different sectors today. MD4 performed well but was not used on a large scale because of its security weaknesses, and MD5 was developed to strengthen the security of the algorithm. However, in 1996 it was discovered that MD5 was also vulnerable to some attacks. Although the flaw was not severe, it signalled the need for a better replacement. MD5 is therefore unsuitable for applications such as SSL certificates or digital signatures, which rely on collision resistance.
Rivest designed MD5 for use in digital signature applications. A digital signature program compresses a large file in a secure manner before it is encrypted with a private key under a public-key cryptosystem.
Today, the MD5 algorithm is widely used to assess the authenticity of files. A 128-bit message digest is computed from input data of any length; it is intended to be unique to that data and acts as its fingerprint. MD5 has been widely used in the software world to provide some assurance that a transferred file has arrived intact. Nowadays, however, it is quite easy to generate MD5 collisions: the person who created a file can create a second file with the same checksum, so this technique cannot protect against some forms of malicious tampering.
MESSAGE-DIGEST
ALGORITHM 5
TABLE OF CONTENTS
TITLE
1. Introduction
2. History of MD5
3. Hash Functions
4. MD5 hashes
5. Visual Examples
6. Cryptography
7. Different Cryptographic Hash Functions
8. MD5 – The Algorithm
9. PHP Syntax
10. Applications of MD5 algorithm
11. How to use MD5 Hash to check the Integrity of Files?
12. Validity check in UNIX
13. Why is MD5 still used widely?
14. Performance of MD5
15. Difference between MD5 and its predecessors
16. Limitations
a. Vulnerabilities
Collision vulnerabilities
Pre-image vulnerability
Other vulnerabilities
Conclusions
References
Introduction
MD5 is frequently mentioned in discussions of cryptography. MD stands for "Message-Digest" and describes a mathematical function that processes a variable-length string. The number 5 simply signifies that MD5 is the successor to MD4.

Message-Digest algorithm 5, more commonly known as MD5, is a cryptographic hash function that produces a 128-bit hash value. MD5 is used in a variety of security applications and is specified by the Internet Engineering Task Force (IETF) in RFC 1321. An MD5 hash is commonly expressed as a 32-digit hexadecimal number. For example, MD5 ("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6

MD5 was developed by Prof. Ronald L. Rivest of MIT and is the third in his series of message-digest algorithms. The other two, MD2 and MD4, are structurally similar to MD5. However, MD2 was designed for 8-bit machines, while the two more recent algorithms (MD4 and MD5) were designed for 32-bit computers.

The most common use of MD5 is to validate the authenticity of a file. Rivest developed MD5 with a view to its wide use in digital signature applications. Digital signature programs compress a large file in a secure manner before it is encrypted with a private key under a public-key cryptosystem.

It is very rare for a cryptographic algorithm to be proved perfect. Algorithms are examined and tested very carefully by cryptographers, but only a few stand the test of time. MD5 is no exception and has some known problems.

It was originally conjectured that producing two messages with the same message digest is computationally infeasible. Unfortunately, this did not hold, and MD5 is exposed to multi-collision attacks. Collisions have been found for several one-way hash algorithms. Of these, MD5 is the most problematic because of its heavy deployment. The weakness raises concerns for constructions built on MD5, such as HMAC and digital signatures within Digital Rights Management (DRM) systems.

Nevertheless, MD5 has been around for years and still provides a decent level of security for certain purposes. It has commonly been used to store passwords in databases: because an MD5 hash cannot be directly reversed, passwords stored in this format have been considered reasonably safe, although a fast, unsalted hash remains open to guessing attacks.

The word "Message-Digest" here implies a unique identification, or fingerprint, of a file. Since any small change in the file changes its hash string, MD5 is most commonly used to check the integrity of a file. The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, it does not require any large substitution tables, so the algorithm can be coded quite compactly.

MD5 is somewhat slower than MD4, but is more securely and conservatively designed.
History of MD5
MD5 was developed by Professor Ronald Rivest of MIT as part of his series of message-digest algorithms. When analysis indicated that its predecessor, MD4, was insecure and vulnerable, MD5 was designed in 1991 as a more secure and conservative replacement.

In 1993, den Boer and Bosselaers achieved a partial result: they found two different initialization vectors that produce an identical digest.

In 1996, Dobbertin announced a collision of the compression function of MD5. This was not an attack on the whole MD5 function, but it prompted cryptographers to recommend switching to a better replacement.

On March 18th, 2006, Klima published an algorithm capable of finding a collision in MD5 on a single notebook computer, using a method known as tunnelling.

In 2009, the United States Cyber Command used an MD5 hash of its mission statement as a part of its official emblem.

On December 24th, 2010, Tao Xie and Dengguo Feng announced the first published single-block MD5 collision (two 64-byte messages with the same MD5 hash). Previous collision discoveries relied on multi-block attacks. Xie and Feng did not disclose the new attack method. Instead, they challenged the cryptographic community, offering $10,000 to the first person to find a different 64-byte collision before January 1st, 2013.
Hash-function
A hash function is any well-defined function that converts a large amount of data into a small data representation.

The hash function returns a calculated value, known variously as a hash value, hash code, hash sum, checksum or simply a hash. Hash functions are most commonly used to speed up searching and comparison tasks, such as finding items in large databases or detecting similar or duplicate records in large files. Hash functions are also used in hash tables to locate data quickly and to build caches for large data sets.

MD5 Hashes
The 128-bit (16-byte) MD5 hashes, also termed message digests, are typically represented as a sequence of 32 hexadecimal digits. The hash of the zero-length string is:
MD5 ("") =
d41d8cd98f00b204e9800998ecf8427e
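As an illustration of the digest size and its hexadecimal form, here is a minimal sketch using the hashlib module from Python's standard library. Python is our choice for illustration and the helper name md5_hex is hypothetical; neither comes from the report.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 lowercase hexadecimal digits."""
    return hashlib.md5(data).hexdigest()

message = b"The quick brown fox jumps over the lazy dog"
raw_digest = hashlib.md5(message).digest()

print(len(message))      # length of the ASCII input in bytes
print(len(raw_digest))   # the raw digest is always 16 bytes (128 bits)
print(md5_hex(message))  # 9e107d9d372bb6826bd81d3542a419d6
print(md5_hex(b""))      # d41d8cd98f00b204e9800998ecf8427e

# Appending a single character yields a completely different digest.
print(md5_hex(message + b"."))
```

The last line shows why a digest works as a fingerprint: changing the input by one byte changes the hash.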
Cryptography
Cryptography, or cryptology, is the practice and study of hiding information. Modern cryptography draws on mathematics, computer science and electrical engineering. Its applications include ATM cards, computer passwords and electronic commerce.

Historically, cryptography was used almost exclusively for the conversion of messages (i.e. encryption). [5] The origin of cryptography probably goes back to the very beginning of human existence, as people tried to learn how to communicate. They consequently had to find means to guarantee secrecy as part of their communications. However, the first deliberate use of such technical methods may be attributed to the ancient Greeks, around 6 years BC: a stick named the "scytale" was used. [2]
The sender would roll a strip of paper around the stick and write his message longitudinally on it. He would then unroll the paper and send it to the addressee. Decrypting the message without knowledge of the stick's width, which acted as the secret key, was meant to be impossible. Later, the Romans used the Caesar cipher (a three-letter alphabet shift) to communicate. The main use of encryption was to ensure the privacy of communications, such as those of spies or diplomats. Today the field has made much progress, with new techniques such as message integrity checking, sender/receiver identity authentication, digital signatures and secure computation.

The modern field of cryptography has many areas of study. Public-key cryptography uses asymmetric key algorithms rather than symmetric key algorithms. Quantum cryptography exploits quantum mechanical effects. The Caesar cipher, also known as the shift cipher, is one of the simplest and most widely known encryption techniques. It is a substitution cipher in which each letter in the text is replaced by the letter a fixed number of positions further along the alphabet.

Within the context of any application-to-application communication, there are some specific security requirements, including:
Authentication: The process of proving one's identity. (The primary forms of host-to-host authentication on the Internet today are name-based or address-based, both of which are notoriously weak.)
Privacy/confidentiality: Ensuring that no one can read the message except the intended receiver.
Integrity: Assuring the receiver that the received message has not been altered in any way from the original.
Non-repudiation: A mechanism to prove that the sender really sent this message.
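The Caesar (shift) cipher mentioned in this section can be sketched in a few lines of Python. This is only an illustration: the function name caesar_shift and the sample message are our own, and the shift of three follows the "three letter alphabet shift" described above.

```python
import string

def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)  # spaces, digits and punctuation are left unchanged
    return "".join(result)

ciphertext = caesar_shift("ATTACK AT DAWN", 3)
plaintext = caesar_shift(ciphertext, -3)  # shifting back by the same amount decrypts
print(ciphertext)  # DWWDFN DW GDZQ
print(plaintext)   # ATTACK AT DAWN
```

The shift value plays the role of the secret key. With only 25 useful keys the cipher is trivially broken by trying them all, which is why it serves only as a teaching example.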
Different Cryptographic Hash Functions:
MD5 ALGORITHM
The MD5 algorithm is an extension of the MD4 message-digest algorithm.

MD5 is a block-chained digest algorithm, computed over the data in 512-bit (64-byte) blocks organized as little-endian 32-bit words (Figure). The first block is processed with an initial seed, resulting in a digest that becomes the seed for the next block. When the last block is computed, its digest is the digest for the entire stream. This chained seeding prohibits parallel processing of the blocks.

The MD5 algorithm uses four rounds, each applying one of four non-linear functions to the sixteen 32-bit words of a 512-bit block of source text. The result is a 128-bit digest. Figure 1 is a graph representation that illustrates the structure of the MD5 algorithm.
The Algorithm
[3][4]
The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given pre-specified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.

The four non-linear functions each take three 32-bit words as input and produce one 32-bit word as output:
F(X, Y, Z) = (X ∧ Y) ∨ (¬X ∧ Z)
G(X, Y, Z) = (X ∧ Z) ∨ (Y ∧ ¬Z)
H(X, Y, Z) = X ⊕ Y ⊕ Z
I(X, Y, Z) = Y ⊕ (X ∨ ¬Z)
where ⊕, ∨, ∧ and ¬ denote the XOR, OR, AND and NOT operations respectively.

Here, we will also use a 64-element table T [1 ... 64] constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs (sin (i)), where i is in radians.

The block-processing loop of step 4 is given below in pseudocode. In step 5 (Output), the message digest is A, B, C, D, beginning with the low-order byte of A and ending with the high-order byte of D.
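As a small sketch of how the sine-derived table T can be generated, the following Python snippet applies the formula given above (the integer part of 4294967296 times abs(sin(i)), with i in radians). The function name md5_sine_table is hypothetical, and Python lists are zero-based, so T[1] of the text is stored at index 0.

```python
import math

def md5_sine_table() -> list:
    """Return [T[1], ..., T[64]] with T[i] = integer part of 2**32 * |sin(i)|."""
    return [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]

table = md5_sine_table()
print(len(table))      # 64 entries
print(hex(table[0]))   # T[1]  = 0xd76aa478
print(hex(table[63]))  # T[64] = 0xeb86d391
```

Deriving the constants from a well-known function means they do not need to be stored as a large table, and it makes clear that the values were not specially chosen.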
/* Process each 16-word block. */
For i = 0 to N/16-1 do
    /* Copy block i into X. */
    For j = 0 to 15 do
        Set X[j] to M[i*16+j].
    end /* of loop on j */

    /* Save A as AA, B as BB, C as CC, and D as DD. */
    AA = A
    BB = B
    CC = C
    DD = D

    /* Rounds 1 to 4 are applied here: sixteen operations per round on
       A, B, C, D, using the functions F, G, H, I, the words X[0..15]
       and the table entries T[1..64]. */

    /* Then perform the following additions. (That is, increment each of
       the four registers by the value it had before this block was started.) */
    A = A + AA
    B = B + BB
    C = C + CC
    D = D + DD
end /* of loop on i */
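To make the block loop above concrete, here is a compact Python sketch of the whole algorithm, following RFC 1321. It is meant for study only, not as a replacement for a library implementation. The name md5_sketch and the helper rotl are our own, and the padding step and initial register values are taken from RFC 1321 rather than from the text above.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-operation left-rotation amounts for rounds 1 to 4 (from RFC 1321).
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Sine-derived constants; index i here holds T[i + 1] of the text.
T = [int(4294967296 * abs(math.sin(i + 1))) for i in range(64)]

def rotl(x: int, n: int) -> int:
    """Rotate the 32-bit value x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def md5_sketch(message: bytes) -> str:
    # Padding: a single 1 bit, zeros up to 56 bytes mod 64, then the
    # original length in bits as a 64-bit little-endian integer.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", (8 * len(message)) & 0xFFFFFFFFFFFFFFFF)

    a, b, c, d = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    for offset in range(0, len(padded), 64):          # each 16-word block
        x = struct.unpack("<16I", padded[offset:offset + 64])
        aa, bb, cc, dd = a, b, c, d                   # save A, B, C, D
        for i in range(64):
            if i < 16:                                # round 1: function F
                f, k = (b & c) | (~b & d), i
            elif i < 32:                              # round 2: function G
                f, k = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:                              # round 3: function H
                f, k = b ^ c ^ d, (3 * i + 5) % 16
            else:                                     # round 4: function I
                f, k = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + T[i] + x[k]) & MASK
            a, d, c = d, c, b
            b = (b + rotl(f, SHIFTS[i])) & MASK
        # Increment each register by the value it had before this block.
        a, b, c, d = (a + aa) & MASK, (b + bb) & MASK, (c + cc) & MASK, (d + dd) & MASK

    # Output: A, B, C, D, low-order byte of A first.
    return struct.pack("<4I", a, b, c, d).hex()

sample = b"The quick brown fox jumps over the lazy dog"
print(md5_sketch(sample))
print(md5_sketch(sample) == hashlib.md5(sample).hexdigest())
```

Comparing the result with hashlib.md5 is a quick way to check the sketch. Because each block starts from the registers left by the previous one, the blocks cannot be processed in parallel.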
if (md5($_POST["pass"]) === $real_pass) {
    // Password correct ($real_pass is assumed to hold the stored MD5 hash)
    // Set cookies, redirect, display page
} else {
    // Password incorrect
}
Applications of MD5
Validity check in UNIX

md5sum -c file.md5

find -s directory -type f -exec md5sum {} \; >> file.md5

This stores the checksums of all the files inside the directory in file.md5. We can now check the copied directory against this file. Go to the location of the copied directory and issue:

2. md5sum -c /path/to/file.md5

3. Alternatively, we can install md5deep, which has a recursive option.

md5deep -rl directory > file.md5

Now go to the location of the copied directory and issue:

md5sum -c /path/to/file.md5

Why is MD5 still used widely?

The weaknesses of MD5 are well documented, yet it remains widely deployed. MD5 is used as a checksum hash function because it is very quick to compute and accidental collisions are extremely rare. For a simple checksum, the possibility of a collision is not a big problem.

An MD5 checksum is composed of 32 hexadecimal digits (128 bits), so two randomly chosen inputs collide with odds of about 1 in 2^128, or roughly 1 in 3.40e38.

An MD5 hash is also easy to read because it is short. For non-critical tasks an MD5 hash is good enough. For example, if we download an e-book from a trusted mirror and want to check whether the downloaded file is correct, we can generate the MD5 hash of the file and compare it with the hash published by the mirror. If both hashes match, the e-book has been downloaded correctly and completely.

For cryptography, MD5 is a valid alternative only if security is a moderate concern. It has been considered a viable option for hashing database passwords or other fields requiring internal security, mostly for its speed, but also because MD5 offers a reasonable level of security where strong encryption is not a concern.

MD5 security can be improved by salting the input before hashing it.
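The download check described above can be sketched in Python as follows. The helper names file_md5 and verify_file are hypothetical, and a temporary file stands in for the downloaded e-book. The expected hash would normally be the one published by the trusted mirror.

```python
import hashlib
import os
import tempfile

def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_file(path: str, expected_hex: str) -> bool:
    """Compare the file's MD5 digest with a published hexadecimal digest."""
    return file_md5(path) == expected_hex.strip().lower()

# Demonstration with a throwaway file standing in for the download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"The quick brown fox jumps over the lazy dog")
    demo_path = tmp.name

matches = verify_file(demo_path, "9e107d9d372bb6826bd81d3542a419d6")
mismatch = verify_file(demo_path, "d41d8cd98f00b204e9800998ecf8427e")
os.remove(demo_path)
print(matches, mismatch)
```

Reading the file in chunks keeps memory use constant for large downloads. A match only shows that the file was not corrupted in transit; it does not protect against an attacker who can replace both the file and the published hash.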
Figure above Demonstrating the MD5 hashes in Database.
Figure Showing a Windows Application that generates MD5 Hashes to
compare two files.
__________________________________________________________________________________________
         Key size/hash size    Extrapolated Speed    PRB Optimized
__________________________________________________________________________________________
RSA      512                   7                     -
__________________________________________________________________________________________
The above figure shows the key size (or, more specifically, the size of the result) for some secure digest functions and the speed of some of the most popular encryption algorithms, for both proprietary and optimized assembly-language implementations. As the figure shows, MD5 performs much faster than all the other algorithms in both cases.