
Implementation of the AES Algorithm in Parallel with Different Approaches and Their Performance


Zeru Kifle (GSE/1772/10), Email: zmskifle@gmail.com
Eleni Megerssa (GSE/4313/10), Email: elenimegerssa@gmail.com

Abstract— Today, much of the information that was once transferred manually from source to destination is shared electronically over the Internet in day-to-day activity. Digital information transferred over the Internet without an advanced security mechanism is vulnerable to malware attacks and information leakage, and it can be accessed by unauthorized users. To avoid these security issues, researchers select and implement advanced cryptographic algorithms. AES is one such algorithm. It requires a large amount of mathematical computation to encrypt and decrypt data, and in doing so it protects both the data and the transmission channel.

Implementing the AES algorithm sequentially takes a long time and is not efficient for real-time applications that need fast encryption and decryption. Several researchers have therefore implemented AES in parallel using different approaches, which we discuss in Section 4.

The aim of this paper is to study research papers that implement AES in parallel with different approaches. We compare the performance improvement that each approach achieves in AES encryption and decryption. Finally, we identify the loopholes in the reviewed papers and suggest directions for future work.

Keywords— Advanced Encryption Standard (AES), Application Programming Interface (API), Compute Unified Device Architecture (CUDA), Data Encryption Standard (DES), Graphics Processing Unit (GPU), Message Passing Interface (MPI)

1. INTRODUCTION
Today most users transfer information from source to destination over the Internet in the form of digital data. This information must be kept secure from unauthorized access, use, recording, disruption, modification, or deletion without the permission of the sender and the receiver. Securing data means preserving its availability, integrity, and confidentiality, and cryptographic algorithms are used for this purpose. AES is a symmetric block cipher that can encrypt and decrypt information, thus providing a high level of security to electronic data.

Traditionally, the Advanced Encryption Standard (AES) encrypts and decrypts digital data sequentially. When the input data becomes large, AES needs substantial computational resources and the encryption/decryption process becomes very slow. Therefore, in addition to data and channel security, we must consider the response time of the algorithm in real-time systems.

To solve these problems, all the papers discussed here implement AES in parallel rather than sequentially. We discuss them in Section 4:
- "Performance Analysis of Parallel Implementation of Advanced Encryption Standard (AES) over Serial Implementation"
- "Parallelization of AES Algorithm Using OpenMP"
- "Accelerating Encryption/Decryption Using GPUs for AES Algorithm"
- "Parallel Implementation of AES Algorithm on GPU"
- "An Optimized Parallel Computation of Advanced Encryption Algorithm Using OpenMP"

The performance of AES improves when it is implemented in parallel using different parallelization mechanisms. We also observe that implementing AES in parallel on a GPU is less costly and more efficient than using a multi-core system.

This paper is organized as follows. Section 2 describes the AES algorithm. Section 3 briefly explains the parallelization tools used in the reviewed papers. Section 4 summarizes the reviewed papers, whose titles are listed above, and analyzes their results. Section 5 concludes the paper.

2. AES ALGORITHM DESCRIPTION
In cryptography, the Advanced Encryption Standard (AES), based on the Rijndael algorithm, is a symmetric block cipher that translates plaintext into ciphertext in blocks. AES processes a fixed input block size of 128 bits using a cipher key length of 128, 192, or 256 bits, as specified by the FIPS standard (FIPS 197).

AES Cipher:
Rijndael can be specified with block and key sizes in any multiple of 32 bits, with a minimum of 128 bits and
a maximum of 256 bits. Rijndael was designed to support these additional block sizes and key lengths, but AES adopts only the 128-bit block size. A number of AES parameters depend on the key length. Table 1 shows the commonly used input block size with the corresponding encryption/decryption key length in the AES algorithm.

Parameter                                  AES-128      AES-192      AES-256
Key size (words/bytes/bits)                4/16/128     6/24/192     8/32/256
Plaintext block size (words/bytes/bits)    4/16/128     4/16/128     4/16/128
Number of rounds                           10           12           14
Round key size (words/bytes/bits)          4/16/128     4/16/128     4/16/128
Expanded key size (words/bytes)            44/176       52/208       60/240
Table 1: AES Parameters

The encryption and decryption processes consist of a number of different transformations. These are applied consecutively to the data block in a fixed number of iterations, called rounds (Nr). Each round consists of several processing steps, including one that depends on the encryption key. A set of reverse rounds is applied to transform the ciphertext back into the original plaintext using the same encryption key. The round keys produced by key expansion are used in reverse order, so the last round key of encryption is used in the first round of decryption. The middle rounds are repeated Nr-1 times.

The algorithm for the encryption/decryption process is shown in Figure 1 [3].

Figure 1: AES Algorithm, High-Level Description of Encryption and Decryption

I. The encryption process, which converts plaintext to ciphertext, consists of the following four phases:
1. Key Expansion:
Key expansion takes the input key of 128, 192, or 256 bits and produces an expanded key for use in the subsequent stages. The size of the expanded key is related to the number of rounds to be performed. For 128-bit keys, the expanded key size is 1408 bits (176 bytes). For 192-bit and 256-bit keys, it is 1664 bits (208 bytes) and 1920 bits (240 bytes), respectively. The expanded key is used in the subsequent phases of the algorithm. During each round, a different portion of the expanded key is used in the AddRoundKey step [3].
2. Initial Round
a) AddRoundKey: During this stage, the state (the current message block) is combined with the appropriate portion of the expanded key. Each byte of the state is combined with the round key using bitwise XOR [3].
3. Rounds
a) SubBytes: During this stage, each byte of the block is replaced using an 8-bit substitution box (S-box). This non-linear transformation helps avoid attacks based on algebraic manipulation [3].
b) ShiftRows: This stage cyclically shifts the bytes in each row of the state by a certain offset. For the 128-bit AES block, the first row (32 bits) is left unchanged, and the subsequent 32-bit rows are shifted left by 1, 2, and 3 bytes, respectively [3].
c) MixColumns: Each 4-byte column of the state is multiplied by a fixed matrix over GF(2^8), which mixes the bytes within the column.
d) AddRoundKey
4. Final Round (no MixColumns)
a) SubBytes
b) ShiftRows
c) AddRoundKey
II. The decryption process, which converts ciphertext to plaintext, consists of the following four phases:
1. Key Expansion
2. Initial Round
a) AddRoundKey
3. Middle Rounds
a) Inverse ShiftRows
b) Inverse SubBytes
c) AddRoundKey
d) Inverse MixColumns
4. Final Round
a) Inverse ShiftRows
b) Inverse SubBytes
c) AddRoundKey
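The following Python sketch lets the encryption phases listed above be traced in code. It encrypts one 16-byte block with AES-128, covering:
- key expansion;
- the initial AddRoundKey;
- Nr-1 rounds of SubBytes, ShiftRows, MixColumns and AddRoundKey;
- a final round without MixColumns.

It is our own illustration under stated assumptions, not code from the reviewed papers, and all function names are hypothetical. The S-box is computed from its GF(2^8) definition rather than hard-coded, and only 128-bit keys are handled. Decryption is omitted; it would apply the inverse steps with the round keys in reverse order. The sketch is not constant-time and should not be used to protect real data.

```python
# Minimal AES-128 encryption of one 16-byte block, following the FIPS 197 round structure.
# Educational sketch: not constant-time, no padding or modes of operation, no decryption.
from typing import List


def xtime(b: int) -> int:
    """Multiply a byte by {02} in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def ginv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    result, power, e = 1, a, 254
    while e:
        if e & 1:
            result = gmul(result, power)
        power = gmul(power, power)
        e >>= 1
    return result if a else 0


def affine(v: int) -> int:
    """S-box affine step: XOR of v with four left rotations of v, plus the constant 0x63."""
    def rotl(x: int, n: int) -> int:
        return ((x << n) | (x >> (8 - n))) & 0xFF
    return v ^ rotl(v, 1) ^ rotl(v, 2) ^ rotl(v, 3) ^ rotl(v, 4) ^ 0x63


SBOX = [affine(ginv(a)) for a in range(256)]  # computed once instead of hard-coding the table


def expand_key(key: bytes) -> List[bytes]:
    """AES-128 key expansion: 16-byte key -> 11 round keys of 16 bytes (176 bytes in total)."""
    if len(key) != 16:
        raise ValueError("this sketch supports 128-bit keys only")
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]       # RotWord
            temp = [SBOX[b] for b in temp]   # SubWord
            temp[0] ^= rcon                  # round constant
            rcon = xtime(rcon)
        words.append([x ^ y for x, y in zip(words[i - 4], temp)])
    return [bytes(b for w in words[4 * r:4 * r + 4] for b in w) for r in range(11)]


# The state holds 16 bytes in column-major order: index = row + 4 * column.
def sub_bytes(s: List[int]) -> List[int]:
    return [SBOX[b] for b in s]


def shift_rows(s: List[int]) -> List[int]:
    # Row r is rotated left by r positions; row 0 is unchanged.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s: List[int]) -> List[int]:
    # Each column is multiplied by the matrix [[2,3,1,1],[1,2,3,1],[1,1,2,3],[3,1,1,2]].
    out: List[int] = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
                a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
                a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
                gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2)]
    return out


def add_round_key(s: List[int], k: bytes) -> List[int]:
    return [b ^ kb for b, kb in zip(s, k)]


def encrypt_block(block: bytes, key: bytes) -> bytes:
    """Initial AddRoundKey, Nr-1 = 9 full rounds, then a final round without MixColumns."""
    if len(block) != 16:
        raise ValueError("AES block must be 16 bytes")
    round_keys = expand_key(key)
    state = add_round_key(list(block), round_keys[0])
    for rnd in range(1, 10):
        state = add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_keys[rnd])
    state = add_round_key(shift_rows(sub_bytes(state)), round_keys[10])
    return bytes(state)
```

Use the FIPS 197 example key `000102030405060708090a0b0c0d0e0f` and plaintext `00112233445566778899aabbccddeeff`. With these, `encrypt_block` returns the ciphertext `69c4e0d86a7b0430d8cdb78070b4c55a`. `expand_key` returns 11 round keys totalling 176 bytes, which matches the AES-128 expanded key size in Table 1.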
3. PARALLELIZATION SOFTWARE TOOLS AND ACCELERATORS USED IN THIS PAPER WITH SHORT EXPLANATION
Many programming APIs are available that support parallel program development. Below, we briefly explain the parallel programming APIs and accelerators used in the reviewed papers. The researchers used these tools to implement the Advanced Encryption Standard (AES) in parallel on multi-core systems and graphics processing units (GPUs). The papers themselves are discussed in Section 4.

A. MPI
MPI (Message Passing Interface) is a standard parallel programming model for distributed-memory programs. MPI programs use it to send and receive data between processes running on different processors.

B. CUDA
CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). CUDA consists of a compiler and a toolkit for programming NVIDIA GPUs, and the CUDA API extends the C programming language. CUDA programs run on thousands of threads, and the model is scalable. It supports C and C++, NVIDIA GPUs can also be programmed with OpenCL, and it is compatible with Windows, Linux, and OS X. A major design goal of CUDA is to support heterogeneous computation: the serial parts of an application are executed on the CPU and the parallel parts on the GPU [3].

C. OpenMP API
The OpenMP API (Open Multi-Processing application programming interface) is a tool that supports shared-memory multiprocessing. It is defined by a group of computer hardware and software vendors. OpenMP is a scalable and portable model for developers of shared-memory parallel applications. It supports the C, C++, and Fortran programming languages on several platforms, such as Linux and Windows.

Every OpenMP program starts execution with a single thread, called the master thread. The master thread executes the program until it encounters a parallel construct. At that point it forks into a number of lightweight parallel threads, so OpenMP uses a fork-join model for program execution. The statements inside the parallel region are executed in parallel by these threads. After the parallel region finishes, the group of parallel threads is terminated and the master thread continues its execution [2].

D. GPU
A graphics processing unit (GPU) was previously dedicated to visual processing. Nowadays GPUs are also used for general-purpose computation. Their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms that process large blocks of data in parallel. The GPU acts as a co-processor to the CPU, reducing the CPU's load and improving processing time. For such data-parallel workloads GPUs are significantly faster than CPUs, and their memory interfaces can move much more data than a CPU's. A GPU accelerates applications running on the CPU by offloading time-consuming computation to the GPU and its memory.

CPUs and GPUs have fundamentally different design philosophies. GPU design has been driven by the fast-growing video game industry, which exerts tremendous economic pressure for the ability to perform a massive number of floating-point calculations per video frame in advanced games. The general philosophy of GPU design is to optimize for the execution of a massive number of threads [6].
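GPU designs favor a massive number of threads. One common way to map AES onto such hardware is to assign one thread to each independent 16-byte block. We assume this mapping for illustration; the CUDA-based papers in Section 4 do not describe their mapping here.

The Python sketch below only emulates the CUDA index arithmetic. It computes the global thread index `blockIdx.x * blockDim.x + threadIdx.x`, the grid size needed for a given input, and the byte range each thread would encrypt. No GPU code runs, and the helper names and the default of 256 threads per thread block are hypothetical choices. The mapping is only valid when blocks do not depend on each other, as in ECB or CTR mode. CBC encryption chains every block to the previous ciphertext and cannot be split this way.

```python
# Emulates how a CUDA-style launch could assign one GPU thread per 16-byte AES block.
# Only the index arithmetic is reproduced; nothing here runs on a GPU.
from typing import List, Optional, Tuple

AES_BLOCK_BYTES = 16


def launch_config(n_bytes: int, threads_per_block: int = 256) -> Tuple[int, int]:
    """Return (number of AES blocks, grid size) when each thread handles one AES block.

    The input is assumed to be padded up to a whole number of 16-byte blocks.
    """
    n_aes_blocks = -(-n_bytes // AES_BLOCK_BYTES)       # ceiling division
    grid_size = -(-n_aes_blocks // threads_per_block)   # thread blocks needed
    return n_aes_blocks, grid_size


def thread_byte_range(block_idx: int, block_dim: int, thread_idx: int,
                      n_aes_blocks: int) -> Optional[Tuple[int, int]]:
    """Byte range [start, end) for one thread, or None for surplus threads.

    Mirrors the usual CUDA global index: blockIdx.x * blockDim.x + threadIdx.x.
    """
    gid = block_idx * block_dim + thread_idx
    if gid >= n_aes_blocks:  # the last thread block may be only partly used
        return None
    start = gid * AES_BLOCK_BYTES
    return start, start + AES_BLOCK_BYTES


def emulate_grid(n_bytes: int, threads_per_block: int = 256) -> List[Tuple[int, int]]:
    """Visit every (thread block, thread) pair serially and collect the ranges they cover."""
    n_aes_blocks, grid_size = launch_config(n_bytes, threads_per_block)
    ranges = []
    for b in range(grid_size):
        for t in range(threads_per_block):
            r = thread_byte_range(b, threads_per_block, t, n_aes_blocks)
            if r is not None:
                ranges.append(r)
    return ranges
```

For example, take 10 MB as 10 × 1024 × 1024 bytes, as for the smallest input in Table 4. Then `launch_config(10 * 1024 * 1024)` gives 655,360 AES blocks and a grid of 2,560 thread blocks of 256 threads each.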
Figure 2: CPU and GPU design architecture

Figure 2 shows the architecture of a typical GPU. It is organized into 16 highly threaded streaming multiprocessors (SMs), and a pair of SMs forms a building block. Each SM has eight streaming processors (SPs), for a total of 128 (16 × 8). Each SP has a multiply-add (MAD) unit and an additional multiply (MUL) unit. Each GPU comes with 1.5 gigabytes of DRAM.

These DRAMs differ from the system-memory DIMM DRAMs on the motherboard in that they are essentially the frame buffer memory used for graphics. For graphics applications, they hold high-definition video images and texture information for 3D rendering, as in games. For computing, however, they function like a very-high-bandwidth off-chip cache, though with somewhat more latency than a regular cache or system memory. If the chip is programmed properly, the high bandwidth makes up for the large latency [3].

4. PAPER REVIEW

4.1. Paper by Uzzal Kumar Prodhan, A.H.M. Shahariar Parvez, Md. Ibrahim Hussain, Yeasir Fathah Rumi, and Md. Ali Hossain titled “Performance Analysis of Parallel Implementation of Advanced Encryption Standard (AES) over Serial Implementation”

This paper addresses the problems that arise when the cryptographic algorithm (AES) is implemented serially. A serial implementation of AES is efficient only on a single processor. In addition, an implementation fixed in hardware cannot respond when a flaw is discovered or the standard changes.

For input files larger than 10^8, serial encryption and decryption would require several days. Implementing AES in parallel with MPI, starting from two processors, reduces this execution time significantly, and performance keeps increasing as the number of processors increases.

The loophole in this paper is that the researchers do not address factors that affect the parallel implementation of AES. These include the architecture of the parallel computer, the number of processors, and the interconnection network.

Figure 3, taken from [1], shows the experimental results of parallel (MPI) versus serial AES encryption and decryption performance with a 128-bit key.

Figure 3: Speedup of AES in parallel with 128-bit key size

4.2. Paper by S.S. Navalgund, Akshay Desai, Krishna Ankalgi, and Harish Yamanur titled “Parallelization of AES Algorithm Using OpenMP”

In this paper, the researchers implement an optimized parallel architecture of the AES algorithm at both the data level and the control level in C, which is easily parallelized. They use the Intel VTune™ Amplifier XE 2013 performance analyzer to find the most time-consuming parts of the AES algorithm.
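The data-level parallelism mentioned above can be illustrated with a fork-join sketch in Python, written under our own assumptions:
- the message is split into 16-byte blocks;
- the block indices are divided into contiguous, near-equal ranges, similar in spirit to OpenMP's `schedule(static)`;
- each range is handled by one worker thread;
- the results are joined in their original order.

The helper names `static_chunks` and `parallel_blocks` and the pluggable `block_fn` are hypothetical. The approach is only valid when blocks are independent, as in ECB or CTR mode. In standard CPython builds the global interpreter lock prevents pure-Python threads from running in parallel, so the sketch shows the decomposition rather than a real speedup.

```python
# Fork-join over independent AES blocks with a static, contiguous partition of the work.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

BLOCK = 16  # AES block size in bytes


def static_chunks(n_blocks: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split block indices [0, n_blocks) into contiguous, near-equal ranges, one per worker."""
    base, extra = divmod(n_blocks, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # the first `extra` workers get one more block
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_blocks(data: bytes, block_fn: Callable[[bytes], bytes], n_workers: int = 4) -> bytes:
    """Apply block_fn to every 16-byte block; blocks must be independent (e.g. ECB or CTR)."""
    if len(data) % BLOCK:
        raise ValueError("data length must be a multiple of 16 bytes (pad first)")
    n_blocks = len(data) // BLOCK

    def work(rng: Tuple[int, int]) -> bytes:
        lo, hi = rng
        return b"".join(block_fn(data[i * BLOCK:(i + 1) * BLOCK]) for i in range(lo, hi))

    # Fork: one task per contiguous range. Join: map() yields results in submission order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return b"".join(pool.map(work, static_chunks(n_blocks, n_workers)))
```

Pass a single-block AES function as `block_fn`. `parallel_blocks` then produces the same output as applying that function to each block in a sequential loop. In C, the same split is usually expressed with `#pragma omp parallel for schedule(static)` over the block index.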
In general, the results for sequential and parallel code at different input file sizes show that the parallel AES implementation performs better than the sequential one. For input files smaller than 5 KB, however, the difference is not visible. Implementing the AES algorithm in parallel on a CPU is still expensive. Table 2, taken from [2], shows that the OpenMP parallel implementation of AES performs better than the sequential implementation on the CPU.

File size    Sequential code (s)    Parallelized code (s)
5 KB         0.045                  0.025
10 KB        0.072                  0.058
20 KB        0.128                  0.104
40 KB        0.245                  0.211
50 KB        0.334                  0.279
100 KB       0.770                  0.6698
200 KB       1.343                  1.002
400 KB       2.461                  1.187
800 KB       5.439                  4.238
1 MB         6.799                  5.010
2 MB         13.649                 11.483
5 MB         32.37                  27.3890
Table 2: AES encryption and decryption execution time on CPU, sequential vs. OpenMP-parallelized code [2]

4.3. Paper by Sanjanaashree P titled “Accelerating Encryption/Decryption Using GPUs for AES Algorithm”

In this paper, the researcher studies the GPU as an optimized parallel computing architecture for implementing AES in parallel instead of using the CPU. AES is accelerated on an NVIDIA GPU using CUDA, the parallel programming model created by NVIDIA.

The results in this paper show that implementing AES in parallel on a GPU improves the performance of encryption and decryption compared with the CPU. It also costs less than parallelizing AES on a CPU or multi-core system. The GPU offers high performance and high throughput when AES processes large input files, and it is cheaper than using a CPU, which processes large amounts of data with low throughput. Table 3, taken from [3], shows that the CUDA-based parallel implementation of AES outperforms the CPU.

S.No  Input file  CPU encryption  CPU decryption  GPU encryption  GPU decryption
      size (KB)   time (ms)       time (ms)       time (ms)       time (ms)
1     23          110.00          3272.00         30.6846         34.628
2     58          420.00          20500           30.786          34.551
3     115         770.00          440128          32.134          36.0249
4     457         4010.00         6347.00         39.4578         44.113
5     914         8870.00         10347.00        78.4082         88.1310
Table 3: Execution time of the parallel AES implementation on an NVIDIA GPU vs. CPU [3]

4.4. Paper by Sanket Wagh, Pawan Phad, and Amit Surwade titled “Parallel Implementation of AES Algorithm on GPU”

In this paper, the researchers study the efficiency of implementing the cryptographic algorithm (AES) in parallel on GPUs from any vendor using the CUDA platform. This research addresses the vendor-specific limitation of the previous paper, which targeted only NVIDIA GPUs. The drawback of implementing AES in parallel on a GPU is that the GPU cannot produce correct results when the data processed on it depend on each other. Table 4 shows the results of the AES implementation on a GPU.

File size  Encryption on  Encryption on  Decryption on  Decryption on
           CPU (ms)       GPU (ms)       CPU (ms)       GPU (ms)
10 MB      6705           983.353        13743          987.341
20 MB      12868          1916           26076          1920
100 MB     68934          10257          144102         1020
200 MB     123632         177334         231973         17765
300 MB     189748         26000          364066         27103
Table 4: Results of the parallel AES implementation on GPU [4]

4.5. Paper by M.Tech Computer Science and Engineering, RCOEM Nagpur, Maharashtra titled “An Optimized Parallel Computation of Advanced Encryption Algorithm using OpenMP”

In this paper, the researchers design a flexible parallel algorithm for AES using OpenMP on a multi-core system that adapts to the input file size. When the input file is small, the performance benefit of parallel AES is not visible. To solve this issue, they implement a switching algorithm with OpenMP that selects the execution mode based on the input file or data size.

They determine a threshold value by trial and error. If the input file size is below the threshold, the sequential AES algorithm is activated. If it is above the threshold, the parallel AES algorithm is activated. The loophole in this paper is that this threshold, which decides whether parallel or sequential execution of AES is activated, is found by trial and error rather than calculated.
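The switching rule described above can be sketched as follows. The sketch also shows one possible way to derive the threshold from timing measurements instead of by trial and error, in the spirit of the automatic threshold recommended in Section 5. This derivation is our own assumption, not the method of the paper in 4.5, and the function names are hypothetical. Each measurement is a tuple `(input_size, t_sequential, t_parallel)`. The threshold is the smallest measured size from which the parallel version is faster at that size and at every larger measured size.

```python
# Input-size-based switching between sequential and parallel AES execution.
from typing import Iterable, Optional, Tuple


def threshold_from_measurements(samples: Iterable[Tuple[float, float, float]]) -> Optional[float]:
    """Smallest measured size from which parallel stays faster, or None if it is not
    faster even at the largest measured size. Samples are (size, t_seq, t_par)."""
    threshold = None
    for size, t_seq, t_par in sorted(samples, reverse=True):  # largest size first
        if t_par < t_seq:
            threshold = size
        else:
            break  # parallel stops winning below this size
    return threshold


def choose_mode(input_size: float, threshold: Optional[float]) -> str:
    """'sequential' below the threshold (or when no threshold exists), otherwise 'parallel'."""
    if threshold is None or input_size < threshold:
        return "sequential"
    return "parallel"
```

Applied to the Table 2 timings (sizes in KB), `threshold_from_measurements` returns 5, because the parallel code is faster at every measured size. It cannot say anything about inputs below 5 KB, which were not measured.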

5. CONCLUSION
In this paper, we reviewed research papers that parallelize the AES algorithm with parallel programming APIs on multi-core systems and GPUs. We compared their performance according to the methods they use and the results they obtain. All papers reviewed in Section 4 show that, when the input file is large, running AES in parallel greatly reduces the encryption and decryption time. The last paper reviewed in Section 4 adds a further feature: AES runs in one of two modes depending on the input file size. We recommend that future work calculate the threshold value automatically and implement the AES algorithm on GPUs with OpenCL.

ACKNOWLEDGMENT
We would like to express our deepest gratitude to our instructor, Mr. Fitsum Assamnew, for encouraging us to explore parallel computing purposefully, from which we learned a great deal.

REFERENCES
[1] Uzzal Kumar Prodhan, A.H.M. Shahariar Parvez, Md. Ibrahim Hussain, Yeasir Fathah Rumi, and Md. Ali Hossain, “Performance Analysis of Parallel Implementation of Advanced Encryption Standard (AES) over Serial Implementation,” International Journal of Information Sciences and Techniques (IJIST), Vol. 2, No. 6, November 2012.
[2] S. S. Navalgund, Akshay Desai, Krishna Ankalgi, and Harish Yamanur, “Parallelization of AES Algorithm Using OpenMP,” Lecture Notes on Information Theory, Vol. 1, No. 4, December 2013.
[3] Sanjanaashree P, “Accelerating Encryption/Decryption Using GPUs for AES Algorithm,” International Journal of Scientific & Engineering Research, Vol. 4, Issue 2, February 2013, ISSN 2229-5518.
[4] Sanket Wagh, Pawan Phad, and Amit Surwade, “Parallel Implementation of AES Algorithm on GPU,” IJCSMC: A Monthly Journal of Computer Science and Information Technology, Vol. 4, Issue 3, March 2015, pp. 247-252.
[5] M.Tech Computer Science and Engineering, RCOEM Nagpur, Maharashtra, “An Optimized Parallel Computation of Advanced Encryption Algorithm using OpenMP - A Review,” International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 2, February 2016.
[6] http://www.ncsa.illinois.edu/People/kindr/projects/hpca/files/singapore_p1.pdf
[7] http://mpitutorial.com/tutorials/mpi-introduction/
[7] http://mpitutorial.com/tutorials/mpi-introduction/
