
Implementation of the AES Algorithm in Parallel with Different Approaches and Their Performance

Zeru Kifle

GSE/1772/10

and

Eleni Megerssa

GSE/4313/10

Email: elenimegerssa@gmail.com

Abstract: Today, most manual transfer of information from source to destination has been replaced in day-to-day activity by electronic or digital information sharing over the Internet. Digital information sent over the Internet without an advanced security mechanism is easily exposed to malware attacks and information leakage, and it can be accessed by unauthorized users. To avoid these security issues, researchers select and implement advanced cryptographic algorithms. AES is one such algorithm. It requires a large number of mathematical computations for encryption and decryption in order to protect data and secure the transmission channel.

A sequential implementation of AES takes a long time and is not efficient for real-time applications that need fast encryption and decryption. For this reason, several researchers have implemented AES in parallel using different approaches, which we discuss in Section 4.

The aim of this paper is to study research papers that implement AES in parallel with different approaches and to compare the performance improvement each approach achieves for AES encryption and decryption. Finally, we identify the loopholes in the reviewed papers and suggest directions for future work.

Keywords: Advanced Encryption Standard (AES), Application Programming Interface (API), Compute Unified Device Architecture (CUDA), Data Encryption Standard (DES), Graphics Processing Unit (GPU), Message Passing Interface (MPI)

1. INTRODUCTION

Today most users transfer information from source to destination as digital data, using computers connected through the Internet. This information must be protected from unauthorized access, use, recording, disruption, modification or deletion without the permission of the sender and receiver. Securing it means preserving the availability, integrity and confidentiality of the data by means of a cryptographic algorithm. The AES algorithm is a symmetric block cipher that can encrypt and decrypt information, thus providing a high level of security for electronic data.

Traditionally, the Advanced Encryption Standard (AES) encrypts and decrypts digital data sequentially. When the input data becomes large, AES needs substantial computational resources and its encryption and decryption become very slow. Besides the security of the data and the channel, the response time of the algorithm must therefore also be considered in real-time systems. To solve these problems, all the papers reviewed in Section 4 implement AES in parallel rather than sequentially. Their titles are “Performance Analysis of Parallel Implementation of Advanced Encryption Standard (AES) over Serial Implementation”, “Parallelization of AES Algorithm Using OpenMP”, “Accelerating Encryption/Decryption Using GPUs for AES Algorithm”, “Parallel Implementation of AES Algorithm on GPU” and “An Optimized Parallel Computation of Advanced Encryption Algorithm using OpenMP”.

Implementing AES in parallel with different parallelization mechanisms improves the performance of the algorithm. In addition, the reviewed papers show that implementing AES in parallel on a GPU is less costly and more efficient than using a multi-core system.

This paper is organized as follows. Section 2 describes the AES algorithm. Section 3 briefly explains the parallelization tools used in the reviewed papers. Section 4 summarizes the reviewed papers, whose titles are listed above, and analyzes their results. Section 5 concludes the paper.

2. AES ALGORITHM DESCRIPTION

In cryptography, the Advanced Encryption Standard (AES), also known as the Rijndael algorithm, is a symmetric block cipher that transforms plaintext into ciphertext block by block. The algorithm processes a fixed input block of 128 bits using a cipher key of 128, 192 or 256 bits, as specified in the FIPS 197 standard.

AES Cipher:

Rijndael can be specified with block and key sizes in any multiple of 32 bits, with a minimum of 128 bits and a maximum of 256 bits. Rijndael was designed to handle these additional block sizes and key lengths, but they are not adopted in the AES standard. A number of AES parameters depend on the key length. The following table shows the input block size and the corresponding encryption/decryption parameters for each AES key length.

Parameter                                  AES-128     AES-192     AES-256
Key size (words/bytes/bits)                4/16/128    6/24/192    8/32/256
Plaintext block size (words/bytes/bits)    4/16/128    4/16/128    4/16/128
Number of rounds                           10          12          14
Round key size (words/bytes/bits)          4/16/128    4/16/128    4/16/128
Expanded key size (words/bytes)            44/176      52/208      60/240

Table 1: AES Parameters

The encryption and decryption processes consist of a number of different transformations applied consecutively to the data block in a fixed number of iterations, called rounds (Nr). Each round consists of several processing steps, including one that depends on the encryption key. A set of reverse rounds is applied to transform the ciphertext back into the original plaintext using the same encryption key. During decryption the round keys are used in reverse order: the last round key of encryption is the first one used in decryption. The middle round is repeated Nr-1 times.
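The key-length-dependent rows of Table 1 follow two rules from FIPS 197: Nr = Nk + 6, where Nk is the key length in 32-bit words, and the expanded key holds Nb(Nr + 1) words with the block size fixed at Nb = 4 words. The short C sketch below recomputes those rows; the function names are illustrative and not taken from any reviewed paper.

```c
#include <assert.h>
#include <stdio.h>

#define AES_NB 4 /* block size in 32-bit words; fixed at 4 (128 bits) by FIPS 197 */

/* Number of rounds for a key of nk 32-bit words: Nr = Nk + 6. */
int aes_rounds(int nk)
{
    assert(nk == 4 || nk == 6 || nk == 8); /* AES-128, AES-192, AES-256 */
    return nk + 6;
}

/* One Nb-word round key for the initial AddRoundKey plus one per round. */
int aes_expanded_key_words(int nk)
{
    return AES_NB * (aes_rounds(nk) + 1);
}

/* Print the rows of Table 1 that depend on the key length. */
void print_aes_parameters(void)
{
    for (int nk = 4; nk <= 8; nk += 2) {
        int words = aes_expanded_key_words(nk);
        printf("AES-%d: Nk = %d, Nr = %d, expanded key = %d words / %d bytes / %d bits\n",
               32 * nk, nk, aes_rounds(nk), words, 4 * words, 32 * words);
    }
}
```

For Nk = 4, 6 and 8 this gives 44, 52 and 60 words (176, 208 and 240 bytes), matching Table 1.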

The high-level encryption/decryption process is shown in Figure 1 [3].

Figure 1: AES Algorithm: High-Level Description of Encryption and Decryption


I. The encryption process, which converts plaintext to ciphertext, consists of the four phases listed below.

1. Key Expansion:

Key expansion takes the input key of 128, 192 or 256 bits and produces an expanded key for use in the subsequent stages. The size of the expanded key depends on the number of rounds to be performed, because it holds one 128-bit round key for the initial round plus one for each round. For 128-bit keys, the expanded key is 1408 bits (176 bytes); for 192-bit and 256-bit keys, it is 1664 bits (208 bytes) and 1920 bits (240 bytes), respectively, as listed in Table 1. It is the expanded key that is used in the subsequent phases of the algorithm. During each round, a different portion of the expanded key is used in the AddRoundKey step [3].

2. Initial Round

a) AddRoundKey: In this stage, the state is combined with the appropriate portion of the expanded key: each byte of the state is combined with the corresponding byte of the round key using bitwise XOR [3].

3. Rounds (repeated Nr-1 times)

a) SubBytes: In this stage, each byte of the block is replaced using an 8-bit substitution box (S-box). This non-linear transformation helps to resist attacks based on algebraic manipulation [3].

b) ShiftRows: This stage cyclically shifts the bytes in each row of the state by a fixed offset. The first row (32 bits) is left unchanged, and the second, third and fourth rows are shifted left by 1, 2 and 3 bytes, respectively [3].

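c) MixColumns: Each four-byte column of the state is multiplied by a fixed matrix over GF(2^8), so that every output byte depends on all four input bytes of the column.

d) AddRoundKey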

4. Final Round (no MixColumns)

a) SubBytes

b) ShiftRows

c) AddRoundKey
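The four encryption phases above can be tied together in a minimal, unoptimized AES-128 sketch in C. This is our own illustration, not code from the reviewed papers, and it rests on stated assumptions: only 128-bit keys are handled, the S-box is computed at run time with a brute-force multiplicative inverse instead of the usual precomputed table, and no constant-time protections are attempted, so it is for study only. The per-block function is the unit of work that the parallel implementations in Section 4 distribute across threads, processes or GPU cores.

```c
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00)); /* xtime */
        b >>= 1;
    }
    return r;
}
static uint8_t rotl8(uint8_t x, int s) { return (uint8_t)((x << s) | (x >> (8 - s))); }

/* S-box = affine map applied to the multiplicative inverse (0 maps to 0).
   The inverse is found by brute force: slow, but easy to check. */
static void build_sbox(uint8_t sbox[256])
{
    for (int i = 0; i < 256; i++) {
        uint8_t inv = 0;
        for (int j = 1; i != 0 && j < 256; j++)
            if (gf_mul((uint8_t)i, (uint8_t)j) == 1) { inv = (uint8_t)j; break; }
        sbox[i] = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63;
    }
}

/* Key expansion for a 128-bit key: 44 words = 176 bytes = 11 round keys. */
static void expand_key(const uint8_t key[16], const uint8_t sbox[256], uint8_t w[176])
{
    uint8_t rcon = 0x01;
    memcpy(w, key, 16);
    for (int i = 4; i < 44; i++) {
        uint8_t t[4];
        memcpy(t, &w[4 * (i - 1)], 4);
        if (i % 4 == 0) { /* RotWord, SubWord, XOR with the round constant */
            uint8_t first = t[0];
            t[0] = sbox[t[1]] ^ rcon;
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[first];
            rcon = gf_mul(rcon, 0x02);
        }
        for (int j = 0; j < 4; j++)
            w[4 * i + j] = w[4 * (i - 4) + j] ^ t[j];
    }
}

/* The state is column-major: row r, column c is s[r + 4 * c]. */
static void add_round_key(uint8_t s[16], const uint8_t *rk)
{
    for (int i = 0; i < 16; i++) s[i] ^= rk[i];
}
static void sub_bytes(uint8_t s[16], const uint8_t sbox[256])
{
    for (int i = 0; i < 16; i++) s[i] = sbox[s[i]];
}

/* Row r is rotated left by r bytes; row 0 is unchanged. */
static void shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, 16);
}

/* Each column is multiplied by the matrix whose rows are (02 03 01 01) rotated. */
static void mix_columns(uint8_t s[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *a = &s[4 * c];
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3;
        a[1] = a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3;
        a[2] = a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3);
        a[3] = gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2);
    }
}

/* Encrypt one 16-byte block with AES-128 (Nr = 10). */
void aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
{
    uint8_t sbox[256], w[176], s[16];
    build_sbox(sbox);            /* a real implementation would precompute these once */
    expand_key(key, sbox, w);
    memcpy(s, in, 16);
    add_round_key(s, w);                       /* initial round */
    for (int round = 1; round < 10; round++) { /* Nr-1 middle rounds */
        sub_bytes(s, sbox);
        shift_rows(s);
        mix_columns(s);
        add_round_key(s, w + 16 * round);
    }
    sub_bytes(s, sbox);                        /* final round: no MixColumns */
    shift_rows(s);
    add_round_key(s, w + 160);
    memcpy(out, s, 16);
}
```

The sketch can be checked against the FIPS 197 Appendix C.1 example: with key 000102030405060708090a0b0c0d0e0f and plaintext 00112233445566778899aabbccddeeff, the standard lists the ciphertext 69c4e0d86a7b0430d8cdb78070b4c55a.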

 

II. The decryption process, which converts ciphertext back to plaintext, consists of the four phases listed below.

1. Key expansion

2. Initial round

a) Add Round Key

3. Middle Round

a) Inverse Shift Rows

b) Inverse Substitute Bytes

c) Add Round Key

d) Inverse Mix Columns

4. Final Round

a) Inverse Shift Rows

b) Inverse Substitute Bytes

c) Add Round Key
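Two of the decryption transformations listed above can be sketched in C as follows. This is our own illustration using the column-major state layout of FIPS 197, and the function names are hypothetical. Inverse Shift Rows rotates each row right instead of left, and Inverse Mix Columns multiplies each column by the inverse of the encryption matrix. Inverse Substitute Bytes is not shown; it would look each byte up in an inverse S-box, i.e. a table inv_sbox (a hypothetical name) satisfying inv_sbox[sbox[x]] = x.

```c
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return r;
}

/* Inverse Shift Rows: row r (bytes r, r+4, r+8, r+12) is rotated right by r
   bytes, undoing the left rotation applied during encryption. */
void inv_shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * ((c + r) % 4)] = s[r + 4 * c];
    memcpy(s, t, 16);
}

/* Inverse Mix Columns: each column is multiplied by the inverse matrix,
   whose rows are (0e 0b 0d 09) rotated right by one position per row. */
void inv_mix_columns(uint8_t s[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *col = &s[4 * c];
        uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
        col[0] = gf_mul(a0, 0x0e) ^ gf_mul(a1, 0x0b) ^ gf_mul(a2, 0x0d) ^ gf_mul(a3, 0x09);
        col[1] = gf_mul(a0, 0x09) ^ gf_mul(a1, 0x0e) ^ gf_mul(a2, 0x0b) ^ gf_mul(a3, 0x0d);
        col[2] = gf_mul(a0, 0x0d) ^ gf_mul(a1, 0x09) ^ gf_mul(a2, 0x0e) ^ gf_mul(a3, 0x0b);
        col[3] = gf_mul(a0, 0x0b) ^ gf_mul(a1, 0x0d) ^ gf_mul(a2, 0x09) ^ gf_mul(a3, 0x0e);
    }
}
```

For example, applying inv_mix_columns to a column holding 8e 4d a1 bc yields db 13 53 45, the column that MixColumns maps to 8e 4d a1 bc.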


3. PARALLELIZATION SOFTWARE TOOLS AND ACCELERATORS USED IN THE REVIEWED PAPERS

Many programming APIs support parallel program development. This section briefly explains the parallel programming APIs and hardware used by the papers reviewed in Section 4 to implement the Advanced Encryption Standard (AES) in parallel on multi-core systems and graphics processing units (GPUs).

A. MPI

MPI (Message Passing Interface) is a standard parallel programming model for distributed-memory programs. An MPI program runs as a set of processes that exchange data by explicitly sending and receiving messages between processors.
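A common way to use MPI for AES is to split the input into contiguous runs of 16-byte blocks, one run per process. The C sketch below shows only that decomposition step. `block_range` is a hypothetical helper; in a full program `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the encrypted runs would be collected with a call such as `MPI_Gatherv`. The MPI calls are left out so the sketch compiles without an MPI library. It assumes the blocks can be encrypted independently (for example in ECB or counter mode), and [1] does not necessarily partition its data this way.

```c
#include <stddef.h>

/* Split n_blocks AES blocks as evenly as possible among `size` processes.
   Process `rank` handles blocks [*first, *first + *count). The first
   (n_blocks % size) ranks receive one extra block each. */
void block_range(size_t n_blocks, int rank, int size, size_t *first, size_t *count)
{
    size_t base = n_blocks / (size_t)size;
    size_t extra = n_blocks % (size_t)size;
    size_t r = (size_t)rank;
    *count = base + (r < extra ? 1 : 0);
    *first = r * base + (r < extra ? r : extra);
}
```

For 10 blocks on 4 processes, ranks 0 and 1 get 3 blocks each (starting at blocks 0 and 3) and ranks 2 and 3 get 2 blocks each (starting at blocks 6 and 8).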

B. CUDA

CUDA is a parallel computing platform and programming model created by NVIDIA. It can dramatically increase computing performance by harnessing the power of the graphics processing unit (GPU). CUDA provides a compiler and a toolkit for programming NVIDIA GPUs, and the CUDA API extends the C programming language. CUDA programs run on thousands of threads, and the model is scalable. It supports C, C++ and OpenCL and is available on Windows, Linux and OS X.

A major design goal of CUDA is to support heterogeneous computation: the serial parts of an application are executed on the CPU and the parallel parts on the GPU [3].
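In the heterogeneous model described above, the CPU launches a kernel over a grid of thread blocks, and each GPU thread derives a global index from the CUDA built-in variables `blockIdx.x`, `blockDim.x` and `threadIdx.x`. The sketch below is plain C that emulates this index arithmetic with two loops; it is not CUDA code, and on a GPU the loops are replaced by hardware threads running concurrently. The one-dimensional launch, the one-AES-block-per-thread mapping and the `invert_block` stand-in (which only inverts bytes and is not encryption) are assumptions; [3] may organize its kernel differently.

```c
#include <stddef.h>

/* Hypothetical per-thread work item: inverts the 16 bytes so the example
   runs on its own. A real kernel would encrypt the block with AES. */
void invert_block(unsigned char block[16])
{
    for (int i = 0; i < 16; i++)
        block[i] = (unsigned char)~block[i];
}

/* Number of thread blocks needed to cover n_items (ceiling division). */
size_t grid_size(size_t n_items, size_t threads_per_block)
{
    return (n_items + threads_per_block - 1) / threads_per_block;
}

/* Sequential emulation of a 1-D launch kernel<<<grid, threads_per_block>>>:
   every (block, thread) pair computes a global index and handles one AES block. */
void emulate_kernel_launch(unsigned char *data, size_t n_blocks, size_t threads_per_block)
{
    size_t grid = grid_size(n_blocks, threads_per_block);
    for (size_t block_idx = 0; block_idx < grid; block_idx++) {              /* blockIdx.x */
        for (size_t thread_idx = 0; thread_idx < threads_per_block; thread_idx++) { /* threadIdx.x */
            /* global index = blockIdx.x * blockDim.x + threadIdx.x */
            size_t i = block_idx * threads_per_block + thread_idx;
            if (i < n_blocks) /* guard: the last thread block may be partly idle */
                invert_block(data + 16 * i);
        }
    }
}
```

With 1000 AES blocks and 256 threads per block, for instance, the grid needs 4 thread blocks, and the last 24 threads of the final block do no work.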

C. OpenMP API


The OpenMP API (Open Multi-Processing application programming interface) is a tool that supports shared-memory multiprocessing. It is defined by a group of computer software and hardware vendors. OpenMP is a scalable and portable model for developers of shared-memory parallel applications. It supports C, C++ and Fortran on several platforms, such as Linux and Windows. Every OpenMP program starts execution with a single thread, called the master thread. The master thread executes sequentially until it encounters a parallel construct; at that point it forks a team of lightweight parallel threads. OpenMP thus uses the fork-join model: the statements inside the parallel region are executed in parallel by the team of threads, and after the parallel region ends the team is terminated and the master thread continues execution alone [2].
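The fork-join behaviour described above maps directly onto an OpenMP `parallel for` loop over independent 16-byte blocks, the data-level parallelism typically used for AES. In the sketch below, `toy_block_cipher` is a hypothetical stand-in (a byte XOR, not encryption) so the example compiles on its own; the sketch assumes the blocks can be processed independently, as in ECB or counter mode, and it is not the code from [2].

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-block AES call; XOR only, NOT encryption. */
static void toy_block_cipher(uint8_t block[16])
{
    for (int i = 0; i < 16; i++)
        block[i] ^= 0xA5;
}

/* Fork-join over independent blocks: the master thread forks a team at the
   parallel for, the iterations are divided statically among the threads, and
   the team joins at the end of the loop. Compiled without -fopenmp, the
   pragma is ignored and the loop simply runs sequentially. */
void encrypt_blocks_openmp(uint8_t *data, long n_blocks)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n_blocks; i++)
        toy_block_cipher(data + 16 * i);
}
```

Because each iteration touches only its own 16 bytes, no locking is needed, and the result is the same whatever the number of threads.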

D. GPU

Graphics processing units (GPUs) were originally dedicated to visual processing. Today GPUs are also used for general computation, because their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms that process large blocks of data in parallel. The GPU acts as a co-processor to the CPU, reducing the CPU's load and the overall processing time. For such data-parallel workloads GPUs are significantly faster than CPUs, and their memory interfaces can move much more data than a CPU's. A GPU accelerates an application running on the CPU by taking over its most time-consuming computations, which are performed in the GPU's own memory.

CPUs and GPUs have fundamentally different design philosophies. GPU design is driven by the fast-growing video game industry, which exerts tremendous economic pressure for the ability to perform a massive number of floating-point calculations per video frame in advanced games. The general philosophy of GPU design is to optimize for the execution of a massive number of threads.

Figure 2: CPU and GPU design architecture [6]

Figure 2 shows the architecture of a typical GPU today. It is organized into 16 highly threaded streaming multiprocessors (SMs), and a pair of SMs forms a building block. Each SM has eight streaming processors (SPs), for a total of 128 (16 × 8). Each SP has a multiply-add (MAD) unit and an additional multiply (MUL) unit. Each GPU currently comes with 1.5 megabytes of DRAM. These DRAMs differ from the system memory DIMM DRAMs on the motherboard in that they are essentially the frame buffer memory used for graphics. For graphics applications, they hold high-definition video images and texture information for 3D rendering, as in games. For computing, however, they function like a very high-bandwidth off-chip cache, though with somewhat more latency than a regular cache or system memory. If the chip is programmed properly, the high bandwidth makes up for the large latency [3].


4. PAPER REVIEW

4.1. Paper by Uzzal Kumar Prodhan, A.H.M. Shahariar Parvez, Md. Ibrahim Hussain, Yeasir Fathah Rumi and Md. Ali Hossain titled “Performance Analysis of Parallel Implementation of Advanced Encryption Standard (AES) over Serial Implementation”

This paper aims to solve the problems that arise when the AES cryptographic algorithm is implemented serially. A serial implementation of AES is efficient only on a single processor, whether the algorithm is implemented in hardware or in software, and the implementation cannot adapt when a flaw is discovered or the standard changes.

With AES implemented in parallel using MPI on two processors, the execution time for input files larger than 10^8, which would take several days to encrypt and decrypt serially, is significantly reduced, and the performance of AES increases as the number of processors increases. The loophole in this paper is that the researchers do not address factors that affect the parallel implementation of AES, such as the architecture of the parallel computer, the number of processors and the interconnection network.

Figure 3 [1] shows the experimental results of the parallel AES implementation with MPI versus serial AES encryption and decryption, using a 128-bit encryption/decryption key.

Figure 3: Speedup of AES in parallel with 128-bit key size
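The speedup plotted in Figure 3 is conventionally defined as the ratio of serial to parallel execution time, and dividing it by the number of processors gives the parallel efficiency. These are the standard textbook definitions, stated here as an aid to reading the figure rather than formulas taken from [1]:

```latex
S_p = \frac{T_{\mathrm{serial}}}{T_p}, \qquad E_p = \frac{S_p}{p}
```

Here T_p is the execution time on p processors. For example, a job that takes 100 s serially and 60 s on two processors has S_2 = 100/60 ≈ 1.67 and E_2 ≈ 0.83.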

4.2. Paper by S.S. Navalgund, Akshay Desai, Krishna Ankalgi and Harish Yamanur titled “Parallelization of AES Algorithm Using OpenMP”

In this paper, the researchers implement an optimized parallel architecture of the AES algorithm at both the data level and the control level in C, a language that is easy to parallelize. They use the Intel VTune™ Amplifier XE 2013 performance analyzer to find the most time-consuming parts of the AES algorithm.

Overall, the results for sequential and parallel code at different input file sizes show that the parallel AES implementation performs better than the sequential one, although for input sizes smaller than 5 KB the difference is not visible. Still, implementing the AES algorithm in parallel on a CPU is expensive. Table 2, taken from the paper, shows that AES parallelized with OpenMP performs better than the sequential version on the CPU.

File size   Sequential code (sec)   Parallelized code (sec)
5KB         0.045                   0.025
10KB        0.072                   0.058
20KB        0.128                   0.104
40KB        0.245                   0.211
50KB        0.334                   0.279
100KB       0.770                   0.6698
200KB       1.343                   1.002
400KB       2.461                   1.187
800KB       5.439                   4.238
1MB         6.799                   5.010
2MB         13.649                  11.483
5MB         32.37                   27.3890

Table 2: AES encryption and decryption execution time on the CPU, sequential vs. OpenMP-parallelized code
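Table 2 can be summarized by the speedup of the parallelized code over the sequential code. The small C helper below (our own, not part of [2]) recomputes it directly from the table's values.

```c
#include <stddef.h>
#include <stdio.h>

/* Execution times from Table 2, in seconds: sequential vs. OpenMP-parallelized. */
struct timing { const char *file_size; double sequential_s; double parallel_s; };

static const struct timing table2[] = {
    {"5KB", 0.045, 0.025},   {"10KB", 0.072, 0.058},  {"20KB", 0.128, 0.104},
    {"40KB", 0.245, 0.211},  {"50KB", 0.334, 0.279},  {"100KB", 0.770, 0.6698},
    {"200KB", 1.343, 1.002}, {"400KB", 2.461, 1.187}, {"800KB", 5.439, 4.238},
    {"1MB", 6.799, 5.010},   {"2MB", 13.649, 11.483}, {"5MB", 32.37, 27.3890},
};

/* Speedup = sequential time / parallel time (> 1 means the parallel code is faster). */
double speedup(double sequential_s, double parallel_s)
{
    return sequential_s / parallel_s;
}

void print_table2_speedups(void)
{
    for (size_t i = 0; i < sizeof table2 / sizeof table2[0]; i++)
        printf("%-6s speedup = %.2f\n", table2[i].file_size,
               speedup(table2[i].sequential_s, table2[i].parallel_s));
}
```

The resulting speedups range from about 1.15 (100 KB) to about 2.07 (400 KB), with 1.80 at 5 KB, and in this data they do not grow steadily with the file size.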

4.3. Paper by Sanjanaashree P titled “Accelerating Encryption/Decryption Using GPUs for AES Algorithm”

In this paper, the author studies the optimized architecture of a parallel computing platform, the GPU, and uses it instead of the CPU to implement AES in parallel. The AES algorithm is accelerated on an NVIDIA GPU using CUDA, the parallel programming platform created by NVIDIA.

The results in this paper show that implementing AES in parallel on a GPU improves encryption and decryption performance compared with the CPU. It also reduces the cost of parallelizing AES compared with a CPU or multi-core system, because a GPU offers high performance and high throughput for large input files at a lower cost than a CPU, which processes large amounts of data with low throughput. Table 3, taken from the paper, shows that AES implemented in parallel with CUDA performs better than on the CPU.

S.No  Input file   CPU encryption   CPU decryption   GPU encryption   GPU decryption
      size (KB)    time (ms)        time (ms)        time (ms)        time (ms)
1     23           110.00           3272.00          30.6846          34.628
2     58           420.00           20500            30.786           34.551
3     115          770.00           440128           32.134           36.0249
4     457          4010.00          6347.00          39.4578          44.113
5     914          8870.00          10347.00         78.4082          88.131

Table 3: Execution time of the parallel AES implementation on an NVIDIA GPU vs. CPU

4.4. Paper by Sanket Wagh, Pawan Phad and Amit Surwade titled “Parallel Implementation of AES Algorithm on GPU”

In this paper, the researchers study the efficiency of implementing the AES cryptographic algorithm in parallel on GPUs from any vendor using the CUDA platform. This research addresses the vendor-specific limitation of the previous paper, which was implemented only on NVIDIA GPUs. A drawback of implementing AES in parallel on a GPU is that the GPU cannot produce results when the data processed on it depend on each other. Table 4 shows the results of the AES implementation on a GPU.

File size   Encryption on   Encryption on   Decryption on   Decryption on
            CPU (ms)        GPU (ms)        CPU (ms)        GPU (ms)
10MB        6705            983.353         13743           987.341
20MB        12868           1916            26076           1920
100MB       68934           10257           144102          1020
200MB       123632          177334          231973          17765
300MB       189748          26000           364066          27103

Table 4: Results of the parallel AES implementation on a GPU
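The drawback noted for this paper, that a GPU cannot produce results when the processed data depend on each other, can be illustrated with two standard block cipher modes of operation. This is our own illustration; the reviewed paper does not state which mode it used. In CBC mode each ciphertext block feeds into the encryption of the next, so the blocks must be encrypted one after another; in counter (CTR) mode each block depends only on its own index and can be processed in parallel. `toy_encrypt` is a hypothetical XOR stand-in for AES, not a real cipher, and the 8-byte nonce plus 64-bit big-endian counter layout is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte "block cipher" stand-in (XOR with the key, NOT secure);
   a real implementation would call AES here. */
static void toy_encrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[i] ^ key[i];
}

/* CBC: C_i = E(P_i XOR C_{i-1}), with C_{-1} = IV. Every block needs the
   previous ciphertext, so this loop carries a dependency and cannot be split
   across threads. */
void cbc_encrypt(const uint8_t key[16], const uint8_t iv[16],
                 const uint8_t *p, uint8_t *c, size_t n_blocks)
{
    uint8_t prev[16], x[16];
    memcpy(prev, iv, 16);
    for (size_t i = 0; i < n_blocks; i++) {
        for (int j = 0; j < 16; j++)
            x[j] = p[16 * i + j] ^ prev[j];
        toy_encrypt(key, x, &c[16 * i]);
        memcpy(prev, &c[16 * i], 16);
    }
}

/* CTR: C_i = P_i XOR E(nonce || i). Each block depends only on its index,
   so iterations are independent and may run in any order or in parallel.
   The same function both encrypts and decrypts. */
void ctr_crypt(const uint8_t key[16], const uint8_t nonce[8],
               const uint8_t *in, uint8_t *out, size_t n_blocks)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n_blocks; i++) {
        uint8_t counter[16], keystream[16];
        memcpy(counter, nonce, 8);
        for (int b = 0; b < 8; b++) /* 64-bit big-endian block index */
            counter[8 + b] = (uint8_t)((uint64_t)i >> (56 - 8 * b));
        toy_encrypt(key, counter, keystream);
        for (int j = 0; j < 16; j++)
            out[16 * i + j] = in[16 * i + j] ^ keystream[j];
    }
}
```

Note that CBC decryption, P_i = D(C_i) XOR C_{i-1}, has no such chain, because all ciphertext blocks are already known; only CBC encryption is inherently sequential.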

4.5. Paper by M.Tech Computer Science and Engineering, RCOEM Nagpur, Maharashtra, titled “An Optimized Parallel Computation of Advanced Encryption Algorithm using OpenMP”

In this paper, the researchers design a flexible parallel algorithm for AES using OpenMP on a multi-core system, adapted to the input file size. When the input file is small, the performance gain of parallel AES is not visible. To solve this issue, they implement a switching algorithm that selects the AES execution mode based on the input file or data size, with the threshold value determined by trial and error. If the input file size is less than the threshold, the sequential AES algorithm is activated; if it is greater than the threshold, the parallel AES algorithm is activated. The loophole in this paper is the way the threshold value that decides between parallel and sequential execution of AES is calculated.
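The switching scheme can be sketched with OpenMP's `if` clause, which makes a parallel region fork a thread team only when its condition is true; otherwise the loop runs on the master thread alone. The threshold constant and its value, and the XOR stand-in for the AES block function, are hypothetical placeholders. [5] determined its threshold by trial and error and may implement the switch differently, for example with two separate code paths.

```c
#include <stdint.h>

/* Hypothetical threshold in 16-byte blocks; a placeholder value only. */
#define PARALLEL_THRESHOLD_BLOCKS 4096L

/* Hypothetical stand-in for a per-block AES call; XOR only, NOT encryption. */
static void toy_block_cipher(uint8_t block[16])
{
    for (int i = 0; i < 16; i++)
        block[i] ^= 0xA5;
}

/* Runs the loop in parallel only for inputs at or above the threshold.
   Returns 1 if the parallel path was selected, 0 for the sequential path. */
int encrypt_adaptive(uint8_t *data, long n_blocks)
{
    int parallel = n_blocks >= PARALLEL_THRESHOLD_BLOCKS;
    #pragma omp parallel for if(parallel) schedule(static)
    for (long i = 0; i < n_blocks; i++)
        toy_block_cipher(data + 16 * i);
    return parallel;
}
```

Making the threshold a named constant also marks the spot where an automatically calibrated value, as recommended in the conclusion, would be plugged in.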

5. CONCLUSION

In this paper, research papers on parallelizing the AES algorithm with parallel programming APIs on multi-core systems and GPUs are reviewed, and their performance is compared according to the method used and the results obtained. All the papers reviewed in Section 4 show that, for large input files, the encryption and decryption time is greatly reduced when AES runs in parallel. In addition, the last paper reviewed in Section 4 has an additional feature: AES runs in one of two modes depending on the input file size. We recommend that future work calculate the threshold value automatically and implement the AES algorithm on GPUs with OpenCL.

ACKNOWLEDGMENT

We would like to express our deepest gratitude to our instructor, Mr. Fitsum Assamnew, for encouraging us to explore this subject in parallel computing, from which we learned a great deal.

REFERENCES

[1] Uzzal Kumar Prodhan, A.H.M. Shahariar Parvez, Md. Ibrahim Hussain, Yeasir Fathah Rumi and Md. Ali Hossain, “Performance Analysis of Parallel Implementation of Advanced Encryption Standard (AES) over Serial Implementation,” International Journal of Information Sciences and Techniques (IJIST), Vol. 2, No. 6, November 2012.

[2] S. S. Navalgund, Akshay Desai, Krishna Ankalgi and Harish Yamanur, “Parallelization of AES Algorithm Using OpenMP,” Lecture Notes on Information Theory, Vol. 1, No. 4, December 2013.

[3] Sanjanaashree P, “Accelerating Encryption/Decryption Using GPUs for AES Algorithm,” International Journal of Scientific & Engineering Research, Vol. 4, Issue 2, February 2013, ISSN 2229-5518.

[4] Sanket Wagh, Pawan Phad and Amit Surwade, “Parallel Implementation of AES Algorithm on GPU,” A Monthly Journal of Computer Science and Information Technology, IJCSMC, Vol. 4, Issue 3, March 2015, pp. 247-252.

[5] M.Tech Computer Science and Engineering, RCOEM Nagpur, Maharashtra, “An Optimized Parallel Computation of Advanced Encryption Algorithm using OpenMP - A Review,” International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 2, February 2016.

[6] http://www.ncsa.illinois.edu/People/kindr/projects/hpca/files/singapore_p1.pdf