
Sumanth Kumar Reddy S et al.

/ (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES


Vol No. 6, Issue No. 1, 022 - 026

VLSI Implementation of AES Crypto Processor for High Throughput
Sumanth Kumar Reddy S, R. Sakthivel, P Praneeth
SENSE, VIT University, Vellore, India
sumanthsannala@gmail.com, rsakthivel@vit.ac.in, praneeth.mvd@gmail.com

Abstract—The Advanced Encryption Standard (AES) has received significant interest over the past decade due to its performance and security level. Many hardware implementations have been proposed. In most previous works, SubBytes and inverse SubBytes are implemented using the lookup table method. In this paper we use combinational logic, which allows inner-round pipelining to be done efficiently. Furthermore, composite field arithmetic helps reduce area. Using the proposed architecture, a fully sub-pipelined encryptor/decryptor with 3 substage pipelining in each round achieves a throughput of 25.89 Gbps on a Xilinx xc5vlx110t-1 device, which is faster and 48.78% more effective than the fastest previous FPGA implementations known to date. Our ASIC implementation achieves 58.18 Gbps, which is faster than previous ASIC implementations.
This AES design was implemented using Verilog HDL and synthesized with RTL Compiler using TSMC's 90 nm standard cell library. Physical design implementation was done using SOC Encounter and achieved a maximum throughput of 58.18 Gbps.

Keywords—AES, Pipelined AES, sub pipelined design, ASIC, FPGA, VLSI.

I. INTRODUCTION
The large and growing number of internet and wireless communication users has led to an increasing demand for security measures and devices that protect user data transmitted over open channels. Two types of cryptographic systems are mainly used for security purposes: symmetric-key crypto systems and asymmetric-key crypto systems. Symmetric-key cryptography (DES, 3DES and AES) uses the same key for both encryption and decryption. Asymmetric-key cryptography (RSA and elliptic curve cryptography) uses different keys for encryption and decryption. The major disadvantage of DES is its short key length. In November 2001, the National Institute of Standards and Technology (NIST) of the United States chose the Rijndael algorithm as the Advanced Encryption Standard (AES) to replace previous algorithms such as DES.
AES encryption is considered to be efficient for both hardware and software implementations. Compared to software, hardware implementation is more reliable. Some works have been presented on hardware implementations of the AES algorithm using ASIC [6], [7], [8] and FPGA [9], [10].
The rest of the paper is organized as follows. Section II describes the basic AES algorithm. Section III describes the novel on-the-fly key expansion module. Section IV describes the pipeline design. Section V describes the comparison work. Finally, Section VI concludes the paper.

II. AES ALGORITHM
The AES algorithm is a symmetric block cipher that processes data blocks of 128 bits using a cipher key of length 128, 192, or 256 bits. In addition, the AES algorithm is an iterative algorithm. Each iteration is called a round, and the total number of rounds, Nr, is 10, 12, or 14 when the key length is 128, 192, or 256 bits, respectively. Table I shows the number of rounds as a function of key length.

TABLE I. Different AES specifications

Variant    Key length (Nk words)   Block size (Nb words)   Number of rounds (Nr)
AES-128    4                       4                       10
AES-192    6                       4                       12
AES-256    8                       4                       14

The 128-bit data block is divided into 16 bytes. These bytes are mapped to a 4x4 array called the State, and the State undergoes all the internal operations of the AES algorithm. Every byte in the State is denoted by $S_{i,j}$ ($0 \le i, j < 4$) and is considered as an element of $GF(2^8)$. Although different irreducible polynomials can be used to construct $GF(2^8)$, the irreducible polynomial used in the AES algorithm is $p(x) = x^8 + x^4 + x^3 + x + 1$. Block diagrams of the AES encryption and the equivalent decryption structures are shown in Fig. 1.
After an initial round key addition, a round function consisting of four different transformations (SubBytes, ShiftRows, MixColumns, and AddRoundKey) is applied to the data block in the encryption procedure, and in reverse order with inverse transformations in the decryption procedure. The last round in encryption, however, contains only SubBytes, ShiftRows and AddRoundKey, and the last round in decryption contains only inverse SubBytes, inverse ShiftRows and AddRoundKey. The four transformations in a round function are examined and optimally designed to achieve an efficient implementation.
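To make the definitions in Section II concrete, the following Python sketch maps a 16-byte block onto the 4x4 State and multiplies two elements of GF(2^8) modulo p(x). It is a software model only, not the hardware design of this paper. The function names are hypothetical, and the column-major byte order is taken from FIPS-197 rather than from the text above.

```python
AES_POLY = 0x11B  # p(x) = x^8 + x^4 + x^3 + x + 1


def rounds_for_key(nk_words):
    """Number of rounds Nr for a key of Nk 32-bit words (Table I): Nr = Nk + 6."""
    if nk_words not in (4, 6, 8):
        raise ValueError("Nk must be 4, 6 or 8")
    return nk_words + 6


def block_to_state(block):
    """Map 16 bytes to the 4x4 State with state[i][j] = block[i + 4*j].

    The column-major order is an assumption taken from FIPS-197.
    """
    if len(block) != 16:
        raise ValueError("AES blocks are 16 bytes")
    return [[block[i + 4 * j] for j in range(4)] for i in range(4)]


def gf_mul(a, b):
    """Multiply two bytes as elements of GF(2^8), reducing modulo p(x)."""
    product = 0
    while b:
        if b & 1:
            product ^= a      # add (XOR) the current multiple of a
        a <<= 1               # multiply a by x
        if a & 0x100:
            a ^= AES_POLY     # reduce when the degree reaches 8
        b >>= 1
    return product
```

For example, `gf_mul(0x57, 0x83)` evaluates to `0xC1`, which matches the worked multiplication example in FIPS-197.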

ISSN: 2230-7818 @ 2011 http://www.ijaest.iserp.org. All rights Reserved. Page 22



A. SubBytes/InvSubBytes transformations
The SubBytes transformation is a non-linear byte substitution. It can be implemented in two ways: using lookup tables (LUT), or using combinational logic.

Figure 1. AES encryption and decryption algorithm: 1(a) Encryption, 1(b) Decryption

In the LUT based approach, the unbreakable delay of the lookup tables is greater than that of the other logic. With the LUT method it is difficult to use a sub-pipelined structure with two pipeline stages, which prevents further speedup. An alternative is to use combinational logic, which is faster than the LUT and can also be divided into two pipeline stages, allowing further speedup. In the non-LUT method, SubBytes can be implemented by finding the multiplicative inverse followed by an affine transform. Similarly, inverse SubBytes is implemented by an inverse affine transform followed by the multiplicative inverse. The multiplicative inverse is common to both, so a single structure can implement both SubBytes and inverse SubBytes, as shown in Fig. 2.

Figure 2. SubBytes/inverse SubBytes implementation

B. ShiftRows/InvShiftRows
ShiftRows is a simple shifting transformation. The first row of the State is kept as it is, while the second, third and fourth rows are cyclically shifted to the left by one byte, two bytes and three bytes, respectively. In InvShiftRows, the first row of the State does not change, while the remaining rows are cyclically shifted to the right by the same offsets as in ShiftRows.

Figure 3. Shift rows transformation

C. MixColumns/InvMixColumns transformation
The MixColumns() transformation operates on the State column-by-column, treating each column as a four-term polynomial. The columns are considered as polynomials over $GF(2^8)$ and multiplied modulo $x^4 + 1$ with a fixed polynomial $a(x)$, given by $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$.
The function xtime represents multiplication by '02' modulo the irreducible polynomial $m(x) = x^8 + x^4 + x^3 + x + 1$. The implementation of xtime() consists of a shift and a conditional XOR with '1B'. Fig. 4 shows the MixColumns module. In matrix form, the MixColumns transformation can be expressed as

$$\begin{bmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{bmatrix}, \quad 0 \le c < 4.$$

Figure 4. Mix column module

InvMixColumns multiplies the polynomial formed by each column of the State with $a^{-1}(x)$ modulo $x^4 + 1$, where
$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$.
In matrix form, the InvMixColumns transformation can be expressed by

$$\begin{bmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{bmatrix} = \begin{bmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{bmatrix} \begin{bmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{bmatrix}, \quad 0 \le c < 4.$$
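The xtime function and the two matrix forms of MixColumns and InvMixColumns can be modelled in a few lines of Python. This is a minimal software sketch for checking values, not the paper's hardware module. The helper names `gf_mul` and `mat_mul_column` are hypothetical, and general multiplication is built here from repeated xtime, which is one of several possible choices.

```python
def xtime(b):
    """Multiply a byte by {02}: shift left, then XOR with 0x1B if bit 7 was set."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # clears bit 8 and applies the conditional XOR with 0x1B
    return b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using repeated xtime (shift-and-add)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


# Coefficient matrices of MixColumns and InvMixColumns.
MIX = [[0x02, 0x03, 0x01, 0x01],
       [0x01, 0x02, 0x03, 0x01],
       [0x01, 0x01, 0x02, 0x03],
       [0x03, 0x01, 0x01, 0x02]]

INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09],
           [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B],
           [0x0B, 0x0D, 0x09, 0x0E]]


def mat_mul_column(matrix, column):
    """Multiply a 4x4 coefficient matrix by one State column over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)  # addition in GF(2^8) is XOR
        out.append(acc)
    return out
```

Applying `mat_mul_column(MIX, [0xDB, 0x13, 0x53, 0x45])` yields `[0x8E, 0x4D, 0xA1, 0xBC]`, and applying `INV_MIX` to that result restores the original column.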


D. AddRoundKey
AddRoundKey involves only a bit-wise XOR operation. In every round, the output of MixColumns is XORed with the round key.
By inverting the encryption structure one can easily derive the decryption structure. However, the sequence of the transformations will be different from that in encryption. This feature prohibits resource sharing between encryptors and decryptors. The equivalent decryption structure is shown in Fig. 1(b).

III. KEY EXPANSION
In the AES algorithm, the key expansion module generates the round keys for every round. There are two approaches to providing round keys. One is to pre-compute and store all the round keys, and the other is to produce them on-the-fly. The first approach consumes more area. In the second approach, the initial key is divided into Nk words ($key_0, key_1, \ldots, key_{Nk-1}$), which are used as initial words. The rest of the words are generated iteratively from these initial words. Nk is 4, 6, or 8 when the key length is 128, 192 or 256 bits, respectively. Each round key has 128 bits and is formed by concatenating four words:

$Roundkey(i) = \{w_{4i}, w_{4i+1}, w_{4i+2}, w_{4i+3}\}$.

Figure 5. Data path for key generator

The key expansion procedure can be described by the pseudo code listed below:

    for i = 0 to Nk-1
        w[i] = key[i]
    end
    for i = Nk to 4(Nr + 1)-1
        temp = w[i-1]
        if (i mod Nk = 0)
            temp = SubWord(RotWord(w[i-1])) XOR Rcon(i/Nk)
        end if
        w[i] = w[i-Nk] XOR temp
    end

For Nk = 8 (AES-256), FIPS-197 additionally applies SubWord to temp when i mod Nk = 4; the pseudo code above omits this case.

IV. PIPELINING AND SUBPIPELINING
To speed up the AES algorithm we can use three architectural optimization techniques: pipelining, sub pipelining and loop unrolling. The AES encryption for the pipelined design is shown in Fig. 6. Here pipeline registers are placed between every round so as to increase the throughput.

Figure 6. AES encryption with pipelining

Similar to pipelining, sub pipelining is implemented by inserting registers in combinational logic, but the registers are inserted both between and inside each round. By using pipelining and sub pipelining we can process multiple blocks of data simultaneously. Among these architectural optimizations, sub pipelining gives the maximum speed and the better throughput/area. Fig. 7 shows the sub pipelined architecture with r sub stages. Each round unit is divided into r sub stages with equal delays.
In the LUT method sub pipelining is limited to only two sub stages, whereas combinational logic can be divided into more sub stages with equal delays. In these pipelined or sub-pipelined architectures, the plain text is received at each clock cycle through the input register. The number of clock cycles needed to complete a single round depends on the number of sub stages. Round keys are generated by the key expansion module and supplied to each round. At each clock cycle data is shifted to the next stage, and the final output appears only after the end of the ((10*r)+10)th clock cycle, where r is the number of sub pipeline stages. The advantage of this structure is that the second output is obtained in the very next clock cycle after the first output. The internal design of each round contains SubBytes, ShiftRows, MixColumns, and AddRoundKey, which are explained in the previous sections.
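The key expansion pseudo code of Section III can be turned into a small runnable Python model. This is an illustrative sketch, not the on-the-fly hardware module of this paper. All function names are hypothetical. The S-box is generated as a multiplicative inverse followed by the affine transform, with the inverse computed by plain exponentiation rather than the composite field arithmetic used in the design. The extra AES-256 step is taken from FIPS-197.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def gf_inv(a):
    """Multiplicative inverse computed as a^254; 0 is mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(x):
    """AES affine transform: XOR x with its left rotations by 1..4 bits, then 0x63."""
    out = x
    for k in range(1, 5):
        out ^= ((x << k) | (x >> (8 - k))) & 0xFF
    return out ^ 0x63


# SubBytes table built without a stored LUT: inverse, then affine transform.
SBOX = [affine(gf_inv(x)) for x in range(256)]


def sub_word(word):
    """Apply the S-box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def rot_word(word):
    """Rotate a 32-bit word left by one byte."""
    return ((word << 8) & 0xFFFFFFFF) | (word >> 24)


def expand_key(key):
    """Return the 4*(Nr+1) round-key words for a 16-, 24- or 32-byte key."""
    nk = len(key) // 4
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 16, 24 or 32 bytes")
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 1  # round constant byte, doubled in GF(2^8) after each use
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)  # AES-256 only; from FIPS-197, not in the pseudo code
        w.append(w[i - nk] ^ temp)
    return w
```

Round key i is then the concatenation of `w[4*i]` to `w[4*i + 3]`. For the FIPS-197 example key `2b7e151628aed2a6abf7158809cf4f3c`, the first generated word `w[4]` is `a0fafe17`.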

Figure 7. Sub pipelining architecture

V. RESULTS COMPARISON
The AES architecture was implemented in Verilog HDL and simulated using Cadence ncsim. We implemented two designs. AES(LUT) is a pipelined implementation using the lookup table method with an initial latency of 10 clock cycles. AES(SP) is a fully sub-pipelined implementation using the non-LUT method, with 3 sub-stages in each round and an initial latency of 40 clock cycles. Compared to the LUT design, the non-LUT implementation results in lesser area. The FPGA implementation of this design was done on a Xilinx XC5VLX110T-1, and the corresponding results are tabulated in TABLE 2. The fully sub-pipelined 128-bit architecture with 10 round units was synthesized in RTL Compiler using TSMC's 90 nm standard cells, and the corresponding results are tabulated in TABLE 3. This fully sub-pipelined design achieves a throughput of 58.18 Gbps, which is faster than previous ASIC implementations. The backend of the design was done in SOC Encounter, and the final chip layout is shown in Fig. 8.
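The latency and throughput figures quoted in this section can be reproduced with simple arithmetic. The Python sketch below assumes that a fully pipelined design accepts one 128-bit block per clock cycle in steady state, so throughput is the block size times Fmax, and it uses the ((10*r)+10) cycle latency stated in Section IV. The function names are hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits
ROUNDS = 10       # round units in the 128-bit key design


def latency_cycles(sub_stages, rounds=ROUNDS):
    """Initial latency in clock cycles, following the ((10*r)+10) expression."""
    return rounds * sub_stages + rounds


def throughput_gbps(fmax_mhz, block_bits=BLOCK_BITS):
    """Steady-state throughput, assuming one block is accepted per clock cycle."""
    return block_bits * fmax_mhz / 1000.0


if __name__ == "__main__":
    print(latency_cycles(3))                   # cycles before the first output, r = 3
    print(round(throughput_gbps(202.26), 2))   # FPGA sub-pipelined design, Gbps
    print(round(throughput_gbps(454.5), 2))    # 90 nm ASIC sub-pipelined design, Gbps
```

With r = 3 this gives the 40-cycle latency reported for AES(SP), and the Fmax values of 202.26 MHz and 454.5 MHz give 25.89 Gbps and 58.18 Gbps respectively.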

TABLE 2
FPGA comparison results

Design                  Device         Fmax (MHz)   Throughput (Gbps)   Slices   BRAMs   Mbps/slice
Elbirt et al.*          Xcv1000-4      31.8         1.938               10992    0       0.176
Mcloone et al.*         Xcv812e-8      93.9         12.02               2000     244     0.362
Jarvinen*               Xcv1000e-8     129.2        16.5                11719    0       1.4
Saggese*                Xcv2000e-8     158          20.3                5810     100     1.09
Standert*               Xcv3200e-8     145          18.5                15112    0       1.28
Parhi (r = 3)*          Xcv812e-8      93.5         11.965              9406     0       1.272
Parhi (r = 7)*          Xcv1000e-8     168.4        21.556              11022    0       1.956
Ours (LUT-pipelined)    Xc5vlx110t-1   103.4        13.238              4611     60      1.077
Ours (Sub pipelining)   Xc5vlx110t-1   202.26       25.89               8896     0       2.91

*results are estimated from [12]
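The Mbps/slice column of TABLE 2 can be checked with the short Python sketch below. The weighting of one BRAM as 128 slices is an assumption inferred from the table rows that use BRAMs; it is not stated in this paper. The function names are hypothetical. Under the same reading, the "48.78% more effective" claim in the abstract appears to be the gain in Mbps/slice over the Parhi (r = 7) design.

```python
SLICES_PER_BRAM = 128  # assumption inferred from the table rows, not stated in the paper


def mbps_per_slice(throughput_gbps, slices, brams=0):
    """Throughput per equivalent slice, counting each BRAM as SLICES_PER_BRAM slices."""
    equivalent_slices = slices + SLICES_PER_BRAM * brams
    return throughput_gbps * 1000.0 / equivalent_slices


def relative_gain_percent(new, old):
    """Percentage improvement of new over old."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    print(round(mbps_per_slice(25.89, 8896), 2))        # Ours (Sub pipelining)
    print(round(mbps_per_slice(13.238, 4611, 60), 3))   # Ours (LUT-pipelined)
    print(round(relative_gain_percent(2.91, 1.956), 2)) # gain over Parhi (r = 7)
```

The rounded table values 2.91 and 1.956 give a gain of about 48.8%, which is close to the 48.78% quoted in the abstract.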

TABLE 3
Synthesis results (ASIC)

Design              AES(LUT)   AES(SP)   AES(SP)
Technology          90nm       90nm      180nm
Area (um^2)         740870     564036    2258469
Power (mW)          136.995    147.78    655.5
Critical path       3.9ns      2.2ns     4.2ns
Fmax (MHz)          256.4      454.5     238
Throughput (Gbps)   32.82      58.18     30.47

VI. CONCLUSION
In this paper, we presented a hardware implementation of an efficient pipelined AES architecture which includes both encryption and decryption. The sub-pipelining architecture helped us achieve higher throughput than earlier implementations. The design is modeled using Verilog HDL and simulated with Cadence NCsim. Synthesis is done using RTL Compiler v9.10 and the physical design with SOC Encounter. With the proposed sub-pipelining architecture, the throughput reached 58.18 Gbps.


Figure 8. Final chip layout

REFERENCES
[1] J. Daemen and V. Rijmen, "AES Proposal: Rijndael, AES algorithm submission," September 3, 1999, available: http://www.nist.gov/CryptoToolkit.
[2] "Draft FIPS for the AES," available from: http://csrc.nist.gov/encryption.aes , February 2001.
[3] E. J. Swankoski, R. R. Brooks, V. Narayanan, M. Kandemir, and M. J. Irwin, "A parallel architecture for secure FPGA symmetric encryption," in Proc. 18th Int. Parallel Distrib. Process. Symp., Santa Fe, NM, Apr. 2004, p. 132.
[4] A. Hodjat and I. Verbauwhede, "Minimum area cost for a 30 to 70 Gb/s AES processor," in Proc. IEEE Comput. Soc. Annu. Symp., Lafayette, LA, Feb. 2004, pp. 83–88.
[5] C.-P. Su, T.-F. Lin, C.-T. Huang, and C.-W. Wu, "A high-throughput low-cost AES processor," IEEE Commun. Mag., vol. 41, no. 12, pp. 86–91, Dec. 2003.
[6] I. Verbauwhede, P. Schaumont and H. Kuo, "Design and Performance Testing of a 2.29-GB/s Rijndael Processor," IEEE Journal of Solid State Circuits, Vol. 38, No. 3, March 2003, pp. 569-572.
[7] T. Ichikawa, T. Kasuya, and M. Matsui, "Hardware Evaluation of the AES Finalists," in Proc. 3rd AES Candidate Conference, pp. 279-285, New York, April 2000.
[8] L. Deng and H. Chen, "A new VLSI implementation of the AES algorithm," in Communications, Circuits and Systems and West Sino Expositions, IEEE 2002 International Conference on, June 2002, pp. 1500-1504.
[9] N. Sklavos and O. Koufopavlou, "Architectures and VLSI implementations of the AES-proposal Rijndael," IEEE Transactions on Computers, 51(12) (2002) 1454–1459.
[10] J. H. Shim, D. W. Kim, Y. K. Kang, T. W. Kwon, and J. R. Choi, "A rijndael cryptoprocessor using shared on-the-fly key scheduler," in Proc. 3rd IEEE Asia-Pacific Conf. ASIC, Taipei, Taiwan, Aug. 2002, pp. 89–92.
[11] P. Chodowiec, P. Khuon and K. Gaj, "Fast Implementations of Secret-Key Block Ciphers Using Mixed Inner- and Outer-Round Pipelining," Proc. ACM/SIGDA Int. Symposium on Field Programmable Gate Arrays, FPGA'01, Monterey, CA, Feb. 2001.
[12] X. Zhang and K. Parhi, "High-speed VLSI architectures for the AES algorithm," IEEE Transactions on VLSI Systems, vol. 12, Sep. 2004.
[13] N. Sklavos and O. Koufopavlou, "Architectures and VLSI Implementations of the AES-Proposal Rijndael," IEEE Trans. on Computers, vol. 51, Issue 12, pp. 1454-1459, 2002.
[14] R. Karri, K. Wu, P. Mishra, and Y. Kim, "Concurrent Error Detection Schemes for Fault-Based Side-Channel Cryptanalysis of Symmetric Block Ciphers," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, No. 12, Dec. 2002.
[15] C.-H. Yen, T.-Y. Pai, and B.-F. Wu, "The implementations of the re-configurable Rijndael algorithm with throughput of 4.9 Gbps," in Proc. 16th VLSI Des./CAD Symp., Hualien, Taiwan, Aug. 2005.
[16] M. Alam, W. Badawy, and G. Jullien, "A novel pipelined threads architecture for AES encryption algorithm," in Proc. IEEE Int. Conf. Appl.-Specific Syst., Architectures, Process., San Jose, CA, Jul. 2002, pp. 296–302.
