An Efficient Hardware Design for Combined AES and AEGIS

Abstract—This paper presents an integrated design of AES, the block cipher standard, and AEGIS, an AES-based authenticated encryption scheme. Our design exploits the common functionality of AES and AEGIS to provide either confidentiality alone or confidentiality and authenticity together. The proposed design provides a cost-effective implementation on various FPGA platforms, and it achieves both goals using a minimal amount of extra resources compared to stand-alone AES and AEGIS designs. The performance of our implementation has been compared with similar design work, and the throughput and frequency of our design outperform the best result available in the literature.

Index Terms—Encryption, AES, Authenticated Encryption, AEGIS, Integrated Architecture, FPGA

I. INTRODUCTION

In recent times the frequency of digital transactions has increased manifold. This has made transaction security a critical concern, owing to the increased probability of attacks. Many secure digital transactions need only data confidentiality, while some require both confidentiality and authenticity. AES [1] is the standard cipher used for encryption. Recently, the CAESAR competition was announced to provide a standard Authenticated Encryption (AE) scheme. Most of the CAESAR candidates are either AES based or permutation based. Entries for this competition include ACORN [2], AEGIS [3], MORUS [4] and so on. AEGIS is one of the CAESAR finalists. In this paper, we integrate AES and AEGIS in a cost-effective manner through the reuse of common functionality between them. Currently, no attack has been shown against AEGIS; hence in our design we choose AEGIS-128 and merge it with the AES architecture to achieve both encryption and authenticated encryption. Very little research exists on implementations that achieve both goals. Paper [5] is one such combined design, in which the authors report the performance of their design in terms of throughput and frequency.

In our work, we propose an integrated architecture of AES and AEGIS that exploits the common functionality between them, and we implement it on various FPGA platforms. We observe that our design achieves almost double the speed compared to [5].

Following the introduction in Section I, the rest of the paper is organized as follows. Section II gives a brief overview of AES and AEGIS. Section III describes the integrated architecture of AES and AEGIS, where the focus is on the design circuitry that reuses the common functionality of AES and AEGIS. Section IV illustrates the design performance on different FPGA platforms, and the comparison with existing related work. Finally, Section V concludes our work.

II. AES AND AEGIS-128 OVERVIEW

AES [1] is a symmetric block cipher that operates on fixed-size 128-bit message blocks with key sizes of 128, 192 and 256 bits. AES uses a different number of rounds for each key size: 10 rounds for a 128-bit key, 12 for 192 and 14 for 256. Each intermediate result of the rounds is called a 'state'. Moreover, the round key for each round is generated from the encryption key. Each round of AES comprises four basic operations, SubBytes, ShiftRow, MixColumn and addroundkey, except for the last round, where the MixColumn operation is not performed [1].

AEGIS [3] is an authenticated encryption technique that uses the AES round function. The intermediate cipher value is known as the state in AEGIS. The AEGIS algorithm consists of an initialization phase, an associated data processing phase, an encryption phase and a MAC generation phase. All the phases use one common function known as StateUpdate. The StateUpdate function performs five, six and eight AES round functions on the state in AEGIS-128, AEGIS-256 and AEGIS-128L respectively. The only difference between the AES and AEGIS round functions is that AES additionally has the addroundkey operation. Apart from that, AES uses 10 rounds for encryption, whereas AEGIS performs 5 round operations for each state update. In order to realize both AES and AEGIS on a single hardware platform, we have to design a control circuit in such a way that the common hardware block can be reused between AES and AEGIS. The common hardware block mainly consists of the SubByte, ShiftRow and MixColumn operations inside the round function. AEGIS operates based on four counters, viz. aegis counter, associated data counter, ciphertext counter and mac counter. The last three counters depend solely on the first. The three-bit aegis counter counts from four down to zero. The associated data counter and ciphertext counter are initialized with external 64-bit data, and the mac counter is initialized with the fixed value seven. The initial value of the ciphertext counter indicates the number of blocks used for encryption. Each of the last three counters is decremented by one, in its corresponding state, each time the aegis counter goes from four to zero. Key and IV are the private key and initialization vector respectively.

III. INTEGRATED ARCHITECTURE OF AES AND AEGIS

This section describes the overall architecture of our proposed design. In this design, we explain how AES and AEGIS are combined together by reusing some common
functionalities between them. Figure 1 represents the complete architecture in detail. In this diagram, we present five main modules, viz. Controller, Assigninput, State update, Key schedule and Round function, and four counters: aegis counter, associated data counter, ciphertext counter and mac counter. All the aforesaid counters are used in AEGIS only. In addition, the Round function module, shown as a dotted box, consists of two shiftrow blocks, one intermediate register, one subbyte block, one mixcolumn block and four MUXes. Moreover, the Assigninput module generates the ciphertext and MAC. For AES, only ciphertext is generated, but AEGIS produces both ciphertext and MAC.

Figure 1: AES and AEGIS combined architecture

Among these five modules, three are shared by both AES and AEGIS, whereas one of the remaining modules is used for AES only and the other is used only for AEGIS. These are as follows:
• Controller block used for both AES and AEGIS
• Assigninput block used for both AES and AEGIS
• Round function block used for both AES and AEGIS
• Key schedule block used only for AES
• State update block used only for AEGIS

The inclusion of all these modules in a single implementation needs one control input to perform AES and AEGIS separately. The user provides the control input externally to select either AES or AEGIS at any time. Figure 2 shows how all five modules are activated based on user input. If the user selects AES, then all the modules inside the bold dotted box are active, whereas the modules inside the thin dotted box are used for AEGIS operations.

We assume a triplet ⟨Key, IV, M⟩, where Key, IV and M represent the private key, initialization vector and plaintext/ciphertext message respectively. The triplet ⟨Key, IV, M⟩ is supplied to the Assigninput block. MUX M6 selects the output of the Key schedule module for AES and another input for AEGIS. MUX M1 is used for AES, whereas MUX M5 is used for AEGIS. The overview of the design is depicted in Figure 2.

Figure 2: Overall design of AES and AEGIS

Now, all five modules of Figure 2 are described in detail sequentially.

A. Controller Block

This section describes the operational flow between AES and AEGIS. The Controller takes the values of the aegis counter, associated data counter, ciphertext counter, mac counter, clock and user input (AES or AEGIS), as shown in Figure 3. The Controller acts as a finite state machine (FSM) that changes state based on the present state and the aforesaid inputs (counters), and generates different kinds of control signals, such as initialization of the associated data process, ciphertext process and mac process. In this design, the FSM consists of 24 states. In the case of AES, state transitions run from state1 to state11, whereas the FSM moves from state12 to state24 for AEGIS. The preprocessing phase is
carried out between state12 and state21. Moreover, the associated data processing phase, ciphertext processing phase and mac generation phase occur at state22, state23 and state24 respectively. For AES, a state transition occurs on every clock pulse. In contrast, an AEGIS state transition happens after the 3-bit aegis counter goes from four to zero. In addition, the associated data process, ciphertext process and mac generation process occur only when the aegis counter and the corresponding phase counter reach zero concurrently. Once the user provides the control input (AES or AEGIS), the Controller block activates the Assigninput block to start the operation for AES or AEGIS respectively.

Figure 3: Controller block diagram

B. Assigninput Block

This block initiates the operation and generates the final output for both AES and AEGIS. Figure 4 depicts the overall architecture of the Assigninput block. Here, we describe the AEGIS [3] operation followed by AES.

For AEGIS operation, this block first generates 80-byte data ⟨S_{-10,0}, S_{-10,1}, S_{-10,2}, S_{-10,3}, S_{-10,4}⟩ from the key and IV, and these data are forwarded only once to the five 128-bit state registers R = ⟨RS_0, RS_1, RS_2, RS_3, RS_4⟩ in the state update block. Recall that the preprocessing phase contains 10 states (state 12 to state 21). In this phase, key or key ⊕ IV is treated as the 128-bit message m in even states (state 12, state 14, state 16, state 18 and state 20) or odd states (the alternate states) respectively.¹ Moreover, in the associated data processing phase and plaintext phase, the corresponding associated data and plaintext serve as m. Both associated data and plaintext are stored in a register inside the Assigninput block. For the mac processing phase, the corresponding m is calculated from the length of the associated data, the length of the plaintext and the previously stored value of the RS_3 register [3]. In each state update operation, m as well as the 80-byte data (R) are forwarded to the state update byte selection module, which is an internal module of the Assigninput block. So, the state update byte selection module holds 96 bytes of data (the contents of R and m). The next state R̂ (80 bytes) is generated from the contents of R and m as described in [3].

¹ In hardware, odd or even is determined by checking the LSB (1 or 0) of the result of the logical AND operation between the preprocessing signal and the state value.

For AES, the plaintext/ciphertext and key are forwarded to the Round function block and Key schedule block respectively.

Figure 4: Assign input block architecture

C. Round Function Block

Both AES and AEGIS reuse this module to achieve their goal. Here, the round function block mainly takes three inputs, viz. the output of MUX M1, the output of MUX M5 and the output of MUX M6, and produces one output from MUX M4, as shown in Figure 1. The output of M1 is the plaintext or ciphertext, used only for AES encryption or decryption. The output of M6 is used for both AES and AEGIS; in AES this operation is known as addroundkey. The output of M5 is the data of R̂ for AEGIS or the addroundkey result for AES.² The final output of this module (from M4) contains 16-byte data for AES (after 11 rounds) and 80-byte data for AEGIS (after five rounds). The 16-byte data of AES is then forwarded to the Assigninput block as ciphertext. For AEGIS, 16-byte data per round (80 bytes in total) is forwarded to the state update module.

² M5 selects the addroundkey result for all rounds of AES except the first round.

The submodules inside this block are described in detail below.

1) SubByte Operation: The SubByte operation takes 128-bit (16-byte) input data. Each byte is updated using an 8-bit substitution box [1]. SubByte is a non-linear byte substitution. The easiest way to implement the S-box is ROM based, in which all 256 precomputed values are stored for future use. However, the shortcomings of this approach are higher hardware resource usage and a fixed latency due to the ROM access time. So, we have used on-the-fly mode to implement the S-box in our design. In this mode, the SubByte operation performs two types of transformation, viz. affine transformation and multiplicative inverse
transformation. In our design, both transformations are housed in a single S-box block, as shown in Figure 5. The inverse and normal S-box operations are selected by the inverse-or-not signal (Figure 5). The multiplicative inverse and affine transformation architectures are described in [6]. Figure 5 shows the S-box datapath and control path. This approach uses the multiplicative inverse operation followed by the affine transformation for the normal S-box operation. In the case of the inverse S-box, it follows the alternate path shown in Figure 5. The design performs the SubByte operation when the inverse-or-not signal is set to zero; otherwise it performs the inverse SubByte operation.

Figure 5: SBox architecture [6]

2) ShiftRow: ShiftRow is a linear diffusion [7] process that operates on individual rows. In this design, the shiftrow block takes 128-bit data as input and produces 128-bit output, both arranged in a 4x4 matrix format. The first row of the matrix is left unchanged. Each byte of the second row is shifted one position to the left. Similarly, the third and fourth rows are shifted by offsets of two and three respectively.

3) MixColumn: It is a matrix multiplication over GF(2^8). Each column vector is multiplied by a fixed matrix, where the bytes are treated as polynomials of degree at most 7, i.e. elements of GF(2^8), rather than numbers. For example, the bit string 10011001 represents x^7 + x^4 + x^3 + 1. In our design, we have adopted the architecture of [8] for the mixcolumn operation.

D. Key Schedule Block

This block handles roundkey generation for AES. The Key Schedule block has the self-inverting property. The complexity of the key expansion algorithm and the other stages determines the security of AES. The AES encryption process involves eleven 128-bit roundkeys that need to be generated from the original key (key). In decryption, the same roundkeys are used but in reverse order. In the usual implementation, all roundkeys are first computed by the key expansion algorithm and then stored in a RAM for future use. However, this approach is quite vulnerable to cache attacks. In order to avoid this, the key expansion has been changed from a RAM-based to an on-the-fly [9] implementation. In this mode, the key schedule unit stores only the i-th round key, namely Roundkey_i. Roundkey_{i+1} and Roundkey_{i-1} are then calculated from Roundkey_i in encryption and decryption mode respectively. Paper [9] completely describes the implementation of this technique. For every round, this block generates the roundkey for the addroundkey operation in AES mode through MUX M6, as shown in Figure 1.

E. State Update Function Block

In this section, the intermediate state update functionality of AEGIS [3] is discussed. It is the main module for AEGIS. Figure 6 shows the state update diagram. The 80-byte data from the Round function module is assigned to the five 128-bit registers R in the state update module. This module works in the following way repeatedly until the mac generation phase completes.
• The contents of R are forwarded to the Assigninput block and then divided into five 16-byte data D = ⟨D_1, D_2, D_3, D_4, D_5⟩ once the aegis counter reaches zero.
• Each D_i for i ∈ [1, 5] is passed one by one to the Round function block through M5, producing the output O = ⟨O_1, O_2, O_3, O_4, O_5⟩.
• For i = 1, 2, 3, 4, the result of O_i ⊕ D_{i+1} is stored in NS_i, where D_{i+1} is selected by M6 in Figure 6. In addition, when i = 5 and the aegis counter reaches 4, NS_1, NS_2, NS_3, NS_4 together with O_5 ⊕ D_1 ⊕ m are stored in R.

During the ciphertext processing phase, the ciphertext (C) is generated using the following expression.

C = M ⊕ RS_1 ⊕ RS_4 ⊕ (RS_2 & RS_3)

To recover the original message during decryption, M is replaced by C. The MAC is calculated in the mac generation phase using the following expression.

MAC = RS_0 ⊕ RS_1 ⊕ RS_2 ⊕ RS_3 ⊕ RS_4

Figure 6: State update function architecture

F. Data Path of AES and AEGIS

This section presents how AES and AEGIS are performed individually in the proposed integrated architecture. Here, the AES data path is described, followed by the AEGIS data path.
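To make the shared data path concrete, the following is a minimal software model of the common round core (SubByte, ShiftRow, MixColumn) and of how the two modes differ in what is XORed onto its output. It is a sketch for illustration only, not the authors' hardware description: the function names are hypothetical, the S-box is computed as a multiplicative inverse followed by the affine transformation to mirror the on-the-fly approach of Section III-C, the state is assumed to be stored column by column as in FIPS-197, and the order in which the new AEGIS registers are written back is assumed to follow [3].

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B
        b >>= 1
    return result


def sbox_on_the_fly(byte: int) -> int:
    """Multiplicative inverse followed by the affine transformation."""
    inverse = 1
    for _ in range(254):  # byte^254 is the inverse; zero maps to zero
        inverse = gf_mul(inverse, byte)
    out = inverse
    for shift in range(1, 5):  # XOR of four left rotations of the inverse
        out ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
    return out ^ 0x63


# Table built once from the on-the-fly computation (software convenience only).
SBOX = [sbox_on_the_fly(b) for b in range(256)]


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def round_core(state: bytes) -> bytes:
    """SubByte, ShiftRow, MixColumn on a 16-byte column-major state."""
    subbed = [SBOX[b] for b in state]
    # Row r is rotated left by r positions.
    shifted = [subbed[row + 4 * ((col + row) % 4)]
               for col in range(4) for row in range(4)]
    out = []
    for col in range(4):
        a = shifted[4 * col: 4 * col + 4]
        for row in range(4):
            out.append(gf_mul(2, a[row]) ^ gf_mul(3, a[(row + 1) % 4])
                       ^ a[(row + 2) % 4] ^ a[(row + 3) % 4])
    return bytes(out)


def aes_round(state: bytes, round_key: bytes) -> bytes:
    """AES mode: round core followed by addroundkey."""
    return xor_bytes(round_core(state), round_key)


def aegis_state_update(regs: list, m: bytes) -> list:
    """AEGIS mode: five round-core calls, each XORed with the next register."""
    outs = [round_core(d) for d in regs]                         # O_1 .. O_5
    new = [xor_bytes(xor_bytes(outs[4], regs[0]), m)]            # O_5 ^ D_1 ^ m
    new += [xor_bytes(outs[i], regs[i + 1]) for i in range(4)]   # NS_1 .. NS_4
    return new
```

The round core can be checked against the first full round of the FIPS-197 Appendix B example, for instance `aes_round(bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808"), bytes.fromhex("a0fafe1788542cb123a339392a6c7605"))`. The model ignores the last AES round (which skips MixColumn), decryption, and all timing and MUX control.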
1) AES data path: Figure 7 shows only the AES part of Figure 2. It elaborates how AES operates in the proposed integrated design. The input of M1 and output of M5 (plaintext/ciphertext) are part of the Assigninput module, whereas the Key Schedule module and Round function module are shown in dotted boxes in Figure 7. The following steps are executed, based on Figure 7, if the user selects the AES mode of operation.
1) The roundkey of the first round is forwarded to the Round function block, and simultaneously the plaintext or ciphertext, depending on encryption or decryption mode, is also forwarded to the Round function block through MUX M1. Both these data are then XORed and the resulting value is passed to the shiftrow1 block.
2) M3 has two inputs: one from the shiftrow1 block and another from the shiftrow2 block. M3 selects the data of the shiftrow2 block except in the first round of AES, and this data is forwarded to the intermediate register.
3) MUX M2 operates based on the AES or AEGIS mode selected by the user. In the case of AES, the intermediate register data is passed to the subbyte block through MUX M2, and then the mixcolumn operation is performed based on encryption or decryption mode.
4) M4 has two inputs: one from the mixcolumn block and another from the subbyte block. M4 selects the mixcolumn data except in the last round of AES.
5) The output of M4 is XORed with the intermediate roundkey, which comes from M6 of the Key Schedule block.
6) M5 has two inputs: one from the XOR operation and another from the Assigninput block. M5 selects the XOR result for AES and the Assigninput block for AEGIS. The output of M5 is forwarded to the shiftrow2 block.
7) Step 2 to Step 6 are executed for all the rounds of AES.
8) Finally, the output of M5 is treated as the ciphertext for encryption.
9) During decryption, all the steps are the same as for encryption, but the inverse mixcolumn and inverse subbyte operations are used instead of the normal mixcolumn and subbyte operations.

Figure 7: AES block diagram

For decryption mode, all the steps are the same as for encryption, except that the intermediate roundkey is first transformed by the inverse mixcolumn operation. The addroundkey operation is then performed; this is possible because of the linearity property of AES described in [10].

2) AEGIS data path: For AEGIS, the same Round function block shown in Figure 7 is used, with a few changes. Here, the Round function block is fed through MUX M5, which selects the data from the Assigninput block. The output of M3 is forwarded directly to the subbyte block via M2 without being stored in the intermediate register. Recall that we have already explained how the Assigninput block, Round function block and State update block together generate the ciphertext and MAC for AEGIS.

IV. IMPLEMENTATION: TEST AND RESULT

The design is implemented on an Altium NanoBoard of series NB3000XN-05 with a Xilinx Spartan-3AN device (XC3S1400AN-4FGG676C). For synthesis, we have used Altium Designer 10.0. For AES, we use the test vectors described in FIPS-197, and for AEGIS-128 we use a 128-bit key, a 128-bit IV and 384-bit associated data. Table I shows the test vector for AEGIS-128 and the results. The 384-bit associated data is "a57a7496a010270a452800f1618c839c1cab29e5cc460c68369fcf5ca5a27f0223ea0b8c9826ad9817b54c2e6f09ce6e", the IV is "a23c1211032336b1a21ba21102112304" and the input key is "2b7e151628aed2a6abf7158809cf4f3c". Table I gives the ciphertext and MAC corresponding to each plaintext message.

Table I: AEGIS test vector

Input msg                         Cipher text                       MAC
616268686a6364676667666764666467  7ae41da54727aee38d243373d7eba965  7dcb1aac995160991f05e1f840637998
616268686a6364676667666764666466  7ae41da54727aee38d243373d7eba964  a625c39275dc7c5a4bd0027940394c37

Post-synthesis simulation is done using a Xilinx-based Spartan-3AN environment to estimate the design throughput and its efficiency. The simulation result establishes that our design can operate at a frequency of 57.019 MHz and achieve a throughput of 1.359 Gb/s. On higher-end boards we can achieve higher frequency and throughput. Table II describes the performance of our design in terms of frequency and throughput on other hardware configurations.

Table II: Performance on different hardware

Sl. No  FPGA Board    Frequency (MHz)  AEGIS throughput (Gb/s, approx.)
1       Spartan-3     45.743           1.090
2       Spartan-3AN   57.019           1.359
3       Spartan-6     70.826           1.688
4       Virtex-4      105.814          2.522
5       Artix-7       142.24           3.391
6       Virtex-7      158.52           3.779
7       Kintex-7      179.160          4.267
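The paper does not state how throughput is derived from clock frequency. The reported figures nevertheless appear consistent with one 128-bit block every five clock cycles in AEGIS mode (ten in AES mode), with "Gb" read as 2^30 bits. The sketch below encodes that reading; both the cycle counts and the binary giga are inferences from the numbers, not a formula given by the authors, and the function name is hypothetical.

```python
BLOCK_BITS = 128
BITS_PER_GIGA = 2 ** 30  # assumption: the tables use a binary giga


def throughput_gbps(freq_mhz: float, cycles_per_block: int) -> float:
    """Throughput in Gb/s for one 128-bit block every `cycles_per_block` cycles."""
    bits_per_second = freq_mhz * 1e6 * BLOCK_BITS / cycles_per_block
    return bits_per_second / BITS_PER_GIGA


# Frequencies taken from Table II; 5 cycles per block assumed for AEGIS mode.
table_ii = {
    "Spartan-3": 45.743,
    "Spartan-3AN": 57.019,
    "Spartan-6": 70.826,
    "Virtex-4": 105.814,
    "Artix-7": 142.24,
    "Virtex-7": 158.52,
    "Kintex-7": 179.160,
}

for board, freq in table_ii.items():
    print(f"{board:12s} {freq:8.3f} MHz  {throughput_gbps(freq, 5):.3f} Gb/s")
```

Under these assumptions the computed values agree with Table II to within about 0.005 Gb/s, and the same formula with ten cycles per block gives roughly the 0.679 Gb/s AES figure reported for the integrated design on the Spartan-3AN.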
Table III depicts the resource requirements and performance for the two design scenarios below. Our work is based on the second design approach. Both approaches achieve the same goal (AES + AEGIS).
1) Stand-alone approach of AES and AEGIS
2) Integrated architecture of AES and AEGIS by reusing the common functionalities between them (our proposed design).

It shows that the integrated design requires 35% fewer hardware resources (slices) than the stand-alone approach. It can also be noted that the proposed design uses 50% more resources (slices) than the stand-alone AES implementation. Moreover, the maximum power required for this design is 0.158 W on a Spartan-3AN XC3S1400AN device.

Table III: Performance of integrated design

Parameters  AES         AEGIS        AES+AEGIS
Slice       1857        2367         2753
Flip flop   337         1681         1823
Frequency   91.23 MHz   161.826 MHz  57.019 MHz
Throughput  1.08 Gb/s   3.996 Gb/s   0.679 Gb/s (AES) and 1.359 Gb/s (AEGIS)

Although the combined architecture in [5] is based on an ASIC implementation, we still observe that our design achieves higher speed and throughput on most of the hardware devices (Sl. No. 4 to 7) in Table II. The comparison is shown in Table IV.

Table IV: Performance comparison

Performance parameter  Paper [5]   Proposed design
Frequency              91 MHz      105.814 MHz (Virtex-4)
                                   142.24 MHz (Artix-7)
                                   158.52 MHz (Virtex-7)
                                   179.16 MHz (Kintex-7)
Throughput             1.034 Gb/s  2.522 Gb/s (Virtex-4)
                                   3.391 Gb/s (Artix-7)
                                   3.779 Gb/s (Virtex-7)
                                   4.267 Gb/s (Kintex-7)

V. CONCLUSION AND FUTURE WORK

A new design architecture combining AES and AEGIS is proposed in this paper. The design is implemented in a cost-effective manner by reusing the common functionality of AES and AEGIS. It also provides high throughput when tested on various hardware platforms. A maximum frequency of about 180 MHz and a throughput of about 4 Gb/s are observed on the Kintex-7 based configuration. By applying pipelining, the frequency and throughput can be enhanced further.

REFERENCES

[1] J. Daemen and V. Rijmen, The Design of Rijndael: AES-The Advanced Encryption Standard. Springer Science & Business Media, 2013.
[2] H. Wu, "ACORN: a lightweight authenticated cipher (v3)," Candidate for the CAESAR Competition. See also https://competitions.cr.yp.to/round3/acornv3.pdf, 2016.
[3] H. Wu and B. Preneel, "AEGIS: A fast authenticated encryption algorithm," in International Conference on Selected Areas in Cryptography. Springer, 2013, pp. 185–201.
[4] A. Mileva, V. Dimitrova, and V. Velichkov, "Analysis of the authenticated cipher MORUS (v1)," in International Conference on Cryptography and Information Security in the Balkans. Springer, 2015, pp. 45–59.
[5] F. Şahin, H. F. Uğurdağ, and T. Yalçın, "Combined AES + AEGIS architectures for high performance and lightweight security applications," in ICT Innovations 2014. Springer, 2015, pp. 213–224.
[6] E. N. Mui, R. Custom, and D. Engineer, "Practical implementation of Rijndael S-box using combinational logic," Custom R&D Engineer Texco Enterprise Pvt. Ltd, 2007.
[7] W. Stallings, Cryptography and Network Security: Principles and Practices. Pearson Education India, 2006.
[8] H. Li and Z. Friggstad, "An efficient architecture for the AES mix columns operation," in International Symposium on Circuits and Systems (ISCAS 2005), 23-26 May 2005, Kobe, Japan, 2005, pp. 4637–4640.
[9] S. Mangard, M. Aigner, and S. Dominikus, "A highly regular and scalable AES hardware architecture," IEEE Transactions on Computers, vol. 52, no. 4, pp. 483–491, 2003.
[10] A. Paul, P. Mithili, and V. Paul, "Fast symmetric cryptography using key and data based masking operations," in Proceedings of the International Conference on VLSI and Communication Engineering, 2009.
