Abstract Chaos computing provides a large number of functions from a single piece of hardware. Large-scale reconfigurability can be achieved flexibly by tuning only a few parameters of a chaos-based computing system. Implementing reconfigurable complex functions from a single chaos circuit can alleviate the area and power concerns that come with decreasing technology nodes. Using the chaos-generated functionalities, it is possible to build a multi-input multi-output complex instruction set whose operations are more uniform than those of conventional implementations. The lack of uniformity in the implementation of instructions in traditional computing systems gives attackers an opportunity to reverse engineer instructions based on side channel power analysis. In this paper, it is proposed that a chaos-based implementation of a complex instruction set is immune to classification-based reverse engineering attacks. Cross obfuscation and self obfuscation schemes are proposed which leverage the reconfigurability of chaotic systems to obfuscate the power profile of the instruction set, making it immune to reverse engineering attacks. The design realizes 3-input multi-output instructions using a single chaotic iterative map. We analyze the immunity of this design against classification-based reverse engineering attacks for six different classification algorithms with five dimensionality reduction techniques.

Keywords chaos computing, side channel, power profile, obfuscation, instruction classification, hardware security

∗ Md Sakib Hasan
343 Min H. Kao Building
1520 Middle Drive
Knoxville, TN 37996-2250 USA
Tel.: +1-865-974-0229
E-mail: mhasan4@utk.edu
1 Department of Electrical Engineering and Computer Science, The University of Tennessee, Knoxville, TN, USA
E-mail: {mmajumde,ashanta1,garose}@utk.edu

2 Md Sakib Hasan∗1 et al.

1 Introduction

Since Lorenz's discovery of chaotic motion on a strange attractor in 1963 [27], chaos has attracted a lot of attention in areas such as chemistry, physics, biology, ecology and financial systems [35]. Many natural and engineering systems have been modeled using dynamical systems [10, 26]. Over the years, non-linear dynamics in chaotic systems has become an active field of research due to advancements in chaotic neural networks and chaos communications [1, 16, 25]. Applications of chaos are significant in engineering, especially in cryptography, secure communication, plasma technologies and lasers [2, 13, 18].

Computing based on the chaotic progression of state in non-linear circuits, also known as chaos computing, has been an exciting research area. Researchers have worked on various aspects of chaos computing, exploring reconfigurability, flexibility and security in non-linear systems [5, 7, 22]. Chaos-based computation can enable a single piece of hardware to perform a large number of operations by changing a small set of parameters in the circuit topology. A single operation can be performed in many ways by changing the threshold voltage, control bits, initial state, iteration number and bifurcation parameter.

Researchers have proposed different chaos-based implementations of basic logic gates. In addition to 2-input basic logic functions, higher-input functions can also be implemented using chaos [21]. Chaos computing shows promise in the field of secure and confidential computing. It has been proposed as a means of obfuscating the power profile and hence mitigating power analysis based side channel attacks [28, 31].

In this work, a new chaos-based design is proposed for implementing a complex micro-instruction set comprising 3-input multi-output (1 to 8 outputs) digital operations. Since each instruction in a traditional instruction set has a distinguishing power signature, instructions can be classified with high accuracy using standard classification algorithms. Training power signatures are collected from a reference computing machine and then used to reverse engineer instructions on different machines. This work shows that by leveraging different configurations of chaos operation, each machine can perform instructions with a unique power profile. Moreover, a uniform implementation of the instruction set is also proposed in which every operation exhibits very similar power signatures, so that the operations cannot be distinguished. It is demonstrated that both of these methods can help mitigate side channel reverse engineering attacks performed using different classification and dimensionality reduction algorithms.

The paper is organized as follows. Section 2 provides background on chaos computing and its application to reconfigurable and flexible logic operations. The design and working principles of a chaotic map circuit are described in Section 3. Section 4 describes the design of the complex micro-instruction set. Section 5 goes through the classification algorithms and dimensionality reduction techniques used to reverse engineer instructions from side channel power signatures. Classification results for the traditional CMOS-based implementation of the instruction set are provided in Section 6. The two proposed defense models, along with their corresponding results, are explained in Sections 7 and 8, respectively. Section 9 discusses limitations and possible future directions for this research. Finally, the paper is concluded in Section 10.

2 Chaos Computing Preliminaries

Despite the tremendous success of digital systems, there are areas where they do not meet the demands and specifications of current applications. Traditional computer systems are built from Boolean circuits that contain switches (transistors). The transistors open and close based on the input applied to their gates. Different circuit topologies need to be designed in CMOS technology to implement different logic functions [21]. Hence, digital systems require millions of transistors, which may cause problems such as excessive power consumption and heat production in a chip. As a solution to this problem, non-linear dynamical systems that can generate multiple logic functions using the same design are being studied.

In chaotic systems, the non-linear dynamics of CMOS circuits and their intrinsic computational capability are being explored. The circuit maps its initial state to future states. The dynamical system can evolve in continuous time or in discrete time. Continuous-time chaotic systems have high complexity and low efficiency compared to discrete-time chaotic systems. To design a discrete-time chaotic map, the output of the map is connected to the input of the circuit, creating a feedback path [23].

A disadvantage of chaotic logic gates is that they require more hardware than standard CMOS logic gates. To overcome this limitation, each chaotic gate should be able to generate increased functionality. The number of functions that a single chaotic circuit can implement increases exponentially with the number of iterations. The functions generated by the chaotic circuit can be dynamically chosen to implement different logic functions in each clock cycle. The chaotic system can exhibit different behaviors when either the initial state of the circuit or the circuit parameters are changed. In practice, not all of the functions may be accessible or usable due to noise or instability [22].

Chua's circuit is one of the most popular chaotic oscillators; it uses a piecewise-linear element called the "Chua diode" to implement the non-linearity of the dynamical system. Several approaches, such as "cubic-like" non-linearity or cubic non-linearity, have been proposed to replace the Chua diode [33]. Previously, arithmetic operations were performed using arrays of chaotic elements. Chaotic systems are now used to implement logic gates capable of implementing AB, A + B, $\overline{AB}$, $\overline{A + B}$, A ⊕ B, $\overline{A \oplus B}$, ON and OFF. The one-dimensional system is able to implement only eight of the 16 possible functions because the initial state is a function of the sum of the inputs [7]: the output can then depend only on A + B ∈ {0, 1, 2}, so at most 2^3 = 8 functions are reachable. This problem can be resolved by assigning each input to its own state variable.

Chaotic circuits are made reconfigurable and flexible by changing parameters in the circuit topology. In 1998, Sinha et al. proposed that chaotic systems can be used to build computers [32]. Murali et al. used a thresholding mechanism to implement a NOR gate with a continuous-time chaotic system [30]. Rizk et al. also applied a threshold technique to Chua's circuit in order to obtain functions such as AND, OR, NAND, NOR and NOT. Chua's circuit has been used to design a flip-flop, which is a building block for memory devices [4]. Rose constructed a chaos-based arithmetic logic unit where different functions are selected by altering the control input and iteration number [31]. Bohl
A Chaos-based Complex Micro-Instruction Set for Mitigating Instruction Reverse Engineering 3
fined by a D-dimensional unit vector, $u_1$. The projection of each observation $x_n$ onto this subspace is given by $u_1^T x_n$. If all the observations are stacked into a matrix X, the projection of each row of the matrix can be represented as $U^T X$, where U is a matrix consisting of eigenvectors of the covariance matrix σ. The projection of the observations onto a D-dimensional subspace that maximizes the projected variance is given by the D eigenvectors $u_1, \ldots, u_D$ with the D largest eigenvalues $\lambda_1, \ldots, \lambda_D$ [34]. The effectiveness of PCA depends on the number of reduced dimensions and on the nature of the analyzed data. In this work, the first few principal components contain most of the variance of the features, as shown in Fig. 8. When using PCA, the dimensionality of the problem has been reduced to the first 30 principal components in all cases, since most of the information is retained in the reduced data.

5.2.2 Means-PCA

PCA maximizes the overall variance of the observations but does not take the variance between classes into account. A reasonable alternative is to maximize the variance of the inter-class observations, since moving the class means apart may result in a higher classification rate. Here, the class means are considered as instances, and the projection coefficients are computed using the techniques discussed in Section 5.2.1. These projection coefficients are then used to transform the observations. In this method, the number of reduced dimensions is K − 1, where K is the number of classes.

5.2.3 Sum of Difference of Means (SDM)

reduction technique geared towards multi-class classification, a reasonable choice is to take the feature points accounting for the maximum variance of the original data across the different classes. To identify these points, the mean of each class, $\mu_k$, is needed, where 1 ≤ k ≤ K. If the mean values are put into a matrix (with the k-th row being the mean of the k-th class), a K × L matrix is created, where L is the dimension of the original data. The variance of each column is the inter-class variance of the corresponding feature point. Finally, the dimension is reduced by keeping the D columns with the highest variance.

5.2.5 Fisher's Linear Discriminant Analysis (FLDA)

Fisher's Linear Discriminant Analysis (FLDA) is an approach used in pattern recognition to find a linear combination of features that characterizes two or more classes of observations [8, 14]. The resulting combination may be used for dimensionality reduction before classification. However, instead of maximizing the overall variance of the data as PCA does, FLDA takes into consideration information about the covariance of the different classes, captured by the between-class and within-class covariance matrices. If $N_i$ L-dimensional observations are considered for each class $C_i$, then the within-class covariance $\sigma_W$ and the between-class covariance $\sigma_B$ are computed as:

$$\sigma_W = \sum_{i=1}^{K} N_i \, \sigma_i \qquad (1)$$
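The reductions described in Section 5.2 can be sketched in a few lines of numpy. This is a minimal sketch under stated assumptions, not the authors' code: traces are assumed to be the rows of an n × L array X with labels y, the between-class covariance uses the standard definition $\sigma_B = \sum_i N_i (\mu_i - \mu)(\mu_i - \mu)^T$ (its equation is not part of this excerpt), and a small ridge (a hypothetical parameter) is added to $\sigma_W$ so that it can be inverted.

```python
import numpy as np

def pca_basis(X, D):
    """Columns are the D leading eigenvectors u_1..u_D of the covariance of X (n x L)."""
    cov = np.cov(X, rowvar=False)                 # L x L covariance matrix sigma
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    return eigvecs[:, np.argsort(eigvals)[::-1][:D]]

def class_means(X, y):
    """K x L matrix whose k-th row is the mean of the k-th class."""
    return np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def means_pca_basis(X, y):
    """Means-PCA: PCA computed on the K class means, keeping K - 1 directions."""
    M = class_means(X, y)
    return pca_basis(M, M.shape[0] - 1)

def top_interclass_variance_columns(X, y, D):
    """Indices of the D features with the largest column variance of the class-mean matrix."""
    return np.argsort(class_means(X, y).var(axis=0))[::-1][:D]

def flda_basis(X, y, ridge=1e-6):
    """Fisher directions: leading eigenvectors of sigma_W^-1 sigma_B, keeping K - 1."""
    classes = np.unique(y)
    L = X.shape[1]
    mu = X.mean(axis=0)
    S_w = np.zeros((L, L))
    S_b = np.zeros((L, L))
    for c in classes:
        Xc = X[y == c]
        S_w += len(Xc) * np.cov(Xc, rowvar=False, bias=True)   # N_i * sigma_i, Eq. (1)
        d = (Xc.mean(axis=0) - mu)[:, None]
        S_b += len(Xc) * (d @ d.T)                              # assumed standard sigma_B
    vals, vecs = np.linalg.eig(np.linalg.solve(S_w + ridge * np.eye(L), S_b))
    order = np.argsort(vals.real)[::-1][: len(classes) - 1]
    return vecs[:, order].real

def project(X, U):
    """Projects every observation x_n (a row of X) onto the columns of U."""
    return X @ U
```

With PCA the reduced data would be `project(X, pca_basis(X, 30))`, matching the 30 components used in this work; the other bases are used the same way. The projection here is not mean-centered, following the $u_1^T x_n$ form given in the text.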
After the templates are created using dimensionality reduction techniques, the next step is to use a classification method to classify the test data and determine its accuracy. In a supervised learning setting, the training data consists of ordered pairs (x, y), where x is an instance and y is its class label. The goal of the algorithm is to assign a class to a given instance x. Many different classifiers are used in machine-learning problems, and their relative superiority depends on speed, implementation cost, accuracy and, most importantly, the nature of the problem. In this subsection, we briefly discuss the classification algorithms used in this work.

5.3.1 k-Nearest Neighbors Algorithm (kNN)

kNN is a non-parametric, lazy supervised learning algorithm. It is called non-parametric because it makes no assumptions about the underlying data distribution, and lazy because no generalization from the training data is performed before classification. In this algorithm, training consists of storing the training data along with their class labels. During classification, the classifier computes the distance between the instance x and every training instance x′ ∈ X. It then keeps the k closest training instances, where k ≥ 1. The class that is most

A support vector machine (SVM) [19] is a supervised learning algorithm primarily used for classification. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary classifier. An SVM model represents the examples as points in space, mapped so that the examples of separate categories are divided by as wide a gap as possible. New examples are mapped into the same space and predicted to belong to a category depending on which side of the gap they fall. SVM can be used as a non-linear classifier by using a suitable kernel function, e.g., the Gaussian radial basis function. The standard SVM supports only binary classification, but it can be extended by transforming a multi-class classification problem into multiple binary classification problems [17]. Depending on the application, different coding designs for combining binary classifiers, such as 'onevsone', 'onevsall', 'binarycomplete' and 'denserandom', are used in practice. In this work, we report results for the 'onevsall' and 'onevsone' techniques. For a K-way multiclass problem, 'onevsall' and 'onevsone' train K and K(K−1)/2 binary classifiers, respectively.
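The 'onevsall' and 'onevsone' decompositions described above do not depend on the binary learner. In the minimal numpy sketch below, a nearest-centroid scorer (`CentroidBinary`, a hypothetical stand-in, not an SVM) takes the place of each binary SVM so that the example stays self-contained; in practice each binary model would be an SVM, for example with a Gaussian RBF kernel. The sketch trains K models for one-vs-all (highest score wins) and K(K−1)/2 models for one-vs-one (majority vote).

```python
import itertools
import numpy as np

class CentroidBinary:
    """Hypothetical stand-in for a binary SVM: scores by distance to two class centroids."""

    def fit(self, X, t):
        # t holds +1 for the positive class and -1 for the negative class
        self.pos = X[t == 1].mean(axis=0)
        self.neg = X[t == -1].mean(axis=0)
        return self

    def decision(self, X):
        # > 0 when a row is closer to the positive centroid
        return (np.linalg.norm(X - self.neg, axis=1)
                - np.linalg.norm(X - self.pos, axis=1))


def fit_one_vs_all(X, y, make=CentroidBinary):
    classes = np.unique(y)
    models = [make().fit(X, np.where(y == c, 1, -1)) for c in classes]   # K models
    return classes, models


def predict_one_vs_all(model, X):
    classes, models = model
    scores = np.column_stack([m.decision(X) for m in models])
    return classes[np.argmax(scores, axis=1)]                             # highest score wins


def fit_one_vs_one(X, y, make=CentroidBinary):
    classes = np.unique(y)
    pairs = list(itertools.combinations(classes, 2))                       # K(K-1)/2 pairs
    models = []
    for a, b in pairs:
        mask = (y == a) | (y == b)
        models.append(make().fit(X[mask], np.where(y[mask] == a, 1, -1)))
    return classes, pairs, models


def predict_one_vs_one(model, X):
    classes, pairs, models = model
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for (a, b), m in zip(pairs, models):
        a_wins = m.decision(X) > 0
        votes[a_wins, index[a]] += 1
        votes[~a_wins, index[b]] += 1
    return classes[np.argmax(votes, axis=1)]                               # majority vote
```

Swapping `make` for a real binary SVM would leave the decomposition logic unchanged; only the per-pair training cost grows, which is why 'onevsone' becomes expensive as K increases.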
$$d_{cor}(x^1, x^2) = \frac{\frac{1}{L}\sum_{i=1}^{L} x^1_i x^2_i - \frac{1}{L^2}\left(\sum_{i=1}^{L} x^1_i\right)\left(\sum_{i=1}^{L} x^2_i\right)}{\sigma_{x^1}\,\sigma_{x^2}} \qquad (4)$$

The cosine distance between two points $x^1$ and $x^2$ can be defined as:

$$d_{cos}(x^1, x^2) = \frac{x^1 (x^2)^T}{\sqrt{\sum_{i=1}^{L} (x^1_i)^2 \sum_{i=1}^{L} (x^2_i)^2}} \qquad (5)$$

Discriminant analysis (DA) creates a linear combination of features that characterizes or separates two or more classes. It is often used for dimensionality reduction, as described in Section 5.2.5. For two classes, DA approaches the problem by assuming that the conditional probability density functions p(x|y = 0) and p(x|y = 1) are both normally distributed with mean and covariance parameters ($\mu_0$, $\Sigma_0$) and ($\mu_1$, $\Sigma_1$), respectively. In this work, we have used Linear Discriminant Analysis based on the assumption of homoscedasticity, i.e.
$\Sigma_0 = \Sigma_1$, and that the covariances have full rank. In this paper, Linear Discriminant Analysis leverages the 'onevsall' and 'onevsone' techniques.

5.3.5 Naive Bayes (NB)

Naive Bayes is a conditional probability model based on Bayes' theorem with additional simplifying assumptions. Given a problem instance to be classified, represented by a vector $x = (x_1, \ldots, x_n)$ of n features (independent variables), it assigns to the instance the probabilities $p(C_k \mid x_1, \ldots, x_n)$ for each possible outcome or class $C_k$ and labels the instance as belonging to the class with the highest probability. Combining Bayes' theorem with the very simplistic conditional independence assumption, the problem boils down to determining the value of $p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$ for each class and then choosing the class with the maximum value. In this work, the prior probability is estimated from the training set and the features are assumed to follow a Gaussian distribution.

5.3.6 Multivariate Gaussian Probability Density Function

Given $\mu_k$ and $\sigma_k$ of each instruction, classification is performed as follows. Let W be the power consumption waveform captured at runtime, and assume that its samples are drawn from a multivariate Gaussian (normal) distribution [15]. The noise introduced into the power waveform W is extracted by subtracting the mean value from the waveform:

$$n_k = \left(W[1] - \mu_k[1],\; W[2] - \mu_k[2],\; \ldots,\; W[p] - \mu_k[p]\right) \qquad (6)$$

where $\mu_k$ is the mean of instruction $I_k$ and p is the number of selected features after the original dimensionality is reduced. The probability of observing the noise $n_k$ in the device's power trace is then computed as:

$$\mathcal{N}\left(n_k, (\mu_k, \sigma_k)\right) = \frac{1}{(2\pi)^{D/2}} \exp\left(-\frac{1}{2}\, n_k\, \sigma_k^{-1}\, n_k^{T}\right) \qquad (7)$$

The instruction whose template yields the highest probability of observing the noise $n_k$ is classified as the correct instruction.

6 CMOS Classification

The traditional CMOS-based implementation of the instruction set is vulnerable to power-based classification attacks. Each instruction exhibits a distinguishable power signature. Therefore, instructions can be classified using a sufficient amount of power data collected by applying different random operands. As demonstrated in previous work, instructions implemented on a traditional CMOS-based processor can be classified with high accuracy [28, 29].

This work reports the classification of the traditional CMOS-based implementation of the instruction set using all the classification algorithms described earlier. For each classification algorithm, several dimensionality reduction techniques are used. The data has also been analyzed without any reduction technique. Classification results for the discussed techniques are tabulated in Table 2. The best classification accuracy was achieved for 1-NN with cosine distance using all the sample points of the dataset. A confusion matrix showing detailed results for each instruction for this classifier is shown in Table 3. The overall accuracy is 94.2%, which is very close to the ideal (attacker's) value of 100%.

7 Cross Obfuscation

As shown in Section 6, the traditional implementation of instructions in a processor can be accurately profiled based on its power trace. Instruction power profiles can be successfully used to reverse engineer instructions on any other machine using the classification algorithms described earlier. However, with chaos-based computing, functionality can be chosen from a large space where a single function can be implemented using different configurations, each with a unique power signature. Consequently, profiling instructions based on the power signatures from a reference machine is not sufficient to classify instructions on other machines. This idea was first proposed in [28], where seven two-input ALU instructions were implemented using three basic chaotic logic gates (AND, OR and XOR). A sequence of $V_c$ values along with a variable number of iterations $\phi_s$ was used, with no control bit and a fixed threshold.

In this work, the same technique has been extended to prevent classification attacks among eleven three-input multi-output operations, each implemented with a single chaos-based logic gate. The control operation has been simplified by using a single $V_c$ (chosen inside the chaotic region in Fig. 3) for a particular operation, with a variable threshold $\delta$ and a 6-bit control input C to expand the design space.

Five different sets of configurations are chosen in the chaos circuit to obtain all the instructions in the instruction set. As already described, 4 different parameters of the chaos circuit comprise the configuration for each operation. The parameters are bias voltage, $V_c$,
Table 2: Classification accuracy among instructions using different classifiers and dimensionality reduction algorithms for CMOS implementation.
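The template classifier of Section 5.3.6 (Eqs. 6 and 7) and the Gaussian Naive Bayes classifier of Section 5.3.5 differ mainly in the covariance they use: Naive Bayes with Gaussian features is the same likelihood computation with a diagonal covariance, plus the log prior estimated from the training set. The minimal numpy sketch below shows both under these assumptions: traces are the rows of an n × p array, `ridge` is a hypothetical regularizer that keeps each covariance invertible, and the log of the standard multivariate normal density is evaluated, which also contains the normalizing factor $|\sigma_k|^{-1/2}$ (that factor matters when the instruction covariances differ).

```python
import numpy as np

def fit_gaussian_templates(X, y, diagonal=False, ridge=1e-6):
    """Per-instruction templates (prior, mu_k, sigma_k) from profiling traces X (n x p).

    diagonal=False: full covariance, the template model of Section 5.3.6.
    diagonal=True:  independent features, i.e. Gaussian Naive Bayes (Section 5.3.5).
    """
    templates = {}
    for k in np.unique(y):
        Xk = X[y == k]
        mu = Xk.mean(axis=0)
        sigma = np.diag(Xk.var(axis=0)) if diagonal else np.cov(Xk, rowvar=False)
        sigma = sigma + ridge * np.eye(X.shape[1])    # keeps sigma invertible
        templates[k] = (np.mean(y == k), mu, sigma)   # prior estimated from training set
    return templates

def log_density(n, sigma):
    """Log of the multivariate Gaussian density of the noise vector n (length p)."""
    p = n.shape[0]
    _, logdet = np.linalg.slogdet(sigma)
    maha = n @ np.linalg.solve(sigma, n)              # n sigma^-1 n^T
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)

def classify_trace(W, templates, use_prior=False):
    """Return the instruction whose template best explains the trace W."""
    best, best_score = None, -np.inf
    for k, (prior, mu, sigma) in templates.items():
        n_k = W - mu                                  # noise estimate, Eq. (6)
        score = log_density(n_k, sigma)               # log-likelihood, cf. Eq. (7)
        if use_prior:
            score += np.log(prior)                    # Naive Bayes adds log p(C_k)
        if score > best_score:
            best, best_score = k, score
    return best
```

Working in the log domain avoids the numerical underflow that the exponential in Eq. (7) causes for long traces; comparing log-likelihoods selects the same instruction as comparing the densities themselves.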
Table 3: Confusion matrix of classification accuracy for different instructions in CMOS implementation. Rows and columns represent the test instruction and percentage of their matched class, respectively.
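Table 3 reports the 1-NN classifier with cosine distance, the best performer on the CMOS traces. A minimal numpy sketch of kNN with the two measures of Eqs. (4) and (5) follows. As written, Eqs. (4) and (5) grow as two traces become more alike (they are the correlation coefficient and the cosine similarity), so the sketch assumes the usual conversion distance = 1 − similarity; that conversion and the tie-breaking rule (smallest label wins) are assumptions, not details taken from this paper.

```python
import numpy as np

def corr_similarity(a, b):
    # Eq. (4): (mean(a*b) - mean(a) mean(b)) / (sigma_a sigma_b), population std
    return (np.mean(a * b) - a.mean() * b.mean()) / (a.std() * b.std())

def cos_similarity(a, b):
    # Eq. (5): a . b^T / sqrt(sum a_i^2 * sum b_i^2)
    return float(a @ b) / np.sqrt(float(a @ a) * float(b @ b))

def knn_predict(x, X_train, y_train, k=1, similarity=cos_similarity):
    """Label of trace x by majority vote among its k nearest training traces."""
    dist = np.array([1.0 - similarity(x, t) for t in X_train])   # assumed: 1 - similarity
    nearest = np.argsort(dist)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                             # ties go to smallest label
```

Passing `similarity=corr_similarity` gives the correlation variant; with k=1 the vote reduces to copying the label of the single closest training trace, as in the 1-NN configuration of Table 3.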
Operation configurations (each cell lists Vc, C, δ, φs)
Operation | Config.1 | Config.2 | Config.3 | Config.4 | Config.5
AND | 0.69, 58, 1.02, 5 | 0.74, 63, 1.17, 3 | 0.62, 0, 0.69, 4 | 0.69, 58, 1.02, 9 | 0.72, 41, 1, 9
OR | 0.71, 40, 0.3, 3 | 0.65, 26, 0.3, 5 | 0.62, 46, 0.27, 8 | 0.63, 24, 0.27, 8 | 0.74, 49, 0.24, 3
XOR | 0.65, 28, 0.42, 5 | 0.67, 41, 0.51, 7 | 0.73, 63, 0.5, 15 | 0.71, 23, 0.61, 15 | 0.69, 54, 0.96, 15
NAND | 0.74, 63, 0.32, 5 | 0.66, 61, 0.34, 4 | 0.62, 0, 0.4, 3 | 0.65, 39, 0.38, 3 | 0.7, 8, 0.25, 3
NOR | 0.63, 34, 0.61, 7 | 0.69, 54, 1.17, 8 | 0.74, 63, 1.19, 1 | 0.72, 45, 1.08, 10 | 0.66, 60, 0.78, 21
XNOR | 0.65, 28, 0.69, 6 | 0.66, 63, 0.81, 15 | 0.69, 63, 0.82, 13 | 0.71, 23, 0.42, 14 | 0.72, 62, 0.51, 18
ADD | 0.65, 28, 0.42, 5 | 0.65, 40, 0.51, 5 | 0.65, 45, 0.42, 5 | 0.65, 26, 0.42, 5 | 0.72, 58, 0.52, 19
SUB | 0.65, 28, 0.69, 6 | 0.65, 48, 0.77, 6 | 0.68, 41, 0.51, 8 | 0.65, 51, 0.6, 6 | 0.72, 57, 0.83, 15
MUX | 0.69, 62, 0.55, 7 | 0.62, 38, 0.63, 8 | 0.65, 55, 0.94, 8 | 0.65, 53, 0.86, 0.65 | 0.74, 19, 1.018, 11
DEC | 0.64, 36, 1.09, 15 | 0.68, 23, 0.92, 10 | 0.71, 63, 1.18, 8 | 0.67, 49, 0.72, 6 | 0.68, 7, 0.49, 21
ENC | 0.71, 50, 0.95, 10 | 0.67, 4, 0.7, 5 | 0.62, 36, 0.76, 2 | 0.65, 39, 0.53, 2 | 0.74, 0, 1.03, 6
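The configuration table above shows that one operation (e.g., AND) can be realized by several different (Vc, C, δ, φs) settings, each of which yields a different power signature. The toy sketch below illustrates the mechanism with a stand-in tent map on [0, 1] instead of the paper's three-transistor map circuit; the input encoding x0 = x_init + step·(A + B), the map gain mu = 1.9 and the parameter grids are all hypothetical. Here `x_init` loosely plays the role of the bias/control setting, `delta` the threshold δ and `iters` the iteration count φs. Because x0 depends only on A + B, the sketch also reproduces the limitation noted in Section 2: at most 8 of the 16 two-input functions are reachable.

```python
import itertools
import numpy as np

def tent(x, mu=1.9):
    # Hypothetical stand-in for a tent-shaped transfer curve (not the paper's circuit).
    return mu * np.minimum(x, 1.0 - x)

def truth_table(x_init, delta, iters, step=0.25):
    """Gate outputs for inputs (A, B) = (0,0), (0,1), (1,0), (1,1)."""
    out = []
    for a, b in itertools.product((0, 1), repeat=2):
        x = x_init + step * (a + b)      # initial state depends only on A + B
        for _ in range(iters):           # iterate the map (role of phi_s)
            x = tent(x)
        out.append(int(x > delta))       # threshold the final state (role of delta)
    return tuple(out)

def explore(x_inits, deltas, iter_range):
    """Map each reachable truth table to the configurations that realize it."""
    found = {}
    for cfg in itertools.product(x_inits, deltas, iter_range):
        found.setdefault(truth_table(*cfg), []).append(cfg)
    return found

NAMES = {(0, 0, 0, 1): "AND", (0, 1, 1, 1): "OR", (0, 1, 1, 0): "XOR",
         (1, 1, 1, 0): "NAND", (1, 0, 0, 0): "NOR", (1, 0, 0, 1): "XNOR",
         (0, 0, 0, 0): "OFF", (1, 1, 1, 1): "ON"}

funcs = explore(np.linspace(0.0, 0.5, 11), np.linspace(0.05, 0.9, 18), range(1, 8))
for tt, cfgs in sorted(funcs.items()):
    print(NAMES.get(tt, tt), len(cfgs), "configurations")
```

Each printed line counts how many configurations of this toy map realize the same Boolean function; in the actual design, picking different configurations per machine (cross obfuscation) or a shared Vc and C for all instructions (self obfuscation) exploits this redundancy.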
Table 5: Classification accuracy of the instruction set among different chaos-based machines (cross-obfuscation) using several classification and data reduction algorithms.
Table 6: Confusion matrix of classification accuracy of different instructions for chaos-based cross-obfuscation implementation. Rows and columns represent the test instruction and percentage of their matched class, respectively.
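Tables 3, 6 and 9 report row-normalized confusion matrices in percent. The small numpy sketch below shows how such a matrix, the overall accuracy and the ideal obfuscation level of 100/N percent relate. It assumes every instruction contributes the same number of test traces, so that the overall accuracy equals the mean of the diagonal; the function names are illustrative.

```python
import numpy as np

def confusion_percent(y_true, y_pred, classes):
    """Rows: test instruction; columns: % of its traces assigned to each class."""
    index = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        counts[index[t], index[p]] += 1
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

def overall_accuracy(conf_pct):
    # Mean of the diagonal; valid when each class has the same number of test traces.
    return float(np.trace(conf_pct) / conf_pct.shape[0])

def ideal_obfuscation_level(n_classes):
    # Accuracy of an attacker who can do no better than guessing uniformly.
    return 100.0 / n_classes

print(round(ideal_obfuscation_level(9), 2))   # 11.11
```

Against this reference, the 11.67% reported for the best classifier in the self obfuscation experiment with 9 instructions is only about 0.56 percentage points above chance, while the 94.2% of Table 3 is far above the corresponding chance level.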
and ENC, the (δ, φs) pair of only the first output is shown. This strict design choice constrained the design space. However, it does not create design issues, since multiple configurations are not required for this method. The results are shown in Table 8. The best results were obtained for NB with the 'onevsone' technique, with a classification accuracy of 11.67%. This result is very close to the ideal value of 11.11% (1/9) for perfect obfuscation involving 9 instructions. The confusion matrix representing detailed classification results for this classifier is shown in Table 9.

The results for ADD and SUB are not shown in these tables, since they involve sequential operation, in contrast to the other 9 instructions. Therefore, even with the same Vc and C, they cannot be fully obfuscated and can be classified with relatively high accuracy. The accuracy of classifying ADD in this configuration using the NB classifier with the 'onevsone' technique is 77.5%. Additional design techniques have to be used to obfuscate such instructions.

9 Discussion and Future Work

This paper reports the classification results of a complex micro-instruction set using power traces, both for a traditional CMOS implementation and for two obfuscation schemes using chaos-based systems. The results from various classifiers using five different dimensionality reduction techniques are shown for comparison. The classification accuracy clearly shows that CMOS implementations are vulnerable to side channel power analysis attacks, whereas a chaotic circuit implementation can alleviate this problem to a large extent. The cross obfuscation scheme can produce indistinguishable power traces, provided the gates are designed carefully so that different implementations of an identical instruction have sufficiently different configurations. The self obfuscation method is an extension of cross obfuscation that removes the need to design multiple implementations, since the power traces of a single implementation can be made almost indistinguishable when two of the four configuration parameters, namely Vc and C, are chosen to be identical across the instruction set. The obfuscation is almost ideal even for the best performing classifier.

There are many opportunities for extending this work in the future. First, the map circuit used in this work is a scaled-down topology from [9]. Detailed analysis is required to optimize the circuit design to reduce power consumption, delay and area. The width and location of the chaotic region in the bifurcation diagram is also an important factor in choosing design parameters. This particular chaotic map is chosen as an example to illustrate the premise that chaos-based design can be effective for immunity against side channel attacks. However, the design methodology does not depend on this particular three-transistor chaotic map circuit topology. Any combination of ingenious topology and/or emerging device can be used as long as it yields a 'V' shaped, or alternatively an inverted 'V' or tent shaped, transfer curve. New devices and/or topologies can reduce the overhead of chaos-based design and make it competitive with conventional CMOS designs. Moreover, to overcome the susceptibility of chaotic gates to noise, detailed noise analysis needs to be done to make the design robust.

As can be seen from Table 5, cross obfuscation accuracy varies among the various designs. More quantitative measures and metrics can be developed to aid the design process for multiple implementations.

Self-obfuscation is a more advanced form of obfuscation which renders the power traces of an instruction set inherently indistinguishable. However, obtaining the same Vc and C for all the instructions is a very challenging task. Moreover, in its current form it cannot obfuscate sequential instructions such as ADD and SUB. New design techniques need to be explored to overcome this shortcoming.

The overhead of using a chaotic map for digital circuit implementation needs to be reduced. The main objective of this work is to illustrate that chaos-based systems can be used to implement arbitrary complex 3-input multi-output functionality, and that the corresponding large design space can enable designers to secure hardware against side channel power attacks for instruction classification. If only a single basic gate is implemented, the chaotic design has a large overhead. However, the chaotic design is reconfigurable, and unlike a CMOS implementation, it can implement both simple basic gates and relatively complex functionality like ADD, SUB, MUX and DEC using the same configuration. Moreover, as shown in [28], a suitable combination of chaotic and CMOS gates can lead to almost ideal obfuscation. The preliminary results show that the number of chaos gates required increases roughly logarithmically with the number of bits. More careful analysis needs to be done to come up with an optimization method combining chaotic and CMOS gates for overhead reduction.

10 Conclusions

In this work, it is shown that a chaos-based design can be used to generate simple as well as complex 3-input multiple-output functions using a simple 3-transistor
Table 7: Classification of different instructions for chaos-based self obfuscation implementation. Rows and columns
represent the test instruction and percentage of their matched class, respectively.
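Config | AND | OR | XOR | NAND | NOR | XNOR | MUX | DEC | ENC
δ | 1.02 | 0.28 | 0.46 | 0.37 | 1.18 | 0.56 | 0.44 | 1.18 | 0.93
φs | 5 | 7 | 18 | 4 | 1 | 19 | 9 | 1 | 3
Vc | 0.69 (same for all instructions)
C | 58 (same for all instructions)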
Table 8: Classification accuracy among instructions using different classifiers and dimensionality reduction algorithms for chaos-based self obfuscation implementation.
Table 9: Confusion matrix of classification accuracy for different instructions in chaos-based self obfuscation implementation. Rows and columns represent the test instruction and percentage of their matched class, respectively.
chaotic map circuit. The parameters of the chaotic oscillator have been chosen carefully in order to minimize delay, area and power consumption. Two different design methodologies have been proposed for obfuscation, namely cross obfuscation and self obfuscation. Cross obfuscation ensures that an attacker cannot reverse engineer instructions on any other machine by accumulating data from a reference machine. Self obfuscation is possible by using chaos gates with a suitable common configuration for certain parameters, since this leads to similar power traces among different instructions on the same machine. It has been demonstrated, using various dimensionality reduction techniques along with several classification algorithms, that logic functions built from properly designed chaos gates can ensure security against power-based side channel attacks. The proposed design is found to be capable of bringing the accuracy of instruction classification close to the level of perfect ambiguity.