
Low-power FinFET Circuit Design and Synthesis under
Spatial and Temporal Variations
Prateek Mishra

A Dissertation
Presented to the Faculty
of Princeton University
in Candidacy for the Degree
of Doctor of Philosophy

Recommended for Acceptance
by the Department of
Electrical Engineering
Adviser: Niraj K. Jha

November 2012

© Copyright by Prateek Mishra, 2012.
All Rights Reserved
Abstract
Moore's law has enabled the scaling of CMOS technologies over the past several decades. However,
the scaling of conventional transistors beyond 22nm is limited by various factors, such as power
consumption and process variation effects. With every successive technology generation, leakage
current has been increasing exponentially due to the various short-channel effects, such as threshold voltage (Vth) roll-off, drain-induced barrier lowering (DIBL) and gate-induced drain leakage
(GIDL). Thus, the major challenge in continuing Moore's-law scaling lies in controlling short-channel effects. Double-gate field-effect transistors (DGFETs) have been proposed as a promising
alternative to the conventional transistor technology. Due to the superior electrostatic integrity of
the channel, provided by the double-gate structure, they can significantly mitigate
short-channel effects. Thus, they have been proposed as an attractive solution for scaling beyond
22nm. Among DGFETs, FinFETs have recently attracted a lot of attention due to their superior fabricatability. The fabrication process of FinFETs is quite similar to that of conventional transistors.
FinFETs are quasiplanar structures in which the channel is made to stand up on its edge. FinFETs consist of a thin silicon fin around which a gate electrode is wrapped. This results in a
dual/tri-gate structure, depending upon the thickness of the oxide at the top of the channel. FinFETs have also been shown to have a superior ION/IOFF ratio as compared to the conventional
transistor at the same technology node. Hence, FinFETs can be used to increase performance and
reduce leakage current of a chip simultaneously. The two gates of the FinFET can be made independent of each other by etching out the top portion of the FinFET. Such FinFETs have been exploited
by researchers to develop various innovative standard cell designs. Also, the Vth of the front gate
of the FinFET can be controlled by applying a bias to its back gate. Since Vth controls both the
subthreshold leakage and the delay of a logic gate, the back-gate bias can be used as an important
knob to optimize the delay and power of circuits that employ independent-gate FinFETs. Another
important property of FinFETs is that they can be easily fabricated along the <110> channel
orientation by rotating the fins by 45° from the <100> wafer plane. Since the electron mobility
is maximum along the <100> channel orientation and the hole mobility is maximum along the
<110> channel orientation, optimized logic gates can be built by fabricating the pull-up network
of the logic gates in the <110> channel orientation and the pull-down network in the <100>
channel orientation.
In this thesis, we first propose a methodology for low-power FinFET based circuit synthesis
which uses multiple supply and threshold voltages. The scheme is quite different from the conventional multiple supply voltage methods that target power optimization. We also propose a low-power FinFET based circuit synthesis methodology based on channel orientation optimization. We
investigate various logic design styles that depend on different channel orientations.
Though FinFETs are a promising alternative to conventional transistors, they are still likely to
suffer from the effects of process variations. Process variation can be either environmental or lithographic in nature. Environmental variations can be attributed to both spatial and temporal changes
in temperatures and supply voltages in a chip. Lithographic variations result from an aberration
in the optical lens used to create the mask in the fabrication process. They are manifested both as
systematic and random variations in chip parameters, such as gate length, gate-oxide thickness, fin
thickness, etc. Thus, it is imperative to study the effects of process variation on important FinFET
circuit metrics, such as delay and power.
In this thesis, we study the effects of lithographic variations on FinFET leakage power. We investigate the leakage power of various standard cells under process variations in gate length and fin
thickness. Further, we propose a methodology for analyzing the leakage power of the full chip under process variations, as well as for leakage-variation-aware low-power FinFET circuit synthesis.
We also perform a statistical delay characterization of FinFET standard cells under both environmental and lithographic variations. We use a central composite rotatable design under the response
surface methodology to characterize the delay of various standard cells under varying lithographic
and environmental parameter values.
Acknowledgments
Ya devi sarvabhutesu sumati rupen samsthita
namastaseya namastaseya namastaseya namoh namah
First and foremost, I would like to thank the divine mother for all her inspiration, intellect, and
wisdom she bestowed upon me to complete this important task. Next, I would like to pay homage
to Prof. Niraj Jha. I have been very fortunate to have him as my advisor. He has treated me like his
own son. He kept encouraging me whenever I felt depressed or disappointed by events. He is one of
the most fascinating persons I have ever met in my life. His deep understanding and sharp acumen
of circuit design and electronic design automation have provided an excellent basis for this thesis.
I would also like to extend my sincere gratitude to my father, Dr. Ravindra Nath Mishra, and
my mother, Mrs. Sunita Mishra, who have been a tremendous source of love, encouragement, and
inspiration. The support from my parents has helped me finish this industrious work. I would
also like to thank my tauji, Dr. Virendra Nath Mishra, for encouraging me to pursue my dreams.
I would also like to thank my wife, Pallavi, for sticking with me in difficult times through the
course of this thesis. Next, I would like to extend my gratitude to the thesis readers, Prof. Saibal
Mukhopadhyay and Prof. Li-Shiuan Peh, for taking the time out of their busy schedules to go
through my thesis. They provided valuable feedback on my thesis that helped improve its quality. A
special thanks to Sarah Braude and Roelie Abdi for helping me with various non-academic issues.
My stay at Princeton would not have been exciting without the company of some good friends.
Shushobhan, Vaneet, and Aman helped me get acclimatized to Princeton during my initial days
here. Thereafter, they became really close friends and we shared some wonderful times together.
Abhishek, Arnab, Niket, Arun, CJ, and Varun became close friends right from the first year. I have
shared some wonderful times in Princeton with DJ. Our philosophical talks about life made my stay
at Princeton such a wonderful experience. Parthav and Harish were the best roommates one could
ask for. I would also like to thank my labmates, Muzaffer, Najwa, Wei, Sourindra, Meng, Aoxiang,
Chun-Yi, Maxwell, Joseph, and Ting, for fostering an atmosphere of creativity in the lab.
Contents
Abstract
Acknowledgments
1 Introduction
  1.1 Obstacles to scaling of the conventional transistor
  1.2 Different kinds of DGFETs
  1.3 FinFETs
  1.4 Thesis contributions
2 Related Work
  2.1 Introduction
  2.2 FinFET fabrication
  2.3 FinFET logic synthesis
  2.4 FinFET SRAM
  2.5 FinFET process variation
  2.6 Chapter summary
3 Low-power FinFET Circuit Synthesis using Multiple Supply and Threshold Voltages
  3.1 Introduction
  3.2 Background work
  3.3 The principle of TCMS
  3.4 Library design using TCMS
  3.5 Power optimization methodology
    3.5.1 Optimization flow
    3.5.2 Phase I: Initialization of the circuit
    3.5.3 Phase II: Linear programming formulation
    3.5.4 Application of methodology to c17
    3.5.5 Comparison to conventional multiple-Vdd approach
  3.6 Experimental results
  3.7 Chapter summary
4 Low-power FinFET Circuit Synthesis Using Surface Orientation Optimization
  4.1 Introduction
  4.2 FinFET device simulation
    4.2.1 FinFET device parameters
    4.2.2 Channel orientation effects
    4.2.3 Optimal reverse bias
  4.3 Library design
    4.3.1 Logic design using surface orientation optimization
    4.3.2 Library characterization and area effects
  4.4 Power optimization methodology
    4.4.1 Optimization flow
    4.4.2 Linear programming framework
  4.5 Experimental results
  4.6 Chapter summary
5 Die-level Leakage Power Analysis of FinFET Circuits Considering Process Variations
  5.1 Introduction
  5.2 Background work
  5.3 Simulation setup
  5.4 Modeling leakage in FinFET logic gates
    5.4.1 Leakage in a single SG/IG FinFET device
    5.4.2 Leakage in FinFET standard cells
  5.5 Methodology
  5.6 Results and discussion
  5.7 Chapter summary
6 Statistical Delay Characterization of FinFET Standard Cells Under Design of Experiments Using Response Surface Methodology
  6.1 Introduction
  6.2 Delay modeling
    6.2.1 Effect of temperature on delay
    6.2.2 Screening spatial process parameters for relative importance
  6.3 Design of experiment (DOE)
  6.4 Validation of the RSM model
  6.5 Dependence of delay model on temperature
  6.6 Chapter summary
7 Conclusions and Future Research
List of Figures
1.1 MEDICI-predicted DIBL and subthreshold swing for DGFETs and bulk silicon transistor at various channel lengths [1]
1.2 IDS-VGS characteristics for DGFETs and bulk-silicon transistors at equalized subthreshold current [1]
1.3 Different kinds of DGFETs [2]
1.4 FinFET structure
1.5 A multiple-fin FinFET structure
1.6 FinFET structures: (a) SG and (b) IG
1.7 Oriented FinFETs with nFinFETs along <100> sidewalls and pFinFETs along <110> sidewalls [3]
2.1 Breakdown of power consumption of ICs in future designs [4]
2.2 Comparison of fin density in spacer and optical lithography [5]
2.3 Different kinds of FinFET NAND gate designs [6]
2.4 An SRAM cell
3.1 Multi-fin FinFET
3.2 The principle of TCMS
3.3 Simulated Ids-Vgfs characteristics for an overdriven 32nm nFinFET
3.4 NAND gate employing the TCMS principle
3.5 Power optimization flow
3.6 Example circuit
3.7 A circuit to illustrate delay constraints
3.8 Delay-minimized netlist obtained using Design Compiler
3.9 Power-minimized netlist obtained using the TCMS principle
3.10 Power-minimized netlist obtained using ECVS
3.11 Power breakdown for delay-minimized circuits
3.12 Power breakdown for power-optimized circuits
3.13 Reduction in power consumption at various ATCs
3.14 Constitution of circuits by mode in ECVS circuits
4.1 BSIM-simulated Ids vs. Vds characteristics for different orientations
4.2 BSIM-simulated DC transfer characteristics for a 32nm FinFET
4.3 Optimal back gate bias
4.4 Optimization flow
5.1 FinE simulation framework for double-gate circuit design space exploration [7]
5.2 Two dimensional (X-Y) cross-section of an nFinFET simulated in Sentaurus TCAD
5.3 ILEAK spreads for LUN, TOX, LG and TSI, each varying independently
5.4 Matching SG-mode FinFET TCAD simulations with the macromodel for different LG
5.5 Matching SG-mode FinFET TCAD simulations with the macromodel for different TSI
5.6 Matching IG-mode TCAD simulations with the macromodel for different Vb
5.7 Matching QMC TCAD data with the macromodel
5.8 Schematics of SG-, LP-, and MT-mode NAND gates
5.9 Layouts of SG-, LP-, and MT-mode NAND gates
5.10 SG-mode NAND leakage from TCAD for different LG
5.11 SG-mode NAND leakage from TCAD for different TSI
5.12 SG-mode NAND I00 distribution predicted by the model and TCAD QMC simulations
5.13 I10 distributions for SG-, LP-, and MT-mode NAND gates
5.14 Simulation flow
5.15 Grid assignment for spatial correlation
5.16 Spreads in ITOT in the correlated and uncorrelated cases for benchmark circuit c880
5.17 Effect of mixing LP-mode gates into a pure SG-mode c880 benchmark circuit, normalized to the 100% SG-mode case at iso-delay
5.18 Effect of mixing LP-mode (MT-mode) gates into a pure SG-mode c880 benchmark circuit, normalized to the 100% SG-mode case with delay slacks
5.19 Cumulative distribution function of ITOT for 100% SG-mode vs. 40% SG + 60% LP-mode (MT-mode) gates at iso-delay for benchmark circuit c880
6.1 Variation of nFinFET saturation current with voltage and temperature
6.2 Saturation current dependence on temperature and fin thickness
6.3 Saturation current dependence on process parameters for SG- and IG-nFinFET
6.4 Effect of process variation on physical gate length of traditional planar MOSFETs
6.5 Effect of process variation on physical gate length of FinFETs
6.6 Absolute S values with respect to different process parameters
6.7 CCRD for k=3
6.8 MC and RSM based delay distributions for SG-INV with n assumed to have a Gaussian distribution
6.9 Delay vs. temperature at two supply voltages
List of Tables
3.1 Power Savings Using the TCMS Scheme
3.2 Area savings
3.3 Power Savings Using ECVS
4.1 FinFET parameters
4.2 Power savings using oriented FinFETs
4.3 Accurate area estimates
5.1 FinFET device parameters
5.2 Comparison of SG-, SG + LP- and SG + MT-mode synthesis techniques for ISCAS 85 benchmarks at iso-delay
5.3 Comparison of SG-, SG + LP- and SG + MT-mode synthesis techniques for ISCAS 85 benchmarks at iso-delay
5.4 Mean and std. deviation of ITOT for ISCAS 85 benchmarks for TSI = 0 and LG = 0
6.1 FinFET device parameters
6.2 Relationship between coded and actual variable values
6.3 Process parameters along with their levels for CCRD
6.4 Coded process parameters along with their actual values
6.5 RSM delay model coefficients for SG-INV
6.6 RSM delay model coefficients for SG-NAND, LP-INV and LP-NAND
6.7 Average testing error for SG-INV, SG-NAND, LP-INV, and LP-NAND
Chapter 1
Introduction
The steady miniaturization of the metal-oxide-semiconductor field-effect transistors (MOSFETs)
with each new generation of CMOS technology has provided us with improved circuit performance
and cost per function over several decades. Transistor scaling has been enabled in the past few years
with the aid of innovative methods, such as shallow junctions and the use of halo doping for channel
engineering. However, three obstacles, (a) subthreshold leakage current, (b) gate-dielectric leakage,
and (c) threshold voltage (Vth), have become the dominant barriers to further CMOS scaling even
for highly leakage-tolerant integrated circuits, such as microprocessors. The main challenges for the
sub-22nm gate length regime are two-fold: (a) minimization of leakage current, and (b) reduction in
the device-to-device variability to increase yield [8]. Several innovative device structures, such as
ultra-thin body silicon-on-insulator (SOI) and double-gate field-effect transistors (DGFETs), have
been proposed to address these challenges. These devices have an increased surface-to-volume ratio,
which improves device electrostatics, resulting in better short-channel characteristics. FinFETs are
DGFETs in which the channel is made to stand up. Amongst DGFETs, FinFETs have emerged as a
suitable candidate owing to their ease of fabrication in terms of processing and gate alignment [1].
1.1 Obstacles to scaling of the conventional transistor

As the channel length of a MOSFET is reduced, the drain voltage starts to influence the channel
potential, enabling it to conduct current even though the gate is turned off. This short-channel effect
is countered through the use of a thin gate oxide to increase the control of the gate on the channel.
However, gate oxides cannot be scaled beyond a certain threshold because of the increasing tunneling current associated with smaller gate-oxide thicknesses. Another technique, which is used to
mitigate short-channel effects, is to reduce the depletion width below the channel to the substrate.
A reduced depletion width corresponds to shortened depletion regions and, hence, reduced parasitic
capacitances. This results in improved subthreshold slope in the leakage regime. However, a reduction in the depletion width corresponds to degraded gate influence on the channel, which leads to a
slower turn on/off of the channel region.
Figure 1.1: MEDICI-predicted DIBL and subthreshold swing for DGFETs and bulk silicon transistor at various channel lengths [1]
Figure 1.2: IDS-VGS characteristics for DGFETs and bulk-silicon transistors at equalized subthreshold current [1]
In DGFETs, the drain potential does not affect the channel potential because of the proximity of
the second gate. This results in reduced short-channel effects, such as drain-induced barrier lowering
(DIBL) and degraded subthreshold slope (S). Fig. 1.1 shows the MEDICI-predicted DIBL and S at
various effective channel lengths (LEFF), both for bulk silicon and DG devices [1]. It can be seen
that both DIBL and S are dramatically improved in DGFETs as compared to bulk silicon devices.
Thus, DGFETs can help us extend Moore's law beyond the 22nm technology node. Fig. 1.2 shows
the IDS-VGS characteristics of DGFETs and conventional bulk silicon transistors. DGFETs not
only reduce the leakage current, but they also have an improved ION/IOFF ratio.
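The two figures of merit compared above can be extracted from simulated or measured I-V data. The following is a minimal sketch using the standard textbook definitions (S as the gate-voltage change per decade of subthreshold current, DIBL as the threshold-voltage reduction per volt of added drain bias). The function names and all numeric values are hypothetical and are not taken from the MEDICI data in Fig. 1.1.

```python
import math


def subthreshold_swing_mv_per_dec(vgs1, ids1, vgs2, ids2):
    """Subthreshold swing S in mV/decade from two subthreshold (VGS, IDS) points."""
    decades = math.log10(ids2 / ids1)  # number of current decades spanned
    return 1000.0 * (vgs2 - vgs1) / decades


def dibl_mv_per_v(vth_at_low_vds, vth_at_high_vds, vds_low, vds_high):
    """DIBL in mV/V: drop in Vth per volt of additional drain bias."""
    return 1000.0 * (vth_at_low_vds - vth_at_high_vds) / (vds_high - vds_low)


# Hypothetical device: current rises one decade over 100 mV of gate bias,
# and Vth falls from 0.30 V to 0.25 V as VDS goes from 0.05 V to 1.0 V.
s = subthreshold_swing_mv_per_dec(0.10, 1e-9, 0.20, 1e-8)
d = dibl_mv_per_v(0.30, 0.25, 0.05, 1.0)
print(f"S = {s:.1f} mV/dec, DIBL = {d:.1f} mV/V")
```

Lower values of both quantities indicate better gate control of the channel, which is the sense in which DGFETs are said to improve on bulk devices.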
1.2 Different kinds of DGFETs
DGFETs come in three different flavors, as illustrated in Fig. 1.3. In Type I DGFET, a second gate
is buried in the body of the planar conventional transistor. In Type II DGFET, the silicon body is
rotated to a vertical orientation with the drain and source being on the top and bottom boundaries of
the body, respectively. In Type III DGFET, commonly referred to as the FinFET, the body is made
to stand up, but the drain and source are on either side of the channel instead of being at the top
and bottom. There are four obstacles in manufacturing of DGFETs: (a) fabrication of both the gates
to be of the same size, (b) alignment of the top and bottom gates, (c) alignment of the source and
drain regions to both the gates, and (d) providing for an area-efficient means to connect to the two
gates. Clearly, Type I DGFETs require extra material to be introduced in buried silicon whenever
we need a separate contact to the back gate. Also, Type I DGFETs do not easily meet the first
three requirements of manufacturing [2]. Type II DGFETs have been shown to meet manufacturing
requirements (b) and (d) easily. However, fabricating the source and drain regions such that they are
aligned to the top and bottom gates is difficult [2]. Type III DGFETs (FinFETs) have emerged as the
most promising candidate due to their ease of fabrication, gate alignment and easy access to both
gates.
DGFETs can also be classified as symmetric or asymmetric. Symmetric DGFETs have the
same gate material and oxide thickness for the front and back gates. On the other hand, asymmetric
DGFETs have different strengths for the front and back gates. Different strengths can be obtained by
using different gate-oxide thickness for the front and back gates, or by using materials of different
workfunctions for them. The Vth of the DGFETs can be adjusted through workfunction engineering
of the metal gates. Thus, DGFETs obviate the need for doping of the channel to control the Vth of
the device. This results in no random dopant fluctuation effects in DGFETs. Further, in symmetric
DGFETs, two inversion channels are formed, one on each side of the transistor. However, due to
the thin size of the body, the two channels are effectively merged, providing a single channel. In
asymmetric DGFETs, the channel is only formed near the more conducting gate. The other gate
still contributes to controlling of the channel voltage, but acts as though it has a thicker effective
gate oxide.
Figure 1.3: Different kinds of DGFETs [2]
1.3 FinFETs
FinFETs are quasiplanar field-effect transistors. The device physics governing the functionality of
FinFETs is exactly the same as that of planar MOSFETs. Fig. 1.4 shows the structure of a FinFET. A
silicon film of thickness TSI is patterned on an SOI wafer. The gate wraps around both sides of the
fin. The channel is formed perpendicular to the plane of the wafer. Its length is shown as LG. This is
the reason that the device is termed quasiplanar. The effective width of a FinFET is 2nHFin, where
n is the number of fins and HFin is the fin height. Thus, wider transistors with higher on-currents
are made possible by using multiple fins. Fig. 1.5 shows the structure of a FinFET employing two
fins. It should be noted that FinFET width is quantized, in terms of the number of fins. This leads to
important design considerations such as functionality, performance and power, which are sensitive
to the ratio [6].
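The width relation and its quantization can be illustrated with a short calculation. This is a minimal sketch based only on the expression 2nHFin stated above; the helper names and the 40nm fin height are hypothetical values chosen for illustration.

```python
import math


def effective_width_nm(n_fins, fin_height_nm):
    """Effective FinFET width: two sidewall channels per fin, n fins in parallel."""
    return 2 * n_fins * fin_height_nm


def fins_needed(target_width_nm, fin_height_nm):
    """Smallest whole number of fins whose effective width meets the target."""
    return max(1, math.ceil(target_width_nm / (2 * fin_height_nm)))


H_FIN_NM = 40  # hypothetical fin height, not a value from the thesis

n = fins_needed(200, H_FIN_NM)       # 200 / 80 = 2.5, rounded up to 3 fins
w = effective_width_nm(n, H_FIN_NM)  # realized width is 240 nm, not 200 nm
print(n, w)
```

Because only whole fins can be added, the realized width overshoots the target in this example, which is why sizing in FinFET circuits is a discrete rather than a continuous choice.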
Beyond the technology-driven benefits offered by FinFETs, circuits can also benefit from the
double-gate structure of FinFETs to further optimize power and performance. Etching out the top
part of the FinFET leads to some interesting designs that exploit its independent-gate structure.
Various innovative circuit structures have been suggested in the literature based on independent-gate
(IG) FinFETs. The FinFETs in which the two gates are shorted are referred to as shorted-gate
(SG). Figs. 1.6(a) and 1.6(b) show the structure of an SG- and IG-FinFET, respectively.
Figure 1.4: FinFET structure
Figure 1.5: A multiple-fin FinFET structure
Figure 1.6: FinFET structures: (a) SG and (b) IG
In IG-FinFETs, the Vth of the front gate can be controlled by applying a bias to the back gate.
Since Vth controls the subthreshold leakage and delay, its controllability can be a powerful tool for
circuit optimization. Another important characteristic of FinFETs is that they can be fabricated
along the <110> channel orientation easily by rotating the fins by 45° from the <100> plane.
Electron mobility is highest along the <100> plane while the hole mobility is maximum along
the <110> plane orientation due to carrier mobility anisotropy in crystalline silicon [9]. Hence,
logic gates with pFinFETs along the <110> channel orientation and nFinFETs along the <100>
channel orientation are the fastest. Fig. 1.7 shows the nFinFETs and pFinFETs in a <100> wafer,
where the nFinFET sidewalls are oriented in the <100> direction while the pFinFET sidewalls
are oriented in the <110> direction [3]. Such a device orientation leads to non-Manhattan layouts,
which might pose a yield issue for sub-wavelength lithography.
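The leakage/delay trade-off controlled by Vth can be sketched with two first-order textbook models. These models are assumptions made here for illustration and are not the device models used in this thesis: subthreshold current is taken to fall by one decade per S millivolts of Vth increase, and gate delay is taken to follow an alpha-power law in the gate overdrive (Vdd - Vth). The swing, the exponent, and the voltages below are hypothetical.

```python
def leakage_scale(delta_vth_v, swing_mv_per_dec=70.0):
    """Relative subthreshold leakage after raising Vth by delta_vth_v volts.

    Assumes leakage drops one decade per `swing_mv_per_dec` mV of Vth increase.
    """
    return 10.0 ** (-(delta_vth_v * 1000.0) / swing_mv_per_dec)


def delay_scale(vdd, vth, delta_vth_v, alpha=1.3):
    """Relative gate delay after raising Vth, under an assumed alpha-power law."""
    return ((vdd - vth) / (vdd - vth - delta_vth_v)) ** alpha


# Hypothetical case: a reverse back-gate bias raises the front-gate Vth by 70 mV
# on a device with Vdd = 1.0 V and nominal Vth = 0.3 V.
leak = leakage_scale(0.07)
slow = delay_scale(1.0, 0.3, 0.07)
print(f"leakage x{leak:.2f}, delay x{slow:.2f}")
```

Under these assumptions a modest Vth increase cuts leakage by roughly an order of magnitude at the cost of a much smaller relative delay penalty, which is why the back-gate bias is attractive on gates that have timing slack.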
Figure 1.7: Oriented FinFETs with nFinFETs along <100> sidewalls and pFinFETs along <110> sidewalls [3]
Though FinFETs are expected to reduce the impact of process variations, they are not immune to
them. FinFETs are generally patterned using direct or spacer lithography. Owing to the small
dimensions involved and various factors, such as line edge roughness, both techniques can result in
variations in the values of the chip parameters. Also, the variations can be environmental in nature.
Such variations are generally temporal in nature and can occur on time scales ranging from nanoseconds to
years [10]. For example, effects such as negative bias temperature instability (NBTI) and positive
bias temperature instability (PBTI) lead to variations in Vth over the circuit lifetime. On the other
hand, varying computing workload leads to temporal variations in the chip temperature. Since
FinFETs are manufactured on an SOI wafer, heat dissipation issues become an important concern
for FinFETs. Process variations can be classified into different categories:
Systematic vs. random: systematic variations can be modeled using various mathematical
functions. On the other hand, random variations are unpredictable. They cannot be modeled
mathematically. Variations, such as lithography proximity effects, come under the realm of
systematic variations. Dopant fluctuations in the channel are random in nature.
Inter-die vs. intra-die: variations can be classified as inter-die or intra-die depending on the
spatial scale of the variation. Inter-die variations correspond to variation of a parameter value
across nominally identical dies. Such variations may be die-to-die, wafer-to-wafer or even
lot-to-lot. Intra-die variations, on the other hand, correspond to spatially distributed parameter
variation inside a die. Intra-die variations are generally spatially correlated, i.e., devices in
close proximity get affected similarly.
Process vs. environmental: variations, which occur at runtime, are classified as environmental.
On the other hand, variations, which occur during the manufacturing of FinFETs, are termed
process variations.
In this thesis, we develop variation-aware logic synthesis methodologies, which work across all
three categories of variations.
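The inter-die and intra-die components described above can be sketched as a simple hierarchical sampler. This is an illustrative simplification and not the methodology developed in later chapters: each die receives one shared offset, each grid cell within a die receives one shared offset (a crude stand-in for spatial correlation, since nearby devices fall in the same cell), and each device receives an independent random residual. The function name, grid scheme, and sigma values are all hypothetical.

```python
import random


def sample_gate_lengths(n_dies, grid, devices_per_cell, nominal_nm,
                        sigma_inter, sigma_intra, sigma_random, seed=0):
    """Return one dict per die mapping (row, col) grid cells to device gate lengths."""
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        die_offset = rng.gauss(0.0, sigma_inter)          # shared by the whole die
        cells = {}
        for row in range(grid):
            for col in range(grid):
                cell_offset = rng.gauss(0.0, sigma_intra)  # shared within a cell
                cells[(row, col)] = [
                    nominal_nm + die_offset + cell_offset
                    + rng.gauss(0.0, sigma_random)         # per-device residual
                    for _ in range(devices_per_cell)
                ]
        dies.append(cells)
    return dies


# Two dies, a 2x2 grid, three devices per cell, 32 nm nominal gate length.
dies = sample_gate_lengths(2, 2, 3, 32.0,
                           sigma_inter=1.0, sigma_intra=0.5, sigma_random=0.0)
for cell, lengths in dies[0].items():
    print(cell, [round(x, 3) for x in lengths])
```

With the random residual set to zero, devices in the same cell are identical while different cells and different dies diverge, which makes the separate contributions easy to see. A more faithful model would let correlation decay smoothly with distance instead of using hard cell boundaries.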
Recently, the semiconductor industry has shown a lot of interest in FinFETs. TSMC plans
to have 14nm FinFET chips in full production on 450mm wafers by 2015 or 2016 [11]. A fully
functional FinFET SRAM at the 45nm node was announced by Samsung in 2005 [12]. A research
team from IBM Research, GlobalFoundries, Toshiba and NEC produced an SRAM cell with an area
of 0.063 square microns using FinFETs and optical lithography at the 22nm technology node [13].
The team claimed that the cell area produced by its work is the smallest SRAM cell produced using
optical lithography. The cell was shown to be operational down to the supply voltage of 0.4V.
Infineon fabricated a fully functional chip employing 3000 FinFETs in 65nm SOI technology [14].
A fully functional SRAM at the 22nm technology node was demonstrated by Intel [15]. It uses a
variant of FinFETs called tri-gate transistors. Indeed, Intel has announced a complete transition to
tri-gate chips at the 22nm node. Thus, it can be seen that many major semiconductor companies
have taken interest in multi-gate transistors, most notably FinFETs, to address the challenges posed
by the scaling of the conventional transistor.
1.4 Thesis contributions

FinFETs have been advocated as the most promising substitute for conventional transistors and are expected to supplement or supplant bulk transistors in the near future. Hence, we need new
circuit design methodologies, standard cell library designs, circuit synthesis schemes and computer-aided design (CAD) tools to exploit the benefits offered by the double-gate structure of FinFETs.
FinFETs need to be exploited from the device to the circuit and architecture levels. Our research focuses
on how FinFETs can be used to design innovative circuits and on circuit synthesis schemes. We also
extend the circuit synthesis schemes to take process variations into account. The thesis is organized
as follows:
• Chapter 2 details related work in the realm of FinFET design, circuit synthesis schemes and
performance/leakage analysis under process variations. It discusses the work done in the
realm of FinFET SRAM design and RF circuits. It also provides a brief overview of performance optimization under fin orientation.
• Chapter 3 proposes a methodology for low-power FinFET based circuit synthesis. It discusses
a mechanism called TCMS (Threshold Control through Multiple Supply Voltages) for
improving the power efficiency of FinFET logic circuits. This scheme presents a significant
divergence from the conventional multiple supply voltage schemes considered, and is shown
to be significantly better than schemes such as extended clustered voltage scaling.
• Chapter 4 proposes a low-power FinFET based circuit synthesis methodology that exploits
surface orientation optimization. It includes a study of various logic design styles, which
depend on different FinFET channel orientations, for synthesizing low-power circuits.
• Chapter 5 proposes a variation-aware low-power FinFET circuit synthesis methodology. It
discusses leakage current macromodels for various standard cells implemented in different
logic styles. Further, it proposes a methodology to calculate full-chip leakage under process
variations.
• Chapter 6 shows how to perform FinFET standard cell statistical delay characterization under design of experiments using the response surface methodology. It shows how the delay
of FinFET standard cells can be characterized statistically under spatial and environmental
variations, using central composite rotatable design.
• Chapter 7 concludes the thesis and discusses future research directions.
Chapter 2
Related Work
2.1 Introduction
The increase in chip power consumption with CMOS scaling has significantly affected the designs
of CMOS circuits. The semiconductor industry has been successfully scaling the gate length for
the past few decades. Transistor scaling has necessitated a decrease in gate length, gate dielectric
thickness and an increase in doping concentration [16]. This has resulted in an increase in leakage
current and increased reliability issues with each successive technology generation. Fig. 2.1 shows
the expected trend in the total power consumption of ICs. It can be clearly seen that the contribution of
leakage power to the total power consumption is expected to be very significant in future technologies. In the figure, it is assumed that CMOS technology will be used until 2013 and then scaling
will be continued with the use of multi-gate CMOS technology.
FinFETs have been touted as the most promising DGFET technology. In this chapter, we review the work done in the area of FinFETs. Firstly, we study the work done in the area of FinFET
fabrication. We discuss the two most prominent techniques currently used to manufacture FinFETs.
Thereafter, we review the work done in the area of FinFET logic and physical synthesis. We analyze
various innovative standard cell designs proposed in the literature to reduce power consumption.
Next, we analyze the work done in the area of FinFET SRAM design. Various innovative techniques have been proposed to enhance SRAM metrics, such as read margin, write margin and cell
stability. We also review the work in the area of FinFET process variation. FinFETs generally have
a lightly doped channel surface and thus are unlikely to suffer from random dopant
fluctuations. However, lithographic variations, such as those in gate length, gate oxide thickness and
fin thickness, are likely to affect the FinFET manufacturing process, resulting in leakage power and
delay distributions. Also, FinFETs are likely to suffer from environmental variations, such as those
in temperature and supply voltage. Since FinFETs are usually built on an SOI structure, they also
suffer from the ill effects of self-heating.
Figure 2.1: Breakdown of power consumption of ICs in future designs [4]
2.2 FinFET fabrication

In this section, we detail the various methods used to fabricate FinFETs and discuss their pros and
cons. FinFETs can be manufactured using either optical lithography or spacer lithography. With
optical lithography, FinFETs are fabricated on bonded SOI wafers with a modified planar CMOS
process. The optical lithography technique is used to pattern a thin silicon film followed by deposition of a thin oxide on the top of the fin. Thereafter, a pattern reduction technique is used to deposit
the metal electrode. The metal electrode is doped using ion implantation to achieve a specified
workfunction. Nitride oxide is used as the gate insulator [5, 17]. However, this technique can result
in non-uniform fin thickness in a single device. Spacer technology is attractive for overcoming such
limitations. Further, spacer lithography provides for doubling of fin density, which doubles the drive
current at a given lithography pitch. Fig. 2.2 shows the comparison of fin density achieved using
optical lithography and spacer lithography. A spacer lithography process technology uses a sacrificial layer and a chemical vapor deposition (CVD) technique to achieve uniform silicon fins. The
minimum-sized features are not decided by photolithography, but by the CVD film thickness [5].
Figure 2.2: Comparison of fin density in spacer and optical lithography [5]
For FinFETs, short-channel effects can be controlled easily when the fin thickness is approximately half of the channel length [18]. This becomes impossible with standard lithographic techniques
when the gate length reaches the lithographic limit. Further, standard lithographic techniques produce highly non-uniform silicon fins. Uniformity of silicon fin thickness is
very critical for FinFETs because line width roughness in silicon fins leads to large threshold variations [19, 20]. Also, the gate length-to-silicon fin thickness ratio should be greater than 1.5 to keep
short-channel effects under control in FinFETs [18]. All the above requirements can be met using
the spacer lithography technique. Further, since the spacer lithography technique doubles the drive
current in a given area because of the doubled fin density, it has emerged as the technique of choice
for fabrication of FinFET chips.
Figure 2.3: Different kinds of FinFET NAND gate designs [6]
2.3 FinFET logic synthesis
Various researchers have explored logic synthesis with FinFETs. The property that has been exploited the most is the use of a back-gate bias in IG-mode FinFETs to modulate the Vth of the front
gate. Various innovative standard cell designs have been proposed using different combinations of
SG- and IG-mode FinFETs. In [6], different logic gate styles are presented and thereafter a linear
programming based sizing algorithm is used to optimize the circuit for power.
Fig. 2.3 depicts the SG-, LP-(low power), IG- and IG/LP-mode NAND gates [6]. SG-NAND
gates have the lowest delay among the different logic styles since fast SG-FinFETs are employed
both in the pull-up and pull-down network of the NAND gates. LP-NAND gates have more than
double the delay of SG-NAND gates. However, the leakage power of LP-NAND gates, averaged
over all the input vectors, is reduced by over 90% when compared to SG-NAND gates. This is
because LP-NAND gates employ IG-mode transistors with reverse bias on the back gates. The
reverse bias increases the Vth of the front gate, thereby reducing the leakage but increasing the delay
of the LP-NAND gates. In IG-NAND gates, only one transistor is used in the pull-up network. To
achieve equal rise and fall delays, the size of the pull-up network needs to be scaled up. IG-NAND
gates can achieve the same delay as SG-NAND gates. However, they occupy more area
than SG-NAND gates. The fourth design, the IG/LP-NAND gate, is a hybrid of the
IG- and LP-NAND gates. The leakage/delay characteristics of the IG/LP-NAND gate lie in between
those of LP-NAND and IG-NAND gates. It should be noted that sizing of NAND gates for equal
rise and fall delay is a challenge because of the fin width quantization. The design rules for sizing
the NAND gates are also specified in [6].
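The sizing difficulty caused by width quantization can be illustrated with a short Python sketch. It assumes that each fin of a double-gate FinFET contributes an effective channel width of twice the fin height (the top surface is neglected), so the realizable widths are integer multiples of that value. The function names and the numerical values in the usage note are hypothetical and serve only as an illustration.

```python
import math


def fins_needed(target_width_nm, fin_height_nm):
    """Smallest number of fins whose combined width meets or exceeds the target.

    Assumption: each double-gate fin contributes 2 * H_fin of channel width.
    """
    width_per_fin = 2.0 * fin_height_nm
    return max(1, math.ceil(target_width_nm / width_per_fin))


def realized_width(n_fins, fin_height_nm):
    """Effective width actually obtained with an integer number of fins."""
    return n_fins * 2.0 * fin_height_nm
```

For instance, with a hypothetical fin height of 40nm, a target width of 150nm cannot be met exactly: `fins_needed(150, 40)` rounds up to two fins, and `realized_width(2, 40)` is 160nm. This rounding is why exact rise/fall balancing is harder than with continuously sized planar devices.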
Several low-power logic gate options using independent gates are presented in [21]. An efficient
circuit synthesis methodology based on the proposed low-power logic options has been developed. In [22], a genetic algorithm based power optimization framework for FinFET based circuits
is proposed. The authors exploit IG-mode FinFETs along with other low-power techniques, such
as multi-VDD and gate sizing, for power optimization. A novel look-up table based approach for
design of FinFET circuits is proposed in [23]. It is shown to be accurate by comparisons against
mixed-mode device simulations.
FinFET physical synthesis is still a nascent area of research. There is still a lack of FinFET
physical synthesis tools. However, researchers have looked into the layouts of various standard
cells employing SG- and IG-mode transistors. The layout structure of the FinFET depends upon
the type of process used. The increased fin density made possible by spacer lithography [5, 17, 24]
can be translated to increased layout densities. Another process knob that can be used to improve
layout density is fin height. An increase in fin height can translate to increased current in the same
area [25]. In [26, 27, 28], a comparative study of layout densities in SG- and IG-FinFET standard
cells is done. It is shown that SG-mode standard cells occupy the same area as the standard bulk
transistor cells at the same technology node. However, the IG-mode standard cells occupy almost
double the area of SG-mode standard cells.
2.4 FinFET SRAM

A typical CMOS SRAM cell is a six-transistor (6T) structure consisting of two cross-coupled inverters, as shown in Fig. 2.4. Access to the SRAM cell is enabled by the word line, which controls
the two access transistors M5 and M6 . These two transistors control whether the cell should be
connected to the bit lines or not. They are used to transfer data for both read and write operations.
The major SRAM design metrics are read margin, write margin and cell stability. Researchers aim
to improve the above SRAM metrics while not sacrificing performance/leakage of the SRAM cells.
Figure 2.4: An SRAM cell
FinFET SRAM is a heavily researched area. However, in this section, we only review the most
seminal works in the area of FinFET SRAM. In [29], a FinFET SRAM is investigated and compared to an
implementation in the 90nm node planar partially-depleted silicon-on-insulator (PDSOI) technology. It is shown that FinFET SRAM exhibits reduced delays and lower standby leakage current
when compared to its PDSOI counterpart. The effect of width quantization on FinFET SRAM is
also explored and demonstrated to be acceptable. In [30], both a forward bias to reduce Vth , while
performing Read/Write operations in an SRAM, and a reverse bias to reduce the leakage power in
the standby mode are used. In [31], the static noise margin (SNM) of FinFET SRAM cells operating in the subthreshold region is investigated. The 6T FinFET SRAM cell is also shown to be fully
functional in the subthreshold regime. Further, a stability analysis is performed for various novel
IG-mode SRAMs. A device optimization technique for robust and low-power FinFET SRAM is
presented in [32]. In this work, the gate sidewall spacer thickness is optimized to simultaneously
minimize leakage current and drain capacitance to on-current ratio. Further, it is shown that the
optimization reduces the sensitivity of the device Vth to fluctuations in gate length and fin thickness. In [33], a joint exploration of VDD -fin height-Vth design space is done for a 65nm FinFET
SRAM. It is shown that taller fins can accommodate lower VDD as well as a higher Vth to deliver
iso-performance at reduced leakage. An optimization study to improve cell stability in the design
space of silicon fin thickness and fin ratio is done in [34]. An alternative to sizing for stability in
FinFET memory cells is studied in [35]. It is shown how multiple workfunctions can be used to
control the Vth of the six transistors to improve stability at lower leakage power consumption. An
analysis of the impact of channel orientation on stability, performance and power of 6T and 8T
FinFET SRAMs is done in [36].
2.5 FinFET process variation

As stated earlier, process variations can be environmental or lithographic in nature. Environmental
factors arise during the operation of a circuit and include variations in supply voltage, switching
activity and temperature across the chip. Lithographic variations are permanent in nature. These
variations arise due to processing and masking limitations, and result in random or spatially varying
deviations from the nominal value. A large amount of work has been done in analyzing conventional
transistors under process variations. Various delay/leakage models have been proposed, which capture the deviations of process parameters. Further, these models have been used to propose statistical static timing analysis (SSTA) and full-chip leakage analysis algorithms to calculate the yield of
circuits under process variations.
SSTA falls under two categories. The first is path based algorithms, in which a set of important
paths is selected and submitted to a statistical timer for detailed analysis. In [37], a set of paths is
selected and then the maximum of these path delays is calculated. However, correlations among
path delays are ignored. In [38], these assumptions are relaxed and correlations among path delays
are considered. Path based algorithms have several disadvantages. First, it is not obvious how to
select the set of important paths for calculation of path delays, since the path that is omitted may
be important for some part of the process space. Second, path based algorithms are not good at
handling independent randomness in gate delays [39].
The second kind of SSTA algorithms falls into the category of block based statistical timers.
Such algorithms traverse the graph in a breadth-first manner as compared to the path based algorithms, which traverse the graph in a depth-first manner. In [40], a block based SSTA algorithm,
which takes into account correlations due to reconvergent fanouts, is proposed. In [41], the concept of parameterized delay models is introduced. The concept of spatial correlations in SSTA
algorithms is introduced in [42, 43]. In [39], a novel linear-time block based SSTA algorithm is
employed. Further, an incremental block based statistical timer, which is suitable for use in the
inner loop of physical synthesis or any other optimization program, is proposed. In [44], a parameterized block based statistical timer is proposed, which can handle nonlinear functions of delay and
non-Gaussian parameters as well.
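The block based traversal described above can be sketched in a few lines of Python. The sketch assumes that every gate delay is an independent Gaussian described by a (mean, variance) pair, visits the gates in topological order, and approximates the maximum of two arrival times by a Gaussian whose first two moments follow Clark's classical formulas. It is a generic illustration and not the algorithm of any of the works cited above; in particular, it ignores the correlations caused by reconvergent fanouts and spatial effects, and the names `clark_max` and `block_based_ssta` are hypothetical.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def add(a, b):
    """Sum of two independent Gaussians given as (mean, variance)."""
    return (a[0] + b[0], a[1] + b[1])


def clark_max(a, b, rho=0.0):
    """Moment-matched Gaussian approximation of max(a, b) (Clark's formulas)."""
    (m1, v1), (m2, v2) = a, b
    theta_sq = v1 + v2 - 2.0 * rho * math.sqrt(v1 * v2)
    if theta_sq <= 0.0:
        # Degenerate case: the difference is deterministic, keep the larger mean
        return a if m1 >= m2 else b
    theta = math.sqrt(theta_sq)
    alpha = (m1 - m2) / theta
    mean = m1 * _cdf(alpha) + m2 * _cdf(-alpha) + theta * _pdf(alpha)
    second = ((m1 * m1 + v1) * _cdf(alpha) + (m2 * m2 + v2) * _cdf(-alpha)
              + (m1 + m2) * theta * _pdf(alpha))
    return (mean, max(second - mean * mean, 0.0))


def block_based_ssta(gates, primary_inputs):
    """Propagate arrival times through gates listed in topological order.

    gates: list of (name, [fanin names], (delay_mean, delay_variance)).
    Primary inputs are assumed to arrive at time zero with no uncertainty.
    """
    arrival = {pi: (0.0, 0.0) for pi in primary_inputs}
    for name, fanins, delay in gates:
        worst = arrival[fanins[0]]
        for fanin in fanins[1:]:
            worst = clark_max(worst, arrival[fanin])
        arrival[name] = add(worst, delay)
    return arrival
```

Each gate is visited once, which is why timers of this kind scale linearly with circuit size, whereas a path based timer must enumerate and rank paths explicitly.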
A large amount of work has also been done in the realm of leakage power analysis under process
variations. In [45], the process parameters, which affect the leakage current of a device exponentially, are identified. Further, a process variation aware leakage current model is developed for a
single device. This model is validated against Monte Carlo simulations and is shown to be very
accurate. In [46], an analytical expression is given to calculate the probability density function of
leakage currents for stacked devices in CMOS gates. Then, these distributions of individual gate
leakage currents are combined to obtain the mean and variance of the leakage current of an entire circuit. Accurate estimation and modeling of total circuit leakage distribution considering both intra- and inter-die variations are done in [47]. Leakage power and temperature variation are strongly
coupled. In fact, leakage power increases exponentially with temperature. Temperature
variations and electrothermal coupling between subthreshold leakage and junction temperature are
studied in [48]. It is shown that it is critical to consider die-to-die temperature variations for accurate leakage estimation. A novel framework for accurate estimation of subthreshold leakage in
process, temperature and supply voltage space, considering both inter-die and intra-die variations,
is presented in [49].
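The reason process variations inflate leakage can be seen from a simple moment calculation. The Python sketch below assumes that the leakage of a gate depends exponentially on a single Gaussian parameter deviation, so that the leakage is lognormally distributed, and that the gates of a circuit vary independently, so that means and variances add. This is a simplified illustration under stated assumptions, not the model of any work cited above; the function names and the sensitivity parameter `k` are hypothetical.

```python
import math


def lognormal_leakage_moments(i_nominal, k, sigma):
    """Mean and variance of I = i_nominal * exp(k * X), with X ~ N(0, sigma^2)."""
    s2 = (k * sigma) ** 2
    mean = i_nominal * math.exp(0.5 * s2)
    variance = i_nominal ** 2 * math.exp(s2) * (math.exp(s2) - 1.0)
    return mean, variance


def circuit_leakage_moments(gates):
    """Sum per-gate moments; gates is an iterable of (i_nominal, k, sigma).

    Assumption: gate leakages are statistically independent, which ignores
    the inter-die and spatially correlated intra-die components.
    """
    total_mean = 0.0
    total_variance = 0.0
    for i_nominal, k, sigma in gates:
        mean, variance = lognormal_leakage_moments(i_nominal, k, sigma)
        total_mean += mean
        total_variance += variance
    return total_mean, total_variance
```

Because the exponential is convex, the mean leakage exceeds the nominal value whenever `sigma` is nonzero, so an analysis at nominal parameter values underestimates the expected leakage.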
Though FinFET circuit synthesis and SRAM design have attracted a lot of attention from researchers, a few researchers have also worked in the area of FinFET process variations. One of
the major differences between a FinFET and a planar device is that the FinFET consists of multiple
small fins. Thus, previous analytical models for obtaining the leakage distribution of a gate or a chip
cannot be directly applied to FinFET circuits. New analytical methods need to be developed, which
take into account the width quantization property. In [50], statistical leakage estimation for FinFETs
is performed under the width quantization property. It is shown that conventional approaches can
significantly underestimate leakage current by as much as 43%. The effect of process variation on
device temperature in FinFET circuits is studied in [51], where a Monte Carlo simulation methodology based on thermal models is used to solve temperature and leakage power self-consistently.
The influence of process variation on device performance of the optimized 10nm FinFET is studied in [52]. The sensitivity of on-current, leakage current, threshold voltage, drain-induced barrier
lowering and subthreshold swing to process variation is also studied. In [53], engineering the workfunction of the gate materials is shown to be effective in controlling Vth under variations. Further,
the sensitivity of the electrical parameters of the device to several important physical fluctuations,
such as gate length, fin thickness and gate dielectric thickness, is analyzed. Variability of FinFET
based devices and circuits considering quantum-mechanical effects and width quantization property
is studied in [54].
2.6 Chapter summary

In this chapter, we discussed how leakage power can be a major barrier in scaling of conventional
transistors. To circumvent this barrier, FinFETs have been proposed as a possible solution. We discussed how FinFETs can be used to synthesize innovative logic circuits. Several novel low power
circuit synthesis methodologies were also described. In addition, work done in the area of FinFET
SRAM was discussed, including those that make innovative use of the independent-gate structure
of FinFETs. These designs offer enhanced read margins, write margins and cell stability as compared to their bulk counterparts. We also discussed existing work on the effect of process variation
on FinFET circuits. We reviewed how the width quantization property significantly impacts the
development of the leakage probability density function of a circuit. The self-heating effects of
SOI-FinFETs were also reviewed.
Chapter 3
Low-power FinFET Circuit Synthesis using Multiple Supply and Threshold Voltages
3.1 Introduction
Technology scaling has resulted in continual improvement in the performance of digital circuits.
With each technology generation, the device power supply voltage, Vdd , reduces by approximately
20% to 30%. The reduction in Vdd reduces the active power dissipation quadratically. A reduction in Vdd also necessitates a reduction in threshold voltage Vth to maintain the gate drive strength
(Vdd − Vth ). The reduction in Vth with each technology generation leads to an exponential increase
in leakage current. Also, the number of transistors in a chip increases exponentially, resulting in an
increased power density. Thus, power consumption has become a major concern for chip designers because of the increased packaging and cooling costs as well as potential reliability problems.
Therefore, power efficiency has assumed increased importance. This chapter explores how circuits
based on FinFETs, an emerging transistor technology that is expected to supplant bulk CMOS at
the 22nm node or beyond, can be made power-efficient.
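The scaling trends quoted above can be checked with a short calculation. The Python sketch below assumes the standard first-order relations: dynamic power is proportional to the square of Vdd at fixed capacitance and frequency, and subthreshold leakage grows by one decade for every subthreshold swing's worth of reduction in Vth. The function names and the swing value in the usage note are illustrative assumptions.

```python
def dynamic_power_scale(vdd_reduction):
    """Relative dynamic power after reducing Vdd by the given fraction.

    Assumes P_dyn is proportional to Vdd^2 (capacitance and frequency fixed).
    """
    return (1.0 - vdd_reduction) ** 2


def leakage_scale(delta_vth_mv, swing_mv_per_decade):
    """Relative subthreshold leakage after lowering Vth by delta_vth_mv.

    Assumes leakage increases tenfold per swing_mv_per_decade of Vth reduction.
    """
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)
```

Under these assumptions, a 20% to 30% reduction in Vdd lowers the dynamic power to roughly 64% and 49% of its previous value, respectively, while a hypothetical 100mV reduction in Vth at a swing of 100mV/decade increases the subthreshold leakage tenfold.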
The steady miniaturization of MOSFETs with each new generation of CMOS technology has
provided us with improved circuit performance and cost per function over several decades. However, continued transistor scaling will not be straightforward in the sub-32nm regime because of
fundamental material and process technology limits [3]. Several innovative device structures, such
as ultra thin-body silicon-on-insulator (SOI) and FinFETs, have been proposed to address the challenges being posed by continued scaling. These devices have increased surface-to-volume ratio,
which improves the device electrostatics, resulting in better short-channel characteristics. FinFETs (Fig. 3.1) have emerged as the best solution for next-generation transistors because of better
scalability and ease of fabrication. FinFETs, with gate lengths down to 10nm, have already been
demonstrated with excellent control of short-channel effects and less than 0.5ps intrinsic delay [55].
Digital logic circuits implemented in FinFETs have been shown to be much more power-efficient
than the same circuits implemented in bulk CMOS at the same gate length [56].
Figure 3.1: Multi-fin FinFET (labels: TSI, HFin, source, drain)
However, beyond the technology-driven benefits, circuits can also benefit from the dual-gate
structure of FinFETs to further optimize power and performance. Such a structure provides us with
the ability to control the Vth of one gate by applying a voltage bias at the other gate. This property
leads to easy Vth controllability of FinFETs. Since Vth controls the delay as well as leakage current
of a transistor, its controllability can be a powerful tool for circuit optimization. A new circuit
synthesis style based on multiple supply and threshold voltages is presented in this chapter. The
synthesis style is dependent on TCMS, which is an innovative way to control Vth of connected-gate
FinFETs. TCMS is based on the principle that in an overdriven gate, delay and subthreshold current
can be reduced simultaneously. TCMS is explained in greater detail in Section 3.3. In classical
multiple supply voltage schemes, logic gates on the critical path are typically assigned high supply
voltage while the gates on the non-critical paths are connected to a low supply voltage in order to
reduce power consumption while maintaining circuit performance [57, 58]. In addition to multiple
supply voltage design techniques, lowering the Vth can maintain high performance while lowering
the supply voltage. Unfortunately, this leads to an exponential increase in the leakage current, which
has become an important concern in low-voltage high-performance designs [59].
Using our TCMS scheme, one can sharply diverge from the way circuits have been designed in
the past. This scheme does not make use of a lower supply voltage and thus no lowering of Vth is
required to maintain performance. It uses a nominal and a higher supply voltage. A possible consequence of an increased supply voltage is an increased Vth . Thus, the leakage power can be reduced
drastically. We employ a set of three supply voltages: a nominal supply voltage ($V_{dd}^{L}$), a slightly
higher supply voltage ($V_{dd}^{H}$), and a slightly negative supply voltage ($V_{ss}^{H}$). The scheme is based on
the principle that in an overdriven gate (a gate which is driven by an input voltage that is higher than
its supply voltage), the delay and subthreshold leakage can be reduced simultaneously [60, 61]. We
make the following contributions in this chapter:
• We propose the TCMS scheme for arbitrary logic circuits, which uses multiple supply and
threshold voltages to reduce circuit power consumption.
• We discuss a library consisting of inverters and two-input NAND and NOR gates based on
the TCMS scheme. The library consists of seven different types of inverters and 25 different
types of NAND and NOR gates.
• We extend a linear programming based optimization methodology to implement the TCMS
scheme for delay-constrained power optimization.
• Experimental results show that the application of TCMS to a set of benchmarks reduces power
consumption, on an average, by 67.6% at 30% slack.
• We propose two variants of the TCMS scheme. The first uses dual supply and threshold
voltages to reduce circuit power consumption. The second uses a TCMS scheme with a
single Vth . These schemes also result in significant power savings.
• We also implement traditional extended clustered voltage scaling (ECVS) [58] using the linear programming framework and show that, even under an optimistic scenario, the power
saving obtained by ECVS, on an average, is lower than that obtained by the TCMS scheme.
The chapter is organized as follows. In Section 3.2, we review the background work. In Section 3.3, we discuss the TCMS principle, which forms the basis for the scheme presented in this
chapter. In Section 3.4, we discuss gate library design using TCMS. In Section 3.5, we discuss the
power optimization methodology and the implementation of ECVS. In Section 3.6, we present the
experimental results and conclude in Section 3.7.
3.2 Background work

Next, we review prior work in the area of FinFET circuit design. FinFETs have been used in a
variety of innovative ways in digital and analog circuit designs. However, the property that has been
exploited the most is the ability to control the two gates of a FinFET independently. The independent
gates can be used to merge parallel transistors to reduce circuit power and area [62, 21]. Another
principle, which has often been employed, is the use of a back-gate voltage bias to modulate front
gate Vth . In [30], both a forward bias to reduce Vth while performing Read/Write operations in
an SRAM and a reverse bias to reduce the leakage power in the stand-by mode were used. In [63,
64, 65, 66], various circuits employing back-gate voltage bias to control subthreshold leakage were
presented. Though a large part of previous work has been devoted to the design of specific circuits
using FinFETs, some researchers have also explored the area of FinFET circuit synthesis. A logical
effort based algorithm for gate sizing using FinFETs was presented in [56]. Different logic design
styles, leading to low leakage, based on independent control of FinFET gates were studied in [67,
6]. A tool that directly translates CMOS netlists to FinFET netlists was presented in [1]. The
downside of the use of independently-controllable FinFET gates is that its fabrication requires an
extra processing step. This also complicates the layout due to the extra wiring required to feed the
bias to the back gate.
In this chapter, low-power FinFET circuits are synthesized using the TCMS concept. TCMS is
also based on the ability to control FinFET Vth . However, no voltage biases are fed to the back gate
in order to control the Vth of the front gate. In fact, the TCMS scheme is applied to shorted-gate
(SG) FinFETs. TCMS was only applied to global interconnects in [68]. We propose a significant
generalization of the TCMS concept to the synthesis of any logic circuit. This approach is very different from existing multiple supply and threshold voltage power optimization schemes. In existing
schemes, the gates on the critical path operate at the higher Vdd , i.e., nominal supply voltage, or
lower Vth to meet the performance requirements, and the gates on the non-critical paths operate at
the lower Vdd or higher Vth , thereby reducing the overall power consumption without performance
degradation. In contrast to the above, in our scheme, both a nominal Vdd and a higher Vdd as well
as a nominal Vth and a higher Vth are deployed on critical as well as non-critical paths.
3.3 The principle of TCMS

The Vth at the front (back) gate of a FinFET can be controlled not only through process-related engineering such as (a) controlling the number of dopant atoms in the channel, and (b) using different
values for the gate workfunction, but also through the application of a voltage at the other gate. A
general relationship between the threshold voltage ($V_{th_{gf}}$) of the front gate ($g_f$) and the applied voltage
bias $V_{g_b s}$ on the back gate ($g_b$) is given in [69]. However, for the purpose of this work, the following
simple relationship among $V_{th_{gf}}$, $V_{g_b s}$, $V_{th_{gb}}$ ($V_{th}$ of the back gate), and $V^{0}_{th_{gf}}$ (minimum observed
value of $V_{th_{gf}}$) would suffice:
\[
V_{th_{gf}} =
\begin{cases}
V^{0}_{th_{gf}} - \delta\,(V_{g_b s} - V_{th_{gb}}) & \text{if } V_{g_b s} < V_{th_{gb}}, \\
V^{0}_{th_{gf}} & \text{otherwise,}
\end{cases}
\tag{3.1}
\]
where $\delta$ is a positive quantity whose value depends upon the ratio of gate and body capacitances. If
the FinFET is operated in SG mode, the Vth of both gates responds simultaneously to the change in
the voltage at the other gate. This happens because when the back gate is in depletion mode, charge
coupling occurs between the front and back gates. However, when the back gate is in strong inversion mode, the free carriers effectively screen the back-gate electric field, making $V_{th_{gf}}$ independent
of $V_{g_b s}$.
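The piecewise relationship of Eq. (3.1) can be written as a small Python function. The sketch assumes the form in which the front-gate threshold rises linearly above its minimum value as the back-gate bias falls below the back-gate threshold, with a positive coupling coefficient (called `delta` here). The function name and the numerical values in the usage note are hypothetical and are not taken from [69] or from this thesis.

```python
def front_gate_vth(v_gbs, vth_gb, vth0_gf, delta):
    """Front-gate threshold voltage as a function of the back-gate bias.

    v_gbs   : back-gate-to-source bias (V)
    vth_gb  : threshold voltage of the back gate (V)
    vth0_gf : minimum observed front-gate threshold voltage (V)
    delta   : positive coupling coefficient (assumed name), set by the ratio
              of gate and body capacitances
    """
    if v_gbs < vth_gb:
        # Back gate not inverted: the gates are charge-coupled, so Vth shifts
        return vth0_gf - delta * (v_gbs - vth_gb)
    # Back gate in strong inversion: free carriers screen the back-gate field
    return vth0_gf
```

For example, with the hypothetical values `vth_gb = 0.3`, `vth0_gf = 0.25` and `delta = 0.5`, a back-gate bias of -0.1V raises the front-gate threshold by 0.2V, whereas any bias at or above 0.3V leaves it at its minimum value.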
Figure 3.2: The principle of TCMS

TCMS exploits the fact that in an overdriven FinFET, the delay and subthreshold leakage can
be reduced at the same time. Fig. 3.2 is used to further illustrate this point. In this figure, the
supply voltages for the inverters are Vdd^H and Vss^H, and for the NAND gate they are Vdd^L and Vss^L. A
possible set of values of Vdd^H, Vss^H and Vdd^L are 1.08V, 0.08V and 1.0V. Vss^L is assumed to be tied to
ground. In the remainder of this chapter, it is assumed that any logic gate connected to Vdd^H is also
connected to Vss^H and similarly for the lower supply voltages. Let V1 and V2 in Fig. 3.2 be held at
logic 0. This would lead to a logic 1 at V1' and V2'. Thus, the nFinFETs in the NAND gate will
be conducting and the pFinFETs will be leaking. Both the subthreshold leakage current and delay
can be controlled through the control of FinFET Vth. In this case, it can be seen that the nFinFETs
experience a bias voltage of 1.08V, which is higher than the normal gate drive of 1.0V. On the
other hand, pFinFETs are reverse biased by 0.08V. Thus, the nFinFETs experience an increased
gate-to-source voltage, compared to the case when they are driven by a supply voltage of 1.0V.
The increased drive strength of nFinFETs results in a reduction in the falling delay of the NAND
gate. The applied bias causes the Vth of the pFinFETs to be increased, thereby resulting in a lower
subthreshold leakage. In addition, a negative gate-to-source bias on the pFinFETs further brings
down their subthreshold leakage current. Similarly, the application of a logic 1 at the circuit inputs
leads to a reduction in the leakage current in the nFinFET and improvement in the drive strength of
the pFinFET. The use of TCMS-style logic gates in circuit synthesis is explained in greater detail in
the next section.
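The bias arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the sign convention (gate-to-source voltage for the nFinFET, source-to-gate voltage for the pFinFET) are assumptions made here, as is the simplification that the nFinFET source sits at the low ground rail. The voltages are the example values quoted above.

```python
# Illustrative sketch; names and sign conventions are assumptions, not from the source.
VDD_H = 1.08  # high supply in volts (example value from the text)
VDD_L = 1.00  # low supply in volts
VSS_L = 0.00  # low ground rail, tied to ground


def nand_bias_for_high_input(v_in, vdd=VDD_L, vss=VSS_L):
    """Return (nFinFET Vgs, pFinFET Vsg) for a NAND input held at v_in.

    Assumes the nFinFET source sits at vss and the pFinFET source at vdd.
    """
    n_vgs = v_in - vss  # drive seen by the conducting nFinFET
    p_vsg = vdd - v_in  # a negative value means the pFinFET is reverse-biased
    return n_vgs, p_vsg


n_vgs, p_vsg = nand_bias_for_high_input(VDD_H)
print(f"nFinFET Vgs = {n_vgs:.2f} V, pFinFET Vsg = {p_vsg:.2f} V")
```

With a logic 1 delivered at 1.08V, the nFinFET drive is 0.08V above the nominal 1.0V and the pFinFET sees a source-to-gate voltage of about -0.08V, which is the reverse bias described in the text.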
TCMS is based on the principle that an nFinFET (pFinFET) experiences an overdrive when it is
conducting, and simultaneously a pFinFET (nFinFET) experiences a reverse-biased voltage, which
leads to very low subthreshold currents. TCMS can provide considerable power savings despite
the use of an increased Vdd. A limitation of TCMS is the need to lay out an additional Vss^H line. This
limitation can be addressed by using the double-supply/double-ground grid suggested in [70]. Another way is to
eliminate the Vss^H supply and still apply the TCMS principle. This is explained in greater detail in Section 3.6.
In conventional multiple supply voltage schemes, power savings can be attributed to the use of a
lower Vdd on non-critical paths, which results in lower leakage and dynamic power dissipation.
However, in TCMS, power savings can mainly be attributed to the reduction in the leakage current.
Although the dynamic and leakage power may slightly increase for gates operating at the higher
supply voltage, this is far outweighed by the reduction in leakage power in overdriven gates.
We performed HSPICE simulations on an overdriven nFinFET using the predictive technology
model (PTM) for 32nm FinFETs. PTMs are available from [71] and have also been used for all
other HSPICE experiments reported in this chapter. These models have been verified against manufactured 32nm FinFETs [72] and have been widely used for circuit simulations [73, 74, 6]. Fig. 3.3
shows the simulation results. In the simulation, the drain of the nFinFET was tied to Vdd^L and the
source terminal to ground. The gate voltage level varied between Vdd^H and Vss^H. Thus, the nFinFET
was forward-biased when Vdd^H was applied at its gate and reverse-biased when Vss^H was applied. Let
Ion^L (Ioff^L) and Ion^H (Ioff^H) denote the on-currents (off-currents) through the FinFET at normal drive
(Vgs = Vdd^L) and overdrive (Vgs = Vdd^H), respectively. As shown in Fig. 3.3, Ion^H exceeds Ion^L by 3.4%.
On the other hand, Ioff^H is almost 5X smaller than Ioff^L. The large reduction in the subthreshold
current in an overdriven FinFET is the key to the large power savings in TCMS schemes.

Figure 3.3: Simulated Ids-Vgfs characteristics for an overdriven 32nm nFinFET
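As a rough worked check of the figures quoted above, the improvement in the on/off current ratio under overdrive can be estimated as follows. This treats "almost 5X" as a factor of about 5, which is an approximation made here rather than a value reported in the text.

```latex
\frac{I_{on}^{H} / I_{off}^{H}}{I_{on}^{L} / I_{off}^{L}}
  = \frac{I_{on}^{H}}{I_{on}^{L}} \cdot \frac{I_{off}^{L}}{I_{off}^{H}}
  \approx 1.034 \times 5 \approx 5.2
```

Under this approximation nearly all of the improvement comes from the off-current reduction, since the on-current changes by only a few percent.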
3.4 Library design using TCMS

The concept of TCMS was illustrated through its application to a NAND gate in the previous section.
However, TCMS can be extended to any logic gate based on SG FinFETs. This is explained next.
Figure 3.4: NAND gate employing the TCMS principle

Consider the two-input NAND gate shown in Fig. 3.4. The power supply voltages for the NAND
gate are Vdd^L and Vss^L. Consider the two inputs a and b. They may be the outputs of a high-Vdd gate
or a low-Vdd gate (a high-Vdd gate is connected to Vdd^H and Vss^H, and a low-Vdd gate to Vdd^L and Vss^L).
During circuit synthesis, when this gate is embedded in a larger circuit, it might so happen that a is
the output of a high-Vdd gate and b comes from a low-Vdd gate or vice versa. Suppose the former is
true. Thus, the FinFETs connected to input a follow the TCMS principle explained above. FinFETs
connected to input b cannot employ the TCMS principle because there is no gate-to-source voltage
difference to exploit.
On the other hand, if the power supply voltages for the NAND gate are Vdd^H and Vss^H, then the
FinFETs connected to input a will not be able to take advantage of the TCMS principle. Also, input
b is from the output of a low-Vdd gate and is driving a high-Vdd gate. This results in an increased
leakage current because the pFinFET is forward-biased by Vdd^H - Vdd^L.
To avoid the above problem, a level-converter may be used to restore the signal to Vdd^H. These
level-converters may be combined with a flip-flop, as in the clustered voltage scaling (CVS) technique [57], to minimize the power for voltage level restoration. In an asynchronous approach,
such as ECVS [58], level-converters may be inserted between logic gates connected to Vdd^L and Vdd^H.
In such schemes, the power and delay overheads for the level-converters are large.
In the case of TCMS, using level-converters is not an attractive option because power savings
are obtained through the use of overdriven gates, the frequent use of which necessitates frequent
level conversion. However, level conversion can be built into logic gates without requiring the use
of level-converters [75], through the use of a high-Vth FinFET at the inputs of high-Vdd gates that
need to be driven by a low-Vdd input voltage. FinFET Vth may be controlled through a number of
mechanisms. For example, there are several process-related options to statically control the Vth of
a FinFET, e.g., channel doping, gate workfunction engineering or asymmetrical double gates [76].
The first step towards evaluating the utility of the TCMS principle for arbitrary logic circuits
involves the design of technology libraries, consisting of high-Vdd cells, low-Vdd cells, low-Vdd cells
that are being driven by high-Vdd cells and high-Vdd cells that are being driven by low-Vdd cells.
All these cells have to be characterized both at high-Vth and low-Vth . Thus, the design variables
that need to be targeted are supply voltage, input gate voltage and threshold voltage. Hence, for
a two-input NAND gate of a given size, we have five design variables: supply voltage, gate input
voltage for input a, gate input voltage for input b, Vth for FinFETs connected to input a and Vth for
FinFETs connected to input b. If the Vth of a pFinFET connected to an input is high (low), then the
corresponding nFinFET connected to the same input also has a high (low) Vth . It can be easily seen
that 32 two-input NAND gates of a particular size are possible, because of the five design variables.
For example, one type of NAND gate may have a high supply voltage (Vdd^H, Vss^H), a low input a gate
voltage, a high input b gate voltage, a high Vth for FinFETs connected to input a and a low Vth for
FinFETs connected to input b. Let 1 denote the case when either a high supply voltage or a high
input gate voltage or a high Vth is used. Similarly, let 0 denote the case when either a low supply voltage
or a low input gate voltage or a low Vth is used. Using this convention, the example NAND gate can
be termed nand10110. The first digit (1) denotes a high supply voltage, the second (0) a low input a
gate voltage, the third (1) a high input b gate voltage, the fourth (1) a high Vth for input a and the
fifth (0) a low Vth for input b. Thus, 32 NAND gate modes are
possible, ranging from nand00000 to nand11111. However, certain combinations of design variables
are not allowed: a logic gate with a high supply voltage and low input gate voltages cannot employ
low-Vth transistors as this will lead to a large leakage current, as explained earlier. Thus, nand10000,
nand10001, nand10010, nand10100, nand10101, nand11000 and nand11010 are not allowed. This
leads to 25 NAND gate modes instead of 32. Similarly, there are 25 NOR gate modes. Since the
inverter is a one-input gate, it has three design variables: supply voltage, input gate voltage and Vth .
This leads to seven valid modes for inverters. For each NAND, NOR and inverter mode, we include
five sizes: X1, X2, X4, X8 and X16. The library is characterized by simulating the delay, leakage
and short-circuit power consumption of each constituent cell in HSPICE. Transistor capacitance is
also measured using HSPICE. To model interconnect delay and load, fanout and size-dependent
wire load models were obtained by scaling the wire characteristics available as part of a 130nm
technology library, according to the method presented in [77].
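The mode counts above can be reproduced by brute-force enumeration. The sketch below is illustrative only: the function names are hypothetical, and the rule it encodes is the one stated in the text (a high-supply cell may not combine a low input gate voltage with low-Vth FinFETs on that input). The bit order follows the naming convention described above: supply, input voltages, then Vth values.

```python
from itertools import product


def is_allowed(supply, input_bits, vth_bits):
    """Reject high-supply cells that pair a low-voltage input with low-Vth FinFETs."""
    if supply == 0:
        return True
    return all(v == 1 or t == 1 for v, t in zip(input_bits, vth_bits))


def enumerate_modes(prefix, n_inputs):
    """Return (allowed, disallowed) mode names for a gate with n_inputs inputs."""
    allowed, disallowed = [], []
    for bits in product((0, 1), repeat=1 + 2 * n_inputs):
        supply = bits[0]
        input_bits = bits[1:1 + n_inputs]
        vth_bits = bits[1 + n_inputs:]
        name = prefix + "".join(str(b) for b in bits)
        if is_allowed(supply, input_bits, vth_bits):
            allowed.append(name)
        else:
            disallowed.append(name)
    return allowed, disallowed


nand_ok, nand_bad = enumerate_modes("nand", 2)
inv_ok, inv_bad = enumerate_modes("inv", 1)
print(len(nand_ok), nand_bad)
print(len(inv_ok), inv_bad)
```

The enumeration yields 25 two-input modes and seven inverter modes, and the excluded two-input modes match the seven listed in the text.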
3.5 Power optimization methodology

In this section, the methodology for implementing the TCMS scheme, via the use of multiple supply
and threshold voltages, for delay-constrained power optimization is presented. Our power optimization flow is shown in Fig. 3.5. The optimization methodology uses a two-phase strategy to find the
circuit with the best power consumption. In the initialization phase, called Phase I, the logic netlist
is first mapped to low-Vdd gates with low Vth . The circuit is then levelized into alternate levels
of high-Vdd and low-Vdd gates. In Phase II, an extension of linear programming based gate sizing
algorithm [78] is applied to the netlist obtained from Phase I. We also illustrate our methodology
through its application to a small benchmark (ISCAS85 c17) in this section. In addition, we discuss
the implementation of ECVS using the linear programming framework described in Phase II and
illustrate its impact through application to c17.
3.5.1 Optimization flow

The power minimization flow shown in Fig. 3.5 starts by mapping the logic netlist to low-Vdd gates
with low-Vth and finding its delay-minimized configuration, using Synopsys Design Compiler. The
library of low-Vdd gates with low-Vth is referred to as the SG library because of its use of SG
FinFETs. Thereafter, in Phase I, the circuit is divided into alternate levels of high-Vdd and low-Vdd
gates. The gates at odd levels are changed to high-Vdd gates with high-Vth. The gates at even levels
are changed to other modes of low-Vdd gates to maintain circuit consistency, as will be clearer later.
Next, in Phase II, a linear programming based algorithm is used to assign gate sizes and modes to the
mapped circuit by selecting cells from the TCMS library. Cell selection is based on the algorithm
presented in [78]. The linear programming formulation can be used for reducing both delay and
power in the circuits. The iteration terminates when all the delay constraints are met and the change
in power consumption between successive iterations is less than some pre-specified percent. We
give details of the optimization flow next.

Figure 3.5: Power optimization flow
[Flowchart: Verilog netlist and SG library -> delay-minimized netlist by Design Compiler -> Phase I: divide into alternate levels of high (odd) and low (even) Vdd gates -> Phase II: linear programming formulation using the TCMS library -> check whether the delay constraints are met and whether the change in power exceeds a threshold -> power-optimized netlist]
29
3.5.2 Phase I: Initialization of the circuit

Recall that the pair of high (low) supply voltages are denoted by Vdd^H and Vss^H (Vdd^L and Vss^L). The
high and low threshold voltages are denoted by Vth^H and Vth^L, respectively. First, the circuit is
synthesized by mapping it to low-Vdd gates with low threshold voltage Vth^L. During the initialization
procedure, the circuit is levelized. The level of each primary input is defined to be 0. The level of
a gate G, denoted as l(G), can be calculated by $l(G) = 1 + \max_{i \in \{1,2,\ldots,FN\}} l(GI_i)$, where $GI_i$ is
the ith fanin of gate G and FN is the gate fanin. Next, all the gates located at an odd level in the
initial netlist are replaced by high-Vdd gates, with Vth^H at FinFETs connected to input a if this input
arrives from an even level, i.e., input a is the output of a low-Vdd gate. On the other hand, if the
input arrives from an odd level, the threshold voltage of the FinFETs can, in general, be allowed to
be either Vth^L or Vth^H. However, in our approach, we replace it by Vth^H to reduce the initial leakage
power. Note that this Vth^H assignment can be changed to Vth^L in Phase II if the optimization algorithm
deems it necessary. The gates at an even level are replaced with other modes of low-Vdd gates to
maintain circuit consistency, as mentioned earlier. We next illustrate the initialization phase through
an example.
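The levelization rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the netlist is a hypothetical dictionary mapping each gate name to its fanin names, any name that is not a key is treated as a primary input at level 0, and the gate names themselves are invented for the example.

```python
def levelize(fanins):
    """Compute l(G) = 1 + max over fanins of l(fanin), with primary inputs at level 0.

    fanins maps each gate name to a list of fanin names; names that are not
    keys of the dictionary are treated as primary inputs.
    """
    levels = {}

    def level(node):
        if node not in fanins:
            return 0  # primary input
        if node not in levels:
            levels[node] = 1 + max(level(f) for f in fanins[node])
        return levels[node]

    for gate in fanins:
        level(gate)
    return levels


def supply_class(gate_level):
    """Odd levels are assigned to high-Vdd gates, even levels to low-Vdd gates."""
    return "high-Vdd" if gate_level % 2 == 1 else "low-Vdd"


# Hypothetical four-gate netlist used only for illustration.
fanins = {"g1": ["x1"], "g2": ["g1", "x2"], "g3": ["g2", "g1"], "g4": ["g3"]}
levels = levelize(fanins)
for gate, lvl in sorted(levels.items()):
    print(gate, lvl, supply_class(lvl))
```

Note that g3 sits at level 3 and has one fanin from level 2 and one from level 1, so a high-Vdd gate can receive both low-Vdd and high-Vdd inputs.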
Consider the circuit shown in Fig. 3.6. Initially, the circuit is synthesized using Vdd^L and Vth^L, i.e.,
all the NAND gates and inverters are of the form nand00000 and inv000, respectively. Thereafter, as
explained earlier, the inverters of size X4 at level 1 are replaced with other inverters from the TCMS
cell library. The replaced cells have size X4, but their supply voltages are (Vdd^H, Vss^H) and their
threshold voltages are Vth^H, i.e., the replaced inverters are of mode inv101. Similarly, the NAND
gate at level 3 is replaced with a high-Vdd NAND gate, which employs Vth^H as the threshold voltage,
i.e., it has the nand10111 mode. The NAND gate at level 2 is replaced with nand01011, and the
inverter at level 4 is replaced with inv011. This is done so that modes of the gates at an even level
are consistent with the circuit topology.
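The mode names in this example follow mechanically from the gate level and the levels of its fanins. The sketch below is a hedged illustration: the function name is hypothetical, the fanin levels are chosen to be consistent with the mode names quoted above (the figure itself is not reproduced here), and setting every Vth bit high at initialization is an assumption inferred from the four example modes.

```python
def initial_mode(kind, gate_level, fanin_levels):
    """Build a Phase I mode name such as 'nand10111'.

    Bit order: supply, one voltage bit per input, one Vth bit per input.
    Assumption: all Vth bits start high, as in the example modes in the text.
    """
    supply = gate_level % 2  # odd level -> high-Vdd gate
    input_bits = [lvl % 2 for lvl in fanin_levels]  # a driver at an odd level gives a high input voltage
    vth_bits = [1] * len(fanin_levels)
    return kind + str(supply) + "".join(str(b) for b in input_bits + vth_bits)


# Fanin levels below are assumptions chosen to match the modes quoted in the text.
print(initial_mode("inv", 1, [0]))      # inverter fed by a primary input
print(initial_mode("nand", 3, [2, 1]))  # input a from an even level, input b from an odd level
print(initial_mode("nand", 2, [1, 0]))
print(initial_mode("inv", 4, [3]))
```

Primary inputs are at level 0 and are therefore treated as low-voltage inputs in this sketch.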
When a gate is changed from a low-Vdd gate to a high-Vdd gate, it is not necessary that both of
its inputs will come from an even level and will thus be low-Vdd signals. It might so happen that one
of the inputs comes from an odd level and is the output of a high-Vdd gate. This explains the need
for 25 different NAND and NOR gate modes and seven different inverter modes in the cell library.
The circuit is divided into alternate levels of high-Vdd and low-Vdd gates to make use of the TCMS
scheme, which is based on the principle of a high-Vdd gate driving a low-Vdd gate. This also leads
to a low-Vdd gate driving a high-Vdd gate. However, as explained above, in such a situation, Vth^H is
used at the inputs to reduce leakage currents.

In Phase II, the circuit, initialized in the above fashion, is fed to a gate sizing algorithm. This is
described next.

Figure 3.6: Example circuit
3.5.3 Phase II: Linear programming formulation

Circuit sizing algorithms often perform a search amongst the various candidate cells available for
each gate to select the cell with the best power-delay sensitivity. Let $\Delta P$ represent the reduction in
power and $\Delta D$ the degradation in delay, if an alternate cell is used. The ratio $\Delta P / \Delta D$ is the power-delay
sensitivity. Such a cell is then used to replace the gate. However, as shown in [78], such decisions
can be quite suboptimal. The major advantage of the linear programming approach is that it leads
to an analysis of how changing each gate affects the gates it has a path to. We next review the gate
sizing algorithm presented in [78] and discuss enhancements we have made to it for implementing
the TCMS scheme.
The linear programming formulation is an iteration based algorithm. In each iteration, it selects
the best cells for any number of gates in the circuit, based on the power-delay sensitivity. To reduce
power, the cell with maximum reduction in power for a given increase in delay is chosen. To reduce
delay, the cell with maximum delay decrease for the corresponding increase in power is chosen.
When an alternative cell is chosen, the level of the existing gate and the input gate voltages play an
important role. If the existing gate is at an odd level and the voltage at input a (b) is high (low), it
can only be replaced by gates that have a high supply voltage and high (low) voltage at input a (b).
The same is true for gates at even levels. The free design variables are the threshold voltages and
gate sizes. The linear programming formulation is able to select alternative cells for any number of
gates in the circuit during each iteration. It uses a cell choice variable $\gamma_v$ for each gate. $\gamma_v$ denotes
whether an alternative cell has been chosen or not. $\gamma_v$ varies continuously in the range [0, 1]. A value
of $\gamma_v$ greater than a threshold value indicates an alternative cell should be used, else not. In [78],
the threshold value chosen is 0.99. We found that such a high threshold value greatly impairs the
chances of a cell being replaced. We found empirically that a threshold value of 0.6 works better.
An alternate cell for gate v (Fig. 3.7) is then chosen by minimizing power among various candidate
cells for which $d'_v \le d_v + \gamma_v \Delta d_v$, where $d'_v$ is the delay through v after a cell change. At the end
of an iteration of the algorithm, all the gates whose alternative cells have a $\gamma_v$ value greater than 0.6
are replaced with alternative cells. Equation (3.2) gives the objective used to optimize power in the
linear programming formulation. Delay constraints at individual gates (for the circuit in Fig. 3.7)
and at the circuit outputs are given by Equations (3.3) and (3.4), respectively.

$$\min \sum_{v \in V} \gamma_v \, \Delta P_v \tag{3.2}$$

$$t_{vw,rise} \ge t_{uv,fall} + d_{uv,rise} + \gamma_v \left( \Delta t_{uv,fall,v} + \Delta d_{uv,rise,v} + \beta_{vw,fall} \, \Delta s_{uv,rise,v} \right) + \sum_{x \in fanout(v),\, x \ne w} \gamma_x \left( \Delta d_{uv,rise,x} + \beta_{vw,fall} \, \Delta s_{uv,rise,x} \right) \tag{3.3}$$

$$\max_{v \in outputs} \{ t_{v,rise},\, t_{v,fall} \} \le T_{max} \tag{3.4}$$
In the above equations, all timing arcs are assumed to have a negative polarity, i.e., a falling input
causes a rising transition at the output if the output changes. $\Delta P_v$ is the change in power due to
changing gate v. $t_{uv,fall}$ is the falling arrival time at gate v from gate u. $t_{vw,rise}$ is the rising
arrival time at gate w from gate v. $d_{uv,rise}$ is the delay from the signal on uv to the output of v
rising. $\Delta t_{uv,fall,v}$ is the change in $t_{uv,fall}$ due to the cell of v changing. $\Delta d_{uv,rise,x}$ ($\Delta d_{uv,rise,v}$)
and $\Delta s_{uv,rise,x}$ ($\Delta s_{uv,rise,v}$) are the changes in delay and slew, respectively, of this timing arc if
cell x (v) changes. $\beta_{vw,fall}$ is a sensitivity term that determines how delay $d_{vw,fall}$ is impacted by
$\Delta s_{uv,rise}$. $T_{max}$ is the maximum allowed signal arrival time at circuit outputs. Term $\Delta t_{uv,fall,v}$
was added to the original formulation in [78]. It was found empirically that this term helps to better
model the effect of a change in fanout load of gate u on its delay.

Figure 3.7: A circuit to illustrate delay constraints (reproduced from [78])
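To make the per-gate delay constraint and the replacement threshold concrete, the following sketch evaluates the right-hand side of the constraint for one timing arc and then applies the 0.6 cutoff. It is illustrative only: the names `gamma` (cell choice variable) and `beta` (slew sensitivity term) are labels chosen here, all numbers are invented, and a real implementation would pass these terms to a linear programming solver rather than evaluate them for fixed values.

```python
def arrival_lower_bound(t_uv_fall, d_uv_rise, gamma_v, dt_v, dd_v, ds_v,
                        beta_vw_fall, fanout_terms):
    """Right-hand side of the per-gate delay constraint for the arc u -> v -> w.

    fanout_terms is a list of (gamma_x, dd_x, ds_x) for each fanout x of v other than w.
    """
    rhs = t_uv_fall + d_uv_rise
    rhs += gamma_v * (dt_v + dd_v + beta_vw_fall * ds_v)
    rhs += sum(g_x * (dd_x + beta_vw_fall * ds_x) for g_x, dd_x, ds_x in fanout_terms)
    return rhs


def gates_to_replace(gamma, threshold=0.6):
    """Return gates whose cell choice variable exceeds the replacement threshold."""
    return sorted(gate for gate, value in gamma.items() if value > threshold)


# Invented numbers (arbitrary time units) used only to exercise the formula.
bound = arrival_lower_bound(t_uv_fall=100.0, d_uv_rise=20.0, gamma_v=0.5,
                            dt_v=2.0, dd_v=6.0, ds_v=4.0, beta_vw_fall=0.5,
                            fanout_terms=[(1.0, 3.0, 2.0)])
chosen = gates_to_replace({"u": 0.2, "v": 0.75, "w": 0.6, "x": 0.99})
print(bound, chosen)
```

A gate whose variable equals the threshold exactly is not replaced, since the text requires a value greater than 0.6.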
As mentioned earlier, the sizing algorithm proceeds by selecting alternative cells for any number
of gates in the circuit depending upon the value of $\gamma_v$. However, after each power optimization
iteration, there might be a violation of the delay constraints provided for the circuit. This may
happen because the algorithm replaces several gates at once, whereas the delay constraints are based on
individual gates changing, and simultaneous changes in a gate and its fanin are not modeled. If the
arrival time constraint (ATC) at the outputs is violated, delay minimization is performed using the
delay minimization version of the algorithm mentioned earlier. In that case, the objective function and
the criteria for selecting an alternative cell change. A detailed analysis of the delay minimization
framework is provided in [78]. The iteration terminates when the ATC has been met and the power
reduction from one iteration to the next is less than some pre-specified percent.
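The outer loop described above can be sketched as follows. This is a schematic outline, not the implementation used in this work: the callables `reduce_power`, `reduce_delay`, `meets_atc` and `total_power` are hypothetical stand-ins for the linear programming passes and timing/power analysis, the 1% stopping value and iteration caps are assumptions, and the toy netlist model exists only so that the loop can be run.

```python
def optimize_power(netlist, reduce_power, reduce_delay, meets_atc, total_power,
                   min_gain_pct=1.0, max_iters=50, max_repairs=20):
    """Alternate power reduction and delay repair until the gain drops below min_gain_pct."""
    prev = total_power(netlist)
    for _ in range(max_iters):
        netlist = reduce_power(netlist)
        repairs = 0
        # Repair timing if replacing several gates at once violated the ATC.
        while not meets_atc(netlist) and repairs < max_repairs:
            netlist = reduce_delay(netlist)
            repairs += 1
        cur = total_power(netlist)
        if meets_atc(netlist) and 100.0 * (prev - cur) / prev < min_gain_pct:
            break
        prev = cur
    return netlist


# Toy model (an assumption for illustration): power decays toward a floor,
# each power pass adds delay, and each repair pass trades a little power for delay.
FLOOR = 100.0


def toy_reduce_power(n):
    return {"power": FLOOR + (n["power"] - FLOOR) / 2, "delay": n["delay"] + 2.0}


def toy_reduce_delay(n):
    return {"power": n["power"] + 0.1, "delay": n["delay"] - 3.0}


result = optimize_power({"power": 300.0, "delay": 100.0},
                        toy_reduce_power, toy_reduce_delay,
                        meets_atc=lambda n: n["delay"] <= 105.0,
                        total_power=lambda n: n["power"])
print(result)
```

The caps on iterations and repair passes are a defensive choice for the sketch, so that the loop terminates even if a repair pass fails to restore timing.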
Next, we show the application of our methodology to the smallest ISCAS85 benchmark c17.
3.5.4 Application of methodology to c17

We synthesized the power-optimized netlist for ISCAS85 benchmark c17 at 130% ATC, i.e., with
a slack of 30% relative to the delay-minimized version, using the methodology illustrated earlier.
The set of high supply voltages used was 1.08V and 0.08V. The nominal set of supply voltages
was 1.0V and ground. These supply voltages were chosen by fixing the nominal set of supply
voltages and experimenting with various sets of high supply voltages. The two threshold voltages for
nFinFETs were 0.29V and 0.45V and those for pFinFETs were 0.25V and 0.40V. The switching
activity at each primary input was set to 0.1.
c17 is initially mapped to low-Vdd gates with low-Vth and the delay-minimized logic netlist
is obtained, as shown in Fig. 3.8. Thereafter, the power-optimized netlist is obtained at 130%
ATC. We achieved 50.3% power reduction for this circuit. The initial power consumption of the
delay-minimized netlist was 301.35µW (leakage power: 28.02µW, dynamic power: 273.33µW). In
the power-optimized netlist, leakage power reduces by 92.8% and dynamic power by 45.9%, and,
hence, the total power consumption reduces to 149.87µW. The cells chosen by our methodology
for c17 are shown in Fig. 3.9.
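The c17 figures quoted above are mutually consistent, as the following short check shows. The variable names are chosen here for illustration; the numbers are the ones reported in the text, and the small residual difference in the recomputed total is attributed to rounding of the reported percentages.

```python
# Reported values for c17 (delay-minimized netlist).
leak0, dyn0 = 28.02, 273.33
total0 = leak0 + dyn0

# Apply the reported component reductions (92.8% leakage, 45.9% dynamic).
leak1 = leak0 * (1 - 0.928)
dyn1 = dyn0 * (1 - 0.459)
total1 = leak1 + dyn1

# Overall reduction computed from the reported optimized total.
reported_total1 = 149.87
saving_pct = 100.0 * (1 - reported_total1 / total0)

print(f"initial total = {total0:.2f}")
print(f"recomputed optimized total = {total1:.2f} (reported {reported_total1})")
print(f"overall reduction = {saving_pct:.1f}%")
```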
Figure 3.8: Delay-minimized netlist obtained using Design Compiler

Figure 3.9: Power-minimized netlist obtained using the TCMS principle

The leakage power could be reduced by 92.8% for the following reasons.

The cells in the optimized netlist either have a high or low Vth. If they have a low Vth, they are
being driven by high-Vdd gates and thus, as explained earlier, the leakage power of these gates
reduces significantly. To illustrate the point, consider the NOR and NAND gates at levels 2 and 4 in
Fig. 3.9. The modes of the NOR and NAND gates at level 2 are nor01100, nor00111 and nand01001,
while the mode of the NAND gate at level 4 is nand00110. Even when these gates employ a low
Vth, they consume low leakage power because they are driven by high-Vdd gates. The gates at an
odd level employ a high Vth because they are driven by low-Vdd gates. The high Vth results in
low leakage power consumption in these cells. The increase in supply voltage of the cells at odd
levels tends to increase the leakage power dissipation in these cells. However, the reduction in
leakage power obtained by the increase in the Vth of these cells outweighs the increase in leakage
power caused by the use of a higher supply voltage. The dynamic power reduces in the power-optimized
netlist because there is a large reduction in the area (and hence capacitance) as compared
to the delay-minimized netlist (as can be seen from the gate sizes shown in Figs. 3.8 and 3.9). The
cells at the odd levels are high-Vdd gates. The dynamic power consumption of these cells tends to
increase due to an increase in their supply voltage, but tends to decrease due to a reduction in the
area of the cells they drive. The power consumption of the cells at even levels decreases if there
is a reduction in the area of the cells they drive. In Fig. 3.9, every cell in the netlist is smaller than
in Fig. 3.8, except for one cell whose size is unchanged. Thus, the dynamic power of the
low-Vdd cells decreases. The dynamic power of high-Vdd cells also decreases in most cases because
the reduction in area outweighs the increase in supply voltage. The total number of fins in the
delay-minimized netlist is 538 while the total number of fins in the power-optimized netlist is only
216.
3.5.5 Comparison to conventional multiple-Vdd approach

In traditional multiple-Vdd approaches, voltage level-converters are required to feed a high supply
voltage gate from a low supply voltage gate. Since these supply voltages are different from those
used in TCMS, let us denote them as HIGH-Vdd and LOW-Vdd gates, respectively. For example, HIGH-Vdd
= 1.0V and LOW-Vdd = 0.7V are often used. There are two major conventional multiple-Vdd approaches that have been published in the literature. The first one is the synchronous multiple-Vdd
methodology, also known as CVS [57], which has level-converters at the outputs of combinational
logic only. Level-converters may be combined with flip-flops, known as level-converting flip-flops,
to reduce the delay and power overhead associated with a level-converter.
The second methodology is the asynchronous multiple-Vdd scheme, also known as ECVS [58],
in which asynchronous level-converters are used. These level-converters allow any gate in a path to
be changed to be a LOW-Vdd gate, provided the path has sufficient slack. Since there is no restriction on the assignment of LOW-Vdd gate, ECVS can theoretically achieve greater power savings
compared to CVS. Thus, we compare the TCMS scheme with the ECVS methodology.
The ECVS scheme is also implemented using the linear programming framework described
earlier. The nominal set of supply voltages were 1.0V and ground. The other set of supply voltages
used were 0.7V and ground. The Vth for nFinFET (pFinFET) was set to 0.29V (0.25V). A new
library (LOW-Vdd library) was created for the lower supply voltage (0.7V, ground). The library was
characterized at the above Vth. The input gate voltage varies between 0.7V and ground, i.e., no
high input gate voltage is used when the LOW-Vdd library is characterized. When a HIGH-Vdd gate
feeds a LOW-Vdd gate, the rising delay value can be directly obtained from the library. However, the
falling delay value reduces because the input gate voltage increases from Vdd^L to Vdd^H. To circumvent
this problem, we reduce the falling delay value by a fixed fraction [79] whenever a HIGH-Vdd gate
feeds a LOW-Vdd gate. Similarly, a HIGH-Vdd library was characterized with supply voltages (1.0V,
ground). Only these two libraries are required for the ECVS methodology.
Initially, the circuit is mapped to the HIGH-Vdd library to obtain the delay-minimized netlist,
and then the ECVS methodology is applied. We synthesized the power-optimized netlist for c17
at 130% ATC, using ECVS. The netlist is shown in Fig. 3.10. The cells marked as inv and nand
are LOW-Vdd gates, while the cells marked as inv_h and nor_h are HIGH-Vdd gates. The power
consumption of the ECVS power-optimized netlist is 176.69µW. The dynamic power reduces by
40.2% and the leakage power by 53.1% in the power-optimized netlist. On the other hand, leakage
power reduces by 92.8% in the TCMS scheme. There is a larger reduction in leakage power in the
TCMS scheme because of the negative gate-to-source voltage resulting from the TCMS principle
and also due to the other set of high-Vth employed. Although ECVS employs LOW-Vdd gates,
which decrease the dynamic power consumption quadratically, still, dynamic power reduces by a
larger margin in the TCMS scheme because of the greater reduction in area obtained. The total
number of fins in ECVS power-optimized netlist is 282, which is 30.1% higher as compared to the
TCMS scheme. Out of 14 gates in the ECVS circuit, three are mapped to LOW-Vdd gates. Note that
the power-optimized netlist does not have any level-converters because there is no LOW-Vdd gate
driving a HIGH-Vdd gate. For larger circuits, however, ECVS would have to incur delay and power
overheads of level-converters.
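The quadratic dependence mentioned above can be made concrete with the standard first-order expression for switching power. The worked ratio below assumes that the switched capacitance C, the switching activity \alpha and the clock frequency f are unchanged when a gate moves from HIGH-Vdd to LOW-Vdd, which is a simplification made here.

```latex
P_{dyn} = \alpha \, C \, V_{dd}^{2} \, f
\qquad\Longrightarrow\qquad
\frac{P_{dyn}^{LOW}}{P_{dyn}^{HIGH}} = \left( \frac{0.7\,\mathrm{V}}{1.0\,\mathrm{V}} \right)^{2} = 0.49
```

Under these assumptions a single gate moved to the 0.7V supply dissipates roughly half the switching power, but this per-gate benefit applies only to the gates that can actually be moved to LOW-Vdd.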
3.6 Experimental results

We present experimental results for power optimization on the ISCAS85 benchmark suite in this
section. The cell libraries were characterized using HSPICE based on the PTM [71] at 70°C for
32nm FinFETs. This operating temperature was chosen because FinFETs suffer significantly from
self-heating, and thermal simulations yield a temperature near 70°C when the switching activity is
0.1, which is what is assumed for the experimental results here.

Figure 3.10: Power-minimized netlist obtained using ECVS

The same set of supply and threshold voltages were used that are mentioned in Sect. 3.5.4. The
average runtime for sizing the benchmarks on a shared server farm using eight dual-core 64-bit
AMD Opteron processors, running Red Hat Linux 4.0, was a few CPU hours.
The input switching activities were propagated through the entire circuit to obtain the switching
activities at different nodes using Synopsys Design Compiler.
The circuits were initially mapped to low-Vdd gates with low-Vth and their delay-minimized
configurations were obtained. Thereafter, power was optimized using our methodology at 130%
ATC.
We present the experimental results in Table 3.1. Column 1 lists the ISCAS85 benchmarks.
Major column 2 presents dynamic, leakage and total power of the delay-minimized version. Major
column 3 presents the power results for the TCMS scheme assuming (Vdd^H, Vss^H) of (1.08V, 0.08V).
As we can see, the leakage power reduces by 95.8% and dynamic power by 53.3%, providing a
total power reduction of 67.6%, on an average, when compared to the delay-minimized netlist. The
FinFET area reduces by 65.2%, on an average, as shown in Table 3.2.

Table 3.1: Power Savings Using the TCMS Scheme (power consumption in µW)

            Delay-minimized               TCMS (1.08V and 0.08V)        TCMS (Single Vth)             Dual-Vdd
Design      Dynamic   Leakage     Total   Dynamic   Leakage     Total   Dynamic   Leakage     Total   Dynamic   Leakage     Total
c432         679.15    546.38   1225.53    345.08     24.26    369.34    382.53     25.89    408.42    372.93     31.16    404.09
c499        9174.54   1757.26  10931.80   3973.30     95.37   4068.67   4357.34    100.63   4457.97   4181.26    118.74   4300.00
c880        1499.76    921.35   2421.11    768.27     33.90    802.17    827.37     35.08    862.45    802.10     27.14    829.24
c1355       8015.13   1583.94   9599.07   3414.07     80.98   3495.05   3805.77     88.01   3893.78   3564.44     94.86   3659.30
c1908       2810.42   1290.01   4100.43   1276.65     58.35   1335.00   1435.42     62.95   1498.37   1373.25     62.49   1435.74
c3540       2198.84   2073.06   4271.90   1198.03     53.25   1251.28   1273.48     90.67   1364.15   1262.22     69.07   1331.30
c5315       5545.22   2929.66   8474.88   2889.58    126.10   3015.68   3047.51    129.72   3177.23   2956.05    136.97   3093.02
c6288      11662.20   9749.32  21411.52   5479.86    394.28   5874.14   5806.66    397.43   6204.09   5792.84    362.09   6154.93
c7552      10836.80   5781.22  16618.02   5133.74    234.35   5368.09   5357.89    213.67   5571.56   5170.33    204.72   5375.05
Total      52422.06  26632.20  79054.26  24478.58   1100.84  25579.42  26293.97   1144.05  27438.02  25475.42   1107.24  26582.67
Savings           0         0         0     53.3%     95.8%     67.6%     49.8%     95.7%     65.3%     51.4%     95.8%     66.3%

Table 3.2: Area savings (total number of fins)

Design    Delay-minimized  Power-optimized (1.08V)  Single Vth  Dual-Vdd
c432                12731                     4221        5012      4834
c499                39533                    14414       18663     16127
c880                21483                     7203        8229      7813
c1355               35994                    12790       15681     13531
c1908               29507                    10580       12920     11810
c3540               45283                    18887       19763     19785
c5315               64807                    23556       27306     26739
c6288              216800                    74387       88535     85316
c7552              125762                    40054       43753     41059
Total              591900                   206092      239862    227014
Savings                 0                    65.2%       59.5%     61.6%
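The "Savings" rows of Tables 3.1 and 3.2 can be recomputed from the "Total" rows. The sketch below does so with a hypothetical helper, `savings_pct`; the dictionary keys are labels chosen here, and the numbers are the totals reported in the tables. The recomputed values agree with the reported ones to within 0.1 percentage point.

```python
def savings_pct(baseline, optimized):
    """Percentage reduction of optimized relative to baseline."""
    return 100.0 * (1.0 - optimized / baseline)


# Totals from Table 3.1 as (dynamic, leakage, total) power.
power_totals = {
    "delay_minimized": (52422.06, 26632.20, 79054.26),
    "tcms": (24478.58, 1100.84, 25579.42),
    "single_vth": (26293.97, 1144.05, 27438.02),
    "dual_vdd": (25475.42, 1107.24, 26582.67),
}

# Totals from Table 3.2 (number of fins).
fin_totals = {"delay_minimized": 591900, "tcms": 206092,
              "single_vth": 239862, "dual_vdd": 227014}

base_power = power_totals["delay_minimized"]
power_savings = {
    scheme: tuple(savings_pct(b, v) for b, v in zip(base_power, values))
    for scheme, values in power_totals.items() if scheme != "delay_minimized"
}
area_savings = {
    scheme: savings_pct(fin_totals["delay_minimized"], fins)
    for scheme, fins in fin_totals.items() if scheme != "delay_minimized"
}

for scheme, (dyn, leak, total) in power_savings.items():
    print(f"{scheme}: dynamic {dyn:.2f}%, leakage {leak:.2f}%, total {total:.2f}%, "
          f"area {area_savings[scheme]:.2f}%")
```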
Figs. 3.11 and 3.12 present the leakage and dynamic power breakdown for delay-minimized
and power-optimized benchmarks, respectively. In the delay-minimized circuits, the leakage power
accounts, on an average, for 34% of the total power. After applying the TCMS scheme, the leakage
power accounts, on an average, for only 5% of the total power, as expected.

Figure 3.11: Power breakdown for delay-minimized circuits
[Bar chart: percentage of total power due to leakage and dynamic power for each ISCAS'85 benchmark]

Figure 3.12: Power breakdown for power-optimized circuits
[Bar chart: percentage of total power due to leakage and dynamic power for each ISCAS'85 benchmark, using the TCMS scheme]

To account for a manufacturing process that allows only a single Vth, not dual Vth, we ran
the experiments again assuming that only Vth values of 0.45V for nFinFETs and 0.40V for pFinFETs
were available. The results are shown in major column 4 in Table 3.1. As expected, the overall
power reduction decreased slightly from 67.6% to 65.3%, since the power optimization algorithm
had less freedom to optimize. However, the negative impact of using a single Vth is marginal.
In general, the dynamic power consumption is slightly higher because the FinFET area (hence
capacitance) is higher at single-Vth. This can be seen from Table 3.2. Although the single-Vth
TCMS scheme employs only high-Vth, the leakage power consumption is marginally higher than in
the dual-Vth TCMS scheme, because of the greater area reduction obtained in the latter. It was also
observed that in the power-optimized netlists obtained using the dual-Vth TCMS scheme, most of
the cells employed high-Vth. This further explains the similar reductions in leakage power achieved
by the two techniques.
Even though TCMS leads to a substantial power reduction, a limitation is the need to lay out
an additional VssH line. This limitation can be addressed by using the double-supply/double-ground
grid suggested in [70]. Another way to address this limitation is to replace VssH with VssL , i.e., just use
one Vss line instead of two. This would decrease the power reduction possible. However, since the
TCMS principle will still be applicable to the pFinFETs in the circuit, the power reduction would
still be appreciable. Therefore, we performed experiments with dual supply voltages VddL, VddH and
a single ground line VssH. We refer to this as the dual-Vdd scheme. The results are shown in major
column 5 in Table 3.1. As expected, the overall power savings decrease slightly from 67.6% to
66.3%. The dynamic power consumption is slightly higher because the fin-count in the dual-Vdd
scheme is higher than the fin-count in the TCMS scheme (see Table 3.2). However, the leakage
power consumption is nearly the same across all the benchmarks. This is because when a low-Vdd gate drives a high-Vdd gate in the TCMS scheme, the gate-to-source voltage difference increases
the leakage current exponentially. This is counteracted by the use of a high-Vth in high-Vdd gates.
However, in the dual-Vdd scheme, there is no gate-to-source voltage difference when a low-Vdd gate
drives a high-Vdd gate and the output of the low-Vdd gate is low, due to the use of a single ground
line. This leads to exponential savings in leakage power consumption of high-Vdd gates for the
above case. However, when a high-Vdd gate drives a low-Vdd gate, there is an exponential amount
of power savings in the low-Vdd gates due to the TCMS principle. In the dual-Vdd scheme, these
power savings can only come from pFinFETs. The two counteracting effects in the dual-Vdd scheme
thus lead to similar power savings to the TCMS scheme.
Next, we consider trends in average power savings across ISCAS85 benchmarks at successively
relaxed ATCs. As expected, the average total power savings increase from 56% to 76% (Fig. 3.13).
This happens because at relaxed ATCs, the linear programming algorithm has more overall slack
to allocate to individual gates. This shows that the proposed TCMS based optimization methodology can effectively utilize the increased slack to reduce power consumption in circuits. We also
performed simulation at 110% ATC to study the effectiveness of the technique at overall low slacks
Figure 3.13: Reduction in power consumption (%) at various ATCs (110% to 190%)

(Fig. 3.13). The average total power savings in this scenario are smaller (56%) because of the reduced
availability of gate slacks. However, the average power reduction is still significant,
making this technique suitable for high-performance applications as well.
The TCMS scheme was also compared with the ECVS methodology. Table 3.3 presents dynamic, leakage and total power and area of the power-optimized netlists using ECVS at 130% ATC.
The leakage power reduces, on an average, by 88.7% and the dynamic power reduces, on an average, by 50.7% when compared to the delay-minimized netlist. The FinFET area reduces by 61.2%,
on an average. We do not consider the delay overhead due to level-converters in this work. Considering this overhead would have reduced the slack available and thus reduced the power savings
obtainable while meeting the delay constraints. Thus, the reported power savings for this scheme
are quite optimistic.
The leakage and dynamic power of ISCAS85 benchmarks (except c7552) reduce by a smaller
amount when ECVS is applied, as compared to the TCMS optimization methodology, in spite of the
fact that the delay overhead of level-converters is not considered. The efficacy of ECVS is the result
of a large replacement of HIGH-Vdd gates with LOW-Vdd gates, as shown in Fig. 3.14. On an average, 86% of the HIGH-Vdd gates are replaced by LOW-Vdd gates. Although the fin-count reduces
Figure 3.14: Constitution of circuits by mode in ECVS circuits (total number of HIGH-Vdd and LOW-Vdd gates, ISCAS'85 benchmarks)

by a smaller margin in ECVS, the dynamic power reduction is comparable to that of TCMS because of
the quadratic reduction in dynamic power provided by the LOW-Vdd gates. However, as pointed out
earlier, if the delay of the level-converters is included in the formulation, substantially fewer HIGH-Vdd gates will be replaced by LOW-Vdd gates. This will result in significantly decreased dynamic
power savings from the ECVS scheme. Note that the leakage power savings of ECVS would also go
down significantly if far fewer HIGH-Vdd gates were replaced by LOW-Vdd gates.
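The quadratic dependence referred to above follows from the standard first-order switching-power relation P = alpha * C * Vdd^2 * f. The following is a minimal sketch of that relation only; the supply voltages used in the example are hypothetical and are not taken from this chapter.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """First-order switching power: P = alpha * C * Vdd^2 * f."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


def dynamic_power_ratio(vdd_low, vdd_high):
    """Dynamic power at LOW-Vdd relative to HIGH-Vdd, all other factors equal."""
    return (vdd_low / vdd_high) ** 2


# Hypothetical supplies, for illustration only (not values from the text).
example_ratio = dynamic_power_ratio(0.8, 1.0)
```

Under these assumed supplies, a gate moved to the lower supply dissipates (0.8/1.0)^2 of its original switching power, before any change in capacitance due to resizing is taken into account.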
Also, we currently perform experiments at 70°C. However, in modern microprocessors, the
operating temperature can be as high as 110°C. Furthermore, FinFETs suffer from increased self-heating because in a FinFET the channel is surrounded by silicon dioxide, which has lower thermal
conductivity compared to bulk silicon [80]. Since leakage power increases exponentially with
temperature, the fraction of leakage power relative to total power will also increase drastically with a rise in temperature. In this case, the power savings obtained using the TCMS scheme will
increase further because of its ability to reduce the leakage power effectively.
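As a rough illustration of this temperature sensitivity, the sketch below evaluates the textbook subthreshold factor exp(-Vth / (n kT/q)) at the two temperatures mentioned. It is a simplification under stated assumptions: the prefactor and Vth are treated as temperature-independent, the ideality factor n = 1.0 is assumed, and the function names are hypothetical. It is not the leakage model used in our simulations.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_celsius):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN_J_PER_K * (temp_celsius + 273.15) / ELECTRON_CHARGE_C


def relative_subthreshold_leakage(vth, temp_celsius, n=1.0):
    """exp(-Vth / (n * kT/q)); leakage up to a prefactor assumed constant."""
    return math.exp(-vth / (n * thermal_voltage(temp_celsius)))


# Vth = 0.45V is the nFinFET high-Vth quoted earlier; n = 1.0 is an assumption.
leakage_ratio = (relative_subthreshold_leakage(0.45, 110.0)
                 / relative_subthreshold_leakage(0.45, 70.0))
```

Even in this simplified form, the ratio is well above one, which is consistent with the qualitative argument that the leakage fraction grows with temperature.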
Table 3.3: Power Savings Using ECVS

                          ECVS scheme
Design     Dynamic      Leakage     Total        Area
           (µW)         (µW)        (µW)         (No. of fins)
c432         417.31       68.12       485.43       4987
c499        4002.47      294.11      4296.58      16567
c880         865.59       86.40       951.99       8173
c1355       3588.80      234.94      3823.74      14437
c1908       1380.27      147.06      1527.33      11758
c3540       1697.14      228.63      1925.77      19347
c5315       3439.03      328.07      3767.10      25840
c6288       5700.75     1055.93      6756.68      85328
c7552       4765.77      545.35      5311.12      42776
Total      25857.13     2988.61     28845.74     229213
Savings       50.7%       88.7%        63.5%       61.2%
3.7 Chapter summary

In this chapter, we proposed a synthesis scheme to reduce the power consumption of FinFET based
circuits. This scheme is based on TCMS, which is able to reduce both delay and subthreshold
current in a logic gate simultaneously. The efficacy of the scheme was demonstrated on a set of
ISCAS85 benchmarks. We also proposed a variant of the TCMS scheme known as the dual-Vdd
scheme, which addresses the problem of power supply layout imposed by the TCMS scheme. The
dual-Vdd scheme also promises high power savings under relaxed delay constraints. In addition, we
showed that switching from a dual-Vth scheme to a single-Vth scheme does not adversely impact
power savings much. Finally, we compared our optimization methodology with the conventional
ECVS scheme and showed that the power savings obtained using the TCMS scheme exceed the
power savings obtained using ECVS.
Chapter 4
Low-power FinFET Circuit Synthesis Using Surface Orientation Optimization
4.1 Introduction
The advantages of channel-oriented FinFETs have been studied before in the realm of SRAMs [81],
but to our knowledge this is the first work that demonstrates the efficacy of oriented logic gates in
low-power circuit synthesis. We propose novel logic gates that employ channel-oriented transistors.
We generate lookup tables to quantify the delay and power of the different kinds of logic gates. We
perform SPICE simulations using the BSIM double-gate model [82, 83]. We constructed three types
of libraries: shorted-gate (SG), oriented shorted-gate (OSG) and oriented low-power (OLP) gate.
The SG library consists of logic gates in which both the gates of the FinFETs have been shorted.
The OSG library contains logic gates whose pull-up network consists of pFinFETs oriented along
the <110> plane, whereas the pull-down network uses nFinFETs in the <100> plane. Electron
mobility is highest in the <100> plane and the hole mobility along the <110> plane. Thus, OSG
gates are faster than SG gates. However, OSG gates incur an area penalty because of the oriented
pFinFETs used in the pull-up network. This is explained in greater detail in Section 4.3. OLP
logic gates have oriented pFinFETs and a reverse voltage bias applied to the back gates of all their
FinFETs in order to increase the effective threshold voltage of the front gate. This allows a leakage-delay tradeoff.
We use a linear programming based optimization methodology to produce power-optimized
netlists at tight delay constraints [84]. The methodology optimizes circuits through an appropriate
choice of logic gates in the gate library. It takes into account the tradeoff offered by oriented gates.
Experimental results demonstrate that there is an average power saving of 74% at 30% slack. A
place-and-route tool is used to get the area occupied by the power-optimized netlists. These are
based on cell layouts drawn using Magic [85]. On an average, the area of the power-optimized
netlists reduces by 63.0% at 30% slack.
The remainder of the chapter is organized as follows. In Section 4.2, we discuss FinFET device
characteristics and thereafter, in Section 4.3, we discuss the design and utility of different kinds of
logic gates. In Section 4.4, we discuss the power optimization methodology. In Section 4.5, we
present experimental results and thereafter conclude in Section 4.6.
4.2 FinFET device simulation

In this section, the FinFET device is described and thereafter the effect of channel orientation on
FinFETs is discussed. The parameters that significantly affect the on-current and off-current of a FinFET are identified and tuned to match the ITRS-predicted on-current and off-current.
The performance benefits of a channel-oriented FinFET are explored. The performance and power
characteristics of independent-gate (IG) FinFETs are evaluated. The impact of back-gate reverse
bias in a FinFET is studied in the context of both subthreshold current and performance. In addition,
the optimal reverse bias for IG FinFETs is evaluated through a series of BSIM simulations.
4.2.1 FinFET device parameters
The FinFET device consists of a thin silicon body, whose thickness is denoted TSI, wrapped around
by gate electrodes. The effective gate width of a FinFET is 2nHFin, where n is the number of fins
and HFin is the fin height. The fin-pitch (p) is the minimum pitch between adjacent fins allowed
by lithography at a particular technology node. Table 4.1 shows the symmetric-gate FinFET device
parameters used in our simulations for the 32nm FinFET technology. The parameters that have
a drastic effect on the leakage power of FinFETs are the gate-oxide thickness (TOX), TSI and the
effective channel length (Lch). The lateral doping profile in the source/drain region defines Lch. We
experimented with a number of values for the above three parameters to obtain those matching the
ITRS 32nm logic technology node on- and off-current specifications. These values are shown in
Table 4.1. The supply voltage (Vdd) was chosen to be 1V. The body was intrinsically doped with
NBODY = 10^15 cm^-3. Lphys is the physical length of the channel.

Table 4.1: FinFET parameters

Parameter    Value
Lphys        32nm
Lch          18nm
TOX          1.0nm
TSI          10nm
NBODY        10^15 cm^-3
p            16nm
HFin         40nm
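The effective gate width defined above can be computed directly from the fin count and the fin height. The sketch below assumes the fin height of 40nm listed in Table 4.1; the function name is hypothetical.

```python
FIN_HEIGHT_NM = 40  # HFin from Table 4.1


def effective_gate_width_nm(num_fins, fin_height_nm=FIN_HEIGHT_NM):
    """Effective gate width W = 2 * n * HFin (both sidewalls of each fin conduct)."""
    if num_fins < 1:
        raise ValueError("a FinFET needs at least one fin")
    return 2 * num_fins * fin_height_nm


single_fin_width = effective_gate_width_nm(1)
four_fin_width = effective_gate_width_nm(4)
```

Because the width is quantized in units of 2HFin, sizing a FinFET gate amounts to choosing an integer fin count rather than a continuous width.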
4.2.2 Channel orientation effects

FinFETs can be easily fabricated outside the conventional <100> plane. When non-<100> orientations are used, the electron and hole mobilities are modified due to the asymmetry of the carrier
effective masses in the silicon crystal lattice [9]. This property can be exploited to design faster
logic gates with differently-oriented transistors in the pull-up and pull-down networks.
To quantify the delay of the variously oriented transistors, we performed simulations using
BSIM. Fig. 4.1 shows the variation in pFinFET (nFinFET) drain-to-source current (Ids) with drain-to-source voltage (Vds) for different channel orientations. When the orientation changes from
<100> to <110>, the saturation current for pFinFETs increases by around 18%, whereas for
nFinFETs, when the orientation changes from <110> to <100>, it increases by 12%. There is
a larger increase in the pFinFET current drive when the channel orientation changes because of the
smaller dependence of the hole mobility on velocity saturation. The change in carrier mobility due
to transistor orientation is diminished by the velocity saturation effect [9].
4.2.3 Optimal reverse bias
We next discuss how the best back-gate reverse bias can be derived. Fig. 4.2 shows the BSIM-simulated DC transfer characteristics for a 32nm nFinFET implemented in the <100> plane. The
Figure 4.1: BSIM-simulated Ids vs. Vds characteristics for different orientations
drain voltage is set to 1V. The front gate-to-source voltage (Vgfs) is varied from 0V to 1V. The
transfer characteristics are shown for various back-gate biases (Vgbs). The top curve corresponds to
the OSG mode and the bottom four curves correspond to the OLP modes of operation, as indicated.
There is a noticeable difference in the Ion and Ioff currents for the different modes. Ion for the
OSG mode is about 73% greater than the Ion for the OLP mode (Vgbs = 0.2V). However, the
subthreshold current decreases by an order of magnitude in the OLP mode as compared to the OSG
mode. It can be seen that in the OLP mode, the leakage current decreases exponentially with an
increase in reverse bias. The percent decrease in Ion with an increase in reverse bias is marginal.
Beyond a certain point, a further increase in reverse bias results in a very marginal decrease in
leakage current.
The above discussion indicates that it is important to quantify the variation of leakage current
with transistor delay. Fig. 4.3 shows the delay and leakage current for an OLP-mode inverter at
various back-gate reverse bias magnitudes, ranging from 0 to 0.4V . It can be seen that the leakage
current is strongly dependent on the back gate bias. On the other hand, the degradation in delay
Figure 4.2: BSIM-simulated DC transfer characteristics for a 32nm FinFET (Ids vs. Vgfs at Vds = 1.0V; curves for the OSG mode and for Vgbs = 0V, 0.1V, 0.2V and 0.3V)

is gradual. However, there is a knee point beyond which the leakage current graph flattens out.
Thus, if the reverse bias is increased further, the reduction in leakage current is minimal whereas
the degradation in delay is noticeable. Therefore, for our experiments, we chose the reverse bias at
the knee point: 0.2V for nFinFETs and 1.18V for pFinFETs. The back gate bias was adjusted for
pFinFETs to balance the rising and falling delays.
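The knee-point selection described above can be sketched as a simple rule: pick the smallest reverse bias beyond which one further step buys less than a chosen fractional reduction in leakage. The sample points below are hypothetical and are not read from Fig. 4.3; the 5% cutoff and the function name are likewise assumptions.

```python
def knee_bias(biases, leakages, min_relative_gain=0.05):
    """Return the first bias whose next step reduces leakage by less than
    min_relative_gain (as a fraction of the current leakage)."""
    if len(biases) != len(leakages) or len(biases) < 2:
        raise ValueError("need at least two matching (bias, leakage) samples")
    for i in range(len(biases) - 1):
        gain = (leakages[i] - leakages[i + 1]) / leakages[i]
        if gain < min_relative_gain:
            return biases[i]
    return biases[-1]  # leakage is still improving at the largest bias sampled


# Hypothetical sweep (reverse-bias magnitude in V, leakage in nW).
sample_biases = [0.0, 0.1, 0.2, 0.3, 0.4]
sample_leakage = [12.0, 9.5, 8.2, 8.0, 7.9]
chosen = knee_bias(sample_biases, sample_leakage)
```

A delay-aware variant would additionally reject biases whose delay degradation exceeds a budget; this sketch considers the leakage curve only.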
4.3 Library design
In this section, we study the performance and power characteristics of FinFET logic gates in various
channel orientations. We show that optimally channel-oriented logic gates are considerably faster
than the corresponding logic gates that have all FinFETs in one plane. We discuss the design of the
different kinds of cell libraries.
Figure 4.3: Optimal back gate bias (delay in ps and leakage power in nW vs. Vbgs)
4.3.1 Logic design using surface orientation optimization
We next discuss the various issues involved in the design of logic circuits in which the FinFETs have
various surface orientations. We simulate the percent decrease in delay (w.r.t. <100>-oriented
gates) for minimum-sized INV, NAND and NOR gates in the <110> orientation and in the
optimized orientation where the pull-up network is in the <110> plane and the pull-down network
is in the <100> plane. The delay is the average of the rising and falling delays. For the <110> case,
all the FinFETs in the logic gates are in the <110> plane. The reduction in delay when the inverter
is switched from the <100> plane to the optimized orientation is 8%, whereas the reduction in
delay for the NAND (NOR) gate is 10% (14%). The reduction in delay for a NOR gate is maximum
because the improvement in hole mobility has a maximal effect on stacked pFinFETs. There is
also a reduction in delay when we move to the <110> orientation, despite degradation in electron
mobility, because the increased hole mobility in this plane reduces the rising time delay. However,
as expected, the delay reduction in the <110> orientation is smaller as compared to the optimal
configuration because of the degraded electron mobility in the <110> plane. The delay reduction
in the <110> plane for the INV is 7%, while the delay reduction for the NAND (NOR) gate is 7%
(10%).
4.3.2 Library characterization and area effects
In this section, we describe how the three cell libraries (SG, OSG and OLP) are obtained. All
three libraries contain INV, NAND and NOR gates. Each type of logic gate is implemented in five
different sizes: X1, X2, X4, X8 and X16, where the number indicates its size relative to a minimum-sized gate. BSIM is used to characterize the libraries by simulating delay, leakage and short-circuit
power consumption of each constituent cell. The interconnect delay and load are modeled using the
fanout and size-dependent wire load models presented in [77].
We also laid out all the cells in the SG, OSG and OLP libraries to get an accurate estimate
of the area occupied by these cells. The standard cell layouts and a gate-level netlist are given as
input to the place-and-route tool, which provides the layout of the circuit. The area occupied by
the SG-mode NAND gate is 7504λ², whereas the area occupied by the OSG-mode NAND gate is
9040λ². The OSG-mode gate occupies a larger area because of the presence of oriented pFinFETs
in the OSG mode. The pFinFETs are in the <110> plane whereas the wafer is in the <100> plane.
Hence, the layout of pFinFETs is at an angle of 45° with respect to the nFinFETs. The tilted layout
results in an increase in area. Spacer lithography allows the minimum fin-pitch to be half of the
lithography pitch. Thus, to be conservative, a distance of λ is assumed between adjacent fins in the
OSG and SG-mode layouts. In the OLP-mode NAND gate, the fin-pitch needs to be increased to
10λ. This is because the back gate of its FinFETs needs to be reverse biased and, hence, we need to
place a poly-to-metal contact between adjacent fins. Due to the large distance between the adjacent
fins in an OLP-mode NAND gate, its area is considerably larger than that of the SG-mode NAND gate.
The area of the OLP-mode NAND gate is 13905λ². The heights of all cells are kept the same to
enable standard cell design.
4.4 Power optimization methodology

In this section, we discuss our power optimization methodology. The methodology starts with a
delay-optimized netlist consisting of SG-mode gates, obtained using Synopsys Design Compiler.
Thereafter, we use an extension of a linear programming based gate sizing algorithm [78] to select
cells from different libraries to generate power-optimized netlists. Finally, we use a place-and-route tool to obtain an accurate area estimate of the power-optimized netlists. Next, we discuss the
optimization flow and the linear programming framework.
4.4.1 Optimization flow
The power optimization flow is as shown in Fig. 4.4. It starts by mapping a Verilog netlist to
SG-mode gates. Then its delay-minimized configuration is obtained. Thereafter, to evaluate the
utility of channel-oriented transistors, an iterative linear programming based algorithm is used to
map gates to cells of appropriate sizes and modes. The linear programming formulation can be
used to reduce both circuit delay and power consumption iteratively. The iteration terminates when
all the delay constraints are met and the reduction in power between successive iterations is less
than some prefixed fraction. Finally, the area of the power-optimized netlist is obtained through a
place-and-route tool.
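The termination rule of this flow can be sketched as a generic loop. The sketch below is only an outline under stated assumptions: `lp_iteration` is a hypothetical callback standing in for one pass of the linear programming step (it returns the new total power and whether all delay constraints are met), and `eps` is the pre-specified fraction; neither name comes from our implementation.

```python
from typing import Callable, Tuple


def run_until_converged(initial_power: float,
                        lp_iteration: Callable[[float], Tuple[float, bool]],
                        eps: float = 0.01,
                        max_iters: int = 100) -> Tuple[float, int]:
    """Repeat the optimization step until the delay constraints are met and
    the fractional power reduction of the last step is below eps."""
    power = initial_power
    for iteration in range(1, max_iters + 1):
        new_power, timing_met = lp_iteration(power)
        reduction = (power - new_power) / power
        power = new_power
        if timing_met and reduction < eps:
            return power, iteration
    return power, max_iters


def toy_step(power: float) -> Tuple[float, bool]:
    """Toy stand-in: halve the gap to a floor of 30 units; timing always met."""
    return 30.0 + 0.5 * (power - 30.0), True


final_power, iterations = run_until_converged(100.0, toy_step)
```

The toy step only exercises the stopping condition; in the actual flow each iteration re-solves the linear program on the current netlist.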
4.4.2 Linear programming framework
The linear programming algorithm can size multiple logic gates at once and, hence, it avoids greedily sizing individual gates. Furthermore, while sizing a logic gate, the algorithm takes into account
how sizing this gate affects other gates on the same circuit path. The algorithm iteratively improves
the design. It selects one of various candidate cells available for each logic gate based on its power-delay sensitivity ratio. Let ΔP represent the change in power and ΔD represent the change in delay
if an alternative cell is chosen. The ratio ΔP/ΔD is the power-delay sensitivity ratio. If the objective is
to reduce power, then the cell with the maximum power reduction for a given delay increase is chosen (min ΔP/ΔD, ΔP < 0, ΔD > 0). On the other hand, to reduce delay, the cell with the maximum
delay decrease for a given power increase is chosen (max ΔP/ΔD, ΔP > 0, ΔD < 0). The algorithm
determines the best alternative cells for each logic gate in the circuit and then formulates a linear
Figure 4.4: Optimization flow (Verilog netlist and SG library -> delay-minimized netlist by Design Compiler -> linear programming formulation with the OSG/OLP/SG library, repeated until the delay constraints are met and the power reduction falls below a threshold -> power-optimized netlist -> place-and-route tool with the OSG/OLP/SG layout library)
program. The solution obtained from this program indicates which logic gates are to
be replaced with their alternatives. The rest of the algorithmic formulation is the same as the one
presented in Section 3.5.3.
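The candidate selection by power-delay sensitivity ratio described above can be sketched as follows. This covers only the per-gate choice of the best alternative cell, not the linear program that follows it; the cell names and the ΔP/ΔD values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    name: str
    d_power: float  # change in power if this cell replaces the current one
    d_delay: float  # change in delay if this cell replaces the current one


def best_for_power(cands: Iterable[Candidate]) -> Optional[Candidate]:
    """min dP/dD over cells with dP < 0 and dD > 0 (largest saving per unit delay)."""
    feasible = [c for c in cands if c.d_power < 0 and c.d_delay > 0]
    return min(feasible, key=lambda c: c.d_power / c.d_delay, default=None)


def best_for_delay(cands: Iterable[Candidate]) -> Optional[Candidate]:
    """max dP/dD over cells with dP > 0 and dD < 0 (smallest cost per unit speedup)."""
    feasible = [c for c in cands if c.d_power > 0 and c.d_delay < 0]
    return max(feasible, key=lambda c: c.d_power / c.d_delay, default=None)


# Hypothetical alternatives for one gate.
alternatives = [
    Candidate("OLP_X2", -5.0, 2.0),
    Candidate("SG_X1", -3.0, 3.0),
    Candidate("OSG_X4", 4.0, -2.0),
    Candidate("SG_X8", 6.0, -1.0),
]
power_choice = best_for_power(alternatives)
delay_choice = best_for_delay(alternatives)
```

Both ratios are negative in their respective feasible sets, which is why the power objective takes the minimum and the delay objective takes the maximum.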
4.5 Experimental results

We present power optimization results in this section. The power-optimized netlists are synthesized at 130% arrival time constraint (ATC), i.e., 30% slack with respect to the delay-minimized
netlists. The cell libraries were characterized using HSPICE based on BSIM multiple-gate models. The switching activity at all the input nodes was set to 0.1. The operating temperature was
fixed at 70°C [51]. FinFETs suffer from self-heating effects, which explain the high operating temperature. The switching activity was propagated through the entire circuit using Synopsys Design
Compiler. The circuit was initially mapped to SG-mode gates and a delay-minimized configuration was obtained. Thereafter, the optimization methodology was applied to the netlists to obtain
power-optimized netlists at 130% ATC.
Table 4.2 shows the power saving results obtained by applying the optimization scheme to ISCAS85 benchmarks. Major column I gives the total power as well as its leakage and dynamic
components at minimum delay. Major column II gives the power savings when only the SG-mode
library is used. An average of 69.6% in power savings is obtained. Dynamic power reduces by
61.6% and leakage power by 79.3%. Major column III gives the power savings when the OSG-mode and OLP-mode libraries are used with the SG-mode library. It can be seen that the reduction
in dynamic power consumption is roughly the same as that in the SG-mode case. On an average, the
OLP-mode gate is 38% slower than an SG-mode gate. Thus, a sized-up OLP-mode gate is required
to replace an SG-mode gate with the same delay. On the other hand, on an average, an OSG-mode
gate is 10% faster than an SG-mode gate. Thus, a sized-down OSG-mode gate can replace an SG-mode gate.
Consequently, the increase in area caused by OLP-mode gates
is offset by a decrease in area offered by OSG-mode gates, resulting in similar dynamic power
savings. On the other hand, the leakage power saving in this case is 90.0%. The higher leakage
power savings can be attributed to the OLP-mode gates. The total power savings is 13.6% w.r.t. the
power-optimized SG-mode case.
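The relative savings quoted here follow from the total-power row of Table 4.2. The sketch below recomputes them from those totals; the helper name is hypothetical, and the units cancel in the ratio.

```python
def savings_pct(baseline, optimized):
    """Percentage reduction of `optimized` relative to `baseline`."""
    return 100.0 * (1.0 - optimized / baseline)


# Total-power entries from the "Total" row of Table 4.2.
TOTAL_DELAY_MINIMIZED = 256448.64
TOTAL_SG_MODE = 77793.62
TOTAL_SG_OSG_OLP = 67241.95

sg_vs_baseline = savings_pct(TOTAL_DELAY_MINIMIZED, TOTAL_SG_MODE)
mixed_vs_baseline = savings_pct(TOTAL_DELAY_MINIMIZED, TOTAL_SG_OSG_OLP)
mixed_vs_sg = savings_pct(TOTAL_SG_MODE, TOTAL_SG_OSG_OLP)
```

The last quantity is the additional saving of the SG/OSG/OLP-mode netlists over the SG-mode-only netlists, i.e., the figure quoted w.r.t. the power-optimized SG-mode case.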
Table 4.2: Power savings using oriented FinFETs

                                              Power consumption (µW)
           Delay-minimized                      SG-mode                            SG/OSG/OLP-mode
Design     Dynamic     Leakage     Total        Dynamic    Leakage    Total        Dynamic    Leakage    Total
c17           837.28      172.70     1009.98      304.65      28.49     333.14       304.66      15.00     315.00
c432         2190.22     1817.45     4007.67      886.45     383.12    1269.58       917.50     249.89    1167.39
c499        34067.60    12925.80    46993.40    11360.6     2381.63   13742.23     11723.60    1561.45   13285.05
c880         6401.05     6641.98    13043.03     2770.56    1488.68    4259.24      2790.56     733.37    3523.93
c1908       11362.70     9130.46    20493.16     3948.41    1779.68    5728.09      4076.27     902.03    4978.28
c3540       14058.50    19844.40    33902.90     5846.64    4653.07   10499.71      5940.24    2330.60    8270.84
c5315       30243.20    26174.20    56417.40    12296.90    5658.04   17954.94     12372.10    2774.17   15146.27
c7552       41568.80    39012.30    80581.10    16522.20    7484.49   24006.69     16530.08    4024.39   20555.19
Total      140729.35   115719.29   256448.64    53936.41   23857.21   77793.62     54651.05   12590.90   67241.95
Savings            0           0           0       61.6%      79.3%      69.6%        61.2%      90.0%      73.8%
Table 4.3: Accurate area estimates

                                                   Total area
           Delay-minimized                  SG-mode                          SG/OSG/OLP-mode
Design     X-span   Y-span   Area           X-span   Y-span   Area           X-span   Y-span   Area
           (λ)      (λ)      (λ²)           (λ)      (λ)      (λ²)           (λ)      (λ)      (λ²)
c17          1225     1162      1423450       1283     1282     1644806        480      830       398400
c432         4852     5146     24968392       2700     2822     7619400       2856     3154      9007824
c499        10933    11122    121596826       5476     5810    31815560       5955     6142     36575610
c880         8063     8466     68261358       4594     4814    22115516       5048     5478     27652944
c1908        9335     9794     91426990       4917     5146    25302882       5786     6142     35537612
c3540       13705    14110    193377550       7796     8134    63412664       9130     9462     86388060
c5315       15786    16102    254186172       8749     9130    79878370       9739     9794     95383766
c7552       19154    19422    372008988      10206    10126   103345956      11150    11454    127712100
Total       83053    85324   1127249726      45721    47264   335135154      50144    52456     41865316
Savings         0        0            0      44.9%    44.6%       70.3%      39.6%    38.5%        62.9%
Table 4.3 gives accurate area estimates of the delay-minimized as well as power-optimized
netlists. A place-and-route tool [86] is used to find the area of the netlists. This tool takes the circuit
netlist and the cell layouts as inputs and provides an accurate area estimate of the netlist. Major
column I gives the length, width and the area of the delay-minimized netlists. X-span (Y-span)
denotes the length (width) of the layout in λ. Major column II gives the area estimates of the power-optimized SG-mode netlists at 130% ATC. On an average, the length (width) of the circuit layout
reduces by 44.9% (44.6%). The area reduces by 70.3%. Major column III gives the area estimates
of the power-optimized netlists comprising SG/OSG/OLP-mode gates at 130% ATC. The total area
of the power-optimized netlists reduces by only 62.9% because of the oriented transistors.
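The entries of Table 4.3 are related in a simple way: each area is the product of the X-span and the Y-span, and the savings compare the summed areas. The sketch below checks this on a few entries taken from the table; the helper names are hypothetical.

```python
def layout_area(x_span, y_span):
    """Bounding-box area of a placed-and-routed layout."""
    return x_span * y_span


def area_savings_pct(baseline_area, optimized_area):
    """Percentage area reduction relative to the delay-minimized layout."""
    return 100.0 * (1.0 - optimized_area / baseline_area)


# c17 and c432, delay-minimized columns of Table 4.3.
c17_area = layout_area(1225, 1162)
c432_area = layout_area(4852, 5146)

# "Total" row of Table 4.3: delay-minimized vs. SG-mode area.
sg_area_savings = area_savings_pct(1127249726, 335135154)
```

Note that the span savings and the area savings are not additive, since the area scales with the product of the two spans.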
4.6 Chapter summary

In this chapter, we proposed a power optimization methodology based on the channel orientation
of the FinFETs. We designed novel cell libraries for cells utilizing such FinFETs. Thereafter, we
presented a FinFET cell library based circuit synthesis scheme. Such a scheme was not possible
in bulk CMOS due to the difficulty of fabricating transistors along the <110> plane. The efficacy
of the scheme was demonstrated with the help of ISCAS85 benchmarks. It was shown that significant power savings can be obtained at a relaxed delay constraint by using the suggested power
optimization methodology. We also developed standard cell FinFET libraries, which were used by
a place-and-route tool to give accurate area estimates of the logic circuits.
Chapter 5
Die-level Leakage Power Analysis of FinFET Circuits Considering Process Variations
5.1 Introduction
FinFETs are generally patterned using direct lithography or spacer lithography [87]. Owing to the
small dimensions involved and factors such as line edge roughness, both techniques can result in
chip-wide variations in fin thickness and gate length, which can degrade the power-performance
metrics of FinFET circuits. Of particular interest is the chip-scale leakage power distribution in
FinFET circuits synthesized using standard cell libraries. Here, an open question that begs attention
is what process variation-aware circuit synthesis strategies need to be adopted in moving from
planar CMOS standard cell design to FinFET standard cell design (in view of the diversity of standard cell libraries feasible using FinFETs). To our knowledge, the current work is the first attempt
at addressing the above design problem along the dimension of leakage tradeoffs at iso-delay, under
process variations, considering the effect of spatial correlations. It should be noted that modeling
variations in delay is not an objective of the current work and merits a separate investigation in
leakage vs. delay tradeoffs.
The major contributions of this chapter can be summarized as follows [88]:
• We develop leakage current macromodels for SG- and IG-mode FinFET devices, which are
  extracted from mixed-mode device simulations in Sentaurus TCAD.
• We extend the above to stacked devices in SG-, LP- and mixed-terminal (MT)-mode [89] NAND/NOR
  gates to obtain input vector dependent macromodels that can be used in FinFET circuit synthesis. Furthermore, we verify the distributions predicted by the macromodel with quasi-Monte
  Carlo (QMC) mixed-mode device simulations of NAND/NOR/INV gates.
• We implement a Latin hypercube sampling based methodology to capture leakage current
  variations under spatial correlations in ISCAS'85 benchmarks synthesized using FinFET
  standard cell libraries.
• We examine the leakage yield tradeoffs offered by substituting LP- and MT-mode gates in a
  100% SG-mode circuit at iso-delay.
• We also show that by replacing an optimal percentage of SG-mode gates with LP- and MT-mode gates in a pure SG-mode circuit, with a reasonable delay slack, the mean and spread in
  leakage can be reduced dramatically.
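For readers unfamiliar with Latin hypercube sampling, the basic idea can be sketched in a few lines. This is a generic, uncorrelated version using only the Python standard library; it does not model the spatial correlations handled by the methodology in this chapter, and the standard deviation used in the example is a hypothetical value.

```python
import random
from statistics import NormalDist
from typing import List


def latin_hypercube(n_samples: int, n_dims: int, rng: random.Random) -> List[List[float]]:
    """Return n_samples points in [0, 1)^n_dims such that, in every dimension,
    each of the n_samples equal-width strata contains exactly one point."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each stratum, then shuffle the strata order.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [[columns[d][s] for d in range(n_dims)] for s in range(n_samples)]


def to_normal(u: float, mean: float, sigma: float) -> float:
    """Map a uniform sample to a Gaussian parameter value via the inverse CDF."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep strictly inside (0, 1)
    return NormalDist(mean, sigma).inv_cdf(u)


points = latin_hypercube(8, 2, random.Random(0))
# Example mapping: gate length with a 25nm mean and a hypothetical 1nm sigma.
gate_lengths_nm = [to_normal(p[0], 25.0, 1.0) for p in points]
```

Compared with plain Monte Carlo, the stratification guarantees that the tails and the center of each parameter distribution are all represented even with few samples.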
The rest of the chapter is organized as follows. In Section 5.2, we review the background work.
In Section 5.3, we describe the setup used to simulate n/pFinFETs in FinE [7]. In Section 5.4, we
formulate leakage current macromodels and validate their distributions for various FinFET standard
cells. In Section 5.5, we describe the simulation flow and methodology used to obtain the leakage distribution of FinFET circuits. In Section 5.6, we present the experimental results and future
directions for synthesis strategies using FinFET standard cells. We conclude in Section 5.7.
5.2 Background work
In the past few years, FinFET research has gained a lot of traction amongst device and process
engineers as well as circuit designers. Logic styles leveraging the SG and IG modes of FinFET
operation have been explored in [65, 90, 91]. Power optimization in FinFET circuits has been
explored in [6, 60, 22, 56] using techniques like genetic algorithms/linear programming for gate
sizing, and multiple supply and threshold voltages.
Though FinFET circuit design and synthesis have attracted significant attention, few researchers
have explored the impact of process variations in FinFET devices and its effect at the circuit level.
In [53], engineering the workfunction of gate materials is shown to be effective in controlling Vth
under variations. Further, the sensitivity of the electrical parameters of the device to several important physical fluctuations, such as gate length, fin thickness and gate dielectric thickness is analyzed. Quantum effects are also shown to have a significant impact on FinFET device performance.
In [92, 50], a statistical estimation of leakage in SG-mode FinFET devices is performed under variations. The effect of process variations on device temperature in FinFET circuits is studied in [51],
where a Monte Carlo (MC) simulation based methodology using thermal models is used to solve
the temperature and leakage power self-consistently. Leakage current variability due to process
variation has been extensively studied in conventional CMOS. Models evaluating full-chip leakage
distributions under spatial correlation are presented in [93, 47, 94].
In this work, we perform die-level leakage analysis under process variations for FinFET circuits,
with the goal of leveraging the tradeoffs specific to FinFET standard cells during circuit synthesis.
In the next section, we deal with the simulation setup used to obtain various characteristics of
individual FinFET devices and logic gates.
5.3 Simulation setup

In this section, we briefly describe our simulation setup. Owing to the absence of a suitable platform
for double-gate circuit design exploration, we used FinE [7], an environment that integrates double-gate compact models like Spice3-UFDG [95], BSIM-CMG/IMG [82] and a device simulator like
Sentaurus TCAD [96] into a single framework, thereby enabling designers to perform high-level
simulation experiments with ease (Fig. 5.1). FinE partially automates mixed-mode device simulations in TCAD and circuit-level simulations with compact models, and initiates process variations,
parameter extraction as well as postprocessing functions.
Table 5.1 shows the parameters for a typical n/pFinFET device, where LGF, LGB, TOXF, TOXB, TSI, HFin, HGF, HGB, LSPF, LSPB, LUN, NBODY, ΦG, NSD and VDD are the front and back physical gate lengths, front and back gate oxide thicknesses, fin thickness, fin height, front and back gate thicknesses, front and back gate spacer thicknesses, gate-drain/source underlap, body doping, gate workfunction, source/drain doping and the operating voltage, respectively.
[Figure 5.1 (block diagram): compact model (Spice3-UFDG), QuasiMC process variation module, LTSpice netlist extraction, Sentaurus TCAD mixed-mode device simulation, MATLAB GUI, parameter extraction module, MATLAB postprocessing]
Figure 5.1: FinE simulation framework for double-gate circuit design space exploration [7]
Table 5.1: FinFET device parameters
PARAMETER            VALUE
LGF, LGB (nm)        25
TOXF, TOXB (nm)      1
TSI (nm)             10
HFin (nm)            50
HGF, HGB (nm)        20
LSPF, LSPB (nm)      20
LUN (nm)             10
NBODY (cm^-3)        10^15
ΦG (eV)              nFinFET: 4.4, pFinFET: 4.8
NSD (cm^-3)          10^20
VDD (V)              1
Fig. 5.2 shows the two-dimensional (X-Y) FinFET cross-section of the 3D device structure that
was simulated in TCAD. The heavily doped extended source and extended drain regions (HCON × LCON)
aid in forming contacts to the device. They lead into the source/drain regions in the fin
where the dopant concentration gradually decreases, progressing towards the relatively undoped
body region, causing an overlap (LOV) or underlap (LUN). The underlap (LUN = LUN,SOURCE = LUN,DRAIN)
is defined as the distance from the physical gate edge to the point where source/drain
doping starts decreasing from its peak value. The Vth of FinFETs is typically tuned by directly
adjusting the workfunction of the gate material. The workfunctions for the nFinFET (ΦG = 4.4 eV)
and pFinFET (ΦG = 4.8 eV) are chosen corresponding to high-performance logic requirements.
[Figure 5.2 (device cross-section), labeled dimensions: front gate (GF), back gate (GB), HGF, HGB, LSPF, LSPB, TOXF, TOXB, TSI, HCON, LCON, LUN (LOV)]
Figure 5.2: Two dimensional (X-Y) cross-section of an nFinFET simulated in Sentaurus TCAD
In order to model the effect of process variations, we have incorporated a quasi-Monte Carlo (QMC) tool [97] based on Sobol's sequence in FinE (with 2000 samples) to avoid the sample clustering problem encountered in MC simulation. QMC methods based on low discrepancy sequences have been known to
produce samples that cover the sample space homogeneously, leading to quicker convergence with
fewer samples. Using the above setup, in the following section, we extract simple leakage current macromodels for SG/IG-mode FinFET devices and individual SG/LP/MT-mode logic gates.
We also verify their distributions with QMC sampling described above, in order to obtain reliable
models that can be utilized in circuit synthesis under process variations.
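The sampling idea above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a Halton sequence (a different low-discrepancy sequence, swapped in for Sobol's sequence so that the sketch needs only the Python standard library), and the function names are hypothetical rather than part of FinE. The nominal values and the 3σ/μ = 10% spread follow the values used in this chapter.

```python
from statistics import NormalDist, fmean

def van_der_corput(index: int, base: int) -> float:
    """Radical inverse of a positive integer in the given base; lies in (0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def halton_2d(n: int):
    """First n points of the 2-D Halton sequence (bases 2 and 3), starting at index 1."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(1, n + 1)]

def sample_lg_tsi(n: int, mu_lg=25.0, mu_tsi=10.0, three_sigma_over_mu=0.10):
    """Map low-discrepancy points to independent normal (LG, TSI) pairs, in nm."""
    lg_dist = NormalDist(mu_lg, three_sigma_over_mu * mu_lg / 3.0)
    tsi_dist = NormalDist(mu_tsi, three_sigma_over_mu * mu_tsi / 3.0)
    return [(lg_dist.inv_cdf(u), tsi_dist.inv_cdf(v)) for u, v in halton_2d(n)]

samples = sample_lg_tsi(2000)  # 2000 samples, as in the setup above
mean_lg = fmean(lg for lg, _ in samples)
mean_tsi = fmean(tsi for _, tsi in samples)
print(mean_lg, mean_tsi)
```

Because the points fill the unit square evenly, the sample means land close to the nominal 25 nm and 10 nm without the clustering that a pseudo-random draw of the same size can show.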
5.4 Modeling leakage in FinFET logic gates

Leakage current in double-gate transistors and transistor stacks under the effect of process variations
has been examined in detail in [92]. However, the models developed are not suitable for extracting
leakage distributions of FinFET circuits with many gates and are restricted to the SG-mode of
operation. In this section, we develop macromodels that draw inspiration from the physical models
in [92, 98].
5.4.1 Leakage in a single SG/IG FinFET device
We model leakage (ILEAK) as sub-threshold leakage, ignoring the negligible contributions from gate
leakage (due to the undoped body) and gate-induced drain leakage (due to the choice of LUN). We
identified the two main physical parameters that affect ILEAK using QMC simulation. Fig. 5.3
shows the ILEAK distribution for an nFinFET with inputs LUN, TOX, LG and TSI individually
varying normally, such that 3σ/μ = 10%. The spread in leakage is more pronounced in the LG and
TSI cases than in the TOX and LUN cases. LG and TSI are primarily subject to lithographic variations, whereas TOX, LUN and ΦG depend on thermal effects of processing, which are controllable
[92]. Hence, we focus on LG and TSI as the primary physical parameters determining leakage.
[Figure 5.3 (histograms): x-axis Log(ILEAK / 1 A); y-axis probability of occurrence; one curve per varied parameter]
Figure 5.3: ILEAK spreads for LUN, TOX, LG and TSI, each varying independently
In [98], the Poisson and carrier continuity equations are solved without the charge sheet approximation to correctly predict volume inversion in a double-gate MOSFET, and ILEAK translates to
\[ I_{LEAK} = \mu \, \frac{H_{Fin}}{L_G} \, k_B T \, n_i \, T_{SI} \; e^{\left[\frac{q (V_{GS} - \phi_{ms})}{k_B T}\right]} \left(1 - e^{\left[-\frac{q V_{DS}}{k_B T}\right]}\right) \tag{5.1} \]
where μ, kBT and ni are the mobility, thermal energy, and intrinsic concentration, respectively, and φms
is the difference in Fermi levels between the metal gate and semiconductor. Here, ILEAK ∝ TSI and
is relatively independent of TOX (to the first order, ignoring gate leakage). However, for FinFETs
under the short-channel regime with low TSI and LG, this is inaccurate as it fails to account for the
short-channel effect (SCE) and quantum confinement effect. ILEAK should then be obtained from
the general expression for sub-threshold leakage [92]:
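\[ I_{LEAK} = \frac{\mu \, H_{Fin} \, k_B T \left(1 - e^{\left[-\frac{q V_{DS}}{k_B T}\right]}\right)}{\displaystyle \int_{0}^{L_G} \frac{dy}{\int_{-T_{SI}/2}^{T_{SI}/2} n_c(x,y)\,dx}} \tag{5.2} \]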
where nc (x, y) is the effective channel concentration. In [92], using a Taylor series expansion of
log(nc (x, y)), an analytical model is developed for leakage in individual transistors and transistor
stacks. The model correctly predicts an exponential loss in gate control over the channel with
increasing TSI /decreasing LG , and hence an exponential increase in ILEAK . However, using the
above approach to extract leakage distributions for large FinFET circuits is infeasible. Inspired by
the above observations, we formulate a macromodel for leakage in SG-mode FinFETs as
\[ I_{LEAK} = I_{SG0} \; e^{\left[\frac{b_1}{L_G}\right]} \; e^{\left[a_1 T_{SI} + \frac{a_2}{T_{SI}}\right]} \tag{5.3} \]
where a1 , a2 and b1 are coefficients that are extracted from TCAD simulations of the device.
Figs. 5.4 and 5.5 show the variation in ILEAK over a wide range of values for LG and TSI simulated
in TCAD. The macromodel parameters are obtained by fitting the data points with the lowest-degree
polynomial yielding the least residual.
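The extraction step can be illustrated with a small sketch. With TSI held at its nominal value, Eq. (5.3) gives log(ILEAK/ILEAK0) = b1/LG - b1/LG0, so LG·log(ILEAK/ILEAK0) is a straight line in LG whose intercept is b1, which is the quantity plotted in Fig. 5.4. The coefficient values below are hypothetical stand-ins for TCAD data (they are not taken from the dissertation), the natural logarithm is assumed, and lengths are in nm.

```python
import math

def sg_leakage(l_g, t_si, i_sg0, a1, a2, b1):
    """SG-mode macromodel of Eq. (5.3)."""
    return i_sg0 * math.exp(b1 / l_g) * math.exp(a1 * t_si + a2 / t_si)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical coefficients standing in for values extracted from TCAD.
TRUE = {"i_sg0": 1e-12, "a1": 0.05, "a2": -20.0, "b1": 100.0}
LG0, TSI0 = 25.0, 10.0
i_ref = sg_leakage(LG0, TSI0, **TRUE)

# Synthetic "simulation" sweep of LG at nominal TSI.
lg_sweep = [15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
ys = [lg * math.log(sg_leakage(lg, TSI0, **TRUE) / i_ref) for lg in lg_sweep]

slope, b1_fit = fit_line(lg_sweep, ys)
print(b1_fit, slope)  # intercept recovers b1; slope equals -b1 / LG0
```

The same transformation applied to a TSI sweep yields a quadratic in TSI, from which a1 and a2 follow.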
For IG-mode FinFETs, the back-gate bias (Vb) can alter Vth, and it is an effective knob to control
leakage through the factor ∂Vth/∂Vb, which can be approximated as follows [99]:
\[ \frac{\partial V_{th}}{\partial V_b} \approx \frac{3 T_{OXF}}{3 T_{OXB} + T_{SI}} \tag{5.4} \]
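As a quick numerical check of Eq. (5.4), the sketch below plugs in the nominal dimensions from Table 5.1. The function name and the 0.2 V bias step are illustrative assumptions, and the result is only the first-order magnitude of the threshold shift.

```python
def back_gate_coupling(t_oxf_nm: float, t_oxb_nm: float, t_si_nm: float) -> float:
    """Approximate magnitude of dVth/dVb from Eq. (5.4)."""
    return 3.0 * t_oxf_nm / (3.0 * t_oxb_nm + t_si_nm)

# Table 5.1 values: TOXF = TOXB = 1 nm, TSI = 10 nm.
coupling = back_gate_coupling(1.0, 1.0, 10.0)   # 3 / 13
delta_vth = coupling * 0.2                      # Vth shift for a 0.2 V back-gate bias step
print(round(coupling, 4), round(delta_vth, 4))
```

With these dimensions roughly a quarter of any back-gate bias change appears as a threshold shift.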
From TCAD simulation data shown in Fig. 5.6, the dependence of ILEAK on Vb is better approximated by a quadratic fit than a linear fit around the nominal back bias. Hence, we incorporate the effect of Vb using
\[ I_{LEAK} = I_{IG0} \; e^{\left[\frac{b_1}{L_G}\right]} \; e^{\left[a_1 T_{SI} + \frac{a_2}{T_{SI}}\right]} \; e^{\left[k_1 (V_b - V_{b0})^2 + k_2 (V_b - V_{b0})\right]} \tag{5.5} \]
where IIG0 is obtained at Vb = Vb0.
[Figure 5.4 (plot): x-axis LG (μm); y-axis LG · log(ILEAK/ILEAK0) (μm); TCAD data with a linear fit]
Figure 5.4: Matching SG-mode FinFET TCAD simulations with the macromodel for different LG

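In the presence of process variations, we assume that LG and TSI are independent of each
other and are normally distributed about their nominal mean values. Hence, for E[LG] = μ_LG,
E[(LG − μ_LG)²] = σ²_LG, E[TSI] = μ_TSI, and E[(TSI − μ_TSI)²] = σ²_TSI, we obtain the mean
μ_ILEAK from
\[ \mu_{I_{LEAK}} = I_{LEAK}(\mu_{L_G}, \mu_{T_{SI}}) + \frac{1}{2} \left.\frac{\partial^2 I_{LEAK}}{\partial L_G^2}\right|_{(\mu_{L_G}, \mu_{T_{SI}})} \sigma_{L_G}^2 + \frac{1}{2} \left.\frac{\partial^2 I_{LEAK}}{\partial T_{SI}^2}\right|_{(\mu_{L_G}, \mu_{T_{SI}})} \sigma_{T_{SI}}^2 \tag{5.6} \]
and variance σ²_ILEAK from
\[ \sigma_{I_{LEAK}}^2 = \left(\left.\frac{\partial I_{LEAK}}{\partial L_G}\right|_{(\mu_{L_G}, \mu_{T_{SI}})}\right)^2 \sigma_{L_G}^2 + \left(\left.\frac{\partial I_{LEAK}}{\partial T_{SI}}\right|_{(\mu_{L_G}, \mu_{T_{SI}})}\right)^2 \sigma_{T_{SI}}^2 \tag{5.7} \]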
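Eqs. (5.6) and (5.7) can be evaluated numerically for any macromodel by replacing the partial derivatives with central finite differences. The sketch below does this for the SG-mode form of Eq. (5.3); the coefficients are hypothetical (not extracted from TCAD), lengths are in nm, and the step size h is an assumption chosen for illustration.

```python
import math

def leakage(l_g: float, t_si: float) -> float:
    """SG-mode macromodel of Eq. (5.3) with hypothetical coefficients."""
    i_sg0, a1, a2, b1 = 1e-12, 0.05, -20.0, 100.0
    return i_sg0 * math.exp(b1 / l_g) * math.exp(a1 * t_si + a2 / t_si)

def taylor_moments(f, mu_l, mu_t, sigma_l, sigma_t, h=1e-2):
    """Second-order mean (Eq. 5.6) and first-order variance (Eq. 5.7)."""
    f0 = f(mu_l, mu_t)
    f_lp, f_lm = f(mu_l + h, mu_t), f(mu_l - h, mu_t)
    f_tp, f_tm = f(mu_l, mu_t + h), f(mu_l, mu_t - h)
    d_l = (f_lp - f_lm) / (2.0 * h)            # dI/dLG
    d_t = (f_tp - f_tm) / (2.0 * h)            # dI/dTSI
    d2_l = (f_lp - 2.0 * f0 + f_lm) / h ** 2   # d2I/dLG2
    d2_t = (f_tp - 2.0 * f0 + f_tm) / h ** 2   # d2I/dTSI2
    mean = f0 + 0.5 * d2_l * sigma_l ** 2 + 0.5 * d2_t * sigma_t ** 2
    variance = d_l ** 2 * sigma_l ** 2 + d_t ** 2 * sigma_t ** 2
    return mean, variance

MU_LG, MU_TSI = 25.0, 10.0
SIGMA_LG, SIGMA_TSI = 0.10 * MU_LG / 3.0, 0.10 * MU_TSI / 3.0  # 3*sigma/mu = 10%

nominal = leakage(MU_LG, MU_TSI)
mean_i, var_i = taylor_moments(leakage, MU_LG, MU_TSI, SIGMA_LG, SIGMA_TSI)
print(mean_i / nominal, math.sqrt(var_i) / nominal)
```

For these illustrative coefficients the mean leakage sits about 1.5% above the nominal value, which reflects the convexity of the exponential model.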
[Figure 5.5 (plot): x-axis TSI (μm); y-axis TSI · log(ILEAK/ILEAK0) (μm); TCAD data with a quadratic fit]
Figure 5.5: Matching SG-mode FinFET TCAD simulations with the macromodel for different TSI
Fig. 5.7 shows that the spread in ILEAK predicted by the macromodel for an SG-mode nFinFET
with coefficients derived from Eqs. (5.4) and (5.5) is in good agreement with that obtained from
TCAD QMC simulations with inputs LG and TSI distributed normally with 3σ/μ = 10%.
5.4.2 Leakage in FinFET standard cells
Owing to the double-gate structure, NAND/NOR/INV functionality can be implemented in many

flavors in FinFET technology. Figs. 5.8 and 5.9 show the schematic and layout of standard cell
NAND gates in SG, LP and MT modes. While the SG-mode NAND uses only SG-mode FETs,
and the LP-mode NAND employs only IG-mode FETs, the MT-mode NAND uses both SG and
IG-mode FETs. In the MT mode [89], a single IG-mode FET is used in the series stack followed
by SG-mode FETs, while the parallel devices are always IG-mode. SG-mode gates are fast owing
to a high current drive and occupy smaller area at the cost of high leakage. LP-mode gates leak less
(due to the back-gate bias, which increases Vth ), occupy the largest amount of area and suffer in
propagation delay. MT-mode gates lie at the center of the leakage-delay spectrum.
In order to determine leakage, we develop separate macromodels for each input vector. This
greatly reduces the computational burden for determining leakage in a NAND/NOR stack, unlike the
solution of transcendental equations presented in [92], which compute the mid-point voltage of the
stack. Figs. 5.10 and 5.11 show that the SG-mode NAND ILEAK for input vectors (00, 01, 10, 11)
follows an identical form (with different coefficients) to that of a single SG-mode nFinFET device
shown in Figs. 5.4 and 5.5, respectively. Fig. 5.12 shows that the mean and variance predicted by the
macromodel using the coefficients extracted from Figs. 5.10 and 5.11 closely match those obtained
from TCAD QMC simulations. A similar trend is observed for SG-mode NOR/INV gates and all
the LP- and MT-mode gates as well [Vb = 1.2V (0.2V) for pFinFETs (nFinFETs)]. Fig. 5.13
shows the distributions in leakage for input vector 10 in SG-, LP- and MT-mode NAND gates.
The LP-mode gates have the best leakage probability density function (PDF). While MT- and SG-mode gates have similar means, the spread is smaller in the MT-mode case. In the next section, we
describe the complete simulation flow used in synthesizing FinFET circuits using the above leakage
macromodels for FinFET standard cells under process variations.
[Figure 5.6 (two panels): upper panel, log(ILEAK/ILEAK0) versus Vb (V) with TCAD data, linear fit and quadratic fit; lower panel, residuals of the linear and quadratic fits versus Vb (V)]
Figure 5.6: Matching IG-mode TCAD simulations with the macromodel for different Vb
[Figure 5.7 (histogram): x-axis Log(ISG,LEAK / 1 A); y-axis normalized occurrence; model versus QMC data]
Figure 5.7: Matching QMC TCAD data with the macromodel
5.5 Methodology
In this section, we describe the methodology adopted to examine leakage in large FinFET circuits
using the macromodels developed in the previous section. Fig. 5.14 describes the simulation flow that
was adopted to extract the die-level leakage distributions of circuits synthesized using SG/LP/MT-mode gates at various delay slacks, considering process variations with spatial correlations. First,
we synthesized a pure SG-mode circuit using Synopsys Design Compiler [96]. Then we fed the
output to a linear programming tool presented in [6], which produces a power-optimized netlist
consisting of SG-, LP- and MT-mode standard cells for a particular delay constraint. The SG-, LP- and MT-mode libraries consisted of INV, NAND and NOR gates in five sizes: X1, X2, X4, X8 and
X16. We obtained the standard cell layouts using Magic [85] and used Timberwolf [86] to place
and route the cells.
[Figure 5.8 (schematics): panels (a) SG, (b) LP, (c) MT; supply and bias labels Vdd, Vhl, Vlow]
Figure 5.8: Schematics of SG-, LP-, and MT-mode NAND gates
Figure 5.9: Layouts of SG-, LP-, and MT-mode NAND gates

Next, we extracted the layout area and placement data, which were used in multi-level grid
assignment. Depending on the spatial location of the gates, they were assigned to a grid shown
in Fig. 5.15, consisting of multiple levels of granularity, to capture spatial correlation. Thereafter,
we computed the leakage spatial correlation matrix, which, in essence, captures the correlations in
leakage between gates, given input μ_LG, μ_TSI, σ_LG, and σ_TSI for the various levels. Unlike [94],
we assume that both primary parameters LG and TSI are spatially correlated and independent of
each other.
[Figure 5.10 (plot): x-axis LG (μm); y-axis LG · log(ILEAK/ILEAK0) (μm); one curve per input vector 00, 01, 10, 11]
Figure 5.10: SG-mode NAND leakage from TCAD for different LG
[Figure 5.11 (plot): x-axis TSI (μm); y-axis TSI · log(ILEAK/ILEAK0) (μm); one curve per input vector 00, 01, 10, 11]
Figure 5.11: SG-mode NAND leakage from TCAD for different TSI
[Figure 5.12 (histogram): x-axis Log(I00 / 1 A); y-axis normalized occurrence; model versus QMC data]
Figure 5.12: SG-mode NAND I00 distribution predicted by the model and TCAD QMC simulations
[Figure 5.13 (histograms): x-axis Log(I10 / 1 A); y-axis occurrence; SG-mode, LP-mode and MT-mode]
Figure 5.13: I10 distributions for SG-, LP-, and MT-mode NAND gates
[Figure 5.14 (flowchart): Verilog netlist and SG-mode library into Synopsys Design Compiler; LP/MT-mode library into the linear programming tool; netlist with SG/LP/MT standard cells; Timberwolf place & route; area extraction; spatial grid assignment for gates; FinE (Sentaurus TCAD) QMC simulation producing model coefficients, repeated until residuals are negligible; leakage spatial correlation matrix with a positive semidefinite check; Latin hypercube sampling with correlation; overall leakage distribution]
Figure 5.14: Simulation flow
The spatial correlation can be modeled using a variety of methods [42, 100, 101]. These methods give the correlation between process parameters as a function of the distance between the two
locations. We use the rectangular grid-based method proposed in [42]. In this method, the chip is
hierarchically partitioned into a number of levels (Fig. 5.15). At each level k, the chip is
divided into 2^k by 2^k regions. Fig. 5.15 shows the chip divided into three levels. Each region at
each level is associated with a set of normal random variables ΔLG and ΔTSI. The variation in the process
parameter of the logic gates in a particular region at the bottommost level is represented by the sum
of the variations of that region and of its parent regions at higher levels. A parent region completely covers the region
under it. E.g., the LG of a logic gate in region (2,2) of Fig. 5.15 can be written as
\[ L_G(2,2) = L_{G0} + \Delta L_G(2,2) + \Delta L_G(1,1) + \Delta L_G(0,1) \tag{5.8} \]
Here, LG0 represents the nominal gate length and ΔLG(2,2), ΔLG(1,1), and ΔLG(0,1) are the
zero-mean normal random variables in their corresponding regions. Consider the logic gates in
regions (2,2), (2,4) and (2,16). Gates in regions (2,2) and (2,4) have two common parent regions,
and thus LG(2,2) and LG(2,4) are tightly correlated, i.e., if the gate in (2,2) has less than nominal
LG, then it is highly probable that the gate in (2,4) will also have less than nominal LG. On the
other hand, the gate in region (2,16) only shares (0,1) as a parent region with the other two gates,
and thus their LGs are weakly correlated. More specifically, the correlation between the gates in
(2,2) and (2,4), Corr(LG(2,2), LG(2,4)), turns out to be
\[ Corr(L_G(2,2), L_G(2,4)) = Var(\Delta L_G(1,1)) + Var(\Delta L_G(0,1)) \tag{5.9} \]
where Var(ΔLG(1,1)) and Var(ΔLG(0,1)) are the variances of ΔLG(1,1) and ΔLG(0,1), respectively. On the other hand, Corr(LG(2,2), LG(2,16)) = Var(ΔLG(0,1)). The number of
levels the grid is partitioned into typically depends on the processes affecting LG and TSI.
Figure 5.15: Grid assignment for spatial correlation
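The bookkeeping behind Eqs. (5.8) and (5.9) can be sketched as follows. Two assumptions are made for illustration: regions are indexed (level, index) with 1-based indices such that the four children of region (k-1, j) are numbered 4j-3 to 4j, which reproduces the parent relations quoted above, and the per-level variances are hypothetical. The shared term of Eq. (5.9) is the sum of the variances of the common regions; dividing it by the total variance gives a coefficient normalized to [0, 1].

```python
def ancestors(level: int, index: int):
    """Regions covering region (level, index), including the region itself."""
    chain = []
    while level >= 0:
        chain.append((level, index))
        index = (index + 3) // 4   # parent index: ceil(index / 4)
        level -= 1
    return chain

def shared_variance(region_a, region_b, var_by_level):
    """Sum of the variances of the regions common to both gates (Eq. 5.9)."""
    common = set(ancestors(*region_a)) & set(ancestors(*region_b))
    return sum(var_by_level[level] for level, _ in common)

# Hypothetical variances of the per-level LG deviations (nm^2), levels 0, 1, 2.
VAR_BY_LEVEL = [0.4, 0.2, 0.1]
total = sum(VAR_BY_LEVEL)

near = shared_variance((2, 2), (2, 4), VAR_BY_LEVEL)    # shares (1,1) and (0,1)
far = shared_variance((2, 2), (2, 16), VAR_BY_LEVEL)    # shares only (0,1)
print(near, far, near / total, far / total)
```

Gates in the same bottom-level region share every term, so their shared variance equals the total variance.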
Using the input correlation matrices generated for LG and TSI, we generated the spatial correlation matrix of leakage currents between the logic gates. Using the Taylor expansion of the model
in Eq. (5.3), ILEAK can be approximated as
\[ I_{LEAK} = \alpha \, e^{c T_{SI} + d L_G} \tag{5.10} \]
Here, α, c and d are constants that depend on ISG0, a1, a2 and b1. Clearly, ILEAK is a lognormal
random variable which can be expressed as α e^Y, where Y is a normal random variable with mean
c μ_TSI + d μ_LG and variance c² σ²_TSI + d² σ²_LG.
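Because the exponent in Eq. (5.10) is normal, the mean and variance of ILEAK follow from the standard lognormal moment formulas. The sketch below evaluates them; the prefactor and the sensitivities c and d are hypothetical values chosen only for illustration, with lengths in nm.

```python
import math

def lognormal_moments(prefactor, c, d, mu_tsi, sigma_tsi, mu_lg, sigma_lg):
    """Mean and variance of prefactor * exp(c*TSI + d*LG) for independent normal TSI, LG."""
    mu_y = c * mu_tsi + d * mu_lg
    var_y = c ** 2 * sigma_tsi ** 2 + d ** 2 * sigma_lg ** 2
    mean = prefactor * math.exp(mu_y + 0.5 * var_y)
    variance = prefactor ** 2 * (math.exp(var_y) - 1.0) * math.exp(2.0 * mu_y + var_y)
    return mean, variance, var_y

# Hypothetical constants; sigma values correspond to 3*sigma/mu = 10%.
mean_i, var_i, var_y = lognormal_moments(
    prefactor=1e-9, c=0.25, d=-0.16,
    mu_tsi=10.0, sigma_tsi=10.0 * 0.10 / 3.0,
    mu_lg=25.0, sigma_lg=25.0 * 0.10 / 3.0,
)
print(mean_i, math.sqrt(var_i) / mean_i)
```

The ratio of standard deviation to mean depends only on the variance of the exponent, not on the prefactor.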
As mentioned in Section 5.4, the leakage current for a gate is also input vector dependent. For
each input vector, the basic template for ILEAK remains the same, but the constants a1, a2 and b1
change. Therefore, the average leakage current through a gate is given by:
\[ I_{LEAK}^{avg} = \sum_{i} p_i \, I_{LEAK}^{i} \tag{5.11} \]
Here, p_i represents the probability of occurrence of the ith input vector state, and I^i_LEAK is the
leakage current in that state. The probabilities corresponding to different input vector states
are obtained using Synopsys Design Compiler [96].
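Eq. (5.11) is a probability-weighted sum over input vectors. A minimal sketch for a two-input gate is shown below; the state probabilities and per-state leakage values are hypothetical placeholders, not results from the dissertation.

```python
def average_leakage(state_probs: dict, state_leakage: dict) -> float:
    """Probability-weighted leakage over input vector states (Eq. 5.11)."""
    if abs(sum(state_probs.values()) - 1.0) > 1e-9:
        raise ValueError("input vector probabilities must sum to 1")
    return sum(p * state_leakage[state] for state, p in state_probs.items())

# Hypothetical values for a two-input NAND gate (currents in amperes).
probs = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
leak = {"00": 1.0e-10, "01": 3.0e-10, "10": 2.0e-10, "11": 6.0e-10}

i_avg = average_leakage(probs, leak)
print(i_avg)
```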
Denoting the gates in regions (2,2) and (2,4) in Fig. 5.15 as gate A and B, respectively, we find
\[ Corr(I_{LEAK_A}^{avg}, I_{LEAK_B}^{avg}) = \sum_{i} \sum_{j} p_i \, p_j \, Corr(I_{LEAK_A}^{i}, I_{LEAK_B}^{j}) \tag{5.12} \]
where I^i_LEAK_A = α_{A,i} e^{Y_i}, I^j_LEAK_B = α_{B,j} e^{Z_j}, Y_i = c_{A,i} TSI_A + d_{A,i} LG_A, and Z_j = c_{B,j} TSI_B + d_{B,j} LG_B. Note that
\[ Corr(I_{LEAK_A}^{i}, I_{LEAK_B}^{j}) = \frac{Cov(I_{LEAK_A}^{i}, I_{LEAK_B}^{j})}{\sigma_{I_{LEAK_A}^{i}} \, \sigma_{I_{LEAK_B}^{j}}} \tag{5.13} \]
The covariance between I^i_LEAK_A and I^j_LEAK_B can be expressed as
\[ \begin{aligned} Cov(I_{LEAK_A}^{i}, I_{LEAK_B}^{j}) &= Cov(e^{Y_i}, e^{Z_j}) \\ &= E(e^{(Y_i + Z_j)}) - E(e^{Y_i}) \, E(e^{Z_j}) \\ &= e^{\mu_{Y_i} + \mu_{Z_j} + \frac{\sigma_{Y_i}^2 + \sigma_{Z_j}^2}{2}} \left(e^{Cov(Y_i, Z_j)} - 1\right) \end{aligned} \tag{5.14} \]
Here, Cov(Y_i, Z_j) can be easily computed from
\[ Cov(Y_i, Z_j) = c_{A,i} \, c_{B,j} \, Cov(T_{SI_A}, T_{SI_B}) + d_{A,i} \, d_{B,j} \, Cov(L_{G_A}, L_{G_B}) \tag{5.15} \]
where Cov(TSI_A, TSI_B) and Cov(LG_A, LG_B) are obtained as described earlier. Under our assumptions, the grid-based method does not guarantee that the spatial correlation matrix will be positive
semi-definite and, hence, we use the algorithm from [102] to generate the closest correlation matrix
from a given symmetric matrix. Thereafter, we use Latin hypercube sampling on ILEAK for each
gate, by imposing the above correlation in leakage between the gates, to obtain the overall leakage
distribution for the circuit. In the next section, we present the results of our work using the above
methodology on various benchmark circuits.
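The chain from Eq. (5.15) to Eq. (5.13), followed by a validity check on the resulting matrix, can be sketched as below. The sensitivities and covariances are hypothetical. The matrix check uses a plain Cholesky factorization, which tests positive definiteness (slightly stricter than positive semi-definiteness); it is swapped in here only as a diagnostic and is not the nearest-correlation-matrix algorithm of [102].

```python
import math

def exponent_cov(c_a, d_a, c_b, d_b, cov_tsi, cov_lg):
    """Cov(Y, Z) of the two normal exponents (Eq. 5.15)."""
    return c_a * c_b * cov_tsi + d_a * d_b * cov_lg

def lognormal_corr(var_y, var_z, cov_yz):
    """Corr(e^Y, e^Z) for jointly normal Y, Z (Eqs. 5.13 and 5.14); the means cancel."""
    denom = math.sqrt((math.exp(var_y) - 1.0) * (math.exp(var_z) - 1.0))
    return (math.exp(cov_yz) - 1.0) / denom

def is_positive_definite(matrix, tol=1e-12):
    """True if a Cholesky factorization of the symmetric matrix succeeds."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True

# Hypothetical sensitivities (per nm) and parameter (co)variances (nm^2).
c, d = 0.25, -0.16
var_tsi, var_lg = 0.1111, 0.6944
var_y = exponent_cov(c, d, c, d, var_tsi, var_lg)               # variance of one exponent
cov_yz = exponent_cov(c, d, c, d, 0.8 * var_tsi, 0.8 * var_lg)  # 80% shared variance

rho = lognormal_corr(var_y, var_y, cov_yz)
print(rho, is_positive_definite([[1.0, rho], [rho, 1.0]]))
```

When the check fails, a repair step such as the one cited in the text is needed before correlated sampling can proceed.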
5.6 Results and discussion

In this section, we discuss the overall leakage (ITOT = Σ_allgates I^avg_LEAK) distributions of benchmark circuits under process variations, synthesized with SG/LP/MT-mode logic gates using the
methodology described in Section 5.5. The parameters for LG and TSI were set to μ_LG = 25 nm,
μ_TSI = 10 nm and 3σ/μ = 10%.
Table 5.3 shows the results of synthesizing ISCAS 85 benchmarks using SG-, SG + LP- and
SG + MT-mode libraries at iso-delay. Major column I (II) gives the μ_ITOT and σ_ITOT for circuits
synthesized using only SG-mode (SG + LP-mode) gates. It can be seen that the μ_ITOT (σ_ITOT) of
circuits synthesized using SG + LP-mode gates is, on average, 47.5% (45.4%) lower than that
of the circuits synthesized using only SG-mode gates. The average fraction of LP-mode gates in
circuits synthesized using SG + LP-mode gates is around 60%. LP-mode gates are slower than SG-mode gates and thus larger LP-mode gates are required to meet the same timing constraint. Though
circuits synthesized using a combination of SG + LP-mode gates have larger LP-mode gates, SG
+ LP-mode netlists have a superior leakage PDF because of the considerably reduced μ_ITOT and
σ_ITOT of the LP-mode gates as compared to those of the SG-mode gates. Major column III gives
μ_ITOT and σ_ITOT for circuits synthesized using SG + MT-mode gates. The μ_ITOT (σ_ITOT) for
these circuits is, on average, 22.7% (48.5%) lower than that of circuits synthesized using only
SG-mode gates. However, the μ_ITOT is larger than that of the circuits synthesized using SG + LP-mode gates. σ_ITOT of SG + MT-mode circuits is, in general, larger than that of SG + LP-mode
circuits, except for the c5315 and c7552 benchmarks. This is because, as stated earlier, the mean
and variance of the leakage of MT-mode gates are, on average, lower than those of SG-mode gates but higher than
those of LP-mode gates.
Table 5.3: Comparison of SG-, SG + LP- and SG + MT-mode synthesis techniques for ISCAS 85
benchmarks at iso-delay (total leakage current ITOT)
Benchmark   SG-mode               SG + LP-mode          SG + MT-mode
circuit     Mean (A)   Std. (A)   Mean (A)   Std. (A)   Mean (A)   Std. (A)
c17         1.59       0.23       1.13       0.13       1.21       0.15
c432        25.60      0.79       12.79      0.38       16.98      0.61
c499        72.16      1.24       46.34      0.46       57.97      0.61
c880        72.11      1.68       30.96      0.49       45.83      0.64
c1908       64.86      1.49       35.57      0.60       52.02      0.83
c3540       160.15     2.50       85.53      0.89       128.12     1.42
c5315       192.30     2.60       89.73      1.89       148.66     1.02
c7552       259.46     2.08       143.54     2.04       204.61     1.22
Savings     0          0          47.5%      45.4%      22.7%      48.5%
Table 5.4: Mean and std. deviation of ITOT for ISCAS 85 benchmarks for σ_TSI = 0 and σ_LG = 0
Benchmark   σ_TSI = 0              σ_LG = 0
circuit     Mean (A)   Std. (nA)   Mean (A)   Std. (A)
c17         0.04       0.04        1.67       0.23
c432        0.69       0.16        28.06      0.62
c499        1.97       0.25        82.63      0.99
c880        1.99       0.29        81.35      1.14
c1908       1.75       0.24        72.91      1.26
c3540       4.46       0.39        180.80     2.03
c5315       6.95       0.58        287.63     2.29
c7552       7.12       0.48        294.86     1.96
Table 5.4 presents the results for circuits synthesized using only SG-mode gates, due to the
variation in LG and TSI individually in Major columns I and II, respectively. For 3σ/μ = 10%,
the variation in ITOT due to LG is not substantial. However, when TSI alone varies, μ_ITOT and
σ_ITOT are similar to the case where LG and TSI vary together. This observation is consistent with
the leakage current trend of a single FinFET shown in Figs. 5.4 and 5.5 (a small variation about the
mean LG in Fig. 5.4 translates to a linear change in log(ILEAK/ILEAK0), whereas a small variation
about the mean TSI in Fig. 5.5 results in a quadratic change in log(ILEAK/ILEAK0)).
Fig. 5.16 shows the ITOT PDF for the c880 benchmark circuit synthesized using only SG-mode
gates for the cases when LG and TSI are assumed to be correlated and uncorrelated. μ_ITOT for both
the cases is similar; however, σ_ITOT doubles in the correlated case. The latter occurs because
gates with correlated dimensions are likely to have large leakage currents simultaneously,
leading to a wider spread.
[Figure 5.16 (PDF plot): x-axis ITOT (A); y-axis relative units; correlated versus uncorrelated cases]
Figure 5.16: Spreads in ITOT in the correlated and uncorrelated cases for benchmark circuit c880.
Fig. 5.17 shows the normalized area, μ_ITOT, and σ_ITOT obtained by increasing the fraction
of LP-mode gates in a pure SG-mode netlist at iso-delay. Each point in the figure is obtained by
normalizing the metric of the SG + LP-mode circuit to a 100% SG-mode circuit synthesized for the
same delay. The normalized area increases sharply, while μ_ITOT falls gradually and σ_ITOT drops
sharply as the percentage of the LP-mode gates is increased in the netlist. Fig. 5.18 shows the
effect of increasingly mixing LP-mode (MT-mode) gates in a pure SG-mode netlist for the c880
benchmark at an increasing output arrival time constraint. There is an 80% (87%) improvement in
μ_ITOT (σ_ITOT) as we move from the 100% SG-mode circuit to the 80% SG-mode circuit. On the
other hand, the gain decreases rapidly as we increase the fraction of LP-mode gates in the circuits.
This happens because initially all the large SG-mode gates get substituted with LP-mode gates and,
hence, there is a large reduction in μ_ITOT and σ_ITOT. As we substitute more and more LP-mode
gates, the smaller SG-mode gates are replaced eventually. Since the difference in μ_ITOT (σ_ITOT)
between small SG-mode and LP-mode gates is small, the gain starts diminishing. A similar trend
is observed for the MT case, where μ_ITOT (σ_ITOT) improves by 76% (83%) as we move towards
the 80% SG-mode circuit from the 100% SG-mode circuit.
[Figure 5.17 (plot): x-axis fraction of LP-mode gates; y-axis values normalized to 100% SG-mode at iso-delay; curves for normalized area, normalized μ_ITOT and normalized σ_ITOT]
Figure 5.17: Effect of mixing LP-mode gates into a pure SG-mode c880 benchmark circuit, normalized to the 100% SG-mode case at iso-delay.
[Figure 5.18 (plot): x-axis fraction of LP-mode (MT-mode) gates; y-axis values normalized to delay-minimized 100% SG-mode; normalized μ_ITOT and σ_ITOT curves for SG + LP-mode and SG + MT-mode]
Figure 5.18: Effect of mixing LP-mode (MT-mode) gates into a pure SG-mode c880 benchmark
circuit, normalized to the 100% SG-mode case with delay slacks.
[Figure 5.19 (CDF plot): x-axis ITOT (A); y-axis ITOT cumulative distribution function; curves for 100% SG-mode, 40% SG + 60% LP-mode and 40% SG + 60% MT-mode]
Figure 5.19: Cumulative distribution function of ITOT for 100% SG-mode vs. 40% SG + 60%
LP-mode (MT-mode) gates at iso-delay for benchmark circuit c880.
Fig. 5.19 shows the cumulative distribution function (CDF) for c880 synthesized using SG-, SG + LP- and SG + MT-mode logic gates at iso-delay. It can be seen that the slope of the
SG CDF is smaller than the slope of the SG + LP (SG + MT) CDF, implying larger variance for
circuits synthesized using only SG-mode gates. The area of the SG-mode netlist is 31.2% (29.0%)
smaller than the area of the SG + LP-mode (SG + MT-mode) netlist. The solid vertical lines
show the different leakage current constraints. If the primary design objective is area along with a
reasonable margin for leakage current, then circuits can be synthesized using SG-mode gates under
a small failure probability. However, if the leakage constraints are tight, then the circuit needs to
be synthesized using SG + LP-mode gates. The SG + LP-mode circuit can meet all three leakage
constraints. On the other hand, the SG-mode circuit can meet only one leakage constraint. Again,
c880 synthesized using SG + MT-mode gates lies in the middle of the spectrum and is able to meet
two leakage current constraints. Thus, SG + MT-mode circuits offer greater yield as compared to
circuits synthesized using SG-mode gates, but lower yield as compared to circuits synthesized using
SG + LP-mode gates.
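Reading yield off a leakage CDF amounts to evaluating the fraction of dies whose ITOT does not exceed a given constraint. The sketch below illustrates this on two synthetic lognormal populations; the distribution parameters and the three constraint values are hypothetical, in arbitrary units, and do not correspond to the c880 data.

```python
import random

def yield_fraction(leakage_samples, constraint):
    """Fraction of dies whose total leakage does not exceed the constraint."""
    passing = sum(1 for value in leakage_samples if value <= constraint)
    return passing / len(leakage_samples)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical ITOT populations: wider and higher for SG, tighter and lower for SG + LP.
sg_only = [rng.lognormvariate(0.0, 0.10) for _ in range(5000)]
sg_lp = [rng.lognormvariate(-0.6, 0.05) for _ in range(5000)]

for limit in (0.6, 0.9, 1.2):  # three hypothetical leakage constraints
    print(limit, yield_fraction(sg_only, limit), yield_fraction(sg_lp, limit))
```

A netlist with a lower mean and a tighter spread passes tighter constraints at a given yield target, which is the comparison drawn from Fig. 5.19.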
5.7 Chapter summary

Owing to their double-gate structure, FinFETs offer a diverse library of logic gates that enable
power-performance tradeoffs under process variations. In this chapter, we attempted to address the
issue of variation-aware synthesis for FinFET standard cell design along the dimension of leakage
tradeoffs at iso-delay. We presented a methodology to evaluate the leakage PDF of circuits synthesized using FinFET standard cell libraries. We demonstrated that circuits synthesized using a combination of SG- and LP-mode gates have superior yield in comparison to circuits synthesized using
pure SG-mode gates at iso-delay. A combination of SG- and MT-mode gates also outperforms pure
SG-mode gates. We also showed that continuously increasing the fraction of LP/MT-mode gates in
a pure SG-mode circuit (to reduce leakage), by permitting a delay slack, yields diminishing returns.
Chapter 6
Statistical Delay Characterization of FinFET Standard Cells Under Design of Experiments Using Response Surface Methodology
6.1 Introduction
In this chapter, we statistically characterize the delay of FinFET standard cells under spatial and
environmental variations, using central composite rotatable design (CCRD) for response surface
methodology (RSM). We identify the most critical parameters that affect timing arcs of logic cells
under lithographic variations. We also show that the dependence of delay on a key process parameter
is in complete contrast to what one would observe in CMOS technology. This later leads to
variation-aware (environmental and lithographic) delay models for FinFET standard cells (NAND
and INV) implemented in different logic styles: SG and LP.
Despite the advantages FinFETs enjoy over bulk CMOS, they are still subject to various process and environmental (such as temperature) variations. Thus, identifying which FinFET process
parameters are the most important from the delay/leakage perspective, under the above variations,
is important. This information can be used to develop delay and leakage models for characterizing
standard cell behavior under variations. In this chapter, we concentrate on delay. This needs to be
done for various FinFET logic styles. The delay model can also be very useful while performing
statistical static timing analysis (SSTA) of a FinFET circuit. To the best of our knowledge, this is
the first work to study the most critical process parameters affecting FinFET delay and then extend
the framework to develop variation-aware FinFET delay models.
The major contributions of this work can be summarized as follows:
• We identify the most critical parameters that affect the saturation current (Ids) of the SG-FinFET and IG-FinFET. Since Ids directly influences the delay of a logic gate, these parameters impact the delay arcs of logic cells as well.
• We show that the dependence of Ids (hence, delay) on the process variation of gate length LG is remarkably different from the case of conventional bulk MOSFETs.
• We develop delay RSM models for FinFET standard cells using CCRD under environmental and lithographic variations.
• We show that the delay RSM models are in close agreement with MC simulations.
• We extend the delay RSM models to incorporate the effect of temperature on delay.
The rest of the chapter is organized as follows. In Section 6.2, we analyze the effects of process
and environmental variations on the delay of FinFETs. In Section 6.3, we describe the design of
experiments (DOEs). In Section 6.4, we demonstrate the efficacy of delay-based RSM models under
process variations. In Section 6.5, we enhance the delay RSM models to incorporate the effect of
temperature variation on delay. We conclude in Section 6.6.
6.2 Delay modeling
In this section, we study the effects of process and environmental variations on FinFET Ids. The delay of a logic gate is directly correlated with the Ids of the transistor [103]. Thus, we analyze Ids for
studying the effects of variations on delay. We first analyze the effects of temperature variation on
FinFET delay and thereafter identify the critical process parameters whose variations affect FinFET
delay the most. We study the effect of variations on both SG- and IG-mode FinFETs. The FinFET
device parameter values are shown in Table 6.1.
Table 6.1: FinFET device parameters
PARAMETER            VALUE
LGF, LGB (nm)        20
TOXF, TOXB (nm)      1
TSI (nm)             10
HFin (nm)            50
HGF, HGB (nm)        20
LSPF, LSPB (nm)      20
LUN (nm)             10
NBODY (cm^-3)        10^15
Φn (eV)              4.4
Φp (eV)              4.8
NSD (cm^-3)          10^20
VDD (V)              1
6.2.1 Effect of temperature on delay
There are two kinds of variations in integrated circuits: environmental and physical (or spatial).
Most works on process variations analyze the physical variations in the fundamental process parameters. However, environment-based temporal variations may also be manifested due to varying
operating conditions. They can occur over time scales ranging from nanoseconds to years [10]. For example,
effects, such as negative or positive bias temperature instability, lead to variations in Vth over the
circuit lifetime. On the other hand, varying computing workload leads to temporal variations in the
chip temperature. Thermal packaging and heat dissipation issues become an important concern for
FinFETs because of their SOI structure. FinFETs may attain a very high temperature at large input
switching activity [51]. Therefore, it is very important to validate FinFET circuit designs at various
temperature corners. We analyze nFinFETs. (A similar analysis is applicable to pFinFETs.)
The dependence of Ids on temperature can be understood through the following equation:
\[ I_{ds} = \mu \, C_{OX} \, \frac{W_{eff}}{L_{eff}} \, (V_{gs} - V_{th}) \tag{6.1} \]
where μ, COX, Weff, Leff and Vgs are the mobility, gate capacitance, effective width of the transistor, effective channel length and gate-source voltage, respectively [103]. Weff is equal to 2HFin
for an SG-FinFET and HFin for an IG-FinFET. Vth incorporates the effects of TSI and Φn. The
temperature dependence of Ids originates from the dependence of μ and Vth on temperature. As
temperature T increases, Vth decreases because of the increased intrinsic carrier concentration at
the channel surface. This increased concentration results in the shifting of the Fermi level towards
the conduction band, and thus a lowering of Vth. An analytical model for Vth is presented in [104]:
$$V_{th} = \Phi_{MS} - \frac{KT}{q} \ln\left(\frac{q^2 n_i T_{SI} T_{OX}}{4 \epsilon_{OX} K T}\right) \tag{6.2}$$
where ΦMS is the difference in the Fermi level of metal and semiconductor. K, q, ni, and εOX
are the Boltzmann constant, electron charge, intrinsic carrier concentration, and permittivity of
the oxide, respectively. It is evident from this equation that Vth has a negative correlation with
temperature T, since ni rises steeply with T. Thus, with an increase in temperature, the gate drive (Vgs − Vth) increases. On
the other hand, a temperature increase aggravates lattice scattering, thus reducing electron mobility.
Hence, these two effects counteract each other, making the change in Ids dependent on the relative
sensitivities of μ and Vth at that temperature.
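
The temperature trend implied by Eq. (6.2) can be checked numerically. The following Python sketch is illustrative only. It assumes a textbook scaling law for the intrinsic carrier concentration of silicon, an SiO2 gate oxide, a temperature-independent ΦMS (set to zero here, so only the change in Vth is meaningful), and the nominal TSI and TOX of Table 6.1. The function names are hypothetical, and the numbers it produces are not the TCAD results reported in this chapter.

```python
import math

K_B = 1.380649e-23                 # Boltzmann constant (J/K)
Q = 1.602176634e-19                # electron charge (C)
EPS_OX = 3.9 * 8.8541878128e-12    # assumed SiO2 permittivity (F/m)


def intrinsic_carrier_conc(temp_k):
    """Assumed textbook scaling of n_i (m^-3) for silicon around 300 K."""
    ni_300 = 1.0e16                # about 1e10 cm^-3 at 300 K (assumption)
    eg_joule = 1.12 * Q            # band gap treated as constant (assumption)
    return ni_300 * (temp_k / 300.0) ** 1.5 * math.exp(
        -eg_joule / (2.0 * K_B) * (1.0 / temp_k - 1.0 / 300.0))


def vth(temp_k, phi_ms=0.0, t_si=10e-9, t_ox=1e-9):
    """Evaluate Eq. (6.2); phi_ms = 0 is a placeholder, not a device value."""
    thermal_voltage = K_B * temp_k / Q
    arg = (Q ** 2 * intrinsic_carrier_conc(temp_k) * t_si * t_ox
           / (4.0 * EPS_OX * K_B * temp_k))
    return phi_ms - thermal_voltage * math.log(arg)


v_25c = vth(298.15)    # 25 C
v_125c = vth(398.15)   # 125 C
print("Vth change from 25 C to 125 C (V):", v_125c - v_25c)
```

Under these assumptions the printed change is negative, in line with the negative correlation between Vth and T described above.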
In order to investigate the temperature effect, we simulated an SG-nFinFET and an IG-nFinFET,
with gate and drain tied to VDD and source shorted to ground, at two different temperatures: 25°C
and 125°C. Fig. 6.1 shows Ids with varying voltages at the two temperatures. At VDD = 1.0V, Ids of
an SG-nFinFET (IG-nFinFET) at T = 25°C is 12% (3%) larger than Ids at T = 125°C. However,
at VDD = 0.5V, Ids of an SG-nFinFET (IG-nFinFET) at T = 25°C is 3% (33%) lower than Ids at
T = 125°C. At VDD = 0.54V (0.9V), Ids is independent of temperature for an SG-nFinFET (IG-nFinFET). Thus, IG-nFinFETs can be seen to be affected by temperature more than SG-nFinFETs.
Hence, depending on the supply voltage, Ids may decrease, increase or remain constant with varying
temperature. This is due to the counteracting effects of μ and (Vgs − Vth).
The Ids of IG-nFinFETs behaves differently with temperature than Ids of SG-nFinFETs because
the back-gate bias controls the electron concentration of the channel, thus controlling the Vth and
μ of the electrons in the channel. Thus, it is extremely important to characterize both FinFET modes
under temperature variations, not just the SG mode.
[Figure: Ids (A) versus Vds = Vgs (V) for SG-nFinFET and IG-nFinFET at 25°C and 125°C]
Figure 6.1: Variation of nFinFET saturation current with voltage and temperature

It should be noted that Ids has a strong dependence on LG and TSI. Thus, the temperature-insensitive delay point will also have a strong dependence on LG and TSI. Fig. 6.2 shows Ids at
various voltages and TSI, at 25°C and 125°C, both for SG- and IG-nFinFET. The temperature-insensitive delay point for SG-nFinFET (IG-nFinFET) is at 0.56V (1.0V) for TSI = 5nm. The
point shifts towards the origin with increasing TSI. This is because the electron mobility decreases
with decreasing TSI. A reduction in TSI leads to a narrow confinement of volume-inverted charge
in the real space, increasing phonon scattering [105]. Also, surface roughness scattering increases
with a reduction in TSI, resulting in a lower Ids.
6.2.2 Screening spatial process parameters for relative importance
In this section, we discuss the effect of fundamental process parameters on Ids . We also identify the
most critical parameters that affect Ids of both the SG- and IG-nFinFET.
[Figure: Ids (A) versus Vds = Vgs (V) at 25°C and 125°C for TSI = 5nm, 12nm and 20nm; panels (a) SG-nFinFET and (b) IG-nFinFET]
Figure 6.2: Saturation current dependence on temperature and fin thickness

Fig. 6.3(a) (6.3(b)) shows the variation of normalized Ids for SG-nFinFET (IG-nFinFET) with
normalized LG, TSI, TOX and Φn. LGo, TSIo, TOXo and Φno denote the nominal device parameters
shown in Table 6.1. Idso is the current at the nominal device parameter values. The polynomials
were fitted to simulation data with a root mean square (RMS) error of less than 3%. It can be seen
from the figures that Ids changes quadratically (quadratically) with LG, cubically (quadratically)
with TSI, quadratically (quadratically) with TOX, and linearly (linearly) with Φn for SG-nFinFET
(IG-nFinFET). Though Ids varies linearly with Φn, Ids varies the most when there is a ±10% variation
in Φn. More precisely, in the case of the SG-nFinFET, Ids varies from 0.6Idso to 1.77Idso when there
is a ±10% variation in Φn. On the other hand, Ids only varies from 0.83Idso to 1.18Idso, 0.94Idso to
1.01Idso, and 0.97Idso to 1.02Idso, when there is a ±10% variation in LG, TSI and TOX, respectively.
The impact of variation in Φn is also the most profound in the case of the IG-nFinFET. Thus, though
Ids varies quadratically, cubically and quadratically with LG, TSI and TOX, respectively, a small
variation in Φn can manifest itself as a large variation in Ids because of the large coefficients in
its linear model. However, if the process variation in LG, TSI and TOX is larger than that in Φn, it
can result in a strong variation in Ids because of the corresponding polynomial dependence. This
implies that all four parameters are critical for modeling the delay of standard cells under process
variations.

[Figure: normalized Ids/Idso versus normalized LG/LGo, TOX/TOXo, TSI/TSIo and Φn/Φno, with fitted polynomials; panels (a) SG-nFinFET and (b) IG-nFinFET]
Figure 6.3: Saturation current dependence on process parameters for SG- and IG-nFinFET

Another important point to note is that Ids monotonically increases with LG . This is in stark
contrast to bulk CMOS where Ids decreases with an increase in LG . This is owing to a slight
difference in the fabrication processes of FinFETs and conventional MOSFETs. To explain this
phenomenon, we first discuss the traditional planar MOSFET fabrication technology. We highlight
the part that determines Ids behavior controlled by LG of the device. Next, we show how FinFET
fabrication differs, resulting in a different Ids behavior with process variations in LG .
[Figure: cross-sections of two planar MOSFETs with polysilicon gate lengths LG1 and LG2 on an oxide layer over a Si substrate, with n+ source and drain regions formed by ion implantation]
Figure 6.4: Effect of process variation on physical gate length of traditional planar MOSFETs
We explain the impact of LG on Ids with the help of a bulk nMOS transistor (similar explanation
is also applicable to a pMOS transistor). First, an oxide layer is created on the silicon substrate
followed by deposition of a polysilicon layer, which is used as gate material. Next, both layers are
etched to create the channel for the device. Then, the open area is doped using ion implantation to
create the source and the drain. Hence, any variation in the process of etching away the polysilicon
gate layer determines the closeness of the source and drain regions. This can be seen from Fig. 6.4.
It shows two nMOS transistors with different channel lengths: LG1 and LG2 . With increasing LG ,
the distance between source and drain increases, which leads to a reduction in Ids because now the
electrons have to traverse a longer path.
(a) FinFET with small gate length (b) FinFET with large gate length
Figure 6.5: Effect of process variation on physical gate length of FinFETs

For FinFET fabrication using spacer technology, first a silicon layer is deposited on top of
the silicon substrate and then appropriate regions are etched away to form the thin channel and
extended source-drain region. The substrate is lightly doped. The channel region remains lightly
doped whereas the source-drain region is heavily doped. Doping is done without creating the gate
oxide or polysilicon layer. Thus, the distance between the source and drain is fixed and does not
depend on LG . After source-drain doping, an oxide layer is formed to cover the channel region.
Then a nitride spacer is deposited on top of the extended source-drain region. It partially covers the
thin fin as well. The rest of the open area is used for depositing gate material. Hence, process
variation in nitride spacer deposition eventually determines the length of the open space available
for forming the polysilicon/metal gate. If the spacer thickness decreases, LG increases accordingly.
The gate then covers a larger undoped portion of the silicon fin, as seen from Fig. 6.5. LG1 in
Fig. 6.5(a) is shorter than LG2 in Fig. 6.5(b). The difference can be attributed to the nitride spacer
length. If process variations lead to an increased LG, a larger area of the undoped channel forms an
inversion layer. This in turn reduces the resistance between the source and drain, since a larger part
of the undoped body is in the inversion region, leading to a high electron concentration. The underlap
between the source and drain is also reduced simultaneously. The reduced underlap implies reduced
resistance offered by the undoped body. Thus, as LG increases the resistance between source and
drain decreases in a FinFET. This, in effect, gives rise to a larger Ids .
To quantitatively analyze the impact of process parameter variation on FinFET delay, the sensitivity analysis of Ids to process parameter (Pr) is next performed. Ids is calculated at the point
where Vds = Vgs = 1.0V. A simple three-point experiment is performed. For each Pr, the slope of
Ids is first calculated between the μ and μ + 3σ points, and then between the μ and μ − 3σ points,
where μ and σ here denote the mean and standard deviation of Pr.
The average of the two slopes defines the total sensitivity of Ids to Pr. The average of the two
slopes is calculated to account for the inherent nonlinearity of Ids with respect to process parameter
variation. In order to make sensitivity dimensionless, it is divided by the nominal current, Idsnom,
and multiplied by the nominal process parameter, Prnom. Making it dimensionless makes it easier
to compare sensitivities across various process parameters. Thus, dimensionless sensitivity S is
given by:

$$S = \frac{\Delta I_{ds}}{I_{ds_{nom}}} \Big/ \frac{\Delta Pr}{Pr_{nom}} \tag{6.3}$$
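
The three-point procedure behind Eq. (6.3) can be sketched in a few lines of Python. This is a minimal illustration, not the script used for Fig. 6.6: the function name and argument layout are hypothetical, and the usage lines reuse the normalized Ids ranges quoted in Section 6.2.2 for ±10% excursions as stand-ins for the ±3σ points. For Φn, which end of the quoted range belongs to which excursion is an assumption, so only the magnitude of S is meaningful there.

```python
def dimensionless_sensitivity(ids_minus, ids_nom, ids_plus, pr_nom, delta_pr):
    """Average the two one-sided slopes of Ids, then normalize as in Eq. (6.3).

    ids_minus / ids_plus: Ids at pr_nom - delta_pr / pr_nom + delta_pr.
    """
    slope_plus = (ids_plus - ids_nom) / delta_pr
    slope_minus = (ids_nom - ids_minus) / delta_pr
    average_slope = 0.5 * (slope_plus + slope_minus)
    return average_slope * pr_nom / ids_nom


# Illustrative inputs: normalized Ids (Idso = 1) at -10%, nominal and +10%.
s_phi_n = dimensionless_sensitivity(1.77, 1.0, 0.60, pr_nom=4.4, delta_pr=0.44)
s_lg = dimensionless_sensitivity(0.83, 1.0, 1.18, pr_nom=20.0, delta_pr=2.0)
print(abs(s_phi_n), abs(s_lg))
```

With these illustrative inputs, |S| for Φn is several times larger than |S| for LG, which matches the ordering discussed in the text.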
Fig. 6.6 shows the absolute value of S for various Pr, i.e., LG, TSI, TOX and Φn, both for SG- and IG-nFinFETs. Ids can be seen to be most sensitive to variations in Φn, followed by variations
in LG, TSI and TOX. This corroborates the results presented in Fig. 6.3. It can be seen that IG-nFinFETs are more prone to Ids variations as compared to SG-nFinFETs. Since Ids is directly
related to FinFET delay, it can be safely concluded that variations in Φn also impact delay the most.
However, due to lithographic effects, such as line edge roughness, the variations in LG, TSI and
TOX can also be substantial. Thus, we consider all four process parameters for delay modeling.
6.3 Design of experiment (DOE)

In this section, we discuss a variety of DOEs that can be used to develop analytical delay models.
The pros and cons of several different DOEs are described. We choose CCRD along with RSM for
our DOE. We develop RSM-based delay models for SG- and LP-mode inverters and NAND gates:
SG-INV, LP-INV, SG-NAND, and LP-NAND using CCRD.
Figure 6.6: Absolute S values with respect to different process parameters

The purpose of the DOE is to characterize the impact of input parameters on the output parameter. The DOE starts with screening of the most sensitive input parameters and then designing an
experiment to study the combined influence of the input parameters on the output. The experiment
should span the design space efficiently. There are three major categories of DOEs: screening, full
factorial and response surface [106].
Screening designs are used to screen out less important input parameters. Most commonly used
screening designs are two-level designs. Typical two-level design types are two-level full factorial
and two-level fractional factorial. In a two-level design, each variable has two levels: high and
low. In a two-level full factorial design, all the variables can take either of the two possible level
values. Thus, for a k-variable design, the number of experiments needed is 2^k. When the number
of variables is greater than six, the two-level full factorial design may require a large number of
runs. This is alleviated through the fractional factorial design in which some of the design vectors
of the full factorial design are deleted. However, the sample size remains a power of 2, e.g., for a
k-variable design, 2^(k−q) experiments need to be run where q < k. Fractional factorial designs suffer
from aliasing between the main factors and interaction effects. The amount of aliasing is defined by
the value of q [106].
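
The run counts above can be made concrete with a short Python sketch. It enumerates a two-level full factorial design and a half-fraction (q = 1) of it. The generator used for the half-fraction, in which the last factor equals the product of the others, is a common textbook choice and is an assumption here; the function names are hypothetical.

```python
from itertools import product


def full_factorial(k):
    """All 2^k combinations of the low (-1) and high (+1) levels."""
    return [list(levels) for levels in product((-1, 1), repeat=k)]


def half_fraction(k):
    """2^(k-1) runs: the last factor is the product of the first k-1 factors."""
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last_level = 1
        for level in base:
            last_level *= level
        runs.append(list(base) + [last_level])
    return runs


print(len(full_factorial(5)), len(half_fraction(5)))
```

For k = 5 the full design has 32 runs and the half-fraction has 16, at the cost of aliasing the main effect of the last factor with the four-factor interaction of the others.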
Full factorial designs consist of all combinations of the levels of factors. Each variable has n
levels. The value of n can vary from variable to variable. The number of runs is the product of
the number of levels of each variable. Thus, this number grows exponentially. On the other hand,
since full factorial designs are the most conservative, they are also the most accurate in modeling
the impact of input parameters on the output.

Response surface designs are a collection of statistical and mathematical methods that are useful
for modeling and analysis problems. Here, the main objective is to optimize the response surface
that is influenced by various input parameters. RSM also quantifies the relationship between the
input parameters and the obtained surface responses. If all variables are assumed to be measurable,
the response surface can be expressed as:
$$y = f(x_1, x_2, \ldots, x_k) \tag{6.4}$$
The goal is to optimize the response variable (y). It is assumed that the independent variables are
continuous.
To model the above equation, we need to evaluate f at various input vectors. Also, three distinct
input values are needed for each variable if one needs to model a quadratic-level response surface.
Hence, two-level factorial designs cannot be used. Also, full factorial designs are not preferred
because of the huge computation time associated with them. An effective alternative to factorial
designs is the CCRD, originally developed by Box and Wilson and later improved upon by Box and
Hunter [107].
Fig. 6.7 shows the geometrical representation of a CCRD for three variables. It consists of eight
factorial cube points, six axial points, and a center point. In general, the number of tests required for
a k-variable CCRD is 2^k factorial points, 2k axial points and a center point, for a total of 2^k + 2k + 1
experiments. However, the factorial portion can also be a fractional factorial design [108]. The
factorial points generate the coefficients for the linear terms and the axial points for the quadratic
terms. The axial points are chosen such that they allow rotatability, which ensures that the variance
of model prediction is constant at all points equidistant from the design center. After the range of
input variables has been fixed, they are coded as ±1 for factorial points, ±α for axial points, and
0 for the center point. The coded values are calculated as functions of the range of interest of each
variable, as shown in Table 6.2 [109]. Here, xmax (xmin) denotes the maximum (minimum) value
of the variable and α = 2^(k/4). In our case, k = 5. However, we have chosen a fractional factorial
design. Hence, the number of factorial points is 16 (2^(5−1)). Also, α = 2^((5−1)/4) = 2 for these
fractional factorial designs [108].
Figure 6.7: CCRD for k=3

Table 6.2: Relationship between coded and actual variable values

Code    Actual value of variable
-α      xmin
-1      [(xmax + xmin)/2] - [(xmax - xmin)/(2α)]
0       [(xmax + xmin)/2]
+1      [(xmax + xmin)/2] + [(xmax - xmin)/(2α)]
+α      xmax

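
The construction described above can be sketched in Python as follows. The sketch builds the coded runs of a five-factor CCRD with a half-fraction factorial portion and maps each code to an actual value using the relationship in Table 6.2. The half-fraction generator (fifth factor equal to the product of the first four), the run ordering, and all function names are assumptions made for illustration. The parameter ranges are the lowest and highest levels of LG, TSI, TOX, Φn and Φp used in this chapter.

```python
from itertools import product

K = 5
ALPHA = 2 ** ((K - 1) / 4)   # 2.0 for the half-fraction design

# (xmin, xmax) for LG (nm), TSI (nm), TOX (nm), phi_n (eV), phi_p (eV)
RANGES = [(18.0, 22.0), (9.0, 11.0), (0.9, 1.1), (4.3, 4.5), (4.7, 4.9)]


def ccrd_coded(k, alpha):
    """Coded runs: 2^(k-1) factorial points, 2k axial points, one center."""
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last_level = 1
        for level in base:
            last_level *= level          # assumed generator x5 = x1*x2*x3*x4
        runs.append(list(base) + [last_level])
    for axis in range(k):
        for sign in (-1, 1):
            row = [0] * k
            row[axis] = sign * alpha
            runs.append(row)
    runs.append([0] * k)
    return runs


def coded_to_actual(code, x_min, x_max, alpha):
    """Table 6.2: center + code * (xmax - xmin) / (2 * alpha)."""
    return (x_max + x_min) / 2 + code * (x_max - x_min) / (2 * alpha)


def actual_values(coded_row):
    return [coded_to_actual(c, lo, hi, ALPHA)
            for c, (lo, hi) in zip(coded_row, RANGES)]


design = ccrd_coded(K, ALPHA)
print(len(design))                 # number of experiments
print(actual_values(design[0]))    # first factorial point in actual units
```

The design has 16 + 10 + 1 = 27 runs, and the code -1 for LG maps to 20 - 4/(2·2) = 19 nm, consistent with the levels used in the experiments.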
To characterize the delay response to process variations, a CCRD was set up
for five different parameters: LG, TSI, TOX, Φn, and Φp. A five-factor, five-level
CCRD was used to determine the delay response for the standard cells in the library (factors
are essentially the process parameters in the present context). The total number of tests required
for the five-factor design is 27 (16 factorial points, 10 axial points and one center point). Table 6.3 shows each of the process parameters along with its level
for CCRD. Table 6.4 shows the coded and actual values of variables for each of the experiments
conducted in the design space.
Table 6.3: Process parameters along with their levels for CCRD

                                   Coded variable level
Process parameter   Lowest (-α)   Low (-1)   Center (0)   High (+1)   Highest (+α)
LG (nm)             18            19         20           21          22
TSI (nm)            9.0           9.5        10.0         10.5        11.0
TOX (nm)            0.9           0.95       1.0          1.05        1.1
Φn (eV)             4.30          4.35       4.40         4.45        4.50
Φp (eV)             4.70          4.75       4.80         4.85        4.90

Table 6.4: Coded process parameters along with their actual values

Code (LG, TSI, TOX, Φn, Φp)      LG     TSI     TOX     Φn      Φp
-1   -1   -1   -1   +1           19     9.5     0.95    4.35    4.85
-1   -1   -1   +1   -1           19     9.5     0.95    4.45    4.75
-1   -1   +1   -1   -1           19     9.5     1.05    4.35    4.75
-1   -1   +1   +1   +1           19     9.5     1.05    4.45    4.85
-1   +1   -1   -1   -1           19     10.5    0.95    4.35    4.75
-1   +1   -1   +1   +1           19     10.5    0.95    4.45    4.85
-1   +1   +1   -1   +1           19     10.5    1.05    4.35    4.85
-1   +1   +1   +1   -1           19     10.5    1.05    4.45    4.75
+1   -1   -1   -1   -1           21     9.5     0.95    4.35    4.75
+1   -1   -1   +1   +1           21     9.5     0.95    4.45    4.85
+1   -1   +1   -1   +1           21     9.5     1.05    4.35    4.85
+1   -1   +1   +1   -1           21     9.5     1.05    4.45    4.75
+1   +1   -1   -1   +1           21     10.5    0.95    4.35    4.85
+1   +1   -1   +1   -1           21     10.5    0.95    4.45    4.75
+1   +1   +1   -1   -1           21     10.5    1.05    4.35    4.75
+1   +1   +1   +1   +1           21     10.5    1.05    4.45    4.85
-2    0    0    0    0           18     10.0    1.00    4.40    4.80
+2    0    0    0    0           22     10.0    1.00    4.40    4.80
 0   -2    0    0    0           20     9.0     1.00    4.40    4.80
 0   +2    0    0    0           20     11.0    1.00    4.40    4.80
 0    0   -2    0    0           20     10.0    0.90    4.40    4.80
 0    0   +2    0    0           20     10.0    1.10    4.40    4.80
 0    0    0   -2    0           20     10.0    1.00    4.30    4.80
 0    0    0   +2    0           20     10.0    1.00    4.50    4.80
 0    0    0    0   -2           20     10.0    1.00    4.40    4.70
 0    0    0    0   +2           20     10.0    1.00    4.40    4.90
 0    0    0    0    0           20     10.0    1.00    4.40    4.80

We next investigated response surface models, which are essentially quadratic models of the
predictor variables. The RSM involves a group of statistical techniques for empirical model building
and model utilization. RSM models seek to relate a response variable to the levels of the predictors.
The most widely used RSM models are low-order polynomials. Second-order polynomials have a
general form given by
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i X_j + \epsilon \tag{6.5}$$
Here, Yi is the response variable and Xi s are the predictor variables. β0, β1, and β2 are the coefficients of regression, and ε is the predictor noise. The predictor coefficients can be obtained by
minimizing ε across the whole sample space. Regression analysis aims to minimize the residual
sum of squares to calculate β0, β1 and β2. Ŷi = β0 + β1 Xi + β2 Xi Xj is the predicted value of the
response variable through the regression equation. We used MATLAB 7.0 to minimize the residual
sum of squares and calculate the regression coefficients. Table 6.5 gives the regression coefficients
and the average error between simulation data and model predictions for SG-INV. The average absolute fitting error is 0.2%, i.e., the error encountered in fitting the 27 CCRD simulations to the
quadratic model. In the table, x1, x2, x3, x4 and x5 correspond to LG, TSI, TOX, Φn, and Φp, respectively.

Table 6.5: RSM delay model coefficients for SG-INV

Variable   Coeff. values (1e-09)
const      0.6456
x1         -0.0006
x2         -0.0039
x3         -0.0134
x4         -0.1694
x5         -0.0977
x1.x2      0.0
x1.x3      -0.0002
x1.x4      0.0001
x1.x5      -0.0002
x2.x3      0.0001
x2.x4      0.0001
x2.x5      0.0001
x3.x4      0.0024
x3.x5      -0.0019
x4.x5      -0.0025
x1^2       0.0
x2^2       0.0002
x3^2       0.0086
x4^2       0.0207
x5^2       0.0114
Error      0.2%
R^2        0.997

In order to determine the strength of the relationship between the response and predictor
variables, the coefficient of determination R² is used [108]. The expression for R² is given by (Ȳ
refers to the average across all Yi s)
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \tag{6.6}$$
R² is a statistic that gives us information on the goodness of the fit of the model. A value of
R² = 1 corresponds to a perfect fit between the regression line and the data. R² is the proportion
of variation in the dependent variable Yi that can be explained by predictors Xi in the regression
model. Table 6.5 lists R² = 0.997 for the quadratic delay model of SG-INV, indicating a
very good fit.
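
The regression step can be illustrated with a small, self-contained Python sketch. It is not the MATLAB flow used in this work. To stay short it fits a two-factor quadratic model (six coefficients) on a nine-run CCRD rather than the five-factor model, solves the normal equations with Gaussian elimination, and then evaluates R² as in Eq. (6.6). The response values are synthetic, generated from hypothetical coefficients, so the fit is exact by construction.

```python
import math


def quadratic_terms(x1, x2):
    """Model terms: const, x1, x2, x1.x2, x1^2, x2^2."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_least_squares(design, y):
    """Minimize the residual sum of squares via the normal equations."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


alpha = math.sqrt(2.0)   # 2^(k/4) for k = 2
points = [(-1, -1), (-1, 1), (1, -1), (1, 1),
          (-alpha, 0), (alpha, 0), (0, -alpha), (0, alpha), (0, 0)]
true_beta = [0.65, -0.17, -0.10, -0.003, 0.02, 0.01]   # hypothetical values
design = [quadratic_terms(x1, x2) for x1, x2 in points]
y = [sum(b * t for b, t in zip(true_beta, row)) for row in design]

beta = fit_least_squares(design, y)
y_hat = [sum(b * t for b, t in zip(beta, row)) for row in design]
r2 = r_squared(y, y_hat)
print(beta, r2)
```

With real simulation data the residuals would not vanish, and R² would fall below 1 by an amount that reflects how much of the delay variation the quadratic model leaves unexplained.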
RSM delay models were similarly developed for SG-NAND, LP-INV and LP-NAND, as shown
in Table 6.6. R² values of 0.967 and above (above 0.99 for SG-NAND and LP-NAND) show that the models fit the data quite well. The methodology is
very general and can be used to characterize other standard cells as well.
6.4 Validation of the RSM model
In this section, we show that gate delays obtained through the RSM delay models developed using
CCRD simulations closely approximate delays obtained through MC simulations.
Table 6.6: RSM delay model coefficients for SG-NAND, LP-INV and LP-NAND

Variable   Coeff. value (SG-NAND)   Coeff. value (LP-INV)   Coeff. value (LP-NAND)
           (1e-09)                  (1e-08)                 (1e-08)
const      0.7828                   0.6087                  0.1841
x1         0.0025                   0.0004                  -0.0002
x2         -0.0075                  -0.0048                 -0.0004
x3         0.0215                   0.0004                  -0.0046
x4         -0.1171                  -0.1181                 -0.0216
x5         -0.2005                  -0.1364                 -0.0524
x1.x2      0.0                      0.0                     0.0
x1.x3      0.0003                   -0.0002                 0.0001
x1.x4      -0.0006                  -0.0003                 -0.0001
x1.x5      -0.0003                  0.0                     0.0001
x2.x3      0.0007                   -0.0002                 0.0
x2.x4      -0.0003                  0.0001                  -0.0001
x2.x5      0.0010                   0.0                     0.0001
x3.x4      -0.0006                  -0.0008                 0.0015
x3.x5      -0.0046                  -0.0027                 -0.0006
x4.x5      0.0040                   -0.0018                 -0.0002
x1^2       0.0                      0.0                     0.0
x2^2       0.0001                   0.0002                  0.0
x3^2       -0.0011                  0.0106                  0.0002
x4^2       0.0145                   0.0153                  0.0030
x5^2       0.0175                   0.0151                  0.0050
Error      1.2%                     1.3%                    0.4%
R^2        0.999                    0.967                   0.996

We first use TCAD-based MC simulations to obtain the golden intrinsic delay values of SG-INV
at 1000 random Φn values. The rise and fall times of the input were set to 5ps. Fig. 6.8 shows the
probability density function (PDF) for the inverter delay. As can be seen, there is a good match
between RSM and MC based delays, with an average absolute testing error of only 0.9%. Similar
results were obtained when LG, TSI, TOX and Φp were assumed to have Gaussian distributions.
The absolute testing error was 1.0%, 0.8%, 0.3% and 0.4%, respectively, for these parameters.

[Figure: PDF versus delay (s) for the RSM model and the MC simulation]
Figure 6.8: MC and RSM based delay distributions for SG-INV with Φn assumed to have a Gaussian
distribution

Table 6.7 shows the average error obtained by using our RSM based delay models for SG-INV,
SG-NAND, LP-INV, and LP-NAND with Φn, Φp, LG, TSI and TOX assumed to have Gaussian
distributions. All the above process parameters were varied simultaneously in MC simulations. The
average absolute error ranged from 0.2% to 4.2%. The average speedup of RSM models across all
cells was 40×, relative to MC simulations.

Table 6.7: Average testing error for SG-INV, SG-NAND, LP-INV, and LP-NAND

Parameters              SG-INV error   SG-NAND error   LP-INV error   LP-NAND error
Φn, Φp, LG, TSI, TOX    2.1%           0.2%            4.2%           1.2%
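
The RSM side of this comparison is inexpensive to reproduce, which is where the speedup comes from. The Python sketch below pushes Gaussian samples of Φn through the SG-INV model using the const, x4 and x4² coefficients of Table 6.5, with all other parameters held at their center values. Several points are assumptions for illustration: that the model takes coded variables as inputs, that ±3σ of Φn spans the coded range ±α = ±2, and the seed and function names. The delay is reported in units of the scale factor stated in Table 6.5. The TCAD-based golden reference is not reproduced here.

```python
import random
import statistics

# const, x4 and x4^2 coefficients for SG-INV (Table 6.5)
B0, B4, B44 = 0.6456, -0.1694, 0.0207


def sg_inv_delay(x4):
    """RSM delay (table units) versus coded phi_n, other factors at center."""
    return B0 + B4 * x4 + B44 * x4 * x4


def sample_delays(n_samples, sigma_coded, seed=0):
    """Draw coded phi_n from a zero-mean Gaussian and evaluate the model."""
    rng = random.Random(seed)
    return [sg_inv_delay(rng.gauss(0.0, sigma_coded)) for _ in range(n_samples)]


delays = sample_delays(1000, sigma_coded=2.0 / 3.0)   # assumed 3 sigma = alpha
mean_delay = statistics.mean(delays)
std_delay = statistics.stdev(delays)
print(mean_delay, std_delay)
```

Each sample costs one polynomial evaluation instead of a device simulation. A histogram of `delays` would play the role of the RSM curve in a comparison such as Fig. 6.8.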
6.5 Dependence of delay model on temperature

In this section, we analyze how the RSM model can be extended to include the variation of delay
with temperature.
We include a γT term in the RSM model to incorporate the effect of temperature on delay. The
revised RSM model has the following equation:
$$Y_i = (\beta_0 + \beta_1 X_i + \beta_2 X_i X_j)\,\gamma_T \tag{6.7}$$
Fig. 6.9(a) shows the plot of SG-INV delay at varying temperatures and at two different supply
voltages, 0.5V and 1.0V. At 1.0V (0.5V), the delay increases (decreases) with increasing temperature. These simulation results further support the results reported in Fig. 6.1. The explanation of
this behavior was given in Section 6.2.1. γT can be calculated by fitting polynomials to the delay-temperature curve. For example, in Fig. 6.9(a), delay is linearly (quartically) dependent on T at
1.0V (0.5V).
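
A minimal Python sketch of the fitting step for the linear case is given below. The sample points are generated from the linear fit annotated for the 1.0V curve in Fig. 6.9(a), delay ≈ 2.06·T1 + 3.62 ps with T1 = T/300, as a stand-in for simulated delays. Expressing the temperature term of Eq. (6.7) as the fitted delay normalized to its value at a reference temperature (T1 = 1) is an assumption of this sketch, as are the function names. A higher-order fit, such as the quartic at 0.5V, would follow the same pattern with a general polynomial regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def temperature_factor(t1, slope, intercept, t1_ref=1.0):
    """Fitted delay at t1 relative to the fitted delay at the reference."""
    return (slope * t1 + intercept) / (slope * t1_ref + intercept)


# Stand-in samples taken from the 1.0V linear fit of Fig. 6.9(a), in ps.
T1_SAMPLES = [1.0, 1.1, 1.2, 1.3]
DELAY_SAMPLES = [2.06 * t1 + 3.62 for t1 in T1_SAMPLES]

slope, intercept = fit_line(T1_SAMPLES, DELAY_SAMPLES)
print(slope, intercept, temperature_factor(1.3, slope, intercept))
```

Multiplying the process-only RSM prediction by this factor then gives a delay estimate at the temperature of interest, under the normalization assumed above.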
[Figure: delay (ps) versus T1 = T/300 (K) at supply voltages of 0.5V and 1.0V, with fitted polynomials; panels (a) SG-INV and (b) LP-INV]
Figure 6.9: Delay vs. temperature at two supply voltages
Fig. 6.9(b) shows the delay-temperature curves for LP-INV. The decrease in delay at 0.5V with
increasing temperature for LP-INV is much sharper than that observed for SG-INV. The decrease in
delay for LP-INV (SG-INV) is 29.7% (0.3%) when the temperature changes from 300 K to 390 K.
This is because of the sharp increase in Ids for LP-INV at 0.5V (as explained in Section 6.2.1).
However, at 1.0V, the delay increases slightly with an increase in temperature.
Similarly, the temperature term was obtained for SG-NAND and LP-NAND at the two voltages. The delay trends
observed were similar to the delay trends of SG-INV and LP-INV.
6.6 Chapter summary

In this chapter, we described the impact of environmental and process variations on Ids (hence,
delay) of FinFETs. We showed that the impact of temperature on delay was more pronounced in
the case of IG-FinFETs than SG-FinFETs. Delay is also highly dependent on device parameters,
such as ΦG, LG, TSI and TOX, and operational parameters, such as VDD. We developed RSM
delay models for FinFET standard cells (INV and NAND) using CCRD for both SG and LP logic
styles. We also demonstrated the efficacy of our models under process variations relative to MC
simulations. We extended the RSM delay models to include the effects of temperature variation.
Chapter 7
Conclusions and Future Research

Technology scaling has provided us with increased circuit performance over the past two decades.
The industry has scaled the conventional transistors for the past six years using several innovative
techniques such as high-k dielectrics and strained silicon. However, scaling of conventional transistors beyond the 22nm node is very difficult due to short-channel effects, such as drain-induced barrier lowering (DIBL), subthreshold slope and subthreshold leakage current. DGFETs have emerged
as a possible solution to continue technology scaling. Such FETs have two gates to control the
concentration of the electrons in the channel and thus have superior electrostatic integrity. The two
gates mitigate the effect of the drain-source electric field in the channel and thus provide superior
channel control. Among DGFETs, FinFETs have emerged as the most viable solution due to their
ease of fabrication. The fabrication process of FinFETs is quite similar to the fabrication process
of conventional transistors. The tri-gate version of FinFETs was recently announced by Intel as its
choice of transistor for fabricating processors at the 22nm technology node.
FinFETs have been shown to have superior on-current and off-current when compared to a
conventional transistor at the same technology node. Their dual-gate structure can be exploited for
innovative circuit design. For example, if a reverse bias is applied to the back gate of IG-mode FinFETs, the
threshold voltage Vth of the front gate can be modulated. Since Vth impacts both the subthreshold
current and delay of a transistor, it can be used as a knob to make delay-leakage trade-offs. Novel
standard cells have been proposed in the literature that exploit the above fact. It has been shown that
FinFETs can be easily fabricated along the <110> channel orientation by rotating the pFinFET by
45° relative to the nFinFET. The electron mobility is highest along the <100> channel orientation
while the hole mobility is highest along the <110> channel orientation. Thus, using <110>
transistors in the pull-up network of the logic gate and <100> transistors in its pull-down network
can lead to better delay.
FinFETs still suffer from the effects of process variations due to factors such as line edge roughness and temperature variations. However, they do not suffer from the random dopant fluctuation
effect encountered in bulk transistors, since their body is undoped. Lithographic variations can lead
to deviation in FinFET parameters, such as LG, TSI and TOX. Further, these variations can be intra-die
or inter-die in nature. ΦG is heavily dependent on the processing temperature and, hence, temperature variations during processing can lead to deviations in the value of ΦG. Since both leakage
and delay heavily depend on the above process parameters, it is extremely important to characterize
variations in FinFET delay/leakage with variations in these parameters.
In Chapter 1, we outlined the obstacles in the scaling of conventional bulk MOSFETs. We
discussed several short-channel effects and how DGFETs can circumvent such problems. We also
discussed the different kinds of DGFETs proposed in the literature. Thereafter, we systematically
showed why FinFETs have emerged dominant among DGFETs.
In Chapter 2, we detailed related work in the field of FinFETs. We first discussed various
lithographic techniques used to fabricate FinFETs. We pointed out that spacer lithography provides
double the fin density when compared to optical lithographic techniques. Further, spacer lithography
produces uniform fins, which enables better short-channel control. Thereafter, we reviewed work
done in the area of FinFET logic synthesis. We discussed various innovative FinFET standard cells
along with logic synthesis algorithms specifically tailored to FinFETs. We also reviewed work done
in the area of FinFET SRAMs, specifically, how the dual-gate structure of FinFETs can be exploited
to improve various SRAM metrics, such as the read margin, write margin and cell stability. We also
studied how metal gate workfunction engineering can serve as a substitute for sizing in SRAM cells.
Finally, we reviewed work done in the area of FinFET process variations.
In Chapter 3, we proposed a low-power FinFET circuit synthesis methodology using multiple
supply and threshold voltages. We proposed a mechanism called TCMS for improving the power
efficiency of FinFET circuits. This scheme represents a significant divergence from conventional
multiple-supply voltage schemes. It also obviates the need for voltage level-converters. We employed accurate delay and power estimates using table look-up methods based on HSPICE simulations
for supply voltage and threshold voltage optimization. Experimental results demonstrate
that TCMS can provide power savings of 67.6% and device area savings of 65.2% under relaxed
delay constraints. We also proposed two variants of TCMS that yield similar benefits. We compared our scheme to ECVS, a popular dual-Vdd scheme presented in the literature. ECVS makes
use of voltage level-converters. Even when it is assumed that these level-converters have zero delay,
thus significantly favoring ECVS in time-constrained power optimization, TCMS still outperforms
ECVS.
In Chapter 4, we proposed a low-power FinFET circuit synthesis methodology using surface
orientation optimization. FinFETs with channel surface along the <110> plane can be easily fabricated by rotating the fins by 45° from the <100> plane. By designing logic gates, which have
pFinFETs in the <110> plane and nFinFETs in the <100> plane, the gate delay can be reduced by
as much as 14%, compared to the conventional <100> logic gates. The delay reduction depends
upon the type of logic gate, dielectric constant of the oxide, and the technology node. The reduction in delay can be traded off for reduced power in FinFET circuits. We proposed a low-power
FinFET-based circuit synthesis methodology based on surface orientation optimization. We studied
various logic design styles, which depend on different FinFET channel orientations, for synthesizing low-power circuits. We used BSIM, a process/physics based double-gate model in HSPICE, to
derive accurate delay and power estimates. We designed layouts of standard library cells containing
FinFETs in different orientations to obtain an accurate area estimate for the low-power synthesized
netlists after place-and-route. We used a linear programming based optimization methodology that
gives power-optimized netlists, consisting of oriented gates, at tight delay constraints. Experimental
results demonstrated the efficacy of our scheme.
In Chapter 5, we proposed a die-level leakage power analysis algorithm for FinFET circuits
under process variations. We modeled the leakage probability density function in SG-, IG/LP-, and
MT-mode FinFET standard logic cells, and examined the leakage trade-offs in benchmark circuits
synthesized using combinations of SG-, LP-, and MT-mode logic cells under the effect of process
variations. Using quasi-Monte Carlo mixed-mode device simulations in Sentaurus TCAD, we developed simple macromodels to capture the physical effects that influence the leakage spread in
SG- and IG-mode FinFET devices, and extended it to stacked devices in NAND/NOR gates. We
also implemented a methodology to obtain the overall leakage current distribution for large circuits
(synthesized using SG/LP/MT-mode logic cells) using Latin hypercube sampling, considering spatial correlation on a quad-tree based grid. Results indicated that, starting from a 100% SG-mode
circuit, the leakage spread/yield point can be improved considerably by suitably introducing LP-mode and MT-mode gates at iso-delay. We also showed that increasing the fraction of LP/MT-mode
gates (to reduce the mean and variance in leakage) in an SG-mode circuit, by permitting a delay
slack, yields diminishing returns. Mixing LP- and MT-mode gates with SG-mode gates appeared
to be a promising synthesis strategy that can leverage the leakage trade-offs offered by FinFET
standard cells.
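As an illustration of the sampling step summarized above, the following minimal Python sketch draws Latin hypercube samples and maps them to a total leakage value per sampled die. It is not the implementation used in Chapter 5: the two-level (die plus local) correlation model stands in for the quad-tree grid, and the lognormal leakage model and all names and parameter values (rho, sigma, i_nominal) are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, mean


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # decouple the stratum order across dimensions
        columns.append(strata)
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def die_leakage_samples(n_samples=200, n_gates=4, rho=0.5, sigma=0.3,
                        i_nominal=1.0, seed=0):
    """Sample total leakage of a toy circuit (assumed lognormal per-gate leakage)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Dimension 0 is a die-level deviation shared by all gates (assumption);
    # the remaining dimensions are independent per-gate deviations.
    points = latin_hypercube(n_samples, n_gates + 1, rng)
    totals = []
    for point in points:
        # Clamp to the open interval (0, 1) before applying the inverse normal CDF.
        z = [std_normal.inv_cdf(min(max(u, 1e-12), 1.0 - 1e-12)) for u in point]
        z_die, z_local = z[0], z[1:]
        total = 0.0
        for zl in z_local:
            z_gate = math.sqrt(rho) * z_die + math.sqrt(1.0 - rho) * zl
            total += i_nominal * math.exp(sigma * z_gate)
        totals.append(total)
    return totals


totals = die_leakage_samples()
print("mean leakage:", mean(totals), "max leakage:", max(totals))
```

Because every dimension is stratified, each marginal distribution is covered evenly even with a modest number of samples, which is the property that makes Latin hypercube sampling attractive compared with plain Monte Carlo.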
In Chapter 6, we presented a statistical delay characterization of FinFET standard cells based on
design of experiments using response surface methodology (RSM). We statistically characterized
the delay of FinFET standard cells under spatial and environmental variations, using central composite rotatable design (CCRD) based on RSM. We identified the most critical parameters that affect
timing arcs of logic cells under lithographic process variations. We also showed that the delay trend
based on variations in a key process parameter is the opposite of what one would expect in
conventional CMOS technology. These results formed the foundation of variation-aware (environmental and lithographic) delay models for FinFET standard cells (NAND and INV) implemented
in different logic styles, e.g., SG and LP. Results showed that the delays obtained from the RSM models
developed for various standard cells are in close agreement with the delays obtained from Monte
Carlo simulations of the logic cells.
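For reference, the coded design points of a CCRD can be generated as in the sketch below. It assumes a full two-level factorial core, for which the rotatability condition gives an axial distance alpha = (2^k)^(1/4) for k factors. The function names, the number of center points, and the example factor values are hypothetical and are not taken from Chapter 6.

```python
import itertools


def ccrd_points(k, n_center=1):
    """Coded points of a central composite rotatable design with k factors."""
    alpha = (2 ** k) ** 0.25  # rotatability condition for a full factorial core
    # 2^k factorial (corner) points at coded levels -1 and +1.
    factorial = [list(p) for p in itertools.product((-1.0, 1.0), repeat=k)]
    # 2k axial (star) points at distance alpha along each factor axis.
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    # Replicated center points, used to estimate pure error.
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center


def decode(point, nominal, step):
    """Map coded levels to physical values: nominal + coded * step."""
    return [n + c * s for c, n, s in zip(point, nominal, step)]


design = ccrd_points(k=2, n_center=3)
# Hypothetical factors: gate length (nm) and temperature (deg C).
runs = [decode(p, nominal=[32.0, 75.0], step=[2.0, 25.0]) for p in design]
for run in runs:
    print(run)
```

Each row of `runs` is one simulation setting; a second-order response surface would then be fitted to the delays measured at these settings.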
In summary, in this dissertation, we discussed some innovative low-power synthesis algorithms/tools
that exploit the unique characteristics of FinFETs. Further, we proposed a variation-aware synthesis algorithm that takes into account the subthreshold leakage of logic gates. We also proposed
a methodology to calculate the probability density function of die-level leakage power of FinFET
circuits. Finally, we statistically characterized the delay of various FinFET standard cells using CCRD.
There are several areas related to the present work that can be explored further in the future:
In Chapter 5, we proposed a method for calculating die-level leakage power of FinFET circuits under process variations. However, the scheme does not take into account spatial or
temporal variations in the die temperature. Since FinFETs are likely to suffer from the ill
effects of self-heating, it is important to analyze chip-level leakage distribution of FinFET
circuits under intra-die and inter-die temperature variations.

One can develop a statistical static timing analysis (SSTA) algorithm, which specifically exploits the dual-gate structure of FinFETs. Such an algorithm can employ the delay models
from Chapter 6. Further, we can use the SSTA algorithm to develop a joint delay/leakage
optimization synthesis algorithm under process variations.
105
Bibliography
[1] E. J. Nowak, I. Aller, T. Ludwig, K. Kim, R. V. Joshi, C.-T. Chuang, K. Bernstein, and
R. Puri, Turning silicon on its edge, IEEE Circuits and Devices Magazine, vol. 20, no. 1,
pp. 2031, Jan.-Feb. 2004.
[2] H.-S. P. Wong, K. K. Chan, and Y. Taur, Self-aligned (top and bottom) double-gate MOSFET with a 25 nm thick silicon channel, in Proc. Int. Electronic Device Mtg., Dec. 1997, pp.
427430.
[3] T.-J. King, FinFETs for nanoscale CMOS digital integrated circuits, in Proc. Int. Conf.
Computer-Aided Design, Nov. 2005, pp. 207210.
[4] 2007
International
Technology
Roadmap
for
Semiconductors,
http://www.itrs.net/Links/2007ITRS/Home2007.htm.
[5] Y.-K. Choi, T.-J. King, and C. Hu, Nanoscale CMOS spacer FinFET for the terabit era,
IEEE Electronic Device Lett., vol. 23, no. 1, pp. 2527, Jan. 2002.
[6] A. Muttreja, N. Agarwal, and N. K. Jha, CMOS logic design with independent gate FinFETs, in Proc. Int. Conf. Computer Design, Oct. 2007, pp. 560567.
[7] A. N. Bhoj and N. K. Jha, Pragmatic design of gated-diode FinFET DRAMs, in Proc. Int.
Conf. Computer Design, Oct. 2009, pp. 747751.
[8] K. Bernstein, C.-T. Chuang, R. V. Joshi, and R. Puri, Design and CAD challenges in sub90nm CMOS technologies, in Proc. Int. Conf. Computer-Aided Design, Nov. 2003, pp. 129
136.
106
[9] L. Chang, M. Ieong, and M. Yang, CMOS circuit performance enhancement by surface
orientation optimization, IEEE Trans. Electron Devices, vol. 51, pp. 16211627, Oct. 2004.
[10] S. Ganapath et al., Circuit propagation delay estimation through multivariate regressionbased modeling under spatio-temporal variability, in Proc. Design Automation & Test Europe Conf., Mar. 2010, pp. 417422.
[11] TSMC, http://www.eetimes.com/electronics-news/4213622/TSMC-to-make-FinFETs-in450-mm-fab.
[12] J.-H. Yang, Y.-S. Jin, H.-R. Lee, K.-S. Rha, J.-A. Choi, S.-K. Bae, S. Maeda, Y.-W. Kim,
and K.-P. Suh, Fully working 1.25m2 6T-SRAM cell with 45nm gate length triple gate
transistors, in Proc. Int. Electronic Device Mtg., Dec. 2003, pp. 2.1.12.1.4.
[13] 22nm FinFET SRAM, http://www.eetimes.com/electronics-news/4199830/IBM-partnersto-report-22-nm-FinFET-SRAM.
[14] Infineon FinFET chip, http://www.dailytech.com/Infineon+Tests+3D/article5208.htm.
[15] B. Doyle, B. Boyanov, S. Datta, M. Doczy, S. Hareland, B. Jin, J. Kavalieros, T. Linton,
R. Rios, and R. Chau, Tri-gate fully-depleted CMOS transistors: Fabrication, design and
layout, in Proc. Int. Symp. VLSI Technology, June 2003, pp. 133134.
[16] R. Dennard, F. Gaensslen, V. Rideout, E. Bassous, and A. LeBlanc, Design of ion-implanted
MOSFETs with very small physical dimensions, IEEE J. Solid-State Circuits, vol. 9, no. 5,
pp. 256268, Oct. 1974.
[17] Y.-K. Choi, N. Lindert, P. Xuan, S. Tang, D. Ha, E. Anderson, T.-J. King, J. Bokor, and C. Hu,
Sub-20 nm CMOS FinFET technologies, in Proc. Int. Electronic Device Mtg., 2001, pp.
19.1.119.1.4.
[18] X. Huang, W.-C. Lee, C. Kuo, D. Hisamoto, L. Chang, J. Kedzierski, E. Anderson,
H. Takeuchi, Y.-K. Choi, K. Asano, V. Subramanian, T.-J. King, J. Bokor, and C. Hu, Sub50nm FinFET: PMOS, in Proc. Int. Electronic Device Mtg., 1999, pp. 6770.
107
[19] D. Frank, Y. Taur, and H.-S. P. Wong, Future prospects for Si CMOS technology, in Proc.
Device Research Conf., 1999, pp. 1821.
[20] Y.-K. Choi, D. Ha, T.-J. King, and C. Hu, Threshold voltage shift by quantum confinement
in ultra-thin body device, in Proc. Device Research Conf., 2001, pp. 8586.
[21] A. Datta, A. Goel, R. T. Cakici, H. Mahmoodi, D. Lakshmanan, and K. Roy, Modeling
and circuit synthesis for independently controlled double gate FinFET devices, IEEE Trans.
Computer-Aided Design, vol. 26, no. 11, pp. 19571966, Nov. 2007.
[22] J. Ouyang and Y. Xie, Power optimization for FinFET based circuits using genetic algorithms, in Proc. IEEE Int. SOC Conf., Sept. 2008, pp. 211214.
[23] R. A. Thakker, C. Sathe, A. B. Sachid, M. Shojaei-Baghini, V. R. Rao, and M. B. Patil,
A novel table-based approach for design of FinFET circuits, IEEE Trans. Computer-Aided
Design, vol. 28, no. 7, pp. 10611070, July 2009.
[24] T. Ludwig, I. Aller, V. Gernhoefer, J. Keinert, E. Nowak, R. Joshi, A. Mueller, and
S. Tomaschko, FinFET technology for future microprocessors, in Proc. Int. SOI Conf.,
Oct. 2003, pp. 3334.
[25] K. Anil, K. Henson, S. Biesemans, and N. Collaert, Layout density analysis of FinFETs, in
Proc. European Conf., Solid-State Device Research, 2003, pp. 139142.
[26] M. Alioto, Analysis and evaluation of layout density of FinFET logic gates, in Proc. Int.
Conf. Microelectronics, Dec. 2009, pp. 106109.
[27] , Analysis of layout density in FinFET standard cells and impact of fin technology, in
Proc. Int. Symp. Circuits & Systems, May/June 2010, pp. 32043207.
[28] , Comparative evaluation of layout density in 3T, 4T, and MT FinFET standard cells,
IEEE Trans. VLSI Systems, vol. 19, no. 5, pp. 751762, May 2011.
[29] R. Joshi, K. Kim, and R. Kanj, FinFET SRAM design, in Proc. Int. Conf. VLSI Design,
Jan. 2010, pp. 440445.
108
[30] R. V. Joshi, K. Kim, R. Q. Williams, E. J. Nowak, and C.-T. Chuang, A high-performance,

low leakage, and stable SRAM row-based back-gate biasing scheme in FinFET technology,
in Proc. Int. Conf. VLSI Design, Jan. 2007, pp. 665672.
[31] M.-L. Fan, Y.-S. Wu, V.-H. Hu, P. Su, and C.-T. Chuang, Investigation of static noise margin
of FinFET SRAM cells in sub-threshold region, in Proc. Int. SOI Conf., Oct. 2009, pp. 12.
[32] A. Bansal, S. Mukhopadhyay, and K. Roy, Device-optimization technique for robust and
low-power FinFET SRAM design in nanoscale era, IEEE Trans. Electron Devices, vol. 54,
no. 6, pp. 14091419, June 2007.
[33] H. Ananthan, A. Bansal, and K. Roy, FinFET SRAM - device and circuit design considerations, in Proc. Int. Symp. Quality of Electronic Design, 2004, pp. 511516.
[34] D. Lekshmanan, A. Bansal, and K. Roy, FinFET SRAM: Optimizing silicon fin thickness
and fin ratio to improve stability at iso area, in Proc. Custom Integrated Circuits Conf., Sep.
2007, pp. 623626.
[35] S. Tawfik and V. Kursun, Work-function engineering for reduced power and higher integration density: An alternative to sizing for stability in FinFET memory circuits, in Proc. Int.
Symp. Circuits & Systems, May 2008, pp. 788791.
[36] S. Gangwal, S. Mukhopadhyay, and K. Roy, Optimization of surface orientation for highperformance, low-power and robust FinFET SRAM, in Proc. Custom Integrated Circuits
Conf., Sep. 2006, pp. 433436.
[37] A. Gattiker, S. Nassif, R. Dinakar, and C. Long, Timing yield estimation from static timing
analysis, in Proc. Int. Symp. Quality of Electronic Design, 2001, pp. 437442.
[38] J. Jess, K. Kalafala, S. Naidu, R. Otten, and C. Visweswariah, Statistical timing for parametric yield prediction of digital integrated circuits, IEEE Trans. Computer-Aided Design,
vol. 25, no. 11, pp. 23762392, Nov. 2006.
[39] C. Visweswariah, K. Ravindran, K. Kalafala, S. Walker, S. Narayan, D. Beece, J. Piaget,
N. Venkateswaran, and J. Hemmett, First-order incremental block-based statistical timing
analysis, IEEE Trans. Computer-Aided Design, vol. 25, no. 10, pp. 21702180, Oct. 2006.
109
[40] A. Agarwal, D. Blaauw, V. Zolotov, and S. Vrudhula, Computation and refinement of statistical bounds on circuit delay, in Proc. Design Automation Conf., June 2003, pp. 348353.
[41] L. Scheffer, Explicit computation of performance as a function of process variation, in
Proc. ACM/IEEE Int. Wkshp. on Timing Issues in the Specification and Synthesis of Digital
Systems, 2002, pp. 18.
[42] A. Agarwal et al., Statistical timing analysis for intra-die process variations with spatial
correlation, in Proc. Int. Conf. Computer-Aided Design, Nov. 2003, pp. 900907.
[43] H. Chang and S. Sapatnekar, Statistical timing analysis considering spatial correlations using a single PERT-like traversal, in Proc. Int. Conf. Computer-Aided Design, Nov. 2003, pp.
621625.
[44] H. Chang, V. Zolotov, S. Narayan, and C. Visweswariah, Parameterized block-based statistical timing analysis with non-Gaussian parameters, nonlinear delay functions, in Proc.
Design Automation Conf., June 2005, pp. 7176.
[45] A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, Modeling and analysis of leakage power
considering within-die process variations, in Proc. Int. Symp. Low Power Electronics &
Design, 2002, pp. 6467.
[46] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, Statistical estimation of leakage current
considering inter- and intra-die process variation, in Proc. Int. Symp. Low Power Electronics
& Design, Aug. 2003, pp. 8489.
[47] A. Agarwal, K. Kang, and K. Roy, Accurate estimation and modeling of total chip leakage considering inter-and intra-die process variations, in Proc. Int. Conf. Computer-Aided
Design, Nov. 2005, pp. 736741.
[48] V. W. S. Zhang and K. Banerjee, A probabilistic framework to estimate full-chip subthreshold leakage power distribution considering within-die and die-to-die P-V-T variations, in
Proc. Int. Symp. Low Power Electronics & Design, 2004, pp. 156161.
110
[49] H. Dadgour, S.-C. Lin, and K. Banerjee, A statistical framework for estimation of fullchip leakage-power distribution under parameter variations, IEEE Trans. Electron Devices,
vol. 54, no. 11, pp. 29302945, Nov. 2007.
[50] J. Gu, J. Keane, S. Sapatnekar, and C. H. Kim, Statistical leakage estimation of double gate
FinFET devices considering the width quantization property, IEEE Trans. VLSI Systems,
vol. 16, pp. 206209, Feb. 2008.
[51] J. H. Choi, J. Murthy, and K. Roy, The effect of process variation on device temperatures in
FinFET circuits, in Proc. Int. Conf. Computer-Aided Design, Nov. 2007, pp. 747751.
[52] H. Khan, D. Mamaluy, and D. Vasileska, Simulation of the impact of process variation on
the optimized 10-nm FinFET, IEEE Trans. Electron Devices, vol. 55, no. 8, pp. 21342141,
Aug. 2008.
[53] S. Xiong and J. Bokor, Sensitivity of double-gate and FinFET devices to process variations,
IEEE Trans. Electron Devices, vol. 50, pp. 22552261, Nov. 2003.
[54] S. Rasouli, K. Endo, and K. Banerjee, Variability analysis of FinFET-based devices and
circuits considering electrical confinement and width quantization, in Proc. Int. Conf.
[55] B. Yu et al., FinFET scaling to 10nm gate length, in Proc. Int. Electronic Device Mtg.,
2002, pp. 251254.
[56] B. Swahn and S. Hassoun, Gate sizing: FinFETs vs. 32nm bulk MOSFETs, in Proc. Design
Automation Conf., July 2006, pp. 528531.
[57] K. Usami and M. Horowitz, Clustered voltage scaling technique for low-power design, in
Proc. Int. Symp. Low Power Electronics & Design, Aug. 1995, pp. 38.
[58] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanzawa, M. Ichida, and K. Nogami,
Automated low-power technique exploiting multiple supply voltages applied to a media
processor, IEEE J. Solid-State Circuits, vol. 33, no. 3, pp. 463472, Mar. 1998.
111
[59] K. Roy, L. Wei, and Z. Chen, Multiple-Vdd and multiple-Vth CMOS (MVCMOS) for lowpower applications, in Proc. Int. Symp. Computer Architecture, Oct. 1999, pp. 366370.
[60] P. Mishra, A. Muttreja, and N. K. Jha, Evaluation of multiple supply and threshold voltages
for low-power circuit synthesis, in Proc. Int. Symp. Nanoscale Architectures, June 2008, pp.
7784.
[61] , Low-power FinFET circuit synthesis using multiple supply and threshold voltages,
ACM J. Emerging Technologies in Computing Systems, July 2009.
[62] H. Mahmoodi, S. Mukhopadhyay, and K. Roy, High performance and low power domino
logic using independent gate control in double-gate SOI MOSFETs, in Proc. Int. SOI Conf.,
Oct. 2004, pp. 6768.
[63] L. Wei, Z. Chen, and K. Roy, Double gate dynamic threshold voltage (DGDT) SOI MOSFETs for low power high performance designs, in Proc. Int. SOI Conf., Oct. 1997, pp. 8283.
[64] P. Beckett, Low-power circuits using dynamic threshold voltage devices, in Proc. Great
Lakes Symp. VLSI, Apr. 2005, pp. 213216.
[65] M.-H. Chiang, K. Kim, C. Tretz, and C.-T. Chuang, Novel high-density low-power logic
circuit techniques using DG devices, IEEE Electronic Device Lett., vol. 52, no. 10, pp.
23392342, Oct. 2005.
[66] W. Zhang, J. G. Fossum, L. Mathew, and Y. Du, Physical insights regarding design and
performance of independent-gate FinFETs, IEEE Electronic Device Lett., vol. 52, no. 10,
pp. 21892206, Oct. 2005.
[67] T. Cakici, H. Mahmoodi, S. Mukhopadhyay, and K. Roy, Independent gate skewed logic in
double-gate SOI technology, in Proc. Int. SOI Conf., Oct. 2005, pp. 8384.
[68] A. Muttreja, P. Mishra, and N. K. Jha, Threshold voltage control through multiple supply
voltages for power-efficient FinFET interconnects, in Proc. Int. Conf. VLSI Design, Jan.
2008.
112
[69] V. P. Trivedi, J. G. Fossum, and W. Zhang, Threshold voltage and bulk inversion effects in
nonclassical CMOS devices with undoped ultra-thin bodies, Solid-State Electronics, vol. 1,
pp. 170178, Dec. 2007.
[70] M. Popovich, E. G. Friedman, M. Sotman, and A. Kolodny, On-chip power distribution
grids with multiple supply voltages for high performance integrated circuits, in Proc. Great
Lakes Symp. VLSI, Apr. 2005, pp. 27.
[71] W. Zhao and Y. Cao, New generation of predictive technology model for sub-45nm design exploration, in Proc. Int. Symp. Quality of Electronic Design, May 2006, pp. 585590,
http://www.eas.asu.edu/ ptm.
[72] , Predictive technology model for nano-CMOS design exploration, ACM J. Emerging
Technologies in Computing Systems, vol. 3, no. 1, pp. 117, Apr. 2007.
[73] F. Wang, Y. Xie, K. Bernstein, and Y. Luo, Dependability analysis of FinFET circuits, in
Proc. Symp. Emerging VLSI Technologies and Architectures, Mar. 2006, pp. 399404.
[74] T. Sairam, W. Zhao, and Y. Cao, Optimizing FinFET technology for high-speed and lowpower design, in Proc. Great Lakes Symp. VLSI, Mar. 2007, pp. 7377.
[75] A. U. Diril, Y. S. Dhillon, A. Chatterjee, and A. D. Singh, Level-shifter free design of low
power dual supply voltage CMOS circuits using dual threshold voltages, IEEE Trans. VLSI
Systems, vol. 13, no. 9, pp. 11031107, Sept. 2005.
[76] L. Chang, S. Tang, T.-J. King, J. Bokor, and C. Hu, Gate length scaling and threshold
voltage control of double-gate MOSFETs, in Proc. Int. Electronic Device Mtg., Dec. 2000,
pp. 719722.
[77] D. Sylvester and K. Keutzer, Getting to the bottom of deep submicron, in Proc. Int. Conf.
[78] D. Chinnery and K. Keutzer, Linear programming for sizing, Vdd and Vth assignment, in
Proc. Int. Symp. Low Power Electronics & Design, Aug. 2005, pp. 149154.
113
[79] A. Srivastava and D. Sylvester, Minimizing total power by simultaneous Vdd /Vth assignment, in Proc. Asia South Pacific Design Automation Conf., Jan. 2003, pp. 400403.
[80] L. Su et al., Measurement and modelling of self-heating in SOI nMOSFETs, IEEE Electronic Device Lett., vol. 41, pp. 6975, Jan. 1994.
[81] S. Gangwal, S. Mukopadhyay, and K. Roy, Optimization for surface orientation for highperformance, low-power and robust FinFET SRAM, in Proc. Custom Integrated Circuits
Conf., Sept. 2006, pp. 433436.
[82] M. V. Dunga et al., BSIM-MG: A versatile multi-gate FET model for mixed-signal design,
in Proc. Int. Symp. VLSI Technology, June 2007, pp. 6061.
[83] D. D. Lu, M. V. Dunga, C. Lin, A. Niknejad, and C. Hu, A multi-gate MOSFET compact
model featuring independent gate-operation, in Proc. Int. Electronic Device Mtg., Dec. 2007,
pp. 565568.
[84] P. Mishra and N. K. Jha, Low-power FinFET circuit synthesis using surface orientation
optimization, in Proc. Design Automation & Test Europe Conf., Mar. 2010.
[85] J. K. Ousterhout, C. T. Hamachi, R. N. Mayo, W. S. Scott, and G. S. Taylor, Magic: A VLSI
layout system, in Proc. Design Automation Conf., June 1984, pp. 152159.
[86] C. Sechen and A. Sangiovanni-Vincentelli, The Timberwolf placement and routing package, in Proc. Custom Integrated Circuits Conf., May 1984, pp. 522527.
[87] J. Colinge, FinFETs and Other Multi-gate Transistors. Springer, New York, 2008.
[88] P. Mishra, A. Bhoj, and N. K. Jha, Die level leakage power analysis of FinFET circuits
considering process variations, in Proc. Int. Symp. Quality of Electronic Design, Mar. 2010,
pp. 347355.
[89] M. Agostinelli, M. Alioto, D. Esseni, and L. Selmi, Design and evaluation of mixed 3T4T FinFET stacks for leakage reduction, in Proc. Int. Wkshp. Power and Timing Modeling,
Optimization, and Simulation, Sept. 2008.
114
[90] A. Kumar, B. A. Minch, and S. Tiwari, Low voltage and performance tunable CMOS circuit
design using independently driven double gate MOSFETs, in Proc. Int. SOI Conf., Oct.
2004.
[91] S. A. Tawfik and V. Kursun, High speed FinFET domino logic circuits using independent
gate-biased double-gate keepers providing dynamically adjusted immunity to noise, in Proc.
Int. Conf. Microelectronics, Dec. 2007, pp. 175178.
[92] H. Ananthan and K. Roy, A fully physical model for leakage distribution under process
variations in nanoscale double-gate CMOS, in Proc. Design Automation Conf., July 2006,
pp. 413419.
[93] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, Statistical estimation of leakage current
considering inter-and intra-die process variation, in Proc. Int. Symp. Low Power Electronics
& Design, Aug. 2003, pp. 8489.
[94] H. Chang and S. S. Sapatnekar, Full-chip analysis of leakage power under process variations,
including spatial correlations, in Proc. Design Automation Conf., June 2005, pp. 523528.
[95] J. Fossum et al., A process-physics based compact model for nanoclassical CMOS device
and circuit design, Solid-State Electronics, vol. 48, pp. 919926, June 2004.
[96] Sentaurus TCAD, HSPICE, Design Compiler manuals. http://www.synopsys.com.
[97] A. Singhee and R. A. Rutenbar, From finance to flip flops: A study of fast quasi-Monte
Carlo methods from computational finance applied to statistical circuit analysis, in Proc.
Int. Symp. Quality of Electronic Design, Mar. 2007, pp. 685692.
[98] Y. Taur et al., A continuous, analytic drain current model for DG MOSFETs, IEEE Electronic Device Lett., vol. 25, no. 2, pp. 107109, Feb. 2004.
[99] W. Zhang, J. G. Fossum, L. Mathew, and Y. Du, Physical insights regarding design and performance of independent-gate FinFETs, IEEE Trans. Electron Devices, vol. 52, pp. 2198
2206, Oct. 2005.
115
[100] S. Bhardwaj, S. Vrudhula, P. Ghanta, and Y. Cao, Modeling of intra-die process variations
for accurate analysis and optimization of nano-scale circuits, in Proc. Design Automation
Conf., July 2006, pp. 791796.
[101] J. Xiong, V. Zolotov, and L. He, Robust extraction of spatial correlation, in Proc. Int. Symp.
Quality of Electronic Design, Aug. 2007, pp. 619631.
[102] N. Higham, Computing the nearest correlation matrix - a problem from finance, IMA Journal of Numerical Analysis, pp. 329343, July 2002.
[103] J. Rabaey, A. Chandrakashan, and B. Nikolic, Digital Integrated Circuits, 2nd ed.
Prentice
Hall, NJ, 2003.

[104] S. Soleimani, A. Afzali-Kusha, and B. Fourozadeh, Temperature dependence of propagation
delay characteristics in FinFET circuits, in Proc. Int. Conf. Microelectronics, Dec. 2008.
[105] F. Gamiz et al., Monte Carlo simulation of electron transport properties in extremely thin
SOI MOSFETs, IEEE Trans. Electron Devices, vol. 45, no. 5, pp. 11221126, May 1998.
[106] A. Mutlu and M. Rahman, Statistical methods for the estimation of process variation effects
on circuit operation, IEEE Trans. Electronics Packaging Manufacturing, vol. 28, no. 4, pp.
364375, Oct. 2005.
[107] G. E. P. Box, W. G. Hunter, and J. S. Hunter, Statistics for Experimenters: An Introduction to
Design, Data Analysis and Model Building. John Wiley and Sons, New York, 1978.
[108] Engineering statistics handbook, http://www.itl.nist.gov/div898/handbook.
[109] N. Aslan, Application of response surface methodology and central composite rotatable design for modeling and optimization of a multi-gravity separator for chromite concentration,
Powder Technology, vol. 185, no. 1, pp. 8086, 2008.
116

