
CG1001 Introduction to Computer Engineering
Module on Systems-on-Chip: Challenges and Directions for Green VLSI Processing Circuits
Massimo Alioto
ECE Department
National University of Singapore (NUS)
Outline
+ The framework: Computing platforms in the broad sense
+ Historical Trends towards Multi-Core through Moore's Law
+ Aggressive Voltage Scaling, Minimum-Energy Computation and Limits
+ Opportunities to Improve Energy Efficiency/Voltage Scalability
+ Beyond-CMOS Ultra-Low Voltage Circuits
+ Conclusions
prof. Massimo Alioto
The Framework: Computing Platforms in the Broad Sense
Computing Platforms: The Big Picture
+Computing/sensing platforms are rapidly expanding*
+ networks move towards macro and nano scale
+ nano scale (self-powered nodes): ubiquitous computing/sensing
+ meso scale (portable/handheld)
+ macro scale (data centers): cloud computing
+New concepts
+ Internet of Things
+ CEntral Nervous System for the Earth
+ data-centric collective intelligence, ...
+New applications
+ personalized services
+ intelligent transportation
+ advanced water/energy management, ...
* adapted from MuSyC FCRP center
Historical Trends towards Multi-Core through Moore's Law
CMOS Integrated Circuits
+MOS transistor
+ first transistor: Shockley-Brattain-Bardeen (1947, Bell Labs)
+ cheap: silicon based
+Integrated Circuit (IC)
+ multiple transistors + interconnects = IC ("chip")
+ Jack Kilby (1958) demonstrated the 1st IC
[Figure: chip, packaged chip, PCB]
Gordon Moore's Prediction
+CMOS technology scaling
+ dimensions X, Y, Z of the previous generation shrink to 0.7X, 0.7Y, 0.7Z in the next generation
+ 2X more transistors/chip
+Prediction in 1965 (not a law)
+ Moore's law: 1 generation/24 months
+ exponential growth in transistor count
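The 2X figure follows from the 0.7X shrink: the area of a transistor scales by roughly 0.7 x 0.7 = 0.49, so about twice as many fit on the same die. A minimal Python sketch of this arithmetic is given below; the function names and the starting count of 1,000 transistors are illustrative assumptions, not values from the slides.

```python
# Idealized scaling arithmetic; helper names are illustrative, not from the slides.

def density_gain(linear_shrink: float = 0.7) -> float:
    """Transistors per unit area gained when every dimension shrinks by `linear_shrink`."""
    return 1.0 / (linear_shrink ** 2)


def transistor_count(initial: float, years: float,
                     months_per_generation: float = 24.0) -> float:
    """Transistor count after `years`, doubling once per generation."""
    generations = years * 12.0 / months_per_generation
    return initial * 2.0 ** generations


if __name__ == "__main__":
    print(f"density gain per generation: {density_gain():.2f}X")
    # Hypothetical starting point: 1,000 transistors, 20 years of scaling.
    print(f"transistors after 20 years: {transistor_count(1_000, 20):,.0f}")
```

With one generation every 24 months, 20 years correspond to 10 doublings, i.e. about a 1,000-fold increase.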
+Unbelievably accurate
+ based on few points
+ self-fulfilling prophecy
+ semiconductor industry: high risk, high ROI (320 B$ in 2013)
+ costly fab (4 B$ @ 32 nm)
+ fast obsolescence (5 yrs)
+ volatile, expands (more apps)
[Figure: feature size scaling from 10 µm down to 22 nm; growth-rate annotations of 12%/yr, 4%/yr and 6.3%/yr. Source: White Paper "Forecasting the 450mm Ramp Up", IC Knowledge LLC]
+ colossal investments
+ coordinated by International Technology Roadmap for Semiconductors (ITRS)
+ process, device, circuits
+ challenges, performance, consumption, capabilities, ...
As a Result of CMOS Scaling
+CMOS scaling trends for microprocessors (macro scale)
+ before 2005: Moore's law + Dennard's scaling (voltage scaled down at each generation)
+ exponential growth in # of transistors and performance
[Figure: Intel 4004, 12 mm²]
+ exponential growth in power consumption (area limited)
+ 2005: power density reached max (40 W/cm², 100-150 W)
+ Thermal Design Power (TDP): 30-130 W (mobile-server)
+ every performance increase requires better energy efficiency
+ Power vs. energy efficiency
$P_{chip} = \text{throughput} \cdot E_{op} + P_{lkg}$
+ improve energy per operation ($E_{op}$) to increase throughput
+ keep leakage power small enough (10-15%)
+ Multi-core systems
+ $E_{op} \propto V^2$
+ single-core performance $\propto V$
+ reduce $E_{op}$ by using lower V
+ use multiple cores to improve performance
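Under a fixed power budget, the relation P_chip = throughput x E_op + P_lkg can be solved for throughput, which makes explicit why a lower energy per operation is the only way to compute more. The sketch below is a minimal illustration; the 100 W budget and the 1 nJ/operation figure in the usage note are hypothetical, and the 10% leakage share is taken from the 10-15% range quoted on the slide.

```python
def max_throughput(p_chip_w: float, e_op_j: float,
                   leakage_fraction: float = 0.10) -> float:
    """Operations per second sustainable within a chip power budget.

    Solves P_chip = throughput * E_op + P_lkg for throughput, with the
    leakage power expressed as a fraction of the total budget.
    """
    p_lkg = leakage_fraction * p_chip_w
    return (p_chip_w - p_lkg) / e_op_j


if __name__ == "__main__":
    # Hypothetical numbers: 100 W budget, 1 nJ per operation.
    base = max_throughput(100.0, 1e-9)
    better = max_throughput(100.0, 0.5e-9)
    print(f"baseline: {base:.3g} op/s, halved E_op: {better:.3g} op/s")
```

At constant power, halving the energy per operation doubles the achievable throughput.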
Multi-Core: Numerical Example
+Post-Dennard scaling
+ keep $V_{TH}$, $V_{DD}$ ~constant
+ performance becomes power limited → use area (more transistors available → use them to improve efficiency)
+ use silicon for low power density blocks (cache: 10 W/cm²) that strongly impact total speed, rather than logic (30 W/cm²)
D. Frank, "Power Constrained CMOS Scaling Limits," IBM J. Res. & Dev., vol. 46, no. 2/3, March/May 2002
+ Example (iso power/technology):

cores   supply          frequency   area   power   throughput
1       $V_{DD}$        f           1      1       1
2       0.8 $V_{DD}$    0.8 f       2      1       1.6
4       0.63 $V_{DD}$   0.63 f      4      1       1.6² ≈ 2.5
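The iso-power example can be reproduced with a few lines of Python. The sketch assumes, as the slides do, that frequency scales with the supply voltage and that per-core power scales as V² · f, so per-core power goes as the cube of the scaling factor; it also assumes perfect parallel speedup, which is an idealization added here. The function name is illustrative.

```python
def iso_power_multicore(n_cores: int) -> dict:
    """Scale V_DD and f by the same factor s so that n_cores consume the power of one core.

    Assumes per-core power ~ V^2 * f and f ~ V, hence per-core power ~ s^3.
    """
    s = (1.0 / n_cores) ** (1.0 / 3.0)   # iso-power condition: n * s^3 = 1
    return {
        "cores": n_cores,
        "voltage_scale": s,
        "frequency_scale": s,
        "area": n_cores,
        "power": n_cores * s ** 3,
        "throughput": n_cores * s,       # perfect parallel speedup assumed
    }


if __name__ == "__main__":
    for n in (1, 2, 4):
        r = iso_power_multicore(n)
        print(f"{n} core(s): V x{r['voltage_scale']:.2f}, "
              f"power {r['power']:.2f}, throughput {r['throughput']:.2f}")
```

For 2 and 4 cores the voltage factors come out at about 0.79 and 0.63, and the throughput at about 1.6 and 2.5, in line with the rounded values on the slide.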
Multi-Core Scaling
+Multi-core era will not last long
+ [ISCA2011]: announced catastrophe, "Dark Silicon and the End of Multicore Scaling", due to inadequate energy efficiency
+ percentage of unusable "dark" silicon is growing fast
+ new power crisis in 2016 for processors: no reason for scaling
[ISCA2011] H. Esmaeilzadeh et al., "Dark Silicon and the End of Multicore Scaling," ISCA, June 2011
A Broader View of Dark Silicon
+At macro scale, "dark" refers to spatial dimension
+At nano scale (self-powered nodes)
+ inadequate energy efficiency → dark silicon along the temporal dimension (intermittent available power)
[Figure: available energy vs. time t, alternating between "no operation" and "normal operation" intervals]
+At meso scale (portable)
+ dark silicon in both spatial (power constraint ~1-2 W) and temporal dimension (limited lifetime @ given functionality)
The Energy Efficiency Challenge
+Remove dark silicon to make the best of scaling
+ clever ideas to replace it with "green silicon"
+ needs to be done from nano to macro scale (globally green)
+ nano-macro scale platforms can evolve only if energy issues are solved at all levels
Green IC Group
+Painting silicon green: mission of Green IC group
+ www.green-ic.org
Aggressive Voltage Scaling, Minimum-Energy Computation and Limits
Voltage Scaling: Dynamic Energy
+If dynamic energy per clock cycle dominates:
$E_{dyn} = \alpha_{SW} C V_{DD}^2$
+ affected by switching activity, capacitance, voltage
+ reduce $V_{DD}$ as much as possible
+ energy reduction limited by $V_{DD,min}$ (functional/timing failures)
[Figure: $E_{dyn}$ vs. $V_{DD}$, decreasing down to $V_{DD,min}$]
Voltage Scaling: Leakage Energy
+If leakage (static) energy per cycle dominates
$E_{lkg} = V_{DD} I_{off} T_{CK}$
+ affected by
+ supply voltage, leakage, clock cycle (logic depth*gate delay)
+ $V_{DD}$ reduction and trends
+ $V_{DD}$: linear
+ $I_{off}$: ~constant
+ $T_{CK}$: exponentially growing
+ $E_{lkg}$ exponentially increases at low $V_{DD}$
[Figure: pipeline with stages 1 to n, each made of a register (D, Q, clk) followed by combinational logic (comb 1 ... comb n)]
[Figure: $E_{lkg}$ vs. $V_{DD}$, increasing towards $V_{DD,min}$]
Voltage Scaling: Total Energy
+Total energy vs. $V_{DD}$
+ tradeoff between $E_{dyn}$ and $E_{lkg}$
+ minimum-energy point (MEP) exists
+ MEP determined by optimal balance of $E_{dyn}$ and $E_{lkg}$
[Figure: $E_{TOT}$, $E_{dyn}$ and $E_{lkg}$ vs. $V_{DD}$, with $V_{DD,min}$ and $V_{DD,opt}$ marked]
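The existence of a minimum-energy point can be illustrated with a toy model. The sketch below assumes a dynamic energy proportional to the switching activity times C·V², and a leakage energy equal to V·I_off·T_CK with T_CK = logic depth × gate delay, gate delay ≈ C·V/I_on, and a subthreshold on/off current ratio of exp(V/(n·v_t)). These modeling choices, the parameter values (activity 0.1, logic depth 40, n = 1.5) and the 150 mV search floor standing in for the minimum functional voltage are all assumptions made for illustration, not data from the slides.

```python
import math


def energy_per_cycle(vdd, alpha=0.1, logic_depth=40, n=1.5, v_t=0.0259, c=1.0):
    """Return (E_dyn, E_lkg) per gate and per cycle, in units of c * volt^2.

    E_dyn = alpha * C * V^2
    E_lkg = V * I_off * T_CK, with T_CK = logic_depth * C * V / I_on and
            I_on / I_off = exp(V / (n * v_t))  (subthreshold model, DIBL ignored)
    """
    e_dyn = alpha * c * vdd ** 2
    e_lkg = logic_depth * c * vdd ** 2 * math.exp(-vdd / (n * v_t))
    return e_dyn, e_lkg


def minimum_energy_point(vdd_min=0.15, vdd_max=1.0, step=0.001, **model):
    """Scan V_DD between the functional limit vdd_min and vdd_max; return (V_opt, E_min)."""
    best_v, best_e = None, float("inf")
    steps = int(round((vdd_max - vdd_min) / step))
    for i in range(steps + 1):
        v = vdd_min + i * step
        e = sum(energy_per_cycle(v, **model))
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e


if __name__ == "__main__":
    v_opt, e_min = minimum_energy_point()
    print(f"toy MEP at V_DD = {v_opt:.3f} V (E = {e_min:.4f} a.u.)")
```

In this toy model a larger logic depth (longer cycle, more leakage per cycle) moves the optimum to a higher voltage, and a higher switching activity moves it lower.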
Importance of Voltage Scaling: Broader View
+Minimum-energy operation for better (10X) energy efficiency + circuit/architectural/SW integration
+ permit performance increase at macro scale
+ reduces battery size and extends lifetime at meso/nano
[Figure: by courtesy of D. Blaauw]
+Voltage scaling is powerful
+ intrinsic in Dennard scaling
+Post-Dennard scaling
+ aggressive voltage scaling: "do it by yourself"
+ as much as possible (give up something, variable workload)
+ deal with related issues
Ultra-Low Voltage (ULV) Operation: Limits and Challenges
+Energy reduction comes at a price
+ performance
+ leakage energy
+ resiliency
+ yield
+ design effort, testing time
[Figure: leakage energy, performance, yield and failure rate vs. $V_{DD}$]
Limits and Challenges
+Delay increases at low voltages
+ linear degradation down to near threshold (10X)
+ crucial also in mobile systems (responsiveness)
[Figure: log-scale plot vs. $V_{DD}$ (mV), 200-1000 mV; M. Alioto (TCAS-I 2012)]
+Leakage energy bigger portion of power budget
+ easily >50%: limits min. energy!
+ more critical: MEM >50% area
+ traditional techniques ineffective
+ weaker stack effect, multi-VTH unfeasible (limits $V_{DD}$ scalability)
[Figure: plot vs. $V_{DD}$ (mV), 100-1000 mV; legend not recoverable]
Limits and Challenges
+Resiliency degraded at ULV
+ process/voltage/temperature
+ 5-10X more process variations (delay: easily 2X variations)
+ 5X higher sensitivity to $V_{DD}$
[Figure: plot vs. $V_{DD}$ (mV), 200-1000 mV; M. Alioto (TCAS-I 2012)]
+ design margining
+ cycle margin (20-30% @ full $V_{DD}$)
+ at near threshold, easily 2X margin (in speed binning, many discarded)
+ degrades performance/energy efficiency
[Figure: nominal cycle time plus margin for process, voltage and temperature; R. Krishnamurthy (Micro 2012)]
Limits and Challenges
+Aging (depends on history, workload, voltage, temperature)
+Soft errors
+ 2X error rate/gen. (2X transistors)
+ higher failure rate at ULV
+Degraded functionality at ULV
+ logic $V_{DD}$ >300 mV with adequate yield
+ degraded $I_{on}/I_{off}$ (incomplete switching)
+ MEM arrays: much less scalability (0.6-0.7 V)
[Figure: contributions to $V_{DD,min}$, expressed in multiples of the thermal voltage $v_t$: theoretical lower bound, increase due to intrinsic NMOS/PMOS imbalance, increase due to residual PUN/PDN imbalance, increase due to variations; M. Alioto (TCAS-I 2012)]
Opportunities to Improve Energy Efficiency/Voltage Scalability
Near-Threshold ICs
+Parallelism compensates speed loss
+ enhanced by 3D chip stacking
+ near threshold computing very promising: 10X efficiency
+ enables data center scalability (D. Blaauw/D. Sylvester)
+ can enable exascale computing by 2020 (Shekhar Borkar, Intel)
[Figure: R. Krishnamurthy (ISSCC 2008)]
+Near-threshold computers will be different
+ logic/MEM: different scaling (MEM becomes faster)
+ fewer cache levels, bigger cache
+ better logic/MEM coupling through 3D integration
+More efficient and scalable microarchitectures
+ deep pipeline: lower leakage energy
$E_{lkg} = V_{DD} I_{off} T_{CK}$
+ ultra-low power = high speed
+ only 17FO4/stage in 1,024-point complex FFT, 4X lower energy (D. Blaauw, D. Sylvester ISSCC 2011)
[Figure: pipeline with stages 1 to n, each made of a register (D, Q, clk) followed by combinational logic (comb 1 ... comb n)]
+Finer-grain power domains
+ suppress leakage through power gating
+ next step: fine-grain/frequent (module/stage)
+ challenges
+ minimize energy cost at sleep-active transitions
+ minimize area overhead of sleep transistors
+Finer-grain voltage domains
+ currently: cores share same voltage, different frequency
$P_{chip} = \text{throughput} \cdot E_{op} + P_{lkg}$
+ slower cores might operate at lower voltage (lower $E_{op}$, lower $P_{lkg}$)
+ not possible (share same voltage)
+ multiple on-chip regulators in sight
+ cores with independent voltages and different frequencies
+ can exploit workload reduction to further reduce $E_{op}$ and $P_{lkg}$
[Figure: cores, each supplied by its own on-chip regulator]
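The benefit of independent voltage domains can be estimated with the same first-order assumptions used earlier in the deck (frequency proportional to V, energy per operation proportional to V²). The sketch below compares a shared supply, which must satisfy the fastest core, with per-core supplies. Leakage is ignored and the list of core frequencies is a hypothetical workload, so this is only an illustration of the trend.

```python
def dynamic_power(core_freqs, shared_voltage: bool) -> float:
    """Normalized dynamic power of cores running at relative frequencies in (0, 1].

    Assumes f ~ V and E_op ~ V^2; leakage power is ignored.
    With a shared supply, every core runs at the voltage needed by the fastest one.
    """
    v_shared = max(core_freqs)
    total = 0.0
    for f in core_freqs:
        v = v_shared if shared_voltage else f
        total += f * v ** 2              # throughput * E_op
    return total


if __name__ == "__main__":
    freqs = [1.0, 0.5, 0.5, 0.25]        # hypothetical per-core workloads
    shared = dynamic_power(freqs, shared_voltage=True)
    split = dynamic_power(freqs, shared_voltage=False)
    print(f"shared supply: {shared:.3f}, per-core supplies: {split:.3f}")
```

For this workload the per-core supplies cut the normalized dynamic power from 2.25 to about 1.27.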
Enhance Energy Efficiency: Heterogeneity
+Exploit heterogeneity (different scaling at ULV)
+ area is commodity: give up flexibility for better efficiency
+ HW accelerators (media, image, crypto, radio, DSP, FPGA)
+ same function in different IPs
+ ex.: big-little ARM (2-3X better)
+ more extreme: use different replicas with different variations
+ energy efficiency at the cost of more testing delay
[Figure: R. Krishnamurthy (Micro 2012)]
[Figure: two replicas, module 1 and module 2; use module 2 on one die, module 1 on another]
Enhance Energy Efficiency
+Limit communication energy
+ exploit locality at different scales
+ limit off-chip (2-10X intrachip)
+ limit intra-chip (1-10X computation)
+ restrict data structure and flow (SIMD)
B. Dally (CICC 2012)
+Reduce clocking energy (40+%)
+ better Flip-Flops (post-silicon tuning)
+ CP³L: 1.3-2.3X better energy efficiency, M. Alioto (ISSCC 2012)
+ better clock domain design
+ clock slope optimization: 35% better energy efficiency [Alioto TCAS-I 2010]
Margin Elimination: Design vs. Testing Time
+Uncertainty margin at design time is too expensive
+ post-silicon (self)tuning absolutely needed
+ eliminate margin: optimally allocate cost/design effort at design / testing / boot / run time
+ complexity and uncertainty are growing
+ ckts people, architects and testing people need to play in the same field
[Figure: decisions related to design time (increase design margin, improve understanding/modeling, more robust design...) vs. decisions related to testing / boot / run time (post-silicon: tune at testing time, adapt at run time)]
Margin Elimination: Timing Error Detection
+Reduce/eliminate worst-case margin by catching delay faults
+ correct at run-time, tune to compensate actual variations
+ run-time testing improves energy efficiency
+ can speculatively reduce energy (if critical path is infrequent)
+In-situ monitoring
+ no margin
+ invasive, limited tuning
+Fault prediction (Tunable Replica Circuit)
+ needs some margin (false positives, mimics only critical path)
+ little invasive, tuning required, low overhead
Margin Elimination: Timing Error Detection
+Timing monitoring: some circuit approaches
+ double sampling: Razor (Umich), DSTB (Intel)
+ transition detection: Razor II (Umich), TDTB (Intel)
+ slow propagation of error to architecture through OR tree
+ hold-time/detection window (TD)
+ metastability in data (Razor)/error path (others)
Margin Elimination: Error Correction
+Faults can be corrected at various levels
+ faster correction: SW → Architecture → Microarchitecture → Circuit
+ less HW resources: Circuit → Microarchitecture → Architecture → SW
+ lower energy/performance penalty: SW → Architecture → Microarchitecture → Circuit
+ energy overhead
[Equation: total energy expressed through $E_{op}$, the error rate, the correction energy and the throughput; the exact form is not recoverable from the extracted text]
[Figure: energy, throughput/IPC and error rate vs. $V_{DD}$. Marked voltages: margined VDD (traditional design), Point of First Failure (PoFF), energy-optimum VDD. Annotations: energy of traditional margined design; energy reduction through margin elimination; energy reduction below PoFF; error rate increase below PoFF; throughput degradation due to increased error rate; minimum energy under error det./corr.]
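The tradeoff sketched in the figure (energy keeps falling below the Point of First Failure until the correction overhead takes over) can be reproduced with a toy model. Everything in the sketch below is an assumption made for illustration: energy per operation proportional to V², an error rate that is zero above the PoFF and grows by one decade every 20 mV below it, and a fixed correction cost of 10 operations per error. It is not the formula from the slide, which could not be recovered.

```python
def error_rate(vdd, v_poff=0.90, rate_at_poff=1e-6, volts_per_decade=0.02):
    """Hypothetical timing-error rate per operation: zero above PoFF, exponential below."""
    if vdd >= v_poff:
        return 0.0
    return min(1.0, rate_at_poff * 10 ** ((v_poff - vdd) / volts_per_decade))


def energy_per_op(vdd, correction_penalty=10.0, **error_model):
    """Normalized energy per useful operation, including error-correction overhead.

    E = V^2 * (1 + penalty * error_rate), with the penalty in units of E_op.
    """
    return vdd ** 2 * (1.0 + correction_penalty * error_rate(vdd, **error_model))


def energy_optimum_vdd(v_low_mv=700, v_high_mv=1000, **model):
    """Scan V_DD on a 1 mV grid and return the energy-optimum supply in volts."""
    best_mv = min(range(v_low_mv, v_high_mv + 1),
                  key=lambda mv: energy_per_op(mv / 1000.0, **model))
    return best_mv / 1000.0


if __name__ == "__main__":
    v_opt = energy_optimum_vdd()
    print(f"margined design (1.00 V): E = {energy_per_op(1.0):.3f}")
    print(f"at PoFF (0.90 V):         E = {energy_per_op(0.90):.3f}")
    print(f"energy-optimum {v_opt:.3f} V: E = {energy_per_op(v_opt):.3f}")
```

With these made-up parameters the optimum sits a few tens of millivolts below the PoFF: going lower is counterproductive because the correction energy grows faster than the V² saving.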
Margin Elimination: Error Correction
+Existing approaches
+ circuit
  + clock gating (Umich)
  + clock stretching (Georgia Tech)
  + error propagation within a clock cycle (very hard)
+ microarchitecture
  + counterflow pipelining (Umich)
  + micro-rollback (Umich)
  + Bubble Razor (Umich)
  + interferes with microarchitecture/cycle-based timing
+ architecture
  + instruction re-execution (Intel), simple, large energy/performance penalty
  + checkpoint-restart (Wisc), simple, very large penalty
The Next Step: Sub-Cycle Detection/Correction
+Existing approaches are cycle-based
+ correction interferes with microarchitecture (design effort)
+ errors affect timing at boundary: difficult SoC integration
+ large energy penalty in high error rate regime (future)
[Figure: from J. Crop et al., JLPEA, 2011]
+Our vision
+ sub-cycle detection/correction
+ errors detected/corrected in the same cycle
+ or, at least, errors do not have to propagate to the boundary
+ so that errors are confined and determine low energy penalty
Approximate Computing as Extreme Scaling
+Some apps do not need to have perfect computation
+ aggressively push voltage and tolerate errors
+ approximate computing (voltage overscaling by N. Shanbhag, K. Roy)
+ ex.: multimedia (occasionally wrong pixels/samples)
+ errors not corrected on the fly
+ rather, avg error rate kept within bound (slow correction loop)
+ degradation of signal quality can be dynamically adjusted (application level)
Our Approach: User Experience-Centric Design
+Voltage/energy reduction in portable multimedia for a given quality of user experience
+ tight link between circuit and final user
+ errors are acceptable
+ metrics for quality of user experience (PSNR)
+ close circuit design loop at application level
+ minimize energy for given quality
+ energy scalability: reduce energy if lower quality is accepted
+ dynamic scaling
[Figure: example frames at PSNR=24 dB and PSNR=36 dB]
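PSNR, the quality metric named above, is the standard peak signal-to-noise ratio: 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and the degraded samples. A minimal pure-Python sketch follows; the function name and the flat sample lists are illustrative, and 8-bit samples (MAX = 255) are assumed by default.

```python
import math


def psnr(reference, degraded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf                  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 120, 140, 160]
    noisy = [101, 119, 141, 159]         # every sample off by one level
    print(f"PSNR = {psnr(original, noisy):.2f} dB")
```

An error of one level on every 8-bit sample gives an MSE of 1 and a PSNR of about 48.1 dB; larger or more frequent errors lower the value.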
+Limits of recent work on energy scalability (SRAM)
+ [Wolf2009], [Kurdahi2008]: aggressive $V_{DD}$ scaling to reduce energy at the cost of higher BER
+ very limited voltage/energy scalability: BER depends exponentially on $V_{DD}$ and abruptly increases
+ same limitation in mixed 6T/8T SRAM [Roy2011]
+ near threshold, 6T array almost always fails, 8T almost never fails
+ not really scalable either
[Figure: BER (or PSNR) and energy vs. $V_{DD}$, with the targeted quality marked]
+Our approach
+ errors have different impact depending on where they occur
+ optimal energy allocation: protect (=spend energy) only important bits to have graceful degradation (various knobs)
+ when limiting precision, use unused bits to improve resiliency
+ can push more on $V_{DD}$ to reduce energy at same quality
+ currently, 28-nm chip under test
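Why protecting only the important bits pays off can be seen from the weight of each bit: a flip at bit position b changes the sample by 2^b, so it contributes 4^b to the squared error. The sketch below estimates the expected mean squared error, to first order in the bit error rate, when only the most significant bits are kept error-free. The error rate of 10⁻³, the 8-bit width and the function names are hypothetical; the actual protection scheme of the chip is not described in the slides.

```python
import math


def expected_mse(bit_error_rate, protected_msbs, width=8):
    """First-order expected squared error per sample.

    Only the `protected_msbs` most significant bits are assumed error-free;
    each remaining bit b flips independently and contributes 4**b when it does.
    """
    unprotected = range(width - protected_msbs)
    return bit_error_rate * sum(4 ** b for b in unprotected)


def psnr_db(mse, max_value=255):
    """PSNR in dB for a given mean squared error."""
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ber = 1e-3                           # hypothetical bit error rate at low V_DD
    for k in (0, 1, 2, 4):
        mse = expected_mse(ber, k)
        print(f"{k} protected MSBs: MSE = {mse:.3f}, PSNR = {psnr_db(mse):.1f} dB")
```

In this estimate, protecting just the two most significant bits of each 8-bit sample improves the PSNR by about 12 dB at the same bit error rate.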
+Results in 28-nm
+ 32-kb SRAM, YUV format (QCIF 144x176)
+ Akiyo video, frame #30
+ 50% energy reduction at same PSNR w.r.t. voltage scaling
+ 41% better PSNR (dB) at same energy
[Figure: three frames labeled A, B and Original]
Other Opportunities
+Enable burst very high-speed computation
+ just violate reliability constraint
+ temporarily exceed Thermal Design Power
+ leverage thermal cap for DVFS: Turbo Boost 2.0 [Intel, Rotem et al., HOTCHIPS 2011]
+ enhance thermal cap via phase change materials: Computational Sprinting [Raghavan HPCA 2012]
Our Vision of Distributed Power Management
+Globally green systems
+ energy-efficient, widely energy scalable and externally tunable components
+ need for communication (energy state, knob tuning)
+ global policies based on information on energy state
[Figure: a MODULE (inputs, REG, processing, outputs) attached to two channels. TRADITIONAL COMMUNICATION CHANNEL (EX.: bus, NoC, crossbar...). ENERGY MANAGEMENT CHANNEL, added to enable energy scalability and dynamic tradeoff with other assets: it delivers instantaneous requirements (EX.: throughput, arithmetic precision...) and collects energy-related parameters (EX.: timing slack, bit error rate...). Inside the module, internal sensors and settings self-adjust internal knobs to minimize energy]
Our Vision of Distributed Power Management
+ keep it simple (integration), yet maintain global view: hierarchical structure
+ enables remote power management (global view and intelligence kept out of nano-scale nodes)
+ move computation where more efficient (computation vs. communication, locality, heterogeneity)
[Figure: hierarchical structure, axis labeled "higher level in hierarchy"]
Beyond-CMOS Ultra-Low Voltage Circuits
Tunnel-FETs: a Very Promising Alternative
+ Main limit to voltage scaling of CMOS transistor
+ $V_{TH}$ can be reduced only if subthreshold slope (SS) is lowered at given leakage
+ use new devices with lower subthreshold slope
+ Tunnel FETs: very promising (ITRS: after 2020)
+ Physical structure
[Figure: device cross-section with regions labeled p+, i, n+, n and metal]
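The link between subthreshold slope and threshold voltage can be made concrete. In a conventional MOSFET the subthreshold swing cannot go below (kT/q)·ln 10, about 60 mV per decade at room temperature, so each decade of leakage reduction costs at least that much threshold voltage. The sketch below computes this limit and, under the simplifying assumption that the current falls by exactly one decade per swing below threshold, the threshold needed for a given number of decades. The 30 mV/decade value used in the usage example is a hypothetical steep-slope device, not a figure from the slides.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_swing_limit_mv(temperature_k: float = 300.0) -> float:
    """Minimum subthreshold swing of a conventional MOSFET, (kT/q) * ln(10), in mV/decade."""
    return 1e3 * BOLTZMANN * temperature_k / ELECTRON_CHARGE * math.log(10)


def threshold_for_leakage_mv(decades_below_threshold: float,
                             swing_mv_per_decade: float) -> float:
    """Threshold voltage (mV) needed so that the off current at V_GS = 0 sits the given
    number of decades below the current at threshold (idealized constant swing)."""
    return decades_below_threshold * swing_mv_per_decade


if __name__ == "__main__":
    limit = thermal_swing_limit_mv()
    print(f"thermal limit at 300 K: {limit:.1f} mV/decade")
    for swing in (limit, 30.0):      # 30 mV/decade: hypothetical steep-slope device
        vth = threshold_for_leakage_mv(4, swing)
        print(f"swing {swing:.0f} mV/dec -> V_TH = {vth:.0f} mV for 4 decades")
```

Halving the swing halves the threshold voltage required for the same leakage target, which is what makes a lower supply voltage possible.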
Tunnel-FETs: Robustness Comparison
+ Comparison with CMOS bulk (FinFET) / SOI
+ fair: all optimized for ULV, same targets (leakage)
+ Noise margin degradation at ULV
+ ~linear
+ min. operating voltage:

               TFET      SOI       bulk
$V_{DD,min}$   78.6 mV   58.1 mV   90.5 mV

+ worse than SOI
+ due to lower output resistance (gain)
[Figures: two plots vs. $V_{DD}$ (mV), 100-500 mV, for TFET, SOI and bulk]
Tunnel-FETs: Performance Comparison
+ Transistor speed: FO4 at ULV
+ impressive speed advantage at ULV
+ @ 200 mV: TFET is 42X faster than bulk, 10X faster than SOI
[Figure: log-scale plot vs. $V_{DD}$ (mV), 100-500 mV, for TFET, SOI and bulk]
+ Practical example
+ TFET can operate at MHz @ 250 mV

logic depth (LD)                     $V_{DD}$   TFET       SOI       Bulk
LD=40FO4 (ultra-energy efficient)    250 mV     3.4 MHz    860 kHz   100 kHz
                                     400 mV     19.5 MHz   92 MHz    5.2 MHz
LD=100FO4 (typical)                  250 mV     1.4 MHz    350 kHz   41 kHz
                                     400 mV     7.8 MHz    37 MHz    2.1 MHz
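The two rows of the table are related by the logic depth: if the clock period is taken as logic depth times the FO4 delay (an assumption that neglects register overhead and clock skew), the frequency at LD=100FO4 should be 40/100 of the one at LD=40FO4. The short sketch below checks this on the TFET entry at 250 mV; the function names are illustrative.

```python
def fo4_delay_s(clock_hz: float, logic_depth_fo4: float) -> float:
    """FO4 delay implied by a clock frequency, assuming T_CK = logic depth * FO4 delay."""
    return 1.0 / (clock_hz * logic_depth_fo4)


def clock_from_fo4(fo4_delay: float, logic_depth_fo4: float) -> float:
    """Clock frequency for a given FO4 delay and logic depth (register overhead neglected)."""
    return 1.0 / (fo4_delay * logic_depth_fo4)


if __name__ == "__main__":
    # TFET at 250 mV: 3.4 MHz with a logic depth of 40 FO4 (value from the table).
    t_fo4 = fo4_delay_s(3.4e6, 40)
    print(f"implied FO4 delay: {t_fo4 * 1e9:.2f} ns")
    print(f"predicted clock at 100 FO4: {clock_from_fo4(t_fo4, 100) / 1e6:.2f} MHz")
```

The predicted 1.36 MHz agrees with the 1.4 MHz listed for LD=100FO4, and the other entries scale in the same way.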
Tunnel-FETs: Energy Comparison
+ FO4 inverter chain (10% activity, 16 slices)
+ min. energy vs. logic depth
+ max. TFET advantage
+ w.r.t. SOI: 35% @ 60FO4
+ w.r.t. bulk: 43% @ 20FO4
[Figure: SOI-to-TFET and bulk-to-TFET ratios vs. logic depth, 20-200 FO4]
+ TFET exhibits better energy in practical ULV arch.
+ circuit-architecture co-design needed to take full advantage of TFET potential
+ TFET has 2X energy sensitivity to $V_{DD}$ uncertainty around MEP (needs more accurate voltage tuning)
Tunnel-FETs: Energy Comparison
+ Impact of transistor stacking
+ at ULV, leakage reduction in 2-4 stacked TFETs is
+ 5-8X better than SOI, 3-6X better than bulk
+ at ULV, $I_{on}$ reduction in 2-4 stacked TFETs is
+ 2X better than SOI and bulk
+ TFET cells with larger fan-in provide more benefits
+ faster, lower leakage → lower min. energy
+ TFET standard cell libraries must include higher fan-in cells
+ Example: zero-detector with 4-input gates
+ min. energy improved by 1.79X (1.84X) w.r.t. SOI (bulk)
+ much better than inverters
Tunnel-FETs: SRAM cell
+ System voltage scalability limited by SRAM cell
+ small margins, sensitive to variations
+ 8T cell
+ about same area (33 X 13.4 F²)
+ TFET SNM scales better

SNM                  TFET           SOI            bulk
$V_{DD}$ >140 mV     30% $V_{DD}$   35% $V_{DD}$   30% $V_{DD}$
$V_{DD}$ =100 mV     25% $V_{DD}$   18% $V_{DD}$   10% $V_{DD}$

[Figure: SNM (mV) vs. $V_{DD}$ (mV), 100-500 mV, for TFET, SOI and bulk]
Conclusions
+ Future computing platforms (macro, meso, nano)
+ Green: energy efficiency is key in any component
+ Ultra-low voltage is really challenging
+ speed, leakage, resiliency (design margin)
+ Opportunities to overcome challenges
+ margin reduction
+ heterogeneity
+ fine-grain/independent power domains
+ coordinate architecture/circuit design
+ use better devices
+ 10X energy efficiency targeted by 2020 (8 nm)
Speaker's Contacts
Massimo Alioto, Ph.D.
E-mail: malioto@ieee.org, massimo.alioto@nus.edu.sg
Homepage: http://www.green-ic.org
ECE Department
National University of Singapore (NUS)
4 Engineering Drive 3, Singapore 117576
