Power Design
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
Why is it a concern?
Source: http://www.semichips.org
2/8/06 D&T Seminar 3
ISSCC, Feb. 2001, Keynote
“Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second -- about the same number of calculations that the world's fastest supercomputer can perform now.
“Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor. . . .”
Patrick P. Gelsinger
Senior Vice President, General Manager
Digital Enterprise Group
INTEL CORP.
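[Figure: Power density (W/cm2), on a log scale from 1 to 1000, versus year, 1970 to 2010, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with reference levels marked for a hot plate, a nuclear reactor and a rocket nozzle.]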
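Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Approximate share of each term: 75% for $C_L V_{DD}^2$ (switching), 20% for $t_{sc} V_{DD} I_{peak}$ (short circuit), 5% for $V_{DD} I_{leakage}$ (leakage).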
Low-Power Datapath Architecture
• Lower supply voltage
– This slows down circuit speed
– Use parallel computing to gain the speed back
• Works well when threshold voltage is also
lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W.
Brodersen, Low Power Digital CMOS Design,
Boston: Kluwer Academic Publishers (Now
Springer), 1995.
A Reference Datapath
[Figure: Input, register, combinational logic, register, Output; both registers are clocked by CK and the switched capacitance is Cref.]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism
[Figure: the input register, clocked at f, feeds N copies (Copy 1 to Copy N) of register plus combinational logic, each clocked at f/N; an N to 1 multiplexer feeds the output register, clocked at f; a multiphase clock generator and mux control is driven by CK.]
Control Signals, N = 4
[Figure: timing diagram of CK and the four clock phases, Phase 1 to Phase 4.]
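Power without overhead:
$(C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
Including an overhead of δ for each additional copy:
$P_N = [1 + \delta(N - 1)] C_{ref} V_N^2 f$
$\dfrac{P_N}{P_1} = [1 + \delta(N - 1)] \dfrac{V_N^2}{V_{ref}^2}$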
Voltage vs. Speed
Delay of a gate, $T \approx \dfrac{C_L V_{ref}}{I} = \dfrac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
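The gate slows down as the supply voltage gets closer to $V_t$.
[Figure: normalized gate delay (ticks 1.0 to 3.0) with operating points marked for N = 1, N = 2 and N = 3.]
[Figure: plot versus N, for N = 1 to 12, with the curve for the extreme case $V_t$ = 0 V.]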
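In the extreme case $V_t = 0$ the gate delay is proportional to $1/V$, so a copy that may run N times slower can use $V_N = V_{ref}/N$:
$\dfrac{P_N}{P_1} = [1 + \delta(N - 1)] \dfrac{1}{N^2} \to \dfrac{\delta}{N}$ for large N, a 1/N falloff
With negligible overhead ($\delta \to 0$):
$\dfrac{P_N}{P_1} \approx \dfrac{1}{N^2}$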
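The interaction between the gate-delay model and the power ratio can be explored numerically. The following is a minimal Python sketch, not the author's method: it assumes each copy may be N times slower than the reference gate, solves the delay relation $T \propto V/(V - V_t)^2$ for the reduced supply voltage, and evaluates $P_N/P_1 = [1 + \delta(N-1)] V_N^2/V_{ref}^2$. The function names and the sample values (a 5 V reference supply, δ = 0.1) are illustrative assumptions.

```python
import math


def reduced_voltage(n, v_ref, v_t):
    """Supply voltage at which a gate is n times slower than at v_ref.

    Assumes delay T is proportional to V / (V - v_t)**2, so we solve
    V / (V - v_t)**2 = n * v_ref / (v_ref - v_t)**2 for V > v_t.
    """
    a = n * v_ref / (v_ref - v_t) ** 2
    # a*(V - v_t)^2 = V  ->  a*V^2 - (2*a*v_t + 1)*V + a*v_t^2 = 0
    b = 2 * a * v_t + 1
    # The larger root is the one that lies above the threshold voltage.
    return (b + math.sqrt(4 * a * v_t + 1)) / (2 * a)


def power_ratio(n, v_ref, v_t, delta):
    """P_N / P_1 = [1 + delta*(n - 1)] * (V_N / V_ref)**2."""
    v_n = reduced_voltage(n, v_ref, v_t)
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


if __name__ == "__main__":
    # Illustrative values only: V_ref = 5 V, overhead delta = 0.1.
    for v_t in (0.0, 0.5):
        for n in (1, 2, 4, 8):
            print(f"Vt={v_t} N={n} PN/P1={power_ratio(n, 5.0, v_t, 0.1):.3f}")
```

With a zero threshold and zero overhead the result reduces to $1/N^2$; a nonzero threshold gives a smaller saving for the same N.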
Relative clock rate = $\dfrac{(V_{DD} - 0.5)^2}{20.25}$
• Problem:
– Integrate multiplier core on a SOC
– Power budget for multiplier ~ 5W
A Multicore Design
[Figure: the input register, clocked at 200MHz, feeds five multiplier cores (Core 1 to Core 5), each with its own register clocked at 40MHz; a 5 to 1 mux feeds the output register, clocked at 200MHz; a multiphase clock generator and mux control is driven by CK.]
• For N cores:
– Clock frequency = 200/N MHz
– Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
• Assuming 10% overhead per core,
Power dissipation = $15\,[1 + 0.1(N - 1)]\,(V_{DDN}/5)^2$ watts
N (cores)   Clock (MHz)   VDDN (V)   Power (W)
4           50            2.75       5.90
5           40            2.51       5.29
8           25            2.10       4.50
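The design points can be checked against the clock, supply-voltage and power formulas for N cores. The following is a minimal Python sketch with illustrative function names; it evaluates the formulas without intermediate rounding, so its results agree with the tabulated rows only to within a few hundredths (the table appears to use rounded voltages, which is an assumption made here).

```python
import math


def supply_voltage(n):
    """VDD for n cores: 0.5 + sqrt(20.25 / n) volts."""
    return 0.5 + math.sqrt(20.25 / n)


def design_point(n, overhead=0.1):
    """Return (clock in MHz, VDD in volts, power in watts) for n cores."""
    clock_mhz = 200 / n
    vdd = supply_voltage(n)
    # 15 W scaled by the per-core overhead and by (VDD / 5 V) squared.
    power = 15 * (1 + overhead * (n - 1)) * (vdd / 5) ** 2
    return clock_mhz, vdd, power


if __name__ == "__main__":
    for n in (1, 4, 5, 8):
        clock_mhz, vdd, power = design_point(n)
        print(f"N={n}  f={clock_mhz:.0f} MHz  VDD={vdd:.2f} V  P={power:.2f} W")
```

For N = 1 the formulas give 200 MHz, 5 V and 15 W, and of the tabulated design points only N = 8 falls under the 5 W budget.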
              Single processor, clocked at f   Parallel version, clocked at f/2
Capacitance   C                                2.2C
Voltage       V                                0.6V
Frequency     f                                0.5f
Power         $CV^2f$                          $0.396\,CV^2f$
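The 0.396 factor follows from scaling each term of $P = CV^2f$. A minimal Python sketch of that arithmetic, with an illustrative function name:

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Power relative to C*V^2*f when C, V and f are each scaled."""
    return cap_scale * volt_scale ** 2 * freq_scale


if __name__ == "__main__":
    # Scale factors from the comparison: 2.2C, 0.6V, 0.5f.
    print(round(relative_power(2.2, 0.6, 0.5), 3))  # 0.396
```

The voltage term dominates: the 0.6 scale factor alone contributes 0.36, while the capacitance and frequency factors nearly cancel (2.2 × 0.5 = 1.1).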
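Pipeline Architecture
[Figure: left, a single processor between input and output registers, clocked at f; right, the processor split into two half-size stages (½ Proc.) separated by a pipeline register, also clocked at f.]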
              Multicore   Single core
Capacitance   nC          C
Frequency     f/n         f
Left to right:
Atsushi Kameyama, Toshiba
James Kahle, IBM
Masakazu Suzoki, Sony
Cell’s Nine-Processor Chip
[Figure: normalized execution time, from 0 to 1.]
$\text{Speedup} = \dfrac{1}{S + (1 - S)/N}$
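The speedup expression has the form usually known as Amdahl's law. A minimal Python sketch follows, assuming S denotes the serial fraction of the workload and N the number of processors; the function name and the 10% serial fraction in the usage loop are illustrative.

```python
def speedup(serial_fraction, n_processors):
    """Speedup = 1 / (S + (1 - S) / N)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)


if __name__ == "__main__":
    # Illustrative serial fraction of 10% on 1, 2, 4 and 9 processors.
    for n in (1, 2, 4, 9):
        print(n, round(speedup(0.1, n), 2))
```

Under this reading the speedup is bounded by 1/S however many processors are added, so the serial fraction limits how far frequency can be traded for parallelism.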