Introduction to CMOS VLSI Design

Combinational Circuits
Outline
• Bubble Pushing
• Compound Gates
• Logical Effort Example
• Input Ordering
• Asymmetric Gates
• Skewed Gates
• Best P/N ratio

8: Combinational Circuits Slide 2


Example 1
module mux(input s, d0, d1,
output y);

assign y = s ? d1 : d0;
endmodule

1) Sketch a design using AND, OR, and NOT gates.



Example 2
2) Sketch a design using NAND, NOR, and
NOT gates. Assume ~S is available.



Bubble Pushing
• Start with network of AND / OR gates
• Convert to NAND / NOR + inverters
• Push bubbles around to simplify logic
o Remember DeMorgan’s Law
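
As a small illustration of the conversion steps above, the following Python sketch checks exhaustively that the AND/OR/NOT multiplexer and its bubble-pushed NAND-only form compute the same function as the behavioral `mux` module. The function names are illustrative and do not come from the slides.

```python
from itertools import product

def mux_and_or(s, d0, d1):
    """AND/OR/NOT form: y = (d0 AND NOT s) OR (d1 AND s)."""
    return (d0 & (1 - s)) | (d1 & s)

def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)

def mux_nand(s, d0, d1):
    """Bubble-pushed form: by DeMorgan, an OR of ANDs equals a NAND of NANDs."""
    s_bar = 1 - s  # ~S is assumed to be available
    return nand(nand(d0, s_bar), nand(d1, s))

def check_equivalence():
    """Return True if both gate-level forms match the behavioral mux."""
    for s, d0, d1 in product((0, 1), repeat=3):
        reference = d1 if s else d0  # same as: assign y = s ? d1 : d0
        if not (mux_and_or(s, d0, d1) == mux_nand(s, d0, d1) == reference):
            return False
    return True

print(check_equivalence())
```

Pushing the output bubbles of the two first-level NANDs onto the inputs of the second-level NAND turns that gate into an OR, which is why the two forms agree.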



Example 3
3) Sketch a design using one compound
gate and one NOT gate. Assume ~S is
available.



Compound Gates
• Logical Effort of compound gates



Example 4
• The multiplexer has a maximum input
capacitance of 16 units on each input. It
must drive a load of 160 units. Estimate
the delay of the NAND and compound
gate designs.


H = 160 / 16 = 10 (path electrical effort)
B = 1 (path branching effort)
N = 2 (number of stages)
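
The delay estimate can be sketched numerically as below. This is a hedged worked example, not the slides' own solution: it assumes the NAND design is a two-stage path of 2-input NANDs (g = 4/3, p = 2 each) and the compound design is an AOI22 gate (g = 6/3, p = 4) followed by an inverter (g = 1, p = 1). Those per-gate values are the usual textbook figures and are not recoverable from the text of the slides. The function name `path_delay` is illustrative.

```python
import math

def path_delay(stage_efforts, stage_parasitics, branching, electrical):
    """Minimum path delay in units of tau: D = N * F**(1/N) + P, F = G * B * H."""
    n = len(stage_efforts)
    path_logical_effort = math.prod(stage_efforts)          # G
    path_effort = path_logical_effort * branching * electrical  # F
    parasitic = sum(stage_parasitics)                        # P
    best_stage_effort = path_effort ** (1.0 / n)             # f-hat
    return n * best_stage_effort + parasitic, best_stage_effort

H = 160 / 16  # load capacitance / input capacitance
B = 1         # no branching on the path

# Assumed gate parameters (see lead-in): NAND2 -> NAND2, and AOI22 -> inverter.
d_nand, f_nand = path_delay([4 / 3, 4 / 3], [2, 2], B, H)
d_comp, f_comp = path_delay([6 / 3, 1], [4, 1], B, H)

print(f"NAND design:     stage effort {f_nand:.1f}, delay {d_nand:.1f} tau")
print(f"Compound design: stage effort {f_comp:.1f}, delay {d_comp:.1f} tau")
```

Under these assumptions the NAND design comes out near 12.4 tau and the compound design near 13.9 tau, so the compound gate's higher logical effort and parasitic delay outweigh its compactness here.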



NAND Solution



Compound Solution



Example 5
• Annotate your designs with transistor
sizes that achieve this delay.
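
One way to approach the sizing is to work backward from the load, giving each stage the input capacitance C_in = g * C_out / f, where f is the best stage effort. The sketch below does this for the 16-unit input and 160-unit load of Example 4, under the same assumed gate parameters as a NAND2-NAND2 path (g = 4/3 each) and an AOI22-inverter path (g = 6/3 and 1). The helper name `size_path` is hypothetical, and converting each input capacitance into individual transistor widths depends on the gate topology, which is not shown here.

```python
import math

def size_path(stage_efforts, c_load, c_in):
    """Return the input capacitance of each stage, first stage first.

    Assumes no branching (B = 1) and equal stage effort across the path.
    """
    n = len(stage_efforts)
    path_effort = math.prod(stage_efforts) * c_load / c_in
    f_hat = path_effort ** (1.0 / n)

    caps = []
    c_out = c_load
    # Walk from the last stage back to the first: C_in = g * C_out / f_hat.
    for g in reversed(stage_efforts):
        c_stage = g * c_out / f_hat
        caps.append(c_stage)
        c_out = c_stage
    return list(reversed(caps))

nand_caps = size_path([4 / 3, 4 / 3], c_load=160, c_in=16)
comp_caps = size_path([6 / 3, 1], c_load=160, c_in=16)

print([round(c, 1) for c in nand_caps])
print([round(c, 1) for c in comp_caps])
```

The first entry of each list should come back as the 16-unit input capacitance, which is a useful self-check on the arithmetic.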



Datapath Functional Units


Funnel Shifter Operation

• Computing N-k requires an adder

12: Datapath Functional Units Slide 21


Simplified Funnel Shifter
• Optimize down to 2N-1 bit input



Funnel Shifter Design 1
• N N-input multiplexers
o Use 1-of-N hot select signals for shift amount
o nMOS pass transistor design (Vt drops!)



Funnel Shifter Design 2
• Log N stages of 2-input muxes
o No select decoding needed



ROM Implementation
• 16-word x 5 bit ROM

14: CAMs, ROMs, and PLAs Slide 25




PLAs
• A Programmable Logic Array performs
any function in sum-of-products form.
• Literals: inputs & complements
• Products / Minterms: AND of literals
• Outputs: OR of Minterms

• Example: Full Adder
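
To make the AND-plane / OR-plane structure concrete, here is a minimal Python sketch of a PLA programmed as a full adder. The data layout (`AND_PLANE`, `OR_PLANE`) and the choice of product terms are illustrative assumptions, not taken from the slides' figure: the sum uses four minterms and the carry uses three two-literal products.

```python
from itertools import product

# AND plane: each product term lists the literal value it requires per input.
AND_PLANE = [
    {"a": 1, "b": 0, "c": 0},  # a b' c'
    {"a": 0, "b": 1, "c": 0},  # a' b c'
    {"a": 0, "b": 0, "c": 1},  # a' b' c
    {"a": 1, "b": 1, "c": 1},  # a b c
    {"a": 1, "b": 1},          # a b
    {"b": 1, "c": 1},          # b c
    {"a": 1, "c": 1},          # a c
]

# OR plane: each output is the OR of the listed product-term rows.
OR_PLANE = {"sum": [0, 1, 2, 3], "cout": [4, 5, 6]}

def pla(inputs):
    """Evaluate the PLA for a dict of input values, e.g. {"a": 1, "b": 0, "c": 1}."""
    products = [
        all(inputs[name] == value for name, value in term.items())
        for term in AND_PLANE
    ]
    return {
        out: int(any(products[row] for row in rows))
        for out, rows in OR_PLANE.items()
    }

def matches_full_adder():
    """Check the PLA against ordinary addition on all eight input combinations."""
    for a, b, c in product((0, 1), repeat=3):
        out = pla({"a": a, "b": b, "c": c})
        total = a + b + c
        if out["sum"] != total % 2 or out["cout"] != total // 2:
            return False
    return True

print(matches_full_adder())
```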



NOR-NOR PLAs
• ANDs and ORs are not very efficient in
CMOS
• Dynamic or Pseudo-nMOS NORs are very
efficient
• Use DeMorgan’s Law to convert to all
NORs



PLA Schematic & Layout



PLAs vs. ROMs
• The OR plane of the PLA is like the ROM
array
• The AND plane of the PLA is like the
ROM decoder
• PLAs are more flexible than ROMs
o No need to have 2^n rows for n inputs
o Only generate the minterms that are needed
o Take advantage of logic simplification



Example: RoboAnt PLA
• Convert state transition table to logic equations

S1:0  L  R  S1:0'  TR  TL  F
00    0  0  00     0   0   1
00    1  X  01     0   0   1
00    0  1  01     0   0   1
01    1  X  01     0   1   0
01    0  1  01     0   1   0
01    0  0  10     0   1   0
10    X  0  10     1   0   1
10    X  1  11     1   0   1
11    1  X  01     0   1   1
11    0  0  10     0   1   1
11    0  1  11     0   1   1
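
As a hedged illustration of the conversion, the Python sketch below encodes the transition table and checks one candidate set of sum-of-products equations against it for every input combination. The equations in `equations` are one possible reading of the table, stated here as an assumption to be verified by the code rather than as the slides' own answer.

```python
from itertools import product

# Rows: (S1S0, L, R, next S1S0, TR, TL, F); "X" marks a don't-care input.
TABLE = [
    ("00", "0", "0", "00", 0, 0, 1),
    ("00", "1", "X", "01", 0, 0, 1),
    ("00", "0", "1", "01", 0, 0, 1),
    ("01", "1", "X", "01", 0, 1, 0),
    ("01", "0", "1", "01", 0, 1, 0),
    ("01", "0", "0", "10", 0, 1, 0),
    ("10", "X", "0", "10", 1, 0, 1),
    ("10", "X", "1", "11", 1, 0, 1),
    ("11", "1", "X", "01", 0, 1, 1),
    ("11", "0", "0", "10", 0, 1, 1),
    ("11", "0", "1", "11", 0, 1, 1),
]

def table_lookup(s1, s0, l, r):
    """Return (S1', S0', TR, TL, F) from the first matching table row."""
    for state, l_pat, r_pat, nxt, tr, tl, f in TABLE:
        if state == f"{s1}{s0}" and l_pat in ("X", str(l)) and r_pat in ("X", str(r)):
            return int(nxt[0]), int(nxt[1]), tr, tl, f
    raise ValueError("no matching row")

def equations(s1, s0, l, r):
    """Candidate sum-of-products equations (assumed, then checked below)."""
    inv = lambda x: 1 - x
    s1_next = (s1 & inv(s0)) | (inv(l) & s1) | (inv(l) & inv(r) & s0)
    s0_next = r | (l & inv(s1)) | (l & s0)
    tr = s1 & inv(s0)
    tl = s0
    f = s1 | inv(s0)
    return s1_next, s0_next, tr, tl, f

def equations_match_table():
    """True if the equations agree with the table on all 16 input combinations."""
    return all(
        equations(*bits) == table_lookup(*bits)
        for bits in product((0, 1), repeat=4)
    )

print(equations_match_table())
```

Each distinct product term in these equations would become one row of the PLA's AND plane, and each output one column of the OR plane.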



RoboAnt Dot Diagram



Review: Basic Building Blocks
• Datapath
o Execution units
▪ Adder, multiplier, divider, shifter, etc.
o Register file and pipeline registers
o Multiplexers, decoders
• Control
o Finite state machines (PLA, ROM, random
logic)
• Interconnect
o Switches, arbiters, buses
• Memory
o Caches (SRAMs), TLBs, DRAMs, buffers

Parallel Programmable Shifters
• Data In → shifter → Data Out
• Control = shift amount, shift direction, shift type (logical, arithmetic, circular)
• Shifters used in multipliers, floating point units
• Consume lots of area if done in random logic gates

A Programmable Binary Shifter
• Control signals: rgt, nop, left
• Inputs Ai, Ai-1; outputs Bi, Bi-1

Ai  Ai-1  rgt  nop  left  Bi  Bi-1
A1  A0    0    1    0     A1  A0
A1  A0    1    0    0     0   A1
A1  A0    0    0    1     A0  0
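
The truth table above can be reproduced with a short behavioral model. This is a sketch only: the function name `binary_shifter` and the most-significant-bit-first list convention are assumptions made for the example.

```python
def binary_shifter(bits, rgt, nop, left):
    """Shift a list of bits (most significant first) by one position.

    Exactly one of rgt, nop, left must be 1. Vacated positions are filled
    with 0, as in the table rows for the 2-bit case.
    """
    if rgt + nop + left != 1:
        raise ValueError("exactly one control signal must be active")
    if nop:
        return list(bits)
    if rgt:
        return [0] + list(bits[:-1])   # B1 = 0,  B0 = A1
    return list(bits[1:]) + [0]        # B1 = A0, B0 = 0

# Symbolic inputs make it easy to compare against the table.
print(binary_shifter(["A1", "A0"], rgt=0, nop=1, left=0))
print(binary_shifter(["A1", "A0"], rgt=1, nop=0, left=0))
print(binary_shifter(["A1", "A0"], rgt=0, nop=0, left=1))
```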

4-bit Barrel Shifter
• Inputs A3..A0, outputs B3..B0, select lines Sh0..Sh3
• Example:
o Sh0 = 1: B3B2B1B0 = A3A2A1A0
o Sh1 = 1: B3B2B1B0 = A3A3A2A1
o Sh2 = 1: B3B2B1B0 = A3A3A3A2
o Sh3 = 1: B3B2B1B0 = A3A3A3A3
• Area dominated by wiring
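
A behavioral sketch of the example: with a one-hot select, Sh*k* shifts right by *k* positions and replicates A3 into the vacated bits. The function name `barrel_shift` and the list conventions are assumptions made for illustration.

```python
def barrel_shift(a, sh):
    """Right-shift with sign replication, selected by a one-hot list.

    a  : input bits, most significant first, e.g. [A3, A2, A1, A0]
    sh : one-hot select lines [Sh0, Sh1, Sh2, Sh3]; Shk shifts by k
    """
    if sum(sh) != 1:
        raise ValueError("exactly one Sh line must be active")
    n = len(a)
    k = sh.index(1)
    # Map bit index (0 = least significant) to its value.
    by_index = {n - 1 - pos: bit for pos, bit in enumerate(a)}
    # Output bit i takes input bit i + k, saturating at the top bit.
    return [by_index[min(i + k, n - 1)] for i in range(n - 1, -1, -1)]

inputs = ["A3", "A2", "A1", "A0"]
for k in range(4):
    select = [1 if i == k else 0 for i in range(4)]
    print(f"Sh{k} = 1:", "".join(barrel_shift(inputs, select)))
```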

4-bit Barrel Shifter Layout
• Only one Sh# active at a time
• Width_barrel ~ 2 pm N
o N = max shift distance, pm = metal pitch
• Delay ~ 1 fet + N diff caps

8-bit Logarithmic Shifter
• Select lines Sh1/!Sh1, Sh2/!Sh2, Sh3/!Sh3; inputs A3..A0, outputs B3..B0
• Example: Sh1 = 0, Sh2 = 1, Sh3 = 0
• log N stages

8-bit Logarithmic Shifter Layout
[Figure: one slice per bit (A3/B3 through A0/B0), with stages of shift distance 1, 2, 4.]
• Width_log ~ pm (2K + (1 + 2 + … + 2^(K-1))) = pm (2K + 2^K - 1)
o K = log2 N
• Delay ~ K fets + 2 diff caps
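
The staged structure can be sketched behaviorally as follows. Each stage either passes its input through or shifts it right by a power of two (1, 2, 4), so K stages cover any shift from 0 to 2^K - 1. The fill rule (replicating the top bit) is an assumption borrowed from the barrel shifter example, since the figure itself is not recoverable, and `log_shift` is a hypothetical name.

```python
def log_shift(a, sh):
    """Logarithmic right shifter.

    a  : input bits, most significant first
    sh : stage enables [Sh1, Sh2, Sh3, ...]; stage j shifts by 2**j when set
    Vacated positions are filled with the top bit (assumed fill rule).
    """
    bits = list(a)
    for stage, enable in enumerate(sh):
        if enable:
            dist = min(1 << stage, len(bits))
            bits = [bits[0]] * dist + bits[:len(bits) - dist]
    return bits

# Shift an 8-bit word right by 3 = 1 + 2 (stages 1 and 2 on, stage 4 off).
word = list("abcdefgh")
print("".join(log_shift(word, [1, 1, 0])))
```

The design choice is a trade: no select decoding and narrower wiring channels, at the cost of a signal passing through K pass stages instead of one.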

Shifter Implementation Comparisons

          Barrel                     Logarithmic
N    K    Width     Speed            Width               Speed
          2 N pm    1 + N diffs      pm(2K + 2^K - 1)    K + 2 diffs
8    3    16 pm     1 + 8            13 pm               3 + 2
16   4    32 pm     1 + 16           23 pm               4 + 2
32   5    64 pm     1 + 32           41 pm               5 + 2
64   6    128 pm    1 + 64           75 pm               6 + 2
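
The width entries follow directly from the two formulas, which the short sketch below evaluates in units of the metal pitch pm. The function names are illustrative.

```python
def barrel_width(n):
    """Barrel shifter width in metal pitches: 2 * N."""
    return 2 * n

def log_width(n):
    """Logarithmic shifter width in metal pitches: 2K + 2**K - 1, K = log2(N)."""
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two")
    k = n.bit_length() - 1
    return 2 * k + 2 ** k - 1

for n in (8, 16, 32, 64):
    k = n.bit_length() - 1
    print(f"N={n:2d} K={k}  barrel {barrel_width(n):3d} pm  log {log_width(n):2d} pm")
```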

Decoders
• Decodes inputs to activate one of many outputs
• 2x4 decoder with inputs In0, In1 and an Enable:
o Out0 = !In1 & !In0
o Out1 = !In1 & In0
o Out2 = In1 & !In0
o Out3 = In1 & In0
• Two inverters, four 2-input NAND gates, four inverters, plus enable logic
• How about for a 3-to-8, 4-to-16, etc. decoder?
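
A behavioral sketch that covers the 2x4 case and generalizes to n-to-2^n: output k is high when the inputs spell k in binary and the enable is asserted. The function name and the least-significant-first input order are assumptions for the example; the enable is modeled as active high.

```python
def decoder(inputs, enable=1):
    """n-to-2**n decoder.

    inputs : list of 0/1 values, least significant first ([In0, In1, ...])
    enable : 1 activates the decoder, 0 forces every output low
    Returns [Out0, Out1, ..., Out(2**n - 1)].
    """
    n = len(inputs)
    outputs = []
    for k in range(2 ** n):
        # Output k ANDs each input or its complement, per the bits of k.
        match = all(inputs[i] == ((k >> i) & 1) for i in range(n))
        outputs.append(int(bool(enable) and match))
    return outputs

print(decoder([1, 0]))         # In0 = 1, In1 = 0
print(decoder([0, 1]))         # In0 = 0, In1 = 1
print(len(decoder([0, 0, 0]))) # a 3-to-8 decoder has eight outputs
```

The gate count grows with the number of outputs, since each output needs its own n-input AND term, which is one motivation for building large decoders out of smaller ones.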

Dynamic NOR Decoder
[Figure: dynamic NOR decoder. Word lines B3..B0 are precharged to Vdd; pull-down transistors to GND are driven by A0, !A0, A1, !A1.]
• Example: precharge 0 → 1, A0 !A0 A1 !A1 = 0101
o B3: 1 → 0
o B2: 1 → 0
o B1: 1 → 0
o B0: 1 → 1

Dynamic NAND Decoder
[Figure: dynamic NAND decoder. Word lines B3..B0 discharge to GND through transistors driven by A0, !A0, A1, !A1, with a precharge signal.]
• Example: A0 !A0 A1 !A1 = 0101, precharge 0 → 1
o B3: 1 → 1
o B2: 1 → 1
o B1: 1 → 1
o B0: 1 → 0

Building Big Decoders from Small
• Active low enable
• Active low output
[Figure: a 1x2 decoder and several 2x4 decoders with enable inputs, combined into a larger decoder for the address A4 A3 A2 A1 A0; example address 00001.]

Multiplexers
• Selects one of several inputs to gate to the single output
• 4x1 mux with inputs In0..In3 and select S1 S0:
Out = In0 & !S1 & !S0 |
      In1 & !S1 & S0 |
      In2 & S1 & !S0 |
      In3 & S1 & S0
• Two inverters, four 3-input NANDs, one 4-input NAND
• How about for an 8x1, 16x1, etc. mux?
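
The sum-of-products expression can be checked directly. The sketch below evaluates it on 0/1 values and confirms that it selects input number 2*S1 + S0 for every combination; the function name `mux4` is illustrative.

```python
from itertools import product

def mux4(in0, in1, in2, in3, s1, s0):
    """4x1 multiplexer as a sum of four product terms."""
    inv = lambda x: 1 - x
    return (
        (in0 & inv(s1) & inv(s0))
        | (in1 & inv(s1) & s0)
        | (in2 & s1 & inv(s0))
        | (in3 & s1 & s0)
    )

def mux4_is_correct():
    """True if the expression picks input 2*s1 + s0 for all 64 combinations."""
    for ins in product((0, 1), repeat=4):
        for s1, s0 in product((0, 1), repeat=2):
            if mux4(*ins, s1, s0) != ins[2 * s1 + s0]:
                return False
    return True

print(mux4_is_correct())
```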

Review: TG 2x1 Multiplexer
[Figure: transmission-gate 2x1 multiplexer schematic and layout with inputs In1 and In2, select S and !S, output F, and VDD/GND rails.]
F = !((In1 & S) | (In2 & !S))

Building Big Muxes from Small
[Figure: 4x1 multiplexer built from three 2x1 multiplexers. S0 selects within the pairs (A0, A1) and (A2, A3); S1 selects between the two results to drive Out. Example select values: S0 = 0, S1 = 1.]
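
The tree construction extends to any power-of-two input count: each level halves the number of candidates using one select bit. The sketch below models this with a non-inverting 2x1 mux, which is a simplifying assumption (the transmission-gate mux reviewed earlier is inverting), and the function names are illustrative.

```python
def mux2(a, b, s):
    """Non-inverting 2x1 mux: returns a when s = 0, b when s = 1."""
    return b if s else a

def mux_tree(inputs, selects):
    """Build a 2**K-input mux from 2x1 muxes.

    inputs  : list of 2**K values [A0, A1, ...]
    selects : [S0, S1, ...], least significant first; S0 drives the first level
    """
    if len(inputs) != 2 ** len(selects):
        raise ValueError("need 2**K inputs for K select bits")
    level = list(inputs)
    for s in selects:
        # Pair up neighbours and keep one from each pair.
        level = [mux2(level[i], level[i + 1], s) for i in range(0, len(level), 2)]
    return level[0]

# 4x1 from three 2x1 muxes, with S0 = 0 and S1 = 1.
print(mux_tree(["A0", "A1", "A2", "A3"], [0, 1]))
```

A tree of 2x1 muxes uses 2^K - 1 small muxes and K levels of delay, avoiding the wide gates of the flat sum-of-products form.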

Review: Datapath Bit-Sliced Organization
[Figure: bit slices Bit 3 through Bit 0. Control flows across the slices and data flows along them, from the I$ through pipeline registers, register file, multiplexers, adder, and shifter, to/from the D$; a decoder is also shown.]
• Tile identical bit-slice elements

Layout of Bit-Sliced Datapaths
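
Transistor Sizing
[Figure: two-input gates with inputs A and B, pull-up resistances Rp, pull-down resistances Rn, internal capacitance Cint, and load capacitance CL; transistor widths annotated 1 and 2.]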

Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))

Input  pMOS annotations  nMOS annotation
A      2, 6              2
B      4, 12             2
C      4, 12             2
D      2, 6              1
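
One way to read these sizes is that every worst-case conduction path has the same resistance as a unit device. The sketch below checks this under stated assumptions: resistance is taken as inversely proportional to width, the first pMOS annotation (2, 4, 4, 2) is taken as the size relative to a unit pMOS, and the path lists are derived from OUT = !(D + A • (B + C)), with the pull-up network as the dual of the pull-down network.

```python
def path_resistance(path, widths):
    """Series resistance of a path, in units of the unit-device resistance."""
    return sum(1.0 / widths[name] for name in path)

# Relative widths (assumed reading of the annotations).
NMOS = {"A": 2, "B": 2, "C": 2, "D": 1}
PMOS = {"A": 2, "B": 4, "C": 4, "D": 2}

# Pull-down: D in parallel with (A in series with (B parallel C)).
PULL_DOWN_PATHS = [("D",), ("A", "B"), ("A", "C")]
# Pull-up (dual): D in series with (A parallel (B in series with C)).
PULL_UP_PATHS = [("D", "A"), ("D", "B", "C")]

for path in PULL_DOWN_PATHS:
    print("pull-down", "-".join(path), path_resistance(path, NMOS))
for path in PULL_UP_PATHS:
    print("pull-up  ", "-".join(path), path_resistance(path, PMOS))
```

Devices that sit in longer series chains (B and C in the pull-up network) are made wider so that the whole chain still matches the unit resistance.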

Fan-In Considerations
[Figure: four-input NAND gate with inputs A, B, C, D, internal node capacitances C1, C2, C3, and load capacitance CL.]
• Distributed RC model (Elmore delay)
tpHL = 0.69 Reqn (C1 + 2C2 + 3C3 + 4CL)
• Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.
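
The quadratic trend can be seen by evaluating the Elmore expression for increasing fan-in. The sketch below generalizes the four-input formula to n series transistors, assuming every transistor has the same resistance Reqn and every internal node the same capacitance; those uniformity assumptions and the function names are illustrative.

```python
def t_phl(r_eqn, internal_caps, c_load):
    """Elmore estimate: 0.69 * Reqn * (C1 + 2*C2 + ... + n*CL).

    internal_caps lists C1, C2, ... starting at the node nearest ground.
    """
    caps = list(internal_caps) + [c_load]
    return 0.69 * r_eqn * sum((k + 1) * c for k, c in enumerate(caps))

def t_phl_uniform(fan_in, r_eqn=1.0, c_int=1.0, c_load=1.0):
    """Same estimate with identical internal capacitances (assumed)."""
    return t_phl(r_eqn, [c_int] * (fan_in - 1), c_load)

for n in (2, 4, 8, 16):
    print(n, round(t_phl_uniform(n), 2))
```

With uniform capacitances the internal nodes contribute n(n-1)/2 terms, which is where the quadratic growth comes from.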

Example of Logical Effort
• Assuming a pMOS/nMOS ratio of 2, the input capacitance of a minimum-sized inverter is three times the gate capacitance of a minimum-sized nMOS (Cunit)
• Inverter (pMOS 2, nMOS 1): input capacitance = 3 Cunit
• 2-input NAND (pMOS 2 and 2, nMOS 2 and 2): input capacitance = 4 Cunit
• 2-input NOR (pMOS 4 and 4, nMOS 1 and 1): input capacitance = 5 Cunit
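
Logical effort is the ratio of a gate's input capacitance to that of an inverter delivering the same output current, so the annotated sizes give it directly. A short sketch, with the dictionary layout and function names chosen for illustration:

```python
from fractions import Fraction

# Widths of the pMOS and nMOS transistors connected to a single input.
SIZES = {
    "inverter": {"pmos": 2, "nmos": 1},
    "nand2": {"pmos": 2, "nmos": 2},
    "nor2": {"pmos": 4, "nmos": 1},
}

def input_cap(gate):
    """Input capacitance of one input, in units of Cunit."""
    size = SIZES[gate]
    return size["pmos"] + size["nmos"]

def logical_effort(gate):
    """Input capacitance relative to the reference inverter."""
    return Fraction(input_cap(gate), input_cap("inverter"))

for gate in SIZES:
    print(gate, input_cap(gate), logical_effort(gate))
```

The NOR gate comes out worse than the NAND because its series pMOS devices must be doubled in width to match the inverter's pull-up strength.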
