
Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter: Custom single-purpose processors

Outline

Introduction
Combinational logic
Sequential logic
Custom single-purpose processor design
RT-level custom single-purpose processor design

Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

Introduction

Processor
Digital circuit that performs computation tasks
Controller and datapath
General-purpose: variety of computation tasks
Single-purpose: one particular computation task
Custom single-purpose: non-standard task
A custom single-purpose processor may be
Fast, small, low power
But, high NRE, longer time-to-market, less flexible

[Figure: digital camera chip containing lens, CCD, CCD preprocessor, A2D, D2A, Pixel coprocessor, JPEG codec, Microcontroller, Multiplier/Accum, DMA controller, Display ctrl, Memory controller, ISA bus interface, UART, LCD ctrl]

CMOS transistor on silicon

Transistor
The basic electrical component in digital systems
Acts as an on/off switch
Voltage at "gate" controls whether current flows from source to drain
Don't confuse this "gate" with a logic gate

[Figure: transistor symbol with gate, source, and drain (conducts if gate=1), next to a cross-section of an IC package showing the gate and oxide above the source, channel, and drain on the silicon substrate]

CMOS transistor implementations

Complementary Metal Oxide Semiconductor
We refer to logic levels
Typically 0 is 0V, 1 is 5V
Two basic CMOS types
nMOS conducts if gate=1
pMOS conducts if gate=0
Hence "complementary"
Basic gates
Inverter, NAND, NOR

[Figure: nMOS (conducts if gate=1) and pMOS (conducts if gate=0) transistor symbols; transistor-level inverter (F = x'), NAND gate (F = (xy)'), and NOR gate (F = (x+y)')]

Basic logic gates

One-input gates
x    Driver    Inverter
     F = x     F = x'
0    0         1
1    1         0

Two-input gates
x y  AND      OR         XOR        NAND        NOR          XNOR
     F = xy   F = x+y    F = x ⊕ y  F = (xy)'   F = (x+y)'   F = (x ⊕ y)'
0 0  0        0          0          1           1            1
0 1  0        1          1          1           0            0
1 0  0        1          1          1           0            0
1 1  1        1          0          0           0            1
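
The gate truth tables above can be checked with a small software model. This is only an illustrative sketch in C, not part of the original slides: the function names and the `truth_column` helper are hypothetical, and each gate is modeled as a function on single-bit values.

```c
#include <assert.h>
#include <stdbool.h>

/* Each function models one gate from the slide on single-bit values. */
bool gate_driver(bool x)       { return x; }
bool gate_inverter(bool x)     { return !x; }
bool gate_and(bool x, bool y)  { return x && y; }
bool gate_or(bool x, bool y)   { return x || y; }
bool gate_xor(bool x, bool y)  { return x != y; }
bool gate_nand(bool x, bool y) { return !(x && y); }
bool gate_nor(bool x, bool y)  { return !(x || y); }
bool gate_xnor(bool x, bool y) { return x == y; }

/* Hypothetical helper: packs the F column of a two-input gate's truth
   table into 4 bits, with row xy=00 in bit 3 down to row xy=11 in bit 0. */
unsigned truth_column(bool (*gate)(bool, bool)) {
    assert(gate != 0);
    unsigned column = 0;
    for (int row = 0; row < 4; row++) {
        bool x = (row >> 1) & 1;
        bool y = row & 1;
        column = (column << 1) | (gate(x, y) ? 1u : 0u);
    }
    return column;
}
```

For example, `truth_column(gate_nand)` yields the bit pattern 1110, which matches the NAND column read top to bottom.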

Combinational logic design
A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs   Outputs
a b c    y z
0 0 0    0 0
0 0 1    0 1
0 1 0    0 1
0 1 1    1 0
1 0 0    1 0
1 0 1    1 1
1 1 0    1 1
1 1 1    1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
y    bc
a    00 01 11 10
0     0  0  1  0
1     1  1  1  1
y = a + bc

z    bc
a    00 01 11 10
0     0  1  0  1
1     0  1  1  1
z = ab + b'c + bc'

E) Logic Gates
[Figure: gate-level circuit computing y and z from inputs a, b, and c]
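
One way to gain confidence in a minimization is to compare the minimized equations with the sum-of-minterms equations on all eight input rows. The sketch below does that in C. It assumes the minimized forms are y = a + bc and z = ab + b'c + bc', with the complemented literals read off the K-map. The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Sum-of-minterms forms from step C. */
bool y_full(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}
bool z_full(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized forms from step D (assumed: y = a + bc, z = ab + b'c + bc'). */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Counts the (row, output) pairs on which the full and minimized
   equations disagree; 0 means the minimization preserved both functions. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        if (y_full(a, b, c) != y_min(a, b, c)) mismatches++;
        if (z_full(a, b, c) != z_min(a, b, c)) mismatches++;
    }
    return mismatches;
}

/* Aborts if the minimized equations changed the function. */
void check_minimization(void) { assert(count_mismatches() == 0); }
```

Calling `check_minimization()` passes silently when both pairs of equations agree on every row of the truth table.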

Combinational components

n-bit, m x 1 Multiplexor
Inputs I(m-1) ... I1 I0 (n bits each), select S(log m) ... S0; output O (n bits)
O = I0 if S=0..00
    I1 if S=0..01
    ...
    I(m-1) if S=1..11

log n x n Decoder
Inputs I(log n -1) ... I0; outputs O(n-1) ... O1 O0
O0 = 1 if I=0..00
O1 = 1 if I=0..01
...
O(n-1) = 1 if I=1..11
With enable input e: all O's are 0 if e=0

n-bit Adder
Inputs A, B (n bits each); outputs carry, sum (n bits)
sum = A+B (first n bits)
carry = (n+1)'th bit of A+B
With carry-in input Ci: sum = A + B + Ci

n-bit Comparator
Inputs A, B (n bits each); outputs less, equal, greater
less = 1 if A<B
equal = 1 if A=B
greater = 1 if A>B

n bit, m function ALU
Inputs A, B (n bits each), select S(log m) ... S0; output O (n bits)
O = A op B, op determined by S.
May have status outputs carry, zero, etc.
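
The behavior of these components can be sketched as plain C functions. This is a behavioral model under stated assumptions, not a hardware description: the width is fixed at 8 bits, the decoder is limited to 32 outputs, and the four ALU operations and their select codes are hypothetical choices made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* m x 1 multiplexor: O = I[S]. */
uint8_t mux(const uint8_t *inputs, unsigned m, unsigned select) {
    assert(select < m);
    return inputs[select];
}

/* Decoder with enable: exactly output bit `input` is 1; all outputs 0 if e=0. */
uint32_t decoder(unsigned input, unsigned n, int enable) {
    assert(n <= 32 && input < n);
    return enable ? (UINT32_C(1) << input) : 0;
}

/* 8-bit adder with carry-in: sum is the low 8 bits, carry is the 9th bit. */
typedef struct { uint8_t sum; uint8_t carry; } AdderOut;
AdderOut adder8(uint8_t a, uint8_t b, uint8_t carry_in) {
    unsigned full = (unsigned)a + b + (carry_in & 1u);
    AdderOut out = { (uint8_t)(full & 0xFFu), (uint8_t)((full >> 8) & 1u) };
    return out;
}

/* Comparator: exactly one of less, equal, greater is 1. */
typedef struct { int less, equal, greater; } CompareOut;
CompareOut comparator8(uint8_t a, uint8_t b) {
    CompareOut out = { a < b, a == b, a > b };
    return out;
}

/* ALU: O = A op B, op chosen by S (assumed codes: 0 add, 1 sub, 2 and, 3 or). */
uint8_t alu8(uint8_t a, uint8_t b, unsigned select) {
    switch (select & 3u) {
    case 0:  return (uint8_t)(a + b);
    case 1:  return (uint8_t)(a - b);
    case 2:  return a & b;
    default: return a | b;
    }
}
```

For instance, `adder8(200, 100, 0)` gives sum 44 with carry 1, because 300 does not fit in 8 bits.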

Sequential components

n-bit Register
Inputs I (n bits), load, clear; output Q (n bits)
Q = 0 if clear=1,
    I if load=1 and clock=1,
    Q(previous) otherwise.

n-bit Shift register
Inputs I, shift; output Q
Q = lsb
- Content shifted
- I stored in msb

n-bit Counter
Output Q (n bits)
Q = 0 if clear=1,
    Q(prev)+1 if count=1 and clock=1.
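
The register, shift register, and counter rules can be modeled as next-state functions, where one call stands for one clock edge. This is a rough sketch under assumptions not stated on the slide: 8-bit width, a right shift (so the lsb is the output and I enters at the msb), and a counter that wraps around on overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Register: clear wins, then load, otherwise hold the previous value. */
uint8_t register_step(uint8_t q, uint8_t i, int load, int clear) {
    if (clear) return 0;
    if (load)  return i;
    return q;
}

/* Shift register: content moves one bit toward the lsb, I is stored in the msb. */
uint8_t shift_register_step(uint8_t content, int i, int shift) {
    if (!shift) return content;
    return (uint8_t)((content >> 1) | ((i & 1) << 7));
}

/* The shift register's output Q is its lsb. */
int shift_register_q(uint8_t content) { return content & 1; }

/* Counter: clear wins, then count up by one (wrapping from 255 to 0). */
uint8_t counter_step(uint8_t q, int count, int clear) {
    if (clear) return 0;
    if (count) return (uint8_t)(q + 1);
    return q;
}
```

A register would be simulated by repeatedly assigning `q = register_step(q, i, load, clear)` once per clock cycle.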

Sequential logic design

A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
State 0 (x=0): stay if a=0, go to state 1 if a=1
State 1 (x=0): stay if a=0, go to state 2 if a=1
State 2 (x=0): stay if a=0, go to state 3 if a=1
State 3 (x=1): stay if a=0, go to state 0 if a=1

C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; the state register holds Q1 Q0 and loads I1 I0]

D) State Table (Moore-type)
Inputs      Outputs
Q1 Q0 a     I1 I0 x
0  0  0     0  0  0
0  0  1     0  1  0
0  1  0     0  1  0
0  1  1     1  0  0
1  0  0     1  0  0
1  0  1     1  1  0
1  1  0     1  1  1
1  1  1     0  0  1

Given this implementation model
Sequential logic design quickly reduces to combinational logic design

Sequential logic design (cont.)
E) Minimized Output Equations

I1   Q1Q0
a    00 01 11 10
0     0  0  1  1
1     0  1  0  1
I1 = Q1'Q0a + Q1a' + Q1Q0'

I0   Q1Q0
a    00 01 11 10
0     0  1  1  0
1     1  0  0  1
I0 = Q0a' + Q0'a

x    Q1Q0
a    00 01 11 10
0     0  0  1  0
1     0  0  1  0
x = Q1Q0

F) Combinational Logic
[Figure: gate-level circuit computing I1, I0, and x from a, Q1, and Q0]
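
The clock divider can be simulated directly from its minimized equations. The sketch below assumes the equations I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a, and x = Q1Q0 (complements read off the K-maps), starts in state 00, and holds a at 1. The type and function names are illustrative.

```c
#include <assert.h>

typedef struct { int q1, q0; } DividerState;

/* Moore output: x depends on the current state only. */
int divider_output(DividerState s) { return s.q1 && s.q0; }

/* Next-state logic taken from the minimized equations for I1 and I0. */
DividerState divider_next(DividerState s, int a) {
    DividerState next;
    next.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    next.q0 = (s.q0 && !a) || (!s.q0 && a);
    return next;
}

/* Counts the clock cycles in which x = 1, starting from state 00
   and holding a = 1 for the given number of cycles. */
int count_pulses(int cycles) {
    assert(cycles >= 0);
    DividerState s = {0, 0};
    int pulses = 0;
    for (int i = 0; i < cycles; i++) {
        pulses += divider_output(s);
        s = divider_next(s, 1);
    }
    return pulses;
}
```

Over eight cycles the model visits states 0, 1, 2, 3, 0, 1, 2, 3, so `count_pulses(8)` reports two output pulses, one for every four clock cycles.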

Custom single-purpose processor basic model

[Figure, left: controller and datapath. The controller receives external control inputs and datapath control inputs, and produces external control outputs and datapath control outputs. The datapath receives external data inputs and produces external data outputs.]

[Figure, right: a view inside the controller and datapath. The controller contains next-state and control logic and a state register. The datapath contains registers and functional units.]

Example: greatest common divisor
First create algorithm
Convert algorithm to "complex" state machine
Known as FSMD: finite-state machine with datapath
Can use templates to perform such conversion

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
State  Action      Next state
1:                 2: if 1 (the !1 exit is never taken)
2:                 2-J: if !go_i, 3: if !(!go_i)
2-J:               2:
3:     x = x_i     4:
4:     y = y_i     5:
5:                 6: if x!=y, 9: if !(x!=y)
6:                 7: if x<y, 8: if !(x<y)
7:     y = y - x   6-J:
8:     x = x - y   6-J:
6-J:               5-J:
5-J:               5:
9:     d_o = x     1-J:
1-J:               1:

State diagram templates

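Assignment statement
a = b
next statement
Template: a state with action a = b, followed by the next statement's state.

Loop statement
while (cond) {
  loop-body-statements
}
next statement
Template: a condition state C: goes to the loop-body-statements if cond and to the next statement if !cond. After the loop body, a join state J: returns to C:.

Branch statement
if (c1)
  c1 stmts
else if c2
  c2 stmts
else
  other stmts
next statement
Template: a condition state C: goes to the c1 stmts if c1, to the c2 stmts if !c1*c2, and to the others if !c1*!c2. All three branches meet in a join state J:, which goes to the next statement.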

Creating the datapath
Create a register for any declared variable
Create a functional unit for each arithmetic operation
Connect the ports, registers and functional units
Based on reads and writes
Use multiplexors for multiple sources
Create unique identifier for each datapath component control input and output

[Figure, left: the GCD state diagram with states 1: through 1-J:]

[Figure, right: Datapath. Inputs x_i and y_i feed two n-bit 2x1 multiplexors (selects x_sel and y_sel), which feed registers 0: x and 0: y (loads x_ld and y_ld). The registers feed a != comparator (5: x!=y, output x_neq_y), a < comparator (6: x<y, output x_lt_y), and two subtractors (8: x-y and 7: y-x) whose results return to the multiplexors. Register 9: d (load d_ld) drives d_o.]

Creating the controller's FSM

Same structure as FSMD
Replace complex actions/conditions with datapath configurations

Controller
State  FSMD state  Datapath configuration  Next state
0000   1:                                  0001
0001   2:                                  0010 if !go_i, 0011 if go_i
0010   2-J:                                0001
0011   3:          x_sel = 0, x_ld = 1     0100
0100   4:          y_sel = 0, y_ld = 1     0101
0101   5:                                  0110 if x_neq_y, 1011 if !x_neq_y
0110   6:                                  0111 if x_lt_y, 1000 if !x_lt_y
0111   7:          y_sel = 1, y_ld = 1     1001
1000   8:          x_sel = 1, x_ld = 1     1001
1001   6-J:                                1010
1010   5-J:                                0101
1011   9:          d_ld = 1                1100
1100   1-J:                                0000

[Figure: the FSMD state diagram beside the controller FSM, and the datapath with inputs x_i and y_i, control inputs x_sel, y_sel, x_ld, y_ld, d_ld, status outputs x_neq_y and x_lt_y, and output d_o]

Splitting into a controller and datapath
(a) Controller implementation model
[Figure: combinational logic with inputs go_i, x_neq_y, x_lt_y, and Q3 Q2 Q1 Q0, and outputs x_sel, y_sel, x_ld, y_ld, d_ld, and I3 I2 I1 I0; the state register holds Q3 Q2 Q1 Q0 and loads I3 I2 I1 I0]

Controller
[Figure: the controller FSM with states 0000 (1:) through 1100 (1-J:), branching on go_i in state 0001, on x_neq_y in state 0101, and on x_lt_y in state 0110]

(b) Datapath
[Figure: the GCD datapath with inputs x_i and y_i, two n-bit 2x1 multiplexors, registers 0: x, 0: y, and 9: d, comparators 5: x!=y and 6: x<y, subtractors 8: x-y and 7: y-x, and output d_o]

Controller state table for the GCD example
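Inputs: Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i. Outputs: I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld. (* and X mark don't-care values.)
Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld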
0 0 0 0 * * * 0 0 0 1 X X 0 0 0
0 0 0 1 * * 0 0 0 1 0 X X 0 0 0
0 0 0 1 * * 1 0 0 1 1 X X 0 0 0
0 0 1 0 * * * 0 0 0 1 X X 0 0 0
0 0 1 1 * * * 0 1 0 0 0 X 1 0 0
0 1 0 0 * * * 0 1 0 1 X 0 0 1 0
0 1 0 1 0 * * 1 0 1 1 X X 0 0 0
0 1 0 1 1 * * 0 1 1 0 X X 0 0 0
0 1 1 0 * 0 * 1 0 0 0 X X 0 0 0
0 1 1 0 * 1 * 0 1 1 1 X X 0 0 0
0 1 1 1 * * * 1 0 0 1 X 1 0 1 0
1 0 0 0 * * * 1 0 0 1 1 X 1 0 0
1 0 0 1 * * * 1 0 1 0 X X 0 0 0
1 0 1 0 * * * 0 1 0 1 X X 0 0 0
1 0 1 1 * * * 1 1 0 0 X X 0 0 1
1 1 0 0 * * * 0 0 0 0 X X 0 0 0
1 1 0 1 * * * 0 0 0 0 X X 0 0 0
1 1 1 0 * * * 0 0 0 0 X X 0 0 0
1 1 1 1 * * * 0 0 0 0 X X 0 0 0
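
To see how the state table and the datapath cooperate, the sketch below simulates both in C, one loop iteration per clock cycle. It is an illustrative model, not the authors' implementation. It assumes 32-bit registers, treats don't-care outputs as 0, holds go_i at 1, and stops once d has been loaded. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y, d; } Datapath;
typedef struct { int x_sel, y_sel, x_ld, y_ld, d_ld; } Control;

/* Controller outputs per state (Moore-type); don't-cares are modeled as 0. */
Control controller_outputs(unsigned state) {
    Control c = {0, 0, 0, 0, 0};
    switch (state) {
    case 0x3: c.x_sel = 0; c.x_ld = 1; break;  /* 3: x = x_i   */
    case 0x4: c.y_sel = 0; c.y_ld = 1; break;  /* 4: y = y_i   */
    case 0x7: c.y_sel = 1; c.y_ld = 1; break;  /* 7: y = y - x */
    case 0x8: c.x_sel = 1; c.x_ld = 1; break;  /* 8: x = x - y */
    case 0xB: c.d_ld = 1; break;               /* 9: d_o = x   */
    default: break;
    }
    return c;
}

/* Next-state column of the state table. */
unsigned controller_next(unsigned state, int x_neq_y, int x_lt_y, int go_i) {
    switch (state) {
    case 0x0: return 0x1;
    case 0x1: return go_i ? 0x3 : 0x2;
    case 0x2: return 0x1;
    case 0x3: return 0x4;
    case 0x4: return 0x5;
    case 0x5: return x_neq_y ? 0x6 : 0xB;
    case 0x6: return x_lt_y ? 0x7 : 0x8;
    case 0x7: return 0x9;
    case 0x8: return 0x9;
    case 0x9: return 0xA;
    case 0xA: return 0x5;
    case 0xB: return 0xC;
    default:  return 0x0;  /* 1100 and the unused codes return to 0000 */
    }
}

/* One clock edge of the datapath: multiplexors pick a source, registers load. */
void datapath_clock(Datapath *dp, Control c, uint32_t x_i, uint32_t y_i) {
    uint32_t x_next = c.x_sel ? dp->x - dp->y : x_i;
    uint32_t y_next = c.y_sel ? dp->y - dp->x : y_i;
    uint32_t d_next = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
    if (c.d_ld) dp->d = d_next;
}

/* Runs controller and datapath together until d is loaded; go_i is held at 1. */
uint32_t gcd_hw(uint32_t x_i, uint32_t y_i, unsigned *cycles) {
    assert(x_i > 0 && y_i > 0);  /* the subtraction loop never ends for 0 */
    Datapath dp = {0, 0, 0};
    unsigned state = 0x0, n = 0;
    for (;;) {
        Control c = controller_outputs(state);
        int x_neq_y = dp.x != dp.y, x_lt_y = dp.x < dp.y;
        unsigned next = controller_next(state, x_neq_y, x_lt_y, 1);
        datapath_clock(&dp, c, x_i, y_i);
        n++;
        if (state == 0xB) break;  /* d has just been loaded */
        state = next;
    }
    if (cycles) *cycles = n;
    return dp.d;
}
```

Under these assumptions `gcd_hw(42, 8, &cycles)` returns 2 after 46 simulated cycles: four setup states, five states for each of the eight subtractions, and two states to leave the loop and load d.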

Completing the GCD custom single-purpose processor design

We finished the datapath
We have a state table for the next state and control logic
All that's left is combinational logic design
This is not an optimized design, but we see the basic steps

[Figure: a view inside the controller and datapath. The controller contains next-state and control logic and a state register. The datapath contains registers and functional units.]

RT-level custom single-purpose processor design

We often start with a state machine
Rather than algorithm
Cycle timing often too central to functionality
Example
Bus bridge that converts 4-bit bus to 8-bit bus
Start with FSMD
Known as register-transfer (RT) level
Exercise: complete the design

Problem Specification
[Figure: Sender, Bridge, and Receiver. The sender drives rdy_in and data_in(4) into the bridge, the bridge drives rdy_out and data_out(8) into the receiver, and the bridge has a clock input.]
Bridge: A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD
Bridge
State            Action                                    Next state
WaitFirst4                                                 RecFirst4Start if rdy_in=1, WaitFirst4 if rdy_in=0
RecFirst4Start   data_lo=data_in                           RecFirst4End
RecFirst4End                                               WaitSecond4 if rdy_in=0, RecFirst4End if rdy_in=1
WaitSecond4                                                RecSecond4Start if rdy_in=1, WaitSecond4 if rdy_in=0
RecSecond4Start  data_hi=data_in                           RecSecond4End
RecSecond4End                                              Send8Start if rdy_in=0, RecSecond4End if rdy_in=1
Send8Start       data_out=data_hi & data_lo, rdy_out=1     Send8End
Send8End         rdy_out=0                                 WaitFirst4

Inputs
rdy_in: bit; data_in: bit[4];
Outputs
rdy_out: bit; data_out: bit[8]
Variables
data_lo, data_hi: bit[4];

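RT-level custom single-purpose processor design (cont)

Bridge
(a) Controller
State            Control outputs            Next state
WaitFirst4                                  RecFirst4Start if rdy_in=1, WaitFirst4 if rdy_in=0
RecFirst4Start   data_lo_ld=1               RecFirst4End
RecFirst4End                                WaitSecond4 if rdy_in=0, RecFirst4End if rdy_in=1
WaitSecond4                                 RecSecond4Start if rdy_in=1, WaitSecond4 if rdy_in=0
RecSecond4Start  data_hi_ld=1               RecSecond4End
RecSecond4End                               Send8Start if rdy_in=0, RecSecond4End if rdy_in=1
Send8Start       data_out_ld=1, rdy_out=1   Send8End
Send8End         rdy_out=0                  WaitFirst4

(b) Datapath
[Figure: data_in(4) feeds the registers data_hi and data_lo, loaded by data_hi_ld and data_lo_ld. Their outputs together feed the data_out register, loaded by data_out_ld. clk goes to all registers. rdy_in enters the controller and rdy_out leaves it.]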
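
The bridge FSMD can be exercised with a small cycle-by-cycle model. The sketch below is one possible reading of the state diagram, not a definitive implementation. It assumes that each state's action takes effect on the clock edge that leaves the state, that "&" denotes concatenation with data_hi as the upper four bits, and that the sender holds rdy_in and data_in steady for two cycles per nibble. The `bridge_reset`, `bridge_clock`, and `bridge_transfer` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    WaitFirst4, RecFirst4Start, RecFirst4End,
    WaitSecond4, RecSecond4Start, RecSecond4End,
    Send8Start, Send8End
} BridgeState;

typedef struct {
    BridgeState state;
    uint8_t data_lo, data_hi;  /* 4-bit variables */
    uint8_t data_out;          /* 8-bit output */
    int rdy_out;
} Bridge;

void bridge_reset(Bridge *b) {
    b->state = WaitFirst4;
    b->data_lo = b->data_hi = b->data_out = 0;
    b->rdy_out = 0;
}

/* Advances the bridge by one clock cycle. */
void bridge_clock(Bridge *b, int rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WaitFirst4:      if (rdy_in)  b->state = RecFirst4Start;  break;
    case RecFirst4Start:  b->data_lo = data_in & 0xF;
                          b->state = RecFirst4End;                 break;
    case RecFirst4End:    if (!rdy_in) b->state = WaitSecond4;     break;
    case WaitSecond4:     if (rdy_in)  b->state = RecSecond4Start; break;
    case RecSecond4Start: b->data_hi = data_in & 0xF;
                          b->state = RecSecond4End;                break;
    case RecSecond4End:   if (!rdy_in) b->state = Send8Start;      break;
    case Send8Start:      b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
                          b->rdy_out = 1;
                          b->state = Send8End;                     break;
    case Send8End:        b->rdy_out = 0;
                          b->state = WaitFirst4;                   break;
    }
}

/* Sends two nibbles through a freshly reset bridge and returns data_out. */
uint8_t bridge_transfer(uint8_t first_nibble, uint8_t second_nibble) {
    Bridge b;
    bridge_reset(&b);
    bridge_clock(&b, 1, first_nibble);   /* WaitFirst4 sees rdy_in       */
    bridge_clock(&b, 1, first_nibble);   /* RecFirst4Start loads data_lo */
    bridge_clock(&b, 0, 0);              /* rdy_in drops                 */
    bridge_clock(&b, 1, second_nibble);  /* WaitSecond4 sees rdy_in      */
    bridge_clock(&b, 1, second_nibble);  /* RecSecond4Start loads data_hi */
    bridge_clock(&b, 0, 0);              /* rdy_in drops                 */
    bridge_clock(&b, 0, 0);              /* Send8Start drives data_out   */
    assert(b.rdy_out == 1);
    return b.data_out;
}
```

With this reading, `bridge_transfer(0xA, 0x5)` produces 0x5A: the first nibble lands in the low half and the second in the high half.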

Optimizing single-purpose processors

Optimization is the task of making design metric values the best possible
Optimization opportunities
original program
FSMD
datapath
FSM

Optimizing the original program

Analyze program attributes and look for areas of possible improvement
number of computations
size of variable
time and space complexity
operations used
multiplication and division very expensive

Optimizing the original program (cont)
original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

replace the subtraction operation(s) with modulo operation in order to speed up program

optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x = x_i;
5:     y = y_i;
     }
6:   else {
7:     x = y_i;
8:     y = x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }
GCD(42, 8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)
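
The difference between the two programs can be measured by counting loop-body executions. The sketch below wraps each inner loop in a C function. The function names and the `iterations` out-parameter are illustrative additions. The counts are loop-body executions, so they come out one lower than the number of (x, y) pairs the slide lists, which also includes the starting pair.

```c
#include <assert.h>

/* Subtraction-based GCD (original program); counts loop-body executions. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *iterations) {
    assert(x > 0 && y > 0);  /* the loop never ends if either input is 0 */
    unsigned n = 0;
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
        n++;
    }
    if (iterations) *iterations = n;
    return x;
}

/* Modulo-based GCD (optimized program); counts loop-body executions. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *iterations) {
    assert(x_i > 0 && y_i > 0);
    unsigned x, y, r, n = 0;
    /* x must be the larger number */
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        n++;
    }
    if (iterations) *iterations = n;
    return x;
}
```

For GCD(42, 8) the subtraction version executes its loop body 8 times and the modulo version 2 times, and both return 2.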

Optimizing the FSMD

Areas of possible improvement
merge states
states with constants on transitions can be eliminated, since the transition taken is already known
states with independent operations can be merged
separate states
states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
scheduling

Optimizing the FSMD (cont.)
original FSMD
[Figure: the 13-state GCD FSMD with states 1:, 2:, 2-J:, 3:, 4:, 5:, 6:, 7:, 8:, 6-J:, 5-J:, 9:, 1-J:, preceded by the declaration int x, y;]

Optimization steps
eliminate state 1: transitions have constant values
merge state 2 and state 2J: no loop operation in between them
merge state 3 and state 4: assignment operations are independent of one another
merge state 5 and state 6: transitions from state 6 can be done in state 5
eliminate state 5J and 6J: transitions from each state can be done from state 7 and state 8, respectively
eliminate state 1-J: transition from state 1-J can be done directly from state 9

optimized FSMD
int x, y;
State  Action                Next state
2:                           2: if !go_i, 3: if go_i
3:     x = x_i, y = y_i      5:
5:                           7: if x<y, 8: if x>y, 9: otherwise
7:     y = y - x             5:
8:     x = x - y             5:
9:     d_o = x               2:
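
The optimized FSMD is small enough to simulate as a C state machine, one loop iteration per clock cycle. This is a sketch under assumptions: go_i is held at 1, state 5 goes to state 9 when x equals y, and the simulation stops after state 9 instead of returning to state 2. The `OptState` enum and function name are hypothetical.

```c
#include <assert.h>

typedef enum { S2, S3, S5, S7, S8, S9 } OptState;

/* Simulates the six-state FSMD and reports the number of cycles used. */
unsigned gcd_fsmd_optimized(unsigned x_i, unsigned y_i, unsigned *cycles) {
    assert(x_i > 0 && y_i > 0);  /* the subtraction loop never ends for 0 */
    unsigned x = 0, y = 0, d_o = 0, n = 0;
    int go_i = 1, done = 0;
    OptState s = S2;
    while (!done) {
        n++;
        switch (s) {
        case S2: s = go_i ? S3 : S2;                     break;
        case S3: x = x_i; y = y_i; s = S5;               break;  /* merged 3 and 4 */
        case S5: s = (x < y) ? S7 : (x > y) ? S8 : S9;   break;  /* merged 5 and 6 */
        case S7: y = y - x; s = S5;                      break;
        case S8: x = x - y; s = S5;                      break;
        case S9: d_o = x; done = 1;                      break;
        }
    }
    if (cycles) *cycles = n;
    return d_o;
}
```

For inputs 42 and 8 this model needs 20 cycles: two setup states, two states for each of the eight subtractions, and two states to detect equality and write d_o.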

Optimizing the datapath

Sharing of functional units
one-to-one mapping, as done previously, is not necessary
if the same operation occurs in different states, those states can share a single functional unit
Multi-functional units
ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states
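
As an illustration of sharing, the two GCD subtractions (y - x and x - y) occur in different states, so a single subtractor with operand multiplexors could serve both. The sketch below models that idea in C. The `swap` select signal and the function names are hypothetical and do not appear in the slides.

```c
#include <assert.h>

/* One subtractor shared by both states: operand multiplexors, driven by
   the hypothetical select signal `swap`, choose between x - y and y - x. */
unsigned shared_subtract(unsigned x, unsigned y, int swap) {
    unsigned a = swap ? y : x;
    unsigned b = swap ? x : y;
    return a - b;
}

/* GCD loop in which every subtraction goes through the single shared unit. */
unsigned gcd_shared(unsigned x, unsigned y) {
    assert(x > 0 && y > 0);  /* the loop never ends if either input is 0 */
    while (x != y) {
        int x_lt_y = x < y;
        unsigned diff = shared_subtract(x, y, x_lt_y);
        if (x_lt_y)
            y = diff;
        else
            x = diff;
    }
    return x;
}
```

The trade-off is one subtractor plus two operand multiplexors in place of two subtractors; whether that saves area depends on the relative cost of the components.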

Optimizing the FSM

State encoding
task of assigning a unique bit pattern to each state in an FSM
size of state register and combinational logic vary with the encoding
can be treated as an ordering problem
State minimization
task of merging equivalent states into a single state
two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
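
State minimization can be sketched with a pairwise table-filling check on a Moore-type FSM. The code below is an illustrative sketch, not the method prescribed by the slides. It assumes a single binary input and uses a common generalization of the definition above: two states count as equivalent when their outputs match and their next states are themselves equivalent, not necessarily identical. All names are hypothetical.

```c
#include <assert.h>

#define MAX_STATES 16
#define NUM_INPUTS 2  /* assumed: one binary input */

/* Returns the number of states left after merging equivalent states.
   next[s][in] is the next state of s on input in; output[s] is its output. */
int count_state_classes(int n, const int next[][NUM_INPUTS], const int output[]) {
    int distinct[MAX_STATES][MAX_STATES];
    assert(n > 0 && n <= MAX_STATES);

    /* Start: two states are distinguishable if their outputs differ. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            distinct[i][j] = (output[i] != output[j]);

    /* Refine: a pair becomes distinguishable if some input leads it
       to a pair of next states already known to be distinguishable. */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (distinct[i][j]) continue;
                for (int in = 0; in < NUM_INPUTS; in++)
                    if (distinct[next[i][in]][next[j][in]]) {
                        distinct[i][j] = distinct[j][i] = 1;
                        changed = 1;
                        break;
                    }
            }
    }

    /* A state starts a new class if no lower-numbered state is equivalent. */
    int classes = 0;
    for (int s = 0; s < n; s++) {
        int is_representative = 1;
        for (int t = 0; t < s; t++)
            if (!distinct[s][t]) is_representative = 0;
        classes += is_representative;
    }
    return classes;
}
```

Applied to a four-state divide-by-four counter, whose states all behave differently, the function reports four classes, so nothing can be merged.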

Summary

Custom single-purpose processors


Straightforward design techniques
Can be built to execute algorithms
Typically start with FSMD
CAD tools can be of great assistance
