
High-level VLSI Digital Systems Design: Methods, Notation, Architecture

Dr. James P. Davis, Associate Professor
Director, VLSI Systems Design Lab
University of South Carolina, Columbia, S.C., U.S.A.

2004 Dr. James P. Davis

Seminar Session Outline


1. Introduction to High-level Design (50 minutes)
1.1 Rationale: Design Issues and Trends
1.2 Systems: Custom Logic versus Microprocessor Embedded
1.3 Concepts: Design Hierarchy, Modeling, Abstractions, Patterns, Reuse
1.4 Process: Design Objectives, Planning, Activities
1.5 Methods 1: Design Representation and Search Space
1.6 Methods 2: Model-Driven Architecture (MDA) for VLSI Systems
1.7 Metrics: Measuring Effectiveness and Productivity
Question & Answer

2. Teaching & Practicing High-level Design (70 minutes)


2.1 Using the Executable ASM Method
2.2 Architecture of Digital Systems: Control and Datapath
2.3 Analysis and Modeling of Algorithms and Protocols
2.4 Datapath-Dominated Designs: Arithmetic and Filter Circuits
2.5 Control-Dominated Designs: Protocol Engines
2.6 Applying Architecture Patterns for Reuse
Question & Answer

3. Example High-level System Designs (50 minutes)


3.1 Unsigned Integer Multiplier Circuits
3.2 MAC Layer for 802.11b Wireless LAN
Question & Answer

4. Session Wrap-Up

Section 1 Introduction to High-Level Design


Section 1 Key Points


High-level analysis and architecture design of VLSI-based digital systems.
Assume systems will be implemented wholly in custom logic. Assume for this lecture that applications are designed without use of CPU, and therefore do not incur the overhead of an Instruction Set.

Design process, methods, and notation allow you to take any digital systems application, analyze it, and implement it: starting with math formulae, algorithms or protocols.
Iterative enhancement, top-down/bottom-up, stepwise refinement (staples of the Software Engineering discipline). Additional heuristics relate properties of digital circuits to high-level systems design, to make the systems planning process more effective.

Evaluate the goodness of your architecture based on the use of available tools, by collecting and analyzing post-implementation data to select a best fit.
Method of designing for circuit synthesis, then evaluating different architectures in order to select the best for the application at hand. Metrics will include speed, area, power consumption, module cohesion, module coupling, Power-Delay product, Area-Delay product, etc.
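
As a rough illustration of selecting a best fit from post-implementation data, the sketch below computes the Area-Delay and Power-Delay products for three candidate architectures. The candidate names and all numbers are hypothetical placeholders, not measurements from this seminar; only the two product metrics come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Post-implementation data for one architecture (all values hypothetical)."""
    name: str
    area: float      # e.g. gate equivalents
    delay_ns: float  # critical-path delay
    power_mw: float  # average power

    @property
    def area_delay(self) -> float:
        return self.area * self.delay_ns

    @property
    def power_delay(self) -> float:
        return self.power_mw * self.delay_ns


def best_fit(candidates, metric):
    """Return the candidate with the smallest value of the chosen metric."""
    return min(candidates, key=metric)


# Hypothetical synthesis results for three architectures.
candidates = [
    Candidate("arch_a", area=1000.0, delay_ns=10.0, power_mw=50.0),
    Candidate("arch_b", area=1200.0, delay_ns=8.0, power_mw=55.0),
    Candidate("arch_c", area=900.0, delay_ns=14.0, power_mw=30.0),
]

by_adp = best_fit(candidates, lambda c: c.area_delay)
by_pdp = best_fit(candidates, lambda c: c.power_delay)
print("best Area-Delay product:", by_adp.name)
print("best Power-Delay product:", by_pdp.name)
```

With these placeholder numbers the two metrics pick different architectures, which is why the metric has to be chosen to suit the application at hand.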

System Design Trends What is driving us:


Complexity, Capacity vs. Capability


Introduction Vertical Market Drivers

[Figure: vertical markets (Telecomm, Computers, Consumer Electronics) driven by the VLSI silicon "chip".]

"At the root of cascading changes of modern economic life...devaluing resources in technology, business and geopolitics...overcoming the constraints of material resources, the microchip has devalued most large accumulations of physical capital and made possible the launching of global economic enterprises...microchips find their value not in their substance but in their intellectual content: their design..." George Gilder, Microcosm, 1989

Example Wireless Communications


The market is seeking product technology options to cover different geography ranges and data rates.
Bluetooth: WPAN. IEEE 802.11: WLAN. 2G/3G network: WWAN.


The opportunity for creating value chains encompassing product offerings, distribution and new service offerings hinges on the ability to get low cost solutions to market quickly.
Deliver content to wireless handheld devices. Function convergence in the handset and at the base station. Requires large cross-functional design teams in varied disciplines.

[Figure: wireless technologies plotted by range versus data rate. WPAN: Bluetooth. WLAN: IEEE 802.11b, IEEE 802.11g. WWAN: 2G/3G network, CDMA. Source: Knowledge Edge KK]

Introduction - VLSI SOC Drivers


Many market and technology factors are coming together to create pressure on electronics product engineering organizations worldwide.
Increasing global competition and new markets.
Increasing rate of product innovations and new product introductions.
Decreasing time-to-market windows.
Decreasing shelf life for products in many categories.
Increasing pressure on competitive cost containment, profit margins.
Increasing convergence: integrated functionality in single electronics devices and product packages.
Increasing quality expectations: used as a means to better manage distribution and support costs.
Increasing innovation in silicon process technology and wafer scale integration densities.

Increasing disparity: capacity of the underlying technologies versus capability of designers to manage increasing design complexity.

The Capacity vs. Capability Gap


Increasing capacity of the technology:
The rate of new technology and associated silicon process changes has continued to follow Moore's Law.

The capability of designers and design teams to use this capacity isn't keeping up.
The Capacity versus Capability Gap is widening. Each set of technology and process changes requires designers to manage ever more complexity in the design process. New architectures, abstractions, methods and tools are required to address this increasing complexity.

[Figure: Device Capacity (Design Size, 100K to 10,000K), Design Capability, and Product Time-to-Market (months), 1985 to 2000. Source: Gartner/Dataquest]

Introduction VLSI SOC Objectives


The objectives of SOC approaches are to better manage design complexity.
Using better design planning in trade-off analysis and decision-making
Greater availability of downstream design constraints earlier in the process.
Back-annotating early iteration data into high-level design activities.

Using electronic systems design best practices
Increased levels of design reuse.
More effective hardware-software co-design.
Better trade-offs between general-purpose vs. domain-specific architectures and algorithms.
Greater integration of functionality on-chip (hardware-software, analog-digital).

SOC Architecture Imperatives:


Reusability/Extensibility: faster creation of primary and derivative products.
Reliability: managing device technology constraints as geometry shrinks.
Scalability: ever-larger design densities and levels of integration.
Performance: increasing data throughput, system capacity requirements.
Resource Utilization: efficient use of function, area, power, clocking, interconnect.

Design Problem-Solving What we use:


Methods, Notation, & Process


VLSI-based Design Space (Y-chart)


Categories of Computing Systems Design


"Layered" Computing System

Electronic systems today are of different types, depending on the (1) function, (2) application.
Computing system: CPU and hardware executing O/S, with applications running on top of O/S. Embedded system: fixed-functions, instruction-based micro-controller with both hardware and software components. On-chip system: complete control and data functions implemented in "custom logic" VLSI package. On-chip systems can be components of embedded systems, which can be part of a layered "virtual" machine.
[Figure: classes of electronic systems. Layered computing system: VLSI hardware, microcode, machine code, operating system, application. Embedded system: uC, ROM, RAM, I/O. On-chip system: ASICs and FPGAs.]

Mapping Algorithms/Protocols to Architecture


Levels of Abstraction in System Design


A design transforms from "concept" to "implementation" in a series of ordered levels.
From the highest level to lower levels of design "abstraction", a design is iteratively refined.
The design description is verified and validated at each level, often cycling between levels of abstraction.
Design descriptions are described using one or more domain representations (Behavior, Structure, Physical).

[Figure: levels of abstraction, from highest to lowest: Architectural, Behavioral, Functional/RTL, Structural, Geometrical. Example representations, from most to least abstract: protocol, UML diagrams, algorithm flowchart, queueing network, block diagram, petri net, state equations, state diagram, flow diagram, ASM diagrams, RTL notation, datapath diagram, truth table, schematic diagram, netlist, layout, mask.]

Algorithm to Architecture Process


[Figure: algorithm-to-architecture process. Inputs: Software Algorithm (C code) or Algorithm Spec (Text or Math). Steps: Control Flow modeling (Algorithmic structure); Data Flow modeling (Operation ordering); Create Ordered Sequence of Operations; Overlay Operation Sequence onto Control Structure; Add Hardware Semantics (Clocking, Operation Scheduling, Parallelism, Resource Binding).]

Design as a Problem-solving Process


The "search" for an optimal solution involves tools & methods.
Many possible solutions, some better than others. Search through "solution space".
Trade-offs and constraint checks at each node. At dead-end node, "backtrack" and try another path. Backtracking is costly and time consuming.

[Figure: search tree through the design stages: Application Requirements; Specification; High-level Design and Verification; Logic Design, Timing Analysis; Layout, Routing; Production and Field Test.]
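
The search through a solution space with constraint checks and backtracking can be sketched as a depth-first search. This is only a toy model under stated assumptions: the decision stages, the area costs and the area budget are hypothetical, and a real flow would check far richer constraints at each node.

```python
def search(stages, constraints_ok, partial=()):
    """Depth-first search over design decisions with backtracking.

    stages: list of lists; stages[i] holds the alternatives for decision i.
    constraints_ok: takes a partial design (tuple of choices) and returns
        False when a constraint check fails at that node.
    Returns (design, nodes_visited); design is None if no path succeeds.
    """
    visited = 1
    if not constraints_ok(partial):
        return None, visited            # dead end: the caller backtracks
    if len(partial) == len(stages):
        return partial, visited         # all decisions made, all checks passed
    for choice in stages[len(partial)]:
        design, n = search(stages, constraints_ok, partial + (choice,))
        visited += n
        if design is not None:
            return design, visited
    return None, visited


# Hypothetical decision stages and hypothetical area costs.
stages = [
    ["parallel_multipliers", "shared_multiplier"],  # arithmetic architecture
    ["registered", "unregistered"],                 # datapath style
]
AREA = {"parallel_multipliers": 80, "shared_multiplier": 50,
        "registered": 30, "unregistered": 10}


def within_budget(partial, budget=70):
    """Constraint check: total area of the choices so far fits the budget."""
    return sum(AREA[c] for c in partial) <= budget


design, visited = search(stages, within_budget)
print(design, "after visiting", visited, "nodes")
```

Every visited node that ends in a dead end stands for one wasted design iteration, which is why reducing backtracking matters.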


Impact of Backtracking on High-level Design


[Figure: design flow with backtracking paths. Steps: Requirements Text documentation; Function and Timing Specification; HDL Coding or Text documentation; High-level Design and Verification; Logic Synthesis or Schematic Capture; Gate-level Analysis and Verification; Layout and Routing; Physical Analysis and Verification. Backtracking triggers: Application Requirements Violation; Behavioral or Functional Constraint Violation; Area or Timing Constraint Violation.]

Problem
Behavioral or functional constraint violations cause 50-80% of cycling between design steps.

Goal
Eliminate unnecessary "cycling" through unplanned design steps. Improve the turnaround time per cycle.

Approach
Support easy exploration of design alternatives via iteration. Allow function & behavior changes to be made quickly.


Process Design-for-Synthesis Methods & Tools


Design Approach: "stepwise refinement", with "iterative enhancement".
Create design "skeleton", with core functions and cycle-level timing information specified.
Iterate the design through synthesis, checking key area and timing constraints.
Return to the top of the process to make corrections, and to enhance the design description.
Integrate completed behavioral block with other blocks for HDL "system" simulation.

[Figure: design-for-synthesis flowchart. Start; Capture Design; Compile & Checking (Correct Entry?); Cycle-based Simulation?; Behavioral Simulation (Correct Behavior?); HDL Simulation Required?; Functional Simulation (Correct Function?); Logic Synthesis; Gate-level Timing Analysis (Correct Timing?); Partition, Place & Route (Area & Speed?); Fabricate Device; Done. Tools named in the figure: KBS flowHDL, Exsedia Nimbus, IPstation blockHDL, Synopsys SGE, Synopsys VSS, Synopsys Design Compiler, HDL Compiler, FPGA Compiler, DesignWare, Design Analyzer, Timing Analyzer, Xilinx ISE.]

The Design Integration Process


Decompose into functionally partitioned modules.
Capture, analyze and verify unit-level module behavior.
Capture, plan and verify the system test harness graphically.

[Figure: design integration process. Roles: Lead Designer, Designer #1, Designer #2, Designer #3, Test Engineer. Designer activities: entry, cycle-based verification, automatic HDL code generation, integrate HDL modules. Test Engineer activities: test spec entry, bind to blocks, automatic HDL test harness generation, integrate HDL test harness. Tools named in the figure: IPalette blockHDL, Nimbus flowHDL.]

Analysis & Design Process Variations


We can have several possible process variations, using the Executable ASM as the core methodology component in analysis and design.
Starting from a protocol description, derive an appropriately partitioned architecture of communicating concurrent ASM threads for realizing the distributed control.
Starting with an algorithm, derive an appropriately sequenced and scheduled set of data operations, allocated and bound to appropriate classes of macro operators, as modeled using one or more ASM threads.
Starting with an existing circuit design, refactor or reengineer the design into an abstract architecture consisting of one or more ASM threads.

Architecture Analysis Example

Finite Impulse Response Filter


Example Finite Impulse Response Filter


Mathematical structure.
In the 802.11 PHY Digital Signal Processor (DSP) block, it is a requirement to smooth time samples of data with a digital Finite Impulse Response (FIR) filter. Values of past input samples are required; no output samples are fed back, so this is a non-recursive operation in which discrete data signals pass through a linear time-invariant system defined by the transfer function. The equation of this system's behavior is as follows:

y(n) = b0 x(n) + b1 x(n-1) + ... + bM x(n-M)

We generate an expansion for the above equation, for three coefficients, as follows:

y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2)

Dataflow structure.
Usually represented as a lattice.

[Figure: dataflow lattice in which x(n), x(n-1) and x(n-2) are multiplied by b0, b1 and b2, and the products are summed by two adders.]
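
The three-coefficient dataflow structure (samples x(n), x(n-1), x(n-2) weighted by b0, b1, b2 and summed) can be checked in software before any hardware is planned. The following is a minimal reference model, assuming samples before n = 0 are zero; the function name `fir` and the test coefficients are illustrative only.

```python
def fir(samples, coeffs):
    """Direct-form FIR: y(n) = sum over k of coeffs[k] * x(n-k).

    Samples before the start of the input are assumed to be zero.
    """
    delay_line = [0] * len(coeffs)   # delay_line[k] holds x(n-k)
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift in the newest sample
        out.append(sum(b * d for b, d in zip(coeffs, delay_line)))
    return out


# Feeding a unit impulse returns the coefficients b0, b1, b2 in order.
impulse_response = fir([1, 0, 0, 0, 0], [3, 2, 1])
print(impulse_response)

# With all coefficients equal to 1 the filter is a 3-sample running sum.
running_sum = fir([1, 2, 3, 4], [1, 1, 1])
print(running_sum)
```

Because the impulse response dies out after three samples, the model also confirms that no feedback of past outputs is involved.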

FIR Filter Analysis of Datapath Operations


Data path architecture.
The above operation is expressed as a sequence of Register Transfer statements as follows.
out1[9:0] <- ADD(ADD(a, b), CAT(0, in1))
a <- in1[7:0]
b <- a[7:0]

The intent is to specify the transfer function algorithm as a sequence of standard RTL statements. Note the alignment of operators so that they are the same width. We will then create a high-level design description for the control and data path operations comprising the design of this FIR filter element. The following diagram depicts the RTL representation of part of this functionality (ADDER units) using standard elements:
[Figure: RTL schematic with input in1[7:0], registers a[7:0] and b[7:0] with CLK and RES, adder ADD1[8:0], a constant '0' bit, and adder ADD0[9:0] driving out1[9:0].]
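
To see how the register transfers for out1, a and b behave cycle by cycle, the sketch below applies all three on each clock edge using the values held before the edge. It assumes out1 is itself registered and that every register resets to zero; neither assumption is stated in the slides. The masks model the 8-, 9- and 10-bit bus widths.

```python
def clock_edge(state, in1):
    """Apply the three register transfers once, using pre-edge values."""
    a, b = state["a"], state["b"]
    add1 = (a + b) & 0x1FF              # ADD(a, b): 9-bit result
    cat = in1 & 0xFF                    # CAT(0, in1): in1 zero-extended to 9 bits
    return {
        "out1": (add1 + cat) & 0x3FF,   # outer ADD: 10-bit result
        "a": in1 & 0xFF,                # a <- in1[7:0]
        "b": a,                         # b <- a[7:0]
    }


def run(inputs):
    """Clock the circuit once per input sample and record out1."""
    state = {"out1": 0, "a": 0, "b": 0}  # assumed reset values
    trace = []
    for x in inputs:
        state = clock_edge(state, x)
        trace.append(state["out1"])
    return trace


print(run([1, 2, 3, 4]))
print(run([255, 255, 255]))   # worst case still fits in 10 bits
```

The worst-case sum of three 8-bit values is 765, which is why the output bus needs 10 bits and the intermediate adder needs 9.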

Example - FIR Filter Architecture: 1st Pass


[Figure: 1st-pass FIR architecture. Labels: IN, registers a, b and c, coefficients K0 to K3, CLK, OUT; bus widths of 4, 8, 9 and 10 bits.]

Example - FIR Filter Architecture: 2nd Pass


[Figure: 2nd-pass FIR architecture. Labels: IN, coefficients K0 to K3, four multipliers (*), control signals CL1 and CL3, OUT; bus widths of 4, 8, 9 and 10 bits.]

Example - FIR Filter Architecture: 3rd Pass


[Figure: 3rd-pass FIR architecture. Labels: IN, coefficients K0 to K3, a single multiplier (*), control signals CL1 to CL7, OUT; bus widths of 9 and 10 bits.]

Additional MUX resources and additional registers are required to resource-share the arithmetic units.

FIR Filter - Architecture: Area Evaluation


The 3 design architectures are synthesized into circuits, and their area data is compared to see which is most efficient in resource usage.

[Figure: chart of Total Area (axis from 1045 to 1100) for the 1st Pass, 2nd Pass and 3rd Pass.]

The 2nd design model explored uses resources most efficiently. On examination, we see that in the 3rd pass the cost of the multiplexing and registers exceeds that of the original multiplier resources (ignoring interconnect).

Section 1 Summary of Key Points


Increasing Pressure Factors.
System complexity. Design functionality. Productivity Gap: Device Capacity vs. Designer Capability.

Systems Design Methods.


Manage design complexity through analysis and design methods.
Model-driven architecture definition (hardware and software).
Map algorithms and protocols into architectures with high cohesion and low coupling (use of metrics).
Executable ASM models are an important design representation method, allowing you to bridge the gap between system algorithm or protocol specification and circuit architecture.
Use process to explore design space and realize best trade-off.

Next Section:
Discussion of the background of digital systems: their representation, specification and design.

Section 2 Introduction to High-Level Digital Design for Custom and Programmable Logic

Section 2 Key Points


Design of Digital Systems.
Combinational logic (truth tables, Boolean expressions, K-Maps for minimization, up to 4-5 variables).
Sequential logic (state tables, bubble diagrams, ASM diagrams).
Digital System Unit = Control + Datapath + Timing Constraints.
Digital System = N Digital System Units + Sequenced Interaction Pattern.

Design of Computing Systems.


Specification as custom-logic architecture can be done as effectively as programming a particular microprocessor architecture.
Custom-logic implementations are usually 100x faster than processor-based embedded software, due to elimination of the instruction set, implementing operations in parallel, and locating memory (registers, arrays) adjacent to the logic processing circuit.
Map algorithms and protocols into architectures with high cohesion and low coupling (use of metrics).
Executable ASM models are an important design representation method, allowing you to bridge the gap between system algorithm or protocol specification and circuit architecture.

Design Representation of Digital Systems or


What are the notations and methods of analysis and design?

Representation for Design for Synthesis


High-level Design for Synthesis starts with an abstract description created in graphical notation.
Notation is appropriate for capturing structure and behavior of "on-chip" VLSI systems.
Notation #1: structured blocks for partitioning and interconnect of design components.
Notation #2: structured flowgraph for partially ordering a sequence of abstract operations.
Both notations are used to construct a "plan" for meeting the design spec, in terms of structure and behavior.

[Figure: on-chip system example. Block diagram: Block_1, Block_2, Block_3. Flowgraph clocked by CLK (rising) with reset ^RES: states s0 to s5, inputs idleSig and aSig, outputs outSig1, outSig2 and ^outSig3, and bus assignments to aReg and bReg (for example aReg <- inBus, bReg <- '1').]

Digital Systems Design - Structure


Block diagram: used to partition and decompose a design into its functional units.
Step #1: Identify "top level" and all functional operations that transform the application's data.
Step #2: Group the functions by how the data is manipulated, or by what resources are needed.
Step #3: Separate different functions from each other by "interconnect" that shows the flow of data.
Step #4: Decompose more "abstract" functions into more "primitive" sets of functions that operate on data.
Step #5: Repeat Step #4 until all the functions are fully decomposed into a hierarchy of "primitive" units.

[Figure: annotated block diagram of a UART block (block type "Architectural - Local") inside Top_Level. Callouts: higher-level block name, block name, block type, input pin and input port, output pin (appears as Output port at higher level), internal output/Buffer pin (appears as Buffer port at higher level), bi-directional pin (appears as Bi-directional port at higher level), functional block, and bounding box (encapsulates block's internal structure and interconnect).]

Digital Systems Design Behavior


E-ASM Diagram: Executable Algorithmic State Machine (ASM), used to decompose behavior within a function block into an ordered sequence of operations.
Step #1: Define sequence of steps within control algorithm, and specific operations to occur in each step, using abstract "flow chart".
Step #2: Insert the necessary control to respond to events, using Case branch and Conditional branch.
Step #3: Allocate each abstract data operation to RTL macro operator.
Step #4: Bind individual buses for function variables to RTL bus units (registers, latches, wires).
Step #5: Define control scheduling with declaration of clocking.
Step #6: Decompose into concurrent "threads" for resource sharing.

[Figure: annotated E-ASM diagram with states s0 to s5, clocking definition CLK1 (rising), enabling event ^RES, Moore and Mealy actions, binary and multiway (CASE) branch conditions, macro-function assignment, and memory read/write with relative addressing.]

Overview of Behavioral Synthesis


Behavioral synthesis starts with an abstract description of behavior written in VHDL, SpecC or C, using no timing information.
Task #1: Compile source code into intermediate format, for example, control-flow graph, dataflow graph.
Task #2: Schedule data operations to occur on specific control cycles, determined by clocking.
Task #3: Allocate data operations to RTL components implied by use of language operators <+, -, *...>.
Task #4: Bind specific operations to individual RTL components, to construct complete circuit topology.

[Figure: two implementations of d <= a + b + c. One uses two adders within control step 1. The other spreads the additions over control steps 1 and 2 and shares a single adder, with MUXes selecting its operands.]
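
The schedule, allocate and bind tasks can be illustrated for d <= a + b + c with a toy model. This is a sketch under stated assumptions, not the algorithm of any particular synthesis tool: the intermediate name `t1`, the `chaining` flag and all function names are hypothetical, and the only resource modeled is the adder.

```python
def asap_schedule(ops, chaining):
    """Assign each operation to the earliest possible control step.

    ops: dict mapping result name -> operand names, in dependency order.
    chaining=True lets an operation share a control step with the operation
    that produces its operand (longer combinational path);
    chaining=False pushes it to the following control step.
    """
    step = {}
    for name, operands in ops.items():
        producers = [step[s] for s in operands if s in step]
        if not producers:
            step[name] = 1
        else:
            step[name] = max(producers) + (0 if chaining else 1)
    return step


def allocate(step):
    """Adders needed: the largest number of additions in any control step."""
    per_step = {}
    for s in step.values():
        per_step[s] = per_step.get(s, 0) + 1
    return max(per_step.values())


def bind(step):
    """Bind each addition to an adder index, reusing adders across steps."""
    next_free, binding = {}, {}
    for name, s in step.items():
        binding[name] = next_free.get(s, 0)
        next_free[s] = binding[name] + 1
    return binding


# d <= a + b + c split into two additions; t1 is a hypothetical temporary.
ops = {"t1": ("a", "b"), "d": ("t1", "c")}

chained = asap_schedule(ops, chaining=True)
shared = asap_schedule(ops, chaining=False)
print(chained, allocate(chained), bind(chained))
print(shared, allocate(shared), bind(shared))
```

The chained schedule finishes in one control step but needs two adders; the unchained schedule takes two control steps and binds both additions to adder 0, which is where operand MUXes become necessary.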

High-Level Design for Synthesis


Concept

Designer starts with Functional Spec, and a "concept" of design solution.
Choice #1: create initial design representation as RTL-level VHDL or Verilog code, or...
Choice #2: create initial design representation using "behavioral" VHDL or SpecC, SystemC, C language code, or...
Choice #3: create initial design representation as high-level graphical "plan" of design solution.

[Figure: three paths from Concept to Gate level description. Behavioral C/VHDL description, through High-level Synthesis, to RT level description. Graphical plan-based description, through High-level design for synthesis, to RT level description. Register-level VHDL/Verilog description, used directly as RT level description. All three then pass through Logic Synthesis.]

The goal is to create HDL code that is synthesizable and efficient for logic synthesis.

Graphical Languages versus HDLs


Why we choose to use graphical design notations for analysis and architecture design:
Human mind works more effectively with visual and spatial information (for learning, retention, manipulation of artifacts, and for communicating ideas).
During evolution of human species, we spent more time using pictures to convey ideas rather than text-based language. We use sophisticated graphical representation of design artifacts annotated with textual components.

Graphical notations are more effective in chunking design knowledge: a few changes in the graphical model imply a large number of changes in HDL code.
More compact representation in graphics. The links between constructs carry much information.

Design consists of planning and configuration tasks, which are easier to perform with diagrammatic representations than textual ones. Graphics allows designers to keep focus at higher-level, making possible better trade-offs in the design, also allowing more agile exploration of design space.

Digital Systems Design

Partitioning of Control and Data


Functional Partitioning of Control and Datapath


[Figure: functional partitioning of control and datapath. Control Unit: Control in, inputs, input/next state decoding logic, next state information, State Registers (CLK, ^RES), present state information, output decoding logic, control outputs, Control out. Data Path Unit: Data in, steering logic, clocked registers, combinational logic, MUX, Data out. Status and Select signals connect the Control Unit and the Data Path Unit.]

Finite State Machine Model - Introduction


[Figure: FSM model. Inputs pass through input synchronizing registers (CLK) to the input/next state decoding logic; next state information is loaded into the State Registers (CLK); present state information is fed back and drives the output decoding logic; control outputs pass through output filtering registers (CLK).]

Components of FSM Model


State registers, input synchronization registers (optional) and output filter registers (optional).
Next state decoding logic, and output decoding logic: combinational logic blocks.
Input signals to the state machine, which are inputs to the next state and output decoding logic blocks (could be synchronized to clock with input registers).
Next state information, which is generated as a result of input/next state decoding logic.
Present state information, output from the state registers, which is fed back as an input to both next state and output decoding logic blocks.
Outputs from the state machine: either generated synchronously from the output of the state registers (also used as present state information), or asynchronously as output of the output decoding logic block (which takes input and present state information to produce outputs). Could be filtered using output registers to eliminate possible signal transients.

Finite State Machine Design Model Types


Moore machines:
Control outputs generated by the state machine are dependent only on the present state information. The control outputs are synchronized to the clock that controls state transitions.
[Figure: Moore machine. Inputs and present state feed the input/next state decoding logic; next state is loaded into the State Registers; the output decoding logic derives the control outputs from the present state only.]

Moore machines are used when it is important to synchronize all control actions with the change in state, and thus with the clock. Moore machines effectively filter out transients, and can be used to eliminate race conditions when inputs are unfiltered.

Finite State Machine Design Model Types


Mealy machines:
The control outputs of the state machine are dependent on the inputs and present state information. The control outputs can be asynchronous, in that outputs can change value as the inputs change value, provided the appropriate present state information is maintained. The control outputs are gated by the present state.

Mealy machines are used to create control blocks that respond quickly to external signal changes. Care must be taken to isolate the design from transients and race conditions.

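[Figure: Mealy machine. Inputs and present state feed the input/next state decoding logic; next state is loaded into the State Registers; the output decoding logic derives the control outputs from both the inputs and the present state.]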
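
The difference between the two model types can be seen in a small cycle-by-cycle simulation. The two-state controller below (states IDLE and BUSY, input `req`) is a hypothetical example chosen only to contrast the output styles; it is not taken from the seminar.

```python
def next_state(state, req):
    """Input/next state decoding logic."""
    if state == "IDLE":
        return "BUSY" if req else "IDLE"
    return "IDLE"


def moore_output(state):
    """Moore-style output: depends on the present state only."""
    return state == "BUSY"


def mealy_output(state, req):
    """Mealy-style output: depends on the present state and the input."""
    return state == "IDLE" and req


def simulate(reqs):
    """Return (state, moore, mealy) for each clock cycle."""
    state = "IDLE"
    rows = []
    for req in reqs:
        rows.append((state, moore_output(state), mealy_output(state, req)))
        state = next_state(state, req)   # state registers update on the clock edge
    return rows


rows = simulate([False, True, False, True])
for row in rows:
    print(row)
```

The Mealy output is asserted in the same cycle as the request, while the Moore output appears one cycle later, after the state registers have changed.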

Sequential Logic Design


Use of storage elements in the data path to store signal values.
Purpose is to synchronize the behavior of complex circuits. Benefits of circuit synchronization:
Eliminate unpredictability of output behavior due to timing skew. Create signal stability, as they must have stable values for certain period of time. Better isolate signals from noise transients.

Use of storage to create complex control structures.


Controller sequences the operations in the data path. The movement of data through the data path is staged in pipelined fashion. Complex circuits can be broken down and architected in terms of their control-dominated or data-dominated behaviors.

[Figure: pipelined data path. Inputs pass through data path pipeline stages 1 to n; each stage pairs a combinational logic block (delays tp1, tp2) with storage registers driven by the synchronizing clock signal, ending at Outputs.]

Executable ASM Method State Machines


The timing semantics of Moore and Mealy machine modeling


Composition of ASM Diagram - Example


Executable ASMs are composed of: States, Conditions, Cases, Conditional Outputs, Assertions, Assignments, Expressions, Macro-functions, Memory Indexing, Clocking, Reset, Synchronous events.

[Figure: annotated ASM diagram.
Clocking definition: CLK1 (rising). Enabling event definition: ^RES.
State box s0 with Moore machine actions (synchronous or asynchronous): signal assertion (signal1) and bus assignment (Areg <- '0').
State input conditions: binary decision condition (If-Then) on a Boolean input expression (input1 & input2); multiway branch condition (CASE) on IObus with branches 1001, 0110 and default.
Mealy machine actions (synchronous or asynchronous): !signal5, Breg <- input2.
Macro-function assignment: Output <- NMUX (Areg, Breg, in1).
Memory read/write with relative addressing: MDR <- ScratchPad [MAR].
Other states and actions shown: s1 to s5, A <- '0', Areg <- input1, !signal4.]

Relationship of State Machines to Datapath


ASM diagrams incorporate information about control path and data path into a single representation. Using this notation, a design can express different design styles for both synchronous and asynchronous behavior of both the control and datapath.


Moore Machine - Registered Bus Assignments


Registered bus assignments may be used by placing expressions on buses that are defined as Register in the Element field of the Bus Table. The buses in the datapath will be realized using registered logic. The buses aReg and bReg are realized by using an additional layer of registers. This imposes a 1 clock cycle delay between when the operation is scheduled by the state machine in state s1 and when the updated values are propagated to the outputs of aReg and bReg. However, the resultant assignment value is preserved in the registered output until it is explicitly modified. Using registered bus assignments increases gate count and circuit delay of the datapath. However, the designer avoids race conditions, signal transients, and unwanted feedback conditions by designing with registered logic.

Moore Machine - Unregistered Bus Assignments


Unregistered bus assignments may be used by selecting bus Elements as Wire in Nimbus Bus Table. This implies that the buses in the datapath will not be realized using registers, but with wires. Unregistered datapath operations, though causing the datapath buses to be realized without registers, are still synchronized by the clock driving the Moore-style state register outputs. Using unregistered bus assignments reduces gate count of the datapath, and reduces circuit delay. However, special care must be taken to avoid race conditions, signal transients, and unwanted feedback loops in the datapath that can cause metastability (not settling to a specific value) or oscillation.


Mealy Machine - Unregistered Bus Assignments


For Mealy-style outputs specifying datapath Bus assignments, the control outputs change asynchronously with the value of input aSig. The use of Bus assignments with Mealy outputs implies that the datapath buses will be unregistered (either latches or wires, depending on the Element value specified in the Bus Table). Thus the result of the assignment will appear on the output of the datapath immediately (assuming no propagation delay, since it is assumed this is subsumed by the clocking around the other elements of the design unit).

Mealy Machine - Registered Bus Assignments


Specifying registered assignments of buses used in Mealy outputs is done by selecting Register as the bus Element type in the Bus Table. The values of buses aReg and bReg are assigned synchronously with the next active clock edge.
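
The timing difference between a bus declared as Register and a bus declared as Wire can be modeled in a few lines. This is a simplified behavioral sketch, not the tool's semantics: the class `Datapath` and the bus names `a_reg` and `a_wire` are hypothetical, and propagation delay is ignored.

```python
class Datapath:
    """One bus realized as a Register element and one as a Wire element."""

    def __init__(self):
        self.a_reg = 0          # registered bus: value visible at the register output
        self._a_reg_next = 0    # value waiting at the register input
        self.a_wire = 0         # unregistered bus: no storage

    def assign(self, value):
        """Schedule the same assignment on both buses."""
        self.a_wire = value        # wire: result appears immediately
        self._a_reg_next = value   # register: result waits for the clock

    def clock_edge(self):
        """Active clock edge: the register captures its pending input."""
        self.a_reg = self._a_reg_next


dp = Datapath()
dp.assign(4)
before_edge = (dp.a_reg, dp.a_wire)
dp.clock_edge()
after_edge = (dp.a_reg, dp.a_wire)
print(before_edge, after_edge)
```

The registered bus shows the new value only after the next active clock edge, which is the one-cycle delay that registered assignments trade for protection against transients.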


Executable ASM Method Control Structures


The inventory of algorithmic control constructs and comparing them to state diagrams

ASM vs. State Diagrams - Control Structures


[Figure: ASM diagrams and state diagrams side by side. Sequence: states s0, s1 and s2 with Action1 to Action6, clocked by CLK (rising) with reset ^RES. Binary branch: condition a|b in the ASM diagram, transitions labeled a=0 and a=1 in the state diagram.]

Sequence: Series of States
States follow the sequencing indicated by direction of state transitions. ASM diagrams have actions attached to the right side of state "boxes" for clarity. In ASM diagrams, transitions to next state are triggered by the system clock or any single-bit signal. States with no actions are called delay states, indicating a delay of one clock cycle.

Selection: Binary Branching
Binary branching is represented using a condition "diamond". Boolean expressions on conditions can be of arbitrary complexity, with many terms and variables. For simple branching situations, both representations are equally suited to the task.

ASM Diagram Nested Control Structures


If-Then-Else: Nested Statements

Source: Roth 1998 PWS Publishing


ASM vs. State Diagrams Branching Control Structures


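[Figure: state diagrams and ASM diagrams side by side for multi-way branching from s0 to s1 through s4. Single variable: a CASE on a with branches 0001, 0010, 0100, 1000 and default (the state diagram marks the remaining cases "others?"). Multi-variable: a chain of decisions on a & b, ^a & b, ^a & c & d and ^b & c.]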

Selection: Multi-way Branching, Single Variable. ASM diagrams use a case construct for multi-way branch conditions on a single, multi-bit variable/bus. Branch conditions are binary or enumerated values. In ASM diagrams, all undefined transitions are tied to a default transition. This eliminates possible transitions to unspecified states, a common cause of design failure in the field. State diagrams have no such mechanism, and thus are ambiguous.

Selection: Multi-way Branching, Multi-variable. In State diagrams: (1) the ordering of transitions is ambiguous; (2) not all transitions are specified (device problems likely in the field); (3) behavior is not predictable under all conditions. In ASM diagrams: (1) ordering of transitions is explicit; (2) transitions are specified for all possible input combinations (including MVL values); (3) behavior is predictable.
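
The two ASM properties named above (a default branch, and an explicit ordering of tests) can be sketched in a few lines of Python. This is only an illustration: the state names follow the figure labels, but the mapping of each condition to a target state, and the use of s0 as the default target, are assumptions made for the example.

```python
def next_state_case(a: int) -> str:
    """Single-variable case construct on a 4-bit bus `a`."""
    # Assumed mapping of bus values to target states (illustrative only).
    table = {0b0001: "s1", 0b0010: "s2", 0b0100: "s3", 0b1000: "s4"}
    # The default branch catches every value that is not listed,
    # so no input can lead to an unspecified state.
    return table.get(a, "s0")


def next_state_priority(a: int, b: int, c: int, d: int) -> str:
    """Multi-variable branch: the tests are evaluated in a fixed order."""
    if a and b:
        return "s1"
    if (not a) and b:
        return "s2"
    if (not a) and c and d:
        return "s3"
    if (not b) and c:
        return "s4"
    return "s0"  # assumed default when no test succeeds
```

Because the tests are chained, two conditions that overlap (for example `^a & c & d` and `^b & c`) are resolved by their position in the chain rather than left ambiguous.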


ASM vs. State Diagrams Loop Control Structures


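[Figure: state diagrams and ASM diagrams for loop control among states s0, s1, and s2, with the loop condition tested on a and ^a. In the ASM diagrams the decision diamond is placed before the loop body (While-Do) or after it (Repeat-Until).]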

Repetition: "While-Do" Control Loop. The While-Do control structure is more apparent in the ASM diagram, and is consistent with hardware description language (HDL) constructs. A single decision point in the ASM diagram handles complexity more easily when multiple terms and variables are used in the looping condition expression. This type of control construct is used often for counting the number of loop iterations, where you test the condition prior to each execution of the loop, including the first pass.

Repetition: "Repeat-Until" Control Loop. Placement of the decision "diamond" defines the type of looping construct, and the type of control behavior. Repetition control structures are used in various styles of polling loops for implementing handshaking protocols. This type of control construct is used often for counting the number of loop iterations, where you test the condition after completing each execution of the loop, requiring execution of at least the first pass.
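
The practical difference between the two loop forms is where the test sits relative to the loop body. A minimal Python sketch, with hypothetical function names, counts how many times the body runs for a given iteration limit:

```python
def while_do_passes(limit: int) -> int:
    """Test before each pass, including the first (While-Do)."""
    count = 0
    passes = 0
    while count < limit:
        passes += 1   # loop body
        count += 1
    return passes


def repeat_until_passes(limit: int) -> int:
    """Test after each pass (Repeat-Until): the body runs at least once."""
    count = 0
    passes = 0
    while True:
        passes += 1   # loop body
        count += 1
        if count >= limit:
            break
    return passes
```

The two forms agree whenever the limit is at least one; they differ only when the condition is already satisfied on entry, where Repeat-Until still executes one pass.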


ASM vs. State Diagrams Synchronous Transfer of Control


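[Figure: a state diagram and two ASM diagrams for synchronous transfer of control among states s0, s1, s2, and s4. The state diagram labels its transitions with priorities (^RES : P1, a : P2). One ASM diagram uses a synchronous enable event on a; the other uses explicit conditional tests of a on each state.]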

Control Interrupt Schemes - State Diagrams: The State diagram models state transitions, but the prioritization information--when multiple transitions are possible--isn't clear in the notation, without adding some additional symbol to indicate priority of the transitions. Also, if State transitions have additional conditions, it isn't clear what happens when conditions aren't met (incomplete specification). However, this may be handled by modeling looping on a state in some cases.

Control Interrupt Schemes - ASM Diagrams: In ASM diagrams, you can either define a test on each state transition for the specific event using condition "diamonds", or you can use the Enable Event construct, indicating that the specified event has precedence over the normally-specified next state transition. This works like a priority encoder on the next state decoding logic of the state machine. At any time when the input is sampled for determining next state, the transition for the Enable Event a will take precedence.
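
The priority-encoder behavior of an Enable Event can be sketched as a next-state function in which the event is tested before the normal transition table is consulted. The state names come from the figure; the normal sequence s0, s1, s2 and the choice of s4 as the event target are assumptions made for this example.

```python
# Assumed normal (non-pre-empted) sequencing, for illustration only.
NORMAL_NEXT = {"s0": "s1", "s1": "s2", "s2": "s0", "s4": "s0"}


def next_state(state: str, a: int) -> str:
    """Next-state decoding with enable event `a` given precedence."""
    if a:
        # The enable event wins in every state, like the highest-priority
        # input of a priority encoder.
        return "s4"
    return NORMAL_NEXT[state]
```

Writing the same behavior with explicit conditional tests would repeat the `if a` check inside every state; the enable event factors it out once.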

Digital Systems Design

Combinational Logic Units as Datapath Building Blocks


Gate Level Design Gate Devices


Symbols and Truth tables for 5 basic gate-level logic gates: NOT, NAND, NOR, AND, OR.

Tanenbaum 1999 Prentice-Hall Publishing


Combinational Logic - Comparator Circuit


Tanenbaum 1999 Prentice-Hall Publishing

Comparator (EQ):
- COMP allows comparison of two different inputs to check if they are equal. Alternate circuits evaluate the relative magnitude of two inputs.
- If they are equal, the output is HIGH, but if they are not equal, the output is LOW.
- The comparison must be done with two signal inputs of equal width.
- The output of the Comparator operation is a single-bit signal.

Combinational Steering Circuits - MUX


Tanenbaum 1999 Prentice-Hall Publishing

8-input Multiplexer (MUX):
- MUX allows the signal value of one of its data inputs (D0 to D7) to pass to the output F.
- The selection of the signal to be passed is controlled by the SELECT lines (A, B, C).
- The number of select lines, n, follows from the number of inputs, m, which is a power of 2: with m inputs, we'll need n select lines such that 2**n = m.
- The MUX inputs and output must be the same width, and the SELECT lines are 1 bit each.

Combinational Encoder/Decoder Circuits


Tanenbaum 1999 Prentice-Hall Publishing

3:8 Decoder (n to 2**n DECO):
- DECO takes a binary encoded input of n data bits and decodes it onto individual output lines (D0 to D7), where exactly one output is enabled: the one whose line number corresponds to the encoded value.
- A Decoder input with n lines can carry 2**n possible binary encoded values.
- With 2**n possible encoded values on the input, we'll need exactly 2**n output lines, one for each possible encoded input value.
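
The sizing rules for the two steering circuits (2**n = m for the MUX, and 2**n output lines for an n-bit decoder) can be checked with a small behavioral sketch. The function names `mux` and `deco` are hypothetical, and the select lines are passed as a single integer rather than as separate A, B, C signals.

```python
def mux(data: list, select: int):
    """Pass data[select] to the output; len(data) must be a power of 2."""
    m = len(data)
    if m == 0 or m & (m - 1):
        raise ValueError("number of inputs must be a power of 2")
    if not 0 <= select < m:
        raise ValueError("select value out of range")
    return data[select]


def deco(value: int, n: int) -> list:
    """Decode an n-bit value onto 2**n output lines, exactly one enabled."""
    lines = 1 << n
    if not 0 <= value < lines:
        raise ValueError("encoded value does not fit in n bits")
    return [1 if i == value else 0 for i in range(lines)]
```

For the 8-input MUX, `select` ranges over 0 to 7, which is what three 1-bit select lines can encode; for the 3:8 decoder, `deco(value, 3)` returns eight lines.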

Executable ASM Method Datapath


The timing semantics of Moore and Mealy driven datapath


Datapath Logic Design


Use of memory elements in the data path to store signal values.
- Purpose is to synchronize the behavior of complex circuits.
- Benefits of circuit synchronization:
  - Eliminate unpredictability of output behavior due to timing skew.
  - Create signal stability, as signals must have stable values for a certain period of time.
  - Better isolate signals from noise transients.

Use of memory to create complex control structures.
- Controller sequences operations in the data path.

[Figure: data path pipeline, stage 1 through stage n, between the inputs and the outputs. Each stage is a combinational logic block (propagation delays tp1, tp2) followed by storage registers driven by the synchronizing clock signal.]

Datapath Logic Operations


Pre-defined executable datapath operator macros.
- Are used in the RHS of macro assignment statements.
- Datapath macro operations are scheduled in states, on entry to the state.
- Attached as a text expression to the state.
- LHS: assigned bus/signal. RHS: input args, macro function calls, possibly nested (like C functions).

Example (specified output bus/signal <- pre-defined macro-functions applied to specified input buses/signals):

C_Bus <- BOR( AND( B_Bus, NOT( DECO( A_Bus ))))

Macro types
- Arithmetic: ADD, SUB, INCR, MUL, DIV, REM.
- Boolean logic: AND, OR, NOT.
- Steering logic: MUX, DECO, PENCO, DMUX.
- Combinational logic elements.

[Figure: the expression drawn as a data path. A_Bus feeds DECO, then NOT; the result is ANDed with B_Bus and passes through BOR onto C_Bus. The same network is shown packaged as block AnyXN, with inputs nPort (select) and CollIn and output CollisionEvent.]

Moore Machine - Macro-function Assignments


Creating User-defined Macro-functions


[Figure: the network A_Bus, DECO, NOT, B_Bus, AND, BOR, C_Bus, packaged as macro-function AnyXN with inputs nPort (select) and CollIn and output CollisionEvent.]

RTL macro-functions:
- The library contains over 30 primitive data path elements.
- Macros are "scalable", with any number of buses and any bus widths.
- Macros can be used to construct more complex user-defined data path functions.

Macro-function definition: AnyXN(A_bus,B_bus) ::= BOR(AND(B_bus,NOT(DECO(A_bus))))
Macro-function binding: CollisionEvent <- AnyXN(nPort,CollIn)
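
One way to read the AnyXN definition is sketched below in Python, with buses modeled as integers. The semantics of the primitives are assumptions, not taken from the macro library: DECO is taken to produce a one-hot value, NOT to invert within a fixed bus width, and BOR to be an OR-reduction of all bits to a single bit. Under that reading, AnyXN reports whether any line other than line nPort is asserted on CollIn.

```python
def DECO(value: int) -> int:
    """Assumed: decode a port number to a one-hot bus value."""
    return 1 << value


def NOT(bus: int, width: int) -> int:
    """Bitwise inversion limited to `width` bits."""
    return ~bus & ((1 << width) - 1)


def AND(x: int, y: int) -> int:
    return x & y


def BOR(bus: int) -> int:
    """Assumed: OR-reduction of all bits of the bus to one bit."""
    return 1 if bus != 0 else 0


def any_xn(a_bus: int, b_bus: int, width: int) -> int:
    """AnyXN(A_bus, B_bus) ::= BOR(AND(B_bus, NOT(DECO(A_bus))))"""
    return BOR(AND(b_bus, NOT(DECO(a_bus), width)))


# Binding from the slide: CollisionEvent <- AnyXN(nPort, CollIn)
collision_event = any_xn(2, 0b0101, width=4)
```

With nPort = 2 and CollIn = 0b0101, line 0 is asserted besides line 2, so the sketch yields 1; with CollIn = 0b0100 only the masked line is asserted and the result is 0.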


Using Boolean Operators vs Macro-functions


Simple data path operations can be specified using either expression operators or macro-function operators.

Carry/Borrow Bit with Arithmetic Macro-functions


Combinational Logic Example:


Unsigned Integer Adder


Adder Circuit The Basic Adder Unit


Source: Tanenbaum, 4th Edition, 1999 Prentice-Hall Publishers.

Full Adder circuit:
- This circuit takes two single-bit inputs and adds them together to produce a Sum as output.
- The Full Adder also has a Carry (called a Carry Out), like the Half Adder.
- The Full Adder also has a Carry In signal, allowing the Carry Out from an earlier adder stage to be connected. This allows a multi-bit, multi-stage adder circuit to be constructed.

Adder Circuit The Ripple Carry Adder


Source: Tanenbaum, 4th Edition, 1999 Prentice-Hall Publishers.

RPC Adder circuit:
- This circuit takes two multi-bit inputs and adds them together to produce a multi-bit Sum as output.
- The RPC Adder also has a Carry Out signal that results from the rippling of the carry output from each FA bit computation through the entire operand word length.
- The RPC Adder is a reasonable solution for small bit widths; however, a more elegant solution is needed when speed is critical, or when bit-widths get large.
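
The way a ripple carry adder is assembled from full adders can be sketched behaviorally in Python. The function names are hypothetical; the Boolean equations are the standard full adder equations, and the loop stands in for the chain of Carry Out to Carry In connections.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, width: int, cin: int = 0):
    """Add two `width`-bit operands one bit slice at a time."""
    carry = cin
    total = 0
    for i in range(width):
        # The carry out of slice i becomes the carry in of slice i + 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

The serial dependence on `carry` in the loop is the software counterpart of the carry chain: each slice must wait for the one before it, which is why delay grows with operand width.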

ASM Models 32-bit Ripple Carry Adder


We model the Carry Logic separate from the Add Logic. Each bit of the 32-bit operands is sliced.

The Add functions are a Boolean reduction of the Full Adder structure on the previous page.

This is a structural model, with each bit modeled in terms of its gate-level logic. It is very low-level.
Each combinational block is represented as a separate, single-state concurrent thread. The final Sum bits are registered.

[Figure: ASM model of the 32-bit ripple carry adder, with callouts for the carry logic, the 1st stage AND logic, and the 2nd stage OR logic.]

ASM Models 32-bit Ripple Carry Adder


This is a more abstract model, with each bit operated on by a 1-bit ADD unit. It is still a low-level model.

Using a Behavioral Adder 32-bit Multiplier


This is a reference to a behavioral Adder macro model, maintained by Nimbus™, with all bits operated on in parallel by a 32-bit unit. It is a higher-level model. The shift-add Multiplier scheme is the most basic of unsigned Integer multiplication algorithms. Note the algorithmic nature of the model: (1) loop control using a count register, (2) concurrent operations scheduled on each state, (3) bus slicing, shifting and register-register assignment.
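
For readers who want the algorithm in executable form, a generic shift-add multiplier can be sketched as follows. This is not a transcription of the ASM model on the slide; it is the textbook algorithm, written so that the loop counter, the shifting, and the conditional add are visible.

```python
def shift_add_multiply(a: int, b: int, width: int = 32) -> int:
    """Unsigned shift-add multiplication of two `width`-bit operands."""
    mask = (1 << width) - 1
    multiplicand = a & mask
    multiplier = b & mask
    product = 0          # accumulates up to 2 * width bits
    count = 0            # loop control via a count register
    while count < width:
        if multiplier & 1:
            # Add the multiplicand, aligned to the current bit position.
            product += multiplicand << count
        multiplier >>= 1  # expose the next multiplier bit
        count += 1
    return product
```

One add is performed per set bit of the multiplier, and the loop always runs `width` times, which mirrors a fixed-latency hardware implementation.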


Sequential Logic Example:


Binary Up/Down Counter


The Binary Up/Down Counter Block Diagram


The Binary Up/Down Counter
- This multi-mode counter block takes an input seed value, and counts up or down from this value, based on the direction of count.
- On each count cycle, the Enable pulse is set, the Direction bit selects the direction of the count, and the Seed and NoCounts values are set. The counter runs for the number of cycles indicated by input NoCounts.

[Figure: block diagram of the counter with ports NoCounts, Seed, Direction, Enable, CountValue, and Valid; each port is either 1 or 8 bits wide.]

Up/Down Counter Model ASM Diagram


The ASM model consists of a single thread, with two condition tests: (1) loop termination condition, and (2) count direction (up or down). We use the arithmetic macro-function INCRNC (increment by one with no carry) to increment the loop counter, and INCRNC and DECRNC (decrement by one with no carry) to modify the counter value, depending on the count direction.

[Figure: ASM diagram of the counter thread, clocked on CLK with reset ^RES.
- Initial actions: Bus_1 <- Seed, Bus_2 <- Direction, Bus_3 <- NoCounts, Loop <- 0, CountVal <- 0.
- State Poll: test Enable = 1 (Y/N).
- State CntControl: test Loop = Bus_3 (Y/N), with actions Loop <- INCRNC(Loop) and CountVal <- Bus_1.
- Test Bus_2 = 1 (Y/N) selects between Bus_1 <- DECRNC(Bus_1) and Bus_1 <- INCRNC(Bus_1), with a connector back to Poll.]
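
The counting behavior can be approximated with a short Python sketch. Several details are assumptions because the diagram does not survive extraction cleanly: Direction = 1 is taken to mean counting up, all buses are taken to be 8 bits wide, and INCRNC/DECRNC are modeled as wrap-around arithmetic that discards the carry or borrow.

```python
def up_down_counter(seed: int, direction: int, no_counts: int, width: int = 8):
    """Return the sequence of CountVal values produced by one counter run."""
    mask = (1 << width) - 1
    bus_1 = seed & mask           # Bus_1 <- Seed
    bus_3 = no_counts & mask      # Bus_3 <- NoCounts
    loop = 0                      # Loop <- 0
    values = []
    while loop != bus_3:          # loop termination test: Loop = Bus_3
        loop = (loop + 1) & mask  # Loop <- INCRNC(Loop)
        values.append(bus_1)      # CountVal <- Bus_1
        if direction:             # assumed polarity: 1 counts up
            bus_1 = (bus_1 + 1) & mask   # INCRNC: carry is dropped
        else:
            bus_1 = (bus_1 - 1) & mask   # DECRNC: borrow is dropped
    return values
```

The masking makes the "no carry" behavior visible: counting up from 254 passes through 255 and wraps to 0 instead of producing a ninth bit.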


Digital Systems Design

Cycle-Level Timing Definition


Synchronous Design Quantized Timing via Cycles


In high-level design, we make a trade-off of timing accuracy versus design productivity. We abstract detailed timing to lower-level design, so we are not overwhelmed with design details; but we use a cycle-level timing model.

Source: Tanenbaum, 4th ed. 1999, Prentice-Hall


Digital Design Register Device Timing

Source: A. Tanenbaum, 4th ed., 1999, Prentice-Hall Inc.

Setup Time
Time tsu is the amount of time we must keep stable data on the input prior to sampling at the active clock edge.

Hold Time
Time th is the amount of time the input signal value must be stable after the active clock edge, so that it can be sampled by the flip flop correctly.

Example: D flip flop


Delay Assumption for Cycle-Based Timing


In Executable ASMs, we model the behavior of registers using the "limit" assumption. First, at some time tn corresponding to the active edge of a clock, there is a different value on the input of a register than on its output. The time between when the register input is "sampled" and when the value appears on the register output cannot be zero. We need to consider the change in "state" of the design on the clock edge, where we are assigning values to the input and wanting to see the results that appear on the output. We assume the mathematical limit of tn from both sides of the clock transition, which we refer to as times tn- and tn+. The time tn- is when the register input is being "sampled", and the time tn+ is when the sampled value appears on the register output.
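[Figure: State Reg (present state) fed by the next state decoding logic, and Data Path Reg fed by the combinational and steering logic. The delays are annotated tpsd = 0, tpsm = 0, and tpdp = 0, with the register updates marked between t0- and t0+.]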
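
A common way to realize this limit assumption in a cycle-based simulator is a two-phase update: every register first samples its input (time tn-), and only then do all outputs change (time tn+). The sketch below is an illustration of that idea with hypothetical class and function names; it is not a description of how any particular tool implements it.

```python
class Register:
    def __init__(self, reset_value, next_value):
        self.q = reset_value            # value visible on the output
        self.next_value = next_value    # callable modeling the logic feeding D
        self._sampled = reset_value

    def sample(self):
        """t_n-: capture the input while all outputs still hold old values."""
        self._sampled = self.next_value()

    def update(self):
        """t_n+: the sampled value appears on the output."""
        self.q = self._sampled


def clock_edge(registers):
    for r in registers:
        r.sample()
    for r in registers:
        r.update()


# Two registers that exchange values on every clock edge.
a = Register(1, lambda: b.q)
b = Register(2, lambda: a.q)
clock_edge([a, b])
```

If the registers were updated one after the other in a single pass, the second register would see the first one's new value and both would end up equal; separating sampling from updating is what lets the exchange work.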


Register Level Design Clocking Schemes


A clocking element
- An oscillating crystal with properties that allow it to serve as a synchronizing element in digital designs.

Timing diagram
- Graphical means to indicate expected or observed timing behavior of a design.
- Represented by a waveform of significant signals, their timing and values at each instant.

Multiple clocks
- Many designs operate on the clock to generate some multiple or fraction of the whole clock signal, which could be symmetric or asymmetric to the system clock.

Source: A. Tanenbaum, 4th ed., 1999, Prentice-Hall Inc.

Register Level Design Clocking Schemes


Executable ASM models support built-in clocking schemes.
Control and data path register clock pins tied to same synchronizing source.

Scheme 1:
Single-phase, rising or falling edge clocks.

Scheme 2:
Two-phase overlapping, rising or falling edge clocks.

Scheme 3:
Two-phase non-overlapping, rising and/or falling edge clocks.

Executable ASM Method Synchronization


The timing semantics of Reset, Clock, and synchronized concurrent threads

Synchronous vs Asynchronous Enable Events


Enable Events are used in specifying pre-emptive control behaviors of the design, where the normal control flow in the design is interrupted. Enable Events can be either synchronous or asynchronous.

Enable Events for Synchronization


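[Figure: three concurrent FSM threads: Thread 1 (states s0, s1, s6), Thread 2 (states s2, s3, s8), and Thread 3 (states s4, s5, s7). The enabling events shown are CLK (rising) with ^RES, C1 (rising) with AS1, and C2 (falling) with ^AS2. The threads assign the shared signals C1, C2, AS1, and AS2 (for example C1 <- '1', AS1 <- '0', C2 <- ^C2), with annotations ASSERT_1 and !ASSERT_2.]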

Modeling Concurrency:
- Multiple model FSM "threads" having shared buses.
- Independent clocking schemes and enabling events (e.g., ^RES).
- Types of concurrent interaction:
  I. Synchronization
     - coordinated activities (e.g., handshaking, pipelining).
     - implicit references to shared buses.
  II. Competition
     - shared resources (for example, bus arbitration).
     - explicit use of other concurrent processes, components, or entities to model the arbitration protocol.

Digital Systems Design

Architecture Patterns


Control Design Handshaking Pattern


Handshaking
Polled handshaking:
- The FSM A thread waits in a polling loop, testing for signal ZB to be asserted by FSM B.
- The FSM B thread waits in an IDLE loop for signal ZA to be asserted by FSM A.

Asynchronous handshaking:
- FSM threads use an asynchronous interrupt mechanism to alert them when the event has occurred (however, most likely gated to a clock signal).

We use architecture and behavior patterns:
- Well-defined structure and behavior, re-used throughout a system.

Source: Roth 1998, PWS Publishing

Control Design Sequencing Pattern


Sequencing
- This pattern has the sequencing of data path operations by one or more state machines.
- The example shown is the data path for a small CPU, where micro-operations based on program instructions are decoded and staged to execute multi-cycle instructions out of memory.
- This example also uses a pipelining pattern structure (discussed later).

Source: Tanenbaum, 4th ed. 1999, Prentice-Hall

Control Design Pipelining Pattern


N-Stage Pipeline
- Pipeline allows serial processing, in sequence, of instructions or data elements.
- Each stage n in the pipeline processes its task, then passes the element to stage n+1.
- Design structures that use pipelining: CPU Instruction Fetch Unit (IFU), digital filters (e.g., FIR, IIR).

Source: Tanenbaum, 4th ed. 1999, Prentice-Hall

Control Design Pipelining Pattern


Pipelining - 1
- There are two kinds of pipelining: data path pipelining and control pipelining.
- An example of control pipelining is the Instruction Fetch, Decode, Execute cycle used in all CPU architectures.
- Another example is Bus Reads and Writes, which are generally pipelined so as to interleave the control operations, thus saving clock cycles (shown in the figure).

Source: Tanenbaum, 4th ed. 1999, Prentice-Hall

Control Design Pipelining Pattern


Pipelining - 2
- The sequence of figures shows how pipelining works in the control path.
- The control pipelining is the Instruction Fetch, Decode, Execute cycle used in all CPU architectures.
- Each stage of the control pipeline is buffered by registers that provide setup of data.
- The different stages of the pipeline also use handshaking.

Source: Tanenbaum, 4th ed. 1999, Prentice-Hall

Control Design Arbitration Pattern


Arbitration-1
The pattern works in situations where multiple service requesters want access to a scarce resource (such as a Bus). There are different arbitration schemes for requesting by one or more requesters and granting control of the resource by the arbiter module. Some use daisy chaining, or other prioritization schemes, to grant access. Arbitration can be centralized, using an arbiter module, or it can be decentralized (see next example).

Source: Tanenbaum, 4th ed. 1999, Prentice-Hall

Control Design Arbitration Pattern

[Figure: four stations sharing the 802.11 wireless medium (CSMA/CA): Station-1, Station-2, Station-3 (Internet Gateway), and Station-4 (Print Server).]

Arbitration-2
The previous scheme involves use of a separate arbiter module, as is the case with most bus schemes.
- 802.11 WLAN operates this way when an Access Point is present, and the network is operating in PCF (point coordination function) mode.

Another scheme involves no centralized arbiter:
- This is the case when 802.11 operates in ad-hoc DCF (distributed coordination function) mode, without the centralized control of an Access Point; in this respect it operates like 802.3 Ethernet.
- CSMA/CD: Carrier Sense Multiple Access/Collision Detect. Sense for a distributed carrier signal, and detect collisions, as a means to gain access to the shared resource (wired network medium).
- CSMA/CA: Carrier Sense Multiple Access/Collision Avoidance. Sense for a carrier signal, but don't rely on it solely as the means for gaining access. Use an additional timing mechanism passed among the data frames (needed because of the hidden node problem).

Digital Systems Design

Memory Arrays


Gates to Registers Single-Port Memory Array


Source: Tanenbaum, 4th ed. 1999, Prentice-Hall

4 x 3 Memory Array
- The Memory array is built up from gates and flip-flops, to take advantage of certain properties of the devices.
- Each row is one of four 3-bit words.
- Data lines I0 to I2 feed all of the D flip-flops in a column.
- The address lines A0 and A1 act as select lines for a given word (row).
- The control signals CS (chip select), RD (read enable), and OE (output enable) are used to route data to and from memory (allowing writes and reads) based on the combinational logic gates.
- The bus drivers are enabled by the AND of the 3 control signals.

Systems Design Memory Arrays


m x n Memory Array
- Memory arrays allow multi-bit values to be stored and retrieved by address.
- A memory of n bits can be organized in different ways.
- A memory array has a word length: the number of bits of each word stored in memory (e.g., 32 bits).
- A memory array has a number of words (e.g., 64K).
- Each memory location is uniquely addressable by its address (random access memory).

Source: Tanenbaum, 4th ed. 1999, Prentice-Hall

Memory Use Example RAM Control Thread



Here, we have two banks of memory that are selectable through a bank select signal. Usually, some number of upper bits of the Address are used for this purpose. Note how we access memory locations via assignment and an index register.

Note: This bus serves as an index into the memory array.
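
Using the upper address bits as a bank select, and the remaining bits as an index into the selected array, can be sketched as follows. The class name, the 4-bit address width, and the single bank-select bit are all assumptions chosen to keep the example small.

```python
class BankedRam:
    """Two (or more) memory banks selected by the upper address bits."""

    def __init__(self, addr_width: int, bank_bits: int = 1):
        self.index_bits = addr_width - bank_bits
        words_per_bank = 1 << self.index_bits
        self.banks = [[0] * words_per_bank for _ in range(1 << bank_bits)]

    def split(self, address: int):
        """Return (bank select, index within the bank)."""
        bank = address >> self.index_bits
        index = address & ((1 << self.index_bits) - 1)
        return bank, index

    def write(self, address: int, value: int) -> None:
        bank, index = self.split(address)
        self.banks[bank][index] = value   # access by assignment and index

    def read(self, address: int) -> int:
        bank, index = self.split(address)
        return self.banks[bank][index]


ram = BankedRam(addr_width=4)
ram.write(0b1010, 7)
```

Address 0b1010 splits into bank 1 and index 2, so the write lands in the second bank while the same index in the first bank is left untouched.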


Memory Use Example Memory Table


Memory creation consists of several steps:
(1) Create a RAM array in the Memory Table (with Addr. Width, Data Width and Type).
(2) Edit values in the array using the Matrix Table.
(3) Reference the memory locations using an index.

Memory Arbitration Pattern-1


Memory Arbitration Pattern-2

Invocation of a sub-flow in the ASM model: the handshaking logic is abstracted into a single thread fragment which is reused.

[Figure: ASM thread with callouts marking the Invoke and Return points of the sub-flow.]

Memory Arbitration Pattern-3


Invocation of a sub-flow initiates the handshaking with the Arbiter thread for access to the memory array.

The handshaking is managed by polling loops on both sides of the protocol.

[Figure: ASM threads with callouts for the handshake steps: Request, Grant, Start Transfer, Signal when Done.]
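
The Request, Grant, Start Transfer, Signal when Done exchange can be sketched as two polling state machines stepped cycle by cycle. Everything here beyond those four step names is an assumption: the state names, the signal names `req`, `grant`, and `done`, and the two-cycle transfer length are chosen for illustration. Both threads read the signal values from before the clock edge, so a change made by one thread is seen by the other in the following cycle.

```python
def simulate_handshake(transfer_cycles: int = 2, max_cycles: int = 20):
    req = grant = done = 0
    client, arbiter = "IDLE", "IDLE"
    remaining = 0
    trace = []
    for _ in range(max_cycles):
        r, g, d = req, grant, done        # values sampled before the edge
        # Requesting thread
        if client == "IDLE":
            req, client = 1, "WAIT_GRANT"             # Request
        elif client == "WAIT_GRANT":
            if g:                                     # polling loop on grant
                client, remaining = "TRANSFER", transfer_cycles
        elif client == "TRANSFER":                    # Start Transfer
            remaining -= 1
            if remaining == 0:
                req, done, client = 0, 1, "FINISHED"  # Signal when Done
        # Arbiter thread
        if arbiter == "IDLE":
            if r:                                     # polling loop on request
                grant, arbiter = 1, "GRANTED"         # Grant
        elif arbiter == "GRANTED":
            if d:
                grant, arbiter = 0, "IDLE"
        trace.append((client, arbiter))
        if client == "FINISHED" and arbiter == "IDLE":
            break
    return trace
```

The returned trace lists the pair of thread states after each cycle, which makes the one-cycle lag between asserting a signal and the other side reacting to it easy to see.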



Section 2 - Summary
Executable algorithmic state machines (E-ASM):
Allow both control and data operations to be specified in time (cycle by cycle scheduling) and space (allocation of specific resource types using abstract macro-functions). Notation we define is directly executable in the Nimbus tool set.

Using ASM diagrams:


A thinking aid for defining the structure and sequencing behavior of Finite State Machines. Used in 3 different ways: (1) definition/specification of sequential systems, (2) analysis of sequential circuits, (3) design of combinational and sequential circuits behaviorally.

Next: we'll look at some designs and analysis.

Section 3 Example High-Level Designs of VLSI Circuits & Systems



Analysis & Design Process Variations


We can have several possible process variations, using the Executable ASM as the core methodology component in analysis and design.
- Starting from a protocol description, derive an appropriately partitioned architecture of communicating concurrent ASM threads for realizing the distributed control.
- Starting with an algorithm, derive an appropriately sequenced and scheduled set of data operations, allocated and bound to appropriate classes of macro operators, as modeled using one or more ASM threads.
- Starting with an existing circuit design, refactor or reengineer the design into an abstract architecture consisting of one or more ASM threads.

Process Design-for-Synthesis Methods & Tools


Design Approach - "stepwise refinement", with "iterative enhancement".
- Create design "skeleton", with core functions and cycle-level timing information specified.
- Iterate the design through synthesis, checking key area and timing constraints.
- Return to the top of the process to make corrections, and to enhance the design description.
- Integrate completed behavioral block with other blocks for HDL "system" simulation.

[Figure: design-for-synthesis flowchart. Boxes and decision points, each decision with YES and NO exits: Start; Capture Design; Compile & Checking (Correct Entry?); Cycle-based Simulation?; Behavioral Simulation (Correct Behavior?); HDL Simulation Required?; Functional Simulation (Correct Function?); Logic Synthesis and Gate-level Timing Analysis (Correct Timing?); Partition, Place & Route (Area & Speed?); Fabricate Device; Done. Tools named in the figure: KBS flowHDL & IPstation blockHDL, Exsedia Nimbus, Synopsys SGE, Synopsys VSS, Synopsys Design Compiler, HDL Compiler, FPGA Compiler, DesignWare, Design Analyzer, Timing Analyzer, Xilinx ISE.]

Executable ASM Design Method Using NimbusTM


The Design Capture Process


Using Exsedia's Nimbus™ toolset:
- We'll specify the behavior of design units as Executable ASM models.
- We'll apply design patterns for defining the system control as a collection of threads, executing concurrently.
- We'll verify execution of compiled ASM models using the integrated cycle-based graphical simulator in Nimbus™.

Arithmetic Datapath Unit Example:


192-bit Unsigned Integer Multiplier


Getting From Small to Large Multipliers


Realities:
- Large multipliers don't behave like small ones, as they are not linearly scalable in space or time. [Parhami, 2000]

Questions:
- What small multiplier unit configurations can be efficiently deployed to achieve the best tradeoffs in area and timing on the FPGA fabric?
- How do we structure the hierarchy of such units to achieve the best overall balance in building wide-bit MUL units for use in our application?

[Figure: hierarchy of multiplier and adder units. Multiplier Top-level Unit (operands A and B of q bits, result Z of 2q or q bits), Multiplier Intermediate Unit (operands of p bits, result Z of 2p or p bits), and Multiplier Base Unit (operands of n bits, result Z of 2n bits), together with an Adder Intermediate Unit (r-bit operands and result, with CIN and COUT) and an Adder Base Unit (m-bit operands, with CIN and COUT).]

Exploring the Architecture Design Space


To pursue this line of inquiry, there are a number of questions to be answered:
- Given the modeling of these arithmetic base units, can we use behavioral models, or must we use structural models?
- If we use behavioral models for multiplication, at what bit-widths do these models cease to be efficient?
- What combination of structural and behavioral models should be used, and where in the overall architectural hierarchy is each appropriate?

Candidate Wide-bit Multipliers BCn


The 192-bit Broadcast: [Buell et al., 2002]
- Requires 6 32-bit MUL units, 3 64-bit ADD units, a 192-bit ADD unit, and lots of shifters & wide registers.
- We're I/O limited by a 32-bit PCI-X bus transfer to the FPGAs.
  - We size the MUL ops of this unit to match.
  - We absorb some of the cycle cost of getting operand data by starting MUL pipelining after 7 bus cycles.
- We want to exploit the Xilinx Virtex MULT18x18 hard macros.

[Figure: 192-bit broadcast multiplier data path. The 32-bit operand words a0 to a5 and b0 to b5 pass through an operand-select MUX into the multiplier (X), producing 64-bit partial products held in registers Ra to Rf; adders (+) with ripple carry, MUXes, and Shift_32 units accumulate them through registers Rg to Rl into the Final Product.]

Characterizing the Design Space-1

MUL Architectures              Bit-widths
Virtex 18x18 Block (VE)        12, 16 bits
Behavioral VHDL * Op (BE)      32, 48, 64 bits
Booth (BO)                     16, 32, 48, 64 bits
Shift-Add (SA)                 16, 32, 48, 64 bits
Broadcast (BC)                 32, 48, 64, 96, 128, 192, 256 bits
Divide & Conquer (DC)          32, 48, 64, 96, 128, 192, 256 bits

Using a decomposition tree model, we can discuss the size of the space of candidate MUL configurations, based on possible candidate partition trees.
- Let R be the set of possible root node multipliers, of differing topology and bit-width (R type x R bit-width):
  R = { DC192, DC256, BC192, BC256, .. }
- Let NT be the set of non-terminal nodes, compositional MUL units, of different topology and bit-width, which can be further decomposed (either compositionally or recursively) into smaller NT units, where
  NT = { DC128, DC96, DC64, DC48, DC32, BC128, BC96, BC64, BC48, BC32, BO64, BO48, BO32, SA64, SA48, SA32, BE64, BE48, BE32, ..}
- Let T be the set of terminal nodes, base MUL units, which are not further decomposed, by type (width = 16 bits):
  T = { VE16, VE12, BO32, BO16, SA32, SA16, ..}

[Figure: partition tree. The root node S0 (|S0| = N bits) is at depth d=0; non-terminal nodes follow at d=1 (|S2| = N/2) and d=2 (|S3| = N/4), with nodes labeled S1 to S6; after some number of partitions, the terminal nodes Sn-r to Sn-1 are reached, with dmax = 4.]

Collecting Configuration & Cost Data


For answers, models of different topologies and configurations must be created and their performance characterized:
- Metrics: pin-to-pin delay, resource usage (slices, MUL units, IOBs), and minimum clocking for circuit synchronization.
- Architectural techniques: concurrency through parallelism and pipelining (enhancing speed at the cost of FPGA resources), resource sharing (minimizing resource consumption at the cost of speed), and interleaving (trying to hide operation timing in the hard bottlenecks of other operations).
- Experimental Method: create models for Multipliers, and their constituent Adder, logic and register units, using different architectures, bit widths, decomposition depths, and decomposition strategies, and measure delay and resource usage for different permutations.
- Design Technique: try a number of combinations at a given level of design hierarchy, then use this selection as the primitive unit in the next layer upward in the component hierarchy, until we reach scale.

Step 1 Costing Logic Component Primitives


Preliminary data.

Goal: to evaluate our ability to estimate area and delay for different bit-width MUL unit architectures, top-down. Approach: build models of combinational & sequential logic units, whose width scalability costs can be assessed, bottom-up.
Build unit models using ASM. Compare cost of behavioral vs. structural model styles.


Step 2 Costing Adder Units


64-bit exemplar ADD units.

Adder schemes are not considered costly relative to multipliers.
- But we're using large bit-widths! Many ADD units of different widths are embedded in a single wide-bit MUL unit.

Metrics:
- Area and delay are used to calculate the Area-Delay Product.
- Normalized Efficiency is the lowest A-D Product divided by the A-D Product of each unit, so the most efficient unit scores 100%.

Result: Ripple-Carry has the best efficiency score.
- Carry-Lookahead and Carry-Select are less than half as efficient, overall, due to their area being 2X that of Ripple-Carry.

ADD unit                   Area * Delay    Normalized Efficiency
Ripple Carry               408.70          100.0%
Carry Look Ahead           870.94          46.9%
Carry Select               1031.10         39.6%
Bit Serial (Gray)          19752.77        2.1%
VHDL Add Operator ('+')    522.24          78.3%
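
The normalization can be reproduced from the Area * Delay values in the chart. The short Python sketch below divides the lowest product by each unit's product; the dictionary and variable names are hypothetical, and the numbers are the ones shown on the slide.

```python
# Area * Delay products for the 64-bit exemplar ADD units (from the chart).
area_delay = {
    "Ripple Carry": 408.70,
    "Carry Look Ahead": 870.94,
    "Carry Select": 1031.10,
    "Bit Serial (Gray)": 19752.77,
    "VHDL Add Operator ('+')": 522.24,
}

best = min(area_delay.values())

# Normalized efficiency: the most efficient unit scores 1.0 (100%).
efficiency = {name: best / adp for name, adp in area_delay.items()}

for name, value in efficiency.items():
    print(f"{name:26s} {value * 100:6.1f}%")
```

Rounded to one decimal place, the computed percentages match the chart labels (100.0, 46.9, 39.6, 2.1, and 78.3).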


Step 2 Costing Adder Units


As ADD widths scale, do their costs factor significantly in the overall cost of a wide-bit MUL?
- What impact on design do ADD units have relative to the overall MUL pipeline architecture?
- Focus on the delay cost of the carry chain.

Factors:
- For Virtex-II, with 18x18s, the ADD stages will be slower than the base unit MUL stages in the overall wide-bit pipeline.
- Clocking strategies will be gated by how fast ADD stages of different widths can run.

Impact on setting clocking scheme of the pipeline.
- Timing constraints defined for 15 ns and 30 ns Ripple-Carry ADD runs.
- ADD units < ~128-bit width can be clocked at 15 ns.
- Others require a slower clock, unless wide-bit ADD units are split across pipeline stages.

[Figure: "Ripple-Carry ADD - Timing Constraint Evaluation", plotting Delay (ns, 0 to 30) against Operand Bit-width (16 to 288).]

[Figure: the 192-bit broadcast multiplier data path, with the callout "Split this 192-bit ADD into 3 64-bit ADD pipeline stages, then clock at faster rate."]
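
Splitting a wide ADD into narrower stages works because each stage only needs its own operand slice and the carry out of the stage before it. A minimal Python sketch of the data flow follows; the function name is hypothetical, and one loop iteration stands for one pipeline stage. It models the arithmetic only, not FPGA timing.

```python
def split_add(a: int, b: int, total_width: int = 192, stage_width: int = 64):
    """Add two wide operands as a chain of narrower adds linked by carry."""
    mask = (1 << stage_width) - 1
    carry = 0
    result = 0
    for stage in range(total_width // stage_width):
        shift = stage * stage_width
        # Each stage adds one 64-bit slice plus the incoming carry.
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> stage_width    # carry registered for next stage
    return result, carry
```

In hardware, the carry chain inside each stage is one third as long as in the unsplit 192-bit adder, which is what allows the faster clock, at the cost of two extra cycles of latency.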


Step 3 Costing Base Multiplier Choices


Data shown for 32-bit base unit configurations only. Preliminary data.

At issue was the selection of the base Multipliers:
What architecture(s) should be used?
What bit-width should constitute the base unit, from which higher-level wide-bit units would be subsequently built?
How many levels of recombination were optimal, with how many base units at each level?

We settled on two paths:
32-bit base units (leveraging CLB LUTs alone).
16-bit units (leveraging 18x18 MUL resources for Virtex II).

Each path was costed out at width, including Adder data collected in the previous step. Some were pruned from further consideration.
At issue: how much better are Xilinx 18x18s than CLB-based MULs?

[Charts: Utilization (CLB Slices) and Max Delay (ns) by multiplier architecture, Virtex-II]

Multiplier architecture   Utilization (CLB Slices)   Max Delay (ns)
Shift-Add                                      197            6.223
Knuth                                          425            7.243
Booth                                          387            9.289
Divide & Conquer                               269           13.563
Xilinx Macro                                   119            4.421


Step 3 Costing Base Multiplier Choices


32-bit base unit MULs. Preliminary data.

How do the base units compare with one another?
Use Area-Delay Product as a metric to assess relative strength of architectures [Rabaey et al., 2002]. This gives a measure that is independent of the area vs. speed tradeoff.

[Chart: Area * Delay Product by multiplier architecture]

Multiplier architecture   Area * Delay
Shift-Add                     1225.931
Knuth                         3078.275
Booth                         3594.843
Divide & Conquer              3648.447
Xilinx Macro                   526.099

How to assess the relative strength of using the Virtex II 18x18 multipliers as base units?
Use Normalized Efficiency, defined relative to the 18x18.
Most of the other multiplier schemes are extremely inefficient when compared to the 18x18 as a base unit.
The Shift-Add architecture reaches about 43% of the 18x18 unit efficiency (60.4% in area, 71% in delay), making it the next candidate when 18x18 resources are exhausted.
The Divide & Conquer is only 14% as efficient as 18x18s at 32 bits!

[Chart: Normalized Efficiency by multiplier architecture]

Multiplier architecture   Efficiency
Shift-Add                      42.9%
Knuth                          17.1%
Booth                          14.6%
Divide & Conquer               14.4%
Xilinx Macro                  100.0%
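
The sketch below shows, under stated assumptions, how the Area * Delay figures follow from the slice counts and maximum delays reported for the 32-bit base units, and how the efficiency relative to the Xilinx Macro factors into an area ratio and a delay ratio. The dictionary and function names are illustrative only.

```python
# (CLB slices, max delay in ns) for the 32-bit base units, as read from the slides.
BASE_MUL = {
    "Shift-Add": (197, 6.223),
    "Knuth": (425, 7.243),
    "Booth": (387, 9.289),
    "Divide & Conquer": (269, 13.563),
    "Xilinx Macro": (119, 4.421),
}


def area_delay_product(slices, delay_ns):
    """Area * Delay, the cost metric used on the slides."""
    return slices * delay_ns


def efficiency_vs(reference, design, table=BASE_MUL):
    """Return (area ratio, delay ratio, overall efficiency) of design vs reference."""
    ref_slices, ref_delay = table[reference]
    slices, delay = table[design]
    area_ratio = ref_slices / slices
    delay_ratio = ref_delay / delay
    # The product of the two ratios equals ADP(reference) / ADP(design).
    return area_ratio, delay_ratio, area_ratio * delay_ratio
```

For example, `efficiency_vs("Xilinx Macro", "Shift-Add")` separates the Shift-Add result into its area and delay components, which is how the two percentages quoted in parentheses combine into the overall figure.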


Step 4 Characterizing Wide Datapaths


Data for 32-bit base unit MULs only.
Preliminary data.

We have wide Multipliers of differing widths (e.g., 192 and 256-bits):


Estimate each one in terms of performance of its smaller-width subunits.
Different unit architectures are possible: Divide & Conquer, and Broadcast, etc. These MUL architectures are then decomposed.
We need resource cost estimates, checking which candidate MUL configurations to apply, which to prune. Consider cost of controller as well.
We need worst case delay, along with the properties of the FPGA, to select the clocking strategy.
We need to count cycles, to get at the latency of the unit, and trade this against FPGA device fit.

Worst case delay envelope => minimum cycle time
Cycle time * number of cycles per operation => latency
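
The two relations above can be sketched as follows. This is a minimal illustration, assuming one operation is in flight at a time so that throughput is the reciprocal of latency; the stage delays in the usage note are made-up values, and all names are hypothetical.

```python
def min_cycle_time_ns(stage_delays_ns):
    """The slowest stage (worst case delay) sets the shortest usable clock period."""
    return max(stage_delays_ns)


def latency_ns(cycle_time_ns, cycles_per_operation):
    """Latency is the clock period times the number of cycles per operation."""
    return cycle_time_ns * cycles_per_operation


def throughput_mops(latency_in_ns):
    """Millions of operations per second, assuming one operation at a time."""
    return 1e3 / latency_in_ns
```

With illustrative stage delays of 13.2, 18.7 and 9.4 ns, the clock period could not be shorter than 18.7 ns. A unit that takes 16 cycles at a 20 ns clock would have a latency of 320 ns, or about 3.1 million operations per second.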

Step 4 Characterizing Wide Datapaths

[Chart: Wide-bit MUL Comparison. Area (CLB Slices), 0 to 10000, versus Bit-width (32, 64, 128, 256) for the DC-1, DC-2 and BC architectures. Preliminary data.]

Assessing architectures at different bit-widths.
Two versions of Divide & Conquer, one of Broadcast, where differences are apparent in CLB usage. 18x18 usage is the same across versions at differing bit-widths.

Comparing DC and BC architectures at different bit-widths.
Which architecture uses resources most efficiently as it scales?

Assessment is augmented by considering latency (after clock assignment) and throughput.


Step 5 Characterizing Pipeline Control


Using Control-Dataflow Graphs (CDFGs) [Karp & Miller, 1969; Gajski et al., 1994]
Specify cycle-based sequencing and scheduling of operation steps in deterministic graph.
Algorithmic state machine (ASM) method allows direct representation of algorithm for encoding in VHDL.
Combination of state machine and RTN specification of MUL operation steps.
Allows direct exploration of design space, for trading off area/speed issues, and for pipelining & resource sharing.
Allows direct consideration of the controller logic component of a given MUL architecture.

State encoding scheme can impact delay cost from output decoding logic.
N states require register resources of about log2(N)/2 CLB slices on a Virtex-II device (binary, Gray coding) or N/2 slices (one-hot), plus decoding LUTs and MUXes.
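
A rough sketch of the state-register cost, assuming binary and Gray codes need ceil(log2 N) state bits, one-hot needs N bits, and each Virtex-II slice provides two flip-flops. Decoding LUTs and MUXes are not counted, and the function names are illustrative.

```python
import math

FLIP_FLOPS_PER_SLICE = 2  # a Virtex-II slice holds two storage elements


def state_bits(n_states, encoding):
    """Number of flip-flops needed to hold the state for a given encoding."""
    if n_states < 1:
        raise ValueError("n_states must be positive")
    if encoding in ("binary", "gray"):
        # ceil(log2(n_states)), computed exactly with integer arithmetic
        return max(1, (n_states - 1).bit_length())
    if encoding == "one-hot":
        return n_states
    raise ValueError(f"unknown encoding: {encoding}")


def state_register_slices(n_states, encoding):
    """Slices needed for the state register alone (no decoding logic)."""
    return math.ceil(state_bits(n_states, encoding) / FLIP_FLOPS_PER_SLICE)
```

For a 21-state controller this gives 3 slices for binary or Gray coding against 11 slices for one-hot, before any output decoding logic is added.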


Step 5 Characterizing Pipeline Control


Preliminary data.

How do architectures compare in terms of FSM encoding efficiency?
Use Area-Delay Product as metric.
Averaged results over multiple architectures and differing wide bit-widths.
Not looking at FSM efficiency for a given pipelined MUL architecture.

[Chart: Area * Delay Product by FSM encoding]

FSM encoding   Area * Delay
Binary             14081.23
Gray               11126.29
OneHot             14217.25

Assessing the impact of FSM encoding on the architectures:
Use Normalized Efficiency, defined relative to the Gray Code scheme.
The other FSM schemes (Binary, One-Hot) are moderately inefficient compared to Gray Code. This is surprising, since One-Hot is generally preferred for FPGAs.
On average, One-Hot encoding is about 78% as efficient as Gray Code for realizing the FSM, roughly as efficient as Binary encoding (79%), for these pipelined architectures.
The disparity appears to derive from how delay is affected by MUX chains in the control path.

[Chart: Normalized Efficiency by FSM encoding]

FSM encoding   Efficiency
Binary             79.02%
Gray              100.00%
OneHot             78.26%


Step 6 Assessing Pipeline Latency & Throughput


Examination of the multi-level hierarchical wide-bit units:
- use of subunits consists of intermediate MUL units
- themselves divide & conquer units of the smaller width
- these ultimately make reference to MULT18x18s.

MUL Unit DCn   Slices   18x18s   Clock Cycles   Clock Period (ns)   Latency (ns)   Throughput (MMOPS)
32                585        3              6                  15              8                 12.5
64               2485        9             16                  20             32                  3.1
128              9282       27             21                  30             63                  1.6

[Chart: Divide & Conquer Multiplier - Composite Plot. Slices, 18x18s, Latency and Throughput versus Bit-width (32, 64, 128).]

We can assess the cost of the MUL pipelines, in terms of latency and throughput. There is a big question as to whether DCn will scale effectively while holding resource utilization, latency and throughput at acceptable levels.
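
One hedged way to look at the scaling question is to compute how each reported figure changes per doubling of operand width. The sketch below uses only the slice, 18x18 and throughput values listed for the DCn units; the names are illustrative.

```python
# (bit-width, slices, 18x18 multipliers, throughput in MMOPS) as listed for the DCn units.
DCN = [
    (32, 585, 3, 12.5),
    (64, 2485, 9, 3.1),
    (128, 9282, 27, 1.6),
]


def growth_per_doubling(rows):
    """Ratio of each figure to the previous row's figure, one entry per width doubling."""
    steps = []
    for prev, cur in zip(rows, rows[1:]):
        steps.append({
            "width": f"{prev[0]}->{cur[0]}",
            "slices": cur[1] / prev[1],
            "mult18x18": cur[2] / prev[2],
            "throughput": cur[3] / prev[3],
        })
    return steps


if __name__ == "__main__":
    for step in growth_per_doubling(DCN):
        print(step)
```

On these figures the 18x18 count triples at each doubling, slice usage grows by roughly a factor of four, and throughput falls, which is the concern raised above.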


Protocol State Machine Example:


802.11b WLAN MAC Layer


Example - 802.11 WLAN MAC


Wireless Local Networking
Supports pervasive computing on PCs, laptops, PDAs and other Internet-enabled devices.
Most Access Points support broadband access to a wired network.
Personal network management screens access functions that control behavior of the MAC Layer.
Example: wireless router from D-Link.

[Figure: Station-1, Station-2, Station-3 (Internet Gateway) and Station-4 (Print Server) sharing the 802.11 Wireless Medium (CSMA/CA). Screen shots 2002 D-Link Corporation.]


802.11 WLAN Typical Architecture


ARM9E / ARM946E-S cached processor with tightly coupled memory interfaces

CPU core die area:
4.9 mm² on 0.18 µm, estimated size with 16KB instruction & 4KB data caches and no TCMs.
Using Artisan cell library & RAM compiler.

Peak power consumption (mW/MHz):
1.1 mW/MHz @ 1.8V (estimated)

Memory system:
Selectable I & D cache sizes: 0, 4K, 8K ... 1M
Selectable I & D TCM sizes: 0, 4K, 8K ... 1M

Clock frequency & MIPS performance:
150MHz on TSMC 0.18 µm (worst case)
230MHz on TSMC 0.18 µm (typical)

[Figure: target BBP & MAC controller chip. Labels: RF Front End, BBP (PHY), MAC, MAC controller, BUS interface, PCI/PCMCIA, 10K gates, Softdriver, device driver (basic function call, memory control, etc.), Protocol Stack & OS kernel, Client driver, AP driver, Application.]

Memory block (4~15kB) + control block.
On-chip memory depends on firmware size (100~200KB); the FW size depends on the MAC feature (11e & i?).

Source: Knowledge Edge KK


Behavior to Architecture Mapping

Most 802.11b MAC implementations are done as embedded systems executing on a CPU (e.g., ARM microprocessor).

We'll be designing our MAC layer model in VLSI custom logic using concurrent state machines, and will generate a circuit using a Xilinx family FPGA device.

802.11 WLAN Core Functions (UML Use Case)

Inventory of basic functions supported in our MAC-layer Receiver architecture.
Interaction at the system boundary with the PHY layer.
Each Use Case will be iterated using Sequence diagrams, hardware block diagrams, and Executable ASM diagrams.
Each Use Case will have a set of behaviors associated with it that we will want to model as an RTL hardware description.

[Figure: UML Use Case diagram (Operations). Labels: MAC_Layer, PHY_Layer, TransmitFrame, ReceiveFrame, DecodeFrameHeader, DecodeFrameCheck, DecodeAddresses, DecryptFrameData, DecodeDurationID, DecodeSequenceControl.]

802.11 WLAN Core Actors & Data (UML Class)


Model structure of the problem domain using Class diagram.
Most applications have inherent structure that you'll want to understand.
Useful for defining problem scope and for capturing design requirements.
The classes identified may become key application data, or modules that operate on data.
We'll use some of the identified classes as modules in Sequence diagrams, which will aid in making partitioning decisions.

[Figure: UML Class diagram (classes from Use Case View). Class labels: ShiftController, WordCounter, FrameSequencer, WordSelector, Frame_Word, DecoderSelector, Frame_Sequence_State. Association labels: <<PartOf>>, Generates_Word, Maintains, Checks_Current_Value, Forwards_Word_To_Target, Selects_Target_For_Word, State_Sequence (+previous_state, +next_state). Multiplicities shown include 1, 0..* and 1..*.]


802.11 WLAN Partitioning (UML Sequence)


Modeling sequenced module interactions leads to a partitioned architecture.
Take each Use Case and decompose it into a sequence of interactions between system modules.
This allows a designer to explore choices for partitioning the system.
Partitioning metrics: (1) module cohesion, (2) inter-module coupling, (3) scope of module responsibility, (4) degree of information hiding.
We use concepts of object-oriented systems analysis to converge on an optimal partitioning strategy.
Generally, we trade off circuit optimization against degree of design reuse, extensibility, and maintainability.

802.11 WLAN Behavior (UML Statechart)


Take each module and derive its internal behavior description from the sequenced interactions.
Each module has events and actions defined at its interfaces. These collectively provide an inventory of the behavior the module must support.
Looking at the Sequence diagrams, we extract the sequence of steps to be performed within each module.
We specify the internal behavior of a module using either a UML Statechart diagram (for state-based lifecycles) or UML Activity diagram (for algorithmic description).

[Figure: UML Statechart. State labels: Start, Created, Target_Identified, Latched, Fully_Decoded, Data_Buffered, Done. Transition labels: Target_Decoded, ^Target_Decoded AND Select_Enable, ^Select_Enable, Select_Enable, Decode_Complete, ^Decode_Complete, Frame_Subtype = Data AND Buffer_Complete, Frame_Subtype = Data AND ^Buffer_Complete, Frame_Subtype != Data.]


802.11 WLAN - Frame Word Lifecycle


[Figure: UML Sequence diagram. Lifelines: Frame_Word, Shifter, WordSelector, Frame Sequencer, Decoder Selector, FrameHeader Decoder.]

This scenario is designed to show how state information for the Frame_Word might be correlated with the passing of signals between blocks, actions corresponding to state change for the Frame_Word.
In this scenario, we assume this is the first word created for a new frame.

1: Shift in 4-bits
2: Shift in 4-bits
3: Shift in 4-bits
4: Shift in 4-bits
5: Create Frame_Word
6: State = "Created"
7: Signal new Frame_Word
8: Check if 1st Word in Frame
9: Signal Sequencer to 1st Word
10: Set Decoder_Selector = "FrameHeaderDecoder"
11: Signal "Target_Decoded"
12: State = "Target_Identified"
13: Signal Frame_Header_Decoder
14: Latch Frame_Word
15: Signal "Select_Enable"
16: State = "Latched"
17: Decode Header
18: Pass Frame_Subtype
19: Signal "Decode_Complete"
20: State = "Fully_Decoded"
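
The state changes in this scenario can be sketched as a small transition table. This is an illustration only: it assumes each named signal (Target_Decoded, Select_Enable, Decode_Complete) is what triggers the state change listed directly after it, covers just the first-word path shown here, and uses a hypothetical `FrameWord` class.

```python
class FrameWord:
    """Tracks the lifecycle state of one 16-bit frame word (first-word path only)."""

    # (current state, signal) -> next state, read off the numbered steps.
    TRANSITIONS = {
        ("Created", "Target_Decoded"): "Target_Identified",
        ("Target_Identified", "Select_Enable"): "Latched",
        ("Latched", "Decode_Complete"): "Fully_Decoded",
    }

    def __init__(self, value):
        self.value = value & 0xFFFF   # four 4-bit shifts make a 16-bit word
        self.state = "Created"

    def signal(self, name):
        """Apply a signal; reject signals that are not expected in the current state."""
        key = (self.state, name)
        if key not in self.TRANSITIONS:
            raise ValueError(f"signal {name!r} not expected in state {self.state!r}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

Rejecting unexpected signals makes sequencing mistakes visible early, in the same spirit as checking the scenario against the Statechart.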


802.11 WLAN Sequence Diagram-1


[Figure: UML Sequence diagram. Lifelines: Shifter, WordCounter, Decoder Selector, FrameHeader Decoder.]

1: Buffer 1st Frame Word
2: Signal Start_of_Frame
3: Initialize for New Frame
4: SOF_Acknowledge
5: Signal Word_Ready
6: Initialize for Next Word in Frame
7: Latch New Frame Word
8: Select FrameHeaderDecoder
8b: Buffer Header Word
9: Signal New Word Available
10: Initialize for New Frame Header
11: Latch Frame Header Word
12: Determine Frame Subtype & DS Bits
13: Buffer Frame Subtype and DS Bits
14: Signal Header_Data_Ready
15: Load Subtype & DS Bits

Notes on the diagram: "See next slide!" and "You will need to create a block for this one!"


802.11 WLAN Sequence Diagram-2


[Figure: UML Sequence diagram. Lifelines: PHY, Shifter, WordSelector, Frame Sequencer, Decoder Selector.]

1: PHY_Word_Avail
2: Initialize
3: Get 4-bit Word
4: Shift_In (do it 4 times)
5: Create Frame Word (register)
6: New_Word
6b: New_Word
7: Initialize
7b: Select Next State (register)
8: Latch New Word
9: Read Frame State
10: CASE: switch on Frame State
11: Make Decoder Selection for Word Destination
12: Set Enable Line for downstream block

Note that the Word_Counter of the previous slide is now decomposed into the Word_Selector and the Frame_Sequencer actors, which will be the basis for depicting the details in the block diagrams that follow.


802.11 WLAN MAC Receiver Block Diagram


802.11 WLAN Receive Word Shifter


A model for MAC-layer data receiver/shifter thread.
We use Executable ASM diagrams to model state machine behavior and datapath operations for the hardware.
Executable ASM models have a graphical symbol set that looks like a flowchart. Algorithm structure can be easily modeled using the ASM graphics.
The diagrams are annotated with register transfer notation (RTN) expressing operations and events.
Executable ASM models are directly executable in Nimbus™, are correct by construction, and result in VHDL/Verilog code optimized for circuit synthesis.
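
As a behavioral sketch of the receive shifter described in the sequence diagrams (four 4-bit shifts, then a new 16-bit Frame Word), the Python model below may help. It is not the ASM model itself; the class name is hypothetical, and the choice to place the first 4-bit group in the most significant position is an assumption not stated on the slides.

```python
class WordShifter:
    """Collects 4-bit PHY words into 16-bit frame words."""

    NIBBLES_PER_WORD = 4

    def __init__(self):
        self.register = 0
        self.count = 0

    def shift_in(self, nibble):
        """Shift in one 4-bit value; return a 16-bit word once four have arrived, else None."""
        if not 0 <= nibble <= 0xF:
            raise ValueError("expected a 4-bit value")
        # Assumption: earlier nibbles end up in the more significant positions.
        self.register = ((self.register << 4) | nibble) & 0xFFFF
        self.count += 1
        if self.count < self.NIBBLES_PER_WORD:
            return None
        word = self.register
        self.register, self.count = 0, 0   # ready for the next frame word
        return word
```

Returning `None` until the fourth shift plays the role of the New_Word signal being inactive while the word is still being assembled.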


802.11 WLAN MAC Receiver: DID Decoder

Go to this state on error.

Go back to the poll state.



802.11 WLAN MAC Receiver: FCS Decoder


Go to this state on reset from MAC frame error. Poll for start of Frame Check Sequence bit stream.

Poll waiting for second word (16 bits) of FCS frame field.

Check that FCS field bits match CRC-32 bits computed from frame stream on the fly.
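
The on-the-fly check can be sketched in software with a running CRC-32. This is a hedged illustration: it uses Python's `zlib.crc32`, which implements the CRC-32 polynomial used for IEEE 802 frame check sequences, and it does not model bit or byte ordering on the wire or how the two 16-bit FCS words are assembled. `FcsChecker` is a hypothetical name.

```python
import zlib


class FcsChecker:
    """Running CRC-32 over the frame bytes seen so far."""

    def __init__(self):
        self.crc = 0

    def update(self, data: bytes) -> None:
        # zlib.crc32 accepts the previous result as its starting value,
        # so the frame never has to be buffered in full.
        self.crc = zlib.crc32(data, self.crc)

    def matches(self, received_fcs: int) -> bool:
        """Compare the computed CRC-32 with the 32-bit FCS taken from the frame."""
        return self.crc == (received_fcs & 0xFFFFFFFF)
```

Feeding the frame in pieces gives the same result as computing the CRC over the whole frame at once, which is what lets the hardware decoder compute it as words arrive.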


802.11 WLAN Frame Word Counter


Poll for new 16-bit word in receiver stream.

Test if the new word is the first word of a new MAC frame. Our sequencing choice depends on first word or not.
We'll always assume Frame Control Header if we have a new frame.
Enable decoding of target block select, based on current state of Frame Sequencer.

802.11 WLAN Transmit Frame Selector


ASM sub-flow has cascading tests as a nested If-Then-Else structure. Priority sequence is defined by the order of nesting.
NOTE: there is an error in the design that is easy to see. The Transmitter should send a reply to either RTS or DATA frame events reported by the Receiver before sending an MSDU passed down from the LLC layer of the network protocol stack.

This logic block encapsulates output decoding logic, which is good to put in ASM sub-flows.
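
A minimal sketch of the corrected priority, written as the same kind of nested If-Then-Else. The function and flag names are hypothetical, the CTS and ACK labels are the usual 802.11 replies to RTS and DATA rather than names taken from the slide, and placing RTS ahead of DATA is an assumption.

```python
def select_transmit_frame(rts_received, data_received, msdu_pending):
    """Pick the next frame to transmit; earlier tests have higher priority."""
    if rts_received:
        return "CTS"     # reply to an RTS reported by the Receiver
    elif data_received:
        return "ACK"     # reply to a DATA frame reported by the Receiver
    elif msdu_pending:
        return "MSDU"    # only now send data passed down from the LLC layer
    return None          # nothing to send
```

Swapping the order of the branches changes the priority, which is exactly how the design error described above would show up in the sub-flow.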

Section 3 - Summary
Executable algorithmic state machines:
Allow both control and data operations to be specified in time (cycle-by-cycle scheduling) and space (allocation of specific resource types using abstract macro-functions).
The notation we define is directly executable in the Nimbus tool set; hence, executable ASM models.
Basic data operations and memory operations are supported directly in the ASM notation using datapath macro-functions and memory arrays.

Using ASM diagrams:


A thinking aid for defining the structure and sequencing behavior of Finite State Machines. Used in 3 different ways: (1) definition/specification of sequential systems, (2) analysis of sequential circuits, (3) design of combinational and sequential circuits behaviorally.

Algorithms and Protocols


Can directly support the mapping of an algorithm into one or more candidate architectures.
Can directly support exploration of protocol implementations distributed across many concurrent state machine threads.
