
CAD for IC design

VLSI Design Problem:

Optimization of the design in several respects:
* Area minimization
* Speed
* Power dissipation
* Design time
* Testability

Design Methodologies
The approach followed to solve the VLSI design
problem.

To consider all parameters at once during the VLSI design process we use a

Cost function: a measure of the VLSI design cost in terms of the different
parameters.
Designing a VLSI circuit in one go while simultaneously optimizing the
cost function is not practical: the complexity is simply too high.
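One common way to combine several design parameters into a single scalar cost is a weighted sum. The sketch below is illustrative only; the parameter names, normalization, and weights are my assumptions, not values from the text:

```python
# Hypothetical weighted-sum cost function for one VLSI design point.
# area, delay and power are assumed to be normalized to [0, 1];
# the weights express the relative importance of each parameter.
def design_cost(area, delay, power, weights=(0.5, 0.3, 0.2)):
    """Combine normalized design parameters into one scalar cost."""
    wa, wd, wp = weights
    return wa * area + wd * delay + wp * power

# The second design has less area but more delay; under these weights
# the first one has the lower combined cost:
print(design_cost(0.4, 0.9, 0.5) < design_cost(0.8, 0.3, 0.5))   # True
```

Changing the weights changes which design "wins", which is exactly why a single agreed-upon cost function is hard to define.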
Two main concepts are helpful to deal with this complexity:
1. Hierarchy and
2. Abstraction
Hierarchy: shows the structure of a design at different levels.
Abstraction: hides the lower-level details.

Abstraction example: [figure not reproduced]

The Design Domains


The behavioral domain: part of the design (or the whole) is seen
as a set of black boxes.
Black box: the relations between outputs and inputs are given without
reference to the implementation of those relations.
The structural domain:
The circuit is seen as the composition of subcircuits.
Each of the subcircuits has a description in the behavioral
domain or a description in the structural domain itself.
The physical (or layout) domain:
The physical domain gives information on how the subparts that
can be seen in the structural domain are located on the two-dimensional plane.

Design Methods and Technologies


full-custom design:
maximal freedom;
the ability to determine the shape of every mask layer for the
production of the chip.
semicustom design:
smaller search space;
limits the freedom of the designer;
shorter design time;
semicustom design implies the use of gate arrays, standard
cells, and parameterizable modules.
Designing an integrated circuit is a sequence of many actions,
most of which can be done by computer tools.

Gate arrays
chips that have all their transistors preplaced in regular patterns.
The designer specifies the wiring patterns.
Gate arrays as described above are mask programmable.
There also exist so-called field-programmable gate arrays (FPGAs):
interconnections can be configured by applying electrical signals
to some inputs.
Standard Cells
simple logic gates, flip-flops, etc.,
predesigned and made available to the designer in a
library.
Characterization of the cells, i.e. the determination of their timing
behavior, is done once by the library developer.
Module Generators
Generators exist for designs that have a regular structure, such
as adders, multipliers, and memories.
Due to the regularity of the structure, the module can be described
by one or two parameters.

VLSI design automation tools:


Can be categorized in:
1. Algorithmic and system design
2. Structural and logic design
3. Transistor-level design
4. Layout design
5. Verification
6. Design management

Algorithmic and System Design:


mainly concerned with the initial algorithm to be implemented in
hardware; works with a purely behavioral description.
Hardware description languages (HDLs) are used for this purpose.
A second application of a formal description is the possibility of
automatic synthesis:
a synthesizer reads the description and generates an equivalent
description of the design at a much lower level.
High-level synthesis: the synthesis from the algorithmic
behavioral level to structural descriptions is called high-level
synthesis.
Silicon compiler: a silicon compiler is a software system that takes
a user's specifications and automatically generates an integrated
circuit (IC). (initial synthesizer)
A formal specification does not always need to be in textual form:
tools are available that can convert graphical
information into a textual equivalent (expressed in a language
like VHDL) that can be accepted as input by a synthesis tool.

Hardware-software co-design
The design for a complex system will consist of several chips, some of
which are programmable.
Part of the specification is realized in hardware and part in
software (hardware-software co-design).
Partitioning of the initial specification is required (difficult to
automate).
Tools exist that support the designer
by providing information on the frequency at which each part of
the specification is executed;
the parts with the highest frequencies are the most likely to be
realized in hardware.
The result of co-design is a pair of descriptions:
one of the hardware (e.g. in VHDL) that will contain programmable
parts, and
the other of the software (e.g. in C).

Code generation :
Mapping the high-level descriptions of the software to the low-level
instructions of the programmable hardware : CAD problem.
Hardware-software co-simulation:
Verification of the correctness of the result of co-design using
simulation.

Structural and Logic Design


Sometimes the tools might not be able to cope with the desired
behaviour: inefficient synthesis
Designer provides lower level description :Structural and Logic
Design
designer can use a schematic editor program: CAD tool
It allows the interactive specification of the blocks composing
a circuit and their interconnections by means of a graphical
interface.
schematics constructed in this way are hierarchical
Role of simulation: Once the circuit schematics have been
captured by an editor, it is a common practice to verify the circuit by
means of simulation
fault simulation: checks whether a set of test vectors or test
patterns (input signals used for testing) will be able to detect faults
caused by imperfections of the fabrication process
automatic test-pattern generation (ATPG):
letting the computer search for the best set of test vectors by using a
tool.

Logic synthesis:
Generation and optimization of a circuit at the level of logic gates.
three different types of problems:
1. Synthesis of two-level combinational logic:
Any Boolean function can be written as a sum of products or a product
of sums, which can be directly implemented as programmable logic
arrays (PLAs).
It is, therefore, important to minimize two-level expressions.
2. Synthesis of multilevel combinational logic:
Some parts of integrated circuits consist of so-called random logic
(circuitry that does not have a regular structure).
Random logic is often built of standard cells, which means that the
implementation does not restrict the depth of the logic.
3. Synthesis of sequential logic:
Sequential logic has a state, which is normally stored in memory
elements.
The problem here is to find the logic necessary to minimize the state
transitions.
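As a minimal illustration of two-level minimization, consider a three-minterm function (this concrete example is mine, not from the text): f = a·b·c + a·b·c' + a'·b·c reduces to the two-term sum of products a·b + b·c. An exhaustive truth-table check confirms that the minimized expression is equivalent:

```python
from itertools import product

def f_original(a, b, c):
    # f as a sum of three minterms: a·b·c + a·b·c' + a'·b·c
    return (a and b and c) or (a and b and not c) or ((not a) and b and c)

def f_minimized(a, b, c):
    # Two-level minimization merges the minterms into f = a·b + b·c
    return (a and b) or (b and c)

# Exhaustive truth-table check over all 2^3 input combinations
assert all(f_original(*v) == f_minimized(*v)
           for v in product([False, True], repeat=3))
print("equivalent")
```

A real two-level minimizer (e.g. based on the Quine-McCluskey method) finds such merges automatically; the point here is only that the minimized form computes the same function with fewer, smaller product terms.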

Timing constraints:
the designer should be informed about the maximum-delay paths;
the shorter these delays, the faster the operation of the circuit.
One possibility of finding out about these delays is by means of
simulation
Or by timing analysis tool: compute delays through the circuit
without performing any simulation

Transistor-level Design
Logic gates are composed of transistors
Depending on the accuracy required, transistors can be simulated at
different levels
At the switch level , transistors are modeled as ideal
bidirectional switches and the signals are essentially digital
At the timing level , analog signals are considered, but the
transistors have simple models (e.g. piecewise linear functions)
At the circuit level, more accurate models of the transistors are
used, which often involve nonlinear differential equations for the
currents and voltages.
The more accurate the model, the more computer time is necessary for
simulation.

Process (full-custom Transistor-level design):


1. It is customary to extract the circuit from the layout data of the
transistors.
2. Construct the network of transistors, resistors and
capacitances.
3. The extracted circuit can then be simulated at the circuit or
switch level.

Layout Design
Design actions related to layout are very diverse therefore,
different layout tools.
If one has the layout of the subblocks of a design available, together
with the list of interconnections then
1. First, a position in the plane is assigned to each subblock,
trying to minimize the area to be occupied by
interconnections (placement problem).
2. The next step is to generate the wiring patterns that realize
the correct interconnections (routing problem).
Goal (1) of placement and routing is to achieve minimal chip
area.
Timing constraints (2): as the length of a wire affects the
propagation time of a signal along the wire, it is important to keep
specific wires short in order to guarantee the overall execution speed
of the circuit (timing-driven layout).

Partitioning problem: grouping of the sub-blocks


Subblocks that are tightly connected are put in the same group
while the number of connections from one group to the other is
kept low
Partitioning helps to solve the placement problem
Floorplanning:
The simultaneous development of structure and layout is
called floorplanning.
when making the transition from a behavioral description to a
structure, one also fixes the relative positions of the subblocks.
It gives early feedback on e.g. long wires in the layout
Detailed layout information is available in placement whereas
floorplanning has mainly to deal with estimations. (difference)

Microcells: cells with a complexity of around 10 transistors


A cell compiler generates the layout for a network of transistors.
Module: A block, layout of which can be composed by an
arrangement of cells.
Module generation: Given some parameters (e.g. the number of
bits for an adder or the word length and number of words in a
memory),
the module generator puts the right cells at the right place and
composes a module in this way.
Layout editor (In full-custom design): provides the possibility to
modify the layout at the level of mask patterns.
In a correct design, the mask patterns should obey some rules
called design rules.
Tools that analyze a layout to detect violations of these rules are
called design-rule checkers.

Circuit extractor: takes the mask patterns as its input and


constructs a circuit of transistors, resistors and capacitances that can
be simulated
A disadvantage of full-custom design is that the layout has to be
redesigned when the technology changes; symbolic layout has been
proposed as a solution.
A symbolic representation gives the positions of the patterns
relative to each other.
Compactor: takes the symbolic description, assigns widths to all
patterns and spaces the patterns such that all design rules are
satisfied.

Verification Methods: There are three ways of checking the


correctness of an integrated circuit without actually fabricating it
1. Prototyping
2. Simulation
3. Formal verification

1. Prototyping: building the system to be designed from discrete
components rather than one or a few integrated circuits, e.g.
breadboarding, or prototyping using programmable devices such as
FPGAs (rapid system prototyping).
A prerequisite for rapid system prototyping is the
availability of a compiler that can "rapidly" map some
algorithm on the programmable prototype.

2. Simulation: building a computer model of all relevant aspects of
the circuit, executing the model for a set of input signals, and
observing the output signals.
it is impossible to have an exhaustive test of a circuit of reasonable
size, as the set of all possible input signals and internal states grows
too large.
3. Formal verification: use of mathematical methods to prove that a
circuit is correct.

Design Management Tools:


CAD tools consume and produce design data
quantity of data for a VLSI chip can be enormous and
appropriate data management techniques have to be used to store
and retrieve them efficiently.
another aspect of design management is to maintain a consistent
description of the design while multiple designers work on different
parts of the design.
A famous standard format for storing design data is EDIF (Electronic Design
Interchange Format).
A framework is a universal interface used by tools to extract EDIF
data from a database.

3. Algorithmic Graph Theory and Computational Complexity:


Algorithmic graph theory: design of algorithms that operate on
graphs
Graph: A graph is a mathematical structure that describes a set of
objects and the connections between them.
Graphs are used in the field of design automation for integrated
circuits
1. when dealing with entities that naturally look like a network (e.g. a
circuit of transistors)
2. in more abstract cases (e.g. precedence relations in the
computations of some algorithm).
Computational complexity: time and memory required by a certain
algorithm as function of the size of the algorithm's input.


Terminology:
A graph G(V, E) is characterized by two sets:
1. a vertex set V (node) and
2. an edge set E (branch)
The two vertices that are joined by an edge are called the edge's
endpoints; the notation (u, v) is used.
Vertices u and v such that (u, v) ∈ E are called adjacent vertices.

Subgraph: When one removes vertices and edges from a given


graph G, one gets a subgraph of G.
Rule: removing a vertex implies the removal of all edges connected
to it.
Complete graph: a complete graph is a simple
undirected graph in which every pair of distinct vertices is
connected by a unique edge.
Digraph: a directed graph (or digraph) is a graph, or set of
nodes connected by edges, where the edges have a direction
associated with them.
Complete digraph: a complete digraph is a directed graph in
which every pair of distinct vertices is connected by a pair of unique
edges (one in each direction).

Clique: a clique in an undirected graph is a subset of its vertices such


that every two vertices in the subset are connected by an edge.
A subgraph that is complete
[Figure: an example graph with three cliques identified by the vertex sets
{v1, v2, v3}, {v3, v4} and {v5, v6}]

Maximal clique: a maximal clique is a clique that cannot be extended
by including one more adjacent vertex,
i.e. it is not a subset of a larger clique.
degree of a vertex: The degree of a vertex is equal to the number of
edges incident with it

Selfloop: An edge (u, u), i.e. one starting and finishing at the same
vertex, is called a selfloop.

Parallel edges: Two edges of the form e1= (v1 , v2) and e2 = (v1 ,
v2), i.e. having the same endpoints, are called parallel edges.
Simple graph: A graph without selfloops or parallel edges is called a
simple graph

A graph without selfloops but with parallel edges is called a


multigraph.

bigraph : a bipartite graph (or bigraph) is a graph whose vertices can


be divided into two disjoint sets U and V (that is, U and V are each
independent sets) such that every edge connects a vertex in U to one in
V.

Planar-graph: A graph that can be drawn on a two-dimensional


plane without any of its edges intersecting is called planar.
Path: a sequence of alternating vertices and edges, starting and
finishing with a vertex, such that an edge e = (u, v) is preceded by u
and followed by v in the sequence (or vice versa), is called a path.
The length of a path equals the number of edges it contains.
A path, of which the first and last vertices are the same and the
length is larger than zero, is called a cycle (sometimes also: loop or
circuit).
A path or a cycle not containing two or more occurrences of the
same vertex is a simple path or cycle.

Connected graph: If all pairs of vertices in a graph are connected, the


graph is called a connected graph
Connected component: a connected component is a subgraph
induced by a maximal subset of the vertex set, such that all pairs in the
set are connected
In-degree: the in-degree of a vertex is equal to the number of edges
incident to it.
Out-degree: the out-degree of a vertex is equal to the number of edges
incident from it.
Strongly connected vertices: two vertices u and v in a directed
graph are called strongly connected if there is both a directed path
from u to v and a directed path from v to u.
Strongly connected graph: in the mathematical theory of
directed graphs, a graph is said to be strongly connected if every
vertex is reachable from every other vertex; the maximal strongly
connected subgraphs are called strongly connected components.

A weighted graph is a graph in which each branch is given a


numerical weight.
a special type of labeled graph in which the labels are numbers
(which are usually taken to be positive).

To implement graph algorithms suitable data structures are required


Different algorithms require different data-structures.
Adjacency matrix: if the graph G(V, E) has n vertices, an n×n matrix
A is used.
The adjacency matrices of undirected graphs are symmetric.
Finding all vertices connected to a given vertex requires the inspection
of a complete row and a complete column of the matrix (not very
efficient).
adjacency list representation: It consists of an array that has as
many elements as the number of vertices in the graph.
Array element identified by an index i corresponds with the vertex
u.
Each array element points to a linked list that contains the
indices of all vertices to which the vertex corresponding to the
element is connected.
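The two representations can be sketched in a few lines of Python; the concrete three-vertex example graph below is mine:

```python
# Two common graph data structures for G(V, E) with V = {0, ..., n-1}.

def adjacency_matrix(n, edges):
    """n x n matrix: A[u][v] = 1 iff (u, v) is an edge (undirected)."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1          # undirected graph: the matrix is symmetric
    return A

def adjacency_list(n, edges):
    """Array of n lists: adj[u] holds the vertices adjacent to u."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

edges = [(0, 1), (1, 2), (0, 2)]       # a triangle
A = adjacency_matrix(3, edges)
adj = adjacency_list(3, edges)
print(A[0][1], adj[0])                 # 1 [1, 2]
```

With the adjacency list, enumerating the neighbours of a vertex takes time proportional to its degree rather than to n, which is why most graph algorithms in these notes assume it.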

The complexity (behaviour) of an algorithm is characterized by
mathematical functions of the algorithm's "input size".
The input size or problem size of an algorithm is related to the number
of symbols necessary to describe the input.
Ex: a sorting algorithm has to sort a list of n words, each consisting of at
most 10 letters
input size =10n
using the ASCII code that uses 8 bits for each letter
input size =80n
Two types of computational complexity are distinguished:
1. time complexity, which is a measure for the time necessary to
accomplish a computation,
2. space complexity which is a measure for the amount of memory
required for a computation.
Space complexity is given less importance than time complexity

Order of a function: Big O notation characterizes functions


according to their growth rates
growth rate of a function is also referred to as order of the
function.
A description of a function in terms of big O notation usually only
provides an upper bound on the growth rate of the function.

Worst-case time complexity:


The duration of a computation is expressed in elementary
computational steps
It is not only the size of the input that determines the number of
computational steps:
conditional constructs in the algorithm are the reason that the
time required by the algorithm is different for different inputs of
the same size.
one works with the worst-case time complexity, assuming that, for each
input size, the input requiring the largest number of computational
steps occurs.

Typical algorithm time complexities: [table not reproduced]

Examples of Graph Algorithms:


Depth-first Search:
To traverse the graph, i.e. to visit all vertices exactly once.
A new member mark in the vertex structure is
initialized with the value 1 and given
the value 0 when the vertex is visited.
As each vertex is visited exactly once, all edges are also visited exactly
once.
Assuming that the generic vertex and edge actions have a constant
time complexity, this leads to a time complexity of O(|V| + |E|).
Depth-first search can be used to find all vertices connected to a
specific vertex u.
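A recursive sketch of depth-first search on an adjacency list; the mark field from the text is modeled here with a visited set, which is an implementation choice of mine:

```python
def depth_first_search(adj, u, visited=None):
    """Visit every vertex reachable from u exactly once.

    adj: adjacency list (adj[u] lists the neighbours of u).
    Returns the set of visited vertices.
    """
    if visited is None:
        visited = set()
    visited.add(u)                     # "mark" the vertex as visited
    for v in adj[u]:                   # each edge is inspected once
        if v not in visited:
            depth_first_search(adj, v, visited)
    return visited

# A triangle on {0, 1, 2} plus the isolated vertex 3:
adj = [[1, 2], [0, 2], [0, 1], []]
print(sorted(depth_first_search(adj, 0)))   # [0, 1, 2]
```

Every vertex and every edge is handled a constant number of times, giving the O(|V| + |E|) bound stated above; the isolated vertex 3 is (correctly) not reached from 0.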

Breadth-first Search:
For directed graphs represented by an adjacency list,
the central element is the FIFO queue:
the call shift_in(q, o) adds an object o to the queue q, and shift_out(q)
removes the oldest object from the queue q.
Adding and removing objects from a FIFO queue can be done in
constant time.

[Figure: status of the FIFO queue during breadth-first search — not reproduced]

Like depth-first search, breadth-first search can be used to find all
vertices connected to a specific vertex.
Vertices are visited in the order of their shortest path from vs.
The shortest-path problem becomes more complex if the length of
the path between two vertices is not simply the number of edges in
the path (weighted graphs).
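A sketch of breadth-first search using Python's deque as the FIFO queue (append plays the role of shift_in, popleft of shift_out); it returns the shortest path length, in edges, from the source to every reachable vertex:

```python
from collections import deque

def breadth_first_search(adj, source):
    """Shortest-path lengths (number of edges) from source.

    adj: adjacency list; returns a dict vertex -> distance.
    """
    dist = {source: 0}
    queue = deque([source])            # the FIFO queue
    while queue:
        u = queue.popleft()            # shift_out: oldest element first
        for v in adj[u]:
            if v not in dist:          # first visit is via a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)        # shift_in
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(breadth_first_search(adj, 0))    # {0: 0, 1: 1, 2: 1, 3: 2}
```

Because the queue is first-in first-out, vertices are dequeued in order of increasing distance, which is exactly why BFS finds shortest paths in unweighted graphs but not in weighted ones.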

Dijkstra's Shortest-path Algorithm
A weighted directed graph G(V, E) is given, with
edge weights w(e), w(e) > 0.
Visited vertices of the set V are transferred one by one to a set T.
The ordering of the vertices is done using the vertex attribute distance.
Initially, the distance attribute of a vertex v is equal to the
edge weight w((vs, v)).

vt = v3 is reached after 5 iterations;
continuing for one more iteration computes the lengths of the
shortest paths for all vertices in the graph.
Time complexity of one iteration of the while loop = O(n), where n = |V|;
the loop iterates n times, so the overall time complexity = O(n^2).
As all edges are visited exactly once, viz. after the vertex from which
they are incident is added to the set T, this gives a contribution of O(|E|)
to the overall time complexity.
Worst-case time complexity = O(n^2 + |E|).
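A sketch of the O(n²) variant described above, with the set T of finalized vertices and a distance attribute per vertex (the dict-of-edge-weights input format is my choice):

```python
INF = float('inf')

def dijkstra(w, vs):
    """Shortest-path lengths from vs in a weighted directed graph.

    w: dict mapping (u, v) -> positive edge weight; vs: source vertex.
    Returns a dict vertex -> distance (inf if unreachable).
    """
    vertices = {u for e in w for u in e} | {vs}
    dist = {v: INF for v in vertices}
    dist[vs] = 0
    T = set()                          # vertices whose distance is final
    while len(T) < len(vertices):
        # transfer the unfinalized vertex with the smallest distance to T
        u = min((v for v in vertices if v not in T), key=dist.get)
        T.add(u)
        for v in vertices:             # relax all edges incident from u
            if (u, v) in w and dist[u] + w[(u, v)] < dist[v]:
                dist[v] = dist[u] + w[(u, v)]
    return dist

w = {('s', 'a'): 1, ('a', 'b'): 2, ('s', 'b'): 5}
print(dijkstra(w, 's')['b'])   # 3, via s -> a -> b
```

The linear scan inside min() is what makes each iteration O(n); replacing it with a priority queue (heap) gives the asymptotically faster variant used for sparse graphs.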

Prim's Algorithm for Minimum Spanning Trees


In the mathematical field of graph theory, a spanning tree T of an
undirected graph G is a subgraph that includes all of the vertices of G
and that is a tree.
One gets a spanning tree by removing edges from E until all cycles in
the graph have disappeared while all vertices remain connected.
A graph has several spanning trees, all of which have the same number
of edges (the number of vertices minus one).
In the case of edge-weighted undirected graphs, the spanning tree to be
found is the one with the least total edge weight, also called the tree
length (the minimum spanning tree problem).
Prim's algorithm starts with an arbitrary vertex, which is considered the
initial tree.

Tractable and Intractable Problems:

A problem that can be solved in polynomial time is called tractable.
A problem that cannot be solved in polynomial time is called
intractable.

Combinatorial Optimization Problems


Problem : problem refers to a general class, e.g. the "shortest-path
problem
Instance: The term instance refers to a specific case of a problem,
e.g. "the shortest-path problem between vertex vs and vertex vt.
Instances of optimization problems can be characterized by a finite set
of variables.
If the variables range over real numbers, the problem is called a
continuous optimization problem.
If the variables are discrete, i.e. they only can assume a finite number
of distinct values, the problem is called a combinatorial optimization
problem.
An example of a simple combinatorial optimization problem is the
satisfiability problem: assign Boolean values to the variables in such a
way that the whole expression becomes true.
Each xi can only assume two values, making the problem a combinatorial
optimization problem.
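Because each variable can only take two values, the whole search space has 2^n points and can, for tiny n, be enumerated exhaustively. A brute-force satisfiability check (the example clause expression is mine):

```python
from itertools import product

def satisfiable(expr, n):
    """expr: function of n Booleans; try all 2^n assignments exhaustively."""
    return any(expr(*bits) for bits in product([False, True], repeat=n))

# Example expression: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clause = lambda x1, x2, x3: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(satisfiable(clause, 3))   # True, e.g. x1=1, x2=0, x3=1
```

The exponential growth of the 2^n assignments is precisely why exhaustive search stops being viable beyond very small instances.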

Another example:
The shortest-path problem (cf. Dijkstra's algorithm):
A graph with given source and target vertices vs and vt defines an
instance of the problem.
One could associate a Boolean variable bi with each edge: bi = 1 means
that the edge is "selected",
and bi = 0 means that it is not.
Solving the shortest-path problem for this graph can then be seen as
assigning Boolean values to the variables bi, making the problem
combinatorial.
a combinatorial optimization problem is defined as the set of all
the instances of the problem,
each instance I being defined as a pair (F, c).
F is called the set of feasible solutions (or the search space),
c is a function assigning a cost to each element of F.
Solving a particular instance of a problem consists of finding a
feasible solution
f with minimal cost

The traveling salesman problem (TSP):


Given a list of cities and the distances between each pair of cities,
what is the shortest possible route that visits each city exactly once
and returns to the origin city?
TSP can be modelled as an undirected weighted graph, such that
cities are the graph's vertices.
paths are the graph's edges, and a path's distance is the edge's
length.
It is a minimization problem starting and finishing at a specified
vertex after having visited each other vertex exactly once.
Often, the model is a complete graph

any permutation of the cities defines a feasible solution and the


cost of the feasible solution is the length of the cycle represented
by the solution

[Figure: a nonoptimal and an optimal TSP solution for cities c1, c2, ..., c9]

If the coordinates of a city ci are given by (xi, yi), the Euclidean
distance between two cities ci and cj is simply given by
d(ci, cj) = sqrt((xi - xj)^2 + (yi - yj)^2).
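Since any permutation of the cities is a feasible solution, the smallest instances can be solved by exhaustive search over all permutations; this sketch (coordinates and helper names are mine) computes the optimal tour length for cities given by (x, y) coordinates:

```python
from itertools import permutations
from math import hypot, inf

def tsp_bruteforce(coords):
    """Length of the shortest tour visiting every city exactly once.

    coords: list of (x, y) city coordinates.
    Exhaustive search: O((n-1)!) permutations, feasible only for tiny n.
    """
    n = len(coords)

    def d(i, j):
        return hypot(coords[i][0] - coords[j][0],
                     coords[i][1] - coords[j][1])

    best = inf
    for perm in permutations(range(1, n)):       # fix city 0 as the start
        tour = (0,) + perm + (0,)                # close the cycle
        best = min(best, sum(d(tour[k], tour[k + 1]) for k in range(n)))
    return best

# Four cities on the corners of a unit square: the optimal tour is the perimeter.
print(tsp_bruteforce([(0, 0), (0, 1), (1, 1), (1, 0)]))   # 4.0
```

Fixing the start city removes the n equivalent rotations of each tour; even so, the factorial growth makes this approach collapse long before n reaches practically interesting sizes.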

Decision Problems (part of optimization problem):


the optimization version of the shortest-path problem in graphs
requires that the edges forming the shortest path are identified,
whereas
the evaluation version merely asks for the length of the shortest
path.
decision problems:
Decision version: these are problems that only have two possible
answers: "yes" or "no".
If the optimization version can be solved in polynomial time, then the
decision version can also be solved in polynomial time.
The converse does not obviously hold: if there is an algorithm that is able
to decide in polynomial time whether there is a solution with cost less
than or equal to k, it is not always obvious how to get the solution itself
in polynomial time.
Therefore, the computational complexity of the decision version of a
problem gives a lower bound for the computational complexity of the
optimization version.


Review:
The decision version of a combinatorial problem can be defined
as the set of its instances (F, c, k).
Note that each instance is now characterized by an extra parameter k;
k is the parameter in the question "Is there a solution with cost less
than or equal to k?".
An interesting subset of instances is formed by those instances for
which the answer to the question is "yes";
this set is called the set of "yes" instances.

task associated with a decision problem is solution checking.


It is the problem of verifying, for a given solution f, whether c(f) ≤ k.

Complexity Classes:
it is useful to group problems with the same degree in one complexity
class.
The class of decision problems for which an algorithm is known that
operates in polynomial time is called P (which is an abbreviation of
"polynomial").
Deterministic and nondeterministic computer.
For a common (deterministic) computer it always is clear how a
computation continues at a certain point in the computation. This is also
reflected in the programming languages used for them.
A nondeterministic computer allows for the specification of multiple
computations at a certain point in a program: the computer will make a
nondeterministic choice of which of them is to be performed.
This is not just a random choice, but a choice that will lead to the desired
answer.
The machine splits itself into as many copies as there are choices,
evaluates all choices in parallel, and then merges back to one machine.

Complexity class NP: The complexity class NP (an abbreviation of


"nondeterministic polynomial") consists of those problems that can be
solved in polynomial time on a nondeterministic computer.
Any decision problem for which solution checking can be done in
polynomial time is in NP.
class P is a subset of the class NP
Halting problem (undecidable class): problem is to find an algorithm
that accepts a computer program as its input and has to decide whether
or not this program will stop after a finite computation time.

NP-completeness:
All decision problems contained in this class are polynomially reducible to
each other:
an instance of any NP-complete problem can be expressed as an
instance of any other NP-complete problem using transformations that
have a polynomial time complexity.
Ex:
HAMILTONIAN CYCLE problem: does a given undirected graph
G(V, E) contain a so-called Hamiltonian cycle, i.e. a simple cycle that
goes through all vertices of V?
TRAVELING SALESMAN, the decision version of TSP amounts to
answering the question of whether there is a tour (simple cycle) through
all vertices, the length of which is less than or equal to k.

Nondeterministic computer: Turing machine (mathematical model)


a computer with a sequentially accessible memory (a "tape") and a very
simple instruction set.
The set only includes instructions for writing a symbol (from a finite set)
to the memory location pointed at by the memory pointer and move the
pointer one position up or down.
A finite number of "internal states" should also be provided for a specific
Turing machine.
A "program" then consists of a set of conditional statements that map a
combination of a symbol at the current memory location and an internal
state to a new symbol to be written, a new internal state and a pointer
movement.
The input to the algorithm to be executed on a Turing machine is the
initial state of the memory.
The machine stops when it enters one of the special internal states
labeled by "yes" and "no" (corresponding to the answers to a decision
problem).

General-purpose Methods for Combinatorial Optimization:


algorithm designer has three possibilities when confronted with an
intractable problem.
1. Try to solve the problem exactly if the problem size is sufficiently
small, using an algorithm that has an exponential (or even a higher
order) time complexity in the worst case.
a. The simplest way to look for an exact solution is exhaustive
search: it simply visits all points in the search space in some
order and retains the best solution visited.
b. Other methods only visit part of the search space, although the
number of points visited may still grow exponentially (or worse)
with the problem size.
2. Approximation algorithms
3. Heuristic algorithms

The Unit-size Placement Problem:

The circuit consists of cells; the interconnections to be made between
them are specified by nets.
A net can be seen as a set of cells that share the same electrical signal.
Placement is to assign a location to each cell such that the total chip
area occupied is minimized.
As the number of cells is not modified by placement, minimizing the
area amounts to avoiding empty space and keeping the wires that
will realize the interconnections as short as possible.
The cells in the circuit are supposed to have a layout with dimensions
1x1 (measured in some abstract length unit);
it can be assumed that the only positions at which a cell can be put on
the chip are the grid points of a grid created by horizontal and vertical
lines with unit-length separation.

A nice property of unit-size placement is that the assignment of distinct


coordinate pairs to each cell guarantees that the layouts of the cells will
not overlap.

One possible way to evaluate the quality of a solution for unit-size
placement is to route all nets and measure the extra area necessary
for wiring.

A bad placement will have longer connections, which normally will lead to
more routing tracks between the cells and therefore to a larger chip area.

Solving the routing problem is an expensive way to evaluate the


quality of a placement.
This is especially the case if many tentative placements have to be
evaluated in an algorithm that tries to find a good one.
An alternative used in most placement algorithms is to only estimate
the wiring area.
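One standard wiring-area estimate (my choice of estimator; the text only says the area is estimated) is the half-perimeter wirelength: for each net, take half the perimeter of the smallest bounding box enclosing all of the net's cells:

```python
def half_perimeter_wirelength(placement, nets):
    """Bounding-box estimate of the wiring needed for a placement.

    placement: dict cell -> (x, y) grid point; nets: list of cell sets.
    For each net, add the width plus the height of the smallest
    rectangle enclosing all of the net's cells.
    """
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

placement = {'a': (0, 0), 'b': (1, 0), 'c': (3, 2)}
nets = [{'a', 'b'}, {'b', 'c'}]
print(half_perimeter_wirelength(placement, nets))   # 1 + 4 = 5
```

The estimate is exact for two-pin nets routed with a single L-shaped wire and cheap to recompute after moving a cell, which is why placement algorithms evaluate it instead of actually routing every tentative placement.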

Backtracking and Branch-and-bound:


An instance I of a combinatorial optimization problem was defined by a
pair (F, c), with
F the "set of feasible solutions" (also called the "search space" or
"solution space") and
c a cost function assigning a real number to each element in F.
Suppose that each feasible solution can be characterized by an
n-dimensional vector f = [f1, f2, ..., fn] and that
each fi (1 ≤ i ≤ n) can assume a finite number of values; these are the
explicit constraints.
The values assigned to the different components of f may sometimes not
be chosen independently; in such a case one speaks of implicit
constraints.
Consider a combinatorial optimization problem related to some graph
G(V, E) in which a path with certain properties is looked for.
One can then associate a variable fi with each edge, whose value is
either 1 to indicate that the corresponding edge is part of the path or 0
to indicate the opposite.
The explicit constraints then state that fi ∈ {0, 1} for all i.

Backtracking:
The principle of using backtracking for an exhaustive search of the
solution space is to
start with an initial partial solution in which as many variables as
possible are left unspecified, and
then to systematically assign values to the unspecified variables
until either a single point in the search space is identified or an implicit
constraint makes it impossible to process more unspecified variables.
The cost of the feasible solution found can be computed once all
variables have been assigned a value.
The algorithm continues by going back to a partial solution generated
earlier and then assigning a next value to an unspecified variable
(hence the name "backtracking")

It is assumed that all variables fi have type solution-element.


The partial solutions are generated in such a way that the variables fi
are specified for 1 ≤ i ≤ k and are unspecified for i > k.
Partial solutions having this structure will be denoted by f~(k);
f~(n) corresponds to a fully-specified solution (a member of the set of
feasible solutions).
The global array val corresponds to the vector f~(k). The value of fk is
stored in val[k - 1]. So, the values of array elements with index
greater than or equal to k are meaningless and should not be
inspected.
The procedure cost(val) is supposed to compute the cost of a feasible
solution using the cost function c. It is only called when k = n.

procedure allowed(val, k) returns the
set of values allowed by the explicit
and implicit constraints for the
variable f(k+1), given f~(k)
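The backtracking skeleton described above can be sketched in Python; `allowed` and `cost` stand for the problem-specific procedures mentioned in the text, and the names and calling conventions are illustrative:

```python
def backtrack(n, allowed, cost):
    """Exhaustive search over n variables via backtracking."""
    best = {"val": None, "cost": float("inf")}
    val = [None] * n               # val[0..k-1] holds the specified variables

    def extend(k):
        if k == n:                 # fully specified feasible solution
            c = cost(val)
            if c < best["cost"]:
                best["cost"], best["val"] = c, val.copy()
            return
        for v in allowed(val, k):  # values permitted for f_(k+1)
            val[k] = v
            extend(k + 1)
            val[k] = None          # backtrack: unspecify the variable

    extend(0)
    return best["val"], best["cost"]
```

For example, with two binary variables an implicit constraint f1 ≠ f2 is expressed by letting `allowed` inspect the values already assigned in `val`.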

Branch-and-bound:
Information about a certain partial solution f~(k), 1 ≤ k ≤ n, can
indicate that any fully specified solution f~(n) derived from it can
never be the optimal solution.
A function that computes a lower bound on the cost of all solutions
derivable from f~(k) is used for this purpose.
If inspection of this lower bound guarantees that all of the solutions
derived from f~(k) have a higher cost than some solution already found
earlier during the backtracking, none of the children of f~(k) need any
further investigation.
One says that the node in the tree corresponding to f~(k) can be killed.
Backtracking combined with killing partial solutions in this way is
called branch-and-bound.


The procedure lower_bound_cost is called to get a lower bound on the
cost of a partial solution.
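A branch-and-bound sketch is obtained by adding one pruning test to the backtracking skeleton; `lower_bound_cost(val, k)` is assumed to return a lower bound on the cost of every solution derivable from the current partial solution (all names here are illustrative):

```python
def branch_and_bound(n, allowed, cost, lower_bound_cost):
    """Backtracking plus pruning by a lower-bound estimate."""
    best = {"val": None, "cost": float("inf")}
    val = [None] * n

    def extend(k):
        if k == n:
            c = cost(val)
            if c < best["cost"]:
                best["cost"], best["val"] = c, val.copy()
            return
        if lower_bound_cost(val, k) >= best["cost"]:
            return  # kill this node: no descendant can beat the incumbent
        for v in allowed(val, k):
            val[k] = v
            extend(k + 1)
            val[k] = None

    extend(0)
    return best["val"], best["cost"]
```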

Dynamic Programming:
Dynamic programming is a technique that systematically constructs the
optimal solution of some problem instance by defining the optimal
solution in terms of optimal solutions of smaller size instances.
Dynamic programming can be applied to such a problem if there is a rule
to construct the optimal solution for p = k (complete solution) from the
optimal solutions of instances for which p < k (set of partial solutions).
The fact that an optimal solution for a specific complexity can be
constructed from the optimal solutions of lower-complexity instances
only is essential for dynamic programming.
This idea is called the principle of optimality.

The goal in the shortest-path problem is to find the shortest path from a
source vertex
vs to a destination vertex vt in a directed graph G(V, E) where the
distance between two vertices u, v is given by the edge weight w((u, v)).
If p = k, the optimization goal becomes: find the shortest path from vs to
all other vertices in the graph considering paths that only pass through
the first k closest vertices to vs.
The optimal solution for the instance with p = 1 is found in a trivial way
by assigning the edge weight w((vs, u)) to the distance attribute of all
vertices u.
Suppose that the optimal solution for p = k is known and that the k
closest vertices to vs have been identified and transferred from V to T.
Then, solving the problem for p = k+1 is simple: transfer the vertex u
in V having the lowest value for its distance attribute from V to T and
update the value of the distance attributes for those vertices remaining
in V.
additional parameters may be necessary to distinguish multiple instances
of the problem for the same value of p.
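The procedure described above is Dijkstra's shortest-path algorithm viewed as dynamic programming; a compact sketch, assuming the graph is given as a dict mapping each vertex to a dict of neighbour weights:

```python
def shortest_paths(graph, vs):
    """After step k, the k closest vertices have final distances (set T)."""
    dist = {v: float("inf") for v in graph}
    dist[vs] = 0
    remaining = set(graph)              # the set V of the text
    while remaining:
        u = min(remaining, key=lambda v: dist[v])
        remaining.remove(u)             # transfer u from V to T
        for v, w in graph[u].items():
            if v in remaining and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w   # update distance attributes
    return dist

g = {"s": {"a": 2, "b": 5}, "a": {"b": 1}, "b": {}}
# shortest_paths(g, "s") → {"s": 0, "a": 2, "b": 3}
```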

Integer Linear Programming:


Integer linear programming (ILP) is a specific way of casting a
combinatorial optimization problem in a mathematical format.
This does not help from the point of view of computational complexity,
as ILP is NP-complete itself.
ILP formulations for problems from the field of VLSI design automation
are often encountered due to the existence of "ILP solvers":
software packages that accept any instance of ILP as their input and
generate an exact solution for the instance.

why ILP is useful in CAD for VLSI


The input sizes of the problems involved may be small enough for an
ILP solver to find a solution in reasonable time.
One then has an easy way of obtaining exact solutions, compared to
techniques such as branch-and-bound.

In the standard form of LP, all variables are restricted to be
nonnegative.
A variable xi that may assume negative values can be replaced by the
difference xi′ − xi″ of two new variables that are both restricted to
be nonnegative.
An inequality constraint is brought into the standard (equality) form
by adding a nonnegative slack variable bi to it.
It is possible to solve LP problems by a polynomial-time algorithm
called the ellipsoid algorithm

Integer Linear Programming: ILP is a variant of LP in which the variables


are restricted to be integers
The techniques used for finding solutions of LP are in general not suitable
for ILP
Other techniques that more explicitly deal with the integer values of the
variables should be used.
If the integer variables are restricted further to assume either of the
values zero or one, one obtains a variant of ILP called zero-one
integer linear programming.
zero-one ILP formulation for the TSP:
Consider a graph G(V, E) where the edge set E contains k edges: E = {e1,
e2, . . . , ek}.
The ILP formulation requires a variable xi, for each edge ei
The variable xi can either have the value 1, which means that the
corresponding edge
ei has been selected as part of the solution,
or the value 0 meaning that ei, is not part of the solution.
Cost function: minimize Σi w(ei) · xi, the total weight of the selected
edges.
In the optimal solution, only those xi that correspond to edges of the
optimal tour will have the value 1.

With only this cost function, the solution set would also include
solutions that consist of multiple disjoint tours.
Additional constraint: a tour that
visits all vertices in the graph should
pass through at least two of the
edges that connect a vertex in V1
with a vertex in V2 (where V2 = V \ V1).
This constraint is imposed for every subset V1 such that both V1 and V2
contain at least three vertices.

Size of the problem instance:
the number of variables is equal to the number of edges;
the number of constraints of the first type (one per vertex) is equal
to the number of vertices;
the number of constraints of the second type, however, can grow
exponentially, as the number of subsets of V equals 2^|V|.


Local
Search:
Local search is a general-purpose optimization method that works with
fully specified solutions f of a problem instance (F, c).
It makes use of the notion of a neighbourhood N(f) of a feasible
solution f: a subset of F that is "close" to f in some sense.
Formally, a neighbourhood is a function that assigns a set of feasible
solutions to each feasible solution: N : F → 2^F, where 2^F denotes the
power set of F.
Any g ∈ N(f) is called a neighbour of f.

The principle of local search is to subsequently visit a number of


feasible solutions in the search space.
transition from one solution to the next in the neighbourhood is
called a move or a local transformation

Multiple minima problem


If the function has a single minimum, it will be found.
Functions with many minima, most of which are local
local search has the property that it can get stuck in a local
minimum.
the larger the neighbourhoods considered, the larger is the part of
the search space explored and the higher is the chance of finding a
solution with good quality.
One more possibility is to repeat the search a number of times with
different initial solutions.
Alternatively, to escape from a local minimum, one should be able to
move to a solution with a higher cost, by means of so-called uphill
moves.
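A generic local search sketch under these definitions; `neighbours` implements N(f), and the loop stops in the first (possibly local) minimum it reaches:

```python
def local_search(f, neighbours, cost):
    """Move to the best neighbour as long as it improves the cost."""
    while True:
        best = min(neighbours(f), key=cost, default=None)
        if best is None or cost(best) >= cost(f):
            return f            # no improving neighbour: local minimum
        f = best                # move (local transformation)

# Example: minimize x^2 over the integers with neighbourhood {x-1, x+1}.
# Starting from 7 the search walks down to the global minimum 0.
```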

Simulated Annealing:
a material is first heated up to a temperature that allows all its
molecules to move freely around (the material becomes liquid), and
is then cooled down very slowly.
At the end of the process, the total energy of the material is minimal.
The energy corresponds to the cost function.
The movement of the molecules corresponds to a sequence of moves
in the set of feasible solutions.
The temperature corresponds to a control parameter T which
controls the probability of accepting a cost-increasing (uphill) move.

The function random (k)


generates a real-valued random
number between 0 and k with a
uniform distribution
The combination of the
functions thermal equilibrium,
new temperature and stop
realizes a strategy for
simulated annealing, which is
called the
cooling schedule.
Simulated annealing allows
many uphill moves at the
beginning of the search
and gradually decreases their
frequency.
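A simulated-annealing sketch with the standard acceptance rule exp(−d/T) for an uphill move of size d and a simple geometric cooling schedule; the schedule parameters (t0, alpha, moves_per_t, t_min) are illustrative choices, not prescribed by the text:

```python
import math, random

def simulated_annealing(f, random_neighbour, cost, t0=10.0, alpha=0.95,
                        moves_per_t=100, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    t, best = t0, f
    while t > t_min:                       # stop criterion
        for _ in range(moves_per_t):       # "thermal equilibrium" loop
            g = random_neighbour(f, rng)
            d = cost(g) - cost(f)
            if d < 0 or rng.random() < math.exp(-d / t):
                f = g                      # accept (possibly uphill) move
                if cost(f) < cost(best):
                    best = f
        t *= alpha                         # new, lower temperature
    return best
```

At high T almost every move is accepted; as T decreases, uphill moves become increasingly rare, matching the behaviour described above.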

Tabu Search

Given
a neighbourhood subset G ⊆ N(f) of a feasible solution f, the
principle of tabu search is to move to the cheapest element g ∈ G even
when c(g) > c(f).
The tabu search method does not directly restrict uphill moves
throughout the search process.
In order to avoid a circular search pattern, a so-called tabu list
containing the k last visited feasible solutions is maintained.
This only helps, of course, to avoid cycles of length ≤ k.
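A tabu-search sketch along these lines; the tabu list is a bounded queue of the k last visited solutions, and the fixed iteration budget is an illustrative stopping rule:

```python
from collections import deque

def tabu_search(f, neighbours, cost, k=5, iterations=50):
    tabu = deque([f], maxlen=k)   # the k last visited feasible solutions
    best = f
    for _ in range(iterations):
        candidates = [g for g in neighbours(f) if g not in tabu]
        if not candidates:
            break
        f = min(candidates, key=cost)   # cheapest move, uphill allowed
        tabu.append(f)
        if cost(f) < cost(best):
            best = f
    return best
```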


Genetic
Algorithms:
instead of repetitively transforming a single current solution into a next
one by the
application of a move,
the algorithm simultaneously keeps track of a set P of feasible
solutions, called the population.
In an iterative search process, the current population is replaced by
the next one by means of a generation procedure.
In order to generate a new feasible solution, the child, two feasible
solutions called its parents are first selected from the population.
The child is generated in such a way that it inherits part of its
"properties" from one parent and the other part from the second parent,
by the application of an operation called crossover.
First of all, this operation assumes that all feasible solutions can be
encoded by a fixed-length vector f = [f1, f2, ..., fn], as was the case
for the backtracking algorithm.
Bit strings are used to represent feasible solutions.
The number of vector elements n is fixed, but the number of bits needed
to encode each element depends on the problem instance.

Consider an instance of the unit-size placement problem with 100 cells


and a 10x 10 grid.
As 4 bits are necessary to represent one coordinate value (each value is
an integer between 1 and 10) and
200 coordinates (100 coordinate pairs) specify a feasible solution, the
chromosomes of this problem instance have a length of 800 bits.
A feasible solution = the phenotype
Encoding of chromosome = the genotype
Given two chromosomes, a crossover operator will use some of the bits
of the first
parents and some of the second parent to create a new bit string
representing the
Child.
A simple crossover operator works as follows:
generate a random number r between 1 and the length l of the bit
strings for the problem instance;
copy the bits 1 through r − 1 from the first parent and the bits r
through l from the second parent.
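This cut-and-copy rule is easy to express directly on bit strings (here with a fixed cut point r; in the algorithm r is drawn at random). The values reproduce the 4-bit coordinate example used in the text:

```python
def crossover(p1, p2, r):
    """Copy bits 1..r-1 from the first parent, bits r..l from the second."""
    return p1[:r - 1] + p2[r - 1:], p2[:r - 1] + p1[r - 1:]

f = "01011001"   # two 4-bit coordinates: (5, 9)
g = "10000110"   # (8, 6)
c1, c2 = crossover(f, g, 6)   # cut falls inside the second coordinate
# c1 = "01011110" → (5, 14), an illegal coordinate; c2 = "10000001" → (8, 1)
```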

Suppose that the bit strings of the example represent the coordinates of
the placement
problem on a 10 x 10 grid, now with only a single cell to place (an
artificial problem).
The bit string for a feasible solution is then obtained by concatenating
the two 4-bit values of the coordinates of the cell.
So, f(k) is a placement on position (5, 9) and g(k) one on position (8, 6).
The children generated by crossover represent placements at
respectively (5, 14) and (8, 1).
Clearly, a placement at (5, 14) is illegal: it does not represent a feasible
solution as coordinate values cannot exceed 10.

The combination of the chromosome representation and the crossover
operator for generating new feasible solutions may lead to more
complications.
Consider e.g. the traveling salesman problem for which each of the
feasible solutions can be represented by a permutation of the cities.
Two example chromosomes for a six-city problem instance with cities c1
through c6 could then look like
"C1C3C6C5C2C4" and "C4C2C1C5C3C6".
In such a situation, the application of the crossover operator as
described for binary strings is very likely to produce solutions that
are not feasible,
e.g. the illegal solution "C1C3C1C5C3C6" (or "C4C2C6C5C2C4").

Order crossover: for chromosomes that represent permutations


This operator copies the elements of the first parent chromosome until
the point of the cut into the child chromosome.
The remaining part of the child is composed of the elements missing in
the permutation in the order in which they appear in the second parent
chromosome.
Consider again the chromosomes "C1C3C6C5C2C4" and
"C4C2C1C5C3C6", cut after the second city.
Then the application of order crossover would lead to the child
"C1C3C4C2C5C6"
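Order crossover can be sketched as follows; it reproduces the six-city example above:

```python
def order_crossover(p1, p2, cut):
    """Copy p1 up to the cut; append the missing cities in p2's order."""
    head = p1[:cut]
    tail = [c for c in p2 if c not in head]
    return head + tail

p1 = ["c1", "c3", "c6", "c5", "c2", "c4"]
p2 = ["c4", "c2", "c1", "c5", "c3", "c6"]
# Cut after the second city, as in the text:
# order_crossover(p1, p2, 2) → ["c1", "c3", "c4", "c2", "c5", "c6"]
```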

The function select is


responsible for the selection of
feasible solutions from the
current population favouring
those that have a better cost
The function stop decides
when to terminate the search,
e.g. when there has been no
improvement of the best
solution in the population
during the last m iterations,
where m is a parameter of the
algorithm.

A stronger preference can be given to parents with a lower cost when
selecting pairs of parents to be submitted to the crossover operator.
Mutation: Mutation helps to avoid getting stuck in a local minimum
One can work with more sophisticated crossover operators, e.g.
operators that
make multiple cuts in a chromosome.
One can copy some members of the population entirely to the new
generation
instead of generating new children from them.
Instead of distinguishing between the populations Pk and Pk+1, one
could directly add a new child to the population and simultaneously
remove some "weak" member of the population.

Longest-path Algorithm for DAGs


A variable pi is associated with each vertex vi to keep count of the
edges incident to vi that have already been processed.
Because the graph is acyclic, once all incoming edges of vi have been
processed, the longest path to vi is known.
A vertex vj whose incoming edges have all been processed is included in
a set Q.
It will be taken out later on to traverse the edges incident from it in
order to propagate the longest-path values to the vertices at their
endpoints.
Any data structure that is able to implement the semantics of a
"set" can be used.
All edges in the graph are visited exactly once during the execution
of the inner for loop.
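The algorithm sketched above in Python; `p` holds the per-vertex counters and the set Q is realized as a queue (any set-like structure would do, as the text notes):

```python
from collections import deque

def dag_longest_paths(n, edges, source):
    """edges: list of (u, v, w) tuples; vertices are numbered 0..n-1."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    x = [float("-inf")] * n
    x[source] = 0
    p = [0] * n                       # processed-incoming-edge counters
    q = deque(v for v in range(n) if indeg[v] == 0)
    while q:
        u = q.popleft()
        for v, w in adj[u]:           # each edge is visited exactly once
            if x[u] + w > x[v]:
                x[v] = x[u] + w       # propagate longest-path value
            p[v] += 1
            if p[v] == indeg[v]:      # all incoming edges processed
                q.append(v)
    return x

# Example: edges 0→1 (w=2), 0→2 (w=1), 1→2 (w=3) give x = [0, 2, 5].
```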

Layout Compaction:
At the lowest level, the level of the mask patterns for the fabrication of
the circuit, a final optimization can be applied to remove redundant
space.
This optimization is called layout compaction
Layout compaction can be applied in four situations,
1. Converting symbolic layout to geometric layout.
2. Removing redundant area from geometric layout.
3. Adapting geometric layout to a new technology.
4. Correcting small design rule errors
A new technology means that the design rules have changed;
as long as the new and old technologies are compatible (e.g. both are
CMOS technologies), this adaptation can be done automatically, (e.g. by
means of so-called mask-to-symbolic extraction.)

The Layout design problem:


A layout is considered to consist of rectangles.
the rectangles can be classified into two groups:
1. rigid rectangles and
2. stretchable rectangles.
1. Rigid rectangles correspond to transistors and contact cuts
whose length and width are fixed.
When they are moved during a compaction process, their lengths
and widths do not change.
2. Stretchable rectangles correspond to wires.
In principle the width of a wire cannot be modified, but the length of
a wire can be changed by compaction.

Compaction tools:
Layout is essentially two-dimensional and layout elements can in
principle be moved both horizontally and vertically for the purpose
of compaction.
When one dimensional compaction tools are used, the layout
elements are only moved along one direction (either vertically or
horizontally).
This means that the tool has to be applied at least twice: once for
horizontal and once for vertical compaction.
Two dimensional compaction tools move layout elements in
both directions simultaneously.
Theoretically, only two-dimensional compaction can achieve an
optimal result, but this type of compaction is NP-complete. On the
other hand, one-dimensional compaction can be solved optimally in
polynomial time.

In one-dimensional, say horizontal, compaction a rigid rectangle can be


represented by one x-coordinate (of its centre, for example) and a
stretchable one by two (one for each of the endpoints)
A minimum-distance design rule between two rectangle edges can now be
expressed as an inequality of the form xj − xi ≥ d,
where d is, for example, the minimum width a of the layer concerned or
the minimum separation b between elements in that layer.

A graph that captures all these inequalities is called the constraint graph.

There is a source vertex v0 located at x = 0.

Directed acyclic graph:
a constraint graph derived from only minimum-distance constraints has
no cycles.
The length of the longest path from the source vertex v0 to a specific
vertex vi in the constraint graph G(V, E) gives the minimal
x-coordinate xi associated to that vertex.
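A toy illustration of this, assuming the constraints form a DAG whose vertices are already numbered in topological order: each inequality xj − xi ≥ d becomes an edge of weight d, and the minimal legal coordinates are the longest paths from v0 (the width a = 2 and separation b = 3 below are made-up values):

```python
def compact(n, constraints):
    """constraints: list of (i, j, d) meaning x_j - x_i >= d.

    Assumes vertex numbering is topological (i < j for every edge)."""
    x = [0] * n
    for i, j, d in sorted(constraints):   # sources processed in order
        x[j] = max(x[j], x[i] + d)        # longest path from v0
    return x

# Two wires of minimum width a = 2 separated by b = 3:
# v0→v1 (0), v1→v2 (a), v2→v3 (b), v3→v4 (a)
# compact(5, [(0, 1, 0), (1, 2, 2), (2, 3, 3), (3, 4, 2)]) → [0, 0, 2, 5, 7]
```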
The Longest Path in Graphs with Cycles
Two cases can be distinguished:
1. The graph only contains negative cycles, i.e. the sum of the edge
weights along any cycle is negative.
2. The graph contains positive cycles: The problem for graphs with
positive cycles is NP-hard
A constraint graph with positive cycles corresponds to a layout with
conflicting
constraints
Such a layout is called over-constrained layout and is impossible
to realize

The Liao-Wong algorithm partitions the edge set E of the constraint
graph G(V, E) into two sets Ef and Eb.
The edges in Ef have been obtained from the minimum-distance
inequalities and are called forward edges.
The edges in Eb correspond to maximum-distance inequalities and are
called backward edges.

At the kth iteration of the do loop, the values of the xi represent
the longest paths going through all forward edges and at most k
backward edges.

As the DAG longest-path algorithm has a time complexity of O(|Ef|) and
is called at most |Eb| times, the Liao-Wong algorithm has a time
complexity of O(|Eb| × |Ef|).
This makes the algorithm interesting in cases where the number of
backward edges is relatively small.

The Bellman-Ford Algorithm


The algorithm does not discriminate between forward and backward
edges.
S1 contains the current wavefront and
S2 is the one for the next iteration;
n is the number of vertices.
After k iterations, the algorithm
has computed the longest-path
values for paths going through at
most k − 1 intermediate vertices.

The time complexity of the Bellman-Ford algorithm is O(n × |E|), as
each iteration visits all edges at most once and there are at most n
iterations.
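A wavefront-style Bellman-Ford sketch for longest paths; it handles backward (negative-weight) edges and reports an over-constrained graph when updates continue past n iterations:

```python
def bellman_ford_longest(n, edges, source):
    """edges: list of (u, v, w) tuples; vertices are numbered 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
    x = [float("-inf")] * n
    x[source] = 0
    s1 = {source}                      # current wavefront
    for _ in range(n):                 # at most n iterations
        s2 = set()                     # wavefront for the next iteration
        for u in s1:
            for v, w in adj[u]:
                if x[u] + w > x[v]:
                    x[v] = x[u] + w
                    s2.add(v)          # v joins the next wavefront
        if not s2:
            return x                   # converged: no more updates
        s1 = s2
    raise ValueError("positive cycle: over-constrained layout")
```

With a backward edge of weight −4 (a maximum-distance constraint) the longest-path values still converge, as the test below illustrates.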

Clique cover problem


The clique cover problem (also sometimes called partition into cliques)
is the problem of determining whether the vertices of a graph can be
partitioned into k cliques.
High-level synthesis is often divided into a number of subtasks.
Considering them as independent tasks makes it easier to define
optimization problems and to design algorithms to solve them.
Scheduling is the task of determining the instants at which the
execution of the
operations in the DFG will start.
Assignment(also called binding) maps each operation in the DFG
to a specific functional unit on which the operation will be executed.
Assignment is also concerned with mapping storage values to
specific
memory elements and of data transfers to interconnected
structures.
Allocation (or "resource allocation or module selection) simply
reserves the hardware resources that will be necessary to realize the
algorithm.


Assignment(also called binding) problem is called task-to-agent
assignment, where
a task can be an operation or a value and an agent can be an FU or a
register.
Tasks are called compatible if they can be executed on the same agent.
In the case of values, compatibility means that their lifetimes do not
overlap.
The set of tasks can be used as the vertex set of a so-called
compatibility graph Gc(Vc, Ec).
The graph has an edge (vi, vj) ∈ Ec if and only if the tasks vi and vj
are compatible.
Conversely, two tasks are in conflict if they cannot be executed on
the same agent.
The set of tasks is then used as the vertex set of a conflict graph
that has edges for those vertex pairs that are in conflict.
The conflict graph is the complement graph of the compatibility graph.


The
goal of the assignment problem is to minimize the number of agents
for the given set of tasks.
The vertices of any complete subgraph of a compatibility graph
correspond to a set of tasks that can be assigned to the same agent.
The goal of the assignment problem is then to partition the compatibility
graph in such a way that each subset in the partition forms a complete
graph and the number of subsets in the partition is minimal.
The subsets are pairwise disjoint and the union of the subsets forms the
original set by definition of a partition.
In the literature such a partitioning is called a clique partitioning.
Combining vertices in the compatibility graph results in a supervertex.
The index I of a supervertex represents the set of indices of the
vertices from which the supervertex was formed.
For example, combining vertices 1, 3 and 7 gives a supervertex v{1,3,7}.
A supervertex vn is a common neighbor of the supervertices vi and vj if
both edges (vi, vn) and (vj, vn) are included in the current edge set.
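A simple greedy sketch of clique partitioning on a compatibility graph (a heuristic illustration, not the exact merging procedure referred to above): it grows one clique, i.e. one agent, at a time by absorbing vertices compatible with everything already merged, and reproduces the supervertex v{1,3,7} of the example.

```python
def clique_partition(vertices, compatible):
    """compatible: set of frozenset pairs; returns a list of task sets."""
    unassigned = set(vertices)
    agents = []
    while unassigned:
        v = min(unassigned)
        clique = {v}
        unassigned.remove(v)
        for u in sorted(unassigned):
            # u must be compatible with every task already in the clique
            if all(frozenset((u, w)) in compatible for w in clique):
                clique.add(u)
        unassigned -= clique
        agents.append(clique)           # one clique = one agent
    return agents

edges = {frozenset(p) for p in [(1, 3), (1, 7), (3, 7), (2, 4)]}
# clique_partition([1, 2, 3, 4, 7], edges) → [{1, 3, 7}, {2, 4}]
```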

Placement and Partitioning


The input to the placement problem is the structural description of a
circuit; it is the output of high-level synthesis.
Such a description consists of a list of design subentities (hardware
subparts with their layout designs) and their interconnection patterns
that together specify the complete circuit.
The goal of placement is to determine the location of these layouts on
the chip such that the total resulting chip area is minimal, and sufficient
space is left for wiring.
The wiring should realize exactly the interconnections specified in the
structural description (routing problem).
Before dealing with the placement problem and possible solutions, some
attention is paid to the representation of an electric circuit
(partitioning).

Partitioning Problem
The partitioning problem deals with splitting a network into two or more
parts
by cutting connections.
Partitioning problem is treated here together with placement because
solution methods for the partitioning problem can be used as a
subroutine for some type of placement algorithms.

Data model of an electric circuit: the organization of the data structures


that represent electric circuit.

The data model consists of the three structures cell, port and net.
A cell is the basic building block of a circuit. A NAND gate is an example
of a cell.
The point at which a connection between a wire and a cell is established
is called a port.
The wire that electrically connects two or more ports is a net.
A set of ports is associated with each net.
A port can only be part of a single net.

the information stored in masters originates from a library


An input cell has a single port through which it sends a signal to the
circuit and an output cell has a single port through which it receives
a signal from the circuit.
Ports are indicated by
small squares
Dashed lines show the
cell boundaries

The graph will have three distinct sets of vertices:


1. a cell set,
2. a port set and
3. a net set.
There will be two edge sets:
4. one for edges connecting cells with ports and
5. one for edges connecting nets with ports
edges never connect vertices of the same type

A hypergraph consists of vertices


and hyperedges,
hyperedges connect two or more
vertices
the vertices represent the cells and
the nets
by omitting the explicit
representation
of nets: clique model
Used for clique partitioning

Wire-length Estimation
total wire length is used to evaluate the quality of placement
Estimation:
A wire-length metric is applied to each net, resulting in a length
estimate per net.
The total wire length estimation is then obtained by summing the
individual estimates.
The total wiring area can then be derived from this length by assuming
a certain wire width and a wire separation distance.
All metrics refer to a cell's coordinates.
common metrics are
Half perimeter: This metric computes the smallest rectangle that
encloses all terminals of a net and takes the sum of the width and
height of the rectangle as an estimation of the wire length.
The estimation is exact for two- and three terminal nets and gives a
lower bound for the wire length of nets with four or more terminals.
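The half-perimeter metric in code:

```python
def hpwl(terminals):
    """Width plus height of the smallest rectangle enclosing all terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Exact for this three-terminal net: (4 - 1) + (5 - 1) = 7
# hpwl([(1, 1), (4, 2), (2, 5)]) → 7
```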

Minimum rectilinear spanning/Steiner tree:


The minimum rectilinear Steiner tree always has a length shorter than
or equal to that of the minimum rectilinear spanning tree.
Can be utilized to estimate total wire length required.
Squared Euclidean distance: This method is meant for the clique-model representation of an electric circuit.
As nets are not explicitly present in this model, the total cost is
obtained by summing over the cells rather than over the nets.
The cost of a placement is then defined as the sum over all vertex
pairs of γij · ((xi − xj)² + (yi − yj)²),
where γij is zero if there is no edge between the vertices vi and vj.

Types of Placement Problem


standard-cell placement : standard cells are predesigned small circuits
(e.g. simple logic gates, flip-flops, etc.)
Rules for placement:
1. Connections that are shared by all or most cells, like e.g. power and
clock connections, cross the cells from left to right at fixed locations.
(called the logistic signals)
2. Signals related to the specific I/O of the cell have to leave the cell
either at the top or the bottom.

3. cells are collected into rows separated by wiring or routing channels.


they are connected by horizontal abutment. (standard-cell layout)

4. In full-custom design where designers have the freedom to give


arbitrary shapes to their cells , the cells need wiring space all around.
(building-block layout)

Apart from the standard-cell and building-block layout styles, a
combination of the two is also possible.

The placement problem for standard cells or building blocks is more


complex than the unit-size placement problem.
One obvious difference is that moves that exchange two cells as
encountered in many general purpose algorithms are not always
possible due to the size difference

Placement Algorithms:
Placement algorithms can be grouped into two categories:
Constructive placement: the algorithm is such that once the
coordinates of a cell have been fixed they are not modified anymore;
Iterative placement: all cells already have some coordinates, and
cells are moved around, their positions are interchanged, etc., in
order to get a new (hopefully better) configuration.
An initial placement is obtained in a constructive way and attempts are
made to increase the quality of the placement by iterative
improvement.

Constructive Placement:
Partitioning methods which divide the circuit in two or more
subcircuits of a given size while minimizing the number of connections
between the subcircuits:
1. min-cut partitioning and
2. Clustering
1. min-cut partitioning
The basic idea of min-cut placement is to split the circuit into two
subcircuits of more or less equal size while minimizing the number of
nets that are connected to both subcircuits
The two subcircuits obtained will each be placed in separate halves of the
layout
The number of long wires crossing from one half of the chip to the other
will be minimized
bipartitioning is recursively applied

The second task can be based on different heuristics.


One such heuristic is to look at the parts of the circuit that already have
a fixed position (either because the placement of these parts is already
fixed or because they are connected to the inputs or outputs of the chip
that are located at the chip's periphery)
Min-cut placement is a top-down method

Iterative Improvement:
Iterative improvement is a method that perturbs a given placement by
changing the
positions of one or more cells and evaluates the result
If the new cost is less than the old one, the new placement replaces the
old one and the process continues

Perturbation of a feasible solution for standard cell or building-block


placement is more complex due to the inequality of the cell sizes.
Different approaches are possible:
1. One can allow that cells in a feasible solution overlap and make the
overlap part of
the cost function to be minimized.
This will direct a placement algorithm towards solutions with little or no
overlap.
Any overlap that remains can be eliminated by pulling apart the cells in
the final layout (at the expense of a larger overall chip area).
2. One can eliminate overlaps directly after each move by shifting an
appropriate part of the cells in the layout.
In general, this is a computation-intensive operation as the coordinates of
many cells in the layout have to be recomputed as well

Force-directed placement:
It assumes that cells that share nets, feel an attractive "force" from each
other.
The goal is to reduce the total force in the network.
one can compute the "center of gravity" of a cell, the position where the
cell feels a force zero
center of gravity (xig , yig) of a cell i is defined as

perturbation is then to
move a cell to a legal position close to its center of gravity and
if there is another cell at that position to move that cell to some empty
location or to its own center of gravity
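A sketch of the center-of-gravity computation, assuming the usual definition (the formula itself is not shown above) as the connectivity-weighted average of the positions of the cells that share nets with cell i:

```python
def center_of_gravity(i, weights, pos):
    """weights[j] = c_ij (e.g. number of shared nets) for every cell j
    connected to cell i; pos[j] = (x, y) position of cell j."""
    total = sum(weights.values())
    xg = sum(c * pos[j][0] for j, c in weights.items()) / total
    yg = sum(c * pos[j][1] for j, c in weights.items()) / total
    return xg, yg

# Cell 0 connected to cells at (0,0), (4,0) and (0,4) with weights 1, 1, 2:
# center_of_gravity(0, {1: 1, 2: 1, 3: 2},
#                   {1: (0, 0), 2: (4, 0), 3: (0, 4)}) → (1.0, 2.0)
```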

Partitioning

Partitioning is needed when a large circuit has to be implemented with
multiple chips; the number of pins on the IC packages necessary for
interchip communication should then be minimized.
Kernighan-Lin Partitioning Algorithm
There is an edge-weighted undirected graph G(V, E)
The graph has 2n vertices (|V| = 2n); an edge (a, b) ∈ E has a weight
w((a, b)) > 0; for (a, b) ∉ E, the weight is taken to be 0.
The problem is to find two sets A and B, subject to A ∪ B = V,
A ∩ B = ∅ and |A| = |B| = n, which minimize the cut cost: the sum of
the weights of all edges with one endpoint in A and the other in B.

The principle of the algorithm is to start with an initial partition


consisting of the sets A0 and B0 which, in general, will not have a
minimal cut cost.
In an iterative process, subsets of both sets are isolated and
interchanged.
In iteration number m, the set isolated from Am-1 will be denoted by
Xm and the set isolated from Bm-1 will be denoted by Ym.
The new sets, Am and Bm are then obtained as follows

The algorithm makes an attempt to find suitable subsets, interchanges
them, and then makes a new attempt, until an attempt no longer leads
to an improvement.

The construction of the sets Xm and Ym is based on external and
internal costs for vertices in the sets Am−1 and Bm−1.
The external cost Ea of a vertex a ∈ Am−1 is the sum of the weights of
the edges connecting a to vertices in Bm−1;
it is a measure for the pull that the vertex experiences from the
vertices in Bm−1.
The external cost Eb for a vertex b ∈ Bm−1 is defined analogously.

The internal costs Ia and Ib are defined analogously, using the edges
that connect a vertex to vertices within its own set.

The difference between external and internal costs gives an indication
of the desirability of moving the vertex:
a positive value shows that the vertex had better be moved to the
opposite set; a negative value shows a preference to keep the vertex
in its current set.
These differences are given by the variables Da = Ea − Ia and
Db = Eb − Ib.

The gain in the cut cost, Δ, resulting from the interchange of two
vertices a and b can be computed as Δ = Da + Db − 2 · w((a, b)).
It is important to realize that the best cut-cost improvement leading
to the selection of a pair (ai, bi) may be negative.
Once all vertices have been locked, the pairs are investigated in the
order of selection: the actual subsets to be interchanged correspond to
the sequence of pairs (starting with i=1) giving the best improvement.
Pairs in the sequence may have negative cost improvements as long as
the pairs following
them compensate for it.
Such a situation would occur when the exchange of two clusters of
tightly connected vertices results in an improvement, while the
exchange of individual vertices from each cluster does not improve the
cut cost.

KL algorithm
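One pass of the Kernighan-Lin heuristic as a simplified, unoptimized sketch (the original algorithm updates the D values incrementally instead of recomputing them): pairs are selected and locked by maximal gain g = Da + Db − 2·w(a, b), and the best prefix of the swap sequence is chosen, which may contain individually negative gains.

```python
def kl_pass(A, B, w):
    """A, B: equal-size vertex sets; w: dict of frozenset pairs -> weight."""
    def weight(a, b):
        return w.get(frozenset((a, b)), 0)

    def D(v, own, other):
        ext = sum(weight(v, u) for u in other)             # external cost
        internal = sum(weight(v, u) for u in own if u != v)  # internal cost
        return ext - internal

    A, B = set(A), set(B)
    gains, pairs = [], []
    ua, ub = set(A), set(B)                     # unlocked vertices
    while ua:
        # pick the unlocked pair (a, b) maximizing g = Da + Db - 2 w(a, b)
        g, a, b = max(
            (D(x, A, B) + D(y, B, A) - 2 * weight(x, y), x, y)
            for x in ua for y in ub
        )
        gains.append(g)
        pairs.append((a, b))
        ua.remove(a); ub.remove(b)              # lock the pair
        A.remove(a); A.add(b); B.remove(b); B.add(a)  # tentative swap
    # choose the prefix of the swap sequence with the best cumulative gain
    best_k, best_gain, run = 0, 0, 0
    for k, g in enumerate(gains, 1):
        run += g
        if run > best_gain:
            best_k, best_gain = k, run
    return pairs[:best_k], best_gain
```

In the test below, exchanging the tightly connected pair (3, 4) moves the heavy edge (2, 3) out of the cut, reducing the cut cost by 10.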

Floorplanning:
floorplan-based design methodology: This top-down design
methodology advocates that layout aspects should be taken into
account in all design stages.
At higher levels of abstraction, due to the lack of detailed information,
only the relative positions of the subblocks in the structural description
can be fixed.
Taking layout into account in all design stages also gives early
feedback: structural synthesis decisions can immediately be evaluated
for their layout consequences and corrected if necessary.
The presence of (approximate) layout information allows for an
estimation of wire lengths. From these lengths, one can derive
performance properties of the design such as timing and power
consumption.

three registers, two multiplexers and


a controller and an ALU

At the moment that this type of structural information is not fully


available,
one can estimate the area to be occupied by the various subblocks
and, together with a precise or estimated interconnection pattern,
try to allocate distinct regions of the integrated circuit to the specific
subblocks

Terminology and Floorplan Representation:


floorplan can be represented hierarchically: cells are built from other
cells, except for those cells that are at the lowest level of the hierarchy
These lowest-level cells are called leaf cells
Cells that are made from leaf cells are called composite cells
Composite cells can contain other composite cells as well. The direct
subcells of a composite cell are called its children.
every cell, except for the one representing the complete circuit, has a
parent cell
both leaf cells and composite cells are assumed to have a rectangular
shape.
If the children of all composite cells can be obtained by bisecting the cell
horizontally or vertically, the floorplan is called a slicing floorplan.

So, in a slicing floorplan a composite cell is made by combining its children horizontally or vertically.
A natural way to represent a slicing floorplan is by means of a slicing
tree.
The leaves of this tree correspond to the leaf cells.
Other nodes correspond with horizontal and vertical composition of the
children nodes.

The order of the children reflects their relative position: 'from left to right' in horizontal composition and 'from bottom to top' in vertical composition.
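The relation between a slicing tree and the composite cell's bounding box can be sketched as follows; the tuple encoding of the tree and the meaning of 'H'/'V' (conventions differ between texts) are assumptions for illustration:

```python
# A small sketch of slicing-tree evaluation: each leaf carries a fixed
# (width, height); 'H' stacks children bottom-to-top (horizontal cut line),
# 'V' places them left-to-right (vertical cut line).
def cell_size(node):
    """Return the (width, height) of the bounding box of a slicing-tree node."""
    if isinstance(node, tuple) and node[0] in ('H', 'V'):
        op, children = node[0], node[1:]
        sizes = [cell_size(c) for c in children]
        if op == 'H':   # children stacked vertically: heights add
            return max(w for w, _ in sizes), sum(h for _, h in sizes)
        else:           # children side by side: widths add
            return sum(w for w, _ in sizes), max(h for _, h in sizes)
    return node  # leaf cell: (width, height)

# composite cell: two leaves side by side, with a third stacked on top
print(cell_size(('H', ('V', (2, 3), (4, 3)), (6, 2))))
```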
When the children of a given cell cannot be obtained by bisections, a wheel (or spiral) floorplan results. A composite cell needs to be composed of at least five cells in order not to be slicing.

One can derive new composition operators from the wheel floorplan and its mirror image and use them in combination with the horizontal and vertical composition operators in a floorplan tree.
A floorplan that can be described in this way is called a floorplan of order 5.
A slicing floorplan can also be called a floorplan of order 2.

A representation mechanism that can deal with any floorplan is the polar graph, which actually consists of two directed graphs:
the horizontal polar graph and
the vertical one.
These graphs can be constructed by identifying the longest possible
line segments that separate the cells in the floorplans.
The horizontal segments are used as vertices in the horizontal polar
graph and the
vertical segments as the vertices in the vertical polar graph
Each cell is represented by an edge in the polar graph
In the horizontal one, there will be an edge directed from the line
segment that is the cell's top boundary to the line segment that is its
bottom boundary
In the vertical one, a similar idea is used where the edge direction is
from the left boundary to the right one

Abut : When two cells that need to be electrically connected have their
terminals in the right order and separated correctly, the cells can simply
be put against each other without the necessity for a routing channel in
between them. Such cells are said to abut.
Ideally, all composite cells are created by abutment and no routing
channels are used in a floorplan: This requires the existence of flexible
cells
flexible cells should be able to accommodate feedthrough wires.
floorplan-based design does not exclude the existence of routing
channels. The channels can be taken care of by incorporating them in
the area estimations for the cells.

Optimization Problems in Floorplanning:


1. Mapping of a structural description to a floorplan (e.g. a slicing
tree).
In a true top-down design methodology, floorplanning will probably be
performed manually or interactively as the number of children cells in
which a parent cell is subdivided is relatively small and
good decisions can be made based on designer experience.
techniques known from placement like min-cut partitioning can also be
used in floorplanning
A problem related to global routing at this level is called abstract routing.
2. Floorplan sizing: The availability of flexible cells implies the
possibility of having
different shapes.
It is therefore possible to choose a suitable shape for each leaf cell such
that the resulting floorplan is optimal in some sense.

3. Generation of flexible cells:


This task takes as input a cell shape, data on desired positions of
terminals and a netlist of the circuit to be synthesized at some
abstraction level, and uses a cell compiler to generate the layout that
complies with the input.
The problem is especially complex when the layout has to be composed
of individual transistors because of the many degrees of freedom and the
huge search space that is associated with it.
As this style of design amounts to full-custom design, quite some extra
effort has to be spent in the characterization of the generated cells.
Characterization is the process of determining all kinds of electrical properties of a cell, such as parasitic capacitances and propagation delay.
Characterization is necessary for an accurate simulation of the circuit
containing the generated cell.
Flexibility in a cell's shape can be achieved using primitives belonging to
a level higher than the transistor level.
An example is a register file of 64 registers that can be laid out in many
different ways, such as 8 x 8, 16 x 4, 4 x 16 or 1 x 64.

Shape Functions and Floorplan Sizing:


When the cell is flexible, one could say that the realization needs an area A.
Whichever shape the cell will have, its height h and its width w have to obey the constraint hw ≥ A.
The minimal height given as a function of the width is called the shape function of the cell.
Due to design rules neither the height nor the width will asymptotically
approach zero.

Inset or rigid cell: an inset cell, a predesigned cell residing in a library, has only rotations (in multiples of 90 degrees) and mirrorings as flexibility to be fit in a floorplan.

The shape function of a composite cell in a slicing floorplan can be computed from the shape functions of its children cells.
If the shape function of c1 is indicated by h1(w) and the one of c2 by h2(w), then the shape function h3(w) of the composite cell c3, obtained by stacking c1 and c2 vertically, can be expressed as:

h3(w) = h1(w) + h2(w)
a small example where both c1 and c2 are inset cells with respective
sizes of 4 x 2 and 5x3. Clearly, there are four ways to stack the two cells
vertically
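The four stackings of this example can be enumerated mechanically; the (width, height) encoding of the inset cells and the Pareto filtering of the resulting shape-function points are illustrative assumptions:

```python
from itertools import product

def orientations(w, h):
    """An inset cell may be rotated by 90 degrees: up to two candidate shapes."""
    return {(w, h), (h, w)}

def stack_vertically(cell1, cell2):
    """Enumerate all vertical stackings of two inset cells and keep the
    Pareto-optimal (width, height) points of the composite shape function."""
    points = []
    for (w1, h1), (w2, h2) in product(orientations(*cell1), orientations(*cell2)):
        points.append((max(w1, w2), h1 + h2))  # widths align, heights add
    points.sort()
    pareto = []  # keep only non-dominated (width, height) combinations
    for w, h in points:
        if not pareto or h < pareto[-1][1]:
            pareto.append((w, h))
    return pareto

# the 4 x 2 and 5 x 3 example from the text, encoded as (width, height)
print(stack_vertically((4, 2), (5, 3)))
```

The result is a staircase of (width, minimal height) points, which is exactly a sampled shape function of the composite cell.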

In the case of horizontal composition, the shape function of a composite cell has to be computed using a detour via the inverses of the children's shape functions: the inverse of the composite cell's shape function is the sum of the inverses of its children cells' shape functions.
The children's shapes can easily be derived from the chosen parent shape for both types of composition.
Sizing algorithm for slicing floorplans:
1. Construct the shape function of the top-level composite cell in a
bottom-up fashion starting with the lowest level and combining shape
functions while moving upwards.
2. Choose the optimal shape of the top-level cell.
3. Propagate the consequences of the choice for the optimal shape down
the slicing tree until the shapes of all leaf cells are fixed.

Routing
The specification of a routing problem will consist of the
1. position of the terminals,
2. the netlist that indicates which terminals should be interconnected
and
3. the area available for routing in each layer.
Routing is normally performed in two stages:
1. The first stage, global or loose routing, determines through which wiring channels a connection will run.
2. The second stage, local or detailed routing, fixes the precise paths that a wire will take (its position inside a channel and its layer).

Types of local routing problems are defined using the following parameters:
1. The number of wiring layers
The number of layers available depends on the technology and the
design style
A contact cut that realizes a connection between two layers is often
called a via in the context of routing.
2. The orientation of wire segments in a given layer:
Reserved-layer models of routing use either horizontal or vertical
segments in one layer
Sometimes it is also allowed to use segments with an orientation that is a multiple of 45 degrees.
3. Gridded or gridless routing. In gridded routing, all wire segments
run along lines of an orthogonal grid with uniform spacing between the
lines.
In gridless routing, wires of different widths as well as contacts are
explicitly represented.
4. The presence or absence of obstacles. Sometimes the complete
routing area is
available for routing, sometimes part of the area in one or more layers
is blocked.

5. Terminals with a fixed or floating position. In some problems the position of the terminals is fixed, but in other problems the router can move the terminal inside a restricted area.
6. Permutability of terminals. Sometimes the router is allowed to interchange terminals because they are functionally equivalent.
7. Electrically equivalent terminals. In some situations, a group of terminals belonging to the same net may already be connected to each other; the router should connect the rest of the net to only one of the terminals in this group, whichever is the most suitable.

Area Routing (single wiring layer, a grid, the presence of obstacles, and
fixed terminals in all the routing area).
Routing problems in which terminals are allowed anywhere in the area
available for routing are normally classified as area routing problems
The "path connection" or "maze routing" algorithm (Lee's algorithm):
The basic algorithm is meant to realize a connection between two points
("source" terminal, the "target" terminal) in a plane, in an environment
that may contain obstacles.
If a path exists, the algorithm always finds the shortest connection,
going around obstacles.
Obstacles are grid points through which no wire segments can pass.
The distance between two horizontally or vertically neighboring grid
points corresponds to the shortest possible wire segment.
The algorithm consists of three steps: wave propagation, backtracing, and cleanup.

In this backtracing step, sometimes the neighbor with label i is not unique: a heuristic should be used to make a choice.
Another heuristic can be used not to change the orientation of the path unnecessarily.
Once a path has been found, it will act as an obstacle for the next
connections to be made
The worst-case time complexity of Lee's algorithm operating on an n x n grid is O(n^2). Its space complexity is also O(n^2).
In the case that there are multiple layers, the algorithm operates on
a three-dimensional grid, where the size of the third dimension equals
the number of layers available
When a net has three or more terminals first a path between two
terminals should
be found and then a generalization of the algorithm has to be used
where a path can
either act as a source or a target for the wave propagation.

The fact that nets have to be routed sequentially is the weak point of Lee's algorithm: routing the nets in a different order strongly influences the final result.
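A minimal sketch of Lee's algorithm on a single-layer grid, assuming a 0/1 obstacle matrix; the cleanup step (resetting the labels before the next net) is omitted:

```python
from collections import deque

# Grid cells: 0 = free, 1 = obstacle. Wave propagation is a BFS, so the
# first time the target receives a label, that label is the shortest length.
def lee_route(grid, source, target):
    """Return a shortest rectilinear path from source to target, or None."""
    n, m = len(grid), len(grid[0])
    label = [[None] * m for _ in range(n)]
    label[source[0]][source[1]] = 0
    frontier = deque([source])
    while frontier:                                   # wave propagation
        r, c = frontier.popleft()
        if (r, c) == target:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < n and 0 <= cc < m and grid[rr][cc] == 0
                    and label[rr][cc] is None):
                label[rr][cc] = label[r][c] + 1
                frontier.append((rr, cc))
    if label[target[0]][target[1]] is None:
        return None                                   # no path exists
    path = [target]                                   # backtracing
    r, c = target
    while (r, c) != source:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < n and 0 <= cc < m
                    and label[rr][cc] == label[r][c] - 1):
                r, c = rr, cc
                break                # first neighbor with label i-1 wins
        path.append((r, c))
    return path[::-1]

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 1, 0]]
path = lee_route(grid, (0, 0), (3, 3))
print(len(path) - 1)  # shortest wire length around the obstacles: 6
```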

Channel Routing:
Channel routing occurs as a natural problem in standard cell and
building block layout styles, but also in the design of printed circuit
boards (PCBs).
It consists of routing nets across a rectangular channel.
all terminals belonging to the same net have the same number

The grid distance is equal to the horizontal separation between the terminals.
The nets have fixed terminals at the top and bottom of the channel and floating terminals at the "open" sides, at the left and right.
A floating terminal is known to enter the channel on the left or on the
right side, but it is up to the router to determine the exact position
Nets 1 and 3 have floating terminals at the left side and nets 4 and 5 at the right.
The main goal of channel routing is the minimization of the height, while a secondary goal is the minimization of the total wire length and the number of vias.
In other words: the objective is to minimize the area of the channel's rectangular bounding box, or equivalently, to minimize the number of distinct horizontal tracks needed.

switchbox routing
A routing problem that has some similarity with channel routing is
switchbox routing
fixed terminals can be found on all four sides of the rectangular routing
area.
the minimization of the area is not an optimization goal.
Switchbox routing is a decision problem.
the goal is to find out whether a solution exists. When a solution can be
found, a
secondary goal is to minimize the total wire length and the number of
vias

Channel Routing Models:


Classical model:
1. All wires run along orthogonal grid lines with uniform separation.
2. There are two wiring layers.
3. Horizontal segments are put on one layer and vertical segments on
the other one.
4. For each net, the wiring is realized by a single horizontal segment,
with vertical segments connecting it to all terminals of the net.

Gridless routing model:
Routers have also been designed to work without a grid. A gridless model allows each wire to have a specific width.
In the reserved-layer model, each layer has only wires in one direction; this keeps the search space small. The nonreserved-layer model works with a larger solution space.

The Vertical Constraint Graph


Consider a pair of terminals located in the same column and entering the channel in the same layer.
It is obvious that in any solution of the problem, the endpoint of the segment coming from the top has to finish at a position higher than the endpoint of the bottom segment (otherwise, there would be a short circuit).

This restriction is called a vertical constraint.


Each column having two terminals in the same layer gives rise to a vertical constraint. The constraints are often represented in a vertical constraint graph (VCG).
In this directed graph, the vertices represent the endpoints of the terminal segments and the directed edges represent the relation "should be located above".
It consists of pairs of vertices, one pair for each column that has two terminals in the same layer, each pair connected by a single directed edge from one vertex to the other, and unconnected vertices for the other columns.
Cycles are not allowed.

fully separated VCG

fully merged VCG

The main problem with the fully merged form is the possible existence of
cycles, in which case the corresponding layout cannot be realized: a
segment cannot be at the same time above and below another one.
In the absence of cycles in the VCG, a solution with a single horizontal
segment per net would amount to finding the longest path in the graph.
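Track assignment from an acyclic VCG via longest paths can be sketched as follows; the dictionary encoding of the graph is an assumption for illustration:

```python
# vcg[u] = set of nets that must lie below net u ("u above v" edges).
# For an acyclic VCG, the track of a net equals the length of the longest
# chain of nets that must be above it, so the channel height follows from
# the longest path in the graph.
def assign_tracks(vcg):
    """Return a track number per net (track 0 = topmost) for an acyclic VCG."""
    track = {}

    def longest_above(v):
        # track of v = 1 + max track over all nets that must be above v
        if v not in track:
            track[v] = 0
            for u, below in vcg.items():
                if v in below:
                    track[v] = max(track[v], longest_above(u) + 1)
        return track[v]

    for v in vcg:
        longest_above(v)
    return track

# net 1 above net 2, net 2 above net 3; net 4 is unconstrained
print(assign_tracks({1: {2}, 2: {3}, 3: set(), 4: set()}))
```

With a single horizontal segment per net, the number of tracks needed is one plus the largest track number, i.e. the longest path length plus one.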

Horizontal Constraints and the Left-edge Algorithm


If, in the classical model for channel routing, horizontal segments belonging to different nets are put on the same row (implying that they will be in the same layer too), the segments should not overlap (otherwise there would be a short circuit).
This restriction is called a horizontal constraint.
A net i in a channel routing problem without vertical constraints can be characterized by an interval [xi,min, xi,max], corresponding to the left-most and right-most terminal positions of the net.
The goal of channel routing is then reduced to assigning a row position in the channel to each interval.
An optimal solution combines nonoverlapping intervals on the same row in such a way that a minimal number of rows results.

The number of intervals that contain a specific x-coordinate is called the local density at column position x and will be denoted by d(x).
The maximum local density over all column positions is called the channel's density and is denoted by dmax.
Obviously, the density is a lower bound on the number of necessary
rows: all intervals
that contain the same x-coordinate must be put on distinct rows
The left-edge algorithm always finds a solution with a number of rows
equal to the
lower bound.
Structures for the representation of intervals and linked lists of intervals are called interval and list_of_interval respectively.
standard "list processing" function calls:
first(l) gives the first element of a list l;
rest(l) gives the list that remains when the first element is removed

A list i_list contains the intervals in order of increasing left coordinate.

The time complexity of the algorithm can easily be expressed in terms of the number of intervals n and the density of the problem d (the number of rows in the solution).
Sorting the set of intervals by their left coordinate can be done in O(n log n).
The outer loop will be executed d times and at most n intervals from the sorted list will be inspected in the inner loop.
This leads to a total worst-case time complexity of O(n log n + dn).
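The left-edge algorithm itself can be sketched as follows, assuming plain (x_min, x_max) tuples instead of the interval and list_of_interval structures of the text:

```python
# A compact sketch of the left-edge algorithm for channel routing without
# vertical constraints: greedily fill each row with non-overlapping
# intervals, scanning them in order of increasing left edge.
def left_edge(intervals):
    """intervals: list of (x_min, x_max). Returns a list of rows, each a
    list of intervals; the number of rows equals the channel density."""
    remaining = sorted(intervals)           # sort by left coordinate
    rows = []
    while remaining:                        # one iteration per row
        row, watermark, rest = [], float('-inf'), []
        for iv in remaining:
            if iv[0] > watermark:           # fits to the right on this row
                row.append(iv)
                watermark = iv[1]
            else:
                rest.append(iv)             # postpone to a later row
        rows.append(row)
        remaining = rest
    return rows

tracks = left_edge([(1, 4), (2, 6), (5, 8), (7, 9)])
print(len(tracks))  # the density of this instance is 2
```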

The problem of assigning nonoverlapping intervals to rows can also be described in graph-theoretical terms.
A set of intervals defines a so-called interval graph G(V, E):
for each interval i there is a vertex vi, and
there is an edge (vi, vj) if the corresponding intervals i and j overlap.
The problem of finding the minimum number of rows for the channel
routing problem without vertical constraints is equivalent to finding a
vertex coloring of the corresponding interval graph with a minimal
number of colors
The vertex coloring problem for graphs is the problem of assigning a
"color" to all vertices of the graph such that adjacent vertices have
different colors and a minimal number of distinct colors are used

Channel Routing Algorithms


Robust channel routing algorithm:
The channel is filled row by row; the number of rows used is given by the variable height.
Channel routing problems of decreasing size (stored in the variable N) are solved in subsequent iterations (using dynamic programming).
The selected nets will be located on the same row alternatingly either on
the top or the bottom of the remaining channel
Each iteration consists of two parts:
1. the assignment of weights to the nets and
2. the selection of a maximal-weight subset of these nets
The algorithm tries to eliminate vertical constraint violations by maze
routing.

The weight wi of a net i expresses the desirability to assign the net to either the top or bottom row.
The side (top or bottom) that is selected at some point of the iteration will be called the "current side".
The following rules are used to compute the weights:
1. For all nets i whose intervals contain the columns of maximal density, add a large number B to the weights wi.
2. For each net i that has a current-side terminal at column position x, add the local density d(x) to wi.
3. For each column x for which an assignment of some net i to the current side would create a vertical constraint violation, subtract K·d(x) from wi. K is a parameter that should have a value between 5 and 10. This discourages the creation of vertical constraint violations.

Once all nets have received a weight, the robust routing algorithm
finds the
maximal-weight subset of nets that can be assigned to the same row.
The nets selected for the subset should not have horizontal
constraints.
For any graph, a set of vertices that does not contain pairs of adjacent
vertices is called an independent set.
The problem of finding the maximal-weight subset of the nets could
therefore be formulated as the maximal-weight independent set
problem of the corresponding interval graph.
In the case of the problem of obtaining the group of nonoverlapping intervals with maximal total weight, the subinstances can be identified by a single parameter y, with 1 ≤ y ≤ channel.width.
To obtain the subinstance with y = c, one should remove all intervals that extend beyond column position c.
The costs of the optimal solutions for the subinstances are stored in an array total, where total[c] holds the optimal cost (the total weight up to column c) for the subinstance with y = c.
The optimal cost for the subinstance with y = c can be derived from the optimal costs of the subinstances with y < c and the weights of the nets that have their right-most terminals at position c (there are at most two such nets).
Net n is part of the optimal solution if total[c − 1] < wn + total[xn,min − 1], i.e. if n's weight, added to the optimal solution for the subinstance that did not include any nets overlapping with n, is larger than the optimal solution for the subinstance with y = c − 1.
If a net is selected for some c, the net's identification is stored in the array selected_net.
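The dynamic-programming selection can be sketched as follows; the net encoding and the traceback are illustrative assumptions (the text's version obtains the weights from the B/K weight-assignment rules above):

```python
# Maximal-weight independent set of intervals by dynamic programming, as
# used for selecting the nets of one row in the robust channel router.
def select_nets(nets, width):
    """nets: dict name -> (x_min, x_max, weight), columns 1..width.
    Returns (best total weight, set of selected non-overlapping nets)."""
    total = [0] * (width + 1)          # total[c]: best weight for y = c
    selected_net = [None] * (width + 1)
    ends_at = {}                       # nets whose right-most column is c
    for name, (lo, hi, w) in nets.items():
        ends_at.setdefault(hi, []).append(name)
    for c in range(1, width + 1):
        total[c] = total[c - 1]        # default: take no net ending at c
        for name in ends_at.get(c, []):
            lo, hi, w = nets[name]
            if w + total[lo - 1] > total[c]:
                total[c] = w + total[lo - 1]
                selected_net[c] = name
    chosen, c = set(), width           # trace back the selections
    while c > 0:
        if selected_net[c]:
            chosen.add(selected_net[c])
            c = nets[selected_net[c]][0] - 1  # skip everything overlapping it
        else:
            c -= 1
    return total[width], chosen

print(select_nets({'a': (1, 3, 4), 'b': (2, 5, 6), 'c': (4, 7, 5)}, 8))
```

Here net b alone has weight 6, but the nonoverlapping pair a + c reaches 9, so the latter is selected.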

The robust channel routing algorithm uses a restricted maze routing algorithm to repair these violations by selectively undoing the assignments of nets to rows and rerouting these nets.
Such a strategy is often called rip-up and reroute.

Introduction to Global Routing:


Global routing is a design action that precedes local routing and follows placement.
Global routing decides about the distribution across the available routing
channels of the interconnections as specified by a netlist.
Standard-cell Layout
this type of layout is characterized by rows of cells separated by wiring
channels.
If all terminals of a net are connected to cells facing the same channel,
the entire net can be routed by local routing only.
If the terminals of a net are connected to cells on more than two
adjacent rows, the global router should split the net into parts that can
be handled each by local routing
Obtaining a wiring pattern that roughly interconnects all terminals of a
net, amounts to constructing a minimum rectilinear Steiner tree.

The rectilinear Steiner tree contains vertical segments that cross the rows of standard cells. They can be realized in different ways:
1. By simply using a wiring layer that is not used by the standard cells.
2. By making use of feedthrough wires that may be available within
standard cells
3. By making use of feedthrough cells; these are cells that are inserted
between functional cells in a row of standard cells with the purpose of
realizing vertical connections.
First of all, it may be necessary to slightly shift the segments in order to
align with feedthrough wire positions.
Second, segments at approximately the same location can be permuted
to reduce the densities in the channels above and below the row that
they cross.
If feedthrough resources are scarce, their use can be minimized by
building a
Steiner tree for which vertical connections have a higher cost than
horizontal ones.

Given the fact that longer wires roughly correspond to larger delays,
cells connected to critical nets (nets that are part of the critical path)
will receive a higher priority to be placed close to each other during
placement.
A long wire in an IC behaves more like a transmission line; one therefore partitions the wire into multiple segments, each segment with its own resistance and capacitance.
A model based on this principle is the Elmore delay model.
the signal flow in a net is unidirectional starting from a source
terminal and propagating to multiple sink terminals, signal changes
will not arrive simultaneously at all sinks
It may e.g. be necessary to optimize the length of the connection from
the source to the critical sink (this is a connection that is part of the
critical path) rather than the overall tree length
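For a wire partitioned into RC segments, the Elmore delay at the far end can be computed as follows; the segment values are illustrative:

```python
# Elmore delay on an RC ladder (a wire split into segments): the delay at
# the far end is the sum over segments k of R_k times the total
# capacitance downstream of (and including) segment k.
def elmore_ladder_delay(segments):
    """segments: list of (R, C) per wire segment, source end first.
    Returns the Elmore delay at the end of the ladder."""
    delay, downstream = 0.0, sum(c for _, c in segments)
    for r, c in segments:
        delay += r * downstream     # R_k drives all capacitance after it
        downstream -= c             # move one segment further down the wire
    return delay

# a wire cut into 3 identical segments of 100 ohm and 10 fF each
print(elmore_ladder_delay([(100.0, 1e-14)] * 3))
```

Splitting the same total R and C into more segments lowers the Elmore estimate toward the distributed-line value of RC/2, which is why a single lumped RC stage overestimates the delay of a long wire.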
In standard-cell layout, global routing minimizes the overall area if it
minimizes
the sum of all channel widths.

Building-block Layout and Channel Ordering


Global routing for building-block layout is somewhat more complex
than for standard-cell layout as a consequence of a higher degree of
irregularity of the layout.
Area for routing is reserved around the cells, but it is not always obvious how this area can be partitioned into channels that can be handled by channel routers (the channel definition problem), and in which order these channels should be routed (the channel ordering problem).
In the case of horizontal composition in the slicing tree, the channels are
delimited by the top and bottom edges of the two cells involved in the
composition.
In the case of vertical composition, the left and right edges determine
the channel borders.
Both the layout and the tree are annotated with a number between
parentheses that indicates a possible correct order for routing the
channels.
This order can be obtained by a depth-first traversal of the tree.

once Channel (2) has been routed, its floating terminals at its "bottom"
side are fixed
by the channel router and become fixed terminals for the top side of
Channel (3).
The floating terminals at the left side of Channel (3), receive a fixed
position after completing the routing of the channel and become fixed
terminals for the right side of Channel (4).
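The depth-first channel ordering can be sketched as follows; the tuple encoding of the slicing tree and the channel naming scheme are assumptions for illustration:

```python
# Channel ordering for a sliced building-block layout: a depth-first
# (post-order) traversal of the slicing tree yields an order in which each
# channel's floating terminals have been fixed by earlier channels.
def channel_order(node, order=None):
    """Internal nodes ('H'/'V', left, right) are channels; leaves are cells.
    Returns channel labels in a valid routing order."""
    if order is None:
        order = []
    if isinstance(node, tuple):
        _, left, right = node
        channel_order(left, order)    # route the children's channels first
        channel_order(right, order)
        order.append(node[0] + str(len(order) + 1))  # this composition's channel
    return order

# cells A..D combined pairwise, then the two halves joined
print(channel_order(('V', ('H', 'A', 'B'), ('H', 'C', 'D'))))
```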

Algorithms for Global Routing:


the layout is covered by a grid.
The horizontal grid lines are chosen such that they run across the
centers of the cell rows.
Vertical grid lines can be chosen such that the horizontal and vertical
resolutions are roughly equal.
Note that the exact distance between horizontal lines is not known in
advance and depends on the results of channel routing.

The grid divides the routing area into elementary rectangles.


All terminals located in such a rectangle will be thought of as having the
same coordinates.
The points to be interconnected by rectilinear Steiner trees will then all be
considered to lie at the center of these unit rectangles

Local density:
The local vertical density dv(i, j) (1 ≤ i ≤ m; 1 ≤ j ≤ n − 1) is defined as the number of wires crossing the vertical grid segment located on vertical grid line j between the horizontal lines i − 1 and i.
The local horizontal density dh(i, j) (1 ≤ i ≤ m − 1; 1 ≤ j ≤ n) is defined as the number of wires crossing the horizontal grid line i between the vertical grid lines j − 1 and j.
The density Dv(i) (1 ≤ i ≤ m) of the channel between grid lines i − 1 and i is then given by:

Dv(i) = max over j of dv(i, j)

The goal of global routing is to minimize the total channel density, i.e. the sum of Dv(i) over all channels i.
Mij are the parameters that give the maximum number of feedthroughs that can be accommodated per horizontal grid segment.
Algorithm concept:
One could construct Steiner trees for all nets independently, examine the result for congested areas, and try to modify the shapes of those trees that cause the overcongestion, or of those trees whose reshaping contributes to the reduction of the total channel density.

Divide-and-conquer algorithm
Instead of using the same grid during the complete routing process, one
could start with a very coarse grid, say a 2 x 2 grid, perform global
routing on this grid by assuming
that all terminals covered by an elementary rectangle are located at
the rectangle's center, and
construct Steiner trees that evenly distribute the wires crossing the
grid segments.
One then gets four smaller routing problems that can be solved
recursively following the same approach.
The recursion stops when a sufficient degree of detail has been reached
for handing the problem over to a local router.
The decision on the ordering of wires crossing a boundary for one
subproblem will constrain the search space of the neighboring one.

Efficient Rectilinear Steiner-tree Construction


The input of the rectilinear Steiner-tree problem is a set of n points P = {p1, p2, ..., pn} located in the two-dimensional plane.
The rectilinear distance between a pair of points pi = (xi, yi) and pj = (xj, yj) is equal to |xi − xj| + |yi − yj|.
The goal of the problem is to find a minimal-length tree that
interconnects all points in P and makes use of new points.
The set of new points will be denoted by S.
The problem of finding the Steiner tree then becomes the problem of finding a spanning tree on the point set P ∪ S.
The function 1-steiner takes the vertex and edge sets of a spanning tree as input and returns three values corresponding to the vertex and edge sets of the constructed 1-Steiner tree, and the decrease in tree length that was the result of adding one Steiner point.

All candidate points s are visited and the spanning tree for the points in P ∪ {s} is computed each time.
The point that leads to the cheapest tree is then selected.
Selecting the point s: an optimal rectilinear Steiner tree can always be embedded in the grid composed of only those grid lines that carry points of the set P.
These candidate points are commonly called Hanan points.

Spanning_update: it involves the incremental computation in linear time of the minimum spanning tree for the set P ∪ {s}, given the minimum spanning tree for the set P.
The four points to which point s may be connected are the closest ones in each of the four regions obtained by partitioning the plane by two lines crossing s at angles of +45 and −45 degrees.
In the pseudo-code, these four regions are called north, east, south and west, while the closest point to s from a point set V (excluding s itself) in a region r is computed by the function closest_point.

Spanning_update operates in linear time O(n).
Given the fact that the number of Hanan points is O(n^2), the worst-case time complexity of the function 1-steiner becomes O(n^3).
Because the function 1-steiner will be called at most O(n^2) times, the time complexity of the main function steiner can be stated to be O(n^5).
it happens very often that a minimum rectilinear Steiner tree problem
instance has many distinct optimal solutions.
It may also happen that solutions exist with Steiner points that are not
Hanan points.
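One round of the 1-Steiner idea can be sketched as follows; unlike the text's linear-time spanning_update, this sketch simply recomputes a Prim MST from scratch for every Hanan candidate:

```python
# One 1-Steiner step: build a rectilinear MST, then try every Hanan point
# and keep the one that shrinks the tree the most.
def rect(p, q):
    """Rectilinear (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def mst_length(points):
    """Length of a minimum spanning tree under rectilinear distance (Prim)."""
    points = list(points)
    in_tree, total = {points[0]}, 0
    while len(in_tree) < len(points):
        # cheapest connection from any outside point to the tree
        best = min((min(rect(p, q) for q in in_tree), p)
                   for p in points if p not in in_tree)
        total += best[0]
        in_tree.add(best[1])
    return total

def one_steiner(P):
    """Return (best gain, best Hanan point) for adding one Steiner point."""
    base = mst_length(P)
    xs, ys = {x for x, _ in P}, {y for _, y in P}   # the Hanan grid
    best_gain, best_s = 0, None
    for s in ((x, y) for x in xs for y in ys):
        if s not in P:
            gain = base - mst_length(P + [s])
            if gain > best_gain:
                best_gain, best_s = gain, s
    return best_gain, best_s

# three terminals: the Hanan point (1, 0) saves one unit of wire length
print(one_steiner([(0, 0), (2, 0), (1, 2)]))
```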
Local Transformations for Global Routing
Once Steiner trees for all nets have been generated independently,
congested areas
in the grid can be identified.
The trees of the nets contributing to the congestion can be reshaped by
applying local transformations.

ARCHITECTURAL SYNTHESIS
Architectural synthesis means constructing the macroscopic structure
of a digital circuit, starting from behavioral models that can be captured by data-flow
or sequencing
graphs.
Outcome of architectural synthesis:
1. a structural view of the circuit, in particular of its data path, and
2. a logic-level specification of its control unit.
The data path is an interconnection of resources (implementing arithmetic or logic functions), steering logic circuits (e.g., multiplexers and busses) that send data to the appropriate destination at the appropriate time, and registers or memory arrays to store data.
Example: structural view of the differential-equation integrator with one multiplier and one ALU.

Circuit implementations are evaluated on the basis of the following


objectives:
area,
cycle-time (i.e., the clock period)
latency (i.e., the number of cycles to perform all operations)
throughput (i.e., the computation rate)
Resource-dominated circuits
area and performance depend on the resources as well as on the
steering logic, storage circuits, wiring and control. A common
simplification is to consider area and performance as depending only
on the resources.
Circuits for which this assumption holds are called resource-dominated
circuits
Architectural design problem and subproblems:
Realistic design examples have trade-off curves that are not smooth, for two reasons.
First, the design space is a finite set
Second, there are several non-linear effects that are compounded in
determining the
objectives as a function of the structure of the circuit.

CIRCUIT SPECIFICATIONS FOR ARCHITECTURAL SYNTHESIS


Specifications for the architectural synthesis problem include behavioral-level circuit models, details about the resources being used, and constraints.
In the case of resource-dominated circuits, the area is determined only
by the resource usage.
Resources
Resources implement different types of functions in hardware.
They can be broadly classified as follows
1. Functional resources process data.
They implement arithmetic or logic functions and can be grouped into
two subclasses
1. Primitive resources are subcircuits that are designed carefully
once and often
used
2. Application-specific resources are subcircuits that solve a
particular subtask
2. Memory resources store data
3. Interface resources support data transfer
The major decisions in architectural synthesis are often related to the
usage of functional resources while neglecting the wiring space.

When architectural synthesis targets synchronous circuit implementations, as is often the case and as considered here, it is convenient to measure the performance of the resources in terms of the number of cycles required to execute the corresponding operation, which we call the execution delay.
Constraints
Interface constraints: They relate to the format and timing of the I/O
data transfers
The timing separation of I/O operations can be specified by timing constraints that determine when a synchronous I/O operation may execute.
resource binding constraint: a particular operation is required to be
implemented by a given resource (synthesis from partial structure)

THE FUNDAMENTAL ARCHITECTURAL SYNTHESIS PROBLEMS


A circuit is specified by:
1. A sequencing graph.
2. A set of functional resources (fully characterized in terms of area and
execution delays).
3. A set of constraints.
We assume that there are nops operations.
Sequencing graphs are polar and acyclic, the source and sink vertices being
labeled v0 and vn, where n = nops + 1.
The graph G(V, E) has vertex set V = {vi; i = 0, 1, ..., n} and edge set
E = {(vi, vj); i, j = 0, 1, ..., n} representing dependencies.
Architectural synthesis and optimization consists of two stages.
First, placing the operations in time and in space, (i.e., determining the
time interval for their execution and their binding to resources.)
Second, determining the detailed interconnections of the data path and
the logic-level specifications of the control unit.


Scheduling:
We denote the execution delays of the operations by the set
D = {di; i = 0, 1, ..., n}.
The delay of the source and sink vertices is zero.
The start times of the operations, represented by the set
T = {ti; i = 0, 1, ..., n}, are attributes of the vertices of the sequencing
graph.
The latency of a scheduled sequencing graph is denoted by λ, and it is the
difference between the start time of the sink and the start time of the
source.
A scheduled sequencing graph is a vertex-weighted sequencing
graph, where each vertex is labeled by its start time.

Two (or more) combinational operations in a sequence can be chained in the
same execution cycle if their overall propagation delay does not exceed the
cycle-time.
The Spatial Domain: Binding
A single resource type can implement more than one operation type.
With nres resource types, we denote the resource-type set by
{1, 2, ..., nres}.
The function T : V → {1, 2, ..., nres} denotes the resource type that can
implement an operation.
The binding problem can be extended to a resource selection (or module
selection) problem by assuming that there may be more than one resource
applicable to an operation (e.g., a ripple-carry and a carry-look-ahead
adder for an addition).
In this case T is a one-to-many mapping.
A simple case of binding is providing a dedicated resource. Each
operation is bound to one resource, and the resource binding B is a
one-to-one function.

A resource binding may associate one instance of a resource type to more
than one operation. In this case, that particular resource is shared and the
binding is a many-to-one function.
A necessary condition for a resource binding to produce a valid circuit
implementation is that the operations corresponding to a shared
resource do not execute concurrently.
A resource binding can be represented by a labeled hypergraph,
where the vertex set V represents operations and the edge set Eg
represents the binding of the operations to the resources.

A resource binding is compatible with a partial binding when its restriction
to the operations U is identical to the partial binding itself.
Common constraints on binding are upper bounds on the resource usage of each
type, denoted by {ak; k = 1, 2, ..., nres}.
These bounds represent the allocation of instances for each resource type.
A resource binding satisfies resource bounds {ak; k = 1, 2, ..., nres} when
B(vi) = (t, r) with r ≤ at for each operation vi; i = 1, 2, ..., nops.
Scheduling and binding provide us with an annotation of the sequencing
graph that can be used to estimate the area and performance of the
circuit.

Hierarchical Models:
A hierarchical schedule can be defined by associating a start time to
each vertex in each graph entity.
The start times are now relative to that of the source vertex in the
corresponding graph entity.
The latency computation of a hierarchical sequencing graph, with
bounded delay operations, can be performed by traversing the
hierarchy bottom up
Delay modeling:
1. The delay of a hierarchical vertex is the latency of the corresponding
graph entity.
2. The delay of a branching vertex is the maximum of the latencies of the
corresponding bodies.
3. The delay of an iteration vertex is the latency of its body times the
maximum number of iterations.

The Synchronization Problem


There are operations whose delay is not known at synthesis time
(data-dependent iteration).
Scheduling unbounded-latency sequencing graphs cannot be done
with traditional techniques
One solution is to modify the sequencing graph by isolating the
unbounded-delay operations and by splitting the graph into bounded-latency
subgraphs.

AREA AND PERFORMANCE ESTIMATION


A schedule provides the latency λ of a circuit for a given cycle-time.
A binding provides us with information about the area of a circuit.
these two objectives can be evaluated for scheduled and bound
sequencing graphs
Resource-Dominated Circuits
The area estimate of a structure is the sum of the areas of the bound
resource instances.
the total area is a weighted sum of the resource usage.
A binding fully specifies the total area, but it is not necessary to know
the binding to determine the area: it is sufficient to know how many
instances of each resource type are used.
This is called the resource area.
The latency of a circuit can be determined by its schedule.
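The resource-area computation above can be sketched in a few lines; the resource types, per-instance areas and instance counts below are hypothetical values for illustration only:

```python
# Resource-dominated area estimate: a weighted sum of the resource usage.
# All numbers are illustrative placeholders, not from the text.
resource_area = {"multiplier": 2500, "alu": 800, "register": 50}  # area per instance
instances = {"multiplier": 2, "alu": 1, "register": 4}            # a_k: instances used

total_area = sum(instances[k] * resource_area[k] for k in instances)
print(total_area)  # 2*2500 + 1*800 + 4*50 = 6000
```

Note that only the instance counts matter; no binding information is needed.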

STRATEGIES FOR ARCHITECTURAL OPTIMIZATION


Architectural optimization comprises scheduling and binding.
Complete architectural optimization is applicable to circuits that can
be modeled by sequencing (or equivalent) graphs.
Thus the goal of architectural optimization is to determine a
scheduled sequencing graph with a complete resource binding that
satisfies the given constraints.
Partial architectural optimization problems arise in connection
with circuit models that either fully specify the timing behavior or fully
characterize the resource usage.

It is obvious that any circuit model in terms of a scheduled and bound
sequencing graph does not require any optimization at all, because the
desired point of the design space is already prescribed.
Architectural optimization consists of determining a schedule and a binding
that optimize the objectives (area, latency, cycle-time).
Architectural exploration is often done by exploring the (area / latency)
trade-off for different values of the cycle-time. This approach is motivated
by the fact that the cycle-time may be constrained to attain one specific
value.
Another approach is the search for the (cycle-time / latency) trade-off for
some binding, or the (area / cycle-time) trade-off for some schedules.

Area Latency Optimization


For resource-dominated circuits, given the cycle-time, the execution delays
can be determined.
scheduling problems provide the framework for determining the (area /
latency) trade-off points.
solutions to the minimum-latency scheduling problem and to the
minimum resource scheduling problem provide the extreme points of
the design space.
Intermediate solutions can be found by solving resource constrained
minimum-latency scheduling problems or latency-constrained minimum
resource scheduling problems.
In general circuits: area and latency can be determined by binding and
scheduling, but the two problems are now deeply interrelated.
Binding is affected by scheduling, because the amount of resource
sharing depends on the concurrency of the operations.
CAD systems for architectural optimization perform either scheduling
followed by binding or vice versa.

Scheduling before binding: most approaches to architectural synthesis
perform scheduling before binding. Such an approach fits well with processor
and DSP designs, because those circuits often are resource dominated.
Binding before scheduling: performing binding before scheduling permits the
characterization of the steering logic and a more precise evaluation of the
delays.
No resource constraints are required in scheduling, because the
resource usage is determined by binding.
In this case, resource sharing requires that no operation pair with shared
resources executes concurrently.
This approach best fits the synthesis of those ASIC circuits that are
control dominated and where the steering logic parameters can be
comparable to those of some application-specific resource.

Cycle-Time/ Latency Optimization:


scheduling with chaining can be performed by considering the
propagation delays of the resources.
Retiming: When the resources are combinational in nature, the
problem reduces to determining the register boundaries that optimize
the cycle time.
This problem has been referred to as retiming
The formulation and its solution can be extended to cope with
sequential resources
by modelling them as interconnections of a combinational component
and registers.

Cycle-Time/ Area Optimization:


Consider now scheduled sequencing graphs where latency is fixed: we are
solving either a partial synthesis problem or the binding problem after
scheduling.
This problem is not relevant for resource-dominated circuits, because
changing the binding does not affect the cycle-time.
for general circuits, cycle-time is bounded by the delays in the
resources, steering logic, etc.
Here we assume that only delays in the steering logic matter.

SCHEDULING ALGORITHMS
A sequencing graph prescribes only dependencies among the operations;
the scheduling of a sequencing graph determines the precise start time of
each task.
Sequencing and concurrency
The start times must satisfy the original dependencies of the sequencing
graph, which limit the amount of parallelism of the operations,
because any pair of operations related by a sequence dependency (or by
a chain of dependencies) may not execute concurrently.
Impact on area:
the maximum number of concurrent operations of any given type at any
step of the schedule is a lower bound on the number of required
hardware resources of that type. Therefore the choice of a schedule
affects also the area of the implementation.

A MODEL FOR THE SCHEDULING PROBLEMS


A sequencing graph is a polar directed acyclic graph G(V, E), where the
vertex set V = {vi; i = 0, 1, ..., n} is in one-to-one correspondence with
the set of operations and the edge set E = {(vi, vj); i, j = 0, 1, ..., n}
represents dependencies.
Here n = nops + 1 and we denote the source vertex by v0 and the sink by vn;
both are No-Operations.
Let D = {di; i = 0, 1, ..., n} be the set of operation execution delays.
The execution delays of the source and sink vertices are both zero, i.e.,
d0 = dn = 0.
We denote by T = {ti; i = 0, 1, ..., n} the start times of the operations,
i.e., the cycles in which the operations start.
The sequencing graph requires that the start time of an operation be at
least as large as the start time of each of its direct predecessors plus the
predecessor's execution delay: ti ≥ tj + dj for every edge (vj, vi) ∈ E.

the latency of the schedule equals the weight of the longest path from
source to sink.
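As a small illustration, the latency can be computed as the weight of the longest source-to-sink path with one pass in topological order; the toy graph below (delays and edges) is made up for the example:

```python
from collections import defaultdict

# Hypothetical sequencing graph: 0 = source, 4 = sink, vertex-weighted by delay.
delays = {0: 0, 1: 2, 2: 1, 3: 1, 4: 0}
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
succ = defaultdict(list)
for u, v in edges:
    succ[u].append(v)

# One relaxation pass in topological order (vertex ids are already topological):
# start[v] = max over predecessors u of start[u] + d_u.
start = {v: 0 for v in delays}
for u in sorted(delays):
    for v in succ[u]:
        start[v] = max(start[v], start[u] + delays[u])

latency = start[4] - start[0]   # weight of the longest source-to-sink path
print(latency)
```

The start times produced this way are also the ASAP schedule of the graph.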
Unconstrained minimum-latency scheduling problem: given a set of operations
V with integer delays D and a partial order on the operations E, find an
integer labeling of the operations φ : V → Z+ such that ti = φ(vi),
ti ≥ tj + dj for every edge (vj, vi) ∈ E, and tn is minimum.
Resource-constrained scheduling problem: given a set of operations V with
integer delays D, a partial order on the operations E and upper bounds
{ak; k = 1, 2, ..., nres}, find an integer labeling of the operations
φ : V → Z+ that satisfies the sequencing dependencies while no more than ak
operations of type k execute in any schedule step.

SCHEDULING WITHOUT RESOURCE CONSTRAINTS


Unconstrained scheduling is applied when dedicated resources are used.
It is also used when resource binding is done prior to scheduling; resource
conflicts are then solved by serializing the operations that share the same
resource.
The minimum latency of a schedule under some resource constraint is
obviously at least as large as the latency computed with unlimited
resources. That is why unconstrained scheduling can be used to derive a
lower bound on latency for constrained problems.

Unconstrained Scheduling: The ASAP Scheduling Algorithm


We denote by tS the start times computed by the ASAP algorithm, a vector
whose entries are {tiS; i = 0, 1, ..., n}.


Latency-Constrained Scheduling: The ALAP Scheduling Algorithm
An upper bound on the latency is given, denoted by λ̄.
The problem can be solved by executing the ASAP scheduling algorithm and
verifying that the resulting latency satisfies λ ≤ λ̄.
The ASAP scheduling algorithm yields the minimum values of the start times;
the as-late-as-possible (ALAP) scheduling algorithm provides the
corresponding maximum values.
Mobility (or slack) is the difference of the start times computed by the
ALAP and ASAP algorithms, namely μi = tiL − tiS; i = 0, 1, ..., n.

Zero mobility implies that an operation can be started only at one given
time
step in order to meet the overall latency constraint.
When the mobility is larger than zero, it measures the span of the time
interval in which the operation may be started.
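The two algorithms and the resulting mobilities can be sketched together on a toy unit-delay graph (the vertex numbering, delays and edges are made up for the example):

```python
from collections import defaultdict

# Hypothetical sequencing graph: unit delays, 0 = source, 5 = sink.
delays = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1, 5: 0}
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (0, 4), (3, 5), (4, 5)]
succ, pred = defaultdict(list), defaultdict(list)
for u, v in edges:
    succ[u].append(v)
    pred[v].append(u)
order = sorted(delays)               # vertex ids form a topological order here

def asap():
    """Earliest start times: forward pass over the dependencies."""
    t = {v: 0 for v in delays}
    for u in order:
        for v in succ[u]:
            t[v] = max(t[v], t[u] + delays[u])
    return t

def alap(lam):
    """Latest start times under latency bound lam: backward pass."""
    t = {v: lam for v in delays}
    for u in reversed(order):
        for v in pred[u]:
            t[v] = min(t[v], t[u] - delays[v])
    return t

tS = asap()
tL = alap(tS[5])                     # latency bound = minimum latency
mobility = {v: tL[v] - tS[v] for v in delays}
print(tS, tL, mobility)
```

Here only operation 4 has nonzero mobility: it may start at step 0 or 1 without violating the latency bound.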

Scheduling Under Timing Constraints:


can be generalized to the case in which deadlines need to be met by
other operations
A further generalization is considering relative timing constraints that
bind the time separation between operations pairs, regardless of their
absolute value.
absolute timing constraints can be seen as constraints relative to the
source operation
The combination of maximum and minimum timing constraints
permits us to specify the exact distance in time between two
operations
Relative timing constraints are positive integers specified for some
operation pair (vi, vj):
a minimum timing constraint lij requires tj ≥ ti + lij;
a maximum timing constraint uij requires tj ≤ ti + uij.
A consistent modeling of minimum and maximum timing constraints uses a
constraint graph, whose edges are weighted by the delay of the operation
corresponding to their tail.
Additional edges are related to the timing constraints:
For every minimum timing constraint lij, we add a forward edge (vi, vj) in
the constraint graph with weight equal to the minimum value lij.
For every maximum timing constraint uij, we add a backward edge (vj, vi) in
the constraint graph with weight equal to the opposite of the maximum value:
wji = −uij.
The requirement of an upper bound on the time distance between the start
times of two operations may be inconsistent with the time required to
execute the first operation, plus possibly the time required by any sequence
of operations in between.
The longest weighted path in the constraint graph between vi and vj
(that determines the minimum separation in time between operations
vi and vj) must be less than or equal to the maximum timing
constraint uij.
Any cycle in the constraint graph including edge (vi, vj) must have negative
or zero weight; equivalently, the constraint graph must not have positive
cycles.
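The consistency test above can be sketched with a Bellman-Ford-style longest-path relaxation: the constraints are consistent iff no positive-weight cycle exists. The two-vertex instances below are hypothetical toy data:

```python
# Sketch: consistency check of a constraint graph.
# Forward edges carry delays / minimum constraints; a maximum constraint
# u_ij contributes a backward edge (vj, vi) with weight -u_ij.
def has_positive_cycle(n, edges):
    """edges: list of (tail, head, weight). Longest-path relaxation:
    if distances keep growing after n passes, a positive cycle exists."""
    dist = [0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True

# v1 starts at least 2 cycles after v0, and at most 3 cycles after v0
# (backward edge with weight -3): consistent, no positive cycle.
assert has_positive_cycle(2, [(0, 1, 2), (1, 0, -3)]) is False
# Tightening the maximum constraint to 1 cycle makes it inconsistent.
assert has_positive_cycle(2, [(0, 1, 2), (1, 0, -1)]) is True
```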

Relative Scheduling

We assume that operations issue completion signals when execution is
finished.
A start signal to the source vertex is also its completion signal.
Consider a sequencing graph G(V, E) where a subset of the vertices has
unspecified execution delay. Such vertices, as well as the source vertex,
provide a frame of reference for determining the start time of the
operations.
Anchors
The anchors of a constraint graph G(V, E) consist of the source vertex v0
and of all vertices with unbounded delay.
The start time and stop time of the operations cannot be determined on an
absolute scale; the schedule of the operations is relative to the anchors.
A defining path p(a, vi) from anchor a to vertex vi is a path in G(V, E)
with one and only one unbounded weight, da.
The relevant anchor set R(vi) of a vertex vi is the subset of anchors
affecting its start time:
when considering one path only and when anchors are cascaded along the path,
only the last one affects the start time of the operation at the head of the
path.
An anchor a is redundant for vertex vi when there is another relevant anchor
b ∈ R(vi) through which the effect of a on the start time of vi is already
accounted for.
For any given vertex vi the irredundant relevant anchor set
represents the smallest subset of anchors that affects the start time
of that vertex.
Let tia be the schedule of operation vi with respect to anchor a, computed
on the polar subgraph induced by anchor a and its successors, assuming that
a is the source of the subgraph and that all anchors have zero execution
delay. Then the start time of vi is the maximum, over its irredundant
relevant anchors a, of ta + da + tia.

If there are no operations with unbounded delays, then the start times of
all operations are specified in terms of time offsets from the source
vertex, which reduces to the traditional scheduling formulation.

Thus relative scheduling consists of the computation of the offset values
tia for all irredundant relevant anchors a of each vertex vi;
i = 0, 1, ..., n.

RELATIVE SCHEDULING UNDER TIMING CONSTRAINTS


The constraint graph formulation applies, although the weights on the edges
whose tails are anchors are unknown.
In this case a schedule may or may not exist under the timing constraints.
It is important to be able to assess the existence of a schedule for any
value of the unbounded delays (these values are not known when the schedule
is computed).
Feasibility: A constraint graph is feasible if all timing constraints are
satisfied when the execution delays of the anchors are zero
well-posed graph: A constraint graph is well-posed if it can be
satisfied for all values of the execution delays of the anchors
A feasible constraint graph Gc(Vc, Ec) is well-posed, or it can be made
well-posed, if and only if no cycles with unbounded weight exist in
Gc(Vc, Ec).
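The well-posedness test reduces to a reachability check: for every unbounded-weight edge (a, v) leaving an anchor, no path may lead from v back to a. A minimal sketch on made-up toy data:

```python
from collections import defaultdict

# Sketch: well-posedness of a feasible constraint graph (toy data).
# A graph is well-posed iff no cycle contains an unbounded (anchor) edge,
# i.e., the head of each unbounded edge cannot reach its tail.
def well_posed(edges, unbounded):
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(succ[u])
        return False

    return all(not reaches(v, a) for a, v in unbounded)

edges = [(0, 1), (1, 2), (2, 3)]
assert well_posed(edges, unbounded=[(1, 2)]) is True
# A backward (maximum timing) edge from 3 to 1 closes a cycle through the
# unbounded edge (1, 2): the graph is no longer well-posed.
assert well_posed(edges + [(3, 1)], unbounded=[(1, 2)]) is False
```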

SCHEDULING WITH RESOURCE CONSTRAINTS


The solution of scheduling problems under resource constraints provides
a means for computing the (area/ latency) trade-off points.
The Integer Linear Programming Model
A formal model of the scheduling problem under resource constraints can be
achieved by using binary decision variables with two indices:
X = {xil; i = 0, 1, ..., n; l = 1, 2, ..., λ̄ + 1}.
The number λ̄ represents an upper bound on the latency, because the schedule
latency is unknown.
A binary variable xil is 1 only when operation vi starts in step l of the
schedule, i.e., l = ti.
Upper and lower bounds on the start times can be computed by the ASAP and
ALAP algorithms on the corresponding unconstrained problem.
Thus xil is necessarily zero for l < tiS and for l > tiL.

The start time of each operation is unique:
Σl xil = 1 for i = 0, 1, ..., n.
The start time of any operation vi can be stated in terms of xil:
ti = Σl l · xil.
The sequencing relations represented by Gs(V, E) must be satisfied:
Σl l · xil ≥ Σl l · xjl + dj for every edge (vj, vi) ∈ E.
The number of operations of type k executing at step l must be lower than or
equal to the upper bound ak:
Σ{i: T(vi) = k} Σ{m = l − di + 1, ..., l} xim ≤ ak for every type k and
every step l.

Let us denote by t the vector whose entries are the start times.
Then the minimum-latency scheduling problem under resource constraints can
be stated as minimizing cT·t subject to the constraints above.
Choosing the vector c = [0, ..., 0, 1]T corresponds to minimizing the
latency of the schedule, because in that case cT·t = tn.
This model can be enhanced to support relative timing constraints by adding
the corresponding constraint inequalities in terms of the variables X.
For example, a maximum timing constraint uij on the start times of
operations vi, vj can be expressed as Σl l · xjl ≤ Σl l · xil + uij.

Minimum-resource scheduling problem under latency constraints:
the optimization goal is a weighted sum of the resource usage represented by
the vector a.
Hence the objective function can be expressed by cT·a, where c is a vector
whose entries are the individual resource (area) costs.
The latency constraint bounds the start time of the sink:
Σl l · xnl ≤ λ̄ + 1.
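To make the model concrete, the sketch below checks the same constraints on a toy instance; brute-force enumeration of start times stands in for an ILP solver, and all data (operations, types, bounds) are hypothetical:

```python
from itertools import product

# Toy instance: three unit-delay operations, two of type "mult" sharing a
# single multiplier. All data are illustrative placeholders.
delays = {1: 1, 2: 1, 3: 1}
optype = {1: "mult", 2: "mult", 3: "alu"}
deps = [(1, 3), (2, 3)]              # edges (v_j, v_i): v3 depends on v1, v2
bound = {"mult": 1, "alu": 1}        # resource bounds a_k
LMAX = 4                             # upper bound on the latency

best = None
for t in product(range(LMAX), repeat=3):
    start = dict(zip(delays, t))
    # sequencing constraints: t_i >= t_j + d_j for every edge (v_j, v_i)
    if any(start[i] < start[j] + delays[j] for j, i in deps):
        continue
    # resource bounds: type-k operations executing at step l cannot exceed a_k
    if all(sum(1 for v in start
               if optype[v] == k and start[v] <= l < start[v] + delays[v]) <= bound[k]
           for k in bound for l in range(LMAX)):
        latency = max(start[v] + delays[v] for v in start)
        best = latency if best is None else min(best, latency)
print(best)
```

With one multiplier, v1 and v2 must be serialized, so the minimum latency is one step larger than in the unconstrained case.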

Multiprocessor Scheduling and Hu's Algorithm


In the resource-constrained scheduling problem, when we assume that all
operations can be solved by the same type of resource, the problem is often
referred to as a precedence-constrained multiprocessor scheduling problem.
Assume further that all operations have unit execution delay; the scheduling
problem can then be restated under these assumptions.

We first compute a lower bound on the number of resources required to
schedule a graph with a latency constraint under these assumptions.
A labeling of a sequencing graph consists of marking each vertex with the
weight of its longest path to the sink, measured in terms of edges; the
label of vertex vi is denoted by αi.
p(j) is the number of vertices with label equal to j.


The latency λ̄ is greater than or equal to the weight of the longest path.
A lower bound on the number of resources needed to complete a schedule with
latency λ̄ is
amin = max over positive integers γ of ⌈ (Σ{j ≥ λ̄ + 1 − γ} p(j)) / γ ⌉,
since all operations with labels of at least λ̄ + 1 − γ must be scheduled
within the first γ steps.
Suppose, by contradiction, that a schedule with a < amin resources satisfies
the latency bound.

The vertices scheduled up to step l cannot exceed a · l.
At schedule step γ the scheduled vertices are therefore at most a · γ,
which by the choice of γ is smaller than the number of vertices with labels
of at least λ̄ + 1 − γ.
This implies that at least a vertex with label λ̄ + 1 − γ or larger has not
been scheduled yet.
Therefore, to schedule the remaining portion of the graph we need at least
λ̄ + 1 − γ steps. Thus, the schedule length is at least
γ + (λ̄ + 1 − γ) = λ̄ + 1, which contradicts our hypothesis of satisfying a
latency bound of λ̄.

Let ā denote the upper bound on the resource usage.
The algorithm can always achieve a latency λ̄ with ā resources; hence the
algorithm also achieves the minimum-latency schedule under resource
constraints.
The algorithm schedules ā operations at each step, starting from the first
one until a critical step, after which fewer than ā operations can be
scheduled due to the precedence constraints.
We denote by c the critical step; then c + 1 is the first step in which
fewer than ā operations are scheduled.
The vertices scheduled up to the critical step are ā · c, and those
scheduled up to step c + 1 are ā · (c + δ), where 0 ≤ δ < 1.
We denote by γ' the largest integer such that all vertices with labels
larger than or equal to λ̄ + 1 − γ' have been scheduled up to the critical
step and the following one.
Then λ̄ − γ' schedule steps are used by the algorithm to schedule the
remaining operations after step c + 1.

Hu's algorithm achieves the latency bound with as many resources as the
lower bound ā.
Using the bound above and recalling that λ̄ − γ' schedule steps are used by
the algorithm to schedule the remaining operations after step c + 1, the
total number of steps used by the algorithm is c + 1 + (λ̄ − γ'), which can
be shown not to exceed λ̄.
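A minimal sketch of Hu's greedy rule (single resource type, unit delays; the graph below is a made-up example) labels each vertex with its longest path to the sink and starts up to a ready operations per step, highest label first:

```python
from collections import defaultdict

# Hypothetical sequencing graph (source/sink no-ops omitted for brevity).
ops = [1, 2, 3, 4, 5, 6]
edges = [(1, 4), (2, 4), (3, 5), (4, 6), (5, 6)]
succ, pred = defaultdict(list), defaultdict(list)
for u, v in edges:
    succ[u].append(v)
    pred[v].append(u)

# Label = length (in vertices here) of the longest path to the sink.
label = {}
for v in reversed(ops):              # ids happen to be a reverse topological order
    label[v] = 1 + max((label[s] for s in succ[v]), default=0)

def hu_schedule(a):
    """Greedily start up to `a` ready operations per step, highest label first."""
    sched, t = {}, 0
    while len(sched) < len(ops):
        ready = [v for v in ops if v not in sched
                 and all(sched.get(p, t) < t for p in pred[v])]
        for v in sorted(ready, key=lambda v: -label[v])[:a]:
            sched[v] = t
        t += 1
    return sched, t                   # schedule and number of steps used

sched, latency = hu_schedule(a=2)
print(sched, latency)
```

With a = 2 the three label-3 operations cannot all start in the first step, so one of them (and the chain below it) is deferred.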

Heuristic Scheduling Algorithms:


List Scheduling
Consider first the problem of minimizing latency under resource constraints,
represented by the vector a.
Here the algorithm has to handle multiple operation types and multiple-cycle
execution delays.
The candidate operations Ul,k are those operations of type k whose
predecessors have already been scheduled early enough, so that the
corresponding operations are completed at step l.
The unfinished operations Tl,k are those operations of type k that started
at earlier cycles and whose execution is not finished at step l.
When the execution delays are all 1, the set of unfinished operations is
empty.
A priority list of the operations is used in choosing among the candidates
(a common priority is the label of each vertex, i.e., its longest-path
weight).
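The step-by-step loop over candidates and unfinished operations can be sketched as follows; the instance (two 2-cycle multiplications feeding two 1-cycle ALU operations) is hypothetical, and the priority function is omitted for brevity:

```python
# Sketch of list scheduling for minimum latency under resource bounds.
# Toy data: v1, v2 are 2-cycle "mult" ops; v3, v4 are 1-cycle "alu" ops.
delays = {1: 2, 2: 2, 3: 1, 4: 1}
optype = {1: "mult", 2: "mult", 3: "alu", 4: "alu"}
pred = {1: [], 2: [], 3: [1, 2], 4: [3]}
bound = {"mult": 1, "alu": 1}        # resource bounds a_k

sched, l = {}, 0
while len(sched) < len(delays):
    for k in bound:
        # unfinished ops T_{l,k}: started earlier, still executing at step l
        busy = sum(1 for v in sched
                   if optype[v] == k and sched[v] + delays[v] > l)
        # candidates U_{l,k}: type-k ops whose predecessors completed by step l
        ready = [v for v in delays if v not in sched and optype[v] == k
                 and all(p in sched and sched[p] + delays[p] <= l
                         for p in pred[v])]
        for v in ready[:bound[k] - busy]:   # fill the free resources
            sched[v] = l
    l += 1

latency = max(sched[v] + delays[v] for v in sched)
print(sched, latency)
```

With a single multiplier the two multiplications are serialized, which dominates the final latency.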

List scheduling can also be applied to minimize the resource usage under
latency constraints.
At the beginning, one resource per type is assumed, i.e., a is a vector with
all entries set to 1.
For this problem, the slack of an operation is used to rank the operations,
where the slack is the difference between the latest possible start time
(computed by an ALAP schedule) and the index of the schedule step under
consideration.
The lower the slack, the higher the urgency of the operation in the list.

Heuristic Scheduling Algorithms:


Force-directed Scheduling
The time frame of an operation is the time interval where it can be
scheduled.
The width of the time frame of an operation is equal to its mobility plus 1.
The operation probability is a function that is zero outside the
corresponding time frame and is equal to the reciprocal of the frame width
inside it.
We denote the probability of the operations at time l by
{pi(l); i = 0, 1, ..., n}.
Operations whose time frame is one unit wide are bound to start in one
specific time step.
For the remaining operations, the larger the width, the lower the
probability that the operation is scheduled in any given step inside the
corresponding time frame.
The type distribution is the sum of the probabilities of the operations
implementable by a specific resource type in the set {1, 2, ..., nres} at
any time step of interest.

A distribution graph is a plot of an operation-type distribution over the
schedule steps.
distribution graphs show the likelihood that a resource is used at each
schedule step.
A uniform plot in a distribution graph means that a type is evenly
scattered in the schedule and it relates to a good measure of utilization
of that resource.
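The probabilities and the type distribution can be sketched directly from the time frames; the frames and types below are made-up example data (in practice they come from the ASAP/ALAP bounds):

```python
# Sketch: operation probabilities and a type-distribution graph (toy data).
frames = {1: (0, 0), 2: (0, 1), 3: (1, 2)}   # hypothetical [tS, tL] per op
optype = {1: "mult", 2: "mult", 3: "mult"}

def prob(v, l):
    """Operation probability: 1 / frame width inside the frame, 0 outside."""
    lo, hi = frames[v]
    return 1 / (hi - lo + 1) if lo <= l <= hi else 0.0

# Type distribution q_k(l): sum of probabilities of type-k ops at step l.
q = {l: sum(prob(v, l) for v in frames if optype[v] == "mult")
     for l in range(3)}
print(q)   # {0: 1.5, 1: 1.0, 2: 0.5}
```

The plot of q over the steps is the distribution graph; here multiplier demand is concentrated at step 0.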

In force-directed scheduling the selection of a candidate operation to be
scheduled in a given time step is done using the concept of force.

The force exerted by an elastic spring is proportional to the displacement
of its end points; the elastic constant is the proportionality factor.
We assume in this section that operations have unit delays.
The assignment of an operation to a step is chosen while considering all
forces relating it to the schedule steps in its time frame:
the forces relating an operation to the different possible control steps
where it can be scheduled are called self-forces;
the forces related to the operation dependencies are called
predecessor/successor forces.

Self-forces:
let us consider operation vi of type k = T(vi) when scheduled in step l.
The force relating that operation to a step m ∈ [tiS, tiL] is equal to the
type distribution qk(m) times the variation in probability, which is
1 − pi(m) for m = l and −pi(m) otherwise.
The self-force is the sum of the forces relating that operation to all
schedule steps in its time frame.

Assigning an operation to a specific step may reduce the time frame of other
operations.
Therefore, the effects implied by one assignment must be taken into account
by considering the predecessor/successor forces, which are the forces on
other operations linked by dependency relations.
The predecessor/successor forces are computed by evaluating the variation of
the self-forces of the predecessors/successors due to the restriction of
their time frames.
Let [tiS, tiL] be the initial time frame and [t̄iS, t̄iL] be the reduced
one.

The total force on an operation related to a schedule step is computed by
adding to its self-force the predecessor/successor forces.

The selected candidates are determined by reducing iteratively the candidate
set Ul,k by deferring those operations with the least force until the
resource bound is met.
The algorithm considers the operations one at a time for scheduling, as
opposed to the strategy of considering each schedule step at a time as done
by list scheduling.
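The self-force computation can be sketched numerically; the frame, probabilities and distribution values below are made-up toy numbers (unit delays assumed, as above):

```python
# Sketch: self-force of assigning an operation to a step (toy numbers).
# force(l) = sum over steps m in the time frame of q_k(m) * variation(m),
# where variation(m) = 1 - p(m) if m == l else -p(m).
def self_force(l, frame, p, q):
    lo, hi = frame
    return sum(q[m] * ((1.0 if m == l else 0.0) - p[m])
               for m in range(lo, hi + 1))

# Operation with time frame [0, 1], so p = 0.5 at both steps; the type
# distribution q is an assumed example.
q = {0: 1.5, 1: 1.0}
p = {0: 0.5, 1: 0.5}
print(self_force(0, (0, 1), p, q))   # 1.5*0.5 + 1.0*(-0.5) = +0.25
print(self_force(1, (0, 1), p, q))   # 1.5*(-0.5) + 1.0*0.5 = -0.25
```

The negative force for step 1 indicates the favorable assignment: it moves the operation away from the congested step 0.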

Boolean algebra
Boolean algebra is defined by the set B = {0, 1} and by two operations,
denoted by + and ·.
The multi-dimensional space spanned by n binary-valued Boolean variables is
denoted by B^n; it is often referred to as the n-dimensional cube.
A point in B^n is represented by a binary-valued vector of dimension n.
A literal is an instance of a variable or of its complement.
A product of n literals denotes a point in the Boolean space: it is a
zero-dimensional cube.
An n-input, m-output function is a mapping f : B^n → B^m.

An incompletely specified scalar Boolean function is defined over a subset
of B^n. The points where the function is not defined are called don't care
conditions.
In the case of multiple-output functions (i.e., m > 1), don't care points
may differ for each component, because different outputs may be sampled
under different conditions.
Incompletely specified functions are represented as mappings
f : B^n → {0, 1, *}^m.
The subsets of the domain for which the function takes the values 0, 1 and *
are called the off set, on set and dc set, respectively.

The cofactor of f(x1, x2, ..., xi, ..., xn) with respect to variable xi is
fxi = f(x1, x2, ..., 1, ..., xn).
The cofactor of f(x1, x2, ..., xi, ..., xn) with respect to variable xi' is
fxi' = f(x1, x2, ..., 0, ..., xn).

The Boolean difference of f with respect to xi, ∂f/∂xi = fxi ⊕ fxi',
indicates whether f is sensitive to changes in input xi.
The consensus of a function with respect to a variable, fxi · fxi',
represents the component that is independent of that variable.
The smoothing of a function with respect to a variable, fxi + fxi',
corresponds to deleting all appearances of that variable.
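These operators can be sketched on a small function; the example function and the callable-based representation below are illustrative only:

```python
# Sketch: cofactors, Boolean difference and consensus for f = x1*x2 + x3,
# with functions represented as Python callables.
def f(x1, x2, x3):
    return (x1 and x2) or x3

def cofactor(fn, i, val):
    """Fix the i-th argument (0-based) of fn to the constant val."""
    return lambda *args: fn(*args[:i], val, *args[i:])

f_x1  = cofactor(f, 0, True)     # cofactor with respect to x1  (x1 = 1)
f_x1n = cofactor(f, 0, False)    # cofactor with respect to x1' (x1 = 0)

# Boolean difference wrt x1: f_x1 XOR f_x1'; consensus: f_x1 AND f_x1'.
diff      = lambda x2, x3: f_x1(x2, x3) != f_x1n(x2, x3)
consensus = lambda x2, x3: f_x1(x2, x3) and f_x1n(x2, x3)

print(diff(True, False))        # f is sensitive to x1 when x2=1, x3=0
print(consensus(False, True))   # x3 alone makes f true regardless of x1
```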


Let φi, i = 1, 2, ..., k, be a set of Boolean functions forming an
orthonormal basis.
Let f, g be two Boolean functions expressed by an expansion with respect to
the same orthonormal basis.
Let ⊙ be an arbitrary binary operator representing a Boolean function of two
arguments. Then f ⊙ g can be computed by applying ⊙ to the corresponding
expansion coefficients.

Representations of Boolean Functions


There are different ways of representing Boolean functions, which can be
classified into tabular forms, logic expressions and binary decision
diagrams.
Tabular form:
A complete listing of all points in the Boolean input space and of the
corresponding values of the outputs.
The input part is the set of all row vectors in B^n and it can be omitted if
a specific ordering of the points of the domain is assumed.
The output part is the set of corresponding row vectors in {0, 1, *}^m.
Since the size of a truth table is exponential in the number of inputs,
truth tables are used only for functions of small size.
A multiple-output implicant (in tabular form) has input and output parts.
The input part represents a cube in the domain and is a row vector in
{0, 1, *}^n.
The output part has a symbol in correspondence to each output, denoting
whether the corresponding input implies a value for that output.
EXPRESSION FORMS.
Scalar Boolean functions can be represented by expressions of literals
linked by the + and . operators.
Single-level forms use only one operator
Standard two-level forms are sum of products of literals and product of
sums of literals
BINARY DECISION DIAGRAMS
A binary decision diagram represents a set of binary-valued decisions,
culminating in an overall decision that can be either TRUE or FALSE.

Isomorphic OBDD:
Two OBDDs are isomorphic if there is a one-to-one mapping between
the vertex
sets that preserves adjacency, indices and leaf values.
ROBDD
An OBDD is said to be a reduced OBDD (or ROBDD) if it contains no vertex v
with low(v) = high(v), nor any pair {v, u} such that the subgraphs rooted in
v and in u are isomorphic.
(All redundancies have been eliminated from the diagram.)

The unique table stores the ROBDD information in a strong canonical form.
The computed table is used to improve the performance of the algorithm.
The relevant terminal cases of the ite recursion are ite(f, 1, 0) = f,
ite(1, g, h) = g, ite(0, g, h) = h, ite(f, g, g) = g and ite(f, 0, 1) = f'.
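A minimal ite-based BDD can be sketched as follows. This is an illustrative toy, not the textbook's data structures: it keeps a unique table for canonicity but omits the computed table and complement edges:

```python
# Sketch: a tiny ite-based ROBDD. Leaves are node ids 0 and 1; internal
# nodes are triples (var, low, high) kept canonical via the unique table.
unique = {}                          # (var, low, high) -> node id
node = {0: None, 1: None}            # node id -> (var, low, high)
var_of = {0: float("inf"), 1: float("inf")}   # leaves sort after variables

def mk(v, lo, hi):
    if lo == hi:                     # redundant test: eliminate the node
        return lo
    key = (v, lo, hi)
    if key not in unique:
        nid = len(node)
        node[nid] = key
        var_of[nid] = v
        unique[key] = nid
    return unique[key]

def cof(f, v, val):
    """Cofactor of node f with respect to variable v."""
    if var_of[f] != v:
        return f
    _, lo, hi = node[f]
    return hi if val else lo

def ite(f, g, h):
    # terminal cases of the recursion
    if f == 1: return g
    if f == 0: return h
    if g == h: return g
    if g == 1 and h == 0: return f
    v = min(var_of[f], var_of[g], var_of[h])      # top variable
    t = ite(cof(f, v, 1), cof(g, v, 1), cof(h, v, 1))
    e = ite(cof(f, v, 0), cof(g, v, 0), cof(h, v, 0))
    return mk(v, e, t)

x0 = mk(0, 0, 1)                     # the function x0
x1 = mk(1, 0, 1)                     # the function x1
f_and = ite(x0, x1, 0)               # x0 AND x1, built via ite
f_or  = ite(x0, 1, x1)               # x0 OR  x1
print(f_and == ite(x1, x0, 0))       # canonicity: same function, same node
```

Because every node passes through the unique table, equivalent functions always map to the same node id, which is what makes equivalence checking a constant-time comparison.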

TWO-LEVEL COMBINATIONAL LOGIC OPTIMIZATION


Two-level logic optimization is important because:
1. it has a direct impact on macro-cell design styles using
programmable-logic arrays (PLAs);
2. it is of key importance to multiple-level logic design;
3. it provides a formal way of processing the representation of systems
that can be described by logic functions.
LOGIC OPTIMIZATION PRINCIPLES
The objective of two-level logic minimization is to reduce the size
of a Boolean function in either sum of products or product of sums
form
goals of logic optimization may vary slightly, according to the
implementation styles
Ex: Each row of a PLA is in one-to-one correspondence with a
product term of the sum of products representation.
The primary goal of logic minimization is the reduction of terms

Logic minimization of single-output and multiple-output functions


follows the same principle, but the latter task is obviously more
complex.
completely specified functions are the special case of functions with no
don't care conditions.
A function f : B^n → {0, 1, *}^m can be represented in several ways.
For each output i = 1, 2, ..., m we define the corresponding on set, off set
and dc set as the subsets of B^n whose image under the ith component of f
(i.e., under fi) is 1, 0 and *, respectively.
multiple-output implicant: it combines an input pattern with an
implied value of the function.
A multiple-output implicant of a Boolean function f : B^n → {0, 1, *}^m is
a pair of row vectors of dimensions n and m, called the input part and
output part, respectively.
The input part has entries in the set {0, 1, *} and represents a product of
literals.
The output part has entries in the set {0, 1}.
For each output component, a 1 implies a TRUE or don't care value of the
function in correspondence with the input part.

A multiple-output minterm of a Boolean function f : B^n → {0, 1, *}^m is a
multiple-output implicant whose input part has elements in the set {0, 1}
and that implies the TRUE value of one and only one output of the function.
A multiple-output implicant corresponds to a subset of minterms of the
function.
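As a minimal sketch (function names are illustrative, not from a real tool), the minterm input patterns covered by an implicant's input part can be enumerated by letting each * take both values:

```python
from itertools import product

def input_minterms(inp):
    """Enumerate the input patterns covered by an implicant's input
    part, where '*' stands for either value of the literal."""
    choices = [('0', '1') if c == '*' else (c,) for c in inp]
    return [''.join(bits) for bits in product(*choices)]

# The input part '1*0' covers two input patterns:
print(input_minterms('1*0'))  # -> ['100', '110']
```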
we can define containment and intersection among sets of implicants.
A cover of a Boolean function is a set (list) of implicants that covers its
minterms.

The on set, off set and dc set of a function f can be modeled by covers.

The size, or cardinality, of a cover is the number of its implicants.


A minimum cover is a cover of minimum cardinality.
A minimal, or irredundant, cover of a function is a cover that is not a
proper superset of any cover of the same function.
A cover is minimal with respect to single-implicant containment, if no
implicant is contained in any other implicant of the cover.
An implicant is prime if it is not contained by any implicant of the
function. A cover is prime if all its implicants are prime.
For single-output functions, a prime implicant corresponds to a product
of literals where no literal can be dropped while preserving the
implication property.
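Containment between implicant input parts, and minimality of a cover with respect to single-implicant containment, can be sketched as follows (cube notation with '0', '1', '*' assumed; names hypothetical):

```python
def contains(a, b):
    """True if cube a contains cube b: wherever a fixes a literal,
    b must fix the same literal."""
    return all(ca == '*' or ca == cb for ca, cb in zip(a, b))

def sic_minimal(cover):
    """Minimal w.r.t. single-implicant containment: no implicant of
    the cover is contained in another implicant of the cover."""
    return not any(i != j and contains(a, b)
                   for i, a in enumerate(cover)
                   for j, b in enumerate(cover))

print(sic_minimal(['1*0', '100']))  # -> False: '1*0' contains '100'
print(sic_minimal(['1*0', '001']))  # -> True
```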

Exact Logic Minimization


Exact logic minimization addresses the problem of computing a
minimum cover.
There exists a minimum cover that is prime.
A prime implicant table is a binary-valued matrix A whose columns
are in one-to-one correspondence with the prime implicants of the
function f and whose rows are in one-to-one correspondence with its
minterms.

An entry a_ij is 1 if and only if the jth prime covers the ith minterm.
A minimum cover is a minimum set of columns which covers all rows.
Therefore the covering problem can be viewed as the problem of
finding a binary vector x, representing a set of primes with minimum
cardinality |x|, such that A x ≥ 1, i.e., every minterm (row) is covered
by at least one selected prime (column).
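For a small table, a minimum cover can be found by exhaustive search over column subsets of increasing cardinality; this is an illustrative sketch (exponential in the number of primes), not the algorithm used by production tools:

```python
from itertools import combinations

def minimum_cover(A):
    """A[i][j] == 1 iff prime j covers minterm i; return a minimum
    set of columns (primes) covering every row (minterm)."""
    n_rows, n_cols = len(A), len(A[0])
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            if all(any(A[i][j] for j in cols) for i in range(n_rows)):
                return set(cols)

A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
print(minimum_cover(A))  # -> {0, 1}
```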
matrix A can be seen as the incidence matrix of a hypergraph, whose
vertices correspond to the minterms and whose edges correspond to
the prime implicants.
the covering problem corresponds to an edge cover of the hypergraph.
Row and column dominance can be used to reduce the size of the matrix A.
Extraction of the essentials and removal of dominant rows and
dominated columns can be iterated to yield a reduced prime implicant
table.
The remaining covering problem is then solved by branching: selecting
different column (prime) combinations and evaluating the
corresponding cost (reapplying the essential and dominance rules).
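Essential primes fall out of the table directly: a sketch of the extraction step, assuming the matrix convention above:

```python
def essential_columns(A):
    """A prime (column) is essential if some minterm (row) is
    covered by that column alone."""
    ess = set()
    for row in A:
        ones = [j for j, a in enumerate(row) if a]
        if len(ones) == 1:      # this minterm has exactly one cover
            ess.add(ones[0])
    return ess

A = [[1, 0, 0],   # this minterm is covered only by prime 0
     [1, 1, 0],
     [0, 1, 1]]
print(essential_columns(A))  # -> {0}
```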

Petrick's method:
writing down the covering clauses of the (reduced) implicant table in a
product of sums form.
The product of sums form is then transformed into a sum of products
form by carrying out the products of the sums.
The corresponding sum of products expression is satisfied when any of
its product terms is TRUE.
Each product term represents a set of primes forming a cover;
a minimum cover is identified by any product term of the sum of
products form with the fewest literals.
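Petrick's method can be sketched directly on sets of prime indices, multiplying the clauses out one at a time and applying absorption to keep the expansion small (a minimal illustration, names hypothetical):

```python
def petrick(clauses):
    """Each clause is the set of primes covering one row; multiply
    the product of sums into a sum of products, absorbing any term
    that is a proper superset of another, and return a smallest term."""
    terms = [frozenset()]
    for clause in clauses:
        expanded = {t | {p} for t in terms for p in clause}
        terms = [t for t in expanded
                 if not any(u < t for u in expanded)]  # absorption
    return min(terms, key=len)

# Rows covered by primes {0,1} and {0,2}: prime 0 alone covers both.
print(petrick([{0, 1}, {0, 2}]))  # -> frozenset({0})
```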

ESPRESSO-EXACT algorithm:
The major improvements of the ESPRESSO-EXACT algorithm over
the Quine-McCluskey algorithm consist of the construction of a smaller
reduced prime implicant table and of the use of an efficient branch-and-bound algorithm for covering.
ESPRESSO-EXACT partitions the prime implicants into three sets:
essentials,
partially redundant and
totally redundant.
totally redundant primes are those covered by the essentials
partially redundant set includes the remaining ones
The rows of the reduced implicant table correspond to sets of minterms,
rather than to single minterms as in the case of Quine-McCluskey's
algorithm.
Each row corresponds to all minterms which are covered by the same
subset of prime implicants.

Heuristic Logic Minimization


Heuristic minimizers do not search for a minimum cover exhaustively;
they follow an iterative improvement strategy.
They compute a prime cover starting from the initial specification of
the function. This cover is then manipulated by modifying and/or
deleting implicants until a cover with a suitable minimality property
is found.
Heuristic logic minimization can be viewed as applying a set of
operators to the logic cover, which is initially provided to the minimizer
along with the don't care set
The most common operators in heuristic minimization are
Expand: Each non-prime implicant is expanded to a prime, i.e., it is
replaced by a prime implicant that contains it.
Reduce: attempts to replace each implicant with another that is
contained in it.
Reshape: Implicants are processed in pairs. One implicant is expanded
while the other is reduced.
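The expand operator can be sketched by raising literals to '*' one at a time and undoing any raise that makes the cube intersect the off set (cube notation assumed; a simplification of what real minimizers do, which also order the raises heuristically):

```python
def intersect(a, b):
    """Two cubes intersect unless they conflict in some position."""
    return all(ca == '*' or cb == '*' or ca == cb for ca, cb in zip(a, b))

def expand(cube, off_set):
    """Replace a cube by a prime containing it: raise each literal
    to '*' and keep the raise only if the cube stays disjoint from
    every off-set cube."""
    cube = list(cube)
    for i, c in enumerate(cube):
        if c == '*':
            continue
        cube[i] = '*'
        if any(intersect(cube, off) for off in off_set):
            cube[i] = c  # raising this literal hits the off set; undo
    return ''.join(cube)

# For f = a' (off set {10, 11}), the minterm 00 expands to the prime 0*:
print(expand('00', ['10', '11']))  # -> '0*'
```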

ABSTRACT MODELS
Structures
Structural representations can be modeled in terms of incidence
structures.
An incidence structure consists of a set of modules, a set of nets
and an incidence relation among modules and nets.
A simple model for the structure is a hypergraph, where the
vertices correspond to the modules and the edges to the nets.
The incidence relation is then represented by the corresponding
incidence matrix.
An alternative way of specifying a structure is to denote each
module by its terminals, called pins (or ports), and to describe the
incidence among nets and pins.
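A minimal sketch of an incidence structure and its hypergraph incidence matrix (module and net names hypothetical):

```python
# Net n0 connects modules m0 and m1; net n1 connects m0, m1 and m2.
nets = {'n0': {'m0', 'm1'}, 'n1': {'m0', 'm1', 'm2'}}
modules = ['m0', 'm1', 'm2']

# Rows = modules (vertices), columns = nets (hyperedges).
incidence = [[int(m in nets[n]) for n in sorted(nets)] for m in modules]
print(incidence)  # -> [[1, 1], [1, 1], [0, 1]]
```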

Logic Network
A generalized logic network is a structure, where each leaf module is
associated with a
combinational or sequential logic function.
While this concept is general and powerful, we consider here two
restrictions to this model: the combinational logic network and the
synchronous logic network.
The combinational logic network, also called logic network or Boolean
network, is a hierarchical structure where:
Each leaf module is associated with a multiple-input, single-output
combinational logic function, called a local function.
Pins are partitioned into two classes, called inputs and outputs. Pins
that do not belong to submodules are also partitioned into two classes,
called primary inputs and primary outputs.
Each net has a distinguished terminal, called a source, and an
orientation from the source to the other terminals of the net.

State Diagrams:
The behavioral view of sequential circuits at the logic level can
be expressed by
finite-state machine transition diagrams.
A finite-state machine can be described by:
A set of primary input patterns, X.
A set of primary output patterns, Y.
A set of states, S.
A state transition function, δ : X × S → S.
An output function, λ : X × S → Y for Mealy models or λ : S → Y for
Moore models.
An initial state.
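A small hypothetical Mealy machine makes the definitions concrete: dictionaries play the roles of δ and λ, and the machine outputs 1 exactly when it sees two consecutive 1s (all names illustrative):

```python
# State A: last input was not 1; state B: last input was 1.
delta = {('0', 'A'): 'A', ('1', 'A'): 'B',   # delta : X x S -> S
         ('0', 'B'): 'A', ('1', 'B'): 'B'}
lam   = {('0', 'A'): '0', ('1', 'A'): '0',   # lambda : X x S -> Y
         ('0', 'B'): '0', ('1', 'B'): '1'}

def run(inputs, state='A'):
    """Apply the transition and output functions symbol by symbol,
    starting from the initial state."""
    out = []
    for x in inputs:
        out.append(lam[(x, state)])
        state = delta[(x, state)]
    return ''.join(out)

print(run('0110'))  # -> '0010'
```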

Data-flow and Sequencing Graphs:


Abstract models of behavior at the architectural level are in terms of
tasks (or operations) and their dependencies.
Tasks may be No Operations (NOPs), i.e., fake operations that execute
instantaneously with no side effect.
Dependencies arise from several reasons.
A first reason is availability of data. When an input to an operation is the
result of another operation, the former operation depends on the latter.
A second reason is serialization constraints in the specification.
A task may have to follow a second one regardless of data dependency.
Data-flow graphs represent operations and data dependencies.
The data-flow graph model implicitly assumes the existence of
variables (or carriers), whose values store the information required and
generated by the operations.
Each variable has a lifetime, that is, the interval from its birth to its
death, where the former is the time at which the value is generated as
an output of an operation and the latter is the latest time at which the
variable is referenced as an input to another operation.
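Lifetimes can be sketched as (birth, death) intervals from the generation time and the latest referencing time (variable names and time steps hypothetical):

```python
def lifetimes(birth, refs):
    """Lifetime of each variable: from the time its value is
    generated to the latest time it is referenced as an input."""
    return {v: (birth[v], max(refs[v])) for v in birth}

# x is produced at step 1 and last read at step 4:
print(lifetimes({'x': 1}, {'x': [2, 4]}))  # -> {'x': (1, 4)}
```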

A branching vertex is the tail of a set of alternative paths, corresponding


to the possible branches. Iteration can be modeled as a branch based on
the iteration exit condition.
The corresponding vertex is the tail of two edges, one modeling the exit
from the loop and the other the return to the first operation in the loop.
We call a sequencing graph G_s(V, E) a hierarchy of directed graphs.
A generic element in the hierarchy is called a sequencing graph entity,
when confusion is possible between the overall hierarchical model and
a single graph.
A sequencing graph entity is an extended data-flow graph that has two
kinds of vertices:
operations and links, the latter linking other sequencing graph entities in
the hierarchy.
Sequencing graph entities that are leaves of the hierarchy have no link
vertices.
A model call vertex is a pointer to another sequencing graph entity at a
lower
level in the hierarchy.
It models a set of dependencies from its direct predecessors to the
source vertex of the called entity and another set of dependencies
from the corresponding sink to its direct successors.

Branching constructs can be modeled by a branching clause and
branching bodies.
A branching body is a set of tasks that are selected according to the
value of the branching clause.
The semantic interpretation of the sequencing graph model requires
the notion
of marking the vertices.
A marking denotes the state of the corresponding operation, which
can be
(i) waiting for execution; (ii) executing; and (iii) having completed
execution.
Firing an operation means starting its execution.
Then, the semantics of the model is as follows: an operation can be
fired as soon as all its direct predecessors have completed execution.
Some attributes can be assigned to the vertices and edges of a
sequencing graph model, such as measures or estimates of the
corresponding area or delay cost.
In general, the delay of a vertex can be data independent or data
dependent.
bounded or unbounded delay


Data-dependent delays can be bounded or unbounded.
The former case applies to data-dependent delay branching, where the
maximum and minimum possible delays can be computed.
Bounded-latency graphs
A sequencing graph model with data-independent delays can be
characterized
by its overall delay, called latency.
Graphs with bounded delays (including data- independent ones) are
called bounded-latency graphs.
Else they are called unbounded- latency graphs, because the latency
cannot be computed.
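For a bounded-latency graph with data-independent delays, the latency is the longest weighted path from source to sink under the firing rule above (an operation starts as soon as all its direct predecessors complete). A sketch, with all vertex names and delays hypothetical:

```python
def start_times(preds, delay):
    """ASAP start times: each operation fires as soon as all of its
    direct predecessors have completed execution."""
    start = {}
    def t(v):
        if v not in start:
            start[v] = max((t(u) + delay[u] for u in preds[v]),
                           default=0)
        return start[v]
    for v in preds:
        t(v)
    return start

# Entity: source -> {a, b} -> sink, with delays 2 and 3.
preds = {'src': [], 'a': ['src'], 'b': ['src'], 'snk': ['a', 'b']}
delay = {'src': 0, 'a': 2, 'b': 3, 'snk': 0}
start = start_times(preds, delay)
print(start['snk'] + delay['snk'])  # latency -> 3
```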
