1 Physical Design
Figure 15.1 shows part of the design flow, the physical design steps, for an
ASIC (omitting simulation, test, and other logical design steps that have
already been covered). Some of the steps in Figure 15.1 might be performed in
a different order from that shown. For example, we might, depending on the
size of the system, perform system partitioning before we do any design entry
or synthesis. There may be some iteration between the different steps too.
FIGURE 15.1 Part of an ASIC design flow
showing the system partitioning,
floorplanning, placement, and routing steps.
These steps may be performed in a slightly
different order, iterated or omitted depending
on the type and size of the system and its
ASICs. As the focus shifts from logic to
interconnect, floorplanning assumes an
increasingly important role. Each of the steps
shown in the figure must be performed and
each depends on the previous step.
However, the trend is toward completing
these steps in a parallel fashion and
iterating, rather than in a sequential manner.
We must first apply system partitioning to divide a microelectronics system
into separate ASICs. In floorplanning we estimate sizes and set the initial
relative locations of the various blocks in our ASIC (sometimes we also call
this chip planning). At the same time we allocate space for clock and power
wiring and decide on the location of the I/O and power pads. Placement defines
the location of the logic cells within the flexible blocks and sets aside space for
the interconnect to each logic cell. Placement for a gate-array or standard-cell
design assigns each logic cell to a position in a row. For an FPGA, placement
chooses which of the fixed logic resources on the chip are used for which logic
cells. Floorplanning and placement are closely related and are sometimes
combined in a single CAD tool. Routing makes the connections between logic
cells. Routing is a hard problem by itself and is normally split into two distinct
steps, called global and local routing. Global routing determines where the
interconnections between the placed logic cells and blocks will be situated.
Only the routes to be used by the interconnections are decided in this step, not
the actual locations of the interconnections within the wiring areas. Global
routing is sometimes called loose routing for this reason. Local routing joins
the logic cells with interconnections. Information on which interconnection
areas to use comes from the global router. Only at this stage of layout do we
finally decide on the width, mask layer, and exact location of the
interconnections. Local routing is also known as detailed routing.
The goals and objectives of floorplanning and placement can be summarized as follows.
Floorplanning:
Goal. Calculate the sizes of all the blocks and assign them locations.
Objective. Keep the highly connected blocks physically close to each other.
Placement:
Goal. Assign the interconnect areas and the location of all the logic cells
within the flexible blocks.
There is no magic recipe involved in the choice of the ASIC physical design
steps. These steps have been chosen simply because, as tools and techniques
have developed historically, these steps proved to be the easiest way to split up
the larger problem of ASIC physical design. The boundaries between the steps
are not cast in stone. For example, floorplanning and placement are often
thought of as one step and in some tools placement and routing are performed
together.
System Partitioning
Microelectronic systems typically consist of many functional blocks. If a
functional block is too large to fit in one ASIC, we may have to split, or
partition, the function into pieces using goals and objectives that we need to
specify. For example, we might want to minimize the number of pins for each
ASIC to minimize package cost. We can use CAD tools to help us with this
type of system partitioning.
We know how to measure the first two objectives. Next we shall explain ways
to measure the last two.
15.7.1 Measuring Connectivity
When we partition a system into two ASICs, we can model the system as a network graph. The connections between the two ASICs are external connections; the connections inside each ASIC are internal connections.
Notice that the number of external connections is not modeled correctly by
the network graph. When we divide the network into two by drawing a line
across connections, we make net cuts. The resulting set of net cuts is the net
cutset. The number of net cuts we make corresponds to the number of external
connections between the two partitions. When we divide the network graph
into the same partitions we make edge cuts and we create the edge cutset. We
have already shown that nets and graph edges are not equivalent when a net
has more than two terminals. Thus the number of edge cuts made when we
partition a graph into two is not necessarily equal to the number of net cuts in
the network. As we shall see presently, the difference between nets and graph edges is important when we consider partitioning a network by partitioning its graph [Schweikert and Kernighan, 1979].
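To see this difference concretely, here is a short Python sketch (the two-net example network is invented for illustration) that counts net cuts and edge cuts for the same two-way partition, expanding each multiterminal net into a clique of graph edges:

```python
# A net is a set of terminals; a 3-terminal net expands to a 3-edge clique
# in the network graph. Cutting that net once can cut two graph edges.
from itertools import combinations

nets = [{'A', 'B', 'C'},   # one 3-terminal net (hypothetical example)
        {'C', 'D'}]        # one 2-terminal net
partition = {'A': 0, 'B': 1, 'C': 1, 'D': 0}

net_cuts = sum(len({partition[t] for t in net}) > 1 for net in nets)
edge_cuts = sum(partition[u] != partition[v]
                for net in nets
                for u, v in combinations(sorted(net), 2))

print(net_cuts, edge_cuts)  # 2 net cuts, but 3 edge cuts
```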
15.7.2 A Simple Partitioning Example
Figure 15.7(a) shows a simple network we need to partition [Goto and Matsuda, 1986]. There are 12 logic cells, labeled A through L, connected by 12 nets (labeled 1 through 12). At this level, each logic cell is a large circuit block and might be a RAM, a ROM, an ALU, and so on. Each net might also be a bus, but, for the moment, we assume that each net is a single connection and all nets are weighted equally. The goal is to partition our simple network into ASICs. Our objectives are the following:
Figure 15.7(b) shows a partitioning with five external connections; two of the ASICs have three pins; the third has four pins. We might be able to find this arrangement by hand, but for larger systems we need help.
2. Consider all the logic cells that are not yet in a partition. Select each
of these logic cells in turn.
Figure 15.8 illustrates some of the terms and definitions needed to describe the KL algorithm. External edges cross between partitions; internal edges are contained inside a partition. Consider a network with 2m nodes (where m is an integer), each of equal size. If we assign a cost to each edge of the network graph, we can define a cost matrix C = [c_ij], where c_ij = c_ji and c_ii = 0. If all connections are equal in importance, the elements of the cost matrix are 1 or 0, and in this special case we usually call the matrix the connectivity matrix. Costs higher than 1 could represent the number of wires in a bus, multiple connections to a single logic cell, or nets that we need to keep close for timing reasons.
The cut weight W is the sum of the costs of the edges that cross between the two partitions:

W = Σ_{a ∈ A, b ∈ B} c_ab . (15.13)

In Figure 15.8(a) the cut weight is 4 (all the edges have weights of 1).
In order to simplify the measurement of the change in cut weight when we interchange nodes, we need some more definitions. First, for any node a in partition A, we define an external edge cost, which measures the connections from node a to B,

E_a = Σ_{y ∈ B} c_ay . (15.14)

Second, we define an internal edge cost I_a to measure the connections from node a to other nodes inside partition A,

I_a = Σ_{z ∈ A, z ≠ a} c_az . (15.15)
So, in Figure 15.8(a), I_1 = 0 and I_3 = 2. We define the edge costs for partition B in a similar way (so E_8 = 2 and I_8 = 1). The cost difference is the difference between external edge costs and internal edge costs,

D_x = E_x − I_x . (15.16)
Thus, in Figure 15.8(a), D_1 = 1, D_3 = −2, and D_8 = 1. Now pick any node in A and any node in B. If we swap these nodes, a and b, we need to measure the reduction in cut weight, which we call the gain, g. We can express g in terms of the edge costs as follows:

g = D_a + D_b − 2c_ab . (15.17)

The last term accounts for the fact that a and b may be connected. So, in Figure 15.8(a), if we swap nodes 1 and 6, then g = D_1 + D_6 − 2c_16 = 1 + 1. If we swap nodes 2 and 8, then g = D_2 + D_8 − 2c_28 = 1 + 2 − 2.
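As a concrete check of Eqs. 15.14 to 15.17, here is a minimal Python sketch (the 4-node cost matrix and partition are invented for illustration, not taken from Figure 15.8) that computes E, I, D, and the gain g for one candidate swap:

```python
# Toy symmetric cost matrix for nodes 0..3 (a made-up network);
# partition A = {0, 1} and partition B = {2, 3}.
c = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 0]]
A, B = {0, 1}, {2, 3}

def E(x, other):               # external edge cost (Eq. 15.14)
    return sum(c[x][y] for y in other)

def I(x, own):                 # internal edge cost (Eq. 15.15)
    return sum(c[x][z] for z in own if z != x)

def D(x, own, other):          # cost difference D_x = E_x - I_x (Eq. 15.16)
    return E(x, other) - I(x, own)

a, b = 0, 3
g = D(a, A, B) + D(b, B, A) - 2 * c[a][b]   # gain (Eq. 15.17)
print(g)   # -1: swapping nodes 0 and 3 would increase the cut weight by 1
```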
The KL algorithm finds a group of node pairs to swap that increases the
gain even though swapping individual node pairs from that group might
decrease the gain. First we pretend to swap all of the nodes a pair at a time.
Pretend swaps are like studying chess games when you make a series of trial
moves in your head.
This is the algorithm:
1. Find two nodes, a_i from A and b_i from B, so that the gain from swapping them is a maximum. The gain is

g_i = D_ai + D_bi − 2c_aibi . (15.18)
2. Next pretend swap a_i and b_i even if the gain g_i is zero or negative, and do not consider a_i and b_i eligible for being swapped again.
3. Repeat steps 1 and 2 a total of m times until all the nodes of A and B
have been pretend swapped. We are back where we started, but we have
ordered pairs of nodes in A and B according to the gain from
interchanging those pairs.
4. Now we can choose which nodes we shall actually swap. Suppose we only swap the first n pairs of nodes that we found in the preceding process. In other words, we swap nodes X = a_1, a_2, ..., a_n from A with nodes Y = b_1, b_2, ..., b_n from B. The total gain would be

G_n = Σ_{i=1}^{n} g_i . (15.19)
5. If the maximum value of G_n > 0, then we swap the sets of nodes X and Y and thus reduce the cut weight by G_n. We use this new partitioning to start the process again at the first step. If the maximum value of G_n = 0, then we cannot improve the current partitioning and we stop. We have found a locally optimum solution.
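Putting steps 1 to 5 together, here is a simplified Python sketch of one KL pass (an illustration of the procedure described above, not an optimized or authoritative implementation; the helper name kl_pass is hypothetical, and nodes are integer indices into a symmetric cost matrix):

```python
from itertools import accumulate

def kl_pass(c, A, B):
    """One pass of Kernighan-Lin bipartitioning (simplified sketch).

    c    : symmetric cost matrix, c[i][j] = edge weight (0 means no edge)
    A, B : the two partitions, as sets of node indices
    Returns (A, B, gain); gain == 0 means a local optimum was reached.
    """
    a_side, b_side = set(A), set(B)     # working copies for pretend swaps
    free = a_side | b_side              # nodes still eligible to swap
    gains, swaps = [], []

    def D(x, own, other):
        # cost difference D_x = E_x - I_x (Eq. 15.16)
        return (sum(c[x][y] for y in other)
                - sum(c[x][z] for z in own if z != x))

    for _ in range(min(len(A), len(B))):
        # Step 1: pick the free pair (a, b) with the maximum gain (Eq. 15.18).
        g, a, b = max((D(a, a_side, b_side) + D(b, b_side, a_side)
                       - 2 * c[a][b], a, b)
                      for a in a_side & free for b in b_side & free)
        # Steps 2-3: pretend swap the pair and lock it out of further swaps.
        a_side.remove(a); a_side.add(b)
        b_side.remove(b); b_side.add(a)
        free -= {a, b}
        gains.append(g)
        swaps.append((a, b))

    # Steps 4-5: apply the prefix of pretend swaps with the best total G_n.
    G = list(accumulate(gains))
    best = max(G)
    if best > 0:
        n = G.index(best) + 1
        for a, b in swaps[:n]:
            A = (A - {a}) | {b}
            B = (B - {b}) | {a}
    return A, B, max(best, 0)
```

Calling kl_pass repeatedly, until the returned gain is zero, implements the stopping rule in step 5.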
There are several problems with the KL algorithm:
It minimizes the number of edges cut, not the number of nets cut.
It does not allow logic cells to be different sizes.
It is expensive in computation time.
It does not allow partitions to be unequal or find the optimum partition
size.
It does not allow for selected logic cells to be fixed in place.
The results are random.
It does not directly allow for more than two partitions.
At each step of the KL algorithm we have to find the pair of nodes that gives the largest gain when swapped. This requires an amount of computer time that grows as n^2 log n for a graph with 2n nodes. This n^2 dependency is a major problem for partitioning large networks.
The Fiduccia–Mattheyses algorithm (the FM algorithm) is an extension to the KL algorithm that addresses the differences between nets and edges and also reduces the computational effort [Fiduccia and Mattheyses, 1982]. The key features of this algorithm are the following:
Only one logic cell, the base logic cell, moves at a time. In order to stop
the algorithm from moving all the logic cells to one large partition, the
base logic cell is chosen to maintain balance between partitions. The
balance is the ratio of total logic cell size in one partition to the total
logic cell size in the other. Altering the balance allows us to vary the sizes
of the partitions.
Critical nets are used to simplify the gain calculations. A net is a critical
net if it has an attached logic cell that, when swapped, changes the
number of nets cut. It is only necessary to recalculate the gains of logic
cells on critical nets that are attached to the base logic cell.
The logic cells that are free to move are stored in a doubly linked list. The
lists are sorted according to gain. This allows the logic cells with
maximum gain to be found quickly.
We may need to fix logic cells in place, perhaps because the design is already partially completed. The FM algorithm allows you to fix logic cells by removing them from consideration as the base logic cells you move.
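The third feature, the gain-sorted lists of free cells, might be sketched as follows (a simplified illustration, not the published algorithm's data structure; a production FM implementation uses an array of doubly linked lists indexed by gain so that removal and update are constant time):

```python
from collections import defaultdict, deque

class GainBuckets:
    """Free logic cells keyed by gain (illustrative sketch of the FM
    bucket-list idea; not an optimized implementation)."""

    def __init__(self):
        self.buckets = defaultdict(deque)   # gain -> cells with that gain
        self.gain = {}                      # cell -> its current gain

    def insert(self, cell, gain):
        self.buckets[gain].append(cell)
        self.gain[cell] = gain

    def update(self, cell, delta):
        # Called when a neighbor on a critical net moves and the cell's
        # gain changes by delta; the cell migrates between buckets.
        g = self.gain.pop(cell)
        self.buckets[g].remove(cell)   # O(n) here; O(1) with linked lists
        self.insert(cell, g + delta)

    def pop_max(self):
        # Return (and remove) the free cell with the maximum gain.
        live = [g for g, cells in self.buckets.items() if cells]
        if not live:
            return None
        cell = self.buckets[max(live)].popleft()
        del self.gain[cell]
        return cell
```

In the FM loop, pop_max supplies the candidate base logic cell, which is then accepted or rejected against the balance criterion described above.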
Methods based on the KL algorithm find locally optimum solutions in a
random fashion. There are two reasons for this. The first reason is the random
starting partition. The second reason is that the choice of nodes to swap is
based on the gain. The choice between moves that have equal gain is arbitrary.
Extensions to the KL algorithm address both of these problems. Finding
nodes that are naturally grouped or clustered and assigning them to one of the
initial partitions improves the results of the KL algorithm. Although these are
constructive partitioning methods, they are covered here because they are
closely linked with the KL iterative improvement algorithm.
Floorplanning
Floorplanning Goals and Objectives
Every chip communicates with the outside world. Signals flow onto and off the chip and we need to supply power. We need to consider the I/O and power constraints early in the floorplanning process. A silicon chip or die (plural die, dies, or dice) is mounted on a chip carrier inside a chip package. Connections are made by bonding the chip pads to fingers on a metal lead frame that is part of the package. The metal lead-frame fingers connect to the package pins. A die consists of a logic core inside a pad ring. Figure 16.12(a) shows a pad-limited die and Figure 16.12(b) shows a core-limited die. On a pad-limited die we use tall, thin pad-limited pads, which maximize the number of pads we can fit around the outside of the chip. On a core-limited die we use short, wide core-limited pads. Figure 16.12(c) shows how we can use both types of pad to change the aspect ratio of a die to be different from that of the core.
FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of pads determines the die size. (b) A core-limited die: The core logic determines the die size. (c) Using both pad-limited pads and core-limited pads for a square die.
Special power pads are used for the positive supply, or VDD, power buses (or power rails) and the ground or negative supply, VSS or GND. Usually one set of VDD/VSS pads supplies one power ring that runs around the pad ring and supplies power to the I/O pads only. Another set of VDD/VSS pads connects to a second power ring that supplies the logic core. We sometimes call the I/O power dirty power since it has to supply large transient currents to the output transistors. We keep dirty power separate to avoid injecting noise into the internal-logic power (the clean power). I/O pads also contain special circuits to protect against electrostatic discharge (ESD). These circuits can withstand very short high-voltage (several kilovolt) pulses that can be generated during human or machine handling.
Depending on the type of package and how the foundry attaches the silicon die to the chip cavity in the chip carrier, there may be an electrical connection between the chip carrier and the die substrate. Usually the die is cemented in the chip cavity with a conductive epoxy, making an electrical connection between substrate and the package cavity in the chip carrier. If we make an electrical connection between the substrate and a chip pad, or to a package pin, it must be to VDD (n-type substrate) or VSS (p-type substrate). This substrate connection (for the whole chip) employs a down bond (or drop bond) to the carrier. We have several options:
We can dedicate one (or more) chip pad(s) to down bond to the chip
carrier.
We can make a connection from a chip pad to the lead frame and down
bond from the chip pad to the chip carrier.
We can make a connection from a chip pad to the lead frame and down
bond from the lead frame.
We can down bond from the lead frame without using a chip pad.
We can leave the substrate and/or chip carrier unconnected.
Depending on the package design, the type and positioning of down bonds may be fixed. This means we need to fix the position of the chip pad for down bonding using a pad seed.
A double bond connects two pads to one chip-carrier finger and one package pin. We can do this to save package pins or to reduce the series inductance of bond wires (typically a few nanohenries) by parallel connection of the pads. A multiple-signal pad or pad group is a set of pads. For example, an oscillator pad usually comprises a set of two adjacent pads that we connect to an external crystal. The oscillator circuit and the two signal pads form a single logic cell. Another common example is a clock pad. Some foundries allow a special form of corner pad (normal pads are edge pads) that squeezes two pads into the area at the corners of a chip using a special two-pad corner cell, to help meet bond-wire angle design rules (see also Figure 16.13(b) and (c)).
To reduce the series resistive and inductive impedance of power supply networks, it is normal to use multiple VDD and VSS pads. This is particularly important with the simultaneously switching outputs (SSOs) that occur when driving buses off-chip [Wada, Eino, and Anami, 1990]. The output pads
can easily consume most of the power on a CMOS ASIC, because the load on
a pad (usually tens of picofarads) is much larger than typical on-chip capacitive
loads. Depending on the technology it may be necessary to provide dedicated
VDD and VSS pads for every few SSOs. Design rules set how many SSOs can
be used per VDD/VSS pad pair. These dedicated VDD/VSS pads must
follow groups of output pads as they are seeded or planned on the floorplan.
With some chip packages this can become difficult because design rules limit
the location of package pins that may be used for supplies (due to the differing
series inductance of each pin).
Using a pad mapping we translate the logical pad in a netlist to a physical pad from a pad library. We might control pad seeding and mapping in the floorplanner. The handling of I/O pads can become quite complex; there are
several nonobvious factors that must be considered when generating a pad ring:
Ideally we would only need to design library pad cells for one
orientation. For example, an edge pad for the south side of the chip, and
a corner pad for the southeast corner. We could then generate other
orientations by rotation and flipping (mirroring). Some ASIC vendors will
not allow rotation or mirroring of logic cells in the mask file. To avoid these problems we may need to have separate horizontal, vertical, left-handed, and right-handed pad cells in the library with appropriate logical-to-physical pad mappings.
If we mix pad-limited and core-limited edge pads in the same pad ring, this complicates the design of corner pads. Usually the two types of edge pad cannot abut. In this case a corner pad also becomes a pad-format changer, or hybrid corner pad.
In single-supply chips we have one VDD net and one VSS net, both global power nets. It is also possible to use mixed power supplies (for example, 3.3 V and 5 V) or multiple power supplies (digital VDD, analog VDD).
Figure 16.13 (a) and (b) are magnified views of the southeast corner of our
example chip and show the different types of I/O cells. Figure 16.13 (c) shows
a stagger-bond arrangement using two rows of I/O pads. In this case the
design rules for bond wires (the spacing and the angle at which the bond wires
leave the pads) become very important.
FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited pads. (b) A hybrid corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump bonded chip (or flip-chip). The chip is turned upside down and solder bumps connect the pads to the lead frame.
Figure 16.13(d) shows an area-bump bonding arrangement (also known as flip-chip, solder-bump, or C4, terms coined by IBM, which developed this technology [Masleid, 1991]) used, for example, with ball-grid array (BGA) packages. Even though the bonding pads are located in the center of the chip,
the I/O circuits are still often located at the edges of the chip because of
difficulties in power supply distribution and integrating I/O circuits together
with logic in the center of the die.
In an MGA the pad spacing and I/O-cell spacing is fixed; each pad occupies a fixed pad slot (or pad site). This means that the properties of the pad I/O are also fixed but, if we need to, we can parallel adjacent output cells to increase the drive. To increase flexibility further the I/O cells can use a separation, the I/O-cell pitch, that is smaller than the pad pitch. For example,
three 4 mA driver cells can occupy two pad slots. Then we can use two 4 mA
output cells in parallel to drive one pad, forming an 8 mA output pad as shown
in Figure 16.14 . This arrangement also means the I/O pad cells can be changed
without changing the base array. This is useful as bonding techniques improve
and the pads can be moved closer together.
Figure 16.16(a) shows a clock spine (not to be confused with a channel spine) routing scheme with all clock pins driven directly from the clock driver. MGAs and FPGAs often use this fishbone type of clock distribution scheme. Figure 16.16(b) shows a clock spine for a cell-based ASIC. Figure 16.16(c) shows the clock-driver cell, often part of a special clock-pad cell. Figure 16.16(d) illustrates clock skew and clock latency. Since all clocked elements are driven from one net with a clock spine, skew is caused by differing interconnect lengths and loads. If the clock-driver delay is much larger than the interconnect delays, a clock spine achieves minimum skew but with long latency.
Placement
After completing a floorplan we can begin placement of the logic cells within
the flexible blocks. Placement is much more suited to automation than
floorplanning. Thus we shall need measurement techniques and algorithms.
After we complete floorplanning and placement, we can predict both intrablock
and interblock capacitances. This allows us to return to logic synthesis with
more accurate estimates of the capacitive loads that each logic cell must drive.
16.2.1 Placement Terms and Definitions
CBIC, MGA, and FPGA architectures all have rows of logic cells separated by the interconnect; these are row-based ASICs. Figure 16.18 shows an example of the interconnect structure for a CBIC. Interconnect runs in horizontal and vertical directions in the channels and in the vertical direction by crossing through the logic cells. Figure 16.18(c) illustrates the fact that it is possible to use over-the-cell routing (OTC routing) in areas that are not blocked. However, OTC routing is complicated by the fact that the logic cells
themselves may contain metal on the routing layers. We shall return to this
topic in Section 17.2.7, Multilevel Routing. Figure 16.19 shows the
interconnect structure of a two-level metal MGA.
FIGURE 16.19 Gate-array interconnect. (a) A small two-level metal gate array (about 4.6 k-gate). (b) Routing in a block. (c) Channel routing showing channel density and
channel capacity. The channel height on a gate array may
only be increased in increments of a row. If the interconnect
does not use up all of the channel, the rest of the space is
wasted. The interconnect in the channel runs in m1 in the
horizontal direction with m2 in the vertical direction.
There is no standard terminology for connectors and the terms can be very
confusing. There is a difference between connectors that are joined inside the
logic cell using a high-resistance material such as polysilicon and connectors
that are joined by low-resistance metal. The high-resistance kind are really two
separate alternative connectors (that cannot be used as a feedthrough), whereas the low-resistance kind are electrically equivalent connectors. There may be two or more connectors to a logic cell, which are not joined inside the cell, and which must be joined by the router (must-join connectors).
There are also logically equivalent connectors (or functionally equivalent connectors, sometimes also called just equivalent connectors, which is very confusing). The two inputs of a two-input NAND gate may be logically equivalent connectors. The placement tool can swap these without altering the logic (but the two inputs may have different delay properties, so it is not always a good idea to swap them). There can also be logically equivalent connector groups. For example, in an OAI22 (OR-AND-INVERT) gate there are four inputs: A1, A2 are inputs to one OR gate (gate A), and B1, B2 are inputs to the second OR gate (gate B). Then group A = (A1, A2) is logically equivalent to group B = (B1, B2): if we swap one input (A1 or A2) from gate A to gate B, we must swap the other input in the group (A2 or A1).
In the case of channeled gate arrays and FPGAs, the horizontal interconnect areas (the channels, usually on m1) have a fixed capacity (sometimes they are called fixed-resource ASICs for this reason). The channel capacity of
CBICs and channelless MGAs can be expanded to hold as many interconnects
as are needed. Normally we choose, as an objective, to minimize the number of
interconnects that use each channel. In the vertical interconnect direction,
usually m2, FPGAs still have fixed resources. In contrast the placement tool
can always add vertical feedthroughs to a channeled MGA, channelless MGA,
or CBIC. These problems become less important as we move to three and more
levels of interconnect.
16.2.2 Placement Goals and Objectives
The goal of a placement tool is to arrange all the logic cells within the flexible
blocks on a chip. Ideally, the objectives of the placement step are to
Objectives such as these are difficult to define in a way that can be solved
with an algorithm and even harder to actually meet. Current placement tools
use more specific and achievable criteria. The most commonly used placement
objectives are one or more of the following:
Global Routing
The details of global routing differ slightly between cell-based ASICs, gate
arrays, and FPGAs, but the principles are the same in each case. A global
router does not make any connections, it just plans them. We typically global
route the whole chip (or large pieces if it is a large chip) before detail routing
the whole chip (or the pieces). There are two types of areas to global route:
inside the flexible blocks and between blocks (the Viterbi decoder, although a
cell-based ASIC, only involved the global routing of one large flexible block).
17.1.1 Goals and Objectives
The input to the global router is a floorplan that includes the locations of all the
fixed and flexible blocks; the placement information for flexible blocks; and
the locations of all the logic cells. The goal of global routing is to provide
complete instructions to the detailed router on where to route every net. The
objectives of global routing are one or more of the following:
Floorplanning and placement need a fast and easy way to estimate the
interconnect delay in order to evaluate each trial placement; often this is a
predefined look-up table. After placement, the logic cell positions are fixed and
the global router can afford to use better estimates of the interconnect delay. To
illustrate one method, we shall use the Elmore constant to estimate the
interconnect delay for the circuit shown in Figure 17.3 .
The Elmore time constant for node 4 in the network of Figure 17.3 is

τ_D4 = Σ_{k=1}^{4} R_k4 C_k = R_14 C_1 + R_24 C_2 + R_34 C_3 + R_44 C_4 , (17.1)

where

R_14 = R_pd + R_1
R_24 = R_pd + R_1
R_34 = R_pd + R_1 + R_3
R_44 = R_pd + R_1 + R_3 + R_4 . (17.2)

In Eq. 17.2 notice that R_24 = R_pd + R_1 (and not R_pd + R_1 + R_2) because R_1 is the resistance to V_0 (ground) shared by node 2 and node 4.
Suppose we have the following parameters (from the generic 0.5 μm CMOS process, G5) for the layout shown in Figure 17.3(b):

m2 resistance is 50 mΩ/square.
m2 capacitance (for a minimum-width line) is 0.2 pF/mm.
4X inverter delay is 0.02 ns + 0.5 C_L ns (C_L is in picofarads).
Delay is measured using 0.35/0.65 output trip points.
m2 minimum width is 3λ = 0.9 μm.
1X inverter input capacitance is 0.02 pF (a standard load).
R_3 = (1 mm)(50 × 10⁻³ Ω/square)/(0.9 μm) = 56 Ω,
R_4 = (2 mm)(50 × 10⁻³ Ω/square)/(0.9 μm) = 112 Ω . (17.3)
C_1 = 0.02 pF
C_2 = 0.04 pF
C_3 = 0.2 pF
C_4 = 0.42 pF (17.4)
Substituting into Eq. 17.2 (with R_pd = 500 Ω and R_1 = 6 Ω):

R_14 = 500 + 6 = 506 Ω
R_24 = 500 + 6 = 506 Ω
R_34 = 500 + 6 + 56 = 562 Ω
R_44 = 500 + 6 + 56 + 112 = 674 Ω . (17.5)
Comparing Eqs. 17.6–17.8, we can see that the delay of the inverter can be assigned as follows: 20 ps (the intrinsic delay, 0.02 ns, due to the cell output capacitance), 340 ps (due to the pull-down resistance and the output capacitance), 4 ps (due to the interconnect from A to B), and 65 ps (due to the interconnect from A to C). We can see that the error from neglecting interconnect resistance can be important.
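To make the arithmetic concrete, the following Python sketch (illustrative; it simply re-evaluates Eq. 17.1 with the values from Eqs. 17.2 to 17.5) computes the Elmore constant at node 4:

```python
# Elmore delay for the RC tree of Figure 17.3, using the values
# from Eqs. 17.2-17.5 (ohms and picofarads, so products are picoseconds).

R_pd = 500.0                      # 4X inverter pull-down resistance, ohms
R1, R3, R4 = 6.0, 56.0, 112.0     # m2 segment resistances, ohms
C = {1: 0.02, 2: 0.04, 3: 0.20, 4: 0.42}   # node capacitances, pF

# Shared-path resistances R_k4 (Eq. 17.2). R_24 excludes R_2 because only
# R_pd + R_1 is common to the paths from the driver to node 2 and node 4.
R_shared = {1: R_pd + R1,
            2: R_pd + R1,
            3: R_pd + R1 + R3,
            4: R_pd + R1 + R3 + R4}

tau_4 = sum(R_shared[k] * C[k] for k in C)   # Eq. 17.1
print(f"Elmore constant at node 4: {tau_4:.0f} ps")   # about 426 ps
```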
Even using the Elmore constant we still made the following assumptions in
estimating the path delays:
The global router could use more sophisticated estimates that remove some
of these assumptions, but there is a limit to the accuracy with which delay can
be estimated during global routing. For example, the global router does not
know how much of the routing is on which of the layers, or how many vias will
be used and of which type, or how wide the metal lines will be. It may be
possible to estimate how much interconnect will be horizontal and how much
is vertical. Unfortunately, this knowledge does not help much if horizontal
interconnect may be completed in either m1 or m3 and there is a large
difference in parasitic capacitance between m1 and m3, for example.
When the global router attempts to minimize interconnect delay, there is an
important difference between a path and a net. The path that minimizes the
delay between two terminals on a net is not necessarily the same as the path
that minimizes the total path length of the net. For example, to minimize the
path delay (using the Elmore constant as a measure) from the output of inverter
A in Figure 17.3 (a) to the input of inverter B requires a rather complicated
algorithm to construct the best path.
Detailed Routing
The global routing step determines the channels to be used for each
interconnect. Using this information the detailed router decides the exact
location and layers for each interconnect. Figure 17.9(a) shows typical metal rules. These rules determine the m1 routing pitch (track pitch, track spacing, or just pitch). We can set the m1 pitch to one of three values:
1. Via-to-via (VTV) pitch, which allows vias on adjacent interconnect lines.
2. Via-to-line (VTL) pitch, which allows a via on only one of two adjacent lines.
3. Line-to-line (LTL) pitch, which allows no vias on adjacent lines.
The same choices apply to the m2 and other metal layers if they are present. Via-to-via spacing allows the router to place vias adjacent to each other. Via-to-line spacing is hard to use in practice because it restricts the router to nonadjacent vias. Using line-to-line spacing prevents the router from placing a via at all without using jogs and is rarely used. Via-to-via spacing is the easiest for a router to use and the most common. Using either via-to-line or via-to-via spacing means that the routing pitch is larger than the minimum metal pitch.
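As a rough numeric illustration, the three pitches can be computed directly from the design rules; the λ-based values below are assumptions for the sake of the example, not figures from the text:

```python
# Hypothetical lambda-based design rules (illustrative values only):
W = 3.0   # m1 width, in lambda
S = 3.0   # m1 spacing, in lambda
V = 4.0   # via1 size (square side), in lambda

line_to_line = W + S                # no vias on either adjacent line
via_to_line  = V / 2 + S + W / 2    # a via on one of the two lines
via_to_via   = V / 2 + S + V / 2    # vias on both adjacent lines

print(line_to_line, via_to_line, via_to_via)   # 6.0 6.5 7.0
```

With these (assumed) rules, the via-to-line and via-to-via pitches come out larger than the minimum metal pitch, as the text describes.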
Sometimes people draw a distinction between a cut and a via when they talk about large connections such as shown in Figure 17.10(a). We split or stitch a large via into identically sized cuts (sometimes called a waffle via). Because of the profile of the metal in a contact and the way current flows into a contact, often the total resistance of several small cuts is less than that of one large cut. Using identically sized cuts also means the processing conditions during contact etching, which may vary with the area and perimeter of a contact, are the same for every cut on the chip.
In a stacked via the contact cuts all overlap in a layout plot and it is impossible to tell just how many vias on which layers are present. Figure 17.10(b–f) shows an alternative way to draw contacts and vias. Though this is not a standard, using the diagonal box convention makes it possible to recognize stacked vias and contacts on a layout (in any orientation). I shall use these conventions when it is necessary.
FIGURE 17.9 The metal routing pitch. (a) An example of λ-based metal design rules for m1 and via1 (m1/m2 via). (b) Via-to-via pitch for adjacent vias. (c) Via-to-line (or line-to-via) pitch for nonadjacent vias. (d) Line-to-line pitch with no vias.
If we use a preferred direction for each metal layer (for example, m1 for horizontal routing and m2 for vertical routing), this may lead to problems in cases that have both horizontal and vertical channels. In these cases we define a preferred metal layer in the direction of the channel spine. In Figure 17.11, because the logic cell connectors are on m2, any vertical channel has to use vias at every logic cell location. By changing the orientation of the metal directions in vertical channels, we can avoid this, and instead we only need to place vias at the intersection of horizontal and vertical channels.
The detailed router needs the following information about each logic cell: the abutment and bounding boxes; enough layer information to be able to place cells without violating design rules; and a blockage map (the locations of any metal inside the cell that blocks routing).
The goal of detailed routing is to complete all the connections between logic
cells. The most common objective is to minimize one or more of the following:
We start discussion of routing methods by simplifying the general channel-routing problem. The restricted channel-routing problem limits each net in a channel to use only one horizontal segment. In other words the channel router
uses only one trunk for each net. This restriction has the effect of minimizing
the number of connections between the routing layers. This is equivalent to
minimizing the number of vias used by the channel router in a two-layer metal
technology. Minimizing the number of vias is an important objective in routing
a channel, but it is not always practical. Sometimes constraints will force a
channel router to use jogs or other methods to complete the routing (see
Section 17.2.5 ). Next, though, we shall study an algorithm that solves the
restricted channel-routing problem.
17.2.4 Left-Edge Algorithm
The left-edge algorithm (LEA) is the basis for several routing algorithms [Hashimoto and Stevens, 1971]. The LEA applies to two-layer channel routing, using one layer for the trunks and the other layer for the branches. For example, m1 may be used in the horizontal direction and m2 in the vertical direction. The LEA proceeds as follows:
1. Sort the nets according to the leftmost edges of their horizontal segments.
2. Assign the first net on the list to the first free track.
3. Assign the next net on the list, which will fit, to the track.
4. Repeat this process from step 3 until no more nets will fit in the current track.
5. Repeat steps 2 to 4 until all nets have been assigned to tracks.
6. Connect the net segments to the top and bottom of the channel.
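A minimal Python sketch of the LEA follows (illustrative only; it ignores the vertical constraints discussed in Section 17.2.5 and represents each net simply by the span of its trunk):

```python
def left_edge(nets):
    """Restricted channel routing by the left-edge algorithm (sketch).

    nets: list of (left, right) column intervals, one trunk per net.
    Returns a list of tracks; each track is a list of net indices.
    """
    order = sorted(range(len(nets)), key=lambda i: nets[i][0])  # step 1
    placed = set()
    tracks = []
    while len(placed) < len(nets):
        track, right_end = [], float('-inf')
        for i in order:                       # steps 2-4: fill one track
            left, right = nets[i]
            if i not in placed and left > right_end:
                track.append(i)
                right_end = right
                placed.add(i)
        tracks.append(track)                  # step 5: start a new track
    return tracks

# Example: four trunks; nets 0 and 2 share a track, as do nets 1 and 3.
print(left_edge([(0, 3), (1, 5), (4, 8), (6, 9)]))   # [[0, 2], [1, 3]]
```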
Special Routing
The routing of nets that require special attention, clock and power nets for example, is normally done before detailed routing of signal nets. The architecture and structure of these nets are determined as part of floorplanning, but the sizing and topology of these nets are finalized as part of the routing step.
17.3.1 Clock Routing
Gate arrays normally use a clock spine (a regular grid), eliminating the need for special routing (see Section 16.1.6, Clock Planning). The clock distribution grid is designed at the same time as the gate-array base to ensure a minimum clock skew and minimum clock latency, given power dissipation and clock-buffer area limitations. Cell-based ASICs may use either a clock spine, a clock tree, or a hybrid approach. Figure 17.21 shows how a clock router may minimize clock skew in a clock spine by making the path lengths, and thus net delays, to every leaf node equal, using jogs in the interconnect paths if necessary. More sophisticated clock routers perform clock-tree synthesis (automatically choosing the depth and structure of the clock tree) and clock-buffer insertion (equalizing the delay to the leaf nodes by balancing interconnect delays and buffer delays).
The tools pass the bus capacitances (from circuit extraction, see Section 17.4) and the bus currents to the clock router. The sizes of the clock buses depend on the current they must carry. The limits are set by reliability issues to be discussed next.
Clock skew induced by hot-electron wearout was mentioned in
Section 16.1.6, Clock Planning. Another factor contributing to unpredictable
clock skew is changes in clock-buffer delays with variations in power-supply
voltage due to data-dependent activity. This activity-induced clock skew can
easily be larger than the skew achievable using a clock router. For example,
there is little point in using software capable of reducing clock skew to less
than 100 ps if, due to fluctuations in power-supply voltage when part of the
chip becomes active, the clock-network delays change by 200 ps.
The power buses supplying the buffers driving the clock spine carry direct current (unidirectional current, or DC), but the clock spine itself carries alternating current (bidirectional current, or AC). The difference between
electromigration failure rates due to AC and DC leads to different rules for
sizing clock buses. As we explained in Section 16.1.6, Clock Planning, the
fastest way to drive a large load in CMOS is to taper successive stages by a factor of approximately e ≈ 3. This is not necessarily the smallest-area or lowest-power approach, however [Veendrick, 1984].
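As a quick worked example of the tapering rule (the load and input-capacitance values below are assumed for illustration):

```python
import math

# Drive a large clock load from a small gate by tapering each stage
# by roughly e (about 3), as described above.
c_in = 0.02     # input capacitance of the first stage, pF (assumed)
c_load = 20.0   # clock-spine load, pF (assumed)

ratio = c_load / c_in                # total drive ratio, 1000x
stages = round(math.log(ratio))      # ln(1000) = 6.9, so about 7 stages
taper = ratio ** (1 / stages)        # actual taper per stage, about 2.7

print(stages, round(taper, 2))       # 7 2.68
```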
17.3.2 Power Routing
Each of the power buses has to be sized according to the current it will carry. Too much current in a power bus can lead to a failure through a mechanism known as electromigration [Young and Christou, 1994]. The required power-bus widths can be estimated automatically from library information, from a separate power simulation tool, or by entering the power-bus widths to the routing software by hand. Many routers use a default power-bus width, so it is quite easy to complete routing of an ASIC without even knowing about this problem.
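A back-of-the-envelope width estimate might look like the following sketch; the bus current and the electromigration current-density limit are assumed values for illustration, not figures from the text:

```python
# Electromigration-limited width estimate for a power bus.
i_avg = 50.0   # average DC current in the bus, mA (assumed)
j_max = 1.0    # electromigration limit, mA per micron of width (assumed)

width = i_avg / j_max   # required metal width, microns
print(width)            # 50.0 um: high-current supplies need wide buses
```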