CAD FOR VLSI

Subject Code: 06EC754                 IA Marks: 25
No. of Lecture Hrs/Week: 04           Exam Hours: 03
Total no. of Lecture Hrs: 52          Exam Marks: 100

PART - A
UNIT – 1&2

INTRODUCTION TO VLSI METHODOLOGIES: VLSI Physical Design Automation - Design and Fabrication of VLSI Devices - Fabrication process and its impact on Physical Design. 13 Hours

Sherwani : Chapters 1, 2 & 3
Gerez : Chapter 1

UNIT – 3&4

A QUICK TOUR OF VLSI DESIGN AUTOMATION TOOLS: Data structures and Basic Algorithms,
Algorithmic Graph theory and computational complexity, Tractable and Intractable problems. 13 Hours

Sherwani : Chapter 4
Gerez : Chapters 2, 3 & 4

PART - B
UNIT – 5&6

GENERAL PURPOSE METHODS FOR COMBINATIONAL OPTIMIZATION: Partitioning, floorplanning and pin assignment, placement, routing. 12 Hours

Sherwani : Chapters 5, 6, 7 & 8
Gerez : Chapters 5, 7, 8 & 9
UNIT – 7&8

SIMULATION - LOGIC SYNTHESIS: Verification - High level synthesis - Compaction. Physical Design Automation of FPGAs, MCMs - VHDL/Verilog - Implementation of simple circuits using VHDL and Verilog. 14 Hours

Sherwani : Chapters 12, 13 & 14
Gerez : Chapters 10, 11 & 12

REFERENCE BOOKS:

1. “Algorithms for VLSI Physical Design Automation”, N. A. Sherwani, 1999.


2. “Algorithms for VLSI Design Automation”, S. H. Gerez, 1998.


CHAPTER - 1

VLSI PHYSICAL DESIGN AUTOMATION

Design domains: VLSI design can be distinctly divided into three domains, namely Behavioral, Structural and Physical. As the names imply, the behavioral domain's description encompasses the functional equations, the structural domain comprises the building blocks, and the physical domain consists of the actual parts that are realized on the chip. These domains are best illustrated by means of Gajski's Y-chart, which is shown below –

The Y-chart has three axes (Behavioral, Structural and Physical), and the levels of abstraction along each axis are:

Behavioral axis : Systems; Algorithms; Register transfers; Logic; Transfer functions
Structural axis : Processors; ALU, RAM etc.; Flip-flops, Gates etc.; Transistors
Physical axis   : Physical partitions; Floor-plans; Module layout; Cell layout; Transistor layout

Figure 1.1 Gajski's Y-chart

1.1 VLSI DESIGN CYCLE

The design cycle for VLSI is represented by means of the following flowchart –

System specification    – Size, speed, power, functionality
      ↓
Architectural design    – RISC/CISC, ALUs, FLPUs
      ↓
Functional design       – Flowcharts, Algorithms
      ↓
Logic design            – RTL/HDL, Logic expressions
      ↓
Circuit design          – NAND/NOR, Flip-flops, Registers
      ↓
Physical design         – Rectangles, squares
      ↓
Fabrication             – Wafer, Die, Masks
      ↓
Packaging & testing     – DIP, PGA, BGA, QFP, MCM

Figure 1.2 VLSI design cycle (simple)

System specification is the high level representation of the system. The specs are expressed in terms of the size of the chip, its speed, power consumption and functionality. These decisions mainly depend on the type of product and its market survey.

Architectural design comprises decision processes such as RISC versus CISC, the no. of ALUs and FLPUs required, the size of cache memory, and so on.

Functional design, also called behavioral design, specifies the inputs and outputs in terms of flow-charts or timing diagrams. However, the implementation of the hardware is not specified here; that happens in the next stage.

Logic design clearly specifies the arithmetic and logic operations of the design in terms of HDLs such as Verilog or VHDL. This stage is also called "RTL description", as it indicates the register allocations for the internal data. Here, the minimized Boolean expressions and the timing information of the system are utilized for the simulation and verification of the behavior. (e.g.: Cadence tools - Encounter for RTL compilation and Incisive for simulation)

The circuit design stage represents the Boolean expressions in terms of gates and transistors. At this stage, specific automation tools are used to verify the correctness of the circuit diagram (e.g.: Cadence tools - Virtuoso for schematic entry and Spectre for circuit simulation). The representation of the interconnections is called a "netlist".

Physical design converts the circuit and interconnect into the actual physical shapes on the chip. The result is called the "layout", which is created by converting each component into its geometric representation. (e.g.: Cadence tools - Virtuoso Layout Suite for physical design and Assura for verification)

Fabrication is the process of realizing the design on the Silicon substrate. The layout data required for fabrication is typically sent to the foundry on a tape; hence this step is called "Tape Out". The geometric data is supplied in a format called GDS-II (earlier called Graphic Data System, now Generic Data Stream). This data is utilized to produce the photolithographic masks for each layer on the substrate. The Silicon wafer, typically 20 cm in diameter, is subjected to several cycles of fabrication steps, such as deposition, diffusion, implantation, etching etc.

Packaging and testing is the process of separating the individual chips from the wafer and housing each of them in a protective enclosure. A wafer may contain several hundred chips, and each one has to be tested and then packaged. There are several approaches for packaging, such as DIP, PGA, BGA and QFP. The choice of package depends on how the chip is mounted on the PCB, which in turn depends on the particular product and application. MCMs have bare chips mounted directly on the modules.

1.2 NEW TRENDS IN VLSI DESIGN CYCLE

The design flow described above is conceptually simple. But new trends in the industry require this design flow to be altered. The new trends are –

1. Increasing interconnect delay: Even though devices are becoming smaller and faster, the same is not true of the interconnect. Almost 60% of path delay is due to interconnect delay. The solution to reduce this delay is to insert repeaters in long wires; however, this in turn consumes more area for the interconnect.

2. Increasing interconnect area: Typically, in a microprocessor's die, the area covered with devices is only 60% to 70%, and the remaining area is utilized to accommodate the interconnect. The measure to tackle this problem is to use a larger no. of metal layers for the interconnect.

3. Increasing no. of metal layers: A three layer design has become very common nowadays. But vias occupy more space in the lower layers, as their number is greater there. Hence, it becomes necessary to increase the no. of metal layers for interconnect. Currently, four to five layers are utilized for microprocessors, and the latest designs contain upto eight layers. However, the limitation is the increased complexity of the process.

4. Increasing planning requirements: The above mentioned factors require that the design be carefully planned, and this planning should happen at the functional design stage itself. The planning can be categorized as block planning and signal planning.

Block planning assigns shapes and locations to the main functional blocks. Signal planning assigns the routing of the major buses and wires in three dimensions.

5. Synthesis: This is the process of converting the design description at one stage into the next stage towards construction. Depending upon the level of design, synthesis can be of two types, namely Logic synthesis and High level synthesis.

Logic synthesis converts an HDL description into schematics, and then produces the layout. The software tools designed for this purpose are utilized for complete ASICs, which come under full-custom design. These tools are not applicable for chips with large regular blocks such as RAM, ROM, PLA, ALU and µP chips: on such regular blocks the full-custom tool becomes very slow and the design becomes area inefficient. Hence, semi-custom design methods are adopted for such chips.

High level synthesis is one such approach, in which the functional description of the design is directly converted into RTL, and then into layout. These tools are called "Silicon compilers", and they are capable of converting the functional design directly into layout owing to the regularity of the design (e.g.: DSP architectures). This becomes possible due to the definition of the sub-circuits in terms of cells and macros.

Due to the above mentioned new trends in the design cycle, it becomes necessary to alter
the simple VLSI design cycle. This requires many iterations in the design process, as
indicated in the following figure –

System specifications
      ↓
Architectural design
      ↓
Functional design      → Early physical design
      ↓
Logic design           → Logic verify
      ↓
Circuit design         → Circuit verify
      ↓
Physical design        → Layout verification
      ↓
Fabrication            → Silicon debug
      ↓
Packaging & testing

Figure 1.3 VLSI design cycle (modified)

1.3 PHYSICAL DESIGN CYCLE

The input to the physical design cycle is the complete circuit diagram with the netlist, and
the output of the physical design cycle is the completed layout. This happens in several
stages, as indicated by the following flowchart –

Circuit design
      ↓
PHYSICAL DESIGN
  Partitioning                  – Blocks and Sub-blocks
      ↓
  Floorplanning and Placement   – Estimating the shapes and areas of the blocks, positioning the blocks
      ↓
  Routing                       – Interconnects as per the netlist: Global and Detailed
      ↓
  Compaction                    – Compressing the layout
      ↓
  Extraction and Verification   – Design Rule Check, Layout versus Schematic
      ↓
Fabrication

Figure 1.4 Physical design cycle

Partitioning: A chip may contain several million transistors. Hence, it is not possible to lay out the entire chip in one step. Therefore, to reduce the complexity of the physical design, the chip is divided into several sub-circuits, called blocks. The partitioning process is hierarchical: the chip may have 5 to 25 blocks at the top most level, and each block is recursively partitioned into smaller blocks. The factors considered for partitioning are the size of the blocks, the no. of blocks, and the no. of interconnections between blocks.

Floorplanning & Placement: Floorplanning is the process of estimating the areas of each block, as well as the interconnect area. In addition, the shapes of the blocks and the locations of specific components on the chip are also considered. During placement, the blocks are exactly positioned on the chip such that a minimum-area arrangement is achieved. At the same time, enough interconnect area is provided between the blocks so that the placement becomes routable.

Routing: Here, the interconnects between the blocks are completed according to the
specified netlist. The routing space is partitioned into “channels” and “switchboxes”.
There are two phases of routing namely – GLOBAL and DETAILED. Global routing
specifies the regions through which a wire should be routed. Detailed routing completes
the point-to-point connections. In other words, global routing finds those channels and
switchboxes through which the wire has to be connected. Detailed routing specifies the
geometric details of the wire such as – location, spacing, layer assignments etc.

Compaction: This is the task through which the total area is reduced, by compressing the
layout in all the directions. The compressing process should ensure that the design rules
are not violated. Hence, the compaction process requires sufficient computing time,
which increases the cost. Therefore, extensive compaction is used only for large volume
applications, such as µPs.

Extraction and Verification: There are two main processes, called DRC (Design Rule Check) and LVS (Layout Versus Schematic). During DRC, the design rules for the layout are verified, such as device dimensions and wire separations. During LVS, the circuit is re-generated from the layout; this is a reverse engineering process, in which the extracted circuit is compared with the schematic. In addition, there is another process in which the values of the parasitic R and C, in the layout as well as in the interconnect, are extracted. This is performed for performance verification and reliability verification.

1.4 NEW TRENDS IN PHYSICAL DESIGN CYCLE

1. Chip level signal planning: This includes routing of major signals and buses.
Global signals must be routed in the top metal layers.

2. OTC routing: Apart from the channel and switchbox routing approaches, "Over-The-Cell" routing is used to reduce the area and to improve the performance. In this approach, the pins are not brought to the block boundaries, but are brought to the top of the block, as a sea-of-pins.

1.5 DESIGN STYLES

The chip design can be broadly classified into two major styles namely “full-custom” and
“semi-custom”. Full-custom design starts the circuit design from scratch whereas semi-
custom design uses pre-designed parts of a circuit. The differences between these two
styles are tabulated as follows –

No.  Criteria         Full-custom                                 Semi-custom
1    Design           All the blocks are designed from scratch    Some blocks are pre-designed
2    Placement        Any block can be placed anywhere            The pre-designed blocks have to be placed in specific locations
3    Time-to-market   More                                        Less
4    Area             Highly optimized and compact                Moderately optimized
5    Cost             Lesser, if produced in large volumes        Lesser, if utilized in smaller volumes
6    Utility          High performing, less costly chips          Moderately performing, moderately costly chips
                      (e.g.: µPs)                                 (e.g.: ASICs)

The semi-custom style contains additional options, as illustrated below –

CHIP DESIGN

Full-custom Semi-custom

Standard cells
Gate arrays
Field Programmable Gate Arrays
Sea-of-gates

Standard cells: A cell is a predefined sub-circuit. Each cell is tested, analyzed, and then specified inside a cell library; there can be 500-1200 such predefined cells in the library. In the standard cell approach, all the blocks in the chip are cells. Cells are placed in rows, and the routing happens in the space between two rows, which is called a "channel". However, if a connection must be made between non-adjacent rows, then empty space has to be provided between the cells in a row. This space is called a "feed-through".

In case of a two-metal process, routing can be done over-the-cell as well (OTC routing), by means of the second metal layer; in that case, the first metal layer is utilized for routing within the cell. When three or more metal layers are provided, all the channels and feed-throughs can be removed, and the routing can be made completely OTC. Even though standard cell design is fast enough, the disadvantage is that its layout is non-hierarchical; hence, it is suited only for moderate sized circuits. The following figure illustrates the standard cell structure –

Figure 1.5 Standard cell structure

Gate Arrays: In case of Gate Arrays, all the cells are identical. The cells or blocks are
placed like an array; hence there exist both horizontal as well as vertical channels.
Depending upon the circuit requirement, the no. of partitioned blocks can be less than or
equal to the total no. of cells on the chip; i.e., all the gates in the array can be utilized, or
some of them can be left out. An example is as shown –

Figure 1.6 A conceptual gate array

The interconnects are done using horizontal as well as vertical channels. This is done
during fabrication, where the routing layers are fabricated on top of the wafer. In other
words, the gate-array is prefabricated without routing, and it is later customized for the
requirements, during fabrication.

The prefabricated Gate Array is called an "uncommitted gate array", and the completed wafer is called a "customized wafer". As the area is fixed, the no. of tracks in each channel is also fixed. When more tracks are required, an additional metal layer has to be utilized. Due to the prefabrication method, Gate Arrays are cheaper than full-custom or standard cell designs. However, the disadvantage is that Gate Arrays are not suitable for complex circuits. Gate Arrays are also non-hierarchical, just like standard cells.

Field Programmable Gate Arrays: In FPGAs, both the cells as well as the interconnect
are prefabricated. The blocks are programmable, and they are horizontally arranged. The
routing network is also programmable, which is made up of horizontal as well as vertical
tracks. The cells of FPGA are more complex than the standard cells. However, most of
the cells will have the same layout.

The logic blocks are programmed for the desired outputs by loading different Look-Up-Tables (LUTs). For a combinational logic function with a k-bit input and a 1-bit output, 2^k bits of LUT storage are required. Hence, the logic blocks have to use a small value of k, in order to remain easily programmable; the usual value of k is 5 or 6.
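
As an illustration of this LUT principle, a k-input logic block can be modeled as a table of 2^k configuration bits indexed by the input pattern. The following Python sketch is purely illustrative (the helper name make_lut is invented here, not part of any FPGA toolchain) –

    # Model of a k-input, 1-output FPGA lookup table (LUT).
    # The 2^k configuration bits are the "program"; evaluation is a table lookup.
    def make_lut(k, config_bits):
        assert len(config_bits) == 2 ** k, "a k-input LUT needs 2^k bits"
        def lut(*inputs):
            index = 0
            for bit in inputs:              # pack the k input bits into an index
                index = (index << 1) | (bit & 1)
            return config_bits[index]
        return lut

    # Program a 2-input LUT as an XOR gate (truth table for inputs 00,01,10,11):
    xor_gate = make_lut(2, [0, 1, 1, 0])
    print(xor_gate(1, 0))                   # prints 1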

There are two types of FPGAs namely: once-programmable and re-programmable. The
former ones use anti-fuses and cross-fuses. The latter ones use pass-transistor logic. In
the former approach, all of the interconnects are present by means of horizontal and
vertical lines, and the connections are made by the usage of anti-fuses.

The diagram of a committed FPGA using the anti-fuses is as shown below –

Figure 1.7 A committed FPGA

An anti-fuse is a special type of polymer, which has high resistance initially; it will
become a low-resistance path when high voltage is applied. Using such fuses, the
interconnect can be programmed; the fuse is called as cross-fuse when it is used to
connect the horizontal routes with the vertical ones. The following figure illustrates this
fact, in which anti-fuses are represented using plain squares and the blown anti-fuses are
represented using dark squares. Similarly, the cross-fuses are represented using plain
circles and the blown cross-fuses are represented using dark circles.

However, the FPGA that is realized using anti-fuses is programmable only once, as the
characteristics of the blown anti-fuse cannot be reestablished again. Therefore, to realize
the FPGA in the reprogrammable fashion, pass-transistor logic is utilized, in which the
pass-transistors can be used as switches for realizing the logic functions.

Sea-Of-Gates: In this approach, the master is completely filled with transistors, and there
will be no separate channels provided for routing. Therefore, channels are formed by
routing over the unused transistors. Hence, this design style has the highest density of
devices, with much better optimization of the area. However, as the master contains a
large no. of transistors, many transistors may remain unused throughout their lifetime.

COMPARISON OF DESIGN STYLES

Criteria         Full-custom   Standard cell   Gate array            FPGA
Cell size        Variable      Fixed           Fixed                 Fixed
Cell type        Variable      Variable        Fixed                 Programmable
Cell placement   Variable      In rows         Fixed                 Fixed
Interconnects    Variable      Variable        Variable              Programmable
Design cost      High          Medium          Medium                Low
Area             Compact       Moderate        Moderate              Large
Performance      High          Moderate        Moderate              Low
Fabrication      All layers    All layers      Only routing layers   No layers

1.6 SYSTEM PACKAGING STYLES

Any big system can be envisioned as the interconnections between different Printed
Circuit Boards (PCBs), and the PCBs themselves can be envisioned as the
interconnections between different Integrated Circuits. The IC is the packaged version of
the die, on which the chip and its corresponding circuit is realized. Hence, as far as the
system packaging is concerned, there are two factors namely – die packaging, and then
system packaging.

The constraints for the packaging approaches are – Cost, Performance and Area. At the
same time, the issues that concern the packaging styles are – Effective heat removal, and
the adequate provision for Testing and Repair.

THE DIE PACKAGING APPROACHES can be classified as follows –

IC Packages
  Thru-hole mount : DIP, PGA
  Surface Mount Assembly
     With leads : SOP, QFP
     Leadless : BGA
  Naked die : Wire bond, C4

In all of these packages, the chip is placed inside a plastic or ceramic casing, and the connections are made from the chip to the outside world by means of bonding wires. The IO pads on the die get connected to the Copper legs outside the package by means of platinum wires. The pitch inside the chip can be as small as 0.152 mm, while that outside the chip is 0.635 mm or more.

The DIP (Dual In-line Package) contains legs at both sides of the package, with 2.54 mm
pitch (1/10”) in between the legs. This is as shown in the figure below –

Figure 1.8 Chip placement on a PCB

The IC gets connected to the PCB by mounting it in through-holes and soldering the legs to the Copper pads on the other side of the PCB. If the PCB is double-sided and/or multi-layered, then plated-through-holes are utilized.

To avoid the soldering, and to make replacement of the IC easier, the Pin Grid Array (PGA) is utilized, in which the pins are arranged in concentric rectangular rows. This package is directly mounted on a ZIF (Zero Insertion Force) socket, and hence no soldering is required.

However, both DIP and PGA require thru-holes for mounting. Hence, Surface Mount Assembly (SMA) is used, in which direct mounting of the chip on the PCB is possible, without the usage of through-holes. SMAs reduce the package footprint, and they can have 1.27 mm, 1.00 mm or 0.635 mm pitch. These packages can have the external connections either on both sides (SOP – Small Outline Package), on all four sides (QFP – Quad Flat Pack), or as contact balls at the bottom of the package (BGA – Ball Grid Array). These packaging styles are in decreasing order of area and increasing order of mounting complexity.

All of the above mentioned packages will impose delay during the functioning of the
chip, because of the wires being attached to the pads. Hence, to reduce the delay, “naked
dies” are utilized, in which the dies are directly mounted on the board. This approach is
sometimes also called as COB (Chip On Board).

In case of naked dies, there is no ceramic or plastic casing for the die, and hence the area
on the board gets minimized. The naked dies can be of two types namely ‘wire bond’ and
‘controlled collapsed chip connection’ (C4). In case of the wire bond, the IO pads are
connected to the PCB, by means of the bonding wires. In case of C4, the solder ball is
directly placed on each pad, and the die is mounted upside down on the board.

THE SYSTEM CONNECTION APPROACHES can be categorized as PCB (Printed Circuit Board), MCM (Multi-Chip Module) and WSI (Wafer Scale Integration).

PCB: Most systems are built using PCBs, due to the ease of assembly and maintenance. A PCB can have as many as 30 layers, or more. Layer to layer connections are made by means of vias, which run between the two surfaces. When many layers are connected by means of a single via, it is called a "stacked via".

MCM: Here, many chips are mounted and interconnected onto a fine pitch substrate, and
the substrate in turn is mounted on the PCB. Thus, MCMs provide higher density
assemblies and faster interconnects. The dies on the substrate can be wire-bonded, or can
use C4 bonding. The wire-bonded example is shown below –

Figure 1.9 An MCM with wire-bonded dies

The advantages of MCM are – reduction in size, reduced no. of packaging levels, reduced
amount of interconnect, and cheaper assemblies. However, the disadvantage is that more
heat is being dissipated on the substrate, which has to be effectively removed (e.g.: RF
amplifiers and VCOs in cell-phones).

WSI: In this case, several types of circuits are fabricated on the entire wafer, and then the
defect-free circuits are interconnected. Thus, the entire system is realized on the wafer.
Hence, the advantages are – greatly reduced cost, high performance, and greatly
increased reliability. However, the disadvantages are – lesser yield, and the inability to
mix different fabrication processes.

COMPARISON OF DIFFERENT PACKAGING STYLES

Criteria PCB MCM WSI


Figure of merit 2.2 14.6 28.0
Density Medium High Highest
Yield Best Moderate Low
Testing & repair Easiest Easy Difficult
Heat removal Superior Moderate Poor

Note: Figure of merit is given by the product of Propagation speed (inches/ps) and the
Interconnect density (inches/sq.in)

1.7 HISTORICAL PERSPECTIVES

In the earliest days of chip design, the preparation of masks used to be a completely manual process. Initially the layout was drawn on paper, and this drawing was then transferred to a rubylith plastic sheet. The photolithographic masks were produced by reduction of this drawing. As is evident, this type of design is impossible for VLSI, with millions of transistors on a chip. With the advent of computers, VLSI design became automated at every stage, right from design entry till tape out. Design entry for digital designs is performed by means of an HDL description, and that for analog designs is performed directly by schematic description. As there are no standard cells available for analog designs, the schematic has to be drawn directly by the designer.

Once the design entry is complete, it must be possible to verify its functioning. This is
made possible by means of the simulation tools, which are available for both digital as
well as analog domains. After the functional verification is the synthesis part, in which
the schematic becomes finalized, and its physical design needs to be carried out. The
following table depicts the history of chip design from the earlier times till date –

Year Design tools


1950-65 Manual design using Rubylith plastic
1965-75 Layout editors, automatic routers
1975-85 Automatic placement tools
1985-90 Performance driven placement and routing tools
1990-95 OTC routing tools, synthesis tools
1995 onwards Interconnect design and modeling tools, Process related tools

Characteristics of a good design tool: As stated earlier, a good design tool must provide the following options to the designer: a) Schematic capture and layout, b) Functional verification by means of simulation, c) Physical verification (DRC, LVS).

The design entry tools that were developed initially were textual, and the present tools are
graphical. Some of the textual entry tools were – BELLE (Basic Embedded Layout
Language) and ABCD (A Better Circuit Description). As far as the graphical entry tools
are concerned, the designer has many options nowadays. The ones which were developed
earlier were – L-Edit and MAGIC.

L-Edit supports files, cells, instances and mask primitives. It uses SLY (Stack Layout Format); however, portability of designs is possible by converting into CIF (Caltech Intermediate Format). (SLY and CIF are low level graphics languages that are used for specifying the geometry.) MAGIC is based on the Mead & Conway design style. It allows automatic routing, compaction and circuit extraction. DRC is done as event based checking, and not as a lengthy post-layout operation. MAGIC permits only Manhattan designs. (The Manhattan approach allows only rectilinear routing, in which horizontal and vertical paths are traced; other directions are not supported.)

CHAPTER – 2

DESIGN AND FABRICATION OF VLSI DEVICES

The MOSFET was conceived by J. Lilienfeld and O. Heil in 1925. The BJT was invented by William Shockley, John Bardeen and Walter Brattain in 1947. Even though the MOSFET was conceived much before the BJT, its usage did not commence at the initial stage because of material problems, and this led to the development of the BJT in the meantime.

As BJT processing was quite successful, the MOSFET's usage remained lesser. However, the MOSFET became popular after 1971, when nMOS technology was developed. The arrival of CMOS had an enormous impact on integration, due to the advantages of reduced area and reduced power consumption. The following figures illustrate the construction as well as the working principle of an NPN BJT –

Figure 2.1 TTL transistor

Similarly, the following figures depict the working of n-channel E-MOSFET –

Figure 2.2 n-channel MOSFET

The characteristics of CMOS and MOS technologies can be compared, as given below –

CMOS                                     MOS (nMOS / pMOS)
Static power dissipation is zero.        Power is dissipated when the output is at '0'.
Requires 2N devices.                     Requires (N+1) devices.
Non-ratioed logic.                       Ratioed logic.
Layout styles are regular.               Layout styles are irregular.
Process is complicated and expensive.    Due to only one MOS type, the process is less complicated.

FABRICATION OF VLSI CIRCUITS

The Silicon wafers are sliced from the ingot that is obtained by the Czochralski process.
The wafers are typically of 200 mm diameter, with 400µm thickness. For the realization
of the circuit on the chip, the wafers are subjected to three basic steps namely – Create,
Define and Etch – which are graphically depicted as follows –

Silicon wafers
      ↓
Create : Material formation by deposition, diffusion or implantation
      ↓
Define : Pattern definition by photolithography
      ↓
Etch : Removal of unwanted parts of the pattern by etching
      ↓
(8 to 10 iterations, then on to testing and packaging)

Figure 2.3 Basic fabrication steps

As is evident from the figure, the Create step comprises deposition, diffusion or implantation. The Define step consists of the pattern definition on the chip, by means of photolithography. The Etch step removes the unnecessary materials from the surface of the substrate. For large circuits, this cycle may be performed for upto 200 steps iteratively.

nMOS fabrication process: The steps involved for the fabrication of nMOS structures
can be summarized as follows –

i) Growth of oxide layer on lightly doped p-type substrate
ii) Etching of oxide, to expose the active regions on the wafer
iii) Polysilicon deposition for the gates
iv) Phosphorous diffusion, to form sources and drains
v) Field oxide growth
vi) Etching of oxide, for the contacts
vii) Metallization
viii) Ion-implantation for D-MOSFETs
ix) Passivation layer growth

Note: In case of D-MOSFETs, buried contact is utilized for the connection of gate to
source; this buried contact is in between poly and diffusion.

CMOS fabrication process: The CMOS circuits can be realized by the utilization of P-
well, N-well and Twin-well. The following figure shows a p-well CMOS inverter –

Figure 2.4 A p-well CMOS inverter

The complete fabrication process consists of eleven important processes, which are
described as follows –

1. Crystal growth and wafer preparation: During the Czochralski process, single
crystal ingots are pulled from the molten Silicon. These ingots are ground to a
cylindrical shape. Later, the Silicon slices are sawed from the ingot. These slices
are used as wafers, for the fabrication of VLSI circuits.

2. Epitaxy: (Epi=upon, taxis=ordered) It is the process of deposition of a thin single-


crystal layer, on the surface of the substrate. This is a CVD (Chemical Vapor
Deposition) process, taking place in between 900°C to 1250°C. This is easier than
diffusion or ion-implantation, and hence is preferred for such crystal growth.

3. Dielectric deposition: This is also a CVD process, utilized to form the gate-oxide
layer. Here, Silane is oxidized at 400°C and the chemical reaction is as follows –

SiH4 + O2 → SiO2 + 2H2

4. Oxidation: The oxide layer formed on the substrate has the following functions –

- It serves as a mask against implant or diffusion.
- It provides surface passivation for the complete chip.
- It isolates the devices from each other.
- It acts as the gate dielectric in MOS structures.
- It isolates the MLM (multi-level metal) systems from each other.

5. Diffusion: This is the process during which the impurity atoms move into the
crystal lattice. The dopants that are used for the formation of sources and drains
are – Arsine (AsH3), Phosphine (PH3), Diborane (B2H6) etc. The first two are used
for nMOS and the last one is used for pMOS. The dopant atoms can be introduced
in one of the following ways:

- From a chemical source in vapor form.


- From doped oxide source.
- By annealing from an ion-implanted layer.

6. Ion implantation: During this process, the ionized dopant atoms are provided with
high energy, to penetrate beyond the surface. It is a low temperature process, but
with 3 keV to 500 keV energy, for doping Boron, Phosphorous or Arsenic ions.
This is sufficient to implant the ions from about 100Å to 10,000Å below the
surface. Compared to diffusion, precise control is possible in this process.

7. Lithography: (Lith = foundation, graph = writing) This is the process, during


which the required geometric pattern is realized on the surface of the substrate.
Initially, the wafer is spin-coated with a Photoresist. Later, the mask is placed
very close to the wafer and the wafer is then exposed to radiation (UV, X-ray).
After exposure, the unwanted areas are removed by means of etching.

PhotoResists are the polymeric materials that are sensitive to radiation. The
requisite properties of PR are as follows –

a) Mechanical - Flow characteristics, thickness


b) Chemical - Adhesion, thermal stability
c) Optical - Photosensitivity, resolution

After the exposure, the wafer is first soaked in a solution that develops the images
in the PR. During this process, the PR film at the unwanted areas is removed.
Later, the oxide layer in those areas is removed, by means of etching.

8. Metallization: For the purpose of interconnect, metal is deposited on the surface of the substrate (for example, by evaporation or sputtering). Aluminium was utilized earlier, and now Copper is preferred, due to its lower resistance and reduced electro-migration.

9. Etching: This is the process of selective removal of unmasked portions of a layer. There are basically two types of etching, namely Dry etching and Wet etching.

Dry etching is accomplished by means of Plasma or RIE (Reactive Ion Etching), whereas Wet etching is performed by using chemical solvents. Wet etching is isotropic, in the sense that the etching process removes larger areas underneath than at the surface; this happens due to the creeping of the solvent underneath. Hence, dry etching methods are preferred, as they are anisotropic.

10. Planarization: This process accomplishes a smooth surface of the wafer, after each
metallization step. This can be performed for any no. of metal layers, and the
common method used is CMP (Chemical Mechanical Polishing). The Contact and
Via layers are filled with Tungsten plugs, after which, CMP is performed.

11. Packaging: Finally, the wafers are cut into chips and the good chips are packaged
in a small plastic or ceramic case. Pads are connected to the legs by means of tiny
gold or platinum wires, with the help of a microscope. The case is then sealed and
the finished package is tested.

DESIGN RULES

The objectives of the design rules are as follows –

i) to prevent unreliable layouts.
ii) to preserve the integrity of features.
iii) to ensure the connection of thin features.
iv) to avoid the slipping of contact-cuts.
v) to guarantee an acceptable yield.

Based on the approaches, there are two types of design rules, as follows –

a) Micron-based rules: Here, all the dimensions are expressed in microns, and hence the layout will have minimum area. The disadvantage is that when the process shifts from one technology to a smaller-dimension one, the entire layout may have to be reworked.
b) λ-based rules: Here, all the dimensions are defined in terms of a parameter "λ", where "λ" is the resolution of the process (i.e., related to the minimum feature size). In this case, the design is simpler, due to the standard parameter. However, the disadvantage is that minimum area may not be achievable.

Based on the λ-based design, the basic nMOS rules are as follows –

Diffusion region width        2λ
Poly-silicon region width     2λ
Diffusion-diffusion spacing   3λ
Poly-poly spacing             2λ
Poly-silicon gate extension   2λ
Contact extension             1λ
Metal width                   3λ
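
A design-rule checker essentially walks the layout geometry and flags every feature that violates such a table. A minimal width-check sketch in Python, using the λ values above (the data structures are hypothetical, for illustration only) –

    # Toy lambda-based width check for rectangles on one layer.
    # Minimum widths in lambda units, taken from the table above.
    MIN_WIDTH = {"diffusion": 2, "poly": 2, "metal": 3}

    def check_widths(layer, rects):
        """rects: list of (x1, y1, x2, y2) tuples in lambda units."""
        errors = []
        for r in rects:
            w = min(r[2] - r[0], r[3] - r[1])   # narrower side of the rectangle
            if w < MIN_WIDTH[layer]:
                errors.append((r, "width %d < %d lambda" % (w, MIN_WIDTH[layer])))
        return errors

    print(check_widths("metal", [(0, 0, 2, 10)]))   # 2λ-wide metal: a violation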

Based on the prerequisites, the design rules can be further classified as follows –

i) Size rules: These rules specify the minimum feature sizes on different layers. In addition, as interconnects run over a rough surface, they should have a larger size than that of the devices.

ii) Separation rules: These rules specify the minimum distance between different features. For good interconnect density, the separation rule is defined similarly to the size rule. The following figure illustrates the size and separation rules together –

Figure 2.5 Size and separation rules

iii) Overlap rules: These rules specify the overlaps for the different layers. To avoid fatal errors, the overlaps must be distinctly defined for devices, vias and contact cuts. The following figures demonstrate the overlap rules as applicable to transistor formation, as well as to contact cuts –

Figure 2.6 Transistor formation

Bottom view Top view

Figure 2.7 Contact cut dimensions

Note: Only nMOS design rules are discussed here. CMOS will have more complicated
design rules, as additional rules are needed for the tubs and the pMOS devices.

Apart from the above mentioned rules, there are many other rules which cannot be scaled in "λ", due to different process constraints. Some of them are as mentioned below –

i) The size of the bonding pads (determined by the diameter of the bonding wire).
ii) The size of the cut in the overglass, for the contacts with the pads.
iii) The scribe line width (this line exists in between two chips).
iv) The feature distance from the scribe line, to avoid damage during scribing.
v) The feature distance from the bonding pad, to avoid damage during bonding.
vi) The bonding pitch (determined by the accuracy of the bonding machine).

LAYOUT OF BASIC DEVICES

As already discussed, layout is the process of translating the schematic symbols into their physical representations. Given below are some example layouts for inverters, NAND gates and NOR gates –

Figure 2.8 nMOS inverter

Figure 2.9 CMOS inverter

Figure 2.10 nMOS NAND gate

Figure 2.11 CMOS NAND gate

Figure 2.12 nMOS NOR gate

Figure 2.13 CMOS NOR gate

MEMORY CELLS

Memory cells are utilized to store and retrieve one bit of data. Each memory cell has a line called the "word line", also called SELECT, which is used for accessing the particular cell. The memory cell also has another line called the "bit line", also called DATA, which is used for storage and/or retrieval of the bit. This architecture is illustrated in the figure below –

Figure 2.14 A generic RAM cell

SRAM: This is built using two cross-coupled inverters to form a latch. The memory cell
is accessed using two pass transistors, which are connected to data bit and its complement
respectively. When SELECT is high, the bit can be either stored or retrieved. When
SELECT is low, the pass transistors are off, and the status corresponds to a hold state.

Figure 2.15 A CMOS SRAM cell (cross-coupled inverters accessed through pass transistors, with SELECT, BIT and BIT-bar lines)

As n-MOSFETs pass good '0's and poor '1's, it would be better to use transmission gates; it would also be sufficient to use only one transmission gate instead of two. But a transmission gate consumes larger area, due to the difference in nMOS and pMOS layouts. Hence, the scheme followed is to store the bit as well as its complement, so that one of them will contain '0', and the data will be latched from either side. The detailed circuit diagram of the SRAM cell is as shown below –

Figure 2.16 The detailed diagram of SRAM cell

The switching time of the SRAM cell is determined by the output capacitance and the feedback network. The cell directly follows the CMOS inverter characteristics, and the charging and discharging time constants are given by –

τch = CL / [βp (VDD – |VTp|)]        τdis = CL / [βn (VDD – VTn)]

where CL is the load capacitance, and βn & βp are the transconductance parameters of the n and p transistors respectively (i.e., βn = µnCoxW/L, and βp = µpCoxW/L). As a general rule, βn = 2βp, and if the dimensions of the n and p transistors are the same, then –

τch = 2τdis

DRAM: This is built using one transistor, one storage capacitor, and the input
capacitance at the bit line. The construction of the cell is as shown –

Figure 2.17 A CMOS DRAM cell

When both SELECT line and BIT line are asserted, the storage capacitor gets charged
to the bit line voltage, and this corresponds to the write operation. When only the
SELECT line is asserted, then the storage capacitor discharges through the transistor,
and this corresponds to the read operation.

Write operation: When SELECT is set high and the Bit Line is at Vmax, the storage capacitor charges through the pass transistor according to

Vc(t) = Vmax [(t/τch) / (1 + t/τch)]        ... (1)

where τch is the charging time constant, set by Cstore and the pass-transistor parameters. Similarly, when SELECT is set high and the Bit Line is at 0V, Cstore decays, and hence

Vc(t) = Vmax [2e^(–t/τdis) / (1 + e^(–t/τdis))]        ... (2)

where τdis is the corresponding discharge time constant.

Now, tLH is the time taken to reach 90% of Vmax, and tHL is the time taken to fall to 10% of Vmax. Therefore, from equation (1),

0.9Vmax = Vmax [(t/τch) / (1 + t/τch)]
Or, 0.9 + 0.9(t/τch) = (t/τch)
Or, 0.9 = 0.1(t/τch)
i.e., tLH = 9τch        ... (3)

Similarly, from equation (2),

0.1Vmax = Vmax [2e^(–t/τdis) / (1 + e^(–t/τdis))]
Or, 0.1 + 0.1e^(–t/τdis) = 2e^(–t/τdis)
Or, e^(t/τdis) = 19
i.e., tHL = τdis ln 19 = 2.94τdis        ... (4)

Now, as τch = 2τdis, equation (3) gives tLH = 18τdis.

Therefore, tLH = 6.12 tHL

Thus, it takes longer to load a logic '1' than to load a logic '0' into the DRAM cell.
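
This asymmetry can be verified numerically; a small Python sketch (assuming an arbitrary τdis of 1 ns) that evaluates the crossing times from equations (3) and (4) –

    import math

    tau_dis = 1.0              # assumed discharge time constant (ns), arbitrary
    tau_ch = 2 * tau_dis       # since tau_ch = 2 * tau_dis

    t_lh = 9 * tau_ch                # equation (3): rise to 90% of Vmax
    t_hl = tau_dis * math.log(19)    # equation (4): fall to 10% of Vmax

    print(t_lh / t_hl)         # -> ~6.12: writing a '1' is ~6x slower than a '0'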

Read operation: Let Vc be the initial voltage at Cstore, and let Vpre be the initial (precharge) voltage at Cline. Then the total system charge is QT = Vc Cstore + Vpre Cline.

When the SELECT line goes high, the transistor conducts and the two capacitors are in parallel. Hence the equilibrated voltage is given by

Vf = QT / (Cstore + Cline) = (Vc Cstore + Vpre Cline) / (Cstore + Cline)        ... (1)

Let "r" be defined as the capacitance ratio, r = Cline/Cstore. Then,

Vf = [Vc (Cline/r) + Vpre Cline] / [(Cline/r) + Cline]

Or, Vf = (Vc + r Vpre) / (1 + r)        ... (2)

If logic '1' is stored in the cell, then Vc = Vmax. Therefore,

V1 = (Vmax + r Vpre) / (1 + r)        ... (3)

Similarly, if logic '0' is stored in the cell, then Vc = 0. Therefore,

V0 = r Vpre / (1 + r)        ... (4)

The difference between a logic '1' and a logic '0' is ∆V = V1 – V0, which is given by

∆V = Vmax / (1 + r)

This equation clearly indicates that a small value of "r" leads to a large value of ∆V, and hence results in satisfactory operation of the DRAM cell.
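
For a feel of the numbers, equations (2) to (4) can be evaluated directly; with Vmax = 1.8V, Vpre = 0.9V and r = 8 (illustrative values, not from the text), ∆V is only 0.2V, which is why DRAM bit lines require sensitive sense amplifiers. A short Python sketch –

    # Equilibrated bit-line voltages for a DRAM read, from equations (2)-(4).
    def read_levels(vmax, vpre, r):
        v1 = (vmax + r * vpre) / (1 + r)    # cell stored a '1'
        v0 = (r * vpre) / (1 + r)           # cell stored a '0'
        return v1, v0, v1 - v0              # delta-V = vmax / (1 + r)

    print(read_levels(vmax=1.8, vpre=0.9, r=8))   # delta-V = 0.2 V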

CHAPTER – 3

FABRICATION PROCESS & ITS IMPACT ON PHYSICAL DESIGN

The biggest driving force behind the digital revolution is the ability to continuously miniaturize the transistor. The smaller the device, the faster its operation and the denser the chip. As the size of the device is reduced, more circuitry can be packed on the chip and more functionality can be implemented. This in turn has the advantage of making electronic products portable, cheaper and user-friendly.

On the other hand, if the fab has to be on the profitable side, then the chips must be
manufactured in high volumes. Hence the semiconductor manufacturers use those
processes and dimensions which will allow them to have better yields. When the process
matures and the yield increases, the layout can be shrunk to reduce the dimensions. In
other words, the yield of the process has direct impact on the shrinking of the device size.
Thus, the device size cannot be reduced unless the fab process is matured.

To summarize this discussion, as the transistor size reduces, it becomes faster. As the
area of the device gets reduced, it consumes less power. And, as circuitry gets increased,
the cost of each transistor gets reduced. This in turn makes the products cheaper.

3.1 SCALING

This is the process of shrinking the layout, in which every dimension is reduced by a particular factor. When a chip is scaled, it leads to smaller die size and increased yield. For example, if a chip's dimension is (x × x) in the previous process, and if the shrink factor is 0.7 in the next process, then the area = 0.7x × 0.7x = 0.49x². Thus, with a shrink factor of 0.7, the total area gets reduced by about 50%.

But the scaling procedure is not this easy, because of the following issues –

1. As transistors are scaled, the other characteristics – such as delay, leakage current and threshold voltage – do not get scaled uniformly. In other words, the shrink factor does not scale the device characteristics by the same amount.
2. Interconnect delay becomes more dominant with scaling, and 50 – 70% of the overall delay is contributed by the interconnect. This happens due to the reduced size of the interconnect wires, which in turn reduces the speed of the circuitry.

As scaling directly affects the physical design, and as interconnect planning becomes more prominent, it becomes important to consider the methods by which scaling is performed. The scaling factor is designated "S", with S ≥ 1 (i.e., 0 < 1/S ≤ 1): S = 1 implies no scaling, and S > 1 implies reduction in size. The condition S < 1 is not applicable here, as is obvious.

There are mainly two types of scaling that are followed namely – full scaling and
constant voltage scaling, which are described as follows.

Full scaling: In this case, all the device dimensions, all the voltages and all the other
parameters are reduced by the same factor. The advantage of full scaling is that high
speed chips with large no. of transistors are possible. However, the disadvantage is that,
there is a limit to the value of “S”, due to the second order effects.

Constant voltage scaling: In this case, only the device dimensions are scaled, and the voltage levels are maintained as earlier. Hence, the current naturally gets increased by a factor of "S". The advantage of CV scaling is that the gate delay gets reduced by an additional factor of "S"; in addition, the chip does not need multiple power supplies. However, the disadvantage is the higher power dissipation density, which leads to increased electric fields, thus reducing the life of the chip.

The following table summarizes the properties of the two scaling methods –

Parameter                 Full scaling    Constant voltage scaling
Dimensions (W, L, tox)    1/S             1/S
Voltages (Vdd, Vt)        1/S             1
Gate capacitance          1/S             1/S
Current                   1/S             S
Propagation delay         1/S             1/S²
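
The table translates directly into a small calculator; the following Python sketch applies the two styles to an illustrative device (the baseline numbers are assumptions, not from the text) –

    # Apply full scaling vs constant-voltage (CV) scaling by a factor S,
    # following the rows of the table above.
    def scale(device, S, style):
        out = dict(device)
        out["L_um"] = device["L_um"] / S                 # dimensions always 1/S
        out["Vdd"] = device["Vdd"] / S if style == "full" else device["Vdd"]
        out["delay_ps"] = (device["delay_ps"] / S if style == "full"
                           else device["delay_ps"] / S ** 2)
        return out

    base = {"L_um": 0.5, "Vdd": 3.3, "delay_ps": 100.0}
    print(scale(base, 2, "full"))   # 0.25 um, 1.65 V, 50 ps
    print(scale(base, 2, "cv"))     # 0.25 um, 3.30 V, 25 ps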

3.2 STATUS OF THE FABRICATION PROCESS

Here, the status of the fabrication process in 1998 is being discussed, by considering the
0.25µm process. For this process, Vdd = 1.8V and tox = 4.2nm. The process utilizes STI
(Shallow Trench Isolation) and supports 5 to 6 layers of Aluminium interconnect.

The metal lines have a high aspect ratio (t/w). A higher aspect ratio provides better interconnect performance, but also introduces wall-to-wall capacitance. The pitch between the metal lines is twice the width. The contacts and vias are filled with Tungsten. Finally, planarization is performed by CMP.

A comparison of the fabrication processes for the 0.25µm technology is summarized in the following table, which compares the process details of five major companies. All the processes support stacked vias. Here, M0 is the ground plane, also called Local interconnect, in which the poly layer itself is utilized as interconnect. Wherever possible, poly and diffusion are connected by means of a buried contact, thus avoiding metal.

Company               IBM       AMD     DEC      TI     Intel
Process name          CMOS-6x   CS-44   CMOS-7   C07    P856
No. of metal layers   6         5       6        5      5
M0                    Yes       Yes     No       No     No
Vdd                   1.8V      2.5V    1.8V     1.8V   1.8V
M1 pitch (µm)         0.7       0.88    0.84     0.85   0.64
M2 pitch (µm)         0.9       0.88    0.84     0.85   0.93
M3 pitch (µm)         0.9       0.88    1.7      0.85   0.93
M4 pitch (µm)         0.9       1.13    1.7      0.85   1.6
M5 pitch (µm)         0.9       3.0     1.7      2.5    2.56

3.3 ISSUES RELATED TO THE FABRICATION PROCESS

1. Parasitic effects: The two main parasitic effects are the parasitic capacitance and the interconnect capacitance. The parasitic capacitance comprises the stray capacitance, the gate-to-source capacitance and the drain-to-gate capacitance. In case of inverters, CDG gets charged in opposite directions for the inputs "0" and "1". Hence, the total capacitance of the node = Cstray + CGS + 2CGD.

The interconnect capacitance exists between wires across two layers and between wires within a layer. This capacitance can be reduced by routing the wires on adjacent layers perpendicular to each other, and by increasing the wire spacing.

2. Interconnect delay: The resistance of a uniform slab of conducting material is given by

R = ρL / (t w)

where ρ is the resistivity, L the length, t the thickness and w the width of the slab. The empirical formula for the capacitance (per unit length) of an interconnect line of width w and thickness t, at a height h above the substrate, is given by

C = ε [1.15 (w/h) + 2.80 (t/h)^0.222]
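
Taken together, the two formulas give a quick estimate of the distributed RC of a wire. A Python sketch, with material constants and dimensions assumed purely for illustration –

    # Estimate wire resistance, capacitance and the RC product
    # using R = rho*L/(t*w) and the empirical capacitance formula above.
    RHO_CU = 1.7e-8            # resistivity of copper, ohm-metre
    EPS_OX = 3.9 * 8.85e-12    # permittivity of SiO2, F/m

    def wire_rc(length, width, thickness, height):
        """All dimensions in metres; returns (R in ohms, C in farads)."""
        r = RHO_CU * length / (thickness * width)
        c_per_m = EPS_OX * (1.15 * (width / height)
                            + 2.80 * (thickness / height) ** 0.222)
        return r, c_per_m * length

    r, c = wire_rc(length=1e-3, width=0.5e-6, thickness=1e-6, height=0.5e-6)
    print(r, c, r * c)         # a 1 mm line: ~34 ohms, ~0.15 pF, RC of a few ps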

3. Noise and crosstalk: The low noise margin and the high noise margin are given by LNM = max (VIL) – max (VOL) and HNM = min (VOH) – min (VIH). The crosstalk depends on the closeness of the lines, the distance from the ground plane, and the distance over which the two lines run close to each other.

4. Interconnect size and complexity: As the no. of transistors on a chip increases, the no. of nets on the chip also increases. Thus it becomes important to have advance information about the no. of I/Os a chip should have, based on the no. of transistors on the chip. Rent's rule is typically used for this estimation, and the rule is stated as follows –

C = K N^n

   where C = average no. of I/Os (pins) per block
         K = proportionality constant for the sharing of signals (typically equal to 2.5 for high performance systems)
         N = no. of transistors per block
         n = Rent's constant (typical value 1/3 < n < 1/1.5)

Example: If N = 10,000 and n = 0.33, then C = 2.5 (10000)^0.33 ≈ 52 pins.
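
The rule is trivially scripted; a one-function Python sketch that reproduces the example above –

    def rent_pins(n_transistors, k=2.5, n=0.33):
        """Rent's rule estimate of the pin count: C = K * N**n."""
        return k * n_transistors ** n

    print(round(rent_pins(10_000)))   # -> 52 pins, as in the example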

5. Power dissipation: The power consumption of the chip can be classified as static and dynamic. The static power consumption is due to the leakage current, and the dynamic power consumption is due to the charging of CL as well as the switching transients of the inverters. This dynamic power is expressed as

P = CV²f

For CMOS, the 'off' state power is 0.003 mW/MHz/gate and the 'on' state power is 0.8 mW/MHz/gate. For ECL, the power is 25 mW/gate, irrespective of the state.
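
For instance, the dynamic term is a direct product; a short Python check with assumed numbers –

    # Dynamic power P = C * V^2 * f for a switched capacitance.
    def dynamic_power(c_farads, vdd_volts, freq_hz):
        return c_farads * vdd_volts ** 2 * freq_hz

    # 10 pF of switched capacitance at 1.8 V and 500 MHz -> ~16 mW
    print(dynamic_power(10e-12, 1.8, 500e6))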

6. Yield and fabrication costs:

a) Cost of an untested die –

Cd = Cw / (Nd Y)

   where Cw = cost of wafer fabrication
         Nd = no. of dies on the wafer
         Y = probability of functioning of a die

b) No. of dies on the wafer –

Nd = π (D – α)² / (4X²)

   where D = dia of the wafer (usually 10 cm)
         α = useless scrap edge width
         X = chip dimension

c) Yield –

Y = (1 + δA/c)^(–c)

   where A = area of the chip
         δ = defect density (per sq. mm)
         c = defect clustering parameter

d) No. of gates –

Ng = (A – P Aio) / Ag

   where P = total no. of pads on the chip
         Aio = area of an I/O cell
         Ag = area of a logic gate

e) No. of pads (for pads along the chip periphery, approximately) –

P = 4X / S

   where S = minimum pad-to-pad pitch

Note: A new process will have low yield, due to the instability of the process. Also, large size chips will result in low yield, due to the uniform defect density.

Exercise-1: The chip dimension is 25 mm and 40% of the chip area is occupied by interconnect. If the value of λ is 0.7µm, find the no. of transistors on the chip.

Solution: Total area = 25 mm × 25 mm = 625 mm² = 625 × 10⁶ µm²
Effective area = 625 × 10⁶ – (0.4 × 625 × 10⁶) = 375 × 10⁶ µm²
Area of a transistor = 6λ × 6λ = 4.2 µm × 4.2 µm = 17.64 µm²
No. of transistors = (375 × 10⁶) / 17.64 = 21258503 ≈ 21 million

Exercise-2: If the diameter of the wafer is 10 cm, the chip dimension is 6 mm and the useless scrap edge width is 4 mm, what is the no. of dies on the wafer?

Solution: Given that D = 10 cm = 100 mm, X = 6 mm and α = 4 mm. The no. of dies on the wafer is given by –

Nd = π (100 – 4)² / (4 × 6²) ≈ 201
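
The die-count and yield formulas above are easy to script; the following Python sketch reproduces both exercises, with the negative-binomial yield of (c) included for completeness –

    import math

    def dies_per_wafer(d_mm, alpha_mm, x_mm):
        """Nd = pi * (D - alpha)^2 / (4 * X^2), all dimensions in mm."""
        return math.pi * (d_mm - alpha_mm) ** 2 / (4 * x_mm ** 2)

    def die_yield(area_mm2, delta, c):
        """Y = (1 + delta*A/c)^(-c), the yield model from this section."""
        return (1 + delta * area_mm2 / c) ** (-c)

    print(int(dies_per_wafer(100, 4, 6)))        # -> 201  (Exercise-2)

    # Exercise-1: 25 mm chip, 40% interconnect area, lambda = 0.7 um
    active_um2 = 0.6 * (25_000 ** 2)             # 60% of the chip, in um^2
    per_transistor_um2 = (6 * 0.7) ** 2          # one 6-lambda x 6-lambda device
    print(int(active_um2 / per_transistor_um2))  # -> ~21 million transistors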

3.4 FUTURE OF THE FABRICATION PROCESS

a) SIA roadmap: A production fab costs upwards of two billion dollars (circa 1998). Hence it is clear that semiconductor manufacturers are no longer in a position to function independently. Process innovations and research activities require joint efforts from manufacturers all over the world. Hence, to reduce the cost of process equipment and to hasten R&D, the Semiconductor Industry Association (SIA) was established, and this organization formulated a vision for the future of the process. This roadmap (circa 1997) is as indicated in the table below –

Feature size                  250 nm        180 nm        130 nm        100 nm
Time frame                    1997          1999          2003          2006
Transistors per sq. cm        3.7 million   6.2 million   18 million    39 million
Chip frequency                > 500 MHz     > 750 MHz     > 1100 MHz    > 1500 MHz
Chip size                     300 sq. mm    360 sq. mm    430 sq. mm    520 sq. mm
Pins per package              512           512           768           768
Vdd (desktop)                 2.5 V         1.8 V         1.5 V         1.2 V
Vdd (portable)                1.8 V         0.9 V         0.9 V         0.9 V
Minimum interconnect pitch    640 nm        460 nm        340 nm        260 nm
Maximum interconnect length   820 m         1480 m        2840 m        5140 m
Metal aspect ratio            1.8           1.8           2.1           2.4

b) Advances in lithography: If the no. of transistors on the chip is to be increased, then it becomes essential that lithography supports the reduced device dimensions. This in turn depends on the resolution of the process. Thus the term 'photolithography' becomes a misnomer, as the wavelength of UV light becomes a limitation for the process. Hence, lithography in the near future will utilize X-rays, which can produce resolution down to 20 nm, and E-beam (Electron beam), which produces still finer resolution.

c) Innovations in interconnect:

Local interconnect – As mentioned earlier, this allows direct connections between poly and diffusion, in which poly itself is used for interconnect. Local interconnect is also known as M0 or the ground plane. By means of M0, 25-50% improvement is possible in the cell layout. But here, the resistance of the poly becomes a serious limitation, which can be reduced by using silicide.

Copper interconnect – When compared to Aluminium, Copper has 40% lower resistance. When Copper is used for interconnect, the system speed increases by about 15%. In addition, Copper has lesser electro-migration, and can withstand higher current densities than Aluminium.

But Copper was not used earlier, because of its diffusion into Silicon. Now, as the process has improved, Copper can be used as interconnect, by providing an ultra-thin barrier between the metal wires and the Silicon.

d) Innovations in devices:

Lower Vt devices have higher speed, and hence such devices can be used on the chip for achieving higher speed. But the leakage current increases as Vt is reduced, which in turn increases the power consumption. Hence, all the transistors cannot have lower Vt. Therefore, as an optimization, lower Vt devices can be used in critical paths, where the speed requirement is greater, and the other devices can be used in the remaining circuitry.

By means of multiple implant passes, Multi-Threshold Devices (MTDs) can be developed. These devices can be utilized on the chip, to have a selection of Vt in selected areas. The present process allows dual Vt devices.

e) Other process innovations: To overcome the limitations of Silicon, alternatives are considered, and two such alternatives are listed as follows –

i) SOI (Silicon On Insulator): Here, the substrate is either sapphire or SiO2. To produce the substrate, oxygen is implanted into the Silicon wafer in heavy doses, and then the wafer is annealed.

The advantages of SOI are – no body effect, no latch up, lower Vdd, lower power consumption and lower leakage current. The cycle time is 20-25% lesser when compared to the bulk Silicon process.

ii) SiGe (Silicon Germanium): New RF applications require speeds of upto 30 GHz, and this speed is not achievable with Silicon. Another alternative is Gallium Arsenide; but GaAs, being a compound semiconductor, requires a different process, and hence is quite expensive. Therefore, SiGe is used for high frequency applications, in which Germanium is added in a small amount to Silicon.

But the Germanium atom is 4% larger than the Silicon atom; hence, Ultra High Vacuum Chemical Vapor Deposition (UHV-CVD) is used for this process. Some of the applications of SiGe are – VCOs, LNAs, RF power amplifiers and mixers.

3.5 SOLUTIONS FOR INTERCONNECT ISSUES

a) Solutions for delay and noise:
   i) Wider wires, buffers, Copper interconnect
   ii) Avoidance of long interconnect and channels
   iii) Reduction of crosstalk, by avoiding long parallel lines and by placing a ground line between signals

b) Solutions for size and complexity of interconnect:
   i) Usage of more metal layers
   ii) Usage of local interconnect
   iii) Usage of pre-routed grids
   iv) Encoding of signals into a smaller no. of bits
   v) Usage of optical interconnect (especially for clock)

3.6 TOOLS FOR PROCESS DEVELOPMENT

The modern design requires careful planning of the interconnect as well as the device dimensions. Hence, for the innovative processes, design automation requires that such software tools also be developed. The following are the immediate requisites –

i) Tools for interconnect design: The tool must cater for the no. of layers, line widths, spacing and thickness, the type and thickness of the ILD (Inter-Layer Dielectric), the type of vias etc. The tool must also be able to re-layout a design, based on the constraints that are imposed. In addition, the tool must be able to compare between processes (e.g., between a 6 layer and a 7 layer process) and choose the best one that is feasible.

ii) Tools for transistor design: The tool must be able to simulate the performance of the transistor for changed parameters. The tools must also be able to cater for better area utilization.

Note: During physical design, the layout information is captured in a symbolic database.
This database has technology independence, as the physical dimensions are only
relative. After the physical design, and prior to tape out, this symbolic database is
converted into the polygon database. This particular polygon database gets
generated for the particular technology, which contains actual feature sizes.

CHAPTER – 4

DATA STRUCTURES AND BASIC ALGORITHMS

A data structure can be defined as a particular way of storing and organizing data in a
computer, so that the data can be used efficiently. Examples of data structures are linked
lists, hash tables, stacks, queues, trees, forests and so on. An algorithm is a sequence of
unambiguous instructions for solving a problem. These instructions are required for a
system for obtaining a required output, for any legitimate input, in a finite amount of
time. Data structures along with the algorithms constitute the computer programs.

A computer algorithm is a detailed step-by-step method for solving a problem using a computer. A program is an implementation of one or more algorithms. This procedure can be illustrated as follows –

Figure 4.1 Computer algorithm

The time efficiency of an algorithm is analyzed by determining the number of repetitions of the basic operation, as a function of input size. In other words, the “time efficiency” is determined by the speed of processing and the “space efficiency” is determined by the amount of memory being utilized. Based on the computational complexity, these two factors can again be mentioned as time complexity and space complexity. It is the former one which is of critical importance.

These two complexities of the algorithms are formalized by means of asymptotic notations. There are three types of asymptotic notations namely upper bound, tight bound and lower bound, given the symbols O(g(n)), Θ(g(n)) and Ω(g(n)) respectively. These three notations correspond to the three limits of the function, whose examples are –

Upper bound, O(g(n)) notation: n Є O(n²) and n(n-1) Є O(n²), but n³ does not belong to O(n²)
Tight bound, Θ(g(n)) notation: n(n-1) Є Θ(n²), but n and n³ do not belong to Θ(n²)
Lower bound, Ω(g(n)) notation: n(n-1) Є Ω(n²) and n³ Є Ω(n²), but n does not belong to Ω(n²)

It is the first notation that is widely used for most of the common algorithms.

4.1 COMPLEXITY ISSUES AND NP HARDNESS

The real-life problems can be classified, based on the time complexity of obtaining the
solution, as Polynomial and Non-deterministic Polynomial, as shown below –

Problem

P NP
(Deterministic)

NP complete (Decision version)


NP hard (Optimization version)

The class NP consists of problems for which no polynomial time algorithm is known, but for which a given solution can be verified in polynomial time. A problem is said to be “NP-complete” when it belongs to NP and every other problem in NP can be reduced to it in polynomial time; the decision versions of the hard physical design problems fall in this class. A problem is said to be “NP-hard” when it is at least as hard as the NP-complete problems; the optimization versions of those decision problems are NP-hard.

Most optimization problems in physical design are NP-hard. Therefore, a polynomial time algorithm for the exact solution is unlikely to exist; hence the choices for solution are –

1. Exponential algorithms: These algorithms are feasible only when the input size
is small. In case of large problems, exponential time complexity algorithms
may be used, to solve only the small sub-cases.
2. Special case algorithms: The NP hard problem can be simplified by applying
some restrictions. Then the problem can be solved under polynomial time.
3. Approximation algorithms: When input size is more and restrictions are not
possible, then the above mentioned algorithms cannot be made use of. In such
cases, if near-optimality is sufficient, then approximation algorithms are made
use of, which will provide approximate solutions. In this case, performance is
measured by the performance ratio as follows –

Performance ratio, γ = Φ / Φ*

where Φ = solution produced by the algorithm, and
Φ* = optimal solution for the problem
4. Heuristic algorithms: These algorithms produce a solution, but do not guarantee
the optimality. Hence these algorithms must be tested on various benchmark
examples, to verify their effectiveness. Most of the physical design problems
utilize the heuristic algorithms for solutions.

4.2 BASIC TERMINOLOGY

A graph is a collection of vertices connected by edges, represented as G = (V,E). The set of vertices is represented as V(G) and the set of edges that connect distinct vertices is represented as E(G).

A graph is called as “directed” when E(G) is a set of ordered pairs of distinct vertices,
called directed edges. Connected acyclic graph is called as “Tree”. Collection of
unconnected trees is called as “Forest”.

A hypergraph is a pair (V, E), where V is a set of vertices and E is a family of sets of vertices, called hyperedges. A hypergraph is said to be connected if every pair of vertices is connected.

A bipartite graph is a graph G whose vertex set can be partitioned into two subsets X and
Y, such that each edge has one end in X and the other end in Y. Such a partition (X, Y) is
called bipartition of the graph.

A graph is called planar if it can be drawn in the plane without any two edges crossing.
There are many different ways of ‘drawing’ a planar graph, and each such drawing is
called as an embedding of G. The example of a planar graph is as shown –

The edges which bound a region define a face. The unbounded region is called the
external or outside face, also called as infinite face. A face is called an “odd face” if it has
odd number of edges. A face with even number of edges is called an “even face”.

A dual graph G' of a given planar graph G is a graph which has a vertex for each face of
G, and an edge for each edge in G joining two neighboring faces. An example is as
shown –

4.3 BASIC ALGORITHMS

The basic algorithms that are utilized for physical design can be classified as follows –

Basic Algorithms

I. Graph algorithms:
1. Graph search: DFS, BFS, Topological
2. Spanning tree: Kruskal’s, Prim’s
3. Shortest path: SPSP (Dijkstra), APSP (Floyd-Warshall)
4. Matching algorithms
5. Min-cut & Max-cut algorithms
6. Steiner tree algorithms

II. Computational geometry algorithms:
1. Line sweep method
2. Extended line sweep method

I. GRAPH ALGORITHMS: These are the algorithms in which the physical design
problems are modeled using graphs. Some of them are discussed below.

1. Graph search algorithms: These are the algorithms in which the automation tool traverses the nodes of the graph in different ways, as illustrated below –

a) Depth-First-Search: As the name suggests, the graph is searched depth-wise. While visiting each vertex, the procedure followed is to visit the LHS first, traverse down through all the edges and vertices, and then come back to the RHS. Thus, when all the edges of a vertex have been explored, the algorithm back-tracks to the previous vertex.

For the example shown below, using the DFS approach, the order of visiting is ABGFDEC. The algorithm is also indicated below. The algorithm uses an array MARKED (u). The time complexity for DFS is O(|V| + |E|).

Figure 4.2 DFS algorithm
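Since the original listing survives only as a figure, a minimal sketch of DFS is given below for illustration (Python is used here purely as an illustration language; the graph is assumed to be an adjacency list, and the MARKED(u) array of the text becomes a dictionary) –

    # Minimal DFS sketch (illustrative): adjacency-list graph, MARKED array
    def dfs(graph, start):
        marked = {v: False for v in graph}   # the MARKED(u) array from the text
        order = []                           # order in which vertices are visited

        def visit(u):
            marked[u] = True
            order.append(u)
            for v in graph[u]:               # explore each edge of u once
                if not marked[v]:
                    visit(v)                 # recursion supplies the implicit stack
            # all edges of u explored: back-track to the previous vertex

        visit(start)
        return order                         # O(|V| + |E|) overall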

b) Breadth-First-Search: This algorithm explores all the vertices adjacent to a vertex, before exploring any other vertex, and the visited vertices are enqueued. For the figure shown, the enqueueing order is ABDCGEF. The previous algorithm DFS, being recursive, uses a stack, whereas BFS uses a queue. The time complexity of BFS is O(|V| + |E|).

Figure 4.3 BFS approach
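As a minimal sketch (same assumptions as the DFS sketch above), BFS with an explicit queue could look as follows –

    # Minimal BFS sketch (illustrative): explicit queue instead of recursion
    from collections import deque

    def bfs(graph, start):
        marked = {v: False for v in graph}
        marked[start] = True
        queue = deque([start])
        order = []
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:               # all vertices adjacent to u first
                if not marked[v]:
                    marked[v] = True
                    queue.append(v)          # enqueue before exploring further
        return order                         # O(|V| + |E|)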

c) Topological search: In this case, the parents are visited first and then the
children are visited. The visited vertex is deleted and the next vertices as
per topology are visited. For the example shown, the enqueuing order is
ABCDEF. Time complexity of this algorithm is O(|V| + |E|).

Figure 4.4 Topological approach
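A minimal sketch of the topological approach (Kahn-style, using in-degrees; the directed graph is assumed to be an adjacency list from parents to children) is –

    # Minimal topological-order sketch: parents are visited before children
    from collections import deque

    def topological_order(graph):            # graph: vertex -> list of children
        indeg = {v: 0 for v in graph}
        for u in graph:
            for v in graph[u]:
                indeg[v] += 1
        queue = deque(v for v in graph if indeg[v] == 0)   # vertices with no parents
        order = []
        while queue:
            u = queue.popleft()
            order.append(u)                  # "delete" the visited vertex
            for v in graph[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        return order                         # O(|V| + |E|)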

2. Spanning tree algorithms: Given G = (V, E), a spanning tree is a subgraph of G that is a tree and contains all vertices of V. A spanning tree is obtained by removing edges from E until all cycles have disappeared, while all vertices remain connected.

Here the objective is to find a set of edges which spans all the vertices. In physical design, the edges have to be selected such that the total wire length is minimum. Hence MST (Minimum Spanning Tree) is an edge selection problem. The goal of MST is to find E’ ⊆ E such that E’ induces a tree and the total cost of edges is minimum. Several algorithms exist, out of which two main ones are discussed as follows –

a) Kruskal’s algorithm: In this approach, the edges are sorted by increasing weight. Each vertex is initially assigned to its own set, in which each set represents a partial spanning tree, and all the sets together form a spanning forest. An edge whose endpoints belong to the same set is discarded, while an edge whose endpoints belong to disjoint sets is accepted, and the two sets are combined to form a new set. Thus the algorithm constructs partial spanning trees, and connects them to obtain the MST. The time complexity of this algorithm is O(|E|log|E|).

The algorithm is illustrated by means of the figures as shown –

Figure 4.5 Kruskal’s algorithm
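A minimal sketch, using a simple disjoint-set (union-find) structure for the vertex sets, could be –

    # Minimal Kruskal sketch: edges sorted by weight, disjoint sets of vertices
    def kruskal(vertices, edges):            # edges: list of (weight, u, v)
        parent = {v: v for v in vertices}    # each vertex starts in its own set

        def find(v):                         # locate the set (partial tree) of v
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        mst = []
        for w, u, v in sorted(edges):        # increasing weight: O(|E| log |E|)
            ru, rv = find(u), find(v)
            if ru != rv:                     # disjoint sets: combine them
                parent[ru] = rv
                mst.append((u, v, w))
            # same set: the edge would form a cycle, so it is discarded
        return mst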

b) Prim’s algorithm: This algorithm starts with any arbitrary vertex, and edges are added to the tree one-by-one. This goes on until the tree becomes a spanning tree and contains all vertices. As this happens in a loop, the time complexity is O(n²), where ‘n’ is the no. of vertices.

Figure 4.6 Prim’s algorithm
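A minimal sketch (vertices assumed to be a Python set; edge weights in a dictionary keyed by vertex pair; the graph is assumed connected) is –

    # Minimal Prim sketch: grow the tree one edge at a time from any start vertex
    def prim(vertices, weight, start):       # weight[(u, v)] = edge cost
        in_tree = {start}
        mst = []
        while len(in_tree) < len(vertices):  # loop runs n-1 times -> O(n^2) scan
            best = None
            for u in in_tree:                # cheapest edge leaving the tree
                for v in vertices - in_tree:
                    w = weight.get((u, v), weight.get((v, u)))
                    if w is not None and (best is None or w < best[0]):
                        best = (w, u, v)
            w, u, v = best                   # assumes the graph is connected
            in_tree.add(v)
            mst.append((u, v, w))
        return mst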

3. Shortest path algorithms: These find the shortest path in the graph.

a) Single pair shortest path (SPSP): Also called Dijkstra’s algorithm, it finds the shortest path from a given vertex, and continues the procedure until the designated vertex is reached. The algorithm can also be used for finding the shortest paths from a given vertex to all the vertices in the graph. As the algorithm uses a loop which gets executed O(n) times, the time complexity is O(n²). The algorithm is as shown –

Figure 4.7: Shortest path from B to F
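A minimal heap-based sketch of Dijkstra’s algorithm (adjacency list of (neighbor, weight) pairs assumed; the search can stop early once the designated vertex is settled) is –

    # Minimal Dijkstra sketch: shortest paths from a source vertex
    import heapq

    def dijkstra(graph, source):             # graph[u] = list of (v, weight)
        dist = {v: float('inf') for v in graph}
        dist[source] = 0
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue                     # stale heap entry, skip it
            for v, w in graph[u]:            # relax every edge out of u
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        return dist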

b) All pairs shortest paths (APSP): This is a variant of SPSP, in which the
shortest path is required for all possible pairs in the graph. Floyd-Warshall
algorithm is used for this purpose, whose time complexity is O(n3). This
algorithm plays a key role in the global routing phase.

For the given directed graph G = (V, E), let dij(k) be the weight of a shortest path from vi to vj, with all the intermediate vertices drawn from {v1, v2,……vk}. For k=0, there are no intermediate vertices from vi to vj, and hence dij(0) = wt(vi, vj). Therefore, a recursive formulation for “all pairs shortest paths” can be given as,

dij(k) = min { dij(k-1), dik(k-1) + dkj(k-1) }, for k ≥ 1
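A minimal sketch of this recurrence (weights given as an n x n matrix with 0 on the diagonal and infinity for absent edges) is –

    # Minimal Floyd-Warshall sketch: all pairs shortest paths, O(n^3)
    def floyd_warshall(wt):                  # wt[i][j] = wt(vi, vj), inf if absent
        n = len(wt)
        d = [row[:] for row in wt]           # d(0) = direct edge weights
        for k in range(n):                   # allow vk as an intermediate vertex
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        return d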

4. Matching algorithms: A matching in a graph is a set of edges without common vertices. Matching is used for converting planar graphs into bipartite graphs. An example is as shown –

Figure 4.8: Matching in a graph

5. Min-cut and Max-cut algorithms: A cut is a partition of the vertices of a graph into two disjoint subsets. In an unweighted graph, the size of a cut is the no. of edges crossing the cut. In a weighted graph, the same term is defined by the sum of weights of the edges crossing the cut.

A cut is called as “mincut” if the size of the cut is not larger than the size of any other cut. Similarly, a cut is called as “maxcut” if the size of the cut is not smaller than the size of any other cut.

ADDITIONAL INFO: Euler’s formula states that for any planar graph, V + F - E = 2,
where V is the no. of vertices, F is the no. of faces and E is the no. of edges. This is
applicable to the planar graph as well as its dual.

The procedure for drawing the dual of a graph is as follows – Take the faces of G as the
vertices of G*, and for every pair of faces having a common edge in G, join the
corresponding vertices in G* by an edge that crosses the edge only once. There must be a
vertex in the dual graph, which corresponds to the infinite face. The examples are as
shown below –

Example for finding max-cut for a planar graph: The Max-cut problem can be
defined as follows: For a given graph G = (V, E), find the maximum bipartite graph
of G. Let this graph be G' = (V, E'), which is obtained by deleting K edges of G; then
G has a maximum cutsize of |E| - K. The procedure for obtaining the maximum
bipartite graph is summarized as follows –

- Obtain the dual of the given planar graph
- From the dual graph, draw the weighted graph that corresponds to the odd faces
- Delete those edges in the given planar graph which have minimum weight matching in the weighted graph
- The resultant graph after the deletion of the said edges will be bipartite.

The following figure illustrates the steps mentioned above. In figure (a), the odd faces
are 3, 5, 10 and 13. In figure (c), the minimum weight matching is 4, which is
between the vertices (3, 13) and (5, 10). Figure (d) shows the bipartite graph.

Figure 4.9: Maxcut of a planar graph

(a) Planar graph example (b) Its dual (c) Weighted graph corresponding to odd faces (d) Resultant bipartite graph

The algorithm for obtaining the bipartite graph was presented by Hadlock. The algorithm contains four procedures. The procedure PLANAR-EMBED finds a planar embedding of the given graph G. The procedure CONSTRUCT-DUAL creates the dual graph for the planar embedding. The procedure CONSTRUCT-WT-GRAPH constructs a complete weighted graph R, by using the vertices corresponding to the odd faces. The procedure MIN-WT-MATCHING pairs up the vertices in R which have minimum weight matching between the odd faces of G. All the corresponding edges are deleted, and the resulting graph is bipartite. The algorithm is as shown below –
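The listing itself is not reproduced in this copy; the following high-level sketch shows only the driver structure. The four procedures named above, and the helper that maps the matching back to deletable edges of G, are assumed to be provided and are not implemented here –

    # High-level sketch of Hadlock's procedure; PLANAR_EMBED, CONSTRUCT_DUAL,
    # CONSTRUCT_WT_GRAPH, MIN_WT_MATCHING and edges_on_matching_paths are
    # assumed helpers (hypothetical names), not implementations.
    def hadlock_maxcut(G):
        embedding = PLANAR_EMBED(G)              # planar embedding of G
        dual = CONSTRUCT_DUAL(embedding)         # dual graph of the embedding
        R = CONSTRUCT_WT_GRAPH(dual)             # complete weighted graph on odd faces
        matching = MIN_WT_MATCHING(R)            # min-weight pairing of odd faces
        E_delete = edges_on_matching_paths(matching, G)  # edges of G to delete
        G_bipartite = remove_edges(G, E_delete)  # assumed helper
        return G_bipartite                       # max cutsize of G is |E| - |E_delete|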

6. Steiner tree algorithms: This is an improvement of MST, in which extra
intermediate vertices and edges may be added to the graph in order to
reduce the length of the spanning tree. An example is shown –

Figure 4.10: Steiner point

Here, the point S is introduced as an extra point, which reduces the total length between
the points A, B and C. The point S is called as “Steiner point”, and the other points are
called as “Demand points”.
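For three demand points under the rectilinear (Manhattan) metric, the optimal Steiner point is simply the coordinate-wise median of the three points (a known result); a minimal sketch is –

    # Steiner point of three demand points in the rectilinear metric
    def steiner_point_3(a, b, c):            # a, b, c are (x, y) tuples
        xs = sorted([a[0], b[0], c[0]])
        ys = sorted([a[1], b[1], c[1]])
        return (xs[1], ys[1])                # median x, median y

    # e.g. steiner_point_3((0, 0), (4, 0), (2, 3)) -> (2, 0):
    # total rectilinear wirelength 7, versus 9 for the MST without S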

A Steiner tree whose edges are constrained to rectilinear shapes is called a Rectilinear
Steiner Tree (RST). An RST with minimum cost is called as Rectilinear Steiner
Minimum Tree (RSMT). The RSTs can be easily illustrated by means of an underlying
grid graph, and RSMTs can be obtained by means of the grid graph.

The Steiner tree and its grid version are illustrated in the examples shown below –

Figure 4.11: Steiner Tree

Figure 4.12: Steiner trees with underlying grid graph
[(b) is an ST; (c) and (d) are RSTs; (e) and (f) are RSMTs]

II. COMPUTATIONAL GEOMETRY ALGORITHMS: These are the algorithms in
which the physical design problems concerned to lines are addressed.

1. Line sweep method: The goal here is to find all the pair-wise intersections of the given line segments. The direct approach is to take every pair of lines and check for intersection; but then the time complexity will be O(n²). Hence, to reduce the complexity, the line sweep method is used, which detects an intersection in O(nlogn) time, and reports all the K intersections in O((n+K)logn) time. The method is as follows –

i) Represent the ‘n’ line segments by their ‘2n’ endpoints, by sorting the
x-coordinate values from left to right. Hence we will have (2n+K) event
points, where ‘K’ is the no. of intersections.
ii) Take an imaginary vertical sweep line at the leftmost side. Let the sweep line
traverse the endpoints from left to right, halting at each x-coordinate in the
sorted list.
iii) During the process of traversal, starting from the left endpoint, let the x and y
coordinates of the event points be stored in two individual data structures.
iv) When the event point is a right endpoint of the line segment, the segment is
deleted from the ordering.
v) A possible intersection is checked for whenever two segments become consecutive in the vertical ordering.

These steps are illustrated by means of an example –

Figure 4.13: Line sweep method

2. Extended line sweep method: The above mentioned algorithm halts when it
detects one intersection. This algorithm can be extended to detect all the ‘K’
intersecting pairs.

For this to happen, each point of intersection is inserted into a heap Q of sorted
endpoints, in the x-coordinate order. Whenever the intersection is reported, the
order of the intersecting segments is swapped. The algorithm halts when all the
endpoints and intersections are traversed by the sweep line.

Note: The extended line sweep method takes into account the special case of line
segments being only horizontal or only vertical. The horizontal line will have only
one y-coordinate and vertical line will have only one x-coordinate. These are
stored accordingly in the heap Q, with horizontal line having two x-coordinates
and vertical line having two y-coordinates, for their corresponding endpoints.
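For this special case of purely horizontal and vertical segments, a minimal sweep sketch is given below (assumptions: horizontals given as (x1, x2, y) with x1 ≤ x2, verticals as (x, y1, y2) with y1 ≤ y2, and distinct y values for the horizontal segments; a sorted list stands in for the balanced ordering structure) –

    # Sketch: intersections among horizontal and vertical segments only
    import bisect

    def hv_intersections(horizontals, verticals):
        events = []
        for x1, x2, y in horizontals:
            events.append((x1, 0, y))        # 0: horizontal segment enters
            events.append((x2, 2, y))        # 2: horizontal segment leaves
        for x, y1, y2 in verticals:
            events.append((x, 1, (y1, y2)))  # 1: vertical segment is hit
        active, hits = [], []                # active: sorted y's of live horizontals
        for x, kind, data in sorted(events): # sweep left to right
            if kind == 0:
                bisect.insort(active, data)
            elif kind == 2:
                active.pop(bisect.bisect_left(active, data))
            else:
                y1, y2 = data                # report horizontals crossing [y1, y2]
                lo = bisect.bisect_left(active, y1)
                hi = bisect.bisect_right(active, y2)
                hits.extend((x, active[i]) for i in range(lo, hi))
        return hits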

4.4 BASIC DATA STRUCTURES

In physical design, a rectangular section of the layout within a single layer is called as a
“tile”. Obviously, tiles within a layer are not allowed to overlap. The block tiles are the
elements of the layout, which can be used to represent the different design areas such as
p-diffusion, n-diffusion, poly segment etc. For the sake of simplicity, these block tiles are
referred as “blocks”. An example is shown –

Figure 4.14: Block tile representation of a layout

Atomic operations for layout editors:

The atomic operations are the basic set of operations which are utilized to
manipulate a layout. During design automation, the following are the atomic
operations which a layout editor must support –

1. Point finding: Given a point p = (x,y), determine whether p lies within a block,
and then identify that block.
2. Neighbor finding: Given a block B, determine all the other blocks that touch B.
3. Block visibility: For the given block B, determine all the blocks that are visible
in the x and y directions.
4. Area searching: Given a fixed area A defined by its upper left corner (x,y), its
length l and width w, determine whether A intersects with any blocks that are
in the particular area.
5. Directed area enumeration: Given a fixed area A, visit each block that is intersecting A exactly once, in a sorted order (i.e., at the top, at the bottom, at left, and at right).
6. Block insertion: Insert a new block B such that it does not intersect with any
other existing block.
7. Block deletion: Remove a block B from the layout.
8. Plowing: Given an area A and direction d, remove all the blocks Bi from A by
shifting them in direction d, preserving their order.
9. Compaction: Compress the entire layout, for area optimization. If the
compaction is only in x-axis or y-axis, then it is called as “1-dimensional”. If
the compaction is in both the axes, then it is called “2-dimensional”.
10. Channel generation: Determine the vacant space in the layout and then
partition it into tiles.

The data structures that are used for physical design must be able to represent the blocks
and the vacant spaces of the design. There are four types of data structures that are used –
Linked list of blocks, Bin-based method, Neighbor pointers and Corner stitching. These
are explained as follows.

1. Linked list of blocks: This is the simplest data structure that is used to store the
components of a layout. Here, each node in the list represents a block, as shown –

Figure 4.15: Linked list representation

The location of the block in the list is represented by its coordinates of upper left corner.
In addition, the height and width are specified, along with a text for the description. The
linking to the next block is achieved by means of the pointer. The space complexity of
this method is O(n).

The algorithm for neighbor finding for a given block B can be specified as follows, where
B is the given block and L is the linked list –
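The listing is not reproduced in this copy; a minimal sketch under the stated representation (upper-left corner (x,y) with y increasing upwards, width w, height h, a text label, and a next pointer; the touches test below is an assumption of this sketch) is –

    # Minimal linked-list sketch: nodes hold [(x, y), w, h, text, next]
    class BlockNode:
        def __init__(self, x, y, w, h, text, nxt=None):
            self.x, self.y, self.w, self.h = x, y, w, h
            self.text, self.next = text, nxt

    def touches(a, b):                       # True if blocks a and b share an edge
        h_touch = (a.x + a.w == b.x or b.x + b.w == a.x) and \
                  (a.y - a.h < b.y and b.y - b.h < a.y)
        v_touch = (a.y - a.h == b.y or b.y - b.h == a.y) and \
                  (a.x < b.x + b.w and b.x < a.x + a.w)
        return h_touch or v_touch

    def neighbors(L, B):                     # walks the whole list: O(n)
        result, node = [], L
        while node is not None:
            if node is not B and touches(node, B):
                result.append(node)
            node = node.next
        return result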

The disadvantage of the linked list is that it is not suitable for non-hierarchical systems.
In addition, the linked list does not represent the vacant space in an explicit way. To do
that, the vacant space has to be converted into a collection of vacant tiles, and these tiles
have to be stored in the list. This is as shown below –

Figure 4.16: Vacant tile creation

Here, the entire area is partitioned into a collection of block tiles and vacant tiles. The
vacant tiles are organized as “maximal horizontal strips”. Hence the linked list now
represents vacant tiles in the same way as that of block tiles. This modified data structure
is called as “modified linked list”.

2. Bin-based method: This method does not create vacant tiles. Instead, a virtual grid is
superimposed on the layout area, and each element in the grid is called as a “bin”. The
bins are represented using a 2-dimensional array, as B(row, column). Hence all the blocks
intersecting a particular bin are linked together, and the 2-dimensional array is used to
locate the lists for different bins. This method is as shown in the figure below –

Figure 4.17: Bin based representation

In the figure, B(2,3) contains the blocks D,E,F,G,H. Similarly, B(2,4) contains the blocks
G and H. The blocks in a given area can be located quickly by indexing into the array and
searching the lists of relevant bins.

The advantage of Bin based method is that it is not necessary to specify the vacant space.
But the disadvantage exists in deciding the size of the bins. If the bins are too small, then
the storage requirements get increased. The reasons are – i) A block may intersect many
bins, thereby increasing the memory storage in the data structure. ii) Many bins may
remain empty; but still they must be tested and the information must be stored.

On the other hand, if the bins are too large, then the performance will be reduced as the
linked lists become too long. This is because, as bins get larger, it takes longer to search
the lists in each bin, as each bin may contain many no. of blocks.

Hence, the best case occurs when each bin contains exactly n/b blocks, where ‘n’ is the
no. of blocks and ‘b’ is the no. of bins. The space complexity in this case is O(bn).

As the Bin based data structure does not represent the vacant space, operations such as
compaction become quite tedious. Below given is the algorithm to find the neighboring
blocks in the Bin based method –
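The listing is not reproduced in this copy; the sketch below conveys the idea that only the bins intersected by B (and their adjacent bins) need to be examined. The helper bins_of, which returns the (row, column) indices of the bins a block intersects, and the touches test are assumed –

    # Minimal bin-based neighbor-finding sketch; bins is a 2-D array of block lists
    def bin_neighbors(bins, B, bins_of, touches):
        rows, cols = len(bins), len(bins[0])
        result = set()
        for r, c in bins_of(B):              # bins intersected by B (assumed helper)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):        # also scan the adjacent bins
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        for blk in bins[rr][cc]:
                            if blk is not B and touches(blk, B):
                                result.add(blk)
        return result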

3. Neighbor pointers: The previous two data structures do not have a method to contain
local information, such as the neighboring blocks. This limitation can be overcome by
having pointers to all the neighbors. Hence in this method, each block is represented by
its top left coordinates, height and width [(x,y), h, w], and in addition, pointers to the
neighbors. This is as shown below –

Figure 4.18: Neighbor pointers

The figure shows the neighboring pointers for block A. All the other blocks will have
similar pointers. The space complexity of this data structure is O(n2). By using this data
structure, plowing and compaction operations can be easily performed. This is as shown
in the figure below –

Figure 4.19: Plowing of block B using Neighbor pointers

As each block contains information about the neighbors, the operations of plowing and
compaction are performed simply by rearranging the pointers after a block is moved.

However, the disadvantage of this data structure is the maintenance difficulty of the
pointers. A simple modification requires all the pointers to be updated. In addition, as
vacant space is not explicitly represented, channel generation becomes complicated.

4. Corner stitching: This is the first data structure to represent both vacant and block
tiles in the design. Hence, modification of the layout becomes much easier. Here, the
vacant space is divided into maximal horizontal strips. Thus, the complete layout is
represented as vacant and block tiles. And then, the tiles are linked by two sets of pointers
called as “corner stitches”. The scheme is as shown –

Figure 4.20: Corner stitches

Each tile contains four stitches, and these are indicated as right top, top right, left bottom
and bottom left. Thus, all the atomic operations are possible using these pointers. An
example of the layout is as shown –

Figure 4.21: Layout using corner stitches

In the figure, the stitches exceeding the boundary of the layout are not shown. However,
they are represented as NULL pointers in the data structure. Below given is the
implementation of some of the atomic operations using corner stitch points –

a) Point finding: Let the current point be P1 and let the destination point be P2. Then the
following sequence of steps is required to find P2 –

i) The first step is to move up or down, using “rt” and “lb” pointers, until a tile is
found whose vertical range matches that of P2.
ii) The next step is to move left or right, using “bl” and “tr” pointers, until a tile is
found whose horizontal range matches that of P2.
iii) If there is a misalignment in search operations, then the above mentioned steps
have to be iterated until the tile containing P2 is located.

The figure and the corresponding algorithm are as shown –

Figure 4.22: Point finding using corner stitches
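The listing is not reproduced in this copy; a minimal sketch of the iteration described above is given below. The tile interface (rt, lb, tr, bl stitches, xmax/ymax bounds and the contains tests) is an assumption of this sketch –

    # Minimal point-finding sketch using the four corner stitches
    def point_find(start_tile, p):           # p has fields p.x and p.y
        t = start_tile
        while not t.contains(p):
            while not t.contains_y(p):       # step i): move up or down
                t = t.rt if p.y > t.ymax else t.lb
            while not t.contains_x(p):       # step ii): move left or right
                t = t.tr if p.x > t.xmax else t.bl
            # a horizontal move may break the vertical alignment, so the
            # two steps are iterated (step iii) until the tile holding p is found
        return t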

Note: On an average, this algorithm traverses √N tiles.

b) Neighbor finding: Initially, all the tiles that touch a given side of a given tile can be
found. Later, the procedure can be iterated for the other three sides. The procedure can
be illustrated as follows, for neighbor finding at the left side -

Figure 4.23: Neighbor finding using corner stitches

From the “tr” pointer of the given tile, the algorithm starts traversing using the “lb”
pointer downwards. The procedure continues until a tile is reached whose “lb” pointer
does not lie within the vertical range of the given tile.

c) Area search: The objective is to find whether there is any block residing in the given
area. The following figure shows an area in dotted line –

Figure 4.24: Area search using corner stitches

The steps of the algorithm can be as follows –

i) At first, using the point finding algorithm, locate the tile in which the upper
left corner of the given area is located.
ii) Check if the tile is a block; if not, then it must be a space tile. Check if its right
edge is within the given area.
iii) If a block is found, then the search is complete. If no, then the next tile that is
touching the right edge of the given area is located.
iv) The first two steps are repeated until the given area has been searched.

d) Enumerate all tiles: After the area search, the objective is to enumerate all the tiles that
are within the given area, as shown in the figure –

Figure 4.25: Enumerate all tiles using corner stitches

The algorithm is as follows –

i) Initially, find the tile in which upper left corner of the given area is located.
Then move along the top edge through all the tiles.
ii) Number the tiles accordingly, as the algorithm traverses.
iii) After one traversal, locate the other block that intersects the area, using the
neighbor finding algorithm.
iv) Repeat the above steps until the lower right corner of the given area is reached.

e) Block insertion: The figure is shown in which a new block E is inserted –

Figure 4.26: Block insertion using corner stitches

The steps for insertion of a block are as follows –

i) First of all, the algorithm must check whether a block exists in the given area.
ii) If not, then the vacant tile is located, which must contain the top edge of the
new block to be inserted.
iii) The tile found is split along the horizontal line, and the corner stitches at the
right side of the new block are updated.
iv) Now, the vacant tile containing the bottom edge of the new block is found,
and split accordingly. Later, the corner stitches at the left side of the new
block are updated.
v) Finally, the algorithm must traverse along the left and right sides of the new
block, and update the corner stitches of the neighboring tiles accordingly.

f) Block deletion: The figure is shown in which the block C is deleted –

Figure 4.27: Block deletion using corner stitches

The steps for deletion of a block are as follows –

i) At first, the block that has to be deleted is converted into a space tile.
ii) Then search the neighboring tiles at the right side. For each vacant tile
neighbor, the algorithm should check the vertical span and then merge this tile
with the other vacant tiles horizontally.
iii) Later, search the neighbors at the left side and repeat the above step. After
each horizontal merge, a vertical merge must be performed if possible.

LAYOUT SPECIFICATION LANGUAGES:

In the automation of physical design, it becomes necessary to specify the geometric information of the layout in terms of an intermediate language. CIF and GDS are the two popular forms of layout description. Here, a brief description of CIF is provided.

CIF stands for Caltech Intermediate Format, developed by Mead and Conway at the California Institute of Technology (Caltech). This format provides a sequence of commands, which are used to specify the geometrical shapes of the layout. The layout is always represented in the first quadrant, and the geometrical dimensions are specified in microns.

For example, a box of length 30 µm, width 50 µm and the box centered at (15, 25) is
specified in CIF as B 30 50 15 25. Similarly, P is used for Polygon, R is used for Circle,
W is used for Wire, and so on. Some of the CIF commands are as follows –

Shape   | Command                            | Example
Box     | B length width center (direction)  | B 30 50 15 25 (-20 20)
Polygon | P path                             | P 00 00 20 50 50 30 40 00
Circle  | R diameter center                  | R 10 30 10
Wire    | W width path                       | W 20 00 10 30 10 30 30 80 30
Layer   | L shortname                        | L metal1

In case of Box, the direction is optional. It is specified only when the box does not have
Manhattan or rectilinear features. The following figure illustrates the commands –

Figure 4.28: CIF terminologies for geometric shapes

Figure (a) represents a box, whose bottom left corner is at origin for the given data in the
table. Figure (b) represents a polygon whose vertices are at (0,0), (20,50), (50,30) and
(40,0) respectively. Figure (c) represents a wire with width 20 and its path is specified by
the coordinates (0,10), (30,10), (30,30) and (80,30). Obviously, the coordinates are for
the centerline, and the wire must have uniform width throughout. The following figure
shows the box that is specified as B 25 60 80 40 -20 20:

Figure 4.29: Box representation in CIF, with direction vector

4.5 GRAPH ALGORITHMS FOR PHYSICAL DESIGN

In VLSI physical design, the layout is a collection of rectangles. Routing is performed by means of rectangles whose width can be ignored, and hence they are considered as wires. Therefore, when the layout is represented as a graph, the layout contains only rectangles and lines. This graph represents the relationships between the rectangles, and it can be of two types namely – graphs dealing with lines and graphs dealing with rectangles.

a) Graphs related to a set of lines: Lines can be of two types namely – the ones that are
aligned to the axis, and the ones that are not aligned. The first type of lines are called as
“line intervals”, and an example is as shown –

Figure 4.30: Line intervals

For the line intervals that are shown, the relationships between them can be represented
by means of three types of graphs namely – Overlap graph, Containment graph and
Interval graph.

Overlap graph is defined as: GO = (V, EO) where
V = { vi | vi represents the interval Ii }
and EO = { (vi,vj) | li < lj < ri < rj }

Containment graph is defined as: GC = (V, EC) where
V = { vi | vi represents the interval Ii }
and EC = { (vi,vj) | li < lj, ri > rj }

Interval graph is defined as: GI = (V, EI) where
V = { vi | vi represents the interval Ii }
and EI = EO U EC

These particular graphs for the line intervals shown above are as follows –

Figure 4.31: Overlap graph

Figure 4.32: Containment graph

Figure 4.33: Interval graph
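As a minimal sketch, all three graphs can be built directly from the interval endpoints (intervals assumed to be given as (l, r) pairs with distinct endpoints) –

    # Build the overlap, containment and interval graphs from intervals
    def interval_graphs(I):                  # I: list of (l, r) pairs
        n = len(I)
        EO, EC = set(), set()
        for i in range(n):
            li, ri = I[i]
            for j in range(n):
                lj, rj = I[j]
                if li < lj < ri < rj:
                    EO.add((i, j))           # overlap: partial intersection
                if li < lj and ri > rj:
                    EC.add((i, j))           # containment: Ii encloses Ij
        EI = EO | EC                         # interval graph: union of both
        return EO, EC, EI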

The lines that are non-aligned are sometimes represented using a matching diagram, as
shown below –

Figure 4.34: Matching diagram (channel)

Another class of graphs known as Permutation graphs is frequently used in routing, and
can be defined by the matching diagram.

Permutation graph is defined as: GP = (V, EP) where
V = { vi | vi represents the line i }
and EP = { (vi,vj) | line i intersects line j }

The GP corresponding to the matching diagram shown above is as shown –

Figure 4.35: Permutation graph for the channel

The matching diagram shown above is a two-sided box, which is used for “channel
routing”. On the other hand, “switchbox routing” uses four-sided box, as shown below –

Figure 4.36: A switchbox

The graph corresponding to switchbox is called as “Circle graph”, which can be –

Figure 4.37: Circle graph for the switchbox

b) Graphs related to a set of rectangles: As is obvious, no two rectangles can overlap within a plane. However, rectangles may share edges, in which case, they are neighbors to each other. Hence, the relationships between the rectangles in a layout are represented using neighborhood graphs.

Neighborhood graph is defined as: G = (V, E) where
V = { vi | vi represents the rectangle Ri }
and E = { (vi,vj) | Ri and Rj are neighbors }

The following figure illustrates a set of rectangles and the next figure depicts the corresponding neighborhood graph –

Figure 4.38: A set of rectangles

Figure 4.39: The corresponding neighborhood graph

On the contrary, when a graph is given, its rectangular dual can be constructed with the
usage of the same neighborhood principles. The example is as shown –

Figure 4.40: The graph and its rectangular dual

Note: The real-life problems can be classified as the ones that can be solved in
polynomial time, and the ones that need exponential time. Hence, a problem that can be
solved in polynomial time is called as “tractable”. It is called as “intractable” otherwise.

The intractable problems come under the NP category. The NP-complete problems are those which are “likely to be intractable”. The NP-hard problems are believed to be intractable, and hence they require heuristic algorithms for solution.

CHAPTER – 5

PARTITIONING

Partitioning is the process of decomposition of a complex system into smaller subsystems. While performing the partitioning process, the following factors must be taken into consideration –

i) The original functionality of the system should remain intact after partitioning.
ii) The interconnections between any two subsystems must be minimized.
iii) The time required for partitioning must be a small fraction of the total design time.

The partition is also referred to as a “cut”, and the cost of partition is called as the
“cutsize”. The interconnections between the subsystems are also called as “nets”.

There can be three levels of partitioning, as illustrated below –

System

System level partitioning


PCBs

Board level partitioning


Chips

Chip level partitioning


Sub-circuits

Figure 5.1: Levels of partitioning

In System level partitioning, the system is decomposed into many circuit boards. The larger the no. of boards, the lower the performance and reliability; this is due to the delay encountered by the signals travelling between boards over the system bus. Hence, the no. of boards into which the system has to be decomposed has to be decided beforehand. In addition, care should be taken while planning such that the no. of nets that connect each board with the other boards is within the terminal count of the particular board.

In Board level partitioning, the no. of chips to be present on the board has to be decided. Again, a larger no. of chips means lower performance and reliability, due to the off-chip delay. Moreover, the packages and the pin counts become the deciding factors. For example, a DIP allows 64 pins at the most, whereas a PGA allows more than 300 pins.

In Chip level partitioning, the blocks into which the chip is going to be divided becomes
the decisive factor. The blocks can be either full-custom or semi-custom. In addition, to
simplify the routing task, the no. of nets cut by the partitioning should be minimized.
Moreover, the length of the critical path also has to be minimized. This is illustrated in
the figure shown, in which (a) indicates a good layout and (b) indicates a bad example –

Figure 5.2: Example of chip level partitioning

5.1 PARAMETERS FOR PARTITIONING PROBLEM

Initial definitions:

Let V1, V2, .…, Vk be the partitions of a sub-circuit.
Let Cij be the cutsize between the partitions Vi and Vj.
Let Count (Vi) be the terminal count of partition Vi.
Let P = {P1, P2, …., Pm} be a set of hyperpaths.
Let H(Pi) be the no. of times a hyperpath Pi is cut.

a) Interconnect between partitions: As stated earlier, this has to be minimized, and hence this is called the “mincut” problem. Thus, the first objective is to minimize the total cutsize, i.e., minimize Σ Cij, taken over all pairs of partitions (Vi, Vj).

b) Delay due to partitioning: The delay between partitions is significantly larger than the delay within a partition. Thus, the second objective is to minimize the maximum no. of times any hyperpath is cut, i.e., minimize max H(Pi), taken over all hyperpaths Pi Є P.

c) No. of terminals: The no. of nets required for connection cannot obviously exceed the no. of terminals. At system level partitioning, this is decided by the terminal count of the PCB connector. At board level partitioning, this is decided by the pin count of the package of the chips. At chip level partitioning, this is decided by the perimeter of the area of the sub-circuit. Therefore, at any level, the first constraint is Count(Vi) ≤ Ti, for i = 1, 2, …., k, where Ti is the terminal count available to partition Vi.

d) Area of each partition: The lesser the area of the chip, the lesser is its cost. But it may not be possible to minimize the area always, and hence an upper bound has to be specified. Therefore, this constraint can be stated as Area(Vi) ≤ Ai(max), for i = 1, 2, …., k.

e) No. of partitions: A large no. of partitions will ease the design, but will increase the cost as well as the interconnect. On the other hand, a small no. of partitions will make the design complex. Hence, the constraint for the no. of partitions can be stated as Kmin ≤ k ≤ Kmax.

DESIGN STYLE SPECIFIC PARTITIONING PROBLEM:

Full custom design: Here, the partitions can be of different sizes and hence there are no
area constraints, at the chip level partitioning. The estimated terminal count for a partition
is given by, Ti = Pi / d, i = 1, 2, …., k

where Pi = perimeter of the block, d = terminal pitch

Standard cell: Here, each sub-circuit should correspond to a cell in the standard cell
library. Hence, the complexity of partitioning depends on the type of standard cells that
are available in the library.

Gate array: Here, the circuit is bi-partitioned recursively until each resulting partition
corresponds to a gate in the Gate array. Thus, each bi-partitioning should have the
objective of mincut.

5.2 CLASSIFICATION OF PARTITIONING ALGORITHMS

As mincut problems are NP complete, the algorithms developed for partitioning are
heuristic. These algorithms can be classified based on three factors namely – the initial
partitioning, the nature of algorithms and the process used. The first two are of only
historical interest, and the last one is the one that is going to be discussed here. However,
the classification of all these algorithms can be indicated as follows –

Partitioning algorithms

i) Based on the initial partitioning: Constructive, Iterative
ii) Based on the nature of algorithms: Deterministic, Probabilistic
iii) Based on the process used: Group migration, Simulated annealing and evolution

The last type is the one that utilizes the algorithms that are mentioned in the first two
types also. Therefore, as already mentioned, the algorithms listed in the last type are
going to be discussed here.

I GROUP MIGRATION ALGORITHMS:

These algorithms belong to a class of iterative improvement algorithms, in which the


layout is divided into some initial partitions, and then the changes are applied to these
partitions, to reduce the cutsize. The process is repeated until no further improvement
is possible. The algorithm proposed by Kernighan and Lin is the first one in this type
of algorithms, after which its extensions follow, as indicated below –

Group migration algorithms

Kernighan-Lin, and its extensions:
Fiduccia-Mattheyses
Goldberg and Burstein

1. Kernighan-Lin algorithm: This is a bi-sectioning algorithm, which means that it starts by initially partitioning the graph G = (V,E) into two subsets of equal sizes. Then a vertex pair is chosen to be exchanged across this bi-section, only if the exchange reduces the cutsize. This procedure of exchange of vertex-pairs is carried out iteratively, until no further improvement is obtained. The procedure of the K-L algorithm is illustrated by means of an example, as shown below –

Figure 5.3: Example of K-L algorithm

(a) Initial bisections [cutsize = 9]
(b) After exchanging vertex pair (3,5) [cutsize = 6]
(c) After exchanging vertex pair (4,6) [cutsize = 1]
(d) Final bisections (with vertices reoriented)

When a vertex pair is chosen, the results are tabulated by means of the respective gain. The gain of a vertex pair for the ith iteration is given by g(i) = Da + Db – 2Cab, where Da and Db are the D-values of the vertices ‘a’ and ‘b’ respectively, and Cab is the no. of common edges between the vertices ‘a’ and ‘b’. The D-value of a vertex in turn is given by D = Outedge – Inedge, where Outedge is the no. of its edges crossing to the other partition and Inedge is the no. of its edges within its own partition.

Initially the vertex pair (3,5) is chosen, and the results after the exchange of vertices are,

D3 = (3-1) = 2, D5 = (2-1) =1, C35 = 0 and g(1) = 2+1-0 = 3.

Similarly, the gain for the vertex pair (4,6) is given by g(2) = (3-0) + (3-1) – 0 = 5. Using
these gains, the cutsize for each iteration can be deduced by the algorithm, by subtracting
the total gain from the initial cutsize. Hence, the results for this particular procedure can
be tabulated as follows –

Iteration (i) | Vertex pair | Gain g(i) | Total gain | Cutsize
0             | –           | –         | –          | 9
1             | (3,5)       | 3         | 3          | 6
2             | (4,6)       | 5         | 8          | 1

The procedure can be continued by considering the other vertex pairs as well. But in this example, as the cutsize reaches the minimum value of “1”, the algorithm can stop after two iterations. Alternatively, the algorithm can continue by choosing all the vertex pairs, and then deduce the result from the log-table of the vertex exchanges. Hence, the pseudo-code description for the K-L algorithm can be as follows –
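The original pseudo-code is not reproduced in this copy; a minimal sketch of one K-L pass is given below (assumptions: unweighted graph given as adjacency sets, partitions A and B as vertex sets; the exchange log is truncated at the prefix of maximum cumulative gain, as the text describes) –

    # Minimal sketch of one K-L pass on an unweighted graph
    def kl_pass(adj, A, B):                  # adj[v] is a set of neighbors of v
        A, B = set(A), set(B)

        def D(v, own, other):                # D = Outedge - Inedge
            return len(adj[v] & other) - len(adj[v] & own)

        locked, log = set(), []
        for _ in range(min(len(A), len(B))):
            best = None
            for a in A - locked:             # best unlocked pair by gain
                for b in B - locked:
                    gain = D(a, A, B) + D(b, B, A) - 2 * (b in adj[a])
                    if best is None or gain > best[0]:
                        best = (gain, a, b)
            g, a, b = best
            A.remove(a); B.remove(b)         # tentatively exchange the pair
            A.add(b); B.add(a)
            locked |= {a, b}
            log.append((g, a, b))
        # keep only the prefix of exchanges with maximum cumulative gain
        totals = [sum(g for g, _, _ in log[:k]) for k in range(len(log) + 1)]
        k_best = max(range(len(totals)), key=totals.__getitem__)
        return log[:k_best]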

Even though the K-L algorithm is quite simple and efficient, it has some inherent disadvantages, which are –
a) Its time complexity is O(n3).
b) It requires pre-specified partition sizes.
c) It cannot handle weighted graphs.
d) It is not applicable to hypergraphs.

Note: Hypergraph implies the graph in which edges can connect any no. of vertices. An
example is as shown, in which four hyperedges are present –

2. Fiduccia-Mattheyses algorithm: This is a modified version of the K-L algorithm, in which the following modifications are implemented –

a) A single vertex is moved across the cut in a single move.
b) The vertices to be moved are selected in such a way as to increase the speed.
c) The concept of cutsize is extended to hypergraphs also.

The data structure for this particular algorithm is as shown –

Figure 5.4: The data structure for F-M algorithm

3. Goldberg and Burstein algorithm: For physical design problems, the ratio of the
no. of edges to the no. of vertices is typically in between 1.8 and 2.5. But K-L
algorithm yields good bisection when this particular ratio is greater than 5.
Therefore, in order to achieve good bisections for VLSI problems, Goldberg and
Burstein suggested an improvement of the bisection algorithm, in which the edges
are contracted such that the ratio is increased. An example is as shown –

Figure 5.5: Matching and edge contraction

Initially, a matching is found in the given graph, and then each edge in the
matching is contracted, in order to reduce the density of graph. Any bisection
algorithm can now be applied to the contracted graph, after which, the edges are
un-contracted within each partition. In the example shown, the edge to vertex
ratio initially is 27/12 = 2.25. After the matching and contraction procedures, the
edge to vertex ratio becomes 15/6 = 2.5.

II SIMULATED ANNEALING AND EVOLUTION:

This class of algorithms belongs to the probabilistic and iterative type. In these two
methods, the annealing process that is used for metals and the biological evolution
process that is present in nature, are simulated.

1. Simulated annealing: Annealing is the process of heating a material to a high temperature, and then slowly cooling it down as per a schedule. When heated, the metal reaches an amorphous liquid state, and when cooled very slowly, it attains a highly ordered crystalline state. The same process is simulated here, for the partitioning of a circuit with a very large no. of components.

Here, the energy within the material corresponds to the partitioning score. The idea is to attain the purpose of partitioning with global minimum energy, which corresponds to the perfect crystal. The procedure is as follows –

i) An initial random partition is selected; some elements are exchanged between the partitions, and the resulting change in score is calculated.
ii) If this change “δs” is less than zero, then the move is accepted, as this value indicates lower energy.
iii) If δs ≥ 0, then the move is accepted with the probability e^(-δs/t), where “t” is the temperature. This means that, as the temperature decreases, the probability of accepting a score-increasing move decreases.
iv) Through this iterative process, the algorithm tends towards a global minimum.

The algorithm for simulated annealing is listed as follows, in which the following variables are made use of –

“t0” is the initial temperature of heating.
“α” is the factor selected for the cooling schedule, typically 0.95.
“αt” is the temperature decrement.

The higher the value of “t0” and the higher the value of “α”, the better will be the result. This indicates a high initial temperature and very slow cooling. But then, the time required for the solution also increases. For performance improvement, “e^(-0.7/t)” can be used as the decrement function, instead of “αt”.
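The listing is not reproduced in this copy; the sketch below is a reconstruction of its structure. SELECT, EXCHANGE, SCORE and MOVE are the helpers described in the next paragraph and are assumed to be provided; the inner-loop count moves_per_t is an assumed parameter –

    # Reconstruction sketch of the annealing loop (helpers assumed, not defined)
    import math, random

    def anneal_partition(P1, P2, t0, alpha, t_min, moves_per_t):
        t = t0
        s = SCORE(P1, P2)                    # score of the current partitioning
        while t > t_min:
            for _ in range(moves_per_t):     # several trials at each temperature
                a, b = SELECT(P1, P2)        # one random component per partition
                trial = EXCHANGE(P1, P2, a, b)  # trial partitioning, nothing moved
                ds = SCORE(*trial) - s
                if ds < 0 or random.random() < math.exp(-ds / t):
                    MOVE(P1, P2, a, b)       # accept: actually move the components
                    s += ds
                # otherwise the trial exchange is discarded
            t = alpha * t                    # cooling schedule
        return P1, P2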

In this listing, the SELECT function is used to select two random components,
one from each partition. The EXCHANGE function is used to generate a trial
partitioning, without moving the components. The SCORE function is used to
calculate the cost for this trial partitioning. If the cost is reduced, then the actual
partitioning is performed by using the selected components. The MOVE function
is used to actually move the components across the partitions.

If the cost is greater, then the RANDOM function is used to generate a random number, which decides the probability of movement. If this number is less than e^(-δs/t), then the MOVE is accepted; otherwise the current trial EXCHANGE is discarded, and different random components are selected.

2. Simulated evolution: Evolution is the betterment process that happens from generation to generation. Here, the “bad” genes from the old generation are eliminated and the “good” genes are retained for the next generation. That is how, in nature, improvement happens with each generation.

In case of Simulated Evolution, state models are created for the partitioning process. These state models correspond to the old generations, and the cost of each state corresponds to the “genes”. A state is defined as a function S: M → L, where M is a finite set of movable elements and L is a finite set of locations. As far as partitioning is concerned, the elements of a state correspond to vertices, and the locations correspond to the two partitions. The algorithm retains the state of lowest cost in each iteration. The listing of the algorithm is as follows –
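The listing is not reproduced in this copy; the sketch below is a reconstruction of its structure. COST, PERTURB and UPDATE are the procedures described in the next paragraph and are assumed to be provided –

    # Reconstruction sketch of the evolution loop (helpers assumed, not defined)
    def simulated_evolution(S0, R, p0):      # R generations, initial parameter p0
        S, p = S0, p0
        best, best_cost = S0, COST(S0)
        for _ in range(R):                   # R is chosen by the designer
            c_pre = COST(S)
            S = PERTURB(S, p)                # decide each element's location by cost
            c_cur = COST(S)
            p = UPDATE(p, c_pre, c_cur)      # decrement p if the cost did not change
            if c_cur < best_cost:            # retain the lowest-cost state seen
                best, best_cost = S, c_cur
        return best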

Here, the procedure PERTURB decides the position of each element within the
state, based on the cost. i.e., for each movable element m Є M in the current state
S, the procedure decides to retain its current location, based on its cost. The
procedure UPDATE decides the value of control parameter ‘p’. If Cpre = Ccur, then
the value of ‘p’ is decremented; otherwise, its initial value is retained. This
parameter ‘p’ is utilized to determine the number of moves within each state, in
order to obtain minimum cost.

The variable ‘R’ is utilized for determining the no. of states, which corresponds to
generations. Its value has to be chosen by the designer. As Simulated Evolution
uses the history of previous trial partitioning, it is more efficient than Simulated
Annealing. However, the Simulated Evolution algorithm requires more memory
space for storing the history.

CHAPTER – 6

FLOORPLANNING AND PIN ASSIGNMENT

After the partitioning of the layout, the following tasks have to be performed –

a) The area occupied by each block has to be estimated.
b) The possible shapes of the blocks have to be ascertained.
c) The no. of terminals decided for each block has to be assigned.

Here, the first two tasks come under the Floorplanning phase and the third task comes
under Pin assignment phase. The assigning of the specific locations to the blocks comes
under the Placement phase, and the interconnects are completed during Routing phase.

The blocks whose dimensions are known are called as “fixed blocks”, and the blocks
whose dimensions are yet to be determined are called as “flexible blocks”. If all the
blocks are of fixed type, then the layout becomes only a Placement problem. Otherwise,
the layout becomes a Floorplanning problem.

The factors to be considered for Floorplanning are –

i) Shape of the blocks: The blocks have to be mostly rectangular, with a lower as
well as upper limit for the aspect ratios.
ii) Packaging of the chip: The blocks must be placed in such a way that the heat
generated must be dissipated uniformly over the entire surface.
iii) Pre-placed blocks: In the layout, the locations of some of the blocks may be fixed.
e.g.: clock buffers to be placed at the centre of the chip.

6.1 FLOORPLANNING

As already defined, Floorplanning is the process of placing the flexible blocks, whose
areas are known and the dimensions are not known. i.e., for each block Bi, its area ai is
known, and the aspect ratio limits Ail and Aih are specified. The floorplanning algorithm
has to determine the width wi and height hi of each block Bi such that Ail < hi /wi < Aih.

There can be two types of floorplans namely – slicing floorplan and hierarchical
floorplan. The slicing floorplan is obtained by recursively partitioning a rectangle into
two parts, either by a vertical line or by a horizontal line. If the floorplan is different from
this method, which means that, if the floorplan cannot be distinctly divided into vertical
or horizontal slices, then it comes under hierarchical floorplan.

The examples for both of these types of floorplans are as shown –

Figure 6.1: (a) Example layout (b) Its slicing tree

Figure 6.2: Example layout and its hierarchical tree

The methods of floorplanning can be classified as – Constraint based methods, Integer
programming based methods, Rectangular dualization based methods, Hierarchical tree
based methods, Simulated evolution methods and Timing driven algorithms. Here, most
importantly, two methods of floorplanning are discussed, as follows –

1. Integer programming based floorplanning: This method ensures that two blocks will not overlap after the floorplanning, by considering the dimensions of the particular blocks. Let (xi,yi) be the coordinates of the left-bottom corner of the block Bi, and let wi and hi be respectively the width and height of the block Bi. If block Bj should not overlap with Bi, then there are four possibilities, as shown below –

[Figure: Bj may lie to the right of, to the left of, above, or below Bi, within the chip area of width Wmax and height Hmax]

Figure 6.3: Neighborhood criteria for the blocks

As is evident from the figure, the block Bj can be located at the right side of Bi, at the left side of Bi, above Bi, or below Bi. Therefore, the conditions on the dimensions of these blocks can be enumerated as –

xi + wi ≤ xj (Bj is to the right of Bi), or
xi – wj ≥ xj (Bj is to the left of Bi), or
yi + hi ≤ yj (Bj is above Bi), or
yi – hj ≥ yj (Bj is below Bi)
Based on the above mentioned conditions, the algorithm should ensure that there
is no overlap of any block with any other block. In addition, the width and height
of any block cannot obviously exceed Wmax and Hmax respectively. These
conditions are computed in the algorithm by making use of the integer variables.
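A minimal sketch of these checks (the full integer-programming formulation is not reproduced here; blocks are assumed to be given as (x, y, w, h) tuples with the left-bottom corner, as above) is –

    # Check the four non-overlap conditions for a pair of placed blocks
    def no_overlap(bi, bj):                  # b = (x, y, w, h), left-bottom corner
        xi, yi, wi, hi = bi
        xj, yj, wj, hj = bj
        return (xi + wi <= xj or             # Bj is to the right of Bi
                xi - wj >= xj or             # Bj is to the left of Bi
                yi + hi <= yj or             # Bj is above Bi
                yi - hj >= yj)               # Bj is below Bi

    def feasible(blocks, Wmax, Hmax):        # every pair must satisfy one condition
        inside = all(x + w <= Wmax and y + h <= Hmax for x, y, w, h in blocks)
        pairs = [(i, j) for i in range(len(blocks))
                        for j in range(i + 1, len(blocks))]
        return inside and all(no_overlap(blocks[i], blocks[j]) for i, j in pairs)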

2. Rectangular dualization based floorplanning: This approach can be utilized
when the output from a partitioning algorithm is represented as a graph. The
floorplan can be obtained by converting the graph into its rectangular dual. An
example for this approach is as shown –

(a) Planar directed graph

(b) Floorplan of the digraph

Figure 6.4: Rectangular dualization

However, this approach is suitable only if the graph can be directly converted into its rectangular dual, which may not always be the case. Hence, the following are the limitations of this particular approach –

a) During rectangular dualization, areas and aspect ratios are ignored.
b) There are many graphs which do not have rectangular duals.

6.2 PIN ASSIGNMENT

The purpose of pin assignment is to define the signal of each pin on the package. Pins can
be broadly classified as two types namely – functionally equivalent and equipotential.
These are better explained by means of the layout as shown below –

Figure 6.5: Illustration of two types of pins

The layout shown above corresponds to an nMOS NAND gate, in which the two inputs
are functionally equivalent, in the sense that, both the inputs can be exchanged with each
other, without affecting the functioning of the gate. Similarly, the output’s net is available
on both sides of the gate, which are equipotential, and hence any of them can be chosen
as the output. Due to these types of pins, it is possible to exchange the pins even if the
blocks are pre-designed. This gives the flexibility of performing the pin assignment,
either during floorplanning, or during placement, or even after placement.

The pin assignment techniques are classified into two categories namely – “general” and
“special”. General techniques are applicable at any level and any region within the chip,
whereas special techniques are used within a specific region of the chip – such as a
channel or a switchbox. The methods are summarized as shown –

Pin assignment

General: Concentric circle mapping, Nine zone method
Special: Channel pin assignment

These three pin assignment methods are discussed in the following paragraphs.

1. Concentric circle mapping: Here, two concentric circles are considered for the pin
assignment, in which, the inner one is for the pins on the component, and the outer one
is for the pins on the other components which have to be interconnected with the inner
ones. An example is as shown –

Figure 6.6: Concentric circle mapping

In figure (a), the pins on the component and the external pins for interconnect are
shown. In figure (b), two circles are drawn such that the inner circle is drawn inside all
the pins of the component, and the outer circle is drawn outside those pins. In the next
figure (c), lines are drawn from all the pins to the centre of these circles. Now, all of
the pins are defined on these circles by the points of intersection of the lines with the
circles, as indicated in figure (d). Finally, the pin assignment becomes complete by
mapping the points on the outer circle to those on the inner circle, in a cyclic fashion.
Figure (e) shows the best case of pin assignment and figure (f) shows the worst case.

2. Nine zone method: This technique is based on zones in a Cartesian coordinate system,
in which the centre of the coordinate system is located inside the pins of the
component. These pins are considered to be interchangeable, and hence this group of
pins is called as “pin class”. Later a rectangle is drawn, on which each of the nets
connected to the pin class is defined. This rectangle can be positioned in nine zones,
which is as shown in the figure below –

Figure 6.7: Nine zone method

3. Channel pin assignment: In VLSI terminology, a channel is the two-dimensional free space in which the interconnects are drawn from two sides, and a switchbox is the space in which the interconnects are drawn from all the four sides. As mentioned earlier, a significant portion of the chip is used for interconnect, and hence it is desirable to reduce the channel density as far as possible. One such approach for reducing the channel density is as shown –

Figure 6.8: Reducing the channel density

In figure (a), a channel is shown which requires 3 tracks. By moving the pins, the no. of tracks is reduced to only 1, as shown in figure (b).

Another method of reducing the channel density is by using a dynamic programming formulation, which is called the “(i,j,k) – solution”. This is explained below.

If the pin assignment algorithm assigns the terminals (t1,t2,…,ti) at the top and
(b1,b2,…,bj) at the bottom, using exactly 'k' columns of the channel, then it is called
an (i,j,k)-solution. These solutions can be classified into 4 types, as follows –

Type 0: No terminal assignment to a column k
Type 1: Only ti is assigned to a column k
Type 2: Only bj is assigned to a column k
Type 3: Both ti and bj are assigned to a column k

The figures are as shown below –

Figure 6.9: Four types of (i,j,k) – solutions

The goal of the algorithm is to obtain an optimal pin assignment with reduced channel
density; hence Type 0 cannot be present in the solution. Let x(i,j,k), y(i,j,k) and z(i,j,k)
be the local densities for Type 1, Type 2 and Type 3 respectively, and let R1(i,j), R2(i,j)
and R3(i,j) denote the corresponding sets of nets. Then the local densities are given by
the sizes of these net sets,

x(i,j,k) = |R1(i,j)|,  y(i,j,k) = |R2(i,j)|,  z(i,j,k) = |R3(i,j)|

where the notations are as follows –

R1(i,j) denotes the set of nets with one terminal in {t1,t2,…,ti-1,b1,b2,…,bj}
R2(i,j) denotes the set of nets with one terminal in {t1,t2,…,ti,b1,b2,…,bj-1}
R3(i,j) denotes the set of nets with one terminal in {t1,t2,…,ti-1,b1,b2,…,bj-1}
The other terminal for all these types is in TOP ∪ BOT; i.e., {t1,t2,…,ti,b1,b2,…,bj}

CHAPTER – 7

PLACEMENT

As already mentioned earlier, placement is the process of assigning specific locations
to the blocks on the layout. The input to the placement phase is a set of blocks, the
number of terminals for each block, and the netlist. Placement is carried out at three
levels, as shown –

System level: Minimum area; maximum heat dissipation.

Board level:  Components on both sides; minimum number of routing layers;
              minimized lengths of critical nets.

Chip level:   Circuit is on only one side; the number of routing layers is limited.

Figure 7.1: Levels of placement

The goals for the chip level placement are –

i) The layout has to be divided into a set of rectangles, and each block has to be
placed in its corresponding rectangle.
ii) No two rectangles should overlap, i.e., Ri ∩ Rj = ø.
iii) The placement should be fully routable.
iv) The total area of the rectangle enclosing all the blocks should be minimized.
v) The total wirelength should be minimized.

Based on the process used, the placement algorithms can be classified as – simulation
based and partitioning based, which are summarized as follows –

Placement algorithms

- Simulation based:
  - Simulated annealing
  - Simulated evolution
  - Sequence pair technique
- Partitioning based:
  - Breuer's algorithm
  - Terminal propagation algorithm

7.1 SIMULATION BASED ALGORITHMS

There are many problems in the natural world which get solved over a period of time,
through processes that follow their own implicit algorithms. For example, in crystals the
molecules and atoms arrange themselves so that the crystal has minimum energy and no
residual strain. Similarly, herds of animals move around until each herd has enough
space and can maintain its predator-prey relationships with respect to the other herds.
These problems resemble the placement and packaging problems of physical design.
Hence, the simulation based placement algorithms simulate some of these phenomena.

1. Simulated annealing algorithm:

Generally, a change in the placement configuration is performed in order to reduce the
cost, by moving a component or by interchanging the locations of two components. Thus
in simulated annealing, all the moves that result in a decrease in cost are accepted. In
addition, the moves that result in an increase in cost are also accepted, but with a
probability that decreases over the iterations.

The parameters that are used for the actual annealing process are utilized here, with the
use of a parameter called temperature T. This parameter controls the probability of
accepting the moves which result in an increased cost. The acceptance probability is
given by e^(−ΔC/T), where ΔC is the increase in cost. The algorithm starts with a very high
value of T, which gradually decreases so that the moves that increase the cost have lower
probability of being accepted. Finally, the temperature reduces to a very low value, which
causes only those moves that reduce the cost, to be accepted. In this way, the algorithm
converges to a near optimal configuration. The algorithm is listed as follows –
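The original listing is not reproduced in these notes; the following Python sketch
reconstructs the generic annealing loop from the description above, using the parameter
and function names mentioned in the next paragraph –

    import math, random

    def simulated_annealing(config, cost, perturb, schedule,
                            init_temp, final_temp, inner_loop_criterion):
        # Hedged reconstruction of the generic annealing loop; cost, perturb
        # and schedule stand for the COST, PERTURB and SCHEDULE functions.
        T = init_temp
        cur_cost = cost(config)
        while T > final_temp:
            for _ in range(inner_loop_criterion):  # trials at this temperature
                trial = perturb(config)
                delta = cost(trial) - cur_cost
                # Accept all improving moves; accept cost-increasing moves
                # with probability e^(-delta/T), which shrinks as T drops.
                if delta < 0 or random.random() < math.exp(-delta / T):
                    config, cur_cost = trial, cur_cost + delta
            T = schedule(T)                        # e.g. T' = 0.9 * T
        return config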

The parameters used in this algorithm are the initial temperature (init_temp) and the
final temperature (final_temp). In addition, the parameter inner_loop_criterion is used,
which is the number of trials at each temperature. The functions used are PERTURB,
COST and SCHEDULE. The function PERTURB is used for shuffling a configuration.
The function COST is the cost function. Lastly, the function SCHEDULE is used for
changing the temperature.

2. Simulated evolution algorithm:

This is analogous to the natural process of mutation of species, as they evolve to better
adapt to their environment. The steps followed are as follows –

a) The algorithm starts with an initial set of placement configurations, which is
called the population. The individuals in this population are called genes. A
set of genes that makes up a partial solution is called a schema.
b) The simulated evolution algorithm is iterative, and each iteration is called a
generation. During each iteration, the individuals of the population are evaluated
on the basis of fitness tests, which determine the quality of each placement.
c) Two placement configurations among the population are selected as parents, with
probabilities based on their fitness. The operators called crossover, mutation and
inversion are then applied on the parents to combine 'genes' from each parent to
generate a new individual called the offspring.
d) A new generation is then formed, by including some of the parents and the
offspring, on the basis of their fitness. As the weak individuals are deleted, the
next generation tends to have 'genes' that have good fitness.
e) As the fitness of the entire population improves over the generations, the overall
placement quality improves over the iterations.

The three genetic operators that are used for creating the offspring from the previous
generation are discussed below –

i) Crossover: This operator generates offspring by combining schemata of two
individuals at a time.

ii) Mutation: This operator causes incremental random changes in the offspring that are
produced by crossover. This is the process by which new genes, which did
not exist in the original generation, can be generated.

iii) Selection: This is the process in which, after the offspring is generated, the
individuals for the next generation are chosen based on some criteria.

The simulated evolution algorithm for placement is listed as follows –
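That listing is likewise not reproduced here; the following Python sketch captures the
generational loop described in steps (a)-(e), with fitness, crossover and mutate left as
user-supplied functions (fitness values are assumed positive, since they are used as
selection weights) –

    import random

    def genetic_placement(initial_population, fitness, crossover, mutate,
                          generations=100):
        # Hedged sketch of the simulated evolution loop; fitness values
        # are assumed positive because they are used as selection weights.
        population = list(initial_population)
        for _ in range(generations):
            scores = [fitness(ind) for ind in population]
            offspring = []
            for _ in range(len(population)):
                # Parents are picked with probability proportional to fitness.
                p1, p2 = random.choices(population, weights=scores, k=2)
                offspring.append(mutate(crossover(p1, p2)))
            # Selection: the fittest of parents + offspring survive, so weak
            # individuals are deleted and good 'genes' accumulate.
            population = sorted(population + offspring,
                                key=fitness, reverse=True)[:len(population)]
        return max(population, key=fitness)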

3. Sequence pair technique:

This is a technique to pack the modules in such a way that overlaps are avoided in the
layout. The term “sequence pair” refers to a pair of rectilinear lines that are drawn on the
layout, one from South-West to North-East, and the other one from South-East to North-
West. An example layout is as shown –

Figure 7.2: Sequence pair for the given placement

This procedure of encoding a placement on a chip into a sequence pair, by means of non-
intersecting, non-overlapping lines, is called "gridding". The two sequences are called
S1 and S2, in which sequence S1 runs from SW to NE, and S2 from SE to NW.
Each sequence has to pass through all the blocks that are present in the layout. In the
example shown above, only the sequence S1 is indicated.

As already mentioned, the objective of the sequence pair is to generate relations between
the modules of the chip, such that overlap during placement is avoided. For the placement
shown above, considering both S1 and S2, the sequence pair is “abcd” and “cdab”. The
relations of block “a” as per both S1 and S2 are as follows –

LeftOf(a) => modules that are before “a” in both S1 and S2.
RightOf(a) => modules that are after “a” in both S1 and S2.
AboveOf(a) => modules that are before “a” in S1 and after “a” in S2.
BelowOf(a) => modules that are after “a” in S1 and before “a” in S2.

Therefore, for the figure shown above,

LeftOf(a) = ( ) and RightOf(a) = (b)
AboveOf(a) = ( ) and BelowOf(a) = (c,d)
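These relations follow mechanically from the two sequences, as the small sketch below
shows; running it on the example sequence pair ("abcd", "cdab") reproduces the sets
listed above –

    def sp_relations(s1, s2, a):
        # Derive the four relations of module `a` from sequence pair (s1, s2).
        before1, after1 = set(s1[:s1.index(a)]), set(s1[s1.index(a) + 1:])
        before2, after2 = set(s2[:s2.index(a)]), set(s2[s2.index(a) + 1:])
        return {
            "LeftOf":  before1 & before2,   # before `a` in both sequences
            "RightOf": after1 & after2,     # after `a` in both sequences
            "AboveOf": before1 & after2,    # before `a` in S1, after in S2
            "BelowOf": after1 & before2,    # after `a` in S1, before in S2
        }

    # sp_relations("abcd", "cdab", "a") gives LeftOf = set(), RightOf = {'b'},
    # AboveOf = set(), BelowOf = {'c', 'd'}, matching the example above.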

Based on these details, constraint graphs are generated for the example, which are called
as Gh and Gv, for horizontal and vertical constraints respectively. In these graphs, the
modules are used as vertices and relations are used as edges, as shown below –

Figure 7.3: Constraint graphs from the given sequence pair

These graphs are directed acyclic graphs. They can be represented in data structures,
and the placement derived from them will have no overlaps between blocks. In addition,
the placement will also achieve the closest packing of the blocks.

7.2 PARTITIONING BASED ALGORITHMS

In these algorithms, the given circuit is repeatedly partitioned into two sub-circuits. At
each level of partitioning, the available layout area is partitioned alternately into
horizontal and vertical sub-sections. The process is carried out until each sub-circuit
consists of a single gate, and has a unique place on the layout.

1. Breuer’s algorithm: Breuer presented several placement procedures, in
which different sequences of cut-lines are used. The main idea here is to
reduce the number of nets being cut when the circuit is partitioned. The four
important placement procedures are shown as follows –

Figure 7.4: Different sequences of cut-lines

i) Cut-oriented mincut placement: In figure (a), the layout is first cut
into two blocks, so that the net cut is minimized. The process is carried
out for the second cut line, which is horizontal, and the procedure is
followed sequentially for the subsequent cut lines. The procedure is
easy to implement.
ii) Quadrature placement: In figure (b), the layout is partitioned into
four regions of equal sizes. After the minimization of cut-size, the
procedure is repeated in the same way for each block. This method
reduces the routing density in the centre.

iii) Bisection placement: In figure (c), the layout area is bisected by a
horizontal cut-line, and each of the resulting halves is bisected further
into rows. Later, each row is repeatedly bisected by vertical cut-lines.
This method is usually used for standard-cell placement.
iv) Slice bisection placement: In figure (d), the layout is partitioned
repeatedly by horizontal cut-lines into slices. After the blocks are
assigned to the rows, vertical cut-lines are used to bisect the columns.
In this method, the congestion at the periphery gets reduced.

To minimize the number of nets cut, all of the above-mentioned procedures can use
group migration algorithms, such as the K-L algorithm and its variations.

2. Terminal propagation algorithm: If the partitioning algorithms are used
directly, then the terminals in the blocks may move away from each other
after a particular cut. This not only increases the net length, but also
increases the congestion in the channels. The problem can be solved by
propagating a dummy terminal, as shown –

Figure 7.5: Terminal propagation

In figures (a) and (b), the terminals A and B, which are connected to each
other, move away from each other after the partitioning. The placement
algorithm can preserve the information regarding the connected terminals
by propagating a dummy terminal to the nearest point on the boundary, as
shown in figure (c). When this dummy terminal is generated, the
partitioning algorithm will not assign the two terminals into different
partitions, but will retain their connectedness in the data structure.
Thus the terminal propagation algorithm reduces the net length.

CHAPTER – 8

ROUTING

Routing can be defined as the process of finding the geometric layouts of all the nets on
the chip. The inputs to the general routing problem are –

a) netlist
b) timing budget for critical nets
c) location information of blocks
d) location information of pins
e) RC delay per unit length on each metal layer

Special nets such as clock, power and ground nets are routed by separate routers. The
routing process can be divided into two phases namely – global and detailed. The global
routing phase generates a route for each net, without specifying the actual geometric
layout of the wires. The detailed routing phase finds the actual geometric layout of each
net, within the assigned routing regions. The two phases of routing are as shown in the
following figures –

Figure 8.1: Two phases of routing

The global routing consists of three distinct stages – region definition, region assignment
and pin assignment. Region definition is the process of partitioning the entire routing
space into routing regions. The routing regions can be as follows –

Routing regions

- Between the blocks:
  - Channels
  - 2D-switchboxes
- Above the blocks (OTC):
  - 3D-switchboxes

A channel is a rectangular area that is bounded by the blocks on two opposite sides. A
2D-switchbox is a rectangular area bounded by the blocks on all the four sides. These
routing regions are as shown –

Figure 8.2: Routing regions

As indicated in the figure, the 2D-switchbox has pins on all the four sides, as well as in
the middle. The pins in the middle are used to make connections to the nets that are
routed in 3D-switchboxes. A 3D-switchbox, in addition, is a rectangular area with pins
on all the six sides. Thus, channels and 2D-switchboxes exist within a layer, whereas
3D-switchboxes exist in the upper layers. In the 3D-switchbox, the pins at the bottom
are used to connect to the nets in channels and 2D-switchboxes, and the pins at the top
can be used to connect to the C4 solder bumps.

In a 5-layer process, only M1, M2 & M3 are used for channel routing, because the upper
two layers are used for routing the special nets. The capacity of a channel is given by,

Capacity of channel = (l x h) / (w + s)

where l = no. of layers, h = height of the channel, w = wire width & s = wire separation.
e.g.: if l=2, h=18λ, w=3λ and s=3λ, then the capacity of the channel = (2x18) / (3+3) = 6.

As the next stage of global routing, the purpose of region assignment is to identify the
sequence of regions through which a net will be routed. The next pin assignment stage
assigns a pin for each net on the region boundaries.

8.1 GRAPH MODELS FOR ROUTING

In the global routing phase, the routing regions along with their relationships and the
capacities, are modeled as graphs. The graph models must be able to capture the complete
layout information, which includes the adjacencies and capacities of the routing regions.
There are three graph models which are widely used namely – grid graph model, checker
board model and channel intersection graph model. These are explained as follows.

1. Grid graph model:

In this model, a layout is considered to be a collection of unit side square cells, which
are arranged in an array. In the graph, each cell in the layout is represented by a
vertex, and there will be an edge between two vertices if the corresponding cells are
adjacent. The terminal in a cell is assigned to the corresponding vertex in the graph.
The capacity and length of each edge is set equal to one, as shown –

Figure 8.3: Grid graph model

Here, figure (a) represents a layout and figure (b) represents the corresponding grid
graph. Using this grid graph and the given net-list, it is easily possible to find the
routing path. For a two-terminal net, the routing problem is simply to find a path
connecting the vertices that correspond to the terminals. For a multi-terminal net, the
routing problem is to find a Steiner tree in the grid graph.

2. Checker board model:

This model is more general than the grid model. It approximates the entire
layout area as a ‘coarse grid’, and all the terminals located inside a coarse grid cell are
assigned that cell number. The following figure (b) shows the checker board graph
model of the layout shown in figure (a). Here, the partially blocked edges have a
capacity of one, whereas the unblocked edges have a capacity of two. When the cell
numbers of all the terminals of a net are given, the global routing problem is to find a
routing in the coarse grid graph.

Figure 8.4: Checker board graph model

A checker board graph can also be formed from the cut tree of a floorplan. Each block in
the floorplan is represented by a vertex, and there is an edge between two vertices if the
corresponding blocks are adjacent to each other. The following figure shows an
example of a checker board graph for the cut tree of a floorplan –

Figure 8.5: Checker board graph of a floorplan

3. Channel intersection graph model:

This is an accurate model for global routing. Given a layout, we can define a channel
intersection graph, where each vertex represents a channel intersection. In this case,
two vertices are adjacent if there exists a channel between them. In other words, the
channels appear as edges. The following figures show an example layout (a) and its
corresponding channel intersection graph (b) –

Figure 8.6: Channel intersection graph

The channel intersection graph shown above does not contain information about the
terminals. Hence, this graph should be extended to include the pins as vertices, so that the
connections between the pins can be considered. An example for the extended channel
intersection graph is as shown below. Here, the terminals are represented by means of the
vertices, and these vertices are added to the graph shown in the previous figure –

Figure 8.7: Extended channel intersection graph

Usage of Steiner trees in global routing: As mentioned earlier, the global routing of
multi-terminal nets can be formulated as a Steiner tree problem. As is already known, the
Steiner tree interconnects a set of specified points called demand points, and some other
points called Steiner points. The diameter of a Steiner tree is defined as the maximum
length of a path between any two vertices. The objective function for high-performance
circuits is to minimize the maximum diameter of the selected Steiner trees. An example is
shown below. Here, both the Steiner trees are of length 30, but the one shown in (b) has
diameter equal to 20, which is much smaller than the diameter of the tree shown in (a).

Figure 8.8: Difference between diameter & length in Steiner trees

8.2 CLASSIFICATION OF GLOBAL ROUTING ALGORITHMS

There are two approaches for global routing namely sequential and concurrent. In the
sequential approach, the nets are routed one by one, whereas the concurrent approach
considers the routing of all the nets simultaneously. Both the approaches have their own
advantages as well as disadvantages. In the sequential approach, once a net has been
routed, it may block the other nets which are yet to be routed. Hence, this approach
requires that the nets are sequenced according to their criticality. The important nets are
assigned a high criticality number. The concurrent approach avoids this ordering
problem, by the simultaneous consideration of the nets. However, this approach is
computationally hard and there are no efficient polynomial algorithms for the solution.
All of the available algorithms for both of these approaches are summarized below –

Global routing algorithms

- Sequential approach:
  - Two-terminal nets:
    - Maze routing (Lee’s, Soukup’s, Hadlock’s)
    - Line probe
    - Shortest path based
  - Multi-terminal nets (Steiner tree based):
    - Separability based
    - Non-rectilinear Steiner tree based
    - Steiner min-max tree based
    - Weighted Steiner tree based
- Concurrent approach:
  - Integer programming based

1. MAZE ROUTING ALGORITHMS:

These algorithms are used to find a path between two points, in a planar rectangular
grid graph. In the grid graph, the areas available for routing are represented as
unblocked vertices, and the obstacles are represented as blocked vertices. The
objective of the maze routing algorithm is to find a path between the source and the
target, without using any blocked vertex. Several methods have been developed for
the path exploration, out of which three are discussed here.

1.1 Lee’s algorithm: This is an improved version of the breadth-first search. The
search can be visualized as a wave propagating from the source S to the
destination T. The source is labeled ‘0’ and the wavefront propagates to all the
unblocked vertices adjacent to the source. Every unblocked vertex adjacent to the
source is marked with a label ‘1’. Then, every unblocked vertex adjacent to
vertices with a label ‘1’ is marked with a label ‘2’, and so on. This process
continues until the target vertex is reached or no further expansion of the wave
can be carried out. An example of the algorithm is shown in the figure –

Figure 8.9: A net routed by Lee’s algorithm

Due to the breadth-first nature of the search, Lee’s maze router is guaranteed to find a
path between the source and target, if one exists. In addition, it is guaranteed to be the
shortest path between the vertices. Lee’s Algorithm is formally described as follows –
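The formal listing itself is not reproduced in these notes; the following Python sketch
reconstructs it from the description in the next paragraph, using the array names B and L
and the wavefront lists plist and nlist –

    def lee_route(B, s, t):
        # Hedged reconstruction of Lee's maze router.  B[r][c] is True if the
        # cell is blocked; s and t are (row, col) source and target.
        rows, cols = len(B), len(B[0])
        L = [[None] * cols for _ in range(rows)]    # L[v]: distance from s
        L[s[0]][s[1]] = 0
        plist = [s]                                 # current wavefront
        while plist:
            nlist = []                              # neighbours of the wavefront
            for (r, c) in plist:
                for dr, dc in ((0, 1), (-1, 0), (0, -1), (1, 0)):  # CCW order
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not B[nr][nc] and L[nr][nc] is None):
                        L[nr][nc] = L[r][c] + 1     # label with wave number
                        if (nr, nc) == t:
                            return retrace(L, s, t) # target reached
                        nlist.append((nr, nc))
            plist = nlist                           # expand the next wavefront
        return None                                 # no path exists

    def retrace(L, s, t):
        # Walk back from t to s, always stepping to a cell labelled one less.
        path, (r, c) = [t], t
        while (r, c) != s:
            for dr, dc in ((0, 1), (-1, 0), (0, -1), (1, 0)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < len(L) and 0 <= nc < len(L[0])
                        and L[nr][nc] == L[r][c] - 1):
                    r, c = nr, nc
                    break
            path.append((r, c))
        return path[::-1]                           # path from s to t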

In the algorithm described above, the inputs are: an array B, the source s and the target t.
The array element B[v] indicates whether vertex v is blocked or unblocked. In addition,
the algorithm uses an array L, where L[v] denotes the distance of vertex v from the source.
Two linked lists plist (Propagation list) and nlist (Neighbor list) are used to keep track of
the vertices on the wavefront and their neighbor vertices respectively. It is assumed that
the neighbors of a vertex are visited in counter-clockwise order.

1.2 Soukup’s algorithm: The limitation of Lee’s algorithm is that it requires a
large search time, due to the equal amount of search in the directions away from
the target as well as in the directions towards it. This limitation is overcome in
Soukup’s algorithm, which is iterative. The steps followed are –

a) The algorithm explores in the direction toward the target until it reaches either the
target or an obstacle.
b) If the target is reached, the exploration phase ends. If the target is not reached, the
search is conducted iteratively.
c) If the search goes away from the target, the algorithm simply changes the
direction so that it goes towards the target and a new iteration begins.
d) If an obstacle is reached, the breadth-first search is employed, until a vertex is
found which can be used to continue the search towards the target.
e) Then, a new iteration begins to find a path towards the target.

The following figure illustrates Soukup’s algorithm with an example, in which the
number near a vertex indicates the order in which that vertex was visited.

Figure 8.10: A net routed by Soukup’s algorithm

The search method for this algorithm is a combined breadth-first and depth-first search.
Hence, this algorithm improves the speed of Lee’s algorithm by a factor of 10 to 50.
However, the path between source and target may not be the shortest one. The following
is the formal description of Soukup’s Algorithm –
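The formal listing is again not reproduced here; the following compact Python sketch
captures the combined search by exploring neighbours that lie toward the target
depth-first (front of the queue) and all others breadth-first (back of the queue) –

    from collections import deque

    def soukup_route(B, s, t):
        # Goal-directed neighbours are expanded depth-first (queue front),
        # others breadth-first (queue back); the path found is not
        # necessarily the shortest, matching the remark above.
        rows, cols = len(B), len(B[0])
        dist = lambda r, c: abs(t[0] - r) + abs(t[1] - c)
        parent, q = {s: None}, deque([s])
        while q:
            r, c = q.popleft()
            if (r, c) == t:                 # target reached: retrace the path
                path, v = [], t
                while v is not None:
                    path.append(v)
                    v = parent[v]
                return path[::-1]
            for dr, dc in ((0, 1), (-1, 0), (0, -1), (1, 0)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not B[nr][nc] and (nr, nc) not in parent):
                    parent[(nr, nc)] = (r, c)
                    if dist(nr, nc) < dist(r, c):
                        q.appendleft((nr, nc))   # keep probing toward target
                    else:
                        q.append((nr, nc))       # breadth-first fallback
        return None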

1.3 Hadlock’s algorithm: This is another approach to improve upon the speed of
search. In this algorithm, the length of a path P connecting source and target is
given by M(s,t) + 2d(P), where M(s,t) is the Manhattan distance between source
and target, and d(P) is the number of vertices on P that are directed away from the
target. The length of P is minimized if and only if d(P) is minimized, as M(s,t) is
constant for a given pair of source and target. The following figure illustrates
Hadlock’s algorithm, in which the number near a vertex indicates the order in
which that vertex was visited –

Figure 8.11: A net routed by Hadlock’s algorithm

The exploration phase of the algorithm uses a different approach of numbering of the
vertices. Here, instead of labeling the wavefront by a number corresponding to the
distance from the source, the algorithm uses the detour number. The detour number of a
path is the number of times the path has turned away from the target.

A formal description of Hadlock’s Algorithm is given below. Here, the function
DETOUR-NUMBER(v) returns the detour number of a vertex. The procedure
DELETE(nlist, plist) deletes the vertices which are in plist from nlist. Finally, the
function MINIMUM-DETOUR(nlist) returns the minimum detour number among all
the vertices in the list nlist.
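The listing itself is not reproduced in these notes; the sketch below realizes the same
minimum-detour search with a priority queue in place of the MINIMUM-DETOUR
extraction over nlist –

    import heapq

    def hadlock_detour(B, s, t):
        # Cells are expanded in order of detour number: the number of moves
        # made away from the target.  Returns d(P) for a minimum-detour path,
        # so the path length is M(s, t) + 2 * d(P); None if no path exists.
        rows, cols = len(B), len(B[0])
        dist = lambda r, c: abs(t[0] - r) + abs(t[1] - c)
        detour, heap = {s: 0}, [(0, s)]
        while heap:
            d, (r, c) = heapq.heappop(heap)     # MINIMUM-DETOUR extraction
            if (r, c) == t:
                return d
            if d > detour[(r, c)]:
                continue                        # stale heap entry
            for dr, dc in ((0, 1), (-1, 0), (0, -1), (1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not B[nr][nc]:
                    # A move that increases the distance to t is a detour.
                    nd = d + (1 if dist(nr, nc) > dist(r, c) else 0)
                    if nd < detour.get((nr, nc), float("inf")):
                        detour[(nr, nc)] = nd
                        heapq.heappush(heap, (nd, (nr, nc)))
        return None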

Comparison of maze routing algorithms

Algorithm     | Lee’s         | Soukup’s      | Hadlock’s
Complexity    | O(h x w)      | O(h x w)      | O(h x w)
Approach      | BFS           | DFS & BFS     | BFS
Path traced   | Shortest path | Shorter path  | Shortest path
Search time   | Larger time   | Shortest time | Shorter time

All the maze routers are grid based methods, in which the information must be kept for
each grid node. Thus for a large grid, a very large memory space is needed to implement
these algorithms. There may be 5000 to 10000 nets in a typical chip. Such numbers make
these maze routing algorithms infeasible for large chips. In order to reduce the large
memory requirements and the run times, line-probe algorithms were developed.

2. LINE PROBE METHOD:

The line-probe algorithms were developed independently by Mikami & Tabuchi in
1968, and by Hightower in 1969. The basic idea of a line-probe algorithm is to reduce
the memory requirement by using line segments instead of grid nodes. The time and
space complexities of these algorithms are O(L), where L is the number of line
segments produced. The basic operations of these algorithms are as follows –

a) Initially, lists slist and tlist contain the line segments generated from the source
and target respectively. These line segments do not pass through any obstacle.
b) If a line segment from slist intersects with a line segment in tlist, the exploration
phase ends; otherwise, the exploration phase proceeds iteratively.
c) New line segments are generated during each iteration. These segments originate
from the ‘escape’ points on existing line segments in slist and tlist.
d) The new line segments generated from slist are appended to slist. Similarly,
segments generated from a segment in tlist are appended to tlist.
e) If a line segment from slist intersects with a line segment from tlist, then the
exploration phase ends.
f) The path can be formed by retracing the segments in tlist, and then going through
the intersection, and finally retracing the segments in slist, until source is reached.

The Mikami-Tabuchi and Hightower algorithms differ only in the process of choosing
escape points. In Mikami-Tabuchi’s algorithm, every grid node on a line segment is an
‘escape’ point, which generates new perpendicular line segments. The following
figure shows a path generated by Mikami-Tabuchi’s algorithm –

Figure 8.12: A net routed by Mikami-Tabuchi’s algorithm
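A much-simplified sketch of this segment-based exploration is shown below; it only
answers whether the source and target can be connected (path retracing via parent
segments, and the bookkeeping of slist/tlist as true segment lists, are omitted for
brevity) –

    def mikami_connected(B, s, t):
        # Each iteration erects horizontal and vertical probes from every
        # cell ('escape' point) of the current segments; the exploration
        # ends when the s-side and t-side segments intersect.
        rows, cols = len(B), len(B[0])

        def probes(cells):
            # Cells covered by maximal obstacle-free probes from `cells`.
            out = set()
            for (r, c) in cells:
                for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                    nr, nc = r + dr, c + dc
                    while 0 <= nr < rows and 0 <= nc < cols and not B[nr][nc]:
                        out.add((nr, nc))
                        nr, nc = nr + dr, nc + dc
            return out

        slist, tlist = {s}, {t}
        while True:
            if slist & tlist:
                return True                  # segments intersect: path exists
            new_s = probes(slist) - slist
            new_t = probes(tlist) - tlist
            if not new_s and not new_t:
                return False                 # no further expansion possible
            slist |= new_s
            tlist |= new_t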

Hightower’s algorithm makes use of only a single ‘escape’ point on each line
segment, as shown in the figure below –

Figure 8.13: A net routed by Hightower’s algorithm

The advantage of Hightower’s algorithm is that it generates fewer escape points when
compared to the Mikami-Tabuchi algorithm. However, this is also its disadvantage: it
may fail to find a path joining two points, even when such a path exists.

3. SHORTEST PATH BASED ALGORITHM:

This is a simple approach to route a two-terminal net, and it is suitable for the channel
intersection graph. The algorithm is based on Dijkstra’s shortest path algorithm, and it
globally routes a set N of two-terminal nets in a routing graph G. The output of the
algorithm is a set P of paths for the nets in N. The algorithm is described as follows –
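The listing is not reproduced in these notes; the sketch below routes each net of N in
turn with Dijkstra’s algorithm on G (capacity updates and ordering by criticality are
omitted) –

    import heapq

    def route_nets(G, nets):
        # G: dict vertex -> list of (neighbour, edge length); vertex labels
        # are assumed hashable and comparable (e.g. strings).
        # nets: the set N as a list of (source, target) pairs.
        def dijkstra(s, t):
            dist, parent, heap = {s: 0}, {s: None}, [(0, s)]
            while heap:
                d, v = heapq.heappop(heap)
                if v == t:                       # retrace the shortest path
                    path = []
                    while v is not None:
                        path.append(v)
                        v = parent[v]
                    return path[::-1]
                if d > dist[v]:
                    continue                     # stale heap entry
                for u, w in G[v]:
                    if d + w < dist.get(u, float("inf")):
                        dist[u], parent[u] = d + w, v
                        heapq.heappush(heap, (d + w, u))
            return None

        # Nets are routed one by one (the sequential approach).
        return [dijkstra(s, t) for s, t in nets]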

4. STEINER TREE BASED ALGORITHMS:

The Steiner tree approach is the natural approach for routing multi-terminal nets, and
usually Rectilinear Steiner Trees (RST) are used. Let S be a net to be routed, and let
G(S) be the underlying grid graph that is obtained by drawing horizontal and vertical
lines through each point of S. An example of S with six vertices is as shown –

Figure 8.14: Grid, MST and edge layouts (the six points P1–P6 of the net, the
underlying grid G(S), the MST, and the staircase edge layouts)

If an edge of the MST is rectilinearized as a shortest path between two vertices on the
grid, then it is called a “staircase edge layout”. For example, all the edge layouts in the
figure shown are staircase layouts. A staircase layout with exactly one turn on the grid is
called an “L-shaped layout”. A staircase layout having exactly two turns on the grid is
called a “Z-shaped layout”. In the figure shown, the edge layouts of P3 and P1 are
L-shaped and Z-shaped layouts respectively.

An RST obtained from an MST of a net S, by rectilinearizing each edge using staircase
layouts on G(S), is called an S-RST. An S-RST in which the layout of each MST edge is
an L-shaped layout is called an L-RST. Similarly, an S-RST in which the layout of each
MST edge is a Z-shaped layout is called a Z-RST. An optimal S-RST is an S-RST of the
least cost among all the S-RSTs.

Different algorithms exist for global routing using Steiner trees such as – Separability
based algorithm, non-rectilinear Steiner trees, MIN-MAX Steiner trees and weighted
Steiner trees. These are discussed in the following.

4.1 Separability based algorithm: This algorithm is used to find an optimal S-RST
from a separable MST. A pair of nonadjacent edges is called separable if the
staircase layouts of the two edges do not intersect or overlap. An MST is called a
separable MST (SMST) if all pairs of non-adjacent edges satisfy this property.

If an edge is deleted from an SMST, the staircase layouts of the two resulting
subtrees do not intersect or overlap each other. Overlaps can occur only between
edges that are incident on a common vertex. This property enables the use of
dynamic programming techniques to obtain an optimal S-RST. This algorithm
works in two steps, which are explained below.

In the first step, an SMST is constructed for the given net by using a modified
Prim’s algorithm. In the second step, an optimal Z-RST is obtained from the
SMST. This optimal Z-RST is equivalent to an optimal S-RST, which is used as
an approximation of the minimum cost RST. The two algorithms are as
shown below –
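Those listings are not reproduced in these notes; the sketch below covers only the first
step, plain Prim’s algorithm with rectilinear distances (the separability-preserving
tie-breaking of the modified version, and the Z-RST dynamic program of the second step,
are omitted) –

    import heapq

    def prim_mst(points):
        # Prim's algorithm on the complete graph over the net's terminals,
        # with rectilinear (Manhattan) edge lengths; returns the MST as a
        # list of index pairs.  The separability-preserving modification
        # of the book's version is not reproduced here.
        n = len(points)
        md = lambda i, j: (abs(points[i][0] - points[j][0])
                           + abs(points[i][1] - points[j][1]))
        in_tree, edges, heap = {0}, [], []
        for j in range(1, n):
            heapq.heappush(heap, (md(0, j), 0, j))
        while len(in_tree) < n:
            w, i, j = heapq.heappop(heap)
            if j in in_tree:
                continue                 # j was reached by a cheaper edge
            in_tree.add(j)
            edges.append((i, j))
            for k in range(n):
                if k not in in_tree:
                    heapq.heappush(heap, (md(j, k), j, k))
        return edges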

4.2 Non-rectilinear Steiner tree based algorithm: In order to obtain shorter
Steiner trees, the concept of separable MSTs in δ-geometry is introduced.
In δ-geometry, edges with angles iπ/δ, for all i, are allowed, where δ is
a positive integer. The values δ = 2, 4 and ∞ correspond to rectilinear, 45° and
Euclidean geometries respectively. The following figures illustrate the trees for
δ = 2 and δ = 4 respectively –

(a) 2-geometry (b) 4-geometry

Figure 8.15: δ-geometry with values 2 and 4

As is evident from the figure, the tree length in 4-geometry is shorter than the one in
the rectilinear geometry (2-geometry). The experiments have shown that tree length
can be reduced up to 10-12% by using 4-geometry as compared to 2-geometry.
Length reduction is quite marginal for higher geometries. As a consequence, it is
sufficient to consider the layouts in 4-geometry for the global routing problem.

4.3 Steiner min-max tree based algorithm: MIN-MAX Steiner trees are used for
minimizing the traffic in the densest channels. The approach uses a restricted case
of Steiner tree, called Steiner Min-Max Tree (SMMT), in which the maximum
weight edge is minimized. In SMMT, real vertices represent channels containing
terminals of a net, and Steiner vertices represent the intermediate channels, with
the weights corresponding to densities.

Given a weighted coarse grid graph G = (V, E) and a Boolean array d such that
d(v) is true if the vertex v ∈ V corresponds to terminals of Ni, an SMMT of Ni can
be obtained by using the algorithm shown below. In the listed algorithm, the
SMMT is represented as T –
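The listing is not reproduced in these notes; the part that can be sketched directly
from the function descriptions that follow is the pruning phase, which repeatedly
removes one-degree Steiner vertices from T (obtaining T itself, e.g. as a spanning tree
that minimizes the maximum edge weight, is assumed done beforehand) –

    def prune_steiner_leaves(T, d):
        # T: dict vertex -> set of adjacent vertices (the spanning tree);
        # d: dict vertex -> True if the vertex carries terminals of the net.
        # Mirrors the EXIST-ODSV / GET-ODSV / REMOVE loop described below.
        changed = True
        while changed:                       # EXIST-ODSV(T, d)
            changed = False
            for v in list(T):
                if not d.get(v, False) and len(T[v]) == 1:  # GET-ODSV(T, d)
                    (u,) = T[v]
                    T[u].discard(v)          # REMOVE(v, T)
                    del T[v]
                    changed = True
        return T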

In this algorithm, the function EXIST-ODSV(T, d) returns TRUE if there exists a one-
degree Steiner vertex in T. The function GET-ODSV(T, d) returns a one-degree Steiner
vertex from T. REMOVE(v, T) removes the vertex v and the edges incident on it from T.

4.4 Weighted Steiner tree based algorithm: This approach works in the
presence of obstacles and simultaneously minimizes wire lengths and the density
of the routing regions. A weighted Steiner tree is a Steiner tree with weighted
lengths. The term weighted length indicates that an edge with length l in a region
with weight w has weighted length l·w. A weighted rectilinear Steiner tree
(WRST) is a weighted Steiner tree with rectilinear edges. The algorithm to find an
approximation of the minimum-weight WRST comprises two steps –

- The first step is to find an MST T for a given net using Prim’s algorithm.
- In the second step, the edges of T are rectilinearized one by one.

In general, there is more than one possible staircase layout for an edge. Let Pi(ej)
denote one of the possible staircase layouts of edge ej, and let Q(i,j) denote the final
minimum cost layout. The formal description of the algorithm is given below, in which
the function FIND-P(i, ej, R) finds Pi(ej), the function CLEANUP(Q(i,j)) removes
overlapped layouts, and the function WT(Q(i,j)) gives the total weighted length of Q(i,j).
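The listing is not reproduced in these notes; the sketch below restricts FIND-P to the
two L-shaped layouts of each edge and omits CLEANUP, which is enough to show how
each MST edge is rectilinearized at minimum weighted length –

    def l_shaped_layouts(p, q):
        # The two candidate L-shaped layouts of edge (p, q):
        # horizontal-then-vertical and vertical-then-horizontal.
        (x1, y1), (x2, y2) = p, q
        return [[(x1, y1), (x2, y1), (x2, y2)],
                [(x1, y1), (x1, y2), (x2, y2)]]

    def rectilinearize(mst_edges, region_weight):
        # region_weight(a, b): weighted length of the straight segment a-b,
        # i.e. the sum of l*w over the regions the segment crosses (assumed
        # supplied by the caller).  Each MST edge gets its cheapest L-layout.
        layout = []
        for p, q in mst_edges:
            best = min(l_shaped_layouts(p, q),
                       key=lambda pts: sum(region_weight(pts[i], pts[i + 1])
                                           for i in range(len(pts) - 1)))
            layout.append(best)
        return layout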

