
HAL: A MULTI-PARADIGM APPROACH TO
AUTOMATIC DATA PATH SYNTHESIS
P.G. PAULIN (1) - J.P. KNIGHT (2) - E.F. GIRCZYC (3)

(1) Dept. 5Ll1, Bell-Northern Research
P.O.Box 3511, Stn C, Ottawa, ONT. K1Y 4H7
(2) Carleton University, Colonel By Dr, Ottawa, ONT. K1S 5B6
(3) University of Alberta, Edmonton, ALTA. T6G 2G7
Abstract

A novel approach to automatic data path synthesis is presented. This
approach features innovations in the synthesis process as well as in
the system implementation.

The synthesis process exhibits three new features. The first relates
to a subtask that performs an expert analysis of the input data flow
graph and attempts to evenly distribute operations requiring similar
resources. This is done using a novel "load balancing" technique. The
second consists of a global preselection of operator cells to fulfill
an explicit speed constraint. Finally, the third deals with new
techniques for register and multiplexer optimization. These features
support extended design space search by taking an explicit performance
specification into account.

The system implementation is based on the LOOPS multi-paradigm
programming system. In this approach the overall task can be
partitioned into complementary subtasks requiring different
programming paradigms. These subtasks will be realized using an
object-based paradigm, a knowledge-based expert system paradigm, a
functional paradigm, or combinations of all three.

Two complete examples are given to demonstrate the functionality of
the system and to allow comparison with existing systems.

1. INTRODUCTION

The ability of IC foundries to manufacture increasingly complex chips
is leading the IC designer's involvement away from device-level
considerations and towards architectural ones. A side-effect of this
shift is the integration of system designers into IC design teams.

One of the most important tools that have contributed to this
continuing trend is the so-called silicon compiler [1]. The input
language of a silicon compiler can be divided into two major
categories: structural and functional [2].

A structural input language specifies a hierarchical description of
subcircuits and their interconnections. Conversely, a functional input
language describes a circuit by specifying its input/output mapping,
without any explicit structure specification. According to [2], an
ideal silicon compiler environment would combine both approaches.

The system proposed here attempts to do this by allowing the user to
partition the overall structure into high-level blocks and interfaces.
It then uses a functional description of each block to automatically
generate a distinct data path. Extended design space exploration is
made possible by taking an explicit performance specification into
account. We will present a novel approach to the synthesis task and
will compare it with the approaches of other well known systems. We
then give a short description of the implementation approach, followed
by a more detailed description of the synthesis approach. Finally, we
present some experimental results.

2. OVERVIEW OF AUTOMATIC DATA PATH SYNTHESIS

The task of automatically designing a data path from a functional
description is usually divided into many subtasks [3]. The first is
the definition of the function of the circuit in a high-level hardware
description language (HDL) e.g. VHDL [4]. This is usually followed by
a translation to a graph-based representation derived from the data
flow. These graphs are given different names in different synthesis
systems (e.g. VT - Value Trace [5], DDG - Data Dependency Graph [6],
DAG - Directed Acyclic Graph [7], CDFG - Control and Data Flow Graph
[8]) but are simply different adaptations of similar basic concepts.
We will use the generic term 'Data Flow Graph' (DFG) for simplicity.

The next step is the allocation of operator cells (i.e. hardware
components) to realize the abstract operations (nodes) of the DFG. The
DFG is partitioned into discrete control steps and registers are
allocated to realize the values (edges) of the DFG. We must stress the
difference between 'allocation', which is simply the creation of a set
of operator cells to execute the specified operations within the
scheduling constraint, and 'assignment', which is the actual mapping
of those operations to specific operator cells. This is a many to one
assignment: many operations can be assigned to a single operator cell.

The data path is then synthesized using the allocated operator cells,
registers and some extra data transfer components (e.g. multiplexers,
buses). The final realization of this data path is done either by
binding the high-level components to pre-designed templates that
represent their gate-level structure or by synthesizing them directly
with module builders. Interconnection of these structures is also done
at this stage.

The final step is the synthesis of the controller (often a
microprogrammed or PLA based controller design) using the information
derived during the control step partitioning and data path creation.
Interconnection of the controller with the data path is also done
here.

We can thus summarize the synthesis task:

1. Translation of HDL description to a DFG
2. Operator cell allocation
3. Control step partitioning
4. Register allocation
5. Data path synthesis
6. Binding of data path operator cells
7. Synthesis of control section
8. Interconnection of components

The HAL modules (the acronym is derived from Hardware ALlocator, not
to mention the obvious AI reference) deal with the second to fifth
subtasks.
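To make the distinction between allocation and assignment concrete, the following is a minimal Python sketch. It is not taken from the HAL implementation (which was written in LOOPS); the operation names, the `(name, kind, control step)` tuples and the first-fit assignment are illustrative assumptions only.

```python
from collections import Counter

# Hypothetical scheduled operations: (name, kind, control step).
operations = [
    ("op1", "mul", 1),
    ("op2", "mul", 2),
    ("op3", "add", 2),
    ("op4", "add", 3),
]

# Allocation: decide how many cells of each kind exist. The peak number of
# same-kind operations in any one control step is enough for unit-delay ops.
peak = Counter()
for kind in {k for _, k, _ in operations}:
    per_step = Counter(step for _, k, step in operations if k == kind)
    peak[kind] = max(per_step.values())
cells = [f"{kind}{i}" for kind, n in sorted(peak.items()) for i in range(n)]

# Assignment: map every operation onto one specific allocated cell.
# Many operations may share one cell (many to one).
assignment = {}
busy = set()  # (cell, step) pairs already taken
for name, kind, step in operations:
    cell = next(c for c in cells if c.startswith(kind) and (c, step) not in busy)
    busy.add((cell, step))
    assignment[name] = cell

print(cells)
print(assignment)
```

Here two cells are allocated for four operations, and the assignment maps two operations onto each cell.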
23rd Design Automation Conference

© IEEE
3. LITERATURE SURVEY

There are just about as many different approaches to automatic data
path generation as there are systems that attempt this task. They can,
however, be divided into two large subclasses: the knowledge-based
heuristic approach [9, 10], and the algorithmic approach [5, 8, ...].
While both produce acceptable results, it is widely agreed that the
former has a higher potential in the long term as it allows the
capture of more of the design process [5]. Many of the recent data
path generators follow this approach.

We will examine three well-known systems from Carnegie-Mellon
University and one from Carleton University. These systems are closely
related to the one presented here. They are the Design Automation
Assistant (DAA) [9, 10, 13], the EMUCS data path generator [5], the
Facet system [12], and the ELF system from Carleton University [8].

The Design Automation Assistant (DAA) first assigns basic storage
elements to hardware modules and ports. Then the synthesis operation
assigns minimum delay information to develop a parallel design. Next
it maps the unassigned storage elements to register modules. Finally,
each data flow operation is assigned to a hardware module. The system
then optimizes the realization during the expert analysis phase that
follows. This analysis is based on a speed-area cost function. As we
mentioned earlier, the approach used here is of the heuristic type,
using a knowledge-based expert system, or KBES. This approach has the
advantage of allowing the use of a large amount of design expertise to
reduce the search space.

The EMUCS system is based on an algorithmic approach. It maps (binds)
the abstract operations to hardware operator cells in an iterative
fashion. It proceeds by repeatedly analyzing the intermediate data
path and then binding one element at a time until all elements are
bound. This binding is based on a minimization of the incremental cost
associated with the cell being considered. This cost is a composite of
the incremental cost of the current iteration and also of the costs
incurred in the next few iterations.

The Facet system also makes use of an algorithmic approach. The system
uses clique partitioning [14] to group and assign abstract operations
to operator cells, and storage operations to registers. The data path
is then directly generated from these elements without subsequent
optimization (as opposed to the DAA system).

Finally, the ELF system proceeds by assigning an urgency weight to
each operation (node) of the control and data flow graph (CDFG). The
'knowledge' that the allocator uses to translate the DFG is
represented as graph grammar production rules. The allocator enters a
loop which is performed for each time slot. A greedy algorithm with
pairwise interchange is used to select which rule to apply for the
realization of each operation.

4. SYSTEM COMPARISON

The system presented here is similar to these in some respects but
also embodies many new concepts in the data path synthesis process as
well as in the implementation approach. Furthermore, while the systems
developed at Carnegie-Mellon University address the general purpose IC
design task (with a strong emphasis on computer designs), the ELF and
HAL systems deal with application specific IC (ASIC) designs. This is
due primarily to the intended use of the synthesized circuits in
telecommunication applications.

4.1. COMPARISON OF SYNTHESIS PROCESSES

The data path synthesis process described here differs from the
above-mentioned systems in at least four fundamental ways.

High-level exploration of the design space

The first and most important difference is in the high-level
exploration of the design space.

The HAL system takes an explicit timing constraint and applies it in
the beginning of the synthesis process. High-level decisions such as
control step partitioning and operator cell allocation are affected
directly by this constraint. This allows the designer to explore a
full spectrum of implementations, from maximally parallel ones to
fully serial ones. The examples in the following sections illustrate
the variations of the resulting data path structure due to this
methodology.

The three systems from CMU place very strong emphasis on the synthesis
of maximally parallel data paths. This is in keeping with the GPIC or
computer design orientation their creators have given them. The DAA
system also optimizes the realization taking limited speed-area
constraints into account. Even in this case, major changes in the data
path cannot be done easily without destroying the initial structure.
For example, if an ALU was allocated originally, the data path
structure derived using this allocation is optimized for this case
only. Replacing the ALU by single function operator cells to increase
speed would render the resulting structure poorly adapted to the new
constraint. This should be done very early in the design process, when
the system still has a global view of the structure and its
interrelation with the speed-area constraints.

Operation Scheduling

A second fundamental difference in the synthesis process presented
here is in the operation scheduling (control step partitioning). For
example, while the Facet system uses an "as soon as possible" (ASAP)
scheduling technique, the HAL system relies on a "load balancing"
analysis. This analysis attempts to evenly distribute operations that
make use of similar resources.

Allocation and assignment of operator cells

The third difference relates to the allocation and assignment of
operator cells. The iterative nature of the EMUCS and ELF systems
allows little backtracking. If a misguided assignment was made early
in the design process, then that assignment is nearly irreversible.
This is due to the inherent short-sightedness of the systems.

The HAL system attempts to resolve this problem by doing the operator
allocation in a global expert analysis phase. The actual assignment of
operations to specific operator cells is done later in the data path
synthesis subtask.

Operator cell generation

The HAL system is designed to allow the use of an established standard
cell library, or a module builder. The system allows merging
operations into specialized ALU's but it can constrain the operations
that can be merged. Thus it can be told not to generate the ALU
performing the ADD, OR and MULTIPLY operations that is used in the
Facet example later.
The implementation approach differs in one major aspect: while most of
the other mentioned systems rely nearly exclusively on a single
programming paradigm, the HAL system makes use of a multi-paradigm
approach.

This decision stems from the fact that different tasks are best suited
to different programming styles. In turn, each style has an associated
computing efficiency that must be traded off against the ease of
representation.

High-level decisions such as the initial operator cell allocation or
register combining require knowledge expressed in the form of
heuristics. A rule-based production system is ideally suited for this
type of knowledge representation.

On the other hand, tasks such as the operator cell interconnection are
far better suited to purely algorithmic approaches, both for ease of
representation and computing efficiency. Functional programming
paradigms such as LISP [15] could be used effectively for this task.

Finally, the overall representation and classification of the graphs,
abstract operations and operator cells (e.g. ALU's, multiplexers,
registers, etc.) would benefit from rich data representation. An
object-based programming paradigm seems ideally suited to this
requirement.
5. HAL SYSTEM IMPLEMENTATION

As mentioned above, the implementation approach used here is based on
a multi-paradigm programming language. The software was written on a
XEROX LISP Machine using the LOOPS object-oriented, multi-paradigm
programming system [16]. This system features:

1. Object-based programming with non-hierarchical inheritance (i.e.
   multiple superclasses).
2. InterLISP - list processing and functional programming.
3. Rulesets - for creation of rule-based expert systems.
4. Active values - invocation of procedures when a variable is read or
   set (very useful for simulation purposes).
5. Composite objects - a way of defining templates for related objects
   that are instantiated as groups.
6. Interpreted or compiled execution of programs.

The overall structure of the system is object-based. For example, the
DFG, operator cells, registers, etc., are all represented as objects.
The rule sets and synthesis algorithms are described as methods of the
objects.

The main advantage of object-oriented frameworks is that they make
possible a direct and natural correspondence between the digital
system and its model [17]. The origins of this approach can be found
in artificial intelligence knowledge representation techniques,
especially semantic networks, and in programming languages such as
Simula [18] and Smalltalk [19]. The object representation scheme in
LOOPS is largely based on the latter.

Objects are organized into classes for the purpose of capturing common
characteristics. These classes in turn are ordered hierarchically. A
class placed at a lower hierarchical level inherits all the properties
of its superclass (unless these properties are redefined in the class
itself). Inheritance is useful here because it allows the programmer
to limit his specification to the ways in which the subclass differs
from its superclass(es).

The class hierarchy of the HAL system is illustrated in the class
inheritance lattice of Figure 1. The main subclasses of the general
element class are the abstract operation class and the operator cell
class.

Figure 1. Class hierarchy of objects.

Properties of the operation class: This class (labelled 'Operation' in
the lattice) describes all the objects that represent the operations
of the DFG. Operations are assigned to an object by creating an
"instance" of the corresponding operation class (e.g. Add, And, Or,
...). A major subdivision of the operation class is the logic and
arithmetic operation class (labelled 'MathOp'). As illustrated by the
class inheritance lattice, this class is further subdivided into one
or two operand and invertible or non-invertible operations. The latter
distinction will allow us to perform pairwise operand interchanges to
minimize the number of multiplexer inputs, as we shall see below.
Another subdivision of the operation class is the I/O class which
comprises input and output operations.

Properties of the operator cell class: This class describes all the
objects that represent the hardware components of the final data path.
The main subdivisions are the high-level operator cell class (labelled
'MathOpr'), the register class, the multiplexer class and the I/O port
classes. Instances of these classes are assigned to one or more
instances of the operation class (with the exception of multiplexers
that are created to direct operands to the proper operator cells, and
are not assigned to any particular operation instance).

Registers are assigned to one or more storage operations. They are
used to store intermediate results available at the output of operator
cells (when the operands are not stable).
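The class lattice described above can be approximated with ordinary multiple inheritance. The sketch below is written in Python rather than LOOPS, and is only an illustration: the class names follow the labels mentioned in the text where they are legible, while `TwoOperand`, `Invertible`, `NonInvertible`, the `cycle` and `cell` attributes and the `assign` method are assumptions introduced for the example.

```python
class Element:
    """Root of the lattice: anything in the DFG or in the data path."""


class Operation(Element):
    """Abstract operation, i.e. a node of the DFG."""

    def __init__(self, cycle=None):
        self.cycle = cycle  # control step, filled in by scheduling
        self.cell = None    # operator cell, filled in by assignment


class MathOp(Operation):
    """Logic and arithmetic operations."""


class TwoOperand(MathOp):
    """Operations taking two operands."""


class Invertible(MathOp):
    """Operands may be interchanged."""


class NonInvertible(MathOp):
    """Operand order is significant."""


# Multiple superclasses: each concrete operation only states how it differs.
class Add(TwoOperand, Invertible):
    pass


class Sub(TwoOperand, NonInvertible):
    pass


class OperatorCell(Element):
    """Hardware component of the final data path."""

    def __init__(self):
        self.operations = []

    def assign(self, op):
        # Many operations may be assigned to one cell.
        self.operations.append(op)
        op.cell = self


class MathOpr(OperatorCell):
    """High-level operator cell such as an adder or an ALU."""


alu = MathOpr()
a, s = Add(cycle=1), Sub(cycle=2)
alu.assign(a)
alu.assign(s)
print(isinstance(a, Invertible), isinstance(s, Invertible))
```

A later optimization step can then test `isinstance(op, Invertible)` to decide whether an operand interchange is allowed, without knowing the concrete operation class.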
6. HAL MODULE DESCRIPTIONS

6.1. INHAL MODULE

This module accepts as input a data flow graph representation of a
functional description of a circuit to be designed. This DFG indicates
the operations to be performed, their precedence constraints, as well
as conditional and loop constructs.

The module starts by determining the critical path(s) of the DFG. This
also determines the shortest possible execution time (using a
maximally parallel architecture). Finally, the operations on
non-critical paths are assigned to specific cycles of the critical
path, while attempting to minimize the subsequent operator cell cost.
This last step is a non-trivial one. The method used here is divided
into three phases:

Phase 1  Minimization of the concurrency of similar operations.

Phase 2  Estimation of required hardware resources (operator cells) by
         invoking the MidHAL module described in the next section.

Phase 3  Minimization of the concurrency of operations requiring
         similar resources.

The first phase involves the determination of both an ASAP (as soon as
possible) scheduling and an ALAP (as late as possible) scheduling.
Combining results for both schedules will determine the possible time
frames for each operation. A useful heuristic is to assume uniform
statistical distribution of the operations along these time frames.
The resulting distribution graph should indicate where concurrency of
similar operations can be minimized.

The system iteratively assigns operations to cycles along the critical
path using heuristics that attempt to minimize the concurrency of
similar operations while taking into account the cost associated with
their realization. For example, two concurrent multiplications imply a
higher realization cost than three additions, so they will be given a
higher priority in the distribution balancing process.

In the second phase the MidHAL module will be invoked to generate a
first estimate of the operator cell allocation using the scheduled DFG
derived in the first phase.

The third phase is similar to the first, except that the goal here is
to minimize the concurrency of operations using similar resources,
since those resources have now been estimated. For example, if an ALU
has been assigned to a pair of concurrent operations (e.g. an addition
and a subtraction), then one of the operations will be reassigned to a
control step where the ALU is free, whenever this is possible.

Sample scheduling

The functionality of the InHAL module (and later, of the entire
system) is demonstrated using the sample Pascal program given in
Figure 2. This program solves a second order differential equation.
The program might be used to describe a subsystem of a controller or
have a digital signal processing application.

    {Solve the differential equation: y'' + 5xy' + 3y = 0}
    program diffeq(input,output);
    var a,dx,x,u,y,x1,u1,y1:integer;
    begin
      {Read in the ordinate 'a' at which we want the
       value of the function, the step size 'dx', and
       boundary condition}
      read(a,dx,x,u,y);
      while x<a do
      begin
        x1 := x + dx;
        u1 := u - 5*u*x*dx - 3*y*dx;
        y1 := y + u*dx;
        x := x1; u := u1; y := y1;
      end;
      writeln(y);
    end.

Figure 2. Pascal program for iterative solution of second order
          differential equation.

The data flow graph derived from this description is shown in
Figure 3. It is to be noted that the redundant 'udx' multiplication
would typically be eliminated by an optimizing compiler. For this
example, we will use the graph exactly as it was defined by the
designer.

Figure 3. Data flow graph representation of the Pascal program of
          Figure 2: The raw DFG would not have operations scheduled in
          time (clock cycles). Here, ASAP scheduling is shown to allow
          comparison with the optimized scheduling shown below.

A detailed illustration of the DFG after the scheduling process is
shown in Figure 4.

In the next section, we will compare the operator cell allocation (the
task assigned to the MidHAL module) for the ASAP scheduling and the
optimized scheduling.
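The first InHAL phase (ASAP and ALAP schedules, time frames, and a distribution graph built under the uniform-distribution heuristic) can be sketched as follows. This is an illustrative Python reconstruction, not the HAL code: the decomposition of the loop body of Figure 2 into eleven unit-delay operations, the operation names `m1`..`c1`, and the grouping of the multiplications are assumptions, since the drawing of Figure 3 is not available here.

```python
from collections import defaultdict

# Assumed DFG for the loop body of Figure 2: op -> (kind, predecessors).
DFG = {
    "m1": ("mul", []),            # 5 * x
    "m2": ("mul", []),            # u * dx
    "m3": ("mul", ["m1", "m2"]),  # (5*x) * (u*dx)
    "m4": ("mul", []),            # 3 * y
    "m5": ("mul", ["m4"]),        # (3*y) * dx
    "s1": ("sub", ["m3"]),        # u - m3
    "s2": ("sub", ["s1", "m5"]),  # u1
    "m6": ("mul", []),            # u * dx (the redundant copy)
    "a1": ("add", ["m6"]),        # y1
    "a2": ("add", []),            # x1
    "c1": ("cmp", ["a2"]),        # x1 < a
}


def asap(dfg):
    """Earliest cycle of each operation, one cycle per operation."""
    start = {}

    def visit(op):
        if op not in start:
            start[op] = 1 + max((visit(p) for p in dfg[op][1]), default=0)
        return start[op]

    for op in dfg:
        visit(op)
    return start


def alap(dfg, n_cycles):
    """Latest cycle of each operation under a time constraint."""
    succs = defaultdict(list)
    for op, (_, preds) in dfg.items():
        for p in preds:
            succs[p].append(op)
    latest = {}

    def visit(op):
        if op not in latest:
            latest[op] = min((visit(s) for s in succs[op]),
                             default=n_cycles + 1) - 1
        return latest[op]

    for op in dfg:
        visit(op)
    return latest


def distribution(dfg, n_cycles):
    """Spread each operation uniformly over its time frame, per kind."""
    early, late = asap(dfg), alap(dfg, n_cycles)
    graph = defaultdict(lambda: [0.0] * n_cycles)
    for op, (kind, _) in dfg.items():
        width = late[op] - early[op] + 1
        for cycle in range(early[op], late[op] + 1):
            graph[kind][cycle - 1] += 1.0 / width
    return early, late, dict(graph)


early, late, graph = distribution(DFG, n_cycles=4)
asap_concurrency = [sum(1 for c in early.values() if c == k)
                    for k in range(1, 5)]
print(asap_concurrency)  # [5, 4, 1, 1]
print(graph["mul"])
```

Under these assumptions the ASAP concurrency matches the 5,4,1,1 figure quoted for the unoptimized schedule, and the multiplier distribution is heaviest in the first two cycles, which is where a balancing step would try to move operations with wide time frames (here `m4`, `m5` and `m6`) to later cycles.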
Figure 4. Scheduled DFG (time constraint = 4 cycles): Note that the
          '*' operations have been partitioned so that a maximum of
          two occur in a given cycle. Moreover, the overall
          concurrency of operations has been changed from 5,4,1,1 to
          3,3,3,2.

6.2. MIDHAL MODULE

This module accepts as input the scheduled DFG (the output of the
InHAL module), a time constraint and a library of available resources.
It then selects a set of locally optimum operator cells to execute the
operations described in the DFG. This is done while taking into
consideration a standard cell based component library, an associated
cost table and the timing constraint. The cost table specifies an area
cost function of the operator cell. This cost function takes into
account an estimate of the associated multiplexer, register and
interconnection costs.

The selection process uses a rule-based expert system. The knowledge
base consists of a set of heuristic selection rules. The rule set is
invoked in an iterative fashion. Each invocation allocates a specific
hardware component, based on the current state of the operation trace
(a simplified representation of the DFG that does not include
precedence constraints). Once a component is allocated to a set of
operations, the operation trace is reduced accordingly, and the rule
set is re-invoked. Two sample rules are illustrated below:

    IF
        Enough time with different operations in parallel.
    AND
        No of different operations that are subset of
        ALU functions is less or equal to
        (current no of cycles - minimum no of cycles).
    AND
        Cost of ALU less than total cost of
        single function cells.
    THEN
        Find best ALU to perform operations.
        Create an instance of ALU.
        Add to list of allocated operators.
        Remove from the operation trace the operations
        performed by ALU.
        Re-invoke rule set with modified operation trace.

Figure 5. Sample rule 1

    IF
        Not enough time even with different operations
        in parallel.
    THEN
        Find most parallel functions.
        Of these, find most used functions.
        Of these, find the function that is the cheapest
        to realize.
        Find single function operator cell to realize
        function.
        Create an instance of cell.
        Remove from the operation trace the operations
        performed by cell.
        Re-invoke rule set with modified operation trace.

Figure 6. Sample rule 2

This rule invocation process is repeated till all operations are
allocated to a set of operator cells. Note that the actual assignment
of specific operator cells to operations is not done here. It will be
done in the next module, while taking geometrical locality
considerations into account.

The table in Figure 7 illustrates the allocation of components for the
example of Figure 3 (using ASAP scheduling). It is interesting to note
the many variations of the set of allocated operators even for small
changes in the timing constraint (e.g. the 7 to 9 cycle area). A
system that started from a maximally parallel realization would not be
likely to find all these intermediate transformations. It is simpler
to determine them using a global analysis at the highest level.

[Table: number of components allocated (alu, comparator <,
subtractor -, adder +, multiplier *) for each number of cycles from
4 to 13]

Figure 7. Operator cell allocation for all time constraints: Example
          of Figure 3 (ASAP scheduling).

The cost curves in Figure 8 below illustrate the result of the
allocation process using the ASAP scheduling and the optimized
scheduling. The light solid line indicates the cost curve for the
original ASAP scheduling (this curve is derived from the table in
Figure 7). The heavy solid line represents the cost curve associated
with the optimized scheduling described above (refer back to
Figure 4). The small dotted section represents the result of a one
phase scheduling process and justifies the presence of the second and
third phases.
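The iterative "allocate a cell, reduce the operation trace, re-invoke" loop of the MidHAL module can be illustrated with a small Python sketch. It covers only the behaviour of sample rule 2 (single function cells) and leaves out the ALU merging of sample rule 1. The cost values, the trace encoding as `(cycle, kind)` pairs, and the balanced schedule used in the second call are all assumptions made for illustration; only its concurrency (at most two multiplications per cycle, 3,3,3,2 overall) is taken from the description of Figure 4.

```python
from collections import Counter

# Hypothetical relative area costs; the paper's cost table is not given.
COST = {"mul": 8, "add": 2, "sub": 2, "cmp": 1}


def allocate_single_function_cells(trace, cost=COST):
    """trace: list of (cycle, kind) pairs, a DFG without precedences."""
    trace = list(trace)
    allocated = []
    while trace:
        per_cycle = Counter(trace)                 # (cycle, kind) -> count
        used = Counter(kind for _, kind in trace)  # kind -> total uses
        peak = Counter()                           # kind -> max concurrency
        for (_, kind), n in per_cycle.items():
            peak[kind] = max(peak[kind], n)
        # Most parallel function, then most used, then cheapest.
        kind = min(used, key=lambda k: (-peak[k], -used[k], cost[k]))
        allocated.append(kind)
        # One cell performs at most one operation of its kind per cycle,
        # so remove one such operation from every cycle that has one.
        for cycle in {c for c, k in trace if k == kind}:
            trace.remove((cycle, kind))
    return Counter(allocated)


# ASAP schedule of the example (concurrency 5,4,1,1).
asap_trace = ([(1, "mul")] * 4 + [(1, "add")] + [(2, "mul")] * 2 +
              [(2, "add"), (2, "cmp"), (3, "sub"), (4, "sub")])
# An assumed balanced schedule with concurrency 3,3,3,2.
balanced_trace = ([(1, "mul")] * 2 + [(1, "add")] + [(2, "mul")] * 2 +
                  [(2, "cmp")] + [(3, "mul")] * 2 +
                  [(3, "sub"), (4, "sub"), (4, "add")])

asap_cells = allocate_single_function_cells(asap_trace)
balanced_cells = allocate_single_function_cells(balanced_trace)
print(asap_cells["mul"], balanced_cells["mul"])  # 4 2
```

With these inputs the ASAP trace needs four multipliers while the balanced trace needs two, which is the kind of saving the load balancing of the InHAL module is meant to expose to the allocator.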
[Plot: cost (cell area and mux & interconnect estimate) versus number
of cycles; curves for ASAP scheduling and optimized scheduling, after
1st phase and after 3rd phase]

Figure 8. Cost as a function of timing constraint

6.3. EXHAL MODULE

This module assigns the DFG operations to specific operator cells and
then maps the DFG onto a data path. The process is divided into seven
main steps.

1. Assignment of operator cells to operations
2. Creation and insertion of storage operations
3. Assignment of register cells to storage operations
4. Data path generation
5. Register combining
6. Storage elimination
7. Multiplexer operand interchange

Assignment of operator cells to operations: The first task is the
assignment of operations to specific hardware operator cells allocated
by the MidHAL module. The system attempts to shorten interconnect
wires by using the structure of the DFG (as well as the implicit
resulting data path structure) to guide the assignment process. This
also favors the reutilization of registers providing the operator with
its operands, thus allowing potential savings once again.

Creation and insertion of storage operations: The second step is the
determination and insertion of storage operations. Storage operations
are inserted between dependent but non-consecutive arithmetic
operations. These operations are like the arithmetic and logical
operations of the DFG in the sense that they are abstract objects that
do not necessarily correspond to a register cell on a one-to-one
basis.

Assignment of register cells to storage operations: The next step is
the creation and assignment of registers to the abstract storage
operations. Each register is assigned one or more storage operations.
To do this, the storage operations are combined into small groups of
disjoint elements (i.e. storage operations that occur in different
cycles). Clique partitioning [14] is used for this purpose. Each of
these groups is then assigned a register.

Data path synthesis: In the fourth step the resulting DFG is mapped to
a distributed structure data path. Multiplexers are created as needed
(to direct operands to operator cells) and interconnection is
performed. These multiplexers will be implemented simply as pass
transistors and decoding logic gates.

Register Combining: In this optimization step, compatible registers
are combined using geometrical and functional locality considerations
and local clique partitioning, the details of which will not be
discussed in the limited scope of this paper. This partitioning is in
contrast with the global clique partitioning used by the Facet system.
The effect of this approach will be illustrated in the examples of the
following sections.

Storage Elimination: This subtask attempts to eliminate storage of
operator outputs when their input operands are stable. This
elimination will always reduce the number of multiplexer inputs to the
assigned register, and in some cases will dispense with the register
altogether. It should be noted that this cannot be attempted any
earlier as the exact register assignment must be known.

Multiplexer operand interchange: Finally, the operands of commutative
operations (e.g. additions, multiplications) are selectively
interchanged to minimize the number of multiplexer inputs. This task
is non-trivial and requires an elaborate algorithm to generate good
results within reasonable computation limits. This extra computation
greatly reduces the overall number of multiplexer inputs.

Control information: During the entire synthesis process, all control
information relevant to each operator cell is stored directly in the
object representing it. This characteristic will render the
implementation of a high-level functional simulator extremely
straightforward.

7. EXPERIMENTAL RESULTS

Results for differential equation resolution example

Figure 9 illustrates the data path generated from the data flow graph
described earlier (Figure 4). The table, shown in Figure 10, gives a
summary of the results for all time constraints. These results include
the relative operator cell cost (based on cell area), the number of
registers needed, the total number of multiplexer inputs, the number
of connections between the operator cells and finally the number of
control lines required. The latter includes the register, multiplexer
and ALU control lines. Finally, in the last column the CPU execution
time is given for a XEROX 1109 Lisp machine with 10 Meg of virtual
memory (3.5 Meg active RAM).

Figure 9. Data path realized for example 1: time constraint = 4
          cycles.
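The effect of the multiplexer operand interchange step of Section 6.3 can be shown on a single operator cell. The paper does not describe its algorithm, so the Python sketch below substitutes a plain exhaustive search over all orientations, which is only practical for a handful of operations. The register names, the tuple encoding, and the convention that each input port costs one multiplexer input per distinct source are assumptions made for the example.

```python
from itertools import product


def best_orientation(ops):
    """ops: list of (left_source, right_source, commutative) for ONE cell.

    Returns (mux_inputs, swaps): the smallest total number of distinct
    sources over the two input ports, and which operations were swapped.
    """
    # Only commutative operations may have their operands interchanged.
    choices = [(False, True) if comm else (False,) for _, _, comm in ops]
    best = None
    for swaps in product(*choices):
        left, right = set(), set()
        for (a, b, _), swap in zip(ops, swaps):
            if swap:
                a, b = b, a
            left.add(a)
            right.add(b)
        cost = len(left) + len(right)
        if best is None or cost < best[0]:
            best = (cost, swaps)
    return best


# Three additions sharing one adder; sources are hypothetical registers.
adder_ops = [("R1", "R2", True), ("R2", "R1", True), ("R1", "R3", True)]
no_swap_cost = (len({a for a, _, _ in adder_ops}) +
                len({b for _, b, _ in adder_ops}))
result = best_orientation(adder_ops)
print(no_swap_cost, result)  # 5 (3, (False, True, False))
```

Swapping the operands of the second addition reduces the source count from five to three here. The search is exponential in the number of commutative operations per cell, which is consistent with the remark above that a more elaborate algorithm is needed to stay within reasonable computation limits.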
    No      Operator  No   No MUX  No     Ctrl   CPU
    Cycles  cost      Reg  Inputs  Conn.  Lines  (sec)
    ------  --------  ---  ------  -----  -----  -----
    4       34        6    13      32     15     140
    5-6     30        6    17      35     17     150
    >6      20        7    17      32     20     135

Figure 10. Summary of results for example 1: all time constraints.

Results for Facet example

To highlight the differences between the HAL system and the Facet
system, we will use the example given in [12]. The DFG derived from
this example is shown in Figure 11.

Figure 11. DFG for Facet example.

The data path synthesized by the Facet system is shown in Figure 12.
To allow the storage of input values into registers R1 and R2, two
extra multiplexers (in dotted lines) were added (these extra
multiplexers are also included in the data path generated from the HAL
system, for the same reason). The number of registers, mux inputs,
operator connections and estimated number of control lines are given
in the first line of Figure 14.

Figure 12. Data path generated by Facet.

Figure 13 illustrates the data path generated from the HAL system from
the same DFG and using the same ALU's. These figures highlight the
different approaches of register and multiplexer partitioning. The HAL
data path characteristics are given in the second line of Figure 14.

Figure 13. Data path generated by HAL (using Facet ALU's).

The last three lines of the table of Figure 14 summarize the
characteristics of the datapaths generated using the ALU's available
in the HAL system, for all time constraints.

    No      Operator  No   No MUX  No     Ctrl   CPU
    Cycles  cost      Reg  Inputs  Conn.  Lines  (sec)
    ------  --------  ---  ------  -----  -----  -----
    FACET ALUS (Line 1: FACET Datapath, Line 2: HAL DP)
    ...                            31     20     n/a
    4       n/a       5    13      26     16     100
    HAL ALUS (Lines 3-5: HAL Datapaths)

Figure 14. Summary of results for example 2: all time constraints.

It would not be realistic to compare the two systems on the basis of a
single example. The example serves only to illustrate the differences
in the two approaches. Nevertheless, the HAL system performs
comparably well in all cases. What is more important, especially from
the application specific IC design point of view, is that the system
allows the exploration of a larger design space, as more speed/area
tradeoffs may be considered.
8. SYSTEM LIMITATIONS

The system generates only purely synchronous data paths. These data
paths are synthesized from separate data flow graphs that are treated
independently. The data flow graphs cannot be nested (although inline
expansion could be used to achieve the same effect). Currently,
operations are assigned to a single clock cycle. Finally, while the
user may specify speed constraints explicitly, area constraints are
only specified implicitly (i.e. by relaxing the time constraint).

9. CONCLUSION

We have shown the innovations the HAL system brings to automatic data
path synthesis by comparing it with well-known systems. These
innovations are present in the implementation approach as well as in
the synthesis process itself.

The overall structure of the programming environment is consistently
object-based, while the methods associated with each object can be
expressed in one or another programming paradigm. This allows for a
conceptually uniform representation of the digital system throughout
the whole synthesis process, while enabling partitioning of the
overall task into several complementary subtasks suited to different
paradigms.

12. BIBLIOGRAPHY

1. D. Johannsen, "Bristle Blocks: A Silicon Compiler", Proceedings of
   the 16th Design Automation Conference, June 1979, pp. 310-313.
2. A.V. Goldberg, S.S. Hirschhorn, K.J. Lieberherr, "Approaches Toward
   Silicon Compilation", IEEE Circuits and Devices Magazine, May 1985,
   pp. 29-39.
3. A.C. Parker, "Automated Synthesis of Digital Systems", IEEE Design
   & Test, November 1984, pp. 75-81.
4. S.G. Shiva and P.F. Klon, "The VHSIC Hardware Description
   Language", VLSI Design, June 1985, pp. 86-106.
5. D.E. Thomas et al, "Automatic Data Path Synthesis", IEEE Computer,
   December 1983, pp. 59-70.
6. J. Allen, "Computer Architecture for Digital Signal Processing",
   Proceedings of the IEEE, May 1985, pp. 852-873.
7. G.A. Frank, D.L. Franke, W.F. Ingogly, "An Architecture Design and
   Assessment System", VLSI Design, August 1985, pp. 30-50.
8. E.F. Girczyc and J.P. Knight, "An ADA to Standard Cell Hardware
   Compiler Based on Graph Grammars and Scheduling", Proc. of the IEEE
   International Conference on Computer Design (ICCD), October 1984,
   pp.
The synthesis methodology presented features load balanc-
ing of the operations described by the DFG! expert allo-
cation of operator cells to fulfill explicit speed/area 9. T.Uehara! “A Knowledge-Based Logic Design System",
constraints, localized register combining, and optimized IEEE Design L Test, October 1985, pp. 27-34.
interchanging of multiplexer operands. The extended de-
sign space exploration enabled by this methodology is of 10. T.J. Kowalski, D.J. Geiger, W.H. Wolf, and W.
major importance in relation to the desired use of the Fichtner, "The VLSI Design Automation Assistant: From
system (i.e. application specific IC'sl, Algorithms to Silicon", IEEE Design & Test, August
1985, pp.33-43.
11. J.R. Southard, "MacPitts: An Approach to Silicon Com-
10. FUTURE WORK pilation", IEEE Computer, December 1983, ~~-74-82.
A planned extension is the incorporation of multi-cycle 12. C. Tseng, D.P. Siewiorek, "Facet: A Procedure for the
operations. This extension would be especially useful Automated Synthesis of Digital Systems", Proceedings
for multiplication and division operations which require of the 20th Design Automation Conference, 1983, PP.
much longer times than simpler combinatorial logic. AS 490-496.
buses are likely to be useful in many designs, they will
soon be included in the objects of the system. These 13. T.J. Kowalski and D.E. Thomas, "The VLSI Design Auto-
buses will be used in place of muxes whenever it is more mation Assistant: in IBM System/370 Design", IEEE
cost-efficient to do so. Design 6 Test, February 1985,~~. 60-69.
The next natural step in extending the system is to de- 14. E.H. Reingold. J.Nievergelt, N.Deo. "Combinatorial
fine simulation methods for each type of operator cell. Algorithms: Theory and Practice", Prentice-Hall,
This will be done to allow testing of the correctness of 1977.
the data paths generated. Object-based programming was
originally devised for simulation purposes, so this task 15. P. Winston and B. Horn, "LISP", Addison-Wesley. 1981.
will be relatively straightforward.
16. M. Stefik, D. Bobrow, S. Mittal, L. Conway, 'Know-
ledge Programming in LOOPS: Report on an EXperimental
Course”, The AI Magazine, Fall 1983, pp. 3-13.
11. ACKNOWLEDGI?J.IENTS
17. A. Borgida, S. Greenspan, J. Mylopoulos, "Knowledge
Representation as the Basis for Requirements SPecifi-
We would like to thank the Computer Aided Engineering and cations", IEEE Computer, April 1985, pp. 82-91.
Artificial Intelligence departments of Bell-Northern Re-
search as well as the Electronics Engineering department 18. G. Birtwistle et al, 'Simula Begin", Philadelphia:
of Carleton University for their active involvement in Auerbach, 1973.
this cooperative research project.
19. The Xerox Learning Group, "The Smalltalk- System",
This research was funded in part by grants from the Na- Byte Magazine. August 1981, pp. 36-48.
tural Sciences and Engineering Research Council, Canada
(NSERCC) and from Bell-Northern Research, Ottawa.
594

23rd Design Automation Conference
