A novel approach to automatic data path synthesis is presented.
This approach features innovations in the synthesis process as well
as in the system implementation.

The synthesis process exhibits three new features. The first relates
to a subtask that performs an expert analysis of the input data flow
graph and attempts to evenly distribute operations requiring similar
resources. This is done using a novel "load balancing" technique. The
second consists of a global preselection of operator cells to fulfill
an explicit speed constraint. Finally, the third deals with new
techniques for register and multiplexer optimization. These features
support extended design space search by taking an explicit
performance specification into account.

The system implementation is based on the LOOPS multi-paradigm
programming system. In this approach the overall task can be
partitioned into complementary subtasks requiring different
programming paradigms. These subtasks will be realized using an
object-based paradigm, a knowledge-based expert system paradigm, a
functional paradigm, or combinations of all three.

Two complete examples are given to demonstrate the functionality of
the system and to allow comparison with existing systems.

1. INTRODUCTION

The ability of IC foundries to manufacture increasingly complex chips
is leading the IC designer's involvement away from device-level
considerations and towards architectural ones. A side-effect of this
shift is the integration of system designers into IC design teams.

One of the most important tools that have contributed to this
continuing trend is the so-called silicon compiler [1]. The input
language of a silicon compiler can be divided into two major
categories: structural and functional [2].

A structural input language specifies a hierarchical description of
subcircuits and their interconnections.

Conversely, a functional input language describes a circuit by
specifying its input/output mapping, without any explicit structure
specification. According to [2], an ideal silicon compiler
environment would combine both approaches.

The system proposed here attempts to do this by allowing the user to
partition the overall structure into high-level blocks and
interfaces. It then uses a functional description of each block to
automatically generate a distinct data path. Extended design space
exploration is made possible by taking an explicit performance
specification into account. We will present a novel approach to the
synthesis task and will compare it with the approaches of other well
known systems. We then give a short description of the implementation
approach, followed by a more detailed description of the synthesis
approach. Finally, we present some experimental results.

The task of automatically designing a data path from a functional
description is usually divided into many subtasks [3]. The first is
the definition of the function of the circuit in a high-level
hardware description language (HDL), e.g. VHDL [4]. This is usually
followed by a translation to a graph-based representation derived
from the data flow. These graphs are given different names in
different synthesis systems (e.g. VT - Value Trace [5], DDG - Data
Dependency Graph [6], DAG - Directed Acyclic Graph [7], CDFG -
Control and Data Flow Graph [8]) but are simply different adaptations
of similar basic concepts. We will use the generic term 'Data Flow
Graph' (DFG) for simplicity.

The next step is the allocation of operator cells (i.e. hardware
components) to realize the abstract operations (nodes) of the DFG.
The DFG is partitioned into discrete control steps and registers are
allocated to realize the values (edges) of the DFG. We must stress
the difference between 'allocation', which is simply the creation of
a set of operator cells to execute the specified operations within
the scheduling constraint, and 'assignment', which is the actual
mapping of those operations to specific operator cells. This is a
many-to-one assignment: many operations can be assigned to a single
operator cell.

The data path is then synthesized using the allocated operator cells,
registers and some extra data transfer components (e.g. multiplexers,
buses). The final realization of this data path is done either by
binding the high-level components to pre-designed templates that
represent their gate-level structure or by synthesizing them directly
with module builders. Interconnection of these structures is also
done at this stage.

The final step is the synthesis of the controller (often a
microprogrammed or PLA based controller design) using the information
derived during the control step partitioning and data path creation.
Interconnection of the controller with the data path is also done
here.

We can thus summarize the synthesis task:

1. Translation of HDL description to a DFG
2. Operator cell allocation
3. Control step partitioning
4. Register allocation
5. Data path synthesis
6. Binding of data path operator cells
7. Synthesis of control section
8. Interconnection of components

The HAL modules (the acronym is derived from Hardware ALlocator, not
to mention the obvious AI reference) deal with the second to fifth
subtasks.
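
The distinction drawn above between allocation (creating a set of operator cells) and assignment (mapping each operation to one specific cell) can be illustrated with a minimal Python sketch. The operation names, the control steps and the two helper functions `allocate` and `assign` are hypothetical and chosen only for illustration; they are not the HAL procedures.

```python
from collections import Counter

# Hypothetical DFG fragment: operation name -> (operation type, control step).
operations = {
    "add1": ("add", 1),
    "mul1": ("mul", 1),
    "add2": ("add", 2),
    "mul2": ("mul", 2),
    "add3": ("add", 3),
}

def allocate(ops):
    """Allocation: create enough cells of each type for the busiest control step."""
    per_step = Counter((kind, step) for kind, step in ops.values())
    needed = Counter()
    for (kind, _step), count in per_step.items():
        needed[kind] = max(needed[kind], count)
    return [f"{kind}_cell{i}" for kind in sorted(needed) for i in range(needed[kind])]

def assign(ops, cells):
    """Assignment: map each operation to a specific cell that is free in its step."""
    busy = set()  # (cell, step) pairs that are already taken
    mapping = {}
    for name, (kind, step) in sorted(ops.items()):
        for cell in cells:
            if cell.startswith(kind + "_") and (cell, step) not in busy:
                busy.add((cell, step))
                mapping[name] = cell
                break
        else:
            raise ValueError(f"no free {kind} cell in step {step}")
    return mapping

cells = allocate(operations)
mapping = assign(operations, cells)
print(cells)
print(mapping)
```

In this sketch three additions share one adder cell, which is the many-to-one relation mentioned in the text.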
There are just about as many different approaches to automatic data
path generation as there are systems that attempt this task. They
can, however, be divided into two large subclasses: the
knowledge-based heuristic approach [9, 10], and the algorithmic
approach [5, 8, 12]. While both produce acceptable results, it is
widely agreed that the former has a higher potential in the long term
as it allows the capture of more of the design process [5]. Many of
the recent data path generators follow this approach.

We will examine three well-known systems from Carnegie-Mellon
University and one from Carleton University. These systems are
closely related to the one presented here. They are the Design
Automation Assistant (DAA) [5, 10, 13], the EMUCS data path generator
[5], the Facet system [12], and the ELF system from Carleton
University [8].

The Design Automation Assistant (DAA) first assigns basic storage
elements to hardware modules and ports. Then the synthesis operation
assigns minimum delay information to develop a parallel design. Next
it maps the unassigned storage elements to register modules. Finally,
each data flow operation is assigned to a hardware module. The system
then optimizes the realization during the expert analysis phase that
follows. This analysis is based on a speed-area cost function. As we
mentioned earlier, the approach used here is of the heuristic type,
using a knowledge-based expert system, or KBES. This approach has the
advantage of allowing the use of a large amount of design expertise
to reduce the search space.

The EMUCS system is based on an algorithmic approach. It maps (binds)
the abstract operations to hardware operator cells in an iterative
fashion. It proceeds by repeatedly analyzing the intermediate data
path and then binding one element at a time until all elements are
bound. This binding is based on a minimization of the incremental
cost associated with the cell being considered. This cost is a
composite of the incremental cost of the current iteration and also
of the costs incurred in the next few iterations.

The Facet system also makes use of an algorithmic approach. The
system uses clique partitioning [14] to group and assign abstract
operations to operator cells, and storage operations to registers.
The data path is then directly generated from these elements without
subsequent optimization (as opposed to the DAA system).

Finally, the ELF system proceeds by assigning an urgency weight to
each operation (node) of the control and data flow graph (CDFG). The
'knowledge' that the allocator uses to translate the DFG is
represented as graph grammar production rules. The allocator enters a
loop which is performed for each time slot. A greedy algorithm with
pairwise interchange is used to select which rule to apply for the
realization of each operation.

4. SYSTEM COMPARISON

The system presented here is similar to these in some respects but
also embodies many new concepts in the data path synthesis process as
well as in the implementation approach. Furthermore, while the
systems developed at Carnegie-Mellon University address the general
purpose IC design task (with a strong emphasis on computer designs),
the ELF and HAL systems deal with application specific IC (ASIC)
designs. This is due primarily to the intended use of the synthesized
circuits in telecommunication applications.

The data path synthesis process described here differs from the
above-mentioned systems in at least four fundamental ways.

High-level exploration of the design space

The first and most important difference is in the high-level
exploration of the design space.

The HAL system takes an explicit timing constraint and applies it at
the beginning of the synthesis process. High-level decisions such as
control step partitioning and operator cell allocation are affected
directly by this constraint. This allows the designer to explore a
full spectrum of implementations, from maximally parallel ones to
fully serial ones. The examples in the following sections illustrate
the variations of the resulting data path structure due to this
methodology.

The three systems from CMU place very strong emphasis on the
synthesis of maximally parallel data paths. This is in keeping with
the general purpose IC (GPIC) or computer design orientation their
creators have given them. The DAA system also optimizes the
realization taking limited speed-area constraints into account. Even
in this case, major changes in the data path cannot be done easily
without destroying the initial structure. For example, if an ALU was
allocated originally, the data path structure derived using this
allocation is optimized for this case only. Replacing the ALU by
single function operator cells to increase speed would render the
resulting structure poorly adapted to the new constraint. This should
be done very early in the design process, when the system still has a
global view of the structure and its interrelation with the
speed-area constraints.

Operation Scheduling

A second fundamental difference in the synthesis process presented
here is in the operation scheduling (control step partitioning). For
example, while the Facet system uses an "as soon as possible" (ASAP)
scheduling technique, the HAL system relies on a "load balancing"
analysis. This analysis attempts to evenly distribute operations that
make use of similar resources.

Allocation and assignment of operator cells

The third difference relates to the allocation and assignment of
operator cells. The iterative nature of the EMUCS and ELF systems
allows little backtracking. If a misguided assignment was made early
in the design process, then that assignment is nearly irreversible.
This is due to the inherent short-sightedness of the systems.

The HAL system attempts to resolve this problem by doing the operator
allocation in a global expert analysis phase. The actual assignment
of operations to specific operator cells is done later in the data
path synthesis subtask.

Operator cell generation

The HAL system is designed to allow the use of an established
standard cell library, or a module builder. The system allows merging
operations into specialized ALUs but it can constrain the operations
that can be merged. Thus it can be told not to generate the ALU
performing the ADD, OR and MULTIPLY operations that is used in the
Facet example later.
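
The contrast between ASAP scheduling and a load-balanced schedule, as described in this comparison, can be sketched in Python. The four-node graph, the helper names `asap`, `is_valid` and `peak_usage`, and the hand-picked balanced schedule are all assumptions made for illustration; the sketch does not reproduce the HAL load balancing analysis, it only shows why spreading similar operations lowers the number of cells required.

```python
from collections import Counter

# Hypothetical DFG: node -> (operation type, list of predecessor nodes).
dfg = {
    "m1": ("mul", []),
    "m2": ("mul", []),
    "a1": ("add", ["m1"]),
    "a2": ("add", ["a1", "m2"]),
}

def asap(graph):
    """Place every operation in the earliest control step its inputs allow."""
    step = {}
    def visit(node):
        if node not in step:
            preds = graph[node][1]
            step[node] = 1 + max((visit(p) for p in preds), default=0)
        return step[node]
    for node in graph:
        visit(node)
    return step

def is_valid(graph, schedule):
    """Every operation must be scheduled after all of its predecessors."""
    return all(schedule[p] < schedule[n] for n in graph for p in graph[n][1])

def peak_usage(graph, schedule):
    """Largest number of same-type operations sharing one control step."""
    usage = Counter((graph[n][0], s) for n, s in schedule.items())
    peak = Counter()
    for (kind, _step), count in usage.items():
        peak[kind] = max(peak[kind], count)
    return dict(peak)

asap_schedule = asap(dfg)
# Hand-picked alternative: delay m2 by one step; a2 still runs in step 3.
balanced_schedule = dict(asap_schedule, m2=2)

print(peak_usage(dfg, asap_schedule))
print(peak_usage(dfg, balanced_schedule))
```

Both schedules finish in three control steps, but the ASAP schedule places two multiplications in the first step while the balanced one never needs more than one multiplier.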
The implementation approach differs in one major aspect:
while most of the other mentioned systems rely nearly ex-
clusively on a single programming paradigm, the HAL sys-
tem makes use of a multi-paradigm approach.
This decision stems from the fact that different tasks
are best suited to different programming styles. In turn,
each style has an associated computing efficiency that
6. HAL MODULE DESCRIPTIONS

6.1. INHAL MODULE

This module accepts as input a data flow graph representation of a
functional description of a circuit to be designed. This DFG
indicates the operations to be performed, their precedence
constraints, as well as conditional and loop constructs.

The module starts by determining the critical path(s) of the DFG.
This also determines the shortest possible execution time (using a
maximally parallel architecture). Finally, the operations on
non-critical paths are assigned to specific cycles of the critical
path, while attempting to minimize the subsequent operator cell cost.
This last step is a non-trivial one. The method used here is divided
into three phases:

Phase 1  Minimization of the concurrency of similar operations.

Phase 2  Estimation of required hardware resources (operator cells)
         by invoking the MidHAL module described in the next section.

program given in Figure 2. This program solves a second order
differential equation. The program might be used to describe a
subsystem of a controller or have a digital signal processing
application.

  { Solve the differential equation: y'' + 5xy' + 3y = 0 }
  program diffeq(input, output);
  var a, dx, x, u, y, x1, u1, y1 : integer;
  begin
    { Read in the ordinate 'a' at which we want the value of the
      function, the step size 'dx', and boundary condition }
    read(a, dx, x, u, y);
    while x < a do
    begin
      x1 := x + dx;
      u1 := u - 5*u*x*dx - 3*y*dx;
      y1 := y + u*dx;
      x := x1; u := u1; y := y1;
    end;
    writeln(y);
  end.
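
The critical path determination described for the InHAL module can be sketched as follows for the loop body of the diffeq program. The decomposition into the nodes `t1` to `t6`, the reuse of the product u*dx, and the helper names `asap_steps` and `alap_steps` are assumptions made for this example; the paper does not give the exact graph here.

```python
# Assumed decomposition of the loop body (node -> predecessor nodes):
#   t1 = u*dx   t2 = 5*x    t3 = 3*y    t4 = t1*t2   t5 = t3*dx
#   t6 = u - t4 u1 = t6 - t5  y1 = y + t1  x1 = x + dx  c = x1 < a
preds = {
    "t1": [], "t2": [], "t3": [],
    "t4": ["t1", "t2"], "t5": ["t3"],
    "t6": ["t4"], "u1": ["t6", "t5"],
    "y1": ["t1"], "x1": [], "c": ["x1"],
}

def asap_steps(preds):
    """Earliest control step of each node (longest path from the inputs)."""
    step = {}
    def visit(n):
        if n not in step:
            step[n] = 1 + max((visit(p) for p in preds[n]), default=0)
        return step[n]
    for n in preds:
        visit(n)
    return step

def alap_steps(preds, length):
    """Latest control step of each node for a schedule of the given length."""
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    step = {}
    def visit(n):
        if n not in step:
            step[n] = min((visit(s) for s in succs[n]), default=length + 1) - 1
        return step[n]
    for n in preds:
        visit(n)
    return step

asap = asap_steps(preds)
critical_length = max(asap.values())        # shortest possible execution time
alap = alap_steps(preds, critical_length)
frames = {n: (asap[n], alap[n]) for n in preds}
critical = sorted(n for n in preds if asap[n] == alap[n])
print(critical_length, critical)
print(frames)
```

Nodes whose earliest and latest steps coincide lie on the critical path; the remaining nodes have a time frame of more than one step, which is the freedom the later phases exploit when assigning non-critical operations to cycles.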
time frames. The resulting distribution graph should indicate where
concurrency of similar operations can be minimized.

The third phase is similar to the first, except that here the goal is
to minimize the concurrency of operations that use similar resources,
since those resources have been estimated. For example, if an ALU has
been estimated for a pair of concurrent operations (e.g. an addition
and a subtraction), then one of the operations is reassigned to a
control step where the ALU is free, whenever this is possible.
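
The reassignment idea of the third phase can be sketched in Python under strong simplifying assumptions: a single shared ALU, operations that are independent of each other, and time frames that are given rather than computed. The function `relieve_conflicts` and the operation names are hypothetical; a fuller treatment would also update the frames of neighbouring operations after each move.

```python
def relieve_conflicts(schedule, frames, capacity=1):
    """Move operations out of over-subscribed control steps when their
    time frame (first, last) contains a step where the ALU is still free."""
    schedule = dict(schedule)  # work on a copy

    def load(step):
        return sum(1 for s in schedule.values() if s == step)

    for op in sorted(schedule):
        current = schedule[op]
        if load(current) <= capacity:
            continue  # no conflict in this step
        first, last = frames[op]
        for step in range(first, last + 1):
            if step != current and load(step) < capacity:
                schedule[op] = step  # reassign to a step where the ALU is free
                break
    return schedule

# An addition and a subtraction compete for the ALU in control step 1.
schedule = {"add1": 1, "sub1": 1, "add2": 3}
frames = {"add1": (1, 1), "sub1": (1, 2), "add2": (3, 3)}
balanced = relieve_conflicts(schedule, frames)
print(balanced)
```

Here `add1` cannot move because its frame is a single step, so `sub1` is shifted to step 2, where the ALU is idle.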
locality considerations and local clique partitioning, the details of
which will not be discussed in the limited scope of this paper. This
partitioning is in contrast with the global clique partitioning used
in the Facet system. The effect of this approach will be illustrated
in the examples of the following sections.

[Figure: Cost (cell area and mux & interconnect estimate); plot not recovered]
          Operator   No.    No. MUX   No.     Ctrl    CPU
          cost       Reg    Inputs    Conn.   Lines   (sec)
----------------------------------------------------------
4         34         6      13        32      15      140
5-6       30         6      17        35      17      150
>6        20         7      17        32      20      135
8. SYSTEM LIMITATIONS

The system generates only purely synchronous data paths. These data
paths are synthesized from separate data flow graphs that are treated
independently. The data flow graphs cannot be nested (although inline
expansion could be used to achieve the same effect). Currently,
operations are assigned to a single clock cycle. Finally, while the
user may specify speed constraints explicitly, area constraints are
only specified implicitly (i.e. by relaxing the time constraint).

9. CONCLUSION

We have shown the innovations the HAL system brings to automatic data
path synthesis by comparing it with well-known systems. These
innovations are present in the implementation approach as well as in
the synthesis process itself.

The overall structure of the programming environment is consistently
object-based, while the methods associated with each object can be
expressed in one or another programming paradigm. This allows for a
conceptually uniform representation of the digital system throughout
the whole synthesis process, while enabling partitioning of the
overall task into several complementary subtasks suited to different
paradigms.

The synthesis methodology presented features load balancing of the
operations described by the DFG, expert allocation of operator cells
to fulfill explicit speed/area constraints, localized register
combining, and optimized interchanging of multiplexer operands. The
extended design space exploration enabled by this methodology is of
major importance in relation to the desired use of the system (i.e.
application specific ICs).

10. FUTURE WORK

A planned extension is the incorporation of multi-cycle operations.
This extension would be especially useful for multiplication and
division operations which require much longer times than simpler
combinatorial logic. As buses are likely to be useful in many
designs, they will soon be included in the objects of the system.
These buses will be used in place of muxes whenever it is more
cost-efficient to do so.

The next natural step in extending the system is to define simulation
methods for each type of operator cell. This will be done to allow
testing of the correctness of the data paths generated. Object-based
programming was originally devised for simulation purposes, so this
task will be relatively straightforward.

11. ACKNOWLEDGEMENTS

We would like to thank the Computer Aided Engineering and Artificial
Intelligence departments of Bell-Northern Research as well as the
Electronics Engineering department of Carleton University for their
active involvement in this cooperative research project.

This research was funded in part by grants from the Natural Sciences
and Engineering Research Council, Canada (NSERCC) and from
Bell-Northern Research, Ottawa.

12. BIBLIOGRAPHY

1. D. Johannsen, "Bristle Blocks: A Silicon Compiler", Proceedings of
   the 16th Design Automation Conference, June 1979, pp. 310-313.

2. A.V. Goldberg, S.S. Hirschhorn, K.J. Lieberherr, "Approaches
   Toward Silicon Compilation", IEEE Circuits and Devices Magazine,
   May 1985, pp. 29-39.

3. A.C. Parker, "Automated Synthesis of Digital Systems", IEEE Design
   & Test, November 1984, pp. 75-81.

4. S.G. Shiva and P.F. Klon, "The VHSIC Hardware Description
   Language", VLSI Design, June 1985, pp. 86-106.

5. D.E. Thomas et al, "Automatic Data Path Synthesis", IEEE Computer,
   December 1983, pp. 59-70.

6. J. Allen, "Computer Architecture for Digital Signal Processing",
   Proceedings of the IEEE, May 1985, pp. 852-873.

7. G.A. Frank, D.L. Franke, W.F. Ingogly, "An Architecture Design and
   Assessment System", VLSI Design, August 1985, pp. 30-50.

8. E.F. Girczyc and J.P. Knight, "An ADA to Standard Cell Hardware
   Compiler Based on Graph Grammars and Scheduling", Proc. of the
   IEEE International Conference on Computer Design (ICCD), October
   1984, pp.

9. T. Uehara, "A Knowledge-Based Logic Design System", IEEE Design &
   Test, October 1985, pp. 27-34.

10. T.J. Kowalski, D.J. Geiger, W.H. Wolf, and W. Fichtner, "The VLSI
    Design Automation Assistant: From Algorithms to Silicon", IEEE
    Design & Test, August 1985, pp. 33-43.

11. J.R. Southard, "MacPitts: An Approach to Silicon Compilation",
    IEEE Computer, December 1983, pp. 74-82.

12. C. Tseng, D.P. Siewiorek, "Facet: A Procedure for the Automated
    Synthesis of Digital Systems", Proceedings of the 20th Design
    Automation Conference, 1983, pp. 490-496.

13. T.J. Kowalski and D.E. Thomas, "The VLSI Design Automation
    Assistant: An IBM System/370 Design", IEEE Design & Test,
    February 1985, pp. 60-69.

14. E.M. Reingold, J. Nievergelt, N. Deo, "Combinatorial Algorithms:
    Theory and Practice", Prentice-Hall, 1977.

15. P. Winston and B. Horn, "LISP", Addison-Wesley, 1981.

16. M. Stefik, D. Bobrow, S. Mittal, L. Conway, "Knowledge
    Programming in LOOPS: Report on an Experimental Course", The AI
    Magazine, Fall 1983, pp. 3-13.

17. A. Borgida, S. Greenspan, J. Mylopoulos, "Knowledge
    Representation as the Basis for Requirements Specifications",
    IEEE Computer, April 1985, pp. 82-91.

18. G. Birtwistle et al, "Simula Begin", Philadelphia: Auerbach,
    1973.

19. The Xerox Learning Group, "The Smalltalk-80 System", Byte
    Magazine, August 1981, pp. 36-48.