
Hardware Software Partitioning using Greedy Algorithm

Submitted to: Mr. Lava Bhargava, Associate Professor
Submitted by: Abhishek Goyal, 2009UEC302
Department of Electronics and Communication Engineering
Malaviya National Institute of Technology, Jaipur

CONTENTS
Introduction
Greedy Algorithm
Vulcan 2nd Algorithm
Real life Implementation : GSM
Conclusion

Introduction


Software running on an existing processor is less expensive, more easily modifiable, and more quickly designable than an equivalent application-specific hardware implementation. Hardware, however, provides better performance. A system designer's goal is therefore to implement a system using a minimal amount of application-specific hardware, if any at all, while still satisfying the required performance. One special feature of hardware/software partitioning is that it is a two-way partitioning, involving hardware and software. There are two key metrics: one (performance) is improved by moving objects into a specific group (hardware), while the other (hardware size) is improved by moving objects out of that same group.

Dealing with Defects


Software bugs are tolerable and less costly to fix. Hardware bugs, on the other hand:
Can cost hundreds of thousands in non-recurring engineering (NRE) cost
Can cause months of delay

Many start-ups went down for this reason:
Non-performing hardware
Repartitioning decisions made at the last minute

Greedy Algorithm
We are given a set of functions F = {f1, f2, ..., fn} which compose the functionality of the system under design. The functions may be at any of various levels of granularity, such as tasks (e.g. processes, procedures or code groupings) or arithmetic operations.

We are also given a set of performance constraints C = {C1, C2, ..., Cn}, where Ci = {Gi, Vi}. Vi is a constraint on the maximum execution time of all functions in group Gi.

The algorithm uses a procedure Move(P, fi), which returns a new partitioning P' obtained by moving fi to S (the software group) if it is currently in H (the hardware group), or to H if it is currently in S.
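
A greedy loop built on Move can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the original implementation: a partitioning is represented as the set of functions currently in hardware, and the cost function, the function names, and all numbers are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable

# Assumption: a partitioning P is stored as the set of functions in hardware (H);
# every other function is in software (S).
Partition = FrozenSet[str]


def move(p: Partition, f: str) -> Partition:
    """Move(P, fi): send fi to S if it is in H, or to H if it is in S."""
    return p - {f} if f in p else p | {f}


def greedy_partition(functions: Iterable[str], start: Partition,
                     cost: Callable[[Partition], float]) -> Partition:
    """Apply single moves for as long as each accepted move lowers the cost."""
    functions = list(functions)
    p = start
    improved = True
    while improved:
        improved = False
        for f in functions:
            candidate = move(p, f)
            if cost(candidate) < cost(p):
                p = candidate
                improved = True
    return p


# Hypothetical example data: hardware size and software execution time per function.
HW_SIZE = {"f1": 40, "f2": 10, "f3": 25}
SW_TIME = {"f1": 9, "f2": 1, "f3": 2}
TIME_LIMIT = 5  # hypothetical limit on total software execution time


def example_cost(p: Partition) -> float:
    """Hardware size plus a large penalty when the time limit is exceeded."""
    size = sum(HW_SIZE[f] for f in p)
    sw_time = sum(t for f, t in SW_TIME.items() if f not in p)
    return size + 1000 * max(0, sw_time - TIME_LIMIT)


result = greedy_partition(HW_SIZE, frozenset(HW_SIZE), example_cost)
print(sorted(result))
```

With these hypothetical numbers the loop moves f2 and f3 to software and keeps f1 in hardware, because moving f1 would break the time limit.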

Vulcan 2nd Algorithm


This algorithm is derived from the greedy algorithm, with an extension to ensure that performance constraints are met.

The algorithm uses a procedure, Successors(fi), which returns the set of objects that succeed fi in the internal model of the system's functionality. The procedure SatisfiesPerformance(P) returns true if partition P satisfies all performance constraints.

The algorithm starts by creating an all-hardware partitioning, thus guaranteeing performance. Moving a function requires not only a cost improvement but also that all performance constraints still be satisfied.
Once a function is moved, the algorithm tries to move closely related functions before trying others.
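
The behaviour described above can be sketched in Python as follows. This is an illustrative sketch, not the Vulcan code: the successor table, the cost function, the performance check, and all numbers are hypothetical, and a partitioning is again assumed to be the set of functions in hardware.

```python
from typing import Callable, Dict, FrozenSet, List

Partition = FrozenSet[str]  # assumption: the set of functions currently in hardware


def constrained_greedy(functions: List[str],
                       successors: Dict[str, List[str]],
                       cost: Callable[[Partition], float],
                       satisfies_performance: Callable[[Partition], bool]) -> Partition:
    """Start all-hardware; move a function to software only if cost drops
    and every performance constraint still holds, then try its successors."""
    p: Partition = frozenset(functions)  # all-hardware start guarantees performance

    def attempt_move(f: str) -> None:
        nonlocal p
        if f not in p:
            return  # already in software
        candidate = p - {f}
        if satisfies_performance(candidate) and cost(candidate) < cost(p):
            p = candidate
            # Try closely related functions before any others.
            for g in successors.get(f, []):
                attempt_move(g)

    changed = True
    while changed:
        before = p
        for f in functions:
            attempt_move(f)
        changed = p != before
    return p


# Hypothetical example: a chain a -> b -> c.
HW_SIZE = {"a": 10, "b": 20, "c": 30}
SW_TIME = {"a": 2, "b": 2, "c": 4}
SUCCESSORS = {"a": ["b"], "b": ["c"]}


def hw_cost(p: Partition) -> float:
    return sum(HW_SIZE[f] for f in p)


def fast_enough(p: Partition) -> bool:
    # Hypothetical constraint: total software execution time must not exceed 5.
    return sum(t for f, t in SW_TIME.items() if f not in p) <= 5


final = constrained_greedy(list(HW_SIZE), SUCCESSORS, hw_cost, fast_enough)
print(sorted(final))
```

In this example a and then its successor b move to software, while c stays in hardware because moving it would violate the hypothetical timing limit.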

Real life Implementation : GSM


Extracted from the thesis "Model based approach to Hardware/Software Partitioning of SOC Designs" by Pradeep Adhipathi.

Our intention is to find the cheapest possible solution that can satisfy all the design constraints. A heuristic based on process complexities can find good hardware/software partitions.

The idea is that a process that has a longer execution time and requires repeated execution at a fast rate is more complex than one with a low rate of activation and a shorter execution time.

A greedy algorithm can be used to find the maximum number of these less complex nodes that can fit into a given processor, i.e. the software part.
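
One way to picture this heuristic is the Python sketch below. It rests on assumptions that the slides do not state: complexity is taken to be the product of execution time and activation rate, the processor is given a single capacity budget, and the process names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def complexity(exec_time: float, activation_rate: float) -> float:
    # Assumption: longer execution and faster activation both raise complexity,
    # modelled here simply as their product.
    return exec_time * activation_rate


def fit_into_processor(processes: Dict[str, Tuple[float, float]],
                       capacity: float) -> Tuple[List[str], List[str]]:
    """Greedily place the least complex processes in software until the
    processor budget is used up; the rest are left for hardware."""
    ranked = sorted(processes, key=lambda name: complexity(*processes[name]))
    software: List[str] = []
    hardware: List[str] = []
    used = 0.0
    for name in ranked:
        load = complexity(*processes[name])
        if used + load <= capacity:
            software.append(name)
            used += load
        else:
            hardware.append(name)
    return software, hardware


# Hypothetical processes: name -> (execution time, activation rate).
PROCESSES = {
    "p1": (0.5, 1.0),
    "p2": (2.0, 1.0),
    "p3": (4.0, 2.0),
    "p4": (1.0, 1.0),
}

software, hardware = fit_into_processor(PROCESSES, capacity=4.0)
print(software, hardware)
```

Taking the least complex processes first is what maximises the number of nodes placed in software for a fixed budget.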

Step 1: Creation of a PmG

Process Model Graphs (PmG) are directed graphs made of nodes and arcs, where nodes represent processes and arcs are signals.

Additionally, the nodes of the PmG have ports to which the arcs are connected. The ports can be input only, output only or bi-directional.
Thus, a process model graph is an abstract representation of a system that might be described in detail using a system level modeling language like SystemC.
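
A PmG of this kind could be held in memory roughly as sketched below. The class names, field names, and the example processes are hypothetical choices made for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Direction(Enum):
    IN = "in"        # input-only port
    OUT = "out"      # output-only port
    INOUT = "inout"  # bi-directional port


@dataclass
class Process:
    """A node of the graph, with its named ports."""
    name: str
    ports: Dict[str, Direction] = field(default_factory=dict)


@dataclass(frozen=True)
class Signal:
    """An arc from one process port to another."""
    src: str
    src_port: str
    dst: str
    dst_port: str


@dataclass
class PmG:
    processes: Dict[str, Process] = field(default_factory=dict)
    signals: List[Signal] = field(default_factory=list)

    def add_process(self, name: str, **ports: Direction) -> None:
        self.processes[name] = Process(name, dict(ports))

    def connect(self, src: str, src_port: str, dst: str, dst_port: str) -> None:
        # A signal must leave through an output-capable port
        # and arrive at an input-capable port.
        if self.processes[src].ports[src_port] is Direction.IN:
            raise ValueError(f"{src}.{src_port} cannot drive a signal")
        if self.processes[dst].ports[dst_port] is Direction.OUT:
            raise ValueError(f"{dst}.{dst_port} cannot receive a signal")
        self.signals.append(Signal(src, src_port, dst, dst_port))

    def successors(self, name: str) -> List[str]:
        return [s.dst for s in self.signals if s.src == name]


# Hypothetical two-process graph.
g = PmG()
g.add_process("encoder", data_in=Direction.IN, data_out=Direction.OUT)
g.add_process("modulator", data_in=Direction.IN)
g.connect("encoder", "data_out", "modulator", "data_in")
print(g.successors("encoder"))
```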

Step 2: Annotated PmG

Certain properties of the processes and signals are valuable in making partitioning decisions. A PmG that stores these properties in its nodes and arcs is called an annotated PmG. The activation rate and the execution time delay are the major attributes used by the Partitioner. The activation rate is the rate at which a signal triggers a process. Since this is a synchronous system, a common clock triggers all the processes. Thus, the rate of activation has a fixed value of 300 MHz, as for a StarCore processor. It is affected by the following constraints:

I/O delay
Buffer Size
Bus Width
Power Requirement

Each module of the GSM system operates on a 20 ms block of voice data, sampled at 8 kHz.

At the output, the delay between the first block and the second cannot be more than 5.1 ms.
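
The two figures above translate into simple arithmetic, sketched here in Python. The block size follows directly from the stated numbers. The deadline check assumes that the stages run one after another so that their delays add up, and the stage delays shown are hypothetical.

```python
from typing import Iterable

SAMPLE_RATE_HZ = 8_000  # 8 kHz sampling, as stated above
BLOCK_MS = 20           # one block of voice data
MAX_DELAY_MS = 5.1      # limit on the delay between consecutive output blocks

# Number of voice samples in one 20 ms block.
samples_per_block = SAMPLE_RATE_HZ * BLOCK_MS // 1000


def meets_deadline(stage_delays_ms: Iterable[float]) -> bool:
    """Assumption: stages execute sequentially, so their delays are summed."""
    return sum(stage_delays_ms) <= MAX_DELAY_MS


print(samples_per_block)
print(meets_deadline([1.2, 2.5, 1.0]))  # hypothetical stage delays in ms
```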

Step 3: Textual Representation of the PmG

This section gives a very brief introduction to the file format used to represent a PmG.

[Figure: Code segment showing identifiers]

[Figure: Example of defining attributes for the PmG]

[Figure: Syntax for Signal and Port definition]

[Figure: I/O delay representation in a PmG]

Step 4: The Partitions

The PmG is annotated with the delays and activation rates, and the process complexities are calculated for the system.

This process follows the greedy algorithm explained earlier. That is, it starts with each node and executes the algorithm until it reaches the threshold.

From the partitions that satisfy the constraints, and using the complexity table (denoted as cost earlier), the algorithm creates the biggest partition for implementation in software.

From the complexities table, it is apparent that every process except the Viterbi decoder can fit into the StarCore processor. For the transmitter, all the processes are executed in software according to the output of the greedy algorithm. The sum of their execution times, as computed from the delay table, is 5.0043, which is within the 5.1 ms timing constraint. For the receiver, all processes except the Viterbi decoder are executed in software.

Conclusion
Greedy algorithms are among the simplest and fastest algorithms. However, both of the greedy algorithms suffer from the limitation that they are easily trapped in a local minimum. As a simple example, consider an initial partitioning that is performance satisfying, in which two heavily communicating functions f1 and f2 are initially in hardware. Suppose that moving either f1 or f2 alone to software results in performance violations, but moving both f1 and f2 results in a performance-satisfying partitioning. Neither of the above algorithms can find the latter solution, because doing so requires accepting an intermediate, seemingly negative move of a single function. More sophisticated algorithms are therefore required.
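
The f1/f2 example can be reproduced with a small Python sketch. The hardware sizes and the performance rule below are hypothetical stand-ins chosen to match the scenario; the exhaustive search is included only to show that a better partitioning exists, not as a proposed replacement algorithm.

```python
from itertools import combinations
from typing import FrozenSet

FUNCTIONS = ("f1", "f2")
HW_SIZE = {"f1": 10, "f2": 10}  # hypothetical hardware sizes


def satisfies_performance(hw: FrozenSet[str]) -> bool:
    # Hypothetical rule: f1 and f2 communicate heavily, so placing them on
    # different sides of the hardware/software boundary violates timing.
    return ("f1" in hw) == ("f2" in hw)


def cost(hw: FrozenSet[str]) -> int:
    return sum(HW_SIZE[f] for f in hw)


def single_move_greedy(start: FrozenSet[str]) -> FrozenSet[str]:
    """Accept a single-function move only if it keeps performance and lowers cost."""
    p = frozenset(start)
    improved = True
    while improved:
        improved = False
        for f in FUNCTIONS:
            candidate = p ^ {f}  # flip f between hardware and software
            if satisfies_performance(candidate) and cost(candidate) < cost(p):
                p, improved = candidate, True
    return p


def exhaustive_best() -> FrozenSet[str]:
    """Check every partitioning; feasible only for tiny examples."""
    subsets = [frozenset(c)
               for r in range(len(FUNCTIONS) + 1)
               for c in combinations(FUNCTIONS, r)]
    return min((s for s in subsets if satisfies_performance(s)), key=cost)


stuck = single_move_greedy(frozenset(FUNCTIONS))
best = exhaustive_best()
print(sorted(stuck), sorted(best))
```

The greedy search stays at the all-hardware partitioning, while the exhaustive search finds the all-software one, which needs both functions to move together.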

THANKS
