
System-Level Modeling of Dynamically Reconfigurable Co-Processors

Yang Qu1, Kari Tiensyrjä1, Kostas Masselos2


1 VTT Electronics, P.O. Box 1100 (Kaitoväylä 1), FIN-90571 Oulu, FINLAND
E-mail: yang.qu@vtt.fi
2 INTRACOM SA, Emerging Technologies and Markets Division, 19,5km Markopoulou
Avenue, P.O.Box 68, GR 19002 Peania, Attika, Greece

Abstract. Dynamically reconfigurable co-processors (DRCs) are interesting
design alternatives when both flexibility and performance are concerns.
However, it is difficult to study the performance impact of including such
devices in a design when using traditional design methods and tools. In this
paper, we present easily adaptable system-level techniques that allow fast
exploration of different reconfiguration alternatives. A SystemC-based modeling
method for DRCs and a high-level synthesis-based estimation tool to support
system partitioning are presented.

1 Introduction and Related Work

Technology developments have made it possible to re-program configurable
hardware at run time. Such a device is generally referred to as dynamically
reconfigurable logic (DRL). Unlike pure software or hardware implementations, DRLs
spread computation over both time and space. This feature requires various
changes to the traditional design flow. At the system level, the problems include
how to support HW/SW/DRL partitioning, how to evaluate different reconfiguration
alternatives, and how to model the DRLs with the aim of fast design space
exploration. As design moves to ever higher levels of abstraction, the design of a
Reconfigurable System-on-Chip (RSoC) requires an easily adaptable solution that
enhances traditional design methods and tools in order to reduce time-to-market.
The authors of [1] proposed a VHDL modeling technique for the reconfigurable process
that is simulatable and takes reconfiguration overhead into account, but the approach
is not suitable for design space exploration. In [2], a system-level model of a run-time
reconfigurable system was proposed. However, the reconfiguration overhead was not
addressed. In [3], the VCC tool was used to evaluate different design options of a
reconfigurable platform, but context scheduling was not addressed.
Our research focuses on high-level design methodology of reconfigurable systems,
where DRLs are used as co-processors. This paper presents a system-level modeling
technique of DRCs and the associated tools. The work is an extension of [4]. The
main advantage of the approach is that it can be easily embedded into a SoC design
flow to allow fast design space exploration for different reconfiguration alternatives
without going into implementation. The system-level model describes the behavior of
the reconfiguration process and relates the performance impact of the reconfiguration
process to a set of parameters extracted from reconfiguration technologies of interest.
Thus, by tuning the parameters, designers can easily evaluate the trade-offs between
different technologies. In simulation, the model can automatically detect the
reconfiguration requests and trigger the reconfiguration process. The modeling
methodology is supported by an estimation tool for the system partitioning and a
transformation tool for reuse of existing SystemC code.
The structure of the paper is as follows. Section 2 introduces the modeling
technique and supporting tools. The validation work using an MPEG2 decoder case is
described in section 3. Section 4 gives the conclusions.

2 Proposed System-Level Modeling Techniques

The important tasks in system-level design of an RSoC are to identify candidate
components and to reveal the reconfiguration overhead. The candidate components are
application functions that are considered to benefit from being implemented on
DRCs. The decision whether a task should be a candidate component is clearly
application dependent. The criterion is that the task should combine two features:
flexibility (which would exclude an ASIC implementation) and high computational
complexity (which would exclude a software implementation). Flexibility may be
needed either because the task will be upgraded in the future or because its
hardware resources are to be shared with other tasks that have non-overlapping
lifetimes, for global area optimization. The reconfiguration overhead is closely
related to the DRL technology and to the run-time behavior of the candidate components.
Our modeling technique focuses on three issues: selection of candidate
components, modeling of the reconfiguration overhead for fast design space
exploration, and design reuse. They are addressed separately in the following sections.

2.1 Estimation Approach to Support Identifying Candidate Components

We developed a high-level synthesis-based estimation tool [5], which produces
estimates of the execution time and the hardware resources required on embedded
FPGA-type DRCs, in order to support the selection of candidate components with the
aim of total area reduction. Traditional HW/SW partitioning methods are still needed
when making a full HW/SW/DRL partitioning.
The input is the C code of the tasks to be studied. A SUIF-based front-end preprocessor
is used to extract Control-Data Flow Graphs (CDFGs), on which well-known high-level
synthesis tasks are carried out to produce the estimates. As-soon-as-possible
(ASAP) and as-late-as-possible (ALAP) scheduling are used to determine the critical
paths, from which we estimate the execution time. A modified version of
Force-Directed Scheduling (FDS) is used to estimate the hardware resources required
for the tasks. Finally, allocation algorithms are used to estimate the hardware
resources required for interconnection, assuming multiplexer-type interconnection
units. The current estimator targets a Virtex2-like embedded FPGA in which the main
resources are LookUp-Tables (LUTs) and multipliers.
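The ASAP/ALAP step can be illustrated with a minimal Python sketch. It is not the estimator of [5]: the tiny graph, the operation names and the per-operation latencies below are hypothetical, and neither the SUIF front end nor the modified FDS is reproduced. It only shows how the two schedules expose the critical path, whose length serves as the execution-time estimate.

```python
from collections import defaultdict

def asap_alap(ops, deps):
    """ops: {name: latency in cycles}; deps: list of (producer, consumer)."""
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in deps:
        preds[dst].append(src)
        succs[src].append(dst)

    # Topological order (Kahn's algorithm); the list grows while we walk it.
    indeg = {op: len(preds[op]) for op in ops}
    order = [op for op in ops if indeg[op] == 0]
    for op in order:
        for nxt in succs[op]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                order.append(nxt)

    # ASAP: start as soon as every predecessor has finished.
    asap = {}
    for op in order:
        asap[op] = max((asap[p] + ops[p] for p in preds[op]), default=0)
    length = max(asap[op] + ops[op] for op in ops)

    # ALAP: start as late as possible without stretching the schedule.
    alap = {}
    for op in reversed(order):
        alap[op] = min((alap[s] for s in succs[op]), default=length) - ops[op]

    # Operations with zero mobility (ASAP == ALAP) lie on a critical path.
    critical = [op for op in order if asap[op] == alap[op]]
    return asap, alap, length, critical

# Hypothetical CDFG: two multiplications feed an addition, whose result is
# combined with an independent addition by a subtraction.
ops = {"m1": 2, "m2": 2, "a1": 1, "a2": 1, "s1": 1}
deps = [("m1", "a1"), ("m2", "a1"), ("a1", "s1"), ("a2", "s1")]
asap, alap, length, critical = asap_alap(ops, deps)
print(length, critical, alap["a2"] - asap["a2"])
```

In this example the schedule length is 4 cycles, and `a2` is the only operation with non-zero mobility, which is the kind of slack a force-directed scheduler can exploit to balance resource usage.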

2.2 Modeling of Reconfiguration Overhead

The modeling of reconfiguration overhead is divided into two steps. In the first step,
different technology-dependent features are mapped onto a set of parameters, which
are the size of configuration data, the clock speed of configuration and the extra
delays apart from loading of the configuration data. In the second step, a SystemC
module that models the behavior of run-time reconfiguration process is created and is
used in system-level simulation to reveal the reconfiguration overhead.
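As a first-order illustration of how the three parameters combine, the latency of one reconfiguration can be approximated as below. The symbols are introduced here for illustration only and do not appear in the original model: S_cfg is the size of the configuration data in bits, w the number of bits loaded per configuration clock cycle, f_cfg the configuration clock frequency, and T_extra the extra delays. The approximation assumes the configuration data is loaded at a constant rate; in the simulated model the latency results from the generated memory traffic, so bus contention may lengthen it.

```latex
T_{\mathrm{reconf}} \approx \frac{S_{\mathrm{cfg}}}{w \cdot f_{\mathrm{cfg}}} + T_{\mathrm{extra}}
```

For instance, with the figures used in the case study of section 3 (a 200k-byte bitstream, 8 bits per cycle, a 50 MHz configuration clock) and no extra delay, the expression gives 4 ms for one full-device configuration.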
A general SystemC model of RSoC is shown in Fig. 1. The left side is an overview
of the RSoC. The DRC is a single SystemC module, which implements the same bus
interfaces in the same way as other HW/SW modules. A configuration memory is
modeled, which could be an on-chip or off-chip memory that holds the configuration
data. The right side shows the internal structure of the DRC, which is in fact a
hierarchical SystemC module. Each candidate component (F1 to Fn) is an individual
SystemC module, which implements the top-level bus interfaces with separate system
address space, and is instantiated inside the DRC. Each candidate component has two
extra ports. One is a DONE signal port routed to the Configuration Scheduler (CS).
The port is used to notify the CS that this task can be safely swapped out. The
other is connected to a shared memory that saves the data to be preserved during
reconfiguration. The Input Splitter (IS) is an address decoder and it manages all
incoming Interface-Method-Calls (IMCs). The CS monitors the operation states of the
candidate components and controls the reconfiguration process.

Fig. 1. System-level Modeling of Reconfigurable SoC


The main idea of the modeling method is as follows. When the IS captures an
IMC to a candidate component, it holds the IMC and passes control to the CS,
which decides whether reconfiguration is needed. If so, the CS calls a reconfiguration
procedure that uses the parameters specified in step 1 to generate memory traffic and
associated delays to mimic the reconfiguration latency. When the CS is done, the IS
dispatches the IMC to the target module. If the module cannot be activated at the
moment, a request to reconfigure the target module is put into a FIFO queue and
the IMC returns with the value FALSE. When a module finishes its operation, it
sends a DONE signal to the CS, and the CS checks whether there is any waiting
request in the FIFO queue. If so, and if it is possible to activate the waiting
module, the CS calls the reconfiguration procedure. To limit the practical
implementation effort, pre-emption of a running module is not supported.
The modeling method is for non-blocking IMCs. Using blocking IMCs would require
the behavior of the system bus to be changed in order to avoid the bus being locked
while the called module is off the device.
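The interplay between the IS, the CS and the FIFO queue can be sketched in a few lines of plain Python. This is an illustration rather than the SystemC implementation: the class and method names are hypothetical, the eviction policy (swap out idle modules, oldest first) is an assumption, and reconfiguration is reduced to adding a fixed latency to a counter instead of generating memory traffic. The LUT counts are borrowed from the case study in section 3, and the per-module latencies are illustrative values.

```python
from collections import deque

class ConfigScheduler:
    """Toy model of the CS for a partially reconfigurable DRC (hypothetical API)."""

    def __init__(self, capacity, sizes, reconf_latency_ms):
        self.capacity = capacity                  # resources on the DRC (LUTs)
        self.sizes = sizes                        # module -> resources needed
        self.reconf_latency_ms = reconf_latency_ms
        self.loaded = {}                          # module -> "idle" | "running"
        self.pending = deque()                    # FIFO of reconfiguration requests
        self.overhead_ms = 0.0                    # accumulated reconfiguration latency

    def _free(self):
        return self.capacity - sum(self.sizes[m] for m in self.loaded)

    def _load(self, module):
        """Reconfigure `module` onto the DRC if room can be found."""
        if module in self.loaded:
            return True
        idle = [m for m, state in self.loaded.items() if state == "idle"]
        if self._free() + sum(self.sizes[m] for m in idle) < self.sizes[module]:
            return False                          # running modules are never pre-empted
        while self._free() < self.sizes[module]:
            del self.loaded[idle.pop(0)]          # swap out idle modules, oldest first
        self.loaded[module] = "idle"
        self.overhead_ms += self.reconf_latency_ms[module]
        return True

    def call(self, module):
        """An IMC captured by the IS: True if dispatched, False if it must be retried."""
        if self._load(module):
            self.loaded[module] = "running"
            return True
        if module not in self.pending:
            self.pending.append(module)           # remember the request
        return False

    def done(self, module):
        """DONE signal: the module may be swapped out; serve the FIFO if possible."""
        self.loaded[module] = "idle"
        if self.pending and self._load(self.pending[0]):
            self.pending.popleft()

cs = ConfigScheduler(3200, {"IDCT": 2983, "CC": 2688}, {"IDCT": 3.73, "CC": 3.36})
first = cs.call("IDCT")    # device is empty: reconfigure and dispatch
second = cs.call("CC")     # IDCT is running and cannot be pre-empted: queued
cs.done("IDCT")            # DONE triggers the queued reconfiguration
third = cs.call("CC")      # the retried non-blocking IMC is now dispatched
print(first, second, third, list(cs.loaded))
```

The second call returning `False` corresponds to the non-blocking IMC returning FALSE in the text: the caller is expected to retry rather than hold the bus.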
There is a state diagram common to all candidate components, based on which the
CS makes reconfiguration decisions. A state diagram of partial reconfiguration is
presented in Fig. 2. For single context and multi-context DRCs, similar state diagrams
can be used in the model. The main advantage of the modeling method is that the rest
of the system and the candidate components need not be changed between a static
approach and run-time reconfiguration approaches, which makes the method well
suited to fast design space exploration.

Fig. 2. Reconfiguration state diagram

2.3 Transformation Tool to Support Reuse of Existing SystemC Modules

We developed a tool that automatically transforms SystemC modules, provided that
they follow a defined modeling pattern, into a SystemC module of a DRC.
The inputs are the SystemC files of a static architecture and a script file, which gives
the names of the modules that are selected as candidate components and the associated
design parameters. The outputs are the SystemC files of a modified architecture, in
which the specified SystemC modules have been replaced with a DRC module. The kernel
of the tool contains a C++ parser to analyze the SystemC files, a script file parser and
a template module of the DRC. There are two specific requirements for the input
modules. Firstly, modules must implement the bus interface methods with defined
names. Otherwise the transformer would not know their meaning.
Secondly, a DONE signal port with a specified name must exist in each candidate
module so that the CS can capture its status.

3 Case Study

An MPEG2 decoder case is chosen to demonstrate that the approach is useful for
fast design space exploration. The starting point is a SystemC transaction-level model
of a static architecture of the decoding system. Control-oriented tasks, such as
variable-length decoding, are assigned to a RISC processor. Motion compensation is
assigned to a DSP core. The color converter (CC), which processes 8 pixels in
parallel, and the IDCT are assigned to two separate hardwired ASICs. A shared
memory and a one-level system bus are used. The task is to study the possibility of
moving the IDCT and the CC from ASIC implementation to a DRC.
The DRC is a Virtex2-like FPGA. Partial, single-context and multi-context
reconfiguration are considered. The features of the target DRC are as follows. There
are 3200 LUTs and 40 multipliers available. The size of the bitstream to configure the
full device is 200k bytes. In partial reconfiguration, the size of the configuration data
is proportional to the number of LUTs required. In multi-context reconfiguration, there
are two layers of programming bits and 5 clock cycles are required for context
switching. The configuration clock runs at 50 MHz, and 8 bits are loaded every cycle.
We started by estimating the amount of configuration data required in
partial reconfiguration. The estimation tool showed that 2983 LUTs and 2688 LUTs were
required for the IDCT and the CC respectively, which correspond to 186k and 168k
bytes of configuration data. Three simulation packages were created using the modeling
method described in section 2.2, and the simulation results are given in Table 1. The
differences between the three configuration styles are clearly revealed. Designers can
make design decisions once the ASIC area of the two functions and estimates of the
design time are available.
The case study shows that the approach helps designers to rapidly
perform design space exploration. The estimation tool can produce results within
minutes without any manual effort. In the SystemC modeling, the transformation tool
can significantly reduce the amount of coding work. Designers need to edit only the
script file of the design parameters, which can be done within a minute.

Table 1. Comparison of reconfiguration latencies

                        Original   Single   Multi   Partial
Decoding time (ms/fr)   15.35      26.69    18.69   25.78
Conf. latency (ms/fr)   NA         8.00     2e-4    7.09
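The configuration latencies in Table 1 can be cross-checked from the DRC parameters stated in this section. The short Python calculation below is a sanity check under stated assumptions, not part of the simulation: it assumes that "k" means 1000, that the IDCT and the CC are each configured once per frame, that data is loaded at a constant rate with no extra delay, and that the context switch is clocked at the configuration clock. All variable names are introduced here for illustration.

```python
CFG_CLOCK_HZ = 50e6        # configuration clock
BYTES_PER_CYCLE = 1        # 8 bits are loaded every cycle
FULL_BITSTREAM = 200_000   # bytes for the full device (assuming k = 1000)
TOTAL_LUTS = 3200
LUTS = {"IDCT": 2983, "CC": 2688}
SWITCH_CYCLES = 5          # context switch in the multi-context case

def load_ms(n_bytes):
    """Time in ms to load n_bytes of configuration data at a constant rate."""
    return n_bytes / BYTES_PER_CYCLE / CFG_CLOCK_HZ * 1e3

# Single context: the full bitstream is loaded for each of the two functions.
single = len(LUTS) * load_ms(FULL_BITSTREAM)

# Partial: configuration data is proportional to the LUTs each function needs.
partial_bytes = {name: FULL_BITSTREAM * n / TOTAL_LUTS for name, n in LUTS.items()}
partial = sum(load_ms(b) for b in partial_bytes.values())

# Multi-context: both contexts are resident, only the switch cycles are paid.
multi = len(LUTS) * SWITCH_CYCLES / CFG_CLOCK_HZ * 1e3

print(f"single={single:.2f} ms  partial={partial:.2f} ms  multi={multi:.0e} ms")
```

Under these assumptions the three values agree with the "Conf. latency" row of Table 1, and the per-function data sizes agree with the 186k and 168k bytes quoted in the text. The increase in decoding time over the original architecture is larger than the configuration latency in each case and cannot be derived this way, which is where the simulation model is needed.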

4 Conclusions

In this paper, we have presented a system-level modeling methodology for DRCs. The
use of DRCs creates a flexible system and results in a shorter time-to-market
compared with an equivalent ASIC-type SoC implementation. It is very important to
have an approach that allows designers, in the early phase of design, to rapidly explore
the differences between reconfiguration alternatives. Our easy-to-use
approach has been shown with an MPEG2 case to fulfill this task.

References

1. Robinson, D., Lysaght, P.: Methods of exploiting simulation technology for simulating the
timing of dynamically reconfigurable logic. IEE Proc. Vol. 147, No. 3. (2000) 175-180
2. Rissa, T., et al.: System-level modeling and implementation technique for run-time
reconfigurable systems. Proc. 10th Annual IEEE Symposium on FCCM. (2002) 295-296
3. Vanzago, L., et al.: Design space exploration for a wireless protocol on a reconfigurable
platform. Proc. DATE. (2003) 662-667
4. Pelkonen, A., et al.: System-Level Modeling of Dynamically Reconfigurable Hardware with
SystemC. Proc. IPDPS'03. (2003) 174-181
5. Qu, Y., Soininen, J.-P.: Estimating the utilization of embedded FPGA co-processor. Proc.
Euromicro Symposium on DSD. (2003) 214-221
