Embedded Processors
[Main Project Report]
By
Arun C. Pullat
Y2326
Group No: 23
SL No: 08
Project Guide
Dr. Priya Chandran, Asst. Professor, Computer Engineering Department,
NITC
------------------------------------------------------------------------------------------------------------
Embedded processors are changing the world. The development of scenario-specific
processors, with an emphasis on highly specific functionality, fast development time and
high reliability, has opened up several new vistas for the future of microprocessor
development. With embedded processor development likely to become a highly active
field in the next few years, a processor simulation tool that validates new embedded
processor designs and gathers performance statistics is an absolute necessity. Tools that
automatically generate these simulators have tremendous applicability. This project is an
attempt at developing one such Automated Generator of Cycle Level Simulators for
Embedded Microprocessors.
The generator should conform to a number of requirements. It should be easy to use: after
all, the purpose of an automated generator is to spare the user from writing a hand-coded
simulator, so it must offer at least reasonable time gains. It should permit the user to
enter all kinds of processor details in a simple, yet highly flexible and powerful manner.
It should provide a number of modules that simulate commonly prevalent processor
functionality to simplify the entry procedure, and allow the user to create new modules
if necessary. Once the processor details are assimilated and the simulator generated,
functionality must be provided to validate the execution of input programs and to collect
performance statistics. Several options must also be provided that let the user customize
the manner of execution, for example, the extent of coupling of performance and
functionality. Most importantly, the generated simulator should be on par with
hand-coded equivalents as far as the execution time of sample input programs is concerned.
Contents
1 Introduction
1.1 Problem Specification
1.2 Motivation
1.3 Literature Survey
2 Design
2.1 Execution Strategy
2.2 Generator Strategy
2.2.1 Interpreted Approach
2.2.2 Compiled Approach
2.4 Granularity
3 Implementation
3.1 Project Execution Overview
3.2 Code Layout during Implementation
3.3 Implementation Specifics
4 Execution
6 Possible Enhancements
7 References
1 Introduction
1.2 Motivation
Generating a working hardware model of every new Computer Architecture development
or modification is an impossibility due to the sheer cost involved. This is the reason why
most research work in this field is heavily dependent upon simulations of these new ideas.
Again, generating hand coded simulations for each new design is highly time consuming
and needs a large amount of effort. In such a scenario, it is obvious that an automated
generator of simulated processors would be extremely useful. All this generator should
need is the processor details given to it in a specific easily readable format.
2 Design
There are several issues to be considered in the design of a system which automatically
generates a simulator for any target processor. These issues are all listed below in no
particular order of importance.
[Figure: blocks labelled PDL Specifications, Parser, Processor Simulator, Simulation Statistics, and Target Executable + Inputs]
Fig 2.1 Block Diagram for an Interpreted Approach to Automated Simulator Generation [1]
[Figure: blocks labelled PDL Specifications, Compiler, Execution Simulator, and Simulation Statistics]
Fig 2.2 Block Diagram for a Compiled Approach to Automated Simulator Generation [1]
[Figure: Architecture Description]
Fig 2.3 Block Diagram for the layout of a Generic Architecture Description
[Figure: Microarchitecture Design, with blocks labelled Registers, Memory and Pipeline Semantics]
Fig 2.4 Block Diagram for the layout of a Microarchitecture Design [1]
The three main components of the microarchitecture in the design for the
problem comprise Registers, Memory and the Pipeline Semantics.
2.3.1.1 Registers
2.3.1.2 Memory
[Figure: Instruction Set Design]
Fig 2.5 Block Diagram for the layout of an Instruction Set Specification
[Figure: Global Statistics Specification]
Fig 2.5 Block Diagram for the layout of the Global Statistics Specification
The statistics that were collected using the earlier phases need to be
displayed at certain intervals or upon the occurrence of certain
events during the course of the execution, or after the entire
execution procedure is completed. This behaviour is to be specified
using an ADL with the events or cycles explicitly defined.
2.4 Granularity
Another crucial design decision is the provision of multiple levels of granularity for the
architectural details. For instance, at very low granularities the simulation works at the
functional unit level. Above this comes cycle level granularity and so on, until we reach
instruction level granularity or even instruction word granularity. The major aspect to be
taken into consideration here is the trade-off between performance and accuracy. The
lower the granularity, the better the simulator accuracy and the worse its performance;
the situation reverses as the granularity increases up to the instruction level. In effect,
functionality primarily needs the instruction set to be specified and can be used to show
whether the application runs correctly on the target architecture, while performance
makes use of the microarchitectural specifics to determine the execution time in clock
cycles and similar measures. These two aspects are not complementary ([1], [4]): a greater
emphasis on one negatively affects the simulation of the other. The depth of simulation
of each is an important feature which can be decided by the user at runtime.
Finally, one of the desirable attributes of the simulator is that it should be able to perform
almost on par with hand coded simulations of the same processors. This can be done using
a number of design tweaks ([1], [2], [3], [4]). The comparison of performance and
exactness of functionality can be the deciding factor in this case.
3 Implementation
As a walkthrough of the implementation of the project, we shall first focus on the
design decisions adopted from the Design sections above and then proceed to
a detailed explanation of each phase of the project development.
[Figure: Input Assembly Language File]
In the implementation, the user specifies the processor details in the
specification document. In our project, the specification language used is
the eXtensible Markup Language, or XML for short. All the deterministic
content is provided in XML, while the non-deterministic content is
provided as a combination of XML and an RTL. In the implementation, the
RTL used is simple C++ code which is embedded directly into
the generated code without any intermediate processing.
The core generator language, that is, the language used to develop the
generator, is C++. The compiler used for both the hand-written generator
code and the automatically generated code is the GNU C++ compiler,
preferably version 3.3.1 or later. The multiple files in the implementation
are built and linked together using a Makefile with a number of design
tweaks. The parser used for the XML processor specification is the Xerces
C++ parser developed by the Apache group. Documentation regarding the
APIs for this parser is freely available at [7].
[Figure: Code Layout, with blocks labelled Build Files, Registers, Logical Register File, Addressing Modes, Input Parser, Conversion Functions, Automata Functions, Miscellaneous Files, File Handling, String Manipulation, Main Files, Makefile and Executable]
Some sample XML code for a physical register file is given below:
<physical_register_file>
<name>pint</name>
<count>32</count>
<type>integer</type>
<bitSize>32</bitSize>
</physical_register_file>
Some sample XML code for a logical register file is given below:
<logical_register_file>
<name>gpr</name>
<count>32</count>
<sign>false</sign>
<type>integer</type>
<bitSize>32</bitSize>
<mnemonics>zero at v0 v1 a0 a1 a2 a3 t0 t1
t2 t3 t4 t5 t6 t7 s0 s1 s2 s3 s4 s5 s6 s7
t8 t9 kt0 kt1 gp sp s8 ra</mnemonics>
<map>pint</map>
</logical_register_file>
The above XML code defines a logical register file named
gpr comprising 32 logical registers of type integer with a width
of 32 bits, in essence data type int . The critical entities here are
the mnemonics field, which specifies the names of the logical
registers from index 0 to count-1, and the map field, which
specifies the physical register file to which the logical register
file is mapped. In most system architectures, the mapping scheme is
based on register renaming, but since pipelining could not be
implemented in our project, the resulting lack of out of order
execution removes the need for register renaming.
We have made use of a simple one-to-one mapping scheme
between the component logical registers and their corresponding
physical registers.
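The one-to-one mapping described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the names PhysicalRegisterFile, LogicalRegisterFile, indexOf, read and write are hypothetical and are not taken from the project's generated code.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a <physical_register_file> entry.
struct PhysicalRegisterFile {
    std::string name;
    std::vector<std::int32_t> values;  // one slot per physical register
};

// Hypothetical stand-in for a <logical_register_file> entry. Logical register
// i is backed by physical register i (one-to-one mapping, no renaming).
class LogicalRegisterFile {
public:
    LogicalRegisterFile(const std::string& mnemonics, PhysicalRegisterFile& target)
        : target_(target) {
        // The <mnemonics> field is a whitespace-separated list of names.
        std::istringstream in(mnemonics);
        std::string mnemonic;
        while (in >> mnemonic) names_.push_back(mnemonic);
    }

    // Index of a mnemonic (0 .. count-1), or -1 if it is unknown.
    int indexOf(const std::string& mnemonic) const {
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == mnemonic) return static_cast<int>(i);
        return -1;
    }

    // Writes through to the mapped physical register; false on a bad name.
    bool write(const std::string& mnemonic, std::int32_t value) {
        const int i = indexOf(mnemonic);
        if (i < 0 || static_cast<std::size_t>(i) >= target_.values.size()) return false;
        target_.values[static_cast<std::size_t>(i)] = value;
        return true;
    }

    // Reads the mapped physical register into out; false on a bad name.
    bool read(const std::string& mnemonic, std::int32_t& out) const {
        const int i = indexOf(mnemonic);
        if (i < 0 || static_cast<std::size_t>(i) >= target_.values.size()) return false;
        out = target_.values[static_cast<std::size_t>(i)];
        return true;
    }

private:
    std::vector<std::string> names_;
    PhysicalRegisterFile& target_;
};
```

With this sketch, a write to the logical register v0 of gpr lands in slot 2 of pint, because v0 is the third name in the mnemonics list.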
3.3.1.5 Memory
<memory>
<name>instruction_memory</name>
<type>instruction</type>
<size>1000000</size>
</memory>
<memory>
<name>data_memory</name>
<type>data</type>
<size>3000000</size>
</memory>
<addressing_mode>
<name>register</name>
<regex>r</regex>
</addressing_mode>
<addressing_mode>
<name>immediate</name>
<regex>#n</regex>
</addressing_mode>
<addressing_mode>
<name>displacement</name>
<regex>n(r)</regex>
</addressing_mode>
Here, the attributes used are name and regex . The name
specifies the mechanism used to obtain operand values in our
implementation, since a few predefined addressing modes are
already available for use. For all other modes, an RTL needs to be
implemented, similar to the pipeline semantics design. The regex, or
regular expression, field gives the user a degree of flexibility
in specifying the operand syntax corresponding to the
predefined addressing modes. In most processors, this would be
sufficient, but this need not be the case for embedded processors.
The regular expression field makes extensive use of DFAs, both to
determine the operand type and to extract values.
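As a rough illustration of how an operand could be classified against the three patterns above, the following sketch translates each pattern into a std::regex. Note the assumptions: std::regex is swapped in here in place of the project's own DFAs, the reading of r as a register name and n as a signed decimal number is inferred from the samples, and the function names patternToRegex and classifyOperand are hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical translation of an operand pattern into a std::regex string:
// 'r' stands for a register name, 'n' for a signed decimal number, and any
// other character must appear literally in the operand.
std::string patternToRegex(const std::string& pattern) {
    const std::string special = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : pattern) {
        if (c == 'r') {
            out += "([A-Za-z$][A-Za-z0-9]*)";
        } else if (c == 'n') {
            out += "(-?[0-9]+)";
        } else {
            // Escape regex metacharacters so they match themselves.
            if (special.find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}

// Returns the name of the first addressing mode whose pattern matches the
// whole operand, or an empty string when none matches.
std::string classifyOperand(const std::string& operand) {
    static const std::vector<std::pair<std::string, std::string>> modes = {
        {"register", "r"}, {"immediate", "#n"}, {"displacement", "n(r)"}};
    for (const auto& mode : modes) {
        const std::regex re(patternToRegex(mode.second));
        if (std::regex_match(operand, re)) return mode.first;
    }
    return "";
}
```

Under these assumptions, an operand such as 8(sp) is classified as displacement and #42 as immediate. The capture groups in the translated pattern could also be used to extract the register name and the numeric value.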
<instruction>
<mnemonic>add</mnemonic>
<field>
<name>dest</name>
<type>int</type>
</field>
<field>
<name>source1</name>
<type>int</type>
</field>
<field>
<name>source2</name>
<type>int</type>
</field>
<rtl>
dest = addInt( source1, source2 );
</rtl>
</instruction>
class Instruction_add
{
public:
int fieldCount ;
int dest ;
int destIndex ;
int source1 ;
int source1Index ;
int source2 ;
int source2Index ;
char fieldValue[3][16] ;
Instruction_add();
int getField( char * instruction );
int getValue();
int setValue();
int semantics()
{
getValue();
setValue();
return 0;
}
};
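Since the RTL is C++ that is embedded directly into the generated code, the generation step for an instruction can be pictured as simple text assembly. The sketch below is hypothetical: the function name emitInstructionClass and the placement of the RTL between getValue() and setValue() are assumptions made for illustration, and the real generator also emits the field members shown above.

```cpp
#include <string>

// Hypothetical generator step: wraps the RTL text from an <rtl> element in the
// semantics() method of a class named after the mnemonic. The RTL is copied
// verbatim; where exactly it is placed is an assumption of this sketch.
std::string emitInstructionClass(const std::string& mnemonic, const std::string& rtl) {
    std::string code;
    code += "class Instruction_" + mnemonic + "\n{\npublic:\n";
    code += "    int semantics()\n    {\n";
    code += "        getValue();\n";        // load operand values
    code += "        " + rtl + "\n";        // user-supplied RTL, unprocessed
    code += "        setValue();\n";        // store results
    code += "        return 0;\n    }\n};\n";
    return code;
}
```

Calling emitInstructionClass("add", "dest = addInt( source1, source2 );") returns the text of a class named Instruction_add whose semantics() body contains the RTL line unchanged.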
The input file parser is the part of the instruction set functionality that
parses an input assembler file and obtains the instructions from it. For
an assembler file, the implementation simply extracts a single line from
the input program and instantiates an instruction class based on the
assembler mnemonic used. The mnemonic is obtained by tokenizing the
instruction and taking its first word.
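The dispatch on the first word of a line can be sketched as below. The base class Instruction, the class InstructionAdd and the functions extractMnemonic and instantiate are hypothetical names introduced for this example; the report's generated classes (such as Instruction_add) do not necessarily share a common base class.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical common base class for instruction objects.
struct Instruction {
    virtual ~Instruction() = default;
    virtual std::string mnemonic() const = 0;
};

struct InstructionAdd : Instruction {
    std::string mnemonic() const override { return "add"; }
};

// First whitespace-delimited word of an assembler line, or "" for a blank line.
std::string extractMnemonic(const std::string& line) {
    std::istringstream in(line);
    std::string word;
    in >> word;
    return word;
}

// Creates the instruction object registered for the line's mnemonic, or
// returns nullptr when the mnemonic is unknown.
std::unique_ptr<Instruction> instantiate(const std::string& line) {
    using Factory = std::function<std::unique_ptr<Instruction>()>;
    static const std::map<std::string, Factory> factories = {
        {"add", [] { return std::unique_ptr<Instruction>(new InstructionAdd()); }},
    };
    const auto it = factories.find(extractMnemonic(line));
    if (it == factories.end()) return nullptr;
    return it->second();
}
```

A table keyed by mnemonic keeps the parser independent of the instruction set: supporting a new instruction only means registering one more factory entry.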
This section consists of the input XML files used for test purposes as
well as the parser for the same. For the purposes of this project, I have
used the Xerces C++ parser by the Apache group. This software is free
and available for download. The APIs needed for parsing an
XML document are available as part of the Xerces package, and the
documentation for them can be found at the Xerces site [7]. A good
understanding of the XML DOM tree structure is a prerequisite.
The Xerces C++ XML Parser provides C++ functions and
APIs to access and manipulate an XML document. The API represents
the parsed file as a tree structure known as a DOM or
Document Object Model tree. One of the major requirements of an
XML file is the presence of a root tag, and it is this tag that forms
the root of the DOM tree.
The next step in the process is to parse the file. This is done
using the instruction:
parser -> parse( sourceFileName );
If there are any errors in the XML document, they are caught at this
point, and the corresponding error handler takes the
necessary action. Following this, the parser is used to obtain
the root node of the DOM tree.
The miscellaneous files contain code needed by all the other code
sections, yet do not belong to any of these sections exclusively.
The File Handling code deals with flushing the buffers that contain
the content to be put into the files in the Build directory. These can
be flushed in either an append mode or an overwrite mode. In our
implementation, the file handling code is of a very simple nature.
There exists only one major function which is used to write the
contents of a buffer onto a file whose file name is specified.
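A minimal sketch of such a buffer-flushing function, using only the standard library, is given here. The name flushBuffer and its signature are assumptions made for illustration and are not the project's actual function.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the contents of buffer to the named file.
// With append == true the buffer is added to the end of the file; otherwise
// any existing content is overwritten. Returns false if the write fails.
bool flushBuffer(const std::string& fileName, const std::string& buffer, bool append) {
    const std::ios_base::openmode mode =
        append ? (std::ios::out | std::ios::app) : (std::ios::out | std::ios::trunc);
    std::ofstream out(fileName, mode);
    if (!out) return false;  // the file could not be opened
    out << buffer;
    out.flush();
    return static_cast<bool>(out);
}
```

A generator built this way would typically call the function once in overwrite mode to start a file in the Build directory and then in append mode for each further section of generated code.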
list<Logical_Register_File *>logical_register_file_list ;
list<Logical_Register_File *>::iterator
logical_register_file_list_iterator ;
Both the list and its iterator are declared in the include.h files as
extern global variables, but defined within this file.
There are similar lists for Physical register files and for DFAs. The list
of DFAs is used to search for the addressing mode of a particular
instruction operand.
The main files section corresponds to files such as the root C++ file,
the global include file and the Makefile used for compilation of the
entire project [8]. The Makefile has been configured in a manner that
permits us to compile only the necessary sections: at times only the
Microarchitecture code, at other times only the instruction set code.
4 Execution
The steps involved in the execution of the code are very simple. Firstly, if the generator
has not been compiled yet, it can be compiled using the all target provided in
the Makefile.
# make all
This will result in the formation of a simulator executable in the parent directory. This
is the simulator generator, which is now used to generate the actual simulator for a
given architecture specification. Assuming the Processor Specifications are written in
the file ./XML/sample.xml , the simulator is generated using two parses. The first
parse generates the C++ code for the non-deterministic XML and RTL
specifications, while the second parse writes code that instantiates the
predefined classes such as Memory and the Logical Register Files.
# ./simulator ./XML/sample.xml 0
# ./simulator ./XML/sample.xml 1
The advantage of this two-step parsing process is that if the architectural
specifications are changed, only the first execution step is needed to update the simulator.
On the other hand, if some instructions of the input assembler program are changed, or if a
different program is used altogether, only the second step needs to be executed to update
the generated simulator code.
After the above steps are executed, the code corresponding to the simulator would have
been created in the Build/ directory as mentioned under the implementation. The
simulator here simply needs to be compiled by making use of a different Makefile which
is provided in the Build/ directory.
# cd Build
# make
# ./build 25 15
When the assembly program is one which takes two inputs, prints the values and also
prints the sum and stores it in a certain memory location, the output for the above
execution is:
The value is : 25
The value is : 15
The value is : 40
6 Possible Enhancements
As specified in the results of the project, there are several enhancements that can be made
to the implementation:
- Extended Bit Arithmetic Libraries to deal with cases where the size of integral
  registers exceeds 64 bits or that of floating point registers exceeds 96 bits.
- Validation Mechanism for the generated simulator, to enable the user to
  verify whether the execution of the simulator is the same as that of the original
  system.
- Provision for non C++ RTL usage in an XML input file, in effect, formation of a
  new compiler for the Register Transfer Language with a simple grammar yet
  powerful functionality.
- Provision for the usage of Addressing Modes beyond those already provided.
  These modes should be user defined, especially with regard to the semantics of an
  addressing mode.
7 References
[2] Prabhat Mishra, Nikil Dutt, Alex Nicolau. Functional Abstraction driven Design
Space Exploration of Heterogeneous Programmable Architectures, TR SM-
IMP/DIST/08, University of California, Irvine. March 2001.
[3] Mehrdad Reshadi, Prabhat Mishra, Nikil Dutt. Instruction Set Compiled
Simulation: A Technique for Fast and Flexible Instruction Set Simulation, University
of California, Irvine. DAC, Anaheim, June 2003.
http://www.cecs.uci.edu/~aces/Instruction%20Set%20Compiled%20Simulation.pdf
[4] Wei Qin. Modeling and Description of Embedded Processors for the Development
of Software Tools. PhD Thesis, Department of Electrical Engineering, Princeton
University. August 2004.