
Automated Generation of Cycle Level Simulators for

Embedded Processors
[Main Project Report]

Submitted in partial fulfillment of the requirements for the award of the


Degree
of
Bachelor of Technology
of
The National Institute of Technology, Calicut

By
Arun C. Pullat
Y2326
Group No: 23
SL No: 08

Project Guide
Dr. Priya Chandran, Asst. Professor, Computer Engineering Department,
NITC

NATIONAL INSTITUTE OF TECHNOLOGY, CALICUT


Kerala 673601
April 2006
National Institute of Technology, Calicut
Department of Computer Engineering

Certified that this Main Project Report Entitled

Automated Generation of Cycle Level


Simulators for Embedded Processors

is a bonafide report of the work done by

Arun C. Pullat Y2326

Under the guidance of


Dr. Priya Chandran
Assistant Professor
Dept. of Computer Engineering

in partial fulfillment of the


Bachelor of Technology Degree

------------------------------------------------------------------------------------------------------------

Dr. Priya Chandran Dr. M. P. Sebastian


Assistant Professor Professor and Head
Dept. of Computer Engineering Dept. of Computer Engineering
Abstract

Embedded processors are changing the world. The development of scenario specific
processors, with an emphasis on highly specific functionality, fast development time and
high reliability, has opened up several new vistas for the future of microprocessor
development. With embedded processor development poised to become one of the most
active fields in the next few years, an absolute necessity in processor development is
a processor simulation tool to validate new embedded processor designs and
obtain performance statistics. Tools that automatically generate these simulators therefore have
tremendous applicability. This project is an attempt at developing one such Automated
Generator of Cycle Level Simulators for Embedded Microprocessors.

The generator should conform to a number of requirements. It should be easy to use; after
all, the purpose of an automated generator is to spare the user from writing his own
hand coded simulator by providing at least reasonable time gains. It should permit
the user to enter all kinds of processor details in a simple, yet highly flexible and powerful
manner. It should provide him with a number of modules that simulate commonly
prevalent processor functionality to simplify the entry procedure, and allow him to generate
new modules of his own, if necessary. Once the processor details are assimilated and the
simulator generated, functionality must be provided to validate the execution of input
programs as well as to collect performance statistics. Several options must also be
provided to permit the user to customize the manner of execution, for example,
the extent of coupling of performance and functionality. But most importantly, the
simulator generated should be on par with hand coded equivalents as far as the execution
time of sample input programs is concerned.
Contents

1 Introduction
  1.1 Problem Specification
  1.2 Motivation
  1.3 Literature Survey

2 Design
  2.1 Execution Strategy
  2.2 Generator Strategy
    2.2.1 Interpreted Approach
    2.2.2 Compiled Approach
  2.3 Architecture Description Language
    2.3.1 Microarchitecture Design
      2.3.1.1 Registers
      2.3.1.2 Memory
      2.3.1.3 Pipeline Semantics
    2.3.2 Instruction Set Specification
      2.3.2.1 Input Program Type
      2.3.2.2 Addressing Modes
      2.3.2.3 Semantic Functions
      2.3.2.4 Pipeline Behaviour
      2.3.2.5 Input File Parser
    2.3.3 Global Statistics Definition
      2.3.3.1 Statistics Definition
      2.3.3.2 Statistics Collection and Updation
      2.3.3.3 Statistics Display
  2.4 Granularity
  2.5 Other Design Issues
    2.5.1 Generation Modes
    2.5.2 Execution Modes

3 Implementation
  3.1 Project Execution Overview
  3.2 Code Layout during Implementation
  3.3 Implementation Specifics
    3.3.1 Microarchitecture Implementation
      3.3.1.1 Physical Registers
      3.3.1.2 Physical Register Files
      3.3.1.3 Logical Registers
      3.3.1.4 Logical Register Files
      3.3.1.5 Memory
    3.3.2 Instruction Set Implementation
      3.3.2.1 Addressing Modes
      3.3.2.2 Instruction Class Generation
      3.3.2.3 Semantic Functions
      3.3.2.4 Input File Parser
    3.3.3 XML Related Files
      3.3.3.1 Xerces C++ XML Parser
      3.3.3.2 XML Input File
    3.3.4 Miscellaneous Files
      3.3.4.1 Type Conversion Functionality
      3.3.4.2 Automata Functionality
      3.3.4.3 File Handling
      3.3.4.4 String and Instruction Manipulation
    3.3.5 Data Files
    3.3.6 Main Files
    3.3.7 Build Files

4 Execution

5 Results of the Project

6 Possible Enhancements

7 References

1 Introduction

1.1 Problem Specification


The main purpose of this project is to develop software that can generate a simulator
for any embedded processor, taking the specifications of that processor as its input.
The generated simulator should subsequently be able to execute any application written for the
original processor, supplied either in assembly language or in machine language.
The simulator should have a feature to determine both functional correctness and
performance, the exact extent of which is specifiable. The simulations should closely
match their hand coded equivalents in speed. The method of application execution on the
simulated machine can be either interpreted or compiled.

1.2 Motivation
Generating a working hardware model of every new computer architecture development
or modification is an impossibility due to the sheer cost involved. This is the reason why
most research work in this field is heavily dependent upon simulations of these new ideas.
Again, generating hand coded simulators for each new design is highly time consuming
and needs a large amount of effort. In such a scenario, it is obvious that an automated
generator of processor simulators would be extremely useful. All this generator should
need is the processor details, given to it in a specific, easily readable format.

1.3 Literature Survey


The first phase of the literature survey involved an introduction to the various
approaches used in processor description. This mostly involved an analysis of AUGUST
[1], the results of which are used extensively in the design of the project. The next phases
included analysis of alternative design techniques such as functional abstraction based
specification techniques [2], which deal largely with the RTL (Register Transfer
Language) angle of processor design; embedded processor modeling, including a two layer
structure for processor modeling [4]; and binary decoder synthesis. The options for
Instruction Set Design Specification, especially with respect to the use of XML [1] and
specially customizable versions of XML such as PDXML [5], were studied. The
interpreted and compiled modes for the generation of the simulator ([1], [3]) were
compared to determine their suitability for the project requirements. The final phase, pertaining
to implementation, where the focus shifted from theoretical design and concept to
practical issues, dealt with extensive references to [1], [6], [7] and [8].

2 Design
There are several issues to be considered in the design of a system which automatically
generates a simulator for any target processor. These issues are all listed below in no
particular order of importance.

2.1 Execution Strategy


The design strategy followed during implementation is the generation of an
execution driven simulator. Given that the host is the machine on which the simulation runs
and the target is the machine whose execution is simulated, an execution driven
approach is one in which the target machine is executed on the host machine, taking an
assembly program for the target machine as its input [1]. Functionality is implemented
making use of the instruction set of the target, and performance is also to be evaluated.
Alternative strategies such as trace driven execution [1] are also viable possibilities.

2.2 Generator Strategy


The simulator produced by the generator can execute a sample input program of the
target machine on the host machine by making use of one of two different strategies. The
strategy to be used, as specified in [1], is either interpretation or compilation.

2.2.1 Interpreted Approach


In an interpreted approach, the target machine architecture is taken as the
input to generate a simulated processor. The processor, at run time, parses
the input target machine program on an instruction by instruction basis. In
effect, the inputs in this strategy include the target machine program and its
corresponding input values. The benefit of this strategy is that there is a
clear tag on each instruction when its turn for execution comes up; it is also
easier to understand and implement. Its drawback is a low simulation speed,
since each instruction is parsed only at run time.

[Figure: blocks for PDL Specifications, Processor Parser, Simulator and Simulation Statistics, with the Target Executable and its Inputs feeding the Simulator.]

Fig 2.1 Block Diagram for an Interpreted Approach to Automated Simulator Generation [1]

2.2.2 Compiled Approach


In a compiled approach, the target machine executable program is taken
into consideration at compile time itself and not at run time. In this
strategy, new source code corresponding to the target machine program is
generated and compiled. At run time, only the inputs for this program need
to be provided. In effect, the compile time inputs include the processor
specifications first, followed by the target machine program, while the
run time inputs include only the program's run time inputs. The benefit of
this approach is a simulation speed an order of magnitude better than that
of the interpreted approach. The only drawback is that the approach is more
difficult to visualize and implement in a simple manner.

[Figure: blocks for PDL Specifications, Execution Compiler, Simulator and Simulation Statistics, with the Target Program Inputs feeding the Simulator.]

Fig 2.2 Block Diagram for a Compiled Approach to Automated Simulator Generation [1]

2.3 Architecture Description Language


The next phase of the design of the project entails deciding on an Architecture Description
Language (ADL) ([4], [5]). This language should enable the user to provide all possible
details about the processor in a simple, machine readable manner. In other words, it
should be both simple and powerful. The primary components of the machine description
under an ADL include the Microarchitecture description, Instruction Set Specification and
the Global Statistics definition.

[Figure: the Architecture Description splits into Microarchitecture Design, Instruction Set Design and Statistics Collection Specification.]

Fig 2.3 Block Diagram for the layout of a Generic Architecture Description

2.3.1 Microarchitecture Design

In Computer Architecture, microarchitecture refers to the design and layout
of a microprocessor, microcontroller, or digital signal processor, whose
considerations generally include the overall block design. Examples of this
include the presence of registers and register files, the number of execution
units, the type of execution units (such as floating point, integer, branch
prediction, SIMD), the nature of the pipelining (which might include such
stages as instruction fetch, decode, assign, execution and completion in a very
simple pipeline), cache and memory design (level 1, level 2 interfaces), and
peripheral support. Hardware developers nowadays attach increasing importance
to design for manufacturability, design for low power consumption, and design
for very high performance.

Examples of microarchitecture "schools of thought" include CISC
(complex instruction set computing), RISC (reduced instruction set
computing), VLIW (very long instruction word), and variants thereof.
Increasingly, the design of the compiler for high level language
programmability is viewed as part of the microarchitecture design decision
locus, but this will not be an issue for consideration in our present problem. [6]

[Figure: the Microarchitecture Design splits into Registers, Memory and Pipeline Semantics.]

Fig 2.4 Block Diagram for the layout of a Microarchitecture Design [1]

The three main components of the microarchitecture in the design for the
problem comprise Registers, Memory and the Pipeline Semantics.

2.3.1.1 Registers

In Computer Architecture, a processor register is a small amount
of very fast computer memory used to speed the execution of
computer programs by providing quick access to commonly used
values, typically the values being calculated at a given point in
time. Most, but not all, modern computer architectures operate on
the principle of moving data from main memory into registers,
operating on them, then moving the result back into main
memory, a so-called load-store architecture.

Processor registers are the top of the memory hierarchy, and
provide the fastest way for the system to access data. The term is
often used to refer only to the group of registers that can be directly
indexed for input or output of an instruction, as defined by the
instruction set. More properly, these are called the "architectural
registers". For instance, the x86 instruction set defines a set of eight
32-bit registers, but a CPU that implements the x86 instruction set
will contain many more registers than just these eight. [6]

2.3.1.2 Memory

Computer storage, or computer memory, refers to the system
components, devices and recording media that retain binary
information for some interval of time. In casual language, memory
or active memory usually refers to random access memory (RAM),
or other forms of fast but temporary (non-persistent) storage, while
storage or long-term memory typically refers to hard disks (HD)
and other forms of storage which are slower to access but persistent
when power is switched off.

In our design, we refer to Memory as the storage that can be
directly accessed by the central processing unit of the computer.
Primary storage typically consists of three kinds of storage.
Registers are internal to the central processing unit and contain the
information that the arithmetic and logic unit needs to carry out the
current instruction; they are specified in detail above. Main storage
contains the programs that are currently being run and the data the
programs are operating on; the arithmetic and logic unit can quickly
transfer information between a processor register and a location in
main storage, and in modern computers RAM is used for main
storage (when people refer to computer memory, they usually mean
main storage). Processor cache is a special class of storage used by
some central processing units; some of the information in main
storage is duplicated in the processor cache, which is slightly slower
but of much higher capacity than processor registers, and
significantly faster than main storage. [6]

2.3.1.3 Pipeline Semantics

An instruction pipeline is a technique used in the design of
microprocessors and other digital electronic devices to increase
their performance. Pipelining reduces the cycle time of a processor and
hence increases instruction throughput, the number of instructions
that can be executed in a unit of time. For example, the RISC
pipeline is broken into five stages with a set of flip flops between
each stage: instruction fetch, instruction decode and register fetch,
execute, memory access and finally register write back. [6]

In a simulator generator, the number of stages in a pipeline is
arbitrary in nature; in other words, the number of stages as well as
the functions performed in each of these stages is to be determined
from the ADL. To describe this non-deterministic behaviour, there needs to
be provision for a Register Transfer Language (RTL), which is to be
parsed using a separate compiler.

2.3.2 Instruction Set Specification


The instruction set specification covers the manner in which the
simulator generator handles the target machine program whose
execution is to be simulated. In this section, the critical design factors are
the mode of program input, the operand processing techniques, the
predefined functions to be provided for easier development and the
pipeline behaviour of each instruction.

[Figure: the Instruction Set Design splits into Addressing Modes Specification, Semantic Functions Provision, Instruction Pipeline Behaviour and Input File Parser Design.]

Fig 2.5 Block Diagram for the layout of an Instruction Set Specification

2.3.2.1 Input Program Type

The first major design issue to be taken into consideration with
regard to instruction set specification is the type of the input target
machine program: it can be either in binary format or in assembly
form, and several other design decisions depend on this choice. For
instance, if the program is in binary format, there needs to be some
mechanism by which each instruction is parsed and converted into
its equivalent assembly instruction. Following this, the mnemonic,
source and target operands can be obtained and the execution
simulated. On the other hand, while using assembly input, issues
such as the absence of bit boundaries in instructions arise, and these
need to be handled separately from the actual input program. Also,
since binary values are no longer directly provided within the
instruction, instructions which make use of branches or memory
references, that is, instructions which need to access either data or
instruction memory, become more difficult to handle.

2.3.2.2 Addressing Modes

In the scenario of embedded processors, the provision of multiple
addressing modes becomes a necessity. Due to the non-deterministic
nature of the addressing modes in use, the user must be able to
define and use any addressing mode he wants in any manner he
deems necessary.

Addressing modes, in short, define how architectures specify the


address of an object they need to access. Addressing modes also
specify constants and registers in addition to locations in memory.
When a memory location is used, the actual memory address
specified by the addressing mode is called the effective address.

Addressing modes have the ability to significantly reduce


instruction counts. They also add to the complexity of building a
computer and may increase the average CPI (clock cycles per
instruction) of computers that implement these modes. Thus, the
usage of various addressing modes is quite important in helping the
architect choose what to include. The most commonly used
addressing modes are immediate, displacement and register indirect
modes.

Addressing mode behaviour should comprise a regular expression
facility denoting the instruction operands in the case of assembly
input, while in the case of binary input, there should be
the ability to specify bit boundaries. This needs both deterministic
functionality in terms of pre-defined addressing mode types, as well
as an RTL to specify the behaviour when the mode is not pre-defined
but a new concept being introduced by the user himself.

2.3.2.3 Semantic Functions

Semantic Functions refer to precompiled and predefined
functions that can be used by a user in his RTL code to simplify it.
For instance, providing an addInt function that takes two integers as
its inputs and returns an integer can simplify the RTL code, as all
the user needs to do to add two integers is use this function and
pass it the necessary arguments. All other issues such as overflow
and type checking will be handled automatically. The design decisions
to make here include the selection of functions that can be provided
to the user in a manner that increases his power yet does not reduce
his flexibility while designing a new processor, as well as the
behaviour of each function.

2.3.2.4 Pipeline Behaviour

Pipelining is the most significant design consideration in any
modern processor. As specified in the pipeline semantics section of
the microarchitecture design description, an instruction pipeline is a
technique used in the design of microprocessors and other digital
electronic devices to increase their performance. Pipelining reduces
the cycle time of a processor and hence increases instruction
throughput, the number of instructions that can be executed in a
unit of time. For example, the RISC pipeline is broken into five
stages with a set of flip flops between each stage: instruction fetch,
instruction decode and register fetch, execute, memory access and
finally register write back. [6]

Every instruction has certain functions that are performed in each
cycle or sub-cycle pertaining to a pipeline stage. This behaviour has
to be explicitly specified here by making use of an RTL section,
since it is non-deterministic in nature. For instance, in
a simple MIPS pipeline, the load instruction loads its register
operand with the value at the memory location in the ID stage, the
add instruction performs the add operation in the EXE stage, while
the store instruction stores the register value into the memory
location only in the fourth stage. Whether this behaviour is specified
with regard to the instruction or with regard to each addressing
mode is another design issue.

2.3.2.5 Input File Parser

The final component of the instruction set specification is the input
assembly file parser. The strategy used in parsing this file
depends on the input program type, that is, whether the input file
is in assembly form or in binary form. The tokenizing
strategies to be used in the two cases vary vastly; for instance, the
separators in the binary case are the bit boundaries, while
word separators such as spaces or commas specified in the
instruction syntax are the markers in the assembly case. Further, the
number of instructions to be fetched will depend on the instruction
fetch mechanism used in the pipelining stages. Hence, the parser
retrieves the instructions in the manner specified by the Instruction
Fetch RTL semantics and pushes them into the pipeline for execution
and simulation purposes.

2.3.3 Global Statistics Definition


Along with the provision for specifying the behaviour of the processor, the
main intention of any simulation is to provide the user with tools to
analyze any section of the execution of the target machine program, as well
as obtain run time statistics on the processor. The design in this project
takes into consideration the concept of Global statistics that are defined by
the user. The user needs to define the statistics, as well as the updation and
display details in a combination of the Architecture Descriptor Language
and the Register Transfer Language. All the work will be done
automatically by the simulator, once these specifications are in place.

[Figure: the Global Statistics Specification splits into Statistics Definition, Statistics Collection and Updation, and Statistics Display.]

Fig 2.6 Block Diagram for the layout of the Global Statistics Specification

2.3.3.1 Statistics Definition

Every statistic to be used during execution is to be defined by the
user, since the necessary statistics change from processor to
processor. The user needs to define all the entities of a statistic
using the ADL itself. For instance, the statistics that a particular
processor or a particular simulation run needs can be the
number of pipeline stalls, the average number of clock cycles per
instruction, the execution speed and so on.

2.3.3.2 Statistics Collection and Updation

The statistics defined in the above section need to be updated at


certain sections or during certain events during the simulation of
the execution of the target machine program. The phases or cycles
or events when these updations take place should be defined using
the ADL while the exact formulae or procedure to be used for the
collection or updation has to be specified by making use of an RTL.

2.3.3.3 Statistics Display

The statistics that were collected using the earlier phases need to be
displayed at certain intervals or upon the occurrence of certain
events during the course of the execution, or after the entire
execution procedure is completed. This behaviour is to be specified
using an ADL with the events or cycles explicitly defined.

2.4 Granularity
Another crucial design decision is the provision of multiple granularities of importance
for the architectural details. For instance, at very low granularities the simulation works
at the level of functional units. Above this comes the cycle level granularity and so
on, until we attain an instruction level granularity or even an instruction word granularity.
The major aspect to be taken into consideration here is the trade-off between performance
and accuracy. The lower the granularity, the better the simulator accuracy and the worse
its performance, but the situation begins to reverse as the granularity increases up to the
instruction level. In effect, functionality primarily needs the instruction set specification
and can be used to show whether the application runs correctly on the target architecture,
while performance makes use of the microarchitectural specifics to determine the
execution time in clock cycles and so on. These two aspects are not complementary ([1], [4]);
a greater diversion into one would negatively affect the simulation of the other. The extent
of depth into both is an important feature which can be decided by the user at runtime.
Finally, one of the desirable attributes of the simulator is that it should perform
almost on par with hand coded simulations of the same processors. This can be done using
a number of design tweaks ([1], [2], [3], [4]). The comparison of performance and
exactness of functionality can be the deciding factor in this case.

2.5 Other Design Issues


2.5.1 Generation Modes
Most modern processors are highly complex entities with a number of
variable features. If the entire simulator needs to be generated from scratch
for every minor design modification, the time spent in compilation will
offset most of the benefits of a fast run time execution. Hence, in our
problem design, functionality should be provided for three possible cases,
full generation, architecture generation (where the inputs remain the same
but some architectural feature has been modified) and program
regeneration (where the architecture is unchanged but a new input program
is used) ([1], [2]).

2.5.2 Execution Modes


Upon execution of a target machine program, the user may want a degree
of customizability with regard to the information on display and
indications of execution status; hence, there needs to be provision
for a normal mode, a debug mode and a user configurable mode (instruction
level or cycle level output) [1]. To be more specific, the extent of
depth involved, in other words, whether the main target of development
is an Instruction Set Simulator (a functional simulator for a processor
that does not generate timing information) [4] or a Cycle Accurate
Simulator, one that generates functional results as well as timing
information [4], needs to be explicitly specified by the user.

3 Implementation
As a walkthrough of the implementation of the project, we shall first focus on the
design decisions carried over from the Design sections above and then proceed to
a detailed explanation of each phase of the project development.

3.1 Project Execution Overview

[Figure: the Processor Description Input File (XML) yields non-deterministic entities, converted to C++ code, and deterministic entities, converted to C++ instantiation code; together with the Input Assembly Language File these are built into the Compiled Instruction Set Simulator.]

Fig 3.1 Block Diagram of the Execution of the Simulator Generator

In the implementation, the user needs to specify the processor details in the
specification document. In our project, the specification language used is
the eXtensible Markup Language, or XML in short. All the deterministic
content is provided in XML, while the non-deterministic content is
provided as a combination of XML and an RTL. In the implementation, the
RTL made use of is simple C++ code which is directly embedded into the
generated code without any intermediate processing.

The core generator language, that is, the language used to develop the
generator, is C++. The compiler used for both the developer written and the
automatically generated code is the GNU C++ compiler, preferably version
3.3.1 or later. The multiple files in the implementation are linked together
using a Makefile with a number of design tweaks. The parser used for the
XML processor specification is the Xerces C++ parser developed by the
Apache group. Documentation regarding the APIs for this parser is freely
available at [7].

The core of the execution comprises conversion of the non-deterministic
XML and RTL code into further C++ code. The second parse generates
instantiation code corresponding to certain XML code and combines this
with the earlier C++ code, the input assembly file and the predefined C++
code to generate the code for the simulator, which can then be compiled and
executed for the given assembly code.

3.2 Code Layout during Implementation


The implementation of the above design leads to a code structure
which is represented by the tree below:

Code Layout
  Architectural Entities
    Memory
    Registers
      Physical Registers
      Physical Register File
      Logical Registers
      Logical Register File
  Instruction Set Files
    Addressing Modes
    Semantic Functions
    Input Parser
  Miscellaneous Files
    Conversion Functions
    Automata Functions
    File Handling
    String Manipulation
  XML Files
    Sample XML Input
    XML Parser (Deterministic and Non-Deterministic)
  Data Files
  Build Files
  Main Files
    Executable
    Makefile

Fig. 3.2 Distribution of Code in the implementation of the problem



3.3 Implementation Specifics

3.3.1 Microarchitecture Implementation


The microarchitecture entities are among the first components that
need to be generated as part of the simulation process. These include
the registers used, both physical and logical, the corresponding register
files, the system memory and the pipeline semantics, the last of which was
not implemented due to a lack of time.

The critical aspect to consider while developing all microarchitectural
entities in code is that the classes can be defined during the
development of the project itself and not from the specification input.
The input is used only for instantiating these classes and assigning
them their run time parameter values. This is not the case for pipeline
semantics, where the input RTL code needs to be converted to the
corresponding C++ code.

3.3.1.1 Physical Registers

A physical register is a small amount of very fast computer
memory used to speed the execution of computer programs by
providing quick access to commonly used values, typically the
values being calculated at a given point in time.

In the implementation, a register can be thought of as a location for
storage of a single value of a particular type, whose speed of
retrieval is very fast. A physical register is the equivalent of a
reservation station; it cannot be directly referenced, but can be
accessed by means of a logical register.

The implementation does not support extended bit arithmetic, due
to which a few specially defined classes have been provided for
each register type. Registers can range from 8 bits to 96 bits. To be
more precise, the C++ data type of a register can be of either integer
type - char (8 bit), short int (16 bit), int (32 bit), long long (64 bit) -
or of floating point type - float (32 bit), double (64 bit) or long
double (96 bit). This is specified using a combination of the type
and bitSize attributes of the physical register file.

Physical registers are created when a corresponding Physical
Register File is created. They have no independent existence of their
own but are referred to by an index in the physical register file.
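
One possible shape for the per-type register classes mentioned above is sketched below as a small class template; this is only an illustration, under the assumption that a template can stand in for the separately defined classes, and the names used are not taken from the project sources.

template <typename T>
class Physical_Register
{
public:
    T value ;                          // the stored value (char, int, float, ...)

    Physical_Register() : value( T() ) { }
    T    get() const        { return value ; }
    void set( T newValue )  { value = newValue ; }
};

// Example instantiations matching the supported widths:
//   Physical_Register<int>         r32 ;   // 32 bit integer register
//   Physical_Register<long double> f96 ;   // 96 bit floating point register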

3.3.1.2 Physical Register Files

A Physical Register File is an array of physical registers along with


an index for each one of these physical registers. Any architecture
may have one or more physical register files. The properties of a
Physical register file are, in essence, the same as that of its
component registers.

Some sample XML code for a physical register file is given below:

<physical_register_file>
<name>pint</name>
<count>32</count>
<type>integer</type>
<bitSize>32</bitSize>
</physical_register_file>

The above code is used to define a physical register file called
pint, consisting of 32 physical registers of width 32 bits each. The
type of the register file is integer, which in turn determines the data
type of the physical registers as int.

Since there can be a number of physical register files in a single
architecture description, there exists a global list of physical
register files, which can be searched in a linear manner.
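
As an illustration of this instantiation step, the code that the generator might emit for the pint specification above could look roughly like the sketch below. The class name Physical_Register_File, its members and the function name are assumptions made for this sketch and need not match the project sources; in particular, the storage type is fixed to int here because the example file has type integer and bitSize 32.

#include <list>
#include <string>

// Assumed container: an array of same-typed physical registers plus its metadata.
class Physical_Register_File
{
public:
    std::string name ;
    int         count ;
    std::string type ;       // "integer" or "float"
    int         bitSize ;
    int *       registers ;  // storage chosen from type/bitSize (int in this example)

    Physical_Register_File( const std::string & n, int c,
                            const std::string & t, int b )
        : name( n ), count( c ), type( t ), bitSize( b ),
          registers( new int[ c ]() ) { }
};

// Global list of physical register files, searched linearly by name.
std::list<Physical_Register_File *> physical_register_file_list ;

// Instantiation code generated from the <physical_register_file> element above.
void instantiate_pint()
{
    physical_register_file_list.push_back(
        new Physical_Register_File( "pint", 32, "integer", 32 ) ) ;
}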

3.3.1.3 Logical Registers

Logical registers are not themselves the microarchitectural entities used to
store and process data or values, but in effect refer to a physical
register which holds the actual value. Hence, a logical register may
be looked upon as comprising a name, by means of which it is
referred to in assembly mnemonics, as well as either a pointer to a
physical register or an index field which holds the value of a
physical register index.

As in the case of physical registers, logical registers do not
have an independent existence of their own. They come into
existence only as part of a Logical Register File. Their type is decided
in the same way as that of a physical register.

3.3.1.4 Logical Register Files

A Logical Register File is an array of logical registers along with an


index for each one of these logical registers. Any architecture may
have one or more logical register files. The properties of a Logical
register file are, in essence, the same as that of its component
registers.

Some sample XML code for a logical register file is given below:

<logical_register_file>
<name>gpr</name>
<count>32</count>
<sign>false</sign>
<type>integer</type>
<bitSize>32</bitSize>
<mnemonics>zero at v0 v1 a0 a1 a2 a3 t0 t1
t2 t3 t4 t5 t6 t7 s0 s1 s2 s3 s4 s5 s6 s7
t8 t9 kt0 kt1 gp sp s8 ra</mnemonics>
<map>pint</map>
</logical_register_file>

The above XML code is used to define a logical register file named
gpr, comprising 32 logical registers of type integer and a width
of 32 bits, in essence of data type int. The critical entities here are
the mnemonics field, which specifies the names of the logical
registers from index 0 to count-1, and the map field, which
specifies the physical register file that the logical register file maps
to. In most system architectures, the mapping scheme is
based on register renaming, but since pipelining could not be implemented
in our project, the consequent lack of out of order
execution negates the need for register renaming.
We have made use of a simple one-to-one mapping scheme
between the component logical registers and their corresponding
physical registers.

3.3.1.5 Memory

The implementation of memory in our project is an exceedingly
simple one. There exists a character array of a certain length in
bytes that has to be specified by the user in his XML specification.
Values of the types specified above can be stored into this memory
and retrieved using extensive pointer usage. Byte boundaries for
data types that exceed one byte need to be strictly adhered to. The
manner of data storage can be either little endian or big endian, and
is little endian by default. A memory block can correspond to either
data memory or instruction memory. The only limitation that the
program imposes is the restriction to a single stretch of instruction
memory and a single stretch of data memory, that is, only one array
can be defined for each.

Sample XML code for Memory is as shown below:

<memory>
<name>instruction_memory</name>
<type>instruction</type>
<size>1000000</size>
</memory>

<memory>
<name>data_memory</name>
<type>data</type>
<size>3000000</size>
</memory>

The code above is reasonably self-explanatory and is used to define
3,000,000 bytes of data memory and 1,000,000 bytes of instruction
memory to be used by the target machine program.

The memory is capable of handling exceptions such as a
segmentation fault. There are crucial functions that convert target
machine memory references into host system memory references by
means of address translation. Functions such as getInt
and putInt are used to retrieve and store values at locations
specified by a base address and an optional offset and offset size, to
aid in array usage.
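
A rough sketch of this memory abstraction is given below. The member names and the exact signatures of getInt and putInt are assumptions for the sketch, and the address translation shown, where the target address is used directly as an offset into the host array, is only one simple possibility.

#include <cstring>
#include <stdexcept>

// Minimal sketch of a byte addressed memory block (little endian host assumed).
class Memory
{
public:
    char * bytes ;
    int    size ;    // size in bytes, taken from the <size> tag of the XML input

    Memory( int sizeInBytes )
        : bytes( new char[ sizeInBytes ]() ), size( sizeInBytes ) { }

    // Translate a target address to a host location; the bounds check stands in
    // for the segmentation fault handling mentioned above.
    char * translate( int base, int offset, int offsetSize ) const
    {
        int address = base + offset * offsetSize ;
        if ( address < 0 || address + (int) sizeof( int ) > size )
            throw std::out_of_range( "memory access outside the defined region" ) ;
        return bytes + address ;
    }

    int getInt( int base, int offset = 0, int offsetSize = 1 ) const
    {
        int value ;
        std::memcpy( &value, translate( base, offset, offsetSize ), sizeof( int ) ) ;
        return value ;
    }

    void putInt( int value, int base, int offset = 0, int offsetSize = 1 )
    {
        std::memcpy( translate( base, offset, offsetSize ), &value, sizeof( int ) ) ;
    }
};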

3.3.2 Instruction Set Implementation

The conversion of the input XML instruction set design specification


into C++ code is the next major code division. The XML file is parsed
to generate classes corresponding to each instruction with fields for
each of its operands and functions necessary for the initialization,
assignment and modification of these fields. The input assembler file is
parsed in the second phase and each instruction line is instantiated
using its corresponding class and the values assigned using the
operands and their types.

3.3.2.1 Addressing Modes

An important concept used here is that of addressing modes. Since
the input here is at the assembly level and not at the machine code
level, addressing modes such as register, direct, immediate and
register indirect come into the picture. Each instruction can make
use of any of these addressing modes, whose behaviour and value
setting and retrieval are abstracted from those of the instruction itself.
This separation in behaviour is immensely beneficial in a
pipelining implementation, where the actions performed in each pipeline
cycle need not be specified for each instruction, but only for the
corresponding operand addressing modes, something that can
considerably reduce user XML code. In the current
implementation, this reduces only the instruction semantics specification,
since pipelining has not been implemented.

The XML Sample Code for a few addressing modes are:

<addressing_mode>
<name>register</name>
<regex>r</regex>
</addressing_mode>
<addressing_mode>
<name>immediate</name>
<regex>#n</regex>
</addressing_mode>
<addressing_mode>
<name>displacement</name>
<regex>n(r)</regex>
</addressing_mode>

Here, we find the fields name and regex in use. The name
specifies the mechanism to be used to obtain operand values in our
implementation, since there are a few predefined addressing modes
already available for use. For all other modes, an RTL needs to be
implemented, similar to the pipeline semantics design. The regex, or
regular expression, field provides a degree of flexibility to
the user to specify the syntax of his operands corresponding to the
predefined addressing modes. In most processors this would be
sufficient, but this need not be the case for embedded processors.
The regular expression field makes extensive use of DFAs, both
for determining the operand type and for extracting values.

3.3.2.2 Instruction Class Generation

The bulk of the work in Instruction Set Specification is the


generation of classes corresponding to each instruction. Since the
instruction set specification is non-deterministic in nature, the XML
and RTL embedded C++ code are combined to generate further
C++ classes for each instruction. The conversion is best illustrated
by an example:

Sample XML code for an instruction specification:

<instruction>
<mnemonic>add</mnemonic>
<field>
<name>dest</name>
<type>int</type>
</field>
<field>
<name>source1</name>
<type>int</type>
</field>
<field>
<name>source2</name>
<type>int</type>
</field>

<rtl>
dest = addInt( source1, source2 );
</rtl>
</instruction>

As shown above, the instruction specification is very simple, with


the focus on the mnemonics, number of fields and finally the RTL
code, which is simple embedded C++ code to denote the semantics
of the instruction. The main components of the C++ class code
generated by the above are shown below:

class Instruction_add
{
public:
int fieldCount ;
int dest ;
int destIndex ;
int source1 ;
int source1Index ;
int source2 ;
int source2Index ;
char fieldValue[3][16] ;

Instruction_add();
int getField( char * instruction );
int getValue();
int setValue();
int semantics()
{
getValue();

dest = addInt( source1, source2 );

setValue();
return 0;
}
};

The implementation code embedded within each of these functions,
with the exception of the semantics function, has been removed.
the class specified above, the fields dest, source1 and source2 are as
declared in the XML specification. The getField function is used
for parsing the input assembly instruction and storing the operand
fields within the fieldValue array. The getValue function is used to
obtain the values of each of these operands and store them in the
corresponding local field variables. The setValue function parses
the RTL code to determine the destination and stores the result of
the calculations from the dest field into the corresponding
destination memory address or register. The semantics function, as
shown, just calls getValue to update the field variables, embeds the
RTL C++ code to update them and makes use of setValue to write
back the destination value. Most instructions make use of a similar
class structure, with the variants being the class name, the number
of fields and the embedded C++ semantics code.

3.3.2.3 Semantic Functions

The instruction semantics are further aided by the presence of
predefined and precompiled semantic functions that can be made
use of by the user XML code while specifying the overall
instruction behaviour. These include functions for mathematical
operations, memory access, and input and output. For example, as
seen in the instruction semantics above, the addInt function was
used to add two integers and return the integer value. All issues
such as overflow and type mismatch were handled by this function
itself. Several such functions are available for use by the architect.
Since they are predefined and precompiled, they are faster to
execute and easier to use.
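
For illustration, a semantic function such as addInt might look roughly like the sketch below. The saturating overflow policy shown is an assumption made for this example and not necessarily the behaviour chosen in the project.

#include <climits>

// Sketch of a precompiled semantic function: add two integers, handling
// overflow by saturating at INT_MAX / INT_MIN (assumed policy).
int addInt( int a, int b )
{
    long long result = (long long) a + (long long) b ;
    if ( result > INT_MAX ) return INT_MAX ;
    if ( result < INT_MIN ) return INT_MIN ;
    return (int) result ;
}

// Used from the generated semantics() function shown earlier:
//   dest = addInt( source1, source2 );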

3.3.2.4 Input File Parser

The input file parser is the instruction set functionality that parses
an input assembler file and obtains the instructions from the file. In
the case of an assembler file, all the implementation needs to do is
extract a single line from the input program and instantiate an
instruction class based on the assembler mnemonic used. The
mnemonic can be obtained by tokenizing the instruction and taking
its first word.
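
The dispatch described here can be pictured roughly as in the sketch below. The declarations of tokenizeInstruction and Instruction_add stand in for the definitions given in the earlier sections, and the branch-per-mnemonic structure is only an assumption about how the generated code might select the instruction class.

#include <cstring>
#include <cstdio>

// Assumed declarations, standing in for the entities defined earlier.
int tokenizeInstruction( char * sentence, char wordList[][16] ) ;
class Instruction_add { public: int getField( char * ) ; int semantics() ; } ;

// Sketch: take one assembler line, pick the class from its mnemonic,
// fill in the operand fields and execute the instruction semantics.
int executeLine( char * line )
{
    char words[8][16] ;
    tokenizeInstruction( line, words ) ;      // the first word is the mnemonic

    if ( std::strcmp( words[0], "add" ) == 0 )
    {
        Instruction_add instruction ;
        instruction.getField( line ) ;        // parse operands into fieldValue
        return instruction.semantics() ;      // getValue + RTL code + setValue
    }
    // ... one branch per instruction class generated from the XML ...
    std::printf( "Unknown mnemonic: %s\n", words[0] ) ;
    return -1 ;
}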

3.3.3 XML Related Files

This section consists of the input XML files used for test purposes as
well as the parser for the same. For the purposes of this project, I have
used the Xerces C++ parser by the Apache group. This software is free
and available for download. The APIs needed for parsing an
XML document are available as part of the Xerces package, and the
documentation for the same can be found at the Xerces site [7]. A good
understanding of the XML DOM tree structure is necessary first.

The XML parser consists of a deterministic and a non-deterministic part,
corresponding to the sections that are used for generating instantiation
code and the sections that are used to generate classes. Both these sections
need to be invoked in separate runs of the simulator generator to
generate the corresponding Build files. The parser needs a syntactically
accurate XML file as its input, as it is not armed to deal with syntax
errors; all it does is indicate the presence of an error. A general
introduction to parsing an XML file using a combination of the DOM
tree and the Xerces parser is given here.

3.3.3.1 Xerces C++ XML Parser

The Xerces C++ XML parser makes use of C++ functions and
APIs to access and manipulate an XML document. The API makes
use of a tree structure, known as a DOM or Document Object Model
tree, to parse through the file. One of the major requirements of an
XML file is the presence of a root tag, and it is this tag that forms
the root of the DOM tree.

The DOM tree consists of a number of entities including elements,
their children and their attributes. The mechanism used in our
implementation to parse the tree is a recursive one which, when
given a node, goes through its properties, parses through its
attributes and finally parses through its children one sibling at a
time. In effect, the strategy used to parse the tree is
generally a DFS, but in the case of tags which are deterministic in
nature, the mechanism resembles a local BFS within a global DFS.

The APIs provided by the Xerces parser are similar to those of other
DOM parsers. The XML platform is first initialized by making use of the
XMLPlatformUtils::Initialize() function. This is
followed by the creation of a new DOM parser to parse through the
DOM tree, by making use of the instruction:

XercesDOMParser * parser = new XercesDOMParser;

This is then followed by the initialization of the parser to our
current requirements:

parser -> setValidationScheme( XercesDOMParser::Val_Auto );
parser -> setDoNamespaces( false );
parser -> setDoSchema( false );

The next step in the process is to begin the parsing of the file. This
is done using the instruction:

parser -> parse( sourceFileName );

If there are any errors in the XML document, they are caught now,
and the corresponding error handler is used to take the
necessary action. Following this, the parser is used to obtain
the root node of the document.

DOMNode * pDoc = parser -> getDocument();

The manipulation of the document is now done by making use of
this root node stored in (*pDoc). We can recursively go through the
tree by taking a node, obtaining its first child and going deeper
into the tree. The second child onwards can be accessed as siblings
of the previous child node.

node = parentNode -> getFirstChild();


node = node -> getNextSibling();

The tag or element name can be obtained using:

XMLString::transcode( node -> getNodeName() );

while the value or text (if any) stored within the tags can be obtained using:

XMLString::transcode( node -> getNodeValue() );

The DOM tree can hence be progressively traversed to obtain the
necessary tags and their corresponding values. In case attributes are
used in an XML file, they are not considered as elements but can be
accessed by making use of the attlist property of each element. The
attributes are returned as a list which can be searched in a linear
manner. Since our implementation avoids the use of attributes, we
shall not delve deeper into that topic.
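
Putting the calls above together, the recursive traversal described in this section can be sketched as follows. The printing is only illustrative; the actual generator emits C++ code rather than printing, and error handling is omitted for brevity.

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <iostream>
#include <string>
using namespace xercesc ;

// Depth first walk over the DOM tree: print each element name and its text.
void traverse( DOMNode * node, int depth )
{
    for ( DOMNode * child = node->getFirstChild() ;
          child != 0 ; child = child->getNextSibling() )
    {
        if ( child->getNodeType() == DOMNode::ELEMENT_NODE )
        {
            char * tag = XMLString::transcode( child->getNodeName() ) ;
            std::cout << std::string( depth * 2, ' ' ) << tag << std::endl ;
            XMLString::release( &tag ) ;
            traverse( child, depth + 1 ) ;           // recurse into the children
        }
        else if ( child->getNodeType() == DOMNode::TEXT_NODE )
        {
            char * text = XMLString::transcode( child->getNodeValue() ) ;
            std::cout << std::string( depth * 2, ' ' ) << text << std::endl ;
            XMLString::release( &text ) ;
        }
    }
}

int main( int argc, char ** argv )
{
    XMLPlatformUtils::Initialize() ;
    XercesDOMParser * parser = new XercesDOMParser ;
    parser->parse( argv[1] ) ;                       // e.g. ./XML/sample.xml
    traverse( parser->getDocument(), 0 ) ;
    delete parser ;
    XMLPlatformUtils::Terminate() ;
    return 0 ;
}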

3.3.3.2 XML Input File

The processor specifications are input using an XML file which is
referred to as the XML input file. This is the file that is
parsed by the Xerces C++ APIs. In case of an error, the API refuses
to parse the file any further and indicates the same. Some
samples of XML code from this file have been shown in the
preceding sections. The requirements of this file are
simple: it should make use of proper XML syntax, and it does
not need any corresponding Document Type Definition file.

3.3.4 Miscellaneous Files :

The miscellaneous files contain code needed by all the other code
sections, yet do not belong to any of these sections exclusively.

3.3.4.1 Type Conversion Functionality

The conversion files and functions are provided to convert values
from one data type to another. This mainly deals with conversions
between integral values, strings and Boolean arrays. The functions
pertaining to Boolean arrays have been provided with extended bit
arithmetic in mind, in other words, the usage of integral values which
need more than 64 bits for their representation. An example of a
conversion function is:

unsigned short int btousi( bool[], int );

The above function is used to convert a Boolean array of a
specified bit size, for example 16, into an unsigned short integer.
There exist several other functions for conversions between
Boolean arrays and their actual integral values. Floating point
Boolean arrays have not been taken into consideration. In addition
to the Boolean related functions, there also exist functions for the
conversion of strings to a series of other data types.
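
A possible implementation of btousi, assuming that the most significant bit is stored first in the array (the bit order actually used in the project is not stated here), is:

// Convert a Boolean array of the given bit size (for example 16) into an
// unsigned short int. Assumes bits[0] holds the most significant bit.
unsigned short int btousi( bool bits[], int bitSize )
{
    unsigned short int value = 0 ;
    for ( int i = 0 ; i < bitSize ; i++ )
        value = (unsigned short int) ( ( value << 1 ) | ( bits[i] ? 1 : 0 ) ) ;
    return value ;
}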

3.3.4.2 Automata Functionality

The automata functions code deals with the generation of DFAs
for use in the context of addressing modes. There are three issues to
be considered here: the generation of a DFA from an input regular
expression, parsing a DFA with an input operand to determine whether
the operand actually is of that particular addressing mode and,
finally, parsing the operand to obtain its value. The input
addressing modes are specified by the user in the input XML file
along with the regular expression, with r denoting registers and
n denoting numbers. Once this class is generated, the operand
input fields from the assembler file are parsed using all possible
DFAs to determine the correct addressing mode. Once a match is
found, the corresponding value is either set or retrieved depending
on context.

Some of the functions used in the context of automata generation
and manipulation are listed below. The generateDFA function
makes use of a regular expression as input to generate the DFA.
There is a standard behavioural pattern for most nodes in a DFA.
The n node, for instance, is entered when a digit is parsed, is
retained as long as digits continue to stream in and is exited for any
other character. The DFA is simulated by a 255 element array for
every node, with the next node obtained by indexing the array with
the ASCII value of the current character.

int generateDFA( char * );

The parse function takes the assembly instruction operand as its
input and checks if the operand string successfully moves through
the DFA in focus. If the final state is a success state, true is
returned, else a false value is returned.

bool parse( char * );

The getValue function is used to obtain the value of an operand of
the given addressing mode type. This function is used only in case
the parse return value for the same operand for the given DFA is
true.

int getValue( char *, Memory * );

The setValue function is used to set the value of an operand of the
given addressing mode type with the specified argument value.
This function is used only in case the parse return value for the
same operand for the given DFA is true.

int setValue( char *, int, Memory * );
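
As an illustration of the table driven scheme described above, the fragment below hand codes the DFA for the immediate mode regular expression #n and runs operand strings through it; the real generateDFA function builds such tables from the regular expression automatically, and the table size and state numbering here are assumptions made for the sketch.

#include <cstdio>

// Transition table: nextState[state][character]. State -1 is the reject state.
// States: 0 = start, 1 = after '#', 2 = digits seen (accepting).
int nextState[3][256] ;                  // one slot per possible character value

void buildImmediateDFA()                 // hand built equivalent of generateDFA("#n")
{
    for ( int s = 0 ; s < 3 ; s++ )
        for ( int c = 0 ; c < 256 ; c++ )
            nextState[s][c] = -1 ;
    nextState[0]['#'] = 1 ;
    for ( int c = '0' ; c <= '9' ; c++ )
    {
        nextState[1][c] = 2 ;            // enter the 'n' node on the first digit
        nextState[2][c] = 2 ;            // stay as long as digits stream in
    }
}

bool parse( char * operand )             // true if the operand matches the mode
{
    int state = 0 ;
    for ( int i = 0 ; operand[i] != '\0' && state != -1 ; i++ )
        state = nextState[state][ (unsigned char) operand[i] ] ;
    return state == 2 ;
}

int main()
{
    buildImmediateDFA() ;
    std::printf( "%d %d\n", parse( (char *) "#42" ), parse( (char *) "r5" ) ) ;  // prints 1 0
    return 0 ;
}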

3.3.4.3 File Handling

The File Handling code deals with flushing the buffers that contain
the content to be put into the files in the Build directory. These can
be flushed in either an append mode or an overwrite mode. In our
implementation, the file handling code is of a very simple nature.
There exists only one major function which is used to write the
contents of a buffer onto a file whose file name is specified.

int writeContents( char * fileName, char * writeFileContents, int mode );
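
A minimal version of this function, with mode 0 taken to mean overwrite and mode 1 append (an assumption made for this sketch), could be:

#include <cstdio>

// Flush a buffer to the named file; mode 0 = overwrite, mode 1 = append.
int writeContents( char * fileName, char * writeFileContents, int mode )
{
    std::FILE * file = std::fopen( fileName, mode == 1 ? "a" : "w" ) ;
    if ( file == 0 )
        return -1 ;                     // could not open the file
    std::fputs( writeFileContents, file ) ;
    std::fclose( file ) ;
    return 0 ;
}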

3.3.4.4 String and Instruction Manipulation

The string and instruction manipulation functions consist primarily
of those functions that are not readily available in C++ to perform
string manipulation, for example, the string tokenizing function.
The files also contain the code necessary to parse the assembler
input file instructions, determine the operands used, determine
which operand is a target and which operands are sources,
eliminate unwanted white space, and so on. All string parsing
functions are defined within these confines.

The major functions comprising this section are specified below.
The tokenize function is a very rudimentary string manipulation
function which converts a sentence into an array of words, with
almost every word separator taken into consideration.

int tokenize ( char * sentence, char ** wordList );

The tokenizeInstruction function is very similar to the above, except
that it is developed for a more specific purpose: to parse assembly
instructions. In this case, the list of separators does not include
symbols such as brackets, square brackets and braces that are part
of instruction operands.

int tokenizeInstruction ( char * sentence, char wordList[][16] );

The getAssignmentTarget function is unique in that it is used to
obtain the target in any assignment expression. For instance, in the
expression a = b + c, the function will store the value a in the
target string.

int getAssignmentTarget ( char * target, char * sentence );

The numberOfTokens function is a very simple function which
tokenizes a string but only returns the number of words in it and not
the words as such.

int numberOfTokens ( char * sentence );
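
As an example, getAssignmentTarget can be realised along the following lines; the whitespace handling is kept deliberately simple and is not claimed to match the project code.

#include <cstring>
#include <cctype>

// Copy the left hand side of an assignment ("a" in "a = b + c") into target.
// Returns 0 on success, -1 if no '=' is present in the sentence.
int getAssignmentTarget( char * target, char * sentence )
{
    char * equals = std::strchr( sentence, '=' ) ;
    if ( equals == 0 )
        return -1 ;

    // Skip leading spaces, then copy the characters up to the '=' sign,
    // dropping any trailing spaces before it.
    char * start = sentence ;
    while ( std::isspace( (unsigned char) *start ) ) start++ ;
    char * end = equals ;
    while ( end > start && std::isspace( (unsigned char) *(end - 1) ) ) end-- ;

    std::strncpy( target, start, end - start ) ;
    target[ end - start ] = '\0' ;
    return 0 ;
}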

3.3.5 Data Files :

Data Files correspond to the instantiation of data structures that are used as a
bridge between the microarchitecture and instruction set design code.
For instance, there is a list which specifies all the logical register files
in use, which is searched linearly to obtain the physical register
corresponding to any logical register, provided it exists. There also
exists a DFA list for determining the addressing mode of an operand.

Whenever a register is encountered in an instruction, the list of logical
register files is searched to see whether the register is actually part of a
particular logical register file.

list<Logical_Register_File *>logical_register_file_list ;
list<Logical_Register_File *>::iterator
logical_register_file_list_iterator ;

Both the list and its iterator are declared in the include.h files as
extern global variables, but defined within this file.

There are similar lists for Physical register files and for DFAs. The list
of DFAs is used to search for the addressing mode of a particular
instruction operand.
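
The linear search over this list, used whenever a register mnemonic is met in an instruction, can be sketched as follows. The shape of Logical_Register_File and the helper name are assumptions made for the sketch; only the list declaration mirrors the project code shown above.

#include <list>
#include <string>
#include <vector>

// Assumed shape of the class described in section 3.3.1.4.
class Logical_Register_File
{
public:
    std::string              name ;
    std::vector<std::string> mnemonics ;  // index i names logical register i
    std::string              map ;        // name of the mapped physical register file
};

std::list<Logical_Register_File *> logical_register_file_list ;

// Return the index of a register mnemonic within its logical register file and
// set *file to the file that contains it; return -1 if it is not a register.
int findLogicalRegister( const std::string & mnemonic,
                         Logical_Register_File ** file )
{
    std::list<Logical_Register_File *>::iterator it ;
    for ( it = logical_register_file_list.begin() ;
          it != logical_register_file_list.end() ; ++it )
    {
        for ( unsigned int i = 0 ; i < (*it)->mnemonics.size() ; i++ )
        {
            if ( (*it)->mnemonics[i] == mnemonic )
            {
                *file = *it ;
                return (int) i ;   // one-to-one mapping: same index in the mapped file
            }
        }
    }
    return -1 ;
}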

3.3.6 Main Files :

The main files section corresponds to files such as the root C++ file,
the global include file and the Makefile used for compilation of the
entire project [8]. The Makefile has been configured in a manner that
permits us to compile only the necessary sections: at times only the
microarchitecture code, at other times only the instruction set code.

Some sample Makefile code, corresponding to the instruction set, is shown below:

# Compiling the Necessary Instruction Entities
instruction : ${INSTRUCTION}/addressing_modes.cpp
    g++ -ggdb3 -o ${OBJS}/addressing_modes.o -L ${LIBS} -I ${INCS} -c ${INSTRUCTION}/addressing_modes.cpp
    g++ -ggdb3 -o ${OBJS}/semantic_functions.o -L ${LIBS} -I ${INCS} -c ${INSTRUCTION}/semantic_functions.cpp
    g++ -ggdb3 -o ${OBJS}/file_parser.o -L ${LIBS} -I ${INCS} -c ${INSTRUCTION}/file_parser.cpp

3.3.7 Build Files :

The code that is generated automatically by the simulator generator
resides here. This includes files corresponding to both the deterministic
and the non-deterministic parsing of the XML file. The non-deterministic
class definitions are found in certain files, while code corresponding to
the instantiation of these and other classes is generated during the second
parse of the XML and assembler input files. Compilation of the files in
this section is done using a separate Makefile. There also exists a
global header file to link the deterministic and non-deterministic
contents, so that all name dependencies are resolved. Upon building
these files, the resulting executable is the compiled simulator of the
processor, with the assembler program compiled along with it.

4 Execution
The steps involved in the execution of the code are very simple. First, if the generator
has not been compiled yet, the compilation can be done using the all target in the
Makefile.

# make all

This will result in the formation of a simulator executable in the parent directory. This
is the simulator generator file, which is now used to generate the actual simulator for a
specified architecture specification. Assuming the processor specifications are written in
the file ./XML/sample.xml, the simulator is generated using two parses. The first
execution is used to generate the C++ code for the non-deterministic XML and RTL
specifications, while the second parse is used to write code that instantiates the
predefined classes such as Memory and the Logical Register Files.

# ./simulator ./XML/sample.xml 0
# ./simulator ./XML/sample.xml 1

The advantage of this two step parsing process is that if the architectural
specifications are changed, only the first execution step is needed to update the simulator.
On the other hand, if some instructions in the input assembler file are changed, or if a
different program is used altogether, only the second step needs to be executed to update
the generated simulator code.

After the above steps are executed, the code corresponding to the simulator would have
been created in the Build/ directory as mentioned under the implementation. The
simulator here simply needs to be compiled by making use of a different Makefile which
is provided in the Build/ directory.

# cd Build
# make
# ./build 25 15

When the assembly program is one which takes two inputs, prints the values and also
prints the sum and stores it in a certain memory location, the output for the above
execution is:

The value is : 25
The value is : 15
The value is : 40

5 Results of the Project


The initial aim of the project was to develop a fully functional generator program. But the
project has fallen short of this objective. The currently present generator is capable of
utilizing the deterministic classes developed as part of the first parsing as well as
generating new classes for instructions in the Build Files section upon a second parsing.
The product is capable of parsing and executing a simple assembly language program but
without pipelining taken into consideration. In other words, an Instruction Set Simulator
(a functional simulator for a processor that does not generate timing information) has been
developed, with a few minor deficiencies such as the absence of a provision for branching
instructions. Pipelining is part of the next step to be taken regarding the project and is a
logical extension to the highly modular and categorized code used during development,
with most of the work involved in RTL parsing and register renaming functionalities.

6 Possible Enhancements
As specified in the results of the project, there are several enhancements that can be made
to the implementation:

- Pipelining, with the focus on cycle accurate simulation of processor execution,
  including hazards and stalls.

- Support for branching, with regard to both hardware and software.

- Extended bit arithmetic libraries to deal with cases where the sizes of integral
  registers exceed 64 bits and those of floating point registers exceed 96 bits
  respectively.

- A validation mechanism for the generated simulator, in order to enable the user to
  verify whether the execution of the simulator is the same as that of the original
  system.

- A mapping scheme between physical and logical registers which is specified by
  the user by means of an RTL, along with functionality that enables as well as
  supports the concept of register renaming.

- Provision for non C++ RTL usage in the XML input file, in effect, the formation of a
  new compiler for the Register Transfer Language with a simple grammar yet
  powerful functionality.

- Provision for the usage of addressing modes beyond those already provided.
  These modes should be user defined, especially with regard to the semantics of an
  addressing mode.

7 References

[1] Priya Chandran. Automatic Generation of Compiled Cycle Level Microarchitecture
Simulators for Super speculative Processors. PhD Thesis, Indian Institute of Science,
Bangalore, India. June 2004.

[2] Prabhat Mishra, Nikil Dutt, Alex Nicolau. Functional Abstraction driven Design
Space Exploration of Heterogeneous Programmable Architectures. TR SM-IMP/DIST/08,
University of California, Irvine. March 2001.

[3] Mehrdad Reshadi, Prabhat Mishra, Nikil Dutt. Instruction Set Compiled
Simulation: A Technique for Fast and Flexible Instruction Set Simulation. University
of California, Irvine. DAC, Anaheim, June 2003.
http://www.cecs.uci.edu/~aces/Instruction%20Set%20Compiled%20Simulation.pdf

[4] Wei Qin. Modeling and Description of Embedded Processors for the Development
of Software Tools. PhD Thesis, Department of Electrical Engineering, Princeton
University. August 2004.

[5] S. P. Seng, K. V. Palem, R. M. Rabbah, W. F. Wong, W. Luk, P. Y. K. Cheung.
PDXML: Extensible Markup Language for Processor Description. Computer Science
Department, National University of Singapore. February 2000.

[6] Wikipedia, http://en.wikipedia.org/wiki/Main_Page

[7] Xerces C++ XML Parser Documentation, http://xml.apache.org/xerces-c/

[8] The Makefile and Compilation Tutorial,
http://www.sethi.org/classes/cet375/lab_notes/lab_04_makefile_and_compilation.html
