
SPARC-V8 Microprocessor Implementation in AHIR-V2 Framework

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF TECHNOLOGY (Microelectronics)

by

ARUN C (10307938)

under the guidance of Prof. MADHAV P DESAI

Department of Electrical Engineering
INDIAN INSTITUTE OF TECHNOLOGY BOMBAY
June 2013

Dissertation Approval for M. Tech.

This dissertation entitled "SPARC-V8 Microprocessor Implementation in AHIR-V2 Framework" by Arun C (10307938) is approved for the degree of Master of Technology in Microelectronics.

Prof. (Examiner)

Prof. Madhav P Desai (Supervisor)

Prof. (Examiner)

Prof. (Chairman)

Mumbai June 28, 2013.

Declaration

I declare that this written submission represents my ideas in my own words and where others' ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Arun C 10307938 Dept. of Electrical Engg. IIT Bombay 28th June 2013.

Acknowledgements

I would like to express my sincere gratitude to my advisor, Prof. Madhav P Desai, for his guidance and encouragement throughout this work. His scientific, technical, and editorial advice was essential for my work as an academic researcher. The regular discussions with him on every aspect of this project helped me refine my approach towards the problem and motivated me to give my best. My thanks also go to all my colleagues and the VLSI Lab staff for the discussions of my project work, especially Mr. Sarath M for the corrections and ideas contributed, my brother Mr. Anoop C for his moral support and timely advice, and Ms. Nasima Kazi for her editorial review of this report and her fruitful feedback. I would like to thank my entire family and friends for their constant support and encouragement during the past years. I would also love to add a word of thanks to the current and past VLSI lab admins, the Electrical Office staff and all my friends, especially those who were with me on the 5th floor, Hostel-12, D Block.

Arun C


Abstract

Today, product development in the electronics industry is characterized by very short market cycles and ever increasing complexity. To keep pace with current market trends, we require rapid prototyping, design and implementation of products, including both the hardware and the software required to support it. In this thesis, we investigate a systematic and automated framework that can potentially address this challenge. In particular, we consider the use of high level synthesis techniques to design, verify and implement a microprocessor. We have implemented the SPARC-V8 microprocessor using the high level synthesis tool chain AHIR-V2, developed at IIT Bombay. The SPARC processor is designed as a multi-threaded C model and converted to VHDL using AHIR, and the FPGA prototype is developed using the Image reconfigurable computing framework. Through this, we present an approach that can significantly shorten the product realization time and simplify the verification of the system. Both the hardware and the software are co-developed, with the necessary interfacing mechanisms at appropriate levels of abstraction. The virtual processor in C is developed so that it can execute applications the same way an actual SPARC processor does. In addition, to ensure performance and error free operation in a realistic setting, we developed a minimal operating system with a command line interface. The necessary peripherals to support the processor, such as a keyboard and a console, are also emulated. This complete system runs on a host computer and is used for extensive debugging and bug fixing. After gaining satisfactory confidence in the C model, the VHDL model is generated using the AHIR-V2 compiler chain and the processor is realized in an FPGA. The final FPGA prototype should work as a standalone system that can boot up and present the user with a terminal.
The AHIR high level synthesis methodology can potentially provide sufficient parallelism at the C level and in the subsequent VHDL model. However, the performance and the achieved instructions per cycle of the final processor prototype need to be investigated further. The basic SPARC-V8 model can be enhanced using different stages of pipelining and a memory cache to improve the throughput of the system.

Contents

1 Introduction
  1.1 Motivation
  1.2 Project Organization
2 AHIR - A Hardware Intermediate Representation
  2.1 Pipes in AHIR
  2.2 Advantages
3 IMAGE Reconfigurable Computing Platform
  3.1 IMAGE FPGA Board Details
4 The Virtual C Processor
  4.1 SPARC V8 Specifications
  4.2 Interfaces in the virtual processor
  4.3 Memory-Map and Program Loading
5 External Interfaces and Peripherals
  5.1 Keyboard and Display
  5.2 The AHIR-AHB Bridge
6 Software
7 Results and Future Work
  7.1 Completed work and current status
  7.2 Immediate extensions to this project
  7.3 Future scope
A Appendix - Interface signals and Interrupts
B Appendix - stdio_sparc.h
C Appendix - AHB Ahir Bridge

List of Figures

1.1 Processor Development Flow
2.1 Interacting hardware threads in AHIR using pipes
3.1 Overview of the IMAGE System
3.2 Xilinx Spartan-3 IMAGE Board
4.1 Interface Signals
4.2 Memory Mapping
4.3 Program loading
5.1 Peripheral Interface
5.2 Complete System with Ahir-AHB Bridge
5.3 AHB-AHIRPipe Bridge Overall Scheme
6.1 OS screenshot
C.1 AHB Timing Waveform

1 Introduction

1.1 Motivation

The traditional design priorities for a complex IC, such as a microprocessor, are speed, power and area, with the most emphasis on speed. However, as technology shrinks and more and more functions are packed per unit area, power has become critical. Besides, turnaround time (time to market), reliability and verification hurdles today pose major challenges to IC design. Our work is motivated by these challenges rather than by the traditional focus on speed, and addresses them using high level synthesis and reconfigurable computing. The AHIR-V2 framework is the high level synthesis tool chain developed at IIT Bombay. ImageRC is the reconfigurable computing platform developed by Powai Labs Pvt. Ltd. (http://www.powailabs.com/). The scope of this project is to realize a reasonably complex design, the SPARC-V8 microprocessor, in this framework. The microprocessor system is realized as a standalone system in an FPGA using the ImageRC platform, with its own minimal operating system and a few supporting applications. We use this platform to study and enhance the performance, implementation and flow related aspects of the framework. The turnaround time of the complete product, direct HDL verification using a software testbench and the ease of implementation using the reconfigurable framework are highlighted in this project. The throughput per clock cycle of the design generated from the high level specification is not a primary study goal of this project.

1.2 Project Organization

In a nutshell, the overall processor development flow can be summarized as below.

C Model --(AHIR)--> VHDL Model --(IMAGE)--> FPGA prototype

Our work is organized in the following steps, starting from the C Model and ending with the FPGA prototype.

SPARC-V8 Microprocessor using AHIR-V2

1. We have developed the SPARC-V8 model in C, true to the specifications and conforming to the AHIR-V2 philosophy. This model is developed as a standalone execution system that has its own peripherals and memory subsystem.
2. For the execution model to be complete, we have developed applications which will be deployed on the final hardware. The applications test the system under all common working conditions, such as external peripherals, interrupts etc. The system is tested extensively on the host machine with this execution model.
3. The C model is converted to a VHDL model using the AHIR-V2 compiler chain.
4. This VHDL code is tested against the same applications used for software verification, with a suitable RTL verification tool such as Mentor ModelSim or GHDL.
5. The RTL is implemented in an FPGA using the ImageRC platform. The system should boot up with a minimal OS and applications as a full live system.
6. To further extend the usefulness of the system, we have developed an ARM AHB bus interface that takes care of interfacing with peripherals and memory which support the AHB bus protocol. The AHB wrapper is an AHIR to AHB bridge with support for full rate data transfer, i.e. data transfer of up to one word per clock cycle.

There are two main abstraction levels before bringing up the final system: the C level and the VHDL level. The entire development flow is summarized in Figure 1.1.

Figure 1.1: Processor Development Flow. (Diagram labels: Software, Hardware, Abstraction Levels; Virtual C Model: Application (C), Ahir Pipes, Processor C Model; AHIR; VHDL Model: Ahir FLI, Processor VHDL Model; IMAGE RC; Implementation (IMAGE Reconfigurable Computing Platform): ImageRC API, Processor FPGA.)

Dept. of Elec Engg.

IIT Bombay

2 AHIR - A Hardware Intermediate Representation

AHIR stands for A Hardware Intermediate Representation [1]. The AHIR compiler is the high level synthesis tool set developed at IIT Bombay. High level synthesis converts the algorithmic description of a digital system, written in a software language such as C/C++, into a hardware description language such as VHDL. In this project we use this compiler chain to convert the C SPARC model into a VHDL SPARC model.

Software compilers such as GCC (the GNU Compiler Collection, http://gcc.gnu.org/) do a very good job of transforming a high level language such as C into machine code. This is a fairly advanced field with mature optimization techniques such as software pipelining [4][5] and loop unrolling [6][5]. If we could reuse these optimizations and transform the output of the software compiler further into hardware, the resulting system could combine the best of both worlds.

The AHIR tools use a two-tier approach to the high level synthesis problem. The algorithmic specification in C is compiled to bytecode using a standard C compiler. The compiler used in this project is LLVM (Low Level Virtual Machine, University of Illinois at Urbana-Champaign, http://llvm.org/). The bytecode is converted into an intermediate representation called Ahir Assembly language, or Aa. Aa uses Petri nets as the basic structure for representing control flow. The Aa representation is optimized and subsequently transformed into VHDL through one more intermediate representation called vC (virtual circuit). This VHDL is used for hardware realization. Thus the full flow is as follows.

C --(llvm)--> C bytecode --(llvm2aa)--> Aa --(Aa2VC)--> vC --(vc2vhdl)--> VHDL

In AHIR, a specification is factorized into three components: control flow, data flow and storage. These components are orthogonal to each other, enabling analysis and optimizations to be applied to each independently. The AHIR framework and methodology is correct by construction, i.e., the hardware produced from the high level language is an exact functional transformation of the input.

AHIR has the potential to be used for a variety of digital design styles, including synchronous and asynchronous designs, owing to its Petri net representation. However, the current tool set implements only synchronous design and we have concentrated our work on the same.


At present, hardware verification is supported with Mentor Graphics ModelSim and GHDL. These functional verification tools are used in the hardware verification stage. In AHIR, modules can run independently in hardware, with appropriate handshaking protocols between them. This is analogous to POSIX threads (pthreads). The current version of this tool set is AHIR-V2.

2.1 Pipes in AHIR

A typical AHIR system communicates with the external world using pipes. A pipe is essentially a FIFO with handshaking signals that give the data transfer a blocking nature. AHIR provides functions for read and write operations on pipes. Once a read request is placed, execution is blocked until data has been written to the pipe and read from it. The read and the write can be in separate functions. Besides, the read and the write need not occur in any particular order: we can place a read request on a pipe before any data is written to it, and vice versa.

Figure 2.1: Interacting hardware threads in AHIR using pipes. (Diagram labels: AHIR Module-I and AHIR Module-II, each a thread in the software model, connected by Ahir Pipe I and Ahir Pipe II through read and write operations.)

Because execution is blocked when a pipe is used, pipes can also be used as a locking mechanism. This is a hardware mutex, analogous to the software mutex provided by the pthread_mutex functions.
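The blocking behaviour described above can be illustrated on the software side with a small bounded FIFO built on POSIX threads. This is only a sketch under stated assumptions: the names pipe_t, pipe_init, pipe_write, pipe_read and pipe_demo, and the depth of 4, are hypothetical and are not the AHIR pipe functions, which this text does not list.

```c
#include <pthread.h>
#include <stdint.h>

#define PIPE_DEPTH 4                     /* hypothetical FIFO depth */

/* A bounded FIFO whose read and write block, mimicking pipe handshaking. */
typedef struct {
    uint32_t buf[PIPE_DEPTH];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} pipe_t;

void pipe_init(pipe_t *p)
{
    p->head = p->tail = p->count = 0;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    pthread_cond_init(&p->not_full, NULL);
}

/* Blocks while the FIFO is full. */
void pipe_write(pipe_t *p, uint32_t value)
{
    pthread_mutex_lock(&p->lock);
    while (p->count == PIPE_DEPTH)
        pthread_cond_wait(&p->not_full, &p->lock);
    p->buf[p->tail] = value;
    p->tail = (p->tail + 1) % PIPE_DEPTH;
    p->count++;
    pthread_cond_signal(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
}

/* Blocks until some writer has supplied data, so a read may be
 * requested before the matching write has happened. */
uint32_t pipe_read(pipe_t *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->count == 0)
        pthread_cond_wait(&p->not_empty, &p->lock);
    uint32_t value = p->buf[p->head];
    p->head = (p->head + 1) % PIPE_DEPTH;
    p->count--;
    pthread_cond_signal(&p->not_full);
    pthread_mutex_unlock(&p->lock);
    return value;
}

static pipe_t demo_pipe;

/* Producer thread: writes the values 1..10 into the pipe. */
static void *producer(void *arg)
{
    (void)arg;
    for (uint32_t i = 1; i <= 10; i++)
        pipe_write(&demo_pipe, i);
    return NULL;
}

/* Consumer side: reads ten values and returns their sum. */
uint32_t pipe_demo(void)
{
    pthread_t t;
    uint32_t sum = 0;
    pipe_init(&demo_pipe);
    pthread_create(&t, NULL, producer, NULL);
    for (int i = 0; i < 10; i++)
        sum += pipe_read(&demo_pipe);
    pthread_join(t, NULL);
    return sum;
}
```

Since the producer writes more values than the FIFO can hold, both the full and the empty conditions are likely to be exercised; the consumer simply blocks whenever it runs ahead of the producer.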

2.2 Advantages

It is easier to express an algorithm in a software language such as C than in an HDL. Hardware design also becomes open to designers with a working knowledge of a software language. To describe hardware in an HDL like VHDL, the developer has to break the problem down into FSMs and code accordingly; an algorithmic approach is usually not enough. Further, since an HDL is a concurrent language, HDL coding requires a deep understanding of timing and hazard related issues. In a software language, the basic philosophy is to describe an algorithm sequentially and to branch or take decisions step by step. Concurrency is involved only when using software threads and associated mechanisms. Hence it is relatively easier to code in a software language than in an HDL. Similar arguments hold for functional verification. In principle, high level synthesis approaches such as AHIR do not need the complex verification steps necessary at the RTL level.

The penalty paid in such an approach is the efficiency of the generated circuit, notably its speed. However, there are numerous applications where the prime concern is not speed but the algorithmic complexity and the scale of the problem itself. Our approach using high level synthesis tools fits well in this situation. With high level synthesis, minute implementation details are abstracted away at higher levels, so productivity increases considerably and more complex algorithms can be implemented. Besides these advantages, high level synthesis enables designers to explore the design space much more efficiently [8].

Specific to AHIR, the same software testbench can be used for hardware simulation as well as software simulation. AHIR can write out the necessary foreign language interface functions in VHDL/C that serve this purpose. In AHIR, the generated hardware is an exact functional replica of the software model, as the methodology is correct by construction. Hence few or no modifications are required during the transformation to hardware.


3 IMAGE Reconfigurable Computing Platform

Image is the reconfigurable computing platform developed at Powai Labs Pvt. Ltd. An Image board is an FPGA board with the necessary API to load VHDL into the FPGA and seamlessly run applications on it from a host system. The memory available in the FPGA is configured as a dual port RAM, with API functions to read and write memory locations from the host side as well as from the FPGA.
Figure 3.1: Overview of the IMAGE System. (Diagram labels: Host System with Software Applications and IMAGE rcApi; PCI; FPGA Board with DP RAM and FPGA.)

Image along with AHIR enables us to develop the software and hardware together and debug them at four levels of abstraction:

1. Software level - Both the hardware (C model) and the software can be compiled into a single executable, which can be used to explore the architecture and to co-develop and debug the system.
2. Post AHIR HDL simulation - AHIR can write out the necessary interface functions with which the hardware (VHDL model) can be tested against the same software applications. Currently AHIR supports Mentor Graphics ModelSim and the GHDL simulator for HDL level simulation.
3. Post synthesis simulation - After synthesis, the resulting HDL mapped to a specific technology can be simulated with the same applications in much the same way as above.
4. Hardware level - This is the final level, i.e. after implementing the system on the Image FPGA board. To meet the extra debugging requirements at the hardware level, we may have to provide additional facilities in the software applications and/or in the hardware model.

3.1 IMAGE FPGA Board Details

Xilinx Spartan-3 FPGAs [9] are used for implementing the SPARC processor.
Figure 3.2: Xilinx Spartan-3 IMAGE Board. (Diagram labels: Host System with Software Applications and IMAGE rcApi; PCI; Xilinx Spartan-3 x 4 FPGA Board with FPGA00, FPGA01, FPGA10 and FPGA11; DP RAM of 16x4 = 64KB, of which 4KB is IMAGE reserved.)

A single Spartan-3 FPGA has 66K LUTs and 66K flip-flops. The SPARC processor does not fit into a single Spartan-3 FPGA, hence we have used an Image board with 4 Spartan-3 FPGAs. Each Spartan-3 FPGA has 16KB of memory available, so the total memory available to the system is 16 x 4 = 64KB. This is enough memory to deploy the applications that we intend to develop and implement at this stage. Later we will target a high end FPGA system such as the Xilinx Virtex-6 [10] for higher performance and more on-chip memory. Image reserves up to 4KB of the total available memory for internal housekeeping of the reconfigurable platform. Besides, there is a small hardware overhead for inter-FPGA communication and for supporting the RC API. We have used the Xilinx Spartan FPGA board because it is very cost effective and provides the necessary performance for our immediate research goal.

All the IOs in Image are memory mapped IOs. In addition, Image provides the necessary mutex mechanisms to prevent memory related hazards, using Dekker's algorithm [3].
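For readers unfamiliar with Dekker's algorithm, the following is a minimal textbook sketch for two contenders, written with C11 atomics and two pthreads standing in for the two sides that share the memory. It is not the Image implementation, whose details are not given here; the names dekker_lock, dekker_unlock, dekker_demo and the iteration count are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define ITERATIONS 20000              /* hypothetical workload per side */

static atomic_bool wants[2];          /* wants[i]: side i wants to enter */
static atomic_int  turn;              /* which side may insist on entering */
static long shared_counter;           /* stands in for a shared memory word */

static void dekker_lock(int me)
{
    int other = 1 - me;
    atomic_store(&wants[me], true);
    while (atomic_load(&wants[other])) {
        if (atomic_load(&turn) != me) {
            /* Back off and wait until it is our turn, then retry. */
            atomic_store(&wants[me], false);
            while (atomic_load(&turn) != me) { /* busy wait */ }
            atomic_store(&wants[me], true);
        }
    }
}

static void dekker_unlock(int me)
{
    atomic_store(&turn, 1 - me);      /* hand priority to the other side */
    atomic_store(&wants[me], false);
}

static void *worker(void *arg)
{
    int me = *(int *)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        dekker_lock(me);
        shared_counter++;             /* critical section */
        dekker_unlock(me);
    }
    return NULL;
}

/* Runs both sides concurrently and returns the final counter value. */
long dekker_demo(void)
{
    pthread_t t[2];
    int ids[2] = {0, 1};
    shared_counter = 0;
    atomic_store(&wants[0], false);
    atomic_store(&wants[1], false);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

The sketch relies on the sequentially consistent ordering that atomic_store and atomic_load provide by default; with plain variables the algorithm is not reliable on modern compilers and processors.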


4 The Virtual C Processor

The first step in our project is to develop a full execution model in C based on the SPARC V8 specification. This model is compiled and run on the host system. The processor is itself a multi-threaded system, with separate threads for core CPU functions, IO interfaces and memory interfaces. The system has to incorporate peripherals such as a CONSOLE and a KEYBOARD for user interaction. The CONSOLE serves as the display device and the KEYBOARD as the input device. Each of these devices is coded in C as a separate thread. These independent threads eventually map the console and the keyboard to the host system's terminal console and keyboard. The complete multi-threaded execution system can thus execute programs as an actual SPARC processor would, and acts as a virtual SPARC processor implemented in C.

To provide memory (RAM) for this virtual SPARC machine, a memory-map is used. The memory-map is an array in the host system's memory. It conforms to the Image board described in Section 3.1, which is used for the final hardware realization. The SPARC machine reads from and writes to this memory-map. Further, the peripherals work as memory mapped IO and use this memory-map for transferring data. The peripherals and the associated interrupts are explained in Chapter 5. The details of how a program is loaded into the memory-map are given in Section 4.3. A brief explanation of the processor and the SPARC-V8 standard is given below.
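A memory-map of this kind can be sketched as a byte array with word accessors. The sketch below assumes a 64KB array and stores words in big-endian byte order, as SPARC does; the names memory_map, mem_read_word and mem_write_word are hypothetical and the error handling (a return code instead of a trap) is a simplification.

```c
#include <stdint.h>

#define MEM_SIZE (64u * 1024u)        /* 64KB, matching the board memory */

static uint8_t memory_map[MEM_SIZE];  /* RAM of the virtual processor */

/* Returns 0 on success, -1 for a misaligned or out-of-range address. */
static int check_word_address(uint32_t addr)
{
    if ((addr & 3u) != 0 || addr > MEM_SIZE - 4u)
        return -1;
    return 0;
}

/* Store a 32-bit word, most significant byte first (big-endian). */
int mem_write_word(uint32_t addr, uint32_t value)
{
    if (check_word_address(addr) != 0)
        return -1;
    memory_map[addr]     = (uint8_t)(value >> 24);
    memory_map[addr + 1] = (uint8_t)(value >> 16);
    memory_map[addr + 2] = (uint8_t)(value >> 8);
    memory_map[addr + 3] = (uint8_t)value;
    return 0;
}

/* Load a 32-bit big-endian word into *out. */
int mem_read_word(uint32_t addr, uint32_t *out)
{
    if (check_word_address(addr) != 0)
        return -1;
    *out = ((uint32_t)memory_map[addr] << 24) |
           ((uint32_t)memory_map[addr + 1] << 16) |
           ((uint32_t)memory_map[addr + 2] << 8) |
            (uint32_t)memory_map[addr + 3];
    return 0;
}
```

Keeping the byte order explicit in the accessors means the model behaves the same on little-endian and big-endian hosts.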

4.1 SPARC V8 Specifications

The basic microprocessor works on the fetch-decode-execute-writeback cycle. It reads an instruction from the memory, decodes it, executes the decoded instruction and writes back the results to the memory if needed. SPARC (originally from Scalable Processor Architecture) is a RISC standard developed by Sun Microsystems. The important specifications of this standard are as follows.

- RISC architecture - SPARC-V8 has only 82 instructions altogether.
- Word length is 32 bits - Registers, the address bus and the data bus are 32 bits wide.
- Register Windows - SPARC offers an extensive register set as part of its RISC philosophy. The registers are grouped into register banks, or register windows. These are organized into a circular stack, with each bank having 16 registers. The number of register windows is implementation dependent. Our implementation employs 4 register windows. This corresponds to a total of 4 x 16 + 8 globals = 72 registers. All the registers are 32 bits wide.
- Delayed Execution - Some instructions, such as branch instructions, always take effect after the next instruction in the instruction stream has executed. This feature is called delayed execution. The number of delay slots is implementation dependent. Our model uses 1 delay slot.

For a complete and comprehensive list of SPARC-V8 features and specifications the reader may refer to [2].

To improve the performance of a microprocessor we should consider multi-core architectures, pipelining and a cache memory subsystem. Replicating an execution thread multiple times and sharing a critical common resource (e.g. memory) among the replicas is essentially the multi-core architecture. Similarly, pipelining can hide the latency and overhead incurred in the execution of individual units. This works well for a specialized unit such as an FFT: a common resource such as an ALU can be shared among different threads and utilized to its maximum potential. However, it does not work as well in a microprocessor, because the program can branch and execution does not strictly follow a predetermined path. At this stage we have concentrated on simplicity, and these features have been omitted. A different group under Prof. Madhav P Desai is working on the memory subsystem and cache architectures for such a scenario. This is one of the future research directions of this project.
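The register window arithmetic listed above (4 windows of 16 registers plus 8 globals) can be made concrete with a small sketch. It uses a commonly used mapping in which the out registers of one window are the in registers of the window below it; whether our C model uses exactly this indexing is not stated here, and the names phys_index, reg_read and reg_write are hypothetical.

```c
#include <stdint.h>

#define NWINDOWS      4    /* register windows in this implementation */
#define REGS_PER_WIN 16    /* registers contributed by each window */
#define NGLOBALS      8    /* r0..r7, shared by all windows */
#define NPHYS (NGLOBALS + NWINDOWS * REGS_PER_WIN)   /* 8 + 4*16 = 72 */

static uint32_t regfile[NPHYS];

/* Map architectural register r (0..31) seen from window cwp to a
 * physical index. r8..r15 are outs, r16..r23 locals, r24..r31 ins.
 * The ins of window w overlap the outs of window (w + 1) % NWINDOWS. */
unsigned phys_index(unsigned cwp, unsigned r)
{
    if (r < NGLOBALS)
        return r;
    return NGLOBALS +
           ((r - NGLOBALS) + cwp * REGS_PER_WIN) % (NWINDOWS * REGS_PER_WIN);
}

/* r0 (%g0) always reads as zero and ignores writes. */
uint32_t reg_read(unsigned cwp, unsigned r)
{
    return r == 0 ? 0u : regfile[phys_index(cwp, r)];
}

void reg_write(unsigned cwp, unsigned r, uint32_t value)
{
    if (r != 0)
        regfile[phys_index(cwp, r)] = value;
}
```

With this mapping, a caller running in window 1 that places an argument in its first out register (r8) finds it again as the first in register (r24) after the window pointer is decremented to 0, which is how arguments pass between windows without copying.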

4.2 Interfaces in the virtual processor

The SPARC CPU interacts with the external world using interfaces. All the interface signals, such as reset, interrupts, error out etc., are designed as memory mapped IOs in Image. The following naming conventions are used for interface signals:

1. The prefix pb stands for processor to bus (from processor to external), as in pb_error_out.
2. The prefix bp stands for bus to processor (from external to processor), as in bp_reset_in.

For a complete list of interface signals in our work, the reader may refer to Appendix A. As mentioned before, all interfaces are implemented as memory mapped IOs. The working scheme of one interface, the interrupt handler module, is explained next. The AHIR module for interrupts (the bp_IRL interface) checks a predefined IO location in the memory using the Image hardware functions. This location is also written by the interrupt handlers defined in the software routines on the application side, using the Image RC API. Thus information is exchanged between the processor and the ISR (interrupt service routine) through shared memory locations serving as IOs. All the interface signals are implemented in a similar manner.


Figure 4.1: Interface Signals. (Diagram labels: FPGA with Core CPU, Memory Interface and the signals pb_error_out, bp_reset_in and bp_IRL; IMAGE hardware API functions; DP RAM; IMAGE RC API; Applications on the Host System.)

The interface scheme used is illustrated in Figure 4.1.

4.3 Memory-Map and Program Loading

As mentioned before, the memory-map is an array inside the host system memory that acts as the RAM for the SPARC model. Currently the total size of this memory-map is 64KB. It is initialized as shown in Figure 4.2. The application program to be run on the virtual processor is written in C. It is compiled to SPARC object code using the gcc-sparc compiler. The object code is then disassembled to a hex dump. A Perl script, generateMemoryMap.pl, converts this hex dump to the form the SPARC model expects.

ex1.c --(gcc, disassemble)--> ex1.o.txt --(perl script)--> ex1_memorymap.hex


Figure 4.2: Memory Mapping. (Diagram labels: IMAGE reserved, FPGA interrupt table, keyboard, console, Stack; size markers 8KB and 64 KB.)

This hex file is loaded into the memory-map before running the SPARC CPU thread. Thus the memory-map is initialized. Next, a reset is applied and the processor is initialized to address 0. The instruction at location 0 is fetched and the normal fetch-decode-execute-writeback cycle follows. The processor then executes the instructions in the memory-map sequentially, one by one, unless it encounters a jump or an interrupt. The Image API takes care of the physical memory mapping of the RAM. Physically, the memory is split across the FPGAs. However, this is hidden from the designer. The mechanism of program loading is shown in Figure 4.3.

Extensive debugging is done on this virtual C processor with multiple applications. This virtual processor is the backbone of our project. After gaining reasonable confidence through elaborate testing, we port the C code to a VHDL model using AHIR-V2 HLS. Since the AHIR framework is correct by construction (Chapter 2), the VHDL model follows the C model functionally. The VHDL processor also uses the FLI interfaces of the virtual C processor for debugging. The only difference is that, instead of the SPARC CPU thread in C, a VHDL model runs in a suitable HDL verification tool such as Mentor Graphics ModelSim or GHDL. The FLI utilities communicate between the VHDL model running in the simulator and the rest of the virtual C processor. One of the main highlights of our project is the ease of debugging and bug fixing with the C model compared to a VHDL model. Also, the same C test applications can be used to test the VHDL model.

The next step is to implement the VHDL in an FPGA. The peripherals are mapped to actual peripherals at this stage. The system should boot up and run as a full-fledged live system. The FPGA acts as both a validation platform and a user platform.
Image provides basic validation facilities, such as checking whether the memory is initialized properly and whether reads and writes are successful. At present only this basic debugging facility is built into the hardware. However, extensive debugging logs and reports are available in the software model.
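The loading step described above can be sketched as follows. The exact format produced by generateMemoryMap.pl is not given in this chapter, so the sketch assumes, purely for illustration, a text image of whitespace-separated hexadecimal byte values loaded at consecutive addresses; the names load_hex_image and fetch_word are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define MEM_SIZE (64u * 1024u)

static uint8_t memory_map[MEM_SIZE];

/* Load whitespace-separated hex byte tokens (assumed format) starting
 * at address base. Parsing stops at the first token that is not a hex
 * number. Returns the number of bytes loaded, or -1 if a value does
 * not fit in a byte or the image runs past the end of the memory-map. */
long load_hex_image(const char *text, uint32_t base)
{
    uint32_t addr = base;
    const char *p = text;
    for (;;) {
        char *end;
        unsigned long byte = strtoul(p, &end, 16);
        if (end == p)
            break;                       /* no further hex token */
        if (byte > 0xFFul || addr >= MEM_SIZE)
            return -1;
        memory_map[addr++] = (uint8_t)byte;
        p = end;
    }
    return (long)(addr - base);
}

/* Fetch the big-endian instruction word at pc (0 if out of range). */
uint32_t fetch_word(uint32_t pc)
{
    if (pc > MEM_SIZE - 4u)
        return 0;
    return ((uint32_t)memory_map[pc] << 24) |
           ((uint32_t)memory_map[pc + 1] << 16) |
           ((uint32_t)memory_map[pc + 2] << 8) |
            (uint32_t)memory_map[pc + 3];
}
```

After a reset, a CPU thread built on this sketch would start with the program counter at 0 and call fetch_word(0) to obtain the first instruction.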


Figure 4.3: Program loading. (Diagram labels: C program, e.g. int main () { int I; }; GCC (SPARC) compile/disassemble; generateMemoryMap.pl; hex bytes such as 9d e3 bf ... ff ff written through the IMAGE API into the DP RAM; SPARC State Machine, HW (VHDL) / SW (C), accessing the DP RAM through the IMAGE API; VIRTUAL PROCESSOR in C, using pthreads. Notes on the figure: dumps logs; traps/interrupts not shown.)


5 External Interfaces and Peripherals

The SPARC processor interacts with the user through external peripherals. There are two peripherals, the CONSOLE and the KEYBOARD. The CONSOLE is the output peripheral, i.e., the display, and the KEYBOARD is the input peripheral. Both are memory mapped IOs in our work. The CONSOLE uses the memory map from 0x3000 onwards. Similarly, the KEYBOARD memory map starts at 0x6000. For further details on the current memory mapping scheme please refer to Figure 4.2. The peripheral interface system is shown in Figure 5.1. The virtual processor has software driver modules that initiate the interrupts. Owing to their standalone nature, these driver modules are separate pthreads inside the virtual processor itself. Eventually these threads map the CONSOLE and KEYBOARD operations to those of the host system.

5.1 Keyboard and Display

Keyboard
The working scheme of the KEYBOARD memory mapped IO is as follows.

1. The external peripheral writes data to a known location in the memory. The input is taken from the host system keyboard; the keyboard driver module acts as the external peripheral for the SPARC CPU thread.
2. The peripheral raises an interrupt to the processor. The current interrupt level for the keyboard is coded as 1A.
3. Once the interrupt is received, the processor reads the data from the memory.
4. When the processor read is complete, the peripheral may again write to the same location.

The low level library function receive_from_terminal() is used to read data from the terminal (KEYBOARD).
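The four keyboard steps above can be sketched as a small handshake over the memory-map. The base address 0x6000 comes from the text; everything else is an assumption made for illustration: the single-byte data location, the kbd_irq_pending flag standing in for the interrupt request, and the function names keyboard_device_put and cpu_keyboard_isr. The sketch is single-threaded, whereas the driver in our model runs as a separate pthread.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE      0x10000u   /* 64KB memory-map */
#define KEYBOARD_BASE 0x6000u    /* KEYBOARD memory map start (from the text) */

static uint8_t memory_map[MEM_SIZE];
static bool kbd_irq_pending;     /* hypothetical stand-in for the interrupt */

/* Peripheral side: steps 1 and 2. Returns false if the previous
 * character has not yet been consumed by the processor (step 4). */
bool keyboard_device_put(uint8_t ch)
{
    if (kbd_irq_pending)
        return false;
    memory_map[KEYBOARD_BASE] = ch;   /* write data to the known location */
    kbd_irq_pending = true;           /* raise the interrupt */
    return true;
}

/* Processor side: step 3. Returns the character read, or -1 if no
 * keyboard interrupt is pending. Clearing the flag lets the
 * peripheral write to the same location again. */
int cpu_keyboard_isr(void)
{
    if (!kbd_irq_pending)
        return -1;
    uint8_t ch = memory_map[KEYBOARD_BASE];
    kbd_irq_pending = false;
    return ch;
}
```

A display handshake would mirror this with the roles reversed: the processor writes to the CONSOLE location and the driver consumes the data.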

Display
In the CONSOLE, or display peripheral, the roles of the CPU and the peripheral are reversed compared to the keyboard. The working scheme for the display is as follows.

1. The CPU writes data to a known location in the memory.
2. The peripheral reads the data, displays it on the CONSOLE and raises an interrupt.

As in the case of the keyboard, the peripheral is a standalone driver module in the virtual processor, coded as a separate pthread. The data is eventually displayed on the host system's console. The interrupt is currently coded as interrupt level 1B. The low level library function send_to_terminal() is used to write data to the CONSOLE.

Figure 5.1: Peripheral Interface. (Diagram labels: HOST SYSTEM; Keyboard (System) and Keyboard Driver (pthread); Display (System) and Display Driver (pthread); VIRTUAL PROCESSOR in C, using pthreads; IMAGE API; DP RAM holding bytes such as 9d e3 bf ... ff ff; SPARC State Machine, HW (VHDL) / SW (C).)

The current peripherals and interrupts form a bare minimum scheme for debugging the SPARC model. The scheme does not strictly adhere to the Unix philosophy of interrupt handling. To improve the usability of the system, we have designed an interface to the ARM AHB bus standard [7] around the AHIR core module. However, this bus is not currently implemented, since the Image board (Figure 3.2) used for this project does not support the AHB protocol. A brief overview of the AHB bridge is provided next.


5.2

The AHIR-AHB Bridge

The AHIR-AHB Bridge is a wrapper around the AHIR core. The main function of the bridge is to provide full rate cycle accurate data transfer as per the AHB standards. This enables us to develop applications for a wide range of useful peripherals as ARM standards are widely used throughout the industry. The Ahir-AHB bridge is coded in VHDL to provide maximum speed and eciency. The complete system with Ahir-AHB bridge is shown in Figure 5.2

Figure 5.2: Complete System with AHIR-AHB Bridge

One side of this bridge is the AHB bus controller and the other side is the AHIR pipe controller. The AHB Controller takes care of the AHB protocol, and the Pipe Controller deals with the AHIR pipe protocol. The Pipe Controller has internal queues (FIFOs) to match the AHB rate with the AHIR pipe rate. The overall scheme is shown in Figure 5.3. The timing diagram of the AHIR-AHB Bridge is given in Appendix C.

Note: Further details on the implementation and timing characteristics of this bridge are available in the document AHB AhirPipe Bridge Design.pdf (internal document, Dept. of Electrical Engg., IIT Bombay).


Figure 5.3: AHB-AHIRPipe Bridge Overall Scheme
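The role of the internal queues in the Pipe Controller can be illustrated with a small behavioural model. The sketch below is written in C purely for illustration; the actual bridge is coded in VHDL, and the queue depth, type names and function names here are assumptions rather than details of the real design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4 /* hypothetical depth, chosen only for the example */

/* A fixed-depth word queue of the kind used to decouple two rates. */
typedef struct {
    uint32_t slots[FIFO_DEPTH];
    size_t   head;  /* index of the oldest stored word   */
    size_t   count; /* number of words currently stored  */
} word_fifo_t;

void fifo_init(word_fifo_t *f)
{
    f->head = 0;
    f->count = 0;
}

/* Producer side (for example, the faster bus): returns false when the
 * queue is full, which models the producer having to wait. */
bool fifo_push(word_fifo_t *f, uint32_t word)
{
    if (f->count == FIFO_DEPTH) {
        return false;
    }
    f->slots[(f->head + f->count) % FIFO_DEPTH] = word;
    f->count++;
    return true;
}

/* Consumer side (for example, the slower pipe): returns false when
 * the queue is empty, which models the consumer having to wait. */
bool fifo_pop(word_fifo_t *f, uint32_t *word)
{
    if (f->count == 0) {
        return false;
    }
    *word = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

A short burst from the producer is absorbed by the queue, and the consumer drains it in order at its own rate; only when the queue fills does the producer stall.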


6
Software
The common applications that run on a processor have been tested in the final hardware. These include arithmetic and logical operations, character and word display, character and word input, programs using loops and stacks, programs using trap handlers, nested function calls, and combinations of these. In addition, every instruction has been tested individually in SPARC assembly language on the same platform.

Further, we have implemented a basic Unix-like shell as the operating system (OS) for the SPARC processor. The OS initializes the trap table and the CONSOLE and KEYBOARD memory-mapped IOs. At present it accepts only two commands:

echo : echoes the word typed after it to the console.
exit : exits the OS (and the virtual C processor as well).

Any other command is reported as Cmd NOT implemented.
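The command handling described above can be sketched as a small dispatch function. This is an illustrative reconstruction, not the shell's actual source: the function name shell_execute, the status enum and the buffer sizes are assumptions, and the sketch uses the host C library rather than stdio_sparc.h.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { SHELL_CONTINUE, SHELL_EXIT } shell_status_t;

/* Interpret one input line and write the text to be displayed into
 * out. Only "echo <word>" and "exit" are recognised. */
shell_status_t shell_execute(const char *line, char *out, size_t out_size)
{
    char cmd[32] = "";
    char arg[64] = "";
    int fields;

    if (out_size > 0) {
        out[0] = '\0';
    }

    /* Split the line into a command and at most one following word. */
    fields = sscanf(line, "%31s %63s", cmd, arg);

    if (fields >= 1 && strcmp(cmd, "echo") == 0) {
        snprintf(out, out_size, "%s", arg);
        return SHELL_CONTINUE;
    }
    if (fields >= 1 && strcmp(cmd, "exit") == 0) {
        return SHELL_EXIT;
    }

    /* Everything else, including an empty line, is rejected. */
    snprintf(out, out_size, "%s", "Cmd NOT implemented");
    return SHELL_CONTINUE;
}
```

A caller would loop, reading a line, passing it to shell_execute(), printing the contents of out, and leaving the loop when SHELL_EXIT is returned.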

A screenshot of the OS is given in Figure 6.1.

Figure 6.1: OS screenshot

If more dedicated operating system support is required in the future, we intend to look into open-source operating systems developed for SPARC stations, such as RTEMS (www.rtems.org).

stdio_sparc.h
The stdio_sparc.h library provides the basic functions for the programmer's use. Currently it has functions for integer, character and word manipulation. All the library functions involving the CONSOLE and KEYBOARD use the low-level library functions send_to_terminal() and receive_from_terminal(), which are coded in the same library. For the list of functions currently available, the reader may refer to Appendix B.
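To illustrate how the word functions can be layered on the two low-level functions, a host-side sketch follows. The signatures of send_to_terminal() and receive_from_terminal() are not given in this report, so the single-character signatures below are assumptions, and the buffers console_buf and keyboard_src are hypothetical stand-ins for the memory-mapped CONSOLE and KEYBOARD.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the memory-mapped devices. */
static char        console_buf[128];
static size_t      console_len  = 0;
static const char *keyboard_src = "";

/* Assumed signature: send one character to the CONSOLE. */
void send_to_terminal(char c)
{
    if (console_len + 1 < sizeof console_buf) {
        console_buf[console_len++] = c;
        console_buf[console_len] = '\0';
    }
}

/* Assumed signature: fetch one character from the KEYBOARD.
 * Returns a newline once the simulated input is exhausted. */
char receive_from_terminal(void)
{
    return *keyboard_src ? *keyboard_src++ : '\n';
}

/* Display a NUL-terminated word, one character at a time. */
void put_word(char array[])
{
    for (size_t i = 0; array[i] != '\0'; i++) {
        send_to_terminal(array[i]);
    }
}

/* Read characters up to the next space or newline and terminate the
 * result with NUL. The caller must supply a large enough array. */
void get_word(char array[])
{
    size_t i = 0;
    char c;

    while ((c = receive_from_terminal()) != ' ' && c != '\n') {
        array[i++] = c;
    }
    array[i] = '\0';
}
```

Keeping all device access inside the two low-level functions means that only those two would need to change if the peripheral scheme were replaced, for instance by a bus-based interface.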


7
Results and Future Work

7.1 Completed work and current status
We have completed the virtual SPARC C-model, true to the specification. We have also integrated the necessary tools (gcc-sparc, disassembler) and the associated scripts (generateMemoryMap.pl), with which an application can be written in C, compiled and run on this virtual model. A complete software framework has been developed, along with the interrupt table and routines to emulate peripherals such as the keyboard and console. Trap routines are also integrated to support traps such as window underflow and window overflow, which occur very frequently during normal operation of the processor. Thus the associated peripherals and the program loading mechanism are also in place.

The virtual C-model has been tested with several applications covering normal instructions, stack operations and interrupts, with the devices (keyboard and console) attached. In addition, the instructions have been tested individually to ensure completeness and bug-free operation.

The C-model has been taken to the VHDL model, and the final bit-files to program the FPGAs have been generated. However, the live system on the FPGA has not been tested as of this report. An AHB-AHIR bridge has been developed in VHDL and verified using hardware simulations. This bridge is also yet to be integrated into the final system, as the current Image board does not support it.

Initially we had targeted fitting the design into two of the four FPGAs on the board. In order to balance the utilization between the FPGAs, all the instructions that access memory (the load and store class of instructions) are placed in the first FPGA, and the second FPGA implements all other instructions. This reduced the inter-FPGA communication and sped up the whole system. However, after synthesis, the utilization of one of the FPGAs went above 80%, while the other remained close to 75%. As a result, a large number of nets could not be routed in the more highly utilized FPGA.

We have observed that any utilization close to or above 80% results in very high congestion, and Xilinx ISE 14.2 is not able to route such a design on Spartan-3. Another observation is that the divide instruction of the SPARC takes many FPGA resources, close to 8K slices on the Spartan-3 FPGA, which is about 10% of the total resources available. Hence we decided to move the divider out of the FPGA to the software side. After this, the FPGA utilization is less than 75% and the design routes properly. Currently, each FPGA uses about 50K slices. The entire flow up to the hardware stage is automated using Makeflow.

7.2 Immediate extensions to this project

- Boot up the live system and evaluate its performance.
- Implement the divider in one of the remaining two FPGAs on the board.
- Carry out quantitative profiling to find the bottlenecks and associated issues.
- Integrate debug routines to support testing at the hardware level. One FPGA can be reserved just to provide debugging support and improve reliability.
- Move to the latest AHIR version. This has not been done yet because of a problem with the $switch statement in the Aa2VC tool.

7.3 Future scope

The performance of this system has to be evaluated thoroughly to identify potential bottlenecks and their solutions. This project exercises the AHIR flow at a reasonably complex level, and further enhancement of the AHIR flow is an important research goal of this project. From the processor optimization standpoint, cache memory and pipeline stages have to be developed, with an efficient cache hit/miss mechanism and the necessary pipeline flush in the event of a branch. Further, we will attempt multi-core architectures. The research goals also extend to reconfigurable and reliable computing. Fault-tolerant and dependable systems can be built in this framework with relatively little effort. The platform is expected to provide valuable insight into these areas as well.


A
Appendix - Interface signals and Interrupts.
Interface Signals
Name               Purpose (direction)                                              Memory Location
bp_IRL[3:0]        Interrupt. 0 - no interrupt, 15 - highest priority (input)       0x37D4
bp_reset_in        Reset signal (input)                                             0x37D8
pb_error           Processor in error mode (output)                                 0x37DC
bp_FPU_present     Floating unit present (input)                                    Not mapped currently
bp_FPU_exception   Floating unit exception (input)                                  Not mapped currently
bp_FPU_cc          Floating point condition code (input)                            Not mapped currently
bp_CP_present      Co-processor present (input)                                     Not mapped currently
bp_CP_exception    Co-processor exception (input)                                   Not mapped currently
bp_CP_cc           Co-processor condition code (input)                              Not mapped currently

Note: pb stands for processor to bus, i.e., output signal. bp stands for bus to processor, i.e., input signal.

Interrupt Levels
Name                 Level
Reset                00
Window overflow      05
Window underflow     06
Keyboard Interrupt   1A
Display Interrupt    1B
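For software that needs to refer to these values, the two tables of this appendix could be captured as C constants along the following lines. The identifier names are hypothetical, the levels are read here as hexadecimal, and the helper functions assume that the 4-bit request level occupies the low four bits of the word at the bp_IRL location; none of these details is stated in the tables themselves.

```c
#include <stdint.h>

/* Memory locations of the currently mapped interface signals. */
#define ADDR_BP_IRL      0x37D4u /* bp_IRL[3:0], input  */
#define ADDR_BP_RESET_IN 0x37D8u /* bp_reset_in, input  */
#define ADDR_PB_ERROR    0x37DCu /* pb_error, output    */

/* Levels from the Interrupt Levels table, read as hexadecimal
 * (an assumption of this sketch). */
enum interrupt_level {
    LEVEL_RESET            = 0x00,
    LEVEL_WINDOW_OVERFLOW  = 0x05,
    LEVEL_WINDOW_UNDERFLOW = 0x06,
    LEVEL_KEYBOARD         = 0x1A,
    LEVEL_DISPLAY          = 0x1B
};

/* Extract the request level, assuming it sits in the low four bits. */
static inline uint8_t irl_field(uint32_t word)
{
    return (uint8_t)(word & 0xFu);
}

/* A value of 0 means no interrupt; 15 is the highest priority. */
static inline int irl_pending(uint32_t word)
{
    return irl_field(word) != 0;
}
```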


B
Appendix - stdio_sparc.h
1. Integer functions
   (a) int get_int(void) - returns an integer from the terminal.
   (b) void put_int(int) - displays the integer on the terminal.
2. Character functions
   (a) char get_char(void) - returns a character from the terminal.
   (b) void put_char(char) - displays the character on the terminal.
3. Word functions
   (a) void get_word(char array[]) - takes a word from the terminal and stores it in the character array. The word is terminated with NUL.
   (b) void put_word(char array[]) - displays the word on the terminal. array[] should be terminated with NUL.
4. void exit(void) - exit function. This also exits the virtual execution environment.

Note: Low level functions are not shown.


C
Appendix - AHB Ahir Bridge

Figure C.1: AHB Timing Waveform

Bibliography

[1] Sameer D. Sahasrabuddhe, A competitive pathway from high-level programs to hardware specifications. PhD Thesis, Dept. of Electrical Engineering, IIT Bombay.
[2] The SPARC Architecture Manual, Version 8. http://www.sparc.com/standards/V8.pdf
[3] E.W. Dijkstra, Cooperating Sequential Processes. http://www.cs.utexas.edu/users/EWD/transcriptions/EWD01xx/EWD123.html
[4] J. Ruttenberg, G.R. Gao, A. Stoutchinin, and W. Lichtenstein, Software pipelining showdown: optimal vs. heuristic methods in a production compiler. In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, June 1996, pages 1-11.
[5] GCC optimizations. http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html ; LLVM optimizations. http://llvm.org/docs/Passes.html
[6] W.P. Petersen and P. Arbenz, Introduction to Parallel Computing. Oxford University Press, pages 9-12.
[7] AMBA Open Specifications. http://www.arm.com/products/system-ip/amba/amba-open-specifications.php
[8] Philippe Coussy and Adam Morawiec (Editors), High-Level Synthesis: From Algorithm to Digital Circuit. Springer. ISBN: 978-1-4020-8587-1
[9] Xilinx Spartan-3 Datasheet. http://www.xilinx.com/support/documentation/data_sheets/ds099.pdf
[10] Xilinx Virtex-6 Datasheet. http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf
