
Second Progress Report

on

A Reconfigurable VLIW Processor System


Submitted by

PAVAN NAIK PORIKA


(Registration No.: 10VL16F) of

MASTER OF TECHNOLOGY
in

VLSI Design
Under the guidance of

Mrs Aparna.P

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA SURATHKAL
SRINIVASNAGAR-575025, KARNATAKA, INDIA
DECEMBER 2011

Abstract

Field-Programmable Gate Arrays (FPGAs) are constantly improving in performance and area, and they provide a technology platform for fast and complex reconfigurable designs. As a result, computer architectures based on reconfigurable hardware are becoming more popular. This project concerns the design and implementation of a reconfigurable very long instruction word (VLIW) processor system. The processor is implemented as a softcore, written in Verilog, on a field-programmable gate array (FPGA). A VLIW processor can exploit both the data-level and the instruction-level parallelism inherent in an application and thereby execute it faster. More importantly, these results are achieved while saving expensive FPGA area through resource sharing.


Contents
Abstract
Contents
List of Figures
1 INTRODUCTION
   1.1 BASIC PROCESSOR DESIGN
   1.2 INSTRUCTION SET ARCHITECTURE
2 32-BIT RISC PROCESSOR DESIGN
   2.1 ARCHITECTURE
   2.2 INSTRUCTION SET
   2.3 RESULTS
3 TARGET FOR NEXT EVALUATION
   3.1 VLIW PROCESSOR
   3.2 CONTROL UNIT
   3.3 IMPLEMENTATION


List of Figures
1 Block Diagram of Processor
2 Instruction set architecture
3 Schematic of the 32-bit RAM
4 Schematic of the 32-bit ROM
5 Simulated result of the 32-bit RAM
6 Simulated result of the 32-bit ROM
7 Simulated result of addition program
8 Simulated result of counter program


1 INTRODUCTION

The VLIW architecture is one of the main architectures that can exploit instruction-level parallelism (ILP) in a single-core processor. It does so by issuing multiple operations in each instruction, one per issue slot, to additional functional units (FUs).
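In a VLIW machine, the grouping of independent operations into one wide instruction is done statically, typically by the compiler, rather than by the hardware. The sketch below illustrates the idea with a simple in-order greedy packer in Python; it is not part of the processor described in this report, and the names `Op` and `bundle` and the register names are hypothetical. An operation starts a new instruction when the current one already holds four operations (one per issue slot) or when it depends on a result produced in the current instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str    # mnemonic, e.g. "add" (hypothetical)
    dest: str    # destination register
    srcs: tuple  # source registers


def bundle(ops, width=4):
    """Greedily pack ops, in program order, into instructions of at most
    `width` operations. A new instruction is started when the current one
    is full, or when an op reads (RAW) or overwrites (WAW) a register that
    an op already in the current instruction writes."""
    bundles, current, written = [], [], set()
    for op in ops:
        conflict = op.dest in written or any(s in written for s in op.srcs)
        if len(current) == width or conflict:
            bundles.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        bundles.append(current)
    return bundles


program = [
    Op("add", "r3", ("r1", "r2")),
    Op("sub", "r4", ("r1", "r2")),
    Op("mul", "r5", ("r3", "r4")),  # needs r3 and r4, so it must wait
    Op("add", "r6", ("r1", "r1")),
]
schedule = bundle(program)
for i, ops in enumerate(schedule):
    print(i, [op.name for op in ops])
```

Four operations fit into two instructions here instead of four sequential issues; a real VLIW compiler would also reorder operations, which this in-order sketch does not attempt.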

1.1 BASIC PROCESSOR DESIGN

The basic design of a single processor contains physically separate memories for program instructions and data. This means that the data-bus width can differ between the two memory types, which is especially useful for VLIW architectures because very wide words must be issued from the instruction memory. The processor uses a four-stage design consisting of fetch, decode, execute and writeback stages. It has four Arithmetic Logic Units (ALUs), two Multiplier units (MULs), one Control unit (CTRL), one Memory unit (MEM), a General-purpose Register (GR) file with 64 32-bit registers and a Branch Register (BR) file with 8 1-bit registers.

Figure 1: Block Diagram of Processor (blocks: instruction memory, PC, fetch, decode, execute with four ALUs (A) and two multipliers (M), CTRL, MEM, write back, GR and BR register files, data memory)

Figure 1 depicts the organization of the 4-issue processor. The fetch unit fetches a VLIW instruction from the attached instruction memory and passes it to the decode unit. In the decode stage, the instruction is split into syllables, and the register contents used as operands are fetched from the register files. The actual operations take place either in the execute unit or in one of the parallel CTRL and MEM units. ALU and MUL operations are performed in the execute stage. This stage is parametric, so the number of ALU and MUL functional units can be adapted. The processor must have exactly one CTRL unit and one MEM unit, so these units are placed outside the parametric execute unit. All jump and branch operations are handled by the CTRL unit, and all data-memory load and store operations are handled by the MEM unit. To ensure that all results destined for the GR and BR register files, the external data memory and the internal Program Counter (PC) are written at the same time for each instruction, all write activity is performed in the writeback unit.
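The writeback rule above can be illustrated with a small behavioural model in Python. This is a sketch, not the Verilog implementation: the names `ArchState` and `writeback`, the `(target, index, value)` format of pending writes, the dictionary used for data memory and the PC advancing by one per instruction are all assumptions made for illustration. Because every unit only queues its result during execute and nothing is committed until writeback, all operations of one instruction see the register values from before that instruction.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    gr: list = field(default_factory=lambda: [0] * 64)  # 64 x 32-bit general-purpose registers
    br: list = field(default_factory=lambda: [0] * 8)   # 8 x 1-bit branch registers
    dmem: dict = field(default_factory=dict)            # data memory: address -> 32-bit word (assumed)
    pc: int = 0


def writeback(state, pending):
    """Commit all results of one VLIW instruction in a single step.
    `pending` holds (target, index, value) tuples queued by the ALU, MUL,
    CTRL and MEM units during execute (hypothetical format)."""
    for target, index, value in pending:
        if target == "gr":
            state.gr[index] = value & 0xFFFFFFFF  # keep 32 bits
        elif target == "br":
            state.br[index] = value & 1           # branch registers are 1 bit wide
        elif target == "mem":
            state.dmem[index] = value & 0xFFFFFFFF
        elif target == "pc":
            state.pc = value
        else:
            raise ValueError(f"unknown write target {target!r}")


state = ArchState()
state.gr[1], state.gr[2] = 5, 7

# Execute stage: two operations in the same instruction copy r2 -> r1 and
# r1 -> r2. Both read the register file before any write is committed.
pending = [
    ("gr", 1, state.gr[2]),
    ("gr", 2, state.gr[1]),
    ("pc", None, state.pc + 1),  # hypothetical: PC advances by one instruction
]
writeback(state, pending)
print(state.gr[1], state.gr[2], state.pc)  # 7 5 1
```

The swap works without a temporary register precisely because the writes are deferred; if each unit wrote its result immediately, the second operation would read the already-updated r1.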

1.2 INSTRUCTION SET ARCHITECTURE

Each syllable is 32 bits wide and each instruction contains 4 syllables, so the default instruction size of the processor is 128 bits, as shown in Figure 2. Because the processor contains 4 ALUs, every syllable can issue an ALU operation, and the other operation types are distributed among the syllables: syllable 0 can issue CTRL operations, syllables 1 and 2 can issue MUL operations, and syllable 3 can issue MEM operations.

Figure 2: Instruction set architecture
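The syllable layout and slot restrictions described above can be sketched in Python as follows. The bit ordering is an assumption: the sketch places syllable 0 in the most significant 32 bits (bits 127..96) and syllable 3 in bits 31..0, which the report does not specify. The names `SLOT_UNITS`, `split_syllables` and `check_bundle` are hypothetical.

```python
# Unit types each issue slot may use, as listed in Section 1.2.
SLOT_UNITS = {
    0: {"ALU", "CTRL"},
    1: {"ALU", "MUL"},
    2: {"ALU", "MUL"},
    3: {"ALU", "MEM"},
}


def split_syllables(instruction):
    """Split a 128-bit instruction into four 32-bit syllables.
    Assumption: syllable 0 occupies bits 127..96, syllable 3 bits 31..0."""
    if not 0 <= instruction < (1 << 128):
        raise ValueError("instruction must fit in 128 bits")
    return [(instruction >> (32 * (3 - i))) & 0xFFFFFFFF for i in range(4)]


def check_bundle(unit_per_slot):
    """Return True if each of the four syllables targets a unit type
    that its issue slot is allowed to use."""
    return len(unit_per_slot) == 4 and all(
        unit in SLOT_UNITS[slot] for slot, unit in enumerate(unit_per_slot)
    )


word = (0x11111111 << 96) | (0x22222222 << 64) | (0x33333333 << 32) | 0x44444444
print([hex(s) for s in split_syllables(word)])
print(check_bundle(["CTRL", "MUL", "MUL", "MEM"]))  # True
print(check_bundle(["MUL", "ALU", "ALU", "ALU"]))   # False: slot 0 cannot issue MUL
```

Restricting CTRL, MUL and MEM operations to fixed slots keeps the decoder simple: each slot only needs to route its syllable to a small, fixed set of units.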

2 32-BIT RISC PROCESSOR DESIGN

2.1 ARCHITECTURE

A 32-bit RISC processor has been designed. It contains a 256 × 32 RAM, a 128 × 32 ROM, 64 general-purpose registers, an ALU that operates on 32-bit data, and a control unit that generates all control signals, such as chip select, read, write and branch. The decoder divides each instruction into an opcode, a mode of operation and register fields. From the opcode and the mode, it selects the operation to be performed in the execution unit, and the control unit generates the chip-select, read and write signals. The RISC processor contains a dedicated instruction memory and a branch memory. The instruction memory holds the machine code of the program, and the program counter (PC) is incremented after each instruction is executed so that the next instruction is fetched and executed. The branch memory holds branch addresses: when a branch instruction is decoded, the branch address is copied into the PC so that execution continues at the specified branch address. The 32-bit RAM, whose schematic is shown in Figure 3, has two data in/out ports and two address ports, so two words can be read or written at a time, and each port has its own control signals. The register memory uses the same architecture as the RAM.

Figure 3: Schematic of the 32-bit RAM
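The PC update described in Section 2.1 amounts to a two-way choice between the incremented PC and an address read from branch memory. The Python sketch below models only that choice. The parameter `branch_index` (which branch-memory entry a branch uses) is hypothetical, since the report does not say how the entry is selected, and the branch targets in the example are made up.

```python
def next_pc(pc, is_branch, branch_memory, branch_index):
    """Return the PC of the next instruction.

    Normally the PC is incremented after each instruction. When a branch
    instruction is decoded, the target address stored in branch memory is
    copied into the PC instead. `branch_index` is a hypothetical selector
    for the branch-memory entry."""
    if is_branch:
        return branch_memory[branch_index]
    return pc + 1


branch_memory = [0, 5, 9]                   # hypothetical branch targets
print(next_pc(3, False, branch_memory, 0))  # 4: sequential execution
print(next_pc(3, True, branch_memory, 1))   # 5: branch to branch_memory[1]
```

In hardware this corresponds to a multiplexer in front of the PC register, with the branch signal from the control unit as its select input.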

The 32-bit ROM, whose schematic is shown in Figure 4, has two data-out ports and two address ports, so two words can be read at a time, and each port has its own control signals.

Figure 4: Schematic of the 32-bit ROM
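The dual-port RAM and ROM can be described with one behavioural model in Python, in which each port has its own chip select, write enable, address and data. This is a sketch under stated assumptions: the class name `MultiPortMemory`, the request-dictionary format and the read-first behaviour (a port that reads an address in the same cycle as a write sees the old contents) are choices made for illustration, not taken from the report. If two ports write the same address in one cycle, the later port in the argument list wins in this model.

```python
class MultiPortMemory:
    """Behavioural model of a memory with independent ports (hypothetical)."""

    def __init__(self, depth, ports=2, width=32, writable=True, init=()):
        self.mask = (1 << width) - 1
        self.ports = ports
        self.writable = writable
        self.words = [0] * depth
        for addr, value in enumerate(init):
            self.words[addr] = value & self.mask

    def cycle(self, *requests):
        """One clock cycle. Each request is a dict with keys cs (chip
        select), we (write enable), addr and din. Returns one output per
        port, or None for a port whose chip select is low."""
        if len(requests) != self.ports:
            raise ValueError(f"expected {self.ports} port requests")
        outputs, writes = [], []
        for req in requests:
            if not req.get("cs", False):      # chip select low: port idle
                outputs.append(None)
                continue
            addr = req["addr"]
            outputs.append(self.words[addr])  # read-first: old contents
            if req.get("we", False):
                if not self.writable:
                    raise PermissionError("this memory is read-only")
                writes.append((addr, req["din"] & self.mask))
        for addr, value in writes:            # commit writes after all reads
            self.words[addr] = value
        return tuple(outputs)


ram = MultiPortMemory(depth=256)  # 256 x 32 RAM with two ports
ram.cycle({"cs": True, "we": True, "addr": 3, "din": 0xDEADBEEF},
          {"cs": True, "we": True, "addr": 4, "din": 42})
print(ram.cycle({"cs": True, "addr": 3}, {"cs": True, "addr": 4}))  # (3735928559, 42)

rom = MultiPortMemory(depth=128, writable=False, init=[10, 20, 30])  # 128 x 32 ROM
print(rom.cycle({"cs": True, "addr": 0}, {"cs": True, "addr": 2}))  # (10, 30)
```

The `ports` parameter makes the same model usable for wider memories, such as the ones with 8 simultaneous accesses planned for the VLIW processor.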

2.2 INSTRUCTION SET

The processor has 25 different instructions for arithmetic, logical, branch and data-transfer operations, with 3 different modes. Mode 0 (00) instructions use register-register addressing, in which all operands are registers; Mode 1 (01) instructions use immediate addressing, in which an operand is supplied directly in the instruction; and Mode 2 (10) is used for branch instructions. The instructions of the processor are listed in Tables 1, 2, 3 and 4.

OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ADD      0000001        00     XXX........XXX
ADDI     0000010        01     XXX........XXX
SUB      0000011        00     XXX........XXX
SUBI     0000100        01     XXX........XXX
INC      0000101        00     XXX........XXX
DEC      0000110        00     XXX........XXX
MUL      0000111        00     XXX........XXX
MULI     0001000        01     XXX........XXX
DIV      0001001        00     XXX........XXX
DIVI     0001010        01     XXX........XXX

Table 1: Arithmetic Instructions

OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
XCHANG   0010100        00     XXX........XXX
MOV      0010000        00     XXX........XXX
MOVI     0010001        01     XXX........XXX
PUSH     0010010        00     XXX........XXX
POP      0010100        00     XXX........XXX

Table 2: Data transfer Instructions

OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
JUMP     0110000        10     XXX........XXX
JUMPI    0110001        10     XXX........XXX

Table 3: Branch Instructions

OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ASHFTL   0100000        00     XXX........XXX
ASHFTR   0100001        00     XXX........XXX
LSHFTL   0100010        00     XXX........XXX
LSHFTR   0100011        00     XXX........XXX
NOT      0100100        00     XXX........XXX
NOTI     0100101        01     XXX........XXX
NAND     0100110        00     XXX........XXX
NANDI    0100111        01     XXX........XXX
NOR      0101000        00     XXX........XXX
NORI     0101001        01     XXX........XXX

Table 4: Logical Operation Instructions
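The decoder's first step, separating the opcode and mode fields, can be sketched in Python. The field positions are an assumption inferred from the example machine words in Section 2.3: in those words the first seven bits match the machine codes in the tables and the next two bits match the listed mode, so the sketch takes the opcode from bits 31..25 and the mode from bits 24..23. The operand layout of the remaining 23 bits is not documented and is left undecoded. The lookup table covers Tables 1 and 3 and part of Table 2; XCHANG and POP are omitted because both are listed with code 0010100, and Table 4 entries could be added the same way. The names `OPCODES`, `MODES`, `decode_fields` and `describe` are hypothetical.

```python
# Machine code -> mnemonic, taken from Tables 1, 2 (partial) and 3.
OPCODES = {
    0b0000001: "ADD", 0b0000010: "ADDI", 0b0000011: "SUB", 0b0000100: "SUBI",
    0b0000101: "INC", 0b0000110: "DEC", 0b0000111: "MUL", 0b0001000: "MULI",
    0b0001001: "DIV", 0b0001010: "DIVI",
    0b0010000: "MOV", 0b0010001: "MOVI", 0b0010010: "PUSH",
    0b0110000: "JUMP", 0b0110001: "JUMPI",
}
MODES = {0b00: "register-register", 0b01: "immediate", 0b10: "branch"}


def decode_fields(word):
    """Split a 32-bit instruction word into (opcode, mode, operand bits).
    Assumed layout: opcode = bits 31..25, mode = bits 24..23."""
    opcode = (word >> 25) & 0x7F
    mode = (word >> 23) & 0x3
    operands = word & ((1 << 23) - 1)
    return opcode, mode, operands


def describe(word):
    """Return (mnemonic, mode name) for a 32-bit instruction word."""
    if word == 0:
        return "END", None  # the listings in Section 2.3 encode END as all zeros
    opcode, mode, _ = decode_fields(word)
    return OPCODES.get(opcode, "UNKNOWN"), MODES.get(mode, "UNKNOWN")


print(describe(0b00100010100001000000000000001000))  # ('MOVI', 'immediate')
print(describe(0b00000010000000100001000001100000))  # ('ADD', 'register-register')
```

In the hardware decoder the same split is just a slice of the instruction bus; the opcode and mode then drive the operation select and the control signals.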

2.3 RESULTS

The simulated result of the 32-bit RAM is shown in Figure 5.

Figure 5: Simulated result of the 32-bit RAM

The simulated result of the 32-bit ROM is shown in Figure 6.

Figure 6: Simulated result of the 32-bit ROM

The simulated result of an addition program is shown in Figure 7. The program and its machine code are:

MOVI reg[1] 15'd2           (00100010100001000000000000001000)
MOVI reg[2] 15'd1           (00100010100001000000000000000100)
ADD  reg[1] reg[2] reg[3]   (00000010000000100001000001100000)
END                         (00000000000000000000000000000000)

Figure 7: Simulated result of addition program
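A small instruction-level model of the addition program gives a reference value to compare against the waveform. The Python sketch below is hypothetical: it assumes that `ADD ra rb rd` writes `reg[rd] = reg[ra] + reg[rb]` (the operand order is inferred from the listing, not documented), that `MOVI rd imm` loads the immediate into `reg[rd]`, and that END stops execution.

```python
def run(program, num_regs=64):
    """Execute a list of (mnemonic, *operands) tuples on a register file of
    `num_regs` 32-bit registers. Only MOVI, ADD and END are modelled."""
    regs = [0] * num_regs
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "END":
            break
        if op == "MOVI":
            rd, imm = args
            regs[rd] = imm & 0xFFFFFFFF
        elif op == "ADD":
            ra, rb, rd = args  # assumed operand order: sources first, destination last
            regs[rd] = (regs[ra] + regs[rb]) & 0xFFFFFFFF
        else:
            raise NotImplementedError(op)
        pc += 1
    return regs


addition = [("MOVI", 1, 2), ("MOVI", 2, 1), ("ADD", 1, 2, 3), ("END",)]
regs = run(addition)
print(regs[1], regs[2], regs[3])  # 2 1 3
```

Under these assumptions reg[3] should hold 3 when END is reached, which is the value to look for on the register-file outputs in the simulation.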

The simulated result of a counter program is shown in Figure 8. The program and its machine code are:

MOVI  reg[2] 15'd10   (00100010100001000000000000101000)
MOVI  reg[3] 15'd0    (00100010100001000000000000000000)
MOVB  breg[0] 6'd3    (00101010100001100000000000000000)
INC   reg[1]          (00001010000000100000100000000000)
JUMPC reg[3] 15'b0    (01100011000000000000100001000000)
END                   (00000000000000000000000000000000)

Figure 8: Simulated result of counter program

3 TARGET FOR NEXT EVALUATION

The targets for the next evaluation are as follows:

3.1 VLIW PROCESSOR

A 4-issue VLIW processor is to be designed with a 128-bit instruction that contains 4 operations. The execution unit will contain 4 ALUs and 2 multipliers. Since the instruction is 128 bits long, the decoder must divide it into 4 smaller 32-bit operations so that they can be executed separately and simultaneously. The RAM and ROM are to be designed so that 8 data words can be read from or written into memory at a time.

3.2 CONTROL UNIT

A special control unit is to be designed. It has to generate the control signals that manage all ALUs, multipliers, general-purpose registers and branch registers.

3.3 IMPLEMENTATION

After the VLIW processor has been designed, its performance will be compared with that of the RISC processor by implementing both processors on an FPGA board.

