Project Report 2

Second Progress Report
on
A Reconfigurable VLIW Processor System

Submitted by
PAVAN NAIK PORIKA

(Registration No : 10VL16F ) of
MASTER OF TECHNOLOGY
in
VLSI Design
Under the guidance of
Mrs Aparna.P
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
SURATHKAL, SRINIVASNAGAR-575025
KARNATAKA, INDIA
DECEMBER 2011
Abstract
Field-Programmable Gate Arrays (FPGAs) are constantly improving in performance and area, and provide a technology platform that allows fast and complex reconfigurable designs. As a result, computer architectures based on reconfigurable hardware are becoming more popular. This project covers the design and implementation of a reconfigurable very long instruction word (VLIW) processor system. The processor is implemented as a softcore in Verilog on a field-programmable gate array (FPGA). This VLIW processor can exploit the data-level and instruction-level parallelism inherent in an application to make its execution faster. More importantly, we achieve our results while saving expensive FPGA area through the sharing of resources.
Contents
Abstract
Contents
List of Figures
1 INTRODUCTION
1.1 BASIC PROCESSOR DESIGN
1.2 INSTRUCTION SET ARCHITECTURE
2 32-BIT RISC PROCESSOR DESIGN
2.1 ARCHITECTURE
2.2 INSTRUCTION SET
2.3 RESULTS
3 TARGET FOR NEXT EVALUATION
3.1 VLIW PROCESSOR
3.2 CONTROL UNIT
3.3 IMPLEMENTATION
References
List of Figures
1 Block Diagram of Processor
2 Instruction set architecture
3 Schematic of the 32-bit RAM
4 Schematic of the 32-bit ROM
5 Simulated result of the 32-bit RAM
6 Simulated result of the 32-bit ROM
7 Simulated result of addition program
8 Simulated result of counter program
1
INTRODUCTION
VLIW processors are one of the main architectures that can exploit instruction-level parallelism (ILP) in a single-core processor. These architectures exploit ILP by issuing multiple operations per issue slot to additional functional units (FUs).
1.1
BASIC PROCESSOR DESIGN
The basic design of a single processor contains physically separated memories for program instructions and data. This implies that the width of the data bus may differ per memory type. This is especially useful for VLIW architectures, because we want to issue very wide words from the instruction memory. A four-stage design consisting of fetch, decode, execute, and writeback stages is used for this processor. The processor has four Arithmetic Logic Units (ALUs), two Multiplier units (MULs), one Control unit (CTRL), one Memory unit (MEM), a General-purpose Register (GR) file with 64 32-bit registers, and a Branch Register (BR) file with 8 1-bit registers.
[Block diagram: PC; instruction memory; fetch, decode, execute (four ALUs and two MULs), and write-back stages; CTRL and MEM units; GR and BR register files; data memory]
Figure 1: Block Diagram of Processor
Figure 1 depicts the organization of a 4-issue processor. The fetch unit fetches a VLIW instruction from the attached instruction memory and passes it on to the decode unit. In the decode stage, the instruction is split into syllables, and the register contents used as operands are fetched from the register files. The actual operations take place either in the execute unit or in one of the parallel CTRL or MEM units. ALU and MUL operations are performed in the execute stage. This stage is parametric in design, so that the number of ALU and MUL functional units can be adapted. The processor should have exactly one CTRL unit and one MEM unit, so these units are designed outside the parametric execute unit. All jump and branch operations are handled by the CTRL unit, and all data memory load and store operations are handled by the MEM unit. To ensure that all results destined for the GR and BR registers, the external data memory, and the internal Program Counter (PC) are written at the same time per instruction, all write activities are performed in the writeback unit.
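The effect of collecting all writes in the writeback unit can be illustrated with a small behavioural sketch in Python. This is not the Verilog design of the project; the names MachineState, PendingWrites and writeback are hypothetical, and the PC increment of 1 per instruction is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MachineState:
    # 64 x 32-bit general-purpose registers and 8 x 1-bit branch registers
    gr: List[int] = field(default_factory=lambda: [0] * 64)
    br: List[int] = field(default_factory=lambda: [0] * 8)
    data_mem: Dict[int, int] = field(default_factory=dict)
    pc: int = 0


@dataclass
class PendingWrites:
    # Results produced by the execute, CTRL and MEM units for ONE instruction
    gr: Dict[int, int] = field(default_factory=dict)
    br: Dict[int, int] = field(default_factory=dict)
    mem: Dict[int, int] = field(default_factory=dict)
    next_pc: Optional[int] = None  # set only by a taken jump or branch


def writeback(state: MachineState, pending: PendingWrites) -> None:
    """Commit every result of one VLIW instruction in a single step."""
    for index, value in pending.gr.items():
        state.gr[index] = value & 0xFFFFFFFF
    for index, value in pending.br.items():
        state.br[index] = value & 1
    for address, value in pending.mem.items():
        state.data_mem[address] = value & 0xFFFFFFFF
    # Assumption: the PC advances by one instruction unless CTRL supplies a target.
    state.pc = state.pc + 1 if pending.next_pc is None else pending.next_pc


# Two syllables of the same instruction swap GR1 and GR2. Both read the old
# values, because nothing is written before the writeback step.
state = MachineState()
state.gr[1], state.gr[2] = 7, 9
pending = PendingWrites(gr={1: state.gr[2], 2: state.gr[1]})
writeback(state, pending)
```

Because the operands are read before any result is committed, the two syllables in the example do not disturb each other, which is the property the writeback unit is meant to guarantee.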
1.2
INSTRUCTION SET ARCHITECTURE
Each syllable in this processor is 32 bits wide and each instruction contains 4 syllables, so the default instruction size of the processor is 128 bits, as shown in Figure 2. Because the processor contains 4 ALUs, every syllable can issue an ALU operation, and the other operations are distributed among the syllables. Syllable 0 can issue CTRL operations, syllables 1 and 2 can issue MUL operations, and syllable 3 can issue MEM operations.
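The syllable layout described above can be sketched in Python as follows. The sketch assumes that syllable 0 occupies the least significant 32 bits of the 128-bit instruction; the report does not state the bit ordering, and the names split_instruction and can_issue are hypothetical.

```python
SYLLABLE_BITS = 32
SYLLABLES_PER_INSTRUCTION = 4

# Functional units each syllable may address, as described in the text.
ALLOWED_UNITS = {
    0: {"ALU", "CTRL"},
    1: {"ALU", "MUL"},
    2: {"ALU", "MUL"},
    3: {"ALU", "MEM"},
}


def split_instruction(instruction):
    """Split a 128-bit instruction into four 32-bit syllables.

    Assumption: syllable 0 is stored in the least significant bits.
    """
    total_bits = SYLLABLE_BITS * SYLLABLES_PER_INSTRUCTION
    if not 0 <= instruction < (1 << total_bits):
        raise ValueError("instruction must fit in 128 bits")
    mask = (1 << SYLLABLE_BITS) - 1
    return [
        (instruction >> (SYLLABLE_BITS * slot)) & mask
        for slot in range(SYLLABLES_PER_INSTRUCTION)
    ]


def can_issue(slot, unit):
    """Return True if the given syllable slot may issue to the given unit."""
    return unit in ALLOWED_UNITS[slot]


word = 0x00000003000000020000000100000000
syllables = split_instruction(word)
```

With this ordering, `syllables` holds the values 0, 1, 2 and 3 for slots 0 to 3, and `can_issue(3, "MEM")` is true while `can_issue(0, "MUL")` is false.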
Figure 2: Instruction set architecture
2
32-BIT RISC PROCESSOR DESIGN
2.1
ARCHITECTURE
A 32-bit RISC processor is designed. It contains a 256 × 32 RAM, a 128 × 32 ROM, 64 general-purpose registers, an ALU that operates on 32-bit data, and a control unit that drives all control signals, such as chip select, read, write, and branch operations. The decoder divides each instruction into an opcode, a mode of operation, and register fields. From the opcode and mode of operation, it selects the operation in the execution unit, and the control unit generates signals such as chip select, read, and write. The RISC processor contains a dedicated instruction memory and a branch memory. The instruction memory holds the machine code of the program, and the program counter (PC) increments after the execution of each instruction so that the next instruction is fetched and executed. The branch memory holds the branch addresses; when a branch instruction is decoded, the branch address is copied to the PC so that execution continues at the specified branch address.
The schematic of the 32-bit RAM, shown in Figure 3, contains two data in/out ports and two address ports, so that two data words can be read or written at a time, and it has separate control signals for each port. The register memory has the same architecture as the RAM.
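A behavioural sketch of such a two-port memory is given below in Python. It is only a model of the idea, not the Verilog RAM of the project. The class name DualPortRAM, the port tuple layout, the read-before-write ordering within one access, and the rule that port B wins when both ports write the same address are all assumptions.

```python
class DualPortRAM:
    """Behavioural model of a 256 x 32 RAM with two independent ports."""

    def __init__(self, depth=256, width=32):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.cells = [0] * depth

    def access(self, port_a, port_b):
        """Perform one access on both ports.

        Each port is a tuple (chip_select, write_enable, address, data_in).
        Returns a tuple with the data read on each port, or None for a port
        that is deselected or writing.
        """
        ports = (port_a, port_b)
        for chip_select, _, address, _ in ports:
            if chip_select and not 0 <= address < self.depth:
                raise IndexError("address out of range")

        # Assumption: reads see the contents from before this access.
        outputs = tuple(
            self.cells[address] if chip_select and not write_enable else None
            for chip_select, write_enable, address, _ in ports
        )
        # Assumption: port B is applied last, so it wins a write conflict.
        for chip_select, write_enable, address, data_in in ports:
            if chip_select and write_enable:
                self.cells[address] = data_in & self.mask
        return outputs


ram = DualPortRAM()
ram.access((True, True, 5, 0xAB), (True, True, 6, 0x100000001))  # two writes
read_back = ram.access((True, False, 5, 0), (True, False, 6, 0))  # two reads
```

The second call reads both locations in one access; the value written through port B is truncated to 32 bits by the model.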
Figure 3: Schematic of the 32-bit RAM
The schematic of the 32-bit ROM, shown in Figure 4, contains two data out ports and two address ports, so that two data words can be read at a time, and it has separate control signals for each port.
Figure 4: Schematic of the 32-bit ROM
2.2
INSTRUCTION SET
The processor has 25 different instructions that perform arithmetic, logical, branch, and data transfer operations in 3 different modes. Mode0 instructions use register-register logic, in which all operations are performed on registers; Mode1 instructions use immediate mode, in which operations are performed on direct data; and Mode2 is used for branch instructions. The instructions of the processor are shown in Tables 1, 2, 3 and 4.
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ADD      0000001        00     XXX........XXX
ADDI     0000010        01     XXX........XXX
SUB      0000011        00     XXX........XXX
SUBI     0000100        01     XXX........XXX
INC      0000101        00     XXX........XXX
DEC      0000110        00     XXX........XXX
MUL      0000111        00     XXX........XXX
MULI     0001000        01     XXX........XXX
DIV      0001001        00     XXX........XXX
DIVI     0001010        01     XXX........XXX
Table 1: Arithmetic Instructions
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
XCHANG   0010100        00     XXX........XXX
MOV      0010000        00     XXX........XXX
MOVI     0010001        01     XXX........XXX
PUSH     0010010        00     XXX........XXX
POP      0010100        00     XXX........XXX
Table 2: Data transfer Instructions
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
JUMP     0110000        10     XXX........XXX
JUMPI    0110001        10     XXX........XXX
Table 3: Branch Instructions
OPCODE   MACHINE CODE   MODE   REG AND BRANCH ADDRESSES
ASHFTL   0100000        00     XXX........XXX
ASHFTR   0100001        00     XXX........XXX
LSHFTL   0100010        00     XXX........XXX
LSHFTR   0100011        00     XXX........XXX
NOT      0100100        00     XXX........XXX
NOTI     0100101        01     XXX........XXX
NAND     0100110        00     XXX........XXX
NANDI    0100111        01     XXX........XXX
NOR      0101000        00     XXX........XXX
NOPI     0101001        01     XXX........XXX
Table 4: Logical Operation Instructions
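The way the decoder separates the opcode and mode fields can be sketched in Python. The field positions used here (a 7-bit opcode in the most significant bits, followed by a 2-bit mode) are inferred from the machine words listed in Section 2.3 and are an assumption, as is the function name decode. The remaining 23 bits are returned undecoded because their layout is not specified. Table 2 lists the same machine code for XCHANG and POP, so the sketch reports that code as ambiguous.

```python
# Opcode table copied from Tables 1 to 4.
OPCODES = {
    "0000001": "ADD", "0000010": "ADDI", "0000011": "SUB", "0000100": "SUBI",
    "0000101": "INC", "0000110": "DEC", "0000111": "MUL", "0001000": "MULI",
    "0001001": "DIV", "0001010": "DIVI",
    "0010100": "XCHANG/POP",  # same code listed for both in Table 2
    "0010000": "MOV", "0010001": "MOVI", "0010010": "PUSH",
    "0110000": "JUMP", "0110001": "JUMPI",
    "0100000": "ASHFTL", "0100001": "ASHFTR", "0100010": "LSHFTL",
    "0100011": "LSHFTR", "0100100": "NOT", "0100101": "NOTI",
    "0100110": "NAND", "0100111": "NANDI", "0101000": "NOR", "0101001": "NOPI",
}

MODES = {"00": "register", "01": "immediate", "10": "branch"}


def decode(word):
    """Split a 32-character bit string into (opcode name, mode name, rest).

    Assumption: bits 31..25 hold the opcode and bits 24..23 hold the mode.
    """
    if len(word) != 32 or set(word) - {"0", "1"}:
        raise ValueError("expected a 32-character string of 0s and 1s")
    opcode_bits, mode_bits, rest = word[:7], word[7:9], word[9:]
    return (
        OPCODES.get(opcode_bits, "UNKNOWN"),
        MODES.get(mode_bits, "UNKNOWN"),
        rest,
    )


add_fields = decode("00000010000000100001000001100000")
movi_fields = decode("00100010100001000000000000001000")
```

Under these assumptions, the first word decodes as an ADD in register mode and the second as a MOVI in immediate mode, which agrees with the mnemonics printed next to those words in the results section.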
2.3
RESULTS
The simulated result of the 32-bit RAM is shown in Figure 5.
Figure 5: Simulated result of the 32-bit RAM
The simulated result of the 32-bit ROM is shown in Figure 6.
Figure 6: Simulated result of the 32-bit ROM
The simulated result of an addition program is shown in Figure 7.
MOVI reg[1] 15d2           (00100010100001000000000000001000)
MOVI reg[2] 15d1           (00100010100001000000000000000100)
ADD reg[1] reg[2] reg[3]   (00000010000000100001000001100000)
END                        (00000000000000000000000000000000)
Figure 7: Simulated result of addition program
The simulated result of a counter program is shown in Figure 8.
MOVI reg[2] 15d10          (00100010100001000000000000101000)
MOVI reg[3] 15d0           (00100010100001000000000000000000)
MOVB breg[0] 6d3           (00101010100001100000000000000000)
INC reg[1]                 (00001010000000100000100000000000)
JUMPC reg[3] 15b0          (01100011000000000000100001000000)
END                        (00000000000000000000000000000000)
Figure 8: Simulated result of counter program
3
TARGET FOR NEXT EVALUATION
The targets for the next evaluation are as follows:
3.1
VLIW PROCESSOR
A 4-issue VLIW processor is to be designed, with an instruction length of 128 bits, where each instruction contains 4 operations. The execution unit contains 4 ALUs and 2 multipliers. Because the instruction length is 128 bits, the decoder should divide the 128 bits into 4 smaller instructions so that the operations execute separately and simultaneously. The RAM and ROM are to be designed so that 8 data words can be read from or written to the memory at a time.
3.2
CONTROL UNIT
A special control unit is to be designed. This control unit has to generate the control signals that manage all ALUs, multipliers, general-purpose registers, and branch registers.
3.3
IMPLEMENTATION
After the VLIW processor is designed, its performance will be compared with that of the RISC processor by implementing both processors on an FPGA board.
References
[1] S. W. Fakhar Anjam and F. Nadeem, A Shared Reconfigurable VLIW Multiprocessor System, in Computer Engineering Laboratory, Delft University of Technology, Delft, The Netherlands.
[2] G. B. Stephen Wong, Thijs van, ρ-VEX: A Reconfigurable and Extensible VLIW Processor, in Delft University of Technology, Delft, The Netherlands.
[3] M. D. Ciletti, Modeling, synthesis, and rapid prototyping with the Verilog (TM) HDL, Recherche, vol. 67, p. 02, 1999.
[4] L. H. S. de Pablo, J.A. Cebrin, A very simple 8-bit RISC processor for FPGA.
