
METHODOLOGY FOR SYNTHESIS, TESTING, AND VERIFICATION OF PIPELINED ARCHITECTURE PROCESSORS FROM BEHAVIORAL-LEVEL-ONLY HDL CODE AND A CASE STUDY EXAMPLE
J. Robert Heath* and Sreenivas Durbha
Dept. of Electrical Engineering, 453 Anderson Hall, University of Kentucky, Lexington, KY 40506
Heath @ enpr.uky.edu, sdurbha@ ~cocd2.intel.com
*Corresponding Author

ABSTRACT
A goal of computer designers is to reduce the development cycle time for complex pipelined architecture core processor systems. A research effort is described which had a major objective of determining whether an approach and methodology could be developed that allow complex pipelined architecture processors with stringent system functional, timing, and performance requirements to be correctly and efficiently synthesized from a high behavioral-level-only HDL design description, thus reducing development cycle time. A second research objective was to synthesize to target FPGA technology using primarily standard, available PC based CAD tools. Contributions include a developed approach and methodology, which are verified by presenting the results of a case study example that resulted in the correct synthesis of an FPGA prototype of a pipelined architecture processor described in behavioral-level-only HDL. Correct synthesis was verified via experimental testing of the processor prototype.

Key Words: High Level Design and Synthesis, HDLs, Computers, FPGAs, Prototyping, Testing, Verification.

I. INTRODUCTION AND BACKGROUND

This section discusses the motivation for the reported research and the goals and objectives of the research.

A. Goals and Objectives

The goal of this research was to determine and describe how a contemporary pipelined architecture processor (a MIPS R2000 was used as a case study example) can be captured in high behavioral-level-only Verilog HDL code such that a prototype of the processor would synthesize and fit into a moderate sized FPGA currently in commercial use and would, hence, allow verification of a correct behavioral level processor HDL description and synthesis via experimental prototype testing. This entails tailoring the behavioral code to achieve efficient (minimum) use of the FPGA resources, thus providing an easier means of FPGA prototyping with results comparable to those achieved using current techniques of largely structural HDL coding. This largely depends on the current generation of Computer Aided Design (CAD) tools. Hence a useful offshoot of the research problem is to show how best to utilize current CAD tools to suit our system design, design capture, prototype synthesis, and prototype experimental testing goals. The entire work was carried out in a Windows 95 PC based environment using three different kinds of CAD tools, from three different vendors, for simulation and synthesis purposes.

B. Previous Related Research

This research differs from previous research with similar goals in that the input HDL code to be used for synthesis is written in a purely high level behavioral fashion for a complex timing-critical system, except for the memory elements, which were drawn from the vendor provided library of components to allow a more efficient synthesis in terms of FPGA chip resources [1-4]. What chiefly distinguishes this research from previous efforts is that a very easy and economical (in terms of FPGA resource utilization) design was obtained that embedded into a single ORCA 2C40A FPGA chip with almost no constraints imposed on it. This was achieved by partitioning the code in a manner that made it easy for the synthesis tools to understand while maintaining its purely behavioral character [6]. Also, where the CAD tool used to generate the netlists for synthesis was not compatible with the tool used for mapping, place and route, and bitstream generation for the final download, ways were devised to get around the resultant problems [6]. The code used in pre-synthesis simulation was almost the same as that used in synthesis, apart from a few minor changes [6].

0-7803-6748-0/01/$10.00 ©2001 IEEE

II. MIPS R2000 INSTRUCTION SET ARCHITECTURE OVERVIEW

We now briefly overview the Instruction Set Architecture (ISA) of the MIPS R2000 pipelined architecture processor [5] for which a behavioral level HDL description will be developed, synthesized, and tested for verification of a correct behavioral level design capture and synthesis.

A. Specification of MIPS R2000 Architecture


The following are the specifications of our version of the MIPS R2000 architecture.
- Instruction Word Length: 32 bits.
- Data Word Length: 32 bits.
- Instruction Memory (IM) Cache Size: 128 bytes (32 x 32).
- Data Memory (DM) Cache Size: 128 bytes (32 x 32).
- Register File (RF) Size: 128 bytes (32 x 32).
- A Program Counter (PC).
- Two adders, one that increments the PC and another that helps offset the PC during a jump instruction.
- Arithmetic Logic Unit (ALU) functionality.
- Control and Data Hazard Detection functionality.
- Forwarding functionality to nullify effects of data as well as control hazards.
- Branch Detect functionality.
- Controller functionality.
- Four pipeline stage registers interspersing five pipeline stages to hold the outputs of the functional units in each of the pipeline stages. They are given the mnemonics IF/ID for the pipeline register between the Instruction Fetch (IF) and Instruction Decode (ID) pipeline stages; ID/EX for the pipeline register between the ID and Instruction Execute (EX) stages; EX/MEM for the register between the EX and Memory Access (MEM) stages; and MEM/WB for the pipeline register between the MEM and the Write Back (WB) pipeline stages.
- Maskable Hardware Vectored Priority Interrupt System (MHVPIS) functionality to handle all interrupts.
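As an illustration of how a 32-bit instruction word is split into fields, the following Python sketch decodes the three MIPS instruction formats (R, I, and J). It assumes the standard MIPS field layout and standard opcode values as given in textbook treatments such as [5]; the exact opcode assignments used in our version of the processor are not listed in this paper, so the function name `decode` and the opcode tests below are assumptions for illustration only.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields.

    Assumes the standard MIPS layout: opcode in bits 31..26, then
    rs/rt/rd/shamt/funct (R format), rs/rt/imm (I format), or a
    26-bit target address (J format).
    """
    word &= 0xFFFFFFFF
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if opcode == 0:  # R format (ALU register-register instructions)
        return {
            "format": "R", "opcode": opcode, "rs": rs, "rt": rt,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if opcode in (2, 3):  # J format (standard MIPS j / jal opcodes)
        return {"format": "J", "opcode": opcode, "address": word & 0x03FFFFFF}
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {"format": "I", "opcode": opcode, "rs": rs, "rt": rt, "imm": imm}


# Standard-MIPS encodings used purely as examples.
print(decode(0x00630820))  # add $1, $3, $3
print(decode(0x8C280000))  # lw  $8, 0($1)
print(decode(0x08000002))  # j   2
```

Decoding by opcode first and then extracting only the fields that the format defines mirrors how a controller in the ID stage would typically select its control signals.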

B. Instruction Formats, Addressing Modes and Architecture

MIPS assembly language instructions use three different instruction formats. The R-Format instructions are the ALU instructions, which operate on two register source operands with the result placed in a third, destination register. The R-type instructions that we implement are ADD, SUB, AND, OR, Set-On-Less-Than (slt), Shift Left Logical (sll), Shift Right Logical (srl), and XOR. The load (lw) and store (sw) instructions are I format. The conditional branch instructions, Branch-Equal (beq) and Branch-Not-Equal (bne), are also I format. Other I format instructions implemented were the ADDI, ORI, ANDI, and slti (set-on-less-than-immediate) instructions. The Jump (j) instruction (J Format) is not conditional.

The MIPS addressing modes are [5]: register, displacement, immediate, PC relative, and pseudodirect. We implemented enough of the full instruction set to require all basic functionality found in a commercial version of the processor and to allow writing and execution of the test programs required to verify all basic functionality of the processor and its instruction set. Our version of the processor implements the above 17 assembly language instructions. Its organization and architecture are shown in Fig. 2.1, including the Forwarding and Hazard Detection Units.

III. BEHAVIORAL LEVEL VERILOG HDL DESCRIPTION OF PROCESSOR

We briefly discuss the approach and strategy followed to develop a behavioral description of the above described pipelined ISA in Verilog HDL. We address the factors that were considered when developing the Verilog code, keeping in mind the goal of the research effort, which was to make the entire design described in behavioral code efficiently and correctly embed into an FPGA chip, implying utilization of a minimum of FPGA chip resources. While writing the behavioral code, alternative ways of representing the same functionality in Verilog HDL were studied. Changes made in the behavioral code to make it properly synthesize were addressed. Paper page constraints do not allow detail related to the above issues. Detailed discussion and examples of the issues are found in [6].

A. Approach and Method of Behavioral Description of Pipelined Processor

Since the pipeline is organized to keep the flow of instructions from the IF stage to the WB stage, and each pipeline stage has the functional units that are most accessed by the units nearest to them, the behavioral description was laid out in a hierarchical manner as illustrated in Fig. 3.1. A bottom-up approach to pre-synthesis simulation is followed in testing and verification of the code for the pipelined processor. This hierarchical layout of the code turned out to be the best way to write behavioral level code for the processor, since each of the modules could be individually tested before interfacing it with the others. In Fig. 3.1, all modules except those used in the memory (both IM and DM) and the Register File were written at a behavioral level. The modules at the bottom of the schematic, namely scuba-im.v, scuba-RlR2.v, and scuba-dm.v, are Verilog HDL modules generated by the SCUBA tool of the ORCA Foundry Development System. This was necessitated because the size of the memory and the register files was large with respect to the logic available on the FPGA. It was also easier for the pre-synthesis simulation tool, SILOS III, to handle the smaller modules during simulation. Throughout the code-writing phase of the research project, the architecture of each of the units and its interface with the other units were worked out with the aid of high-level functional diagrams of the units. All signals must interface with their counterparts in successive modules.


B. Nuances of Behavioral Verilog Description


Most present day synthesis tools support only certain types of behavioral Verilog [7]. Simulation tools, on the other hand, support most behavioral constructs [8]. The synthesizable behavioral constructs we used in writing the Verilog HDL code for our pipelined processor, based on the set of PC based CAD tools we used for design capture, pre-synthesis simulation, and synthesis, are summarized in Fig. 3.2. Fig. 3.2 also indicates behavioral level constructs which will not normally synthesize. We feel the behavioral constructs we used would work for most CAD tool sets. Examples, advantages, and disadvantages of using each of those constructs are discussed in [6]. Finally, we summarize the high-level view of the functionality and control action within each pipeline stage that formed the primary consideration in writing the behavioral level Verilog HDL code for the processor [6]. Referring to Fig. 3.1, processor functionality was behaviorally coded in the following order: 1) The IF Pipeline Stage, 2) The ID Pipeline Stage including the Controller and Hazard Detection functionality, 3) The EX Pipeline Stage, 4) The MEM Pipeline Stage, 5) The WB Pipeline Stage, 6) The Instruction and Data Cache Memory, 7) The Forwarding functionality, 8) The Maskable Hardware Vectored Priority Interrupt System (MHVPIS) functionality, and 9) The Highest Level Module containing all of the above functional units.

IV. HIGH LEVEL SYNTHESIS OF PIPELINE PROCESSOR PROTOTYPE


High level synthesis is the automatic synthesis of a design structure from a behavioral specification, at levels above and including the logic level.

Figure 3.1: The Behavioral Level HDL Hierarchy of the Pipelined Processor. Modules shown include forward.v (Forwarding Unit), hazard.v (Hazard Detection), dm.v (Data Memory), im.v (Instruction Memory), RegFile.v (Register File), adder.v (32-bit adder), scuba-im.v, and scuba-RlR2.v.


A. The Design Flow


Every hardware design project has a design flow that will best achieve the goals of the project; namely, to start with the design specifications and arrive at an end product with the desired functionality. The flow targets a design cycle of minimum time and maximum reliability.

B. Synthesis Results and FPGA Resource Utilization


This section presents the results of the synthesis in terms of the resource utilization of the ORCA 2C40A FPGA. A resource utilization summary for the top-level pipelined processor core-only synthesis indicates that 61% of the 900 Programmable Functional Units (PFUs) are occupied by the design. Only 3% of the tristate buffers are used. The tristate buffers are chiefly used in the SCUBA generated memory elements. Input/output (I/O) utilization is 78% of the I/O resources.

Because of FPGA chip I/O pin limitations, and since an ORCA Evaluation Board containing a 7-segment LCD display was used to experimentally test the synthesized pipelined processor core, a test multiplexor was programmed into the ORCA 2C40A FPGA chip along with the processor core to increase the number of signals, busses, register outputs, etc. that could be multiplexed to the LCD displays. A resource utilization summary for the synthesis of the pipelined processor core together with the multiplexor test circuitry indicates that 74% of the PFU area is occupied by the design. Only 3% (256R200) of the tristate buffers are used. The I/O utilization is 11% of the I/O resources.
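The PFU percentages quoted above can be converted into approximate PFU counts, which also gives a rough figure for the cost of the test multiplexor. The short Python sketch below does this arithmetic; since the reported percentages are rounded, the resulting counts are estimates only, and the helper name `pfus_used` is hypothetical.

```python
TOTAL_PFUS = 900  # PFUs available on the ORCA 2C40A, as stated above


def pfus_used(percent, total=TOTAL_PFUS):
    """Approximate number of PFUs occupied for a reported percentage."""
    return total * percent // 100


core_only = pfus_used(61)          # processor core alone
core_plus_mux = pfus_used(74)      # core plus test multiplexor
test_mux_overhead = core_plus_mux - core_only

print(core_only, core_plus_mux, test_mux_overhead)
```

Under these assumptions the core occupies roughly 549 PFUs and the test circuitry adds on the order of 117 more.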

Synthesizable Constructs of Verilog HDL

and, arrays, always, assign, begin, buf, bufif0, bufif1, case, casex, casez, compiler directives, concatenation, default, disable, end, forever, for, function, input, inout, instantiation, if, integer, memories, module, not, notif0, notif1, negedge, nor, nand, or, output, operators, parameter, posedge, real, reg, repeat, supply0, supply1, tri, task, trior, while, wire, wor, xor, xnor.

Unsynthesizable Constructs of Verilog HDL

time, defparam, $finish, fork, join, initial, delays, User Defined Primitives (UDPs), wait.

Figure 3.2: Synthesizable and Unsynthesizable Verilog Constructs
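One practical use of the list in Fig. 3.2 is to scan behavioral source code for constructs that are not expected to synthesize before handing it to the synthesis tool. The Python sketch below is a deliberately crude illustration of that idea and is not part of the tool flow used in this work: it only strips `//` comments, does not understand block comments or strings, does not detect UDPs, and the function name `flag_unsynthesizable` is hypothetical.

```python
import re

# Keywords taken from the "unsynthesizable" list of Fig. 3.2.
UNSYNTHESIZABLE = ("time", "defparam", "$finish", "fork", "join", "initial", "wait")


def flag_unsynthesizable(source):
    """Return (line number, construct) pairs for suspicious constructs."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop single-line comments
        for word in UNSYNTHESIZABLE:
            # Match the keyword only as a whole token.
            pattern = r"(?<![\w$])" + re.escape(word) + r"(?![\w$])"
            if re.search(pattern, code):
                hits.append((lineno, word))
        if re.search(r"#\s*\d", code):  # explicit delay such as "#10"
            hits.append((lineno, "delay"))
    return hits


sample = "module t;\n  initial begin\n    #10 $finish;\n  end\nendmodule\n"
for lineno, construct in flag_unsynthesizable(sample):
    print(lineno, construct)
```

A check of this kind fits the observation that the simulation-ready and synthesis-ready code differed only slightly: testbench-only constructs are the ones such a scan would report.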


The design capture of the processor was tested and verified by simulating it with test-benches using the SimuCAD Silos Simulation Environment, SILOS III [7]. Synopsys FPGA Express [9] was then used to generate synthesizable netlists from the behavioral Verilog HDL input. Little to no change was made in going from the simulation-ready HDL to the synthesis-ready HDL. A netlist obtained in EDIF format was then input to the Map, Place and Route tools of the ORCA Foundry Control Center (OFCC) [10]. The resulting NCD (circuit description) format file was then used to generate the bit-stream to be downloaded into the ORCA 2C40A FPGA. An ORCA evaluation board, containing the FPGA chip with the processor synthesized within it, was used to test and verify correct synthesis of the pipeline architecture processor prototype from behavioral level HDL code. Complete details of the methodology, utilized software tools, behavioral level Verilog code, testbenches, and results can be found in [6], including the problems encountered in synthesizing behavioral-level-only code and their resolution.

V. TESTING AND VERIFICATION OF PIPELINED PROCESSOR PROTOTYPE


We explain the experimental testing procedure used to verify the functionality of the synthesized prototype. The overall experimental testing procedure was to first test and verify that the synthesized prototype processor would correctly fetch and execute, on an individual basis (one instruction in the pipeline at a time), all seventeen (17) assembly language instructions of its ISA. The synthesized processor passed this test. Secondly, test programs were written and executed on the prototype processor. These test programs contained diverse control structures (straight line control flow, conditional and unconditional branches, loops with conditional exit, etc.) which would test the processor's ability to correctly execute entire programs and test the Hazard Detection and Forwarding functionality of the processor. Finally, the processor's ability to properly handle interrupts via use of its MHVPIS was tested. Paper page constraints allow only a brief description and presentation of the results of multiple instruction (program) testing for one program. Complete testing and verification is described in [6].

A. Multiple Instruction (Program) Testing


The example test program we will use is shown in Fig. 5.1a. This program contains a loop which is conditionally exited, it contains ALU instructions and a jump (j) instruction, and it also tests the Hazard Detection and Forwarding functionality of the pipelined processor.

      add $R1, $R3, $R3      # $R1 = 2 * $R3
Loop: add $R1, $R1, $R1      # $R1 = 4 * $R3
      add $R1, $R1, $R5      # $R1 = $R1 + $R5
      lw  $R8, 0($R1)        # $R8 = Mem[$R1 + 0]
      beq $R8, $R5, Exit     # if ($R5 = $R8) go to Exit
      add $R3, $R3, $R4      # $R3 = $R3 + $R4
      j   Loop               # Jump to address Loop
Exit: add $R1A, $R10, $R11   # $R1A = $R10 + $R11

Figure 5.1a: Multiple Instruction Test Program

im.mem
 0: 00000000 00000000 00231842 00210802
    00212802 2C280000 24A80008 00632002
 8: 08000002 034A5802 00000000 00000000
    00000000 00000000 00000000 00000000
10: 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000
18: 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000

dm.mem
 0: 00000000 00020120 00000000 00020120
    00000000 00000000 00000002 00000000
 8: 00020120 00000000 00000000 00000000
    00000000 00000000 00000000 00000000
10: 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000
18: 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000

RegFile.mem
 0: 00000000 00000000 00000000 00100100
    00000000 00000002 00000000 00000000

Figure 5.1b shows the states of the Instruction Memory (im.mem), the Data Memory (dm.mem), and the Register File (RegFile.mem) before execution of the program of Fig. 5.1a. The figures show the addresses of memory and register file locations (in hexadecimal notation) on the left side, starting from 0d (0h) and going through 7d (7h) in the first two rows. The third and fourth rows start from address 8d (8h) and go through 15d (Fh), and so on until the last two rows, which start with address 24d (18h) and go through 31d (1Fh). The data stored at each address location is given by the corresponding 8-digit hexadecimal number. Fig. 5.1c shows the contents of RegFile.mem after execution of the program. Known results are shown as highlighted and underlined values.

The test program should execute all the instructions starting with the add at the Loop begin address 02 in im.mem of Fig. 5.1b through the beq instruction at address 06. On each of the first two passes the branch-equal test should evaluate to false, so execution should go on to the add instruction at address 07 and then the jump back to the Loop beginning address. The third time, however, the branch-equal test at the beq instruction should evaluate to true and control should branch to the Exit address, which is 09 in the Instruction Memory. If the add operation at Exit executes correctly, it should deposit (00000025)hex in $R1A, which is the sum of (00000012)hex and (00000013)hex held in $R10 and $R11 respectively. Hence looking at $R1A shows us the test ran successfully, as can be seen from the result in the RegFile.mem map of Fig. 5.1c. Other test programs with different control flows were written, and all were successfully executed on the prototype processor, further verifying a correct behavioral level HDL description and synthesis of the pipelined processor [6].
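The expected control flow described above (branch not taken twice, taken on the third pass, then the add at Exit) can be traced with a small instruction-level model of the test program. The Python sketch below is only an illustration and makes several assumptions: it does not model the pipeline, it treats the data memory as 32 word-indexed locations with the address wrapped to that range, it uses list indices rather than the instruction memory addresses of the figure, and the initial values of $R3, $R4, $R5, and the data memory are hypothetical values chosen to reproduce the described branch pattern. Only the $R10 and $R11 values (12 hex and 13 hex) are taken from the text.

```python
def run_test_program(regs, mem, max_steps=200):
    """Execute the Fig. 5.1a program; return the list of beq outcomes."""
    program = [
        ("add", 1, 3, 3),      # 0:       add $R1, $R3, $R3
        ("add", 1, 1, 1),      # 1: Loop: add $R1, $R1, $R1
        ("add", 1, 1, 5),      # 2:       add $R1, $R1, $R5
        ("lw", 8, 1, 0),       # 3:       lw  $R8, 0($R1)
        ("beq", 8, 5, 7),      # 4:       beq $R8, $R5, Exit
        ("add", 3, 3, 4),      # 5:       add $R3, $R3, $R4
        ("j", 1),              # 6:       j   Loop
        ("add", 26, 10, 11),   # 7: Exit: add $R1A, $R10, $R11
    ]
    pc = 0
    branch_outcomes = []
    for _ in range(max_steps):
        if pc >= len(program):
            return branch_outcomes
        op, *args = program[pc]
        pc += 1
        if op == "add":
            rd, rs, rt = args
            regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF
        elif op == "lw":
            rt, base, offset = args
            regs[rt] = mem[(regs[base] + offset) % len(mem)]
        elif op == "beq":
            rs, rt, target = args
            taken = regs[rs] == regs[rt]
            branch_outcomes.append(taken)
            if taken:
                pc = target
        elif op == "j":
            (pc,) = args
    raise RuntimeError("program did not terminate")


regs = [0] * 32
regs[3], regs[4], regs[5] = 1, 1, 2      # hypothetical initial values
regs[10], regs[11] = 0x12, 0x13          # values quoted in the text
mem = [0] * 32
mem[30] = 2                              # hypothetical: matches $R5 on pass 3

outcomes = run_test_program(regs, mem)
print(outcomes, hex(regs[26]))
```

With these assumed values the branch outcomes are false, false, true, and register 26 ($R1A) ends up holding 25 hex, which is the pass criterion used in the hardware test.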

RegFile.mem (before execution, addresses 8 through 1F)
 8: 00000000 00000000 00000012 00000013
    00000000 00000000 00000000 00000000
10: 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000
18: 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000

RegFile.mem (after execution)
 0: 00000000 00400402 00000000 00100100
    00000000 00000002 00000000 00000000
 8: 00000002 00000000 00000012 00000013
    00000000 00000000 00000000 00000000
10: 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000
18: 00000000 00000000 00000025 00000000
    00000000 00000000 00000000 00000000

Figure 5.1b: Memory and Register File State before Execution

This program illustrates a series of add operations performed on a single register, $R1, to increment it to a value that is then added to another register, $R5. The result is used as the address of the value that is loaded into register $R8 and compared with $R5 for the conditional branch operation. The outcome of the conditional branch operation decides whether to branch and exit the loop or to re-enter the loop.
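The back-to-back writes and reads of $R1, and the lw of $R8 followed immediately by a beq that reads $R8, are exactly the kinds of dependencies that Forwarding and Hazard Detection units must handle. The Python sketch below states the textbook forwarding and load-use stall conditions for a five-stage pipeline, as presented in [5]. It is not taken from our Verilog code, the function and parameter names are hypothetical, and treating register 0 as never forwarded is the standard MIPS convention assumed here.

```python
def forward_source(src, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Pick where an EX-stage source operand should come from."""
    # Newest result first: the instruction one ahead, now in MEM.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    # Otherwise the instruction two ahead, now in WB.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REGFILE"


def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs a value still being loaded."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# add $R1,$R3,$R3 followed by add $R1,$R1,$R1: operand comes from EX/MEM.
print(forward_source(1, True, 1, False, 0))
# lw $R8,0($R1) followed by beq $R8,$R5,Exit: one stall cycle is needed.
print(load_use_stall(True, 8, 8, 5))
```

Checking the EX/MEM register before the MEM/WB register ensures that the most recently computed value wins when both earlier instructions write the same register, as happens with the repeated updates of $R1 in the loop.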

Figure 5.1c: Register File State after Execution

The ability of the prototype processor to correctly handle interrupts was tested. All tests were successful in that the processor's MHVPIS successfully recognized the interrupts and, in response, successfully transferred control to the correct interrupt service routines [6].

Even though the ORCA 2C40A FPGA technology is rated at 33 MHz, we could only run our prototype, synthesized from multiple vendor PC based noncommercialized CAD tools, at 1.979 MHz. The low frequency of operation of the prototype can be attributed to the behavioral-level-only design capture medium and to the fact that the PC based design synthesis tools used were not commercial versions with full optimization capability.

VI. CONCLUSIONS

The pipelined processor architecture prototype developed from behavioral-level-only HDL code as described within the paper was experimentally verified to be fully functional. Thus it has been shown that a pipelined processor architecture with stringent and complex system functional, timing, and performance requirements can be correctly synthesized and embedded into an FPGA with the design capture being done using an HDL at only the behavioral level of abstraction. Using behavioral-level-only HDL coding rather than more time consuming detailed lower level coding significantly reduces the development cycle time for such processor systems. It was shown that a more efficient synthesis can be achieved if a very small portion of the behavioral level HDL code is reduced to the register level; primarily that part of the code used to describe the Instruction and Data Memory caches of the pipelined processor.

REFERENCES
1. C. A. Fields. Proper Use of Hierarchy in HDL-Based High Density FPGA Design. pp. 168-177. Lecture Notes in Computer Science, Proc. Field-Programmable Logic and Applications, 5th Int. Workshop, FPL '95, Aug./Sept. 1995.
2. Y. Li and W. Chu. Aizup - A Pipelined Processor Design and Implementation on XILINX FPGA Chip. pp. 98-106. IEEE Comp. Soc. Press, 1996.
3. J. S. Gray. Homebrewing RISCs in FPGAs. www3.sympatico.ca/jsgray/j32.ppt
4. M. Gschwind and V. Salapura. A VHDL Design Methodology for FPGAs. pp. 208-217. Lecture Notes in Computer Science, Proc. Field-Programmable Logic and Applications, 5th Int. Workshop, FPL '95, Aug./Sept. 1995.
5. D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, 2nd Edition. Morgan Kaufmann, Inc., 1998.
6. S. Durbha. Prototyping and Testing of a Pipelined Processor from Behavioral Level Code. Master's Thesis, Dept. of EE, Univ. of KY, Lexington, KY, May 2000.
7. M. D. Ciletti. Modeling, Synthesis, and Rapid Prototyping with the Verilog HDL. Prentice Hall, 1999.
8. S. Palnitkar. Verilog HDL: A Guide to Digital Design and Synthesis. Prentice Hall, Inc., 1996.
9. FPGA Express Users Manual. 1999. www.synopsys.com/products/fpga/fpga-express.html
10. Lucent Technologies. ORCA Foundry Development System. User's Guide, EPIC User's Guide (Version 9.1).
11. Lucent Technologies. Optimized Reconfigurable Cell Array (ORCA(TM)), OR2CxxA Series Field-Programmable Gate Arrays. Microelectronics Data Sheet. Mar. 1996.
