DEPT OF ECE
ECET
IP) [2]. OCP is an interface (or socket) that aims to standardize and thus simplify system integration. It facilitates integration by defining a concrete interface (I/O signals and a handshaking protocol) that is independent of the bus architecture. Based on this interface, IP core designers can concentrate on the internal functionality of their cores, bus designers can focus on the internal bus architecture, and system integrators can focus on system-level issues such as bandwidth requirements and the overall system architecture. In this way, system integration becomes much more efficient.

Most of the bus functionalities defined in AXI and OCP are quite similar. The most conspicuous difference between them is that AXI divides the address channel into independent write and read address channels so that read and write transactions can be processed simultaneously. The additional area of the separate address channels is the penalty.

Some previous work has investigated on-chip buses from various aspects. The work presented in [3] and [4] develops high-level AMBA bus models with fast simulation speed and high timing accuracy. The authors in [5] propose an automatic approach to generate high-level bus models from a formal channel model of OCP. In both of the above works, the authors concentrate on fast and accurate simulation models at a high level but do not provide real hardware implementation details. In [6], the authors implement the AXI interface on a shared-bus architecture. Even though it costs less in area, the benefit of AXI in communication efficiency may be limited by the shared-bus architecture.

In this paper we propose a high-performance on-chip bus design with OCP as the bus interface. We choose OCP because it is open to the public and OCP-IP provides free tools to verify the protocol.
Nevertheless, most bus design techniques developed in this paper can also be applied to the AXI bus. Our proposed bus architecture features a crossbar/partial-crossbar interconnect and realizes most transactions defined in OCP, including 1) single transactions, 2) burst transactions, 3) lock transactions, 4) pipelined transactions, and 5) out-of-order transactions. In addition, the proposed bus is flexible: one can adjust the bus architecture according to the system requirements. One key issue of advanced buses is how to manipulate the order of
transactions such that requests from masters and responses from slaves are carried out with the best efficiency without violating any ordering constraint. In this work we have developed a key bus component, called the scheduler, to handle the ordering issues of out-of-order transactions. We will show that the proposed crossbar/partial-crossbar bus architecture, together with the scheduler, can significantly enhance the communication efficiency of a complex SOC.
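As a rough illustration of the ordering problem such a scheduler solves, the following Python sketch (the class and method names are our own, not from the proposed design) tracks outstanding transactions and only releases a slave's response to a master once it is the oldest outstanding request of that master:

```python
from collections import deque

class Scheduler:
    """Sketch: forward slave responses to masters while preserving
    per-master request order, buffering responses that arrive early."""
    def __init__(self):
        self.pending = {}  # master id -> deque of outstanding transaction ids

    def issue(self, master, txn_id):
        # Record a newly issued request in program order.
        self.pending.setdefault(master, deque()).append(txn_id)

    def can_return(self, master, txn_id):
        # A response may be returned only if it is the oldest outstanding
        # transaction of that master; otherwise it must wait in a buffer.
        q = self.pending.get(master)
        return bool(q) and q[0] == txn_id

    def retire(self, master, txn_id):
        assert self.can_return(master, txn_id)
        self.pending[master].popleft()
```

For example, if master A issues transactions 1 and 2 and the slave completes 2 first, the scheduler buffers the response to 2 until the response to 1 has been returned.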
used. Most of the IP cores use OCP (Open Core Protocol), which is essentially a core-based protocol with its own advantages and flexibility.
eliminates the typical advance knowledge requirements regarding potential end systems that might utilize a core, as well as the other IP cores that might be present in the application. Cores simply need a useful interface that decouples them from system requirements. The interface then assumes the attributes of a socket: an attachment interface that is powerful, frugal, and well understood across the industry.

Via this methodology, system integrators realize the benefits of partitioning components through layered hardware; designers no longer have to contend with a myriad of diverse core protocols and inter-core delivery strategies. Using a standard IP core interface eliminates having to adapt each core during each SoC integration, allowing system integrators the otherwise unrealized luxury of focusing on SoC design issues. And, since the cores are truly decoupled from the on-chip interconnect, and hence from each other, it becomes trivial to exchange one core for another to meet evolving system and market requirements.

In summary, for true core reuse, cores must remain completely untouched as designers integrate them into any SoC. This only occurs when, say, a change in bus width, bus frequency, or bus electrical loading does not require core modification. In other words, a complete socket insulates cores from the vagaries of, and changes to, the SoC interconnect mechanism. The existence of such a socket enables supporting tool and collateral development for protocol checkers, models, test benches, and test generators. This allows independent core development that delivers plug-and-play modularity without core-interconnect rework. It also allows core development in parallel with system design, which saves precious design time.
1.6 Overview
The Open Core Protocol (OCP) defines a high-performance, bus-independent interface between IP cores that reduces design time, design risk, and manufacturing costs for SOC designs. An IP core can be a simple peripheral core, a high-performance microprocessor, or an on-chip communication subsystem such as a wrapped on-chip bus. The Open Core Protocol,
achieves the goal of IP design reuse. The OCP transforms IP cores, making them independent of the architecture and design of the systems in which they are used. It:
- Optimizes die area by configuring into the OCP only those features needed by the communicating cores
- Simplifies system verification and testing by providing a firm boundary around each IP core that can be observed, controlled, and validated

The approach adopted by the Virtual Socket Interface Alliance's (VSIA) Design Working Group on On-Chip Buses (DWGOCB) is to specify a bus wrapper to provide a bus-independent Transaction Protocol-level interface to IP cores. The OCP is equivalent to VSIA's Virtual Component Interface (VCI). While the VCI addresses only the data flow aspects of core communications, the OCP is a superset of VCI that additionally supports configurable sideband control signaling and test harness signals. The OCP is the only standard that defines protocols to unify all inter-core communication.

The Open Core Protocol (OCP) delivers the only non-proprietary, openly licensed, core-centric protocol that comprehensively describes the system-level integration requirements of intellectual property (IP) cores. While other bus and component interfaces address only the data flow aspects of core communications, the OCP unifies all inter-core communications, including sideband control and test harness signals. OCP's synchronous unidirectional signaling produces simplified core implementation, integration, and timing analysis. OCP eliminates the task of repeatedly defining, verifying, documenting, and supporting proprietary interface protocols. The OCP readily adapts to support new core capabilities while limiting test suite modifications for core upgrades. Clearly delineated design boundaries enable cores to be designed independently of other system cores, yielding definitive, reusable IP cores with reusable verification and test suites.
Any on-chip interconnect can be interfaced to the OCP, rendering it appropriate for many forms of on-chip communications:
- Dedicated peer-to-peer communications, as in many pipelined signal processing applications such as MPEG2 decoding
- Simple slave-only applications such as slow peripheral interfaces
- High-performance, latency-sensitive, multi-threaded applications, such as multibank DRAM architectures

The OCP supports very high performance data transfer models ranging from simple request-grant handshakes through pipelined and multi-threaded objects. Higher-complexity SOC communication models are supported using thread identifiers to manage out-of-order completion of multiple concurrent transfer sequences.

The Open Core Protocol interface addresses communications between the functional units (or IP cores) that comprise a system on a chip. The OCP provides independence from bus protocols without sacrificing high-performance access to on-chip interconnects. By designing to the interface boundary defined by the OCP, you can develop reusable IP cores without regard for the ultimate target system. Given the wide range of IP core functionality, performance, and interface requirements, a fixed-definition interface protocol cannot address the full spectrum of requirements. The need to support verification and test requirements adds an even higher level of complexity to the interface. To address this spectrum of interface definitions, the OCP defines a highly configurable interface. The OCP's structured methodology includes all of the signals required to describe an IP core's communications, including data flow, control, and verification and test signals. Here the importance of this project comes into the picture: OCP (Open Core Protocol) plays a vital role in carrying out transactions between two different IP cores, and the application will fail if these transactions do not work properly.
1.7 Applications
Since it is an IP block, it can be used in any kind of SOC application. The applications can be listed as follows:
- SRAM
- Processor
To reduce the handshaking latency, we proposed a hybrid data locked transfer mode. Unlike the lock transfer in [10], which requires arbitration lock over transactions, our data locked mode is based on a transfer-level arbitration scheme and allows bus ownership to change between transactions. This gives more flexibility to arbitration policy selection.

With the additional features of AXI, new factors that affect bus performance are also introduced. The first factor is the arbitration combination. The multi-channel architecture allows different and independent arbitration policies to be adopted by each channel. However, existing AXI-related works often assume a unified arbitration policy where each channel adopts the same policy [10-12]. Another key factor is the interface buffer size: a larger interface buffer usually implies that more out-of-order transactions can be handled. The third factor is the task access setting, which defines how the transfer modes should be used by the devices within a system. Proper task access settings can yield better performance; however, the proper setting may differ under different circumstances, such as different buffer sizes.

Being aware of the performance factors mentioned above, we conducted a detailed simulation-based analysis of their performance impact. The analysis is carried out by simulating a multi-core platform with a shared-link AXI backbone running a video phone application. The performance is evaluated in terms of bandwidth utilization, average transaction latency, and system task completion time. In addition to the analysis of the performance impact of the aforementioned factors, the performance of a corresponding five-layer AHB-lite bus, which has a cost comparable to a 5-channel shared-link AXI, is also included for comparison.

The rest of the paper is organized as follows. Section 2 presents the related works on the AXI bus. Section 3 presents the proposed transfer modes.
Two normal transactions with a data burst length of four take 16 bus cycles to complete the eight data transfers. This means 50% of the available bus bandwidth is wasted.
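The arithmetic behind this claim is simple and can be checked directly; the sketch below just restates the cycle counts quoted in the text:

```python
# Bandwidth arithmetic for the example above: two transactions, each with
# a data burst length of four, complete in 16 bus cycles in total.
data_beats = 2 * 4       # eight data transfers actually carry data
total_cycles = 16        # bus cycles consumed, including handshaking
utilization = data_beats / total_cycles
wasted = 1 - utilization
print(f"utilization = {utilization:.0%}, wasted = {wasted:.0%}")
```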
There are two approaches to signal the bus interconnect to use the data locked mode for a transaction. One uses the ARLOCK/AWLOCK signal in the address channels to notify the bus of an incoming transaction using the data locked transfer. However, doing so requires modifying the protocol definition of these signals and the bus interface. To avoid modifying the protocol, the other approach assigns, in advance, the devices that can use the data locked mode. The overhead of this approach is that the bus interconnect must provide mechanisms to configure the device-to-transfer-mode mapping. Note that these two approaches can be used together without conflict.

To support the proposed data locked mode, the bus interconnect needs an additional buffer, called the data locked mode buffer, to keep a record of the transactions using the data locked mode. Each entry in the buffer stores one transaction ID. If all the entries in the data locked mode buffer are in use, no more transactions can be transferred using the data locked mode.
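A minimal sketch of such a data locked mode buffer follows; the class and method names are illustrative, not taken from the paper:

```python
class DataLockedBuffer:
    """Sketch of the data locked mode buffer described above: a fixed
    number of entries, each holding one transaction ID. When all entries
    are in use, further transactions cannot use the data locked mode."""
    def __init__(self, entries):
        self.entries = entries
        self.ids = set()

    def try_lock(self, txn_id):
        if len(self.ids) >= self.entries:
            return False          # buffer full: fall back to normal mode
        self.ids.add(txn_id)      # record this data-locked transaction
        return True

    def release(self, txn_id):
        self.ids.discard(txn_id)  # transaction completed, free the entry
```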
this raises a major and very sensitive issue: the interfacing of these IP cores. These interfaces play a vital role in an SOC and must be handled with care because of the communication between the IP cores. The communication between different IP cores should provide lossless data flow and should also be flexible for the designer. Hence, to resolve this issue, standard protocol buses are used in order to interface two IP cores, and the loss of data depends on the standard of the protocol used. Most of the IP cores from ARM use AMBA (Advanced Microcontroller Bus Architecture), which includes AHB (Advanced High-performance Bus). This bus has its own advantages and flexibility. A full AHB interface is used for the following:
- Bus masters
- On-chip memory blocks
- External memory interfaces
- High-bandwidth peripherals with FIFO interfaces
- DMA slave peripherals
Figure 2.3.5 Typical AMBA Systems

The key advantages of a typical AMBA system are listed as follows:
- High performance
- Pipelined operation
- Multiple bus masters
- Burst transfers
- Split transactions

AMBA APB provides the basic peripheral macrocell communications infrastructure as a secondary bus from the higher-bandwidth pipelined main system bus. Such peripherals typically:
- Have interfaces which are memory-mapped registers
- Have no high-bandwidth interfaces
- Are accessed under programmed control
The external memory interface is application-specific and may only have a narrow data path, but it may also support a test access mode which allows the internal AMBA AHB, ASB, and APB modules to be tested in isolation with system-independent test sets. Here the importance of this project comes into the picture: AMBA-AHB plays a vital role in carrying out transactions between two different IP cores, and the application will fail if these transactions do not work properly.
2.4 Terminology
The following terms are used throughout this specification.

2.4.1 Bus Cycle
A bus cycle is a basic unit of one bus clock period and, for the purposes of AMBA AHB or APB protocol descriptions, is defined from rising-edge to rising-edge transitions. An ASB bus cycle is defined from falling-edge to falling-edge transitions. Bus signal timing is referenced to the bus cycle clock.

2.4.2 Bus Transfer
An AMBA AHB bus transfer is a read or write operation of a data object, which may take one or more bus cycles. The bus transfer is terminated by a completion response from the addressed slave. The transfer sizes supported by AMBA AHB include byte (8-bit), half word (16-bit), and word (32-bit).

2.4.3 Burst Operation
A burst operation is defined as one or more data transactions, initiated by a bus master, which have a consistent width of transaction to an incremental region of address
space. The increment step per transaction is determined by the width of the transfer (byte, half word, or word).
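The address sequence of an incrementing burst can be sketched as follows; this is a simple illustration of the rule above, assuming aligned start addresses:

```python
def burst_addresses(start, size_bytes, beats):
    """Addresses of an incrementing burst: each beat advances by the
    transfer width (1 = byte, 2 = half word, 4 = word)."""
    assert size_bytes in (1, 2, 4)
    assert start % size_bytes == 0   # transfers must be aligned to size
    return [start + i * size_bytes for i in range(beats)]
```

For example, a 4-beat word burst starting at 0x20 touches 0x20, 0x24, 0x28, 0x2C.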
2.5 APPLICATIONS
AMBA-AHB can be used in different applications and is also technology independent. ARM controllers are designed according to the specifications of AMBA. Present-day technology demands high performance and speed, which are convincingly met by AMBA-AHB; compared to other architectures, AMBA-AHB is far more advanced and efficient. It minimizes the silicon infrastructure needed to support on-chip and off-chip communications. Any embedded project that involves ARM processors or microcontrollers should make use of AMBA-AHB as the common bus throughout the project.
2.6 Features
AMBA Advanced High-performance Bus (AHB) supports the following features:
- High performance
- Burst transfers
- Split transactions
- Single-edge clock operation
- SEQ, NONSEQ, BUSY, and IDLE transfer types
- Programmable number of idle cycles
- Large data bus widths: 32, 64, 128, and 256 bits wide
- Address decoding with configurable memory map
2.7 Merits
Since AHB is one of the most commonly used bus protocols, it has many advantages from the designer's point of view, which are mentioned below.
- AHB offers a fairly low-cost (in area), low-power (based on I/O) bus with a moderate amount of complexity, and it can achieve higher frequencies compared to others because the protocol separates the address and data phases.
- AHB can use this higher frequency along with separate data buses, which can be defined to 128 bits and above, to achieve the bandwidth required for high-performance bus applications.
- AHB can access other protocols through a proper bridging converter; hence it supports the bridge configuration for data transfer.
- AHB allows slaves with significant latency to respond to a read with an HRESP of SPLIT. The slave then requests the bus on behalf of the master when the read data is available. This enables better bus utilization.
- AHB offers burst capability by defining incrementing bursts of specified length, and it supports both incrementing and wrapping bursts. Although AHB requires that an address phase be provided for each beat of data, the slave can still use the burst information to make the proper request on the other side. This helps to mask the latency of the slave.
- AHB is defined with a choice of several bus widths, from 8-bit to 1024-bit. The most common implementation has been 32-bit, but higher bandwidth requirements may be satisfied by using 64- or 128-bit buses.
- AHB uses the HRESP signals, driven by the slaves, to indicate when an error has occurred.
- AHB also offers a large selection of verification IP from several different suppliers. The solutions offered support several different languages and run in a choice of environments.
- Access to the target device is controlled through a MUX, thereby admitting bus access to one bus master at a time.
- AHB masters, slaves, and arbiters support early burst termination. Bursts can be terminated early either as a result of the arbiter removing the HGRANT to a master part way through a burst, or after a slave returns a non-OKAY response to
any beat of a burst. Note, however, that a master cannot decide to terminate a defined-length burst unless prompted to do so by the arbiter or by slave responses.
- Any slave which does not use SPLIT responses can be connected directly to an AHB master. If the slave does use SPLIT responses, then a simplified version of the arbiter is also required.

Thus the strengths of the AHB protocol are listed above, which clearly explains the reason for the wide use of this protocol.
2.8 Demerits
Even though the AHB protocol is a commonly used bus in designs, it has some tolerable demerits, which are listed below:
- AHB cannot achieve full data bus utilization and bandwidth if some slaves have a relatively high latency.
- AHB defines transfer sizes of 1, 2, 4, 8, and 16 bytes. Because byte enables are not defined, there are cases where multiple transfers must be made inside a single quadword.
- AHB defines timing parameters for many of the relationships between signals on the bus. However, these are not associated with requirements relative to a clock cycle. Therefore, SoC developers must integrate AHB cores and run chip-level static timing analysis to judge how compatible AHB masters and slaves are with one another.
- Power-based SoCs cover a wide range of applications, and there is a correspondingly wide range of address map requirements. Having the address decodes for all AHB slaves reside within the interconnect means having to support the most complex split address ranges, even for the simplest of slaves.

Thus the weaknesses of the AHB protocol are mentioned above, which can be tolerated with respect to its useful advantages.
3.2 Merits:
The OCP has many advantages which make designers more comfortable; they are listed below:
- OCP is a point-to-point protocol which can be directly interfaced between two IP cores.
- Most importantly, OCP can be configured with respect to the application due to its configurable nature. This configurability leads to a reduction of die area and design time; hence optimization of die area is attained.
- OCP is a bus-independent protocol, i.e., it can be interfaced to any bus protocol such as AHB.
- OCP supports pipelined operation and multi-threaded applications such as multibank DRAM architectures.
- It supports burst operation, which generates a sequence of addresses with respect to the burst length.
- OCP provides more flexibility to the designer who uses it and also gives high performance through improved core maintenance.
- IP cores can easily be reused with OCP; without it, reusing IP cores for other applications requires the interfaces already used in the system to be modified for each application.
- OCP supports sideband signals, which carry information such as interrupts, flags, errors, and status; these are said to be non-dataflow signals.
- It also supports testing signals such as the scan interface, clock control interface, and debug and test interface. This ensures that OCP can also be used to interface a Device Under Test (DUT) so that test signals can be passed.
- OCP also enables threads and tags, which support independent concurrent transfer sequences.
- OCP doubles the peak bandwidth at a given frequency by using separate buses for read and write data. These buses are used in conjunction with pipelining of command to data phases to increase performance.
- Simplified circuitry is needed to bridge an OCP-based core to another communication interface standard.

Thus the advantages of OCP are listed above, which clearly explain the basic reasons for choosing this protocol over others.
3.3 Demerits:
Every protocol has its own demerits, which should not greatly affect the application or the flexibility of the designer. Some of the demerits of OCP are mentioned below.
- Designing and verifying an OCP that supports all possible burst operations is complex, and it also needs more time and effort.
- Slaves that support the largest burst transfer size will consume more die area than slaves that accept only smaller bursts.
- The main disadvantage of OCP is that a core to be interfaced with OCP should be OCP compliant; if it is not, an OCP-compliant bridge must be created to make the core OCP compliant.
Figure 3.1 Basic block diagram of an OCP instance

Figure 3.1 shows a simple system containing a wrapped bus and three IP core entities: one that is a system target, one that is a system initiator, and one that is both. The characteristics of the IP core determine whether the core needs the master side, the slave side, or both sides of the OCP; the wrapper interface modules must act as the complementary side of the OCP for each connected entity. A transfer across this system occurs as follows. A system initiator (as the OCP master) presents command, control, and possibly data to its connected slave (a bus wrapper interface module). The interface module plays the request across the on-chip bus system. The OCP does not specify the embedded bus functionality; instead, the interface designer converts the OCP request into an embedded bus transfer. The receiving bus wrapper interface module (as the OCP master) converts the embedded bus operation into a legal OCP command. The system target (OCP slave) receives the command and takes the requested action.
Each instance of the OCP is configured (by choosing signals or bit widths of a particular signal) based on the requirements of the connected entities and is independent of the others. For instance, system initiators may require more address bits in their OCP instances than do the system targets; the extra address bits might be used by the embedded bus to select which bus target is addressed by the system initiator. The OCP is flexible. There are several useful models for how existing IP cores communicate with one another. Some employ pipelining to improve bandwidth and latency characteristics. Others use multiple-cycle access models, where signals are held static for several clock cycles to simplify timing analysis and reduce implementation area. Support for this wide range of behavior is possible through the use of synchronous handshaking signals that allow both the master and slave to control when signals are allowed to change.
IP cores, especially since in the directly-connected case there is no decode/selection logic. OCP-compliant slaves receive device selection information integrated into the basic command field. Arbitration schemes vary widely. Since there is virtually no arbitration in the directly-connected case, arbitration for any shared resource is the sole responsibility of the logic on the bus side of the OCP. This permits OCP-compliant masters to pass a command field across the OCP that the bus interface logic converts into an arbitration request sequence.

Address/Data
Wide widths, characteristic of shared on-chip address and data buses, make tuning the OCP address and data widths essential for an area-efficient implementation. Only those address bits that are significant to the IP core should cross the OCP to the slave. The OCP address space is flat and composed of 8-bit bytes (octets). To increase transfer efficiency, many IP cores have data field widths significantly greater than an octet. The OCP supports a configurable data width to allow multiple bytes to be transferred simultaneously. The OCP refers to the chosen data field width as the word size of the OCP. The term word is used in the traditional computer system context; that is, a word is the natural transfer unit of the block. OCP supports both power-of-two and non-power-of-two word sizes, as would be needed, for example, for a 12-bit DSP core. The OCP address is a byte address that is word aligned. Transfers of less than a full word of data are supported by providing byte enable information that specifies which octets are to be transferred. Byte enables are linked to specific data bits (byte lanes); byte lanes are not associated with particular byte addresses.
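The byte-enable mechanism described above can be illustrated with a small sketch; the little-endian lane numbering here is an assumption for illustration, not mandated by the text:

```python
def byte_enables(addr, nbytes, word_size=4):
    """Byte-lane enables for a partial-word transfer: bit i set means
    byte lane i carries valid data. Assumes little-endian lane numbering
    and that the transfer fits within a single word."""
    lane = addr % word_size            # first lane touched by this address
    assert lane + nbytes <= word_size  # must not cross a word boundary
    return ((1 << nbytes) - 1) << lane
```

A full-word transfer at an aligned address enables all four lanes (0b1111), while a half-word transfer at byte offset 2 enables only the two upper lanes (0b1100).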
Pipelining
The OCP allows pipelining of transfers. To support this feature, the return of read data and the provision of write data may be delayed after the presentation of the associated request.

Response
The OCP separates requests from responses. A slave can accept a command request from a master on one cycle and respond in a later cycle. The division of request from response permits pipelining. The OCP provides the option of having responses for Write commands, or of completing them immediately without an explicit response.

Burst
To provide high transfer efficiency, burst support is essential for many IP cores. The extended OCP supports annotation of transfers with burst information. Bursts can either include addressing information for each successive command (which simplifies the requirements for address sequencing/burst count processing in the slave), or include addressing information only once for the entire burst.
4.2.2 Lock Transactions
Lock is a protection mechanism for masters that have low bus priorities. Without this mechanism, the read/write transactions of lower-priority masters would be interrupted whenever a higher-priority master issues a request. Lock transactions prevent an arbiter from performing arbitration and assure that a low-priority master can complete its granted transaction without being interrupted.

4.2.3 Pipelined Transactions (Outstanding Transactions)
Figures 2(a) and 2(b) show the difference between non-pipelined and pipelined (also called outstanding in AXI) read transactions. In Figure 2(a), for a non-pipelined transaction, read data must be returned after its corresponding address is issued plus a period of latency t. For example, D21 is sent right after A21 is issued plus t. For a pipelined transaction, as shown in Figure 2(b), this hard link is not required. Thus A21 can be issued right after A11, without waiting for the return of the data requested by A11 (i.e., D11-D14).
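A back-of-the-envelope cycle count shows why pipelining helps; this is our own simplified model of the figure's scenario (uniform latency, one address per transaction), not the paper's exact timing:

```python
def total_cycles(n_txns, latency, beats, pipelined):
    """Cycle-count sketch for back-to-back read transactions: each
    transaction issues one address, waits `latency` cycles, then returns
    `beats` data beats. Pipelined masters issue the next address without
    waiting for the previous transaction's data."""
    if pipelined:
        # Addresses issued back to back; data streams return after the
        # initial latency with no idle gaps between bursts.
        return latency + n_txns * beats
    # Non-pipelined: each address must wait for all previous data.
    return n_txns * (latency + beats)
```

With two 4-beat reads and a latency of 2 cycles, the non-pipelined case takes 12 cycles while the pipelined case takes 10.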
4.3 Hardware Design of the On-Chip Bus
The architecture of the proposed on-chip bus is illustrated in Figure 4, where we show an example with two masters and two slaves. A crossbar architecture is employed such that more than one master can communicate with more than one slave simultaneously. If not all masters require access paths to all slaves, a partial crossbar architecture is also allowed. The main blocks of the proposed bus architecture are described next.
Arbiter
In a traditional shared-bus architecture, resource contention happens whenever more than one master requests the bus at the same time. For a crossbar or partial-crossbar architecture, resource contention occurs when more than one master tries to access the same slave simultaneously. In the proposed design, each slave IP is associated with an arbiter that determines which master can access the slave.
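A per-slave arbiter can be sketched as below. Fixed priority is just one possible policy chosen for illustration; the design allows the arbitration policy to be selected:

```python
def arbitrate(requests, priority):
    """Fixed-priority arbiter sketch: each slave has its own arbiter, so
    `requests` is the set of masters currently addressing this slave and
    `priority` lists masters from highest to lowest priority."""
    for master in priority:
        if master in requests:
            return master   # grant the highest-priority requester
    return None             # no master requests this slave this cycle
```

Because each slave has its own arbiter, two masters addressing different slaves can both be granted in the same cycle, which is the source of the crossbar's concurrency.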
Decoder
Since more than one slave exists in the system, the decoder decodes the address and decides which slave should return a response to the target master. In addition, the proposed decoder also checks whether the transaction address is illegal or nonexistent and responds with an error message if necessary.

4.5 FSM-M & FSM-S
Depending on whether a transaction is a read or a write operation, the request and response processes are different. For a write transaction, the data to be written is sent out together with the address of the target slave, and the transaction is complete when the target slave accepts the data and acknowledges its reception. For a read operation, the address of the target slave is first sent out, and the target slave issues an accept signal when it receives the message. The slave then generates the required data and sends it to the bus, where the data is properly directed to the master requesting it. The read transaction finally completes when the master accepts the response and issues an acknowledge signal. In the proposed bus architecture, we employ two types of finite state machines, namely FSM-M and FSM-S, to control the flow of each transaction. FSM-M acts as a master and generates the OCP signals of a master, while FSM-S acts as a slave and generates those of a slave. These finite state machines are designed in such a way that burst, pipelined, and out-of-order read/write transactions can all be properly controlled.
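The address check performed by the decoder described above can be sketched as follows; the address map values used in the example are hypothetical:

```python
def decode(addr, address_map):
    """Decoder sketch: `address_map` maps slave name -> (base, size).
    Returns the selected slave, or None so the bus can respond with an
    error for an illegal or nonexistent address."""
    for slave, (base, size) in address_map.items():
        if base <= addr < base + size:
            return slave
    return None
```

For instance, with a map of {"rom": (0x0000, 0x1000), "ram": (0x1000, 0x1000)}, an access to 0x0800 selects the ROM, 0x1800 selects the RAM, and 0x5000 triggers the error response.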
The basic signals between the two cores are identified in Figure 4.5, which is a dataflow signal diagram. Here core 1 acts as the master that gives commands to the slave, and core 2 acts as the slave that accepts the commands given by the master in order to perform an operation.

[Figure 4.5: OCP dataflow signals between the master (core 1) and the slave (core 2), showing the request signals (Clk, Control, MCmd, MAddr, input address, input data, burst length), the response signals (SCmdAccept, output data), and the data handshake signals (MData, MDataLast).]

Figure 4.5 OCP dataflow signals

Figure 4.5 shows the OCP dataflow signals, which include the Request, Response, and Data Handshake groups. The set of signals in the request phase are the ones which
will be used for requesting a particular operation from the slave. The request phase is ended by the SCmdAccept signal. Similarly, the signals in the response phase are the ones which will be used for sending the proper response to the corresponding request; the response phase is ended by the SResp signal. The data handshake signals are the ones which deal with the data transfer from either the master or the slave.

The basic signals are the ones used in the simple read and write operations of the OCP master and slave. This simple operation can also support pipelining. These basic signals are extended for burst operation, in which one request carries multiple data transfers. The burst extensions can also be defined as allowing the grouping of multiple transfers that have a defined address relationship. The burst extensions are enabled only when MBurstLength is included in the interface. The burst length represents how many write or read operations should be carried out in a burst. This burst length is given by the system to the master, which in turn gives it to the slave through the MBurstLength signal. Thus the burst length acts as an input to the master only when burst mode is enabled; in simple write and read operations, the burst length input is not needed. From Figure 4.5, the inputs and outputs of the OCP are clearly identified and are discussed as follows.
Master System Control
o The Control signal acts as an input that indicates whether a WRITE or READ operation is to be performed by the master; it is given by the processor through the control pin.
Input address
o The system gives the address through the addr pin to the master, indicating where the write or read operation is to be carried out.
Input data
o This input pin receives the data given by the system through the data_in pin to the master; during a write operation this data must be stored at the corresponding address.
Burst Length
o This input is used only when the burst profile is enabled. It is of integer type and indicates the number of operations to be carried out in a burst.
Output data
o In a read operation, the master gives the address and the slave receives it. The slave then fetches the data at that address, and this data is given out through the data_out pin.
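The system-side inputs and the OCP dataflow signals above map naturally onto a VHDL entity declaration. The following port list is only an illustrative sketch: the entity name and the system-side port names are assumptions based on the description above, and the widths are the configurable defaults from the signal tables.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative master-side port list built from the dataflow signals
-- of Figure 4.5; the entity name itself is an example only.
entity ocp_master is
    port (
        Clk          : in  std_logic;
        -- system-side interface (names assumed from the description)
        Control      : in  std_logic_vector(2 downto 0);
        Input_Addr   : in  std_logic_vector(12 downto 0);
        Input_Data   : in  std_logic_vector(7 downto 0);
        Burst_Length : in  std_logic_vector(12 downto 0);
        Data_out     : out std_logic_vector(7 downto 0);
        -- OCP request phase
        MCmd         : out std_logic_vector(2 downto 0);
        MAddr        : out std_logic_vector(12 downto 0);
        MData        : out std_logic_vector(7 downto 0);
        MBurstLength : out std_logic_vector(12 downto 0);
        SCmdAccept   : in  std_logic;
        -- OCP response phase
        SResp        : in  std_logic_vector(1 downto 0);
        SData        : in  std_logic_vector(7 downto 0)
    );
end entity ocp_master;
```

The slave-side entity mirrors this list with the M-prefixed signals as inputs and the S-prefixed signals as outputs.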
Table 4.8.1 Basic OCP signal specification

S.No. | NAME       | WIDTH             | DRIVER | FUNCTION
1     | Clk        | 1                 | Varies | OCP clock
2     | MCmd       | 3                 | Master | Transfer command
3     | MAddr      | 13 (Configurable) | Master | Transfer address
4     | MData      | 8 (Configurable)  | Master | Write data
5     | SCmdAccept | 1                 | Slave  | Slave accepts transfer
6     | SData      | 8 (Configurable)  | Slave  | Read data
7     | SResp      | 2                 | Slave  | Transfer response
The request issued by the system is given to the slave through the MCmd signal. Similarly, in a write operation, the input address and data provided by the system are given to the slave through the MAddr and MData signals; when this information is accepted, the slave asserts SCmdAccept, which ensures that the system can issue the next request. During a read operation, the system issues the request and address to the slave, which sets SResp and fetches the corresponding data, returned to the output through SData.

Clk
Input clock signal for the OCP clock. The rising edge of the OCP clock is defined as a rising edge of Clk that samples the asserted EnableClk. Falling edges of Clk, and any rising edge of Clk that does not sample EnableClk asserted, do not constitute rising edges of the OCP clock.

EnableClk
EnableClk indicates which rising edges of Clk are the rising edges of the OCP clock, that is, which rising edges of Clk should sample and advance interface state. Use the enableclk parameter to configure this signal. EnableClk is driven by a third entity and serves as an input to both the master and the slave. When enableclk is set to 0 (the default), the signal is not present and the OCP behaves as if EnableClk is constantly asserted; in that case all rising edges of Clk are rising edges of the OCP clock.

MAddr
Transfer address. MAddr specifies the slave-dependent address of the resource targeted by the current transfer. To configure this field into the OCP, use the addr parameter; to configure its width, use the addr_wdth parameter.

MCmd
Transfer command. This signal indicates the type of OCP transfer the master is requesting. Each non-idle command is either a read-type or a write-type request, depending on the direction of data flow.

MData
Write data. This field carries the write data from the master to the slave. The field is configured into the OCP using the mdata parameter and its width is configured using the data_wdth parameter. The width is not restricted to multiples of 8.

SCmdAccept
Slave accepts transfer. A value of 1 on the SCmdAccept signal indicates that the slave accepts the master's transfer request. To configure this field into the OCP, use the cmdaccept parameter.

SData
Read data. This field carries the requested read data from the slave to the master. The field is configured into the OCP using the sdata parameter and its width is configured using the data_wdth parameter. The width is not restricted to multiples of 8.

SResp
Response field, given from the slave in reply to a transfer request from the master. The field is configured into the OCP using the resp parameter.

4.8.2 Burst extension
The signals required for the burst operation are identified and tabulated in Table 4.8.2. Burst Length indicates the number of transfers in a burst. For precise bursts, the value indicates the total number of transfers in the burst and is constant throughout the burst. For imprecise bursts, the value indicates the best guess of the number of transfers remaining (including the current request) and may change with every request. The burst length is configurable and determines how many read or write operations are performed in sequence. The Burst Precise field indicates whether the precise length of a burst is known at the start of the burst. The Burst Sequence field indicates the sequence of addresses for requests in a burst; an incrementing sequence increments the address sequentially.

Table 4.8.2 OCP burst signal specification

S.No. | NAME          | WIDTH             | DRIVER | FUNCTION
1     | MBurstLength  | 13 (Configurable) | Master | Burst length
2     | MBurstPrecise | 1                 | Master | Burst length is precise
3     | MBurstSeq     | 3                 | Master | Address sequence of burst
4     | MDataLast     | 1                 | Master | Last write data in burst
5     | SRespLast     | 1                 | Slave  | Last response in burst
Each burst sequence type is indicated by its encoding; for example, the incrementing sequence is indicated by setting the Burst Sequence signal to 000. Data Last represents the last write data in a burst: this field indicates whether the current write data transfer is the last in the burst. Last Response represents the last response in a burst: this field indicates whether the current response is the last in the burst.

MBurstLength
This field indicates the number of transfers for a row of the burst and stays constant throughout the burst. For imprecise bursts, the value indicates the best guess of the number of transfers remaining (including the current request) and may change with every request. To configure this field into the OCP, use the burstlength parameter.

MBurstPrecise
This field indicates whether the precise length of a burst is known at the start of the burst. When set to 1, MBurstLength indicates the precise length of the burst during the first request of the burst; if set to 0, MBurstLength for each request is a hint of the remaining burst length. To configure this field into the OCP, use the burstprecise parameter.

MBurstSeq
This field indicates the sequence of addresses for requests in a burst. To configure this field into the OCP, use the burstseq parameter.

MDataLast
Last write data in a burst. This field indicates whether the current write data transfer is the last in a burst. To configure this field into the OCP, use the datalast parameter. When this field is set to 0, more write data transfers are coming for the burst; when set to 1, the current write data transfer is the last in the burst.

SRespLast
Last response in a burst. This field indicates whether the current response is the last in this burst. To configure this field into the OCP, use the resplast parameter.
When the field is set to 0, more responses are coming for this burst; when set to 1, the current response is the last in the burst.
Thus the OCP basic block diagram, the dataflow signal diagram and the signal specifications have been presented, giving a clear view of the design of the Open Core Protocol bus.
4.9 Summary
The literature survey was carried out covering the merits and demerits of OCP, and the signal flow diagram was identified. The specification for the signals shown in the signal flow diagram was identified, and their working was explained with the help of the block diagram. An overview of OCP operation was discussed, covering all the signals involved in the OCP.
Table 5.1 Input control values

Control | Notation Used | Command
000     | IDL           | Idle
001     | WR            | Write
010     | RD            | Read
011     | INCR_WR       | Burst_Write
100     | INCR_RD       | Burst_Read
Table 5.2 OCP master command (MCmd) values

MCmd | Notation Used | Command
000  | IDL           | Idle
001  | WR            | Write
010  | RD            | Read

OCP slave response (SResp) values

SResp | Notation Used | Response
00    | NUL           | No Response
01    | DVA           | Data Valid
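The encodings in the tables above can be captured as VHDL constants so that the FSMs compare against named values instead of bare literals. The declarations below are a sketch following the notations used in the tables; they would sit in a shared package or architecture declarative part of the design.

```vhdl
-- Command and response encodings from Tables 5.1 and 5.2.
-- INCR_WR / INCR_RD exist only on the system-side Control input;
-- MCmd itself uses only IDL, WR and RD.
constant IDL     : std_logic_vector(2 downto 0) := "000";  -- Idle
constant WR      : std_logic_vector(2 downto 0) := "001";  -- Write
constant RD      : std_logic_vector(2 downto 0) := "010";  -- Read
constant INCR_WR : std_logic_vector(2 downto 0) := "011";  -- Burst_Write
constant INCR_RD : std_logic_vector(2 downto 0) := "100";  -- Burst_Read

constant NUL : std_logic_vector(1 downto 0) := "00";       -- No Response
constant DVA : std_logic_vector(1 downto 0) := "01";       -- Data Valid
```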
5.2 IMPLEMENTATION
5.2.1 Simple Write and Read Operation
The simple write and read operation in OCP uses the mandatory signals whose specification is given in Table 4.8.1.

FSM for OCP master
A Finite State Machine (FSM) is developed for the simple write and read operation of the OCP master. In the simple write and read operation, the control returns to the IDLE state after every operation. The FSM for the OCP master simple write
and read is developed and is shown in Figure 5.4. There are four states in this FSM: IDLE, WRITE, READ and WAIT. The operation in the OCP is held in two phases:
Request Phase
Response Phase
Initially the control is in the IDLE state (Control = 000), in which the outputs MCmd, MAddr and MData are set to don't care. When the system issues a write request to the master, the FSM moves to the WRITE state (Control = 001). In this state the address and the data to be written are given to the slave, and the transfer completes only when SCmdAccept is asserted high. If SCmdAccept is not set, the write operation is still in progress and the control stays in the WRITE state. Once the write operation is over, the control returns to the IDLE state and checks for the next request.
[Figure: master FSM with states IDLE, WRITE (drives MAddr, MCmd & MData), READ (drives MAddr & MCmd) and WAIT (Data_out = SData). Transitions: IDLE to WRITE on Control = WrReq; WRITE stays while SCmdAccept = 0 and returns to IDLE on SCmdAccept = 1; IDLE to READ on Control = RdReq; READ stays while SCmdAccept = 0 and moves to WAIT on SCmdAccept = 1 & SResp != DVA; WAIT returns to IDLE on SResp = DVA.]

Figure 5.4 FSM for OCP master - simple write and read

When a read request is made, the control goes to the READ state (Control = 010) and the address is sent to the slave, which in turn gives the SCmdAccept signal that ends the request phase. Once SCmdAccept is set and SResp is not Data Valid (DVA), the control goes to the WAIT state and waits for the SResp signal.
When the read operation is over, SResp is set to DVA and the data for the corresponding address is taken. The SResp signal thus ends the response phase, and the control returns to the IDLE state and checks for the next request.

FSM for OCP slave
The FSM for the OCP slave supporting the simple write and read operation is developed and is shown in Figure 5.5.
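The master FSM described above can be sketched as a clocked VHDL process. This is a schematic fragment, not the thesis implementation: reset logic and output registering are omitted, the command and response constants (WR, RD, DVA) are assumed to be in scope, and the wait state is named WAIT_ST because WAIT is a reserved word in VHDL.

```vhdl
-- Architecture declarative part (sketch):
type state_t is (IDLE, WRITE, READ, WAIT_ST);
signal state : state_t := IDLE;

-- Architecture body (sketch of the FSM of Figure 5.4):
process (Clk)
begin
    if rising_edge(Clk) then
        case state is
            when IDLE =>                        -- outputs are don't care
                if Control = WR then
                    state <= WRITE;             -- write request from system
                elsif Control = RD then
                    state <= READ;              -- read request from system
                end if;
            when WRITE =>                       -- drive MCmd, MAddr, MData
                if SCmdAccept = '1' then
                    state <= IDLE;              -- request phase ended
                end if;
            when READ =>                        -- drive MCmd, MAddr
                if SCmdAccept = '1' and SResp /= DVA then
                    state <= WAIT_ST;           -- wait for the response phase
                end if;
            when WAIT_ST =>
                if SResp = DVA then
                    Data_out <= SData;          -- response phase ended
                    state <= IDLE;
                end if;
        end case;
    end if;
end process;
```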
[Figure: slave FSM with states IDLE, WRITE (Store_Mem = MData; SCmdAccept = 1 & SResp = NULL) and READ (SData = Store_Mem; SCmdAccept = 1, SResp = DVA & SData). Transitions: IDLE to WRITE on MCmd = WrReq and IDLE to READ on MCmd = RdReq, each returning to IDLE when done.]

Figure 5.5 FSM for OCP slave - simple write and read

The slave is set to the respective state based on the MCmd issued by the master, and the outputs of the slave are SCmdAccept and SResp. Initially the control is in the IDLE state; when the master issues a write request command, the control goes to the WRITE state, in which the data is written to the corresponding memory address location sent by the master. Once the write operation is finished, the SCmdAccept signal is set high and given to the master. When MCmd carries a read request, the control moves to the READ state, in which the data is read from the particular memory address location given
by the master. SCmdAccept is then set high and SResp is set to DVA, which indicates that the read operation is over, and the control returns to the IDLE state.
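The slave side can be sketched in the same style. The fragment below is illustrative only: the memory depth (eight locations) and the use of the low address bits are assumptions for the sketch, Store_Mem follows the naming in the figure, the constants WR, RD, NUL and DVA are assumed to be in scope, and ieee.numeric_std is needed for the address conversion.

```vhdl
-- Architecture declarative part (sketch): small memory modelling
-- the slave storage.
type mem_t is array (0 to 7) of std_logic_vector(7 downto 0);
signal Store_Mem : mem_t;

-- Architecture body (sketch of the FSM of Figure 5.5):
process (Clk)
begin
    if rising_edge(Clk) then
        SCmdAccept <= '0';
        SResp      <= NUL;
        case MCmd is
            when WR =>      -- write request: store MData at MAddr
                Store_Mem(to_integer(unsigned(MAddr(2 downto 0)))) <= MData;
                SCmdAccept <= '1';
            when RD =>      -- read request: return the data with DVA
                SData      <= Store_Mem(to_integer(unsigned(MAddr(2 downto 0))));
                SCmdAccept <= '1';
                SResp      <= DVA;
            when others =>  -- IDL: no operation
                null;
        end case;
    end if;
end process;
```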
Simulation result for simple write and read
The FSMs developed above for the OCP master and slave supporting the simple write and read operation are described in VHDL and simulated. The designed OCP master and slave are integrated into a single design, and the simulated waveform, shown in Figure 5.6, represents the complete transaction of simple write and read operations between master and slave.
[Waveform annotations: in IDLE, MAddr and MData are set to don't cares; MCmd, MAddr and MData are asserted; after IDLE, control checks for the next request and goes to the READ state; after READ, control returns to the IDLE state.]

Figure 5.6 Waveform for OCP master and slave - simple write and read
The integrated OCP master and slave are simulated, which clearly illustrates the operation of the FSMs developed for simple write and read. The input data is written to the 0th and 3rd memory address locations during the write operation and is read out by giving the corresponding addresses during the read operation.
5.2.2 Burst Operation
The burst signals used for the burst extension in OCP are tabulated with their specification in Table 4.8.2. The main advantage of this burst extension is that the addresses are generated automatically according to the specified burst length; this automatic address generation is a major advantage of OCP.

FSM for OCP master
The FSM for the OCP master supporting the burst extension is developed according to its functionality and is shown in Figure 5.7.
[Figure: master burst FSM with states IDLE, WRITE (drives MAddr, MCmd & MBurstLength), READ and WAIT (Data_out = SData). Transitions: IDLE to WRITE on Control = WrReq; WRITE stays while SCmdAccept = 0 or SCmdAccept = 1 & (Count != BurstLength), and returns to IDLE on SCmdAccept = 1 & (Count = BurstLength); IDLE to READ on Control = RdReq; READ stays while SCmdAccept = 0 and moves to WAIT on SCmdAccept = 1 & SResp != DVA; WAIT returns to READ on SResp = DVA & (Count != BurstLength) and to IDLE when the burst completes.]

Figure 5.7 FSM for OCP master - burst operation

Note: In the FSM with burst extension shown in Figure 5.7, the transition for the condition (Count = BurstLength) is the same as for the condition (Count != BurstLength); the only difference lies in the address generation. When (Count = BurstLength) holds, the address is generated from the starting location, and when (Count != BurstLength) holds, the address is generated from the previous location.
The basic operation of this burst extension remains the same as in the previously developed FSMs. Initially the control is in the IDLE state and goes to the WRITE state when a write request is given. In the burst extension, the mandatory signal is the burst length, which gives the number of transfers in a burst. A counter is implemented which starts counting, and with it the address generation is started. When SCmdAccept is set high, the control checks whether the count has reached the burst length. If not, the address generation continues; if the count reaches the burst length, the count is reset to zero and the address generation restarts from the initial location. Similarly, the control is in the READ state for a read request, in which the count is processed, and when SCmdAccept is set high the control goes to the WAIT state. In the WAIT state the count process is paused and hence the address generation is also stopped. Once SResp is set to DVA, the count process continues, which lets the address generation continue. The data corresponding to the generated address is read from the memory and sent to the master through the SData signal. Thus, after every burst operation, either write or read, the control goes to the IDLE state, checks for the next request and acts accordingly.

FSM for OCP slave
The FSM for the OCP slave with the burst extension is developed and is shown in Figure 5.8.

Note: The transition for the condition (Count = BurstLength) is the same as for the condition (Count != BurstLength); the only difference is the address generation. When (Count = BurstLength) holds, the address is generated from the starting location, and when (Count != BurstLength) holds, the address is generated from the previous location or value.
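The counter and address generation described above can be sketched as follows. This is a fragment under stated assumptions: Count, Addr_Reg and Start_Addr are illustrative internal signal names, ieee.numeric_std is assumed, and only the accept-driven increment path is shown (the read-side pause in WAIT is omitted). The comparison uses MBurstLength - 1 because Count starts at zero.

```vhdl
-- Architecture declarative part (sketch):
signal Count    : unsigned(12 downto 0) := (others => '0');
signal Addr_Reg : unsigned(12 downto 0);

-- Architecture body (sketch): advance the count and address on each
-- accepted transfer; wrap at the end of the burst.
process (Clk)
begin
    if rising_edge(Clk) then
        if SCmdAccept = '1' then
            if Count = unsigned(MBurstLength) - 1 then
                Count    <= (others => '0');  -- burst complete: reset count
                Addr_Reg <= Start_Addr;       -- restart from initial location
            else
                Count    <= Count + 1;        -- transfer accepted: advance
                Addr_Reg <= Addr_Reg + 1;     -- incrementing burst sequence
            end if;
        end if;
    end if;
end process;

MAddr <= std_logic_vector(Addr_Reg);          -- drive the OCP address
```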
The initial state is IDLE, and when MCmd is set to a write request, the control goes to the WRITE state. Here the burst length and the count are declared, because the slave may not know whether the burst extension is enabled. The generated address and input data are given to the slave, which stores the data at the corresponding address and asserts the SCmdAccept signal high. The control then checks the count and the next MCmd request and proceeds accordingly.
[Figure: slave burst FSM with states IDLE, WRITE (Store_Mem = MData; SCmdAccept = 1, looping while Count != MBurstLength) and READ (SCmdAccept = 1, SResp = DVA & SData, looping while Count != MBurstLength). Transitions: IDLE to WRITE on MCmd = WrReq and IDLE to READ on MCmd = RdReq, each returning to IDLE when Count = MBurstLength.]

Figure 5.8 FSM for OCP slave - burst operation

When MCmd carries a read request, the control goes to the READ state, where the data corresponding to the generated address is read while SCmdAccept is set high. Once the read process is over, SResp is set to DVA, and the control checks both the count and the next request.
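The slave-side burst write path can be sketched in the same way: the slave keeps its own count and compares it against MBurstLength to detect the end of the burst. Signal names and the memory indexing are illustrative, the constant WR and the Store_Mem array are assumed to be declared as in the earlier slave sketch, and ieee.numeric_std is assumed.

```vhdl
-- Sketch of the slave burst write path of Figure 5.8.
process (Clk)
begin
    if rising_edge(Clk) then
        if MCmd = WR then
            -- store the write data at the generated address
            Store_Mem(to_integer(unsigned(MAddr(2 downto 0)))) <= MData;
            SCmdAccept <= '1';
            if Count = unsigned(MBurstLength) - 1 then
                Count <= (others => '0');   -- burst finished: back to IDLE
            else
                Count <= Count + 1;         -- more transfers expected
            end if;
        else
            SCmdAccept <= '0';
        end if;
    end if;
end process;
```

The read path mirrors this loop, additionally driving SData and SResp = DVA for each accepted transfer.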
6. SYNTHESIS RESULTS
Internal Diagram
6.3 Final Report
Final Results
RTL Top Level Output File Name :
Top Level Output File Name :
Output Format : NGC
Optimization Goal :
Keep Hierarchy :
Design Statistics
# IOs : 42

Cell Usage:
# BELS : 7818
#   BUF : 7
#   GND : 1
#   INV : 8
#   LUT1 : 52
#   LUT2 : 100
#   LUT2_D : 5
#   LUT2_L : 6
#   LUT3 : 2314
#   LUT3_D : 48
#   LUT3_L : 7
#   LUT4 : 2783
#   LUT4_D : 270
#   LUT4_L : 21
#   MUXCY : 71
#   MUXF5 : 1165
#   MUXF6 : 512
#   MUXF7 : 256
#   MUXF8 : 128
#   VCC : 1
#   XORCY : 63
# FlipFlops/Latches : 2368
#   FD : 2123
#   FDE : 190
#   FDR : 21
#   FDRS : 1
#   FDS : 33
# Clock Buffers : 1
#   BUFGP : 1
# IO Buffers : 41
#   IBUF : 33
#   OBUF : 8
6.4 Device utilization summary:
Selected Device : 3s500efg320-5
Number of Slices:           2947 out of 4656   63%
Number of Slice Flip Flops: 2368 out of 9312   25%
Number of 4 input LUTs:     5606 out of 9312   60%
Number of IOs:              42
Number of bonded IOBs:      42 out of 232      18%
Number of GCLKs:            1 out of 24        4%
6.5 Timing Summary:
---------------
Speed Grade: -5
Minimum period: 13.413ns (Maximum Frequency: 74.555MHz)
Minimum input arrival time before clock: 8.467ns
Maximum output required time after clock: 5.184ns
Maximum combinational path delay: No path found
6.6 Summary
Based on the literature review, the working of the OCP master and slave was made clear, and the design was made from the identified specifications. First, FSMs were developed separately for the OCP master and slave, covering the simple write and read operation and the burst operation. The developed FSMs were then modelled in VHDL. Finally, the OCP was designed so that the transactions between master and slave are carried out with proper delays and timing. Screenshots of the simulated waveforms are displayed and explained with respect to the design behaviour.
CHAPTER 7 RESULTS
[Waveform annotations: the burst length is given as 8, meaning a burst has 8 data transfers; the write process enables the count, which is incremented when SCmdAccept is set high; the generated address and the corresponding input data are assigned to MAddr and MData; the slave writes the input data to the corresponding generated memory address.]

Figure 7.1 Waveform for OCP master and slave - burst write operation

The simulated waveform for the burst read operation is shown in Figure 7.2; it shows that as the addresses are generated in sequence, the corresponding data stored in memory are read out. Here, too, the count is implemented and incremented up to the burst length. The master and slave go to the IDLE state when the burst operation is over, which is indicated by the count:
when the count reaches the given burst length, it is reset, and hence the address is generated again from the initial value.
Figure 7.2 Waveform for OCP master and slave - burst read operation

Thus the simulation results illustrate the operation of the developed FSMs for the master and slave, supporting the simple write and read operation, the pipelining operation and the burst operation.
Figure 7.3 Waveform for OCP master and slave burst read 8 operations
Figure 7.4 Waveform for OCP master and slave burst read 16 operations
Figure 7.5 Waveform for OCP master and slave out of order 4 operation
Figure 7.6 Waveform for OCP master and slave out of order 8 operation
Figure 7.7 Waveform for OCP master and slave out of order 16 operation
Figure 7.8 Waveform for OCP master and slave pipelined operation
8.1 Conclusion
Cores with OCP interfaces and OCP interconnect systems enable true modular, plug-and-play integration, allowing system integrators to choose cores optimally together with the best interconnect system for the application. This allows the designers of the cores and of the system to work in parallel and shortens design times. In addition, keeping system logic out of the cores allows the cores to be reused without additional time spent re-creating them; depending upon the real-time application, these intellectual properties can be reused. The basic aim of our project is to model the master and slave of OCP, and we have successfully modelled both master and slave, along with the internal memory design, using VHDL. The simulation results show that the communication between different IP cores using OCP is correct: all commands and data are successfully transferred from one IP core to the other with no loss of data or control information. The OCP supports the simple write and read operation and the burst extension. Based on the results obtained, the burst extension is seen to automate the address generation, with only the initial address provided to the protocol. The various scenarios for each component in the OCP design were verified effectively during simulation with respect to their behaviour.