
Modelling and Prototyping of a Network on Chip

Rickard Holsmark, Magnus Högberg

Master of Science Thesis 2002


ELECTRONICS

Model Construction and Prototype Development of a Network on Silicon

Rickard Holsmark, Magnus Högberg


This degree project was carried out at Ingenjörshögskolan i Jönköping (School of Engineering, Jönköping) within the subject area of Electrical Engineering. The work is part of a 60-credit (poäng) master's programme. The authors alone are responsible for the opinions, conclusions and results presented.
Supervisor: Shashi Kumar
Scope: 20 credits (D level)
Date:
Archive number:

Postal address: Box 1026, 551 11 Jönköping

Visiting address: Kyrkogatan 15

Telephone: 036-15 77 00 (switchboard)

Fax: 036-12 00 65

Abstract
This thesis describes a design flow for a Network on Chip (NoC), which could be a solution for communication in future Systems on Chip (SoC). NoC is expected to become commercially viable within 5 to 10 years. Because little information is available on the performance of various NoC configurations, one important purpose of the design phase is to build a system-level model that can be used for performance simulations. The system-level model has been built in the Specification and Description Language (SDL), and a discrete event simulator for the SDL model has been used for the simulations. The NoC is designed as a packet switched network, with micro-routers placed in a two-dimensional m*n mesh, in this case 4*4, giving 16 micro-routers. Every router has a connection to a resource, which could be, for instance, a processor, a memory or an FPGA. Another objective is to build a prototype of a NoC in an FPGA. For that purpose VHDL has been used to describe the circuit at a synthesizable level of abstraction. It is concluded that SDL is useful and relatively easy to use for performance simulations of a NoC, and that these simulations can be used to draw conclusions about design questions. For example, the simulation results showed that increasing the output buffer of a switch from 2 to 3 packets has only a marginal effect on performance. Once the behaviour and structure are described in SDL, the model also serves as a template for the design in VHDL. A small working NoC prototype has been built on an FPGA and tested using the serial port of a PC.

Acknowledgements
We would like to thank Professor Shashi Kumar for his invaluable guidance in this new NoC world. Alf Johansson, programme coordinator, has been very helpful, and for that we are very grateful. Magnus wants to thank his apartment friends for putting up with the unwashed plates. Rickard sends special thanks to his wife and daughter and promises to spend more time at home.

Summary
This document describes a design project dealing with Network on Chip (NoC), which is a possible solution for communication in future Systems on Chip (SoC). This solution is not yet available on the market but is expected to be usable commercially in 5-10 years. The design is built as a packet switched network, with 16 micro-routers placed in a two-dimensional 4*4 matrix. Each router has a connection to a resource, which can be used, for example, for a processor, a memory or an FPGA. Because few performance measurements of different NoC configurations are available, simulations are an important part of the design phase. The programming language SDL has been used for these simulations. After that, a VHDL description of the circuit has been made so that it can be implemented in an FPGA. It is concluded that SDL is relatively easy to use for performance measurements of a NoC, and that these measurements can then be used for design trade-offs. For example, the results show that it does not pay to increase the size of the output buffers from 2 to 3 packets, since this has only a marginal effect on performance. When the behaviour and structure are described in SDL, the description also supports the design in VHDL. A small NoC prototype has been implemented in an FPGA and tested via the serial port of a PC.

Key words
Core Based Design, FPGA, Network on Chip (NoC), On Chip Communication, Packet Switched Network, SDL, System on Chip (SoC), VHDL

List of Contents

List of Figures
1 Introduction
1.1 System on Chip
1.2 Network on Chip
1.2.1 NoC Evaluation Tools
1.3 SDL as a modelling platform
1.4 Objectives of the project
1.5 Outline
2 Theoretical Background
2.1 Communication Networks
2.1.1 Communication Techniques
2.1.2 OSI Model
2.1.3 Routers
2.1.4 Buffers
2.1.5 Topologies
2.2 Network on Chip Concepts
2.2.1 Survey of Network on Chip Ideas
2.3 System Level Design
2.3.1 SDL
3 NoC: Design Decisions
3.1 Design Methodology
3.2 Network configuration
3.3 Route and Switch function
3.4 Connections between Nodes
3.4.1 Drop of Packets
3.4.2 Physical issues
3.4.3 Data-link layer connection
3.4.4 Network layer connection
3.4.5 Transport layer
3.5 Packet structures
3.6 Buffers
3.7 Routing Algorithm
3.8 RNI function
3.9 Resource
3.10 Connection to Environment
4 NoC: Modelling in SDL
4.1 Requirements of Model
4.2 System Structure
4.2.1 Design Blocks
4.2.2 Parameterised Mesh size
4.2.3 Micro-Router
4.2.4 RNI
4.2.5 Resource
4.2.6 Description of Common Types
4.3 Design Tool
4.3.1 Simulation Tool
4.4 Simulation Set-up
4.5 Simulation Results
4.5.1 Simulations with equal delay of Switch and Buffer
4.5.2 Simulations with unequal delay of Switch and Buffer
4.6 Chapter Discussion
5 NoC: Hardware Design
5.1 Model Requirements
5.2 Design Structure
5.2.1 Micro-Router
5.2.2 RNI
5.2.3 Resource
5.3 Design and Simulation Tool
5.4 Simulation Results
5.4.1 Simulated values
5.4.2 Simulated and Implemented values
5.5 Chapter Discussion
6 NoC: Prototyping on FPGA
6.1 Prototype Board
6.2 Functional Description
6.2.1 Communication
6.2.2 Resources
6.2.3 I/O-ports
6.3 Technology Mapping tool
6.4 Implementation Result
6.5 Chapter Discussion
7 Results
7.1 SDL Modelling and Simulation of NoC
7.2 Designing NoC using VHDL
7.3 Implementation of NoC prototype in FPGA
8 Conclusions
9 Vocabulary

List of Figures
Figure 1-1. Resources in a NoC
Figure 2-1. Layers in the OSI-model
Figure 2-2. Network topologies
Figure 2-3. A simple SDL system
Figure 3-1. Design refinement
Figure 3-2. Blocks in micro-router
Figure 3-3. Layers in the NoC
Figure 3-4. Packet structures in different layers
Figure 3-5. RNI interface
Figure 3-6. Resource
Figure 3-7. Connections to environment
Figure 4-1. A node in the NoC
Figure 4-2. Internal blocks of a node
Figure 4-3. SDL blocks in the network layer
Figure 4-4. Network overview
Figure 4-5. Table of resource configuration
Figure 4-6. Transfer statistics for 1 continuous and 15 bursty resources
Figure 4-7. Transfer mean time for 1 continuous and 15 bursty resources
Figure 4-8. Spreading factor with 1 continuous and 15 bursty resources
Figure 4-9. Simulation set-up for 16 vs. 14 bursty resources
Figure 4-10. Number of transferred packets, 16 vs. 14 bursty resources
Figure 4-11. Number of cancelled packets, 16 vs. 14 bursty resources
Figure 4-12. Number of dropped packets, 16 vs. 14 bursty resources
Figure 4-13. Transfer mean time, 16 vs. 14 bursty resources
Figure 4-14. Simulation results with different burst length
Figure 4-15. Transfer statistics for 1 continuous and 15 bursty resources
Figure 4-16. Transfer mean time for 1 continuous and 15 bursty resources
Figure 4-17. Number of transferred packets, 16 vs. 14 bursty resources
Figure 4-18. Number of cancelled packets, 14 vs. 16 bursty resources
Figure 4-19. Number of dropped packets, 14 vs. 16 bursty resources
Figure 4-20. Transfer mean time, 14 vs. 16 bursty resources
Figure 4-21. Simulation results with different burst length
Figure 5-1. VHDL block model of NoC at node level
Figure 5-2. VHDL model of NoC at network layer
Figure 5-3. VHDL block model of RNI and a resource
Figure 5-4. The communication process between a switch and buffers
Figure 6-1. Overview of Network on Chip prototype in FPGA
Figure 6-2. Bits in implementation
Figure 6-3. Implementation result
Figure 7-1. Description of communication in NoC-prototype

1 Introduction

1.1 System on Chip


With the increase in the number of transistors that can be fabricated on a single chip, the System on Chip (SoC) concept has become possible. Before this, chip design was aimed at producing chips that contained a single stand-alone design. In a SoC, multiple stand-alone VLSI (Very Large Scale Integration) designs are stitched together on one chip to provide one functional system. When used in a SoC, the stand-alone designs are often referred to as cores or Intellectual Property (IP) blocks. Several vendors only design cores and let other companies buy them and manufacture the chips. A SoC design may thus contain cores from different vendors.

1.2 Network on Chip


Network on Chip (NoC) is one solution for designing communication among components in SoC circuits with several billion transistors, which are expected to reach the market in approximately 5-10 years from now. One reason for a new communication strategy is that it is too costly to use one global synchronous clock in such a circuit, since a signal will need several clock cycles just to travel across the chip. As a result, designs will approach a model that is locally synchronous within each component but globally asynchronous on the chip. This is called the GALS (Globally Asynchronous Locally Synchronous) paradigm. One solution used in today's SoCs is to have dedicated buses between the communicating resources. This gives no flexibility, since the communication needs have to be reconsidered every time a design is made. Another possibility is the use of common buses, which do not scale well as the number of resources grows. NoC is intended to overcome these shortcomings by implementing a communication network of switches/micro-routers and resources. Research on NoC is now expanding very rapidly, and several companies and universities are involved. The KTH/VTT Project Group, which has influenced this project the most, has suggested a mesh where each micro-router is connected to one resource and four other micro-routers. The resources can be IP cores or in-house developed designs, for example signal processors, RAM, FPGAs or any kind of custom hardware block. Figure 1-1 shows how a NoC, in comparison with shared buses, could be populated with various components as resources. RNI stands for Resource Network Interface, a component that adapts the communication requirements of the resource to the network protocol.

Figure 1-1. Resources in a NoC

1.2.1 NoC Evaluation Tools

Today the development of NoC is focused on finding a suitable network configuration. The ideas have to be tested, and therefore tools for evaluating a NoC design are needed. Any claim that a NoC design is efficient has to be supported by performance simulations. Several network simulators are available; for example, NS-2 [15] has been used for this purpose. However, NS-2 was not designed for NoC, and its configuration possibilities seem unable to meet the requirements of a NoC model. Another idea is to use an ordinary high-level programming language, such as C++, to build a simulator. This allows the simulator to be made as accurate as the developer wants, but it takes a lot of time to develop. The idea of this project is to use a system-level description language to build a model that meets the requirements of a NoC simulator, so that the results are valid for a specific design.

1.3 SDL as a modelling platform


The Specification and Description Language (SDL) is suited for describing reactive discrete systems. A NoC can be characterised as such a system: for example, a micro-router reacts to a discrete event such as the arrival of a packet. It then makes a decision and performs some action accordingly. Because of the high level of description and the possibility to specify timing properties, it is relatively easy to build a model of a complex system. A discrete event simulator can then be used to simulate and examine the behaviour of the system before it is implemented.
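The following minimal Python sketch illustrates what a discrete event simulation of such a system looks like. It is not the SDL model of this thesis: it keeps a time-ordered event queue and lets a single hypothetical micro-router react to packet-arrival events, forwarding a packet when its output is free, buffering it when the output is busy and dropping it when the buffer is full. The names `simulate`, `SERVICE_TIME` and `BUFFER_SIZE` and their values are assumptions chosen for illustration.

```python
import heapq
from collections import deque

SERVICE_TIME = 2  # hypothetical time units needed to forward one packet
BUFFER_SIZE = 2   # hypothetical output buffer capacity, in packets


class MicroRouter:
    """Toy router with one output link and a bounded FIFO buffer."""

    def __init__(self):
        self.busy = False
        self.buffer = deque()
        self.forwarded = []  # (time, packet id) of completed transmissions
        self.dropped = []    # packet ids that found the buffer full


def simulate(arrivals):
    """Run a discrete event simulation; arrivals is a list of (time, packet_id)."""
    router = MicroRouter()
    events = []  # heap of (time, sequence number, event kind, packet id)
    seq = 0
    for t, pid in arrivals:
        heapq.heappush(events, (t, seq, "arrive", pid))
        seq += 1

    while events:
        now, _, kind, pid = heapq.heappop(events)  # always the earliest event
        if kind == "arrive":
            if not router.busy:
                # Output free: start forwarding immediately.
                router.busy = True
                heapq.heappush(events, (now + SERVICE_TIME, seq, "done", pid))
                seq += 1
            elif len(router.buffer) < BUFFER_SIZE:
                router.buffer.append(pid)   # output busy: store the packet
            else:
                router.dropped.append(pid)  # buffer full: drop the packet
        else:  # "done": a transmission has finished
            router.forwarded.append((now, pid))
            if router.buffer:
                nxt = router.buffer.popleft()
                heapq.heappush(events, (now + SERVICE_TIME, seq, "done", nxt))
                seq += 1
            else:
                router.busy = False
    return router


router = simulate([(0, "a"), (0, "b"), (1, "c"), (1, "d")])
```

With these assumed values, packets a, b and c leave at times 2, 4 and 6, while d arrives at a full buffer and is dropped. Varying `BUFFER_SIZE` corresponds to the kind of buffer-size question that the SDL simulations in this thesis were used to answer.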

SDL supports a hierarchical division of the system into blocks, which can be used to describe functional or physical units. The behaviour of the system can be specified using concurrent processes. The structural properties of the language make it possible to use the model to simplify the design at the lower levels of the system.

1.4 Objectives of the project


Since the NoC concept is relatively new, research in this area concentrates on investigating specific details of the subject. In the published material we studied, we found no attempt to model a NoC using SDL, nor any VHDL model that uses buffers in the micro-routers of the NoC. The first objective of the project is therefore to model and simulate a packet switched NoC with buffered micro-routers using SDL. Communication between micro-routers shall use a layered design similar to the OSI model. The second objective is to develop and simulate a VHDL description of the NoC, which will give information about timing properties in the NoC. The model should be synthesizable and possible to implement in an FPGA. The third and last objective is to build a prototype of a NoC system in an FPGA and verify the design. The resources may be very simple because of the limited space in the circuit, but the prototype shall prove that the model works.

1.5 Outline

This chapter has introduced the NoC concept and its motivation, and defined the purpose of the project. Chapter 2, entitled Theoretical Background, describes theories about networks in general and some different ideas on how a NoC could be designed. Its purpose is to help the reader understand the area and to show that there are many different aspects to consider when designing a network. The theories presented there are also the basis on which the design is built, and after reading that chapter the reader should be able to understand the design decisions and limitations of the model. In chapter 3, entitled NoC: Design Decisions, the overall decisions regarding all stages of the design are presented and motivated. Chapter 4, entitled NoC: Modelling in SDL, describes the system-level design and functionality of the NoC in SDL. Chapter 5, entitled NoC: Hardware Design, discusses the VHDL design of various components of a NoC system. In chapter 6, entitled NoC: Prototyping on FPGA, we discuss issues of implementing a small NoC prototype on a programmable platform such as an FPGA. Chapters 4 to 6 also present partial results and conclusions about NoC from the various design phases. Chapter 7, entitled Results, summarises the results obtained in the previous chapters. In chapter 8, entitled Conclusions, some important thoughts about the project and the results are discussed, and proposals for future work are given.

2 Theoretical Background

This chapter describes the theoretical background and the language (SDL) on which the project is based. It also gives a brief description of what other researchers have presented in the NoC area.

2.1 Communication Networks


2.1.1 Communication Techniques

The two main techniques for directing communication in a network are packet switching and circuit switching. Packet switching works as follows: when a router receives a packet, it looks at the destination address and then tries to forward the packet in that direction. If for some reason the packet cannot be sent along the best path, it may be redirected, dropped or buffered. The Internet is a good example of a network that uses this type of switching. The main idea behind circuit switching is to set up a physical or logical connection between the nodes that want to communicate. An example is Time Division Multiplexing (TDM) systems, where nodes communicate with each other in specified time slots.

2.1.2 OSI Model

The International Organization for Standardization (ISO) has presented a reference model called Open Systems Interconnection (OSI) [5], [7]. The concept of this model is to separate the different functions of a network into 7 layers, see Figure 2-1, where each layer performs a certain service in the network. A layer cannot see how the layers below it perform their functions. A layer in one node communicates as if it were directly connected to the same layer in the node it is communicating with. This is called a logical connection, and the protocols used for it are called peer protocols. Since no layer except the physical layer can reach its peer physically, each layer must make use of the service provided by the layer below. Consider a situation in which a packet from the transport layer is to be sent to another node in the network. The transport layer cannot itself find the way to that node, so the packet is passed to the network layer, where it is given an address that corresponds to the receiving node. However, the network layer has no protocol for sending the packet to another node. It must send the packet via the data link layer, where the packet is framed and an error detection code is added. The data link layer then uses the service of the physical layer to transmit the frame to the next node chosen by the network layer. At the next node, the frame is unpacked up to the network layer, and if that node is not the destination, the packet is re-routed and sent out on the network again. This continues until the intended destination is reached or the packet is dropped.

[Figure: protocol stacks of two communicating nodes, each with the layers Application, Presentation, Session, Transport, Network, Data Link and Physical]

Figure 2-1. Layers in the OSI-model
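The encapsulation steps described above can be sketched in Python. The packet layout is an assumption for illustration, not the packet format used in this thesis: the hypothetical network layer prepends one-byte source and destination addresses, and the data link layer appends a CRC-32 (computed with the standard library's `zlib.crc32`) as its error detection code.

```python
import struct
import zlib


def network_wrap(payload: bytes, src: int, dst: int) -> bytes:
    """Network layer: prepend hypothetical 1-byte source and destination addresses."""
    return struct.pack("BB", src, dst) + payload


def network_unwrap(packet: bytes):
    """Network layer at the receiver: split off the addresses again."""
    src, dst = struct.unpack("BB", packet[:2])
    return src, dst, packet[2:]


def link_frame(packet: bytes) -> bytes:
    """Data link layer: append a CRC-32 so the receiver can detect bit errors."""
    return packet + struct.pack(">I", zlib.crc32(packet))


def link_unframe(frame: bytes) -> bytes:
    """Data link layer at the receiver: check the CRC before passing the packet up."""
    packet, (crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    if zlib.crc32(packet) != crc:
        raise ValueError("CRC mismatch: frame corrupted")
    return packet


# Sender: transport payload -> network packet -> data link frame.
frame = link_frame(network_wrap(b"hello", src=3, dst=12))

# Next node: unframe up to the network layer and read the destination address.
src, dst, payload = network_unwrap(link_unframe(frame))
```

A receiving node that finds `dst` different from its own address would frame the packet again and send it on towards the destination, as described above.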

2.1.3 Routers

In a switched network there is a need to find a route through several switches. The switches at the cross-points of the network therefore also implement a routing function, and they are called routers because of this functionality. The routing algorithm is a very important part of the router, since its task is to route every packet in the right direction. Some routing algorithms can determine not only the shortest route but also the fastest one. The two main kinds of routing algorithms are static and adaptive routing. In static routing there are one, or possibly a few, fixed paths between sender and receiver. With a static routing algorithm the routes change very slowly over time, if at all, and a change is often the result of human intervention. In adaptive, also called dynamic, routing, on the other hand, the routing algorithm alters the route of packets dynamically, for example according to network traffic or due to changes in the topology. A global routing algorithm has complete information about connectivity and link costs in the network, and can thereby compute the least-cost path between source and receiver. The calculation itself can be run at one site or at multiple sites. In a decentralized routing algorithm, the nodes calculate the least-cost path by communicating with their neighbours. In the beginning a node only knows the costs of its own directly attached links; then, through an iterative process of communication between nodes, the least-cost path to a destination is calculated.
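The iterative, decentralized calculation described above corresponds to the distance-vector approach, a distributed form of the Bellman-Ford algorithm. The Python sketch below simulates it in synchronous rounds: in each round every node updates its least-cost estimates using only its own link costs and its neighbours' estimates from the previous round. The four-node topology and the link costs in the usage example are hypothetical.

```python
import math


def distance_vector(links, nodes):
    """Iterate distance vectors until no estimate changes.

    links: dict mapping (u, v) -> cost for each directed link u -> v.
    Returns dist[u][d], node u's least-cost estimate to destination d.
    """
    neighbours = {u: {} for u in nodes}
    for (u, v), cost in links.items():
        neighbours[u][v] = cost

    # Initially a node only knows itself and its directly attached links.
    dist = {u: {d: math.inf for d in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = 0
        for v, cost in neighbours[u].items():
            dist[u][v] = cost

    changed = True
    while changed:
        changed = False
        # Each node sees its neighbours' vectors as they were after the last round.
        snapshot = {u: dict(dist[u]) for u in nodes}
        for u in nodes:
            for d in nodes:
                best = min(
                    (cost + snapshot[v][d] for v, cost in neighbours[u].items()),
                    default=math.inf,
                )
                if best < dist[u][d]:
                    dist[u][d] = best
                    changed = True
    return dist


# Hypothetical ring A-B-C-D-A where the direct link A-D is expensive.
nodes = ["A", "B", "C", "D"]
links = {}
for u, v, c in [("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("A", "D", 5)]:
    links[(u, v)] = c  # links are symmetric in this example
    links[(v, u)] = c

dist = distance_vector(links, nodes)
```

Node A initially believes D is reachable at cost 5 over its direct link; after two rounds of exchanging estimates it learns the cheaper path A-B-C-D with cost 3.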

2.1.4 Buffers

If buffers are added to a switch to store packets when the network is temporarily overloaded, the probability that packets are dropped decreases. Some switches use only a single output buffer and multiple input buffers, which can cause the problem called head-of-line blocking [6]. It occurs when the first packet in the FIFO queue of an input buffer cannot be sent because its desired output is not available. The packets behind it cannot pass, since they must wait for the packet at the head of the queue to be sent. Switches with multiple output buffers and single input buffers do not suffer from this problem. Their main drawback, however, is that packets may be rejected if they arrive at the router faster than it can handle them. Another method is to use shared memory in the switch. The drawback of this type is that it may result in a slower system, since memory accesses must be synchronised and organised.

2.1.5 Topologies

Network topology refers to the shape of the network: it determines how the different nodes in a network are connected to each other and how they communicate.

[Figure: Full Mesh, Star, Ring and Bus topologies]

Figure 2-2. Network topologies.

Mesh topology comes in two types: full mesh and partial mesh. Full mesh means that a node is connected to every other node in the network; this is a very costly method and is mostly used to connect buses. Partial mesh means that a node does not have to be directly connected to all other nodes. This type of mesh is less costly than full mesh, but its disadvantage is less redundancy. The 2D-array is a type of mesh in which the nodes form a two-dimensional grid where each node is connected to its four adjacent routers. The routers at the edges have only two or three connections, since they do not have more adjacent routers. The number of nodes is then C*R, where C is the number of columns and R is the number of rows. The torus is a topology similar to the 2D-array, in which the nodes form a regular cyclic two-dimensional grid. Here all routers have four connections, since a torus is basically a mesh with wrap-around at the edges. Star topology uses a central hub to which all resources are connected; all communication between resources then passes through the central hub. In a ring topology the resources are connected to each other in a ring. Every resource is then connected to its two neighbours, and communication with other resources has to pass through the neighbours.
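The difference between the 2D-array and the torus can be made concrete by listing each router's neighbours. In this Python sketch, routers are addressed by hypothetical (column, row) coordinates; the mesh omits positions outside the grid, while the torus wraps them around the edges.

```python
def mesh_neighbours(c, r, cols, rows, torus=False):
    """Return the (column, row) coordinates of the routers adjacent to (c, r)."""
    result = []
    for dc, dr in ((0, -1), (1, 0), (0, 1), (-1, 0)):  # north, east, south, west
        nc, nr = c + dc, r + dr
        if torus:
            result.append((nc % cols, nr % rows))  # wrap around the edges
        elif 0 <= nc < cols and 0 <= nr < rows:
            result.append((nc, nr))  # keep only positions inside the mesh
    return result


COLS, ROWS = 4, 4  # the 4*4 configuration: C*R = 16 routers
nodes = [(c, r) for r in range(ROWS) for c in range(COLS)]

mesh_degree = {n: len(mesh_neighbours(*n, COLS, ROWS)) for n in nodes}
torus_degree = {n: len(mesh_neighbours(*n, COLS, ROWS, torus=True)) for n in nodes}
```

For the 4*4 configuration, the mesh has 4 corner routers with two connections, 8 edge routers with three and 4 inner routers with four, whereas every torus router has four connections. The wrap-around as written assumes at least 3 columns and 3 rows; in smaller grids a torus router would list the same neighbour twice.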

Bus topology means that several resources use the same communication channel. In an ordinary local area network this can result in collisions, caused by two resources sending a packet at the same time. Collisions can be avoided by letting each resource send its packets in a time slot that is unique to that resource.
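A simple way to give each resource a unique time slot is a round-robin schedule in which slot t belongs to resource t mod N. The Python sketch below is a hypothetical illustration of this idea rather than a protocol from this thesis: a resource may drive the bus only in its own slot, so no two resources ever transmit at the same time.

```python
def bus_schedule(queues, num_slots):
    """Simulate a shared bus where slot t is owned by resource t % N.

    queues: list of per-resource lists of pending packets, sent in order.
    Returns, for each slot, (owner, packet) or (owner, None) if the slot is idle.
    """
    pending = [list(q) for q in queues]  # copy so the caller's lists are untouched
    n = len(pending)
    trace = []
    for t in range(num_slots):
        owner = t % n  # only this resource may drive the bus in slot t
        packet = pending[owner].pop(0) if pending[owner] else None
        trace.append((owner, packet))
    return trace


trace = bus_schedule([["a1", "a2"], [], ["c1"]], num_slots=6)
```

The idle slots (None) show the cost of this scheme: a slot reserved for a resource is wasted whenever that resource has nothing to send.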

2.2 Network on Chip Concepts


Many researchers are investigating how to design a network on chip. The issues cover aspects such as design methodology, topology, physical constraints and switching techniques. The following subsection gives some examples of ideas that other researchers have published.

2.2.1 Survey of Network on Chip Ideas

Philips [2]
The increasing complexity of designs will promote the use of Intellectual Property (IP) blocks, so that a system becomes a composition of heterogeneous blocks. The challenges for system design will shift from the design of computation, the IP blocks, to communication and storage. As the relative cost of wires increases, routers are suggested as a way to use the wires more efficiently. To get a flexible and efficient solution, the network will have to support at least two traffic classes, called Best Effort (BE) and Guaranteed Throughput (GT). The suggested router is packet switched, with input queuing for BE traffic and time division multiplexing for GT traffic.

Stanford University [3]
The interconnection of the components is seen as the limiting factor in achieving the desired SoC. To solve this, a layered design methodology is suggested. The authors propose to borrow design methods from the network design field and to view the SoC as a micro-network of components. Each level in the micro-network stack is then optimised for the target application domain. Network reconfigurability provides flexibility, so that plug-and-play components can interact with each other through reconfigurable protocols. The laws of nature will require a paradigm shift in the design of the physical layer, because the long global wires in these networks will function as lossy transmission lines. It may be too costly to try to design ideal wires as is done today; instead, it will be necessary to accept that the wires on the chip are not fully reliable and to deal with that fact at a higher layer of abstraction. The conclusion is that the layered micro-network approach is likely the only way to master the complexity of future SoCs.

KTH/VTT Project Group [1]
The NoC platform includes both an architecture and a design methodology. The architecture is a two-dimensional mesh of switches with slots for resources. The architecture is the communication infrastructure, including the physical, data-link and network layers of the OSI protocol stack. Deriving the concrete architecture is the first of the two phases that form the foundation of the design methodology. The second phase is the mapping of applications, which together with the architecture form a complete product. Based on calculations of the switch size, it is assumed that the physical bus width between switches can be about 300 wires. Therefore, it is possible to route 128 wires in each direction between neighbouring switches.

The general network is called CLICHÉ (Chip-Level Integration of Communicating Heterogeneous Elements). For more specialised purposes where performance is more important, the network may have to support the concept of regions. These regions will not necessarily have the same structure and communication mechanisms as the rest of the network.

2.3 System Level Design


If a system is to be constructed so that resources are used effectively, implementation issues should not be considered at the beginning. In this stage, called the specification stage, the focus should be on fulfilling the requirements and specifying the functions without deciding whether they are implemented in hardware or software. After that, the functions are designed and implemented in the most efficient (least costly) way that fulfils the demands. A design at system level can also be useful for describing how the system performs its functions, in this case with an implementation in mind. Ideally, one could write a system-level specification and obtain an implementation directly from it. At present, tools are not available to go directly from specification to implementation. Some vendors have tools that can convert SystemC to VHDL and in that way obtain an implementation in hardware. Tools for SDL can at present compile to C/C++ code and thereby produce an implementation in software.

2.3.1 SDL

SDL is an abbreviation of Specification and Description Language, which first became available in 1976. It was used by the telecommunication sector to help in the design of the increasingly complex systems that were being created. The language has been developed constantly over the years, and the latest version was released in 2000. It is an object-oriented language that makes it possible to separate the structural, data and behavioural aspects of a system. SDL has both a graphical and a textual representation, and it is possible to translate between them. The language is well suited for describing reactive discrete systems. A reactive system can be characterised as a system whose behaviour is dominated by its reactions to actions from its environment. The system is also discrete if these actions take place as discrete events and not continuously.
With SDL it is possible to build a model of a complete system and examine its behaviour before it is implemented. The structural aspects of a system are described in blocks, which can be organised hierarchically. Behaviour is described in processes; these cannot be organised hierarchically, and instead all processes in a system are concurrent. Behavioural hierarchy can be described by using procedures in the processes. Processes communicate by signals that pass through channels between them. Process behaviour is described by Extended Finite State Machines (EFSMs), in which transitions are triggered by discrete events. Figure 2-3 shows an example of how a simple system can be modelled. It describes the behaviour of a slot machine that responds to the insertion of a coin: it either returns the coin if it is not valid, or dispenses a randomly chosen ware if the coin is a ten-crown coin.


Figure 2-3. A simple SDL system. The upper left shows the system specification, which in this case contains a block called Ware_Machine and the channels that connect it to the environment. The channel CoinInput carries a signal called Coin, of the type CoinType, which is declared as a new type. The interior of block Ware_Machine is shown in the upper right; in this case it contains one process called Ware_Machine_Process.


In this process the behaviour is described as an EFSM, shown in the lower part of the picture. Consider what happens when a coin is inserted on CoinInput. The EFSM leaves the state Wait_for_Coin and checks CoinSort in the decision box. If the coin is anything other than a TenCrown, it puts Coin on CoinOutput and returns to Wait_for_Coin. If CoinSort is TenCrown, a Task using the ANY operator randomly chooses one of the wares declared in WareType: a key ring, a plastic snake or a small doll. The choice is assigned to WareSort and output with the signal Ware on Ware_Output. After this the EFSM returns to Wait_for_Coin.
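The transitions of Ware_Machine_Process can be mirrored, as a rough sketch, by an ordinary state machine. The Python below is not generated from the SDL model: the class and method names are hypothetical, the CoinType value OTHER stands in for "any coin that is not a TenCrown", and `random.choice` plays the role of the SDL ANY operator.

```python
import random
from enum import Enum


class CoinType(Enum):
    TEN_CROWN = "TenCrown"
    OTHER = "Other"  # hypothetical stand-in for every non-TenCrown coin


class WareType(Enum):
    KEY_RING = "key ring"
    PLASTIC_SNAKE = "plastic snake"
    SMALL_DOLL = "small doll"


class WareMachineProcess:
    """Sketch of the EFSM in Figure 2-3: one state, one input signal, two outputs."""

    def __init__(self, rng=None):
        self.state = "Wait_for_Coin"
        self.rng = rng or random.Random()

    def on_coin(self, coin_sort):
        """Consume the Coin signal and return the (channel, signal, value) output."""
        assert self.state == "Wait_for_Coin"
        if coin_sort is not CoinType.TEN_CROWN:
            output = ("CoinOutput", "Coin", coin_sort)   # hand the coin back
        else:
            ware_sort = self.rng.choice(list(WareType))  # plays the role of ANY(WareType)
            output = ("Ware_Output", "Ware", ware_sort)
        self.state = "Wait_for_Coin"                      # both branches return here
        return output
```

Because the EFSM has a single state, each input simply produces one output and leaves the machine in Wait_for_Coin.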


NoC: Design Decisions

Several design decisions have to be made before and during the project, since detailed specifications for the project are not defined. The reason is that there is not enough information available to make a detailed specification before the start.

3.1 Design Methodology


This project consists of three stages, ordered from system level down to implementation, as shown in Figure 3-1. The network's behaviour is modelled at the system level in SDL. Some information about the system's behaviour will come from the VHDL design: since the VHDL model is clock synchronous, the number of clock cycles a function takes will be used in the SDL model to obtain more realistic timing behaviour. The meaning of Figure 3-1 is that the design is refined step by step, and information from a lower stage is used to improve the model at a higher stage.

Figure 3-1. Design Refinement.

3.2 Network configuration


The packet-switched method was chosen because of its scalability and flexibility. Its drawback is that the network will have no hard real-time properties. The m*n 2-D mesh structure is used because it keeps routing simple, even as the number of network nodes grows.


3.3 Route and Switch function


A micro-router is built according to the OSI model, in which different layers perform specified services. This gives a basic structure to design upon and makes it possible to change the service of one layer without affecting the others. Micro-routers should be able to communicate with each other both at the network layer and at the data-link layer. The reason is that it should be possible to run a fast SDL simulation using only the network layer, or a more accurate simulation with both layers. The micro-router itself is divided into the functional blocks shown in Figure 3-2: I/O buffers for communication with other micro-routers, an I/O buffer for communication with the RNI, and a Route-Control unit that switches packets in the right direction.

Figure 3-2. Blocks in Micro-Router (four In/Out Buffers, one RNI In/Out Buffer and the Route-Control unit)

3.4 Connections between Nodes


This area can be divided into two parts, because the SDL model has two objectives: to simulate and evaluate the network, and to support the VHDL design. These objectives partly conflict. A very detailed SDL design would be more time-consuming to simulate. On the other hand, too few details may cause design aspects to be overlooked, and dealing with these in VHDL can be very complex. The solution chosen was a design that can be simulated either down to the data-link layer or at the network layer only. Since the data-link layer adds no information about how the routing and buffers work, it can be left out when evaluating network performance.


3.4.1 Drop of Packets

The packet-transferring components in the design are connected in such a way that a sender cannot send a packet unless it gets a ready-to-receive signal from the receiver. A receiver does not give this signal if it is full. No packets are therefore dropped between these components. Drops are allowed in the micro-router when it has been impossible to route a packet for a certain amount of time, in order to prevent deadlock or other unwanted behaviour. This strategy was chosen because, when the network is heavily loaded, the resources are then prevented from sending packets that would most likely be dropped. The information about congestion is thereby brought closer to the source, and the decision on how to react is made by the resource.

3.4.2 Physical issues

In [1] the proposed bus width is 300 wires, including address and control wires. This is based on the assumption that each side of the router can physically accommodate this number of I/Os. Once a bus width is decided, it must also be decided in which direction the wires are used. One option is a bus where all wires are used by both routers to send and receive data. Another is to use half of the bus as output on one router and input on the other, and vice versa. This means that less data can be transmitted in each direction, but communication is simpler and both routers can send and receive at the same time. For simulation and implementation in this project such a wide bus should not be necessary, as it would only slow the simulation down and take up a lot of space in the FPGA. It is, however, useful to have a bus width that is easily controlled, and it should be possible to set the width to any size. The physical layer defines the actual physical connections that form the basis of a network. Since the implementation is intended for an existing FPGA with a fixed structure, the advantages of modelling the physical layer in SDL are not clear. It is therefore not included in the design.
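The ready-to-receive rule of 3.4.1 can be illustrated with a minimal sketch, assuming that a receiver's ready signal simply means "buffer not full". The class name `ReceiverBuffer`, the function `try_send` and the `capacity` parameter are hypothetical and do not come from the SDL or VHDL design.

```python
from collections import deque


class ReceiverBuffer:
    """Bounded FIFO that asserts 'ready' only while it has free space."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()

    def ready(self):
        return len(self.slots) < self.capacity

    def accept(self, packet):
        # A well-behaved sender never calls this while ready() is False.
        if not self.ready():
            raise RuntimeError("sender ignored the ready-to-receive signal")
        self.slots.append(packet)

    def pop(self):
        return self.slots.popleft()


def try_send(sender_queue, receiver):
    """Move one packet only if the receiver signals ready; nothing is dropped here."""
    if sender_queue and receiver.ready():
        receiver.accept(sender_queue.popleft())
        return True
    return False  # the packet stays with the sender and is retried later
```

With a receiver of capacity 2, only the first two of five queued packets are transferred; the rest wait at the sender until the receiver frees a slot, which is how congestion propagates back towards the resource.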
3.4.3 Data-link layer connection

The data-link layer deals with making the transfer of data between two micro-routers reliable. Data is grouped into frames, which may consist of, for example, a header, a payload and a checksum. In this design the payload is a packet from the network layer. Each data-link frame is sent in parallel to the neighbouring node. Every frame is checked for errors, and a retransmission is requested if it is incorrect. The transfer on the bus has to be synchronised in some way; the method chosen is a simple handshaking protocol. For example, the sender places data on the bus together with a write signal, and the receiver acknowledges it with another signal.

3.4.4 Network layer connection

This layer handles the transfer of data between any two nodes in the network, which is done by the routing function in the micro-routers. A message at this layer may consist of several packets, each carrying the address of its destination. The purpose of the network layer is to make sure that the packets reach their destination address in the way decided by the routing algorithm. The network layer is not considered reliable, because there is no confirmation that a message has reached its destination. As mentioned at the beginning of this chapter, it should be possible to simulate the design without the data-link layer. In that case the connection goes directly between the buffers in the opposite nodes, as illustrated in Figure 3-3.
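The write/acknowledge exchange of 3.4.3 can be sketched as a four-phase (return-to-zero) handshake. That both signals return to zero after each transfer is our assumption; the text only states that data is valid with a write signal and acknowledged by another signal. The `Bus` class and `transfer` function are hypothetical, and the phases are executed sequentially rather than by two concurrent processes.

```python
from dataclasses import dataclass


@dataclass
class Bus:
    data: object = None
    write: bool = False  # driven by the sender
    ack: bool = False    # driven by the receiver


def transfer(bus, frame, received):
    """One four-phase transfer; returns the (write, ack) levels after each phase."""
    trace = []
    # Phase 1: the sender drives the data, then raises write.
    bus.data, bus.write = frame, True
    trace.append((bus.write, bus.ack))
    # Phase 2: the receiver sees write, latches the data and raises ack.
    received.append(bus.data)
    bus.ack = True
    trace.append((bus.write, bus.ack))
    # Phase 3: the sender sees ack and lowers write; the data may now change.
    bus.write = False
    trace.append((bus.write, bus.ack))
    # Phase 4: the receiver sees write low and lowers ack, ready for the next frame.
    bus.ack = False
    trace.append((bus.write, bus.ack))
    return trace
```

A two-phase (transition-signalling) variant would halve the number of signal changes per transfer, at the cost of slightly more complex edge detection in hardware.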

(Figure: two nodes, each with Transport, Network, Data Link and Physical layers, joined by a horizontal Network Connection between the two network layers.)

Figure 3-3. Layers in the NoC

In this case the vertical connection between the network layer and the data-link layer is broken. Data is transferred at the network layer in the same manner as within a node, i.e. as a transfer between the network layer and the data-link layer. In Figure 3-3 this is shown as the horizontal network connection.

3.4.5 Transport layer

The purpose of the transport layer is to provide reliable transfer of messages between the resources that use the network. For example, if a packet is dropped in the network, this would be detected and the transport layer would take the necessary actions. The micro-routers do not implement the transport layer, since they only operate up to the network layer. In the project design the resources are very simplified and transmit transport layer packets. Consequently the RNI does not perform transport layer services, but provides the network layer service for these packets.

3.5 Packet structures


Every layer has a different packet structure, based on the information needed to perform that layer's service. The highest layer whose packet structure is defined is the transport layer. Much of the structure used comes from the KTH/VTT project. Figure 3-4 shows how the packets relate to each other.


Figure 3-4. Packet Structures in Different Layers.

In the transport layer it is necessary to identify the Destination Process ID (DPID) and the Source Process ID (SPID). Every message that a process sends has a Message Sequence Number. If a message is too large to fit into one packet, it is divided into several packets, and every packet therefore has a Packet Sequence Number (PSN). In a real system the payload of the transport layer would be a packet from a higher-level service, but in this model it is sent as test dummy bits. The network layer adds the destination RID, with the row address and column address of the node in the network where the DPID resides. The payload of this layer is a packet from the transport layer. To make sure that a packet with errors does not stay around in the network, a Hop Counter (HC) can be included that is incremented at each router the packet passes. The data-link layer frames consist of a payload, in the form of a packet from the network layer, and an error check field. Example sizes in bits for the fields of each layer are shown in Figure 3-4. In the SDL model there is no reason to limit the size of the fields, so the type integer is used. The size of each field is limited only in VHDL, where a fixed size must be set in order to implement it.
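The nesting of Figure 3-4 can be written down as plain record types, mirroring the SDL model's use of unbounded integers. This is only a sketch: the field meanings follow the text, but the class names, the abbreviation `msn` for Message Sequence Number, the helpers `hop` and `split_message`, the `max_payload` parameter and the numbering of packets from 0 are assumptions. The bit widths from Figure 3-4 are deliberately not encoded.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TransportPacket:
    dpid: int       # Destination Process ID
    spid: int       # Source Process ID
    msn: int        # Message Sequence Number
    psn: int        # Packet Sequence Number within the message
    payload: bytes  # test dummy bits in the model


@dataclass(frozen=True)
class NetworkPacket:
    dest_row: int   # destination RID: row address
    dest_col: int   # destination RID: column address
    hop_count: int  # Hop Counter (HC), incremented at every router
    payload: TransportPacket


@dataclass(frozen=True)
class DataLinkFrame:
    payload: NetworkPacket
    error_check: int  # error check field; how it is computed is not specified here


def hop(packet):
    """Return a copy of the network packet with its Hop Counter incremented."""
    return replace(packet, hop_count=packet.hop_count + 1)


def split_message(dpid, spid, msn, message, max_payload):
    """Divide one message into transport packets with PSN 0, 1, 2, ... (hypothetical)."""
    chunks = [message[i:i + max_payload] for i in range(0, len(message), max_payload)]
    return [TransportPacket(dpid, spid, msn, psn, chunk) for psn, chunk in enumerate(chunks)]
```

Because the records are immutable, each router produces a new `NetworkPacket` with a higher hop count instead of modifying the one it received, which keeps simulation traces easy to compare.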

3.6 Buffers
Buffers make it possible to store packets temporarily, instead of dropping them, when they cannot be transmitted further at the moment. A major design issue is how large a buffer is cost-effective. The larger the buffer, the higher the cost in terms of chip area. To balance these conflicting demands, a maximum buffer size of 4 packets is used for evaluation purposes.


3.7 Routing Algorithm


The route decisions are made by a simple dynamic routing algorithm. It first tries to route the packet in the right direction according to columns; if that is not possible, it tries to send the packet in the right direction according to rows. If the packet cannot be routed in the right direction at all, it is sent to any free output buffer. If no output buffer is available for a certain time period, the packet is dropped. This type of algorithm was chosen because it does not need large routing tables that would take up a lot of resources. Such tables are also unnecessary, since the structure of the network is known. One drawback of this kind of routing algorithm is that no real-time properties can be guaranteed in the NoC, but it is small in area and quite fast.
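A minimal sketch of the route decision, assuming nodes are addressed (row, column) as in Figure 4-4 with row 1 at the north edge, and that a packet already at its destination node goes to the RNI buffer. Following the Route_Control description in 4.2.3, the first choice is north/south and the second is west/east. The function name `route` and the `free` set of currently available output buffers are hypothetical; the timeout that eventually leads to a drop is left to the caller.

```python
NORTH, SOUTH, WEST, EAST, RNI = "N", "S", "W", "E", "RNI"
MESH_DIRECTIONS = (NORTH, SOUTH, WEST, EAST)


def route(current, dest, free):
    """Pick an output buffer for a packet at node `current` heading to node `dest`.

    current, dest: (row, column) tuples; row 1 is assumed to be the northern edge.
    free: set of output buffers that can accept a packet right now. Buffers that
          would lead off the edge of the mesh are assumed never to be in `free`.
    Returns the chosen direction, or None if nothing is free (the caller retries).
    """
    row, col = current
    dest_row, dest_col = dest
    if (row, col) == (dest_row, dest_col):
        return RNI if RNI in free else None
    preferred = []
    if dest_row != row:  # first choice: north or south towards the destination
        preferred.append(NORTH if dest_row < row else SOUTH)
    if dest_col != col:  # second choice: west or east towards the destination
        preferred.append(WEST if dest_col < col else EAST)
    for direction in preferred:
        if direction in free:
            return direction
    # Fallback: any other free mesh output, even one leading away from dest.
    for direction in MESH_DIRECTIONS:
        if direction in free:
            return direction
    return None
```

The fallback step is what makes the algorithm adaptive: a blocked packet is deflected instead of waiting, which keeps the hardware table-free but means delivery time cannot be bounded.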

3.8 RNI function


The purpose of the Resource Network Interface (RNI) is to link a micro-router in the network with the resource connected to it. With an RNI, a resource does not need to implement the network layer, which removes some complexity from the resource. In the model it is assumed that the transport layer is implemented on the resource side of the interface between the RNI and the resource, since not every resource will necessarily use this layer's service in the same way. The RNI thus receives transport layer packets and performs the network layer service, which is to add the network address of the destination of each transport packet.

Figure 3-5. RNI Interface (blocks: Network Service Control and RNI In/Out Buffer)

3.9 Resource
To obtain good simulation results, the resources in the model should resemble the behaviour of real resources. A real system can contain different kinds of resources, such as DSPs, general-purpose processors or memory. In VHDL the resource will be made very simple due to lack of time.


Figure 3-6. Resource (containing a Resource Behavior Imitation block)

3.10 Connection to Environment


There are at least two ways in which the network can connect to the environment outside the chip. One is to have dedicated components that connect edge routers to the environment (picture A in Figure 3-7). The other is to use a resource slot connected via an RNI (picture B). The first way is somewhat more complicated, since special routers are needed at these connections: packets should not be routed to the environment unless it is their destination. The second way may take up unnecessary space but gives a simpler configuration. The proposal in this project is that some resource slots are used to connect to the environment. This can be achieved by making some pins available for these resources to connect to. In this way there is flexibility to choose both the number and the type of connections, for example an Ethernet controller.

A) Connecting to I/O-Interface via edge routers

B) Connecting to I/O-Interface via dedicated resources

Figure 3-7. Connections to Environment


NoC: Modelling in SDL

There are many ways to look at a NoC, but since it is basically a communication network, it is natural to divide it into structures using the layered network model. The hierarchical and regular structure of a NoC architecture is well suited to simulation using SDL.

4.1 Requirements of Model


The motivation for using SDL in the design process is to build a model of the system that supports the design at a lower abstraction level, in this case RTL design in VHDL. The SDL system is therefore designed with a block structure that reflects the component structure in VHDL. Another goal is a system with good simulation capabilities. In this sense the SDL system should be easy to configure, so that the design can be evaluated under resource behaviour that resembles that of real resources.

4.2 System Structure


At the top level of the design the type definitions, signals and configuration constants are set. For example, the simulation can be performed at different layers. One alternative is to simulate only the behaviour at the network layer, which is done by setting the synonym DLS to 0. If the behaviour of the data-link layer is also to be simulated, DLS should be set to 1. Another configuration option is the size and shape of the network: setting the network row and column sizes creates a complete mesh network. The resources can be set, individually or collectively, to specific behaviours in the resource type.

4.2.1 Design Blocks

The main principle for the block division is that a block should encapsulate a specific function of the network. To some extent the blocks should also reflect the physical units that form the network. The design makes frequent use of type definitions. The data-link layer is designed as a separate block that can optionally be connected to the buffers in the network layer connections.

4.2.2 Parameterised Mesh size

Since the model is to be used for simulation, the network size is an important parameter. The node block is shown in Figure 4-1. Before a simulation, row size * column size nodes are created, and the connections between the nodes in the network are then set up automatically.
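The automatic set-up of the mesh in 4.2.2 can be sketched as follows: create row size * column size nodes and connect each node to its existing neighbours. `DLS` mirrors the SDL synonym described above and here only labels which layer a link is modelled at; the constant names `ROWS` and `COLS`, the function `build_mesh` and the link representation are hypothetical.

```python
ROWS, COLS = 4, 4  # network row and column size
DLS = 0            # 0: network-layer simulation only, 1: include the data-link layer


def build_mesh(rows, cols, dls):
    """Return the nodes and the directed links between neighbouring micro-routers."""
    nodes = [(r, c) for r in range(1, rows + 1) for c in range(1, cols + 1)]
    layer = "data-link" if dls == 1 else "network"
    steps = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}
    links = {}
    for r, c in nodes:
        for direction, (dr, dc) in steps.items():
            nr, nc = r + dr, c + dc
            if 1 <= nr <= rows and 1 <= nc <= cols:  # edge routers get fewer links
                links[((r, c), direction)] = ((nr, nc), layer)
    return nodes, links
```

For the 4*4 network this yields 16 nodes and 48 directed links (24 neighbour pairs, each connected in both directions).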


Figure 4-1. A node in the NoC

The internal structure of the Node block is shown in Figure 4-2. IO_Switch is a connection that is used only during initialisation of the simulation, to connect all the created routers appropriately. After that, this connection is no longer used.

Figure 4-2. Internal blocks of a node


4.2.3 Micro-Router

The block called Router is a micro-router designed with functional blocks that separate the network and data-link layers. Figure 4-3 shows the network layer block, consisting of one switch/routing block, 4 in/out buffers and 1 RNI in/out buffer. This block has two optional modes: it can communicate with the neighbouring routers either directly at the network layer or through the data-link layer.

Figure 4-3. SDL blocks in the network layer.

Switch and Route unit
The block SWITCH_CONTROL contains the processes Switch and Route_Control. Although they behave sequentially in this design, they have been divided into two processes so that they can run independently of each other. This is used, for example, when Route_Control updates the out-buffer states. The Switch receives packets from the in-buffers and sends information about each packet's destination to Route_Control.


Route_Control has information about the state of the out-buffers and decides the route for the packet. The first choice is to send it north or south, according to the destination. If that is not possible, it tries west or east. If none of the preferred buffers is available, it sends the packet to any other free buffer. If there is no free buffer, the Switch is informed of this. When there are free buffers, the Switch receives a message telling it in which buffer to put the packet. If no buffer is free at that time, the Switch starts a counter and sends a route request for another packet. The counter is reset when a packet is routed to an out-buffer. If the routing request has failed and the counter has reached a timeout value (16 in our case), the packet is dropped.

Buffers
The buffers use the BUFFER_TYPE block and handle all communication in one direction (North, South, West or East). The buffer connecting to the RNI is of a special type, RNI_BUFFER_TYPE, so that it can be configured differently. These blocks can communicate with the neighbouring router or RNI either directly at the network layer or via the data-link layer.

Data-link layer
The data-link layer is a separate block that connects to the buffers and performs the data-link layer services. The type used is D_IO_TYPE.

4.2.4 RNI

The RNI contains the units that perform the actions of the Resource Network Interface. It encapsulates the block type N_LAYER, with processes that perform the network layer services in the RNI. This block has two optional modes: it can communicate with the neighbouring router either directly at the network layer or through the data-link layer. It contains one RNI service block and one in/out buffer towards the micro-router, and optionally a data-link layer block.
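The retry-and-drop behaviour of the Switch and Route_Control in 4.2.3 can be sketched as a counter of consecutive failed route requests. This follows our reading of the text: the counter belongs to the Switch, it is reset whenever any packet reaches an out-buffer, and when it reaches the timeout (16 in the model) the packet whose request just failed is dropped. The `route_request` callback, the round-robin scan of in-buffers and the class name are assumptions.

```python
from collections import deque

TIMEOUT = 16  # failed route requests before a drop, as in the SDL model


class Switch:
    def __init__(self, in_buffers, route_request):
        self.in_buffers = in_buffers        # list of deques, one per in-buffer
        self.route_request = route_request  # packet -> out-buffer name or None
        self.counter = 0
        self.next_in = 0
        self.dropped = []

    def step(self):
        """Serve the next non-empty in-buffer (round-robin); return the out-buffer or None."""
        for _ in range(len(self.in_buffers)):
            queue = self.in_buffers[self.next_in]
            self.next_in = (self.next_in + 1) % len(self.in_buffers)
            if queue:
                break
        else:
            return None  # all in-buffers are empty
        packet = queue[0]
        out = self.route_request(packet)
        if out is not None:
            queue.popleft()
            self.counter = 0  # reset on every successful routing
            return out
        self.counter += 1     # failed request: another packet is tried next time
        if self.counter >= TIMEOUT:
            self.dropped.append(queue.popleft())
            self.counter = 0
        return None
```

Moving on to another in-buffer after a failure lets packets with a free route overtake a blocked one, while the counter guarantees that a permanently blocked packet is eventually removed.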
RNI service
The block RNI_SERVICE is responsible for transforming data between the resource and the network. In this design the resource sends transport layer (T_Layer) packets. In the process RNI_OUT, network layer information is attached to the T_Layer packets, which are then sent out on the network. A certain delay can be set for the operations in this process. In the process RNI_IN, N_Layer packets are unpacked to the transport layer and sent to the resource; a delay can also be set for the operations at this stage.

Buffer
N_BUFFER_TYPE is used to handle communication with the network router, either at the network layer or via the data-link layer. It is similar to BUFFER_TYPE in the router, except that it is a type of its own to allow separate configuration.

Data-link layer
The block D_LAYER contains one sub-block of the type D_IO_TYPE.
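A minimal sketch of the two RNI_SERVICE processes, assuming the RNI holds a table that maps each destination process ID to the (row, column) address of the node where that process resides. The thesis does not say how this mapping is stored, so `process_table`, the dictionary-based packets, the function names and the `delay` bookkeeping are hypothetical.

```python
def rni_out(t_packet, process_table, now, delay=0):
    """RNI_OUT: attach network-layer information to a T_Layer packet.

    t_packet: dict with at least a 'dpid' key.
    Returns (release_time, n_packet); release_time models the configurable delay.
    """
    dest_row, dest_col = process_table[t_packet["dpid"]]
    n_packet = {"dest_row": dest_row, "dest_col": dest_col, "hop_count": 0,
                "payload": t_packet}
    return now + delay, n_packet


def rni_in(n_packet, now, delay=0):
    """RNI_IN: unpack an N_Layer packet back to its transport-layer payload."""
    return now + delay, n_packet["payload"]
```

The transport packet passes through unchanged, which reflects the design choice that the RNI provides only the network layer service and leaves the transport layer to the resource.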

4.2.5 Resource

The block RESOURCE is a container for the type of resource that is connected. It is modelled with two processes, one for sending packets and one for receiving packets.

Packet Sender
The process Sender simulates the behaviour of the resource. In this model the resource simulates the transport layer service, which means that data is sent and received as transport layer packets. Between the RNI and the resource there is only a transport layer connection. Adding lower layers here would not help in evaluating the model: since the resource and the RNI sit in the same node, there is no reason to implement the network layer between them. Different behaviours can be set with procedures for individual resources. A choice between bursty and continuous base behaviour is provided. With bursty behaviour, a certain number of packets, called a burst, is sent back to back at maximum rate, followed by a delay called the burst gap before the process repeats itself. The number of packets in a burst is randomly selected according to the Poisson distribution. The burst gap is calculated from a random delay that is uniformly distributed between a minimum and a maximum value. The length of the burst gap is weighted by the number of packets in the burst, so that a large burst is followed by a longer burst gap. The destination addresses are selected randomly within the limits of the network, but addresses adjacent to the sender are twice as likely to be chosen, since this is the most likely communication scenario. With continuous behaviour, the delay between two transmitted packets can be set, which gives the packet output frequency.

Packet Receiver
In the Receiver process the arrival time of each packet is noted and the packet information is written to a file. The logged information from the receiver can then be compared with other values.
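The bursty behaviour of the Sender process described above can be approximated as follows. This is a sketch under stated assumptions, not the SDL procedure itself: burst lengths are drawn from a Poisson distribution using Knuth's method (Python's `random` module has no Poisson sampler), the burst-gap weighting is taken to be proportional to burst length divided by the mean burst length, a destination is drawn per packet, the sender never addresses itself, and "adjacent" means the four mesh neighbours. All function names and parameters are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Knuth's method: count uniform draws until their product falls below e^-mean."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def burst_gap(rng, n_packets, mean_packets, gap_min, gap_max):
    """Uniform random gap, scaled up for long bursts and down for short ones."""
    return rng.uniform(gap_min, gap_max) * n_packets / mean_packets


def pick_destination(rng, src, rows, cols):
    """Random destination other than src; direct neighbours get double weight."""
    candidates, weights = [], []
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            if (r, c) == src:
                continue
            is_neighbour = abs(r - src[0]) + abs(c - src[1]) == 1
            candidates.append((r, c))
            weights.append(2 if is_neighbour else 1)
    return rng.choices(candidates, weights=weights, k=1)[0]


def bursty_sender(rng, src, rows, cols, mean_packets, gap_min, gap_max, n_bursts):
    """Yield (burst_size, destinations, gap) for each burst of one resource."""
    for _ in range(n_bursts):
        n = poisson(rng, mean_packets)
        dests = [pick_destination(rng, src, rows, cols) for _ in range(n)]
        yield n, dests, burst_gap(rng, n, mean_packets, gap_min, gap_max)
```

Scaling the gap with burst length keeps the long-run average packet rate roughly independent of how bursty the traffic is, which matches the "same average rate, different burstiness" experiments in 4.4.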
A lot of information can be obtained from these logs, for instance the number of packets sent and received, transfer times and many other interesting figures.

4.2.6 Description of Common Types

The following is a description of some common types defined to model the NoC.

Block type BUFFER_TYPE
Process DATA_IN_TYPE receives data from the data-link layer (D_N_TYPE) or from the neighbouring router's buffer (DATA_OUT_TYPE) and buffers it. When SWITCH_TYPE is ready to pass the data on, it is removed from the buffer. DATA_OUT_TYPE receives data from SWITCH_TYPE and buffers it. The data is then passed on either to the neighbouring router (DATA_IN_TYPE) or to the data-link layer (N_D_TYPE), and is thereafter removed from the buffer.

Block type D_IO_TYPE
This block contains the processes that perform the data-link layer services. These processes are not active during network layer simulation.


Process N_D_TYPE consumes packets from the network layer, frames them with data-link layer information and an error check, and passes them to the data link when the receiver is ready to receive. If an error is introduced in the transmission, the failing frame is retransmitted. Process D_N_TYPE receives data-link layer frames from the data link, removes the data-link layer information and checks for errors. It then passes the packets on to the network layer. If there is an error in the transmission, a signal is sent to the sender.
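The framing, error check and retransmission performed by N_D_TYPE and D_N_TYPE can be sketched as below. The thesis does not name the error-check code, so CRC-32 from Python's standard `zlib` module is used purely as a stand-in; the function names, the `channel` callable (a test hook that may corrupt frames) and the `max_tries` limit are hypothetical.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """N_D_TYPE side: append a 4-byte CRC-32 error check to a network-layer packet."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes):
    """D_N_TYPE side: return the payload, or None if the error check fails."""
    payload, check = data[:-4], int.from_bytes(data[-4:], "big")
    return payload if zlib.crc32(payload) == check else None


def send_with_retransmission(payload, channel, max_tries=8):
    """Resend the frame until the receiver accepts it; return (payload, tries)."""
    for tries in range(1, max_tries + 1):
        received = unframe(channel(frame(payload)))
        if received is not None:  # correct frame: the receiver accepts it
            return received, tries
        # the receiver signals an error back to the sender: retransmit
    raise RuntimeError("frame could not be delivered")
```

A CRC detects all single-bit errors and short error bursts; a simple parity bit or checksum would be cheaper in the FPGA but catches fewer error patterns.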

4.3 Design Tool


The programming tool that was available and used in the project was Telelogic Tau 4.3. This package contains an SDL suite that fully supports SDL-92 and partially supports SDL-96. The design was made with the graphical notation of SDL. From the SDL design it is possible to generate an implementation in C or C++ that can be compiled into an executable program and tested with a simulator.

4.3.1 Simulation Tool

The SDL suite includes a simulator that can simulate the function of the system. It is possible to stimulate inputs to the system and to examine data and state transitions. An SDL simulation can use either real time or simulated time. Real time means that the time is updated according to the wall clock: if an event is set to occur in 1 hour, the user has to wait that long. Simulated time is updated according to the discrete event simulation technique. This means that the current simulation time, accessed with the NOW operator, is identical to the time at which the currently executing event is scheduled. The simulation time is updated when an event is finished and is then set to the time at which the next event is scheduled. Suppose, for instance, that the next event scheduled in a simulation is the expiration of a timer set to 1 hour. This event is executed immediately and causes a transition in the processes; the simulation time increases by 1 hour without anyone having to wait that long. The scheduling assumes that a transition takes no time and that a signal is placed in the input port of the receiver as soon as the sender outputs it. Because of these rules, the only events that advance the simulated time are timer expirations.
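The simulated-time rule described above can be illustrated with a tiny generic scheduler: time jumps straight to the next scheduled event, and transitions themselves take no time. This is a sketch built on Python's standard `heapq` module, not the Tau simulator's API; the `Simulator` class and its methods are hypothetical, and the `now` attribute plays the role of SDL's NOW.

```python
import heapq
import itertools


class Simulator:
    def __init__(self):
        self.now = 0                   # current simulation time (SDL: NOW)
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order at equal times

    def set_timer(self, delay, action):
        """Schedule action(sim) to run `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), action))

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            time, _, action = heapq.heappop(self._queue)
            self.now = time  # jump directly to the event; no wall-clock waiting
            action(self)     # the transition itself takes zero simulated time
```

A timer set to 3600 time units fires as soon as every earlier event has been handled, so a 1-hour timeout costs only the computation of its transition.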

4.4 Simulation Set-up


One interesting issue to test is how the buffer size affects network performance. A comparison at equal area cost could, for instance, test how the configuration with in-buffer=2 and out-buffer=2 performs relative to in-buffer=1 and out-buffer=3. Another factor that could affect network performance is the delay of the different components that packets pass through. Comparing these delays shows how much it is worth improving them further in VHDL. The performance measures used are loss probability, in terms of packet drops per 100 packets, and mean packet delay over all packets sent and received in the network. One must also consider what kind of traffic load should be offered to the network. The simulations must of course be limited, since the possibilities are so many that they could continue forever. The parameters below are considered the most interesting to vary; the simulations then show how network performance is affected by different buffer sizes.

Burstiness: same average packet rate, but different burst gap and packets per burst.
Network clock speed: change the delays of the network components.
Network load: change the number of sending resources.

Network Size
It was decided that a 4*4 node network, as in Figure 4-4, should be used for the simulations. This size was chosen because a smaller network would not test the routing thoroughly, and a larger one would take too long to simulate and analyse. It also seems a realistic size for a SoC in the near future. After studying some simulations, the following set-ups were used for the experiments:
1. A mixed set-up according to Figure 4-5, with 1 continuous and 15 bursty resources. The positions of the resources are the same in all simulations in order to make a fair comparison.
2. All sending resources set up as bursty according to the Poisson distribution.
3. Fewer sending resources set up as bursty.
The last two set-ups show how a change in load, in terms of the number of sending resources, affects performance.

Figure 4-4. Network Overview (a 4*4 mesh with nodes 1,1 to 4,4, each node with an RNI and a resource)

Figure 4-5 shows the kind of communication performed by each resource in Figure 4-4.
Position  Resource  Destination  Behaviour   Max Rate/Packet (MHz)  Mean Nr of packets/Burst  Mean Rate (MHz)  Mean Burst Gap (us)
1,1       0         3,1          Burst       384                    8                         59,2             100
1,2       1         1,1          Continuous  64                     -                         64               -
1,2       1         2,2          Continuous  64                     -                         64               -
1,2       1         4,4          Continuous  64                     -                         64               -
1,2       1         3,3          Continuous  64                     -                         64               -
1,2       1         2,3          Continuous  64                     -                         64               -
1,2       1         1,4          Continuous  64                     -                         64               -
1,3       2         2,4          Burst       384                    8                         59,2             100
1,4       3         1,3          Burst       384                    8                         59,2             100
2,1       4         3,1          Burst       384                    8                         59,2             100
2,2       5         4,1          Burst       384                    8                         59,2             100
2,3       6         2,1          Burst       384                    8                         59,2             100
2,4       7         2,1          Burst       384                    8                         59,2             100
3,1       8         4,3          Burst       384                    8                         59,2             100
3,2       9         4,3          Burst       384                    8                         59,2             100
3,3       10        3,2          Burst       384                    8                         59,2             100
3,4       11        3,2          Burst       384                    8                         59,2             100
4,1       12        3,4          Burst       384                    8                         59,2             100
4,2       13        4,3          Burst       384                    8                         59,2             100
4,3       14        4,1          Burst       384                    8                         59,2             100
4,4       15        4,2          Burst       384                    8                         59,2             100

Figure 4-5. Table of Resource Configuration

Simulation Period
The duration of each simulation was set to 10 000 time units (clock cycles). Tests of up to 200 000 time units showed that 10 000 time units were enough to produce reliable data. Since the time needed to process the data grew rapidly with the simulation length, 10 000 time units were chosen.

4.5 Simulation Results


4.5.1 Simulations with equal delay of Switch and Buffer
The following simulations use faster settings than the delays measured in the VHDL simulations: the buffer and the switch have equal delays. In this case the minimum period of the micro-router is 3 clock cycles. For example, if the network clock period is 0,5 ns, this results in a maximum bandwidth of 1/(3*0,5*10^-9 s) = 667 Mpackets/s. These results are presented to show how the network could behave if the component delays were optimised to this speed.
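The bandwidth figure above follows directly from the minimum router period and the clock period. A small helper with a hypothetical name reproduces the arithmetic, assuming, as the text does, that one packet passes a micro-router per minimum period.

```python
def max_bandwidth_mpackets(min_period_cycles, clock_period_ns):
    """Maximum packet rate in Mpackets/s when one packet needs
    min_period_cycles clock cycles of clock_period_ns nanoseconds each."""
    seconds_per_packet = min_period_cycles * clock_period_ns * 1e-9
    return 1.0 / seconds_per_packet / 1e6


# Equal switch and buffer delays: 3 cycles at 0,5 ns per cycle.
print(f"{max_bandwidth_mpackets(3, 0.5):.0f} Mpackets/s")
```

With 5 cycles instead of 3, the same helper gives the 400 Mpackets/s quoted for the VHDL delays in Section 4.5.2.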


4.5.1.1 Simulations with 1 Continuous and 15 Bursty Resources
This simulation is based on the resource behaviour shown in Figure 4-5. To evaluate what happens to the sent packets, we divide them into three categories: Transferred, Cancelled and Dropped. Transferred packets are those that were sent on time, that is, when the resource wanted to put them onto the network, and reached their destination. Cancelled packets are those that could not be sent on time because the buffers were full; these packets are dropped by the transmitting resource. Dropped packets are those that were sent on time but were later dropped by a micro-router because the required buffers were full.
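The three categories, and the per-100-packet rates used as loss measure, can be expressed as a small classification rule. This is an illustrative Python sketch rather than the SDL model; the names `Outcome`, `classify` and `per_100_packets` are hypothetical. The usage example takes the counts for configuration 01:01 at clk = 0,6 from Figure 4-6.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    TRANSFERRED = "Transferred"
    CANCELLED = "Cancelled"
    DROPPED = "Dropped"


def classify(sent_on_time: bool, reached_destination: bool) -> Outcome:
    """Assign one packet to one of the three categories in the text."""
    if not sent_on_time:
        # Buffers were full when the resource wanted to send, so the
        # transmitting resource discards the packet itself.
        return Outcome.CANCELLED
    if not reached_destination:
        # Entered the network on time, but a micro-router dropped it
        # because the required buffer was full.
        return Outcome.DROPPED
    return Outcome.TRANSFERRED


def per_100_packets(outcomes):
    """Count of each outcome per 100 generated packets."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {o: 100.0 * counts[o] / total for o in Outcome}


# Configuration 01:01 at clk = 0,6 (Figure 4-6).
packets = ([Outcome.DROPPED] * 953 + [Outcome.CANCELLED] * 2671
           + [Outcome.TRANSFERRED] * 14615)
rates = per_100_packets(packets)
print(f"{rates[Outcome.DROPPED]:.1f} drops per 100 packets")
```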

The results show that a network working at 0,5 time units per clock cycle has no dropped packets and only a few cancelled packets. The network is so fast that the buffer configuration has no, or very little, influence on the performance. When the network speed is lowered to 0,6 time units per clock cycle, the effects of the buffers start to show. The configurations with only one output buffer begin to drop packets and also to cancel many packets. When more than one output buffer is used, the network works quite satisfactorily, with no drops and only a few cancelled packets. The highest throughput is reached by [in-buffer:out-buffer] 02:02, since it gives the largest number of transferred packets. The difference in performance between 01:02 and 02:02 is not very large, so 02:02 will probably not be worth the extra cost in resources. More than one output buffer works better because more packets can be routed along an optimal route, which results in fewer routings along each packet's path. From what we have seen so far, 02:02 is the best configuration, because it combines the advantages of input and output buffers: the input buffers make a high router throughput possible, since there are always messages in the buffers ready to be routed, and the output buffers make it possible to route the packets in the correct direction. The results for clk = 0,7 and clk = 0,8 once again show that the configurations with more than one output buffer are superior to those with only one. At clk = 0,8 the configurations 01:03 and 02:02 give almost the same results, which can be attributed to the higher spreading that 02:02 causes when the load is high (see Figure 4-8).
Clk  Buffer (In:Out)  Dropped  Cancelled  Transferred
0,5  01:01            0        66         18173
0,5  01:02            0        19         18220
0,5  01:03            0        13         18226
0,5  02:01            0        17         18222
0,5  02:02            0        2          18237
0,5  03:01            0        3          18236
0,6  01:01            953      2671       14615
0,6  01:02            0        140        18099
0,6  01:03            0        118        18121
0,6  02:01            2903     4167       11169
0,6  02:02            0        35         18204
0,6  03:01            2411     3078       12750
0,7  01:01            3170     7381       7688
0,7  01:02            0        517        17722
0,7  01:03            0        442        17797
0,7  02:01            3763     6788       7688
0,7  02:02            0        194        18045
0,7  03:01            4008     6246       7985
0,8  01:01            4593     11050      2595
0,8  01:02            0        1300       16938
0,8  01:03            0        1084       17154
0,8  02:01            4195     8929       5114
0,8  02:02            0        1096       17142
0,8  03:01            5027     9127       4084

Figure 4-6. Transfer statistics for 1 continuous and 15 bursty resources

The mean transfer time for the different network speeds and buffer configurations (Figure 4-7) once again shows that the configurations with several output buffers are superior. The configuration 02:02 is slightly faster than 01:02 and 01:03 on networks working at 0,5-0,6 time units. When the period is increased further, the configurations with one input buffer do not increase their mean transfer time as much as 02:02 does. A possible reason is that the desired output buffer is less likely to be full, since more time passes between two routings from the same input buffer: once a packet has been routed, a new packet has to be transferred into the input buffer, and during this time space can be made in the desired output buffer. This results in less spreading of the packets (see Figure 4-8) and fewer switchings per packet, which in the end gives a lower transfer time.
Mean transfer time (time units)

Buffer (In:Out)  Clk=0,5  Clk=0,6  Clk=0,7  Clk=0,8
01:01            18       41       81       230
01:02            14       19       26       35
01:03            14       19       26       35
02:01            18       58       97       158
02:02            14       19       27       47
03:01            17       59       113      236

Figure 4-7. Transfer mean time for 1 continuous and 15 bursty resources

Figure 4-8 shows how the packets travel within the NoC, using a measure we call the spreading factor. It is defined as the percentage of all packet switchings that are made by switches outside the packets' optimal paths. For example, a spreading of 60 % means that out of one hundred packet switchings, 60 are made by switches outside the packets' optimal paths and 40 by switches on them. If a packet does not travel the shortest possible path, the number of switchings increases rapidly, which may cause drops under heavy traffic.
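The spreading factor can be computed from logged packet paths. The sketch below assumes, as a simplification that may differ from how the SDL model decides what is optimal, that a switch lies on a packet's optimal path when it is inside the rectangle spanned by the source and destination positions, i.e. on at least one minimal path in the mesh; every switch that forwards the packet counts as one switching. The function names and the example routes are hypothetical.

```python
def on_optimal_region(node, src, dst):
    """True if node lies inside the rectangle spanned by src and dst,
    i.e. on at least one minimal path in the mesh (assumption)."""
    (r, c), (rs, cs), (rd, cd) = node, src, dst
    return min(rs, rd) <= r <= max(rs, rd) and min(cs, cd) <= c <= max(cs, cd)


def spreading_factor(routes):
    """Percentage of switchings made by switches outside the optimal region.

    routes: iterable of (src, dst, path), where path lists the (row, col)
    positions of every switch that switched the packet.
    """
    total = off_path = 0
    for src, dst, path in routes:
        for node in path:
            total += 1
            if not on_optimal_region(node, src, dst):
                off_path += 1
    return 100.0 * off_path / total if total else 0.0


routes = [
    ((1, 1), (1, 3), [(1, 1), (1, 2), (1, 3)]),                  # minimal path
    ((1, 1), (1, 3), [(1, 1), (2, 1), (2, 2), (2, 3), (1, 3)]),  # detour via row 2
]
print(f"{spreading_factor(routes):.1f} %")
```

The detour adds three off-path switchings out of eight in total, which illustrates how misrouting quickly inflates both the spreading factor and the number of switchings per packet.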


As the figure below shows, the buffer configuration affects the percentage of switchings caused by misdirected packets, which demonstrates the importance of evaluating which buffer configuration to use. The importance of having more than one output buffer is especially clear. The configuration 01:02 has a slightly higher spreading than 02:02 and 01:03 at the higher network clock frequencies (clk = 0,5 and 0,7). This can be explained by the fact that 01:02 has only three buffers, compared with four in the other two. The probability that all buffers are full is higher when there are fewer buffers, and the router then has to find an alternative route. When the network clock is set to 0,8, the configuration 02:02 suddenly begins to spread the packets more than 01:02, which seems rather odd, because more buffers should give a better result in most situations. The reason is that the router is now able to route the second packet in the input queue before there is any space in the desired output buffer, which causes the router to send the packet along a non-optimal route.
Spreading

Buffer (In:Out)  Clk=0,5  Clk=0,7  Clk=0,8
01:01            11%      34%      48%
01:02            2%       5%       8%
01:03            1%       3%       5%
02:01            9%       33%      38%
02:02            1%       3%       10%
03:01            9%       31%      41%

Figure 4-8. Spreading factor with 1 continuous and 15 bursty resources

4.5.1.2 Simulation with 16 bursty resources versus 14 bursty and 2 inactive resources
Simulation set-up: these simulations are based on the set-up table in Figure 4-9, but the two models differ as follows. In the model All Burst, the resources are set up exactly as in the table. The model All Burst-2 is also based on this table, but the sending parts of the resources at 1,2 and 3,3 are turned off.


Sender  Receiver
1,1     3,1
1,2     1,1
1,3     2,4
1,4     1,3
2,1     3,1
2,2     4,1
2,3     2,1
2,4     2,1
3,1     4,3
3,2     4,3
3,3     3,2
3,4     3,2
4,1     3,4
4,2     4,3
4,3     4,1
4,4     4,2

Parameters common to all 16 senders: Behaviour Burst; Max Rate/Packet 384 MHz; Mean Rate 59,2 MHz; Mean Nr of packets/Burst 8; Mean Burst Gap 100 ns (13 per packet); Spreading 75%; Period 2,6 ns; Min 2 ns; Max 7,396 ns.

Figure 4-9. Simulation set-up for 16 vs. 14 bursty resources

Comparing the numbers of transferred packets, we can once again see that the network running with a period of 0,5 time units is so fast that the buffers have very little influence on its behaviour. There are no drops, and the numbers of cancelled packets are also very small. When the period is increased to 0,7, the influence of the buffers starts to increase. The simulations with more than one output buffer are significantly faster than those with only one output buffer. In the All Burst model the 02:02 configuration is slightly better than 01:02 and 01:03 at this speed, but when the period is increased to 0,9, 02:02 falls back and 01:02 and 01:03 are the best. In the All Burst-2 model, 02:02 remains the best in all simulations.

[Chart: number of transferred packets per buffer configuration (In:Out) at periods 0,5, 0,7 and 0,9; panels: 16 Bursty Resources (All Burst) and 14 Bursty Resources (All Burst-2).]

Figure 4-10. Number of transferred packets, 16 vs. 14 bursty resources

The number of cancelled messages clearly shows which configurations are capable of maintaining the dedicated throughput. All buffer configurations are able to route without drops at a period of 0,5 time units. When the period is increased to 0,7, the configurations with multiple output buffers start to show their advantages: all three distribute the packets with only a few cancelled packets, and the configuration with two input and two output buffers causes the smallest number of cancelled packets. When the period is increased further to 0,9, 02:02 is no longer the best option in the All Burst simulation. It seems that 02:02 is the best configuration when the load is low, and that 01:02 and 01:03 become the best when the load increases, the latter being slightly better. The reason can once again be connected to the higher spreading that 02:02 causes when the load is high.
[Chart: number of cancelled packets per buffer configuration (In:Out) at periods 0,5, 0,7 and 0,9; panels: 16 Bursty Resources (All Burst) and 14 Bursty Resources (All Burst-2).]

Figure 4-11. Number of cancelled packets, 16 vs. 14 bursty resources

Once again the importance of the output buffers can be seen: there are no drops when the configurations 01:02 and 01:03 are used. The configuration 02:02 starts to drop heavily when the load increases, while 01:02 and 01:03 do not drop any packets in either of the two simulations.
[Chart: number of dropped packets per buffer configuration (In:Out) at periods 0,5, 0,7 and 0,9; panels: 16 Bursty Resources (All Burst) and 14 Bursty Resources (All Burst-2).]

Figure 4-12. Number of dropped packets, 16 vs. 14 bursty resources

The mean transfer time shows that all set-ups with one output buffer increase their transfer time much faster than the other set-ups. The set-ups with more than one output buffer have almost the same performance as long as the load is not too high. When the load gets higher, the 02:02 set-up falls back and the set-ups 01:02 and 01:03 are without doubt the fastest. It seems a little strange that 02:02 changes from the fastest of the three set-ups with multiple output buffers to the slowest as the load increases. One theory is that at low load the two input buffers help the router keep a high throughput, but when the load increases there is not enough time to make space in the output buffer, and packets are sent along a non-optimal path. This would explain why the transfer time for 02:02 suddenly becomes longer than that of 01:02 and 01:03.
[Chart: mean transfer time for packets per buffer configuration (01:01, 01:02, 01:03, 02:01, 02:02, 03:01) versus network speed (period time); panels: 16 Bursty Resources (All Burst), periods 0,5-0,9, and 14 Bursty Resources (All Burst-2), periods 0,5-1,3.]

Figure 4-13. Transfer Meantime, 16 vs. 14 bursty resources

4.5.1.3 Simulations with different burst length
To investigate the importance of buffers when messages are sent in bursts, we examined what happens when the mean burst length is increased. The mean number of packets per burst was changed from 8 to 16. We expected a longer mean burst length to benefit from a larger number of buffers, but the results in the figure below do not support this at all. The numbers of transferred packets are very similar in the two cases, except at a network speed of 0,7, where the simulations with a mean burst length of 16 are much better in two cases, both with only one output buffer: there the numbers of cancelled packets are significantly lower. A longer burst length also gives a longer time between bursts, since the total number of packets sent must not increase. For this reason, the probability that another resource is sending packets that degrade a transmission is lower when longer bursts are sent.
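The statement that longer bursts come with longer gaps can be made concrete. Under a simplified model, assumed here and not taken from the SDL code, a source sends n packets spaced t_p apart and then stays idle for a gap g, so its mean rate is n / (n*t_p + g). Solving for g at a fixed mean rate shows that doubling n doubles the gap. The helper name and the numbers are hypothetical.

```python
def gap_for_rate(pkts_per_burst, pkt_period, mean_rate):
    """Idle gap after each burst that keeps the long-run mean rate at
    mean_rate (packets per time unit) for bursts of pkts_per_burst
    packets spaced pkt_period apart."""
    burst_cycle = pkts_per_burst / mean_rate   # one burst plus its gap
    gap = burst_cycle - pkts_per_burst * pkt_period
    if gap < 0:
        raise ValueError("mean_rate exceeds the peak rate 1 / pkt_period")
    return gap


# Burst length 8 versus 16 at the same mean rate of 0.4 packets per time unit.
print(gap_for_rate(8, 1.0, 0.4), gap_for_rate(16, 1.0, 0.4))
```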


[Chart: numbers of transferred, cancelled and dropped packets per buffer configuration (In:Out) at periods 0,5, 0,7 and 0,9 for 16 bursty resources; panels for burst length 8 and burst length 16.]

Figure 4-14. Simulation results with different burst length


4.5.2 Simulations with unequal delay of Switch and Buffer
The following simulations use the switch and buffer delays obtained in VHDL. The main difference from the previous delay configuration is that the switch is a little slower than the buffers. The minimum period of the micro-router is 5 clock cycles. For example, if the network clock period is 0,5 ns, this results in a maximum bandwidth of 1/(5*0,5*10^-9 s) = 400 Mpackets/s. The set-ups are identical to those of the previous simulations.

4.5.2.1 Simulations with 1 Continuous and 15 Bursty Resources
The results show that with a network clock of 0,3 time units there are no dropped packets and very few cancelled packets. The configurations with multiple input buffers seem to transfer the most packets. At a network speed of 0,4 time units per clock cycle, small effects of the output buffers can be seen. The configurations with several input buffers begin to drop packets and also to cancel many packets. The output buffers do not have the large effect seen in the previous simulations: the switch cannot take advantage of the larger output buffers, since it is slower and therefore has difficulty filling them. It is perhaps surprising that the input buffers perform so badly, but the routing algorithm can explain it: at high traffic they make it possible to route packets in a non-optimal direction, which creates a lot of extra traffic.

Clk  Buffer (In:Out)  Dropped  Cancelled  Transferred
0,3  01:01            0        82         18157
0,3  01:02            0        60         18179
0,3  01:03            0        52         18187
0,3  02:01            0        23         18216
0,3  02:02            0        16         18223
0,3  03:01            0        6          18233
0,4  01:01            0        1302       16937
0,4  01:02            0        905        17334
0,4  01:03            0        710        17529
0,4  02:01            3065     4152       11022
0,4  02:02            2020     3243       12976
0,4  03:01            3237     3593       11409
0,5  01:01            2775     6840       8624
0,5  01:02            2565     6487       9187
0,5  01:03            2426     6049       9764
0,5  02:01            3972     7216       7051
0,5  02:02            3759     6862       7618
0,5  03:01            4354     7014       6871

Figure 4-15. Transfer statistics for 1 continuous and 15 bursty resources

The mean transfer time for the different network speeds and buffer configurations shows that adding more buffers only increases the time. The configuration 01:01 is slightly faster than 01:02 and 01:03, and the configurations with multiple input buffers perform worse at every clock setting.

[Chart: Transfer time (1 Continuous and 15 Bursty Resources) per buffer configuration (01:01, 01:02, 01:03, 02:01, 02:02, 03:01) versus network speed (period time 0,3, 0,4 and 0,5).]

Figure 4-16. Transfer mean time for 1 continuous and 15 bursty resources

4.5.2.2 Simulation with 16 bursty resources versus 14 bursty and 2 inactive resources
Comparing the numbers of transferred packets, we can once again see that a network running with a period of 0,3-0,4 time units is so fast that the buffers have very little influence on its behaviour. There are no drops, and the numbers of cancelled packets are also very small. When the period is increased to 0,5, the influence of the buffers starts to increase. The simulations with more than one output buffer are slightly faster than those with only one output buffer. In the model with 14 bursty resources, the effect of multiple output buffers appears more distinctly.
[Chart: number of transferred packets per buffer configuration (In:Out); panels: 14 Bursty Resources, 8 pkts per burst (periods 0,4, 0,5 and 0,6) and 16 Bursty Resources, 8 pkts per burst (periods 0,3, 0,4 and 0,5).]

Figure 4-17. Number of transferred packets, 16 vs. 14 bursty resources

All buffer configurations are able to route without drops at a period of 0,3 time units. When the period is increased to 0,5, the configurations with multiple output buffers start to show their advantages, especially in the case with 14 resources. When the period is increased further to 0,6, the configurations 01:02 and 01:03 cause the fewest cancelled packets, the latter being slightly better.


[Chart: number of cancelled packets per buffer configuration (In:Out); panels: 14 Bursty Resources, 8 pkts per burst (periods 0,4, 0,5 and 0,6) and 16 Bursty Resources, 8 pkts per burst (periods 0,3, 0,4 and 0,5).]

Figure 4-18. Number of cancelled packets, 14 vs. 16 bursty resources

Again the importance of the output buffers can be seen: there are no drops with the configurations 01:02 and 01:03 up to a clock period of 0,5. The configuration 03:01 starts to drop heavily when the load increases.
[Chart: number of dropped packets per buffer configuration (In:Out); panels: 14 Bursty Resources, 8 pkts per burst (periods 0,4, 0,5 and 0,6) and 16 Bursty Resources, 8 pkts per burst (periods 0,3, 0,4 and 0,5).]

Figure 4-19. Number of dropped packets, 14 vs. 16 bursty resources

The mean transfer time diagram shows that all set-ups with several input buffers increase their transfer time faster than the other set-ups. The set-ups with more than one output buffer have almost the same performance as long as the load is not too high. The configuration with one input and one output buffer (01:01) seems to have the shortest transfer time at all clock settings.
[Chart: transfer time per buffer configuration (01:01 to 03:01) versus network speed (period time); panels: 14 Bursty Resources, 8 pkts per burst (periods 0,4-0,6) and All Burst, 16 Bursty Resources, 8 pkts per burst (periods 0,3-0,5).]

Figure 4-20. Transfer Meantime, 14 vs. 16 bursty resources

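4.5.2.3 Simulations with different burst length
The number of transferred packets seems to be affected more when the bursts are 8 packets long. The explanation is probably the same as for the simulations with equal delays: longer bursts come with longer gaps between bursts, which lowers the probability that traffic from other resources degrades a transmission.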
[Chart: numbers of transferred, cancelled and dropped packets per buffer configuration (In:Out) at periods 0,3, 0,4 and 0,5 for 16 bursty resources; panels for 8 and 16 packets per burst.]

Figure 4-21. Simulation results with different burst length

4.6 Chapter Discussion


SDL seems to be well suited for modelling and simulating this complex system. Simulating 10 000 clock cycles took about 2 minutes. Most of the time was spent handling the data in Excel: it took about 5 minutes to calculate the results of one simulation.


One must not forget that using SDL may increase the total design time of a project. For example, it might have been faster to make a VHDL design directly at a higher level of abstraction instead of using SDL. The benefit of SDL, however, is that the simulations can be performed with one tool. With VHDL it may be necessary to introduce other programs, such as Matlab or user-written C functions, to generate simulation stimuli. One result is constant throughout all the simulations: the advantage of using at least two output buffers. The throughput of the system increases, especially with equal buffer and switch delays. Increasing the output buffer further to three buffer spaces gives only a small extra benefit in performance, which in most cases will not be worth the extra cost of the larger buffer. It also seems that adding more input buffers only makes the network performance worse. One surprising result, seen with both delay configurations, was that longer bursts did not benefit more from larger buffers than shorter bursts did. The configuration we would recommend for a general-purpose system has one input buffer and two output buffers. If a NoC for a special purpose is designed, simulations should be made with different configurations to find the best-suited one.


NoC: Hardware Design

It was decided to make the design in VHDL. Since only one VHDL design was intended, the design is described directly in Register Transfer Level (RTL) code, which makes it possible to translate it automatically into a netlist using a synthesis tool. The design tool supports graphical description of synchronous Finite State Machines (FSMs), and this feature has been used to specify the behaviour and to generate code.

5.1 Model Requirements


The model serves the purpose of describing a digital circuit at a synthesizable level of abstraction. The main objectives of the VHDL model are to determine the delay/execution time of the various blocks in terms of clock cycles, and to determine the hardware cost of the various NoC blocks. Implementation results are shown in chapter 6.

5.2 Design Structure


Since the SDL model seems to serve its purpose, the block division used in that design is also used for encapsulating functional units in VHDL. There are, however, some differences. The units written as processes in SDL are described in VHDL as components containing one state machine each. The Switch Control and the RNI unit, which in SDL are internally divided into several processes, i.e. two separate state machines, are represented by only one state machine each in VHDL. The reason is that the design should not take up unnecessary space and time: since these SDL processes have a sequential functional behaviour, one state machine is enough in VHDL, and more state machines would require synchronization signals between them. Unlike the SDL model, the VHDL network model is not parameterised. The network mesh is created by instantiating components and connecting them to each other manually, as in Figure 5-1. The package Noc_package has been created to gather type definitions and constants in one place.
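Although the VHDL mesh is wired by hand, the wiring follows a simple rule that a generator could apply. The sketch below lists, for a rows*cols mesh, which output port feeds which neighbouring input port. The port names follow the Outdata_/Indata_ naming visible in Figure 5-1; the helper itself, and the assumption that row numbers grow towards the south, are hypothetical. In the real design each link also carries the _Wr, _Enable and _RTR handshake signals.

```python
def mesh_links(rows, cols):
    """Directed links of a rows*cols mesh as
    (from_node, out_port, to_node, in_port), with nodes given as (row, col)."""
    links = []
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            if c < cols:  # neighbour to the east
                links.append(((r, c), "Outdata_East", (r, c + 1), "Indata_West"))
                links.append(((r, c + 1), "Outdata_West", (r, c), "Indata_East"))
            if r < rows:  # neighbour to the south (row numbers grow southwards, assumed)
                links.append(((r, c), "Outdata_South", (r + 1, c), "Indata_North"))
                links.append(((r + 1, c), "Outdata_North", (r, c), "Indata_South"))
    return links


links = mesh_links(4, 4)
print(len(links))  # directed links in the 4*4 simulation mesh
```

A 4*4 mesh has 24 neighbour pairs and therefore 48 directed links, each of which must be connected correctly when the instantiation is done manually.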


[Figure: four node instances, NOC_Node_Uart_Rec (I0), NOC_Node_Transceiver0_1 (I1), NOC_Node_Uart_Trans (I2) and NOC_Node_Transceiver1_1 (I3), sharing Clk and Reset and connected through their North/South/East/West Indata/Outdata ports with the associated _Wr, _Enable and _RTR signals; the UART nodes have rx and Tx pins, and the nodes use RouterID, Send_pid and Dest_PID signals.]

Figure 5-1. VHDL block model of NoC at node level

5.2.1 Micro-Router
The Micro_Router in Figure 5-2 has the same component structure as the blocks in SDL.

Switch and Control unit
The objective of the Switch is to check whether the input buffers hold any messages waiting to be transmitted and, if so, to switch them safely to an appropriate output buffer. Handshaking is used for packet transactions. When the Switch is ready to handle a packet, it sets the RTR signal to 1; after reset, all RTR signals are therefore 1. If an input buffer has a packet, it sets its WR signal to 1. The Switch then tries to route this packet using the same algorithm as in the SDL design. A free output buffer has its RTR signal at 1; the Switch then sets its WR signal to that output buffer to 1 and, at the same time, sets the RTR of the input buffer to 0 to indicate that the packet has been taken. After this, the Switch waits for the WR from the input buffer and the RTR from the output buffer to go to 0, then sets the WR to the output buffer to 0 and is ready to handle a new packet.
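The handshake sequence above can be sketched as a two-state machine for the Switch side of one transfer. This Python model is an illustration only, not the VHDL FSM: it assumes a single input buffer and a single, already chosen output buffer, and abstracts the routing decision away. Signal names mirror the text (WR, RTR); the class and state names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    in_wr: int = 0    # input buffer -> Switch: a packet is waiting
    in_rtr: int = 1   # Switch -> input buffer: 1 after reset (ready)
    out_wr: int = 0   # Switch -> output buffer: packet offered
    out_rtr: int = 1  # output buffer -> Switch: buffer is free


class SwitchHandshake:
    """Switch side of one packet transfer, following the steps in the text."""

    def __init__(self):
        self.state = "READY"

    def step(self, s: Signals) -> None:
        """Evaluate one clock cycle and update the Switch outputs in s."""
        if self.state == "READY":
            if s.in_wr == 1 and s.out_rtr == 1:
                s.out_wr = 1   # offer the packet to the free output buffer
                s.in_rtr = 0   # tell the input buffer the packet was taken
                self.state = "WAIT"
        elif self.state == "WAIT":
            if s.in_wr == 0 and s.out_rtr == 0:
                s.out_wr = 0   # transfer complete
                s.in_rtr = 1   # ready to handle a new packet
                self.state = "READY"


s, sw = Signals(), SwitchHandshake()
s.in_wr = 1                 # input buffer announces a packet
sw.step(s)                  # Switch offers it to the free output buffer
s.in_wr, s.out_rtr = 0, 0   # input releases WR, output buffer takes the packet
sw.step(s)                  # Switch completes the transfer
print(sw.state, s.in_rtr, s.out_wr)
```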


A counter is rolled forward so that the Switch does not start looking for a packet in the same buffer it has just taken a packet from, which gives the behaviour a degree of fairness. This was not needed in the SDL design, because signals there are events that are queued in the order in which they arrive at a process. In VHDL we cannot determine which signal came first if several arrive within the same clock period.
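The rolling counter amounts to a round-robin scan of the input buffers. A minimal Python sketch, assuming the five input buffers are indexed 0 to 4 (the order North, East, South, West, RNI is only an illustrative assumption) and that the counter stores the index of the buffer served last; `next_input` is a hypothetical name.

```python
PORTS = ["North", "East", "South", "West", "RNI"]  # illustrative order only


def next_input(pending, last_served):
    """Return the index of the next input buffer to serve, or None.

    pending     -- list of booleans, True where the input buffer has WR = 1
    last_served -- index of the buffer the Switch took a packet from last time
    The scan starts just after last_served, so a busy buffer cannot
    monopolise the Switch while other buffers are waiting.
    """
    n = len(pending)
    for offset in range(1, n + 1):
        idx = (last_served + offset) % n
        if pending[idx]:
            return idx
    return None
```

With every buffer holding a packet and `last_served = 4`, five successive calls serve indices 0, 1, 2, 3 and 4 in turn, whereas a fixed-priority scan starting at index 0 would keep returning 0.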

Figure 5-2. VHDL model of NoC at network layer

Buffers

The buffers in the North, South, West and East directions are of the same type. A separate type is used for the buffer towards the RNI so that it can be configured independently. When the buffer is ready to receive a packet, In_RTR is set to 1. When a packet is ready to enter the buffer, the In_WR signal goes to 1. Once In_WR is 1, In_RTR is driven low to indicate that the buffer is handling the packet. The buffer then checks whether Out_RTR is set; if so, it sets Out_WR to 1 and transfers the packet immediately. If Out_RTR is low, the packet is stored in the buffer. When the buffer is not full and In_WR has gone low, the buffer sets In_RTR high again.
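The buffer protocol above can be sketched in Python as follows. This is a simplified model under stated assumptions: one `step` call per clock cycle, the input side follows the In_RTR/In_WR rules described above, and the output side is reduced to "forward the oldest packet whenever Out_RTR is 1" instead of a full handshake. The class name `HandshakeBuffer` is hypothetical.

```python
from collections import deque


class HandshakeBuffer:
    """Sketch of a NoC buffer with RTR/WR handshaking (one step = one clock)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fifo = deque()
        self.in_rtr = 1   # ready to receive after reset
        self.out_wr = 0

    def step(self, in_wr, packet, out_rtr):
        """Advance one clock cycle; return the packet forwarded, or None."""
        forwarded = None
        # Input side: a write while we are ready means we take the packet
        # and drop In_RTR to show that it is being handled.
        if in_wr and self.in_rtr:
            self.fifo.append(packet)
            self.in_rtr = 0
        # Output side (simplified): forward the oldest packet if Out_RTR is set.
        if self.fifo and out_rtr:
            forwarded = self.fifo.popleft()
            self.out_wr = 1
        else:
            self.out_wr = 0
        # Re-arm the input once the writer has released In_WR and there is room.
        if not in_wr and len(self.fifo) < self.capacity:
            self.in_rtr = 1
        return forwarded
```

In this sketch a buffer with capacity 2 stops asserting In_RTR once it holds two packets and re-asserts it as soon as one of them has been forwarded; with Out_RTR already set, a packet passes straight through in the same step.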


5.2.2 RNI

For simplicity, the service and buffer functions of the RNI are combined in one FSM. The difference from a pure buffer is that, for an out-buffer for instance, the FSM looks up a table and adds the network-layer information while the packet from the resource is held in the buffer.
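The table lookup in the out-buffer can be illustrated with a small Python sketch. The field names follow Figure 6-2, but the assumptions that the table maps a destination process ID (DPID) to the resource ID (RID) of the node hosting that process and that the hop counter starts at 0, as well as the table contents themselves, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceMessage:
    data: int  # 8-bit payload from the resource
    dpid: int  # destination process ID
    spid: int  # source process ID


@dataclass(frozen=True)
class NetworkPacket:
    data: int
    dpid: int
    spid: int
    rid: int   # destination resource ID, added by the RNI (assumption)
    hc: int    # hop counter, initialised by the RNI (assumption)


# Hypothetical table: which resource hosts each destination process.
PROCESS_TABLE = {0: 0, 1: 1, 2: 2, 3: 3}


def rni_out(msg, table=PROCESS_TABLE):
    """Out-buffer: add network-layer fields to a packet from the resource."""
    return NetworkPacket(msg.data, msg.dpid, msg.spid, rid=table[msg.dpid], hc=0)


def rni_in(pkt):
    """In-buffer: strip the network-layer fields before delivery."""
    return ResourceMessage(pkt.data, pkt.dpid, pkt.spid)
```

In this sketch the resource never needs to know where in the mesh a process lives; changing the mapping only means changing the table passed to `rni_out`.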


Figure 5-3. VHDL Block model of RNI and a Resource

5.2.3 Resource

The design contains three types of resources, each designed for the requirements of the implementation. The UART receiver receives messages at 9600 bit/s and passes them towards the RNI as simple transport-layer messages. The send-ID resource receives a number 0-9, forwards it, and then sends its own ID that number of times to a specific node in the network. The UART transmitter receives packets from the RNI and sends them at 9600 bit/s onto a serial channel.
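As a worked example of the serial timing, the Python sketch below computes how many clock cycles one bit lasts at 9600 bit/s and builds the bit sequence for one character. It assumes the 40 MHz board clock given in Section 6.1 and standard 8N1 framing (one start bit, eight data bits sent least significant bit first, one stop bit); the framing is an assumption, not something stated in the design description, and the function names are hypothetical.

```python
CLOCK_HZ = 40_000_000  # prototype board clock (Section 6.1)
BAUD = 9600            # serial bit rate used by the UART resources


def baud_divisor(clock_hz, baud):
    """Clock cycles per serial bit, rounded to the nearest integer."""
    return round(clock_hz / baud)


def uart_frame(byte):
    """Bits on the line for one character, assuming 8N1 framing.

    Start bit (0), eight data bits least significant first, stop bit (1).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]
```

At 40 MHz the divisor is 40 000 000 / 9600 ≈ 4166.7, rounded to 4167 clock cycles per bit, which keeps the actual bit rate within about 0.01 % of 9600 bit/s.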

5.3 Design and Simulation Tool


The design was carried out with FPGA Advantage 4.0 from Mentor Graphics, a set of design tools in an integrated environment. Design entry is done in Renoir, and Leonardo Spectrum is used for target synthesis. The code is compiled separately in ModelSim in order to simulate it.


5.4 Simulation Results


The design was simulated in several test benches to determine its timing requirements. The VHDL model serves two purposes: to find realistic timing values to feed back into the SDL model, and to obtain a model that can be implemented in an FPGA. Since an FPGA is not optimally designed for this purpose, we had to accept that this model would need somewhat more clock cycles.

5.4.1 Simulated values

This section shows the values obtained in simulation, which we consider realistic if, for instance, an ASIC were designed.

Buffers:
Forwarding one packet: 1 clk
Turn-around time to receive a new packet: 3 clk

Switch:
Load packet for address evaluation and send: 2 clk
Be ready to load a new packet: 2 clk

Figure 5-4 shows that these clock-cycle counts are determined by the handshaking procedure used. The upper part of the figure shows how the timing values from the VHDL simulations were transferred to the SDL timing. Since the switch needs 2 clock cycles to send a packet and then has to wait 3 clock cycles for the RTR from the buffer, its period adds up to 5 clock cycles. The buffers are one clock cycle faster, because a buffer sends in one clock cycle and its turn-around time is counted from when it receives. From these values the maximum bandwidth (BW) of the router can be derived for a given clock frequency, here F = 1 GHz (one clock period = 1 ns):

$$BW_{\max,\text{buffers}} = \frac{1}{4 \cdot (1 \times 10^{-9}\,\mathrm{s})} = 250\ \text{Mpackets/s}$$

$$BW_{\max,\text{switch}} = \frac{1}{5 \cdot (1 \times 10^{-9}\,\mathrm{s})} = 200\ \text{Mpackets/s}$$

$$BW_{\max,\text{overall}} = \min(250, 200) = 200\ \text{Mpackets/s}$$

Since a single packet passes through the buffers in less time than their repetition period, the minimum delay for one packet is calculated as follows:

In-buffer delay = 1 ns
Switch delay = 2 ns
Out-buffer delay = 1 ns
Total minimum delay = 4 ns


Figure 5-4 panels: upper part, SDL timing (processes PIn, PSwitch and POut, with the RTR signal and packet signal for Pkt1 and Pkt2); lower part, VHDL simulation showing the signal sequence Outbuf_WR, Inbuf_RTR to Outbuf, Inbuf_WR to Switch, Switch_RTR to Inbuf, Switch_WR to next Outbuf, next Outbuf_RTR to Switch.

Figure 5-4. The communication process between a switch and buffers


5.4.2 Simulated and Implemented values

The fast design used in the previous section did not work in the FPGA. We therefore decided to clock more of the signals, which makes the design slower but its behaviour safer. The values that work in the FPGA are shown below.

Buffers:
Forwarding one packet: 2 clk
Turn-around time to receive a new packet: 3 clk

Switch:
Load packet for address evaluation and send: 4 clk
Be ready to load a new packet: 4 clk

The number of clock cycles needed to transport a packet increases, so both the maximum bandwidth and the minimum delay get worse. With F = 1 GHz:

$$BW_{\max,\text{buffers}} = \frac{1}{5 \cdot (1 \times 10^{-9}\,\mathrm{s})} = 200\ \text{Mpackets/s}$$

$$BW_{\max,\text{switch}} = \frac{1}{10 \cdot (1 \times 10^{-9}\,\mathrm{s})} = 100\ \text{Mpackets/s}$$

$$BW_{\max,\text{overall}} = \min(200, 100) = 100\ \text{Mpackets/s}$$

The minimum delay for one packet is calculated as follows:

In-buffer delay = 2 ns
Switch delay = 4 ns
Out-buffer delay = 2 ns
Total minimum delay = 8 ns
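The bandwidth and delay arithmetic of Sections 5.4.1 and 5.4.2 can be reproduced with a few lines of Python. The per-stage periods (4 and 5 clock cycles for the fast design, 5 and 10 for the FPGA design) and the per-packet delays are taken from the text; the `RouterTiming` container and the function names are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterTiming:
    buffer_period: int  # clock cycles between successive packets through a buffer
    switch_period: int  # clock cycles between successive packets through the switch
    buffer_delay: int   # clock cycles for one packet to pass a buffer
    switch_delay: int   # clock cycles for one packet to pass the switch


def stage_bw_mpackets(t, f_hz):
    """Maximum bandwidth of the buffers and of the switch, in Mpackets/s."""
    return f_hz / t.buffer_period / 1e6, f_hz / t.switch_period / 1e6


def max_bw_mpackets(t, f_hz):
    """The slowest stage limits the router as a whole."""
    return min(stage_bw_mpackets(t, f_hz))


def min_delay_ns(t, f_hz):
    """In-buffer + switch + out-buffer delay for a single packet, in ns."""
    cycles = 2 * t.buffer_delay + t.switch_delay
    return cycles * 1e9 / f_hz


SIMULATED = RouterTiming(buffer_period=4, switch_period=5, buffer_delay=1, switch_delay=2)
FPGA = RouterTiming(buffer_period=5, switch_period=10, buffer_delay=2, switch_delay=4)
```

Evaluating both configurations at F = 1 GHz gives 200 versus 100 Mpackets/s and 4 versus 8 ns, matching the figures above. Passing the prototype board's 40 MHz clock instead, the same formulas would give 4 Mpackets/s and 200 ns for the FPGA timing.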

5.5 Chapter Discussion

Although it was not possible to make a working FPGA implementation with the faster values, we consider these to be the most realistic values for an ASIC implementation, because a VLSI designer has far more possibilities than were available in this project. Adjustments could of course be made manually in the technology mapping, but this is too time-consuming and was therefore left out.


NoC: Prototyping on FPGA

6.1 Prototype Board


The design is implemented on a prototyping board carrying a Xilinx Spartan II XC2S100PQ208. This is a 100k-gate FPGA with 600 CLBs and 1200 FFs, clocked at 40 MHz. The board has an RS-232 driver that can be connected to the ports of the FPGA.

6.2 Functional Description


A two-by-two Network on Chip (NoC) was to be implemented in an FPGA, since there is no room for a bigger network in the target FPGA. The design structure is shown in Figure 6-1.

6.2.1 Communication

Communication between two micro-routers passes through one output buffer. Because the FPGA is too small, there is no room to implement the data-link layer. Data also passes through an input buffer before it reaches the receiving micro-router. Between each resource and its micro-router there is a Resource Network Interface (RNI), which converts data between the format used in the resource and the format used in the network layer. The RNI is connected in the same way as two micro-routers are connected to each other.

6.2.2 Resources

Resource zero receives ASCII characters from the RS-232 interface and passes them on to resource one, followed by the ASCII code for zero as the code for resource number zero. Resource one receives ASCII characters from resource zero and passes them on to resource three. For each message that passes through, the resource recognizes the number that the ASCII code represents, forwards it, and then sends its own ID number in ASCII code that number of times. Resource three receives ASCII characters from resource one and passes them on to resource two. After each message that passes through resource three, the ASCII code for three is sent in the same manner as in resource one. Resource two receives ASCII characters from resource three and sends them via RS-232 to the PC.

6.2.3 I/O-ports

There are two connections to the PC through RS-232 interfaces. Resource zero receives ASCII characters via RS-232, and resource two sends ASCII characters to the PC through its own RS-232 interface.


Figure 6-1 content: a PC connected over RS-232 to Resource 0 (Rx, RS-232 interface) and Resource 2 (Tx, RS-232 interface); Resource 1 and Resource 3 (store, modify, forward); each resource is connected through an RNI to one of the micro-routers (0:0), (0:1), (1:0) and (1:1).

Figure 6-1. Overview of Network on Chip Prototype in FPGA

Label   Field                                 Size
Data    Data                                  8 bits
MSN     Message Sequence Number               1 bit
PSN     Packet Sequence Number                1 bit
DPID    Destination Process Identification    2 bits
SPID    Source Process Identification         2 bits
HC      Hop Counter                           1 bit
RID     Resource Identification               2 bits
Check   Check bit                             1 bit

Figure 6-2. Bits in implementation
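Figure 6-2 lists the fields but not how they are laid out in a word. The Python sketch below packs them into one 18-bit integer under two assumptions that are ours, not the design's: fields are placed most significant first in the order of the figure, and the check bit is an even-parity bit over the other 17 bits. The names `pack_packet` and `unpack_packet` are hypothetical.

```python
# Field widths from Figure 6-2, in the order they are listed there.
FIELDS = [("data", 8), ("msn", 1), ("psn", 1), ("dpid", 2),
          ("spid", 2), ("hc", 1), ("rid", 2)]


def pack_packet(**values):
    """Pack the fields MSB-first and append an even-parity check bit."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    check = bin(word).count("1") & 1  # makes the total number of 1s even
    return (word << 1) | check


def unpack_packet(word):
    """Verify the check bit and split an 18-bit word back into fields."""
    if bin(word).count("1") & 1:
        raise ValueError("check bit mismatch")
    word >>= 1
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

A single parity bit detects any one flipped bit, but not two, which is the usual trade-off when only one check bit is available.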

6.3 Technology Mapping tool


The technology mapping was done with Xilinx Design Manager 4.1i. This stage can also be done in Leonardo Spectrum, but the prototype board required a special configuration of the start-up clock, and we could not find this setting in Leonardo. This is why Leonardo could not be used.

6.4 Implementation Result


Figure 6-3 summarizes how much space the different components take up. The tool was set for area optimisation. For the buffers, the switch/route unit and the micro-router, the total packet size is 17 bits, because they operate only on the network layer.
Component          Slices   Flip-flops   Gates   Logic levels
Buffer (1 pkt)         20           24     312              4
Buffer (2 pkts)        32           41     574              5
Buffer (3 pkts)        41           58     824              5
Switch/Route          105           45    1659             10
Micro-Router          385          370    6245              9
Implementation        947          751   15509              5

Figure 6-3. Implementation Result

The micro-router is synthesized with a 2-packet output buffer and a 1-packet input buffer in each of the 5 directions. Summing these components manually gives:

Slices: 32*5 + 20*5 + 105 = 365
Flip-flops: 24*5 + 41*5 + 45 = 370
Gates: 312*5 + 574*5 + 1659 = 6089

The implementation uses only one buffer in each direction because of the limited space. Each router has only 3 in/out buffers, since it is only a 2*2 network and buffers that would be unconnected at the boundary are not included in the design.
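The manual sums above are easy to recheck in code. This Python snippet copies the per-component figures from Figure 6-3 and compares the estimate for 5 directions (one 2-packet output buffer and one 1-packet input buffer per direction, plus the switch) with the synthesized micro-router; the dictionary and function names are illustrative.

```python
# Per-component synthesis results copied from Figure 6-3
BUFFER_1PKT = {"slices": 20, "flip_flops": 24, "gates": 312}
BUFFER_2PKT = {"slices": 32, "flip_flops": 41, "gates": 574}
SWITCH = {"slices": 105, "flip_flops": 45, "gates": 1659}
MICRO_ROUTER = {"slices": 385, "flip_flops": 370, "gates": 6245}


def estimate_router(directions=5):
    """Sum of separately synthesized parts: per direction one 2-packet
    output buffer and one 1-packet input buffer, plus one switch."""
    return {
        key: directions * (BUFFER_2PKT[key] + BUFFER_1PKT[key]) + SWITCH[key]
        for key in SWITCH
    }


def overhead(directions=5):
    """Difference between the synthesized micro-router and the estimate."""
    est = estimate_router(directions)
    return {key: MICRO_ROUTER[key] - est[key] for key in est}
```

The estimate is 365 slices, 370 flip-flops and 6089 gates, so the complete micro-router uses 20 more slices and 156 more gates than the sum of its parts, with the same number of flip-flops.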

6.5 Chapter Discussion


The sizes of the components when synthesized separately are roughly the same as when they are synthesized together. One possible explanation for the complete micro-router being somewhat larger is that the gate depth is lowered to meet timing constraints. It is unsatisfactory that the implementation had to be done with the slower timing values. However, this design works well and shows that the principles of the design work.


Results

The objective of this thesis has been divided into three parts. The first and most important objective was to evaluate the SDL language for modelling and simulation of a NoC; the results of this part are described in section 7.1. The second objective was to make a model of the NoC in VHDL based on the description made in SDL; the results are described in section 7.2. The third and last objective was to implement a small NoC in an FPGA, which is described in section 7.3.

7.1 SDL Modelling and Simulation of NoC


One part of the work was to build the simulator in SDL. It was important to make this model as flexible and reliable as possible, which we achieved by building every functional unit as a separate design block. The block structure makes it very easy to change the functionality of a single part of the node, for example the micro-router: one can build a new micro-router based on a different routing algorithm and run new simulations to see how the behaviour changes. Properties that were changed more often, such as buffer size and the time spent in different situations, were implemented as synonyms at the uppermost block level. The simulation results clearly show that a NoC built with more than one input buffer is not a good solution. With this configuration the routers start to route packets in the wrong direction as soon as the network load increases. This is caused by the behaviour of the micro-router: when a packet cannot be routed towards its destination, it is passed on to any other available buffer. Sending packets in the wrong direction creates extra traffic, which causes even more packets to be misdirected.
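The misrouting effect described above can be illustrated with a simple output-selection rule. This Python sketch is not the routing algorithm of the SDL model; it assumes mesh coordinates (x, y) with y growing towards South, a preference for any free output that brings the packet closer to its destination, and otherwise a fall-back to any free output buffer. The function names are hypothetical.

```python
def productive(cur, dst):
    """Directions that reduce the Manhattan distance from cur to dst."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    dirs = []
    if dx > 0:
        dirs.append("E")
    if dx < 0:
        dirs.append("W")
    if dy > 0:
        dirs.append("S")
    if dy < 0:
        dirs.append("N")
    return dirs


def choose_output(cur, dst, free):
    """Pick an output buffer for a packet at router cur heading for dst.

    free -- set of outputs whose buffer currently has RTR = 1
    Returns (direction, misrouted), or (None, False) if nothing is free.
    """
    if cur == dst:
        return ("RNI", False) if "RNI" in free else (None, False)
    for d in productive(cur, dst):
        if d in free:
            return d, False
    # No productive output is free: pass the packet to any other free buffer.
    for d in ("N", "E", "S", "W"):
        if d in free:
            return d, True
    return None, False
```

When both productive outputs are busy, the packet is deflected (`misrouted` is True). Under high load this happens more often, and each deflected packet adds traffic elsewhere in the mesh, illustrating the feedback effect described above.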

7.2 Designing NoC using VHDL


A VHDL model has been designed based on the SDL model. In SDL the model was divided into several functional units, called blocks. When developing the VHDL model we translated each such block into a component. It was not possible to translate SDL processes and procedures directly into VHDL, but the behaviour is very similar to that in SDL. In this project we have designed only one base model in VHDL, and this model can be optimised further. The VHDL simulations gave us information on how many clock cycles the different processes take.


7.3 Implementation of NoC prototype in FPGA


We chose to implement a NoC prototype in an FPGA because we wanted to verify that the code was implementable and to show visually that our model works. The prototype had to be very simple because we only had a simple prototyping board with a 100k-gate Spartan II. We decided to build a simple UART that could communicate with a PC via the RS-232 interface, see Figure 6-1. This made it possible to use the PC both for entering data into the system and for viewing the results on the screen. The network could only be 2x2, and two of the resources were already used for the UART. The two remaining resources behave in the same way: when resource 1 receives a digit from resource 0, for example 3, it forwards the digit to resource 3 followed by 3 packets containing its resource number (1 in this case). Resource 3 works in exactly the same way: a received digit is forwarded to resource 2, and resource 3 then sends its resource number as many times as the forwarded digit indicates. Resource 2 sends all packets it receives from resource 3 to the PC.

Messages on each link when the PC sends the digit 2:
PC to Resource 0: 2
Resource 0 to Resource 1: 2
Resource 1 to Resource 3: 2,1,1
Resource 3 to Resource 2: 2,3,3,1,3,1,3
Resource 2 to PC: 2,3,3,1,3,1,3

Figure 7-1. Description of communication in NoC-prototype.
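The message flow of Figure 7-1 can be reproduced with a few lines of Python. The sketch treats characters as plain integers rather than ASCII codes and, following the figure, lets resource 0 forward the PC's digits unchanged; the function names are hypothetical.

```python
def send_id_resource(own_id, digits):
    """For each received digit: forward it, then send own_id that many times."""
    out = []
    for d in digits:
        out.append(d)
        out.extend([own_id] * d)
    return out


def prototype_chain(from_pc):
    """Messages on each link of the 2x2 prototype, as drawn in Figure 7-1."""
    to_r1 = list(from_pc)               # resource 0: UART receiver forwards the digits
    to_r3 = send_id_resource(1, to_r1)  # resource 1
    to_r2 = send_id_resource(3, to_r3)  # resource 3
    to_pc = list(to_r2)                 # resource 2: UART transmitter
    return {"R0->R1": to_r1, "R1->R3": to_r3, "R3->R2": to_r2, "R2->PC": to_pc}
```

Calling `prototype_chain([2])` reproduces the sequences in the figure: resource 3 turns 2,1,1 into 2,3,3,1,3,1,3 because it applies the same rule to every digit it receives, including the IDs sent by resource 1.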


Conclusions

In this project we have developed a generic model of a NoC architecture using SDL. We have also designed a small NoC in VHDL and prototyped it in a 100k-gate FPGA. Our project demonstrates the feasibility of the NoC concept. Developing something like a NoC is very interesting, since it is a relatively unexplored area. We have tried to carry out a complete design flow, which forced us to keep things simple. Looking at each step of the design, we see possibilities to improve the models at all levels of abstraction. Although most of the effort has been spent on the SDL model and the simulations, much remains to be done there as well. For example, since the model can be used for fast simulations, it may be worthwhile to test other routing algorithms. When planning the NoC prototype we first thought that it would be possible to implement a larger network, but the buffers took a large amount of space. For this reason the network size was set to two by two. This is a little too small to draw any real conclusions from, but we thought that it would be fun to get a working network prototype into an FPGA anyway. There was also very little space left for the resources, so we had to make them very simple. It would be very interesting to build a larger network with more advanced resources, such as processors and RAM, to see how a NoC could be used. One thing that seems important to include in the network is a mechanism that provides hard real-time properties for messages between resources. Some researchers have proposed that a circuit-switched network could be the solution for this [16]. A circuit-switched network can transfer data very fast once the data path has been locked. Another way to improve real-time properties is to give packets in the NoC different priorities. High-priority packets can then overtake packets with lower priority, for example in the buffer.
It is also possible for the router to give advantages to packets with high priority: if a packet with lower priority occupies the output buffer, it can be dropped to make space. In this way some form of hard real-time property can be implemented in a packet-switched network. Similar behaviour can be seen in all simulations made on routers with only one output buffer, and this is the reason why we recommend at least two output buffers. Increasing the number of buffers to three output buffers, or to two input and two output buffers, gives only a very small extra benefit. In some simulations the number of transferred messages is even lower and the transmission times are longer. Because of this, and because of the limited resources on a chip, we propose the configuration with one input buffer and two output buffers for the NoC.
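The priority mechanism proposed above could be modelled as follows. This Python sketch is only one possible interpretation, assuming integer priorities where a larger number is more urgent, FIFO order within one priority level, and that a full buffer evicts its lowest-priority, most recently arrived packet only when the newcomer has strictly higher priority; otherwise the newcomer is refused and must wait, as it would with RTR = 0. The class name is hypothetical.

```python
import heapq
from itertools import count


class PriorityOutputBuffer:
    """Output buffer where urgent packets overtake and may evict others (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []           # entries: (-priority, arrival number, packet)
        self._arrivals = count()

    def push(self, packet, priority):
        """Try to store a packet. Returns (accepted, dropped_packet_or_None)."""
        entry = (-priority, next(self._arrivals), packet)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True, None
        # Full: the largest entry is the lowest priority, latest arrival.
        victim = max(self._heap)
        if entry[0] < victim[0]:  # newcomer has strictly higher priority
            self._heap.remove(victim)
            heapq.heapify(self._heap)
            heapq.heappush(self._heap, entry)
            return True, victim[2]
        return False, None        # refused: the sender must wait

    def pop(self):
        """Remove and return the most urgent (then oldest) packet."""
        return heapq.heappop(self._heap)[2]
```

The arrival counter keeps packets of equal priority in FIFO order and also guarantees that two heap entries never need to compare the packets themselves.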


References
[1] Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen, Martti Forsell, Mikael Millberg, Johnny Öberg, Kari Tiensyrjä, and Ahmed Hemani, A network on chip architecture and design methodology, In Proceedings of IEEE Computer Society Annual Symposium on VLSI, April 2002.
[2] Edwin Rijpkema, Kees Goossens, and Paul Wielage, A Router Architecture for Networks on Silicon, Proceedings of Progress, 2nd Workshop on Embedded Systems, 2001.
[3] Luca Benini and Giovanni De Micheli, Networks on Chips: A New SoC Paradigm, IEEE Computer Society, 2002.
[4] Christer Bohm, Circuit Switching for High Performance Integrated Services Networks, Royal Institute of Technology, Department of Teleinformatics, June 1996.
[5] James F. Kurose, Computer Networking, Addison Wesley Longman 2001, ISBN 0-201-47711-4.
[6] Dhiman Deb Chowdhury, High Speed LAN Technology Handbook, Springer 2000, ISBN 9-783540-665977.
[7] Gary N. Higginbottom, Performance Evaluation of Communication Networks, Artech House 1998, ISBN 0-89006-870-4.
[8] Roland Airiau et al., Circuit Synthesis with VHDL, Kluwer Academic Publishers 1999, ISBN 0-7923-9429-1.
[9] Stefan Sjöholm et al., VHDL för konstruktion, Studentlitteratur 1999, ISBN 91-44-01250-0.
[10] Douglas L. Perry, VHDL, McGraw-Hill Inc. 1994, ISBN 0-07-049434-7.
[11] A. Olsen et al., System Engineering Using SDL-92, North-Holland 1997, ISBN 0-444-89872-7.
[12] Jan Ellsberger et al., SDL: Formal Object-oriented Language for Communicating Systems, Prentice Hall 1997, ISBN 0-13-621384-7.
[13] Paul Wielage and Kees Goossens, Networks on Silicon: Blessing or Nightmare?, Philips Research Laboratories, Eindhoven, The Netherlands.
[15] Yi-Ran Sun, Simulation and Performance Evaluation for Networks on Chip, M.Sc. Thesis, KTH, Sweden.
[16] Dake Liu et al., SoCBUS: The solution of high communication bandwidth on chip and short TTM, Proc. of the Real-Time and Embedded Computing Conference, Gothenburg, Sweden, Sep 2002.


9 Vocabulary

Burst: With bursty behaviour, a certain number of packets, called a burst, is sent back to back at the maximum rate, followed by a delay called the burst gap, after which the process repeats itself. In this project the number of packets in a burst is selected randomly according to the Poisson distribution.

Continuous: In this project, continuous behaviour means that the delay between two transmitted packets can be set, which gives the output frequency.

FPGA: Field Programmable Gate Array. A programmable logic device that uses static RAM to store its configuration. It has a high logic capacity, but the device has to be reprogrammed after a power shutdown.

NoC: Network on Chip. NoC is one proposed solution for communication in future SoC designs. The main idea is that the resources on the chip communicate with each other through a network.

RNI: Resource Network Interface. The RNI is an interface that adapts the interface of the resource to the network.

Router: In a switched network a route through several switches has to be found. The switches at the cross-points of the network therefore also implement a routing function, and because of this functionality they are called routers.

SoC: System on Chip. Multiple stand-alone VLSI designs are stitched together on a chip to provide one functional system.

VHDL: Very high speed integrated circuit Hardware Description Language. Initially developed for the US Department of Defence in order to describe digital circuits; now also widely used for the design and synthesis of such circuits.

VLSI: Very Large Scale Integration. Term for the process used when manufacturing chips containing several hundred thousand up to a million transistors.
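The bursty and continuous traffic patterns defined above can be sketched as generators of packet send times. The SDL simulator's own traffic sources are not reproduced here; this Python version assumes time measured in packet slots and a hypothetical mean-burst-length parameter, and it uses Knuth's multiplication method for the Poisson draw so that only the standard library is needed.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson-distributed integer with mean lam (Knuth's method)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def bursty_send_times(n_bursts, mean_burst, burst_gap, packet_time=1, seed=0):
    """Send times for bursty traffic.

    Each burst holds a Poisson-distributed number of packets sent back to
    back (one every packet_time), followed by a silent burst_gap.
    """
    rng = random.Random(seed)
    t, times = 0, []
    for _ in range(n_bursts):
        for _ in range(poisson(mean_burst, rng)):
            times.append(t)
            t += packet_time
        t += burst_gap
    return times


def continuous_send_times(n_packets, delay):
    """Send times for continuous traffic: a fixed delay between packets."""
    return [i * delay for i in range(n_packets)]
```

For continuous traffic the output frequency is simply 1 / delay; for bursty traffic the long-run rate also depends on the mean burst length and the burst gap.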

