
NoC-based Network Switching Node


Reza Kourdy, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran
Mohammad Reza Nouri rad, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran

Abstract—Network switching nodes generally slice incoming packets internally into multiple so-called flits of equal size. These flits are then transported in fixed timeslots through a crossbar fabric to their destined output port, where they are reassembled into packets. However, such a crossbar fabric does not scale linearly with the number of ports, and implementations with many ports consume a large chip area. A promising idea towards a more scalable solution is to replace the crossbar by a Network-on-Chip (NoC). In this paper we investigate the general feasibility of this approach and explore different NoC structures for the crossbar substitution.

Index Terms—Network on Chip (NoC), Resource Network Interface (NI), NoC switching techniques, circuit switching, packet switching, store-and-forward switching (SAF), virtual cut-through switching (VCT), wormhole switching (WH).

1 INTRODUCTION
System on a chip (SoC) is the design methodology currently used by VLSI designers, based on extensive IP core reuse. Cores alone do not make up an SoC; it must also include an interconnection architecture and interfaces to peripheral devices [1]. Usually, the interconnection architecture is based on dedicated wires or shared busses. Dedicated wires are effective only for systems with a small number of cores, since the number of wires in the system increases dramatically as the number of cores grows. Therefore, dedicated wires have poor reusability and flexibility. A shared bus is a set of wires common to multiple cores. This approach is more scalable and reusable than dedicated wires. However, busses allow only one communication transaction at a time. Thus, all cores share the same communication bandwidth and scalability is limited to a few dozen IP cores [2]. Using separate busses interconnected by bridges, or hierarchical bus architectures, may relax some of these constraints, since different busses may serve different bandwidth needs and protocols, and communication parallelism increases. Nonetheless, scalability remains a problem for hierarchical bus architectures. A network on chip (NoC) appears to be a promising solution for implementing future on-chip interconnection architectures [2]-[7]. In the most common organization, a NoC is a set of interconnected switches, with IP cores connected to these switches. NoCs offer better performance, bandwidth, and scalability than shared busses [3]. Switches are responsible for: (i) receiving incoming packets; (ii) storing packets; (iii) routing packets to a given output port; and (iv) sending packets to other switches. To accomplish these functions, four main components compose a switch: a router, to define a path between input and output switch ports (function i); buffers, to store intermediate data (function ii); an arbiter, to grant access to a given port when multiple input requests arrive in parallel (function iii); and a flow control module, to regulate data transfer to the next switch (function iv). Packet switching is by far the most widely employed switching mechanism in NoCs, although circuit-switched NoCs have also been proposed [7]. Packet switching requires the choice of a switching mode, which defines how packets move through the switches [8]. The wormhole switching mode avoids the need for large buffer space, since a packet is transmitted between switches in smaller units, called flits. Only the header flit carries routing information; the remaining flits of a packet must follow the path reserved by the header.
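To make the division of labour among these four components concrete, the following Python sketch models a single switch with a routing function, per-port input buffers, a simple round-robin arbiter, and a credit-style flow-control check. The class and method names, the flit representation, and the buffer and credit sizes are illustrative assumptions made for exposition, not part of any NoC implementation discussed in this paper.

```python
from collections import deque

class Switch:
    """Minimal NoC switch sketch: router, buffers, arbiter, flow control.
    Illustrative only; names and sizes are assumptions."""

    def __init__(self, num_ports=5, buffer_depth=4):
        # (ii) one input buffer (FIFO of flits) per port
        self.in_buf = [deque() for _ in range(num_ports)]
        self.buffer_depth = buffer_depth
        # credits advertised by the downstream switch on each output port
        self.credits = [buffer_depth] * num_ports
        self.num_ports = num_ports
        self._rr = 0  # round-robin pointer for the arbiter

    def route(self, dest_port):
        # (i) routing: here trivially the destination port carried by the header
        return dest_port % self.num_ports

    def accept(self, port, flit):
        # store an incoming flit if the input buffer has room
        if len(self.in_buf[port]) < self.buffer_depth:
            self.in_buf[port].append(flit)
            return True
        return False  # back-pressure: the sender must retry later

    def arbitrate(self, requests):
        # (iii) round-robin arbiter: pick one requesting input per cycle
        for i in range(self.num_ports):
            cand = (self._rr + i) % self.num_ports
            if cand in requests:
                self._rr = (cand + 1) % self.num_ports
                return cand
        return None

    def cycle(self):
        # (iv) flow control: forward only when the next switch has credits
        requests = {p for p in range(self.num_ports) if self.in_buf[p]}
        winner = self.arbitrate(requests)
        if winner is None:
            return None
        flit = self.in_buf[winner][0]
        out = self.route(flit["dest_port"])
        if self.credits[out] > 0:
            self.credits[out] -= 1
            return out, self.in_buf[winner].popleft()
        return None  # blocked: wait for credits from downstream


if __name__ == "__main__":
    sw = Switch()
    sw.accept(0, {"dest_port": 3, "payload": "header"})
    print(sw.cycle())  # -> (3, {'dest_port': 3, 'payload': 'header'})
```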

2 RELATED WORKS
With the large number of processor cores in chip multiprocessors, the 2D mesh has been gaining wide acceptance for inter-core on-chip communication, and program performance is more sensitive to router latency than to link bandwidth. Adaptive System-on-a-Chip (aSoC) has been used as a backbone for power-aware video processing cores. By nature of its statically scheduled mesh interconnect, aSoC performs up to 5 times faster than bus-based architectures. Additionally, interconnect usage for typical digital signal processing applications is below 20%, which leaves significant interconnect bandwidth to accommodate the control communication required by the power-aware features of modern cores. aSoC's ability to provide dynamic voltage and frequency scaling is critical to future portable digital signal processing applications, as it allows SoC implementations to exploit the inevitable mismatches in core utilization, due to data content variations or user requirements, to reduce power consumption.

In the Xpipes flow, communication between nodes takes place via pipelined, point-to-point connections, and the XY algorithm is used to route packets from a source node to a destination node. The entire flow leverages the flexibility of a fully reusable and scalable library of network components called Xpipes; the latency of its router is 7 cycles. The disadvantage of this NoC is that in-order reception of packets is not guaranteed when flits take different paths. The ANoC communication architecture is composed of nodes, links between nodes, and computation resources; the global topology of the architecture is not fixed. HERMES is a 2D-mesh NoC topology that satisfies the requirement of low-area, low-latency communication between system-on-chip modules. Its flit size is parameterizable and the XY algorithm is used to route packets from a source node to a destination node.
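Since both the Xpipes flow and HERMES are described above as using XY routing, a brief sketch of dimension-ordered XY routing on a 2D mesh may help. The function below is a generic illustration under an assumed coordinate convention; it is not the routing logic of either of those NoCs.

```python
def xy_route(cur, dest):
    """Dimension-ordered XY routing on a 2D mesh.

    cur and dest are (x, y) router coordinates. The packet is first moved
    along the X dimension until the column matches, then along Y.
    Returns the output port to take at the current router.
    """
    cx, cy = cur
    dx, dy = dest
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"  # arrived: deliver to the attached core


if __name__ == "__main__":
    # Hop-by-hop path from router (0, 0) to router (2, 1) in a 3x3 mesh
    pos = (0, 0)
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    while True:
        port = xy_route(pos, (2, 1))
        print(pos, "->", port)
        if port == "LOCAL":
            break
        pos = (pos[0] + step[port][0], pos[1] + step[port][1])
```

Because the X dimension is always exhausted before Y, this routing is deterministic and deadlock-free on a mesh, which matches the restricted dimension-ordered routing mentioned for wormhole NoCs in Section 5.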

3 THE NOC DESIGN SPACE


A Network-on-Chip is a communication network used on chip. It replaces dedicated, design-specific wires with a scalable, general-purpose, multi-hop network. A NoC provides a robust, high-performance, scalable and power-efficient communication infrastructure for connecting MPSoC components. It usually consists of a packet-switched on-chip micro-network, foreseen as the natural evolution of traditional bus-based solutions. The principal goals of packet switching are to optimize the utilization of the available link capacity and to increase the robustness of communication, which is why most networks proposed in the literature are packet based.

Fig. 1 shows an example SoC with a NoC and nine heterogeneous IP blocks: CPUs, memories, input/output devices, and hardware accelerators. The size of one resource is, for example, 50-200 kilogates or larger, which means that tens of resources fit on a single chip in a modern 65 nm process. The good old shared bus is still very common in practical SoC implementations, so it is well motivated to take a deeper look at NoC proposals and figure out what really matters.

Fig.1. An example SoC with a 2-D mesh NoC and nine resources. The network interface is denoted NI.

4 NOC COMPONENTS
Before we delve deeper into the details of NoC communication, it is worthwhile to review its components.

4.1 Switch (SW)
In order to send messages from one module to another, we need switches. A switch consists of a set of input buffers, an interconnection matrix, a set of output buffers and some control circuitry. The buffers store data that cannot be routed immediately.

4.2 Link (L)
A link is the physical connection between two nodes and/or switches. Links are usually bi-directional. Physical wires can be used generously for the network, because the wire pitch in silicon technologies is very small and multiple layers of wiring are available. Thus, in NoCs, data and control are typically physically separated, resulting in a cleaner and more performance-focused design. Nevertheless, on-chip wires have their own side effects: they cause signal delay and signal dispersion, which is a serious enough problem to be taken care of. Indeed, the delay of an unrepeated wire grows quadratically with its length, because of the combined resistive and capacitive effects. Repeaters are used to restore the voltage level on wires and to provide local current sourcing and sinking capability.
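The quadratic growth and the effect of repeater insertion can be illustrated with a small numerical sketch. The distributed-RC (Elmore-style) delay estimate 0.38·R·C and the per-millimetre resistance, capacitance and repeater-delay values below are illustrative assumptions, not figures for any particular process.

```python
def wire_delay_ps(length_mm, r_ohm_per_mm=100.0, c_ff_per_mm=200.0):
    """Distributed-RC delay estimate (~0.38*R*C) of an unrepeated wire.

    r and c are assumed per-mm resistance and capacitance; returns picoseconds.
    """
    R = r_ohm_per_mm * length_mm          # total resistance, ohm
    C = c_ff_per_mm * length_mm * 1e-15   # total capacitance, F
    return 0.38 * R * C * 1e12            # seconds -> ps


def repeated_wire_delay_ps(length_mm, segments, t_repeater_ps=15.0, **kw):
    """Same wire split into equal segments, each driven by a repeater.

    The wire contribution shrinks as 1/segments, so the total delay grows
    roughly linearly with length once the segment length is fixed.
    """
    seg = length_mm / segments
    return segments * (wire_delay_ps(seg, **kw) + t_repeater_ps)


if __name__ == "__main__":
    for L in (1, 2, 4, 8):  # wire length in mm
        print(f"L={L} mm: unrepeated {wire_delay_ps(L):7.1f} ps, "
              f"repeated ({L} segs) {repeated_wire_delay_ps(L, L):7.1f} ps")
```

Under these assumptions the unrepeated delay grows as 7.6, 30.4, 121.6, 486.4 ps for 1, 2, 4 and 8 mm, while the repeated wire grows roughly linearly, which is exactly the motivation for repeater insertion given above.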

4.3 Resource Network Interface (NI)
The NI is a protocol converter that maps the I/O protocol of a processing node to the protocol used within the NoC. This conversion is needed because NoCs use streamlined protocols for the sake of efficiency. The NI therefore converts messages into packets for transmission over the NoC. Several functional fields are distinguished within a packet: the address, data and control information fields. Each field is filled in by the NI under control of the IP core.

4.4 Processing Element (PE), Core, Resource
These are the functional blocks that do the processing or run the applications. A PE can be a CPU or a memory block. However, it is important to understand that the PE is not part of the NoC itself: a NoC consists of routers, links and network interfaces.
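As an illustration of this packetization, the sketch below splits a message into a header flit carrying the address and control fields followed by data flits. The field names, widths and flit size are assumptions made for the example, not the packet format of any specific NoC or NI.

```python
from dataclasses import dataclass
from typing import List

FLIT_PAYLOAD_BYTES = 4  # assumed flit payload width

@dataclass
class Flit:
    kind: str      # "HEAD", "BODY" or "TAIL"
    payload: bytes

def packetize(dest_addr: int, control: int, message: bytes) -> List[Flit]:
    """NI-style packetization: one header flit (address + control fields),
    followed by data flits, the last one marked as TAIL."""
    header = Flit("HEAD", dest_addr.to_bytes(2, "big") + control.to_bytes(2, "big"))
    chunks = [message[i:i + FLIT_PAYLOAD_BYTES]
              for i in range(0, len(message), FLIT_PAYLOAD_BYTES)] or [b""]
    body = [Flit("BODY", c) for c in chunks[:-1]]
    tail = [Flit("TAIL", chunks[-1])]
    return [header] + body + tail

def depacketize(flits: List[Flit]) -> bytes:
    """Reassemble the original message on the receiving NI."""
    assert flits[0].kind == "HEAD" and flits[-1].kind == "TAIL"
    return b"".join(f.payload for f in flits[1:])

if __name__ == "__main__":
    flits = packetize(dest_addr=0x0021, control=0x0001, message=b"hello, NoC!")
    print([f.kind for f in flits])   # ['HEAD', 'BODY', 'BODY', 'TAIL']
    print(depacketize(flits))        # b'hello, NoC!'
```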

5 NOC SWITCHING TECHNIQUE


The first step in designing a NoC is to decide which topology to implement. The next step is to determine the switching technique, i.e., how data flows through the routers. This means defining the granularity of data transfer and the switching mechanism. The switching technique determines how flits and packets are transported and stored by the routers, but it does not determine which route is taken. Data transfer (of fixed length) takes place on a link. The amount of data transferred in a single cycle on a link is called a phit (physical unit). To ensure that there is no buffer overflow, the two routers synchronize each data transfer; the unit of synchronization is called a flit (flow control unit) and should be at least as large as a phit.

Fig.2. Packet structure.

A packet consists of multiple flits, and many such packets may make up a message; it is these messages that are exchanged by the modules connected to the NoC (see Fig. 2). The choice of phit and flit sizes involves trade-offs such as link speed versus router arbitration speed. Several points must be considered when choosing the switching technique for a NoC:
1) the granularity of the data to be sent and the frequency with which it is sent;
2) the cost and complexity of the router;
3) the dynamism and number of concurrent flows to be supported;
4) the resulting performance (bandwidth, latency) of the NoC.
Different NoCs often use different switching techniques. There are two basic techniques for transporting flits, circuit switching and packet switching, described in the following subsections; first, the sketch below illustrates the granularity hierarchy (message, packet, flit, phit) defined above.
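In this sketch the sizes (2-byte phit, 4-byte flit, 4-flit packets) are illustrative assumptions only; it simply counts how many units appear at each level when a message is decomposed.

```python
# Illustrative granularity hierarchy: message -> packets -> flits -> phits.
# All sizes below are assumptions for the example, not values from the paper.
PHIT_BYTES = 2            # amount transferred on a link per cycle
FLIT_BYTES = 4            # unit of flow control (>= phit size)
FLITS_PER_PACKET = 4      # packet = header flit + up to 3 payload flits

def split(data: bytes, size: int):
    return [data[i:i + size] for i in range(0, len(data), size)]

def message_to_phits(message: bytes):
    """Decompose a message and report how many units each level needs."""
    packets = split(message, FLIT_BYTES * (FLITS_PER_PACKET - 1))
    total_flits = 0
    total_phits = 0
    for payload in packets:
        flits = [b"HDR!"] + split(payload, FLIT_BYTES)   # header + payload flits
        phits = [p for f in flits for p in split(f, PHIT_BYTES)]
        total_flits += len(flits)
        total_phits += len(phits)
    print(f"{len(message)} B message -> {len(packets)} packets, "
          f"{total_flits} flits, {total_phits} phits (link cycles)")

if __name__ == "__main__":
    message_to_phits(b"x" * 30)
    # -> 30 B message -> 3 packets, 11 flits, 21 phits (link cycles)
```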

5.1 Circuit switching
In this type of switching, all the flits of a message are sent on a circuit with a fixed physical path between sender and receiver. Sharing resources and giving guarantees are conflicting goals, and efficiently combining guaranteed traffic with best-effort traffic is hard [9]. By using dedicated techniques for both types of traffic we aim to reduce the total area and power consumption. Here we concentrate on the architecture for communicating guaranteed-throughput traffic. For guaranteed throughput we use reconfigurable circuit switching that creates dedicated connections between two processing tiles. The reasons for reconsidering circuit switching instead of packet switching are the following. The flexibility of packet switching is not needed, because data streams are fixed for a relatively long time; therefore, a connection between two tiles is required for a long period (e.g. seconds or longer), and this connection can be configured by the CCN. A large amount of the traffic between tiles needs a guaranteed throughput, which is easier to guarantee in a circuit-switched connection. Current SoCs have a large amount of wiring resources, which gives enough flexibility for streams with different bandwidth demands. Circuit switching also eases the implementation of asynchronous communication techniques, because data and control can be separated; a control-free pipelined asynchronous data stream does not require much design effort. Circuit switching has a minimal amount of control in the data path (e.g. no arbitration), which increases the energy efficiency per transported bit and the maximum throughput. Further, there are benefits when guaranteed-throughput traffic has to be scheduled: scheduling communication streams over non-time-multiplexed channels is easier, because by definition a stream will not collide with other communication streams. The Æthereal [10] and SoCBUS [11] routers have large interaction between data streams (both have to guarantee contention-free paths), and determining the static time-slot table requires considerable effort. Because data streams are physically separated, collisions in the crossbar do not occur; therefore, no buffering or arbitration is needed in the individual router, and an established physical channel can always be used.
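A minimal sketch of this behaviour follows: a connection first reserves every link along a fixed path, and data then flows without per-hop arbitration until the circuit is torn down. The link-reservation model, class name and path representation are assumptions for illustration, not the reconfiguration mechanism of any NoC named above.

```python
class CircuitNetwork:
    """Illustrative circuit-switched fabric: links are reserved end to end
    before any data is transferred, and released on teardown."""

    def __init__(self):
        self.reserved = {}   # link (node_a, node_b) -> circuit id

    def setup(self, circuit_id, path):
        # path is a list of nodes, e.g. [(0, 0), (1, 0), (1, 1)]
        links = list(zip(path, path[1:]))
        if any(l in self.reserved for l in links):
            return False     # setup fails: some link already belongs to a circuit
        for l in links:
            self.reserved[l] = circuit_id
        return True

    def teardown(self, circuit_id):
        self.reserved = {l: c for l, c in self.reserved.items() if c != circuit_id}

    def send(self, circuit_id, path, flits):
        # once the circuit exists, flits stream through with no arbitration
        links = list(zip(path, path[1:]))
        assert all(self.reserved.get(l) == circuit_id for l in links), "no circuit"
        return list(flits)   # delivered in order, no per-hop blocking


if __name__ == "__main__":
    net = CircuitNetwork()
    path = [(0, 0), (1, 0), (1, 1)]
    assert net.setup("A", path)
    assert not net.setup("B", [(1, 0), (1, 1)])   # conflicts with circuit A
    print(net.send("A", path, ["hdr", "d0", "d1"]))
    net.teardown("A")
    assert net.setup("B", [(1, 0), (1, 1)])       # succeeds after teardown
```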

5.2 Packet Switching
In packet switching, the route between sender and receiver is not fixed. The packets that make up a message travel independently from sender to receiver, possibly along different routes and with different delays. There are three basic packet-switching schemes, described below.

a) Store-and-Forward switching (SAF)
SAF is the simplest form of packet switching. A packet is sent from one router to the next only when the receiving router has buffer space for the entire packet; hence, this scheme ensures that the transmission of a packet does not stall. Since a router forwards a packet only when it can be received in its entirety, the per-router latency and the buffer size must be at least equal to the size of the packet. However, in a NoC the buffer size should be kept small, so few NoCs use this basic technique.

b) Virtual Cut-Through switching (VCT)
In this switching mode, the first flit of a packet is sent as soon as space for the entire packet is available in the next router, which reduces the per-router latency; the other flits then follow without delay. However, when no space is available for the entire packet, the whole packet has to be buffered. In computer networking in general, cut-through switching is a method in which the switch starts forwarding a frame (or packet) before the whole frame has been received, normally as soon as the destination address has been processed. This technique reduces the latency through the switch but decreases reliability. Switches do not necessarily have separate cut-through and store-and-forward "modes" of operation: a cut-through switch usually receives a predetermined number of bytes, depending on the type of incoming packet, before making a forwarding decision, and it does not move from one mode to the other as dictated by configuration, speed differential, congestion, or any other condition. Virtual cut-through switching is the most sophisticated and expensive technique, where:
1. Messages are split into packets, and the router has buffers for a whole packet, as in SAF switching.
2. Instead of waiting for the whole packet to be buffered, the incoming header flit is cut through to the next router as soon as the routing decision has been made and the output channel is free.

3. Every further flit is buffered when it reaches the router, but it is also immediately cut through to the next router if the output channel is free.
4. If the header cannot proceed, it waits in the current router and all the following flits subsequently draw in, possibly releasing the channels occupied so far.
5. If there are no resource conflicts along the route, the packet is effectively pipelined through successive routers as a loose chain of flits. All the buffers along the routing path are blocked for other communication requests.

Fig.3. Wormhole switching.

c) Wormhole switching (WH)
This switching technique is an improvement over VCT switching: the buffer requirement is reduced to a single flit. To achieve this, each flit of a packet is sent only when there is space for that flit in the receiving router. When no space is available, the packet is left strung out over two or more routers. This blocks the affected links, and the resulting congestion is worse than with SAF and VCT switching.

To alleviate this problem, virtual links (or virtual channels) can be multiplexed on one physical link. This approach introduces usage dependencies between links and thereby makes WH switching more susceptible to deadlock than SAF and VCT switching. To avoid deadlock, virtual channels and/or suitable routing schemes can be used. Almost all NoCs use WH switching; those without virtual channels use restricted topologies (usually a partial mesh with some form of dimension-ordered routing) to avoid deadlock.

Nodes in a direct network communicate by passing messages from one node to another. A message enters the network from a source node and is routed towards its destination through a series of intermediate nodes. Four types of switching techniques are usually used for this purpose: circuit switching, packet switching, virtual cut-through switching, and wormhole switching. In circuit switching, a dedicated path is established between the source and the destination before the data transfer starts, so the message is never blocked during transfer. To improve performance, packet switching is used: a message is divided into packets that are routed independently towards their destination, with the destination address encoded in the header of each packet. The entire packet is stored at every intermediate node and then forwarded to the next node on its path. To reduce the time spent storing packets at each node, virtual cut-through switching was introduced; in this technique, a message is stored at an intermediate node only if the next required channel is occupied by another packet. Wormhole switching is a variant of the virtual cut-through technique that avoids the need for large buffers for storing messages. In wormhole switching, a packet is transmitted between nodes in units of flits, the smallest units of a message on which flow control can be performed. The head flit of a message contains all the necessary routing information, and all the other flits carry the data elements. The flits of a message are transmitted through the network in a pipelined fashion, as shown in Fig. 3. The main advantage of wormhole switching derives from this pipelined message flow, since the transmission latency becomes insensitive to the distance between source and destination. Moreover, since the message moves flit by flit across the network, each node needs to store only one flit; this reduction of buffer requirements at each node has a major effect on the cost and size of the system. The main disadvantage of wormhole switching comes from the fact that only the head flit carries routing information: if the head flit cannot advance due to resource contention, all the trailing flits are blocked along the path, and these blocked messages can in turn block other messages. This degrades network performance drastically, and the chained blocking can also lead to deadlock.
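The difference between the three packet-switching schemes boils down to the condition under which a router may forward data and how much buffering it must provide. The sketch below encodes those forwarding rules side by side; the buffer sizes, function names and decision interface are illustrative assumptions, not the behaviour of a specific router.

```python
def min_buffer_flits(scheme, packet_flits):
    """Minimum per-router buffer depth (in flits) required by each scheme."""
    return 1 if scheme == "WH" else packet_flits  # SAF and VCT buffer whole packets


def may_forward(scheme, packet_flits, flits_received_here, free_flits_downstream):
    """Can this router start/continue sending the packet to the next router?

    SAF: the whole packet must have arrived and the next router must have
         room for the whole packet.
    VCT: forwarding may start as soon as the next router has room for the
         whole packet (no need to wait for full reception here).
    WH:  a flit may advance as soon as one flit of space is free downstream.
    """
    if scheme == "SAF":
        return (flits_received_here >= packet_flits
                and free_flits_downstream >= packet_flits)
    if scheme == "VCT":
        return free_flits_downstream >= packet_flits
    if scheme == "WH":
        return flits_received_here >= 1 and free_flits_downstream >= 1
    raise ValueError(scheme)


if __name__ == "__main__":
    # 8-flit packet, 3 flits already received here, 4 free flit slots downstream
    for s in ("SAF", "VCT", "WH"):
        print(s, min_buffer_flits(s, 8), may_forward(s, 8, 3, 4))
    # -> SAF 8 False   VCT 8 False   WH 1 True
```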

6 CONCLUSION
Network switching nodes generally slice incoming packets internally into multiple so-called flits of equal size. These flits are then transported in fixed timeslots through a crossbar fabric to their destined output port, where they are reassembled into packets. However, such a crossbar fabric does not scale linearly with the number of ports, and implementations with many ports consume a large chip area. A promising idea towards a more scalable solution is to replace the crossbar by a Network-on-Chip (NoC).

REFERENCES
[1] G. Martin and H. Chang, "System on Chip Design," in Proc. 9th International Symposium on Integrated Circuits, Devices & Systems (ISIC'01), Tutorial 2, 2001.
[2] S. Kumar et al., "A Network on Chip Architecture and Design Methodology," in Proc. IEEE Computer Society Annual Symposium on VLSI (ISVLSI'02), Apr. 2002, pp. 105-112.
[3] L. Benini and G. De Micheli, "Networks on Chips: A New SoC Paradigm," IEEE Computer, vol. 35, no. 1, Jan. 2002, pp. 70-78.
[4] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, "QNoC: QoS Architecture and Design Process for Network on Chip," Journal of Systems Architecture, Special Issue on Networks on Chip, 2004 (accepted for publication).
[5] F. Moraes, A. Mello, L. Möller, L. Ost, and N. Calazans, "A Low Area Overhead Packet-switched Network on Chip: Architecture and Prototyping," in Proc. IFIP Very Large Scale Integration (VLSI-SoC), 2003, pp. 318-323.
[6] F. G. Moraes, N. Calazans, A. Mello, L. Möller, and L. Ost, "HERMES: An Infrastructure for Low Area Overhead Packet-switching Networks on Chip," Integration, the VLSI Journal, 2003 (accepted for publication).
[7] A. Andriahantenaina et al., "SPIN: A Scalable, Packet Switched, On-Chip Micro-network," in Proc. Design Automation and Test in Europe Conference (DATE'03), 2003, pp. 70-73.
[8] J. Liang, S. Swaminathan, and R. Tessier, "aSOC: A Scalable, Single-Chip Communications Architecture," in Proc. IEEE International Conference on Parallel Architectures and Compilation Techniques, Oct. 2000, pp. 37-46.
[9] J. Rexford and K. G. Shin, "Support for Multiple Classes of Traffic in Multicomputer Routers," in Proc. First International Workshop on Parallel Computer Routing and Communication, Springer-Verlag, 1994, pp. 116-130.
[10] J. Dielissen et al., "Concepts and Implementation of the Philips Network-on-Chip," in IP-Based SOC Design, Nov. 2003.
[11] D. Wiklund and D. Liu, "SoCBUS: Switched Network on Chip for Hard Real Time Systems," in Proc. International Parallel and Distributed Processing Symposium (IPDPS), Nice, France, Apr. 2003.
