Quiz 1
- NIOS II processor basics
- FPGA basics
- Caches
  - Performance
  - Size, number of bits
  - Block placement
  - Block identification
  - Block replacement
  - Write strategy
Quiz 1 (Cont.)
Key terms:
- Flynn's taxonomy
- Shared memory architectures
- Cache coherence
- NUMA, UMA, COMA
- Symmetric multiprocessors
- Distributed memory systems
- Classification based on communication
- Classification based on type of parallelism
- Chapter 1 from the textbook
Quiz 1 (Cont.)
- Amdahl's law
- Speedup, efficiency
- Parallelism profile, average parallelism, MIPS
- Scalability
- Understanding of the performance of the program for parallel addition
Overview
- Network properties
- Switches
- Single and multistage interconnection networks
- Crossbar
Network properties
- Node degree d: the number of edges incident on a node. With directed links it splits into in-degree and out-degree.
- Diameter D: the maximum, over all pairs of nodes, of the length of the shortest path between them.
- The network is symmetric if it looks the same from any node.
- The network is scalable if it is expandable with scalable performance when the machine resources are increased.
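The definitions of node degree and diameter can be checked on a small example. The following is a minimal sketch, assuming an undirected, connected network stored as an adjacency list; the function names and the ring topology are illustrative choices, not part of the slides.

```python
from collections import deque


def node_degrees(adj):
    """Return the degree of every node in an undirected adjacency list."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def diameter(adj):
    """Return the largest shortest-path length between any two nodes.

    Assumes the network is connected; unreachable nodes are ignored.
    """
    def eccentricity(src):
        # Breadth-first search gives shortest hop counts from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())

    return max(eccentricity(node) for node in adj)


def ring(n):
    """Build an n-node ring as an example topology (illustrative only)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    net = ring(8)
    print(node_degrees(net))  # every node has degree 2
    print(diameter(net))      # 4 for an 8-node ring
```

Every node of the ring reports the same degree and the same eccentricity, which is what the symmetry property above suggests.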
Bisection width
Bisection width is the minimum number of wires that must be cut to divide the network into two equal halves.
- A small bisection width -> low bandwidth
- A large bisection width -> a lot of extra wires
A cut of a network, $C(N_1, N_2)$, is a set of channels that partitions the set of all nodes into two disjoint sets $N_1$ and $N_2$. Each element of $C(N_1, N_2)$ is a channel with a source in $N_1$ and a destination in $N_2$, or vice versa.
A bisection of a network is a cut that partitions the entire network nearly in half, such that $|N_2| \le |N_1| \le |N_2| + 1$. Here $|N_2|$ means the number of nodes that belong to the partition $N_2$.
The channel bisection of a network is the minimum channel count over all bisections of the network:
$$B_c = \min_{\text{bisections}} |C(N_1, N_2)|$$
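For very small networks the channel bisection can be found by exhaustive search. The sketch below is a brute-force illustration, assuming undirected channels given as node pairs; the function name `channel_bisection` and the example topologies are assumptions made for this example, and the search is exponential in the number of nodes.

```python
from itertools import combinations


def channel_bisection(nodes, edges):
    """Return the minimum number of channels cut over all bisections.

    N2 takes floor(n/2) nodes and N1 takes the rest, so that
    |N2| <= |N1| <= |N2| + 1 holds for every candidate partition.
    """
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for subset in combinations(nodes, half):
        n2 = set(subset)
        # A channel is in the cut if its endpoints lie in different halves.
        cut = sum(1 for a, b in edges if (a in n2) != (b in n2))
        if best is None or cut < best:
            best = cut
    return best


if __name__ == "__main__":
    ring_edges = [(i, (i + 1) % 8) for i in range(8)]
    cube_edges = [(i, i ^ (1 << b)) for i in range(8) for b in range(3)
                  if i < i ^ (1 << b)]
    print(channel_bisection(range(8), ring_edges))  # 2 for an 8-node ring
    print(channel_bisection(range(8), cube_edges))  # 4 for a 3-cube
```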
2×2 Switches
Switches
Module size               2×2    4×4    8×8          N×N
Legitimate states         4      256    16,777,216   N^N
Permutation connections   2      24     40,320       N!

Permutation connection: each input is connected to exactly one output, and each output to exactly one input.
Legitimate state: an input can be connected to multiple outputs, but each output can only be connected to a single input.
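The counts in the table follow from the two definitions: in a legitimate state each of the N outputs independently selects one of N inputs, giving N^N states, and the states in which no input is used twice are the N! permutations. A short sketch that verifies this by enumeration (the function names are illustrative):

```python
from itertools import product
from math import factorial


def legitimate_states(n):
    """Number of legitimate states of an n-by-n switch module: n**n."""
    return n ** n


def permutation_connections(n):
    """Number of one-to-one (permutation) connections: n!."""
    return factorial(n)


def enumerate_states(n):
    """List every legitimate state as a tuple: entry j is the input
    that drives output j. Only practical for small n."""
    return list(product(range(n), repeat=n))


if __name__ == "__main__":
    for n in (2, 4):
        states = enumerate_states(n)
        # A state is a permutation when every input appears exactly once.
        perms = [s for s in states if len(set(s)) == n]
        print(n, len(states), len(perms))
```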
Single-stage networks
[Figure: single-stage shuffle-exchange interconnection network (left); perfect shuffle mapping function (right)]
- Perfect shuffle operation: cyclic shift of the address one place to the left, e.g. 101 --> 011
- Exchange operation: invert the least significant bit, e.g. 101 --> 100
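Both operations are simple bit manipulations on the node address. A minimal sketch, assuming addresses are unsigned integers of `n_bits` bits (the parameter and function names are illustrative):

```python
def perfect_shuffle(addr, n_bits):
    """Cyclically shift an n_bits-wide address one place to the left."""
    mask = (1 << n_bits) - 1
    # The most significant bit wraps around to the least significant position.
    return ((addr << 1) | (addr >> (n_bits - 1))) & mask


def exchange(addr):
    """Invert the least significant bit of the address."""
    return addr ^ 1


if __name__ == "__main__":
    print(format(perfect_shuffle(0b101, 3), "03b"))  # 011
    print(format(exchange(0b101), "03b"))            # 100
```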
Rearrangeable nonblocking
In this case a network should be able to establish all possible connections between inputs and outputs by rearranging its existing connections.
Blocking interconnection
A network is said to be blocking if it can perform many, but not all, possible connections between terminals. Example: the Omega network
Omega networks
- A multistage IN using 2×2 switch boxes and a perfect shuffle interconnect pattern between the stages.
- In the Omega MIN there is one unique path from each input to each output.
- No redundant paths, hence no fault tolerance and the possibility of blocking.
Example: connect input 101 to output 001.
Use the bits of the destination address, 001, to select the path dynamically, one bit per stage.
Routing:
- 0 means use upper output
- 1 means use lower output
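The routing rule can be traced stage by stage. The sketch below assumes the common Omega arrangement in which a perfect shuffle precedes every stage and the destination bits are consumed most significant bit first; the function name and the tuple layout of the result are illustrative.

```python
def omega_route(src, dst, n_bits):
    """Trace destination-tag routing through an Omega network.

    Returns one (stage, switch index, output used) tuple per stage.
    Assumes N = 2**n_bits terminals and a shuffle before each stage.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle wiring in front of this stage.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # Destination bit for this stage, most significant bit first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        # 0 selects the upper output, 1 selects the lower output.
        pos = (pos & ~1) | bit
        path.append((stage, pos >> 1, "lower" if bit else "upper"))
    assert pos == dst  # the tag always steers the message to dst
    return path


if __name__ == "__main__":
    for hop in omega_route(0b101, 0b001, 3):
        print(hop)
```

For input 101 and output 001 the trace uses the upper output in the first two stages and the lower output in the last one, matching the destination bits 0, 0, 1.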
Omega networks
- log2(N) stages of 2×2 switches
- N/2 switches per stage
- S = (N/2) log2(N) switches in total
- Number of permutations an Omega network can realize: 2^S
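Plugging numbers into these formulas shows why the Omega network is blocking: the 2^S realizable permutations fall short of the N! possible ones. A small sketch (the function name and dictionary keys are illustrative; N is assumed to be a power of two):

```python
from math import factorial


def omega_counts(n):
    """Return stage and switch counts for an N-input Omega network."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1      # log2(N)
    per_stage = n // 2               # N/2 switches of size 2x2
    switches = stages * per_stage    # S = (N/2) log2(N)
    return {
        "stages": stages,
        "switches_per_stage": per_stage,
        "switches": switches,
        "realizable_permutations": 2 ** switches,  # two settings per switch
        "all_permutations": factorial(n),
    }


if __name__ == "__main__":
    print(omega_counts(8))
```

For N = 8 this gives 3 stages, 12 switches and 4096 realizable permutations, compared with 8! = 40,320 permutations in total.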
Baseline networks
- The network can be generated recursively: the first stage is N×N, the second (N/2)×(N/2), and so on.
- Networks are topologically equivalent if one network can be reproduced from the other simply by rearranging nodes at each stage.
Crossbar Network
- Each junction is a switching component connecting the row to the column.
- Only one connection can be active in each column.
Crossbar Network
- The major advantage of the crossbar switch is its potential for speed. In one clock, a connection can be made between source and destination.
- The diameter of the crossbar is one.
- A request is blocked only if the destination is in use.
- Because of its complexity, the cost of the crossbar switch can become the dominant factor for a large multiprocessor system.
- Crossbars can be used to implement the a×b switches used in MINs. In this case each crossbar is small, so costs are kept down.
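The blocking behaviour of a crossbar can be modelled in a few lines. This is a toy sketch, assuming rows are sources, columns are destinations, and the only constraint is one active connection per column; the `Crossbar` class and its methods are hypothetical names for illustration.

```python
class Crossbar:
    """Toy model of an n-by-n crossbar with one connection per column."""

    def __init__(self, n):
        self.n = n
        # column_owner[j] is the row currently connected to column j, or None.
        self.column_owner = [None] * n

    def connect(self, row, col):
        """Close the crosspoint (row, col); return False if col is busy."""
        if self.column_owner[col] is not None:
            return False  # destination in use, so the request is blocked
        self.column_owner[col] = row
        return True

    def release(self, col):
        """Open whatever crosspoint is closed in column col."""
        self.column_owner[col] = None


if __name__ == "__main__":
    xbar = Crossbar(4)
    print(xbar.connect(0, 2))  # True: column 2 was free
    print(xbar.connect(1, 3))  # True: a different column never conflicts
    print(xbar.connect(3, 2))  # False: column 2 is already in use
```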
Problem
A) Use two-input AND and OR gates to construct an N×N crossbar switch network between N processors and N memory modules. Use the signal c_ij as the enable signal for the switch in the i-th row and j-th column. Let the width of each crosspoint be w bits.
B) Estimate the total number of AND and OR gates needed as a function of N and w.
Problem (cont.)
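[Figure: crossbar with processors P1, P2, ..., Pn on the rows and memory modules M1, M2, ..., Mn on the columns; the crosspoint in row i and column j is enabled by the signal Cij (C11 ... Cnn)]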
Problem (cont.)
[Figure: the same crossbar (P1 ... Pn, M1 ... Mn, enable signals C11 ... Cnn) with a detail view of one crosspoint, connecting P1 and M1 under control of C11]
Problem (cont.)
[Figure: the P1 address and the P2 address each feed a decoder; the decoders generate the enable signals C11, C12 and C21, C22]
Performance Comparison
Network    Latency     Switching complexity   Wiring complexity   Blocking
Bus        O(1)        [missing]              O(w)                yes
MIN        [missing]   [missing]              [missing]           yes
Crossbar   O(1)        [missing]              [missing]           no
References
1. H. El-Rewini and M. Abd-El-Barr, Advanced Computer Architecture and Parallel Processing, John Wiley and Sons, 2005.
2. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
3. A. Lines, "Nexus: an asynchronous crossbar interconnect for synchronous system-on-chip designs," Proc. of High Performance Interconnects, pp. 2-7, 2003.