Presentation is loading. Please wait.

Presentation is loading. Please wait.

ECE669 L12: Interconnection Network Performance March 9, 2004 ECE 669 Parallel Computer Architecture Lecture 12 Interconnection Network Performance.

Similar presentations


Presentation on theme: "ECE669 L12: Interconnection Network Performance March 9, 2004 ECE 669 Parallel Computer Architecture Lecture 12 Interconnection Network Performance."— Presentation transcript:

1 ECE669 L12: Interconnection Network Performance March 9, 2004 ECE 669 Parallel Computer Architecture Lecture 12 Interconnection Network Performance

2 ECE669 L12: Interconnection Network Performance March 9, 2004 Performance Analysis of Interconnection Networks °Bandwidth °Latency Proportional to diameter °Latency with contention °Processor utilization °+ Physical constraints

3 ECE669 L12: Interconnection Network Performance March 9, 2004 Bandwidth & Latency Latency Average latency k d : Bandwidth per node - more complex - if all near neighbor messages -If average travel dist then? -Let Direct k nk  worst case n (k  1) 2  torus  1 dir channels ~n k 4  torus  2 dir channels ~n k 3  no end conns  2 dir channels 1 B Avg DIST  nk 3 Msg size  B

4 ECE669 L12: Interconnection Network Performance March 9, 2004 Analogy –If each student takes 8 years to graduate –And if a Prof. can support 10 students max at any time How many new students can Prof. take on in a year? Each year take on °Similarly: Network has channels # of flits it can sustain = # of msgs it can concurrently sustain= Each msg flit uses channels to dest So Nn N x nk 3  Nn B or BW per node  x  3 kB Nn B nk 3

5 ECE669 L12: Interconnection Network Performance March 9, 2004 Another way of getting BW is: Max # msgs in net at any time These take cycles to get delivered, during which time no new mgs can get in I.e. we can inject or # injected per node per cycle °Note: We have not considered contention thus far. °In practice, latency shoots up much before we achieve the theoretical due to contention. = kn 3  Nn B B msgs every kn 3 cycles = Nn B · 3 kn · 1 N = 3 Bk

6 ECE669 L12: Interconnection Network Performance March 9, 2004 What’s a good performance metric to analyze networks Possibilities: -Bandwidth -Latency Discuss - Bandwidth - good when latency not an issue. I.e., if we can overlap all communication with computation –Estimate of how much info we can transport -Latency - good when not much parallelism exists (critical sections, e.g.), or when communication cannot be overlapped with computation. -Effective processor utilization best metric - but always not easy to derive. L BW U  1 1  req rate * L (U)

7 ECE669 L12: Interconnection Network Performance March 9, 2004 Performance Analysis First, analysis based on latency alone -Taking contention into account, but ignoring technological constraints Such analysis is o.k., as long as we do not compare workloads with different traffic rates With no contention, With contention, Average contention delay cycles at each switch Latency T = (1+c) hops + M + B Latency T = hops + M + B Mem Delay M Network Hops B

8 ECE669 L12: Interconnection Network Performance March 9, 2004 Direct k-ary n-cube (Agarwal paper) c   B  1   k d  1  kdkd 2 1  1 n       hops  nk d r = Fraction of bandwidth consumed, = mBk d Distance traveled in a dim. in a unidirectional torus kdkd  k  1 2 m  traffic rate, [probability of request] k-ary, n-cube latency  T  1        nk d  M  B  B  1   k d  1  kdkd 2 1  1 n      

9 ECE669 L12: Interconnection Network Performance March 9, 2004 A better metric °Processor utilization U °Or, how network delay impacts overall performance °U = Fraction of time spent doing useful work °Ideally, Each instruction takes 1 cycle If probability of message (remote mem req) on each useful cycle is m and corresponding latency is T Each instruction takes 1+mT cycles or M P T cycles Net U= 1 1 + mTmT

10 ECE669 L12: Interconnection Network Performance March 9, 2004 Deriving U is not easy °Notice m = Probability of a message on a useful processor cycle m eff = m U = probability of msg on any cycle T = f (m eff ) = network delay as a function of m or # of useful processor cycles depends on T and m eff °Cyclic dependence! T BW m U T T1T1 m1m1 m2m2 m0m0 U 1 m 2 U 0 m 1 =U 0 m 0 T0T0 T0T0 T1T1 U= 1 1 + mTmT

11 ECE669 L12: Interconnection Network Performance March 9, 2004 Processor utilization °k-ary n-cube m = Probability of msg on useful cycle °Channel utilization °Latency °Processor utilization °Two equations [1] and [2] °Two unknowns: T, U °Solving, °Where °and [1] [2] U= 1 1 + mTmT   mBk d U U= 1 1 + mTmT T  1        nk d  M  B  B  1   k d  1  kdkd 2 1  1 n       T  T0T0 2  Bk d 4  1 2m  1 2 T 0  2  1 m       2  2B2B2 k d  1  n  1  T 0  nk d  M  B  unloaded network latency

12 ECE669 L12: Interconnection Network Performance March 9, 2004 How do technological constraints impact network design, performance? °Constraints Wire lengths limit maximum speed of clock # pins limit size of nodes Bisection width limits # of wires per channel °Software Can we compile to the architecture? Can users specify programs? Is it scalable? Technology Constraints

13 ECE669 L12: Interconnection Network Performance March 9, 2004 The performance equation becomes °Latency °Recall, °Now, °See paper for details. °Summary: Constraints favor small n. switch delay wire delay msg length channel width: function of bisection width cycle time T = [ (1 + c) hops + B] T  T switch  T wire n   1  c  hops  B W(n)       f(n) E.g  k n 2  1  N 1 2  1 n E.g  k  N 1 n


Download ppt "ECE669 L12: Interconnection Network Performance March 9, 2004 ECE 669 Parallel Computer Architecture Lecture 12 Interconnection Network Performance."

Similar presentations


Ads by Google