CSIT560, by M. Hamdi: Packet Scheduling/Arbitration in Virtual Output Queues and Others


1 Packet Scheduling/Arbitration in Virtual Output Queues and Others

2 Key Characteristics in Designing Internet Switches and Routers
1. Scalability in terms of line rates
2. Scalability in terms of number of interfaces (port numbers)

3 Switch fabric chips comparison: http://www.lightreading.com/document.asp?doc_id=47959

4 Head-of-Line Blocking [figure; callout: "Blocked!"]


7 Crossbar Switches: Virtual Output Queues
Virtual Output Queues (VOQs):
–At each input port there are N queues, one for each output port
–Only one packet can leave an input port at a time
–Only one packet can be received by an output port at a time
VOQs retain the scalability of FIFO input-queued switches.
VOQs eliminate the HoL blocking problem of FIFO input queues.
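
The VOQ organisation described above can be sketched as a small data structure. This is only an illustration: the class name `VOQInput` and its methods are hypothetical and do not come from the slides.

```python
from collections import deque


class VOQInput:
    """One input port holding N virtual output queues (hypothetical helper)."""

    def __init__(self, num_outputs):
        # One FIFO per output port.
        self.voq = [deque() for _ in range(num_outputs)]

    def enqueue(self, cell, output):
        # Cells are sorted by destination on arrival, so a cell for a busy
        # output never sits in front of a cell for an idle output.
        self.voq[output].append(cell)

    def requests(self):
        # Outputs for which this input currently has at least one queued cell.
        return [j for j, q in enumerate(self.voq) if q]

    def dequeue(self, output):
        # The scheduler calls this at most once per time slot for this input.
        return self.voq[output].popleft()


port = VOQInput(num_outputs=4)
port.enqueue("cell-a", output=2)
port.enqueue("cell-b", output=0)
print(port.requests())
```

With a single FIFO, "cell-b" would wait behind "cell-a" even when output 0 is idle; with VOQs the scheduler sees both requests.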

8 Virtual Output Queues [figure]

9 VOQs: How Packets Move [figure: VOQs and scheduler]

10 Crossbar Scheduler in VOQ Architecture [figure: scheduler; memory b/w = 2R] The scheduler can be quite complex!

11 Question: do more lanes help? Answer: it depends on the scheduling. [figure panels: Head of Line Blocking; VOQs with Bad Scheduling; Good Scheduling?] Ayalon: depends on traffic matrix…

12 Crossbar Scheduler in VOQ Architecture: the scheduler decides which packets can be sent during each configuration of the crossbar.

13 Switch core architecture [figure: Port #1 to Port #256, each with a port processor, optics and the LCS protocol, attached to the crossbar and the scheduler; signals: Request, Grant/Credit, Cell Data]

14 Basic Switch Model [figure: arrivals A_1(n) … A_N(n), split into per-VOQ arrivals A_11(n) … A_NN(n); queue occupancies L_11(n) … L_NN(n); crossbar configuration S(n); departures D_1(n) … D_N(n)]

15 Some definitions
3. Queue occupancies: L_11(n), …, L_NN(n)

16 Some possible performance goals (when traffic is admissible)

17 VOQ Switch Scheduling [figure: bipartite graph with inputs A to F and outputs 1 to 6]
VOQ switch scheduling can be represented by a bipartite graph:
–The left-hand side nodes are the input ports
–The right-hand side nodes are the output ports
–The edges are requests for packet transmission between input ports and output ports

18 Maximum size bipartite match
Intuition: maximizes instantaneous throughput.
[figure: "Request" graph with an edge for every non-empty VOQ, e.g. L_11(n) > 0, L_N1(n) > 0; bipartite match; maximum size match]

19 Network flows and bipartite matching
Finding a maximum size bipartite matching is equivalent to solving a network flow problem with capacities and flows of size 1.
[figure: source s connected to inputs A to F, outputs 1 to 6 connected to sink t]

20 Network Flows [figure: example graph with source s, sink t, intermediate nodes a, b, c, d and edge capacities of 10 and 1]
Let G = [V, E] be a directed graph with capacity cap(v,w) on edge [v,w]. A flow is an (integer) function f, chosen for each edge so that f(v,w) <= cap(v,w) and so that flow is conserved at every node other than s and t. We wish to maximize the flow from s to t.

21 A maximum network flow example [figure: edges labelled "capacity, flow"]
By inspection, Step 1: flow is of size 10.

22 A maximum network flow example [figure: edges labelled "capacity, flow"]
Step 2: flow is of size 10 + 1 = 11.
Maximum flow: flow is of size 10 + 2 = 12. Not obvious.

23 Ford-Fulkerson method of augmenting paths
1. Set f(v,w) = -f(w,v) on all edges.
2. Define a residual graph R, in which res(v,w) = cap(v,w) - f(v,w).
3. Find a path from s to t along which every edge has positive residue.
4. Augment the flow along that path by the minimum residue on the path.
5. Keep augmenting until there are no more augmenting paths.

24 Example of Residual Graph [figure: flow of size 10; residual graph R with res(v,w) = cap(v,w) - f(v,w); an augmenting path is marked]

26 Example of Residual Graph [figure: Step 2, flow of size 10 + 1 = 11; residual graph; an augmenting path is marked]

27 Example of Residual Graph [figure: Step 3, flow of size 10 + 2 = 12; residual graph]

28 Another Example: Ford-Fulkerson method [figure: graph G and residual graph G_f on nodes s, a, b, c, d, t, with edge capacities 16, 13, 10, 4, 9, 7, 12, 20, 4, 11]
Start with f = 0. Find an augmenting path p and push 4 units along it (edge labels become 4/13, 4/11, 4/4): f = 4.

29 Another Example: Ford-Fulkerson method
f = 4. Find an augmenting path p and push 12 units (12/16, 12/12, 12/20): f = 4 + 12 = 16.

30 Another Example: Ford-Fulkerson method
f = 16. Find an augmenting path p and push 7 units (11/13, 11/11, 7/7, 19/20): f = 16 + 7 = 23.

31 Another Example: Ford-Fulkerson method
f = 23. No more augmenting paths. Maximum flow is 23.
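
The augmenting-path procedure can be sketched in a few lines of Python. This is a minimal sketch, not the slides' own code: it picks the shortest augmenting path with a BFS (the Edmonds-Karp variant), and the edge list below is an assumption reconstructed from the flow labels in the example (4/13, 4/11, 4/4, 12/16, 12/12, 12/20, 7/7). The three edges with capacities 10, 4 and 9 are left out because their endpoints cannot be recovered from the transcript; they carry no flow in the final solution.

```python
from collections import deque


def max_flow(cap, s, t):
    """Ford-Fulkerson with BFS path selection (shortest augmenting path first)."""
    # Residual capacities; every edge also gets a reverse edge with residue 0.
    res = {}
    for (v, w), c in cap.items():
        res.setdefault(v, {})
        res.setdefault(w, {})
        res[v][w] = res[v].get(w, 0) + c
        res[w].setdefault(v, 0)
    total = 0
    while True:
        # BFS for a path from s to t with positive residue on every edge.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            v = queue.popleft()
            for w, r in res[v].items():
                if r > 0 and w not in parent:
                    parent[w] = v
                    queue.append(w)
        if t not in parent:
            return total  # no augmenting path left
        # Walk back from t to s to recover the path.
        path, w = [], t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        # Augment by the minimum residue along the path.
        bottleneck = min(res[v][w] for v, w in path)
        for v, w in path:
            res[v][w] -= bottleneck
            res[w][v] += bottleneck
        total += bottleneck


# Assumed edge directions, inferred from the flow labels in the example.
example_cap = {
    ("s", "a"): 16, ("s", "c"): 13,
    ("a", "b"): 12, ("c", "d"): 11,
    ("d", "b"): 7, ("d", "t"): 4, ("b", "t"): 20,
}
print(max_flow(example_cap, "s", "t"))
```

On this reconstructed graph the edges a→b (12) and c→d (11) form a cut of capacity 23, which matches the maximum flow reported in the example.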

32 An example for Flow: Obvious solution [figure: input graph G from S to T with edge capacities 10 and 9; residual graph G_r; flow graph G_f]
Total flow = 10. Sub-optimal solution!

33 Flow algorithm – Optimal version [figure: input graph G, residual graph G_r and flow graph G_f after each augmentation]
Total flow = 10 + 9 = 19 units!

34 Complexity of network flow problems
In general, it is possible to find a solution by considering at most V·E augmenting paths, by picking the shortest augmenting path first.
There are many variations, such as picking the path that augments the flow the most first.
The complexity of the algorithm is lower when the graph is bipartite.
There are techniques other than the Ford-Fulkerson method.

35 Ford-Fulkerson Algorithm – 1 [figure: source, inputs a to f, outputs 1 to 6, sink]
Network flows and bipartite matching: finding a maximum size bipartite matching is equivalent to solving a network flow problem with capacities and flows of size 1.

36 to 40 Ford-Fulkerson Algorithm – 2 to 6 [figures: same graph] Each step increases the flow by 1.

41 Ford-Fulkerson Algorithm – 7 [figure] Augmenting flow along the augmenting path.

42 Ford-Fulkerson Algorithm – 8 [figure] Maximum flow found! Thus maximum matching found.
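
With unit capacities, each augmentation either matches a free output or re-routes an already matched input. A minimal sketch of that idea follows (the standard augmenting-path matching routine, not code from the slides); the request lists are a hypothetical 3x3 example.

```python
def maximum_matching(requests, num_outputs):
    """requests[i] lists the outputs requested by input i."""
    match_of_output = [None] * num_outputs  # output -> matched input

    def try_augment(i, visited):
        for j in requests[i]:
            if j in visited:
                continue
            visited.add(j)
            # Output j is free, or its current input can be moved elsewhere.
            if match_of_output[j] is None or try_augment(match_of_output[j], visited):
                match_of_output[j] = i
                return True
        return False

    size = 0
    for i in range(len(requests)):
        if try_augment(i, set()):
            size += 1
    return size, match_of_output


# Hypothetical request graph: input 0 -> {0, 1}, input 1 -> {0}, input 2 -> {1, 2}.
size, match_of_output = maximum_matching([[0, 1], [0], [1, 2]], 3)
print(size, match_of_output)
```

Input 0 is first matched to output 0 and later moved to output 1 so that input 1 can use output 0; this re-routing is exactly what a maximal (non-augmenting) matching never does.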

43 Complexity of Maximum Matchings
Maximum Size/Cardinality Matchings:
–Algorithm by Dinic: O(N^(5/2))
Maximum Weight Matchings:
–Algorithm by Kuhn: O(N^3 log N)
ftp://dimacs.rutgers.edu/pub/netflow/matching/ (contains code for maximum size/weight matching algorithms)
In general:
–Hard to implement in hardware
–Slooooow.

44 Maximum size bipartite match
Intuition: maximizes instantaneous throughput for uniform traffic.
[figure: "Request" graph with an edge for every non-empty VOQ, e.g. L_11(n) > 0, L_N1(n) > 0; bipartite match; maximum size match]

45 Why doesn’t maximizing instantaneous throughput give 100% throughput for non-uniform traffic? [figure: three possible matches S(n)]

46 Maximum weight matching [figure: switch model with arrivals A_ij(n), occupancies L_ij(n) and departures D_j(n); "Request" graph with edge weights L_11(n) … L_N1(n); bipartite match S*(n) = maximum weight match]
Weight could be the length of the queue or the age of the packet.
Achieves 100% throughput under all admissible traffic patterns.

47 Packet Scheduling/Arbitration in Virtual Output Queues: Maximal Matching Algorithms

48 Maximum Matching in VOQ Architecture [figure: a 4x4 request graph with its maximum size matching, and a weighted 4x4 request graph with its maximum weight matching]

49 Complexity of Maximum Matchings
Maximum Size/Cardinality Matchings:
–Algorithm by Dinic: O(N^(5/2))
Maximum Weight Matchings:
–Algorithm by Kuhn: O(N^3 log N)
In general:
–Hard to implement in hardware
–Slooooow.

50 Maximal Matching
A maximal matching is a matching built by adding edges one at a time, where an edge is never removed once it has been added; i.e., no augmenting paths are allowed (they remove edges added earlier), much like matching by inspection.
No input or output is left unnecessarily idle: an unmatched input never holds a request for an unmatched output.

51 Example of Maximal Size Matching [figure: inputs A to F, outputs 1 to 6; a maximal matching next to a maximum matching]

52 Comments on Maximal Matchings
In general, maximal matching is much simpler to implement and has a much faster running time.
A maximal size matching is at least half the size of a maximum size matching.
A maximal weight matching is defined in the obvious way.
A maximal weight matching has at least half the weight of a maximum weight matching.
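
A small sketch of the difference between maximal and maximum size matchings. The edge list is a hypothetical three-edge request graph chosen so that the greedy order matters; the brute-force maximum is only meant for tiny graphs like this one.

```python
from itertools import combinations


def maximal_matching(edges):
    """Add edges in the given order; never remove one once it is added."""
    used_inputs, used_outputs, matching = set(), set(), []
    for i, j in edges:
        if i not in used_inputs and j not in used_outputs:
            matching.append((i, j))
            used_inputs.add(i)
            used_outputs.add(j)
    return matching


def maximum_matching_size(edges):
    """Brute force over edge subsets, largest first (tiny graphs only)."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            ins = {i for i, _ in subset}
            outs = {j for _, j in subset}
            if len(ins) == k and len(outs) == k:
                return k
    return 0


edges = [("A", 1), ("A", 2), ("B", 1)]
greedy = maximal_matching(edges)
best = maximum_matching_size(edges)
print(greedy, best)
```

Here the greedy pass takes (A, 1) first and is then stuck with one edge, while the maximum matching {(A, 2), (B, 1)} has two: the factor-of-two bound from the slide is met with equality.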

53 PIM Maximal Size Matching Algorithm: Performance and Properties
PIM is among the very first practical schedulers proposed for VOQ architectures (used by DEC).
It is based on having arbiters at the inputs and outputs.
It iterates the following steps until no more requests can be accepted (or for a given number of iterations):
1. Request: Each unmatched input sends a request to every output for which it has a queued cell.
2. Grant (outputs): If an unmatched output receives any requests, it grants one, chosen uniformly at random among them.
3. Accept (inputs): If an unmatched input receives any grants, it accepts one, chosen at random among the outputs that granted it.
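
The three steps can be sketched as follows. This is an illustrative software model, not a hardware design: the function name `pim`, the `requests` layout and the example request lists are assumptions, and the random choices use Python's `random` module.

```python
import random


def pim(requests, n, iterations, rng=random):
    """requests[i] lists the outputs for which input i has queued cells."""
    match_in = {}   # input -> output
    match_out = {}  # output -> input
    for _ in range(iterations):
        # Step 1, Request: unmatched inputs request the unmatched outputs
        # for which they hold cells.
        reqs = {j: [] for j in range(n) if j not in match_out}
        for i in range(n):
            if i in match_in:
                continue
            for j in requests[i]:
                if j in reqs:
                    reqs[j].append(i)
        # Step 2, Grant: each output with requests grants one at random.
        grants = {}  # input -> outputs that granted it
        for j, inputs in reqs.items():
            if inputs:
                grants.setdefault(rng.choice(inputs), []).append(j)
        if not grants:
            break  # nothing left to match
        # Step 3, Accept: each granted input accepts one grant at random.
        for i, outputs in grants.items():
            j = rng.choice(outputs)
            match_in[i] = j
            match_out[j] = i
    return match_in


example_requests = [[0, 1], [0, 2], [0], [3]]  # hypothetical 4x4 request pattern
result = pim(example_requests, n=4, iterations=4, rng=random.Random(0))
print(result)
```

Every iteration that still has a request between an unmatched input and an unmatched output adds at least one match, so N iterations always suffice to reach a maximal matching; the exact matching depends on the random choices.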

54 Implementation of the parallel maximal matching algorithms [figure: state of input queues (N^2 bits); grant arbiters 1 … N; request arbiters 1 … N; decision register]

55 Implementation of the parallel maximal matching algorithms (another similar way) [figure]

56 PIM Maximal Size Matching Algorithm: 1st iteration [figure: 4x4 example; Step 1: Request; Step 2: Grant (random selection); Step 3: Accept]

57 PIM Maximal Size Matching Algorithm: 2nd iteration [figure: 4x4 example; Step 1: Request; Step 2: Grant; Step 3: Accept]

58 Traffic Types to evaluate Algorithms
Uniform traffic
Unbalanced traffic
Hotspot traffic

59 Parallel Iterative Matching: PIM with a single iteration [plot]

60 Parallel Iterative Matching: PIM with 4 iterations [plot]

61 Parallel Iterative Matching: Analytical Results. Number of iterations to converge:

62 PIM Maximal Size Matching Algorithm: Performance and Properties
It is a fair algorithm in how it services inputs.
It can achieve 100% throughput under uniform traffic.
It converges to a maximal size matching in O(log N) iterations on average.
It performs very poorly with 1 iteration (about 63% throughput), because several outputs can grant the same input and that input can accept only one of them.
It is not easy to build random arbiters in hardware.
The best iterative maximal size matching algorithm takes O(N^2 log N) serial or O(log N) parallel time steps.
If the number of iterations is constant, it can be implemented in constant time (which is why it is practical); however, the hardware design is not trivial.
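
The 63% figure for a single iteration can be recovered with a short calculation. The sketch below assumes saturated uniform traffic (every input has cells for every output), which is the usual setting for this number but is not stated on the slide.

```latex
% Every output grants one of the N requesting inputs uniformly at random,
% independently of the other outputs. An input is matched iff it receives
% at least one grant.
\Pr[\text{input } i \text{ receives no grant}] = \left(1 - \frac{1}{N}\right)^{N}
\quad\Longrightarrow\quad
\text{throughput} = 1 - \left(1 - \frac{1}{N}\right)^{N}
\;\xrightarrow[N \to \infty]{}\; 1 - \frac{1}{e} \approx 0.632
```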

63 RRM Maximal Size Matching Algorithm: Performance and Properties
Round Robin Matching (RRM) is easier to implement than PIM (in terms of designing the I/O arbiters).
The pointers of the arbiters move in a straightforward way.
It iterates the following steps until no more requests can be accepted (or for a given number of iterations):
Request. Each input sends a request to every output for which it has a queued cell.
Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.

64 RRM Maximal Size Matching Algorithm: Performance and Properties
Accept. If an input receives a grant, it accepts the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The pointer a_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the accepted output. If no grant is received, the pointer stays unchanged.

65 RRM Maximal Matching Algorithm (1) [figure: 4x4 example, ports 0 to 3] Step 1: Request

66 to 69 RRM Maximal Matching Algorithm (2) [figures: 4x4 example with round-robin grant pointers] Step 2: Grant

70 to 72 RRM Maximal Matching Algorithm (3) [figures: 4x4 example with round-robin grant and accept pointers] Step 3: Accept

73 Poor performance of RRM Maximal Matching Algorithm [figure: 2x2 switch; pointer sequences 0101… and 0101…] 50% Throughput

74 iSLIP Maximal Size Matching Algorithm: Performance and Properties
It is a scheduler used in most VOQ switches (e.g., Cisco).
It is exactly like the RRM algorithm, with the following change:
Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input if and only if the grant is accepted in the Accept phase.
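
The effect of this one change can be checked with a small simulation. The sketch below is a simplified model under stated assumptions: a single iteration per time slot, every VOQ permanently non-empty (saturated traffic), and all pointers starting at 0. The function name `simulate` and its flag are hypothetical.

```python
def simulate(n, slots, update_only_if_accepted):
    """Single-iteration round-robin matching on a saturated n x n switch.

    update_only_if_accepted=False models the RRM grant-pointer rule,
    True models the iSLIP rule.
    """
    grant_ptr = [0] * n   # g_j, one per output
    accept_ptr = [0] * n  # a_i, one per input
    matched = 0
    for _ in range(slots):
        # Request: every input requests every output (saturated).
        # Grant: each output therefore grants the input at its pointer.
        grants = {}  # input -> outputs that granted it
        for j in range(n):
            grants.setdefault(grant_ptr[j], []).append(j)
        # Accept: each input takes the first granting output at or after a_i.
        accepted = {}  # output -> input
        for i, outputs in grants.items():
            j = min(outputs, key=lambda o: (o - accept_ptr[i]) % n)
            accepted[j] = i
            accept_ptr[i] = (j + 1) % n
        # Grant pointer update: always (RRM) or only on acceptance (iSLIP).
        for j in range(n):
            if j in accepted or not update_only_if_accepted:
                grant_ptr[j] = (grant_ptr[j] + 1) % n
        matched += len(accepted)
    return matched / (n * slots)


rrm_throughput = simulate(2, 100, update_only_if_accepted=False)
islip_throughput = simulate(2, 100, update_only_if_accepted=True)
print(rrm_throughput, islip_throughput)
```

In the 2x2 case the RRM grant pointers stay in lock-step and only one cell is switched per slot, while the iSLIP pointers separate after the first slot and both outputs stay busy from then on.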

75 iSLIP Maximal Size Matching Algorithm: 1st iteration [figure: 4x4 example; Step 1: Request; Step 2: Grant; Step 3: Accept; legend: original pointer, selected one, updated pointer]

76 iSLIP Maximal Size Matching Algorithm: 2nd iteration [figure: 4x4 example; Step 1: Request; Step 2: Grant; Step 3: Accept; pointers: no change; legend: original pointer, selected one, updated pointer]

77 Simple Iterative Algorithms: iSlip [figure: 4x4 example, ports 0 to 3] Step 1: Request

78 to 79 Simple Iterative Algorithms: iSlip [figures: 4x4 example with grant pointers] Step 2: Grant

80 to 84 Simple Iterative Algorithms: iSlip [figures: 4x4 example with grant and accept pointers] Step 3: Accept

85 iSLIP Implementation [figure: Grant arbiters 1 … N and Accept arbiters 1 … N built from programmable priority encoders; labels: State, N, Decision, log2 N]

86 Hardware Design: layout of the 256-bit priority encoder [figure]

87 Hardware Design: layout of the 256-bit grant arbiter [figure]

88 FIRM Maximal Size Matching Algorithm: Performance and Properties
It is exactly like iSLIP, with a very small yet significant modification:
Grant (outputs): If an unmatched output receives a request, it grants the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request is granted. If the grant is accepted, the pointer to the highest priority element of the round-robin schedule is incremented to one location beyond the granted input. If the input does not accept, the pointer is set to the granted input.

89 Simple Iterative Algorithms: FIRM [figure: 4x4 example, ports 0 to 3] Step 3: Accept

90 Pointer Synchronization
Why this is good: this small change prevents the output arbiters from moving in lock-step (being synchronized, i.e. pointing to the same input), which leads to a dramatic improvement in performance.
If several outputs grant the same input, then however that input chooses, only one match can be made and the other outputs will be idle.
To get as many matches as possible, it is better that each output grants a different input.
Since each output selects its highest priority input whenever that input sends a request, it is better to keep the output pointers desynchronized (pointing to different locations).

91 iSLIP Maximal Matching Algorithm [figure: 2x2 switch; pointer sequences 0101… and 0010…] 100% Throughput

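92 Pointer Synchronization: Differences between RRM, iSlip & FIRM

93 Differences between RRM, iSlip & FIRM
Arbiter | Event              | RRM                                            | iSlip                                 | FIRM
Input   | No grant           | unchanged                                      | unchanged                             | unchanged
Input   | Granted            | one location beyond the accepted one           | one location beyond the accepted one  | one location beyond the accepted one
Output  | No request         | unchanged                                      | unchanged                             | unchanged
Output  | Grant accepted     | one location beyond the granted one            | one location beyond the granted one   | one location beyond the granted one
Output  | Grant not accepted | one location beyond the previously granted one | unchanged                             | the granted one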
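
The output-pointer rows of this comparison fit in one small function. The name `next_grant_pointer` and its argument layout are hypothetical; the rules themselves are taken from the comparison above.

```python
def next_grant_pointer(policy, pointer, granted, accepted, n):
    """New grant pointer of one output after a time slot.

    policy   -- "RRM", "iSLIP" or "FIRM"
    pointer  -- current pointer value
    granted  -- input that was granted, or None if no request arrived
    accepted -- True if the granted input accepted this output
    n        -- number of ports
    """
    if granted is None:
        return pointer            # no request: unchanged in all three
    if accepted or policy == "RRM":
        return (granted + 1) % n  # one location beyond the granted input
    if policy == "iSLIP":
        return pointer            # grant not accepted: unchanged
    if policy == "FIRM":
        return granted            # grant not accepted: stay on the granted input
    raise ValueError(policy)


# Output with pointer 0 grants input 2 on a 4-port switch; the grant is refused.
for policy in ("RRM", "iSLIP", "FIRM"):
    print(policy, next_grant_pointer(policy, 0, 2, False, 4))
```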

94 General remarks
Since all of these algorithms try to approximate maximum size matching, they can be unstable under non-uniform traffic.
They can achieve 100% throughput under uniform traffic.
With a large number of iterations, their performance is similar.
They have similar implementation complexity.

95 Input Queueing: Longest Queue First or Oldest Cell First [figure: weighted 4x4 request graph and its maximum weight match] Weight = {waiting time, queue length}; 100% throughput.

96 Input Queueing
Why is serving long/old queues better than serving the maximum number of queues?
When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.
When traffic is non-uniform, some queues become longer than others.
A good algorithm keeps the queue lengths matched, and services a large number of queues.
[figure: average occupancy per VOQ under uniform traffic and under non-uniform traffic]

97 Maximum/Maximal Weight Matching
100% throughput for admissible traffic (uniform or non-uniform)
Maximum Weight Matching:
–OCF (Oldest Cell First): w = cell waiting time
–LQF (Longest Queue First): w = input queue occupancy
–LPF (Longest Port First): w = QL of the source port + sum of QL from the source port to the destination port
Maximal Weight Matching (practical algorithms):
–iOCF
–iLQF
–iLPF (the comparators in the critical path of iLQF are removed)

98 Maximal Weight Matching Algorithms: iLQF
Request. Each unmatched input sends a request word to each output for which it has a queued cell, indicating the number of cells that it has queued to that output.
Grant. If an unmatched output receives any requests, it chooses the largest valued request. Ties are broken randomly.
Accept. If an unmatched input receives one or more grants, it accepts the one to which it made the largest valued request. Ties are broken randomly.
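
One Request/Grant/Accept round of iLQF can be sketched as below. Two things are assumptions of this sketch rather than part of the slides: ties are broken by lowest index instead of randomly (to keep the example deterministic), and the occupancy matrix `L` is a made-up 3x3 example.

```python
def ilqf_iteration(L, match_in, match_out):
    """One iLQF round; L[i][j] is the occupancy of the VOQ at input i for output j.

    match_in / match_out carry matches from earlier rounds and are updated
    in place. Returns False when no new grant was issued.
    """
    n = len(L)
    # Request + Grant: each unmatched output picks its largest request.
    grants = {}  # input -> outputs that granted it
    for j in range(n):
        if j in match_out:
            continue
        reqs = [i for i in range(n) if i not in match_in and L[i][j] > 0]
        if reqs:
            best = max(reqs, key=lambda i: L[i][j])  # first maximum = lowest index
            grants.setdefault(best, []).append(j)
    # Accept: each input accepts the grant for its longest queue.
    for i, outputs in grants.items():
        j = max(outputs, key=lambda o: L[i][o])
        match_in[i] = j
        match_out[j] = i
    return bool(grants)


L = [[5, 0, 2],
     [3, 4, 0],
     [0, 1, 0]]
match_in, match_out = {}, {}
while ilqf_iteration(L, match_in, match_out):
    pass
print(match_in)
```

The longest queue in the example, L[0][0] = 5, is matched in the first round, in line with the property that the longest input queue is always served.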

99 Maximal Weight Matching Algorithms: iLQF
The i-LQF algorithm has the following properties:
Property 1. Independent of the number of iterations, the longest input queue is always served.
Property 2. As with i-SLIP, the algorithm converges in at most logN iterations.
Property 3. For an inadmissible offered load, an input queue may be starved.

100 Maximal Weight Matching Algorithms: iOCF
The i-OCF algorithm works in similar fashion to iLQF, and has the following properties:
Property 1. Independent of the number of iterations, the cell that has been waiting the longest time in the input queues is always served (it must be at the head of its queue).
Property 2. As with i-LQF, the algorithm converges in at most logN iterations.
Property 3. No input queue can be starved indefinitely.
Property 4. It is difficult to keep time stamps on the cells.

101 iLQF – Implementation [figure]

102 iLPF – Implementation [figure] Complicated hardware

103 Other research efforts
Packet-based arbitration
Exhaustive-based arbitration
Numerous other efforts

104 Packet Scheduling/Arbitration in Virtual Output Queues: Randomized Algorithms and Others

105 Input-Queued Packet Switch [figure: inputs 1 to N and outputs 1 to N connected by a crossbar under a scheduler; input i holds X_{i,j} packets for output j, arriving at rate λ_{i,j}]
Admissible traffic: Σ_i λ_{i,j} < 1 and Σ_j λ_{i,j} < 1

106 Bipartite Graph and Matrix [figure: 3x3 request graph (inputs 1 to 3, outputs 1 to 3) and its 0/1 request matrix with rows 0 1 1, 1 1 1, 0 0 1]

107 Stability of Scheduling
Definition: Let X_{i,j}(t) be the number of packets queued at input i for output j at time-slot t. Then an algorithm is stable iff:

108 Motivation
Networking problems suffer from the “curse of dimensionality”:
–algorithmic solutions do not scale well
Typical causes:
–size: large number of users or large number of I/O
–time: very high speeds of operation
A good deterministic algorithm exists (Max Flow), but …
–it requires too large a data structure
–it needs state information, and the “state” is too big
–it “starts from scratch” in each iteration

109 Randomization
Randomized algorithms have frequently been used in situations where the state space is very large (e.g., the N! possible matchings between inputs and outputs).
Randomized algorithms:
–are a powerful way of approximating
–it is often possible to randomize deterministic algorithms
–this simplifies the implementation while retaining a (surprisingly) high level of performance
The main idea is:
–to simplify the decision-making process
–by basing decisions upon a small, randomly chosen sample of the state
–rather than upon the complete state

110 Randomizing Iterative Schemes (e.g., iSLIP)
Often, we want to perform some operation iteratively.
Example: find the heaviest matching in a switch in every time slot.
Since, in each time slot:
–at most one packet can arrive at each input
–and at most one packet can depart from each output
the size of the queues, or the “state” of the switch, doesn’t change by much between successive time slots; so a matching that was heavy at time t will quite likely continue to be heavy at time t+1.
This suggests that:
–knowing a heavy matching at time t should help in determining a heavy matching at time t+1
–there is no need to start from scratch in each time slot

111 Summarizing Randomized Algorithms
Randomized algorithms can help simplify the implementation:
–by reducing the amount of work in each iteration
If the state of the system doesn’t change by much between iterations, then:
–we can reduce the work even further by carrying information between iterations
The big pay-off is that, even though it is an approximation, the performance of a randomized scheme can be surprisingly good.

112 Randomized Scheduling Algorithms: Example
Consider a 3 x 3 input-queued switch:
–input traffic is Bernoulli IID with λij = α/3 for all i, j, and α < 1
–this is admissible
–note: there are a total of 6 (= 3!) possible service matrices

113 Random Scheduling Algorithms
In time slot n, let S(n) be one of the 6 possible matchings, chosen independently and uniformly at random (u.a.r.).
Stability of Random:
–Consider L11(n), the number of packets in VOQ11.
–Arrivals to VOQ11 occur according to A11(n), which is Bernoulli IID with input rate λ11 = α/3.
–This queue is served whenever the service matrix connects input 1 to output 1.
–There are 2 service matrices that connect input 1 to output 1.
–Since Random chooses service matrices u.a.r., input 1 is connected to output 1 for a fraction 2/6 = 1/3 of the time; this is the service rate between input 1 and output 1.
–E(L11(n)) < ∞ iff λ11 < 1/3, i.e. α < 1.
This random algorithm is stable for this traffic.

114 Random Scheduling Algorithms
Instability of Random:
Now suppose λii = α for all i and λij = 0 for i ≠ j.
–Clearly, this is admissible traffic for all α < 1.
–But under Random, the service rate at VOQ11 is 1/3 at best.
–Hence VOQ11, and the switch, will be unstable as soon as α > 1/3.
Stability (or 100% throughput) means the algorithm is stable under all admissible traffic!
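
The 2/6 = 1/3 service rate and the two stability verdicts can be checked by enumerating the 3! service matrices. This is a small sketch; the helper names and the choice α = 1/2 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

N = 3
matchings = list(permutations(range(N)))  # perm[i] = output connected to input i


def service_rate(i, j):
    """Fraction of time slots in which Random connects input i to output j."""
    hits = sum(1 for perm in matchings if perm[i] == j)
    return Fraction(hits, len(matchings))


def stable_under_random(rate_matrix):
    # Each VOQ needs an arrival rate strictly below its service rate.
    return all(rate_matrix[i][j] < service_rate(i, j)
               for i in range(N) for j in range(N))


alpha = Fraction(1, 2)  # any admissible load above 1/3 shows the effect
uniform = [[alpha / 3] * N for _ in range(N)]
diagonal = [[alpha if i == j else Fraction(0) for j in range(N)] for i in range(N)]

print(len(matchings), service_rate(0, 0))
print(stable_under_random(uniform), stable_under_random(diagonal))
```

Both traffic matrices are admissible (every row and column sums to α < 1), yet only the uniform one keeps every VOQ below its 1/3 service rate.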

115 Obvious Randomized Schemes
Choose a matching at random and use it as the schedule: doesn’t give 100% throughput (already shown).
Choose 2 matchings at random and use the heavier one as the schedule.
Choose N matchings at random and use the heaviest one as the schedule.
None of these can give 100% throughput!


117 Iterative Randomized Scheme (Tassiulas)
Say M is the matching used at time t.
Let R be a new matching chosen uniformly at random (u.a.r.) among the N! different matchings.
At time t+1, use the heavier of M and R.
Complexity is very low: O(1) iterations.
This gives 100% throughput! Note that the boost in throughput is due to memory (saving previous matchings).
But delays are very large.
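
One step of this scheme can be sketched as follows. The sketch assumes that the weight of a matching is the total occupancy of the VOQs it serves and that a matching is stored as a permutation list; the names `weight` and `tassiulas_step` and the example matrix are hypothetical.

```python
import random


def weight(matching, L):
    """Total occupancy of the VOQs served; matching[i] is the output of input i."""
    return sum(L[i][j] for i, j in enumerate(matching))


def tassiulas_step(M, L, rng=random):
    """Return the schedule for time t+1 given the schedule M used at time t."""
    R = list(range(len(M)))
    rng.shuffle(R)  # a uniform random permutation is a uniform random matching
    # Keep the heavier of the remembered matching and the fresh sample.
    return R if weight(R, L) > weight(M, L) else M


L = [[4, 0, 1],
     [0, 2, 6],
     [3, 5, 0]]
M = [0, 1, 2]  # previous schedule, weight 4 + 2 + 0 = 6
new_M = tassiulas_step(M, L, rng=random.Random(1))
print(new_M, weight(new_M, L))
```

Because M is carried from slot to slot, the schedule never gets lighter under a fixed set of weights; which matching is returned depends on the random sample.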


119 Finer Observations
Let M be the schedule used at time t.
Choose a “good” random matching R.
M’ = Merge(M, R): M’ includes the best edges from M and R.
Use M’ as the schedule at time t+1.
The above procedure yields an algorithm called LAURA.
There are many other small variations to this algorithm.

120 Merging Procedure [figure: matchings X (W(X) = 12) and R (W(R) = 10) are merged into M (W(M) = 13); weight differences computed during merging: 3-1+2-2 = 2 and 2-1+2-4 = -1]
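
A merge of two matchings can be sketched as below, under the assumption that both are perfect matchings stored as permutations: their union splits into alternating cycles, and for each cycle the heavier side is kept. The function name `merge` and the 4x4 weights are hypothetical; the weights are chosen so that the two cycle differences are 3-1+2-2 = 2 and 2-1+2-4 = -1, the same differences that appear in the merging figure.

```python
def merge(X, R, L):
    """Merge two perfect matchings, keeping the heavier side of each cycle.

    X[i] and R[i] are the outputs matched to input i; L[i][j] is the edge weight.
    """
    n = len(X)
    r_inverse = [0] * n  # r_inverse[j] = input that R matches to output j
    for i, j in enumerate(R):
        r_inverse[j] = i
    merged = [None] * n
    for start in range(n):
        if merged[start] is not None:
            continue
        # Collect the inputs on the alternating cycle through `start`:
        # follow the X edge to an output, then the R edge back to an input.
        cycle, i = [], start
        while True:
            cycle.append(i)
            i = r_inverse[X[i]]
            if i == start:
                break
        wx = sum(L[i][X[i]] for i in cycle)
        wr = sum(L[i][R[i]] for i in cycle)
        source = X if wx >= wr else R
        for i in cycle:
            merged[i] = source[i]
    return merged


L = [[3, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 2, 1],
     [0, 0, 4, 2]]
X = [0, 1, 2, 3]
R = [1, 0, 3, 2]
M = merge(X, R, L)
print(M, sum(L[i][j] for i, j in enumerate(M)))
```

The first cycle keeps the X edges (difference +2) and the second keeps the R edges (difference -1), so the merged matching is at least as heavy as either input matching.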
