Adonet Spring School1 Scheduling algorithms for input-queued IP routers Emilio Leonardi in collaboration with: P. Giaccone, M. Ajmone Marsan, A Bianco,

Adonet Spring School1 Scheduling algorithms for input-queued IP routers Emilio Leonardi in collaboration with: P. Giaccone, M. Ajmone Marsan, A Bianco, M.Mellia, F.Neri Dipartimento di Elettronica Telecommunication Network Group Politecnico di Torino (Italy) Budapest, March 2006

Adonet Spring School2 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School3 Note The slides marked RWP are reproduced with permission of Prof.Nick McKeown from the Electrical Engineering and Computer Science Dept. of Stanford University (CA,USA)

Adonet Spring School4 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School5 “The Internet is a mesh of routers” core router access router enterprise router

Adonet Spring School6 “The Internet is a mesh of routers” Access router:  high number of ports at low speed (kbps/Mbps)  several access protocols (modem, ADSL, cable) Enterprise router:  medium number of ports at high speed (Mbps)  several services (IP classification, filtering) Core router:  moderate number of ports at very high speed (Mbps/Gbps)  very high throughput

Adonet Spring School7 Basic functions  Routing  computation of the output port of an incoming packet  uses the routing tables computed by the routing protocols  can be a complex procedure: very large routing tables dynamic variation of routes in the Internet

Adonet Spring School8 Basic functions  Switching  transfer of packets from input ports to output ports  solution of the contentions for output ports queueing – where to store scheduling –what to transfer

Adonet Spring School9 Faster and faster  Need for high performance routers  to accommodate the bandwidth demands for new users and new services  to support QoS  to reduce costs

Adonet Spring School10 Packet processing and link speed [Figure: packet processing power (Moore’s law, 2x / 18 months) compared with link speed, i.e. fiber capacity in Gbit/s for TDM and DWDM (2x / 7 months). Source: SPEC95Int & David Miller, Stanford. RWP]  The increase in electronic packet processing power cannot keep up with the increase in link speed

Adonet Spring School11 Memory access time [Figure: Moore’s law (2x / 18 months) compared with memory access time (1.1x / 18 months). RWP]

Adonet Spring School12 Moore’s law It’s hard to keep up with Moore’s law:  the bottleneck is memory speed Moore’s law is too slow:  routers need to improve faster than Moore’s law RWP

Adonet Spring School13 Router capacity exceeds Moore’s law Growth in capacity of commercial routers:  1992 ~ 2 Gb/s  1995 ~ 10 Gb/s  1998 ~ 40 Gb/s  2001 ~ 160 Gb/s  2003 ~ 640 Gb/s Average growth rate: 2.2x / 18 months RWP

Adonet Spring School14 Single packet processing  The time to process one packet is becoming shorter and shorter  worst case: 40-byte packets (ACKs) travelling over the Internet: 3.2 µs at 100 Mbps, 320 ns at 1 Gbps, 32 ns at 10 Gbps, 3.2 ns at 100 Gbps, 320 ps at 1 Tbps
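The per-packet time budget on this slide is simply the packet length in bits divided by the line rate. A minimal sketch of that arithmetic follows; the function name `transmission_time_s` is chosen here for illustration and does not come from the slides.

```python
def transmission_time_s(packet_bytes: int, rate_bps: float) -> float:
    """Time to transmit one packet at the given line rate, in seconds."""
    return packet_bytes * 8 / rate_bps


# Worst case from the slide: 40-byte packets at increasing line rates.
for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9),
                    ("100 Gbps", 100e9), ("1 Tbps", 1e12)]:
    print(f"{label}: {transmission_time_s(40, rate) * 1e9:.2f} ns")
```

Every lookup, classification and scheduling decision for a minimum-size packet has to fit within this time.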

Adonet Spring School15 Hardware architecture [Figure: physical structure and logical structure]

Adonet Spring School16 Hardware architecture Main elements  line cards  support input/output transmissions  store packets  adapt packets to the internal format of the switching fabric  support data link protocols  classify packets  schedule packets  support security  switching fabric  transfers packets from input ports to output ports

Adonet Spring School17 Hardware architecture Main elements  control processor/network processor  runs routing protocols  computes routing tables  manages the overall system  forwarding engines  compute the packet destination (lookup)  inspect packet headers  rewrite packet headers

Adonet Spring School18 Interconnections among main elements - I [Figure: line cards 1 to N, forwarding engines and the control processor, all attached to the switching fabric]

Adonet Spring School19 Interconnections among main elements - II [Figure: combined line card & forwarding engine units 1 to N and the control processor, all attached to the switching fabric]

Adonet Spring School20 Cell-based routers [Figure: packets enter ISMs 1 to N, cross the cell switch (fabric) as cells, and leave ORMs 1 to N as packets]  ISM: Input-Segmentation Module  ORM: Output-Reassembly Module  packet: variable-size data unit  cell: fixed-size data unit

Adonet Spring School21 Switching fabric  Our assumptions:  bufferless: to reduce internal hardware complexity  non-blocking: it is always possible to transfer in parallel, from input to output ports, any non-conflicting set of cells

Adonet Spring School22 Switching fabric  Examples:  crossbar  rearrangeable Clos network  Benes network  Batcher-Banyan network (self-routing)  Switching constraints  at most one cell for each input and for each output can be transferred [Figure: fabric connecting inputs to outputs]

Adonet Spring School23 Switching fabric  We do not discuss switching fabrics with internal buffers  e.g.: crossbars with buffer at each crosspoint

Adonet Spring School24 Generic switching architecture [Figure: inputs 1 to N with input queues, read at speed $S_{in}$ into the switching fabric; outputs 1 to N with output queues, written at speed $S_{out}$]

Adonet Spring School25 Speedup  The speedup determines the switch performance:  $S_{in}$ = reading speed from input queues  $S_{out}$ = writing speed to output queues  maximum speedup factor: $S = \max(S_{in}, S_{out})$

Adonet Spring School26 Performance comparison  The performance of different switching systems can be studied  with analytical models introducing simplifying assumptions, but obtaining general results  with simulation models obtaining more detailed results

Adonet Spring School27 Traffic description  $A_{ij}(n) = 1$ if a packet arrives at time $n$ at input $i$, with destination reachable through output $j$  $\lambda_{ij} = E[A_{ij}(n)]$  An arrival process is admissible if $\sum_i \lambda_{ij} < 1$ and $\sum_j \lambda_{ij} < 1$, that is, no input and no output are overloaded on average; note that OQ switches exhibit finite delays only for admissible traffic  traffic matrix: $\Lambda = [\lambda_{ij}]$
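The admissibility condition can be checked mechanically on a traffic matrix. The sketch below assumes the rates are given as a square list of lists, with `rates[i][j]` the average arrival rate from input i to output j, and it assumes strict inequalities; the function name `is_admissible` is illustrative only.

```python
def is_admissible(rates):
    """True if every row sum (input load) and column sum (output load) is < 1."""
    n = len(rates)
    rows_ok = all(sum(row) < 1 for row in rates)
    cols_ok = all(sum(rates[i][j] for i in range(n)) < 1 for j in range(n))
    return rows_ok and cols_ok


# Uniform traffic on a 4x4 switch with 80% load on each input.
uniform = [[0.2] * 4 for _ in range(4)]
print(is_admissible(uniform))

# Two inputs overload output 0 even though no input is overloaded.
hotspot = [[0.6, 0.1], [0.6, 0.1]]
print(is_admissible(hotspot))
```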

Adonet Spring School28 Traffic scenarios  Uniform traffic  Bernoulli i.i.d. arrivals  usual testbed in the literature  “easy to schedule”  Diagonal traffic  Bernoulli i.i.d. arrivals  critical to schedule, since only two matchings are good

Adonet Spring School29 Traffic scenarios  LogDiagonal traffic  Bernoulli i.i.d. arrivals  more critical than uniform, less than diagonal traffic

Adonet Spring School30 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School31 Output Queued (OQ) switches  $S_{in} = 1$, $S_{out} = N$  used for low bandwidth routers  no coordination among ports  work-conserving: best average delays  complete control of delays: support of QoS scheduling

Adonet Spring School32 Output Queued (OQ) switch [Figure: inputs 1 to N, switching fabric with speedup N, outputs 1 to N with output queues]

Adonet Spring School33 OQ performance [Figure: OQ performance under uniform traffic] Note: OQ is optimal from the point of view of average delay and throughput

Adonet Spring School34 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School35 Simple Input Queued (IQ) switches  $S_{in} = 1$, $S_{out} = 1$  1 FIFO queue for each input port  throughput limitations  due to head of the line (HOL) blocking  scheduling  to solve contentions for the same output [Figure: inputs with one FIFO queue each, switching fabric, outputs]

Adonet Spring School36 Head of the Line (HOL) Blocking RWP

Adonet Spring School37 Simple IQ switch performance [Figure: performance of OQ and simple IQ under uniform traffic]

Adonet Spring School38 Improving simple IQ switches  Window/bypass schedulers  the first w cells of each queue contend for outputs  HOL blocking is reduced, not eliminated  w = 1 means FIFO at each input  higher complexity: the scheduler deals with wN cells, and queues are non-FIFO

Adonet Spring School39 Improving IQ switches  Virtual output queueing (VOQ)  one queue for each input/output pair: $N$ queues at each input, $N^2$ queues in the whole switch  eliminates HOL blocking  used in high-bandwidth routers: scheduling implemented in hardware at very high speed

Adonet Spring School40 IQ switches with VOQ [Figure: inputs 1 to N, each with N virtual output queues, the scheduler, the switching fabric and outputs 1 to N; input constraints and output constraints are marked] Note: from now on, we always assume VOQ at the switch inputs

Adonet Spring School41 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School42 Scheduling in IQ switches  Scheduling can be modeled as a matching problem in a bipartite graph  the edge from node i to node j refers to packets at input i and directed to output j  the weight of the edge can be: binary (not empty/empty queue); queue length; HOL cell waiting time, or cell age; some other metric indicating the priority of the HOL cell to be served

Adonet Spring School43 Scheduling in IQ switches [Figure: the scheduler turns a request graph between inputs and outputs into a matching (or permutation)]

Adonet Spring School44 Scheduling in IQ switches [Figure: the scheduler turns a request matrix into a permutation]

Adonet Spring School45 Implementing schedulers  Scheduling is a complex task  a scheduling algorithm can be implemented in hardware if: it shows good performance for a wide range of traffic patterns it can be efficiently parallelized it can be efficiently pipelined it requires few iterations (or clock cycles) it requires limited control information

Adonet Spring School46 Scheduling uniform traffic  A number of algorithms give 100% throughput when traffic is uniform  For example: TDM and a few variants; iSLIP (see later) [Figure: example of TDM for a 4x4 switch. RWP]
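One simple way to build a TDM schedule is to rotate a cyclic shift, so that every input/output pair is served exactly once every N slots. The sketch below assumes this cyclic-shift variant (the slide only shows a figure), and the name `tdm_matching` is illustrative.

```python
def tdm_matching(t: int, n: int) -> list:
    """Matching used at slot t: input i is connected to output (i + t) mod n."""
    return [(i + t) % n for i in range(n)]


# A full frame for a 4x4 switch: four slots, four different permutations.
for slot in range(4):
    print(slot, tdm_matching(slot, 4))
```

Each VOQ receives a service rate of 1/N, which is enough for any admissible uniform load.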

Adonet Spring School47 Birkhoff - von Neumann theorem Any doubly stochastic matrix $\Lambda$ can be expressed as a convex combination of permutation matrices $\pi_n$: $\Lambda = \sum_n a_n \pi_n$ with $a_n \ge 0$ and $\sum_n a_n = 1$

Adonet Spring School48 Scheduling non-uniform traffic  Thanks to the Birkhoff - von Neumann theorem, if the traffic is known and admissible, 100% throughput can be achieved by a TDM using:  matching $M_1$ for a fraction of time $a_1$  matching $M_2$ for a fraction of time $a_2$  ...  matching $M_k$ for a fraction of time $a_k$
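A Birkhoff - von Neumann decomposition can be computed by repeatedly extracting a perfect matching from the positive entries of the matrix and subtracting its smallest entry. The following is a minimal sketch under stated assumptions: the input is an exactly doubly stochastic matrix given as `fractions.Fraction` values (to avoid rounding problems), and the perfect matching is found with a plain augmenting-path search. The names `perfect_matching` and `bvn_decompose` are illustrative and do not come from the slides.

```python
from fractions import Fraction


def perfect_matching(support):
    """Augmenting-path search on a boolean N x N matrix.

    Returns perm with perm[i] = output matched to input i, or None."""
    n = len(support)
    owner = [-1] * n  # owner[j] = input currently matched to output j

    def try_assign(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if owner[j] == -1 or try_assign(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_assign(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(owner):
        perm[i] = j
    return perm


def bvn_decompose(matrix):
    """Return a list of (a_k, perm_k) whose weighted sum rebuilds the matrix."""
    n = len(matrix)
    rest = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in rest for x in row):
        perm = perfect_matching([[x > 0 for x in row] for row in rest])
        if perm is None:
            raise ValueError("matrix is not doubly stochastic")
        a = min(rest[i][perm[i]] for i in range(n))
        for i in range(n):
            rest[i][perm[i]] -= a  # at least one entry drops to zero
        terms.append((a, perm))
    return terms
```

The resulting coefficients are the fractions of time for which each matching is used in the TDM frame. An admissible matrix that is not doubly stochastic would first have to be padded up to one, a step this sketch leaves out.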

Adonet Spring School49 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School50 Maximum Size Matching  Maximum Size Matching (MSM)  among all the possible matchings, selects the one with the highest number of edges; MSM is generally not unique  the best MSM algorithm requires $O(N^{2.5})$ iterations, and cannot be implemented efficiently, since it is based on a flow augmentation path algorithm
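To make the augmenting-path idea concrete, here is a minimal sketch of a maximum size matching on the request matrix. It uses the simple augmenting-path search, which takes O(N^3) time on a dense request matrix, not the faster O(N^2.5) algorithm the slide refers to. The name `maximum_size_matching` and the request-matrix encoding are assumptions made for this example.

```python
def maximum_size_matching(requests):
    """requests[i][j] is truthy if VOQ (i, j) is non-empty.

    Returns a dict {input: output} with the largest possible number of pairs."""
    n = len(requests)
    owner = [-1] * n  # owner[j] = input currently matched to output j

    def augment(i, seen):
        # Try to match input i, re-routing earlier choices if necessary.
        for j in range(n):
            if requests[i][j] and not seen[j]:
                seen[j] = True
                if owner[j] == -1 or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        augment(i, [False] * n)
    return {owner[j]: j for j in range(n) if owner[j] != -1}


# A greedy choice (input 0 -> output 0) would block input 1;
# the augmenting path moves input 0 to output 1 instead.
print(maximum_size_matching([[1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```

The recursive re-routing is inherently sequential, which is one reason the slide calls it hard to implement efficiently in hardware.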

Adonet Spring School51 Instability of MSM [Figure: 2x2 switch with inputs In1, In2 and outputs Out1, Out2]  Assume:  $P(\text{arrival at } Q_{12}) = \lambda$  $P(\text{arrival at } Q_{11}) = P(\text{arrival at } Q_{22}) = 1 - \lambda - \varepsilon$  $Q_{12} = B \gg 0$, $Q_{11} = Q_{22} = 0$  in case of a tie, serve $Q_{11}$ and/or $Q_{22}$ instead of $Q_{12}$  Observe:  $Q_{12}$ is served only when $A_{11} = 0$ and $A_{22} = 0$, i.e. with probability $P(\text{serve } Q_{12}) = P(\text{no arrivals at both } Q_{11} \text{ and } Q_{22}) = [1 - (1 - \lambda - \varepsilon)]^2 = (\lambda + \varepsilon)^2$  $P(\text{serve } Q_{12}) < P(\text{arrival at } Q_{12})$ if $\varepsilon$ is small enough  Example: $\lambda = 0.5$; $\varepsilon = 0.1$; $P(\text{serve } Q_{12}) = 0.36$  Note: this proof is due to I. Keslassy, Stanford Univ.
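The instability argument reduces to comparing two numbers: the arrival rate at Q12 and the probability that Q12 gets served. The sketch below reproduces the numeric example from the slide; the parameter names `lam` (arrival probability at Q12) and `eps` (the small slack term) are chosen here for illustration.

```python
def msm_service_probability(lam: float, eps: float) -> float:
    """Probability that Q12 is served: no arrival at Q11 and none at Q22."""
    p_arrival_side = 1 - lam - eps  # arrival probability at Q11 and at Q22
    return (1 - p_arrival_side) ** 2


def q12_is_unstable(lam: float, eps: float) -> bool:
    """Q12 grows without bound if it is served less often than cells arrive."""
    return msm_service_probability(lam, eps) < lam


print(msm_service_probability(0.5, 0.1))  # service probability in the example
print(q12_is_unstable(0.5, 0.1))
```

In the example the service probability of 0.36 is below the arrival rate of 0.5, so Q12 keeps growing even though the traffic is admissible.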

Adonet Spring School52 Maximum Size Matching  MSM maximizes the instantaneous throughput  MSM may not yield 100% throughput  short term decisions can be inefficient in the long term  non-binary edge weights allow MWM to maximize the long-term throughput

Adonet Spring School53 Maximum Weight Matching  Maximum Weight Matching (MWM)  among all the possible $N!$ matchings, selects the one with the highest weight (sum of the edge metrics); MWM is generally not unique  MWM is too complex to be implemented in hardware at high speed: the best MWM algorithm requires $O(N^3)$ iterations, and cannot be implemented efficiently, since it is based on a flow augmentation path algorithm; it cannot be parallelized and pipelined efficiently  MWM has never been implemented in a commercial chipset
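The definition of MWM can be turned into code directly by enumerating all N! matchings and keeping the heaviest. This brute-force sketch is only meant to illustrate the definition for very small N; it is not the O(N^3) algorithm mentioned on the slide, and the name `maximum_weight_matching` is illustrative.

```python
from itertools import permutations


def maximum_weight_matching(weights):
    """weights[i][j] is the edge metric of VOQ (i, j), e.g. its queue length.

    Returns (perm, total) where perm[i] is the output matched to input i."""
    n = len(weights)

    def total(perm):
        return sum(weights[i][perm[i]] for i in range(n))

    best = max(permutations(range(n)), key=total)
    return list(best), total(best)


# Serving the single longest queue (3) is not optimal here:
# the other permutation has weight 2 + 2 = 4.
print(maximum_weight_matching([[3, 2], [2, 0]]))
```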

Adonet Spring School54 Maximum Weight Matching  In case of unknown traffic, MWM is the optimal solution of the scheduling problem when the weight is either the queue length or the cell age  achieves 100% throughput under any traffic also under non-Bernoulli arrival processes, satisfying the law of large numbers  achieves low average delays, very close to those of OQ switches  possible starvation for lightly loaded packet flows

Adonet Spring School55 Maximum Weight Matching  MWM is the optimal solution of the scheduling problem when the traffic is unknown, when the weight is either the queue length or the cell age  achieves 100% throughput under any traffic also under non-Bernoulli arrival processes, satisfying the law of large numbers  achieves low average delays, very close to those of OQ switches  possible starvation for lightly loaded packet flows

Adonet Spring School56 MWM with pipeline and latency  Let $T$ and $P$ be fixed  $D_t$ denotes the matching used at time $t$  The following variations of MWM also achieve 100% throughput:  $D_t = \mathrm{MWM}(t - P)$: MWM with pipeline degree $P$  $D_t = \mathrm{MWM}(\lfloor t/T \rfloor T)$: MWM with latency $T$  combinations of both  thus, it seems easy to achieve 100% throughput!

Adonet Spring School57 MWM with pipeline and latency  But:  What about throughput? 100% throughput, but it needs the computation of a MWM …  What about delays? Delays can be really bad!

Adonet Spring School58 General consideration  When scheduling in IQ switches, it is very difficult to achieve simultaneously  high throughput  low delay  limited implementation complexity

Adonet Spring School59 Uniform traffic  MWM and MSM behave almost identically [Figure: mean delay vs. normalized load under uniform traffic, for MWM and MSM]

Adonet Spring School60 LogDiagonal traffic  MSM is somewhat inferior to MWM [Figure: mean delay vs. normalized load under LogDiagonal traffic, for MWM and MSM]

Adonet Spring School61 Diagonal traffic  MSM yields much longer delays than MWM at medium/high loads [Figure: mean delay vs. normalized load under diagonal traffic, for MWM and MSM]

Adonet Spring School62 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School63 Approximations of MSM and MWM  Motivation  strong interest in scheduling algorithms with very low complexity and high performance  Usually  implementable schedulers (low complexity) → low throughput, long delays  theoretical schedulers (high complexity) → high throughput, short delays

Adonet Spring School64 Some implementable algorithms  Approximate MSM: WFA, iSLIP, 2DRR, RC, FIRM and many others  Approximate MWM with $w_{ij} = X_{ij}$ (queue length): iLQF, RPA, learning algorithms  Approximate MWM with $w_{ij}$ = cell age: iOCF  Approximate MWM with $w_{ij} = \sum_k X_{ik} + \sum_k X_{kj}$ (total occupancy of input $i$ plus total occupancy of output $j$): iLPF, MUCS

Adonet Spring School65 APPROXIMATIONS OF MAXIMUM SIZE MATCHING

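Adonet Spring School66 Wave Front Arbiter [Figure: request matrix and resulting match. RWP]

Adonet Spring School67 Wave Front Arbiter [Figure: request matrix and resulting match, computed in 2N-1 steps. RWP]

Adonet Spring School68 Wrapped Wave Front Arbiter [Figure: request matrix and resulting match, computed in N steps instead of 2N-1. RWP]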
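The slides show the two arbiters only as figures, so the following is a sketch of the usual formulation, stated as an assumption: crosspoints are visited one diagonal ("wave") at a time, and a request is granted if its row and column are still free. The plain arbiter needs 2N-1 waves, while the wrapped version folds the diagonals modulo N and needs N. The function names are illustrative.

```python
def wave_front_arbiter(requests):
    """Visit the 2N-1 anti-diagonals in order; returns match[i] = j or -1."""
    n = len(requests)
    row_free, col_free = [True] * n, [True] * n
    match = [-1] * n
    for wave in range(2 * n - 1):
        for i in range(n):
            j = wave - i
            if 0 <= j < n and requests[i][j] and row_free[i] and col_free[j]:
                match[i] = j
                row_free[i] = col_free[j] = False
    return match


def wrapped_wave_front_arbiter(requests):
    """Visit N wrapped diagonals; each holds one crosspoint per row and column."""
    n = len(requests)
    row_free, col_free = [True] * n, [True] * n
    match = [-1] * n
    for wave in range(n):
        for i in range(n):
            j = (wave - i) % n
            if requests[i][j] and row_free[i] and col_free[j]:
                match[i] = j
                row_free[i] = col_free[j] = False
    return match


full = [[1] * 3 for _ in range(3)]
print(wave_front_arbiter(full), wrapped_wave_front_arbiter(full))
```

Crosspoints on the same wave never share a row or a column, which is what lets hardware evaluate a whole wave in parallel.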

Adonet Spring School69 iSLIP  iSLIP means “iterative SLIP”  iterates among the following 3 phases  Request  Grant  Accept

Adonet Spring School70 iSLIP  3 phases:  Request (from inputs to outputs): each unmatched input sends a request to every output for which it has a cell  Grant (from outputs to inputs): if an unmatched output receives requests, it sends a grant to one of the inputs; contentions are solved by a round-robin mechanism  Accept (from inputs to outputs): if an unmatched input receives grants, it selects a single output and becomes matched to it; contentions are solved by a round-robin mechanism
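The three phases can be sketched as follows. The pointer update rule is not stated on the slides; this sketch follows the commonly described iSLIP rule, taken here as an assumption, in which a grant or accept pointer moves one position past the chosen port only when the grant is accepted in the first iteration. The function name `islip` and its arguments are illustrative.

```python
def islip(requests, grant_ptr, accept_ptr, iterations):
    """requests[i][j] is truthy if VOQ (i, j) is non-empty.

    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns match[i] = output matched to input i, or -1."""
    n = len(requests)
    match_in, match_out = [-1] * n, [-1] * n
    for it in range(iterations):
        # Request + Grant: each unmatched output grants the first unmatched
        # requesting input found at or after its round-robin pointer.
        grants = [[] for _ in range(n)]  # grants[i] = outputs granting input i
        for j in range(n):
            if match_out[j] != -1:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if match_in[i] == -1 and requests[i][j]:
                    grants[i].append(j)
                    break
        # Accept: each input accepts the grant closest to its pointer.
        for i in range(n):
            if not grants[i]:
                continue
            j = min(grants[i], key=lambda out: (out - accept_ptr[i]) % n)
            match_in[i], match_out[j] = j, i
            if it == 0:  # pointers move only for first-iteration matches
                grant_ptr[j] = (i + 1) % n
                accept_ptr[i] = (j + 1) % n
    return match_in
```

With a full request matrix and all pointers at zero, the first slot with one iteration matches a single pair, after which the pointers desynchronize and later slots produce full matchings.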

Adonet Spring School71 iSLIP  The round robin mechanism in iSLIP is designed so that, under uniform traffic, iSLIP emulates a dynamic TDM scheduler synchronized on the arrival pattern

Adonet Spring School72 iSLIP  iSLIP is maximal: often with log N iterations, always with N iterations  iSLIP was implemented on one chip in the Cisco router

Adonet Spring School73 iSLIP iSLIP demo from:

Adonet Spring School74 APPROXIMATIONS OF MAXIMUM WEIGHT MATCHING

Adonet Spring School75 iLQF  iLQF means “iterative Longest Queue First”  iterates among the following 3 phases  Request  Grant  Accept

Adonet Spring School76 iLQF  3 phases:  Request (from inputs to outputs): each unmatched input sends all its queue lengths as requests to the corresponding outputs  Grant (from outputs to inputs): if an unmatched output receives requests, it sends a grant to the input corresponding to the longest queue; contentions are solved by random choice  Accept (from inputs to outputs): if an unmatched input receives grants, it selects the output with the longest queue; contentions are solved by random choice

Adonet Spring School77 iLQF  iLQF is maximal: often with log N iterations, always with N iterations  iLQF is robust to non-uniform traffic
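The same request/grant/accept loop can be sketched for iLQF, with the longest queue winning at both the grant and the accept phase and ties broken at random. The function name `ilqf` and the optional `rng` argument (any object with a `choice` method, such as a `random.Random` instance) are assumptions made for this example.

```python
import random


def ilqf(queue_len, iterations, rng=random):
    """queue_len[i][j] is the length of VOQ (i, j).

    Returns match[i] = output matched to input i, or -1."""
    n = len(queue_len)
    match_in, match_out = [-1] * n, [-1] * n
    for _ in range(iterations):
        # Request + Grant: each unmatched output grants its longest
        # requesting queue among the unmatched inputs.
        grants = [[] for _ in range(n)]
        for j in range(n):
            if match_out[j] != -1:
                continue
            cands = [i for i in range(n)
                     if match_in[i] == -1 and queue_len[i][j] > 0]
            if cands:
                longest = max(queue_len[i][j] for i in cands)
                winner = rng.choice(
                    [i for i in cands if queue_len[i][j] == longest])
                grants[winner].append(j)
        # Accept: each input accepts the grant for its longest queue.
        for i in range(n):
            if grants[i]:
                longest = max(queue_len[i][j] for j in grants[i])
                j = rng.choice(
                    [j for j in grants[i] if queue_len[i][j] == longest])
                match_in[i], match_out[j] = j, i
    return match_in


print(ilqf([[1, 7], [6, 2]], iterations=1))
```

For queue lengths `[[5, 3], [4, 0]]` both outputs grant input 0, which accepts output 0, and input 1 is left unmatched; a maximum weight matching would instead serve the queues of length 3 and 4. This illustrates that iLQF only approximates MWM.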

Adonet Spring School78 iLQF iLQF demo from:

Adonet Spring School79 RPA  RPA means “Reservation with Preemption and Acknowledgment”  Two phases:  Reservation (possibly preemptive)  Acknowledgement  Sequential accesses to a reservation vector Res  $Urg_j$ (if set) is the urgency of the transfer from input $In_j$ to output $j$ [Figure: vector Res with one entry ($Urg_j$, $In_j$) for each output Out 1 ... Out N]

Adonet Spring School80 RPA  Vector Res is sequentially accessed by all inputs [Figure: Res passed in turn among Input 1, Input 2, Input 3 and Input 4]

Adonet Spring School81 RPA Initially, at each round: $Urg_j = 0$ for all $j$ Reservation phase  when input $i$ accesses Res:  it computes $W_j = X_{ij} - Urg_j$ for all $j$  it finds $j^*$ such that $W_{j^*} = \max_j \{ W_j \}$  if $W_{j^*} > 0$, it reserves output $j^*$ and sets $Urg_{j^*} = X_{ij^*}$, possibly overwriting the previous reservation  otherwise, it leaves the current reservation

Adonet Spring School82 RPA  Acknowledgement phase  if input i still finds its reservation at output j,  books output j  otherwise,  chooses an unreserved output j and books output j
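One round of the two RPA phases can be sketched as below. Two details are assumptions not stated on the slides: inputs access the vector in index order, and in the acknowledgement phase an input whose reservation was preempted only books an unreserved output for which it actually has cells. The function name `rpa_round` is illustrative.

```python
def rpa_round(queue_len):
    """queue_len[i][j] is X_ij, the length of VOQ (i, j).

    Returns match[i] = output booked by input i, or -1."""
    n = len(queue_len)
    urg = [0] * n     # urg[j] = urgency of the current reservation of output j
    res = [None] * n  # res[j] = input holding the reservation of output j

    # Reservation phase: inputs access the vector one after the other.
    for i in range(n):
        gains = [queue_len[i][j] - urg[j] for j in range(n)]
        j_star = max(range(n), key=lambda j: gains[j])
        if gains[j_star] > 0:
            res[j_star] = i  # possibly preempting a previous reservation
            urg[j_star] = queue_len[i][j_star]

    # Acknowledgement phase: confirm surviving reservations, else fall back.
    match = [-1] * n
    for i in range(n):
        if i in res:
            match[i] = res.index(i)
        else:
            for j in range(n):
                if res[j] is None and queue_len[i][j] > 0:
                    res[j] = i
                    match[i] = j
                    break
    return match


# Input 1 (queue length 5) preempts input 0 (length 3) at output 0;
# input 0 then falls back to the unreserved output 1.
print(rpa_round([[3, 2], [5, 1]]))
```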

Adonet Spring School83 Uniform traffic  comparison between MWM, iSLIP, iLQF, and RPA [Figure: mean delay vs. normalized load under uniform traffic, for MWM, iSLIP, iLQF and RPA]

Adonet Spring School84 LogDiagonal traffic  iSLIP saturates close to 84% throughput [Figure: mean delay vs. normalized load under LogDiagonal traffic, for MWM, iSLIP, iLQF and RPA]

Adonet Spring School85 Diagonal traffic  RPA achieves 98% throughput, iLQF 87%, iSLIP 83% [Figure: mean delay vs. normalized load under diagonal traffic, for MWM, iSLIP, iLQF and RPA]

Adonet Spring School86 LEARNING ALGORITHMS

Adonet Spring School87 Learning algorithms  Goal: find a good compromise among throughput, delay and complexity

Adonet Spring School88 Learning algorithms  Key observation  the matchings generated by MWM show limited changes from one time to another remembering the matching from the past simplifies the computation of the new matching  the search implemented by MWM can be enhanced with a randomized approach by observing arrivals by searching in parallel  based on an extension of randomized scheduling algorithms

Adonet Spring School89 Simple Randomized Schemes  Choose a matching at random and use it as the schedule  doesn’t yield 100% throughput  Choose 2 matchings at random and use the heavier one as the schedule  …  Choose N matchings at random and use the heaviest one as the schedule  None of these can give 100% throughput !

Adonet Spring School90 Simple randomized algorithms [Figure: performance on a 32x32 switch]

Adonet Spring School91 Bounds on Maximum Throughput

Adonet Spring School92 Tassiulas’ scheme  Consider the following policy:  $R_t$ = matching picked at random (uniformly) among all the possible $N!$ matchings  $D_t = \arg\max \{ W(D_{t-1}), W(R_t) \}$  Complexity is very low:  O(1) iterations  easy to pipeline  Yields 100% throughput!  note that the boost in throughput is due to the memory of the past matching $D_{t-1}$  However, delays are very large

Adonet Spring School93 Tassiulas' scheme [Figure: performance on a 32x32 switch]
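The policy keeps whichever is heavier: the previous matching, re-weighted with the current queue lengths, or a fresh uniformly random matching. A minimal sketch follows; the names `weight` and `tassiulas_step` and the optional `rng` argument are illustrative assumptions.

```python
import random


def weight(matching, queue_len):
    """W(matching): sum of the queue lengths of the VOQs it serves."""
    return sum(queue_len[i][j] for i, j in enumerate(matching))


def tassiulas_step(prev, queue_len, rng=random):
    """Return D_t given D_{t-1} = prev and the current queue lengths."""
    candidate = list(range(len(prev)))
    rng.shuffle(candidate)  # R_t: uniformly random permutation
    # max() keeps prev when the two weights are equal.
    return max(prev, candidate, key=lambda m: weight(m, queue_len))


queues = [[3, 0, 1], [0, 2, 5], [4, 1, 0]]
d = [0, 1, 2]
for _ in range(10):
    d = tassiulas_step(d, queues)
print(d, weight(d, queues))
```

With the queue lengths held fixed, as in this toy loop, the weight can only grow over time. In a real switch the queues change at every slot, which is why the scheme needs many slots to track a good matching and delays are large.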

Adonet Spring School94 Learning approach [Figure: COMP1 combines $D_{t-1}$ and $M_t$ into $D_t$]  Properties of COMP1:  $W(D_t) \ge W(D_{t-1})$  $W(D_t) \ge W(M_t)$  Examples:  COMP1 is the MAX among $D_{t-1}$ and $M_t$  COMP1 is the MERGE among $D_{t-1}$ and $M_t$

Adonet Spring School95 Merging [Figure: MERGE procedure example combining matchings X and R, with W(R)=10, into M, with W(M)=13] Emulating MWM is O(N)

Adonet Spring School96 The learning approach [Figure: COMP1 combines $D_{t-1}$ and $M_t$ into $D_t$]  Properties of $M_t$  informally, $M_t$ should be a “good” sample in the space of all possible matchings  Examples:  $M_t$ is a matching picked uniformly at random  $M_t$ is a matching picked non-uniformly at random, with a high probability of being heavy  $M_t$ is derived from the arrival vector $A_t$  $M_t$ is a good “neighbor” of $D_{t-1}$

Adonet Spring School97 Theoretical properties [Figure: COMP1 combines $D_{t-1}$ and $M_t$ into $D_t$]  Stability  100% throughput under any admissible Bernoulli traffic pattern  Delay  the higher the weight of $M_t$, the shorter the queues, and hence the smaller the delays

Adonet Spring School98 Example of practical implementation  Exploiting parallel search: [Figure: $D_t$ is the MAX among $D_{t-1}$, its neighbors $N_1$ ... $N_K$ ($N_K$ is the K-th neighbor of $D_{t-1}$) and $A_t$]  This scheme is called APSARA

Adonet Spring School99 What is a “neighbor” of a matching? Each neighbor differs from $D_{t-1}$ in ONLY TWO edges and can be generated very easily in hardware. Example: in a 3 x 3 switch, $D_{t-1}$ has 3 neighbors $N_1$, $N_2$, $N_3$

Adonet Spring School100 Max-APSARA  APSARA, as described before, is not maximal  Max-APSARA is a modified version of APSARA where a maximal size matching algorithm runs on the remaining unmatched inputs/outputs  e.g., if k inputs/outputs are unmatched: run iSLIP with k iterations, or select k random edges among the unmatched inputs/outputs
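Generating the neighbors amounts to swapping the outputs of two inputs, which changes exactly two edges and gives N(N-1)/2 neighbors (3 for a 3x3 switch). The sketch below also shows a MAX step over the previous matching and its neighbors; the names `neighbors` and `apsara_step`, and the optional `extra` list for additional candidate matchings, are assumptions made for this example.

```python
from itertools import combinations


def neighbors(matching):
    """All matchings obtained by swapping the outputs of two inputs."""
    result = []
    for a, b in combinations(range(len(matching)), 2):
        nb = list(matching)
        nb[a], nb[b] = nb[b], nb[a]
        result.append(nb)
    return result


def apsara_step(prev, queue_len, extra=()):
    """Keep the heaviest among prev, its neighbors and any extra candidates."""
    def weight(m):
        return sum(queue_len[i][j] for i, j in enumerate(m))

    candidates = [list(prev)] + neighbors(prev) + [list(m) for m in extra]
    return max(candidates, key=weight)


print(neighbors([0, 1, 2]))
print(apsara_step([0, 1, 2], [[1, 9, 0], [8, 1, 0], [0, 0, 1]]))
```

In hardware the candidate weights can be evaluated in parallel, so the search cost per slot stays small.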

Adonet Spring School101 APSARA performance

Adonet Spring School102 Outline  IP routers  OQ routers  IQ routers  Scheduling  Optimal algorithms  Heuristic algorithms  Packet-mode algorithms  Networks of routers  CIOQ routers  Multicast traffic  Conclusions

Adonet Spring School103 Routers and switches  IP routers deal with variable-size packets  Hardware switching fabrics often deal with fixed-size cells Question:  how to integrate a hardware switching fabric within an IP router?

Adonet Spring School104 Router based on an IQ cell switch: cell-mode [Figure: ISMs 1 to N, IQ cell switch (switching fabric), ORMs 1 to N]

Adonet Spring School105 Cell-mode scheduling  Scheduling algorithms work at cell level  pros: 100% throughput achievable  cons: interleaving of packets at the outputs of the switching fabric

Adonet Spring School106 Router based on an IQ cell switch: packet-mode [Figure: ISMs 1 to N, IQ cell switch (switching fabric), ORMs 1 to N] NO packet interleaving if packet-mode

Adonet Spring School107 Router based on an IQ cell switch: packet-mode [Figure: ISMs 1 to N, IQ cell switch (switching fabric), ORMs 1 to N] NO packet interleaving if packet-mode: ORMs can be removed

Adonet Spring School108 Packet-mode scheduling  Rule: packets transferred as trains of cells  when an input starts transferring the first cell of a packet comprising k cells, it continues to transfer in the following k-1 time slots  Pros:  no interleaving of packets at the outputs  easy extension of traditional schedulers  Cons:  starvation due to long packets: inherent in packet systems without preemption, negligible for high speed rates

Adonet Spring School109 Packet-mode scheduling  Questions  can packet mode provide high throughput? YES!  what about delays? It depends…

Adonet Spring School110 Packet-mode properties  Main theoretical results  MWM in packet-mode yields 100% throughput  Packet mode can provide shorter delays than cell mode, depending on the packet length distribution

Adonet Spring School111 Simulation scenario  Router with ISMs and ORMs  Uniform packet traffic  uniform packet load  uniform (1,192) packet size distribution  Spotted packet traffic  non uniform packet load  bimodal (3,100) packet size distribution P=P=

Uniform packet traffic
- Packet mode and cell mode reach the same throughput
[Figure: two panels, "Uniform packet traffic for cell mode" and "Uniform packet traffic for packet mode"; mean packet delay versus normalized load for MWM, MSM, iSLIP and iLQF]

Spotted packet traffic
- Packet mode reaches higher throughput than cell mode
[Figure: two panels, "Spotted packet traffic for cell mode" and "Spotted packet traffic for packet mode"; mean packet delay versus normalized load for MWM, MSM, iSLIP and iLQF]

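Effect of packet size distribution
- iSLIP: ratio of cell-mode delay to packet-mode delay (delay CM / delay PM) for different packet size distributions
- At high load PM becomes better
[Figure: "Packet mode gain for iSLIP"; delay CM / delay PM versus normalized load for uniform, exponential, trimodal and bimodal packet size distributions, with regions marked "better PM" and "better CM"]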

Packet mode features
- Packet-mode scheduling
  - is a feasible modification of existing schedulers
  - can improve throughput, but it can generate some unfairness between long and short packets
    - inherent to all variable-size packet networks without preemption
  - may give better packet delays than cell mode
    - depends on the packet size distribution

Network of IQ routers
- Question: given a network of IQ switches and an admissible input traffic, is the network always stable? NO!
  - this is quite counterintuitive… but true

Networks of IQ routers
- Consider the acyclic network of IQ routers in the following slide
  - derived from well-established results of adversarial queueing theory
  - a very specific scenario, but it comprises only a few switches
  - this situation may not be common, but it cannot be excluded in real networks

Pathological network of IQ switches
[Figure: network with 8 switches and 4 flows]

Instability of MWM
- If MWM is adopted at each IQ router and the traffic is admissible, the system can be unstable under Bernoulli i.i.d. arrivals

Instability of MWM
- MWM is too greedy, in the sense that it can create traffic bursts that are amplified by each scheduler
- A server can be idle while large bursts directed to it are blocked by contention upstream
- The problem arises when a packet flow is subject to priority changes along its path through the network
  - it is "dangerous" to increase priority along the path

Stability in networks of routers
- Global policies
  - "Oldest in the network" and many others
  - problem: they require global information about the network and perfectly synchronized clocks at the ingress of the network
- Local policies
  - until now, nothing really satisfying is known… (work in progress)

Stability in networks of routers
- Semi-local policies
  - MWM with local information about the router neighbors can achieve 100% throughput under i.i.d. Bernoulli arrivals
- Virtual network queue
  - the weights used by MWM are $w_{ij} = \max\{0,\, X_{ij} - H(X_{ij})\}$
  - where $H(X_{ij})$ is the size of the upstream queue that is sending packets to $X_{ij}$
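As a small illustration of the weight rule above, the following sketch computes the weight matrix from two hypothetical inputs: `x[i][j]` for the local queue sizes and `h[i][j]` for the sizes of the corresponding upstream queues. The function names and the matrix layout are assumptions made for the example; how the upstream sizes are actually exchanged between neighbors is not covered in the slide.

```python
def virtual_queue_weight(x_ij, h_ij):
    """Weight of one VOQ: max{0, X_ij - H(X_ij)}."""
    return max(0, x_ij - h_ij)


def weight_matrix(x, h):
    """Apply the rule entry by entry to N x N matrices of queue sizes."""
    return [
        [virtual_queue_weight(x_ij, h_ij) for x_ij, h_ij in zip(x_row, h_row)]
        for x_row, h_row in zip(x, h)
    ]


# Example: a queue that is shorter than its upstream queue gets weight 0.
local_sizes = [[5, 2], [0, 7]]
upstream_sizes = [[3, 4], [1, 7]]
weights = weight_matrix(local_sizes, upstream_sizes)   # [[2, 0], [0, 0]]
```

The resulting matrix would then be handed to the MWM computation in place of the plain queue lengths.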

CIOQ routers
[Figure: CIOQ router with VOQs at inputs 1..N, a switching fabric operating with speedup S, and output queues at outputs 1..N]

CIOQ routers
- Question: if a low speedup S is allowed (and queues are available at both inputs and outputs), is it possible to design simple scheduling algorithms capable of achieving high throughput and low delay? YES!

CIOQ routers with S=2
- If S = 2
  - it is easy to obtain 100% throughput
    - all maximal matchings work
      - based on stable marriage algorithms
  - it is less easy to obtain work conservation
    - an output is never idle whenever a packet destined to it is present
    - same average delays as OQ
    - very good delay performance
    - e.g.: LOOFA
  - it is difficult to perfectly emulate OQ…

LOOFA
- The occupancy $C_j$
  - is the number of cells currently residing in the j-th output queue
  - at each time slot, it is decremented by one because of departures
- Basic idea of LOOFA
  - give priority to output channels with low occupancy, thereby attempting to maintain work conservation for all outputs

LOOFA
- If S = 2, during each of the two phases
  - each unmatched input selects a non-empty VOQ directed to the unmatched output with the lowest occupancy, and sends a request to that output
  - each unmatched output selects one of the requests, and sends a grant to that input
  - repeat until the matching is maximal
- The selection at the outputs can be round robin, random, ...
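The request and selection steps of one phase can be sketched as follows. This is a minimal sketch under stated assumptions: `voq[i][j]` holds the number of cells queued at input i for output j, `occupancy[j]` is the current output occupancy, ties are broken by the lowest index, and each output simply picks the lowest-numbered requesting input as a stand-in for the round-robin or random selection mentioned above. The name `loofa_phase` is hypothetical.

```python
def loofa_phase(voq, occupancy):
    """Build a maximal matching for one phase, favouring low-occupancy outputs.

    voq:       N x N list, voq[i][j] = cells at input i destined to output j.
    occupancy: list of length N with the occupancy of each output queue.
    Returns a dict input -> output.
    """
    n = len(voq)
    match = {}             # input -> output
    matched_outputs = set()

    while True:
        # Each unmatched input requests its lowest-occupancy unmatched output.
        requests = {}      # output -> list of requesting inputs
        for i in range(n):
            if i in match:
                continue
            candidates = [
                j for j in range(n)
                if voq[i][j] > 0 and j not in matched_outputs
            ]
            if candidates:
                j = min(candidates, key=lambda out: (occupancy[out], out))
                requests.setdefault(j, []).append(i)

        if not requests:   # nobody can be added: the matching is maximal
            return match

        # Each requested output selects one input (assumed rule: lowest index).
        for j, inputs in requests.items():
            i = min(inputs)
            match[i] = j
            matched_outputs.add(j)
```

With S = 2 this routine would run twice per time slot, with the VOQ contents and the occupancies updated between the two phases.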

CIOQ routers with S=2
- If S = 2
  - it is difficult (but possible) to perfectly emulate an OQ router in terms of packet departures
    - it is impossible to distinguish, by observing arrivals and departures, whether the switching architecture is CIOQ or OQ
    - delays are perfectly controlled
      - easy to implement scheduling algorithms born for OQ (e.g., WFQ)

CIOQ routers
- CIOQ routers are very promising architectures
  - many degrees of freedom in design
    - how to balance input/output buffers
    - how the buffers interact, e.g., by backpressure mechanisms
- Several currently designed architectures are supposed to be CIOQ
- The speedup S is getting closer and closer to 1 in practical implementations of new switching architectures (CIOQ → IQ)

Multicast traffic
Misleading (but common) idea:
- observe
  1. OQ can achieve 100% throughput under any admissible unicast and multicast traffic
  2. OQ can be perfectly emulated by CIOQ with S = 2
- then, with S = 2 it is possible to achieve 100% throughput for multicast traffic
WRONG!
- because observation 2 holds only for unicast traffic

Multicast traffic
- Question: what is the minimum speedup required to achieve 100% throughput? Unknown!

Multicast traffic
- Possible implementations
  - copy network before the switching fabric
    - a multicast cell with f destinations is treated as f cells
    - possible bandwidth inefficiency
  - dedicated queue
    - multicast packets are treated in some specific way
[Figure: two N×N switch diagrams, one with separate unicast (UC) queues 1..N and a multicast (MC) queue, one with combined UC+MC queues]
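The copy-network option can be illustrated with a few lines of code. This is only a sketch: the cell representation as a `(payload, fanout)` pair and the function name `copy_network` are assumptions made for the example.

```python
def copy_network(cell):
    """Expand one multicast cell into one unicast cell per destination.

    cell: (payload, fanout) where fanout is a set of output indices.
    Returns a list of (payload, output) unicast cells.
    """
    payload, fanout = cell
    return [(payload, output) for output in sorted(fanout)]


# A cell with f = 3 destinations becomes 3 cells that all have to cross
# the switching fabric from the same input.
unicast_cells = copy_network(("p", {2, 0, 3}))
```

The f copies occupy f transfers from the same input, which is the source of the possible bandwidth inefficiency noted above.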

Multicast traffic: optimal queueing
- MC-VOQ queueing
  - best throughput performance
    - avoids HOL blocking
  - $2^N - 1$ queues for each input, one for each fanout set
    - re-enqueuing process → out-of-sequence problem
    - no re-enqueuing → some throughput degradation
[Figure: N×N switch with MC+UC queues 1..$2^N - 1$ at each input]
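To make the queue count concrete, the sketch below enumerates the possible fanout sets (all non-empty subsets of the N outputs) and checks the $2^N - 1$ figure. The function names are hypothetical and the enumeration is only practical for small N.

```python
from itertools import combinations


def fanout_sets(n):
    """All non-empty subsets of the outputs 0..n-1, one per MC-VOQ queue."""
    outputs = range(n)
    return [
        frozenset(subset)
        for size in range(1, n + 1)
        for subset in combinations(outputs, size)
    ]


def mc_voq_queues_per_input(n):
    """Closed-form count of queues needed at each input: 2^N - 1."""
    return 2 ** n - 1


# For N = 3 there are 7 fanout sets per input and 3 * 7 = 21 queues overall.
queues_n3 = len(fanout_sets(3))
total_n3 = 3 * mc_voq_queues_per_input(3)
```

For N = 16 the closed form already gives 65535 queues per input, which shows why the exact MC-VOQ structure scales poorly.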

Multicast traffic: optimal scheduling
- The optimal scheduling for multicast traffic can be defined similarly to unicast traffic
  - it is a sort of max flow algorithm on all $N(2^N - 1)$ queues
- Many heuristics can be envisaged to approximate it

Summary
- 3 main ingredients for IQ scheduling algorithms:
  - Weight computation
  - Matching computation
  - Contention resolution

Summary
- Weight computation
  - obtains the priority of each input queue
  - the metric can be related to queue length, waiting time of the cell at the HOL, …
- Contention resolution
  - needed whenever the selection is among candidates with equal weights
  - can be round robin or random
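The two weight metrics and a round-robin tie-break can be sketched as below. The representation of a VOQ as a `deque` of arrival time slots, and all three function names, are assumptions made for the example rather than part of any specific scheduler.

```python
from collections import deque


def queue_length_weight(voq):
    """Weight based on queue length (longest queue first)."""
    return len(voq)


def hol_waiting_time_weight(voq, now):
    """Weight based on the waiting time of the head-of-line cell."""
    return now - voq[0] if voq else 0


def round_robin_pick(candidates, pointer, n):
    """Among equally weighted candidates, pick the first one found when
    scanning the indices pointer, pointer+1, ... modulo n."""
    return min(candidates, key=lambda c: (c - pointer) % n)


# A VOQ holding cells that arrived in slots 3, 7 and 8, observed at slot 10.
voq = deque([3, 7, 8])
length_w = queue_length_weight(voq)          # 3 cells queued
age_w = hol_waiting_time_weight(voq, 10)     # HOL cell has waited 7 slots
winner = round_robin_pick([0, 2], pointer=1, n=4)
```

In this example the pointer sits at index 1, so candidate 2 is reached before candidate 0 and wins the tie.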

Summary
- Matching computation
  - computes the matching, trying to maximize its total weight
  - can be based on
    - an iterative search, like in iSLIP, iOCF, iLQF
    - a matrix greedy approach, like in MUCS, WFA
    - a reservation vector, like in RPA
    - a learning approach, like in APSARA
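To illustrate what "trying to maximize its total weight" means in practice, the sketch below compares a simple weight-ordered greedy matching with an exhaustive maximum weight matching on a tiny weight matrix. The greedy routine is a generic heuristic written for this example; it is not an implementation of MUCS, WFA, or any other algorithm named above, and the brute-force search is only usable for very small N.

```python
from itertools import permutations


def greedy_matching(w):
    """Pick edges in decreasing weight order while inputs/outputs are free."""
    n = len(w)
    edges = sorted(
        ((w[i][j], i, j) for i in range(n) for j in range(n) if w[i][j] > 0),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    used_inputs, used_outputs, match = set(), set(), {}
    for _, i, j in edges:
        if i not in used_inputs and j not in used_outputs:
            match[i] = j
            used_inputs.add(i)
            used_outputs.add(j)
    return match


def max_weight_matching_bruteforce(w):
    """Exhaustive maximum weight matching over all N! permutations."""
    n = len(w)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(w[i][p[i]] for i in range(n)),
    )
    return {i: best[i] for i in range(n) if w[i][best[i]] > 0}


def total_weight(w, match):
    return sum(w[i][j] for i, j in match.items())


weights = [[5, 4],
           [4, 0]]
greedy = greedy_matching(weights)                    # {0: 0}, weight 5
optimal = max_weight_matching_bruteforce(weights)    # {0: 1, 1: 0}, weight 8
```

The greedy result is maximal but carries less weight than the optimum, which is the kind of gap that heuristic schedulers trade for lower complexity.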

Summary
- Good IQ scheduling algorithms exist:
  - 100% throughput
  - short delay
  - limited complexity
- Performance differences are significant only close to saturation

Summary
- Open questions concerning IQ schedulers:
  - QoS guarantees
  - stability of networks of switches
  - multicast traffic
