Inferring a Network Congestion Map with Zero Traffic Overhead. Florin Dinu, T. S. Eugene Ng, Rice University
2 Effects of Congestion. Need to identify, quantify and localize congestion.
3 The Vision: Passively Inferred Congestion Map. [Figure: routers R0 to R6, X7, X8, ... spanning AS 1 and AS 2.] Inferred without any dedicated measurement (probing) traffic, at fine time granularities (seconds), with good accuracy. How does it work? Why does it work? Where is it applicable?
4 Benefits of Passive Inference. Active reporting (SNMP) and passive inference are compared on four criteria: has reasonable accuracy; does not need access rights to routers; does not exacerbate existing congestion; detects congestion in a timely manner. [Table: the per-cell marks are not recoverable from the transcript.] Passive inference is complementary to active reporting.
5 Overview: Passively Inferring Congestion Maps. [Figure: routers R0 to R6, X7, X8, ... spanning AS 1 and AS 2.] Step 1: Use congestion markings from existing traffic to get path-level congestion information. Routers are AQM/ECN capable and can mark existing traffic.
6 Overview: Passively Inferring Congestion Maps. Expand on Step 1: path-level congestion from AQM/ECN markings. Step 2: Use topological information to complete the congestion map. [Figure: routers R0 to R6; $P_{06}$ and $P_{04}$ are known, $P_{46}$ is unknown.] $P_{46} = \mathrm{func}(P_{06}, P_{04}) = \frac{P_{06} - P_{04}}{1 - P_{04}}$
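The Step 2 formula can be sketched in a few lines. This is a minimal illustration, assuming that routers mark independently, so that $1 - P_{06} = (1 - P_{04})(1 - P_{46})$; the function name `infer_link_mp` and the clamping of noisy estimates to [0, 1] are assumptions made for the example, not part of the talk.

```python
def infer_link_mp(p_long, p_short):
    """Infer the MP of the unobserved segment from two path-level MPs.

    p_long  : MP measured on the longer path (P06 on the slide)
    p_short : MP measured on the shorter path sharing the same prefix (P04)
    """
    for value in (p_long, p_short):
        if not 0.0 <= value <= 1.0:
            raise ValueError("marking probabilities must lie in [0, 1]")
    if p_short >= 1.0:
        # Every packet is already marked on the short path, so the
        # remaining segment cannot be observed at all.
        raise ValueError("segment MP is undefined when the short path marks 100%")
    estimate = (p_long - p_short) / (1.0 - p_short)
    # Measurement noise can push the estimate slightly outside [0, 1];
    # clamping is an assumption of this sketch.
    return min(1.0, max(0.0, estimate))


# Example with made-up values: P06 = 0.28 and P04 = 0.10.
print(infer_link_mp(0.28, 0.10))
```

With these made-up inputs the segment estimate is 0.18 / 0.90, that is, about 0.2.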
7 AQM Background. AQM = Active Queue Management. The router marks/drops packets probabilistically as a function of congestion severity. Many different definitions of congestion severity exist. [Figure: marking probability (MP) versus congestion severity for RED, PI and REM.] We use marking probability (MP) as the congestion measure.
8 ECN Background: Marking Data Packets. ECN = Explicit Congestion Notification. Data packets are marked probabilistically. [Figure: source S sends data packets to destination D through an AQM/ECN router.]
9 Use of the Data Markings. [Figure: routers R0 to R6 with paths $P_{40}$, $P_{30}$, $P_{60}$.] Data markings describe congestion on routers' ingress paths. Data packet marking is probabilistic, so the ratio of marked data packets gives the MP on the ingress path.
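The ratio estimate for data packets can be sketched as follows. The ECN codepoint values are those of the IP header ECN field; the function name and the choice to ignore packets that are not ECN-capable (they cannot carry a mark) are assumptions of this example.

```python
# ECN codepoints in the IP header.
NOT_ECT = 0b00  # not ECN-capable
ECT_1 = 0b01    # ECN-capable transport, codepoint 1
ECT_0 = 0b10    # ECN-capable transport, codepoint 0
CE = 0b11       # congestion experienced (the "mark")


def mp_from_data_markings(ecn_fields):
    """Estimate the ingress-path MP as marked / ECN-capable data packets."""
    capable = [f for f in ecn_fields if f != NOT_ECT]
    if not capable:
        return None  # nothing to estimate from
    marked = sum(1 for f in capable if f == CE)
    return marked / len(capable)


# Made-up observation: 2 marked packets out of 5 ECN-capable ones.
observed = [ECT_0, CE, ECT_0, CE, NOT_ECT, ECT_1]
print(mp_from_data_markings(observed))
```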
10 ECN Background: Echoing. The destination echoes the markings from data packets onto ACKs. [Figure: S and D exchanging DATA and ACK packets.] The ACK markings are an altered version of the data packet markings.
11 ECN Background: Responding to Markings. Responding to marked ACKs: the source sends a data packet flagged CWR. Stopping the echoing: the destination stops marking ACKs after receiving the CWR packet. [Figure: S and D exchanging DATA, CWR and ACK packets.] The ACK markings are an altered version of the data packet markings.
12 Groups: Effect of ECN Echoing. Groups of marked and unmarked ACKs. Groups of unmarked ACKs of "size zero". [Figure: ACK sequences between S and D showing marked and unmarked groups, and a group of size zero right after a CWR.]
13 Use of the ACK Markings. [Figure: routers R0 to R6 with paths $P_{05}$, $P_{03}$, $P_{04}$.] ACK markings describe congestion on the forward paths of the flows, that is, on routers' egress paths. The ratio of marked ACKs is an inaccurate measure. ACK markings are very important but more challenging to use.
14 Obtaining MP from ACK Markings. p = MP on the forward path. $\mathrm{AVG\_SZ\_UNMARKED} = \mathrm{func}(p) = \sum_{n=0}^{\infty} n \, (1-p)^{n} \, p = \frac{1-p}{p}$. To get the MP, compute the average size of the groups of unmarked ACKs. [Figure: D receiving DATA and CWR, sending ACKs.]
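The relation between p and the average group size can be checked numerically and inverted. The sketch below truncates the infinite sum at a finite number of terms, which is an approximation chosen for the example; both function names are hypothetical.

```python
def expected_unmarked_group_size(p, terms=10_000):
    """Truncated version of sum_{n>=0} n * (1-p)^n * p."""
    return sum(n * (1 - p) ** n * p for n in range(terms))


def mp_from_average_group_size(avg_size):
    """Invert avg = (1-p)/p, which gives p = 1 / (1 + avg)."""
    return 1.0 / (1.0 + avg_size)


p = 0.2
avg = expected_unmarked_group_size(p)   # close to (1 - 0.2) / 0.2
print(avg, mp_from_average_group_size(avg))
```

For p = 0.2 the truncated sum is numerically indistinguishable from (1 - p)/p = 4, and the inversion recovers p.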
15 Average Size of Groups of Unmarked ACKs. [Figure: timeline of an Estimation Interval (EI) split into Sampling Intervals (SI), with a training period at the start of the EI; Flow1 to Flow5 are shown and some flows are not selected.] Select flows until a limit is reached. During the training period only select flows; do not compute samples. For each following SI: sample = average size of the groups of unmarked ACKs that finish in that SI. Discard groups that start or end in a different EI. At the end of the EI use $\mathrm{AVG(SAMPLES)} = \frac{1-p}{p}$ to obtain p.
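The sampling procedure can be sketched roughly as below. This is an illustration under simplifying assumptions: groups are given as hypothetical `(start_time, end_time, size)` tuples, flow selection and the training period are left out, and an SI with no finished group simply contributes no sample.

```python
from statistics import mean


def estimate_mp(groups, ei_start, ei_end, si=0.5):
    """Estimate p for one EI from groups of unmarked ACKs.

    groups: iterable of (start_time, end_time, size) tuples.
    """
    per_si = {}
    for start, end, size in groups:
        # Discard groups that start or end in a different EI.
        if start < ei_start or end >= ei_end:
            continue
        si_index = int((end - ei_start) // si)  # SI in which the group finishes
        per_si.setdefault(si_index, []).append(size)

    # One sample per SI: the average size of the groups finishing in it.
    samples = [mean(sizes) for sizes in per_si.values()]
    if not samples:
        return None
    avg = mean(samples)
    return 1.0 / (1.0 + avg)  # from AVG(SAMPLES) = (1-p)/p


groups = [
    (0.1, 0.2, 1), (0.3, 0.4, 3),   # finish in the first SI
    (0.6, 0.9, 2),                  # finishes in the second SI
    (-0.5, 0.1, 100),               # started in the previous EI: discarded
    (9.9, 10.2, 100),               # ends in the next EI: discarded
]
print(estimate_mp(groups, ei_start=0.0, ei_end=10.0))
```

In this made-up trace both samples equal 2, so the estimate is 1 / (1 + 2).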
16 Optimization: the Use of Groups of Size Zero. The probability that a group is of size zero is $(1-p)^{0} \cdot p = p$. If p is high, most groups will be of size zero. Statistical significance is better if groups of size zero are used. Routers need to be on both the data and the ACK path of a flow. [Figure: D receiving DATA and CWR, sending ACKs, with a group of size zero.] Using groups of size zero increases accuracy.
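A short calculation shows how large the share of size-zero groups becomes as p grows. The function name is hypothetical; the distribution is the same geometric law used in the average-size formula, $(1-p)^{n} \cdot p$.

```python
def group_size_probability(p, n):
    """Probability that a group of unmarked ACKs has exactly n ACKs."""
    return (1 - p) ** n * p


for p in (0.1, 0.5, 0.8):
    zero_share = group_size_probability(p, 0)
    # Everything that is not a size-zero group is lost if such groups are ignored.
    print(f"p={p}: share of size-zero groups = {zero_share:.2f}")
```

At p = 0.8, four groups out of five are of size zero, so ignoring them would discard most of the available observations.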
17 Evaluation: Parameter Settings. ns-2 simulations, 500 s simulation time. AQM algorithms: RED, PI, REM (RED by default). SI = 0.5 s (a congestion sample is computed every 0.5 s). Monitor at most 1000 flows per EI/path. Groups of size zero are used in all experiments.
18 Evaluation: Traffic & Topology. [Figure: chain of routers R0, R1, R2, ..., R8, R9, R10 (hop 10) with UDP sources.] 5 ms link delay, 500 Mbps link bandwidth. R0 to Ri: 250*i 2 TCP flows. Ri to Ri+2: 100 TCP flows. Metric: 50th and 90th percentile of |inferred MP - real MP| for each link.
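The error metric can be computed as in the sketch below. The nearest-rank percentile definition is an assumption made for the example (the talk does not say which definition was used), and the sample values are made up.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def error_percentiles(inferred, real):
    """50th and 90th percentile of |inferred MP - real MP| for one link."""
    errors = [abs(i - r) for i, r in zip(inferred, real)]
    return percentile(errors, 50), percentile(errors, 90)


# Made-up per-EI marking probabilities for a single link.
inferred = [0.10, 0.22, 0.35, 0.40, 0.58]
real = [0.12, 0.20, 0.30, 0.50, 0.50]
print(error_percentiles(inferred, real))
```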
19 Evaluation: GROUP vs REFERENCE. Our group-based solution (GROUP) is compared against a baseline solution with no alteration (REFERENCE). [Figures: D receiving DATA and CWR and sending ACKs, once for GROUP and once for REFERENCE.]
20 Sensitivity to the Length of the EI. [Plot: x-axis is the value of EI (s), log scale.] Accuracy decreases with hop count but is within 0.1 for most cases.
21 Sensitivity to Drastic Changes. UDP sources vary their sending rate by 50 Mbps between 250 Mbps and 750 Mbps. Every 10 s we start 3000 TCP flows between random nodes, for a random time (0-10 s). How well does our solution track these sudden and large variations?
22 Sensitivity to Drastic Changes. [Plots: EI = 10 s and EI = 3 s; 50th and 90th percentile.] Accuracy decreases with hop count but is within [value not recoverable] for most cases.
23 Sensitivity to AQM Marking Function. A linear marking function allows better inference for our solution. Why does REM perform much worse? Abrupt variations in marking probability; limited visibility. [Figure: marking/drop probability versus congestion severity for RED, PI and REM.]
24 Limited Visibility. [Figure: routers R0, R1, R2; R1 marks 100% of packets, R2 marks 30% of packets; paths $P_{20}$ and $P_{10}$ are observed, $P_{12}$ = ??.] If $P_{20} = P_{10} = 100\%$, $P_{12}$ is unknown (any value is possible). At high MP (even below 100%) the problem still exists because very few packets are left unmarked. Limited visibility appears at high MP and is more probable for REM.
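One way to see why high MP hurts is to look at how a small error in a measured path MP propagates to the inferred segment. The sketch assumes the segment is obtained as (P_far - P_near) / (1 - P_near), as in the overview of Step 2, so an error on P_far is scaled by 1 / (1 - P_near). The function name and the framing as an "amplification factor" are assumptions of this example, not something stated in the talk.

```python
import math


def error_amplification(p_near):
    """Factor by which an error on the far-path MP is scaled in the segment MP."""
    if p_near >= 1.0:
        return math.inf  # every packet already marked: the segment is invisible
    return 1.0 / (1.0 - p_near)


for p_near in (0.1, 0.9, 0.99, 1.0):
    print(p_near, error_amplification(p_near))
```

The factor grows without bound as the near router approaches 100% marking, which matches the observation that very few unmarked packets remain to carry information.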
25 Sensitivity to Dropped ACKs - Numerical. Dropped ACKs can modify the average size of groups of unmarked ACKs. [Figure: example group sizes, with an average size of 3.75 in one case and 4.33 in the other.] ACKs can be dropped by non-AQM/ECN routers. Pure ACKs can be dropped even by AQM/ECN routers.
26 Sensitivity to Dropped ACKs - Numerical. At reasonable drop probabilities the additional error is low.
27 Other Advantages of Our Solution. Incremental deployment: on specific paths; around non-AQM/ECN routers. Useful in heterogeneous environments: different AQM types.
28 Related Work. Re-ECN [SIGCOMM 2005], ConEx IETF WG. Extends ECN with one step: sources re-echo congestion information from ACK markings. A router on the forward path has upstream, downstream and whole-path congestion. Useful for traffic policing or traffic management. Lower precision; limited by header space bits. Needs modifications to ECN and headers. Does not address the challenge posed by ACK markings. Does not go beyond path-level congestion inference.
29 Conclusion. Novel method for inferring congestion with zero network overhead. Does not require changes to hosts, headers or protocols. Incrementally deployable and useful in heterogeneous environments. Good accuracy even in very congested environments.
Thank you Credits for the pictures
31 Why not Use Ratio for ACK Markings? The ratio of marked ACKs is very inaccurate. Need a better solution.
32 Effects of Using Delayed ACK - Numerical. Additional error is introduced by the use of delayed ACKs.
33 Sensitivity to Bandwidth (EI = 3 s). Accuracy increases with bandwidth.
34 Sensitivity to Flow Size (EI = 3 s). Good accuracy even with many small flows.
35 Severity of False Positives (EI = 3 s). Small false positives are inherent in a probabilistic approach.
37 Implementation. Counters per path: length and number of all groups of unmarked ACKs. Counters per flow: current group of unmarked ACKs. Prefix matching for source and destination. Transport protocol header matching for flow identification. Sequence numbers for CWR.
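The counters listed above could be organized roughly as follows. This is a simplified sketch, not the authors' implementation: path and flow identifiers are passed in directly (standing in for prefix and transport-header matching), CWR detection is reduced to an explicit `on_cwr` call, and all class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathCounters:
    total_length: int = 0  # summed length of all closed groups of unmarked ACKs
    group_count: int = 0   # number of closed groups, size-zero groups included

    def marking_probability(self):
        if self.group_count == 0:
            return None
        avg = self.total_length / self.group_count
        return 1.0 / (1.0 + avg)  # from avg = (1-p)/p


@dataclass
class FlowState:
    current_group: int = 0  # unmarked ACKs seen since the last closed group
    echoing: bool = False   # True between a marked ACK and the next CWR


class Monitor:
    def __init__(self):
        self.paths = {}
        self.flows = {}

    def on_ack(self, flow_id, path_id, marked):
        flow = self.flows.setdefault(flow_id, FlowState())
        if flow.echoing:
            return  # repeated echoes carry no new information
        if marked:
            counters = self.paths.setdefault(path_id, PathCounters())
            counters.total_length += flow.current_group
            counters.group_count += 1
            flow.current_group = 0
            flow.echoing = True
        else:
            flow.current_group += 1

    def on_cwr(self, flow_id):
        flow = self.flows.get(flow_id)
        if flow is not None:
            flow.echoing = False  # the destination stops echoing after CWR


monitor = Monitor()
for event in ["u", "u", "u", "m", "m", "cwr", "m", "cwr", "u", "m"]:
    if event == "cwr":
        monitor.on_cwr("flow-1")
    else:
        monitor.on_ack("flow-1", "P05", marked=(event == "m"))
print(monitor.paths["P05"])
```

In this made-up trace the closed groups have sizes 3, 0 and 1; the marked ACK that arrives right after the first CWR produces the group of size zero.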
38 Coverage of Congestion Maps. Six real network topologies (Internet2, TEIN2, iLight, GEANT, SUNET, NLR). Assume an all-to-all traffic pattern. Average congestion map coverage: NLR, Internet2, GEANT ~60%; TEIN2 ~91%; iLight ~94%; SUNET ~95%.