Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring. Peng Sun, Minlan Yu, Michael J. Freedman, Jennifer Rexford. Princeton University.


Identifying Performance Bottlenecks in CDNs through TCP-Level Monitoring
Peng Sun, Minlan Yu, Michael J. Freedman, Jennifer Rexford
Princeton University
August 19, 2011

Slide 2: Performance Bottlenecks
Diagram: CDN servers (server application, server OS), Internet, clients
Server application: writes too slowly
Server OS: insufficient send buffer or small initial congestion window
Internet: network congestion
Client: insufficient receive buffer

Slide 3: Reaction to Each Bottleneck
Diagram: CDN servers (server application, server OS), Internet, clients
Application is the bottleneck: debug the application
Server OS is the bottleneck: tune buffer size, or upgrade the server
Internet is the bottleneck: circumvent the congested part of the network
Client is the bottleneck: notify the client to change

Slide 4: Previous Techniques Not Enough
Diagram: server application, packet sniffer, server OS
Application logs: no details of network activities
Packet sniffing: expensive to capture
Active probing: extra load on the network
Transport-layer stats: directly reveal performance bottlenecks

Slide 5: How TCP Stats Reveal Bottlenecks
Diagram: CDN servers (server applications, server network stack), Internet (network path), clients
CDN server applications: insufficient data in send buffer
Server network stack: send buffer full, or initial congestion window too small
Network path: packet loss
Clients: receive window too small

Slide 6: Measurement Framework
Collect TCP statistics:
  Web100 kernel patch
  Extract useful TCP stats for analyzing performance
Analysis tool:
  Bottleneck classifier for individual connections
  Cross-connection correlation at the AS level:
    Map connections to ASes based on RouteViews
    Correlate bottlenecks to drive CDN decisions
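The step of mapping connections to ASes can be sketched as a longest-prefix match over a prefix-to-AS table. This is a minimal illustration, not the authors' tool: the table contents, the AS numbers, and the helper names `build_table` and `lookup_as` are all hypothetical, and a real table would be derived from RouteViews BGP data rather than hard-coded.

```python
import ipaddress

# Hypothetical prefix-to-AS table using documentation address ranges and
# private-use AS numbers. A real table would be built from a BGP dump.
PREFIX_TO_AS = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.0.0/16": 64502,
}


def build_table(prefix_to_as):
    """Parse the prefixes once and order them longest-prefix-first."""
    nets = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_as.items()]
    return sorted(nets, key=lambda item: item[0].prefixlen, reverse=True)


def lookup_as(table, ip):
    """Return the AS number of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    for net, asn in table:
        if addr in net:
            return asn
    return None


table = build_table(PREFIX_TO_AS)
client_as = lookup_as(table, "198.51.100.7")  # matches the /24, not the /16
```

A linear scan is enough for a sketch; a production table with hundreds of thousands of prefixes would normally use a trie or radix tree instead.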

Slide 7: How Bottleneck Classifier Works
BytesInSndBuf = Rwin: Rwin limits sending, so the client is the bottleneck
Cwin drops greatly and packet loss occurs: the network path is the bottleneck
Small initial Cwin: slow start limits performance, so the network stack is the bottleneck
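The three rules on this slide can be sketched as a small rule-based classifier over one polled sample of TCP statistics. This is an illustration under stated assumptions, not the authors' implementation: the `Sample` field names are hypothetical (they are not Web100 variable names), "drops greatly" is assumed to mean the congestion window at least halved, and the slow-start rule is assumed to fire when the congestion window is smaller than the data waiting in the send buffer.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One polled snapshot of a connection (hypothetical field names)."""
    bytes_in_snd_buf: int   # bytes waiting in the send buffer
    rwin: int               # receive window advertised by the client
    cwnd: int               # current congestion window
    prev_cwnd: int          # congestion window at the previous poll
    new_retrans: int        # retransmissions since the previous poll
    in_slow_start: bool     # whether the sender is still in slow start


# Assumption: "Cwin drops greatly" means it fell to half or less.
CWND_DROP_RATIO = 0.5


def classify(s: Sample) -> str:
    """Apply the slide's rules in the order they are listed."""
    # Rule 1: the send buffer holds as much as the receive window allows.
    if s.bytes_in_snd_buf >= s.rwin:
        return "client"
    # Rule 2: packet loss together with a large congestion window drop.
    if s.new_retrans > 0 and s.cwnd <= s.prev_cwnd * CWND_DROP_RATIO:
        return "network_path"
    # Rule 3: slow start has not yet opened the window enough (assumed test).
    if s.in_slow_start and s.cwnd < s.bytes_in_snd_buf:
        return "server_network_stack"
    return "unclassified"


label = classify(Sample(20000, 65535, 4000, 16000, 3, False))
```

The rule order matters when several conditions hold at once; here it simply follows the slide, which is itself an assumption about precedence.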

Slide 8: CoralCDN Experiment
CoralCDN serves 1 million clients per day
Experiment environment:
  Deployment: a Clemson PlanetLab node
  Polling interval: 50 ms
  Traces shown: Feb 19 to 25, 2011
  Total number of connections: 209K
  After removing cache-miss connections: 137K (2008 ASes in total)
Log space overhead: < 200 MB per Coral server per day

Slide 9: What Are the Major Bottlenecks for Individual Clients?
We calculate the fraction of its lifetime that each connection spends under each bottleneck.

Bottleneck            | % of connections with bottleneck for >40% of lifetime
Server application    | 10.75%
Server network stack  | 18.72%
Network path          | 3.94%
Clients               | 1.27%

Server application
  Reason: slow CPU or scarce disk resources on the PlanetLab node
  Our suggestion: use more powerful PlanetLab machines
Server network stack
  Reason: congestion window rises too slowly for short connections (>80% of the connections last <1 second)
  Our suggestion: use a larger initial congestion window
Network path
  Reason: spotty network (discussed in the next slide)
Clients
  Reason: receive buffer too small (most of them are <30 KB)
  Our suggestion: filter them out of decision making
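The per-connection metric on this slide can be sketched as follows. Because samples are polled at a fixed interval, the fraction of samples carrying a label approximates the fraction of lifetime spent under that bottleneck. The label strings and function names are hypothetical; only the 40% threshold comes from the slide.

```python
from collections import Counter

# From the slide: a bottleneck counts if it covers more than 40% of lifetime.
LIFETIME_THRESHOLD = 0.40


def bottleneck_fractions(labels):
    """Fraction of polled samples under each label (equal-length intervals)."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {name: n / len(labels) for name, n in counts.items()}


def dominant_bottlenecks(labels, threshold=LIFETIME_THRESHOLD):
    """Bottlenecks present for more than `threshold` of the lifetime."""
    fractions = bottleneck_fractions(labels)
    return sorted(
        name for name, frac in fractions.items()
        if name != "unclassified" and frac > threshold
    )


# Ten samples from one hypothetical connection.
samples = ["client"] * 5 + ["unclassified"] * 3 + ["network_path"] * 2
result = dominant_bottlenecks(samples)
```

A connection can have more than one dominant bottleneck under this definition, or none at all.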

Slide 10: AS-Level Correlation
CDNs make decisions at the AS level, e.g., change server selection for /24
Explore at the AS level:
  Filter out non-network bottlenecks
  Whether network problems exist
  Whether the problem is consistent

Slide 11: Filtering Out Non-Network Bottlenecks
CDNs change server selection if clients have low throughput
Non-network factors can limit throughput
236 out of 505 low-throughput ASes are limited by non-network bottlenecks
Filtering is helpful:
  Don’t worry about things CDNs cannot control
  Produce more accurate estimates of performance

Slide 12: Network Problems at the AS Level
CDNs make decisions at the AS level
Question: do connections in the same AS share a common network problem?
For 7.1% of the ASes, half of the connections have a >10% packet loss rate
Network problems are significant at the AS level
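The AS-level check described here can be sketched as a function over the per-connection loss rates of one AS. The 10% loss threshold and the "half of the connections" criterion come from the slide; reading "half" as "at least half", the function names, and the sample data are assumptions.

```python
def as_has_network_problem(loss_rates, loss_threshold=0.10, conn_fraction=0.5):
    """True if at least `conn_fraction` of connections exceed the loss threshold."""
    if not loss_rates:
        return False
    lossy = sum(1 for rate in loss_rates if rate > loss_threshold)
    return lossy / len(loss_rates) >= conn_fraction


def problematic_ases(loss_by_as):
    """Return the AS numbers whose connections share a loss problem."""
    return sorted(
        asn for asn, rates in loss_by_as.items()
        if as_has_network_problem(rates)
    )


# Hypothetical per-connection loss rates, keyed by private-use AS numbers.
loss_by_as = {
    64500: [0.20, 0.15, 0.00, 0.01],  # 2 of 4 connections above 10%
    64501: [0.20, 0.00, 0.00],        # 1 of 3 connections above 10%
    64502: [],                        # no connections observed
}
flagged = problematic_ases(loss_by_as)
```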

Slide 13: Consistent Packet Loss of ASes
CDNs care about the predictive value of measurements
Analyze the variance of average packet loss rates:
  Each epoch (1 min) has a nonzero average loss rate
  Loss rate is consistent across epochs (standard deviation < mean)

Analysis length                    | # of ASes with consistent packet loss
One week                           | 377 / 2008
One day (Feb 21)                   | 122 / 739
One hour (Feb 21, 18:00 to 19:00)  | 19 / 121
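The two consistency criteria on this slide can be sketched as a check over the per-epoch average loss rates of one AS. The slide does not say whether the standard deviation is the population or sample variant; the population form is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def has_consistent_loss(epoch_loss_rates):
    """Check the slide's two criteria over per-epoch average loss rates.

    1. Every epoch has a nonzero average loss rate.
    2. The standard deviation across epochs is smaller than the mean.
    Assumption: population standard deviation (pstdev) is used.
    """
    if not epoch_loss_rates:
        return False
    if any(rate <= 0 for rate in epoch_loss_rates):
        return False
    return pstdev(epoch_loss_rates) < mean(epoch_loss_rates)


steady = has_consistent_loss([0.05, 0.06, 0.04])   # small spread around the mean
bursty = has_consistent_loss([0.01, 0.01, 0.50])   # one spike dominates
```

Requiring the standard deviation to stay below the mean is a loose bound, so this test mainly rules out ASes whose loss comes in isolated bursts.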

Slide 14: Conclusion & Future Work
Use TCP-level stats to detect performance bottlenecks
Identify major bottlenecks for a production CDN
Discuss how to improve the CDN’s operation with our tool
Future work:
  Automatic and real-time analysis integrated into CDN operation
  Detect the problematic AS on the path
  Combine TCP-level stats with application logs to debug online services

Slide 15: Thanks! Questions?