Profiling Network Performance in Multi-tier Datacenter Applications

Profiling Network Performance in Multi-tier Datacenter Applications
SNAP: A Scalable Net-App Profiler
Minlan Yu, Princeton University
Joint work with Albert Greenberg, Dave Maltz, Jennifer Rexford, Lihua Yuan, Srikanth Kandula, and Changhoon Kim

Notes: Diagnosing network performance problems for complex applications running in large-scale datacenters is a challenging problem. This talk shows how a simple tool called SNAP can help with the diagnosis process. (Feedback: make the problem clear up front with a "topic sentence" on the title slide, and say that the work was done during a summer internship with the Bing group at Microsoft, so the audience knows it reveals real production experience.)

Applications inside Data Centers
Front end server -> Aggregator -> Workers

Notes: A single application's requests are propagated and aggregated across multiple tiers, and the same server is shared by many more applications, some delay-sensitive and others throughput-intensive. These applications are complicated enough when everything works; what if something goes wrong?

Challenges of Datacenter Diagnosis
- Large, complex applications: hundreds of application components, tens of thousands of servers
- New performance problems: code is updated to add features or fix bugs, and components change while the application is still in operation
- Old performance problems (human factors): developers may not understand the network well (Nagle's algorithm, delayed ACK, silly window syndrome, etc.)

Notes: Low-level protocol mechanisms such as Nagle's algorithm and delayed ACK are a mystery to developers without a networking background, yet they can have a significant performance impact; it can be a disaster for a database expert to have to wrestle with the TCP stack. (Feedback: the audience probably won't know these mechanisms by name, so describe them out loud as examples of low-level machinery you don't expect them to know. In practice, Microsoft does not really "hot swap" new code; new images for a service are brought up and the old ones are gradually phased out. The larger point, to stress out loud, is that change is constant, including a constant influx of new developers.)

Diagnosis in Today's Data Center
- App logs (#reqs/sec, response time; e.g., 1% of requests see > 200 ms delay): application-specific
- Packet traces (a packet sniffer next to the host, filtering out traces for long-delay requests): too expensive
- Switch logs (#bytes/pkts per minute): too coarse-grained
- SNAP (diagnoses net-app interactions): generic, fine-grained, and lightweight

Notes: Google and Microsoft spend on the order of $100K for a few monitoring machines, which can only sniff about two racks of servers.

SNAP: A Scalable Net-App Profiler that runs everywhere, all the time

SNAP Architecture
- At each host, for every connection: collect data
- Input: topology and routing information; mapping from connections to processes/apps (which connections share the same switch/link or application code)

Notes: This slide gives an overview of what SNAP is; the polling rate is tuned to reduce overhead. A sketch of the connection-to-app mapping follows below.
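As an illustration of the connection-to-app mapping this architecture takes as input, here is a minimal sketch that maps local TCP endpoints to process names on a single host. It assumes the third-party psutil package and sufficient privileges to see other processes' sockets; the function name and output format are hypothetical, not SNAP's actual interface.

```python
# Hypothetical sketch: map TCP connections to owning processes, one input a
# SNAP-style tool needs (not the paper's actual code). Requires psutil and
# enough privileges to see other processes' sockets.
import psutil

def connection_to_app():
    """Return {(local_ip, local_port, remote_ip, remote_port): process_name}."""
    mapping = {}
    for conn in psutil.net_connections(kind="tcp"):
        if conn.pid is None or not conn.raddr:   # skip listening/ownerless sockets
            continue
        try:
            name = psutil.Process(conn.pid).name()
        except psutil.NoSuchProcess:
            continue                              # process exited between calls
        key = (conn.laddr.ip, conn.laddr.port, conn.raddr.ip, conn.raddr.port)
        mapping[key] = name
    return mapping

if __name__ == "__main__":
    for endpoint, app in connection_to_app().items():
        print(endpoint, "->", app)
```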

Collect Data in TCP Stack
- TCP understands net-app interactions
  - Flow control: how much data the app wants to read/write
  - Congestion control: network delay and congestion
- Collect TCP-level statistics
  - Defined by RFC 4898 (TCP Extended Statistics MIB)
  - Already exists in today's Linux and Windows OSes
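For a concrete flavor of such per-connection statistics, the sketch below polls Linux's TCP_INFO socket option, a rough analogue of the RFC 4898 counters SNAP reads on Windows. The field offsets are an assumption about the common struct tcp_info layout and may differ across kernel versions; this is an illustration, not SNAP's code.

```python
# Hypothetical sketch: poll per-connection TCP statistics via Linux's
# TCP_INFO socket option. Field offsets assume the classic struct tcp_info
# layout (8 one-byte fields followed by 32-bit counters) and may vary.
import socket
import struct

TCP_INFO = getattr(socket, "TCP_INFO", 11)   # 11 on Linux if the constant is missing

def poll_tcp_stats(sock):
    buf = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 192)
    fields = struct.unpack("8B24I", buf[:8 + 24 * 4])
    u32 = fields[8:]                          # the 32-bit counters
    return {
        "lost":      u32[6],    # tcpi_lost: packets presumed lost
        "retrans":   u32[7],    # tcpi_retrans: retransmitted packets in flight
        "rtt_us":    u32[15],   # tcpi_rtt: smoothed RTT estimate (microseconds)
        "rttvar_us": u32[16],   # tcpi_rttvar: RTT variance
        "snd_cwnd":  u32[18],   # congestion window (in MSS)
    }

if __name__ == "__main__":
    s = socket.create_connection(("example.com", 80), timeout=5)
    s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    s.recv(4096)
    print(poll_tcp_stats(s))
    s.close()
```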

TCP-level Statistics
- Cumulative counters
  - Packet loss: #FastRetrans, #Timeout
  - RTT estimation: #SampleRTT, #SumRTT
  - Receiver: RwinLimitTime
  - Calculate the difference between two polls
- Instantaneous snapshots
  - #Bytes in the send buffer
  - Congestion window size, receiver window size
  - Representative snapshots based on Poisson sampling

Notes: The two types of variables need different handling. Cumulative counters are easy: they catch every event even when the polling interval is large, and the difference between two polls gives the count for that interval. Instantaneous values are harder: periodic sampling may miss interesting moments unless you are lucky enough to sample at the instant something happens, so SNAP uses Poisson sampling, which (by the PASTA property, independent of the underlying distribution) gives a statistically meaningful overview of these values. These are example variables; there are many others. The data are used both to classify connections and to show details to developers. A sketch of both kinds of processing follows below.
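A minimal sketch of the two kinds of processing, assuming a poll_stats callable that returns the current values for one connection (the variable names and intervals are placeholders, not SNAP's API): cumulative counters are differenced between polls, while instantaneous values are recorded at Poisson-spaced times.

```python
# Hypothetical sketch: handle cumulative counters vs. instantaneous snapshots
# for one connection. `poll_stats` is an assumed callable returning a dict of
# current values (e.g., from TCP_INFO or RFC 4898 statistics).
import random
import time

COUNTERS = ("fast_retrans", "timeouts", "sample_rtt", "sum_rtt", "rwin_limit_time")
SNAPSHOTS = ("send_buffer_bytes", "cwnd", "rwin")

def counter_deltas(prev, curr):
    """Cumulative counters catch every event; report the per-interval change."""
    return {k: curr[k] - prev[k] for k in COUNTERS}

def poisson_snapshots(poll_stats, mean_interval_s=0.5, duration_s=10.0):
    """Sample instantaneous values at Poisson-spaced times (PASTA property:
    the samples reflect time averages regardless of the underlying process)."""
    samples, deadline = [], time.time() + duration_s
    while time.time() < deadline:
        time.sleep(random.expovariate(1.0 / mean_interval_s))  # exponential gaps
        stats = poll_stats()
        samples.append({k: stats[k] for k in SNAPSHOTS})
    return samples
```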

SNAP Architecture (adding the performance classifier)
- At each host, for every connection: collect data, then run the performance classifier

Life of a Data Transfer
- The sender application generates the data
- The data is copied to the socket send buffer
- TCP sends the data into the network
- The receiver receives the data and sends an ACK

Notes: This simple example of the basic data-transfer stages helps illustrate where performance impairments can happen, and sets up the taxonomy on the next slide.

Taxonomy of Network Performance
- Sender application: no network problem (the app itself is the bottleneck)
- Send buffer: send buffer not large enough
- Network: fast retransmission, timeout
- Receiver: not reading fast enough (CPU, disk, etc.); not ACKing fast enough (delayed ACK)

Notes: Nagle's algorithm and delayed ACK interact when the application writes small chunks of data that trigger Nagle's algorithm.

Identifying Performance Problems
- Sender application: no other problem detected (inference)
- Send buffer: #bytes in the send buffer (sampling)
- Network: #fast retransmissions, #timeouts (direct measurement)
- Receiver: RwinLimitTime (direct measurement); delayed ACK when diff(SumRTT) > diff(SampleRTT) * MaxQueuingDelay (inference)

Notes: The classifier combines direct measurement, sampling, and inference. One assumption to question: when the send buffer is full, is the application really to blame, or is the receive window or congestion the underlying cause? A sketch of these classification rules follows below.
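A minimal sketch of how these rules might combine for one polling interval, assuming the per-interval counter deltas and sampled values are already available in a dict; the key names and the queuing-delay bound are illustrative placeholders, not SNAP's actual parameters.

```python
# Hypothetical sketch of SNAP-style per-interval classification. `d` holds
# counter deltas and sampled values for one connection over one polling
# interval; thresholds and key names are illustrative placeholders.
MAX_QUEUING_DELAY_US = 10_000   # assumed bound on in-network queuing delay

def classify_interval(d):
    problems = []
    if d["fast_retrans"] > 0 or d["timeouts"] > 0:
        problems.append("network: packet loss")               # direct measurement
    if d["rwin_limit_time_ms"] > 0:
        problems.append("receiver: receive window limited")   # direct measurement
    if d["send_buffer_full_samples"] > 0:
        problems.append("send buffer: too small")             # Poisson sampling
    if (d["sample_rtt"] > 0 and
            d["sum_rtt_us"] > d["sample_rtt"] * MAX_QUEUING_DELAY_US):
        problems.append("receiver: delayed ACK")               # inference
    if not problems:
        problems.append("sender app: not generating data fast enough")  # inference
    return problems
```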

SNAP Architecture (adding cross-connection correlation)
- A management system supplies the topology, routing, and the mapping from connections to processes/apps
- At each host, for every connection: collect data, then run the performance classifier
- Correlate across connections that share a resource (host, link, or switch)
- Output: the offending app, host, link, or switch

Pinpoint Problems via Correlation (shared resources)
- Correlate over a shared switch/link/host
- Packet loss appears on all the connections going through one switch/host
- Pinpoint the problematic switch

Pinpoint Problems via Correlation (shared application)
- Correlate over the application: the same application has problems on all of its machines
- Report aggregated application behavior
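One simple way such correlation could work is sketched below: group per-connection problem reports by the resource they share (a switch, link, host, or application) and flag resources where most connections report the same problem. The grouping key, minimum group size, and 0.8 threshold are assumptions for illustration, not values from the paper.

```python
# Hypothetical sketch: correlate per-connection diagnoses across a shared
# resource (switch, link, host, or application). Each report is a tuple
# (resource_id, has_problem). Thresholds are illustrative choices.
from collections import defaultdict

def pinpoint(reports, threshold=0.8, min_group=5):
    """Return resources where at least `threshold` of connections report a problem."""
    totals, bad = defaultdict(int), defaultdict(int)
    for resource, has_problem in reports:
        totals[resource] += 1
        bad[resource] += int(has_problem)
    return [r for r in totals
            if totals[r] >= min_group and bad[r] / totals[r] >= threshold]

# Usage: key the reports by switch to find a lossy switch, or by application
# name to find an application misbehaving on all of its machines.
example = [("switch-12", True)] * 9 + [("switch-12", False)] + [("switch-7", False)] * 10
print(pinpoint(example))   # -> ['switch-12']
```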

SNAP Architecture (complete)
- Online at each host: lightweight data collection and performance classification for every connection
- Offline: cross-connection correlation, using the management system's topology, routing, and connection-to-app mapping
- Output: the offending app, host, link, or switch

Reducing SNAP Overhead
- Data volume: 120 bytes per connection per poll
- CPU overhead: 5% for polling 1K connections at a 500 ms interval; 35% for polling 5K connections at a 50 ms interval; increases with the number of connections and the polling frequency
- Solution: adaptive tuning of the polling frequency
  - Reduce the polling frequency to stay within a target CPU budget
  - Devote more polling to the more problematic connections
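A minimal sketch of what such adaptive tuning could look like, assuming a measured CPU fraction and per-connection problem scores are available; the function and parameter names are placeholders, not SNAP's actual controller.

```python
# Hypothetical sketch: adapt per-connection polling intervals to stay under a
# CPU budget while polling problematic connections more often.
def adapt_polling(intervals_s, problem_scores, cpu_used, cpu_target=0.05,
                  min_interval_s=0.05, max_interval_s=5.0):
    """intervals_s: {conn_id: current polling interval in seconds},
    problem_scores: {conn_id: recent problem count},
    cpu_used: measured CPU fraction spent polling."""
    scale = cpu_used / cpu_target if cpu_target > 0 else 1.0
    new_intervals = {}
    for conn, interval in intervals_s.items():
        interval *= scale                      # above budget -> poll less often
        if problem_scores.get(conn, 0) > 0:
            interval /= 2.0                    # problematic connections: poll more
        new_intervals[conn] = min(max(interval, min_interval_s), max_interval_s)
    return new_intervals
```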

SNAP in the Real World

Key Diagnosis Steps
- Identify performance problems: correlate across connections; identify applications with severe problems
- Expose simple, useful information to developers: filter the important statistics and classification results
- Identify the root cause and propose solutions: work with operators and developers to tune the TCP stack or change application code

Notes: SNAP does not automatically solve every problem; often the challenging part is identifying that a performance problem exists and which component is responsible. SNAP takes that important first step of identifying problems and teasing apart which stage of the transfer causes them, while operators and developers ultimately pinpoint the root cause.

SNAP Deployment
- Deployed in a production data center: 8K machines, 700 applications
- Ran SNAP for a week, collecting terabytes of data (less than 1 GB per machine per day, polling every 500 ms)
- Diagnosis results: identified 15 major performance problems; 21% of applications have network performance problems

Notes: The roughly 700 applications are always running and use persistent connections. Summarize that SNAP found 15 serious performance bugs and that the team worked with developers to fix their code.

Characterizing Performance Limitations (#apps limited for > 50% of the time)
- Send buffer not large enough: 1 app
- Network (fast retransmission, timeout): 6 apps
- Receiver not reading fast enough (CPU, disk, etc.): 8 apps
- Receiver not ACKing fast enough (delayed ACK): 144 apps

Notes: The six apps with network losses largely inflict the packet loss on themselves ("incast"). Walk through the life of a transfer, and be sure to talk about the ACK path. At any instant a connection is limited by one component; if it is limited by the application rather than the network, that is good news for the network. Nagle's algorithm and delayed ACK interact when apps send small writes.

Three Example Problems
- Delayed ACK affects delay-sensitive apps
- Congestion window allows sudden bursts
- Significant timeouts for low-rate flows

Problem 1: Delayed ACK
- Delayed ACK affected many delay-sensitive apps
  - Even number of packets per record: ~1,000 records/sec
  - Odd number of packets per record: ~5 records/sec
- Delayed ACK is a TCP mechanism: the receiver ACKs every other packet, and holds the ACK for a lone packet for up to 200 ms, to reduce bandwidth usage and server interrupts
- Proposed solution: disable delayed ACK in data centers

Notes: Point out the 1,000 vs. 5 transactions/sec difference, which depends only on the parity of the number of packets per record. At least mention the cost of disabling delayed ACK; the example involves a configuration-file distribution service with on the order of 1M connections. Clarify up front that delayed ACK is a mechanism in TCP, not something added by the developers or operators in the data center.
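For context on the kind of per-socket knobs involved, the hedged sketch below sets Linux's TCP_QUICKACK on the receiver (which suppresses delayed ACK only until more data arrives, so it must be reapplied after reads) and TCP_NODELAY on the sender to avoid the Nagle/delayed-ACK interaction. These are analogous Linux options for illustration; the slide's actual proposal is to disable delayed ACK in the data center's TCP stack, and the systems in the paper ran Windows.

```python
# Hypothetical sketch: per-socket knobs related to the delayed-ACK problem.
# TCP_QUICKACK (Linux) temporarily disables delayed ACK on the receiver and
# must be reapplied after reads; TCP_NODELAY disables Nagle on the sender,
# avoiding the Nagle + delayed-ACK interaction for small writes.
import socket

def disable_nagle(sock):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

def quickack(sock):
    # Not available on all platforms; only in effect until more data arrives.
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)

def recv_with_quickack(sock, nbytes=4096):
    data = sock.recv(nbytes)
    quickack(sock)        # reapply after every read, since the flag is transient
    return data
```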

Send Buffer and Delayed ACK
- SNAP diagnosis: delayed ACK interacts with zero-copy send
- With a socket send buffer: the application buffer is copied into the socket send buffer, (1) "send complete" is signaled to the app right away, and (2) the ACK arrives later
- With zero-copy send: the network stack sends directly from the application buffer, so (1) the ACK must arrive before (2) "send complete" is signaled; a delayed ACK therefore stalls the application
- Fix: change the application if it uses zero-copy send

Notes: The affected application was a proxy collecting logs from servers. Call it "zero-copy send": Windows, of course, supports speed optimizations like zero-copy send, but the delayed ACK then delays the send-complete notification.

Problem 2: Congestion Window Allows Sudden Bursts
- Developers increase the congestion window to reduce delay, e.g., to send 64 KB of data in a single RTT
- They intentionally keep the congestion window large by disabling slow-start restart in TCP, which would otherwise shrink the window after an idle period

Notes: This occurs at an aggregator distributing requests. With slow-start restart disabled, the congestion window is large enough at any time to send the whole request in one burst; with it enabled, the window gets reduced after idle periods (as the window-vs-time plot on the slide shows).

Slow-Start Restart
- SNAP diagnosis: significant packet loss because the congestion window is too large after an idle period
- Proposed solutions: change the apps to send less data during congestion, or adopt new transport protocols that consider both congestion and delay
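For context, Linux exposes slow-start restart as the net.ipv4.tcp_slow_start_after_idle sysctl; the sketch below merely reads the current setting. Disabling it is what the developers on the previous slide relied on, and is exactly what let the large post-idle window burst. The /proc path is Linux-specific and is only an analogous knob; the systems in the paper ran Windows.

```python
# Hypothetical sketch: check whether slow-start restart after idle is enabled
# on a Linux host (1 = window shrinks after idle, 0 = post-idle bursts allowed).
from pathlib import Path

SYSCTL = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")

def slow_start_restart_enabled():
    try:
        return SYSCTL.read_text().strip() == "1"
    except OSError:
        return None   # not a Linux host, or /proc unavailable

if __name__ == "__main__":
    state = slow_start_restart_enabled()
    print("slow-start restart after idle:",
          {True: "enabled", False: "disabled", None: "unknown"}[state])
```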

Problem 3: Timeouts for Low-rate Flows
- SNAP diagnosis: more fast retransmissions for high-rate flows (1-10 MB/s); more timeouts for low-rate flows (10-100 KB/s)
- Problem: low-rate flows are not the cause of congestion, but they suffer more from it
- Proposed solutions: reduce the timeout value in the TCP stack, or find new ways to handle packet loss for small flows

Conclusion
- SNAP is a simple, efficient way to profile data centers
  - Passively measures real-time network-stack information
  - Systematically identifies the problematic stage of a transfer
  - Correlates problems across connections
- Deployed in a production data center
  - Diagnoses net-app interactions and quickly identifies problems when they happen
  - Helps operators improve the platform and tune the network
  - Helps developers pinpoint application problems
- Future work: extend SNAP to diagnose wide-area transfers

Notes: These problems, while known at some level, really do occur in practice, and people need to find and resolve them efficiently; SNAP demonstrates a way to do that.

Thanks!

Notes: Neither part is truly "rocket science," but together they form a clean and elegant solution. As Dave wrote in the introduction, the main point of the paper is a proof of concept that this kind of approach is so effective.