Detecting and Diagnosing Network Performance Degradations with Statistical Methods
Nick Feamster, Georgia Tech
Joint work with Mukarram bin Tariq, Murtaza Motiwala, Yiyi Huang, Mostafa Ammar, Anukool Lakhina, Jim Xu

2 Network Performance Problems
Frequent and varied
–Analysis of performance problems on the NANOG mailing list suggests a reasonably major incident every 2-3 days
Causes range from malice, to misconfiguration, to software error, to physical breach

3 Conventional: Domain Knowledge
Common approach: Apply domain knowledge to diagnose the cause of a problem or performance degradation
Example
–Router configuration checker
[Diagram: distributed router configurations (single AS) are parsed into a normalized representation and checked against a correctness specification and constraints to produce faults]
Problem: Must define (and know) problems in advance!

4 Complementary Approach: Statistics
Need not have complete models of protocols, behavior, the network, or problems
–All effects on application behavior
–All causes of network disruption
–All failure scenarios…
Instead: Statistical model for desired behavior
–Based on behavior of protocol
–Agnostic to underlying causes
–Automatically discover dependencies and cases that might violate this behavior

5 This Talk: Two Problems
Detecting network-wide routing anomalies
–Monitor network-wide routing disruptions
–Watch for deviations from normal, network-wide trends
Detecting application-level performance degradations
–Monitor application performance from an array of clients
–Place clients into strata and adjust for confounding factors

6 Routing Disruptions: Overview
Network routing disruptions are frequent
–On Abilene, from January 1, 2006 to June 30, 2006: 282 disruptions
How to help network operators deal with disruptions quickly?
–Massive amounts of data
–Lots of noise
–Need for fast detection

7 Existing Approaches
Many existing tools and data sources
–Tivoli Netcool, SNMP, Syslog, IGP, BGP, etc.
Possible issues
–Noise level
–Time to detection
Network-wide correlation/analysis
–Not just reporting on manually specified traps
This talk: Explore complementary data sources
–First step: Mining BGP routing data

8 Challenges: Analyzing Routing Data
Large volume of data
Lack of semantics in a single stream of routing updates
Needed: Mining, not simple reporting
Idea: Can we improve detection by mining network-wide dependencies across routing streams?

9 Key Idea: Network-Wide Analysis
Structure and configuration of the network gives rise to dependencies across routers
Analysis should be cognizant of these dependencies: don't treat streams of data independently
Big network events may cause correlated blips

10 Overview

11 Detection
Approach: network-wide, multivariate analysis
–Model network-wide dependencies directly from the data
–Extract common trends
–Look for deviations from those trends
High detection rate (for acceptable false positives)
–100% of node/link disruptions, 60% of peer disruptions
Fast detection
–Current time to reporting (in minutes)
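A minimal sketch of this style of network-wide, multivariate detection, assuming a matrix of per-router BGP update counts binned over time; the PCA-based trend extraction, the parameters, and the threshold are illustrative assumptions, not the authors' implementation (the slide describes the approach only at a high level).

import numpy as np

def fit_trends(X_train, k=3):
    # Learn the top-k "common trends" across routers from (time x routers) data.
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k]

def residual_energy(X, mean, trends):
    # Energy in each time bin that the network-wide trends fail to explain.
    Xc = X - mean
    resid = Xc - (Xc @ trends.T) @ trends
    return (resid ** 2).sum(axis=1)

# Synthetic example: 1,000 ten-minute bins of update counts at 11 routers.
rng = np.random.default_rng(0)
X = rng.poisson(lam=20.0, size=(1000, 11)).astype(float)
X[900, :4] += 200.0                        # inject a correlated disruption at 4 routers

mean, trends = fit_trends(X[:800])         # model "normal" network-wide behavior
energy = residual_energy(X[800:], mean, trends)
cutoff = energy.mean() + 3 * energy.std()  # illustrative deviation threshold
print(np.where(energy > cutoff)[0] + 800)  # expect bin 900 to be flagged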

12 Identification: Approach
Classify disruptions into four types
–Internal node, internal link, peer, external node
Track three features
1. Global iBGP next-hops
2. Local iBGP next-hops
3. Local eBGP next-hops
[Slide shows a table pairing each identification goal with the corresponding approach]
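A hedged sketch of the feature tracking this slide lists; the data layout and helper below are hypothetical, and the actual rules that map a change signature to one of the four disruption types are in the paper and are not reproduced here.

from collections import namedtuple

# Hypothetical per-window snapshot of the three next-hop feature sets the slide names.
Snapshot = namedtuple("Snapshot", ["global_ibgp_nh", "local_ibgp_nh", "local_ebgp_nh"])

def changed_features(before, after):
    # Report which next-hop sets changed between two time windows.
    # The resulting "change signature" is what a rule set would consume to label
    # the disruption as internal node, internal link, peer, or external node.
    return {name for name in Snapshot._fields
            if getattr(before, name) != getattr(after, name)}

before = Snapshot({"10.0.0.1", "10.0.0.2"}, {"10.0.0.1"}, {"192.0.2.1"})
after  = Snapshot({"10.0.0.2"},             {"10.0.0.2"}, {"192.0.2.1"})
print(changed_features(before, after))   # {'global_ibgp_nh', 'local_ibgp_nh'}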

13 Identification: Results

14 Key Results
90% of local disruptions are visible in BGP
–Many disruptions are low volume
–Disruption size can vary by several orders of magnitude
About 75% involve more than 2 routers
–Analyze data across streams
–BGP routing data is but one possible input data set
Detection
–100% of node and link disruptions
–60% of peer disruptions
Identification
–100% of node disruptions
–74% of link disruptions
–93% of peer disruptions

15 Two Problems
Detecting network-wide routing anomalies
–Monitor network-wide routing disruptions
–Watch for deviations from normal, network-wide trends
Detecting application-level performance degradations
–Monitor application performance from an array of clients
–Place clients into strata and adjust for confounding factors

16 Net Neutrality

17 Example: BitTorrent Blocking

18 Many Forms of Discrimination
Throttling/prioritizing based on destination or service
–Target domains, applications, or content
Discriminatory peering
–Resist peering with certain content providers
…

19 Problem Statement
Identify whether a degradation in a service's performance is caused by discrimination by an ISP
–Quantify the causal effect
Existing techniques detect specific ISP methods
–TCP RST (Glasnost)
–ToS-bit based de-prioritization (NVLens)
Goal: Establish a causal relationship in the general case, without assuming anything about the ISP's methods

20 Causality: Analogy from Health
Epidemiology: Study causal relationships between risk factors and health outcomes
NANO (Network Access Neutrality Observatory): Infer causal relationship between ISP and service performance

21 Does Aspirin Make You Healthy?
Sample of patients:
              Aspirin   No Aspirin
  Healthy     40%       15%
  Not Healthy 10%       35%
Positive correlation in health and treatment
Can we say that Aspirin causes better health?
Confounding variables: correlate with both cause and outcome variables and confuse the causal inference
[Diagram: confounders such as sleep duration, diet, other drugs, and gender influence both Aspirin use and health]
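To make the correlation concrete, the naive association implied by the table (reading each cell as a fraction of the whole sample, which is my assumption) works out as:

P(Healthy | Aspirin) = 0.40 / (0.40 + 0.10) = 0.8
P(Healthy | No Aspirin) = 0.15 / (0.15 + 0.35) = 0.3
Association = 0.8 − 0.3 = 0.5

The slide's point is that this 0.5 gap cannot be read as a causal effect, because the listed confounders could produce the same gap on their own.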

22 Does an ISP Degrade Service?
Sample of client performances:
                            Comcast   No Comcast
  BitTorrent download time  5 sec     2 sec
Some correlation in ISP and service performance
Can we say that Comcast is discriminating?
Many confounding variables can confuse inference
[Diagram: confounders such as client setup, time of day, and content location influence both the choice of Comcast and the BitTorrent download time]

23 Causation vs. Association
Causal effect = E(real download time using Comcast) − E(real download time not using Comcast)
  (performance with the ISP vs. baseline performance)
G1, G0: ground-truth values for performance (a.k.a. counterfactual values)
Problem: No ground-truth values for the same clients; in situ data sets cannot directly estimate the causal effect

24 Causation vs. Association
Association = E(observed download time using Comcast) − E(observed download time not using Comcast)
  (observed performance with the ISP vs. observed baseline performance)
Association is what we can observe in an in situ data set
In general, association ≠ causal effect
How can we estimate the causal effect?
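Restating the two quantities from these two slides compactly (θ and α are shorthand introduced here, not the slide's notation; G1 and G0 are the slide's ground-truth values with and without the ISP, and Y is the observed outcome):

Causal effect:  θ = E[G1] − E[G0]
Association:    α = E[Y | ISP used] − E[Y | ISP not used]

The task is to estimate θ from in situ data in which only α is directly observable.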

25 Estimating Causal Effect
Two common approaches
–Random treatment
–Adjusting for confounding variables

26 Strawman: Random Treatment
Given a population:
–Treat subjects with Aspirin randomly, irrespective of their health
–Observe new outcome and measure association
–For large samples, association converges to causal effect if confounding variables do not change
  Diet, other drugs, etc. should not change
[Figure: healthy (H) and not-healthy (!H) outcomes in the Aspirin-treated vs. not-treated groups; the measured association in the example is 0.55]
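A toy simulation of the random-treatment argument on the aspirin example; the numbers and the single "lifestyle" confounder are invented purely to illustrate why randomization removes the confounder's influence.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical confounder: a "healthy lifestyle" flag that raises both the chance
# of being healthy and (in observational data) the chance of taking aspirin.
lifestyle = rng.random(n) < 0.5

# Observational (in situ) data: treatment depends on the confounder.
obs_treated = rng.random(n) < np.where(lifestyle, 0.8, 0.2)
# Randomized experiment: treatment assigned by coin flip, ignoring the confounder.
rnd_treated = rng.random(n) < 0.5

def outcome(treated):
    # True causal effect of aspirin here is +0.10; lifestyle adds +0.40.
    p = 0.3 + 0.1 * treated + 0.4 * lifestyle
    return rng.random(n) < p

for name, treated in [("observational", obs_treated), ("randomized", rnd_treated)]:
    healthy = outcome(treated)
    assoc = healthy[treated].mean() - healthy[~treated].mean()
    print(f"{name}: association = {assoc:.2f}")  # ~0.34 (confounded) vs ~0.10 (true effect)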

27 Random Treatment of ISPs: Hard!
Ask clients to change ISP to an arbitrary one
Difficult to achieve on the Internet
–Changing ISP is cumbersome for the users
–Changing ISP may change other confounding variables, e.g., the ISP network changes

28 Adjusting for Confounding Variables
Given an in situ data set:
1. List confounders, e.g., gender = {male, female}
2. Collect a data set
3. Stratify along confounder variable values
4. Measure association within each stratum
5. If there still is association, then it must be causation
[Figure: healthy (H) and not-healthy (!H) outcomes for treated and baseline subjects, divided into strata; the effect is measured within each stratum]
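A minimal sketch of the stratify-and-compare recipe above, transplanted to the ISP setting; the record fields, the stratum key, and the pandas aggregation are my assumptions, not NANO's implementation. The per-stratum baseline is taken as the average over all other ISPs, one of the options a later slide ("What is the Baseline?") discusses.

import pandas as pd

# Hypothetical in situ measurements: one row per client transfer.
df = pd.DataFrame({
    "isp":          ["Comcast", "OtherISP", "Comcast", "OtherISP", "Comcast", "OtherISP"],
    "client_setup": ["cable",   "cable",    "dsl",     "dsl",      "cable",   "cable"],
    "hour":         [20,        20,         9,         9,          9,         9],
    "download_s":   [5.1,       2.0,        4.8,       4.6,        4.9,       2.2],
})

# Steps 1-3: stratify along the confounder values (client setup and time of day here).
df["stratum"] = list(zip(df["client_setup"], df["hour"]))

def stratum_effect(g, isp="Comcast"):
    # Step 4: within a stratum, compare the ISP against the baseline
    # (average performance over all other ISPs in that stratum).
    treated  = g.loc[g["isp"] == isp, "download_s"].mean()
    baseline = g.loc[g["isp"] != isp, "download_s"].mean()
    return treated - baseline

effects = df.groupby("stratum").apply(stratum_effect)
weights = df.groupby("stratum").size() / len(df)
print((effects * weights).sum())  # step 5: weighted average = adjusted (causal) estimate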

29 Adjusting for Confounding: ISPs
What is the baseline?
What are the confounding variables?
Is the list of confounders sufficient?
How to collect the data?
Can we infer more than the effect?
–e.g., the discrimination criteria

30 What is the Baseline?
Baseline: performance when the ISP is not used
–We need to use some ISP for comparison
–What if the one we use is not neutral?
Solutions
–Use average performance over all other ISPs
–Use a lab model
–Use the service provider's model

31 Determine Confounding Variables
Client side
–Client setup (network setup)
–Application (browser, BitTorrent client, VoIP client)
–Resources (memory, CPU, utilization)
ISP-related
–Not all ISPs are equal; e.g., location
Temporal
–Diurnal cycles, transient failures

32 Data Collection
Quantify confounders and effect
Identify the treatment variable
Unbiased, passive measurements at the client end
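A sketch of what one passively collected client-side record might contain, covering the confounders from the previous slide, the treatment variable (the ISP), and the outcome; all field names are illustrative, not NANO's schema.

from dataclasses import dataclass

@dataclass
class Measurement:
    # Treatment variable
    isp: str                  # e.g. "Comcast"
    # Client-side confounders
    client_setup: str         # network setup, e.g. "cable", "dsl", "wifi"
    application: str          # e.g. "Firefox", "BitTorrent client", "VoIP client"
    cpu_utilization: float    # fraction of CPU busy during the transfer
    free_memory_mb: int
    # ISP-related and temporal confounders
    location: str             # coarse client location
    hour_of_day: int
    # Outcome (service performance)
    service: str              # e.g. "BitTorrent", "video streaming"
    download_time_s: float

m = Measurement("Comcast", "cable", "BitTorrent client", 0.35, 1024,
                "Atlanta", 20, "BitTorrent", 5.1)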

33 Inferring the Underlying Cause
Label data in two classes:
–discriminated (−), non-discriminated (+)
Train a decision tree for classification
–rules provide hints about the discrimination criteria
[Figure: example decision tree splitting on domain and content size (MB), with discriminated (−) and non-discriminated (+) leaves]
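A hedged sketch of this decision-tree step, assuming the labels have already been produced by the causal-inference stage; scikit-learn and the two features are my choices for illustration, not necessarily NANO's.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical flows already labeled by the inference stage:
# "-" = discriminated, "+" = not discriminated.
flows = pd.DataFrame({
    "domain_is_video": [1, 1, 1, 0, 0, 0, 1, 0],
    "content_size_mb": [900, 750, 40, 800, 30, 500, 820, 60],
    "label":           ["-", "-", "+", "+", "+", "+", "-", "+"],
})

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(flows[["domain_is_video", "content_size_mb"]], flows["label"])

# The learned rules hint at the discrimination criteria,
# e.g. "large transfers from video domains are throttled".
print(export_text(tree, feature_names=["domain_is_video", "content_size_mb"]))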

34 Preliminary Evaluation: Simulation
Setup
–Clients use two applications, App 1 and App 2, to access services
–Service 1 is slower using App 2 (the application is a confounder)
–ISP B throttles access to Service 1
Association (ISP B vs. baseline)
–Service 1: 0.92 (10%)
–Service 2: 0.04 (1%)
Causation (ISP B vs. baseline, after adjusting for the application)
–Service 1, App 1: 2.05 (20%)
–Service 1, App 2: 5.18 (187%)
–Service 2, App 1: 0.06 (2%)
–Service 2, App 2: 0.12 (4%)

35 Conclusion
Detecting routing disruptions
–Ability to detect and identify specific link/node disruptions without using domain-specific rules
NANO: black-box approach to infer and quantify discrimination; generically applicable
–Many open issues
  Privacy: Can we do local inference?
  Deployment: PlanetLab, CPR, real users
  How much data? Depends on variance