Detailed diagnosis in enterprise networks Srikanth Kandula, Ratul Mahajan, Patrick Verkaik (UCSD), Sharad Agarwal, Jitu Padhye, Victor Bahl.


1 Detailed diagnosis in enterprise networks Srikanth Kandula, Ratul Mahajan, Patrick Verkaik (UCSD), Sharad Agarwal, Jitu Padhye, Victor Bahl

2 Network diagnosis: Explaining faulty behavior ratul | sigcomm | '09

3 Current landscape of network diagnosis systems [Figure: axis "Network size"; labels: Big enterprises, Large ISPs; Small enterprises marked "?"]

4 Why study small enterprise networks separately? Small enterprises (compared with big enterprises and large ISPs): less sophisticated admins; less rich connectivity; many shared components. [Figure labels: IIS, SQL, Exchange, …]

5 Our work
   1. Shows that small enterprises need “detailed diagnosis”, which is not enabled by current systems that focus on scale
   2. Develops NetMedic for detailed diagnosis; it diagnoses application faults without application knowledge

6 Understanding problems in small enterprises: 100+ cases; symptoms, root causes

7 And the survey says …

   Symptom                             Share
   App-specific                        60 %
   Failed initialization               13 %
   Poor performance                    10 %
   Hang or crash                       10 %
   Unreachability                       7 %

   Identified cause                    Share
   Non-app config (e.g., firewall)     30 %
   Software/driver bug                 21 %
   App config                          19 %
   Overload                             4 %
   Hardware fault                       2 %
   Unknown                             25 %

   Detailed diagnosis: handle app-specific as well as generic faults; identify culprits at a fine granularity

8 Example problem 1: Server misconfig [Figure labels: Browser, Web server, Server config]

9 Example problem 2: Buggy client [Figure labels: SQL client C1, SQL client C2, SQL server, Requests]

10 Current formulations sacrifice detail (to scale): Dependency graph based formulations (e.g., Sherlock [SIGCOMM 2007]) model the network as a dependency graph at a coarse level and use a simple dependency model

11 Example problem 1: Server misconfig [Figure labels: Browser, Web server, Server config] The network model is too coarse in current formulations

12 Example problem 2: Buggy client [Figure labels: SQL client C1, SQL client C2, SQL server, Requests] The dependency model is too simple in current formulations

13 A formulation for detailed diagnosis: Dependency graph of fine-grained components; component state is a multi-dimensional vector. [Figure labels: SQL svr, Exch. svr, IIS svr, IIS config, Process, OS, Config, SQL client C1, SQL client C2; example state variables: % CPU time, IO bytes/sec, Connections/sec, 404 errors/sec]
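The formulation on this slide can be pictured as a small data structure. The following is a minimal sketch, not NetMedic's actual code: the class names, method names, variable names, and the edge directions in the example are all assumptions made for illustration. Each component keeps a history of state snapshots, and the variable names are treated as opaque labels.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A fine-grained component (machine, process, configuration, ...)."""
    name: str
    # One entry per measurement interval; each maps a variable name to a value.
    # Variable names are opaque labels: nothing here interprets their meaning.
    history: list = field(default_factory=list)

    def record(self, **variables):
        self.history.append(dict(variables))

    def current_state(self):
        return self.history[-1]


class DependencyGraph:
    """Directed graph: an edge src -> dst means src may directly affect dst."""

    def __init__(self):
        self.components = {}
        self.affects = {}

    def add(self, name):
        self.components[name] = Component(name)
        self.affects[name] = set()
        return self.components[name]

    def add_edge(self, src, dst):
        self.affects[src].add(dst)

    def affected_by(self, src):
        return sorted(self.affects[src])


# Hypothetical example using component names from the slide.
graph = DependencyGraph()
iis = graph.add("IIS svr")
graph.add("IIS config")
graph.add("SQL svr")
graph.add_edge("IIS config", "IIS svr")  # assumed direction
graph.add_edge("SQL svr", "IIS svr")     # assumed edge
iis.record(cpu_pct=12.0, io_bytes_per_sec=4.0e5,
           connections_per_sec=30.0, errors_404_per_sec=0.0)
```

Keeping the state as a name-to-value mapping, rather than a fixed schema, reflects the slide's point that each component exposes many variables and the graph does not need to know what they mean.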

14 The goal of diagnosis: Identify likely culprits for components of interest, without using semantics of state variables (i.e., no application knowledge). [Figure labels: Svr, C1, C2, Process, OS, Config]

15 Using joint historical behavior to estimate impact [Figure: state histories of components D (variables a, b, c) and S (variables a, b, c, d) at times 0, 1, …, n] To estimate the impact of S on D: identify time periods when the state of S was “similar”, then measure how “similar” on average the states of D are at those times. [Figure labels: Svr, C1, C2; Request rate (low), Response time (high), Request rate (high), Response time (high), Request rate (high); H, H, L]
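The two steps on this slide can be sketched as follows. This is an illustrative sketch only: the similarity measure (one minus the mean absolute difference of values pre-scaled to [0, 1]), the parameter `k`, and all function names are assumptions, since the slide does not define them.

```python
def similarity(u, v):
    """1 minus the mean absolute difference; assumes values scaled to [0, 1]."""
    return 1.0 - sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def edge_impact(s_hist, d_hist, s_now, d_now, k=2):
    """Estimate the impact of S on D from time-aligned state histories.

    Step 1: pick the k past times when S's state was most similar to s_now.
    Step 2: average how similar D's state at those times is to d_now.
    """
    times = sorted(range(len(s_hist)),
                   key=lambda t: similarity(s_hist[t], s_now),
                   reverse=True)[:k]
    return sum(similarity(d_hist[t], d_now) for t in times) / len(times)


# Made-up history: S is a client (request rate), D is a server (response time).
s_hist = [[0.1], [0.2], [0.9], [1.0]]
d_hist = [[0.1], [0.2], [0.9], [0.8]]

# Server response time is high now. A client that is currently busy explains
# it better than a client that is currently quiet.
impact_busy = edge_impact(s_hist, d_hist, s_now=[0.95], d_now=[0.9])
impact_quiet = edge_impact(s_hist, d_hist, s_now=[0.1], d_now=[0.9])
```

With these made-up numbers the busy client gets an impact of about 0.95 and the quiet client about 0.25, because in the past the server looked like it does now only when the client was busy.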

16 Robust implementation of impact estimation
   Ignore state variables that represent redundant info
   Place higher weight on state variables likely related to faults being diagnosed
   Ignore state variables irrelevant to interaction with neighbor
   Account for aggregate relationships among state variables of neighboring components
   Account for disparate ranges of state variables
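Two of the items above, disparate ranges and per-variable weights, lend themselves to a short sketch. The slide does not say how NetMedic implements them; min-max scaling over the history and a weighted mean absolute difference are stand-in choices assumed here, and the function names are hypothetical.

```python
def fit_scaler(history):
    """Return a function that maps each variable to [0, 1] using the
    minimum and maximum seen in the history (assumed normalization)."""
    n_vars = len(history[0])
    lows = [min(row[i] for row in history) for i in range(n_vars)]
    highs = [max(row[i] for row in history) for i in range(n_vars)]

    def scale(row):
        out = []
        for x, lo, hi in zip(row, lows, highs):
            if hi == lo:
                out.append(0.0)  # a constant variable carries no information
            else:
                out.append(min(1.0, max(0.0, (x - lo) / (hi - lo))))
        return out

    return scale


def weighted_difference(u, v, weights):
    """Weighted mean absolute difference; a weight of 0 ignores a variable.
    Assumes at least one weight is positive."""
    return sum(w * abs(a - b) for a, b, w in zip(u, v, weights)) / sum(weights)


# Made-up history: variable 0 ranges over tens, variable 1 over millions.
raw_history = [[10.0, 1_000_000.0], [20.0, 3_000_000.0], [30.0, 2_000_000.0]]
scale = fit_scaler(raw_history)
scaled = [scale(row) for row in raw_history]

# Ignore variable 1 entirely by giving it zero weight.
diff_first_only = weighted_difference(scaled[0], scaled[1], [1.0, 0.0])
```

Without scaling, the variable measured in millions would dominate any distance between states; after scaling, both variables contribute on the same footing and the weights decide which ones matter.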

17 Implementation of NetMedic: Monitor components, producing component states. Diagnose, given target components, diagnosis time, and reference time: (a) edge impact, (b) path impact. Output: ranked list of likely culprits.
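The step from edge impact to path impact to a ranked list might look like the sketch below. The slide does not specify how edge impacts are combined along a path; taking the product of edge weights along the strongest simple path is an assumption made here, as are the function names and the example weights.

```python
def path_impact(edges, src, dst, seen=None):
    """Strongest-path impact of src on dst: the maximum, over simple paths,
    of the product of edge impacts (an assumed combining rule)."""
    if src == dst:
        return 1.0
    seen = (seen or set()) | {src}
    best = 0.0
    for nxt, weight in edges.get(src, {}).items():
        if nxt not in seen:
            best = max(best, weight * path_impact(edges, nxt, dst, seen))
    return best


def rank_culprits(edges, target):
    """Return (component, impact) pairs, most likely culprit first."""
    names = set(edges) | {n for nbrs in edges.values() for n in nbrs}
    scores = {n: path_impact(edges, n, target) for n in names if n != target}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up edge impacts: edges[src][dst] is the estimated impact of src on dst.
edges = {
    "C1": {"Svr": 0.9},
    "C2": {"Svr": 0.2},
    "C1 config": {"C1": 0.5},
}
ranking = rank_culprits(edges, target="Svr")
culprit_names = [name for name, _ in ranking]
```

In this example "C1" ranks first for the target "Svr", followed by "C1 config" (which reaches the server only through C1) and then "C2". The exhaustive path search is fine for a toy graph but would need a more efficient algorithm at the roughly 1000 components mentioned in the evaluation.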

18 Evaluation setup: 10 actively used desktops; servers running IIS, SQL, Exchange, …; diverse set of faults observed in the logs; #components ~1000; #dimensions per component (avg) 35

19 NetMedic assigns low ranks (i.e., positions near the top of the ranked list) to actual culprits

20 NetMedic handles concurrent faults well (2 simultaneous faults)

21 Other results in the paper: NetMedic needs a modest amount (~60 mins) of history; it compares favorably with a method that understands variable semantics

22 Conclusions: NetMedic enables detailed diagnosis in enterprise networks without application knowledge. Think small: small enterprise networks deserve more attention.

