Fingerprinting the Datacenter Marcel Flores Shih-Chi Chen.


1 Fingerprinting the Datacenter
Marcel Flores, Shih-Chi Chen

2 Motivation
- Large datacenters often encounter large and complex crises
- Crises take the form of performance dipping below SLAs
- Often complex and difficult to diagnose
- Can be costly to operators

3 Approach
- Quantify the state of the datacenter in a compact manner
- The compact state can be compared to past crises
- Allows for easy identification and diagnosis of crises

4 Fingerprints
- Tracks quantiles for each metric
- Determines hot/normal/cold status for each metric
- Includes only relevant metrics
- Uses a similarity metric for comparison

5 Fingerprint - details
- Track quantiles of each metric
- Quantiles are resistant to outliers
- Measure the 25th, 50th, and 95th percentiles
- Determine whether each measurement is Hot (>98th percentile), Cold (<2nd percentile), or Normal
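
The quantile tracking and hot/cold labeling on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`quantile`, `summarize`, `discretize`), the linear-interpolation quantile rule, and the idea that the 2nd/98th percentile cutoffs are taken over a list of historical values are all assumptions made for the example.

```python
def quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile() needs at least one value")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def summarize(server_values, qs=(0.25, 0.50, 0.95)):
    """Collapse one metric's per-server readings into a few quantiles."""
    return [quantile(server_values, q) for q in qs]


def discretize(value, history, cold_q=0.02, hot_q=0.98):
    """Label a value as hot (1), cold (-1), or normal (0).

    Assumption: the cutoffs are percentiles of past values in `history`.
    """
    if value > quantile(history, hot_q):
        return 1
    if value < quantile(history, cold_q):
        return -1
    return 0


if __name__ == "__main__":
    history = list(range(101))           # hypothetical past values 0..100
    print(summarize([1, 2, 3, 4, 5]))    # three quantiles of one metric
    print(discretize(99, history))       # above the hot cutoff
    print(discretize(50, history))       # within the normal band
```

Summarizing with quantiles rather than a mean keeps a single misbehaving server from dominating the fingerprint, which matches the "resistant to outliers" point above.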

6 Relevant Metrics
- Select metrics via feature selection and classification
- Technique from statistical machine learning
- Eliminates noise from the fingerprints
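
The slide does not say which feature-selection technique is used, so the sketch below swaps in a deliberately simple stand-in: rank each metric by how far its mean shifts between normal and crisis epochs, scaled by the overall spread, and keep the top `k`. The names `separation_score` and `select_relevant`, the scoring rule, and the sample data are hypothetical and serve only to show the idea of dropping metrics that do not separate crises from normal operation.

```python
import statistics


def separation_score(normal, crisis):
    """Mean shift between crisis and normal values, scaled by overall spread."""
    spread = statistics.pstdev(normal + crisis)
    if spread == 0:
        return 0.0  # a constant metric carries no signal
    return abs(statistics.mean(crisis) - statistics.mean(normal)) / spread


def select_relevant(normal_epochs, crisis_epochs, k):
    """Return the k metric names with the highest separation score.

    Both arguments map a metric name to a list of observed values.
    """
    scores = {
        name: separation_score(normal_epochs[name], crisis_epochs[name])
        for name in normal_epochs
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical readings for three metrics.
    normal = {"cpu": [10, 11, 9, 10], "disk": [5, 6, 5, 6], "queue": [1, 2, 1, 2]}
    crisis = {"cpu": [50, 52, 49, 51], "disk": [5, 6, 6, 5], "queue": [1, 3, 2, 2]}
    print(select_relevant(normal, crisis, k=2))
```

A classifier-based selector would replace `separation_score`; the surrounding structure (score every metric, keep the most informative ones) stays the same.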

7 Identification
- Define a similarity metric
- Allows comparison between current state fingerprint and known crisis fingerprints
- Identification Threshold determines when two fingerprints are considered the same
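
As a rough sketch of the identification step, the example below compares a current fingerprint against labeled crisis fingerprints and reports a match only when the closest one falls within the identification threshold. The slide does not name the similarity metric, so Euclidean distance is assumed here; the fingerprint encoding (one -1/0/1 entry per metric), the function names, and the crisis labels are likewise hypothetical.

```python
import math


def distance(fp_a, fp_b):
    """Euclidean distance between two equal-length fingerprints (assumed metric)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))


def identify(current, known, threshold):
    """Return the label of the nearest known crisis, or None if none is close enough.

    `known` maps a crisis label to its fingerprint.
    """
    best_label, best_dist = None, math.inf
    for label, fingerprint in known.items():
        d = distance(current, fingerprint)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_label is not None and best_dist <= threshold:
        return best_label
    return None  # treated as a previously unseen crisis


if __name__ == "__main__":
    known = {"overload": [1, 1, 0, -1], "db_config": [-1, 0, 0, 1]}
    print(identify([1, 1, 0, 0], known, threshold=1.5))
    print(identify([0, 0, 1, 0], known, threshold=1.0))
```

Returning `None` when nothing is within the threshold is what lets the scheme distinguish a known crisis from an unknown one instead of always forcing the nearest label.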

8 Evaluation
- Used data gathered from a real production datacenter
- Hundreds of servers
- 240 days of data
- About 100 metrics per server

9 Evaluation Criteria
- Discrimination: when are two crises different?
- Identification Stability: when does it provide a consistent suggestion?
- Identification Accuracy: when does it provide the correct label?

10 Offline
- Uses all known data
- Attempts to recall the crises that it saw
- Provides a baseline: what is the best possible result (if it knew everything)?
- Dominates existing methods; near-perfect results

11 Quasi-Online
- More realistic, but still computes the thresholds offline
- Doesn’t know the future
- Known and Unknown accuracy of 85%

12 Online
- Everything computed online, on the fly, including the Identification Threshold
- Both accuracies reached 80% (with 10 seeding crises)
- 78% known, 74% unknown (with 2 seeding crises)
- Does well even with a smaller seeding set!

13 A note on Thresholds
- Hot/Cold thresholds were selected arbitrarily
- Ran evaluations with varied values from other statistical methods
- These showed reduced discriminative power (95%, down from 99%)
- Why mess with what works?

