The Mystery Machine: End-to-end performance analysis of large-scale Internet services. Michael Chow, David Meisner, Jason Flinn, Daniel Peek, Thomas F. Wenisch.

1 The Mystery Machine: End-to-end performance analysis of large-scale Internet services. Michael Chow, David Meisner, Jason Flinn, Daniel Peek, Thomas F. Wenisch

2 Internet services are complex
(Figure: tasks plotted against time)
Scale and heterogeneity make Internet services complex

4 Analysis Pipeline

5 Step 1: Identify segments

6 Step 2: Infer causal model

7 Step 3: Analyze individual requests

8 Step 4: Aggregate results

9 Challenges
Previous methods derive a causal model:
- Instrument scheduler and communication
- Build model through human knowledge
Need a method that works at scale with heterogeneous components

10 Opportunities
- Component-level logging is ubiquitous: tremendous detail about a request's execution
- Handle a large number of requests: coverage of a large range of behaviors

11 The Mystery Machine
1) Infer causal model from large corpus of traces
- Identify segments
- Hypothesize all possible causal relationships
- Reject hypotheses with observed counterexamples
2) Analysis
- Critical path, slack, anomaly detection, what-if

12 Step 1: Identify segments

13 Define a minimal schema
(Figure labels: Task, Event, Segment 1, Segment 2)

14 Define a minimal schema
(Figure labels: Task, Event, Segment 1, Segment 2)
Schema fields:
- Request identifier
- Machine identifier
- Timestamp
- Task
- Event
Aggregate existing logs using minimal schema
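
The five schema fields above can be sketched as a record type, with per-request grouping as the aggregation step. This is a minimal sketch, not the authors' implementation: the names `LogEvent`, `group_by_request`, and `segments` are hypothetical, timestamps are assumed to be comparable across machines (no clock-skew correction), and a segment is assumed to be the interval between two consecutive events of the same task.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One log record in the minimal schema (field names are assumptions)."""
    request_id: str
    machine_id: str
    timestamp_ms: float
    task: str
    event: str


def group_by_request(events):
    """Collect events from many component logs into per-request traces,
    each sorted by timestamp."""
    traces = defaultdict(list)
    for e in events:
        traces[e.request_id].append(e)
    for evs in traces.values():
        evs.sort(key=lambda e: e.timestamp_ms)
    return dict(traces)


def segments(trace):
    """Pair consecutive events of the same task into segments:
    (task, start_event, end_event, start_ms, end_ms)."""
    last_seen = {}
    out = []
    for e in trace:
        prev = last_seen.get(e.task)
        if prev is not None:
            out.append((e.task, prev.event, e.event,
                        prev.timestamp_ms, e.timestamp_ms))
        last_seen[e.task] = e
    return out
```

Keeping the schema this small is what lets existing, heterogeneous component logs be mapped into it without re-instrumenting each component.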

15 Step 2: Infer causal model

16 Types of causal relationships
(Table: each relationship shown next to a counterexample that would disprove it; the timeline diagrams use segments A, B, C and A', B', C' on tasks t1 and t2)
- Happens-Before
- Mutual Exclusion
- Pipeline
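
To make the idea of a counterexample concrete, the checks for the first two relationship types can be sketched as interval predicates. This is an illustrative sketch under assumptions: a segment is represented as a hypothetical `(start, end)` tuple, the function names are invented here, and the pipeline relationship is left out because its diagram did not survive in this transcript.

```python
def violates_happens_before(a, b):
    """Counterexample to 'a happens before b': b starts before a has ended.
    a and b are (start, end) tuples from the same trace."""
    return b[0] < a[1]


def violates_mutual_exclusion(a, b):
    """Counterexample to 'a and b never run at the same time':
    the two intervals overlap."""
    return a[0] < b[1] and b[0] < a[1]
```

A single trace in which the predicate returns `True` is enough to reject the corresponding hypothesis for that pair of segments.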

17 Producing causal model
(Figure: causal model over segments S1, N1, C1, S2, N2, C2)

18 Producing causal model
(Figure: causal model over segments S1, N1, C1, S2, N2, C2, checked against Trace 1 drawn on a time axis)

20 Producing causal model
(Figure: causal model over segments S1, N1, C1, S2, N2, C2, checked against Trace 2 drawn on a time axis)
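
The refinement shown across these slides (start from every possible relationship, then drop whatever a trace contradicts) can be sketched for the happens-before case. This is a simplified sketch, not the authors' code: `infer_happens_before` is a hypothetical name, each trace is assumed to be a dict mapping a segment name to a `(start, end)` pair, and any timings fed to it are the caller's own.

```python
from itertools import permutations


def infer_happens_before(traces):
    """Return the set of (a, b) pairs for which 'a ends before b starts'
    was never contradicted by any trace in the corpus."""
    names = set()
    for trace in traces:
        names.update(trace)

    # Hypothesize every ordered pair, then reject on counterexamples.
    hypotheses = set(permutations(sorted(names), 2))
    for trace in traces:
        for a, b in list(hypotheses):
            if a in trace and b in trace:
                a_end = trace[a][1]
                b_start = trace[b][0]
                if b_start < a_end:  # b began before a finished
                    hypotheses.discard((a, b))
    return hypotheses
```

Because a relationship survives only while no trace contradicts it, the model can contain spurious edges on a small corpus; more traces covering more behaviors prune it further.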

23 Step 3: Analyze individual requests

24 Critical path using causal model
(Figure: Trace 1 drawn on a time axis over segments S1, N1, C1, S2, N2, C2, with the critical path traced through the causal model)
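
One way to picture the critical-path step is as a longest-path search over the causal model, weighted by the segment durations observed in one trace. The sketch below makes several assumptions: the model is given as happens-before edges forming a DAG, waiting time between segments is ignored, and `critical_path` is a hypothetical helper rather than the tool's actual interface.

```python
from functools import lru_cache


def critical_path(durations, edges):
    """durations: dict segment -> duration for one request.
    edges: iterable of (a, b) meaning a must finish before b starts.
    Returns (total_duration, path_tuple) for the longest dependency chain.
    Assumes the edges form a DAG."""
    preds = {s: [] for s in durations}
    for a, b in edges:
        if a in durations and b in durations:
            preds[b].append(a)

    @lru_cache(maxsize=None)
    def best(seg):
        # Longest chain ending at seg: extend the best predecessor chain.
        options = [best(p) for p in preds[seg]]
        total, path = max(options, default=(0.0, ()))
        return (total + durations[seg], path + (seg,))

    return max((best(s) for s in durations), default=(0.0, ()))
```

Segments that are not on the returned path can lengthen without changing the total, which is the property the later slack analysis relies on.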

27 Step 4: Aggregate results

28 Inaccuracies of Naïve Aggregation

30 Inaccuracies of Naïve Aggregation
Need a causal model to correctly understand latency

31 High variance in critical path
Breakdown in critical path shifts drastically
- Server, network, or client can dominate latency
(Figure: CDF of percent of critical path for Server, Network, and Client)

32 High variance in critical path
For 20% of requests, server contributes 10% of latency

33 High variance in critical path
For 20% of requests, server contributes 50% or more of latency
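
A plot like the one on these slides needs two small computations: each component's share of one request's critical path, and an empirical CDF of those shares across requests. The sketch below shows one way to do both; the function names and the `(component, duration_ms)` input shape are assumptions made for illustration.

```python
from collections import defaultdict


def component_shares(critical_path):
    """critical_path: list of (component, duration_ms) for one request.
    Returns dict component -> fraction of the critical path."""
    totals = defaultdict(float)
    for component, duration in critical_path:
        totals[component] += duration
    whole = sum(totals.values())
    if whole <= 0:
        raise ValueError("critical path has no duration")
    return {c: d / whole for c, d in totals.items()}


def empirical_cdf(values):
    """Return plot points: the i-th smallest value paired with i / n."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]
```

Feeding the server share of every request into `empirical_cdf` yields the points of a server curve; the network and client curves follow the same way.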

34 Diverse clients and networks
(Figure: three panels, each showing the breakdown across Server, Network, and Client)

37 Differentiated service
Deliver data when needed and reduce average response time
- No slack in server generation time: produce data faster, and end-to-end latency decreases
- Slack in server generation time: produce data slower, and end-to-end latency stays the same

38 Additional analysis techniques
- Slack analysis
- What-if analysis: use natural variation in large data set
(Figure: segments S1, N1, C1, S2, N2, C2 drawn on a time axis)
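
Slack analysis can be illustrated with the textbook critical-path-method calculation: a segment's slack is how far its finish can slip before the request's end-to-end time grows. This is a sketch under assumptions, not the tool's algorithm: the causal model is reduced to happens-before edges forming a DAG, waiting time is ignored, and `segment_slack` is a hypothetical name.

```python
def segment_slack(durations, edges):
    """durations: dict segment -> duration for one request.
    edges: iterable of (a, b) meaning a must finish before b starts.
    Returns dict segment -> slack (latest finish minus earliest finish)."""
    succs = {s: [] for s in durations}
    preds = {s: [] for s in durations}
    for a, b in edges:
        if a in durations and b in durations:
            succs[a].append(b)
            preds[b].append(a)

    # Topological order (Kahn's algorithm).
    indegree = {s: len(preds[s]) for s in durations}
    ready = [s for s in durations if indegree[s] == 0]
    order = []
    while ready:
        s = ready.pop()
        order.append(s)
        for nxt in succs[s]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(durations):
        raise ValueError("edges contain a cycle")

    # Forward pass: earliest finish of every segment.
    earliest = {}
    for s in order:
        start = max((earliest[p] for p in preds[s]), default=0.0)
        earliest[s] = start + durations[s]
    end_to_end = max(earliest.values(), default=0.0)

    # Backward pass: latest finish that keeps end_to_end unchanged.
    latest = {}
    for s in reversed(order):
        latest[s] = min((latest[n] - durations[n] for n in succs[s]),
                        default=end_to_end)
    return {s: latest[s] - earliest[s] for s in order}
```

Segments on the critical path come out with zero slack; any segment with positive slack could take that much longer without delaying the request.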

39 What-if questions
- Does server generation time affect end-to-end latency?
- Can we predict which connections exhibit server slack?

40 Server slack analysis
(Figure panels: Slack < 25 ms and Slack > 2.5 s)

41 Server slack analysis
- Slack < 25 ms: end-to-end latency increases as server generation time increases
- Slack > 2.5 s: server generation time has little effect on end-to-end latency

42 Predicting server slack
Predict slack at the receipt of a request
Past slack is representative of future slack
(Figure axes: First slack (ms), Second slack (ms))

43 Predicting server slack
Predict slack at the receipt of a request
Past slack is representative of future slack
(Figure axes: First slack (ms), Second slack (ms))
- Classifies 83% of requests correctly
- Type I error: 8%
- Type II error: 9%
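
The idea that past slack predicts future slack can be evaluated with a simple threshold rule: predict that a connection's next request has slack if its previous request did. The sketch below is only an illustration. The threshold, the name `evaluate_slack_predictor`, and the convention that a Type I error means "predicted slack, but there was none" are assumptions, not details taken from the slides.

```python
def evaluate_slack_predictor(pairs, threshold_ms):
    """pairs: iterable of (first_slack_ms, second_slack_ms) per connection.
    Predict 'second request has slack' when the first slack met the
    threshold, then report accuracy and the two error rates."""
    total = correct = type1 = type2 = 0
    for first, second in pairs:
        predicted = first >= threshold_ms
        actual = second >= threshold_ms
        total += 1
        if predicted == actual:
            correct += 1
        elif predicted:
            type1 += 1  # predicted slack, none observed (assumed Type I)
        else:
            type2 += 1  # predicted no slack, slack observed (assumed Type II)
    if total == 0:
        raise ValueError("no request pairs to evaluate")
    return {
        "accuracy": correct / total,
        "type_i": type1 / total,
        "type_ii": type2 / total,
    }
```

Under this convention a Type I error is the costly one for differentiated service, since the server would deliberately slow a response that had no slack to spare.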

44 Conclusion

45 Questions

