12 November 2003 Rebecca Isaacs Paul Barham Richard Mortier Dushyanth Narayanan Microsoft Research Cambridge James Bulpin University of Cambridge Magpie:


1 Magpie: Distributed request tracking for realistic performance modelling
  12 November 2003
  Rebecca Isaacs, Paul Barham, Richard Mortier, Dushyanth Narayanan (Microsoft Research Cambridge)
  James Bulpin (University of Cambridge)

2 Performance in distributed systems
  - Faults in distributed systems are notoriously hard to diagnose
  - Performance problems are even more subtle to debug
    - Often transient or affect only a subset of requests / users
    - Frequently involve complex interactions between multiple machines
    - Aggregate statistics (e.g. utilization) may look perfectly normal

3 Magpie Approach
  - Track individual requests end to end
    - Observe control flow (causality)
    - Monitor resource consumption: CPU, bandwidth, disk
    - Debug performance in the small
  - Build a probabilistic workload model from the aggregate requests
    - Cluster similar requests according to their observed behaviour
    - Debug performance in the large

4 How do we use this information?
  - Performance debugging
    - Why did this request take much longer than that request?
  - Fault detection
  - Configuration and management
  - Performance prediction
    - Realistic workload models for capacity planning
    - Obtain automatically on a live system

5 Magpie components
  - Instrumentation
    - System activity recorded to logs
  - Generic request parser
    - Extract individual requests from logs according to an event schema
  - Model construction
    - Behavioural clusters
    - Probabilistic state machine

6 Outline
  - Introduction
  - What is a request?
  - Instrumentation
  - Request extraction
  - Modelling
  - Current status

7 What is a request?
  - System activity which takes place in response to an action initiated by the application being traced
    - HTTP request
    - Database query
    - File open request
  - We describe a request as
    - The sequence of application components involved in its processing
    - The resource consumed at each stage: CPU, bandwidth, disk transfer size, (latency)

8 A typical e-commerce site (1)
  [Diagram labels: Web Front Ends, SQL Servers, Storage, Internet]

9 A typical e-commerce site (2)
  [Diagram labels: Filter, Kernel, http.sys, CLR, IIS, Kernel, Web Server, Application Logic, WinSock2 API, SQL Server, Stored procedures, Static Content, ASP.NET, ADO.NET, WinSock2 API, Data]

10 HTTP request: detailed view
  [Timeline figure covering 10.051s to 10.155s. First panel rows: WEB.eec, WEB.398, Disk, Net RX, Net TX. Second panel rows: Net TX, Net RX, Disk, SQL.9c4.]
  Annotations:
  - HTTP request packet
  - IIS worker thread picks up request from http.sys
  - ASP.NET worker thread takes over
  - Sync WinSock send to SQL Server
  - ASP.NET thread blocks after RPC to database
  - TDS request and reply packets sent and received
  - SQL thread unblocks
  - HTTP response packets sent back to client
  - IIS worker thread wakes up to write log
  Key: Blocked, IIS, ASP.NET, SQL, Disk, Other

11 Why is request tracking hard?
  - Many components, multiple machines
    - Must track control flow across machines
  - No globally unique request ID
    - Components are developed independently
  - Multiple thread pools
    - Many threads participate in processing a request
  - Asynchronous communication
    - Must match send/recvs between threads/machines
  - Hand-rolled synchronization primitives
    - SQL Server has user-mode scheduler

12 Outline
  - Introduction
  - What is a request?
  - Instrumentation
  - Request extraction
  - Modelling
  - Current status

13 Event Tracing for Windows
  - Low-overhead event mechanism
  - Events timestamped with cycle counter
  - Global ordering on events on a single machine
  - Can enable/disable sets of events at runtime
  - Using ETW in Magpie
    - Each instrumentation point posts an event
    - Events are logged to disk
    - Logs are post-processed to extract requests
    - Can also consume events in real time

14 Instrumentation points
  - Existing ETW event providers: IIS, kernel
  - App-specific hooks: IIS, ASP.NET, SQL Server
  - Detours: wrap DLLs to trap Win32 and WinSock2 calls
  - WinPcap: capture packets on the wire

15 CPU usage from kernel events
  - The ETW kernel logger records every context switch
  - How do we know which cycles are used for which request?
  - We can attribute cycles to a request by
    - An application-specific event which occurs within a delimited sector of CPU time, or
    - The current context of execution, e.g. thread id
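A minimal sketch of the second attribution rule (current context of execution), assuming the context-switch log for one CPU has already been parsed into `(cycle_count, new_thread_id)` pairs. The function name, the tuple layout and the static thread-to-request map are illustrative assumptions, not ETW's record format or Magpie's code; in a real trace the owner of a thread changes over time.

```python
from collections import defaultdict

def attribute_cycles(cswitches, thread_to_request, end_cycle):
    """Charge the cycles between context switches to the request owning the running thread.

    cswitches: list of (cycle_count, new_thread_id), sorted by cycle_count, for one CPU.
    thread_to_request: hypothetical map from thread id to request id.
    end_cycle: cycle count at which the trace ends.
    """
    usage = defaultdict(int)
    following = cswitches[1:] + [(end_cycle, None)]
    for (start, tid), (next_start, _) in zip(cswitches, following):
        request = thread_to_request.get(tid)
        if request is not None:          # threads with no owner are left unattributed
            usage[request] += next_start - start
    return dict(usage)

# Thread 0xa runs twice for req1, thread 0xb once for req2, thread 0xc is unowned.
events = [(100, 0xa), (250, 0xb), (400, 0xa), (450, 0xc)]
owners = {0xa: "req1", 0xb: "req2"}
cpu = attribute_cycles(events, owners, end_cycle=500)
print(cpu)
```

Because events on a single machine are globally ordered by cycle counter, the difference between consecutive context-switch timestamps is the CPU time of the thread that was switched in.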

16 Example: protocol processing in a DPC
  [Timeline figure. Event labels along the time axis: cswitch, DPC start, DPC end, pkt recv, cswitch. Cycle counts are shown separately for Request 1 and Request 2.]
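The DPC case can be sketched as follows, under the assumption that the packet-receive event falls between the DPC start and DPC end events and names a connection whose owning request is known. Cycles inside the DPC are then charged to the packet's request rather than to the request of the interrupted thread. Event kinds, tuple layout and the two owner maps are hypothetical names chosen for this example.

```python
def attribute_with_dpc(events, thread_owner, conn_owner):
    """events: ordered (cycles, kind, arg) tuples for one CPU.

    kind is one of 'cswitch' (arg = new thread id), 'dpc_start',
    'pkt_recv' (arg = connection id) or 'dpc_end'.
    """
    usage = {}

    def charge(request, cycles):
        if request is not None:
            usage[request] = usage.get(request, 0) + cycles

    thread = None       # thread currently scheduled on this CPU
    since = None        # cycle count at which the current thread interval began
    dpc_began = None
    dpc_request = None
    for cycles, kind, arg in events:
        if kind == "cswitch":
            if since is not None:
                charge(thread_owner.get(thread), cycles - since)
            thread, since = arg, cycles
        elif kind == "dpc_start":
            if since is not None:
                charge(thread_owner.get(thread), cycles - since)
            dpc_began, dpc_request = cycles, None
        elif kind == "pkt_recv":
            dpc_request = conn_owner.get(arg)   # the packet decides who pays for the DPC
        elif kind == "dpc_end":
            charge(dpc_request, cycles - dpc_began)
            since = cycles                      # the interrupted thread resumes here
    return usage

trace = [
    (0, "cswitch", 0xa),
    (100, "dpc_start", None),
    (120, "pkt_recv", 0xd),
    (160, "dpc_end", None),
    (300, "cswitch", 0xb),
]
usage = attribute_with_dpc(trace, {0xa: "req1"}, {0xd: "req2"})
print(usage)
```

Here thread 0xa works for req1 for 240 cycles in total, while the 60 cycles of protocol processing that interrupted it are charged to req2.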

17 Application and middleware events
  - Cover points where flow of control moves between components
  - Cover points where resources are multiplexed and demultiplexed
    - E.g. user-level scheduling primitives
  - Propagation of a global request id is not required!
    - Magpie used to do this but not any more

18 Instrumenting a web service
  [Diagram: the web server and SQL Server components of slide 9 with instrumentation added. Labels not present on slide 9: HTTPModule, Wrappers, ISAPI Filter, CLR profiler, WinSock2 API Intercept (twice), Event Tracing for Windows (twice), Packet capture (twice), Extended SPs]

19 Outline
  - Introduction
  - What is a request?
  - Instrumentation
  - Request extraction
  - Modelling
  - Current status

20 Generic request extraction
  - No inbuilt assumptions about the system or the application
  - No common unique identifier
  - Schema specifies semantics of events
    - Easy to add new event types
  - Parser stitches events into requests based on event semantics

21 Terminology
  - Namespace: event parameter which references an entity in the system, e.g. thread id
  - Timeline: instantiation of a namespace with a unique value, e.g. thread id = 0xa
  - Events bind or unbind requests to timelines
  - Bindings capture the semantics of each event for a particular request type

22 Example: connecting events
  [Figure. Timelines: Cpuid=0, Tid=0xa, Tid=0xb, Connid=0xd. Event labels: Enter Recv, cswitch, DPC start, DPC end, Recv returns, TCP pkt. Events are grouped into Request 1 and Request 2.]
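To make the namespace, timeline and binding vocabulary concrete, here is a small sketch of a schema-driven parser. The schema format, the event names (`TcpPkt`, `RecvReturns`, `SendReply`) and the `starts_request` flag are assumptions made for this example; the slides do not show Magpie's actual schema language.

```python
# Hypothetical schema: for each event type, the namespaces whose timelines the
# event joins (bind) and the namespaces it releases afterwards (unbind).
SCHEMA = {
    "TcpPkt":      {"bind": ["connid"], "unbind": [], "starts_request": True},
    "RecvReturns": {"bind": ["connid", "tid"], "unbind": []},
    "SendReply":   {"bind": ["tid"], "unbind": ["tid", "connid"]},
}

def stitch(events):
    """Group events into requests using only per-event bind/unbind semantics."""
    bound = {}       # timeline (namespace, value) -> request id
    requests = {}    # request id -> list of event names
    next_id = 1
    for name, params in events:
        rule = SCHEMA[name]
        timelines = [(ns, params[ns]) for ns in rule["bind"]]
        # The event belongs to whichever request already owns one of its timelines.
        request = next((bound[t] for t in timelines if t in bound), None)
        if request is None and rule.get("starts_request"):
            request, next_id = next_id, next_id + 1
        if request is None:
            continue                      # cannot be tied to any request
        for t in timelines:
            bound[t] = request            # bind: the request now owns these timelines
        requests.setdefault(request, []).append(name)
        for ns in rule["unbind"]:
            bound.pop((ns, params[ns]), None)
    return requests

trace = [
    ("TcpPkt",      {"connid": 0xd}),
    ("TcpPkt",      {"connid": 0xe}),
    ("RecvReturns", {"connid": 0xe, "tid": 0xb}),
    ("RecvReturns", {"connid": 0xd, "tid": 0xa}),
    ("SendReply",   {"tid": 0xa, "connid": 0xd}),
    ("SendReply",   {"tid": 0xb, "connid": 0xe}),
]
stitched = stitch(trace)
print(stitched)
```

No event carries a request id. The `RecvReturns` event mentions both a connection and a thread, so it transfers ownership from the connection timeline to the thread timeline, which is how two interleaved requests stay separate.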

23 End-to-end request extraction
  - An instance of the request parser runs on each machine in the distributed system
    - Online or offline mode
  - Offline post-processing connects request fragments from each node according to a globally unique namespace, e.g. packet IP identifier
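The offline join can be sketched as a union-find over request fragments that share a packet identifier. The fragment records and their field names are hypothetical. Note also that the IP identification field is only 16 bits wide and wraps, so a real join would presumably need to combine it with addresses or a time window; that refinement is omitted here.

```python
def join_fragments(fragments):
    """Group per-machine request fragments that saw at least one common packet id.

    fragments: list of dicts with a 'name' and a set of 'packet_ids'.
    Returns a sorted list of sorted name groups, one group per end-to-end request.
    """
    parent = list(range(len(fragments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    first_seen = {}                          # packet id -> index of first fragment seen
    for i, frag in enumerate(fragments):
        for pid in frag["packet_ids"]:
            if pid in first_seen:
                parent[find(i)] = find(first_seen[pid])
            else:
                first_seen[pid] = i

    groups = {}
    for i, frag in enumerate(fragments):
        groups.setdefault(find(i), []).append(frag["name"])
    return sorted(sorted(g) for g in groups.values())

fragments = [
    {"name": "web/1", "packet_ids": {101, 102}},
    {"name": "sql/7", "packet_ids": {102, 103}},
    {"name": "web/2", "packet_ids": {200}},
    {"name": "sql/8", "packet_ids": {200, 201}},
]
joined = join_fragments(fragments)
print(joined)
```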

24 Outline
  - Introduction
  - What is a request?
  - Instrumentation
  - Request extraction
  - Modelling
  - Current status

25 Clustering for workload generation
  - Target the Indy performance modelling tool
    - Calculates throughput, bottlenecks
    - Needs transaction mix, resource consumption
  - Previously: microbenchmark approach
    - Run 10000 of each transaction type (URL)
    - Divide aggregate resource usage by 10000
  - Aim: provide realistic workload models
    - From real, mixed workloads
    - Derive transaction types automatically

26 Single request: cartoon view
  - Partial ordering of events
  - Annotated with resource usage
  [Figure legend: IIS CPU, ASP.NET CPU, SQL Server CPU, Disk, Network]

27 Behavioural clustering of requests
  - Represent requests as event strings
    - Flatten out any concurrency
  - Use Levenshtein string edit distance
    - Modified to factor in resource usage vectors
  - Cluster requests based on this distance
    - Linear-time algorithm
  - Each cluster is a request type
    - Select representative from near centroid
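A rough sketch of this pipeline follows. The Levenshtein routine is the standard dynamic program. Two parts are assumptions rather than anything stated on the slide: the way resource usage enters the distance (a weighted L1 gap added to the edit distance) and the choice of single-pass leader clustering as a stand-in for the unnamed linear-time algorithm.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two event strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def request_distance(r1, r2, weight=1.0):
    """Edit distance plus a weighted L1 gap between resource vectors (assumed form)."""
    events1, resources1 = r1
    events2, resources2 = r2
    gap = sum(abs(x - y) for x, y in zip(resources1, resources2))
    return edit_distance(events1, events2) + weight * gap

def cluster(requests, threshold):
    """One pass over the requests: join the first cluster whose leader is close enough."""
    clusters = []                     # list of (leader, members)
    for r in requests:
        for leader, members in clusters:
            if request_distance(leader, r) <= threshold:
                members.append(r)
                break
        else:
            clusters.append((r, [r]))
    return clusters

# Each request: (event string, normalised resource vector). Letters are made-up event codes.
requests = [
    ("HASR",   (0.2, 0.0)),
    ("HASR",   (0.3, 0.0)),
    ("HAQQSR", (0.9, 0.5)),
    ("HAQSR",  (0.8, 0.5)),
]
clusters = cluster(requests, threshold=1.5)
print([len(members) for _, members in clusters])
```

With a bounded number of clusters, each request is compared against a bounded number of leaders, so the pass is linear in the number of requests. The leader here is simply the first member, not a representative chosen near the centroid.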

28 Build a workload model by clustering similar requests
  - Requests in the same cluster often have different URLs, and one URL may appear in many clusters
  [Figure: five clusters labelled with their share of requests: A 7%, B 10%, C 15%, D 5%, E 63%]
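Once every request carries a cluster label, the transaction mix is just the relative cluster sizes. The sketch below reproduces the percentages on this slide from synthetic assignments; the URLs and the pairing of URLs with clusters are invented for illustration.

```python
from collections import Counter

def workload_mix(assignments):
    """assignments: list of (url, cluster_label) pairs, one per request."""
    counts = Counter(label for _, label in assignments)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}

def clusters_per_url(assignments):
    """Map each URL to the set of behavioural clusters its requests fell into."""
    seen = {}
    for url, label in assignments:
        seen.setdefault(url, set()).add(label)
    return seen

assignments = (
    [("/home", "A")] * 7
    + [("/search", "B")] * 10
    + [("/search", "C")] * 15
    + [("/buy", "D")] * 5
    + [("/home", "E")] * 63
)
mix = workload_mix(assignments)
by_url = clusters_per_url(assignments)
print(mix)
print(by_url["/home"])
```

The second function illustrates the point made on the slide: a URL is not a reliable transaction type, because the same URL can land in several behavioural clusters.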

29 Taking it further: work-in-progress
  - Online and incremental modelling
    - Detect component failure
    - Detect sudden shifts in workload
  - More sophisticated models
    - Learn the probabilistic state machine for each request
    - cf. flowcharts annotated with performance information
  - Bayesian watchdogs
    - Compute the likelihood of a request's behaviour as it moves through the system
    - Deal with unlikely requests appropriately
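One way to picture the watchdog idea is a running log-likelihood that is updated after every event of an in-flight request. The sketch uses a first-order transition model with additive smoothing purely as a stand-in; the slides do not say which model the watchdogs use, and all names here are hypothetical.

```python
import math

def learn_transitions(event_strings):
    """Count event-to-event transitions, with '^' and '$' as start and end markers."""
    counts = {}
    for s in event_strings:
        path = "^" + s + "$"
        for a, b in zip(path, path[1:]):
            row = counts.setdefault(a, {})
            row[b] = row.get(b, 0) + 1
    return counts

def running_log_likelihood(counts, events, alphabet_size, smoothing=1.0):
    """Return the cumulative log-likelihood after each observed event."""
    total, prev, trail = 0.0, "^", []
    for e in events:
        row = counts.get(prev, {})
        p = (row.get(e, 0) + smoothing) / (sum(row.values()) + smoothing * alphabet_size)
        total += math.log(p)
        trail.append(total)
        prev = e
    return trail

# Training set of made-up event strings: mostly "HASR", occasionally "HAQSR".
model = learn_transitions(["HASR"] * 9 + ["HAQSR"])
usual = running_log_likelihood(model, "HASR", alphabet_size=6)
odd = running_log_likelihood(model, "HQQQ", alphabet_size=6)
print(usual[-1], odd[-1])
```

A watchdog could compare the running value against a threshold and flag the request as soon as it drops too low, without waiting for the request to complete.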

30 Outline
  - Introduction
  - What is a request?
  - Instrumentation
  - Request extraction
  - Modelling
  - Current status

31 Current status
  - Recent focus has been developing a generic request extraction scheme
  - Prototype for 2-machine e-commerce site
    - TPC-W style workload
  - Prototype for single machine SQL Server 2000
    - Challenge is user mode scheduler
    - TPC-C workload
  - Other applications on the way
    - Large-scale
    - Real systems with real performance problems

32 Conclusion
  - Magpie is a tool for performance analysis in a distributed system
    - Bottom up, per-request approach
  - Complementary to existing techniques:
    - Performance counters
    - Program profiling
  - Feeds into performance debugging and prediction tools

33 Work-in-progress: learning the probabilistic state machine
  - Infer a stochastic regular grammar from a sample set of strings
    - Each state transition emits a character and has an associated probability
  - Use the Alergia algorithm (Carrasco & Oncina 94)
    - Construct a prefix tree from the sample set
    - Merge similar subtrees
  - Apply to Magpie requests
    - Just event strings…
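The two Alergia steps named above can be sketched as a frequency prefix tree plus a compatibility test between two states. The test below is the Hoeffding-bound comparison of relative frequencies used by Alergia, with significance parameter `alpha`. The class and function names are illustrative, and the merge step itself (folding one subtree into another and keeping the automaton deterministic) is left out.

```python
import math

class Node:
    def __init__(self):
        self.arrivals = 0     # number of sample strings passing through this state
        self.endings = 0      # number of sample strings ending in this state
        self.children = {}    # symbol -> Node

def build_prefix_tree(samples):
    root = Node()
    for s in samples:
        node = root
        node.arrivals += 1
        for ch in s:
            node = node.children.setdefault(ch, Node())
            node.arrivals += 1
        node.endings += 1
    return root

def frequencies_differ(f1, n1, f2, n2, alpha):
    """True if f1/n1 and f2/n2 differ by more than the Hoeffding bound allows."""
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (1 / math.sqrt(n1) + 1 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) > bound

def compatible(a, b, alpha=0.05):
    """Two states are merge candidates if their ending and transition frequencies agree, recursively."""
    if frequencies_differ(a.endings, a.arrivals, b.endings, b.arrivals, alpha):
        return False
    for ch in set(a.children) | set(b.children):
        ca, cb = a.children.get(ch), b.children.get(ch)
        fa = ca.arrivals if ca is not None else 0
        fb = cb.arrivals if cb is not None else 0
        if frequencies_differ(fa, a.arrivals, fb, b.arrivals, alpha):
            return False
        if ca is not None and cb is not None and not compatible(ca, cb, alpha):
            return False
    return True

tree = build_prefix_tree(["ab", "ab", "ac"])
state_a = tree.children["a"]
print(state_a.arrivals, state_a.children["b"].endings)
print(compatible(state_a.children["b"], state_a.children["c"]))
```

With small samples the bound is loose and almost any two states look compatible, which is one reason the similarity criterion needs tuning.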

34 Ongoing work with Alergia
  - Tuning the similarity criterion
  - Factoring in resource usage information
  - Can we identify event sequences with suspiciously low probability?
    - Run online for anomaly detection?

