
Using Queries for Distributed Monitoring and Forensics Atul Singh Rice University Peter Druschel Max Planck Institute for Software Systems Timothy Roscoe.


1 Using Queries for Distributed Monitoring and Forensics
Atul Singh (Rice University)
Peter Druschel (Max Planck Institute for Software Systems)
Timothy Roscoe (Intel Research Berkeley)
Petros Maniatis (Intel Research Berkeley)

2 Building and monitoring a system
Building a distributed system is a complex undertaking
– Select properties
– Algorithms
– Implement, deploy
Switch to monitoring the system
– Testing, debugging, profiling, tuning
Monitoring is hard, error-prone
– Distributed state
– Partial faults
– Complex interactions
– Asynchronous
– External factors
Atul Singh/Rice, EuroSys 2006

3 Monitoring is hard!
Current state of the art:
– Manual insertion of “printf”
– Bringing logs to one place
– Parsing/processing of logs: scripts (Perl/Python), queries (Astrolabe)
– Offline by nature
Slide callouts:
– Expose internal state
– Ad-hoc, error-prone
– Probe exposed state
– Correlate events
– Bridge the semantic gap

4 Declarative systems: building systems via queries
Declarative specification via queries
Execution by a distributed query processor
P2 [SOSP’05]: a prototype declarative system
– Concise specifications
– Enables rapid prototyping
We present a monitoring framework for P2
– Flexible introspection
– Retains semantics of application
– Online execution tracing
Slide callouts: probe the state, expose internals

5 Overview
– Introduction
– P2 Background
– Monitoring framework
– Example applications/Performance
– Conclusions

6 Example: route operation in P2
Rule: route(B,K) :- route(A,K), nextHop(A,D,B), D == K.
Rule form: action :- event, precondition.
[Figure: dataflow graph. Tuples enter at Network In, pass through a rule strand (Join route.A == nextHop.A, Select D == K, Project route), and leave at Network Out. Application state: the nextHop and route tables. Rules are labelled R0, R1.]
[Figure: a lookup for key K travels from Router A to Router B. Router A nextHop: K -> B, K’ -> D, .. Router B nextHop: K -> C, K’ -> E, ..]
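The rule strand on this slide can be read as three relational steps applied to each incoming route tuple. The following is a minimal Python sketch of that reading, not P2 code: the class names, field names, and the `route_strand` function are hypothetical, and P2 itself compiles the rule into dataflow elements rather than a loop.

```python
from typing import List, NamedTuple


class Route(NamedTuple):
    node: str  # A: node currently holding the lookup
    key: str   # K: key being routed


class NextHop(NamedTuple):
    node: str  # A: node that owns this table entry
    dest: str  # D: key covered by the entry
    hop: str   # B: neighbor to forward to


def route_strand(event: Route, next_hops: List[NextHop]) -> List[Route]:
    """Evaluate route(B,K) :- route(A,K), nextHop(A,D,B), D == K."""
    derived = []
    for nh in next_hops:
        if nh.node != event.node:   # Join: route.A == nextHop.A
            continue
        if nh.dest != event.key:    # Select: D == K
            continue
        derived.append(Route(nh.hop, event.key))  # Project: route(B, K)
    return derived


# Router A's nextHop table as drawn on the slide: K -> B, K' -> D.
table_a = [NextHop("A", "K", "B"), NextHop("A", "K'", "D")]
print(route_strand(Route("A", "K"), table_a))
```

Only the entry whose destination equals the looked-up key survives the selection, so the lookup for K is forwarded to B and the K’ entry is ignored.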

7 Overview
– Introduction
– Background
– Monitoring framework
– Example applications/Performance
– Conclusions

8 Introspection and Logging
Introspection at three levels
– Application state level
– Rule level
– Dataflow level
Systematic instrumentation
– System is built using smaller, re-usable components
– Systematic insertion of logging statements
Logging data is in the form of tuples
– Retains semantics of application logic
– No need for translation
[Figure: rule r1 as a Join, Selection, Project strand]

9 Tracing rule executions
We want to step through the execution
– Each step corresponds to a rule
– Do it in “online” fashion
For rule level tracing
– Need to trace tuples
  1. Match output tuple to input
  2. Track tuples as they go over wire
[Figure: Node A and Node B running rules r0 and r1 over tuples x, y, z, w]

10 (1) Tracing rule executions
Matching input and output tuples of a rule
– Tap elements at the beginning and end of a rule
Execution tracer: tracks rule executions
Execution records are stored as tuples in exec table
– exec columns: input, ruleId, output, dest.
– Example row: x, r1, y, d
[Figure: taps on the input and output of rule r1 (Join, Selection, Project) feed the Execution Tracer, which writes to exec]
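The tap-and-record idea can be sketched as a wrapper that observes a rule's input and outputs and appends one row per output to an exec table. This is an illustrative Python sketch under assumptions: `Tup`, `traced`, and `r1_body` are hypothetical names, and only the column order (input, ruleId, output, dest.) is taken from the slide.

```python
from typing import Callable, List, Tuple

# Rows follow the slide's column order: (input id, rule id, output id, dest).
exec_table: List[Tuple[int, str, int, str]] = []


class Tup:
    """A tuple with a locally unique ID (hypothetical stand-in for a P2 tuple)."""
    _next_id = 0

    def __init__(self, values):
        Tup._next_id += 1
        self.id = Tup._next_id
        self.values = values


def traced(rule_id: str, body: Callable) -> Callable:
    """Wrap a rule body with taps at its beginning and end."""
    def run(inp: Tup):
        results = body(inp)  # list of (output tuple, destination node)
        for out, dest in results:
            exec_table.append((inp.id, rule_id, out.id, dest))
        return results
    return run


def r1_body(t: Tup):
    # Toy rule: forward the lookup for the same key to node B.
    _name, _node, key = t.values
    return [(Tup(("route", "B", key)), "B")]


r1 = traced("r1", r1_body)
x = Tup(("route", "A", "K"))
r1(x)
print(exec_table)
```

Because the records are themselves tuples in a table, the same query machinery that runs the application could in principle be pointed at `exec_table`.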

11 (2) Tracing tuples across wire
Each tuple has a locally unique ID
– Tuple ID is sent along with the tuple
Upon receiving, a new tuple is created with a different ID
Hooks in the network in/out handling subsystem
– A record is created in tupleTable with:
  the tuple’s local ID
  the tuple’s remote ID
  the node from which it came
[Figure: tuple x leaves Node A through Network Out and arrives at Node B through Network In as y; the tupleTable entry shown is x, y, A]
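A small sketch of the network hooks described on this slide, assuming a hypothetical `Node` class with `send` and `receive` methods. The record layout (local ID, remote ID, source node) follows the order in which the slide lists the fields; the ID format is invented for illustration.

```python
import itertools


class Node:
    def __init__(self, name: str):
        self.name = name
        self._ids = itertools.count(1)
        # Rows: (local id, remote id, node the tuple came from).
        self.tuple_table = []

    def new_id(self) -> str:
        # IDs only need to be unique on this node.
        return f"{self.name}{next(self._ids)}"

    def send(self, dest: "Node", tuple_id: str, payload) -> str:
        # Network-out hook: the sender's local ID travels with the tuple.
        return dest.receive(self.name, tuple_id, payload)

    def receive(self, source: str, remote_id: str, payload) -> str:
        # Network-in hook: assign a fresh local ID and remember the mapping.
        local_id = self.new_id()
        self.tuple_table.append((local_id, remote_id, source))
        return local_id


a, b = Node("A"), Node("B")
x = a.new_id()
y = b_local = a.send(b, x, ("route", "B", "K"))
print(b.tuple_table)
```

The receiving node never reuses the sender's ID; the table row is the only link between the two names for the same tuple.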

12 Putting it all together
Of course in reality, it’s more complicated …
– Aborted rule executions
– Pipelined rule executions
[Figure: Node A runs r0 and Node B runs r1 over tuples x, y, z, w]
Node A: exec row x, r0, y, B; tupleTable row v, x, C
Node B: exec row z, r1, w, C; tupleTable row y, z, A
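One way to combine the two tables is a backward walk: follow exec records while a tuple was derived locally, and hop to the sending node through tupleTable when it arrived over the wire. The sketch below is an assumption-laden illustration, not the paper's queries: the record types and `trace_back` are hypothetical, the sample rows reuse the letters from the slide, and aborted or pipelined executions are ignored.

```python
from typing import Dict, List, NamedTuple, Tuple


class ExecRec(NamedTuple):
    input: str
    rule: str
    output: str
    dest: str


class WireRec(NamedTuple):
    local: str   # ID assigned on the receiving node
    remote: str  # ID the tuple had on the sending node
    source: str  # node the tuple came from


def trace_back(node: str, tid: str,
               exec_tables: Dict[str, List[ExecRec]],
               tuple_tables: Dict[str, List[WireRec]]) -> List[Tuple[str, str, str, str]]:
    """Return (node, rule, input, output) steps, most recent first.

    Assumes the records are acyclic, as they are for a single causal chain.
    """
    steps = []
    while True:
        produced = [r for r in exec_tables.get(node, []) if r.output == tid]
        if produced:
            rec = produced[0]
            steps.append((node, rec.rule, rec.input, rec.output))
            tid = rec.input
            continue
        arrived = [r for r in tuple_tables.get(node, []) if r.local == tid]
        if arrived:
            node, tid = arrived[0].source, arrived[0].remote
            continue
        return steps  # no further provenance is recorded


exec_tables = {"A": [ExecRec("x", "r0", "y", "B")],
               "B": [ExecRec("z", "r1", "w", "C")]}
tuple_tables = {"A": [WireRec(local="x", remote="v", source="C")],
                "B": [WireRec(local="z", remote="y", source="A")]}
print(trace_back("B", "w", exec_tables, tuple_tables))
```

Starting from w on Node B, the walk finds r1, crosses the wire to Node A, finds r0, and stops once it reaches a node with no records.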

13 Overview
– Introduction
– Background
– Monitoring framework
– Example applications/Performance
– Conclusions

14 Example applications (I)
Distributed watchpoints: trigger an event if a condition is true
– Possibly trace back/forward
Oscillation of faulty/stale information (route flaps)
– Gossiping for stabilization or updates
Inconsistent routing in DHTs [Pastry, Chord,…]
– Each node is responsible for a unique region
– Route using distinct paths and check [Bamboo, Secure Routing]
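The "route using distinct paths and check" watchpoint can be illustrated with a small consistency test: if lookups for the same key, sent along different paths, report different owners, the watchpoint fires. This Python sketch is a simplification under assumptions; the function name, the (key, path, owner) record shape, and the sample data are all hypothetical, and in P2 such a check would be expressed as rules rather than a loop.

```python
from typing import Dict, Iterable, Set, Tuple


def inconsistent_keys(lookups: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Group lookup results by key and keep keys with more than one owner.

    Each lookup is (key, path id, owner reported at the end of that path).
    """
    owners: Dict[str, Set[str]] = {}
    for key, _path, owner in lookups:
        owners.setdefault(key, set()).add(owner)
    return {key: found for key, found in owners.items() if len(found) > 1}


results = [
    ("k1", "p1", "N5"), ("k1", "p2", "N5"),  # both paths agree
    ("k2", "p1", "N7"), ("k2", "p2", "N9"),  # paths disagree: watchpoint fires
]
print(inconsistent_keys(results))
```

A key that shows up here is a natural starting point for tracing back through the execution records to find where the two paths diverged.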

15 Example applications (II)
Online execution profiling:
– How much time is spent in each rule?
– Where are the bottlenecks?
– Which rule is costlier? Which operation?
Consistent snapshots [Chandy-Lamport]:
– Snapshot of the routing state
– Queries on the snapshots themselves
– What is the degree distribution?
– How many node-disjoint paths?
No more than 16 rules for any of the above
[Figure: rules r1, r2, r3]
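As an illustration of a query over a snapshot, the degree-distribution question can be answered by counting outgoing links per node and then counting how many nodes share each degree. The sketch assumes the snapshot is available as a list of (source, neighbor) pairs; that representation and the function name are hypothetical, and nodes with no outgoing links are not counted.

```python
from collections import Counter
from typing import Iterable, Tuple


def degree_distribution(links: Iterable[Tuple[str, str]]) -> Counter:
    """Map each out-degree to the number of nodes that have it."""
    out_degree = Counter()
    for src, _dst in set(links):  # ignore duplicate link tuples
        out_degree[src] += 1
    return Counter(out_degree.values())


# Toy snapshot of routing state; the repeated ("A", "B") link is counted once.
snapshot = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]
print(degree_distribution(snapshot))
```

The query only makes sense on a consistent snapshot: counting links while routing tables are still changing could mix state from different moments.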

16 Performance
21-node Chord overlay in P2
– Monitored node on separate, unloaded machine
Overhead of introspection
– CPU (0.98 to 1.3%), Memory (8 MB to 13 MB)
Consistent distributed snapshot
Other results in the paper
[Figure: plots with axes % CPU Util., Rate (1/#sec), Tx pkts (x1000)]

17 Related Work
– Management using database techniques [Hy+…]
– Performance debugging [Magpie, Causeway…]
– Configuration debugging for BGP, OSes [Time-travel…]
– Distributed debuggers [WiDS, Pip, Replay Debugging…]
– Deep embedded monitoring [IBM Websphere, Adaptations…]

18 Conclusions
Declarative development of systems
– Integrated approach to building and monitoring
– Automatic execution tracing
– Online, in-place monitoring
Step towards “autonomic” distributed systems
– Fault-finding tasks evolve with the system
Interesting future directions
– User interface
– Trade-off between monitoring accuracy and overhead
Questions? [Thank You]

19 Request to EuroSys
– Please schedule my next talk on the first day
– Move the submission deadline away from NSDI (last year: NSDI submission on 19th Oct, EuroSys on the 20th)

20 Questions? Thank You!

