Using Queries for Distributed Monitoring and Forensics Atul Singh Rice University Peter Druschel Max Planck Institute for Software Systems Timothy Roscoe.

Presentation transcript:

Using Queries for Distributed Monitoring and Forensics
Atul Singh, Rice University
Peter Druschel, Max Planck Institute for Software Systems
Timothy Roscoe, Intel Research Berkeley
Petros Maniatis, Intel Research Berkeley

Atul Singh/Rice, EuroSys

Building and monitoring a system
Building a distributed system is a complex undertaking
- Select properties
- Algorithms
- Implement, deploy
Switch to monitoring the system
- Testing, debugging, profiling, tuning
Monitoring is hard, error-prone
- Distributed state
- Partial faults
- Complex interactions
- Asynchronous
- External factors

Monitoring is hard!
Current state of the art:
- Manual insertion of "printf"
- Bringing logs to one place
- Parsing/processing of logs
  - Scripts (Perl/Python)
  - Queries (Astrolabe)
- Offline by nature
Slide callouts: expose internal state; ad-hoc, error-prone; probe exposed state; correlate events; bridge the semantic gap

Declarative systems: building systems via queries
Declarative specification via queries
Execution by a distributed query processor
P2 [SOSP'05]: a prototype declarative system
- Concise specifications
- Enables rapid prototyping
We present a monitoring framework for P2
- Flexible introspection
- Retains semantics of application
- Online execution tracing
Slide callouts: probe the state; expose internals

Overview
- Introduction
- P2 Background
- Monitoring framework
- Example applications/Performance
- Conclusions

Example: route operation in P2
Rule:
    route(B,K) :- route(A,K), nextHop(A,D,B), D == K.
Rule form: action :- precondition. (figure labels: event, R0, R1)
[Figure: the rule strand as a dataflow graph between Network In and Network Out: Join (route.A == nextHop.A), Select (D == K), Project (route). The route and nextHop tables are application state.]
[Figure: key K routed through Router A (nextHop: K -> B, K' -> D, ..) and Router B (nextHop: K -> C, K' -> E, ..)]
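The route rule can be read as a join, a selection, and a projection over tuples. The following is a minimal Python sketch of that reading, not P2 or OverLog code: the `Route` and `NextHop` types, the `route_rule` function, and the key name `K2` (standing in for K') are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Route(NamedTuple):
    node: str  # A: node currently holding the lookup
    key: str   # K: key being routed


class NextHop(NamedTuple):
    node: str  # A: node that owns this table entry
    dest: str  # D: key covered by the entry
    hop: str   # B: neighbor to forward to


def route_rule(event: Route, next_hops: list[NextHop]) -> list[tuple[str, Route]]:
    """Evaluate route(B,K) :- route(A,K), nextHop(A,D,B), D == K for one event.

    Returns (destination node, derived tuple) pairs.
    """
    out = []
    for nh in next_hops:
        if nh.node != event.node:  # join: route.A == nextHop.A
            continue
        if nh.dest != event.key:   # select: D == K
            continue
        out.append((nh.hop, Route(nh.hop, event.key)))  # project: route(B, K)
    return out


# Router A's nextHop table as drawn on the slide: K -> B, K' -> D.
table_a = [NextHop("A", "K", "B"), NextHop("A", "K2", "D")]
result = route_rule(Route("A", "K"), table_a)
```

Here `result` holds a single derived `route` tuple addressed to node B, which corresponds to the tuple that would leave through Network Out.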

Overview
- Introduction
- Background
- Monitoring framework
- Example applications/Performance
- Conclusions

Introspection and Logging
Introspection at three levels
- Application state level
- Rule level
- Dataflow level
Systematic instrumentation
- System is built using smaller, re-usable components
- Systematic insertion of logging statements
Logging data is in the form of tuples
- Retains semantics of application logic
- No need for translation
[Figure: rule r1 as a chain of Join, Selection, and Project elements]

Tracing rule executions
We want to step through the execution
- Each step corresponds to a rule
- Do it in "online" fashion
For rule-level tracing
- Need to trace tuples
  1. Match output tuple to input
  2. Track tuples as they go over the wire
[Figure: Node A and Node B, rules r0 and r1, tuples w, x, y, z]

(1) Tracing rule executions
Matching input and output tuples of a rule
- Tap elements at the beginning and end of a rule
Execution tracer: tracks rule executions
Execution records are stored as tuples in the exec table

exec table:
    input   ruleId   output   dest.
    x       r1       y        d

[Figure: the Execution Tracer taps the input (x) and output (y) of rule r1, a chain of Join, Selection, and Project elements]
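The tap-and-record idea can be sketched in a few lines of Python. This is an illustration under assumptions, not P2's tracer: the `RuleTracer` class and its `tap_input` / `tap_output` methods are hypothetical names, and the sketch assumes one rule execution at a time (the talk notes later that aborted and pipelined executions complicate this).

```python
class RuleTracer:
    """Pairs the tuple seen at a rule's input tap with tuples seen at its output tap."""

    def __init__(self, rule_id, exec_table):
        self.rule_id = rule_id
        self.exec_table = exec_table  # shared list of (input, ruleId, output, dest) rows
        self.current_input = None

    def tap_input(self, tuple_id):
        # Called by the tap at the beginning of the rule strand.
        self.current_input = tuple_id

    def tap_output(self, tuple_id, dest):
        # Called by the tap at the end of the rule strand.
        if self.current_input is not None:
            self.exec_table.append((self.current_input, self.rule_id, tuple_id, dest))


exec_table = []
tracer = RuleTracer("r1", exec_table)
tracer.tap_input("x")        # tuple x enters rule r1
tracer.tap_output("y", "d")  # rule r1 emits tuple y, destined for node d
```

After these calls `exec_table` contains the row `("x", "r1", "y", "d")`, matching the example record on the slide. Because the records are plain tuples, they can be queried the same way as application state.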

(2) Tracing tuples across the wire
Each tuple has a locally unique ID
- Tuple ID is sent along with the tuple
Upon receipt, a new tuple is created with a different ID
Hooks in the network in/out handling subsystem
- A record is created in tupleTable with:
  - the tuple's local ID
  - the tuple's remote ID
  - the node from which it came
[Figure: tuple x leaves node A through Network Out and arrives at node B through Network In as tuple y; a tupleTable row relates x, y, and A]
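A minimal Python sketch of this bookkeeping follows. It is not P2's network subsystem: the `Node` class, the message dictionary layout, and the ID format are assumptions made for illustration. The only behavior taken from the slide is that the sender's tuple ID travels with the tuple and that the receiver assigns a fresh local ID and records (local ID, remote ID, source node).

```python
import itertools


class Node:
    def __init__(self, name):
        self.name = name
        self._ids = itertools.count(1)
        self.tuple_table = []  # rows: (local_id, remote_id, source_node)

    def new_id(self):
        # IDs are only unique within this node (hypothetical format).
        return f"{self.name}{next(self._ids)}"

    def send(self, local_id, payload):
        # Network Out hook: ship the sender-side ID along with the tuple.
        return {"src": self.name, "id": local_id, "payload": payload}

    def receive(self, msg):
        # Network In hook: assign a new local ID and remember where the tuple came from.
        local_id = self.new_id()
        self.tuple_table.append((local_id, msg["id"], msg["src"]))
        return local_id


a, b = Node("A"), Node("B")
x = a.new_id()
y = b.receive(a.send(x, ("route", "K")))
```

Node B's `tuple_table` now links its local tuple `y` back to tuple `x` on node A, which is the information needed to continue a trace on the sending node.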

Putting it all together
Of course in reality, it's more complicated ...
- Aborted rule executions
- Pipelined rule executions
[Figure: Node A runs rule r0 (x to y) and Node B runs rule r1 (z to w)]

Tables shown in the figure:
    exec         tupleTable
    x r0 y B     v x C
    z r1 w C     y z A
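Combining the two tables lets a trace walk backwards from a tuple through the rules and nodes that produced it. The Python sketch below does this over an in-memory copy of the figure's rows. Several things are assumptions: the `trace_back` function and field names are hypothetical, the tupleTable rows are read as (remote ID, local ID, source node), the first row of each table is attributed to Node A and the second to Node B, and all tables sit in one dictionary, whereas the framework described in the talk would evaluate this as a distributed query.

```python
def trace_back(node, tuple_id, tables):
    """Return the (node, rule) steps that led to tuple_id, most recent first."""
    steps = []
    while node in tables:
        # Which rule execution on this node produced the tuple?
        row = next((r for r in tables[node]["exec"] if r["output"] == tuple_id), None)
        if row is None:
            break
        steps.append((node, row["rule"]))
        # Did that execution's input arrive over the network?
        origin = next(
            (t for t in tables[node]["tupleTable"] if t["local"] == row["input"]), None
        )
        if origin is None:
            break
        node, tuple_id = origin["src"], origin["remote"]
    return steps


tables = {
    "A": {
        "exec": [{"input": "x", "rule": "r0", "output": "y", "dest": "B"}],
        "tupleTable": [{"local": "x", "remote": "v", "src": "C"}],
    },
    "B": {
        "exec": [{"input": "z", "rule": "r1", "output": "w", "dest": "C"}],
        "tupleTable": [{"local": "z", "remote": "y", "src": "A"}],
    },
}
steps = trace_back("B", "w", tables)
```

Starting from tuple `w` on node B, the walk visits rule r1 on B and then rule r0 on A, and stops when it reaches node C, for which this sketch holds no tables.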

Overview
- Introduction
- Background
- Monitoring framework
- Example applications/Performance
- Conclusions

Example applications (I)
Distributed watchpoints: trigger an event when a condition becomes true
- Possibly trace back/forward
Oscillation of faulty/stale information (route flaps)
- Gossiping for stabilization or updates
Inconsistent routing in DHTs [Pastry, Chord, ...]
- Each node is responsible for a unique region
- Route using distinct paths and check [Bamboo, Secure Routing]
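The consistency check for DHT routing can be illustrated with a small Python sketch: look up the same key along several distinct paths and raise a watchpoint event if the paths disagree about the responsible node. The `check_lookup` function, the event name `inconsistentLookup`, and the path labels are hypothetical; in the framework described here the check would be expressed as a query, not as Python.

```python
def check_lookup(key, owners_by_path):
    """owners_by_path maps a path label to the node that claimed responsibility for key.

    Returns a watchpoint event if the paths disagree, otherwise None.
    """
    distinct = set(owners_by_path.values())
    if len(distinct) > 1:
        return {"event": "inconsistentLookup", "key": key, "owners": sorted(distinct)}
    return None


ok = check_lookup("K", {"path1": "B", "path2": "B"})
bad = check_lookup("K", {"path1": "B", "path2": "C"})
```

In this sketch `ok` is `None` because both paths reach node B, while `bad` carries an event naming both claimed owners, which could then be the starting point for tracing back or forward.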

Example applications (II)
Online execution profiling:
- How much time is spent in each rule?
- Where are the bottlenecks?
- Which rule is costlier? Which operation?
Consistent snapshots [Chandy-Lamport]:
- Snapshot of the routing state
- Queries on the snapshots themselves
- What is the degree distribution?
- How many node-disjoint paths?
No more than 16 rules for any of the above
[Figure: rules r1, r2, r3]
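The profiling questions reduce to an aggregation over execution records. The Python sketch below assumes, purely for illustration, that each record carries a rule ID plus start and end timestamps; the talk does not say which fields the real records hold, and `time_per_rule` is a hypothetical helper.

```python
from collections import defaultdict


def time_per_rule(records):
    """Sum elapsed time per rule from (rule_id, t_in, t_out) records."""
    totals = defaultdict(float)
    for rule_id, t_in, t_out in records:
        totals[rule_id] += t_out - t_in
    return dict(totals)


# Hypothetical timings, in arbitrary time units.
records = [("r1", 0.0, 2.0), ("r2", 2.0, 3.0), ("r1", 3.0, 7.0)]
profile = time_per_rule(records)
costliest = max(profile, key=profile.get)
```

With these sample records, rule r1 accumulates the most time and is reported as the costliest rule.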

Performance
21-node Chord overlay in P2
- Monitored node on a separate, unloaded machine
Overhead of introspection
- CPU ( %), Memory (8MB 13MB)
Consistent distributed snapshot
Other results in the paper
[Plot axes: % CPU utilization; rate (1/#sec); Tx packets (x1000)]

Related Work
- Management using database techniques [Hy+...]
- Performance debugging [Magpie, Causeway...]
- Configuration debugging for BGP, OSes [Time-travel...]
- Distributed debuggers [WiDS, Pip, Replay Debugging...]
- Deep embedded monitoring [IBM Websphere, Adaptations...]

Conclusions
Declarative development of systems
- Integrated approach to building and monitoring
- Automatic execution tracing
- Online, in-place monitoring
Step towards "autonomic" distributed systems
- Fault-finding tasks evolve with the system
Interesting future directions
- User interface
- Trade-off between monitoring accuracy and overhead
Questions? [Thank You]

Request to EuroSys
- Please schedule my next talk on the first day
- Move the submission deadline away from NSDI (last year: NSDI submission 19th Oct, EuroSys 20th)

Questions? Thank You!