Presenter: Chi-Hung Lu 1. Problems Distributed applications are hard to validate Distribution of application state across many distinct execution environments.

Presentation transcript:

Presenter: Chi-Hung Lu 1

Problems 2
- Distributed applications are hard to validate
- Distribution of application state across many distinct execution environments
- Protocols involve complex interactions among a collection of networked machines
- Need to handle failures ranging from network problems to crashing nodes
- Intricate sequences of events can trigger complex errors as a result of mishandled corner cases

Approaches 3
- Logging-based debugging: X-Trace, Bi-directional Distributed BackTracker (BDB), Pip
- Deterministic replay: WiDS, Friday, Jockey
- Model checking: MaceMC

R. Fonseca et al., NSDI '07 4

Problem Description 5
- It is difficult to diagnose the source of a problem in an Internet application
- Current network diagnostic tools each focus on only one particular protocol
- They do not share information about the application among the user, the service, and the network operators

Examples 6
- traceroute
  - Can locate IP connectivity problems
  - Cannot reveal proxy or DNS failures
- HTTP monitoring suite
  - Can locate application problems
  - Cannot diagnose routing problems

Examples: User, DNS Server, Proxy, Web Server 7-10

X-Trace 11
- An integrated tracing framework
- Records the network paths that were taken
- X-Trace is invoked when an application task is initiated
- X-Trace metadata carrying a task identifier is inserted into the request
- The metadata is propagated down to lower layers through protocol interfaces

Task Tree 12
- X-Trace tags all network operations resulting from a particular task with the same task identifier
- The task tree is the set of network operations connected with an initial task
- The task tree can be reconstructed from the reports collected as trace data
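
The reconstruction step can be sketched in a few lines. This is only an illustration, not X-Trace's actual report format: the dictionary keys (`task_id`, `op_id`, `parent_id`, `edge_type`), the operation names, and the edge labels `"down"` and `"next"` are assumptions chosen for the example.

```python
from collections import defaultdict

def build_task_tree(reports, task_id):
    """Group the reports of one task into parent -> children adjacency lists."""
    children = defaultdict(list)
    roots = []
    for report in reports:
        if report["task_id"] != task_id:
            continue  # report belongs to a different task
        if report["parent_id"] is None:
            roots.append(report["op_id"])  # operation that started the task
        else:
            children[report["parent_id"]].append(
                (report["op_id"], report["edge_type"])
            )
    return roots, dict(children)

# Hypothetical reports: an HTTP request that goes down to TCP and on to a proxy.
reports = [
    {"task_id": 7, "op_id": "http-client", "parent_id": None, "edge_type": None},
    {"task_id": 7, "op_id": "tcp-client", "parent_id": "http-client", "edge_type": "down"},
    {"task_id": 7, "op_id": "http-proxy", "parent_id": "http-client", "edge_type": "next"},
    {"task_id": 9, "op_id": "unrelated", "parent_id": None, "edge_type": None},
]

roots, tree = build_task_tree(reports, task_id=7)
print(roots)
print(tree)
```

Because every report carries the same task identifier, reports can arrive in any order and from any layer and still be grouped into one tree.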

An example of the task tree: A simple HTTP request through a proxy 13

X-Trace Components 14
- Data: X-Trace metadata
- Network path: Task tree
- Report: Reconstruct task tree

Propagation of X-Trace Metadata: The propagation of X-Trace metadata through the task tree 15-16

The X-Trace metadata 17

Field        Usage
Flags        Bits that specify which of the three optional components are present
TaskID       A unique integer ID
TreeInfo     ParentID, OpID, EdgeType
Destination  Specifies the address that X-Trace reports should be sent to
Options      Mechanism to accommodate future extensions
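
As a rough sketch of how these fields might be carried and propagated, the following models the metadata as an immutable record. The class name `XTraceMetadata`, the helper `propagate`, the integer operation IDs, the edge label `"down"`, and the destination address are all assumptions made for illustration; the slides do not specify a wire format or an API.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Optional

_op_ids = itertools.count(1)  # hypothetical source of fresh operation IDs

@dataclass(frozen=True)
class XTraceMetadata:
    flags: int                         # which optional components are present
    task_id: int                       # shared by every operation of the task
    parent_id: Optional[int] = None    # TreeInfo: operation that caused this one
    op_id: Optional[int] = None        # TreeInfo: this operation
    edge_type: Optional[str] = None    # TreeInfo: relation to the parent
    destination: Optional[str] = None  # where reports should be sent
    options: bytes = b""               # room for future extensions

def propagate(md: XTraceMetadata, edge_type: str) -> XTraceMetadata:
    """Derive metadata for a child operation: same task, new op, parent = current op."""
    return replace(md, parent_id=md.op_id, op_id=next(_op_ids), edge_type=edge_type)

root = XTraceMetadata(flags=0, task_id=42, op_id=next(_op_ids),
                      destination="reports.example.org")
lower = propagate(root, "down")  # e.g. handing the request to a lower layer
print(lower)
```

The task identifier and destination are copied unchanged, so every layer reports to the same place under the same task, while the TreeInfo fields record how each operation relates to its parent.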

Operation of X-Trace Metadata 18-19

X-Trace Report Architecture 20-22

Usage Scenario (1): Web request and recursive DNS queries 23

Usage Scenario (2): A request fault annotated with user input 24

Usage Scenario (3): A client and a server communicate over the I3 overlay network 25

Usage Scenario (3): Internet Indirection Infrastructure (I3) 26-28

Usage Scenario (3): Tree for normal operation 29

Usage Scenario (3): The receiver host fails 30

Usage Scenario (3): The middlebox process crashes 31

Usage Scenario (3): The middlebox host fails 32

Discussion 33
- Report loss
- Non-tree request structures
- Partial deployment
- Managing report traffic
- Security considerations

X. Liu et al., NSDI '07 34

Problem Description 35
- Log mining is both labor-intensive and fragile
- Latent bugs are often distributed across multiple nodes
- Logs reflect incomplete information about an execution
- Distributed applications are non-deterministic

Goals 36
- Efficiently verify application properties
- Provide fairly complete information about an execution
- Reproduce the buggy runs deterministically and faithfully

Approach 37
- Log the actual execution of a distributed system
- Apply predicate checking in a centralized simulator over a run that is either driven by testing scripts or replayed from logs
- Output a violation report along with message traces
- An execution is interpreted as a sequence of events, which are dispatched to the corresponding handling routines

Components 38
- A versatile scripting language
  - Allows a developer to refine system properties into straightforward assertions
- A checker
  - Inspects for violations

Architecture: Components of WiDS Checker 39

Architecture 40
- Reproduce real runs
  - Log all non-deterministic events using Lamport's logical clock
- Check user-defined predicates
  - A versatile scripting language to specify the system states being observed and the predicates for invariants and correctness
- Screen out false alarms with auxiliary information
  - For liveness properties
- Trace root causes using a visualization tool

Programming with WiDS 41
- WiDS APIs are mostly member functions of the WiDSObject class
- The WiDS runtime maintains an event queue to buffer pending events and dispatches them to the corresponding handling routines
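
The event-queue model can be illustrated with a minimal sketch. It is not the WiDS API: the class `EventLoop` and its methods `register`, `post`, and `run` are hypothetical names, and the real runtime is not shown in the slides.

```python
from collections import deque

class EventLoop:
    """Buffers pending events and dispatches each to its registered handler."""

    def __init__(self):
        self.queue = deque()   # pending events, in arrival order
        self.handlers = {}     # event kind -> handling routine

    def register(self, kind, handler):
        self.handlers[kind] = handler

    def post(self, kind, payload):
        self.queue.append((kind, payload))

    def run(self):
        # Process events one at a time until the queue is empty.
        while self.queue:
            kind, payload = self.queue.popleft()
            self.handlers[kind](payload)

seen = []
loop = EventLoop()
loop.register("timer", lambda payload: seen.append(("timer", payload)))
loop.register("message", lambda payload: seen.append(("message", payload)))
loop.post("timer", 1)
loop.post("message", "hello")
loop.run()
print(seen)
```

Since all application logic runs inside handlers invoked from a single queue, the order of events fully determines the execution, which is what makes logging and replaying that order sufficient.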

Enabling Replay 42
- Logging
  - Log all WiDS nondeterminism
  - Redirect OS calls and log the results
  - Embed a Lamport clock in each outgoing message
- Checkpoint
  - Support partial replay
  - Save the WiDS process context
- Replay
  - Start from the beginning or a checkpoint
  - Replay events in serialized Lamport order
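
A small sketch of how Lamport timestamps yield a serialized replay order. The `LamportClock` class and the log layout `(timestamp, node, event)` are assumptions for illustration, and breaking timestamp ties by node name is one common convention rather than something the slides specify.

```python
class LamportClock:
    """Logical clock: advances on local events, sends, and message receipt."""

    def __init__(self):
        self.time = 0

    def tick(self):
        self.time += 1
        return self.time

    def send(self):
        # The returned timestamp is embedded in the outgoing message.
        return self.tick()

    def receive(self, message_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, message_time) + 1
        return self.time

log = []
a, b = LamportClock(), LamportClock()
log.append((a.tick(), "A", "local"))
stamp = a.send()
log.append((stamp, "A", "send"))
log.append((b.tick(), "B", "local"))
log.append((b.receive(stamp), "B", "recv"))

# Replay in one serialized order: by timestamp, ties broken by node name.
replay_order = sorted(log, key=lambda entry: (entry[0], entry[1]))
for entry in replay_order:
    print(entry)
```

The receive always sorts after its matching send, so a single-threaded replayer that follows this order never delivers a message before it was sent.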

Checker 43
- Observe memory state
- Define states and evaluate predicates
  - Refresh database for each event
  - Maintain history
  - Re-evaluate modified predicates
- Auxiliary information for violations
  - Liveness properties are only guaranteed to be true eventually
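
The per-event checking loop might look like the sketch below. It is a simplification under stated assumptions: `check_run`, the state dictionary, and the `single_holder` invariant are hypothetical, and for brevity every predicate is re-evaluated after each event instead of only those whose inputs changed.

```python
def check_run(events, initial_state, predicates):
    """Apply events in order and record which predicates fail after each one."""
    state = dict(initial_state)
    violations = []
    for index, (key, value) in enumerate(events):
        state[key] = value  # refresh the observed state for this event
        for name, predicate in predicates.items():
            if not predicate(state):
                violations.append((index, name))
    return violations

# Hypothetical safety property: at most one node holds the lock at any time.
predicates = {"single_holder": lambda state: sum(state.values()) <= 1}

# Each event sets one node's "holds lock" flag (1 = holds, 0 = released).
events = [("n1", 1), ("n2", 1), ("n1", 0)]
violations = check_run(events, {"n1": 0, "n2": 0}, predicates)
print(violations)
```

A safety violation like this can be reported the moment it occurs. A liveness property cannot be judged from a single state, which is why the checker needs auxiliary information to screen out false alarms.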

Visualization Tools: Message flow graph 47

Evaluation: Benchmark and result summary 48

Performance: Running time for evaluating predicates 49

Logging Overhead: Percentage of logging time 50

Discussion 51
- The system is debugged by those who developed it
- Bugs are hunted by those who are intimately familiar with the system