Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems Trace Verification for Parallel Systems Vijay.

Presentation transcript:

Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems Trace Verification for Parallel Systems Vijay K. Garg Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX

2 Talk Outline
• Motivation and Overview
• Instrumentation
– Clock: Tracking Dependency
• Property Checking
– Sensor: Detecting Global Properties
– Slicer: Computation Slicing

3 Motivation: system-level verification
• A system: a collection of HW and SW components
– Processors, buses, bridges, memory controllers, etc.
– Bus Functional Models (BFMs)
• Verification becomes very important
– Up to 80% of the design costs

4 Motivation: Reliable System
• Concurrent systems are prone to errors.
– Concurrency, nondeterminism, process and channel failures
Techniques to ensure correctness
• Modeling: Model Checking and Formal Verification
• Bug Hunting: Simulation, Debugging and Verification
• Fault-Tolerance

5 Paradise Environment
[Figure: block diagram with boxes Program, Monitor, Slicer, and Predicate, and arrows labeled Observe and Control]

6 Talk Outline
• Motivation and Overview
• Instrumentation
– Clock: Tracking Dependency
• Property Checking
– Sensor: Detecting Global Properties
– Slicer: Computation Slicing

7 Trace Model: Total Order vs Partial Order
• Total order: interleaving of events in a trace
• Partial order: Lamport’s happened-before model
[Figure: a partial order trace on processes P1 and P2 with events e1, e2, f1, f2 and critical sections CS1, CS2, shown next to two total order interleavings of the same events, one labeled Successful Trace and one labeled Faulty Trace. Specification label: CS1 ∧ CS2]

8 Tracking Dependency
computation: a set of events ordered by the “happened before” relation
• Problem: Timestamp events to answer
– e happened before f?
– e concurrent with f?

9 Clocks in a Distributed System
Vector Clocks [Fidge 89, Mattern 89]
Result: s happened before t iff the vector at s is less than the vector at t.
Example timestamps:
P1: (1,0,0) (2,1,0) (3,1,0)
P2: (0,1,0) (0,2,0)
P3: (0,0,1) (0,0,2) (2,1,3)
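The vector clock rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the Paradise toolkit: the function names (`local_event`, `receive`, `happened_before`, `concurrent`) and the 0-based process indices are assumptions made for the example.

```python
from typing import Tuple

Clock = Tuple[int, ...]  # one counter per process


def local_event(clock: Clock, pid: int) -> Clock:
    """Advance the component owned by process `pid` (0-based, hypothetical API)."""
    return tuple(c + 1 if i == pid else c for i, c in enumerate(clock))


def receive(clock: Clock, message: Clock, pid: int) -> Clock:
    """On receipt, take the componentwise maximum, then count the receive event."""
    merged = tuple(max(a, b) for a, b in zip(clock, message))
    return local_event(merged, pid)


def happened_before(s: Clock, t: Clock) -> bool:
    """s happened before t iff s <= t componentwise and s != t."""
    return s != t and all(a <= b for a, b in zip(s, t))


def concurrent(s: Clock, t: Clock) -> bool:
    """Two distinct timestamps are concurrent if neither precedes the other."""
    return s != t and not happened_before(s, t) and not happened_before(t, s)


# Timestamps from the slide: P1 at (1,0,0) receives a message stamped (0,1,0).
print(receive((1, 0, 0), (0, 1, 0), pid=0))   # (2, 1, 0)
print(happened_before((2, 1, 0), (2, 1, 3)))  # True
print(concurrent((3, 1, 0), (0, 2, 0)))       # True
```

Each comparison costs time proportional to the number of processes, which is the scalability concern that chain clocks address.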

10 Dynamic Chain Clocks
• Problem with vector clocks: scalability, dynamic process structure
• Idea: Computing the “chains” in an online fashion for relevant events [Agarwal and Garg PODC 05]
[Figure: a computation with 4 processes P1 to P4 containing events a to h, and the relevant subcomputation with the events grouped as abcd, e, fgh]

11 Experimental Results
Simulation of a computation with 1% relevant events
Measured
– number of components vs number of threads
– total time overhead vs number of threads

12 Talk Outline
• Motivation and Overview
• Instrumentation
– Clock: Tracking Dependency
• Property Checking
– Sensor: Detecting Global Properties
– Slicer: Computation Slicing

13 Global Property Detection
Predicate: A global condition expressed using variables on processes
– e.g., more than one process is in critical section, there is no token in the system
Problem: find a global state that satisfies the given predicate
[Figure: processes P1 and P2 with their critical sections marked and two global states G1 and G2]

14 The Main Difficulty in Partial Order
Algorithm for general predicate [Cooper and Marzullo 91]
Too many global states: A computation may contain as many as O(k^n) global states
k: maximum number of events on a process
n: number of processes
[Figure: a computation on P1 (events e1, e2) and P2 (events f1, f2), with ⊥ and ⊤ marked, and its lattice of global states: {⊥}, {e1, ⊥}, {f1, ⊥}, {e2, e1, ⊥}, {e1, f1, ⊥}, {e2, e1, f1, ⊥}, {e1, f2, f1, ⊥}, {e2, e1, f2, f1, ⊥}]
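To make the state explosion concrete, the following Python sketch enumerates every consistent global state of a small computation by brute force. It is an illustration only, not the Cooper and Marzullo algorithm. The representation of a global state as a tuple of per-process prefix lengths and the function name `consistent_cuts` are assumptions of this example. The single dependency e1 → f2 is also an assumption, inferred from the eight global states listed on the slide.

```python
from itertools import product
from typing import List, Tuple

# A dependency ((p, i), (q, j)) says: the i-th event of process p
# happened before the j-th event of process q (events numbered from 1).
Dependency = Tuple[Tuple[int, int], Tuple[int, int]]


def consistent_cuts(num_events: List[int],
                    depends: List[Dependency]) -> List[Tuple[int, ...]]:
    """Return all consistent global states.

    A global state is a tuple whose p-th entry is how many events of
    process p it contains. It is consistent if, whenever it contains an
    event, it also contains every event that happened before it.
    """
    cuts = []
    for cut in product(*(range(k + 1) for k in num_events)):
        if all(cut[q] < j or cut[p] >= i for (p, i), (q, j) in depends):
            cuts.append(cut)
    return cuts


# P1 has events e1, e2; P2 has events f1, f2; assumed dependency e1 -> f2.
cuts = consistent_cuts([2, 2], [((0, 1), (1, 2))])
print(len(cuts))  # 8
```

The loop visits all (k+1)^n candidate tuples, so this approach is only usable on very small traces, which is the difficulty the slide describes.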

15 Efficient Predicate Detection for Special Cases
• stable predicate [Chandy and Lamport 85]: once the predicate becomes true, it stays true
e.g., deadlock
• unstable predicate:
– observer-independent predicate [Charron-Bost et al 95]: occurs in one interleaving ⇒ occurs in all interleavings
e.g., any disjunction of local predicates
– linear predicate [Chase and Garg 95]
e.g., conjunctive predicates such as there is no leader in the system
– relational predicate: x1 + x2 + … + xn ≥ k [Chase and Garg 95]
e.g., violation of k-mutual exclusion

16 Algorithms for Conjunctive Predicates
Centralized Algorithm [Garg and Waldecker 92]
Each non-checker process maintains its local vector and sends the chain clock to the checker process whenever
– the local predicate is true
– at most once in each message interval.
Time complexity: the checker requires at most O(n^2 m) comparisons.
Related algorithms:
– token based algorithm [Garg and Chase 95]
– completely distributed algorithm [Garg and Chase 95]
– keeping queues shorter [Chiou and Korfhage 95]
– avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]
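The checker's job in the centralized approach can be sketched as follows. This is a simplified offline version written for illustration, assuming the checker already holds, for each process, the queue of timestamps at which that process's local predicate was true. It uses plain vector timestamps rather than chain clocks, and the name `detect_conjunctive` is hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

Clock = Tuple[int, ...]


def happened_before(s: Clock, t: Clock) -> bool:
    """s happened before t iff s <= t componentwise and s != t."""
    return s != t and all(a <= b for a, b in zip(s, t))


def detect_conjunctive(queues: Sequence[Sequence[Clock]]) -> Optional[List[Clock]]:
    """Find one state per process, all pairwise concurrent, or return None.

    queues[i] lists, in process order, the timestamps at which the local
    predicate of process i held.
    """
    pending = [deque(q) for q in queues]
    while all(pending):  # stop as soon as any queue runs empty
        heads = [q[0] for q in pending]
        # A head that happened before another head can never be part of a
        # consistent global state with that process's remaining candidates.
        stale = {i for i, u in enumerate(heads)
                 for v in heads if happened_before(u, v)}
        if not stale:
            return heads
        for i in stale:
            pending[i].popleft()
    return None


found = detect_conjunctive([
    [(1, 0, 0), (3, 1, 0)],  # P1
    [(0, 2, 0)],             # P2
    [(0, 0, 1)],             # P3
])
print(found)  # [(1, 0, 0), (0, 2, 0), (0, 0, 1)]
```

Every pass either returns or discards at least one candidate, so the number of passes is bounded by the total number of queued timestamps.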

17 Other Special Classes of Predicates
• Relational Predicates
– Let x_i: number of tokens at P_i
– Σ x_i < k: loss of tokens
– Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98]
– Dilworth's partition [Tomlinson and Garg 96]

18 Talk Outline
• Motivation and Overview
• Instrumentation
– Clock: Tracking Dependency
• Property Checking
– Sensor: Detecting Global Properties
– Slicer: Computation Slicing

19 The Main Idea of Computation Slicing
[Figure: slicing turns a partial order trace, which suffers from state explosion, into a slice that keeps all red global states]

20 How does Computation Slicing Help?
[Figure: to check b1 ∧ b2 on a partial order trace, slice for b1 to obtain a slice that retains all global states satisfying b1, then check b2 on the slice]

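21 Example
• Detect predicate (x*y + z < 5) ∧ (x ≥ 1) ∧ (z ≤ 3)
[Figure: computation on three processes; each event is followed by the value of the process variable]
P1 (x): a 1, b 2, c, d 0
P2 (y): e 0, f 2, g 1, h 3
P3 (z): u 4, v 1, w 2, x 4
Computation slice with respect to (x ≥ 1) ∧ (z ≤ 3): {a,e,f,u,v} {b} {w} {g}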

22 Computation Slice
computation slice [Mittal and Garg 01]: a sub-computation such that
1. it contains all global states of the computation satisfying the given predicate, and
2. it contains the least number of global states
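One way to read this definition is through a brute-force sketch. It assumes that the global states of a sub-computation are closed under union and intersection, so the slice's global states form the smallest such closed set containing every satisfying state. That reading is an assumption of this example, and the code is an illustration only, not the polynomial-time slicing algorithms cited in the talk. Global states are represented as tuples of per-process prefix lengths, and `slice_cuts` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple

Cut = Tuple[int, ...]  # number of events included from each process


def slice_cuts(satisfying: Iterable[Cut]) -> Set[Cut]:
    """Close the satisfying global states under union and intersection.

    With prefix-length tuples, union is the componentwise maximum and
    intersection is the componentwise minimum.
    """
    closed = set(satisfying)
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                for c in (tuple(map(max, a, b)), tuple(map(min, a, b))):
                    if c not in closed:
                        closed.add(c)
                        changed = True
    return closed


# Two processes with two independent events each give 9 global states.
# Suppose only (1, 0) and (0, 1) satisfy the predicate.
print(sorted(slice_cuts({(1, 0), (0, 1)})))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In this toy case the slice has 4 global states instead of 9, and it contains two states that do not satisfy the predicate, so a slice can over-approximate the satisfying set while still being far smaller than the full computation.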

23 POTA Architecture [Sen Dissertation 04]
[Figure: architecture diagram with components Instrumentor, Slicer, Predicate Detector, Translator, and Analyzer; artifacts Program, Instrumented Program, Specification (Predicate), Trace, Slice, and Promela; steps Execute Program and Execute SPIN; outputs yes/witness and no/counter example]

24 Results
Efficient polynomial-time algorithms for computing the slice for:
– linear predicates [Garg and Mittal 01]: time complexity O(n^2 m)
– general predicate: Theorem: Given a computation, if a predicate b can be detected efficiently then the slice for b can also be computed efficiently. [Mittal, Sen and Garg 03]
– combining slices: Boolean operators
– temporal logic operators: EF, AG, EG
– approximate slice: for arbitrary boolean expressions
n: number of processes
m: number of events

25 Experiments: Dining Philosophers Trace Verification
• POTA: Partial Order Trace Analyzer (based on slicing) [Sen and Garg 03]
• SPIN: A widely used model checking tool [Holzmann 97]
Predicate: Two neighboring dining philosophers do not eat concurrently
– SPIN: 250 seconds for n = 6, runs out of memory for n > 6.
– POTA: can handle n = 200, using 400 seconds.

26 Conclusions
• Bug-hunting in concurrent systems
• Total order vs. Partial Order
• Abstractions like slicing to combat the state space explosion problem

27 Questions ??