Presentation is loading. Please wait.

Presentation is loading. Please wait.

Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems Trace Verification for Parallel Systems Vijay.

Similar presentations


Presentation on theme: "Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems Trace Verification for Parallel Systems Vijay."— Presentation transcript:

1 Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems Trace Verification for Parallel Systems Vijay K. Garg Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712 email: garg@ece.utexas.edu

2 2 Talk Outline  Motivation and Overview  Instrumentation – Clock : Tracking Dependency  Property Checking – Sensor : Detecting Global Properties – Slicer : Computation Slicing

3 3 Motivation: system-level verification  A system: a collection of HW and SW components – Processors, buses, bridges, memory controllers, etc. – Bus Functional Models (BFMs)  Verification becomes very important – Up to 80% of the design costs

4 4 Motivation: Reliable System  Concurrent systems are prone to errors. – Concurrency, nondeterminism, process and channel failures Techniques to ensure correctness  Modeling: Model Checking and Formal Verification  Bug Hunting: Simulation, Debugging and Verification  Fault-Tolerance

5 5 Paradise Environment ProgramMonitorSlicerPredicate Observe Control

6 6 Talk Outline  Motivation and Overview  Instrumentation – Clock : Tracking Dependency  Property Checking – Sensor : Detecting Global Properties – Slicer : Computation Slicing

7 7 Trace Model: Total Order vs Partial Order  Total order: interleaving of events in a trace  Partial order: Lamport’s happened-before model f2f2 e1e1 CS 2 CS 1 f1f1 e2e2 P1P1 P2P2 Partial Order Trace CS 2 CS 1 e1e1 e2e2 f1f1 f2f2 e2e2 e1e1 CS 2 f1f1 f2f2 Successful Trace Specification: CS 1 Λ CS 2 ¬CS 2 ¬CS 1 ¬CS 2 ¬CS 1 ¬ CS 2 Faulty Trace 

8 8 Tracking Dependency computation: a set of events ordered by “happened before” relation  Problem: Timestamp events to answer – e happened before f ? – e concurrent with f ?

9 9 Clocks in a Distributed System Result: s happened before t i the vector at s is less than the vector at t. Vector Clocks [Fidge 89, Mattern 89] P1P1 (1,0,0)(2,1,0)(3,1,0) P2P2 (0,1,0)(0,2,0) P3P3 (0,0,1)(0,0,2)(2,1,3)

10 10 Dynamic Chain Clocks  Problem with vector clocks: scalability, dynamic process structure  Idea: Computing the “chains” in an online fashion [Aggarwal and Garg PODC 05] for relevant events a f e b d c h g abcd e fgh A computation with 4 processes The relevant subcomputation P1P2P3P4P1P2P3P4

11 11 Experimental Results Simulation of a computation with 1% relevant events Measured – number of components vs number of threads – total time overhead vs number of threads

12 12 Talk Outline  Motivation and Overview  Instrumentation – Clock : Tracking Dependency  Property Checking – Sensor : Detecting Global Properties – Slicer : Computation Slicing

13 13 Global Property Detection Predicate: A global condition expressed using variables on processes – e.g., more than one process is in critical section, there is no token in the system Problem: find a global state that satisfies the given predicate P1P1 P2P2 G1G1 G2G2 Critical section

14 14 The Main Difficulty in Partial Order Algorithm for general predicate [Cooper and Marzullo 91] Too many global states : A computation may contain as many as O(k n ) global states k: maximum number of events on a process n: number of processes e1e1 e2e2 f1f1 f2f2 T ┴ P1P1 P2P2 {e 1, ┴ } {f 1, ┴ } {e 1, f 1, ┴ } {e 2, e 1, f 1, ┴ } {e 2, e 1, f 2, f 1, ┴ {e 1, f 2, f 1, ┴ } {e 2, e 1, ┴ } {┴}{┴}

15 15 Efficient Predicate Detection for Special Cases  stable predicate: [Chandy and Lamport 85] once the predicate becomes true, it stays true e.g., deadlock  unstable predicate: observer independent predicate [Charron-Bost et al 95] occurs in one interleaving  occurs in all interleavings e.g., any disjunction of local predicate linear predicate [Chase and Garg 95] e.g., conjunctive predicates such as there is no leader in the system relational predicate: x1 + x2 +…+ xn ≥ k [Chase and Garg 95] e.g., violation of k-mutual exclusion

16 16 Algorithms for Conjunctive Predicates Centralized Algorithm [Garg and Waldecker 92] Each non-checker process maintains its local vector and sends to the checker process the chain clock whenever – local predicate is true – at most once in each message interval. Time complexity: Checker requires at most O(n 2 m) comparisons. – token based algorithm [Garg and Chase 95] – completely distributed algorithm [Garg and Chase 95] – keeping queues shorter [Chiou and Korfhage 95] – avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]

17 17 Other Special Classes of Predicates  Relational Predicates – Let x i : number of token at P i – Σ x i < k: loss of tokens – Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98] – Dilworth's partition [Tomlinson and Garg 96]

18 18 Talk Outline  Motivation and Overview  Instrumentation – Clock : Tracking Dependency  Property Checking – Sensor : Detecting Global Properties – Slicer : Computation Slicing

19 19 The Main Idea of Computation Slicing Partial order trace slice state explosion keep all red global states slicing

20 20 How does Computation Slicing Help? Partial order trace slice retain all global states satisfying b 1 slicing for b 1 check b 1 Λ b 2 check b 2 satisfy b 1

21 21 Example  Detect predicate (x*y + z < 5) Λ (x ≥1) Λ (z ≤ 3) P1P1 P2P2 P3P3 x y z a 1 b 2 c d 0 e 0 f 2 g 1 h 3 u 4 v 1 w 2 x 4 {a,e,f,u,v} {b} {w}{g} Computation Slice with respect to (x ≥1) Λ (z ≤3)

22 22 Computation Slice computation slice: a sub-computation such that: [Mittal and Garg 01] 1.it contains all global states of the computation satisfying the given predicate, and 2.it contains the least number of global states

23 23 POTA Architecture [Sen Dissertation 04] Instrumentor Specification Slicer Predicate Detector Trace Slice Predicate (Specification) Translator Execute Program Execute SPIN Program Instrumented Program Promela TraceSlice yes/ witness no/ counter example no/ counter example yes Analyzer

24 24 Results Efficient polynomial-time algorithms for computing the slice for: – linear predicates: [Garg and Mittal 01] time-complexity: O(n 2 m) – general predicate: Theorem: Given a computation, if a predicate b can be detected efficiently then the slice for b can also be computed efficiently. [Mittal,Sen and Garg 03] – combining slices: Boolean operators – temporal logic operators: EF, AG, EG – approximate slice: For arbitrary boolean expression n: number of processes m: number of events

25 25 Experiments: Dining Philosophers Trace Verification  POTA: Partial Order Trace Analyzer (based on slicing) [Sen and Garg 03]  SPIN: A widely used model checking tool [Holzmann 97] – SPIN: 250 seconds for n = 6, runs out of memory for n > 6. – POTA: can handle n= 200. Used 400 seconds. Predicate: Two neighboring dining philosophers do not eat concurrently

26 26 Conclusions  Bug-hunting in concurrent systems  Total order vs. Partial Order  Abstraction like slicing to combat state space explosion problem

27 27 Questions ??


Download ppt "Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems Trace Verification for Parallel Systems Vijay."

Similar presentations


Ads by Google