A Performance Study of BDD-Based Model Checking Bwolen Yang Randal E. Bryant, David R. O’Hallaron, Armin Biere, Olivier Coudert, Geert Janssen Rajeev K.

Presentation transcript:

A Performance Study of BDD-Based Model Checking
Bwolen Yang
Randal E. Bryant, David R. O’Hallaron, Armin Biere, Olivier Coudert, Geert Janssen, Rajeev K. Ranjan, Fabio Somenzi

2 Motivation for Studying Model Checking (MC)
MC is an important part of formal verification
  • digital circuits and other finite-state systems
BDD is an enabling technology for MC
Not well studied
Packages are tuned using combinational circuits (CC)
Qualitative differences between CC and MC computations
  • CC: build outputs, constant-time equivalence checking
    MC: build model, many fixed points to verify the specs
  • CC: BDD algorithms are polynomial
    MC: key BDD algorithms are exponential

3 Outline
BDD Overview
Organization of this Study
  • participants, benchmarks, evaluation process
Experimental Results
  • performance improvements
  • characterizations of MC computations
BDD Evaluation Methodology
  • evaluation platform
      » various BDD packages
      » real workload
  • metrics

4 BDD Overview
BDD
  • DAG representation for Boolean functions
  • fixed order on Boolean variables
Set Representation
  • represent a set as a Boolean function
      » an element's value is true if and only if it is in the set
Transition Relation Representation
  • set of pairs (current-to-next state transitions)
  • each state variable is split into two copies:
      » current state and next state
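To make the set and transition-relation encodings concrete, here is a minimal sketch that uses plain Python predicates in place of BDDs. The 2-bit counter and the names initial, trans, image and reachable are hypothetical illustrations. A BDD package would store the same characteristic functions as shared DAGs and would quantify symbolically instead of enumerating assignments.

```python
from itertools import product

BOOLS = (False, True)

# Hypothetical toy system: a 2-bit counter with state variables (x1, x0).
# A set of states is its characteristic function: a predicate that is
# True exactly for the states in the set.
def initial(x1, x0):
    return not x1 and not x0            # the set {00}

# Transition relation over the current-state copy (x1, x0) and the
# next-state copy (y1, y0) of each state variable: next = (current + 1) mod 4.
def trans(x1, x0, y1, y0):
    return 2 * y1 + y0 == (2 * x1 + x0 + 1) % 4

def image(states):
    """Successors of `states`: exists x . states(x) and trans(x, y).
    Enumerated by brute force here; a BDD package computes the same
    existential quantification symbolically."""
    def successors(y1, y0):
        return any(states(x1, x0) and trans(x1, x0, y1, y0)
                   for x1, x0 in product(BOOLS, repeat=2))
    return successors

def union(f, g):
    return lambda x1, x0: f(x1, x0) or g(x1, x0)

def members(states):
    """Explicit list of the states in a set, as (x1, x0) bit pairs."""
    return [(int(a), int(b)) for a, b in product(BOOLS, repeat=2) if states(a, b)]

def reachable(init):
    """Least fixed point R = init | image(R), the core loop of model checking."""
    reach = init
    while True:
        new = union(reach, image(reach))
        if members(new) == members(reach):
            return reach
        reach = new

print(members(image(initial)))      # [(0, 1)]
print(members(reachable(initial)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each fixed-point iteration is one image computation. This is where MC differs from combinational circuits, which only build their outputs once.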

5 BDD Overview (Cont’d)
BDD Algorithms
  • dynamic programming
  • sub-problems (operations)
      » recursively apply Shannon decomposition
  • memoization: computed cache
Garbage Collection
  • recycle unreachable (dead) nodes
Dynamic Variable Reordering
  • BDD graph size depends on the variable order
  • sifting based
      » nodes in adjacent levels are swapped
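The dynamic-programming structure of BDD algorithms can be sketched as a minimal reduced ordered BDD. It has a unique table that keeps the graph canonical and a computed cache that memoizes apply sub-problems. This is an illustrative sketch, not the code of any package in the study. It has no garbage collection, no complement edges and no reordering, and the class and method names are hypothetical.

```python
import operator

class BDD:
    """Minimal reduced ordered BDD: a sketch with no garbage collection,
    no complement edges, and no dynamic reordering.
    Node ids 0 and 1 are the terminals; a variable's index is its level."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        # id -> (level, low child, high child); terminals sit below every variable
        self.nodes = [(num_vars, 0, 0), (num_vars, 1, 1)]
        self.unique = {}   # unique table: (level, low, high) -> id, keeps the DAG canonical
        self.cache = {}    # computed cache: (op, f, g) -> result id
        self.ops = 0       # sub-problems actually computed (cache misses)

    def mk(self, level, low, high):
        if low == high:                      # skip redundant tests
            return low
        key = (level, low, high)
        if key not in self.unique:           # share isomorphic sub-graphs
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def cofactors(self, f, level):
        f_level, low, high = self.nodes[f]
        return (low, high) if f_level == level else (f, f)

    def apply(self, op, f, g):
        if f < 2 and g < 2:                  # both terminals: evaluate directly
            return int(op(bool(f), bool(g)))
        key = (op, f, g)
        if key in self.cache:                # memoized sub-problem
            return self.cache[key]
        self.ops += 1
        top = min(self.nodes[f][0], self.nodes[g][0])
        f0, f1 = self.cofactors(f, top)      # Shannon decomposition on the top variable
        g0, g1 = self.cofactors(g, top)
        result = self.mk(top, self.apply(op, f0, g0), self.apply(op, f1, g1))
        self.cache[key] = result
        return result

    def evaluate(self, f, bits):
        while f > 1:
            level, low, high = self.nodes[f]
            f = high if bits[level] else low
        return bool(f)

bdd = BDD(3)
x0, x1, x2 = (bdd.var(i) for i in range(3))
f = bdd.apply(operator.or_, bdd.apply(operator.and_, x0, x1), x2)
# Building the same function a different way returns the same node (canonicity).
g = bdd.apply(operator.or_, x2, bdd.apply(operator.and_, x1, x0))
print(f == g, bdd.ops)
```

Because the representation is canonical, checking the equivalence of two already-built functions is a single id comparison. That is the "constant-time equivalence checking" the motivation slide attributes to combinational-circuit workloads.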

6 Organization of this Study: Participants
Armin Biere: ABCD (Carnegie Mellon / Universität Karlsruhe)
Olivier Coudert: TiGeR (Synopsys / Monterey Design Systems)
Geert Janssen: EHV (Eindhoven University of Technology)
Rajeev K. Ranjan: CAL (Synopsys)
Fabio Somenzi: CUDD (University of Colorado)
Bwolen Yang: PBF (Carnegie Mellon)

7 Organization of this Study: Setup
Metrics: 17 statistics
Benchmark: 16 SMV execution traces
  • traces of BDD calls from verification of
      » cache coherence, Tomasulo, phone, reactor, TCAS…
  • size
      » 6 million to 10 billion sub-operations
      » MB of memory
Evaluation platform: trace driver
  • “drives” BDD packages based on execution trace
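The trace-driver idea can be sketched as follows. The one-call-per-line trace format, the TruthTablePackage stand-in and the replay function are all hypothetical, because the slides do not specify the trace format or the driver's interface to each package. A real driver would bind each trace operation to the corresponding call of the BDD package under test and collect the statistics listed in the metrics slides.

```python
# Hypothetical trace format (not the study's actual format):
# one BDD call per line, "<op> <result-id> <operand-ids...>".
TRACE = """
var r0 0
var r1 1
and r2 r0 r1
or  r3 r2 r0
not r4 r3
free r2
"""

class TruthTablePackage:
    """Stand-in for a BDD package: a function of n variables is stored as an
    integer whose bit k is the function's value on assignment k."""
    def __init__(self, num_vars):
        self.rows = 1 << num_vars
        self.mask = (1 << self.rows) - 1

    def var(self, i):
        return sum(1 << row for row in range(self.rows) if (row >> i) & 1)

    def and_(self, f, g):
        return f & g

    def or_(self, f, g):
        return f | g

    def not_(self, f):
        return ~f & self.mask

    def free(self, f):
        pass  # a real package would decrement a reference count here

def replay(trace, package):
    """Drive `package` with the calls in `trace`; return live handles and per-op counts."""
    handles, counts = {}, {}
    for line in trace.strip().splitlines():
        op, dst, *args = line.split()
        counts[op] = counts.get(op, 0) + 1
        if op == "var":
            handles[dst] = package.var(int(args[0]))
        elif op == "not":
            handles[dst] = package.not_(handles[args[0]])
        elif op in ("and", "or"):
            f, g = (handles[a] for a in args)
            handles[dst] = package.and_(f, g) if op == "and" else package.or_(f, g)
        elif op == "free":
            package.free(handles.pop(dst))
        else:
            raise ValueError(f"unknown trace op: {op}")
    return handles, counts

pkg = TruthTablePackage(2)
handles, counts = replay(TRACE, pkg)
print(handles["r3"] == pkg.var(0), counts)
```

Replaying the same trace against every package gives each one an identical, real workload. This is what makes the per-package statistics comparable.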

8 Organization of this Study: Evaluation Process
Phase 1: no dynamic variable reordering
Phase 2: with dynamic variable reordering
Iterative Process (diagram):
  • hypothesize, design experiments, validate
  • collect stats, identify issues, suggest improvements, validate

9 Phase 1 Results: Initial / Final
[Chart: initial vs. final performance per case, binned by speedup]
Conclusion: collaborative efforts have led to significant performance improvements

10 Phase 1: Hypotheses / Experiments
Computed Cache
  • effects of computed cache size
  • amounts of repeated sub-problems across time
Garbage Collection
  • reachable / unreachable
Complement Edge Representation
  • work
  • space
Memory Locality for Breadth-First Algorithms

11 Phase 1: Hypotheses / Experiments (Cont’d)
For Comparison
  • ISCAS85 combinational circuits (> 5 sec, < 1 GB)
      » c2670, c3540
      » 13-bit, 14-bit multipliers based on c6288
Metrics depend only on the trace and BDD algorithms
  • machine-independent
  • implementation-independent

12 Computed Cache: Repeated Sub-problems Across Time
Source of Speedup
  • increase computed cache size
Possible Cause
  • many repeated sub-problems are far apart in time
Validation
  • study the number of repeated sub-problems across user-issued operations (top-level operations)

13 Hypothesis: Top-Level Sharing
Hypothesis: MC computations have a large number of repeated sub-problems across the top-level operations.
Experiment
  • measure the minimum number of operations with GC disabled and a complete cache
  • compare this with the same setup, but with the cache flushed between top-level operations
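The top-level sharing experiment can be sketched on a compact BDD with no GC and a complete (never-evicting) cache. The same sequence of top-level operations is run twice: once keeping the computed cache, and once clearing it after every top-level call. The two counts of computed sub-problems are then compared. The CountingBDD class and the parity-based workload are hypothetical stand-ins for a real package and an SMV trace.

```python
import operator

class CountingBDD:
    """Compact ROBDD with a complete (never-evicting) computed cache and no GC,
    matching the 'minimum number of operations' setup. A sketch, not a real package."""
    def __init__(self, num_vars):
        self.nodes = [(num_vars, 0, 0), (num_vars, 1, 1)]   # terminals 0 and 1
        self.unique, self.cache, self.ops = {}, {}, 0

    def mk(self, level, low, high):
        if low == high:
            return low
        key = (level, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def apply(self, op, f, g):
        if f < 2 and g < 2:
            return int(op(bool(f), bool(g)))
        key = (op, f, g)
        if key in self.cache:
            return self.cache[key]
        self.ops += 1                                        # one computed sub-problem
        lf, lg = self.nodes[f][0], self.nodes[g][0]
        top = min(lf, lg)
        f0, f1 = self.nodes[f][1:] if lf == top else (f, f)
        g0, g1 = self.nodes[g][1:] if lg == top else (g, g)
        result = self.mk(top, self.apply(op, f0, g0), self.apply(op, f1, g1))
        self.cache[key] = result
        return result

def run(flush_between_top_level, num_vars=8, iterations=4):
    """Hypothetical workload: each iteration of a fixed-point-style loop rebuilds
    the same parity function and ORs it into an accumulator."""
    bdd = CountingBDD(num_vars)
    xs = [bdd.mk(i, 0, 1) for i in range(num_vars)]

    def top_level(op, f, g):
        result = bdd.apply(op, f, g)
        if flush_between_top_level:
            bdd.cache.clear()                                # forget all sub-problems
        return result

    acc = 0
    for _ in range(iterations):
        parity = 0
        for x in xs:
            parity = top_level(operator.xor, parity, x)
        acc = top_level(operator.or_, acc, parity)
    return bdd.ops

shared, flushed = run(False), run(True)
print(f"complete cache: {shared} ops, flushed: {flushed} ops")
```

The gap between the two counts measures how much work is repeated across top-level operations. A small cache, like a flushed one, pays for that repetition again.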

14 Results on Top-Level Sharing
flush: cache flushed between top-level operations
Conclusion: a large cache is more important for MC

15 Garbage Collection: Rebirth Rate
Source of Speedup
  • reduce GC frequency
Possible Cause
  • many dead nodes become reachable again (rebirth)
      » GC is delayed until the number of dead nodes reaches a threshold
      » dead nodes are reborn when they are part of the result of new sub-problems

16 Hypothesis: Rebirth Rate
Hypothesis: MC computations have a very high rebirth rate.
Experiment: measure the number of deaths and the number of rebirths
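The deaths/rebirths measurement can be sketched with a reference-counted table whose collector runs only once the number of dead nodes reaches a threshold. The class, the threshold policy and the workload are hypothetical illustrations. A real BDD package would also propagate count changes to the children of a dying node.

```python
class RefCountTable:
    """Reference counts with delayed garbage collection (a simplified sketch:
    children's counts are not decremented recursively, as a real BDD package would do).
    A node dies when its count drops to zero and is reborn if it is referenced
    again before the collector reclaims it."""

    def __init__(self, gc_threshold):
        self.gc_threshold = gc_threshold   # hypothetical trigger: number of dead nodes
        self.refs = {}                     # node -> reference count
        self.dead = set()                  # count is zero, not yet reclaimed
        self.deaths = self.rebirths = self.collections = 0

    def ref(self, node):
        if node in self.dead:              # dead but still in the table: rebirth
            self.dead.remove(node)
            self.rebirths += 1
        self.refs[node] = self.refs.get(node, 0) + 1

    def deref(self, node):
        self.refs[node] -= 1
        if self.refs[node] == 0:
            self.dead.add(node)
            self.deaths += 1
            if len(self.dead) >= self.gc_threshold:
                self.collect()

    def collect(self):
        self.collections += 1
        for node in self.dead:             # reclaimed nodes must be rebuilt if needed again
            del self.refs[node]
        self.dead.clear()

def simulate(gc_threshold, iterations=5):
    """Hypothetical workload: the same intermediate nodes are created,
    used, and released in every iteration."""
    table = RefCountTable(gc_threshold)
    for _ in range(iterations):
        for node in ("a", "b", "c"):
            table.ref(node)
        for node in ("a", "b", "c"):
            table.deref(node)
    return table

lazy, eager = simulate(gc_threshold=100), simulate(gc_threshold=1)
print(lazy.deaths, lazy.rebirths, lazy.collections)     # 15 12 0
print(eager.deaths, eager.rebirths, eager.collections)  # 15 0 15
```

With the lazy threshold, 12 of the 15 deaths are followed by a rebirth. Collecting eagerly would therefore have discarded nodes that were needed again, which is the effect behind the recommendation to delay GC.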

17 Results on Rebirth Rate
Conclusions
  • delay garbage collection
  • triggering GC should not be based only on the number of dead nodes
  • delay updating reference counts

18 BF BDD Construction
On MC traces, breadth-first (BF) BDD construction has no demonstrated advantage over traditional depth-first techniques.
Two packages (CAL and PBF) are BF-based.

19 BF BDD Construction Overview
Level-by-Level Access
  • operations on the same level (variable) are processed together
  • one queue per level
Locality
  • group nodes of the same level together in memory
Good memory locality due to BF ==> # of ops processed per queue visit must be high
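Level-by-level processing can be sketched as follows. Requests are queued per level, the queues are expanded top-down, and results are reduced bottom-up. The locality measure from the slide is then the number of operations processed per queue visit. This is a simplified, single-operation sketch under those assumptions, not the algorithm of CAL or PBF. The class, the method names and the visit accounting (expansion visits only) are hypothetical.

```python
import operator
from collections import defaultdict

class BFBDD:
    """Breadth-first (level-by-level) apply: a simplified sketch of the idea
    behind BF packages, not the algorithm of CAL or PBF."""

    def __init__(self, num_vars):
        self.n = num_vars
        self.nodes = [(num_vars, 0, 0), (num_vars, 1, 1)]   # terminals 0 and 1
        self.unique = {}

    def mk(self, level, low, high):
        if low == high:
            return low
        key = (level, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def cofactors(self, f, level):
        f_level, low, high = self.nodes[f]
        return (low, high) if f_level == level else (f, f)

    def apply(self, op, f, g):
        """Return (result, ops, queue_visits) for op(f, g)."""
        queues = defaultdict(list)   # one request queue per level
        results = {}                 # request (a, b) -> result id, filled bottom-up
        children = {}                # request -> (low ref, high ref)

        def request(a, b):
            if a < 2 and b < 2:
                return ("value", int(op(bool(a), bool(b))))
            if (a, b) not in results:            # each distinct request is queued once
                results[(a, b)] = None
                queues[min(self.nodes[a][0], self.nodes[b][0])].append((a, b))
            return ("request", (a, b))

        def resolve(ref):
            kind, payload = ref
            return payload if kind == "value" else results[payload]

        root = request(f, g)
        ops = visits = 0
        for level in range(self.n):              # expansion: top-down
            if queues[level]:
                visits += 1                      # one visit processes the whole queue
            for a, b in queues[level]:
                a0, a1 = self.cofactors(a, level)
                b0, b1 = self.cofactors(b, level)
                children[(a, b)] = (request(a0, b0), request(a1, b1))
                ops += 1
        for level in reversed(range(self.n)):    # reduction: bottom-up
            for key in queues[level]:
                low, high = children[key]
                results[key] = self.mk(level, resolve(low), resolve(high))
        return resolve(root), ops, visits

    def evaluate(self, f, bits):
        while f > 1:
            level, low, high = self.nodes[f]
            f = high if bits[level] else low
        return bool(f)

bdd = BFBDD(4)
x = [bdd.mk(i, 0, 1) for i in range(4)]
f, _, _ = bdd.apply(operator.and_, x[0], x[1])
g, _, _ = bdd.apply(operator.or_, x[2], x[3])
h, ops, visits = bdd.apply(operator.and_, f, g)
print(f"{ops} ops in {visits} queue visits: {ops / visits:.2f} ops per visit")
```

When only a few requests land on each level per visit, grouping same-level nodes in memory buys little. This is the sense in which low BF locality on MC traces limits the benefit of BF construction.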

20 Average BF Locality
Conclusion: MC traces generally have less BF locality

21 Average BF Locality / Work
Conclusion: for comparable BF locality, MC computations do much more work.

22 Phase 1: Some Issues / Open Questions
Memory Management
  • space-time tradeoff
      » computed cache size / GC frequency
  • resource awareness
      » available physical memory, memory limit, page fault rate
Top-Level Sharing
  • possibly the main cause of
      » strong cache dependency
      » high rebirth rate
  • better understanding may lead to
      » better memory management
      » higher-level algorithms to exploit the pattern

23 Phase 2: Dynamic Variable Reordering
BDD Packages Used
  • CAL, CUDD, EHV, TiGeR
  • improvements from Phase 1 incorporated

24 Why Is Variable Reordering Hard to Study
Time-space tradeoff
  • how much time to spend to reduce graph sizes
Chaotic behavior
  • e.g., small changes to triggering / termination criteria can have significant performance impact
Resource intensive
  • reordering is expensive
  • space of possible orderings is combinatorial
Different variable order ==> different computation
  • e.g., many “don’t-care space” optimization algorithms

25 Phase 2: Experiments
Quality of Variable Order Generated
Variable Grouping Heuristic
  • keep strongly related variables adjacent
Reorder Transition Relation
  • BDDs for the transition relation are used repeatedly
Effects of Initial Variable Order
  • with and without variable reordering
Only CUDD is used

26 Variable Grouping Heuristic: Group Current / Next Variables
Current / Next State Variables
  • for the transition relation, each state variable is split into two:
      » current state and next state
Hypothesis: grouping the corresponding current- and next-state variables is a good heuristic.
Experiment
  • with and without grouping, measure
      » work (# of operations)
      » space (max # of live BDD nodes)
      » reorder cost (# of nodes swapped with their children)
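Why grouping can help is easiest to see on the simplest relation, one in which every state variable keeps its value. The sketch below counts ROBDD nodes for that relation under two orders; the helper names are hypothetical. With each current/next pair adjacent, the BDD grows linearly (3 internal nodes per pair). Placing all current-state variables before all next-state variables makes it grow exponentially (3·2^n − 3 internal nodes). Real transition relations are less extreme, so this only illustrates the direction of the effect.

```python
def robdd_size(func, order):
    """Count the internal nodes of the reduced ordered BDD of `func` under
    `order` (a list of variable names), by brute-force Shannon expansion.
    Exponential in len(order): only for tiny illustrations."""
    unique = {}

    def build(assignment, depth):
        if depth == len(order):
            return func(assignment)              # terminal: True or False
        name = order[depth]
        low = build({**assignment, name: False}, depth + 1)
        high = build({**assignment, name: True}, depth + 1)
        if low == high:                          # redundant test: no node
            return low
        key = (depth, low, high)
        if key not in unique:                    # isomorphic sub-graphs are shared
            unique[key] = ("node", len(unique))
        return unique[key]

    build({}, 0)
    return len(unique)

def identity_relation(n):
    """Hypothetical transition relation in which every state variable keeps
    its value: x{i} is the current-state copy, y{i} the next-state copy."""
    return lambda a: all(a[f"x{i}"] == a[f"y{i}"] for i in range(n))

n = 4
grouped = [v for i in range(n) for v in (f"x{i}", f"y{i}")]      # x0 y0 x1 y1 ...
separated = [f"x{i}" for i in range(n)] + [f"y{i}" for i in range(n)]
relation = identity_relation(n)
print(robdd_size(relation, grouped), robdd_size(relation, separated))  # 12 45
```

In a reordering package, "grouping" means the pair is also moved as a unit during sifting, so the adjacency shown here is preserved.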

27 Results on Grouping Current / Next Variables
All results are normalized against no variable grouping.
Conclusion: grouping is generally effective

28 Effects of Initial Variable Order: Experimental Setup
For each trace,
  • find a good variable ordering O
  • perturb O to generate new variable orderings
      » fraction of variables perturbed
      » distance moved
  • measure the effects of these new orderings with and without dynamic variable reordering

29 Effects of Initial Variable Order: Perturbation Algorithm
Perturbation Parameters (p, d)
  • p: probability that a variable will be perturbed
  • d: perturbation distance
Properties
  • on average, a fraction p of the variables is perturbed
  • max distance moved is 2d
  • (p = 1, d = infinity) ==> completely random variable order
For each perturbation level (p, d), generate a number (sample size) of variable orders
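The slide gives the parameters and properties of the perturbation but not the algorithm itself. Below is one possible scheme that satisfies those properties, written as a hypothetical illustration rather than the study's code. Each selected variable gets a jittered sort key within d of its position. Sorting by key therefore moves no variable more than 2d positions, and with p = 1 and d = infinity the keys are independent uniform draws, giving a uniformly random order.

```python
import random

def perturb(order, p, d, rng):
    """One possible perturbation scheme with the properties listed on the slide
    (a hypothetical reconstruction, not the study's code).
    Each variable is selected with probability p. A selected variable at position i
    gets the sort key i + U(-d, d), or U(0, n) when d is None (d = infinity);
    unselected variables keep key i. Because every key stays within d of its
    position, no variable moves more than 2d positions."""
    n = len(order)
    keyed = []
    for i in range(n):
        if rng.random() < p:
            key = rng.uniform(0, n) if d is None else i + rng.uniform(-d, d)
        else:
            key = float(i)
        keyed.append((key, i))                 # ties are broken by original position
    return [order[i] for _, i in sorted(keyed)]

rng = random.Random(0)
order = list(range(30))                        # a hypothetical "good" order O
sample = perturb(order, p=0.3, d=5, rng=rng)
print(max(abs(pos - var) for pos, var in enumerate(sample)))  # at most 2 * d = 10
```

The 2d bound follows because a variable can only swap places with variables whose original positions lie within 2d of its own.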

30 Effects of Initial Variable Order: Parameters
Parameter Values
  • p: (0.1, 0.2, …, 1.0)
  • d: (10, 20, …, 100, infinity)
  • sample size: 10
==> for each trace,
  • 1100 orderings
  • 2200 runs (w/ and w/o dynamic reordering)
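The run counts follow directly from the parameter grid, where P is the set of p values, D is the set of d values (including infinity), and s is the sample size:

```latex
|P| = 10,\quad |D| = 11,\quad s = 10
\qquad\Longrightarrow\qquad
N_{\text{orderings}} = |P|\,|D|\,s = 10 \cdot 11 \cdot 10 = 1100,
\qquad
N_{\text{runs}} = 2\,N_{\text{orderings}} = 2200
```

The factor of 2 accounts for running every ordering both with and without dynamic reordering.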

31 Effects of Initial Variable Order: Smallest Test Case
Base Case (best ordering)
  • time: 13 sec
  • memory: 127 MB
Resource Limits on Generated Orders
  • time: 128x base case
  • memory: 500 MB

32 Effects of Initial Variable Order: Result
# of unfinished cases
At the 128x / 500 MB limit, “no reorder” finished 33% of the cases and “reorder” finished 90%.
Conclusion: dynamic reordering is effective

33 Phase 2: Some Issues / Open Questions
Computed Cache Flushing
  • cost
Effects of Initial Variable Order
  • determine sample size
Need a New, Better Experimental Design

34 BDD Evaluation Methodology
Trace-Driven Evaluation Platform
  • real workload (BDD-call traces)
  • study various BDD packages
  • focus on key (expensive) operations
Evaluation Metrics
  • more rigorous quantitative analysis

35 BDD Evaluation Methodology Metrics: Time
Diagram of related metrics: elapsed time (performance), CPU time, page fault rate, BDD Alg time, GC time, reordering time, # of GCs, # of node swaps (reorder cost), memory usage, # of ops (work), # of reorderings, computed cache size

36 BDD Evaluation Methodology Metrics: Space
Diagram of related metrics: memory usage, # of GCs, reordering time, computed cache size, max # of BDD nodes

37 Importance of Better Metrics
Example: Memory Locality
Diagram of related metrics: elapsed time (performance), CPU time, page fault rate, cache miss rate, TLB miss rate, memory locality, # of ops (work), # of GCs, # of reorderings, computed cache size
Without accounting for work, comparisons of memory locality may lead to false conclusions.

38 Summary
Collaboration + Evaluation Methodology
  • significant performance improvements
      » up to 2 orders of magnitude
  • characterization of MC computation
      » computed cache size
      » garbage collection frequency
      » effects of complement edge
      » BF locality
      » reordering heuristic: current / next state variable grouping
      » effects of reordering the transition relation
      » effects of initial variable orderings
  • other general results (not mentioned in this talk)
  • issues and open questions for future research

39 Conclusions
Rigorous quantitative analysis can lead to:
  • dramatic performance improvements
  • better understanding of computational characteristics
Adopt the evaluation methodology by:
  • building more benchmark traces
      » for IP issues, BDD-call traces are hard to understand
  • using / improving the proposed metrics for future evaluation
For data and BDD traces used in this study,



43 Benchmark “Sizes”
Min Ops: minimum number of sub-problems/operations (no GC and complete cache)
Max Live Nodes: maximum number of live BDD nodes

44 Phase 1: Before/After Cumulative Speedup Histogram
6 packages × 16 traces = 96 cases

45 Computed Cache Size Dependency
Hypothesis: the computed cache is more important for MC than for CC.
Experiment: vary the cache size and measure its effects on work.
  • size as a percentage of BDD nodes
  • normalize the result to the minimum amount of work necessary, i.e., no GC and a complete cache
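The cache-size experiment can be sketched with a lossy, direct-mapped computed cache, where a colliding entry simply overwrites the old one, so a smaller cache forces sub-problems to be recomputed. Work is then normalized to the complete-cache minimum, as on the slide. The class, the toy hash function and the workload are hypothetical stand-ins; real packages size and hash their caches differently.

```python
import operator

AND, OR, XOR = 0, 1, 2
FUNCS = (operator.and_, operator.or_, operator.xor)

class LossyCacheBDD:
    """ROBDD sketch with a direct-mapped computed cache of `slots` entries;
    slots=None gives a complete cache. No GC. Hypothetical, for illustration."""

    def __init__(self, num_vars, slots=None):
        self.nodes = [(num_vars, 0, 0), (num_vars, 1, 1)]   # terminals 0 and 1
        self.unique = {}
        self.slots = slots
        self.cache = {} if slots is None else [None] * slots
        self.ops = 0

    def mk(self, level, low, high):
        if low == high:
            return low
        key = (level, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def _slot(self, key):
        op, f, g = key
        return (op * 1_000_003 + f * 7_919 + g) % self.slots   # deterministic toy hash

    def lookup(self, key):
        if self.slots is None:
            return self.cache.get(key)
        entry = self.cache[self._slot(key)]
        return entry[1] if entry is not None and entry[0] == key else None

    def store(self, key, value):
        if self.slots is None:
            self.cache[key] = value
        else:
            self.cache[self._slot(key)] = (key, value)      # evicts on collision

    def apply(self, op, f, g):
        if f < 2 and g < 2:
            return int(FUNCS[op](bool(f), bool(g)))
        key = (op, f, g)
        cached = self.lookup(key)
        if cached is not None:
            return cached
        self.ops += 1
        lf, lg = self.nodes[f][0], self.nodes[g][0]
        top = min(lf, lg)
        f0, f1 = self.nodes[f][1:] if lf == top else (f, f)
        g0, g1 = self.nodes[g][1:] if lg == top else (g, g)
        result = self.mk(top, self.apply(op, f0, g0), self.apply(op, f1, g1))
        self.store(key, result)
        return result

def workload(bdd, n):
    """Hypothetical stand-in for a model-checking trace."""
    x = [bdd.mk(i, 0, 1) for i in range(n)]
    acc = 1
    for i in range(n // 2):
        acc = bdd.apply(AND, acc, bdd.apply(XOR, x[i], x[n - 1 - i]))
        acc = bdd.apply(OR, acc, bdd.apply(AND, x[i], x[i + 1]))
    return acc

def normalized_work(n, cache_percent):
    reference = LossyCacheBDD(n)               # complete cache: minimum work
    workload(reference, n)
    slots = max(1, len(reference.nodes) * cache_percent // 100)
    lossy = LossyCacheBDD(n, slots)            # cache size as % of BDD nodes
    workload(lossy, n)
    return lossy.ops / reference.ops

for percent in (1, 10, 100):
    print(percent, round(normalized_work(12, percent), 2))
```

The ratio can never drop below 1.0: every distinct sub-problem must be computed at least once, and the complete cache computes each exactly once.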

46 Effects of Computed Cache Size
# of ops: normalized to the minimum number of operations
cache size: % of BDD nodes
Conclusion: a large cache is important for MC

47 Death Rate
Conclusion: the death rate for MC can be very high

48 Effects of Complement Edge: Work
Conclusion: complement edges affect work only for CC

49 Effects of Complement Edge: Space
Conclusion: complement edges do not affect space
Note: the maximum # of live BDD nodes would be a better measure
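Complement edges can be sketched as follows. Every edge carries a complement bit, so negation flips a bit instead of building a new graph, and f and NOT f share all of their nodes. The class, its method names and the specific canonical-form rule used here (the high edge is always kept uncomplemented) are assumptions of this sketch, not details taken from the study.

```python
class ComplementBDD:
    """Sketch of complement edges. An edge is (node id, complemented?).
    There is a single terminal node (TRUE); FALSE is the complemented edge to it.
    Canonical-form rule assumed here: the high (then) edge is never complemented."""

    TERMINAL = 0

    def __init__(self, num_vars):
        self.nodes = [(num_vars, None, None)]   # node 0: the TRUE terminal
        self.unique, self.cache = {}, {}
        self.one, self.zero = (0, False), (0, True)

    @staticmethod
    def neg(edge):
        node, complemented = edge
        return (node, not complemented)         # constant time, no new nodes

    def level(self, edge):
        return self.nodes[edge[0]][0]

    def mk(self, level, low, high):
        if low == high:
            return low
        if high[1]:                             # push the complement up to keep "high" regular
            return self.neg(self.mk(level, self.neg(low), self.neg(high)))
        key = (level, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return (self.unique[key], False)

    def var(self, i):
        return self.mk(i, self.zero, self.one)

    def cofactors(self, edge, level):
        node, c = edge
        node_level, low, high = self.nodes[node]
        if node_level != level:
            return edge, edge
        # a complemented edge complements both cofactors
        return (low[0], low[1] != c), (high[0], high[1] != c)

    def conj(self, f, g):
        if f == self.zero or g == self.zero or f == self.neg(g):
            return self.zero
        if f == self.one or f == g:
            return g
        if g == self.one:
            return f
        key = (min(f, g), max(f, g))            # AND is commutative
        if key not in self.cache:
            top = min(self.level(f), self.level(g))
            f0, f1 = self.cofactors(f, top)
            g0, g1 = self.cofactors(g, top)
            self.cache[key] = self.mk(top, self.conj(f0, g0), self.conj(f1, g1))
        return self.cache[key]

    def disj(self, f, g):
        return self.neg(self.conj(self.neg(f), self.neg(g)))   # De Morgan

    def evaluate(self, edge, bits):
        node, c = edge
        while node != self.TERMINAL:
            level, low, high = self.nodes[node]
            node, c2 = high if bits[level] else low
            c = c != c2                         # XOR complement bits along the path
        return not c

b = ComplementBDD(3)
x = [b.var(i) for i in range(3)]
f = b.conj(x[0], b.disj(x[1], x[2]))
size_before = len(b.nodes)
not_f = b.neg(f)                                # same node, complemented edge
print(not_f[0] == f[0], len(b.nodes) == size_before)   # True True
```

Negation is free here, and OR is reduced to AND. This is why complement edges can reduce work on negation-heavy workloads while leaving the number of live nodes largely unchanged.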

50 Phase 2 Results: With / Without Reorder
4 packages × 16 traces = 64 cases

51 Variable Reordering: Effects of Reordering the Transition Relation Only
nodes swapped: number of nodes swapped with their children
All results are normalized against variable reordering w/ grouping.

52 Quality of the New Variable Order
Experiment
  • use the final variable ordering as the new initial order
  • compare the results using the new initial order with the results using the original variable order

53 Results on Quality of the New Variable Order
Conclusion: the new variable orders are generally of good quality.

54 No Reorder (> 4x or > 500 MB)

55 > 4x or > 500 MB
Conclusions: for very low perturbation, reordering does not work well. Overall, very few cases finish.

56 > 32x or > 500 MB
Conclusion: variable reordering worked rather well

57 Memory Out (> 512 MB)
Conclusion: memory intensive in the highly perturbed region

58 Timed Out (> 128x)
Diagonal band from lower-left to upper-right

59 Issues and Open Questions: Cross Top-Level Sharing
Why are there so many repeated sub-problems across top-level operations?
Potential experiment
  • identify how far apart these repetitions are

60 Issues and Open Questions: Inconsistent Cross-Platform Results
For some BDD packages, results on machine A are 2x faster than on machine B.
For other BDD packages, the difference is not as significant.
Probable cause: memory hierarchy
  • this may shed some light on the memory locality issues