D. Sahoo, Stanford J. Jain, Fujitsu S. Iyer, UT-Austin

A New Reachability Algorithm for Symmetric Multi-processor Architecture
D. Sahoo, Stanford
J. Jain, Fujitsu
S. Iyer, UT-Austin
D. Dill, Stanford
Formal Equivalence and Assertion-based Verification Workshop 2005

Outline
Standard Reachability Analysis
Multithreaded Reachability
Multithreaded Reachability in SMP machines
Engineering Issues
Results
Conclusion and Future Work

Related Work
Parallel Reachability Analysis:
  Stern and Dill [CAV, 97]
  Stornetta and Brewer [DAC, 96]
  Yang, Hallaron [97]
  Heyman, Geist, Grumberg, Schuster [CAV, 00]
  Garavel, Mateescu, Smarandache [SPIN, 01]
  Pixley, Havlicek [03]

Reachability using BDD
[Burch et al. : 91]
Partitioned Transition Relation: Tr1 … Tri … Trn
[Figure: image computation from the Initial State I through R1, R2, …, Ri up to the Least Fixed Point]
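The fixed-point loop on this slide can be illustrated with a minimal sketch. It uses explicit Python sets as a stand-in for BDDs and assumes, for simplicity, a disjunctive partition in which each `Tr_i` is a set of `(state, next_state)` pairs. The names `image` and `reachable` are hypothetical and are not taken from the talk.

```python
def image(frontier, partitions):
    """Union of the images of `frontier` under every partition Tr_i."""
    result = set()
    for tr in partitions:
        result.update(t for (s, t) in tr if s in frontier)
    return result


def reachable(initial, partitions):
    """Iterate image computation until no new state appears (least fixed point)."""
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        frontier = image(frontier, partitions) - reached
        reached |= frontier
    return reached


# Toy transition relation split into three pieces; state 5 is never reached.
tr1 = {(0, 1), (1, 2)}
tr2 = {(2, 3), (3, 0)}
tr3 = {(5, 6)}
print(sorted(reachable({0}, [tr1, tr2, tr3])))
```

A real implementation would represent `reached`, `frontier`, and each `Tr_i` as BDDs and compute the image symbolically; the control flow is the part this sketch is meant to show.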

Partitioned Reachability using POBDD
POBDD - [Jain : 92]
Reachability - [Narayan et al. : 97]
Initial States : I
[Figure: four partitions, each computing its own fixed point: Local Fixed Point 1, Local Fixed Point 2, Local Fixed Point 3, Local Fixed Point 4]

[Figure: after Local Fixed Point 1, partition 1 communicates to the other partitions: Communicate from 1 -> 2, Communicate from 1 -> 3, Communicate from 1 -> 4]

[Figure: after Local Fixed Point 2, partition 2 communicates to the other partitions: Communicate from 2 -> 1, Communicate from 2 -> 3, Communicate from 2 -> 4]
Similarly repeat for other partitions

[Figure: Local Fixed Point 1, Local Fixed Point 2, Local Fixed Point 3, Local Fixed Point 4]
Improvements: [Iyer et al. : 03] [Sahoo et al. : 04]
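The alternation of local fixed points and communication shown on these slides can be sketched as follows. This is an illustration over explicit state sets, not the POBDD implementation: the function `partitioned_reachable`, the `part_of` mapping from a state to its partition index, and the toy transition relation are all assumptions made for the example.

```python
def partitioned_reachable(initial, transitions, part_of, n_parts):
    """Reachability with one reached-set per partition of the state space."""
    reached = [set() for _ in range(n_parts)]
    for s in initial:
        reached[part_of(s)].add(s)

    changed = True
    while changed:
        changed = False
        for i in range(n_parts):
            # Local fixed point: follow only transitions that stay inside partition i.
            frontier = set(reached[i])
            while frontier:
                img = {t for (s, t) in transitions
                       if s in frontier and part_of(t) == i}
                frontier = img - reached[i]
                reached[i] |= frontier
            # Communicate from i -> j: hand over successors that land in partition j.
            for (s, t) in transitions:
                j = part_of(t)
                if s in reached[i] and j != i and t not in reached[j]:
                    reached[j].add(t)
                    changed = True
    return reached


# Two partitions: states 0..3 and states 4..7.
transitions = {(0, 1), (1, 4), (4, 5), (5, 2), (6, 7)}
print(partitioned_reachable({0}, transitions, lambda s: s // 4, 2))
```

The outer loop stops once a full pass over all partitions communicates nothing new, at which point every partition is closed under both its local transitions and the incoming ones.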

Motivation for Multi-threaded Approach
Scheduling Problem
Increasing availability of powerful SMP machines
Multi-threading is a way of achieving real parallelism in SMP machines

Multi-threaded Reachability [DAC 05]
Naïve parallelization
[Figure: execution timeline (Time axis)]
Advantage:
  Parallel speedup
  Catches a bug faster than the sequential version
Problems:
  Not much parallelism
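One plausible reading of the naïve scheme, sketched under assumptions: every partition computes its local fixed point in its own thread, all threads are joined, and only then are states exchanged between partitions. The slide does not spell out the algorithm, so the phase structure, the names `local_fixed_point` and `naive_parallel_reachable`, and the explicit-set representation are illustrative. CPython threads also do not run Python bytecode in parallel, so this shows the structure only, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def local_fixed_point(i, seed, transitions, part_of):
    """Close `seed` under the transitions that stay inside partition i."""
    reached, frontier = set(seed), set(seed)
    while frontier:
        img = {t for (s, t) in transitions if s in frontier and part_of(t) == i}
        frontier = img - reached
        reached |= frontier
    return reached


def naive_parallel_reachable(initial, transitions, part_of, n_parts):
    reached = [set() for _ in range(n_parts)]
    for s in initial:
        reached[part_of(s)].add(s)

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        while True:
            # Phase 1: all local fixed points run concurrently; wait for every one.
            futures = [pool.submit(local_fixed_point, i, reached[i], transitions, part_of)
                       for i in range(n_parts)]
            reached = [f.result() for f in futures]
            # Phase 2: communication happens only after the slowest thread finishes.
            incoming = [set() for _ in range(n_parts)]
            for (s, t) in transitions:
                i, j = part_of(s), part_of(t)
                if i != j and s in reached[i] and t not in reached[j]:
                    incoming[j].add(t)
            if not any(incoming):
                return reached
            for j in range(n_parts):
                reached[j] |= incoming[j]


transitions = {(0, 1), (1, 4), (4, 5), (5, 2), (6, 7)}
print(naive_parallel_reachable({0}, transitions, lambda s: s // 4, 2))
```

The wait at the end of phase 1 is where parallelism is lost: partitions that finish early sit idle until the slowest one is done, which matches the "Not much parallelism" problem listed on the slide.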

Early Communication
[Figure: execution timeline (Time axis)]
Advantage:
  Parallel speedup
  Finishes the reachability analysis faster
  Catches bug faster than the naive version
Problems:
  Parallelism could be better

Early Communication and Partial Communication
[Figure: execution timeline (Time axis)]
Advantage:
  Parallel speedup
  Finishes the reachability analysis faster
  Catches bug faster than the previous versions

Reachability in SMP Architecture
[Figure: execution timeline (Time axis)]
We find the bugs faster!
Improved parallelism
Better parallel speedup

Engineering Issues
Thread-safe BDD library
Deterministic behavior
Smart thread scheduling

Sources of Non-determinism
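Extensive memory based optimizations
  Pointer comparisons
  Hashing based on memory address
Solutions:
  Deterministic Hashing
  Deterministic comparisons
Example: Thread 1 and Thread 2 each call p = malloc (…). The address each thread gets back depends on how the two calls interleave, so key = hash(p) and if (p > p1) … can give different results from run to run.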
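A minimal sketch of the idea behind deterministic hashing and comparison: key every node by a creation-order id instead of by its memory address. The classes `Node` and `Manager` are hypothetical and much simpler than a real BDD package; the assumption here is that each thread (or partition) owns its own manager, so the ids it hands out do not depend on what other threads allocate.

```python
class Node:
    """A BDD-like node identified by a creation-order uid, not by its address."""
    __slots__ = ("uid", "var", "low", "high")

    def __init__(self, uid, var, low, high):
        self.uid, self.var, self.low, self.high = uid, var, low, high

    def __hash__(self):
        # Deterministic hashing: never derived from id(self) / the memory address.
        return self.uid

    def __lt__(self, other):
        # Deterministic comparison: replaces pointer comparisons such as p > p1.
        return self.uid < other.uid


class Manager:
    """Per-thread node factory with a private counter and unique table."""

    def __init__(self):
        self._count = 0
        self._unique = {}

    def node(self, var, low=None, high=None):
        key = (var,
               low.uid if low is not None else -1,
               high.uid if high is not None else -1)
        found = self._unique.get(key)
        if found is None:
            found = Node(self._count, var, low, high)
            self._count += 1
            self._unique[key] = found
        return found


m = Manager()
leaf0, leaf1 = m.node("0"), m.node("1")
x = m.node("x", leaf0, leaf1)
print(x.uid, sorted([x, leaf1, leaf0])[0].var)
```

Two managers that receive the same sequence of requests assign the same ids, so hash-table layout and any ordering decisions based on those ids are reproducible across runs.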

Sources of Non-determinism
Thread synchronization
Solutions:
  Synchronization based on deterministic count
    Number of ITE operations
    Number of Sift operations
[Figure: Thread 1 and Thread 2, Image #n and Image #n+1]
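To illustrate synchronization on a deterministic count, the sketch below lets workers exchange data only after a fixed number of counted operations, never after a fixed amount of wall-clock time. It is an assumption-laden toy: the arithmetic stands in for ITE operations, and `run_workers`, `ops_per_sync`, and the double-barrier exchange are illustrative choices, not the scheme used in the talk.

```python
import threading


def run_workers(ops_per_sync=100, rounds=5, n_workers=2):
    """Each worker syncs after exactly `ops_per_sync` counted operations."""
    barrier = threading.Barrier(n_workers)
    published = [0] * n_workers            # value each worker exposes at a sync point
    logs = [[] for _ in range(n_workers)]

    def worker(wid):
        op_count = 0
        value = wid + 1
        for _round in range(rounds):
            # Stand-in for BDD work: count operations instead of measuring time.
            for _op in range(ops_per_sync):
                value = (value * 31 + wid) % 1_000_003
                op_count += 1
            published[wid] = value
            barrier.wait()                 # every worker has published
            incoming = sum(published[j] for j in range(n_workers) if j != wid)
            barrier.wait()                 # every worker has read before the next overwrite
            value = (value + incoming) % 1_000_003
            logs[wid].append((op_count, value))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs


print(run_workers() == run_workers())
```

Because the sync points are tied to the operation count, each worker sees the same incoming data at the same point in its computation regardless of how the operating system schedules the threads.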

Smart Thread Scheduling
Each processor has its own cache
A thread is assigned to a processor
The cache fills up with the thread's memory usage
The same thread is assigned to a different processor after some time
A large number of unnecessary cache misses occur when the thread uses its previously used memory locations
Solutions:
  Bind thread to a processor
  Leads to suboptimal throughput if the number of threads exceeds the number of processors
[Figure: CPU1 with Cache1 holding 0x07ffd0; CPU2 with Cache2; Lookup 0x07ffd0 on CPU2 results in a cache miss]
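Binding a thread to a processor can be sketched with the Linux-only `os.sched_setaffinity` call from the Python standard library. This is an illustration, not the mechanism used in the talk: the helpers `run_pinned` and `first_allowed_cpu` are hypothetical, the code assumes that passing `0` binds the calling thread (the behavior of the underlying Linux system call), and it falls back to running unpinned on platforms without the call.

```python
import os
import threading


def first_allowed_cpu():
    """Pick a CPU this process is allowed to run on (0 if the platform cannot say)."""
    if hasattr(os, "sched_getaffinity"):
        return min(os.sched_getaffinity(0))
    return 0


def run_pinned(fn, cpu):
    """Run fn() in a new thread bound to `cpu`; return (result, affinity seen by the thread)."""
    out = {}

    def body():
        affinity = None
        if hasattr(os, "sched_setaffinity"):
            try:
                os.sched_setaffinity(0, {cpu})   # assumption: 0 targets the calling thread
                affinity = os.sched_getaffinity(0)
            except OSError:
                affinity = None                  # pinning refused; run unpinned
        out["affinity"] = affinity
        out["result"] = fn()

    t = threading.Thread(target=body)
    t.start()
    t.join()
    return out["result"], out["affinity"]


result, affinity = run_pinned(lambda: sum(range(10)), first_allowed_cpu())
print(result, affinity)
```

Pinning keeps a thread's working set in one processor's cache, but as the slide notes it gives the scheduler less freedom once there are more threads than processors.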

BDD Performance : CUDD Vs New
BDD Statistics after Reachability Analysis (Static Order)
Columns: Ckts, P/F, #img, #nodes; then for CUDD and for New: Mem (MB), Cache hits, Cache collision, Time
bpb F 10 1.8M 50M 41.0% 90.4% 18.6 61M 88.2% 26.3
eight P 47 79K 6.1M 42.9% 26.2% 0.8 7.5M 1.5
fru32 2 8K 9.2M 34.0% 28.4% 7.9 10.9M 28.9% 8.9
idu32 1 36K 6.6M 28.8% 5.0% 4.2 7.8M 28.7% 7.7% 4.5
usbphy 90K 6.4M 37.7% 16.6% 0.7 17.1%


Performance : Non-deterministic Vs Deterministic
Verification Time in Sec
Columns: Ckts, Non-deterministic, Deterministic
c1 T/O 227
c2 962 917
c3 809 62
c4 903 161
d1 13
d2 24 30
d3 84 100
d4 38
d5 37

Performance: Cache or Parallelism
Verification Time in Sec
Columns: Ckts, Uniprocessor Sequential, In 8-way SMP, Parallel
c1 1570 286 227
d1 125 13
d2 180 39 30
d3 295 130 100
d4 176 60 38

Results on Industrial Circuits
Columns: Ckt, Vis, Seq POBDD, Parallel Multi-threaded Approaches (Parallel 8 CPUs: Naïve, Early Comm, Early Comm + Partial Comm; 1 CPU)
c1 371 T/O 286 227
c2 3346 1789 1564 93 917
c3 2540 228 62
c4 2236 2084 1174 161 509
d1 6 13
d2 10 11 45 39 30
d3 15 21 23 100 130
d4 60 38
d5 12 16 34 37

Results on public benchmarks
Columns: Ckt, Vis, Seq POBDD, Parallel Multi-threaded Approaches (Parallel 8 CPUs: Naïve, Early Comm, Early Comm + Partial Comm; 1 CPU)
spprod 891 61 53 93 510 440
am2910 T/O 281 122 204 386 356
palu 273 4 9 8
S1269b-1 3635 59 72 60
S1269b-5 2287 55 67
blackjck 1213 470 340 98 70

Results : Gantt charts
Real execution traces from our multi-threaded reachability program

Conclusion and Future Work
Parallelize the Reachability
Multi-threaded Reachability
Better results
Deterministic behavior
Future Work:
  Improve the parallelism further
  Study cache behavior
