Paraglide. Martin Vechev, Eran Yahav.


1 Paraglide. Martin Vechev, Eran Yahav

2 Synthesizing System-Level Software. Requirements: correctness, scalability, response time. Challenges: crossing abstraction levels, hardware complexity, time to market.

3 Highly Concurrent Algorithms. Parallel pattern matching, anomaly detection. Voxel trees, polyhedrons, … Scene graph traversal, physics simulation, collision detection, … Cartesian tree (fast fits), lock-free queue, garbage collection, …

4 Goal. Generate efficient, provably correct components of concurrent systems from higher-level specs: verification/checking integrated into the design process; automatic exploration of implementation details. Synthesize critical components: system-level code; explore tradeoffs. "Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." (D. Knuth)

5 Current Approach: Manual Construction. System spec: high(er)-level description. Environment: memory model, thread model, concurrency primitives, CPU primitives, … Requirements: throughput, memory consumption, pause time, … Bag of tricks: optimistic concurrency, adding metadata, adding space, … Implementation (??) by manual construction: hard to verify/test; often buggy; did the programmer choose well?; one-time deal.

6 Our Vision. System spec: high(er)-level description. Environment: memory model, thread model, concurrency primitives, CPU primitives, … Requirements: throughput, memory consumption, pause time, … Bag of tricks: optimistic concurrency, adding metadata, adding space, … Result: implementation (??) and alternative implementations. Machine assistance: automatic checking/verification; automatic exploration of implementation details; repeatable.

7 Example: Concurrent Set Algorithm. Systematically derived with machine assistance. Correctness: automatically verified. Performance: only uses CAS.

8 Why Should You Care? Correctness: checking/verification integrated into the design process. Performance: systematic exploration beats humans in crossing levels of abstraction, leveraging non-intuitive memory models, etc.; systematic exploration produces many candidates with varying tradeoffs. Adaptability: shorter development cycle for adapting the system to a new environment.

9 Why Should You Care? Correctness: checking/verification integrated into the design process. Performance: systematic exploration beats humans in crossing levels of abstraction, leveraging non-intuitive memory models, etc.; systematic exploration produces many candidates with varying tradeoffs. Adaptability: shorter development cycle for adapting the system to a new environment.

10 Why Is There Hope? Designer effort: provide insights that are also required in manual construction. Correctness: checking helps eliminate a large number of incorrect candidates; the designer can focus on the remaining candidates. Performance: … Adaptability: …

11 Why Is There Hope? (II) Transformational derivation: concurrent garbage collection algorithms [PLDI’06]. Combinatorial exploration: concurrent GC algorithms [PLDI’07]; concurrent set algorithms [PLDI’08]. Automatic verification: Comparison under Abstraction for Verifying Linearizability [Amit, CAV’07]; Shape Analysis for Concurrent Programs [TAU]; …

12 Risk Summary. Designer effort: return on designer “investment”; is the result competitive with a manually crafted system?; is the tool working at the right level of abstraction? Verification: scalability.

13 Outline. Technical details: commonalities between concurrent algorithms; adapting to a changing environment; preliminary experience: our combinatorial approach. Plan: succeed early. Many open questions: common representation; “more efficient”; …

14 Example: “The Origin of GCs”. [Figure: family of concurrent GC algorithms and proofs, each marked incorrect, correct, or corrected (C). Entries as they appear on the slide: Ben-Ari Base ‘84, Dijkstra (C) ‘78, Doligez (C) ‘93, Azatchi ‘03, Domani ‘03, Yuasa ‘90, Pixley ‘88, Ben-Ari Base ‘84, Doligez ‘94, Ben-Ari Extended ‘84, Steele (C) ‘75, Boehm ‘91, Barabash ‘03, ‘03.]

15 Example: Concurrent Set Algorithms. Massalin ‘91, Valois ‘95, Greenwald ‘99, Harris ‘01, Michael ‘02, Ruppert ‘04, Heller ‘05.

16 Adapting to a Changing Environment. The algorithm depends on: synch primitives, memory model, thread model, memory manager, scheduler, …

17 Machine Assisted Design Process. Families of algorithms sharing a common skeleton with parametric functions. Parametric functions: Trace Step, Mutator Step, Expose. Participants: Mutator, Collector.

18

19 Overview. Generation: high-level design (find algorithm outline; find building blocks), then low-level search (explore algorithm space). Verification: high-level design (find a sufficient local invariant; find a sufficient abstraction), then low-level search (verify local invariant).

20 Coarse-Grained to Fine-Grained Synchronization
Mutator Step (source, field, new) {
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
}
Trace Step (source, field) {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Set Expose (log) {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
Annotation on the slide: atomic. What now? Can we remove atomics? Result is incorrect, may lose objects!

21 Coarse-Grained to Fine-Grained Synchronization
Mutator Step (source, field, new) {
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
}
Trace Step (source, field) {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Set Expose (log) {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
What now? Can we remove atomics?

22 Coarse-Grained to Fine-Grained Synchronization
Trace Step (source, field) {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Mutator Step (source, field, new) {
  M1: old = source.field
  M2: w = source.field.WF
  M5: w → old.MC--
  M3: w → new.MC++
  M4: w → log = log U {new}
  M6: source.field = new
}
Set Expose (log) {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
  return V
}
What now? Can we remove atomics? “When in doubt, use brute force.” (Ken Thompson)

23 System Input: Building Blocks
Mutator Building Blocks:
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
  M6: source.field = new
Tracing Step Building Blocks:
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
Expose Building Blocks:
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
Input Constraints:
  Mutator blocks: [M3, M4]
  Tracing blocks: [C1, C3]
  Expose blocks: [E1, E2, E3, E4]
  Dataflow, e.g. M2 < M3
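The low-level search over the mutator building blocks can be illustrated with a small enumeration. This is a minimal sketch, not the tool's actual search: only the constraint M2 < M3 comes from the slide, while the other ordering pairs are assumptions read off from where the variables `old` and `w` are defined and used. The bracketed block constraints are ignored here, and the names `DATAFLOW`, `respects_dataflow`, and `candidate_orderings` are hypothetical.

```python
from itertools import permutations

# Mutator building blocks, named as on the slide.
BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]

# (before, after) pairs. Only ("M2", "M3") is stated on the slide; the rest
# are ASSUMED from def-use: M2 defines w (used by M3, M4, M5) and M1 defines
# old (used by M5).
DATAFLOW = [("M2", "M3"), ("M2", "M4"), ("M2", "M5"), ("M1", "M5")]


def respects_dataflow(order, constraints=DATAFLOW):
    """Return True if every 'before' block precedes its 'after' block."""
    position = {block: index for index, block in enumerate(order)}
    return all(position[a] < position[b] for a, b in constraints)


def candidate_orderings(blocks=BLOCKS, constraints=DATAFLOW):
    """Enumerate all block orderings that satisfy the dataflow constraints."""
    return [
        order
        for order in permutations(blocks)
        if respects_dataflow(order, constraints)
    ]


orderings = candidate_orderings()
print(len(orderings), "candidate mutator orderings out of 720 permutations")
```

Under these assumed constraints, both mutator orderings shown in the deck (M1, M2, M5, M3, M4, M6 and M1, M6, M2, M3, M4, M5) are among the candidates. Each candidate would still have to go to a verifier; the enumeration only prunes orderings that are ill-formed.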

24 System Output: (Verified) Algorithms
Mutator Step (source, field, new) {
  M1: old = source.field
  M6: source.field = new
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log U {new}
  M5: w → old.MC--
}
Set Expose (log) {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V U {o}
}
Trace Step (source, field) {
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
}
Explored 306 variations in around 2 minutes. Result: the least atomic (verified) algorithm with the given blocks.
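One way to picture "least atomic" is as a search over how a fixed block ordering is cut into contiguous atomic sections. The sketch below rests on stated assumptions: "least atomic" is taken to mean "most sections", and `stand_in_verifier` is a hypothetical placeholder that only demands M3 and M4 share a section (loosely modeled on the input constraint `[M3, M4]`). A real verifier would check the algorithm's correctness property instead.

```python
from itertools import product


def atomic_partitions(order):
    """Yield every split of a non-empty `order` into contiguous atomic sections."""
    for cuts in product([False, True], repeat=len(order) - 1):
        sections, current = [], [order[0]]
        for block, cut_before in zip(order[1:], cuts):
            if cut_before:
                # Close the current atomic section and start a new one.
                sections.append(current)
                current = [block]
            else:
                current.append(block)
        sections.append(current)
        yield sections


def stand_in_verifier(sections):
    """HYPOTHETICAL check: accept only if M3 and M4 sit in one atomic section."""
    return any("M3" in section and "M4" in section for section in sections)


def least_atomic(order, verifier):
    """Return the verified partition with the most sections, or None."""
    verified = [p for p in atomic_partitions(order) if verifier(p)]
    return max(verified, key=len) if verified else None


mutator_order = ["M1", "M6", "M2", "M3", "M4", "M5"]
best = least_atomic(mutator_order, stand_in_verifier)
print(best)
```

Six blocks give 2^5 = 32 partitions per ordering, so multiplying orderings by partitions across all three procedures grows quickly; this is why checking that discards incorrect candidates early matters.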

25 But What Now? How do we get further improvement? Need more insights. Need new building blocks, for example: start and end of collector reading a field. Coordination meta-data; atomicity; ordering.

26 Continuing the Search… We derived a non-atomic algorithm (at the granularity of blocks): non-atomic write barrier, collector step, and expose. The system explored over 1,600,000 algorithms (took ~34 hours). All experiments took ~41 machine hours and ~3 human hours.
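The reported figures imply rough exploration rates, worked out below. This is back-of-the-envelope arithmetic only: the inputs are the approximate numbers from the slides ("around 2 mins", "over 1,600,000", "~34 hours", "~41 machine hours", "~3 human hours"), so the results are estimates, and the variable names are illustrative.

```python
# Approximate figures quoted on the slides.
SMALL_CANDIDATES = 306          # first search
SMALL_SECONDS = 2 * 60          # "around 2 mins"
LARGE_CANDIDATES = 1_600_000    # "over 1,600,000", so a lower bound
LARGE_SECONDS = 34 * 3600       # "~34 hours"
MACHINE_HOURS = 41
HUMAN_HOURS = 3

small_rate = SMALL_CANDIDATES / SMALL_SECONDS   # candidates per second
large_rate = LARGE_CANDIDATES / LARGE_SECONDS   # candidates per second
machine_per_human = MACHINE_HOURS / HUMAN_HOURS

print(f"first search:  ~{small_rate:.2f} candidates/s")
print(f"second search: ~{large_rate:.2f} candidates/s")
print(f"machine hours per human hour: ~{machine_per_human:.1f}")
```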

27 Plan. Identify application domain. Case studies: concurrent garbage collection algorithms; concurrent set algorithms; concurrent memory allocator (used in Metronome); … Dynamic tool for testing systems (ParaDyn). Abstraction-guided synthesis. Automatic verification using local abstractions. Representation. Choosing the right starting point.

28 Highly Concurrent Plan. Identify application domain. Case studies: concurrent garbage collection algorithms; concurrent set algorithms; concurrent memory allocator (used in Metronome); … Dynamic tool for testing systems (ParaDyn). Representation. Choosing the right starting point. … Abstraction-guided synthesis. Automatic verification using local abstractions.

29 Succeed Early. Choose “the right” domain: correctness is critical; high performance; highly dynamic (concurrent changes); custom architecture (?); irregular structures (?); workloads unknown at compile time. Examples: VM components, drivers for embedded devices, …

30 Longer-term Questions

31 Representation. Appropriate for transformation? Makes concurrency apparent?

32 Choosing the Right Starting Point? “Higher-level specification”? A sequential program? Start with something else? Add(S,x): S’ = S ∪ {x}. Remove(S,x): S’ = S \ {x}. Contains(S,x): x ∈ S.
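The three-line set specification above can be written as an executable sequential reference. This is a minimal sketch that assumes an immutable `frozenset` stands in for the abstract set S; the function names mirror the slide's operations, and nothing here is concurrent. It is the kind of specification a synthesized concurrent set would be checked against.

```python
from typing import FrozenSet, Hashable


def add(s: FrozenSet[Hashable], x: Hashable) -> FrozenSet[Hashable]:
    """Add(S, x): S' = S ∪ {x}."""
    return s | {x}


def remove(s: FrozenSet[Hashable], x: Hashable) -> FrozenSet[Hashable]:
    """Remove(S, x): S' = S \\ {x}."""
    return s - {x}


def contains(s: FrozenSet[Hashable], x: Hashable) -> bool:
    """Contains(S, x): x ∈ S."""
    return x in s


# Each operation returns a new set, so earlier states remain available
# for comparison against an implementation's observed behavior.
s0 = frozenset()
s1 = add(s0, 3)
s2 = add(s1, 7)
s3 = remove(s2, 3)
print(sorted(s3), contains(s3, 7), contains(s3, 3))
```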

33 What is “More Efficient”? Multiple dimensions: scalability; response time; … Theoretical models exist: disjoint-access parallelism; … It is not clear whether existing theoretical models capture reality.

34 Abstraction-Guided Synthesis. Guarantee correctness: synthesize only programs that can be proved with your abstraction.

35 Summary. Machine-assisted design and implementation of correct, efficient, highly concurrent algorithms. Designer provides insights; system explores implementation details. Business impact: change the way concurrent systems are built; (more) reliable high-performance systems; shorter time to market. Scientific impact: realistic semi-automated synthesis of concurrent systems.

36 Why us? Our team has expertise in concurrency and verification of concurrent systems. We have preliminary experience with synthesizing concurrent algorithms in the domain of concurrent garbage collectors. We have ongoing collaborations with world experts on verification of concurrent programs, and with researchers working on parallel computing.

37 THE END

38 Parallelization. Higher-level. Underlying structure does not change during computation. System can be broken into independent parts.

39 Synthesizing Concurrent Systems. Designing practical and efficient concurrent systems is hard: trading off simplicity for performance; fine-grained coordination. Result: sub-optimal, buggy algorithms. Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications. "Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." (D. Knuth)

