Paraglide. Martin Vechev, Eran Yahav.

Presentation transcript:

Paraglide
Martin Vechev, Eran Yahav

Synthesizing System-Level Software
- Requirements: correctness, scalability, response time
- Challenges: crossing abstraction levels, hardware complexity, time to market

Highly Concurrent Algorithms
- Parallel pattern matching, anomaly detection
- Voxel trees, polyhedrons, …
- Scene graph traversal, physics simulation, collision detection, …
- Cartesian tree (fast fits), lock-free queue, garbage collection, …

Goal
- Generate efficient, provably correct components of concurrent systems from higher-level specs
  - Verification/checking integrated into the design process
  - Automatic exploration of implementation details
- Synthesize critical components
  - System-level code
  - Explore tradeoffs
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." --D. Knuth

Current Approach: Manual Construction
- SYSTEM SPEC: high(er)-level description
- ENVIRONMENT: memory model, thread model, concurrency primitives, CPU primitives, …
- REQUIREMENTS: throughput, memory consumption, pause time, …
- BAG OF TRICKS: optimistic concurrency, adding metadata, adding space, …
- Result: Implementation ??
- Manual construction: hard to verify/test; often buggy; did the programmer choose well?; one-time deal

Our Vision
- SYSTEM SPEC: high(er)-level description
- ENVIRONMENT: memory model, thread model, concurrency primitives, CPU primitives, …
- REQUIREMENTS: throughput, memory consumption, pause time, …
- BAG OF TRICKS: optimistic concurrency, adding metadata, adding space, …
- Result: Implementation, plus alternative implementations
- Machine assistance: automatic checking/verification; automatic exploration of implementation details; repeatable

Example: Concurrent Set Algorithm
- Systematically derived with machine assistance
- Correctness: automatically verified
- Performance: only uses CAS

Why Should You Care?
- Correctness
  - Checking/verification integrated into the design process
- Performance
  - Systematic exploration beats humans at crossing levels of abstraction, leveraging non-intuitive memory models, etc.
  - Systematic exploration produces many candidates with varying tradeoffs
- Adaptability
  - Shorter development cycle for adapting a system to a new environment


Why Is There Hope?
- Designer effort
  - Provide insights that are also required in manual construction
- Correctness
  - Checking helps eliminate a large number of incorrect candidates
  - Designer can focus on the remaining candidates
- Performance
  - …
- Adaptability
  - …

Why Is There Hope? (II)
- Transformational derivation
  - Concurrent garbage collection algorithms [PLDI’06]
- Combinatorial exploration
  - Concurrent GC algorithms [PLDI’07]
  - Concurrent set algorithms [PLDI’08]
- Automatic verification
  - Comparison under Abstraction for Verifying Linearizability [Amit, CAV’07]
  - Shape Analysis for Concurrent Programs [TAU]
  - …

Risk Summary
- Designer effort
  - Return on designer “investment”
  - Is the result competitive with a manually crafted system?
  - Is the tool working at the right level of abstraction?
- Verification
  - Scalability

Outline
- Technical details
  - Commonalities between concurrent algorithms
  - Adapting to a changing environment
  - Preliminary experience: our combinatorial approach
- Plan
  - Succeed early
- Many open questions
  - Common representation
  - “More efficient”
  - …

Example: “The Origin of GCs”
[Figure: a FAMILY of concurrent garbage collection ALGORITHMS and their PROOFS. Legend: Incorrect, Correct, (C) Corrected. Entries shown: Steele(C) ’75, Dijkstra(C) ’78, Ben-Ari Base ’84, Ben-Ari Extended ’84, Pixley ’88, Yuasa ’90, Boehm ’91, Doligez(C) ’93, Doligez ’94, Azatchi ’03, Barabash ’03, Domani ’03.]

Example: Concurrent Set Algorithms
Harris ’01, Michael ’02, Heller ’05, Valois ’95, Ruppert ’04, Massalin ’91, Greenwald ’99

Adapting to a Changing Environment
The algorithm depends on: synchronization primitives, memory model, thread model, memory manager, scheduler, …

Machine Assisted Design Process
- Families of algorithms sharing a common skeleton with parametric functions
- Parametric functions: Trace Step, Mutator Step, Expose
- Participants: Mutator, Collector

Overview
                     Generation                   Verification
High-level design    Find algorithm outline;      Find a sufficient local invariant;
                     find building blocks         find a sufficient abstraction
Low-level search     Explore algorithm space      Verify local invariant

Coarse-Grained to Fine-Grained Synchronization
Steps (annotated "atomic" on the slide):
Mutator Step (source, field, new) {
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
  M6: source.field = new
}
Trace Step (source, field) {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Set Expose (log) {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
  return V
}
What now? Can we remove atomics? Result is incorrect, may lose objects!


Coarse-Grained to Fine-Grained Synchronization
Trace Step (source, field) {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}
Mutator Step (source, field, new), with M5 moved ahead of M3 and M4 {
  M1: old = source.field
  M2: w = source.field.WF
  M5: w → old.MC--
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M6: source.field = new
}
Set Expose (log) {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
  return V
}
What now? Can we remove atomics?
“When in doubt, use brute force.” --Ken Thompson
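The coarse-grained starting point on these slides can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation. The slides do not expand WF or MC, so the sketch treats WF as a per-field boolean and MC as a per-object integer counter. The names `Obj` and `Heap` are hypothetical. A single lock stands in for the slides' "atomic" annotation, and looping until the log is empty in `expose` is also an assumption.

```python
import threading


class Obj:
    """Hypothetical heap object: reference fields, per-field WF flag, MC counter."""

    def __init__(self, name):
        self.name = name
        self.fields = {}     # field name -> Obj (or missing)
        self.wf = {}         # field name -> bool, the WF flag (assumed per field)
        self.mc = 0          # the MC counter (assumed per object)
        self.marked = False


class Heap:
    """Coarse-grained version: every step runs under one global lock."""

    def __init__(self):
        self.lock = threading.Lock()   # stands in for "atomic" (assumption)
        self.log = set()

    def mutator_step(self, source, field, new):
        with self.lock:
            old = source.fields.get(field)       # M1
            w = source.wf.get(field, False)      # M2
            if w:
                new.mc += 1                      # M3
                self.log.add(new)                # M4
                if old is not None:
                    old.mc -= 1                  # M5
            source.fields[field] = new           # M6

    def trace_step(self, source, field):
        with self.lock:
            dst = source.fields.get(field)       # C1
            source.wf[field] = True              # C2
            if dst is not None:
                dst.marked = True                # C3
            return dst

    def expose(self):
        with self.lock:
            exposed = set()
            while self.log:                      # loop is an assumption
                o = self.log.pop()               # E1
                mc = o.mc                        # E2
                if mc > 0:
                    o.marked = True              # E3
                    exposed.add(o)               # E4
            return exposed
```

Because every step holds the same lock, no interleaving inside a step is possible. The question the slides raise is how much of that lock can be dropped while the algorithm stays correct.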

System Input: Building Blocks
Mutator building blocks:
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
  M6: source.field = new
Tracing step building blocks:
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
Expose building blocks:
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
Input constraints:
- Mutator blocks: [M3, M4]
- Tracing blocks: [C1, C3]
- Expose blocks: [E1, E2, E3, E4]
- Dataflow, e.g. M2 < M3
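One small part of the search, enumerating the mutator block orders that satisfy the input constraints, can be sketched in Python as follows. Only the grouping [M3, M4] and the dataflow pair M2 < M3 come from the slide. Reading [M3, M4] as "M4 immediately follows M3" is an assumption. The other ordering pairs are also assumptions, inferred from which block defines `w` and `old` and from M6 overwriting the field that M1 reads. The function names are hypothetical, and the sketch ignores atomicity choices and the tracing and expose blocks.

```python
from itertools import permutations

BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]

# From the slide: M3 and M4 form one group (read here as "stay adjacent").
ATOMIC_GROUPS = [("M3", "M4")]

# (before, after) pairs. Only ("M2", "M3") is stated on the slide;
# the rest are assumed from variable definitions and uses.
DATAFLOW = [
    ("M2", "M3"),  # w is defined in M2 and used in M3 (from the slide)
    ("M2", "M4"),  # w used in M4 (assumed)
    ("M2", "M5"),  # w used in M5 (assumed)
    ("M1", "M5"),  # old is defined in M1 and used in M5 (assumed)
    ("M1", "M6"),  # M6 overwrites the field M1 reads (assumed)
]


def respects_constraints(order):
    """Return True if the order satisfies dataflow and grouping constraints."""
    pos = {block: i for i, block in enumerate(order)}
    if any(pos[a] >= pos[b] for a, b in DATAFLOW):
        return False
    for group in ATOMIC_GROUPS:
        for first, second in zip(group, group[1:]):
            if pos[second] != pos[first] + 1:
                return False
    return True


def candidate_orders():
    """Brute-force enumeration of all admissible mutator block orders."""
    return [order for order in permutations(BLOCKS) if respects_constraints(order)]


if __name__ == "__main__":
    for order in candidate_orders():
        print(" ".join(order))
```

In the system the slides describe, each surviving candidate would then go to a checker. This sketch stops at enumeration and performs no verification.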

System Output: (Verified) Algorithms
Mutator Step (source, field, new) {
  M1: old = source.field
  M6: source.field = new
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
}
Set Expose (log) {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
}
Trace Step (source, field) {
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
}
- Explored 306 variations in around 2 minutes
- Least atomic (verified) algorithm with the given blocks

But What Now?
- How do we get further improvement?
- Need more insights
- Need new building blocks
  - Example: start and end of collector reading a field
- Dimensions shown on the slide: coordination meta-data, atomicity, ordering

Continuing the Search…
- We derived a non-atomic algorithm (at the granularity of blocks)
  - Non-atomic write barrier, collector step, and expose
  - System explored over 1,600,000 algorithms (took ~34 hours)
- All experiments took ~41 machine hours and ~3 human hours

Plan
- Identify application domain
- Case studies
  - Concurrent garbage collection algorithms
  - Concurrent set algorithms
  - Concurrent memory allocator (used in Metronome)
  - …
- Dynamic tool for testing systems (ParaDyn)
- Abstraction-guided synthesis
- Automatic verification using local abstractions
- Representation
- Choosing the right starting point


Succeed Early
- Choose “the right” domain
  - Correctness is critical
  - High performance
  - Highly dynamic (concurrent changes)
  - Custom architecture (?)
  - Irregular structures (?)
  - Workloads unknown at compile time
  - Examples: VM components, drivers for embedded devices, …

Longer-term Questions

Representation
- Appropriate for transformation?
- Makes concurrency apparent?

Choosing the Right Starting Point?
- A “higher-level specification”?
- A sequential program?
- Start with something else?
Example specification of a set:
  Add(S, x): S’ = S ∪ {x}
  Remove(S, x): S’ = S \ {x}
  Contains(S, x): x ∈ S
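The set specification on this slide can be written as an executable sequential reference in Python. This is a minimal sketch: modeling the state S as an immutable `frozenset` and the function names `add`, `remove`, and `contains` are choices made for illustration, not something the slides prescribe.

```python
from typing import FrozenSet, Hashable

State = FrozenSet[Hashable]


def add(s: State, x: Hashable) -> State:
    """Add(S, x): S' = S ∪ {x}."""
    return s | {x}


def remove(s: State, x: Hashable) -> State:
    """Remove(S, x): S' = S \\ {x}."""
    return s - {x}


def contains(s: State, x: Hashable) -> bool:
    """Contains(S, x): x ∈ S."""
    return x in s
```

A reference like this says nothing about concurrency. It only fixes what each operation means, which is the role a higher-level starting point would play.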

What is “More Efficient”?
- Multiple dimensions
  - Scalability
  - Response time
  - …
- Theoretical models exist
  - Disjoint-access parallelism
  - …
- Not clear whether existing theoretical models capture reality

Abstraction-Guided Synthesis
- Guarantee correctness
  - Synthesize only programs that can be proved with your abstraction

Summary
- Machine-assisted design and implementation of correct, efficient, highly concurrent algorithms
- Designer provides insights, system explores implementation details
- Business impact
  - Change the way concurrent systems are built
  - (More) reliable high-performance systems; shorter time to market
- Scientific impact
  - Realistic semi-automated synthesis of concurrent systems

Why us?
- Our team has expertise in concurrency and verification of concurrent systems.
- We have preliminary experience with synthesizing concurrent algorithms in the domain of concurrent garbage collectors.
- We have ongoing collaborations with world experts on verification of concurrent programs, and with researchers working on parallel computing.

THE END

Parallelization
- Higher-level
- Underlying structure does not change during computation
- System can be broken into independent parts

Synthesizing Concurrent Systems
- Designing practical and efficient concurrent systems is hard
  - Trading off simplicity for performance
  - Fine-grained coordination
- Result: sub-optimal, buggy algorithms
- Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." --D. Knuth