Architecture-aware Analysis of Concurrent Software Rajeev Alur University of Pennsylvania Amir Pnueli Memorial Symposium New York University, May 2010.

Presentation transcript:

Architecture-aware Analysis of Concurrent Software
Rajeev Alur, University of Pennsylvania
Amir Pnueli Memorial Symposium, New York University, May 2010
Joint work with Sebastian Burckhardt, Sela Mador-Haim, and Milo Martin

Amir's Influence on My Own Research
"A really temporal logic", FOCS 1989
- Joint work with Tom Henzinger
- Written while visiting Weizmann in Spring 1989
- Extension of LTL with real-time bounds
- Example: Always (p -> Eventually_{<5} q), i.e. every p is followed by a q in fewer than 5 time units

Moore's Law: Transistor density doubles every 2 years
Engine behind computing power

Software Challenge: How to assure increased performance?
Past: More transistors per chip and faster clock rate
- Same program would execute faster on new processor
Emerging trend and future: Parallel hardware (multi-cores)
- Programs must be concurrent
- Applications must be reprogrammed to use parallelism
"The free lunch is over: A fundamental turn towards concurrency in software" (Herb Sutter)

Challenge: Exploiting Concurrency, Correctly
[Diagram labels: Multi-threaded Software, Shared-memory Multiprocessor, Concurrent Executions, Bugs]
How to specify and verify shared-memory concurrent programs?

Concurrency on Multiprocessors
Initially x = y = 0
thread 1: x = 1; y = 1
thread 2: r1 = y; r2 = x

Standard interleavings and their results:
x = 1; y = 1; r1 = y; r2 = x    r1=r2=1
x = 1; r1 = y; y = 1; r2 = x    r1=0,r2=1
x = 1; r1 = y; r2 = x; y = 1    r1=0,r2=1
r1 = y; x = 1; y = 1; r2 = x    r1=0,r2=1
r1 = y; x = 1; r2 = x; y = 1    r1=0,r2=1
r1 = y; r2 = x; x = 1; y = 1    r1=r2=0

Can we conclude that if r1 = 1 then r2 must be 1? No! On "real" multiprocessors, it is possible to have r1=1 and r2=0.
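The six standard interleavings can be replayed mechanically. The following is a minimal sketch in C, assuming a plain sequentially consistent interpreter; the names `run_interleaving` and `sc_reachable` are hypothetical and not part of the talk. It shows that r1=1, r2=0 never arises from any interleaving, which is why seeing it on hardware points to a weaker memory model.

```c
#include <stdbool.h>

/* Outcome of one run: the values loaded into r1 and r2. */
typedef struct { int r1, r2; } outcome;

/* Execute one interleaving under sequential consistency.
 * order[i] is the thread (1 or 2) that takes the i-th step;
 * each thread executes its own two instructions in program order. */
static outcome run_interleaving(const int order[4]) {
    int x = 0, y = 0;        /* shared memory, initially x = y = 0 */
    int pc1 = 0, pc2 = 0;    /* per-thread program counters */
    outcome o = { -1, -1 };
    for (int i = 0; i < 4; i++) {
        if (order[i] == 1) {             /* thread 1: x = 1; y = 1 */
            if (pc1 == 0) x = 1; else y = 1;
            pc1++;
        } else {                         /* thread 2: r1 = y; r2 = x */
            if (pc2 == 0) o.r1 = y; else o.r2 = x;
            pc2++;
        }
    }
    return o;
}

/* Returns true if some standard interleaving produces (r1, r2). */
bool sc_reachable(int r1, int r2) {
    /* The 6 ways to interleave two threads of two steps each. */
    static const int orders[6][4] = {
        {1, 1, 2, 2}, {1, 2, 1, 2}, {1, 2, 2, 1},
        {2, 1, 1, 2}, {2, 1, 2, 1}, {2, 2, 1, 1}
    };
    for (int k = 0; k < 6; k++) {
        outcome o = run_interleaving(orders[k]);
        if (o.r1 == r1 && o.r2 == r2) return true;
    }
    return false;
}
```

Calling `sc_reachable(1, 0)` returns false, while the three outcomes listed on the slide are all reachable.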

Architectures with Weak Memory Models
- A modern multiprocessor does not enforce global ordering of all instructions for performance reasons
- Lamport (1979): Sequential consistency semantics for multiprocessor shared memory
- Considered too limiting, and many "relaxations" proposed
  - In theory: TSO, PSO, RMO, Relaxed …
  - In practice: Alpha, Intel x86, IBM 370, Sun SPARC, PowerPC, ARM …
[Diagram labels: Main Memory, cache]

Programming with Weak Memory Models
- Concurrent programming is already hard; shouldn't the effects of weaker models be hidden from the programmer?
- Mostly yes …
  - Safe programming through extensive use of synchronization primitives
  - Use locks for every access to shared data
  - Compilers use memory fences to enforce ordering
- Not always …
  - Non-blocking data structures
  - Highly optimized library code for concurrency
  - Code for lock/unlock instructions

Architecture-aware Concurrency Analysis
[Diagram: layered stack]
- Programs (multi-threaded)
- Application level concurrency model: simple, usable by programmers
- System-level code, concurrency libraries
- Architecture level concurrency model: complex, efficient use of parallelism
- Highly parallel hardware -- multicores, SoCs

Effect of Memory Model
Initially flag1 = flag2 = 0
thread 1:
  1. flag1 = 1;
  2. if (flag2 == 0) crit. sect.
thread 2:
  1. flag2 = 1;
  2. if (flag1 == 0) crit. sect.
- Ensures mutual exclusion if architecture supports SC memory
- Most architectures do not enforce ordering of accesses to different memory locations
  - Does not ensure mutual exclusion under weaker models
- Ordering can be enforced using "fence" instructions
  - Insert MEMBAR between lines 1 and 2 to ensure mutual exclusion
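The fence placement can be written down with C11 atomics. This is a sketch only: it assumes that `atomic_thread_fence(memory_order_seq_cst)` stands in for the MEMBAR of the slide, and the function names `thread1_try_enter` and `thread2_try_enter` are hypothetical. The flag accesses are deliberately relaxed so that the fence alone carries the ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Static storage is zero-initialized: initially flag1 = flag2 = 0. */
static atomic_int flag1;
static atomic_int flag2;

/* Thread 1: returns true if it may enter the critical section. */
bool thread1_try_enter(void) {
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);          /* line 1 */
    atomic_thread_fence(memory_order_seq_cst);                       /* fence between lines 1 and 2 */
    return atomic_load_explicit(&flag2, memory_order_relaxed) == 0;  /* line 2 */
}

/* Thread 2: symmetric, with the roles of the flags swapped. */
bool thread2_try_enter(void) {
    atomic_store_explicit(&flag2, 1, memory_order_relaxed);          /* line 1 */
    atomic_thread_fence(memory_order_seq_cst);                       /* fence between lines 1 and 2 */
    return atomic_load_explicit(&flag1, memory_order_relaxed) == 0;  /* line 2 */
}
```

With both fences in place, at most one of the two calls can return true when they race. Without the fences, the relaxed store and load in each thread may be reordered and both threads could enter.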

Relaxed Memory Models
- A large variety of models exist; a good starting point: "Shared Memory Consistency Models: A Tutorial", Adve & Gharachorloo, IEEE Computer, 1996
- How to relax memory order requirement?
  - Operations of same thread to different locations need not be globally ordered
- How to relax write atomicity requirement?
  - Read may return value of a write not yet globally visible
- Uniprocessor semantics preserved
- Typically defined in architecture manuals (e.g. SPARC manual)

Unusual Effects of Memory Models
Initially A = flag1 = flag2 = 0
thread 1: flag1 = 1; A = 1; reg1 = A; reg2 = flag2;
thread 2: flag2 = 1; A = 2; reg3 = A; reg4 = flag1;
Result: reg1 = 1; reg3 = 2; reg2 = reg4 = 0
Possible on TSO/SPARC
- Write to A propagated only to local reads to A
- Reads to flags can occur before writes to flags
Not allowed on IBM 370
- Read of A on a processor waits till write to A is complete
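One way to see why TSO admits this result is a toy store-buffer model. The sketch below is an illustration under stated assumptions, not the SPARC specification: each thread owns a FIFO store buffer, a load first looks in its own buffer (local forwarding), and buffers drain to memory later. All names are hypothetical.

```c
#include <stddef.h>

enum { A, FLAG1, FLAG2, NUM_LOCS };   /* shared locations */
#define BUF_CAP 8

/* FIFO buffer of stores not yet visible to other threads. */
typedef struct {
    int addr[BUF_CAP];
    int val[BUF_CAP];
    size_t len;
} store_buffer;

typedef struct { int reg1, reg2, reg3, reg4; } result;

/* A store goes into the issuing thread's buffer, not to memory. */
static void buffered_store(store_buffer *sb, int addr, int val) {
    sb->addr[sb->len] = addr;
    sb->val[sb->len] = val;
    sb->len++;
}

/* A load returns the newest buffered store to addr, else memory. */
static int load(const store_buffer *sb, const int mem[], int addr) {
    for (size_t i = sb->len; i-- > 0; )
        if (sb->addr[i] == addr) return sb->val[i];
    return mem[addr];
}

/* Drain the buffer to memory in FIFO order. */
static void flush(store_buffer *sb, int mem[]) {
    for (size_t i = 0; i < sb->len; i++) mem[sb->addr[i]] = sb->val[i];
    sb->len = 0;
}

/* One schedule: both threads run to completion before any buffer drains. */
result run_tso_schedule(void) {
    int mem[NUM_LOCS] = {0};          /* initially A = flag1 = flag2 = 0 */
    store_buffer sb1 = {0}, sb2 = {0};
    result r;

    buffered_store(&sb1, FLAG1, 1);   /* thread 1: flag1 = 1 */
    buffered_store(&sb1, A, 1);       /*           A = 1 */
    r.reg1 = load(&sb1, mem, A);      /*           reg1 = A (forwarded) */
    r.reg2 = load(&sb1, mem, FLAG2);  /*           reg2 = flag2 (from memory) */

    buffered_store(&sb2, FLAG2, 1);   /* thread 2: flag2 = 1 */
    buffered_store(&sb2, A, 2);       /*           A = 2 */
    r.reg3 = load(&sb2, mem, A);      /*           reg3 = A (forwarded) */
    r.reg4 = load(&sb2, mem, FLAG1);  /*           reg4 = flag1 (from memory) */

    flush(&sb1, mem);
    flush(&sb2, mem);
    return r;
}
```

In this schedule each thread reads its own write to A from its buffer while the other thread's flag write is still buffered, which yields exactly the result on the slide.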

Memory Model Specifications in Practice
Intel Architecture Manual (2008): Intel 64 memory ordering obeys the following principles
1. Loads are not reordered with other loads
2. Stores are not reordered with other stores
3. Stores are not reordered with older loads
4. Loads may be reordered with older stores to different locations but not with older stores to same locations
(plus 4 more rules and illustrative examples)

Which Memory Model should a Verifier use?
[Diagram relating memory models: SC, 390, TSO, PSO, IA-32, Alpha, RMO, Relaxed]

Formalization of Relaxed
- Program order: x <_p y if x and y are instructions belonging to the same thread and x appears before y
- An execution over a set X of accesses is correct wrt Relaxed if there exists a total order < over X such that
  1. If x <_p y, and both x and y are accesses to the same address, and y is a store, then x < y must hold
  2. For a load l and a store s visible to l, either s and l have the same value, or there exists another store s' visible to l with s < s'
  A store s is visible to a load l if they are to the same address and either s < l or s <_p l (i.e. stores are locally visible)
- Constraint-based specification that can be easily encoded in logical formulas
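The two axioms are simple enough to check by brute force on a tiny execution. The sketch below is an illustration, not CheckFence's encoding: it enumerates every total order over the four accesses of the earlier example (thread 1: x = 1; y = 1, thread 2: r1 = y; r2 = x) and tests the axioms directly. Two things are assumptions of this sketch and not stated on the slide: a load with no visible store returns the initial value 0, and setting the hypothetical flag `require_po` adds "program order is contained in <" to approximate SC.

```c
#include <stdbool.h>

#define N 4   /* number of accesses in the example */

typedef struct {
    int thread;      /* issuing thread */
    int index;       /* position within its thread */
    bool is_store;
    char addr;       /* 'x' or 'y' */
    int value;       /* value stored, or value the load is claimed to return */
} access;

/* Program order: same thread and a appears before b. */
static bool po_before(const access *a, const access *b) {
    return a->thread == b->thread && a->index < b->index;
}

/* pos[i] is the position of access i in the candidate total order <. */
static bool satisfies_axioms(const access acc[N], const int pos[N], bool require_po) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            if (!po_before(&acc[i], &acc[j])) continue;
            /* Axiom 1: same address and the later access is a store. */
            if (acc[i].addr == acc[j].addr && acc[j].is_store && pos[i] > pos[j])
                return false;
            /* Extra SC-style constraint (assumption of this sketch). */
            if (require_po && pos[i] > pos[j]) return false;
        }
    for (int l = 0; l < N; l++) {
        if (acc[l].is_store) continue;
        int latest = -1;   /* the <-maximal store visible to load l */
        for (int s = 0; s < N; s++) {
            if (!acc[s].is_store || acc[s].addr != acc[l].addr) continue;
            bool visible = pos[s] < pos[l] || po_before(&acc[s], &acc[l]);
            if (visible && (latest < 0 || pos[s] > pos[latest])) latest = s;
        }
        /* Axiom 2 forces the load to agree with the maximal visible store;
         * with no visible store we assume the initial value 0. */
        int expected = latest < 0 ? 0 : acc[latest].value;
        if (acc[l].value != expected) return false;
    }
    return true;
}

/* Try every assignment of distinct positions to the accesses. */
static bool search(const access acc[N], int pos[N], bool used[N], int k, bool require_po) {
    if (k == N) return satisfies_axioms(acc, pos, require_po);
    for (int p = 0; p < N; p++) {
        if (used[p]) continue;
        used[p] = true;
        pos[k] = p;
        if (search(acc, pos, used, k + 1, require_po)) return true;
        used[p] = false;
    }
    return false;
}

/* Is the outcome (r1, r2) allowed for the two-thread example? */
bool outcome_allowed(int r1, int r2, bool require_po) {
    const access acc[N] = {
        {1, 0, true,  'x', 1},    /* thread 1: x = 1 */
        {1, 1, true,  'y', 1},    /* thread 1: y = 1 */
        {2, 0, false, 'y', r1},   /* thread 2: r1 = y */
        {2, 1, false, 'x', r2},   /* thread 2: r2 = x */
    };
    int pos[N];
    bool used[N] = {false};
    return search(acc, pos, used, 0, require_po);
}
```

Here `outcome_allowed(1, 0, false)` holds because the order "y = 1, r1 = y, r2 = x, x = 1" satisfies both axioms, whereas `outcome_allowed(1, 0, true)` fails. A real tool would hand these constraints to a solver instead of enumerating orders.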

Verification Target: Concurrent Data Structures
- Low-level high-performance concurrency libraries are essential infrastructure for multi-core programming
  - Intel Threading Building Blocks
  - Java Concurrency Library
- Challenging and tricky code
  - Sets, queues, trees, hash-tables
  - Designing such algorithms is publishable research!
- Subtle bugs in algorithms and/or implementation
  - Libraries released by Sun
  - Published code in textbooks
- Complexity not in # of lines of code but in concurrent interactions

Non-blocking Lock-free Queue
Michael and Scott, 1996

    boolean dequeue(queue *queue, value *pvalue) {
      node *head;
      node *tail;
      node *next;
      while (true) {
        /* snapshot the queue state; it may change concurrently */
        head = queue->head;
        tail = queue->tail;
        next = head->next;
        if (head == queue->head) {          /* is the snapshot still consistent? */
          if (head == tail) {
            if (next == 0) return false;    /* queue is empty */
            cas(&queue->tail, tail, next);  /* tail is lagging; help advance it */
          } else {
            *pvalue = next->value;
            if (cas(&queue->head, head, next)) break;
          }
        }
      }
      delete_node(head);
      return true;
    }

- Queue is possibly being updated concurrently
- Atomic compare-and-swap for synchronization
- Fences must be inserted to assure correctness on weak memory models
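The `cas` primitive used above is not defined on the slide. A minimal sketch of one possible definition with C11 atomics follows; the `node` layout and the signature are assumptions made for illustration. Note that `atomic_compare_exchange_strong` defaults to sequentially consistent ordering, which is stronger than a bare hardware compare-and-swap and hides some of the fence questions the talk is about.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout, for illustration only. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* If *loc still holds `expected`, replace it with `desired` and return
 * true; otherwise leave *loc unchanged and return false. */
bool cas(_Atomic(node *) *loc, node *expected, node *desired) {
    /* On failure the library writes the current value into the local
     * copy of `expected`; the caller's pointer is not affected. */
    return atomic_compare_exchange_strong(loc, &expected, desired);
}
```

A caller that loses the race simply retries with a fresh snapshot, which is what the `while (true)` loop in `dequeue` does.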

Bounded Model Checker
[Diagram labels: CheckFence, Memory Model Axioms]
- Pass: all executions of the test are observationally equivalent to a serial execution
- Fail:
- Inconclusive: runs out of time or memory

Why symbolic test programs?
1) Make everything finite
- State is unbounded (dynamic memory allocation) ... is bounded for individual test
- Checking sequential consistency is undecidable (AMP 96) ... is decidable for individual test
2) Gives us finite instruction sequence to work with
- State space too large for interleaved system model ... can directly encode value flow between instructions
- Memory model specified by axioms ... can directly encode ordering axioms on instructions

Correctness Condition
Data type implementations must appear sequentially consistent to the client program: the observed argument and return values must be consistent with some interleaved, atomic execution of the operations.
Observation:
  thread 1: enqueue(1); dequeue() -> 2
  thread 2: enqueue(2); dequeue() -> 1
Witness interleaving:
  enqueue(1); enqueue(2); dequeue() -> 1; dequeue() -> 2
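For a two-thread test, the condition can be checked by searching for a witness interleaving. The sketch below is illustrative and assumes a FIFO queue specification, two threads, and that every recorded dequeue succeeded; the names `op`, `has_witness`, and so on are hypothetical. It replays candidate interleavings against a sequential queue and accepts if one reproduces all observed return values.

```c
#include <stdbool.h>

typedef enum { ENQ, DEQ } op_kind;

/* One operation with its observed argument (ENQ) or return value (DEQ). */
typedef struct { op_kind kind; int value; } op;

#define QCAP 16

/* Apply one operation to the sequential queue stored in q[*head .. *tail).
 * Returns false if the observed dequeue value does not match. */
static bool step(const op *o, int q[], int *head, int *tail) {
    if (o->kind == ENQ) {
        q[(*tail)++] = o->value;
        return true;
    }
    if (*head == *tail || q[*head] != o->value) return false;
    (*head)++;
    return true;
}

/* Depth-first search over interleavings: i operations of thread a and
 * j operations of thread b have already been replayed. */
static bool search(const op *a, int na, int i, const op *b, int nb, int j,
                   int q[], int head, int tail) {
    if (i == na && j == nb) return true;     /* all operations explained */
    if (i < na) {
        int h = head, t = tail;
        if (step(&a[i], q, &h, &t) && search(a, na, i + 1, b, nb, j, q, h, t))
            return true;
    }
    if (j < nb) {
        int h = head, t = tail;
        if (step(&b[j], q, &h, &t) && search(a, na, i, b, nb, j + 1, q, h, t))
            return true;
    }
    return false;
}

/* True if some atomic interleaving of the two threads' operations
 * is consistent with the observed values. */
bool has_witness(const op *a, int na, const op *b, int nb) {
    int q[QCAP];
    return na + nb <= QCAP && search(a, na, 0, b, nb, 0, q, 0, 0);
}
```

For the observation on the slide the search finds the witness enqueue(1), enqueue(2), dequeue() -> 1, dequeue() -> 2. An observation in which one thread enqueues 1 then 2 while the other dequeues 2 then 1 has no witness.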

Tool Architecture
[Diagram labels: C code, Symbolic Test, Memory model, Trace]
- Symbolic test gives exponentially many executions (symbolic inputs, dynamic memory allocation, ordering of instructions).
- CheckFence solves for "incorrect" executions.

Example: Memory Model Bug
Processor 1 links new node into list:
  ...
  node->value = 2;       (3)
  ...
  head = node;           (1)
  ...
Processor 2 reads value at head of list:
  ...
  value = head->value;   (2)
  ...
- Processor 1 reorders the stores! Memory accesses happen in order (1), (2), (3)
- --> Processor 2 loads uninitialized value
- Adding a fence between the two stores on the left side (Processor 1) prevents the reordering
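In C11 terms the fix can be sketched as follows. This is an assumption-laden illustration, not the code analyzed in the talk: `publish` and `read_head_value` are hypothetical names, the release fence plays the role of the store-store fence on Processor 1, and an acquire fence is added on the reader side because the C11 model, unlike many architectures, does not order dependent loads on its own.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { int value; } node;

/* Shared list head; static storage starts out as NULL. */
static _Atomic(node *) head;

/* Processor 1: initialize the node, then link it in. */
void publish(node *n, int v) {
    n->value = v;
    /* Fence between the two stores: the write to n->value may not be
     * reordered after the write to head. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&head, n, memory_order_relaxed);
}

/* Processor 2: read the value at the head of the list, or return
 * `fallback` if nothing has been published yet. */
int read_head_value(int fallback) {
    node *n = atomic_load_explicit(&head, memory_order_relaxed);
    if (n == NULL) return fallback;
    /* Pairs with the release fence in publish(). */
    atomic_thread_fence(memory_order_acquire);
    return n->value;
}
```

Without the release fence, a reader could observe the new `head` before `n->value` has been written, which is the uninitialized load described on the slide.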

Algorithms Analyzed
Type     | Description         | LOC | Source
Queue    | Two-lock queue      | 80  | M. Michael and L. Scott (PODC 1996)
Queue    | Non-blocking queue  | 98  | (same as above)
Set      | Lazy list-based set | 141 | Heller et al. (OPODIS 2005)
Set      | Nonblocking list    | 174 | T. Harris (DISC 2001)
Deque    | "snark" algorithm   | 159 | D. Detlefs et al. (DISC 2000)
LL/VL/SC | CAS-based           | 74  | M. Moir (PODC 1997)
LL/VL/SC | Bounded Tags        | 198 | (same as above)

Results
[Table: regular bugs and # fences inserted (Store, Load, Dependent Loads, Aliased Loads) for each algorithm: Two-lock queue, Non-blocking queue, Lazy list-based set, Nonblocking list, original "snark", fixed "snark", CAS-based, Bounded Tags. Surviving cell values: "2 known", "1 unknown", "2", "4".]
- snark algorithm has 2 known bugs
- lazy list-based set had a previously unknown bug (missing initialization; missed by formal correctness proof [CAV 2006] because of hand-translation of pseudocode)
- Many failures on relaxed memory model
  - inserted fences by hand to fix them
  - small testcases sufficient for this purpose

Ongoing Work
Generating litmus tests for contrasting memory models (CAV 2010)
- Developing and understanding formal specs of hardware memory models is challenging (frequent revisions, subtle differences…)
- Two distinct styles: operational and axiomatic
- Tool takes two specs and automatically finds a litmus test (small multi-threaded program) that demonstrates an observable difference between the two
- Litmus tests up to a specified bound are systematically explored (with many reductions built in to reduce # of explored tests)
- Feasibility demonstrated by debugging/contrasting existing specs
Open question: Is there a bound on the size of litmus tests needed to contrast two memory models (from a well-defined class of models)?

Memory-model Aware Software Verification
- A problem of practical relevance
- Formal approaches can be potentially useful
- Not well-studied