Formalizing Memory Consistency Models for Program Analysis Jason Yue Yang This work was supported in part by NSF Research Grant No. CCR-0081406 and SRC.

Formalizing Memory Consistency Models for Program Analysis Jason Yue Yang This work was supported in part by NSF Research Grant No. CCR-0081406 and SRC Task 1031.001. Doctoral Dissertation Defense

2 Motivation
Multithreaded software: popular, BUT hard to analyze
- Thread libraries: e.g., Pthreads, Win32, Solaris
- Language-level support of threads: e.g., Java
Memory architectures: more aggressive
- Load/store, data dependence, semaphore, memory fence, load-acquire/store-release, write atomicity
Central problem: shared memory consistency models
- Need a clear specification of memory ordering rules
- Need an executable version of memory ordering rules
- Need a method to analyze thread executions against the rules

3 What Is a Memory Model?
It defines the legal orderings of memory operations that can be perceived at the user level.
Example (Itanium assembly code, initially: a = b = 0)
Store/load (less restriction):
  P1: st a,1; st b,1;
  P2: ld r1,b; ld r2,a;
  With r1 = 1, observing r2 = 0 is OK.
Store-release/load-acquire (more restriction):
  P1: st a,1; st.rel b,1;
  P2: ld.acq r1,b; ld r2,a;
  With r1 = 1, can't observe r2 = 0.
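As a point of comparison for the litmus test above, the following minimal sketch enumerates every sequentially consistent interleaving of the two threads and collects the observable (r1, r2) pairs. It is an illustration only, not the tool from the talk: the tuple encoding of operations and the helper names `interleavings`, `run`, and `sc_outcomes` are assumptions made for this example, and the Itanium rules themselves are not modeled.

```python
# Operations are (kind, variable, value-or-register); this encoding is assumed.
P1 = [("st", "a", 1), ("st", "b", 1)]
P2 = [("ld", "b", "r1"), ("ld", "a", "r2")]

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves each thread's program order."""
    if not t1 or not t2:
        yield list(t1) + list(t2)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest

def run(schedule):
    """Execute one interleaving against a single shared memory (a = b = 0)."""
    mem = {"a": 0, "b": 0}
    regs = {}
    for kind, var, x in schedule:
        if kind == "st":
            mem[var] = x
        else:
            regs[x] = mem[var]
    return regs["r1"], regs["r2"]

def sc_outcomes():
    """Set of (r1, r2) pairs observable under sequential consistency."""
    return {run(s) for s in interleavings(P1, P2)}

print(sorted(sc_outcomes()))
```

Under this single-memory model the pair (r1, r2) = (1, 0) never appears, which is the behavior the store-release/load-acquire version is meant to enforce on Itanium.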

4 Classical Memory Models
Sequential Consistency (SC)
Non-operational view:
1. Common total order
2. Program order
3. Read sees the "latest" write
Operational view:
They execute as if connected to a single memory through a non-deterministic switch
Other weaker models: Parallel Random Access Memory (PRAM), Coherence, Causal Consistency, Processor Consistency, Release Consistency, Lazy Release Consistency, Location Consistency, and more

5 Industrial Memory Models
Example: The Intel Itanium® Memory Model
- Intel application note contains more than 30 pages of semi-formal rules
- English + a large amount of special notation
- Many non-obvious consequences
- Uses litmus tests to illustrate properties
- Cannot automatically execute litmus tests
- Relies on pencil-and-paper reasoning

6 Language Level Memory Models
Example: The Java Memory Model (JMM)
Original JMM: Chapter 17 of the Java Language Specification
- Poorly understood
- Flawed
  - too weak (may introduce security holes)
  - too strong (prevents common optimizations)
Currently under revision (JSR-133)
- Extensive discussions for more than 3 years
- Several replacement proposals
- Issues still remain

7 Why Does a Memory Model Matter?
Example: Peterson's Algorithm for Mutual Exclusion
Initially, flag1 = flag2 = false, turn = 0.
Thread 1:
  flag1 = true;
  turn = 2;
  while (turn == 2 && flag2) ;
  /* critical section */
  flag1 = false;
Thread 2:
  flag2 = true;
  turn = 1;
  while (turn == 1 && flag1) ;
  /* critical section */
  flag2 = false;
Can both threads enter the critical section simultaneously?
- For sequential consistency: No (the "intended behavior" is guaranteed)
- For many weaker models: Yes (the algorithm would be broken)
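The claim above can be checked on a small scale with an explicit state-space search. The sketch below is an assumption-laden illustration, not the analysis from the dissertation: it models each thread as a tiny program counter machine, explores all interleavings, and reports whether both threads can be in the critical section at once. The `delay_flag_store` switch is a hypothetical, deliberately crude stand-in for a weaker model in which the store to the thread's own flag becomes visible only after the loop's loads; it does not represent any specific architecture.

```python
from collections import deque

def successors(state, delay_flag_store):
    """Yield every state reachable by letting one thread take one step."""
    pcs, flags, turn = state
    for tid in (1, 2):
        i, other = tid - 1, 3 - tid
        pc = pcs[i]
        new_pcs, new_flags, new_turn = list(pcs), list(flags), turn
        if pc == 0:        # flag_tid = true (the store may sit in a buffer)
            if not delay_flag_store:
                new_flags[i] = True
            new_pcs[i] = 1
        elif pc == 1:      # turn = other
            new_turn = other
            new_pcs[i] = 2
        elif pc == 2:      # loop test, part 1: read turn
            new_pcs[i] = 3 if turn == other else 4
        elif pc == 3:      # loop test, part 2: read the other thread's flag
            new_pcs[i] = 2 if flags[other - 1] else 4
        elif pc == 4:      # any buffered flag store becomes visible here
            new_flags[i] = True
            new_pcs[i] = 5
        elif pc == 5:      # in the critical section; leaving clears the flag
            new_flags[i] = False
            new_pcs[i] = 6
        else:
            continue       # this thread has finished
        yield tuple(new_pcs), tuple(new_flags), new_turn

def mutual_exclusion_holds(delay_flag_store=False):
    """Breadth-first search for a state with both threads at pc 5."""
    init = ((0, 0), (False, False), 0)
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if state[0] == (5, 5):
            return False
        for nxt in successors(state, delay_flag_store):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

print(mutual_exclusion_holds(False), mutual_exclusion_holds(True))
```

With every store immediately visible, the search finds no state with both threads in the critical section; with the flag store delayed past the loads, it does.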

8 Do Programmers Really Care?
Another example: Double-Checked Locking for Singleton creation
  class foo {
    private static Helper helper = null;
    public static Helper get() {
      if (helper == null) {            // only use locking as needed
        synchronized (foo.class) {     // a static method has no 'this' to lock on
          if (helper == null)          // "double-check" the reference
            helper = new Helper();
        }
      }
      return helper;
    }
  }

9 Broken Under the Current JMM
  class foo {
    private static Helper helper = null;
    public static Helper get() {
      if (helper == null) {            // only use locking as needed
        synchronized (foo.class) {     // a static method has no 'this' to lock on
          if (helper == null)          // "double-check" the reference
            helper = new Helper();
        }
      }
      return helper;
    }
  }
Problem: Broken under the JMM!
- on weak architectures
- with race conditions
- the reference can be "visible" before the constructor completes
Can't guarantee Helper is fully constructed!

10 Problems with Previous Approaches
For virtually all industrial weak memory models:
- They don't have formal specifications
For those that do have a formal spec on paper:
- They can't be executed
For those that have a machine-readable formal spec:
- They use a "state machine" approach that
  - employs architecture-specific data structures
  - cannot be decomposed into orthogonal components
  - has not been verified against higher-level rules
No support for verifying "programmer expectations" in multithreaded software

11 Analysis of Multithreaded Software
[Figure: design space of analyses. Labels: Intra-procedural, Inter-procedural, Intra-thread, Inter-thread, More precise, More scalable, Memory-model insensitive, Memory-model sensitive, My thesis work]

12 Contributions
Four areas: Operational Specification Method, Axiomatic Specification Method, Constraint Solving Method, Concurrency Analysis
- Operational style framework: UMM
- Non-operational style framework: Nemos
- Applications: Java Memory Model, Intel Itanium Memory Model, classical memory models, language-level memory model issues
- Prototype tools based on various solvers: CLP, SAT, QBF; incremental SAT solving; different encodings
- Execution validation, race detection, atomicity verification

13 Operational Approach: UMM (Uniform Memory Model)
1. Supports formal verification
   - Integrates a model checker (Murphi)
   - Inspired by Park & Dill's work on Sparc
2. Employs a generic memory abstraction
   - To eliminate architecture-specific complexities
   - Uniform notation
   - A parameterized method

14 UMM Abstract Machine
LIB: Local Instruction Buffer
GIB: Global Instruction Buffer
[Figure: Thread i and Thread j each have a local buffer (LIB i, LIB j); both connect to a single GIB]
- Only two layers
- GIB can grow as needed
Key insight: make it easy to configure program order and visibility order

15 General Strategy in UMM
Enabling mechanism
- Program order may be relaxed to enable certain interleavings
- Controlled via the bypassing table
Filtering mechanism
- Visibility order is constructed from the GIB following proper ordering requirements
- Enforced in the read selection rules

16 UMM Example: Sequential Consistency
Transition table:
  Event: read
    Condition: ∃ i ∈ LIB_t(i) : ready(i) ∧ op(i) = Read ∧ (∃ w ∈ GIB : legalWrite(i, w))
    Action: i.data := data(w); LIB_t(i) := delete(LIB_t(i), i);
  Event: write
    Condition: ∃ i ∈ LIB_t(i) : ready(i) ∧ op(i) = Write
    Action: GIB := append(GIB, i); LIB_t(i) := delete(LIB_t(i), i);
Program order:
  ready(i) ≡ ¬∃ j ∈ LIB_t(i) : pc(j) < pc(i) ∧ BYPASS[op(j)][op(i)] = No
Visibility order:
  legalWrite(r, w) ≡ op(w) = Write ∧ var(w) = var(r) ∧ (¬∃ w' ∈ GIB : op(w') = Write ∧ var(w') = var(r) ∧ time(r) > time(w') ∧ time(w') > time(w))
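The two predicates in the transition table translate almost directly into code. The following is a minimal sketch under stated assumptions: the `Instr` record, the `time` field (taken here to be a global counter assigned elsewhere), and the dictionary form of the bypass table are choices made for this illustration and are not taken from the UMM implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    thread: int
    pc: int
    op: str        # "Read" or "Write"
    var: str
    data: int = 0
    time: int = 0  # assumed: position on a global timeline

# For SC, no instruction may bypass an earlier one from the same thread.
BYPASS_SC = {(a, b): "No" for a in ("Read", "Write") for b in ("Read", "Write")}

def ready(i, lib, bypass=BYPASS_SC):
    """i may leave its LIB if no earlier instruction in the LIB blocks it."""
    return not any(j.pc < i.pc and bypass[(j.op, i.op)] == "No" for j in lib)

def legal_write(r, w, gib):
    """w is a write to r's variable with no other such write between w and r."""
    if w.op != "Write" or w.var != r.var:
        return False
    return not any(
        w2.op == "Write" and w2.var == r.var and r.time > w2.time > w.time
        for w2 in gib
    )

gib = [
    Instr(1, 0, "Write", "a", data=1, time=1),
    Instr(1, 1, "Write", "a", data=2, time=2),
    Instr(1, 2, "Write", "b", data=7, time=3),
]
read_a = Instr(2, 0, "Read", "a", time=4)
print([w.data for w in gib if legal_write(read_a, w, gib)])
```

Relaxing program order for a weaker model would, in this sketch, amount to changing entries of the bypass table from "No" to something else, which is the configurability the slides emphasize.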

17 Non-Operational Approach: Nemos
(Non-operational yet Executable Memory Ordering Specifications)
Desired features and solutions:
- Easy to understand, flexible: declarative (axiomatic)
- Precise: predicate logic
- Compositional, modular: "higher order" logic
- Executable: make "hidden" rules explicit
Key insights
(1) Make the rules higher order
    - Pass down the order relation through all the rules
    - Compositional, reusable, scalable, easy to compare
(2) Make all rules explicit
    - Executable using a constraint-programming system

18 Nemos Example: Sequential Consistency
(ops is the execution; order is the ordering relation)
Formal definition of SC:
  legal ops order ≡
    requireProgramOrder ops order ∧       (program order)
    requireReadValue ops order ∧          (read sees "latest" write)
    requireWeakTotalOrder ops order ∧     (common total order)
    requireTransitiveOrder ops order ∧
    requireAsymmetricOrder ops order
  requireTransitiveOrder ops order ≡ ∀ i, j, k ∈ ops. (order i j ∧ order j k) ⇒ order i k
  requireProgramOrder ops order ≡ ∀ i, j ∈ ops. ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j
- order is repeatedly refined
- Hidden rules are explicit

19 The Itanium Memory Ordering Rules
  legal ops order ≡
    requireLinearOrder ops order ∧
    requireWriteOperationOrder ops order ∧
    requirePO ops order ∧
    requireMemoryDataDependence ops order ∧
    requireDataFlowDependence ops order ∧
    requireCoherence ops order ∧
    requireReadValue ops order ∧
    requireAtomicWBRelease ops order ∧
    requireNoUCBypass ops order

20 Specification Hierarchy for Itanium
- requireLinearOrder: Irreflexive, Transitive, Total, Asymmetric
- requireWriteOperationOrder: Local/Remote case, Remote/Remote case
- requireProgramOrder: Acquire Rule, Release Rule, Fence Rule
- requireMemoryDataDependence: MD:RAW, MD:WAR, MD:WAW
- requireDataFlowDependence: DF:RAW, DF:WAR, DF:WAW
- requireCoherence: Local/Local case, Remote/Remote case
- requireReadValue: ValidWr, ValidLocalWr, ValidRemoteWr, ValidDefaultWr, ValidRd
- requireAtomicWBRelease
- requireSequentialUC: RAR Rule, RAW Rule, WAR Rule, WAW Rule
- requireNoUCBypass

21 How to Make an Axiomatic Specification Executable?
Execution validation:
  validateExecution ops ≡ ∃ order. legal ops order
[Figure: Test Program + Memory Model Specification -> Constraints -> Solver (CLP, SAT, QBF) -> SAT / UNSAT]
- Effective for revealing critical properties
- Effective for verifying common programming patterns
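To make the existential quantifier concrete, the sketch below replaces the constraint solver with brute force: it tries every total order of a small execution and checks the SC rules from slide 18 on each. This is an illustration under assumptions, not the Nemos implementation. The slides give formulas only for requireProgramOrder and requireTransitiveOrder; the bodies of `read_value`, `weak_total`, and `asymmetric` are one plausible reading of the rule names, and the tuple layout of an operation is invented for the example.

```python
from itertools import permutations

T_INIT = 0  # thread id reserved for the initial writes (mirrors t_init)

# An operation is (thread, pc, kind, var, value); this layout is assumed.
def program_order(ops, order):
    return all(order(i, j) for i in ops for j in ops
               if (i[0] == j[0] and i[1] < j[1])
               or (i[0] == T_INIT and j[0] != T_INIT))

def read_value(ops, order):
    """Each read returns the value of the latest preceding write to its variable."""
    for r in (o for o in ops if o[2] == "ld"):
        writes = [w for w in ops if w[2] == "st" and w[3] == r[3] and order(w, r)]
        latest = [w for w in writes if not any(order(w, w2) for w2 in writes)]
        if len(latest) != 1 or latest[0][4] != r[4]:
            return False
    return True

def weak_total(ops, order):
    return all(order(i, j) or order(j, i) for i in ops for j in ops if i != j)

def transitive(ops, order):
    return all(order(i, k) for i in ops for j in ops for k in ops
               if order(i, j) and order(j, k))

def asymmetric(ops, order):
    return not any(order(i, j) and order(j, i) for i in ops for j in ops)

def legal(ops, order):
    rules = (program_order, read_value, weak_total, transitive, asymmetric)
    return all(rule(ops, order) for rule in rules)

def validate_execution(ops):
    """Exists order. legal ops order, searched over all total orders of ops."""
    for perm in permutations(ops):
        pos = {op: n for n, op in enumerate(perm)}
        if legal(ops, lambda i, j: pos[i] < pos[j]):
            return True
    return False

def execution(r1, r2):
    """Initially a = b = 0; P1: st a,1; st b,1;  P2: ld r1,b; ld r2,a."""
    return [(0, 0, "st", "a", 0), (0, 1, "st", "b", 0),
            (1, 0, "st", "a", 1), (1, 1, "st", "b", 1),
            (2, 0, "ld", "b", r1), (2, 1, "ld", "a", r2)]

print(validate_execution(execution(1, 1)), validate_execution(execution(1, 0)))
```

Because the candidate orders are permutations, the transitivity, asymmetry, and totality checks pass by construction; they are kept so that the conjunction mirrors the specification. A real solver would instead leave `order` symbolic and prune, which is what makes larger executions tractable.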

22 Using Constraint Logic Programming (CLP)
- Implementation in FD-Prolog is straightforward
- Universal quantification handled via enumeration
- Existential quantification handled via backtracking
- Built-in constraint solver from FD-Prolog:
  - logical variables
  - finite-domain (FD) variables

23 How to Encode the Ordering Relation?
Encoding:
- Given a test program with N operations, use a 2D precedence matrix M (N × N) with N^2 constraint variables
- Values of entry Mij:
  - 1: i is ordered before j
  - 0: i is not ordered before j
  - x: value not bound yet
The method:
- Interpret the symbolic execution, imposing constraints on the 2D matrix
- When interpretation finishes, the x values reveal latitude in the weak order
- When an x changes to a 1, an attempt to set it to 0 later triggers backtracking
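The three-valued matrix can be sketched without a constraint solver. In the minimal illustration below, `None` plays the role of x, and an attempt to rebind an entry to a different value raises an exception at the point where a CLP system would backtrack. The class and method names are assumptions for this example; the propagation of asymmetry and transitivity is hand-written here, whereas in the talk's setting it falls out of the posted constraints, and no undo trail is kept.

```python
class Conflict(Exception):
    """A bound entry would have to change value; a solver would backtrack here."""

class PrecedenceMatrix:
    def __init__(self, n):
        self.n = n
        # None stands for "x" (not bound yet).
        self.m = [[None] * n for _ in range(n)]
        for i in range(n):
            self.m[i][i] = 0  # no operation is ordered before itself

    def bind(self, i, j, value):
        """Bind M[i][j]; return True if the entry was newly bound."""
        current = self.m[i][j]
        if current is None:
            self.m[i][j] = value
            return True
        if current != value:
            raise Conflict(f"M[{i}][{j}] is already {current}")
        return False

    def order_before(self, i, j):
        """Record that i is ordered before j and propagate the consequences."""
        if not self.bind(i, j, 1):
            return
        self.bind(j, i, 0)                # asymmetry
        for k in range(self.n):           # transitivity
            if self.m[k][i] == 1:
                self.order_before(k, j)
            if self.m[j][k] == 1:
                self.order_before(i, k)

    def unbound(self):
        """Number of entries still at x, i.e. the remaining latitude."""
        return sum(v is None for row in self.m for v in row)

m = PrecedenceMatrix(3)
m.order_before(0, 1)
m.order_before(1, 2)
print(m.m[0][2], m.unbound())
```

After the two calls, the entry for (0, 2) is forced to 1 by transitivity, and a later `m.order_before(2, 0)` raises `Conflict`, mirroring the backtracking trigger described above.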

24 Example of Prolog Implementation
Formal specification (e.g., requireProgramOrder):
  requireProgramOrder ops order ≡ ∀ i, j ∈ ops. ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j
SICStus Prolog code:
  % Apply doProgramOrder to every pair of operations.
  requireProgramOrder(Ops,Order) :-
      for_each_elem(Ops,Order,doProgramOrder).

  % Constrain matrix entry Oij for the pair (I,J); thread id 0 stands for t_init.
  elem_prog(doProgramOrder,Ops,Order,I,J) :-
      nth(I,Ops,Oi), nth(J,Ops,Oj),
      p(Oi,T_i), p(Oj,T_j),
      pc(Oi,PC_i), pc(Oj,PC_j),
      length(Ops,N), matrix_elem(Order,N,I,J,Oij),
      ((T_i #= T_j #/\ PC_i #< PC_j) #\/
       (T_i #= 0 #/\ T_j #\= 0)) #=> Oij.

25 Interactive and Incremental Analysis
Itanium test program (initially, a = b = 0):
  P1: st a,1; st b,1;
  P2: ld r1,b; ld r2,a;
  Can r1 = 1 and r2 = 0?
Execution (ops):
  P1: (1) st_local(a,1); (2) st_remote1(a,1); (3) st_remote2(a,1); (4) st_local(b,1); (5) st_remote1(b,1); (6) st_remote2(b,1);
  P2: (7) ld(1,b); (8) ld(0,a);
Result: legal
Order satisfying all constraints (row i, column j):
       1 2 3 4 5 6 7 8
    1  0 1 1 x x x x x
    2  0 0 1 x x x x x
    3  0 0 0 x x x x 0
    4  x x x 0 1 1 1 x
    5  x x x 0 0 1 1 x
    6  x x x 0 0 0 1 x
    7  x x x 0 0 0 0 x
    8  x x 1 x x x x 0
An instantiated Order (interleaving: 8 4 5 6 7 1 2 3):
       1 2 3 4 5 6 7 8
    1  0 1 1 0 0 0 0 0
    2  0 0 1 0 0 0 0 0
    3  0 0 0 0 0 0 0 0
    4  1 1 1 0 1 1 1 0
    5  1 1 1 0 0 1 1 0
    6  1 1 1 0 0 0 1 0
    7  1 1 1 0 0 0 0 0
    8  1 1 1 1 1 1 1 0

26 The SAT/QBF Approach
Initially, we retrofitted our Prolog version with SAT-generating code
- Showed speed improvement in constraint solving, BUT
- Still slow in CNF generation
- Very difficult to debug
So we re-engineered our tool (done by Prof. Ganesh Gopalakrishnan):
- "Stamping out" a finite execution as a QBF formula
- "Stamping out" a finite execution as a CNF formula
- Experimenting with different encoding methods: n×n vs. n log n
- Checkpointing SAT generation

27 Gist of Results
1. SAT seems to be better than QBF
2. The n×n encoding method is better than n log n
   - despite using more bits
   - many unit clauses, good for SAT solving
3. The checkpointing method does pay off up to 64 tuples
4. We can easily handle 128 operations
5. Latest result: completed an Intel-provided test run (experiment done by Hemanthkumar Sivaraj)
   - test contains 500 Itanium memory operations
   - had to suppress the total-order constraint, UNSAT
   - takes 10 sec to generate the SAT instance; 0.1 sec to solve
   - still lots of room for improvement

28 How to Verify Programmer Expectations?
(1) Define both intra-thread and inter-thread semantics as constraints
    - Program semantics + memory model semantics
(2) Model correctness properties as additional constraints
    - Program properties, e.g., race / atomicity
(3) Reduce a verification problem to a constraint satisfaction problem and solve it automatically
[Figure: Test Program -> Constraints -> Solver -> SAT / UNSAT]

29 Race Detection
What's a data race? Informally: conflicting and concurrent accesses
Initially, a = b = 0.
Thread 1:
  r1 = a;
  if (r1 > 0) b = 1;
Thread 2:
  r2 = b;
  if (r2 > 0) a = 1;
Is this program race-free? Are these two instructions conflicting and concurrent?
- Control flow is interwoven with memory consistency requirements
- Hence, the question depends on the memory model
  - Under SC, this program is race-free
  - Under a weaker model, this program might contain races

30 Constraints for Control Flow
- Treat control operations similarly to memory operations
  - Imagine "assigns" and "uses" of "control variables"
- Add an auxiliary control variable c_k for each branch statement k, and convert the if-statement to an auxiliary assign of c_k
  - E.g., if (r1 > 0) becomes c1 = r1 > 0
- Every op k has a path predicate ctrExpr
  - k is a use of those control variables in ctrExpr
- k is feasible if ctrExpr evaluates to true
- Feasibility of ops is checked when setting the rules

31 Data and Control Dependence
Data/control flow can be treated similarly to the global read value rule, i.e., a read should see the "latest" write
- Global reads: for all r = x, exists a x = …
- Local reads: for all x = r, exists a r = …
- Control reads: for all op that depends on c, exists a c = …
  requireReadValue ops order ≡ globalReadValue ops order ∧ localReadValue ops order ∧ controlReadValue ops order

32 How to Formalize Data-Race?
  detectDataRace ops ≡ ∃ scOrder, hbOrder.
    legalSC ops scOrder ∧
    requireHbOrder ops hbOrder ∧
    mapConstraints ops hbOrder scOrder ∧
    existDataRace ops hbOrder
  requireHbOrder ops hbOrder ≡
    requireProgramOrder ops hbOrder ∧
    requireSyncOrder ops hbOrder ∧
    requireTransitiveOrder ops hbOrder
  existDataRace ops hbOrder ≡ ∃ i, j ∈ ops. conflictingAccess i j ∧ ¬ (hbOrder i j) ∧ ¬ (hbOrder j i)
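The last predicate, existDataRace, is easy to sketch once a happens-before relation is available. The code below is an illustration under assumptions: it handles one fixed execution rather than quantifying over SC orders, it takes the synchronization edges as an explicit input instead of deriving them, and the tuple layout and the reading of conflictingAccess (same variable, different threads, at least one write) are choices made for this example.

```python
from itertools import product

# An operation is (thread, pc, kind, var); kinds used: read, write, lock, unlock.
def happens_before(ops, sync_edges):
    """Transitive closure of program order plus the given synchronization edges."""
    hb = {(i, j) for i in ops for j in ops if i[0] == j[0] and i[1] < j[1]}
    hb |= set(sync_edges)
    # Floyd-Warshall style closure: the intermediate op k is the outermost loop.
    for k, i, j in product(ops, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb

def conflicting_access(i, j):
    accesses = ("read", "write")
    return (i[0] != j[0] and i[3] == j[3]
            and i[2] in accesses and j[2] in accesses
            and "write" in (i[2], j[2]))

def exist_data_race(ops, hb):
    return any(conflicting_access(i, j) and (i, j) not in hb and (j, i) not in hb
               for i in ops for j in ops)

w = (1, 0, "write", "x")
u = (1, 1, "unlock", "m")
l = (2, 0, "lock", "m")
r = (2, 1, "read", "x")

unsynchronized = [w, r]
synchronized = [w, u, l, r]
print(exist_data_race(unsynchronized, happens_before(unsynchronized, [])),
      exist_data_race(synchronized, happens_before(synchronized, [(u, l)])))
```

In the second execution the unlock-to-lock edge chains with program order, so the write happens before the read and no race is reported.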

33 Atomicity Verification
What's atomicity?
- Informally: a block of code executed atomically
- Neither a necessary nor a sufficient condition for race-freedom
Our approach:
- Annotate the atomic block with AtomicEnter and AtomicExit
- Verify it automatically
- Our definition is generic and can be fine-tuned

34 Constraints for Atomicity
  verifyAtomicity ops ≡ ∃ order. legalSC ops order ∧ existsAtomicityViolation ops order
  existsAtomicityViolation ops order ≡ ∃ i, j, k ∈ ops. matchedAtomicPair i j ∧ (t k ≠ t i) ∧ ¬ (order k i) ∧ ¬ (order j k)
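Read operationally, the violation predicate says that some operation of another thread falls between a matched AtomicEnter and AtomicExit. The sketch below searches for such an interleaving by brute force. It rests on several assumptions: legalSC is approximated by program order plus mutual exclusion on locks (read values are not modeled), matchedAtomicPair is taken to pair an AtomicEnter with the next AtomicExit of the same thread, and the tuple layout is invented for the example.

```python
# An operation is (thread, pc, kind, arg).
def interleavings(threads):
    """Yield every merge of the per-thread op lists that keeps program order."""
    if all(not t for t in threads):
        yield []
        return
    for n, t in enumerate(threads):
        if t:
            rest = threads[:n] + [t[1:]] + threads[n + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail

def locks_respected(schedule):
    """A lock may be acquired only while no thread holds it."""
    held = set()
    for _thread, _pc, kind, arg in schedule:
        if kind == "lock":
            if arg in held:
                return False
            held.add(arg)
        elif kind == "unlock":
            held.discard(arg)
    return True

def matched_atomic_pairs(ops):
    pairs = []
    for i in ops:
        if i[2] == "AtomicEnter":
            exits = [j for j in ops
                     if j[0] == i[0] and j[2] == "AtomicExit" and j[1] > i[1]]
            if exits:
                pairs.append((i, min(exits, key=lambda j: j[1])))
    return pairs

def exists_atomicity_violation(schedule):
    pos = {op: n for n, op in enumerate(schedule)}
    order = lambda a, b: pos[a] < pos[b]
    return any(k[0] != i[0] and not order(k, i) and not order(j, k)
               for i, j in matched_atomic_pairs(schedule) for k in schedule)

def verify_atomicity(threads):
    """True if some legal interleaving contains an atomicity violation."""
    return any(locks_respected(s) and exists_atomicity_violation(s)
               for s in interleavings(threads))

t1 = [(1, 0, "lock", "m"), (1, 1, "AtomicEnter", None), (1, 2, "write", "x"),
      (1, 3, "AtomicExit", None), (1, 4, "unlock", "m")]
t2_locked = [(2, 0, "lock", "m"), (2, 1, "write", "x"), (2, 2, "unlock", "m")]
t2_unlocked = [(2, 0, "write", "x")]
print(verify_atomicity([t1, t2_locked]), verify_atomicity([t1, t2_unlocked]))
```

As in the formula, the function answers "does a violating order exist?", so `True` signals a problem. Under this generic definition any foreign operation inside the block counts, which is the kind of choice the previous slide says can be fine-tuned.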

35 Conclusion
My thesis addressed the following issues
- How to make memory ordering rules clear and executable?
- How to analyze thread executions against these rules?
Our methods have been shown to be practical
- A wide range of academic memory models as well as real-world models (Itanium, JMM)
- Validation of test cases far exceeded others' in both speed and scale
- Being applied to post-silicon verification in industry
Many "customers" can benefit from our methods
- Software developers, compiler writers, system designers

36 Publications
Areas: Operational Specification Method, Axiomatic Specification Method, Constraint Solving Method, Concurrency Analysis
- Analyzing the CRF Java Memory Model (APSEC'01)
- Specifying Java Thread Semantics Using a Uniform Memory Model (JGI'02)
- UMM: An Operational Memory Model Specification Framework with Integrated Model Checking Capability (CCPE)
- Analyzing the Intel Itanium Memory Ordering Rules Using Logic Programming and SAT (CHARME'03)
- Nemos: A Framework for Axiomatic and Executable Specifications of Memory Consistency Models (IPDPS'04)
- A Constraint-Based Approach for Specifying Memory Consistency Models (sent to TPLP)
- QB or not QB: An Efficient Execution Verification Tool for Memory Orderings (sent to CAV)
- Rigorous Concurrency Analysis of Multithreaded Programs (sent to ISSTA)

37 Continuing Research Opportunities
- Scale up our approach even further
  - Give up certain precision
  - Compositional methods
  - Create an assertion language to help abstraction
- Improve solving algorithms
  - Exploit the structural information
- "Memory-model-sensitive" compilers
  - Code synthesis, optimization
- Other application domains
  - Security, embedded systems

Thank You!
The dissertation is available at http://www.cs.utah.edu/~yyang/papers/thesis.pdf
The prototype tools are available at http://www.cs.utah.edu/~yyang/research.html
