Analysis of Multithreaded Programs Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.


1 Analysis of Multithreaded Programs Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

2 What is a multithreaded program? Multiple Parallel Threads Of Control Shared Mutable Memory read write Lock Acquire and Release NOT general parallel programs No message passing No tuple spaces No functional programs No concurrent constraint programs NOT just multiple threads of control No continuations No reactive systems

3 Why do programmers use threads? Performance (parallel computing programs) Single computation Execute subcomputations in parallel Example: parallel sort Program structuring mechanism (activity management programs) Multiple activities Thread for each activity Example: web server Properties have big impact on analyses

4 Practical Implications Threads are useful and increasingly common POSIX threads standard for C, C++ Java has built-in thread support Widely used in industry Threads introduce complications Programs viewed as more difficult to develop Analyses must handle new model of execution Lots of interesting and important problems!

5 Outline Examples of multithreaded programs Parallel computing program Activity management program Analyses for multithreaded programs Handling data races Future directions

6 Parallel Sort

7 Example - Divide and Conquer Sort [array diagram: 4 7 6 1 5 3 8 2]

8 Example - Divide and Conquer Sort [diagram] Divide: split the array into subarrays

9 Example - Divide and Conquer Sort [diagram] Divide, then Conquer: sort each subarray

10 Example - Divide and Conquer Sort [diagram] Divide, Conquer, then Combine: merge the sorted subarrays

11 Example - Divide and Conquer Sort [diagram] Divide, Conquer, Combine: the fully sorted array 1 2 3 4 5 6 7 8

12-14 Divide and Conquer Algorithms [build animation] Lots of Recursively Generated Concurrency; Recursively Solve Subproblems in Parallel; Combine Results in Parallel

15 “Sort n Items in d, Using t as Temporary Storage”

  void sort(int *d, int *t, int n) {
    if (n > CUTOFF) {
      spawn sort(d, t, n/4);
      spawn sort(d+n/4, t+n/4, n/4);
      spawn sort(d+2*(n/4), t+2*(n/4), n/4);
      spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
      sync;
      spawn merge(d, d+n/4, d+n/2, t);
      spawn merge(d+n/2, d+3*(n/4), d+n, t+n/2);
      sync;
      merge(t, t+n/2, t+n, d);
    } else insertionSort(d, d+n);
  }

16 (same code) Divide array into subarrays and recursively sort subarrays in parallel

17 (same code) Subproblems identified using pointers into middle of array [array diagram with pointers d, d+n/4, d+n/2, d+3*(n/4)]

18 (same code) Sorted results written back into input array [array diagram]

19 “Merge Sorted Quarters of d Into Halves of t” (same code) [array diagram with pointers d, t, t+n/2]

20 “Merge Sorted Halves of t Back Into d” (same code) [array diagram with pointers d, t, t+n/2]

21 “Use a Simple Sort for Small Problem Sizes” (same code) [array diagram with pointers d, d+n]

22 “Use a Simple Sort for Small Problem Sizes” (same code) [array diagram: small subarray sorted in place]
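The spawn/sync sort above maps naturally onto Java's fork/join framework. This is a minimal sketch, not the talk's code: it uses a two-way split instead of the slide's four-way split, and the class and method names are mine. fork plays the role of spawn, join the role of sync.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch of the divide-and-conquer sort using java.util.concurrent fork/join.
class ParSort extends RecursiveAction {
    static final int CUTOFF = 8;
    final int[] d, t;          // data array and temporary storage, as on the slide
    final int lo, hi;          // sort d[lo, hi)
    ParSort(int[] d, int[] t, int lo, int hi) { this.d = d; this.t = t; this.lo = lo; this.hi = hi; }

    protected void compute() {
        int n = hi - lo;
        if (n <= CUTOFF) { insertionSort(d, lo, hi); return; }  // simple sort for small sizes
        int mid = lo + n / 2;
        ParSort left = new ParSort(d, t, lo, mid);   // subproblems identified by index ranges
        ParSort right = new ParSort(d, t, mid, hi);  // (disjoint, so no synchronization needed)
        left.fork(); right.fork();                   // "spawn" both halves
        left.join(); right.join();                   // "sync"
        merge(d, lo, mid, hi, t);                    // merge sorted halves into t
        System.arraycopy(t, lo, d, lo, n);           // write sorted result back into d
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++)
            for (int j = i; j > lo && a[j - 1] > a[j]; j--) {
                int tmp = a[j]; a[j] = a[j - 1]; a[j - 1] = tmp;
            }
    }

    static void merge(int[] a, int lo, int mid, int hi, int[] out) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) out[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) out[k++] = a[i++];
        while (j < hi) out[k++] = a[j++];
    }

    static int[] sort(int[] a) {
        ForkJoinPool.commonPool().invoke(new ParSort(a, new int[a.length], 0, a.length));
        return a;
    }
}
```

As on the slides, the parallel subtasks update disjoint regions of d and t, so the computation is deterministic without locks.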

23 Key Properties of Parallel Computing Programs Structured form of multithreading Parallelism confined to small region Single thread coming in Multiple threads exist during computation Single thread going out Deterministic computation Tasks update disjoint parts of data structure in parallel without synchronization May also have parallel reductions

24 Web Server

25-32 [build animation] Main Loop: accept new connection, start new client thread. Client Threads: each client thread repeatedly waits for input and produces output.

33 Main Loop

  class Main {
    static public void loop(ServerSocket s) {
      Counter c = new Counter();
      while (true) {
        Socket p = s.accept();
        Worker t = new Worker(p, c);
        t.start();
      }
    }
  }

Accept new connection / Start new client thread

34 Worker threads

  class Worker extends Thread {
    Socket s;
    Counter c;
    public void run() {
      out = s.getOutputStream();
      in = s.getInputStream();
      while (true) {
        inputLine = in.readLine();
        c.increment();
        if (inputLine == null) break;
        out.writeBytes(inputLine + "\n");
      }
    }
  }

Wait for input / Increment counter / Produce output

35 Synchronized Shared Counter

  class Counter {
    int contents = 0;
    synchronized void increment() {
      contents++;
    }
  }

Acquire lock / Increment counter / Release lock
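The shared counter above can be exercised directly: many worker-style threads call increment concurrently, and because the whole read-modify-write is guarded by the object's lock, no update is lost. This is a sketch of mine (the thread counts and the CounterDemo class are illustrative, not from the talk).

```java
// Demo: a synchronized shared counter retains every increment even under
// heavy concurrent use, because each increment is an atomic lock-protected
// read-modify-write.
class Counter {
    int contents = 0;
    synchronized void increment() { contents++; }   // acquire lock, increment, release lock
}

class CounterDemo {
    static int run(int threads, int perThread) {
        Counter c = new Counter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.increment();
            });
            ts[i].start();
        }
        try { for (Thread t : ts) t.join(); }       // wait for all workers to finish
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return c.contents;                          // every update retained: threads * perThread
    }
}
```

Removing the synchronized keyword makes contents++ a data race, and the final count can fall short nondeterministically.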

36 Simple Activity Management Programs Fixed, small number of threads Based on functional decomposition User Interface Thread Device Management Thread Compute Thread

37 Key Properties of Activity Management Programs Threads manage interactions One thread per client or activity Blocking I/O for interactions Unstructured form of parallelism Object is unit of sharing Mutable shared objects (mutual exclusion) Private objects (no synchronization) Read shared objects (no synchronization) Inherited objects passed from parent to child

38 Common Properties Dynamic thread creation Many threads execute same code Threads larger than procedures Data accessed via pointers or references Concept of data ownership Passed from parent thread to child thread Acquired with lock operations Private data that never escapes creator

39 Why analyze multithreaded programs? Discover or certify absence of errors (multithreading introduces new kinds of errors) Discover or verify application-specific properties (interactions between threads complicate analysis) Enable optimizations (new kinds of optimizations with multithreading) (complications with traditional optimizations)

40 Classic Errors in Multithreaded Programs Deadlocks Data Races

41 Deadlock

  Thread 1: lock(l); lock(m); x = x + y; unlock(m); unlock(l);
  Thread 2: lock(m); lock(l); y = y * x; unlock(l); unlock(m);

Deadlock if circular waiting for resources (typically mutual exclusion locks)

42 Deadlock (same code) Threads 1 and 2 start execution

43 Deadlock (same code) Thread 1 acquires lock l

44 Deadlock (same code) Thread 2 acquires lock m

45 Deadlock (same code) Thread 1 holds l and waits for m while Thread 2 holds m and waits for l
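The classic fix for the circular wait above is to impose a single global lock order. The sketch below is mine (the Account class and field values are illustrative): both threads acquire l before m, so the waits-for cycle cannot form and the program always terminates.

```java
// Sketch: breaking the deadlock from the slide by lock ordering.
// The original thread 2 took m before l; here both threads take l, then m.
class Account {
    final Object l = new Object(), m = new Object();
    int x = 1, y = 2;

    void thread1Work() {
        synchronized (l) { synchronized (m) { x = x + y; } }  // lock(l); lock(m); x = x + y
    }

    void thread2Work() {
        // Same l-then-m order as thread 1: no cycle in the waits-for graph,
        // so no deadlock is possible regardless of scheduling.
        synchronized (l) { synchronized (m) { y = y * x; } }
    }

    static int[] run() {
        Account a = new Account();
        Thread t1 = new Thread(a::thread1Work);
        Thread t2 = new Thread(a::thread2Work);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return new int[]{a.x, a.y};   // both updates applied in some serial order
    }
}
```

The result is still scheduling-dependent (either thread may run its critical section first), but every schedule completes: if thread 1 goes first, x = 3 and y = 6; if thread 2 goes first, y = 2 and x = 3.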

46 Data Races A data race occurs if two parallel threads access the same memory location and at least one access is a write.

  A[i] = v  ||  A[j] = w   (parallel, unordered): data race, since the accesses may touch the same element

  A[i] = v  then  A[j] = w  (ordered by sequencing or synchronization): no data race

47 Synchronization and Data Races Thread 1: lock(l); x = x + 1; unlock(l); Thread 2: lock(l); x = x + 2; unlock(l); No data race if synchronization separates accesses Synchronization protocol: Associate lock with data Acquire lock to update data atomically

48 Why are data races errors? Correct programs exist that contain races, but most races are programming errors: code intended to execute atomically, synchronization omitted by mistake. Consequences can be severe: nondeterministic, timing-dependent errors; data structure corruption. Complicates analysis and optimization.

49 New Optimization Opportunities from Multithreading Lock Elimination Lock Coarsening Barrier Elimination Data Layout Communication Optimizations

50 Lock Elimination for Private Data

  Integer i; lock(i); i.value++; unlock(i);
  becomes (when i is accessible to only one thread)
  Integer i; i.value++;

Blanchet – OOPSLA 1999; Bogda, Hoelzle – OOPSLA 1999; Choi, Gupta, Serrano, Sreedhar, Midkiff – OOPSLA 1999; Whaley, Rinard – OOPSLA 1999; Ruf – PLDI 2000

Lock Elimination for Nested Data: Diniz, Rinard – JPDC 1998; Aldrich, Chambers, Sirer, Eggers – SAS 1999

51 Barrier Elimination Tseng – PPoPP 1995 Analysis Problem No interthread dependences across barrier

52 Lock Coarsening

  Integer i; lock(i); i.value++; unlock(i); … lock(i); i.value++; unlock(i);
  becomes
  Integer i; lock(i); i.value++; … i.value++; unlock(i);

Key Challenge: managing the trade-off between serialization and synchronization overhead. Plevyak, Chien – POPL 1995; Diniz, Rinard – POPL 1997, PLDI 1997
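The coarsening transformation above can be made observable by counting lock acquisitions. This sketch is mine (the Cell class and its counters are illustrative): the fine-grained version takes the lock twice, the coarsened version once, and both compute the same value.

```java
// Sketch of lock coarsening: two adjacent critical sections on the same lock
// are merged into one, halving the synchronization overhead while the
// computed result stays the same.
class Coarsen {
    static class Cell {
        int value = 0;
        int acquisitions = 0;                       // counts lock acquisitions for the demo
        synchronized void inc() { acquisitions++; value++; }          // fine-grained: one lock per update
        synchronized void incTwice() { acquisitions++; value += 2; }  // coarsened: one lock, both updates
    }

    static int[] fineGrained() {
        Cell i = new Cell();
        i.inc(); i.inc();                // lock(i); i.value++; unlock(i);  -- twice
        return new int[]{i.value, i.acquisitions};
    }

    static int[] coarsened() {
        Cell i = new Cell();
        i.incTwice();                    // lock(i); i.value++; ... i.value++; unlock(i);
        return new int[]{i.value, i.acquisitions};
    }
}
```

The trade-off named on the slide shows up here too: the coarsened critical section holds the lock for longer, which can serialize other threads that want the same lock.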

53 Overview of Analyses for Multithreaded Programs Key problem: interactions between threads Flow-insensitive analyses Escape analyses Dataflow analyses Explicit parallel flow graphs Interference summary analysis State space exploration

54 Escape Analyses

55 Program With Allocation Sites [call-graph diagram: void main(i,j), void compute(d,e), void evaluate(i,j), void multiplyAdd(a,b,c), void multiply(m), void add(u,v), void abs(r), void scale(n,m)]

56 Program With Allocation Sites [diagram] Correlate lifetimes of objects with lifetimes of computations

57 Program With Allocation Sites [diagram] Objects allocated at this site do not escape the computation of this method

58 Classical Approach Reachability analysis If an object is reachable only from local variables of current procedure, then object does not escape that procedure
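The classical reachability test can be sketched as graph search over an abstract points-to graph: an object escapes if it is reachable from any non-local root (a static field, a parameter, a thread object); objects reachable only from local variables are captured. This toy model is mine, not the analyses cited on the surrounding slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy escape check: an abstract object escapes the current procedure iff it
// is reachable from some non-local root in the points-to graph.
class EscapeCheck {
    final Map<String, List<String>> pointsTo = new HashMap<>();  // node -> objects it references
    final Set<String> nonLocalRoots = new HashSet<>();           // statics, parameters, escaping threads

    void edge(String from, String to) {
        pointsTo.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first reachability from the non-local roots.
    boolean escapes(String obj) {
        Deque<String> work = new ArrayDeque<>(nonLocalRoots);
        Set<String> seen = new HashSet<>(nonLocalRoots);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (n.equals(obj)) return true;          // reachable from a non-local root
            for (String m : pointsTo.getOrDefault(n, List.of()))
                if (seen.add(m)) work.push(m);
        }
        return false;                                // captured: reachable only from locals
    }
}
```

For multithreaded programs (next slide) the same skeleton applies, with thread objects and data reachable from parallel threads added to the roots.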

59 Escape Analysis for Multithreaded Programs Extend analysis to recognize when objects do not escape to parallel thread – OOPSLA 1999: Blanchet; Bogda, Hoelzle; Choi, Gupta, Serrano, Sreedhar, Midkiff; Whaley, Rinard. Analyze interactions to recapture objects that do not escape multithreaded subcomputation: Salcianu, Rinard – PPoPP 2001

60 Applications Synchronization elimination Stack allocation Region-based allocation Data race detection Eliminate accesses to captured objects as source of data races

61 Analysis via Parallel Flow Graphs

62 Parallel Flow Graphs Basic Idea: do dataflow analysis on the parallel flow graph

  Thread 1: p = &x; *p = &y; p = &z
  Thread 2: q = &a; *q = &b

Intrathread control-flow edges; interthread control-flow edges between the threads' statements. [heap diagram: p may point to x, y, z; q may point to a, b]

63 Infeasible Paths Issue Infeasible paths cause the analysis to lose precision: because of an infeasible path through the interthread edges, the analysis derives a spurious points-to relationship involving x and z. [heap diagram]

64-66 Analysis Time Issue [same example] Potential Solutions: Partial Order Approaches – remove edges between statements in independent regions. How to recognize independent regions? Seems like might need analysis…

67 Analysis Time Issue Potential Solutions: Partial Order Approaches; control flow/synchronization analysis. Synchronization may prevent m from immediately preceding n in execution; if so, no edge from m to n.

  Thread 1: y = 1; lock(a); y = y + w; x = x + 1; unlock(a)
  Thread 2: x = 1; lock(a); x = x + v; y = y + 1; unlock(a)

No edges between the statements inside the two critical sections (the lock serializes them).

68 Experience Lots of research in field over last two decades Deadlock detection Data race detection Control analysis for multithreaded programs (mutual exclusion, precedence properties) Finite-state properties Scope – simple activity management programs Inlinable programs Bounded threads and objects

69 References FLAVERS Dwyer, Clarke - FSE 1994 Naumovich, Avrunin, Clarke – FSE 1999 Naumovich, Clarke, Cobleigh – PASTE 1999 Masticola, Ryder ICPP 1990 (deadlock detection) PPoPP 1993 (control-flow analysis) Duesterwald, Soffa - TAV 1991 Handles procedures Blieberger, Burgstaller, Scholz – Ada Europe 2000 Symbolic analysis for dynamic thread creation Scope Inlinable programs Bounded objects and threads

70 Interference Approaches

71 Dataflow Analysis for Bitvector Problems Knoop, Steffen, Vollmer – TOPLAS 1996 Bitvector problems Dataflow information is a vector of bits Transfer function for one bit does not depend on values of other bits Examples Reaching definitions Available expressions As efficient and precise as sequential version!

72 Available Expressions Example

  parbegin
    Thread 1: a = x + y; c = x + y
    Thread 2: x = b; b = x + y
  parend
  d = x + y

Where is x+y available? Available after a = x + y. Not available after x = b (killed by x = b). At the parallel statements: ???

73 Three Interleavings [three orderings of the statements above] Depending on where x = b falls relative to a statement in the other thread, x+y may be available in some interleavings and not in others.

74 Available Expressions Example Where is x+y available? Available after a = x + y. Not available at c = x + y (killed by the parallel x = b). Not available after x = b. Available at d = x + y, since both threads end having just computed x + y.

75 Key Concept: Interference x = b interferes with x+y: x+y is not available at any statement that executes in parallel with x = b. Nice algorithm: precompute interference; propagate information along sequential control-flow edges only; handle parallel joins specially.
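The interference idea on this slide can be sketched for a single expression, x+y. The model below is mine (one parbegin/parend region, whole threads considered parallel): a statement kills x+y if it assigns x or y; availability is propagated sequentially within each thread, any statement with a killing statement in the sibling thread is marked unavailable, and the join takes the conjunction of the threads' sequential exits.

```java
import java.util.List;

// Sketch of interference-precomputed available-expressions for x+y.
class AvailXY {
    // A statement such as "a = x + y" is (def "a", computesXY true).
    record Stmt(String def, boolean computesXY) {}

    static boolean kills(Stmt s) { return s.def.equals("x") || s.def.equals("y"); }

    static boolean killsAny(List<Stmt> thread) {
        return thread.stream().anyMatch(AvailXY::kills);
    }

    // Sequential availability of x+y at thread exit, given availability on entry.
    static boolean seqExit(List<Stmt> thread, boolean availIn) {
        boolean cur = availIn;
        for (Stmt s : thread) cur = kills(s) ? false : (cur || s.computesXY);
        return cur;
    }

    // Availability on entry to statement i of `self`, with interference from `sibling`:
    // a parallel killing statement makes x+y unavailable at every parallel statement.
    static boolean availAt(List<Stmt> self, int i, boolean availIn, List<Stmt> sibling) {
        if (killsAny(sibling)) return false;        // precomputed interference
        boolean cur = availIn;
        for (int k = 0; k < i; k++) {
            Stmt s = self.get(k);
            cur = kills(s) ? false : (cur || s.computesXY);
        }
        return cur;
    }
}
```

On the slide's example this reproduces slide 74: x+y is not available at c = x + y (interference from the parallel x = b), but it is available at d = x + y after the join, because both threads' sequential exits have it available.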

76 Limitations No procedures Bitvector problems only (no pointer analysis) But can remove these limitations Integrate interference into abstraction Adjust rules to flow information from end of thread to start of parallel threads Iteratively compute interactions Summary-based approach for procedures Lose precision for non-bitvector problems

77 Pointer Analysis for Multithreaded Programs Dataflow information is a triple <C, I, E>:

  C = current points-to information
  I = interference points-to edges from parallel threads
  E = set of points-to edges created by current thread

Interference: I_k = U_{j != k} E_j, where t_1 … t_n are the n parallel threads. Invariant: I is a subset of C; within each thread, interference points-to edges are always added to the current information.

78 Analysis for Example p = &x; parbegin *p = 1 || p = &y; *p = 2 parend

79 Analysis for Example Where does p point to at the statement *p = 1?

80 Analysis for Example After p = &x the parent's triple is <{p→x}, {}, {p→x}>

81-87 Analysis of Parallel Threads [build animation] Thread 2's assignment p = &y creates the edge p→y, so its E set becomes {p→y}; that edge flows as interference into thread 1, so at *p = 1 the analysis has C = {p→x, p→y}. Within thread 2, after p = &y, C = {p→y}, and *p = 2 writes through p→y.

88-89 Analysis of Thread Joins [build animation] At parend the parent combines the two threads' triples.

90 Final Result After the join, p may point to x or y: C = {p→x, p→y}.

91 General Dataflow Equations

  Parent thread before parbegin: <C, I, E>
  Thread 1 entry: <C U E2, I U E2, {}>
  Thread 2 entry: <C U E1, I U E1, {}>
  Thread i exit: <Ci, Ii, Ei>
  Parent thread after parend: <C1 U C2, I, E U E1 U E2>

92-93 General Dataflow Equations (the same equations, shown as a build animation)
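The parbegin/parend equations above can be sketched directly as operations on sets of points-to edges. This is my toy encoding (edges are strings like "p->x"; only the fork and join rules are modeled, not the per-statement transfer functions): each child thread starts from the parent's C plus the sibling's created edges as interference, and the join unions the children's results.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the <C, I, E> parbegin/parend rules for multithreaded pointer analysis.
class MTPointsTo {
    record Triple(Set<String> C, Set<String> I, Set<String> E) {}

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a);
        r.addAll(b);
        return r;
    }

    // Entry triple for one child thread, given the parent's triple and the
    // sibling thread's created edges eSib (found by iterating in general).
    static Triple childEntry(Triple parent, Set<String> eSib) {
        return new Triple(union(parent.C(), eSib),   // C U E_sib
                          union(parent.I(), eSib),   // I U E_sib
                          new HashSet<>());          // child has created no edges yet
    }

    // Parent triple after parend, from its pre-fork triple and both children's exits.
    static Triple join(Triple parent, Triple t1, Triple t2) {
        return new Triple(union(t1.C(), t2.C()),                      // C1 U C2
                          parent.I(),                                  // parent's interference
                          union(parent.E(), union(t1.E(), t2.E())));   // E U E1 U E2
    }
}
```

Replaying the slides' example (p = &x before the fork; thread 2 creates p->y) reproduces the walkthrough: thread 1 sees p pointing to x or y at *p = 1, and after the join C = {p->x, p->y}.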

94 Compositionality Extension Compositional at thread level Analyze each thread once in isolation Abstraction captures potential interactions Compute interactions whenever need information Combine with escape analysis to obtain partial program analysis

95 Experience & Expectations Limited implementation experience Pointer analysis (Rugina, Rinard – PLDI 2000) Compositional pointer and escape analysis (Salcianu, Rinard – PPoPP 2001) Small but real programs Promising approach Scales like analyses for sequential programs Partial program analyses

96 Issues Developing abstractions Need interference abstraction Need fork/join rules Need interaction analysis Analysis time Precision for richer abstractions

97 State Space Exploration

98 State Space Exploration for Multithreaded Programs Thread 1: lock(a) lock(b) t = x x = y y = t unlock(b) unlock(a) Thread 2: lock(b) lock(a) s = y y = x x = s unlock(a) unlock(b) /* a controls x, b controls y */ lock a, b; int x, y;

99 State Space Exploration [state graph: interleavings of the two threads' lock operations] Deadlocked States: thread 1 holds a and waits for b while thread 2 holds b and waits for a.
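The exploration on these slides can be sketched as a breadth-first search over explicit states. This toy model is mine (data statements are omitted; only the lock operations of slide 98's two threads are kept): a state records both program counters and each lock's holder, and a deadlock is a reachable state where neither thread can move and the program is not finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of explicit state-space exploration for:
//   thread 1: lock(a) lock(b) ... unlock(b) unlock(a)
//   thread 2: lock(b) lock(a) ... unlock(a) unlock(b)
class StateSpace {
    // "+a" = acquire lock a, "-a" = release lock a, similarly for b.
    static final String[] T1 = {"+a", "+b", "-b", "-a"};
    static final String[] T2 = {"+b", "+a", "-a", "-b"};

    // A state is (pc1, pc2, holder of a, holder of b); holder -1 means free.
    static int countDeadlocks() {
        Set<List<Integer>> seen = new HashSet<>();
        Deque<List<Integer>> work = new ArrayDeque<>();
        List<Integer> init = List.of(0, 0, -1, -1);
        seen.add(init);
        work.push(init);
        int deadlocks = 0;
        while (!work.isEmpty()) {
            List<Integer> s = work.pop();
            List<List<Integer>> succs = new ArrayList<>();
            step(s, 0, T1, succs);
            step(s, 1, T2, succs);
            boolean done = s.get(0) == T1.length && s.get(1) == T2.length;
            if (succs.isEmpty() && !done) deadlocks++;   // stuck but not finished
            for (List<Integer> n : succs) if (seen.add(n)) work.push(n);
        }
        return deadlocks;
    }

    static void step(List<Integer> s, int tid, String[] prog, List<List<Integer>> out) {
        int pc = s.get(tid);
        if (pc >= prog.length) return;                   // thread finished
        String op = prog[pc];
        int slot = (op.charAt(1) == 'a') ? 2 : 3;        // index of the lock's holder slot
        if (op.charAt(0) == '+' && s.get(slot) != -1) return;  // lock busy: thread blocked
        List<Integer> n = new ArrayList<>(s);
        n.set(tid, pc + 1);
        n.set(slot, op.charAt(0) == '+' ? tid : -1);
        out.add(List.copyOf(n));
    }
}
```

The search finds exactly one deadlocked state: thread 1 holds a and waits for b while thread 2 holds b and waits for a, matching the slide.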

100 Strengths Conceptually simple (at least at first…) Harmony with other areas of computer science (simple search often beats more sophisticated approaches) Can test for lots of properties and errors Lots of technology and momentum in this area Packaged model checkers Big successes in hardware verification

101 Challenges Analysis time Unbounded program features Dynamic thread creation Dynamic object creation Potential solutions Sophisticated abstractions (increases complexity…) Cousot, Cousot - 1984 Chow, Harrison – POPL 1992 Yahav – POPL 2001 Granularity coarsening/partial-order techniques Chow, Harrison – ICCL 1994 Valmari – CAV 1990 Godefroid, Wolper – LICS 1991

102 Granularity Coarsening

  Thread 1: x = 1; y = 2    Thread 2: a = 3; b = 4
  [diagram: the interleavings of the four statements collapse into two coarse-grained blocks]

Basic Idea: Eliminate Analysis of Interleavings from Independent Statements
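The justification for coarsening can be checked by brute force on the slide's example. This sketch is mine: it enumerates every interleaving of the two threads' assignments and collects the final stores. Because the threads write disjoint variables, all six interleavings reach the same final state, so exploring them individually is wasted work.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Enumerate all interleavings of two independent threads' assignments and
// collect the set of distinct final stores.
class Interleave {
    // Each statement is {variable, value}: thread 1 writes x, y; thread 2 writes a, b.
    static final List<String[]> T1 = List.of(new String[]{"x", "1"}, new String[]{"y", "2"});
    static final List<String[]> T2 = List.of(new String[]{"a", "3"}, new String[]{"b", "4"});

    static int interleavings = 0;   // number of complete interleavings explored

    static Set<Map<String, String>> finalStates() {
        interleavings = 0;
        Set<Map<String, String>> out = new HashSet<>();
        run(0, 0, new HashMap<>(), out);
        return out;
    }

    static void run(int i, int j, Map<String, String> store, Set<Map<String, String>> out) {
        if (i == T1.size() && j == T2.size()) { interleavings++; out.add(store); return; }
        if (i < T1.size()) {                      // next step taken by thread 1
            Map<String, String> s = new HashMap<>(store);
            s.put(T1.get(i)[0], T1.get(i)[1]);
            run(i + 1, j, s, out);
        }
        if (j < T2.size()) {                      // next step taken by thread 2
            Map<String, String> s = new HashMap<>(store);
            s.put(T2.get(j)[0], T2.get(j)[1]);
            run(i, j + 1, s, out);
        }
    }
}
```

Six interleavings (the ways to merge two 2-statement sequences), one final state: coarsening each thread into a single atomic block loses nothing here. With a shared variable in both threads, the final-state set would grow and the interleavings could no longer be collapsed.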

103 Issue: Aliasing x = 1 || *p = 3 Are these two statements independent? Depends… Potential Solution: Layered analysis (Ball, Rajamani - PLDI 2001) [pipeline diagram: Program → Pointer Analysis → Model Extraction → Model Checking → Properties] Potential Problem: Information from later analyses may be needed or useful in previous analyses

104 Experience Program analysis style Has been used for very detailed properties Analysis time issues limit to tiny programs Explicit model extraction/model checking style Still exploring how to work for software in general, not just multithreaded programs No special technology required for multithreaded programs (at first …)

105 Expectations In principle, approach should be quite useful Multithreaded programs typically have sparse interaction patterns Just not obvious from code Need some way to target the tool to only those interactions that can actually occur or are interesting Pointer preanalysis seems like a promising approach

106 Application to safety problems Deadlock detection Variety of existing approaches Complex programs can have very simple synchronization behavior Ripe for model extraction/model checking Data race detection More complicated problem Largely unsolved Very important in practice

107 Why data races are so important Inadvertent atomicity violations Timing-dependent data structure corruption Nondeterministic, irreproducible failures Architecture effects Data races expose weak memory consistency models Destroy abstraction of single shared memory Compiler optimization effects Data races expose effect of standard optimizations Compiler can change meaning of program Analysis complications

108 Atomicity Violations

  class list {
    static int length = 0;
    static list head = null;
    list next;
    int value;
    static void insert(int i) {
      list n = new list(i);
      n.next = head;
      head = n;
      length++;
    }
  }

[heap diagram: length = 1, head points to a one-element list]

109-114 Atomicity Violations [build animation] insert(5) || insert(6): the two parallel calls interleave inside insert. Both allocate their cell and set its next field to the same old head; both then assign head, so one of the new cells is lost from the list even though length is incremented twice. The diagrams show length reaching 3 while only one of the two inserted cells remains reachable from head.

115 Atomicity Violation Solution

  class list {
    static int length = 0;
    static list head = null;
    list next;
    int value;
    synchronized static void insert(int i) {
      list n = new list(i);
      n.next = head;
      head = n;
      length++;
    }
  }

[heap diagram: with insert synchronized, both inserts take effect and length matches the list]
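The fixed list can be exercised under contention. This sketch is mine (instance fields instead of the slide's statics so the demo is repeatable, and synchronization is on the list instance; the locking discipline is the same idea): with the whole read-modify-write inside one critical section, every insert takes effect and length always matches the number of reachable cells.

```java
// Demo of the atomicity fix: insert is a single lock-protected critical
// section, so parallel inserts cannot lose cells or corrupt length.
class SyncList {
    static class Node { int value; Node next; Node(int v) { value = v; } }

    int length = 0;
    Node head = null;

    synchronized void insert(int i) {   // whole read-modify-write is atomic
        Node n = new Node(i);
        n.next = head;
        head = n;
        length++;
    }

    int reachable() {                   // cells actually linked from head
        int k = 0;
        for (Node n = head; n != null; n = n.next) k++;
        return k;
    }

    static SyncList run(int threads, int perThread) {
        SyncList l = new SyncList();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> { for (int j = 0; j < perThread; j++) l.insert(j); });
            ts[t].start();
        }
        try { for (Thread t : ts) t.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return l;
    }
}
```

With the synchronized keyword removed, the slides' failure mode reappears: two threads can read the same head, and one insertion is overwritten while length still counts it.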

116 Analysis Complications The analysis is unsound if it does not take the effects of data races into account. Desirable to analyze program at granularity of atomic operations: reduces state space; required to extract interesting properties. But must verify that operations are atomic! Complicated analysis problem: extract locking protocol; verify that program obeys protocol.

117 Architecture Effects Weak Memory Consistency Models

118 What is the value of z? Initially: x = 0, y = 1. Thread 1: y = 0; x = 1. Thread 2: z = x + y.

119 Three Interleavings

  y = 0; x = 1; z = x + y  gives  z = 1
  y = 0; z = x + y; x = 1  gives  z = 0
  z = x + y; y = 0; x = 1  gives  z = 1

120 Three Interleavings So z can be 0 or 1

121 Three Interleavings "z can be 0 or 1" is INCORRECT REASONING!

122 z can be 0 or 1 OR 2! The memory system can reorder writes as long as it preserves the illusion of sequential execution within each thread, and different threads can observe different orders: thread 2 may observe the write x = 1 but not yet the earlier write y = 0, computing z = 1 + 1 = 2.

123 Analysis Complications Interleaving semantics is incorrect No soundness guarantee for current analyses Formal semantics of weak memory consistency models still under development Maessen, Arvind, Shen – OOPSLA 2000 Manson, Pugh – Java Grande/ISCOPE 2001 Unclear how to prove ANY analysis sound… State space is larger than one might think Complicates state space exploration Complicates human reasoning

124 How does one write a correct program? Initially: x = 0, y = 1.

  Thread 1: lock(l); y = 0; x = 1; unlock(l)
  Thread 2: lock(l); z = x + y; unlock(l)

Operations are not reordered across synchronizations. If synchronization separates conflicting actions from parallel threads, then the reorderings are not visible: race-free programs can use interleaving semantics. Here z is 1 in either acquisition order.

125 Compiler Optimization Effects Standard optimizations assume single thread With interleaving semantics, optimizations may change meaning of program Even if only apply optimizations within serial parts of program! Superset of reordering effects Midkiff, Padua – ICPP 1990

126 Options Rethink and reimplement all compilers Lee, Padua, Midkiff – PPoPP 1999 Transform program to restore sequential memory consistency model Shasha, Snir – TOPLAS 1998 Lee, Padua – PACT 2000 No optimizations across synchronizations Java memory model (Pugh - JavaGrande 1999) Semantics no longer interleaving semantics

127 Program Analysis Analyze program, verify absence of data races Appealing option Unlikely to be feasible for full range of programs Reconstruct association between locks, data that they protect, threads that access data Dynamic object and thread creation References and pointers Diversity of locking protocols Whole-program analysis Exception: simple activity management programs

128 Eliminate races at language level Type system formalizes sharing patterns Check accesses properly synchronized Not as difficult as fully automatic approach Separate analysis of each module No need to reconstruct locking protocol Types provide locking information Limits sharing patterns program can use Key question: Is limitation worth benefit? Depends on expressiveness, flexibility, intrusiveness, perceived value of system

129 Standard Sharing Patterns for Activity Management Programs Private data - single thread ownership Mutual exclusion data lock protects data, acquire lock to get ownership Migrating data Ownership moves between threads in response to data structure insertions and removals Published data - distributed for read-only access

130 General Principle of Ownership Formalize as ownership relation Relation between data items and threads Basic requirement for reads When a thread reads a data item Must own item (but can share ownership with other threads) Basic requirement for writes When a thread writes data item Must be sole owner of item

131 Typical Actions to Change Ownership Object creation (creator owns new object) Synchronization operations Lock acquire (acquire data that lock protects) Lock release (release data) Similarly for post/wait, Ada accept, … Thread creation (thread inherits data from parent) Thread termination (parent gets data back) Unique reference acquisition and release (acquire or release referenced data)

132 Proposed Systems Monitors + copy in/copy out Concurrent Pascal (Brinch Hansen TSE 1975) Guava (Bacon, Strom, Tarafdar – OOPSLA 2000) Mutual exclusion data + private data Flanagan, Abadi – ESOP 2000 Flanagan, Freund – PLDI 2000 Mutual exclusion data + private data + linear/ownership types de Line, Fahndrich – PLDI 2001 Boyapati, Rinard – OOPSLA 2001

133 Thread + Private Data Private data identified as such in type system Type system ensures reachable only from Local variables Other private data Lock + Shared Data Type system identifies correspondence Type system ensures Threads hold lock when access data Data accessible only from other data protected by same lock Copy model of communication Basic Approach

134 Thread + Private Data Private data identified as such in type system Type system ensures reachable only from Local variables Other private data Lock + Shared Data Type system identifies correspondence Type system ensures Threads hold lock when access data Data accessible only from other data protected by same lock Type system ensures at most one reference to this object Extension: Unique References

135 Thread + Private Data Private data identified as such in type system Type system ensures reachable only from Local variables Other private data Lock + Shared Data Type system identifies correspondence Type system ensures Threads hold lock when access data Data accessible only from other data protected by same lock Step One: Grab Lock Extension: Unique References

136 Thread + Private Data Private data identified as such in type system Type system ensures reachable only from Local variables Other private data Lock + Shared Data Type system identifies correspondence Type system ensures Threads hold lock when access data Data accessible only from other data protected by same lock Step One: Grab Lock Extension: Unique References

137 Thread + Private Data Private data identified as such in type system Type system ensures reachable only from Local variables Other private data Lock + Shared Data Type system identifies correspondence Type system ensures Threads hold lock when access data Data accessible only from other data protected by same lock Step Two: Transfer Reference Extension: Unique References

138 Thread + Private Data Private data identified as such in type system Type system ensures reachable only from Local variables Other private data Lock + Shared Data Type system identifies correspondence Type system ensures Threads hold lock when access data Data accessible only from other data protected by same lock Step Three: Release Lock Extension: Unique References

139 Thread + Private Data Private data identified as such in type system Type system ensures reachable only from Local variables Other private data Lock + Shared Data Type system identifies correspondence Type system ensures Threads hold lock when access data Data accessible only from other data protected by same lock Result: Transferred Object Ownership Relation Changes Over Time Extension: Unique References
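The three steps on slides 135–139 can be sketched as a hand-off through a lock-protected container. This is a hypothetical example (the `Mailbox` and `Message` classes are illustrative): the sender grabs the lock, transfers its unique reference into the shared queue, and releases the lock; afterwards the sender must not touch the object again. That "no use after transfer" rule is exactly what the unique-reference type systems enforce statically.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Message { final String body; Message(String body) { this.body = body; } }

class Mailbox {
    private final Deque<Message> queue = new ArrayDeque<>();  // guarded by this

    void put(Message m) {
        synchronized (this) {          // step one: grab lock
            queue.addLast(m);          // step two: transfer the unique reference
        }                              // step three: release lock
        // m must not be used here: ownership has moved into the mailbox
    }

    synchronized Message take() {
        return queue.pollFirst();      // ownership moves on to the caller
    }
}
```

After `put`, the object's owner is the lock-protected queue; after `take`, it is the receiving thread's private state — the ownership relation changes over time, as the slide says.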

140 Prospects Remaining challenge: general data structures Objects with multiple references Ownership changes correlated with movement between data structures Recognize insertions and deletions Language-level solutions are the way to go for activity management programs Tractable for typical sharing patterns Big impact in practice

141 Benefits of ownership formalization Identification of atomic regions Weak memory invisible to programmer Enables coarse-grain program analysis Promotes lots of new and interesting analyses Component interaction analyses Object propagation analyses Better understanding of software structure Analysis and transformation Software engineering

142 What about parallel computing programs?

143 Parallel Computing Sharing Patterns Specialized Sharing Patterns Unsynchronized accesses to disjoint regions of a single aggregate structure Threads update disjoint regions of array Threads update disjoint subtrees Generalized reductions Commuting updates Reduction trees
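The first pattern on this slide — unsynchronized updates to disjoint regions of one array — can be sketched as follows. This is a hypothetical example (`SquareHalf` is illustrative): each thread writes only its own index range, so the unsynchronized accesses never conflict. Proving that, however, requires reasoning about index arithmetic, which is why these patterns resist simple type-system solutions and call for targeted analyses instead.

```java
// One thread's share of the work: the disjoint region [lo, hi).
class SquareHalf implements Runnable {
    private final int[] data;
    private final int lo, hi;
    SquareHalf(int[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }
    public void run() {
        // No locks: correctness rests on the regions being disjoint.
        for (int i = lo; i < hi; i++) data[i] = i * i;
    }
}
```

Run on an 8-element array as `new SquareHalf(a, 0, 4)` and `new SquareHalf(a, 4, 8)` in two threads, the writes are race-free even though nothing is synchronized.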

144 Parallel Computing Prospects No language-level solution likely to be feasible Race freedom depends on arbitrarily complicated properties of updated data structures Impact of data races not as large Parallelism confined to specific algorithms Range of targeted analysis algorithms Parallel loops with dense matrices Divide and conquer programs Generalized reduction recognition

145 Future Directions

146 Integrating Specifications Past focus: discovering properties Future focus: verifying properties Understanding atomicity structure crucial Assume race-free programs Type system or previous analysis Enable Owicki/Gries style verification Assume property holds Show that each atomic action preserves it Consider only actions that affect property
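The Owicki/Gries-style obligation on this slide can be sketched on a small race-free class. This is a hypothetical example (`Pool` is illustrative): the invariant I is 0 ≤ used ≤ capacity; assuming I holds on entry, each atomic (here, synchronized) action must re-establish I on exit, and actions that do not touch the relevant fields can be ignored.

```java
class Pool {
    private final int capacity;
    private int used = 0;            // invariant I: 0 <= used <= capacity

    Pool(int capacity) { this.capacity = capacity; }

    synchronized boolean acquire() {
        if (used == capacity) return false;  // guard keeps used <= capacity
        used++;                              // I re-established on exit
        return true;
    }

    synchronized void release() {
        if (used > 0) used--;                // guard keeps 0 <= used
    }

    synchronized int inUse() { return used; }  // reads preserve I trivially
}
```

Race freedom is what makes this tractable: because every access to `used` is inside an atomic action, it suffices to check each action against I in isolation, with no interleaving of partial updates to consider.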

147 Failure Containment Threads as unit of partial failure Partial executions of failed atomic actions Rollback mechanism Optimization opportunity New analyses and transformations Failure propagation analysis Failure response transformations

148 Model Checking Avalanche of model checking research Layered analyses for model extraction Flow-insensitive pointer analysis Initial focus on control problems Deadlock detection Operation sequencing constraints Checking finite-state properties

149 Steps towards practicality Java threads prompt experimentation Threads as standard part of safe language Available multithreaded benchmarks Open Java implementation platforms More implementations Interprocedural analyses Scalability emerges as key concern Directs analyses to relevant problems

150 Summary Multithreaded programs common and important Two kinds of multithreaded programs Parallel computing programs Activity management programs Data races as key analysis problem Programming errors Complicate analysis and transformation Different solutions for different programs Language solution for activity management Targeted analyses for parallel computing Future directions – specifications, failure containment, model checking, practical implementations
