Published by Anna McKenzie. Modified over 3 years ago.

1
Determinate Imperative Programming: The CF Model
Vijay Saraswat, IBM TJ Watson Research Center
Joint work with Radha Jagadeesan, Armando Solar-Lezama, Christoph von Praun
http://www.saraswat.org/cf.html

2
Outline
Problem: Many concurrent imperative programs are determinate, but determinacy is not apparent from the syntax.
Basic idea: A variable is the stream of values written to it by a thread.
Many examples
Semantics
Implementation
Future work

3
Background: X10
Five basic themes:
  Partitioned address space
  Pervasive explicit asynchrony (Cilk-style recursive parallelism)
  Java base
  Guaranteed VM invariants
  Explicit, distributed VM
Few language extensions: async, finish, foreach (, …, in )
Multidimensional arrays over distributions
Subsumes MPI, OpenMP, SPMD languages, Cilk …

4
X10: clocks, clocked final data structures
Clocks can be created dynamically.
Activities are registered with clocks.
An activity may register a newly created activity with one of its clocks.
next; resumes each clock and blocks until each clock advances.
  This is sufficient for deadlock-freedom.
  Adequate for parallel operations on arrays, but not for dataflow.
A clock advances when all activities registered on it resume the clock.
Operations: c.resume(); next; c.drop();
Clocked final datum:
  In each phase of the clock the datum is immutable.
  Read gets the current value; write updates in the next phase.
Clocks do not introduce deadlock; clocked finals are determinate.

5
Clocked final example: Array relaxation

    int clocked (c) final [0:M-1,0:N-1] G = …;
    // Each activity is registered on c.
    finish foreach (int i,j in [1:M-2,1:N-2]) clocked (c) {
      for (int p in [0:TimeStep-1]) {
        // Reads get the current value of each cell.
        // The write becomes visible (only) when the clock advances.
        G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
        next;  // Wait for the clock to advance.
      }
    }

The foreach ranges over the interior cells only, so that the neighbour accesses G[i+1,j] and G[i,j+1] stay inside the region [0:M-1,0:N-1].
G elements are assigned to at most once in each phase of clock c.
Takeaway: Each cell is assigned a clocked stream of immutable values.
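
The clocked-final discipline (read the current phase, write into the next phase) can be sketched sequentially in Python with two buffers. This is a minimal illustration, not the X10 implementation: the function name `relax` and the `reverse` flag are hypothetical, and the flag only exists to show that the order in which cells are updated within a phase does not affect the result.

```python
def relax(grid, omega, steps, reverse=False):
    """Relax interior cells; reads see the current phase, writes go to the next."""
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]
    cells = [(i, j) for i in range(1, rows - 1) for j in range(1, cols - 1)]
    if reverse:
        cells.reverse()  # a different "schedule" for the same phase
    for _ in range(steps):
        nxt = [row[:] for row in current]  # boundary cells carry over unchanged
        for i, j in cells:
            neighbours = (current[i - 1][j] + current[i + 1][j]
                          + current[i][j - 1] + current[i][j + 1])
            nxt[i][j] = omega / 4 * neighbours + (1 - omega) * current[i][j]
        current = nxt  # the clock advances: writes become visible
    return current
```

Since every read goes to `current` and every write goes to `nxt`, each cell is written at most once per phase and no activity can observe a half-updated grid.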

6
Imperative Programming Revisited
Variables: Value in a Box
  Read: fetch current value
  Write: change value
  Stability condition: Value does not change unless a write is performed
Very powerful
  Permit repeated many-writer, many-reader communication through arbitrary reference graphs
Asynchrony introduces indeterminacy

    int x = 0;
    async x=1;
    print(x);

  May write out either 0 or 1.
Reader-reader, reader-writer, writer-writer conflicts.

7
Determinate Concurrent Imperative Frameworks
Asynchronous Kahn networks
  Nodes can be thought of as (continuous) functions over streams.
  Operations: pop/peek, push
  Node-local state may mutate arbitrarily
Concurrent Constraint Programming
  Tell constraints
  Ask if a constraint is true
  Subsumes Kahn networks (dataflow).
  Subsumes (det) concurrent logic programming, lazy functional programming
Do not support arbitrary mutable variables.

8
Determinate Concurrent Imperative Frameworks
Safe Asynchrony (Steele 1991)
  Parent may communicate with children.
  Children may communicate with parent.
  Siblings may communicate with each other only through commutative, associative writes (commuting writes).

Good:

    int x=0;
    finish foreach (int i in 1:N) {
      x += i;
    }
    print(x); // N*(N+1)/2

Bad:

    int x=0;
    finish foreach (int i in 1:N) {
      x += i;
      async print(x);
    }

Useful but limited. Does not permit dataflow synch.
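
The difference between the two programs can be checked by enumerating schedules. The sketch below is an illustration only: it models a schedule as a permutation of the loop indices, and in the second function it assumes each print runs immediately after its increment, which is just one of the behaviours `async print(x)` allows. The names `final_sum`, `observed_values` and `all_schedules` are hypothetical.

```python
import itertools

def final_sum(order):
    """Good case: only the value after `finish` is observed."""
    x = 0
    for i in order:
        x += i  # commutative, associative write
    return x

def observed_values(order):
    """Bad case: siblings also observe intermediate values of x."""
    x = 0
    seen = []
    for i in order:
        x += i
        seen.append(x)  # stands in for `async print(x)`
    return seen

def all_schedules(n):
    """Every order in which the n sibling activities might run."""
    return list(itertools.permutations(range(1, n + 1)))
```

For N = 4, `final_sum` gives 10 for every schedule, while `observed_values` yields different sequences for different schedules.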

9
The CF Basic Model
A shared variable is a stream of immutable values.
Each activity maintains an index i + clean/dirty bit for every shared variable.
  Initially i=1, v[0] contains initial value.
  Read: If clean, block until v[i] is written and return v[i++]; else return v[i-1]. Mark as clean.
  Write: Write into v[i++]. Mark as dirty.
A read stutters (returns the value in the last phase) if no activity can write in this phase, e.g. for local variables.
World map = collection of indices for an activity.
Index transmission rules:
  An activity is initialized with the current world map of its parent activity.
  On finish, the world map of the activity is lubbed with the world maps of the finished activities. (clean lub dirty = clean)
All programs are determinate and scheduler independent.
May deadlock … nexts are not conjunctive.
The clock of clocked final is made implicit.
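
The read and write rules can be tried out in a small Python simulation. This is a sketch under assumptions, not the authors' implementation: activities are generators driven by a seeded random scheduler, a blocked read is modeled by yielding, the initial bit is assumed to be dirty (the slide does not state it; this choice makes the first read return v[0]), and two writes to the same index are only accepted if they agree. World maps, finish and stuttering are not modeled. All class and function names are hypothetical.

```python
import random

class Stream:
    """A shared variable: an append-only list of immutable values."""
    def __init__(self, initial):
        self.v = [initial]  # v[0] holds the initial value

class Cursor:
    """One activity's index and clean/dirty bit for one shared variable."""
    def __init__(self, stream):
        self.s, self.i, self.dirty = stream, 1, True  # dirty start: assumption

    def can_read(self):
        return self.dirty or self.i < len(self.s.v)

    def read(self):
        if self.dirty:                 # return v[i-1], mark clean
            self.dirty = False
            return self.s.v[self.i - 1]
        value = self.s.v[self.i]       # caller has checked can_read()
        self.i += 1
        return value

    def write(self, value):
        if self.i < len(self.s.v):     # a sibling already wrote this index
            assert self.s.v[self.i] == value, "conflicting write"
        else:
            self.s.v.append(value)
        self.i += 1
        self.dirty = True

def reader(x, out):
    for _ in range(2):
        while not x.can_read():
            yield                      # blocked: let another activity run
        out.append(x.read())
        yield

def writer(x):
    for value in (1, 2):
        x.write(value)
        yield

def run(seed):
    """One reader and one writer on x=0, interleaved by a random schedule."""
    rng, s, out = random.Random(seed), Stream(0), []
    live = [reader(Cursor(s), out), writer(Cursor(s))]
    while live:
        act = rng.choice(live)
        try:
            next(act)
        except StopIteration:
            live.remove(act)
    return out
```

Under these assumptions the reader obtains 0 and then 1 for every seed, because it reads stream positions rather than whatever value happens to be current.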

10
CF example: Array relaxation

    shared int [0:M-1,0:N-1] G = …;
    finish foreach (int i,j in [1:M-2,1:N-2]) {
      for (int p in [0:TimeStep-1]) {
        G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
      }
    }

The foreach ranges over the interior cells only, so that the neighbour accesses stay inside the region [0:M-1,0:N-1].
All clock manipulations are implicit.

11
Some simple examples

    shared int x=0;
    finish {
      async {int r1 = x; int r2 = x; println(r1); println(r2);}
      async {x=1; x=2;}
    }

Output: 0 1
Only one result, independent of the scheduler!

    i   x   A1        A2
    0   0   read r1
    1   1   read r2   write 1
    2   2             write 2

12
Some simple examples

    shared int x=0;
    finish {
      async {int r1 = x; int r2 = x; println(r1); println(r2);}
      async {x=1;}
      async {x=1; int r3 = x; async {x=2;}}
    }
    println(x);

Output: 0 1 2
All programs are determinate.

    i   x   A1 (0)    A2 (0)    A3 (0)             A4 (2)
    0   0   read r1
    1   1   read r2   write 1   write 1; read r3
    2   2                                          write 2

13
Some StreamIt examples

StreamIt:

    void->void pipeline Minimal {
      add IntSource;
      add IntPrinter;
    }
    void->int filter IntSource {
      int x;
      init { x=0; }
      work push 1 { push(x++); }
    }
    int->void filter IntPrinter {
      work pop 1 { print(pop()); }
    }

Output: 0 1 …

X10/CF:

    shared int x=0;
    async while (true) x++;
    async while (true) println(x);

Output: 0 1 …

The communication is through assignment to x, so the same result is obtained with:

    shared int x=0;
    async while (true) ++x;
    async while (true) println(x);

Each shared variable is a multi-reader, multi-writer stream.

14
Some StreamIt examples: Fibonacci

    shared int x=1, y=1;
    async while (true) y=x;    // Activity 1
    async while (true) x+=y;   // Activity 2

    i   y   x
    0   1   1
    1   1   2
    2   2   3
    3   3   5
    …   …   …

Can express any recursive, asynchronous Kahn network.
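
The table can be reproduced by reading the two activities as stream equations: Activity 1 gives y[i] = x[i-1] and Activity 2 gives x[i] = x[i-1] + y[i-1], with x[0] = y[0] = 1. The sketch below just unrolls those equations; it is a sequential check of the table, not a concurrent execution, and the function name `cf_fibonacci` is hypothetical.

```python
def cf_fibonacci(n):
    """Return the first n values of the streams x and y."""
    x, y = [1], [1]
    for i in range(1, n):
        y.append(x[i - 1])             # Activity 1: y = x
        x.append(x[i - 1] + y[i - 1])  # Activity 2: x += y
    return x, y
```

Each stream position depends only on earlier positions, which is why the result cannot depend on how the two activities are interleaved.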

15
StreamIt examples: Moving Average

StreamIt:

    void->void pipeline MovingAverage {
      add IntSource();
      add Averager(10);
      add IntPrinter();
    }
    int->int filter Averager(int n) {
      work pop 1 push 1 peek n {
        int sum=0;
        for (int i=0; i < n; i++)
          sum += peek(i);
        push(sum/n);
        pop();
      }
    }

X10/CF:

    shared int y=0;
    shared int x=0;
    async while (true) x++;
    async while (true) {
      int sum=x;
      for (int i in 1:N-1)
        sum += peek(x, i);
      y = sum/N;
    }

peek(x, i) reads the ith future value of x without popping it. It blocks if necessary.
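
The peek/pop/push behaviour of the averaging filter can be mimicked with a bounded window over an input iterator. This is a sketch only: `moving_average` is a hypothetical name, the `deque` stands in for the filter's input buffer, and "blocking" is modeled by simply producing no output until n values are available.

```python
from collections import deque

def moving_average(source, n):
    """Yield the integer average of each window of n consecutive inputs."""
    window = deque(maxlen=n)  # appending to a full window drops the oldest item
    for value in source:
        window.append(value)
        if len(window) == n:  # enough values to peek at positions 0..n-1
            yield sum(window) // n
```

Each output consumes one new input and looks at n values, matching the `pop 1 push 1 peek n` rates declared in the StreamIt filter.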

16
StreamIt examples: Bandpass filter

StreamIt:

    float->float pipeline BandPassFilter(float rate, float low, float high, int taps) {
      add BPFCore(rate, low, high, taps);
      add Subtracter();
    }
    float->float splitjoin BPFCore(float rate, float low, float high, int taps) {
      split duplicate;
      add LowPass(rate, low, taps, 0);
      add LowPass(rate, high, taps, 0);
      join roundrobin;
    }
    float->float filter Subtracter {
      work pop 2 push 1 {
        push(peek(1)-peek(0));
        pop(); pop();
      }
    }

X10/CF:

    float bandPassFilter(float rate, float low, float high, int taps, int in) {
      int tmp=in;
      shared int in1=tmp, in2=tmp;
      async while (true) in1=in;
      async while (true) in2=in;
      shared int o1 = lowPass(rate, low, taps, 0, in1),
                 o2 = lowPass(rate, high, taps, 0, in2);
      shared int o = o1-o2;
      async while (true) o = o1-o2;
      return o;
    }

Functions return streams.

17
Cannon matrix multiplication

    void cannon (double[N,N] c, double[N,N] a, double[N,N] b) {
      // Initial skew: row i of a shifts left by i, column j of b shifts up by j.
      finish foreach (int i,j in [0:N-1,0:N-1]) {
        a[i,j] = a[i,(i+j)%N];
        b[i,j] = b[(i+j)%N, j];
      }
      for (int k in [0:N-1])
        finish foreach (int i,j in [0:N-1,0:N-1]) {
          c[i,j] = c[i,j] + a[i,j]*b[i,j];
          a[i,j] = a[i,(j+1)%N];
          b[i,j] = b[(i+1)%N, j];
        }
    }

Local variables in each activity.
Parameters whose values are finalized.
The natural sequential program works (for finish foreach).
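
The shift-and-accumulate scheme can be checked with a short sequential Python version. This sketch assumes the standard Cannon alignment (row i of a rotated left by i, column j of b rotated up by j) and accumulation into c[i][j]. Each step builds new matrices from the old ones, which imitates the rule that reads within a phase see the previous phase's values. The function name `cannon` and the list-of-lists representation are illustrative choices.

```python
def cannon(a, b):
    """Multiply two n x n matrices with Cannon's shift-and-accumulate scheme."""
    n = len(a)
    # Initial skew: every read uses the original matrices.
    a = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # One phase: accumulate, then shift a left and b up by one position.
        c = [[c[i][j] + a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c
```

After the skew, step k pairs a[i][(i+j+k) % n] with b[(i+j+k) % n][j], so over n steps every term of the dot product for c[i][j] is added exactly once.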

18
Histogram
Permit commuting writes to be performed simultaneously in the same phase.
A phase is completed when all activities that can write have written.

    [1:N][] histogram([1:N][] A) {
      final int[] B = new int [1:N];
      finish foreach(int i in A) B[A[i]]++;
      return B;
    }

B's phase is not yet complete. A subsequent read will complete it.

19
Cilk programs with races

    int x;
    cilk void foo() { x = x + 1; }
    cilk int main() {
      x = 0;
      spawn foo();
      sync;
      printf("x is %d\n", x);
      return 0;
    }

Determinate: Will always print 1 in CF.
CF smoothly combines Cilk and StreamIt.

20
Implementation
Each activity's world map increases monotonically with time.
Use garbage collection to erase past unreachable values.
Programs with no sibling communication may be executed in buffers with unit windows.
Considering permitting the user to specify bounds on variables (cf. push/pop specifications in StreamIt).
  This will force writes to become blocking as well.
Scheduling strategy affects the size of buffers, not the result.

21
Formalization
MJ/CF
Very straightforward additions to field read/write.
Paper contains details.
Surprisingly localized.

22
Future work
Paper contains ideas on detecting deadlock (stabilities) at runtime and recovering from them.
Programmability being investigated.
Implementation.
  Leverage connection with StreamIt, and static scheduling.
Coarser granularity for indices.
  Use the same clock for many variables.
  Permits coordinated changes to multiple variables.
