Shared Memory Consistency Models: A Tutorial. Sarita V. Adve, Kourosh Gharachorloo. Western Research Laboratory, September 1995.


1 Shared Memory Consistency Models: A Tutorial. Sarita V. Adve, Kourosh Gharachorloo. Western Research Laboratory, September 1995

2 Goals
- Expand intuition about concurrent program behavior
- Explore execution sequences due to compiler or hardware optimizations
- Introduce shared memory consistency models
- Explore execution sequences due to a particular memory model
- Demonstrate memory barriers (“fences”)

3 What happens?
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section

4-8 Uniprocessor Hardware
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 = 1
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1
T4: P2 reads Flag1 == 1
Memory after T4: Flag1 == 1, Flag2 == 1

9 Uniprocessor Hardware Optimizations
- Buffer (cache)
- Writes take about 100 cycles
- Reads take about 1 cycle
- Use write buffer bypass

10-16 Uniprocessor Hardware Optimizations: Write Buffer Bypass
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
Pending writes in the write buffer: Flag1 = 1, Flag2 = 1
Memory: Flag1 == 0, Flag2 == 0
T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 = 1
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1
T4: P2 reads Flag1 == 1

17-23 Multiprocessor Hardware Optimizations: Write Buffer Bypass (shared bus)
P1: Flag1 = 1; if (Flag2 == 0) critical section
P2: Flag2 = 1; if (Flag1 == 0) critical section
Pending write at P1: Flag1 = 1
Pending write at P2: Flag2 = 1
Memory: Flag1 == 0, Flag2 == 0
T0: Flag1 = 0 and Flag2 = 0
T1: P1 writes Flag1 = 1
T2: P1 reads Flag2 == 0
T3: P2 writes Flag2 = 1
T4: P2 reads Flag1 == 0
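
The trace above can be replayed with a small model. This is only a sketch under simplifying assumptions: each processor has a private FIFO write buffer, loads may bypass buffered stores, and the names `Processor`, `store`, `load`, and `drain` are invented for illustration rather than taken from the slides.

```python
# Toy model of write-buffer bypass: each processor queues its stores in a
# private FIFO buffer and lets later loads go straight to shared memory.
# All names here (Processor, store, load, drain) are illustrative only.

class Processor:
    def __init__(self, memory):
        self.memory = memory      # shared dict: variable name -> value
        self.write_buffer = []    # pending (name, value) stores, oldest first

    def store(self, name, value):
        # The store is buffered; shared memory is not updated yet.
        self.write_buffer.append((name, value))

    def load(self, name):
        # A load checks this processor's own buffer first (newest entry wins),
        # then shared memory. Other processors' buffers are invisible to it.
        for buffered_name, value in reversed(self.write_buffer):
            if buffered_name == name:
                return value
        return self.memory[name]

    def drain(self):
        # Eventually the buffered stores reach shared memory in FIFO order.
        for name, value in self.write_buffer:
            self.memory[name] = value
        self.write_buffer.clear()


def run_dekker_with_write_buffers():
    memory = {"Flag1": 0, "Flag2": 0}          # T0
    p1, p2 = Processor(memory), Processor(memory)
    p1.store("Flag1", 1)                       # T1
    p1_enters = p1.load("Flag2") == 0          # T2
    p2.store("Flag2", 1)                       # T3
    p2_enters = p2.load("Flag1") == 0          # T4
    p1.drain()
    p2.drain()
    return p1_enters, p2_enters, memory


p1_enters, p2_enters, final_memory = run_dekker_with_write_buffers()
print(p1_enters, p2_enters, final_memory)
```

In this model both processors enter the critical section, even though memory ends up with both flags set. Issuing all four operations on a single `Processor` object corresponds to the uniprocessor case, where the read of Flag1 is satisfied from the shared buffer and returns 1.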

24 Producer Consumer
Example of a producer and consumer
Global variables initially: Data = 0, Head = 0
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data;

25-29 General Interconnect Multiprocessor Hardware Optimizations: Overlapped Writes
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data
T0: Data = 0, Head = 0
P1 issues Data = 2 and Head = 1
T1: the write Head = 1 reaches memory over the general interconnect (GI)
T2: P2 reads Head == 1
T3: P2 reads Data == 0
T4: the write Data = 2 reaches memory over the GI
Memory after T4: Data == 2, Head == 1
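
The effect of overlapped writes can be reproduced with a few lines of simulation. This is a sketch only: the function `consumer_reads` and its `delivery_order` parameter are hypothetical, and the consumer is assumed to read Data immediately after it first sees Head != 0.

```python
# Toy model of overlapped writes on a general interconnect: P1 issues its
# stores in program order, but the network may deliver them in another order.
# The function name and the delivery_order parameter are illustrative only.

def consumer_reads(delivery_order):
    memory = {"Data": 0, "Head": 0}           # T0
    issued = {"Data": 2, "Head": 1}           # P1: Data = 2; Head = 1
    printed = None
    for name in delivery_order:
        memory[name] = issued[name]           # one store reaches memory
        if printed is None and memory["Head"] != 0:
            # P2 leaves its spin loop once it sees Head != 0 and reads Data.
            printed = memory["Data"]
    return printed


in_order = consumer_reads(["Data", "Head"])
overlapped = consumer_reads(["Head", "Data"])
print(in_order, overlapped)   # 2 0
```

When Head is delivered before Data, the consumer prints the stale value 0, which is the outcome shown at T3 in the trace.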

30 What was expected?
Example of a producer and consumer
Global variables initially: Data = 0, Head = 0
P1: Data = 2; Head = 1
P2: while (Head == 0); print Data;

31 Simplify Example and the Operations
Simple program. Global variables initially: A = 0, B = 0
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)

32 Reason about possible sequences: Expected Output
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
Sequence       Output
WX WY RX RY    1 2
WX RX WY RY    1 2
WX RX RY WY    1 0
RX WX RY WY    0 0
RX WX WY RY    0 2
RX RY WX WY    0 0
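
The six sequences and their outputs can be generated mechanically. The sketch below assumes the slides' labeling (WX is "A = 1", WY is "B = 2", RX is "print A", RY is "print B"); the helper names `keeps_program_order` and `printed_values` are invented for this example.

```python
from itertools import permutations

# Operation labels follow the slides; X stands for A and Y stands for B.
P1 = ["WX", "WY"]
P2 = ["RX", "RY"]

def keeps_program_order(seq):
    # Each processor's own operations must appear in program order.
    return seq.index("WX") < seq.index("WY") and seq.index("RX") < seq.index("RY")

def printed_values(seq):
    memory = {"X": 0, "Y": 0}          # A = 0, B = 0 initially
    written = {"X": 1, "Y": 2}         # A = 1, B = 2
    out = []
    for op in seq:
        kind, var = op[0], op[1]
        if kind == "W":
            memory[var] = written[var]
        else:
            out.append(memory[var])    # a read prints the current value
    return tuple(out)

interleavings = [s for s in permutations(P1 + P2) if keeps_program_order(s)]
outcomes = {" ".join(s): printed_values(s) for s in interleavings}
for seq, out in outcomes.items():
    print(seq, "->", out)
```

Only four distinct outputs appear: (1, 2), (1, 0), (0, 2), and (0, 0).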

33 Reason about possible sequences. Do we get them all?
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
Sequences so far: WX WY RX RY, WX RX WY RY, WX RX RY WY, RX WX RY WY, RX WX WY RY, RX RY WX WY

34 Similar Reasoning
Example of a producer and consumer
Global variables initially: Data = 0, Head = 0
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) ... = Data; (RX)

35 Reason about possible sequences. Expected Outcomes Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RY RX WX RY WY RX WX RY RX WY RY WX RX WY RY WX WY RX RY RX WX WY 2

36 Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RY RX WX RY WY RX WX RY RX WY RY WX RX WY RY WX WY RX RY RX WX WY 2 0 Reason about possible sequences. Expected Outcomes

37-41 General Interconnect Multiprocessor Hardware Optimizations: Overlapped Writes
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data (RX)
T0: Data = 0, Head = 0
T1: the write Head = 1 reaches memory over the GI (WY)
T2: P2 reads Head == 1 (RY)
T3: P2 reads Data == 0 (RX)
T4: the write Data = 2 reaches memory over the GI (WX)
Resulting sequence: WY RY RX WX. Output: 0

42 Compiler Optimizations
- Constant propagation
- Register allocation
- Loop transformation
- Instruction scheduling
- Common subexpression elimination
- Et cetera

43 More H/W Optimizations
- Speculative execution
- Execution reordering (e.g. pipelining)
- Speculative store
- Read to Write reordering
- Write to Read reordering
- Write to Write reordering
- Read to Read reordering
- Et cetera

44 Possible Outcomes Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WY RY RX WX WY RX RY WX WY RX WX RY RX WX RY WY RX WY RY WX RX WY WX RY 000 000 WX WY RY RX WX RY WY RX WX RY RX WY RY WX RX WY RY WX WY RX RY RX WX WY 2

45 What’s missing? A = 1 B = 2 P1 print A print B P2 WX WY RX RY WX WY RX RY WX RX WY RY WX RX RY WY RX WX RY WY RX WX WY RY RX RY WX WY

46 Simple Program: All possible sequences
P1: A = 1 (WX); B = 2 (WY)
P2: print A (RX); print B (RY)
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX

47 Dekker’s Algorithm: Simplify the Operations
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1: Flag1 = 1 (WX); if (Flag2 == 0) critical section (RY)
P2: Flag2 = 1 (WY); if (Flag1 == 0) critical section (RX)

48 Dekker’s Algorithm All possible sequences WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Example of a Synchronization (“Dekker’s Algorithm”) Which of these sequences will prevent concurrent execution?

49 OK WrongOK Wrong OK Wrong Dekker’s Algorithm Sequences and Outcomes WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX

50 OK WrongOK Wrong OK Wrong Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Need to restrict certain sequences Dekker’s Algorithm Sequences and Outcomes

51 OK WrongOK Wrong OK Wrong WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Works whenever WX precedes RX or WY precedes RY Dekker’s Algorithm Sequences and Outcomes

52 Dekker’s Algorithm: All possible sequences
P1: Flag1 = 1 (WX); if (Flag2 == 0) critical section (RY)
P2: Flag2 = 1 (WY); if (Flag1 == 0) critical section (RX)
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Works whenever WX precedes RX or WY precedes RY.
18 are OK, 6 are Wrong.
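
The 18/6 split can be checked by brute force. This sketch applies the slide's rule (the algorithm works whenever WX precedes RX or WY precedes RY) to every ordering of the four operations; `dekker_ok` is a name chosen for this example.

```python
from itertools import permutations

# WX: Flag1 = 1 and RY: read Flag2 (both on P1);
# WY: Flag2 = 1 and RX: read Flag1 (both on P2).
OPS = ("WX", "WY", "RX", "RY")

def dekker_ok(seq):
    # Mutual exclusion holds if at least one processor sees the other's flag.
    pos = {op: i for i, op in enumerate(seq)}
    return pos["WX"] < pos["RX"] or pos["WY"] < pos["RY"]

all_sequences = list(permutations(OPS))
ok = [s for s in all_sequences if dekker_ok(s)]
wrong = [s for s in all_sequences if not dekker_ok(s)]
print(len(all_sequences), len(ok), len(wrong))   # 24 18 6
```

The six wrong sequences are exactly those in which each read comes before the write it is supposed to observe.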

53 Simple Program All possible sequences A = 1 B = 2 P1 print A print B P2 WX WY RX RY WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX No ordering requirement

54 Simple Program All possible sequences A = 1 B = 2 P1 print A print B P2 WX WY RX RY WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX No ordering requirement All 24 are “OK”  0 are “Wrong”     

55 Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Require WX precede RX and WY precede RY and WY precede RX

56 Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Require WX precede RX and WY precede RY and WY precede RX 5 are OK  19 are Wrong     

57 Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX When RY precedes WY, while-RY-loop spins. Eventually we get WY < RY. 5 are OK  19 are Wrong     

58 Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX We have RY, RY, RY … Sequences with RY < WY will eventually end with RY 5 are OK  19 are Wrong?     

59 Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX We have RY, RY, RY … Sequences with RY < WY will eventually end with RY 5 are OK  19 are Wrong?    

60  Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY RY WX RY RX WY RY WX RY WY RX RY WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RY RX RY WY WX RY RY WX WY RX RY RY WX RX WY RY RY WY WX RX RY RY WY RX WX RY RY RX WX WY RY RY RX WY WX RY We have RY, RY, RY … Sequences with RY < WY will eventually end with RY 5 are OK  19 are Wrong?    

61  Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY RY WX RY RX WY RY WX RY WY RX RY WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RY RX RY WY WX RY RY WX WY RX RY RY WX RX WY RY RY WY WX RX RY RY WY RX WX RY RY RX WX WY RY RY RX WY WX RY We can remove the earlier RY in those sequences. 5 are OK  19 are Wrong?    

62  Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX WY RY WX RX WY RY WX WY RX RY WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX WY RY RX WY RY WX RX WY WX RY RX WX WY RY RX WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX RY Remove all of the duplicated sequences 5 are OK  19 are Wrong?    

63 Producer Consumer: All sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
WX WY RX RY, WX WY RY RX, WX RX WY RY
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX WY RY, RX WY RY WX, RX WY WX RY
Remove all of the duplicated sequences.
5 are OK, 7 are Wrong.
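
The reduction from 24 sequences to 12 can be sketched as follows. The model is an assumption made for illustration: a read of Head that happens before the write simply repeats, so the early RY is dropped and a final RY is appended. The helper names `collapse_spin` and `producer_consumer_ok` are hypothetical.

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")

def collapse_spin(seq):
    # If RY comes before WY the loop spins and reads Head again later;
    # model this by removing the early RY and appending a final one.
    ops = list(seq)
    if ops.index("RY") < ops.index("WY"):
        ops.remove("RY")
        ops.append("RY")
    return tuple(ops)

def producer_consumer_ok(seq):
    # Slide rule: WX precedes RX, WY precedes RY, and WY precedes RX.
    pos = {op: i for i, op in enumerate(seq)}
    return (pos["WX"] < pos["RX"]
            and pos["WY"] < pos["RY"]
            and pos["WY"] < pos["RX"])

# A set removes the duplicates created by collapsing the spin loop.
unique = sorted({collapse_spin(s) for s in permutations(OPS)})
ok = [s for s in unique if producer_consumer_ok(s)]
wrong = [s for s in unique if not producer_consumer_ok(s)]
print(len(unique), len(ok), len(wrong))   # 12 5 7
```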

64 Producer Consumer Possible sequences w/write acknowledge Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX WY RY RX WY RY WX RX WY WX RY Some H/W provides write acknowledgment (i.e. wait for pending writes to complete) 5 are OK  7 are Wrong    

65 Producer Consumer Possible sequences w/write acknowledge Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX WY RY RX WY RY WX RX WY WX RY Remove all sequences where WY < WX. 5 are OK  7 are Wrong    

66 Producer Consumer: Possible sequences w/write acknowledge
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
WX WY RX RY, WX WY RY RX, WX RX WY RY, RX WX WY RY
Remove all sequences where WY < WX.
2 are OK, 2 are Wrong.
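
A short filter reproduces the four surviving sequences. As a sketch, write acknowledgment is modeled here as the constraint WX < WY, applied on top of the spin-loop constraint WY < RY; the helper `pos` is introduced only for this example.

```python
from itertools import permutations

def pos(seq):
    # Map each operation label to its position in the sequence.
    return {op: i for i, op in enumerate(seq)}

all_sequences = list(permutations(("WX", "WY", "RX", "RY")))
# The spin loop guarantees the final read of Head follows the write.
after_spin = [s for s in all_sequences if pos(s)["WY"] < pos(s)["RY"]]
# Write acknowledgment: Data = 2 completes before Head = 1 is issued.
with_ack = [s for s in after_spin if pos(s)["WX"] < pos(s)["WY"]]
ok = [s for s in with_ack if pos(s)["WY"] < pos(s)["RX"]]
wrong = [s for s in with_ack if s not in ok]
print(len(with_ack), len(ok), len(wrong))   # 4 2 2
```

The two wrong sequences are the ones where the read of Data (RX) still happens before Head is written, which write acknowledgment alone cannot prevent.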

67 Review. What does the H/W provide?
- Reordering of loads and stores: doesn’t help
- Write acknowledge: almost helps
- Memory models

68 Sequential Consistency
Definition: [A multiprocessor system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. [Lamport 1979]
Pros:
- Simple view of program
- OK for uniprocessor environments
Cons:
- Not OK for multiprocessor environments
- Too restrictive for processor performance
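
One way to read the definition is as a rule for generating executions: merge the per-processor operation lists in every way that keeps each list's own order. The generator below is a minimal sketch of that idea; the name `interleavings` and the list-of-lists input are choices made for this example.

```python
def interleavings(programs):
    """Yield every merge of the per-processor operation lists that keeps
    each processor's operations in program order."""
    if all(len(prog) == 0 for prog in programs):
        yield ()
        return
    for i, prog in enumerate(programs):
        if prog:
            # Let processor i execute its next operation, then recurse.
            rest = programs[:i] + [prog[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield (prog[0],) + tail

# Dekker's algorithm: P1 does WX then RY, P2 does WY then RX.
dekker = [["WX", "RY"], ["WY", "RX"]]
sc_sequences = list(interleavings(dekker))
safe = [s for s in sc_sequences
        if s.index("WX") < s.index("RX") or s.index("WY") < s.index("RY")]
print(len(sc_sequences), len(safe))   # 6 6
```

Under this reading, Dekker's algorithm has six sequentially consistent executions, and all six keep the two processors out of the critical section at the same time.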

69 Memory Models: Relaxed Consistency
Description: Relaxed memory consistency models are already implemented on the multiprocessors available. They specify which memory operations the hardware may reorder.
Relaxations:
- Write to Read
- Write to Write
- Read to Read / Write
- Read Others’ Write Early
- Read Own Write Early
They all provide methods to force a particular ordering; these are known as the safety net.

70 Available Relaxed Memory Models
Relaxation columns: W -> R Order, W -> W Order, R -> RW Order, Read Others’ Write Early, Read Own Write Early
Model      Safety Net
IBM 370    serialization instructions
TSO        RMW
PC         RMW
PSO        RMW, STBAR
WO         synchronization
RCsc       release, acquire, nsync, RMW
RCpc       release, acquire, nsync, RMW
Alpha      MB, WMB
RMO        various MEMBARs
PowerPC    SYNC

71 Producer Consumer Relaxed W->R memory model WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Which of these sequences can be expected with all the memory models listed?

72 Producer Consumer All sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Require WX precede RX and WY precede RY and WY precede RX

73 Producer Consumer Possible sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX WY RY RX WY RY WX RX WY WX RY Require WX precede RX and WY precede RY and WY precede RX 5 are OK  7 are Wrong    

74 Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RX RY WX WY RY RX WX RX WY RY WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX WY RY RX WY RY WX RX WY WX RY Start with Sequential Consistency 5 are OK  7 are Wrong     Producer Consumer with sequential consistency

75 Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RY RX Start with Sequential Consistency 1 is OK  0 are Wrong  

76 Producer Consumer Relaxed W->R ordering sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RY RX Add sequences due to the relaxation of W->R ordering 1 is OK  0 are Wrong  

77 Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RY RX No change 1 is OK  0 are Wrong   Producer Consumer Relaxed W->R ordering sequences

78 Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX WX WY RY RX Most processors have relaxed w->w orderings also. 1 is OK  0 are Wrong   Producer Consumer Relaxed W->R, and W->W ordering sequences

79 Producer Consumer: Relaxed W->R and W->W ordering sequences
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) print Data; (RX)
Started with sequential consistency, then added relaxed W->R and W->W orderings.
WX WY RY RX, WY WX RY RX, WY RY WX RX, WY RY RX WX
3 are OK, 1 is Wrong.
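
These four sequences follow from two constraints that remain once W->W ordering is relaxed: the spin loop forces WY < RY, and the consumer's reads stay in program order, RY < RX. The write of Data (WX) is free to land anywhere. The sketch below checks that under those assumptions; the variable names are illustrative.

```python
from itertools import permutations

def pos(seq):
    return {op: i for i, op in enumerate(seq)}

# WY < RY (spin loop) and RY < RX (reads stay ordered); WX is unconstrained.
allowed = [s for s in permutations(("WX", "WY", "RX", "RY"))
           if pos(s)["WY"] < pos(s)["RY"] < pos(s)["RX"]]
ok = [s for s in allowed if pos(s)["WX"] < pos(s)["RX"]]
wrong = [s for s in allowed if pos(s)["WX"] > pos(s)["RX"]]
print(len(allowed), len(ok), wrong)
```

The single wrong sequence is the one in which the consumer reads Data before the producer's write of Data has been performed.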

80 Dekker’s Algorithm Relaxed W->R memory model WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Which of these sequences can be expected with all the memory models listed?

81 Dekker’s Algorithm Relaxed W->R ordering sequences WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Works whenever WX precedes RX or WY precedes RY 18 are OK      6 are Wrong 

82 Dekker’s Algorithm Relaxed W->R ordering sequences WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Start with Sequential Consistency 18 are OK      6 are Wrong 

83 Dekker’s Algorithm Relaxed W->R ordering sequences WX WY RX RY WX WY RY RX WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Start with Sequential Consistency 6 are OK    0 are Wrong 

84 Dekker’s Algorithm Relaxed W->R ordering sequences WX WY RX RY WX WY RY RX WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Add sequences due to relaxed memory model 6 are OK    0 are Wrong 

85 Dekker’s Algorithm Relaxed W->R ordering sequences WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Add sequences due to relaxed memory model 18 are OK      6 are Wrong 

86 Safety Nets
- Atomic instruction (RMW)
- Code delineation (serialization instructions)
- Synchronization instructions (SYNC)
- Identify data and synch operations (Weak Ordering model and Release Consistency model)
- Memory bars (aka “fences”)

87 Producer Consumer w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: Data = 0, Head = 0
P1: Data = 2 (WX); Head = 1 (WY)
P2: while (Head == 0); (RY) ... = Data; (RX)

88 Producer Consumer w/Fence
Example of a producer and consumer with a memory barrier applied.
Global variables initially: Data = 0, Head = 0
P1: Data = 2 (WX); memory_barrier; Head = 1 (WY)
P2: while (Head == 0); (RY) memory_barrier; ... = Data; (RX)
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.

89 Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences Data = 2 Head = 1 P1 while(Head == 0); print Data; P2 WX WY RY RX Started with sequential consistency, then added relaxed w->r and w->w orderings 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX  

90 Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX Add memory barriers to force WX < WY and RY < RX 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX  

91 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX Looks the same. 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX   Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

92 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible. 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX   Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

93 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX Due to MB, RY < RX is enforced 2 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX   Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

94 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX WY < RY < MB. while-RY-loop waits for WY. 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX   Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

95 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX Due to MB, WX < WY is enforced 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX   Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

96 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX WX < WY and WY < RY and RY < RX is enforced therefore WX < RX is enforced 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX   Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

97 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible. 3 are OK  1 is Wrong  WX WY RY RX WY WX RY RX WY RY WX RX WY RY RX WX   Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

98 Data = 2 memory_barrier Head = 1 P1 while(Head == 0); memory_barrier print Data; P2 WX WY RY RX With WX < WY and RY < RX enforced with memory barriers, RX < WX is not possible. WX WY RY RX WY WX RY RX WY RY WX RX   3 are OK  0 are Wrong  Producer Consumer w/Fence Relaxed W->R, and W->W ordering sequences

99 Dekker’s Algorithm w/Fence
Example of mutual exclusion (“Dekker’s Algorithm”)
Global variables initially: Flag1 = 0, Flag2 = 0
P1: Flag1 = 1 (WX); memory_barrier; if (Flag2 == 0) critical section (RY)
P2: Flag2 = 1 (WY); memory_barrier; if (Flag1 == 0) critical section (RX)
All memory operations before the memory barrier must complete before proceeding to memory operations after the memory barrier.

100 Dekker’s Algorithm w/Fence Relaxed W->R ordering sequences WX WY RX RY WX WY RY RX WX RX WY RY WX RX RY WY WX RY RX WY WX RY WY RX WY WX RX RY WY WX RY RX WY RX WX RY WY RY WX RX WY RX RY WX WY RY RX WX RX WX RY WY RX WX WY RY RX WY RY WX RX WY WX RY RX RY WX WY RX RY WY WX RY WX WY RX RY WX RX WY RY WY WX RX RY WY RX WX RY RX WX WY RY RX WY WX Flag1 = 1 If(Flag2 == 0) Critical section P1 Flag2 = 1 If(Flag1 == 0) Critical section P2 WX RY WY RX Started with sequential consistency, then added relaxed w->r orderings 18 are OK      6 are Wrong 

101 Dekker’s Algorithm w/Fence: Relaxed W->R ordering sequences
P1: Flag1 = 1 (WX); memory_barrier; if (Flag2 == 0) critical section (RY)
P2: Flag2 = 1 (WY); memory_barrier; if (Flag1 == 0) critical section (RX)
WX WY RX RY, WX WY RY RX, WX RX WY RY, WX RX RY WY, WX RY RX WY, WX RY WY RX
WY WX RX RY, WY WX RY RX, WY RX WX RY, WY RY WX RX, WY RX RY WX, WY RY RX WX
RX WX RY WY, RX WX WY RY, RX WY RY WX, RX WY WX RY, RX RY WX WY, RX RY WY WX
RY WX WY RX, RY WX RX WY, RY WY WX RX, RY WY RX WX, RY RX WX WY, RY RX WY WX
Add memory barriers to force WX < RY and WY < RX.
18 are OK, 6 are Wrong.

102 Dekker’s Algorithm w/Fence: Relaxed W->R ordering sequences
P1: Flag1 = 1 (WX); memory_barrier; if (Flag2 == 0) critical section (RY)
P2: Flag2 = 1 (WY); memory_barrier; if (Flag1 == 0) critical section (RX)
Add memory barriers to force WX < RY and WY < RX.
WX WY RX RY, WX WY RY RX, WX RY WY RX, WY WX RX RY, WY WX RY RX, WY RX WX RY
6 are OK, 0 are Wrong.
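
The effect of the two barriers can be checked by treating each one as an ordering constraint on the 24 candidate sequences. This is a sketch of the slide's argument, not a model of any particular fence instruction; the names `fenced` and `unsafe` are chosen for this example.

```python
from itertools import permutations

def pos(seq):
    return {op: i for i, op in enumerate(seq)}

all_sequences = list(permutations(("WX", "WY", "RX", "RY")))
# Barrier on P1 forces WX < RY; barrier on P2 forces WY < RX.
fenced = [s for s in all_sequences
          if pos(s)["WX"] < pos(s)["RY"] and pos(s)["WY"] < pos(s)["RX"]]
# A sequence is unsafe when both reads miss the other processor's write.
unsafe = [s for s in fenced
          if pos(s)["RX"] < pos(s)["WX"] and pos(s)["RY"] < pos(s)["WY"]]
print(len(fenced), len(unsafe))   # 6 0
```

An unsafe sequence would need WX < RY < WY < RX < WX, which is a cycle, so none can survive the two constraints.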

103 Serialization of Writes (Fig 6) w/Fence
Insert a memory barrier between the instructions we want ordered.
Global variables initially: A = 0, B = 0, C = 0
P1: A = 1 (WX, W1); B = 1 (WY)
P2: A = 2 (WX, W2); C = 1 (WZ)
P3: while (B != 1); (RY) while (C != 1); (RZ) Register1 = A
P4: while (B != 1); (RY) while (C != 1); (RZ) Register2 = A

104 Higher Level Abstractions
- Lower level of complexity
- Explicit parallel constructs
  – Fortran 90
  – MPI

105 Conclusion
- The uniprocessor programming model is simple, but does not work on multiprocessors.
- Hardware and compilers make many optimizations that reorder loads and stores.
- Memory models exist on the hardware and need to be considered for program correctness.
- The Sequential Consistency model was considered for concurrent programs on the uniprocessor.
- Relaxed memory consistency models are considered on the multiprocessor because SC is too restrictive for hardware performance.
- Use memory barriers (fences) to override the relaxed memory model when ordering between memory operations must be maintained.

106 Other Processors

