
1 Lecture #21: Shared Objects and Concurrent Programming. This material is not available in the textbook. The online PowerPoint presentations contain the text explanations given in class.

2 Moore’s Law (slides from Art of Multiprocessor Programming). Clock speed is flattening sharply; transistor count is still rising.

3 Vanishing from your Desktops: The Uniprocessor. Figure labels: cpu, memory.

4 Your Server: The Shared Memory Multiprocessor (SMP). Figure labels: cache, bus, shared memory.

5 Your New Server or Desktop: The Multicore Processor (CMP). Figure labels: cache, bus, shared memory, all on the same chip. Example: Sun T2000 Niagara.

6 From the 2008 press… “…Intel has announced a press conference in San Francisco on November 17th, where it will officially launch the Core i7 Nehalem processor…” “…Sun’s next generation Enterprise T5140 and T5240 servers, based on the 3rd Generation UltraSPARC T2 Plus processor, were released two days ago…”

7 Why is Kunle Smiling? Figure: Niagara 1.

8 © 2006 Herlihy and Shavit. Traditional Software Scaling Process. Figure: user code on a traditional uniprocessor; speedup 1.8x, 3.6x, 7x over time (Moore’s law).

9 Multicore Software Scaling Process. Figure: user code on a multicore; speedup 1.8x, 3.6x, 7x. Unfortunately, not so simple…

10 Real-World Software Scaling Process. Figure: user code on a multicore; speedup 1.8x, 2x, 2.9x. Parallelization and synchronization require great care…

11 Concurrent Programming. Figure: an object in shared memory. Challenge: coordinating access.

12 Persistent vs. Transient Communication. Persistent communication medium: sending information changes the state of the medium forever. Example: a blackboard. Transient communication medium: the change of state lasts only for a limited time. Example: talking.

13 Parallel Primality Testing. Task: print all primes from 1 to 10^10, in some order. Available: a machine with 10 processors. Solution: speed the work up 10 times, that is, the new time to print all primes will be 1/10 of the time for a single processor.

14 Parallel Primality Testing. Split the work among processors! Each processor P_i gets 10^9 numbers to test. Figure: the range 1 … 10^10 is cut at 10^9, 2x10^9, … and the pieces are assigned to P_1, P_2, …, P_10.

15 Parallel Primality Testing
    (define (P i)
      (let ((counter (+ 1 (* (- i 1) (power 10 9))))
            (upto (* i (power 10 9))))
        (define (iter)
          (if (< counter upto)
              (begin
                (if (prime? counter) (display counter) #f)
                (set! counter (+ counter 1))   ;; advance to the next candidate
                (iter))
              'done))
        (iter)))
    (parallel-execute (P 1) (P 2)... (P 10))

16 Problem: work is split unevenly. Some processors have fewer primes to test… Some composite numbers are easier to test… Need to split the work range dynamically!
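The unevenness can be seen on a small scale. The following is a minimal sketch in Python (not the lecture's Scheme), assuming a much smaller range of 1 to 100,000 split into 10 equal blocks; `is_prime` and `primes_per_block` are hypothetical helper names, and trial division stands in for whatever `prime?` does in the course.

```python
import math

def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def primes_per_block(limit, blocks):
    """Split 1..limit into equal blocks and count the primes in each block."""
    size = limit // blocks
    return [
        sum(1 for n in range(i * size + 1, (i + 1) * size + 1) if is_prime(n))
        for i in range(blocks)
    ]

# Scaled-down stand-in for the slide's 1..10^10 range split across P1..P10.
counts = primes_per_block(100_000, 10)
for i, c in enumerate(counts, start=1):
    print(f"P{i}: {c} primes")
```

The first block holds noticeably more primes than the last, so under a static split the processors do not finish at the same time.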

17 Shared Counter. Figure: a counter handing out 17, 18, 19; each thread takes a number.

18 A Shared Counter Object
    (define (make-shared-counter value)
      (define (fetch) value)
      (define (increment) (set! value (+ 1 value)))
      (define (dispatch m)
        (cond ((eq? m 'fetch) (fetch))
              ((eq? m 'increment) (increment))
              (else (error "unknown request"))))
      dispatch)
    (define shared-counter (make-shared-counter 1))

19 Using the Shared Counter
    (define (P i)
      (define (iter)
        (let ((index (shared-counter 'fetch)))
          (if (< index (power 10 10))
              (begin
                (if (prime? index) (display index) #f)
                (shared-counter 'increment)
                (iter))
              'done)))
      (iter))
    (parallel-execute (P 1) (P 2)... (P 10))

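20 This Solution Doesn’t Work. Increment is (set! value (+ 1 value)): a read followed by a separate write. Timeline: P_1 reads value 77; P_2 increments 10 times, so value becomes 87; P_1 then performs its set! and writes 78. Error! The same happens with (let ((index (shared-counter 'fetch))) …): P_1 fetches 77 and P_2 also fetches 77. Error!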
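The timeline on this slide can be replayed step by step. This is a minimal sketch in Python rather than the lecture's Scheme; it does not use real threads but hand-orders the reads and writes to reproduce the interleaving deterministically. `UnsafeCounter` is a hypothetical name for illustration.

```python
class UnsafeCounter:
    """Counter whose increment is a separate read and a separate write."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, new_value):
        self.value = new_value


counter = UnsafeCounter(77)

# P1 performs only the read half of (set! value (+ 1 value)).
p1_seen = counter.read()

# P2 runs ten complete increments before P1 gets to write.
for _ in range(10):
    counter.write(counter.read() + 1)
after_p2 = counter.value

# P1 finishes its increment using the stale value it read earlier,
# overwriting all ten of P2's increments.
counter.write(p1_seen + 1)
final = counter.value

print(after_p2, final)  # 87 78
```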

21 Is this problem inherent? If we could only glue reads and writes together… Figure: read/write pairs from two threads interleaving.

22 The Fetch-and-Increment Operation
    (define (make-shared-counter value)
      (define (fetch-and-increment)
        (let ((old value))
          (set! value (+ old 1))
          old))
      (define (dispatch m)
        (cond ((eq? m 'fetch-and-increment) (fetch-and-increment))
              (else (error "unknown request -- counter" m))))
      dispatch)
Figure: an instantaneous shared counter with a single fetch-and-inc operation.

23 Where Things Reside. Figure labels: cache, bus, shared memory, 1 shared counter, code, local variables.
    void primePrint() {
      int i = ThreadID.get();        // IDs in {0..9}
      long block = 1_000_000_000L;   // 10^9 numbers per thread
      for (long j = i * block + 1; j <= (i + 1) * block; j++) {
        if (isPrime(j)) print(j);
      }
    }

24 A Correct Shared Counter
    (define shared-counter (make-shared-counter 1))
    (define (P i)
      (define (iter)
        (let ((index (shared-counter 'fetch-and-increment)))
          (if (< index (power 10 10))
              (begin
                (if (prime? index) (display index) #f)
                (iter))
              'done)))
      (iter))
    (parallel-execute (P 1) (P 2)... (P 10))

25 Implementing Fetch-and-Inc. To make the program work we need an “instantaneous” implementation of fetch-and-increment. How can we do this? Special hardware: built-in synchronization instructions. Special software: use regular instructions; the solution will involve waiting. The software approach is called Mutual Exclusion.

26 Mutual Exclusion
    (define (fetch-and-increment)
      (mutex 'start)
      (let ((old value))
        (set! value (+ old 1))
        (mutex 'end)            ;; release before returning the old value
        old))
Only one process at a time can execute these instructions. Figure labels: P_1, P_2, …, P_10; mutex; count 1; P_2 returns 1.
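As a rough illustration of a mutex-protected fetch-and-increment, here is a minimal sketch in Python rather than the lecture's Scheme. It assumes `threading.Lock` plays the role of `mutex`. The `SharedCounter` class, the `worker` function, and the small `LIMIT` are hypothetical stand-ins, and recording each claimed index replaces the `prime?` test.

```python
import threading

class SharedCounter:
    """Counter whose fetch-and-increment runs inside a critical section."""

    def __init__(self, value=1):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:              # only one thread at a time gets past here
            old = self._value
            self._value = old + 1
            return old


def worker(counter, limit, claimed):
    """Keep taking the next index until the range is exhausted."""
    while True:
        index = counter.fetch_and_increment()
        if index >= limit:
            return
        claimed.append(index)         # stand-in for testing and printing a prime


LIMIT = 10_000                        # scaled down from the slide's 10^10
counter = SharedCounter(1)
claims = [[] for _ in range(10)]      # one private list per thread
threads = [
    threading.Thread(target=worker, args=(counter, LIMIT, claims[i]))
    for i in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_claimed = sorted(n for per_thread in claims for n in per_thread)
print(len(all_claimed), [len(c) for c in claims])
```

Every index from 1 up to `LIMIT - 1` is claimed by exactly one thread, while the number each thread handles is decided dynamically at run time.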

27 The Story of Alice and Bob* (*as told by Leslie Lamport). Figure: Bob, Alice, and the yard.

28 The Mutual Exclusion Problem. Requirements: Mutual Exclusion: there will never be two dogs simultaneously in the yard. No Deadlock: if only one dog wants to be in the yard, it will succeed, and if both dogs want to go out, at least one of them will succeed.

29 Cell Phone Solution. Figure: Bob, Alice, and the yard.

30 Coke Can Solution. Figure: Bob, Alice, and the yard.

31 Flag Solution -- Alice
    (define (Alice)
      (loop                                ;; "repeat forever"
        (set! Alice-flag 'up)              ;; Alice wants to enter
        (do ((eq? Bob-flag 'up)) (skip))   ;; loop until Bob lowers flag
        (Alice-dog-in-yard)                ;; dog can enter the yard
        (set! Alice-flag 'down)))          ;; Alice is leaving

32 Flag Solution -- Bob
    (define (Bob)
      (loop                                   ;; "repeat forever"
        (set! Bob-flag 'up)                   ;; Bob wants to enter
        (do ((eq? Alice-flag 'up))            ;; if Alice wants to enter
          (set! Bob-flag 'down)               ;; Bob is a gentleman
          (do ((eq? Alice-flag 'up)) (skip))  ;; loop (skip) till Alice leaves
          (set! Bob-flag 'up))                ;; raise flag and go through the do again
        (Bob-dog-in-yard)                     ;; dog can enter yard
        (set! Bob-flag 'down)))               ;; Bob is leaving

33 Flag Solution -- Both. Alice’s and Bob’s procedures from the previous two slides, shown side by side.

34 Intuition: Why Mutual Exclusion is Preserved. Each performs the same two steps: first raise the flag, to signal interest; then look to see whether the other one has raised the flag. One can claim that the following flag principle holds: since Alice and Bob each raise their own flag and then look at the other’s flag, the last one to start looking must notice that both flags are up.

35 Proof of Mutual Exclusion. Assume both dogs are in the yard and derive a contradiction by reasoning backwards. Consider the last time Alice and Bob each looked before letting the dogs in. Without loss of generality, assume Alice was the last to look…

36 Proof. Timeline: Bob last raised his flag, then Bob’s last look, then Alice’s last look (Alice last raised her own flag before she looked). Bob’s flag was therefore up when Alice looked, so Alice must have seen Bob’s flag. A contradiction. QED
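The proof can be cross-checked mechanically by enumerating every interleaving of the two flag procedures. The following is a minimal sketch in Python, not part of the lecture. It assumes each flag read or write is one atomic step and that steps interleave in any order. The program-counter numbering and all function names are hypothetical choices made for this illustration.

```python
from collections import deque

UP, DOWN = "up", "down"
ALICE_IN_YARD, BOB_IN_YARD = 2, 5     # program counters meaning "dog is in the yard"

def alice_step(pc, alice_flag, bob_flag):
    """One atomic step of Alice; returns (next_pc, next_alice_flag)."""
    if pc == 0:                       # raise flag
        return 1, UP
    if pc == 1:                       # spin while Bob's flag is up, else enter
        return (1, alice_flag) if bob_flag == UP else (2, alice_flag)
    return 0, DOWN                    # pc 2: leave the yard and lower flag

def bob_step(pc, alice_flag, bob_flag):
    """One atomic step of Bob; returns (next_pc, next_bob_flag)."""
    if pc == 0:                       # raise flag
        return 1, UP
    if pc == 1:                       # look: defer if Alice's flag is up, else enter
        return (2, bob_flag) if alice_flag == UP else (5, bob_flag)
    if pc == 2:                       # be a gentleman: lower flag
        return 3, DOWN
    if pc == 3:                       # wait until Alice's flag is down
        return (3, bob_flag) if alice_flag == UP else (4, bob_flag)
    if pc == 4:                       # raise flag again and go back to looking
        return 1, UP
    return 0, DOWN                    # pc 5: leave the yard and lower flag

def successors(state):
    """Either Alice or Bob takes the next step."""
    a_pc, b_pc, a_flag, b_flag = state
    na_pc, na_flag = alice_step(a_pc, a_flag, b_flag)
    nb_pc, nb_flag = bob_step(b_pc, a_flag, b_flag)
    return [(na_pc, b_pc, na_flag, b_flag), (a_pc, nb_pc, a_flag, nb_flag)]

def reachable(start):
    """Breadth-first search over all interleavings from start."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

states = reachable((0, 0, DOWN, DOWN))
both_in_yard = [s for s in states if s[0] == ALICE_IN_YARD and s[1] == BOB_IN_YARD]
can_progress = all(
    any(t[0] == ALICE_IN_YARD or t[1] == BOB_IN_YARD for t in reachable(s))
    for s in states
)
print(len(states), both_in_yard, can_progress)
```

No reachable state has both dogs in the yard. `can_progress` checks a weaker property than the lecture's no-deadlock requirement: from every reachable state, some schedule lets a dog in. It says nothing about fairness.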

37 Why is there no Deadlock? Alice has priority over Bob: if neither is entering the critical section while both are repeatedly trying, Bob will lower his flag and give Alice priority. Unfortunately, the algorithm is not a fair one, and Bob's dogs might eventually grow very anxious :-)

38 The Morals of our Story. The Mutual Exclusion problem cannot be solved using transient communication (i.e., cell phones). The Mutual Exclusion problem cannot be solved using interrupts or interrupt bits (i.e., cans). The Mutual Exclusion problem can be solved with one-bit registers (i.e., flags): memory locations that can be read and written (set!-ed). We cheated a little: the arbiter problem…

39 The Arbiter Problem (an aside). Pick a point.

40 The Solution and Conclusion
    (define (Alice)
      (loop
        (mutex 'begin)
        (Alice-dog-in-yard)   ;; critical section
        (mutex 'end)))
Question: then why not execute all the code of the parallel prime-printing algorithm in a critical section?

41-45 Answer: Amdahl’s Law. Speedup = 1 / ((1 - p) + p/n), the speedup of a computation given n CPUs instead of 1, where p is the parallel fraction, 1 - p is the sequential fraction, and n is the number of processors.

46-47 Example: ten processors, 60% concurrent, 40% sequential. How close to a 10-fold speedup? Speedup = 1 / (0.4 + 0.6/10) = 2.17

48-49 Example: ten processors, 80% concurrent, 20% sequential. How close to a 10-fold speedup? Speedup = 1 / (0.2 + 0.8/10) = 3.57

50-51 Example: ten processors, 90% concurrent, 10% sequential. How close to a 10-fold speedup? Speedup = 1 / (0.1 + 0.9/10) = 5.26

52-53 Example: ten processors, 99% concurrent, 1% sequential. How close to a 10-fold speedup? Speedup = 1 / (0.01 + 0.99/10) = 9.17

54 Back to Real-World Multicore Scaling. Figure: user code on a multicore; speedup 1.8x, 2x, 2.9x. Why the bad performance?

55 Amdahl’s Law: pay for N = 8 cores with SequentialPart = 25%, and the speedup is only 2.9 times! As the number of cores grows, the effect of the 25% becomes more acute: 2.3x on 4 cores, 2.9x on 8, 3.4x on 16, 3.7x on 32… Must parallelize applications on a very fine grain! Where is the sequential code coming from…
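The speedup figures quoted in the ten-processor examples and in the 25%-sequential series can be recomputed directly from Amdahl's Law. This is a minimal sketch in Python; `amdahl_speedup` is a hypothetical helper name, with `p` the parallel fraction and `n` the number of processors.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: speedup on n processors when a fraction p is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# Ten processors with 60%, 80%, 90% and 99% of the work concurrent.
examples = {p: round(amdahl_speedup(p, 10), 2) for p in (0.6, 0.8, 0.9, 0.99)}

# 25% sequential (75% parallel) on 4, 8, 16 and 32 cores.
series = [round(amdahl_speedup(0.75, n), 1) for n in (4, 8, 16, 32)]

print(examples)  # {0.6: 2.17, 0.8: 3.57, 0.9: 5.26, 0.99: 9.17}
print(series)    # [2.3, 2.9, 3.4, 3.7]
```

With a quarter of the work sequential, the speedup can never exceed 1 / 0.25 = 4, however many cores are added.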

56 Need Fine-Grained Locking. Figure: the work is 75% unshared and 25% shared, drawn once with coarse-grained and once with fine-grained locking across the cores (c). The shared 25% is the reason we get only a 2.9x speedup. Fine-grained synchronization has a huge performance benefit.

57 Multicores are here …

58 Programming Multicore Machines. “Life is the synchronicity of chance.” You just saw a bit of what concurrent programming is about. Today we don’t yet have sufficient expertise in how to make use of multicore machines… You guys are the generation that will get to use them and, hopefully, develop this expertise.

