
1 Synchronization Transformations for Parallel Computing Pedro Diniz and Martin Rinard Department of Computer Science University of California, Santa Barbara http://www.cs.ucsb.edu/~{pedro,martin}

2 Motivation
Parallel Computing Becomes Dominant Form of Computation
Parallel Machines Require Parallel Software
Parallel Constructs Require New Analysis and Optimization Techniques
Our Goal: Eliminate Synchronization Overhead

3 Talk Outline
Motivation
Model of Computation
Synchronization Optimization Algorithm
Applications Experience
Dynamic Feedback
Related Work
Conclusions

4 Model of Computation
Parallel Programs: Serial Phases, Parallel Phases
Single Address Space
Atomic Operations on Shared Data
Mutual Exclusion Locks: Acquire Constructs, Release Constructs
(diagram: Acq, S1, Rel forming a Mutual Exclusion Region)
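
A minimal sketch of this model, assuming C++ with std::mutex standing in for the acquire and release constructs; Node, value, and add are hypothetical names used only for illustration:

    #include <mutex>

    // A hypothetical shared node in a dynamic data structure; the lock
    // guards atomic updates to its state.
    struct Node {
        std::mutex lock;   // stands in for the paper's mutual exclusion lock
        double     value;  // shared data

        // One atomic operation on shared data: Acq, S1, Rel.
        void add(double v) {
            lock.lock();    // Acquire construct (Acq)
            value += v;     // S1: body of the mutual exclusion region
            lock.unlock();  // Release construct (Rel)
        }
    };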

5 Reducing Synchronization Overhead
(diagram: Acq, S1, S2, Rel, followed by S3)

6 (diagram continued: Rel, Acq)

7 Synchronization Optimization
Idea: Replace Computations that Repeatedly Acquire and Release the Same Lock with a Computation that Acquires and Releases the Lock Only Once
Result: Reduction in the Number of Executed Acquire and Release Constructs
Mechanism: Lock Movement Transformations and Lock Cancellation Transformations
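
A minimal before/after sketch of the idea, assuming std::mutex; the compiler performs this rewrite automatically, and lock, sum, and the two accumulate functions here are hypothetical names:

    #include <mutex>
    #include <vector>

    std::mutex lock;        // hypothetical lock guarding sum
    double sum = 0.0;       // shared data

    // Before: one Acq/Rel pair per iteration.
    void accumulate_naive(const std::vector<double>& xs) {
        for (double x : xs) {
            lock.lock();    // Acq executed xs.size() times
            sum += x;
            lock.unlock();  // Rel executed xs.size() times
        }
    }

    // After optimization: the same lock is acquired and released only once.
    void accumulate_optimized(const std::vector<double>& xs) {
        lock.lock();        // single Acq
        for (double x : xs) {
            sum += x;       // one larger mutual exclusion region
        }
        lock.unlock();      // single Rel
    }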

8 Lock Cancellation

9 Acquire Lock Movement

10 Release Lock Movement

11 Synchronization Optimization Algorithm
Overview:
Find Two Mutual Exclusion Regions With the Same Lock
Expand Mutual Exclusion Regions Using Lock Movement Transformations Until They Are Adjacent
Coalesce Using Lock Cancellation Transformation to Form a Single Larger Mutual Exclusion Region
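
A hand-worked illustration of these three steps on straight-line code, assuming std::mutex in place of the acquire and release constructs; L, a, b, t, and both functions are hypothetical names, and the real algorithm operates on the interprocedural control flow graph rather than on source text:

    #include <mutex>

    std::mutex L;      // hypothetical lock shared by both regions
    int a = 0, b = 0;  // shared data guarded by L
    int t = 0;         // data not guarded by any lock

    // Step 1: two mutual exclusion regions with the same lock L.
    void original() {
        L.lock();  a += 1;  L.unlock();   // region 1
        t = t * 2;                        // code outside any region
        L.lock();  b += t;  L.unlock();   // region 2
    }

    // Steps 2-3: release movement pushes the first Rel past "t = t * 2"
    // until it is adjacent to the second Acq; the adjacent Rel/Acq pair
    // then cancels, leaving one larger mutual exclusion region.
    void coalesced() {
        L.lock();
        a += 1;
        t = t * 2;   // now inside the region: a potential source of false exclusion
        b += t;
        L.unlock();
    }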

12 Interprocedural Control Flow Graph

13 Acquire Movement Paths

14 Release Movement Paths

15 Migration Paths and Meeting Edge

16 Intersection of Paths

17 Compensation Nodes

18 Final Result

19 Synchronization Optimization Trade-Off
Advantage:
Reduces Number of Executed Acquires and Releases
Reduces Acquire and Release Overhead
Disadvantage: May Introduce False Exclusion
Multiple Processors Attempt to Acquire Same Lock
Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region
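
A hypothetical two-processor illustration of false exclusion after coalescing, again assuming std::mutex; L, shared_count, local_work, and the processor functions are made-up names:

    #include <mutex>

    std::mutex L;           // hypothetical coalesced lock
    long shared_count = 0;  // shared data actually guarded by L

    // Code that was originally in no mutual exclusion region.
    long local_work() { return 42; }

    // Processor 1: after coalescing, it holds L even while running local_work().
    void processor1() {
        L.lock();
        shared_count += 1;
        long t = local_work();  // false exclusion window: L is held unnecessarily here
        shared_count += t;
        L.unlock();
    }

    // Processor 2: must wait for L during that entire window, even though it
    // only conflicts with the updates to shared_count.
    void processor2() {
        L.lock();
        shared_count -= 1;
        L.unlock();
    }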

20 False Exclusion Policy
Goal: Limit Potential Severity of False Exclusion
Mechanism: Constrain the Application of Basic Transformations
Original: Never Apply Transformations
Bounded: Apply Transformations Only on Cycle-Free Subgraphs of ICFG
Aggressive: Always Apply Transformations

21 Experimental Results
Automatic Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
Set of Complete Scientific Applications (C++ Subset):
Barnes-Hut N-Body Solver (1500 Lines of Code)
Liquid Water Simulation Code (1850 Lines of Code)
Seismic Modeling String Code (2050 Lines of Code)
Different False Exclusion Policies
Performance of Generated Parallel Code on Stanford DASH Shared-Memory Multiprocessor

22 Lock Overhead
Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks
(bar charts: Percentage Lock Overhead for the Original, Bounded, and Aggressive versions on Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model))

23 Contention Overhead
Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors
(line charts: Contention Percentage vs. Processors for the Original, Bounded, and Aggressive versions on Barnes-Hut (16K Bodies), Water (512 Molecules), and String (Big Well Model))

24 Performance Results: Barnes-Hut
(chart: Speedup vs. Number of Processors for Ideal, Aggressive, Bounded, and Original; Barnes-Hut, 16384 bodies)

25 Performance Results: Water
(chart: Speedup vs. Number of Processors for Ideal, Aggressive, Bounded, and Original; Water, 512 Molecules)

26 Performance Results: String
(chart: Speedup vs. Number of Processors for Ideal, Original, and Aggressive; String, Big Well Model)

27 Choosing Best Policy
Best False Exclusion Policy May Depend On:
Topology of Data Structures
Dynamic Schedule of Computation
Information Required to Choose Best Policy Unavailable at Compile Time
Complications:
Different Phases May Have Different Best Policy
In Same Phase, Best Policy May Change Over Time

28 Solution: Dynamic Feedback
Generated Code Consists of:
Sampling Phases: Measure Performance of Different Policies
Production Phases: Use Best Policy From Sampling Phase
Periodically Resample to Discover Changes in Best Policy
Guaranteed Performance Bounds
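
A minimal sketch of a dynamic feedback driver, assuming one compiled code version per false exclusion policy and std::chrono timing; CodeVersion, run_phase, and the iteration counts are hypothetical, and the termination condition is omitted:

    #include <chrono>
    #include <cstddef>
    #include <functional>
    #include <vector>

    using CodeVersion = std::function<void()>;  // one compiled version per policy

    // Runs a parallel phase under dynamic feedback; the loop lasts for the
    // lifetime of the phase (termination omitted in this sketch).
    void run_phase(const std::vector<CodeVersion>& versions,
                   int sampling_iters, int production_iters) {
        while (true) {
            // Sampling phase: time each policy's code version.
            std::size_t best = 0;
            auto best_time = std::chrono::steady_clock::duration::max();
            for (std::size_t v = 0; v < versions.size(); ++v) {
                auto start = std::chrono::steady_clock::now();
                for (int i = 0; i < sampling_iters; ++i) versions[v]();
                auto elapsed = std::chrono::steady_clock::now() - start;
                if (elapsed < best_time) { best_time = elapsed; best = v; }
            }
            // Production phase: use the best policy, then fall through and
            // resample to discover changes in which policy performs best.
            for (int i = 0; i < production_iters; ++i) versions[best]();
        }
    }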

29 Dynamic Feedback
(timeline diagram: Overhead vs. Time; a Sampling Phase runs the Aggressive, Original, and Bounded code versions, followed by a Production Phase using the Aggressive code version, then another Sampling Phase)

30 Dynamic Feedback: Barnes-Hut
(chart: Speedup vs. Number of Processors for Ideal, Aggressive, Dynamic Feedback, Bounded, and Original; Barnes-Hut, 16384 bodies)

31 Dynamic Feedback: Water
(chart: Speedup vs. Number of Processors for Ideal, Bounded, Original, Aggressive, and Dynamic Feedback; Water, 512 Molecules)

32 Dynamic Feedback: String
(chart: Speedup vs. Number of Processors for Ideal, Original, Aggressive, and Dynamic Feedback; String, Big Well Model)

33 Related Work
Parallel Loop Optimizations (e.g. [Tseng:PPoPP95]): Array-Based Scientific Computations; Barriers vs. Cheaper Mechanisms
Concurrent Object-Oriented Programs (e.g. [PZC:POPL95]): Merge Access Regions for Invocations of Exclusive Methods
Concurrent Constraint Programming: Bring Together Ask and Tell Constructs
Efficient Synchronization Algorithms: Efficient Implementations of Synchronization Primitives

34 Conclusions
Synchronization Optimizations: Basic Synchronization Transformations for Locks; Synchronization Optimization Algorithm
Integrated into Prototype Parallelizing Compiler: Object-Based Programs with Dynamic Data Structures; Commutativity Analysis
Experimental Results: Optimizations Have a Significant Performance Impact; With Optimizations, Applications Perform Well
Dynamic Feedback

