Presentation is loading. Please wait.

Presentation is loading. Please wait.

ECE669 L23: Parallel Compilation April 29, 2004 ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation.

Similar presentations


Presentation on theme: "ECE669 L23: Parallel Compilation April 29, 2004 ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation."— Presentation transcript:

1 ECE669 L23: Parallel Compilation April 29, 2004 ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation

2 ECE669 L23: Parallel Compilation April 29, 2004 Parallel Compilation °Two approaches to compilation Parallelize a program manually Sequential code converted to parallel code °Develop a parallel compiler Intermediate form Partitioning -Block based or loop based Placement Routing

3 ECE669 L23: Parallel Compilation April 29, 2004 Compilation technologies for parallel machines Assumptions: Input:Parallel program Output:“Coarse” parallel program & directives for: Which threads run in 1 task Where tasks are placed Where data is placed Which data elements go in each data chunk Limitation: No special optimizations for synchronization -- synchro mem refs treated as any other comm.

4 ECE669 L23: Parallel Compilation April 29, 2004 Toy example °Loop parallelization

5 ECE669 L23: Parallel Compilation April 29, 2004 Example °Matrix multiply °Typically, °Looking to find parallelism...

6 ECE669 L23: Parallel Compilation April 29, 2004 Choosing a program representation... °Dataflow graph No notion of storage problem Data values flow along arcs Nodes represent operations + a0a0 b0b0 c0c0 + a2a2 b2b2 c2c2 + a5a5 b5b5 c5c5

7 ECE669 L23: Parallel Compilation April 29, 2004 Compiler representation °For certain kinds of structured programs °Unstructured programs A Data X Task A Task B LOOP LOOP nest Data array Index expressions Communication weight Array

8 ECE669 L23: Parallel Compilation April 29, 2004 Process reference graph Nodes represent threads (processes) computation Edges represent communication (memory references) Can attach weights on edges to represent volume of communication Extension: precedence relations edges can be added too Can also try to represent multiple loop produced threads as one node P0P0 P1P1 P2P2 P5P5 Memory Monolithic memory or21......Not very useful

9 ECE669 L23: Parallel Compilation April 29, 2004 Allocate data items to nodes as well Nodes: Threads, data objects Edges: Communication Key: Works for both shared-memory, object-oriented, and dataflow systems! (Msg. passing) Process communication graph P0P0 A0A0 B0B0 C0C0 d1d1 d0d0 1 1 1 P1P1 A 01 A2A2 C1C1 d3d3 d2d2 1 1 1 P2P2 B0B0 d5d5 d4d4 P5P5

10 ECE669 L23: Parallel Compilation April 29, 2004 PCG for Jacobi relaxation Fine PCG Coarse PCG 4 4 4 4 4 : Computation : Data

11 ECE669 L23: Parallel Compilation April 29, 2004 Compilation with PCGs Fine process communication graph Partitioning Coarse process communication graph

12 ECE669 L23: Parallel Compilation April 29, 2004 Compilation with PCGs Fine process communication graph Coarse process communication graph Placement... other phases, scheduling. Dynamic? MP: Partitioning Coarse process communication graph

13 ECE669 L23: Parallel Compilation April 29, 2004 Parallel Compilation °Consider loop partitioning °Create small local compilation °Consider static routing between tiles °Short neighbor-to-neighbor communication °Compiler orchestrated

14 ECE669 L23: Parallel Compilation April 29, 2004 Flow Compilation °Modulo unrolling °Partitioning °Scheduling

15 ECE669 L23: Parallel Compilation April 29, 2004 Modulo Unrolling – Smart Memory °Loop unrolling relies on dependencies °Allow maximum parallelism °Minimize communication

16 ECE669 L23: Parallel Compilation April 29, 2004 Array Partitioning – Smart Memory °Assign each line to separate memory °Consider exchange of data °Approach is scalable

17 ECE669 L23: Parallel Compilation April 29, 2004 Communication Scheduling – Smart Memory °Determine where data should be sent °Determine when data should be sent

18 ECE669 L23: Parallel Compilation April 29, 2004 Speedup for Jacobi – Smart Memory °Virtual wires indicates scheduled paths °Hard wires are dedicated paths °Hard wires require more wiring resources °RAW is a parallel processor from MIT

19 ECE669 L23: Parallel Compilation April 29, 2004 Partitioning Use heuristic for unstructured programs For structured programs......start from: Arrays List of arrays A B C L0 L1 L2 Loop Nests List of loops

20 ECE669 L23: Parallel Compilation April 29, 2004 Notion of Iteration space, data space E.g. A Matrix j i Data space Iteration space Represents a “thread” with a given value of i,j

21 ECE669 L23: Parallel Compilation April 29, 2004 °Partitioning: How to “tile” iteration for MIMD M/Cs data spaces? Notion of Iteration space, data space E.g. A Matrix j i Data space Iteration space Represents a “thread” with a given value of i,j This thread affects the above computation +

22 ECE669 L23: Parallel Compilation April 29, 2004 Loop partitioning for caches °Machine model Assume all data is in memory Minimize first-time cache fetches Ignore secondary effects such as invalidations due to writes Memory Network Cache PPP A

23 ECE669 L23: Parallel Compilation April 29, 2004 Summary °Parallel compilation often targets block based and loop based parallelism °Compilation steps address identification of parallelism and representations Graphs often useful to represent program dependencies °For static scheduling both computation and communication can be represented °Data positioning is an important for computation


Download ppt "ECE669 L23: Parallel Compilation April 29, 2004 ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation."

Similar presentations


Ads by Google