Compactly Representing Parallel Program Executions
Ankit Goel, Abhik Roychoudhury, Tulika Mitra – National University of Singapore

1 Compactly Representing Parallel Program Executions – Ankit Goel, Abhik Roychoudhury, Tulika Mitra, National University of Singapore

2 Path profiles
Profiling a program’s execution
–Count based
–Path based
Count based profiles are more aggregate
–# of executions of the program’s basic blocks
–# of accesses to various memory locations
Path based profiles are more accurate
–Sequence of basic blocks executed
–Sequence of memory locations accessed
Use online compression to generate compact path profiles.

3 Organization
Compressed Path Profiles in Sequential Programs
Parallel Program Path Profiles
Compression Efficiency and Overheads
Data race detection over path profiles

4 Compressed Path – Example
[Figure: control flow graph over basic blocks 1, 2, 3]
Uncompressed Path: 123123
Compressed Representation:
S → AA
A → 123

5 Online Path Compression
A program path is a string over a finite alphabet
Alphabet decided by what we instrument
–Control flow (basic blocks executed)
–Data flow (memory locations accessed)
A string s is represented by a context-free grammar Gs: the language of Gs is {s}
Construction of Gs is online, not post-mortem
–Start with the trivial grammar and modify it for each symbol
No recursive rules (DAG representation)
Compression scheme – Nevill-Manning & Witten 1997
–Application to program paths – Larus 1999
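The Nevill-Manning & Witten scheme (Sequitur) builds the grammar incrementally by enforcing digram uniqueness: whenever a pair of adjacent symbols occurs twice, a new rule replaces both occurrences. A minimal Python sketch of this idea (it rescans rules quadratically for clarity, whereas the real algorithm is linear-time and also enforces rule utility, inlining rules used only once):

```python
def sequitur(trace):
    """Online grammar compression: append each symbol to rule S, then
    restore digram uniqueness by introducing rules. Simplified sketch:
    no rule-utility step, so single-use rules are not inlined."""
    rules = {'S': []}
    fresh = [0]

    def repeated_digram(seq):
        seen = {}
        for i in range(len(seq) - 1):
            d = (seq[i], seq[i + 1])
            if d in seen and seen[d] < i - 1:   # non-overlapping repeat
                return seen[d], i, d
            seen.setdefault(d, i)
        return None

    def enforce():
        changed = True
        while changed:
            changed = False
            for rid in list(rules):
                hit = repeated_digram(rules[rid])
                if hit:
                    i, j, d = hit
                    fresh[0] += 1
                    new = 'R%d' % fresh[0]
                    rules[new] = list(d)
                    rules[rid][j:j + 2] = [new]   # later occurrence first
                    rules[rid][i:i + 2] = [new]
                    changed = True

    for sym in trace:            # online: one symbol at a time
        rules['S'].append(sym)
        enforce()
    return rules

def expand(rules, sym='S'):
    """Decompress: the language of the grammar is exactly {trace}."""
    if sym in rules:
        return ''.join(expand(rules, s) for s in rules[sym])
    return sym
```

On the path 123123 this yields S → R2 R2, R2 → R1 3, R1 → 1 2, the intermediate grammar of slide 7; the rule-utility step would then inline R1 to give the final B → 123 form.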

6 Online Compression in action
Path executed   Compressed representation
1               S → 1
12              S → 12
123             S → 123
1231            S → 1231
12312           S → 12312, rewritten to S → A3A; A → 12

7 Online Compression in action
Path executed   Compressed representation
123123          S → A3A3; A → 12
                rewritten to S → BB; B → A3; A → 12
                rewritten to S → BB; B → 123 (rule A inlined)

8 Organization
Compressed Path Profiles in Sequential Programs
Parallel Program Path Profiles
Compression Efficiency and Overheads
Data race detection over path profiles

9 What to represent?
Control/data flow in each program thread
Communication among threads
–Synchronization (locks, barriers)
–Unsynchronized shared variable accesses
Too costly to observe/record the order of all shared variable accesses
We will represent
–Compressed flow in each thread (via grammar)
–Communication via synchronizations (how?)

10 Synchronization Pattern (Locks)
[Figure: Message Sequence Chart (MSC) for Pgm = P1 || P2 – lock/unlock exchanges of P1 and P2 with shared memory, with P1 computing inside its lock–unlock segment]

11 Synchronization Pattern (Barrier)
[Figure: MSC for Pgm = P1 || P2 – each thread computes, sends "ready" to memory, blocks, and proceeds on "go"]

12 Connection to MSCs
[Figure: partial order of an MSC over Th. 1, Th. 2 and shared memory; the lock/unlock messages match the observed ordering]
Total order in each thread
Ordering across threads visible via synchronization (message exchange)
All synchronization ops. form a total order

13 A first cut
Instrument each thread to observe local control/data flow and global synchronization
Represent the path profile of P1 || P2 as
–Each thread’s flow as a grammar – (G1, G2); contains the synch. ops. as well
–All synchronization ops. as a global list
–Associate entries in this list with the occurrences of synch. ops. in (G1, G2)
How to navigate the path profile?
–Zoom in to a specific lock–unlock segment of P1
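The bookkeeping of this first cut can be sketched as follows (a minimal sketch; class and method names are hypothetical). Each thread's event stream would feed its own online grammar, while a synchronization op additionally receives an index into one shared, totally ordered list:

```python
import threading

class PathProfiler:
    """Sketch of the 'first cut': per-thread event streams (which would
    feed each thread's online grammar) plus one global, totally ordered
    list of synchronization operations."""
    def __init__(self):
        self.traces = {}        # thread id -> list of observed symbols
        self.synch_list = []    # global total order of synch. ops
        self._guard = threading.Lock()

    def record(self, tid, symbol):
        # local control/data-flow event; stays private to the thread
        self.traces.setdefault(tid, []).append(symbol)

    def record_synch(self, tid, op):
        # a synch. op appears both in the global list (serialized by
        # the guard) and, tagged with its global index, in the trace
        with self._guard:
            idx = len(self.synch_list)
            self.synch_list.append((tid, op))
        self.traces.setdefault(tid, []).append((op, idx))
```

Zooming in to a lock–unlock segment of P1 then means finding consecutive (op, idx) entries in P1's trace and using idx to relate them to the other threads' synch. ops in the global list.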

14 Edge annotations
[Figure: grammar DAG for one thread over symbols a, b (lock), c (unlock), x, y; each edge carries a count (e.g. 0, 1, 2, 4) of the synch. ops. derived below it]

15 Locating synch. operations
[Figure: the annotated grammar DAG from the previous slide, with the root-to-leaf path to the 3rd synchronization operation highlighted; the thread performs n synch. ops. in total]
Descend the DAG using the edge counts to reach the k-th synch. op. directly
Can find synch. segments by looking up the global list
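The count annotations make the k-th synch. op reachable by a single root-to-leaf descent, skipping whole subtrees by their counts instead of decompressing. A sketch over a grammar-as-dict representation (symbol names are illustrative, not from the talk):

```python
def synch_counts(rules, is_synch, root='S'):
    """Annotate every symbol with the number of synch. ops it derives;
    this is the count stored on the grammar DAG's edges."""
    counts = {}
    def count(sym):
        if sym not in counts:
            if sym in rules:                       # nonterminal
                counts[sym] = sum(count(s) for s in rules[sym])
            else:                                  # terminal
                counts[sym] = 1 if is_synch(sym) else 0
        return counts[sym]
    count(root)
    return counts

def locate(rules, counts, k, root='S'):
    """Root-to-leaf path to the k-th (0-based) synch. op: at each rule,
    skip children whose subtrees hold at most the remaining k ops.
    Cost is O(depth x rule length), independent of the expanded path."""
    path, sym = [root], root
    while sym in rules:
        for child in rules[sym]:
            if k < counts[child]:
                sym = child
                path.append(sym)
                break
            k -= counts[child]
        else:
            raise IndexError('fewer than k synch. ops')
    return path
```

For example, with rules = {'S': ['A', 'A'], 'A': ['b', 'c']} and b/c as lock/unlock, the 3rd synch. op (k = 2) resolves to the path S → A (second occurrence) → b without expanding the compressed path.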

16 So far
Control flow of each thread stored as a grammar
Synchronization ops. form a global list
Grammar of each thread annotated with counts
–Easy searching of synchronization operations
What about shared data accesses?
The sequence of memory locations accessed by a single LD/ST instruction can be compressed
–Use a grammar representation for this sequence as well

17 Further compression
Locations accessed by a memory operation
–10, 14, 18, 22, 26, 54, 58, 62, 66, 70, 98
Online compression of the string as a grammar
–10(1), 4(4), 28(1), 4(4), 28(1)
–Difference representation + run-length encoding
Useful for detecting regularity of array accesses
–Sweep through an array: a run of constant diffs
–Accessing a sub-grid of a multidimensional array
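The difference + run-length step can be sketched directly (function name illustrative); it reproduces the value(run-length) encoding shown above:

```python
def diff_rle(addrs):
    """Keep the base address, then run-length encode the successive
    differences; a constant-stride array sweep collapses to one run."""
    out = [(addrs[0], 1)]            # base address, run length 1
    run = None
    for prev, cur in zip(addrs, addrs[1:]):
        d = cur - prev
        if run is not None and run[0] == d:
            run[1] += 1              # extend the current run of diffs
        else:
            run = [d, 1]             # start a new run
            out.append(run)
    return [tuple(x) for x in out]
```

diff_rle([10, 14, 18, 22, 26, 54, 58, 62, 66, 70, 98]) gives [(10, 1), (4, 4), (28, 1), (4, 4), (28, 1)], i.e. the 10(1), 4(4), 28(1), 4(4), 28(1) form of the slide: four strides of 4 within a row, a jump of 28 between rows of the sub-grid.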

18 Organization
Compressed Path Profiles in Sequential Programs
Parallel Program Path Profiles
Compression Efficiency and Overheads
Data race detection over path profiles

19 Any better than gzip?
[Chart: compression % of our scheme compared with gzip, 2 processors]

20 Scalability of Compression
[Chart: compression % for our scheme as the number of processors grows]

21 Concerns about Timing Overheads
Our scheme does not add substantial time overhead over grammar-based string compression
Our experiments were conducted using RSIM
–Tracing overheads can be higher on a real multiprocessor
–Can tracing distort program behavior?
Possible solution
–Trace the minimal number of operations in a parallel program execution (Netzer 1993) needed to ensure deterministic replay
–Collect the compressed path profile during replay

22 Organization
Compressed Path Profiles in Sequential Programs
Parallel Program Path Profiles
Compression Efficiency and Overheads
Data race detection over path profiles

23 Apparent Data races
[Figure: MSC of Th. 1, Th. 2, Th. 3 and shared memory exchanging lock/unlock messages]
Last unlock in Th. 1 (first unlock)
Next lock in Th. 1 (second lock)
Locate the root-to-leaf paths of these ops.
Examine the tree rooted at the least common ancestor of these ops.
No decompression of the grammar of Th. 1

24 Data race artifacts
One thread: Sub := 1; A[1] := 0
Another thread: X := Sub; Y := A[X] (artifact)
X decides which address is accessed in Y := A[X]
X is set by Sub := 1, which is itself involved in a data race
Detecting artifacts requires data flow
–Not captured by read/write sets in synch. segments
–Captured in our compact path profiles

25 Summary
Compressed representation of the execution profile of shared-memory parallel programs
–Control and shared data flow per thread
–Synchronization patterns across threads
Overall compression efficiency: 0.25% – 9.81%
Compression efficiency scales with increasing number of processors
Application: post-mortem debugging, such as detecting data races

26 Other Applications
We do not capture the actual order of unsynchronized shared memory accesses across processors
Such orders can be useful in making architectural decisions, such as the choice of cache coherence protocol
Sufficient to maintain [Netzer 1993]
–the transitive reduction of program order on each processor
–shared variable conflict orders
Can we capture the transitive reduction relation via annotations of WPP edges?

