

1 CS 7810 Lecture 18: "The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization," J.G. Steffan and T.C. Mowry, Proceedings of HPCA-4, February 1998

2 Multi-Threading
- CMPs advocate low-complexity, static approaches to parallelism extraction
- Resolving memory dependences for integer codes is not easy!
- Large window: 100 in-flight instructions
- Compiler-generated threads: 4 windows of 25 instructions each

3 Probable Conflicts [figure: memory accesses through pointers p and q that may or may not alias]
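The slide's figure is lost in the transcript; as a hedged illustration (not the paper's figure), a "probable conflict" is a pair of accesses through pointers that the compiler cannot prove distinct, so it must either serialize them or speculate:

```c
/* Hypothetical illustration: one epoch stores through pointer p while a
 * later epoch loads through pointer q.  Whether the accesses conflict
 * depends on whether p and q alias at run time, which the compiler
 * cannot decide statically -- a "probable conflict". */
void epoch1(int *p) { *p = 1; }        /* speculative store */
int  epoch2(int *q) { return *q; }     /* speculative load  */

int probable_conflict(int *p, int *q) {
    return p == q;   /* decidable only at run time */
}
```

If the pointers happen to differ, the epochs run safely in parallel; if they alias, the speculation must be detected and squashed.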

4 Example: Compress

5 Example Execution

6 Compiler Optimizations
- Induction variables: in_count
- Reduction: out_count
- Parallel I/O: getchar() and putchar()
- Scalar forwarding: free_entries
- Ambiguous loads and stores: hash[…]
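A hedged sketch (not the actual SPEC compress source) of a compress-style loop, using the variable names the slide attributes to compress, with each optimization target marked:

```c
#define HSIZE 64

int hash[HSIZE];   /* indexed by data-dependent values across epochs */

int compress_loop(const char *in, int n) {
    int in_count = 0;         /* induction variable: predictable per epoch */
    int out_count = 0;        /* reduction: partial sums combined at the end */
    int free_entries = HSIZE; /* scalar forwarded from epoch to epoch */
    for (int i = 0; i < n; i++) {
        in_count++;
        int h = (in[i] * 31) % HSIZE;  /* index depends on the input data... */
        if (hash[h] == 0) {            /* ...so these loads and stores are   */
            hash[h] = in[i];           /* ambiguous between epochs; TLDS     */
            free_entries--;            /* speculates they do not conflict    */
        }
        out_count++;
    }
    return out_count;
}
```

The induction and reduction variables can be privatized per epoch, while the hash[] accesses are exactly the ambiguous references that data speculation targets.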

7 Methodology
- Threads (epochs) were constructed by hand
- The processors are in-order and instructions have unit latency

8 Ambiguous Loads and Stores

9 Average Run Lengths

10 Forwarding Registers and Scalars

11 Average Run Lengths

12 Realistic Models
- 10-cycle forwarding latency
- Sharing at cache-line granularity
- Recovery from misspeculation
- Results are not sensitive to forwarding latency or cache-line size

13 Hardware Support
- Extends the cache coherence protocol of the L1 caches
- For each cache line, track whether the line has been speculatively read or modified
- When the oldest thread writes to a cache line, an invalidate is sent to the other caches
- A younger thread that has speculatively loaded that line sets a violation flag; software recovery is initiated when the thread attempts to commit
- Cache-line evictions also cause violations (not common)
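The protocol extension above can be modeled in a few lines. This is a simplified sketch of the idea, not the paper's exact state machine: each line remembers which epochs speculatively loaded it, and a store by an older epoch flags every younger epoch that already read the line.

```c
#include <stdbool.h>
#include <string.h>

#define NLINES  8
#define NEPOCHS 4

/* Simplified model (an assumption, not the actual protocol tables):
 * spec_loaded[e][l] records that epoch e speculatively read line l;
 * violated[e] is the per-epoch violation flag the slide describes. */
static bool spec_loaded[NEPOCHS][NLINES];
static bool violated[NEPOCHS];

void reset(void) {
    memset(spec_loaded, 0, sizeof spec_loaded);
    memset(violated, 0, sizeof violated);
}

void spec_load(int epoch, int line) {
    spec_loaded[epoch][line] = true;   /* track the speculative read */
}

void store(int epoch, int line) {
    /* the invalidate from the writing epoch reaches all younger epochs */
    for (int e = epoch + 1; e < NEPOCHS; e++)
        if (spec_loaded[e][line])
            violated[e] = true;        /* read happened before the write */
}

bool try_commit(int epoch) {
    return !violated[epoch];           /* false => software recovery */
}
```

An epoch that loaded a line before an older epoch's store fails its commit and re-executes; epochs that touched disjoint lines commit normally.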

14 Role of the Compiler
- Profiling to identify epochs large enough to offset thread management and communication costs, yet small enough to keep speculative state low
- Estimating the probability of violation (static/dynamic)
- Optimizations (induction, reduction, parallel I/O)
- Scalar forwarding and rescheduling
- Insertion of register recovery code

15 Conclusions
- Hardware catches violations, so the compiler can parallelize aggressively
- A competitive alternative implementation: a large instruction window with store-set dependence prediction


