
1 Sequential Hardware Prefetching in Shared-Memory Multiprocessors
Fredrik Dahlgren, Member, IEEE Computer Society; Michel Dubois, Senior Member, IEEE; and Per Stenström, Member, IEEE

2 INTRODUCTION
Why prefetching? Motivations for prefetching data.
Types of prefetching: SOFTWARE and HARDWARE.

3 SIMPLEST & MOST OBVIOUS PREFETCHING TECHNIQUE: INCREASE BLOCK SIZE!!!

4 IMPACT OF BLOCK SIZE ON ACCESS PENALTIES AND TRAFFIC
TYPES OF MISSES: COLD, REPLACEMENT, COHERENCE (TRUE SHARING, FALSE SHARING)

5 EFFECT OF INCREASING THE BLOCK SIZE ON THE DIFFERENT TYPES OF MISSES
EFFECT OF BLOCK SIZE ON DIFFERENT TYPES OF MISSES
EFFECT OF BLOCK SIZE ON MEMORY TRAFFIC
EFFECT OF BLOCK SIZE ON WRITE PENALTY

6 SIMULATED NODE ARCHITECTURE

7 TWO SIMPLE HARDWARE-CONTROLLED SEQUENTIAL PREFETCHING TECHNIQUES
1. FIXED SEQUENTIAL PREFETCHING
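Fixed sequential prefetching fetches the K blocks that sequentially follow a missing block, with K fixed at design time. A minimal sketch of that policy in software, assuming illustrative helper names (`cache` as a set of resident block addresses, `issue_fetch` as the memory-request hook) that are not from the paper:

```python
def fixed_sequential_prefetch(miss_block, K, cache, issue_fetch):
    """On a miss to `miss_block`, prefetch the next K sequential blocks.

    A sketch of fixed sequential prefetching: the degree K is constant,
    and already-resident blocks are skipped to avoid redundant traffic.
    """
    for i in range(1, K + 1):
        candidate = miss_block + i
        if candidate not in cache:      # only fetch blocks not already cached
            issue_fetch(candidate)
            cache.add(candidate)
```

With K = 1 (the case analyzed in the paper's conclusions), each miss triggers at most one extra block fetch.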

8 2. ADAPTIVE SEQUENTIAL PREFETCHING
K is controlled by the lookahead counter.
Mechanisms needed: prefetch bit, zero bit, lookahead counter, prefetch counter, useful counter.

9 ADAPTIVE SEQUENTIAL PREFETCHING ALGORITHM
MEASURES PREFETCH EFFICIENCY BY COUNTING THE FRACTION OF USEFUL PREFETCHES.
COUNTING PREFETCHED BLOCKS: INCREMENT THE PREFETCH COUNTER WHENEVER A PREFETCH IS ISSUED.
COUNTING USEFUL PREFETCHES: INCREMENT THE USEFUL COUNTER WHENEVER A BLOCK WITH ITS PREFETCH BIT = 1 IS ACCESSED.
WHEN PREFETCH COUNTER = MAX, CHECK THE USEFUL COUNTER:
USEFUL COUNTER > UPPER THRESHOLD: LOOKAHEAD COUNTER INCREASED.
USEFUL COUNTER < LOWER THRESHOLD: LOOKAHEAD COUNTER DECREASED.
LOWER THRESHOLD ≤ USEFUL COUNTER ≤ UPPER THRESHOLD: LOOKAHEAD COUNTER UNCHANGED.
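The counter protocol above can be sketched as follows. The saturation value and thresholds are illustrative assumptions (the paper does not fix them on this slide), and the class structure is purely for exposition:

```python
class AdaptivePrefetcher:
    """Sketch of the adaptive scheme's counter logic.

    Counts the fraction of useful prefetches over a window of
    MAX_PREFETCH_COUNT issued prefetches, then raises or lowers the
    lookahead counter (the degree of prefetching K) accordingly.
    """
    MAX_PREFETCH_COUNT = 16   # assumed window size, not the paper's value
    UPPER_THRESHOLD = 12      # assumed: >75% useful -> prefetch more
    LOWER_THRESHOLD = 4       # assumed: <25% useful -> prefetch less

    def __init__(self):
        self.lookahead = 1        # current degree of prefetching K
        self.prefetch_count = 0
        self.useful_count = 0

    def on_prefetch_issued(self):
        """Increment the prefetch counter; adjust K at window end."""
        self.prefetch_count += 1
        if self.prefetch_count == self.MAX_PREFETCH_COUNT:
            self._adjust()

    def on_prefetched_block_used(self):
        """Called when a block with its prefetch bit = 1 is accessed."""
        self.useful_count += 1

    def _adjust(self):
        if self.useful_count > self.UPPER_THRESHOLD:
            self.lookahead += 1
        elif self.useful_count < self.LOWER_THRESHOLD:
            self.lookahead = max(0, self.lookahead - 1)
        # between the thresholds: lookahead counter unchanged
        self.prefetch_count = 0
        self.useful_count = 0
```

Note that when the lookahead counter drops to zero, prefetching effectively turns off for that window; the adaptive scheme can thus throttle itself on workloads where sequential prefetches are rarely useful.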

10 EXPERIMENTAL RESULTS
READ, WRITE, AND SYNCHRONIZATION TRAFFIC FOR THE THREE SCHEMES

11 RELATIVE READ STALL TIMES FOR FIXED AND ADAPTIVE PREFETCHING NORMALIZED TO NO PREFETCHING.


14 CONCLUSIONS
Prefetching improves efficiency.
Fixed sequential prefetching analyzed for K = 1: read misses decrease by 25–45%, read stall time decreases by 20–35%.
Under adaptive sequential prefetching, read stall time is reduced by 58% and execution time decreases by 25%.

15 QUESTIONS? THANK YOU

