2/15/2006 "Software-Hardware Cooperative Memory Disambiguation", Alok Garg, HPCA 2006


1 Software-Hardware Cooperative Memory Disambiguation
Ruke Huang, Alok Garg, and Michael Huang
Department of Electrical & Computer Engineering, University of Rochester

2 Motivation
- Hiding long latencies
  - Scaling up of many structures
    - Complex, hard to design
    - Consumes more energy
    - Slower
- Inefficiency in hardware
  - Meticulously keep track of all instructions
  - No prior knowledge of out-of-order execution
  - Simply cross-compare all loads and stores
[Figure: configuration ROB size: 320, SQ size: 48, LQ size: 48; axis: LQ Size; annotation: 16%]

3 Software Assistance
- Global information
  - Statically identify non-conflicting memory accesses
- Advantages
  - Reduced resource pressure
  - Energy savings
- Loads not requiring memory disambiguation
  - On average, 43% of dynamic loads in SPEC FP applications

4 Recent Research
- Chrysos and Emer (ISCA’98)
- Sethumadhavan et al. (MICRO’03)
- Park et al. (MICRO’03)
- Baugh and Zilles (PACC’04)
- Akkary et al. (MICRO’03)
- Gandhi et al. (ISCA’05), etc.
- Hardware-only: Provisioning, re-occurring overhead
- Cooperative: Consumption, one-time overhead

5 Outline
- Cooperative Memory Disambiguation Framework
- Evaluation
- Conclusion

6 Cooperative Memory Disambiguation - Resource-Effective Approach
- 90% dynamic loads do not communicate with in-flight stores
- Many loads do not require memory disambiguation resources
- Safe loads:
  - Software analyzer can identify them
  - Can exploit hardware specific information
- Hardware resources only for non-safe loads

Example:

    int A[1000], B[1000];

    void VecAdd() {
        for (int i = 0; i < 1000; i++)
            A[i] = A[i] + B[i];
    }

7 Cooperative Memory Disambiguation Framework
- Software-hardware Interface
  - Decoupled ISA (No compatibility obligations)
- Software Support
  - Binary to binary translator - alto (Muth et al.)
  - Binary analyzer
    - Identify read-only data loads
    - Identify other general safe loads
- Architectural Support
  - Light-weight
[Diagram labels: Source compiler, Original binary, Hardware, Translator, Compilation, Hardware specific translator, ISA, Extended instruction set, Hardware specific internal binary]

8 General Safe Loads
- Scope of parser analysis
  - Steady state loop
  - No internal control flow
  - Limited in-flight instructions
    - ROB size, store queue size
[Diagram: Simple loop body (… Load … Store Branch); Steady state loop execution, with iterations i, i-1, i-2 in the instruction window (… Store … Store … Load … Store …)]

9 General Safe Loads (Cont.)
- Real example from a SPEC FP application
- One loop from galgel:

    0x120033140: ldl   r31, 256(r3)       ; prefetch
    0x120033144: ldt   f21, 0(r3)         ; Ld1
    0x120033148: lda   r27, -2(r27)       ; r27 = r27-2
    0x12003314c: lda   r3, 16(r3)         ; r3 = r3+16
    0x120033150: ldt   f22, -8(r3)        ; Ld2
    0x120033154: ldt   f23, 0(r11)        ; Ld3 (called Ld2 in the analysis)
    0x120033158: cmple r27, 0x1, r1       ;
    0x12003315c: lda   r11, 16(r11)       ; r11 = r11+16
    0x120033160: ldt   f24, -8(r11)       ; Ld4
    0x120033164: lds   f31, 240(r11)      ; prefetch
    0x120033168: mult  f20, f21, f21      ;
    0x12003316c: mult  f20, f22, f22      ;
    0x120033170: addt  f23, f21, f21      ;
    0x120033174: addt  f24, f22, f22      ;
    0x120033178: stt   f21, -16(r11)      ; St1
    0x12003317c: stt   f22, -8(r11)       ; St2
    0x120033180: beq   r1, 0x120033140    ;

- Addresses in iteration i:

    AddrLd1 = _R3 + 16*i
    AddrLd2 = _R11 + 16*i
    AddrSt1 = _R11 + 16*i
    AddrSt2 = _R11 + 16*i + 8

- Analysis window: 16 iterations
- Address range = _R11+(i-16)*16 to _R11+(i-1)*16+8
- Ld2 statically determined to be safe
- Ld1 needs run-time evaluation
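The address-range reasoning on this slide can be sketched in C. This is only an illustration under stated assumptions, not the analyzer from the talk: the `StridedAccess` struct, the function names, and the 8-byte access size are hypothetical, and the check uses concrete base values where a static analyzer would reason symbolically. It considers only stores from the previous `window` iterations, on the assumption that same-iteration stores follow the loads in program order, as they do in the galgel loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one memory access in a steady-state loop:
 * its address in iteration i is base + stride * i + offset. */
typedef struct {
    int64_t base;    /* register value on loop entry (e.g. _R11) */
    int64_t stride;  /* bytes the address advances per iteration */
    int64_t offset;  /* constant displacement within one iteration */
    int64_t size;    /* bytes accessed (assumed 8 for ldt/stt) */
} StridedAccess;

static int64_t addr_at(const StridedAccess *a, int64_t iter) {
    return a->base + a->stride * iter + a->offset;
}

/* Returns true if the load in iteration i cannot overlap the given store
 * as executed in any of the previous `window` iterations (i-window .. i-1). */
bool load_safe_against_store(const StridedAccess *ld, const StridedAccess *st,
                             int64_t i, int64_t window) {
    assert(window > 0 && ld->size > 0 && st->size > 0);
    int64_t ld_lo = addr_at(ld, i);
    int64_t ld_hi = ld_lo + ld->size;          /* exclusive end */
    for (int64_t k = i - window; k < i; k++) {
        int64_t st_lo = addr_at(st, k);
        int64_t st_hi = st_lo + st->size;      /* exclusive end */
        if (ld_lo < st_hi && st_lo < ld_hi)
            return false;                      /* byte ranges intersect */
    }
    return true;
}
```

With the same base for the load and both stores (the r11-based case), the result does not depend on the base value, which is why that load can be classified statically. A load with a different base (the r3-based case) gives an answer that depends on the distance between the two registers, so it is left to a run-time check.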

10 General Safe Loads (Cont.)
- Real example from a SPEC FP application
- Modified code:

    New_entry:
                 mark_sq
                 if (r3-r11+8 > 0) or (r3-r11+264 < 0) then cset CR0, 1
    0x120033144: sldt  f21, 0(r3), [CR0]       ; Ld1 (safe)
    0x12003314c: lda   r3, 16(r3)              ; r3 = r3+16
    0x120033154: sldt  f23, 0(r11), [CR_TRUE]  ; Ld2 (safe)
    0x120033158: cmple r27, 0x1, r1            ;
    0x12003315c: lda   r11, 16(r11)            ; r11 = r11+16
    0x120033174: addt  f24, f22, f22           ;
    0x120033178: stt   f21, -16(r11)           ; St1
    0x12003317c: stt   f22, -8(r11)            ; St2
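The run-time condition guarding `cset CR0, 1` can be written as a small C predicate. The expression itself is taken from the slide; the function name and the 8-byte alignment assumption are hypothetical. One plausible reading is that the first disjunct holds when the r3-based loads lie at or above the stores of the 16-iteration window, and the second when they lie entirely below them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Evaluates the loop-entry condition from the slide:
 *   (r3 - r11 + 8 > 0) or (r3 - r11 + 264 < 0)
 * A true result corresponds to "cset CR0, 1", i.e. the loads predicated
 * on CR0 are treated as safe for this execution of the loop. */
bool cr0_condition(int64_t r3, int64_t r11) {
    /* Assumption (not stated on the slide): both pointers are 8-byte aligned. */
    assert(r3 % 8 == 0 && r11 % 8 == 0);
    int64_t d = r3 - r11;
    return (d + 8 > 0) || (d + 264 < 0);
}
```

For example, `cr0_condition(p, p)` is true, while a source pointer 8 bytes below the destination (`cr0_condition(p - 8, p)`) is false, so those loads would go through normal disambiguation.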

11 Safe stores
- If it does not communicate with future loads
- Indirectly discover safe loads
  - Un-analyzable store
  - Load is safe if all stores in SQ are safe
- Summary of safe load detection
  - Simple loop body
  - All stores must be analyzable
  - Address range calculation
[Diagram: Loop Body (… Load (A) … Store1 (UA) … Store2 (A) … Branch); In-flight instructions (… Load (A) … Store1 (UA) … Store2 (A) … Branch … Load (A) ...)]
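The rule "load is safe if all stores in SQ are safe" can be sketched as a scan over a store queue. This is a software model for illustration only: the `SqEntry` layout and the function name are assumptions, and real hardware would more likely track this with a counter or a single summary bit than with a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical store-queue entry; the slides do not describe the layout. */
typedef struct {
    bool valid;  /* entry holds an in-flight store */
    bool safe;   /* store was marked safe by the software analyzer */
} SqEntry;

/* A load may bypass disambiguation only if no in-flight store is unsafe. */
bool all_inflight_stores_safe(const SqEntry *sq, size_t n) {
    assert(sq != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sq[i].valid && !sq[i].safe)
            return false;  /* an un-analyzable store is still in flight */
    }
    return true;
}
```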

12 Architectural Support
- Safe loads
  - Boolean condition registers
  - cset (instruction)
- Safe stores
  - Scope marker
- Indirect jumps
  - Flash-reset all condition registers
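A minimal C model of the condition-register support listed above might look as follows. The register count `NUM_CR`, the bitmask representation, and the function names other than `cset` are assumptions; the slides only name Boolean condition registers, the `cset` instruction, and a flash reset on indirect jumps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CR 8  /* assumption: the slides do not give the register count */

/* Boolean condition registers modeled as one bit each. */
typedef struct {
    uint8_t bits;
} CondRegs;

/* Models "cset CRn, value": set or clear one condition register. */
void cset(CondRegs *cr, unsigned idx, bool value) {
    assert(idx < NUM_CR);
    if (value)
        cr->bits |= (uint8_t)(1u << idx);
    else
        cr->bits &= (uint8_t)~(1u << idx);
}

/* A load predicated on CRn is treated as safe only while CRn is set. */
bool load_is_safe(const CondRegs *cr, unsigned idx) {
    assert(idx < NUM_CR);
    return ((cr->bits >> idx) & 1u) != 0;
}

/* Models the flash reset performed on an indirect jump. */
void flash_reset(CondRegs *cr) {
    cr->bits = 0;
}
```

After `flash_reset`, every predicated load falls back to normal disambiguation until a later `cset` marks its condition register again.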

13 Outline
- Cooperative Memory Disambiguation Framework
- Evaluation
- Conclusion

14 Experimental Setup
- Modified SimpleScalar 3.0b simulator
- Wattch to estimate dynamic energy consumption
- SPEC CPU2000 benchmark suite

15 Breakdown of Safe Loads (FP)
[Figure; annotations: 97%, 43%]

16 Performance Improvement (FP)
[Figure; annotation: 40/48%]

17 Breakdown of Safe Loads (INT)

18 Performance Improvement (INT)

19 Energy Savings
[Figure panels: Floating-point applications; Integer applications]

20 Conclusions
- Software assistance improves LSQ efficiency
  - Detects average 43% loads as safe
  - Average 10% performance gain
- Compiler techniques for optimization of micro-architecture resources
- Future work
  - More powerful static analyzer
  - Manage other micro-architecture resources
    - E.g., register file

21 Thank you! Questions?

22 Support for Coherency
- Hash Table: 2-bit
- Total entries: 512
- Details: http://www.ece.rochester.edu/~mihuang/PAPERS/hpca06tr.pdf
[Diagram labels: Table 1, Table 2, Access bit, Invalidation bit]

23 Read-Only Data Loads
- Alpha COFF binary header
  - Global pointer (GP)
  - Read-only sections
- Access address calculation
- Algorithm - extended constant propagation
- Example values: gp=0x120022000; Read-Only Section Start: 0x120023000, End: 0x120024000
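The access address calculation for read-only data loads can be sketched in C using the example values on this slide. The `Section` type, the function name, and the treatment of the section as the half-open range [start, end) are assumptions; the slide only gives the gp value and the section bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical section descriptor; bounds are treated as [start, end). */
typedef struct {
    uint64_t start;
    uint64_t end;
} Section;

/* Returns true if a load of `size` bytes at gp + disp falls entirely
 * inside the read-only section, given a statically known gp value. */
bool is_read_only_load(uint64_t gp, int64_t disp, uint64_t size, Section ro) {
    assert(ro.start <= ro.end && size > 0);
    uint64_t addr = gp + (uint64_t)disp;  /* wraparound add handles negative disp */
    return addr >= ro.start && addr <= ro.end && size <= ro.end - addr;
}
```

With gp = 0x120022000 and the section 0x120023000 to 0x120024000, an 8-byte load at displacement 0x1000 from gp lands at the section start and is classified as read-only, while a load at displacement 0 is not.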

