MacSim Tutorial (In ISCA-39, 2012)


1 MacSim Tutorial (In ISCA-39, 2012)

2 Front-end: thread fetch policies, branch predictor. Memory system: software and hardware prefetcher, cache studies (sharing, inclusion), DRAM scheduling, interconnection studies. Misc.: power model.

3 [Figure: Trace Generator (PIN, GPUOcelot) feeding the MacSim Frontend, Hardware Prefetcher, and Memory System.] Software prefetch instructions: PTX → prefetch, prefetchu; x86 → prefetcht0, prefetcht1, prefetchnta. Hardware prefetch requests: stream, stride, GHB, … Many-thread Aware Prefetching Mechanism [Lee et al. MICRO-43, 2010]. When prefetching works, when it doesn’t, and why [Lee et al. ACM TACO, 2012].

4 |Cache studies: sharing, inclusion property |On-chip interconnection studies TLP-Aware Cache Management Policy [Lee and Kim, HPCA-18, 2012] [Figure: two cache organizations, private caches connected by an interconnection, and a shared cache reached through an interconnection.]

5 |Heterogeneous link configuration. Different topologies. [Figure: ring network of GPU, CPU, L3, and MC nodes; node groupings labeled CCMM and CCGG; example node ordering: C0 L3 G0 M1 C1 C2 G1 G2 M0 L3.] On-chip Interconnection for CPU-GPU Heterogeneous Architecture [Lee et al. under review]

6 Effect of Instruction Fetch and Memory Scheduling on GPU Performance [Lakshminarayana and Kim, LCA-GPGPU, 2010] [Figure: Trace Generator (GPUOcelot), Frontend, Execution, DRAM.] Instruction fetch policies: RR, ICOUNT, FAIR, LRF, … DRAM scheduling policies: FCFS, FRFCFS, FAIR, …

7 DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function [Lakshminarayana et al. IEEE CAL, 2011]

[Figure: Core-0 and Core-1 issue requests to a DRAM controller in front of a DRAM bank; each core has queues W0 to W3 holding row-hit (RH) and row-miss (RM) requests.]

Potential of requests from Core-0 = $|W_0|^\alpha + |W_1|^\alpha + |W_2|^\alpha + |W_3|^\alpha = 4^\alpha + 3^\alpha + 5^\alpha$ ($\alpha < 1$)

Reduction in potential if:
|a row hit from a queue of length $L$ is serviced next → $L^\alpha - (L - 1)^\alpha$
|a row miss from a queue of length $L$ is serviced next → $L^\alpha - (L - 1/m)^\alpha$
where $m$ = cost of servicing a row miss / cost of servicing a row hit.

Tolerance(Core-0) < Tolerance(Core-1) → select Core-0. Servicing a row hit from W1 (of Core-0) results in the greatest reduction in potential, so service row hits from W1 next.
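The potential-function arithmetic on slide 7 can be checked with a small sketch. This is not MacSim code: the function names are hypothetical, and the values α = 0.5 and m = 3 are assumptions chosen only for illustration (the slide states just α < 1). The queue lengths 4, 3, and 5 are mapped to W0, W1, and W2 by reading the slide's sum in order, which is also an assumption.

```python
# Illustrative sketch of the potential function from slide 7.
# ALPHA and M are assumed values; the slide only requires alpha < 1.
ALPHA = 0.5  # assumption
M = 3.0      # assumption: cost of a row miss / cost of a row hit


def potential(queue_lengths, alpha=ALPHA):
    """Sum of |W_i|^alpha over all request queues of one core."""
    return sum(length ** alpha for length in queue_lengths)


def reduction(length, row_hit, alpha=ALPHA, m=M):
    """Drop in potential when one request from a queue of this length is serviced.

    A row hit shortens the queue by 1; a row miss counts as 1/m of a request.
    """
    step = 1.0 if row_hit else 1.0 / m
    return length ** alpha - (length - step) ** alpha


# Queue lengths for Core-0, assumed from the slide's sum 4^a + 3^a + 5^a.
core0_queues = {"W0": 4, "W1": 3, "W2": 5}

total = potential(core0_queues.values())

# Pick the queue whose next row hit reduces the potential the most.
best = max(core0_queues, key=lambda q: reduction(core0_queues[q], row_hit=True))
print(best, round(total, 3))
```

Because $L^\alpha$ is concave for α < 1, the shortest non-empty queue yields the largest reduction per row hit, which matches the slide's choice of W1 under these assumed lengths.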

8 |Verifying simulator and GTX580 |Modeling X86-CPU power |Modeling GPU power Still ongoing research

9 2012 ~ 2013: Power/Energy Model, ARM Architecture, Mobile Platform, OpenGL Program

