Page-based Commands for DRAM Systems. Aamer Jaleel, Brinda Ganesh, Lei Zong.


1 Page-based Commands for DRAM Systems. Aamer Jaleel, Brinda Ganesh, Lei Zong

2 Outline: Memory System Overview; Experiment Setup; Page-level Access Measurements; Solution; Expected Speedup

3 Memory Access Time
Hierarchy: CPU, L1, L2, MC, DRAM
CPU Access Time (cycles):
  L1      3
  L2      8
  DRAM    181
Data for 1.8 GHz Opteron (www.aceshardware.com/)

4 Memory Access
Applications: Initialization; Data Movement; Stream operations
Operating System: Task Creation; System Calls; Page Allocation, Management
Library Routines Used: Memset, Clear User (MEMZERO); Memcpy, Copy from User, Copy To User

5 Experiment Setup
Workstation based: 2.4 GHz P4 (wonko.sca.umd.edu); 750 MHz PIII (majikthise.eng.umd.edu); 900 MHz PIII (jaleel.eng.umd.edu); Bochs x86 emulator
Operating System: Mandrake 9.0; Linux Kernel v2.4.19
Applications: SPEC2000 Integer benchmarks using glibc-2.2.5

6 Using In The Resources
[Figure: system diagram. Hardware (HW): CPU core, IL1, DL1, unified L2, memory controller, DRAM. Software (SW): user process (UPROC), system libraries, Mandrake Linux 9.0 kernel. User-level and kernel-level routines are marked; Bochs runs the same OS.]

7 MEMSET – SPECINT 2000

8 MEMSET Overhead – SPECINT 2000

9 MEMCPY – SPECINT 2000

10 MEMCPY Overhead – SPECINT 2000

11 OS Behavior: MEMZERO/MEMCPY SHOW LIVE DATA

12 Page based Commands
SET_PAGE #(CONS), #ADDR, #(SIZE): ADDR ← CONS
COPY_PAGE #DST, #SRC, #(SIZE): DST ← SRC
Page level stream operations: A ← B + C; A ← B - C
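The command semantics on this slide can be modeled in software. The sketch below is only an illustration: `set_page`, `copy_page`, and `add_page` are hypothetical C stand-ins for the proposed SET_PAGE, COPY_PAGE, and page-level stream commands, and the 4 KB `PAGE_SIZE` is an assumption borrowed from the speedup estimate later in the deck. In the proposal each of these would be a single command carried out by the memory system, not a CPU loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* assumed page size (4 KB, as in the speedup estimate) */

/* SET_PAGE #(CONS), #ADDR, #(SIZE): ADDR <- CONS */
static void set_page(unsigned char cons, unsigned char *addr, size_t size) {
    assert(addr != NULL && size <= PAGE_SIZE);
    memset(addr, cons, size);
}

/* COPY_PAGE #DST, #SRC, #(SIZE): DST <- SRC (regions must not overlap) */
static void copy_page(unsigned char *dst, const unsigned char *src, size_t size) {
    assert(dst != NULL && src != NULL && size <= PAGE_SIZE);
    memcpy(dst, src, size);
}

/* Page-level stream operation A <- B + C, applied byte by byte (wraps mod 256) */
static void add_page(unsigned char *a, const unsigned char *b,
                     const unsigned char *c, size_t size) {
    assert(a != NULL && b != NULL && c != NULL && size <= PAGE_SIZE);
    for (size_t i = 0; i < size; i++) {
        a[i] = (unsigned char)(b[i] + c[i]);
    }
}
```

The byte-wise element width in `add_page` is a simplification; the slides do not say what element size the stream operations would use.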

13 Issues w/Page Based Commands
Data partially present in cache? Cache-memory consistency issues:
  SET_PAGE: add logic in cache to latch in data; if cache block dirty, write to memory
  COPY_PAGE: if destination in cache, evict
Address is not page aligned: will require accessing 2 rows
[Figure: instruction stream with SET_PAGE #(CONS), #ADDR, #(SIZE) and COPY_PAGE #DST, #SRC, #(SIZE) commands interleaved among ordinary instructions.]
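The alignment issue can be made concrete with a small calculation. The helper below is a hypothetical sketch (the name `rows_touched` is not from the slides), and it assumes that a DRAM row is the same size as a page, which is how the slides appear to treat it. It counts how many rows a byte range overlaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of page-sized rows overlapped by the byte range [addr, addr + size).
   Assumes row size == page_size; this is an illustrative assumption. */
static size_t rows_touched(uintptr_t addr, size_t size, size_t page_size) {
    assert(page_size > 0);
    if (size == 0) {
        return 0;
    }
    uintptr_t first = addr / page_size;              /* row holding the first byte */
    uintptr_t last = (addr + size - 1) / page_size;  /* row holding the last byte */
    return (size_t)(last - first + 1);
}
```

Under that assumption, a page-sized operation starting on a page boundary touches one row, while the same operation starting anywhere else touches two, which matches the "2 rows" worst case on the slide.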

14 How much data is actually in the cache?
Function                 % Hit Rate (Boot + Halt)    % Hit Rate (SPEC workload)
Memset                   7.23                        0.23
Memcpy (Source)          7.88                        10.53
Memcpy (Destination)     < 0.01

15 Page Based MEMSET
void *memset(void *s, int c, size_t n)
Current Implementation:
  end ← s + n
  while (s < end)
    MEM[s++] ← c
Proposed Implementation:
  end ← s + n
  while (n >= PageSize)
    SET_PAGE (c), s, PageSize
    n ← n - PageSize
    s ← s + PageSize
  while (s < end)
    MEM[s++] ← c
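A runnable sketch of the proposed loop follows. It is an illustration, not the authors' implementation: `set_page` is a hypothetical software stand-in for the SET_PAGE command, `page_memset` and the `set_page_calls` counter are names introduced here, and the 4 KB `PAGE_SIZE` is assumed. The sketch issues one page-sized command per iteration and advances the pointer by one page each time, then finishes the remaining tail byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* assumed page size (4 KB) */

static size_t set_page_calls = 0;  /* counts emulated SET_PAGE commands */

/* Hypothetical stand-in for SET_PAGE: fills `size` bytes at `addr` with `c`. */
static void set_page(int c, unsigned char *addr, size_t size) {
    assert(size <= PAGE_SIZE);
    memset(addr, c, size);
    set_page_calls++;
}

/* Page-based memset: whole pages go through set_page,
   the remaining tail is written one byte at a time. */
static void *page_memset(void *s, int c, size_t n) {
    unsigned char *p = s;
    unsigned char *end = p + n;
    while (n >= PAGE_SIZE) {
        set_page(c, p, PAGE_SIZE);
        n -= PAGE_SIZE;
        p += PAGE_SIZE;
    }
    while (p < end) {
        *p++ = (unsigned char)c;
    }
    return s;
}
```

For a buffer of two pages plus 50 bytes, this issues two emulated SET_PAGE commands and writes the last 50 bytes in the scalar loop. The sketch does not align the start address to a page boundary first; the slides leave unaligned accesses to the hardware, at the cost of touching two rows.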

16 Expected Speedup
Avg Memset Time for a 4 KB page with 128-byte cache line size: Row Read Time * #Rows + Misc = 100 ns * 32 + X = 3.2 µs + X
Measured Average: 4 µs
Expected Time Using Page Based CMDs: Max # Rows/page * Row Read Time + Cache Coherence Logic + Misc = 2 * 100 ns + X = 200 ns + X
Expected Speedup: >= 50% (Approximation)
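The arithmetic on this slide can be checked with a few lines of C. The helper names below are hypothetical, all times are in nanoseconds, and the overhead term X is left as a parameter because the slide does not give its value.

```c
#include <assert.h>

/* Cache lines needed to cover one page (4096 / 128 = 32 on the slide). */
static unsigned lines_per_page(unsigned page_bytes, unsigned line_bytes) {
    assert(line_bytes > 0 && page_bytes % line_bytes == 0);
    return page_bytes / line_bytes;
}

/* Estimated time in ns: accesses * row read time + misc overhead X. */
static unsigned long estimate_ns(unsigned accesses, unsigned row_read_ns,
                                 unsigned long misc_ns) {
    return (unsigned long)accesses * row_read_ns + misc_ns;
}

/* Percent of time saved going from before_ns to after_ns (rounded down). */
static unsigned long percent_saved(unsigned long before_ns, unsigned long after_ns) {
    assert(before_ns > 0 && after_ns <= before_ns);
    return (before_ns - after_ns) * 100 / before_ns;
}
```

With X = 0 the two estimates are `estimate_ns(32, 100, 0)` = 3200 ns and `estimate_ns(2, 100, 0)` = 200 ns. As a purely illustrative assumption, taking X to be the gap between the measured 4 µs and the 3.2 µs estimate (800 ns), and assuming the same X applies to the page-based case with no extra cache coherence cost, the estimates become 4000 ns and 1000 ns, a 75% reduction, which is in line with the slide's ">= 50%" approximation.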

17 Conclusions
Memory is frequently accessed at page granularity.
Use page based commands to replace existing routines that perform work on a cache line basis.
If implemented, we expect significant speedups.

18 Related Work
IRAM: on-chip DRAM. Advantage: energy efficient, eliminates much of the off-chip memory access. Disadvantage: not much performance increase, doesn't work with conventional microprocessors.
Active Pages: bring computation to DRAM; break the memory into fixed-size pages and add reconfigurable logic to DRAM.
Elimination of compulsory cache misses due to dynamic initialization.

