“ NAHALAL : Cache Organization for Chip Multiprocessors ” New LSU Policy By : Ido Shayevitz and Yoav Shargil Supervisor: Zvika Guz.

1 “ NAHALAL : Cache Organization for Chip Multiprocessors ” New LSU Policy By : Ido Shayevitz and Yoav Shargil Supervisor: Zvika Guz

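2 NAHALAL ARCHITECTURE The NAHALAL architecture defines how the memory banks of the L2 cache are organized. Each processor has a private "backyard" bank, and all processors share one small bank. The architecture is based on the hot shared line phenomenon: a small set of lines shared among the processors receives a disproportionate share of the accesses.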

3 LSU Improvement
• Placement policy
• Replacement policy from private bank: LRU
• Replacement policy from public bank: LRU in the original NAHALAL, replaced here by LSU
The LSU policy selects the Least Shared Used line to evict from the public bank.

4 LSU Implementation
• A shift register with N cells for each line.
• Each cell in the shift register holds a CPU number.
• On eviction by CPUi: for each shift register, XOR each cell with the ID of CPUi. The line whose shift register produces 0 is the one chosen. If none produces 0, fall back to regular LRU.
• To reduce memory overhead, define N = 4. Therefore 2^14 * 4 * 3 = 0.1875 MB → 18.75% memory overhead.
Simple, short-time algorithm in HW.
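The selection rule on this slide can be sketched in software as follows. This is a minimal illustration and not the authors' hardware or Simics code. The names `Line`, `touch`, `only_used_by` and `choose_victim` are hypothetical, and three points are assumptions the slide does not settle: a register counts as "producing 0" only when all N cells match the evicting CPU, a register that is not yet full never matches, and ties among matching lines are broken by LRU.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

N_CELLS = 4  # shift-register depth per line (N = 4 on the slide)


@dataclass
class Line:
    tag: int
    last_used: int = 0  # timestamp used by the LRU fallback
    # The last N_CELLS CPU ids that touched the line; the oldest id falls out.
    history: Deque[int] = field(default_factory=lambda: deque(maxlen=N_CELLS))

    def touch(self, cpu_id: int, now: int) -> None:
        """Record an access: shift the CPU id in and refresh the LRU stamp."""
        self.history.append(cpu_id)
        self.last_used = now


def only_used_by(line: Line, cpu_id: int) -> bool:
    """True when every cell XORed with cpu_id gives 0 (assumed reading)."""
    # Assumption: a register that is not yet full does not count as a match.
    return len(line.history) == N_CELLS and all(
        (cell ^ cpu_id) == 0 for cell in line.history
    )


def choose_victim(lines: List[Line], cpu_id: int) -> Line:
    """Pick the line to evict from the public bank on behalf of cpu_id."""
    candidates = [ln for ln in lines if only_used_by(ln, cpu_id)]
    # Assumption: ties among candidates are broken by LRU.
    # With no candidate at all, this is plain LRU over every line.
    pool = candidates or lines
    return min(pool, key=lambda ln: ln.last_used)


# Three lines: one shared by four CPUs, one touched only by CPU 1,
# and one touched mostly by CPU 2.
lines = [Line(tag=0), Line(tag=1), Line(tag=2)]
trace = [(0, [0, 1, 2, 3]), (1, [1, 1, 1, 1]), (2, [2, 2, 2, 0])]
now = 0
for tag, cpus in trace:
    for cpu in cpus:
        now += 1
        lines[tag].touch(cpu, now)

print(choose_victim(lines, cpu_id=1).tag)  # 1: only CPU 1 used that line
print(choose_victim(lines, cpu_id=3).tag)  # 0: no match, so LRU picks the oldest
```

In the second call no register matches CPU 3, so the sketch falls back to LRU and evicts the shared line with tag 0, which is the case the LSU rule is meant to make rarer.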

5 Simulation Structure in Simics Using a Python script we defined:

6 Writing Benchmarks Benchmarks are written in the simulated target console:

7 Writing Benchmarks
• Threads are created with the pthread library.
• Each thread is bound to a CPU using the sched library.
• Parallel code is written in the benchmark itself.
• OS code and pthread library code also add parallel activity.
• Each benchmark is run twice: first without LSU, then with LSU.

8 Collecting Statistics
Cache statistics: l2c
-----------------
Total number of transactions: 610349
Total memory stall time: 31402835
Total memory hit stall time: 28251635
Device data reads (DMA): 0
Device data writes (DMA): 0
Uncacheable data reads: 17
Uncacheable data writes: 30738
Uncacheable instruction fetches: 0
Data read transactions: 403488
Total read stall time: 17488735
Total read hit stall time: 14383135
Data read remote hits: 0
Data read misses: 10352
Data read hit ratio: 97.43%
Instruction fetch transactions: 0
Instruction fetch misses: 0
Data write transactions: 176106
Total write stall time: 4687600
Total write hit stall time: 4687600
Data write remote hits: 0
Data write misses: 0
Data write hit ratio: 100.00%
Copy back transactions: 0
Number of replacments in the middle (NAHALAL): 557
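The metrics quoted in the results (average stall time per transaction, share of transactions that cause a replacement in the middle) can be derived from a dump like the one above. The sketch below is one hedged way to do so. `parse_stats` and `SAMPLE` are hypothetical names, the parser only assumes "name: value" lines, and the counter names are copied verbatim from the dump, including its spelling.

```python
def parse_stats(dump: str) -> dict:
    """Turn 'name: value' lines into a dict of floats, skipping other lines."""
    stats = {}
    for raw in dump.splitlines():
        name, sep, value = raw.rpartition(":")
        if not sep:
            continue  # separator lines such as '-----' carry no colon
        try:
            stats[name.strip()] = float(value.strip().rstrip("%"))
        except ValueError:
            continue  # header lines such as 'Cache statistics: l2c'
    return stats


# A subset of the counters shown on the slide.
SAMPLE = """\
Cache statistics: l2c
-----------------
Total number of transactions: 610349
Total memory stall time: 31402835
Data read transactions: 403488
Data read misses: 10352
Number of replacments in the middle (NAHALAL): 557
"""

stats = parse_stats(SAMPLE)
transactions = stats["Total number of transactions"]

avg_stall = stats["Total memory stall time"] / transactions
middle_pct = 100.0 * stats["Number of replacments in the middle (NAHALAL)"] / transactions
read_hit_pct = 100.0 * (1.0 - stats["Data read misses"] / stats["Data read transactions"])

print(f"average stall per transaction: {avg_stall:.2f}")      # 51.45
print(f"replacements in the middle:    {middle_pct:.2f}%")    # 0.09%
print(f"data read hit ratio:           {read_hit_pct:.2f}%")  # 97.43%
```

The recomputed read hit ratio agrees with the 97.43% printed in the dump, and the 0.09% replacement share appears consistent with the with-LSU figure reported in the results.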

9 Results
1. Improvement of 54% in average stall time per transaction.
2. Improvement of 61% in average stall time per transaction.
3. Without LSU, 8.375% of the transactions cause a replacement in the middle; with LSU, only 0.09%. Improvement of ∆ = 8.28 percentage points.
4. Without LSU, 8.75% of the transactions cause a replacement in the middle; with LSU, only 0.02%. Improvement of ∆ = 8.73 percentage points.
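The ∆ values in results 3 and 4 are differences between two percentages, so they are percentage points and not relative improvements. The short sketch below recomputes both views from the figures on the slide; the helper name `reduction` is hypothetical.

```python
def reduction(before_pct: float, after_pct: float):
    """Return (percentage-point drop, relative drop in percent)."""
    points = before_pct - after_pct
    relative = 100.0 * points / before_pct
    return points, relative


# (label, % of transactions replaced without LSU, % with LSU), from the slide.
cases = [("result 3", 8.375, 0.09), ("result 4", 8.75, 0.02)]

for label, before, after in cases:
    points, relative = reduction(before, after)
    print(f"{label}: {points:.3f} points, {relative:.1f}% relative reduction")
```

For result 3 the difference is 8.285 points, which the slide reports as 8.28. In relative terms both cases remove close to 99% or more of the replacements in the middle.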

10 Conclusions
The LSU policy significantly improves the average stall time per transaction. Therefore, the LSU policy implemented in the NAHALAL architecture significantly reduces the number of cycles a benchmark takes.
The LSU policy significantly reduces the number of replacements in the middle. Therefore, the LSU policy implemented in the NAHALAL architecture is better at keeping the hot shared lines in the public bank.
In our implementation, LRU is activated if LSU does not find a line. Therefore, the LSU policy as we implemented it is always preferable to LRU.

