Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses. Session C1, Tuesday 10:40 AM. Vivek Seshadri.

Presentation transcript:

Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses
Session C1, Tuesday 10:40 AM
Vivek Seshadri, Thomas Mullins, Amirali Boroumand, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry

Problem: Non-unit strided accesses

Problem: Non-unit strided accesses
Today's DRAM (figure labels: READ, Cache Line)
Inefficiency: high latency, wasted bandwidth and cache space
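The inefficiency named on this slide can be made concrete with a small calculation. The sketch below is not from the talk: the function name `strided_access_cost`, the 64-byte cache line, and the 8-byte element size are all assumptions chosen for illustration. It counts how many cache lines a strided read touches and what fraction of the fetched bytes is actually used.

```python
def strided_access_cost(num_elements, stride_bytes, element_bytes=8, line_bytes=64):
    """Return (lines_fetched, useful_fraction) for a strided read.

    Hypothetical model: accesses start at address 0, and every cache line
    that contains any requested byte is fetched in full.
    """
    lines = set()
    for i in range(num_elements):
        start = i * stride_bytes
        end = start + element_bytes - 1
        # An element may straddle more than one line; record each one.
        for line in range(start // line_bytes, end // line_bytes + 1):
            lines.add(line)
    fetched_bytes = len(lines) * line_bytes
    useful_bytes = num_elements * element_bytes
    return len(lines), useful_bytes / fetched_bytes


# Unit stride: 8 contiguous 8-byte values fit in one 64-byte line.
unit = strided_access_cost(8, stride_bytes=8)      # (1, 1.0)

# Stride of 64 bytes: each value lives in a different line.
strided = strided_access_cost(8, stride_bytes=64)  # (8, 0.125)

print(unit, strided)
```

Under these assumed sizes, the strided read fetches eight times as many lines as the unit-stride read and uses only one eighth of the bytes it brings in. This corresponds to the wasted bandwidth and cache space named on the slide.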

Gather-Scatter DRAM (figure labels: READ, Pattern 0, Pattern 1)

Example result: In-memory databases
Best of both row store and column store layouts
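The row-store versus column-store trade-off behind this result can be illustrated by counting cache lines touched under each layout. The sketch below models only the two conventional layouts, not Gather-Scatter DRAM itself. The table shape (64 records of 8 fields), the 8-byte field size, the 64-byte cache line, and all function names are assumptions made for the example.

```python
LINE_BYTES = 64   # assumed cache line size
FIELD_BYTES = 8   # assumed size of one field


def row_store_addr(record, field, num_fields):
    """Byte address when all fields of a record are stored contiguously."""
    return (record * num_fields + field) * FIELD_BYTES


def column_store_addr(record, field, num_records):
    """Byte address when all values of a field are stored contiguously."""
    return (field * num_records + record) * FIELD_BYTES


def lines_touched(addresses):
    """Number of distinct cache lines covering the given field addresses."""
    return len({addr // LINE_BYTES for addr in addresses})


num_records, num_fields = 64, 8

# Transaction-style access: every field of one record (record 5).
txn_row = lines_touched(row_store_addr(5, f, num_fields) for f in range(num_fields))
txn_col = lines_touched(column_store_addr(5, f, num_records) for f in range(num_fields))

# Analytics-style access: one field (field 0) of every record.
scan_row = lines_touched(row_store_addr(r, 0, num_fields) for r in range(num_records))
scan_col = lines_touched(column_store_addr(r, 0, num_records) for r in range(num_records))

print("transaction:", txn_row, "lines (row store),", txn_col, "lines (column store)")
print("field scan: ", scan_row, "lines (row store),", scan_col, "lines (column store)")
```

With these assumed sizes, the row store serves the transaction from 1 line but needs 64 lines for the field scan. The column store needs 8 lines for each. Each layout is efficient for one access pattern only, which is the gap the "best of both" claim on the slide refers to.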
