Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses. Session C1, Tuesday 10:40 AM. Vivek Seshadri.

Presentation transcript:

Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses. Session C1, Tuesday 10:40 AM. Vivek Seshadri, Thomas Mullins, Amirali Boroumand, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry

Problem: Non-unit strided accesses

Today’s DRAM: READ (cache line). Inefficiency: high latency, wasted bandwidth and cache space.

Gather-Scatter DRAM: READ (Pattern 0, Pattern 1).

Example result: In-memory databases. Best of both row store and column store layouts.
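The inefficiency named on the slide can be made concrete with a little arithmetic. The sketch below is not from the talk and does not model the Gather-Scatter DRAM mechanism itself. It only counts how many cache lines a conventional memory system fetches when a scan reads one field per record. The 64-byte cache line, the 8-byte field, and the 64-byte record are assumed values chosen for illustration, and the function name `strided_scan_cost` is hypothetical.

```python
def strided_scan_cost(num_records, record_bytes, field_bytes, line_bytes=64):
    """Return (cache lines fetched, fraction of fetched bytes actually used)
    when reading the first field of every record in a contiguous array.

    All sizes are illustrative assumptions, not figures from the slides.
    """
    lines = set()
    for i in range(num_records):
        start = i * record_bytes          # byte address of the wanted field
        end = start + field_bytes - 1     # last byte of the wanted field
        # Every cache line overlapped by the field must be fetched in full.
        for line in range(start // line_bytes, end // line_bytes + 1):
            lines.add(line)
    fetched_bytes = len(lines) * line_bytes
    useful_bytes = num_records * field_bytes
    return len(lines), useful_bytes / fetched_bytes


# Row-store-like layout: 64-byte records, so the scan has a 64-byte stride.
print(strided_scan_cost(8, 64, 8))   # (8, 0.125)

# Column-store-like layout: the same fields packed contiguously (unit stride).
print(strided_scan_cost(8, 8, 8))    # (1, 1.0)
```

Under these assumptions the strided scan fetches eight cache lines and uses one eighth of the bytes it brings in, while the packed layout needs a single line. That gap is the wasted bandwidth and cache space the slide refers to, and it is why the choice between row store and column store layouts matters for in-memory databases.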
