Page-based Commands for DRAM Systems
Aamer Jaleel, Brinda Ganesh, Lei Zong



Outline
Memory System Overview
Related work
Experiment setup
Page level access measurements
Solution
Expected Speedup

Processor-Memory Gap
µProc: 60% / year. Doubles every 1.5 years
DRAM: 9% / year. Doubles every 10 years
Processor-Memory Performance Gap: Grows 50% / year
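The growth figures on this slide can be cross-checked with a short compound-growth calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes constant annual growth. Under that assumption, 60% per year gives a doubling time of about 1.5 years, 9% per year gives about 8 years (so the slide's 10 years reads as a rounded figure), and the ratio of the two grows by about 47% per year, close to the quoted 50%.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def gap_growth_per_year(cpu_growth: float, dram_growth: float) -> float:
    """Annual growth of the processor/DRAM performance ratio."""
    return (1 + cpu_growth) / (1 + dram_growth) - 1


# Growth rates taken from the slide: 60% / year and 9% / year.
cpu_doubling = doubling_time_years(0.60)
dram_doubling = doubling_time_years(0.09)
gap = gap_growth_per_year(0.60, 0.09)

print(f"Processor doubles every {cpu_doubling:.2f} years")
print(f"DRAM doubles every {dram_doubling:.2f} years")
print(f"Gap grows {gap:.1%} per year")
```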

Memory Access Time
Diagram labels: Core, L1, L2, MC, DRAM, CPU

Level    Access Time (cycles)
L1       3
L2       8
DRAM     181

Data for 1.8 GHz Opteron (www.aceshardware.com/)

Large Size Memory Accesses
Applications
  – Initialization
  – Data Movement
  – Stream operations
Operating System
  – Task Creation
  – System Calls
  – Page Allocation, Management
Functions that would use them
  – Memset, Clear User
  – Memcpy, Copy from User, Copy To User

Experiment Setup
Workstation based
  – 2.4 GHz P4 (Wonko)
  – 750 MHz PIII (Majikthise)
  – 900 MHz PIII (Jaleel)
Bochs x86 emulator
Operating System
  – Linux Kernel v
Applications
  – SPEC2000 Integer benchmarks using glibc-2.2.5

Memset: Count

Memset: Access Size

Memset: % Overhead

Memcpy: Count

Memcpy: Access Size

Memcpy: % Overhead

OS: Memset / Clear User
Real-Time Plot
Behavior over Time
Frequency of operation
Access Size
Operation Duration
Averages

OS: Memcpy / Copy User
Real-Time Plot
Behavior over Time
Frequency of operation
Access Size
Operation Duration
Averages

Page based Commands
Set Page
  – A ← constant
Copy Page
  – A ← B
Page level Arithmetic operations
  – A ← B + C
  – A ← B - C
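The intended semantics of these commands can be illustrated with a small software model. This is a minimal sketch, not the hardware design from the slides: the `PagedMemory` class and its method names are hypothetical, pages are assumed to be 4 kB and page-aligned, and the arithmetic is assumed to be byte-wise modulo 256 because the slides do not state an operand width.

```python
PAGE_SIZE = 4096  # 4 kB pages, matching the page size shown on the slides


class PagedMemory:
    """Toy byte-addressable memory with whole-page commands (hypothetical model)."""

    def __init__(self, num_pages: int) -> None:
        self.data = bytearray(num_pages * PAGE_SIZE)

    def _page(self, addr: int) -> slice:
        # Page commands in this model only accept page-aligned, in-range addresses.
        if addr % PAGE_SIZE != 0:
            raise ValueError("page commands need a page-aligned address")
        if not 0 <= addr <= len(self.data) - PAGE_SIZE:
            raise ValueError("address outside memory")
        return slice(addr, addr + PAGE_SIZE)

    def set_page(self, a: int, value: int) -> None:
        """A <- constant"""
        self.data[self._page(a)] = bytes([value]) * PAGE_SIZE

    def copy_page(self, a: int, b: int) -> None:
        """A <- B"""
        self.data[self._page(a)] = self.data[self._page(b)]

    def add_pages(self, a: int, b: int, c: int) -> None:
        """A <- B + C, byte-wise modulo 256 (assumed operand width)"""
        src_b, src_c = self.data[self._page(b)], self.data[self._page(c)]
        self.data[self._page(a)] = bytes((x + y) & 0xFF for x, y in zip(src_b, src_c))

    def sub_pages(self, a: int, b: int, c: int) -> None:
        """A <- B - C, byte-wise modulo 256 (assumed operand width)"""
        src_b, src_c = self.data[self._page(b)], self.data[self._page(c)]
        self.data[self._page(a)] = bytes((x - y) & 0xFF for x, y in zip(src_b, src_c))


mem = PagedMemory(num_pages=4)
mem.set_page(0x1000, 1)
mem.set_page(0x2000, 2)
mem.add_pages(0x0000, 0x1000, 0x2000)   # page 0 <- page 1 + page 2
mem.copy_page(0x3000, 0x0000)           # page 3 <- page 0
```

Each command touches exactly one page per operand, which is what lets a memory controller treat it as a single request rather than thousands of individual stores.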

Page based Commands 4 kB DRAM SETPAGE ZERO, 0x04000

Page based Commands 4 kB DRAM SETPAGE ZERO, 0x04000 Cache 128 bytes

Page based Commands: Issue
4 kB DRAM
SETPAGE ZERO, 0x04000
Cache, 128 bytes
How do we ensure Memory and Cache Consistency?
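One way to see the consistency problem is to model it: a 4 kB page covers 32 cache lines of 128 bytes, and any of them may hold a stale copy once DRAM is rewritten behind the cache. The sketch below assumes an invalidate-on-SETPAGE policy, which is only one possible answer and is not stated on the slides. The function names and the dictionary-based cache are hypothetical.

```python
PAGE_SIZE = 4096   # 4 kB page, as on the slide
LINE_SIZE = 128    # 128-byte cache line, as on the slide


def read_byte(dram: bytearray, cache: dict, addr: int) -> int:
    """Read through the cache, filling the line from DRAM on a miss."""
    line_addr = addr - addr % LINE_SIZE
    if line_addr not in cache:
        cache[line_addr] = bytes(dram[line_addr:line_addr + LINE_SIZE])
    return cache[line_addr][addr - line_addr]


def set_page(dram: bytearray, cache: dict, value: int, page_addr: int) -> int:
    """Fill one page in DRAM and drop cached copies of its lines.

    Assumed policy: invalidate. Dirty data can be discarded because the
    whole page is overwritten. Returns the number of lines invalidated.
    """
    invalidated = 0
    for line_addr in range(page_addr, page_addr + PAGE_SIZE, LINE_SIZE):
        if cache.pop(line_addr, None) is not None:
            invalidated += 1
    dram[page_addr:page_addr + PAGE_SIZE] = bytes([value]) * PAGE_SIZE
    return invalidated


dram = bytearray(b"\xff" * (8 * PAGE_SIZE))
cache = {}
before = read_byte(dram, cache, 0x4010)     # caches the line at 0x4000
dropped = set_page(dram, cache, 0, 0x4000)  # SETPAGE ZERO, 0x04000
after = read_byte(dram, cache, 0x4010)      # refetched from DRAM
```

Without the invalidation loop, the second read would return the stale cached value instead of the zero now held in DRAM.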

How much data is actually in the cache?
Function | % Hit Rate (Boot + Halt) | % Hit Rate (SPEC workload)
Memset | 7.23% | 0.23
Memcpy (Source): %
Memcpy (Destination): < 0.01 %

Page based Commands 4 kB DRAM SETPAGE ZERO, 0x04000

Page based Commands Issue SETPAGE ZERO, 0x kB DRAM 4 kB DRAM level Page Fragmentation

Page based Commands Issue SETPAGE ZERO, 0x kB DRAM 4 kB DRAM level Page Fragmentation Maximum number of rows a page can occupy is 2
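The two-row bound can be checked with a small calculation. The sketch below assumes a simple linear mapping from physical address to DRAM row and a hypothetical row size of 8 kB. Neither is given on the slides, and real controllers interleave addresses across banks and channels. Under those assumptions, any 4 kB region that is no larger than a row touches at most two rows, however it is aligned.

```python
PAGE_SIZE = 4096  # 4 kB page, as on the slides
ROW_SIZE = 8192   # hypothetical DRAM row size, not taken from the slides


def rows_touched(start_addr: int, length: int = PAGE_SIZE,
                 row_size: int = ROW_SIZE) -> list:
    """DRAM rows covered by [start_addr, start_addr + length) under a linear mapping."""
    first_row = start_addr // row_size
    last_row = (start_addr + length - 1) // row_size
    return list(range(first_row, last_row + 1))


aligned = rows_touched(0x4000)     # fits inside one row
straddling = rows_touched(0x1800)  # crosses a row boundary
print(aligned, straddling)
```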

Solution
Hardware at Cache Level
Ability to map s/w pages to h/w pages

Expected Speedup I
Current Implementation
  Memset(Address, Length, SetValue)
    EndAddr ← Address + Length
    While (Address < EndAddr)
      Mem[Address] ← SetValue
      Address ← Address + 1
Proposed Implementation
  While (Length >= PageSize)
    SetPage(SetValue, Address)
    Length ← Length – PageSize
    Address ← Address + PageSize
  Call Memset(Address, Length, SetValue)
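The two implementations can be written out as runnable code to compare how many operations each issues. This is a sketch under stated assumptions: `set_page` is a software stand-in for the proposed SETPAGE command, all function names are hypothetical, and the head-alignment step is an addition that the slide's pseudocode does not show (it is needed here only because the stand-in accepts page-aligned addresses).

```python
PAGE_SIZE = 4096  # 4 kB pages, as on the slides


def set_page(mem: bytearray, value: int, address: int) -> None:
    """Software stand-in for the proposed SETPAGE command (aligned pages only)."""
    assert address % PAGE_SIZE == 0, "SETPAGE needs a page-aligned address"
    mem[address:address + PAGE_SIZE] = bytes([value]) * PAGE_SIZE


def memset_bytes(mem: bytearray, address: int, length: int, value: int) -> None:
    """Current implementation: one store per byte."""
    end = address + length
    while address < end:
        mem[address] = value
        address += 1


def memset_paged(mem: bytearray, address: int, length: int, value: int) -> int:
    """Proposed implementation; returns the number of page commands issued."""
    # Assumed extra step: fill byte-wise up to the next page boundary.
    head = min(length, (-address) % PAGE_SIZE)
    memset_bytes(mem, address, head, value)
    address += head
    length -= head

    pages = 0
    while length >= PAGE_SIZE:
        set_page(mem, value, address)
        address += PAGE_SIZE   # advance by one page, not by the remaining length
        length -= PAGE_SIZE
        pages += 1

    # Remainder smaller than a page falls back to the byte loop.
    memset_bytes(mem, address, length, value)
    return pages


mem = bytearray(5 * PAGE_SIZE)
pages_issued = memset_paged(mem, 100, 3 * PAGE_SIZE, 0xAB)
```

In this run only the unaligned head and tail go through the byte loop, and the whole pages in between are covered by page commands.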

Expected Speedup II
Current Memset Time for a page: 4 µs
Expected Memset Time for a page
  = # Rows in a page * Time to read a Row + Cache Coherence Logic + Misc
  = 2 * 100 ns + X
  = 200 ns + X
(X = Cache Coherence Logic + Misc)
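The estimate translates directly into a speedup figure once a value is chosen for X. The sketch below takes the current per-page memset time as 4 µs and uses the slide's 2 rows at 100 ns each. The overhead values passed in are purely illustrative assumptions, since the slides leave X unspecified.

```python
CURRENT_PAGE_TIME_NS = 4_000.0  # current memset time for one page (4 µs)
ROWS_PER_PAGE = 2               # maximum rows a page can occupy
ROW_TIME_NS = 100.0             # time per row, from the slide


def expected_page_time_ns(overhead_ns: float) -> float:
    """Expected page-command time: row accesses plus overhead X."""
    return ROWS_PER_PAGE * ROW_TIME_NS + overhead_ns


def speedup(overhead_ns: float) -> float:
    """Ratio of current memset time to expected page-command time."""
    return CURRENT_PAGE_TIME_NS / expected_page_time_ns(overhead_ns)


# Illustrative overheads only; X is not quantified on the slides.
for x_ns in (0.0, 200.0, 800.0):
    print(f"X = {x_ns:.0f} ns -> speedup {speedup(x_ns):.1f}x")
```

With no overhead the bound is 20x, and the benefit shrinks as the coherence and miscellaneous costs grow.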

Related Work
IRAM – On-chip DRAM
  – Advantage: bigger storage, eliminates much of the off-chip memory access, energy efficient
  – Disadvantage: not much performance increase, doesn’t work with conventional microprocessors
Active page – bring computation to DRAM
  – breaks the memory into fixed-size pages and adds reconfigurable logic to DRAM
Heap paper shows some memory accesses that can be eliminated entirely

Conclusion
Page-based commands are necessary.