Embedded System Lab. Kilmo Choi (최길모): A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems

Presentation transcript:

A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems
Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu

Contents
- Background and Motivation
- Bank-Level Partition Mechanism (BPM)
- Results
- Conclusion
- Reference

Background and Motivation
Memory bank
- A set of memory that shares the same access speed
Multicore platform

Background and Motivation
Bank-Level Parallelism (BLP) and bank sharing
- Multiple banks can serve memory requests concurrently and independently.
- Memory systems usually employ a bank-interleaved address-mapping scheme.
Memory interference on multicore platforms
- Causes performance degradation (lower throughput and unfairness).
- Example: the row-buffer hit rate drops from over 60% with 1 core to 35% with 16 cores.
[Figure: cores sending requests through the memory controller (MC) to shared banks, producing row-buffer conflicts]
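To make the bank-sharing point concrete, here is a minimal sketch of a bank-interleaved mapping. The bit layout (bank index in physical address bits 13-15, 4 KiB pages) and the names `bank_of` and `banks_touched` are assumptions chosen for illustration; the slides do not give the real layout of the test machine.

```python
# Hypothetical address layout (an assumption, not taken from the slides):
# physical address bits 13-15 select one of 8 banks.
BANK_SHIFT = 13
NUM_BANKS = 8
PAGE_SIZE = 4096


def bank_of(phys_addr):
    """Return the bank index selected by the bank bits of a physical address."""
    return (phys_addr >> BANK_SHIFT) & (NUM_BANKS - 1)


def banks_touched(first_page, num_pages):
    """Banks hit by a thread that owns num_pages consecutive physical pages."""
    pages = range(first_page, first_page + num_pages)
    return {bank_of(page * PAGE_SIZE) for page in pages}


# Two threads with separate, contiguous page ranges.
thread_a = banks_touched(first_page=0, num_pages=64)
thread_b = banks_touched(first_page=64, num_pages=64)

# Banks used by both threads: each of these is a place where one thread
# can close a row that the other thread still needs.
shared = thread_a & thread_b
print(sorted(shared))
```

Under this assumed layout, interleaving spreads each thread over every bank, so the two threads overlap on all of them even though they never share a page. This is the situation in which row-buffer conflicts between threads can arise.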

Background and Motivation
Numerous new memory scheduling algorithms have been proposed to address the interference problem.
- However, these algorithms usually employ complex scheduling logic and require hardware modifications to the memory controller.
Bank-level conflicts can be fully eliminated by mapping each thread's data exclusively to specific banks.
- Open question: how much does the number of available banks affect a thread's performance?

Bank-Level Partition Mechanism (BPM)
Overview of BPM
- The OS memory management system uses a page-coloring mechanism to partition banks into several groups and maps each thread (process) to a specific bank group.
- Address mapping policy
Advantages
- Fewer row-buffer conflicts, more row-buffer hits
- BPM is an entirely software-based approach.
- Flexible
- It is easier for the OS than for hardware to monitor a thread's behavior.
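The page-coloring idea in the overview can be sketched as a toy allocator. This is only an illustration under stated assumptions: the bank color is taken from bits 1-3 of the page frame number, and the class `BankPartitionedAllocator` and its methods are hypothetical names, not the kernel interface used by BPM.

```python
from collections import defaultdict

# Hypothetical layout (assumption): bits 1-3 of the page frame number (PFN)
# are the bank bits, giving 8 bank colors.
PFN_BANK_SHIFT = 1
NUM_BANKS = 8


def bank_color(pfn):
    """Bank color of a physical page frame."""
    return (pfn >> PFN_BANK_SHIFT) & (NUM_BANKS - 1)


class BankPartitionedAllocator:
    """Toy page allocator that keeps one free list per bank color."""

    def __init__(self, num_frames):
        self.free = defaultdict(list)
        for pfn in range(num_frames):
            self.free[bank_color(pfn)].append(pfn)
        self.groups = {}    # thread id -> tuple of bank colors it may use
        self.next_idx = {}  # thread id -> position for round-robin

    def assign_group(self, tid, colors):
        """Give a thread exclusive use of a set of bank colors."""
        self.groups[tid] = tuple(colors)
        self.next_idx[tid] = 0

    def alloc_page(self, tid):
        """Return a free frame from the thread's bank group, rotating over its banks."""
        colors = self.groups[tid]
        start = self.next_idx[tid]
        for i in range(len(colors)):
            color = colors[(start + i) % len(colors)]
            if self.free[color]:
                self.next_idx[tid] = (start + i + 1) % len(colors)
                return self.free[color].pop()
        raise MemoryError(f"no free frame in the bank group of thread {tid}")


alloc = BankPartitionedAllocator(num_frames=256)
alloc.assign_group(0, colors=range(0, 4))  # thread 0 -> banks 0-3
alloc.assign_group(1, colors=range(4, 8))  # thread 1 -> banks 4-7
banks_t0 = {bank_color(alloc.alloc_page(0)) for _ in range(40)}
banks_t1 = {bank_color(alloc.alloc_page(1)) for _ in range(40)}
print(banks_t0.isdisjoint(banks_t1))
```

Rotating over the colors of a group is a design choice of this sketch: it keeps some bank-level parallelism inside each thread while the groups themselves never overlap.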

Bank-Level Partition Mechanism (BPM)
Discovering the bank bits with a software method
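The slide does not describe how the bank bits are found. One plausible latency-based approach is sketched below against a simulated DRAM model; it should not be read as the procedure used by the authors. The hidden bit layout, the latency values, and the function names are all assumptions. On real hardware `simulated_latency` would be replaced by timed, uncached accesses to known physical addresses, and XOR-hashed bank mappings would need a more elaborate probe.

```python
# Simulated DRAM used only for this sketch (all values are assumptions).
HIDDEN_BANK_BITS = (13, 14, 15)       # what the probe is supposed to find
HIDDEN_ROW_BITS = tuple(range(16, 30))
FAST, SLOW = 50, 90                   # made-up latencies


def simulated_latency(addr_a, addr_b):
    """Stand-in for timing alternating accesses to two physical addresses."""
    diff = addr_a ^ addr_b
    same_bank = not any((diff >> bit) & 1 for bit in HIDDEN_BANK_BITS)
    same_row = not any((diff >> bit) & 1 for bit in HIDDEN_ROW_BITS)
    # Same bank but different rows: every access closes the other row.
    return SLOW if same_bank and not same_row else FAST


def discover_bank_bits(timer, num_bits, base=0, threshold=70):
    """Infer bank bits from pairwise access latencies."""
    # Step 1: a bit whose flip alone causes a row-buffer conflict is a row bit.
    row_bits = [b for b in range(num_bits)
                if timer(base, base ^ (1 << b)) > threshold]
    row_probe = 1 << row_bits[0]
    # Step 2: flip a known row bit together with each remaining bit. If the
    # pair stays fast, the extra bit moved the access to another bank.
    return [b for b in range(num_bits)
            if b not in row_bits
            and timer(base, base ^ row_probe ^ (1 << b)) <= threshold]


found = discover_bank_bits(simulated_latency, num_bits=30)
print(found)
```

Once the bank bits are known, the OS can use them as page colors when it allocates physical frames.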

Results
Environment
- 4 cores, 2.8 GHz Intel Core i7-860 processor, 8 GB DDR3 main memory
- CentOS Linux 5.4 with kernel
- SPEC CPU2006

Results
Overall system performance

Results
Page policy and power

Results
- BPM vs. cache-partition-only
- Correlation between BPM improvements and per-core bandwidth

Reference
- J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. In HPCA-14.
- D. Kaseridis, J. Stuecheli, and L. K. John. Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era. In MICRO-44, 2011.