Memory Network: Enabling Technology for Scalable Near-Data Computing
Gwangsun Kim and John Kim (Korea Advanced Institute of Science and Technology), Jung Ho Ahn (Seoul National University), Yongkee Kwon (SK Hynix)

Presentation transcript:

Memory Network: Enabling Technology for Scalable Near-Data Computing
Gwangsun Kim, John Kim (Korea Advanced Institute of Science and Technology)
Jung Ho Ahn (Seoul National University)
Yongkee Kwon (SK Hynix)

Memory Network
[Figure: a Hybrid Memory Cube (HMC) with stacked DRAM layers organized into vaults, vault controllers and an intra-HMC network on the logic layer, and high-speed links/I/O ports connecting cubes into a memory network; a "compute A+B" command reaches Data A locally, but Data B resides in another cube]
• "Near"-data processing with multiple memories? What about "far" data?
• The memory network enables scalable near-data computing (see the sketch below).
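To make the "far"-data problem concrete, here is a minimal Python sketch (my own illustration, not from the talk; the `placement` table, the `ndp_add` function, and the data names are hypothetical): an NDP command offloaded to the cube holding one operand still needs the memory network when the other operand lives in a different cube.

```python
# Minimal sketch of the "far"-data case: an NDP command "compute A+B" is sent
# to the cube holding A; if B lives in another cube, the operand must be
# fetched over the inter-cube memory network. Names and placement are made up.

placement = {"A": 0, "B": 3}  # hypothetical mapping: data item -> HMC cube id

def ndp_add(dst_item, src_item):
    """Offload dst_item + src_item to the logic layer of the cube holding dst_item."""
    dst_cube, src_cube = placement[dst_item], placement[src_item]
    if src_cube == dst_cube:
        print(f"cube {dst_cube}: both operands local, compute near the data")
    else:
        print(f"cube {dst_cube}: fetch {src_item} from cube {src_cube} "
              f"over the memory network, then compute")

ndp_add("A", "B")  # cube 0 must pull B from cube 3 before computing
```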

DIVA Processing-in-Memory (PIM) Chip
Draper et al., "The architecture of the DIVA processing-in-memory chip," ICS'02
• For multimedia and irregular applications.
• Proposed a memory network for PIM modules.
• Simple low-dimensional network (e.g., a ring) → high packet hop count → performance and energy inefficiency (see the hop-count sketch below).
• Advanced technology is available: high off-chip bandwidth.
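A back-of-the-envelope sketch of the hop-count argument (my own numbers and formula, not from the DIVA paper): on a bidirectional ring the average shortest-path distance grows roughly as N/4 with the number of modules, while a high-radix network keeps it near constant.

```python
# Average shortest-path hop count on a bidirectional ring of n modules,
# compared against an idealized fully connected (one-hop) network.

def ring_avg_hops(n):
    """Mean over all destinations of min(clockwise, counterclockwise) distance."""
    hops = [min(d, n - d) for d in range(1, n)]
    return sum(hops) / len(hops)

for n in (8, 16, 64):
    print(f"{n:3d} modules: ring avg hops = {ring_avg_hops(n):5.2f}, "
          f"fully connected = 1.00")
```

More hops per packet means more router traversals and link crossings, which is where the performance and energy inefficiency comes from.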

Memory Networks from Micron
D. R. Resnick, "Memory Network Methods, Apparatus, and Systems," US Patent Application Publication.
• Mesh topology.
[Figure: mesh of memory modules, with local memories attached to the processor and network-attached memories reached through the mesh]

Memory Network Design Issues
• Difficult to leverage high-radix topologies (see the diameter sketch below).
  – Low-radix vs. high-radix topology.
  – High-radix topology → smaller network diameter.
  – Limited number of ports in memory modules.
• Adaptive routing requirement.
  – Can increase network cost.
  – Depends on traffic pattern, memory mapping, etc.
[Figure: low-radix networks vs. high-radix networks]
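A rough comparison of the diameter/port-count trade-off (standard topology formulas, assumed by me rather than taken from the slide): a k x k mesh needs only 4 ports per module but its diameter grows with k, while a 2-D flattened butterfly has a 2-hop diameter but needs 2(k-1) ports per module, which collides with the limited port count of memory modules.

```python
# Diameter vs. per-module port count for a k x k mesh and a 2-D flattened
# butterfly over the same number of modules.
import math

def mesh(k):
    return {"nodes": k * k, "ports_per_node": 4, "diameter": 2 * (k - 1)}

def flattened_butterfly(k):
    # Each node links to every other node in its row and in its column.
    return {"nodes": k * k, "ports_per_node": 2 * (k - 1), "diameter": 2}

for n in (16, 64):
    k = math.isqrt(n)
    print(f"{n} modules -> mesh: {mesh(k)}  flattened butterfly: {flattened_butterfly(k)}")
```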

Memory-centric Network
• Host-memory bandwidth still matters.
  – To support conventional applications while adopting NDP.
  – NDP involves communication with host processors.
• Memory-centric network (MCN): leverage the same network for NDP (toy bandwidth model below).
[Figure: processor-centric network (PCN), e.g., Intel QPI or AMD HyperTransport, with fixed memory bandwidth and processor-to-processor bandwidth and a separate network required for NDP, vs. memory-centric network (MCN) [PACT'13] with flexible bandwidth utilization, where the same network can be used for NDP]
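The toy model below (my own illustration with assumed numbers, not the PACT'13 evaluation) shows the flexible-bandwidth point: with a processor-centric network the off-chip links are statically split into memory channels and processor-to-processor links, so whichever pool the workload stresses saturates first, while a memory-centric network lets any traffic mix use the full link bandwidth.

```python
# Toy bandwidth model: statically partitioned links (PCN) vs. a shared
# memory-centric network (MCN). All numbers are assumptions for illustration.

TOTAL_LINK_BW = 160.0  # GB/s of off-chip bandwidth per processor (assumed)

def pcn_usable(mem_frac, mem_link_share=0.75):
    """Achievable throughput when links are statically split between memory
    channels (mem_link_share) and processor-to-processor links."""
    mem_bw = TOTAL_LINK_BW * mem_link_share
    p2p_bw = TOTAL_LINK_BW * (1.0 - mem_link_share)
    limits = []
    if mem_frac > 0.0:
        limits.append(mem_bw / mem_frac)          # memory channels saturate
    if mem_frac < 1.0:
        limits.append(p2p_bw / (1.0 - mem_frac))  # processor links saturate
    return min(limits)

def mcn_usable(_mem_frac):
    """All links reach the memory network, so any mix can use full bandwidth."""
    return TOTAL_LINK_BW

for frac in (0.9, 0.5, 0.2):
    print(f"memory traffic {frac:.0%}: PCN {pcn_usable(frac):6.1f} GB/s, "
          f"MCN {mcn_usable(frac):6.1f} GB/s")
```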

Memory Network for Heterogeneous NDP
• NDP not only for CPUs, but also for GPUs.
• Unified memory network for multi-GPU systems [MICRO'14].
• Extending the memory network to heterogeneous NDP.
[Figure: CPU, GPU, and FPGA nodes sharing one unified memory network]

Hierarchical Network
• With the intra-HMC network, the memory network is a hierarchical network (routing sketch below).
• NDP requires additional processing elements at the logic layer.
• Need to support various types of traffic.
  – Local (on-chip) traffic vs. global traffic.
  – Conventional memory access traffic vs. NDP-induced traffic.
[Figure: Hybrid Memory Cube with stacked DRAM, vault controllers, on-chip channels, I/O ports, and a concentrated-mesh-based intra-HMC network [PACT'13]]
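A simplified routing sketch of the hierarchy (my own structure, not the PACT'13 design; hop counts are parameters because they depend on the actual inter-cube and intra-HMC topologies): a request first crosses the inter-cube memory network to the destination HMC, then the intra-HMC network to the target vault controller.

```python
# Hierarchical routing sketch: global (inter-cube) hops, then local
# (intra-HMC) hops, then delivery to a vault controller.

def route(src_cube, dst_cube, dst_vault, inter_hops, intra_hops):
    """Return the hop sequence for one memory request."""
    path = []
    if src_cube != dst_cube:
        path += [("inter-cube link", h) for h in range(inter_hops)]    # global traffic
    path += [("intra-HMC network hop", h) for h in range(intra_hops)]  # local traffic
    path.append(("vault controller", dst_vault))
    return path

# Example: an NDP-induced request from cube 0 to vault 5 of cube 2.
for step in route(src_cube=0, dst_cube=2, dst_vault=5, inter_hops=2, intra_hops=3):
    print(step)
```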

Issues with Memory Network-based NDP
• Power management (gating sketch below).
  – Large number of channels possible in a memory network.
  – Power-gating, DVFS, and other circuit-level techniques.
• Data placement & migration.
  – Optimal placement of shared data.
  – Migration within the memory network.
• Consistency & coherence.
  – Direct memory access by multiple processors.
  – Heterogeneous processors.
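One way the power-management issue could be handled, shown as a hedged sketch (a policy I made up for illustration; the threshold and wake-up penalty are assumptions): power-gate memory network links whose recent utilization falls below a threshold and wake them on demand, trading a wake-up latency penalty for static power savings.

```python
# Utilization-driven link power gating (illustrative policy, assumed numbers).

GATE_THRESHOLD = 0.10   # gate a link if utilization drops below 10%
WAKEUP_PENALTY_NS = 50  # latency paid when a gated link is needed again

class Link:
    def __init__(self, name):
        self.name, self.gated = name, False

    def update(self, utilization):
        if utilization < GATE_THRESHOLD and not self.gated:
            self.gated = True
            print(f"{self.name}: power-gated (utilization {utilization:.0%})")
        elif utilization >= GATE_THRESHOLD and self.gated:
            self.gated = False
            print(f"{self.name}: woken up, +{WAKEUP_PENALTY_NS} ns penalty")

link = Link("cube0<->cube1")
for u in (0.40, 0.05, 0.30):
    link.update(u)
```

DVFS would extend the same idea by scaling link rate instead of fully gating it.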

Summary
• The memory network can enable scalable near-data processing.
• Leveraging recent memory network research:
  – Memory-centric network [PACT'13].
  – Unified memory network [MICRO'14].
• Intra-HMC design considerations.
• Further issues.