3D Mania: Powering Smartphone SoCs with Multichannel TSV DRAMs. Drew Wingard, Sonics. 7 December 2011.

Presentation transcript:


Concurrency in Smartphone SoCs
[Diagram labels: H.264 Decode, Audio Decode, Video Out, Transport demux]
Smartphone SoCs process data in parallel, but communicate…

Concurrency in Smartphone SoCs
[Diagram labels: H.264 Decode, Audio Decode, Video Out, Transport demux, DRAM]

6 July 2011, MPSoC '11: Leveraging 3D Multichannel DRAMs, © 2011 Sonics, Inc.
Smartphone Problems and Opportunities
■ Problems in current systems
  In mobile SoCs, DRAM bandwidth needs grow faster than capacity needs
  Scaling DRAM bandwidth requires extra DRAMs
    o And power-hungry PHYs
  Wider DRAM interfaces & deeper pipelining increase access granularity, driving need for multichannel approaches
    o Which cause extra pin costs (after 2 channels)
  Current DRAM interfaces are a bottleneck between
    o Lots of parallel initiators (data clients)
    o Lots of parallel DRAM banks (data servers)
■ What if we could get rid of the interface bottleneck?
  And be limited by DRAM bank bandwidths, instead
  Without needing fancy PHYs

Wide I/O: TSV-enabled Mobile DRAM
■ Pending JEDEC standard (JC 42.6)
■ High bandwidth, even at low capacity
  Like 4 (x32) channels of LPDDR GBytes/sec peak bandwidth
■ Lowest power
  No PHY (simple drivers/receivers, no PLL/DLL)
  Low loading (capacitance and inductance)
  Modest frequency (200 MHz)
■ Smallest form factor
  3D stacked based on thin (50µm) TSV-based die
■ Minimal change to DRAM design (ex-TSV)
■ Risks are all TSV-related (cost, yield, biz model)
■ Smartphone market will drive volumes
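The peak-bandwidth arithmetic behind a wide, slow interface can be sketched as follows. The slide gives the channel count (4) and the clock (200 MHz) but not the channel width or signalling style; the 128-bit channel width and single-data-rate transfers used here are assumptions taken from the published Wide I/O specification, not from the slide, and the function name is hypothetical.

```python
def peak_bandwidth_gbytes(channels, bits_per_channel, clock_hz, transfers_per_clock=1):
    """Peak bandwidth in GBytes/sec for a set of identical parallel channels."""
    bits_per_second = channels * bits_per_channel * clock_hz * transfers_per_clock
    return bits_per_second / 8 / 1e9


# From the slide: 4 channels at 200 MHz.
# ASSUMPTION (not on the slide): 128 data bits per channel, one transfer per clock.
wide_io = peak_bandwidth_gbytes(channels=4, bits_per_channel=128, clock_hz=200e6)
print(f"Wide I/O peak bandwidth under these assumptions: {wide_io:.1f} GBytes/sec")
```

The point of the sketch is that width, not clock rate, supplies the bandwidth, which is what allows the interface to run at a modest frequency without a PHY.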

Wide I/O Power Estimates
Sophie Dumas, ST-E, JEDEC Mobile Memory Forum, June 2011 (Taiwan)

Multichannel DRAM System Challenges
[Diagram labels: Application View, Physical Organization]
Key Problems:
■ Load balancing
  Must balance memory traffic evenly among channels
■ Maintaining throughput
  Multiple channels cause throughput & ordering issues for pipelined memories
  Software and IP cores must manage multiple channels explicitly
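The load-balancing problem can be illustrated with a small model that counts how many requests land on each channel. This is a minimal sketch, not the presenter's method: both address mappings, the 1 MiB region, the 1 KiB interleave size, and the 64-byte request size are hypothetical values chosen only for illustration.

```python
from collections import Counter

NUM_CHANNELS = 4
REGION_BYTES = 1 << 20      # hypothetical 1 MiB address region
INTERLEAVE_BYTES = 1024     # hypothetical interleave granularity


def channel_partitioned(addr):
    """Each channel owns one contiguous quarter of the region."""
    return (addr // (REGION_BYTES // NUM_CHANNELS)) % NUM_CHANNELS


def channel_interleaved(addr):
    """Consecutive 1 KiB blocks rotate across the channels."""
    return (addr // INTERLEAVE_BYTES) % NUM_CHANNELS


def channel_load(addresses, mapping):
    """Count requests per channel for a given address-to-channel mapping."""
    return Counter(mapping(a) for a in addresses)


# A 64 KiB buffer read front to back in 64-byte requests.
stream = range(0, 64 * 1024, 64)
partitioned_load = channel_load(stream, channel_partitioned)
interleaved_load = channel_load(stream, channel_interleaved)

print("partitioned:", dict(partitioned_load))
print("interleaved:", dict(interleaved_load))
```

With the partitioned mapping the whole buffer sits in one channel, so the other three stay idle unless software places data across channels by hand. With the interleaved mapping the same stream spreads evenly with no software involvement.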

Locality Challenges: DRAM Access Styles
■ Exploiting spatial locality is key for high utilization
  CPUs tend to stay with an O/S page (multiple DRAM pages)
  Much processing and I/O DMA uses long incrementing bursts
  Image processing is tougher
■ Two-dimensional bursts
  2D transaction using a single OCP read or write command
  Popular for HD video and graphics
■ Address tiling
  Rearrange DRAM address organization to exploit locality
  Avoids page misses
One size doesn’t fit all!
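The effect of address tiling on DRAM page locality can be sketched by counting how many DRAM pages a 2D block access touches under a linear layout versus a tiled one. All names and sizes here are hypothetical (2048-byte row pitch, 2048-byte DRAM page, 64x32-byte tiles); the slides do not specify the actual tiling scheme.

```python
ROW_PITCH = 2048    # hypothetical bytes per image row
PAGE_BYTES = 2048   # hypothetical DRAM page size
TILE_W, TILE_H = 64, 32   # hypothetical tile: 64 bytes wide, 32 rows, one DRAM page


def burst_2d(x0, y0, width, height):
    """Expand one 2D burst into the (x, y) byte coordinates it covers."""
    return [(x, y) for y in range(y0, y0 + height) for x in range(x0, x0 + width)]


def linear_address(x, y):
    """Row-major layout: each image row is contiguous in memory."""
    return y * ROW_PITCH + x


def tiled_address(x, y):
    """Tiled layout: each TILE_W x TILE_H block is contiguous in memory."""
    tiles_per_row = ROW_PITCH // TILE_W
    tile_index = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    offset = (y % TILE_H) * TILE_W + (x % TILE_W)
    return tile_index * TILE_W * TILE_H + offset


def pages_touched(coords, address_fn):
    """Number of distinct DRAM pages hit by a set of coordinates."""
    return len({address_fn(x, y) // PAGE_BYTES for x, y in coords})


block = burst_2d(0, 0, 16, 16)          # 16x16 block, typical of video processing
row_scan = burst_2d(0, 0, ROW_PITCH, 1) # one full image row, typical of display DMA

block_linear = pages_touched(block, linear_address)
block_tiled = pages_touched(block, tiled_address)
scan_linear = pages_touched(row_scan, linear_address)
scan_tiled = pages_touched(row_scan, tiled_address)
print(block_linear, block_tiled, scan_linear, scan_tiled)
```

Under these assumptions the block access touches far fewer pages when tiled, while the row scan touches far fewer when linear, which is one way to read "one size doesn't fit all".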

Interleaved Multichannel Technology (IMT*): Seamless Transition to Multichannel
■ Interleaving support requires splitting (and chopping!) traffic for delivery to proper channel
  Splitting in memory controller creates performance and routing congestion bottleneck
■ Predictably high performance
  Automatically spreads traffic across channels to ensure load balancing
  Keeps DRAMs operating at full throughput without costly reorder buffers
■ Scalable architecture
  Up to 8 interleaved channels within the same address region
  Fully distributed to avoid bottlenecks & placement restrictions
■ Application flexibility
  Transparent to software and initiator hardware
  Supports full or partial memory configurations – at run time
Multichannel Interleaving in the Interconnect
Higher Performance, Lower Area, More Scalable
*patents pending
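The burst splitting that interleaving requires can be sketched as below: a burst that crosses an interleave boundary is cut into pieces, and each piece is routed to the channel that owns its address range. This is a generic illustration under assumed parameters (4 channels, 1 KiB interleave), not a description of how IMT is implemented; the function name and the channel-local address formula are hypothetical.

```python
def split_burst(addr, length, interleave_bytes=1024, num_channels=4):
    """Split a byte burst at interleave boundaries.

    Returns a list of (channel, channel_local_address, piece_length) tuples.
    """
    pieces = []
    end = addr + length
    while addr < end:
        # The next interleave boundary strictly above the current address.
        boundary = (addr // interleave_bytes + 1) * interleave_bytes
        piece_end = min(end, boundary)
        channel = (addr // interleave_bytes) % num_channels
        # Address as seen by the channel once the interleave bits are removed.
        stripe = addr // (interleave_bytes * num_channels)
        local = stripe * interleave_bytes + addr % interleave_bytes
        pieces.append((channel, local, piece_end - addr))
        addr = piece_end
    return pieces


# A 256-byte burst starting 64 bytes below a 1 KiB boundary is cut in two.
crossing = split_burst(0x3C0, 256)
# A burst that stays inside one interleave block is delivered unchanged.
contained = split_burst(0x400, 256)
print(crossing)
print(contained)
```

Doing this split inside the interconnect, close to each initiator, is what the slide contrasts with splitting in a central memory controller.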

Transparent Multichannel Interleaving with Access-Optimized Boundaries
[Diagram labels: Application View, Physical Organization]
Independent Interleaving Size Support

MemMax™ Memory Scheduler
Multi-threaded & multi-tagged OCP with non-blocking flow control
In-order with early pre-charge/activate
Compiled (SRAM) data buffers decouple rates & cross clock domain
■ Address tiling
■ Request grouping
■ 2D burst support
■ Guaranteed bandwidth QoS with demotion
■ Per-thread buffer sizing
■ Run time re-programmable

Complete Wide I/O System Solution
[Block diagram labels: Sonics Interconnect with IMT; Video Pre Processor, Graphics Engine, Video Decoder, Video Post Processor, Transport Engine, Host CPU, Audio Processor, Streaming Processor, Debug Interface; IA; TA; MemMax 0, MemMax 3; Wide I/O Controller 0, Wide I/O Controller 3; Wide I/O DRAM; GPIO, USB, FLASH, SRAM, Storage SOC]
■ Channel splitting and load balancing
■ Scheduling for high efficiency & QoS
■ Wide I/O interface

October 11, 2011, © Sonics, Inc., 2011, All rights reserved, NDA required
Power Domain Partitioning
[Block diagram labels: CPU1, CPU0, MemMax, DRAM Controller, Encrypt Engine, GPU, Flash Cntr, Face Detection, Video Codec, Camera Cntr, DMA Cntr, HSI, MMC SD, USB Cntr, UART, SPI, Timer, Clk Mgr, GPIO, BootROM, Audio Codec, Misc, Touch Cntr, Power Mgr, Accel, INTC, Digital Compass, L2 Cache; Power Domain 1, Power Domain 2, Power Domain 3, Power Domain 4; Sonics Peripheral Network; SonicsGN Network on Chip]
Advanced Partitioning
Multiple Clocks
Efficient power Mgmt
Better timing closure

Summary
■ Smartphone SoC performance dominated by DRAM
■ New JEDEC Wide I/O standard offers compelling performance and power benefits
  Assuming TSV-related issues can be managed
■ 3D smartphone SoC solutions require:
  Massive connectivity
  Transparent multi-channel DRAM load balancing
  High-efficiency DRAM scheduling with high QoS
  Minimum energy usage

THANK YOU!