1 RAMP 100K Core Breakout
Assorted RAMPants
RAMP Retreat, UC San Diego, June 14, 2007

2 Two Kinds of 1M Core Machine
- Scientific supercomputer (e.g., BlueGene)
- Data Center (e.g., Google)
- Some commonalities:
  - Both will require heavy virtualization of the physical FPGA resources
  - Both will need host disk to hold target RAM state
- Goal is 1M cores in a "few" racks
  - Roughly 100-1,000 TB of target RAM held on disks (see the sizing sketch below)
- Some differences:
  - Latency of core interaction
    - Supercomputer: microseconds
    - Datacenter: milliseconds
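The slide does not show the arithmetic behind the target-RAM figure, but it follows from the per-core and per-disk numbers used later on the "Virtualized Memory Hierarchy" slide (1 GB of target RAM per core, 1 TB per host disk, roughly one disk per FPGA). The Python sketch below is only a back-of-envelope check; the constants from that slide are the only numbers taken from the talk.

```python
# Back-of-envelope sizing for the 1M-core goal (assumed/derived figures;
# per-core and per-disk constants come from the memory-hierarchy slide).
TARGET_CORES     = 1_000_000
RAM_PER_CORE_GB  = 1              # target RAM per core
DISK_CAPACITY_GB = 1_000          # 1 TB host disk, roughly one per FPGA

cores_per_disk = DISK_CAPACITY_GB // RAM_PER_CORE_GB      # ~1000 cores/disk
total_ram_tb   = TARGET_CORES * RAM_PER_CORE_GB / 1_000   # total target RAM
disks_needed   = TARGET_CORES // cores_per_disk           # also ~# of FPGAs

print(f"Target RAM to virtualize: ~{total_ram_tb:.0f} TB")   # ~1000 TB
print(f"Host disks / FPGAs needed: ~{disks_needed}")          # ~1000
```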

3 Virtualization Techniques (virtualizing the old PMS model)
- Cores: virtualize both the functional and the timing model on a single physical pipeline on the FPGA
  - A simple barrel pipeline should suffice, since the timing model is always updated even if the target thread is stalled (sketched below)
- Routers: virtualize the crossbar using a single physical switch/RAM block
  - Probably need to buffer one target cycle's worth of inputs, to allow an arbitrary arbitration scheme in the model
- Memory: virtualize memory ports using a single physical RAM
- Each PMS arc is a virtualized channel; maybe use simple striping everywhere to make things composable
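A minimal sketch of the barrel-pipeline idea, written in Python purely for illustration (the real design would be an FPGA pipeline; the class and method names here are assumptions, not RAMP code): one physical pipeline rotates round-robin over many virtual core contexts, and each context's timing model advances on its slot whether or not the target thread is stalled.

```python
from dataclasses import dataclass

@dataclass
class CoreContext:
    core_id: int
    stalled: bool = False     # e.g., waiting on a target-memory miss
    target_cycle: int = 0     # timing-model state for this virtual core

class BarrelPipeline:
    def __init__(self, num_virtual_cores):
        self.contexts = [CoreContext(i) for i in range(num_virtual_cores)]
        self.slot = 0         # which context occupies the pipeline this cycle

    def step(self):
        """One physical FPGA cycle: service the next context round-robin."""
        ctx = self.contexts[self.slot]
        if not ctx.stalled:
            self.execute_one_target_instruction(ctx)   # functional model
        ctx.target_cycle += 1                          # timing model always advances
        self.slot = (self.slot + 1) % len(self.contexts)

    def execute_one_target_instruction(self, ctx):
        pass   # placeholder for the functional model
```

The design point the slide leans on is that `target_cycle` advances unconditionally, so a stalled target thread still consumes exactly one timing-model update per slot and no extra scheduling logic is needed.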

4 Virtualized Memory Hierarchy
- 16 physical cores/FPGA only generate <1 request/cycle off chip (~few % miss rate, independent of the number of virtualized threads + timing models)
  - 100 MHz, 800 MB/s DRAM provides the next level of cache
  - What miss rate from the 4 GB DRAM cache is needed to run at full speed?
- Assume roughly one disk/FPGA
  - 1 TB is enough state for 1,000 cores with 1 GB each
  - Provides <100 MB/s bandwidth best case
  - Only need <10% miss rate ????
- BUT! Disk has huge latency. Even with large block transfers (pages? tracks?), we want to predetermine memory requests and schedule disk accesses to get reasonable performance
  - Use a runahead technique to guess what each thread will want in the next few simulation cycles: checkpoint registers, run 1000 cycles ahead, don't write memory but record misses, restore to the checkpoint, then run the same 1000 cycles in demand mode (sketched below)
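A hedged sketch of the runahead scheme described on the slide. The thread and disk objects (checkpoint_registers(), step(), prefetch()) are hypothetical stand-ins rather than a real RAMP API; only the control flow, checkpoint, run ahead while recording misses, prefetch, then re-run in demand mode, follows the slide.

```python
RUNAHEAD_CYCLES = 1000   # window length named on the slide

def runahead_then_demand(thread, disk):
    # 1. Checkpoint architectural registers.
    ckpt = thread.checkpoint_registers()

    # 2. Run ~1000 target cycles ahead: don't commit memory writes, just
    #    record which blocks would miss in the FPGA-local DRAM cache.
    predicted_misses = set()
    for _ in range(RUNAHEAD_CYCLES):
        miss = thread.step(speculative=True)
        if miss is not None:
            predicted_misses.add(miss)

    # 3. Restore the checkpoint and schedule disk reads for the predicted
    #    misses, so the long disk latency overlaps with other threads' work.
    thread.restore_registers(ckpt)
    for block in predicted_misses:
        disk.prefetch(block)

    # 4. Re-run the same window in demand mode; most accesses should now
    #    hit data already staged from disk into DRAM.
    for _ in range(RUNAHEAD_CYCLES):
        thread.step(speculative=False)
```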

5 Interaction Latency/Bandwidth
- Supercomputer model design is more difficult due to the need to model low-latency interactions
  - Only a few target clock cycles between core interactions, and some interactions are synchronous (e.g., barrier sync logic)
- Datacenter cores only interact through Ethernet
  - OK to run each core for longer before checking for interaction events, and interactions are asynchronous (e.g., NIC interrupts can be scheduled when convenient for the model) (sketched below)
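One way to read this slide is as two settings of a simulation quantum: a few target cycles for the supercomputer case, a much longer quantum for the Ethernet-only datacenter case. The sketch below is an assumption-laden illustration (the quantum lengths and the deliver_pending_interactions() helper are made up for this example), not a RAMP design.

```python
SUPERCOMPUTER_QUANTUM = 4        # assumed: a few target cycles between checks
DATACENTER_QUANTUM    = 10_000   # assumed: Ethernet-only interaction, checked rarely

def simulate(cores, quantum, total_target_cycles):
    """Advance all cores in lock-step quanta, exchanging interactions
    (barrier signals, NIC packets, ...) only at quantum boundaries."""
    for _ in range(0, total_target_cycles, quantum):
        for core in cores:
            core.run(quantum)                  # advance functional + timing models
        deliver_pending_interactions(cores)    # hypothetical event exchange

def deliver_pending_interactions(cores):
    pass   # placeholder: route synchronous/asynchronous events between cores
```

With a short quantum the synchronous barrier logic of the supercomputer target is modeled faithfully; with a long quantum the datacenter model amortizes host overhead, since NIC-level interactions tolerate millisecond-scale delivery.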