EENG 449bG/CPSC 439bG Computer Systems
Lecture 19: Memory Hierarchy Design Part III - Memory Technologies
April 20, 2004, Prof. Andreas Savvides, Spring 2004
http://www.eng.yale.edu/courses/eeng449bG
Review today, not so fast in future

Announcements
- Midterm 2 next time (20% of class grade): material from Chapters 3, 4, and 5; use the lecture slides and HW exercises as a study guide
- Project presentation (10% of grade): April 26th (or May 4th)
- Project reports (15% of grade): due May 6th

Main Memory Background
- Performance of main memory:
  - Latency: cache miss penalty
    - Access time: time between the request and when the word arrives
    - Cycle time: time between requests
  - Bandwidth: I/O and large-block miss penalty (L2)
- Main memory is DRAM: Dynamic Random Access Memory
  - Dynamic since it needs to be refreshed periodically (8 ms, ~1% of the time)
  - Addresses are divided into 2 halves (memory as a 2D matrix):
    - RAS or Row Access Strobe
    - CAS or Column Access Strobe
- Cache uses SRAM: Static Random Access Memory
  - No refresh (6 transistors/bit vs. 1 transistor/bit for DRAM)
  - Size: DRAM/SRAM ~ 4-8; cost/cycle time: SRAM/DRAM ~ 8-16

Main Memory Organizations
- Simple: CPU, cache, bus, and memory are all the same width (32 or 64 bits)
- Memory performance example:
  - 4 cycles to send the address
  - 56 cycles access time per word
  - 4 clock cycles to send a word of data
  - Cache block size: four 8-byte words
  - Miss penalty: 4 x (4 + 56 + 4) = 256 clock cycles
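As a quick check on that arithmetic, here is a minimal sketch in C; the cycle counts are the ones assumed on this slide, not measurements from any real system.

```c
#include <stdio.h>

/* Miss penalty for the simple one-word-wide organization: every word of
 * the block pays the full address + access + transfer cost. */
int main(void) {
    int addr_cycles = 4;      /* cycles to send the address      */
    int access_cycles = 56;   /* access time per word            */
    int xfer_cycles = 4;      /* cycles to send one word of data */
    int words_per_block = 4;  /* cache block = four 8-byte words */

    int penalty = words_per_block * (addr_cycles + access_cycles + xfer_cycles);
    printf("Simple organization miss penalty: %d cycles\n", penalty); /* 256 */
    return 0;
}
```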

Main Memory Organizations
- Wide memory organization: CPU/mux is 1 word; mux/cache, bus, and memory are N words wide (Alpha: 64 bits and 256 bits; UltraSPARC: 512 bits)

Main Memory Organizations
- Wide memory organization: CPU/mux is 1 word; mux/cache, bus, and memory are N words wide (Alpha: 64 bits and 256 bits; UltraSPARC: 512 bits)
- Consider a memory width of 2 words. Miss penalty: 2 x (4 + 56 + 4) = 128 cycles
- A 4-word width reduces the miss penalty to 64 cycles

Main Memory Organizations
- Wide memory drawbacks:
  - HW overhead: wider bus and multiplexers at each level
  - If error correction is supported, the whole block has to be read on each byte write to compute the new code

Main Memory Organizations
- Memory interleaving: CPU, cache, and bus are 1 word wide; memory has N modules (4 modules in the example); the example is word-interleaved
- New miss penalty: 4 + 56 + (4 x 4) = 76 cycles (see the comparison sketch below)
- Bank advantages:
  - Can sustain up to 1 word/cycle for writes as long as the writes do not hit the same bank
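Extending the earlier miss-penalty sketch to the wide and interleaved organizations, with the same assumed cycle counts:

```c
#include <stdio.h>

/* Miss penalties for a 4-word block under the wide and interleaved
 * organizations, using the same assumed cycle counts as before. */
int main(void) {
    const int addr = 4, access = 56, xfer = 4, block_words = 4;

    /* Wide memory, w words wide: block_words / w full-cost accesses. */
    for (int w = 2; w <= 4; w *= 2) {
        int wide = (block_words / w) * (addr + access + xfer);
        printf("wide x%d: %d cycles\n", w, wide);          /* 128, 64 */
    }

    /* Interleaved across 4 banks: the bank accesses overlap, but the word
     * transfers are still serialized over the one-word bus. */
    int interleaved = addr + access + block_words * xfer;
    printf("interleaved (4 banks): %d cycles\n", interleaved);  /* 76 */
    return 0;
}
```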

Memory Technologies

Memory Technologies: DRAM (Dynamic Random Access Memory)
- Write: charge the bit line HIGH or LOW and set the word line HIGH
- Read: the bit line is precharged to a voltage halfway between HIGH and LOW, then the word line is set HIGH. Depending on the charge in the capacitor, the precharged bit line is pulled slightly higher or lower, and the sense amp detects the change.
- Destructive read! (the cell must be rewritten after it is read)
- Explains why the capacitor can't shrink:
  - Need to sufficiently drive the bit line
  - Increasing density increases the parasitic capacitance
[Figure: one-transistor DRAM cell - capacitor C connected to the bit line through an access transistor gated by the word line, with a sense amp on the bit line]
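A toy model of that read sequence (illustrative only; the voltage levels and swing below are arbitrary, not from any datasheet): precharge the bit line to the midpoint, let charge sharing nudge it up or down, sense the direction, and write the value back.

```c
#include <stdio.h>
#include <stdbool.h>

/* Toy model of a destructive DRAM read followed by restore. */
typedef struct { double charge; } dram_cell;    /* 1.0 = HIGH, 0.0 = LOW */

static bool dram_read(dram_cell *cell) {
    double bitline = 0.5;                       /* precharged to the midpoint   */
    bitline += (cell->charge - 0.5) * 0.1;      /* small swing from charge sharing */
    cell->charge = 0.5;                         /* the read destroys the stored charge */
    bool value = bitline > 0.5;                 /* sense amp resolves the swing */
    cell->charge = value ? 1.0 : 0.0;           /* ...and writes the value back */
    return value;
}

int main(void) {
    dram_cell c = { .charge = 1.0 };            /* cell storing a 1 */
    printf("read -> %d (cell restored to %.1f)\n", dram_read(&c), c.charge);
    return 0;
}
```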

DRAM Charge Leakage
- The capacitor charge leaks away, so DRAM needs frequent refresh
- Refresh timing spans a wide range: a given row is refreshed only every few milliseconds, while an individual row refresh takes on the order of nanoseconds
- Every cell must be rewritten periodically (approximately every 8 ms, consistent with the refresh overhead noted earlier)

DRAM logical organization (4 Mbit)
[Figure: memory array (16,384 x 16,384) with a row decoder selecting a word line and a column decoder selecting bit lines through the sense amps & I/O to Data In/Data Out; an address buffer multiplexes the address pins (A0...). Roughly the square root of the bits is selected per RAS/CAS.]

DRAM-chip internal organization 64K x 1 DRAM

RAS/CAS operation
- Row Address Strobe, Column Address Strobe
- n address bits are provided in two steps using n/2 pins, referenced to the falling edges of RAS_L and CAS_L
- This was the traditional method of DRAM operation for 20 years; it is now being supplanted by synchronous, clocked interfaces in SDRAM (synchronous DRAM)
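A minimal sketch of the address-multiplexing idea (the address width and split here are illustrative, not taken from a specific part): the controller drives the row half of the address while RAS_L falls and the column half while CAS_L falls.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative address multiplexing: a 22-bit address (4M locations)
 * is sent over 11 pins in two steps, row half then column half. */
#define ADDR_BITS 22
#define HALF_BITS (ADDR_BITS / 2)

int main(void) {
    uint32_t addr = 0x2A5F3u & ((1u << ADDR_BITS) - 1);

    uint32_t row = addr >> HALF_BITS;               /* driven as RAS_L falls */
    uint32_t col = addr & ((1u << HALF_BITS) - 1);  /* driven as CAS_L falls */

    printf("addr=0x%06X -> row=0x%03X, col=0x%03X\n", addr, row, col);
    return 0;
}
```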

DRAM read timing

DRAM read timing Read Latency

DRAM refresh timing

DRAM write timing

DRAM History
- DRAMs: capacity +60%/yr, cost -30%/yr
  - 2.5X cells/area, 1.5X die size in ~3 years
  - A '98 DRAM fab line costs $2B
- DRAM only: density and leakage vs. speed
- Relies on a growing number of computers and more memory per computer (60% of the market)
  - SIMM or DIMM is the replaceable unit, so computers can use any generation of DRAM
- Commodity, second-source industry: high volume, low profit, conservative
  - Little organizational innovation in 20 years
- Order of importance: 1) cost/bit, 2) capacity
  - First RAMBUS: 10X BW at +30% cost, so little impact

So, why do I freaking care?
- By its nature, DRAM isn't built for speed
  - Response times depend on capacitive circuit properties, which get worse as density increases
- The DRAM process isn't easy to integrate into a CMOS logic process, so DRAM is off-chip
  - Connectors, wires, etc. introduce additional delay
  - IRAM efforts look at integrating the two
- Memory architectures are designed to minimize the impact of DRAM latency
  - Low level: memory chips
  - High level: memory system designs
- You will pay $$$$$$ and then some $$$ for a good memory system

So, why do I freaking care?
- 1960-1985: Speed = f(number of operations)
- 1990: pipelined execution and fast clock rates, out-of-order execution, superscalar instruction issue
- 1998: Speed = f(non-cached memory accesses)

DRAM Future: 1 Gbit DRAM (ISSCC '96; production '02?)

                  Mitsubishi         Samsung
  Blocks          512 x 2 Mbit       1024 x 1 Mbit
  Clock           200 MHz            250 MHz
  Data Pins       64                 16
  Die Size        24 x 24 mm         31 x 21 mm
  Metal Layers    3                  4
  Technology      0.15 micron        0.16 micron

  (Die sizes will be much smaller in production.)
  Latency comparison: 180 ns in 1980 vs. 40 ns in 2002
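A back-of-the-envelope check on that last comparison (assuming a simple exponential trend; this calculation is not on the slide): 180 ns to 40 ns over 22 years works out to roughly 7% latency improvement per year, far slower than the ~60%/yr capacity growth quoted earlier.

```c
#include <math.h>
#include <stdio.h>

/* Annual DRAM latency improvement implied by 180 ns (1980) -> 40 ns (2002). */
int main(void) {
    double years = 2002 - 1980;
    double rate = pow(180.0 / 40.0, 1.0 / years) - 1.0;
    printf("~%.1f%% latency improvement per year\n", rate * 100.0); /* ~7.1% */
    return 0;
}
```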

Single-Port 6-T SRAM Cell
- Static RAM (SRAM): six transistors in a cross-coupled configuration
  - Prevents the stored information from being disturbed when read
- SRAM requires minimal power to retain its state, better than DRAM in this respect
- On the same process, DRAM has 4-8 times more capacity, while SRAMs are 8-16 times faster
[Figure: single-port 6-T SRAM cell]

Fast Memory Systems: DRAM-specific techniques
- Multiple CAS accesses, known by several names (page mode)
  - Extended Data Out (EDO): 30% faster in page mode
- New DRAMs to address the gap; what will they cost, and will they survive?
  - RAMBUS: startup company; reinvented the DRAM interface
    - Each chip is a module, rather than a slice of memory
    - Short bus between CPU and chips (300-400 MHz, < 4 inches long)
    - Does its own refresh
    - Variable amount of data returned: 1 byte / 2 ns (500 MB/s per chip), 1.6 GB/s bandwidth
    - 20% increase in DRAM area
  - Synchronous DRAM (SDRAM): 2 banks on chip, a clock signal to the DRAM, transfers synchronous to the system clock (66-150 MHz in 2001)

RAMBUS (RDRAM)
- Protocol-based RAM with a narrow (16-bit) bus
- High clock rate (400 MHz), but long latency
- Pipelined operation
- Multiple arrays, with data transferred on both edges of the clock
[Figures: RAMBUS bank; RDRAM memory system]
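These parameters are consistent with the 1.6 GB/s figure mentioned earlier: a 16-bit bus at 400 MHz, transferring on both clock edges, moves 2 bytes x 400 M x 2 per second. A quick check of that arithmetic (my calculation, not the slide's):

```c
#include <stdio.h>

/* Peak RDRAM channel bandwidth implied by the parameters above:
 * 16-bit bus, 400 MHz clock, data on both clock edges. */
int main(void) {
    double bus_bytes = 16.0 / 8.0;          /* 2 bytes per transfer */
    double clock_hz  = 400e6;               /* 400 MHz              */
    double edges     = 2.0;                 /* double data rate     */
    double peak = bus_bytes * clock_hz * edges;
    printf("peak bandwidth: %.1f GB/s\n", peak / 1e9);   /* 1.6 GB/s */
    return 0;
}
```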

RAMBUS vs. SDRAM
- SDRAM comes in DIMMs, RAMBUS comes in RIMMs; the modules are similar in size but incompatible
- SDRAMs offer performance almost comparable to RAMBUS
- Newer DRAM generations such as RDRAM and DRDRAM provide more bandwidth at a price premium

Need for Error Correction!
- Motivation:
  - Failures per unit time are proportional to the number of bits!
  - As DRAM cells shrink, they become more vulnerable
- We went through a period in which the failure rate was low enough that people did without error correction
  - DRAM banks are too large now
  - Servers have always used corrected memory systems
- Basic idea: add redundancy through parity bits
  - Simple but wasteful version: keep three copies of everything and vote to find the right value; 200% overhead, so not good!
  - Common configuration: random-error correction with SEC-DED (single error correct, double error detect)
    - One example: 64 data bits + 8 parity bits (11% overhead)
  - Papers on the reading list from last term tell you how to construct these codes (a toy sketch follows below)
- We really want to handle failures of physical components as well
  - Organization is multiple DRAMs per SIMM, multiple SIMMs
  - Want to recover from a failed DRAM and a failed SIMM!
  - Requires more redundancy; all major vendors are thinking about this for high-end machines
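As a purely illustrative sketch (not the scheme from the reading list), here is a tiny single-error-correcting Hamming(7,4) encoder/decoder. A real memory SEC-DED code applies the same idea to 64 data bits with 8 check bits, the extra bit being an overall parity that makes double errors detectable.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy single-error-correcting Hamming(7,4) code over a 4-bit nibble. */

/* Encode 4 data bits into a 7-bit codeword; bit i of the byte is position i+1. */
static uint8_t encode(uint8_t d) {
    uint8_t d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;      /* covers positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;      /* covers positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;      /* covers positions 4,5,6,7 */
    return p1 | p2 << 1 | d1 << 2 | p3 << 3 | d2 << 4 | d3 << 5 | d4 << 6;
}

/* XOR the positions of all set bits; a nonzero result is the 1-based
 * position of a single flipped bit. */
static uint8_t syndrome(uint8_t cw) {
    uint8_t s = 0;
    for (int pos = 1; pos <= 7; pos++)
        if ((cw >> (pos - 1)) & 1) s ^= pos;
    return s;
}

int main(void) {
    uint8_t cw = encode(0xB);               /* store the data nibble 1011   */
    cw ^= 1 << 4;                           /* single-bit error, position 5 */
    uint8_t s = syndrome(cw);
    if (s) cw ^= 1 << (s - 1);              /* correct the flipped bit      */
    printf("syndrome = %u, corrected codeword = 0x%02X\n", s, cw);
    return 0;
}
```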

More esoteric Storage Technologies?
- Tunneling Magnetic Junction RAM (TMJ-RAM): speed of SRAM, density of DRAM, non-volatile (no refresh)
  - Part of a new field called "spintronics": a combination of quantum spin and electronics
  - Same technology used in high-density disk drives
- MicroElectroMechanical Systems (MEMS) storage devices:
  - A large magnetic "sled" floating on top of lots of little read/write heads
  - Micromechanical actuators move the sled back and forth over the heads

Tunneling Magnetic Junction

MEMS-based Storage
- A magnetic "sled" floats on an array of read/write heads
  - Approximately 250 Gbit/in²
- Data rates: IBM: 250 MB/s with 1000 heads; CMU: 3.1 MB/s with 400 heads
- Electrostatic actuators move the media around to align it with the heads
  - Sweep the sled ±50 μm in < 0.5 s
- Capacity estimated to be 1-10 GB in 10 cm²
- See Ganger et al.: http://www.lcs.ece.cmu.edu/research/MEMS

Embedded Processor Memory Technologies
- Read-Only Memory (ROM): programmed once at manufacture time; non-destructible
- FLASH memory:
  - Non-volatile but re-programmable
  - Almost DRAM read speeds, but writes are 10-100 times slower
  - Typical access times: 65 ns for a 16 Mbit flash and 150 ns for a 128 Mbit flash
  - Flash building blocks are based on NOR or NAND devices
    - NOR devices can be reprogrammed for about 100,000 cycles
    - NAND devices can be reprogrammed for up to 1,000,000 cycles

Project & Reports
- Your final report should build on the midterm report
  - Document your software architecture and approach
  - If you have hardware, document the hardware
  - Useful metrics: power consumption (in mA of current drawn from the power supply)
- Projects dealing with evaluations:
  - Report on your results
  - Experiments with negative results are as important as positive results; explain what did not work and why
- Demo your project status on the day of the final presentation

Concluding Remarks
- Processor internals: performance, pipelining, ILP, SW & HW, memory hierarchies
- Processor interfaces: using microcontrollers, peripherals, and tools