Lecture 13 Main Memory Computer Architecture COE 501.


Who Cares About the Memory Hierarchy?
Processor-DRAM memory gap (latency), growing since the early 1980s:
– µProc performance: 60%/yr (2X/1.5 yr), "Moore's Law"
– DRAM performance: 9%/yr (2X/10 yrs)
– Processor-memory performance gap grows 50% / year

Impact on Performance
Suppose a processor executes at:
– Clock rate = 200 MHz (5 ns per cycle)
– CPI = 1.1
– 50% arith/logic, 30% ld/st, 20% control
Suppose that 10% of memory operations incur a 50-cycle miss penalty.
CPI = ideal CPI + average stalls per instruction
    = 1.1 (cycles/instr) + (0.30 (data mem ops/instr) x 0.10 (misses/data mem op) x 50 (cycles/miss))
    = 1.1 cycles + 1.5 cycles = 2.6
So 58% of the time the processor is stalled waiting for memory!
A 1% instruction miss rate would add an additional 0.5 cycles to the CPI!
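The arithmetic above can be captured in a tiny model (a sketch; the function name is mine, and the figures are just the slide's example numbers):

```c
/* Effective CPI = ideal CPI + average memory stall cycles per instruction.
   mem_ops_per_instr, miss_rate, and miss_penalty are the slide's figures. */
double effective_cpi(double ideal_cpi, double mem_ops_per_instr,
                     double miss_rate, double miss_penalty) {
    return ideal_cpi + mem_ops_per_instr * miss_rate * miss_penalty;
}
```

With the slide's numbers, `effective_cpi(1.1, 0.30, 0.10, 50.0)` gives 1.1 + 1.5 = 2.6, and the stall fraction is 1.5 / 2.6 ≈ 58%.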

Main Memory Background
Performance of main memory:
– Latency: determines cache miss penalty
  » Access time: time between a request and when the desired word arrives
  » Cycle time: minimum time between successive requests
– Bandwidth: matters for I/O & large block miss penalties (L2)
Main memory is DRAM: Dynamic Random Access Memory
– Dynamic since it needs to be refreshed periodically (every ≈8 ms)
– Addresses divided into 2 halves:
  » RAS, or Row Access Strobe
  » CAS, or Column Access Strobe
Cache uses SRAM: Static Random Access Memory
– No refresh (6 transistors/bit vs. 1 transistor/bit for DRAM)
– Address not divided
Size: DRAM/SRAM ≈ 4-8x; cost & cycle time: SRAM/DRAM ≈ 8-16x

Typical SRAM Organization: 16-word x 4-bit
[Figure: a 16 x 4 array of SRAM cells. An address decoder (A0-A3) drives word
lines Word 0 through Word 15; each of the 4 bit columns has a sense amp and a
write driver/precharger, with data pins Din 0-3 and Dout 0-3 and control
signals WrEn and Precharge.]
Q: Which is longer: the word line or the bit line?

Logic Diagram of a Typical SRAM (2^N words x M bits)
Write enable is usually active low (WE_L).
Bi-directional data:
– A new control signal, output enable (OE_L), is needed
– WE_L asserted (low), OE_L deasserted (high):
  » D serves as the data input pin
– WE_L deasserted (high), OE_L asserted (low):
  » D serves as the data output pin
– Both WE_L and OE_L asserted:
  » Result is unknown. Don't do that!
Pins: N address lines (A), M data lines (D), plus WE_L and OE_L.

Classical DRAM Organization (square)
– A square RAM cell array: the row decoder takes the row address and drives the
  word (row) select line; the column selector & I/O circuits take the column
  address and select among the bit (data) lines.
– Row and column address together select 1 bit at a time.
– Each intersection of a word line and a bit line is a 1-T DRAM cell.

Logic Diagram of a Typical DRAM (256K x 8, with 9 multiplexed address pins and 8 data pins)
Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low.
Din and Dout are combined into a single set of pins (D):
– WE_L asserted (low), OE_L deasserted (high):
  » D serves as the data input pin
– WE_L deasserted (high), OE_L asserted (low):
  » D serves as the data output pin
Row and column addresses share the same pins (A):
– RAS_L goes low: pins A are latched in as the row address
– CAS_L goes low: pins A are latched in as the column address
– RAS/CAS are edge-sensitive

Main Memory Performance
Three organizations:
– Simple: CPU, cache, bus, and memory all the same width (32 bits)
– Interleaved: CPU, cache, and bus are 1 word wide; memory has N modules
  (e.g., 4 modules); the example is word-interleaved
– Wide: CPU/mux is 1 word; mux/cache, bus, and memory are N words wide
  (Alpha: 64 bits & 256 bits)

Main Memory Performance
DRAM (read/write) cycle time >> DRAM (read/write) access time (≈ 2:1; why?)
DRAM (read/write) cycle time:
– How frequently can you initiate an access?
– Analogy: a little kid can only ask his father for money on Saturday
DRAM (read/write) access time:
– How quickly will you get what you want once you initiate an access?
– Analogy: as soon as he asks, his father will give him the money
DRAM bandwidth limitation analogy:
– What happens if he runs out of money on Wednesday?

Increasing Bandwidth: Interleaving
Access pattern without interleaving: the CPU starts the access for D1, waits
for the memory, and can start the access for D2 only once D1 is available.
Access pattern with 4-way interleaving: the CPU starts accesses to banks 0, 1,
2, and 3 in successive cycles; by the time bank 3 has been started, bank 0 is
ready to be accessed again.

Main Memory Performance (Figure 5.32, pg. 432)
How long does it take to send 4 words? Timing model:
– 1 cycle to send the address
– 6 cycles to access data + 1 cycle to send the data
– Cache block is 4 words
Simple memory      = 4 x (1 + 6 + 1) = 32 cycles
Wide memory        = 1 + 6 + 1 = 8 cycles
Interleaved memory = 1 + 6 + 4 x 1 = 11 cycles (one access per bank, overlapped)
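The three totals can be checked with a small model of the slide's timing parameters (a sketch; the constant and function names are mine):

```c
/* Cycles to fetch a 4-word cache block under the slide's timing model:
   1 cycle to send the address, 6 cycles to access a word, 1 cycle to send it. */
enum { ADDR = 1, ACCESS = 6, XFER = 1, WORDS = 4 };

/* Simple: one narrow module; every word pays the full round trip. */
int simple_mem(void)      { return WORDS * (ADDR + ACCESS + XFER); }

/* Wide: memory as wide as the block; one round trip fetches all 4 words. */
int wide_mem(void)        { return ADDR + ACCESS + XFER; }

/* Interleaved: 4 banks started back to back; the accesses overlap, then the
   words come back on the 1-word bus one cycle apiece. */
int interleaved_mem(void) { return ADDR + ACCESS + WORDS * XFER; }
```

These reproduce the slide's 32, 8, and 11 cycles respectively.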

Wider vs. Interleaved Memory
Wider memory:
– Cost of the wider connection
– Needs a multiplexor between the cache and the CPU
– May lead to a significantly larger/more expensive memory
Interleaved memory:
– Sends the address to several banks simultaneously
– Want number of banks > number of clocks needed to access a bank
– As the size of memory chips increases, it becomes difficult to interleave memory
– Difficult to expand memory by small amounts
– May not work well for non-sequential accesses

Avoiding Bank Conflicts
Lots of banks:
  int x[256][512];
  for (j = 0; j < 512; j = j + 1)
      for (i = 0; i < 256; i = i + 1)
          x[i][j] = 2 * x[i][j];
Even with 128 banks there are conflicts, since 512 is an even multiple of 128.
SW fix: loop interchange, or declare the array with a non-power-of-2 dimension.
HW fix: use a prime number of banks:
– bank number = address mod number of banks
– address within bank = address / number of banks
– but a modulo & divide per memory access?
– instead: address within bank = address mod number of words in a bank
  (works when the number of banks is prime, e.g., 3, 7, 31)

Page Mode DRAM: Motivation
Regular DRAM organization:
– N rows x N columns x M bits
– Reads & writes M bits at a time
– Each M-bit access requires a full RAS/CAS cycle: latch a row address on
  RAS_L, then a column address on CAS_L, for the 1st access, the 2nd access,
  and so on, even when they hit the same row
Fast page mode DRAM:
– Adds an N x M "register" to save a row

Fast Page Mode Operation
Fast page mode DRAM adds an N x M "SRAM" to save a row.
After a row is read into the register:
– Only a CAS is needed to access other M-bit blocks on that row
– RAS_L remains asserted while CAS_L is toggled: one row address, then
  successive column addresses for the 1st, 2nd, 3rd, 4th M-bit accesses

Fast Memory Systems: DRAM-Specific Techniques
Multiple column accesses: page mode
– DRAM buffers a row of bits inside the chip for column access
  (e.g., a 16-Kbit row for a 64-Mbit DRAM)
– Allows repeated column accesses without another row access
– 64-Mbit DRAM: cycle time = 90 ns, optimized (page-mode) access = 25 ns
New DRAMs: what will they cost, will they survive?
– Synchronous DRAM: provide a clock signal to the DRAM; transfers are
  synchronous to the system clock
– RAMBUS: startup company; reinvented the DRAM interface
  » Each chip is a module vs. a slice of memory
  » Short bus between CPU and chips
  » Does its own refresh
  » Variable amount of data returned
  » 1 byte / 2 ns (500 MB/s per chip)
Niche memory or main memory?
– e.g., video RAM for frame buffers: DRAM + a fast serial output
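The payoff of page mode can be estimated from the 64-Mbit figures above (a rough model I'm assuming: the first access pays the full 90 ns cycle, and each further column access on the open row pays only the 25 ns optimized access; the function names are mine):

```c
/* Nanoseconds to read n words that all fall within one DRAM row. */
int without_page_mode_ns(int n) { return n * 90; }            /* full cycle each time  */
int with_page_mode_ns(int n)    { return 90 + (n - 1) * 25; } /* open the row only once */
```

For an 8-word burst from one row, this gives 720 ns without page mode vs. 265 ns with it, which is why buffering a row inside the chip matters so much.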

Main Memory Summary
– Wider memory
– Interleaved memory: for sequential or independent accesses
– Avoiding bank conflicts: SW & HW techniques
– DRAM-specific optimizations: page mode & specialty DRAMs