1 Above: The first magnetic core memory, from the IBM 405 Alphabetical Accounting Machine. This experimental system was tested successfully in April 1952.

Presentation transcript:

1 Above: The first magnetic core memory, from the IBM 405 Alphabetical Accounting Machine. This experimental system was tested successfully in April 1952. Right: The IBM 2361 Core Storage Module housed 16K bytes of core memory. (from Columbia University)

2 COMP 206: Computer Architecture and Implementation Montek Singh Thu, April 16, 2009 Topic: Main Memory (DRAM) Organization

3 Outline
- Introduction
- SRAM (briefly)
- DRAM Organization
- Challenges
  - Bandwidth
  - Granularity
- Performance

4 Structure of SRAM Cell
- Control logic
- One memory cell per bit
  - Cell consists of one or more transistors
  - Not really a latch made of logic
- Logic equivalent

5 Bit Slice
- Cells connected to form 1 bit position
- Word Select gates one latch from the address lines
- Note that it also selects reads
- B (and its complement, B-not) are set by R/W, Data In, and BitSelect

6 Bit Slice can Become Module
- Basically, a bit slice is a x1 memory
- Next

X 1 RAM  Now shows decoder

8 Row/Column
- If RAM gets large, there is a large decoder
  - Impossibly large!
- Also run into chip layout issues
- Larger memories are usually “2D”, in a matrix layout
- Next slide

9 16 X 1 as 4 X 4 Array
- Two decoders
  - Row
  - Column
- Address just broken up
- Not visible from outside
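The "address just broken up" idea can be sketched in a few lines. This is only an illustration: the function name `split_address` is hypothetical, and the choice to send the high-order bits to the row decoder and the low-order bits to the column decoder is an assumption, since the slide does not say which half goes where.

```python
def split_address(addr, col_bits=2):
    """Split a flat cell address into (row, column) for a 2D cell array.

    Assumption: high-order bits drive the row decoder and low-order bits
    drive the column decoder. With col_bits=2 and 4-bit addresses this
    models 16 cells arranged as a 4 x 4 array.
    """
    row = addr >> col_bits                # upper bits select the row
    col = addr & ((1 << col_bits) - 1)    # lower bits select the column
    return row, col


if __name__ == "__main__":
    for addr in range(16):
        print(addr, split_address(addr))
```

From the outside the memory still looks like 16 one-bit locations; only the internal decoding changes.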

10 Dynamic RAM
- Capacitor can hold charge
- Transistor acts as gate
- No charge is a 0
- Can add charge to store a 1
- Then open switch (disconnect)
- Can read by closing switch
  - Explanation next

11 Precharge and Sense Amps
- You’ll see “precharge time”
- B is precharged to ½ V
- Charge/no-charge on C will increase or decrease voltage
- Sense amps detect this

12 DRAM Characteristics
- Destructive read
  - When cell is read, charge is removed
  - Must be restored after a read
- Refresh
  - Also, there’s steady leakage
  - Charge must be restored periodically

13 DRAM Logical Diagram

14 DRAM Refresh
- Many strategies w/ logic on chip
- Here a row counter

15 Timing
- Say need to refresh every 64 ms
- Distributed refresh
  - Spread refresh out evenly over 64 ms
  - Say on a 4Mx4 DRAM, refresh one row every 64 ms / 4096 = 15.6 µs
  - Total time spent is 0.25 ms, but spread
- Burst refresh
  - Same 0.25 ms, but all at once
  - May not be good in a computer system
- Refresh takes 1% or less of total time
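The refresh arithmetic on this slide can be checked with a short sketch. The figures (64 ms period, 4096 rows, 0.25 ms total busy time) come from the slide; the function names are hypothetical and exist only for this illustration.

```python
def refresh_interval_s(period_s, rows):
    """Time between successive row refreshes under distributed refresh."""
    return period_s / rows


def refresh_overhead(busy_s, period_s):
    """Fraction of each refresh period the DRAM spends refreshing."""
    return busy_s / period_s


if __name__ == "__main__":
    period = 64e-3   # refresh period from the slide
    rows = 4096      # rows refreshed per period (slide's 4Mx4 example)
    busy = 0.25e-3   # total refresh time per period from the slide

    print(f"interval between row refreshes: {refresh_interval_s(period, rows) * 1e6:.1f} us")
    print(f"time per row refresh: {busy / rows * 1e9:.0f} ns")
    print(f"refresh overhead: {refresh_overhead(busy, period):.2%}")
```

The overhead comes out at about 0.4%, consistent with the slide's "1% or less", and it is the same for distributed and burst refresh; only the distribution over time differs.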

16 Summary: DRAM vs. SRAM
- DRAM (Dynamic RAM)
  - Used mostly in main memory
  - Capacitor + 1 transistor per bit
  - Needs refresh every 4-8 ms
    - 5% of total time
  - Read is destructive (need for write-back)
  - Access time < cycle time (because of writing back)
  - Density (25-50):1 to SRAM
  - Address lines multiplexed
    - pins are scarce!
- SRAM (Static RAM)
  - Used mostly in caches (I, D, TLB, BTB)
  - 1 flip-flop (4-6 transistors) per bit
  - Read is not destructive
  - Access time = cycle time
  - Speed (8-16):1 to DRAM
  - Address lines not multiplexed
    - high speed of decoding is important

17 Chip Organization
- Chip capacity (= number of data bits) tends to quadruple
  - 1K, 4K, 16K, 64K, 256K, 1M, 4M, …
- In early designs, each data bit belonged to a different address (x1 organization)
- Starting with 1Mbit chips, wider chips (4, 8, 16, 32 bits wide) began to appear
  - Advantage: Higher bandwidth
  - Disadvantage: More pins, hence more expensive packaging

18 Chip Organization Example: 64Mb DRAM

19 Memory Performance Characteristics
- Latency (access time)
  - The time interval between the instant at which the data is called for (READ) or requested to be stored (WRITE), and the instant at which it is delivered or completely stored
- Cycle time
  - The time between the instant the memory is accessed, and the instant at which it may be validly accessed again
- Bandwidth (throughput)
  - The rate at which data can be transferred to or from memory
  - Reciprocal of cycle time
  - “Burst mode” bandwidth is of greatest interest
- Cycle time > access time for conventional DRAM
- Cycle time < access time in “burst mode” when a sequence of consecutive locations is read or written

20 Improving Performance
- Latency can be reduced by
  - Reducing access time of chips
  - Using a cache (“cache trades latency for bandwidth”)
- Bandwidth can be increased by using
  - Wider memory (more chips)
  - More data pins per DRAM chip
  - Increased bandwidth per data pin

21 Two Recent Problems
- DRAM chip sizes quadrupling every three years
- Main memory sizes doubling every three years
- Thus, the main memory of the same kind of computer is being constructed from fewer and fewer DRAM chips
- This results in two serious problems
  - Diminishing main memory bandwidth
  - Increasing granularity of memory systems

22 Increasing Granularity of Memory Systems
- Granularity of memory system is the minimum memory size, and also the minimum increment in the amount of memory permitted by the memory system
- Too large a granularity is undesirable
  - Increases cost of system
  - Restricts its competitiveness
- Granularity can be decreased by
  - Widening the DRAM chips
  - Increasing the per-pin bandwidth of the DRAM chips

23 Granularity Example
- We are using 16K × 1 DRAM parts, running at 2.5 MHz (400 ns cycle time). Eight such DRAM parts provide 16KB of memory with 2.5 MB/s bandwidth.
- Industry switches to 64Kb (64K × 1) DRAM parts. Two such DRAM parts provide the desired 16KB of memory. Such a system would have a 2-bit wide bus.
- To maintain a 2.5 MB/s bandwidth, parts would need to run at 10 MHz. But the parts run only at 3.7 MHz. What are the options?
[Figure: 8-bit wide bus (eight parts) vs. 2-bit wide bus (two parts)]

24 Granularity Example (2)
- Solution 1: Use eight 64K × 1 DRAM parts (six would suffice for required bandwidth). Problem: Now we have 64KB of memory rather than 16KB.
- Solution 2: Use two 16K × 4 DRAM parts (same capacity, different organization). This provides 16KB of memory at the required bandwidth.
[Figure: both solutions use an 8-bit wide bus]
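The numbers in the granularity example can be reproduced with a small sketch. The helper names below are hypothetical, and the last call assumes the 16K × 4 parts run at the same 3.7 MHz as the 64K × 1 parts, which the slides imply but do not state.

```python
import math


def bandwidth_bytes_per_s(bus_width_bits, clock_hz):
    """Peak bandwidth of a bus that moves bus_width_bits per cycle."""
    return bus_width_bits * clock_hz / 8


def capacity_bytes(parts, words, width_bits):
    """Total capacity of `parts` chips, each organized as words x width_bits."""
    return parts * words * width_bits // 8


def parts_needed(target_bytes_per_s, width_bits, clock_hz):
    """Minimum number of chips (side by side) to reach the target bandwidth."""
    return math.ceil(target_bytes_per_s * 8 / (width_bits * clock_hz))


if __name__ == "__main__":
    K = 1024
    target = bandwidth_bytes_per_s(8, 2.5e6)    # original system: 2.5 MB/s
    print(capacity_bytes(8, 16 * K, 1))         # eight 16K x 1 parts: 16 KB
    print(capacity_bytes(2, 64 * K, 1))         # two 64K x 1 parts: 16 KB
    print(bandwidth_bytes_per_s(2, 3.7e6))      # 2-bit bus at 3.7 MHz: too slow
    print(parts_needed(target, 1, 3.7e6))       # 64K x 1 parts needed: 6
    print(capacity_bytes(8, 64 * K, 1))         # Solution 1: 64 KB
    print(parts_needed(target, 4, 3.7e6))       # 16K x 4 parts needed: 2
    print(capacity_bytes(2, 16 * K, 4))         # Solution 2: 16 KB
```

The wider 16K × 4 organization meets the bandwidth target with two chips and keeps the minimum memory size at 16KB, which is the point of Solution 2.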

25 Improving Memory Chip Performance
Several techniques to get more bits/sec from a DRAM chip:
- Allow repeated accesses to the row buffer without another row access time
  - burst mode, fast page mode, EDO mode, …
- Simplify the DRAM-CPU interface
  - add a clock to reduce overhead of synchronizing with the controller
  - = synchronous DRAM (SDRAM)
- Transfer data on both rising and falling clock edges
  - double data rate (DDR)
- Each of the above adds a small amount of logic to exploit the high internal DRAM bandwidth

26 Block Diagram

27 Activate Row

28 Read (Select column)

29 Basic Mode of Operation
- Slowest mode
- Uses only single row and column address
- Row access is slow (60-70 ns) compared to column access (5-10 ns)
- Leads to three techniques for DRAM speed improvement
  - Getting more bits out of DRAM on one access given timing constraints
  - Pipelining the various operations to minimize total time
  - Segmenting the data in such a way that some operations are eliminated for a given set of accesses
[Timing diagram: Address (Row, Column), RAS, CAS, Data]

30 Nibble (or Burst) Mode
- Several consecutive columns are accessed
- Only first column address is explicitly specified
- Rest are internally generated using a counter
[Timing diagram: one RAS, four CAS pulses; address RA, CA; data D1, D2, D3, D4]

31 Fast Page Mode
- Accesses arbitrary columns within same row
- Static column mode is similar
[Timing diagram: one RAS, four CAS pulses; address RA, CA1, CA2, CA3, CA4; data D1, D2, D3, D4]
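A rough way to see why staying within one row helps is to compare total access time with and without repeated row accesses. This is a deliberately simplified sketch: the function names are hypothetical, the 60 ns row and 10 ns column figures are picked from the ranges quoted on the Basic Mode slide, and precharge and other timing constraints are ignored.

```python
def basic_mode_ns(accesses, t_row_ns, t_col_ns):
    """Every access pays a full row access plus a column access."""
    return accesses * (t_row_ns + t_col_ns)


def page_mode_ns(accesses, t_row_ns, t_col_ns):
    """One row access, then one column access per item in the same row."""
    return t_row_ns + accesses * t_col_ns


if __name__ == "__main__":
    n, t_row, t_col = 4, 60, 10   # assumed values, see lead-in
    print("basic mode:", basic_mode_ns(n, t_row, t_col), "ns")
    print("page mode: ", page_mode_ns(n, t_row, t_col), "ns")
```

Under these assumptions, four accesses within one row take 100 ns instead of 280 ns, because the slow row access is paid only once.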

32 EDO Mode
- Arbitrary column addresses
- Pipelined
- EDO = Extended Data Out
- Has other modes like “burst EDO”, which allows reading of a fixed number of bytes starting with each specified column address
[Timing diagram: one RAS, seven CAS pulses; address RA, CA1 through CA7; data D1 through D6]

33 Evolutionary DRAM Architectures
- SDRAM (Synchronous DRAM)
  - Interface retains a good part of conventional DRAM interface
    - addresses multiplexed in two halves
    - separate data pins
    - two control signals
  - All address, data, and control signals are synchronized with an external clock ( MHz)
    - Allows decoupling of processor and memory
    - Allows pipelining a series of reads and writes
  - Peak speed per memory module: MB/sec

34 Synchronous DRAM (SDRAM)
- Common type in PCs since late-90s
- Clocked
- Addresses multiplexed in two halves
- Burst transfers
- Multiple banks
- Pipelined
  - Start read in one bank after another
  - Come back and read the resulting values one after another

35 DDR DRAM
- Double Data Rate SDRAM
  - Transfers data on both edges of the clock
- Currently popular
  - DDRx, where x refers to voltage and signaling specs. DDR1 was 2.5 V, DDR2 1.8 V, DDR3 1.5 V
- Graphics cards now using GDDR4 (Graphics Double Data Rate) memory chips
- Memory clocks of 900 MHz or so (xfer 1800 MHz equivalent)
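The "1800 MHz equivalent" figure follows directly from transferring on both clock edges. The sketch below shows the arithmetic; the function names are hypothetical, and the 64-bit bus width used for the bandwidth figure is an assumption chosen only for illustration, not a value from the slides.

```python
def ddr_transfers_per_s(clock_hz):
    """Double data rate: one transfer on each clock edge, so two per cycle."""
    return 2 * clock_hz


def peak_bandwidth_bytes_per_s(clock_hz, bus_width_bits):
    """Peak bandwidth for a DDR interface of the given width."""
    return ddr_transfers_per_s(clock_hz) * bus_width_bits // 8


if __name__ == "__main__":
    clock = 900_000_000                           # 900 MHz memory clock from the slide
    print(ddr_transfers_per_s(clock))             # transfers per second
    print(peak_bandwidth_bytes_per_s(clock, 64))  # assumed 64-bit bus
```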

36 RAMBUS DRAM (RDRAM, XDR)
- RDRAM
  - Another attempt to alleviate pinout limits
  - Many (16-32), smaller banks per chip
  - Made to be read/written in packet protocol
    - Each chip has more of a controller
  - Did not do well in market. High latency.
- XDR
  - A newer technology
  - Differential, low voltage swing signaling
  - Used in PS3, 65 GB/s xfer rate

37 DRAM Controllers
- Very common to have circuit that controls memory
  - Handles banks
  - Handles refresh
- Multiplexes column and row addresses
  - RAS and CAS timing
- Northbridge on PC chip set

38 Memory Interleaving
- Goal: Try to take advantage of bandwidth of multiple DRAMs in memory system
- Memory address A is converted into (b,w) pair, where
  - b = bank index
  - w = word index within bank
- Logically a wide memory
  - Accesses to B banks staged over time to share internal resources such as memory bus
- Interleaving can be on
  - Low-order bits of address (cyclic)
    - b = A mod B, w = A div B
  - High-order bits of address (block)
  - Combination of the two (block-cyclic)
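The two basic mappings can be sketched as follows. The cyclic mapping is the slide's b = A mod B, w = A div B; the block mapping is the usual counterpart, written here under the assumption that every bank holds the same number of words. Function and parameter names are hypothetical.

```python
def low_order_interleave(addr, banks):
    """Cyclic interleaving: b = A mod B, w = A div B."""
    return addr % banks, addr // banks


def high_order_interleave(addr, words_per_bank):
    """Block interleaving: the high-order part of the address picks the bank."""
    return addr // words_per_bank, addr % words_per_bank


if __name__ == "__main__":
    banks, words_per_bank = 4, 4
    for addr in range(8):
        print(addr,
              low_order_interleave(addr, banks),
              high_order_interleave(addr, words_per_bank))
```

With cyclic interleaving, consecutive addresses land in different banks, so a sequential stream can keep all banks busy; with block interleaving, consecutive addresses stay in one bank until it is exhausted.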

39 Low-order Bit Interleaving

40 Mixed Interleaving
- Memory address register is 6 bits wide
  - Most significant 2 bits give bank address
  - Next 3 bits give word address within bank
  - LSB gives (parity of) module within bank
- 6 = 000110 (binary) = (00, 011, 0) = (0, 3, 0)
- 41 = 101001 (binary) = (10, 100, 1) = (2, 4, 1)
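The 6-bit field split described on this slide can be expressed with shifts and masks. This is a minimal sketch; the function name `decode_mixed` is hypothetical, while the field widths (2, 3, 1) come from the slide.

```python
def decode_mixed(addr):
    """Decode a 6-bit address into (bank, word, module).

    Layout from the slide: 2 MSBs = bank, next 3 bits = word within bank,
    LSB = module within bank.
    """
    if not 0 <= addr < 64:
        raise ValueError("address must fit in 6 bits")
    bank = (addr >> 4) & 0b11     # top 2 bits
    word = (addr >> 1) & 0b111    # middle 3 bits
    module = addr & 0b1           # lowest bit
    return bank, word, module


if __name__ == "__main__":
    for addr in (6, 41):
        print(addr, format(addr, "06b"), decode_mixed(addr))
```

Running it on the slide's two addresses gives (0, 3, 0) for 6 and (2, 4, 1) for 41.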

41 Other types of Memory
- ROM = Read-only Memory
- Flash = ROM which can be written once in a while
  - Used in embedded systems, small microcontrollers
  - Offer IP protection, security
- Other?