1 Main Memory by J. Nelson Amaral

2 Types of Memories
Read/Write Memory (RWM): we can both store and retrieve data.
Random Access Memory (RAM): the time required to read or write a bit of memory is independent of the bit's location.
Static Random Access Memory (SRAM): once a word is written to a location, it remains stored as long as power is applied to the chip, unless the location is written again.
Dynamic Random Access Memory (DRAM): the data stored at each location must be refreshed periodically by reading it and then writing it back again, or else it disappears.
CMPUT 229

3 Static × Dynamic Memory Cell
[Diagram: a static memory cell (6 transistors) and a dynamic memory cell (1 transistor and a capacitor), each connected to a word line and a bit line.]
CMPUT 329 - Computer Organization and Architecture II

4 Writing a 1 in a Dynamic Memory
To store a 1 in this cell, a HIGH voltage is placed on the bit line, causing the capacitor to charge through the on transistor.

5 Writing a 0 in a Dynamic Memory
To store a 0 in this cell, a LOW voltage is placed on the bit line, causing the capacitor to discharge through the on transistor.

6 Destructive Reads
To read the DRAM cell, the bit line is precharged to a voltage halfway between HIGH and LOW, and then the word line is set HIGH. Depending on the charge in the capacitor, the precharged bit line is pulled slightly higher or lower. A sense amplifier detects this small change and recovers a 1 or a 0.

7 Recovering from Destructive Reads
The read operation discharges the capacitor. Therefore, a read operation in a dynamic memory must be immediately followed by a write of the same value just read, to restore the capacitor's charge.
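A minimal sketch of this read-then-restore behavior, with the capacitor modeled as a float voltage (the class and the 0.1 charge-sharing nudge are illustrative assumptions, not from the slides):

```python
# Toy model of a destructive DRAM read followed by the restoring write-back.
HIGH, LOW = 1.0, 0.0

class DramCell:
    def __init__(self):
        self.v_cap = LOW  # capacitor voltage

    def write(self, bit):
        # Word line on: capacitor charges or discharges through the transistor.
        self.v_cap = HIGH if bit else LOW

    def read(self):
        # Precharge the bit line halfway between HIGH and LOW.
        bit_line = (HIGH + LOW) / 2
        # Charge sharing nudges the bit line up or down and corrupts the cell.
        bit_line += 0.1 if self.v_cap > bit_line else -0.1
        self.v_cap = bit_line                     # cell charge now destroyed
        value = 1 if bit_line > (HIGH + LOW) / 2 else 0  # sense amplifier
        self.write(value)                         # restore what was read
        return value

cell = DramCell()
cell.write(1)
assert cell.read() == 1 and cell.v_cap == HIGH   # value survives the read
```

The key point the model captures is that `read()` would leave the capacitor at an intermediate voltage were it not for the final `write(value)`.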

8 Forgetful Memories
The problem with this cell is that it is not bi-stable: only state 0 can be kept indefinitely. When the cell is in state 1, the charge stored in the capacitor slowly dissipates and the data is lost.

9 Refreshing the Memory: Why DRAMs are Dynamic
[Figure: capacitor voltage Vcap over time. After a 1 is written, Vcap decays from VCC toward 0 V; periodic refreshes restore it to VCC before it falls below the LOW/HIGH decision point. A stored 0 stays at 0 V.]
The solution is to periodically refresh the memory cells by reading and writing back each one of them.
CMPUT 229
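The decay-and-refresh cycle sketched in the figure can be modeled in a few lines. The exponential leak rate and the millisecond timescale below are assumptions chosen only to make the numbers visible, not real device parameters:

```python
# Toy leakage model: a stored 1 decays and must be refreshed before it
# drops below the sense threshold; each refresh restores full charge.

def decay(v, ms, leak=0.9):
    # Capacitor voltage after `ms` milliseconds of leakage (assumed rate).
    return v * (leak ** ms)

v = 1.0                      # a freshly written 1
threshold = 0.5              # sense-amp decision point
# Without refresh, the 1 eventually reads as 0:
assert decay(v, 10) < threshold
# Refreshing every 5 ms keeps the value alive indefinitely:
for _ in range(4):
    v = decay(v, 5)
    assert v > threshold     # still recoverable by the sense amp...
    v = 1.0                  # ...so the read-and-write-back restores it
```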

10-13 [Diagram, repeated across four slides with successive signals highlighted: an 8-word × 4-bit memory array. A 3-to-8 decoder on address lines A2-A0 selects one row of cells; each cell has IN/OUT data lines and SEL/WR controls. Data inputs DIN3-DIN0 and outputs DOUT3-DOUT0 connect to the columns, gated by the active-low control signals WE_L, CS_L, and OE_L.]

14 Bi-directional Data Bus
[Diagram: a microprocessor connected to the memory array over shared bi-directional data lines DIO3-DIO0; the control signals WE_L, CS_L, and OE_L determine whether the memory drives the bus or samples it.]
CMPUT 229

15 DRAM High Level View
[Diagram: a memory controller connected to a DRAM chip by a 2-bit addr bus and an 8-bit data bus; data continues to the CPU. Inside the chip, supercells are arranged in a grid of rows 0-3 and cols 0-3, with an internal row buffer below the array.]
CMPUT 229 Bryant/O'Hallaron, p. 459

16 RAS = Row Address Strobe
[Diagram: RAS request with RAS = 2. The memory controller sends row address 2 over the addr lines, and the DRAM chip copies all of row 2 into its internal row buffer.]
CMPUT 229 Bryant/O'Hallaron, p. 460

17 CAS = Column Address Strobe
[Diagram: CAS request with CAS = 1. The memory controller sends column address 1, and the DRAM returns supercell (2,1) from the internal row buffer over the 8-bit data lines.]
CMPUT 229 Bryant/O'Hallaron, p. 460
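The two-phase RAS/CAS access can be sketched as follows. The class and its method names are illustrative; the point is that RAS moves a whole row into the buffer, and CAS only selects within it:

```python
# Sketch of the RAS/CAS protocol: the controller first strobes a row address
# (the whole row lands in the internal row buffer), then strobes a column
# address to pick one supercell out of that buffer.

class Dram:
    def __init__(self, rows, cols):
        # Label each supercell with its (row, col) coordinates.
        self.cells = [[(r, c) for c in range(cols)] for r in range(rows)]
        self.row_buffer = None

    def ras(self, row):
        # RAS: copy the entire row into the internal row buffer.
        self.row_buffer = self.cells[row]

    def cas(self, col):
        # CAS: select one supercell from the buffered row.
        return self.row_buffer[col]

d = Dram(rows=4, cols=4)
d.ras(2)
assert d.cas(1) == (2, 1)   # supercell (2,1), matching the slides' example
```

Splitting the address this way is what lets DRAM chips share one narrow set of address pins between the row and column halves of the address.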

18 Memory Modules
[Diagram: a 64 MB memory module consisting of 8 8Mx8 DRAMs. The memory controller broadcasts addr (row = i, col = j) to all chips. DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7 supplying bits 56-63; the eight bytes of supercell (i,j) are assembled into the 64-bit doubleword at main memory address A and sent to the CPU chip.]
Bryant/O'Hallaron, p. 461
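The byte-striping across the eight chips can be shown in a few lines. The `doubleword` helper and the stand-in chip functions are hypothetical; a real chip maps (row, col) to a stored byte:

```python
# Sketch of the 64 MB module: 8 DRAM chips, each 8M×8. A 64-bit doubleword
# is striped one byte per chip, all chips receiving the same (row, col).

def doubleword(chips, row, col):
    # chips[k] maps (row, col) -> one byte; chip k supplies bits 8k..8k+7.
    word = 0
    for k, chip in enumerate(chips):
        word |= chip(row, col) << (8 * k)
    return word

# Hypothetical stand-in chips: chip k always returns the byte value k+1.
chips = [lambda r, c, k=k: k + 1 for k in range(8)]
assert doubleword(chips, row=3, col=7) == 0x0807060504030201
```

The `k=k` default argument pins each lambda to its own chip index; without it every stand-in chip would capture the final loop value.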

19 Read Cycle on an Asynchronous DRAM
Step 1: Apply row address.
Step 2: RAS goes from high to low and remains low.
Step 3: Apply column address.
Step 4: WE must be high.
Step 5: CAS goes from high to low and remains low.
Step 6: OE goes low.
Step 7: Data appears.
Step 8: RAS and CAS return to high.
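The eight steps can be encoded as a signal trace so the ordering constraints are explicit. This is purely illustrative (the tuples are events in sequence, not timed waveforms):

```python
# The asynchronous read cycle as an ordered event list: each entry is
# (signal, new state). Ordering, not timing, is what matters here.

read_cycle = [
    ("addr", "row"),         # 1. apply row address
    ("RAS", 0),              # 2. RAS falls and stays low
    ("addr", "col"),         # 3. apply column address
    ("WE", 1),               # 4. WE held high (this is a read)
    ("CAS", 0),              # 5. CAS falls and stays low
    ("OE", 0),               # 6. OE goes low
    ("data", "valid"),       # 7. data appears
    ("RAS", 1), ("CAS", 1),  # 8. RAS and CAS return high
]

steps = [sig for sig, _ in read_cycle]
# Row address must be applied before RAS falls, column address before CAS:
assert steps.index("addr") < steps.index("RAS")
assert steps.index("CAS") > steps.index("addr", 1)
```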

20 Improved DRAMs
Central Idea: Each read to a DRAM actually reads a complete row of bits (a word line) from the DRAM core into an array of sense amps. A traditional asynchronous DRAM interface then selects a small number of these bits to be delivered to the cache/microprocessor. All the other bits already extracted from the DRAM cells into the sense amps are wasted.
CMPUT 229

21 Fast Page Mode DRAMs
In a DRAM with Fast Page Mode, a page is defined as all memory addresses that have the same row address. To read in fast page mode, steps 1 to 7 of a standard read cycle are performed first. Then OE and CAS are switched high, but RAS remains low. Steps 3 to 7 (providing a new column address, asserting CAS and OE) are then repeated for each new memory location to be read. CMPUT 229
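The saving from keeping RAS low can be sketched with cycle counts. The constants below are illustrative assumptions, not datasheet values; only the structure of the two formulas matters:

```python
# Fast-page-mode timing sketch: the row-opening cost is paid once per page,
# then each extra column in the same row pays only the CAS-side steps.

ROW_OPEN = 4    # assumed cost of the row-address/RAS portion of the cycle
COL_READ = 2    # assumed cost per column: column address + CAS/OE + data

def standard_reads(n):
    return n * (ROW_OPEN + COL_READ)     # full read cycle per location

def fast_page_reads(n):
    return ROW_OPEN + n * COL_READ       # row opened once, RAS stays low

assert standard_reads(4) == 24
assert fast_page_reads(4) == 12          # half the time for this 4-read burst
```

The longer the run of accesses within one page, the closer the cost per access gets to `COL_READ` alone.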

22 A Fast Page Mode Read Cycle on an Asynchronous DRAM

23 Enhanced Data Output RAMs (EDO-RAM)
The process to read multiple locations in an EDO-RAM is very similar to Fast Page Mode. The difference is that the output drivers are not disabled when CAS goes high. This allows the data from the current read cycle to remain at the outputs while the next cycle begins, permitting faster read cycle times. CMPUT 229

24 An Enhanced Data Output Read Cycle on an Asynchronous DRAM

25 Synchronous DRAMs (SDRAM)
A Synchronous DRAM (SDRAM) has a clock input and operates in a similar fashion to fast page mode and EDO DRAM. However, the consecutive data are output synchronously on the falling/rising edge of the clock, instead of on command by CAS. The number of data elements to be output (the length of the burst) is programmable up to the maximum size of the row. The clock in an SDRAM typically runs one order of magnitude faster than the access time for individual accesses. CMPUT 229

26 DDR SDRAM
A Double Data Rate (DDR) SDRAM is an SDRAM that transfers data on both the rising and the falling edge of the clock. Thus the effective data transfer rate of a DDR SDRAM is twice that of a standard SDRAM with the same clock frequency. A Quad Data Rate (QDR) SDRAM doubles the data transfer rate again by separating the inputs and outputs of a DDR SDRAM. CMPUT 229 P-H 473
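The rate arithmetic is worth making concrete. The 100 MHz clock and 64-bit bus below are example parameters, not values from the slide:

```python
# Peak-transfer-rate arithmetic: DDR moves data on both clock edges, so at
# equal clock frequency and bus width it doubles a plain SDRAM's rate.

def peak_rate_bytes(clock_hz, bus_bytes, transfers_per_cycle):
    return clock_hz * bus_bytes * transfers_per_cycle

sdr = peak_rate_bytes(100e6, 8, 1)   # 100 MHz, 64-bit bus, one edge
ddr = peak_rate_bytes(100e6, 8, 2)   # same clock and width, both edges
assert ddr == 2 * sdr == 1.6e9       # 1.6 GB/s versus 0.8 GB/s
```

QDR would pass `transfers_per_cycle=2` on each of two now-separate directions, doubling the aggregate again.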

27 Main Memory Supporting Caches
Morgan Kaufmann Publishers, 17 April 2017
Use DRAMs for main memory: fixed width (e.g., 1 word), connected by a fixed-width clocked bus. The bus clock is typically slower than the CPU clock.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy. P-H 471

28 Improving Memory Bandwidth
Baer p. 248

29 SIMM × DIMM
SIMM ≡ Single Inline Memory Module. DIMM ≡ Dual Inline Memory Module:
uses the two edges of the physical connector → twice as many connections to the chip.

30 Memory System Example
1 bus cycle for address transfer; 15 bus cycles per DRAM access; 1 bus cycle per data transfer. Cache with a 4-word cache block; memory is a 1-word-wide DRAM.
Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
Bandwidth = 16 bytes / 65 cycles = 0.25 byte/cycle
P-H 471

31 Example: Wider Memory
1 bus cycle for address transfer; 15 bus cycles per DRAM access; 1 bus cycle per data transfer. Cache with a 4-word cache block; memory is a 4-word-wide DRAM, so the whole block is fetched in a single access.
Miss penalty = 1 + 15 + 1 = 17 bus cycles
Bandwidth = 16 bytes / 17 cycles = 0.94 byte/cycle
Wider buses/memories are costly!
P-H 471

32 Example: Interleaved Memory
1 bus cycle for address transfer; 15 bus cycles per DRAM access; 1 bus cycle per data transfer. Cache with a 4-word cache block; memory split into 4 banks whose accesses proceed in parallel, so only the data transfers serialize on the bus.
Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
Bandwidth = 16 bytes / 20 cycles = 0.8 byte/cycle
P-H 471
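The three examples can be computed side by side from the slides' parameters:

```python
# Miss penalty for the three memory organizations, using the slides' numbers:
# 1 bus cycle for the address, 15 per DRAM access, 1 per data transfer,
# and a 4-word (16-byte) cache block.

ADDR, ACCESS, XFER, WORDS = 1, 15, 1, 4

one_word_wide = ADDR + WORDS * ACCESS + WORDS * XFER   # accesses serialize
four_word_wide = ADDR + ACCESS + XFER                  # one wide access
interleaved = ADDR + ACCESS + WORDS * XFER             # banks overlap access

assert one_word_wide == 65
assert four_word_wide == 17
assert interleaved == 20
# Bandwidths: 16/65 ≈ 0.25, 16/17 ≈ 0.94, 16/20 = 0.8 bytes/cycle.
```

Interleaving gets most of the wide-memory benefit while keeping the narrow, cheaper bus: only the four 1-cycle transfers serialize, not the four 15-cycle accesses.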

33 Split-Transaction Bus
Issue: Memory should not hold the processor-memory bus while it fetches the data into its buffers.
Solution: Split-transaction bus. Phase 1 for access A can proceed in parallel with Phase 2 for access B.
Example (load):
Phase 1: Processor sends the address and operation type to the bus, then releases the bus.
Phase 2: Memory fetches the data into its buffers.
Phase 3: Memory controller requests the bus; memory sends the data onto the bus and releases the bus.
Baer p. 250
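A toy timeline shows why releasing the bus helps. The cycle counts reuse the earlier slides' 1/15/1 assumptions, and the perfect-overlap model (requests to independent banks) is an idealization:

```python
# Split-transaction sketch: the bus is free during the memory's 15-cycle
# fetch, so a second request's address phase overlaps the first's access.

ADDR, ACCESS, DATA = 1, 15, 1

def non_split(n):
    # Bus held for the whole access: n requests fully serialize.
    return n * (ADDR + ACCESS + DATA)

def split(n):
    # Bus busy only for address and data beats; accesses to different
    # banks overlap, so later requests add just their bus beats.
    return ADDR + ACCESS + DATA + (n - 1) * (ADDR + DATA)

assert non_split(2) == 34
assert split(2) == 19    # second access hides its latency under the first
```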

34 Bank Interleaving and Cache Indexing
Issue: In both line interleaving and page interleaving, the cache index overlaps the bank index.
[Diagram: address fields (cache tag, cache index, cache displacement) with the bank index falling inside the cache index for line interleaving; for page interleaving, the address splits into page index and page offset, with the bank index inside the page index.]
⇒ on a miss, the missing line is in the same bank as the replaced line.
Line interleaving ⇒ full penalty for precharge, row and column access.
Baer p. 249

35 Bank Interleaving and Cache Indexing
Solution: bank rehash by XORing the k bits of the bank index with k bits of the tag. Baer p. 250
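The rehash can be demonstrated directly. The field positions (bank index at bit 4, tag bits at bit 10) and k = 2 are assumptions chosen for the example, not values from Baer:

```python
# XOR bank rehash sketch: bank = bank_index ^ (k bits of the tag), so two
# addresses that collide in the cache need not share a DRAM bank.

K = 2  # bank-index width in bits (assumed)

def bank(addr, rehash=False):
    bank_bits = (addr >> 4) & (2**K - 1)   # bank index field (assumed at bit 4)
    tag_bits = (addr >> 10) & (2**K - 1)   # k bits taken from the tag (assumed)
    return bank_bits ^ tag_bits if rehash else bank_bits

# Two addresses with identical low-order bits but different tags:
a, b = 0x0410, 0x0C10
assert bank(a) == bank(b)                             # same bank: conflict
assert bank(a, rehash=True) != bank(b, rehash=True)   # rehash separates them
```

Because the XOR is its own inverse, the mapping stays a bijection within each page: no two addresses that previously mapped to different locations can collide after rehashing.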

36 Memory Controller
Transactions do not need to be processed in order. Intelligent controllers optimize accesses by reordering transactions.
Baer p. 250

37 Memory Controller
Why is the controller's job difficult?
1. Must obey more than 50 timing constraints.
2. Must prioritize requests to optimize performance.
Scheduling decisions have long-term consequences: future requests depend on which request is served first (which instruction is unblocked), and the benefit of a scheduling decision depends on future processor behavior.
IpekISCA2008 p. 40
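One conventional reordering policy that the later slides compare against is FR-FCFS (first-ready, first-come-first-served). A minimal sketch, with the queue representation assumed for illustration:

```python
# FR-FCFS sketch: prefer requests that hit the currently open row (they skip
# the precharge/row-activate penalty); otherwise serve the oldest request.

def fr_fcfs(queue, open_row):
    # queue: list of (arrival_order, row) pairs, oldest first.
    row_hits = [req for req in queue if req[1] == open_row]
    return row_hits[0] if row_hits else queue[0]

queue = [(0, 5), (1, 9), (2, 9)]
assert fr_fcfs(queue, open_row=9) == (1, 9)   # row hit beats an older miss
assert fr_fcfs(queue, open_row=7) == (0, 5)   # no hit: oldest request wins
```

This greedy rule maximizes row-buffer hits locally but ignores the long-term consequences listed above, which is the gap the reinforcement-learning controller targets.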

38 Reinforcement-Learning Controller
IpekISCA2008 p. 41

39 Reinforcement-Learning Controller
IpekISCA2008 p. 42

40 Reinforcement Learning Controller Performance
Bus utilization on a 4-core system (peak BW: 6.4 GB/s): In-Order 26%, FR-FCFS 46%, RL 56%, Optimistic 80%.
IpekISCA2008 p. 42

41 Online RL is better than offline RL
IpekISCA2008 p. 48

42 Rambus
Introduced in 1997, when SDRAMs were at 100 MHz and had a peak of 0.4 GB/s.
Narrow and fast buses.
Split transactions.
Separate row and column control lines.
16 internal banks.
2010: 64-bit DDR DRAMs at 133 MHz ⇒ same peak; 400 MHz … GB/s.

43 DRAM Generations
[Figure: access time to a new row/column (ns) versus year.]
Year | Capacity | $/GB
1980 | 64 Kbit  | $
1983 | 256 Kbit | $500,000
1985 | 1 Mbit   | $200,000
1989 | 4 Mbit   | $50,000
1992 | 16 Mbit  | $15,000
1996 | 64 Mbit  | $10,000
1998 | 128 Mbit | $4,000
2000 | 256 Mbit | $1,000
2004 | 512 Mbit | $250
2007 | 1 Gbit   | $50
P-H 474

