
1 Main Memory CS448

2 Main Memory
Bottom of the memory hierarchy
Measured in:
Access time - the time between when a read is requested and when the data is delivered
Cycle time - the minimum time between requests to memory; greater than access time to ensure the address lines are stable
Three important issues:
Capacity - Bell's law: about 1 MB of memory per MIPS is needed for balance, to avoid page faults
Latency - the time to access the data
Bandwidth - the amount of data that can be transferred per unit time
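As a quick illustration of the capacity rule of thumb above, a minimal sketch (the 500-MIPS machine is an assumed example, not from the slide):

```python
# Bell's law rule of thumb (from the slide): ~1 MB of memory per MIPS
# keeps the machine balanced and helps avoid page faults.
# The 500-MIPS figure below is an illustrative assumption.
MB_PER_MIPS = 1

def balanced_memory_mb(mips):
    """Memory size (MB) suggested by Bell's law for a CPU of the given MIPS."""
    return mips * MB_PER_MIPS

print(balanced_memory_mb(500))  # a 500-MIPS machine wants ~500 MB
```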

3 Early DRAMs Dynamic RAM
Number of address lines was a large cost issue
Solution: multiplex the address lines
RAS = Row Address Select, CAS = Column Address Select
The multiplexed DRAM receives the row address first (latched by RAS), then the column address (latched by CAS) over the same pins
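A sketch of how a multiplexed address could be split over shared pins; the 20-bit address with a 10-row/10-column split is an assumed geometry for illustration, not from the slide:

```python
# Multiplexed DRAM addressing sketch: the same pins carry the row address
# (latched by RAS) and then the column address (latched by CAS).
# The 10 row + 10 column bit geometry below is an assumption.
ROW_BITS, COL_BITS = 10, 10

def split_address(addr):
    """Return the (row, col) halves sent over the shared address pins."""
    col = addr & ((1 << COL_BITS) - 1)                 # low bits -> column
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)   # high bits -> row
    return row, col

row, col = split_address(0b1100110011_0101010101)
print(bin(row), bin(col))
```

Halving the pin count this way is exactly the cost saving the slide describes.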

4 DRAMs Dynamic RAM
A single transistor (with a capacitor) stores each bit
The charge leaks over time, so the bits must periodically be "refreshed"
All bits in a row can be refreshed by reading that row
Memory controllers refresh periodically, e.g., every 8 ms
If the CPU tries to access memory during a refresh, it must wait (hopefully this won't occur often)
Typical cycle times: 60-90 ns
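A rough estimate of what refresh costs; the 8 ms period is from the slide, while the row count (4096) and per-row time (70 ns, within the slide's 60-90 ns cycle range) are assumptions:

```python
# Fraction of time the memory is busy refreshing rather than serving the CPU.
# 8 ms period from the slide; 4096 rows and 70 ns per row are assumptions.
ROWS = 4096
ROW_REFRESH_NS = 70
PERIOD_S = 8e-3

busy_s = ROWS * ROW_REFRESH_NS * 1e-9   # time spent refreshing per period
overhead = busy_s / PERIOD_S            # fraction of time unavailable
print(f"refresh overhead ~ {overhead:.1%}")   # a few percent
```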

5 SRAMs Static RAM
Does not need a refresh
Faster than DRAM; generally not multiplexed
But more expensive
Typical memories:
DRAM - 4-8 times the capacity of SRAM; used for main memory
SRAM - 8-16 times faster than DRAM, with typical cycle times of 4-7 ns; also 8-16 times as expensive; used to build caches
Exception: Cray built main memory out of SRAM

6 Memory Example
Consider the following scenario:
1 cycle to send the address
4 cycles per word of access
1 cycle to transmit the data
If main memory is organized by word: 6 cycles for every word
Given a cache line of 4 words: 24 cycles is the miss penalty
Yikes! What can we do about this penalty?
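The slide's arithmetic, written out:

```python
# Word-organized memory: each word of the cache line pays the full
# address + access + transfer cost (numbers from the slide).
ADDR_CYCLES, ACCESS_CYCLES, XFER_CYCLES = 1, 4, 1
LINE_WORDS = 4

per_word = ADDR_CYCLES + ACCESS_CYCLES + XFER_CYCLES  # 6 cycles per word
miss_penalty = LINE_WORDS * per_word                  # 24-cycle miss penalty
print(per_word, miss_penalty)
```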

7 #1: More Bandwidth to Memory
Make a word of main memory look like a cache line
Easy to do conceptually: say we want 4 words, so send all four words back on the bus at one time instead of one after the other
In the previous scenario, we could send 1 address, access the four words (4 cycles if in parallel, or 16 if sequential), and then send all the data at once in 1 more cycle, for a total of 6 or 18 cycles
Still better than 24 even if we don't access each word in parallel
Problem: we need a wider bus, which is expensive
Especially true if this contributes to the pin count on the CPU
Usually the bus width to memory will match the width of the L2 cache
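The wide-bus timing above, as a small check:

```python
# Wider bus: one address send, one wide transfer; the four accesses
# happen either in parallel or back to back (numbers from the slide).
ADDR, ACCESS, XFER, WORDS = 1, 4, 1, 4

parallel_cycles = ADDR + ACCESS + XFER            # 1 + 4 + 1 = 6
sequential_cycles = ADDR + WORDS * ACCESS + XFER  # 1 + 16 + 1 = 18
print(parallel_cycles, sequential_cycles)         # both beat the 24-cycle baseline
```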

8 #2: Interleaved Memory Banks
Take advantage of potential parallelism by interleaving memory, e.g., 4-way interleaved memory
Allows simultaneous access to data in different memory banks
Good for sequential data
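A sketch of low-order interleaving; the timing line assumes the four banks start their accesses together and then stream one word per cycle, reusing the earlier slide's numbers (the overlap model is an assumption, not stated on this slide):

```python
# Low-order interleaving: word w lives in bank w % BANKS, so a sequential
# run of words spreads across all four banks and their accesses overlap.
BANKS = 4

def bank_of(word_addr):
    return word_addr % BANKS

# Assumed overlap model with the earlier numbers: 1 cycle address,
# 4 cycles access (overlapped across banks), then 1 cycle per word of data.
ADDR, ACCESS, XFER, WORDS = 1, 4, 1, 4
interleaved_penalty = ADDR + ACCESS + WORDS * XFER  # 1 + 4 + 4 = 9 cycles
print([bank_of(w) for w in range(8)], interleaved_penalty)
```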

9 #3: Independent Memory Banks
We can take the idea of interleaved banks a bit farther and make the banks fully independent
Multiple independent accesses, with multiple memory controllers
Each bank needs separate address lines and probably a separate data bus
We will see this idea appear again with multiprocessors that share a common memory
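A minimal sketch of the idea: with a controller per bank, unrelated requests to different banks can proceed at the same time. The routing function and bank count here are illustrative assumptions:

```python
# Independent banks sketch: each bank has its own controller and queue.
# Requests that land in different banks can be serviced simultaneously.
BANKS = 4

def route(requests):
    """Group request addresses by the bank that owns them (illustrative)."""
    queues = {b: [] for b in range(BANKS)}
    for addr in requests:
        queues[addr % BANKS].append(addr)
    return queues

q = route([0, 5, 10, 7])   # four requests land in four different banks
print(q)
```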

10 Summary of Techniques

11 #4: Avoiding Memory Bank Conflicts
Have the compiler schedule code to operate on all the data in one bank before moving on to a separate bank
This might result in cache conflicts, however
Similar to the compiler-based scheduling optimizations we saw previously
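A sketch of why access order matters: with 4 interleaved banks, a stride-1 walk spreads across all banks, but a stride-4 walk (e.g., down a column of a 4-word-wide matrix) hits the same bank on every access and serializes. The numbers are illustrative:

```python
# Bank conflicts under strided access with 4-way low-order interleaving.
BANKS = 4

stride1_banks = [a % BANKS for a in range(8)]         # spreads over all banks
stride4_banks = [a % BANKS for a in range(0, 32, 4)]  # every access: bank 0
print(stride1_banks, stride4_banks)
```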

12 Other Types of RAM
SDRAM - Synchronous Dynamic RAM
Like a DRAM, but uses pipelining across two sets of memory addresses
8-10 ns read cycle time
RDRAM - Rambus Direct RAM
Also uses pipelining
Replaces RAS/CAS with a packet-switched bus - a new interface that acts like a memory system, not a memory component
Can deliver one byte every 2 ns
Somewhat expensive; most architectures need a new bus system to really take advantage of the increased RAM speed
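The peak bandwidth implied by the slide's "one byte every 2 ns" RDRAM figure:

```python
# Peak RDRAM bandwidth from the slide's transfer rate.
BYTES_PER_TRANSFER = 1
SECONDS_PER_TRANSFER = 2e-9

bytes_per_second = BYTES_PER_TRANSFER / SECONDS_PER_TRANSFER
mb_per_second = bytes_per_second / 1e6
print(f"{mb_per_second:.0f} MB/s")   # about 500 MB/s peak
```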

