Presentation transcript: "Main Memory"

1 Main Memory Background
Random Access Memory (vs. Serial Access Memory)
Cache uses SRAM: Static Random Access Memory
–No refresh (6 transistors/bit vs. 1 transistor/bit)
–Size and cost favor DRAM; speed favors SRAM
Main Memory is DRAM: Dynamic Random Access Memory
–Dynamic since it needs to be refreshed periodically
–Addresses divided into 2 halves (memory as a 2D matrix): RAS (Row Access Strobe) and CAS (Column Access Strobe)

2 SRAM vs. DRAM
SRAM: 6 transistors (6T) per bit
–built with normal high-speed CMOS technology
DRAM: 1 transistor (1T) per bit
–built with a special DRAM process optimized for density

3 Hardware Structures
[Figure: cell schematics – an SRAM cell with a wordline and a bitline pair (b, b), and a DRAM cell with a wordline and a single bitline b]

4 DRAM Chip Organization
[Figure: row address → row decoder → memory cell array → sense amps → row buffer → column decoder (driven by the column address) → data bus]

5 DRAM Chip Organization (2)
Differences with SRAM:
–Reads are destructive: contents are erased after reading
–The row buffer reads lots of bits all at once, then parcels them out based on different column addresses
–Similar to reading a full cache line, but accessing only one word at a time
"Fast-Page Mode" (FPM) DRAM organizes the DRAM row to contain the bits for a complete page
–The row address is held constant, then fast reads are made from different locations in the same page
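The row/column address split behind fast-page mode can be sketched as follows. The 1024x1024 geometry and the bit widths here are illustrative assumptions, not from the slides; real parts differ in geometry and pin multiplexing.

```python
# Sketch: splitting an address into the row and column halves presented
# on RAS and CAS. Assumes a hypothetical 1024-row x 1024-column array.
ROW_BITS = 10
COL_BITS = 10

def split_address(addr):
    """Return (row, col): row driven on RAS, column on CAS."""
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

# Consecutive addresses fall in the same row, so fast-page mode can keep
# the row address constant and issue only a new CAS for each access:
r0, c0 = split_address(0x2A37)
r1, c1 = split_address(0x2A38)
assert r0 == r1  # same open row: only a new column strobe is needed
```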

6 Row Buffer Refresh
After a read, the contents of the DRAM cells are gone
The values are stored in the row buffer
Write them back into the cells for the next read in the future
[Figure: sense amps connected to the DRAM cells]

7 Refresh (2)
Fairly gradually, a DRAM cell will lose its contents even if it's not accessed
–Gate leakage drains a stored 1 toward 0
–This is why it's called "dynamic"
–Contrast to SRAM, which is "static": once written, it maintains its value forever (so long as power remains on)
All DRAM rows need to be regularly read and re-written
If memory keeps its value even when power is removed, it's "non-volatile" (e.g., flash, HDD, DVDs)

8 DRAM Read Timing
Accesses are asynchronous: triggered by RAS and CAS signals, which can in theory occur at arbitrary times (subject to DRAM timing constraints)

9 SDRAM Read Timing
[Figure: SDRAM burst read of a given burst length]
Double-Data Rate (DDR) DRAM transfers data on both the rising and falling edges of the clock
–Command frequency does not change
Timing figures taken from "A Performance Comparison of Contemporary DRAM Architectures" by Cuppu, Jacob, Davis and Mudge
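The DDR idea on this slide is just arithmetic: two data transfers per clock cycle doubles peak bandwidth at the same command rate. The 100 MHz / 64-bit numbers below are illustrative assumptions, not a specific part.

```python
# Rough peak-bandwidth arithmetic for DDR vs. single-data-rate SDRAM.
# Illustrative: a 100 MHz bus with a 64-bit (8-byte) data path.
bus_hz = 100 * 1_000_000
bus_bytes = 8                       # 64-bit data bus
sdr_peak = bus_hz * bus_bytes       # one transfer per clock
ddr_peak = 2 * sdr_peak             # rising + falling edge
print(ddr_peak // 1_000_000, "MB/s")  # peak only; access latency is unchanged
```

Note that this doubles only bandwidth; as the later "Faster DRAM Speed" slide points out, latency is dominated by wire delay and does not improve.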

10 Dynamic RAM
SRAM cells exhibit high speed but poor density
DRAM: simple transistor/capacitor pairs in high-density form
[Figure: DRAM array – word line, bit line, storage capacitor C, sense amp]

11 Other Types of DRAM
Synchronous DRAM (SDRAM): can transfer a burst of data given a starting address and a burst length – suitable for transferring a block of data from main memory to cache
Page Mode DRAM: access all bits in the same row
–RAS kept active, toggle CAS with a new column address
Extended Data Out (EDO) DRAM: a new access cycle can be started while keeping the data output of the previous cycle active
Rambus DRAM (RDRAM): uses pipelining to move data from RAM to cache memory

12 Rambus (RDRAM)
Synchronous interface
Row buffer cache
–the last 4 rows accessed are cached
Uses other tricks since adopted by SDRAM
–multiple data words per clock, high frequencies
Chips can self-refresh
Expensive for PCs; used by the X-Box and PS2

13 Faster DRAM Speed
Clock the FSB faster
–DRAM chips may not be able to keep up
Latency is dominated by wire delay
–Bandwidth may be improved (DDR vs. regular) but latency doesn't change much
–Instead of 2 cycles for a row access, it may take 3 cycles at a faster bus speed
–Doesn't address the latency of the memory access

14 Memory Interleaving
Interleaved memory is a design that compensates for the relatively slow speed of dynamic random-access memory (DRAM)
Main memory is divided into two or more sections (banks)
The CPU can access alternate sections immediately, without waiting for memory to catch up (through wait states)
Interleaved memory is more flexible than wide-access memory in that it can handle multiple independent accesses at once

15 Memory Interleaving (cont.)
For example, in an interleaved system with two memory banks (assuming word-addressable memory), if logical address 32 belongs to bank 0, then logical address 33 belongs to bank 1, logical address 34 to bank 0, and so on
An interleaved memory is said to be n-way interleaved when there are n banks and memory location i resides in bank i mod n
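The bank-selection rule above (location i resides in bank i mod n) can be sketched directly; the function names are illustrative only.

```python
# n-way interleaving: word i lives in bank (i mod n), at offset (i // n)
# within that bank. Default n=2 matches the two-bank example on the slide.
def bank_of(addr, n_banks=2):
    return addr % n_banks

def offset_in_bank(addr, n_banks=2):
    return addr // n_banks

# The slide's example: addresses 32, 33, 34 alternate between the banks.
assert bank_of(32) == 0 and bank_of(33) == 1 and bank_of(34) == 0
```

Consecutive words landing in different banks is what lets the CPU start an access to one bank while the other is still busy.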

16 Latency
Width/speed varies depending on memory type
Significant wire delay just getting from the CPU to the memory controller
More wire delay getting to the memory chips (plus the return trip…)

17 So what do we do about it?
Caching
–reduces average memory instruction latency by avoiding DRAM altogether
Limitations
–Capacity: programs keep increasing in size
–Compulsory misses

18 Idea: Caching!
Not caching of data, but caching of translations
[Figure: virtual pages at 0K–12K mapped to physical pages in a 0K–28K physical space; e.g., VPN 8 maps to PPN 16]

19 Memory Hierarchy: The Big Picture
[Figure: data movement in a memory hierarchy]

20 Virtual Memory has its own terminology
Each process has its own private "virtual address space" (e.g., 2^32 bytes); the CPU actually generates "virtual addresses"
Each computer has a "physical address space" (e.g., 128 megabytes of DRAM), also called "real memory"
Address translation: mapping virtual addresses to physical addresses
–Allows multiple programs to use (different chunks of physical) memory at the same time
–Also allows some chunks of virtual memory to be kept on disk rather than in main memory (to exploit the memory hierarchy)

21 Virtual Memory
Idea 1: Many programs share DRAM memory so that context switches can occur
Idea 2: Allow a program to be written without memory constraints – the program can exceed the size of main memory
Idea 3: Relocation: parts of the program can be placed at different locations in memory instead of one big chunk
Virtual memory:
(1) DRAM memory holds many programs running at the same time (processes)
(2) uses DRAM memory as a kind of "cache" for disk

22 Programmer's View
Example: 32-bit memory
–When programming, you don't care about how much real memory there is
–Even if you use a lot, memory can always be paged to disk
[Figure: 4 GB address space (AKA virtual addresses) – kernel at the top; text, data, heap, and stack in the 0–2 GB user region]

23 Pages
Memory is divided into pages, which are nothing more than fixed-size, aligned regions of memory
–Typical size: 4 KB/page (but not always)
Page 0: addresses 0–4095
Page 1: addresses 4096–8191
Page 2: addresses 8192–12287
Page 3: addresses 12288–16383
…
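With fixed-size, aligned pages, finding a page number and offset is pure arithmetic, as this small sketch shows (helper names are illustrative):

```python
PAGE_SIZE = 4096  # 4 KB pages, the typical size given on the slide

def page_number(addr):
    """Which page an address falls in (alignment makes this a division)."""
    return addr // PAGE_SIZE

def page_offset(addr):
    """Position within that page."""
    return addr % PAGE_SIZE

assert page_number(4096) == 1 and page_offset(4096) == 0
assert page_number(12288) == 3   # start of Page 3 in the list above
```

Because 4096 is a power of two, hardware implements this as a bit split: the low 12 bits are the offset and the rest are the page number.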

24 Mapping Virtual Memory to Physical Memory
Divide memory into equal-sized "chunks" (say, 4 KB each)
Any chunk of virtual memory can be assigned to any chunk of physical memory (a "page")
[Figure: a single process's virtual memory (stack, heap, static, code) mapped into 64 MB of physical memory]

25 Page Table
Maps virtual addresses to physical locations
–A "physical location" may include the hard disk
The page table implements this V → P mapping
[Figure: virtual pages at 0K–12K mapped through the page table to physical pages at 0K–28K]

26 Page Tables
[Figure: two virtual address spaces (0K–12K each), each with its own page table, mapped into one physical memory (0K–28K)]

27 Need for Translation
[Figure: the virtual page number indexes the page table in main memory to obtain the physical page number; the page offset passes through unchanged]
Example: virtual address 0xFC51908B splits into virtual page number 0xFC519 and page offset 0x08B; the page table maps 0xFC519 to physical page 0x00152, giving physical address 0x0015208B
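The slide's worked example can be reproduced in a few lines. The single-entry page table below contains only the mapping shown on the slide; everything else about it is illustrative.

```python
# 4 KB pages give a 12-bit offset; the remaining bits are the VPN.
OFFSET_BITS = 12
page_table = {0xFC519: 0x00152}   # the one mapping shown on the slide

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    ppn = page_table[vpn]          # a missing entry would be a page fault
    return (ppn << OFFSET_BITS) | offset

assert translate(0xFC51908B) == 0x0015208B
```

The offset bits are never translated; only the page-number bits are swapped.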

28 Choosing a Page Size
Page size is inversely proportional to page table overhead
A large page size permits more efficient transfer to/from disk
–vs. many small transfers
–like downloading from the Internet
A small page size leads to less fragmentation
–a big page is likely to have more bytes unused
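The "inversely proportional" claim is easy to check numerically. This sketch assumes a flat (single-level) page table over a 32-bit address space; real systems use multi-level tables, so the numbers are illustrative.

```python
# Flat page table size for a 32-bit address space: one entry per page.
def page_table_entries(addr_bits=32, page_size=4096):
    return (1 << addr_bits) // page_size

# Quadrupling... no: going from 4 KB to 64 KB pages (16x) shrinks the
# table by the same factor 16 -- overhead is inversely proportional.
assert page_table_entries(page_size=4096) == 1 << 20    # ~1M entries
assert page_table_entries(page_size=65536) == 1 << 16   # ~64K entries
```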

29 Translation Cache: TLB
TLB = Translation Look-aside Buffer
[Figure: virtual address → TLB → physical address → cache tags/data → hit?]
If the TLB hits, there is no need to do a page table lookup from memory
Note: the data cache is accessed by physical addresses now
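The TLB's role as a cache of translations can be sketched with a small map consulted before the page table. The names and the one page-table entry here are illustrative assumptions, not from the slides.

```python
OFFSET_BITS = 12
page_table = {0x00008: 0x00042}   # hypothetical VPN -> PPN mapping
tlb = {}                          # small translation cache, initially empty

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:                # TLB hit: no memory access for the walk
        ppn = tlb[vpn]
    else:                         # TLB miss: consult the page table in
        ppn = page_table[vpn]     # memory, then cache the translation
        tlb[vpn] = ppn
    return (ppn << OFFSET_BITS) | offset

translate(0x8123)                 # first access to the page misses the TLB
assert 0x00008 in tlb             # later accesses to the same page will hit
```

A real TLB is a small associative hardware structure with limited capacity and replacement, not an unbounded map; the dict stands in only for the lookup behavior.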

30 Impact on Performance?
Every time you load/store, the CPU must perform two (or more) memory accesses!
Even worse, every fetch requires translation of the PC!
Observation:
–Once a virtual page is mapped into a physical page, it will likely stay put for quite some time

