
1 © 2004 Morgan Kaufmann Publishers
Multilevel cache
Used to reduce the miss penalty to main memory.
First level designed:
– to reduce hit time
– to be small, allowing a higher miss rate
– usually implemented on the same die as the processor
Second level designed:
– to reduce the miss rate (and hence the overall miss penalty)
– to be larger in size
– can be on- or off-chip (built from SRAMs)

2 © 2004 Morgan Kaufmann Publishers
Multilevel cache: example (page 505)
– Processor with base CPI = 1.0 (assuming all references hit in the primary cache)
– Clock rate 5 GHz
– Main memory access time of 100 ns (including miss handling)
– Miss rate per instruction at the primary cache is 2%
How much faster will the processor be if we add a second-level cache with a 5 ns access time that is large enough to reduce the miss rate to main memory to 0.5%?

3 © 2004 Morgan Kaufmann Publishers
Solution: using total execution time.
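The worked numbers from the slide are not in the transcript; the standard calculation for this example (converting each access time into clock cycles, then comparing CPIs) can be sketched as follows. Variable names are mine:

```python
# Convert access times to cycles: at 5 GHz there are 5 cycles per ns.
clock_ghz = 5
main_mem_penalty = 100 * clock_ghz   # 100 ns miss penalty -> 500 cycles
l2_penalty = 5 * clock_ghz           # 5 ns L2 access      -> 25 cycles

# One-level cache: every primary miss (2%) pays the full memory penalty.
cpi_one_level = 1.0 + 0.02 * main_mem_penalty                      # 11.0

# Two-level cache: primary misses (2%) pay the L2 penalty; only the
# misses that also miss in L2 (0.5%) go to main memory.
cpi_two_level = 1.0 + 0.02 * l2_penalty + 0.005 * main_mem_penalty  # 4.0

speedup = cpi_one_level / cpi_two_level                             # 2.75
```

So the processor with the secondary cache is faster by a factor of 11.0 / 4.0 = 2.75, i.e. roughly 2.8×.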

4 © 2004 Morgan Kaufmann Publishers
Cache Complexities
It is not always easy to understand the implications of caches:
– theoretical behavior of radix sort vs. quicksort
– observed behavior of radix sort vs. quicksort

5 © 2004 Morgan Kaufmann Publishers
Cache Complexities
Here is why: memory system performance is often the critical factor.
– Multilevel caches and pipelined processors make it harder to predict outcomes.
– Compiler optimizations that increase locality sometimes hurt ILP: think of placing instructions that access the same data near each other in the code, leading to data hazards.
It is difficult to predict the best algorithm: we need experimental data.

6 © 2004 Morgan Kaufmann Publishers
Virtual memory

7 Memory Hierarchy
– Cache (SRAM)
– Main memory (DRAM)
– Disk storage (magnetic media)

8 Issues
DRAM is too expensive to buy by the gigabyte.
– Yet we want our programs to work even if they require more DRAM than we bought.
– We also don’t want a program that works on a machine with 128 MB of DRAM to stop working if we run it on a machine with only 64 MB of main memory.
We run more than one program on the machine.
– The sum of the memory needed by all of them usually exceeds the amount available.
– We need to protect the programs from each other.

9 Virtual Memory
With the virtual memory technique, main memory can act as a cache for secondary storage (disk).
Virtual memory is responsible for mapping blocks of memory (called pages) from one set of addresses (virtual addresses) to another set (physical addresses).

10 Virtual memory advantages
Illusion of having more physical memory
– keep only the active portions of a program in RAM
Program relocation
– maps a virtual address of the program to a physical address in memory
– the program can be put anywhere in memory: no need for a single contiguous block of main memory; the program is relocated as a set of fixed-size pages
Protection (of code and data between simultaneously running programs)
– each program has its own address space
– virtual memory translates the program’s address space into physical addresses while enforcing protection

11 Virtual memory terminology
Blocks are called pages.
– A virtual address consists of:
– a virtual page number
– a page offset field (the low-order bits of the address)
Misses are called page faults,
– and they are generally handled as an exception.
Address fields (32-bit virtual address): bits 31..12 hold the virtual page number, bits 11..0 hold the page offset.
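The field split above is just a shift and a mask; a minimal sketch, assuming the 32-bit address and 12-bit offset used on the following slides (function name is mine):

```python
# Split a virtual address into (virtual page number, page offset),
# assuming 4 KB pages, i.e. a 12-bit page offset.
PAGE_OFFSET_BITS = 12

def split_virtual_address(va: int):
    vpn = va >> PAGE_OFFSET_BITS                  # high-order bits 31..12
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)   # low-order bits 11..0
    return vpn, offset

vpn, offset = split_virtual_address(0x12345678)
# vpn = 0x12345, offset = 0x678
```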

12 Page faults
A page fault occurs when the data is not in memory and must be retrieved from disk.
– Huge miss penalty (main memory is about 100,000 times faster than disk), so pages should be fairly large (4 KB to 16 KB).
– Reducing page faults is important (LRU replacement is worth the price).
– Faults can be handled in software instead of hardware (the overhead is small compared to the disk access time).
– Using write-through is too expensive, so we use write-back.
The structure that records where each page resides (in memory or on disk) is called the page table.
– The page table is stored in memory.

13 (Figure: the CPU issues a virtual address, which is translated to an address in main memory.)
Here the page size is 2^12 = 4 KB (determined by the number of bits in the page offset).
The number of allowed physical pages is 2^18.
Thus main memory is at most 1 GB (2^18 × 4 KB = 2^30 bytes), while the virtual address space is 4 GB (2^32 bytes).

14 Placing a page and finding it again
The high penalty of a page fault necessitates optimizing page placement.
– Fully associative placement is attractive, since it allows the OS to replace any page using sophisticated LRU-style algorithms.
A full search of main memory is impractical, so:
– use a page table that indexes memory and itself resides in memory
– the page table is indexed by the page number from the virtual address (no tags needed)
– there is a page table for each program
– a page table register indicates the page table’s location in memory
– the page table may contain entries for pages that are not in main memory but on disk
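The tag-free, index-by-page-number lookup described above can be sketched as follows. The names (`PageFault`, `translate`) and the dict-based table are illustrative, not from the textbook:

```python
# Minimal page-table walk: each entry maps a virtual page number either
# to a physical page (valid=True) or is absent/invalid (-> page fault,
# which the OS would handle in software).
class PageFault(Exception):
    pass

def translate(page_table, va, offset_bits=12):
    vpn = va >> offset_bits
    offset = va & ((1 << offset_bits) - 1)
    entry = page_table.get(vpn)      # indexed by page number: no tag compare
    if entry is None or not entry["valid"]:
        raise PageFault(vpn)
    return (entry["ppn"] << offset_bits) | offset

page_table = {0x12345: {"valid": True, "ppn": 0x00042}}
pa = translate(page_table, 0x12345678)   # -> 0x42678
```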

15 (Figure: page table.)
The page table register points to the starting address of the page table in memory.
Here the page size is 2^12 = 4 KB (determined by the number of bits in the page offset).
The number of allowed physical pages is 2^18, so main memory is at most 1 GB (2^30 bytes), while the virtual address space is 4 GB (2^32 bytes).
The number of entries in the page table is 2^20 (very large).

16 The OS Role
– The OS indexes the page table.
– The OS moves pages in and out of memory.
– When a process is created, the OS reserves enough space on disk for all of its pages; this space is called the swap area.
– The page table maps each page in virtual memory either to a page in main memory or to a location on disk.

17 Problem
Consider a virtual memory system with the following properties:
– 40-bit virtual address
– 36-bit physical address
– 16 KB page size
What is the total size of the page table for each process on this processor, assuming that the valid, protection, dirty, and use bits take a total of 4 bits, and that all the virtual pages are in use?

18 Solution
The total size is equal to the number of entries times the size of each entry.
Each page is 16 KB, so 14 bits of the virtual and physical addresses are used as the page offset. The remaining 40 - 14 = 26 bits of the virtual address constitute the virtual page number, so there are 2^26 entries in the page table, one for each virtual page.
Each entry requires 36 - 14 = 22 bits to store the physical page number, plus 4 bits for the valid, protection, dirty, and use bits, i.e. 26 bits per entry. We round the 26 bits up to a full 32-bit word per entry, giving a total size of 2^26 × 32 bits = 256 MB.
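The arithmetic above can be checked mechanically; a small sketch with the problem's parameters (variable names are mine):

```python
# Page-table size = number of entries x rounded entry size.
virtual_bits, physical_bits, page_kib, status_bits = 40, 36, 16, 4

offset_bits = (page_kib * 1024).bit_length() - 1   # 16 KB page -> 14 offset bits
entries = 2 ** (virtual_bits - offset_bits)        # 2^26 virtual pages
bits_per_entry = (physical_bits - offset_bits) + status_bits   # 22 + 4 = 26

word_bits = 32                                     # round each entry up to a word
table_bytes = entries * word_bits // 8             # 2^26 x 4 B = 256 MB
```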

19 Performance of virtual memory
We must access physical memory to read the page table and translate a virtual address into a physical one, and then access physical memory again to get (or store) the data.
– A load instruction therefore performs at least 2 memory reads.
– A store instruction performs at least 1 read and then a write.

20 Translation lookaside buffer
We fix this performance problem by avoiding main memory during the translation from virtual to physical pages.
We buffer the common translations in a translation lookaside buffer (TLB): a fast cache memory dedicated to storing a small subset of valid virtual-to-physical translations.
The TLB is usually placed before the cache.
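The idea can be sketched as a small cache of recent translations consulted before the in-memory page table. The capacity and the LRU-style eviction here are illustrative assumptions, not the slide's specification:

```python
from collections import OrderedDict

# Toy TLB: a small map of recent VPN -> PPN translations. A hit avoids
# the extra memory access for the page-table walk.
class TLB:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()   # insertion order approximates LRU

    def lookup(self, vpn, page_table):
        if vpn in self.entries:                  # TLB hit
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        ppn = page_table[vpn]                    # TLB miss: walk the table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # evict least recently used
        self.entries[vpn] = ppn
        return ppn

tlb = TLB()
table = {7: 42}
ppn_miss = tlb.lookup(7, table)   # miss: fills the TLB from the page table
ppn_hit = tlb.lookup(7, table)    # hit: served without touching the table
```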

21 © 2004 Morgan Kaufmann Publishers
Main design questions for a memory hierarchy:
– Where can a block be placed?
– How is a block found?
– Which block should be replaced on a miss?
– What happens on a write?

22 © 2004 Morgan Kaufmann Publishers
HW: 7.10, 7.14, 7.20, 7.32
– Due Dec 23
Section problems: 7.9, 7.12, 7.29, 7.33

23 Chapter 8

24 Interfacing Processors and Peripherals (peripheral / external devices)
I/O design is affected by many factors (expandability, resilience, dependability).
Performance:
– measured by access latency and throughput
– depends on the connection between the devices and the system, the memory hierarchy, and the operating system
There is a variety of different users (e.g., banks, supercomputers, engineers).

25 Which performance measure is important?
It depends on the application:
– multimedia applications: most I/O requests are long streams, so bandwidth is important
– tax file processing: lots of small I/O requests; we need to handle a large number of small requests simultaneously
– ATM transactions: both high throughput and short response time

26 I/O Devices
I/O devices are very diverse in:
– behavior (i.e., input vs. output)
– partner (who is at the other end: human or machine?)
– data rate

27 I/O Example: Disk Drives
To access data:
– seek: position the head over the proper track (3 to 14 ms on average)
– rotational latency: wait for the desired sector (0.5 rotation / rotation rate, with the rotation rate in RPM)
– transfer: read the data (one or more sectors) at 30 to 80 MB/sec
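The three components above simply add; a small sketch with assumed, typical parameter values (not the slide's example on page 570):

```python
# Average disk access time = seek + rotational latency + transfer time.
def disk_access_ms(seek_ms, rpm, bytes_xfer, mb_per_s):
    rotational_ms = 0.5 / rpm * 60 * 1000            # half a rotation on average
    transfer_ms = bytes_xfer / (mb_per_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms

# Assumed values: 6 ms seek, 10,000 RPM, one 512-byte sector, 50 MB/s.
t = disk_access_ms(seek_ms=6.0, rpm=10000, bytes_xfer=512, mb_per_s=50)
# 6 ms seek + 3 ms rotational + ~0.01 ms transfer: seek and rotation dominate.
```

Note how the transfer time is negligible for small requests; this is why seek time and rotational latency dominate small-I/O workloads.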

28 Example (page 570)

29 Solution

30 Dependability, reliability and availability
Dependability is a fuzzy concept:
– it needs a reference specification.
The system alternates between two states:
1. Service accomplishment: service is delivered as specified.
2. Service interruption: the delivered service differs from what was specified.
A transition from 1 to 2 is a failure; a transition from 2 to 1 is a restoration.
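Availability quantifies the alternation between the two states above as the fraction of time spent in service accomplishment. MTTF and MTTR are the standard metrics used for this (mean time to failure / mean time to repair); the formula and the sample numbers below are my illustration, not taken from the slide:

```python
# Availability = MTTF / (MTTF + MTTR): fraction of time the system is
# in the service-accomplishment state.
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

# Assumed numbers: fail once every 999 hours, 1 hour to restore.
a = availability(mttf_hours=999.0, mttr_hours=1.0)   # -> 0.999
```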

