
What is it and why do we need it? Chris Ward CS147 10/16/2008.


1 What is it and why do we need it? Chris Ward CS147 10/16/2008

2 What drives us to require a cache? How and why does it work?

3 • What we would prefer in our computer memory: • Fast • Large • Cheap However, • Very fast memory = very expen$ive memory • Since we need large capacity (multi-gigabyte memories today), we must build a system that is the best compromise to keep the total cost reasonable.

4 SRAM vs. DRAM • DRAM uses only one transistor plus a capacitor per bit, so DRAMs are smaller and less expensive. • SRAM is made from four to six transistors (a flip-flop) per bit. • SRAM doesn't require external refresh circuitry or other work to keep its data intact; DRAM must be periodically refreshed. • SRAM is faster than DRAM.

5 • In the early days of PC technology, memory access was only slightly slower than register access. • Since the 1980s the performance gap between processor and memory has been growing. • CPU speed continues to double every few years, while the speed of disk and RAM cannot boast such a rapid rate of improvement. • For main-memory RAM, latency has improved from 50 nanoseconds (a nanosecond is one billionth of a second) to under 2 nanoseconds — a 25x improvement over a 30-year period.

6 (image slide)

7 • It has been discovered that for about 90% of the time our programs execute, only 10% of our code is used! • This is known as the Locality Principle. • Temporal Locality: when a program asks for a location in memory, it will likely ask for that same location again very soon thereafter. • Spatial Locality: when a program asks for a memory location at an address (let's say 1000), it will likely soon need a nearby location: 1001, 1002, 1003, 1004, etc.
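The two kinds of locality above can be made concrete with a short sketch (the 8-block toy cache and 4-address block size here are invented for illustration, not from the slides):

```python
# Toy cache model: 8 blocks, each block holding 4 consecutive addresses.
# Running it over loop-style access patterns shows why locality gives
# high hit rates even with a tiny cache.
BLOCK_SIZE = 4
NUM_BLOCKS = 8

def hit_rate(addresses):
    cache = set()                      # set of cached block numbers
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block in cache:
            hits += 1                  # temporal or spatial reuse
        else:
            if len(cache) >= NUM_BLOCKS:
                cache.pop()            # evict an arbitrary block
            cache.add(block)
    return hits / len(addresses)

# Sequential sweep (spatial locality): one miss per 4-address block.
print(hit_rate(list(range(1000, 1100))))    # 0.75
# Re-touching the same few addresses (temporal locality): almost all hits.
print(hit_rate([1000, 1001, 1002] * 30))
```

Even this crude model captures the effect: sequential code misses only once per block, and repeated accesses miss only once ever.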

8 Construct a Memory Hierarchy that tricks the CPU into thinking it has a very fast, large, cheap memory system.

9 Memory hierarchy (Main Memory and Disk estimates from a Fry's ad, 10/16/2008):

• Registers, <1 ns — fastest possible access (usually 1 CPU cycle)
• Level 1 (SRAM) cache, 2-8 ns — often accessed in just a few cycles; usually tens to hundreds of kilobytes (~$80/MB)
• Level 2 (SRAM) cache, 5-12 ns — latency higher than L1 by 2x to 10x; now multi-MB (~$80/MB)
• Main memory (DRAM), 10-60 ns — may take hundreds of cycles, but can be multiple gigabytes (e.g. 2 GB for $11, $0.0055/MB)
• Disk storage, 3,000,000-10,000,000 ns — millions of cycles of latency, but very large (e.g. 1 TB for $139, $0.000139/MB)
• Tertiary storage — several seconds of latency, can be huge (really slow)

For a 1 GHz CPU, a 50 ns wait means 50 wasted clock cycles.
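The cost of missing at one level and falling to the next can be quantified with the standard average-memory-access-time formula (the hit rate and latencies below are illustrative figures, loosely matching the table above):

```python
def amat(hit_time_ns, hit_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + (1 - hit_rate) * miss_penalty_ns

# A 2 ns L1 cache with a 95% hit rate, missing to 50 ns main memory:
print(amat(2, 0.95, 50))   # ~4.5 ns on average
```

This is why even a modest hit rate transforms performance: 95% of accesses finish in 2 ns instead of all of them paying the 50 ns trip to DRAM.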

10 • We established that the Locality Principle states that only a small amount of memory is needed for most of the program's lifetime… • We now have a Memory Hierarchy that places very fast yet expensive RAM near the CPU and larger, slower, cheaper storage further away… • The trick is to keep the data that the CPU wants in the small, expensive, fast memory close to the CPU… and how do we do that?

11  Hardware and the Operating System are responsible for moving data throughout the Memory Hierarchy when the CPU needs it.  Modern programming languages mainly assume two levels of memory, main memory and disk storage.  Programmers are responsible for moving data between disk and memory through file I/O.  Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.

12 • A cache algorithm is a computer program or a hardware-maintained structure designed to manage a cache of information. • When the smaller cache is full, the algorithm must choose which items to discard to make room for the new data. • The "hit rate" of a cache describes how often a searched-for item is actually found in the cache. • The "latency" of a cache describes how long after requesting a desired item the cache can return that item.

13 Each placement strategy is a compromise between hit rate and latency. • Direct Mapped Cache • The direct mapped cache is the simplest form of cache and the easiest to check for a hit. • Unfortunately, the direct mapped cache also has the worst hit rate, because there is only one place in the cache where any given address can be stored. • Fully Associative Cache • The fully associative cache has the best hit ratio because any line in the cache can hold any address that needs to be cached. • However, this cache suffers from the cost of searching the whole cache on every access. • A replacement algorithm is used, usually some form of LRU ("least recently used"). • N-Way Set Associative Cache • The set associative cache is a good compromise between the direct mapped and fully associative caches.
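The appeal of the direct mapped design is that lookup is one arithmetic step. A sketch of the index computation (the 256-line, 64-byte-block geometry here is an invented example, not from the slides):

```python
# Direct mapped: every address has exactly one possible cache line,
# so checking for a hit is a single index computation -- fast, but two
# hot addresses that share a line will keep evicting each other.
NUM_LINES = 256
BLOCK_SIZE = 64

def direct_mapped_line(address):
    block = address // BLOCK_SIZE
    return block % NUM_LINES          # the one home this block can have

# N-way set associative relaxes this: the same computation selects a
# *set* of N lines, and a policy such as LRU chooses within the set.
def set_index(address, num_sets):
    return (address // BLOCK_SIZE) % num_sets

# Addresses exactly NUM_LINES * BLOCK_SIZE apart collide in the
# direct mapped cache:
print(direct_mapped_line(0) == direct_mapped_line(256 * 64))   # True
```

That collision is the direct mapped cache's weakness; with even 2-way associativity both blocks could live in the same set simultaneously.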

14 What happens when we run out of main memory? Our programs need more and more RAM!

15 • Virtual Memory is basically the extension of physical main memory (RAM) into a lower-cost portion of our Memory Hierarchy (let's say, the hard disk). • A form of the overlay approach, managed by the OS and called Paging, is used to swap "pages" of memory back and forth between the disk and physical RAM. • Hard disks are huge, but do you remember how slow they are? Millions of times slower than the other memories in our pyramid.
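Since every page fault stands in for a disk access millions of times slower than RAM, the OS's goal is simply to minimize fault counts. A toy sketch of paging with LRU replacement (the frame count and reference string are invented for illustration; real OS policies are more elaborate):

```python
from collections import OrderedDict

def count_page_faults(references, num_frames):
    """Count page faults for a reference string under LRU replacement."""
    frames = OrderedDict()                 # resident pages, by recency
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)       # hit: mark as recently used
        else:
            faults += 1                    # fault: a (slow) trip to disk
            if len(frames) >= num_frames:
                frames.popitem(last=False) # evict least recently used
            frames[page] = None
    return faults

refs = [1, 2, 3, 1, 4, 1, 2, 5]
print(count_page_faults(refs, 3))          # 6 faults with 3 frames
```

Thanks to locality, real programs fault far less often than this adversarial little reference string suggests — which is the only reason virtual memory is usable at all.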

16 (image slide)

17 (image slide)

18 Sources:
• http://en.wikipedia.org/wiki/CPU_cache
• http://download.intel.com/pressroom/kits/IntelProcessorHistory.pdf
• http://processorfinder.intel.com/details.aspx?sSpec=SLBBD
• http://www.dba-oracle.com/t_history_ram.htm
• http://www.superssd.com/products/ramsan-400/indexb.htm
• http://www.pcguide.com/ref/ram/types_DRAM.htm
• http://en.wikipedia.org/wiki/Memory_hierarchy
• http://e-articles.info/e/a/title/Memory-Basics-~-ROM-DRAM-SRAM-Cache-Memory/
• http://en.wikipedia.org/wiki/Cache_algorithms
• http://www.pcguide.com/ref/mbsys/cache/funcWhy-c.html

19 Metric prefixes:
m (milli) 10^-3    k (kilo) 10^3
µ (micro) 10^-6    M (mega) 10^6
n (nano)  10^-9    G (giga) 10^9
p (pico)  10^-12   T (tera) 10^12
f (femto) 10^-15   P (peta) 10^15
a (atto)  10^-18   E (exa)  10^18
z (zepto) 10^-21   Z (zetta) 10^21
                   Y (yotta) 10^24

