CacheLab Recitation 7 10/8/2012. Outline Memory organization Caching – Different types of locality – Cache organization Cachelab – Tips (warnings, getopt,

Slides:

Advertisements

Similar presentations

CMSC 611: Advanced Computer Architecture Cache Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from.

Advertisements

Practical Caches COMP25212 cache 3. Learning Objectives To understand: –Additional Control Bits in Cache Lines –Cache Line Size Tradeoffs –Separate I&D.

Simulations of Memory Hierarchy LAB 2: CACHE LAB.

Recitation 7 Caching By yzhuang. Announcements Pick up your exam from ECE course hub ◦ Average is 43/60 ◦ Final Grade computation? See syllabus

Memory Hierarchy. Smaller and faster, (per byte) storage devices Larger, slower, and cheaper (per byte) storage devices.

CSCE 212 Chapter 7 Memory Hierarchy Instructor: Jason D. Bakos.

S.1 Review: The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of.

1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.

1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Oct 31, 2005 Topic: Memory Hierarchy Design (HP3 Ch. 5) (Caches, Main Memory and.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Nov. 3, 2003 Topic: Memory Hierarchy Design (HP3 Ch. 5) (Caches, Main Memory and.

Computer ArchitectureFall 2007 © November 12th, 2007 Majd F. Sakr CS-447– Computer Architecture.

Caching I Andreas Klappenecker CPSC321 Computer Architecture.

1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.

Caches The principle that states that if data is used, its neighbor will likely be used soon.

11/3/2005Comp 120 Fall November 10 classes to go! Cache.

Virtual Memory Topics Virtual Memory Access Page Table, TLB Programming for locality Memory Mountain Revisited.

Computer ArchitectureFall 2007 © November 12th, 2007 Majd F. Sakr CS-447– Computer Architecture.

Caches – basic idea Small, fast memory Stores frequently-accessed blocks of memory. When it fills up, discard some blocks and replace them with others.

Systems I Locality and Caching

ECE Dept., University of Toronto

Cache Lab Implementation and Blocking

Cache Lab Implementation and Blocking

CacheLab 10/10/2011 By Gennady Pekhimenko. Outline Memory organization Caching – Different types of locality – Cache organization Cachelab – Warnings.

CMPE 421 Parallel Computer Architecture

Multilevel Memory Caches Prof. Sirer CS 316 Cornell University.

10/18: Lecture topics Memory Hierarchy –Why it works: Locality –Levels in the hierarchy Cache access –Mapping strategies Cache performance Replacement.

CS1104 – Computer Organization PART 2: Computer Architecture Lecture 10 Memory Hierarchy.

CSIE30300 Computer Architecture Unit 08: Cache Hsin-Chou Chi [Adapted from material by and

3-May-2006cse cache © DW Johnson and University of Washington1 Cache Memory CSE 410, Spring 2006 Computer Systems

Virtual Memory. Virtual Memory: Topics Why virtual memory? Virtual to physical address translation Page Table Translation Lookaside Buffer (TLB)

1 How will execution time grow with SIZE? int array[SIZE]; int sum = 0; for (int i = 0 ; i < ; ++ i) { for (int j = 0 ; j < SIZE ; ++ j) { sum +=

CSE378 Intro to caches1 Memory Hierarchy Memory: hierarchy of components of various speeds and capacities Hierarchy driven by cost and performance In early.

1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.

Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.

1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.

1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.

Lecture 5: Memory Performance. Types of Memory Registers L1 cache L2 cache L3 cache Main Memory Local Secondary Storage (local disks) Remote Secondary.

1 Appendix C. Review of Memory Hierarchy Introduction Cache ABCs Cache Performance Write policy Virtual Memory and TLB.

What is it and why do we need it? Chris Ward CS147 10/16/2008.

Cache Organization 1 Computer Organization II © CS:APP & McQuain Cache Memory and Performance Many of the following slides are taken with.

1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.

Files. FILE * u In C, we use a FILE * data type to access files. u FILE * is defined in /usr/include/stdio.h u An example: #include int main() { FILE.

Cache Data Compaction: Milestone 2 Edward Ma, Siva Penke, Abhijeeth Nuthan.

نظام المحاضرات الالكترونينظام المحاضرات الالكتروني Cache Memory.

Memory Management memory hierarchy programs exhibit locality of reference - non-uniform reference patterns temporal locality - a program that references.

Chapter 9 Memory Organization. 9.1 Hierarchical Memory Systems Figure 9.1.

Memory Hierarchy and Cache. A Mystery… Memory Main memory = RAM : Random Access Memory – Read/write – Multiple flavors – DDR SDRAM most common 64 bit.

CMSC 611: Advanced Computer Architecture

CSE 351 Section 9 3/1/12.

The Goal: illusion of large, fast, cheap memory

Cache Memories CSE 238/2038/2138: Systems Programming

Section 9: Virtual Memory (VM)

Local secondary storage (local disks)

Caches and Blocking 8 October 2018.

Cache Memories September 30, 2008

Lecture 14 Virtual Memory and the Alpha Memory Hierarchy

Lecture 22: Cache Hierarchies, Memory

Andy Wang Operating Systems COP 4610 / CGS 5765

CMSC 611: Advanced Computer Architecture

Caches II CSE 351 Winter 2018 Instructor: Mark Wyse

Morgan Kaufmann Publishers Memory Hierarchy: Virtual Memory

Virtual Memory Overcoming main memory size limitation

Contents Memory types & memory hierarchy Virtual memory (VM)

CS 3410, Spring 2014 Computer Science Cornell University

Cache Memory and Performance

Sarah Diesburg Operating Systems CS 3430

Memory Principles.

Sarah Diesburg Operating Systems COP 4610

Presentation transcript:

CacheLab Recitation 7 10/8/2012

Outline Memory organization Caching – Different types of locality – Cache organization Cachelab – Tips (warnings, getopt, files) – Part (a) Building Cache Simulator – Part (b) Efficient Matrix Transpose Blocking

Memory Hierarchy Registers SRAM DRAM Local Secondary storage Remote Secondary storage We will discuss this interaction

SRAM vs DRAM tradeoff SRAM (cache) –Faster (L1 cache: 1 CPU cycle) –Smaller (Kilobytes (L1) or Megabytes (L2)) –More expensive and “energy-hungry” DRAM (main memory) –Relatively slower (hundreds of CPU cycles) –Larger (Gigabytes) –Cheaper

Caching Temporal locality – A memory location accessed is likely to be accessed again multiple times in the future – After accessing address X in memory, save the bytes in cache for future access Spatial locality – If a location is accessed, then nearby locations are likely to be accessed in the future. – After accessing address X, save the block of memory around X in cache for future access

Memory Address 64-bit on shark machines Block offset: b bits Set index: s bits

Cache A cache is a set of 2^s cache sets A cache set is a set of E cache lines –E is called associativity –If E=1, it is called “direct-mapped” Each cache line stores a block –Each block has 2^b bytes

Cachelab Warnings are errors! Include proper header files Part (a) Building a cache simulator Part (b) Optimizing matrix transpose

Warnings are Errors Strict compilation flags Reasons: – Avoid potential errors that are hard to debug – Learn good habits from the beginning Add “-Werror” to your compilation flags

Missing Header Files If function declaration is missing – Find corresponding header files – Use: man Live example – man 3 getopt

Getopt function

We want you to use getopt! You don’t have t, but why waste time reinventing the wheel? Your programs MUST us the same command line arguments as the reference programs or the autograder will not work

Part (a) Cache simulator A cache simulator is NOT a cache! – Memory contents NOT stored – Block offsets are NOT used – Simply counts hits, misses, and evictions Your cache simulator need to work for different s, b, E, given at run time. Use LRU replacement policy

Files #include FILE *my_fp=fopen(char * filename, char *mode) – Mode = “r” for read, “w+” for read/write, “w” for a new file – Returns NULL (or 0) if opening fails fscanf(fp,char *format, pointers to vars … – Same formats as printf – Returns # of items scanned – Returns EOF at the end of the file – Man fscanf for details – If reading a string, watch out for string length! Remember buf lab. Stops at white space fclose(fp) when done with the file

Cache simulator: Hints A cache is just 2D array of cache lines: – struct cache_line cache[S][E]; – S = 2^s, is the number of sets – E is associativity Each cache_line has: – Valid bit – Tag – LRU counter

Part (b) Efficient Matrix Transpose Matrix Transpose (A -> B) Matrix A Matrix B

Part (b) Efficient Matrix Transpose Matrix Transpose (A -> B) Suppose block size is 8 bytes (2 ints) Matrix A Matrix B Access A[0][0] cache miss Access B[0][0] cache miss Access A[0][1] cache hit Access B[1][0] cache miss Question: After we handle 1&2. Should we handle 3&4 first, or 5&6 first ? 1 2

Blocking What inspiration do you get from previous slide ? – Divide matrix into sub-matrices – This is called blocking (CSAPP2e p.629) – Size of sub-matrix depends on cache block size, cache size, input matrix size – Try different sub-matrix sizes We hope you invent more tricks to reduce the number of misses !

Part (b) Cache: –You get 1 kilobytes of cache –Directly mapped (E=1) –Block size is 32 bytes (b=5) –There are 32 sets (s=5) Test Matrices: –32 by 32, 64 by 64, 61 by 67