M. Tiwari, B. Agrawal, S. Mysore, J. Valamehr, T. Sherwood (CS & ECE, UCSB). Reading group presentation by Theo.


 DIFT lifeguards are very interesting
◦ TaintCheck
◦ MemCheck
 They can help detect a range of bugs or extract useful information about the running program
 Hardware accelerators are used to achieve reasonable performance

 All hardware approaches so far use a "normal" cache for metadata storage
◦ In the normal cache hierarchy
◦ Or in extended bits (Raksha)
◦ Or in a dedicated L1-T (FlexiTaint)
 Conventional approaches are very effective for 1- or 2-bit states
 But what about word/word or even word/byte lifeguards?

 Word/Word
◦ Lockset
◦ TaintCheck with full tracking per word
◦ "Super MemCheck" with alloc/free and NULLing PCs
 Word/Byte
◦ Super MemCheck per byte
◦ Tomography lifeguard
 An L/G similar to TaintCheck that stores exactly how each input byte was used to calculate each byte in the app
 Extended-state L/Gs are very useful, but where can we store their state?

 Previous caching schemes are ineffective for byte/byte L/Gs
◦ Extending cache lines is impractical
◦ Using the normal cache will pollute the hierarchy
◦ A dedicated small L1-T will miss frequently
[Chart: average and maximum miss rates]

 Observation
◦ Tags exhibit high spatial locality
◦ If one byte is tagged as 'A', neighboring bytes will also be 'A'
 Replace the normal cache with a range cache
 Consecutive addresses with the same metadata will only occupy a single entry
[Figure: an L1-T entry maps (address → metadata); a range-cache entry maps (start addr, end addr → metadata)]
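The idea can be sketched in software. Below is a minimal, hypothetical Python model (not the paper's hardware; the `RangeEntry`/`RangeCache` names and layout are illustrative assumptions): one entry covers a whole address range sharing a tag, so a lookup is a range-containment check instead of a per-address tag fetch.

```python
# Illustrative model of a range cache: one entry per contiguous run of
# identically-tagged addresses, instead of one tag per address (L1-T style).

class RangeEntry:
    def __init__(self, start, end, tag):
        # Inclusive address range [start, end] sharing a single metadata tag.
        self.start, self.end, self.tag = start, end, tag

class RangeCache:
    def __init__(self):
        self.entries = []  # small, fully-associative set of ranges

    def lookup(self, addr):
        # A hit means addr falls inside some cached range.
        for e in self.entries:
            if e.start <= addr <= e.end:
                return e.tag
        return None  # miss: a handler would fetch the range from memory

rc = RangeCache()
rc.entries.append(RangeEntry(0x1000, 0x1FFF, "A"))  # 4 KB of identical tags: one entry
print(rc.lookup(0x1800))  # "A"
print(rc.lookup(0x3000))  # None (miss)
```

A conventional tag cache would need a line per covered block to hold the same information; here the whole 4 KB region costs a single entry.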

 Updates and reads must be handled fast
◦ Especially the common-case ones (R/W within a single area)
 Regions must be identified on the fly
◦ Split, combine, and grow ranges automatically
◦ Extremely important, since areas usually grow slowly
◦ Only a few L/Gs (e.g. AddrCheck) always get to know the areas
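The split/combine behavior can be sketched as follows; a simplified Python sketch of what an update must do (the `update` helper and tuple layout are assumptions for illustration, not the paper's mechanism): overwriting a sub-range splits the overlapped entries, and adjacent entries that end up with the same tag are merged back into one.

```python
def update(entries, start, end, tag):
    """Write `tag` over [start, end]: split overlapping ranges, then
    merge neighbors that end up with the same tag. Entries are
    (start, end, tag) tuples; illustrative only."""
    out = []
    for s, t, g in entries:
        if t < start or s > end:
            out.append((s, t, g))            # no overlap: keep as-is
            continue
        if s < start:
            out.append((s, start - 1, g))    # left remainder after split
        if t > end:
            out.append((end + 1, t, g))      # right remainder after split
    out.append((start, end, tag))
    out.sort()
    merged = [out[0]]                        # combine adjacent equal-tag ranges
    for s, t, g in out[1:]:
        ps, pt, pg = merged[-1]
        if g == pg and s == pt + 1:
            merged[-1] = (ps, t, g)
        else:
            merged.append((s, t, g))
    return merged

r = update([(0, 99, 'A')], 40, 59, 'B')   # splits the 'A' range in three
print(r)                                   # [(0, 39, 'A'), (40, 59, 'B'), (60, 99, 'A')]
print(update(r, 40, 59, 'A'))              # re-tagging merges back: [(0, 99, 'A')]
```

Note how a single write can both consume entries (merge) and create them (split), which is why the hardware must identify regions on the fly.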

 Assuming an infinite number of entries
[Figure: example range updates, showing entry counts before and after each write (e.g. 0+1→1, 1+1→2, 1+1→3)]

[Figure: update cases with a finite cache — some updates miss (e.g. N+1→3, 2+X+1→3), leaving unknown neighboring ranges]

We need an index table to detect internal segments
Not frequent, but not that rare; handled by a H/W state machine
All entries are considered dirty; S/W deals with evictions
LRU replacement
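The eviction policy above can be sketched briefly; a hypothetical Python sketch (the `RangeCacheLRU` class and stubbed handler are assumptions): since every entry is treated as dirty, each LRU eviction unconditionally hands the victim range to a software write-back handler.

```python
from collections import OrderedDict

class RangeCacheLRU:
    """Toy model of the replacement policy: LRU order over range
    entries; all entries are dirty, so every eviction calls S/W."""
    def __init__(self, capacity, writeback):
        self.capacity = capacity
        self.writeback = writeback        # S/W eviction handler
        self.entries = OrderedDict()      # (start, end) -> tag, in LRU order

    def touch(self, key):
        self.entries.move_to_end(key)     # mark most-recently used on access

    def insert(self, start, end, tag):
        if len(self.entries) >= self.capacity:
            victim, vtag = self.entries.popitem(last=False)  # LRU victim
            self.writeback(victim, vtag)  # always dirty: always write back
        self.entries[(start, end)] = tag

evicted = []
rc = RangeCacheLRU(2, lambda k, t: evicted.append((k, t)))
rc.insert(0, 63, "A")
rc.insert(64, 127, "B")
rc.insert(128, 191, "C")   # capacity exceeded: evicts LRU entry (0, 63)
print(evicted)             # [((0, 63), 'A')]
```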

 Fast case: hit in a single range
◦ Return the tag for that segment
 Medium case: multiple ranges, all cached
◦ Consecutive ranges must have different tags
◦ How to combine? Multiple solutions:
 Reduce algorithm (e.g. Raksha-style rules)
 Call S/W
 Bad case: one or more segments miss
◦ S/W brings 64B segments into the cache
 Main memory: 2-level table with 64B second-level segments
◦ Reduce and repeat until the read is serviced
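The medium case can be sketched as a reduction over the covering ranges. A hedged Python sketch (the `read_tag` helper is an assumption, and OR-ing taint bits stands in for Raksha-style rules; the paper's exact policy may differ):

```python
def read_tag(ranges, start, end, reduce_fn):
    """Combine the tags of all cached ranges overlapping [start, end].
    Ranges are (start, end, tag) tuples. Gap detection (the 'bad case',
    where a segment misses) is omitted for brevity."""
    tags = [g for s, t, g in ranges if not (t < start or s > end)]
    if not tags:
        return None                      # full miss: call the S/W handler
    result = tags[0]
    for g in tags[1:]:
        result = reduce_fn(result, g)    # medium case: reduce across ranges
    return result

ranges = [(0, 15, 0), (16, 31, 1)]       # 1-bit taint tags on two ranges
print(read_tag(ranges, 8, 24, lambda a, b: a | b))   # 1 (read touches tainted data)
print(read_tag(ranges, 0, 15, lambda a, b: a | b))   # 0 (single clean range: fast case)
```

With a single overlapping range the loop body never runs, which mirrors the fast case; only multi-range reads pay for the reduction.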

Doubly linked list for detecting internal segments

 3 L/Gs
◦ TaintCheck: 1 bit/byte
◦ MemCheck: 2 bits/byte
◦ Tomography: 32 bits/byte
 Apps
◦ SPEC, a Java app, a store webserver
 Verilog RTL model
◦ 3000 gates for the cache controller
 Single-issue, in-order CPU model

 The maximum number of tagged ranges varies greatly: it cannot be stored fully in the cache
◦ Must support swapping
 gcc: snapshot of a 128-entry cache
◦ 100 of 122 ranges < 64B
◦ Largest > 2MB
 A fixed range size is ill-advised

 Everyone spends time on simple read hits and silent updates
◦ TaintCheck spends time on "other updates"
◦ Other L/Gs have simple hits
[Chart: time breakdown for TaintCheck (1-bit), MemCheck (2-bit), Tomography (32-bit)]

 4KB L1-T vs. 128-entry range cache
 For large states, the range cache is the winner
 For small states, they are almost equal
(Baseline = infinite L1-T with 0 misses)

 L2 misses increase, caused by:
◦ Increased memory references (previous slide)
◦ L2 pollution by tags

(Baseline = infinite L1-T with 0 misses; TaintCheck 1-bit, MemCheck 2-bit)
 The difference between the L1-T and the range cache is usually minimal for small states

(Baseline = infinite L1-T with 0 misses; L/G: 32-bit Tomography)
 Significant difference for large states

 The L1-T is a very simple scheme, easily handled by H/W
◦ Misses can be hidden with prefetching
 It will still increase memory pressure, but hides the latency
◦ Prefetches can bypass the L2 and bring tags directly into the L1
 Minimizes L2 pollution
 The range-cache scheme is too complicated for H/W
◦ It must have a S/W miss handler or a complex H/W walk mechanism
◦ Its effect on the L1-I and TLB is unaccounted for

 An interesting approach that exploits the spatial stability of metadata, with good results
◦ Assuming the comparison is fair
 The equivalent of monochromatic pages only
 Multiprocessor consistency is quite tricky…
 Questions?