Efficient Memory Shadowing for 64-bit Architectures
Qin Zhao (MIT), Derek Bruening (VMware), Saman Amarasinghe (MIT)
ISMM 2010, Toronto, Canada, June 6, 2010


Dynamic Program Analysis
Understand Program Behavior
  – Optimization
  – Debugging
  – Security
  – Memory management
Shadow Memory Tools
  – Maintain meta-data for every memory location
  – Update meta-data on every memory operation

Examples
Memory Error Detection
  – MemCheck [VEE’07]
  – Purify [USENIX’92]
  – Dr. Memory
Dynamic Information Flow Tracking
  – LIFT [MICRO’39]
  – TaintTrace [ISCC’06]
Multi-threaded Program Analysis
  – Eraser [TOCS’97]
  – Helgrind
Memory Usage Analysis
  – CETS [ISMM’10]
  – Staleness

Shadow Memory System
Shadow Memory Manager
  – Meta-data for application memory
  – Memory mapping scheme (addr_A → addr_S)
      DMS (Direct Mapping)
      SMS (Segmented Mapping)
Instrumentor
  – Every memory operation
      Address calculation
      Meta-data update
  – Expensive
      MemCheck (~25x)
  – ~12x for addr_A → addr_S
[Figure: application memory (a.out, heap, libc, stack) alongside its shadow memory]

Direct Mapping Scheme (DMS)
Single memory region for the entire address space.
Translation:
    lea [addr] → %r1
    add %r1, disp → %r1
Issue: address conflict between mem_A and mem_S
[Figure: application and shadow regions; chart of slowdown relative to native execution]
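
The DMS translation above can be sketched in C as a single add. This is a minimal illustration, assuming one shadow byte per application byte; `DMS_DISP`, `dms_translate`, and `dms_conflicts` are hypothetical names, and the displacement value is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant offset between application and shadow memory. */
#define DMS_DISP ((uintptr_t)0x20000000u)

/* Direct mapping: one add, mirroring the "lea; add disp" sequence. */
static inline uintptr_t dms_translate(uintptr_t addr_a)
{
    return addr_a + DMS_DISP;
}

/* Returns 1 if the application range [app_lo, app_hi] overlaps its own
 * shadow range, which is the address conflict the slide warns about.
 * Wrap-around at the top of the address space is ignored here. */
static inline int dms_conflicts(uintptr_t app_lo, uintptr_t app_hi)
{
    assert(app_lo <= app_hi);
    uintptr_t shd_lo = dms_translate(app_lo);
    uintptr_t shd_hi = dms_translate(app_hi);
    return app_lo <= shd_hi && shd_lo <= app_hi;
}
```

With a single displacement, any application region large enough to reach its own shadow image makes the scheme unusable, which is what `dms_conflicts` tests.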

Segmented Mapping Scheme (SMS)
Shadow segment per application segment
Translation:
  – Segment lookup (address indexing)
  – Address translation
    lea [addr] → %r1
    mov %r1 → %r2
    shr %r2, 16 → %r2
    add %r1, disp[%r2] → %r1
[Figure: segment table mapping addr_A in App 1 and App 2 to addr_S in Shd 1 and Shd 2; chart of slowdown relative to native execution]
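
The two SMS steps (segment lookup, then address translation) can be sketched in C as follows. This assumes 32-bit addresses and 64 KB segments, the latter taken from the `shr 16` in the instruction sequence; `seg_disp`, `sms_map_segment`, and `sms_translate` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SHIFT 16                        /* 64 KB segments, as in "shr 16" */
#define SEG_COUNT (1u << (32 - SEG_SHIFT))  /* 65536 entries for 32-bit addresses */

/* Hypothetical segment table: one displacement per 64 KB application segment. */
static int32_t seg_disp[SEG_COUNT];

/* Records the displacement for the segment that contains addr_a. */
static void sms_map_segment(uint32_t addr_a, int32_t disp)
{
    seg_disp[addr_a >> SEG_SHIFT] = disp;
}

static uint32_t sms_translate(uint32_t addr_a)
{
    uint32_t index = addr_a >> SEG_SHIFT;        /* segment lookup */
    assert(index < SEG_COUNT);
    return addr_a + (uint32_t)seg_disp[index];   /* address translation */
}
```

Compared with DMS, each segment gets its own displacement, at the price of one extra table load per memory reference.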

Shadow Memory Mapping
Scaling to 64-bit Architecture
  – DMS
      Infeasible due to memory layout
  – Single-Level SMS
      Too big (~4 billion entries)
  – Multi-Level SMS
      Even more expensive
[Figure: 64-bit memory layout with user space (a.out, unusable space, stack), kernel space, and vsyscall; chart of slowdown relative to native execution]
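
The "~4 billion entries" figure can be reproduced with a small calculation. The sketch below assumes the 64 KB segment granularity of the SMS sequence is kept and that 48 address bits are in use; `sms_table_entries` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries in a single-level table that covers addr_bits of
 * address space with segments of 2^seg_shift bytes. */
static uint64_t sms_table_entries(unsigned addr_bits, unsigned seg_shift)
{
    assert(addr_bits > seg_shift && addr_bits - seg_shift < 64);
    return UINT64_C(1) << (addr_bits - seg_shift);
}
```

Under these assumptions `sms_table_entries(48, 16)` is 2^32, about 4.3 billion; at an assumed 8 bytes per entry the table alone would occupy 32 GiB, against 65536 entries for a 32-bit address space.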

Umbra (CGO’10)
Scaling to 64-bit Architecture
  – Single-Level SMS is too big but sparse
Umbra (CGO’10)
  – Eliminate empty entries
  – Compact table
  – Walk the table to find the entry

Umbra (CGO’10)
Reference Uni-Cache
  – Software cache per instruction per thread
      Segment tag & displacement
      Check uni-cache before table walk
      99.97% hit ratio
    tag = addr_A & mask;
    if (cache->tag != tag) { … // table walk }
    addr_S = addr_A + cache->disp
[Figure: chart of slowdown relative to native execution]
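
A minimal C sketch of the uni-cache check in front of a table walk. It is not Umbra's actual code: the `segment` and `unicache` layouts, the 64 KB tag mask, and the linear walk are all assumptions made for illustration, and segments are assumed to be 64 KB aligned so that one tag identifies one segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact table entry: [base, base + size) maps by disp. */
struct segment { uintptr_t base; uintptr_t size; intptr_t disp; };

/* Per-instruction, per-thread uni-cache: one tag and one displacement. */
struct unicache { uintptr_t tag; intptr_t disp; };

#define UNIT_MASK (~(uintptr_t)0xffff)   /* assumed 64 KB tag granularity */

static unsigned long table_walks;        /* counts slow-path lookups */

/* Slow path: linear walk over the compact table. Returns 0 on success. */
static int table_walk(const struct segment *tab, size_t n,
                      uintptr_t addr_a, intptr_t *disp)
{
    table_walks++;
    for (size_t i = 0; i < n; i++) {
        if (addr_a - tab[i].base < tab[i].size) {
            *disp = tab[i].disp;
            return 0;
        }
    }
    return -1;
}

static uintptr_t umbra_translate(struct unicache *cache,
                                 const struct segment *tab, size_t n,
                                 uintptr_t addr_a)
{
    uintptr_t tag = addr_a & UNIT_MASK;
    if (cache->tag != tag) {             /* miss: walk the table and refill */
        intptr_t disp = 0;
        int rc = table_walk(tab, n, addr_a, &disp);
        assert(rc == 0);
        (void)rc;
        cache->tag = tag;
        cache->disp = disp;
    }
    return addr_a + (uintptr_t)cache->disp;
}
```

A cache should start with a tag that can never match, for example `(uintptr_t)-1`, so the first reference always takes the table walk.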

EMS64: Key Idea
Umbra
    tag = addr_A & mask;
    if (cache->tag != tag) { … // table walk (0.03%) }
    addr_S = addr_A + cache->disp
EMS64
  – Speculatively use a disp without check
  – Smart shadow memory placement
      Notified by memory access violation fault for incorrect displacement
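
The speculation can be modelled in C on a toy address space of 16 slots. This is a simulation under stated assumptions, not the EMS64 implementation: the hardware fault and signal handler are replaced by an explicit check on the target slot, and `layout`, `instr`, and `ems64_translate` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define N_SLOTS 16

/* K_OTHER stands for reserved or unavailable slots: touching one faults. */
enum kind { K_OTHER, K_APP, K_SHADOW };

struct layout {
    enum kind kind[N_SLOTS];
    int disp[N_SLOTS];   /* correct displacement, in slots, per application slot */
};

/* Per-instruction state: the displacement used speculatively. */
struct instr { int disp; unsigned faults; };

static int wrap(int slot) { return (slot % N_SLOTS + N_SLOTS) % N_SLOTS; }

/* Translates an application slot without checking the displacement first. */
static int ems64_translate(const struct layout *l, struct instr *in, int app_slot)
{
    assert(l->kind[app_slot] == K_APP);
    int target = wrap(app_slot + in->disp);
    if (l->kind[target] != K_SHADOW) {
        /* Stand-in for the access violation and its handler:
         * look up the right displacement, patch it in, and retry. */
        in->faults++;
        in->disp = l->disp[app_slot];
        target = wrap(app_slot + in->disp);
    }
    return target;
}
```

The model only works if a wrong displacement can never land on a valid application or shadow slot, which is the job of the shadow memory placement.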

EMS64: Example
Displacement: {-1, 2}
   0: Application (A0)
   2: Shadow (S0)
   6: Shadow (S1)
   7: Application (A1)
  10: Shadow (S2)
  11: Application (A2)
  12: Unavailable
  13: Unavailable/Reserved
  14: Unavailable
  15: Unavailable/Reserved
  [slot number missing]: Reserved

EMS64: Potential Problem
Displacement: {-1, 2}
   0: Application (A0)
   2: Shadow (S0)
   6: Shadow (S1)
   7: Application (A1)
  10: Shadow (S2)
  11: Application (A2)
  12: Unavailable
  13: Unavailable/Reserved
  14: Unavailable
  15: Unavailable/Reserved
  [slot number missing]: Reserved

EMS64: Final Solution
Displacement: {-1, 2}
   0: Application (A0)
   1: Reserved
   2: Shadow (S0)
   4: Reserved
   5: Reserved
   6: Shadow (S1)
   7: Application (A1)
   8: Reserved
  10: Shadow (S2)
  11: Application (A2)
  12: Unavailable/Reserved
  13: Unavailable/Reserved
  14: Unavailable
  15: Unavailable/Reserved
  [slot number missing]: Reserved

Slot Finding Problem
Given n slots:
  – k Application slots
  – x Empty slots
  – y Reserved slots
Find k S-slots.
  – For each slot A_i, there is one associated slot S_i with displacement d_i, where d_i = S_i - A_i.
  – For each slot A_i and each existing displacement d_j where d_i ≠ d_j, slot ((A_i + d_j) mod n) is an R-slot or an E-slot.
  – For each slot S_i and any existing valid displacement d_j, slot ((S_i + d_j) mod n) is an R-slot or an E-slot.
Legend: A_i = Application slot, S_i = Shadow slot, E_i = Empty slot, R_i = Reserved slot
[Figure: example slot layout with slots A0, A1, E0 to E4, R0 to R2, S0, S1]
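
The three rules can be turned into a small checker. The sketch below is one reading of the slide, not the authors' algorithm: "is an R-slot or an E-slot" is interpreted as "is neither an application slot nor a shadow slot", slots are numbered 0 to n-1, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Displacement from application slot a to shadow slot s, in [0, n). */
static int slot_disp(int a, int s, int n) { return ((s - a) % n + n) % n; }

static bool is_app_or_shadow(int slot, const int *app, const int *shadow, int k)
{
    for (int i = 0; i < k; i++)
        if (slot == app[i] || slot == shadow[i])
            return true;
    return false;
}

/* Checks k (app[i], shadow[i]) pairs placed on a ring of n slots. */
static bool placement_is_valid(int n, const int *app, const int *shadow, int k)
{
    assert(n > 0 && k >= 0);
    /* A shadow slot may not coincide with an application slot
     * or with another shadow slot. */
    for (int i = 0; i < k; i++)
        for (int j = 0; j < k; j++)
            if (shadow[i] == app[j] || (i != j && shadow[i] == shadow[j]))
                return false;
    for (int j = 0; j < k; j++) {
        int dj = slot_disp(app[j], shadow[j], n);
        for (int i = 0; i < k; i++) {
            int di = slot_disp(app[i], shadow[i], n);
            /* Rule 2: a mismatched displacement applied to A_i. */
            if (di != dj && is_app_or_shadow((app[i] + dj) % n, app, shadow, k))
                return false;
            /* Rule 3: any displacement applied to S_i. */
            if (is_app_or_shadow((shadow[i] + dj) % n, app, shadow, k))
                return false;
        }
    }
    return true;
}
```

For example, with n = 16, application slots {0, 7, 11}, and shadow slots {2, 6, 10}, the checker accepts the placement; moving the first shadow slot to 1 is rejected because slot 6 plus displacement 1 lands on application slot 7.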

Slot Finding Problem
Given n slots:
  – k Application slots
  – x Empty slots
  – y Reserved slots
Can We Find k S-slots?
  – Depends on layout!
  – Guaranteed to find them, for a 48-bit address space, if application memory < 250 GB
  – Proof
      x ≥ 8k^2 + 2k + 1
      We can always find an S_i for A_i if #E-slots > #conflicts
Legend: A_i = Application slot, S_i = Shadow slot, E_i = Empty slot, R_i = Reserved slot
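
Reading the bound on this slide as x ≥ 8k^2 + 2k + 1, the largest number of application slots it covers for a given number of empty slots can be computed directly. `max_app_slots` is a hypothetical helper, and the slot counts in the note below are illustrative numbers rather than values from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Largest k with 8k^2 + 2k + 1 <= x, where x is the number of empty slots. */
static uint64_t max_app_slots(uint64_t x)
{
    assert(x <= (UINT64_C(1) << 48));   /* keeps 8k^2 far from overflow */
    uint64_t k = 0;
    while (8 * (k + 1) * (k + 1) + 2 * (k + 1) + 1 <= x)
        k++;
    return k;
}
```

With 32768 empty slots, for instance, the bound admits 63 application slots. How that count maps to the 250 GB figure depends on the slot size, which these slides do not state.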

Implementation & Optimization
Implementation
  – Shadow memory allocation
  – Add signal handler
  – Remove reference uni-cache check
    lea [addr] → %r1
    add %r1, unicache->disp → %r1
Optimization
  – Restore uni-cache checks for instructions that access multiple segments, e.g., references from memcpy
      When the number of access violations exceeds 2
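
The optimization amounts to a per-instruction fault counter. A minimal sketch, assuming the counter lives in hypothetical per-instruction state and is updated from the signal handler; the threshold of 2 comes from the slide, everything else is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define FAULT_LIMIT 2   /* from the slide: restore checks once violations exceed 2 */

/* Hypothetical per-instruction instrumentation state. */
struct instr_state {
    unsigned faults;      /* access violations caused by a wrong displacement */
    bool check_unicache;  /* true: use the checked translation again */
};

/* Intended to be called by the (assumed) signal handler after it has
 * corrected the displacement for the faulting instruction. */
static void on_access_violation(struct instr_state *st)
{
    assert(st != 0);
    st->faults++;
    if (st->faults > FAULT_LIMIT)
        st->check_unicache = true;   /* instruction likely spans several segments */
}
```

An instruction such as a copy loop in memcpy keeps switching segments, so paying for the inline check is cheaper than taking a fault on every switch.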

Experimental Results
[Figure: slowdown relative to native execution]

Thank You
Download –
Q & A

Slot Finding Example
Can always find a solution?
  – No
Legend: A_i = Application slot, S_i = Shadow slot, E_i = Empty slot, R_i = Reserved slot
[Figure: slot layout containing A0, A1, E0 to E4, R0]

E-slot | S0 (disp) | Conflict | S1 (disp) | Conflict
E0     | X (1)     | E → A1   | X (7)     | E → A0
E1     | √ (3)     |          | √ (1)     |
E2     | X (4)     | E → A0   | X (2)     | A → A1
E3     | X (5)     | E → A1   | X (3)     | E → A0
E4     | X (6)     | A → A0   | X (4)     | E → A1
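
The "No" can be checked by brute force. The sketch below assumes the slot positions 0:A0, 1:E0, 2:A1, 3:E1, 4:E2, 5:E3, 6:E4, 7:R0, which is an inference from the displacements in the table rather than something the slide states, and it treats a target as acceptable when it is neither an application nor a shadow slot. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { N = 8 };

/* Assumed positions: 0:A0 1:E0 2:A1 3:E1 4:E2 5:E3 6:E4 7:R0 */
static const int app_slot[2] = {0, 2};
static const int empty_slot[5] = {1, 3, 4, 5, 6};

static bool hits_app_or_shadow(int slot, int s0, int s1)
{
    return slot == app_slot[0] || slot == app_slot[1] || slot == s0 || slot == s1;
}

/* True if placing S0 at s0 and S1 at s1 satisfies the slot-finding rules. */
static bool pair_is_valid(int s0, int s1)
{
    const int s[2] = {s0, s1};
    const int d[2] = {(s0 - app_slot[0] + N) % N, (s1 - app_slot[1] + N) % N};
    if (s0 == s1)
        return false;                /* two shadows cannot share a slot */
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            /* mismatched displacement applied to an application slot */
            if (d[i] != d[j] && hits_app_or_shadow((app_slot[i] + d[j]) % N, s0, s1))
                return false;
            /* any displacement applied to a shadow slot */
            if (hits_app_or_shadow((s[i] + d[j]) % N, s0, s1))
                return false;
        }
    return true;
}

/* Counts valid (S0, S1) placements over all pairs of empty slots. */
static int count_valid_placements(void)
{
    int count = 0;
    for (int a = 0; a < 5; a++)
        for (int b = 0; b < 5; b++)
            if (pair_is_valid(empty_slot[a], empty_slot[b]))
                count++;
    return count;
}
```

Under these assumptions the count is zero: E1 is the only conflict-free choice for S0 and also the only one for S1, and the two shadow slots cannot share it.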