Continuous, Low Overhead, Run-Time Validation of Program Executions

Continuous, Low Overhead, Run-Time Validation of Program Executions
Paper by Erdem Aktas, Furat Afram, Kanad Ghose of State University of New York
Presentation by Alec Lofquist
Security is important, right? When I say security, I mean making sure that only the programs you want to run can run, and that bugs in a program can’t be exploited to take control of the system.

Outline
Common vulnerabilities
Existing work
What is REV
Experimental Assessment of REV
Limitations of REV

Code Injection Attacks
An attacker somehow gets their own program into memory and moves control flow to it
Direct code injection
  Binary is overwritten by another process
Indirect code injection
  Binary overwritten or “added to” using a vulnerability in the program
  The classic example is a buffer overflow exploit
Features like the NX bit protect against code injection attacks fairly well. The NX bit allows programmers or the OS to mark memory as non-executable (typically done for the stack). Most programs don’t need memory that is both executable and writable; JITs are the notable exception.

Code-reuse Attacks
Return-oriented programming (ROP)
  Find existing code sequences in the binary that end with a return and do something useful just before returning (each is called a “gadget”)
  If we could compose the right gadgets, we could “write” any program…
  We can do this without execute permissions by manipulating the call stack
  Overwrite the return addresses of many stack frames with a chain of gadget addresses
  Control jumps to the first gadget in the chain, performs its work, returns, then jumps to the second gadget on the stack, and so on
Also jump-oriented programming, return-to-libc, overwriting vtables
  Jump-oriented programming: like ROP, but with indirect jumps instead of ret instructions
  Return-to-libc: sort of like ROP, but using libc functions instead of gadgets
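The gadget chaining described above can be illustrated with a toy model. This is only a sketch: the "machine", the gadget functions, and the addresses are all invented for illustration and do not correspond to any real binary or to anything in the paper. Each gadget stands in for a short code sequence that ends in a return, and the overwritten stack is just a list of gadget addresses.

```python
# Toy model of ROP-style chaining. All names and addresses are hypothetical.

def gadget_load_five(state):
    state["acc"] = 5          # e.g. "mov acc, 5; ret"

def gadget_double(state):
    state["acc"] *= 2         # e.g. "add acc, acc; ret"

def gadget_add_one(state):
    state["acc"] += 1         # e.g. "inc acc; ret"

# Existing code in the "binary": address -> gadget.
GADGETS = {
    0x1000: gadget_load_five,
    0x2000: gadget_double,
    0x3000: gadget_add_one,
}

def run_chain(stack):
    """Simulate repeated 'ret' instructions popping attacker-chosen addresses."""
    state = {"acc": 0}
    trace = []
    while stack:
        ret_addr = stack.pop()        # 'ret' pops the next return address
        GADGETS[ret_addr](state)      # control transfers to existing code
        trace.append(ret_addr)
    return state["acc"], trace

# The attacker only writes data (addresses) to the stack; no new code is injected.
# The last list element is the top of the stack, so it runs first.
overwritten_stack = [0x3000, 0x2000, 0x1000]
result, trace = run_chain(overwritten_stack)
```

Note that nothing executable was written by the attacker here, which is why a non-executable stack alone does not stop this class of attack.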

How can we protect against these exploits?
Static program certification before execution
  Useless for protecting against zero-day vulnerabilities
Run-time validation
  Validating the executed instructions (code injection attacks)
  Validating program control flow (ROP/JOP/vtable/return-to-libc attacks)
Zero-day vulnerability: a vulnerability, unknown to the vendor, in a program that has already been shipped.
Static certification is common because it’s easy to implement (even in software).
The NX bit is also used to protect against code injection attacks.

Problems with Existing Designs
Too slow to be practical
Do not play well with OoO CPU designs
Program size limitations
Require modifications to binaries
Require source code access
Not continuous: only validate at checkpoints
The paper discusses nearly a dozen related works, listing reasons why the authors find them unsatisfying. Many of these target embedded systems (first three items).
Modifications to binaries: use a modified ISA.
Source code access: may be needed to recompile for a new ISA or to generate signatures.
Validating only at checkpoints can be defeated if the attacker cleans up after themselves.

REV Design Goals
1. Be general enough to detect any kind of compromise to code and control flow.
   Detect both code injection and code reuse attacks.
2. Be scalable for arbitrarily large programs.
   Many existing implementations keep reference signatures in CPU-internal tables, which might work for small embedded programs but doesn’t scale to large binaries. Cross-module calls must also be supported.
3. Have a low performance overhead.
   Existing implementations are often too slow to be usable and have very high penalties for context switches.
4. Be transparent to executables.
   Existing implementations require modifying the binary through some sort of ISA change, and these changes often require source-code-level access.
5. Changes must be validated before being committed.
   We don’t want to allow changes to the permanent state of the system without first validating them. This is a stumbling block for some hardware and pretty much all software CFI implementations, which only validate at certain checkpoints.
6. Support speculative execution.
   Allow aborting/rolling back.

How REV Works
A pipelined Crypto Hash Generator (CHG) calculates hashes of basic blocks (BBs) on the fly as instructions are fetched
Compare calculated hashes to reference hashes before commit
  If they don’t match, raise an exception
When committing control-flow-modifying instructions (branch/call/jump/ret), validate that control flowed into the BB along an expected path
  Compare the computed target address with the stored target addresses
The CHG has from instruction fetch to commit to calculate the hash.
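The two checks on this slide (code hash, then control flow) can be sketched in software. This is a rough model, not the hardware design: SHA-256 stands in for whatever hash the CHG actually implements, the table layout and the names bb_hash, validate_commit, and reference_table are hypothetical, and only the predecessor side of the control-flow check is modeled. Keeping only the last 4 bytes of the hash follows the limitation noted later in the talk.

```python
import hashlib

class ValidationError(Exception):
    """Stands in for the exception REV raises on a mismatch."""

def bb_hash(instr_bytes):
    # Assumption: SHA-256 as a stand-in hash, truncated to its last 4 bytes.
    return hashlib.sha256(instr_bytes).digest()[-4:]

def validate_commit(bb_addr, instr_bytes, prev_bb_addr, table):
    """Check a basic block before its results would be committed."""
    ref = table.get(bb_addr)
    if ref is None:
        raise ValidationError("no reference signature for this block")
    # Check 1: the instructions that executed match the reference hash.
    if bb_hash(instr_bytes) != ref["hash"]:
        raise ValidationError("code hash mismatch")
    # Check 2: control arrived from an expected predecessor block.
    if prev_bb_addr not in ref["predecessors"]:
        raise ValidationError("unexpected control flow")
    return True

# Hypothetical reference data for one basic block at address 0x400.
good_block = b"\x90\xc3"
reference_table = {
    0x400: {"hash": bb_hash(good_block), "predecessors": {0x3F0}},
}

ok = validate_commit(0x400, good_block, 0x3F0, reference_table)
```

In this model a code injection attack fails check 1, while a code reuse attack that jumps into an unmodified block from an unexpected place fails check 2.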

Reference Signatures
Stored encrypted in main memory in a hash table (Signature Table)
Each entry contains the crypto hash for a BB and its successor/predecessor addresses
The CPU is informed of the tables by special limit register/base address register pairs
Generated with static analysis or profiling
Accessed through a Signature Cache (SC) to improve performance
The limit/base register pairs are how multiple modules are supported. There is a hardware limit on the number of limit/base register pairs.

REV Diagram
CHG: pipelined, responsible for hashing each basic block by commit
SC: cache for recent signatures; on a miss, the request goes through the normal processor memory hierarchy (L1/L2 caches)
SAG: generates the main memory address of the signature table entry from the base address registers and retrieves it
  Match -> return entry
  No match -> iterate through the “Next entry” linked-list pointers; if no entry matches, generate an exception
  The SAG also holds the limit registers/base pointers for the signature tables
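The SAG lookup path (compute an entry address from the table base, then follow "Next entry" pointers on a mismatch) resembles a chained hash table. The sketch below models that idea only. The entry layout, ENTRY_SIZE, NUM_BUCKETS, indexing by block address modulo the bucket count, and the use of a Python dict as "main memory" are all assumptions made for illustration, not details from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ENTRY_SIZE = 32    # hypothetical size of one table entry, in bytes
NUM_BUCKETS = 8    # hypothetical number of buckets in the table

@dataclass
class Entry:
    bb_addr: int                 # tag identifying the basic block
    bb_hash: bytes               # reference hash for the block
    next_entry: Optional[int]    # memory address of the next entry, if any

class SignatureLookupError(Exception):
    """Stands in for the exception raised when no entry matches."""

def bucket_address(table_base, bb_addr):
    # Assumed indexing scheme: table base plus a bucket offset.
    return table_base + (bb_addr % NUM_BUCKETS) * ENTRY_SIZE

def sag_lookup(memory: Dict[int, Entry], table_base: int, bb_addr: int) -> Entry:
    addr = bucket_address(table_base, bb_addr)
    while addr is not None:
        entry = memory.get(addr)
        if entry is None:
            break                       # empty bucket
        if entry.bb_addr == bb_addr:
            return entry                # match: return the entry
        addr = entry.next_entry         # no match: follow the chain
    raise SignatureLookupError(hex(bb_addr))

# Two blocks that land in the same bucket; the second is chained at 0x9000.
TABLE_BASE = 0x8000
memory = {
    0x8000: Entry(bb_addr=0x10, bb_hash=b"\x01\x02\x03\x04", next_entry=0x9000),
    0x9000: Entry(bb_addr=0x18, bb_hash=b"\x05\x06\x07\x08", next_entry=None),
}
found = sag_lookup(memory, TABLE_BASE, 0x18)
```

Each step along the chain is another memory access, which is one reason a signature cache in front of this lookup matters for performance.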

Signature Cache Entry

Signature Table (RAM)
Encrypted in RAM, decrypted on the fly in the CPU
Extra tags are used to distinguish between blocks with matching crypto hash values (extremely unlikely; hashes ended up unique in experiments)
Size ranges from 15% to 52% of binary size, with an average of around 37%

Overhead
Dynamic power overhead at 3 GHz is estimated at 7.2% of the power consumption of the base core design
  “With a shared L3 cache and the I/O pad power added in, the overall power overhead added by the REV logic to a multicore chip is reduced from 7.2% to less than 5.5%.”
Area overhead is about 8% of the base core
Performance overhead is 1.87% on average across all SPEC CPU 2006 benchmarks
Evaluated through simulation using the cycle-accurate MARSS simulator, which models a modern x86-64 OoO CPU with L1/L2 caches. An additional L1 data cache port was assumed for use by the SC.
Area/power could be even lower if crypto logic is shared with the CPU.
Control flow locality has a strong impact on program performance (SC misses).

IPC for SPEC CPU 2006

IPC Overhead for SPEC CPU 2006
Gobmk: AI for the game Go

REV Limitations
Self-modifying code (e.g. JIT-compiled code)
Only the last 4 bytes of hashes are used
Limited number of simultaneous modules
Control flow validation during exceptions?
Is generating control flow signatures difficult?
Self-modifying code, two options:
  1. Generate references at run time (slow: “significant overhead”)
  2. Disable REV for JITed code (give up)
It may be possible to alter a BB such that the last 4 bytes of its hash still match.
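The concern about truncated hashes can be made concrete with a small experiment. The sketch below assumes SHA-256 as a stand-in hash and brute-forces a different byte string whose truncated hash matches that of an original block. It truncates to 2 bytes so the search finishes quickly; with 4 bytes the same search would need on the order of 2^32 attempts on average. The function names and the byte strings are hypothetical, and a real attacker would additionally need the altered block to be useful code.

```python
import hashlib
from itertools import count

def truncated_hash(block, nbytes):
    # Assumption: SHA-256 stand-in, keeping only the last nbytes bytes.
    return hashlib.sha256(block).digest()[-nbytes:]

def find_colliding_block(original, nbytes):
    """Brute-force a different byte string with the same truncated hash."""
    target = truncated_hash(original, nbytes)
    for attempts in count(1):
        candidate = b"evil" + attempts.to_bytes(8, "little")
        if truncated_hash(candidate, nbytes) == target:
            return candidate, attempts

original = b"\x55\x48\x89\xe5\xc3"      # arbitrary bytes standing in for a BB
forged, attempts = find_colliding_block(original, nbytes=2)

# Expected work grows as 256**nbytes: about 2**16 tries for 2 bytes,
# about 2**32 tries for the 4 bytes kept by REV.
expected_tries_2_bytes = 256 ** 2
expected_tries_4_bytes = 256 ** 4
```

The full hashes of the two byte strings still differ; only the truncated portion collides, which is exactly the portion the comparison sees.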

Thank You
Questions?