
1 Practical Data Confinement Andrey Ermolinskiy, Sachin Katti, Scott Shenker, Lisa Fowler, Murphy McCauley

2 Introduction
Controlling the flow of sensitive information is one of the central challenges in managing an organization
- Preventing exfiltration (theft) by malicious entities
- Enforcing dissemination policies

3 Why is it so hard to secure sensitive data?

4 Modern software is rife with security holes that can be exploited for exfiltration

5 Why is it so hard to secure sensitive data?
Modern software is rife with security holes that can be exploited for exfiltration
Users must be trusted to remember, understand, and obey dissemination restrictions
- In practice, users are careless and often inadvertently:
  - Leak sensitive documents to the wrong parties
  - Transfer data to insecure machines and portable devices

6 Our Goal
Develop a practical data confinement solution

7 Our Goal
Develop a practical data confinement solution
Key requirement: compatibility with existing infrastructure and patterns of use
- Support current operating systems, applications, and means of communication
  - Office productivity apps: word processing, spreadsheets, …
  - Communication: e-mail, IM, VoIP, FTP, DFS, …
- Avoid imposing restrictions on user behavior
  - Allow access to untrusted Internet sites
  - Permit users to download and install untrusted applications

8 Our Assumptions and Threat Model
Users
- Benign, do not intentionally exfiltrate data
- Make mistakes, inadvertently violate policies
Software platform (productivity applications and OS)
- Non-malicious, does not exfiltrate data in its pristine state
- Vulnerable to attacks if exposed to external threats
Attackers
- Malicious external entities seeking to exfiltrate sensitive data
- Penetrate security barriers by exploiting vulnerabilities in the software platform

9 Central Design Decisions
Policy enforcement responsibilities
- Cannot rely on human users
- The system must track the flow of sensitive information and enforce restrictions when the data is externalized

10 Central Design Decisions
Policy enforcement responsibilities
- Cannot rely on human users
- The system must track the flow of sensitive information and enforce restrictions when the data is externalized
Granularity of information flow tracking (IFT)
- Need fine-grained, byte-level tracking and policy enforcement to prevent accidental partial exfiltrations

11 Central Design Decisions
Placement of functionality
- PDC inserts a thin software layer (hypervisor) between the OS and hardware
- The hypervisor implements byte-level IFT and policy enforcement
- A hypervisor-level solution:
  - Retains compatibility with existing OSes and applications
  - Has sufficient control over hardware

12 Central Design Decisions
Placement of functionality
- PDC inserts a thin software layer (hypervisor) between the OS and hardware
- The hypervisor implements byte-level IFT and policy enforcement
- A hypervisor-level solution:
  - Retains compatibility with existing OSes and applications
  - Has sufficient control over hardware
Resolving the tension between safety and user freedom
- Partition the application environment into two isolated components: a “safe world” and a “free world”

13 Partitioning the User Environment
(Diagram: the hypervisor sits on the hardware (CPU, memory, disk, NIC, USB, printer, …) and performs IFT and policy enforcement. A safe virtual machine has access to sensitive data; an unsafe virtual machine allows unrestricted communication and execution of untrusted code.)

14 Partitioning the User Environment
(Diagram legend: sensitive vs. non-sensitive data; trusted vs. untrusted (potentially malicious) code/data; exposure to the threat of exfiltration.)

15 PDC Use Cases
Logical “air gaps” for high-security environments
- VM-level isolation obviates the need for multiple physical networks
Preventing information leakage via e-mail
- “Do not disseminate the attached document”
Digital rights management
- Keeping track of copies; document self-destruct
Auto-redaction of sensitive content

16 Talk Outline
Introduction
Requirements and Assumptions
Use Cases
PDC Architecture
Prototype Implementation
Preliminary Performance Evaluation
Current Status and Future Work

17 PDC Architecture: Hypervisor
PDC uses an augmented hypervisor to
- Ensure isolation between safe and unsafe VMs
- Track the propagation of sensitive data in the safe VM
- Enforce security policy at exit points (network I/O, removable storage, printer, etc.)

18 PDC Architecture: Tag Tracking in the Safe VM
PDC associates an opaque 32-bit sensitivity tag with each byte of virtual hardware state
- User-accessible CPU registers
- Volatile memory
- Files on disk

19 PDC Architecture: Tag Tracking in the Safe VM
Tags are viewed as opaque identifiers; their semantics can be tailored to fit the specific needs of administrators and users
Tags can be used to specify
- Security policies
- Levels of security clearance
- High-level data objects
- High-level data types within an object
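Because tags are opaque, the tracking machinery never interprets them; any meaning lives in an external table. The following sketch illustrates this separation; the table contents and the name `lookup_meaning` are hypothetical, not part of PDC.

```python
# Illustrative only: PDC treats tags as opaque 32-bit values, so the
# tag-to-meaning mapping is maintained outside the tracking machinery.
TAG_MEANINGS = {
    0x00000001: "policy: do not disseminate externally",
    0x00000002: "clearance: secret",
    0x00000003: "object: design-document-42",
}

def lookup_meaning(tag):
    """Resolve an opaque 32-bit sensitivity tag; unknown tags are untagged."""
    return TAG_MEANINGS.get(tag & 0xFFFFFFFF, "untagged")
```

This is why the same mechanism can serve policies, clearance levels, or object identities: only the table changes, not the tracker.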

20 PDC Architecture: Tag Tracking in the Safe VM
An augmented x86 emulator performs fine-grained instruction-level tag tracking (the current implementation is based on QEMU)
PDC tracks explicit data flows (variable assignments, arithmetic operations)
- Example: add %eax, %ebx propagates the tags of %eax into %ebx

21 PDC Architecture: Tag Tracking in the Safe VM
An augmented x86 emulator performs fine-grained instruction-level tag tracking (the current implementation is based on QEMU)
PDC also tracks flows resulting from pointer dereferencing
- Example: mov %eax, (%ebx) merges the tags of %eax and %ebx into the destination memory location
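The two propagation rules above can be sketched as a minimal Python model, assuming tags are sets of 32-bit identifiers so that a tag merge is simply set union. The register/memory layout and function names are illustrative, not PDC's.

```python
# Toy model of explicit-flow and pointer-dereference tag propagation.
regs = {"eax": {0x1}, "ebx": {0x2}}   # demo: both registers start tagged
mem_tags = {}                          # address -> tag set

def emulate_add(dst, src):
    # Explicit flow: "add %src, %dst" taints the destination register
    # with the source register's tags.
    regs[dst] = regs[dst] | regs[src]

def emulate_store(src, addr_reg, addr):
    # Pointer dereference: "mov %src, (%addr_reg)" merges the tags of the
    # stored value and of the address register into the memory location.
    mem_tags[addr] = regs[src] | regs[addr_reg]

emulate_add("ebx", "eax")              # ebx now carries {0x1, 0x2}
emulate_store("eax", "ebx", 0x1000)    # memory at 0x1000 carries {0x1, 0x2}
```

Merging in the address register's tags is what makes pointer-based flows trackable, and it is also what makes tag explosion (see the next slide) a real risk.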

22 Challenges
Tag storage overhead in memory and on disk
- A naïve implementation would incur a 400% overhead
Computational overhead of online tag tracking
Tag explosion
- Tag tracking across pointer dereferences exacerbates the problem
Tag erosion due to implicit flows
Bridging the semantic gap between application data units and low-level machine state
Impact of VM-level isolation on user experience
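The 400% figure follows directly from the tag width: a naïve scheme keeps one 32-bit (4-byte) tag per byte of state, so tag storage is four times the size of the data it describes.

```python
# Arithmetic behind the naive-implementation overhead claim.
TAG_BYTES_PER_DATA_BYTE = 4   # one 32-bit tag per byte of state

def naive_tag_overhead_percent():
    return TAG_BYTES_PER_DATA_BYTE * 100

def naive_tag_bytes(data_bytes):
    return TAG_BYTES_PER_DATA_BYTE * data_bytes
```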

23 Talk Outline
Introduction
Requirements and Assumptions
Use Cases
PDC Architecture
Prototype Implementation
- Storing sensitivity tags in memory and on disk
- Fine-grained tag tracking in QEMU
- “On-demand” emulation
- Policy enforcement
Performance Evaluation
Current Status and Future Work

24 PDC Implementation: The Big Picture
(Architecture diagram: PDC-Xen runs in ring 0 and maintains shadow page tables, safe VM page tables, and a PageTagMask; the CPU CR3 points into the shadow tables. The safe VM runs applications (App1, App2) over an NFS client and VFS, connected via Xen-RPC, an event channel, and a shared ring buffer to an NFS server in Dom 0 backed by PDC-ext3. Dom 0 also hosts QEMU with the tag tracker (the emulated safe VM, with PageTagDescriptors), a network daemon, a policy daemon, and the NIC.)

25 Storing Tags in Volatile Memory
PDC maintains a 64-bit PageTagSummary for each page of machine memory
It uses a 4-level tree data structure to keep PageNumber → PageTagSummary mappings; the leaves are arrays of 64-bit PageTagSummary structures
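The lookup structure can be sketched as a small radix tree. The slide does not give the bit split across levels, so this sketch assumes each of the four levels consumes 8 bits of a 32-bit page number.

```python
# Hypothetical 4-level tree mapping PageNumber -> PageTagSummary.
class TagTree:
    def __init__(self):
        self.root = {}

    def set_summary(self, page_number, summary):
        node = self.root
        for shift in (24, 16, 8):                # walk three interior levels
            node = node.setdefault((page_number >> shift) & 0xFF, {})
        node[page_number & 0xFF] = summary       # leaf array of summaries

    def get_summary(self, page_number):
        node = self.root
        for shift in (24, 16, 8):
            node = node.get((page_number >> shift) & 0xFF)
            if node is None:
                return None                      # whole subtree absent: untagged
        return node.get(page_number & 0xFF)

tree = TagTree()
tree.set_summary(0x00012345, 0xCAFE)
```

A tree (rather than a flat array) keeps the structure sparse: regions of memory with no tagged pages cost nothing beyond the missing interior nodes.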

26 Storing Tags in Volatile Memory
PageTagSummary
- A page-wide tag for uniformly-tagged pages
- A pointer to a PageTagDescriptor otherwise
A PageTagDescriptor stores fine-grained (byte-level) tags within a page in one of two formats:
- Linear array of tags (indexed by page offset)
- RLE encoding
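A sketch of this two-level representation: a uniformly-tagged page needs only its page-wide tag; otherwise a descriptor stores byte-level tags as a linear array or as RLE runs. The size comparison below assumes 4-byte tags and 8-byte (tag, run-length) pairs; the actual thresholds PDC uses are not given on the slide.

```python
# Choose between the uniform summary and the two descriptor formats.
PAGE_SIZE = 4096

def rle_encode(tags):
    runs = []
    for t in tags:
        if runs and runs[-1][0] == t:
            runs[-1][1] += 1
        else:
            runs.append([t, 1])
    return runs

def summarize_page(tags):
    assert len(tags) == PAGE_SIZE
    if len(set(tags)) == 1:
        return ("uniform", tags[0])              # page-wide tag
    runs = rle_encode(tags)
    if len(runs) * 8 < PAGE_SIZE * 4:            # RLE denser than linear array?
        return ("descriptor", "rle", runs)
    return ("descriptor", "linear", list(tags))

uniform = summarize_page([5] * PAGE_SIZE)
mixed = summarize_page([0] * 4000 + [9] * 96)
```

In practice most pages are untagged or uniformly tagged, which is what keeps the average storage overhead far below the naïve 400%.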

27 Storing Tags on Disk
PDC-ext3 provides persistent storage for the safe VM
- A new i-node field holds file-level tags (FileTag)
- Leaf indirect blocks store pointers to BlockTagDescriptors
- A BlockTagDescriptor holds byte-level tags within a block, as a linear array or RLE encoding

28 Back to the Big Picture
(Same architecture diagram as slide 24, now highlighting the emulated CPU context maintained by QEMU / the tag tracker in Dom0.)

29 Fine-Grained Tag Tracking
A modified version of QEMU emulates the safe VM and tracks the movement of sensitive data
QEMU relies on runtime binary recompilation to achieve reasonably efficient emulation
We augment the QEMU compiler to generate a tag-tracking instruction stream from the input stream of x86 instructions
(Diagram: stage 1 translates a guest machine code block (x86) into the intermediate representation (TCG); stage 2 emits both a host machine code block (x86) and a tag-tracking code block.)

30 Fine-Grained Tag Tracking
Tag-tracking instructions manipulate the tag status of emulated CPU registers and memory
The tag-tracking instruction stream executes asynchronously in a separate thread
Basic instruction format: an action ({Clear, Set, Merge}), a destination operand, and a source operand, each naming a register or memory location ({Reg, Mem})
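A toy interpreter for this instruction format: each tag instruction applies Clear, Set, or Merge to a destination tag location, which is either a register or a memory address. The tuple encoding is illustrative, not PDC's actual wire format.

```python
# Minimal interpreter for {Clear, Set, Merge} over {Reg, Mem} tag locations.
reg_tags = {"eax": {0x1}}
mem_tags = {}

def _load(loc):
    kind, key = loc
    table = reg_tags if kind == "reg" else mem_tags
    return table.get(key, set())

def _store(loc, tags):
    kind, key = loc
    (reg_tags if kind == "reg" else mem_tags)[key] = tags

def execute(action, dst, src=None):
    if action == "Clear":
        _store(dst, set())                       # destination becomes untagged
    elif action == "Set":
        _store(dst, _load(src))                  # destination copies source tags
    elif action == "Merge":
        _store(dst, _load(dst) | _load(src))     # destination unions in source tags

execute("Set", ("mem", 0x2000), ("reg", "eax"))  # memory inherits eax's tag
execute("Clear", ("reg", "eax"))                 # eax becomes untagged
```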

31 Fine-Grained Tag Tracking
Problem: some instruction arguments are not known at compile time
- Example: mov %eax,(%ebx): the memory address is not known until runtime
The main emulation thread writes the values of these arguments to a temporary log (a circular memory buffer) at runtime
The tag tracker fetches the unknown values from this log
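The argument log can be modeled as a bounded FIFO: the emulation thread appends operand values (such as effective memory addresses) that were unknown at translation time, and the tag-tracking thread later consumes them in order. The 64K-entry bound here is an arbitrary placeholder, not PDC's actual buffer size.

```python
from collections import deque

# Model of the runtime argument log between the two threads.
arg_log = deque(maxlen=65536)   # behaves like a bounded circular buffer

def emit_arg(value):            # called by the main emulation thread
    arg_log.append(value)

def fetch_arg():                # called by the tag-tracking thread
    return arg_log.popleft()

emit_arg(0x0000B800)            # e.g. the address computed for (%ebx)
emit_arg(0x0000B804)
```

Because both threads process translation blocks in the same order, a simple FIFO is enough to pair each logged value with the tag instruction that needs it.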

32 Binary Recompilation (Example)
Input x86 instructions:
  mov %eax, $123
  push %ebp
Intermediate representation (TCG):
  movi_i32 tmp0,$123
  st_i32 tmp0,env,$0x0
  ld_i32 tmp0,env,$0x14
  ld_i32 tmp2,env,$0x10
  movi_i32 tmp14,$0xfffffffc
  add_i32 tmp2,tmp2,tmp14
  qemu_st_logaddr tmp0,tmp2
  st_i32 tmp2,env,$0x10
Tag tracking instructions:
  Clear4 eax
  Set4 mem,ebp,0
  Merge4 mem,esp,0
Tag tracking argument log:
  MachineAddr(%esp)

33 Binary Recompilation
But things get more complex…
- Switching between operating modes (protected/real/virtual-8086, 16/32-bit)

34 Binary Recompilation
But things get more complex…
- Switching between operating modes (protected/real/virtual-8086, 16/32-bit)
- Recovering from exceptions in the middle of a translation block

35 Binary Recompilation
But things get more complex…
- Switching between operating modes (protected/real/virtual-8086, 16/32-bit)
- Recovering from exceptions in the middle of a translation block
- Multiple memory addressing modes

36 Binary Recompilation
But things get more complex…
- Switching between operating modes (protected/real/virtual-8086, 16/32-bit)
- Recovering from exceptions in the middle of a translation block
- Multiple memory addressing modes
- Repeating instructions (rep movs)

37 Binary Recompilation
But things get more complex…
- Switching between operating modes (protected/real/virtual-8086, 16/32-bit)
- Recovering from exceptions in the middle of a translation block
- Multiple memory addressing modes
- Repeating instructions (rep movs)
- Complex instructions whose semantics are partially determined by the runtime state (e.g., iret, which pops the saved EIP, CS, EFLAGS, ESP, and SS from the stack)

38 Back to the Big Picture
(Same architecture diagram as slide 24, highlighting QEMU / the tag tracker and the emulated CPU context in Dom0.)

39 “On-Demand” Emulation
During virtualized execution, PDC-Xen uses the paging hardware to intercept access to sensitive data
- PDC-Xen maintains shadow page tables, in which all memory pages containing tagged data (per the PageTagMask) are marked as not present
- An access to a tagged page from the safe VM causes a page fault and a transfer of control to the hypervisor

40 “On-Demand” Emulation
If the page fault is due to tagged data, PDC-Xen suspends the guest domain and transfers control to the emulator (QEMU)
QEMU initializes the emulated CPU context from the native processor context (saved upon entry to the page fault handler) and resumes the safe VM in emulated mode
(Diagram: the safe VM VCPU faults on an access to a tagged page; the page fault handler hands off to QEMU / the tag tracker in Dom0, which emulates the safe VM CPU using the safe VM memory mappings.)
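The mode switch can be sketched as a toy state machine: pages holding tagged data are marked not present in the shadow page table, so the safe VM's first touch of tagged data faults into the hypervisor, which resumes the VM under emulation. Class and method names are illustrative.

```python
# Toy model of fault-driven switching from native to emulated execution.
class SafeVM:
    def __init__(self, tagged_pages):
        self.shadow_not_present = set(tagged_pages)  # shadow PTEs marked absent
        self.emulated = False

    def access(self, page):
        if not self.emulated and page in self.shadow_not_present:
            # Page fault on tagged data: suspend native execution and hand
            # the saved CPU context to QEMU; continue in emulated mode.
            self.emulated = True
        return "emulated" if self.emulated else "native"

vm = SafeVM(tagged_pages={7})
first = vm.access(3)    # untagged page: stays native
second = vm.access(7)   # tagged page: faults, switches to emulation
third = vm.access(3)    # remains emulated until QEMU hands control back
```

The payoff of this design is that code which never touches tagged data runs at full virtualized speed; the expensive emulator is entered only on demand.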

41 “On-Demand” Emulation
Returning from emulated execution
- QEMU terminates the main emulation loop and waits for the tag tracker to catch up
- QEMU then makes a hypercall to PDC-Xen and provides:
  - An up-to-date processor context for the safe VM VCPU
  - An up-to-date PageTagMask

42 “On-Demand” Emulation
Returning from emulated execution
- QEMU terminates the main emulation loop and waits for the tag tracker to catch up
- QEMU then makes a hypercall to PDC-Xen and provides:
  - An up-to-date processor context for the safe VM VCPU
  - An up-to-date PageTagMask
- The hypercall awakens the safe VM VCPU (blocked in the page fault handler)
- The page fault handler:
  - Overwrites the call stack with up-to-date values of CS/EIP, SS/ESP, and EFLAGS
  - Restores the other processor registers
  - Returns control to the safe VM

43 “On-Demand” Emulation - Challenges

44 “On-Demand” Emulation - Challenges
Updating PTEs in read-only page table mappings
- Solution: QEMU maintains local writable “shadow” copies and synchronizes them in the background via hypercalls

45 “On-Demand” Emulation - Challenges
Updating PTEs in read-only page table mappings
- Solution: QEMU maintains local writable “shadow” copies and synchronizes them in the background via hypercalls
Transferring control to the hypervisor during emulated execution (hypercall and fault handlers)
- Emulating hypervisor-level code is not an option
- Solution: a transient switch to native execution; resume native execution at the instruction that causes the jump to the hypervisor (e.g., int 0x82 for hypercalls)

46 “On-Demand” Emulation - Challenges
Delivery of timer interrupts (events) in emulated mode
- The hardware clock advances faster relative to the emulated context (each instruction consumes more clock cycles)
- Xen needs to scale the delivery of timer events accordingly

47 “On-Demand” Emulation - Challenges
Delivery of timer interrupts (events) in emulated mode
- The hardware clock advances faster relative to the emulated context (each instruction consumes more clock cycles)
- Xen needs to scale the delivery of timer events accordingly
Use of the clock cycle counter (the rdtsc instruction)
- The Linux timer interrupt/event handler uses the clock cycle counter to estimate timer jitter
- After switching from emulated to native execution, the guest kernel observes a sudden jump forward in time
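One conceivable mitigation, sketched here as an assumption rather than taken from PDC, is to virtualize rdtsc: subtract the extra cycles accumulated during emulation so the guest sees a continuous cycle counter across mode switches.

```python
# Hypothetical virtualized TSC that hides emulation-induced time jumps.
class VirtualTsc:
    def __init__(self):
        self.emulation_penalty = 0     # extra cycles spent while emulating

    def note_emulation_cycles(self, extra):
        self.emulation_penalty += extra

    def read(self, hardware_tsc):
        # Guest-visible counter = hardware counter minus accumulated penalty.
        return hardware_tsc - self.emulation_penalty

vtsc = VirtualTsc()
before = vtsc.read(1_000_000)          # native execution: pass-through
vtsc.note_emulation_cycles(400_000)    # an emulated stretch cost 400K extra cycles
after = vtsc.read(1_500_000)           # guest sees 1_100_000, not 1_500_000
```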

48 Policy Enforcement
The policy controller module
- Resides in Dom0 and interposes between the front-end and back-end device drivers
- Fetches policies from a central policy server
- Looks up the tags associated with the data in shared I/O request buffers and applies the policies
(Diagram: the safe VM's network interface and block storage front-ends connect through the policy controller to the corresponding back-ends in Dom0.)
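An illustrative exit-point check: before a shared I/O buffer leaves the safe VM, the controller looks up the tags on its bytes and asks whether each tag's policy permits that exit point. The policy representation and the name `may_externalize` are hypothetical.

```python
# Sketch of a tag-based policy check at an I/O exit point.
policies = {
    0x1: {"nic", "printer"},   # tag 0x1 may leave via network or printer
    0x2: set(),                # tag 0x2 may not be externalized at all
}

def may_externalize(buffer_tags, exit_point):
    """Allow the I/O only if every tag's policy permits this exit point."""
    for tag in buffer_tags:
        allowed = policies.get(tag)
        if allowed is not None and exit_point not in allowed:
            return False
    return True
```

Note that the check is conjunctive: a buffer mixing tags is governed by the most restrictive policy among them.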

49 Network Communication
PDC annotates outgoing packets with PacketTagDescriptors carrying the sensitivity tags
The current implementation transfers annotated packets via a TCP/IP tunnel
(Diagram: the original frame (EthHdr, IPHdr, TCPHdr, Payload) is annotated with a tag descriptor and then encapsulated behind an outer EthHdr/IPHdr/TCPHdr for the tunnel.)
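A back-of-the-envelope model of the annotated-packet layout: the original frame is prefixed with a tag descriptor and wrapped in an outer Ethernet/IP/TCP tunnel header. The 12-byte (offset, length, tag) entry size is an assumption; the slide does not give the descriptor format.

```python
# Rough size model for an annotated, tunnel-encapsulated frame.
ETH, IP, TCP = 14, 20, 20   # standard minimum header sizes in bytes
ENTRY = 12                  # assumed bytes per (offset, length, tag) entry

def annotated_frame_size(payload_len, descriptor_entries):
    inner = ETH + IP + TCP + payload_len          # the original frame
    descriptor = ENTRY * descriptor_entries       # the PacketTagDescriptor
    return ETH + IP + TCP + descriptor + inner    # outer tunnel encapsulation
```

Under this model the fixed encapsulation cost is 54 bytes per packet, and the variable cost grows with tag fragmentation, which matches the P/F parameter sweep in the evaluation.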

50 Talk Outline
Introduction
Requirements and Assumptions
Use Cases
PDC Architecture
Prototype Implementation
Preliminary Performance Evaluation
- Application-level performance overhead
- Filesystem performance overhead
- Network bandwidth overhead
Current Status and Future Work

51 Preliminary Performance Evaluation
Experimental setup:
- Quad-core AMD Phenom 9500, 2.33GHz, 3GB of RAM
- 100Mbps Ethernet
- PDC hypervisor based on Xen v
- Paravirtualized Linux kernel v
- Tag tracker based on QEMU v

52 Application-Level Overhead
Goal: estimate the overall performance penalty (as perceived by users) in realistic usage scenarios
First scenario: recursive text search within a directory tree (grep)
- Input dataset: a 1GB sample of the Enron corporate e-mail database
- We mark a fraction (F) of the messages as sensitive, assigning them a uniform sensitivity tag
- We search the dataset for a single-word string and measure the overall running time

53 Application-Level Overhead
(Chart: running time vs. F (%), comparing Linux on “bare metal”, standard Xen with paravirtualized Linux, and PDC-Xen with paravirtualized Linux and tag tracking.)

54 Filesystem Performance Overhead
Configurations:
- C1: Linux on “bare metal”; standard ext3
- C2: Xen, paravirt. Linux; Dom0 exposes a paravirt. block device; the guest domain mounts it as ext3
- C3: Xen, paravirt. Linux; Dom0 exposes ext3 to the guest domain via NFS/TCP
- C4: Xen, paravirt. Linux; Dom0 exposes ext3 to the guest domain via NFS/Xen-RPC
- C5: Xen, paravirt. Linux; Dom0 exposes PDC-ext3 to the guest domain via NFS/Xen-RPC
First experiment: sequential file write throughput
- Create a file, write 1GB of data sequentially, close, sync

55 Filesystem Performance Overhead
Configurations:
- C1: Linux on “bare metal”; standard ext3
- C2: Xen, paravirt. Linux; Dom0 exposes a paravirt. block device; the guest domain mounts it as ext3
- C3: Xen, paravirt. Linux; Dom0 exposes ext3 to the guest domain via NFS/TCP
- C4: Xen, paravirt. Linux; Dom0 exposes ext3 to the guest domain via NFS/Xen-RPC
- C5: Xen, paravirt. Linux; Dom0 exposes PDC-ext3 to the guest domain via NFS/Xen-RPC
(Table: elapsed time in seconds for configurations C1–C5.)

56 Filesystem Performance Overhead
Second experiment: metadata operation overhead
- M1: Create a large directory tree (depth=6, fanout=6)
- M2: Remove the directory tree created by M1 (rm -rf *)

57 Network Bandwidth Overhead
We used iperf to measure end-to-end bandwidth between a pair of directly-connected hosts
Configurations:
- NC1: No packet interception
- NC2: Interception and encapsulation
- NC3: Interception, encapsulation, and annotation with sensitivity tags
  - The sender assigns sensitivity tags to a random sampling of outgoing packets
  - We vary two parameters: Tag Prevalence (P) and Tag Fragmentation (F)

58 Network Bandwidth Overhead

59 Performance Evaluation - Summary
Application performance in the safe VM
- 10x slowdown in the worst-case scenario
- We expect to reduce this overhead significantly through a number of optimizations
Disk and network I/O overhead
- Proportional to the amount of sensitive data and the degree of tag fragmentation
- 4x overhead in the worst-case scenario (assuming 32-bit tag identifiers)

60 Summary and Future Work
PDC seeks a practical solution to the problem of data confinement
- Defend against exfiltration by outside attackers
- Prevent accidental policy violations
A hypervisor-based architecture provides mechanisms for isolation, information flow tracking, and policy enforcement
Currently working on:
- Improving the stability and performance of the prototype
- Studying the issue of taint explosion in Windows and Linux environments and its implications for PDC

