Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert.


1 Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert

2 Preamble ● HotCloud '11 → VM Retrospection ● Examining historical state of VMs ● Stored in a central VM image library ● Done in collaboration with IBM Research ● Did a summer internship...and...

3 Introspection vs Retrospection ● Introspection: examine active state of VM during execution ● Retrospection: examine historical state of VMs and their snapshots [Diagram: VM Instance A, examine live logs; snapshots A', A1, A2, B', B1, B2, ..., examine all historic logs]

4 Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them?

5 Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them? Apple's Time Machine over all VM instances including their complete snapshotted history

6 Example: Forensics - Before ● Compromised VM → Quarantined VM ● 1. Crawl live or frozen logs 2. Pull backups if available 3. Examine differences 4. Determine root causes 5. Redeploy

7 Example: Forensics - After ● Compromised VM → Quarantined VM ● 1. Crawl live or frozen logs 2. Retrospect entire history 3. Examine differences 4. Determine root causes 5. Redeploy

8 Introspection Technical Report: CMU-CS-12-103 What?

9 Why? Because state-of-the-art snapshotting isn't good enough. (1) High IO overhead (2) High latency to file mutations

10 file-level mutations: create, delete, modify

11 Goals Low IO Overhead Low Latency Scalability

12 Goals Low IO Overhead Low Latency Scalability 0 Guest Modifications

13 1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching storage systems

14 1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching file systems ● File versioning Simplifying Assumption: Understanding doesn't matter

15 2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference

16 2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference Simplifying Assumption: Omniscience doesn't matter

17 3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency to file mutations ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud

18 3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency event notification ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud Simplifying Assumption: Performance doesn't matter
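The first step of inferring file-level mutations from block writes alone is translating a write's sector range into file system block numbers, which can then be looked up in an index of which file owns which block. The following is a minimal sketch of that translation, not the authors' implementation. The names `write_to_blocks`, `struct block_range`, and `part_start_sector` are hypothetical, and 512-byte sectors are assumed.

```c
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumption: 512-byte virtual disk sectors */

/* Hypothetical type: inclusive range of file system blocks hit by a write. */
struct block_range {
    uint64_t first;
    uint64_t last;
};

/*
 * Hypothetical helper: map a disk write (sector_num, nb_sectors) onto the
 * file system blocks of a partition that starts at part_start_sector and
 * uses block_size-byte blocks. Returns 0 on success, -1 on invalid input
 * or when the write starts before the partition.
 */
int write_to_blocks(int64_t sector_num, int nb_sectors,
                    int64_t part_start_sector, uint32_t block_size,
                    struct block_range *out)
{
    if (nb_sectors <= 0 || block_size < SECTOR_SIZE ||
        sector_num < part_start_sector)
        return -1;

    /* Byte offsets of the write, relative to the start of the partition. */
    uint64_t start = (uint64_t)(sector_num - part_start_sector) * SECTOR_SIZE;
    uint64_t end = start + (uint64_t)nb_sectors * SECTOR_SIZE - 1;

    out->first = start / block_size;
    out->last = end / block_size;
    return 0;
}
```

For example, a 4-sector write at sector 10 on a partition starting at sector 2 with 1024-byte blocks touches blocks 4 through 5. An inference engine would then consult its block-to-file index for those blocks to decide which file, if any, was created, deleted, or modified.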

19 What do we want? Performance of storage Semantic understanding of smart disks Virtual expertise of VMI

20 Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications

21 Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications Simplifying Assumption: Privacy doesn't matter

22 File-Level Mutation Stream Δ

23 (1) Fidelity better than state-of-the-art snapshotting (2) Applications working with snapshotting can work on a stream of file-level mutations

24 Cloud VMI ● Low Overhead: asynchronous by design ● Low Latency: near-real-time file-level mutation inference ● No guest modifications ● VM instances → near-real-time data sources

25 We're going full stack...

26 KVM + QEMU

27 The Inference Engine

28 Current Status: Project xray ● Coding C since 2/9/12 ● ext2 file system (read) driver complete ● Static disk analysis complete ● Converting to serialized format underway ● Dynamic virtual disk write stream analysis underway
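As an illustration of what an ext2 read driver does first during static disk analysis, the sketch below validates the superblock and derives the block size. It is not taken from Project xray; the function name `ext2_block_size` is hypothetical. It relies on the standard ext2 on-disk layout: the superblock starts 1024 bytes into the partition, the little-endian magic 0xEF53 sits at offset 56 within it, and `s_log_block_size` sits at offset 24.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SUPERBLOCK_OFFSET 1024   /* bytes from start of partition */
#define EXT2_SUPERBLOCK_SIZE   1024
#define EXT2_MAGIC             0xEF53 /* also used by ext3 and ext4 */

/* Read little-endian integers independent of host byte order. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: given the first `len` bytes of a partition, return
 * the ext2 block size in bytes, or 0 if no valid superblock is found.
 */
uint32_t ext2_block_size(const uint8_t *part, size_t len)
{
    if (len < EXT2_SUPERBLOCK_OFFSET + EXT2_SUPERBLOCK_SIZE)
        return 0;

    const uint8_t *sb = part + EXT2_SUPERBLOCK_OFFSET;
    if (le16(sb + 56) != EXT2_MAGIC)     /* s_magic */
        return 0;

    uint32_t log_size = le32(sb + 24);   /* s_log_block_size */
    if (log_size > 6)                    /* sanity cap chosen for this sketch */
        return 0;

    return 1024u << log_size;
}
```

A caller would read the first 2048 bytes of the partition from the disk image and pass them in; a return value of 0 means the partition should not be treated as ext2.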

29 Integration with Qemu ● Using the tracing system ● Compile-time enabled ● Run-time configured ● Added a single trace event 'bdrv_write' ● Outputs a binary format we interpret:

    struct qemu_bdrv_write_header {
        int64_t sector_num;
        int nb_sectors;
    } __attribute__((packed));

    struct qemu_bdrv_write {
        struct qemu_bdrv_write_header header;
        const uint8_t* data;
    };
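A consumer of this binary format might decode one record at a time from a captured byte stream. The sketch below shows one way to do that and makes several assumptions the slide does not state: each packed header is followed immediately by `nb_sectors * 512` bytes of payload, sectors are 512 bytes, and the capture uses the host's byte order. The function name `parse_bdrv_write` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512 /* assumption: 512-byte sectors */

/* Structures as shown on the slide. */
struct qemu_bdrv_write_header {
    int64_t sector_num;
    int nb_sectors;
} __attribute__((packed));

struct qemu_bdrv_write {
    struct qemu_bdrv_write_header header;
    const uint8_t *data;
};

/*
 * Hypothetical helper: decode one write record from buf[0..len).
 * On success, fills *out (out->data points into buf, no copy is made)
 * and returns the number of bytes consumed. Returns 0 if the buffer
 * does not yet hold a complete record or the header is implausible.
 */
size_t parse_bdrv_write(const uint8_t *buf, size_t len,
                        struct qemu_bdrv_write *out)
{
    if (len < sizeof(out->header))
        return 0;

    /* memcpy avoids unaligned access to the raw stream. */
    memcpy(&out->header, buf, sizeof(out->header));
    if (out->header.nb_sectors <= 0)
        return 0;

    size_t payload = (size_t)out->header.nb_sectors * SECTOR_SIZE;
    if (len - sizeof(out->header) < payload)
        return 0;

    out->data = buf + sizeof(out->header);
    return sizeof(out->header) + payload;
}
```

Calling this in a loop and advancing the buffer by the returned length walks a capture record by record; a return of 0 signals that more bytes are needed, which suits both a stored write stream capture and a live stream.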

30 Demo VM Instance ● Tinycore Linux ● 9 MB Virtual Drive ● 1 ext2 partition ● Full Linux 3.0 Kernel with virtio etc. ● We will first operate on a known write stream capture ● Then, operate on the live running instance

31 Demo with Questions?

