Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert.

Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert

Preamble ● HotCloud '11 → VM Retrospection ● Examining historical state of VMs ● Stored in a central VM image library ● Done in collaboration with IBM Research ● Did a summer internship...and...

Introspection vs Retrospection ● Examine active state of VM during execution ● Examine historical state of VMs and their snapshots VM Instance A Examine live logs A'A1A2 B'B1B2... Examine all historic logs A*

Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them?

Change: Shift in Thinking ● Traditionally a VM == executable content ● VM Image Libraries break this paradigm ● Think of VMs as big data ● What can we do with them? Apple's Time Machine over all VM instances including their complete snapshotted history

Example: Forensics - Before Compromised VMQuarantined VM 1. Crawl live or frozen logs 2. Pull backups if available 3. Examine differences 4. Determine root causes 5. Redeploy

Example: Forensics - After Compromised VMQuarantined VM 1. Crawl live or frozen logs 2. Retrospect entire history 3. Examine differences 4. Determine root causes 5. Redeploy

Introspection Technical Report: CMU-CS-12-103 What?

Why? Because state-of-the-art snapshotting isn't good enough. (1) High IO overhead (2) High latency to file mutations

file-level mutations createdeletemodify

Goals Low IO Overhead Low Latency Scalability

Goals Low IO Overhead Low Latency Scalability 0 Guest Modifications

1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching storage systems

1: Storage ● Efficient snapshotting techniques ● Focus: low IO overhead implementation ● Fast branching file systems ● File versioning Simplifying Assumption: Understanding doesn't matter

2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference

2: Smart Disk ● Semantic-based prefetching ● Semantic-based layout/reordering ● “File-aware block level storage” ● Efficient inference Simplifying Assumption: Omniscience doesn't matter

3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency to file mutations ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud

3: Virtual Machine Introspection ● Infer file-level mutations ● Use only disk block writes ● Good: Low latency event notification ● Bad: Performance overhead ● Slow IO ● Single VMM system considered → not cloud Simplifying Assumption: Performance doesn't matter

What do we want? Performance of storage Semantic understanding of smart disks Virtual expertise of VMI

Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications

Idea: Cloud VMI ● Built into the cloud fabric ● Centralized/external... ● Log monitoring ● Software auditing ● Configuration auditing ● Virus scanning ● No internal agents or guest modifications Simplifying Assumption: Privacy doesn't matter

File-Level Mutation Stream Δ

(1) Fidelity better than state-of- the-art snapshotting (2) Applications working with snapshotting can work on a stream of file-level mutations

Cloud VMI ● Low Overhead: asynchronous by design ● Low Latency: near-real-time file-level mutation inference ● No guest modifications ● VM instances → near-real-time data sources

We're going full stack...

KVM + QEMU

The Inference Engine

Current Status: Project xray ● Coding C since 2/9/12 ● ext2 file system (read) driver complete ● Static disk analysis complete ● Converting to serialized format underway ● Dynamic virtual disk write stream analysis underway

Integration with Qemu ● Using the tracing system ● Compile-time enabled ● Run-time configured ● Added a single trace event 'bdrv_write' ● Outputs a binary format we interpret struct qemu_bdrv_write_header { int64_t sector_num; int nb_sectors; }__attribute__((packed)); struct qemu_bdrv_write { struct qemu_bdrv_write_header header; const uint8_t* data; };

Demo VM Instance ● Tinycore Linux ● 9 MB Virtual Drive ● 1 ext2 partition ● Full Linux 3.0 Kernel with virtio etc. ● We will first ● Operate on a known write stream capture ● Then, operate on the live running instance

Demo with Questions?

Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert.

Similar presentations

Presentation on theme: "Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert.

Similar presentations

Presentation on theme: "Near-Real-Time Inference of File-Level Mutations from Virtual Disk Writes Wolfgang Richter, Mahadev Satyanarayanan, Jan Harkes, Benjamin Gilbert."— Presentation transcript:

Similar presentations

About project

Feedback