Presentation is loading. Please wait.

Presentation is loading. Please wait.

VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute.

Similar presentations


Presentation on theme: "VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute."— Presentation transcript:

1 VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology http://pdos.csail.mit.edu/~baford/vxa/

2 The Ubiquity of Data Compression Everything is compressed these days – Archive/Backup/Distribution: ZIP, tar.gz,... – Multimedia streams: mp3, ogg, wmv,... – Office documents: XML-in-ZIP – Digital cameras: JPEG, proprietary RAW,... – Video camcorders: DV, MPEG-2,...

3 Compressed Data Formats Observation #1: Data compression formats evolve rapidly

4 Compressed Data Formats Observation #1: Data compression formats evolve rapidly

5 Compressed Data Formats Observation #1: Data compression formats evolve rapidly Problems: – Inconvenient: each new algorithm requires decoder install/upgrade – Impedes data portability: data unusable on systems without supported decoder – Threatens long-term data usability: old decoders may not run on new operating systems

6 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

7 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively Fully Backward Compatible Extensions

8 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

9 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

10 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

11 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

12 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively Itanic

13 VXA: Virtual Executable Archives Observation 1+2: Instruction formats are historically more durable than compressed data formats Make archive self-extracting (data + executable decoder) To extract data, archive reader runs embedded decoder Archive Writer Archive Reader D D Archive Encoder Decode r

14 Goals of VXA Make self-extracting archives... Archive Writer Archive Reader D D Archive Encoder Decode r

15 Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] Archive Writer Archive Reader D Emulator D Archive Encoder Decode r

16 Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] 3Easy: allow reuse of existing code, languages, tools Archive Writer Archive Reader D x86 Emulator D Archive Encoder Decode r

17 Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] 3Easy: allow reuse of existing code, languages, tools 4Efficient: practical for short term data packaging too Archive Writer Archive Reader D Fast x86 Emulator D Archive Encoder Decode r

18 Outline ● Archiver Operation ● vxZIP Archive Format ● Decoder Architecture ● Emulator Design & Implementation ● Evaluation (performance, storage overhead) ● Conclusion

19 Archive Writer Operation Archive VXA Archiver

20 Archive Writer Operation Archive D1D1 Uncompressed Input Files General Compressor Decode r 1 VXA Archiver

21 Archive Writer Operation Archive D1D1 Uncompressed Input Files General Compressor Decode r 1 VXA Archiver

22 Archive Writer Operation Archive D1D1 D2D2 D3D3 Uncompressed Input Files General Compressor Decode r 1 Image Compressor Decode r 2 Audio Compressor Decode r 3

23 Archive Writer Operation Archive Uncompressed Input FilesPre-Compressed Input Files D1D1 D2D2 D3D3 General Compressor Image Compressor Audio Compressor Decode r 1 Decode r 2 Decode r 3

24 Archive Writer Operation Archive D4D4 D5D5 Uncompressed Input FilesPre-Compressed Input Files Image Format Recognizer Decode r 4 Audio Format Recognizer Decode r 5 D1D1 D2D2 D3D3 General Compressor Image Compressor Audio Compressor Decode r 1 Decode r 2 Decode r 3

25 Archive Reader Operation Archive VXA Archive Reader x86 Emulator D4D4 D5D5 D1D1 D2D2 D3D3

26 VXA Archive Reader Archive Reader Operation Archive Original Uncompressed Files x86 Emulator Decoder 1 D4D4 D5D5 D1D1 D2D2 D3D3

27 VXA Archive Reader Archive Reader Operation Archive Original Uncompressed Files x86 Emulator Decoder 1 Decoder 2 Decoder 3 D4D4 D5D5 D1D1 D2D2 D3D3

28 VXA Archive Reader Archive Reader Operation Archive Original Uncompressed FilesOriginal Pre-Compressed Files x86 Emulator D4D4 D5D5 D1D1 D2D2 D3D3 Decoder 1 Decoder 2 Decoder 3

29 VXA Archive Reader Archive Reader Operation Archive Original Uncompressed FilesDe-compressed Files x86 Emulator Decoder 4 Decoder 5 D4D4 D5D5 D1D1 D2D2 D3D3 Decoder 1 Decoder 2 Decoder 3

30 vxZIP Archive Format ● Backward compatible with legacy ZIP format Central Directory Audio file Image file vxZIP Archive

31 vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files Central Directory Audio file FLAC Decoder Audio file Image file JP2 Decoder vxZIP Archive

32 vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files ● Archived files have new extension header pointing to decoder Central Directory Audio file (FLAC-encoded) FLAC Decoder Audio file (FLAC-encoded) Image file (JP2-encoded) JP2 Decoder vxZIP Archive

33 vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files ● Archived files have new extension header pointing to decoder ● Decoders are hidden, “deflated” (gzip) Central Directory Audio file (FLAC-encoded) FLAC Decoder (deflated) Audio file (FLAC-encoded) Image file (JP2-encoded) JP2 Decoder (deflated) vxZIP Archive

34 vxZIP Decoder Architecture ● Decoders are ELF executables for x86-32 – Can be written in any language, safe or unsafe – Compiled using ordinary tools (GCC) ● Decoders have access to five “system calls”: – read stdin, write stdout, malloc, next file, exit ● Decoders cannot: – open files, windows, devices, network connections,... – get system info: user name, current time, OS type,...

35 Decoders Ported So Far (using existing implementations in C, mostly unmodified) General-purpose (lossless): ● zlib: Classic gzip/deflate algorithm ● bzip2: Burrows-Wheeler algorithm Still image codecs: ● jpeg: Classic lossy image compression scheme ● jp2: JPEG 2000 wavelet-based algorithm, lossy or lossless Audio codecs: ● flac: Free Lossless Audio Codec ● vorbis: Standard lossy audio codec for Ogg streams

36 vx32 Emulator Architecture Runs in vxUnZIP process ● Loads decoder into address space sandbox ● Restricts decoder's memory accesses to sandbox ● Dispatches decoder's VXA “system calls” to vxUnZIP (not to host OS!) VXA Decoder Address Space (up to 1GB) vxUnZIP Process Address Space vxUnZIP Application Decoder Address Space 0 0 VXA System Calls vx32 Emulator library

37 vx32 Emulator Implementation On x86-{32/64} hosts: – Secure fault isolation [Wahbe] – Data sandboxing via custom LDT segments – Code sandboxing via instruction rewriting [Sites, Nethercote] – No privileges or kernel extensions Kernel Address Space Code Rewriting VXA Decoder Address Space (up to 1GB) Flat-Model Code/Data Segment vxUnZIP Application Decoder Data Segment (LDT) 0 0 Controlled Procedure Calls vx32 Emulator library Transformed code cache

38 vx32 Emulator Implementation On other host architectures: – Portable but slow “fallback” instruction interpreter (mostly done) – Fast x86-to-PowerPC binary translator (in progress) – Hopefully more in the future Emulator implemented as generic library – Can be used for other sandboxing applications

39 Evaluation Two issues to address: ● Performance overhead of emulated decoders – not important for long-term archival storage, but... – very important for common short-term uses of archives: backups, software distribution, structured documents,... ● Storage overhead of archived decoders

40 Performance Test Method Run 6 ported decoders on appropriate data sets – Athlon 64 3000+ PC running SuSE Linux 9.3 – Measure user-mode CPU time (not wall-clock time) Compare: – Emulated vs native execution – Running on x86-32 vs x86-64 host environment

41 Performance Overhead

42

43 Storage Overhead Archiver stores only one copy of each decoder – Storage cost amortized over all files of same type – Relative overhead depends on size of archive Therefore, measure only absolute decoder size (compressed, as stored in archive)

44 Storage Overhead

45 Conclusion VXA makes self-extracting archives... ● Safe: decoders fully sandboxed ● Future-proof: simple, OS-independent environment ● Easy: re-use existing decoders, languages, tools ● Efficient: ≤ 11% slowdown vs native x86-32 Available at: http://pdos.csail.mit.edu/~baford/vxa/


Download ppt "VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute."

Similar presentations


Ads by Google