
1 P2P File-Systems for Scalable Content Use
Joshua Reich, Princeton University, Department of Computer Science

2 Goal: Scalable Content Distribution
In the crowd (WAN gateways) or the cloud (LAN data-center servers), for content use.
Not all parts of content are used at the same time!
– Multimedia content
– Executables
– Virtual Appliances

3 Domain: Cloud
Data-center with:
– physical machines
– network storage
VM: software implementation of a hardware machine [the machine as an executable]
VMM: software layer virtualizing the hardware [an OS for VMs]

4 Motivation
VMs optimized for a specific purpose:
– Virtual Appliances
– Virtual Servers
– Virtual Desktops (VDI)
Zero config, isolated, easy to replicate. Shared infrastructure is cheaper; less IT headache.
15,772 unique images on EC2 alone!*
Hosted VDI market alone estimated at $65B in 2013**
* http://thecloudmarket.com/stats#/totals, checked 15 July 2011
** http://www.cio.com/article/487109/Hosted_Virtual_Desktop_Market_to_Cross_65_Billion_in_2013, 26 March 2009

5 High-Level Problem
VM images are stored on the network. Contention for networked storage results in I/O bottlenecks, and these bottlenecks significantly delay VM execution.

6 Example: Hosted VDI Boot Storm
The VM image is stored on a SAN or NAS and accessed by the servers hosting VDI instances. Everyone comes to work in the morning and starts up their desktop; the SAN is overloaded by the simultaneous access and the virtual desktops stall.

7 Specific Challenges
1. Large image size + high demand = contention-induced network bottleneck
2. VMM expects a complete image
– Either download the image completely
– Or use continual remote access
3. Complex VM image access patterns
– Non-linear
– Differ from run to run

8 Analogy to Streaming Video
Assume challenges (2) and (3) aren't problems; the task begins to look like video streaming. Known approach: P2P Video-on-Demand
– Need to stream a series of ordered pieces
– While maintaining swarm efficiency
– Use a mix of earliest-first and rarest-first piece selection (a sketch follows below)
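The talk does not show code for this policy; the following is a minimal C++ sketch, with assumed names and a configurable earliest-vs-rarest split, of how such a mixed piece picker could look.

```cpp
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

// availability[i] = number of peers holding piece i; -1 marks pieces we already have.
int pick_piece(const std::vector<int>& availability, double p_earliest, std::mt19937& rng) {
    std::vector<int> missing;
    for (int i = 0; i < static_cast<int>(availability.size()); ++i)
        if (availability[i] >= 0) missing.push_back(i);
    if (missing.empty()) return -1;  // nothing left to download

    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < p_earliest)
        return missing.front();      // earliest-first: lowest-index missing piece

    // rarest-first: the missing piece held by the fewest peers
    return *std::min_element(missing.begin(), missing.end(),
                             [&](int a, int b) { return availability[a] < availability[b]; });
}

int main() {
    std::mt19937 rng(42);
    std::vector<int> avail = {-1, 3, 1, 5, 2};            // piece 0 already local
    std::cout << pick_piece(avail, 0.5, rng) << "\n";     // prints 1 (earliest) or 2 (rarest)
    return 0;
}
```

Biasing toward earliest-first keeps playback (or execution) from stalling, while the rarest-first share keeps piece diversity high enough for the swarm to stay efficient.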

9 Novel VMTorrent Architecture
1. Large image / high demand -> P2P
2. Complete VM image required -> Quick start
3. Non-linear access -> Profile prefetch

10 Related Work Matrix
Work | Approach | Problem Addressed | Notes
Mietzner:2008, Shi:2008 | Sequential distribution of VM images | VM deployment | Slow, doesn't scale
O'Donnell:2008, Chen:2009 | Naive P2P distribution of VM images | VM deployment | Slow, scales
Industry | Hardware overprovisioning | VM deployment | Fast, expensive
Chandra:2005, Moka5 | Content prefetching + on-demand streaming | Virtual desktop delivery | Fast, highly structured
Vlavianos:2006, Zhou:2007 | Mix earliest-first / random-first prefetch | Video streaming | Fast, scales well
VMTorrent | Quick start + P2P + profile prefetch | VM deployment | Fast, scales well

11 VMTorrent Architecture
[Diagram: a VMTorrent instance consists of an unmodified VM and VMM on the hardware/OS, a custom FS holding image pieces, and a P2P manager with a profile, exchanging pieces with the swarm.]

12 Understanding VMTorrent: A Bottom-Up Approach

13 Traditional VM Execution
The VM runs on some host. A virtual machine is a software implementation of a computer; that implementation is stored in an image, and the image is stored on the host's local file system.

14 Traditional VM Execution
The Virtual Machine Monitor (VMM) virtualizes the hardware and conducts I/O to the image through the FS.

15 VM Execution Over Network
A network backend is used either to download the image or to access it via a remote FS.

16 VM Execution Over Network
Download: one big up-front performance hit. Remote access: smaller hits, but also writes and re-reads over the network.

17 Quick Start with Custom FS
Introduce a custom file system: divide the image into pieces, but provide the appearance of a complete image to the VMM.

18 Quick Start w/ Custom FS
The VMM attempts to read piece 1. Piece 1 is present locally, so the read completes.

19 Quick Start w/ Custom FS
The VMM attempts to read piece 0. Piece 0 isn't local, so the read stalls; the VMM waits for the I/O to complete and the VM stalls.

20 Quick Start w/ Custom FS
The custom FS requests the piece from the backend, and the backend requests it from the network.

21 Quick Start w/ Custom FS
Later, the network delivers piece 0. The custom FS receives and updates the piece, the read completes, and the VMM resumes the VM's execution. (A sketch of this blocking read path follows below.)
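The slides describe this read path only at the diagram level; the following is a minimal C++ sketch, with assumed class and method names rather than the actual VMTorrent code, of a piece store whose reads block until the backend delivers the missing piece.

```cpp
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <vector>

class PieceStore {
public:
    PieceStore(std::size_t n_pieces, std::size_t piece_size)
        : pieces_(n_pieces), present_(n_pieces, false), piece_size_(piece_size) {}

    // Called from the FS read handler: blocks until the piece is local.
    void read_piece(std::size_t idx, char* out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return present_[idx]; });   // the VM stalls here if the piece is missing
        std::memcpy(out, pieces_[idx].data(), piece_size_);
    }

    // Called by the network backend when a piece arrives.
    void deliver_piece(std::size_t idx, std::vector<char> data) {
        {
            std::lock_guard<std::mutex> lk(m_);
            pieces_[idx] = std::move(data);
            present_[idx] = true;
        }
        cv_.notify_all();                              // resume any stalled reads
    }

private:
    std::vector<std::vector<char>> pieces_;
    std::vector<bool> present_;
    std::size_t piece_size_;
    std::mutex m_;
    std::condition_variable cv_;
};
```

In the full design, a stalled read would also issue a high-priority demand request to the backend (slide 25) rather than passively waiting for the piece to arrive.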

22 Improved Performance w/ Custom FS
No waiting for the image download to complete, and no more writes or re-reads over the network as with a remote FS.

23 Custom FS + Network Backend
[Diagram: VM and VMM on the hardware/OS, with the custom FS holding pieces 0 through 8, backed by the network backend.]

24 Scaling w/ P2P Backend
A P2P manager serves as the backend, alleviating the bottleneck to network storage: pieces are exchanged with the swarm, and the P2P copy of the image remains pristine.

25 Minimizing Stall Time
VMM accesses to non-local pieces (e.g., piece 4) trigger high-priority swarm requests.

26 Custom FS + P2P Manager
[Diagram: the custom FS now obtains missing pieces through the P2P manager, which swarms with other peers.]

27 P2P Challenge: Request Fulfillment Latency
Delays:
– Network RTT
– At the image source (peer or server)
Impact: if even occasionally it takes 0.5 s to obtain a piece, then over the course of thousands of requests tens of seconds may be lost (for example, 50 such stalls already cost 25 s).

28 P2P Challenge: Network Capacity
[Plot: cumulative demand over time (s), comparing three ideal access rates: memory-cached (for the given physical machine), FS (read-once, never write), and perfect prefetching.]
Even assuming no latency, a 100 Mb network imposes a lower bound on delay.

29 Solution: Prefetch Perfectly

30 P2P Challenge: Image Access Is Highly Nonlinear

31 Solution: Generate Profile Using Statistical Ordering
Collect access patterns for each VM/workload, then determine the expected accesses:
– Divide accesses into blocks
– Sort by average access time
– Remove blocks accessed in only a small fraction of runs
Encode the new order in a profile (a sketch of this step follows below).
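The profile format itself is not specified in the transcript; the sketch below, using assumed data structures, illustrates the statistical ordering: average each block's first-access time across recorded runs, drop blocks seen in too few runs, and sort by that average.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

struct Access { int block; double time_s; };            // one recorded block access
using Run = std::vector<Access>;

std::vector<int> build_profile(const std::vector<Run>& runs, double min_run_fraction) {
    std::map<int, std::vector<double>> first_access;     // block -> first-access time per run
    for (const Run& run : runs) {
        std::map<int, double> seen;
        for (const Access& a : run)
            if (!seen.count(a.block)) seen[a.block] = a.time_s;
        for (const auto& kv : seen) first_access[kv.first].push_back(kv.second);
    }

    struct Entry { int block; double avg; };
    std::vector<Entry> entries;
    for (const auto& kv : first_access) {
        // Drop blocks accessed in only a small fraction of runs.
        if (kv.second.size() < min_run_fraction * runs.size()) continue;
        double sum = 0;
        for (double t : kv.second) sum += t;
        entries.push_back({kv.first, sum / kv.second.size()});
    }
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.avg < b.avg; });

    std::vector<int> profile;
    for (const auto& e : entries) profile.push_back(e.block);
    return profile;                                      // prefetch order for this VM/workload
}

int main() {
    std::vector<Run> runs = {{{3, 0.1}, {1, 0.4}}, {{3, 0.2}, {1, 0.3}, {7, 9.0}}};
    for (int b : build_profile(runs, 0.6)) std::printf("%d ", b);  // prints: 3 1
    std::printf("\n");
    return 0;
}
```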

32 P2P Challenge: In-Order Profile Prefetch Is Inefficient
E.g., during a boot storm all peers actively fetch the same small set of pieces ⇒ low piece diversity ⇒ little opportunity for peers to share ⇒ low swarming efficiency.

33 Solution: Randomization and Throttling
– Randomize the prefetch order
– Rate limiting (based on priority)
– Deadline-based throttling
(A sketch of this prefetch scheduling follows below.)
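As a rough illustration only (the window size, tick rate, and demand flag are assumptions, not VMTorrent parameters), a prefetch scheduler along these lines might randomize within windows of the profile and throttle prefetch behind higher-priority demand requests:

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <thread>
#include <vector>

// Shuffle the profile within consecutive windows: the rough profile order is kept,
// but different peers request pieces in different orders (better piece diversity).
std::vector<int> randomized_prefetch_order(const std::vector<int>& profile, std::size_t window) {
    std::vector<int> order(profile);
    std::mt19937 rng(std::random_device{}());
    for (std::size_t i = 0; i < order.size(); i += window) {
        std::size_t end = std::min(i + window, order.size());
        std::shuffle(order.begin() + i, order.begin() + end, rng);
    }
    return order;
}

int main() {
    std::vector<int> profile = {0, 3, 5, 1, 7, 2, 8, 4, 6};
    bool demand_pending = false;  // in a real system, set while the VMM is stalled on a piece

    // Crude rate limiting: at most one prefetch request per tick, and a tick is
    // skipped entirely whenever a higher-priority demand request is pending.
    for (int piece : randomized_prefetch_order(profile, 3)) {
        if (!demand_pending)
            std::cout << "prefetch piece " << piece << "\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return 0;
}
```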

34 VMTorrent Architecture
[Diagram: the combined architecture: custom FS plus P2P manager, now driven by the profile.]

35 VMTorrent Architecture
[Diagram: the same architecture, with the VMTorrent instance (custom FS, P2P manager, profile) shown alongside the unmodified VM & VMM.]

36 Evaluation

37 VMTorrent Prototype
The custom FS is written in C using FUSE; the P2P manager is custom C++ using libtorrent, swarming over BitTorrent. (A hedged sketch of how such a manager might prioritize pieces appears below.)
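The prototype's libtorrent-based manager is not shown in the talk; the fragment below is a speculative sketch, assuming a libtorrent 1.2-style API, of how a demand request from the custom FS might be escalated to the swarm. The torrent file name, save path, and surrounding flow are illustrative assumptions, and a real manager would keep the session alive and react to alerts rather than exit.

```cpp
#include <libtorrent/session.hpp>
#include <libtorrent/add_torrent_params.hpp>
#include <libtorrent/torrent_info.hpp>
#include <libtorrent/torrent_handle.hpp>
#include <libtorrent/download_priority.hpp>
#include <memory>

int main() {
    lt::session ses;

    lt::add_torrent_params atp;
    atp.ti = std::make_shared<lt::torrent_info>("vm-image.torrent");  // assumed torrent file
    atp.save_path = "/var/lib/vmtorrent";                             // assumed piece store
    lt::torrent_handle h = ses.add_torrent(atp);

    // Demand path: the custom FS reports that the VMM is stalled on piece 4,
    // so ask the swarm for that piece ahead of everything else.
    lt::piece_index_t const wanted(4);
    h.piece_priority(wanted, lt::top_priority);
    h.set_piece_deadline(wanted, 0, lt::torrent_handle::alert_when_available);

    // Profile-driven prefetch would walk the (randomized) profile and assign
    // ordinary priorities to pieces expected later in the run.
    return 0;
}
```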

38 Emulab Testbed*
Up to 101 modern hardware nodes, one VMTorrent instance per node, 100 Mb LAN.
* [White:2002]

39 VMs

40 Workloads
VDI-like tasks.

41 Data Presentation
We use normalized runtime (boot through shutdown), normalized against memory-cached execution. This allows easy cross-comparison of different VM/workload combinations.

42 Flash Crowd
Ubuntu VM, boot-shutdown task, immediate peer departure.

43 Flash Crowd

44 Hypothesis
Larger swarm -> longer time until full swarm efficiency.
Demand prioritization -> relative loss of prefetch piece diversity -> lower swarm efficiency.

45 Swarming Efficiency (Flash Crowd)
[Plot: swarming efficiency over time for swarm sizes n = 16 and n = 100.]

46 Staggered Arrival

