Building Distributed, Wide-Area Applications with WheelFS Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris MIT CSAIL and NYU.

Presentation transcript:

1 Building Distributed, Wide-Area Applications with WheelFS
Jeremy Stribling, Emil Sit, Frans Kaashoek, Jinyang Li, and Robert Morris
MIT CSAIL and NYU

2 Grid Computations Share Data
Nodes in a distributed computation share:
– Program binaries
– Initial input data
– Processed output from one node as intermediate input to another node

3 So Do Users and Distributed Apps
Shared home directory for testbeds (e.g., PlanetLab, RON)
Distributed apps reinvent the wheel:
– Distributed digital research library
– Wide-area measurement experiments
– Cooperative web cache
Can we invent a shared data layer once?

4 Our Goal
Distributed file system for testbeds/Grids
App can share data between nodes
Users can easily access data
Simple-to-build distributed apps
[Diagram labels: Node, Testbed/Grid, File foo (shown three times)]

5 Current Solutions
Usual drawbacks:
– All data flows through one node
– File systems are too transparent
   – Mask failures
   – Incur long delays
[Diagram labels: Node, Testbed/Grid, Central File Server, Copy foo, File foo]

6 Our Proposal: WheelFS
A decentralized, wide-area FS
Main contributions:
1) Provide good performance according to Read Globally, Write Locally
2) Give apps control with semantic cues

7 Talk Outline
1. How to decentralize your file system
2. How to control your files

8 What Does a File System Buy You?
A familiar interface
Language-independent usage model
Hierarchical namespace useful for apps
Quick prototyping for apps

9 File Systems 101
File system (FS) API:
– Open
– { Close/Read/Write }
Directories translate file names to IDs
[Diagram labels: App 1, App 2, Operating System, File System, API call, Local hard disk, Node]

10 Distributed File Systems
[Diagram labels: App 1, App 2, Operating System, File System, API call, Local hard disk, Node, Testbed/Grid, File 135, Dir 500: foo 135]

11 Basic Design of WheelFS
[Diagram: nodes 653, 076, 150, 554, 402, 257; labels: File 135?, File 135, File 135 v2, File 135 v3, 135 v2, 135 v3, Consistency Servers]

12 Read Globally, Write Locally
Perform writes at local disk speeds
Efficient bulk data transfer
Avoid overloading nodes w/ popular files

13 Write Locally
Create foo/bar:
1. Choose an ID
2. Create dir entry
3. Write local file
[Diagram: nodes 653, 076, 150, 554, 402, 257; labels: 550, Dir 209 (foo), File 550 (bar), bar = 550, Read foo/bar]
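
The three creation steps can be tried out in a toy, single-machine simulation. This is only a sketch, not WheelFS code: each "node" is a directory under a temporary root, and the function names (create_file, lookup), the node names, and the on-disk layout are all assumptions made for illustration. The slides do not say how an ID is chosen, so the caller supplies it.

```sh
#!/bin/sh
# Toy simulation of "write locally" (hypothetical layout, not WheelFS code).
# Each node is a directory under ROOT.

# create_file ROOT LOCAL_NODE DIR_NODE DIR_ID NAME FILE_ID
# Reads the new file's data from standard input.
create_file() {
    root=$1; local_node=$2; dir_node=$3; dir_id=$4; name=$5; file_id=$6
    mkdir -p "$root/$local_node" "$root/$dir_node"
    # Step 1: an ID has been chosen (passed in as FILE_ID).
    # Step 2: record "name = id" in the directory object, wherever it lives.
    printf '%s = %s\n' "$name" "$file_id" >> "$root/$dir_node/dir.$dir_id"
    # Step 3: write the file data on the local node only.
    cat > "$root/$local_node/file.$file_id"
}

# lookup ROOT DIR_NODE DIR_ID NAME
# Prints the file ID recorded for NAME in the given directory object.
lookup() {
    awk -v n="$4" '$1 == n && $2 == "=" { print $3 }' "$1/$2/dir.$3"
}

# Demo: an assumed node "node554" creates foo/bar as file 550; directory 209
# (foo) is assumed to live on "node257".
DEMO_ROOT=$(mktemp -d)
echo "hello" | create_file "$DEMO_ROOT" node554 node257 209 bar 550
lookup "$DEMO_ROOT" node257 209 bar    # prints 550
rm -rf "$DEMO_ROOT"
```

Only the directory entry may have to be written on another node; the file data itself never leaves the creating node, which is what lets writes run at local disk speed.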

14 Read Globally
Read file 135:
1. Contact node
2. Receive list
3. Get chunks
[Diagram: nodes 653, 076, 150, 554, 402, 257; labels: Read file 135, File 135, Cached 135, Cached, Chunk]
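
The read path can be sketched the same way, again as a toy single-machine simulation rather than WheelFS code. The slide does not say what the returned list contains; this sketch assumes it names each chunk and, optionally, a node holding a cached copy. The function name read_file, the node names, and the file layout are hypothetical.

```sh
#!/bin/sh
# Toy simulation of "read globally" (hypothetical layout, not WheelFS code).
# Each node is a directory under ROOT.

# read_file ROOT OWNER_NODE FILE_ID
# Step 1: contact the node responsible for the file (OWNER_NODE).
# Step 2: receive its list; each line is "CHUNK [CACHING_NODE]".
# Step 3: get every chunk, from the caching node when one is listed,
#         otherwise from the owner itself.
read_file() {
    root=$1; owner=$2; id=$3
    while read -r chunk cacher; do
        src=${cacher:-$owner}
        cat "$root/$src/$id.$chunk"
    done < "$root/$owner/$id.list"
}

# Demo: file 135 has two chunks; chunk c1 is fetched from a caching node.
DEMO_ROOT=$(mktemp -d)
mkdir -p "$DEMO_ROOT/node150" "$DEMO_ROOT/node402"
printf 'AAA' > "$DEMO_ROOT/node150/135.c0"
printf 'BBB' > "$DEMO_ROOT/node402/135.c1"
printf 'c0\nc1 node402\n' > "$DEMO_ROOT/node150/135.list"
read_file "$DEMO_ROOT" node150 135    # prints AAABBB
rm -rf "$DEMO_ROOT"
```

Spreading chunk fetches over caching nodes is one way to keep the owner of a popular file from being overloaded.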

15 Example: BLAST
DNA alignment tool run on Grids
Copy separate DB portions and queries to many nodes
Run separate computations
Later fetch and combine results

16 Example: BLAST
With WheelFS, however:
– No explicit DB copying necessary
– Efficient initial DB transfers
– Automatic caching for reused DBs and queries
Could be better since data is never updated

17 Example: Cooperative Web Cache
Collection of nodes that:
– Serve redirected web requests
– Fetch web content from original web servers
– Cache web content and serve it directly
– Find cached content on other CWC nodes

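18 Example: Cooperative Web Cache
Avoid hotspots
    # Serve from the shared cache if a fresh copy exists.
    if [ -f "/wfs/cwc/$URL" ]; then
        if notexpired "/wfs/cwc/$URL"; then
            cat "/wfs/cwc/$URL"
            exit
        fi
    fi
    # Otherwise fetch from the origin server and store a copy in the cache.
    wget "$URL" -O - | tee "/wfs/cwc/$URL"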
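
The script relies on a notexpired helper that the slides never define. A minimal sketch of one possibility follows, assuming "expired" simply means "older than a fixed number of minutes"; a real web cache would more likely honor HTTP expiry headers. The cached_get wrapper, its arguments, and the five-minute default are hypothetical, and find's -mmin test is a GNU/BSD extension rather than POSIX.

```sh
#!/bin/sh
# Hypothetical helpers for the cooperative web cache script.

# notexpired FILE [MAX_AGE_MINUTES]
# Succeeds if FILE exists and was modified less than MAX_AGE_MINUTES ago
# (default 5). Uses find's -mmin, available in GNU and BSD find.
notexpired() {
    [ -n "$(find "$1" -mmin "-${2:-5}" 2>/dev/null)" ]
}

# cached_get CACHE_DIR KEY FETCH_COMMAND...
# Prints the cached copy of KEY if it is fresh; otherwise runs FETCH_COMMAND,
# prints its output, and stores a copy under CACHE_DIR/KEY.
cached_get() {
    cache_dir=$1; key=$2; shift 2
    if [ -f "$cache_dir/$key" ] && notexpired "$cache_dir/$key"; then
        cat "$cache_dir/$key"
        return
    fi
    "$@" | tee "$cache_dir/$key"
}

# Possible use against a WheelFS mount (KEY is an assumed flat name
# derived from the URL):
#   cached_get /wfs/cwc "$KEY" wget -q -O - "$URL"
```

Passing the fetch command as arguments keeps the caching logic independent of wget, so the same wrapper can be exercised with any command that writes to standard output.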

19 Example: Cooperative Web Cache
    if [ -f "/wfs/cwc/$URL" ]; then
        if notexpired "/wfs/cwc/$URL"; then
            cat "/wfs/cwc/$URL"
            exit
        fi
    fi
    wget "$URL" -O - | tee "/wfs/cwc/$URL"
[Diagram: nodes 653, 076, 150, 554, 402, 257; labels: Client, $URL, $URL?, File 135, Cached 135, ?135 = v1, 402, Chunk, No!, File 550, $URL == 550, Dir 070 (/wfs/cwc)]

20 Talk Outline
1. How to decentralize your file system
2. How to control your files

21 Example: Cooperative Web Cache
Would rather fail and refetch than wait
Perfect consistency isn't crucial
    if [ -f "/wfs/cwc/$URL" ]; then
        if notexpired "/wfs/cwc/$URL"; then
            cat "/wfs/cwc/$URL"
            exit
        fi
    fi
    wget "$URL" -O - | tee "/wfs/cwc/$URL"

22 Explicit Semantic Cues
Allow direct control over system behavior
Meta-data that attach to files, dirs, or refs
Apply recursively down dir tree
Possible impl: intra-path component
– /wfs/cwc/.cue/foo/bar

23 Semantic Cues: Writability
Applies to files
– WriteMany (default)
– WriteOnce
[Diagram: nodes 653, 076, 150, 554, 402, 257; labels: File 135?, File 135, File 135 v2, File 135 v3, Cached 135 v3, Cached 135]

24 Semantic Cues: Freshness
Applies to file references
– LatestVersion (default)
– AnyVersion
– BestVersion
[Diagram: nodes 653, 076, 150, 554, 402, 257; labels: File 135?, File 135, Cached 135]

25 Semantic Cues: Write Consistency
Applies to files or directories
– Strict (default)
– Lax
[Diagram: nodes 653, 076, 150, 554, 402, 257; labels: Write File 135 (twice), File 135, File 135 v2, 135 v2]

26 Example: BLAST
WriteOnce for all:
– DB files
– Query files
– Result files
Improves cacheability of these files

27 Example: Cooperative Web Cache
Reading an older version is ok:
– cat /wfs/cwc/.maxtime=250,bestversion/foo
Writing conflicting versions is ok:
– wget > /wfs/cwc/.lax,writemany/foo
    # Read through the cue path: bounded wait, older version acceptable.
    if [ -f "/wfs/cwc/.maxtime=250,bestversion/$URL" ]; then
        if notexpired "/wfs/cwc/.maxtime=250,bestversion/$URL"; then
            cat "/wfs/cwc/.maxtime=250,bestversion/$URL"
            exit
        fi
    fi
    # Write through the cue path: conflicting versions are tolerated.
    wget "$URL" -O - | tee "/wfs/cwc/.lax,writemany/$URL"
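
To make the intra-path cue syntax concrete, here is a small sketch of how a path such as /wfs/cwc/.maxtime=250,bestversion/foo could be split into the plain path and its cues. It is an illustration only, not the WheelFS parser: the function name parse_cues is hypothetical, and it assumes absolute paths and that every component starting with "." (other than "." and "..") is a comma-separated cue list.

```sh
#!/bin/sh
# Hypothetical parser for intra-path semantic cues.

# parse_cues PATH
# Prints the path with cue components removed on the first line,
# followed by one cue per line.
parse_cues() {
    path=$1
    plain=
    cues=
    old_ifs=$IFS
    IFS=/
    set -f                      # no filename expansion while splitting
    for comp in $path; do
        case $comp in
            '')   ;;                                    # leading or doubled slash
            .|..) plain="$plain/$comp" ;;               # ordinary path components
            .*)   cues="$cues${cues:+,}${comp#.}" ;;    # cue list: strip the dot
            *)    plain="$plain/$comp" ;;
        esac
    done
    set +f
    IFS=$old_ifs
    printf '%s\n' "$plain"
    if [ -n "$cues" ]; then
        printf '%s\n' "$cues" | tr ',' '\n'
    fi
}

parse_cues "/wfs/cwc/.maxtime=250,bestversion/foo"
# prints:
#   /wfs/cwc/foo
#   maxtime=250
#   bestversion
```

Because the cues travel inside the path, unmodified tools such as cat and wget can select them without any new system calls.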

28 Discussion
Must break data up into files small enough to fit on one disk
Stuff we swept under the rug:
– Security
– Atomic renames across dirs
– Unreferenced files

29 Related Work
Every FS paper ever written
Specifically:
– Cluster FS: Farsite, GFS, xFS, Ceph
– Wide-area FS: JetFile, CFS, Shark
– Grid: LegionFS, GridFTP, IBP
– POSIX I/O High Performance Computing Extensions

30 Conclusion
WheelFS: distributed storage layer for newly-written applications
Performance by reading globally and writing locally
Control through explicit semantic cues