Qualifying Examination: A Decentralized Location and Routing Infrastructure for Fault-tolerant Wide-area Applications. Committee: John Kubiatowicz (Chair), Satish Rao, Anthony Joseph, John Chuang

Outline Motivation Tapestry Overview Examining Alternatives Preliminary Evaluation Research Plan

New Challenges in the Wide Area. Trends: –Moore's Law growth in CPU, bandwidth, storage –Network expanding in reach and bandwidth –Applications poised to leverage network growth. Scalability: # of users, requests, traffic. Expect failures everywhere –With 10^6's of components, MTBF decreases geometrically. Self-management –Intermittent resources mean a single centralized management policy is not enough. Proposal: solve these issues at the infrastructure level so that applications can inherit the properties transparently

Clustering for Scale. Pros: –Easy fault monitoring –LAN communication: low latency, high bandwidth –Shared state –Simple load-balancing. Cons: –Centralized failure modes: single outgoing link, single power source, single geographic location –Limited scalability: outgoing bandwidth, power, space / ventilation

Global Computation Model. Leverage the proliferation of cheap computing resources: CPUs, storage, bandwidth. Global self-adaptive system –Utilize resources wherever possible –Localize the effects of single failures –No single point of vulnerability. Robust, adaptive, persistent

Global Applications? Fully distributed sharing of resources –Storage: OceanStore, Freenet –Computation: SETI, Entropia –Network bandwidth: multicast, content distribution. Deployment: application-level protocol. Redundancy at every level –Storage –Network bandwidth –Computation

Key: Routing and Location. Network scale puts stress on the location / routing layer: we need wide-area decentralized location and routing on an overlay. Properties abstracted into such a layer: –Scalability: millions of nodes, billions of objects –Availability: survive routine faults –Dynamic operation: self-configuring, adaptive –Locality: minimize system-wide operations –Load-balanced operation

Research Issues. Tradeoffs in performance vs. overhead costs: –Overlay routing efficiency vs. routing pointer storage –Location locality vs. location pointer storage –Fault-tolerance and availability vs. storage and bandwidth used. Performance stability via redundancy. Not addressed: –Application consistency issues –Application-level load partitioning

Outline Motivation Tapestry Overview Examining Alternatives Preliminary Evaluation Research Plan

What is Tapestry? A prototype of a dynamic, scalable, fault-tolerant location and routing infrastructure. Suffix-based hypercube routing –Core system inspired by PRR97. Publish by reference. Core API: –publishObject(ObjectID, [serverID]) –msgToObject(ObjectID) –msgToNode(NodeID)
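A minimal sketch of how an application might sit on this API (hypothetical Python; class and field names are illustrative, not the prototype's actual interfaces):

```python
class TapestryNode:
    """Illustrative stand-in for a Tapestry overlay node."""

    def __init__(self, node_id):
        self.node_id = node_id        # random ID drawn from the namespace
        self.object_index = {}        # ObjectID -> server pointers seen here

    def publish_object(self, object_id, server_id=None):
        """Advertise an object by reference: route a publish message toward
        the object's root, leaving pointers at each intermediate hop."""
        ...

    def msg_to_object(self, object_id, payload):
        """Route a message to the nearest server holding object_id."""
        ...

    def msg_to_node(self, node_id, payload):
        """Route a message to the node (or surrogate) with node_id."""
        ...
```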

Design Goals Scalability –Millions of nodes, billions of objects Self-configuration / Dynamic operation –Dynamic insertion –Surrogate Routing Fault-resilience / Self-maintenance –Transparent operation despite failures Exploit redundancy for performance stability

Plaxton, Rajaraman, Richa 97. Overlay network with randomly distributed IDs –Server (where objects are stored) –Client (which wants to search for / contact objects) –Router (which forwards messages from other nodes). Combined location and routing –Servers publish / advertise the objects they maintain –Messages route to the nearest server given an object ID. Assumes global network knowledge

Basic Routing. Example: octal digits, 2^18 namespace
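In suffix routing each hop resolves one more low-order digit of the destination ID. A toy sketch of a single routing step, assuming the octal, 6-digit (2^18) namespace of the example (hypothetical Python, not Tapestry code):

```python
BASE = 8     # octal digits, as in the example above
DIGITS = 6   # a 2**18 namespace is 6 octal digits

def suffix_digits(node_id, n):
    """The n lowest-order base-BASE digits of node_id, least significant first."""
    return [(node_id // BASE ** i) % BASE for i in range(n)]

def next_hop(routing_table, matched, dest_id):
    """One suffix-routing step: having already matched `matched` trailing
    digits of dest_id, forward to a neighbor that matches one more digit.
    routing_table[level][digit] holds such a neighbor (or None)."""
    wanted = (dest_id // BASE ** matched) % BASE
    return routing_table[matched][wanted]
```

With base b and namespace size N this resolves one digit per hop, giving the log_b(N) hop count cited later.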

Publish / Location. Each object has an associated root node, e.g. identified by f(ObjectID). The root keeps a pointer to the object's location. Object O stored at server S –S routes to Root(O) –Each hop keeps an ⟨O, S⟩ entry in its index database. A client routes toward Root(O) and is redirected to S as soon as a pointer is found
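A sketch of the two flows just described (illustrative Python; `root_of` and `route_one_hop` are assumed helpers standing in for the suffix-routing machinery above):

```python
def publish(server, object_id):
    """Publish by reference: walk from the storing server toward
    Root(object_id), leaving an (object_id -> server) pointer at each hop."""
    node = server
    while node.node_id != root_of(object_id):
        node.object_index[object_id] = server
        node = node.route_one_hop(object_id)
    node.object_index[object_id] = server   # the root holds a pointer too

def locate(client, object_id):
    """Walk from the client toward Root(object_id); the first hop that
    already holds a pointer short-circuits the search toward the server."""
    node = client
    while object_id not in node.object_index:
        node = node.route_one_hop(object_id)
    return node.object_index[object_id]
```

Because client and server paths toward the same root converge early when the two are close in the network, lookups tend to find nearby copies; this is the locality property evaluated later.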

What's New. PRR97 benefits: –Simple fault-handling –Scalable: state b·log_b(N), hops log_b(N), where b = digit base, N = |namespace| –Exploits locality –Proportional route distance. PRR97 limitations: –Global-knowledge algorithms –Root node vulnerability –Lack of adaptability. Tapestry inherits: –Scalability –Exploits locality –Proportional route distance. Tapestry adds: –Distributed algorithms –Redundancy for fault-tolerance –Redundancy for performance –Self-configuring / adaptive

Fault-resilience. Minimized soft-state vs. explicit fault-recovery. Routing: –Redundant backup routing pointers –Soft-state neighbor probe packets. Location: –Soft-state periodic republish: with 50 million files/node, daily republish, b = 16, N = 2^160, 40 B/msg, worst-case update traffic is 156 kb/s; expected traffic for a network with 2^40 nodes is 39 kb/s –Hash objectIDs for multiple roots: P(finding reference w/ partition) = 1 − (1/2)^n, where n = # of roots
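To make the multiple-roots bullet concrete: under the slide's model, where a partition hides each of the n roots independently with probability 1/2, even n = 4 roots give

\[
P(\text{find reference}) \;=\; 1 - \left(\tfrac{1}{2}\right)^{4} \;=\; 0.9375,
\]

i.e. roughly 94% availability, at the cost of publishing (and republishing) along four root paths instead of one.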

Dynamic Surrogate Routing. Real networks are much smaller than the namespace, i.e. the network is sparse in the ID space. Routing to a non-existent node (or: defining f: N → n, where N = the namespace and n = the set of nodes in the network). Example: routing to the root node of object O. Need a mapping from N → n

PRR97 Approach to f(N_i). Given a desired ID N_i: –Find the set S of existing network nodes matching the most suffix digits with N_i –Choose S_i = the node in S with the highest-valued ID. Issues: –Mapping must be generated statically using global knowledge –Must be kept as hard state in order to operate in a changing environment –Mapping is not well distributed; many nodes in n get no mappings

Tapestry Approach to f(N_i). Globally consistent distributed algorithm: –Attempt to route to the desired ID N_i –Whenever a null entry is encountered, choose the next higher non-null pointer entry –If the current node S holds the only non-null pointer in the rest of the route map, terminate the route: f(N_i) = S. Assumes: –Routing maps across the network are up to date –Null/non-null properties are identical at all nodes sharing the same suffix
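A sketch of this fall-through rule (hypothetical Python, reusing the routing-table layout from the earlier sketch; the real algorithm handles more corner cases):

```python
BASE, DIGITS = 8, 6   # as in the octal example; Tapestry itself uses base 16

def surrogate_route(node, dest_id, level=0):
    """Deterministic surrogate routing: when the exact next-hop entry is
    null, fall through to the next higher non-null digit (wrapping), so
    every node with up-to-date tables maps dest_id to the same surrogate."""
    if level == DIGITS:                       # every digit resolved
        return node
    wanted = (dest_id // BASE ** level) % BASE
    for offset in range(BASE):                # wanted digit, then next higher
        digit = (wanted + offset) % BASE
        neighbor = node.routing_table[level][digit]
        if neighbor is node:                  # our own digit: advance in place
            return surrogate_route(node, dest_id, level + 1)
        if neighbor is not None:
            return surrogate_route(neighbor, dest_id, level + 1)
    return node       # no outgoing pointer left: f(dest_id) = this node
```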

Analysis of the Tapestry Algorithm. Globally consistent deterministic mapping: a null entry means no node exists in the network with that suffix, and consistent maps mean identical null entries across the route maps of nodes with the same suffix. Additional hops compared to the PRR solution reduce to a coupon-collector problem. Assuming random distribution: with n·ln(n) + cn entries, P(all coupons) = 1 − e^−c. For n = b and c = b − ln(b), when roughly b^2 candidate nodes remain, P = 1 − b/e^b, so the number of additional hops is about log_b(b^2) = 2. A distributed algorithm with minimal additional hops
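For reference, the coupon-collector bound invoked above follows from a union bound (a textbook fact, not specific to Tapestry):

\[
\Pr\!\bigl[\text{some coupon missing after } n\ln n + cn \text{ draws}\bigr]
\;\le\; n\Bigl(1-\tfrac{1}{n}\Bigr)^{n\ln n + cn}
\;\le\; n\,e^{-(\ln n + c)} \;=\; e^{-c},
\]

so substituting n = b and c = b − ln(b) gives a failure probability of at most e^{−(b−\ln b)} = b/e^{b}, matching the 1 − b/e^b success bound on the slide.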

What if the maps are not consistent? Two cases: –A. A node disappeared and some node did not detect it: routing proceeds over the invalid link and fails; with no backup router, proceed to surrogate routing. –B. A node entered but has not yet been detected: route to the surrogate node instead of the new node. The new node checks with its surrogate after all such nodes have been notified, and the route info at the surrogate is then moved to the new node

Properties of Overlay Logical hops through overlay per route Routing state per overlay node Overlay routing distance vs. underlying network –Relative Delay Penalty (RDP) Messages for insertion Load balancing

Alternatives: P2P Indices Current Solutions: –DNS server redirection, DNS peering Content Addressable Networks –Ratnasamy et al., ACIRI / UCB Chord –Stoica, Morris, Karger, Kaashoek, Balakrishnan MIT Pastry –Druschel and Rowstron Microsoft Research

Comparing the Alternatives

Property                 | Tapestry        | Chord          | CAN            | Pastry
Parameter                | base b          | none           | dimension d    | base b
Logical path length      | log_b(N)        | log_2(N)       | O(d·N^(1/d))   | log_b(N)
Neighbor state           | b·log_b(N)      | log_2(N)       | O(d)           | b·log_b(N) + O(b)
Routing overhead (RDP)   | O(1)            | O(1)           | ?              | ?
Messages to insert       | O(log_b^2(N))   | O(log_2^2(N))  | O(d·N^(1/d))   | O(log_b(N))
Consistency              | eventual        | app-dep.       | epoch          | ?
Load balancing           | good            | good           | good           | good

(Chord, CAN, and Pastry were designed specifically for P2P systems.)

Outline Motivation Tapestry Overview Examining Alternatives Preliminary Evaluation Research Plan

Evaluation Metrics. Routing distance overhead (RDP). Routing redundancy for fault-tolerance: –Availability of objects and references –Message delivery under link/router failures –Overhead of fault-handling. Locality vs. storage overhead. Optimality of dynamic insertion. Performance stability via redundancy

Results: Location Locality Measuring effectiveness of locality pointers (TIERS 5000)

Results: Stability via Redundancy. Impact of parallel queries to multiple roots on response time and its variability. Aggregate bandwidth measures the b/w used for soft-state republishing once per day plus the b/w used by requests at a rate of 1/s.

Example Application: Bayeux. Application-level multicast –Leverages Tapestry for scale and fault-tolerance –Optimizations: self-configuring into sub-trees; clustering group IDs for lower b/w

Research Scope. Effectiveness as application infrastructure: –Build novel new apps –Port existing apps to scale to the wide area. Use simulations to better understand parameters' effects on overall performance. Explore further stability via statistics. Understand / map out the research space. Outside the scope: –DoS resiliency –Streaming media, P2P, content-distribution apps

Timeline: 0-5 months. Simulation/analysis of parameters' impact on performance. Quantify approaches to exploiting routing redundancy; analyze via simulation. Finish deployment of a real dynamic Tapestry. Consider alternate mechanisms –Learn from consistent hashing

Timeline: 5-10 months. Extend deployment to wide-area networks –Nortel, EMC, academic institutions –Evaluate real-world performance. Design and implement a network-embedded SDS (w/ T. Hodes). Optimize routing by fault prediction –Integrate link-characterization work (Konrad). Start writing dissertation

Timeline: 10+ months. Finish writing dissertation. Travel / interviews. Graduate

Chord. Distributed hashtable. State: log_2(N) pointers. Hops: log_2(N). Key issue: –Correlation between overlay distance and physical distance develops only over time
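For comparison with Tapestry's suffix maps, Chord's log_2(N) pointers are its finger table; per the Chord paper, node n's i-th finger is

\[
\mathrm{finger}[i] \;=\; \operatorname{successor}\!\bigl((n + 2^{\,i-1}) \bmod 2^{m}\bigr),
\qquad 1 \le i \le m,
\]

where m is the number of ID bits. Each hop at least halves the remaining ID-space distance to the target, which yields the log_2(N) hop count above; note the metric is purely ID-space distance, hence the locality issue noted on the slide.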

CAN. Distributed hashtable addressed in a d-dimensional coordinate space. State: O(d). Hops: expected O(d·n^(1/d)). Key issues: –Lack of network-distance correlation, partially offset by heuristics and redundancy –Soft-state immutable objects; consistency by epochs?

Pastry. Incremental digit routing like Plaxton. Objects replicated at the x nodes whose IDs are closest to the object's ID. State: b·log_b(N) + O(b). Hops: O(log_b(N)). Key issues: –Does not exploit locality well –The infrastructure controls replicas and their placement –Questions of security, confidentiality, consistency? (immutable objects?)

Algorithm: Dynamic Insertion. New node desires integration with ID xxxxx. Step 1: copy the route map recursively, from nodes matching progressively longer suffixes: ----x, ---xx, --xxx, -xxxx

Dynamic Insertion, cont. Step 2: get the relevant data from the surrogate and inform edge nodes of your existence. Step 3: notify neighbors so they can optimize their paths
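A rough sketch of the three steps (hypothetical Python; helpers like `surrogate_route` and `transfer_pointers` are stand-ins, and the deployed protocol has considerably more machinery):

```python
def insert_node(new_node, gateway):
    """Sketch of dynamic insertion for a node with ID xxxxx."""
    # Step 1: route toward our own ID; the node reached after matching i
    # suffix digits (----x, ---xx, --xxx, ...) donates its level-i route map.
    hop = gateway
    for level in range(DIGITS):
        new_node.routing_table[level] = list(hop.routing_table[level])
        hop = hop.next_hop_toward(new_node.node_id, level)

    # Step 2: take over object pointers held at our surrogate, and tell the
    # edge nodes that should now point at us that we exist.
    surrogate = surrogate_route(gateway, new_node.node_id)
    transfer_pointers(surrogate, new_node)

    # Step 3: ask neighbors to measure the new links and re-optimize paths.
    for neighbor in new_node.neighbors():
        neighbor.consider_route_optimization(new_node)
```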