A Scalable, Content-Addressable Network


A Scalable, Content-Addressable Network
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker
ACIRI / U.C. Berkeley / Tahoe Networks

Outline: Introduction, Design, Evaluation, Strengths & Weaknesses, Ongoing Work

Internet-scale hash tables
- Hash tables: an essential building block in software systems
- Are Internet-scale distributed hash tables equally valuable to large-scale distributed systems?
  - peer-to-peer systems: Napster, Gnutella, Groove, FreeNet, MojoNation…
  - large-scale storage management systems: Publius, OceanStore, PAST, Farsite, CFS...
  - mirroring on the Web

Content-Addressable Network (CAN)
CAN: an Internet-scale hash table
Interface:
- insert(key, value)
- value = retrieve(key)
Properties:
- scalable
- operationally simple
- good performance (w/ improvements)
Related systems: Chord / Pastry / Tapestry / Buzz / Plaxton ...
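
To make the interface concrete, here is a minimal sketch in Python. Only the insert/retrieve contract comes from the slides; the class name and the local dict standing in for the distributed store are illustrative assumptions.

```python
# Minimal sketch of the CAN interface. A local dict stands in for the
# distributed (key, value) store so the contract is concrete and runnable.

class ContentAddressableNetwork:
    def __init__(self):
        self._store = {}          # stand-in: CAN spreads this across nodes

    def insert(self, key, value):
        self._store[key] = value  # in CAN: hash key to a point, route there, store

    def retrieve(self, key):
        return self._store.get(key)  # in CAN: hash key, route to the owner, fetch

can = ContentAddressableNetwork()
can.insert("song.mp3", "198.51.100.7")    # hypothetical file -> location entry
assert can.retrieve("song.mp3") == "198.51.100.7"
```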

Problem Scope
Design a system that provides the interface:
- scalability
- robustness
- performance
- security
Not application-specific, higher-level primitives:
- keyword searching
- mutable content
- anonymity

Outline: Introduction, Design, Evaluation, Strengths & Weaknesses, Ongoing Work

CAN: basic idea
[Animation: a set of nodes, each holding (K,V) pairs]
- insert(K1,V1): the pair is stored at one of the nodes
- retrieve(K1): the request reaches that node, which returns V1

CAN: solution
A virtual Cartesian coordinate space:
- the entire space is partitioned amongst all the nodes
- every node "owns" a zone in the overall space
Abstraction:
- can store data at "points" in the space
- can route from one "point" to another
- point = the node that owns the enclosing zone
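
A sketch of the zone abstraction, assuming a 2-d unit square; the Zone class and owner_of helper are illustrative names, not from the paper.

```python
# Illustrative zone-ownership model: the coordinate space is partitioned into
# rectangular zones, and a point is served by the node owning its enclosing zone.

class Zone:
    def __init__(self, x_lo, x_hi, y_lo, y_hi, node):
        self.x_lo, self.x_hi = x_lo, x_hi
        self.y_lo, self.y_hi = y_lo, y_hi
        self.node = node                      # the node that "owns" this zone

    def contains(self, x, y):
        return self.x_lo <= x < self.x_hi and self.y_lo <= y < self.y_hi

def owner_of(zones, x, y):
    """Return the node whose zone encloses the point (x, y)."""
    for zone in zones:
        if zone.contains(x, y):
            return zone.node
    raise ValueError("point not covered")  # impossible if zones partition the space

# Two nodes splitting the unit square in half along x:
zones = [Zone(0.0, 0.5, 0.0, 1.0, "node1"), Zone(0.5, 1.0, 0.0, 1.0, "node2")]
assert owner_of(zones, 0.7, 0.2) == "node2"
```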

CAN: simple example
[Animation: nodes 1, 2, 3, and 4 join in turn; each join splits an existing zone in half, so the 2-d space ends up partitioned among the nodes. A node I is then highlighted for the insert example that follows.]

CAN: simple example
node I::insert(K,V)
(1) a = hx(K); b = hy(K)
(2) route (K,V) -> (a,b)
(3) the node owning the zone containing (a,b) stores (K,V)

CAN: simple example
node J::retrieve(K)
(1) a = hx(K); b = hy(K)
(2) route "retrieve(K)" to (a,b); the owner of (a,b) returns (K,V) to J
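
A sketch of the hashing step shared by insert and retrieve: two independent uniform hashes map a key to a point (a, b). Deriving hx and hy from salted SHA-1 prefixes is an assumption for illustration; the slides only require one uniform hash per dimension.

```python
import hashlib

def _uniform_hash(salt: bytes, key: str) -> float:
    """Map key to a uniform value in [0, 1) via a salted SHA-1 (illustrative)."""
    digest = hashlib.sha1(salt + key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def hx(key: str) -> float:
    return _uniform_hash(b"x:", key)

def hy(key: str) -> float:
    return _uniform_hash(b"y:", key)

a, b = hx("K1"), hy("K1")
# insert(K1, V1): route (K1, V1) to the node owning the point (a, b)
# retrieve(K1):   recompute the same (a, b) and route the lookup there
```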

CAN
Data stored in the CAN is addressed by name (i.e., key), not by location (i.e., IP address)

CAN: routing
[Figure: each node's routing table holds only its neighbors; a message destined for point (a,b) is forwarded greedily from node to node toward (a,b); a sketch follows]
A node only maintains state for its immediate neighboring nodes
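
A sketch of greedy forwarding: each hop hands the message to whichever neighbor lies closest to the destination point. Representing each neighbor by a single coordinate point (e.g., its zone center) is a simplification for illustration.

```python
import math

def next_hop(neighbors, dest):
    """Pick the neighbor whose (zone-center) coordinates are closest to dest.

    neighbors: list of ((x, y), node_id) pairs; dest: (x, y) target point.
    """
    return min(neighbors, key=lambda n: math.dist(n[0], dest))[1]

neighbors = [((0.25, 0.75), "A"), ((0.75, 0.25), "B"), ((0.75, 0.75), "C")]
assert next_hop(neighbors, (0.9, 0.9)) == "C"   # C makes the most progress
```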

CAN: node insertion
1) the new node discovers some node "I" already in the CAN
2) the new node picks a random point (p,q) in the space
3) I routes to (p,q) and discovers node J, the owner of the enclosing zone
4) J's zone is split in half; the new node owns one half (runnable sketch below)
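
A runnable sketch of the join steps, under two simplifying assumptions: the zone is always split along x (the paper cycles through dimensions), and routing to (p,q) is replaced by a direct scan over zones. Step 1 (bootstrap discovery) is out of scope here.

```python
import random
from dataclasses import dataclass

@dataclass
class Zone:
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    node: str

    def contains(self, x, y):
        return self.x_lo <= x < self.x_hi and self.y_lo <= y < self.y_hi

def join(zones, new_node):
    px, py = random.random(), random.random()             # 2) pick random point (p,q)
    owner = next(z for z in zones if z.contains(px, py))  # 3) route to (p,q), find J
    mid = (owner.x_lo + owner.x_hi) / 2                   # 4) split J's zone in half...
    new_zone = Zone(mid, owner.x_hi, owner.y_lo, owner.y_hi, new_node)
    owner.x_hi = mid                                      # ...J keeps the lower half
    zones.append(new_zone)
    return new_zone

zones = [Zone(0.0, 1.0, 0.0, 1.0, "node1")]   # node1 initially owns the whole space
join(zones, "node2")                          # now each node owns half
```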

CAN: node insertion Inserting a new node affects only a single other node and its immediate neighbors

CAN: node failures
Need to repair the space:
- recover the database (weak point): soft-state updates; use replication, rebuild database from replicas
- repair routing: takeover algorithm

CAN: takeover algorithm
Simple failures:
- know your neighbors' neighbors
- when a node fails, one of its neighbors takes over its zone
More complex failure modes:
- simultaneous failure of multiple adjacent nodes
- scoped flooding to discover neighbors
- hopefully, a rare event

CAN: node failures Only the failed node’s immediate neighbors are required for recovery
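
A sketch of the simple-failure case. The CAN paper has each neighbor run a timer proportional to its zone volume so that the smallest-volume neighbor claims the zone first; picking the minimum directly, as here, is a shortcut for illustration. Zones are (x_lo, x_hi, y_lo, y_hi, node) tuples for brevity.

```python
def volume(zone):
    x_lo, x_hi, y_lo, y_hi, _node = zone
    return (x_hi - x_lo) * (y_hi - y_lo)

def takeover(neighbor_zones):
    """Return the neighbor that absorbs the failed node's zone: smallest volume wins."""
    return min(neighbor_zones, key=volume)[-1]

neighbors = [(0.0, 0.5, 0.0, 0.5, "A"), (0.5, 1.0, 0.0, 1.0, "B")]
assert takeover(neighbors) == "A"   # A's zone is smaller, so A takes over
```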

Design recap
Basic CAN:
- completely distributed
- self-organizing
- nodes only maintain state for their immediate neighbors
Additional design features:
- multiple, independent spaces (realities)
- background load-balancing algorithm
- simple heuristics to improve performance

Outline: Introduction, Design, Evaluation, Strengths & Weaknesses, Ongoing Work

Evaluation: Scalability, Low latency, Load balancing, Robustness

CAN: scalability
For a uniformly partitioned space with n nodes and d dimensions, per node:
- number of neighbors is 2d
- average routing path is (d/4) n^(1/d) hops
- simulations show that the above results hold in practice
Can scale the network without increasing per-node state
(compare Chord/Plaxton/Tapestry/Buzz: log(n) neighbors with log(n) hops)
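
A quick numeric check of those formulas, showing the state/path-length trade-off at the largest simulated size (131K nodes, as in the latency figures below):

```python
def num_neighbors(d):
    return 2 * d                      # per-node state grows only with d

def avg_path_hops(n, d):
    return (d / 4) * n ** (1 / d)     # average routing path length

n = 131_072                           # 131K nodes, as in the evaluation
for d in (2, 10):
    print(f"d={d}: {num_neighbors(d)} neighbors, "
          f"{avg_path_hops(n, d):.1f} hops on average")
# d=2:  4 neighbors, 181.0 hops on average
# d=10: 20 neighbors, 8.1 hops on average
```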

CAN: low-latency
Problem:
- latency stretch = (CAN routing delay) / (IP routing delay)
- application-level routing may lead to high stretch
Solution:
- increase dimensions and realities (reduces the path length)
- heuristics (reduce the per-CAN-hop latency): RTT-weighted routing; multiple nodes per zone (peer nodes); deterministically replicate entries

CAN: low-latency
[Figures: latency stretch vs. number of nodes (16K, 32K, 65K, 131K), with and without heuristics, for #dimensions = 2 and #dimensions = 10]

CAN: load balancing
Two pieces:
Dealing with hot-spots (popular (key,value) pairs):
- nodes cache recently requested entries (sketch below)
- an overloaded node replicates popular entries at its neighbors
Uniform coordinate-space partitioning:
- uniformly spreads (key,value) entries
- uniformly spreads out routing load
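
A sketch of the caching piece: a node keeps a small cache of recently requested entries and answers from it before forwarding. The LRU policy and cache size are assumptions; the slides specify only that nodes cache recently requested entries.

```python
from collections import OrderedDict

class CachingNode:
    """Per-node LRU cache of recently requested (key, value) entries."""

    def __init__(self, capacity=128):
        self.cache = OrderedDict()
        self.capacity = capacity

    def handle_retrieve(self, key, fetch_from_owner):
        if key in self.cache:                # hit: answer locally, no routing
            self.cache.move_to_end(key)
            return self.cache[key]
        value = fetch_from_owner(key)        # miss: forward toward the owner
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used
        return value
```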

Uniform Partitioning
Added check at join time (sketch below):
- pick a zone (via the random point)
- check its neighboring zones
- pick the largest of these zones and split that one
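
A sketch of that check, again with tuple zones; choosing the largest zone by volume is the literal reading of the slide.

```python
def volume(zone):
    x_lo, x_hi, y_lo, y_hi, _node = zone
    return (x_hi - x_lo) * (y_hi - y_lo)

def zone_to_split(landing_zone, neighbor_zones):
    """At join time, split the largest of the landing zone and its neighbors."""
    return max([landing_zone, *neighbor_zones], key=volume)

landing = (0.0, 0.25, 0.0, 0.25, "A")
neighbors = [(0.25, 1.0, 0.0, 1.0, "B"), (0.0, 0.25, 0.25, 1.0, "C")]
assert zone_to_split(landing, neighbors)[-1] == "B"   # B's zone is largest
```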

Uniform Partitioning
[Figure: 65,000 nodes, 3 dimensions; percentage of nodes vs. zone volume (V/16 through 8V, where V = total volume / n), with and without the check]

CAN: Robustness
Completely distributed:
- no single point of failure (though the piece of the database on a failed node is still affected)
- database recovery is not explored here (assumes multiple copies of the database exist)
Resilience of routing:
- can route around trouble

Outline: Introduction, Design, Evaluation, Strengths & Weaknesses, Ongoing Work

Strengths
- More resilient than flooding broadcast networks
- Efficient at locating information
- Fault-tolerant routing
- Node and data high availability (w/ improvements)
- Manageable routing table size and network traffic

Weaknesses
- Impossible to perform a fuzzy search
- Susceptible to malicious activity
- Must maintain coherence of all the indexed data (network overhead, efficient distribution)
- Still relatively high routing latency
- Poor performance w/o improvements

Suggestions
- Catalog and meta-indexes to perform the search function
- Extension to handle mutable content efficiently for web hosting
- Security mechanisms to defend against attacks

Outline: Introduction, Design, Evaluation, Strengths & Weaknesses, Ongoing Work

Ongoing Work
Topologically-sensitive CAN construction:
- distributed binning

Distributed Binning
Goal: bin nodes such that co-located nodes land in the same bin
Idea:
- a well-known set of landmark machines
- each CAN node measures its RTT to each landmark
- and orders the landmarks by increasing RTT (sketch below)
CAN construction: place nodes from the same bin close together on the CAN
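
A sketch of the binning rule: a node's bin is the ordering of landmarks by increasing measured RTT, so nodes with the same ordering are likely topologically close. measure_rtt is a hypothetical stand-in for a real ping.

```python
def bin_of(node, landmarks, measure_rtt):
    """Return the landmark ordering (by increasing RTT) identifying node's bin."""
    rtts = {lm: measure_rtt(node, lm) for lm in landmarks}
    return tuple(sorted(landmarks, key=lambda lm: rtts[lm]))

# Toy example with made-up RTTs in milliseconds:
fake_rtt = {("n1", "L1"): 80, ("n1", "L2"): 20, ("n1", "L3"): 50}
bin_id = bin_of("n1", ["L1", "L2", "L3"], lambda n, lm: fake_rtt[(n, lm)])
assert bin_id == ("L2", "L3", "L1")   # CAN then co-locates same-bin nodes
```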

Distributed Binning
[Figure: latency stretch vs. number of nodes (256, 1K, 4K) for #dimensions = 2 and #dimensions = 4, comparing w/o binning, w/ binning, and naïve partitioning; 4 landmarks placed 5 hops from each other]

Ongoing Work (cont'd)
Topologically-sensitive CAN construction: distributed binning
CAN security (Petros Maniatis, Stanford):
- spectrum of attacks
- appropriate counter-measures

Ongoing Work (cont'd)
CAN usage:
- application-level multicast (NGC 2001)
- grass-roots content distribution
- distributed databases using CANs (J. Hellerstein, S. Ratnasamy, S. Shenker, I. Stoica, S. Zhuang)

Summary
CAN: an Internet-scale hash table, a potential building block in Internet applications
- Scalability: O(d) per-node state
- Low-latency routing: simple heuristics help a lot
- Robust: decentralized, can route around trouble