Autonomous Replication for High Availability in Unstructured P2P Systems (Paper by Francisco Matias Cuenca-Acuna, Richard P. Martin, Thu D. Nguyen) Hristo.

Slides:

Advertisements

Similar presentations

Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen Scott Shenker This is a modified version of the original presentation by the authors.

Advertisements

Storage management and caching in PAST Antony Rowstron and Peter Druschel Presented to cs294-4 by Owen Cooper.

Playback delay in p2p streaming systems with random packet forwarding Viktoria Fodor and Ilias Chatzidrossos Laboratory for Communication Networks School.

On Large-Scale Peer-to-Peer Streaming Systems with Network Coding Chen Feng, Baochun Li Dept. of Electrical and Computer Engineering University of Toronto.

PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities F. M. Cuenca-Acuna, C. Peery, R. P. Martin, and T. D.

Page 1 Mutual Exclusion* Distributed Systems *referred to slides by Prof. Paul Krzyzanowski at Rutgers University and Prof. Mary Ellen Weisskopf at University.

P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.

Availability in Global Peer-to-Peer Systems Qin (Chris) Xin, Ethan L. Miller Storage Systems Research Center University of California, Santa Cruz Thomas.

Storage Management and Caching in PAST, a large-scale, persistent peer- to-peer storage utility Authors: Antony Rowstorn (Microsoft Research) Peter Druschel.

Autonomous Replication for High Availability in Unstructured P2P Systems Francisco Matias Cuenca-Acuna, Richard P. Martin, Thu D. Nguyen Department of.

1 A Framework for Lazy Replication in P2P VoD Bin Cheng 1, Lex Stein 2, Hai Jin 1, Zheng Zhang 2 1 Huazhong University of Science & Technology (HUST) 2.

Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.

A probabilistic approach to building large scale federated systems Francisco Matias Cuenca-Acuna

Exploiting Content Localities for Efficient Search in P2P Systems Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang 1 1 College of William and Mary,

Adaptive Content Management in Structured P2P Communities Jussi Kangasharju Keith W. Ross David A. Turner.

Francisco Matias Cuenca-Acuna Christopher Peery Thu D. Nguyen Usando algoritmos probabilísticos para construir sistemas.

Self Healing Wide Area Network Services Bhavjit S Walha Ganesh Venkatesh.

presented by Hasan SÖZER1 Scalable P2P Search Daniel A. Menascé George Mason University.

Erasure Coding vs. Replication: A Quantiative Comparison

Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities Francisco Matias Cuenca-Acuna, Christopher Peery, Richard P.

Performance Evaluation of Peer-to-Peer Video Streaming Systems Wilson, W.F. Poon The Chinese University of Hong Kong.

Navigating and Sharing in a Decentralized World Francisco Matias Cuenca-Acuna

Text-Based Content Search and Retrieval in ad hoc P2P Communities Francisco Matias Cuenca-Acuna Thu D. Nguyen

Searching in Unstructured Networks Joining Theory with P-P2P.

1Bloom Filters Lookup questions: Does item “ x ” exist in a set or multiset? Data set may be very big or expensive to access. Filter lookup questions with.

Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.

P-Grid Presentation by Thierry Lopez P-Grid: A Self-organizing Structured P2P System Karl Aberer, Philippe Cudré-Mauroux, Anwitaman Datta, Zoran Despotovic,

Project Mimir A Distributed Filesystem Uses Rateless Erasure Codes for Reliability Uses Pastry’s Multicast System Scribe for Resource discovery and Utilization.

 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.

1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.

THE DESIGN AND IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM M. Rosenblum and J. K. Ousterhout University of California, Berkeley.

Multicast Communication Multicast is the delivery of a message to a group of receivers simultaneously in a single transmission from the source – The source.

1 Reasons for parallelization Can we make GA faster? One of the most promising choices is to use parallel implementations. The reasons for parallelization.

1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.

Introduction Widespread unstructured P2P network

Communication (II) Chapter 4

Exploring VoD in P2P Swarming Systems By Siddhartha Annapureddy, Saikat Guha, Christos Gkantsidis, Dinan Gunawardena, Pablo Rodriguez Presented by Svetlana.

COCONET: Co-Operative Cache driven Overlay NETwork for p2p VoD streaming Abhishek Bhattacharya, Zhenyu Yang & Deng Pan.

Multi-level Hashing for Peer-to-Peer System in Wireless Ad Hoc Environment Dewan Tanvir Ahmed and Shervin Shirmohammadi Distributed & Collaborative Virtual.

Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.

UbiStore: Ubiquitous and Opportunistic Backup Architecture. Feiselia Tan, Sebastien Ardon, Max Ott Presented by: Zainab Aljazzaf.

The EigenTrust Algorithm for Reputation Management in P2P Networks

THE DESIGN AND IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM M. Rosenblum and J. K. Ousterhout University of California, Berkeley.

Security Michael Foukarakis – 13/12/2004 A Survey of Peer-to-Peer Security Issues Dan S. Wallach Rice University,

Autonomous Replication for High Availability in Unstructured P2P Systems Francisco Matias Cuenca-Acuna, Richard P. Martin, Thu D. Nguyen

ACN: RED paper1 Random Early Detection Gateways for Congestion Avoidance Sally Floyd and Van Jacobson, IEEE Transactions on Networking, Vol.1, No. 4, (Aug.

TOMA: A Viable Solution for Large- Scale Multicast Service Support Li Lao, Jun-Hong Cui, and Mario Gerla UCLA and University of Connecticut Networking.

Module 7: Resolving NetBIOS Names by Using Windows Internet Name Service (WINS)

Quantitative Evaluation of Unstructured Peer-to-Peer Architectures Fabrício Benevenuto José Ismael Jr. Jussara M. Almeida Department of Computer Science.

Serverless Network File Systems Overview by Joseph Thompson.

Paper Survey of DHT Distributed Hash Table. Usages Directory service  Very little amount of information, such as URI, metadata, … Storage  Data, such.

Efficient P2P Search by Exploiting Localities in Peer Community and Individual Peers A DISC’04 paper Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang.

Efficient Peer-to-Peer Keyword Searching 1 Efficient Peer-to-Peer Keyword Searching Patrick Reynolds and Amin Vahdat presented by Volker Kudelko.

1 Push-to-Peer Video-on-Demand System. 2 Abstract Content is proactively push to peers, and persistently stored before the actual peer-to-peer transfers.

Exact Regenerating Codes on Hierarchical Codes Ernst Biersack Eurecom France Joint work and Zhen Huang.

1 30 November 2006 An Efficient Nearest Neighbor (NN) Algorithm for Peer-to-Peer (P2P) Settings Ahmed Sabbir Arif Graduate Student, York University.

Peer to Peer Network Design Discovery and Routing algorithms

Peer-to-Peer Systems: An Overview Hongyu Li. Outline  Introduction  Characteristics of P2P  Algorithms  P2P Applications  Conclusion.

Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.

CSCI 156: Lab 11 Paging. Our Simple Architecture Logical memory space for a process consists of 16 pages of 4k bytes each. Your program thinks it has.

CS 347Notes081 CS 347: Parallel and Distributed Data Management Notes 08: P2P Systems.

School of Electrical Engineering &Telecommunications UNSW Cost-effective Broadcast for Fully Decentralized Peer-to-peer Networks Marius Portmann & Aruna.

Geethanjali College Of Engineering and Technology Cheeryal( V), Keesara ( M), Ranga Reddy District. I I Internal Guide Mrs.CH.V.Anupama Assistant Professor.

Pastry Scalable, decentralized object locations and routing for large p2p systems.

Providing Secure Storage on the Internet

Peer to Peer Information Retrieval

Mobile P2P Data Retrieval and Caching

by Mikael Bjerga & Arne Lange

Overview Multimedia: The Role of WINS in the Network Infrastructure

Presentation transcript:

Autonomous Replication for High Availability in Unstructured P2P Systems (Paper by Francisco Matias Cuenca-Acuna, Richard P. Martin, Thu D. Nguyen) Hristo Pentchev Proseminar Peer-to-Peer Information Systems

Autonomous Replication for High Availability in Unstructured P2P Systems 2 Motivation P2P Comunity I need X Has somebod y X ? Yes. Bye ! Please give me X ! We don’t have X. §$%?? ? grrrr

Autonomous Replication for High Availability in Unstructured P2P Systems 3 Overview  Introduction Introduction  PlanetP PlanetP  Autonomous Replication Autonomous Replication  Simulations Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 4 Introduction  P2P computing is getting important Availability  Data availability  Peer availability in P2P communities  A decentralized and autonomous replication algorithm  Increasing the availability of shared data using this algorithm in weakly organized systems is possible

Autonomous Replication for High Availability in Unstructured P2P Systems 5 Introduction Naive idea  Replicate a file  Send the replicas randomly The replication needs  Excess storage  Bandwidth

Autonomous Replication for High Availability in Unstructured P2P Systems 6 Introduction Example: Replicating a file 6 times in P2P community with average peer availability 20% increases the file’s availability to 79% but needs lots of bandwidth. 100GB + 600GB = 700GB The availability is equal to the probability that all peers having the file are offline. Conclusion: Naive replication is not effective.

Autonomous Replication for High Availability in Unstructured P2P Systems 7 Introduction How to improve the benefit of replication?  Erasure coding Divide a file in m fragments Code this m fragments in n (n > m), so that the file could be reproduced with any m (unique) fragments Send the fragments to random peers m n divide code

Autonomous Replication for High Availability in Unstructured P2P Systems 8 Introduction We win with erasure coding: More availability Less bandwidth because constant changes to the online membership do not require the movement of replicas any more

Autonomous Replication for High Availability in Unstructured P2P Systems 9 Overview  Introduction Introduction  PlanetP PlanetP  Autonomous Replication Autonomous Replication  Simulations Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 10 PlanetP  Gossiping-based publish/subscribe infrastructure Maintaining the loosely synchronized global data Gossiping in PlanetP - spreading new information Information needed by our algorithm:  Replica-to-peer mapping  Peer availability

Autonomous Replication for High Availability in Unstructured P2P Systems 11 PlanetP-Gossiping  PlaneP architecture  Local index = content of shared files  Global index = state of the community local index global index Node

Autonomous Replication for High Availability in Unstructured P2P Systems 12 Overview  Introduction Introduction  PlanetP PlanetP  Autonomous Replication Autonomous Replication  Simulations Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 13 Autonomous Replication - Terminology & Assumptions  Nodes have hoard set and fragment store Hoard set – used by offline operations  Hoarding entire files Fragment set - used by the algorithm  The size of the fragment sets is limited

Autonomous Replication for High Availability in Unstructured P2P Systems 14 Autonomous Replication - Terminology & Assumptions  Files have unique ID  Availability of fragments

Autonomous Replication for High Availability in Unstructured P2P Systems 15 Autonomous Replication hoard set replication store Replicator erasure code hoard set excess storage Target erasure code hoard set excess storage Target hoard set excess storage Target hoard set replication store

Autonomous Replication for High Availability in Unstructured P2P Systems 16 Autonomous Replication - The algorithm  The algorithm Each member advertises IDs in the global index (files + fragments) Each member periodically estimates the availability of its files and fragments Each member periodically generates random fragment of a random file and pushes it to a random target (with its availability) The target saves the fragment or rejects it

Autonomous Replication for High Availability in Unstructured P2P Systems 17 Autonomous Replication - Estimating the availability Estimating the availability Nodes insert/remove and to/from the global index Peers advertise their availability in the global index Peers could either hoard a file or store only one fragment of it Peers go offline but not permanently Peers are dropped from the global index after timeout

Autonomous Replication for High Availability in Unstructured P2P Systems 18 Autonomous Replication - Estimating the availability Note: Don’t work with duplicate fragments (small chance)

Autonomous Replication for High Availability in Unstructured P2P Systems 19 Autonomous Replication - Fragmentation and replication Fragmentation and replication Choose a random file with availability smaller then the wanted Produce some fragment of this file using erasure coding (n>>m) Advantages of this kind of erasure coding: 1.Small probability of duplicating fragments 2.Usefulness of fragments 3.Easy to reflect changes in the community

Autonomous Replication for High Availability in Unstructured P2P Systems 20 Autonomous Replication - Target’s activity Target’s activity Receiving a request  If the replication store is not full then save the fragment  If the replication store is full then compute the availability of the fragments in it if then reject the request else we have the case that then replace some fragment in the store with the new one But which one?

Autonomous Replication for High Availability in Unstructured P2P Systems 21 Autonomous Replication - Target’s activity Weighted random selection We make lottery with tickets divided in two bowls like 80:20. Every fragment receives the same portion from the “small” one. The tickets in the “big” bowl are shared between the fragments having availability bigger than the average depending on their availability. The “winner” must be ejected

Autonomous Replication for High Availability in Unstructured P2P Systems 22 Autonomous Replication - Target’s activity Example: We have 3 fragments a,b,c and 100 tickets: Average availability (in nines) + 10%=0,76 So fragment a has the biggest chance to be rejected. AvailabilityAvailability in nines Share big pool Share small pool Eviction probability a0,992676,60,73 b0,91136,60,19 c0,50,306,60,06

Autonomous Replication for High Availability in Unstructured P2P Systems 23 Autonomous Replication - Optimizations Optimizations on replicators Including some weighted random selection of files for replication Choosing targets with free space in the replication store

Autonomous Replication for High Availability in Unstructured P2P Systems 24 Autonomous Replication - Misbehaving peers Misbehaving peers Corrupting fragments  No consequences for the file reproduction  Wasted space Pushing high available fragments  No consequences for the usage of replication store  Wasted bandwidth

Autonomous Replication for High Availability in Unstructured P2P Systems 25 Autonomous Replication - Misbehaving peers Wrong information Worst case: a peer advertises very high availability and that it is hoarding a significant part from the shared data  The community will not replicate this files  Much free storage for greedy peers  The peer will receive lot of requests for the files it’s “hoarding”

Autonomous Replication for High Availability in Unstructured P2P Systems 26 Overview  Introduction Introduction  PlanetP PlanetP  Autonomous Replication Autonomous Replication  Simulations Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 27 Simulations  Assumptions for the simulator Synchronous replication’s attempt Simplified message transfer timing simulation Estimation of the availability every 10 min. Erasure coding with m=10  Simulated communities File SharingCorporateWorkgroup No. Members No. Files Node availability 24,9%80,7%33,3%

Autonomous Replication for High Availability in Unstructured P2P Systems 28 Simulations Results computed using equation 1 as analytical model (Figure 2)equation 1 excess storageavailability CO1,75X3 nines WG6X3 nines FS8X3 nines  High members average availability increases the availability Figure 2

Autonomous Replication for High Availability in Unstructured P2P Systems 29 Simulations Results from the simulation excess storageavailability CO1X3 nines WG9X(uniform)3 nines FS6X3 nines Figure 3  Server like peers increase the availability

Autonomous Replication for High Availability in Unstructured P2P Systems 30 Simulations Centralized knowledge Compare our algorithm (Rep) with one having centralized perfect knowledge (Omni). Less excess space used by Omni Better minimum availability by Omni Figure 3 Availability-Based Replacement Compare our algorithm (Rep) with one replacing the fragments in replication store by FIFO rule (Base). Higher availability with Rep

Autonomous Replication for High Availability in Unstructured P2P Systems 31 Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 32 Thank you for your attention !

Autonomous Replication for High Availability in Unstructured P2P Systems 33 Figure 2 Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 34 Figure 3 Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 35 Figure 3 Simulations

Autonomous Replication for High Availability in Unstructured P2P Systems 36 Autonomous Replication - Estimating the availability Back

Autonomous Replication for High Availability in Unstructured P2P Systems 37 Appendix: PlanetP – Membership Joining of new peers:  gossiping NEW Rejoining of peers  gossiping (the peer status is marked as ON-LINE) ON-LINE Leaving of present peers:  a peer discovers that another peer is OFF-LINE when an attempt to communicate with it fails:  the peer status is marked as OFF-LINE  information in the global index is not dropped  if a peer has been marked as OFF-LINE continuously for a time T Dead, it is assumed that the peer has left the community permanently:  all information about the peer is dropped OFF-LINE (before T Dead ):

Autonomous Replication for High Availability in Unstructured P2P Systems 38 Appendix: Rumoring algorithm A peer has a change A peer has a change: rumor The algorithm provides spreading of new information across a P2P community rumor if the rumor is new information for the peer P y, then it starts to push this rumor just like P x the peer P x stops pushing the rumor after it has contacted n consecutive peers that already heard the rumor PxPx rumor every T g seconds, a peer P x pushes this change (a rumor) to a peer P y chosen randomly from the global index PyPy

Autonomous Replication for High Availability in Unstructured P2P Systems 39 Appendix: Anti-entropy algorithm The algorithm allows to avoid the possibility of rumors dying out before reaching everyone All peers: global index summary the peer P y returns the summary of its version of the global index then P x can ask P y for any new information that it does not have PxPx every T r seconds, a peer P x attempts to pull information from a peer P y chosen randomly from the global index pull PyPy

Autonomous Replication for High Availability in Unstructured P2P Systems 40 Appendix: Partial anti-entropy algorithm The algorithm allows to reduce the time of new information spreading Extension of each push operation: The process requires only one extra message exchange in the case that P y knows something that P x does not identifiers of recent rumors the peer P y piggybacks the identifiers of a small number of the most recent rumors then P x can pull any recent rumor that did not reach it PxPx a peer P x pushes a rumor to a peer P y rumor PyPy