
1 p2p, Fall 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Search & Replication in Unstructured P2P

2 p2p, Fall 05 2 Overview  Centralized Constantly-updated directory hosted at central locations (does not scale well, costly updates, single point of failure)  Decentralized but structured The overlay topology is highly controlled and files (or metadata/index) are placed not at random nodes but at specified locations  Decentralized and Unstructured Peers connect in an ad-hoc fashion The location of documents/metadata is not controlled by the system No guarantee that a search succeeds No bounds on search time

3 p2p, Fall 05 3 Overview  Blind Search and Variations No information about the location of items  Informed Search Maintain (localized) index information Local and Routing Indexes Trade-off: cost of maintaining the indexes (when joining/leaving/updating) vs cost for search

4 p2p, Fall 05 4 Blind Search Flood-based: each node contacts its neighbors, which in turn contact their neighbors, until the item is located Exponential growth in the number of messages No guarantees
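
To make the flooding mechanics concrete, here is a minimal sketch (not from the slides) of TTL-limited flooding over an overlay given as an adjacency list; the names flood_search, graph, and has_item, and the way messages are counted, are illustrative assumptions rather than any particular protocol.

```python
from collections import deque

def flood_search(graph, source, has_item, ttl):
    """TTL-limited flood (BFS): every node forwards the query to all of its
    neighbors until the item is found or the TTL runs out. Duplicate queries
    are suppressed with a 'seen' set. Returns (node_with_item, messages)."""
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0
    while frontier:
        node, t = frontier.popleft()
        if has_item(node):
            return node, messages
        if t == 0:
            continue
        for nb in graph[node]:
            messages += 1        # each forwarded copy of the query is one message
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, t - 1))
    return None, messages

# usage sketch:
# graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
# flood_search(graph, 0, has_item=lambda n: n == 3, ttl=7)
```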

5 p2p, Fall 05 5 Blind Search: Issues  BFS vs DFS: BFS gives better response time but more messages  Iterative vs Recursive (return path)  TTL (time to live) parameter  Cycles (duplicate messages)  Connectivity  Power-Law Topologies: the i-th most connected node has ω/i^α neighbors

6 p2p, Fall 05 6 Gnutella: Summary Completely decentralized Hit rates are high High fault tolerance Adapts well and dynamically to changing peer populations Protocol causes high network traffic (e.g., 3.5Mbps). For example: –4 connections C / peer, TTL = 7 –a single ping packet can trigger a flood of packets No estimates on the duration of queries can be given No probability for successful queries can be given Topology is unknown → algorithms cannot exploit it Free riding is a problem Reputation of peers is not addressed Simple and robust

7 p2p, Fall 05 7 Summary and Comparison of Approaches

8 p2p, Fall 05 8 More on Search Search Options –Query Expressiveness (type of queries) –Comprehensiveness (all or just the first (or k) results) –Topology –Data Placement –Message Routing

9 p2p, Fall 05 9 Comparison [table not preserved in the transcript]

10 p2p, Fall 05 10 Comparison [table not preserved in the transcript]

11 p2p, Fall 05 11 Client-Server performs well –But not always feasible Ideal performance is often not the key issue! Things that flood-based systems do well –Scaling –Decentralization of visibility and liability –Finding popular stuff (e.g., caching) –Fancy local queries Things that flood-based systems do poorly –Finding unpopular stuff –Fancy distributed queries –Guarantees about anything (answer quality, privacy, etc.)

12 p2p, Fall 05 12 Blind Search Variations Expanding Ring or Iterative Deepening: Start a BFS with a small TTL and repeat the BFS at increasing depths if the first BFS fails Works well when there is some stop condition and a “small” flood will satisfy the query Otherwise it causes even bigger loads than standard flooding Appropriate when hot objects are replicated more widely than cold objects Modified-BFS: Forward the query to only a fraction of the neighbors (some random subset)

13 p2p, Fall 05 13 Blind Search Methods Random Walks: The node that poses the query sends out k query messages to an equal number of randomly chosen neighbors Each message follows its own path, at each step randomly choosing one neighbor to forward it to Each path – a walker Two methods to terminate each walker:  TTL-based or  checking method (the walkers periodically check with the query source whether the stop condition has been met) Reduces the number of messages to k x TTL in the worst case Provides some kind of local load-balancing
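
A rough sketch of the k-walker random walk, under the simplifying assumptions that walkers advance in lock-step and that the "checking method" is reduced to inspecting a shared flag every few steps; all names and defaults are illustrative.

```python
import random

def k_walker_search(graph, source, has_item, k=32, max_steps=1024, check_every=4):
    """k walkers start at the query source; at each step every walker forwards
    the query to one randomly chosen neighbor. Walkers stop when one of them
    finds the item (checked every check_every steps) or max_steps is reached."""
    walkers = [source] * k
    messages, found = 0, None
    for step in range(1, max_steps + 1):
        for i, node in enumerate(walkers):
            nxt = random.choice(graph[node])
            messages += 1
            walkers[i] = nxt
            if found is None and has_item(nxt):
                found = nxt
        if step % check_every == 0 and found is not None:
            break                # walkers "check with the source" and terminate
    return found, messages
```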

14 p2p, Fall 05 14 Blind Search Methods Random Walks In addition, the protocol can bias its walks towards high-degree nodes (choose the highest-degree neighbor)

15 p2p, Fall 05 15 Topics in Database Systems: Data Management in Peer-to-Peer Systems Q. Lv et al, “Search and Replication in Unstructured Peer-to-Peer Networks”, ICS’02

16 p2p, Fall 05 16 Search and Replication in Unstructured Peer-to-Peer Networks The type of replication depends on the search strategy used (i) A number of blind-search variations of flooding (ii) A number of (metadata) replication strategies Evaluation Method: Study how they work for a number of different topologies and query distributions

17 p2p, Fall 05 17 Methodology Performance of search depends on  Network topology: graph formed by the p2p overlay network  Query distribution: the distribution of query frequencies for individual files  Replication: number of nodes that have a particular file Assumption: fixed network topology and fixed query distribution Results still hold, if one assumes that the time to complete a search is short compared to the time of change in network topology and in query distribution

18 p2p, Fall 05 18 Network Topology (1) Power-Law Random Graph A 9239-node random graph Node degrees follow a power-law distribution: when ranked from the most connected to the least, the i-th ranked node has ω/i^α neighbors, where ω is a constant Once the node degrees are chosen, the nodes are connected randomly

19 p2p, Fall 05 19 Network Topology (2) Normal Random Graph A 9836-node random graph

20 p2p, Fall 05 20 Network Topology (3) Gnutella Graph (Gnutella) A 4736-node graph obtained in Oct 2000 Node degrees roughly follow a two-segment power law distribution

21 p2p, Fall 05 21 Network Topology (4) Two-Dimensional Grid (Grid) A two dimensional 100x100 grid

22 p2p, Fall 05 22 Query Distribution Assume m objects Let q_i be the relative popularity of the i-th object (in terms of queries issued for it) Values are normalized: Σ_{i=1..m} q_i = 1 (1) Uniform: All objects are equally popular, q_i = 1/m (2) Zipf-like: q_i ∝ 1/i^α
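
As a small illustration (the exponent value is an assumption), the normalized Zipf-like query rates can be generated as follows:

```python
def zipf_query_rates(m, alpha=1.0):
    """q_i proportional to 1/i^alpha, normalized so that the rates sum to 1."""
    raw = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(raw)
    return [x / total for x in raw]

q = zipf_query_rates(100, alpha=1.2)
assert abs(sum(q) - 1.0) < 1e-9     # Σ q_i = 1
```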

23 p2p, Fall 05 23 Query Distribution & Replication When the replication is uniform, the query distribution is irrelevant (since all objects are replicated by the same amount, search times are equivalent for both hot and cold items) When the query distribution is uniform, all three replication distributions are equivalent (uniform!) Thus, three relevant combinations query-distribution/replication (1)Uniform/Uniform (2)Zipf-like/Proportional (3)Zipf-like/Square-root

24 p2p, Fall 05 24 Metrics Pr(success): probability of finding the queried object before the search terminates #hops: delay in finding an object as measured in number of hops

25 p2p, Fall 05 25 Metrics #msgs per node: Overhead of an algorithm as measured in the average number of search messages each node in the p2p network has to process #nodes visited Percentage of message duplication Peak #msgs: the number of messages that the busiest node has to process (to identify hot spots) These are per-query measures An aggregate performance measure: each query weighted by its probability

26 p2p, Fall 05 26 Limitation of Flooding There are many duplicate messages (due to cycles), particularly in high-connectivity graphs Multiple copies of a query are sent to a node by multiple neighbors Avoiding cycles decreases the number of links Duplicated messages can be detected and not forwarded - BUT the number of duplicate messages can still be excessive and worsens as the TTL increases Choice of TTL: too low, the node may not find the object even if it exists; too high, it burdens the network unnecessarily

27 p2p, Fall 05 27 Limitation of Flooding: Comparison of the topologies Power-law and Gnutella-style graphs are particularly bad with flooding Highly connected nodes mean more duplicate messages, because many nodes’ neighbors overlap The random graph is best, because in a truly random graph the duplication ratio (the likelihood that the next node has already received the query) is the same as the fraction of nodes visited so far, as long as that fraction is small The random graph also gives a better load distribution among nodes

28 p2p, Fall 05 28 Random Walks Experiments show that 16 to 64 walkers give good results Checking once every 4th step is a good balance between the overhead of the checking messages and the benefits of checking Keeping state (when the same query reaches a node, the node randomly chooses a different neighbor to forward it to) Improves Random and Grid by reducing the message overhead by up to 30% and the number of hops by up to 30% Small improvements for Gnutella and PLRG

29 p2p, Fall 05 29 Random Walks When compared to flooding: The 32-walker random walk reduces message overhead by roughly two orders of magnitude for all queries across all network topologies, at the expense of a slight increase in the number of hops (increasing from 2-6 to 4-15) When compared to expanding ring: The 32-walker random walk outperforms expanding ring as well, particularly in PLRG and Gnutella graphs

30 p2p, Fall 05 30 Principles of Search  Adaptive termination is very important Expanding ring or the checking method  Message duplication should be minimized Preferably, each query should visit a node just once  Granularity of the coverage should be small Each additional step should not significantly increase the number of nodes visited

31 p2p, Fall 05 31 Replication

32 p2p, Fall 05 32 Types of Replication Two types of replication  Metadata/Index: replicate index entries  Data/Document replication: replicate the actual data (e.g., music files)

33 p2p, Fall 05 33 Types of Replication Caching vs Replication Cache: Store data retrieved from a previous request (client- initiated) Replication: More proactive, a copy of a data item may be stored at a node even if the node has not requested it

34 p2p, Fall 05 34 Reasons for Replication Reasons for replication  Performance load balancing locality: place copies close to the requestor geographic locality (more choices for the next step in search) reduce number of hops  Availability In case of failures Peer departures Besides storage, there is a cost associated with replication: Consistency Maintenance Makes reads faster at the expense of slower writes

35 p2p, Fall 05 35 No proactive replication (Gnutella) –Hosts store and serve only what they requested –A copy can be found only by probing a host with a copy Proactive replication of “keys” (= meta data + pointer) for search efficiency (FastTrack, DHTs) Proactive replication of “copies” – for search and download efficiency, anonymity. (Freenet)

36 p2p, Fall 05 36 Issues Which items (data/metadata) to replicate Based on popularity In traditional distributed systems, also rate of read/write cost benefit: the ratio: read-savings/write-increase Where to replicate (allocation schema) More Later

37 p2p, Fall 05 37 Issues How/When to update Both data items and metadata

38 p2p, Fall 05 38 “Database-Flavored” Replication Control Protocols Let's assume the existence of a data item x with copies x_1, x_2, …, x_n x: logical data item x_i's: physical data items A replication control protocol is responsible for mapping each read/write on a logical data item (R(x)/W(x)) to a set of reads/writes on a (possibly) proper subset of the physical data item copies of x

39 p2p, Fall 05 39 One Copy Serializability Correctness A DBMS for a replicated database should behave like a DBMS managing a one-copy (i.e., non-replicated) database insofar as users can tell One-copy serializable (1SR): the schedule of transactions on a replicated database is equivalent to a serial execution of those transactions on a one-copy database One-copy schedule: replace operations on data copies with operations on data items

40 p2p, Fall 05 40 ROWA Read One/Write All (ROWA) A replication control protocol that maps each read to only one copy of the item and each write to a set of writes on all physical data item copies. If even one of the copies is unavailable, an update transaction cannot terminate

41 p2p, Fall 05 41 Write-All-Available Write-all-available A replication control protocol that maps each read to only one copy of the item and each write to a set of writes on all available physical data item copies.

42 p2p, Fall 05 42 Quorum-Based Voting A read quorum V_r and a write quorum V_w are required to read or write a data item If a given data item has a total of V votes, the quorums have to obey the following rules: 1. V_r + V_w > V 2. V_w > V/2 Rule 1 ensures that a data item is not read and written by two transactions concurrently (R/W) Rule 2 ensures that two write operations from two transactions cannot occur concurrently on the same data item (W/W)
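
A tiny helper (ours, not from the slides) that checks whether a vote assignment satisfies the two quorum rules:

```python
def quorums_are_valid(V, Vr, Vw):
    """Rule 1: Vr + Vw > V prevents concurrent read/write of the same item.
    Rule 2: Vw > V/2 prevents two concurrent writes."""
    return (Vr + Vw > V) and (2 * Vw > V)

# e.g., with V = 5 votes: Vr = 3, Vw = 3 is a valid assignment; Vr = 2, Vw = 2 is not
assert quorums_are_valid(5, 3, 3) and not quorums_are_valid(5, 2, 2)
```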

43 p2p, Fall 05 43 Distributing Writes Immediate writes Deferred writes Access only one copy of the data item and delay the distribution of writes to other sites until the transaction has terminated and is ready to commit. The transaction maintains an intention list of deferred updates After the transaction terminates, it sends the appropriate portion of the intention list to each site that contains replicated copies Optimizations – aborts cost less – may delay commitment – delays the detection of conflicts Primary or master copy Updates at a single copy per item

44 p2p, Fall 05 44 Eager vs Lazy Replication Eager replication: keeps all replicas synchronized by updating all replicas in a single transaction Lazy replication: asynchronously propagate replica updates to other nodes after the replicating transaction commits In p2p, lazy replication (or soft state)

45 p2p, Fall 05 45 Update Propagation Who initiates the update:  Push by the server holding the item (copy) that changes  Pull by the client holding the copy When  Periodic  Immediate  Lazy: when an inconsistency is detected  Threshold-based: Freshness (e.g., number of updates or actual time) Value  Expiration-Time: Items expire (become invalid) after that time (most often used in p2p) Stateless or Stateful (the “item owners” know which nodes hold copies of the item)

46 p2p, Fall 05 46 Replication & Structured P2P

47 p2p, Fall 05 47 CHORD Invariant to guarantee correctness of lookups: Keep successor nodes up to date Method: Each node maintains a successor list of its “r” nearest successors on the Chord ring Why? Availability How to keep it consistent: Lazily, through periodic stabilization Metadata replication or redundancy

48 p2p, Fall 05 48 CHORD Method: Replicate data associated with a key at the k nodes succeeding the key Why? Availability Data replication

49 p2p, Fall 05 49 CAN Multiple realities With r realities each node is assigned r coordinate zones, one in every reality, and holds r independent neighbor sets Replicate the hash table in each reality Availability: Fails only if the corresponding nodes in all r realities fail Performance: Better search, choose to forward the query to the neighbor with coordinates closest to the destination Metadata replication

50 p2p, Fall 05 50 CAN Overloading coordinate zones Multiple nodes may share a zone The hash table may be replicated among zones Higher availability Performance: choices in the number of neighbors, can select nodes closer in latency Cost for Consistency Metadata replication

51 p2p, Fall 05 51 CAN Multiple Hash Functions Use k different hash functions to map a single key onto k points in the coordinate space Availability: fails only if all k replicas are unavailable Performance: choose to send the query to the node closest in the coordinate space, or send the query to all k nodes in parallel (k parallel searches) Cost for Consistency Query traffic (if parallel searches) Metadata replication

52 p2p, Fall 05 52 CAN Hot-spot Replication A node that finds it is being overloaded by requests for a particular data key can replicate this key at each of its neighboring nodes Then with a certain probability can choose to either satisfy the request or forward it Performance: load balancing Metadata replication

53 p2p, Fall 05 53 CAN Caching Each node maintains a cache of the data keys it recently accessed Before forwarding a request, it first checks whether the requested key is in its cache, and if so, it can satisfy the request without forwarding it any further The number of cache entries per key grows in direct proportion to its popularity Metadata replication

54 p2p, Fall 05 54 Replication Theory: Replica Allocation Policies

55 p2p, Fall 05 55 Question: how to use replication to improve search efficiency in unstructured networks with a proactive replication mechanism? Replication: Allocation Scheme How many copies of each object so that the search overhead for the object is minimized, assuming that the total amount of storage for objects in the network is fixed

56 p2p, Fall 05 56 Replication Theory Assume m objects and n nodes Each object i is replicated on r_i distinct nodes and the total number of objects stored is R, that is Σ_{i=1..m} r_i = R Also, p_i = r_i/R Assume that object i is requested with relative rate q_i, normalized by setting Σ_{i=1..m} q_i = 1 For convenience, assume 1 << r_i ≤ n and that q_1 ≥ q_2 ≥ … ≥ q_m

57 p2p, Fall 05 57 Replication Theory Assume that searches go on until a copy is found Searches consist of randomly probing sites until the desired object is found: search at each step draws a node uniformly at random and asks for a copy

58 p2p, Fall 05 58 Search Example [figure: one search locates a copy after 2 probes, another after 4 probes]

59 p2p, Fall 05 59 Replication Theory The probability Pr(k) that the object is found at the k-th probe is given by Pr(k) = Pr(not found in the previous k-1 probes) × Pr(found at the k-th probe) = (1 – r_i/n)^(k-1) · (r_i/n) k (the search size: the step at which the item is found) is a random variable with a geometric distribution with parameter θ = r_i/n => expectation n/r_i

60 p2p, Fall 05 60 Replication Theory A_i: Expectation (average search size) for object i is the inverse of the fraction of sites that have replicas of the object: A_i = n/r_i The average search size A over all objects (average number of nodes probed per query): A = Σ_i q_i A_i = n Σ_i q_i/r_i Minimize: A = n Σ_i q_i/r_i
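
The average search size can be evaluated directly from this formula; the helper below (our own naming) assumes the random-probe search model described above:

```python
def average_search_size(n, q, r):
    """A = n * Σ_i q_i / r_i: expected number of uniformly random probes per
    query, assuming 1 << r_i <= n and that a search stops at the first copy."""
    return n * sum(qi / ri for qi, ri in zip(q, r))

# example: n = 1000 nodes, two objects with query rates 0.9 and 0.1,
# 50 replicas each (uniform): A = 1000 * (0.9/50 + 0.1/50) = 20 probes
print(average_search_size(1000, [0.9, 0.1], [50, 50]))
```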

61 p2p, Fall 05 61 Replication Theory If we have no limit on r_i, replicate everything everywhere Then the average search size is A_i = n/r_i = 1 Search becomes trivial How to allocate the R replicas among the m objects: how many replicas per object? Assume a limit on R and that the average number of replicas per site ρ = R/n is fixed

62 p2p, Fall 05 62 Uniform Replication Create the same number of replicas for each object: r_i = R/m Average search size for uniform replication: A_i = n/r_i = m/ρ A_uniform = Σ_i q_i · m/ρ = m/ρ (= m·n/R) which is independent of the query distribution It makes sense to allocate more copies to objects that are frequently queried; this should reduce the search size for the more popular objects

63 p2p, Fall 05 63 Proportional Replication Create a number of replicas for each object proportional to its query rate: r_i = R·q_i

64 p2p, Fall 05 64 Uniform and Proportional Replication Summary: Uniform Allocation: p_i = 1/m Simple, resources are divided equally Proportional Allocation: p_i = q_i “Fair”, resources per item proportional to demand Reflects current P2P practices Example: 3 items, q_1 = 1/2, q_2 = 1/3, q_3 = 1/6 [figure: the resulting Uniform and Proportional allocations]

65 p2p, Fall 05 65 Proportional Replication Number of replicas for each object: r_i = R·q_i Average search size for proportional replication: A_i = n/r_i = n/(R·q_i) A_proportional = Σ_i q_i · n/(R·q_i) = n·m/R = m/ρ = A_uniform, again independent of the query distribution Why? Objects whose query rates are greater than average (> 1/m) do better with proportional, and the others do better with uniform The weighted average balances out to be the same So what is the optimal way to allocate replicas so that A is minimized?

66 p2p, Fall 05 66 Space of Possible Allocations As the query rate decreases, how should the ratio of allocated replicas p_{i+1}/p_i behave relative to q_{i+1}/q_i? Reasonable: p_{i+1}/p_i ≤ 1 (= 1 for uniform)

67 p2p, Fall 05 67 Space of Possible Allocations Definition: Allocation p_1, p_2, p_3, …, p_m is “in-between” Uniform and Proportional if for 1 < i < m, q_{i+1}/q_i < p_{i+1}/p_i < 1 (= 1 for uniform, = q_{i+1}/q_i for proportional; we want to favor popular objects, but not too much) Theorem 1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional Theorem 2: p is worse than Uniform/Proportional if for all i, p_{i+1}/p_i > 1 (more popular gets less) OR for all i, q_{i+1}/q_i > p_{i+1}/p_i (less popular gets less than its “fair share”) Proportional and Uniform are the worst “reasonable” strategies

68 p2p, Fall 05 68 [figure: the space of allocations on 2 items, plotting p_2/p_1 against q_2/q_1 — allocations where the more popular item gets less, or gets more than its proportional share, are worse than Proportional/Uniform; the region in between, which contains SR (square-root), is better than Proportional/Uniform]

69 p2p, Fall 05 69 So, what is the best strategy?

70 p2p, Fall 05 70 Square-Root Replication Find r_i that minimizes A, A = Σ_i q_i A_i = n Σ_i q_i/r_i This is minimized for r_i = λ √q_i, where λ = R / Σ_i √q_i Then the average search size is A_optimal = (1/ρ) (Σ_i √q_i)^2
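
For comparison, a small sketch (fractional replica counts, purely illustrative) that computes the three allocations; its output can be fed to the average_search_size helper shown earlier:

```python
import math

def allocate_replicas(q, R, scheme):
    """Replica counts r_i for a total budget R: uniform (R/m each),
    proportional (R*q_i), or square-root (lambda*sqrt(q_i))."""
    m = len(q)
    if scheme == "uniform":
        return [R / m] * m
    if scheme == "proportional":
        return [R * qi for qi in q]
    if scheme == "square-root":
        lam = R / sum(math.sqrt(qi) for qi in q)
        return [lam * math.sqrt(qi) for qi in q]
    raise ValueError("unknown scheme: " + scheme)

# with Zipf-like q, the square-root allocation gives the smallest average
# search size of the three, while uniform and proportional give the same value
```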

71 p2p, Fall 05 71 How much can we gain by using SR? [figure: the ratio A_uniform/A_SR for Zipf-like query rates]

72 p2p, Fall 05 72 Other Metrics: Discussion Utilization rate: the rate of requests that a replica of object i receives, U_i = R·q_i/r_i  For uniform replication, all objects have the same average search size, but replicas have utilization rates proportional to their query rates  Proportional replication achieves perfect load balancing, with all replicas having the same utilization rate, but average search sizes vary, with more popular objects having smaller average search sizes than less popular ones

73 p2p, Fall 05 73 Replication: Summary

74 p2p, Fall 05 74 Pareto Distribution (for the queries) Pareto principle: the 80-20 rule 80% of the wealth is owned by 20% of the population Zipf: what is the size of the r-th ranked item; Pareto: how many items have size > r

75 p2p, Fall 05 75 Replication (summary) Each object i is replicated on r_i nodes and the total number of objects stored is R, that is Σ_{i=1..m} r_i = R (1) Uniform: All objects are replicated at the same number of nodes, r_i = R/m (2) Proportional: The replication of an object is proportional to the query probability of the object, r_i ∝ q_i (3) Square-root: The replication of an object i is proportional to the square root of its query probability q_i, r_i ∝ √q_i

76 p2p, Fall 05 76 What is the search size of a query ?  Soluble queries: number of probes until answer is found.  Insoluble queries: maximum search size  Query is soluble if there are sufficiently many copies of the item.  Query is insoluble if item is rare or non existent. Assumption that there is at least one copy per object

77 p2p, Fall 05 77 SR is best for soluble queries Uniform minimizes cost of insoluble queries OPT is a hybrid of Uniform and SR Tuned to balance cost of soluble and insoluble queries. What is the optimal strategy?

78 p2p, Fall 05 78 [figure: average search size under Uniform vs SR allocation, 10^4 items, Zipf-like query rates (w = 1.5), for workloads that are all soluble, 85% soluble, and all insoluble]

79 p2p, Fall 05 79 We now know what we need. How do we get there?

80 p2p, Fall 05 80 Replication Algorithms Desired properties of the algorithm: Fully distributed, where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search. Converges to/obtains the SR allocation when query rates remain steady. Uniform and Proportional are “easy” – Uniform: when an item is created, replicate its key at a fixed number of hosts. – Proportional: for each query, replicate the key at a fixed number of hosts (need to know or estimate the query rate)

81 p2p, Fall 05 81 Replication - Implementation Two strategies are popular Owner Replication When a search is successful, the object is stored at the requestor node only (used in Gnutella) Path Replication When a search succeeds, the object is stored at all nodes along the path from the requestor node to the provider node (used in Freenet) Following the reverse path back to the requestor

82 p2p, Fall 05 82 Achieving Square-Root Replication How can we achieve square-root replication in practice?  Assume that each query keeps track of its search size  Each time a query finishes, the object is copied to a number of sites proportional to the number of probes On average, object i will be replicated at c·n/r_i nodes each time a query for it is issued (for some constant c) It can be shown that this yields square-root allocation
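
The following toy simulation (all parameters, names, and the random-probe search model are our assumptions, not the paper's experimental setup) illustrates why copying an object to a number of visited nodes proportional to the probes used drifts toward a square-root allocation:

```python
import random

def simulate_probe_replication(n=400, m=20, capacity=5, queries=50000,
                               c=0.25, alpha=1.2):
    """Each query probes nodes uniformly at random until a copy is found, then
    replicates the object at roughly c * (#probes) of the visited nodes; a full
    cache evicts a random entry (not LRU/LFU). The returned replica counts
    should be roughly proportional to sqrt(q_i) for the popular objects."""
    q = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(q)
    q = [x / total for x in q]
    owner = [random.randrange(n) for _ in range(m)]   # one permanent copy each
    cache = [set() for _ in range(n)]
    def has_copy(node, obj):
        return obj in cache[node] or owner[obj] == node
    def store(node, obj):
        if obj not in cache[node] and len(cache[node]) >= capacity:
            cache[node].discard(random.choice(list(cache[node])))
        cache[node].add(obj)
    for _ in range(queries):
        obj = random.choices(range(m), weights=q)[0]
        visited, node = [], random.randrange(n)
        while not has_copy(node, obj):
            visited.append(node)
            node = random.randrange(n)
        for v in visited[:max(1, int(c * len(visited)))]:
            store(v, obj)
    return [1 + sum(obj in cs for cs in cache) for obj in range(m)]
```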

83 p2p, Fall 05 83 Replication - Conclusion Thus, for Square-root replication an object should be replicated at a number of nodes that is proportional to the number of probes that the search required

84 p2p, Fall 05 84 Replication - Implementation If a p2p system uses k walkers, the number of nodes between the requestor and the provider node is 1/k of the total number of nodes visited (number of probes) Then, path replication should result in square-root replication Problem: it tends to place replicas at nodes that are topologically along the same path

85 p2p, Fall 05 85 Replication - Implementation Random Replication When a search succeeds, we count the number of nodes on the path between the requestor and the provider Say p Then, randomly pick p of the nodes that the k walkers visited to replicate the object Harder to implement

86 p2p, Fall 05 86 Achieving Square-Root Replication What about replica deletion? Steady state: the creation rate equals the deletion rate The lifetime of replicas must be independent of object identity or query rate FIFO or random deletion is OK; LRU or LFU is not

87 p2p, Fall 05 87 Replication: Evaluation Study the three replication strategies in the Random graph network topology Simulation Details Place the m distinct objects randomly into the network A query generator generates queries according to a Poisson process at 5 queries/sec Zipf distribution of queries among the m objects (with a = 1.2) For each query, the initiator is chosen randomly Then a 32-walker random walk with state keeping and checking every 4 steps Each site stores at most objAllow (40) objects Random Deletion Warm-up period of 10,000 secs Snapshots every 2,000 query chunks

88 p2p, Fall 05 88 Replication: Evaluation For each replication strategy  What kind of replication ratio distribution does the strategy generate?  What is the average number of messages per node in a system using the strategy  What is the distribution of number of hops in a system using the strategy

89 p2p, Fall 05 89 Evaluation: Replication Ratio Both path and random replication generate replication ratios quite close to the square root of the query rates

90 p2p, Fall 05 90 Evaluation: Messages Path replication and random replication reduce the overall message traffic by a factor of 3 to 4

91 p2p, Fall 05 91 Evaluation: Hops Much of the traffic reduction comes from reducing the number of hops Path and random replication are better than owner replication For example, for queries that finish within 4 hops: 71% with owner, 86% with path, 91% with random replication

92 p2p, Fall 05 92 Summary Random Search/replication Model: probes to “random” hosts Proportional allocation – current practice Uniform allocation – best for insoluble queries Soluble queries: Proportional and Uniform allocations are two extremes with same average performance Square-Root allocation minimizes Average Search Size OPT (all queries) lies between SR and Uniform SR/OPT allocation can be realized by simple algorithms.

93 p2p, Fall 05 93 Replication & Unstructured P2P epidemic algorithms

94 p2p, Fall 05 94 Replication Policy  How many copies  Where (owner, path, random path) Update Policy  Synchronous vs Asynchronous  Master Copy

95 p2p, Fall 05 95 Methods for spreading updates: Push: updates originate from the site where the update appeared and must reach the sites that hold copies Pull: the sites holding copies contact the master site Expiration times Epidemics for spreading updates

96 p2p, Fall 05 96 Update at a single site Randomized algorithms for distributing updates and driving replicas towards consistency Ensure that the effect of every update is eventually reflected to all replicas: Sites become fully consistent only when all updating activity has stopped and the system has become quiescent Analogous to epidemics A. Demers et al, Epidemic Algorithms for Replicated Database Maintenance, SOSP 87

97 p2p, Fall 05 97 Methods for spreading updates: Direct mail : each new update is immediately mailed from its originating site to all other sites  Timely & reasonably efficient  Not all sites know all other sites  Mails may be lost Anti-entropy : every site regularly chooses another site at random and by exchanging content resolves any differences between them  Extremely reliable but requires exchanging content and resolving updates  Propagates updates much more slowly than direct mail

98 p2p, Fall 05 98 Methods for spreading updates: Rumor mongering :  Sites are initially “ignorant”; when a site receives a new update it becomes a “hot rumor”  While a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has seen the update  When a site has tried to share a hot rumor with too many sites that have already seen it, the site stops treating the rumor as hot and retains the update without propagating it further Rumor cycles can be more frequent than anti-entropy cycles, because they require fewer resources at each site, but there is a chance that an update will not reach all sites

99 p2p, Fall 05 99 Anti-entropy and rumor spreading are examples of epidemic algorithms Three types of sites:  Infective: A site that holds an update that it is willing to share  Susceptible: A site that has not yet received an update  Removed: A site that has received an update but is no longer willing to share it Anti-entropy: a simple epidemic where all sites are always either infective or susceptible

100 p2p, Fall 05 100 A set S of n sites, each storing a copy of a database The database copy at site s ∈ S is a time-varying partial function s.ValueOf: K → V × T, where K is the set of keys, V the set of values (V contains the element NIL), and T the set of timestamps (totally ordered by <) s.ValueOf[k] = (NIL, t): the item with key k has been deleted from the database Assume just one item, so s.ValueOf ∈ V × T, i.e., an ordered pair consisting of a value and a timestamp The first component may be NIL, indicating that the item was deleted by the time indicated by the second component

101 p2p, Fall 05 101 The goal of the update distribution process is to drive the system towards ∀ s, s’ ∈ S: s.ValueOf = s’.ValueOf The operation invoked to update the database: Update[u:V] sets s.ValueOf ← (u, Now[])

102 p2p, Fall 05 102 Direct Mail At the site s where an update occurs (s is the originator of the update): For each s’ ∈ S: PostMail[to: s’, msg: (“Update”, s.ValueOf)] Each site s’ receiving the update message (“Update”, (u, t)): If s’.ValueOf.t < t then s’.ValueOf ← (u, t)  The complete set S must be known to s (stateful server)  PostMail messages are queued so that the server is not delayed (asynchronous), but they may be lost when queues overflow or their destinations are inaccessible for a long time  n (number of sites) messages per update  traffic proportional to n and the average distance between sites

103 p2p, Fall 05 103 Anti-Entropy At each site s periodically execute: For some s’ ∈ S: ResolveDifference[s, s’] Three ways to execute ResolveDifference (s contacts s’): Push (sender (server) - driven): If s.ValueOf.t > s’.ValueOf.t then s’.ValueOf ← s.ValueOf (s pushes its value to s’) Pull (receiver (client) – driven): If s.ValueOf.t < s’.ValueOf.t then s.ValueOf ← s’.ValueOf (s pulls s’ and gets s’’s value) Push-Pull: s.ValueOf.t > s’.ValueOf.t ⇒ s’.ValueOf ← s.ValueOf; s.ValueOf.t < s’.ValueOf.t ⇒ s.ValueOf ← s’.ValueOf
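
A minimal sketch of ResolveDifference for the single-item case, assuming each site holds a (value, timestamp) pair as defined above; the Site class and mode strings are our own simplifications:

```python
from dataclasses import dataclass

@dataclass
class Site:
    value: object      # may be NIL (None here) if the item was deleted
    timestamp: float   # timestamps are totally ordered

def resolve_difference(s, s2, mode="push-pull"):
    """Anti-entropy exchange between sites s and s2: the copy with the newer
    timestamp overwrites the older one, in the direction(s) the mode allows."""
    if mode in ("push", "push-pull") and s.timestamp > s2.timestamp:
        s2.value, s2.timestamp = s.value, s.timestamp
    elif mode in ("pull", "push-pull") and s.timestamp < s2.timestamp:
        s.value, s.timestamp = s2.value, s2.timestamp

# periodic loop at each site s (sketch):
# partner = random.choice([x for x in all_sites if x is not s])
# resolve_difference(s, partner)
```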

104 p2p, Fall 05 104 Anti-Entropy Assume that  Site s’ is chosen uniformly at random from the set S  Each site executes the anti-entropy algorithm once per period It can be proved that  An update will eventually infect the entire population  Starting from a single affected site, this can be achieved in time proportional to the log of the population size

105 p2p, Fall 05 105 Anti-Entropy Let p_i be the probability of a site remaining susceptible after the i-th cycle of anti-entropy For pull: a site remains susceptible after cycle i+1 if (a) it was susceptible after cycle i and (b) it contacted a susceptible site in cycle i+1, so p_{i+1} = (p_i)^2 For push: a site remains susceptible after cycle i+1 if (a) it was susceptible after cycle i and (b) no infectious site chose to contact it in cycle i+1, so p_{i+1} = p_i (1 – 1/n)^{n(1-p_i)}, where 1 – 1/n is the probability that the site is not contacted by a given node and n(1-p_i) is the number of infectious nodes at cycle i Pull is preferable to push
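
These recurrences are easy to iterate numerically; the helper below (ours) starts from a single infected site and shows how much faster pull drives the susceptible fraction to zero in the endgame:

```python
def susceptible_after(n, cycles, mode):
    """Iterate p_{i+1} = p_i^2 (pull) or p_{i+1} = p_i*(1 - 1/n)^(n*(1 - p_i))
    (push), starting from a single infected site, i.e. p_0 = 1 - 1/n."""
    p = 1.0 - 1.0 / n
    for _ in range(cycles):
        p = p * p if mode == "pull" else p * (1.0 - 1.0 / n) ** (n * (1.0 - p))
    return p

# e.g., for n = 10000 and 20 cycles, susceptible_after(10000, 20, "pull") is
# many orders of magnitude smaller than susceptible_after(10000, 20, "push")
```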

106 p2p, Fall 05 106 Anti-Entropy Naively, anti-entropy compares whole database instances sent over the network  Use checksums instead But what about recent updates known only at a few sites?  Also exchange a list of recent updates (now - timestamp < threshold τ): compare the recent updates first, apply them to the checksums and then compare the checksums; requires a choice of τ  Or maintain an inverted list of updates ordered by timestamp: perform anti-entropy by exchanging updates in reverse timestamp order until the checksums agree; sends only the updates, but must decide when to stop

107 p2p, Fall 05 107 Complex Epidemics: Rumor Spreading  Initial State: n individuals initially inactive (susceptible) Rumor planting&spreading:  We plant a rumor with one person who becomes active (infective), phoning other people at random and sharing the rumor  Every person bearing the rumor also becomes active and likewise shares the rumor  When an active individual makes an unnecessary phone call (the recipient already knows the rumor), then with probability 1/k the active individual loses interest in sharing the rumor (becomes removed) We would like to know:  How fast the system converges to an inactive state (no one is infective)  The percentage of people that know the rumor when the inactive state is reached

108 p2p, Fall 05 108 Complex Epidemics: Rumor Spreading Let s, i, r be the fractions of individuals that are susceptible, infective and removed, s + i + r = 1 ds/dt = -s·i di/dt = s·i – (1/k)(1-s)·i (the second term accounts for unnecessary phone calls) Solving gives s = e^{-(k+1)(1-s)}, an exponential decrease in s For k = 1, 20% miss the rumor For k = 2, only 6% miss it
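
The residue equation can be solved numerically; this small fixed-point iteration (our own helper) reproduces the 20% and 6% figures quoted above:

```python
import math

def rumor_residue(k, iterations=200):
    """Solve s = exp(-(k+1)*(1-s)) by fixed-point iteration: the fraction of
    sites that never hear the rumor once mongering stops."""
    s = 0.5
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1.0 - s))
    return s

print(round(rumor_residue(1), 3))   # ~0.203 -> about 20% miss the rumor
print(round(rumor_residue(2), 3))   # ~0.060 -> about 6% miss it
```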

109 p2p, Fall 05 109 Residue The value of s when i is zero: the remaining susceptible when the epidemic finishes Traffic m = Total update traffic / Number of sites Delay  Average delay (t avg ): difference between the time of the initial injection of an update and the arrival of the update at a given site averaged over all sites  The delay until (t last ) the reception by the last site that will receive the update during an epidemic Criteria to characterize epidemics

110 p2p, Fall 05 110 Blind vs. Feedback Feedback variation: a sender loses interest only if the recipient already knows the rumor Blind variation: a sender loses interest with probability 1/k regardless of the recipient Counter vs. Coin Instead of losing interest with probability 1/k, use a counter so that we lose interest only after k unnecessary contacts Simple variations of rumor spreading: there are n·m update messages sent in total; the probability that a single site misses all of them is (1 – 1/n)^{nm} ≈ e^{-m}, so s = e^{-m} (m is the traffic) Counters and feedback improve the delay, with counters playing the more significant role

111 p2p, Fall 05 111 Push vs. Pull Pull converges faster If there are numerous independent updates, a pull request is likely to find a source with a non-empty rumor list If the database is quiescent, the push phase ceases to introduce traffic overhead, while the pull continues to inject useless requests for updates Simple variations of rumor spreading Counter, feedback and pull work better

112 p2p, Fall 05 112 Minimization Use push and pull together; if both sites know the update, only the site with the smaller counter is incremented Connection Limit A site can be the recipient of more than one push in a cycle, while for pull a site can service an unlimited number of requests What if we set a limit:  Push gets better (reduced traffic; since the spread grows exponentially, most traffic occurs at the end)  Pull gets worse

113 p2p, Fall 05 113 Hunting If a connection is rejected, the choosing site can “hunt” for alternate sites Then push and pull behave similarly

114 p2p, Fall 05 114 Complex Epidemic and Anti-entropy Anti-entropy can be run infrequently to back up a complex epidemic, so that every update eventually reaches (or is superseded at) every site What happens when an update is discovered during anti-entropy: use rumor mongering (e.g., make it a hot rumor) or direct mail

115 p2p, Fall 05 115 Deletion and Death Certificates Replace deleted items with death certificates which carry timestamps and spread like ordinary data When old copies of deleted items meet death certificates, the old items are removed. But when to delete death certificates?

116 p2p, Fall 05 116 Dormant Death Certificates Define some threshold (but then some deleted items may be resurrected, i.e., re-appear) If the death certificate is older than the expected time required to propagate it to all sites, then the existence of an obsolete copy of the corresponding data item is unlikely Delete very old certificates at most sites, retaining “dormant” copies at only a few sites (like antibodies) Use two thresholds, t1 and t2, plus a list of r retention site names with each death certificate (chosen at random when the death certificate is created) Once t1 is reached, all servers except the servers in the retention list delete the death certificate Dormant death certificates are deleted when t1 + t2 is reached
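
A small sketch of how a site might classify a death certificate under this two-threshold scheme; only t1, t2 and the retention-site list come from the slide, everything else (names, time handling) is illustrative:

```python
import time

def certificate_state(created_at, retention_sites, site, t1, t2, now=None):
    """Before age t1 every site keeps the certificate; between t1 and t1 + t2
    only the r retention sites keep a dormant copy; after t1 + t2 it is
    deleted everywhere."""
    now = time.time() if now is None else now
    age = now - created_at
    if age < t1:
        return "active"
    if age < t1 + t2 and site in retention_sites:
        return "dormant"
    return "deleted"
```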

117 p2p, Fall 05 117 Anti-Entropy with Dormant Death Certificates Whenever a dormant death certificate encounters an obsolete data item, it must be “activated”

118 p2p, Fall 05 118 Spatial Distribution How to choose partners? Consider spatial distributions in which the choice tends to favor nearby servers

119 p2p, Fall 05 119 Spatial Distribution The cost of sending an update to a nearby site is much lower than the cost of sending the update to a distant site Favor nearby neighbors Trade-off between: average traffic per link and convergence time Example: linear network, only nearest neighbor: O(1) and O(n), vs uniform random connections: O(n) and O(log n) Determine the probability of connecting to a site at distance d For spreading updates on a line, a d^{-2} distribution: the probability of connecting to a site at distance d is proportional to d^{-2} In general, each site s independently chooses connections according to a distribution that is a function of Q_s(d), where Q_s(d) is the cumulative number of sites at distance d or less from s

120 p2p, Fall 05 120 Spatial Distribution and Anti-Entropy Extensive simulation on the actual topology with a number of different spatial distributions A different class of distributions is less sensitive to sudden increases of Q_s(d): Let each site s build a list of the other sites sorted by their distance from s Select anti-entropy exchange partners from the sorted list according to a function f(i), where i is the partner's position on the list (averaging the probabilities of selecting equidistant sites) Non-uniform distributions induce less load on critical links

121 p2p, Fall 05 121 Spatial Distribution and Rumors Anti-entropy converges with probability 1 for a spatial distribution such that for every pair (s’, s) of sites there is a nonzero probability that s will choose to exchange data with s’ However, rumor mongering is less robust against changes in spatial distributions and network topology As the spatial distribution is made less uniform, we can increase the value of k to compensate

122 p2p, Fall 05 122 Replication II: A Push&Pull Algorithm Updates in Highly Unreliable, Replicated Peer-to-Peer Systems [Datta, Hauswirth, Aberer, ICDCS03]

123 p2p, Fall 05 123 Replication in P2P systems (P-Grid, CAN, unstructured P2P) [figure: a (sub-)network of replicas] How to update them?

124 p2p, Fall 05 124 Problems in real-world P2P systems All replicas need to be informed of updates. Peers have low online probabilities and a quorum cannot be assumed. Eventual consistency is sufficient. Updates are relatively infrequent compared to queries. Metrics: Communication overhead, latency and percentage of replicas getting the update Updates in Highly Unreliable, Replicated Peer-to-Peer Systems [Datta, Hauswirth, Aberer, ICDCS03]

125 p2p, Fall 05 125 Problems in real-world P2P systems (continued) The replication factor is substantially higher than what is assumed for distributed databases. Connectivity among replicas is high. The connectivity graph is random. Updates in Highly Unreliable, Replicated Peer-to-Peer Systems [Datta, Hauswirth, Aberer, ICDCS03]

126 p2p, Fall 05 126 Updates in replicated P2P systems  The P2P system’s search algorithm will find a random online replica responsible for the key being searched.  The replicas need to be consistent (ideally)  Probabilistic guarantee: best effort! Assumption: each peer knows a subset of all the replicas for an item [figure: some replicas online, some offline]

127 p2p, Fall 05 127 Updates in Highly Unreliable, Replicated Peer-to-Peer Systems [Datta, Hauswirth, Aberer, ICDCS03] Update Propagation combines  A push phase, initiated by the originator of the update, which pushes the new update to a subset of responsible peers it knows, which in turn propagate it to responsible peers they know, etc. (similar to flooding with TTL)  A pull phase, initiated by a peer that needs to update its copy. For example, because (a) it was offline (disconnected) or (b) it has received a pull request but is not sure that it has the most up-to-date copy Push and pull are consecutive, but may overlap in time

128 p2p, Fall 05 128 Algorithms Push:  If replica p gets Push(U, V, Rf, t) for a new (U, V) pair (U: item, V: version, t: counter playing the role of a TTL)  Define Rp = a random subset (of size R*fr) of the replicas known to p  With probability PF(t): send Push(U, V, Rf ∪ Rp, t+1) to Rp \ Rf Rf: partial list of peers that have already received the update, R: number of replicas, fr: fraction of the total replicas to which peers initially decide to forward the update (fan-out) Each message carries the list of peers the update has already been sent to Parameters:  TTL counter t  PF(t): probability (locally determined at each peer) of forwarding the update  |Rp|: size of the random subset - fan-out
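
A hedged sketch of the push phase: the recursion, the PF(t) decay, and the callable interfaces below are our simplifications of the scheme just described, not the paper's exact algorithm or API.

```python
import random

def push_update(originator, update, known_replicas, apply_update,
                fr=0.1, pf=lambda t: 0.5 ** t, ttl=5):
    """Each peer forwards the update to a random subset (fan-out fr) of the
    replicas it knows, skipping peers already listed as informed (Rf), with
    probability PF(t) depending on the round t. known_replicas(peer) returns
    the replica peers known to 'peer'; apply_update(peer, update) stores the
    update if its version is newer."""
    def forward(peer, informed, t):
        apply_update(peer, update)
        if t >= ttl:
            return
        known = list(known_replicas(peer))
        size = min(len(known), max(1, int(fr * len(known))))
        subset = set(random.sample(known, size))
        targets = subset - informed          # avoid sequential redundant pushes
        informed = informed | subset
        for p in targets:
            if random.random() < pf(t):      # forward with probability PF(t)
                forward(p, informed, t + 1)
    forward(originator, {originator}, 0)
```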

129 p2p, Fall 05 129 Selective Push [two figures illustrating extra update messages] Avoid parallel redundant updates: messages are propagated only with probability PF < 1 and to a fraction of the neighbors Avoid sequential redundant updates: partial lists of informed neighbors are transmitted with the message

130 p2p, Fall 05 130 Algorithms Strategy: Push the update to online peers asap, such that later all online peers always have the update (possibly pulled) w.h.p. Pull:  If p comes online, or has received no Push for time T  Contact online replicas  Pull updates based on version vectors

131 p2p, Fall 05 131 Scenario 1: Dynamic topology [figure: an example replica network of 9 peers]

132 p2p, Fall 05 132 Scenario 2: Duplicate messages [figure: the same 9-peer network, distinguishing necessary messages, avoidable duplicates, and unavoidable (?) duplicates]

133 p2p, Fall 05 133 Results: Impact of varying fanout How many peers learn about the update A limited fanout (fr) is sufficient to spread the update, since flooding is exponential. A large fanout will cause unnecessary duplicate messages

134 p2p, Fall 05 134 Results: Impact of probability of peer staying online in consecutive push rounds Sigma (σ) probability of online peers staying online in consecutive push rounds:

135 p2p, Fall 05 135 Results: Impact of varying probability of pushing Reduce the probability of forwarding updates with the increase in the number of push rounds

136 p2p, Fall 05 136 CUP: Controlled Update Propagation in Peer-to-Peer Networks [RoussopoulosBaker02] PCX: Path Caching with Expiration Cache index entries at intermediary nodes that lie on the path taken by a search query Cached entries typically have expiration times Not addressed: which items need to be updated as well as whether the interest in updating particular entries has died out CUP: Controlled Update Propagation Asynchronously builds caches of index entries while answering search queries + Propagates updates of index entries to maintain these caches (pushes updates)

137 p2p, Fall 05 137 CUP: Controlled Update Propagation in Peer-to-Peer Networks [RoussopoulosBaker02] Every node maintains two logical channels per neighbor:  a query channel: used to forward search queries  an update channel: used to forward query responses asynchronously to a neighbor and to update index entries that are cached at the neighbor (to proactively push updates) Queries travel to the node holding the item Updates travel along the reverse path taken by a query Query coalescing: if a node receives two or more queries for an item, it pushes only one instance Just one update channel (it does not keep a separate open connection per request) All responses go through the update channel: the node uses interest bits so it knows to which neighbors to push the response

138 p2p, Fall 05 138 CUP: Controlled Update Propagation in Peer-to-Peer Networks [RoussopoulosBaker02] Each node decides individually:  When to receive updates through registering its interest + an incentive-based policy to determine when to cut-off incoming updates  When to propagate updates

139 p2p, Fall 05 139 CUP: Controlled Update Propagation in Peer-to-Peer Networks [RoussopoulosBaker02] For each key K, node n stores a flag that indicates whether the node is waiting to receive an update for K in response to a query an interest vector: each bit corresponds to a neighbor and is set or clear depending on whether the neighbor is or is not interested in receiving updates for K a popularity measure or request frequency of each non-local key K for which it receives queries The measure is used to re-evaluate whether it is beneficial to continue caching and receiving updates for K

140 p2p, Fall 05 140 CUP: Controlled Update Propagation in Peer-to-Peer Networks [RoussopoulosBaker02] For each key, the authority node that owns the key is the root of the CUP tree Updates originate at the root of the tree and travel downstream to interested nodes Types of updates: deletes, refresh, append Example: A is the root for K3 Applicable to both structured and unstructured In structured, the query path is well-defined with a bounded number of hops

141 p2p, Fall 05 141 CUP: Controlled Update Propagation in Peer-to-Peer Networks [RoussopoulosBaker02] Handling Queries for K: 1. Fresh entries for key K are cached: use them to push the response to the querying neighbor 2. Key K is not in the cache: add it and mark it as pending (to coalesce potential query bursts) 3. All cached entries for K have expired: push the query onward Handling Updates for K: An update of K is forwarded only to neighbors that have registered interest in K Also, an adaptive control mechanism regulates the rate of pushed updates

142 p2p, Fall 05 142 CUP: Controlled Update Propagation in Peer-to-Peer Networks [RoussopoulosBaker02] Adaptive control mechanism to regulate the rate of pushed updates Each node N has a capacity U for pushing updates that varies with its workload, network bandwidth and/or network connectivity N divides U among its outgoing update channels such that each channel gets a share that is proportional to the length of its queue Entries in the queue may be re-ordered

