The Small World Phenomenon: An Algorithmic Perspective by Anton Karatoun.

1 The Small World Phenomenon: An Algorithmic Perspective by Anton Karatoun

2 Small World Phenomenon Any two individuals in the network are likely to be connected through a short sequence of intermediate acquaintances. A significant area of study in the social sciences. A fundamental ingredient in the structural evolution of the World Wide Web.

3 Small World Phenomenon Question: Why should there exist short chains of acquaintances linking together arbitrary pairs of strangers? Answer: Random networks have low diameter.

4 Small World Phenomenon Edges of the network are divided into “local” and “long-range” contacts. Re-wired ring lattice: a set V of n points spaced uniformly on a circle; each point is connected to each of its k nearest neighbors, for a small constant k; a small number of additional edges have endpoints chosen uniformly at random from V.
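
The re-wired ring lattice described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: k is taken to be even (k/2 neighbors on each side), edges are undirected, and the function name and the `shortcuts` and `seed` parameters are hypothetical, not taken from the slides.

```python
import random

def ring_lattice_with_shortcuts(n, k, shortcuts, seed=0):
    """Build an undirected graph on n points of a circle.

    Each point is joined to its k nearest neighbors (k/2 per side, k assumed
    even), then `shortcuts` extra edges with uniformly random endpoints are added.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}

    # Local contacts: the k nearest neighbors on the ring.
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)

    # Long-range contacts: endpoints chosen uniformly at random from V.
    added = 0
    while added < shortcuts:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj
```

Calling `ring_lattice_with_shortcuts(1000, 4, 20)` gives a graph that is locally a ring but has a handful of random shortcuts across it.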

5 Small World Phenomenon Two fundamentally surprising discoveries: Such short chains should exist in the network of acquaintanceships. (existential) People should be able to find these chains while knowing so little about the target individual. (algorithmic)

6 Small World Phenomenon Question: Why should arbitrary pairs of strangers be able to find short chains of acquaintances that link them together? What properties must a social network possess in order for it to exhibit the cues that enable its members to find short chains through it?

7 The Model: Networks and Decentralized Algorithms Two-dimensional grid with directed edges. Nodes: the lattice points of an n × n grid, {(i,j) : i, j ∈ {1, …, n}}. Lattice distance: d((i,j),(k,l)) = |k-i| + |l-j|. Local contacts: node u has a directed edge to every other node within lattice distance p (p >= 1).
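
A minimal sketch of the lattice distance and the local-contact rule from this slide. It assumes an n × n grid with coordinates indexed from 0 (a convenience for Python, not something the slide fixes); the function names are hypothetical.

```python
def lattice_distance(u, v):
    """d((i,j),(k,l)) = |k-i| + |l-j|."""
    (i, j), (k, l) = u, v
    return abs(k - i) + abs(l - j)

def local_contacts(u, n, p=1):
    """All other grid nodes within lattice distance p of u (grid is n x n, 0-indexed)."""
    i, j = u
    contacts = []
    for di in range(-p, p + 1):
        remaining = p - abs(di)          # budget left for the second coordinate
        for dj in range(-remaining, remaining + 1):
            v = (i + di, j + dj)
            if v != u and 0 <= v[0] < n and 0 <= v[1] < n:
                contacts.append(v)
    return contacts
```

With p = 1 an interior node has four local contacts and a corner node has two.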

8 The Model II Long-range contacts: for universal constants q >= 0 and r >= 0, node u has directed edges to q other nodes. The i-th directed edge from u has endpoint v with probability proportional to d(u,v)^{-r}.
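
One way to draw the q long-range contacts of a node is to weight every other node by its lattice distance raised to the power -r and sample independently. The sketch below assumes a 0-indexed n × n grid and uses hypothetical names; it enumerates all nodes, so it is only meant for small grids.

```python
import random

def long_range_contacts(u, n, q, r, rng):
    """Draw q long-range contacts for u; each v != u is chosen with
    probability proportional to d(u, v) ** (-r), independently per draw."""
    if q == 0:
        return []
    others = [(i, j) for i in range(n) for j in range(n) if (i, j) != u]
    weights = [(abs(i - u[0]) + abs(j - u[1])) ** (-r) for (i, j) in others]
    return rng.choices(others, weights=weights, k=q)

# Usage: r = 0 gives the uniform distribution, r = 2 the inverse-square one.
contacts = long_range_contacts((3, 3), n=8, q=2, r=2, rng=random.Random(0))
```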

9 The Model III

10 The Model IV The model has a simple “geographic” interpretation: individuals live on a grid and know their neighbors for some number of steps in all directions; they also have some number of acquaintances distributed more broadly across the grid. Viewing p and q as fixed constants, we obtain a one-parameter family of network models by tuning the value of the exponent r.

11 Decentralized algorithm The goal: given two arbitrary nodes s and t in the network, transmit a message from s to t in as few steps as possible. The message holder u in a given step knows: the set of local contacts among all nodes; the location, on the lattice, of the target t; the locations and long-range contacts of all nodes that have come in contact with the message.

12 Theorem 1 There is a constant a_0, depending on p and q but independent of n, so that when r = 0 (the uniform distribution), the expected delivery time of any decentralized algorithm is at least a_0 n^{2/3}. When long-range contacts are formed independently of the geometry of the grid, short chains exist but the nodes will not be able to find them.

13 Theorem 1 (cont.) We consider the set U of all nodes within lattice distance n^{2/3} of t. With high probability, the source s will lie outside of U, and if the message is never passed from a node to a long-range contact in U, the number of steps needed to reach t will be at least proportional to n^{2/3}. But the probability that any message holder has a long-range contact in U is roughly n^{-2/3}, so the expected number of steps before a long-range contact in U is found is at least proportional to n^{2/3} as well.

14 Theorem 2 There is a decentralized algorithm A and a constant a_2, independent of n, so that when r = 2 (the inverse-square distribution) and p = q = 1, the expected delivery time of A is at most a_2 (log n)^2. A: in each step, the current message holder u chooses a contact that is as close to the target t as possible.
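
The greedy rule on this slide can be sketched as follows, assuming p = q = 1 on a 0-indexed n × n grid. The names are hypothetical. As a simplification, each node's long-range contact is sampled when the message first reaches it; because the greedy rule strictly decreases the distance to t, no node is visited twice, so this matches fixing the contacts in advance.

```python
import random

def dist(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def contacts(u, n, r, rng):
    """Local contacts (p = 1) plus one long-range contact (q = 1) of u."""
    i, j = u
    local = [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
             if 0 <= a < n and 0 <= b < n]
    others = [(a, b) for a in range(n) for b in range(n) if (a, b) != u]
    weights = [dist(u, v) ** (-r) for v in others]
    return local + rng.choices(others, weights=weights, k=1)

def greedy_route(s, t, n, r=2, seed=0):
    """Forward the message to the contact closest to t until t is reached."""
    rng = random.Random(seed)
    path, u = [s], s
    while u != t:
        u = min(contacts(u, n, r, rng), key=lambda v: dist(v, t))
        path.append(u)
    return path
```

The length of the returned path minus one is the delivery time for that run; averaging it over many seeds and pairs (s, t) gives an empirical estimate of the expected delivery time for a given r.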

15 Theorem 2 (cont.) At exponent r = 2 a node’s long-range contacts are nearly uniformly distributed over all “distance scales”. Given any node u, we can partition the remaining nodes of the lattice into sets A_0, A_1, …, A_{log n}, where A_j consists of all nodes whose lattice distance to u is between 2^j and 2^{j+1}.

16 Theorem 2 (cont.) At exponent r = 2, each long-range contact of u is nearly equally likely to belong to any of the sets A_j; when r > 2, there is a bias toward sets A_j at nearer distances.
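
The claim about distance scales can be checked numerically. The sketch below computes, for a node u, the probability that one long-range contact falls in each scale, taking scale j to mean lattice distances d with 2^j <= d < 2^{j+1} (one reading of "between"). The function name and the 0-indexed grid are assumptions made for illustration.

```python
def scale_masses(n, u, r):
    """Return a list m where m[j] is the probability that a long-range
    contact of u lies at lattice distance d with 2**j <= d < 2**(j+1)."""
    masses = {}
    for i in range(n):
        for j in range(n):
            d = abs(i - u[0]) + abs(j - u[1])
            if d == 0:
                continue
            scale = d.bit_length() - 1          # floor(log2(d))
            masses[scale] = masses.get(scale, 0.0) + d ** (-r)
    total = sum(masses.values())
    return [masses[s] / total for s in sorted(masses)]

# Usage: compare the per-scale probabilities for several exponents.
for r in (0, 2, 3):
    print(r, [round(m, 3) for m in scale_masses(129, (64, 64), r)])
```

For a node in the middle of the grid, the r = 2 values stay within a small constant factor of each other on the scales that fit entirely inside the grid, while r = 0 piles the mass on the far scales and r = 3 on the near ones.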

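17 Theorem 3 Let 0 <= r < 2. There is a constant a_r, depending on p, q, r, but independent of n, so that the expected delivery time of any decentralized algorithm is at least a_r n^{(2-r)/3}. Let r > 2. There is a constant a_r, depending on p, q, r, but independent of n, so that the expected delivery time of any decentralized algorithm is at least a_r n^{(r-2)/(r-1)}.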
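
To see how the lower bounds of Theorems 1 and 3 vary with r, the exponent of n can be tabulated directly. The helper below is a hypothetical convenience, not part of the original material; it returns 0 at r = 2, where Theorem 2 gives a polylogarithmic upper bound instead of a polynomial lower bound.

```python
def lower_bound_exponent(r):
    """Exponent e such that expected delivery time is at least a_r * n**e."""
    if r < 0:
        raise ValueError("r must be non-negative")
    if r < 2:
        return (2 - r) / 3          # Theorem 3, case 0 <= r < 2
    if r > 2:
        return (r - 2) / (r - 1)    # Theorem 3, case r > 2
    return 0.0                      # r == 2: no polynomial lower bound (Theorem 2)

# Usage: the exponent falls to 0 as r approaches 2 from either side.
for r in (0, 1, 1.5, 2, 2.5, 3, 4):
    print(r, lower_bound_exponent(r))
```

At r = 0 the exponent is 2/3, which agrees with Theorem 1.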

18 Small World Phenomenon

19 We can generalize our results to k-dimensional lattice networks, for constant values of k, as well as to less structured graphs with analogous scaling properties. In the k-dimensional case, a decentralized algorithm can construct paths of length polynomial in log n if and only if r = k. A network should contain latent structural cues that can be used to guide a message towards a target.

20 Conclusion While we have focused on a very clean model, we believe that a more general conclusion can be drawn for small-world networks: that the correlation between local structure and long-range connections provides fundamental cues for finding paths through the network.

21 Symphony: Distributed Hashing In A Small World

22 Symphony A protocol for maintaining distributed hash tables in a wide area network. The key idea is to arrange all participants along a ring and equip them with long-distance contacts drawn from a family of harmonic distributions. The construction is scalable, flexible, stable in the presence of frequent updates, and offers small average latency. The cost of updates when hosts join and leave is small.

23 Symphony Scalability: The protocol should work for a range of networks of arbitrary size. Stability: Hosts with arbitrary arrival and departure times, typically with small lifetimes. Performance: Low latency for hash lookup and low maintenance cost in the presence of frequent joins and leaves.

24 Symphony Flexibility: Smooth trade-offs between performance and state management complexity. Simplicity: Easy to understand, code, debug and deploy.

25 Symphony Core idea: place all hosts along a ring and equip each node with a few long-range links. With k = O(1) links per node, it is possible to route hash lookups with an average latency of O((1/k) (log n)^2) hops.

26 Symphony: The Protocol Let I denote the unit interval [0,1) that wraps around. Whenever a node arrives, it chooses as its id a real number from I uniformly at random. A node manages that sub-range of I which corresponds to the segment on the circle between its own id and that of its immediate clockwise predecessor. A node maintains two short links with its immediate neighbors.

27 Symphony: The Protocol If a hash function maps an object to an m-bit hash key K, then the manager for this hash entry is the node whose sub-range contains the real number K/2^m. Every node has k >= 1 long-range links. For each such link, a node first draws a random number x from a probability distribution function. Then it contacts the manager of x.
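
Finding the manager of a key amounts to locating K/2^m on the ring. The sketch below assumes that ids increase in the clockwise direction and that a node owns the half-open segment from its predecessor's id (exclusive) up to its own id (inclusive); both conventions, and the function names, are assumptions made for illustration.

```python
from bisect import bisect_left

def manager(ids, point):
    """ids: sorted node ids in [0, 1). Return the id of the node whose
    segment (predecessor id, own id] contains `point`, wrapping around."""
    idx = bisect_left(ids, point)
    return ids[idx % len(ids)]

def manager_for_key(ids, key, m):
    """Map an m-bit hash key K to the point K / 2**m and find its manager."""
    return manager(ids, key / 2 ** m)

# Usage with three nodes and an 8-bit key space.
ids = [0.1, 0.4, 0.8]
owner = manager_for_key(ids, 128, 8)   # 128 / 256 = 0.5
```

Points beyond the largest id wrap around to the node with the smallest id.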

28 Symphony: The Protocol Probability distribution function (pdf): p_n, where n denotes the current number of nodes. The function p_n(x) takes the value 1/(x ln n) when x lies in the range [1/n, 1], and is 0 otherwise. The pdf belongs to a family of harmonic distributions. Problems: estimation of n, choice of k.
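
A draw from p_n can be obtained by inverse-transform sampling. Integrating 1/(x ln n) from 1/n to x gives the CDF F(x) = 1 + ln x / ln n, so solving F(x) = u for a uniform u in [0, 1) yields x = n^(u-1). The sketch below follows that derivation; the function names are hypothetical.

```python
import random

def harmonic_inverse_cdf(u, n):
    """Inverse of F(x) = 1 + ln(x) / ln(n) on [1/n, 1]."""
    return n ** (u - 1.0)

def draw_harmonic(n, rng):
    """One sample from p_n(x) = 1 / (x ln n) on [1/n, 1]."""
    return harmonic_inverse_cdf(rng.random(), n)

# Usage: draw a few long-link distances for a 1000-node ring.
rng = random.Random(0)
samples = [draw_harmonic(1000, rng) for _ in range(5)]
```

Half of the draws fall below 1/sqrt(n), which is what concentrates long links on short distances while still covering the whole ring.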

29 Unidirectional Routing Protocol When a node wishes to look up a hash key x from I, it forwards a lookup for x along the link (short or long) that minimizes the clockwise distance to x. The expected path length with unidirectional routing in an n-node network with k = O(1) links is O((1/k) (log n)^2) hops.
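
A small simulation of unidirectional routing, written as a sketch under several assumptions: ids increase clockwise, each long link goes to the manager of the point at a harmonically drawn clockwise offset x from the node's own id, and a lookup stops at the last node at or before x in clockwise order (handing over to the formal manager of x costs at most one more hop, depending on the segment convention). All names are hypothetical.

```python
import random
from bisect import bisect_left

def build_ring(n, k, seed=0):
    """Return (ids, links): sorted ids and, per node index, its link targets."""
    rng = random.Random(seed)
    ids = sorted(rng.random() for _ in range(n))
    links = []
    for idx, node_id in enumerate(ids):
        out = {(idx - 1) % n, (idx + 1) % n}           # two short links
        for _ in range(k):
            x = n ** (rng.random() - 1.0)              # harmonic draw in [1/n, 1]
            target = (node_id + x) % 1.0               # clockwise offset (assumed)
            out.add(bisect_left(ids, target) % n)      # node managing that point
        out.discard(idx)
        links.append(sorted(out))
    return ids, links

def clockwise(a, b):
    """Clockwise distance from point a to point b on the unit ring."""
    return (b - a) % 1.0

def route(ids, links, start, x):
    """Greedy unidirectional lookup; returns (final node index, hops)."""
    cur, hops = start, 0
    while True:
        best = min(links[cur], key=lambda v: clockwise(ids[v], x))
        if clockwise(ids[best], x) >= clockwise(ids[cur], x):
            return cur, hops                           # no link gets closer
        cur, hops = best, hops + 1
```

Running `route` from many start nodes for networks of increasing n, and averaging the hop counts, gives an empirical view of how path length grows with n and shrinks with k.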

30 Bidirectional Routing Protocol A node routes a lookup for x along the link (incoming or outgoing) that minimizes the absolute distance to x. The expected path length with bidirectional routing in an n-node network with k = O(1) links is O((1/k) (log n)^2) hops. The constant hidden behind the big-O notation is less than 1.

31 Estimation Protocol Let X_s denote the sum of segment lengths managed by any set of s distinct nodes. Then s/X_s is an unbiased estimator for n. The estimate improves as s increases. Choice of s: s = 3 is good enough in practice. A node estimates n by using the length of the segment it partitions and its two neighboring segments. These three segment lengths are readily available at no extra cost from the two nodes between which the new node inserts itself in the ring.
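
The estimator s/X_s is short enough to write out. The sketch below assumes sorted ids on the unit ring and that a node's segment runs from its predecessor's id to its own id; the function names are hypothetical.

```python
def segment_length(ids, idx):
    """Length of the segment managed by node idx (predecessor id to own id)."""
    return (ids[idx] - ids[idx - 1]) % 1.0

def estimate_n(lengths):
    """Estimate the number of nodes as s / X_s from s segment lengths."""
    return len(lengths) / sum(lengths)

# Usage: eight evenly spaced nodes, estimate from three adjacent segments (s = 3).
ids = [i / 8 for i in range(8)]
three = [segment_length(ids, i) for i in (2, 3, 4)]
n_hat = estimate_n(three)
```

With evenly spaced nodes the estimate is exact; with random ids it fluctuates around n and tightens as more segments are included.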

32 Conclusion Symphony scales well and has low lookup latency and maintenance cost with only a few neighbors per node. In particular, s = 3 neighbors suffice for the Estimation Protocol, and k = 4 long-range links with bidirectional routing are sufficient for low latencies in networks as big as 2^15 nodes.

