Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony L. T.


1 Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure. Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony L. T. Rowstron. IEEE Journal on Selected Areas in Communications, Oct. 2002

2 Outline Pastry A peer-to-peer location and routing substrate Scribe Built on top of Pastry Experimental evaluation Delay penalty Node stress (routing tables) Link stress (network bandwidth)

3 Pastry (1/2) Each Pastry node has a unique, 128-bit nodeId. The set of existing nodeIds is uniformly distributed. This is achieved by basing the nodeId on a secure hash of the node’s public key or IP address.
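
A minimal sketch of how a 128-bit nodeId could be derived from a secure hash. The slide does not say which hash function is used, so SHA-256 truncated to 128 bits is an assumption here, and `node_id` is a hypothetical helper name.

```python
import hashlib


def node_id(identity: bytes) -> int:
    """Return a 128-bit nodeId derived from a secure hash of `identity`.

    `identity` stands in for the node's public key or IP address.
    Assumption: SHA-256 truncated to 128 bits; the slides name no hash.
    """
    digest = hashlib.sha256(identity).digest()
    # Keep the first 16 bytes (128 bits) of the digest.
    return int.from_bytes(digest[:16], "big")


# Print the nodeIds of a few illustrative addresses as 32 hex digits.
for address in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    print(address, f"{node_id(address.encode()):032x}")
```

Because the hash output is effectively random, similar inputs such as adjacent IP addresses land far apart in the nodeId space, which is what keeps the nodeIds uniformly distributed.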

4 Pastry (2/2) Each node maintains: a routing table covering some of the live nodes, in which each entry maps a nodeId to the associated node’s IP address; and the IP addresses of the nodes in its “leaf set”. Leaf set (l nodes in total): the l/2 nodes with the numerically closest larger nodeIds and the l/2 nodes with the numerically closest smaller nodeIds.
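
The leaf set definition can be sketched as follows. This is an illustration only: it assumes global knowledge of all live nodeIds (a real node learns its leaf set from its neighbors) and treats the nodeId space as circular, so the largest nodeId wraps around to the smallest. The name `leaf_set` is hypothetical.

```python
import bisect


def leaf_set(node_id, live_ids, l=4):
    """Return the l/2 closest larger and l/2 closest smaller nodeIds.

    Assumption: the nodeId space wraps around (circular).
    """
    others = sorted(i for i in live_ids if i != node_id)
    if len(others) <= l:
        return set(others)  # too few nodes: everyone is a leaf
    half = l // 2
    # Index of the first nodeId larger than node_id.
    pos = bisect.bisect_left(others, node_id)
    larger = [others[(pos + k) % len(others)] for k in range(half)]
    smaller = [others[(pos - 1 - k) % len(others)] for k in range(half)]
    return set(larger) | set(smaller)


live = [10, 20, 30, 40, 50, 60, 70]
print(sorted(leaf_set(40, live)))  # [20, 30, 50, 60]
print(sorted(leaf_set(10, live)))  # [20, 30, 60, 70]
```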

5 Routing Given a message and a key, Pastry reliably routes the message to the node with the nodeId that is numerically closest to the key among all live nodes. In each routing step, the current node normally forwards the message to a node whose nodeId shares a longer prefix with the key. The key can be different from the destination nodeId.
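
The prefix-based forwarding rule can be sketched as below. This is a simplification under stated assumptions: every node consults the same small `known` list instead of its own routing table and leaf set, and the fallback (same prefix length, numerically closer to the key) is a simplified stand-in for the leaf set step. The nodeIds other than 65a1fc and d46a1c are illustrative, and `next_hop` and `route` are hypothetical names.

```python
from typing import List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, key: str, known: List[str]) -> Optional[str]:
    """Pick the next node for `key`, or None if `current` should deliver."""
    here = shared_prefix_len(current, key)
    longer = [n for n in known if shared_prefix_len(n, key) > here]
    if longer:
        # Normal case: forward to a node sharing a longer prefix with the key.
        return min(longer, key=lambda n: shared_prefix_len(n, key))

    def dist(n: str) -> int:
        return abs(int(n, 16) - int(key, 16))

    # Fallback: same prefix length, but numerically closer to the key.
    closer = [n for n in known
              if shared_prefix_len(n, key) == here and dist(n) < dist(current)]
    return min(closer, key=dist) if closer else None


def route(source: str, key: str, known: List[str]) -> List[str]:
    """Return the list of nodes visited from `source` until delivery."""
    path = [source]
    hop = next_hop(source, key, known)
    while hop is not None:
        path.append(hop)
        hop = next_hop(hop, key, known)
    return path


known = ["d13da3", "d4213f", "d462ba", "d467c4"]  # illustrative nodeIds
print(route("65a1fc", "d46a1c", known))
# ['65a1fc', 'd13da3', 'd4213f', 'd462ba', 'd467c4']
```

Each hop extends the shared prefix by at least one digit until no known node improves on it, at which point the numerically closest node receives the message.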

6 Routing a message From node 65a1fc with key d46a1c

7 Locality properties Short routes property: concerns the total distance that messages travel along Pastry routes. In each step, a message is routed to the nearest node with a longer prefix match. Route convergence property: concerns the distance traveled by two messages sent to the same key before their routes converge. [Figure: nodes A, B, C, D, and E, with the point where routes converge]

8 Node addition A new node with nodeId X initializes its state by contacting a nearby node A. A routes a special message using X as the key. This message is routed to the existing node Z whose nodeId is numerically closest to X. X then obtains the leaf set and the routing table from Z. Z is the numerically closest node to X, so their leaf sets are almost the same. Their routing tables are very similar.

9 Failure To handle node failures, neighboring nodes in the nodeId space periodically exchange keep-alive messages. If a node is unresponsive for a period T, it is presumed failed. All members of the failed node’s leaf set are then notified and they update their leaf sets. Routing table entries that refer to failed nodes are repaired lazily.

10 Scribe Scribe uses Pastry to manage group creation, group joining and to build a per-group multicast tree. Implementation CREATE JOIN MULTICAST LEAVE

11 Multicast tree creation [Figure: node 1100 issues CREATE; nodes 0111 and 0100 issue JOIN; nodes 1001, 1101, and 1111 are forwarders] b = 1 (match 1 bit at a time). Because b = 1, both 1111 and 1101 can be forwarders.

12 Membership Rendezvous point: the root of the multicast tree. It can change. Forwarders: Scribe nodes that are part of a group’s multicast tree. They may or may not be members of the group. Each forwarder maintains a children table.
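
A rough sketch of how JOIN messages could populate the children tables. It assumes the usual Scribe behavior in which each node on the Pastry route toward the rendezvous point adds the previous hop as a child, and forwarding stops at the first node that is already in the tree. The routes below are hypothetical, loosely based on the nodeIds of slide 11, and `join` is an illustrative name.

```python
from collections import defaultdict


def join(route, children, in_tree):
    """Process one JOIN along `route` (joining node first, root last).

    children: maps a node to the set of its children for this group.
    in_tree:  nodes already part of the group's multicast tree.
    """
    for child, parent in zip(route, route[1:]):
        children[parent].add(child)  # parent records the child
        in_tree.add(child)
        if parent in in_tree:
            return  # parent is already in the tree: stop forwarding


root = "1100"                # rendezvous point created by CREATE
children = defaultdict(set)
in_tree = {root}

# Hypothetical Pastry routes toward the groupId.
join(["0111", "1001", "1101", "1100"], children, in_tree)
join(["0100", "1001", "1101", "1100"], children, in_tree)

for node in sorted(children):
    print(node, sorted(children[node]))
```

The second JOIN stops at 1001, which is already a forwarder, so the nodes closer to the root never see it.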

13 Multicast message dissemination Multicast sources use Pastry to locate the rendezvous point of a group. They route to the rendezvous point and ask it to return its IP address. They cache the rendezvous point’s IP address and use it in subsequent multicasts to the group. Multicast messages are disseminated from the rendezvous point along the multicast tree. Why? Each multicast source could also be viewed as a root. However, if each multicast source transmits data by itself, the worst-case delay penalty can double.
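
Dissemination from the rendezvous point can be pictured as a traversal of the children tables. The sketch below is centralized and only illustrative (in Scribe, each forwarder sends the message to its own children); the tree and the name `disseminate` are assumptions.

```python
from collections import deque


def disseminate(root, children):
    """Return the order in which nodes receive a multicast message."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        # Each forwarder passes the message on to its children.
        queue.extend(sorted(children.get(node, ())))
    return order


# Hypothetical children tables for one group rooted at 1100.
children = {
    "1100": {"1101"},
    "1101": {"1001"},
    "1001": {"0100", "0111"},
}
print(disseminate("1100", children))
# ['1100', '1101', '1001', '0100', '0111']
```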

14 Reliability Each nonleaf node in the tree periodically sends a heartbeat message to its children. A child suspects that its parent is faulty when it fails to receive heartbeat messages. Upon detecting the failure of its parent, a node calls Pastry to route a JOIN message to a new parent. If the failed node is the root, a new root (the live node with the nodeId numerically closest to the groupId) replaces it.

15 Experimental evaluation Comparison with IP multicast: delay penalty, node stress, link stress. Experimental setup: a network topology with 5,050 routers; Scribe runs on 100,000 end nodes; 1,500 groups.

16 Delay penalty Scribe increases the delay to deliver messages relative to IP multicast. RMD The ratio between the maximum delay using Scribe and the maximum delay using IP multicast. RAD The ratio between the average delay using Scribe and the average delay using IP multicast.
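
The two ratios are straightforward to compute for one group. The delay values below are made up for illustration, and `delay_ratios` is a hypothetical helper.

```python
def delay_ratios(scribe_delays, ip_delays):
    """Return (RMD, RAD) for one group.

    Each list holds the delivery delay to every member of the group,
    once using Scribe and once using IP multicast.
    """
    rmd = max(scribe_delays) / max(ip_delays)
    scribe_avg = sum(scribe_delays) / len(scribe_delays)
    ip_avg = sum(ip_delays) / len(ip_delays)
    rad = scribe_avg / ip_avg
    return rmd, rad


# Illustrative delays (arbitrary units) to three group members.
rmd, rad = delay_ratios([30, 60, 90], [20, 30, 40])
print(rmd, rad)  # 2.25 2.0
```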

17 Delay penalty [Figure: cumulative number of groups with a RAD or RMD lower than or equal to a given relative delay (Scribe / IP multicast)]

18 Node stress (1/2)

19 Node stress (2/2) On average, each node has few children table entries, but the distribution has a long tail.

20 Link stress Maximum link stress: IP multicast 950, Scribe 4031

21 Bottleneck remover (1/3) Reasons: Some nodes may have less computational power or bandwidth available than others. The distribution of children table entries has a long tail. Algorithm: When a node is overloaded, it selects the group that consumes the most resources. It then chooses the child in this group that is farthest away.

22 Bottleneck remover (2/3) The parent drops the chosen child by sending it a message containing the children table for the group. When the child receives the message, it: measures the delay between itself and the other nodes in the table; computes the total delay between itself and the parent via each node in the table; sends a join message to the node that provides the smallest combined delay.
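
Both sides of the bottleneck remover can be sketched in a few lines. The sketch assumes that the drop message also carries each remaining child's delay to the parent, so the dropped child can add up both legs; the delay values and function names are hypothetical.

```python
def choose_child_to_drop(delay_from_parent):
    """Parent side: pick the child that is farthest away."""
    return max(delay_from_parent, key=delay_from_parent.get)


def choose_new_parent(delay_to_me, delay_to_parent):
    """Dropped child side: pick the node with the smallest combined delay.

    delay_to_me:     measured delay from the dropped child to each node.
    delay_to_parent: delay from each node to the old parent (assumed to
                     be supplied in the drop message).
    """
    return min(delay_to_me,
               key=lambda n: delay_to_me[n] + delay_to_parent[n])


# Overloaded parent with three children in its most expensive group.
delay_from_parent = {"a": 5, "b": 12, "c": 40}
dropped = choose_child_to_drop(delay_from_parent)
print(dropped)  # c

# The dropped child measures its delay to the remaining children.
remaining = {n: d for n, d in delay_from_parent.items() if n != dropped}
measured = {"a": 30, "b": 10}
print(choose_new_parent(measured, remaining))  # b (10 + 12 < 30 + 5)
```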

23 Bottleneck remover (3/3)

24 Node stress No long tail

25 Scalability Evaluating Scribe’s scalability with a large number of groups. Experimental setup: 50,000 Scribe nodes; 30,000 groups with 11 members each.

26 Node stress (1/2) Collapse will be introduced later.

27 Node stress (2/2) Scribe is ill-suited to small groups! The distribution has a long tail.

28 Scribe collapse (1/2) If a multicast group has few members, the group may require many other nodes to become forwarders. (The tree is inefficient.) The new algorithm collapses long paths in the tree by removing nodes that are not members of the group and have only one entry in the group’s children table.
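
The effect of collapsing can be sketched on a whole tree at once. This is a centralized illustration with hypothetical node names, not the distributed protocol itself: every non-member node with exactly one child is spliced out so that its child attaches directly to the nearest remaining ancestor.

```python
def collapse(root, children, members):
    """Return new children tables with single-child non-members removed."""
    collapsed = {}
    stack = [root]
    while stack:
        node = stack.pop()
        kept = set()
        for child in children.get(node, ()):
            # Skip down through non-member nodes that have exactly one child.
            while child not in members and len(children.get(child, ())) == 1:
                (child,) = children[child]
            kept.add(child)
            stack.append(child)
        if kept:
            collapsed[node] = kept
    return collapsed


# R is the root; f1, f2, f3 are pure forwarders; m1..m3 are members.
children = {
    "R": {"f1", "f3"},
    "f1": {"f2"},
    "f2": {"m1"},
    "f3": {"m2", "m3"},
}
members = {"m1", "m2", "m3"}
print(collapse("R", children, members))
```

The chain R, f1, f2, m1 shrinks to R, m1, while f3 stays because it has two children.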

29 Scribe collapse (2/2)

30 Link stress [Figure: link stress for naïve unicast, Scribe, IP multicast, and Scribe collapse]

