Searching In Peer-To-Peer Networks Chunlin Yang

What’s P2P - Unofficial Definition: All computers in the network are equal. Each computer functions as both a client and a server, with no administrator. The user of each computer decides which data on that computer is shared on the network.

What’s P2P (continued): The goal is to share huge volumes of data among the peers in the network. There are no dedicated servers and no hierarchy among the computers in the network. Examples: Gnutella, Freenet, and Napster.

Why P2P: The Internet has three fundamental assets: information, bandwidth, and storage space. Information: the amount keeps growing, so finding useful information in real time is increasingly difficult. Bandwidth: more capacity has been added, but hot sites such as Yahoo and eBay attract more and more traffic and become bottlenecks.

Why P2P (continued): Computing resources: processor speeds and storage capacities keep increasing, but data centers accumulate more and more computation tasks. P2P networking can greatly improve the utilization of Internet resources.

Why P2P (continued): Balance traffic to reduce the peak load on the network. Increase the reliability and fault tolerance of the overall system. Tolerate server downtime, for example in email delivery, or slice a large email into small packets and transfer them over multiple paths.

Basic Searching Algorithms: Gnutella (BFS), Freenet (DFS), Napster (index server).

Basic Searching Algorithm - Gnutella: Each node of the network acts simultaneously as a client and as a server. It conducts its own searches while listening for incoming queries. The network is completely decentralized; every node is equal.

Basic Searching Algorithm - Gnutella (continued): A node sends a query to all its neighbors; each neighbor searches its own resources and forwards the message to all of its own neighbors. If a query is satisfied, a response is sent back to the original requester along the reverse path.

Basic Searching Algorithm - Gnutella (continued): Queries are assigned GUIDs so that nodes can recognize and drop repeats. A TTL of 7 (reaching about 10,000 nodes) keeps a query from congesting the network. Problem: the network can contain cycles, which cause excessive traffic.
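The flooding search described above can be sketched roughly as follows. This is a minimal simulation, not the Gnutella wire protocol: the `graph` and `files` dictionaries and the function name `flood_search` are hypothetical, and a single shared `seen` set stands in for the per-node GUID tables that real nodes would keep.

```python
from collections import deque


def flood_search(graph, files, source, wanted, ttl=7):
    """Breadth-first flood from `source`, limited to `ttl` hops.

    graph: dict mapping a node to the list of its neighbors (assumed overlay).
    files: dict mapping a node to the set of file names it shares.
    Returns (nodes holding `wanted`, number of messages transmitted).
    """
    seen = {source}              # stands in for GUID-based duplicate detection
    hits = []
    messages = 0
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != source and wanted in files.get(node, set()):
            hits.append(node)
        if remaining == 0:
            continue             # TTL exhausted: process but do not forward
        for neighbor in graph[node]:
            messages += 1        # every transmission counts, even a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages
```

Even with duplicate detection, the message count grows with the number of edges rather than the number of nodes, which is the excess traffic the slide refers to.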

Basic Searching Algorithm - Freenet: Cooperative file distribution improves document distribution efficiency by sharing bandwidth and disk space. Each file has a unique ID and known locations. The network consists of equal nodes, each acting as client and server.

Basic Searching Algorithm - Freenet (continued): Information is stored on hosts under searchable keys. Search is depth-first with a depth limit D. Each node forwards the query to a single neighbor and waits for a definite response from that neighbor. If the query is not satisfied at the neighbor, the neighbor forwards it on to another neighbor.

Basic Searching Algorithm - Freenet (continued): If the query is satisfied, the response is sent back to the query source along the reverse path. Each node along the path also copies the data into its own database, so more popular information becomes easier to access.
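A depth-limited, depth-first search with caching on the reverse path might look like the sketch below. It is a simplification under stated assumptions: neighbors are tried in list order (the slides do not say how the next neighbor is chosen), and `graph`, `store`, and `dfs_search` are hypothetical names.

```python
def dfs_search(graph, store, node, key, depth_limit, visited=None):
    """Return the data stored under `key`, or None if it is not found.

    graph: dict mapping a node to the list of its neighbors.
    store: dict mapping a node to its local {key: data} database.
    """
    if visited is None:
        visited = set()
    visited.add(node)
    if key in store[node]:
        return store[node][key]          # query satisfied locally
    if depth_limit == 0:
        return None                      # depth limit D reached
    for neighbor in graph[node]:         # one neighbor at a time
        if neighbor in visited:
            continue
        data = dfs_search(graph, store, neighbor, key, depth_limit - 1, visited)
        if data is not None:
            store[node][key] = data      # copy the data on the reverse path
            return data
    return None                          # every neighbor failed
```

After a successful search every node on the return path holds a copy, so a later query for the same key is answered closer to the requester.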

Basic Searching Algorithm - Napster: A centralized server keeps information about online users and song locations in a database for quick search. Clients use peer-to-peer file transfer once the server has returned the location of a song. Legal problem: it ignores copyright. Technical problem: the index server is a bottleneck, as in any client-server system, and the service fails if the index server goes down.
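The central-index idea can be illustrated with a small in-memory sketch. The class and method names are hypothetical and this is not Napster's actual protocol; it only shows that lookups go to the server while the transfer itself would happen between peers.

```python
class IndexServer:
    """Toy central index: song title -> peers that currently share it."""

    def __init__(self):
        self._locations = {}

    def register(self, peer, songs):
        """Called when a peer comes online and announces its songs."""
        for song in songs:
            self._locations.setdefault(song, set()).add(peer)

    def unregister(self, peer):
        """Called when a peer goes offline."""
        for peers in self._locations.values():
            peers.discard(peer)

    def lookup(self, song):
        """Return the peers to contact directly for the file transfer."""
        return sorted(self._locations.get(song, set()))
```

Every lookup depends on this one object, which is the bottleneck and single point of failure noted above.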

Improving Search Algorithms in Peer-to-Peer Networks: Iterative Deepening, Directed BFS, Local Indices, Routing Indices, NEVRLATE.

Iterative Deepening: Multiple breadth-first searches are initiated with successively larger depth limits, until the query is satisfied or the maximum depth has been reached. Example: policy P(a,b,c) searches first to depth a, then to depth b, and finally to depth c.

Iterative Deepening (continued): A source node S first initiates a BFS of depth a. When a node at depth a receives and processes the query, it stores the query temporarily, so the query is frozen at all nodes a hops from the source. S receives response messages from the nodes that have processed the query.

Iterative Deepening (continued): After waiting for a predefined time period W, S does nothing further if the query has been satisfied. Otherwise S starts the next iteration, a BFS of depth b: S sends a Resend message with a TTL of a, and nodes simply forward the Resend message until it reaches the nodes a hops away.

Iterative Deepening (continued): A node at hop a drops the Resend message and unfreezes the corresponding query by forwarding it to all its neighbors with a TTL of b-a. When the message reaches the nodes at hop b, the process continues in a similar fashion. At level c the query is not frozen, and S does not initiate another iteration even if the query is not satisfied. Problem?
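The freeze-and-resume behavior can be approximated in a few lines. This sketch keeps the frozen frontier in a list and extends it one policy step at a time; it does not model the waiting period W or the Resend messages themselves, and the names `iterative_deepening`, `graph`, `files`, and `needed` are assumptions made for illustration.

```python
def iterative_deepening(graph, files, source, wanted, policy, needed=1):
    """Search outward in stages given by `policy`, e.g. (a, b, c).

    Returns (nodes holding `wanted`, depth reached when the search stopped).
    """
    seen = {source}
    frontier = [source]        # nodes at which the query is currently frozen
    hits = []
    depth = 0
    for limit in policy:
        # Unfreeze the query and extend it from the frontier up to `limit`.
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
                        if wanted in files.get(neighbor, set()):
                            hits.append(neighbor)
            frontier = next_frontier
            depth += 1
        if len(hits) >= needed:
            break              # satisfied: no further iteration is started
    return hits, depth
```

Nodes inside the already-searched region are never asked to process the query again, which is the saving over restarting a fresh BFS at each depth.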

Directed BFS: To minimize response time, a node sends the query only to a subset of its neighbors that are likely to return many results. A node maintains simple statistics on its neighbors, such as results for past queries or the latency of the connection with each neighbor. From these statistics, several rules can be used to pick the neighbor to send a query to:

Directed BFS (continued): The neighbor that has returned the highest number of results for previous queries. The neighbor whose response messages have the lowest average number of hops. The neighbor that has forwarded the largest number of messages. The neighbor that has the shortest message queue.
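The four selection rules can be expressed as sort keys over per-neighbor statistics. The record layout and the names `NeighborStats`, `HEURISTICS`, and `pick_neighbors` are hypothetical; the slides only list the rules, not how the statistics are stored.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    results_returned: int     # results returned for previous queries
    avg_hops: float           # average hop count of its response messages
    messages_forwarded: int   # messages it has forwarded to us
    queue_length: int         # current length of its message queue


# Each heuristic maps statistics to a sort key; smaller keys rank first.
HEURISTICS = {
    "most_results": lambda s: -s.results_returned,
    "fewest_hops": lambda s: s.avg_hops,
    "most_forwarded": lambda s: -s.messages_forwarded,
    "shortest_queue": lambda s: s.queue_length,
}


def pick_neighbors(stats, heuristic, k=1):
    """Return the k best neighbors under the named heuristic."""
    key = HEURISTICS[heuristic]
    ranked = sorted(stats, key=lambda name: (key(stats[name]), name))
    return ranked[:k]
```

Ties are broken by neighbor name only to keep the sketch deterministic.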

Local Indices: Each node n maintains an index over the data of all nodes within r hops of itself, where r is a system-wide variable known as the radius of the index. When it receives a query, a node can process it on behalf of every node within r hops, so the query is processed at fewer nodes, which reduces cost while keeping the same level of satisfaction.

Local Indices (continued): A system-wide policy specifies the depths at which the query should be processed. Nodes at depths not listed in the policy simply forward the query. Example: with policy P(1,5), only nodes at depths 1 and 5 process the query, while nodes at other depths just forward it. Reason: each node has information about its neighbors within 4 hops.
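A rough sketch of local indices with a depth policy follows. For brevity it rebuilds each index on demand, whereas a real node would build its index once and keep it up to date; the function names and the `graph`/`files` dictionaries are assumptions.

```python
from collections import deque


def nodes_within(graph, start, r):
    """Map every node within r hops of `start` to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_local_index(graph, files, node, r):
    """Index of radius r: file name -> nodes within r hops that hold it."""
    index = {}
    for other in nodes_within(graph, node, r):
        for name in files.get(other, set()):
            index.setdefault(name, set()).add(other)
    return index


def search(graph, files, source, wanted, r, policy_depths):
    """Only nodes at the depths listed in the policy process the query."""
    hits = set()
    reach = nodes_within(graph, source, max(policy_depths))
    for node, depth in reach.items():
        if depth in policy_depths:
            index = build_local_index(graph, files, node, r)
            hits |= index.get(wanted, set())
    return hits
```

In the test chain used with this sketch, two processing nodes with radius 2 find files up to seven hops from the source, although the query itself travels only five hops.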

Routing Indices: A routing index lets a node select the “best” neighbors to send a query to. It is a data structure, with associated algorithms, that given a query returns a list of neighbors ranked by their goodness for that query. The goodness should in general reflect the number of documents in nearby nodes.

Routing Indices (continued): Each node has a local index for quickly finding local documents when a query is received. Each node also has a compound routing index containing the number of documents along each path and the number of documents on each topic of interest.

Routing Indices Example

Compound routing index (documents per path, and documents per topic):

Path   #docs   DB    N     T     L
A      150     30    20    0     100
B      100     20    0     10    30
C      1000    0     300   0     50
D      200     100   0     100   150
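To show how such a table could be turned into a ranking, the sketch below applies one plausible goodness estimator: the total document count of a path multiplied by the fraction of documents on each queried topic. The estimator is an assumption (the slides do not state one) and it treats topics as independent; the names `ROUTING_INDEX`, `goodness`, and `rank_paths` are hypothetical. The numbers are the ones from the table above.

```python
# path: (total documents, documents per topic), copied from the example table
ROUTING_INDEX = {
    "A": (150, {"DB": 30, "N": 20, "T": 0, "L": 100}),
    "B": (100, {"DB": 20, "N": 0, "T": 10, "L": 30}),
    "C": (1000, {"DB": 0, "N": 300, "T": 0, "L": 50}),
    "D": (200, {"DB": 100, "N": 0, "T": 100, "L": 150}),
}


def goodness(row, topics):
    """Estimated number of documents on a path matching all `topics`."""
    total, per_topic = row
    if total == 0:
        return 0.0
    estimate = float(total)
    for topic in topics:
        estimate *= per_topic.get(topic, 0) / total   # assumed independence
    return estimate


def rank_paths(index, topics):
    """Paths ordered from best to worst for a query on `topics`."""
    scored = [(goodness(row, topics), path) for path, row in index.items()]
    return [path for _, path in sorted(scored, key=lambda item: (-item[0], item[1]))]
```

Under this estimator a query on DB and L scores path D at about 75 and path B at about 6, while path C scores 0 despite holding the most documents.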

Routing Indices - Maintenance: When a connection is established between two nodes, they exchange their routing indices; each node updates its own index and sends update messages to its neighbors. When a node I disconnects from the network and node D detects it, D removes the row for I and sends its new routing index to all its neighbors so that they can update theirs.

NEVRLATE: Network-Efficient Vast Resource Lookup At The Edge. Directory servers are organized into a logical 2-dimensional grid, or a set of sets of servers. This enables registration in one (“horizontal”) dimension and lookup in the other (“vertical”) dimension.

NEVRLATE (continued): Each node is a directory server. Within each set of servers (a vertical cloud), every member can reach every other member. The set of sets of servers is the entire NEVRLATE network.

NEVRLATE - Continue

Each host registers its resources and location with one node of each set. When a query arrives, only one set needs to be searched to find all locations holding matching information. A host can also register with two nodes in each set for fault tolerance.
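The register-across, look-up-within pattern can be sketched with a list of server sets. How a host chooses its one node per set is not specified in the slides; the CRC-based choice here is an assumption made only so the example is deterministic, and the class name `Nevrlate` is hypothetical.

```python
import zlib


class Nevrlate:
    """Toy grid of directory servers: sets[i][j] is server j of set i."""

    def __init__(self, num_sets, servers_per_set):
        self.sets = [[{} for _ in range(servers_per_set)] for _ in range(num_sets)]

    def register(self, resource, location):
        """Store the entry on exactly one server of every set."""
        for servers in self.sets:
            slot = zlib.crc32(location.encode()) % len(servers)  # assumed choice
            servers[slot].setdefault(resource, set()).add(location)

    def lookup(self, resource, set_index=0):
        """Ask every server of a single set and merge the answers."""
        found = set()
        for server in self.sets[set_index]:
            found |= server.get(resource, set())
        return found
```

With s sets of n servers each, a registration costs s messages and a lookup costs n, so the grid shape trades registration cost against lookup cost.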

Extension: Total rank of a neighbor: the weighted sum of all its key ranks. Assumption: high-rank nodes are always better to access or closer to the resource. Dominating-set marking process (rule 1/rule 2): when removing a node from the dominating set (DS), choose the one with the lower rank instead of comparing uids.

Extension (continued): With the marking process (Wu & Li), nodes in the connected dominating set have relatively higher connectivity than non-DS nodes. The DS nodes need to hold the resource information and resource locations of their neighbor nodes. During a search, requests are sent only to DS nodes, which reduces cost and traffic while keeping the same level of satisfaction.
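For orientation, the basic marking step as it is commonly described for Wu and Li's algorithm marks a node when it has two neighbors that are not directly connected to each other. The sketch below implements only that step; the pruning rules 1 and 2 and the rank-based variant proposed in these slides are not implemented, and the function name `marking_process` is an assumption.

```python
from itertools import combinations


def marking_process(graph):
    """Return the set of marked nodes of an undirected graph.

    graph: dict mapping a node to the set of its neighbors.
    """
    marked = set()
    for node, neighbors in graph.items():
        for u, v in combinations(neighbors, 2):
            if v not in graph[u]:      # two neighbors with no direct link
                marked.add(node)
                break
    return marked
```

On a simple path every interior node is marked, while in a fully connected group no node is.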

Extension (continued): Clustering: when constructing a cluster, choose the node with the highest rank instead of the lowest uid, and choose the node with the lowest rank as the gateway (low traffic). Consider not only a node's own rank but also the total rank of its neighbors. Max-min ranking: when searching, choose the max as well as the min of the key index rank.

Extension (continued): Reason: the max-rank choice may carry high traffic, while the min-rank choice carries low traffic. Networks and resources are both dynamic, which helps to re-rank the network. Example: Glades Rd/Palmetto Park Rd, SW to NE.

Summary
