
Semantic Small World (SSW): An Overlay Network and Index Structure for Semantic-Based P2P Search. Presented by: Raj Kumar Rasam.


1 Semantic Small World (SSW): An Overlay Network and Index Structure for Semantic-Based P2P Search. Presented by: Raj Kumar Rasam

2 Semantic Small World. The paper presents the design of an overlay network called Semantic Small World (SSW) that facilitates efficient semantic-based search in P2P systems. That is, given a query, which can be a point query or a range query, the system returns the set of contents most relevant to the search criteria according to some semantic distance function. The proposed overlay network supports semantic search, both point and range queries, and top-K queries. Source: Mei Li, Wang-Chien Lee, Anand Sivasubramaniam (PSU). Semantic Small World: An Overlay Network for Peer-to-Peer Search. 12th IEEE International Conference on Network Protocols (ICNP 2004), Berlin, Germany, 2004.

3 Based on three innovative ideas. Small world network: an overlay network that scales to large networks. Semantic clustering: a clustering strategy that places peer nodes based on the semantics of their data objects. Dimension reduction: Adaptive Space Linearization (ASL), a technique for constructing a one-dimensional SSW that addresses the challenges raised by the high dimensionality of the semantic space. SSW dynamically clusters peers with semantically similar data closer to each other in the semantic space, organizes these clusters into an overlay network, and builds a distributed index structure as peers join or leave the network.

4 Semantic Space and Vector: Each object is represented as a k-element vector, called a semantic vector (SV), that maps to a point in a k-dimensional semantic space. Euclidean distance is used to represent the semantic closeness between two SVs. Semantic vector: $X = (x_1, x_2, \ldots, x_k)$. Given N k-dimensional data points $X_1, \ldots, X_N$, the centroid is $C = \frac{1}{N} \sum_{i=1}^{N} X_i$. Given two points $X_o = (x_1, \ldots, x_k)$ and $Y_o = (y_1, \ldots, y_k)$, the Euclidean distance $D_o$ is $D_o = \sqrt{\sum_{j=1}^{k} (x_j - y_j)^2}$.
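The centroid and distance definitions on this slide can be sketched in a few lines of Python. This is only an illustration of the standard formulas; the function names `centroid` and `euclidean_distance` are chosen here and do not come from the presentation.

```python
import math
from typing import Sequence


def centroid(points: Sequence[Sequence[float]]) -> list[float]:
    """Component-wise mean of N k-dimensional points."""
    n = len(points)
    k = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(k)]


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the summed squared differences per dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For example, `euclidean_distance([0, 0], [3, 4])` evaluates to `5.0`.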

5 There are two types of queries: point queries and range queries. A point query is defined by a query vector Q; the expected result is the objects x whose Euclidean distance to Q is minimal. A simple range query is described by a hyper-rectangular region Q; the expected result is the objects that belong to the region Q.
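A centralized sketch of the two query types, ignoring the overlay entirely, may help fix the semantics. It assumes objects are tuples of floats, that a region is given as one (low, high) pair per dimension, and that the region bounds are inclusive; the slide does not state any of these details.

```python
import math


def point_query(objects, q):
    """Return the object with minimum Euclidean distance to the query vector q."""
    return min(objects, key=lambda x: math.dist(x, q))


def range_query(objects, region):
    """Return the objects inside a hyper-rectangle.

    region holds one (low, high) pair per dimension; bounds are treated
    as inclusive, which is an assumption of this sketch.
    """
    return [
        x for x in objects
        if all(lo <= v <= hi for v, (lo, hi) in zip(x, region))
    ]
```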

6 Characteristics of a small world network: a small average path length between two nodes and a large cluster coefficient, where the cluster coefficient is defined as the probability that two neighbors of a node are neighbors themselves. Searches can be conducted efficiently when the network exhibits the following properties. Each node knows its local neighbors, called short range contacts. Each node knows a small number of randomly chosen distant nodes, called long range contacts, each chosen with probability C/d, where d is the distance to that node and C is the normalization constant that brings the total probability to 1. A search can be performed in
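The long range contact rule (probability C/d, normalized to sum to 1) can be illustrated as a weighted random choice. This is a minimal sketch: the candidate list, the way distances are obtained, and the function names are assumptions made for the example.

```python
import random


def contact_probabilities(distances):
    """Map each candidate distance d to C/d, with C chosen so the total is 1."""
    c = 1.0 / sum(1.0 / d for d in distances)
    return [c / d for d in distances]


def pick_long_range_contact(candidates, distances, rng=random):
    """Draw one candidate, favoring closer nodes in proportion to 1/d."""
    weights = contact_probabilities(distances)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With distances 1, 2 and 4 the normalization constant is C = 4/7, so the nearest candidate is picked with probability 4/7 and the farthest with probability 1/7.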

7 Construction of the SSW network: Each cluster is owned by at most M peers. Peer nodes within a cluster know each other, and each maintains short range contacts and long range contacts. There are three steps in constructing the network: 1. Obtain a semantic label to position the peer. 2. Form peer clusters in the semantic space. 3. Construct the overlay network across these clusters.

8 [Figure: two-dimensional semantic space with corners (0,0), (1,0) and (1,1); axis marks at 0.2, 0.25, 0.4, 0.6 and 0.8.]

9 Semantic Labeling: Executed before or when a peer node joins the network. A peer clusters its local data objects into data clusters of objects with similar semantics and chooses the centroid of its largest data cluster as its semantic label, or join point. Cluster Formation: The SVs of all data objects form a virtual search space. The semantic space is made up of clusters with at most M peers each. If the size of a cluster exceeds M, it is split into two clusters.

10 Overlay Network / Index Construction: Each peer maintains a set of short range contacts and a certain number of long range contacts. The long range contacts reduce the network diameter and transform the network into a small world with polylogarithmic search cost. Dimension Reduction: The semantic space is high-dimensional; for example, the SV for a document is made of 50 to 300 elements. SSW adopts a technique called Adaptive Space Linearization (ASL), which linearizes the clusters in the high-dimensional space into a one-dimensional SSW (SSW-1D) through the cluster split process.

11 Cluster Splitting Process: If the cluster size exceeds the predefined maximum size M, the cluster is partitioned. Choose the two peers that are semantically farthest from each other as seeds. Alternately assign the remaining peers to the two sub-clusters based on the shortest distance to the seeds. Finally, partition the cluster space at the midpoint of the dimension that has the largest span between the centroids of the semantic labels of the two sub-clusters. A naming scheme is needed to maintain a one-to-one mapping between the names of clusters in SSW-1D and their semantic subspaces: a 128-bit binary number, called the cluster ID, names each cluster.
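The splitting steps can be sketched as follows. The slide does not spell out the alternation or the exact partition point, so two readings are assumed here: the sub-clusters take turns claiming the unassigned peer nearest to their seed, and the partition point is the midpoint between the two sub-cluster centroids along the dimension where they differ most. The function name `split_cluster` is hypothetical.

```python
import math
from itertools import combinations


def split_cluster(labels):
    """Split a list of semantic labels (tuples) into two sub-clusters.

    Returns (indices_a, indices_b, partition_dim, partition_point).
    """
    k = len(labels[0])
    idx = range(len(labels))
    # Seeds: the two peers whose labels are farthest apart.
    a, b = max(
        combinations(idx, 2),
        key=lambda ij: math.dist(labels[ij[0]], labels[ij[1]]),
    )
    groups = ([a], [b])
    remaining = [i for i in idx if i not in (a, b)]
    turn = 0
    while remaining:
        # The sub-cluster whose turn it is takes the peer nearest its seed.
        seed = labels[groups[turn][0]]
        nearest = min(remaining, key=lambda i: math.dist(labels[i], seed))
        groups[turn].append(nearest)
        remaining.remove(nearest)
        turn = 1 - turn
    # Centroid of the semantic labels in each sub-cluster.
    cents = [
        [sum(labels[i][d] for i in g) / len(g) for d in range(k)]
        for g in groups
    ]
    # Dimension with the largest span between the two centroids.
    dim = max(range(k), key=lambda d: abs(cents[0][d] - cents[1][d]))
    point = (cents[0][dim] + cents[1][dim]) / 2
    return groups[0], groups[1], dim, point
```

Taking turns keeps the two sub-clusters within one peer of each other in size, which is one plausible motivation for alternating rather than assigning every peer to its nearest seed.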

12 Naming Scheme for Clusters: Each peer maintains a variable, Par_Bit, which initially points to the MSB of the cluster ID. Par_Bit indicates the bit to be set (to 0 or 1) in the next cluster split. After each split, the two sub-clusters derive their new IDs by setting the bit pointed to by Par_Bit, retaining all other bits of the original cluster's ID, and then decrease their Par_Bit by one. The sub-cluster with the smaller centroid along the partition dimension obtains an ID with that bit set to 0, and the other obtains an ID with that bit set to 1. The process repeats as more peers join the system and invoke more splits.
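A small sketch of the renaming step, assuming cluster IDs are held as Python integers and Par_Bit is a 0-based bit position counted from the least significant bit (so the MSB of a 4-bit ID is position 3). The indexing convention and the name `split_ids` are choices made for this example.

```python
def split_ids(cluster_id: int, par_bit: int):
    """Derive the two sub-cluster IDs produced by one split.

    Returns (low_id, high_id, new_par_bit): low_id has the Par_Bit
    position cleared (smaller centroid along the partition dimension),
    high_id has it set, and Par_Bit moves one position down.
    """
    mask = 1 << par_bit
    low_id = cluster_id & ~mask
    high_id = cluster_id | mask
    return low_id, high_id, par_bit - 1
```

With 4-bit IDs, `split_ids(0b0000, 3)` gives the IDs 0000 and 1000, and splitting 1000 again with `split_ids(0b1000, 2)` gives 1000 and 1100. Python integers are unbounded, so the same function covers 128-bit IDs.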

13 [Figure: successive cluster splits in a two-dimensional semantic space with corners (0,0), (1,0), (1,1) and (0,1). Cluster IDs shown: 0000, 1000, 1100, 1010, 0100, 0010, 1110, 1111, 0110, 0101 and 1011, with split labels P=1 to P=4.]

14 [Figure: clusters c0, c2, c4, c5, c6, c8, c10, c11, c12, c14 and c15.]

15 [Figure: two-dimensional semantic space with corners (0,0), (1,0) and (1,1); axis marks at 0.2, 0.25, 0.4, 0.6 and 0.8.]

16 Data Structure at Each Peer:
ClusterState: { ClusterRange, ClusterSize, Par_His, Par_Bit }
NeighborList: { NodeId }
ShortContact: { NodeId, ClusterRange }
LongContact: { NodeId, ClusterRange }
ForeignIndex: { Semantic Vector, NodeId }

17 Example: Data structure of Peer 1, which belongs to Cluster 4:
ClusterState:
    ClusterRange = { (0, 0.25), (0.4, 0.8) }
    ClusterSize = 4
    Par_His = { 1: (v, 0.4), 2: (h, 0.4), 3: (v, 0.25), 4: (h, 0.8) }
    Par_Bit = 4
NeighborList: { 3, 4, 5 }
ShortContact: { (7, {(0.00, 0.25), (0.80, 1.00)}), (12, {(0.25, 0.40), (0.40, 1.00)}) }
LongContact: { (25, {(0.60, 1.00), (0.00, 0.20)}) }
ForeignIndex: { Semantic Vector, NodeId }
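One way to hold this per-peer state in code is a set of dataclasses, populated here with the Peer 1 values from the slide. The class and field names are Python-style renderings chosen for this sketch, and the foreign index is left empty because the slide gives no concrete entries for it.

```python
from dataclasses import dataclass, field

# One (low, high) interval per dimension of the semantic space.
ClusterRange = tuple[tuple[float, float], ...]


@dataclass
class ClusterState:
    cluster_range: ClusterRange
    cluster_size: int
    # Split number -> (split direction, partition point), as on the slide.
    par_his: dict[int, tuple[str, float]]
    par_bit: int


@dataclass
class Contact:
    node_id: int
    cluster_range: ClusterRange


@dataclass
class PeerState:
    cluster_state: ClusterState
    neighbor_list: list[int] = field(default_factory=list)
    short_contacts: list[Contact] = field(default_factory=list)
    long_contacts: list[Contact] = field(default_factory=list)
    # Entries are (semantic vector, NodeId of the peer holding the object).
    foreign_index: list[tuple[tuple[float, ...], int]] = field(default_factory=list)


peer1 = PeerState(
    cluster_state=ClusterState(
        cluster_range=((0.0, 0.25), (0.4, 0.8)),
        cluster_size=4,
        par_his={1: ("v", 0.4), 2: ("h", 0.4), 3: ("v", 0.25), 4: ("h", 0.8)},
        par_bit=4,
    ),
    neighbor_list=[3, 4, 5],
    short_contacts=[
        Contact(7, ((0.00, 0.25), (0.80, 1.00))),
        Contact(12, ((0.25, 0.40), (0.40, 1.00))),
    ],
    long_contacts=[Contact(25, ((0.60, 1.00), (0.00, 0.20)))],
)
```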

18 Search: The search operation has two modes: 1. search-within-cluster and 2. search-across-cluster. To initiate a search, the requester first generates a search semantic vector (SV) for the query Q. Search-Within-Cluster: A peer checks whether the SV for query Q falls within its cluster range. If so, it floods the request to the peers in its NeighborList, except the one from which the message was received. The object with the highest similarity is returned as the result.

19 Search-Across-Cluster: If the SV of Q does not fall within the ClusterRange of the peer, search-across-cluster mode is invoked. In this case, a pseudo-cluster-name (PCN), the estimated ID of the cluster covering the SV, is calculated for the query based on the partition history (Par_His) stored at that peer. The search continues by forwarding the message to the contact that has the closest naming distance to the PCN. This process is repeated until the cluster whose semantic subspace covers the SV is reached.
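The mode decision and the forwarding step can be sketched as below. Several details are assumptions of this sketch rather than statements from the slides: range bounds are treated as inclusive, each contact's cluster ID is taken to be known to the forwarding peer, and naming distance is taken to be the absolute difference between cluster IDs.

```python
def covers(cluster_range, sv):
    """True if sv lies inside the hyper-rectangle (inclusive bounds assumed)."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(sv, cluster_range))


def search_mode(cluster_range, sv):
    """Pick the search mode for a query vector arriving at a peer."""
    if covers(cluster_range, sv):
        return "search-within-cluster"
    return "search-across-cluster"


def next_hop(contacts, pcn):
    """Choose the contact whose cluster ID is nearest to the PCN.

    contacts maps NodeId -> cluster ID; naming distance is assumed to be
    the absolute difference of the IDs.
    """
    return min(contacts, key=lambda node: abs(contacts[node] - pcn))
```

For a peer whose ClusterRange is {(0, 0.25), (0.4, 0.8)}, the query vector (0.1, 0.5) stays in search-within-cluster mode, while (0.7, 0.1) triggers search-across-cluster.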

20 Estimation of PCN: Set all the bits of the PCN to 0. Iterate through the Par_His of the current peer i: each bit of the PCN is set to the value of the corresponding bit of peer i's cluster ID, C_i, as long as the query conforms to the same Par_His entry. Otherwise, the corresponding bit is set to the opposite value and PCN estimation at peer i stops, since this peer has no further details about the PCN.

21 Algorithm for PCN Estimation:
For x = B to Par_Bit + 1 Do
    Obtain partition dimension d and partition point p from peer i's Par_His[x].
    If (i.ClusterRange_d ≤ p and Q_d ≤ p) or (i.ClusterRange_d > p and Q_d > p) Then
        PCN[x] = C_i[x]
    Else
        PCN[x] = 1 - C_i[x]
        Break
    End If
End For
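A runnable reading of the PCN estimation is sketched below. To sidestep the bit-indexing convention, it assumes the partition history is an ordered list from the first split to the most recent, that cluster IDs and PCNs are bit strings written MSB first, and that each history entry is a (dimension index, partition point) pair. The test "ClusterRange_d ≤ p" is interpreted as the cluster's interval in dimension d lying at or below p.

```python
def estimate_pcn(par_his, cluster_range, cluster_bits, q):
    """Estimate the pseudo-cluster-name for query vector q at one peer.

    par_his:       list of (dim, point), oldest split first
    cluster_range: one (low, high) pair per dimension
    cluster_bits:  this peer's cluster ID as a '0'/'1' string, MSB first
    """
    pcn = ["0"] * len(cluster_bits)  # start with all bits set to 0
    for x, (d, p) in enumerate(par_his):
        _, high = cluster_range[d]
        cluster_is_low = high <= p   # cluster lies at or below the split
        query_is_low = q[d] <= p     # query lies at or below the split
        if cluster_is_low == query_is_low:
            # Query agrees with this split: copy the cluster's bit.
            pcn[x] = cluster_bits[x]
        else:
            # Query is on the other side: flip the bit and stop, since
            # this peer knows nothing about later splits over there.
            pcn[x] = "1" if cluster_bits[x] == "0" else "0"
            break
    return "".join(pcn)
```

Using the Peer 1 example (cluster 0100, ClusterRange {(0, 0.25), (0.4, 0.8)}) with "v" mapped to dimension 0 and "h" to dimension 1, the history becomes `[(0, 0.4), (1, 0.4), (0, 0.25), (1, 0.8)]`. A query at (0.7, 0.1) disagrees at the first split, giving PCN 1000, while a query at (0.1, 0.2) disagrees at the second split, giving PCN 0000.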

22 [Figure: the cluster split example from slide 13, repeated. Cluster IDs shown: 0000, 1000, 1100, 1010, 0100, 0010, 1110, 1111, 0110, 0101 and 1011, with split labels P=1 to P=4.]

23 Peer Join: 1. General Process 2. Cluster Splitting 3. Foreign Index Publishing.
General Process:
1. Peer x generates x.label for its chosen join point.
2. Peer x sends a join message to an existing peer i.
3. Peer i performs a search and directs the message to peer j (the contact peer), whose cluster covers the point.
4. If the cluster size of peer j is below the maximum size M, peer x simply joins the cluster.
5. If the size exceeds M, cluster splitting is invoked.

24 Cluster Splitting:
1. If the cluster size exceeds M, contact peer j first obtains the complete list of semantic labels of all peers in the cluster by polling them through flooding.
2. It then splits the cluster into two using the cluster splitting strategy discussed earlier.
3. Peer j finishes the split by informing all the other peers to update their ClusterState, ShortContact, LongContact and ForeignIndex.
4. Choosing a large M makes cluster splitting infrequent, but a large M also increases the search cost because of flooding within the cluster.

25 Foreign Index Publishing:
1. After joining the cluster, a peer may find that some of its local data objects do not belong to this cluster.
2. The newly joined peer publishes the locations of these data objects to their corresponding peer clusters (as foreign indexes).
3. The first node reached in a corresponding peer cluster during the publishing process adds a tuple, consisting of the data object's SV and the NodeId of the source node, to its foreign index store.

26 Peer Removal: 1. Peer Leave 2. Peer Failure.
Peer Leave:
1. The leaving peer checks whether it is the last peer in the cluster.
2. If it is not the last peer, it simply signals its departure by transferring its foreign index to a randomly selected peer in its cluster.
3. If it is the last peer in the cluster, the semantic subspace of this cluster needs to be merged with one of its neighboring clusters.
4. The leaving peer then transfers its foreign index.
5. The receiving peer updates its cluster range as well as its short range contacts. Similarly, all the other peers in the receiving cluster update their range and their affected short range contacts.

27 Peer Failure:
1. A failed peer is detected during routine operations such as search.
2. If the failed peer is a long range contact, the detecting peer simply establishes another long range contact.
3. If the detecting peer is located in cluster E, a neighbor of cluster F where the failed peer was located, it is likely that other peers in cluster E maintain short range contacts with other live peers in cluster F. At the expense of two messages, the short range contact of the detecting peer can be recovered.
4. If a short range contact of a peer in cluster E cannot be recovered by contacting other peers in the network, no live peer exists in cluster F, and cluster merging is invoked.

28 Object Publication: Let SV_x be the semantic vector of an object x that a peer wants to publish. If SV_x belongs to the peer's cluster range, the peer includes the point corresponding to the semantic vector in its semantic subspace. If SV_x does not belong to its cluster range, the peer searches for the cluster that covers this point and publishes the object there as a foreign index. Object Removal: If SV_x belongs to the peer's cluster range, the peer just removes the point from its cluster. If the object is stored as a foreign index, the peer sends a message to delete the foreign index stored at the node where the object was published.
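The publication and removal rules can be sketched with clusters modeled as plain dictionaries. This is a toy, centralized model: the dictionary keys (`range`, `objects`, `foreign_index`) are invented for the example, range bounds are assumed inclusive, and a linear scan over all clusters stands in for the overlay search that would locate the covering cluster.

```python
def covers(cluster_range, sv):
    """True if sv lies inside the hyper-rectangle (inclusive bounds assumed)."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(sv, cluster_range))


def publish_object(sv, node_id, own_cluster, all_clusters):
    """Store sv locally if the peer's cluster covers it, else as a foreign index."""
    if covers(own_cluster["range"], sv):
        own_cluster["objects"].append(sv)
        return "local"
    # Stand-in for search-across-cluster: scan for the covering cluster.
    for cluster in all_clusters:
        if cluster is not own_cluster and covers(cluster["range"], sv):
            cluster["foreign_index"].append((sv, node_id))
            return "foreign"
    return "unplaced"


def remove_object(sv, node_id, own_cluster, all_clusters):
    """Undo a publication: drop the local point or the remote foreign index."""
    if sv in own_cluster["objects"]:
        own_cluster["objects"].remove(sv)
        return
    for cluster in all_clusters:
        if (sv, node_id) in cluster["foreign_index"]:
            cluster["foreign_index"].remove((sv, node_id))
            return
```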


