Zhenjiang Lin, Michael R. Lyu and Irwin King


MatchSim: A Novel Neighbor-based Similarity Measure with Maximum Neighborhood Matching
Zhenjiang Lin, Michael R. Lyu and Irwin King
Department of Computer Science and Engineering
The Chinese University of Hong Kong, Shatin, Hong Kong

Overview
We propose a neighbor-based similarity measure, called MatchSim, for computing the similarity between objects in a graph. The method recursively refines the similarity between two objects by finding the maximum matching of similarity between their neighbors. Experimental results demonstrate the effectiveness of the proposed method.

Motivation
Explore the similarity between objects effectively and efficiently by exploiting only the relationships (the links) among them.

Main Contributions
A neighbor-based similarity measure, MatchSim, which recursively extracts the similarity between two pages by finding the maximum matching between their similar neighbors.
Experiments on two real-world datasets.

Basic Idea
SimRank (an existing neighbor-based method): sim(a, b) is the average similarity between their neighbors.
MatchSim (proposed method): sim(a, b) is the average similarity of the maximum matching between their neighbors.

Example: MatchSim vs SimRank
Figure: Measuring similarity between a and b based on their neighbors. (sim(a1, b1) = 0.6, sim(a1, b2) = sim(a2, b1) = 0.1, sim(a2, b2) = 0.8.)
SimRank: sim(a, b) = Σ sim(ai, bj)/4 = 0.4. By dropping the most similar page-pair (a2, b2), sim(a, b) increases to sim(a1, b1)/1 = 0.6, which is counterintuitive: removing the most similar pair of neighbors should not make a and b more similar.
MatchSim: finds the maximum matching between their neighbors and takes the average similarity of the matched pairs as sim(a, b). Here, sim(a, b) = (sim(a1, b1) + sim(a2, b2))/2 = 0.7.

MatchSim Definition
W(a, b) is the sum of the similarities in the maximum matching between the neighbors of a and b, i.e., I(a) and I(b).

MatchSim Iteration
sim(a, b) is the fixed point of the iteration. Wk(a, b) is computed from the scores simk(*, *). The iteration starts with sim(a, b) = 1 for a = b and 0 otherwise.
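
The worked example and the iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The slide formulas did not survive in the transcript, so the normalization used here (W(a, b) divided by the larger of the two neighbor counts) is an assumption chosen to agree with the worked example, and pinning sim(a, a) = 1 in every round is likewise an assumption. The names `max_matching_weight`, `matchsim`, `pair_sim`, and the `neighbors` dictionary are hypothetical. The matching is found by brute force over permutations, which is only practical for a handful of neighbors.

```python
from itertools import permutations


def max_matching_weight(rows, cols, sim):
    """Largest total similarity over one-to-one pairings of rows with cols.

    Brute force over permutations; suitable for very small neighbor lists only.
    """
    if len(rows) > len(cols):
        # Match the shorter list into the longer one, keeping sim's argument order.
        return max_matching_weight(cols, rows, lambda x, y: sim(y, x))
    if not rows:
        return 0.0
    return max(
        sum(sim(x, y) for x, y in zip(rows, perm))
        for perm in permutations(cols, len(rows))
    )


def matchsim(neighbors, iterations=15):
    """Iterate similarity scores for every node pair.

    `neighbors` maps each node to the list of its neighbors, I(node).
    Assumption: the score is W(a, b) / max(|I(a)|, |I(b)|).
    """
    nodes = list(neighbors)
    # Start with sim(a, b) = 1 for a = b and 0 otherwise, as on the slide.
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        prev = sim
        nxt = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    nxt[(u, v)] = 1.0  # assumption: a node stays fully similar to itself
                    continue
                size = max(len(neighbors[u]), len(neighbors[v]))
                weight = max_matching_weight(
                    neighbors[u], neighbors[v], lambda x, y: prev[(x, y)]
                )
                nxt[(u, v)] = weight / size if size else 0.0
        sim = nxt
    return sim


# Worked example from the slide: neighbors a1, a2 of a and b1, b2 of b.
pair_sim = {
    ("a1", "b1"): 0.6, ("a1", "b2"): 0.1,
    ("a2", "b1"): 0.1, ("a2", "b2"): 0.8,
}
a_nbrs, b_nbrs = ["a1", "a2"], ["b1", "b2"]

# SimRank-style score as presented on the slide: average over all neighbor pairs.
simrank_score = sum(pair_sim.values()) / (len(a_nbrs) * len(b_nbrs))  # about 0.4

# MatchSim-style score: average over the matched pairs only.
matchsim_score = max_matching_weight(
    a_nbrs, b_nbrs, lambda x, y: pair_sim[(x, y)]
) / max(len(a_nbrs), len(b_nbrs))  # about 0.7
```

For larger neighborhoods the permutation search would be replaced by an assignment solver such as the Hungarian algorithm, since the number of permutations grows factorially with the neighbor count.
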

Properties
Symmetric: sim(a, b) = sim(b, a).
Bounded: 0 ≤ sim(a, b) ≤ 1.
Reaches 1 if and only if a and b are identical.

Convergence
Convergence has been proved theoretically. In experiments, the iteration converges within 15 iterations.

Experimental Evaluation
The Google Scholar (GS) Dataset: a crawled citation graph containing 20,000 papers and 87,717 citations. Ground truth: the "Related Articles" returned by Google Scholar.
The CSE Website (CW) Dataset: a crawled web graph containing 22,615 web pages and 120,947 hyperlinks. Ground truth: cosine TF-IDF similarity scores.

