1 Mining the Web to Determine Similarity Between Words, Objects, and Communities Author : Mehran Sahami Reporter : Tse Ho Lin 2007/9/10 FLAIRS, 2006.



2 Outline: Motivation; Objectives; Methodology (Words, Objects, Communities); Experiments; Conclusion; Personal Comments

3 Motivation. Words: many similarity measures are term-wise, so Cos(“space exploration”, “NASA”) = 0 because the two texts share no terms. Objects: users may be looking for the same item sold by different vendors on the web. Communities: users seek others with similar interests.

4 Objectives. We begin by describing a robust method for measuring the semantic similarity between short texts. We then examine the use of machine learning to produce similarity functions between semi-structured data elements. Finally, we measure the similarity between online communities of users as part of a recommendation system.

5 Methodology – Words. Query x → retrieved documents → compute the TFIDF term vector v_i of each document → truncate each vector to its top m weighted terms. Query y is processed the same way.
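The steps on this slide can be sketched in code. The following is a minimal illustration, not the author's implementation: the retrieved documents are passed in as plain strings (no real search engine is called), IDF is computed over the retrieved set only, and the final centroid, L2 normalization, and dot product are assumptions added to turn the truncated vectors into a single score. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """One TF-IDF dict per document; IDF is computed over `docs` only (an assumption)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: tf * math.log(1 + n / df[t]) for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def truncate(vec, m):
    """Keep only the m highest-weighted terms of a vector."""
    return dict(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:m])


def l2_normalize(vec):
    """Scale a sparse vector to unit length; an all-zero vector becomes empty."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def expand(docs, m=50):
    """Represent a query by the normalized centroid of its truncated document vectors."""
    centroid = Counter()
    for vec in tfidf_vectors(docs):
        for t, w in l2_normalize(truncate(vec, m)).items():
            centroid[t] += w
    return l2_normalize(centroid)


def similarity(docs_x, docs_y, m=50):
    """Dot product of the two expanded query representations."""
    qx, qy = expand(docs_x, m), expand(docs_y, m)
    return sum(w * qy.get(t, 0.0) for t, w in qx.items())
```

With this sketch, two queries that share no terms can still score above zero as long as their retrieved documents share vocabulary, which is the gap in term-wise measures that the Motivation slide points to.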

6 Methodology – Objects
Record 1. Product Name: Me Talk Pretty One Day Paperback Edition; ISBN: 0316776963; Categorization: Books
Record 2. Product Name: The Tiny Book of Boss Jokes; ISBN: 0007152604; Categorization: Books
Steps: compute similarity between fields → train the parameters → clustering
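As a rough illustration of the first two steps, the sketch below computes one similarity value per field and combines them with a weighted logistic function. The slide does not say which field similarities or which learner are used, so the token-overlap and exact-match measures, the weights, and the bias are all hypothetical stand-ins for whatever the trained model would provide.

```python
import math


def jaccard(a, b):
    """Token-set overlap between two strings (one possible field similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def field_similarities(p, q):
    """One similarity value per field: name, ISBN, categorization."""
    return [
        jaccard(p["name"], q["name"]),
        1.0 if p["isbn"] == q["isbn"] else 0.0,
        1.0 if p["category"] == q["category"] else 0.0,
    ]


def match_score(p, q, weights, bias):
    """Logistic combination of field similarities; weights stand in for trained parameters."""
    z = bias + sum(w * s for w, s in zip(weights, field_similarities(p, q)))
    return 1.0 / (1.0 + math.exp(-z))


# The two records shown on the slide.
a = {"name": "Me Talk Pretty One Day Paperback Edition",
     "isbn": "0316776963", "category": "Books"}
b = {"name": "The Tiny Book of Boss Jokes",
     "isbn": "0007152604", "category": "Books"}

# Hypothetical values; in the described method these would come from training.
WEIGHTS, BIAS = [2.0, 4.0, 0.5], -3.0

print(field_similarities(a, b))
print(match_score(a, b, WEIGHTS, BIAS))
```

Pairwise scores of this kind could then feed the clustering step, for example by grouping records whose score exceeds a chosen threshold.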

7 Methodology – Communities. Joachims’ Combine Ranking. B, R: communities.

8 Experiments: Words, Objects

9 Experiments – Communities. m: the user is already a member of the recommended community. n: the user visits but does not join the recommended community. j: the user joins the recommended community. Similarity measures: L2, MI1, MI2, IDF, L1, LogOdds.
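The slides name the measures but do not give their formulas. As a hedged sketch, the two norm-based measures can be written as set-overlap scores between a base community B and a candidate community R, each treated as a set of member IDs. The definitions below are one common formulation and are an assumption, as are the function names and the sample data.

```python
import math


def l1_similarity(B, R):
    """Overlap normalized by the product of community sizes (assumed L1 form)."""
    return len(B & R) / (len(B) * len(R)) if B and R else 0.0


def l2_similarity(B, R):
    """Overlap normalized by the geometric mean of community sizes (assumed L2 form)."""
    return len(B & R) / math.sqrt(len(B) * len(R)) if B and R else 0.0


# Made-up membership sets for illustration only.
base = {"u1", "u2", "u3", "u4"}
candidate = {"u3", "u4", "u5", "u6"}

print(l1_similarity(base, candidate))
print(l2_similarity(base, candidate))
```

Under these definitions, L1 penalizes large communities more heavily than L2, so the two measures can rank the same candidate communities differently.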

10 Conclusion. In this paper we have presented several web-based applications where measuring the similarity between different entities is an important element for success.

11 Personal Comments. Application: similarity measures, record linkage. Advantage: the proposed approaches use the large quantity of information available online. Drawback: the author does not compare against other related methods in the experiments.

12 Parameters Training

