Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.


1 Topic-Sensitive PageRank Taher H. Haveliwala

2 PageRank Importance is propagated A global ranking vector is pre-computed

3 PageRank

4 Topic-Sensitive PageRank: Basic idea. For each topic, an importance score is computed for every page. The composite score of a page is calculated by combining the page's per-topic scores according to the topics of the query.

5 Topic-Sensitive PageRank: ODP-Biasing. The top-level categories of the Open Directory (16 topics) are used. Let T_j be the set of URLs in the ODP category c_j. In computing the PageRank vector for topic c_j, we replace the uniform damping vector with the non-uniform vector v_j, where v_ji = 1/|T_j| if i ∈ T_j and v_ji = 0 otherwise. The resulting PageRank vector for topic c_j will be referred to as [notation not preserved in the transcript].
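
The ODP-biasing step can be illustrated with a small power-iteration sketch. It assumes the biased vector puts weight 1/|T_j| on each page in T_j and zero elsewhere. The function name, the `alpha` teleport probability, the fixed iteration count, and the handling of pages without out-links are illustrative assumptions, not details taken from the slides.

```python
def topic_sensitive_pagerank(out_links, topic_pages, alpha=0.25, iterations=100):
    """Power iteration whose random jumps land only on pages in topic_pages.

    out_links: dict mapping each page to a list of pages it links to
               (every linked page must also be a key).
    alpha:     probability of jumping instead of following a link (assumed value).
    """
    pages = sorted(out_links)
    # Biased jump vector: 1/|T_j| on topic pages, 0 everywhere else.
    v = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}
    rank = dict(v)
    for _ in range(iterations):
        nxt = {p: alpha * v[p] for p in pages}
        for p in pages:
            targets = out_links[p]
            if targets:
                share = (1 - alpha) * rank[p] / len(targets)
                for q in targets:
                    nxt[q] += share
            else:
                # Assumption: a page with no out-links jumps according to v.
                for q in pages:
                    nxt[q] += (1 - alpha) * rank[p] * v[q]
        rank = nxt
    return rank


# Hypothetical four-page graph with two single-page "topics".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"], "d": ["c"]}
rank_topic_a = topic_sensitive_pagerank(graph, {"a"})
rank_topic_d = topic_sensitive_pagerank(graph, {"d"})
```

In this toy graph page "d" has no in-links, so it only receives score when the jump vector is biased toward it.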

6 Topic-Sensitive PageRank: We chose to make P(c_j) uniform.
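
One way to picture the query-time combination is sketched below. It assumes the topic probabilities for a query come from a naive Bayes-style unigram model with a uniform prior P(c_j), and that the composite score is the probability-weighted sum of per-topic ranks. The function names, the smoothing constant, and the toy numbers are all hypothetical.

```python
import math


def topic_probabilities(query_terms, term_probs, smoothing=1e-6):
    """Estimate P(c_j | query) with a uniform prior over topics.

    term_probs: dict topic -> dict term -> P(term | topic).
    Unseen terms get a small smoothing probability (an assumption).
    """
    log_scores = {
        topic: sum(math.log(probs.get(term, smoothing)) for term in query_terms)
        for topic, probs in term_probs.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def composite_score(page, topic_weights, topic_ranks):
    """Weighted sum of the page's per-topic rank scores."""
    return sum(w * topic_ranks[c].get(page, 0.0) for c, w in topic_weights.items())


# Toy data: two topics, two pages.
term_probs = {
    "sports": {"jaguar": 0.01, "team": 0.05},
    "autos": {"jaguar": 0.03, "engine": 0.04},
}
topic_ranks = {
    "sports": {"p1": 0.5, "p2": 0.1},
    "autos": {"p1": 0.1, "p2": 0.6},
}
weights = topic_probabilities(["jaguar", "engine"], term_probs)
score_p1 = composite_score("p1", weights, topic_ranks)
score_p2 = composite_score("p2", weights, topic_ranks)
```

Because "engine" is far more likely under the autos topic, the query weight shifts almost entirely to autos and page p2 outranks p1.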

7 Topic-Sensitive PageRank

8 Experiment

9 Experimental Results: Similarity Measure for Induced Rankings. Overlap of two sets A and B (each of size k) = |A ∩ B| / k, with k = 20. Kendall's distance measure.
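
The two similarity measures can be sketched as follows. The overlap function assumes both top-k sets have the same size k. The Kendall function is the standard normalized Kendall tau distance for two rankings of the same items; it is a simplification and does not handle lists with different members, which the measure used in the experiments may do.

```python
from itertools import combinations


def overlap(top_a, top_b):
    """Fraction of items shared by two top-k lists of equal length k."""
    k = len(top_a)
    return len(set(top_a) & set(top_b)) / k


def kendall_distance(rank_a, rank_b):
    """Fraction of item pairs that the two rankings order differently.

    Both rankings must contain exactly the same items.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    discordant = sum(
        1
        for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)
```

A distance of 0 means the rankings agree on every pair; a distance of 1 means one ranking is the reverse of the other.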

10 Experimental Results

11

12

13

14

15 Query-Sensitive Scoring: User Study. 10 queries (randomly selected from our test set); 5 volunteers. For each query, the volunteer was shown 2 result rankings: 1. the top 10 results ranked with the unbiased PageRank vector; 2. the top 10 results ranked with the topic-sensitive PageRank vector.

16 Experimental Results: User Study (cont'd). The volunteer was asked to 1. select all URLs that were “relevant” to the query; 2. select the ranking that was better. (They were not told anything about how either of the rankings was generated.)

17 Experimental Results

18

19 Context-Sensitive Scoring

20 Experimental Results

21 Other issues: Search Context. Hierarchical directory; users' browsing patterns; bookmarks; email archives.

22 Other issues. Flexibility: applies to any kind of context. Transparency: tune the classifier used on the search context, or adjust topic weights. Privacy: a client-side program could use the user context to generate the user profile locally. Efficiency: the query-time cost and the offline preprocessing cost are low.

23 Automatic Identification of User Interest for Personalized Search. Feng Qiu, Junghoo Cho

24 User Preference Representation: Topic Preference Vector. T = [T(1), …, T(m)], where T(i) represents the user's degree of interest in the i-th topic.

25 User Preference Representation

26 User Model: Topic-Driven Random Surfer Model. The user browses the web in a two-step process. First, the user chooses a topic of interest t for the ensuing sequence of random walks with probability T(t). Then, with equal probability, she jumps to one of the pages on topic t. Starting from this page, the user performs a random walk such that at each step, with probability d, she randomly follows an out-link on the current page; with the remaining probability 1-d she gets bored, picks a new topic of interest for the next sequence of random walks based on T, and jumps to a page on the chosen topic. This process is repeated forever.
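
The surfer's behaviour can be approximated with a short Monte Carlo simulation. This is a sketch only: the function name, the default d = 0.85, the finite step count, and treating a page with no out-links as a forced topic jump are assumptions made for illustration.

```python
import random


def simulate_topic_driven_surfer(out_links, topic_pages, T, d=0.85,
                                 steps=100_000, seed=0):
    """Estimate visit frequencies under the topic-driven random surfer model.

    out_links:   dict page -> list of linked pages.
    topic_pages: dict topic -> list of pages on that topic.
    T:           dict topic -> preference weight (should sum to 1).
    """
    rng = random.Random(seed)
    topics = list(T)
    weights = [T[t] for t in topics]
    visits = {page: 0 for page in out_links}

    def jump():
        # Pick a topic with probability T(t), then a page on it uniformly.
        topic = rng.choices(topics, weights=weights)[0]
        return rng.choice(topic_pages[topic])

    page = jump()
    for _ in range(steps):
        visits[page] += 1
        links = out_links[page]
        if links and rng.random() < d:
            page = rng.choice(links)   # follow a random out-link
        else:
            page = jump()              # bored (or stuck): pick a new topic
    return {p: count / steps for p, count in visits.items()}


# Two disconnected two-page clusters, one per hypothetical topic.
links = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
on_topic = {"x": ["a"], "y": ["c"]}
freq = simulate_topic_driven_surfer(links, on_topic, {"x": 1.0, "y": 0.0},
                                    steps=2000)
```

With all preference weight on topic "x", the surfer never reaches the pages of topic "y" in this toy graph.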

27 User Model: Topic-Driven Searcher Model. The user always visits web pages through a search engine in a two-step process. First, the user chooses a topic of interest t with probability T(t). Then the user goes to the search engine and issues a query on the chosen topic t. The search engine returns pages ranked by TSPR_t(p), on which the user clicks.

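28 User Model: Relationship between V and T, under the Topic-Driven Random Surfer Model and under the Topic-Driven Searcher Model [formulas not preserved in the transcript].

29 Learning Topic Preference Vector: Problem. Given V and TSPR_i, find T that satisfies [equation not preserved in the transcript].

30 Learning Topic Preference Vector. Linear regression: minimize the square-root error. Maximum likelihood estimator: [formula not preserved in the transcript], where [symbol not preserved] = the probability that the user visits the page p.

31 Ranking Search Results Using Topic Preference Vectors: Ranking of page p = [formula not preserved in the transcript], because [derivation not preserved in the transcript].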
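
A least-squares version of the learning step can be sketched as follows. It assumes the linear relation V(p) = Σ_i T(i) · TSPR_i(p), which is what the random-surfer view suggests, and solves the normal equations directly. The function name, the clipping of negative weights, and the final normalization are illustrative assumptions; the sketch also assumes the TSPR vectors are linearly independent, otherwise the elimination divides by zero.

```python
def estimate_topic_preferences(visit_probs, tspr):
    """Least-squares estimate of T from V(p) ~ sum_i T(i) * TSPR_i(p).

    visit_probs: dict page -> observed visit probability V(p).
    tspr:        list of dicts; tspr[i][p] is TSPR_i(p).
    """
    m = len(tspr)
    pages = list(visit_probs)
    # Normal equations: (X^T X) T = X^T V, with X[p][i] = TSPR_i(p).
    A = [[sum(tspr[i].get(p, 0.0) * tspr[j].get(p, 0.0) for p in pages)
          for j in range(m)] for i in range(m)]
    b = [sum(tspr[i].get(p, 0.0) * visit_probs[p] for p in pages)
         for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    T = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(A[r][c] * T[c] for c in range(r + 1, m))
        T[r] = (b[r] - tail) / A[r][r]
    # Assumption: clip negative weights and renormalize to sum to 1.
    T = [max(t, 0.0) for t in T]
    total = sum(T)
    return [t / total for t in T]


# Toy check: visit probabilities built from a known preference vector.
tspr = [{"a": 0.6, "b": 0.3, "c": 0.1}, {"a": 0.1, "b": 0.2, "c": 0.7}]
true_T = [0.75, 0.25]
V = {p: sum(true_T[i] * tspr[i][p] for i in range(2)) for p in "abc"}
estimated_T = estimate_topic_preferences(V, tspr)
```

On noise-free data such as this toy example the estimate recovers the generating vector; with real click histories the fit is only approximate.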

32 Evaluation Metrics: Accuracy of topic preference vector. T_e is our estimate based on the user's click history; T is the user's actual topic preference vector.

33 Evaluation Metrics: Accuracy of personalized ranking. Kendall distance between two lists [symbols not preserved in the transcript]: the sorted list of top-k pages based on the estimated personalized ranking scores, and the sorted list of top-k pages computed from the user's true preference vector.

34 Evaluation Metrics: Improvement in search quality. Average rank of relevant pages in the search result, where S denotes the set of pages the user u selected and R(p) is the rank of page p.
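
The average-rank metric can be computed as in the sketch below, assuming ranks are 1-based positions in the result list and every selected page appears in that list. The function name is hypothetical.

```python
def average_rank(ranked_results, selected):
    """Mean 1-based position R(p) of the selected pages S in the result list."""
    position = {page: i + 1 for i, page in enumerate(ranked_results)}
    return sum(position[p] for p in selected) / len(selected)
```

A lower value means the pages the user selected sit closer to the top of the ranking, so a personalized ranking improves search quality when it lowers this number.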

35 Experiments: User Study. 10 subjects in the UCLA Computer Science Department; 04/2004 – 10/2004 (6 months). Data: queries to Google, results, and clicked URLs. Average number of queries per subject = 255.6; average number of clicks per query = 0.91.

36 Experiments: Accuracy of Learning Method. Synthetic dataset generated by simulation based on our topic-driven searcher model. Generation of the topic preference vector: randomly choose K topics and assign a random weight to each; the weights of the other topics are set to zero; the vector is then normalized. Generation of the click history: use the generated topic preference vector to generate clicks according to the visit probability distribution dictated by the topic-driven searcher model.
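
The synthetic-data procedure can be sketched like this. The visit probability distribution is passed in by the caller because its exact form under the searcher model is not reproduced here; the function names and the choice of uniform random weights are assumptions.

```python
import random


def random_topic_preferences(num_topics, k, rng):
    """Pick k topics at random, give each a random weight, zero the rest, normalize."""
    chosen = rng.sample(range(num_topics), k)
    weights = [0.0] * num_topics
    for i in chosen:
        weights[i] = 1.0 - rng.random()  # value in (0, 1], never exactly zero
    total = sum(weights)
    return [w / total for w in weights]


def sample_clicks(visit_probs, n, rng):
    """Draw n clicked pages from a given visit probability distribution."""
    pages = list(visit_probs)
    return rng.choices(pages, weights=[visit_probs[p] for p in pages], k=n)


rng = random.Random(1)
T = random_topic_preferences(16, 3, rng)
clicks = sample_clicks({"p1": 0.7, "p2": 0.2, "p3": 0.1}, 50, rng)
```

Feeding such generated click histories back into the learning method and comparing the estimate with the generating vector is what makes the accuracy measurable.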

37 Experiments  Accuracy of estimated topic preference vector

38 Experiments  Accuracy of estimated topic preference vector

39 Experiments Accuracy of Personalized PageRank

40 Experiments Accuracy of Personalized PageRank

41 Experiments Quality of Personalized Search

42 Experiments Quality of Personalized Search

43 Conclusion: Proposed a framework for personalizing web search using the user's search history and TSPR. Conducted both theoretical and real-life experiments to evaluate the approach.

44 Thank you

