1 Extrapolation to Speed-up Query-dependent Link Analysis Ranking Algorithms
Muhammad Ali Norozi
Department of Computer Science
Norwegian University of Science and Technology
Trondheim, Norway
mnorozi@idi.ntnu.no
FIT 2010, 21~23 Dec 2010

2 Challenges of IR
- Huge and dynamic document corpus
- Users with dynamic needs
- Efficiency in terms of computation
- Usage of limited resources
- Storage issues
- “Personalization”
- “Relevancy”
- Stability & scalability
- And so on…

3 Contributions
- Wide-ranging exploration of ideas
- Evaluation of convergence behavior, leading to an acceleration technique for query-dependent LAR: “Extrapolation”
- “Personalization”
- Experimentation

4 Theoretical Background

5 Link Analysis Ranking
- A link from page p to page q denotes an ‘endorsement’ or ‘vote’: page p considers page q an authority on a subject
- Mine or classify the webgraph of recommendations
- Assign an authority value to every page
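To make the “link as vote” reading concrete, here is a minimal Python sketch, assuming a tiny hypothetical webgraph stored as a dict from each page to the pages it links to. Counting incoming links (in-degree) is the simplest way to turn votes into authority values; PageRank, HITS and SALSA refine it by weighting each vote by the importance of the page that casts it.

```python
# Hypothetical toy webgraph: page -> pages it links to (its "votes").
WEBGRAPH = {
    "p": ["q", "r"],
    "q": ["r"],
    "r": ["p"],
    "s": ["r", "q"],
}

def indegree_scores(graph):
    """Authority value of a page = number of links (votes) it receives."""
    votes = {page: 0 for page in graph}  # pages with no in-links still get a score of 0
    for source, targets in graph.items():
        for target in targets:
            votes[target] = votes.get(target, 0) + 1
    return votes

print(indegree_scores(WEBGRAPH))  # {'p': 1, 'q': 2, 'r': 3, 's': 0}
```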

6 Webgraph

7 Family of LAR
- Query-independent: rank the whole Web
  - PageRank (Brin and Page 98)
- Query-dependent: rank a small subset of pages related to a specific query
  - HITS (Kleinberg 97-98)
  - SALSA (Lempel and Moran 2000)

8 PageRank
- A good page should be pointed to by good pages
- Random walk on the web graph:
  - pick a page at random
  - with probability α, follow a random outgoing link
  - with probability 1 - α, jump to a random page (‘Teleportation’)
- Ranking is done according to the stationary distribution of the random walk
- Example ranking from the slide figure: 1. Red page, 2. Purple page, 3. Yellow page, 4. Blue page, 5. Green page
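The random-surfer description above can be turned into a power iteration. This is a minimal sketch, not the thesis implementation: it assumes the graph is a dict of out-link lists, uses α = 0.85 as a conventional default, lets pages without out-links teleport uniformly, and stops on an L1-norm residual.

```python
def pagerank(graph, alpha=0.85, tol=1e-10, max_iter=1000):
    """Stationary distribution of the random surfer on `graph`.

    With probability alpha the surfer follows a random outgoing link;
    with probability 1 - alpha (or always, on a page with no out-links)
    it jumps to a page chosen uniformly at random.
    """
    pages = sorted(set(graph) | {q for targets in graph.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank mass on dangling pages is spread uniformly, like teleportation.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new = {p: (1.0 - alpha) / n + alpha * dangling / n for p in pages}
        for p in pages:
            targets = graph.get(p, [])
            if targets:
                share = alpha * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
        residual = sum(abs(new[p] - rank[p]) for p in pages)  # L1 norm
        rank = new
        if residual < tol:
            break
    return rank

scores = pagerank({"p": ["q", "r"], "q": ["r"], "r": ["p"], "s": ["r", "q"]})
print(sorted(scores, key=scores.get, reverse=True))
```

Page s receives no links, so it only collects teleportation mass and ends up last.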

9 Random Walks
- Random walks on graphs correspond to Markov chains
- The set of states S is the set of nodes of the graph G
- Entry P_ij of the transition probability matrix P is the probability that we follow an edge from node i to node j
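A small sketch of how a graph becomes a Markov chain, assuming the walker picks one of the current node's out-links uniformly at random. It uses exact fractions so the entries are easy to read; nodes with no out-links keep an all-zero row, which is exactly what the adjustments on the next slides have to repair.

```python
from fractions import Fraction

def transition_matrix(graph):
    """Row-stochastic P with P[i][j] = 1/outdeg(i) when node i links to node j.

    States are the graph's nodes; rows of nodes without out-links stay all-zero.
    """
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    P = [[Fraction(0)] * n for _ in range(n)]
    for u, targets in graph.items():
        for v in targets:
            P[index[u]][index[v]] += Fraction(1, len(targets))
    return nodes, P

nodes, P = transition_matrix({"a": ["b", "c"], "b": ["c"], "c": []})
for node, row in zip(nodes, P):
    print(node, [str(x) for x in row])
```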

10 Example
The above system is a familiar problem in linear algebra: finding the ’eigenvector’ of matrix A.
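The slide's system itself was an image. Assuming A is the row-stochastic transition matrix of the random walk and π the ranking vector, the standard stationarity condition it corresponds to can be written as:

```latex
\pi^{T} A = \pi^{T}, \qquad \sum_{i} \pi_i = 1, \qquad \pi_i \ge 0
\quad\Longleftrightarrow\quad
A^{T} \pi = 1 \cdot \pi
```

So π is the eigenvector of A^T belonging to the dominant eigenvalue λ1 = 1, normalized to sum to one.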

11 Adjustments
- Reducibility adjustment
- Uniqueness (using the ’Power Method’)
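The slide only names the adjustments. The sketch below assumes the usual textbook formulation (as in Langville and Meyer): dangling rows are replaced by a uniform row, then teleportation is mixed in with weight 1 - α. The resulting matrix is strictly positive, so by Perron-Frobenius its stationary vector is unique and the power method converges to it.

```python
def google_matrix(P, alpha=0.85):
    """Apply the two standard PageRank adjustments to a transition matrix P.

    1. Stochasticity: an all-zero row (dangling page) becomes uniform 1/n.
    2. Irreducibility: G = alpha * S + (1 - alpha) / n on every entry
       (teleportation), which makes every entry positive.
    """
    n = len(P)
    S = [row[:] if sum(row) > 0 else [1.0 / n] * n for row in P]
    return [[alpha * s + (1.0 - alpha) / n for s in row] for row in S]

def power_method(G, tol=1e-12, max_iter=10_000):
    """Iterate x^T <- x^T G from the uniform vector until the L1 change is below tol."""
    n = len(G)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(x[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]  # third page is dangling
G = google_matrix(P)
pi = power_method(G)
print([round(v, 4) for v in pi])
```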

12 HITS (Hypertext Induced Topic Search)
- Authority is not necessarily transferred directly between authorities
- Pages have a double identity:
  - hub identity
  - authority identity
- Good hubs point to good authorities
- Good authorities are pointed to by good hubs
- ‘Mutual reinforcement relationship’

13 Focused Subgraph
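The slide shows the focused subgraph only as a figure. As a hedged sketch of Kleinberg's construction as it is usually described: start from a root set (the top results of a text search for the query, taken as given here), add every page a root page links to, and add at most d pages that link to each root page. The `out_links`/`in_links` dicts are hypothetical inputs, and the default d = 50 follows the value commonly quoted from Kleinberg's paper; treat it as an assumption.

```python
def focused_subgraph(root_set, out_links, in_links, d=50):
    """Expand a query's root set into a base set and return its induced links."""
    base = set(root_set)
    for page in root_set:
        base.update(out_links.get(page, []))           # everything the root page points to
        base.update(list(in_links.get(page, []))[:d])  # at most d pages pointing to it
    # Keep only links whose both endpoints lie inside the base set.
    edges = {p: [q for q in out_links.get(p, []) if q in base] for p in base}
    return base, edges

out_links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
in_links = {"a": ["c", "d"], "b": ["a"], "c": ["b"]}
base, edges = focused_subgraph({"a"}, out_links, in_links, d=1)
print(sorted(base), edges["b"], edges["c"])
```

With d = 1 only one in-linking page of "a" is admitted, so "d" stays outside the base set.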

14 Mutual Reinforcement
- Initialize all weights to 1
- Repeat until convergence:
  - O operation: hubs collect the weight of the authorities
  - I operation: authorities collect the weight of the hubs
  - Normalize weights under some norm
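The loop above can be sketched as follows. Assumptions: the graph is a dict of out-link lists, the "some norm" is the L2 norm, and the I operation is applied before the O operation in each round (as in Kleinberg's paper); swapping the order only shifts the iteration by half a step.

```python
import math

def hits(graph, tol=1e-10, max_iter=1000):
    """Mutual-reinforcement iteration; returns (authority, hub) weights."""
    nodes = sorted(set(graph) | {q for qs in graph.values() for q in qs})
    hub = {p: 1.0 for p in nodes}   # initialize all weights to 1
    auth = {p: 1.0 for p in nodes}
    for _ in range(max_iter):
        # I operation: authorities collect the weight of the hubs pointing to them.
        new_auth = {p: 0.0 for p in nodes}
        for p, qs in graph.items():
            for q in qs:
                new_auth[q] += hub[p]
        # O operation: hubs collect the weight of the authorities they point to.
        new_hub = {p: sum(new_auth[q] for q in graph.get(p, [])) for p in nodes}
        for vec in (new_auth, new_hub):  # L2 normalization
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for p in vec:
                vec[p] /= norm
        delta = sum(abs(new_auth[p] - auth[p]) + abs(new_hub[p] - hub[p]) for p in nodes)
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return auth, hub

auth, hub = hits({"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]})
print(round(auth["a1"], 4), round(auth["a2"], 4), round(hub["h1"], 4), round(hub["h3"], 4))
```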

15 HITS & SVD (Singular Value Decomposition)
- The iterative HITS equations
- SVD
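A short derivation of the link between the two, assuming A is the adjacency matrix of the focused subgraph (A_ij = 1 if page i links to page j) and a, h are the authority and hub vectors. The I operation is a = A^T h and the O operation is h = A a, so, up to normalization:

```latex
a^{(k)} \propto A^{T} h^{(k-1)}, \qquad h^{(k)} \propto A\, a^{(k)}
\;\Longrightarrow\;
a^{(k)} \propto (A^{T} A)\, a^{(k-1)}, \qquad h^{(k)} \propto (A A^{T})\, h^{(k-1)}

A = U \Sigma V^{T}
\;\Longrightarrow\;
A^{T} A = V \Sigma^{2} V^{T}, \qquad A A^{T} = U \Sigma^{2} U^{T}
```

HITS is therefore the power method on A^T A and A A^T: when σ1 > σ2, the authority vector converges to the first right singular vector v1 and the hub vector to the first left singular vector u1.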

16 SALSA (Stochastic Approach for Link-Structure Analysis)
- Blends the ideas of HITS and PageRank: the graph is small, as in HITS, and there are random walks, as in PageRank
- Two random walks, alternating between hubs and authorities
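A minimal sketch of SALSA's authority walk, assuming the focused subgraph is a dict of out-link lists without duplicate links. One step goes backward from an authority to a uniformly chosen hub that links to it, then forward to a uniformly chosen authority that hub links to; the stationary distribution is found by power iteration.

```python
def salsa_authorities(graph, tol=1e-12, max_iter=10_000):
    """Stationary distribution of SALSA's authority random walk."""
    in_links = {}
    for h, targets in graph.items():
        for a in targets:
            in_links.setdefault(a, []).append(h)
    auths = sorted(in_links)  # authority side: pages with at least one in-link
    score = {a: 1.0 / len(auths) for a in auths}
    for _ in range(max_iter):
        new = {a: 0.0 for a in auths}
        for i in auths:
            back = score[i] / len(in_links[i])   # backward step to a hub
            for h in in_links[i]:
                fwd = back / len(graph[h])       # forward step to an authority
                for j in graph[h]:
                    new[j] += fwd
        if sum(abs(new[a] - score[a]) for a in auths) < tol:
            return new
        score = new
    return score

s = salsa_authorities({"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]})
print({a: round(v, 6) for a, v in s.items()})
```

On this connected toy graph the scores come out proportional to in-degree (3 and 2, i.e. 0.6 and 0.4), which matches Lempel and Moran's observation about SALSA on connected components.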

17 Hub & Authority Graphs

18 Evaluations and Analyses

19 Extrapolation
- Constructing new data points outside a discrete set of known data points
- Fast convergence and quick response time are crucial in query-dependent algorithms
- Using the properties of Markov chains we can formulate extrapolation, relying largely on the fact that the dominant eigenvalue of a Markov matrix is λ1 = 1
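To show where λ1 = 1 enters, here is a short derivation in the style of Kamvar et al.'s Aitken extrapolation for PageRank. It assumes the power iteration x^{(k+1)} = A^T x^{(k)} and that an iterate is a combination of only the first two eigenvectors u_1 (eigenvalue 1) and u_2 (eigenvalue λ_2); the coefficient name β_2 is ours. All products, squares and divisions of vectors are componentwise.

```latex
x^{(k-2)} = u_1 + \beta_2 u_2, \quad
x^{(k-1)} = u_1 + \beta_2 \lambda_2 u_2, \quad
x^{(k)} = u_1 + \beta_2 \lambda_2^{2} u_2

g = x^{(k-1)} - x^{(k-2)} = \beta_2 (\lambda_2 - 1)\, u_2, \qquad
h = x^{(k)} - 2x^{(k-1)} + x^{(k-2)} = \beta_2 (\lambda_2 - 1)^{2}\, u_2

\frac{g^{2}}{h} = \beta_2 u_2
\quad\Longrightarrow\quad
u_1 = x^{(k-2)} - \frac{g^{2}}{h}
```

Because λ1 = 1, the u_1 component does not change between iterates, so it cancels in g and h, and three successive iterates are enough to solve for it.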

20 Extrapolation on PageRank by Kamvar et al.

21 Techniques
- Aitken Δ²
  - Fixed-point iteration
  - Assumption
- Quadratic
  - Assumption
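A minimal sketch of componentwise Aitken Δ² extrapolation applied to power-method iterates. The helper names are hypothetical, and the two-state chain is chosen because its error decays exactly like λ2^k, so the Aitken assumption holds and the extrapolated vector equals the stationary vector. On real PageRank data, Kamvar et al. apply such a step only occasionally during the iteration, and the quadratic variant (not shown) relaxes the two-eigenvector assumption.

```python
def aitken_extrapolate(x0, x1, x2):
    """Componentwise Aitken Delta^2 on iterates x^(k-2), x^(k-1), x^(k).

    Components whose second difference is (numerically) zero are taken
    from x2 unchanged.
    """
    out = []
    for a, b, c in zip(x0, x1, x2):
        g = b - a
        h = c - 2.0 * b + a
        out.append(a - g * g / h if abs(h) > 1e-15 else c)
    return out

def step(x, P):
    """One power-method step x^T <- x^T P for a row-stochastic P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.7, 0.3],
     [0.2, 0.8]]  # stationary vector [0.4, 0.6], lambda_2 = 0.5
x0 = [1.0, 0.0]
x1 = step(x0, P)
x2 = step(x1, P)
print(aitken_extrapolate(x0, x1, x2))  # approximately [0.4, 0.6]
```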

22 An Example
- Extrapolation in an experiment for the query ”computational complexity”

23 Insight into Extrapolation
- A new premise
- Automated manipulation of extrapolation parameters
- Hybrid extrapolation

24 Implications & Conclusions

25 Implications & Future Work
- Study of the graph structure of the Web
  - Bow-tie structure
  - Power-law distribution
  - Clustering or classification
- A lot more can be done in extrapolation
  - Convergence: any other measure could be used instead of just the L1 norm
  - Hybrid extrapolation
  - Extrapolation for personalization
- Personalization is a current and future active topic in Information Retrieval
  - Active utilization of usage data from web-server logs
  - Spreading activation: propagate and generalize users’ preferences
  - Structural retrieval model with nested structures
- IR is a difficult task, but believe me, it is very interesting and rewarding

26 Recommendations
- Prior knowledge of subjects like:
  - Linear Algebra
  - Combinatorial Optimization
  - Compiler Construction
  - Finite Automata
  - AI
- A “good” dataset is crucial
- An evaluation framework, such as the work by Tsaparas et al.
- The task is difficult but achievable. And rewarding!

27 Interesting Resources
- The Structure of Information Networks (Jon Kleinberg): http://www.cs.cornell.edu/Courses/cs685/2002fa/
- Centre for Complex Network Research (CCNR): http://www.nd.edu/~networks/index.htm, http://www.barabasilab.com/
- Personalization: http://www.kamvar.org/personalization/
- Learning to Rank for Information Retrieval: http://research.microsoft.com/users/LR4IR-2008/
- Statistical Cybermetrics: http://cybermetrics.wlv.ac.uk/database/
- Web Research Collections (TREC Web, Terabyte & Blogs Tracks): http://ir.dcs.gla.ac.uk/test_collections/
- Amy N. Langville: http://math.cofc.edu/langvillea/
- Sep Kamvar: http://www.kamvar.org/
- Panayiotis Tsaparas: http://research.microsoft.com/users/panats
- Tie-Yan Liu: http://research.microsoft.com/users/tyliu/index.html
- David Gleich: http://www.stanford.edu/~dgleich/
- Searching Stanford: http://www.stanford.edu/search/
- And the references in the thesis

28 Thank You!
Q & A
You can send feedback and questions to: mnorozi@idi.ntnu.no