INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID


1 INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Lecture # 37: Markov chains

2 ACKNOWLEDGEMENTS The material for this lecture has been drawn from the following sources:
- "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
- "Managing Gigabytes" by Ian H. Witten, Alistair Moffat, and Timothy C. Bell
- "Modern Information Retrieval" by Ricardo Baeza-Yates
- "Web Information Retrieval" by Stefano Ceri, Alessandro Bozzon, and Marco Brambilla

3 Outline
- Markov chains
- Ergodic Markov chains
- Markov chains with teleporting
- Query processing
- Personalized PageRank

4 Markov chains A Markov chain consists of n states, plus an n × n transition probability matrix P. At each step, we are in exactly one of the states. For 1 ≤ i, j ≤ n, the matrix entry P_ij tells us the probability that j is the next state, given that we are currently in state i. Self-loops are allowed: P_ii > 0 is OK.

5 Markov chains Clearly, for every state i the outgoing probabilities must sum to one: Σ_j P_ij = 1.
Markov chains are abstractions of random walks. [The slide shows a simple Markov chain with three nodes; the numbers on the links are the transition probabilities.] In a Markov chain, the probability distribution of the next step depends only on the current state, not on how the chain arrived at it.
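The definition above can be sketched in a few lines of Python. The three-node figure did not survive this transcript, so the transition probabilities below are assumed for illustration:

```python
# A Markov chain as an n x n transition matrix P, where P[i][j] is the
# probability of moving from state i to state j. The values are assumed
# for illustration (the slide's three-node figure is not preserved here).
P = [
    [0.0, 0.5, 0.5],   # from state 0
    [1.0, 0.0, 0.0],   # from state 1
    [0.0, 1.0, 0.0],   # from state 2
]

def is_stochastic(P, tol=1e-9):
    """Check the defining property: every row sums to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in P)

print(is_stochastic(P))  # True
```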

6 Ergodic Markov chains For any ergodic Markov chain, there is a unique long-term visit rate for each state: the steady-state probability distribution. Over a long time period, we visit each state in proportion to this rate, and it does not matter where we start.
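The steady-state distribution can be computed by repeatedly applying the transition matrix until the visit rates stop changing (power iteration). A minimal sketch, reusing an assumed three-node chain:

```python
def steady_state(P, iters=1000):
    """Long-term visit rates of an ergodic chain, found by repeatedly
    applying x <- xP; for an ergodic chain the result is the same
    whatever starting distribution we pick."""
    n = len(P)
    x = [1.0 / n] * n                      # start anywhere, e.g. uniform
    for _ in range(iters):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x

# Example chain (assumed values; irreducible and aperiodic, hence ergodic).
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
pi = steady_state(P)
```

For this chain the iteration settles at pi = (0.4, 0.4, 0.2), which indeed satisfies pi = pi·P.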

8 Markov Chain with Teleporting: Random Surfer Model
To make the web-graph chain ergodic, convert its 0/1 adjacency matrix into a transition matrix: if a row has no 1's, replace each of its entries by 1/N; otherwise, divide each 1 by the number of 1's in its row; then multiply the resulting matrix by 1 − α; and finally add α/N to every entry.
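The four-step construction hinted at on this slide (handle empty rows, divide, multiply, add) can be sketched as follows; the teleport probability alpha and the small adjacency matrix are assumed values:

```python
def teleport_matrix(adj, alpha=0.1):
    """Build the random-surfer transition matrix from a 0/1 adjacency
    matrix:
      1. if a row has no out-links, spread its mass uniformly (1/N each);
      2. otherwise divide each 1 by the number of 1's in its row;
      3. multiply every entry by 1 - alpha;
      4. add alpha / N to every entry (the teleport step).
    """
    n = len(adj)
    P = []
    for row in adj:
        out = sum(row)
        base = [1.0 / n] * n if out == 0 else [v / out for v in row]
        P.append([(1 - alpha) * b + alpha / n for b in base])
    return P

adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]          # page 2 has no out-links (a dead end)
P = teleport_matrix(adj)
```

Every entry of the result is strictly positive, so the chain is ergodic and a unique steady state exists.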

9 [Worked example on a three-node graph; the figure is not preserved in this transcript.]

10 Query Processing
Compute PageRank for all pages.
Run the query.
Sort the retrieved pages by their PageRank.
Integrate PageRank and relevance to produce the final result.
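A sketch of the final step above. The slide does not specify how PageRank and relevance are integrated, so the linear combination and the weight w here are illustrative assumptions:

```python
def rank_results(hits, pagerank, w=0.5):
    """Blend a query-dependent relevance score with the static,
    query-independent PageRank. `hits` maps page -> relevance score;
    the weighted sum is one common (assumed) way to integrate the two."""
    return sorted(hits,
                  key=lambda p: w * hits[p] + (1 - w) * pagerank.get(p, 0.0),
                  reverse=True)

pagerank = {"a": 0.5, "b": 0.3, "c": 0.2}   # precomputed for all pages
hits = {"b": 0.9, "c": 0.8}                 # pages matching the query
print(rank_results(hits, pagerank))         # ['b', 'c']
```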

11 Specialized PageRanks

12 Personalized PageRank
PageRank can be biased (personalized) by changing E to a non-uniform distribution, restricting "random jumps" to a set of specified relevant pages. For example, let E(p) = 0 for every page except one's own home page, which receives all of E's mass. This biases the ranking towards pages that are closer in the web graph to your own home page. Topic-specific rankings can likewise be combined, e.g. 0.6*sports + 0.4*news.
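A sketch of biasing the jump distribution E; the small graph, alpha, and the assumption that no page is a dead end are all illustrative:

```python
def personalized_pagerank(adj, E, alpha=0.15, iters=200):
    """PageRank where random jumps follow distribution E rather than the
    uniform 1/N vector: x <- (1 - alpha) * xP + alpha * E.
    Assumes every row of `adj` has at least one out-link."""
    n = len(adj)
    P = [[v / sum(row) for v in row] for row in adj]
    x = list(E)
    for _ in range(iters):
        x = [(1 - alpha) * sum(x[i] * P[i][j] for i in range(n)) + alpha * E[j]
             for j in range(n)]
    return x

adj = [[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]]
E = [1.0, 0.0, 0.0]   # all teleport mass on "my home page" (page 0)
scores = personalized_pagerank(adj, E)
```

Even though the link structure is completely symmetric, page 0 ends up with the highest score purely because the jumps are biased towards it.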

13 Google PageRank-Biased Spidering
Use PageRank to direct (focus) a spider on "important" pages: compute PageRank using the current set of crawled pages, and order the spider's search queue by the current estimated PageRank. A related variant is Topic-Directed PageRank.
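The queue ordering can be sketched with a max-heap keyed on the current PageRank estimates; the URL names and scores below are illustrative:

```python
import heapq

def next_to_crawl(frontier, estimated_pr):
    """Yield frontier URLs highest-estimated-PageRank first, so the
    spider visits "important" pages before the rest."""
    heap = [(-estimated_pr.get(url, 0.0), url) for url in frontier]
    heapq.heapify(heap)
    while heap:
        _, url = heapq.heappop(heap)
        yield url

pr = {"a": 0.6, "b": 0.1, "c": 0.3}   # estimates from pages crawled so far
print(list(next_to_crawl(["b", "a", "c"], pr)))  # ['a', 'c', 'b']
```

In practice the estimates would be recomputed periodically as the crawled set grows, and the heap refreshed accordingly.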

14 Resources
IIR, Chapter 21
http://www2004.org/proceedings/docs/1p309.pdf
The WebGraph Framework I: Compression Techniques (Boldi et al.)

