INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID

INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID Lecture # 37 Markov chains

ACKNOWLEDGEMENTS: The presentation of this lecture has been taken from the following sources: “Introduction to information retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze; “Managing gigabytes” by Ian H. Witten, Alistair Moffat, and Timothy C. Bell; “Modern information retrieval” by Ricardo Baeza-Yates; “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, and Marco Brambilla.

Outline: Markov chains; Ergodic Markov chains; Markov chain with teleporting; Query processing; Personalized PageRank

Markov chains: A Markov chain consists of $n$ states, plus an $n \times n$ transition probability matrix $P$. At each step, we are in one of the states. For $1 \le i, j \le n$, the matrix entry $P_{ij}$ tells us the probability of $j$ being the next state, given we are currently in state $i$. $P_{ii} > 0$ is OK: a state may transition to itself. [Figure: states $i$ and $j$ joined by an arrow labeled $P_{ij}$.]

Markov chains: Clearly, for all $i$, $\sum_{j=1}^{n} P_{ij} = 1$. Markov chains are abstractions of random walks. [Figure: a simple Markov chain with three states; the numbers on the links show the transition probabilities.] In a Markov chain, the probability distribution of the next state depends only on the current state, and not on how the chain arrived at the current state.
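The two properties above (every row of the transition matrix sums to 1, and the next state depends only on the current one) can be illustrated with a minimal sketch. The three-state matrix `P` and the helper names `is_row_stochastic` and `random_walk` are hypothetical, chosen for illustration only; they are not taken from the lecture.

```python
import random
from typing import List

Matrix = List[List[float]]


def is_row_stochastic(P: Matrix, tol: float = 1e-9) -> bool:
    """Check that P is square, non-negative, and that each row sums to 1."""
    n = len(P)
    for row in P:
        if len(row) != n:
            return False
        if any(p < 0 for p in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
    return True


def random_walk(P: Matrix, start: int, steps: int, seed: int = 0) -> List[int]:
    """Simulate a walk; each move is sampled from the row of the current state only."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(steps):
        # Only P[state] is consulted: the earlier path plays no role (Markov property).
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path


# Hypothetical three-state chain (an assumption, not the figure from the slide).
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

path = random_walk(P, start=0, steps=50)
```

Because the walk only ever reads `P[state]`, two walks that reach the same state by different routes have the same distribution over their next move.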

Ergodic Markov chains: For any ergodic Markov chain, there is a unique long-term visit rate for each state. These rates form the steady-state probability distribution. Over a long time period, we visit each state in proportion to this rate. It does not matter where we start.
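A small sketch can make the "it does not matter where we start" claim concrete. It repeatedly multiplies a start distribution by the transition matrix (power iteration) until the distribution stops changing. The matrix `P` below is a hypothetical ergodic chain picked for illustration, and `step` and `steady_state` are assumed helper names.

```python
from typing import List

Matrix = List[List[float]]


def step(x: List[float], P: Matrix) -> List[float]:
    """One step of the chain: the new distribution is the row vector x times P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


def steady_state(P: Matrix, x0: List[float],
                 tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Iterate x <- xP until successive distributions differ by less than tol."""
    x = list(x0)
    for _ in range(max_iter):
        nxt = step(x, P)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical ergodic chain: every state is reachable from every other state,
# and the self-loops rule out periodic behaviour.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

from_first = steady_state(P, [1.0, 0.0, 0.0])
from_last = steady_state(P, [0.0, 0.0, 1.0])
```

For this particular matrix, both runs settle on the distribution (0.25, 0.5, 0.25), which can be checked by hand: multiplying that vector by `P` returns it unchanged.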

Markov Chain with Teleporting: Random Surfer Model
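The random surfer model can be sketched as follows, assuming the standard construction described in IIR Chapter 21: starting from the 0/1 adjacency matrix, a row with no 1's becomes uniform (1/N in every entry); in every other row each 1 is divided by the number of 1's in that row, the result is multiplied by 1 − α, and α/N is added to every entry. The function names, the three-page example graph, and the choice α = 0.5 are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def teleport_matrix(A: List[List[int]], alpha: float) -> Matrix:
    """Build the transition matrix of a random surfer with teleport probability alpha."""
    n = len(A)
    P: Matrix = []
    for row in A:
        ones = sum(row)
        if ones == 0:
            # Dead end: the surfer jumps to a page chosen uniformly at random.
            P.append([1.0 / n] * n)
        else:
            # Follow an out-link with probability 1 - alpha, teleport with alpha.
            P.append([(1 - alpha) * a / ones + alpha / n for a in row])
    return P


def pagerank(P: Matrix, iters: int = 100) -> List[float]:
    """Approximate the steady-state distribution by power iteration from uniform."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(iters):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x


# Hypothetical web graph: page 0 -> 1, page 1 -> 0 and 2, page 2 -> 1.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

P = teleport_matrix(A, alpha=0.5)
ranks = pagerank(P)
```

Teleporting makes every entry of `P` strictly positive, so the chain is ergodic and the power iteration converges to a unique distribution regardless of the starting vector.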

Query Processing: (1) Compute PageRank for all pages. (2) Run the query. (3) Sort the retrieved pages by PageRank. (4) Integrate PageRank and relevance to produce the final result.
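The last step, integrating PageRank with relevance, is not specified further on the slide. One simple possibility, shown here purely as an assumption, is a weighted sum of the two scores. The function `rank_results`, the `weight` parameter, and the example scores are all hypothetical.

```python
from typing import Dict, List


def rank_results(relevance: Dict[str, float],
                 pagerank: Dict[str, float],
                 weight: float = 0.5) -> List[str]:
    """Order the pages matching a query by a weighted mix of relevance and PageRank.

    weight = 1.0 ranks by relevance alone; weight = 0.0 ranks by PageRank alone.
    """
    scored = []
    for page, rel in relevance.items():
        # Pages without a precomputed PageRank are treated as having rank 0.
        score = weight * rel + (1 - weight) * pagerank.get(page, 0.0)
        scored.append((score, page))
    # Highest combined score first; ties are broken by page name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page for _, page in scored]


# Hypothetical scores: relevance comes from running the query,
# PageRank was computed offline for all pages.
relevance = {"a": 0.9, "b": 0.6, "c": 0.2}
pagerank = {"a": 0.1, "b": 0.5, "c": 0.4}

combined = rank_results(relevance, pagerank)
```

PageRank is query-independent, so it can be computed once ahead of time; only the relevance scores and the final mix depend on the query.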

Specialized page ranks

Personalized PageRank: PageRank can be biased (personalized) by changing E, the distribution over random-jump targets, to a non-uniform distribution. Restrict “random jumps” to a set of specified relevant pages. For example, let E(p) = 0 except for one’s own home page, for which E(p) = α. This results in a bias towards pages that are closer in the web graph to your own homepage. Topic-specific vectors can also be mixed, e.g. 0.6*sports + 0.4*news.
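A minimal sketch of this idea follows. It assumes one common convention: E is normalized to sum to 1 and the jump probability `alpha` is kept as a separate parameter. To keep the example short it also assumes every page has at least one out-link. The four-page graph, the page roles, and the function name are hypothetical.

```python
from typing import List


def personalized_pagerank(A: List[List[int]], E: List[float],
                          alpha: float = 0.15, iters: int = 200) -> List[float]:
    """Power iteration where random jumps land on pages according to E."""
    n = len(A)
    out = [sum(row) for row in A]
    if any(d == 0 for d in out):
        raise ValueError("this sketch assumes every page has at least one out-link")
    x = list(E)
    for _ in range(iters):
        # Random-jump mass goes to the pages favoured by E.
        nxt = [alpha * E[j] for j in range(n)]
        # The remaining mass follows out-links, split evenly among them.
        for i in range(n):
            share = (1 - alpha) * x[i] / out[i]
            for j in range(n):
                if A[i][j]:
                    nxt[j] += share
        x = nxt
    return x


# Hypothetical graph: page 0 = home page, 1 = a sports page, 2 = a news page, 3 = other.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]

home = personalized_pagerank(A, [1.0, 0.0, 0.0, 0.0])

sports_E = [0.0, 1.0, 0.0, 0.0]
news_E = [0.0, 0.0, 1.0, 0.0]
mixed_E = [0.6 * s + 0.4 * w for s, w in zip(sports_E, news_E)]
mixed = personalized_pagerank(A, mixed_E)
```

Under these assumptions the result is linear in E, so the vector for the 0.6/0.4 mix equals 0.6 times the sports vector plus 0.4 times the news vector. That allows topic vectors to be precomputed and combined per user.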

Google PageRank-Biased Spidering: Use PageRank to direct (focus) a spider on “important” pages. Compute PageRank using the current set of crawled pages. Order the spider’s search queue based on the current estimated PageRank. Topic-directed PageRank.
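The crawl loop described above might look like the following sketch. It is an illustration under several assumptions, not a description of any production crawler: `web` is an in-memory stand-in for fetching pages, PageRank is recomputed from scratch before every pick (far too slow at real scale), and pages whose out-links are not yet known are treated as dead ends that jump uniformly. All names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def pagerank(links: Graph, alpha: float = 0.15, iters: int = 50) -> Dict[str, float]:
    """PageRank over all known pages (crawled pages plus the pages they link to)."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: alpha / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = (1 - alpha) * rank[p] / len(outs)
                for q in outs:
                    nxt[q] += share
            else:
                # No known out-links: spread this page's mass uniformly.
                for q in pages:
                    nxt[q] += (1 - alpha) * rank[p] / n
        rank = nxt
    return rank


def crawl(web: Graph, seed: str, budget: int) -> List[str]:
    """Visit up to budget pages, always taking the frontier page with the top estimate."""
    crawled: Graph = {}
    frontier = {seed}
    order: List[str] = []
    while frontier and len(order) < budget:
        estimate = pagerank(crawled) if crawled else {}
        # Ties are broken alphabetically because max keeps the first maximum.
        page = max(sorted(frontier), key=lambda p: estimate.get(p, 0.0))
        frontier.discard(page)
        crawled[page] = list(web.get(page, []))  # stand-in for fetching the page
        order.append(page)
        frontier.update(q for q in crawled[page] if q not in crawled)
    return order


# Hypothetical miniature web.
web = {"s": ["a", "b"], "a": ["hub", "x"], "b": ["hub"], "hub": [], "x": []}
visit_order = crawl(web, seed="s", budget=5)
```

A practical crawler would refresh the estimates only periodically and keep the frontier in a priority queue; the sketch recomputes everything at each step to keep the logic visible.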

Resources
IIR Chap 21
http://www2004.org/proceedings/docs/1p309.pdf
http://www2004.org/proceedings/docs/1p595.pdf
http://www2003.org/cdrom/papers/refereed/p270/kamvar-270-xhtml/index.html
http://www2003.org/cdrom/papers/refereed/p641/xhtml/p641-mccurley.html
The WebGraph framework I: Compression techniques (Boldi et al. 2004)