
1 Inverted Index
An inverted index allows quick lookup of the ids of the documents that contain a particular word.
[Figure: a lexicon/dictionary (DIC) with the entries Stanford, UCLA, MIT, …, each pointing to its posting list of document ids: PL(Stanford), PL(UCLA), PL(MIT).]
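The lookup described on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory corpus and naive whitespace tokenization. The helper names `build_inverted_index`, `lookup` and `intersect` are illustrative and do not come from the slides. Posting lists are kept sorted, which lets two of them be merged in linear time to answer a two-word query.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map every word in the lexicon to its posting list: the sorted ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():  # naive whitespace tokenization (an assumption)
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def lookup(index: dict[str, list[int]], word: str) -> list[int]:
    """Return the posting list PL(word), or an empty list if the word is not in the lexicon."""
    return index.get(word, [])


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge two sorted posting lists to find the documents that contain both words."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


docs = {1: "Stanford UCLA", 2: "UCLA MIT", 3: "Stanford MIT"}  # hypothetical corpus
index = build_inverted_index(docs)
print(lookup(index, "UCLA"))                       # [1, 2]
print(intersect(index["Stanford"], index["MIT"]))  # [3]
```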

2 Junghoo "John" Cho (UCLA Computer Science)
PageRank
A page is important if it is pointed to by many important pages.
$PR(p) = PR(p_1)/c_1 + \cdots + PR(p_k)/c_k$
– $p_i$: a page pointing to $p$; $c_i$: the number of links in $p_i$
The PageRank of $p$ is the sum of the PageRanks of its parents, each divided by that parent's number of links.
One equation for every page – N equations, N unknown variables
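One way to see the "one equation per page" idea is to evaluate every right-hand side at once and then check whether a candidate assignment satisfies all N equations. The following is a minimal sketch under one assumption: the graph is stored as a dict that maps each page to the list of pages it links to. The three-page graph and the function name `pagerank_rhs` are hypothetical.

```python
def pagerank_rhs(links: dict[str, list[str]], pr: dict[str, float]) -> dict[str, float]:
    """Evaluate PR(p_1)/c_1 + ... + PR(p_k)/c_k for every page p.

    links maps each page to its children, so a parent p_i with c_i links
    passes PR(p_i)/c_i to each of its children.
    """
    rhs = {page: 0.0 for page in pr}
    for parent, children in links.items():
        for child in children:
            rhs[child] += pr[parent] / len(children)
    return rhs


# Hypothetical graph: x links to y and z; y and z each link back to x.
links = {"x": ["y", "z"], "y": ["x"], "z": ["x"]}
candidate = {"x": 2.0, "y": 1.0, "z": 1.0}
print(pagerank_rhs(links, candidate) == candidate)  # True: all N equations hold
```

The equations only fix the PageRanks up to a common factor: doubling every value in `candidate` also satisfies them.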

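3 Example: Web of 1842
Netscape, Microsoft and Amazon
[Figure: link graph of Netscape (Ne), Amazon (Am) and Microsoft (MS): Netscape links to itself and to Amazon, Amazon links to Netscape and Microsoft, Microsoft links to Amazon.]
$PR(n) = PR(n)/2 + PR(a)/2$
$PR(m) = PR(a)/2$
$PR(a) = PR(n)/2 + PR(m)$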
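Solving these three equations by hand shows that they fix only the ratios between the PageRanks, because the third equation follows from the first two. A worked solution therefore needs a normalization. This derivation assumes a total importance of 3, which corresponds to one unit of importance per page. Here $n$, $m$ and $a$ stand for Netscape, Microsoft and Amazon.

```latex
\begin{aligned}
PR(n) = \tfrac{1}{2}PR(n) + \tfrac{1}{2}PR(a) &\;\Rightarrow\; PR(n) = PR(a) \\
PR(m) &= \tfrac{1}{2}PR(a) \\
\tfrac{1}{2}PR(n) + PR(m) = \tfrac{1}{2}PR(a) + \tfrac{1}{2}PR(a) &= PR(a) \quad \text{(the third equation adds nothing)} \\
PR(n) + PR(m) + PR(a) = \tfrac{5}{2}PR(a) = 3 &\;\Rightarrow\; PR(a) = \tfrac{6}{5},\; PR(n) = \tfrac{6}{5},\; PR(m) = \tfrac{3}{5}
\end{aligned}
```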

4 PageRank: Matrix Notation
Web graph matrix $M = \{m_{ij}\}$
– Each page $i$ corresponds to row $i$ and column $i$ of the matrix $M$
– $m_{ij} = 1/c$ if page $i$ is one of the $c$ children of page $j$; $m_{ij} = 0$ otherwise
PageRank vector: $r = [PR(p_1), \ldots, PR(p_N)]^T$
PageRank equation: $r = M r$
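The definition of $M$ can be checked with a short Python sketch that uses exact fractions. It assumes that links are given as a dict from each page to a list of distinct children. The function names `link_matrix` and `mat_vec` are illustrative. The graph is the Web of 1842 as read off its equations. The vector (6/5, 3/5, 6/5) solves those equations when the total importance is 3, and the sketch confirms that it satisfies $r = M r$.

```python
from fractions import Fraction


def link_matrix(pages: list[str], links: dict[str, list[str]]) -> list[list[Fraction]]:
    """Build M = {m_ij}: m_ij = 1/c if page i is one of the c children of page j, else 0."""
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    m = [[Fraction(0)] * n for _ in range(n)]
    for j, parent in enumerate(pages):
        children = links.get(parent, [])  # assumes no duplicate links
        for child in children:
            m[index[child]][j] = Fraction(1, len(children))
    return m


def mat_vec(m: list[list[Fraction]], r: list[Fraction]) -> list[Fraction]:
    """Compute the product M r."""
    return [sum(row[j] * r[j] for j in range(len(r))) for row in m]


pages = ["Ne", "MS", "Am"]
links = {"Ne": ["Ne", "Am"], "Am": ["Ne", "MS"], "MS": ["Am"]}
M = link_matrix(pages, links)
r = [Fraction(6, 5), Fraction(3, 5), Fraction(6, 5)]
print(mat_vec(M, r) == r)  # True: r = M r
```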

5 PageRank: Iterative Computation
Initially every page has one unit of importance.
At each round, each page shares its importance equally among its children and receives new importance from its parents.
Eventually the importance of each page reaches a limit.
– $M$ is a stochastic matrix: each column sums to 1, so the total importance stays constant from round to round
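The rounds described above amount to a simple power iteration. The following is a minimal sketch that assumes the graph is a dict from each page to its list of children. The name `iterate_pagerank` and the fixed number of rounds are illustrative choices rather than part of the slides. On the Web of 1842 graph, the values settle near Ne = 1.2, MS = 0.6 and Am = 1.2.

```python
def iterate_pagerank(links: dict[str, list[str]], pages: list[str], rounds: int = 100) -> dict[str, float]:
    """Start with one unit of importance per page, then repeatedly let every
    page split its current importance equally among its children."""
    pr = {page: 1.0 for page in pages}
    for _ in range(rounds):
        new = {page: 0.0 for page in pages}
        for parent, children in links.items():
            for child in children:
                new[child] += pr[parent] / len(children)
        pr = new
    return pr


# Web of 1842, with the links read off the example's equations.
web_1842 = {"Ne": ["Ne", "Am"], "Am": ["Ne", "MS"], "MS": ["Am"]}
print(iterate_pagerank(web_1842, ["Ne", "MS", "Am"]))  # approaches Ne 1.2, MS 0.6, Am 1.2
```

After the first round the values are Ne = 1, MS = 0.5 and Am = 1.5, and the total stays at 3 in every round.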

6 Example: Web of 1842
[Figure: Ne, Am, MS link graph]

7 PageRank: Random Surfer Model
The PageRank of a page models the probability that a Web surfer who keeps following random links reaches that page after many clicks.
[Figure: random click]
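The random surfer model can be simulated directly. The following Python sketch makes several assumptions: a seeded random generator, a starting page chosen arbitrarily, and every page having at least one link. The name `random_surfer` is hypothetical. The fraction of clicks that land on each page should come out close to the PageRanks scaled to sum to 1. For the Web of 1842 that is 1.2/3, 0.6/3 and 1.2/3, or 0.4, 0.2 and 0.4.

```python
import random


def random_surfer(links: dict[str, list[str]], start: str, clicks: int, seed: int = 0) -> dict[str, float]:
    """Follow a uniformly random link on every page and return the fraction
    of clicks that landed on each page."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    visits = {page: 0 for page in links}
    page = start
    for _ in range(clicks):
        page = rng.choice(links[page])  # assumes every page has at least one link
        visits[page] += 1
    return {p: count / clicks for p, count in visits.items()}


web_1842 = {"Ne": ["Ne", "Am"], "Am": ["Ne", "MS"], "MS": ["Am"]}
print(random_surfer(web_1842, "Ne", 200_000))  # roughly Ne 0.4, MS 0.2, Am 0.4
```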

8 Problems on the Real Web
Dead end
– A page with no links to send its importance to
– All importance “leaks out of” the Web
Crawler trap
– A group of one or more pages that have no links out of the group
– The group accumulates all the importance of the Web
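Both problems show up when the plain iteration runs on small variants of the Ne/Am/MS graph. In the first variant Microsoft has no links, making it a dead end. In the second Microsoft links only to itself, making it a crawler trap. The following Python sketch assumes one unit of starting importance per page and a fixed number of rounds. The function and variable names are illustrative.

```python
def iterate_pagerank(links: dict[str, list[str]], pages: list[str], rounds: int) -> dict[str, float]:
    """Plain iteration with no fix: each page splits its importance among its
    children, so a page with an empty link list passes importance to nobody."""
    pr = {page: 1.0 for page in pages}
    for _ in range(rounds):
        new = {page: 0.0 for page in pages}
        for parent, children in links.items():
            for child in children:
                new[child] += pr[parent] / len(children)
        pr = new
    return pr


pages = ["Ne", "MS", "Am"]
dead_end = {"Ne": ["Ne", "Am"], "Am": ["Ne", "MS"], "MS": []}          # no link from Microsoft
crawler_trap = {"Ne": ["Ne", "Am"], "Am": ["Ne", "MS"], "MS": ["MS"]}  # only a self-link at Microsoft

print(sum(iterate_pagerank(dead_end, pages, 100).values()))  # close to 0: the importance leaked out
print(iterate_pagerank(crawler_trap, pages, 100))            # MS close to 3, Ne and Am close to 0
```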

9 Example: Dead End
No link from Microsoft
[Figure: Ne, Am, MS link graph; MS is a dead end]

10 Example: Dead End
[Figure: Ne, Am, MS link graph]

11 Solution to Dead End
Assume that a surfer at a dead end jumps to a random page.
[Figure: Ne, Am, MS link graph]
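The random-jump fix can be expressed by treating a dead end as if it linked to every page, the jump target included. This Python sketch assumes the jump is uniform over all N pages and uses the graph in which Microsoft has no links. The function name is hypothetical. The total importance now stays at 3. Solving the resulting equations gives Ne = 18/13, MS = 9/13 and Am = 12/13.

```python
def pagerank_with_random_jumps(links: dict[str, list[str]], pages: list[str], rounds: int = 200) -> dict[str, float]:
    """Iterate as usual, except that a page with no links spreads its
    importance evenly over all pages (the surfer jumps to a random page)."""
    pr = {page: 1.0 for page in pages}
    for _ in range(rounds):
        new = {page: 0.0 for page in pages}
        for parent in pages:
            children = links.get(parent) or pages  # dead end: jump to any page uniformly
            for child in children:
                new[child] += pr[parent] / len(children)
        pr = new
    return pr


pages = ["Ne", "MS", "Am"]
dead_end = {"Ne": ["Ne", "Am"], "Am": ["Ne", "MS"], "MS": []}
print(pagerank_with_random_jumps(dead_end, pages))  # approaches Ne 18/13, MS 9/13, Am 12/13
```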

12 Example: Crawler Trap
Only a self-link at Microsoft
[Figure: Ne, Am, MS link graph; MS is a crawler trap]

13 Example: Crawler Trap
[Figure: Ne, Am, MS link graph]

14 Crawler Trap: Damping Factor
“Tax” each page some fraction of its importance and distribute the collected tax equally among all pages
– The tax rate is the probability that the surfer jumps to a random page
Example assuming a 20% tax
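The tax can be sketched in Python by applying it to the crawler-trap graph, in which Microsoft links only to itself. The sketch assumes a 20% tax, as on the slide, and assumes that each round's collected tax is pooled and split equally among all N pages. The function name `pagerank_taxed` is hypothetical. Microsoft no longer absorbs everything: the values converge to Ne = 7/11, MS = 21/11 and Am = 5/11.

```python
def pagerank_taxed(links: dict[str, list[str]], pages: list[str], tax: float = 0.2, rounds: int = 200) -> dict[str, float]:
    """Each round, every page keeps (1 - tax) of its importance to pass along
    its links; the taxed importance is pooled and shared equally by all pages."""
    pr = {page: 1.0 for page in pages}
    n = len(pages)
    for _ in range(rounds):
        total = sum(pr.values())
        new = {page: tax * total / n for page in pages}  # equal share of the pooled tax
        for parent, children in links.items():
            for child in children:
                new[child] += (1 - tax) * pr[parent] / len(children)
        pr = new
    return pr


pages = ["Ne", "MS", "Am"]
crawler_trap = {"Ne": ["Ne", "Am"], "Am": ["Ne", "MS"], "MS": ["MS"]}
print(pagerank_taxed(crawler_trap, pages))  # approaches Ne 7/11, MS 21/11, Am 5/11
```

This sketch handles crawler traps only. A page with an empty link list would still leak its untaxed share, so it would need to be combined with the random-jump rule for dead ends.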

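15 Algorithm KMP
(D: the text being searched, W: the word sought, m: start of the current match in D, i: current position in W, T: partial-match table with T[0] = -1)
while (m + i) < |D| do:
    if W[i] = D[m + i]:
        let i = i + 1
        if i = |W|, return m
    otherwise:
        let m = m + i - T[i]
        if i > 0, let i = T[i]
return no-match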
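The search loop above assumes that the table T has already been computed. The following Python sketch builds T with the standard prefix-function (longest proper border) computation and then runs the slide's loop unchanged. It returns -1 for no-match and follows the convention T[0] = -1, so a mismatch at i = 0 advances m by one. The function names are illustrative.

```python
def kmp_table(word: str) -> list[int]:
    """Partial-match table T: T[0] = -1 and, for i > 0, T[i] is the length of the
    longest proper prefix of word[:i] that is also a suffix of word[:i]."""
    border = [0] * len(word)  # border[q]: longest proper prefix-suffix of word[:q + 1]
    k = 0
    for q in range(1, len(word)):
        while k > 0 and word[q] != word[k]:
            k = border[k - 1]
        if word[q] == word[k]:
            k += 1
        border[q] = k
    return [-1] + border[:-1]


def kmp_search(text: str, word: str) -> int:
    """Return the first index m at which word occurs in text, or -1 for no-match."""
    if not word:
        return 0
    table = kmp_table(word)
    m = i = 0  # m: start of the current match in text; i: position in word
    while m + i < len(text):
        if word[i] == text[m + i]:
            i += 1
            if i == len(word):
                return m
        else:
            m = m + i - table[i]  # shift the match start; T[0] = -1 advances it by one
            if i > 0:
                i = table[i]      # keep the already-matched border of length T[i]
    return -1


print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # 15
```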

