The PageRank Citation Ranking: Bringing Order to the Web Presented by Aishwarya Rengamannan 1000669605 Instructor: Dr. Gautam Das.


1 The PageRank Citation Ranking: Bringing Order to the Web Presented by Aishwarya Rengamannan 1000669605 Instructor: Dr. Gautam Das

2 Technology Overview

3 Motivation The WWW is huge and heterogeneous. Web pages proliferate free of quality control. There is commercial interest in manipulating rankings. The ‘quality’ of a web page is subjective to its users. Problem: we need to approximate the overall relative ‘importance’ of web pages. Solution: take advantage of the link structure of the web.

4 Link structure of the Web Forward links (out-edges): the outgoing links from a web page. C is a forward link of both A and B. Backlinks (in-edges): the incoming links to a web page. A and B are backlinks of C.
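As a small illustration of these two definitions, the sketch below derives forward links and backlinks from a list of links. The pages A, B and C follow the slide; the function name `link_maps` and the edge-list representation are assumptions made only for this example.

```python
from collections import defaultdict


def link_maps(edges):
    """Return (forward, back) dictionaries for a list of (source, target) links."""
    forward = defaultdict(set)  # page -> pages it links to (out-edges)
    back = defaultdict(set)     # page -> pages that link to it (in-edges)
    for src, dst in edges:
        forward[src].add(dst)
        back[dst].add(src)
    return dict(forward), dict(back)


# A and B both link to C, as on the slide.
forward, back = link_maps([("A", "C"), ("B", "C")])
print(sorted(forward))    # pages that have forward links
print(sorted(back["C"]))  # backlinks of C
```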

5 Related Work Academic paper citations; link-based analysis; clustering methods that take link structure into account; modeling the web as hubs and authorities.

6 Ranking Intuition The number of backlinks to a web page makes it important. The quality of the pages that link to it increases its ranking further. “A page has high rank if the sum of the ranks of its backlinks is high.” How about having a backlink from www.yahoo.com?

7 Naïve PageRank Calculation $R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$ u & v --> web pages B_u --> set of backlinks of u (pages that link to u) N_v --> number of forward links from v R --> ranks of the web pages c --> factor used for normalization
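One application of the naïve formula can be sketched as follows. The three-page web and the function name `naive_pagerank_step` are hypothetical, and c is set to 1 for simplicity; this is an illustration of the update rule, not the presenter's code.

```python
def naive_pagerank_step(ranks, forward, c=1.0):
    """Apply R(u) = c * sum over v in B_u of R(v) / N_v once."""
    new_ranks = {page: 0.0 for page in ranks}
    for v, targets in forward.items():
        if not targets:
            continue  # v has no forward links, so it passes no rank on
        share = ranks[v] / len(targets)  # R(v) / N_v
        for u in targets:
            new_ranks[u] += c * share
    return new_ranks


# Hypothetical web: A -> B, A -> C, B -> C, C -> A.
forward = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
ranks = naive_pagerank_step(ranks, forward)
print(ranks)
```

After this single step C holds the most rank, because it receives a share from both A and B.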

8 Matrix Representation A is a square matrix with rows and columns corresponding to web pages (u & v). $A_{u,v} = 1/N_u$ if there is an edge from u to v; $A_{u,v} = 0$ if there is no edge.
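A minimal sketch of building this matrix from forward links, using plain nested lists. The page names and the helper `link_matrix` are illustrative assumptions, and each page is assumed to list a given target at most once.

```python
def link_matrix(pages, forward):
    """Build A with A[u][v] = 1/N_u if u links to v, and 0 otherwise."""
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    A = [[0.0] * n for _ in range(n)]
    for u, targets in forward.items():
        for v in targets:
            A[index[u]][index[v]] = 1.0 / len(targets)
    return A


# Hypothetical web: A -> B, A -> C, B -> C, C -> A.
pages = ["A", "B", "C"]
forward = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
A = link_matrix(pages, forward)
for row in A:
    print(row)
```

Each row of a page with at least one forward link sums to 1, since its rank is split evenly among its links.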

9 Matrices Revisited Eigenvalues and eigenvectors: for an n×n matrix A, $\lambda$ is an eigenvalue of A if there exists a non-zero vector v such that $Av = \lambda v$. The vector v is called an eigenvector of A corresponding to $\lambda$. We can rewrite $Av = \lambda v$ as $(A - \lambda I)v = 0$, where I is the n×n identity matrix.

10 Matrices Revisited(Contd…) How to solve for Eigen value and Eigen Vector?
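One standard way to answer this question is to solve the characteristic equation $\det(A - \lambda I) = 0$ for the eigenvalues and then solve $(A - \lambda I)v = 0$ for each one. The 2×2 matrix below is a made-up example chosen for easy arithmetic, not the one from the slide.

```latex
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = (\lambda - 1)(\lambda - 3) = 0
\;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1 .
\]
\[
(A - 3I)v = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} v = 0
\;\Rightarrow\; v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
(A - I)v = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} v = 0
\;\Rightarrow\; v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} .
\]
```

Here $\lambda_1 = 3$ is the dominant eigenvalue, and any non-zero multiple of $v_1$ is a corresponding eigenvector.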

11

12 Sample Calculation [Figure: example graph with nodes labeled 1, 2, 3 and 4]

13 Matrix Representation (contd…) A --> square matrix over web pages R --> vector over web pages Matrix notation gives $R = cAR$, so R is an eigenvector of A with eigenvalue c. To find: the eigenvector corresponding to the dominant (maximum) eigenvalue. – It can be computed by repeatedly applying A to a starting vector and normalizing until the result converges to the dominant eigenvector. [Numerical values of R and normalized R: not recoverable]
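The repeated iteration described here is commonly called power iteration, and a minimal sketch follows. It assumes the row-wise definition $A_{u,v} = 1/N_u$ from slide 8, under which the rank update sums down a column, so the code effectively multiplies by the transpose of A. The example matrix and the function name `power_iteration` are illustrative only.

```python
def power_iteration(A, iterations=100):
    """Approximate the dominant eigenvector of A transposed by repeated multiplication."""
    n = len(A)
    R = [1.0 / n] * n  # start from a uniform rank vector
    for _ in range(iterations):
        # new_R[v] = sum over u of A[u][v] * R[u]
        new_R = [sum(A[u][v] * R[u] for u in range(n)) for v in range(n)]
        total = sum(new_R)
        R = [x / total for x in new_R]  # normalize so the ranks sum to 1
    return R


# Hypothetical web: page 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
A = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
R = power_iteration(A)
print([round(x, 4) for x in R])
```

For this small graph the iteration settles on ranks of 0.4, 0.2 and 0.4, which can be checked by substituting them back into the naïve formula.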

14 Problem with Naïve PageRank Rank sink: two web pages point to each other but to no other page, and a third page points to one of them. During iteration this loop accumulates rank but never distributes it (since it has no out-edges to the rest of the web).
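The effect can be reproduced on the three-page configuration the slide describes. In this sketch the page names X, Y, Z and the function `propagate` are hypothetical, and c is fixed at 1.

```python
def propagate(ranks, forward):
    """Naive update with c = 1: each page splits its rank among its forward links."""
    new_ranks = {page: 0.0 for page in ranks}
    for v, targets in forward.items():
        for u in targets:
            new_ranks[u] += ranks[v] / len(targets)
    return new_ranks


# X and Y point only to each other; Z points to X.
forward = {"X": ["Y"], "Y": ["X"], "Z": ["X"]}
ranks = {"X": 1 / 3, "Y": 1 / 3, "Z": 1 / 3}
for _ in range(10):
    ranks = propagate(ranks, forward)
print(ranks)
```

Z's rank flows into the X–Y loop on the first step and never comes back out, so the loop ends up holding all of the rank.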

15 Solution – Extended version of PageRank Introducing Rank Source: E(u): a vector over the web pages that corresponds to a source of rank.
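Written out, the extended definition given in the original PageRank paper adds the rank source to the naïve formula. The rendering below is a reconstruction in the notation of slide 7, since the slide's own formula is not preserved.

```latex
\[
R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + c\,E(u),
\qquad \text{with } c \text{ maximized and } \lVert R' \rVert_1 = 1 .
\]
```

The extra term $c\,E(u)$ gives every page with $E(u) > 0$ some rank that does not depend on its backlinks, which is what lets rank escape from a sink.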

16 Random Surfer Model Random surfer: clicks on successive links at random. The factor ‘E’ can be viewed as modeling this behavior: the surfer periodically gets bored and jumps to a random page chosen according to E.
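The random surfer can be simulated directly by counting how often each page is visited. This sketch assumes a uniform E (every page is an equally likely jump target) and an illustrative jump probability of 0.15; the graph and the function name `simulate_surfer` are hypothetical.

```python
import random


def simulate_surfer(forward, steps, jump_probability=0.15, seed=0):
    """Count page visits for a surfer who follows random links and sometimes jumps."""
    rng = random.Random(seed)
    pages = sorted(forward)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        targets = forward[current]
        if not targets or rng.random() < jump_probability:
            current = rng.choice(pages)    # bored: jump to a uniformly random page
        else:
            current = rng.choice(targets)  # follow a random forward link
    return visits


# Hypothetical web: A -> B, A -> C, B -> C, C -> A.
forward = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
visits = simulate_surfer(forward, steps=10000)
print({page: count / 10000 for page, count in visits.items()})
```

The visit frequencies approximate the ranks: pages that many well-visited pages link to are reached more often.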

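17 PageRank Computation
- $R_0 \leftarrow S$: initialize vector over web pages
Loop:
- $R_{i+1} \leftarrow A R_i$: new ranks, sum of normalized backlink ranks
- $d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1$: compute normalizing factor
- $R_{i+1} \leftarrow R_{i+1} + dE$: add escape term
- $\delta \leftarrow \|R_{i+1} - R_i\|_1$: control parameter
While $\delta > \epsilon$: stop when converged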
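The loop above can be sketched in a few lines of Python. This version assumes that E sums to 1, so that the rank lost in each step is redistributed in full, and it starts from a uniform vector. The four-page graph (with D having no forward links) and the function name `pagerank` are made up for the example.

```python
def pagerank(forward, E, epsilon=1e-8):
    """Iterate new ranks plus escape term until the L1 change is at most epsilon."""
    pages = sorted(forward)
    R = {p: 1.0 / len(pages) for p in pages}  # initial vector, here uniform
    while True:
        # New ranks: sum of normalized backlink ranks.
        new_R = {p: 0.0 for p in pages}
        for v in pages:
            for u in forward[v]:
                new_R[u] += R[v] / len(forward[v])
        # Normalizing factor: rank lost in this step (e.g. via pages without links).
        d = sum(R.values()) - sum(new_R.values())
        # Escape term: hand the lost rank back out according to E.
        for p in pages:
            new_R[p] += d * E[p]
        # Control parameter: L1 distance between successive rank vectors.
        delta = sum(abs(new_R[p] - R[p]) for p in pages)
        R = new_R
        if delta <= epsilon:
            return R


# Hypothetical web: A -> B, A -> C, B -> C, C -> A, C -> D; D has no forward links.
forward = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}
E = {page: 0.25 for page in forward}  # assumed uniform rank source
R = pagerank(forward, E)
print({page: round(rank, 4) for page, rank in R.items()})
```

Because d is computed from the ranks themselves, the total rank stays at 1 from one iteration to the next.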

18 Another Problem? Dangling links: – Links that point to a page with no outgoing links – It is not clear where their weight should be distributed. Solution: remove them from the system until all other PageRanks have been calculated!
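Removing dangling links may take more than one pass, because deleting one dangling page can leave another page without forward links. The sketch below repeats the removal until nothing dangles; the graph and the function name `remove_dangling` are illustrative, and every link target is assumed to appear as a key.

```python
def remove_dangling(forward):
    """Repeatedly drop pages with no forward links, and the links pointing at them."""
    graph = {page: set(targets) for page, targets in forward.items()}
    removed = []  # pages in the order they were taken out
    while True:
        dangling = {page for page, targets in graph.items() if not targets}
        if not dangling:
            return graph, removed
        removed.extend(sorted(dangling))
        graph = {
            page: targets - dangling
            for page, targets in graph.items()
            if page not in dangling
        }


# Hypothetical web: A <-> B, A -> D, D -> E, and E has no forward links.
forward = {"A": ["B", "D"], "B": ["A"], "D": ["E"], "E": []}
graph, removed = remove_dangling(forward)
print(graph)
print(removed)
```

Keeping the `removed` list makes it possible to add those pages back once the remaining ranks have been computed.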

19 Implementation The web crawler keeps a database of URLs so that it can discover all URLs on the web. To implement PageRank, the web crawler builds an index of the URLs as it crawls. Problems: infinitely large sites; incorrect or broken HTML; sites that are down; a web that is always changing.

20 PageRank Implementation Convert each URL into a unique integer ID. Sort the link structure by these IDs. Remove dangling links. Make an initial assignment of ranks and iterate until convergence. Add the dangling links back. Iterate the process again to assign weights to all dangling links. The rank weights for the current iteration are kept in memory; because access to the sorted link database A is linear, A can be kept on disk.
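The first two steps, assigning integer IDs and sorting the link structure, might look like the sketch below. The example.com URLs and the function name `build_link_database` are placeholders, and IDs are simply handed out in order of first appearance.

```python
def build_link_database(links):
    """Map each URL to an integer ID and return links as sorted (source_id, target_id) pairs."""
    ids = {}

    def url_id(url):
        if url not in ids:
            ids[url] = len(ids)  # next unused integer
        return ids[url]

    pairs = sorted((url_id(src), url_id(dst)) for src, dst in links)
    return ids, pairs


links = [
    ("http://example.com/b", "http://example.com/a"),
    ("http://example.com/a", "http://example.com/c"),
    ("http://example.com/a", "http://example.com/b"),
]
ids, pairs = build_link_database(links)
print(ids)
print(pairs)
```

With the pairs sorted by source ID, all forward links of a page sit next to each other, so one linear pass over the data is enough for each iteration.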

21 Convergence Properties Interpret the web as an expander-like graph. – A graph is an expander if every subset of nodes S has a neighborhood that is larger than some factor α times |S|. Verification: a graph has a good expansion factor if the largest eigenvalue is sufficiently larger than the second-largest eigenvalue.

22 Applications of PageRank Search, browsing and traffic estimation. Helping users decide whether a site is trustworthy. Spam detection and prevention. Predicting citation counts.

23 http://www.techpavan.com/2008/11/20/backend-google-search/ http://www.math.hmc.edu/calculus/tutorials/eigenstuff/ http://williamcotton.com/pagerank-explained-with-javascript

