Page Rank. Ljiljana Rajačić.

Presentation transcript:

Ljiljana Rajačić

Page Rank: the Web as a directed graph
• Nodes: web pages
• Edges: hyperlinks

Two challenges of web search:
1. The Web contains many sources of information. Whom to trust?
2. What is the “best” answer to a query? There is no single right answer.
Not all web pages are equally “important”.

Link analysis approaches
• Rank pages (nodes) by analyzing the topology of the web graph
• Idea: links as votes
  - A page is more important if it has more links adjacent to it
• Incoming links? Outgoing links?
• Links from important pages have higher weight => recursive problem!


The weight of a link is proportional to the importance of its source page. If page j with importance $r_j$ has n out-links, each link gets $r_j / n$ votes. Page j's own importance is the sum of the votes on its in-links.

A page is important if it is pointed to by other important pages. Rank $r_j$ of page j:

$r_j = \sum_{i \to j} \frac{r_i}{d_i}$

where $d_i$ is the out-degree of node i and the sum runs over all pages i that link to j.
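The flow equation can be checked on a small example. The following is a minimal Python sketch, not code from the presentation: the three-page graph (pages y, a, m) and the name `flow_step` are assumptions made for illustration. It applies the rule "each out-link of page i carries $r_i / d_i$" once and shows that the ranks (0.4, 0.4, 0.2) are left unchanged, so they satisfy the flow equations for this graph.

```python
# Hypothetical three-page graph: each key links to the pages in its list.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

def flow_step(rank, out_links):
    """Apply r_j = sum over in-links i -> j of r_i / d_i exactly once."""
    new_rank = {page: 0.0 for page in out_links}
    for i, targets in out_links.items():
        vote = rank[i] / len(targets)  # each out-link of i carries r_i / d_i
        for j in targets:
            new_rank[j] += vote
    return new_rank

rank = {"y": 0.4, "a": 0.4, "m": 0.2}
updated = flow_step(rank, out_links)  # a solution of the flow equations is a fixed point
```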


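Since $r_j = \sum_{i \to j} r_i / d_i$, the flow equation in matrix form is

$M \cdot r = r$

where $M_{ji} = 1/d_i$ if page i links to page j, and 0 otherwise. Example: page i links to 3 pages, including j, so $M_{ji} = 1/3$.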
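As an illustration of the matrix form, the sketch below builds M from an adjacency list in plain Python. The function name `build_matrix` and the four-page graph are hypothetical, chosen so that page "i" links to 3 pages including "j"; the sketch also assumes every page has at least one out-link and no duplicate links.

```python
def build_matrix(pages, out_links):
    """Return M as a list of rows, with M[j][i] = 1/d_i if i links to j, else 0."""
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for i in pages:
        targets = out_links[i]
        for j in targets:
            M[index[j]][index[i]] = 1.0 / len(targets)
    return M

# Hypothetical graph: page "i" links to 3 pages, including "j".
pages = ["i", "j", "k", "l"]
out_links = {"i": ["j", "k", "l"], "j": ["i"], "k": ["i"], "l": ["i"]}
M = build_matrix(pages, out_links)

# Every column of M sums to 1, because each page splits its rank over its out-links.
column_sums = [sum(M[row][col] for row in range(len(pages))) for col in range(len(pages))]
```

Here `M[1][0]` is the entry for the link from "i" to "j" and equals 1/3.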

x is an eigenvector of M with the corresponding eigenvalue λ if

$M x = \lambda x$

Since $M \cdot r = r$, the rank vector r is an eigenvector of the web matrix M, with corresponding eigenvalue 1. We can now efficiently find r: the power iteration method.
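A minimal sketch of power iteration in plain Python follows. The matrix is a hypothetical three-page web (pages y, a, m), and the stopping rule (L1 change below `tol`) and the parameter names are assumptions for this example rather than details from the slides.

```python
def power_iterate(M, tol=1e-12, max_iter=1000):
    """Repeat r <- M r from the uniform vector until r stops changing."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new_r = [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r

# Hypothetical web, page order (y, a, m): y -> {y, a}, a -> {y, m}, m -> {a}.
# Column i holds 1/d_i for every page that i links to.
M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
r = power_iterate(M)  # approaches (0.4, 0.4, 0.2)
```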

$d_i$: out-degree of node i

PageRank simulates a random web surfer:
• At any time t, the surfer is on some page i
• At time t + 1, the surfer follows an out-link from i uniformly at random
• The surfer ends up on some page j linked from i
The rank vector r is the stationary distribution of this walk: $r_i$ is the probability that the random walker is on page i at an arbitrary time t.
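The random-surfer view can be tried out directly. The sketch below is an illustrative simulation, not part of the presentation: the graph, the function name `simulate_surfer`, the step count, and the fixed seed are all assumptions. It walks the graph and records how often each page is visited; for this graph the visit frequencies come out close to the stationary distribution (0.4, 0.4, 0.2).

```python
import random

def simulate_surfer(out_links, start, steps, seed=0):
    """Follow uniformly random out-links and return the visit frequency of each page."""
    rng = random.Random(seed)
    visits = {page: 0 for page in out_links}
    page = start
    for _ in range(steps):
        visits[page] += 1
        page = rng.choice(out_links[page])  # pick an out-link uniformly at random
    return {p: count / steps for p, count in visits.items()}

# Hypothetical three-page graph with no dead ends.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
freq = simulate_surfer(out_links, start="y", steps=200_000)
```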

Does the iteration converge? Does it converge to what we want? Are the results reasonable?

Spider traps: all out-links are within an isolated group of pages. Spider traps eventually absorb all the rank.
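The effect of a spider trap shows up on a tiny example. In the sketch below (an illustration with a hypothetical graph and function name), page m links only to itself, so it forms a one-page trap; repeating the plain flow update drives almost all rank into m.

```python
# Hypothetical graph: "m" links only to itself, so {m} is a spider trap.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]}

def iterate(out_links, steps):
    """Repeat the plain flow update (no random jumps) for a fixed number of steps."""
    rank = {p: 1.0 / len(out_links) for p in out_links}
    for _ in range(steps):
        new_rank = {p: 0.0 for p in out_links}
        for i, targets in out_links.items():
            for j in targets:
                new_rank[j] += rank[i] / len(targets)
        rank = new_rank
    return rank

rank = iterate(out_links, steps=100)  # rank["m"] ends up very close to 1
```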

At each step, the random surfer has 2 options:
• Follow a random link with probability β
• Jump to a random page with probability 1 – β
• β is usually in the range 0.8 – 0.9
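A minimal sketch of the iteration with random jumps, assuming a graph without dead ends: each page first receives the jump share (1 – β)/n, then β times the usual link votes. The function name `pagerank`, the fixed step count, and the example graph (a one-page spider trap at m) are assumptions for illustration.

```python
def pagerank(out_links, beta=0.8, steps=200):
    """Iterate: follow a link with probability beta, jump to a random page otherwise."""
    n = len(out_links)
    rank = {p: 1.0 / n for p in out_links}
    for _ in range(steps):
        new_rank = {p: (1.0 - beta) / n for p in out_links}  # share from random jumps
        for i, targets in out_links.items():
            for j in targets:
                new_rank[j] += beta * rank[i] / len(targets)  # share from followed links
        rank = new_rank
    return rank

# Hypothetical graph with a spider trap at "m" (no dead ends).
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]}
rank = pagerank(out_links, beta=0.8)
```

With β = 0.8 the trap page still gets the largest rank, but no longer all of it: the result approaches y = 7/33, a = 5/33, m = 21/33.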

A dead end is a page with no out-links. Dead ends cause rank to “leak out”. In the example, b is a dead end, so b's column of M is all 0s.

Always jump to a random page from a dead end.
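The leak and its fix can be compared side by side. The sketch below is illustrative only: the graph, the function name `step`, and the flag `fix_dead_ends` are hypothetical. Page m has no out-links; without the fix the total rank shrinks toward 0, while spreading a dead end's rank uniformly over all pages keeps the total at 1.

```python
def step(rank, out_links, fix_dead_ends):
    """One flow update; optionally spread a dead end's rank uniformly over all pages."""
    n = len(out_links)
    new_rank = {p: 0.0 for p in out_links}
    for i, targets in out_links.items():
        if targets:
            for j in targets:
                new_rank[j] += rank[i] / len(targets)
        elif fix_dead_ends:
            for j in out_links:  # dead end: always jump to a random page
                new_rank[j] += rank[i] / n
    return new_rank

# Hypothetical graph: "m" is a dead end.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": []}
leaky = {p: 1.0 / 3.0 for p in out_links}
fixed = dict(leaky)
for _ in range(50):
    leaky = step(leaky, out_links, fix_dead_ends=False)
    fixed = step(fixed, out_links, fix_dead_ends=True)
total_leaky = sum(leaky.values())  # rank has leaked out
total_fixed = sum(fixed.values())  # rank is preserved
```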

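PageRank equation [Brin – Page, 1998]:

$r_j = \sum_{i \to j} \beta \frac{r_i}{d_i} + (1 - \beta) \frac{1}{n}$

Google matrix A:

$A = \beta M + (1 - \beta) \frac{1}{n} e e^T$

where e is the vector of all 1s and n is the number of pages, so that $r = A \cdot r$.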
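To make the Google matrix concrete, the sketch below forms it entry by entry for a hypothetical three-page web, assuming the form A = βM + (1 – β)/n in every entry and a matrix M whose columns already sum to 1 (no dead ends). The function name `google_matrix` is an assumption for this example.

```python
def google_matrix(M, beta):
    """Return A with A[j][i] = beta * M[j][i] + (1 - beta) / n."""
    n = len(M)
    return [[beta * M[j][i] + (1.0 - beta) / n for i in range(n)] for j in range(n)]

# Hypothetical web, page order (y, a, m): y -> {y, a}, a -> {y, m}, m -> {m}.
M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.5, 1.0]]
A = google_matrix(M, beta=0.8)

# A is still column-stochastic, but it no longer has any zero entries.
column_sums = [sum(A[row][col] for row in range(3)) for col in range(3)]
```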

The key step is matrix-vector multiplication. A is dense: it has no 0 elements. M was sparse: only ~10 – 100 non-zero elements per column. We want to work with M. It's possible!
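One way to see why working with M is enough: when the entries of r sum to 1, the product A·r equals β·M·r plus the constant (1 – β)/n in every component. The sketch below checks this numerically on a hypothetical three-page graph; it assumes the Google matrix has the form A = βM + (1 – β)/n in every entry, and all names are illustrative.

```python
beta = 0.8
out_links = {0: [0, 1], 1: [0, 2], 2: [2]}  # hypothetical graph, pages numbered 0..2
n = len(out_links)
r = [0.5, 0.3, 0.2]  # any vector whose entries sum to 1

# Dense route: build every entry of A, then multiply (n * n work).
A = [[(1.0 - beta) / n for _ in range(n)] for _ in range(n)]
for i, targets in out_links.items():
    for j in targets:
        A[j][i] += beta / len(targets)
dense = [sum(A[j][i] * r[i] for i in range(n)) for j in range(n)]

# Sparse route: touch only the existing links, then add the constant jump term.
sparse = [(1.0 - beta) / n] * n
for i, targets in out_links.items():
    for j in targets:
        sparse[j] += beta * r[i] / len(targets)
```

Both routes give the same vector, but the sparse one never stores A.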


CPU
• Graph representation: adjacency list
• O(m) per iteration, where m is the number of edges
• m = O(n) => O(n) per iteration, where n is the number of pages
CUDA
• Graph representation: adjacency matrix
• $O(n^2)$ per iteration
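The gap between the two representations can be illustrated by counting how many entries one iteration touches. This is a rough sketch under stated assumptions, not a measurement of the presented implementations: the graph is a hypothetical ring in which every page has 2 out-links, and the matrix version is assumed to visit every one of its n × n entries.

```python
def list_work(out_links):
    """Entries touched per iteration with an adjacency list: one per edge."""
    return sum(len(targets) for targets in out_links.values())

def matrix_work(n):
    """Entries touched per iteration with a full adjacency matrix: n * n."""
    return n * n

n = 1000
# Hypothetical sparse graph: page i links to the next two pages on a ring.
out_links = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
edges_touched = list_work(out_links)   # m = 2n
cells_touched = matrix_work(n)         # n squared
```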

Number of pages | CPU | CUDA
… | … ms | 340 ms
… | … ms | 380 ms
… | … ms | 550 ms
> 850000 | ~6.5 s | Memory overflow

Thanks for your attention!