Extrapolation to Speed-up Query-dependent Link Analysis Ranking Algorithms
Muhammad Ali Norozi
Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
FIT 2010, 21–23 Dec 2010

Challenges of IR
A huge and dynamic document corpus; users with dynamic needs; computational efficiency; use of limited resources; storage issues; “personalization”; “relevance”; stability and scalability; and so on.

Contributions
A wide-ranging exploration of ideas; an evaluation of convergence behavior, leading to an acceleration technique for query-dependent LAR (“extrapolation”); “personalization”; experimentation.

Theoretical Background

Link Analysis Ranking
A link from page p to page q denotes an ‘endorsement’ or ‘vote’: page p considers page q an authority on a subject. Link analysis ranking (LAR) mines or classifies the webgraph of these recommendations and assigns an authority value to every page.
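The sketch below is a minimal illustration of turning links into authority values. It is not the method of this talk, and the page names are hypothetical. It stores a small webgraph as a Python dict of out-links and scores each page by its in-degree, which amounts to counting the distinct ‘votes’ the page receives. PageRank and HITS refine this idea by weighting each vote by the importance of the page casting it.

```python
from collections import defaultdict


def indegree_authority(links):
    """Authority value = number of distinct pages that link to a page.

    links: dict mapping each page to the list of pages it links to.
    """
    scores = defaultdict(int)
    for page, targets in links.items():
        scores[page] += 0            # pages without in-links still get a score of 0
        for target in set(targets):  # a page votes for another page at most once
            if target != page:       # ignore self-links (a page cannot endorse itself)
                scores[target] += 1
    return dict(scores)


# Hypothetical three-page webgraph: p -> q, p -> r, r -> q.
webgraph = {"p": ["q", "r"], "r": ["q"], "q": []}
print(sorted(indegree_authority(webgraph).items()))  # [('p', 0), ('q', 2), ('r', 1)]
```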

Webgraph

Family of LAR
Query-independent algorithms rank the whole Web: PageRank (Brin and Page 98).
Query-dependent algorithms rank a small subset of pages related to a specific query: HITS (Kleinberg) and SALSA (Lempel and Moran 2000).

PageRank
A good page should be pointed to by good pages. Model this as a random walk on the web graph: start at a page picked at random; with probability α, follow a random outgoing link; with probability 1 − α, jump to a random page (‘teleportation’). Pages are ranked according to the stationary distribution of this random walk. Example ranking: 1. Red page, 2. Purple page, 3. Yellow page, 4. Blue page, 5. Green page.
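The random-surfer description can be sketched as a power iteration. This is a minimal sketch under several assumptions: the graph is a dict of lists in which every page is a key, α = 0.85 is an illustrative default, and a surfer on a dangling page (one with no out-links) teleports uniformly. The function name `pagerank` and these choices are illustrative and are not taken from the talk.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the random surfer.

    links: dict mapping every page to the list of pages it links to.
    alpha: probability of following a random out-link; with probability
           1 - alpha the surfer teleports to a uniformly random page.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumption: mass sitting on dangling pages is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1.0 - alpha) / n + alpha * dangling / n for p in pages}
        for p in pages:
            for q in links[p]:
                new[q] += alpha * rank[p] / len(links[p])
        delta = sum(abs(new[p] - rank[p]) for p in pages)  # L1 change
        rank = new
        if delta < tol:
            break
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

On this three-page graph, c ranks first because both a and b link to it. a comes second, and b comes last because it receives only teleportation mass.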

Random Walks
Random walks on graphs correspond to Markov chains. The set of states S is the set of nodes of the graph G, and the transition probability matrix gives the probability that the walk follows an edge from one node to another.
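Here is a sketch of the transition matrix for such a walk. It assumes nodes are given as a list and edges as (source, target) pairs; `transition_matrix` and `step` are hypothetical helper names. Entry P[i][j] is the probability of moving from node i to node j, which is 1/outdeg(i) for each edge i → j. Rows of dangling nodes stay all-zero here, and that is exactly the defect the PageRank adjustments must repair.

```python
def transition_matrix(nodes, edges):
    """Row-stochastic matrix with P[i][j] = 1/outdeg(i) for each edge i -> j.

    Rows of dangling nodes (no out-edges) are left as all zeros.
    """
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    out = {v: [] for v in nodes}
    for u, v in edges:
        out[u].append(v)
    P = [[0.0] * n for _ in range(n)]
    for u, targets in out.items():
        for v in targets:
            P[index[u]][index[v]] += 1.0 / len(targets)
    return P


def step(dist, P):
    """One step of the walk: new[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
P = transition_matrix(nodes, edges)
print(P)                         # [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(step([1.0, 0.0, 0.0], P))  # [0.0, 0.5, 0.5]
```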

Example
The system above is a familiar problem in linear algebra: finding the eigenvector of matrix A associated with its dominant eigenvalue.
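Write π for the row vector of page ranks and A for the row-stochastic transition matrix (notation assumed here). The stationary condition then reads:

```latex
\pi^{T} = \pi^{T} A, \qquad \sum_{i} \pi_i = 1, \qquad \pi_i \ge 0
```

So π is a left eigenvector of A for the eigenvalue 1, which is the dominant eigenvalue of every stochastic matrix.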

Adjustments
Reducibility adjustment; uniqueness of the stationary vector (computed with the power method).
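One standard way to write these adjustments follows Langville and Meyer's formulation; the slide's own notation may differ. The formulation uses the raw hyperlink matrix H (dangling rows all zero), a dangling-node indicator vector a, the all-ones vector e, n pages, and α, the link-following probability from the PageRank slide:

```latex
\begin{align*}
S &= H + \tfrac{1}{n}\, a\, e^{T} && \text{(stochasticity: dangling rows become uniform)}\\
G &= \alpha S + (1-\alpha)\, \tfrac{1}{n}\, e\, e^{T} && \text{(teleportation: irreducible and aperiodic)}\\
\pi^{(k+1)T} &= \pi^{(k)T} G && \text{(power method)}
\end{align*}
```

G is a positive stochastic matrix. By the Perron–Frobenius theorem, its stationary vector is therefore unique, and the power method converges to it from any starting probability vector. Because the second eigenvalue of G satisfies |λ₂| ≤ α, the error shrinks roughly like α^k.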

HITS (Hyperlink-Induced Topic Search)
Authority is not necessarily transferred directly between authorities. Pages have a double identity: a hub identity and an authority identity. Good hubs point to good authorities, and good authorities are pointed to by good hubs: a ‘mutual reinforcement relationship’.

Focused Subgraph
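Below is a minimal sketch of the usual Kleinberg-style construction of a focused subgraph. It assumes a hypothetical link index given as `out_links`/`in_links` dicts. The root set holds the pages returned by a text search for the query. The base set adds every page a root page links to, plus at most `d` pages linking to each root page (`d` is a hypothetical cap). Kleinberg also discards links between pages on the same host; that step is omitted here.

```python
def base_set(root, out_links, in_links, d):
    """Expand a query's root set into the base set of the focused subgraph.

    root: pages returned by a text search for the query.
    out_links / in_links: dict page -> list of pages (hypothetical link index).
    d: hypothetical cap on the number of in-linking pages added per root page.
    """
    base = set(root)
    for p in root:
        base.update(out_links.get(p, []))     # every page p points to
        base.update(in_links.get(p, [])[:d])  # at most d pages pointing to p
    return base


def induced_edges(base, out_links):
    """Links of the focused subgraph: both endpoints must lie in the base set."""
    return {(p, q) for p in base for q in out_links.get(p, []) if q in base}


out_links = {"r1": ["x"], "y": ["r1"], "z": ["r1"], "x": ["w"]}
in_links = {"r1": ["y", "z"]}
nodes = base_set(["r1"], out_links, in_links, d=1)
print(sorted(nodes))                            # ['r1', 'x', 'y']
print(sorted(induced_edges(nodes, out_links)))  # [('r1', 'x'), ('y', 'r1')]
```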

Mutual Reinforcement
Initialize all weights to 1, then repeat until convergence:
O operation: hubs collect the weights of the authorities they point to.
I operation: authorities collect the weights of the hubs that point to them.
Normalize the weights under some norm.
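The mutual-reinforcement loop translates almost line for line into code. This is a minimal sketch. It assumes the focused subgraph is given as a node list and (p, q) link pairs, and it uses the L2 norm as an illustrative choice for ‘some norm’; the function name `hits` is hypothetical.

```python
import math


def hits(nodes, edges, tol=1e-10, max_iter=1000):
    """Mutual-reinforcement iteration on a focused subgraph.

    nodes: list of pages; edges: list of (p, q) pairs meaning p links to q.
    Returns (hub, authority) dicts, each normalized to unit L2 norm.
    """
    hub = {v: 1.0 for v in nodes}   # initialize all weights to 1
    auth = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        # O operation: each hub collects the weights of the authorities it points to.
        new_hub = {v: 0.0 for v in nodes}
        for p, q in edges:
            new_hub[p] += auth[q]
        # I operation: each authority collects the weights of the hubs pointing to it.
        new_auth = {v: 0.0 for v in nodes}
        for p, q in edges:
            new_auth[q] += new_hub[p]
        # Normalize under the L2 norm (guarding against an all-zero vector).
        for vec in (new_hub, new_auth):
            norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
            for v in vec:
                vec[v] /= norm
        change = sum(abs(new_hub[v] - hub[v]) + abs(new_auth[v] - auth[v]) for v in nodes)
        hub, auth = new_hub, new_auth
        if change < tol:
            break
    return hub, auth


hub, auth = hits(["h1", "h2", "a", "b"], [("h1", "a"), ("h2", "a"), ("h2", "b")])
```

In this example, a scores higher than b as an authority (b/a ≈ 0.618), and h2 scores higher than h1 as a hub (h1/h2 ≈ 0.618).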

HITS & SVD (Singular Value Decomposition)
The iterative HITS equations and their relation to the SVD.
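Writing the O and I operations in matrix form shows the link to the SVD. The notation is assumed here: A is the adjacency matrix of the focused subgraph (A_pq = 1 if p links to q), h and a are the hub and authority vectors, and normalization is left implicit.

```latex
\begin{align*}
h^{(k)} &= A\, a^{(k-1)}, \qquad a^{(k)} = A^{T} h^{(k)}\\
\Rightarrow\quad a^{(k)} &\propto (A^{T}A)\, a^{(k-1)}, \qquad h^{(k)} \propto (A A^{T})\, h^{(k-1)}\\
A &= U \Sigma V^{T} \;\Rightarrow\; A^{T}A = V \Sigma^{2} V^{T}, \quad A A^{T} = U \Sigma^{2} U^{T}
\end{align*}
```

The iteration is therefore a power method. It drives a toward the principal right singular vector v₁ of A and h toward the principal left singular vector u₁, provided σ₁ > σ₂ and the starting vector is not orthogonal to them.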

SALSA (Stochastic Approach for Link-Structure Analysis)
SALSA blends the ideas of HITS and PageRank. Like HITS, it works on a small query-specific graph; like PageRank, it uses random walks. It runs two random walks, each alternating between hubs and authorities.
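SALSA's authority-side walk can be sketched in the same style. The sketch assumes the focused subgraph is given as (hub, authority) link pairs. One step from authority i goes backward along a uniformly random in-link of i to a hub h, then forward along a uniformly random out-link of h. The function name and the power-iteration loop are illustrative, and the symmetric hub-side walk is omitted.

```python
from collections import defaultdict


def salsa_authority(edges, tol=1e-12, max_iter=10000):
    """Stationary distribution of SALSA's authority-side random walk.

    edges: iterable of (hub, authority) links; duplicate links are ignored.
    """
    in_nbrs, out_nbrs = defaultdict(list), defaultdict(list)
    for h, a in set(edges):
        in_nbrs[a].append(h)
        out_nbrs[h].append(a)
    auths = list(in_nbrs)
    score = {a: 1.0 / len(auths) for a in auths}
    for _ in range(max_iter):
        new = {a: 0.0 for a in auths}
        for i in auths:
            back = score[i] / len(in_nbrs[i])   # backward step: prob 1/in_deg(i)
            for h in in_nbrs[i]:
                fwd = back / len(out_nbrs[h])   # forward step: prob 1/out_deg(h)
                for j in out_nbrs[h]:
                    new[j] += fwd
        delta = sum(abs(new[a] - score[a]) for a in auths)  # L1 change
        score = new
        if delta < tol:
            break
    return score


scores = salsa_authority([("h1", "a"), ("h2", "a"), ("h2", "b")])
```

Here a ≈ 2/3 and b ≈ 1/3. When the authorities form one connected component, SALSA's authority scores are proportional to in-degree.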

Hub & Authority Graphs

Evaluations and Analyses

Extrapolation
Extrapolation constructs new data points outside a discrete set of known data points. Fast convergence and quick response time are crucial in query-dependent algorithms, because their scores must be computed at query time. Using the properties of Markov chains, we can formulate extrapolation, relying largely on the fact that the dominant eigenvalue of a stochastic (Markov) matrix is λ1 = 1.
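The role of λ₁ = 1 can be made concrete. For this sketch, assume the iteration x^{(k)} = M x^{(k-1)} uses a matrix M acting on column vectors (for example, the transpose of a row-stochastic matrix). Assume also that M is diagonalizable with eigenvectors u_1, …, u_n and eigenvalues 1 = λ_1 > |λ_2| ≥ … ≥ |λ_n|, and that the starting vector is scaled so that its u_1 coefficient is 1:

```latex
\begin{align*}
x^{(0)} &= u_1 + \sum_{i=2}^{n} c_i\, u_i\\
x^{(k)} = M^{k} x^{(0)} &= u_1 + \sum_{i=2}^{n} c_i\, \lambda_i^{k}\, u_i\\
\bigl\| x^{(k)} - u_1 \bigr\| &= O\!\left(|\lambda_2|^{k}\right)
\end{align*}
```

Because λ₁ = 1, the u_1 term never changes from one iterate to the next. The remaining error is a combination of decaying eigenvector terms. Extrapolation estimates the leading terms from a few successive iterates and subtracts them.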

Extrapolation on PageRank by Kamvar et al.

Techniques
Aitken Δ² extrapolation for fixed-point iteration, and its assumption; quadratic extrapolation, and its assumption.
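Below is a minimal sketch of Aitken Δ² extrapolation inside a PageRank-style power iteration. It assumes, as the Aitken assumption does, that an iterate is close to a combination of the principal eigenvector and one other eigenvector; under that assumption, the standard componentwise Aitken formula recovers the limit exactly. The helper names, the single extrapolation at a fixed iteration, and the renormalization step are illustrative choices, not the thesis's implementation. Quadratic extrapolation is not shown.

```python
def aitken(x0, x1, x2, eps=1e-15):
    """Componentwise Aitken delta-squared step from three successive iterates.

    If every component behaves like u + e * lam**k, this returns u exactly.
    Components whose second difference is ~0 keep the latest iterate.
    """
    out = []
    for a, b, c in zip(x0, x1, x2):
        g = b - a              # first difference
        h = c - 2.0 * b + a    # second difference (the "delta squared")
        out.append(a - g * g / h if abs(h) > eps else c)
    return out


def power_with_aitken(P, alpha=0.85, extrapolate_at=10, tol=1e-10, max_iter=1000):
    """Iterate x <- alpha * x P + (1 - alpha) / n for a row-stochastic P
    (no dangling rows assumed), applying one Aitken step at iteration
    `extrapolate_at` (0 disables it). Returns (x, iterations_used)."""
    n = len(P)
    x = [1.0 / n] * n
    history = [x]
    for k in range(1, max_iter + 1):
        new = [(1.0 - alpha) / n + alpha * sum(x[i] * P[i][j] for i in range(n))
               for j in range(n)]
        history = (history + [new])[-3:]  # keep the last three iterates
        if k == extrapolate_at and len(history) == 3:
            new = aitken(*history)
            total = sum(new)
            new = [v / total for v in new]  # restore sum(x) == 1
        delta = sum(abs(a - b) for a, b in zip(new, x))  # L1 change
        x = new
        if delta < tol:
            return x, k
    return x, max_iter


P = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
plain, k_plain = power_with_aitken(P, extrapolate_at=0)
fast, k_fast = power_with_aitken(P, extrapolate_at=5)
```

The extrapolated vector is fed back into the same contraction. Later power steps therefore still converge to the same stationary vector even if an extrapolation step is imperfect; the intended gain is to start those steps closer to the limit.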

An Example
Extrapolation in an experiment for the query “computational complexity”.

Insight into Extrapolation
A new premise; automated adjustment of extrapolation parameters; hybrid extrapolation.

Implications & Conclusions

Implications & Future Work
Study of the graph structure of the web: the bow-tie structure, power-law distributions, and clustering or classification.
Much more can be done with extrapolation: convergence could be judged by other measures, not just the L1 norm; hybrid extrapolation; and extrapolation for personalization.
Personalization is a current and future active topic in information retrieval: active use of usage data from web-server logs; spreading activation to propagate and generalize users' preferences; and structural retrieval models with nested structures.
IR is a difficult task, but believe me, it is very interesting and rewarding.
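The note that convergence need not be measured only with the L1 norm can be illustrated with two hypothetical helpers. The first computes the residual between successive iterates under a chosen norm. The second is a rank-based test that only asks whether the top-k pages have stopped changing. Both are sketches: the names and norm options are assumptions, not part of the talk.

```python
import math


def residual(x_old, x_new, norm="l1"):
    """Distance between successive iterates under the chosen norm."""
    diffs = [abs(a - b) for a, b in zip(x_old, x_new)]
    if norm == "l1":
        return sum(diffs)
    if norm == "l2":
        return math.sqrt(sum(d * d for d in diffs))
    if norm == "linf":
        return max(diffs)
    raise ValueError(f"unknown norm: {norm}")


def top_k(x, k):
    """Indices of the k largest scores, highest first (ties keep index order)."""
    return sorted(range(len(x)), key=lambda i: -x[i])[:k]


def same_top_k(x_old, x_new, k):
    """Rank-based stopping test: the k best pages, in order, are unchanged."""
    return top_k(x_old, k) == top_k(x_new, k)
```

A rank-based test suits query-dependent ranking because users see only the order of the top results. However, a stable top-k does not guarantee that the scores themselves have converged.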

Recommendations
Prior knowledge of subjects such as linear algebra, combinatorial optimization, compiler construction, finite automata, and AI.
A “good” dataset is crucial, as is an evaluation framework such as the work by Tsaparas et al.
The task is difficult but achievable, and rewarding!

Interesting Resources
The Structure of Information Networks (Jon Kleinberg); Centre for Complex Network Research (CCNR); Personalization; Learning to Rank for Information Retrieval; Statistical Cybermetrics; Web Research Collections (TREC Web, Terabyte, and Blog tracks); work by Amy N. Langville, Sep Kamvar, Panayiotis Tsaparas, Tie-Yan Liu, and David Gleich; Searching Stanford; and the references in the thesis.

Thank You! Q & A
You can send feedback and questions to: