Link Building and Communities in Large Networks


Link Building

Link Building is W[1]-Hard and allows no FPTAS

The dashed links show the set of two new links maximizing the PageRank value of page 1.

Result: Computing a set of k new links pointing to a given page that maximizes the PageRank value of that page is W[1]-hard, and no FPTAS exists for this problem unless P = NP [1].

Identification of Community Members

In [4] a community is defined as a set C for which every member of C has relatively more links to nodes in C than any non-member.

Results: Such communities are also NP-hard to compute [4]. A simple and efficient greedy approach was used to find a community of 556 Danish computer science sites. The sites were ranked with a local version of the PageRank algorithm:

1) (CS U Aarhus)
2) (CS U Copenhagen)
3) (ITU Copenhagen)
4) (CS U Aalborg)
5) (CS PhD School)
6) (Informatics/Mathematical modeling DTU)
17) (CS/Mathematics U Southern Denmark)

The sites marked with bold font were given to the greedy approach as "known" members of the community.

In the real world people try to identify optimal places for signs. The corresponding problem in cyberspace is to identify optimal places (web pages) for links to web pages.

The PageRank Algorithm

The PageRank algorithm used by Google assumes that a web surfer visiting a page p chooses the next page to visit according to these rules:

With probability 0.85 the surfer follows a link from p chosen uniformly at random.
With probability 0.15 the surfer visits another page chosen uniformly at random.

The PageRank value of a page is the probability that the surfer visits the page after s steps, for large s.

A Simple Link Building Problem

Exercise: Find the page u in {3, 4, 5, 6, 7} for which adding the link (u, 1) gives the maximum increase in the PageRank value of page 1. The percentages are the current PageRank values of the nodes. The solution appears somewhere on the poster.
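The random-surfer rules and the exercise above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names pagerank and best_new_link are hypothetical, the random jump is taken uniformly over all pages, a page without out-links is assumed to jump uniformly at random, and the seven-page graph is made up (it is not the graph from the poster figure). The search simply recomputes PageRank once per candidate link.

```python
def pagerank(out_links, damping=0.85, iterations=200):
    """Power iteration for the random-surfer model (follow a link w.p. 0.85)."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Assumption: rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                # The surfer picks one of v's out-links uniformly at random.
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


def best_new_link(out_links, target, candidates):
    """Try each new link (u, target); return the u maximizing target's PageRank."""
    best_u, best_value = None, -1.0
    for u in candidates:
        if target in out_links[u]:
            continue  # the link (u, target) already exists
        trial = {v: set(ws) for v, ws in out_links.items()}
        trial[u].add(target)
        value = pagerank(trial)[target]
        if value > best_value:
            best_u, best_value = u, value
    return best_u, best_value


# Hypothetical 7-page graph, NOT the graph shown on the poster.
graph = {
    1: {2},
    2: {3, 4},
    3: {4},
    4: {5},
    5: {6, 7},
    6: {7},
    7: {2},
}

if __name__ == "__main__":
    before = pagerank(graph)[1]
    u, after = best_new_link(graph, target=1, candidates=[3, 4, 5, 6, 7])
    print(f"PageRank of page 1 before: {before:.4f}")
    print(f"best new link: ({u}, 1), PageRank of page 1 after: {after:.4f}")
```

This brute-force search costs one full PageRank computation per candidate page, so its running time grows with the number of candidates.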
Result: This problem can be solved in time corresponding to a small, constant number of PageRank computations [2].

Communities

You might want to restrict your search for optimal new links to the community of a page (or a set of pages). But how can you define a community of nodes in a graph?

62 dolphins in Doubtful Sound (New Zealand) were observed. The graph shows a real community structure of the dolphins. A dolphin has at least as many friends in its own community as in any other community.

Result: Such structures are NP-hard to compute [3].

References

[1] Maximizing PageRank with new Backlinks, Olsen, submitted.
[2] The Computational Complexity of Link Building, Olsen, COCOON.
[3] Nash Stability in Additively Separable Hedonic Games and Community Structures, Olsen, Theory of Computing Systems.
[4] Communities in Large Networks: Identification and Ranking, Olsen, Fourth Workshop on Algorithms and Models for the Web-Graph.

Solution to the problem: u = [value missing] (figure labels: 6.93%, 7.77%, 5.95%, 9.08%)

MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation
Martin Olsen, Aarhus University