The influence of search engines on preferential attachment. Dan Li, CS3150, Spring 2006.


The paper: "The influence of search engines on preferential attachment" by Soumen Chakrabarti, Alan Frieze and Juan Vera.

Background: The evolution of social networks through time. Web graph models: preferential attachment and the copying model.

Background: Evolution of the Web. Power law. Preferential attachment (Barabási and Albert). Copying model: the author of a newborn page u picks a random reference page v from the web and, with some probability, copies out-links from v to u. Power law: power ~ 2. Organic evolution: NO POWERFUL CENTRAL ENTITY!
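The copying step described above can be sketched in a few lines. This is a minimal illustration, not the exact model from the literature: the function name `copying_model`, the parameters `copy_prob` and `out_links`, the single self-linking seed page, and the rule "link to a uniformly random page when not copying" are all assumptions made for the example.

```python
import random

def copying_model(n, copy_prob, out_links, seed=0):
    """Grow a directed graph; returns {page: list of out-link targets}."""
    rng = random.Random(seed)
    # Assumed seed graph: one page whose out-links all point to itself.
    links = {0: [0] * out_links}
    for u in range(1, n):
        v = rng.randrange(u)  # random reference page among existing pages
        targets = []
        for j in range(out_links):
            if rng.random() < copy_prob:
                targets.append(links[v][j])       # copy v's j-th out-link
            else:
                targets.append(rng.randrange(u))  # assumed fallback: uniform random page
        links[u] = targets
    return links

graph = copying_model(n=1000, copy_prob=0.5, out_links=3)
```

Copying favours pages that already have many in-links, because they appear in more out-link lists, which is why this process also produces heavy-tailed in-degrees without any central authority.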

The New Problem: How do page authors find existing pages and create links to them? Highly popular search engines limit the attention of page authors to a small set of celebrity pages. Page authors frequently use search engines to locate pages, and link to the HOT pages they visit (with probability p).

The New Problem: The evolution of the Web graph has been influenced permanently and pervasively by the existence of search engines. A search engine ranks a page highly; authors find the page more often; some of them link to it, raising its in-degree and PageRank; this leads to a further improvement or entrenchment of its rank.

The Results in This Paper: With high probability, the celebrity nodes eventually accumulate a constant fraction of all links created. The degrees of the other nodes still follow a power-law distribution, with a steeper power.

The New Model: Models how the web graph evolves if authors use a search engine to decide which links to insert into new pages, and how the degree distribution deviates from the traditional model.

The New Model: Undirected web graph. The query to the search engine is fixed. The search engine returns a fixed number of URLs, ordered by their degree at the previous time-step. The analysis is limited to one topic at a time; is this really without loss of generality? Comment: a new page may involve multiple topics at the same time and include a different number of links for each topic.

The New Model: Growth process: generates a sequence of graphs G_t, t = 1, 2, 3, … At time t, the graph G_t = (V_t, E_t) has t vertices and mt edges. Parameters: p, a probability; N, the maximum number of celebrity nodes listed by the search engine.

The New Model – Comments: The number of links each new page creates is fixed. Is this realistic? How does it affect the results? Intuitively, a page author may not have a fixed number of links in mind; the author decides whether to include each link based on its content.

Some Notations in the new model

Formal Definition of Process P

The New Model: In both cases y_i is selected by preferential attachment within the target subset U of old nodes, i.e. for x in U, $\Pr[y_i = x] = \deg_{t-1}(x) / \sum_{u \in U} \deg_{t-1}(u)$.
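The growth process can be simulated directly from the description on these slides. The sketch below is a reading of the slides under stated assumptions, not the paper's formal definition (which did not survive in this transcript): the name `simulate_process_p`, the start from a single vertex with m self-loops, and breaking degree ties in favour of older vertices are all choices made for the example.

```python
import random

def simulate_process_p(steps, m, p, N, seed=0):
    """Return the degree of each vertex after `steps` vertices have arrived."""
    rng = random.Random(seed)
    # Assumed start: vertex 0 with m self-loops, so its degree is 2m.
    degree = [2 * m]
    for t in range(1, steps):
        old = list(range(t))
        # Celebrity list: up to N largest-degree old vertices, by degree at
        # the previous time-step (assumed tie-break: older vertex first).
        celebrities = sorted(old, key=lambda v: (-degree[v], v))[:N]
        snapshot = degree[:]  # degrees at the end of the previous step
        for _ in range(m):
            # With probability p pick among celebrities, else among all old nodes.
            target_set = celebrities if rng.random() < p else old
            weights = [snapshot[v] for v in target_set]
            # Preferential attachment within the chosen subset.
            y = rng.choices(target_set, weights=weights, k=1)[0]
            degree[y] += 1
        degree.append(m)  # the new vertex contributes one endpoint per edge
    return degree

degrees = simulate_process_p(steps=2000, m=3, p=0.3, N=5)
```

Each step adds m edges, so the total degree after t vertices is 2mt, matching the slide's count of mt edges. The per-step sort makes this quadratic in `steps`, which is acceptable for small experiments only.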

The New Model - Comments: The m random edges may have duplicate endpoints: for different i, the same vertex may be selected! When t is smaller than m, we have a lot of loops. Should we really start from one vertex? Instead, we could start from m vertices or N vertices, with the initial web graph created at random. With high probability, the oldest pages become celebrity pages. What happens in the real world? A page becomes hot not only by chance, but also because of its content; can we model this?

The simulation results: Very different from standard preferential attachment! The celebrities are far from the power-law straight line in the log-log plot. As p increases, the power increases as well! Table: p | Simulated power | Computed power (values not recoverable). The celebrities command a constant fraction of the total degree over all nodes; this fraction grows with p.
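One plausible way to obtain numbers like the "simulated power" and the celebrity fraction from a list of vertex degrees is sketched below. The slides do not say how the power was measured; the least-squares fit on the log-log degree histogram used here is a crude estimator chosen for brevity, and the names `loglog_slope`, `celebrity_fraction` and `skip_top` are hypothetical.

```python
import math
from collections import Counter

def celebrity_fraction(degrees, N):
    """Share of the total degree held by the N largest-degree vertices."""
    return sum(sorted(degrees, reverse=True)[:N]) / sum(degrees)

def loglog_slope(degrees, skip_top=0):
    """Least-squares slope of log(count) against log(degree).

    The `skip_top` largest degrees (the celebrities) are left out,
    since they sit far from the straight line.
    """
    kept = sorted(degrees)[: len(degrees) - skip_top]
    counts = Counter(kept)
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic check: counts proportional to degree**-3, plus two "celebrities".
sample = [1] * 4096 + [2] * 512 + [4] * 64 + [8] * 8 + [1000, 900]
power = -loglog_slope(sample, skip_top=2)
```

The fitted slope is negative, so the power is its negation. On real simulation output the tail of the histogram is noisy, and a binned or maximum-likelihood fit would be more reliable.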

The simulation results

Results

Theorem 1

Interpretations: Celebrities capture a large (? depends on the constant) fraction of links. Non-celebrities follow a power-law degree distribution with a power steeper than in preferential attachment.

The Proof: The celebrity list becomes fixed whp after some time t_f. Once the celebrity list is fixed, process P looks very similar to an analogous process P*: in each step, P* takes the N oldest vertices as S_t, instead of the N largest-degree vertices. This is quite reasonable: the oldest vertices tend to have higher degree, since they have had more time to be linked to.
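The difference between the two processes comes down to how the celebrity set is chosen at each step. The helper names below (`top_n_by_degree`, `oldest_n`, `lists_agree`) are hypothetical, and the sketch assumes vertices are numbered by arrival order, with ties in degree going to the older vertex.

```python
def top_n_by_degree(degree, N):
    """Celebrity set in process P: the N largest-degree vertices."""
    order = sorted(range(len(degree)), key=lambda v: (-degree[v], v))
    return set(order[:N])

def oldest_n(degree, N):
    """Celebrity set in process P*: the N oldest vertices."""
    return set(range(min(N, len(degree))))

def lists_agree(degree, N):
    """True when P and P* would use the same celebrity set for these degrees."""
    return top_n_by_degree(degree, N) == oldest_n(degree, N)

same = lists_agree([9, 7, 3, 3], N=2)       # oldest two are also the largest
different = lists_agree([9, 2, 5, 3], N=2)  # vertex 2 has overtaken vertex 1
```

Running a check like `lists_agree` after every step of a simulation is one way to observe empirically whether, and when, the two choices of celebrity set coincide.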

Coupling G_t and G_t*

Analysis of the degree distribution of G_t*

Basic Proof to Lemma 2 Finding recurrence of Finding a similar recurrence:

Lemma 3

Basic Proof of Lemma 3

The celebrity list gets fixed whp: adding m edges to a single non-celebrity will not make it a celebrity. The total degree of the celebrities is concentrated around a constant fraction of all edges ever added to the graph.

List-fixing Lemma

Proof of Lemma 4

Lemma 5

Lemma 6: With low probability, a celebrity has low degree.

Lemma 7: With low probability, a non-celebrity has high degree.

Lemma 8: With low probability, the gap stays small.

Proof of Theorem 1: Let t_f be the last time that S_t changes in the process P.

Proof of Theorem 1 cont.

Conclusions: Modeling the influence of a search engine within the preferential attachment framework leads to a qualitative change in the familiar power-law degree distribution. Each of a clot of celebrities captures a constant fraction of the total degree of the graph, and the degrees of the remaining nodes follow a steeper power law.

Is this Model real? The model differs from reality: edges are undirected; out-links are not modified after creation; pages do not die; there is no topic-based clustering.

Comments: This model is applied to only one topic. There may be interactions between topics. An author may include links for different topics in the same page. The number of links on a page is fixed, which is not the case in reality.

Thank you!