1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006

2 Web Structure I: Power Laws and the Small-World Phenomenon

3 Outline
- Power laws
- The preferential attachment model
- Small-world networks
- The Watts-Strogatz model

4 Observed Phenomena
- Few multi-billionaires, but many with modest income [Pareto, 1896]
- Few frequent words, but many infrequent words [Zipf, 1932]
- Few “mega-cities”, but many small towns [Zipf, 1949]
- Few web pages with high degree, but many with low degree [Kumar et al. 99] [Barabási & Albert 99]
All of the above obey power laws.

5 Power Law (Pareto) Distribution
Pr[X ≥ x] = (k/x)^α for x ≥ k
- α > 0: shape parameter (“slope”)
- k > 0: location parameter
Ex: (k = $1000, α = 2)
- 1/100 earn ≥ $10,000
- 1/10,000 earn ≥ $100,000
- 1/1,000,000 earn ≥ $1,000,000
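
The income example can be checked numerically. The sketch below assumes the standard Pareto tail Pr[X ≥ x] = (k/x)^α for x ≥ k; the function name `pareto_tail` is made up for illustration.

```python
def pareto_tail(x, k, alpha):
    """Pr[X >= x] for a Pareto distribution with location k and shape alpha.

    Assumes the standard tail formula (k / x) ** alpha for x >= k.
    """
    if x <= k:
        return 1.0  # every sample is at least k
    return (k / x) ** alpha


# The example from the slide: k = $1000, alpha = 2.
for income in (10_000, 100_000, 1_000_000):
    print(income, pareto_tail(income, k=1_000, alpha=2))
```

Each tenfold increase in income divides the tail probability by 10^α = 100, which matches the three fractions on the slide.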

6 Power Law Properties
- PDF: f(x) = α k^α / x^(α+1) for x ≥ k
- Infinite mean for α ≤ 1
- Infinite variance for α ≤ 2
- When X is discrete, Pr[X = x] ∝ 1/x^(α+1)
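
The two thresholds follow from integrating against the density. The derivation below is a sketch that assumes the standard Pareto density with shape α and location k; it needs `amsmath` and `amssymb` to typeset.

```latex
\begin{align*}
f(x) &= \frac{\alpha k^{\alpha}}{x^{\alpha+1}}, \qquad x \ge k,\\
\mathbb{E}[X] &= \int_k^{\infty} x\, f(x)\,dx
   = \alpha k^{\alpha} \int_k^{\infty} x^{-\alpha}\,dx
   = \frac{\alpha k}{\alpha-1} \quad (\alpha > 1),\\
\mathbb{E}[X^2] &= \alpha k^{\alpha} \int_k^{\infty} x^{1-\alpha}\,dx
   = \frac{\alpha k^{2}}{\alpha-2} \quad (\alpha > 2).
\end{align*}
```

The first integral diverges when α ≤ 1 and the second when α ≤ 2, which is where the infinite mean and infinite variance come from.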

7 Power Law Graphs
[Figure: a power law on a linear-scale plot and on a log-log plot; on the log-log plot it is a straight line with slope = -α]
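
The straight line on log-log axes can be verified directly. The sketch below assumes the plotted quantity is the tail probability Pr[X ≥ x] = (k/x)^α; if the figure showed the density instead, the slope would be -(α+1). The name `loglog_slope` is illustrative.

```python
import math


def loglog_slope(x1, x2, k, alpha):
    """Slope of log Pr[X >= x] against log x between two points x1 < x2.

    Assumes the Pareto tail (k / x) ** alpha, valid for x >= k.
    """
    tail = lambda x: (k / x) ** alpha
    rise = math.log(tail(x2)) - math.log(tail(x1))
    run = math.log(x2) - math.log(x1)
    return rise / run


# Any two points above k give the same slope, -alpha.
print(loglog_slope(2_000, 50_000, k=1_000, alpha=2))
print(loglog_slope(5_000, 7_000, k=1_000, alpha=2))
```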

8 Scale-Free Distributions
Power laws are invariant to scale: for every c ≥ 1, Pr[X ≥ ck] = c^(-α), whatever the value of k
Ex: (k = arbitrary, α = 2)
- 1/100 earn ≥ 10k
- 1/10,000 earn ≥ 100k
- 1/1,000,000 earn ≥ 1000k

9 Heavy Tailed Distributions
- In many “classical” distributions, Pr[X ≥ x] decays at least exponentially fast in x (“light tail”)
  Ex: normal, exponential
- In power law distributions, Pr[X ≥ x] decays only polynomially in x (“heavy tail”)

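10 Zipf’s Law
Size of r-th largest city is proportional to 1/r^β, for a constant β > 0
Equivalent to a power law:
- X = size of a city
- Pr[X ≥ size of the r-th largest city] = r/n, where n is the number of cities
- Change variables: x ∝ 1/r^β gives r ∝ x^(-1/β), so Pr[X ≥ x] ∝ x^(-1/β)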

11 Power Laws and the Internet
Web graph
- In- and out-degrees (in slope: ~2.1, out slope: ~2.7) [Kumar et al. 99, Barabási & Albert 99, Broder et al. 00]
- Sizes of connected components [Broder et al. 00]
- Website sizes [Huberman & Adamic 99]
Internet graph
- Degrees [Faloutsos, Faloutsos & Faloutsos 99]
- Eigenvalues [Mihail & Papadimitriou 02]
Traffic
- Number of visits to websites

12 Power Laws and Graphs
If X is a random web page, then Pr[indegree(X) = k] ∝ 1/k^2.1
What random graph model explains this phenomenon?

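13 Erdős-Rényi Random Graphs
G(n,p)
- n: size of the graph (fixed)
- p: edge existence probability (fixed): every pair u,v is connected by an edge with probability p
Theorem [Erdős & Rényi 60]: For any node x in G(n,p), degree(x) has a Binomial(n-1, p) distribution:
Pr[degree(x) = k] = C(n-1, k) p^k (1-p)^(n-1-k)
The tail of this distribution decays exponentially in k, so G(n,p) does not produce a power law.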
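
A small experiment makes the contrast with power laws concrete. The sketch below samples one G(n,p) graph and compares its degrees to the binomial distribution; the function names, the seed, and the parameter values are arbitrary choices for illustration.

```python
import random
from math import comb


def gnp(n, p, rng):
    """Sample G(n,p): each of the n(n-1)/2 pairs is an edge with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def binomial_pmf(k, trials, p):
    """Pr[Binomial(trials, p) = k]."""
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)


rng = random.Random(0)
n, p = 1000, 0.01
adj = gnp(n, p, rng)
degrees = [len(adj[v]) for v in range(n)]
mean_degree = sum(degrees) / n

print("mean degree:", mean_degree, "expected:", (n - 1) * p)
for k in (5, 10, 15, 20):
    observed = degrees.count(k) / n
    print(k, observed, binomial_pmf(k, n - 1, p))
```

Degrees cluster tightly around (n-1)p, with no node of very high degree, unlike the degree distribution observed on the Web.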

14 Preferential Attachment [Barabási & Albert 99]
A novel random graph model
- Initialization: the graph starts with a single node with two self loops.
- Growth: at every step a new node v is added to the graph. v has a self loop and connects to one neighbor.
- Preferential attachment: v connects to u with probability indegree(u) / Σ_w indegree(w), where the sum ranges over the nodes already in the graph
The rich get richer / The winner takes it all
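
The model is easy to simulate. The sketch below is one reading of the slide, under these assumptions: the initial node counts as step 1, a self loop adds 1 to a node's indegree, and the new node picks its neighbor among the existing nodes with probability proportional to indegree. All names are illustrative.

```python
import random
from collections import Counter


def preferential_attachment(steps, rng):
    """Return the list of indegrees after `steps` steps of the process."""
    indegree = [2]      # node 0 starts with two self loops
    endpoints = [0, 0]  # node u appears indegree[u] times in this list
    for v in range(1, steps):
        # A uniform pick from `endpoints` selects an existing node u
        # with probability indegree[u] / (total indegree).
        u = rng.choice(endpoints)
        indegree[u] += 1
        indegree.append(1)  # v's own self loop
        endpoints.extend((u, v))
    return indegree


rng = random.Random(1)
indegree = preferential_attachment(20_000, rng)
counts = Counter(indegree)
for k in (1, 2, 3, 4, 8):
    print(k, counts[k] / len(indegree))
```

Keeping one list entry per unit of indegree makes each preferential choice a single uniform draw, at the cost of memory linear in the number of edges.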

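15 Why Does it Work?
n_{k,t}: # of nodes whose indegree = k after t steps
The total indegree after t steps is 2t (counting the initial node as step 1), so the new node attaches to a given node of indegree k with probability k/(2t).
Expected growth:
- k > 1: E[n_{k,t+1}] - n_{k,t} = (k-1) n_{k-1,t} / (2t) - k n_{k,t} / (2t)
- k = 1: E[n_{1,t+1}] - n_{1,t} = 1 - n_{1,t} / (2t)

16 Why Does it Work? (2)
Fact: After sufficiently many steps, n_{k,t}/t reaches a “steady state”.
c_k = value of n_{k,t}/t at the steady state.
Since n_{k,t} = c_k t at steady state, n_{k,t+1} - n_{k,t} = c_k.
Hence, c_1 = 1 - c_1/2 and, for k > 1, c_k = (k-1) c_{k-1} / 2 - k c_k / 2.
Therefore: c_1 = 2/3 and c_k = c_{k-1} (k-1)/(k+2) for k > 1.

17 Why Does it Work? (3)
Then: c_k = c_1 · Π_{j=2..k} (j-1)/(j+2) = (2/3) · 6 / (k (k+1) (k+2))
And: c_k = 4 / (k (k+1) (k+2)) ~ 4/k^3
Therefore: the fraction of nodes with indegree k is proportional to 1/k^3 for large k, a power law.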
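
The steady-state argument can be checked with exact arithmetic. The sketch below assumes the recurrence c_1 = 1 - c_1/2 and c_k = ((k-1) c_{k-1} - k c_k)/2 for k > 1, which is this sketch's reading of the model (total indegree 2t after t steps); the function names are illustrative.

```python
from fractions import Fraction


def steady_state(max_k):
    """Solve the assumed recurrence exactly for k = 1 .. max_k."""
    c = {1: Fraction(2, 3)}  # solution of c_1 = 1 - c_1 / 2
    for k in range(2, max_k + 1):
        # c_k = ((k-1) c_{k-1} - k c_k) / 2  rearranges to
        # c_k = c_{k-1} * (k-1) / (k+2)
        c[k] = c[k - 1] * Fraction(k - 1, k + 2)
    return c


def closed_form(k):
    """Candidate closed form 4 / (k (k+1) (k+2))."""
    return Fraction(4, k * (k + 1) * (k + 2))


c = steady_state(50)
for k in (1, 2, 5, 10, 50):
    print(k, c[k], float(c[k]), 4 / k**3)
```

The recurrence and the closed form agree term by term, and for large k the values approach 4/k^3.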

18 Six Degrees of Separation [Stanley Milgram, 67]
- “Random starters” in Nebraska, Kansas, etc.
- Destinations: in Boston
- Intermediaries send postcards to Milgram
- Findings: average of 6 postcards
- “Conclusion”: every two people in the US are connected by a path of length ~6

19 Small-World Networks
- Average diameter: length of the shortest path from u to v, averaged over all pairs u,v
- Clustering coefficient: fraction of pairs of neighbors of v that are themselves neighbors, averaged over all v
- Small-world network: a sparse graph with average diameter O(log n) and a constant clustering coefficient
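
Both quantities are straightforward to compute on a small graph. The sketch below assumes an undirected, connected graph stored as a dict of neighbor sets, and adopts the convention (an assumption, not from the slides) that a vertex with fewer than two neighbors contributes 0 to the clustering coefficient.

```python
from collections import deque
from itertools import combinations


def average_path_length(adj):
    """Mean shortest-path length over all pairs, via BFS from every vertex."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable vertices other than the source
    return total / pairs


def clustering_coefficient(adj):
    """Average over v of the fraction of neighbor pairs of v that are adjacent."""
    coeffs = []
    for v, nbrs in adj.items():
        if len(nbrs) < 2:
            coeffs.append(0.0)  # convention assumed for this sketch
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(linked / (len(nbrs) * (len(nbrs) - 1) / 2))
    return sum(coeffs) / len(coeffs)


# A triangle 0-1-2 with one extra vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_path_length(adj))
print(clustering_coefficient(adj))
```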

20 The Web as a Small World Network
Low diameter
- Study of a synthetic web graph model [Albert, Jeong, Barabási 99]
  - Average diameter of the Web is ~19
  - Grows logarithmically with the size of the Web
- Study of a large crawl [Broder et al. 00]
  - Average diameter of the SCC is ~16
  - Maximum diameter of the SCC is ≥ 28
- Diameter of host graph [Adamic 99]
  - Average diameter of SCC: ~4
High clustering coefficient
- Clustering coefficient of host graph [Adamic 99]
  - Clustering coefficient: ~0.08 (far higher than in a comparable random graph)

21 Model for Small-World Networks [Watts & Strogatz 98]
One extreme: random networks
- Low diameter
- Low clustering coefficient
Other extreme: “regular” networks (e.g., a lattice)
- High clustering coefficient
- High diameter
Small-world: interpolation between the two
- Low diameter
- High clustering coefficient
- Regularity: social networking
- Randomness: individual interests

22 Random Network
The model:
- n vertices
- Every pair u,v is connected by an edge with probability p = d/n
Properties:
- Expected number of edges: ~dn/2
- Graph is connected w.h.p. (for d ≥ c ln n with a constant c > 1)
- Diameter: O(log n) w.h.p.
- Clustering coefficient: ~p = d/n = o(1)

23 Ring Lattice
The model:
- n vertices on a circle
- Every vertex has d neighbors: the d/2 vertices to its right and the d/2 vertices to its left
Properties:
- Number of edges: dn/2
- Graph is connected
- Diameter: O(n/d)
- Clustering coefficient: 3(d-2) / (4(d-1)), which approaches 3/4 as d grows
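
The lattice properties can be confirmed on a concrete instance. The sketch below assumes d is even and n is comfortably larger than 2d, so that neighborhoods do not wrap around the circle; the closed form 3(d-2)/(4(d-1)) it checks is the standard value for this lattice. Names are illustrative.

```python
from itertools import combinations


def ring_lattice(n, d):
    """n vertices on a circle, each joined to its d/2 nearest vertices on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            w = (v + i) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of neighbors of v that are adjacent to each other."""
    nbrs = adj[v]
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (len(nbrs) * (len(nbrs) - 1) / 2)


n, d = 100, 6
adj = ring_lattice(n, d)
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2

print("edges:", num_edges, "expected:", n * d // 2)
print("clustering at vertex 0:", local_clustering(adj, 0))
print("closed form:", 3 * (d - 2) / (4 * (d - 1)))
```

Every vertex looks the same by symmetry, so the local value at vertex 0 is also the average over all vertices.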

24 Random Rewiring
Start from a ring lattice
for i = 1 to d/2 do
    for v = 1 to n do
        Pick the i-th clockwise nearest neighbor of v
        With probability p, replace this neighbor by a random vertex
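
The rewiring loop translates almost line for line into code. The sketch below makes two assumptions the slide does not state: the random replacement is drawn uniformly from vertices that are neither v itself nor already adjacent to v (so no self loops or duplicate edges arise), and vertices are numbered from 0. The function name and parameter values are illustrative.

```python
import random


def watts_strogatz(n, d, p, rng):
    """Ring lattice on n vertices with d neighbors each, rewired with probability p."""
    # Start from a ring lattice.
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            w = (v + i) % n
            adj[v].add(w)
            adj[w].add(v)

    # Visit the i-th clockwise neighbor of every vertex, nearest first.
    for i in range(1, d // 2 + 1):
        for v in range(n):
            if rng.random() >= p:
                continue  # keep the lattice edge
            old = (v + i) % n
            candidates = [w for w in range(n) if w != v and w not in adj[v]]
            if not candidates:
                continue  # v is already adjacent to everyone
            new = rng.choice(candidates)
            adj[v].discard(old)
            adj[old].discard(v)
            adj[v].add(new)
            adj[new].add(v)
    return adj


rng = random.Random(7)
graph = watts_strogatz(n=200, d=6, p=0.1, rng=rng)
print("edges:", sum(len(nbrs) for nbrs in graph.values()) // 2)
```

Each rewiring removes one edge and adds one, so the graph keeps exactly dn/2 edges for every value of p.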

25 Analysis
If p = 0, ring lattice
- High clustering coefficient
- High diameter
If p = 1, random network
- Logarithmic diameter
- Low clustering coefficient
However,
- Diameter goes down rapidly as p grows
- Clustering coefficient goes down slowly as p grows
Therefore, for small p, we get a small-world network.
- Logarithmic diameter
- High clustering coefficient

26 End of Lecture 7