CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.


1 CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian

2 Topics covered in the course
  - Structure and modeling of social networks: Power law graphs; Small world phenomenon; High clustering coefficient; Probabilistic and game theoretic models
  - Algorithms for link analysis: Crawling the web; HITS; PageRank; Webspam; Rank aggregation; Spectral clustering
  - Economic aspects of the Internet: Peering relations; Alternative mechanisms for routing; P2P networks
  - Topics motivated by e-commerce: Reputation mechanisms; Recommendation systems; Ad auctions

3 Logistics
  Course web page: http://www.cs.washington.edu/education/courses/cse522/05au/
  Course work:
  - reading papers (1/week on avg)
  - possibly a few problem sets
  How to contact us: {nickle,mahdian}@microsoft.com

4 Social Networks
  A social network is a graph that represents relationships between independent entities.
  - Graph of friendships (or in the virtual world, networks like orkut)
  - Web of sexual contact
  - Graph of scientific collaborations
  - Cross-posts in newsgroups
  - Web graph (links between webpages)
  - Internet: Inter/Intra-domain graph

5 Scientific Collaboration Network
  400,000 nodes: authors in the Mathematical Reviews database.
  An edge joins two authors if they have a joint paper.
  Just 676,000 edges.
  Picture from orgnet.com

6 Scientific Collaboration Network
  Average degree 3.36
  A few high-degree nodes:
  - Paul Erdős, 509
  - Frank Harary, 268
  - Yuri Alekseevich Mitropolskii, 244
  Many low-degree nodes (100,000 of degree 1)
  Picture from orgnet.com

7 Scientific Collaboration Network
  Short paths:
  - Maximum Erdős number is 13
  - Any two authors are connected by a path of length at most 23
  - Average distance between two authors is 7.64
  - e.g.: John Nash → Shapley → Fulkerson → Hoffman → Paul Erdős
  Many triangles …
  Picture from orgnet.com

8 9/11 Terrorist Network Picture from orgnet.com

9 Newsgroup Cross-Post Graph
  Nodes are newsgroups, essentially archived email lists.
  Edges are cross-posts, i.e. there is an edge between two newsgroups to which an identical email is posted.
  Example nodes: alt.microsoft.sucks, alt.linux.sucks

10 Internet Graphs
  Inter-domain graphs:
  - Nodes are autonomous systems or domains
  - Edges are inter-domain connections
  Example nodes: SPRINT, AOL

11 Inter-domain graph Picture from caida.org

12 Internet Graphs
  Intra-domain graphs:
  - Nodes are routers
  - Edges are links between routers
  Example nodes: 199.45.130.13, 199.45.143.14

13 Intra-domain graph

14 Colored by AS number Picture from lumeta.com

15 World Wide Web
  Nodes are webpages.
  Arcs (i.e., directed edges) are hyperlinks.
  Example nodes: http://research.microsoft.com/~mahdian, http://theory.csail.mit.edu

16 Web graph, Chicago Tribune Page Picture generated by Nicheworks

17 Social Networks

18 Why Study These Networks
  - Understand the creation of these networks
  - Understand viral epidemics
  - Help design crawling strategies for the web
  - Analyze behavior of algorithms (web/internet)
  - Predict evolution of the network and emergence of new phenomena

19 In this lecture
  Common properties of social networks:
  - Power law degree distribution
  - Small world phenomenon
  - High clustering coefficient
  Structure of the web graph

20 Power Laws
  Two quantities x and y are related by a power law if y is proportional to $x^{-c}$ for a constant c: $y = a \cdot x^{-c}$, where a is the constant of proportionality.
  If x and y are related by a power law, then the graph of log(y) versus log(x) is a straight line: $\log(y) = -c \cdot \log(x) + \log(a)$.
  The slope of the log-log plot is $-c$, so the power-law exponent c is the negative of the slope.
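The straight-line property can be checked numerically. The sketch below is only an illustration: the names `power_law` and `loglog_slope`, the symbol `a` for the constant of proportionality, and the values a = 5.0 and c = 2.1 are all assumptions, not taken from the slides. It picks two points on an exact power law and reads the exponent back off the log-log slope, which comes out as -c.

```python
import math

def power_law(x, a, c):
    """Return y = a * x**(-c)."""
    return a * x ** (-c)

def loglog_slope(x1, y1, x2, y2):
    """Slope of the line through (log x1, log y1) and (log x2, log y2)."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))

# Hypothetical constants chosen only for this illustration.
a, c = 5.0, 2.1

y_low = power_law(10, a, c)
y_high = power_law(1000, a, c)

# On an exact power law, any two points give the same slope, namely -c.
slope = loglog_slope(10, y_low, 1000, y_high)

# The intercept of the log-log line is log(a).
intercept = math.log(y_low) - slope * math.log(10)

print("slope:", slope)
print("recovered constant:", math.exp(intercept))
```

With noisy, real data a single pair of points is not enough; a least-squares fit over all points on the log-log plot is the usual substitute.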

21 Power Law Distributions
  A random variable X has a power law distribution if $\Pr[X = k]$ is proportional to $k^{-c}$ for a constant c.
  The cumulative distribution, $\Pr[X > k]$, of a power law distribution is proportional to $k^{-c+1}$, and is called the Pareto law.
  Similar to a power law, the Zipf law relates the rank r of X to its size: the size of the r-th largest instance of X is proportional to $r^{-c'}$.
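The step from the point probabilities to the Pareto tail can be sketched with the standard integral approximation of a sum. This derivation is not on the slide; it assumes c > 1 so that the tail sum converges, and the second line is the usual heuristic linking the Zipf exponent c' to c by inverting the rank-size relation.

```latex
\Pr[X > k] \;=\; \sum_{j > k} \Pr[X = j]
\;\propto\; \sum_{j = k+1}^{\infty} j^{-c}
\;\approx\; \int_{k}^{\infty} x^{-c}\,dx
\;=\; \frac{k^{-(c-1)}}{c-1}
\;\propto\; k^{-c+1}, \qquad c > 1.

% If the r-th largest instance has size s \propto r^{-c'}, then the number of
% instances of size at least s is r \propto s^{-1/c'}. Matching this with the
% Pareto tail exponent c - 1 gives
c - 1 \;=\; \frac{1}{c'} \quad\Longleftrightarrow\quad c' \;=\; \frac{1}{c-1}.
```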

22 Example: City Populations
   1. New York       7,322,564
   2. Los Angeles    3,485,398
   3. Chicago        2,783,726
   4. Houston        1,630,553
   5. Philadelphia   1,585,577
   6. San Diego      1,110,549
   7. Detroit        1,027,974
   8. Dallas         1,006,877
   9. Phoenix          983,403
  10. San Antonio      935,933

23 Example: City Populations
    1. New York          7,322,564
    2. Los Angeles       3,485,398
    3. Chicago           2,783,726
   21. Seattle             516,259
   94. Spokane, WA         177,196
   95. Tacoma, WA          176,664
   96. Little Rock, AR     175,795
   97. Bakersfield, CA     174,820
   98. Fremont, CA         173,339
   99. Fort Wayne, IN      173,072
  100. Arlington, VA       170,936

24 Example: City Populations Power law exponent: c = 0.74
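A rank-size (Zipf) exponent of this kind can be estimated by a least-squares line through log(population) against log(rank). The sketch below uses only the ten largest cities listed on slide 22; the function name `fit_zipf_exponent` is hypothetical, and the slides do not say how c = 0.74 was obtained or over how many cities, so a fit on ten points need not reproduce that value.

```python
import math

# Populations of the ten largest cities, as listed on slide 22.
CITIES = [
    ("New York", 7_322_564),
    ("Los Angeles", 3_485_398),
    ("Chicago", 2_783_726),
    ("Houston", 1_630_553),
    ("Philadelphia", 1_585_577),
    ("San Diego", 1_110_549),
    ("Detroit", 1_027_974),
    ("Dallas", 1_006_877),
    ("Phoenix", 983_403),
    ("San Antonio", 935_933),
]

def fit_zipf_exponent(sizes):
    """Least-squares slope of log(size) against log(rank); returns -slope."""
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

zipf_exponent = fit_zipf_exponent([population for _, population in CITIES])
print(f"Zipf exponent fitted on the top 10 cities: {zipf_exponent:.2f}")
```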

25 Power Laws in Networks
  Degree distribution often satisfies a power law: the fraction of nodes $f_d$ of degree d is proportional to $d^{-c}$.
  Degree d    Fraction f_d = 1/(2d)
  1           1/2
  2           1/4
  3           1/6
  4           ~1/8
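The fraction f_d can be computed directly from an edge list. This is a minimal sketch, assuming an undirected graph given as pairs of node labels; the function name `degree_distribution` and the toy graph are hypothetical. Nodes that appear in no edge (degree 0) are invisible to an edge list and are therefore not counted.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree d to the fraction f_d of nodes having that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count how many nodes have each degree, then normalise by the node count.
    nodes_with_degree = Counter(degree.values())
    n = len(degree)
    return {d: nodes_with_degree[d] / n for d in sorted(nodes_with_degree)}

# Hypothetical toy graph: a star centred on node 0, plus the extra edge 1-2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
dist = degree_distribution(edges)
print(dist)
```

Plotting log(f_d) against log(d) for a real network and fitting a line is then the way to read off an exponent like the ones quoted on the following slides.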

26 Example: Collaboration Graph
  Power law exponent: c = 2.97
  With exponential decay factor, c = 2.46

27 Example: Cross-Post Graph Power law exponent: c = 1.3

28 Example: Inter-Domain Internet Power law exponent: 2.15 < c < 2.2

29 Example: Intra-Domain Internet Power law exponent: c = 2.48

30 Example: Web Graph In-Degree Power law exponent: c = 2.09

31 Example: Web Graph Out-Degree Power law exponent: c = 2.72

32 Small World Phenomenon Six degrees of separation: “Everybody on this planet is separated by only six other people. Six degrees of separation between us and everyone else on this planet. The President of the United States, a gondolier in Venice, just fill in the names.”

33 Small World Phenomenon
  Milgram’s famous experiment (1960s):
  - Choose a random person in Nebraska, Bob
  - Ask Bob to deliver a letter to a random person in Massachusetts, Lashawn
  - Tell Bob the target’s name, address, and occupation
  - Instruct Bob to only send the letter to people he knows on a first-name basis

34 Small World Phenomenon
  Bob, a farmer in Nebraska → David, mayor of Bob’s town → Bernard, David’s cousin, who went to college with → Maya, who grew up in Boston with → Lashawn

35 Small World Phenomenon in Graphs
  The diameter of a graph is the maximum distance (number of edges) between any pair of nodes.
  The average distance of a graph is the average distance between any pair of nodes.
  The average connected distance of a graph is the average distance between any pair of connected nodes.
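All three quantities reduce to running breadth-first search from every node. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary; the names `bfs_distances` and `distance_stats` are hypothetical. As a labeled simplification, it takes the maximum over connected pairs only, so on a disconnected graph it reports the largest finite distance rather than an infinite diameter.

```python
from collections import deque

def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_stats(adj):
    """Return (largest finite distance, average connected distance)."""
    longest, total, pairs = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    average = total / pairs if pairs else 0.0
    return longest, average

# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
diameter, average_distance = distance_stats(adj)
print(diameter, average_distance)
```

This all-pairs approach costs one BFS per node, which is fine for small graphs; for graphs the size of the web, distances are typically estimated from BFS runs started at a sample of nodes.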

36 Small World Phenomenon in Graphs
  A graph exhibits a small world phenomenon if it has low diameter or average (connected) distance.
  Typically, the average distance of a small world graph is on the order of log n (where n is the number of nodes).

37 Examples
  Collaboration graph
  - 401,000 nodes, 676,000 edges (average degree 3.37)
  - Diameter: 23, Average distance: 7.64
  Cross-post graph, giant component
  - 30,000 nodes, 800,000 edges (average degree 53.3)
  - Diameter: 13, Average distance: 3.8
  Web graph
  - 200 million nodes, 1.5 billion edges (average degree 15)
  - Average connected distance: 16
  Inter-domain Internet
  - 3500 nodes, 6500 edges (average degree 3.71)
  - 95% of pairs of nodes within distance 5
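The average degrees quoted above follow from the node and edge counts: every edge contributes to the degree of two endpoints, so the mean degree is 2m/n. The sketch below recomputes them from the slide's figures and prints ln n next to each for comparison with the quoted distances. The helper name `average_degree` is hypothetical, and for the directed web graph the figure of 15 is assumed to count in-links and out-links together.

```python
import math

# (name, number of nodes n, number of edges m) as quoted on the slide.
GRAPHS = [
    ("Collaboration graph", 401_000, 676_000),
    ("Cross-post graph, giant component", 30_000, 800_000),
    ("Web graph", 200_000_000, 1_500_000_000),
    ("Inter-domain Internet", 3_500, 6_500),
]

def average_degree(n, m):
    """Each edge adds one to the degree of two nodes, so the mean is 2m/n."""
    return 2 * m / n

for name, n, m in GRAPHS:
    print(f"{name}: average degree {average_degree(n, m):.2f}, "
          f"ln n = {math.log(n):.1f}")
```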

38 High Clustering Coefficient
  The clustering coefficient of a graph is the fraction of triangles among connected triples of nodes.
  Intuitively, the clustering coefficient reflects the probability that your friends are themselves friends.
  We expect social networks to have a high clustering coefficient.
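One way to read this definition is to enumerate every connected triple u - v - w (two neighbours u and w of a common node v) and count how many are closed by an edge between u and w. The sketch below follows that reading for an undirected graph stored as a dictionary of neighbour sets; the function name `clustering_coefficient` and the toy graph are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Fraction of connected triples u - v - w whose endpoints are adjacent."""
    triples = 0
    closed = 0
    for v, neighbours in adj.items():
        # Every pair of neighbours of v forms a connected triple centred at v.
        for u, w in combinations(neighbours, 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc = clustering_coefficient(adj)
print(cc)
```

In the toy graph there are five connected triples, three of which are the three views of the single triangle, so the triangle is counted once per centre node in both numerator and denominator.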

39 Examples
  Collaboration graph
  - Clustering coefficient is 0.14
  - Density of edges is 0.000008
  Cross-post graph
  - Clustering coefficient is 0.4492
  - Density of edges is 0.0016

40 Assignment READ: A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Graph structure in the web, WWW, 2000.

41 Graph Structure of the Web
  Breadth-first search from randomly chosen start nodes:
  - Follow both forward and backward links
  - Reveal directed and undirected graph structure
  Over 90% of nodes are reachable if links are treated as undirected.
  The directed graph reveals a complex bow-tie structure.
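The forward and backward searches can be sketched on a toy directed graph. The code below is an illustration under stated assumptions, not the procedure used in the assigned paper: the names `reachable` and `bow_tie` are hypothetical, the start node is assumed to lie in the central strongly connected component, and the labels SCC, IN and OUT follow the terminology of the Broder et al. reading. Everything else is lumped into a single OTHER set.

```python
from collections import deque

def reachable(adj, start):
    """Set of nodes reachable from start by BFS along the arcs in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(edges, core_node):
    """Split nodes into SCC, IN, OUT and OTHER relative to core_node."""
    forward, backward, nodes = {}, {}, set()
    for u, v in edges:
        forward.setdefault(u, []).append(v)   # follow links forward
        backward.setdefault(v, []).append(u)  # follow links backward
        nodes.update((u, v))
    fwd = reachable(forward, core_node)
    bwd = reachable(backward, core_node)
    scc = fwd & bwd  # nodes that both reach and are reached by core_node
    return {
        "SCC": scc,
        "IN": bwd - scc,    # can reach the core, but not reachable from it
        "OUT": fwd - scc,   # reachable from the core, but cannot reach it
        "OTHER": nodes - fwd - bwd,
    }

# Hypothetical toy web: i -> a <-> b -> o, plus an unrelated arc x -> y.
edges = [("i", "a"), ("a", "b"), ("b", "a"), ("b", "o"), ("x", "y")]
parts = bow_tie(edges, "a")
print(parts)
```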

42 Bow-Tie Structure of Web Graph Picture from the Nature journal

43 Next Time Probabilistic models for social networks

