Information Retrieval Search Engine Technology (10) Prof. Dragomir R. Radev.

1 Information Retrieval Search Engine Technology (10) http://tangra.si.umich.edu/clair/ir09 Prof. Dragomir R. Radev radev@umich.edu

2 SET/IR – W/S 2009 … 16. (Social) networks, random graph models, properties of random graphs. …

3 SET/IR – W/S 2009 … 17. Small worlds, scale-free networks, power-law distributions, centrality …

6 Krebs 2004

8 Peri et al., Nucleic Acids Res. 2004 January 1; 32(Database issue): D497–D501. doi: 10.1093/nar/gkh070. Interleukin-2 receptor pathway protein interaction network (from HPRD).

9 American Journal of Sociology, Vol. 110, No. 1. "Chains of affection: The structure of adolescent romantic and sexual networks," Bearman PS, Moody J, Stovel K.

10 The New York Times May 21, 2005

11 Email network

12 Networks
– The Web
– Citation networks
– Social networks
– Protein interaction networks
– Technological networks
– Other networks: movie actor networks, co-occurrence of characters in Les Misérables, board membership

13 Types of networks
– Directed/undirected
– Can have weights
– Single-mode vs. bipartite (e.g., movie-actor graphs)

14 Semantic network

15 Dependency network (figure: dependency graph over the words Meredith, yesterday, apples, bought, green)

17 Random network

18 Lexical networks
A special case of networks where nodes are words or documents and edges link semantically related nodes.
Other examples:
– Words used in dictionary definitions
– Names of people mentioned in the same story
– Words that translate to the same word
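The "names of people mentioned in the same story" case can be sketched as a weighted co-occurrence network. This is a minimal illustration under stated assumptions: each story arrives as a list of names that have already been extracted (no named-entity recognition happens here), the function name `cooccurrence_network` and the toy stories are hypothetical, and an edge weight simply counts how many stories two names share.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(stories):
    """Weighted, undirected co-occurrence network (hypothetical helper).

    Each story is an iterable of names; every pair of distinct names found in
    the same story gets an edge whose weight is the number of stories the pair
    shares.
    """
    weights = defaultdict(int)
    for names in stories:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy input: names already extracted from three short stories.
stories = [
    ["Alice", "Bob", "Carol"],
    ["Bob", "Alice"],
    ["Carol", "Dave", "Carol"],  # a repeated mention counts once per story
]
edges = cooccurrence_network(stories)
for (a, b), w in sorted(edges.items()):
    print(a, b, w)
```

The same function covers the other examples on the slide if "story" is replaced by "dictionary definition" or "set of translations".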

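19 Analyzing networks
– Clustering coefficient
  – Watts/Strogatz cc: the average over all nodes of the per-node clustering coefficient
  – Triangle-based (global) cc = 3 × #triangles / #connected triples
– Diameter (longest shortest path)
– Average shortest path (asp)
– Strongly connected component (SCC)
– Weakly connected component (WCC)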
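The clustering and path measures listed on this slide can be computed on a small undirected graph with plain Python. This is an illustrative sketch, not a reference implementation: the graph is a hypothetical adjacency dictionary (node to set of neighbors), nodes with fewer than two neighbors are assigned a local clustering coefficient of 0 (a common convention, assumed here), and diameter and asp are taken over reachable pairs only. The toy graph, a triangle with one pendant node, also shows that the averaged Watts/Strogatz value and the triangle-based value generally differ.

```python
from collections import deque
from itertools import combinations


def linked_neighbor_pairs(adj, v):
    """Number of pairs of v's neighbors that are themselves adjacent."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])


def local_cc(adj, v):
    """Per-node cc: linked neighbor pairs / (k_v (k_v - 1) / 2) possible pairs."""
    k = len(adj[v])
    if k < 2:
        return 0.0  # assumed convention for degree-0 and degree-1 nodes
    return linked_neighbor_pairs(adj, v) / (k * (k - 1) / 2)


def watts_strogatz_cc(adj):
    """Average of the per-node clustering coefficients."""
    return sum(local_cc(adj, v) for v in adj) / len(adj)


def transitivity(adj):
    """3 * #triangles / #connected triples.

    Summing linked neighbor pairs over every center node counts each
    triangle three times, which supplies the factor of 3.
    """
    closed = sum(linked_neighbor_pairs(adj, v) for v in adj)
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    return closed / triples if triples else 0.0


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter_and_asp(adj):
    """(longest shortest path, average shortest path) over reachable pairs.

    Assumes the graph has at least one edge.
    """
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical toy graph: triangle 1-2-3 plus node 4 hanging off node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
diam, asp = diameter_and_asp(adj)
print(f"{watts_strogatz_cc(adj):.3f} {transitivity(adj):.3f} {diam} {asp:.3f}")
```

On this graph the averaged local value is 7/12 (about 0.583), while 3 × #triangles / #connected triples is 3/5 = 0.6; the diameter is 2 and the asp is 4/3.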

20 Degree distribution
– Uniform
– Poisson
– Power-law (with coefficient α)
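For reference, the three shapes can be written as follows. This sketch uses standard textbook forms with assumed notation: $k_0$ is the common degree in a regular network, $\langle k\rangle$ is the mean degree, and the Poisson form is the large-$n$ limit of the binomial degree distribution of a $G(n, p)$ random graph.

```latex
\begin{aligned}
\text{Uniform (regular):}\quad & P(k) = \begin{cases} 1 & k = k_0 \\ 0 & k \neq k_0 \end{cases}\\[4pt]
\text{Poisson (random):}\quad & P(k) = \frac{e^{-\langle k\rangle}\,\langle k\rangle^{k}}{k!}, \qquad \langle k\rangle = p\,(n-1)\\[4pt]
\text{Power law:}\quad & P(k) \propto k^{-\alpha}, \qquad \log P(k) = -\alpha \log k + \text{const}
\end{aligned}
```

The last line is why a power law appears as a straight line of slope $-\alpha$ on log-log axes, while the Poisson tail falls off much faster than any power of $k$.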

21 Types of networks
Regular networks
– Uniform degree distribution
Random networks
– Memoryless
– Poisson degree distribution
– Characteristic value (most degrees lie close to the mean)
– Low clustering coefficient
– Large asp
Small-world networks
– High transitivity
– Presence of hubs (memory)
– High clustering coefficient (e.g., 1000 times higher than random)
– Small asp
– Some are scale-free:
  – Robust to random attacks
  – (Very) vulnerable to targeted attacks
  – Power-law degree distribution (typical value of α between 2 and 3)
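The contrast between a memoryless random network and one with hubs can be sketched by generating both at roughly the same mean degree. Assumptions: `gnp_random_graph` is a plain Erdős–Rényi G(n, p) generator, and `preferential_attachment_graph` is a simplified growth model in the spirit of Barabási and Albert 1999 (the seed clique and sampling details are choices made here, not taken from the slides); both function names, the sizes, and the seed are hypothetical.

```python
import random


def gnp_random_graph(n, p, rng):
    """Erdős–Rényi G(n, p): every pair is linked independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def preferential_attachment_graph(n, m, rng):
    """Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree."""
    adj = {v: set() for v in range(n)}
    # Seed: a clique on m + 1 nodes, so every early node has nonzero degree.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # A node appears in `targets` once per unit of degree, so a uniform
    # choice from this list is a degree-proportional choice of node.
    targets = [v for v in range(m + 1) for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend((new, t))
    return adj


def max_degree(adj):
    return max(len(nbrs) for nbrs in adj.values())


rng = random.Random(42)
n, m = 1000, 2
er = gnp_random_graph(n, 2 * m / (n - 1), rng)  # expected mean degree 2m = 4
pa = preferential_attachment_graph(n, m, rng)   # mean degree just under 4
print("max degree, random:", max_degree(er))
print("max degree, preferential attachment:", max_degree(pa))
```

With these settings the largest hub in the preferential-attachment graph is typically several times larger than the largest degree in the G(n, p) graph, even though both have mean degree near 4; exact numbers depend on the seed.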

22 From: Mark Newman (2003), "The structure and function of complex networks"

23 Comparing the dependency graph to a random (Poisson) graph

              Random     Actual
n             5563       5584
m             14440      14472
Diameter      21         13
Asp           8.788      4.01
W/S cc        0.00062    0.092
α             n/a        2.2
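A quick arithmetic check on this comparison: the mean degree ⟨k⟩ = 2m/n shows whether the random baseline is comparable to the actual graph, and the ratio of clustering coefficients puts the result on the same c/c_0 scale used for the lexical networks on slide 24. The figures below assume the flattened table values split as random n = 5563, m = 14440, cc = 0.00062 and actual n = 5584, m = 14472, cc = 0.092.

```python
# Values as read from the table (assumed column split).
graphs = {
    "random": {"n": 5563, "m": 14440, "cc": 0.00062},
    "actual": {"n": 5584, "m": 14472, "cc": 0.092},
}

# Each undirected edge adds 1 to the degree of both endpoints, so <k> = 2m/n.
mean_degree = {name: 2 * g["m"] / g["n"] for name, g in graphs.items()}
cc_ratio = graphs["actual"]["cc"] / graphs["random"]["cc"]

for name, k in mean_degree.items():
    print(f"{name}: <k> = {k:.2f}")
print(f"c/c_0 = {cc_ratio:.0f}")
```

Both graphs have ⟨k⟩ of about 5.2, so the baseline is fair, and the dependency graph's clustering coefficient comes out roughly 150 times that of the random graph.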

24 Properties of lexical networks (c/c_0 = clustering coefficient relative to a comparable random graph)
– Entries in a thesaurus [Motter et al. 2002]: c/c_0 = 260 (n = 30,000)
– Co-occurrence networks [Dorogovtsev and Mendes 2001; Solé and Ferrer i Cancho 2001]: c/c_0 = 1,000 (n = 400,000)
– Mental lexicon [Vitevitch 2005]: c/c_0 = 278 (n = 19,340)
(figure: sample lexical network over the words letter, actor, character, nature, universe, world)

26 Graph-based representations (figure: a graph G(V, E) on nodes 1 to 8 and its square connectivity (adjacency) matrix)
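The node-by-node matrix on this slide can be built from an edge list. A minimal sketch: the edge list below is hypothetical (the slide's own edges are not recoverable from the transcript), nodes are labeled 1..n as on the slide, and the `directed` flag reflects the directed/undirected distinction from slide 13.

```python
def adjacency_matrix(n, edges, directed=False):
    """Square n x n 0/1 matrix; entry [u-1][v-1] is 1 if there is an edge u -> v."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1
        if not directed:
            a[v - 1][u - 1] = 1  # undirected edges are stored symmetrically
    return a


edges = [(1, 2), (1, 3), (2, 3), (3, 4)]  # hypothetical example graph
a = adjacency_matrix(4, edges)
for row in a:
    print(row)
degrees = [sum(row) for row in a]  # row sums give node degrees
print(degrees)  # [2, 2, 3, 1]
```

For an undirected graph the matrix is symmetric; with `directed=True`, row sums give out-degrees and column sums give in-degrees.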

27 Bipartite graphs and one-mode projections (figure: bipartite graph between nodes A to E and nodes 1 to 4, with its incidence matrix, rows 1 to 4 and columns A to E)
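A one-mode projection links two nodes of one type when they share a neighbor of the other type; with a 0/1 incidence matrix B (rows = one mode, columns = the other) this is the matrix product B·Bᵀ with the diagonal ignored. The sketch assumes that layout, and its matrix is a hypothetical stand-in for the slide's nodes 1 to 4 and A to E, since the original entries are not recoverable.

```python
def one_mode_projection(b):
    """Project incidence matrix b onto its row nodes.

    Entry [i][j] (i != j) counts the column nodes shared by rows i and j,
    i.e. (B B^T)[i][j]; the diagonal is set to 0 to drop self-loops.
    """
    rows = len(b)
    return [
        [0 if i == j else sum(x * y for x, y in zip(b[i], b[j])) for j in range(rows)]
        for i in range(rows)
    ]


def transpose(b):
    """Swap the two modes, so projecting transpose(b) gives the column projection."""
    return [list(col) for col in zip(*b)]


# Hypothetical incidence matrix: rows = nodes 1..4, columns = nodes A..E.
b = [
    [1, 1, 0, 0, 0],  # 1 is linked to A, B
    [0, 1, 1, 0, 0],  # 2 is linked to B, C
    [0, 0, 1, 1, 0],  # 3 is linked to C, D
    [0, 0, 1, 1, 1],  # 4 is linked to C, D, E
]
rows_proj = one_mode_projection(b)
cols_proj = one_mode_projection(transpose(b))
print(rows_proj)  # [[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 2], [0, 1, 2, 0]]
```

For a movie-actor graph with actors as rows, `rows_proj` would be a weighted actor collaboration graph whose entries count shared movies.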

28 Power laws
– Web site size (Huberman and Adamic 1999)
– Power-law connectivity of the Web (Barabási and Albert 1999): exponents 2.45 for out-degree and 2.1 for in-degree
– Others: call graphs among telephone carriers, citation networks (Redner 1998), collaboration graphs (e.g., the Erdős collaboration graph of mathematicians and the collaboration graph of actors), metabolic pathways (Jeong et al. 2000), protein networks (Maslov and Sneppen 2002)
All values of the exponent γ are around 2 to 3.
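One common way to estimate the exponent from data is the maximum-likelihood formula for a continuous power law, α̂ = 1 + n / Σ ln(xᵢ / x_min), applied to the observations at or above a chosen cutoff x_min; it is generally preferred over fitting a straight line to a log-log histogram, which tends to be biased. The sketch treats degrees as continuous values (an approximation for integer data) and checks itself on synthetic samples drawn by inverse-transform sampling; the function names, cutoff, and sample size are assumptions for illustration.

```python
import math
import random


def powerlaw_alpha_mle(xs, x_min):
    """Continuous MLE of alpha for P(x) ~ x^(-alpha), using only x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need observations strictly above x_min")
    return 1 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, size, rng):
    """Inverse-transform sampling from a continuous power law with x >= x_min."""
    # 1 - u lies in (0, 1], so the negative power is always finite.
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(size)]


rng = random.Random(7)
data = sample_powerlaw(alpha=2.5, x_min=1.0, size=10_000, rng=rng)
print(f"estimated alpha = {powerlaw_alpha_mle(data, x_min=1.0):.2f}")
```

With 10,000 samples the standard error is about (α - 1)/√n = 0.015, so the estimate should land very close to 2.5; on real degree data the choice of x_min matters far more than this.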

29 Small-world networks
Diameter (here: the average length of the shortest path between all pairs of nodes). Example…
– Milgram experiment (1967)
  – Kansas/Omaha → Boston (42/160 letters)
  – diameter = 6
– Albert et al. 1999: the average distance between two vertices is $d = 0.35 + 2.06 \log_{10} n$; for $n = 10^9$, $d = 18.89$
– Six degrees of separation
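The Albert et al. (1999) fit can be evaluated directly to see how slowly the average distance grows with the number of pages n. A small sketch; the function name is hypothetical, and the second size, n = 10^6, is included only for comparison.

```python
import math


def web_average_distance(n):
    """Average distance between two vertices from the Albert et al. (1999) fit:
    d = 0.35 + 2.06 * log10(n)."""
    return 0.35 + 2.06 * math.log10(n)


for n in (10**6, 10**9):
    print(f"n = {n:.0e}: d = {web_average_distance(n):.2f}")
```

Going from 10^6 to 10^9 vertices, a thousandfold increase, adds only 3 × 2.06 ≈ 6.2 hops (12.71 to 18.89); this logarithmic growth is what the small-world label refers to.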

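30 Clustering coefficient
Cliquishness (c): a node v with k_v neighbors can have at most k_v(k_v - 1)/2 edges among those neighbors; its cliquishness is the fraction of these pairs that are actually linked, $c_v = \frac{E_v}{k_v(k_v - 1)/2}$, where $E_v$ is the number of edges among the neighbors of v. Examples:

               n        k      d      d_rand   C      C_rand
Actors         225226   61     3.65   2.99     0.79   0.00027
Power grid     4941     2.67   18.7   12.4     0.08   0.005
C. elegans     282      14     2.65   2.25     0.28   0.05

(n = nodes, k = mean degree, d = average shortest path, C = clustering coefficient; "rand" = random graph with the same n and k.)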
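The random-graph columns can be approximated from n and k alone, using the standard rough estimates C_rand ≈ k/n and d_rand ≈ ln n / ln k for a random graph with n nodes and mean degree k. The sketch compares these estimates with the table's values as transcribed (name, n, k, d_rand, C_rand); they agree closely for the actors and C. elegans rows but not for the much sparser power grid, so treat them as order-of-magnitude checks only.

```python
import math

# Rows transcribed from the table: name -> (n, k, d_rand, C_rand).
rows = {
    "Actors": (225226, 61, 2.99, 0.00027),
    "Power grid": (4941, 2.67, 12.4, 0.005),
    "C. elegans": (282, 14, 2.25, 0.05),
}

estimates = {}
for name, (n, k, d_rand, c_rand) in rows.items():
    c_est = k / n                      # chance that two neighbors are linked
    d_est = math.log(n) / math.log(k)  # hops needed to reach about n nodes
    estimates[name] = (c_est, d_est)
    print(f"{name}: C ~ {c_est:.5f} (table {c_rand}), d ~ {d_est:.2f} (table {d_rand})")
```

The actors row, for example, gives 61/225226 ≈ 0.00027 and ln 225226 / ln 61 ≈ 3.0, matching the table; the comparison with the actual C and d columns is what makes these networks "small worlds": d stays close to the random value while C is orders of magnitude higher.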

