CMU SCS Multimedia and Graph mining Christos Faloutsos CMU.

1 CMU SCS Multimedia and Graph mining Christos Faloutsos CMU

2 CMU SCS, IC '06, C. Faloutsos: CONGRATULATIONS!

3 Outline
– Problem definition / Motivation
– Biological image mining
– Graphs and power laws
– Streams and forecasting
– Conclusions

4 Motivation
Data mining: ~ find patterns (rules, outliers)
– How do detached cat retinas evolve?
– What do real graphs look like?
– What do (numerical) streams look like?

5 ViVo: cat retina mining
with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB); Jia-Yu Tim Pan, HJ Yang (CMU)

6 Detachment Development
[image panels: Normal; 1 day, 3 days, 7 days, 28 days, and 3 months after detachment]

7 Data and Problem
(Problem) What happens in the retina after detachment?
– What tissues (regions) are involved?
– How do they change over time?
How will a program convey this info?
More than classification: “we want to learn what the classifier learned”

8 Main idea
Extract characteristic visual ‘words’, equivalent to characteristic keywords in a collection of text documents

9 Visual vocabulary?

10 Visual vocabulary?
– news: president, minister, economic
– sports: baseball, score, penalty

11 Visual Vocabulary (ViVo) generation
Step 1: Tile image
Step 2: Extract tile features
Step 3: ViVo generation
[figure: 8x12 tiles; axes Feature 1, Feature 2; visual vocabulary V1, V2]
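
Steps 1 and 2 above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say which tile features are used, so the mean and variance of pixel intensity stand in for them, and the function name `tile_features` and its `rows`/`cols` parameters are hypothetical. Step 3 (turning tile features into a vocabulary) is not shown, because the slide does not describe the method.

```python
from statistics import fmean, pvariance


def tile_features(image, rows, cols):
    """Split a 2-D grayscale image (list of equal-length lists) into a
    rows x cols grid of tiles and return one (mean, variance) feature
    pair per tile, in row-major order.

    Assumption: mean/variance are placeholder features, not the ones
    used in the ViVo work. Pixels beyond an even split are ignored.
    """
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // rows, width // cols
    features = []
    for r in range(rows):
        for c in range(cols):
            # Step 1: collect the pixels that fall inside tile (r, c).
            pixels = [
                image[y][x]
                for y in range(r * tile_h, (r + 1) * tile_h)
                for x in range(c * tile_w, (c + 1) * tile_w)
            ]
            # Step 2: summarize the tile as a small feature vector.
            features.append((fmean(pixels), pvariance(pixels)))
    return features


# Tiny 4x4 "image" cut into a 2x2 grid of 2x2-pixel tiles.
demo = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
demo_features = tile_features(demo, rows=2, cols=2)
```

Each tile then becomes one point in feature space, which is the input that a vocabulary-generation step would work on.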

12 Biological interpretation
ID | ViVo | Description | Condition
V1 | [image] | GFAP in inner retina (Müller cells) | Healthy
V10 | [image] | Healthy outer segments of rod photoreceptors | Healthy
V8 | [image] | Redistribution of rod opsin into cell bodies of rod photoreceptors | Detached
V11 | [image] | Co-occurring processes: Müller cell hypertrophy and rod opsin redistribution | Detached

13 Which tissue is significant on day 7?

14 FEMine: Mining Fly Embryos

15 With Eric Xing (CMU CS), Bob Murphy (CMU – Bio), Tim Pan (CMU -> Google), Andre Balan (U. Sao Paulo)

16 Outline
– Problem definition / Motivation
– Biological image mining
– Graphs and power laws
– Streams and forecasting
– Conclusions

17 Graphs - why should we care?

18 Graphs - why should we care?
[images: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]]

19 Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

20 Problem: network and graph mining
– What does the Internet look like?
– What does the web look like?
– What constitutes a ‘normal’ social network?
– What is ‘normal’/‘abnormal’?
– Which patterns/laws hold?

21 Graph mining
Are real graphs random?

22 Laws and patterns
NO!!
– Diameter
– in- and out-degree distributions
– other (surprising) patterns

23 Laws – degree distributions
Q: avg degree is ~3 - what is the most probable degree?
[plot: count vs. degree, with ?? marked at degree 3]

24 Laws – degree distributions
Q: avg degree is ~3 - what is the most probable degree?
[plots: count vs. degree, each marked at degree 3]

25 Solution: The plot is linear in log-log scale [FFF’99]
freq = degree^(-2.15)
Exponent = slope; O = -2.15
[plot: Frequency vs. Outdegree, log-log, Nov’97; slope -2.15]
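
The statement "exponent = slope" can be checked with an ordinary least-squares line fitted to the logarithms of degree and frequency. The sketch below uses synthetic counts generated from the slide's exponent, not the Nov’97 Internet data; `loglog_slope` and the constant 1000 are illustrative assumptions.

```python
import math


def loglog_slope(xs, ys):
    """Fit log10(y) = slope * log10(x) + intercept by least squares
    and return (slope, intercept)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data (an assumption, not measured): freq = 1000 * degree^(-2.15).
degrees = [1, 2, 4, 8, 16, 32]
counts = [1000 * d ** -2.15 for d in degrees]
slope, intercept = loglog_slope(degrees, counts)
print(f"fitted exponent: {slope:.2f}")
```

Because a power law decreases steadily with degree, the most frequent degree in such data is the smallest one, even when the average degree is around 3.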

26 But:
Q1: How about graphs from other domains?
Q2: How about temporal evolution?

27 The Peer-to-Peer Topology
Number of adjacent peers follows a power law [Jovanovic+]
[plot: frequency versus degree]

28 More power laws: citation counts (citeseer.nj.nec.com 6/2001)
[plot: log(count) vs. log(#citations); labeled point: Ullman]

29 Swedish sex-web
Nodes: people (Females; Males)
Links: sexual relationships
Liljeros et al., Nature 2001
4781 Swedes; 18-74; 59% response rate.
Source: Albert Laszlo Barabasi, http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt

30 More power laws: web hit counts [w/ A. Montgomery]
Web Site Traffic
[plot: log(count) vs. log(in-degree); Zipf; users, sites; labeled point: ‘ebay’]

31 epinions.com
who-trusts-whom [Richardson + Domingos, KDD 2001]
[plot: count vs. (out) degree; labeled point: trusts-2000-people user]

33 A famous power law: Zipf’s law
Bible: rank vs frequency (log-log)
Similarly in many other languages; for customers and sales volume; city populations, etc.
[plot: log(freq) vs. log(rank)]
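
A rank-frequency table like the one plotted for the Bible can be built from any word list. This is a minimal sketch; `rank_frequency` is a hypothetical helper name and the sample sentence is made up.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, frequency) pairs, rank 1 being the most frequent word."""
    counts = Counter(words)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


sample = "the cat and the dog and the bird".split()
table = rank_frequency(sample)
for rank, freq in table:
    print(rank, freq)
```

Plotting log(freq) against log(rank) for a large text is what yields the near-straight line known as Zipf's law. A sample this small is far too short to show it.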

34 Olympic medals (Sydney ’00, Athens ’04)
[plot: log(#medals) vs. log(rank)]

35 More power laws: areas – Korcak’s law
Scandinavian lakes: area vs complementary cumulative count (log-log axes)
[plot: log(count(>= area)) vs. log(area)]
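
The "complementary cumulative count" on the vertical axis is, for each area, the number of lakes at least that large. A minimal sketch follows; `ccdf_counts` is a hypothetical name and the areas are invented numbers, not the Scandinavian lake data.

```python
def ccdf_counts(values):
    """For each distinct value v (ascending), return (v, number of items >= v)."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, v in enumerate(ordered):
        # The first occurrence of v sits at index i, so n - i items are >= v.
        if i == 0 or v != ordered[i - 1]:
            points.append((v, n - i))
    return points


areas = [1, 2, 2, 5]  # made-up areas for illustration
points = ccdf_counts(areas)
print(points)
```

Taking logarithms of both columns gives the points for a plot of log(count(>= area)) against log(area).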

37 But:
Q1: How about graphs from other domains?
Q2: How about temporal evolution?

38 Time evolution
with Jure Leskovec (CMU) and Jon Kleinberg (Cornell) (‘best paper’ KDD05)

39 Evolution of the Diameter
Prior work on power-law graphs hints at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
What is happening in real data?

40 Evolution of the Diameter
Prior work on power-law graphs hints at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
– As the network grows, the distances between nodes slowly decrease
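
To make "diameter" concrete, the sketch below computes the longest shortest-path length of a small undirected graph with breadth-first search. The slides do not say which diameter definition is plotted, so this plain definition is an assumption, and the two toy graphs are invented to show how added edges can shorten distances.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Longest shortest-path length over all reachable node pairs."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# A 4-node path, then the same nodes with one extra edge closing a cycle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter(path), diameter(cycle))
```

Running one BFS per node costs O(N * (N + E)), which is fine for toy graphs but would need sampling or approximation on graphs with millions of nodes.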

41 Diameter – ArXiv citation graph
Citations among physics papers, 1992–2003
One graph per year
[plot: diameter vs. time [years]]

42 Diameter – “Autonomous Systems”
Graph of Internet
One graph per day, 1997–2000
[plot: diameter vs. number of nodes]

43 Diameter – “Affiliation Network”
Graph of collaborations in physics: authors linked to papers
10 years of data
[plot: diameter vs. time [years]]

44 Diameter – “Patents”
Patent citation network
25 years of data
[plot: diameter vs. time [years]]

45 Temporal Evolution of the Graphs
N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: what is your guess for E(t+1)? Is E(t+1) = 2 * E(t)?

46 Temporal Evolution of the Graphs
N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: what is your guess for E(t+1)? Is E(t+1) = 2 * E(t)?
A: over-doubled!
– But obeying the ‘Densification Power Law’

47 Densification – Physics Citations
Citations among physics papers
2003:
– 29,555 papers, 352,807 citations
[plot: E(t) vs. N(t); slope 1.69]

48 Densification – Physics Citations
Citations among physics papers
2003:
– 29,555 papers, 352,807 citations
[plot: E(t) vs. N(t); slope 1.69; reference slope 1: tree]

49 Densification – Physics Citations
Citations among physics papers
2003:
– 29,555 papers, 352,807 citations
[plot: E(t) vs. N(t); slope 1.69; reference slope 2: clique]

50 Densification – Patent Citations
Citations among patents granted
1999:
– 2.9 million nodes
– 16.5 million edges
Each year is a datapoint
[plot: E(t) vs. N(t); slope 1.66]

51 Densification – Autonomous Systems
Graph of Internet
2000:
– 6,000 nodes
– 26,000 edges
One graph per day
[plot: E(t) vs. N(t); slope 1.18]

52 Densification – Affiliation Network
Authors linked to their publications
2002:
– 60,000 nodes (20,000 authors, 38,000 papers)
– 133,000 edges
[plot: E(t) vs. N(t); slope 1.15]
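
If the number of edges grows as a power of the number of nodes, E(t) ~ N(t)^a, then doubling the nodes multiplies the edges by 2^a. The sketch below works that out for the four exponents quoted on the densification slides; the helper name `edge_growth_factor` is illustrative, and reading each plotted number as the exponent a is an interpretation of the slides.

```python
# Exponents a as quoted on the densification slides, assuming E(t) ~ N(t)**a.
EXPONENTS = {
    "physics citations": 1.69,
    "patent citations": 1.66,
    "autonomous systems": 1.18,
    "affiliation network": 1.15,
}


def edge_growth_factor(node_growth, exponent):
    """Factor by which edges grow when nodes grow by node_growth,
    under the power-law assumption E ~ N**exponent."""
    return node_growth ** exponent


for name, a in EXPONENTS.items():
    factor = edge_growth_factor(2, a)
    print(f"{name}: doubling nodes multiplies edges by about {factor:.2f}")
```

An exponent of 1 gives exactly 2x (edges merely keep pace with nodes, as in a tree), and an exponent of 2 gives 4x (as in a clique). All four data sets fall strictly between those bounds, which is the sense in which the edge count "over-doubles".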

53 Graphs - Conclusions
Real graphs obey some surprising patterns
– which can help us spot anomalies / outliers
A lot of interest from web searching companies
– recommendation systems
– link spamming
– trust propagation
HUGE graphs (Millions and Billions of nodes)

54 Outline
– Problem definition / Motivation
– Biological image mining
– Graphs and power laws
– Streams and forecasting
– Conclusions

55 Why care about streams?

56 Why care about streams?
Sensor devices
– Temperature, weather measurements
– Road traffic data
– Geological observations
– Patient physiological data
– sensor-Andrew project
Embedded devices
– Network routers

57 Co-evolving time sequences
Joint work with Jimeng Sun (CMU), Dr. Spiros Papadimitriou (CMU/IBM), Dr. Yasushi Sakurai (NTT), Prof. Jeanne VanBriesen (CMU/CEE)

58 Motivation: water distribution network
[figure: chlorine concentrations across Phase 1, Phase 2, Phase 3, for sensors near leak and sensors away from leak; annotation: normal operation]

59 Motivation: water distribution network
[figure: chlorine concentrations across Phase 1, Phase 2, Phase 3, for sensors near leak and sensors away from leak; annotations: normal operation, major leak]

60 Motivation: spot “hidden (latent) variables”
actual measurements (n streams), k hidden variable(s)
[figure: chlorine concentrations, Phase 1; k = 1]

61 Motivation: spot “hidden (latent) variables”
actual measurements (n streams), k hidden variable(s)
[figure: chlorine concentrations, Phase 1 and Phase 2; k = 2]

62 Motivation: spot “hidden (latent) variables”
actual measurements (n streams), k hidden variable(s)
[figure: chlorine concentrations, Phase 1, Phase 2 and Phase 3; k = 1]
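
The idea of summarizing n streams by k hidden variables can be illustrated offline with principal components: count how many directions are needed to capture most of the variance. This is a batch stand-in written for illustration, not the streaming method behind the slides; the 95% energy threshold, the power-iteration approach, and all function names are assumptions.

```python
import math
import random

_rng = random.Random(0)  # fixed seed so runs are repeatable


def covariance(rows):
    """Covariance matrix of the streams; each row is one time tick."""
    t, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / t for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / t for j in range(n)]
            for i in range(n)]


def top_eigenpair(m, iters=200):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    n = len(m)
    v = [_rng.uniform(0.5, 1.5) for _ in range(n)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def count_hidden_variables(rows, energy=0.95):
    """Smallest k whose top-k components capture `energy` of total variance."""
    m = covariance(rows)
    n = len(m)
    total = sum(m[i][i] for i in range(n))
    if total == 0:
        return 0
    captured, k = 0.0, 0
    while captured < energy * total and k < n:
        lam, v = top_eigenpair(m)
        captured += lam
        k += 1
        # Deflate: remove the component just found before looking for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return k


a = [1, -1] * 4          # one underlying signal
b = [1, 1, -1, -1] * 2   # a second, uncorrelated signal
one_trend = [[x, 2 * x, 3 * x] for x in a]       # all streams follow a
two_trends = [[x, x, y] for x, y in zip(a, b)]   # third stream follows b
print(count_hidden_variables(one_trend), count_hidden_variables(two_trends))
```

In the toy data, three streams driven by a single signal need one hidden variable, while adding a stream with its own independent behavior raises the count to two, mirroring the k = 1, k = 2, k = 1 sequence in the figures.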

63 SPIRIT / InteMon
http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp
http://localhost:8080/demo/graphs.jsp
self-* storage system (PDL/CMU)
– 1 PetaByte storage
– self-monitoring, self-healing: self-*
with Jimeng Sun (CMU/CS), Evan Hoke (CMU/CS-ug), Prof. Greg Ganger (CMU/CS/ECE), John Strunk (CMU/ECE)

64 Related project
Anomaly detection in network traffic (Zhang, Xie)

65 Conclusions
– Biological images, graphs & streams pose fascinating problems
– self-similarity, fractals and power laws work, when other methods fail!

66 Books
Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

67 Contact info
christos@cs.cmu.edu
www.cs.cmu.edu/~christos
Wean Hall 7107
Ph#: x8.1457
and, again, WELCOME!
