
Slide 1: Learning from data with graphical models
Padhraic Smyth
Information and Computer Science, University of California, Irvine
www.datalab.uci.edu

Slide 2: Outline
– Graphical models: a general language for specification and computation with probabilistic models
– Learning graphical models from data: EM algorithm, Bayesian learning, etc.
– Applications: learning topics from documents, discovering clusters of curves, other work

Slide 3: The Data Revolution
Technological advances in the past 20 years:
– Sensors
– Storage
– Computational power
– Databases and indexing
– Machine learning
Science and engineering are increasingly data driven.

Slide 4: Examples of Digital Data Sets
The Web: > 4.3 billion pages
– Link graph
– Text on Web pages
Genomic data
– Sequence = 3 billion base pairs (human)
– 25k genes: expression, location, networks, …
Sloan Digital Sky Survey
– 15 terabytes
– ~500 million sky objects
Earth sciences
– NASA MODIS satellite
– Entire Earth at 15 m to 250 m resolution every day
– 37 spectral bands

Slides 5–12: [image-only slides; content not included in the transcript]

Slide 13: Questions of Interest
– Can we detect significant change?
– Can we identify, classify, and catalog certain phenomena?
– Can we segment/cluster pixels into land-cover types?
– Can we measure and predict seasonal variation?
– And so on…
Previously, scientists did much of this work by hand. Now, automated algorithms are essential.

Slide 14: Learning from Data
Uncertainty abounds:
– Measurement error
– Unobserved phenomena
– Model and parameter uncertainty
– Forecasting and prediction
Probability is the "language of uncertainty"; there has been a significant shift in computer science toward probabilistic models in recent years.

Slide 15: Preliminaries
Probability:
– X = x: a statement about the world (true or false)
– P(X = x): degree of belief that the world is in state X = x
Set of variables U = {X_1, X_2, …, X_N}
Joint probability distribution: P(U) = p(X_1, X_2, …, X_N)
If every variable takes K values, P(U) is a table of K × K × … × K = K^N numbers.
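To make the blow-up concrete, here is a minimal Python sketch (not from the talk; all names are illustrative): even a modest model needs a K^N table, and any query over the full joint touches every entry.

```python
import numpy as np

K, N = 3, 10                      # 3 states per variable, 10 variables
joint = np.random.rand(*([K] * N))
joint /= joint.sum()              # normalize so the table sums to 1

print(joint.size)                 # 3**10 = 59049 entries already

# Marginalizing down to P(X1) touches every entry: O(K**N) work.
marginal_x1 = joint.sum(axis=tuple(range(1, N)))
print(marginal_x1)                # a length-K vector
```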

Slide 16: Conditional Probabilities
Many problems of interest involve computing conditional probabilities:
– Prediction: P(X_{N+1} | x_1, …, x_N)
– Classification/diagnosis/detection: arg max_y { P(Y = y | x_1, …, x_N) }
– Learning: P(θ | x_1, …, x_N)
Note: computing P(X_{N+1} | X_1 = x) by brute force has time complexity O(K^N).

Slide 17: Two Problems
Problem 1: Computational complexity
– Inference computations scale as O(K^N)
Problem 2: Model specification
– To specify p(U) we need a table of K^N numbers
– Where do these numbers come from?

Slide 18: A Very Brief History
These problems were recognized in the 1970s and 80s in artificial intelligence (AI), e.g., in early work on diagnostic reasoning systems such as medical diagnosis with N = 100 different symptoms.
1970s/80s solutions: invent new formalisms
– Certainty factors
– Fuzzy logic
– Non-monotonic logic
1985: "In defense of probability", P. Cheeseman, IJCAI-85
1990s: the marriage of statistics and computer science

Slide 19: Two Key Ideas
Problem 1: Computational complexity
– Idea: represent the dependency structure as a graph and exploit sparseness in computation
Problem 2: Model specification
– Idea: learn models from data using statistical learning principles

Slide 20:
"…probability theory is more fundamentally concerned with the structure of reasoning and causation than with numbers."
Glenn Shafer and Judea Pearl, Introduction to Readings in Uncertain Reasoning, Morgan Kaufmann, 1990

Slide 21: Graphical Models
Dependency structure is encoded by an acyclic directed graph:
– Node = random variable
– Edges encode dependencies; the absence of an edge implies conditional independence
– Directed and undirected versions exist
Why is this useful?
– A language for communication
– A language for computation
Origins:
– Wright, 1920s
– 1988: Spiegelhalter and Lauritzen in statistics; Pearl in computer science
– Also known as Bayesian networks, belief networks, causal networks, etc.

Slide 22: Examples of 3-way Graphical Models
[Graph: A → B, A → C]
Conditionally independent effects: p(A,B,C) = p(B|A) p(C|A) p(A)
B and C are conditionally independent given A.

Slide 23: Examples of 3-way Graphical Models
[Graph: A → C ← B]
Independent causes: p(A,B,C) = p(C|A,B) p(A) p(B)

Slide 24: Examples of 3-way Graphical Models
[Graph: A → B → C]
Markov dependence: p(A,B,C) = p(C|B) p(B|A) p(A)

Slide 25: Directed Graphical Models
[Graph: A → C ← B]
p(A,B,C) = p(C|A,B) p(A) p(B)

Slide 26: Directed Graphical Models
[Graph: A → C ← B]
p(A,B,C) = p(C|A,B) p(A) p(B)
In general, p(X_1, X_2, …, X_N) = ∏_i p(X_i | parents(X_i))
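As an illustration of the factorization, a small Python sketch (the CPTs and names are hypothetical, not from the talk): the joint over the slide's three-node graph is assembled from one conditional-table lookup per factor.

```python
import numpy as np

# Hypothetical CPTs for p(A,B,C) = p(C|A,B) p(A) p(B), all variables binary.
pA = np.array([0.6, 0.4])                       # p(A)
pB = np.array([0.7, 0.3])                       # p(B)
pC_AB = np.random.rand(2, 2, 2)                 # unnormalized p(C | A, B)
pC_AB /= pC_AB.sum(axis=2, keepdims=True)       # normalize over C

def joint(a, b, c):
    """p(A=a, B=b, C=c) via the factorization: one lookup per factor."""
    return pA[a] * pB[b] * pC_AB[a, b, c]

# Sanity check: the factored joint sums to 1 over all 8 configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # ~1.0
```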

Slide 27: Example
[Directed graph on variables A, B, C, D, E, F, G]

Slide 28: Example
[Same graph, with C = c and G = g observed]
Say we want to compute p(a | c, g).

Slide 29: Example
Direct calculation: p(a | c, g) = Σ_{b,d,e,f} p(a, b, d, e, f | c, g)
Complexity of the sum is O(K^4).

Slide 30: Example
Reordering the sums: Σ_b p(a|b) Σ_d p(b|d,c) Σ_e p(d|e) Σ_f p(e,f|g)

Slide 31: Example
Summing out f: Σ_f p(e,f|g) = p(e|g), leaving Σ_b p(a|b) Σ_d p(b|d,c) Σ_e p(d|e) p(e|g)

Slide 32: Example
Summing out e: Σ_e p(d|e) p(e|g) = p(d|g), leaving Σ_b p(a|b) Σ_d p(b|d,c) p(d|g)

Slide 33: Example
Summing out d: Σ_d p(b|d,c) p(d|g) = p(b|c,g), leaving Σ_b p(a|b) p(b|c,g)

Slide 34: Example
Summing out b: Σ_b p(a|b) p(b|c,g) = p(a|c,g)
Complexity of each step is O(K), compared to O(K^4) for the direct sum.
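The same elimination order can be written directly as a chain of small array products. A sketch with randomly generated conditional tables (variable names follow the slides; c and g are treated as fixed, observed indices, so they appear as conditioning constants rather than table axes):

```python
import numpy as np

K = 4
rng = np.random.default_rng(0)

def cpt(*shape):
    """Random conditional table, normalized over the first axis (the child)."""
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)

p_a_given_b  = cpt(K, K)          # p(a | b)         -> shape [a, b]
p_b_given_dc = cpt(K, K)          # p(b | d, c=c0)   -> shape [b, d]
p_d_given_e  = cpt(K, K)          # p(d | e)         -> shape [d, e]
p_ef_given_g = rng.random((K, K)) # p(e, f | g=g0)   -> shape [e, f]
p_ef_given_g /= p_ef_given_g.sum()

# Eliminate variables innermost-first, exactly as in the reordered sum:
p_e_given_g  = p_ef_given_g.sum(axis=1)     # sum over f
p_d_given_g  = p_d_given_e @ p_e_given_g    # sum over e
p_b_given_cg = p_b_given_dc @ p_d_given_g   # sum over d
p_a_given_cg = p_a_given_b @ p_b_given_cg   # sum over b

print(p_a_given_cg.sum())  # ~1.0; each step is a small K x K product,
                           # never the brute-force O(K**4) sum
```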

Slide 35: A More General Algorithm
Message Passing (MP) algorithm (Pearl, 1988; Lauritzen and Spiegelhalter, 1988):
– Declare one node (any node) to be the root
– Schedule two phases of message-passing: nodes pass messages up to the root, then messages are distributed back to the leaves
– In time O(N), we can compute any probability of interest

Slides 36–40: Sketch of the MP algorithm in action
[A sequence of diagrams showing messages passed in steps 1–4; images not included in the transcript]

Slide 41: Complexity of the MP Algorithm
Efficient: complexity scales as O(N K^m), where
– N = number of variables
– K = arity of the variables
– m = maximum number of parents of any node
Compare to O(K^N) for the brute-force method.

Slide 42: Graphs with "Loops"
[Graph on A–G with multiple paths between some nodes]
The message-passing algorithm does not work when there are multiple paths between two nodes.

Slide 43: Graphs with "Loops"
General approach: "cluster" variables together to convert the graph to a tree.

Slide 44: Junction Tree
[The same graph with B and E merged into a single node {B, E}]

Slide 45: Junction Tree
Good news: we can perform the MP algorithm on this tree.
Bad news: complexity is now O(K^2) for the merged node.

Slide 46: Probability Calculations on Graphs
The structure of the graph reveals:
– The computational strategy: sparser graphs mean faster computation
– The dependency relations
Automated computation of conditional probabilities:
– A fully general algorithm for arbitrary graphs
– Exact vs. approximate answers
Extensions:
– For continuous variables, replace sums with integrals
– To identify most likely values, replace the sum with the max operator
– Conditional probability tables can be approximated

Slide 47: Hidden or Latent Variables
In many applications there are two sets of variables:
– Variables whose values we can directly measure
– Variables that are "hidden" and cannot be measured
Examples:
– Speech recognition: observed = acoustic voice signal; hidden = label of the word spoken
– Face tracking in images: observed = pixel intensities; hidden = position of the face in the image
– Text modeling: observed = counts of words in a document; hidden = topics that the document is about

Slide 48: Mixture Models
[Graph: S → Y]
S: hidden discrete variable; Y: observed variable(s)

Slide 49: A Graphical Model for Clustering
[Graph: S → Y_1, …, S → Y_d]
S: hidden discrete (cluster) variable
Y_1, …, Y_d: observed variables, assumed conditionally independent given S

Slide 50: A Graphical Model for Clustering
[Same graph as slide 49]
Clusters = p(Y_1, …, Y_d | S = s)
Clustering = learning these probability distributions from data
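A small sketch of this model's density, assuming binary features and hypothetical parameters (not from the talk): the mixture marginal p(y_1, …, y_d) = Σ_s p(s) ∏_j p(y_j | s) follows directly from the graph.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d = 3, 5                                  # 3 clusters, 5 binary features

p_s = np.array([0.5, 0.3, 0.2])              # hypothetical p(S)
p_y_given_s = rng.random((K, d))             # p(Y_j = 1 | S = s)

def p_y(y):
    """Mixture density p(y_1..y_d) = sum_s p(s) * prod_j p(y_j | s)."""
    y = np.asarray(y)
    # conditional independence given S: one product of factors per cluster
    per_cluster = np.prod(np.where(y == 1, p_y_given_s, 1 - p_y_given_s), axis=1)
    return np.dot(p_s, per_cluster)

print(p_y([1, 0, 1, 1, 0]))
```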

Slide 51: Mixtures of Markov Chains
[Graph: S → (Y_1 → Y_2 → Y_3 → … → Y_N)]
We can model sets of sequences as coming from K different Markov chains.

Slide 52: Mixtures of Markov Chains
We can model sets of sequences as coming from K different Markov chains, which provides a useful method for clustering sequences (Cadez, Heckerman, Meek, Smyth, 2003):
– Used to cluster 1 million Web user navigation patterns
– The algorithm is part of SQL Server 2005
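A minimal sketch of the mixture likelihood (the parameters here are hypothetical, not the paper's): each component is an ordinary Markov chain, and a sequence's marginal probability is a weighted sum over components.

```python
import numpy as np

def chain_loglik(seq, init, trans):
    """log p(seq) under one Markov chain (initial probs + transition matrix)."""
    ll = np.log(init[seq[0]])
    for a, b in zip(seq[:-1], seq[1:]):
        ll += np.log(trans[a, b])
    return ll

def mixture_loglik(seq, weights, inits, transes):
    """log p(seq) under a K-component mixture: log sum_k w_k p(seq | chain k)."""
    comps = [np.log(w) + chain_loglik(seq, i, t)
             for w, i, t in zip(weights, inits, transes)]
    return np.logaddexp.reduce(comps)

# Hypothetical 2-component mixture over a 2-symbol alphabet:
inits = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
transes = [np.array([[0.8, 0.2], [0.2, 0.8]]),
           np.array([[0.3, 0.7], [0.7, 0.3]])]
print(mixture_loglik([0, 0, 1, 0], [0.5, 0.5], inits, transes))
```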

Slide 53: Hidden Markov Model (HMM)
[Graph: hidden chain S_1 → S_2 → … → S_N, with each S_t → Y_t observed]
Two key conditional independence assumptions.
Widely used in speech recognition, protein models, error-correcting codes, …
Comments:
– Inference about S given Y is O(N)
– If S is continuous, we have a Kalman filter
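The O(N) inference the slide mentions is the forward recursion. A sketch assuming discrete observations and hypothetical parameter names (pi, A, B are my own):

```python
import numpy as np

def forward(obs, pi, A, B):
    """HMM filtering: returns p(S_t | y_1..y_t) for each t.

    pi: initial state probs (K,); A: transitions (K, K), A[i, j] =
    p(S_t = j | S_{t-1} = i); B: emissions (K, M), B[i, y] = p(y | S = i).
    Cost is O(N * K**2): linear in sequence length, as the slide notes.
    """
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()                 # normalize to avoid underflow
    out = [alpha]
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]    # predict one step, then condition on y
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)
```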

Slide 54: Generalized HMMs
[Graph: the HMM of slide 53 augmented with additional nodes I_1, …, I_n coupled to the chain]

Slide 55: Learning from Data

Slides 56–59: Probabilistic Model
[Build-up diagram: a probabilistic model on one side and real-world data on the other, connected by P(Data | Parameters) in one direction and P(Parameters | Data) in the other]

Slide 60: Parameters and Data
[Graph: θ → y_1, θ → y_2, θ → y_3, θ → y_4]
Data = {y_1, …, y_4}; θ = model parameters
Data observations are assumed IID here.

Slide 61: Plate Notation
[The graph θ → y_1, …, θ → y_n is drawn compactly as θ → y_i inside a plate labeled i = 1:n]

Slide 62: Maximum Likelihood
[Plate model: θ → y_i, i = 1:n]
Data = {y_1, …, y_n}; θ = model parameters
Likelihood(θ) = p(Data | θ) = ∏_i p(y_i | θ)
Maximum likelihood: θ_ML = arg max_θ { Likelihood(θ) }
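For a concrete case, a short sketch (simulated data, not from the talk): with IID Gaussian observations the likelihood maximum has a closed form, the sample mean and the (biased) sample variance.

```python
import numpy as np

# Simulated IID Gaussian data; the ML estimates should recover (5.0, 4.0).
y = np.random.default_rng(2).normal(loc=5.0, scale=2.0, size=1000)

mu_ml = y.mean()                     # argmax over mu of prod_i p(y_i | theta)
var_ml = ((y - mu_ml) ** 2).mean()   # note: divides by n, not n-1
print(mu_ml, var_ml)
```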

Slide 63: Being Bayesian
[Plate model with a prior node feeding θ]
Prior(θ) = p(θ)
Bayesian learning: p(θ | evidence) = p(θ | data, prior)
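A minimal conjugate example of Bayesian learning (my own illustration, not from the talk): with a Beta prior on a Bernoulli parameter, p(θ | data) is again a Beta density and can be updated by counting.

```python
# Bernoulli likelihood + Beta prior -> Beta posterior, in closed form.
data = [1, 0, 1, 1, 0, 1, 1, 1]       # hypothetical coin flips
a_prior, b_prior = 1.0, 1.0           # Beta(1, 1) = uniform prior on theta

heads = sum(data)
a_post = a_prior + heads              # add successes to the first count
b_post = b_prior + len(data) - heads  # add failures to the second count

posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)                 # shrinks the raw frequency toward 0.5
```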

Slide 64: Learning = Inference in a Graph
θ is unknown; learning is the process of computing p(θ | y's, prior).
Information "flows" from the observed (green) nodes to the θ node.

Slide 65: Example: Gaussian Model
[Plate model: mean and variance parameters, each with its own prior, feeding y_i, i = 1:n]
Note: priors and parameters are assumed independent here.

Slide 66: Example: Bayesian Regression
[Plate model: θ and x_i feeding y_i, i = 1:n, with noise variance σ²]
Model: y_i = f[x_i; θ] + e, where e ~ N(0, σ²)
p(y_i | x_i) ~ N(f[x_i; θ], σ²)
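When f is linear in θ, maximizing this Gaussian likelihood reduces to least squares. A sketch on simulated data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)    # hypothetical data

X = np.column_stack([x, np.ones_like(x)])                 # f(x; theta) = a*x + b
theta, *_ = np.linalg.lstsq(X, y, rcond=None)             # ML = least squares here
sigma2 = np.mean((y - X @ theta) ** 2)                    # ML noise variance
print(theta, sigma2)                                      # ~[2.0, 1.0], ~0.01
```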

Slide 67: Learning with Hidden Variables
[Plate model: θ → S_i → y_i, i = 1:n, with S_i hidden]
Finding the θ that maximizes L(θ) is difficult, so we use iterative optimization algorithms.

Slide 68: Learning with Hidden Variables
– Guess at some initial parameters θ⁰
– E-step (inference in a graph): for each case and each unknown variable, compute p(S | known data, θ⁰)
– M-step (optimization): maximize L(θ) using p(S | …), yielding new parameter estimates θ¹
This is the EM algorithm: it is guaranteed to converge to a (local) maximum of L(θ).
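A compact sketch of EM for the simplest interesting case, a two-component 1-D Gaussian mixture (the initialization and names are my own; the talk's models are richer):

```python
import numpy as np

def em_gmm_1d(y, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture -- a minimal sketch."""
    w = np.array([0.5, 0.5])                  # mixing weights
    mu = np.array([y.min(), y.max()])         # crude initial guess
    var = np.array([y.var(), y.var()])
    for _ in range(n_iter):
        # E-step: responsibilities p(S = k | y_i, current params).
        dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted ML updates; each sweep cannot decrease L(theta).
        nk = resp.sum(axis=0)
        w = nk / len(y)
        mu = (resp * y[:, None]).sum(axis=0) / nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Simulated two-cluster data: estimates should land near (0, 5) and unit variance.
y = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(5, 1, 200)])
print(em_gmm_1d(y))
```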

Slides 69–72: Mixture Model: E-Step and M-Step
[Plate diagrams of the mixture model S_i → y_i, i = 1:n, illustrating the alternating E- and M-steps]

Slide 73:
[Data plot] Data from Prof. Christine McLaren, Dept. of Epidemiology, UC Irvine

Slide 74: Mixture Model
[Plate model: C_i → y_i, i = 1:n]
Two hidden clusters, C = 1 and C = 2; p(y | C = k) is Gaussian.
Model for y: a mixture of two 2-D Gaussians.

Slides 75–80: [image-only slides; content not included in the transcript]

Slide 81:
[Final clustering result, with the two groups labeled "Anemia Group" and "Control Group"]

Slide 82: Application 1: Probabilistic Topic Modeling in Text Documents
Collaborators: Mark Steyvers, UC Irvine; Michal Rosen-Zvi, UC Irvine; Tom Griffiths, Stanford/MIT

Slide 83: The slides on author-topic modeling are in a separate PowerPoint file (with "author-topic" in the title).

Slide 84: Application 2: Modeling Sets of Curves
Collaborators: Scott Gaffney, Dasha Chudova, UC Irvine; Andy Robertson, Suzana Camargo, Columbia University

Slide 85: Graphical Models for Curves
[Graph: θ and t feeding y]
y = f(t; θ), e.g., y = at² + bt + c with θ = {a, b, c}
Data = {(y_1, t_1), …, (y_T, t_T)}

Slide 86: Graphical Models for Curves
[Plate model: θ and σ feeding y, with a plate over the T points]
y ~ Gaussian density with mean = f(t; θ) and variance = σ²

Slide 87: Example
[Scatter plot of y versus t]

Slide 88: Example
[The same scatter plot with the underlying curve f(t; θ) overlaid; the curve is hidden]

Slides 89–90: Sets of Curves
[Plots of several curves; images not included in the transcript]

Slide 91: Clustering "Non-Vector" Data
Challenges with the data:
– Items may be of different "lengths", "sizes", etc.
– Not easily representable in vector spaces
– Distance is not naturally defined a priori
Possible approaches:
– "Convert" the data into a fixed-dimensional vector space and apply standard vector clustering, but this loses information
– Use hierarchical clustering, but this is O(N²) and requires a distance measure
– Probabilistic clustering with mixtures: define a generative mixture model for the data and learn distance and clustering simultaneously

Slide 92: Graphical Models for Sets of Curves
[Plate model: the curve model of slide 86 with an outer plate over N curves]
Each curve: P(y_i | t_i, θ) = a product of Gaussians

Slide 93: Curve-Specific Transformations
[Plate model with an extra per-curve transformation parameter φ_i]
e.g., E[y_i] = at² + bt + c + φ_i, with θ = {a, b, c, φ_1, …, φ_N}
Note: we can learn the function parameters and the shifts simultaneously with EM.
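One way to see the idea is the simpler case where the per-curve transformation is an additive offset rather than a time shift: the shared coefficients and the offsets can then be fit by alternating least squares. A sketch under that assumption (the talk itself learns shifts jointly via EM; all names here are mine):

```python
import numpy as np

def fit_shared_poly_with_offsets(t, Y, n_iter=20):
    """Alternate between fitting shared quadratic coefficients (a, b, c)
    for E[y] = a t^2 + b t + c + phi_i and one offset phi_i per curve.
    t: (T,) common time grid; Y: (n_curves, T) observed curves."""
    n_curves, _ = Y.shape
    phi = np.zeros(n_curves)
    T = np.column_stack([t ** 2, t, np.ones_like(t)])
    for _ in range(n_iter):
        # Fit (a, b, c) to all curves at once, offsets subtracted out.
        resid = (Y - phi[:, None]).ravel()
        T_all = np.tile(T, (n_curves, 1))
        abc, *_ = np.linalg.lstsq(T_all, resid, rcond=None)
        # Re-estimate each curve's offset as its mean residual.
        phi = (Y - T @ abc).mean(axis=1)
        phi -= phi.mean()   # keep c and the offsets identifiable
    return abc, phi
```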

Slide 94: Learning Shapes and Shifts
[Two plots: the original data and the data after learning]
Data: smoothed growth-acceleration data from teenagers.
EM is used to learn a spline model plus a time-shift for each curve.

Slide 95: Clustering: Mixtures of Curves
[Plate model with an additional hidden cluster variable c per curve]
Each set of trajectory points comes from one of K models.
The model for group k is a Gaussian curve model.
The marginal probability for a trajectory is a mixture model.

Slide 96: Clustering Methodology
Mixtures of polynomials and splines:
– Model the data as mixtures of noisy regression models
– 2-D (x, y) position as a function of time
– Use the model as a first-order approximation for clustering
Compared to vector-based clustering, this:
– Provides a quantitative (e.g., predictive) model
– Can handle variable-length trajectories, missing measurements, a background/outlier process, and coupling of other "features" (e.g., intensity)

Slide 97: Winter Storm Tracks
– Highly damaging weather
– Important water source
– Climate change implications

Slide 98: Data
– Sea-level pressure on a global grid
– Four times a day (every 6 hours), over 30 years
– Blue indicates low pressure

Slide 99: Clusters of Trajectories

Slide 100: Cluster Shapes for Pacific Cyclones

Slide 101: Tropical Cyclones, Western North Pacific, 1983–2002

Slide 102: [image-only slide; content not included in the transcript]

Slide 103: Hierarchical Bayesian Models
[Plate model in which each curve i has its own parameters θ_i, drawn from a shared prior]
Each curve is allowed to have its own parameters, which amounts to "clustering in parameter space".

Slide 104: Summary
Graphical models:
– A representation language for complex probabilistic models
– Provide a systematic framework for model description and probabilistic computation
– A general framework for learning from data
Applications:
– Author-topic models for text data
– Mixtures of regressions for sets of curves
– Many more
Extensions:
– Hierarchical Bayesian models
– Spatio-temporal models
– …

Slide 105: Further Information
Papers online at www.datalab.uci.edu:
– Graphical models: Smyth, Heckerman, and Jordan, Neural Computation, 1997
– Topic models: Steyvers et al., ACM SIGKDD 2004; Rosen-Zvi et al., UAI 2004
– Curve clustering: Gaffney and Smyth, NIPS 2004; Scott Gaffney, PhD thesis, UCI, 2004
Author-Topic Browser: www.datalab.uci.edu/author-topic
– Java demo of the online browser
– Additional tables and results

