1 Lecture 2: Statistical learning primer for biologists
Alan Qi Purdue Statistics and CS Jan. 15, 2009

2 Outline Basics of probability; Regression
Graphical models: Bayesian networks and Markov random fields; Unsupervised learning: K-means and expectation maximization

3 Probability Theory Sum Rule Product Rule

4 The Rules of Probability
Sum Rule Product Rule
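As a reference for the two rules named on this slide, in standard notation for discrete variables X and Y:

```latex
\[
\underbrace{p(X) = \sum_{Y} p(X, Y)}_{\text{sum rule}}
\qquad\qquad
\underbrace{p(X, Y) = p(Y \mid X)\, p(X)}_{\text{product rule}}
\]
```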

5 Bayes’ Theorem posterior  likelihood × prior
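Written out, with the denominator obtained from the sum and product rules:

```latex
\[
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},
\qquad
p(X) = \sum_{Y} p(X \mid Y)\, p(Y).
\]
```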

6 Probability Density & Cumulative Distribution Functions

7 Expectations Conditional Expectation (discrete)
Approximate Expectation (discrete and continuous)
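The quantities named on this slide, written out for a function f(x):

```latex
\[
\mathbb{E}[f] = \sum_{x} p(x)\, f(x) \quad\text{(discrete)},
\qquad
\mathbb{E}[f] = \int p(x)\, f(x)\, dx \quad\text{(continuous)},
\]
\[
\mathbb{E}_x[f \mid y] = \sum_{x} p(x \mid y)\, f(x) \quad\text{(conditional)},
\qquad
\mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)
\quad\text{(approximation from samples } x_n \sim p(x)\text{)}.
\]
```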

8 Variances and Covariances

9 The Gaussian Distribution

10 Gaussian Mean and Variance

11 The Multivariate Gaussian
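For reference, the univariate density with mean μ and variance σ², and the D-dimensional density with mean vector μ and covariance matrix Σ:

```latex
\[
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\},
\qquad
\mathbb{E}[x] = \mu, \quad \operatorname{var}[x] = \sigma^2,
\]
\[
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{D/2}\, |\boldsymbol{\Sigma}|^{1/2}}
    \exp\!\left\{ -\tfrac{1}{2}
      (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}
      \boldsymbol{\Sigma}^{-1}
      (\mathbf{x} - \boldsymbol{\mu}) \right\}.
\]
```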

12 Gaussian Parameter Estimation
Likelihood function

13 Maximum (Log) Likelihood
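For N i.i.d. observations x_1, …, x_N, the likelihood function and the resulting maximum likelihood estimates are:

```latex
\[
p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2),
\qquad
\ln p(\mathbf{x} \mid \mu, \sigma^2)
  = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2
    - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi),
\]
\[
\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n,
\qquad
\sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2 .
\]
```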

14 Properties of μ_ML and σ²_ML: the maximum likelihood estimate of the mean is unbiased, while the maximum likelihood estimate of the variance is biased (its expectation is (N − 1)/N times the true variance).

15 Curve Fitting Re-visited

16 Maximum Likelihood Determine w_ML by minimizing the sum-of-squares error E(w).

17 Predictive Distribution

18 MAP: A Step towards Bayes
Determine w_MAP by minimizing the regularized sum-of-squares error.
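For the curve-fitting example with targets t_n at inputs x_n, the two error functions referred to on slides 16 and 18 are the following, where λ is the regularization coefficient determined by the prior:

```latex
\[
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)^2
\quad \text{(maximum likelihood)},
\]
\[
\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)^2
  + \frac{\lambda}{2} \|\mathbf{w}\|^2
\quad \text{(MAP, with a Gaussian prior on } \mathbf{w}\text{)}.
\]
```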

19 Bayesian Curve Fitting

20 Bayesian Networks Directed Acyclic Graph (DAG)

21 Bayesian Networks General Factorization
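The general factorization for a directed acyclic graph over variables x_1, …, x_K, where pa_k denotes the parents of node k:

```latex
\[
p(\mathbf{x}) = \prod_{k=1}^{K} p\bigl( x_k \mid \mathrm{pa}_k \bigr).
\]
```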

22 Generative Models Causal process for generating images

23 Discrete Variables (1) General joint distribution: K^2 − 1 parameters
Independent joint distribution: 2(K − 1) parameters

24 Discrete Variables (2) General joint distribution over M variables: K^M − 1 parameters. M-node Markov chain: K − 1 + (M − 1)K(K − 1) parameters.

25 Discrete Variables: Bayesian Parameters (1)

26 Discrete Variables: Bayesian Parameters (2)
Shared prior

27 Parameterized Conditional Distributions
If the parent nodes x_1, …, x_M are discrete, K-state variables, the conditional distribution p(y | x_1, …, x_M) in general has O(K^M) parameters. A parameterized form (e.g., a logistic sigmoid applied to a linear combination of the parents) requires only M + 1 parameters.

28 Conditional Independence
a is independent of b given c: p(a | b, c) = p(a | c). Equivalently, p(a, b | c) = p(a | c) p(b | c). Notation: a ⊥ b | c.

29 Conditional Independence: Example 1

30 Conditional Independence: Example 1

31 Conditional Independence: Example 2

32 Conditional Independence: Example 2

33 Conditional Independence: Example 3
Note: this is the opposite of Example 1, with c unobserved.

34 Conditional Independence: Example 3
Note: this is the opposite of Example 1, with c observed.

35 “Am I out of fuel?” B = Battery (0 = flat, 1 = fully charged); F = Fuel Tank (0 = empty, 1 = full); G = Fuel Gauge Reading (0 = empty, 1 = full)

36 “Am I out of fuel?” Probability of an empty tank increased by observing G = 0.

37 “Am I out of fuel?” Probability of an empty tank reduced by observing B = 0. This is referred to as “explaining away”.
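A minimal numerical sketch of the fuel example. The conditional probability table values below are assumed for illustration; they reproduce the qualitative behaviour described on slides 36 and 37: observing G = 0 raises p(F = 0), and additionally observing B = 0 lowers it again (explaining away).

```python
# Explaining away in the fuel-gauge network B -> G <- F.
# CPT values are illustrative assumptions chosen for this sketch.
p_B = {1: 0.9, 0: 0.1}                 # battery charged / flat
p_F = {1: 0.9, 0: 0.1}                 # tank full / empty
p_G1 = {(1, 1): 0.8, (1, 0): 0.2,      # p(G=1 | B, F)
        (0, 1): 0.2, (0, 0): 0.1}

def joint(b, f, g):
    """p(B=b, F=f, G=g) from the factorization p(B) p(F) p(G | B, F)."""
    pg = p_G1[(b, f)] if g == 1 else 1.0 - p_G1[(b, f)]
    return p_B[b] * p_F[f] * pg

def prob_F0(evidence):
    """p(F=0 | evidence) by summing the joint over the unobserved variables."""
    def consistent(assign):
        return all(assign[k] == v for k, v in evidence.items())
    num = sum(joint(b, f, g) for b in (0, 1) for f in (0, 1) for g in (0, 1)
              if f == 0 and consistent({'B': b, 'F': f, 'G': g}))
    den = sum(joint(b, f, g) for b in (0, 1) for f in (0, 1) for g in (0, 1)
              if consistent({'B': b, 'F': f, 'G': g}))
    return num / den

print(prob_F0({}))                # prior p(F=0)
print(prob_F0({'G': 0}))          # increases after observing an empty gauge
print(prob_F0({'G': 0, 'B': 0} )) # decreases again once the flat battery is observed
```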

38 The Markov Blanket Factors independent of x_i cancel between numerator and denominator.

39 Markov Random Fields Markov Blanket

40 Cliques and Maximal Cliques

41 Joint Distribution p(x) = (1/Z) ∏_C ψ_C(x_C), where ψ_C(x_C) is the potential over clique C and
Z = Σ_x ∏_C ψ_C(x_C) is the normalization coefficient; note: M K-state variables give K^M terms in Z. Energies and the Boltzmann distribution: ψ_C(x_C) = exp{−E(x_C)}.

42 Illustration: Image De-Noising (1)
Original Image Noisy Image

43 Illustration: Image De-Noising (2)

44 Illustration: Image De-Noising (3)
Noisy Image Restored Image (ICM)
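A compact sketch of iterated conditional modes (ICM) for binary de-noising under an Ising-style MRF prior. A common choice of energy for this model is E(x, y) = h Σ_i x_i − β Σ_{i,j} x_i x_j − η Σ_i x_i y_i over pixels x_i ∈ {−1, +1}; the coefficient values, the 4-neighbourhood, and the synthetic test image below are assumptions of this sketch, not details taken from the slides.

```python
import numpy as np

def local_energy(x, y, i, j, h=0.0, beta=1.0, eta=2.0):
    """Terms of the energy that involve pixel (i, j) only."""
    s = 0.0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-neighbourhood
        ni, nj = i + di, j + dj
        if 0 <= ni < x.shape[0] and 0 <= nj < x.shape[1]:
            s += x[ni, nj]
    return h * x[i, j] - beta * x[i, j] * s - eta * x[i, j] * y[i, j]

def icm_denoise(y, sweeps=5):
    """Sweep the image, flipping each pixel whenever the flip lowers the energy."""
    x = y.copy()
    for _ in range(sweeps):
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                e_keep = local_energy(x, y, i, j)
                x[i, j] = -x[i, j]                      # try flipping the pixel
                if local_energy(x, y, i, j) > e_keep:   # keep the lower-energy state
                    x[i, j] = -x[i, j]
    return x

# Synthetic stand-in for the slide's noisy image: a square with 10% flipped pixels.
rng = np.random.default_rng(0)
clean = np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = -1
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)
restored = icm_denoise(noisy)
print("pixels still wrong:", int((restored != clean).sum()))
```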

45 Converting Directed to Undirected Graphs (1)

46 Converting Directed to Undirected Graphs (2)
Additional links: “marrying parents”, i.e., moralization

47 Directed vs. Undirected Graphs (2)

48 Inference on a Chain Naive summation over the full joint: computational time increases exponentially with N.

49 Inference on a Chain
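Exploiting the chain structure reduces the cost to O(N K²): the marginal of node x_n is obtained from forward and backward messages rather than from the full joint. In the standard notation for a chain with pairwise potentials ψ:

```latex
\[
p(x_n) = \frac{1}{Z}\, \mu_\alpha(x_n)\, \mu_\beta(x_n),
\]
\[
\mu_\alpha(x_n) = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n)\, \mu_\alpha(x_{n-1}),
\qquad
\mu_\beta(x_n) = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1})\, \mu_\beta(x_{n+1}).
\]
```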

50 Supervised Learning Supervised learning: learning with examples or labels, e.g., classification and regression. Examples: linear regression (the example we just gave), generalized linear models (e.g., probit classification), support vector machines, Gaussian process classification, etc. Take CS590M (Machine Learning) in fall 2009.

51 Unsupervised Learning
Supervised learning: learning with examples or labels, e.g., classification and regression. Unsupervised learning: learning without examples or labels, e.g., clustering, mixture models, PCA, non-negative matrix factorization.

52 K-means Clustering: Goal

53 Cost Function
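With binary assignment indicators r_{nk} ∈ {0, 1} (each point assigned to exactly one cluster) and cluster centers μ_k, the cost being minimized is the within-cluster sum of squared distances:

```latex
\[
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \| \mathbf{x}_n - \boldsymbol{\mu}_k \|^2 .
\]
```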

54 Two Stage Updates

55 Optimizing Cluster Assignment

56 Optimizing Cluster Centers

57 Convergence of Iterative Updates
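A minimal numpy sketch of the two-stage updates from the preceding slides: assign each point to its nearest center, then recompute each center as the mean of its assigned points; each step can only decrease J, so the iteration converges. The function names, initialization, and synthetic data are just for this sketch.

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Plain K-means: alternate cluster assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # init from data points
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Tiny synthetic example with two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans(X, K=2)
print(np.round(centers, 2))
```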

58 Example of K-Means Clustering

59 Mixture of Gaussians Introduce latent variables z indicating which component generated each observation; the marginal distribution of x is then a weighted sum of Gaussian components.
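With mixing coefficients π_k (π_k ≥ 0, Σ_k π_k = 1) and a 1-of-K latent indicator z:

```latex
\[
p(\mathbf{z}) = \prod_{k=1}^{K} \pi_k^{z_k},
\qquad
p(\mathbf{x} \mid \mathbf{z}) = \prod_{k=1}^{K}
  \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)^{z_k},
\qquad
p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{z})\, p(\mathbf{x} \mid \mathbf{z})
  = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k).
\]
```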

60 Conditional Probability
Responsibility that component k takes for explaining the observation.
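By Bayes' theorem, the responsibility is the posterior probability of component k given the observation x:

```latex
\[
\gamma(z_k) \equiv p(z_k = 1 \mid \mathbf{x})
  = \frac{\pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
         {\sum_{j=1}^{K} \pi_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} .
\]
```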

61 Maximum Likelihood Maximize the log likelihood function

62 Maximum Likelihood Conditions (1)
Setting the derivative of the log likelihood with respect to the means μ_k to zero.

63 Maximum Likelihood Conditions (2)
Setting the derivative of the log likelihood with respect to the covariances Σ_k to zero.

64 Maximum Likelihood Conditions (3)
Lagrange function for the mixing coefficients (adding the constraint Σ_k π_k = 1). Setting its derivative to zero and using the normalization constraint, we obtain the update for π_k; the three conditions are collected below.
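Collecting the three maximum likelihood conditions from slides 62-64, where N_k = Σ_n γ(z_nk) is the effective number of points assigned to component k:

```latex
\[
\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k\,
    \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\]
\[
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n,
\qquad
\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\,
  (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf{T}},
\qquad
\pi_k = \frac{N_k}{N}.
\]
```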

65 Expectation Maximization for Mixtures of Gaussians
Although the previous conditions do not give a closed-form solution, we can use them to construct iterative updates. E step: compute the responsibilities γ(z_nk). M step: compute new means μ_k, covariances Σ_k, and mixing coefficients π_k. Loop over the E and M steps until the log likelihood stops increasing.
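A short numpy/scipy sketch of these E and M steps. The initialization, convergence tolerance, and toy data are assumptions of this sketch rather than details from the lecture.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, iters=200, tol=1e-6, seed=0):
    """EM for a Gaussian mixture: alternate responsibilities (E) and parameter updates (M)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, size=K, replace=False)]          # initial means from data points
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(D)] * K)  # initial covariances
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(iters):
        # E step: responsibilities gamma[n, k] = p(z_k = 1 | x_n).
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=cov[k]) for k in range(K)
        ])
        ll = np.log(dens.sum(axis=1)).sum()               # log likelihood
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate means, covariances, and mixing coefficients.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
        if ll - prev_ll < tol:                            # stop when the likelihood stops increasing
            break
        prev_ll = ll
    return pi, mu, cov, ll

# Toy data: two 2-D Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
pi, mu, cov, ll = em_gmm(X, K=2)
print(np.round(mu, 2), np.round(pi, 2))
```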

66 Example EM on the Old Faithful data set.

67 General EM Algorithm
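In its general form, with observed data X, latent variables Z, and parameters θ, each iteration consists of:

```latex
\begin{align*}
\textbf{E step:}\quad & \text{evaluate } p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}}), \\
\textbf{M step:}\quad & \boldsymbol{\theta}^{\text{new}}
   = \arg\max_{\boldsymbol{\theta}} \, \mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}),
\qquad
\mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})
   = \sum_{\mathbf{Z}} p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}})\,
     \ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta}).
\end{align*}
```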

68 EM as a Lower-Bounding Method
Goal: maximize the likelihood p(X | θ). Define a distribution q(Z) over the latent variables. We have the decomposition ln p(X | θ) = L(q, θ) + KL(q ‖ p(Z | X, θ)).

69 Lower Bound L(q, θ) is a functional of the distribution q(Z). Since KL(q ‖ p) ≥ 0 and ln p(X | θ) = L(q, θ) + KL(q ‖ p),
L(q, θ) is a lower bound of the log likelihood function ln p(X | θ).
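Written out, the two terms in the decomposition are:

```latex
\[
\mathcal{L}(q, \boldsymbol{\theta})
  = \sum_{\mathbf{Z}} q(\mathbf{Z}) \ln
    \frac{p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})}{q(\mathbf{Z})},
\qquad
\mathrm{KL}(q \,\|\, p)
  = -\sum_{\mathbf{Z}} q(\mathbf{Z}) \ln
    \frac{p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta})}{q(\mathbf{Z})}
  \;\ge\; 0 .
\]
```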

70 Illustration of Lower Bound

71 Lower Bound Perspective of EM
Expectation step: maximize the functional lower bound L(q, θ) over the distribution q(Z); this is achieved by setting q(Z) = p(Z | X, θ_old). Maximization step: maximize the lower bound over the parameters θ.

72 Illustration of EM Updates

