1 EE 290A: Generalized Principal Component Analysis. Lecture 6: Iterative Methods for Mixture-Model Segmentation. Sastry & Yang © Spring 2011, EE 290A, University of California, Berkeley.

2 Last time: PCA reduces the dimensionality of a data set while retaining as much of the data variation as possible. Statistical view: the leading PCs are given by the leading eigenvectors of the covariance matrix. Geometric view: fit a d-dimensional subspace model via SVD (see the sketch below). Extensions of PCA: probabilistic PCA via MLE, and kernel PCA via kernel functions and kernel matrices.
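As a quick refresher on the geometric view, a minimal NumPy sketch of fitting the subspace via SVD (the function name pca_svd and the synthetic data are illustrative, not from the lecture):

```python
import numpy as np

def pca_svd(X, d):
    """Fit a d-dimensional subspace to the rows of X via SVD.

    X : (n, D) data matrix, one sample per row.
    Returns the sample mean and the top-d principal directions.
    """
    mu = X.mean(axis=0)                      # center the data
    Xc = X - mu
    # Right singular vectors of the centered data are the eigenvectors
    # of the sample covariance, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mu, Vt[:d].T                      # shapes (D,) and (D, d)

# Example: project 3-D samples onto their best-fitting 2-D subspace.
X = np.random.randn(100, 3) @ np.diag([3.0, 1.0, 0.1])
mu, U = pca_svd(X, d=2)
Y = (X - mu) @ U                             # d-dimensional coordinates
```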

3 This lecture: review of basic iterative algorithms, and formulation of the subspace segmentation problem.

4 Example 4.1: Euclidean distance-based clustering is not invariant to linear transformations, so the distance metric needs to be adjusted after a linear transformation of the data (see the small demo below).
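A tiny numerical illustration of this point (the sample, the two cluster means, and the transformation below are made-up values for illustration only): shrinking one coordinate axis changes which mean is nearest in the Euclidean sense.

```python
import numpy as np

x   = np.array([1.0, 0.0])    # a sample
mu1 = np.array([2.5, 0.0])    # mean of cluster 1
mu2 = np.array([0.0, 2.0])    # mean of cluster 2

A = np.diag([1.0, 0.2])       # a linear transformation shrinking the 2nd axis

def nearest(p, means):
    """Index of the Euclidean-nearest mean."""
    return int(np.argmin([np.linalg.norm(p - m) for m in means]))

print(nearest(x, [mu1, mu2]))              # 0: cluster 1 is nearest
print(nearest(A @ x, [A @ mu1, A @ mu2]))  # 1: after the transform, cluster 2 is nearest
```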

5 Assume the data are sampled from a mixture of Gaussians. The classical distance metric between a sample and the mean of the j-th cluster is the Mahalanobis distance, written out below.
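In symbols, with mean μ_j and covariance Σ_j of the j-th cluster, the Mahalanobis distance is the standard quadratic form:

```latex
d^2(\boldsymbol{x}, \boldsymbol{\mu}_j)
  = (\boldsymbol{x} - \boldsymbol{\mu}_j)^{\top} \Sigma_j^{-1} (\boldsymbol{x} - \boldsymbol{\mu}_j).
```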

6 K-Means: assume a map that assigns each i-th sample a cluster label. An optimal clustering minimizes the within-cluster scatter, i.e., the average distance of all samples to their respective cluster means; the objective is written out below.
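Written out (a standard statement of the objective, with the squared Euclidean distance below replaceable by the Mahalanobis distance of the previous slide), using an assignment map π from samples to clusters:

```latex
\min_{\pi,\,\{\boldsymbol{\mu}_j\}} \;
\frac{1}{n} \sum_{i=1}^{n} \bigl\| \boldsymbol{x}_i - \boldsymbol{\mu}_{\pi(i)} \bigr\|^2,
\qquad \pi : \{1,\dots,n\} \to \{1,\dots,K\}.
```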

7 However, since K is user-defined, the scatter can always be driven to zero by letting each point become a cluster of its own, i.e., K = n. In this chapter we assume the true K is known.

8 Algorithm: a chicken-and-egg view. If the cluster labels were known, estimating the cluster means would be straightforward; if the means were known, assigning labels would be straightforward.

9 Two-Step Iteration: alternate between assigning each sample to the nearest cluster mean (segmentation) and re-estimating each mean from its assigned samples (estimation), as in the sketch below.
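A minimal NumPy sketch of the two-step iteration (the function name, random initialization, and stopping test are simple illustrative choices, not the lecture's own code):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]       # initial means
    for _ in range(n_iter):
        # Segmentation step: assign each sample to the nearest mean.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Estimation step: recompute each mean from its assigned samples.
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                           else mu[j] for j in range(K)])
        if np.allclose(new_mu, mu):                    # converged
            break
        mu = new_mu
    return labels, mu
```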

10 Example: http://www.paused21.net/off/kmeans/bin/

11 Characteristics of K-Means: It is a greedy algorithm and is not guaranteed to converge to the global optimum. Given fixed initial clusters / Gaussian models, the iterative process is deterministic. The result may be improved by running K-means multiple times with different starting conditions (see the sketch below). The segmentation-estimation process can be treated as a generalized expectation-maximization (EM) algorithm.
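For instance, a multi-restart wrapper around the kmeans sketch from slide 9 above (illustrative only; it scores runs by their within-cluster scatter and keeps the best):

```python
def kmeans_best_of(X, K, n_restarts=10):
    """Run the kmeans sketch above several times; keep the lowest-scatter run."""
    best = None
    for seed in range(n_restarts):
        labels, mu = kmeans(X, K, seed=seed)
        scatter = ((X - mu[labels]) ** 2).sum()   # within-cluster scatter
        if best is None or scatter < best[0]:
            best = (scatter, labels, mu)
    return best[1], best[2]
```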

12 EM Algorithm [Dempster-Laird-Rubin 1977]: EM estimates the model parameters and the segmentation in an ML sense. Assume the samples are independently drawn from a mixture distribution whose component membership is indicated by a hidden discrete variable z; the conditional distributions can be Gaussian. The model is written out below.
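In the standard mixture notation, with mixing weights π_j = P(z = j), the model on this slide reads:

```latex
p(\boldsymbol{x} \mid \theta) = \sum_{j=1}^{K} \pi_j \, p_j(\boldsymbol{x} \mid \theta_j),
\qquad \pi_j = P(z = j),
\qquad p_j(\boldsymbol{x} \mid \theta_j) = \mathcal{N}(\boldsymbol{x};\, \boldsymbol{\mu}_j, \Sigma_j)
\ \text{in the Gaussian case}.
```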

13 The Maximum-Likelihood Estimation: the unknown parameters are the mixing weights and the parameters of each component; the likelihood function is the product of the mixture densities over the samples; the optimal solution maximizes the log-likelihood, written out below.
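In the Gaussian case, the standard parameter set and log-likelihood (a reconstruction of the usual form) are:

```latex
\theta = \{\pi_j, \boldsymbol{\mu}_j, \Sigma_j\}_{j=1}^{K},
\qquad
\ell(\theta) = \sum_{i=1}^{n} \log \Bigl( \sum_{j=1}^{K} \pi_j \, p_j(\boldsymbol{x}_i \mid \theta_j) \Bigr).
```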

14 E Step: Compute the Expectation. Directly maximizing the log-likelihood function is a high-dimensional nonlinear optimization problem.

15 Define a new function g(θ, w): the first term is called the expected complete log-likelihood function; the second term is the conditional entropy. A standard form of this function is given below.
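A standard way to write this auxiliary function, with membership weights w_ij standing in for the hidden variable's distribution (a reconstruction of the usual form, not copied from the slide):

```latex
g(\theta, w) =
\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}\,
  \log\bigl(\pi_j\, p_j(\boldsymbol{x}_i \mid \theta_j)\bigr)}_{\text{expected complete log-likelihood}}
\;+\;
\underbrace{\Bigl(-\sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}\,\log w_{ij}\Bigr)}_{\text{conditional entropy}},
\qquad w_{ij} \ge 0,\ \ \sum_{j=1}^{K} w_{ij} = 1.
```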

16 Observation: for any valid membership weights w, g(θ, w) lower-bounds the log-likelihood ℓ(θ), with equality when w equals the posterior P(z | x, θ).

17 M-Step: Maximization. Regard the (incomplete) log-likelihood as a function of the two sets of variables θ and w, and maximize g iteratively: fix θ and maximize over w (E step), then fix w and maximize over θ (M step).

18 The iteration converges to a stationary point of the log-likelihood.

19 Proposition 4.2: update of the membership weights w, given below.
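In the standard EM derivation this update is the posterior membership probability (the usual form, consistent with the soft assignment noted on slide 22):

```latex
w_{ij} = \frac{\pi_j \, p_j(\boldsymbol{x}_i \mid \theta_j)}
              {\sum_{l=1}^{K} \pi_l \, p_l(\boldsymbol{x}_i \mid \theta_l)}
       = P(z_i = j \mid \boldsymbol{x}_i, \theta).
```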

20 Update of θ: recall the expected complete log-likelihood term of g. Assume w is fixed; then maximize the expected complete log-likelihood with respect to θ.

21 To maximize the expected log-likelihood, assume as an example that each cluster is an isotropic normal distribution, and eliminate the constant term in the objective; the resulting closed-form updates are given below.
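Setting the derivatives of the remaining objective to zero gives the familiar closed-form updates for the isotropic case (a standard result, stated here with D the ambient dimension):

```latex
\pi_j = \frac{1}{n}\sum_{i=1}^{n} w_{ij},
\qquad
\boldsymbol{\mu}_j = \frac{\sum_{i=1}^{n} w_{ij}\,\boldsymbol{x}_i}{\sum_{i=1}^{n} w_{ij}},
\qquad
\sigma_j^2 = \frac{\sum_{i=1}^{n} w_{ij}\,\lVert \boldsymbol{x}_i - \boldsymbol{\mu}_j \rVert^2}
                  {D \sum_{i=1}^{n} w_{ij}}.
```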

22 Exercise 4.2: Compared to K-means, EM assigns the samples "softly" to each cluster according to a set of probabilities.

23 EM Algorithm: alternate the E-step update of w and the M-step update of θ above until convergence; a sketch follows below.
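A compact NumPy sketch of the whole iteration for the isotropic Gaussian case (function name, initialization, and fixed iteration count are illustrative choices, following the standard E- and M-step formulas above rather than code from the lecture):

```python
import numpy as np

def em_isotropic_gmm(X, K, n_iter=100, seed=0):
    """EM for a mixture of K isotropic Gaussians N(mu_j, sigma_j^2 I)."""
    n, D = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, K, replace=False)]            # initial means
    sigma2 = np.full(K, X.var())                       # initial variances
    pi = np.full(K, 1.0 / K)                           # mixing weights
    for _ in range(n_iter):
        # E step: posterior membership probabilities w_ij.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (n, K)
        logp = -0.5 * d2 / sigma2 - 0.5 * D * np.log(2 * np.pi * sigma2)
        logw = np.log(pi) + logp
        logw -= logw.max(axis=1, keepdims=True)        # numerical stabilization
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M step: closed-form updates for pi, mu, sigma^2.
        Nj = w.sum(axis=0)                             # effective cluster sizes
        pi = Nj / n
        mu = (w.T @ X) / Nj[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        sigma2 = (w * d2).sum(axis=0) / (D * Nj)
    return pi, mu, sigma2, w
```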

24 Example 4.3: a global maximum of the likelihood may not exist.
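The standard illustration of why a global maximum can fail to exist (stated here as the usual construction, which may differ in detail from the book's Example 4.3): center one Gaussian component on a single sample and shrink its variance, so the likelihood grows without bound while the other components keep the remaining samples at nonzero density:

```latex
\boldsymbol{\mu}_1 = \boldsymbol{x}_1,\ \Sigma_1 = \sigma^2 I
\;\Longrightarrow\;
p(\boldsymbol{x}_1 \mid \theta) \ \ge\ \frac{\pi_1}{(2\pi\sigma^2)^{D/2}}
\;\longrightarrow\; \infty
\quad \text{as } \sigma \to 0.
```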

