
CS 59000 Statistical Machine Learning, Lecture 12. Yuan (Alan) Qi, Purdue CS, Oct. 7, 2008.


1 CS 59000 Statistical Machine Learning, Lecture 12. Yuan (Alan) Qi, Purdue CS, Oct. 7, 2008

2 Outline Review of probit regression, Laplace approximation, BIC, and Bayesian logistic regression; kernel methods; kernel ridge regression; kernel principal component analysis

3 Probit Regression Probit function: Φ(a) = ∫_{−∞}^{a} N(θ|0, 1) dθ, the cumulative distribution function of a standard Gaussian.
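A minimal numerical sketch of the probit activation (assuming SciPy; the weights and feature vector below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Probit activation: Phi(a) is the standard normal CDF,
# the integral of N(theta | 0, 1) from -infinity to a.
def probit(a):
    return norm.cdf(a)

# Probit regression prediction p(t = 1 | x) = Phi(w^T phi(x)),
# with hypothetical weights and a hypothetical feature vector.
w = np.array([0.5, -1.0])
phi = np.array([1.0, 2.0])
print(probit(w @ phi))  # probability of class 1
```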

4 Labeling Noise Model Allowing a small probability ε that a target label has been flipped, e.g. p(t = 1 | a) = ε + (1 − 2ε)Φ(a), makes the model robust to outliers and labeling errors.

5 Generalized Linear Models Generalized linear model: y = f(wᵀφ). Activation function: f(·). Link function: f⁻¹(·).

6 Canonical Link Function If we choose the canonical link function for an exponential-family target distribution, the gradient of the error function takes the simple form ∇E(w) = (1/s) Σ_n (y_n − t_n) φ_n.
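To make the canonical-link gradient concrete, here is a sketch of plain gradient descent for logistic regression (sigmoid activation with cross-entropy error, where s = 1); the design matrix and targets are random stand-ins:

```python
import numpy as np

# With the canonical link, grad E(w) = sum_n (y_n - t_n) phi_n.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((100, 3))               # feature vectors phi_n
t = (rng.uniform(size=100) < 0.5).astype(float)   # binary targets
w = np.zeros(3)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for _ in range(100):
    y = sigmoid(Phi @ w)          # predictions y_n
    grad = Phi.T @ (y - t)        # sum_n (y_n - t_n) phi_n
    w -= 0.01 * grad              # gradient-descent step
print(w)
```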

7 Examples

8 Laplace Approximation for Posterior Gaussian approximation around the mode z₀: q(z) ∝ exp(−½ (z − z₀)ᵀ A (z − z₀)), where A = −∇∇ ln p(z) evaluated at z = z₀.
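A one-dimensional sketch of the Laplace approximation; the unnormalized log density below is a hypothetical example, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical unnormalized log density: log p*(z) = -z^4 / 4 + z.
def neg_log_p(z):
    z = z[0]
    return z**4 / 4.0 - z

# Find the mode z0, then set A = -(d^2/dz^2) log p*(z) at z0
# (estimated here by central finite differences).
z0 = minimize(neg_log_p, x0=np.array([0.0])).x[0]
h = 1e-4
A = (neg_log_p([z0 + h]) - 2 * neg_log_p([z0]) + neg_log_p([z0 - h])) / h**2
print("Gaussian approximation: mean =", z0, ", variance =", 1.0 / A)
```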

9 Evidence Approximation Using the Laplace approximation, the log model evidence is ln p(D) ≈ ln p(D|θ_MAP) + ln p(θ_MAP) + (M/2) ln 2π − ½ ln|A|, where M is the number of parameters.

10 Bayesian Information Criterion A further approximation of the Laplace evidence: ln p(D) ≈ ln p(D|θ_MAP) − (M/2) ln N. BIC is crude, so a more accurate evidence approximation is often needed.
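A direct transcription of the BIC formula; the log-likelihood values below are invented for illustration:

```python
import numpy as np

def bic_log_evidence(loglik_at_map, num_params, num_data):
    # ln p(D) ~= ln p(D | theta_MAP) - (M / 2) ln N
    return loglik_at_map - 0.5 * num_params * np.log(num_data)

# Hypothetical comparison of two models fit to the same N = 200 points:
print(bic_log_evidence(-120.3, num_params=5, num_data=200))
print(bic_log_evidence(-118.9, num_params=12, num_data=200))
```

The second model has a higher likelihood but pays a larger complexity penalty.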

11 Bayesian Logistic Regression

12 Kernel Methods Predictions are linear combinations of a kernel function evaluated at the training data points. Kernel function: k(x, x′) = φ(x)ᵀφ(x′), where φ(x) is a feature-space mapping. Linear kernel: k(x, x′) = xᵀx′. Stationary kernels: k(x, x′) = k(x − x′).

13 Fast Evaluation of Inner Products of Feature Mappings by Kernel Functions For 2-D inputs with φ(x) = (x1², √2 x1x2, x2²), computing the inner product φ(x)ᵀφ(z) directly requires evaluating six feature values and 3 × 3 = 9 multiplications, whereas the equivalent kernel function k(x, z) = (xᵀz)² needs only two multiplications and a squaring.
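A quick numerical check that the kernel function reproduces the feature-space inner product, assuming the standard 2-D quadratic-kernel example above:

```python
import numpy as np

def phi(x):
    # Explicit feature map for the quadratic kernel in 2-D.
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def k(x, z):
    # Equivalent kernel: two multiplications and a squaring.
    return (x @ z)**2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
print(phi(x) @ phi(z))  # 121.0, via the explicit feature mapping
print(k(x, z))          # 121.0, same value at lower cost
```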

14 Kernel Trick 1. Reformulate an algorithm so that the input vector x enters only in the form of inner products. 2. Replace each input x by its feature mapping φ(x). 3. Replace each inner product by a kernel function: φ(x)ᵀφ(x′) → k(x, x′). Examples: kernel PCA, kernel Fisher discriminant, support vector machines.

15 Dual Representation for Ridge Regression Dual variables: a_n = −(1/λ)(wᵀφ(x_n) − t_n), so that w = Φᵀa = Σ_n a_n φ(x_n).

16 Kernel Ridge Regression Using the kernel trick, with Gram matrix K = ΦΦᵀ, minimizing over the dual variables gives a = (K + λI_N)⁻¹ t and predictions y(x) = k(x)ᵀ(K + λI_N)⁻¹ t, where k(x) has elements k(x_n, x).
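A minimal kernel ridge regression sketch following the formulas above, with a Gaussian kernel and invented 1-D data:

```python
import numpy as np

def gauss_kernel(X, Z, sigma=1.0):
    # K[n, m] = exp(-||x_n - z_m||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))            # hypothetical inputs
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 0.1                                           # regularizer lambda
K = gauss_kernel(X, X)
a = np.linalg.solve(K + lam * np.eye(len(X)), t)    # dual variables a

# Prediction y(x) = k(x)^T (K + lambda I)^{-1} t = sum_n a_n k(x_n, x).
X_new = np.array([[0.5]])
print(gauss_kernel(X_new, X) @ a)
```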

17 Generating the Kernel Matrix The kernel matrix must be positive semidefinite. Consider the Gaussian kernel: k(x, x′) = exp(−‖x − x′‖² / (2σ²)).
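A numerical sanity check (not a proof) that a Gaussian kernel matrix on random points is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))                     # random points
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0)                                # Gaussian kernel, sigma = 1

eigvals = np.linalg.eigvalsh(K)                      # K is symmetric
print(eigvals.min() >= -1e-10)                       # True: eigenvalues >= 0
```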

18 Combining Generative & Discriminative Models by Kernels Since each modeling approach has distinct advantages, how can we combine them? Use generative models to construct kernels, then use these kernels in discriminative approaches.

19 Measuring Probability Similarity by Kernels Simple inner product: k(x, x′) = p(x) p(x′). For a mixture distribution: k(x, x′) = Σ_i p(x|i) p(x′|i) p(i). For infinite mixture models: k(x, x′) = ∫ p(x|z) p(x′|z) p(z) dz. For models with latent variables (e.g., hidden Markov models): k(X, X′) = Σ_Z p(X|Z) p(X′|Z) p(Z), summing over sequences of hidden states Z.
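A sketch of the mixture-distribution kernel for a hypothetical two-component 1-D Gaussian mixture (all parameters invented):

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.3, 0.7])    # mixing coefficients p(i)
means = np.array([-1.0, 2.0])
stds = np.array([0.5, 1.0])

def mixture_kernel(x, xp):
    # k(x, x') = sum_i p(x | i) p(x' | i) p(i)
    px = norm.pdf(x, means, stds)    # p(x | i) for each component i
    pxp = norm.pdf(xp, means, stds)
    return np.sum(px * pxp * weights)

print(mixture_kernel(0.0, 0.5))
```

Two points are similar under this kernel when the same components assign both of them high density.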

20 Fisher Kernels Fisher score: g(θ, x) = ∇_θ ln p(x|θ). Fisher information matrix: F = E_x[g(θ, x) g(θ, x)ᵀ]. Fisher kernel: k(x, x′) = g(θ, x)ᵀ F⁻¹ g(θ, x′). Sample average: F ≈ (1/N) Σ_n g(θ, x_n) g(θ, x_n)ᵀ.
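A Fisher kernel sketch for the simplest case, a 1-D Gaussian p(x | μ) = N(x | μ, 1) with score g(μ, x) = x − μ; the data are random stand-ins:

```python
import numpy as np

mu = 0.0
rng = np.random.default_rng(2)
data = rng.standard_normal(100) + mu     # sample for the sample average

def score(x):
    # Fisher score: d/dmu ln N(x | mu, 1) = x - mu (a 1-D vector here).
    return np.atleast_1d(x - mu)

# Fisher information by the sample average F = (1/N) sum_n g_n g_n^T.
G = np.stack([score(x) for x in data])
F = (G.T @ G) / len(data)

def fisher_kernel(x, xp):
    # k(x, x') = g(x)^T F^{-1} g(x')
    return score(x) @ np.linalg.solve(F, score(xp))

print(fisher_kernel(0.5, -0.2))
```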

21 Principal Component Analysis (PCA) Assume the data {x_n} has zero mean, with covariance S = (1/N) Σ_n x_n x_nᵀ. We have S u_i = λ_i u_i, where u_i is a normalized eigenvector: u_iᵀ u_i = 1.

22 Feature Mapping Mapping each point to φ(x_n) gives the feature-space covariance C = (1/N) Σ_n φ(x_n) φ(x_n)ᵀ and the eigenproblem in feature space: C v_i = λ_i v_i.

23 Dual Variables Suppose each eigenvector is expanded in the mapped data points: v_i = Σ_n a_{ni} φ(x_n), with dual variables a_{ni}.

24 Eigen-problem in Feature Space (1) Substituting the expansion into C v_i = λ_i v_i and left-multiplying by φ(x_m)ᵀ gives the dual eigenproblem K a_i = λ_i N a_i.

25 Eigen-problem in Feature Space (2) Normalization condition: λ_i N a_iᵀ a_i = 1. Projection coefficient: y_i(x) = φ(x)ᵀ v_i = Σ_n a_{ni} k(x, x_n).

26 General Case: Non-zero Mean Centered kernel matrix: K̃ = K − 1_N K − K 1_N + 1_N K 1_N, where 1_N denotes the N × N matrix with every element 1/N.
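Putting the pieces of slides 21-26 together, a compact kernel PCA sketch with the centering correction above (Gaussian kernel, invented 2-D data):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 2))                     # hypothetical data
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0)                                # Gaussian kernel

# Centered kernel matrix K~ = K - 1N K - K 1N + 1N K 1N,
# where 1N is the N x N matrix with every element 1/N.
N = len(X)
one_N = np.full((N, N), 1.0 / N)
K_tilde = K - one_N @ K - K @ one_N + one_N @ K @ one_N

# Dual eigenproblem K~ a_i = lambda_i N a_i (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(K_tilde)
lam = eigvals[::-1] / N                              # lambda_i, descending
A = eigvecs[:, ::-1]

# Enforce the normalization lambda_i N a_i^T a_i = 1, then compute the
# projection coefficients y_i(x_n) = sum_m a_mi k~(x_n, x_m).
A = A / np.sqrt(np.maximum(lam * N, 1e-12))
Y = K_tilde @ A[:, :2]                               # first two components
print(Y.shape)                                       # (50, 2)
```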

27 Kernel PCA on Synthetic Data Contour plots of projection coefficients in feature space

28 Limitations of Kernel PCA Discussion: the kernel matrix is N × N, so computation scales with the number of data points rather than the input dimension, and the projections live in feature space, so points cannot be directly reconstructed in the input space.

