CS Statistical Machine Learning, Lecture 12. Yuan (Alan) Qi, Purdue CS. Oct
Outline Review of Probit regression, Laplace approximation, BIC, Bayesian logistic regression Kernel methods Kernel ridge regression Kernel Principal Component Analysis
Probit Regression Probit function:
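The probit function is the standard normal CDF, Φ(a) = ∫ from −∞ to a of N(θ|0,1) dθ, used as the activation in probit regression. A minimal sketch using SciPy; the weights and input below are illustrative, not from the lecture:

```python
import numpy as np
from scipy.stats import norm

def probit(a):
    """Probit activation: the standard normal CDF, Phi(a)."""
    return norm.cdf(a)

# Probit regression prediction p(t = 1 | x) = Phi(w^T x); w and x are illustrative
w = np.array([0.5, -1.0])
x = np.array([1.0, 0.2])
p = probit(w @ x)
```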
Labeling Noise Model Robust to outliers and labeling errors
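One common way to encode a labeling-noise model is to assume each target is flipped with small probability ε, which bounds the likelihood away from zero so outliers and mislabeled points cannot dominate. A sketch; the logistic base model and ε = 0.05 are assumed here for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def noisy_label_likelihood(a, eps=0.05):
    """Labeling-noise model: each target is flipped with probability eps, so
    p(t = 1 | a) = eps + (1 - 2 eps) * sigmoid(a).
    The likelihood is bounded in [eps, 1 - eps], so a single mislabeled or
    outlying point cannot drive the log likelihood to -infinity."""
    return eps + (1.0 - 2.0 * eps) * sigmoid(a)
```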
Generalized Linear Models Generalized linear model: Activation function: Link function:
Canonical Link Function If we choose the canonical link function: Gradient of the error function:
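With the canonical link, the gradient of the error function takes the simple form ∇E(w) = Σ_n (y_n − t_n) φ_n. A sketch for the logistic-sigmoid case; the design matrix Phi and targets t are assumed inputs:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def error_gradient(w, Phi, t):
    """Gradient of the error for a GLM with canonical link (logistic sigmoid here):
    grad E(w) = sum_n (y_n - t_n) phi_n = Phi^T (y - t)."""
    y = sigmoid(Phi @ w)
    return Phi.T @ (y - t)

# Single-point check: Phi = [[1]], t = 1, w = 0 gives y = 0.5 and gradient -0.5
g = error_gradient(np.array([0.0]), np.array([[1.0]]), np.array([1.0]))
```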
Examples
Laplace Approximation for Posterior Gaussian approximation around mode:
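The Laplace approximation fits a Gaussian at the mode of an unnormalized density, with precision equal to the negative second derivative of the log density at the mode. A 1-D sketch on a hypothetical Gamma-shaped density, with the mode and curvature computed explicitly:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Unnormalized log-density of a hypothetical 1-D example (Gamma-shaped, z > 0)
def log_f(z):
    return 3.0 * np.log(z) - z

# Step 1: find the mode z0 by maximizing log f (analytic answer: z0 = 3)
res = minimize_scalar(lambda z: -log_f(z), bounds=(1e-6, 50.0), method="bounded")
z0 = res.x

# Step 2: curvature A = -(d^2/dz^2) log f at the mode; here A = 3 / z0^2
A = 3.0 / z0 ** 2

# Gaussian (Laplace) approximation around the mode: q(z) = N(z | z0, 1/A)
mean, var = z0, 1.0 / A
```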
Evidence Approximation
Bayesian Information Criterion A cruder approximation that follows from the Laplace approximation; a more accurate evidence approximation is often needed
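BIC approximates the log evidence by the log likelihood at the fitted parameters minus (M/2) ln N, where M is the number of parameters and N the number of data points. A sketch; the two model scores below are hypothetical numbers:

```python
import numpy as np

def bic_score(log_lik_at_mode, n_params, n_data):
    """BIC approximation to the log evidence:
    ln p(D) ~ ln p(D | theta_MAP) - (M / 2) ln N."""
    return log_lik_at_mode - 0.5 * n_params * np.log(n_data)

# Hypothetical comparison of two models fit to N = 100 points
score_simple = bic_score(-120.0, n_params=3, n_data=100)
score_complex = bic_score(-115.0, n_params=20, n_data=100)  # penalized more heavily
```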
Bayesian Logistic Regression
Kernel Methods Predictions are linear combinations of a kernel function evaluated at the training data points. Kernel function defined via a feature-space mapping: k(x, x') = φ(x)ᵀφ(x'). Linear kernel: k(x, x') = xᵀx'. Stationary kernels: k(x, x') = k(x − x')
Fast Evaluation of Inner Products of Feature Mappings by Kernel Functions Computing the inner product explicitly requires six feature values and 3 × 3 = 9 multiplications; evaluating the kernel function needs only 2 multiplications and a squaring
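The slide's point can be checked numerically for the 2-D quadratic kernel k(x, z) = (xᵀz)², whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    """Direct kernel evaluation: two multiplications and a squaring."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# phi(x) @ phi(z) and k(x, z) agree; the kernel route never builds the features
```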
Kernel Trick 1. Reformulate an algorithm so that the input vector enters only in the form of inner products. 2. Replace the input x by its feature mapping: 3. Replace the inner product by a kernel function: Examples: Kernel PCA, Kernel Fisher discriminant, Support Vector Machines
Dual Representation for Ridge Regression Dual variables:
Kernel Ridge Regression Using kernel trick: Minimize over dual variables:
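A sketch of kernel ridge regression with a Gaussian kernel; the toy 1-D sine data are invented for illustration. The dual variables solve a = (K + λI)⁻¹ t, and predictions y(x) = k(x)ᵀa are linear combinations of kernels at the training points:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, t, lam=0.1, sigma=1.0):
    """Dual solution: a = (K + lam I)^{-1} t."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def kernel_ridge_predict(X_train, a, X_new, sigma=1.0):
    """Prediction y(x) = k(x)^T a: a linear combination of kernels at training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ a

# Toy 1-D regression on noisy sine data (invented for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
a = kernel_ridge_fit(X, t)
y = kernel_ridge_predict(X, a, X)
```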
Generating the Kernel Matrix The kernel (Gram) matrix must be positive semidefinite. Consider the Gaussian kernel:
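A quick numerical check that a Gaussian-kernel Gram matrix is symmetric positive semidefinite; the random inputs are illustrative:

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix for the Gaussian (RBF) kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# A valid kernel must give a symmetric PSD Gram matrix for any set of inputs
X = np.random.default_rng(1).standard_normal((20, 3))
K = gaussian_gram(X)
eigvals = np.linalg.eigvalsh(K)   # all eigenvalues >= 0 up to round-off
```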
Combining Generative & Discriminative Models by Kernels Since each modeling approach has distinct advantages, how can they be combined? Use generative models to construct kernels, then use these kernels in discriminative approaches
Measure Probability Similarity by Kernels Simple inner product: For mixture distributions: For infinite mixture models: For models with latent variables (e.g., hidden Markov models):
Fisher Kernels Fisher score: Fisher information matrix: Fisher kernel: Sample average:
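A sketch of the Fisher kernel for a deliberately simple generative model, a univariate Gaussian N(x | μ, 1) with free parameter μ; the Fisher information is estimated by a sample average, as on the slide:

```python
import numpy as np

# Fisher kernel sketch for a univariate Gaussian N(x | mu, 1) with free parameter mu
def fisher_score(x, mu):
    """Fisher score g(x) = d/dmu ln N(x | mu, 1) = x - mu."""
    return x - mu

def fisher_kernel(x1, x2, mu, samples):
    """k(x, x') = g(x)^T F^{-1} g(x'), with the Fisher information F
    estimated by the sample average F ~ (1/N) sum_n g(x_n) g(x_n)^T."""
    g = fisher_score(samples, mu)
    F = np.mean(g * g)              # scalar Fisher information estimate (true value is 1)
    return fisher_score(x1, mu) * fisher_score(x2, mu) / F

samples = np.random.default_rng(3).standard_normal(10_000)  # synthetic data, mu = 0
k12 = fisher_kernel(1.0, 2.0, mu=0.0, samples=samples)      # ~ (1 - 0)(2 - 0) / 1 = 2
```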
Principal Component Analysis (PCA) Assume the data have zero mean; u_i below is a normalized eigenvector of the sample covariance:
Feature Mapping Eigen-problem in feature space
Dual Variables Suppose we have
Eigen-problem in Feature Space (1)
Eigen-problem in Feature Space (2) Normalization condition: Projection coefficient:
General Case: Non-zero-Mean Data Centered kernel matrix:
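The centering can be written as K̃ = K − 1_N K − K 1_N + 1_N K 1_N, where 1_N is the N×N matrix with every entry 1/N. A sketch of kernel PCA with this centering, assuming a Gaussian kernel; eigenvectors are rescaled so the corresponding feature-space eigenvectors have unit norm:

```python
import numpy as np

def center_gram(K):
    """Centered Gram matrix for non-zero-mean data:
    K~ = K - 1N K - K 1N + 1N K 1N, with 1N the N x N matrix of entries 1/N."""
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    return K - one @ K - K @ one + one @ K @ one

def kernel_pca(X, n_components=2, sigma=1.0):
    """Kernel PCA with a Gaussian kernel: eigendecompose the centered Gram
    matrix and rescale eigenvectors so feature-space eigenvectors have unit norm."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Kc = center_gram(np.exp(-d2 / (2.0 * sigma ** 2)))
    lam, A = np.linalg.eigh(Kc)                 # ascending order
    lam, A = lam[::-1], A[:, ::-1]              # largest eigenvalues first
    A = A[:, :n_components] / np.sqrt(lam[:n_components])
    return Kc @ A                               # projection coefficients

X = np.random.default_rng(2).standard_normal((15, 2))
Z = kernel_pca(X)
```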
Kernel PCA on Synthetic Data Contour plots of projection coefficients in feature space
Limitations of Kernel PCA Discussion…