CS Statistical Machine Learning, Lecture 12. Yuan (Alan) Qi, Purdue CS. Oct
Outline Review of Probit regression, Laplace approximation, BIC, Bayesian logistic regression Kernel methods Kernel ridge regression Kernel Principal Component Analysis
Probit Regression Probit function:
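The probit function is the standard normal CDF, Φ(a) = ∫ from −∞ to a of N(θ|0,1) dθ, used as the activation in probit regression. A minimal sketch using SciPy; the weights and input below are illustrative, not from the lecture:

```python
import numpy as np
from scipy.stats import norm

def probit(a):
    """Probit activation: the standard normal CDF, Phi(a)."""
    return norm.cdf(a)

# Probit regression prediction p(t = 1 | x) = Phi(w^T x); w and x are illustrative
w = np.array([0.5, -1.0])
x = np.array([1.0, 0.2])
p = probit(w @ x)
```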
Labeling Noise Model Robust to outliers and labeling errors
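One common way to encode a labeling-noise model is to assume each target is flipped with small probability ε, which bounds the likelihood away from zero so outliers and mislabeled points cannot dominate. A sketch; the logistic base model and ε = 0.05 are assumed here for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def noisy_label_likelihood(a, eps=0.05):
    """Labeling-noise model: each target is flipped with probability eps, so
    p(t = 1 | a) = eps + (1 - 2 eps) * sigmoid(a).
    The likelihood is bounded in [eps, 1 - eps], so a single mislabeled or
    outlying point cannot drive the log likelihood to -infinity."""
    return eps + (1.0 - 2.0 * eps) * sigmoid(a)
```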
Generalized Linear Models Generalized linear model: Activation function: Link function:
Canonical Link Function If we choose the canonical link function: Gradient of the error function:
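With the canonical link, the gradient of the error function takes the simple form ∇E(w) = Σ_n (y_n − t_n) φ_n. A sketch for the logistic-sigmoid case; the design matrix Phi and targets t are assumed inputs:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def error_gradient(w, Phi, t):
    """Gradient of the error for a GLM with canonical link (logistic sigmoid here):
    grad E(w) = sum_n (y_n - t_n) phi_n = Phi^T (y - t)."""
    y = sigmoid(Phi @ w)
    return Phi.T @ (y - t)

# Single-point check: Phi = [[1]], t = 1, w = 0 gives y = 0.5 and gradient -0.5
g = error_gradient(np.array([0.0]), np.array([[1.0]]), np.array([1.0]))
```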
Examples
Laplace Approximation for Posterior Gaussian approximation around mode:
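The Laplace approximation fits a Gaussian at the mode of an unnormalized density, with precision equal to the negative second derivative of the log density at the mode. A 1-D sketch on a hypothetical Gamma-shaped density, with the mode and curvature computed explicitly:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Unnormalized log-density of a hypothetical 1-D example (Gamma-shaped, z > 0)
def log_f(z):
    return 3.0 * np.log(z) - z

# Step 1: find the mode z0 by maximizing log f (analytic answer: z0 = 3)
res = minimize_scalar(lambda z: -log_f(z), bounds=(1e-6, 50.0), method="bounded")
z0 = res.x

# Step 2: curvature A = -(d^2/dz^2) log f at the mode; here A = 3 / z0^2
A = 3.0 / z0 ** 2

# Gaussian (Laplace) approximation around the mode: q(z) = N(z | z0, 1/A)
mean, var = z0, 1.0 / A
```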
Evidence Approximation
Bayesian Information Criterion A cruder approximation that follows from the Laplace approximation; a more accurate evidence approximation is often needed
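BIC approximates the log evidence by the log likelihood at the fitted parameters minus (M/2) ln N, where M is the number of parameters and N the number of data points. A sketch; the two model scores below are hypothetical numbers:

```python
import numpy as np

def bic_score(log_lik_at_mode, n_params, n_data):
    """BIC approximation to the log evidence:
    ln p(D) ~ ln p(D | theta_MAP) - (M / 2) ln N."""
    return log_lik_at_mode - 0.5 * n_params * np.log(n_data)

# Hypothetical comparison of two models fit to N = 100 points
score_simple = bic_score(-120.0, n_params=3, n_data=100)
score_complex = bic_score(-115.0, n_params=20, n_data=100)  # penalized more heavily
```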
Bayesian Logistic Regression
Kernel Methods Predictions are linear combinations of a kernel function evaluated at the training data points. Kernel function defined via a feature-space mapping: k(x, x') = φ(x)ᵀφ(x'). Linear kernel: k(x, x') = xᵀx'. Stationary kernels: k(x, x') = k(x − x')
Fast Evaluation of Inner Products of Feature Mappings by Kernel Functions Computing the inner product explicitly requires six feature values and 3 × 3 = 9 multiplications; evaluating the kernel function needs only 2 multiplications and a squaring
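The slide's point can be checked numerically for the 2-D quadratic kernel k(x, z) = (xᵀz)², whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    """Direct kernel evaluation: two multiplications and a squaring."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# phi(x) @ phi(z) and k(x, z) agree; the kernel route never builds the features
```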
Kernel Trick 1. Reformulate an algorithm so that the input vector enters only in the form of inner products. 2. Replace the input x by its feature mapping: 3. Replace the inner product by a kernel function: Examples: Kernel PCA, Kernel Fisher discriminant, Support Vector Machines
Dual Representation for Ridge Regression Dual variables:
Kernel Ridge Regression Using kernel trick: Minimize over dual variables:
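A sketch of kernel ridge regression with a Gaussian kernel; the toy 1-D sine data are invented for illustration. The dual variables solve a = (K + λI)⁻¹ t, and predictions y(x) = k(x)ᵀa are linear combinations of kernels at the training points:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, t, lam=0.1, sigma=1.0):
    """Dual solution: a = (K + lam I)^{-1} t."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def kernel_ridge_predict(X_train, a, X_new, sigma=1.0):
    """Prediction y(x) = k(x)^T a: a linear combination of kernels at training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ a

# Toy 1-D regression on noisy sine data (invented for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
a = kernel_ridge_fit(X, t)
y = kernel_ridge_predict(X, a, X)
```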
Generating the Kernel Matrix The kernel (Gram) matrix must be positive semidefinite. Consider the Gaussian kernel:
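A quick numerical check that a Gaussian-kernel Gram matrix is symmetric positive semidefinite; the random inputs are illustrative:

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix for the Gaussian (RBF) kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# A valid kernel must give a symmetric PSD Gram matrix for any set of inputs
X = np.random.default_rng(1).standard_normal((20, 3))
K = gaussian_gram(X)
eigvals = np.linalg.eigvalsh(K)   # all eigenvalues >= 0 up to round-off
```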
Combining Generative & Discriminative Models by Kernels Since each modeling approach has distinct advantages, how can they be combined? Use generative models to construct kernels, then use these kernels in discriminative approaches
Measure Probability Similarity by Kernels Simple inner product: For mixture distributions: For infinite mixture models: For models with latent variables (e.g., hidden Markov models):
Fisher Kernels Fisher score: Fisher information matrix: Fisher kernel: Sample average:
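A sketch of the Fisher kernel for a deliberately simple generative model, a univariate Gaussian N(x | μ, 1) with free parameter μ; the Fisher information is estimated by a sample average, as on the slide:

```python
import numpy as np

# Fisher kernel sketch for a univariate Gaussian N(x | mu, 1) with free parameter mu
def fisher_score(x, mu):
    """Fisher score g(x) = d/dmu ln N(x | mu, 1) = x - mu."""
    return x - mu

def fisher_kernel(x1, x2, mu, samples):
    """k(x, x') = g(x)^T F^{-1} g(x'), with the Fisher information F
    estimated by the sample average F ~ (1/N) sum_n g(x_n) g(x_n)^T."""
    g = fisher_score(samples, mu)
    F = np.mean(g * g)              # scalar Fisher information estimate (true value is 1)
    return fisher_score(x1, mu) * fisher_score(x2, mu) / F

samples = np.random.default_rng(3).standard_normal(10_000)  # synthetic data, mu = 0
k12 = fisher_kernel(1.0, 2.0, mu=0.0, samples=samples)      # ~ (1 - 0)(2 - 0) / 1 = 2
```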
Principal Component Analysis (PCA) Assume the data have zero mean; u_i below is a normalized eigenvector of the sample covariance:
Feature Mapping Eigen-problem in feature space
Dual Variables Suppose we have
Eigen-problem in Feature Space (1)
Eigen-problem in Feature Space (2) Normalization condition: Projection coefficient:
General Case: Non-zero-Mean Data Centered kernel matrix:
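The centering can be written as K̃ = K − 1_N K − K 1_N + 1_N K 1_N, where 1_N is the N×N matrix with every entry 1/N. A sketch of kernel PCA with this centering, assuming a Gaussian kernel; eigenvectors are rescaled so the corresponding feature-space eigenvectors have unit norm:

```python
import numpy as np

def center_gram(K):
    """Centered Gram matrix for non-zero-mean data:
    K~ = K - 1N K - K 1N + 1N K 1N, with 1N the N x N matrix of entries 1/N."""
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    return K - one @ K - K @ one + one @ K @ one

def kernel_pca(X, n_components=2, sigma=1.0):
    """Kernel PCA with a Gaussian kernel: eigendecompose the centered Gram
    matrix and rescale eigenvectors so feature-space eigenvectors have unit norm."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Kc = center_gram(np.exp(-d2 / (2.0 * sigma ** 2)))
    lam, A = np.linalg.eigh(Kc)                 # ascending order
    lam, A = lam[::-1], A[:, ::-1]              # largest eigenvalues first
    A = A[:, :n_components] / np.sqrt(lam[:n_components])
    return Kc @ A                               # projection coefficients

X = np.random.default_rng(2).standard_normal((15, 2))
Z = kernel_pca(X)
```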
Kernel PCA on Synthetic Data Contour plots of projection coefficients in feature space
Limitations of Kernel PCA Discussion…