Lecture Slides for INTRODUCTION TO Machine Learning 2e, ETHEM ALPAYDIN © The MIT Press, 2010
CHAPTER 5: Multivariate Methods

Multivariate Data
Multiple measurements (sensors): d inputs/features/attributes, so the data are d-variate.
N instances/observations/examples.
Data matrix: X, with N rows (instances) and d columns (attributes).
Typically these variables are correlated.
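The data matrix itself appeared as an equation image on the original slide; a reconstruction in the book's usual notation, where row t is observation x^t, is:

$$
\mathbf{X} = \begin{bmatrix}
x_1^1 & x_2^1 & \cdots & x_d^1 \\
x_1^2 & x_2^2 & \cdots & x_d^2 \\
\vdots &       &        & \vdots \\
x_1^N & x_2^N & \cdots & x_d^N
\end{bmatrix}
$$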

Multivariate Parameters
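The parameter definitions on this slide were equation images; the standard population quantities, written in the book's notation, are:

$$E[\boldsymbol{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$$

$$\sigma_{ij} \equiv \operatorname{Cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)]$$

$$\boldsymbol{\Sigma} \equiv \operatorname{Cov}(\boldsymbol{x}) = E[(\boldsymbol{x} - \boldsymbol{\mu})(\boldsymbol{x} - \boldsymbol{\mu})^T]$$

$$\rho_{ij} \equiv \frac{\sigma_{ij}}{\sigma_i \sigma_j} \quad \text{(correlation)}$$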

Parameter Estimation
Given a multivariate sample, estimates for these parameters can be calculated.
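The estimators were shown as equation images on the slide; the usual sample versions (m for μ, S for Σ, and the sample correlations) are:

$$\boldsymbol{m} = \frac{1}{N}\sum_{t=1}^{N} \boldsymbol{x}^t, \qquad
s_{ij} = \frac{1}{N}\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j), \qquad
r_{ij} = \frac{s_{ij}}{s_i s_j}$$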

Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small.
Use 'missing' as an attribute value: it may carry information.
Imputation: fill in the missing value.
  Mean imputation: use the most likely value (e.g., the attribute mean).
  Imputation by regression: predict the missing value from the other attributes.
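A minimal numpy sketch of mean imputation; the toy data below are made up for illustration, and missing entries are marked with np.nan:

```python
import numpy as np

# Toy data matrix (N = 4 instances, d = 3 attributes); np.nan marks missing entries.
X = np.array([[1.0, 2.0, np.nan],
              [2.0, np.nan, 3.0],
              [3.0, 4.0, 5.0],
              [np.nan, 6.0, 7.0]])

# Mean imputation: replace each missing value with the mean of its attribute
# (column), computed over the observed entries only.
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)
print(X_imputed)
```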

Multivariate Normal Distribution
Parameters: the mean vector μ and the covariance matrix Σ.
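The density itself was an equation image on the slide; for x ~ N_d(μ, Σ) it is:

$$p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x}-\boldsymbol{\mu})\right]$$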

Multivariate Normal Distribution
Mahalanobis distance: (x − μ)^T Σ⁻¹ (x − μ) measures the distance from x to μ in terms of Σ.
(x − μ)^T Σ⁻¹ (x − μ) = c² defines a d-dimensional hyperellipsoid centered at μ; its shape and orientation are determined by Σ.
The Mahalanobis distance is a useful way of measuring how similar an unknown sample is to a known distribution. It differs from the Euclidean distance in that it takes the correlations of the data into account and is scale-invariant.
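A small numpy illustration of the Mahalanobis distance; the mean, covariance, and query point below are made-up values:

```python
import numpy as np

# Made-up mean vector, covariance matrix, and query point for illustration.
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, 2.0])

# Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu),
# computed via a linear solve instead of an explicit matrix inverse.
diff = x - mu
d2 = diff @ np.linalg.solve(Sigma, diff)
print(np.sqrt(d2))  # Mahalanobis distance from x to mu
```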

Multivariate Normal Distribution
Mahalanobis distance: (x − μ)^T Σ⁻¹ (x − μ).
Example: bivariate case (d = 2).
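The bivariate density on this slide was an image; with correlation ρ = σ₁₂ / (σ₁ σ₂) it takes the standard form:

$$p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}}
\exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right],
\qquad z_i = \frac{x_i - \mu_i}{\sigma_i}$$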

Bivariate Normal
If X and Y are independent, then Cov(X, Y) = 0. The converse does not hold in general: Cov(X, Y) = 0 does not imply that X and Y are independent (for example, if X is standard normal and Y = X², then Cov(X, Y) = 0 even though Y is a function of X). For jointly Gaussian variables, however, zero covariance does imply independence.


Independent Inputs: Naive Bayes
If the x_i are independent, the off-diagonal entries of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σ_i) Euclidean distance.
If the variances are also equal, it reduces to the Euclidean distance.
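The reduced density was an equation image on the slide; with a diagonal Σ it factors into univariate Gaussians:

$$p(\boldsymbol{x}) = \prod_{i=1}^{d} p_i(x_i)
= \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i}
\exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2\right]$$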

Parametric Classification
If the class-conditional densities are taken to be normal, p(x | C_i) ~ N(μ_i, Σ_i), we have a multivariate Gaussian likelihood for each class, and the discriminant functions are obtained from the log likelihood and the log prior (see below).
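The Gaussian likelihood and the discriminant functions on this slide were equation images; reconstructed in the book's notation, they are:

$$p(\boldsymbol{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}}
\exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_i)\right]$$

$$g_i(\boldsymbol{x}) = \log p(\boldsymbol{x} \mid C_i) + \log P(C_i)
= -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i|
- \frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_i) + \log P(C_i)$$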

Estimation of Parameters
Given a training sample for K ≥ 2 classes, X = {x^t, r^t}, where r_i^t = 1 if x^t ∈ C_i and 0 otherwise, the priors, means, and covariance matrices are estimated per class. (Compare with the corresponding equation on p. 71.)
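The estimates themselves appeared as images; the usual maximum likelihood forms are:

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad
\boldsymbol{m}_i = \frac{\sum_t r_i^t \boldsymbol{x}^t}{\sum_t r_i^t}, \qquad
\mathbf{S}_i = \frac{\sum_t r_i^t (\boldsymbol{x}^t - \boldsymbol{m}_i)(\boldsymbol{x}^t - \boldsymbol{m}_i)^T}{\sum_t r_i^t}$$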

Different S_i
Quadratic discriminant (Fig. 5.3): each class keeps its own covariance estimate S_i.
This quadratic discriminant can also be written in the form g_i(x) = x^T W_i x + w_i^T x + w_i0.
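The discriminant and the definitions of W_i, w_i, and w_i0 were equation images; in the book's notation they are:

$$g_i(\boldsymbol{x}) = -\frac{1}{2}\log|\mathbf{S}_i|
- \frac{1}{2}(\boldsymbol{x}-\boldsymbol{m}_i)^T \mathbf{S}_i^{-1}(\boldsymbol{x}-\boldsymbol{m}_i) + \log \hat{P}(C_i)$$

$$g_i(\boldsymbol{x}) = \boldsymbol{x}^T \mathbf{W}_i \boldsymbol{x} + \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}$$

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad
\boldsymbol{w}_i = \mathbf{S}_i^{-1}\boldsymbol{m}_i, \qquad
w_{i0} = -\frac{1}{2}\boldsymbol{m}_i^T \mathbf{S}_i^{-1}\boldsymbol{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)$$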

[Figure 5.3: the class likelihoods p(x | C_1) and p(x | C_2), the posterior P(C_1 | x), and the quadratic discriminant where P(C_1 | x) = 0.5.]

Common Covariance Matrix S
If the classes are assumed to share a covariance matrix, pool the per-class estimates into a shared common sample covariance S.
The discriminant then reduces to a linear discriminant.
Note: the quadratic term x^T S⁻¹ x cancels since it is common to all discriminants.
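The pooled covariance and the resulting linear discriminant were equation images; reconstructed, they read:

$$\mathbf{S} = \sum_{i=1}^{K} \hat{P}(C_i)\,\mathbf{S}_i$$

$$g_i(\boldsymbol{x}) = -\frac{1}{2}(\boldsymbol{x}-\boldsymbol{m}_i)^T \mathbf{S}^{-1}(\boldsymbol{x}-\boldsymbol{m}_i) + \log \hat{P}(C_i)
= \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}$$

$$\boldsymbol{w}_i = \mathbf{S}^{-1}\boldsymbol{m}_i, \qquad
w_{i0} = -\frac{1}{2}\boldsymbol{m}_i^T \mathbf{S}^{-1}\boldsymbol{m}_i + \log \hat{P}(C_i)$$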

[Figure: with a common covariance matrix S, the boundary between C_1 and C_2 where P(C_1 | x) = 0.5 is a linear discriminant.]

Diagonal S
When the x_j, j = 1, …, d, are independent (the naive Bayes assumption), Σ is diagonal: all off-diagonal entries of the covariance matrix are zero, and the p(x_j | C_i) are univariate Gaussians.
Classify based on the weighted Euclidean distance (in s_j units) to the nearest mean.
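The discriminant on this slide was an equation image; with a shared diagonal S it becomes:

$$g_i(\boldsymbol{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$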

[Figure: diagonal S, classes C_1 and C_2; the variances may be different across attributes, giving axis-aligned elliptic contours.]

Diagonal S, equal variances
Nearest mean classifier: classify based on the Euclidean distance to the nearest mean.
Each mean can be considered a prototype or template, so this is template matching.
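A minimal nearest-mean classifier sketch in numpy; the toy data are made up for illustration, and equal priors and equal variances are assumed so that only Euclidean distances to the class means matter:

```python
import numpy as np

# Toy training set: two classes (0 and 1) in two dimensions.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
y_train = np.array([0, 0, 1, 1])

# "Training" is just estimating each class mean m_i (the prototype/template).
means = np.array([X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)])

def predict(x):
    # Assign x to the class whose mean is closest in Euclidean distance:
    # template matching.
    d2 = ((means - x) ** 2).sum(axis=1)
    return int(np.argmin(d2))

print(predict(np.array([1.1, 0.9])))  # expected: 0
print(predict(np.array([3.9, 4.1])))  # expected: 1
```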

[Figure: nearest mean classifier for diagonal S with equal variances; classes C_1 and C_2 with circular equal-density contours.]

Model Selection
As we increase complexity (a less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).

Assumption                   | Covariance matrix                 | No. of parameters
Shared, hyperspheric         | S_i = S = s^2 I                   | 1
Shared, axis-aligned         | S_i = S, with s_ij = 0 for i != j | d
Shared, hyperellipsoidal     | S_i = S                           | d(d+1)/2
Different, hyperellipsoidal  | S_i                               | K d(d+1)/2


Discrete Features
Let the x_j (discrete attributes) be binary (Bernoulli), with p_ij the probability that x_j = 1 given class C_i.
If the x_j are independent binary variables, the discriminant is linear.
The parameters p_ij are estimated from the training set.
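The discriminant and the estimator appeared as equation images; reconstructed, they are:

$$p_{ij} \equiv p(x_j = 1 \mid C_i)$$

$$g_i(\boldsymbol{x}) = \sum_{j=1}^{d}\left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$

$$\hat{p}_{ij} = \frac{\sum_t x_j^t\, r_i^t}{\sum_t r_i^t}$$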

Discrete Features
Multinomial (1-of-n_j) features: x_j ∈ {v_1, v_2, ..., v_{n_j}}.
Let p_ijk be the probability that x_j takes the value v_k for an instance of class C_i.
If the x_j are independent, the discriminant is again linear in the indicator variables.
The maximum likelihood estimator for p_ijk is the per-class relative frequency of the value v_k.
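The corresponding formulas were equation images; written out with indicator variables z_jk, they are:

$$p_{ijk} \equiv p(x_j = v_k \mid C_i)$$

$$g_i(\boldsymbol{x}) = \sum_{j}\sum_{k} z_{jk} \log p_{ijk} + \log P(C_i),
\qquad z_{jk} = \begin{cases} 1 & \text{if } x_j = v_k \\ 0 & \text{otherwise} \end{cases}$$

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t\, r_i^t}{\sum_t r_i^t}$$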

Multivariate Regression
Multivariate linear model: the output r^t is modeled as a linear function of the d inputs.
Taking the derivative of the squared error with respect to the parameters w_j, j = 0, …, d, we get the normal equations (see Eq. 5.37, p. 104).
Solution: solve the normal equations for w.
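The model and its solution were equation images; a reconstruction in the usual matrix notation, with X augmented by a leading column of 1s for the bias w_0, is:

$$r^t = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t + \varepsilon^t$$

Minimizing the squared error over w_0, …, w_d leads to the normal equations:

$$\mathbf{X}^T \mathbf{X}\,\boldsymbol{w} = \mathbf{X}^T \boldsymbol{r}
\quad\Rightarrow\quad
\boldsymbol{w} = (\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T \boldsymbol{r}$$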

Multivariate Regression
Multivariate polynomial model: define new higher-order variables
z_1 = x_1, z_2 = x_2, z_3 = x_1^2, z_4 = x_2^2, z_5 = x_1 x_2
and use the linear model in this new z-space (basis functions; kernel trick, SVM: Chapter 13).
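A short numpy sketch of this idea: expand two inputs into a quadratic basis, then fit the linear model in z-space by least squares. The data and target function below are made up for illustration:

```python
import numpy as np

# Toy data (made up): a target that is linear in x1, x2 and the product x1*x2.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))          # N = 50 instances, d = 2 inputs
r = 1 + 2 * X[:, 0] - X[:, 1] + 3 * X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(50)

# Quadratic basis expansion: z = [1, x1, x2, x1^2, x2^2, x1*x2].
Z = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])

# Ordinary least squares in z-space (solves the normal equations Z^T Z w = Z^T r).
w, *_ = np.linalg.lstsq(Z, r, rcond=None)
print(w)  # approximately [1, 2, -1, 0, 0, 3]
```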

Exercise
Let us say that, in two dimensions, we have two classes with exactly the same mean. What types of boundaries can be defined?