Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 3rd Edition. ETHEM ALPAYDIN © The MIT Press, 2014.

CHAPTER 5: MULTIVARIATE METHODS

Multivariate Data
• Multiple measurements (sensors)
• d inputs/features/attributes: d-variate
• N instances/observations/examples
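Stacking the N instances of the d attributes gives the data matrix used throughout the chapter; in the book's notation, the sample is

```latex
X = \{\boldsymbol{x}^t\}_{t=1}^{N}, \qquad
\mathbf{X} =
\begin{bmatrix}
x_1^1 & x_2^1 & \cdots & x_d^1 \\
x_1^2 & x_2^2 & \cdots & x_d^2 \\
\vdots & \vdots & \ddots & \vdots \\
x_1^N & x_2^N & \cdots & x_d^N
\end{bmatrix}
```

where the superscript t indexes instances and the subscript indexes attributes.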

Multivariate Parameters
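The parameters of a d-variate distribution are the mean vector, the covariance matrix, and the correlations; a standard statement in the book's notation:

```latex
\boldsymbol{\mu} \equiv E[\boldsymbol{x}] = [\mu_1, \ldots, \mu_d]^T, \qquad
\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\boldsymbol{x}) = E\!\left[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^T\right]
\\[4pt]
\sigma_{ij} \equiv \mathrm{Cov}(x_i, x_j), \qquad
\rho_{ij} \equiv \frac{\sigma_{ij}}{\sigma_i \sigma_j}
```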

Parameter Estimation
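The maximum likelihood estimators are the sample mean, sample covariance, and sample correlation:

```latex
\boldsymbol{m} = \frac{1}{N}\sum_{t=1}^{N} \boldsymbol{x}^t, \qquad
s_{ij} = \frac{1}{N}\sum_{t=1}^{N} \left(x_i^t - m_i\right)\left(x_j^t - m_j\right), \qquad
r_{ij} = \frac{s_{ij}}{s_i s_j}
```

These are one-liners in NumPy; a minimal sketch (the slides name no library, so NumPy is an assumption):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 3))  # N=100 instances, d=3

m = X.mean(axis=0)                       # sample mean vector m
S = np.cov(X, rowvar=False, bias=True)   # sample covariance S (divides by N)
R = np.corrcoef(X, rowvar=False)         # sample correlation matrix R
```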

Estimation of Missing Values
• What to do if certain instances have missing attributes?
• Ignore those instances: not a good idea if the sample is small
• Use 'missing' as an attribute: may give information
• Imputation: fill in the missing value
  • Mean imputation: substitute the most likely value (e.g., the mean); see the sketch below
  • Imputation by regression: predict the missing value from the other attributes
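A minimal sketch of mean imputation with scikit-learn's SimpleImputer (the library choice is an assumption; the slide names none):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data: 4 instances, 2 attributes, two missing entries (np.nan).
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0],
              [4.0, np.nan]])

# Mean imputation: each missing value is replaced by its column mean.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
print(X_filled)  # column means: (1+7+4)/3 = 4.0 and (2+3+6)/3 ≈ 3.67
```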

Multivariate Normal Distribution
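For x distributed as d-variate normal, the density is

```latex
\boldsymbol{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}):\qquad
p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\,\lvert\boldsymbol{\Sigma}\rvert^{1/2}}
\exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x}-\boldsymbol{\mu})\right]
```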

• Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations)
• Bivariate case: d = 2 (see below)
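In the bivariate case, writing z_i for the standardized variables, the density can be expressed in terms of the correlation ρ:

```latex
p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}}
\exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right],
\qquad z_i = \frac{x_i - \mu_i}{\sigma_i}
```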

Bivariate Normal
[Figure: bivariate normal density and equal-probability contour plots for different covariance values]


Independent Inputs: Naive Bayes
• If the x_i are independent, the off-diagonal entries of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σ_i) Euclidean distance (see below)
• If the variances are also equal, it reduces to the plain Euclidean distance
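With independent inputs the joint density factors into univariate normals, and the log of the product yields the weighted Euclidean distance mentioned above:

```latex
p(\boldsymbol{x}) = \prod_{i=1}^{d} p_i(x_i)
= \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i}
\exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{2}\right]
```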

Parametric Classification
• If p(x | C_i) ~ N(μ_i, Σ_i)
• Discriminant functions (see below)
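With Gaussian class-conditionals, taking the log of p(x | C_i) P(C_i) gives the discriminant:

```latex
p(\boldsymbol{x}\mid C_i) = \frac{1}{(2\pi)^{d/2}\lvert\boldsymbol{\Sigma}_i\rvert^{1/2}}
\exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_i)\right]
\\[4pt]
g_i(\boldsymbol{x}) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log\lvert\boldsymbol{\Sigma}_i\rvert
- \frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\boldsymbol{x}-\boldsymbol{\mu}_i) + \log P(C_i)
```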

Estimation of Parameters
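With r_i^t = 1 if x^t ∈ C_i and 0 otherwise, the estimates of the priors, class means, and class covariances are:

```latex
\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad
\boldsymbol{m}_i = \frac{\sum_t r_i^t\, \boldsymbol{x}^t}{\sum_t r_i^t}, \qquad
\mathbf{S}_i = \frac{\sum_t r_i^t\, (\boldsymbol{x}^t - \boldsymbol{m}_i)(\boldsymbol{x}^t - \boldsymbol{m}_i)^T}{\sum_t r_i^t}
```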

Different S_i
• Quadratic discriminant (see below)
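Expanding the Gaussian discriminant with a class-specific S_i gives a function quadratic in x:

```latex
g_i(\boldsymbol{x}) = \boldsymbol{x}^T \mathbf{W}_i \boldsymbol{x} + \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}
\\[4pt]
\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad
\boldsymbol{w}_i = \mathbf{S}_i^{-1}\boldsymbol{m}_i, \qquad
w_{i0} = -\frac{1}{2}\boldsymbol{m}_i^T \mathbf{S}_i^{-1}\boldsymbol{m}_i - \frac{1}{2}\log\lvert\mathbf{S}_i\rvert + \log \hat{P}(C_i)
```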

[Figure: class likelihoods p(x|C_i), the posterior for C_1, and the discriminant boundary where P(C_1|x) = 0.5]

Common Covariance Matrix S
• Shared common sample covariance S (see below)
• The discriminant reduces to a linear discriminant (see below)
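Pooling the class covariances and dropping terms common to all classes yields a discriminant linear in x:

```latex
\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i, \qquad
g_i(\boldsymbol{x}) = -\frac{1}{2}(\boldsymbol{x}-\boldsymbol{m}_i)^T \mathbf{S}^{-1} (\boldsymbol{x}-\boldsymbol{m}_i) + \log \hat{P}(C_i)
\\[4pt]
g_i(\boldsymbol{x}) = \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}, \qquad
\boldsymbol{w}_i = \mathbf{S}^{-1}\boldsymbol{m}_i, \qquad
w_{i0} = -\frac{1}{2}\boldsymbol{m}_i^T \mathbf{S}^{-1}\boldsymbol{m}_i + \log \hat{P}(C_i)
```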

Common Covariance Matrix S
[Figure: with a shared covariance matrix, all classes have identically shaped and oriented density contours, and the decision boundaries are linear]

Diagonal S
• When the x_j, j = 1, ..., d, are independent, Σ is diagonal: p(x | C_i) = ∏_j p(x_j | C_i) (Naive Bayes' assumption)
• Classify based on weighted Euclidean distance (in s_j units) to the nearest mean (see below)
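Dropping the off-diagonal terms, the discriminant uses a Euclidean distance weighted by the per-attribute standard deviations s_j:

```latex
g_i(\boldsymbol{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - m_{ij}}{s_j}\right)^{2} + \log \hat{P}(C_i)
```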

Diagonal S
[Figure: with diagonal S, the contours are axis-aligned ellipses; the variances may be different]

Diagonal S, equal variances
• Nearest mean classifier: classify based on Euclidean distance to the nearest mean (see below)
• Each mean can be considered a prototype or template, and this is template matching
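With a single shared variance s², the discriminant depends only on the squared Euclidean distance to each class mean:

```latex
g_i(\boldsymbol{x}) = -\frac{\lVert \boldsymbol{x} - \boldsymbol{m}_i \rVert^{2}}{2 s^{2}} + \log \hat{P}(C_i)
= -\frac{1}{2 s^{2}} \sum_{j=1}^{d} \left(x_j - m_{ij}\right)^{2} + \log \hat{P}(C_i)
```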

Diagonal S, equal variances
[Figure: circular contours around each class mean; a query point (?) is assigned to the class of the nearest mean (*)]

Model Selection

Assumption                    Covariance matrix        No. of parameters
Shared, hyperspheric          S_i = S = s²I            1
Shared, axis-aligned          S_i = S, with s_ij = 0   d
Shared, hyperellipsoidal      S_i = S                  d(d+1)/2
Different, hyperellipsoidal   S_i                      K · d(d+1)/2

• As we increase complexity (a less restricted S), bias decreases and variance increases
• Assume simple models (allow some bias) to control variance (regularization)
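These model families map onto off-the-shelf classifiers; a minimal sketch using scikit-learn (an assumption, since the slides name no library): GaussianNB fits an independent (diagonal) covariance per class, LinearDiscriminantAnalysis a shared full covariance, and QuadraticDiscriminantAnalysis a different full covariance per class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Toy problem: N = 200 instances, d = 4 features, K = 2 classes.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

models = {
    "diagonal S_i (naive Bayes)": GaussianNB(),
    "shared S (linear)": LinearDiscriminantAnalysis(),
    "different S_i (quadratic)": QuadraticDiscriminantAnalysis(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```

Cross-validated accuracy makes the bias/variance trade-off concrete: the less restricted models only win when N is large enough to estimate their extra parameters reliably.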


Discrete Features
• Binary features: p_ij ≡ p(x_j = 1 | C_i)
• If the x_j are independent (Naive Bayes'), the discriminant is linear (see below)
• Estimated parameters: p̂_ij (see below)
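For independent binary x_j, the likelihood, the (linear) discriminant, and the estimates are:

```latex
p(\boldsymbol{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{\,x_j} (1-p_{ij})^{\,1-x_j}
\\[4pt]
g_i(\boldsymbol{x}) = \sum_{j}\left[x_j \log p_{ij} + (1-x_j)\log(1-p_{ij})\right] + \log P(C_i), \qquad
\hat{p}_{ij} = \frac{\sum_t x_j^t\, r_i^t}{\sum_t r_i^t}
```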

Discrete Features
• Multinomial (1-of-n_j) features: x_j ∈ {v_1, v_2, ..., v_{n_j}}
• If the x_j are independent (see below)
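Encoding x_j = v_k by the indicator z_jk = 1 (and 0 otherwise), with p_ijk ≡ p(x_j = v_k | C_i):

```latex
p(\boldsymbol{x} \mid C_i) = \prod_{j}\prod_{k} p_{ijk}^{\,z_{jk}}, \qquad
g_i(\boldsymbol{x}) = \sum_{j}\sum_{k} z_{jk} \log p_{ijk} + \log P(C_i), \qquad
\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t\, r_i^t}{\sum_t r_i^t}
```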

Multivariate Regression
• Multivariate linear model (see below)
• Multivariate polynomial model: define new higher-order variables z_1 = x_1, z_2 = x_2, z_3 = x_1², z_4 = x_2², z_5 = x_1 x_2 and use the linear model in this new z-space (basis functions, kernel trick: Chapter 13)
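The multivariate linear model and its squared-error criterion are:

```latex
r^t = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t + \varepsilon, \qquad
E(w_0, \ldots, w_d \mid X) = \frac{1}{2}\sum_t \Big(r^t - w_0 - \sum_{j=1}^{d} w_j x_j^t\Big)^{2}
```

Minimizing E is an ordinary least-squares problem; a minimal NumPy sketch (NumPy is an assumption, and the data are made up for illustration):

```python
import numpy as np

# Toy data: N = 5 instances, d = 2 inputs, targets r.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [2.0, 3.0]])
r = np.array([1.1, 2.0, 3.1, 4.9, 8.9])

# Prepend a column of ones so w[0] plays the role of the intercept w0,
# then solve the least-squares problem min_w ||A w - r||^2.
A = np.hstack([np.ones((X.shape[0], 1)), X])
w, *_ = np.linalg.lstsq(A, r, rcond=None)
print(w)  # [w0, w1, w2]
```

The polynomial model works the same way: build the z columns (x_1, x_2, x_1², x_2², x_1 x_2) and run the identical solver in the z-space.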