Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 3rd Edition, by ETHEM ALPAYDIN © The MIT Press, 2014. Modified by Prof. Carolina Ruiz for CS539 Machine Learning at WPI.

CHAPTER 5: MULTIVARIATE METHODS

Multivariate Data
- Multiple measurements (sensors)
- d inputs/features/attributes: d-variate
- N instances/observations/examples
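For reference, the sample of N instances with d attributes can be written as an N × d data matrix, with one instance per row and one attribute per column:

X = \{\mathbf{x}^t\}_{t=1}^{N}, \qquad
\mathbf{X} =
\begin{bmatrix}
x_1^1 & x_2^1 & \cdots & x_d^1 \\
x_1^2 & x_2^2 & \cdots & x_d^2 \\
\vdots &      &        & \vdots \\
x_1^N & x_2^N & \cdots & x_d^N
\end{bmatrix}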

Multivariate Parameters: Mean Vector and Covariance Matrix
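Written out (the standard definitions):

Mean vector: E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T
Covariance: \sigma_{ij} \equiv \mathrm{Cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)]
Covariance matrix: \boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{x}) = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T], with the variances \sigma_i^2 on the diagonal and the covariances \sigma_{ij} off the diagonal
Correlation: \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}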

Parameter Estimation from the data sample X
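The maximum likelihood estimators computed from the sample X = \{\mathbf{x}^t\}_{t=1}^{N} are:

Sample mean: m_j = \frac{1}{N}\sum_{t=1}^{N} x_j^t
Sample covariance: s_{ij} = \frac{1}{N}\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)
Sample correlation: r_{ij} = \frac{s_{ij}}{s_i s_j}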

Estimation of Missing Values
- What to do if certain instances have missing attribute values?
- Ignore those instances: not a good idea if the sample is small
- Use 'missing' as an attribute value: may give information
- Imputation: fill in the missing value
  - Mean imputation: use the most likely value (e.g., the mean)
  - Imputation by regression: predict the missing value based on the other attributes
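As a concrete illustration (not from the slides), a minimal NumPy sketch of mean imputation, assuming missing values are marked with NaN:

import numpy as np

def mean_impute(X):
    """Fill each NaN entry with the mean of its column, computed
    over the observed (non-NaN) values of that attribute."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)      # per-attribute means, ignoring NaNs
    rows, cols = np.where(np.isnan(X))     # positions of the missing values
    X[rows, cols] = col_means[cols]
    return X

# Example: 4 instances, 3 attributes, two missing values
X = np.array([[1.0, 2.0, np.nan],
              [2.0, np.nan, 6.0],
              [3.0, 4.0, 7.0],
              [4.0, 6.0, 9.0]])
print(mean_impute(X))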

Multivariate Normal Distribution
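The d-variate normal density, \mathbf{x} \sim N_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}):

p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]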

- Mahalanobis distance: (x − μ)^T Σ^{-1} (x − μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations)
- Bivariate case: d = 2, with z-normalization (see below)
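In the bivariate case, with z-normalized variables z_i = (x_i - \mu_i)/\sigma_i and correlation \rho, the density can be written as:

p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]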

Bivariate Normal Isoprobability contour plots [i.e., (x − μ)^T Σ^{-1} (x − μ) = c²]: when the covariance is 0, the ellipsoid axes are parallel to the coordinate axes.


Independent Inputs: Naive Bayes
- If the x_j are independent, the off-diagonal values of Σ are 0 and the Mahalanobis distance reduces to a Euclidean distance weighted by 1/σ_j (see below)
- If the variances are also equal, it reduces to the plain Euclidean distance
Note: the use of the term "Naïve Bayes" in this chapter is somewhat imprecise; Naïve Bayes assumes independence in the probability sense, not merely zero covariance in the linear-algebra sense.
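With a diagonal Σ the density factorizes and the Mahalanobis distance becomes a sum of per-attribute squared z-scores:

p(\mathbf{x}) = \prod_{j=1}^{d} p_j(x_j)
= \frac{1}{(2\pi)^{d/2}\prod_{j}\sigma_j}
\exp\!\left[-\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - \mu_j}{\sigma_j}\right)^2\right]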

Parametric Classification
- If p(x | C_i) ~ N(μ_i, Σ_i)
- Discriminant functions:
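Taking the log of the class likelihood gives the discriminant:

g_i(\mathbf{x}) = \log p(\mathbf{x}|C_i) + \log P(C_i)
= -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i|
- \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)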

Estimation of Parameters from the data sample X
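With r_i^t = 1 if \mathbf{x}^t \in C_i and 0 otherwise, the estimates are:

\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad
\mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}, \qquad
\mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}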

Assuming a different S_i for each C_i: Quadratic discriminant
- Expanding the formula on the previous slide gives a discriminant that is a quadratic function of x (see the figure below)
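The expansion groups the terms into a quadratic form:

g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}
\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad
\mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i, \qquad
w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)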

[Figure: class likelihoods, the posterior for C_1, and the discriminant where P(C_1 | x) = 0.5]

Assuming a Common Covariance Matrix S
- Shared common sample covariance S
- The discriminant reduces to a linear discriminant (see below)
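With S = \sum_i \hat{P}(C_i)\,\mathbf{S}_i shared by all classes, the quadratic term \mathbf{x}^T\mathbf{S}^{-1}\mathbf{x} is common to every class and drops out:

g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1}(\mathbf{x} - \mathbf{m}_i) + \log\hat{P}(C_i)
\;\Longrightarrow\;
g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}, \qquad
\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \qquad
w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}^{-1}\mathbf{m}_i + \log\hat{P}(C_i)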

Common Covariance Matrix S (figure): arbitrary covariances, but shared by the classes.

Assuming the Common Covariance Matrix S is Diagonal
- When the x_j, j = 1, ..., d, are independent, Σ is diagonal and p(x | C_i) = ∏_j p(x_j | C_i) (the Naive Bayes assumption)
- Classify based on the weighted (in s_j units) Euclidean distance to the nearest mean (see below)
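The discriminant then becomes a weighted distance to each class mean:

g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log\hat{P}(C_i)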

Assuming the Common Covariance Matrix S is Diagonal (figure): variances may differ; covariances are 0, so the ellipsoid axes are parallel to the coordinate axes.

Assuming the Common Covariance Matrix S is Diagonal and the variances are equal
- Nearest mean classifier: classify based on the Euclidean distance to the nearest mean (see below)
- Each mean can be considered a prototype or template, and this is template matching
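With equal variances s² in all dimensions, the discriminant depends only on the squared Euclidean distance to the class mean:

g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mathbf{m}_i\|^2}{2s^2} + \log\hat{P}(C_i), \qquad
\|\mathbf{x} - \mathbf{m}_i\|^2 = \sum_{j=1}^{d}(x_j - m_{ij})^2

With equal priors, maximizing g_i(x) amounts to choosing the class with the nearest mean.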

Assuming the Common Covariance Matrix S is Diagonal and the variances are equal (figure): covariances are 0, so the ellipsoid axes are parallel to the coordinate axes; the variances are the same, so the ellipsoids become circles. The classifier looks for the nearest mean.

Model Selection

Assumption                  | Covariance matrix       | No. of parameters
Shared, Hyperspheric        | S_i = S = s² I          | 1
Shared, Axis-aligned        | S_i = S, with s_ij = 0  | d
Shared, Hyperellipsoidal    | S_i = S                 | d(d+1)/2
Different, Hyperellipsoidal | S_i                     | K · d(d+1)/2

- As we increase complexity (a less restricted S), bias decreases and variance increases
- Assume simple models (allow some bias) to control variance (regularization)

Different cases of covariance matrices fitted to the same data lead to different decision boundaries (figure).
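To illustrate how these cases differ in practice, here is a minimal NumPy sketch (not from the slides; function and variable names are my own) that fits class priors, means, per-class covariances, and the shared covariance, and evaluates either the quadratic or the linear discriminant:

import numpy as np

def fit_gaussian_classifier(X, y):
    """Estimate priors P(C_i), means m_i, per-class covariances S_i,
    and the shared covariance S = sum_i P(C_i) S_i from labeled data.
    X: (N, d) array of instances, y: (N,) array of integer class labels."""
    classes = np.unique(y)
    priors, means, covs = {}, {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        means[c] = Xc.mean(axis=0)
        covs[c] = np.cov(Xc, rowvar=False, bias=True)   # ML estimate of S_i
    shared = sum(priors[c] * covs[c] for c in classes)  # S shared by all classes
    return classes, priors, means, covs, shared

def discriminant(x, prior, mean, cov):
    """g_i(x) = -1/2 log|S| - 1/2 (x - m)^T S^{-1} (x - m) + log P(C_i)
    (the -d/2 log 2pi term is dropped: it is the same for every class)."""
    diff = x - mean
    return (-0.5 * np.linalg.slogdet(cov)[1]
            - 0.5 * diff @ np.linalg.solve(cov, diff)
            + np.log(prior))

def predict(x, classes, priors, means, covs, shared=None):
    """Quadratic boundary if each class keeps its own covariance,
    linear boundary if the shared covariance is passed in instead."""
    g = [discriminant(x, priors[c], means[c],
                      covs[c] if shared is None else shared)
         for c in classes]
    return classes[int(np.argmax(g))]

# Example usage with two toy 2-D Gaussian classes:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)), rng.normal([3, 3], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
classes, priors, means, covs, shared = fit_gaussian_classifier(X, y)
x_new = np.array([2.5, 2.5])
print(predict(x_new, classes, priors, means, covs))          # per-class S_i (quadratic boundary)
print(predict(x_new, classes, priors, means, covs, shared))  # shared S (linear boundary)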

Discrete Features
- Binary features: if the x_j are independent (Naive Bayes), the discriminant is linear, with the parameters estimated from the sample (see below)
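With p_{ij} \equiv p(x_j = 1 \mid C_i), the likelihood, discriminant, and estimator are:

p(\mathbf{x}|C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}(1 - p_{ij})^{1 - x_j}
g_i(\mathbf{x}) = \sum_{j}\left[x_j\log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)
\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}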

Discrete Features (continued)
- Multinomial (1-of-n_j) features: x_j ∈ {v_1, v_2, ..., v_{n_j}}
- Define z_jk = 1 if x_j = v_k, and 0 otherwise
- If the x_j are independent, the discriminant is again linear (see below)
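With p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i):

p(\mathbf{x}|C_i) = \prod_{j}\prod_{k} p_{ijk}^{z_{jk}}
g_i(\mathbf{x}) = \sum_{j}\sum_{k} z_{jk}\log p_{ijk} + \log P(C_i)
\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}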

Multivariate Regression
- Multivariate linear model (see below)
- Multivariate polynomial model: define new higher-order variables z_1 = x_1, z_2 = x_2, z_3 = x_1², z_4 = x_2², z_5 = x_1 x_2 and use the linear model in this new z-space (basis functions, kernel trick: Chapter 13)
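The multivariate linear model and its least-squares error:

r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon = w_0 + w_1 x_1^t + \cdots + w_d x_d^t + \varepsilon
E(w_0, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_{t}\left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2

A small NumPy sketch of the z-space idea (toy data of my own, not from the slides): expand the two inputs into the quadratic basis and solve the resulting linear least-squares problem:

import numpy as np

# Toy data for illustration: two inputs, quadratic ground truth plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
r = 3 + 2*X[:, 0] - X[:, 1] + 0.5*X[:, 0]**2 + rng.normal(0, 0.1, 100)

# z-space: z1 = x1, z2 = x2, z3 = x1^2, z4 = x2^2, z5 = x1*x2 (plus a column of 1s for w0).
Z = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0]**2, X[:, 1]**2, X[:, 0]*X[:, 1]])
w, *_ = np.linalg.lstsq(Z, r, rcond=None)   # least-squares solution for the weights
print(np.round(w, 2))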