Presentation transcript:

MACHINE LEARNING 6. Multivariate Methods

Motivating Example
Based on E Alpaydın 2004, Introduction to Machine Learning, © The MIT Press (V1.1)
- Loan application
- Observation vector: information about the customer (age, marital status, yearly income, savings)
- These are the inputs/attributes/features associated with a customer
- The variables are correlated (e.g., savings vs. age)

Correlation
- Suppose we have two random variables X and Y
- We want to estimate the degree of "correlation" between them
- Positive correlation: if one is large, the other is likely to be large as well
- Negative correlation: if one is large, the other is likely to be small
- Zero correlation: the value of one tells us nothing about the value of the other

Correlation
- Some reasonable requirements
- The "correlation" between X and Y should be the same as between X+a and Y+b, where a, b are constants (shift invariance)
- The "correlation" between X and Y should be the same as between aX and bY, where a, b are positive constants (scale invariance)
- Example: if there is a connection between the temperature inside a building and the temperature outside, it should not matter what temperature scale is used

Correlation
- Let's do a "normalization": standardize each variable by subtracting its mean and dividing by its standard deviation
- Both standardized variables then have zero mean and unit variance; the individual scales are filtered out
- Now look at the mean (expected) squared difference between them

Correlation
- The expected squared difference should be
- small when the variables are positively "correlated"
- large when they are negatively correlated
- in between when they are "uncorrelated"

Correlation
- Larger covariance means a larger correlation coefficient, which means a smaller average squared difference between the standardized variables
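In symbols (standard definitions, consistent with the argument above): standardize both variables and expand the expected squared difference.

```latex
X^{*} = \frac{X - \mu_X}{\sigma_X}, \qquad Y^{*} = \frac{Y - \mu_Y}{\sigma_Y}
% expected squared difference between the standardized variables
E\!\left[(X^{*} - Y^{*})^{2}\right] = E[X^{*2}] + E[Y^{*2}] - 2\,E[X^{*}Y^{*}] = 2(1 - \rho)
% where the correlation coefficient and covariance are
\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}, \qquad
\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]
```

So the expected squared difference is 0 when ρ = 1 (perfect positive correlation), 4 when ρ = −1, and 2 when ρ = 0.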

Correlation vs. Dependence
- Not the same thing
- Independent implies zero correlation
- Zero correlation does not imply independence
- We only look at squared differences between the two variables
- Two variables might have "unpredictable" squared differences but still be dependent

Correlation vs. Independence
- Example: random variable X takes values in {-1, 0, 1}, each with probability 1/3
- Random variable Y = X^2
- Y is clearly dependent on X, yet
- Cov(X, Y) = E[(X - EX)(Y - EY)] = E[XY] - E[X] E[Y] = E[X^3] = 0
- Correlation only measures "linear" dependence
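A quick numerical check of this example (an illustrative NumPy sketch; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1, 0, 1], size=100_000)  # X uniform on {-1, 0, 1}
y = x ** 2                                # Y = X^2 is fully determined by X

# The sample covariance is approximately zero even though Y depends on X
print(np.cov(x, y)[0, 1])
```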

Multivariate Distribution
- Assume all members of a class come from a joint distribution
- We can learn the class-conditional distributions P(x|C) from data
- A new instance is assigned to the most probable class P(C|x) using Bayes' rule
- An instance is described by a vector of correlated attributes
- This is the realm of multivariate distributions, in particular the multivariate normal

Multivariate Parameters
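In standard notation, the parameters of a multivariate distribution are the mean vector, the covariance matrix and the correlations:

```latex
E[\boldsymbol{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^{T}
\boldsymbol{\Sigma} \equiv \operatorname{Cov}(\boldsymbol{x})
  = E\!\left[(\boldsymbol{x} - \boldsymbol{\mu})(\boldsymbol{x} - \boldsymbol{\mu})^{T}\right],
\qquad \sigma_{ij} = \operatorname{Cov}(X_i, X_j), \quad \sigma_{ii} = \sigma_i^{2}
\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}
```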

Multivariate Data
- Multiple measurements (sensors)
- d inputs/features/attributes: d-variate data
- N instances/observations/examples

Parameter Estimation
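The corresponding maximum likelihood estimators from a sample X = {x^t, t = 1, ..., N} (standard formulas): sample mean, sample covariance and sample correlation.

```latex
\boldsymbol{m} = \frac{1}{N}\sum_{t=1}^{N} \boldsymbol{x}^{t}
\mathbf{S} = \frac{1}{N}\sum_{t=1}^{N} (\boldsymbol{x}^{t} - \boldsymbol{m})(\boldsymbol{x}^{t} - \boldsymbol{m})^{T},
\qquad s_{ij} = \frac{1}{N}\sum_{t=1}^{N} (x_i^{t} - m_i)(x_j^{t} - m_j)
r_{ij} = \frac{s_{ij}}{s_i s_j}
```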

Estimation of Missing Values
- What to do if certain instances have missing attributes?
- Ignore those instances: not a good idea if the sample is small
- Use "missing" as a separate attribute value: it may itself carry information
- Imputation: fill in the missing value
- Mean imputation: substitute the most likely value (e.g., the attribute mean); a small sketch follows below
- Imputation by regression: predict the missing value from the other attributes
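A minimal sketch of mean imputation with NumPy (the function name and the toy data are ours, purely for illustration; regression imputation would instead fit a regressor on the complete attributes and predict the missing ones):

```python
import numpy as np

def mean_impute(X):
    """Fill each missing (NaN) entry with the mean of its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)   # per-attribute means, ignoring NaNs
    rows, cols = np.where(np.isnan(X))  # positions of the missing entries
    X[rows, cols] = col_means[cols]     # substitute the corresponding column mean
    return X

# Toy data: (age, yearly income) with two missing entries
X = np.array([[25.0, 40_000.0],
              [35.0, np.nan],
              [np.nan, 60_000.0]])
print(mean_impute(X))
```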

Multivariate Normal
- We have d attributes
- Often we can assume each one is normally distributed
- The attributes may be dependent/correlated
- We need the joint distribution of several correlated variables: P(X_1 = x_1, X_2 = x_2, ..., X_d = x_d) = ?
- Each X_i is normally distributed with mean μ_i and variance σ_i^2

Multivariate Normal
- Mahalanobis distance: (x - μ)^T Σ^{-1} (x - μ)
- If two variables are strongly correlated, their covariance is large; deviations along such directions are effectively divided by it
- These deviations therefore contribute less to the Mahalanobis distance
- and hence contribute more to the probability (the density falls off more slowly in those directions)
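For reference, the d-variate normal density in which this quadratic form appears (standard form):

```latex
p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\,\lvert\boldsymbol{\Sigma}\rvert^{1/2}}
\exp\!\left[-\frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right]
```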

Bivariate Normal

Multivariate Normal Distribution
- Mahalanobis distance: (x - μ)^T Σ^{-1} (x - μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations)
- Bivariate case: d = 2
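In the bivariate case the density can be written explicitly in terms of the two means, standard deviations and the correlation ρ (standard form):

```latex
p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^{2}}}
\exp\!\left[-\frac{1}{2(1 - \rho^{2})}\left(z_1^{2} - 2\rho z_1 z_2 + z_2^{2}\right)\right],
\qquad z_i = \frac{x_i - \mu_i}{\sigma_i}
```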

Bivariate Normal

Independent Inputs: Naive Bayes
- If the x_i are independent, the off-diagonal entries of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σ_i) Euclidean distance
- If the variances are also equal, it reduces to the ordinary Euclidean distance
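With a diagonal Σ the quadratic form separates into per-attribute terms:

```latex
(\boldsymbol{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})
= \sum_{j=1}^{d} \left(\frac{x_j - \mu_j}{\sigma_j}\right)^{2}
% and, if all variances are equal (\sigma_j = \sigma), this is
\frac{1}{\sigma^{2}}\,\lVert \boldsymbol{x} - \boldsymbol{\mu} \rVert^{2}
```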

Projection Distribution
- Example: a vector of 3 features with a multivariate normal distribution
- Project it onto a 2-dimensional space (e.g., the XY plane) to obtain vectors of 2 features
- The projection is also multivariate normal
- In general, the projection of a d-dimensional normal onto a k-dimensional space is a k-dimensional normal
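In symbols, a linear map of a normal vector is again normal (a standard property):

```latex
\boldsymbol{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad
\mathbf{W} \in \mathbb{R}^{k \times d}
\;\Longrightarrow\;
\mathbf{W}\boldsymbol{x} \sim \mathcal{N}_k\!\left(\mathbf{W}\boldsymbol{\mu},\; \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^{T}\right)
```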

1D projection

Multivariate Classification
- Assume the members of a class come from a single multivariate distribution
- The multivariate normal is a good choice:
- it is easy to analyze
- it models many natural phenomena well
- it models a class as a single prototype source (the mean) with slight random variation

Example  Matching cars to customers  Each cat defines a class of matching customers  Customers described by (age, income)  There is a correlation between age and income  Assume each class is multivariate normal  Need to learn P(x|C) from data  Use Bayes to compute P(C|x) Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 24

Parametric Classification
- If p(x | C_i) ~ N(μ_i, Σ_i)
- the discriminant functions are as written below
- We need to know each class's mean and covariance matrix to compute the discriminant functions
- P(x) can be ignored since it is the same for all classes
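The log-based discriminants (standard derivation, with P(x) dropped):

```latex
g_i(\boldsymbol{x}) = \log p(\boldsymbol{x} \mid C_i) + \log P(C_i)
= -\frac{d}{2}\log 2\pi - \frac{1}{2}\log\lvert\boldsymbol{\Sigma}_i\rvert
  - \frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu}_i)^{T} \boldsymbol{\Sigma}_i^{-1} (\boldsymbol{x} - \boldsymbol{\mu}_i)
  + \log P(C_i)
```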

Estimation of Parameters
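The per-class maximum likelihood estimates from a labeled sample, where r_i^t = 1 if x^t belongs to class C_i and 0 otherwise (standard formulas):

```latex
\hat{P}(C_i) = \frac{\sum_t r_i^{t}}{N}, \qquad
\boldsymbol{m}_i = \frac{\sum_t r_i^{t}\,\boldsymbol{x}^{t}}{\sum_t r_i^{t}}, \qquad
\mathbf{S}_i = \frac{\sum_t r_i^{t}\,(\boldsymbol{x}^{t} - \boldsymbol{m}_i)(\boldsymbol{x}^{t} - \boldsymbol{m}_i)^{T}}{\sum_t r_i^{t}}
```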

Covariance Matrix per Class
- Using a separate covariance matrix for each class gives a quadratic discriminant
- It requires estimating K * d(d+1)/2 parameters for the covariance matrices alone
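A compact NumPy sketch of this quadratic (separate covariance per class) classifier; the function names are ours, and the estimates follow the maximum likelihood formulas above:

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate prior, mean and full covariance matrix for each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),
            "mean": Xc.mean(axis=0),
            "cov": np.cov(Xc, rowvar=False, bias=True),  # MLE uses 1/N
        }
    return params

def quadratic_discriminant(x, p):
    """g_i(x) = -1/2 log|S_i| - 1/2 (x - m_i)^T S_i^{-1} (x - m_i) + log P(C_i)."""
    d = x - p["mean"]
    return (-0.5 * np.linalg.slogdet(p["cov"])[1]
            - 0.5 * d @ np.linalg.inv(p["cov"]) @ d
            + np.log(p["prior"]))

def predict(X, params):
    """Assign each row of X to the class with the largest discriminant."""
    return np.array([max(params, key=lambda c: quadratic_discriminant(x, params[c]))
                     for x in X])
```

This runs as-is for inputs with two or more features and enough samples per class; with little data the per-class covariances become ill-conditioned, which motivates the shared and diagonal variants on the following slides.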

(Figure: class likelihoods, the posterior for C_1, and the discriminant where P(C_1 | x) = 0.5)

Common Covariance Matrix S
- If there is not enough data, we can assume all classes share a common sample covariance matrix S
- The discriminant then reduces to a linear discriminant (the term x^T S^{-1} x is common to all discriminants and can be removed)
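Using the shared S (which can be taken as the prior-weighted average of the per-class estimates), dropping the common quadratic term leaves a discriminant that is linear in x:

```latex
\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i
g_i(\boldsymbol{x}) = -\frac{1}{2}(\boldsymbol{x} - \boldsymbol{m}_i)^{T}\mathbf{S}^{-1}(\boldsymbol{x} - \boldsymbol{m}_i) + \log\hat{P}(C_i)
= \boldsymbol{w}_i^{T}\boldsymbol{x} + w_{i0}
\quad \text{(after dropping the common term } \boldsymbol{x}^{T}\mathbf{S}^{-1}\boldsymbol{x}\text{)}
\boldsymbol{w}_i = \mathbf{S}^{-1}\boldsymbol{m}_i, \qquad
w_{i0} = -\frac{1}{2}\boldsymbol{m}_i^{T}\mathbf{S}^{-1}\boldsymbol{m}_i + \log\hat{P}(C_i)
```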

Common Covariance Matrix S

Diagonal S
- When the x_j, j = 1, ..., d, are independent, Σ is diagonal and p(x|C_i) = ∏_j p(x_j|C_i) (the naive Bayes assumption)
- Classify based on the weighted Euclidean distance (in s_j units) to the nearest mean
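The resulting discriminant is a variance-weighted Euclidean distance to each class mean (standard form):

```latex
g_i(\boldsymbol{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - m_{ij}}{s_j}\right)^{2} + \log\hat{P}(C_i)
```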

Diagonal S: variances may be different

Diagonal S, equal variances
- Nearest-mean classifier: classify based on the Euclidean distance to the nearest class mean
- Each mean can be considered a prototype or template, so this is template matching
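With equal variances s_j = s, the discriminant reduces further to the squared Euclidean distance to each mean (standard form):

```latex
g_i(\boldsymbol{x}) = -\frac{\lVert \boldsymbol{x} - \boldsymbol{m}_i \rVert^{2}}{2 s^{2}} + \log\hat{P}(C_i)
= -\frac{1}{2 s^{2}}\sum_{j=1}^{d}(x_j - m_{ij})^{2} + \log\hat{P}(C_i)
```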

Diagonal S, equal variances

Model Selection  Different covariance matrix for each class  Have to estimate many parameters  Small bias, large variance  Common covariance matrices, diagonal covariance etc. reduce number of parameters  Increase bias but control variance  In-between states? Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 35

Regularized Discriminant Analysis (RDA)
- Write each class covariance as a blend of a diagonal, a shared, and a class-specific estimate (see below)
- a = b = 0: quadratic classifier
- a = 0, b = 1: shared covariance, linear classifier
- a = 1, b = 0: diagonal covariance
- Choose the best a, b by cross-validation
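One common way to write the regularized covariance that interpolates between these three cases; this parameterization is our reading of the slide's a and b, so check it against the source text:

```latex
\mathbf{S}_i' = a\,\sigma^{2}\mathbf{I} + b\,\mathbf{S} + (1 - a - b)\,\mathbf{S}_i
```

Setting a = b = 0 recovers the per-class S_i (quadratic), a = 0, b = 1 the shared S (linear), and a = 1, b = 0 a diagonal covariance (nearest-mean behavior up to scaling).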

Model Selection: Example

Model Selection