
1 Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 3RD EDITION, ETHEM ALPAYDIN © The MIT Press, 2014. alpaydin@boun.edu.tr, http://www.cmpe.boun.edu.tr/~ethem/i2ml3e

2 CHAPTER 5: MULTIVARIATE METHODS

3 Multivariate Data
- Multiple measurements (sensors)
- d inputs/features/attributes: d-variate
- N instances/observations/examples

4 Multivariate Parameters
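
For reference, the standard definitions of the mean vector, covariance matrix, and correlations of a d-dimensional random vector x are:

```latex
\[
\boldsymbol{\mu} = E[\boldsymbol{x}] = [\mu_1, \ldots, \mu_d]^T,
\qquad
\boldsymbol{\Sigma} = \mathrm{Cov}(\boldsymbol{x}) = E\!\left[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^T\right]
\]
\[
\sigma_{ij} = \mathrm{Cov}(x_i, x_j), \qquad
\sigma_i^2 = \mathrm{Var}(x_i), \qquad
\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}
\]
```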

5 Parameter Estimation
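
A minimal NumPy sketch of the sample estimators; the toy data and variable names below are illustrative, not from the slides:

```python
import numpy as np

# X: N x d data matrix (N instances, d features); synthetic data for illustration
X = np.random.default_rng(0).normal(size=(100, 3))

N, d = X.shape
m = X.mean(axis=0)                       # sample mean vector, shape (d,)
S = (X - m).T @ (X - m) / N              # maximum-likelihood sample covariance, d x d
# np.cov(X, rowvar=False, bias=True) gives the same S
sd = np.sqrt(np.diag(S))                 # per-feature standard deviations
R = S / np.outer(sd, sd)                 # sample correlation matrix
```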

6 Estimation of Missing Values
- What to do if certain instances have missing attributes?
- Ignore those instances: not a good idea if the sample is small
- Use 'missing' as an attribute: may give information
- Imputation: fill in the missing value (a mean-imputation sketch follows below)
  - Mean imputation: use the most likely value (e.g., the mean)
  - Imputation by regression: predict based on the other attributes
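
A small mean-imputation sketch in NumPy, assuming missing entries are encoded as NaN; the array X is a made-up example:

```python
import numpy as np

# Replace NaNs in each column with that column's mean over the observed entries
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

col_means = np.nanmean(X, axis=0)        # per-feature means, ignoring NaNs
missing = np.isnan(X)
X_imputed = np.where(missing, col_means, X)
```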

7 Multivariate Normal Distribution
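
The d-variate normal density, x ~ N_d(μ, Σ), is:

```latex
\[
p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x}-\boldsymbol{\mu})\right]
\]
```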

8
- Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ) measures the distance from x to μ in terms of Σ (normalizes for differences in variances and correlations)
- Bivariate case: d = 2
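
A quick NumPy check of this distance; mu, Sigma, and x below are made-up values:

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([1.0, 2.0])

diff = x - mu
d2 = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
d_mahalanobis = np.sqrt(d2)
# equivalently: scipy.spatial.distance.mahalanobis(x, mu, np.linalg.inv(Sigma))
```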

9 Bivariate Normal
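
For d = 2, writing z_i = (x_i − μ_i)/σ_i and letting ρ be the correlation, the density takes the familiar bivariate form:

```latex
\[
p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho\, z_1 z_2 + z_2^2\right)\right],
\qquad z_i = \frac{x_i - \mu_i}{\sigma_i}
\]
```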

10

11 Independent Inputs: Naive Bayes
- If the x_i are independent, the off-diagonals of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σ_i) Euclidean distance (see below)
- If the variances are also equal, it reduces to the plain Euclidean distance
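
With a diagonal Σ the density factorizes and the quadratic form becomes a weighted Euclidean distance:

```latex
\[
p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i}
\exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{2}\right]
\]
```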

12 Parametric Classification
- If p(x|C_i) ~ N(μ_i, Σ_i)
- Discriminant functions (expanded below)
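
Taking the log of the Gaussian class-conditional density gives the discriminant:

```latex
\[
g_i(\boldsymbol{x}) = \log p(\boldsymbol{x} \mid C_i) + \log P(C_i)
= -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i|
- \frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_i)
+ \log P(C_i)
\]
```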

13 Estimation of Parameters
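
A NumPy sketch of the usual maximum-likelihood estimates from a labeled sample; the function name and arguments (X, r, K) are illustrative:

```python
import numpy as np

def estimate_class_params(X, r, K):
    """ML estimates of priors, class means, and class covariance matrices.

    X: N x d data matrix; r: length-N array of class labels in {0, ..., K-1}.
    """
    N, d = X.shape
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = X[r == i]                            # instances labeled with class i
        mi = Xi.mean(axis=0)                      # class mean m_i
        Si = (Xi - mi).T @ (Xi - mi) / len(Xi)    # class covariance S_i (divide by N_i)
        priors.append(len(Xi) / N)                # P(C_i) estimated by class frequency
        means.append(mi)
        covs.append(Si)
    return np.array(priors), np.array(means), np.array(covs)
```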

14 Different S_i
- Quadratic discriminant (expanded below)
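
Expanding the quadratic form (and dropping the constant −(d/2) log 2π) gives the quadratic discriminant:

```latex
\[
g_i(\boldsymbol{x}) = \boldsymbol{x}^T \mathbf{W}_i\, \boldsymbol{x} + \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}
\]
\[
\mathbf{W}_i = -\tfrac{1}{2}\mathbf{S}_i^{-1}, \qquad
\boldsymbol{w}_i = \mathbf{S}_i^{-1}\boldsymbol{m}_i, \qquad
w_{i0} = -\tfrac{1}{2}\boldsymbol{m}_i^T \mathbf{S}_i^{-1}\boldsymbol{m}_i - \tfrac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)
\]
```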

15 (figure: class likelihoods, the posterior for C_1, and the discriminant boundary where P(C_1|x) = 0.5)

16 Common Covariance Matrix S
- Shared common sample covariance S
- The discriminant reduces to a linear discriminant (see below)
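
With a shared S, the term xᵀS⁻¹x is common to all classes and can be dropped, leaving the linear discriminant:

```latex
\[
g_i(\boldsymbol{x}) = \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}, \qquad
\boldsymbol{w}_i = \mathbf{S}^{-1}\boldsymbol{m}_i, \qquad
w_{i0} = -\tfrac{1}{2}\boldsymbol{m}_i^T \mathbf{S}^{-1}\boldsymbol{m}_i + \log\hat{P}(C_i)
\]
```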

17 Common Covariance Matrix S

18 Diagonal S
- When the x_j, j = 1, ..., d, are independent, Σ is diagonal: p(x|C_i) = ∏_j p(x_j|C_i) (the Naive Bayes assumption)
- Classify based on weighted Euclidean distance (in s_j units) to the nearest mean (see below)
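
With a shared diagonal S (per-feature standard deviations s_j), the discriminant reduces to:

```latex
\[
g_i(\boldsymbol{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - m_{ij}}{s_j}\right)^{2} + \log\hat{P}(C_i)
\]
```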

19 Diagonal S (figure: variances may be different)

20 Diagonal S, Equal Variances
- Nearest mean classifier: classify based on Euclidean distance to the nearest mean (a sketch follows below)
- Each mean can be considered a prototype or template, so this is template matching
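
A minimal nearest-mean (template-matching) sketch in NumPy; the function name and the example means and points are made up for illustration:

```python
import numpy as np

def nearest_mean_classify(X, means):
    """Assign each row of X to the class whose mean is closest in Euclidean distance.

    X:     N x d array of instances
    means: K x d array of class means (the 'templates')
    """
    # Squared Euclidean distance from every instance to every class mean: N x K
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)               # index of the nearest mean per instance

# Example: two class templates and three test points
means = np.array([[0.0, 0.0], [3.0, 3.0]])
X = np.array([[0.2, -0.1], [2.8, 3.3], [2.0, 1.8]])
labels = nearest_mean_classify(X, means)   # -> array([0, 1, 1])
```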

21 Diagonal S, Equal Variances (figure)

22 Model Selection

Assumption                   Covariance matrix         No. of parameters
Shared, Hyperspheric         S_i = S = s² I            1
Shared, Axis-aligned         S_i = S, with s_ij = 0    d
Shared, Hyperellipsoidal     S_i = S                   d(d+1)/2
Different, Hyperellipsoidal  S_i                       K·d(d+1)/2

- As we increase complexity (a less restricted S), bias decreases and variance increases
- Assume simple models (allow some bias) to control variance (regularization)

23

24 Discrete Features
- Binary features: if the x_j are independent (Naive Bayes'), the discriminant is linear
- Estimated parameters (see below)
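
Writing p_ij ≡ p(x_j = 1 | C_i) and r_i^t = 1 if instance t belongs to C_i (0 otherwise), independence gives the linear discriminant and its estimates:

```latex
\[
g_i(\boldsymbol{x}) = \sum_{j}\left[x_j \log p_{ij} + (1-x_j)\log(1-p_{ij})\right] + \log\hat{P}(C_i),
\qquad
\hat{p}_{ij} = \frac{\sum_t x_j^t\, r_i^t}{\sum_t r_i^t}
\]
```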

25 Discrete Features
- Multinomial (1-of-n_j) features: x_j ∈ {v_1, v_2, ..., v_{n_j}}
- If the x_j are independent (see below)
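
Writing p_ijk ≡ p(x_j = v_k | C_i) and z_jk = 1 if x_j = v_k (0 otherwise), independence gives:

```latex
\[
g_i(\boldsymbol{x}) = \sum_{j}\sum_{k} z_{jk}\log p_{ijk} + \log\hat{P}(C_i),
\qquad
\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t\, r_i^t}{\sum_t r_i^t}
\]
```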

26 Multivariate Regression
- Multivariate linear model
- Multivariate polynomial model: define new higher-order variables z_1 = x_1, z_2 = x_2, z_3 = x_1², z_4 = x_2², z_5 = x_1·x_2 and use the linear model in this new z space (basis functions, kernel trick: Chapter 13); a least-squares sketch follows below
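
A small least-squares sketch of this basis-expansion idea in NumPy; the synthetic data and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))                 # two raw inputs x1, x2
r = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=200)  # synthetic targets

x1, x2 = X[:, 0], X[:, 1]
# z space: [1, x1, x2, x1^2, x2^2, x1*x2]; the model stays linear in the weights
Z = np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

w, *_ = np.linalg.lstsq(Z, r, rcond=None)             # least-squares weight vector
r_pred = Z @ w                                        # fitted values
```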

