Published by Perry Bosworth. Modified about 1 year ago.

1
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for

3
Likelihood- vs. Discriminant-based Classification
Likelihood-based: Assume a model for $p(x \mid C_i)$, use Bayes' rule to calculate $P(C_i \mid x)$, and set $g_i(x) = \log P(C_i \mid x)$
Discriminant-based: Assume a model for $g_i(x \mid \Phi_i)$; no density estimation
Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundaries
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

4
Linear Discriminant
Linear discriminant: $g_i(x \mid w_i, w_{i0}) = w_i^T x + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$
Advantages:
Simple: O(d) space/computation
Knowledge extraction: Weighted sum of attributes; positive/negative weights, magnitudes (credit scoring)
Optimal when $p(x \mid C_i)$ are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable
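As a minimal sketch of the "weighted sum of attributes" idea, the function below evaluates a single linear discriminant in pure Python. The function name, the two attributes, and the weight values are illustrative assumptions, not taken from the slides.

```python
from typing import Sequence


def linear_discriminant(x: Sequence[float], w: Sequence[float], w0: float) -> float:
    """Return g(x) = w^T x + w0, a weighted sum of the attributes plus a bias."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wj * xj for wj, xj in zip(w, x)) + w0


# Hypothetical credit-scoring weights: the first attribute counts for the
# applicant (positive weight), the second counts against (negative weight).
weights = [0.5, -2.0]   # one weight per attribute, so storage is O(d)
bias = 1.0
score = linear_discriminant([4.0, 1.0], weights, bias)
```

The sign of each weight shows whether an attribute pushes the score up or down, and its magnitude shows how strongly, which is the knowledge-extraction point made on the slide.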

5
Generalized Linear Model
Quadratic discriminant: $g_i(x \mid W_i, w_i, w_{i0}) = x^T W_i x + w_i^T x + w_{i0}$
Higher-order (product) terms: $z_1 = x_1,\ z_2 = x_2,\ z_3 = x_1^2,\ z_4 = x_2^2,\ z_5 = x_1 x_2$
Map from x to z using nonlinear basis functions and use a linear discriminant in z-space
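The mapping from x to z can be sketched as follows for a two-dimensional input. The particular basis (the inputs, their squares, and their product) and the weight values are assumptions chosen for illustration; the point is only that a discriminant that is linear in z can be quadratic in x.

```python
from typing import List, Sequence


def quadratic_basis(x: Sequence[float]) -> List[float]:
    """Map x = (x1, x2) to z = (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2]


def discriminant_in_z(x: Sequence[float], w: Sequence[float], w0: float) -> float:
    """Evaluate a linear discriminant w^T z + w0 after the basis expansion."""
    z = quadratic_basis(x)
    return sum(wj * zj for wj, zj in zip(w, z)) + w0


# Illustrative weights giving g(x) = x1^2 + x2^2 - 1: the boundary is the
# unit circle in x-space, yet the discriminant is linear in z-space.
w = [0.0, 0.0, 1.0, 1.0, 0.0]
w0 = -1.0
inside = discriminant_in_z([0.5, 0.5], w, w0)    # point inside the circle
outside = discriminant_in_z([2.0, 0.0], w, w0)   # point outside the circle
```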

6
Two Classes

7
Geometry

8
Multiple Classes
Classes are linearly separable
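The slide's equations did not survive extraction. A common decision rule with K linear discriminants, assumed here rather than recovered from the slide, is to pick the class whose discriminant value is largest. The names `choose_class`, `W`, and `w0` and the numbers are hypothetical.

```python
from typing import List, Sequence


def choose_class(x: Sequence[float],
                 W: Sequence[Sequence[float]],
                 w0: Sequence[float]) -> int:
    """Return the index i that maximizes g_i(x) = w_i^T x + w_i0."""
    scores: List[float] = [
        sum(wj * xj for wj, xj in zip(w_i, x)) + b_i
        for w_i, b_i in zip(W, w0)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Three made-up discriminants in two dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w0 = [0.0, 0.0, 0.0]
label = choose_class([0.2, 3.0], W, w0)
```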

9
Pairwise Separation

10
From Discriminants to Posteriors
When $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma)$
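To see why a shared covariance matrix leads to a linear discriminant, it may help to expand the log odds for two classes. The following is a sketch of the standard derivation, not a reconstruction of the missing slide content; $P(C_i)$ denotes the class priors, and the quadratic terms $x^T \Sigma^{-1} x$ cancel because both classes use the same $\Sigma$.

```latex
\begin{align*}
\log\frac{P(C_1 \mid x)}{P(C_2 \mid x)}
  &= \log\frac{p(x \mid C_1)}{p(x \mid C_2)} + \log\frac{P(C_1)}{P(C_2)} \\
  &= w^T x + w_0, \\
w   &= \Sigma^{-1}(\mu_1 - \mu_2), \\
w_0 &= -\tfrac{1}{2}(\mu_1 + \mu_2)^T \Sigma^{-1}(\mu_1 - \mu_2)
       + \log\frac{P(C_1)}{P(C_2)}, \\
P(C_1 \mid x) &= \frac{1}{1 + \exp\!\left[-(w^T x + w_0)\right]}.
\end{align*}
```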

12
Sigmoid (Logistic) Function
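A minimal sketch of the logistic function, $\mathrm{sigmoid}(a) = 1 / (1 + e^{-a})$, in pure Python. Splitting on the sign of the argument is an implementation choice made here to avoid overflow in `math.exp` for large negative inputs; it is not something the slide prescribes.

```python
import math


def sigmoid(a: float) -> float:
    """Logistic function 1 / (1 + exp(-a)), arranged to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    # For negative a, exp(a) is small, so this equivalent form is safe.
    e = math.exp(a)
    return e / (1.0 + e)
```

The output is 0.5 at a = 0 and satisfies sigmoid(a) + sigmoid(-a) = 1, which is why thresholding the posterior at 0.5 is the same as thresholding the linear discriminant at 0.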

13
Gradient-Descent
$E(w \mid X)$ is the error with parameters $w$ on sample $X$
$w^* = \arg\min_w E(w \mid X)$
Gradient: $\nabla_w E = \left[\frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_d}\right]^T$
Gradient-descent: Starts from a random $w$ and updates $w$ iteratively in the negative direction of the gradient
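The update rule can be sketched in a few lines of pure Python. The toy error function, the step size `eta`, and the fixed number of steps are assumptions for illustration, and the starting point is fixed rather than random so that the run is reproducible.

```python
from typing import Callable, List


def gradient_descent(grad: Callable[[List[float]], List[float]],
                     w: List[float],
                     eta: float = 0.1,
                     steps: int = 200) -> List[float]:
    """Repeat w <- w - eta * grad(w) for a fixed number of steps."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


# Toy error E(w) = (w1 - 3)^2 + (w2 + 1)^2, whose minimum is at (3, -1).
def toy_gradient(w: List[float]) -> List[float]:
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


w_star = gradient_descent(toy_gradient, [0.0, 0.0])
```

On this convex toy problem any small enough step size converges; on a general error surface the result depends on the starting point and on `eta`.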

14
Gradient-Descent
[Figure: one gradient-descent step from $w^t$ to $w^{t+1}$ with step size $\eta$, showing $E(w^t)$ and $E(w^{t+1})$]

15
Logistic Discrimination
Two classes: Assume the log likelihood ratio is linear

16
Training: Two Classes

17
Training: Gradient-Descent
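The slide's update equations were lost in extraction. As a hedged sketch of how two-class logistic discrimination is commonly trained, the code below runs batch gradient descent on the cross-entropy error, with outputs $y = \mathrm{sigmoid}(w^T x + w_0)$ and updates proportional to $\sum_t (r^t - y^t) x^t$. The function name, zero initialization, step size, epoch count, and the tiny data set are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(a: float) -> float:
    """Logistic function, arranged to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]],
                   r: Sequence[int],
                   eta: float = 0.1,
                   epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on cross-entropy; labels r are 0 or 1."""
    d = len(X[0])
    w, w0 = [0.0] * d, 0.0
    for _ in range(epochs):
        dw, dw0 = [0.0] * d, 0.0
        for x, rt in zip(X, r):
            y = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w0)
            for j in range(d):
                dw[j] += (rt - y) * x[j]   # accumulate (r - y) * x_j
            dw0 += rt - y
        # Moving along (r - y) * x is the negative gradient of the error.
        w = [wj + eta * dwj for wj, dwj in zip(w, dw)]
        w0 += eta * dw0
    return w, w0


# Tiny made-up 1-D sample: class 1 tends to have larger x.
X = [[0.0], [1.0], [2.0], [3.0]]
r = [0, 0, 1, 1]
w, w0 = train_logistic(X, r)
predictions = [int(sigmoid(w[0] * x[0] + w0) > 0.5) for x in X]
```

Because this sample is linearly separable, the weights keep growing with more epochs; in practice training is stopped early or regularized.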

20
K>2 Classes
softmax
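A minimal sketch of the softmax function, which turns K discriminant values into posteriors that sum to 1. Subtracting the maximum score before exponentiating is a numerical-stability choice assumed here; it does not change the result.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Return exp(s_i) / sum_j exp(s_j), shifted by max(scores) for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative discriminant values for three classes.
posteriors = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest posterior, so choosing the maximum posterior gives the same class as choosing the maximum discriminant.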

22
Example

23
Generalizing the Linear Model
Quadratic: $\log \frac{p(x \mid C_1)}{p(x \mid C_2)} = x^T W x + w^T x + w_0$
Sum of basis functions: $\log \frac{p(x \mid C_1)}{p(x \mid C_2)} = w^T \phi(x) + w_0$, where $\phi(x)$ are basis functions
Hidden units in neural networks (Chapters 11 and 12)
Kernels in SVM (Chapter 13)

24
Discrimination by Regression
Classes are NOT mutually exclusive and exhaustive
