Published by Tomas Rawlings. Modified over 2 years ago.

1
Introduction to Machine Learning: Parametric Methods

2
Parametric Estimation
$X = \{x^t\}_t$ where $x^t \sim p(x)$
Parametric estimation: Assume a form for $p(x \mid \theta)$ and estimate $\theta$, its sufficient statistics, using $X$
e.g., $\mathcal{N}(\mu, \sigma^2)$ where $\theta = \{\mu, \sigma^2\}$

3
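Maximum Likelihood Estimation
Likelihood of $\theta$ given the sample $X$: $l(\theta \mid X) = p(X \mid \theta) = \prod_t p(x^t \mid \theta)$
Log likelihood: $\mathcal{L}(\theta \mid X) = \log l(\theta \mid X) = \sum_t \log p(x^t \mid \theta)$
Maximum likelihood estimator: $\theta^* = \arg\max_\theta \mathcal{L}(\theta \mid X)$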

4
Examples: Bernoulli/Multinomial
Bernoulli: Two states, failure/success, $x \in \{0, 1\}$
$P(x) = p_o^{x} (1 - p_o)^{1 - x}$
$\mathcal{L}(p_o \mid X) = \log \prod_t p_o^{x^t} (1 - p_o)^{1 - x^t}$
MLE: $p_o = \sum_t x^t / N$
Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$
$P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
$\mathcal{L}(p_1, p_2, \ldots, p_K \mid X) = \log \prod_t \prod_i p_i^{x_i^t}$
MLE: $p_i = \sum_t x_i^t / N$
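Both MLEs above reduce to relative frequencies. A minimal sketch of that computation follows; the function names and sample values are hypothetical, and the multinomial draws are stored as state indices rather than one-hot vectors, which is an equivalent encoding chosen here for brevity.

```python
from collections import Counter


def bernoulli_mle(xs):
    """MLE of p_o: the fraction of successes, sum_t x^t / N."""
    return sum(xs) / len(xs)


def multinomial_mle(states, K):
    """MLE of p_i for i = 0..K-1: the fraction of draws that landed in state i.

    `states` holds the index of the active state for each draw, which carries
    the same information as the one-hot x_i^t notation.
    """
    counts = Counter(states)
    return [counts[i] / len(states) for i in range(K)]


sample = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical Bernoulli sample
print(bernoulli_mle(sample))

draws = [0, 2, 1, 2, 2, 0]  # hypothetical multinomial draws with K = 3
print(multinomial_mle(draws, 3))
```

The multinomial estimates always sum to one because every draw is counted in exactly one state.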

5
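Gaussian (Normal) Distribution
$p(x) = \mathcal{N}(\mu, \sigma^2)$, that is, $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$
MLE for $\mu$ and $\sigma^2$: $m = \sum_t x^t / N$ and $s^2 = \sum_t (x^t - m)^2 / N$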
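For a Gaussian, the ML estimates are the sample mean and the sample variance with divisor N (not N - 1). The sketch below computes them and also evaluates the log likelihood, so that one can check numerically that the estimates score at least as well as nearby parameter values. Function names and data are illustrative assumptions.

```python
import math


def gaussian_mle(xs):
    """Return (m, s2): the sample mean and the divide-by-N sample variance."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return m, s2


def gaussian_log_likelihood(xs, mu, sigma2):
    """Log likelihood of an i.i.d. sample under N(mu, sigma2)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
m, s2 = gaussian_mle(data)
print(m, s2)
print(gaussian_log_likelihood(data, m, s2))
```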

6
Bias and Variance
Unknown parameter $\theta$
Estimator $d_i = d(X_i)$ on sample $X_i$
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E[(d - E[d])^2]$
Mean square error: $r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2] = \text{Bias}^2 + \text{Variance}$
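The step from the mean square error to bias squared plus variance is not spelled out on the slide. One way to see it, using only the definitions above, is to add and subtract $E[d]$ inside the square:

```latex
\begin{align*}
E\big[(d - \theta)^2\big]
  &= E\big[(d - E[d] + E[d] - \theta)^2\big] \\
  &= E\big[(d - E[d])^2\big]
     + 2\,(E[d] - \theta)\,E\big[d - E[d]\big]
     + (E[d] - \theta)^2 \\
  &= \underbrace{E\big[(d - E[d])^2\big]}_{\text{Variance}}
     + \underbrace{(E[d] - \theta)^2}_{\text{Bias}^2}
\end{align*}
```

The cross term vanishes because $E[d] - \theta$ is a constant and $E[d - E[d]] = 0$.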

7
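Bayes' Estimator
Treat $\theta$ as a random variable with prior $p(\theta)$
Bayes' rule: $p(\theta \mid X) = p(X \mid \theta)\, p(\theta) / p(X)$
Full: $p(x \mid X) = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta$
Maximum a Posteriori (MAP): $\theta_{MAP} = \arg\max_\theta p(\theta \mid X)$
Maximum Likelihood (ML): $\theta_{ML} = \arg\max_\theta p(X \mid \theta)$
Bayes': $\theta_{Bayes} = E[\theta \mid X] = \int \theta\, p(\theta \mid X)\, d\theta$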
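To make the three point estimates concrete, here is a small sketch for a Bernoulli parameter. The Beta(a, b) prior is an assumption made for this illustration and does not come from the slides; it is chosen because the posterior is then Beta(a + s, b + N - s) in closed form, where s is the number of successes.

```python
def bernoulli_estimates(xs, a=2.0, b=2.0):
    """Return (ML, MAP, Bayes) estimates of a Bernoulli parameter.

    Assumes a Beta(a, b) prior (an illustrative choice), so the posterior
    is Beta(a + s, b + n - s) with s successes in n trials.
    """
    n, s = len(xs), sum(xs)
    ml = s / n                                   # argmax of p(X | theta)
    post_a, post_b = a + s, b + n - s
    # Posterior mode; this formula requires post_a > 1 and post_b > 1.
    map_estimate = (post_a - 1) / (post_a + post_b - 2)
    bayes = post_a / (post_a + post_b)           # posterior mean E[theta | X]
    return ml, map_estimate, bayes


print(bernoulli_estimates([1, 1, 1, 0]))  # hypothetical sample
```

With a small sample the prior pulls the MAP and Bayes' estimates toward 0.5; as N grows, all three estimates converge.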

8
Parametric Classification

9
Parametric Classification
Given the sample
ML estimates are
Discriminant becomes
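The equations for this slide did not survive extraction. As a hedged illustration of the idea, the sketch below assumes one-dimensional Gaussian class-conditional densities: it estimates a prior, mean, and variance per class by maximum likelihood and classifies with the log-posterior discriminant, dropping the term common to all classes. The function names, data, and the exact form of the discriminant are assumptions, not a transcription of the slide.

```python
import math


def fit_gaussian_classes(xs, labels):
    """Return {class: (prior, mean, variance)} from ML estimates.

    Each class needs at least two distinct values so its variance is nonzero.
    """
    n = len(xs)
    params = {}
    for c in set(labels):
        xc = [x for x, y in zip(xs, labels) if y == c]
        m = sum(xc) / len(xc)
        s2 = sum((x - m) ** 2 for x in xc) / len(xc)
        params[c] = (len(xc) / n, m, s2)
    return params


def discriminant(x, prior, m, s2):
    """log p(x | C) + log P(C) under the Gaussian assumption."""
    return (-0.5 * math.log(2 * math.pi * s2)
            - (x - m) ** 2 / (2 * s2)
            + math.log(prior))


def classify(x, params):
    """Pick the class with the largest discriminant value."""
    return max(params, key=lambda c: discriminant(x, *params[c]))


xs = [1.0, 1.5, 2.0, 5.0, 5.5, 6.0]  # hypothetical inputs
labels = [0, 0, 0, 1, 1, 1]          # hypothetical class labels
params = fit_gaussian_classes(xs, labels)
print(classify(1.2, params), classify(5.8, params))
```

In this toy sample the two classes have equal priors and equal variances, so the decision threshold falls midway between the two class means.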

10
(a) and (b) for two classes when the input is one-dimensional. Variances are equal and the posteriors intersect at one point, which is the threshold of decision.

11
Parametric Classification
(a) and (b) for two classes when the input is one-dimensional. Variances are unequal and the posteriors intersect at two points. In (c), the expected risks are shown for the two classes and for reject with

12
Regression

13
Regression: From LogL to Error

14
Linear Regression

15
Polynomial Regression

16
Other Error Measures
Square Error:
Relative Square Error:
Absolute Error: $E(\theta \mid X) = \sum_t |r^t - g(x^t \mid \theta)|$
$\epsilon$-sensitive Error: $E(\theta \mid X) = \sum_t 1(|r^t - g(x^t \mid \theta)| > \epsilon)\,(|r^t - g(x^t \mid \theta)| - \epsilon)$
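The four measures can be compared side by side on the same predictions. In the sketch below, the absolute and epsilon-sensitive errors follow the formulas on the slide. The square error (a plain sum of squared residuals, with no 1/2 factor) and the relative square error (squared residuals divided by the squared deviations of the targets from their mean) are assumed standard forms, since the slide's own equations for them were lost.

```python
def square_error(r, g):
    """Sum of squared residuals (assumed form, no 1/2 factor)."""
    return sum((ri - gi) ** 2 for ri, gi in zip(r, g))


def relative_square_error(r, g):
    """Square error relative to predicting the mean of r (assumed form)."""
    mean_r = sum(r) / len(r)
    return square_error(r, g) / sum((ri - mean_r) ** 2 for ri in r)


def absolute_error(r, g):
    """Sum of absolute residuals."""
    return sum(abs(ri - gi) for ri, gi in zip(r, g))


def eps_sensitive_error(r, g, eps):
    """Residuals within eps cost nothing; larger ones cost |residual| - eps."""
    return sum(abs(ri - gi) - eps
               for ri, gi in zip(r, g) if abs(ri - gi) > eps)


r = [1.0, 2.0, 3.0, 4.0]  # hypothetical targets r^t
g = [1.5, 2.0, 2.0, 4.0]  # hypothetical predictions g(x^t | theta)
print(square_error(r, g), relative_square_error(r, g))
print(absolute_error(r, g), eps_sensitive_error(r, g, eps=0.6))
```

The absolute and epsilon-sensitive errors grow linearly in the residual, so a single large outlier influences them less than it influences the square error.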

17
Bias and Variance
Labels on the decomposition: bias, variance, noise, squared error

18
Estimating Bias and Variance
$M$ samples $X_i = \{x_i^t, r_i^t\}$, $i = 1, \ldots, M$, are used to fit $g_i(x)$, $i = 1, \ldots, M$

19
Bias/Variance Dilemma
Example: $g_i(x) = 2$ has no variance and high bias
$g_i(x) = \sum_t r_i^t / N$ has lower bias with variance
As we increase complexity, bias decreases (a better fit to data) and variance increases (fit varies more with data)
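The two example estimators can be compared by simulation. The sketch below assumes a known target function, f(x) = 2 sin(1.5x) with N(0, 1) noise, draws M samples, and estimates squared bias and variance from the average fit. The estimation formulas used (squared gap between the average fit and f, and spread of the individual fits around their average) are assumed standard forms; the grid, seed, and M are arbitrary choices.

```python
import math
import random


def f(x):
    """Assumed true function for the simulation."""
    return 2 * math.sin(1.5 * x)


def estimate_bias_variance(fits, xs):
    """Estimate (bias^2, variance) of a list of fitted functions g_i on xs."""
    M = len(fits)
    g_bar = [sum(g(x) for g in fits) / M for x in xs]  # average fit per point
    bias2 = sum((gb - f(x)) ** 2 for gb, x in zip(g_bar, xs)) / len(xs)
    variance = sum((g(x) - gb) ** 2
                   for gb, x in zip(g_bar, xs) for g in fits) / (len(xs) * M)
    return bias2, variance


rng = random.Random(0)
xs = [5 * t / 19 for t in range(20)]  # N = 20 inputs on [0, 5]
M = 100
constant_fits, mean_fits = [], []
for _ in range(M):
    rs = [f(x) + rng.gauss(0, 1) for x in xs]       # one noisy sample
    constant_fits.append(lambda x: 2.0)             # g_i(x) = 2, ignores data
    mean_r = sum(rs) / len(rs)
    mean_fits.append(lambda x, c=mean_r: c)         # g_i(x) = sum_t r_i^t / N

bias2_const, var_const = estimate_bias_variance(constant_fits, xs)
bias2_mean, var_mean = estimate_bias_variance(mean_fits, xs)
print(bias2_const, var_const)
print(bias2_mean, var_mean)
```

The constant estimator never changes with the sample, so its variance is exactly zero, while the sample-mean estimator trades a small amount of variance for a lower bias.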

20
Bias/Variance Dilemma
(a) Function, $f(x) = 2\sin(1.5x)$, and one noisy ($\mathcal{N}(0, 1)$) dataset sampled from the function. Five samples are taken, each containing twenty instances. (b), (c), (d) are five polynomial fits, namely, $g_i(\cdot)$, of order 1, 3 and 5. For each case, the dotted line is the average of the five fits.

21
Polynomial Regression
Best fit: "min error"
In the same setting as the previous figure, but using one hundred models instead of five: bias, variance, and error for polynomials of order 1 to 5.

22
Model Selection
Cross-validation: Measure generalization accuracy by testing on data unused during training
Regularization: Penalize complex models, $E' = \text{error on data} + \lambda \cdot \text{model complexity}$
Akaike's information criterion (AIC), Bayesian information criterion (BIC)
Minimum description length (MDL): Kolmogorov complexity, shortest description of data
Structural risk minimization (SRM)
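Of the approaches listed, cross-validation is the most direct to sketch. The example below picks a polynomial order by k-fold cross-validation on synthetic data. The data-generating function, the interleaved fold assignment, and the small normal-equations solver are all assumptions made to keep the example self-contained; a numerical library would normally replace the solver.

```python
import math
import random


def fit_poly(xs, rs, order):
    """Least-squares polynomial coefficients, lowest degree first.

    Solves the normal equations by Gaussian elimination with partial pivoting.
    """
    n = order + 1
    # Augmented matrix [A | b]: A[j][k] = sum x^(j+k), b[j] = sum r * x^j.
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)]
         + [sum(r * x ** j for x, r in zip(xs, rs))] for j in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n + 1):
                A[row][k] -= factor * A[col][k]
    w = [0.0] * n
    for j in reversed(range(n)):  # back substitution
        w[j] = (A[j][n] - sum(A[j][k] * w[k] for k in range(j + 1, n))) / A[j][j]
    return w


def predict(w, x):
    return sum(c * x ** k for k, c in enumerate(w))


def cv_error(xs, rs, order, folds=5):
    """Mean squared error on held-out points, averaged over all folds."""
    total = 0.0
    for fold in range(folds):
        pairs = list(enumerate(zip(xs, rs)))
        train = [p for i, p in pairs if i % folds != fold]
        valid = [p for i, p in pairs if i % folds == fold]
        w = fit_poly([x for x, _ in train], [r for _, r in train], order)
        total += sum((r - predict(w, x)) ** 2 for x, r in valid)
    return total / len(xs)


rng = random.Random(0)
xs = [5 * t / 39 for t in range(40)]                        # synthetic inputs
rs = [2 * math.sin(1.5 * x) + rng.gauss(0, 1) for x in xs]  # noisy targets
errors = {order: cv_error(xs, rs, order) for order in range(1, 6)}
best_order = min(errors, key=errors.get)
print(errors, best_order)
```

Every point is used for validation exactly once, so the reported error reflects performance on data that the corresponding fit never saw.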

23
Model Selection
Best fit, "elbow"

24
Bayesian Model Selection
Prior on models, $p(\text{model})$
Regularization, when the prior favors simpler models
Bayes, MAP of the posterior, $p(\text{model} \mid \text{data})$
Average over a number of models with high posterior

25
Regression example
Coefficients increase in magnitude as order increases:
1: [ , ]
2: [0.1682, , ]
3: [0.4238, , , ]
4: [ , , , , ]
