
1 INTRODUCTION TO Machine Learning: Parametric Methods

2 Parametric Estimation
Sample $X = \{x^t\}_{t=1}^N$ where $x^t \sim p(x)$.
Parametric estimation: assume a form for $p(x|\theta)$ and estimate $\theta$, its sufficient statistics, using $X$.
Example: $p(x) = N(\mu, \sigma^2)$ where $\theta = \{\mu, \sigma^2\}$.

3 Maximum Likelihood Estimation
Likelihood of $\theta$ given the sample $X$: $l(\theta|X) = p(X|\theta) = \prod_t p(x^t|\theta)$
Log likelihood: $\mathcal{L}(\theta|X) = \log l(\theta|X) = \sum_t \log p(x^t|\theta)$
Maximum likelihood estimator: $\theta^* = \arg\max_\theta \mathcal{L}(\theta|X)$
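
The slide gives only the definitions; as a hedged illustration, here is a minimal numpy sketch that evaluates the Gaussian log likelihood $\mathcal{L}(\mu|X)$ on a grid and picks the maximizer. The sample, the known $\sigma$, and the grid bounds are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=100)   # sample x^t ~ N(2, 1)

def log_likelihood(mu, X, sigma=1.0):
    # L(mu|X) = sum_t log p(x^t | mu) for a Gaussian with known sigma
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (X - mu) ** 2 / (2 * sigma**2))

grid = np.linspace(-5, 5, 1001)                # candidate values of theta = mu
L = np.array([log_likelihood(mu, X) for mu in grid])
mu_star = grid[np.argmax(L)]                   # theta* = argmax_theta L(theta|X)
print(mu_star, X.mean())                       # grid MLE vs closed-form MLE
```

Up to grid resolution, the maximizer matches the closed-form MLE (the sample mean).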

4 Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, $x \in \{0,1\}$:
$P(x) = p_0^x (1 - p_0)^{1-x}$
$\mathcal{L}(p_0|X) = \log \prod_t p_0^{x^t} (1 - p_0)^{1 - x^t}$
MLE: $\hat{p}_0 = \sum_t x^t / N$
Multinomial: $K > 2$ states, $x_i \in \{0,1\}$:
$P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
$\mathcal{L}(p_1, p_2, \ldots, p_K|X) = \log \prod_t \prod_i p_i^{x_i^t}$
MLE: $\hat{p}_i = \sum_t x_i^t / N$
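
A short sketch of both closed-form MLEs on simulated data (the probabilities, the number of states, and the sample size are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Bernoulli MLE: p0_hat = sum_t x^t / N
x = rng.binomial(1, 0.3, size=200)
p0_hat = x.mean()

# Multinomial MLE: p_i_hat = sum_t x_i^t / N, with one-hot indicators x_i^t
labels = rng.integers(0, 4, size=200)   # K = 4 states
onehot = np.eye(4)[labels]              # x_i^t in {0,1}
p_hat = onehot.sum(axis=0) / len(labels)
print(p0_hat, p_hat)
```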

5 Gaussian (Normal) Distribution
$p(x) = N(\mu, \sigma^2)$: $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$
MLE for $\mu$ and $\sigma^2$: $m = \frac{1}{N}\sum_t x^t$, $\quad s^2 = \frac{1}{N}\sum_t (x^t - m)^2$
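
A two-line check of the Gaussian MLE formulas on simulated data (the true parameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=1.5, scale=2.0, size=500)

m  = X.sum() / len(X)                # m   = sum_t x^t / N
s2 = ((X - m) ** 2).sum() / len(X)   # s^2 = sum_t (x^t - m)^2 / N (biased MLE)
print(m, s2)
```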

6 Bias and Variance
Unknown parameter $\theta$; estimator $d_i = d(X_i)$ on sample $X_i$.
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E\left[(d - E[d])^2\right]$
Mean square error: $r(d, \theta) = E\left[(d - \theta)^2\right] = (E[d] - \theta)^2 + E\left[(d - E[d])^2\right] = \text{Bias}^2 + \text{Variance}$
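
To make the decomposition concrete, a Monte Carlo sketch that checks mean square error ≈ bias² + variance. The choice of estimator (the biased sample variance) and all sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, N, M = 1.0, 10, 100_000        # true variance, sample size, replications

# Estimator d(X) = biased sample variance of N draws from N(0, theta)
samples = rng.normal(0.0, np.sqrt(theta), size=(M, N))
d = samples.var(axis=1)               # ddof=0: the biased MLE estimator

bias     = d.mean() - theta           # b_theta(d) = E[d] - theta
variance = d.var()                    # E[(d - E[d])^2]
mse      = ((d - theta) ** 2).mean()  # r(d, theta) = E[(d - theta)^2]
print(mse, bias**2 + variance)        # approximately equal
```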

7 Bayes' Estimator
Treat $\theta$ as a random variable with prior $p(\theta)$.
Bayes' rule: $p(\theta|X) = p(X|\theta)\, p(\theta)\, /\, p(X)$
Full: $p(x|X) = \int p(x|\theta)\, p(\theta|X)\, d\theta$
Maximum a posteriori (MAP): $\theta_{MAP} = \arg\max_\theta p(\theta|X)$
Maximum likelihood (ML): $\theta_{ML} = \arg\max_\theta p(X|\theta)$
Bayes': $\theta_{Bayes} = E[\theta|X] = \int \theta\, p(\theta|X)\, d\theta$
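
A hedged sketch of the Bayes' estimator in the conjugate Gaussian case, where the posterior is itself Gaussian and $\theta_{MAP} = \theta_{Bayes}$ is a precision-weighted average of the sample mean and the prior mean. All parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, mu0, sigma0_2 = 1.0, 0.0, 0.1   # known noise variance; Gaussian prior N(mu0, sigma0^2)
X = rng.normal(2.0, np.sqrt(sigma2), size=20)
N, m = len(X), X.mean()

# Gaussian likelihood + Gaussian prior: the posterior is Gaussian, so
# theta_MAP = theta_Bayes = posterior mean (precision-weighted average)
w = (N / sigma2) / (N / sigma2 + 1 / sigma0_2)
theta_bayes = w * m + (1 - w) * mu0
print(m, theta_bayes)                    # ML estimate vs Bayes' estimate
```

With a strong prior (small $\sigma_0^2$), the estimate is pulled toward $\mu_0$; as $N$ grows, it approaches the ML estimate.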

8 Parametric Classification

9 Parametric Classification
Given the sample $X = \{x^t, r^t\}_{t=1}^N$, the ML estimates are
$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \quad m_i = \frac{\sum_t x^t r_i^t}{\sum_t r_i^t}, \quad s_i^2 = \frac{\sum_t (x^t - m_i)^2 r_i^t}{\sum_t r_i^t}$
and the discriminant becomes
$g_i(x) = -\frac{1}{2}\log 2\pi - \log s_i - \frac{(x - m_i)^2}{2 s_i^2} + \log \hat{P}(C_i)$
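
A minimal sketch of this classifier on hypothetical 1-D two-class data (the class distributions and sizes are assumptions): estimate priors, means, and variances per class, then evaluate the Gaussian discriminant.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical data: two 1-D Gaussian classes with different means/variances
x = np.concatenate([rng.normal(0, 1, 60), rng.normal(3, 2, 40)])
r = np.concatenate([np.zeros(60, int), np.ones(40, int)])

priors, means, vars_ = [], [], []
for i in (0, 1):
    xi = x[r == i]
    priors.append(len(xi) / len(x))   # P_hat(C_i)
    means.append(xi.mean())           # m_i
    vars_.append(xi.var())            # s_i^2 (MLE, ddof=0)

def g(xq, i):
    # Discriminant: log p(x|C_i) + log P_hat(C_i) for a 1-D Gaussian class
    return (-0.5 * np.log(2 * np.pi * vars_[i])
            - (xq - means[i]) ** 2 / (2 * vars_[i])
            + np.log(priors[i]))

xq = 1.2
print(int(g(xq, 1) > g(xq, 0)))       # choose the class with the larger g_i
```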

10 [Figure] Likelihoods $p(x|C_i)$ (a) and posteriors $P(C_i|x)$ (b) for two classes when the input is one-dimensional. Variances are equal and the posteriors intersect at one point, which is the threshold of decision.

11 Parametric Classification
[Figure] Likelihoods $p(x|C_i)$ (a) and posteriors $P(C_i|x)$ (b) for two classes when the input is one-dimensional. Variances are unequal and the posteriors intersect at two points. In (c), the expected risks are shown for the two classes and for the reject option.

12 Regression

13 Regression: From LogL to Error
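
The derivation this slide title refers to was carried by slide images; the following LaTeX sketch reconstructs the standard argument, assuming the usual model $r^t = f(x^t) + \epsilon$ with $\epsilon \sim N(0, \sigma^2)$: maximizing the log likelihood is equivalent to minimizing the sum of squared errors.

```latex
\begin{align*}
\mathcal{L}(\theta \mid X)
  &= \log \prod_{t=1}^{N} p(x^t, r^t)
   = \sum_t \log p(r^t \mid x^t) + \sum_t \log p(x^t) \\
  &\propto \sum_t \log \exp\!\left[-\frac{(r^t - g(x^t \mid \theta))^2}{2\sigma^2}\right]
   \qquad \text{since } p(r \mid x) \sim \mathcal{N}\!\left(g(x \mid \theta), \sigma^2\right) \\
  &\Longrightarrow\; \text{maximizing } \mathcal{L}(\theta \mid X)
   \text{ is minimizing } E(\theta \mid X)
   = \tfrac{1}{2} \sum_t \left[ r^t - g(x^t \mid \theta) \right]^2 .
\end{align*}
```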

14 Linear Regression
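
A minimal least-squares sketch (the data-generating line and noise level are assumptions): fit $g(x|w_1, w_0) = w_1 x + w_0$ by solving the normal equations with numpy.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 5, size=50)
r = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)    # r = f(x) + noise

A = np.column_stack([x, np.ones_like(x)])          # design matrix [x, 1]
(w1, w0), *_ = np.linalg.lstsq(A, r, rcond=None)   # minimizes sum_t (r^t - w1 x^t - w0)^2
print(w1, w0)
```

Polynomial regression (next slide) is the same computation with higher powers of $x$ added as columns of the design matrix.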

15 Polynomial Regression

16 Other Error Measures
Square error: $E(\theta|X) = \frac{1}{N}\sum_t \left[ r^t - g(x^t|\theta) \right]^2$
Relative square error: $E(\theta|X) = \frac{\sum_t \left[ r^t - g(x^t|\theta) \right]^2}{\sum_t \left[ r^t - \bar{r} \right]^2}$
Absolute error: $E(\theta|X) = \sum_t |r^t - g(x^t|\theta)|$
$\varepsilon$-sensitive error: $E(\theta|X) = \sum_t \mathbf{1}\left(|r^t - g(x^t|\theta)| > \varepsilon\right)\left(|r^t - g(x^t|\theta)| - \varepsilon\right)$
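
The four error measures as plain numpy functions (the toy targets and the default $\varepsilon$ are assumptions):

```python
import numpy as np

def square_error(r, g):            # mean of squared residuals
    return np.mean((r - g) ** 2)

def relative_square_error(r, g):   # normalized by spread around the mean
    return np.sum((r - g) ** 2) / np.sum((r - r.mean()) ** 2)

def absolute_error(r, g):
    return np.sum(np.abs(r - g))

def eps_sensitive_error(r, g, eps=0.1):
    d = np.abs(r - g)
    return np.sum((d > eps) * (d - eps))   # zero penalty inside the eps tube

r = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.1, 1.9, 3.3, 3.8])
print(square_error(r, g), relative_square_error(r, g),
      absolute_error(r, g), eps_sensitive_error(r, g))
```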

17 Bias and Variance
$E\big[(r - g(x))^2 \,\big|\, x\big] = \underbrace{E\big[(r - E[r|x])^2 \,\big|\, x\big]}_{\text{noise}} + \underbrace{\big(E[r|x] - g(x)\big)^2}_{\text{squared error}}$
$E_X\big[\big(E[r|x] - g(x)\big)^2 \,\big|\, x\big] = \underbrace{\big(E[r|x] - E_X[g(x)]\big)^2}_{\text{bias}^2} + \underbrace{E_X\big[\big(g(x) - E_X[g(x)]\big)^2\big]}_{\text{variance}}$

18 Estimating Bias and Variance
$M$ samples $X_i = \{x^t_i, r^t_i\}$, $i = 1, \ldots, M$, are used to fit $g_i(x)$, $i = 1, \ldots, M$:
$\bar{g}(x) = \frac{1}{M}\sum_i g_i(x)$
$\text{Bias}^2(g) = \frac{1}{N}\sum_t \left[ \bar{g}(x^t) - f(x^t) \right]^2$
$\text{Variance}(g) = \frac{1}{N M}\sum_t \sum_i \left[ g_i(x^t) - \bar{g}(x^t) \right]^2$
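
A sketch of this procedure; the function $f(x) = 2\sin(1.5x)$, the five samples of twenty instances, and the $N(0,1)$ noise are borrowed from the figure two slides below, while the polynomial order and evaluation grid are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
f = lambda x: 2 * np.sin(1.5 * x)       # true function f(x)
M, N, order = 5, 20, 3

xs = np.linspace(0, 5, 50)              # evaluation points
fits = []
for i in range(M):                      # M samples X_i, each of N instances
    x = rng.uniform(0, 5, N)
    r = f(x) + rng.normal(0, 1, N)      # noisy targets, N(0,1) noise
    coefs = np.polyfit(x, r, order)     # fit g_i(x)
    fits.append(np.polyval(coefs, xs))
fits = np.array(fits)

g_bar    = fits.mean(axis=0)                 # average fit g_bar(x)
bias2    = np.mean((g_bar - f(xs)) ** 2)     # Bias^2
variance = np.mean((fits - g_bar) ** 2)      # Variance
print(bias2, variance)
```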

19 Bias/Variance Dilemma
Example: $g_i(x) = 2$ has no variance and high bias; $g_i(x) = \sum_t r^t_i / N$ has lower bias but some variance.
As we increase complexity, bias decreases (a better fit to data) and variance increases (the fit varies more with data).

20 Bias/Variance Dilemma
[Figure] (a) Function $f(x) = 2\sin(1.5x)$ and one noisy ($N(0,1)$) dataset sampled from the function. Five samples are taken, each containing twenty instances. (b), (c), (d) are five polynomial fits, $g_i(\cdot)$, of order 1, 3, and 5. For each case, the dotted line is the average of the five fits, $\bar{g}(\cdot)$.

21 Polynomial Regression
[Figure] In the same setting as the previous figure, using one hundred models instead of five: bias, variance, and error for polynomials of order 1 to 5. The best-fitting order minimizes the total error.

22 Model Selection
Cross-validation: measure generalization accuracy by testing on data unused during training.
Regularization: penalize complex models; E = error on data + λ · model complexity.
Akaike's information criterion (AIC), Bayesian information criterion (BIC).
Minimum description length (MDL): Kolmogorov complexity, shortest description of data.
Structural risk minimization (SRM).
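
A hedged cross-validation sketch that selects a polynomial order by held-out squared error; the data generator, fold count, and candidate orders are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 5, 60)
r = 2 * np.sin(1.5 * x) + rng.normal(0, 1, 60)

def cv_error(order, k=5):
    # k-fold cross-validation: average squared error on held-out folds
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        coefs = np.polyfit(x[train], r[train], order)
        pred = np.polyval(coefs, x[fold])
        errs.append(np.mean((r[fold] - pred) ** 2))
    return np.mean(errs)

scores = {order: cv_error(order) for order in range(1, 6)}
best = min(scores, key=scores.get)   # order with the lowest validation error
print(scores, best)
```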

23 Model Selection
[Figure] The best fit is at the "elbow" of the error curve.

24 Bayesian Model Selection
Prior on models, $p(\text{model})$:
$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})}$
Regularization, when the prior favors simpler models.
Bayes: MAP of the posterior, $p(\text{model} \mid \text{data})$.
Average over a number of models with high posterior probability.
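
As an illustration of "regularization when the prior favors simpler models" (not from the slides): with a Gaussian prior on the coefficients, the MAP fit is ridge-penalized least squares. The data, model order, and λ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(0, 5, 40)
r = 2 * np.sin(1.5 * x) + rng.normal(0, 1, 40)

order, lam = 5, 1.0                   # assumed model order and penalty weight
A = np.vander(x, order + 1)           # polynomial design matrix

# MAP with Gaussian prior w ~ N(0, (sigma^2/lam) I)  <=>  ridge regression:
# w = argmin ||r - A w||^2 + lam ||w||^2, solved in closed form
w = np.linalg.solve(A.T @ A + lam * np.eye(order + 1), A.T @ r)
print(w)                              # coefficients shrunk toward zero
```

Larger λ (a tighter prior) shrinks the coefficients harder, trading variance for bias.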

25 Regression example
Coefficients increase in magnitude as order increases:
1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002]
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]

