
Machine Learning 5. Parametric Methods.


1 Machine Learning 5. Parametric Methods

2 Parametric Methods We need probabilities (prior, evidence, likelihood) to make decisions. Each probability is a function of the input (the observables). We represent this function by selecting its general form (a model) with several unknown parameters, then finding (estimating) the parameters from data so that they optimize a certain criterion (e.g. minimize generalization error). Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

3 Parametric Estimation
Assume the sample comes from a distribution known up to its parameters. Sufficient statistics: statistics that completely define the distribution (e.g., mean and variance). X = { x^t }_t where x^t ~ p(x). Parametric estimation: assume a form for p(x | θ) and estimate θ, its sufficient statistics, using X; e.g., N(μ, σ²) where θ = { μ, σ² }. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

4 Estimation
Assume the form of the distribution, estimate its sufficient statistics (parameters) from a sample, and use the estimated distribution for classification or regression. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

5 Maximum Likelihood Estimation
X consists of independent and identically distributed (iid) samples. Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ). Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ). Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X). Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

6 Why use the log likelihood
The log is an increasing function (a larger input gives a larger output), so maximizing the log of a function is equivalent to maximizing the function itself. The log also converts products into sums: log(abc) = log(a) + log(b) + log(c). This makes analysis and computation simpler. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

7 Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, x ∈ {0, 1}, with P(x) = p_o^x (1 − p_o)^(1−x). Setting dL(p_o|X)/dp_o = 0 gives the MLE p̂_o = ∑_t x^t / N. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
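
A minimal sketch (my addition, not part of the slides) of this result: the Bernoulli log likelihood evaluated on a grid of p values peaks at the closed-form MLE ∑_t x^t / N.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.binomial(1, 0.3, size=200)          # iid Bernoulli sample, true p = 0.3

    # Log likelihood L(p|X) = sum_t [x^t log p + (1 - x^t) log(1 - p)]
    grid = np.linspace(0.001, 0.999, 999)
    loglik = np.array([np.sum(X * np.log(p) + (1 - X) * np.log(1 - p)) for p in grid])

    p_grid = grid[np.argmax(loglik)]            # numerical maximizer on the grid
    p_mle = X.mean()                            # closed form: sum_t x^t / N
    print(p_grid, p_mle)                        # both close to 0.3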

8 Examples: Multinomial
The multinomial generalizes the Bernoulli: the outcome is one of K mutually exclusive states, with state i occurring with probability p_i. P(x_1, x_2, ..., x_K) = ∏_i p_i^{x_i}, L(p_1, p_2, ..., p_K | X) = log ∏_t ∏_i p_i^{x_i^t}, and the MLE is p̂_i = ∑_t x_i^t / N. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

9 Gaussian (Normal) Distribution
p(x) = N(μ, σ²), i.e. p(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²)). The maximum likelihood estimates are the sample mean m = ∑_t x^t / N and the sample variance s² = ∑_t (x^t − m)² / N. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
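
A small sketch (my addition; the true μ and σ are illustrative assumptions) of the Gaussian MLE: the estimates are the sample mean and the sample variance with divisor N.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(loc=5.0, scale=2.0, size=500)   # sample from N(mu=5, sigma=2)

    m = X.sum() / len(X)                           # MLE of mu: sample mean
    s2 = ((X - m) ** 2).sum() / len(X)             # MLE of sigma^2 (divisor N, not N-1)
    print(m, s2)                                   # roughly 5 and 4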

10 Bias and Variance of an estimator
The actual value of an estimator depends on the data: drawing different samples of size N from the true distribution gives different values of the estimator. The value of an estimator is therefore a random variable, and we can ask about its mean and variance. The difference between the mean of the estimator and the true value of the parameter is the bias of the estimator. We usually look for a formula that yields an unbiased estimator with small variance. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

11 Bias and Variance Unknown parameter θ
Estimator d_i = d(X_i) on sample X_i. Bias: b_θ(d) = E[d] − θ. Variance: E[(d − E[d])²]. Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

12 Mean Estimator The sample mean m = ∑_t x^t / N is an unbiased estimator of μ: E[m] = μ, and Var(m) = σ²/N, which shrinks as N grows. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

13 Variance Estimator The sample variance s² = ∑_t (x^t − m)² / N is a biased estimator of σ²: E[s²] = ((N − 1)/N) σ². Dividing by N − 1 instead of N gives an unbiased estimator. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
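
A simulation sketch (assumed setup, not from the slides) illustrating this bias: averaging s² over many samples gives roughly ((N−1)/N)σ², while the N−1 version is approximately unbiased.

    import numpy as np

    rng = np.random.default_rng(2)
    N, sigma2, trials = 10, 4.0, 100_000

    samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
    m = samples.mean(axis=1, keepdims=True)

    s2_biased = ((samples - m) ** 2).sum(axis=1) / N          # divisor N
    s2_unbiased = ((samples - m) ** 2).sum(axis=1) / (N - 1)  # divisor N-1

    print(s2_biased.mean())     # ~ (N-1)/N * sigma2 = 3.6
    print(s2_unbiased.mean())   # ~ sigma2 = 4.0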

14 Optimal Estimators Estimators are random variables, since they depend on a random sample from the distribution. We look for estimators (functions of the sample) that have minimum expected square error. The expected square error can be decomposed into bias (drift) and variance. Sometimes we accept a larger error but require an unbiased estimator. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

15 Bayes’ Estimator Treat θ as a random variable with prior p(θ)
Bayes’ rule: p (θ|X) = p(X|θ) p(θ) / p(X) Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ Maximum a Posteriori (MAP): θMAP = argmaxθ p(θ|X) Maximum Likelihood (ML): θML = argmaxθ p(X|θ) Bayes’: θBayes’ = E[θ|X] = ∫ θ p(θ|X) dθ Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

16 Bayes Estimator Sometimes we have prior information on the range of values a parameter may take. We cannot give its exact value, but we can assign a probability to each value (a density function): the probability of the parameter before looking at the data, i.e. the prior. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

17 Bayes Estimator Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

18 Bayes Estimator Assume the posterior density has a narrow peak around the true value of the parameter; then using the maximum a posteriori estimate keeps the error small. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

19 Bayes’ Estimator: Example
x^t ~ N(θ, σ_o²) and θ ~ N(μ, σ²). θ_ML = m (the sample mean), and θ_MAP = θ_Bayes = [(N/σ_o²)/(N/σ_o² + 1/σ²)] m + [(1/σ²)/(N/σ_o² + 1/σ²)] μ, a weighted average of the sample mean and the prior mean. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
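
A numeric sketch of this weighted average (my addition; the true θ, the prior parameters, and the sample size are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    theta_true, sigma0, mu, sigma = 2.0, 1.0, 0.0, 0.5   # assumed illustrative values
    N = 20

    X = rng.normal(theta_true, sigma0, size=N)
    m = X.mean()                                          # theta_ML

    w = (N / sigma0**2) / (N / sigma0**2 + 1 / sigma**2)  # weight on the sample mean
    theta_map = w * m + (1 - w) * mu                      # shrunk towards the prior mean
    print(m, theta_map)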

20 Parametric Classification
Using Bayes' rule, the discriminant for class C_i can be taken as g_i(x) = p(x|C_i) P(C_i), or equivalently g_i(x) = log p(x|C_i) + log P(C_i). Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

21 Assumptions Assume the class likelihoods are Gaussian, p(x|C_i) ~ N(μ_i, σ_i²). Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

22 Example Input: customer income
Output: which of K car models the customer will prefer. Assume that each model attracts a group of customers with a certain income level, and that income within a single class is normally distributed. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

23 Example: True distributions
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

24 Example: Estimate parameters
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

25 Example: Discriminant
Evaluate the discriminant of each class on a new input and select the class with the largest discriminant. If the priors and variances are equal, this amounts to assigning x to the class whose mean is nearest. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
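
A sketch of this rule for the one-dimensional Gaussian case (my addition; the two classes, their incomes, and the query point are assumed for illustration):

    import numpy as np

    def fit_class_gaussians(x, y, K):
        """Estimate prior, mean, and variance of each class from labelled 1-D data."""
        params = []
        for i in range(K):
            xi = x[y == i]
            params.append((len(xi) / len(x), xi.mean(), xi.var()))
        return params

    def discriminant(x_new, prior, mean, var):
        # g_i(x) = log p(x|C_i) + log P(C_i) for a Gaussian class likelihood
        return -0.5 * np.log(2 * np.pi * var) - (x_new - mean) ** 2 / (2 * var) + np.log(prior)

    rng = np.random.default_rng(4)
    x = np.concatenate([rng.normal(20, 3, 100), rng.normal(35, 3, 100)])   # e.g. incomes
    y = np.repeat([0, 1], 100)

    params = fit_class_gaussians(x, y, K=2)
    g = [discriminant(27.0, *p) for p in params]   # evaluate both discriminants at x = 27
    print(int(np.argmax(g)))                        # pick the class with the largest g_i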

26 Boundary between classes
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

27 Equal variances: a single boundary, halfway between the means
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

28 Variances are different
Two boundaries Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

29 Two approaches Likelihood-based approach (used so far)
Calculate the probabilities using Bayes' rule, then compute the discriminant function. Discriminant-based approach (covered later): estimate the discriminant directly, bypassing probability estimation. Either way, the result is a boundary between classes. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

30 Regression x is the independent variable, r is the dependent variable
r = f(x) + ε for an unknown f; we want to approximate f in order to predict future values. Parametric approach: assume a model with a small number of parameters, y = g(x | θ), find the best parameters from the data, and also make an assumption about the noise ε. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

31 Regression We have training data (x, r)
Find the parameters that maximize the likelihood; in other words, the parameters that make the data most probable. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

32 Regression Ignore the term that does not depend on the parameters (the constant term of the Gaussian log likelihood)
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

33 Regression Maximizing the likelihood is then equivalent to minimizing the remaining term, the sum of squared errors
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

34 Least Squares Estimate Minimize the sum of squared errors E(θ|X) = ∑_t [r^t − g(x^t|θ)]²
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

35 Linear Regression Assume a linear model, g(x | w_1, w_0) = w_1 x + w_0, and minimize the sum of squared errors
Setting the derivatives with respect to w_0 and w_1 to zero gives two linear equations in two unknowns, which are easy to solve; see the sketch below. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
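
A sketch of the closed-form solution of these two normal equations (my addition; the data are generated under an assumed linear model with Gaussian noise):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 10, size=50)
    r = 3.0 * x + 1.0 + rng.normal(0, 1, size=50)    # assumed true model w1=3, w0=1

    N = len(x)
    # Normal equations for g(x) = w1*x + w0, from setting dE/dw0 = dE/dw1 = 0:
    #   w0*N      + w1*sum(x)   = sum(r)
    #   w0*sum(x) + w1*sum(x^2) = sum(x*r)
    A = np.array([[N, x.sum()], [x.sum(), (x**2).sum()]])
    b = np.array([r.sum(), (x * r).sum()])
    w0, w1 = np.linalg.solve(A, b)
    print(w0, w1)                                     # close to 1 and 3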

36 Polynomial Regression
Generalize the linear model to a polynomial that is still linear in its parameters, g(x | w_k, ..., w_1, w_0) = w_k x^k + ... + w_1 x + w_0, so the least-squares solution is again found by solving linear equations; a sketch follows. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
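
A sketch of a degree-k fit via the Vandermonde design matrix and least squares (my addition; the degree, data, and true function are assumptions):

    import numpy as np

    def fit_poly(x, r, k):
        """Least-squares fit of a degree-k polynomial; columns are 1, x, ..., x^k."""
        D = np.vander(x, k + 1, increasing=True)
        w, *_ = np.linalg.lstsq(D, r, rcond=None)
        return w                                  # w[0] + w[1]*x + ... + w[k]*x^k

    rng = np.random.default_rng(6)
    x = np.linspace(-1, 1, 40)
    r = np.sin(3 * x) + rng.normal(0, 0.1, size=40)   # assumed data-generating process
    print(fit_poly(x, r, k=3))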

37 Bias and Variance The expected squared error at x decomposes into noise plus the squared error of the estimator, and the latter further splits into squared bias and variance: E[(r − g(x))² | x] = E[(r − E[r|x])² | x] + (E[r|x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²] = noise + bias² + variance
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

38 Bias and Variance Given a single point (x, r), what is the expected error? Variation comes both from the noise and from the training sample. The first term is due to noise: it does not depend on the estimator and cannot be removed. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

39 Variance The second term is the deviation of the estimator from the regression function
It depends on the estimator and on the training set, and is averaged over all possible training samples. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

40 Example: polynomial regression
As we increase the degree of the polynomial, bias decreases, since the model can fit the points better, while variance increases, since a small change in the training sample can produce a large change in the fitted parameters. This bias/variance dilemma holds for any machine learning system, so we need a way to find the model complexity that balances the two; a simulation sketch is given below. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
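
A simulation sketch (my own construction; the true function, noise level, and degrees are assumptions) that estimates bias² and variance by refitting polynomials on many training samples:

    import numpy as np

    rng = np.random.default_rng(7)
    f = lambda x: np.sin(3 * x)                 # assumed true regression function
    x = np.linspace(-1, 1, 25)                  # fixed training / evaluation inputs
    n_sets, noise = 200, 0.3

    for k in (1, 3, 9):                         # increasing polynomial degree
        fits = []
        for _ in range(n_sets):                 # many training samples: same x, fresh noise
            r = f(x) + rng.normal(0, noise, size=x.size)
            D = np.vander(x, k + 1, increasing=True)
            w, *_ = np.linalg.lstsq(D, r, rcond=None)
            fits.append(D @ w)
        fits = np.array(fits)                   # shape (n_sets, len(x))
        g_bar = fits.mean(axis=0)               # average fit over training sets
        bias2 = np.mean((g_bar - f(x)) ** 2)    # squared bias, averaged over x
        var = np.mean(fits.var(axis=0))         # variance, averaged over x
        print(k, round(bias2, 4), round(var, 4))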

41 Bias/Variance Dilemma
Example: g_i(x) = 2 has no variance but high bias; g_i(x) = ∑_t r_i^t / N has lower bias but nonzero variance. As we increase complexity, bias decreases (a better fit to the data) and variance increases (the fit varies more with the data): the bias/variance dilemma (Geman et al., 1992). Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

42 Bias/Variance Dilemma
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

43 Polynomial Regression
Best fit “min error” Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

44 Model Selection How do we select the right model complexity?
This is a different problem from estimating the model parameters, and there are several procedures for it. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

45 Cross-Validation We cannot calculate the bias and variance, since we do not know the true model, but we can estimate the total generalization error. Set aside a portion of the data as a validation set; for increasing model complexity, fit the parameters on the training portion and calculate the error on the validation set; stop when the validation error ceases to decrease or starts to increase. A sketch of this procedure appears below. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
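
A sketch of this validation-set procedure under assumed data (polynomial models of increasing degree; here we simply keep the degree with the lowest validation error):

    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.uniform(-1, 1, 120)
    r = np.sin(3 * x) + rng.normal(0, 0.2, size=120)   # assumed data-generating process

    x_tr, r_tr = x[:80], r[:80]                        # training portion
    x_va, r_va = x[80:], r[80:]                        # held-out validation portion

    best_k, best_err = None, np.inf
    for k in range(1, 10):                             # increasing model complexity
        D_tr = np.vander(x_tr, k + 1, increasing=True)
        w, *_ = np.linalg.lstsq(D_tr, r_tr, rcond=None)
        D_va = np.vander(x_va, k + 1, increasing=True)
        err = np.mean((r_va - D_va @ w) ** 2)          # error on the validation set
        if err < best_err:
            best_k, best_err = k, err
    print(best_k, best_err)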

46 Cross-Validation Best fit, “elbow”
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

47 Regularization Introduce a penalty for model complexity into the error function and find the model complexity (e.g. the degree of the polynomial) and parameters (coefficients) that minimize this augmented error. λ (lambda) is the weight of the complexity penalty: if λ is too large, only very simple models are admitted. A ridge-style sketch follows. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
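
A ridge-style sketch (my choice of penalty, not necessarily the one the slide intends): add λ·‖w‖² to the squared error, which gives the closed-form solution w = (DᵀD + λI)⁻¹ Dᵀr.

    import numpy as np

    def ridge_poly(x, r, k, lam):
        """Polynomial fit minimizing sum of squared errors + lam * ||w||^2."""
        D = np.vander(x, k + 1, increasing=True)
        return np.linalg.solve(D.T @ D + lam * np.eye(k + 1), D.T @ r)

    rng = np.random.default_rng(9)
    x = np.linspace(-1, 1, 30)
    r = np.sin(3 * x) + rng.normal(0, 0.2, size=30)    # assumed data

    for lam in (0.0, 0.1, 10.0):                 # larger lambda -> smaller coefficients
        w = ridge_poly(x, r, k=9, lam=lam)
        print(lam, round(np.abs(w).max(), 3))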

