The Method of Likelihood Hal Whitehead BIOL4062/5062

What is likelihood
Maximum likelihood
Maximum likelihood estimation
Likelihood ratio tests
Likelihood profile confidence intervals
Model selection:
– Likelihood ratio tests
– Akaike Information Criterion (AIC)
Likelihood and least-squares
Calculating likelihood

The Method of Likelihood
Observations: Y = {y1, y2, y3, ...}, e.g. weights of 30 crabs of known age and sex
Model specified by parameters μ1, μ2, μ3, ..., e.g.
y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e, where e ~ N(0, 1)
The LIKELIHOOD of Y is:
L = Probability(Y | Model & μ1, μ2, μ3, ...)

Likelihood
The LIKELIHOOD of Y is:
L = Probability(Y | Model & μ1, μ2, μ3, ...)
The LIKELIHOOD that Z became a criminal: the probability that Z became a criminal, given what we know of Z's characteristics and how those characteristics translate into the probability of being a criminal

The LIKELIHOOD of Y is:
L = Probability(Y | Model & μ1, μ2, μ3, ...)
We can work this out if we know μ1, μ2, μ3, ...
Weights of 30 crabs of known age and sex:
y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e
e.g. the probability of these 30 weights is 0.04 if:
– female weight at age 0, μ1 = 30.0
– growth parameter, μ2 = 0.7
– excess male weight, μ3 = 5.0
– residual s.d., μ4 = 6.3
L(μ1=30.0, μ2=0.7, μ3=5.0, μ4=6.3) = 0.04
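As a sketch of what this calculation looks like in practice, the Python snippet below evaluates the likelihood of a set of weights under the crab model for one choice of parameter values. The data here are simulated stand-ins, not the course dataset, and the function and variable names are hypothetical:

    import numpy as np
    from scipy.stats import norm

    def crab_likelihood(weights, age, sex, mu1, mu2, mu3, mu4):
        # y = mu1 + mu2*sqrt(Age) + mu3*Sex + mu4*e with e ~ N(0, 1),
        # so each weight is normal with the mean below and s.d. mu4
        mean = mu1 + mu2 * np.sqrt(age) + mu3 * sex
        # independent observations: the likelihood is the product of densities
        return np.prod(norm.pdf(weights, loc=mean, scale=mu4))

    # hypothetical stand-in data for 30 crabs
    rng = np.random.default_rng(1)
    age = rng.uniform(1.0, 8.0, size=30)
    sex = rng.integers(0, 2, size=30)          # 0 = female, 1 = male
    weights = 28.4 + 0.31 * np.sqrt(age) + 1.7 * sex + 3.9 * rng.standard_normal(30)

    L = crab_likelihood(weights, age, sex, mu1=30.0, mu2=0.7, mu3=5.0, mu4=6.3)
    print(L)   # one number: Prob(Y | model & these parameter values)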

If we do not know μ1, μ2, μ3, ..., the MAXIMUM LIKELIHOOD of Y is:
L(μ1, μ2, μ3, ...) = Max over μ1, μ2, ... of Prob(Y | μ1, μ2, μ3, ...)
e.g. the maximum probability of the 30 weights is 0.12 when:
– female weight at age 0, μ1 = 28.4
– growth parameter, μ2 = 0.31
– excess male weight, μ3 = 1.7
– residual s.d., μ4 = 3.9
These values are the Maximum Likelihood Estimators
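In practice this maximization is done numerically, usually on the log-likelihood (the product of many small probabilities underflows otherwise). A minimal sketch, reusing the stand-in data from the previous snippet:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def neg_log_lik(params, weights, age, sex):
        mu1, mu2, mu3, mu4 = params
        if mu4 <= 0:                     # residual s.d. must be positive
            return np.inf
        mean = mu1 + mu2 * np.sqrt(age) + mu3 * sex
        return -np.sum(norm.logpdf(weights, loc=mean, scale=mu4))

    res = minimize(neg_log_lik, x0=[20.0, 1.0, 1.0, 5.0],
                   args=(weights, age, sex), method="Nelder-Mead")
    mu_hat = res.x            # the maximum likelihood estimators
    max_log_L = -res.fun      # the maximized log-likelihood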

Maximum Likelihood
[Figure: likelihood plotted against μ1; the peak of the curve is the maximum likelihood, and the value of μ1 at the peak is the maximum likelihood estimator of μ1]

Maximum Likelihood
[Figure: two likelihood curves over μ1; a sharply peaked curve gives a precise estimate, a flat curve gives an imprecise estimate]

Likelihood Ratio Tests
If μ1, μ2, μ3, ..., μt is the true model, and μ1, μ2, μ3, ..., μt, ..., μg is a more general model, then:
G = 2·Log[L(μ1, μ2, μ3, ..., μg) / L(μ1, μ2, μ3, ..., μt)]
(twice the log of the ratio of the maximum likelihoods)
is distributed as χ² with g-t degrees of freedom for large sample sizes (asymptotically).
If G is unexpectedly large, then the data are unlikely to have come from the model μ1, μ2, μ3, ..., μt
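A small helper makes the test concrete (a sketch; the log-likelihood arguments are whatever your fitting code returns):

    from scipy.stats import chi2

    def lr_test(log_lik_general, log_lik_restricted, df):
        # G = 2 * log(L_general / L_restricted), referred to chi-square
        # with df = number of extra parameters in the general model (g - t)
        G = 2.0 * (log_lik_general - log_lik_restricted)
        p = chi2.sf(G, df)     # upper-tail probability
        return G, p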

Likelihood Ratio Tests
G = 2·Log[L(μ1, μ2, μ3, ..., μg) / L(μ1, μ2, μ3, ..., μt)]
This is the "G-test for goodness-of-fit":
– null hypothesis: μ1, μ2, μ3, ..., μt
– alternative hypothesis: μ1, μ2, μ3, ..., μt, ..., μg

Likelihood: an example

                Expect    Find
    Wild type     75%       80
    Mutants       25%       10
    Total        100%       90

Null hypothesis: Binomial distribution with q = 0.75
Likelihood(q=0.75) = C(90,10) · 0.75^80 · 0.25^10 = 0.00055

Alternative hypothesis: Binomial distribution with q = ?
Likelihood(q) = C(90,10) · q^80 · (1-q)^10
This has its maximum value when q = 80/90 ≈ 0.89 (the Maximum Likelihood Estimator)
Max Likelihood(q) = C(90,10) · (0.89)^80 · (1-0.89)^10 = 0.133

Likelihood Ratio Test
G = 2 · Log[Max Likelihood(q) / Likelihood(q=0.75)]
  = 2 · Log(0.133 / 0.00055) = 11.0
G is distributed as χ² with 1 d.f. if q = 0.75; 11.0 is significantly large (P < 0.01) in χ²(1), so: reject the null hypothesis.
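This whole example can be checked in a few lines; scipy's binomial pmf is the likelihood here, so no combinatorial formula needs to be coded by hand:

    import math
    from scipy.stats import binom, chi2

    n, k = 90, 80                         # 90 flies, 80 wild type
    L_null = binom.pmf(k, n, 0.75)        # likelihood under q = 0.75 (≈ 0.00055)
    q_hat = k / n                         # MLE of q: 80/90 ≈ 0.89
    L_max = binom.pmf(k, n, q_hat)        # maximized likelihood (≈ 0.133)

    G = 2.0 * math.log(L_max / L_null)    # ≈ 11.0
    p = chi2.sf(G, df=1)                  # < 0.01: reject q = 0.75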

Profile Likelihood Confidence Intervals
[Figure: log-likelihood plotted against μ1; the 95% confidence interval comprises the values of μ1 whose log-likelihood lies within about 2 units (χ²(1) at 0.05, divided by 2: ≈ 1.92) of the maximum, around the maximum likelihood estimator of μ1]
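For a one-parameter model the profile interval can be found by direct search. A sketch for the binomial example above: keep every q whose log-likelihood is within 1.92 units of the maximum:

    import numpy as np
    from scipy.stats import binom

    n, k = 90, 80
    max_ll = binom.logpmf(k, n, k / n)             # log-likelihood at the MLE

    grid = np.linspace(0.001, 0.999, 9999)
    inside = binom.logpmf(k, n, grid) >= max_ll - 1.92
    print(grid[inside].min(), grid[inside].max())  # ≈ 0.81 and 0.94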

Profile Likelihood Confidence Intervals
[Figure: contours of log-likelihood (relative to the maximum likelihood) over the (μ1, μ2) plane; the MLE sits at the 0 contour, and the -2 contour bounds the 95% confidence region]

Model Selection Using Likelihood-Ratio Tests
Weights of 30 crabs of known age and sex:
M(0): y = μ1 + μ4·e
M(1): y = μ1 + μ2·√Age + μ4·e
M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e

Model Selection Using Likelihood-Ratio Tests
Weights of 30 crabs of known age and sex:
M(0): y = μ1 + μ4·e                            Log(L) = -23.04
M(1): y = μ1 + μ2·√Age + μ4·e                  Log(L) = -20.34
M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e    Log(L) = -19.84

Model Selection Using Likelihood-Ratio Tests
G(M(0) vs. M(1)) = 2 × (-20.34 - (-23.04)) = 5.40    P(χ²(1)) < 0.05
G(M(1) vs. M(2)) = 2 × (-19.84 - (-20.34)) = 1.00    P(χ²(1)) > 0.10
G(M(0) vs. M(2)) = 2 × (-19.84 - (-23.04)) = 6.40    P(χ²(2)) < 0.05
But: what is the critical p-value?
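The three tests above can be reproduced with the sketch below, using the log-likelihoods from the slides:

    from scipy.stats import chi2

    log_L = {"M0": -23.04, "M1": -20.34, "M2": -19.84}

    comparisons = [("M0", "M1", 1), ("M1", "M2", 1), ("M0", "M2", 2)]
    for restricted, general, df in comparisons:
        G = 2.0 * (log_L[general] - log_L[restricted])
        print(f"{restricted} vs {general}: G = {G:.2f}, p = {chi2.sf(G, df):.3f}")
    # G = 5.40 (p ≈ 0.020), G = 1.00 (p ≈ 0.317), G = 6.40 (p ≈ 0.041)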

Model Selection Using Likelihood-Ratio Tests
Weights of 30 crabs of known age and sex:
M(1): y = μ1 + μ2·√Age + μ4·e
M(3): y = μ1 + μ3·Sex(0:1) + μ4·e
But: M(1) and M(3) cannot be compared using likelihood-ratio tests, because neither is a subset of the other

Model Selection Using Likelihood-Ratio Tests
Two problems: what is the critical p-value? And models that are not subsets of one another cannot be compared using likelihood-ratio tests.
So: the Akaike Information Criterion (AIC)

Akaike Information Criterion (AIC)
Kullback-Leibler Information (KLI):
– "the information lost when model M(0) is used to approximate model M(1)"
– "the distance from M(0) to M(1)"
AIC(M) = -2 × Log(Likelihood(M)) + 2 × K(M)
– K(M) is the number of estimable parameters of model M
AIC is an estimate of the expected relative distance (KLI) between a fitted model M and the unknown true mechanism that generated the data

Akaike Information Criterion (AIC)
AIC(M) = -2 × Log(Likelihood(M)) + 2 × K(M)
– K(M) is the number of estimable parameters
In model selection: choose the model with the smallest AIC
– the least expected relative distance between M and the unknown true mechanism that generated the data
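Computing AIC needs only the maximized log-likelihood and the parameter count; a sketch using the crab models' log-likelihoods given earlier:

    def aic(log_lik, k):
        # AIC = -2 log L + 2 K
        return -2.0 * log_lik + 2.0 * k

    # (log-likelihood, number of estimable parameters incl. residual s.d.)
    models = {"M0": (-23.04, 2), "M1": (-20.34, 3), "M2": (-19.84, 4)}
    for name, (ll, k) in models.items():
        print(name, aic(ll, k))    # 50.08, 46.68, 47.68: M1 is smallest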

Model Selection Using AIC
Weights of 30 crabs of known age and sex:
M(0): y = μ1 + μ4·e
M(1): y = μ1 + μ2·√Age + μ4·e
M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e
M(3): y = μ1 + μ3·Sex(0:1) + μ4·e

Model Selection Using AIC
Weights of 30 crabs of known age and sex:
M(0): y = μ1 + μ4·e                            AIC = 50.08
M(1): y = μ1 + μ2·√Age + μ4·e                  AIC = 46.68
M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e    AIC = 47.68
M(3): y = μ1 + μ3·Sex(0:1) + μ4·e              AIC = 49.95

Model Selection Using AIC
Differences in AIC between models (ΔAIC) indicate the support for the model with the higher AIC:
– ΔAIC 0–2: substantial
– ΔAIC 4–7: considerably less
– ΔAIC >10: essentially none

Model Selection Using AIC
Weights of 30 crabs of known age and sex:
M(0): y = μ1 + μ4·e                            AIC = 50.08    Unlikely
M(1): y = μ1 + μ2·√Age + μ4·e                  AIC = 46.68    BEST
M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e    AIC = 47.68    Good
M(3): y = μ1 + μ3·Sex(0:1) + μ4·e              AIC = 49.95    Unlikely

Modifications to AIC
AIC for small sample sizes:
AICc = -2 × (Log-Likelihood) + 2 × K × n/(n-K-1)
– n is the sample size
AIC for overdispersed count data:
QAIC = -2 × (Log-Likelihood)/c + 2 × K
– c is the "variance inflation factor" (c = χ²/df)
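Both corrections are one-line formulas; a sketch (here c_hat is assumed to come from a goodness-of-fit χ²/df of the most general model):

    def aicc(log_lik, k, n):
        # small-sample AIC: -2 log L + 2 K n / (n - K - 1)
        return -2.0 * log_lik + 2.0 * k * n / (n - k - 1)

    def qaic(log_lik, k, c_hat):
        # quasi-AIC for overdispersed counts, c_hat = chi2 / df
        return -2.0 * log_lik / c_hat + 2.0 * k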

Burnham, K. P., and D. R. Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. New York: Springer-Verlag.

Likelihood and Least-Squares
If errors are normally distributed:
– least-squares and maximum-likelihood estimates of the parameters are the same
– but the σ² estimators differ
Likelihood is a more powerful and more theoretically grounded technique

AIC and Least-Squares
If all models assume normal errors with constant variance:
AIC = n × Log(σ²) + 2 × K
– σ² = Σei²/n (the MLE of σ²)
– K is the total number of estimated regression parameters, including the intercept and σ²
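A sketch of this least-squares version, taking the residuals of a fitted regression (K counted as on the slide, intercept and σ² included):

    import numpy as np

    def aic_ls(residuals, k):
        # AIC = n log(sigma2_hat) + 2 K, with sigma2_hat = SSE/n (the MLE);
        # constants common to all models are dropped, so only differences
        # in AIC between models are meaningful
        e = np.asarray(residuals)
        n = len(e)
        sigma2_hat = np.sum(e ** 2) / n
        return n * np.log(sigma2_hat) + 2.0 * k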

Calculating Likelihoods
– analytical formulae
– compute by multiplying probabilities
– estimate by simulation: the proportion of times the observed data are obtained in many (e.g. 1,000) simulations given the model and parameters
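A sketch of the simulation route for the binomial example: simulate datasets under the model and count how often the observed data appear. Rare outcomes need many more than 1,000 simulations for a stable estimate:

    import numpy as np

    rng = np.random.default_rng(0)
    sims = rng.binomial(n=90, p=0.75, size=100_000)   # simulated datasets
    L_hat = np.mean(sims == 80)        # fraction reproducing the observation
    print(L_hat)                       # ≈ 0.00055, close to the exact value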

The Method of Likelihood
– the probability of the data given the model
– estimate parameters using maximum likelihood
– estimate confidence intervals using likelihood profiles
– compare models using likelihood ratio tests or the Akaike Information Criterion (AIC)