EPI 5344: Survival Analysis in Epidemiology
Maximum Likelihood Estimation: An Introduction
March 10, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa

Objectives
MLE was introduced in EPI 5340 and is likely covered in other courses too, so we won't spend much time on the basics.
–Parameter estimation using maximum likelihood
–Using MLE to estimate variances and to carry out statistical tests

Intro (1)
Conduct an experiment
–Toss a coin 10 times and observe 6 heads
–What is the probability of getting a head when tossing this coin?
–NOTE: we do not know that the coin is fair!
Let p = prob(head). Assume a binomial dist'n:

Intro (3)
We can give a formula for how likely the data are, given a specific value of 'p':
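The likelihood formula on the original slide was an image; for 6 heads in 10 tosses, the binomial likelihood it refers to is:

$$L(p) = \binom{10}{6}\, p^{6} (1-p)^{4}$$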

Intro (4)
For mathematical ease, one usually works with the logarithm of the likelihood
–Has the same general shape
–Has the same maximum point
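For the coin example, taking logs turns the binomial coefficient into an additive constant that does not involve p:

$$\ln L(p) = \text{constant} + 6\ln p + 4\ln(1-p)$$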

Intro (5)
What value of 'p' makes the log(L) as large as possible?
Log(L) curves have the same general shape
–An inverted 'U'
–Have one point which is the maximum
Use calculus to find it: differentiate log(L) with respect to 'p'.
To find the maximum, find the 'p' which makes this derivative equal to '0'.

Intro (6)
To find the maximum, find the 'p' which makes the derivative of log(L) equal to '0':
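The calculus shown on the slide was an image; a reconstruction for the coin example:

$$\frac{d\,\ln L}{dp} = \frac{6}{p} - \frac{4}{1-p} = 0 \;\Rightarrow\; 6(1-p) = 4p \;\Rightarrow\; \hat{p}_{MLE} = \frac{6}{10} = 0.6$$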

Intro (7)
Suppose we re-do the experiment and get 600 heads in 1,000 tosses. What is p_MLE?
–600/1000 = 0.6 (the same)
Do we gain anything by doing 100 times as many tosses?
–Plot the log(L) curve (see the sketch below)

[Figure: log-likelihood curves for 6/10 heads and 600/1,000 heads plotted on the same axes; the curve for 1,000 tosses is much narrower]
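The figure itself is not reproduced here; the following minimal sketch (not from the slides; it assumes numpy and matplotlib are available) reproduces the comparison:

```python
# Compare binomial log-likelihood curves for 6/10 heads vs 600/1000 heads.
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.01, 0.99, 500)

def binom_loglik(heads, n, p):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return heads * np.log(p) + (n - heads) * np.log(1 - p)

for heads, n in [(6, 10), (600, 1000)]:
    ll = binom_loglik(heads, n, p)
    plt.plot(p, ll - ll.max(), label=f"{heads}/{n} heads")  # shift each peak to 0

plt.xlabel("p")
plt.ylabel("log-likelihood (peak scaled to 0)")
plt.legend()
plt.show()
```

Both curves peak at p = 0.6, but the curve for 1,000 tosses falls away from its peak far more steeply, which is what 'much narrower' refers to.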

MLE (1)
Likelihood
–How likely is the observed data, given that the parameter(s) assume fixed value(s)?
–It is not the probability of the observed data
Assumes
–We have a parametric model for the data
–Usually assumes independent observations
Coin tosses are independent, each with a Bernoulli dist'n
When plotted, the scale on the y-axis is arbitrary
Usually work with ln(L): the natural logarithm of L
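In symbols (a standard formulation; the slide itself showed no formula): with independent observations x_1, ..., x_N from a model with density or probability function f(x; θ),

$$L(\theta) = \prod_{i=1}^{N} f(x_i;\theta), \qquad \ln L(\theta) = \sum_{i=1}^{N} \ln f(x_i;\theta)$$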

MLE (2)
The ln(L) curve is nearly always an inverted 'U' (roughly an inverted parabola)
The value of the parameter which makes the curve as high as possible makes the observed data the most likely
–Maximum Likelihood Estimator (MLE)

MLE (3)
The width of the ln(L) curve relates to the variance of the parameter estimate
–More precisely, the variance is related to the slope of the slope of the ln(L) curve at the MLE
–Referred to as Fisher's Information
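In symbols (standard definitions, not shown on the slide):

$$I(\hat\theta) = -\left.\frac{d^{2}\ln L}{d\theta^{2}}\right|_{\theta=\hat\theta_{MLE}}, \qquad \widehat{\operatorname{Var}}(\hat\theta_{MLE}) \approx \frac{1}{I(\hat\theta)}$$

A sharply peaked (narrow) ln(L) curve has large information and hence a small variance.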

Another example: incidence rate
The number of observed events (D) follows a Poisson distribution:
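The distribution on the slide was an image; writing PT for the person-time at risk (notation assumed here) and λ for the incidence rate, the Poisson likelihood is:

$$L(\lambda) = \frac{(\lambda\,PT)^{D}\, e^{-\lambda\,PT}}{D!}, \qquad \ln L(\lambda) = D\ln(\lambda\,PT) - \lambda\,PT - \ln(D!)$$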

To find the MLE, set this slope (the derivative of ln(L) with respect to the rate) to '0'. The result is the familiar formula for the incidence rate from epidemiology.
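A reconstruction of the step (the slide's algebra was an image):

$$\frac{d\,\ln L}{d\lambda} = \frac{D}{\lambda} - PT = 0 \;\Rightarrow\; \hat{\lambda}_{MLE} = \frac{D}{PT} = \frac{\text{number of events}}{\text{person-time at risk}}$$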

Normal (Gaussian), 1 observation only
We will assume that σ is known
To find the MLE, set the derivative of ln(L) with respect to μ equal to 0
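The formulas on the slide were images; for a single observation x from N(μ, σ²) with σ known:

$$\ln L(\mu) = -\tfrac{1}{2}\ln(2\pi\sigma^{2}) - \frac{(x-\mu)^{2}}{2\sigma^{2}}, \qquad \frac{d\,\ln L}{d\mu} = \frac{x-\mu}{\sigma^{2}} = 0 \;\Rightarrow\; \hat\mu_{MLE} = x$$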

Normal (Gaussian), 'N' observations
The previous slide may not seem useful: who does a study with one data point?
So, let's suppose we have 'N' observations: x_1, ..., x_N
All normally distributed with a common mean and variance
Assume that σ is known

Normal (Gaussian), 'N' observations
To find the MLE, set the derivative of ln(L) with respect to μ equal to 0:
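A reconstruction of the slide's (image) formulas for N independent observations:

$$\ln L(\mu) = -\frac{N}{2}\ln(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}(x_i-\mu)^{2}, \qquad \frac{d\,\ln L}{d\mu} = \frac{1}{\sigma^{2}}\sum_{i=1}^{N}(x_i-\mu) = 0 \;\Rightarrow\; \hat\mu_{MLE} = \bar{x}$$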

Approximations (1)
All likelihoods have a similar shape
–Inverted 'U', with one peak
Over some range of parameter values (near the MLE), all ln(L) curves look like a parabola
–Larger sample size → larger range over which the parabola fits
We can approximate any likelihood curve with a parabola → Normal approximation
This is useful since it provides statistical tests.
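One standard way to express this idea (not shown on the slide) is a second-order Taylor expansion of ln(L) around the MLE:

$$\ln L(\theta) \approx \ln L(\hat\theta_{MLE}) - \tfrac{1}{2}\, I(\hat\theta_{MLE})\,(\theta - \hat\theta_{MLE})^{2}$$

which has exactly the form of a normal log-likelihood with mean θ_MLE and variance 1/I(θ_MLE).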

Approximations (2)
General idea
–Assume that the true likelihood is based on one parameter θ
–θ_MLE is the most likely value of θ
–We want to find a normal likelihood with a peak at the same point and which 'looks similar' around the MLE point
[Figure: the true ln(L) curve and its normal approximation]

Approximations (3)
For a Gaussian log-likelihood with 'mean' m and 'variance' v, we have (ignoring the constant):
ln L(θ) = -(θ - m)² / (2v)
We have seen that, for this situation, the MLE is m itself.
Our 'true' curve has an MLE of θ_MLE.
To have the same peak, we need to set: m = θ_MLE

Approximations (4)
What do we mean by 'similar shape'?
–Can't use the 'slope' since it is always '0' at the MLE
Many criteria could be used. We will use 'curvature'.

Approximations (5)
Curvature = the second derivative of log(L) = -Information
Curvature:
–The slope of the slope of the log-likelihood curve at the MLE
–The rate at which the slope is changing at the MLE
–More sharply peaked curves have larger magnitudes (larger Information)
–It is always < 0 at the MLE

Approximations (6)
What is the curvature at the peak (MLE) for a Gaussian? It is -1/v, which is a constant!
Set this equal to the curvature of the 'real' curve to get the approximating curve.

Approximations (7)
To get a 'good' normal approximation in the region of the MLE, here's what we need to do:
–Set the 'mean' of the normal curve to θ_MLE
–Set the variance of the normal curve to the negative of the reciprocal of the curvature of the target:
v = -1 / (curvature at the MLE) = 1 / Information
How to do this depends on the 'target'.

Approximations (8)
Approximation to the binomial dist'n
–'N' events, of which 'D' are positive
Want to find a normal approximation to use around the MLE

Approximations (9)
We need the curvature at the MLE. So, make these two substitutions: D = Np̂ and N - D = N(1 - p̂).
This gives the curvature at the MLE, and so the normal approximation uses mean p̂ = D/N and variance p̂(1 - p̂)/N (see the worked steps below).
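A reconstruction of the algebra (shown as images on the slide), starting from ln L(p) = D ln p + (N - D) ln(1 - p):

$$\frac{d^{2}\ln L}{dp^{2}} = -\frac{D}{p^{2}} - \frac{N-D}{(1-p)^{2}} \;\xrightarrow{\;p=\hat p,\; D=N\hat p\;}\; -\frac{N}{\hat p} - \frac{N}{1-\hat p} = -\frac{N}{\hat p(1-\hat p)}$$

so the approximating normal curve has mean p̂ and variance p̂(1 - p̂)/N.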

Hypothesis tests (1)
Simple hypothesis test:
–H_0: mean = μ_0
We'll do this using a likelihood approach
–Based on the real curve, not an approximation (for now)
Determine the log-likelihood at:
–the null hypothesis value
–the MLE (the observed data)
Subtract the log-likelihoods ('MLE' from 'null')

[Figure: log-likelihood curve marking the MLE of p and the null value]

[Figure: log-likelihood curve with the null value far from the MLE; difference in log-likelihood = -18]

[Figure: log-likelihood curve with the null value close to the MLE; difference in log-likelihood = -0.1]

Hypothesis tests (2)
We want to test H_0: μ = μ_0
Sample: x_1, x_2, ..., x_n iid ~ N(μ, σ²), where σ² is assumed 'known'
We know that the MLE of μ is the sample mean x̄
We use a likelihood ratio test of the null hypothesis
NOTE: for convenience, I have scaled the ln(L) axes so that the value at the MLE is '0'. In reality, the ln(L) value at the MLE is not '0'.

Hypothesis tests (3)
[Figure: likelihood curve]

Hypothesis tests (4)
But, again, it is easier to work with logs. So, the test is based on:
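The statistic on the slide was an image; the likelihood ratio statistic it refers to is:

$$-2\,\Delta\ln L = -2\left[\ln L(\mu_0) - \ln L(\hat\mu_{MLE})\right]$$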

Hypothesis tests (5)
First, remember the form of ln(L) for a normal distribution.
Then evaluate it at the null hypothesis value, and at the MLE point (see the reconstructed expressions below).
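The three expressions were images on the slide; reconstructing them (σ known, constant terms collected into 'const'):

$$\ln L(\mu) = \text{const} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2}, \qquad \ln L(\mu_0) = \text{const} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu_0)^{2}, \qquad \ln L(\bar{x}) = \text{const} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\bar{x})^{2}$$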

Hypothesis tests (6)
After a bit of algebra, the likelihood ratio statistic reduces to a quantity distributed as a chi-square with 1 degree of freedom.
You should recognize this test from Biostats 1.
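A reconstruction of the algebra, using the identity Σ(x_i - μ_0)² = Σ(x_i - x̄)² + n(x̄ - μ_0)²:

$$-2\left[\ln L(\mu_0) - \ln L(\bar{x})\right] = \frac{\sum(x_i-\mu_0)^{2} - \sum(x_i-\bar{x})^{2}}{\sigma^{2}} = \frac{n(\bar{x}-\mu_0)^{2}}{\sigma^{2}} = \left(\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\right)^{2} \sim \chi^{2}_{1}$$

which is the square of the familiar one-sample z statistic.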

Hypothesis tests (7)
Likelihood ratio test = -2ΔLR ~ chi-square with 1 df
–If the x's are normal, the test is exact
–If the x's are not normal, the test is not exact but isn't bad
Assumes that we know the true shape of the likelihood curve. What if we don't?
–Use an approximation
Two main methods
–Wald
–Score

Hypothesis tests (8)
Wald test
–Assumes that the true and normal curves have:
the same peak value (the MLE)
the same curvature at the peak value
–Is an approximate test which is best around the MLE
Good for 95% confidence intervals
–Tends to under-estimate the LR test value
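In symbols (a standard form, not shown on the slide), the Wald statistic for H_0: θ = θ_0 is:

$$X^{2}_{Wald} = \frac{(\hat\theta_{MLE} - \theta_0)^{2}}{\widehat{\operatorname{Var}}(\hat\theta_{MLE})} = (\hat\theta_{MLE} - \theta_0)^{2}\, I(\hat\theta_{MLE}), \quad \text{compared to a } \chi^{2}_{1}$$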

[Figure: the true ln(L) curve and its Wald (normal) approximation]

[Figure: the true ln(L) curve and the Wald approximation, showing the true LR test value and the Wald LR test value at the null]

Hypothesis tests (9)
Score test
–Assumes that the true and normal curves have:
the same slope and curvature at the null value
–Implies that the peaks are not the same
the MLEs are also not the same
–Is an approximate test which is best around the null hypothesis
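In symbols (standard form, not shown on the slide), with the score U(θ) = d ln L/dθ and the information I(θ) both evaluated at the null value:

$$X^{2}_{Score} = \frac{U(\theta_0)^{2}}{I(\theta_0)}, \quad \text{compared to a } \chi^{2}_{1}$$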

Hypothesis tests (10)
Regression models
–can be fit using MLE methods
–this is the most common approach used for:
logistic regression
Cox regression
Poisson regression
For illustration, the data will be iid and normally distributed with:
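The model on the slide was an image; a plausible reconstruction of the simple linear case being described is:

$$y_i \sim N(\beta_0 + \beta_1 x_i,\; \sigma^{2}), \quad \text{independently for } i = 1, \dots, n$$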

Hypothesis tests (11)
Can use MLE to estimate the betas
The fitted model will have a ln(L) value
Now, fit two models:
–one with x
–one without x
Each model will have a ln(L):
–ln(L_with x)
–ln(L_without x)

Hypothesis tests (12)
The likelihood ratio test of H_0: β_1 = 0 is given by:
-2 [ln(L_without x) - ln(L_with x)] ~ chi-square with 1 df
A complicated way to test one beta
Easily extended to more complex models
Very similar to the partial F-tests which you covered when learning linear regression
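A minimal sketch (not from the lecture) of this nested-model likelihood ratio test, here applied to a logistic regression on simulated data; the variable names, sample size, and use of the statsmodels and scipy packages are illustrative assumptions:

```python
# Likelihood ratio test for one coefficient: fit nested logistic models,
# take -2 times the difference in log-likelihoods, compare to chi-square(1).
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 0.8 * x)))   # assumed true model: logit(p) = -0.5 + 0.8*x
y = rng.binomial(1, p)

X_with = sm.add_constant(x)               # intercept + x
X_without = np.ones((n, 1))               # intercept only

ll_with = sm.Logit(y, X_with).fit(disp=0).llf        # ln(L with x)
ll_without = sm.Logit(y, X_without).fit(disp=0).llf  # ln(L without x)

lr_stat = -2 * (ll_without - ll_with)
p_value = chi2.sf(lr_stat, df=1)
print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.4f}")
```

The same pattern (fit the model with and without the term, take -2 times the difference in log-likelihoods, compare to a chi-square) applies to logistic, Cox, and Poisson regression.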
