Maximum Likelihood Estimation
Methods of Economic Investigation, Lecture 17

Last Time
- IV estimation issues
  - Heterogeneous treatment effects
    - The assumptions
    - LATE interpretation
  - Weak instruments
    - Bias in finite samples
    - F-statistic tests

Today's Class
- Maximum likelihood estimators
  - You've seen this in the context of OLS
  - Can make other assumptions on the form of the likelihood function
  - This is how we estimate discrete choice models like probit and logit
- This is a very useful form of estimation
  - Has nice properties
  - Can be very robust to mis-specification

Our Standard OLS
- Standard OLS: Y_i = X_i'β + ε_i
- Focus on minimizing mean squared error, with the assumption that ε_i | X_i ~ N(0, σ²)

Another way to motivate linear models
- "Extremum estimators": maximize/minimize some function
  - OLS: minimize mean squared error
  - Could also imagine minimizing some other types of functions
- We often use a "likelihood function"
  - This approach is more general, allowing us to deal with more complex nonlinear models
  - Useful properties in terms of consistency and asymptotic convergence

What is a likelihood function?
- Suppose we have independent and identically distributed random variables {Z_1, ..., Z_N} drawn from a density function f(z; θ). Then the likelihood function given a sample is
    L(θ; z) = f(z_1; θ) f(z_2; θ) ... f(z_N; θ) = Π_i f(z_i; θ)
- Because it is sometimes convenient, we often use this in logarithmic form:
    log L(θ; z) = Σ_i log f(z_i; θ)
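To make this concrete, here is a minimal Python sketch (the data and the candidate parameter values are invented for illustration, and numpy/scipy are assumed to be available) that evaluates the normal log-likelihood of an i.i.d. sample at a few values of θ = (μ, σ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=1.5, size=200)   # i.i.d. sample; true theta = (2.0, 1.5)

def log_likelihood(theta, z):
    """log L(theta; z) = sum_i log f(z_i; theta) for a normal density, theta = (mu, sigma)."""
    mu, sigma = theta
    return np.sum(stats.norm.logpdf(z, loc=mu, scale=sigma))

# The log-likelihood is larger at parameter values closer to the truth.
for theta in [(0.0, 1.0), (2.0, 1.5), (2.1, 1.4)]:
    print(theta, round(log_likelihood(theta, z), 2))
```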

Consistency - 1
- Consider the population likelihood function with the "true" parameter θ_0:
    L_0(θ) = E[log f(z_i; θ)], where the expectation is taken under the true density f(z; θ_0)
- Think of L_0 as the population average and (1/N) log L as the sample estimate, so that in the usual way (by the law of large numbers)
    (1/N) log L(θ; z) = (1/N) Σ_i log f(z_i; θ) →p L_0(θ)

Consistency - 2
- The population likelihood function L_0(θ) is maximized at the true value θ_0. Why?
  - Think of the sample likelihood function as telling us how likely it is that one would observe the sample if the parameter value θ really were the true parameter value.
  - Similarly, the population likelihood function L_0(θ) will be largest at the value of θ that makes it most likely to "observe the population".
  - That value is the true parameter value, i.e. θ_0 = argmax L_0(θ).

Consistency - 3
- We now know that the population likelihood L_0(θ) is maximized at θ_0
  - We can use Jensen's inequality to apply this to the log function
- The sample log-likelihood log L(θ; z) gets closer to L_0(θ) as N increases
  - i.e. log L will start having the same shape as L_0
  - For large N, the sample likelihood will therefore be maximized at (a value close to) θ_0: the MLE is consistent
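The consistency argument can be illustrated with a small simulation (not part of the original slides; a normal model with known variance, numpy/scipy assumed): as N grows, the maximizer of the sample average log-likelihood settles down at the true parameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta0 = 2.0                       # true mean; the variance is fixed at 1 for simplicity
grid = np.linspace(0.0, 4.0, 401)  # candidate values of theta

for n in (20, 200, 20000):
    z = rng.normal(loc=theta0, scale=1.0, size=n)
    # Sample average log-likelihood (1/N) sum_i log f(z_i; theta), evaluated on the grid
    avg_loglik = np.array([np.mean(stats.norm.logpdf(z, loc=m, scale=1.0)) for m in grid])
    theta_hat = grid[np.argmax(avg_loglik)]
    print(f"N = {n:6d}   argmax of sample log-likelihood = {theta_hat:.3f}")
# As N increases, the maximizer approaches theta0 = 2.0
```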

Information Matrix Equality
- An additional useful property of the MLE comes from the following:
  - Define the score function as the vector of first derivatives of the log-likelihood function: s(θ) = ∂ log L(θ; z)/∂θ
  - Define the Hessian as the matrix of second derivatives of the log-likelihood function: H(θ) = ∂² log L(θ; z)/∂θ∂θ'
- The information matrix equality states that, at θ_0, the expected outer product of the score equals minus the expected Hessian: I(θ_0) = E[s(θ_0)s(θ_0)'] = -E[H(θ_0)]
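As a purely illustrative check (not from the lecture), the information matrix equality can be verified by simulation in a model where the score and Hessian have simple closed forms, e.g. a single observation from N(θ, 1):

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = 1.0
z = rng.normal(loc=theta0, scale=1.0, size=100_000)

# For one observation from N(theta, 1):
#   log f(z; theta) = -0.5*log(2*pi) - 0.5*(z - theta)**2
#   score   s(theta) = d log f / d theta      = z - theta
#   Hessian h(theta) = d^2 log f / d theta^2  = -1
score = z - theta0
hessian = -np.ones_like(z)

print("E[s^2] ~", np.mean(score**2))   # ~ 1.0
print("-E[H]  ~", -np.mean(hessian))   # = 1.0, matching E[s^2]: the information matrix equality
```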

Asymptotic Distribution
- Define the information matrix I(θ_0) = E[s(θ_0)s(θ_0)'] = -E[H(θ_0)]
- Then the MLE will converge in distribution to
    √N (θ̂ - θ_0) →d N(0, I(θ_0)^{-1})
- The information matrix I(θ) has the property that I(θ_0)^{-1} is the Cramér-Rao lower bound, i.e. there does not exist a consistent estimator of θ with a smaller asymptotic variance
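In practice I(θ_0) is replaced by a sample estimate, e.g. minus the Hessian of the log-likelihood at the MLE, and standard errors come from its inverse. A hedged sketch using an exponential-duration model, chosen only because its Hessian has a simple closed form:

```python
import numpy as np

rng = np.random.default_rng(3)
lam0 = 0.5
z = rng.exponential(scale=1.0 / lam0, size=500)   # exponential data with rate lam0
n = z.size

# MLE of the exponential rate: lam_hat = 1 / mean(z)
lam_hat = 1.0 / z.mean()

# log L(lam) = n*log(lam) - lam*sum(z), so the Hessian is -n / lam^2
hessian = -n / lam_hat**2
var_hat = -1.0 / hessian              # inverse of the estimated (total) information
se_hat = np.sqrt(var_hat)

# Approximate 95% confidence interval based on asymptotic normality of the MLE
print(f"lam_hat = {lam_hat:.3f}, se = {se_hat:.3f}, "
      f"95% CI = ({lam_hat - 1.96 * se_hat:.3f}, {lam_hat + 1.96 * se_hat:.3f})")
```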

Computation
- Can be quite complex because we need to maximize the likelihood numerically
- General procedure:
  - Re-scale variables so they have roughly similar variances
  - Choose a starting value and estimate the maximum in that area; do this repeatedly across different grids
  - Get an approximation of the underlying objective function
  - If this converges to a single maximum, you're done
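A sketch of this procedure using scipy.optimize.minimize on the negative log-likelihood, trying several starting values and keeping the best optimum found; the reparameterization to log σ plays the role of the re-scaling step (the data and starting points are illustrative):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(4)
z = rng.normal(loc=2.0, scale=1.5, size=300)

def neg_loglik(theta):
    mu, log_sigma = theta                      # optimize over log(sigma) so the scale is unrestricted
    return -np.sum(stats.norm.logpdf(z, loc=mu, scale=np.exp(log_sigma)))

# Try several starting values; keep the best optimum found.
best = None
for start in [(0.0, 0.0), (5.0, 1.0), (-3.0, -1.0)]:
    res = optimize.minimize(neg_loglik, x0=np.array(start), method="BFGS")
    if best is None or res.fun < best.fun:
        best = res

mu_hat, sigma_hat = best.x[0], np.exp(best.x[1])
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```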

Test Statistics
- Define our likelihood function L(z; θ_0, θ_1)
- Suppose we want to test H_0: θ_0 = 0 against the alternative H_A: θ_0 ≠ 0
- We could estimate a restricted and an unrestricted likelihood function

Test Statistics - 1
- Likelihood ratio (LR) test: we can test how "close" the restricted and unrestricted models are by comparing their maximized log-likelihoods,
    LR = 2 [log L_unrestricted - log L_restricted]
- Lagrange multiplier (LM, or score) test: we can test whether the restricted log-likelihood is maximized at θ_0 = 0; if the null is true, the derivative of the log-likelihood function with respect to θ_0 at that point should be close to zero

Test Statistics - 2
- Wald (W) test: the restricted and unrestricted estimates of θ should be close together if the null hypothesis is correct
- Partition the information matrix into the blocks corresponding to θ_0 and θ_1
- Define the Wald test as the distance of the unrestricted estimate θ̂_0 from its hypothesized value, weighted by its variance:
    W = θ̂_0' [Var(θ̂_0)]^{-1} θ̂_0, where Var(θ̂_0) is the θ_0 block of the inverse information matrix

Comparing test statistics
- In large samples, these test statistics are asymptotically equivalent
  - In finite samples, the three will tend to generate somewhat different values
  - They will generally come to the same conclusion
- The difference between the tests is how they go about answering that question
  - The LR test requires estimates of both models
  - The W and LM tests approximate the LR test but require that only one model be estimated (the unrestricted model for W, the restricted model for LM)
  - When the model is linear, the three test statistics have the following relationship: W ≥ LR ≥ LM
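A minimal sketch, with made-up data, for testing H_0: μ = 0 in a normal model with known variance; in this special case the three statistics coincide exactly, which makes the asymptotic equivalence easy to see:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 100
z = rng.normal(loc=0.3, scale=1.0, size=n)      # sigma is known and equal to 1

loglik = lambda mu: np.sum(stats.norm.logpdf(z, loc=mu, scale=1.0))
mu_hat = z.mean()                                # unrestricted MLE

# Likelihood ratio: twice the gap between unrestricted and restricted log-likelihoods
LR = 2.0 * (loglik(mu_hat) - loglik(0.0))
# Wald: squared distance of the unrestricted estimate from 0, weighted by its variance 1/n
W = mu_hat**2 / (1.0 / n)
# Score (LM): score of the log-likelihood at the restricted estimate, weighted by the information n
score_at_0 = np.sum(z - 0.0)
LM = score_at_0**2 / n

crit = stats.chi2.ppf(0.95, df=1)
print(f"LR = {LR:.3f}, W = {W:.3f}, LM = {LM:.3f}, chi2(1) 5% critical value = {crit:.3f}")
```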

OLS in the MLE context
- Linear model log-likelihood function (with ε_i | X_i ~ N(0, σ²)):
    log L(β, σ²) = -(N/2) log(2πσ²) - (1/(2σ²)) Σ_i (Y_i - X_i'β)²
- Choose the parameter values which maximize this: for β this is equivalent to minimizing the sum of squared residuals, so the MLE of β is the OLS estimator
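A quick numerical check (simulated data, variable names illustrative) that maximizing this log-likelihood reproduces the OLS coefficients:

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # intercept plus one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma2 = np.exp(2 * log_sigma)
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(resid**2) / sigma2

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_mle = res.x[:2]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]           # closed-form OLS for comparison
print("MLE:", np.round(beta_mle, 4))
print("OLS:", np.round(beta_ols, 4))                      # the two coincide up to optimizer tolerance
```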

Example 1: Discrete choice
- Latent variable model: the true variable of interest is Y* = X'β + ε
  - We don't observe Y*, but we can observe Y = 1[Y* > 0]
  - Pr[Y = 1] = Pr[Y* > 0] = Pr[ε > -X'β] = Pr[ε < X'β] (using the symmetry of the distribution of ε)
- What to assume about ε?
  - Linear probability model: Pr[Y = 1] = X'β
  - Probit model: Pr[Y = 1] = Φ(X'β)
  - Logit model: Pr[Y = 1] = exp(X'β) / [1 + exp(X'β)]

Likelihood Functions
- Probit:
    log L(β) = Σ_i { Y_i log Φ(X_i'β) + (1 - Y_i) log[1 - Φ(X_i'β)] }
- Logit:
    log L(β) = Σ_i { Y_i log Λ(X_i'β) + (1 - Y_i) log[1 - Λ(X_i'β)] }, where Λ(z) = exp(z)/[1 + exp(z)]
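A minimal sketch of probit estimation by maximizing this log-likelihood directly, with data simulated from the latent-variable model on the previous slide; the statsmodels cross-check at the end is optional and assumes that package is installed:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(7)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)   # Y = 1[Y* > 0] with normal errors

def neg_loglik(beta):
    p = stats.norm.cdf(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)            # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = optimize.minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print("probit beta_hat:", np.round(res.x, 3))

# Optional cross-check (if statsmodels is installed):
#   import statsmodels.api as sm
#   print(sm.Probit(y, X).fit(disp=0).params)
```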

Marginal Effects
- In the linear probability model we can interpret the coefficients as the change in the probability that Y = 1 with respect to the relevant variable, i.e.
    ∂Pr[Y = 1]/∂X_k = β_k
- In non-linear functions, things are a bit trickier
  - The estimation gives us the parameter estimate of β
  - But we want the change in the probability, e.g. for the probit: ∂Pr[Y = 1]/∂X_k = φ(X'β) β_k
  - These are the "marginal effects" and are typically evaluated at the mean values of X
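A sketch of marginal effects at the mean for the probit, reusing the kind of beta_hat and X produced by the probit sketch above (the numbers here are illustrative, and the entry for the constant term is not meaningful):

```python
import numpy as np
from scipy import stats

def probit_margeff_at_mean(beta_hat, X):
    """Marginal effects at the mean: dPr[Y=1]/dX_k = phi(x_bar' beta) * beta_k."""
    x_bar = X.mean(axis=0)
    return stats.norm.pdf(x_bar @ beta_hat) * beta_hat

# Illustrative inputs standing in for estimates from the previous sketch:
rng = np.random.default_rng(8)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
beta_hat = np.array([-0.5, 1.0])
print(np.round(probit_margeff_at_mean(beta_hat, X), 3))

# With statsmodels (if installed), the analogous call would be
#   sm.Probit(y, X).fit().get_margeff(at='mean').summary()
```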

Next Time
- Time series processes
  - AR
  - MA
  - ARMA
- Model selection
  - Return to MLE
  - Various criteria for model choice