Lecture 13 (Greene Ch 16) Maximum Likelihood Estimation (MLE)

1 Research Method Lecture 13 (Greene Ch 16) Maximum Likelihood Estimation (MLE)

2 Basic idea Maximum likelihood estimation (MLE) is a method for finding the density function that is most likely to have generated the data. MLE therefore requires you to make a distributional assumption first. This handout gives you the intuition behind MLE using examples.

3 Example 1 Let me explain the basic idea of MLE using the following data.
Let us assume that the variable X follows a normal distribution. Recall that the density function of a normal distribution with mean μ and variance σ² is:

f(x) = (1/(σ√(2π))) exp(−(x−μ)²/(2σ²))

Id  X
1   1
2   4
3   5
4   6
5   9

4 The data are plotted on the horizontal axis.
Now ask yourself the following question: “Which distribution, A or B, is more likely to have generated the data?”

[Figure: the data points 1, 4, 5, 6, 9 on a horizontal axis beneath two candidate normal density curves, A and B.]

5 The answer is A, because the data are clustered around the center of distribution A, but not around the center of distribution B. This example illustrates that, by looking at the data, it is possible to find the distribution that is most likely to have generated them. Now I will explain exactly how to find that distribution in practice.
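To make the comparison concrete, here is a minimal sketch in Python; the parameter values for A and B are hypothetical stand-ins for the two curves in the figure, and the product of density heights anticipates the likelihood function defined on the next slides:

```python
import numpy as np
from scipy.stats import norm

x = np.array([1, 4, 5, 6, 9])  # the Example 1 data

# Hypothetical candidates standing in for the curves in the figure:
# A is centered over the data, B is centered away from it.
L_A = np.prod(norm.pdf(x, loc=5, scale=2))   # likelihood under A: N(5, 2^2)
L_B = np.prod(norm.pdf(x, loc=12, scale=2))  # likelihood under B: N(12, 2^2)

print(L_A, L_B)  # L_A is many orders of magnitude larger than L_B
```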

6 The illustration of the estimation procedure
MLE starts with computing the likelihood contribution of each observation. The likelihood contribution is the height of the density function at that observation's data value. We use Li to denote the likelihood contribution of the ith observation.

7 Graphical illustration of the likelihood contribution
The likelihood contribution of the first observation (x1 = 1) is the height of the density at its data value:

L1 = f(x1) = (1/(σ√(2π))) exp(−(1−μ)²/(2σ²))

[Figure: density curve A drawn over the data points 1, 4, 5, 6, 9, with the height of the curve at x1 = 1 marked.]

8 Then, you multiply the likelihood contributions of all the observations. The product is called the likelihood function, denoted L:

L = L1 × L2 × … × Ln = Π(i=1..n) Li

The Π notation means you multiply from i = 1 through n. In our example, n = 5.

9 In our example, the likelihood function looks like:

L(μ,σ) = Π(i=1..5) (1/(σ√(2π))) exp(−(xi−μ)²/(2σ²))

I write L(μ,σ) to emphasize that the likelihood function depends on these parameters.

10 Then you find the values of μ and σ that maximize the likelihood function.
The values of μ and σ obtained this way are called the maximum likelihood estimators of μ and σ. Most MLE problems cannot be solved ‘by hand’; you need an iterative procedure that solves them on a computer.
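As an illustration, here is a minimal sketch of such an iterative procedure for the Example 1 data, using scipy's general-purpose optimizer; it minimizes the negative log of L(μ,σ), which has the same solution as maximizing L (see the log likelihood slide below):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([1, 4, 5, 6, 9])  # Example 1 data

def neg_log_likelihood(theta):
    mu, sigma = theta
    # Sum of log density heights; minimizing this maximizes L(mu, sigma)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0],
                  bounds=[(None, None), (1e-6, None)])  # keep sigma > 0
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)  # mu_hat is the sample mean 5; sigma_hat ≈ 2.61
```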

11 Fortunately, there are many optimization computer programs that can do this.
The most common among economists is GQOPT. This program runs on FORTRAN, so you need to write a FORTRAN program. Even more fortunately, many of the models that require MLE (like probit or logit models) can be estimated automatically in STATA. However, you still need to understand the basic idea of MLE in order to understand what STATA does.

12 Example 2 Example 1 was the simplest case.
We are usually interested in estimating a model like y = β0 + β1x + u. Such a model can also be estimated by MLE.

13 Suppose that you have the following data, and you are interested in estimating the model y = β0 + β1x + u.
Let us assume that u follows a normal distribution with mean 0 and variance σ².

Id  Y   X
1   2   1
2   6   4
3   7   5
4   9   6
5   15  9

14 You can write the model as:
u = y − (β0 + β1x)
This means that y − (β0 + β1x) follows a normal distribution with mean 0 and variance σ². The likelihood contribution of each observation is the height of the density function at the data point y − β0 − β1x.

15 For example, the likelihood contribution of the 2nd observation (Y = 6, X = 4) is given by:

L2 = (1/(σ√(2π))) exp(−(6−β0−4β1)²/(2σ²))

The data points for the five observations are 2−β0−β1, 6−β0−4β1, 7−β0−5β1, 9−β0−6β1, and 15−β0−9β1.

16 Then the likelihood function is given by:

L(β0,β1,σ) = Π(i=1..5) (1/(σ√(2π))) exp(−(yi−β0−β1xi)²/(2σ²))

The likelihood function is a function of β0, β1, and σ.

17 You choose the values of β0, β1, and σ that maximize the likelihood function. These are the maximum likelihood estimators of β0, β1, and σ. Again, the maximization can easily be done using GQOPT or any other program with optimization routines (like Matlab).
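Here is a minimal sketch of that maximization for the Example 2 data, again through the negative log likelihood; for this model the β estimates coincide with the OLS fit, which gives a check on the output:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

y = np.array([2, 6, 7, 9, 15])  # Example 2 data
x = np.array([1, 4, 5, 6, 9])

def neg_log_likelihood(theta):
    b0, b1, sigma = theta
    u = y - (b0 + b1 * x)  # implied residuals
    return -np.sum(norm.logpdf(u, loc=0, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0, 1.0],
                  bounds=[(None, None), (None, None), (1e-6, None)])
b0_hat, b1_hat, sigma_hat = result.x
print(b0_hat, b1_hat, sigma_hat)  # approx. -0.29, 1.62, 0.60
```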

18 Example 3 Consider the following model: y* = β0 + β1x + u
Sometimes, we only know whether y* ≥ 0 or not.

19 The data contain a variable Y which is either 0 or 1.
If Y = 1, it means that y* ≥ 0. If Y = 0, it means that y* < 0.

[Data table: Id 1–5 with a binary Y and X = 1, 4, 5, 6, 9; the next slides use Y = 0 for observation 2 and Y = 1 for observation 3.]

20 Then, what is the likelihood contribution of each observation?
In this case, we only know whether y* ≥ 0 or y* < 0; we do not know the exact value of y*. In such a case, we use the probability that y* ≥ 0 or y* < 0 as the likelihood contribution. Now, let us assume that u follows the standard normal distribution (the normal distribution with mean 0 and variance 1).

21 Take the 2nd observation as an example. Since Y = 0 for this observation, we know y* < 0, i.e., u < −β0 − 4β1 (its X value is 4). Thus, the likelihood contribution is

L2 = P(y* < 0) = Φ(−β0 − 4β1)

where Φ is the standard normal CDF. (The corresponding arguments for the five observations are −β0−β1, −β0−4β1, −β0−5β1, −β0−6β1, and −β0−9β1.)

22 Now, take the 3rd observation as an example. Since Y = 1 for this observation, we know y* ≥ 0. Thus, the likelihood contribution is

L3 = P(y* ≥ 0) = 1 − Φ(−β0 − 5β1)

using its X value of 5.

23 Thus, the likelihood function has the following complicated form:

L(β0,β1) = Π over i with Yi = 0 of Φ(−β0−β1xi) × Π over i with Yi = 1 of [1 − Φ(−β0−β1xi)]

24 You choose the values of β0 and β1 that maximize the likelihood function. These are the maximum likelihood estimators of β0 and β1.
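A minimal sketch of this probit-style maximization follows; the slides only pin down Y for observations 2 and 3, so the remaining Y values in the code are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([1, 4, 5, 6, 9])
# Only Y[1] = 0 and Y[2] = 1 are given on the slides;
# the other values are hypothetical, chosen for illustration.
Y = np.array([0, 0, 1, 0, 1])

def neg_log_likelihood(theta):
    b0, b1 = theta
    z = b0 + b1 * x
    # log P(y* < 0) = log Φ(−z); log P(y* ≥ 0) = log Φ(z) by symmetry
    return -np.sum(np.where(Y == 0, norm.logcdf(-z), norm.logcdf(z)))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)  # maximum likelihood estimates of beta0 and beta1
```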

25 Procedure of the MLE
1. Compute the likelihood contribution of each observation: Li for i = 1,…,n.
2. Multiply all the likelihood contributions to form the likelihood function L.
3. Maximize L by choosing the values of the parameters. The parameter values that maximize L are the maximum likelihood estimators of the parameters.

26 The log likelihood function
It is usually easier to maximize the natural log of the likelihood function than the likelihood function itself. Since the log is an increasing function, both are maximized at the same parameter values:

ln L = ln Π(i=1..n) Li = Σ(i=1..n) ln Li
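Beyond analytical convenience there is a numerical reason, worth noting as an aside: a product of many density heights underflows to zero in floating point, while the sum of logs stays well behaved. A small sketch of the contrast:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=2, size=10_000)  # simulated sample

L = np.prod(norm.pdf(x, loc=5, scale=2))      # underflows to 0.0
logL = np.sum(norm.logpdf(x, loc=5, scale=2)) # a perfectly usable number
print(L, logL)
```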

27 The standard errors in MLE
This is usually an advanced topic. However, it is useful to know how the standard errors are computed in MLE, since we use them for t-tests.

28 The score vector is the first derivative of the log likelihood function with respect to the parameters. Let θ be a column vector of the parameters; in Example 2, θ = (β0, β1, σ)′. Then the score vector q is given by

q = ∂ ln L / ∂θ

with observation i contributing qi = ∂ ln Li / ∂θ.

29 Then, the standard errors of the parameters are given by the square roots of the diagonal elements of the following matrix, built from the scores just defined (the outer-product-of-gradients, or BHHH, estimator of the variance of the MLE):

[ Σ(i=1..n) qi qi′ ]⁻¹
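Here is a minimal sketch of that computation for the Example 2 data, using central finite differences to stand in for the analytic per-observation scores; the θ̂ values are the approximate MLE from the earlier sketch:

```python
import numpy as np
from scipy.stats import norm

y = np.array([2, 6, 7, 9, 15])  # Example 2 data
x = np.array([1, 4, 5, 6, 9])

def log_li(theta):
    b0, b1, sigma = theta
    # ln Li, one entry per observation
    return norm.logpdf(y - (b0 + b1 * x), loc=0, scale=sigma)

def scores(theta, h=1e-5):
    # Numerical q_i = d ln L_i / d theta via central differences
    q = np.zeros((len(y), len(theta)))
    for k in range(len(theta)):
        e = np.zeros(len(theta))
        e[k] = h
        q[:, k] = (log_li(theta + e) - log_li(theta - e)) / (2 * h)
    return q

theta_hat = np.array([-0.29, 1.62, 0.60])  # approximate MLE from the earlier sketch
q = scores(theta_hat)
cov = np.linalg.inv(q.T @ q)  # BHHH: inverse of the outer product of scores
print(np.sqrt(np.diag(cov))) # standard errors of beta0, beta1, sigma
```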

