THE CENTRAL LIMIT THEOREM

Christopher Dougherty EC220 - Introduction to econometrics (review chapter) Slideshow: the central limit theorem Original citation: Dougherty, C. (2012)
Presentation transcript:

1 If a random variable X has a normal distribution, its sample mean $\bar{X}$ will also have a normal distribution. This fact is useful for the construction of t statistics and confidence intervals if we are employing $\bar{X}$ as an estimator of the population mean. [Figure: n = 1; 10 million samples]

2 However, what happens if we are not able to assume that X has a normal distribution?

3 The standard response is to make use of a central limit theorem. Loosely speaking, a central limit theorem states that the distribution of $\bar{X}$ will approximate a normal distribution as the sample size becomes large, even when the distribution of X itself is not normal.

4 There are a number of central limit theorems, differing only in the assumptions that they make in order to obtain this result. Here we shall be content with using the simplest one, the Lindeberg–Levy central limit theorem.

5 It states that, provided that the $X_i$ in the sample are all drawn independently from the same distribution (the distribution of X), and provided that this distribution has finite population mean and variance, the distribution of $\bar{X}$ will converge on a normal distribution as n increases.

6 This means that our t statistics and confidence intervals will be approximately valid after all, provided that the sample size is large enough.

7 The figure shows the distribution of $\bar{X}$ for the case where X has a uniform distribution with range 0 to 1, for 10 million samples. A uniform distribution is one in which all values over a finite range are equally likely. [Figure: distribution of $\bar{X}$, uniform X, n = 1; 10 million samples]

8 For a sample of size 1, the distribution of $\bar{X}$ is the uniform distribution itself, and so it is a horizontal line.

9 We now show the distribution of $\bar{X}$ for a sample of size 10, for 10 million samples. It can be seen that $\bar{X}$ has a distribution very close to a normal distribution even though the sample size is quite small. [Figure: uniform X, curves for n = 1 and n = 10; 10 million samples]

10 Here is the distribution of $\bar{X}$ for samples of size 25. It is even closer to normal. [Figure: uniform X, curves for n = 1, 10, and 25; 10 million samples]

11 Here is the distribution for sample size 100. It is visually indistinguishable from normal. [Figure: uniform X, curves for n = 1, 10, 25, and 100; 10 million samples]
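
The uniform-distribution experiment of slides 7 to 11 can be reproduced on a small scale. What follows is a minimal sketch using only Python's standard library; the function names, the seed, and the 20,000 replications (in place of the slides' 10 million) are illustrative assumptions, not part of the original slideshow. It summarizes the simulated distribution of the sample mean by its variance, which should be close to 1/(12n), and by its excess kurtosis, which is -1.2 for the uniform distribution itself and zero for a normal distribution.

```python
import random
import statistics


def sample_means(n, reps, seed=0):
    """Simulate `reps` sample means, each from n independent uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (zero for a normal)."""
    m = statistics.fmean(values)
    var = statistics.pvariance(values, mu=m)
    fourth = sum((v - m) ** 4 for v in values) / len(values)
    return fourth / var ** 2 - 3.0


if __name__ == "__main__":
    for n in (1, 10, 25, 100):
        means = sample_means(n, reps=20_000)
        print(
            f"n={n:3d}  variance={statistics.pvariance(means):.5f}  "
            f"theory={1 / (12 * n):.5f}  "
            f"excess kurtosis={excess_kurtosis(means):+.3f}"
        )
```

With far fewer replications than the slides use, the printed values are noisy, but the excess kurtosis should move from about -1.2 at n = 1 towards zero as n grows.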

12 If X had a different distribution, the sample size required for a good approximation would be different. The figure shows the case where X has a lognormal distribution. As you can see, it is heavily skewed. [Figure: lognormal distribution of X, n = 1]

13 Here is the distribution of $\bar{X}$ for sample size 10, for 10 million samples. It is still heavily skewed. [Figure: lognormal X, curves for n = 1 and n = 10; 10 million samples]

14 With sample size 25, the distribution is becoming less skewed. [Figure: lognormal X, curves for n = 1, 10, and 25; 10 million samples]

15 However, even with sample size 100, the distribution is only an approximation to a normal distribution. Notice the difference in the shapes of the tails. We need a larger value of n before we can say that the distribution is approximately normal. [Figure: lognormal X, curves for n = 1, 10, 25, and 100; 10 million samples]
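
The slower convergence in the lognormal case can be checked with a similar small simulation. This is a sketch under stated assumptions: the slides do not give the parameters of the lognormal distribution, so log X is taken to be standard normal here, and the function names, seed, and replication counts are illustrative. The skewness of the simulated sample means is used as a simple measure of departure from normality.

```python
import random
import statistics


def lognormal_sample_means(n, reps, seed=0):
    """Simulate `reps` sample means of n independent lognormal draws.

    Assumption: log X ~ N(0, 1); the slides do not state the parameters used.
    """
    rng = random.Random(seed)
    return [
        sum(rng.lognormvariate(0.0, 1.0) for _ in range(n)) / n
        for _ in range(reps)
    ]


def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=m)
    third = sum((v - m) ** 3 for v in values) / len(values)
    return third / sd ** 3


if __name__ == "__main__":
    for n in (1, 10, 25, 100):
        means = lognormal_sample_means(n, reps=10_000)
        print(f"n={n:3d}  skewness of sample mean={skewness(means):.2f}")
```

For independent draws the skewness of the sample mean is the skewness of X divided by the square root of n, so it shrinks only slowly, which is consistent with the figures still looking asymmetric at n = 100.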

16 In asserting that the distribution of $\bar{X}$ tends to become normal as the sample size increases, we have glossed over an important technical point that needs to be addressed. The central limit theorem applies only in the limit, as the sample size tends to infinity.

17 However, as the sample size tends to infinity, the distribution of $\bar{X}$ degenerates to a spike located at the population mean. So how can we talk about the limiting distribution being normal?

18 The answer is to transform the estimator in an appropriate way so that the transformation does have a limiting distribution. Having established the limiting distribution of the transformation, we may be able to work backwards to the properties of the estimator.

19 If X has mean $\mu$ and variance $\sigma^2$, $\bar{X}$ has mean $\mu$ and variance $\sigma^2/n$. The mean is independent of n, but the variance tends to zero as n tends to infinity.
[Slide table. Setup: X has mean $\mu$ and variance $\sigma^2$; $\bar{X}$ is an estimator of $\mu$. Aim: search for a transformation of $\bar{X}$ that has a limiting distribution. Row: $\bar{X}$, mean $\mu$, variance $\sigma^2/n$; as n increases, the mean is stable, but variance → 0.]
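
The two moments quoted on this slide follow from the linearity of expectations and the independence of the observations. A short derivation in the notation of the slides, assuming the $X_i$ are independent draws with common mean $\mu$ and variance $\sigma^2$ (the second step of the variance line uses independence, so that all covariances are zero):

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{var}(\bar{X}) &= \frac{1}{n^2}\operatorname{var}\!\left(\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(X_i)
            = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```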

20 We can deal with the vanishing variance problem by scaling the estimator by $\sqrt{n}$. This multiplies its variance by n, and so the variance becomes $\sigma^2$, which is independent of n. We are making progress in finding the appropriate transformation.
[Slide table, new row: $\sqrt{n}\,\bar{X}$, mean $\sqrt{n}\,\mu$, variance $\sigma^2$; as n increases, the variance is stable, but the mean increases.]

21 However, we now have a problem with the mean. This is now $\sqrt{n}\,\mu$. It increases with n, so the statistic cannot have a limiting distribution.

22 To deal with this, we consider instead the statistic $\sqrt{n}(\bar{X} - \mu)$. This is what we need. Its mean is zero and its variance is unaffected. The mean and variance are both independent of n, and so this statistic can have a limiting distribution.
[Slide table, new row: $\sqrt{n}(\bar{X} - \mu)$, mean 0, variance $\sigma^2$; as n increases, mean and variance are both stable.]
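
The behaviour of the three candidate statistics can be checked numerically. The sketch below is illustrative only: it assumes X is uniform on [0, 1] (so $\mu = 0.5$ and $\sigma^2 = 1/12$), and the function name, seed, and replication count are not from the slides. It prints the simulated mean and variance of each statistic for several n.

```python
import math
import random
import statistics


def simulate_statistics(n, reps, seed=0):
    """Return simulated values of X-bar, sqrt(n)*X-bar and sqrt(n)*(X-bar - mu).

    Assumption for illustration: X is uniform on [0, 1], so mu = 0.5 and
    sigma^2 = 1/12.
    """
    mu = 0.5
    rng = random.Random(seed)
    xbar = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    scaled = [math.sqrt(n) * x for x in xbar]
    centred = [math.sqrt(n) * (x - mu) for x in xbar]
    return xbar, scaled, centred


if __name__ == "__main__":
    print("sigma^2 =", round(1 / 12, 4))
    for n in (4, 16, 64):
        for label, values in zip(
            ("X-bar", "sqrt(n) X-bar", "sqrt(n)(X-bar - mu)"),
            simulate_statistics(n, reps=20_000),
        ):
            print(
                f"n={n:2d}  {label:20s} mean={statistics.fmean(values):+.3f}  "
                f"variance={statistics.pvariance(values):.4f}"
            )
```

Only the third statistic should show both a mean and a variance that stay put as n changes, which is the property the slides are looking for.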

23 The Lindeberg–Levy central limit theorem states that, as n tends to infinity, the distribution of this statistic, $\sqrt{n}(\bar{X} - \mu)$, converges to a normal distribution with mean zero and variance $\sigma^2$.

24 Application of the central limit theorem: $\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$. The arrow with a d over it is mathematical shorthand that means ‘has limiting distribution as n tends to infinity’.

25 This relationship is true only as n goes to infinity. However, from the limiting distribution, we can start working back tentatively to finite samples. We can say that, for large n, the relationship may hold approximately: $\sqrt{n}(\bar{X} - \mu) \sim N(0, \sigma^2)$. (The symbol ~ means ‘is distributed as’.)

26 Then, dividing the statistic by $\sqrt{n}$, we can say that, for sufficiently large n, the second equation is approximately true: $(\bar{X} - \mu) \sim N(0, \sigma^2/n)$.

27 This implies the last equation: $\bar{X} \sim N(\mu, \sigma^2/n)$. We knew, from the beginning, that the sample mean was distributed with mean $\mu$ and variance $\sigma^2/n$.

28 What we have shown is that, irrespective of the distribution of X, the distribution of the sample mean is approximately normal in sufficiently large samples. This enables us to perform the usual tests.
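
One way to see what "approximately valid" means in practice is to check how often a nominal 95 percent confidence interval for the mean actually contains it when X is not normal. The sketch below is a hedged illustration, not the author's procedure: it assumes X is uniform on [0, 1], estimates the variance from each sample, and uses the normal critical value; the function name, seed, and replication count are hypothetical.

```python
import math
import random
import statistics


def coverage(n, reps, seed=0):
    """Fraction of samples whose nominal 95% interval contains the true mean.

    Assumptions for illustration: X is uniform on [0, 1] (true mean 0.5), the
    variance is estimated from each sample, and the normal critical value is
    used, as the large-sample approximation suggests.
    """
    mu = 0.5
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        xbar = statistics.fmean(sample)
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # unbiased variance
        se = math.sqrt(s2 / n)
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(f"n={n:3d}  coverage of nominal 95% interval: {coverage(n, 5_000):.3f}")
```

If the normal approximation is adequate, the reported coverage should be close to 0.95; a noticeable shortfall at small n indicates that the sample is not yet "sufficiently large" for this purpose.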

29 Of course, this raises the question of what might be considered to be ‘sufficiently large n’. To answer this question, the analysis must be supplemented by simulation.

30 The figure shows the distribution of $\sqrt{n}(\bar{X} - \mu)$ for the uniform distribution when n = 1. It is, of course, just the uniform distribution itself, with the mean of 0.5 subtracted. [Figure: uniform X, n = 1; 10 million samples]

31 Here is the distribution of $\sqrt{n}(\bar{X} - \mu)$ when n = 10. It looks very like a normal distribution. [Figure: uniform X, n = 10; 10 million samples]

32 Here is the same figure with the theoretical limiting normal distribution, in red. It confirms that the distribution for the sample mean has virtually converged to normality with a sample size of only 10. [Figure: uniform X, n = 10, with limiting normal distribution; 10 million samples]
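
The visual comparison with the limiting distribution can be given a rough numerical counterpart: the largest vertical gap between the simulated cumulative distribution of $\sqrt{n}(\bar{X} - \mu)$ and that of $N(0, \sigma^2)$. This is a sketch rather than the method used for the slides; it assumes the uniform case ($\mu = 0.5$, $\sigma^2 = 1/12$), and the function name, seed, and replication count are illustrative.

```python
import math
import random
import statistics


def ks_distance_to_limit(n, reps, seed=0):
    """Largest gap between the simulated CDF of sqrt(n)(X-bar - mu) and N(0, sigma^2).

    Assumption for illustration: X is uniform on [0, 1], so mu = 0.5 and
    sigma^2 = 1/12.
    """
    mu, sigma = 0.5, math.sqrt(1 / 12)
    limit = statistics.NormalDist(0.0, sigma)
    rng = random.Random(seed)
    stats = sorted(
        math.sqrt(n) * (sum(rng.random() for _ in range(n)) / n - mu)
        for _ in range(reps)
    )
    gap = 0.0
    for i, value in enumerate(stats):
        f = limit.cdf(value)
        # The empirical CDF jumps from i/reps to (i + 1)/reps at this value.
        gap = max(gap, abs(f - i / reps), abs((i + 1) / reps - f))
    return gap


if __name__ == "__main__":
    for n in (1, 10, 25):
        print(f"n={n:2d}  largest CDF gap: {ks_distance_to_limit(n, 20_000):.4f}")
```

Because the simulated distribution is itself estimated from a finite number of replications, the gap cannot fall much below the simulation noise, so small values at n = 10 and n = 25 should be read as "no detectable difference" rather than as exact distances.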

33 The curve for n = 25 has been added. There is hardly any change because convergence has already been achieved. [Figure: uniform X, n = 25 added; 10 million samples]

34 Of course, the curve for n = 100 also coincides. In this case, n = 25 was ‘sufficiently large’. Perhaps even n = 10. [Figure: uniform X, n = 100 added; 10 million samples]

35 Now consider the example of the lognormal distribution. Here is the distribution of $\sqrt{n}(\bar{X} - \mu)$ for n = 1. It is just the lognormal distribution itself with the mean subtracted. [Figure: lognormal X, n = 1; 10 million samples]

36 Here is the distribution of $\sqrt{n}(\bar{X} - \mu)$ for n = 10. The theoretical limiting distribution is also shown. Clearly, n = 10 is far from being ‘sufficiently large’. [Figure: lognormal X, n = 10, with limiting normal distribution; 10 million samples]

37 Here is the distribution of $\sqrt{n}(\bar{X} - \mu)$ for n = 25. It is closer to the limiting distribution but there is still a long way to go. [Figure: lognormal X, n = 25, with limiting normal distribution; 10 million samples]

38 Here is the distribution of $\sqrt{n}(\bar{X} - \mu)$ for n = 100. It is closer still to the limiting distribution but convergence has not been achieved. In the case of the lognormal distribution, even a sample size of 100 is clearly not ‘sufficiently large’. We should try 200, perhaps 500. [Figure: lognormal X, n = 100, with limiting normal distribution; 10 million samples]
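
The suggestion to try larger samples can be explored with a tail check. If $\sqrt{n}(\bar{X} - \mu)$ had reached its limiting distribution, about 5 percent of the simulated values would fall below the lower 5 percent point of $N(0, \sigma^2)$ and about 5 percent above the upper one. The sketch below assumes log X is standard normal, since the slides do not state the lognormal parameters; the function name, seed, and replication count are likewise illustrative.

```python
import math
import random
import statistics


def tail_frequencies(n, reps, seed=0):
    """Simulated lower and upper 5% tail frequencies of sqrt(n)(X-bar - mu).

    Assumption for illustration: log X ~ N(0, 1); the slides do not state the
    lognormal parameters. Then mu = exp(0.5) and sigma^2 = (e - 1) * e.
    """
    mu = math.exp(0.5)
    sigma = math.sqrt((math.e - 1.0) * math.e)
    cutoff = statistics.NormalDist().inv_cdf(0.95) * sigma  # 5% point of the limit
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(reps):
        xbar = sum(rng.lognormvariate(0.0, 1.0) for _ in range(n)) / n
        stat = math.sqrt(n) * (xbar - mu)
        if stat < -cutoff:
            lower += 1
        elif stat > cutoff:
            upper += 1
    return lower / reps, upper / reps


if __name__ == "__main__":
    for n in (10, 25, 100, 500):
        lower, upper = tail_frequencies(n, reps=2_000)
        print(f"n={n:3d}  lower tail={lower:.3f}  upper tail={upper:.3f}  (limit: 0.050 each)")
```

Unequal tail frequencies reflect the asymmetry in the tails noted for the lognormal figures; the two numbers should move towards 0.05 as n is increased, subject to simulation noise at this small number of replications.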

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section R.15 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre. Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course EC2020 Elements of Econometrics.