Charles University, FSV UK, Institute of Economic Studies, Faculty of Social Sciences
Econometrics (STAKAN III), Jan Ámos Víšek
Seventh Lecture, Tuesday 14.00 - 15.20

Schedule of today's talk: Verification of the further assumptions for BLUE and for consistency, and their discussion. Verification of the normality of disturbances - by tests and graphically. Modification of our results for the framework with random explanatory variables.

Let us recall once again the Theorem:
Assumptions: Let {e_i} be a sequence of r.v.'s with E e_i = 0, var(e_i) = σ² and cov(e_i, e_j) = 0 for i ≠ j.
Assertion: Then the OLS estimator is the best linear unbiased estimator.
Assumptions: If moreover (1/n)·XᵀX converges to a regular matrix Q and the e_i's are independent,
Assertion: then the OLS estimator is consistent.
Assumptions: If further (1/n)·XᵀX → Q with Q a regular matrix,
Assertion: then the OLS estimator is asymptotically normal.
How can we verify that cov(e_i, e_j) = 0 for i ≠ j? For cross-sectional data it cannot be tested - it is to be deduced!!!

Remember that in the First Lecture we introduced the "types of data".
"Cross-sectional data" - the order of the rows is not relevant. On every row there is one patient, one industry, etc.; we usually say that on any row there is one "case".
"Panel data" - the order of the rows is relevant. The rows contain the values for a given patient (industry, etc.) at successive times, and the order of the rows within blocks is relevant. Combinations of both types also occur and are usually called "panel data" as well.
(Let us continue on the next slide with a discussion of why the verification of the uncorrelatedness of disturbances can't be (in principle) "statistical", but only heuristic.)

Why can the verification of the uncorrelatedness of disturbances not be (in principle) "statistical", but only heuristic?
We use the residuals e_i as "substitutes" for the disturbances ε_i; hence for every case i we have only one "realization" of the r.v. ε_i. So we have no chance to check whether two different r.v.'s are uncorrelated.
What we can check is, e.g., that successive disturbances are uncorrelated for all i (such a test will be offered by the Durbin-Watson statistic for panel data), but it is senseless for cross-sectional data - remember that the order of the data is irrelevant.
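The successive-disturbances check mentioned above can be sketched numerically. The following minimal illustration of the Durbin-Watson statistic (function name and toy residuals are mine) is meaningful only when the ordering of the cases carries information:

```python
# Minimal sketch of the Durbin-Watson statistic:
#   DW = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_i e_i^2.
# Values near 2 suggest no serial correlation; near 0 positive,
# near 4 negative serial correlation.
def durbin_watson(residuals):
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

print(durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0]))  # smooth runs -> below 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))             # alternating signs -> above 2
```

For cross-sectional data any permutation of the rows gives a different DW value, which is exactly why the statistic is senseless there.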

Let us recall once again the Theorem:
Assumptions: Let {e_i} be a sequence of r.v.'s with zero mean, common variance σ² and zero correlation.
Assertion: Then the OLS estimator is the best linear unbiased estimator.
Assumptions: If moreover (1/n)·XᵀX converges and the e_i's are independent,
Assertion: then the OLS estimator is consistent.
Assumptions: If further the limit matrix Q is regular,
Assertion: then the OLS estimator is asymptotically normal.
How can we verify these conditions - the convergence of (1/n)·XᵀX, the regularity of its limit Q, and the boundedness of ((1/n)·XᵀX)⁻¹?

The answer follows from the fact that all three conditions are in the form of limits ⇒ they can't be verified! Of course, they give a hint as to which data are suitable to be "explained" by a regression model; in other words, they indicate when we can expect the estimated regression model to be a reliable "explanation" of the data. Notice that in both cases the words "explained" and "explanation" are in quotation marks!!! This has a philosophical motivation. (We'll discuss it later.)

Let us consider the condition that (1/n)·XᵀX converges. If the distance between the two clouds of points (the "red" distance on the original slide) increases beyond any limit, the condition is not fulfilled. From the computational point of view this means that OLS will treat the lower-left cloud as one point and the two other points (in the upper-right corner) as another one, so the information is reduced to two points. The condition can, however, be violated by many other types of data: e.g. x_i = 1, 2, 3, ... gives (1/n)·Σ x_i² = (n+1)(2n+1)/6, which does not converge.

Let us now consider the condition that ((1/n)·XᵀX)⁻¹ is bounded, i.e. that the limit matrix Q is regular. If the cloud at the center contains more and more points, nearer and nearer to its "center of gravity", the condition will be broken and OLS will treat all the data as one point. The condition can, however, be violated by many other types of data: e.g. for x = 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, ..., the k-th "1" appears at the position n = k(k+1)/2, hence (1/n)·Σ x_i² = 2/(k+1) → 0, and ((1/n)·Σ x_i²)⁻¹ is not bounded.
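The two failure modes above are easy to check numerically. A small sketch (plain Python; the helper names are mine) computes (1/n)·Σ x_i² for both counter-example sequences - the first diverges, the second vanishes, so in neither case is the limit a finite regular matrix:

```python
# Numeric check of the limit condition (1/n) * sum(x_i^2) -> Q for a
# single regressor: Q must be finite and nonzero ("regular").

def mean_of_squares(xs):
    return sum(x * x for x in xs) / len(xs)

# Case 1: x = 1, 2, 3, ... -> (1/n) sum x_i^2 = (n+1)(2n+1)/6, no finite limit.
for n in (10, 100, 1000):
    print(n, mean_of_squares(list(range(1, n + 1))))   # grows with n

# Case 2: the k-th "1" placed at position n = k(k+1)/2, zeros elsewhere
# -> (1/n) sum x_i^2 = 2/(k+1) -> 0, so the limit is singular and its
# inverse is unbounded.
def sparse_ones(n):
    ones = set(k * (k + 1) // 2 for k in range(1, n + 1)
               if k * (k + 1) // 2 <= n)
    return [1.0 if i in ones else 0.0 for i in range(1, n + 1)]

for n in (10, 100, 1000):
    print(n, mean_of_squares(sparse_ones(n)))          # shrinks toward 0
```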

Finally, let us recall also the Theorem:
Assumptions: Let the e_i's be i.i.d. r.v.'s, normally distributed with zero mean and variance σ².
Assertion: Then the OLS estimator attains the Rao-Cramér lower bound, i.e. it is the best unbiased estimator (not merely BLUE).
Assumptions: Conversely, if the OLS estimator is the best unbiased estimator, attaining the Rao-Cramér lower bound of variance,
Assertion: then the disturbances are normally distributed.
Moreover, we showed (Third Lecture) that restricting ourselves to linear estimators is drastic, i.e. we should guarantee that the condition under which OLS is the best among all estimators holds. It means that the normality of the disturbances is to be checked!!

Testing whether the disturbances are normally distributed?
We have the numbers e_1, ..., e_n and we assume that they represent a realization of a sequence of i.i.d. random variables governed by a d.f. F. How do we test this assumption? There are basically two types of (statistical) tests: tests based on the mutual fit of the empirical and theoretical d.f., or on a comparison of observed frequencies with the theoretical density - goodness-of-fit tests; and tests based on some specific feature of the given d.f. - e.g. the test for normality based on skewness and kurtosis.

Kolmogorov-Smirnov distance D
We look for the maximal distance D between the empirical (observed) d.f. - the step function of the sample, equal to zero below the smallest observation - and the theoretical (assumed) d.f. Available e.g. in STATISTICA.
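As a sketch of how D can be computed (stdlib only; the helper names are my own, not any particular package's API): the empirical d.f. jumps at each ordered observation, so the maximal deviation has to be checked just before and just after every jump.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # theoretical (assumed) d.f., here normal, via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_distance(sample, cdf):
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # the empirical d.f. jumps from i/n to (i+1)/n at x: check both sides
        d = max(d, abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
    return d

print(ks_distance([0.3, -1.2, 0.8, 0.1, -0.5], normal_cdf))
```

The statistic √n·D is then compared with the critical values of the Kolmogorov distribution.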

χ²-test of goodness of fit
Assume k areas exhausting the support of the d.f. If the probability of an area is p_j, then out of n realizations approximately n·p_j should fall into it. If f_j denotes the actual number of observations falling into it, we should compare f_j with n·p_j, evaluating
χ² = Σ_{j=1..k} (f_j - n·p_j)² / (n·p_j).
This is the most frequently used goodness-of-fit test. Available also in STATISTICA.
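A minimal sketch of this statistic (stdlib Python; the cell counts below are invented for illustration):

```python
# chi2 = sum over cells of (observed - expected)^2 / expected,
# with expected = n * p_j for cell probability p_j.
def chi_square_stat(observed, probs, n):
    return sum((f - n * p) ** 2 / (n * p) for f, p in zip(observed, probs))

# Example: 100 draws over 4 equiprobable cells
stat = chi_square_stat([28, 22, 25, 25], [0.25] * 4, 100)
print(stat)  # 0.72
```

The value is compared with the χ² critical value with k − 1 degrees of freedom (fewer when parameters are estimated from the data).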

Can we apply the χ²-test of goodness of fit to the residuals? Let us recall that the residuals satisfy e = (I - H)·ε, where H = X(XᵀX)⁻¹Xᵀ, and hence they are correlated and heteroscedastic. Conclusion: the residuals are not i.i.d.!!!

The residuals should be recalculated - Theil's residuals (1965)
Using the eigenvalues and eigenvectors of the relevant (by assumption regular) matrix, Theil constructed recalculated residuals (the so-called BLUS residuals) whose coordinates are i.i.d., and which are normally distributed iff the disturbances ε_i are normally distributed. Conclusion: we can apply the goodness-of-fit tests to the recalculated residuals.

Should the residuals really be recalculated? Let us recall that the matrix H = X(XᵀX)⁻¹Xᵀ is idempotent and hence its trace equals p. It means that its diagonal elements are on average of order p/n; the Cauchy-Schwarz inequality then implies the same for the nondiagonal elements. Finally, the formula e = (I - H)·ε indicates that the correlation of the residuals is weak - asymptotically zero - and that the heteroscedasticity is low - it asymptotically disappears.
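The order-p/n claim is easy to check numerically. A stdlib-only sketch for the simplest case p = 2 (intercept plus one regressor; the helper and its explicit 2×2 inverse are mine):

```python
# Diagonal of the hat matrix H = X (X'X)^{-1} X'. Its trace equals p,
# so the diagonal elements are on average p/n and vanish as n grows.
def hat_diagonal(xs):
    # design matrix with intercept: rows (1, x_i); p = 2
    n = len(xs)
    s1, sx, sxx = n, sum(xs), sum(x * x for x in xs)
    det = s1 * sxx - sx * sx
    # inverse of X'X = [[n, sx], [sx, sxx]] via the 2x2 cofactor formula
    a, b, d = sxx / det, -sx / det, s1 / det
    # h_ii = (1, x_i) (X'X)^{-1} (1, x_i)'
    return [a + 2 * b * x + d * x * x for x in xs]

h = hat_diagonal([float(i) for i in range(1, 51)])
print(sum(h))   # trace of H = p = 2 (up to rounding)
print(max(h))   # the largest leverage shrinks as n grows
```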

The residuals need not be recalculated
The approach presented above was advised in econometric monographs up to the mid-seventies. Nevertheless, there were papers showing that asymptotically the results for the recalculated residuals coincide with the results for the "original" residuals (the idea of the proof was given on the previous slide). In 1974 a large Monte Carlo study by Bolch and Huang showed that the results may even be better for the "original" residuals than for the recalculated ones. Conclusion: nowadays we usually apply the tests to the original residuals.

Test for normality based on skewness and kurtosis
Put m_k = (1/n)·Σ (e_i - ē)^k. Then m₃/m₂^{3/2} and m₄/m₂² are the sample skewness and the sample kurtosis, respectively, and they are asymptotically normally distributed (with variances of order 1/n). Both tests reject the hypothesis of normality when the corresponding statistic is too far from its value under normality, at the level of significance α. Similar to them is the Jarque-Bera test, available e.g. in TSP.
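A hedged sketch of the Jarque-Bera statistic built from the sample moments above (stdlib only; under normality JB is asymptotically χ² with 2 degrees of freedom):

```python
# JB = n/6 * (skew^2 + (kurt - 3)^2 / 4), where skew = m3 / m2^{3/2}
# and kurt = m4 / m2^2 are the sample skewness and kurtosis.
def sample_moments(es):
    n = len(es)
    mean = sum(es) / n
    m2 = sum((e - mean) ** 2 for e in es) / n
    m3 = sum((e - mean) ** 3 for e in es) / n
    m4 = sum((e - mean) ** 4 for e in es) / n
    return m2, m3, m4

def jarque_bera(es):
    n = len(es)
    m2, m3, m4 = sample_moments(es)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

print(jarque_bera([0.3, -1.2, 0.8, 0.1, -0.5, 1.9, -0.7, 0.2]))
```

Normality is rejected when JB exceeds the χ²(2) critical value for the chosen α.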

A graphical test for normality - the normal plot
We have the numbers e_1, ..., e_n and we assume that they represent a realization of a sequence of i.i.d. random variables governed by a d.f. F. How do we estimate, say, the lower 30% quantile? Even common sense would advise ordering the sample, e_(1) ≤ e_(2) ≤ ... ≤ e_(n), and selecting the e_(i) such that i/n ≈ 0.3. Theory says that this estimate is consistent. So the ordered residuals should correspond to the quantiles of the underlying (we of course assume normal) distribution; in other words, e_(i) ≈ F⁻¹(i/n).

A graphical test for normality - the normal plot
We plot the ordered residuals against the corresponding quantiles of the normal distribution. If the points of the graph lie (approximately) on a line, the hypothesis of normality can't be rejected.
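The plot itself is just the ordered residuals against normal quantiles. A stdlib sketch producing those pairs (NormalDist needs Python 3.8+; the plotting positions (i − 0.5)/n are one common convention, chosen here to avoid the infinite quantiles at 0 and 1):

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    # pairs (theoretical normal quantile, ordered residual)
    xs = sorted(residuals)
    n = len(xs)
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return qs, xs

def straightness(qs, xs):
    # correlation of the plotted pairs: values close to 1 mean the points
    # lie nearly on a line, so normality cannot be rejected by eye
    n = len(qs)
    mq, mx = sum(qs) / n, sum(xs) / n
    cov = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    vq = sum((q - mq) ** 2 for q in qs) ** 0.5
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    return cov / (vq * vx)

qs, xs = normal_plot_points([0.3, -1.2, 0.8, 0.1, -0.5, 1.9, -0.7, 0.2])
print(straightness(qs, xs))  # close to 1 suggests approximate normality
```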

Let us conclude the testing of normality by recalling the Theorem once more: under the assumptions above, the OLS estimator is BLUE; under the additional limit assumptions it is consistent and asymptotically normal. And we have added: it means that the normality of the disturbances is to be checked!! It would, however, be better to add: it means that the normality of the disturbances is to be reached!!

How can we say: the normality of the disturbances is to be reached!? Of course, this is closely related to the philosophy of modeling, to the notions of causality and natural laws, to the underlying "true" model, etc. We shall discuss it in a special lecture, but later. For the moment, let us simply assume that we look for a model which works. Then, of course, we can transform the data - the most frequently used transformation is the logarithmic one. Box and Cox, in the sixties, studied transformations of the type y^(λ) = (y^λ - 1)/λ for λ ≠ 0, with the logarithm as the limit case λ = 0.
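A minimal sketch of this family of transformations (the λ = 0 case is the limit of the general formula, which is why the logarithm belongs to the family; y must be positive):

```python
import math

def box_cox(y, lam):
    # y(lambda) = (y**lam - 1)/lam for lam != 0, log(y) for lam == 0
    if y <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

print(box_cox(2.0, 1.0))   # 1.0 (lambda = 1 merely shifts the data)
print(box_cox(2.0, 0.0))   # log 2, approximately 0.693
```

In practice λ is chosen so that the transformed data look as close to normal as possible.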

At the start of the Fourth Lecture we recalled what we already know about the linear regression model: the OLS estimator is BLUE, it is consistent, it is asymptotically normal, and (under normality of the disturbances) it is the best among all unbiased estimators. But we did not recall that already in the First Lecture we said that the explanatory variables would be assumed to be deterministic.

But sometimes it may be more appropriate to assume that the explanatory variables are also random (we usually speak about the "random-carrier framework"). Consider, e.g., that we look for oil in an Arabian desert!! How are the assumptions to be reformulated, and how should the theory be modified? Assumptions on the disturbances: the orthogonality condition (the disturbances have zero mean conditionally on the explanatory variables) and the sphericality condition (conditional homoscedasticity and uncorrelatedness) ⇒ the OLS estimator is BLUE.

Assumptions on the explanatory variables: the rows of X are i.i.d. r.v.'s with a regular second-moment matrix ⇒ the OLS estimator is consistent and asymptotically normal. Under the further assumption of normality of the disturbances ⇒ the OLS estimator is the best among all unbiased estimators.

What is to be learnt from this lecture for the exam? How to test the assumptions - that the disturbances are not correlated, and the assumptions for consistency. Verification of the normality of the disturbances - by statistical tests and graphically. Modifications of everything that was assumed for the model with deterministic explanatory variables to the model with random ones. All you need is at http://samba.fsv.cuni.cz/~visek/