Econometric Modeling.


1 Econometric Modeling

2 Econometric Models
Econometric models are statistical models used in econometrics.[1] An econometric model specifies the statistical relationship believed to hold between the various economic quantities pertaining to a particular economic phenomenon under study. It can be derived from a deterministic economic model by allowing for uncertainty, or from an economic model that is itself stochastic. However, it is also possible to use econometric models that are not tied to any specific economic theory.

3 Example Suppose monthly spending by consumers depends linearly on consumers' income in the previous month. The model then consists of the equation Ct = α + βYt−1 + μt, where Ct is consumer spending in month t, Yt−1 is income during the previous month, and μt is an error term measuring the extent to which the model cannot fully explain consumption. One objective of the econometrician is to obtain estimates of the parameters α and β; these estimated parameter values, when used in the model's equation, enable predictions of future consumption contingent on the prior month's income.
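The consumption equation above can be estimated by ordinary least squares. A minimal sketch, assuming numpy is available; the income and consumption figures are made-up illustrative data, not from the text:

```python
# Sketch: estimating C_t = alpha + beta * Y_{t-1} + mu_t by OLS.
# The income and consumption series below are hypothetical.
import numpy as np

income = np.array([100.0, 105.0, 103.0, 110.0, 115.0, 112.0, 120.0, 125.0])
consumption = np.array([np.nan, 82.0, 85.0, 84.0, 89.0, 93.0, 91.0, 97.0])

# Regress C_t on Y_{t-1}: drop the first month, which has no lagged income.
y_lag = income[:-1]          # Y_{t-1}
c = consumption[1:]          # C_t
X = np.column_stack([np.ones_like(y_lag), y_lag])

(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, c, rcond=None)

# Forecast next month's consumption from the latest observed income.
c_forecast = alpha_hat + beta_hat * income[-1]
print(alpha_hat, beta_hat, c_forecast)
```

The estimated β̂ is the marginal propensity to consume out of last month's income; the forecast is conditional on the most recent income observation, exactly as the slide describes.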

4 Model selection criteria
In its most basic form, model selection is one of the fundamental tasks of scientific inquiry. A chosen model should: Be data admissible: that is, predictions made from the model must be logically possible. Be consistent with theory: that is, it must make good economic sense. For example, if Milton Friedman's permanent income hypothesis holds, the intercept in the regression of permanent consumption on permanent income is expected to be zero.

5 Model selection criteria
Model selection techniques can be considered as estimators of some physical quantity, such as the probability of the model producing the given data. The bias and variance are both important measures of the quality of this estimator; efficiency is also often considered. A standard example of model selection is that of curve fitting, where, given a set of points and other background knowledge (e.g. points are a result of i.i.d. samples), we must select a curve that describes the function that generated the points.
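The curve-fitting example above can be made concrete with a penalized-fit criterion. A sketch, assuming numpy and using a Gaussian AIC as the selection rule (the quadratic data-generating process is invented for illustration):

```python
# Sketch: model selection as curve fitting. Points come from a quadratic plus
# noise; we choose the polynomial degree by a penalized-fit criterion (AIC).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 40)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, x.size)

def aic(y, y_hat, k):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

scores = {}
for degree in range(1, 7):
    coefs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coefs, x)
    scores[degree] = aic(y, y_hat, degree + 1)

best = min(scores, key=scores.get)
print(best)   # the quadratic usually wins: better fit than linear, less penalty than higher degrees
```

The AIC trades off residual fit (a bias-related quantity) against the number of parameters (which drives estimator variance), mirroring the bias/variance framing in the slide.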

6 Model selection criteria
Be encompassing: that is, the model should encompass or include all rival models in the sense that it is capable of explaining their results. In short, other models cannot be an improvement over the chosen model. The values of the parameters should be stable; otherwise, forecasting would be difficult.

7 Exploratory Data Analysis (EDA)
It is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA),[1] which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA.

8 Specifying an Econometric Equation and Specification Error
Before any equation can be estimated, it must be completely specified. Specifying an econometric equation consists of three parts, namely choosing the correct: independent variables, functional form, and form of the stochastic error term. Again, this is part of the first classical assumption from Chapter 4. A specification error results when any one of these choices is made incorrectly. This chapter deals with the first of these choices; the other two are discussed in subsequent chapters. © 2011 Pearson Addison-Wesley. All rights reserved.

9 Types of Specification Problems

10 Omitted Variables
Two reasons why an important explanatory variable might have been left out: we forgot it, or it is not available in the dataset we are examining. Either way, this may lead to omitted variable bias (or, more generally, specification bias). The reason is that when a variable is not included, it cannot be held constant. Omitting a relevant variable usually renders the entire equation suspect, because of the likely bias in the coefficients.

11 The Consequences of an Omitted Variable
Suppose the true regression model is: Yi = β0 + β1X1i + β2X2i + εi (6.1), where εi is a classical error term. If X2 is omitted, the equation becomes instead: Yi = β0 + β1X1i + εi* (6.2), where: εi* = εi + β2X2i (6.3). Hence, the explanatory variable in the estimated regression (6.2) is not independent of the error term (unless the omitted variable is uncorrelated with all the included variables, which is very unlikely). But this violates Classical Assumption III!

12 The Consequences of an Omitted Variable (cont.)
What happens if we estimate Equation 6.2 when Equation 6.1 is the truth? We get bias! What this means is that: E(β̂1) ≠ β1 (6.4). The amount of bias is a function of the impact of the omitted variable on the dependent variable times a function of the correlation between the included and the omitted variable. Or, more formally: Bias = E(β̂1) − β1 = β2α1 (6.7), where α1 is the slope coefficient from a regression of the omitted X2 on the included X1. So, the bias exists unless: the true coefficient β2 equals zero, or the included and omitted variables are uncorrelated (α1 = 0).

13 Correcting for an Omitted Variable
In theory, the solution to a problem of specification bias seems easy: add the omitted variable to the equation! Unfortunately, that's easier said than done, for a couple of reasons. Omitted variable bias is hard to detect: the amount of bias introduced can be small and not immediately noticeable. And even if it has been decided that a given equation suffers from omitted variable bias, how do you decide exactly which variable to include? Note that dropping a variable is not a viable strategy to cure omitted variable bias: if anything, you'll just generate even more omitted variable bias in the remaining coefficients!

14 Correcting for an Omitted Variable (cont.)
What if you have an unexpected result, which leads you to believe that you have an omitted variable, and you have two or more theoretically sound explanatory variables as potential "candidates" for inclusion in the equation? How do you choose between these variables? One possibility is expected bias analysis. Expected bias: the likely bias that omitting a particular variable would have caused in the estimated coefficient of one of the included variables.

15 Correcting for an Omitted Variable (cont.)
Expected bias can be estimated with Equation 6.7. When do we have a viable candidate? When the sign of the expected bias is the same as the sign of the unexpected result. Conversely, when these signs differ, the variable is extremely unlikely to have caused the unexpected result.

16 Irrelevant Variables
This refers to the case of including a variable in an equation when it does not belong there. This is the opposite of the omitted variables case, and so the impact can be illustrated using the same model. Assume that the true regression specification is: Yi = β0 + β1X1i + εi (6.10). But the researcher for some reason includes an extra variable: Yi = β0 + β1X1i + β2X2i + εi** (6.11). The misspecified equation's error term then becomes: εi** = εi − β2X2i (6.12).

17 Irrelevant Variables (cont.)
So, the inclusion of an irrelevant variable will not cause bias (since the true coefficient of the irrelevant variable is zero, the second term drops out of Equation 6.12). However, the inclusion of an irrelevant variable will: Increase the variance of the estimated coefficients (unless r12 = 0), and this increased variance will tend to decrease the absolute magnitude of their t-scores. Decrease the adjusted R² (but not the R²). Table 6.1 summarizes the consequences of the omitted variable and the included irrelevant variable cases.
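The "no bias, more variance" result can also be checked by simulation. A sketch, assuming numpy; the correlation between X1 and the irrelevant X2 is invented for illustration:

```python
# Sketch: including an irrelevant X2 does not bias beta1_hat, but it inflates
# its sampling variance when X2 is correlated with X1 (r12 != 0).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 1000
b_true_spec, b_over_spec = [], []
for _ in range(reps):
    x1 = rng.normal(0, 1, n)
    x2 = 0.8 * x1 + rng.normal(0, 0.6, n)     # irrelevant but correlated with x1
    y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)  # true model: x2 plays no role
    X_small = np.column_stack([np.ones(n), x1])
    X_big = np.column_stack([np.ones(n), x1, x2])
    b_true_spec.append(np.linalg.lstsq(X_small, y, rcond=None)[0][1])
    b_over_spec.append(np.linalg.lstsq(X_big, y, rcond=None)[0][1])

# Both means sit near the true value 2 (no bias), but the sampling variance
# of beta1_hat is larger in the over-specified regression.
print(np.mean(b_over_spec), np.var(b_over_spec) / np.var(b_true_spec))
```

The variance ratio printed is the familiar variance inflation factor 1/(1 − r12²), which is why the slide's caveat "unless r12 = 0" matters.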

18 Table 6.1 Effect of Omitted Variables and Irrelevant Variables on the Coefficient Estimates

19 Four Important Specification Criteria
We can summarize the previous discussion into four criteria to help decide whether a given variable belongs in the equation: 1. Theory: Is the variable's place in the equation unambiguous and theoretically sound? 2. t-Test: Is the variable's estimated coefficient significant in the expected direction? 3. Adjusted R²: Does the overall fit of the equation (adjusted for degrees of freedom) improve when the variable is added? 4. Bias: Do other variables' coefficients change significantly when the variable is added? If all these conditions hold, the variable belongs in the equation; if none hold, it does not. The tricky part is the intermediate cases: use sound judgment!
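Criterion 3 can be computed directly. A sketch, assuming numpy; the data and the candidate variable are made up, and the candidate is deliberately pure noise:

```python
# Sketch: compare R-squared and adjusted R-squared with and without a candidate
# variable. Plain R-squared never falls when a variable is added; the adjusted
# version, which penalizes lost degrees of freedom, can fall.
import numpy as np

def fit_r2(y, X):
    """Return (R2, adjusted R2) for an OLS fit of y on X (X includes a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    n, k = X.shape
    adj = 1 - (1 - r2) * (n - 1) / (n - k)
    return r2, adj

rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(0, 1, n)
candidate = rng.normal(0, 1, n)            # candidate with no real explanatory power
y = 1 + 2 * x1 + rng.normal(0, 1, n)

r2_a, adj_a = fit_r2(y, np.column_stack([np.ones(n), x1]))
r2_b, adj_b = fit_r2(y, np.column_stack([np.ones(n), x1, candidate]))
print(r2_a, adj_a)
print(r2_b, adj_b)   # R2 cannot fall when a variable is added; adjusted R2 may
```

This is why the criterion uses the fit adjusted for degrees of freedom: raw R² would vote to include every candidate.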

20 Specification Searches
Almost any result can be obtained from a given dataset by simply specifying different regressions until estimates with the desired properties are obtained. Hence, the integrity of all empirical work is open to question. To counter this, the following three best practices in specification searches are suggested: 1. Rely on theory rather than statistical fit as much as possible when choosing variables, functional forms, and the like. 2. Minimize the number of equations estimated (except for sensitivity analysis, discussed later in this section). 3. Reveal, in a footnote or appendix, all alternative specifications estimated.

21 Sequential Specification Searches
The sequential specification search technique allows a researcher to estimate an undisclosed number of regressions and then present a final choice (based on an unspecified set of expectations about the signs and significance of the coefficients) as if it were the only specification estimated. Such a method misstates the statistical validity of the regression results for two reasons: 1. The statistical significance of the results is overstated because the estimations of the previous regressions are ignored. 2. The expectations used by the researcher to choose between the various regression results are rarely, if ever, disclosed.

22 Bias Caused by Relying on the t-Test to Choose Variables
Dropping variables solely on the basis of low t-statistics may lead to two different types of errors: 1. An irrelevant explanatory variable may sometimes be included in the equation (i.e., when it does not belong there). 2. A relevant explanatory variable may sometimes be dropped from the equation (i.e., when it does belong). In the first case there is no bias, but in the second case there is. Hence, the estimated coefficients will be biased every time an excluded variable belongs in the equation, and that variable will be excluded every time its estimated coefficient is not statistically significantly different from zero. So we will have systematic bias in our equation!

23 Sensitivity Analysis
Contrary to the advice of estimating as few equations as possible (and based on theory rather than fit!), journal article authors sometimes list results from five or more specifications. What's going on here? In almost every case, these authors have employed a technique called sensitivity analysis. This consists of purposely running a number of alternative specifications to determine whether particular results are robust to a change in specification (i.e., not statistical flukes). Why is this useful? Because the true specification isn't known!

24 Data Mining
Data mining involves exploring a data set to try to uncover empirical regularities that can inform economic theory. That is, the role of data mining is the opposite of that of traditional econometrics, which instead tests economic theory on a data set. Be careful, however! A hypothesis developed using data mining techniques must be tested on a different data set (or in a different context) than the one used to develop it. Not doing so would be highly unethical: after all, the researcher already knows ahead of time what the results will be!

25 Tests of Specification Error

26 6.2 Durbin-Watson Test The Durbin-Watson statistic is d = Σt=2..n (ût − ût−1)² / Σt=1..n ût², where ût are the OLS residuals. Values of d near 2 indicate no first-order autocorrelation; values well below 2 indicate positive autocorrelation.

27 6.2 Durbin-Watson Test The sampling distribution of d depends on the values of the explanatory variables, so Durbin and Watson derived upper and lower limits (dU and dL) for the significance levels of d. There are tables to test the hypothesis of zero autocorrelation against the hypothesis of first-order positive autocorrelation. (For negative autocorrelation, the test is applied to 4 − d.)

28 6.2 Durbin-Watson Test If d < dL, we reject the null hypothesis of no autocorrelation. If d > dU, we do not reject the null hypothesis. If dL ≤ d ≤ dU, the test is inconclusive.

29 6.2 Durbin-Watson Test Illustrative Example
Consider the production function data in Table 3.11, with the estimated production function being the levels equation reported in Chapter 4, equation (4.24). Referring to the DW table with k′ = 2 and n = 39 at the 5% significance level gives the critical values dL and dU. Since the observed d is less than dL, we reject the hypothesis of zero autocorrelation at the 5% level.

30 6.2 Limitations of the D-W Test
It tests only for first-order serial correlation. The test is inconclusive if the computed value lies between dL and dU. The test cannot be applied in models with lagged dependent variables.
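The d statistic is simple to compute from residuals. A sketch, assuming numpy; the AR(1) error process and all coefficients are simulated for illustration:

```python
# Sketch: computing the Durbin-Watson statistic d from OLS residuals.
# d near 2 suggests no first-order autocorrelation; d well below 2 suggests
# positive autocorrelation. Critical values d_L, d_U come from the DW tables.
import numpy as np

def durbin_watson(residuals):
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(4)
n = 200
x = rng.normal(0, 1, n)

# Build an AR(1) error series: u_t = 0.7 * u_{t-1} + e_t (positive autocorrelation).
u = np.zeros(n)
e = rng.normal(0, 1, n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + e[t]
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
d = durbin_watson(y - X @ beta)
print(d)   # roughly 2 * (1 - rho) = 0.6, well below 2
```

The useful approximation d ≈ 2(1 − ρ̂) explains the reading: ρ near 1 drives d toward 0, ρ near −1 drives it toward 4.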

31 6.3 Estimation in Levels Versus First Differences
If the DW test rejects the hypothesis of zero serial correlation, what is the next step? A simple solution to the serial correlation problem is to estimate the regression after transforming all the variables by ρ-differencing (quasi-first differencing) or by first differencing.

32 6.3 Estimation in Levels Versus First Differences

33 6.3 Estimation in Levels Versus First Differences

34 6.3 Estimation in Levels Versus First Differences
When comparing equations in levels and in first differences, one cannot compare the R² values because the dependent variables are different. One can compare the residual sums of squares, but only after making a rough adjustment (see p. 231).
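The point about non-comparable R² values can be illustrated by estimating the same relationship both ways. A sketch, assuming numpy; the trending regressor and AR(1) errors are simulated for illustration:

```python
# Sketch: estimating one relationship in levels and in first differences.
# The R-squared values are not comparable because the dependent variables
# differ (y_t in one regression, y_t - y_{t-1} in the other).
import numpy as np

rng = np.random.default_rng(5)
n = 120
x = np.cumsum(rng.normal(0, 1, n))            # a trending regressor
u = np.zeros(n)
e = rng.normal(0, 1, n)
for t in range(1, n):
    u[t] = 0.9 * u[t - 1] + e[t]              # strongly autocorrelated errors
y = 1 + 2 * x + u

def ols(y, x):
    """Return (slope, R2) for a bivariate OLS regression with intercept."""
    X = np.column_stack([np.ones(y.size), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta[1], r2

b_lev, r2_lev = ols(y, x)                     # levels regression
b_dif, r2_dif = ols(np.diff(y), np.diff(x))   # first-difference regression
print(b_lev, b_dif)       # both slopes near the true value 2
print(r2_lev, r2_dif)     # very different, but NOT directly comparable
```

Both regressions recover the same slope, yet the levels R² is inflated by the common trend; judging between them by R² alone would be misleading, which is why the adjusted residual-sum-of-squares comparison is used instead.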

35 6.3 Estimation in Levels Versus First Differences
Let RSS1 denote the residual sum of squares from the levels equation and RSS2 the residual sum of squares from the first-difference equation. A residual sum of squares from the levels equation comparable to RSS2 can then be constructed using the adjustment described in the text.

36 6.3 Estimation in Levels Versus First Differences
Illustrative Examples: Consider the simple Keynesian model discussed by Friedman and Meiselman. The equation estimated in levels relates Ct to At, where Ct = personal consumption expenditure (current dollars) and At = autonomous expenditure.

37 6.3 Estimation in Levels Versus First Differences
The model fitted for the sample period gave the following estimates (figures in parentheses are standard errors).

38 6.3 Estimation in Levels Versus First Differences
This is to be compared with the corresponding estimates from the equation in first differences.

39 6.3 Estimation in Levels Versus First Differences
For the production function data in Table 3.11, the first-difference equation was estimated. The comparable figures for the levels equation were reported earlier in Chapter 4, equation (4.24).

40 6.3 Estimation in Levels Versus First Differences
This is to be compared with the corresponding figures from the equation in first differences.

41 Errors of Measurement The Keynesian model assumes that the data on consumption and income are accurate. Unfortunately, this assumption is not met in practice, for a variety of reasons such as nonresponse errors, reporting errors, and computing errors. Errors of measurement in the dependent variable still give unbiased estimates of the parameters and their variances.

42 Errors of Measurement The estimated variances are, however, larger than in the case where there are no such errors of measurement. In the case of measurement error in the explanatory variable X, there is no satisfactory general solution. That is why it is so crucial to measure the data as accurately as possible.
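The asymmetry between the two cases can be demonstrated by simulation. A sketch, assuming numpy and classical (mean-zero, independent) measurement error; all variances are invented for illustration:

```python
# Sketch: classical measurement error. Error in the dependent variable leaves
# the slope estimate unbiased (it just inflates the error variance); error in
# the explanatory variable biases the slope toward zero (attenuation).
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 300
slope_y_err, slope_x_err = [], []
for _ in range(reps):
    x = rng.normal(0, 1, n)
    y = 1 + 2 * x + rng.normal(0, 1, n)
    y_obs = y + rng.normal(0, 1, n)          # mismeasured dependent variable
    x_obs = x + rng.normal(0, 1, n)          # mismeasured regressor
    X = np.column_stack([np.ones(n), x])
    X_obs = np.column_stack([np.ones(n), x_obs])
    slope_y_err.append(np.linalg.lstsq(X, y_obs, rcond=None)[0][1])
    slope_x_err.append(np.linalg.lstsq(X_obs, y, rcond=None)[0][1])

# Attenuation factor = var(x) / (var(x) + var(noise)) = 1/2 here, so the
# second mean sits near 1 instead of the true slope 2.
print(np.mean(slope_y_err), np.mean(slope_x_err))
```

The first mean stays near the true slope 2, confirming that error in y is absorbed into the disturbance term; the second is pulled toward zero, which is why error in X has no easy fix.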
