Discussion of time series and panel models


1 Discussion of time series and panel models
Class 10 (and some of 9): line fitting; OLS regression; discussion of time series and panel models; project check-in.

2 Statistical Significance
Research and null hypotheses. A hypothesis states the relationship between two variables. The null hypothesis states that there is NO (or only a random) relationship between the two variables. H: Democracies trade more with each other than with non-democracies. H0: Status as a democracy is not related to trade volume. You are testing to reject H0, not to accept H.

3 Types of Error
                            State of Nature
Decision based on Sample    H0 true                       H0 untrue
Reject H0                   Type 1 error (false alarm)    Correct
Do not reject H0            Correct                       Type 2 error

4 Alpha level = .05: if the null hypothesis is true, there is a 5% chance of rejecting it anyway (a Type 1 error) and a 95% chance of correctly not rejecting it.

5 Causality In establishing causality, there is a dependent variable, which you are trying to explain, and one or more independent variables that are assumed to be factors in the variation of the dependent variable. You need a logical model to “explain” this relationship or causality.

6 Thinking in Models (again)
What is a model? It explains which elements relate to each other and how. Describing relationships in a model: covariation (variables move together); direct or positive (move in the same direction); inverse or negative (move in opposite directions); nonlinear; false or spurious. Control (confounding) variables. Are you looking for the best model or testing someone else’s?

7 Developing models Where does a model come from?
From your own assessment and observation of the problem, or from talking to others. From the literature: elements others include or consider important; definitions of these elements; descriptions of the “expected” relationships among variables; results and explanations; sources and strategies for data; suggestions of models or variations to be tested in the future.

8 Types of Models
Symbolic: Economic growth is a function of changes to the amount of capital (K) and changes to the amount of labor (L): G = f(K, L).
Schematic: [diagram: Capital and Labor each linked to Econ Growth]

9 The basic linear model (equation)
You can express many relationships as the linear equation y = a + bx, where y is the dependent variable, x is the independent variable, a is a constant (the intercept), and b is the slope of the line. For every increase of 1 in x, y changes by an amount equal to b. A perfectly linear relationship is one where each change in x results in exactly the same change in y, e.g. a strict ad valorem tariff.
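As a minimal sketch of the linear equation, the snippet below treats a strict ad valorem tariff as y = a + bx with a = 0 and b equal to the tariff rate. The 10% rate and the names `linear`, `TARIFF_RATE`, and `tariff_due` are assumptions made up for illustration; they are not from the slides.

```python
def linear(x, a, b):
    """Return y = a + b*x for intercept a and slope b."""
    return a + b * x


# Hypothetical 10% ad valorem tariff (assumed rate, for illustration only).
TARIFF_RATE = 0.10


def tariff_due(import_value):
    """Duty owed: a = 0 (no fixed charge), b = the tariff rate."""
    return linear(import_value, a=0.0, b=TARIFF_RATE)


# Raising x by 1 changes y by exactly b, wherever you start on the line.
step_low = tariff_due(101.0) - tariff_due(100.0)
step_high = tariff_due(5001.0) - tariff_due(5000.0)
print(tariff_due(200.0), step_low, step_high)
```

Both steps equal the slope (up to floating-point rounding), which is what "perfectly linear" means here.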

10 Line Fitting Other relationships may not be so exact.
Weight is only to some degree a function of height. If you take a sample of actual heights and weights, you might see something like the graph to the right. Source:

11 Line Fitting (cont.)
The line is the “average” relationship described by the equation y = a + bx + e. The difference between the line and any individual observation is the error (e). The observations that contributed to this analysis were all for heights between 5’ and 6’4”. You cannot extrapolate the results to heights outside of those observed: the regression results are only valid for the range of actual observations.

12 Regression Regression is the method by which we find the line that best fits the observations, i.e. has the lowest error. Because the fitted line (with an intercept) passes through the means of the data, the actual errors always sum to zero: the sum of the observed values of the dependent variable equals the sum of the values predicted by the model, so the negative errors (for points below the line) exactly offset the positive errors (for points above the line). The raw sum of errors therefore cannot tell a good line from a bad one, and regression needs another way to measure the size of the error. An Ordinary Least Squares (OLS) regression finds the line that results in the lowest sum of squared errors.
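The two claims on this slide can be checked with a short sketch: the OLS residuals sum to zero, and moving the line away from the OLS solution raises the sum of squared errors. The height and weight numbers are made up for illustration (heights in inches, within the 5’ to 6’4” range mentioned earlier), and `ols_fit` and `sse` are hypothetical helper names.

```python
def ols_fit(xs, ys):
    """Closed-form simple OLS: return (a, b) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # forces the line through the point of means
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared errors for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up sample: heights in inches, weights in pounds.
heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [115, 120, 131, 138, 150, 155, 168, 172, 185]

a, b = ols_fit(heights, weights)
residuals = [w - (a + b * h) for h, w in zip(heights, weights)]
residual_sum = sum(residuals)          # zero up to rounding
best_sse = sse(heights, weights, a, b)
shifted_sse = sse(heights, weights, a + 1.0, b)  # same slope, shifted line
tilted_sse = sse(heights, weights, a, b + 0.1)   # same intercept, steeper line
print(round(residual_sum, 9), best_sse < shifted_sse, best_sse < tilted_sse)
```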

13 Multiple Regression What if we have multiple factors contributing to a result or a prediction? For example, basic economic theory suggests that capital and labor contribute to economic growth. With two factors, it is hard to “see” on a single graph how each contributes to growth.

14 The multiple regression equation
Each of these factors has a separate relationship with economic growth. The equation that describes a multiple regression relationship is: y = a + b1L + b2K + e. This equation separates each individual independent variable from the rest, allowing each to have its own coefficient describing its relationship to the dependent variable. If Labor and Capital have the same coefficient, then a one-unit change in either contributes equally to economic growth. In a statistics software program you will enter your dependent variable first and then your independent variables. You will need to make sure the data and the variables conform to the assumptions of the model.
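To make the equation y = a + b1L + b2K + e concrete, here is a minimal sketch that estimates a, b1, and b2 by solving the normal equations in plain Python. The data are made up and generated with no error term, so the fit should recover the coefficients used to build them; `solve` and `multiple_ols` are hypothetical helper names, and a statistics package would normally do this step for you.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def multiple_ols(labor, capital, growth):
    """Return (a, b1, b2) for growth = a + b1*L + b2*K via (X'X) beta = X'y."""
    rows = [[1.0, l, k] for l, k in zip(labor, capital)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, growth)) for i in range(3)]
    return tuple(solve(xtx, xty))


# Made-up data built exactly from growth = 1.0 + 0.5*L + 0.3*K (no error term).
labor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
capital = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
growth = [1.0 + 0.5 * l + 0.3 * k for l, k in zip(labor, capital)]

a, b1, b2 = multiple_ols(labor, capital, growth)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Each coefficient is estimated while holding the other variable fixed, which is what lets the model separate the contribution of labor from that of capital.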

15 Regression Statistics
SUMMARY OUTPUT
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations 121
ANOVA
df SS MS F Significance F
Regression 2 2.76E-27
Residual 118
Total 120
Coefficients t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 4.12E-11
ln pop2001 9.31E-09
ln GDP per capita 1.53E-27

16 How good is the model
The R2 value tells you what proportion of the variance in the dependent variable is explained by the model. An R2 of .68, for example, means that 68% of the variance in the observed values of the dependent variable is explained by the model, and 32% remains unexplained in the error term. Returning to the model of economic growth… Is explaining 50% of the variance good enough?
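As a small sketch of how R2 is computed, the function below uses the standard definition R2 = 1 - (sum of squared errors) / (total sum of squares around the mean). The numbers and the name `r_squared` are made up for illustration.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_resid / ss_total


actual = [2.0, 4.0, 6.0, 8.0]

# A model that reproduces every observation explains all of the variance.
r2_perfect = r_squared(actual, [2.0, 4.0, 6.0, 8.0])

# A model that always predicts the mean explains none of it.
r2_mean_only = r_squared(actual, [5.0, 5.0, 5.0, 5.0])

# Predictions that are close but not exact fall in between.
r2_partial = r_squared(actual, [3.0, 4.0, 6.0, 7.0])
print(r2_perfect, r2_mean_only, r2_partial)
```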

17 How much should you explain?
Random error need not be a problem. There is always error, and a larger R-square is not a goal in and of itself. Some error is due to latent variables that cannot be observed. There may be additional variables that can logically be assumed to measure these causes of variation indirectly. But even if they empirically appear to “explain” variation within the regression model, variables should not be added unless there is a logical way in which they might explain variation in the dependent variable.

18 Statistical Significance
Each independent variable has a “p-value” or significance level in the results. Sometimes it is given explicitly; sometimes only the test statistic is reported, from which significance can be derived. The p-value is a probability. It tells you how likely it is that a coefficient at least as far from zero as the one estimated would emerge by chance if there were in fact no relationship (i.e. if H0 were true). A p-value of .05 means that, if there were no real relationship, a coefficient this large would appear in about 5% of samples; it does not mean there is a 95% chance that the relationship is real. Rejecting H0 at that level therefore carries a 5% risk of a Type 1 error. It is generally accepted practice to consider variables with a p-value of less than .1 as significant (many fields use the stricter .05), though the only basis for such cutoffs is convention.
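One way to see what "emerged by chance" means is a permutation test, sketched below. This is not the t-test that Excel or a statistics package reports; it is a swapped-in technique that answers the same question by brute force: shuffle the dependent variable so that any real link to x is destroyed, refit the slope, and count how often the shuffled slope is at least as large as the observed one. The data, the seed, and the names `slope` and `permutation_p_value` are assumptions for illustration.

```python
import random


def slope(xs, ys):
    """OLS slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, n_perm=1000, seed=0):
    """Share of shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up data with a strong built-in relationship: y is about 2*x.
xs = list(range(20))
ys = [2.0 * x + (1.0 if x % 2 else -1.0) for x in xs]

p_value = permutation_p_value(xs, ys)
print(p_value)
```

A small value says that a slope this steep almost never arises once the relationship is scrambled, which is the sense in which the coefficient is called significant.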

19 Direction and Size Look at the signs of the b coefficients.
Do they have the expected signs? Your model and hypothesis should give you an expectation of the direction of each independent variable’s influence. Is the effect large or small? Even if it is significant and in the right direction, does a change in the independent variable yield a large or small change in the dependent variable?

20 F-Test There is also a significance level for the model as a whole.
The F-test or “Significance F” value in Excel gives the probability of obtaining a fit at least this good if none of the independent variables were actually related to the dependent variable. As with the p-value, the lower the Significance F value, the stronger the evidence that the model as a whole describes a real relationship rather than one that emerged at random.
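The F statistic behind "Significance F" can be sketched from R2 and the sample size using the standard formula F = (R2 / k) / ((1 - R2) / (n - k - 1)), where k is the number of independent variables and n the number of observations. The values n = 121 and k = 2 match the degrees of freedom in the regression output earlier (2 and 118); the R2 of 0.5 is a hypothetical value, since the actual figure is not given, and `f_statistic` is a made-up helper name.

```python
def f_statistic(r2, n_obs, n_predictors):
    """Return (F, df_model, df_resid) for a regression with an intercept."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_value = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f_value, df_model, df_resid


# n and k follow the output table; r2 = 0.5 is an assumed value.
f_value, df_model, df_resid = f_statistic(r2=0.5, n_obs=121, n_predictors=2)
print(f_value, df_model, df_resid)
```

Significance F is then the probability of an F value at least this large under an F distribution with those two degrees of freedom, which the software looks up for you.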

21 Other Errors or Problems
Multicollinearity; omitted variables; endogeneity; other.

22 Presenting Regression Results

