Presentation on theme: "Fundamentals of regression analysis"— Presentation transcript:

1 Fundamentals of regression analysis
Obid A.Khakimov

2 The essence of OLS The main logic of OLS is to find the parameters of the regression that yield the minimum sum of squared errors. OLS rests on a number of assumptions about e, the error term, and X, the regressors; the primary reason for these assumptions is that we do not know how the data were generated.
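As a minimal sketch of this idea (the data below are made up purely for illustration), the parameters that minimize the sum of squared errors can be computed from the normal equations:

```python
import numpy as np

# Made-up data: 50 observations of one regressor plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

# Design matrix with a constant (intercept) column.
X = np.column_stack([np.ones_like(x), x])

# OLS chooses the beta that minimizes the sum of squared errors;
# numerically this solves the normal equations (X'X) beta = X'y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
print("estimated intercept and slope:", beta_hat)
print("minimized sum of squared errors:", residuals @ residuals)
```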

3 Assumptions. Linearity: the relationship between the dependent variable and the independent variables is linear in the parameters. Full rank: there is no exact linear relationship among the independent variables. Exogeneity of the independent variables: the error term of the regression is not a function of the independent variables. Homoscedasticity and no autocorrelation: the error term has constant variance and its values are uncorrelated with one another. Normality of the error term: the errors are normally distributed with zero mean.

4 Presentation of the regression analysis results
Are the signs of the estimated coefficients in accordance with the theory? Are the estimated coefficients statistically significant? How well does the regression model explain the variation in the dependent variable? Does the model satisfy the assumptions of the CNLRM (classical normal linear regression model)?
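All four questions can be read off a standard regression output. A minimal sketch, assuming the statsmodels package and made-up consumption/income data:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: consumption explained by income.
rng = np.random.default_rng(1)
income = rng.uniform(20, 100, size=40)
consumption = 5 + 0.8 * income + rng.normal(0, 5, size=40)

X = sm.add_constant(income)            # adds the intercept column
results = sm.OLS(consumption, X).fit()

# The summary shows coefficient signs, t-statistics and p-values
# (significance), R-squared (explanatory power), and diagnostics
# relevant to the CNLRM assumptions.
print(results.summary())
```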

5 Classical theory of statistical inference
Statistical inference is concerned with how we draw conclusions about a large population from which a sample has been selected. Estimation and hypothesis testing constitute the two branches of statistical inference.

6 Hypothesis test How reliable are the estimates?
Hypothesis testing can take two forms, namely confidence-interval estimation or a test of significance.

7 Confidence interval estimation
Pr[b2est − tα/2 · se(b2est) ≤ b2 ≤ b2est + tα/2 · se(b2est)] = 1 − α
The reliability of a point estimate is measured by its standard error. Instead of relying on the point estimate alone, we construct an interval around it, say within two or three standard errors on either side, which will include the true parameter value with, say, 95% confidence.
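A minimal numeric sketch of such an interval (the estimate, standard error, and degrees of freedom below are made-up values, and scipy is assumed to be available):

```python
from scipy import stats

b2_est = 0.72    # estimated slope coefficient (illustrative value)
se_b2 = 0.15     # its standard error (illustrative value)
df = 38          # residual degrees of freedom, n - k (illustrative value)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical t value

lower = b2_est - t_crit * se_b2
upper = b2_est + t_crit * se_b2
print(f"95% confidence interval for b2: [{lower:.3f}, {upper:.3f}]")
```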

8 Linear models: Typical linear models in matrix form
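The equations on this slide are not reproduced in the transcript; a typical way to write the linear model in matrix form, with n observations and k regressors including the constant, is:

y = Xβ + u, where y is the n×1 vector of observations on the dependent variable, X is the n×k matrix of regressors, β is the k×1 vector of parameters, and u is the n×1 vector of errors.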

9 Linear models:

10 Logic of OLS To obtain the estimator
(best linear unbiased and efficient)
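The formula referred to here is not reproduced in the transcript; the standard OLS estimator in matrix notation, which minimizes the sum of squared errors u′u = (y − Xβ)′(y − Xβ), is:

β̂ = (X′X)⁻¹ X′y, which exists provided X has full column rank so that X′X is invertible.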

11 Multicollinearity: reasons
The data collection process. Constraints on the model or on the population being sampled. Model specification. An over-determined model.

12 Perfect vs. less than perfect multicollinearity
Perfect multicollinearity is the case when two or more independent variables form an exact linear relationship. Less than perfect multicollinearity is the case when two or more independent variables form an approximate, less than perfect, linear relationship.
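A small numeric sketch of the distinction (made-up data; in the first case x3 is an exact multiple of x2, in the second the same multiple plus a tiny noise term):

```python
import numpy as np

rng = np.random.default_rng(2)
x2 = rng.uniform(0, 10, size=30)

# Perfect multicollinearity: x3 is an exact linear function of x2.
x3_exact = 2.0 * x2
X_perfect = np.column_stack([np.ones_like(x2), x2, x3_exact])

# Less than perfect: x3 is almost, but not exactly, a linear function of x2.
x3_near = 2.0 * x2 + rng.normal(0, 0.01, size=30)
X_near = np.column_stack([np.ones_like(x2), x2, x3_near])

# With perfect collinearity X'X is singular and OLS has no unique solution;
# with near-perfect collinearity it is invertible but very badly conditioned.
print("column rank, perfect case:", np.linalg.matrix_rank(X_perfect))  # 2, not 3
print("column rank, near case:   ", np.linalg.matrix_rank(X_near))     # 3
print("condition number, near case:", np.linalg.cond(X_near.T @ X_near))
```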

13 Practical consequences
The OLS estimators are still BLUE, but their variances and covariances are large, making precise estimation difficult. The large variances produce wide confidence intervals, so null hypotheses are accepted too readily. t-statistics tend to be statistically insignificant. Although the t-statistics are low, the R-squared can be very high. The estimators and their variances are very sensitive to small changes in the data.

14 Ha: Not all slope coefficients are simultaneously zero
Due to the low t-statistics we cannot reject the null hypothesis that the individual slope coefficients are zero; yet due to the high R-squared the F-value will be very high, and rejecting H0 (that all slope coefficients are simultaneously zero) in favor of Ha will be easy.

15 Detection Multicollinearity is a question of degree.
It is a feature of the sample, not of the population. How to detect it: high R-squared but low t-statistics; high correlation coefficients among the independent variables; auxiliary regressions; high VIF (variance inflation factors), as sketched below; eigenvalues and the condition index.
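A minimal sketch of the VIF computation (statsmodels is assumed; the regressors are made up, with x2 and x3 deliberately highly correlated):

```python
import numpy as np
import statsmodels.api as sm

# Made-up regressors; x3 is almost a linear function of x2.
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 0.95 * x2 + 0.05 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared of the auxiliary
# regression of regressor j on all the other regressors.
for j, name in enumerate(["x1", "x2", "x3"]):
    others = np.delete(X, j, axis=1)
    r2_j = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    print(f"VIF({name}) = {1.0 / (1.0 - r2_j):.2f}")  # values above about 10 signal trouble
```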

16 Auxiliary regression H0: the Xi variable is not collinear with the others
Run a regression in which one X is the dependent variable and the other X's are the independent variables, and obtain its R-squared. df (numerator) = k − 2; df (denominator) = n − k + 1, where k is the number of explanatory variables including the intercept and n is the sample size. If the F statistic is higher than the critical F value, the Xi variable is collinear. Rule of thumb: if the R-squared of an auxiliary regression is higher than the overall R-squared, multicollinearity might be troublesome.
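A numeric sketch of this F test (the auxiliary R-squared, k, and n below are made-up values; scipy is assumed):

```python
from scipy import stats

r2_aux = 0.87   # R-squared of the auxiliary regression (illustrative value)
k = 5           # explanatory variables including the intercept (illustrative value)
n = 60          # sample size (illustrative value)

df_num = k - 2
df_denom = n - k + 1

# F statistic for H0: the Xi variable is not collinear with the other X's.
F = (r2_aux / df_num) / ((1 - r2_aux) / df_denom)
F_crit = stats.f.ppf(0.95, df_num, df_denom)

print(f"F = {F:.2f}, 5% critical value = {F_crit:.2f}")
# If F exceeds the critical value, reject H0 and treat Xi as collinear.
```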

17 What to do? Do nothing. Combine cross-section and time-series data.
Transform the variables (differencing, ratio transformations). Obtain additional data or new observations.

18 Assumption: Homoscedasticity or equal variance of ui
[Figure: error density f(u) around the regression line, with the same spread at every value of X; axes labelled X and Y.]

19 Reasons: Error-learning models.
Higher variability in an independent variable may be accompanied by higher variability in the dependent variable. Spatial correlation. Data-collection biases. Existence of extreme observations (outliers). Incorrect specification of the model. Skewness in the distribution.

