
1 Regression Analysis AGEC 784

2 Modeling Relationships
In some circumstances, data can be valuable in helping to determine the parameters in a relationship or its structural form. The process of using data to formulate relationships is known as regression analysis. In this approach, we identify one variable as the response variable, which means that it can be predicted from the values of other variables. Those other variables are called explanatory variables.

3 Types of Regression Models
Regression models that involve one explanatory variable are called simple regressions. When two or more explanatory variables are involved, the relationships are called multiple regressions. Regression models are also divided into linear and nonlinear models, depending on whether the relationship between the response and explanatory variables is linear or nonlinear.

4 Estimating Relationships
Scatter plot – visualize association. Correlation: r = Σ (xi – x̄)(yi – ȳ) / [(n – 1) sx sy], where n – number of pairs of observations for x, y; sx, sy – standard deviations of x, y; r – measures strength of linear relationship between x and y.

5 r-statistic Independent of units of measurement. Lies in range [-1, 1].
r > 0 – positive association; r < 0 – negative association. r close to 1 (or –1) implies a strong association; r close to 0 implies a weak association. Excel function: CORREL(xrange, yrange).
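As a minimal sketch (not part of the original slides), the r defined above can be computed directly and checked against Excel's CORREL; the data below are hypothetical:

```python
import math

def pearson_r(x, y):
    """Correlation r = sum((xi - xbar)(yi - ybar)) / ((n - 1) * sx * sy)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (sx * sy)

# Hypothetical example; should match =CORREL(xrange, yrange) in Excel
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(pearson_r(x, y))  # close to 1 -> strong positive association
```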

6 Simple Linear Regression
y = a + bx + e, where y is the dependent variable, x is the independent variable, and e is an “error” term. Constants a and b represent the intercept and slope, respectively, of the regression line.

7 Error Term in Regression
Unexplained “noise” in the relationship. May represent limitations of knowledge, or may represent random deviations of the dependent variable from its mean value, a + bx.

8 Regression Goal Want to find the line that most closely matches the observed relationship between x and y. Define “most closely” as minimizing the sum of squared differences between observed and model values. Minimizing the sum of unsquared differences is not useful, since positive and negative differences cancel; setting the model equal to the mean of y already drives that sum to zero. Squaring penalizes large differences more than small differences.

9 Performing Regression
Residuals: ei = yi – ŷi = yi – (a + bxi). Sum of squared differences between observations and model: SS = Σ ei² = Σ [yi – (a + bxi)]². The regression problem: choose a and b to minimize SS.
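A brief sketch, not from the course materials, of the closed-form least-squares solution that minimizes SS for the simple model; the function and variable names are illustrative:

```python
def fit_simple_ols(x, y):
    """Choose a and b to minimize SS = sum (yi - (a + b*xi))^2."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = ybar - b * xbar    # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss = sum(e ** 2 for e in residuals)
    return a, b, ss
```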

10 Regression Analysis Assumes residuals are normally distributed with mean 0. Regression parameters can be calculated directly from the data, but it is simpler to use Excel's regression tool (under the Data Analysis menu).

11 Goodness of Fit Coefficient of determination: R². Lies in range [0, 1].
Closer to one – better fit. Measures how much of the variation in y-values is explained by the model: 1 – perfect match to model; 0 – equation explains none of observed variation.
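A minimal sketch of how R² follows from the residuals, reusing the illustrative fit_simple_ols helper sketched earlier (not anything from the slides):

```python
def r_squared(x, y, a, b):
    """R^2 = 1 - SS_residual / SS_total; 1 = perfect fit, 0 = no explained variation."""
    ybar = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Example usage with the hypothetical data above:
# a, b, _ = fit_simple_ols(x, y)
# print(r_squared(x, y, a, b))
```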

12 Regression Window

13 Regression Output Key items in the output: R Square; the degree of significance (Significance F – under 0.1 is significant); p-values (values under 0.1 are statistically significant); the estimate for a (intercept); and the estimate for b (slope).

14 Regression Statistics
Four measures are used to judge the statistical qualities of a regression: R2: Measures the percent of variation in the response variable accounted for by the regression model. F-statistic (Significance F): Measures the probability of observing the given R2 (or higher) when all the true regression coefficients are zero. p-value: Measures the probability of observing the given estimate of the regression coefficient (or a larger value, positive or negative) when the true coefficient is zero. Confidence interval: Gives a range within which the true regression coefficient lies with given probability.
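If the statsmodels package is available (an assumption; the course itself uses Excel's regression tool), all four measures can be read off one fitted model. The data here are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data; in the course these would come from the Excel worksheet
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.3, 3.1, 4.8, 6.0, 7.1, 8.4, 9.2, 11.1])

X = sm.add_constant(x)               # adds the intercept column for a
results = sm.OLS(y, X).fit()

print(results.rsquared)              # R^2
print(results.f_pvalue)              # Significance F
print(results.pvalues)               # p-values for a and b
print(results.conf_int(alpha=0.05))  # 95% confidence intervals
```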

15 Simple Nonlinear Regression
A straight line may not be the most plausible description of the dependency, e.g., y = ax^b. Can follow the previous ideas to minimize the sum of squared differences, but there are no Excel functions or simple formulas for this. Or can transform the nonlinear relationship into a linear one, e.g., log y = log a + b log x, giving up some intuition for convenience.
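A short sketch of the transformation approach for y = ax^b, reusing the illustrative fit_simple_ols helper from the earlier sketch:

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by regressing log y on log x (all x, y must be positive)."""
    log_x = [math.log(xi) for xi in x]
    log_y = [math.log(yi) for yi in y]
    log_a, b, _ = fit_simple_ols(log_x, log_y)   # log y = log a + b log x
    return math.exp(log_a), b
```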

16 Multiple Linear Regression
Multiple independent variables: y = a0 + a1x1 + a2x2 + … + amxm + e. Work with n observations, each of which has one observation of the dependent variable and one observation of each of the m independent variables. Seek to minimize the sum of squared differences. Put all independent variables into the x-range in Excel's regression tool.
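Outside Excel, a minimal sketch of the same least-squares fit using NumPy; the helper name and array layout are illustrative assumptions:

```python
import numpy as np

def fit_multiple_ols(X, y):
    """Minimize the sum of squared differences for y = a0 + a1*x1 + ... + am*xm."""
    X = np.asarray(X, dtype=float)                 # n rows, m columns (the "x-range")
    A = np.column_stack([np.ones(len(X)), X])      # prepend a column of 1s for a0
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs                                  # [a0, a1, ..., am]
```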

17 Regression Output Key items in the output: Multiple R (the square root of R Square); R Square (the coefficient of multiple determination); Adjusted R Square (accounts for the presence of multiple variables); p-values (values under 0.1 are statistically significant); and the coefficients of the regression equation.

18 Values to Include in Regression
Ideally pick variables that can be justified on practical or theoretical grounds. Could choose the set that generates the largest value of adjusted R². Could also choose based on those with significant p-values for coefficients. Remember that good models require good forecasts for the independent variables.
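A small sketch of the adjusted R² used to compare candidate variable sets (n observations, m independent variables); the function name is illustrative:

```python
def adjusted_r_squared(r2, n, m):
    """Adjusted R^2 penalizes adding variables that explain little extra variation."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)

# Example: R^2 = 0.85 with n = 30 observations and m = 3 variables
print(adjusted_r_squared(0.85, 30, 3))  # about 0.833
```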

19 Regression Assumptions
Errors in the regression model: follow a Normal distribution; are mutually independent; have the same variance. Linearity is assumed to hold.
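A hedged sketch of quick checks on the first two assumptions, assuming scipy and statsmodels are available (neither is part of the course materials; the residuals would come from any of the fits above):

```python
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

def check_residuals(residuals):
    """Rough diagnostics: Normality of errors and independence (autocorrelation)."""
    shapiro_p = stats.shapiro(residuals).pvalue  # small p-value suggests non-Normal errors
    dw = durbin_watson(residuals)                # values near 2 suggest little autocorrelation
    return shapiro_p, dw
```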

