Download presentation

Presentation is loading. Please wait.

Published byReanna Tippens Modified over 2 years ago

1
Statistical Techniques I EXST7005 Simple Linear Regression

2
n Measuring & describing a relationship between two variables è Simple Linear Regression allows a measure of the rate of change of one variable relative to another variable. è Variables will always be paired, one termed an independent variable (often referred to as the X variable) and a dependent variable (termed a Y variable). è There is a change in the value of variable Y as the value of variable X changes.

3
Simple Linear Regression (continued) n For each value of X there is a population of values for the variable Y (normally distributed). Y X

4
Simple Linear Regression (continued) n The linear model which discribes this relationship is given as è Yi = b0 + b1Xi è this is the equation for a straight line è where; b0 is the value of the intercept (the value of Y when X = 0) è b1 is the amount of change in Y for each unit change in X. (i.e. if X changes by 1 unit, Y changes by b1 units). b1 is also called the slope or REGRESSION COEFFICIENT

5
Simple Linear Regression (continued) n Population Parameters y.x = the true population mean of Y at each value of X 0 = the true value of the Y intercept 1 = the true value of the slope, the change in Y per unit of X y.x = 0 + 1Xi è this is the population equation for a straight line

6
Simple Linear Regression (continued) n The sample equation for the line describes a perfect line with no variation. In practice there is always variation about the line. We include an additional term to represent this variation. y.x = 0 + 1Xi + i for a population n Yi = b0 + b1Xi + ei for a sample è when we put this term in the model, we are describing individual points as their position on the line, plus or minus some deviation

7
Simple Linear Regression (continued) Y X n

8
n the SS of deviations from the line will form the basis of a variance for the regression line n when we leave the ei off the sample model, we are describing a point on the regression line predicted from the sample. To indicate this we put a HAT on the Yi value Simple Linear Regression (continued)

9
Characteristics of a Regression Line The line will pass through the point X, Y (also the point 0, b0) n The sum of squared deviations (measured vertically) of the points from the regression line will be a minimum. n Values on the line can be described by the equation Y = b0 + b1Xi

10
Fitting the line n Fitting the line starts with a corrected SSDeviation, this is the SSDeviation of the observations from a horizontal line through the mean. Y X

11
Fitting the line (continued) n The fitted line is pivoted on the point until it has a minimum SSDeviations. Y X

12
Fitting the line (continued) How do we know the SSDeviations are a minimum? Actually, we solve the equation for ei, and use calculus to determine the solution that has a minimum of ei2.

13
Fitting the line (continued) n The line has some desirable properties E(b0) = 0 E(b1) = 1 E( YX) = X.Y n Therefore, the parameter estimates and predicted values are unbiased estimates.

14
The regression of Y on X n Y = the "dependent" variable, the variable to be predicted n X = the "independent" variable, also called the regressor or predictor variable. n Assumptions - general assumptions è Y variable is normally distributed at each value of X è The variance is homogeneous (across X). è Observations are independent of each other and ei independent of the rest of the model.

15
The regression of Y on X (continued) n Special assumption for regression. n Assume that all of the variation is attributable to the dependent variable (Y), and that the variable X is measured WITHOUT ERROR. n Note that the deviations are measured vertically, not horizontally or perpendicular to the line.

16
Derivation of the formulas n Any observation can be written as è Yi = b0 + b1Xi + ei for a sample è where; ei = a deviation fo the observed point from the regression line n note, the idea of regression is to minimize the deviation of the observations from the regression line, this is called a Least Squares Fit

17
Derivation of the formulas (continued) ei = 0 n the sum of the squared deviations ei2 = (Yi - Yhat)2 ei2 = (Yi - b0 + b1Xi )2 The objective is to select b0 and b1 such that ei2 is a minimum, this is done with calculus n You do not need to know this derivation!

18
A note on calculations n We have previously defined the uncorrected sum of squares and corrected sum of squares of a variable Yi The uncorrected SS is Yi2 The correction factor is ( Yi)2/n The corrected SS is Yi2 - ( Yi)2/n n Your book calls this SYY, the correction factor is CYY n We could define the exact same series of calculations for Xi, and call it SXX

19
A note on calculations (continued) n We will also need a crossproduct for regression, and a corrected crossproduct n The crossproduct is XiYi The Sum of crossproducts is XiYi, which is uncorrected The correction factor is ( Xi)( Yi) / n = CXY The corrected crossproduct is XiYi-( Xi)( Yi)/n n Which you book calls SXY

20
n the partial derivative is taken with respect to each of the parameters for b0 Derivation of the formulas (continued)

21
n set the partial derivative to 0 and solve for b0 2 (Yi-b0-b1Xi)(-1) = 0 - Yi + nb0 + b1 Xi = 0 nb0 = Yi - b1 Xi b0 = Y - b1 X n So b0 is estimated using b1 and the means of X and Y

22
Derivation of the formulas (continued) n Likewise for b1 we obtain the partial derivative

23
Derivation of the formulas (continued) n set the partial derivative to 0 and solve for b1 2 (Yi-b0-b1Xi)(-Xi) = 0 - (YiXi + b0Xi + b1 Xi2) = 0 - YiXi + b0 Xi + b1 Xi2) = 0 and since b0 = Y - b1 X ), then YiXi = ( Yi/n - b1 Xi/n ) Xi + b1 Xi2 YiXi = Xi Yi/n - b1 ( Xi)2/n + b1 Xi2 YiXi - Xi Yi/n = b1 [ Xi2 - ( Xi)2/n] b1 = [ YiXi - Xi Yi/n] / [ Xi2 - ( Xi)2/n]

24
Derivation of the formulas (continued) b1 = [ YiXi - Xi Yi/n] / [ Xi2 - ( Xi)2/n] n b1 = SXY / SXX n so b1 is the corrected crossproducts over the corrected SS of X The intermediate statistics needed to solve all elements of a SLR are Xi, Yi, n, Xi2, YiXi and Yi2 (this last term we haven't seen in the calculations above, but we will need later)

25
Derivation of the formulas (continued) n Review n We want to fit the best possible line, we define this as the line that minimizes the vertically measured distances from the observed values to the fitted line. n The line that achieves this is defined by the equations b0 = Y - b1 X b1 = [ YiXi - Xi Yi/n] / [ Xi2 - ( Xi)2/n]

26
Derivation of the formulas (continued) n These calculations provide us with two parameter estimates that we can then use to get the equation for the fitted line.

27
Numerical example n See Regression handout

28
About Crossproducts n Crossproducts are used in a number of related calculations. n a crossproduct = YiXi Sum of crossproducts = YiXi = SXY Covariance = YiXi / (n-1) n Slope = SXY / SXX n SSRegression = S2XY / SXX Correlation = SXY / SXXSYY n R2 = r2 = S2XY / SXXSYY = SSRegression/SSTotal

Similar presentations

OK

Kin 304 Regression Linear Regression Least Sum of Squares

Kin 304 Regression Linear Regression Least Sum of Squares

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

5 components of reading ppt on ipad Open ppt on mac Ppt on landing gear system diagram Free ppt on motivational stories Ppt on sea level rise in india Ppt on index of industrial production us Ppt on applied operational research definition Ppt on liquid crystal display Ppt on regional trade agreements bolivia Ppt on different types of soil found in india