Statistical Techniques I EXST7005 Simple Linear Regression
Measuring and describing a relationship between two variables. Simple Linear Regression provides a measure of the rate of change of one variable relative to another variable. Variables are always paired: one is termed the independent variable (often referred to as the X variable) and the other the dependent variable (the Y variable). The value of variable Y changes as the value of variable X changes.
Simple Linear Regression (continued)
For each value of X there is a population of values for the variable Y (normally distributed).
[Figure: plot of Y against X]
Simple Linear Regression (continued)
The linear model which describes this relationship is given as $Y_i = b_0 + b_1 X_i$. This is the equation for a straight line, where $b_0$ is the value of the intercept (the value of Y when X = 0) and $b_1$ is the amount of change in Y for each unit change in X (i.e. if X changes by 1 unit, Y changes by $b_1$ units). $b_1$ is also called the slope or REGRESSION COEFFICIENT.
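As a small illustration of how the intercept and slope are read, the sketch below evaluates the straight-line equation at a few X values. The function name `predict` and the coefficient values are invented for this example and do not come from the slides.

```python
def predict(x, b0, b1):
    """Return the point on the line Y = b0 + b1*X at a given X."""
    return b0 + b1 * x

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 2.0, 0.5

intercept_value = predict(0, b0, b1)                    # value of Y when X = 0
unit_change = predict(4, b0, b1) - predict(3, b0, b1)   # change in Y per unit of X

print(intercept_value, unit_change)  # prints: 2.0 0.5
```

The first number is the intercept and the second is the slope, matching the two interpretations given on the slide.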
Simple Linear Regression (continued)
Population Parameters
$\mu_{y.x}$ = the true population mean of Y at each value of X
$\beta_0$ = the true value of the Y intercept
$\beta_1$ = the true value of the slope, the change in Y per unit of X
$\mu_{y.x} = \beta_0 + \beta_1 X_i$ is the population equation for a straight line.
Simple Linear Regression (continued)
The equation for the line describes a perfect line with no variation. In practice there is always variation about the line. We include an additional term to represent this variation:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for a population
$Y_i = b_0 + b_1 X_i + e_i$ for a sample
When we put this term in the model, we are describing individual points as their position on the line, plus or minus some deviation.
Simple Linear Regression (continued)
The SS of deviations from the line will form the basis of a variance for the regression line. When we leave the $e_i$ off the sample model, we are describing a point on the regression line predicted from the sample. To indicate this we put a HAT on the $Y_i$ value: $\hat{Y}_i = b_0 + b_1 X_i$.
Characteristics of a Regression Line
The line will pass through the point $(\bar{X}, \bar{Y})$ (and also the point $(0, b_0)$). The sum of squared deviations (measured vertically) of the points from the regression line will be a minimum. Values on the line can be described by the equation $\hat{Y}_i = b_0 + b_1 X_i$.
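These characteristics can be checked numerically. The sketch below uses a small made-up data set (not from the course) and fits the line with the usual least squares estimates written in deviation form, which is algebraically the same as the corrected sums used elsewhere in these slides. The helper name `sse` is an assumption of this example.

```python
# Made-up data, used only to illustrate the properties.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least squares estimates (corrected crossproduct over corrected SS of X).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

def sse(a, b):
    """Sum of squared vertical deviations from the line Y = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# Characteristic 1: the fitted value at the mean of X is the mean of Y.
value_at_mean = b0 + b1 * x_bar

# Characteristic 2: nudging either coefficient makes the SS of deviations larger.
fitted_sse = sse(b0, b1)
nudged = [
    sse(b0 + 0.1, b1),
    sse(b0 - 0.1, b1),
    sse(b0, b1 + 0.1),
    sse(b0, b1 - 0.1),
]

print(value_at_mean, y_bar)
print(fitted_sse, min(nudged))
```

Trying a handful of nearby lines is only a spot check, not a proof; the calculus argument in the derivation is what guarantees the minimum.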
Fitting the line
Fitting the line starts with a corrected SSDeviation; this is the SSDeviation of the observations from a horizontal line through the mean of Y.
[Figure: plot of Y against X]
Fitting the line (continued)
The fitted line is pivoted on the point $(\bar{X}, \bar{Y})$ until it has a minimum SSDeviations.
[Figure: plot of Y against X]
Fitting the line (continued)
How do we know the SSDeviations are a minimum? Actually, we solve the equation for $e_i$, and use calculus to determine the solution that has a minimum of $\sum e_i^2$.
Fitting the line (continued)
The line has some desirable properties:
$E(b_0) = \beta_0$
$E(b_1) = \beta_1$
$E(\hat{Y}_X) = \mu_{y.x}$
Therefore, the parameter estimates and predicted values are unbiased estimates.
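Unbiasedness can be illustrated by simulation: draw many samples from a population with known parameters, fit each one, and average the estimates. The sketch below is only a rough demonstration. The "true" values, the X design, the number of repetitions, and the helper name `fit` are all assumptions made for this example.

```python
import random

random.seed(7005)  # fixed seed so the run is repeatable

# Assumed "true" population values, invented for this sketch.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
X = list(range(1, 11))  # X is treated as fixed and measured without error

def fit(X, Y):
    """Return the least squares estimates (b0, b1)."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    sxx = sum((x - x_bar) ** 2 for x in X)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

reps = 2000
sum_b0 = sum_b1 = 0.0
for _ in range(reps):
    # Each Y is its population mean plus a normally distributed deviation.
    Y = [BETA0 + BETA1 * x + random.gauss(0, SIGMA) for x in X]
    b0, b1 = fit(X, Y)
    sum_b0 += b0
    sum_b1 += b1

mean_b0 = sum_b0 / reps
mean_b1 = sum_b1 / reps
print(round(mean_b0, 2), round(mean_b1, 2))
```

Individual estimates scatter around the true values, but their averages should land close to the assumed intercept and slope.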
The regression of Y on X
Y = the "dependent" variable, the variable to be predicted. X = the "independent" variable, also called the regressor or predictor variable.
General assumptions: the Y variable is normally distributed at each value of X; the variance is homogeneous (across X); observations are independent of each other, and $e_i$ is independent of the rest of the model.
The regression of Y on X (continued)
Special assumption for regression: assume that all of the variation is attributable to the dependent variable (Y), and that the variable X is measured WITHOUT ERROR. Note that the deviations are measured vertically, not horizontally or perpendicular to the line.
Derivation of the formulas
Any observation can be written as $Y_i = b_0 + b_1 X_i + e_i$ for a sample, where $e_i$ is the deviation of the observed point from the regression line. Note that the idea of regression is to minimize the deviations of the observations from the regression line; this is called a Least Squares Fit.
Derivation of the formulas (continued)
$\sum e_i = 0$
The sum of the squared deviations is
$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$
$\sum e_i^2 = \sum (Y_i - b_0 - b_1 X_i)^2$
The objective is to select $b_0$ and $b_1$ such that $\sum e_i^2$ is a minimum; this is done with calculus. You do not need to know this derivation!
A note on calculations
We have previously defined the uncorrected sum of squares and corrected sum of squares of a variable $Y_i$.
The uncorrected SS is $\sum Y_i^2$
The correction factor is $(\sum Y_i)^2 / n$
The corrected SS is $\sum Y_i^2 - (\sum Y_i)^2 / n$
Your book calls this $S_{YY}$; the correction factor is $C_{YY}$. We could define the exact same series of calculations for $X_i$, and call it $S_{XX}$.
A note on calculations (continued)
We will also need a crossproduct for regression, and a corrected crossproduct.
The crossproduct is $X_i Y_i$
The sum of crossproducts is $\sum X_i Y_i$, which is uncorrected
The correction factor is $(\sum X_i)(\sum Y_i) / n = C_{XY}$
The corrected crossproduct is $\sum X_i Y_i - (\sum X_i)(\sum Y_i) / n$, which your book calls $S_{XY}$.
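The corrected sums can be worked through on a small example. The data below are made up for illustration, and the variable names (including `cxx`, formed by analogy with CYY and CXY) are assumptions of this sketch. The last lines compare the computational formulas with the equivalent sums of deviations from the means.

```python
# Made-up data for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)

# Uncorrected sums of squares and crossproducts.
uncorrected_xx = sum(x * x for x in X)
uncorrected_yy = sum(y * y for y in Y)
uncorrected_xy = sum(x * y for x, y in zip(X, Y))

# Correction factors.
cxx = sum_x ** 2 / n
cyy = sum_y ** 2 / n
cxy = sum_x * sum_y / n

# Corrected sums of squares and crossproducts.
sxx = uncorrected_xx - cxx
syy = uncorrected_yy - cyy
sxy = uncorrected_xy - cxy
print(sxx, syy, sxy)  # prints: 10.0 6.0 6.0

# Same quantities written as sums of deviations from the means.
x_bar, y_bar = sum_x / n, sum_y / n
sxx_dev = sum((x - x_bar) ** 2 for x in X)
sxy_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
```

Both forms give the same values; the computational form only needs running totals, which is why it suits hand calculation.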
Derivation of the formulas (continued)
The partial derivative is taken with respect to each of the parameters. For $b_0$:
$\partial (\sum e_i^2) / \partial b_0 = 2 \sum (Y_i - b_0 - b_1 X_i)(-1)$
Set the partial derivative to 0 and solve for $b_0$:
$2 \sum (Y_i - b_0 - b_1 X_i)(-1) = 0$
$-\sum Y_i + n b_0 + b_1 \sum X_i = 0$
$n b_0 = \sum Y_i - b_1 \sum X_i$
$b_0 = \bar{Y} - b_1 \bar{X}$
So $b_0$ is estimated using $b_1$ and the means of X and Y.
Derivation of the formulas (continued)
Likewise for $b_1$ we obtain the partial derivative:
$\partial (\sum e_i^2) / \partial b_1 = 2 \sum (Y_i - b_0 - b_1 X_i)(-X_i)$
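Derivation of the formulas (continued)
Set the partial derivative to 0 and solve for $b_1$:
$2 \sum (Y_i - b_0 - b_1 X_i)(-X_i) = 0$
$-\sum (Y_i X_i - b_0 X_i - b_1 X_i^2) = 0$
$-\sum Y_i X_i + b_0 \sum X_i + b_1 \sum X_i^2 = 0$
and since $b_0 = \bar{Y} - b_1 \bar{X}$, then
$\sum Y_i X_i = (\sum Y_i / n - b_1 \sum X_i / n) \sum X_i + b_1 \sum X_i^2$
$\sum Y_i X_i = \sum X_i \sum Y_i / n - b_1 (\sum X_i)^2 / n + b_1 \sum X_i^2$
$\sum Y_i X_i - \sum X_i \sum Y_i / n = b_1 [\sum X_i^2 - (\sum X_i)^2 / n]$
$b_1 = [\sum Y_i X_i - \sum X_i \sum Y_i / n] / [\sum X_i^2 - (\sum X_i)^2 / n]$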
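One way to check the result of this derivation is to plug the estimates back into the two partial derivatives and confirm that both are zero. The sketch below does this on made-up data; the names `d_b0` and `d_b1` are assumptions of this example.

```python
# Made-up data for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Estimates from the formulas derived above.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Partial derivatives of the sum of squared deviations, evaluated at (b0, b1).
d_b0 = 2 * sum((y - b0 - b1 * x) * (-1) for x, y in zip(X, Y))
d_b1 = 2 * sum((y - b0 - b1 * x) * (-x) for x, y in zip(X, Y))

# Both should be zero apart from floating point rounding.
print(abs(d_b0) < 1e-9, abs(d_b1) < 1e-9)  # prints: True True
```

The first derivative being zero is the same statement as the deviations summing to zero.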
Derivation of the formulas (continued)
$b_1 = [\sum Y_i X_i - \sum X_i \sum Y_i / n] / [\sum X_i^2 - (\sum X_i)^2 / n]$
$b_1 = S_{XY} / S_{XX}$
So $b_1$ is the corrected crossproducts over the corrected SS of X. The intermediate statistics needed to solve all elements of a SLR are $\sum X_i$, $\sum Y_i$, $n$, $\sum X_i^2$, $\sum Y_i X_i$ and $\sum Y_i^2$ (this last term we haven't seen in the calculations above, but we will need it later).
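Because the estimates depend only on the intermediate statistics, they can be computed without the raw data. The function below is a minimal sketch of that idea; its name `slr_from_sums` and the example totals are invented for illustration.

```python
def slr_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1) from the intermediate statistics of a simple linear regression."""
    sxy = sum_xy - sum_x * sum_y / n   # corrected crossproduct
    sxx = sum_x2 - sum_x ** 2 / n      # corrected SS of X
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n    # Y-bar minus b1 times X-bar
    return b0, b1

# Hypothetical totals for X = 1..5 and Y = 2, 4, 5, 4, 5.
b0, b1 = slr_from_sums(n=5, sum_x=15, sum_y=20, sum_x2=55, sum_xy=66)
print(round(b0, 4), round(b1, 4))
```

The sum of squares of Y is not needed for the two estimates, which is consistent with the note that it only comes into play later.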
Derivation of the formulas (continued)
Review: we want to fit the best possible line. We define this as the line that minimizes the sum of the squared vertically measured distances from the observed values to the fitted line. The line that achieves this is defined by the equations
$b_0 = \bar{Y} - b_1 \bar{X}$
$b_1 = [\sum Y_i X_i - \sum X_i \sum Y_i / n] / [\sum X_i^2 - (\sum X_i)^2 / n]$
Derivation of the formulas (continued)
These calculations provide us with two parameter estimates that we can then use to get the equation for the fitted line.
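About Crossproducts
Crossproducts are used in a number of related calculations.
A crossproduct = $Y_i X_i$
Sum of crossproducts = $\sum Y_i X_i$ (uncorrected); the corrected sum of crossproducts is $\sum Y_i X_i - (\sum X_i)(\sum Y_i) / n = S_{XY}$
Covariance = $S_{XY} / (n-1)$
Slope = $S_{XY} / S_{XX}$
SSRegression = $S_{XY}^2 / S_{XX}$
Correlation = $r = S_{XY} / \sqrt{S_{XX} S_{YY}}$
$R^2 = r^2 = S_{XY}^2 / (S_{XX} S_{YY})$ = SSRegression / SSTotal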
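The related calculations can be tied together in a short sketch. The data are made up, the variable names are assumptions of this example, and the crossproducts and sums of squares used are the corrected ones (SXY, SXX, SYY), with SYY playing the role of SSTotal.

```python
import math

# Made-up data for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

# Corrected sums of squares and crossproducts.
sxx = sum(x * x for x in X) - sum(X) ** 2 / n
syy = sum(y * y for y in Y) - sum(Y) ** 2 / n
sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

covariance = sxy / (n - 1)
slope = sxy / sxx
ss_regression = sxy ** 2 / sxx
r = sxy / math.sqrt(sxx * syy)
r_squared = sxy ** 2 / (sxx * syy)

print(round(covariance, 4), round(slope, 4), round(ss_regression, 4))
print(round(r, 4), round(r_squared, 4))
```

Squaring `r` reproduces `r_squared`, and dividing `ss_regression` by `syy` gives the same value, which is the SSRegression / SSTotal form of R2.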