Presentation transcript:

Regression For the purposes of this class: –Does Y depend on X? –Does a change in X cause a change in Y? –Can Y be predicted from X? Y = mX + b (The slide's figure contrasted the predicted values from this line with the overall mean and the actual values.)

When analyzing a regression-type data set, the first step is to plot the data. (The slide showed a small table of X and Y values; the numbers are not recoverable here.) The next step is to determine the line that 'best fits' these points. It appears this line would be sloped upward and linear (straight).

The line of best fit is the sample regression of Y on X, and its position is fixed by two results: 1) The regression line passes through the point (X_avg, Y_avg), here (55, 138). 2) Its slope is at the rate of "m" units of Y per unit of X, where m = the regression coefficient (the slope in y = mx + b; rise over run). The slide gave the fitted slope as Y = 1.24(X), with the Y-intercept labeled but not surviving transcription (passing through (55, 138) with slope 1.24 implies an intercept of about 138 − 1.24 × 55 = 69.8).
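The two results above fully determine the least-squares line: compute the slope from the data, then force the line through the mean point. The sketch below illustrates this in Python (used here as a stand-in for the slide's SAS workflow); the X and Y values are hypothetical, since the slide's own data did not survive transcription.

```python
# Hypothetical data for illustration; not the slide's actual values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared X deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    # Forcing the line through (x_bar, y_bar) fixes the intercept.
    b = y_bar - m * x_bar
    return m, b

m, b = fit_line(xs, ys)
```

By construction, plugging X_avg into the fitted line returns exactly Y_avg, which is result 1 on the slide.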

Testing the Regression Line for Significance An F-test is used, based on the Model, Error, and Total sums of squares (SOS). –Very similar to ANOVA. Basically, we are testing whether the regression line has a significantly different slope than a horizontal line drawn at Y_avg. –If there is no difference, then Y does not change as X changes (it stays around the average value). To begin, we must first find the regression line that has the smallest Error SOS.

Error SOS The regression line should pass through the overall average (the pivot point) with the slope that has the smallest Error SOS (Error SOS = the summed squared distance between each point and the predicted line: gives an index of the variability of the data points around the predicted line). (The slide's figure plotted the dependent value against the independent value, with the line pivoting on the overall average.)

For each X, we can predict Y from the fitted line. (The slide tabulated X, Y_Actual, Y_Pred, and each point's squared error; the values are not recoverable here.) Error SOS is calculated as the sum of (Y_Actual – Y_Predicted)². This gives us an index of how scattered the actual observations are around the predicted line. The more scattered the points, the larger the Error SOS will be. This is like analysis of variance, except we are using the predicted line instead of the mean value.

Total SOS Calculated as the sum of (Y – Y_avg)². Gives us an index of how scattered our data set is around the overall Y average. (The slide's figure marked the overall Y average; the regression line was not shown.)

(The slide tabulated X, Y_Actual, and Y_Average with each point's squared deviation; the values are not recoverable here.) Total SOS gives us an index of how scattered the data points are around the overall average. This is calculated the same way as for a single treatment in ANOVA. What happens to Total SOS when all of the points are close to the overall average? What happens when the points form a non-horizontal linear trend?

Model SOS Calculated as the sum of (Y_Predicted – Y_avg)². Gives us an index of how far all of the predicted values are from the overall average. (The slide's figure highlighted the distance between each predicted Y and the overall mean.)

Model SOS Gives us an index of how far away the predicted values are from the overall average value. What happens to Model SOS when all of the predicted values are close to the average value? (The slide tabulated X, Y_Pred, and Y_Average with each point's squared deviation; the values are not recoverable here.)

All Together Now!! SOS_Error = Σ(Y_Actual – Y_Pred)² SOS_Total = Σ(Y_Actual – Y_Avg)² SOS_Model = Σ(Y_Pred – Y_Avg)² (The slide tabulated X, Y_Actual, Y_Pred, and Y_Avg alongside the three SOS columns; the values are not recoverable here.)
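The three sums of squares can be computed directly from their definitions. A minimal Python sketch (standing in for the slide's lost worked table, with hypothetical data) fits the least-squares line and then evaluates all three; for a least-squares fit they always partition as Total = Model + Error.

```python
# Hypothetical data for illustration; not the slide's actual values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept (line through the mean point).
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
b = y_bar - m * x_bar
preds = [m * x + b for x in xs]

sos_error = sum((ya - yp) ** 2 for ya, yp in zip(ys, preds))  # Σ(Y_Actual − Y_Pred)²
sos_total = sum((ya - y_bar) ** 2 for ya in ys)               # Σ(Y_Actual − Y_Avg)²
sos_model = sum((yp - y_bar) ** 2 for yp in preds)            # Σ(Y_Pred − Y_Avg)²
```

The partition SOS_Total = SOS_Model + SOS_Error is what lets the F-test on the next slides compare the fitted line against the flat Y_avg line.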

Using SOS to Assess the Regression Line Model SOS gives us an index of how 'different' the predicted values are from the average values. –Bigger Model SOS = more different. –Tells us how different a sloped line is from a line made up only of Y_avg. –Remember, the regression line will pass through the overall average point. Error SOS gives us an index of how different the predicted values are from the actual values. –More variability = larger Error SOS = large distance between predicted and actual values.

Magic of the F-test The ratio of Model SOS to Error SOS (each divided by its degrees of freedom) gives us an overall index (the F statistic) used to indicate the relative 'difference' between the regression line and a line with a slope of zero (all values = Y_avg). –A large Model SOS and small Error SOS = a large F statistic. Why does this indicate a significant difference? –A small Model SOS and a large Error SOS = a small F statistic. Why does this indicate no significant difference? Based on the sample size (degrees of freedom), each F statistic has an associated P-value. –P < 0.05 (large F statistic): there is a significant difference between the regression line and the Y_avg line. –P ≥ 0.05 (small F statistic): there is NO significant difference between the regression line and the Y_avg line.

F = Mean Model SOS / Mean Error SOS Basically, this is an index that tells us how different the regression line is from Y_avg, relative to the scatter of the data around the predicted values. (The slide's figure plotted the dependent value against the independent value to illustrate both distances.)
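Turning the SOS values into the F statistic only requires dividing each by its degrees of freedom before taking the ratio. A short Python sketch, using hypothetical SOS values (the slide's own numbers are not recoverable):

```python
# Hypothetical values for illustration; not the slide's actual output.
sos_model = 3.6
sos_error = 2.4
n = 5                      # number of (X, Y) observations

df_model = 1               # one predictor in simple linear regression
df_error = n - 2           # n observations minus two estimated parameters (m and b)

mean_model_sos = sos_model / df_model   # a.k.a. the model mean square
mean_error_sos = sos_error / df_error   # a.k.a. the error mean square
f_stat = mean_model_sos / mean_error_sos
```

The P-value then comes from comparing f_stat against the F distribution with (df_model, df_error) degrees of freedom, which is the lookup SAS performs for the Pr > F column.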

SAS Code for Regression (the data lines that followed "cards;" on the slide are not recoverable here):

data production;
  input X Y;
  cards;
  ;
proc print;
run;

proc reg;        /* tells SAS to do the regression procedure */
  model Y = X;   /* Y is the dependent value; X is the independent value */
run;

(SAS output: an analysis-of-variance table with Source, DF, Sum of Squares, Mean Square, F Value, and Pr > F rows for Model, Error, and Corrected Total; fit statistics Root MSE, R-Square, Dependent Mean, Adj R-Sq, and Coeff Var; and Parameter Estimates giving the Estimate, Standard Error, t Value, and Pr > |t| for the Intercept and X. The numeric values are not recoverable here.)

Y = [mX + b] = [(1.24)(X) + b] (the intercept estimate did not survive transcription)

Correlation (r): Another measure of the mutual linear relationship between two variables. 'r' is a pure number without units or dimensions. 'r' is always between –1 and 1. Positive values indicate that y increases when x does; negative values indicate that y decreases when x increases. –What does r = 0 mean? 'r' is a measure of the intensity of association observed between x and y. –'r' does not predict; it only describes associations between variables.

(The slide showed three scatterplots illustrating r > 0, r < 0, and r = 0.) r is also called Pearson's correlation coefficient.
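Pearson's r follows directly from the deviation sums already used for the regression slope. A small Python sketch of the formula r = Sxy / sqrt(Sxx · Syy), again with hypothetical data standing in for the slide's lost values:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # Dividing by both spreads makes r unitless and bounded in [-1, 1].
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration.
r = pearson_r([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
```

Flipping the sign of every Y flips the sign of r but not its magnitude, which is why r describes direction and intensity of association without predicting anything.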

SAS Code for Correlation

proc corr;    /* tells SAS to do the correlation procedure */
  var y x;    /* determine the correlation between these variables */
run;

(Output of the CORR procedure: simple statistics (N, Mean, Std Dev, Sum, Minimum, Maximum) for Y and X, followed by the Pearson correlation coefficients for N = 5 with Prob > |r| under H0: Rho = 0. The numeric values are not recoverable here; the slide noted a significant, high correlation.)

R-square If we square r, we get rid of the negative value (if it is negative) and we get an index of how close the data points are to the regression line. Allows us to decide how much confidence we have in making a prediction based on our model. Is calculated as Model SOS / Total SOS.

r² = Model SOS / Total SOS (The slide's figure marked, for each point, the distance contributing to the Model SOS and the distance contributing to the Total SOS.)

r² = Model SOS / Total SOS (numerator/denominator) (The slide's figure showed a case with a small numerator, Model SOS, relative to a big denominator, Total SOS, giving a small R²; the numeric value is not recoverable here.)
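The two ways of reaching r² (squaring Pearson's r, or dividing Model SOS by Total SOS) agree for a least-squares fit. A Python sketch verifying this identity on hypothetical data (the slide's own numbers are not recoverable):

```python
import math

# Hypothetical data for illustration; not the slide's actual values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # this is the Total SOS

r = sxy / math.sqrt(sxx * syy)  # Pearson's correlation

# For the least-squares line, Model SOS reduces to Sxy² / Sxx.
sos_model = sxy ** 2 / sxx
r_squared = sos_model / syy     # Model SOS / Total SOS

# r_squared equals r**2: squaring r discards the sign but keeps the
# proportion of the total scatter explained by the line.
```

A small numerator (Model SOS) over a big denominator (Total SOS) gives a small r², matching the figure described on the slide.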

R-square and Prediction Confidence

(The SAS regression output from the earlier slide was shown again here, this time highlighting the R-Square value; the numbers are not recoverable here.)

Y = [mX + b] = [(1.24)(X) + b]

Finally… If we have a significant relationship (based on the P-value), we can use the R-square value to judge how confident we are in making a prediction.