Plots, Correlations, and Regression
Getting a feel for the data using plots, then analyzing the data with correlations and linear regression.


Introduction to Plots
Before you decide to conduct simple linear regression on a data set, it is important to determine whether a linear relationship between the two variables appears justified. If it appears a linear relationship exists, you can proceed with regression analysis. If there is clearly no linear relationship between the two variables of interest, a different type of analysis may be preferred. An easy way to eyeball the data for a linear relationship is to plot the two variables: the independent variable on the x-axis, and the dependent variable on the y-axis.

Two Types of Plots in SAS
You can plot your data in SAS using either PROC PLOT or PROC GPLOT. Both are acceptable methods, although some find that GPLOT creates a better-looking graph. GPLOT also draws the plot in the separate Graph window in SAS, whereas PLOT writes the plot to the Output window.

Fog Data Set
The data set densefog.csv contains information on number of deaths and sulfur dioxide (SO₂) level for various locations. Input the data set into SAS using the following code (with the necessary modifications to the file location):
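The original code slide is not part of this transcript, so the following is only a minimal sketch of one way to read the file. It assumes the CSV has a header row, that the two columns are named deaths and sd (sd is the name used for sulfur dioxide later in this tutorial; deaths is a guess), and that the folder path is a placeholder you replace with your own.

```sas
/* Sketch only: the path, the header row, and the name "deaths" are assumptions. */
proc import datafile="C:\your\folder\densefog.csv"  /* placeholder location */
    out=fog          /* name of the SAS data set to create */
    dbms=csv
    replace;         /* overwrite fog if it already exists */
    getnames=yes;    /* take variable names from the first row */
run;

/* Print the first few rows to confirm the variables were read as expected. */
proc print data=fog (obs=5);
run;
```

If your file has no header row or uses different column names, adjust getnames= and the variable names used in the later steps accordingly.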

Plotting the Data
We now want to plot the data to determine whether a linear relationship between number of deaths and SO₂ levels seems justified. SO₂ is the independent variable (X) and number of deaths is the dependent variable (Y). The general form of the PLOT statement is “PLOT Y*X”. First, use PROC PLOT:
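A minimal sketch of the PROC PLOT call, assuming the data set is named fog and the variables are deaths (Y) and sd (X); the data set name and deaths are assumptions, not names confirmed by the slides.

```sas
/* Character plot in the Output window: Y*X, so deaths on the vertical axis. */
proc plot data=fog;
    plot deaths*sd;
run;
```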

Notice the Plot is in the Output

Interpreting the Plot
Each data point is represented by the letter A. If two points have the same X and Y values, the letter B denotes this. If three points were to fall on the same location, it would be denoted by a C, etc. From this plot, it appears a linear relationship could be justified (imagine drawing a line through the points).

PROC GPLOT
Now plot the same data using a slightly different method, PROC GPLOT. This plot will also indicate that a linear relationship appears to be justified (it is the same plot as in PROC PLOT, only in a slightly different format):
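The equivalent PROC GPLOT call might look like the sketch below. It again assumes a data set named fog with variables deaths and sd, and it requires SAS/GRAPH to be available; the SYMBOL statement is an optional addition that is not taken from the slides.

```sas
/* Optional: draw each observation as a filled dot (an assumption, not from the slides). */
symbol1 value=dot;

/* High-resolution scatter plot in the Graph window: deaths (Y) against sd (X). */
proc gplot data=fog;
    plot deaths*sd;
run;
quit;   /* GPLOT stays active until QUIT or another step begins */
```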

GPLOT in GRAPH Window

Notes on GPLOT
Notice that the GPLOT is nicely contained on one page, whereas the plot from PLOT is more spread out in the Output. To save the PLOT output, simply save the Output window as an .rtf file; it can be opened in Word later. To save the GPLOT, you can copy and paste the graph into Word. If this doesn’t work, you can export the image under File -> Export as Image… (see previous slide) and save the graph as a .bmp file. This file can then be accessed later and inserted into a Word document.

Correlation
One way to test whether two variables are linearly related is to find the sample correlation r between them and test the hypotheses H₀: ρ = 0 vs. H₁: ρ ≠ 0, where ρ is the population correlation that r estimates. An r value close to 1 or -1 indicates a strong linear relationship; an r value near 0 indicates a weak one. A positive r indicates a positive correlation (as one variable increases, the other variable also increases); a negative r indicates a negative correlation (as one variable increases, the other variable decreases).

PROC CORR in SAS
An easy way to calculate the correlation between variables in SAS is with the CORR procedure. Make sure to check your Log after running this program:
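A minimal sketch of the PROC CORR step, assuming the data set fog and the variable names deaths and sd. The title text is a hypothetical example; note that a TITLE statement stays in effect for later procedures until it is replaced.

```sas
/* Hypothetical title; it remains in effect for later output until changed. */
title "Correlation between deaths and sulfur dioxide";

/* Pearson correlation between the two variables, with the p-value for H0: rho = 0. */
proc corr data=fog;
    var deaths sd;
run;
```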

PROC CORR Output

Interpreting Output
The correlation (r) between deaths and sulfur dioxide is given in the PROC CORR output, together with its p-value. The p-value of this correlation is p<0.0001, so we reject the null hypothesis and conclude that there is a correlation between deaths and sulfur dioxide. There is a strong, positive, linear relationship between deaths and sulfur dioxide.

Linear Regression
Now that we have determined a linear relationship exists between these two variables, we can conduct linear regression analysis to quantify this relationship. Linear regression will define a line that describes the relationship between these two variables. (Note: It is not necessary to test for a correlation before doing regression analysis; it is only important to eyeball the data to determine whether a linear relationship seems justified.)

PROC REG in SAS
The following code runs the regression procedure in SAS. The general model statement is:
model y-variable = x-variable;
You can also request a plot of the two variables showing the fitted regression line.
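A minimal sketch of the regression step, assuming the data set fog with response deaths and predictor sd. The PLOT statement shown is the traditional-graphics form; depending on your SAS release and whether ODS Graphics is enabled, the fitted-line plot may instead be produced automatically, so treat that line as optional.

```sas
/* Simple linear regression of deaths (Y) on sd (X). */
proc reg data=fog;
    model deaths = sd;   /* general form: model y-variable = x-variable; */
    plot deaths*sd;      /* scatter plot with the fitted regression line */
run;
quit;   /* PROC REG is interactive and stays active until QUIT */
```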

Linear Regression Output

Regression Line Plot (you may have to scroll down in your GRAPH window to see it—notice it has the same title as the PROC CORR, because we did not define a new title)

Interpreting Output
The value for b₀ can be found under Parameter Estimates to the right of “Intercept.” The value for b₁ can also be found under Parameter Estimates, to the right of the name of the predictor variable (in this case, sulfur dioxide (sd)). Using the output from PROC REG, you can now write the estimated regression equation, substituting these two estimates: Yhat = b₀ + b₁x. If you wanted to estimate the number of deaths at a sulfur dioxide level of 0.20, you would put this value into your regression equation and solve for Yhat: Yhat = b₀ + b₁(0.20) deaths.
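Instead of substituting by hand, you can let SAS compute the prediction. The sketch below uses a common approach: append a row with the desired sd value and a missing response, refit, and read the predicted value from the OUTPUT data set. The data set names newpoint, fog_pred, and pred and the variable name yhat are hypothetical, and fog, deaths, and sd are assumed as before.

```sas
/* One extra row: sd = 0.20 with a missing response. */
data newpoint;
    sd = 0.20;
    deaths = .;
run;

/* Stack the extra row under the original data. */
data fog_pred;
    set fog newpoint;
run;

/* Rows with a missing response are not used to fit the line,
   but they still receive a predicted value. */
proc reg data=fog_pred;
    model deaths = sd;
    output out=pred p=yhat;   /* yhat = b0 + b1*sd for every row */
run;
quit;

/* Show only the appended row, i.e. the prediction at sd = 0.20. */
proc print data=pred;
    where deaths = .;
run;
```

Because the appended row has no response value, the fitted coefficients should match those from the regression on fog alone.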

Interpreting Output, cont.
Find the R² value on the output (0.7960). This value is the proportion of the variability in your dependent variable that is explained by the predictor variable in the model. In this case, about 80% of the variability in number of deaths is explained by sulfur dioxide levels. Also notice that R² = r²: in simple linear regression, 0.7960 is the square of the correlation coefficient from PROC CORR.
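As a worked check of the relationship R² = r², the correlation can be approximately recovered from the reported R². This is a back-calculation from the rounded value 0.7960, not a number read from the PROC CORR output; the positive root is taken because the relationship is described as positive.

```latex
r = +\sqrt{R^2} = \sqrt{0.7960} \approx 0.892
```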

Testing β₁ = 0
When conducting linear regression, you want to test whether there is a significant linear relationship between your predictor and outcome variables. In simple linear regression (only one predictor variable), this can be done by testing either that the population correlation is zero (ρ = 0) or that the slope is zero (β₁ = 0). This test of independence is: H₀: β₁ = 0 vs. Hₐ: β₁ ≠ 0. Because β₁ is the slope of the regression line, if there is no relationship between the two variables (i.e. they are independent), you would expect the slope of the line to be 0 (meaning that levels of y do not change with changes in the levels of x). The alternative is that the line has some non-zero slope, indicating that the two variables are dependent.

Testing Independence, cont.
In simple linear regression, the overall F-test and the individual t-test of β₁ = 0 have the same p-value. The test statistic for H₀: β₁ = 0 is t* = b₁/se(b₁). In this example, t* = b₁/32.87 = 7.12, with a p-value < 0.0001. Notice that SAS computes t* and calculates the p-value. Because the p-value < 0.05, we reject the null hypothesis and conclude that sulfur dioxide and number of deaths are not independent.
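The arithmetic behind these statements can be sketched as follows. The slope below is back-calculated from the rounded values se(b₁) = 32.87 and t* = 7.12, so it is only approximate and should be checked against the Parameter Estimates table. The second line shows why the F-test and the t-test agree: with a single predictor, the overall F statistic is the square of t*.

```latex
t^* = \frac{b_1}{\operatorname{se}(b_1)}
\quad\Longrightarrow\quad
b_1 = t^* \cdot \operatorname{se}(b_1) \approx 7.12 \times 32.87 \approx 234

F = (t^*)^2 \approx 7.12^2 \approx 50.7
```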

Conclusion
Now you are familiar with conducting simple linear regression in SAS. The next tutorial introduces you to model diagnostics using SAS. These help you determine whether the assumptions of the regression model are met, whether the model is a good fit for the data, and whether there are any outlying data points.