
Simple Linear Regression

One reason for assessing correlation is to identify a variable that could be used to predict another variable. If that is your intention, the sample used to assess the correlation must be ‘representative’ of the population in which you wish to make the predictions.

Assume you would like to predict the score each of you will get on Exam 2 in this course. If we assume that you are a sample from the ‘population’ of students who have taken this course, we can use past performance to predict future performance. If all you knew was that you were students from the same population taking the same course, what would you predict for your Exam 2 scores?

[Figure: Exam 2 scores of previous students, with their mean Exam 2 score (85.7) marked]

Given no other information, the best guess would be that each student gets the mean Exam 2 grade. But if a variable related to Exam 2 scores were available for each student, it could be used to improve the predictions.
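Why is the mean the best single guess? With no predictor, the mean is the constant that minimizes the sum of squared prediction errors. The minimal Python sketch below uses a few hypothetical Exam 2 scores, not the course data. It compares the squared error of guessing the mean with the error of guessing a value 5 points above or below it.

```python
# Hypothetical Exam 2 scores for previous students (illustrative only, not the course data).
exam2 = [76, 92, 84, 88, 80, 90, 86]

def sum_sq_errors(scores, guess):
    """Sum of squared differences between each score and one constant guess."""
    return sum((y - guess) ** 2 for y in scores)

mean_exam2 = sum(exam2) / len(exam2)

# Guessing the mean for everyone gives the smallest total squared error.
for guess in (mean_exam2 - 5, mean_exam2, mean_exam2 + 5):
    print(f"guess = {guess:.2f}   SSE = {sum_sq_errors(exam2, guess):.2f}")
```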

For example, grades on Exam 1 and grades on Exam 2 turn out to be significantly correlated (r = .64). They therefore share some variance, and knowing a student’s Exam 1 grade should help predict her Exam 2 grade. (Note the ‘outlier’ in the scatter plot.)

Note that the original data set produced this scatter plot. What is the problem? Note the ‘outlier’: it reduces r from .64 to .52.
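A single extreme point can pull Pearson’s r down noticeably. The sketch below computes r for a small set of hypothetical Exam 1/Exam 2 pairs with and without one added outlier (a high Exam 1 score paired with a very low Exam 2 score). The data are invented for illustration, so the numbers will not reproduce the .64 and .52 from the course data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical Exam 1 / Exam 2 pairs (illustrative only).
exam1 = [70, 75, 80, 85, 90, 95]
exam2 = [72, 78, 79, 88, 87, 94]

r_clean = pearson_r(exam1, exam2)
# Add one outlier: a high Exam 1 score paired with a very low Exam 2 score.
r_outlier = pearson_r(exam1 + [92], exam2 + [60])

print(f"r without outlier: {r_clean:.2f}")
print(f"r with outlier:    {r_outlier:.2f}")
```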

Simple linear regression is just an application of Pearson’s r (a coefficient of the strength and direction of linear association), so the assumptions are the same. It involves finding the linear relationship between X and Y that minimizes the differences between actual Y scores and predicted Y scores ($Y_p$, predicted from X).
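To see what "minimizes the differences" means in practice, here is a minimal sketch using hypothetical X and Y scores. It fits the least-squares line and then checks that nudging either the intercept or the slope away from the fitted values always increases the sum of squared differences between actual and predicted Y. The helper names `fit_least_squares` and `sse` are illustrative only.

```python
# Hypothetical X (predictor) and Y (criterion) scores, for illustration only.
x = [2.0, 3.0, 5.0, 7.0, 9.0]
y = [4.0, 5.0, 7.0, 10.0, 15.0]

def fit_least_squares(x, y):
    """Return (a, b) for the line Y_p = a + bX that minimizes sum (Y - Y_p)^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def sse(x, y, a, b):
    """Sum of squared differences between actual Y and predicted Y_p = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

a, b = fit_least_squares(x, y)
best = sse(x, y, a, b)

# Moving the intercept or the slope away from the fitted values raises the error.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(x, y, a + da, b + db) > best

print(f"a = {a:.3f}, b = {b:.3f}, SSE = {best:.3f}")
```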

Line of best fit (minimizes errors): the formula for points on the line is $Y = a + bX$. Here a is the ‘intercept’ (the value of Y when X = 0), and b is the slope of the line (the change in Y for each one-unit change in X). All predicted scores ($Y_p$) therefore fall on the line of best fit.
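The two parameters can be read directly off the line equation. In this sketch the values of a and b are hypothetical and chosen only for illustration. Evaluating the line at X = 0 returns the intercept, and stepping X up by one unit changes the prediction by exactly the slope.

```python
def predicted(a, b, x):
    """Point on the line of best fit: Y_p = a + bX."""
    return a + b * x

# Hypothetical regression constant and coefficient, for illustration only.
a, b = 40.0, 0.5

print(predicted(a, b, 0))                          # 40.0: the intercept, Y when X = 0
print(predicted(a, b, 81) - predicted(a, b, 80))   # 0.5: the slope, change in Y per unit of X
```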

[Figure: regression line with Y on the vertical axis and X on the horizontal axis, marking Mean X, Mean Y, the intercept a (Y when X = 0), and the slope]

In regression language, a is the ‘regression constant’ and b is the ‘regression coefficient’. The coefficient is based on r and on the variability of Y relative to the variability of X. If r = 1, there is a perfect straight-line relationship. If r is less than 1, the actual score becomes $Y = a + bX + \text{residual}$, where $Y_p = a + bX$.

[Figure: regression line with Mean X, Mean Y, the intercept a, and the slope marked]
The line of best fit minimizes the deviations of scores from the regression line.

The regression line of ‘best fit’ minimizes those errors of prediction. The least squares regression line minimizes $\sum (Y - Y_p)^2$.

$b_y = r\,(SD_Y / SD_X)$. If X and Y are converted to z scores, both SDs = 1, so $b_y = r$.

$b_y$ can also be found as $\mathrm{Cov}_{XY} / \mathrm{Var}_X$. Even though r is the correlation of X and Y, b will vary depending on which variable is used to predict the other, because that changes which SD goes on the top and bottom of the ratio.

$a_y = M_Y - b_y M_X$, the value of $Y_p$ when X = 0.
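The formulas above can be checked numerically. This sketch uses hypothetical data and Python's standard `statistics` module (sample SDs with N - 1). It computes $b_y$ both as $r\,(SD_Y/SD_X)$ and as $\mathrm{Cov}_{XY}/\mathrm{Var}_X$, then computes the intercept. It also shows that the slope for z scores equals r and that the slope for predicting X from Y ($b_x$) differs from $b_y$.

```python
import statistics

# Hypothetical paired scores, for illustration only.
x = [2.0, 3.0, 5.0, 7.0, 9.0]
y = [4.0, 5.0, 7.0, 10.0, 15.0]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)
sdx, sdy = statistics.stdev(x), statistics.stdev(y)   # sample SDs (divide by N - 1)
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / (sdx * sdy)

b_y = r * (sdy / sdx)          # slope for predicting Y from X
b_y_alt = cov_xy / sdx ** 2    # same slope via Cov_XY / Var_X
a_y = my - b_y * mx            # intercept: Y_p when X = 0

b_x = r * (sdx / sdy)          # slope for predicting X from Y: the SD ratio flips

# With z scores both SDs are 1, so the slope Cov(zx, zy) / Var(zx) is just r.
zx = [(v - mx) / sdx for v in x]
zy = [(v - my) / sdy for v in y]
b_z = sum(p * q for p, q in zip(zx, zy)) / (n - 1)

print(f"r = {r:.3f}  b_y = {b_y:.3f} (alt {b_y_alt:.3f})  a_y = {a_y:.3f}")
print(f"b_x = {b_x:.3f}  b_z = {b_z:.3f}")
```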

Partitioning the Variability in Y

$SS_{\text{Total}} = \sum (Y - M_Y)^2$: the variability of Y scores around the mean. This is separated into two parts.

$SS_{\text{regression}} = \sum (Y_p - M_Y)^2$: the improvement in predictions from using X (variability in Y explained by X) rather than assuming everyone gets the mean.

$SS_{\text{residual}} = \sum (Y - Y_p)^2$: the degree to which predictions do not match the actual scores (the prediction errors that have been minimized).
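This minimal sketch uses hypothetical data to fit the least-squares line and compute the three sums of squares. For a least-squares line with an intercept, $SS_{\text{Total}}$ splits exactly into $SS_{\text{regression}} + SS_{\text{residual}}$, apart from floating-point rounding.

```python
# Hypothetical paired scores, for illustration only.
x = [2.0, 3.0, 5.0, 7.0, 9.0]
y = [4.0, 5.0, 7.0, 10.0, 15.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
y_pred = [a + b * xi for xi in x]

ss_total = sum((yi - my) ** 2 for yi in y)                       # Y around its mean
ss_regression = sum((yp - my) ** 2 for yp in y_pred)             # improvement from using X
ss_residual = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))   # remaining prediction error

print(f"SSTotal = {ss_total:.3f}")
print(f"SSregression = {ss_regression:.3f}, SSresidual = {ss_residual:.3f}")
print(f"sum of parts = {ss_regression + ss_residual:.3f}")
```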

$SS_{\text{regression}} / SS_{\text{Total}} = r^2$: the proportion of total variance in Y accounted for (explained) by X.

$SS_{\text{residual}} / SS_{\text{Total}} = 1 - r^2$: the proportion of unexplained variance in Y (errors).

Variance of the errors: $MS_{\text{residual}} = SS_{\text{residual}} / df$, where $df = N - 2$ in simple linear regression.

Standard Error of Estimate: $SEE = \sqrt{MS_{\text{residual}}}$. This is the typical amount by which a predicted score deviates from the actual score, that is, across each value of X, the typical deviation of actual Y scores from predicted Y scores.
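Continuing with hypothetical data, this sketch checks the two ratios against r and then derives $MS_{\text{residual}}$ and the standard error of estimate. It assumes $df = N - 2$, which is $N - p - 1$ with one predictor.

```python
import math

# Hypothetical paired scores, for illustration only.
x = [2.0, 3.0, 5.0, 7.0, 9.0]
y = [4.0, 5.0, 7.0, 10.0, 15.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sxx * syy)

# Least-squares line and its predictions.
b = sxy / sxx
a = my - b * mx
y_pred = [a + b * xi for xi in x]

ss_total = syy
ss_regression = sum((yp - my) ** 2 for yp in y_pred)
ss_residual = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))

df_residual = n - 2                       # N - p - 1 with p = 1 predictor
ms_residual = ss_residual / df_residual   # variance of the errors
see = math.sqrt(ms_residual)              # standard error of estimate

print(f"r^2 = {r ** 2:.4f}   SSregression/SSTotal = {ss_regression / ss_total:.4f}")
print(f"1 - r^2 = {1 - r ** 2:.4f}   SSresidual/SSTotal = {ss_residual / ss_total:.4f}")
print(f"MSresidual = {ms_residual:.3f}   SEE = {see:.3f}")
```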

Standard Error of Estimate for individual predictions: when predicting the score of any individual, the standard error of that prediction can be estimated from

$SEE_{\text{est}} = SEE \sqrt{1 + \frac{1}{N} + \frac{(X - M_X)^2}{(N - 1)\,S_X^2}}$

The error is larger the further the score on X deviates from Mean X, and the smaller the sample used for making the estimate.
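The formula above translates directly into code. The SEE, Mean X, and SD of X values below are hypothetical. The loop shows the two effects described: the error grows as X moves away from Mean X, and it is larger for the smaller sample. The name `see_for_prediction` is illustrative.

```python
import math

def see_for_prediction(see, x_new, mean_x, sd_x, n):
    """SE for predicting one individual's Y at x_new (sd_x is the sample SD of X)."""
    return see * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / ((n - 1) * sd_x ** 2))

# Hypothetical values: SEE = 5.0, Mean X = 80, SD of X = 8.
see, mean_x, sd_x = 5.0, 80.0, 8.0

for n in (10, 100):
    for x_new in (80.0, 90.0, 100.0):
        se = see_for_prediction(see, x_new, mean_x, sd_x, n)
        print(f"N = {n:3d}  X = {x_new:5.1f}  SE = {se:.3f}")
```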

Example in Handout Packet: Predicting IQ from GPA
[Figure: scatter plot of IQ against GPA, with the means marked]

[Figure annotations: Mean IQ = 105, which would be your best ‘guess’ for every person if you had no useful predictor; Mean GPA = 3.06; the improvement in prediction from using GPA; and the residual, i.e. the distance of a point from the line, which is much greater for some points than for others.]

IQ = a + b(GPA) + error. Approximate 95% CIs are ±2(6.57) for predicting the mean IQ of those with a given GPA.

SPSS regression output lists R and R². In simple linear regression analyses, these are just Pearson’s r and r².

SPSS will report an ANOVA table with the regression output. For simple linear regression, however, all you need to report is the t value (df = n - p - 1, where p = number of predictors). This t tests the significance of the single predictor (gpa) in the model, that is, whether r is reliably different from 0.

Note that the standardized coefficient, beta, which is the regression coefficient when all variables are standardized (z scores), is the same as r.

Adjusted R² is adjusted for the sample size and the number of predictors in the model. Because the sample R² tends to be an inflated estimate of R² in the population, use adjusted R² when applying results to the population.
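Two of the quantities above can be sketched in code: adjusted R² and the t test for a single predictor. The sketch uses the usual formulas, $1 - (1 - R^2)(n - 1)/(n - p - 1)$ and $t = r\sqrt{n - 2}/\sqrt{1 - r^2}$. With one predictor, the t for the slope equals the t test of r. The inputs are hypothetical: r = .64 is borrowed from the Exam example, but the sample size of 30 is an assumption.

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a sample of n cases and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def t_for_single_predictor(r, n):
    """t test of whether r differs from 0 with one predictor; df = n - p - 1 = n - 2."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical inputs: r = .64 with an assumed sample of 30 cases.
r, n = 0.64, 30
t, df = t_for_single_predictor(r, n)
print(f"R^2 = {r ** 2:.3f}, adjusted R^2 = {adjusted_r2(r ** 2, n, 1):.3f}")
print(f"t({df}) = {t:.2f}")
```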


Note that the accuracy of predictions decreases as you move away from the means. We are 95% confident that the ‘true’ line of best fit lies within the CIs. If you consider all the lines that might fall within the intervals, you can see that variability increases as you move away from the ‘center’ (Mx, My).

Predicting grades for PSYC 6102 Exam 2 from Exam 1 scores: r = +.64. The usual convention is to report the regression constant and coefficient to 3 decimal places. Predicted Exam 2 = a + b(Exam 1), R² = .41.

Exam 2 score = a + b(Exam 1) + error; the approximate 95% CI is, at best, ±2(5.15).
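The arithmetic behind these numbers is short. R² is r squared (.64² = .4096, reported as .41), and the rough 95% interval adds and subtracts 2 × 5.15 = 10.3 points around a predicted score. The fitted constant and coefficient are not given here, so the predicted score of 85.0 below is hypothetical.

```python
def approx_95_interval(y_pred, see):
    """Rough 95% range: predicted score plus or minus 2 standard errors of estimate."""
    return y_pred - 2 * see, y_pred + 2 * see

r = 0.64
print(f"R^2 = {r ** 2:.4f}")                  # 0.4096, which rounds to .41

# 85.0 is a hypothetical predicted Exam 2 score; 5.15 is the SEE from the example.
low, high = approx_95_interval(85.0, 5.15)
print(f"{low:.2f} to {high:.2f}")
```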

Note that the 95% CIs are much wider for predicting an individual’s Exam 2 score than for predicting the typical Exam 2 score (the point on the line).
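The difference between the two kinds of interval comes from the leading "1 +" term. The standard error for the line (the predicted mean of Y at a given X) omits it, while the standard error for an individual includes it. This sketch uses hypothetical values and the text's rough ±2 multiplier instead of an exact t critical value. Here $SS_X = \sum (X - M_X)^2 = (N - 1)S_X^2$. It also shows that the interval for the line is narrowest at Mean X and widens symmetrically away from it.

```python
import math

def se_mean_prediction(see, x_new, mean_x, ss_x, n):
    """SE of the predicted mean Y at x_new (the regression line itself)."""
    return see * math.sqrt(1 / n + (x_new - mean_x) ** 2 / ss_x)

def se_individual_prediction(see, x_new, mean_x, ss_x, n):
    """SE for predicting a single individual's Y at x_new."""
    return see * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / ss_x)

# Hypothetical values: SEE = 5.0, N = 30, Mean X = 80, SS_X = 1800.
see, n, mean_x, ss_x = 5.0, 30, 80.0, 1800.0

for x_new in (70.0, 80.0, 90.0):
    line = 2 * se_mean_prediction(see, x_new, mean_x, ss_x, n)
    indiv = 2 * se_individual_prediction(see, x_new, mean_x, ss_x, n)
    print(f"X = {x_new:.0f}   line: +/-{line:.2f}   individual: +/-{indiv:.2f}")
```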

Partial Correlation logic: finding the correlation of X and Y after ‘partialling’ out the relationship of each with Z. Predict X from Z and, for each person, find the residual (the amount by which the prediction missed). Predict Y from Z and find each person’s residual in the same way. Then correlate the two sets of residuals. The result is the relationship of X and Y after removing the relationship each has with Z.
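The residual-based logic above can be written out directly. This sketch uses hypothetical X, Y, and Z scores. It regresses X on Z and Y on Z, correlates the two sets of residuals, and cross-checks the result against the standard first-order partial correlation formula $r_{XY \cdot Z} = (r_{XY} - r_{XZ} r_{YZ}) / \sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}$.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def residuals(target, predictor):
    """Residuals (actual - predicted) from the least-squares line predicting target."""
    mt, mp = mean(target), mean(predictor)
    sxy = sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
    sxx = sum((p - mp) ** 2 for p in predictor)
    b = sxy / sxx
    a = mt - b * mp
    return [t - (a + b * p) for p, t in zip(predictor, target)]

# Hypothetical scores on X, Y, and a third variable Z, for illustration only.
x = [3.0, 5.0, 4.0, 8.0, 7.0, 9.0]
y = [2.0, 6.0, 5.0, 7.0, 9.0, 8.0]
z = [1.0, 2.0, 2.0, 4.0, 3.0, 5.0]

r_xy_given_z = pearson_r(residuals(x, z), residuals(y, z))

# Cross-check against the first-order partial correlation formula.
r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
formula = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(f"residual method: {r_xy_given_z:.4f}   formula: {formula:.4f}")
```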