Correlation and Regression Analysis

Correlation and Regression Analysis Dr. Mohammed Alahmed

GOALS
- Understand and interpret the terms dependent variable and independent variable.
- Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.
- Conduct a test of hypothesis to determine whether the coefficient of correlation in the population is zero.
- Calculate the least squares regression line.
- Predict the value of a dependent variable based on the value of at least one independent variable.
- Explain the impact of changes in an independent variable on the dependent variable.

Introduction
Correlation and regression analysis are related in the sense that both deal with relationships among variables. For example, we may be interested in studying the relationship between blood pressure and age, or between height and weight. The nature and strength of the relationship between variables may be examined by correlation and regression analysis.

Correlation Analysis
The term “correlation” refers to a measure of the strength of association between two variables. Correlation is a statistical technique used to determine the degree to which two quantitative variables are related; it describes the relationship but does not allow a causal relationship to be inferred.

If the two variables increase or decrease together, they have a positive correlation. If increases in one variable are associated with decreases in the other, they have a negative correlation.

Visualizing Correlation
A scatter plot (or scatter diagram) is used to show the relationship between two variables. A linear relationship appears in a scatter plot as points that follow a straight line.

Linear Correlation Only!
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships.]

Correlation Coefficient
The population correlation coefficient ρ (rho) measures the strength of the association between the variables. The sample (Pearson) correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.

r is a statistic that quantifies the relation between two variables. It can be either positive or negative and falls between -1.00 and 1.00.

The magnitude of r (not its sign) indicates the strength of the relation. Its purpose is to measure the strength of a linear relationship between two variables. A correlation coefficient does not establish causation (i.e., that a change in X causes a change in Y).

Calculating the Correlation Coefficient
The sample (Pearson) correlation coefficient (r) is defined by

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
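As an illustration, one common computational form of the Pearson coefficient can be sketched in a few lines of Python. The function name `pearson_r` and the five data points are assumptions made for this sketch; they are not the infant data used in the example slides.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient (computational form)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

For these points the numerator is 30 and the denominator is the square root of 1500, so r is about 0.77, a fairly strong positive linear association.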

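Statistical Inference for Correlation Coefficients
Significance Test for Correlation
Hypotheses:
H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)
Test statistic:

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

which follows a t distribution with n − 2 degrees of freedom when H0 is true.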
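The standard t statistic for testing H0: ρ = 0 can be sketched as follows. The inputs are the sample size n = 17 and the value r² = 0.668 reported for the birth-weight example in this presentation; the positive sign of r, the function name `correlation_t_statistic`, and the tabulated critical value are assumptions of the sketch.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return t, n - 2

# r is taken as the positive root of r^2 = 0.668 (sign assumed, see lead-in).
r = math.sqrt(0.668)
t, df = correlation_t_statistic(r, 17)

# Two-sided 5% critical value of t with 15 degrees of freedom, from a t table.
T_CRITICAL = 2.131
reject_h0 = abs(t) > T_CRITICAL
print(round(t, 2), df, reject_h0)
```

Because |t| is far above the critical value, the sketch agrees with the conclusion drawn later in the deck that the linear relationship is significant.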

Example
A small study is conducted involving 17 infants to investigate the association between gestational age at birth, measured in weeks, and birth weight, measured in grams.


[Figure: correlation results for the example, annotated with r and the hypotheses H0: ρ = 0 and H1: ρ ≠ 0.]

Cautions about Correlation
Correlation is a good statistic to use only if the relationship is roughly linear; it cannot be used to measure non-linear relationships. Always plot your data to make sure that the relationship is roughly linear!
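A small sketch of why plotting matters: in the hypothetical data below, y is completely determined by x (y = x²), yet the Pearson coefficient is zero because the relationship is not linear. The helper name `pearson_r` and the data are assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient (deviation form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curvilinear data: a perfect parabola, symmetric about x = 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r = pearson_r(x, y)
print(r)
```

A scatter plot of these points would reveal the curve immediately, whereas r alone would wrongly suggest that the variables are unrelated.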

Regression Analysis
Regression analysis is used to:
- Predict the value of a dependent variable based on the value of at least one independent variable.
- Explain the impact of changes in an independent variable on the dependent variable.
Dependent variable: the variable we wish to explain.
Independent variable: the variable used to explain the dependent variable.

Simple Linear Regression Model
- Only one independent variable, X.
- The relationship between X and Y is described by a linear function.
- Changes in Y are assumed to be caused by changes in X.

The formula for a simple linear regression

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

where y is the dependent variable, x is the independent variable, β0 is the population y intercept, β1 is the population slope coefficient, and ε is the random error term. β0 + β1x is the linear component and ε is the random error component. The regression coefficients β0 and β1 are unknown and have to be estimated from the observed data (sample).

[Figure: population regression line of y on x with intercept β0 and slope β1. For a given xi, the random error εi is the vertical distance between the observed value of y and the predicted value of y.]

Linear Regression Assumptions
- The assumption of linearity: the relationship between the dependent and independent variables is linear.
- The assumption of homoscedasticity: the errors have the same variance.
- The assumption of independence: the errors are independent of each other.
- The assumption of normality: the errors are normally distributed.

Estimated Regression Model
The sample regression line provides an estimate of the population regression line:

$$ \hat{y}_i = b_0 + b_1 x_i $$

where ŷi is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and xi is the value of the independent variable. The individual random error terms ei have a mean of zero.

Least Squares Method
b0 and b1 are called the regression coefficients. They are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals. Define a residual e as the difference between the observed y and the fitted ŷ, that is,

$$ e_i = y_i - \hat{y}_i $$

Residuals are interpreted as estimates of the random errors ε.

The Least Squares Equation
The formulas for b1 and b0 are:

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} $$

where x̄ and ȳ are the sample means of x and y.
- b0 is the estimated average value of y when the value of x is zero.
- b1 is the estimated change in the average value of y as a result of a one-unit change in x.
The coefficients b0 and b1 will usually be found using computer software, such as SPSS.
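For readers without SPSS at hand, the standard least squares estimates for a single predictor can be sketched in plain Python. The function name `fit_line`, the data, and the prediction point x = 6 are hypothetical; the infant data from the example are not reproduced here.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx                 # estimated slope
    b0 = mean_y - b1 * mean_x      # estimated intercept
    return b0, b1

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)

# Predicted value of y at a new x (here x = 6, chosen arbitrarily).
y_pred = b0 + b1 * 6
print(round(b0, 2), round(b1, 2), round(y_pred, 2))
```

Here the slope is 0.6 and the intercept 2.2, so each one-unit change in x is associated with an estimated change of 0.6 in the average value of y.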

Relationship between the Regression Coefficient (b1) and the Correlation Coefficient (r)
What is the relationship between the sample regression coefficient (b1) and the sample correlation coefficient (r)? The two are related by

$$ b_1 = r \, \frac{S_y}{S_x} $$

where Sx is the standard deviation of X and Sy is the standard deviation of Y.
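A short derivation shows why the slope equals r scaled by the ratio of the standard deviations. It assumes the usual deviation forms of b1 and r and sample standard deviations with divisor n − 1; the notation is introduced here for the sketch.

```latex
\begin{align*}
b_1 &= \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2},
\qquad
r = \frac{\sum (x-\bar{x})(y-\bar{y})}
         {\sqrt{\sum (x-\bar{x})^2 \,\sum (y-\bar{y})^2}} \\[4pt]
b_1 &= r \,\frac{\sqrt{\sum (x-\bar{x})^2 \,\sum (y-\bar{y})^2}}{\sum (x-\bar{x})^2}
     = r \,\sqrt{\frac{\sum (y-\bar{y})^2}{\sum (x-\bar{x})^2}} \\[4pt]
    &= r \,\sqrt{\frac{(n-1)\,S_y^2}{(n-1)\,S_x^2}}
     = r \,\frac{S_y}{S_x}
\end{align*}
```

Since both standard deviations are positive, b1 and r always have the same sign, and b1 is zero exactly when r is zero.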

Example
Use the previous example, taking birth weight as the dependent variable and gestational age as the independent variable. Fit a linear regression line relating birth weight to gestational age using these data. Predict the birth weight of a baby born at a gestational age of 40.5 weeks.


[Figure: regression output for the example, with the estimates b0 and b1 marked.]

Coefficient of Determination, R²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called R-squared and is denoted R².

R² = Explained variation / Total variation
R² is usually expressed as a percentage and always lies between 0% and 100%:
- 0% indicates that the model explains none of the variability of the response data around its mean.
- 100% indicates that the model explains all the variability of the response data around its mean.
In general, the higher the R-squared, the better the model fits your data.
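The ratio of explained to total variation can be computed directly from the observed and fitted values. The sketch below uses hypothetical data together with the least squares estimates for those points (b0 = 2.2, b1 = 0.6); the function name is an assumption made for illustration.

```python
def explained_and_total_variation(y, y_hat):
    """Return (explained, total) sums of squares around the mean of y."""
    mean_y = sum(y) / len(y)
    explained = sum((f - mean_y) ** 2 for f in y_hat)   # regression SS
    total = sum((v - mean_y) ** 2 for v in y)           # total SS
    return explained, total

# Hypothetical data and the least squares line fitted to it.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6
y_hat = [b0 + b1 * v for v in x]

explained, total = explained_and_total_variation(y, y_hat)
r_squared = explained / total
print(f"{100 * r_squared:.1f}%")
```

For these points the explained variation is 3.6 out of a total of 6, so the line accounts for 60% of the variability in y.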

r² = 0.668
66.8% of the variation in birth weight is explained by variation in gestational age in weeks.

F-test for Simple Linear Regression
The criterion for goodness of fit is the ratio of the regression sum of squares to the residual sum of squares, with each sum of squares divided by its degrees of freedom to form the F statistic. A large ratio indicates a good fit, whereas a small ratio indicates a poor fit. In hypothesis-testing terms we want to test the hypothesis H0: β1 = 0 vs. H1: β1 ≠ 0.
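A minimal sketch of the F statistic for a single predictor follows. It assumes the standard degrees of freedom (1 for the regression, n − 2 for the residuals); the function names are hypothetical, and the worked value uses n = 17 and r² = 0.668 from the birth-weight example.

```python
def f_statistic(ss_regression, ss_residual, n):
    """F = (regression SS / 1) / (residual SS / (n - 2))."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    ms_regression = ss_regression / 1
    ms_residual = ss_residual / (n - 2)
    return ms_regression / ms_residual

def f_from_r_squared(r_squared, n):
    """Equivalent form of F written in terms of R-squared."""
    return r_squared * (n - 2) / (1.0 - r_squared)

# Hypothetical sums of squares: 3.6 explained and 2.4 residual with n = 5.
f_small = f_statistic(3.6, 2.4, 5)

# Values reported in the example: r^2 = 0.668 with n = 17 infants.
f_example = f_from_r_squared(0.668, 17)

# 5% critical value of F with (1, 15) degrees of freedom, from an F table.
F_CRITICAL = 4.54
print(round(f_small, 2), round(f_example, 2), f_example > F_CRITICAL)
```

With one predictor, F equals the square of the t statistic for the correlation test, so the two tests always lead to the same decision.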

The P-value is less than 0.05. Therefore H0 is rejected, implying a significant linear relationship between birth weight and gestational age.

Checking the Regression Assumptions
There are two strategies for checking the regression assumptions:
- Before the regression is computed, examine the degree to which the variables satisfy the criteria (e.g., normality and linearity) by plotting relationships and computing diagnostic statistics.
- After the regression has been computed, study plots of residuals and compute diagnostic statistics.

Check Linearity assumption:
A scatter plot (or scatter diagram) of the dependent variable against the independent variable is used to check that the relationship between the two variables is roughly linear.

Check Independence assumption:
Error terms associated with individual observations should be independent of each other.
- Rule of thumb: random samples ensure independence.
- A scatter plot of the residuals against the predicted values should show no trends.

Check Equal Variance Assumption (Homoscedasticity):
Variability of the error terms should be the same (constant) for all values of each predictor.
- Check 1: A scatter plot of the residuals against the predicted values shows consistent spread.
- Check 2: A boxplot of y against each predictor x should show consistent spread.
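As a rough numerical companion to the residual plot, the spread of the residuals can be compared between low and high predicted values. This is only a crude sketch and not a formal test; the function name, the split into two halves, and the data are all assumptions made for illustration.

```python
import statistics

def residual_spread_by_half(fitted, residuals):
    """Standard deviation of residuals for the lower and upper half of fitted values."""
    pairs = sorted(zip(fitted, residuals))      # order by fitted value
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return statistics.pstdev(low), statistics.pstdev(high)

# Hypothetical residuals whose spread grows with the fitted value.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.0, -0.9]

sd_low, sd_high = residual_spread_by_half(fitted, residuals)
spread_ratio = sd_high / sd_low
print(round(sd_low, 3), round(sd_high, 3), round(spread_ratio, 1))
```

A ratio far from 1, as in this constructed example, is a hint that the spread is not constant and that the residual plot deserves a closer look.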

Check Normality Assumption:
- Check the normality of the residuals and of the individual variables, and identify outliers, using a normal probability plot.
- Run normality tests. All or almost all of them should have a P-value > 0.05.

- Plot a histogram of the residuals. A bell-shaped curve centered around zero should be displayed.
- Construct a normal probability plot (Q-Q plot) of the residuals.
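The coordinates of a normal probability plot can be computed without any plotting library. The sketch below pairs each ordered residual with a standard normal quantile; the plotting positions (i − 0.5)/n are one common convention among several, and the function name and residual values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Pair sorted residuals with standard normal quantiles for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals from a fitted line (they sum to zero).
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
points = normal_qq_points(residuals)

for q, e in points:
    print(f"{q:7.3f}  {e:5.2f}")
```

If the residuals are roughly normal, these points fall close to a straight line when the second column is plotted against the first.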