Correlation & Simple Linear Regression
Chung-Yi Li, PhD, Dept. of Public Health, College of Medicine, NCKU

Bi-variate Analyses
– Association between one continuous variable and one categorical variable: two-independent-sample t-test, one-way ANOVA, and their non-parametric counterparts
– Association between two categorical variables: chi-square test and Fisher’s exact test
– Association between two continuous variables: Pearson’s correlation coefficient, Spearman’s rank (rho) correlation coefficient, and simple linear regression
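A minimal Python sketch (not part of the original lecture) of the three families of bivariate tests listed above, using scipy.stats; the data values and variable names are hypothetical assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Continuous outcome vs. a 2-level categorical variable: two-sample t-test
group_a = rng.normal(120, 10, 30)     # e.g., systolic BP in group A (hypothetical)
group_b = rng.normal(125, 10, 30)     # e.g., systolic BP in group B (hypothetical)
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Two categorical variables: chi-square test on a 2x2 table
table = np.array([[20, 30],
                  [25, 25]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Two continuous variables: Pearson and Spearman correlation
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(scale=0.8, size=50)
r, p_r = stats.pearsonr(x, y)
rho, p_rho = stats.spearmanr(x, y)

print(p_t, p_chi2, p_r, p_rho)
```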

Correlation

Correlations
– The bi-variate correlation (denoted r for a sample, ρ for the population) is an indicator of the strength and direction of the relationship between two variables; it is related to the regression slope through the variance terms (b = r·s_y/s_x)
– A correlation “matrix” is frequently used to screen for important relationships
– Correlation (association) does not necessarily reflect “causality”

Correlation Matrix
Correlation between “immunization rate” and “under-5 mortality rate” (correlation matrix output)

Correlation Analysis
– Examine a scatter plot before calculating a correlation coefficient
– Only “linear” correlation is discussed here
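As an illustration of this advice (with made-up, hypothetical country-level data resembling the immunization example), the sketch below draws the scatter plot and prints the correlation matrix before any coefficient is interpreted.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
immunization = rng.uniform(40, 99, size=20)                          # % immunized (hypothetical)
under5_mortality = 220 - 2.0 * immunization + rng.normal(0, 15, 20)  # deaths per 1,000 (hypothetical)

df = pd.DataFrame({"immunization": immunization,
                   "under5_mortality": under5_mortality})

print(df.corr())                    # Pearson correlation matrix

df.plot.scatter(x="immunization", y="under5_mortality")
plt.title("Inspect the scatter plot before trusting r")
plt.show()
```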

% Immunization & Under-5 Mortality Rate (per 1,000 live births) in 20 Countries (scatter plot)


No Correlation (scatter plot)

No Correlation? (scatter plot)


Notes, cont.
– Linear relations only: correlation applies only to linear relationships. The figure referenced here shows a strong non-linear relationship, yet r is close to zero.
– Correlation does not necessarily mean causation. Beware of lurking variables (next slide).

Confounded Correlation
A near-perfect negative correlation (r = −0.987) was seen between cholera mortality and elevation above sea level during a 19th-century epidemic. We now know that cholera is transmitted by water. The observed relationship between cholera and elevation was confounded by the lurking variable “proximity to polluted water.”

Direction of Association
– The direction of association can be determined from both scatter plots and correlation coefficients
– The shape of the scatter plot reflects the direction of association
– The correlation coefficient ranges from −1 (perfect inverse relationship) to +1 (perfect positive relationship)

Strength of Association
– Can be judged from scatter plots
– A narrow, elongated (cigar-shaped) scatter indicates a strong association, while a round, diffuse scatter indicates essentially “no” association

The strength of association between two continuous variables can also be determined from the correlation coefficient: the larger its magnitude (absolute value), the stronger the association.

Examples of correlations

Pearson’s Correlation Coefficient

The sample correlation coefficient is
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]
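A short sketch (my addition, with hypothetical numbers) computing r directly from this definition and checking it against scipy.stats.pearsonr:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements
x = np.array([67., 70., 64., 72., 68., 65., 71., 69.])
y = np.array([128., 140., 118., 150., 135., 122., 145., 138.])

xd, yd = x - x.mean(), y - y.mean()
r_manual = np.sum(xd * yd) / np.sqrt(np.sum(xd**2) * np.sum(yd**2))

r_scipy, p_value = stats.pearsonr(x, y)
print(r_manual, r_scipy, p_value)   # the two r estimates agree
```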

Hypothesis Test
We conduct a hypothesis test to guard against declaring too many chance correlations to be real: random sampling from an uncorrelated (random-scatter) population can still produce an apparent correlation in the sample.

Statistical Inference for Correlation
Constructing a 95% CI for ρ is complicated by the fact that only when ρ = 0 can r be considered to come from an approximately normal distribution. For values of ρ other than 0, Fisher’s Z transformation, defined below, must be employed.

Fisher’s Z transformation: z = ½ ln[(1 + r)/(1 − r)], which is approximately normal with standard error 1/√(n − 3). Confidence limits are computed on the z scale and then back-transformed to the r scale.
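A minimal sketch of this interval, assuming a hypothetical sample correlation r = 0.60 from n = 30 pairs (values chosen only for illustration):

```python
import numpy as np
from scipy import stats

r, n = 0.60, 30                       # hypothetical sample correlation and sample size
z = np.arctanh(r)                     # Fisher's z = 0.5 * ln((1 + r) / (1 - r))
se = 1.0 / np.sqrt(n - 3)             # approximate standard error of z
z_crit = stats.norm.ppf(0.975)

lo, hi = np.tanh([z - z_crit * se, z + z_crit * se])   # back-transform to the r scale
print(f"95% CI for rho: ({lo:.3f}, {hi:.3f})")
```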

Assumptions for Pearson’s Correlation Coefficient
– Both variables must be continuous
– Both variables must be normally distributed
What if these assumptions do not hold? Answer: use a non-parametric approach (e.g., Spearman’s rank correlation).

Spearman’s Rank Correlation Coefficient

Spearman Rank-order Correlation Coefficient
Two professors each scored the same 12 students (see the table below). What is the correlation between the two sets of scores assigned by the two professors?

Table: student ID, the scores from Prof. A and Prof. B, the corresponding ranks xᵢ and yᵢ, the rank differences dᵢ = xᵢ − yᵢ, and dᵢ², with the column sum Σdᵢ² in the bottom row.
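Since the numerical scores in the table are not shown here, the following sketch uses hypothetical scores for 12 students to illustrate the same calculation, applying the rank-difference formula ρ = 1 − 6Σd²/[n(n² − 1)] and checking it against scipy.stats.spearmanr:

```python
import numpy as np
from scipy import stats

# Hypothetical scores (the original table values are not available)
prof_a = np.array([86, 74, 90, 68, 81, 77, 95, 63, 72, 88, 70, 79])
prof_b = np.array([82, 70, 93, 65, 80, 74, 91, 60, 75, 85, 68, 77])

rank_a = stats.rankdata(prof_a)
rank_b = stats.rankdata(prof_b)
d = rank_a - rank_b
n = len(d)

rho_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))   # valid when there are no ties
rho_scipy, p_value = stats.spearmanr(prof_a, prof_b)
print(rho_formula, rho_scipy, p_value)
```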

Simple Linear Regression

Simple Linear Regression
– ANOVA extended the t-test to comparisons of several group means
– Linear regression extends ANOVA to continuous predictor variables, e.g.:
  · Systolic blood pressure predicted by body mass index
  · Body mass index predicted by caloric intake
  · Caloric intake predicted by a measure of stress
– It is important to specify a biologically plausible model; implausible examples:
  · Systolic blood pressure predicted by eye color
  · Body mass index predicted by visual acuity (any association may actually reflect inactivity due to poor visual acuity)

Regression describes the relationship in the data with a line that predicts the average change in Y per unit change in X. The best-fitting line is found by minimizing the sum of squared residuals, as shown in this figure.

Assumptions for the Linear Regression Model
Assumptions are important in linear regression, but they are not absolute (LINE):
– Predictor variables are “fixed,” i.e., they have the same meaning for all individuals
– The predictor variable is measured “without error”
– For each value of the predictor variable there is a normal (N) distribution of outcomes (subpopulations), and the variances of these distributions are equal (E)

Assumptions (continued)
– The means of the outcome subpopulations lie on a straight line in the predictor, i.e., the predictor and the outcome are linearly related (L): μ(Y|x) = α + βx
– The outcomes are independent of each other (I)
Regression model: y = α + βx + ε
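A small simulation sketch (my addition; α, β, and σ are arbitrary hypothetical values) that generates data satisfying these LINE assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 400.0, 40.0, 200.0      # hypothetical intercept, slope, error SD

x = rng.uniform(50, 80, size=100)            # 'fixed' predictor values
epsilon = rng.normal(0, sigma, size=100)     # normal, equal-variance, independent errors
y = alpha + beta * x + epsilon               # subpopulation means lie on alpha + beta*x
```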

Graphic Presentation of Model Assumptions (figure)

Interpretation of the Regression Model
Two parameters:
– α is the intercept (the value of Y when the predictor is zero); a predictor value of zero may not be scientifically meaningful
– β is the “slope” of the regression line and represents the change in Y for a one-unit change in X; e.g., a slope of 0.58 indicates that a one-unit increase in X is associated with a 0.58-unit increase in Y
– ε is the error term for each individual; its estimate is that individual’s residual, the difference between the observed value and the value predicted by the fitted line

Approach to Developing a Regression Model
– Determine the outcome and plausible predictors
– Plot the outcome against each predictor to check for linearity
– Fit the regression model and review the parameter estimates and tests
– If the model fit is significant and the parameters differ significantly from 0, examine the residuals to further evaluate the fit
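A sketch of this workflow using statsmodels on hypothetical data (both the data values and the choice of statsmodels are my assumptions, not part of the lecture):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(50, 80, 60)                  # hypothetical maternal weight (kg)
y = 400 + 40 * x + rng.normal(0, 250, 60)    # hypothetical birth weight (g)

plt.scatter(x, y)                            # plot outcome vs. predictor to check linearity
plt.xlabel("maternal weight (kg)")
plt.ylabel("birth weight (g)")
plt.show()

X = sm.add_constant(x)                       # adds the intercept column
model = sm.OLS(y, X).fit()                   # fit the regression model
print(model.summary())                       # parameter estimates, t-tests, F-test, R-squared

residuals = model.resid                      # kept for residual diagnostics
```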

Maternal average body weight during pregnancy and infant birth weight
(Table: maternal weight, MW (kg), and infant birth weight, BW (g), for each study ID, with the mean, SD, and variance of each variable.)

Scatter plot of mother’s weight during pregnancy (kg) (X) versus infant’s birth weight (g) (Y)

Fitting a linear regression line (figure)


Regression Line
The regression line equation is ŷ = a + bx, where ŷ ≡ the predicted value of Y, a ≡ the intercept of the line, and b ≡ the slope of the line.
Equations to calculate a and b:
SLOPE: b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
INTERCEPT: a = ȳ − b·x̄

Regression Line
The slope b is the key statistic produced by the regression.
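A sketch computing a and b directly from the formulas above on hypothetical maternal-weight/birth-weight pairs, cross-checked with numpy.polyfit:

```python
import numpy as np

# Hypothetical data (the original table values are not available)
x = np.array([52., 55., 58., 60., 62., 65., 68., 70., 73., 75.])   # maternal weight (kg)
y = np.array([2700., 2850., 2900., 3050., 3100.,
              3200., 3350., 3300., 3500., 3600.])                   # birth weight (g)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)  # slope
a = y.mean() - b * x.mean()                                              # intercept

b_np, a_np = np.polyfit(x, y, deg=1)     # polyfit returns [slope, intercept] for deg=1
print(a, b, a_np, b_np)
```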


Coefficient of determination (R²), i.e., R² = SS(regression) / SS(total), the proportion of the variation in Y explained by the regression on X.

In this example, R² = SS(regression) / SS(total) = 0.27, and F = t² = 8.580.
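A sketch (hypothetical data) verifying both relationships: R² computed from the sums of squares matches the reported R², and in simple linear regression the model F statistic equals the square of the slope’s t statistic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(50, 80, 40)
y = 400 + 40 * x + rng.normal(0, 600, 40)

model = sm.OLS(y, sm.add_constant(x)).fit()

ss_total = np.sum((y - y.mean())**2)
ss_resid = np.sum(model.resid**2)
r2_manual = 1 - ss_resid / ss_total          # = SS(regression) / SS(total)

t_slope = model.tvalues[1]                   # t statistic for the slope
print(r2_manual, model.rsquared)             # the two R-squared values agree
print(t_slope**2, model.fvalue)              # F = t^2 in simple regression
```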

Testing Assumptions by Analyzing Residuals: (N) Normal Distribution
If the relationship is linear and the dependent variable is normally distributed at each value of the independent variable, the distribution of the residuals should be approximately normal. This can be assessed with a histogram of the standardized residuals.


Testing Assumptions by Analyzing Residuals: (E) Homoscedasticity
To check this assumption, plot the residuals against the predicted values and against the independent variable.


Residual Plots
With a little experience, you can get good at reading residual plots. Here’s an example of linearity with equal variance (figure).

Residual Plots
Example of linearity with unequal variance (figure).

Testing Assumptions by Analyzing Residuals: (L) Linearity
If the model fit the data exactly, a plot of the standardized predicted values against the observed values would form a straight line from the lower-left corner to the upper-right corner.
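A diagnostics sketch (hypothetical data and model, continuing the earlier sketches) that produces the three residual checks described in these slides: a histogram of standardized residuals (N), residuals against fitted values (E), and observed against predicted values (L).

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.uniform(50, 80, 80)
y = 400 + 40 * x + rng.normal(0, 300, 80)
model = sm.OLS(y, sm.add_constant(x)).fit()

# Simple approximation to standardized residuals (residual / residual SD)
std_resid = model.resid / np.std(model.resid, ddof=2)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].hist(std_resid, bins=15)                       # (N) roughly normal?
axes[0].set_title("Standardized residuals")
axes[1].scatter(model.fittedvalues, model.resid)       # (E) constant spread around 0?
axes[1].axhline(0, color="gray")
axes[1].set_title("Residuals vs. fitted")
axes[2].scatter(model.fittedvalues, y)                 # (L) close to a straight line?
axes[2].set_title("Observed vs. predicted")
plt.tight_layout()
plt.show()
```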


Example of Residual Plots
Non-linearity with equal variance (figure).