Correlation and regression


Introduction to Statistical Methods for Measuring “Omics” and Field Data: Correlation and regression

Overview
Correlation
Simple Linear Regression

Correlation

General Overview of Correlational Analysis
The purpose is to measure the strength of a linear relationship between two variables.
A correlation coefficient does not establish causation (i.e., that a change in X causes a change in Y).
X is typically the input, measured, or independent variable.
Y is typically the output, predicted, or dependent variable.
If X increases and there is a predictable shift in the values of Y, a correlation exists.

General Properties of Correlation Coefficients
Values can range between -1 and +1.
The value of the correlation coefficient represents the scatter of points on a scatterplot.
You should be able to look at a scatterplot and estimate what the correlation would be.
You should be able to look at a correlation coefficient and visualize the scatterplot.

Interpretation
Depends on what the purpose of the study is, but here is a “general guideline”:
Value = magnitude of the relationship
Sign = direction of the relationship

Correlation graph
[Figure: four scatterplots of Y against X, showing strong and weak relationships for positive correlation and for negative correlation.]

The Pearson Correlation Coefficient

Correlation Coefficient
The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

where n is the number of data pairs. The range of the correlation coefficient is -1 to 1.
If x and y have a strong positive linear correlation, r is close to 1.
If x and y have a strong negative linear correlation, r is close to -1.
If there is no linear correlation or a weak linear correlation, r is close to 0.

Calculating a Correlation Coefficient
In words (in symbols):
1. Find the sum of the x-values ($\sum x$).
2. Find the sum of the y-values ($\sum y$).
3. Multiply each x-value by its corresponding y-value and find the sum ($\sum xy$).
4. Square each x-value and find the sum ($\sum x^2$).
5. Square each y-value and find the sum ($\sum y^2$).
6. Use these five sums, together with the number of pairs n, to calculate the correlation coefficient r.

Correlation Coefficient
Example: Calculate the correlation coefficient r for the following data.

 x    y   xy   x²   y²
 1   -3   -3    1    9
 2   -1   -2    4    1
 3    0    0    9    0
 4    1    4   16    1
 5    2   10   25    4

The sums are $\sum x = 15$, $\sum y = -1$, $\sum xy = 9$, $\sum x^2 = 55$, and $\sum y^2 = 15$, so with $n = 5$:

$$r = \frac{5(9) - (15)(-1)}{\sqrt{5(55) - 15^2}\,\sqrt{5(15) - (-1)^2}} = \frac{60}{\sqrt{50}\,\sqrt{74}} \approx 0.986$$

There is a strong positive linear correlation between x and y.
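As a rough check of the hand calculation, the five sums can be computed in R and the result compared with the built-in cor function. This is only a sketch: it assumes the example's data pairs are x = 1, 2, 3, 4, 5 and y = -3, -1, 0, 1, 2, and the variable names are illustrative rather than taken from the slides.

```r
# Assumed example data (five ordered pairs)
x <- c(1, 2, 3, 4, 5)
y <- c(-3, -1, 0, 1, 2)
n <- length(x)

# The five sums used in the computational formula for r
sum_x  <- sum(x)
sum_y  <- sum(y)
sum_xy <- sum(x * y)
sum_x2 <- sum(x^2)
sum_y2 <- sum(y^2)

# Pearson correlation coefficient from the five sums
r_manual <- (n * sum_xy - sum_x * sum_y) /
  (sqrt(n * sum_x2 - sum_x^2) * sqrt(n * sum_y2 - sum_y^2))

# Base R's cor() uses Pearson's method by default
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree up to floating-point error, which is a quick way to catch an arithmetic slip in the table of sums.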

Significance Test for Correlation
Hypotheses:
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
Test statistic (with n - 2 degrees of freedom):

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
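The test can be sketched in R by computing the usual t statistic for a Pearson correlation, t = r·sqrt(n - 2) / sqrt(1 - r²), and a two-sided p-value by hand, then comparing them with cor.test. The data below are an assumption (the five pairs x = 1..5, y = -3, -1, 0, 1, 2), and the names t_stat and p_value are illustrative.

```r
# Assumed illustrative data
x <- c(1, 2, 3, 4, 5)
y <- c(-3, -1, 0, 1, 2)

n  <- length(x)
r  <- cor(x, y)      # sample correlation coefficient
df <- n - 2          # degrees of freedom for the t test

# Test statistic for H0: rho = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Built-in test for comparison
ct <- cor.test(x, y)

print(c(t_manual = t_stat, t_builtin = unname(ct$statistic)))
print(c(p_manual = p_value, p_builtin = ct$p.value))
```

If the p-value is below the chosen significance level (0.05 is a common choice), H0 is rejected and the correlation is considered significant.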

Linear Regression

Linear regression
Deals with the relationship between two variables, X and Y.
Y is the variable whose “behavior” we wish to study (e.g., fuel efficiency in a car).
X is the variable we believe would help explain the behavior of Y (e.g., the size of the car).

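Regression model
The simple linear regression model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

Components of the model
$y$ is the dependent variable, $x$ is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error term.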

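Regression Line
A regression line, also called a line of best fit, is the line for which the sum of the squares of the residuals is a minimum.

The Equation of a Regression Line
The equation of a regression line for an independent variable x and a dependent variable y is ŷ = mx + b, where ŷ is the predicted y-value for a given x-value. The slope m and y-intercept b are given by

$$m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}$$

where $\bar{x}$ and $\bar{y}$ are the means of the x-values and y-values.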

Regression Line
Example: Find the equation of the regression line.

 x    y
 1   -3
 2   -1
 3    0
 4    1
 5    2

With $n = 5$, $\sum x = 15$, $\sum y = -1$, $\sum xy = 9$, and $\sum x^2 = 55$:

$$m = \frac{5(9) - (15)(-1)}{5(55) - 15^2} = \frac{60}{50} = 1.2, \qquad b = \frac{-1}{5} - 1.2\left(\frac{15}{5}\right) = -3.8$$

The equation of the regression line is ŷ = 1.2x - 3.8.
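The least-squares slope and intercept can be sketched in R from the same sums and checked against lm. The data are an assumption (x = 1..5 with y = -3, -1, 0, 1, 2), and m, b, and fit are illustrative names.

```r
# Assumed illustrative data
x <- c(1, 2, 3, 4, 5)
y <- c(-3, -1, 0, 1, 2)
n <- length(x)

# Least-squares slope from the computational formula
m <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)

# Intercept: the fitted line passes through (mean(x), mean(y))
b <- mean(y) - m * mean(x)

# Same fit with lm(); coef() returns c(intercept, slope)
fit <- lm(y ~ x)

print(c(slope = m, intercept = b))
print(coef(fit))
```

The formula-based values and the lm coefficients should match, since lm minimizes the same sum of squared residuals.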

Regression Line
Example: The following data represent the number of hours 12 different students watched television during the weekend and the score each student received on a test the following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score for a student who watches 9 hours of TV.

 Hours, x   Test score, y    xy    x²     y²
     0           96            0     0   9216
     1           85           85     1   7225
     2           82          164     4   6724
     3           74          222     9   5476
     3           95          285     9   9025
     5           68          340    25   4624
     5           76          380    25   5776
     5           84          420    25   7056
     6           58          348    36   3364
     7           65          455    49   4225
     7           75          525    49   5625
    10           50          500   100   2500

Sums: $\sum x = 54$, $\sum y = 908$, $\sum xy = 3724$, $\sum x^2 = 332$, $\sum y^2 = 70836$.

Regression Line
Example continued:
[Figure: scatterplot of test score (y) against hours watching TV (x), with the fitted regression line ŷ = -4.07x + 93.97.]

Regression Line
Example continued: Using the equation ŷ = -4.07x + 93.97, we can predict the test score for a student who watches 9 hours of TV.
ŷ = -4.07x + 93.97 = -4.07(9) + 93.97 = 57.34
A student who watches 9 hours of TV over the weekend can expect to receive a score of about 57.34 on Monday’s test.
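The same prediction can be sketched in R with lm and predict. The data vectors are the hours and test scores listed on the RStudio slide of this presentation; the names hours, score, fit, and pred are illustrative.

```r
# Hours of TV watched and test scores for the 12 students
hours <- c(0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10)
score <- c(96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50)

# Fit the regression of test score on hours
fit <- lm(score ~ hours)
print(coef(fit))   # intercept and slope

# Predicted score for a student who watches 9 hours of TV
pred <- predict(fit, newdata = data.frame(hours = 9))
print(pred)
```

Because predict uses the unrounded coefficients, the result is about 57.36, slightly different from the 57.34 obtained with the coefficients rounded to two decimals. Note also that the column name in newdata has to match the predictor name used in the formula.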

Variation About a Regression Line
The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y: $\sum (y_i - \bar{y})^2$.
The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y: $\sum (\hat{y}_i - \bar{y})^2$.
The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and each corresponding predicted y-value: $\sum (y_i - \hat{y}_i)^2$.

Coefficient of Determination
The coefficient of determination R² is the ratio of the explained variation to the total variation. That is,

$$R^2 = \frac{\text{explained variation}}{\text{total variation}}$$

In simple linear regression, R² equals the square of the correlation coefficient r.
Example: The correlation coefficient for the data that represent the number of hours students watched television and the test score of each student is r ≈ -0.831. Find the coefficient of determination.

$$R^2 \approx (-0.831)^2 \approx 0.691$$

About 69.1% of the variation in the test scores can be explained by the variation in the hours of TV watched. About 30.9% of the variation is unexplained.
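The three kinds of variation and the coefficient of determination can be sketched in R as follows. The data are the hours and test scores from the RStudio slide of this presentation; the names total, explained, unexplained, and r2 are illustrative.

```r
# Hours of TV watched and test scores for the 12 students
hours <- c(0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10)
score <- c(96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50)

fit   <- lm(score ~ hours)
y_hat <- fitted(fit)      # predicted y-values
y_bar <- mean(score)      # mean of y

total       <- sum((score - y_bar)^2)   # total variation
explained   <- sum((y_hat - y_bar)^2)   # explained variation
unexplained <- sum((score - y_hat)^2)   # unexplained variation

# Coefficient of determination as explained / total
r2 <- explained / total

print(c(total = total, explained = explained, unexplained = unexplained))
print(c(r2 = r2, r_squared = cor(hours, score)^2))
```

For a least-squares line with an intercept, the explained and unexplained variation add up to the total variation, and the ratio agrees with the squared correlation coefficient.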

Regression hypothesis

RStudio
The function cor.test calculates the correlation r and its t statistic. The function lm fits the regression.
Example, using the hours (X) and test score (Y) data:

    X <- c(0,1,2,3,3,5,5,5,6,7,7,10)
    Y <- c(96,85,82,74,95,68,76,84,58,65,75,50)
    cor.test(X, Y)    # Pearson r, t statistic, and p-value
    G <- lm(Y ~ X)    # response (test score) goes on the left of ~
    summary(G)        # R is case-sensitive: summary, not Summary

RStudio

    > Count<-c(9,25,15,2,14,25,24,47)
    > Count
    [1]  9 25 15  2 14 25 24 47
    > Speed<-c(2,3,5,9,14,24,29,34)
    > G<-lm(Count~Speed)
    > summary(G)

    Call:
    lm(formula = Count ~ Speed)

    Residuals:
        Min      1Q  Median      3Q     Max
    -13.377  -5.801  -1.542   5.051  14.371

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   8.2546     5.8531   1.410   0.2081
    Speed         0.7914     0.3081   2.569   0.0424 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 10.16 on 6 degrees of freedom
    Multiple R-squared:  0.5238, Adjusted R-squared:  0.4444
    F-statistic: 6.599 on 1 and 6 DF,  p-value: 0.0424
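Individual values from the summary can also be pulled out programmatically rather than read off the printed output. The following is a small sketch using the Count and Speed data above; the names s, coefs, slope, slope_p, and r2 are illustrative.

```r
Count <- c(9, 25, 15, 2, 14, 25, 24, 47)
Speed <- c(2, 3, 5, 9, 14, 24, 29, 34)

G <- lm(Count ~ Speed)
s <- summary(G)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(s)

slope   <- coefs["Speed", "Estimate"]   # estimated slope
slope_p <- coefs["Speed", "Pr(>|t|)"]   # p-value for H0: slope = 0
r2      <- s$r.squared                  # Multiple R-squared

print(c(slope = slope, p_value = slope_p, r_squared = r2))
```

With a single predictor, the p-value for the slope matches the p-value of the overall F-statistic, and the multiple R-squared equals cor(Count, Speed)^2.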