Elementary Statistics, Larson/Farber
Chapter 9: Correlation and Regression
Bivariate vs. Univariate
*Univariate data: data that involve only one variable. For example: How many miles per gallon does your car get? {34.2, 15.4, 20.2, 30.5, 15.1, 9.2, 16.5}
*Bivariate data: data that involve two different variables measured on the same individuals, so each observation is an (x, y) pair (for example, the weight and the miles per gallon of each car).
Example #1: MPG vs. Weight
Example: Correlation
Example: "Line of Best Fit"
Section 9.1: Correlation
Correlation
A correlation is a relationship between two variables. What type of relationship exists between the two variables, and is the correlation significant?
x: Explanatory (Independent) Variable | y: Response (Dependent) Variable
Hours of Training | Number of Accidents
Shoe Size | Height
Cigarettes smoked per day | Lung Capacity
Score on SAT | Grade Point Average
Height | IQ
Scatter Plots and Types of Correlation
Negative correlation: as x increases, y decreases.
x = hours of training, y = number of accidents
[Scatter plot: Hours of Training (x-axis) vs. Accidents (y-axis)]
Positive correlation: as x increases, y increases.
x = SAT score, y = GPA
[Scatter plot: Math SAT (x-axis) vs. GPA (y-axis)]
No linear correlation
x = height, y = IQ
[Scatter plot: Height (x-axis) vs. IQ (y-axis)]
Correlation Coefficient
The correlation coefficient r is a measure of the strength and direction of a linear relationship between two variables. The range of r is from −1 to 1.
If r is close to 1, there is a strong positive linear correlation.
If r is close to −1, there is a strong negative linear correlation.
If r is close to 0, there is no linear correlation.
Correlation Examples
Application
x = number of absences, y = final grade
[Data table and scatter plot: Absences (x-axis) vs. Final Grade (y-axis)]
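Computation of r
For each pair, compute xy, x², and y². Sum the columns x, y, xy, x², and y², and substitute the sums into
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$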
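As a sketch of the column-sum computation of r, the Python below builds the sums of x, y, xy, x², and y² and applies the formula. The seven (absences, final grade) pairs are an assumption: the slide's data table did not survive extraction, and these values are used only because they reproduce the x̄ = 8.143 and slope m = −3.924 quoted for this example. The function name `correlation_coefficient` is hypothetical.

```python
import math

# ASSUMED data: hypothetical reconstruction of the slide's seven
# (absences, final grade) pairs; the original table was not preserved.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]


def correlation_coefficient(x, y):
    """Sample correlation r from the column sums of x, y, xy, x^2, and y^2."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient(absences, grades)
print(round(r, 3))  # -0.975
```

A value this close to −1 indicates a strong negative linear correlation; whether it is significant is a separate question answered by the hypothesis test.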
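Hypothesis Test for Significance
r is the correlation coefficient for the sample. The correlation coefficient for the population is ρ (rho).
For a two-tail test for significance:
$H_0: \rho = 0$ (the correlation is not significant)
$H_a: \rho \neq 0$ (the correlation is significant)
For a left-tail or right-tail test of negative or positive significance:
$H_0: \rho \geq 0$, $H_a: \rho < 0$ (left tail) or $H_0: \rho \leq 0$, $H_a: \rho > 0$ (right tail)
Standardized test statistic:
$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
When ρ = 0, this standardized test statistic follows a t-distribution with n − 2 degrees of freedom.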
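The standardized test statistic is simple arithmetic, so a minimal sketch helps check a hand calculation. The function name `t_statistic` is hypothetical; it assumes |r| < 1 and at least three pairs so that n − 2 ≥ 1.

```python
import math


def t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0 (t-distribution, n - 2 d.f.)."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 >= 1")
    if abs(r) >= 1:
        raise ValueError("|r| must be less than 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Absences vs. final grade: r = -0.975 with seven pairs (5 d.f.)
t = t_statistic(-0.975, 7)
print(round(t, 2))  # -9.81
```

The result agrees with the slide's t = −9.811 to rounding.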
Test of Significance
You found the correlation between the number of times absent and a final grade to be r = −0.975. There were seven pairs of data. Test the significance of this correlation. Use α = 0.01.
1. Write the null and alternative hypotheses.
$H_0: \rho = 0$ (the correlation is not significant)
$H_a: \rho \neq 0$ (the correlation is significant)
2. State the level of significance.
α = 0.01
3. Identify the sampling distribution.
A t-distribution with 5 degrees of freedom (n − 2 = 7 − 2 = 5)
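4. Find the critical value.
For a two-tail test with α = 0.01 and 5 d.f., the critical values are ±t₀ = ±4.032.
5. Find the rejection regions.
t < −4.032 and t > 4.032
6. Find the test statistic.
$t = \dfrac{r}{\sqrt{(1 - r^2)/(n - 2)}}$ with n = 7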
7. Make your decision.
t = −9.811 falls in the rejection region (t < −4.032), so reject the null hypothesis.
8. Interpret your decision.
At the 1% level of significance, there is a significant correlation between the number of times absent and final grades.
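Steps 4 through 7 reduce to comparing t with a critical value read from a t-table. A minimal sketch of that decision rule follows; the helper name `in_rejection_region` is hypothetical, and it assumes the caller supplies the positive table value (±4.032 for the two-tail test above). For a one-tail test, look up the table value for α in one tail rather than α/2.

```python
def in_rejection_region(t, t_critical, tail="two"):
    """Return True if t falls in the rejection region.

    tail="two":   reject if t < -t_critical or t > t_critical  (Ha: rho != 0)
    tail="left":  reject if t < -t_critical                     (Ha: rho < 0)
    tail="right": reject if t > t_critical                      (Ha: rho > 0)
    t_critical is the positive value from a t-table for the chosen alpha and d.f.
    """
    if tail == "two":
        return abs(t) > t_critical
    if tail == "left":
        return t < -t_critical
    if tail == "right":
        return t > t_critical
    raise ValueError("tail must be 'two', 'left', or 'right'")


# Absences example: two-tail, alpha = 0.01, 5 d.f. -> critical values +/-4.032
decision = in_rejection_region(-9.811, 4.032, tail="two")
print("Reject H0" if decision else "Fail to reject H0")  # Reject H0
```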
Section 9.2: Linear Regression
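The Line of Regression
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line; it is the line that minimizes the sum of the squared vertical distances between the data points and the line.
The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.
The line of regression is: $\hat{y} = mx + b$
The slope m is: $m = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
The y-intercept is: $b = \bar{y} - m\bar{x} = \dfrac{\sum y}{n} - m\dfrac{\sum x}{n}$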
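The slope and intercept formulas can be sketched in a few lines of Python. The data are the same assumption as for r: a hypothetical reconstruction of the seven (absences, final grade) pairs, used because it reproduces the slide's m = −3.924 and x̄ = 8.143. The function name `least_squares_line` is hypothetical.

```python
# ASSUMED data: hypothetical reconstruction of the slide's table.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]


def least_squares_line(x, y):
    """Return (m, b) for y-hat = m*x + b using the column-sum formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y-bar - m * x-bar
    return m, b


m, b = least_squares_line(absences, grades)
print(round(m, 3))  # -3.924
print(round(b, 2))  # 105.67 (from the assumed data)
```

Because b is defined as ȳ − m x̄, the fitted line always passes through the point (x̄, ȳ).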
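Calculate m and b, then write the equation of the line of regression with x = number of absences and y = final grade. Use the column sums from the table of x, y, xy, x², and y².
The line of regression is $\hat{y} = -3.924x + b$, where $b = \bar{y} - (-3.924)\bar{x}$.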
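The Line of Regression
m = −3.924 and $b = \bar{y} - m\bar{x}$
The line of regression is $\hat{y} = -3.924x + b$.
Note that the point $(\bar{x}, \bar{y})$, with $\bar{x} = 8.143$, is on the line, because $b = \bar{y} - m\bar{x}$.
[Scatter plot with regression line: Absences (x-axis) vs. Final Grade (y-axis)]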
Predicting y Values
The regression line can be used to predict values of y for values of x that fall within the range of the data. The regression equation for number of times absent and final grade is $\hat{y} = -3.924x + b$.
Use this equation to predict the expected grade for a student with (a) 3 absences and (b) 12 absences.
(a) $\hat{y} = -3.924(3) + b$
(b) $\hat{y} = -3.924(12) + b$
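A minimal prediction sketch that also enforces the "within the range of the data" rule. It uses the slide's slope m = −3.924 and computes b = ȳ − m x̄ from the assumed (hypothetical) reconstruction of the seven data pairs, so the intercept and the predictions depend on that assumption. The function name `predict_grade` is hypothetical.

```python
# ASSUMED data: hypothetical reconstruction of the slide's table.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

M = -3.924                               # slope from the slide
X_BAR = sum(absences) / len(absences)    # 8.143
Y_BAR = sum(grades) / len(grades)
B = Y_BAR - M * X_BAR                    # intercept: line passes through (x-bar, y-bar)


def predict_grade(x, x_min=min(absences), x_max=max(absences)):
    """Return y-hat = M*x + B, refusing x values outside the observed data range."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the data range [{x_min}, {x_max}]")
    return M * x + B


print(round(predict_grade(3), 2))   # (a) 3 absences
print(round(predict_grade(12), 2))  # (b) 12 absences
```

Raising an error for x outside [2, 15] is a design choice for the sketch; it makes extrapolation an explicit decision rather than a silent one.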
Example #6: Price and Age
A study was conducted to investigate the relationship between the resale price (in hundreds of dollars) and the age (in years) of midsize luxury American automobiles. The equation of the least-squares regression line was determined to be $\hat{y} = \ldots - \ldots x$.
(a) Find the resale value of the car when it is 3 years old.
(b) Find the resale value of the car when it is 6 years old.
(c) How old is the car when it is not worth anything (i.e., price = $0)?
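The coefficients of this equation were not preserved, so the sketch below takes them as parameters instead of guessing them. Both function names are hypothetical. Parts (a) and (b) substitute an age into ŷ; part (c) sets ŷ = 0 and solves for x, which is the line's x-intercept. Note that the slope is negative here (price falls with age), and part (c) may well extrapolate beyond the ages in the study.

```python
def predict_price(intercept, slope, age_years):
    """y-hat = intercept + slope * age; the result is in hundreds of dollars."""
    return intercept + slope * age_years


def age_when_worthless(intercept, slope):
    """Solve 0 = intercept + slope * x for x (the line's x-intercept)."""
    if slope == 0:
        raise ValueError("a horizontal line never reaches a price of 0")
    return -intercept / slope
```

To report a price in dollars, multiply the predicted value by 100, since price is measured in hundreds of dollars.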
Regression Equation
Regression equation relating Sales (y) to Advertising (x): $\hat{y} = \ldots$
What would you predict for Sales if I spend $2,000 on Advertising? Remember that both variables are in units of $1,000, so use x = 2. What about $6,000 (x = 6)?
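The main pitfall in this question is units: both variables are in thousands of dollars, so a dollar amount must be divided by 1,000 before it goes into the equation, and the prediction multiplied by 1,000 afterward. A minimal sketch, with a hypothetical function name and the regression coefficients passed in because the slide's values were not preserved:

```python
UNIT = 1_000  # both Sales and Advertising are measured in units of $1,000


def predict_sales(intercept, slope, advertising_dollars):
    """Predict Sales in dollars from an advertising spend in dollars.

    intercept and slope are the regression coefficients, stated in $1,000 units.
    """
    x = advertising_dollars / UNIT   # $2,000 -> x = 2
    y_hat = intercept + slope * x    # predicted Sales, in $1,000 units
    return y_hat * UNIT              # convert back to dollars
```

Forgetting the conversion (plugging in x = 2000 instead of x = 2) would also push the prediction far outside the range of the data.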
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
1. Determine the type of correlation between the variables.
A. Positive linear correlation
B. Negative linear correlation
C. No linear correlation
2. The equation of the regression line for temperature (x) and number of cups of coffee sold per hour (y) is $\hat{y} = \ldots$. Predict the number of cups of coffee sold per hour when the temperature is 48°.
A. …
B. …
C. 13.8
D. 50.5
Answers
1. (B) Negative linear correlation
2. (C) 13.8