1
Elementary Statistics: Correlation and Regression

2
Correlation
A correlation is a relationship between two variables, x and y. Here x is the explanatory (independent) variable and y is the response (dependent) variable. What type of relationship exists between the two variables, and is the correlation significant?
Example pairs:
x = hours of training, y = number of accidents
x = shoe size, y = height
x = cigarettes smoked per day, y = lung capacity
x = score on SAT, y = grade point average
x = height, y = IQ

3
Scatter Plots and Types of Correlation
Negative correlation: as x increases, y decreases.
x = hours of training, y = number of accidents
[Scatter plot: Accidents (0 to 60) versus Hours of Training (0 to 20)]

4
Scatter Plots and Types of Correlation
Positive correlation: as x increases, y increases.
x = SAT score, y = GPA
[Scatter plot: GPA (1.50 to 4.00) versus Math SAT (300 to 800)]

5
Scatter Plots and Types of Correlation
No linear correlation.
x = height, y = IQ
[Scatter plot: IQ (80 to 160) versus Height (60 to 80)]

6
Correlation Coefficient
The correlation coefficient r is a measure of the strength and direction of a linear relationship between two variables. The range of r is from -1 to 1.
If r is close to 1, there is a strong positive correlation.
If r is close to -1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.

7
Application
x = absences, y = final grade

x    y
8    78
2    92
5    90
12   58
15   43
9    74
6    81

[Scatter plot: Final Grade (40 to 95) versus Absences (0 to 16)]

8
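Computation of r

n     x    y     xy     x²    y²
1     8    78    624    64    6084
2     2    92    184    4     8464
3     5    90    450    25    8100
4     12   58    696    144   3364
5     15   43    645    225   1849
6     9    74    666    81    5476
7     6    81    486    36    6561
Sum   57   516   3751   579   39898

With n = 7 and the column sums above:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{7(3751) - (57)(516)}{\sqrt{7(579) - 57^2}\,\sqrt{7(39898) - 516^2}} \approx -0.975$$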
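The hand computation can be cross-checked with a few lines of Python. This is a minimal sketch, assuming the usual computational (sums) formula for the sample correlation coefficient; the function name `correlation` is illustrative and not part of the presentation.

```python
from math import sqrt

# Absences (x) and final grades (y) from the slide's table.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

def correlation(xs, ys):
    """Sample correlation coefficient r via the sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

r = correlation(x, y)
print(round(r, 3))  # agrees with the r = -0.975 used on later slides
```

The value is strongly negative, which matches the downward trend in the absences versus final grade scatter plot.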

9
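Hypothesis Test for Significance
r is the correlation coefficient for the sample. The correlation coefficient for the population is $\rho$ (rho).
The sampling distribution for r is a t-distribution with n - 2 d.f.
Standardized test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

For a two-tailed test for significance:
$H_0: \rho = 0$ (The correlation is not significant)
$H_a: \rho \neq 0$ (The correlation is significant)
For left-tailed and right-tailed tests, which test for negative or positive significance:
Left-tailed: $H_0: \rho \geq 0$, $H_a: \rho < 0$
Right-tailed: $H_0: \rho \leq 0$, $H_a: \rho > 0$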

10
Test of Significance
You found the correlation between the number of times absent and a final grade to be r = -0.975. There were seven pairs of data. Test the significance of this correlation. Use $\alpha = 0.01$.
1. Write the null and alternative hypotheses.
$H_0: \rho = 0$ (The correlation is not significant)
$H_a: \rho \neq 0$ (The correlation is significant)
2. State the level of significance.
$\alpha = 0.01$
3. Identify the sampling distribution.
A t-distribution with n - 2 = 5 degrees of freedom

11
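4. Find the critical value.
Critical values: $\pm t_0 = \pm 4.032$
5. Find the rejection region.
Rejection regions: t < -4.032 or t > 4.032
6. Find the test statistic.

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.975}{\sqrt{\dfrac{1 - (-0.975)^2}{5}}} \approx -9.811$$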

12
7. Make your decision.
t = -9.811 falls in the rejection region (t < -4.032). Reject the null hypothesis.
8. Interpret your decision.
There is a significant correlation between the number of times absent and final grades.
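Steps 4 through 7 can be sketched in Python. The block assumes the standardized test statistic t = r / sqrt((1 - r²) / (n - 2)), takes the critical value 4.032 straight from the slide instead of computing it, and uses illustrative variable names.

```python
from math import sqrt

r = -0.975       # sample correlation coefficient stated on the slide
n = 7            # number of data pairs
t_crit = 4.032   # two-tailed critical value from the slide (alpha = 0.01, 5 d.f.)

# Standardized test statistic with n - 2 degrees of freedom.
t = r / sqrt((1 - r ** 2) / (n - 2))

# Two-tailed test: reject H0 when t lies beyond either critical value.
reject_null = abs(t) > t_crit

print(f"t = {t:.2f}, reject H0: {reject_null}")
```

The statistic comes out near -9.81, far beyond -4.032, so the sketch reaches the same decision as the slide. The last digit can differ slightly from the slide's -9.811 depending on how intermediate values are rounded.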

13
The Line of Regression
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.
The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.
The line of regression is:

$$\hat{y} = mx + b$$

The slope m is:

$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

The y-intercept is:

$$b = \bar{y} - m\bar{x}$$

14
[Scatter plot with regression line: revenue (180 to 260) versus Ad $ (1.5 to 3.0). The legend marks a data point $(x_i, y_i)$, the point on the line with the same x-value, and the residual, which is the vertical distance between the two.]

15
Calculate m and b. Write the equation of the line of regression with x = number of absences and y = final grade.
From the table of sums: n = 7, $\sum x = 57$, $\sum y = 516$, $\sum xy = 3751$, $\sum x^2 = 579$, $\sum y^2 = 39898$.

$$m = \frac{7(3751) - (57)(516)}{7(579) - 57^2} = \frac{-3155}{804} \approx -3.924$$

$$b = \bar{y} - m\bar{x} = 73.714 - (-3.924)(8.143) \approx 105.667$$

The line of regression is: $\hat{y} = -3.924x + 105.667$

16
The Line of Regression
m = -3.924 and b = 105.667
The line of regression is: $\hat{y} = -3.924x + 105.667$
Note that the point $(\bar{x}, \bar{y})$ = (8.143, 73.714) is on the line.
[Scatter plot with regression line: Final Grade (40 to 95) versus Absences (0 to 16)]
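The slope and intercept can be recomputed from the raw data. This is a minimal sketch, assuming the standard least squares formulas m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and b = ȳ - m·x̄; the helper name `regression_line` is illustrative.

```python
# Absences (x) and final grades (y) from the slide's table.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

def regression_line(xs, ys):
    """Return (slope, intercept) of the least squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept

m, b = regression_line(x, y)
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)

print(f"m = {m:.3f}, b = {b:.3f}")
# The mean point lies on the line, so this difference is zero up to float error.
print(abs(m * mean_x + b - mean_y) < 1e-9)
```

Because the code carries full precision for m, the intercept can differ from the slide's 105.667 in the third decimal place. The slide appears to round m to -3.924 before computing b.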

17
Predicting y Values
The regression line can be used to predict values of y for values of x falling within the range of the data.
The regression equation for number of times absent and final grade is: $\hat{y} = -3.924x + 105.667$
Use this equation to predict the expected grade for a student with (a) 3 absences and (b) 12 absences.
(a) $\hat{y} = -3.924(3) + 105.667 = 93.895$
(b) $\hat{y} = -3.924(12) + 105.667 = 58.579$
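A small helper makes the "within the range of the data" restriction explicit. This is only a sketch: the function name `predict_grade` and the choice to raise an error outside the observed range (2 to 15 absences in the sample) are assumptions for illustration, not part of the presentation.

```python
# Coefficients of the regression line from the slide.
M, B = -3.924, 105.667

# Smallest and largest x-values in the sample data.
X_MIN, X_MAX = 2, 15

def predict_grade(absences):
    """Predict a final grade, refusing to extrapolate beyond the data."""
    if not X_MIN <= absences <= X_MAX:
        raise ValueError(f"x = {absences} is outside the data range [{X_MIN}, {X_MAX}]")
    return M * absences + B

print(round(predict_grade(3), 3))   # (a) 3 absences
print(round(predict_grade(12), 3))  # (b) 12 absences
```

Calling `predict_grade(20)` raises a `ValueError`, since the line was fitted only to students with between 2 and 15 absences.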

18
The Coefficient of Determination
The coefficient of determination, $r^2$, is the ratio of explained variation in y to the total variation in y.

$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$

The correlation coefficient of number of times absent and final grade is r = -0.975. The coefficient of determination is $r^2 = (-0.975)^2 = 0.9506$.
Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables such as intelligence, amount of time studied, etc.
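The ratio definition can be checked directly against the data. The sketch below assumes the usual decomposition: total variation Σ(y - ȳ)², explained variation Σ(ŷ - ȳ)², and unexplained variation Σ(y - ŷ)², with ŷ taken from the least squares line fitted inside the block.

```python
# Absences (x) and final grades (y) from the slide's table.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least squares slope and intercept (deviation form of the same formulas).
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
y_hat = [slope * xi + intercept for xi in x]

total = sum((yi - mean_y) ** 2 for yi in y)
explained = sum((pi - mean_y) ** 2 for pi in y_hat)
unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))

r_squared = explained / total
print(f"r^2 = {r_squared:.4f}")
```

At full precision the ratio is about 0.950, slightly below the slide's 0.9506, which squares the already rounded r = -0.975. The interpretation of about 95% explained variation is unchanged.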

19
The Standard Error of Estimate
The standard error of estimate, $s_e$, is the standard deviation of the observed $y_i$ values about the predicted value $\hat{y}_i$.

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

20
The Standard Error of Estimate
Calculate $\hat{y}$ for each x.

i     x    y    ŷ         (y - ŷ)²
1     8    78   74.275    13.8756
2     2    92   97.819    33.8608
3     5    90   86.047    15.6262
4     12   58   58.579    0.3352
5     15   43   46.807    14.4932
6     9    74   70.351    13.3152
7     6    81   82.123    1.2611
Sum                       92.767

$$s_e = \sqrt{\frac{92.767}{5}} \approx 4.307$$
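The table can be reproduced in Python. This sketch assumes s_e = sqrt(Σ(y - ŷ)² / (n - 2)) and uses the slide's rounded coefficients (m = -3.924, b = 105.667) so the intermediate values line up with the table.

```python
from math import sqrt

# Absences (x) and final grades (y) from the slide's table.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

# Rounded regression coefficients as given on the slides.
m, b = -3.924, 105.667

y_hat = [m * xi + b for xi in x]                          # predicted grades
sq_residuals = [(yi - pi) ** 2 for yi, pi in zip(y, y_hat)]

n = len(x)
sum_sq = sum(sq_residuals)
s_e = sqrt(sum_sq / (n - 2))                              # n - 2 degrees of freedom

for xi, yi, pi, ri in zip(x, y, y_hat, sq_residuals):
    print(f"{xi:>3} {yi:>3} {pi:8.3f} {ri:8.4f}")
print(f"sum = {sum_sq:.3f}, s_e = {s_e:.3f}")
```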

21
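Prediction Intervals
Given a specific linear regression equation and $x_0$, a specific value of x, a c-prediction interval for y is:

$$\hat{y} - E < y < \hat{y} + E$$

where

$$E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\,(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

Use a t-distribution with n - 2 degrees of freedom. The point estimate is $\hat{y}$ and E is the maximum error of estimate.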

22
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
1. Find the point estimate:
$\hat{y} = -3.924(6) + 105.667 = 82.123$
The point (6, 82.123) is the point on the regression line with x-coordinate of 6.

23
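Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
2. Find E. With $t_c = 2.015$ (90% confidence, 5 d.f.), $s_e = 4.307$, n = 7, $\bar{x} = 8.143$, and $n\sum x^2 - \left(\sum x\right)^2 = 804$:

$$E = 2.015\,(4.307) \sqrt{1 + \frac{1}{7} + \frac{7\,(6 - 8.143)^2}{804}} \approx 9.438$$

At the 90% level of confidence, the maximum error of estimate is 9.438.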

24
Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
3. Find the endpoints.
$\hat{y} - E = 82.123 - 9.438 = 72.685$
$\hat{y} + E = 82.123 + 9.438 = 91.561$
72.685 < y < 91.561
When x = 6, the 90% prediction interval is from 72.685 to 91.561.
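The three steps can be combined into one function. This is a hedged sketch: it assumes the common textbook form E = t_c · s_e · sqrt(1 + 1/n + n(x₀ - x̄)² / (nΣx² - (Σx)²)), and it hard-codes t_c = 2.015, the standard t-table value for 90% confidence with 5 degrees of freedom, which is an assumption because the value is not shown in the transcript. The function name `prediction_interval` is illustrative.

```python
from math import sqrt

# Values from the slides.
n = 7
sum_x, sum_x2 = 57, 579      # sums from the computation table
m, b = -3.924, 105.667       # regression coefficients
s_e = 4.307                  # standard error of estimate

# Assumption: critical value from a standard t-table (90% confidence, 5 d.f.).
t_c = 2.015

def prediction_interval(x0):
    """Return (lower, upper, E) for the predicted y at x = x0."""
    y_hat = m * x0 + b
    mean_x = sum_x / n
    spread = n * (x0 - mean_x) ** 2 / (n * sum_x2 - sum_x ** 2)
    E = t_c * s_e * sqrt(1 + 1 / n + spread)
    return y_hat - E, y_hat + E, E

lower, upper, E = prediction_interval(6)
print(f"E = {E:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Digits beyond the second decimal place can differ slightly from the slide's 9.438, depending on rounding of intermediate values.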
