# Elementary Statistics Correlation and Regression

## Presentation transcript


### Correlation

A correlation is a relationship between two variables. What type of relationship exists between the two variables, and is the correlation significant?

| Explanatory (independent) variable, x | Response (dependent) variable, y |
|---|---|
| Hours of training | Number of accidents |
| Shoe size | Height |
| Cigarettes smoked per day | Lung capacity |
| Score on SAT | Grade point average |
| Height | IQ |

### Scatter Plots and Types of Correlation

Negative correlation: as x increases, y decreases. Here x = hours of training and y = number of accidents.

[Figure: scatter plot of accidents (0 to 60) versus hours of training (0 to 20).]

Positive correlation: as x increases, y increases. Here x = SAT score and y = GPA.

[Figure: scatter plot of GPA (1.50 to 4.00) versus math SAT score (300 to 800).]

No linear correlation: here x = height and y = IQ.

[Figure: scatter plot of IQ (80 to 160) versus height (60 to 80).]

### Correlation Coefficient

The correlation coefficient r is a measure of the strength and direction of a linear relationship between two variables. The range of r is from −1 to 1.

- If r is close to 1, there is a strong positive correlation.
- If r is close to −1, there is a strong negative correlation.
- If r is close to 0, there is no linear correlation.

### Application

| Absences, x | Final grade, y |
|---|---|
| 8 | 78 |
| 2 | 92 |
| 5 | 90 |
| 12 | 58 |
| 15 | 43 |
| 9 | 74 |
| 6 | 81 |

[Figure: scatter plot of final grade (40 to 95) versus absences (0 to 16).]

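### Computation of r

| n | x | y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 8 | 78 | 624 | 64 | 6084 |
| 2 | 2 | 92 | 184 | 4 | 8464 |
| 3 | 5 | 90 | 450 | 25 | 8100 |
| 4 | 12 | 58 | 696 | 144 | 3364 |
| 5 | 15 | 43 | 645 | 225 | 1849 |
| 6 | 9 | 74 | 666 | 81 | 5476 |
| 7 | 6 | 81 | 486 | 36 | 6561 |
| Σ | 57 | 516 | 3751 | 579 | 39898 |

With n = 7:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{7(3751) - (57)(516)}{\sqrt{7(579) - 57^2}\,\sqrt{7(39898) - 516^2}} = \frac{-3155}{\sqrt{804}\,\sqrt{13030}} \approx -0.975$$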
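As a check on the column sums, r can be recomputed directly from the seven (absences, final grade) pairs. This is a minimal Python sketch; the function name `pearson_r` is illustrative, and it implements the computational (sums) formula for r written out in its docstring rather than any particular library routine.

```python
import math

# Absences (x) and final grades (y) from the application slide.
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational formula
    r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx**2) * sqrt(n*Syy - Sy**2)),
    where Sx = sum of x, Sxy = sum of x*y, Sxx = sum of x**2, and so on."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

r = pearson_r(absences, grades)
print(f"r = {r:.3f}")  # r = -0.975
```

The unrounded value is about −0.9748, which the slides report as −0.975.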

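### Hypothesis Test for Significance

r is the correlation coefficient for the sample; the correlation coefficient for the population is ρ (rho). To test whether ρ differs from 0, use the standardized test statistic

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}},$$

which follows a t-distribution with n − 2 degrees of freedom when ρ = 0.

For a two-tailed test for significance:

- H₀: ρ = 0 (the correlation is not significant)
- Hₐ: ρ ≠ 0 (the correlation is significant)

For a left-tailed or right-tailed test of negative or positive significance:

- Left tail: H₀: ρ ≥ 0 and Hₐ: ρ < 0
- Right tail: H₀: ρ ≤ 0 and Hₐ: ρ > 0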

### Test of Significance

You found the correlation between the number of times absent and final grade to be r = −0.975, from seven pairs of data. Test the significance of this correlation, using α = 0.01.

1. Write the null and alternative hypotheses: H₀: ρ = 0 (the correlation is not significant); Hₐ: ρ ≠ 0 (the correlation is significant).
2. State the level of significance: α = 0.01.
3. Identify the sampling distribution: a t-distribution with 5 degrees of freedom.

4. Find the critical values: ±t₀ = ±4.032.
5. Find the rejection regions: t < −4.032 and t > 4.032.
6. Find the test statistic: $t = \dfrac{-0.975}{\sqrt{\left(1 - (-0.975)^2\right)/(7 - 2)}} \approx -9.811$.

7. Make your decision: t = −9.811 falls in the rejection region (t < −4.032), so reject the null hypothesis.
8. Interpret your decision: there is a significant correlation between the number of times absent and final grades.
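The eight steps above reduce to one formula and one comparison. This minimal Python sketch computes the standardized test statistic for r and applies the two-tailed decision rule; the function name `correlation_t_statistic` is illustrative, and the critical value 4.032 is hard-coded from a t-table (α = 0.01, 5 d.f.) to avoid a statistics dependency. If SciPy is available, `scipy.stats.t.ppf(0.995, 5)` gives the same critical value.

```python
import math

def correlation_t_statistic(r, n):
    """Standardized test statistic t = r / sqrt((1 - r**2) / (n - 2)),
    compared against a t-distribution with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = -0.975, 7
t_critical = 4.032  # two-tailed critical value, alpha = 0.01, 5 d.f. (from a t-table)

t = correlation_t_statistic(r, n)
reject_h0 = abs(t) > t_critical  # rejection regions: t < -t0 or t > t0
print(f"t = {t:.2f}; reject H0: {reject_h0}")  # t = -9.81; reject H0: True
```

Unrounded, t is about −9.8115; the slides report it as −9.811. Either way it lies far inside the rejection region.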

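### The Line of Regression

Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression, or least-squares line. The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept. The line of regression is

$$\hat{y} = mx + b,$$

where the slope m is

$$m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$

and the y-intercept is

$$b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}.$$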

[Figure: revenue (180 to 260) versus advertising dollars (1.5 to 3.0) with the regression line. Legend: (xᵢ, yᵢ) = a data point; (xᵢ, ŷᵢ) = a point on the line with the same x-value; yᵢ − ŷᵢ = a residual.]

Calculate m and b, and write the equation of the line of regression with x = number of absences and y = final grade. Using the column sums from the computation of r (n = 7, Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579):

$$m = \frac{7(3751) - (57)(516)}{7(579) - 57^2} = \frac{-3155}{804} \approx -3.924$$

$$b = \frac{516}{7} - (-3.924)\,\frac{57}{7} \approx 73.714 + 31.953 = 105.667$$

The line of regression is ŷ = −3.924x + 105.667.

With m = −3.924 and b = 105.667, the line of regression is ŷ = −3.924x + 105.667. Note that the point (x̄, ȳ) = (8.143, 73.714) is on the line.

[Figure: scatter plot of final grade (40 to 95) versus absences (0 to 16) with the line of regression.]
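The slope and intercept can be computed from the same sums used for r. This is a minimal Python sketch of the least-squares formulas m = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) and b = ȳ − m·x̄; the function name `least_squares_line` is illustrative. It also checks that the fitted line passes through (x̄, ȳ).

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b, using
    m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2) and b = y-bar - m*x-bar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denominator = n * sxx - sx ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (n * sxy - sx * sy) / denominator
    b = sy / n - m * sx / n
    return m, b

absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
m, b = least_squares_line(absences, grades)
x_bar = sum(absences) / len(absences)
y_bar = sum(grades) / len(grades)

print(f"m = {m:.3f}, b = {b:.3f}")  # m = -3.924, b = 105.668
# The least-squares line always passes through (x-bar, y-bar).
print(f"line at x-bar: {m * x_bar + b:.3f}, y-bar: {y_bar:.3f}")
```

Carrying the unrounded slope gives b ≈ 105.668; the slides get 105.667 because they round the slope to −3.924 before computing b.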

### Predicting y Values

The regression line can be used to predict values of y for values of x falling within the range of the data. The regression equation for number of times absent and final grade is ŷ = −3.924x + 105.667. Use this equation to predict the expected grade for a student with (a) 3 absences and (b) 12 absences.

(a) ŷ = −3.924(3) + 105.667 = 93.895

(b) ŷ = −3.924(12) + 105.667 = 58.579
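The rule that predictions should stay within the range of the data can be built into the prediction itself. In this minimal Python sketch, `predict_grade` and the bounds `X_MIN`/`X_MAX` are hypothetical names; the bounds are the smallest and largest absences in the data (2 and 15), and the coefficients are the slides' rounded ones.

```python
# Regression equation from the slides: y-hat = -3.924x + 105.667
SLOPE, INTERCEPT = -3.924, 105.667
X_MIN, X_MAX = 2, 15  # smallest and largest number of absences in the data

def predict_grade(absences):
    """Predict a final grade, refusing to extrapolate outside the observed x range."""
    if not X_MIN <= absences <= X_MAX:
        raise ValueError(f"x = {absences} is outside the data range [{X_MIN}, {X_MAX}]")
    return SLOPE * absences + INTERCEPT

print(round(predict_grade(3), 3))   # 93.895
print(round(predict_grade(12), 3))  # 58.579
```

A call such as `predict_grade(20)` raises `ValueError`, because 20 absences lies outside the observed data.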

### The Coefficient of Determination

The coefficient of determination, r², is the ratio of the explained variation in y to the total variation in y:

$$r^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}.$$

The correlation coefficient of number of times absent and final grade is r = −0.975, so the coefficient of determination is r² = (−0.975)² ≈ 0.9506.

Interpretation: about 95% of the variation in final grades can be explained by the linear relationship with the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or to other variables, such as intelligence or amount of time studied.
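To see that r² really is "explained variation over total variation", the ratio can be computed directly from the fitted values. This minimal Python sketch uses the deviation form of the least-squares slope, m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², which is algebraically equivalent to the sums formula on the regression slide; all variable names are illustrative.

```python
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
n = len(absences)
x_bar, y_bar = sum(absences) / n, sum(grades) / n

# Unrounded least-squares slope and intercept (deviation form).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(absences, grades))
sxx = sum((x - x_bar) ** 2 for x in absences)
m = sxy / sxx
b = y_bar - m * x_bar

predicted = [m * x + b for x in absences]
explained = sum((yh - y_bar) ** 2 for yh in predicted)  # explained variation
total = sum((y - y_bar) ** 2 for y in grades)            # total variation
r_squared = explained / total
print(f"r^2 = {r_squared:.4f}")  # r^2 = 0.9502
```

With unrounded values r² ≈ 0.9502; the slides' 0.9506 comes from squaring the rounded r = −0.975. Both round to about 95%.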

### The Standard Error of Estimate

The standard error of estimate, sₑ, is the standard deviation of the observed yᵢ values about the predicted ŷᵢ values:

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

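Calculate ŷ for each x using ŷ = −3.924x + 105.667:

| n | x | y | ŷ | (y − ŷ)² |
|---|---|---|---|---|
| 1 | 8 | 78 | 74.275 | 13.8756 |
| 2 | 2 | 92 | 97.819 | 33.8608 |
| 3 | 5 | 90 | 86.047 | 15.6262 |
| 4 | 12 | 58 | 58.579 | 0.3352 |
| 5 | 15 | 43 | 46.807 | 14.4932 |
| 6 | 9 | 74 | 70.351 | 13.3152 |
| 7 | 6 | 81 | 82.123 | 1.2611 |
| Σ | | | | 92.767 |

$$s_e = \sqrt{\frac{92.767}{7 - 2}} \approx 4.307$$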
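The table above can be reproduced in a few lines. This minimal Python sketch computes sₑ = √(Σ(yᵢ − ŷᵢ)² / (n − 2)) using the slides' rounded line ŷ = −3.924x + 105.667; the function name `standard_error_of_estimate` is illustrative.

```python
import math

absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]
SLOPE, INTERCEPT = -3.924, 105.667  # rounded regression line from the slides

def standard_error_of_estimate(xs, ys, m, b):
    """s_e = sqrt(sum((y_i - y-hat_i)**2) / (n - 2)) for the line y-hat = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 values")
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

s_e = standard_error_of_estimate(absences, grades, SLOPE, INTERCEPT)
print(f"s_e = {s_e:.3f}")  # s_e = 4.307
```

The n − 2 in the denominator matches the n − 2 degrees of freedom used elsewhere: two parameters (m and b) are estimated from the data.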

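### Prediction Intervals

Given a specific linear regression equation and x₀, a specific value of x, a c-prediction interval for y is

$$\hat{y} - E < y < \hat{y} + E,$$

where

$$E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}.$$

Use a t-distribution with n − 2 degrees of freedom. The point estimate is ŷ, and E is the maximum error of estimate.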

### Application

Construct a 90% prediction interval for the final grade of a student who has been absent 6 times.

1. Find the point estimate: ŷ = −3.924(6) + 105.667 = 82.123. The point (6, 82.123) is the point on the regression line with x-coordinate 6.

2. Find E: $E = (2.015)(4.307)\sqrt{1 + \dfrac{1}{7} + \dfrac{7(6 - 8.143)^2}{7(579) - 57^2}} \approx 9.438$. At the 90% level of confidence, the maximum error of estimate is 9.438.

3. Find the endpoints: ŷ − E = 82.123 − 9.438 = 72.685 and ŷ + E = 82.123 + 9.438 = 91.561, so 72.685 < y < 91.561.

When x = 6, the 90% prediction interval for the final grade is from 72.685 to 91.561.
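The three steps of the application can be combined into one function. This is a minimal Python sketch, assuming the slides' rounded line (ŷ = −3.924x + 105.667) and standard error (sₑ = 4.307); the critical value t_c = 2.015 (c = 0.90, 5 d.f.) is hard-coded from a t-table, and the function name `prediction_interval` is illustrative.

```python
import math

absences = [8, 2, 5, 12, 15, 9, 6]
SLOPE, INTERCEPT = -3.924, 105.667  # regression line from the slides
S_E = 4.307                          # standard error of estimate from the slides
T_C = 2.015                          # t critical value, c = 0.90, n - 2 = 5 d.f. (t-table)

def prediction_interval(x0, xs, m, b, s_e, t_c):
    """Return (lower, upper) for y-hat - E < y < y-hat + E, where
    E = t_c * s_e * sqrt(1 + 1/n + n*(x0 - x-bar)**2 / (n*Sxx - Sx**2))."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    x_bar = sx / n
    y_hat = m * x0 + b  # point estimate
    e = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sxx - sx ** 2))
    return y_hat - e, y_hat + e

low, high = prediction_interval(6, absences, SLOPE, INTERCEPT, S_E, T_C)
print(f"{low:.2f} < y < {high:.2f}")  # 72.68 < y < 91.56
```

Keeping x̄ unrounded gives E ≈ 9.4387, so the endpoints differ from the slides' 72.685 and 91.561 only in the third decimal place.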
