Published by Gladys Clark. Modified over 8 years ago.
1
Correlation and Regression
2
OUTLINE: Introduction. 10-1 Scatter plots. 10-2 Correlation. 10-3 Correlation Coefficient. 10-4 Regression.
3
Correlation and Regression: Correlation and regression are inferential statistics that involve determining whether a relationship exists between two or more numerical or quantitative variables. Examples: Is the number of hours a student studies related to the student’s score on a particular exam? Is caffeine related to heart damage? Is there a relationship between a person’s age and his or her blood pressure?
4
INTRODUCTION
Correlation is a statistical method used to determine whether a relationship between variables exists.
Regression is a statistical method used to describe the nature of the relationship between variables (positive or negative, linear or nonlinear).
5
STATISTICAL QUESTIONS
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
6
There are two types of relationships: simple and multiple. In a simple relationship, there are two types of variables: an independent variable (explanatory or predictor variable) and a dependent variable (outcome or response variable). In a multiple relationship, there are two or more independent variables that are used to predict one dependent variable.
7
Simple relationship (independent + dependent)
Positive relationship: both variables increase or decrease at the same time. Example: a person’s height and ideal weight.
Negative relationship: as one variable increases, the other variable decreases, and vice versa. Example: age and strength among people over 60 years of age.
8
Example: Is there a relationship between a person’s age and his or her blood pressure?
The type of relationship:
The independent variable(s):
The dependent variable:
9
Example: Is there a relationship between a student’s final score in math and factors such as the number of hours the student studies, the number of absences, and the IQ score?
The type of relationship:
The independent variable(s):
The dependent variable:
10
Scatter Plots
11
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
[Figure: three scatter plots showing a positive linear relationship, a negative linear relationship, and no linear relationship.]
12
Example 10-2: Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class.

Student   Number of absences   Final grade
A         6                    82
B         2                    86
C         15                   43
D         9                    74
E         12                   58
F         5                    90
G         8                    78
14
From the following scatter plot, the relationship can be described as: a) Positive b) Strong positive c) Strong negative d) No relationship
15
Correlation
16
The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables. The range of the correlation coefficient is from -1 to +1. The symbol for the sample correlation coefficient is r; the symbol for the population correlation coefficient is ρ.
17
If there is a strong positive linear relationship between the variables, the value of r will be close to +1. If there is a strong negative linear relationship between the variables, the value of r will be close to -1.
18
Meaning                                   Correlation coefficient value
Complete positive linear relationship     +1
Strong positive linear relationship       0.70 to 0.99
Moderate positive linear relationship     0.50 to 0.69
Weak positive linear relationship         0.01 to 0.49
No linear relationship                    0
Weak negative linear relationship         -0.01 to -0.49
Moderate negative linear relationship     -0.50 to -0.69
Strong negative linear relationship       -0.70 to -0.99
Complete negative linear relationship     -1
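The table above can be read as a simple lookup. The following is a minimal sketch, not part of the original slides; the function name `describe_correlation` is hypothetical, and it assumes that any nonzero value below 0.50 in magnitude (including values smaller than 0.01, which the table does not list) counts as weak.

```python
def describe_correlation(r: float) -> str:
    """Map a correlation coefficient to the verbal label used in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "complete"
    elif size >= 0.70:
        strength = "strong"
    elif size >= 0.50:
        strength = "moderate"
    else:
        # Assumption: every nonzero magnitude below 0.50 is treated as weak,
        # including values between 0 and 0.01 that the table leaves out.
        strength = "weak"
    return f"{strength} {direction} linear relationship"


print(describe_correlation(-0.096))  # weak negative linear relationship
print(describe_correlation(0.982))   # strong positive linear relationship
```

Values outside the range from -1 to +1, such as 1.5, raise an error because they cannot be correlation coefficients.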
20
Which of the following values does not represent a correlation coefficient: a) r= 1.5 b) r= 0.90 c) r= 0 d) r= -1
21
If the correlation coefficient between two variables equals -0.096, this means the relationship between the two variables is: a) Weak positive b) Weak negative c) Strong positive d) Strong negative
22
Correlation Coefficient
Pearson (Ch. 10): denoted by r; used only when both variables are quantitative.
Spearman rank (Ch. 13): denoted by r_s; used when the two variables are quantitative or qualitative.
23
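The Pearson linear correlation coefficient is

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

where n is the number of data pairs (sample size).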
24
Example 10-5: Compute the correlation coefficient for the data in Example 10–2.

Student   Number of absences x   Final grade y   xy   x²   y²
A         6                      82
B         2                      86
C         15                     43
D         9                      74
E         12                     58
F         5                      90
G         8                      78
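As a sketch of how this example can be worked, the code below fills in the xy, x², and y² sums and applies the computational form of the Pearson formula, r = [nΣxy − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]). The function name `pearson_r` is hypothetical and chosen for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient from paired data (computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from Example 10-2: absences (x) and final grades (y) of seven students.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

r = pearson_r(absences, grades)
print(round(r, 3))  # -0.944
```

The result is close to -1, which indicates a strong negative linear relationship between absences and final grade.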
25
Example 10-4: Compute the correlation coefficient for the data in Example 10–1.

Company   Cars x (in ten thousands)   Income y (in billions)   xy       x²        y²
A         63.0                        7.0                      441.00   3969.00   49.00
B         29.0                        3.9                      113.10   841.00    15.21
C         20.8                        2.1                      43.68    432.64    4.41
D         19.1                        2.8                      53.48    364.81    7.84
E         13.4                        1.4                      18.76    179.56    1.96
F         8.5                         1.5                      12.75    72.25     2.25

Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67
26
Solution:
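A sketch of the substitution, using the sums from Example 10-4 with n = 6 and assuming the computational form of the Pearson formula:

$$r = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{\left[6(5859.26) - (153.8)^2\right]\left[6(80.67) - (18.7)^2\right]}} = \frac{1220.56}{\sqrt{(11501.12)(134.33)}} \approx 0.982$$

The value is close to +1, so there is a strong positive linear relationship between the number of cars and income.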
27
Rank Correlation Coefficient
28
Other types of correlation coefficients: the Spearman rank correlation coefficient can be used when the data are ranked. It is computed as

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d = difference in ranks and n = number of data pairs.
29
Example: To study the correlation between students' grades in statistics and in mathematics, 5 students were chosen, and their grades were as follows:

Stat (X)   B   D   C   A   F
Math (Y)   A   F   B   C   D
30
Example 13-7 (p. 698): Two students were asked to rate eight different textbooks for a specific course on an ascending scale from 0 to 20 points. Compute the correlation coefficient for the data:

Textbook   Student 1   Student 2   Rank(X1)   Rank(X2)   d = X1 - X2   d²
A          4           4
B          10          6
C          18          20
D          20          14
E          12          16
F          2           8
G          5           11
H          9           7
Total
31
Example 13-7 (p. 698):

Textbook   Student 1   Student 2   Rank(X1)   Rank(X2)   d = X1 - X2   d²
A          4           4           7          8          -1            1
B          10          6           4          7          -3            9
C          18          20          2          1          1             1
D          20          14          1          3          -2            4
E          12          16          3          2          1             1
F          2           8           8          5          3             9
G          5           11          6          4          2             4
H          9           7           5          6          -1            1
Total                                                                  30
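The ranking and the final coefficient for this example can be checked with a short sketch. It assumes the Spearman formula r_s = 1 − 6Σd² / (n(n² − 1)), ranks the highest rating as 1 (as in the table), and assumes no tied ratings. The helper names `ranks_descending` and `spearman_rs` are hypothetical.

```python
def ranks_descending(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ratings of textbooks A to H by the two students.
student_1 = [4, 10, 18, 20, 12, 2, 5, 9]
student_2 = [4, 6, 20, 14, 16, 8, 11, 7]

print(ranks_descending(student_1))  # [7, 4, 2, 1, 3, 8, 6, 5]
print(ranks_descending(student_2))  # [8, 7, 1, 3, 2, 5, 4, 6]
print(round(spearman_rs(student_1, student_2), 3))  # 0.643
```

With Σd² = 30 and n = 8, the coefficient is 1 − 180/504 ≈ 0.643.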
33
If the differences between the ranks of two variables are (-1, 0, 0, -1, 4, -2), find the value of the Spearman correlation coefficient. a) r_s = 1.5 b) r_s = 0.90 c) r_s = 0.371
34
Regression
35
If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data’s line of best fit.
36
Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
38
Example 10-9: Find the equation of the regression line for the data in Example 10–4, and graph the line on the scatter plot. Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6
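A minimal sketch of the computation, assuming the usual least-squares formulas for the line y' = a + bx: a = [(Σy)(Σx²) − (Σx)(Σxy)] / [nΣx² − (Σx)²] and b = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²]. The function name `regression_from_sums` is hypothetical.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b


# Sums given in Example 10-9 (car rental data, n = 6).
a, b = regression_from_sums(6, 153.8, 18.7, 682.77, 5859.26)
print(round(a, 3), round(b, 3))  # 0.396 0.106
```

Rounded to three decimals, this gives the line y' = 0.396 + 0.106x.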
40
If x is age and y is the number of sick days, and the equation of the regression line is y' = 12.5 - 0.615x, answer the following questions:
1- Predict the number of sick days if age is 40 years. y' = 12.5 - 0.615(40) = -12.1
2- For each one-year increase in age, the number of sick days: a) Decreases by 0.615 on average. b) Increases by 0.615 on average.
3- The correlation coefficient (relationship) between the two variables is: a) Positive b) Negative c) -0.615 d) 12.5
4- What is the slope of the regression line? a) 12.5 b) 0.615 c) -0.615
41
Predictions are made daily in all areas. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others, depending on the strength of the relationship. That is, the stronger the relationship is between variables, the more accurate the prediction is.
42
Find two points to sketch the graph of the regression line. Use any x values between 10 and 60. For example, let x equal 15 and 40. Substitute in the equation and find the corresponding y value. Plot (15,1.986) and (40,4.636), and sketch the resulting line.
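The two plotted points can be reproduced with a short sketch. It assumes the fitted line for the car rental data is y' = 0.396 + 0.106x, with coefficients rounded to three decimals; that equation is inferred from the two points quoted above rather than stated on this slide, and the names `A`, `B`, and `predict` are hypothetical.

```python
# Assumed fitted line for the car rental data: y' = 0.396 + 0.106x
A = 0.396  # intercept (assumption, rounded)
B = 0.106  # slope (assumption, rounded)


def predict(x):
    """Predicted y value on the assumed regression line."""
    return A + B * x


# Any two x values between 10 and 60 will do; the slide uses 15 and 40.
points = [(x, round(predict(x), 3)) for x in (15, 40)]
print(points)  # [(15, 1.986), (40, 4.636)]
```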
44
For example: for Absences (x) and Final Grade (y), b = -3.622, which means that for each increase of 1 absence (x), the value of y changes by -3.622 units (the final grade decreases by 3.622 points) on average.
45
*REMARK: The sign of the correlation coefficient and the sign of the slope of the regression line will always be the same: r (positive) ↔ b (positive); r (negative) ↔ b (negative). Car Rental Companies: r = 0.982, b = 0.106. Absences and Final Grade: r = -0.944, b = -3.622. The regression line will always pass through the point (x̄, ȳ).
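Both statements in this remark can be checked numerically on the absences and final grade data from Example 10-2. This is a sketch only, assuming the usual least-squares formulas for the intercept a and slope b and the computational form of the Pearson formula for r.

```python
from math import sqrt

# Data from Example 10-2: absences (x) and final grades (y).
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

n = len(absences)
sum_x, sum_y = sum(absences), sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)

denom = n * sum_x2 - sum_x ** 2
b = (n * sum_xy - sum_x * sum_y) / denom       # slope
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
r = (n * sum_xy - sum_x * sum_y) / sqrt(denom * (n * sum_y2 - sum_y ** 2))

# r and b share the numerator n*sum_xy - sum_x*sum_y, so they share a sign.
print(round(r, 3), round(b, 3))  # -0.944 -3.622

# The fitted line evaluated at the mean of x returns the mean of y.
x_bar, y_bar = sum_x / n, sum_y / n
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # True
```

The sign agreement is not a coincidence: r and b have the same numerator and positive denominators.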
46
*REMARK: The magnitude of the change in one variable when the other variable changes exactly 1 unit is called a marginal change. The value of the slope b of the regression line equation represents the marginal change. For example: Car Rental Companies: b = 0.106, which means that for each increase of 10,000 cars, the value of y changes by 0.106 units (the annual income increases by $106 million) on average.
47
Example 10-11: Use the equation of the regression line to predict the income of a car rental agency that has 200,000 automobiles. x = 20 corresponds to 200,000 automobiles, so y' = 0.396 + 0.106(20) = 2.516. Hence, when a rental agency has 200,000 automobiles, its revenue will be approximately $2.516 billion.
48
If x is the temperature in degrees and y is the number of emergency calls, choose the correct equation of the regression line, given that when the temperature increases by one degree, the number of emergency calls decreases by 0.025 on average. a) y' = 12.5 + 0.615x b) y' = 12.5 - 0.615x c) y' = -4.66 + 0.025x d) y' = 0.99 - 0.025x
49
When we study the relationship between the number of hours a person exercises and his weight, the correlation coefficient may be: a) -0.90 b) 0.88 c) 0.90 d) -0.01
50
The correlation coefficient and the slope of the regression line are the same in: a) Sign only b) Value only c) Sign and value
If both variables (x, y) are quantitative data, then the appropriate correlation coefficient is: a) Spearman b) Rank c) Pearson
51
To determine whether a relationship between variables exists, we use the ….. a) Correlation b) Scatter plot c) Regression d) a and c
The range of the correlation coefficient (r) is: a) -1 ≤ r ≤ 1 b) -1 < r < 1 c) -1 < r ≤ 1 d) -1 ≤ r < 1
52
If the value of the correlation coefficient (r) is close to 1, then the relationship between the variables is: a) Positive linear b) Strong positive linear c) Weak positive linear d) Complete positive linear
53
*** Remark *** The sign of the slope of the regression line indicates the direction of the relationship: positive slope → positive relationship; negative slope → negative relationship.