Download presentation
Presentation is loading. Please wait.
1
Chapter 5 STATISTICS (PART 4)
2
introduction Many problems in science and engineering involve exploring the relationship between two or more variables. Two statistical techniques: Regression Analysis Computing the Correlation Coefficient (r). The correlation coefficient (r ) Tells us how strongly two variables are related.
3
introduction In simple correlation and regression studies, the researcher collects data on two numerical or quantitative variables to see whether a relationship exists between the variables. For example, if a researcher wishes to see whether there is a relationship between number of hours of study and test scores on an exam, she must select a random sample of students, determine the hours each studied, and obtain their grades on the exam. A table can be made for the data, as shown here. Student A B C D E F Hour of study, x 6 2 1 5 3 Grade, y (%) 82 63 57 88 68 75
4
Regression and Correlation
Regression is a statistical procedure for establishing the relationship between 2 or more variables Linear regression - study on the linear relationship between two or more variables. This is done by fitting a linear equation to the observed data. 2 types of relationship: 1) Simple ( 2 variables) 2) Multiple (more than 2 variables)
5
In simple linear regression only two variables are involved:
X is the independent variable. Y is dependent variable. Independent variable: Variable in regression that can be manipulated. Also known as explanatory variable. Dependent variable: Variable in regression that cannot be manipulated. Also known as response variable. Correlation describes the strength of a linear relationship between two variables Linear means “straight line”
6
Example 6.1: A nutritionist studying weight loss programs might wants to find out if reducing intake of carbohydrate can help a person reduce weight. a) X is the carbohydrate intake (independent variable). b) Y is the weight (dependent variable). An entrepreneur might want to know whether increasing the cost of packaging his new product will have an effect on the sales volume. a) X is cost b) Y is sales volume
7
Regression Uses a variable (x) to predict some outcome variable (y)
Tells you how values in y change as a function of changes in values of x
8
Regression Calculates the “best-fit” line for a certain set of data.
Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum. The reason you need a line of best fit is that the values of y will be predicted from the values of x; hence, the closer the points are to the line, the better the fit and the prediction will be
9
The Simple Linear Regression Model
is an equation that describes a dependent variable (Y) in terms of an independent variable (X) plus random error, where, = intercept of the line with the Y-axis = slope of the line = random error Random error, is the difference of data point from the deterministic value. This regression line is estimated from the data collected by fitting a straight line to the data set and getting the equation of the straight line,
10
Regression Equation Regression equation describes the regression line mathematically Intercept, Slope, By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we are able to construct a best fitting straight line to the scatter diagram points and then formulate a regression equation in the form of:
11
Regression Equation Positive Linear Relationship E(y) Regression line
x Regression line Intercept b0 Slope b1 is positive
12
Negative Linear Relationship
E(y) x Intercept b0 Regression line Slope b1 is negative
13
No Relationship E(y) x Intercept b0 Regression line Slope b1 is 0
14
INFERENCES ABOUT ESTIMATED PARAMETERS: Least Squares Method
The least squares method is commonly used to determine values for and that ensure a best fit for the estimated regression line to the sample data points. Rounding rule for intercept and slope: 3 decimal places. The straight line fitted to the data set is the line:
15
LEAST SQUARES METHOD Theorem:
Given the sample data , the coefficients of the least squares line are: i) y-intercept for the Estimated Regression Equation: and are the mean of x and y respectively.
16
ii) Slope for the Estimated Regression Equation,
where
17
LEAST SQUARES METHOD Given any value of the predicted value of the dependent variable , can be found by substituting into the equation
18
Exercise 6.1: A sample of 6 persons was selected the value of their age ( x variable) and their weight is demonstrated in the following table. Find the regression equation and what is the predicted weight when age is 8.5 years. Serial no. Age (x) Weight (y) 1 2 3 4 5 6 7 8 9 12 10 11 13
19
Answer Serial no. Age (x) Weight (y) xy X2 Y2 1 2 3 4 5 6 7 8 9 12 10 11 13 84 48 96 50 66 117 49 36 64 25 81 144 100 121 169 Total 41 461 291 742
20
Compute Compute
21
Compute Thus, regression model is let x = 8.5:
22
let x = 7.5 : we create a regression line by plotting two estimated values for y against their X component, then extending the line right and left.
23
Exercise 6.2: The data below represent scores obtained by ten primary school students before and after they were taken on a tour to the museum (which is supposed to increase their interest in history) Before,x 65 63 76 46 68 72 57 36 96 After, y 66 86 48 71 42 87 Find a linear regression model with “before” as the explanatory variable and “after” as the dependent variable. Predict the score a student would obtain “after” if he scored 60 marks “before”.
26
Correlation Finding the relationship between two quantitative variables without being able to infer causal relationships Correlation is a statistical technique used to determine the degree to which two variables are related
27
Scatter plots The pattern of data is indicative of the
type of relationship between your two variables: positive relationship negative relationship no relationship
28
Positive relationship
30
Negative relationship
Reliability Age of Car
31
No relation
32
Simple Correlation coefficient (r)
Correlation coefficient – statistic showing the degree of relation between two variables It is also called Pearson's correlation or product moment correlation coefficient. It measures the nature and strength between two variables of the quantitative type. The symbol for the sample coefficient of correlation is r, population ρ. Formula :
33
The sign of r denotes the nature of association
while the value of r denotes the strength of association. If the sign is +ve this means the relation is direct (an increase in one variable is associated with an increase in the other variable and a decrease in one variable is associated with a decrease in the other variable). While if the sign is –ve this means an inverse or indirect relationship (which means an increase in one variable is associated with a decrease in the other).
34
Properties of : Values of close to 1 implies there is a strong positive linear relationship between x and y. Values of close to -1 implies there is a strong negative linear relationship between x and y. Values of close to 0 implies little or no linear relationship between x and y.
35
The value of r ranges between ( -1) and ( +1)
The value of r denotes the strength of the association as illustrated by the following diagram. strong intermediate weak -1 1 -0.25 -0.75 0.75 0.25 no relation perfect correlation Direct indirect
36
If r = Zero this means no association or correlation between the two variables.
If 0 < r < 0.25 = weak correlation. If 0.25 ≤ r < 0.75 = intermediate correlation. If 0.75 ≤ r < 1 = strong correlation. If r = l = perfect correlation.
38
Refer Exercise 6.2: Students score in history c) Calculate the value of r and interpret its meaning Solution: Thus, there is a strong positive linear relationship between score obtain before (x) and after (y).
39
Exercise 6.3: A sample of 6 children was selected, data about their age in years and weight in kilograms was recorded as shown in the following table . It is required to find the correlation between age and weight. serial No Age (years) Weight (Kg) 1 7 12 2 6 8 3 4 5 10 11 9 13
40
Serial n. Age (years) (x) Weight (Kg) (y) xy x2 y2 1 7 12 84 49 144 2 6 8 48 36 64 3 96 4 5 10 50 25 100 11 66 121 9 13 117 81 169 Total ∑x= 41 ∑y= ∑xy= 461 ∑x2= 291 ∑y2= 742
41
r = 0.759 Since the r value close to 1 implies that there is strong positive linear relationship between age (x) and weight (y).
42
Exercise 6.4: 1. Hospital Beds Licensed beds and staffed beds data follow: Find y when x 44. 2. Commercial Movie Releases New movie releases per studio and gross receipts are as follows: i) Find y when x = 200 new releases ii) Is there a significant relationship between these two variables? Licensed beds, x 144 32 175 185 208 100 169 Staffed beds, y 112 162 141 103 80 118 No. of releases 361 270 306 22 35 10 8 12 21 Gross receipts (million $) 3844 1962 1371 1064 334 241 188 154 125
43
Answer: 1. 2. positive linear relationship between x and y
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.