Presentation on theme: "Introduction to Regression"— Presentation transcript:

1 Introduction to Regression

2 Introduction to linear regression
Simple linear regression
Multiple regression
Finding the relationships between the dependent and independent variables
Introduction to SPSS

3 Simple linear regression
The population model: y = β0 + β1X + ε
y = dependent variable
X = independent variable
β0 = y-intercept of the line (constant); where the line cuts the y-axis
β1 = unknown parameter; slope of the line
ε = random error component

4 Terminology for simple regression
Y: dependent variable, explained variable, response variable, predicted variable, regressand
X: independent variable, explanatory variable, control variable, predictor variable, regressor

5 Examples
With a sample from the population: let (x1, y1), (x2, y2), …, (xn, yn) denote a random sample of size n from the population.
File: dataspss-s4.1

6 DETERMINING THE EQUATION OF THE REGRESSION LINE
Deterministic regression model: a mathematical model that produces an exact output for a given input.
Probabilistic regression model: a model that includes an error term, allowing different output values to occur for a given input value: yi = β0 + β1xi + εi
ŷi = predicted value of y for the ith observation
xi = value of the independent variable for the ith observation
yi = actual value of the dependent variable for the ith observation
β1 = population slope
β0 = population intercept
εi = error of prediction for the ith observation
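A minimal sketch of the distinction, assuming nothing beyond the definitions above: the deterministic model returns exactly one y for each x, while the probabilistic model adds a random error term. The parameter values and data below are made up for illustration.

```python
import numpy as np

# Hypothetical parameters for illustration (not taken from the slides)
beta0, beta1 = 1.0, 0.5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Deterministic model: an exact output for each input
y_deterministic = beta0 + beta1 * x

# Probabilistic model: adds a random error term, so repeated draws
# at the same x give different y values
rng = np.random.default_rng(0)
y_probabilistic = beta0 + beta1 * x + rng.normal(0.0, 0.3, size=x.size)

print(y_deterministic)
print(y_probabilistic)
```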

7 Simple Linear Regression Model
[Diagram: at a given Xi, the observed value of Y differs from the predicted value of Y on the line by the random error εi; the line has intercept β0 and slope β1.]

8 Sample Regression Function (SRF)
ŷi = b0 + b1xi
ŷi = estimated value of Y for observation i
xi = value of X for observation i
b0 = Y-intercept: the value of Y when X is zero
b1 = slope of the regression line: the change in Y for a one-unit change in X
b1 > 0: the line goes up; positive relationship between X and Y
b1 < 0: the line goes down; negative relationship between X and Y

9 SIMPLE LINEAR REGRESSION MODEL (sample)
ŷi = b0 + b1xi, where b0 and b1 are the sample estimates of the population intercept β0 and slope β1.

10 Sample Regression Function (SRF)
(continued) b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals (minimize the error). This process is called least squares analysis.
b0 provides an estimate of β0; b1 provides an estimate of β1.
Yi = actual value of Y for observation i
ŷi = predicted value of Y for observation i
ei = residual (error) = Yi − ŷi

11 RESIDUAL ANALYSIS
Residual: the difference between the actual y value and the y value predicted by the regression model.
The sum of the residuals is zero.
Question for the students: what do we get if we sum the squared residuals? (See the sketch below.)
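The sketch below, using made-up data, illustrates both points on this slide: the least-squares residuals sum to (numerically) zero, and summing the squared residuals gives the error sum of squares, SSE.

```python
import numpy as np

# Any data will do; these values are invented for illustration
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 6.0, 5.0, 9.0, 11.0])

b1, b0 = np.polyfit(x, y, 1)      # least-squares slope and intercept
residuals = y - (b0 + b1 * x)

print(residuals.sum())             # ~0: OLS residuals always sum to zero
print(np.sum(residuals ** 2))      # sum of squared residuals = SSE
```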

12 Simple example
X = experience (years); Y = income (10 million VND); a sample of n = 5 observations.
Sample means: X̄ = 3 and Ȳ = 2. Column totals: Σ(Xi − X̄)(Yi − Ȳ) = 7 and Σ(Xi − X̄)² = 10.
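A worked computation of the slope and intercept. The x and y values below are an assumed reconstruction chosen to match the summary statistics quoted above (means 3 and 2, totals 7 and 10); the slide's own raw data may differ.

```python
import numpy as np

# Assumed reconstruction of the slides' data:
# X = experience (years), Y = income (10 million VND)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])

sxy = np.sum((x - x.mean()) * (y - y.mean()))   # 7.0
sxx = np.sum((x - x.mean()) ** 2)               # 10.0

b1 = sxy / sxx                  # 0.7
b0 = y.mean() - b1 * x.mean()   # -0.1
print(b0, b1)
```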

13 Fit to the data
[Scatter plot of the five observations with the fitted regression line; x axis from 1 to 5, y axis from 1 to 4.]

14 Result of estimation by SPSS

15 Meaning of b0 and b1
Y-intercept (b0): the average value of individual income (Y), in 10 million VND, when experience (X) is 0 years (here b0 = Ȳ − b1X̄ = 2 − 0.7 × 3 = −0.1).
Slope (b1): income (Y) is expected to increase by 0.7 (× 10 million VND) for each additional year of experience.
b1 > 0: the line goes up; positive relationship between X and Y (Y increases with X).
b1 < 0: the line goes down; negative relationship between X and Y (Y decreases as X increases).

16 Measures of Variation
Total variation is made up of two parts: SST = SSR + SSE.
Total sum of squares (SST, also written SSyy): measures the variation of the Yi values around their mean Ȳ.
Regression sum of squares (SSR): explained variation, attributable to the relationship between X and Y.
Error sum of squares (SSE): unexplained variation, attributable to factors other than the relationship between X and Y.
Note: SSyy and SST are two notations for the same quantity.

17 Measure of Variation: The Sum of Squares
(continued)
SSyy (SST) = Σ(Yi − Ȳ)²
SSR = Σ(Ŷi − Ȳ)²
SSE = Σ(Yi − Ŷi)²
[Diagram: for each Xi, the deviation Yi − Ȳ splits into the explained part Ŷi − Ȳ and the residual Yi − Ŷi.]
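The decomposition can be checked numerically. The data are the same assumed reconstruction of the experience/income example used earlier.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example data
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)          # error sum of squares

print(sst, ssr + sse)                   # SST = SSR + SSE
```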

18 Result of estimation by SPSS

19 Coefficient of Determination, r2
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called r-squared and is denoted r².
Note: r² = SSR / SST, so 0 ≤ r² ≤ 1.

20 The Coefficient of Determination r2 and the Coefficient of Correlation r
r² = coefficient of determination: measures the percentage of variation in Y that is explained by the independent variable X in the regression model.
r = coefficient of correlation: measures how strong the linear relationship between X and Y is.
r > 0 if b1 > 0; r < 0 if b1 < 0; −1 ≤ r ≤ 1.
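A short check of r² and r on the same assumed example data; np.corrcoef gives the sample correlation for comparison.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example data
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)

r_squared = ssr / sst                     # share of variation in Y explained by X
r = np.sign(b1) * np.sqrt(r_squared)      # correlation carries the sign of the slope
print(r_squared, r)
print(np.corrcoef(x, y)[0, 1])            # cross-check with the sample correlation
```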

21 Examples of Approximate r2 values
r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
[Scatter plots illustrating r² = 1 and r² = 0.]

22 Examples of Approximate r2 values
(continued) 0 < r² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.
[Scatter plots illustrating intermediate r² values.]

23 Standard Error of the Estimate
The standard deviation of the observations around the regression line is estimated by:
s_e = √( SSE / (n − 2) )
where SSE = error sum of squares and n = sample size.
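Computing the standard error of the estimate for the assumed example data (n = 5, so n − 2 = 3 degrees of freedom).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example data
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
b1, b0 = np.polyfit(x, y, 1)

sse = np.sum((y - (b0 + b1 * x)) ** 2)
n = len(y)
s_e = np.sqrt(sse / (n - 2))               # standard error of the estimate
print(s_e)                                  # about 0.61 for this data
```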

24 Result of estimation by SPSS

25 Inferences About the Slope
The standard error of the regression slope coefficient (b1) is estimated by:
S_b1 = s_e / √( Σ(Xi − X̄)² )
where S_b1 = estimate of the standard error of the least squares slope and s_e = standard error of the estimate.
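Continuing with the same assumed data, the standard error of the slope follows directly from the standard error of the estimate.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example data
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
b1, b0 = np.polyfit(x, y, 1)

n = len(y)
s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of the slope
print(s_b1)                                          # about 0.19 for this data
```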

26 Result of estimation by SPSS
Sb1 =

27 Inference about the Slope: t Test
t test for a population slope: is there a linear relationship between X and Y?
Null hypothesis H0: β1 = 0 (no linear relationship)
Alternative hypothesis H1: β1 ≠ 0 (a linear relationship does exist)
Test statistic: t = (b1 − β1) / S_b1, with d.f. = n − 2
where b1 = regression slope coefficient, β1 = hypothesized slope (0 under H0), S_b1 = standard error of the slope.
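The test statistic, critical value, and p-value can be computed as below (scipy is used only for the t distribution); the data remain the assumed reconstruction of the experience/income example.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example data
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
b1, b0 = np.polyfit(x, y, 1)

n = len(y)
s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))

t_stat = (b1 - 0.0) / s_b1                       # H0: beta1 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)            # two-sided, alpha = 0.05
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
# t ~ 3.66 exceeds the critical value ~ 3.18, so H0 is rejected at the 5% level
print(t_stat, t_crit, p_value)
```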

28 Result of estimation by SPSS

29 Inferences about the Slope: t Test Example
d.f. = 5 − 2 = 3; α = 0.05, so α/2 = 0.025 in each tail.
Test statistic: t = 3.66. Critical values: t = ±3.182 (from t tables).
Decision: since 3.66 > 3.182, reject H0.
Conclusion: there is sufficient evidence that the slope is not zero, i.e. that years of experience affect income.

30 F Test for Significance
F test statistic: F = MSR / MSE, where MSR = SSR / k and MSE = SSE / (n − k − 1).
F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom.
k = the number of independent (explanatory) variables in the regression model.
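The F statistic for the same assumed example; in simple regression it equals the square of the t statistic from the slope test.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example data
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

n, k = len(y), 1                              # one explanatory variable
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse                            # equals t^2 in simple regression
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)
print(f_stat, f_crit)                         # ~13.4 vs ~10.1: reject H0 at 5%
```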

31 Result of estimation by SPSS

32 F Test for Significance Example
H0: β1 = 0; H1: β1 ≠ 0; α = 0.05.
df1 = k = 1; df2 = n − k − 1 = 5 − 1 − 1 = 3.
Test statistic: F = MSR / MSE (in simple regression F = t², here about 3.66² ≈ 13.4).
Critical value: F0.05 with (1, 3) degrees of freedom (from F tables).
Conclusion: reject H0 at α = 0.05; there is sufficient evidence that experience affects income.

33 Introduction to SPSS (file: dataspss-s4.1)

34 Result of estimation by SPSS

35 Interpreting the results
R-squared ranges in value between 0 and 1.
R² = 0: the regression explains none of the variance in y.
R² = 1: all the sample points lie on the estimated regression line.
Example: R² = 0.93 implies that the regression equation explains 93% of the variation in the dependent variable.
Sig. (significance): a coefficient indicates a good fit only if its Sig. value is small enough:
Sig. < 0.01: significant at 1%, H0 is rejected.
0.01 ≤ Sig. < 0.05: significant at 5%, H0 is rejected.
0.05 ≤ Sig. < 0.10: significant at 10%, H0 is rejected.
Sig. ≥ 0.10: not significant at conventional levels; H0 is not rejected. (See the helper sketched below.)
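A small helper, sketched here as a convenience rather than anything SPSS provides, that maps a reported Sig. value to the usual significance statement.

```python
def significance_level(sig):
    """Map an SPSS 'Sig.' (p-value) to the conventional significance statement."""
    if sig < 0.01:
        return "significant at 1% (reject H0)"
    if sig < 0.05:
        return "significant at 5% (reject H0)"
    if sig < 0.10:
        return "significant at 10% (reject H0)"
    return "not significant at conventional levels (do not reject H0)"

print(significance_level(0.03))   # significant at 5% (reject H0)
```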

36 Introduction to multiple regression
Relationships between the dependent variable and several independent variables
Dummy variables included
Estimation (solution) and interpretation in SPSS

37 Linear regression (multiple)
The model: y = β0 + β1X1 + β2X2 + … + βkXk + ε
y = dependent (or response) variable
X1, X2, …, Xk = independent or predictor variables
β0 = y-intercept (constant); where the regression surface cuts the y-axis
β1, …, βk = unknown parameters; slopes (the partial effect of each X)
ε = random error component
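A minimal multiple-regression sketch using ordinary least squares via numpy. The variable names follow the deck's survey example (income, experience, education), but the numbers are invented for illustration and are not the dataspss-s4.2 data.

```python
import numpy as np

# Hypothetical data: income regressed on experience and education (years)
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
education = np.array([12.0, 12.0, 16.0, 14.0, 16.0, 18.0])
income = np.array([1.2, 1.5, 2.6, 2.4, 3.3, 4.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(experience), experience, education])
coef, _, _, _ = np.linalg.lstsq(X, income, rcond=None)

b0, b1, b2 = coef
print(b0, b1, b2)   # estimated intercept and slopes
```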

38 Terminology for multiple regression
Y: dependent variable, explained variable, response variable, predicted variable, regressand
X1, X2, …, Xk: independent variables, explanatory variables, control variables, predictor variables, regressors

39 Example
File: dataspss-s4.2
Dependent variable? Independent variables?
SPSS program: estimate and discuss.

40 Think: a survey is conducted with the variables
Income, Age, Years of working experience, Education, Gender, …
Which are the dependent and independent variables?

41 Regression with dummy independent variables
Independent variable: Gender, coded 1 = female, 0 = male.
If the estimated coefficient on gender is positive, the dependent variable tends to be higher for females than for males, other things equal.
If the estimated coefficient on gender is negative, the dependent variable tends to be higher for males.
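A sketch of a regression with a dummy independent variable. The gender coding follows the slide (1 = female, 0 = male); the data values are invented for illustration.

```python
import numpy as np

# Hypothetical data: gender coded 1 = female, 0 = male, as on the slide
gender = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
income = np.array([1.4, 1.5, 2.3, 2.6, 3.2, 3.4, 4.1, 4.2])

X = np.column_stack([np.ones_like(income), experience, gender])
b0, b_exp, b_gender = np.linalg.lstsq(X, income, rcond=None)[0]

# A positive b_gender means predicted income is higher for females (gender = 1)
# than for males with the same experience; a negative value means the opposite.
print(b_gender)
```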

42 Practice (dataspss-s4.2)
Develop hypotheses
Develop the regression model
Identify the dependent variable
Identify the independent variables
Discuss the output

43 Samples of hypotheses
An increase in education does not cause a rise in earnings.
People's earnings are not positively influenced by their age.
There is no significant relationship between earnings and gender.

