Applied Regression Analysis BUSI 6220


1 Applied Regression Analysis BUSI 6220
KNN Ch. 2: Inferences in Simple Linear Regression
Adapted from notes by:
Dr. K. N. Thompson, Dept. of Marketing, University of North Texas, 1999
Dr. S. Kulkarni, ITDS Dept., University of North Texas, 2004
Dr. N. Evangelopoulos, ITDS Dept., University of North Texas, 2012

2 Normal Error Regression Model
0 and 1 are parameters; Xi = value of predictor on ith trial ( a constant); i = 1, …. n Yi= value of observed response on ith trial; independent normal random variables E{Yi}= 0 + 1 Xi Variance of 2 i is a random error term N(0, 2) E{i }=0 (expected value of error terms is zero) 2 {i} = 2 (variance of error terms is constant)  {i, j} = 0 for all i, j; i  j (error terms are independent; do not covary; are not correlated) Error terms are normally distributed

3 Inferences About 1 -- The Slope of the Regression Line

4 Usual inference about 1
Null Hypothesis Alternative Hypothesis H0: Slope of the regression line is 0; there is no linear relationship between X and Y.

5 Slope of the Regression Line Is 0
There is no linear relationship between X and Y; the regression line is horizontal
The means of the probability distributions of all $Y_i$ are equal: $E\{Y_i\} = \beta_0$ for every $i$
The probability distributions of all $Y_i$ are identical

6 Required Test Statistic for Evaluating H0
Studentized test statistic (since σ is estimated by s): $t^* = \dfrac{b_1}{s\{b_1\}}$
Distributed as $t_{n-2}$ when H0 holds

7 Estimating σ²{b1}
Point estimator: $s^2\{b_1\} = \dfrac{MSE}{\sum (X_i - \bar{X})^2}$
$s^2\{b_1\}$ is an unbiased estimator of $\sigma^2\{b_1\} = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$

8 Estimating s{b1} for GMAT Example
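The GMAT data themselves are not reproduced in this transcript, so the following Python sketch uses invented (GMAT, GPA)-style numbers purely to show how b1, s{b1}, and t* are computed:

```python
import numpy as np

# Invented (GMAT, GPA) pairs; the actual example data are not in the transcript
X = np.array([450., 480., 500., 520., 540., 560., 590., 610., 640., 660.])
Y = np.array([2.6, 2.8, 2.9, 3.0, 3.0, 3.2, 3.3, 3.5, 3.6, 3.8])
n = len(X)

Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx  # least-squares slope
b0 = Y.mean() - b1 * X.mean()                       # least-squares intercept

MSE = np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2)    # unbiased estimate of sigma^2
s_b1 = np.sqrt(MSE / Sxx)                           # estimated standard error of b1
t_star = b1 / s_b1                                  # studentized statistic; t_{n-2} under H0
```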

9

10

11 For GMAT Example

12 Two-Sided (Two-Tailed) Test of H0: GMAT Example
H0: β1 = 0 (there is no linear relationship between GMAT and GPA); Ha: β1 ≠ 0
Control the risk of Type I error at α = .05
The test statistic is t*

13 Decision Rule for Test of H0
If $|t^*| \le t(1 - \alpha/2;\, n - 2)$, conclude H0; if $|t^*| > t(1 - \alpha/2;\, n - 2)$, conclude Ha

14 The Decision...
b1 = .0084; s{b1} = .0014
t* = b1 / s{b1} = 5.83 (computed from unrounded values; the rounded figures above would give 6.0)
t(.975; 18) = 2.101 (critical t)
|t*| > t(.975; 18), therefore conclude Ha (not H0): the null hypothesis must be rejected
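A sketch of this decision in Python with SciPy, using the numbers reported on the slides (n = 20 is implied by the 18 degrees of freedom):

```python
from scipy import stats

t_star = 5.831   # studentized test statistic from the slides
n = 20           # implied by df = n - 2 = 18
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # 2.101, the critical value
p_value = 2 * stats.t.sf(abs(t_star), df=n - 2)  # two-tailed p-value

# |t*| = 5.831 > 2.101, so reject H0
print(f"t_crit = {t_crit:.3f}, p = {p_value:.6f}")
```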

15 IBM SPSS Results... SPSS computes the two-tailed p-value of t directly -- much better!

16 A One-Sided (Tailed) Test of 1
Assume for GMAT example that we think that the relationship between GMAT and GPA should always be positive...

17 Null & Alternative Hypotheses; Decision Rule
H0: β1 ≤ 0; Ha: β1 > 0
If $t^* \le t(1 - \alpha;\, n - 2)$, conclude H0; if $t^* > t(1 - \alpha;\, n - 2)$, conclude Ha

18 For the GMAT Data...
t* = 5.831 (same as before)
The critical t is now smaller: all 5% is in one tail, rather than spread across two tails
t(.95; 18) = 1.734
t* > t(.95; 18), so conclude Ha (reject H0)
β1 probably is not less than or equal to 0; we may conclude that β1 is positive

19 ANOVA Approach to Regression

20 Essentially...
ANOVA partitions the sum of squares (SS) in the criterion variable into two parts: the SS that can be attributed to the predictor, and the error SS, the sum of squares unique to the criterion
The total SS in the criterion is SSTO; the SS attributed to the predictor is SSR; the error (unique) SS is SSE

21 Sums of Squares Are Additive
$SSTO = SSR + SSE$, i.e. $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

22 The ANOVA Table
Source of Variation   SS     df      MS
Regression            SSR    1       MSR = SSR / 1
Error                 SSE    n - 2   MSE = SSE / (n - 2)
Total                 SSTO   n - 1
$E\{MSE\} = \sigma^2$, i.e. MSE is an unbiased estimator of the error variance; $E\{MSR\} = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
If $\beta_1 = 0$, MSR and MSE are about the same size and F* will be small...

23 GMAT Example...

24 GMAT ANOVA Table

25 IBM SPSS ANOVA

26 The F-test for the ANOVA...
The appropriate test is F; it is an upper-tail test
$F^* = MSR / MSE$ is distributed as $F(1, n - 2)$ when H0 holds
H0: β1 = 0; Ha: β1 ≠ 0
Decision rule: if $F^* \le F(1 - \alpha;\, 1,\, n - 2)$, conclude H0; if $F^* > F(1 - \alpha;\, 1,\, n - 2)$, conclude Ha
For the GMAT example: F* = 34.005, F(.95; 1, 18) = 4.41, so conclude Ha (reject H0)
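A sketch of this F-test in Python, using the sums of squares reported later in the slides (SSR = 6.434, SSTO = 9.84, so SSE = SSTO − SSR):

```python
from scipy import stats

SSR, SSTO = 6.434, 9.84
SSE = SSTO - SSR
n, alpha = 20, 0.05

MSR = SSR / 1
MSE = SSE / (n - 2)
F_star = MSR / MSE                                  # ~34.0; the slides report 34.005

F_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)   # F(.95; 1, 18) = 4.41
print("conclude Ha" if F_star > F_crit else "conclude H0")
```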

27 Relationship Between F and t
In simple regression (i.e. when a single predictor variable is employed), for a given α: $F^* = (t^*)^2$, where t* is the two-tailed t statistic
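For the GMAT example this checks out up to rounding: $(t^*)^2 = (5.831)^2 \approx 34.00$, and F* = 34.005.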

28 General Linear Test Approach

29 Approach Involves...
Fit a 'full model' to the data and obtain SSE(F): simply the SSE obtained from fitting the standard regression line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ to the data
Fit a 'reduced model' to the data and obtain SSE(R): consider H0 (usually H0: β1 = 0); the model when H0 holds is the reduced model
When β1 = 0, the model reduces to $Y_i = \beta_0 + \varepsilon_i$
Because the best estimator of β0 is then $\bar{Y}$, SSE(R) = $\sum (Y_i - \bar{Y})^2$ = SSTO

30 Test Statistic
$F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F}$
Since SSE(F) = SSE, $df_F = n - 2$; since SSE(R) = SSTO, $df_R = n - 1$
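A minimal Python sketch of the general linear test for simple regression; the function name is ours, and the inputs are assumed to be NumPy arrays:

```python
import numpy as np

def general_linear_test_F(X, Y):
    """F* for H0: beta1 = 0 via the full-vs-reduced-model comparison."""
    n = len(X)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    sse_full = np.sum((Y - (b0 + b1 * X)) ** 2)  # SSE(F) = SSE,  df_F = n - 2
    sse_red = np.sum((Y - Y.mean()) ** 2)        # SSE(R) = SSTO, df_R = n - 1

    return ((sse_red - sse_full) / 1) / (sse_full / (n - 2))
```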

31 Coefficient of Determination
r² is the coefficient of determination: $r^2 = SSR/SSTO = 1 - SSE/SSTO$, with $0 \le r^2 \le 1$
r² is the 'proportion of variance in the criterion associated with the use of the predictor'
When all observations fall directly on the regression line, the predictor perfectly explains all variation in the criterion and r² = 1
When the regression line is horizontal (b1 = 0), SSE = SSTO and r² = 0
(Caveat: what happens when the line is horizontal but all points fall on it?)

32 GMAT Example: r² = SSR/SSTO = 6.434/9.84 = .654

33 Measures of Strength of Association
(The coefficient of correlation)

34 Correlation Coefficient
Pearson's product-moment correlation, rXY: the correlation between two continuous variables measured at least at the interval level
$-1 \le r_{XY} \le +1$
Actually, ... (prove this as an exercise)

35 Correlation Coefficient
Pearson's product-moment correlation (continued): r is the correlation between two continuous variables measured at least at the interval level
Unlike r², r does not have a clear-cut interpretation
Used extensively in behavioral research
Inflates the apparent relationship between X and Y (since $|r| \ge r^2$)

36 Relationship Between r and b1
When the data are in their original metric: $b_1 = r_{XY} \dfrac{s_Y}{s_X}$
When the data are standardized: $b_1 = r_{XY}$
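These identities are easy to verify numerically; a short Python check on data invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Original metric: b1 = r * (s_Y / s_X)
assert np.isclose(b1, r * y.std(ddof=1) / x.std(ddof=1))

# Standardized data: the slope equals r itself
zx, zy = (x - x.mean()) / x.std(ddof=1), (y - y.mean()) / y.std(ddof=1)
b1_std = np.sum(zx * zy) / np.sum(zx ** 2)
assert np.isclose(b1_std, r)
```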

37 Limitations of r & r²
High r or r² may not imply strong predictive capability: r's as high as .9 (r² = .81) can still give wide confidence intervals for the estimate; always compute confidence or prediction intervals
Does a high r or r² always mean the regression line is a good fit? Only if the relationship is linear; you can still get a relatively high r or r² when the relationship is curvilinear

38 Limitations of r & r²
Does a low r or r² always suggest that X and Y are not related, or are only weakly related? Only if the relationship is linear; you can still get a very low r or r² when the relationship is curvilinear
Transforming X, Y, or both prior to constructing the regression model may improve the fit (a later topic)

39 Formulae for r
When X and Y are standardized (using sample standard deviations): $r = \dfrac{\sum z_X z_Y}{n - 1}$
When X and Y are in raw-score (non-standardized) form: $r = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$
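Both formulae in Python, checked against each other on invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)
n = len(x)

# Raw-score form
dx, dy = x - x.mean(), y - y.mean()
r_raw = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Standardized form (sample standard deviations, ddof=1)
zx, zy = dx / x.std(ddof=1), dy / y.std(ddof=1)
r_std = np.sum(zx * zy) / (n - 1)

assert np.isclose(r_raw, r_std)
assert np.isclose(r_raw, np.corrcoef(x, y)[0, 1])
```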

40 Formulae for r
Is rXY a least-squares estimator or an MLE estimator? Is this estimator unbiased? What is it an "estimator" of? Isn't this the same as "r" from a simple linear regression model?
Point-biserial "r" (one variable dichotomous)

41 Formulae for r Phi Coefficient (Both X and Y dichotomous)

42 Inferences on Correlation Coefficients
Assume a bivariate normal population; the interpretation of $r_{XY}^2$ is important
Testing H0: $\rho_{XY} = 0$ (relate this to simple linear regression!)
If H0 holds, then $t^* = \dfrac{r_{XY}\sqrt{n - 2}}{\sqrt{1 - r_{XY}^2}} \sim t_{n-2}$
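A small Python helper for this test (the function name is ours); with r = √.654 ≈ .809 and n = 20 from the GMAT example it reproduces t* ≈ 5.83:

```python
import numpy as np
from scipy import stats

def corr_t_test(r, n):
    """t* for H0: rho_XY = 0; distributed as t_{n-2} when H0 holds."""
    t_star = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t_star), df=n - 2)
    return t_star, p

t_star, p = corr_t_test(r=0.809, n=20)  # r = sqrt(.654), from the GMAT example
```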

43 Advanced Inference on the Correlation Coefficient
(Optional material)

44 Interval Estimation of ρXY
The sampling distribution of rXY is complicated when $\rho_{XY} \neq 0$: cannot use "t"!
If $n \ge 25$, use the Fisher z transformation: $z' = \tfrac{1}{2}\ln\dfrac{1 + r_{XY}}{1 - r_{XY}}$
Then z' is approximately normal with mean $\zeta = \tfrac{1}{2}\ln\dfrac{1 + \rho_{XY}}{1 - \rho_{XY}}$ and standard deviation $\sigma\{z'\} = 1/\sqrt{n - 3}$

45 Estimation of ρXY (continued)
Then the CI for ζ is $z' \pm z_{1-\alpha/2}/\sqrt{n - 3}$
We have to transform back in order to get the CI for ρXY:
$\tanh\!\big(\operatorname{arctanh}(r_{XY}) - z_{1-\alpha/2}/\sqrt{n - 3}\big) < \rho_{XY} < \tanh\!\big(\operatorname{arctanh}(r_{XY}) + z_{1-\alpha/2}/\sqrt{n - 3}\big)$
(note that $\operatorname{arctanh}(r_{XY}) = z'$, and tanh is its inverse)
See KNN for testing hypotheses about independent samples from two bivariate normal populations
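A sketch of the retransformation in Python (np.arctanh is the Fisher z transformation and np.tanh undoes it); the example values of r and n are invented:

```python
import numpy as np
from scipy import stats

def fisher_ci(r, n, alpha=0.05):
    """Approximate CI for rho_XY via the Fisher z transformation (n >= 25)."""
    z = np.arctanh(r)                               # z' = 0.5 * ln((1 + r)/(1 - r))
    half = stats.norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    return np.tanh(z - half), np.tanh(z + half)

lo, hi = fisher_ci(r=0.80, n=30)  # illustrative values only
```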

46 What If the Population Is Not Normal?
Resort to a non-parametric approach: the famous Spearman rank correlation coefficient $r_s$, which is Pearson's r computed on the ranks of the X and Y values
If there are no ties in the ranks, we can use the more commonly found approximation $r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the ranks of $X_i$ and $Y_i$
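A sketch in Python comparing SciPy's Spearman coefficient to the no-ties shortcut formula; the data are invented and tie-free:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=40)
y = np.exp(x) + rng.normal(scale=0.1, size=40)  # monotone but nonlinear relation

rho, p = stats.spearmanr(x, y)   # Pearson's r applied to the ranks

# No-ties shortcut: r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
d = stats.rankdata(x) - stats.rankdata(y)
n = len(x)
r_s = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

assert np.isclose(rho, r_s)
```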

47 Hypothesis Test for the Population Correlation Coefficient
H0: there is no association between X and Y; Ha: there is an association between X and Y
The sampling distribution of $r_s$ is available in tables and is not too complicated
However, when $n > 10$ we can use $t^* = \dfrac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}} \sim t_{n-2}$, as in the normal case
Spearman's rank correlation coefficient is also used to test for heteroscedasticity

