Applied Regression Analysis BUSI 6220


1 Applied Regression Analysis BUSI 6220
KNN Ch. 2: Inferences in Simple Linear Regression
Adapted from notes by:
Dr. K. N. Thompson, Dept. of Marketing, University of North Texas, 1999
Dr. S. Kulkarni, ITDS Dept., University of North Texas, 2004
Dr. N. Evangelopoulos, ITDS Dept., University of North Texas, 2012

2 Normal Error Regression Model
0 and 1 are parameters; Xi = value of predictor on ith trial ( a constant); i = 1, …. n Yi= value of observed response on ith trial; independent normal random variables E{Yi}= 0 + 1 Xi Variance of 2 i is a random error term N(0, 2) E{i }=0 (expected value of error terms is zero) 2 {i} = 2 (variance of error terms is constant)  {i, j} = 0 for all i, j; i  j (error terms are independent; do not covary; are not correlated) Error terms are normally distributed

3 Inferences About 1 -- The Slope of the Regression Line

4 Usual inference about 1
Null Hypothesis Alternative Hypothesis H0: Slope of the regression line is 0; there is no linear relationship between X and Y.

5 Slope of the Regression Line Is 0
There is no linear relationship between X and Y; the regression line is horizontal
The means of the probability distributions of all $Y_i$ are equal: $E\{Y_i\} = \beta_0$ for every $i$
The probability distributions of all $Y_i$ are identical

6 Required Test Statistic for Evaluating H0
Studentized test statistic (since σ is estimated by s): $t^* = \dfrac{b_1}{s\{b_1\}}$
Distributed as $t_{n-2}$ when H0 holds

7 Estimating σ²{b1}
Point estimator: $s^2\{b_1\} = \dfrac{MSE}{\sum (X_i - \bar{X})^2}$
$s^2\{b_1\}$ is an unbiased estimator of $\sigma^2\{b_1\} = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$

8 Estimating s{b1} for GMAT Example
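The GMAT data themselves are not reproduced in this transcript, so the following Python sketch uses invented (GMAT, GPA)-style numbers purely to show how b1, s{b1}, and t* are computed:

```python
import numpy as np

# Invented (GMAT, GPA) pairs; the actual example data are not in the transcript
X = np.array([450., 480., 500., 520., 540., 560., 590., 610., 640., 660.])
Y = np.array([2.6, 2.8, 2.9, 3.0, 3.0, 3.2, 3.3, 3.5, 3.6, 3.8])
n = len(X)

Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx  # least-squares slope
b0 = Y.mean() - b1 * X.mean()                       # least-squares intercept

MSE = np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2)    # unbiased estimate of sigma^2
s_b1 = np.sqrt(MSE / Sxx)                           # estimated standard error of b1
t_star = b1 / s_b1                                  # studentized statistic; t_{n-2} under H0
```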

9

10

11 For GMAT Example

12 Two-Sided (Two-Tailed) Test of H0: GMAT Example
H0: β1 = 0 (there is no linear relationship between GMAT and GPA); Ha: β1 ≠ 0
Control the risk of Type I error at α = .05
The test statistic is t*

13 Decision Rule for Test of H0
If $|t^*| \le t(1 - \alpha/2;\, n - 2)$, conclude H0; if $|t^*| > t(1 - \alpha/2;\, n - 2)$, conclude Ha

14 The Decision...
b1 = .0084; s{b1} = .0014
t* = b1 / s{b1} = 5.83 (computed from unrounded values; the rounded figures above would give 6.0)
t(.975; 18) = 2.101 (critical t)
|t*| > t(.975; 18), therefore conclude Ha (not H0): the null hypothesis must be rejected
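A sketch of this decision in Python with SciPy, using the numbers reported on the slides (n = 20 is implied by the 18 degrees of freedom):

```python
from scipy import stats

t_star = 5.831   # studentized test statistic from the slides
n = 20           # implied by df = n - 2 = 18
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # 2.101, the critical value
p_value = 2 * stats.t.sf(abs(t_star), df=n - 2)  # two-tailed p-value

# |t*| = 5.831 > 2.101, so reject H0
print(f"t_crit = {t_crit:.3f}, p = {p_value:.6f}")
```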

15 IBM SPSS Results... SPSS computes the two-tailed p-value of t directly -- much better!

16 A One-Sided (Tailed) Test of 1
Assume for GMAT example that we think that the relationship between GMAT and GPA should always be positive...

17 Null & Alternative Hypotheses; Decision Rule
H0: β1 ≤ 0; Ha: β1 > 0
If $t^* \le t(1 - \alpha;\, n - 2)$, conclude H0; if $t^* > t(1 - \alpha;\, n - 2)$, conclude Ha

18 For the GMAT Data...
t* = 5.831 (same as before)
The critical t is now smaller: all 5% is in one tail, rather than spread across two tails
t(.95; 18) = 1.734
t* > t(.95; 18), so conclude Ha (reject H0)
β1 probably is not less than or equal to 0; we may conclude that β1 is positive

19 ANOVA Approach to Regression

20 Essentially...
ANOVA partitions the sum of squares (SS) in the criterion variable into two parts: the SS that can be attributed to the predictor, and the error SS, the sum of squares unique to the criterion
The total SS in the criterion is SSTO; the SS attributed to the predictor is SSR; the error (unique) SS is SSE

21 Sums of Squares Are Additive
$SSTO = SSR + SSE$, i.e. $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

22 The ANOVA Table
Source of Variation   SS     df      MS
Regression            SSR    1       MSR = SSR / 1
Error                 SSE    n - 2   MSE = SSE / (n - 2)
Total                 SSTO   n - 1
$E\{MSE\} = \sigma^2$, i.e. MSE is an unbiased estimator of the error variance; $E\{MSR\} = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
If $\beta_1 = 0$, MSR and MSE are about the same size and F* will be small...

23 GMAT Example...

24 GMAT ANOVA Table

25 IBM SPSS ANOVA

26 The F-test for the ANOVA...
The appropriate test is F; it is an upper-tail test
$F^* = MSR / MSE$ is distributed as $F(1, n - 2)$ when H0 holds
H0: β1 = 0; Ha: β1 ≠ 0
Decision rule: if $F^* \le F(1 - \alpha;\, 1,\, n - 2)$, conclude H0; if $F^* > F(1 - \alpha;\, 1,\, n - 2)$, conclude Ha
For the GMAT example: F* = 34.005, F(.95; 1, 18) = 4.41, so conclude Ha (reject H0)
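A sketch of this F-test in Python, using the sums of squares reported later in the slides (SSR = 6.434, SSTO = 9.84, so SSE = SSTO − SSR):

```python
from scipy import stats

SSR, SSTO = 6.434, 9.84
SSE = SSTO - SSR
n, alpha = 20, 0.05

MSR = SSR / 1
MSE = SSE / (n - 2)
F_star = MSR / MSE                                  # ~34.0; the slides report 34.005

F_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)   # F(.95; 1, 18) = 4.41
print("conclude Ha" if F_star > F_crit else "conclude H0")
```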

27 Relationship Between F and t
In simple regression (i.e. when a single predictor variable is employed), for a given α: $F^* = (t^*)^2$, where t* is the two-tailed t statistic
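For the GMAT example this checks out up to rounding: $(t^*)^2 = (5.831)^2 \approx 34.00$, and F* = 34.005.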

28 General Linear Test Approach

29 Approach Involves...
Fit a 'full model' to the data and obtain SSE(F): simply the SSE obtained from fitting the standard regression line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ to the data
Fit a 'reduced model' to the data and obtain SSE(R): consider H0 (usually H0: β1 = 0); the model when H0 holds is the reduced model
When β1 = 0, the model reduces to $Y_i = \beta_0 + \varepsilon_i$
Because the best estimator of β0 is then $\bar{Y}$, SSE(R) = $\sum (Y_i - \bar{Y})^2$ = SSTO

30 Test Statistic
$F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F}$
Since SSE(F) = SSE, $df_F = n - 2$; since SSE(R) = SSTO, $df_R = n - 1$
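A minimal Python sketch of the general linear test for simple regression; the function name is ours, and the inputs are assumed to be NumPy arrays:

```python
import numpy as np

def general_linear_test_F(X, Y):
    """F* for H0: beta1 = 0 via the full-vs-reduced-model comparison."""
    n = len(X)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    sse_full = np.sum((Y - (b0 + b1 * X)) ** 2)  # SSE(F) = SSE,  df_F = n - 2
    sse_red = np.sum((Y - Y.mean()) ** 2)        # SSE(R) = SSTO, df_R = n - 1

    return ((sse_red - sse_full) / 1) / (sse_full / (n - 2))
```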

31 Coefficient of Determination
r² is the coefficient of determination: $r^2 = SSR/SSTO = 1 - SSE/SSTO$, with $0 \le r^2 \le 1$
r² is the 'proportion of variance in the criterion associated with the use of the predictor'
When all observations fall directly on the regression line, the predictor perfectly explains all variation in the criterion and r² = 1
When the regression line is horizontal (b1 = 0), SSE = SSTO and r² = 0
(Caveat: what happens when the line is horizontal but all points fall on it?)

32 GMAT Example: r² = SSR/SSTO = 6.434/9.84 = .654

33 Measures of Strength of Association
(The coefficient of correlation)

34 Correlation Coefficient
Pearson's product-moment correlation, rXY: the correlation between two continuous variables measured at least at the interval level
$-1 \le r_{XY} \le +1$
Actually, ... (prove this as an exercise)

35 Correlation Coefficient
Pearson's product-moment correlation (continued): r is the correlation between two continuous variables measured at least at the interval level
Unlike r², r does not have a clear-cut interpretation
Used extensively in behavioral research
Inflates the apparent relationship between X and Y (since $|r| \ge r^2$)

36 Relationship Between r and b1
When the data are in their original metric: $b_1 = r_{XY} \dfrac{s_Y}{s_X}$
When the data are standardized: $b_1 = r_{XY}$
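These identities are easy to verify numerically; a short Python check on data invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Original metric: b1 = r * (s_Y / s_X)
assert np.isclose(b1, r * y.std(ddof=1) / x.std(ddof=1))

# Standardized data: the slope equals r itself
zx, zy = (x - x.mean()) / x.std(ddof=1), (y - y.mean()) / y.std(ddof=1)
b1_std = np.sum(zx * zy) / np.sum(zx ** 2)
assert np.isclose(b1_std, r)
```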

37 Limitations of r & r²
High r or r² may not imply strong predictive capability: r's as high as .9 (r² = .81) can still give wide confidence intervals for the estimate; always compute confidence or prediction intervals
Does a high r or r² always mean the regression line is a good fit? Only if the relationship is linear; you can still get a relatively high r or r² when the relationship is curvilinear

38 Limitations of r & r²
Does a low r or r² always suggest that X and Y are not related, or are only weakly related? Only if the relationship is linear; you can still get a very low r or r² when the relationship is curvilinear
Transforming X, Y, or both prior to constructing the regression model may improve the fit (a later topic)

39 Formulae for r
When X and Y are standardized (using sample standard deviations): $r = \dfrac{\sum z_X z_Y}{n - 1}$
When X and Y are in raw-score (non-standardized) form: $r = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$
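Both formulae in Python, checked against each other on invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)
n = len(x)

# Raw-score form
dx, dy = x - x.mean(), y - y.mean()
r_raw = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Standardized form (sample standard deviations, ddof=1)
zx, zy = dx / x.std(ddof=1), dy / y.std(ddof=1)
r_std = np.sum(zx * zy) / (n - 1)

assert np.isclose(r_raw, r_std)
assert np.isclose(r_raw, np.corrcoef(x, y)[0, 1])
```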

40 Formulae for r
Is rXY a least-squares estimator or an MLE estimator? Is this estimator unbiased? What is it an "estimator" of? Isn't this the same as "r" from a simple linear regression model?
Point-biserial "r" (one variable dichotomous)

41 Formulae for r Phi Coefficient (Both X and Y dichotomous)

42 Inferences on Correlation Coefficients
Assume a bivariate normal population; the interpretation of $r_{XY}^2$ is important
Testing H0: $\rho_{XY} = 0$ (relate this to simple linear regression!)
If H0 holds, then $t^* = \dfrac{r_{XY}\sqrt{n - 2}}{\sqrt{1 - r_{XY}^2}} \sim t_{n-2}$
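A small Python helper for this test (the function name is ours); with r = √.654 ≈ .809 and n = 20 from the GMAT example it reproduces t* ≈ 5.83:

```python
import numpy as np
from scipy import stats

def corr_t_test(r, n):
    """t* for H0: rho_XY = 0; distributed as t_{n-2} when H0 holds."""
    t_star = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t_star), df=n - 2)
    return t_star, p

t_star, p = corr_t_test(r=0.809, n=20)  # r = sqrt(.654), from the GMAT example
```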

43 Advanced Inference on the Correlation Coefficient
(Optional material)

44 Interval Estimation of ρXY
The sampling distribution of rXY is complicated when $\rho_{XY} \neq 0$: cannot use "t"!
If $n \ge 25$, use the Fisher z transformation: $z' = \tfrac{1}{2}\ln\dfrac{1 + r_{XY}}{1 - r_{XY}}$
Then z' is approximately normal with mean $\zeta = \tfrac{1}{2}\ln\dfrac{1 + \rho_{XY}}{1 - \rho_{XY}}$ and standard deviation $\sigma\{z'\} = 1/\sqrt{n - 3}$

45 Estimation of ρXY (continued)
Then the CI for ζ is $z' \pm z_{1-\alpha/2}/\sqrt{n - 3}$
We have to transform back in order to get the CI for ρXY:
$\tanh\!\big(\operatorname{arctanh}(r_{XY}) - z_{1-\alpha/2}/\sqrt{n - 3}\big) < \rho_{XY} < \tanh\!\big(\operatorname{arctanh}(r_{XY}) + z_{1-\alpha/2}/\sqrt{n - 3}\big)$
(note that $\operatorname{arctanh}(r_{XY}) = z'$, and tanh is its inverse)
See KNN for testing hypotheses about independent samples from two bivariate normal populations
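A sketch of the retransformation in Python (np.arctanh is the Fisher z transformation and np.tanh undoes it); the example values of r and n are invented:

```python
import numpy as np
from scipy import stats

def fisher_ci(r, n, alpha=0.05):
    """Approximate CI for rho_XY via the Fisher z transformation (n >= 25)."""
    z = np.arctanh(r)                               # z' = 0.5 * ln((1 + r)/(1 - r))
    half = stats.norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    return np.tanh(z - half), np.tanh(z + half)

lo, hi = fisher_ci(r=0.80, n=30)  # illustrative values only
```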

46 What If the Population Is Not Normal?
Resort to a non-parametric approach: the famous Spearman rank correlation coefficient $r_s$, which is Pearson's r computed on the ranks of the X and Y values
If there are no ties in the ranks, we can use the more commonly found approximation $r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the ranks of $X_i$ and $Y_i$
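A sketch in Python comparing SciPy's Spearman coefficient to the no-ties shortcut formula; the data are invented and tie-free:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=40)
y = np.exp(x) + rng.normal(scale=0.1, size=40)  # monotone but nonlinear relation

rho, p = stats.spearmanr(x, y)   # Pearson's r applied to the ranks

# No-ties shortcut: r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
d = stats.rankdata(x) - stats.rankdata(y)
n = len(x)
r_s = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

assert np.isclose(rho, r_s)
```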

47 Hypothesis Test for the Population Correlation Coefficient
H0: there is no association between X and Y; Ha: there is an association between X and Y
The sampling distribution of $r_s$ is available in tables and is not too complicated
However, when $n > 10$ we can use $t^* = \dfrac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}} \sim t_{n-2}$, as in the normal case
Spearman's rank correlation coefficient is also used to test for heteroscedasticity

