Published by Madeleine Melling. Modified over 2 years ago.

1

2
GY2100 Geographical Data Analysis
Lecture 4: Regression analysis and statistical inference
DEPARTMENT OF GEOGRAPHY

3
The statistical utility of a regression line

4
The regression model and its underlying assumptions
$Y_i = \alpha + \beta X_i + \delta_i$
Systematic or deterministic component ($\alpha + \beta X_i$): represented by a straight line
Random or stochastic component ($\delta_i$): represented by the deviations of the observations about the line
Symbols: $\alpha$ (alpha), $\beta$ (beta), $\delta$ (delta)
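As a minimal sketch of the two components, the following Python snippet builds observations from a straight line plus random disturbances. The parameter values (alpha = 2.0, beta = 0.5, sigma = 1.0), the X values and the function name `simulate` are illustrative assumptions, not taken from the lecture.

```python
import random

def simulate(x_values, alpha, beta, sigma, seed=0):
    """Return (systematic, disturbance, y) for Y_i = alpha + beta*X_i + delta_i."""
    rng = random.Random(seed)
    # Systematic (deterministic) component: points on the straight line.
    systematic = [alpha + beta * x for x in x_values]
    # Random (stochastic) component: normal disturbances with mean 0.
    disturbance = [rng.gauss(0.0, sigma) for _ in x_values]
    y = [s + d for s, d in zip(systematic, disturbance)]
    return systematic, disturbance, y

x = [0, 5, 10, 15, 20]  # hypothetical fixed X values
systematic, disturbance, y = simulate(x, alpha=2.0, beta=0.5, sigma=1.0)

for xi, si, di, yi in zip(x, systematic, disturbance, y):
    print(f"X={xi:>2}  line={si:5.2f}  disturbance={di:+5.2f}  Y={yi:5.2f}")
```

Each observed Y is the value on the line plus its own disturbance, so rerunning with a different seed changes the Y values but not the line.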

5
Illustrating regression using the fixed X model

6
Assumptions of the regression model
1. The relationship between X and Y is linear;
2. Values of X are fixed and measured without error;
3. The disturbance terms $\delta_i$ are normally distributed with equal variance about the line $Y = \alpha + \beta X$, and each has an expected value $E(\delta_i) = 0$. This means that the expected value of Y for a given value of X is $E(Y_i \mid X_i) = \alpha + \beta X_i$;

7
Assumptions of the regression model
4. The $\delta_i$ are statistically uncorrelated:
   a. There is no autocorrelation ($\delta_i$ term is uncorrelated with X)
   b. There is no spatial autocorrelation ($\delta_i$ are not correlated with another variable).

8
Testing the assumptions
1. Specific statistical tests using residuals of the sample regression line as estimates of the error term in the true population regression model
2. Histogram of residuals
3. Examination of residual plots
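The residual checks listed above start from the residuals of a fitted line. The sketch below fits a least-squares line to a small hypothetical data set (the numbers and the helper name `fit_line` are assumptions made for illustration) and prints a crude text version of a residual plot.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

a, b = fit_line(x, y)
# Residual = observed Y minus the Y predicted by the sample regression line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Text "residual plot": one row per observation, bar length ~ |residual|.
for xi, ri in zip(x, residuals):
    sign = "+" if ri >= 0 else "-"
    bar = "#" * round(abs(ri) * 20)
    print(f"X={xi}  residual={ri:+.3f}  {sign}{bar}")
```

With a least-squares fit the residuals always sum to zero, so the checks concern their pattern (trend, changing spread, skewness) rather than their mean.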

9
Examination of residual plots

10
Recap
$Y_i = \alpha + \beta X_i + \delta_i$

11
Inferences in regression analysis
Slope parameter ($\beta$)
Intercept parameter ($\alpha$)
Precision of estimates derived from the sample regression equation

12
Testing if $\beta = 0$
If $\beta = 0$ then Y = constant for all X

13
Testing $\beta = 0$ using the t test
Hypotheses
$H_0: \beta = 0$
$H_1: \beta \neq 0$
Under $H_0$, repeated sampling yields a distribution of b which follows a t distribution about an expected value of $\beta = 0$.
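A small simulation can illustrate the idea of repeated sampling under the null hypothesis. The sketch below assumes fixed X values 0 to 19, a true intercept of 5, a true slope of 0 and normal disturbances with standard deviation 1; all of these, and the helper name `slope`, are illustrative choices rather than values from the lecture. It shows only that the sample slopes scatter about zero; the t distribution itself arises once each slope is divided by its estimated standard error.

```python
import random

def slope(x, y):
    """Least-squares slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(42)
x = list(range(20))                  # fixed X values (assumed)
alpha, beta, sigma = 5.0, 0.0, 1.0   # the null hypothesis is true: beta = 0

# Draw 2000 samples from the same model and record the slope of each.
slopes = []
for _ in range(2000):
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
    slopes.append(slope(x, y))

mean_b = sum(slopes) / len(slopes)
print(f"mean of sample slopes: {mean_b:+.4f}")
print(f"smallest and largest slope: {min(slopes):+.3f}, {max(slopes):+.3f}")
```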

14
Testing $\beta = 0$ using the t test
The test statistic to be calculated is $t = \dfrac{b - \beta}{s_b}$
Since we are testing $\beta = 0$, then $t = \dfrac{b}{s_b}$
where df = n - 2 is the number of degrees of freedom
$s_b$ = estimated standard error of the sampling distribution of b
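The calculation can be sketched from raw data as follows. The data values and the function name `slope_t_test` are hypothetical; the standard error is computed with the usual simple-regression formula, the square root of the residual mean square divided by the sum of squared deviations of X, which is stated here as an assumption about how $s_b$ is obtained since the slide does not show it.

```python
import math

def slope_t_test(x, y, beta_0=0.0):
    """Return (b, s_b, t, df) for testing H0: beta = beta_0 in y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    resid_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                              # degrees of freedom
    s_b = math.sqrt(resid_ss / df / sxx)    # estimated standard error of b
    t = (b - beta_0) / s_b                  # test statistic
    return b, s_b, t, df

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

b, s_b, t, df = slope_t_test(x, y)
print(f"b = {b:.3f}, s_b = {s_b:.3f}, t = {t:.2f}, df = {df}")
```

The computed t is then compared with the tabulated critical value of t for the same df. Passing a non-zero `beta_0` tests the slope against another hypothesised value.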

15
Regression in EXCEL
Regression Statistics
Multiple R        0.78
R square          0.61
Adj R square      0.59
Standard error    242.79
Observations      20
ANOVA
               df   SS        MS        F      Sig F
Regression     1    1691558   1691558   28.7   4.31E-5
Residual       18   1061062   58948
Total          19   2752620
               Coeffs   Standard Error   t stat   P-value    Lower 95%   Upper 95%
Intercept      895      149.76           5.98     1.178E-5   581         1210
Elevation, m   2.38     0.444            5.36     4.31E-5    1.44        3.31
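The summary figures in the Excel output can be reproduced from the ANOVA sums of squares. The sketch below uses only the sums of squares and the number of observations from the output; the variable names are illustrative, and the adjusted R square formula is the standard one for a single predictor.

```python
import math

# Values copied from the Excel ANOVA table.
n = 20
ss_regression = 1691558
ss_residual = 1061062
ss_total = 2752620

r_square = ss_regression / ss_total                      # share of variation explained
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)    # one predictor
standard_error = math.sqrt(ss_residual / (n - 2))        # root of residual mean square

print(f"Multiple R      {multiple_r:.2f}")
print(f"R square        {r_square:.2f}")
print(f"Adj R square    {adj_r_square:.2f}")
print(f"Standard error  {standard_error:.2f}")
```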

16
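Testing $\beta = 0$ using the t test
Since we are testing $\beta = 0$, the test statistic to be calculated is $t = \dfrac{b}{s_b}$
So, with b = 2.38 and $s_b$ = 0.444 from the Excel output, $t = 2.38 / 0.444 = 5.36$
With df = n - 2 = 18, $t_{crit} = 2.1$ at $\alpha = 0.05$.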

17
Regression in EXCEL (same output as on slide 15)

18
Testing $\beta = Q$ using the t test ($Q \neq 0$)
The test statistic to be calculated is $t = \dfrac{b - Q}{s_b}$
If we are testing $\beta = 4$, then $t = \dfrac{b - 4}{s_b}$
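As a worked example, the values b = 2.38 and s_b = 0.444 from the Excel output and the critical value 2.1 quoted for 18 df can be substituted into this statistic. The arithmetic below is a reconstruction under those values, not a figure taken from the slide.

```latex
t = \frac{b - Q}{s_b} = \frac{2.38 - 4}{0.444} \approx -3.65,
\qquad |t| = 3.65 > t_{crit} = 2.1 \quad (df = 18,\ \alpha = 0.05)
```

On these numbers the hypothesis that the slope equals 4 would be rejected in a two-sided test at the 5% level.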

19
Testing $\beta = 0$ using the F test
Decomposition of the variance using sums of squares
Total sum of squares = Regression sum of squares + Residual sum of squares

20
Testing $\beta = 0$ using the F test
Decomposition of the variance using sums of squares
Total sum of squares (TSS) is the sum of the squared deviations of the individual observations about their mean
Regression sum of squares (RSS) is the sum of the squared deviations of the predicted Y values ($\hat{Y}$) about the mean $\bar{Y}$
The difference between TSS and RSS is unexplained by the regression line and is therefore the residual sum of squares (Residual SS)
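The decomposition can be checked numerically. The sketch below fits a least-squares line to a small hypothetical data set and computes the three sums of squares directly from their definitions; the data and the function name `sums_of_squares` are assumptions made for illustration.

```python
def sums_of_squares(x, y):
    """Return (TSS, RSS, Residual SS) for the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    y_hat = [a + b * xi for xi in x]                              # predicted Y values
    tss = sum((yi - mean_y) ** 2 for yi in y)                     # observations about their mean
    rss = sum((yh - mean_y) ** 2 for yh in y_hat)                 # predictions about the mean
    resid_ss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # observations about the line
    return tss, rss, resid_ss

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

tss, rss, resid_ss = sums_of_squares(x, y)
print(f"TSS = {tss:.4f}")
print(f"RSS + Residual SS = {rss + resid_ss:.4f}")
```

The two printed values agree because, for a least-squares line with an intercept, the residuals are uncorrelated with the predicted values.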

21
Testing $\beta = 0$ using the F test
Decomposition of the variance using sums of squares
TSS = RSS + Resid SS

22
Regression in EXCEL (same output as on slide 15)

23
Regression in Excel
ANOVA
               df     SS         MS            F                Sig F
Regression     1      RSS        RSS/df        RMSS/Resid MSS
Residual       n-2    Resid SS   Resid SS/df
Total          n-1    TSS        TSS/df
RMSS = Regression MSS
Resid MSS = Residual MSS

24
Regression in Excel
ANOVA
               df   SS        MS        F      Sig F
Regression     1    1691558   1691558   28.7   4.31E-5
Residual       18   1061062   58948
Total          19   2752620
With 1 and 18 df, $F_{crit} \approx 4.41$ at $\alpha = 0.05$.
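The F statistic in this table follows from the two mean squares. The sketch below recomputes it from the sums of squares and degrees of freedom in the Excel output. The critical value 4.41 is the standard tabulated 5% point of F with 1 and 18 df (the square of the two-sided t value 2.101) and is supplied here as an input for this sketch, not computed.

```python
# Values copied from the Excel ANOVA table.
df_regression, ss_regression = 1, 1691558
df_residual, ss_residual = 18, 1061062

ms_regression = ss_regression / df_regression   # regression mean square (RMSS)
ms_residual = ss_residual / df_residual         # residual mean square (Resid MSS)
f_stat = ms_regression / ms_residual

# Tabulated 5% critical value of F with 1 and 18 df (assumed input).
f_crit = 4.41
reject_h0 = f_stat > f_crit

print(f"Residual MS = {ms_residual:.0f}")
print(f"F = {f_stat:.1f}, F crit = {f_crit}, reject H0 (beta = 0): {reject_h0}")
```

With a single predictor the F statistic equals the square of the t statistic for the slope, which is why the t stat of 5.36 and the F of 28.7 lead to the same decision.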

25
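Constructing a confidence interval for $\beta$
In general: $b - t_{crit}\, s_b \le \beta \le b + t_{crit}\, s_b$
For the rainfall data, the 95% interval reported in the Excel output is $1.44 \le \beta \le 3.31$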
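The interval can be recomputed from the rounded coefficients and standard errors in the Excel output. In the sketch below the function name `confidence_interval` is illustrative, and 2.101 is the standard two-sided 5% table value of t with 18 df, supplied as an input. Because the inputs are rounded, the bounds may differ from Excel's in the last digit.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) bounds: estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

t_crit = 2.101   # two-sided 5% point of t with 18 df (table value, assumed input)

slope_ci = confidence_interval(2.38, 0.444, t_crit)       # Elevation, m
intercept_ci = confidence_interval(895, 149.76, t_crit)   # Intercept

print(f"slope:     {slope_ci[0]:.2f} to {slope_ci[1]:.2f}")
print(f"intercept: {intercept_ci[0]:.0f} to {intercept_ci[1]:.0f}")
```

Neither interval contains zero, which is consistent with the t tests rejecting a zero slope and a zero intercept at the 5% level.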

26
Regression in EXCEL (same output as on slide 15)
