Presentation is loading. Please wait.

Presentation is loading. Please wait.

COMPUTATIONAL ASPECTS OF LOCAL REGRESSION MODELLING: taking spatial analysis to another level Stewart Fotheringham Martin Charlton Chris Brunsdon Spatial.

Similar presentations


Presentation on theme: "COMPUTATIONAL ASPECTS OF LOCAL REGRESSION MODELLING: taking spatial analysis to another level Stewart Fotheringham Martin Charlton Chris Brunsdon Spatial."— Presentation transcript:

1 COMPUTATIONAL ASPECTS OF LOCAL REGRESSION MODELLING: taking spatial analysis to another level Stewart Fotheringham Martin Charlton Chris Brunsdon Spatial Analysis Research Group Department of Geography, University of Newcastle upon Tyne, Newcastle upon Tyne, ENGLAND NE1 7RU

2 GEOGRAPHICALLY WEIGHTED REGRESSION The mechanics of GWR New software for GWR: GWR 2.0 GWR in practice: an example of the determinants of London house prices Won’t discuss the math of GWR in much detail

3 Some Definitions Spatial nonstationaritySpatial nonstationarity exists when the same stimulus provokes a different response in different parts of the study region Global modelsGlobal models are statements about processes which are assumed to be stationary and as such are location independent Local modelsLocal models are spatial disaggregations of global models, the results of which are location-specific

4 Local versus Global LocalglobaldataLocal versus global data: the example of US climate data LocalglobalrelationshipsLocal versus global relationships: the example of house price determinants LocalglobalmodelsLocal versus global models: the example of regression

5 Why might relationships vary spatially? Sampling variation Relationships intrinsically different across space e.g. differences in attitudes, preferences or different administrative, political or other contextual effects produce different responses to the same stimuli Model misspecification - suppose a global statement can ultimately be made but models not properly specified to allow us to make it. Local models good indicator of how model is misspecified. Can all contextual effects ever be removed? Can all significant variations in local relationships be removed?

6 Regression In a typical linear regression model applied to spatial data we assume a stationary process: y i =  0 +  1 x 1i +  2 x 2i +…  n x ni +  i

7 so that... The parameter estimates obtained in the calibration of such a model are constant over space:  ’ = (X T X) -1 X T Y which means that any spatial variations in the processes being examined can only be measured by the error term

8 Consequently... We might map the residuals from the regression to determine whether there are any spatial patterns. Or compute an autocorrelation statistic We might even try to ‘model’ the error dependency with various types of spatial regression models.

9 However... Why not address the issue of spatial nonstationarity directly and allow the relationships we are measuring to vary over space? This is the essence of GWR y( g ) =  0 ( g ) +  1 ( g ) x 1 +  2 (g) x 2 +…  n (g) x n +  ( g ) where (g) refers to a location at which estimates of the parameters are obtained

10 … with the estimator  ’ = (X T W (g) X) -1 X T W (g) Y where W (g) is a matrix of weights specific to location g such that observations nearer to g are given greater weight than observations further away.

11 w g1 0.……..…..0 0 w g2 …..……..0 W(g)=0 0 w g3 ……..0.. 0 0 0 ………w gn where w gn is the weight given to data point n for the estimate of the local parameters at location g

12 Weighting schemes fixedadaptive Numerous weighting schemes can be used. They can be either fixed or adaptive. Two examples of a fixed weighting scheme are the Gaussian function: w ij = exp[-(d ij 2 / h 2 )/2] where h is known as the bandwidth and controls the degree of distance-decay and the bisquare function: w ij = [1-(d ij 2 / h 2 )] 2 if d ij < h = 0 otherwise

13

14

15 Perhaps better... Is to use a spatially adaptive weighting function such as: w ij = exp(-R ij / h) where R is the ranked distance or w ij = [1-(d ij 2 / h 2 )] 2 if j is one of the Nth nearest neighbours of i = 0 otherwise In the latter, we estimate an optimal value of N in the GWR routine

16

17 Calibration The results of GWR appear to be relatively insensitive to the choice of weighting function as long as it is a continuous distance-based function Whichever weighting function is used, the results will, however, be sensitive to the degree of distance-decay. Therefore an optimal value of either h or N has to be obtained. This can be found by minimising a crossvalidation score or the Akaike Information Criterion

18 where... CV =  i [y i - y  i ’ (h) ] 2 where y  i ’ (h) is the fitted value of y i with data from point i omitted from the calibration AIC = 2n ln(  ’) + n ln(2  ) SS + n[n+ Tr(S)] / [n - 2 - Tr(S)] S where n is the number of data points,  ’ is the estimated standard deviation of the error term, and Tr(S) is the trace of the hat matrix. y’ = Sy

19 GWR Jargon Sample pointsSample points –locations at which your data are measured Regression pointsRegression points –locations at which you require parameter estimates These need not be the same locations This can be handy if you want to map the results from very large data sets

20 In GWR, we can also... estimate local standard errors derive local t statistics calculate local goodness-of-fit measures calculate local leverage measures perform tests to assess the significance of the spatial variation in the local parameter estimates perform tests to determine if the local model performs better than the global one, accounting for differences in degrees of freedom

21 Software for GWR (GWR 2.0) User-friendly and menu-driven Provide a range of options Provide some helpful diagnostics Provide results for other software

22 The architecture of GWR 2.0 Currently about 3000 lines of FORTRAN VB front-end to create a control file and run the program Can run FORTRAN code under Unix with a control file or, more easily, by using the VB model editor in Windows (95; 98; ME)

23 Running the program There are several routes through the program depending on what you want to do

24 Creating a new model Data Dependent variable Independent variables Sample point locational variables Regression point locational variables Weighting scheme/fitting Output/listing

25 Data first... Usual Windows form Assumes filetype of.dat

26 Data structure First line is a comma separated list of variable names (<= 8 characters) Data lines have numeric items only terminated by a carriage return One line of data per location Space or comma delimited (easily imported)

27 User’s decision Are the sample points and regression points identical?

28 Output File This will appear in the results folder What type do you want?

29 Types of output Currently the program will output parameter estimates and other information in these formats: –ArcINFO ungenerated export.e00 –MapInfo Interface File.mif –Comma Separated Variables.csv

30 The Model Editor The variable names have already been extracted from the file

31 The Model Editor To specify a dependent variable, highlight it in the list on the left and click on the [->] symbol

32 The Model Editor To specify independent variables, highlight them in the list on the left and click on the [- >] symbol

33 The Model Editor To specify location variables, highlight them in the list on the left and click on the relevant [->] symbols

34 The Model Editor To specify a weight variable, highlight it in the list on the left and click on the [->] symbol

35 The Model Editor Next you specify the type of kernel: this can be fixed (Gaussian) or variable (bisquare)

36 The Model Editor You can either preset the bandwidth in the units that the location variables are measured in (for example, metres)

37 Or if you want the program to determine the optimal bandwidth, specify crossvalidation. For large files, there is a sampling option to speed the process.

38 The Model Editor The parameter estimates can be output in different formats..csv is text format. ArcInfo and MapINFO will load directly.

39 The Model Editor The type of output in the printed listing can also be controlled.

40 The Model Editor Once the model specification is completed, save it before you run it. The file appears in the model folder.

41 Running the Model The saved model is identified. You must also specify a listing file for the printed output Then hit Run

42 Dependent (y) variable..........PctBatch Easting (x-coord) variable.....Longitud Northing (y-coord) variable.....Latitude No weight variable specified Dependent variables in this model... PctPov Kernel type: Variable width Determine kernel size using crossvalidation... ** Crossvalidation report requested ** Prediction report requested ** Parameter estimates to be written to.e00 file ** Monte Carlo significance tests for spatial variation ** Casewise diagnostics to be printed ** Regression models will include intercept Number of data cases read: 159 Observation points read... Dependent mean= 10.9471693 Number of observations, nobs= 159 Number of predictors, nvar= 1 Obseration Easting extent: 4.41947222 Observation Northing extent: 4.20193577 The output listing Some useful statistics

43 CrossValidation begins... CV will be based on 159 cases Initial estimates for bandwidth 2.5 159. Bandwidth AIC 984.664747 985.627297 … … … … … … … … … … ** Convergence: NNN= 23 Calibration report In this case we are using an adaptive kernel. Each kernel will contain the locations of the 23 nearest sample points to each regression point.

44 ********************************************************** * GLOBAL REGRESSION PARAMETERS * ********************************************************** Diagnostic information... Residual sum of squares........ 4300.93244 Effective degrees of freedom... 2. Sigma.......................... 5.23397307 Akaike Information Criterion... 981.708877 Parameter Estimate Std Err T --------- ------------ ------------ ------------ Intercept 17.048159406898 1.185353792053 14.382338523865 PctPov -0.315445286712 0.057407022204 -5.494890213013 The parameter estimates for the “global” model are reported first. Both parameters are significant. Global Results

45 ********************************************************** * GWR ESTIMATION * ********************************************************** Fitting Geographically Weighted Regression Model... Number of observations............ 159 Number of independent variables... 2 (Intercept is variable 1) Number of nearest neighbours...... 23 Number of locations to fit model.. 159 Diagnostic information... Residual sum of squares........ 2400.32563 Effective degrees of freedom... 33.2254127 Sigma.......................... 4.36856328 Akaike Information Criterion... 970.752344 The equivalent diagnostic information for the GWR model is printed. The AIC value is smaller than that of the global model, suggesting that the GWR model is “better”. GWR Results

46 Predictions from this model... Obs Y(i) Yhat(i) Res(i) X(i) Y(i) 1 8.200 8.051 0.149 -82.286 31.753 F 2 6.400 7.592 -1.192 -82.875 31.295 F 3 6.600 8.117 -1.517 -82.451 31.557 F 4 9.400 10.276 -0.876 -84.454 31.331 F 5 13.300 10.856 2.444 -83.251 33.072 F 6 6.400 13.254 -6.854 -83.501 34.353 F 7 9.200 16.463 -7.263 -83.712 33.993 F If you have requested an output file, this information and the diagnostics are also written to this file.

47 ********************************************************* * PARAMETER 5-NUMBER SUMMARIES * ********************************************************** Label Minimum Lwr Quartile Median Upr Quartile Maximum Intrcept 0.032515 12.992820 16.978909 20.477042 28.491125 PctPov -1.148710 -0.573805 -0.287116 -0.124888 0.806435 As there are as many sets of parameter estimates as there are regression points a “5 number summary” is provided to give the main “highlights” of each distribution. As well as the maximum, minimum and median, the program prints the upper and lower quartiles.

48 ************************************************* * * Test for spatial variability of parameters * * ************************************************* Tests based on the Monte Carlo significance test procedure due to Hope [1968,JRSB,30(3),582-598] Parameter p-value ---------- ------------------ Intercept 0.75000 PctPov 0.98000 It is useful to know whether there is sufficient evidence to suggest that the variation in parameter estimates is not random. Two tests have been programmed: a Monte Carlo significance test and a test after Leung.

49 To import the results into ArcVIEW, use the Import71 utility. This creates an ArcINFO point coverage. Parameter estimates

50 Spatial variation in the intercept term Spatial variation in the intercept term

51 Spatial variation in the poverty parameter

52 What’s planned for GWR 3.0 ? Regression through the origin Further kernel types Poisson regression Further export options Other suggestions from users of GWR 2.0 The GWR website is: http://www.ncl.ac.uk/geography/GWR Details of program availability given


Download ppt "COMPUTATIONAL ASPECTS OF LOCAL REGRESSION MODELLING: taking spatial analysis to another level Stewart Fotheringham Martin Charlton Chris Brunsdon Spatial."

Similar presentations


Ads by Google