# Part B: Spatial Autocorrelation and regression modelling

Chapter 5 Part B: Spatial Autocorrelation and regression modelling

Autocorrelation Time series correlation model
$\{x_{t,1}\},\ t = 1, 2, 3, \dots, n-1$ and $\{x_{t,2}\},\ t = 2, 3, 4, \dots, n$
3rd edition

Spatial Autocorrelation
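- Correlation coefficient: $\{x_i\},\ i = 1, 2, 3, \dots, n$ and $\{y_i\},\ i = 1, 2, 3, \dots, n$
- Time series correlation model: $\{x_{t,1}\},\ t = 1, 2, 3, \dots, n-1$ and $\{x_{t,2}\},\ t = 2, 3, 4, \dots, n$
- Mean values of the two offset series
- Lag 1 autocorrelation, and its simplified form for large $n$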
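The lag 1 autocorrelation for a time series can be sketched in a few lines. This is a minimal illustration, assuming the standard Pearson correlation applied to the two offset series and, for large $n$, the usual simplification that uses one overall mean; the function names and the sample series are hypothetical and are not taken from the slides.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Pearson correlation between the series and itself shifted by one step."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-1], x[1:]                 # {x_t,1}, t=1..n-1 and {x_t,2}, t=2..n
    da, db = a - a.mean(), b - b.mean()  # each offset series has its own mean
    return float(np.sum(da * db) / np.sqrt(np.sum(da**2) * np.sum(db**2)))

def lag1_autocorrelation_large_n(x):
    """Large-n simplification: a single overall mean and overall sum of squares."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d**2))

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # illustrative data only
r_exact = lag1_autocorrelation(series)
r_approx = lag1_autocorrelation_large_n(series)
```

For this very short series the two values differ noticeably (the offset series are perfectly correlated, while the simplified form is pulled down by the end effects), which is why the simplified form is only recommended when $n$ is large.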

Spatial Autocorrelation
- Classical statistical model assumptions
- Independence vs dependence in time and space
- Tobler’s first law: “All things are related, but nearby things are more related than distant things”
- Spatial dependence and autocorrelation
- Correlation and correlograms

Spatial Autocorrelation
- Covariance and autocovariance
- Lags: fixed or variable interval
- Correlograms and range
- Stationary and non-stationary patterns
- Outliers
- Extending the concept to the spatial domain
  - Transects
  - Neighbourhoods and distance-based models

Spatial Autocorrelation
- Global spatial autocorrelation
  - Dataset issues: regular grids; irregular lattice (zonal) datasets; point samples
  - Simple binary coded regular grids: use of joins counts
  - Irregular grids and lattices: extension to x,y,z data representation
  - Use of x,y,z model for point datasets
- Local spatial autocorrelation
  - Disaggregating global models

Spatial Autocorrelation
Joins counts (50% 1’s): A. completely separated pattern (+ve); B. evenly spaced pattern (-ve); C. random pattern

Spatial Autocorrelation
Joins count
- Binary coding
- Edge effects
- Double counting
- Free vs non-free sampling
- Expected values (free sampling): 1-1 = 15/60, 0-0 = 15/60, 0-1 or 1-0 = 30/60
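A short sketch of a joins count may help make these expected values concrete. It assumes rook's-move adjacency, counts each join once (so no double counting), and uses a 6 by 6 grid, which has 60 joins and therefore matches the denominators on the slide; the grid size, the function names and the two test patterns are assumptions for illustration.

```python
import numpy as np

def join_counts(grid):
    """Return (1-1, 0-0, 0-1) join counts under rook's-move adjacency."""
    g = np.asarray(grid, dtype=int)
    # Each tuple pairs every cell with its right-hand or lower neighbour,
    # so every join is visited exactly once.
    pairs = [(g[:, :-1], g[:, 1:]), (g[:-1, :], g[1:, :])]
    j11 = sum(int(np.sum((a == 1) & (b == 1))) for a, b in pairs)
    j00 = sum(int(np.sum((a == 0) & (b == 0))) for a, b in pairs)
    j01 = sum(int(np.sum(a != b)) for a, b in pairs)
    return j11, j00, j01

def expected_free_sampling(n_joins, p):
    """Expected joins when each cell is independently 1 with probability p."""
    q = 1.0 - p
    return n_joins * p * p, n_joins * q * q, 2 * n_joins * p * q

separated = np.zeros((6, 6), dtype=int)
separated[:, :3] = 1                              # left half 1's, right half 0's
checker = np.indices((6, 6)).sum(axis=0) % 2      # evenly spaced (chessboard)

counts_separated = join_counts(separated)
counts_checker = join_counts(checker)
expected = expected_free_sampling(60, 0.5)
```

Comparing the observed counts with the expected 15, 15 and 30 shows the direction of the autocorrelation: the separated pattern has far more like-valued joins than expected, the chessboard has none.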

Spatial Autocorrelation
Joins counts: A. completely separated (+ve); B. evenly spaced (-ve); C. random

Spatial Autocorrelation
Joins count: some issues
- Multiple z-scores
- Binary or k-class data
- Rook’s move vs other moves
- First order lag vs higher orders
- Equal vs unequal weights
- Regular grids vs other datasets
- Global vs local statistics
- Sensitivity to model components

Spatial Autocorrelation
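Irregular lattice: (x,y,z) representation and adjacency tables

Cell data, with cell coordinates given as row,col from 1,1 to 4,3 (two of the twelve cells are empty):

| | col 1 | col 2 | col 3 |
|---|---|---|---|
| row 1 | | +4.55 | +5.54 |
| row 2 | +2.24 | -5.15 | +9.02 |
| row 3 | +3.10 | -4.39 | -2.09 |
| row 4 | | +0.46 | -3.06 |

x,y,z view (x = row, y = column):

| x | y | z |
|---|---|---|
| 1 | 2 | 4.55 |
| 1 | 3 | 5.54 |
| 2 | 1 | 2.24 |
| 2 | 2 | -5.15 |
| 2 | 3 | 9.02 |
| 3 | 1 | 3.1 |
| 3 | 2 | -4.39 |
| 3 | 3 | -2.09 |
| 4 | 2 | 0.46 |
| 4 | 3 | -3.06 |

Cell numbering:

| | col 1 | col 2 | col 3 |
|---|---|---|---|
| row 1 | | 3 | 7 |
| row 2 | 1 | 4 | 8 |
| row 3 | 2 | 5 | 9 |
| row 4 | | 6 | 10 |

Adjacency matrix: total number of 1’s = 26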
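The adjacency table can be rebuilt from the (x,y,z) list. The sketch below assumes rook's-move adjacency and assumes that the two empty cells are at row 1, column 1 and row 4, column 1; that placement is inferred from the cell numbering and it reproduces the stated total of 26. The variable and function names are hypothetical.

```python
import numpy as np

# (row, col, z) for the ten occupied cells; the positions of the two empty
# cells, (1,1) and (4,1), are an assumption inferred from the slide.
cells = [(1, 2, 4.55), (1, 3, 5.54), (2, 1, 2.24), (2, 2, -5.15), (2, 3, 9.02),
         (3, 1, 3.10), (3, 2, -4.39), (3, 3, -2.09), (4, 2, 0.46), (4, 3, -3.06)]

def rook_adjacency(cells):
    """Binary adjacency matrix: 1 where two cells share an edge, else 0."""
    n = len(cells)
    W = np.zeros((n, n), dtype=int)
    for i, (ri, ci, _) in enumerate(cells):
        for j, (rj, cj, _) in enumerate(cells):
            if abs(ri - rj) + abs(ci - cj) == 1:   # edge-sharing neighbours only
                W[i, j] = 1
    return W

W = rook_adjacency(cells)
total_ones = int(W.sum())   # each of the 13 joins is counted in both directions
```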

Spatial Autocorrelation
- “Spatial” (auto)correlation coefficient
- Coordinate (x,y,z) data representation for cells
- Spatial weights matrix (binary or other), $W = \{w_{ij}\}$
- From last slide: $\sum w_{ij} = 26$
- Coefficient formulation: desirable properties
  - Reflects co-variation patterns
  - Reflects adjacency patterns via weights matrix
  - Normalised for absolute cell values
  - Normalised for data variation
  - Adjusts for number of included cells in totals

Spatial Autocorrelation
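- Moran’s I: $I = \dfrac{n}{\sum_i \sum_j w_{ij}} \cdot \dfrac{\sum_i \sum_j w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_i (z_i - \bar{z})^2}$
- TSA model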

Spatial Autocorrelation
Moran’s $I = 10 \times 16.19 / (26 \times 196.68) \approx 0.0317$
- A. Computation of variance/covariance-like quantities, matrix C
- B. C*W: adjustment by multiplication of the weighting matrix, W
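The arithmetic on this slide can be checked with a short script. This is a sketch under two assumptions: binary rook's-move weights, and empty cells at row 1, column 1 and row 4, column 1 of the 4 by 3 grid. With those assumptions the script reproduces the slide's three quantities (26, 16.19 and 196.68). The formula used is the usual global Moran's I, $I = \frac{n}{\sum\sum w_{ij}} \cdot \frac{\sum\sum w_{ij}(z_i-\bar z)(z_j-\bar z)}{\sum (z_i-\bar z)^2}$; the names in the code are hypothetical.

```python
import numpy as np

# (row, col, z) for the ten occupied cells; empty-cell positions are assumed.
cells = [(1, 2, 4.55), (1, 3, 5.54), (2, 1, 2.24), (2, 2, -5.15), (2, 3, 9.02),
         (3, 1, 3.10), (3, 2, -4.39), (3, 3, -2.09), (4, 2, 0.46), (4, 3, -3.06)]
coords = np.array([(r, c) for r, c, _ in cells])
z = np.array([v for _, _, v in cells])

# Binary rook weights: cells whose row/col Manhattan distance is exactly 1.
manhattan = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=2)
W = (manhattan == 1).astype(float)

def morans_i(z, W):
    """Return (I, weighted cross-product sum, sum of squared deviations)."""
    d = z - z.mean()
    cross = float(d @ W @ d)            # sum_ij w_ij (z_i - zbar)(z_j - zbar)
    ss = float(np.sum(d**2))            # sum_i (z_i - zbar)^2
    return len(z) * cross / (W.sum() * ss), cross, ss

I, cross, ss = morans_i(z, W)
```

A value this close to zero suggests little or no global spatial autocorrelation in this small lattice.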

Spatial Autocorrelation
Moran’s I: modification for point data
- Replace weights matrix with distance bands, width h
- Pre-normalise z values by subtracting means
- Count number of other points in each band, N(h)

Spatial Autocorrelation
Moran’s I correlogram (figure panels: source data points; lag distance bands, h; correlogram)

Spatial Autocorrelation
Geary’s C
- Co-variation model uses squared differences rather than products
- Similar approach is used in geostatistics
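A minimal sketch of Geary's C follows, assuming the commonly used form $C = \frac{(n-1)\sum_i\sum_j w_{ij}(z_i - z_j)^2}{2\,(\sum_i\sum_j w_{ij})\,\sum_i (z_i-\bar z)^2}$ (the slide's own formula was not preserved). Values below 1 indicate positive spatial autocorrelation and values above 1 indicate negative autocorrelation. The four-cell chain and the function name are illustrative assumptions.

```python
import numpy as np

def gearys_c(z, W):
    """Geary's C from values z and a spatial weights matrix W."""
    z = np.asarray(z, dtype=float)
    W = np.asarray(W, dtype=float)
    n = len(z)
    diff_sq = (z[:, None] - z[None, :]) ** 2        # squared differences, all pairs
    numerator = (n - 1) * np.sum(W * diff_sq)
    denominator = 2.0 * W.sum() * np.sum((z - z.mean()) ** 2)
    return float(numerator / denominator)

# A transect of four cells, each adjacent to its immediate neighbours.
W_chain = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)

c_trend = gearys_c([1, 2, 3, 4], W_chain)        # smooth trend: similar neighbours
c_alternating = gearys_c([1, 4, 1, 4], W_chain)  # alternating: dissimilar neighbours
```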

Spatial Autocorrelation
Extending SA concepts
- Distance formula weights vs bands
- Lattice models with more complex neighbourhoods and lag models (see GeoDa)
- Disaggregation of SA index computations (row-wise) with/without row standardisation (LISA)
- Significance testing
  - Normal model
  - Randomisation models
  - Bonferroni/other corrections

Regression modelling Simple regression – a statistical perspective
- One (or more) dependent (response) variables
- One or more independent (predictor) variables
- Linear regression is linear in coefficients: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
- Vector/matrix form often used: $y = X\beta + \varepsilon$
- Over-determined equations & least squares

Regression modelling Ordinary Least Squares (OLS) model
- Minimise sum of squared errors (or residuals): $\varepsilon^T \varepsilon = (y - X\beta)^T (y - X\beta)$
- Solved for coefficients by matrix expression: $\hat{\beta} = (X^T X)^{-1} X^T y$
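The OLS solution can be illustrated with NumPy. The sketch below assumes the standard normal-equations form $\hat{\beta} = (X^TX)^{-1}X^Ty$ and compares it with `numpy.linalg.lstsq`, which is usually preferred numerically; the data are synthetic and the "true" coefficients are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept column
beta_true = np.array([2.0, 0.5])              # hypothetical coefficients
y = X @ beta_true + rng.normal(0.0, 0.1, size=n)

# Normal equations: solve (X^T X) beta = X^T y rather than forming an inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver applied directly to the over-determined system.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_normal               # orthogonal to the columns of X
```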

Regression modelling OLS – models and assumptions
- Model: simplicity and parsimony
- Model: over-determination, multi-collinearity and variance inflation
- Typical assumptions
  - Data are independent random samples from an underlying population
  - Model is valid and meaningful (in form and statistically)
  - Errors are iid: independent; no heteroskedasticity; common distribution
  - Errors are distributed $N(0, \sigma^2)$

Regression modelling Spatial modelling and OLS
- Positive spatial autocorrelation is the norm, hence dependence between samples exists
- Datasets are often non-Normal, so transformations may be required (Log, Box-Cox, Logistic)
- Samples are often clustered, so spatial declustering may be required
- Heteroskedasticity is common
- Spatial coordinates (x,y) may form part of the modelling process

Regression modelling OLS vs GLS
- OLS assumes no co-variation
- Solution: GLS models co-variation
- $y \sim N(\mu, C)$, where $C$ is a positive definite covariance matrix
- $y = X\beta + u$, where $u$ is a vector of random variables (errors) with mean 0 and variance-covariance matrix $C$
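A sketch of a GLS fit is given below, assuming the standard estimator $\hat{\beta} = (X^T C^{-1} X)^{-1} X^T C^{-1} y$. The covariance matrix here is an exponential distance-decay model along a transect, chosen only because it is positive definite; it, the sample locations and the coefficient values are illustrative assumptions rather than anything specified on the slide.

```python
import numpy as np

def gls(X, y, C):
    """GLS coefficients for y = X beta + u, with cov(u) = C (positive definite)."""
    Ci_X = np.linalg.solve(C, X)      # C^{-1} X without forming the inverse
    Ci_y = np.linalg.solve(C, y)      # C^{-1} y
    return np.linalg.solve(X.T @ Ci_X, X.T @ Ci_y)

s = np.arange(8, dtype=float)                         # sample positions on a transect
C = np.exp(-np.abs(s[:, None] - s[None, :]) / 2.0)    # assumed covariance model
X = np.column_stack([np.ones(len(s)), s])
beta_true = np.array([1.0, 0.5])                      # hypothetical coefficients

rng = np.random.default_rng(3)
u = 0.1 * np.linalg.cholesky(C) @ rng.normal(size=len(s))   # correlated errors
y = X @ beta_true + u

beta_gls = gls(X, y, C)
beta_ols = gls(X, y, np.eye(len(s)))   # with C = I, GLS reduces to OLS
```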

Regression modelling GLS and spatial modelling
- $y \sim N(\mu, C)$, where $C$ is a positive definite covariance matrix ($C$ must be invertible)
- $C$ may be modelled by inverse distance weighting, contiguity (zone) based weighting, explicit covariance modelling…
- Other models
  - Binary data: logistic models
  - Count data: Poisson models

Regression modelling Choosing between models
- Information content perspective and AIC (Akaike Information Criterion)
- In the AIC expression, $n$ is the sample size, $k$ is the number of parameters used in the model, and $L$ is the likelihood function
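The slide's own expression was not preserved, so the sketch below assumes the standard definitions: $\mathrm{AIC} = -2\ln L + 2k$ and the small-sample corrected form $\mathrm{AICc} = -2\ln L + 2k\,\frac{n}{n-k-1}$. The log-likelihood values in the example are made up purely to show how two candidate models would be compared.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: lower values indicate a preferred model."""
    return -2.0 * log_likelihood + 2.0 * k

def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k * n / (n - k - 1)

# Hypothetical comparison: model B fits slightly better but uses more parameters.
n = 34
model_a = aicc(log_likelihood=-100.0, k=3, n=n)
model_b = aicc(log_likelihood=-99.5, k=6, n=n)
preferred = "A" if model_a < model_b else "B"
```

Here the small gain in likelihood does not offset the penalty for three extra parameters, so the simpler model is preferred.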

Regression modelling Some ‘regression’ terminology
- Simple linear
- Multiple
- Multivariate
- SAR
- CAR
- Logistic
- Poisson
- Ecological
- Hedonic
- Analysis of variance
- Analysis of covariance

Regression modelling Spatial regression – trend surfaces and residuals (a form of ESDA)
- General model: $y = f(x_1, x_2, w)$, where $y$ are the observations, $f(\cdot,\cdot,\cdot)$ is some function, $(x_1, x_2)$ are plane coordinates, and $w$ is an attribute vector
- Linear trend surface plot
- Residuals plot
- 2nd and 3rd order polynomial regression
- Goodness of fit measures: coefficient of determination

Regression modelling Regression & spatial autocorrelation (SA)
- Analyse the data for SA
- If SA is ‘significant’ then:
  - Proceed and ignore SA, or
  - Permit the coefficients, $\beta$, to vary spatially (GWR), or
  - Modify the regression model to incorporate the SA

Regression modelling Geographically Weighted Regression (GWR)
Coefficients, , allowed to vary spatially, (t) Model: Coefficients determined by examining neighbourhoods of points, t, using distance decay functions (fixed or adaptive bandwidths) Weighting matrix, W(t), defined for each point Solution: GLS: 3rd edition

Regression modelling Geographically Weighted Regression
- Sensitivity: model, decay function, bandwidth, point/centroid selection
- ESDA: mapping of surface, residuals, parameters and SEs
- Significance testing
  - Increased apparent explanation of variance
  - Effective number of parameters
  - AICc computations

Regression modelling Geographically Weighted Regression
- Count data: GWPR
  - Use of offsets
  - Fitting by IRLS (iteratively reweighted least squares) methods
- Presence/absence data: GWLR
  - True binary data
  - Computed binary data: use of re-coding, e.g. thresholding

Regression modelling Regression & spatial autocorrelation (SA)
- Modify the regression model to incorporate the SA, i.e. produce a Spatial Autoregressive model (SAR)
- Many approaches, including:
  - SAR: e.g. pure spatial lag model, mixed model, spatial error model etc.
  - CAR: a range of models that assume the expected value of the dependent variable is conditional on the (distance weighted) values of neighbouring points
  - Spatial filtering: e.g. OLS on spatially filtered data

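Regression modelling SAR models
- Pure spatial lag: $y = \rho W y + \varepsilon$
- Re-arranging: $y = (I - \rho W)^{-1} \varepsilon$
- MRSA model: $y = \rho W y + X\beta + \varepsilon$, where $W$ is the spatial weights matrix, $\rho$ is the autoregression parameter, and $X\beta$ is the linear regression added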
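The re-arranged form is convenient for simulating from a spatial lag model. The sketch below assumes the usual forms, $y = \rho W y + \varepsilon$ for the pure lag model and $y = \rho W y + X\beta + \varepsilon$ for the mixed model, with a row-standardised weights matrix on a short transect; the value of $\rho$, the coefficients and the chain layout are hypothetical choices for illustration.

```python
import numpy as np

n = 6
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # neighbours on a chain
W = A / A.sum(axis=1, keepdims=True)      # row-standardised spatial weights matrix
rho = 0.4                                 # autoregression parameter (assumed)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])               # hypothetical regression coefficients
eps = rng.normal(scale=0.1, size=n)       # iid errors

I_n = np.eye(n)
# Re-arranged forms: (I - rho W) y = eps, and (I - rho W) y = X beta + eps.
y_pure = np.linalg.solve(I_n - rho * W, eps)
y_mixed = np.linalg.solve(I_n - rho * W, X @ beta + eps)
```

Because $y$ appears on both sides of the model, OLS on the original form is generally not appropriate; in practice these models are usually fitted by maximum likelihood or related estimators.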

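Regression modelling SAR models
- Spatial error model: $y = X\beta + \varepsilon$ (linear regression + spatial error), with $\varepsilon = \lambda W \varepsilon + u$, where $u$ is the iid error vector and $\lambda W \varepsilon$ is the spatially weighted error vector
- Substituting and re-arranging: $y = X\beta + \lambda W y - \lambda W X\beta + u$, i.e. linear regression (global) $X\beta$, SAR lag $\lambda W y$, local trend $\lambda W X\beta$, and iid error vector $u$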

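Regression modelling CAR models
- Standard CAR model: $E(y_i \mid y_j,\ j \ne i) = \mu_i + \rho \sum_{j \ne i} w_{ij} (y_j - \mu_j)$, where $\mu_i$ is the expected value at $i$, $\rho$ is the autoregression parameter, and the sum is a weighted mean for the neighbourhood of $i$
- Local weights matrix, $W$: distance or contiguity
- Variance: different models for $W$ and $M$ provide a range of CAR models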

Regression modelling Spatial filtering
- Apply a spatial filter to the data to remove SA effects
- Model the filtered data
- Example: spatial filter
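One simple filter, shown here only as a sketch, is $y^* = (I - \lambda W)\,y$ applied to both the response and the predictors, followed by OLS on the filtered data. This assumes a spatial error process and, importantly, that the autoregression parameter $\lambda$ is already known; the slide's own example filter was not preserved, so the filter form, the simulated data and all names below are assumptions.

```python
import numpy as np

def spatial_filter(v, W, lam):
    """Apply the filter (I - lam * W) to a vector or to each column of a matrix."""
    return v - lam * (W @ v)

n = 200
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # transect neighbours
W = A / A.sum(axis=1, keepdims=True)       # row-standardised weights
lam = 0.5                                  # assumed known for this sketch

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])                # hypothetical coefficients
u = rng.normal(scale=0.1, size=n)          # iid errors
e = np.linalg.solve(np.eye(n) - lam * W, u)   # spatially autocorrelated errors
y = X @ beta + e

# Filter both sides, then model the filtered data with ordinary least squares.
y_f = spatial_filter(y, W, lam)
X_f = spatial_filter(X, W, lam)
beta_hat, *_ = np.linalg.lstsq(X_f, y_f, rcond=None)
```

After filtering, the error term reduces to the iid vector $u$, so the usual OLS assumptions are more nearly met; when $\lambda$ is unknown it has to be estimated, which this sketch does not attempt.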
