# Spatial autoregressive methods

## Presentation on theme: "Spatial autoregressive methods"— Presentation transcript:

Spatial autoregressive methods
Nr245 Austin Troy Based on Spatial Analysis by Fortin and Dale, Chapter 5

Autcorrelation types None: independence
Spatial independence, functional dependence True autocorrelation>> inherent autoregressive Functional autocorr>> induced autoregressive

Autocorrelation types
Double autoregressive Notice there are now two autocorrelation parameters r-x and r-z

Effects? Standard test statistics become “too liberal”—more significant results than the data justify Because observations are not totally independent have lower actual degrees of freedom, or lower “effective sample size”: n’ instead of n; since t stat denominator = s/n, if n is too big it inflates the t statistic Above: simulations yield 4x the type 1 errors for inherent AR than expected, induced AR model yields 2 x and double AR model is 8x

What to do? Non-effective
Why not just adjust up the significance level? E.g. 99% instead of 95%? Because don’t how by how much to adjust without further information. Could end up with a test that is way too conservative Why not just adjust sampling to only include “independent samples?” Because wasteful of data and because easy to mistake “critical distance to independence”

Best approach: Adjust effective sample size
In presence of SA, variance of mean of obs can be adjusted sing covariances of Xs Cov(Xi, Xj) becomes For large sample sizes So for instance n=1000 and ro=.4 means n’=429 Problem is that, to be useful, autoregressive model (ro parameter) has to be an effective descriptor of the structure of autocorrelation of the data, but it’s a simplification Next step therefore is factoring in correlation matrix, R, based on lag distances r(d)

Moving average models At 1st order we get a matrix like:
Half of info for Xi contained in Xi+1 Half contained in Xi-1 Hence only every other ob. Needed So produce ro=.5 for large n and n’=n/2. n’=n/2 A k order model can take form Translates into generalized matrix form With variance covariance matrix

Moving average When you increase the order, calculating sample size gets complicated; e.g. second order model, where two ro parameters now Important point: If there are several different levels of autocorrelation (rk), each rk must be incorporated even if non-significant Using only significant values can understate n’ Fortin and Dale recommend not using moving average approach because very sensitive to irregularities in the data and can produce a wide range of estimates

Two dimensional approaches
Problem with MA approach as it was just presented is assumes one-dimensionality In 2-d spatial data, xi depends on all neighbors most likely Now must define what is “neighbor” in 2d (e.g. w=1/8 for 9 cell grid of neighbors, all else = 0) Two best ways for dealing with this: Simultaneous autoregressive models (SAR) Conditional autoregressive models (CAR) CAR’s neighborhood matrices specify relationship between lagged response values at each location and neighboring location SAR’s specify relationship between lagged residuals Both use nxn spatial weights matrix (W) composed of wij Can be based on adjacency, number neighbors or distance Zeros on diagonals, weights on off diagonals In both SAR and CAR, SA tends to persist across long distances

CAR More commonly used in spatial statistics
Not based on spatial dependence per se; instead probability of a certain value is conditional on neighbor values Here Where j is the autocorrelation parameter and V is a symmetrical weight matrix Symmetrical requirement means that directional processes can’t be modeled.

SAR Based on concept of set of simultaneous equations to be solved. In this xi and xi-1 are each defined by their own equations containing other xs Where x is a vector and is linearly dependent on a vector of underlying variables z1, z2 z3…. Given as matrix Z, u is a vector non-independent error terms with mean zero and var-covar matrix C Spatial autocorrelation enters via u where Here e is independent error term and W is neighbor weights standardized to row totals of 1. W is not necessarily symmetrical, allowing for inclusion of anisotropy. Wij is >0 if values at location i is not independent of value at location j

SAR This yields the model With variance covariance matrix (from u)
Note how similar to MA—difference is no inverse in formula The elements of C are variances From Fortin and Dale p. 231

SAR Advantages: doesn’t require weight matrix to be symmetrical, so can model anisotropic phenomena. SAR can take three forms Lagged response model: autoregressive process only occurs in the response variable Lagged mixed model, where SA affects both response and predictors Spatial error model: assumes SA process occurs only in error term and not in response or predictor