Lecture #3: Modeling spatial autocorrelation in normal, binomial/logistic, and Poisson variables: autoregressive and spatial filter specifications Spatial.


2 Lecture #3: Modeling spatial autocorrelation in normal, binomial/logistic, and Poisson variables: autoregressive and spatial filter specifications Spatial statistics in practice Center for Tropical Ecology and Biodiversity, Tunghai University & Fushan Botanical Garden

3 Topics for today's lecture: Autoregressive specifications and normal curve theory (PROC NLIN). Auto-binomial and auto-Poisson models: the need for MCMC. Relationships between spatial autoregressive and geostatistical models. Spatial filtering specifications and linear and generalized linear models (PROC GENMOD). Autoregressive specifications and linear mixed models (PROC MIXED). Implications for space-time datasets (PROC NLMIXED).

4 What is an auto- model? Y is on both sides of the = sign

5 The auto-normal (auto-Gaussian) model

6 Popular autoregressive equations for the normal probability model. A normality assumption usually is added to the error term. M is diagonal, and often is I. (2nd-order models; 1st-order model)

7 Spatial autoregression. The workhorse of classical statistics is linear regression; the workhorse of spatial statistics is nonlinear regression. The simultaneous autoregressive (SAR) model where ρ denotes the spatial autocorrelation parameter

8 Georeferenced data preparation. Concern #1: the normalizing factor. Rule: probabilities must integrate/sum to 1. Both a spatially autocorrelated and an unautocorrelated mathematical space must satisfy this rule. Jacobian term for Gaussian RVs: a function of the eigenvalues of matrix W (or C). (symmetric set of eigenvalues; non-symmetric set of eigenvalues)

9 Calculation of the Jacobian term. Step 1: extract the eigenvalues from the n-by-n matrix W (or C); the eigenvalues λ are the n solutions to the equation det(W – λI) = 0, and the eigenvectors E are the n solutions to the equation (W – λI)E = 0. Step 2 (from matrix determinant) compute ; J 2 is
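The two steps on this slide can be illustrated with a minimal sketch. It assumes a symmetric binary contiguity matrix C for a hypothetical 3-by-3 rook lattice and an illustrative value ρ = 0.2, neither of which comes from the lecture, and it relies only on the standard identity log det(I − ρC) = Σ log(1 − ρλ_i). The helper name `rook_adjacency` is made up for this example, and the slide's own scaling of the Jacobian term is not reproduced.

```python
import numpy as np

def rook_adjacency(rows, cols):
    """Binary rook-contiguity matrix C for a rows-by-cols lattice (illustrative helper)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:              # east neighbour
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:              # south neighbour
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

C = rook_adjacency(3, 3)

# Step 1: extract the eigenvalues (C is symmetric, so eigvalsh applies).
eigenvalues = np.linalg.eigvalsh(C)

# Step 2: log-determinant of (I - rho*C) computed from the eigenvalues.
rho = 0.2                                 # assumed value; must lie inside (1/lambda_min, 1/lambda_max)
log_det_from_eigs = np.sum(np.log(1.0 - rho * eigenvalues))

# Cross-check against a direct determinant computation.
sign, log_det_direct = np.linalg.slogdet(np.eye(C.shape[0]) - rho * C)
```

The practical point of working through the eigenvalues is that they are extracted once, after which the log-determinant can be re-evaluated cheaply for every trial value of ρ during nonlinear estimation.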

10 Minimizing SSE 0.9795 1.0542 MIN OLS: 1.1486 MIN with Jacobian, which is a weight: 1.8959 Relative plots (in z scores) worst case scenario

11 Gaussian approximations allow an evaluation of redundant information (n* = effective sample size)
                            Houston (n=690)                Syracuse (n=208)
attribute                   % redundant information  n*    % redundant information  n*
population density          61                       66    72                       15
% male                      32                       156   18                       78
black/white ratio           63                       62    57                       27
% widowed                   52                       91    16                       84
% with university degree    70                       49    43                       37
% Chinese                   51                       93    34                       48

12 The auto-binomial/logistic model. NOTE: a data transformation does not exist that enables binary 0-1 responses to conform closely to a bell-shaped curve.

13 Primary sources of overdispersion: binomial extra variation [Var(Y) equals np(1-p) multiplied by a dispersion factor that is > 1]. Sources: misspecification of the mean function; nonlinear relationships & covariate interactions; presence of outliers; heterogeneity or intra-unit correlation in group data; inter-unit spatial autocorrelation; choosing an inappropriate probability model to represent the variation in data; excessive counts (especially 0s).

14 The auto-binomial/logistic model. By definition, a percentage/binary response variable is on the left-hand side of the equation, and some spatially lagged version of this response variable also is on the right-hand side of the equation. Unlike the auto-Gaussian model, whose normalizing constant (i.e., its Jacobian term) is numerically tractable, here the normalizing constant is intractable. A specific relationship tends to hold between the logistic model's intercept and autoregressive parameters.

15 Pseudo-likelihood estimation Maximum pseudo-likelihood treats areal unit values as though they are conditionally independent, and is equivalent to maximum likelihood estimation when they are independent. Each areal unit value is regressed on a function of its surrounding areal unit values. Statistical efficiency is lost when dependent values are assumed to be independent.

16 Quasi-likelihood estimation. Maximum quasi-likelihood treats the variation of Y values as though it is inflated, and estimates the dispersion factor that multiplies the variance term np(1-p) for the purpose of rescaling when testing hypotheses. This approach is equivalent to maximum likelihood estimation when the dispersion factor = 1, and most log-likelihood function asymptotic theory transfers to the results.

17 Preliminary estimation (pseudo- and quasi-likelihood) results: F/P (%)
model                 intercept   SA     se SA   dispersion   Deviance
binomial              -1.10       0      *****                945.96
auto-binomial         -1.11       0.89   0.001   0            384.51
quasi-auto-binomial   -1.10       0.89   0.015   19.74        0.99
auto-logistic         -2.03       0.80   0.032   *****        0.93

18 What is the alternative to pseudo-likelihood? MCMC maximum likelihood estimation! It exploits the sufficient statistics; is based upon Markov chain transition matrices converging to an equilibrium; exploits marginal probabilities, and hence can begin with pseudo-likelihood results; and is based upon simulation theory.

19 Properties of estimators: a review. Unbiasedness; Efficiency; Consistency; Robustness; BLUE; BLUP; Sufficiency.

20 MCMC maximum likelihood estimation. MCMC denotes Markov chain Monte Carlo. Pseudo-likelihood works with the conditional marginal models. MCMC is needed to compute the simultaneous likelihood result. MCMC exploits the conditional models.

21 The theory of Markov chains was developed by Andrei Markov at the beginning of the 20th century. A Markov chain is a process consisting of a finite number of states and known probabilities, $p_{ij}$, of moving from state i to state j. Markov chain theory is based on the Ergodicity Theorem: a chain is ergodic if it is irreducible, recurrent non-null, and aperiodic. If a Markov chain is ergodic, then a unique steady-state distribution exists, independent of the initial state, for transition matrix M; $P(X_{t+1} = j \mid X_0 = i_0, \ldots, X_t = i_t) = P(X_{t+1} = j \mid X_t = i_t) = {}_{t}p_{ij}$

22 Example transition matrix convergence
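Convergence of a transition matrix to its steady state can be sketched in a few lines. The 3-state matrix below is hypothetical (it is not the matrix shown on the slide). Because every entry is positive the chain is ergodic, so repeated multiplication drives every row of the matrix power toward the same steady-state distribution.

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical 3-state transition matrix; each row sums to 1.
M = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

Mt = M
for _ in range(100):          # after the loop, Mt holds the 101st power of M
    Mt = mat_mul(Mt, M)

# Every row of Mt is now (numerically) the same steady-state distribution,
# regardless of the starting state.
steady_state = Mt[0]

# The steady state is left unchanged by one more transition.
one_more_step = [sum(steady_state[i] * M[i][j] for i in range(3)) for j in range(3)]
```

For this particular matrix the steady state works out to (5/21, 9/21, 7/21), which can be confirmed by solving πM = π by hand.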

23 Monte Carlo simulation is named after the city in the principality of Monaco, because a roulette wheel is a simple random number generator. The name and the systematic development of Monte Carlo methods date from about 1944. The Monte Carlo method provides approximate solutions to a variety of mathematical problems by performing statistical sampling experiments with a computer using pseudo-random numbers.

24 MCMC provides a mechanism for taking dependent samples in situations where regular sampling is difficult, if not completely impossible. The standard situation is where the normalizing constant for a joint or a posterior probability distribution is either too difficult to calculate or analytically intractable. MCMC has been around for about 50 years.

25 What is MCMC? A definition: MCMC is used to simulate from some distribution p known only up to a constant factor, C: $p_i = C q_i$, where $q_i$ is known but C is unknown and too horrible to calculate. MCMC begins with conditional (marginal) distributions, and MCMC sampling outputs a sample of parameters drawn from their joint (posterior) distribution.

26 Starting with any Markov chain having transition matrix M over the set of states i on which p is defined, and given $X_t = i$, the idea is to simulate a random variable X* with distribution $q_i$: $q_{ij} = P(X^* = j \mid X_t = i)$. The distribution $q_i$ is called the proposal distribution. After a burn-in set of simulations, a chain converges to an equilibrium. (figure labels: p_0 = 0.5; p = 0.2)
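A minimal sketch of this idea is a Metropolis sampler that draws from a distribution known only up to its constant factor. Everything specific here is an assumption made for illustration: the four unnormalized weights, the symmetric random-walk proposal on a ring of states, the run lengths, and the function name `metropolis`. The acceptance step uses only a ratio of unnormalized weights, so the unknown constant never needs to be calculated.

```python
import random

def metropolis(unnormalized, n_iter, burn_in, seed=0):
    """Sample states 0..n-1 with probability proportional to unnormalized[i].

    The proposal moves one step left or right on a ring with equal
    probability (symmetric), so the acceptance ratio is just
    unnormalized[proposal] / unnormalized[state].
    """
    rng = random.Random(seed)
    n_states = len(unnormalized)
    state = 0
    counts = [0] * n_states
    for t in range(n_iter):
        proposal = (state + rng.choice((-1, 1))) % n_states
        if rng.random() < min(1.0, unnormalized[proposal] / unnormalized[state]):
            state = proposal                 # accept the move
        if t >= burn_in:                     # discard burn-in iterations
            counts[state] += 1
    total = sum(counts)
    return [c / total for c in counts]

weights = [1.0, 2.0, 3.0, 4.0]               # hypothetical unnormalized weights
estimate = metropolis(weights, n_iter=200_000, burn_in=5_000)

# The normalized target is computable here only because the example is tiny.
target = [w / sum(weights) for w in weights]
```

With a state space this small the normalizing constant is trivial, which is what makes the comparison of `estimate` with `target` possible; the method earns its keep when that sum is intractable.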

27 MCMC is: a stochastic process that returns a different result with each execution; a method for generating a joint empirical distribution of several variables from a set of modelled conditional distributions for each variable when the structure of data is too complex to implement mathematical formulae or directly simulate; a recipe for producing a Markov chain that yields simulated data that have the correct unconditional model properties, given the conditional distributions of those variables under study. Its principal idea is to convert a multivariate problem into a sequence of univariate problems, which then are iteratively solved to produce a Markov chain. Gibbs sampling is an MCMC scheme for simulation from p where a transition kernel is formed by the full conditional distributions of p.

28 A Gibbs sampling algorithm: (1) t = 0; set initial values ${}^{0}x = ({}^{0}x_1, \ldots, {}^{0}x_n)$. (2) Obtain new values ${}^{t}x = ({}^{t}x_1, \ldots, {}^{t}x_n)$ from ${}^{t-1}x$: ${}^{t}x_1 \sim p(x_1 \mid {}^{t-1}x_2, \ldots, {}^{t-1}x_n)$; ${}^{t}x_2 \sim p(x_2 \mid {}^{t}x_1, {}^{t-1}x_3, \ldots, {}^{t-1}x_n)$; …; ${}^{t}x_n \sim p(x_n \mid {}^{t}x_1, \ldots, {}^{t}x_{n-1})$. (3) t = t+1; repeat step (2) until convergence.
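The three steps can be sketched for a case where the full conditionals are known in closed form. The target chosen here, a standard bivariate normal with correlation 0.6, is an assumption made purely for illustration and does not come from the lecture; for it, x1 | x2 ~ N(ρ·x2, 1 − ρ²) and symmetrically for x2.

```python
import random

def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5           # conditional standard deviation
    x1, x2 = 0.0, 0.0                      # step (1): initial values
    draws = []
    for t in range(n_iter):                # step (3): iterate
        x1 = rng.gauss(rho * x2, sd)       # step (2): x1 given the previous x2
        x2 = rng.gauss(rho * x1, sd)       #           x2 given the new x1
        if t >= burn_in:                   # keep only post-burn-in draws
            draws.append((x1, x2))
    return draws

draws = gibbs_bivariate_normal(rho=0.6, n_iter=60_000, burn_in=1_000)
n = len(draws)
mean1 = sum(d[0] for d in draws) / n
mean2 = sum(d[1] for d in draws) / n
cov12 = sum((d[0] - mean1) * (d[1] - mean2) for d in draws) / n
```

Only univariate draws are ever taken, yet the retained pairs reproduce the joint correlation, which is the sense in which a multivariate problem is converted into a sequence of univariate ones.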

29 Monitoring convergence. MCMC exploits the sufficient statistics, which should be monitored with a time-series plot for randomness. After removing burn-in iteration results, a chain should be weeded (i.e., only every kth output is retained). These weeded values should be independent; this property can be checked by constructing a correlogram. Convergence of m chains can be assessed using ANOVA: within-chain variance pooling is legitimate when chains have converged.
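These checks can be sketched on stand-in data. The chains below are simulated AR(1) series rather than real MCMC output, and the burn-in length, weeding interval, and number of chains are all assumed values chosen for illustration. The sketch removes the burn-in, weeds each chain, compares the lag-1 autocorrelation before and after weeding, and forms a one-way ANOVA F ratio across chains.

```python
import random

def ar1_chain(n, phi, seed):
    """Stand-in for MCMC output: an AR(1) series with lag-1 correlation phi."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation (first ordinate of a correlogram)."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

def anova_f(chains):
    """One-way ANOVA F ratio for equal-length chains: between / within mean square."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    ms_between = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    ms_within = sum((v - mu) ** 2 for c, mu in zip(chains, means) for v in c) / (m * (n - 1))
    return ms_between / ms_within

burn_in, thin = 1_000, 50                              # assumed settings
raw = [ar1_chain(51_000, phi=0.8, seed=s) for s in range(3)]
weeded = [chain[burn_in::thin] for chain in raw]       # drop burn-in, keep every 50th value

r_raw = lag1_autocorr(raw[0][burn_in:])                # strongly autocorrelated
r_weeded = lag1_autocorr(weeded[0])                    # close to zero after weeding
f_ratio = anova_f(weeded)                              # near 1 when chains agree
```

A small F ratio is consistent with the chains sharing a common mean, which is the condition under which pooling their within-chain variance is legitimate.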

30 Sufficient statistics for normal, binomial, and Poisson models. A sufficient statistic (established with the Fisher-Neyman factorization theorem) is a statistic that captures all of the information contained in a sample that is relevant to the estimation of a population parameter.

31 Implementation of MCMC for the autologistic model on a 20-by-20 lattice of values $Y_1, \ldots, Y_{400}$ (rows $Y_1 \ldots Y_{20}$; $Y_{21} \ldots Y_{40}$; …; $Y_{381} \ldots Y_{400}$). Drawing from the binomial distribution is the Monte Carlo part. MCMC-MLEs are extracted from the generated chains.
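The simulation half of this procedure can be sketched as a Gibbs sampler on a 20-by-20 lattice. Several things here are assumptions rather than the lecture's implementation: rook neighbours, the parameter values alpha = -1.0 and rho = 0.5, the number of sweeps, and the function name `gibbs_autologistic`. The sketch only generates maps and records the sufficient statistics after each sweep; extracting MCMC-MLEs from those chains is not shown.

```python
import math
import random

def gibbs_autologistic(alpha, rho, size=20, sweeps=200, seed=0):
    """Gibbs sampler for a binary autologistic model on a size-by-size lattice.

    Conditional model (rook neighbours assumed):
        P(Y_i = 1 | neighbours) = 1 / (1 + exp(-(alpha + rho * neighbour sum)))
    Returns the final map and, per sweep, the pair
    (sum of Y, sum of Y_i * Y_j over neighbouring pairs).
    """
    rng = random.Random(seed)
    y = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    history = []
    for _ in range(sweeps):
        for r in range(size):
            for c in range(size):
                s = 0
                if r > 0:
                    s += y[r - 1][c]
                if r < size - 1:
                    s += y[r + 1][c]
                if c > 0:
                    s += y[r][c - 1]
                if c < size - 1:
                    s += y[r][c + 1]
                p = 1.0 / (1.0 + math.exp(-(alpha + rho * s)))
                # Bernoulli (binomial with one trial) draw: the Monte Carlo part.
                y[r][c] = 1 if rng.random() < p else 0
        total = sum(sum(row) for row in y)
        joins = sum(y[r][c] * y[r][c + 1] for r in range(size) for c in range(size - 1))
        joins += sum(y[r][c] * y[r + 1][c] for r in range(size - 1) for c in range(size))
        history.append((total, joins))
    return y, history

final_map, history = gibbs_autologistic(alpha=-1.0, rho=0.5)
```

The per-sweep pairs in `history` are the quantities one would plot as time series, and weed, when monitoring convergence.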

32 MCMC results: 25,000 burn-in iterations + 225,000 iterations weeded to every 100th
                      alpha          rho
source        df      F     prob     F     prob
iteration     44      1.0   0.52     1.0   0.47
chain         2       0.1   0.91     0.1   0.92
interaction   88      1.0   0.56     1.0   0.54
error         6615

33 Some prediction comparisons

34 The (modified) auto-Poisson model. NOTE: the auto-Poisson model can only capture negative spatial autocorrelation. NOTE: excessive zeros are a serious problem with empirical Poisson RVs.

35 MCMC Spatial autoregression: the auto-Poisson model The workhorse of spatial statistical generalized linear models is MCMC For counts, y, in the set of integers {0, 1, 2, 3, … }

36 $c^{-1}$ is an intractable normalizing factor. MCMC is initiated with pseudo-likelihood estimates. Positive spatial autocorrelation can be handled with Winsorizing, or binomial approximation.

37 When VAR(Y) exceeds the mean of Y, overdispersion (extra-Poisson variation) is encountered. Detected when deviance/df > 1. Often described as VAR(Y) = Leads to the Negative Binomial model, conceptualized as the number of times some phenomenon occurs before a fixed number of times (r) that it does not occur.
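The deviance/df diagnostic can be sketched for the simplest case, an intercept-only Poisson model whose fitted mean is the sample mean. The counts below are hypothetical and deliberately contain many zeros plus a few large values; the function name `poisson_deviance_per_df` is made up for this example. The deviance used is the standard Poisson form, 2 Σ [y log(y/μ) − (y − μ)], with y log y taken as 0 when y = 0.

```python
import math

def poisson_deviance_per_df(y):
    """Deviance/df for an intercept-only Poisson model (fitted mean = sample mean)."""
    mu = sum(y) / len(y)
    deviance = 2.0 * sum(
        (v * math.log(v / mu) if v > 0 else 0.0) - (v - mu) for v in y
    )
    return deviance / (len(y) - 1)        # one parameter (the intercept) is estimated

# Hypothetical counts: mostly zeros with a few large values.
counts = [0, 0, 0, 1, 0, 12, 0, 3, 0, 25, 0, 0, 7, 0, 1, 0, 0, 18, 0, 2]

ratio = poisson_deviance_per_df(counts)
mean = sum(counts) / len(counts)
variance = sum((v - mean) ** 2 for v in counts) / (len(counts) - 1)
```

For these counts the sample variance is far larger than the mean and the ratio is well above 1, the pattern that motivates moving to a negative binomial or quasi-likelihood specification.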

38 Preliminary estimation (pseudo- and quasi-likelihood) results: B/D
model                    SA     se SA    dispersion   Deviance
Poisson                  0      *****                 1230.20
auto-Poisson             0.02   <0.001   0            822.23
quasi-auto-Poisson       0.02   0.006    29.2553      0.96
auto-negative binomial   0.02   0.007    0.0626       1.01

39 MCMC results: 25,000 burn-in iterations + 500,000 iterations weeded to every 100th (figure: typical correlogram)

40 Some prediction comparisons

41 Geographic covariation: n-by-n matrix V autoregression works with the inverse covariance matrix & geostatistics works with the covariance matrix itself

42 Relationships between the range parameter and rho for an ideal infinite surface modified Bessel function for CAR Bessel function for SAR

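43 Constructing eigenfunctions for filtering spatial autocorrelation out of georeferenced variables: $MC = \dfrac{n}{\mathbf{1}^T C \mathbf{1}} \times \dfrac{Y^T (I - \mathbf{1}\mathbf{1}^T/n)\, C\, (I - \mathbf{1}\mathbf{1}^T/n)\, Y}{Y^T (I - \mathbf{1}\mathbf{1}^T/n)\, Y}$; the eigenfunctions come from $(I - \mathbf{1}\mathbf{1}^T/n)\, C\, (I - \mathbf{1}\mathbf{1}^T/n)$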
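The link between the Moran Coefficient and these eigenfunctions can be sketched numerically. The example assumes a hypothetical map of four areal units arranged in a line with binary contiguity; the function name `moran_coefficient` is made up for this illustration. For a non-constant eigenvector of the doubly centred matrix, the MC equals its eigenvalue multiplied by n / (1ᵀC1).

```python
import numpy as np

def moran_coefficient(y, C):
    """MC = (n / 1'C1) * y'MCMy / y'My, with M = I - 11'/n."""
    n = len(y)
    M = np.eye(n) - np.ones((n, n)) / n
    return (n / C.sum()) * (y @ M @ C @ M @ y) / (y @ M @ y)

# Hypothetical map: 4 areal units in a line, binary contiguity.
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = C.shape[0]
M = np.eye(n) - np.ones((n, n)) / n

# The doubly centred matrix is symmetric, so eigh applies;
# eigenvalues are returned in ascending order.
eigenvalues, E = np.linalg.eigh(M @ C @ M)

# Each eigenvalue rescales to the MC of its eigenvector's map pattern.
mc_from_eigenvalues = (n / C.sum()) * eigenvalues

e_top = E[:, -1]                          # pattern with the largest MC
mc_top = moran_coefficient(e_top, C)
```

The constant vector is also an eigenvector here (eigenvalue 0), and its MC is undefined because its centred sum of squares is zero, so the direct formula is applied only to the non-constant patterns.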

44 C versus (I – 11 T /n)C(I – 11 T /n) = MCM CMCM 2.062.07*0.00-1.10-1.09-1.98 5.51*1.92 -0.10 -1.21 -2.02 4.914.991.571.59-0.15 -1.33 -2.12 4.35 1.321.35-0.29-0.28-1.38 -2.15 -1.44 -2.23 3.96 1.051.06-0.49-0.46-1.54-1.52-2.24 3.843.880.930.94-0.53 -1.56-1.55-2.33-2.32 3.423.430.80 -0.59 -1.60-1.59-2.40-2.39 3.35 0.780.79-0.63 -1.64 -2.41 2.902.910.580.61-0.80 -1.74 -2.43-2.42 2.652.720.38 -0.89-0.88-1.84 -2.54 2.532.590.27 -0.92 -1.87 -2.62 2.352.400.170.19-0.95 -1.90 -2.67 -1.07-1.06-1.96 -2.70

45 Eigenvectors of MCM. $(I - \mathbf{1}\mathbf{1}^T/n) = M$ ensures that the eigenvector means are 0; symmetry ensures that the eigenvectors are orthogonal; M ensures that the eigenvectors are uncorrelated; replacing the 1st eigenvalue with 0 inserts the intercept vector 1 into the set of eigenvectors. Thus, the eigenvectors represent all possible distinct (i.e., orthogonal and uncorrelated) spatial autocorrelation map patterns for a given surface partitioning. Legendre and his colleagues are developing analogous eigenfunction spatial filters based upon the truncated distance matrix used in geostatistics.

46 Expectations for the Moran Coefficient for linear regression with normal residuals

47 A spatial filtering counterpart to the auto-normal model specification: $Y = E_k \beta + \varepsilon$, with $b = E_k^T Y$. Only a single regression is needed to implement the stepwise procedure. MAX: $R^2$; eigenvectors selected in order of their bivariate correlations. residual spatial autocorrelation =
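Why a single regression suffices can be sketched numerically: the eigenvectors are orthonormal and have zero means, so the least-squares slopes reduce to b = E_kᵀY and the intercept to the mean of Y. The 5-by-5 rook lattice, the choice of k = 4 eigenvectors, the coefficients, and the noise level below are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-by-5 rook lattice.
side = 5
n = side * side
C = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        if c + 1 < side:
            C[i, i + 1] = C[i + 1, i] = 1.0
        if r + 1 < side:
            C[i, i + side] = C[i + side, i] = 1.0

M = np.eye(n) - np.ones((n, n)) / n
eigenvalues, E = np.linalg.eigh(M @ C @ M)
order = np.argsort(eigenvalues)[::-1]       # strongest positive autocorrelation first
E_k = E[:, order[:4]]                       # k = 4 eigenvectors (assumed selection)

# Simulated response: intercept + spatial filter + small noise.
beta_true = np.array([3.0, -2.0, 1.5, 0.5])
Y = 10.0 + E_k @ beta_true + rng.normal(0.0, 0.1, n)

b = E_k.T @ Y                               # slide formula: b = E_k' Y

# Ordinary least squares with an intercept, for comparison.
X = np.column_stack([np.ones(n), E_k])
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
```

Because each coefficient is available independently of the others, candidate eigenvectors can be ranked and added stepwise without refitting the model at each step.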

48 Selected demographic attributes of China attribute # common to MAX- R 2, MIN- MC # not truly redundant info (~MAX-R 2, MIN-MC) # spatially structured (MAX-R 2, ~MIN-MC) population density (|z res | = 7.5 6.3) 14915171 crude fertility rate (|z res | = 4.4 2.7) 229105 0 % 100+ years old (|z res | = 0.4 0.0) 145 820 births/deaths ratio (|z res | = 2.7 0.6) 233119 0

49 Overdispersion: binomial extra variation. E(Y) = np, and Var(Y) equals np(1-p) multiplied by a dispersion factor that is > 1. It tends to have little impact on regression parameter point estimates (the maximum likelihood estimator typically is consistent, although small sample bias might occur); but regression parameter standard error estimates (variances/covariances) are underestimated. It may be reflected in the size of the deviance statistic, and is difficult to detect in binary 0-1 data.

50 Spatial structure and generalized linear modeling: Poisson regression. CBR: the spatial filter is constructed with 199 of 561 candidate eigenvectors.
measure      Poisson   Negative binomial   SF negative binomial
deviance     1377.31   1.02                1.10
mean         0.1241    0.1351              0.1308
dispersion   0         0.0933              0.0302
Pseudo-R² (observed vs predicted births): 0.762; 0.903
(map note: SF results in green)

51 Spatial structure and generalized linear modeling: binomial regression. % population 100+ years old: the spatial filter is constructed with 92 of 561 candidate eigenvectors.
measure                                     binomial            SF binomial
deviance                                    4.76                1.00
Intercept                                   -12.0706 (0.0124)   -12.5000 (0.0276)
scale                                       1                   1.47
Pseudo-R² (observed vs predicted births)    0                   0.283

52 Advantages of spatial filtering: MCMC is not needed for GLM parameter estimation, because conventional statistical theory applies; distinct map pattern components of spatial autocorrelation that relate directly to the MC are uncovered; the eigenvectors are orthogonal and uncorrelated; the necessary eigenvectors can always be calculated as long as the number of areal units does not exceed n ≈ 10,000.

53 Interpretation of MIN-MC selections. Matrix $E_k$ contains three disjoint eigenvector subsets: $E_r$, for those representing redundant locational information; $E_s$, for those representing spatially structured random effects; and $E_{misc}$, for those being unrelated to Y. Accordingly, the pure spatial autocorrelation model becomes $Y = \mu\mathbf{1} + E_r \beta_r + (E_s \beta_s + e)$, where $\beta_r$ and $\beta_s$ respectively are regression coefficients defining relationships between Y and the sets of eigenvectors $E_r$ and $E_s$, and the term $(E_s \beta_s + e)$ behaves like a spatially structured random effect.

54 Random effects model is a random observation effect (differences among individual observational units) is a time-varying residual error (links to change over time) The composite error term is the sum of the two.

55 Random effects model: a normally distributed intercept term (mean 0) that is uncorrelated with covariates; supports inference beyond the nonrandom sample analyzed; simplest is where the intercept is allowed to vary across areal units (repeated observations are individual time series). The random effect variable is integrated out (with numerical methods) of the likelihood function; it accounts for missing variables & within-unit correlation (commonality across time periods).

56 Random effects: mixed models Moving closer to a Bayesian perspective, spatial autocorrelation can be accounted for by introducing a (spatially structured) random effect into a model specification. SAS PROC MIXED supports this approach for linear modeling in which a map is treated as a multivariate sample of size 1. SAS PROC NLMIXED supports this approach for generalized linear modeling.

57 SAS PROC MIXED and random effects: Y = XB + Zu. The spatially correlated errors model is performed with PROC MIXED through the REPEATED statement. The SUBJECT=INTERCEPT option specifies that the correlation between units is essentially between experimental units that are different observations within the data set. The LOCAL option in the REPEATED statement tells PROC MIXED to include a nugget effect. EXAMPLE: density of workers across Germany's 439 Kreise; LN(density – 23.53) ~ N

58 A spatial covariance structure coupled with a random slope coefficient model 192,721 distance pairs d max = 9.32478

59 PROC MIXED output: intercept
intercept estimate   correlation   -2log(L)   nugget   (partial) sill   range
5.28 (0.06)          none          1445.1     0        1.5782           0
5.01 (0.12)          spherical     1348.4     0.9139   0.5542           1.3801
5.01 (0.18)          exponential   1349.8     0.9154   0.5873           0.7824
5.01 (0.13)          Gaussian      1344.7     0.9858   0.5194           0.7260
5.01 (0.18)          power         1349.8     0.9154   0.5873           0.2786

60 Random intercept term
measure              No covariates     Spatial filter    Spherical semivariogram
-2log(L)             1445.1            1179.3            1348.4
Intercept variance   0.9631            0.2538            0.5542
Residual variance    0.6116            0.6011            0.9139
Intercept estimate   5.2827 (0.0599)   5.2827 (0.0443)   5.0142 (0.1210)
The spatial filter contains 27 (of 98) eigenvectors, with R² = 0.4542, P(S-W residuals) < 0.0001.

61 Generalized linear mixed models One drawback of spatial filtering is that as the number of areal units increases, the number of eigenvectors needed to construct a spatial filter tends to increase, resulting in asymptotics being difficult or impossible to achieve. This situation can be remedied by resorting to a space-time data set, with time being repeated measures whose correlation can be captured by a random effects intercept term.

62 Unemployment in Germany: 1996-2002
Common eigenvectors: global E2 - E5; regional E6 - E8, E11, E18, E24, E28, E30, E39, E60; local E74
Year-specific eigenvectors:
1996: regional E9, E16, E21, E25, E41, E52, E53, E64; local E89
1997: global E1; regional E15, E19, E21, E34, E38, E64; local E93
1998: regional E13, E15, E16, E19, E21, E34, E38, E42, E52, E66; local E68, E93
1999: regional E9, E13, E15, E16, E19, E21, E34, E38, E42, E52, E66; local E93
2000: regional E9, E13, E15, E16, E19, E21, E25, E34, E38, E42, E51, E52, E66; local E93, E97
2001: regional E9, E12, E13, E15, E16, E19, E34, E42, E52, E56, E65, E66; local E68, E93, E97
2002: global E1; regional E9, E12, E13, E15, E16, E19, E20, E25, E38, E42, E52, E65, E66

63 Unemployment in Germany: annual spatial filters
year   # of eigenvectors   scale   adjusted pseudo-R²
1996   24                  21.98   0.5929               1.0232
1997   23                  24.38   0.6425               1.0412
1998   27                  23.52   0.6846               1.0438
1999   27                  23.25   0.7068               1.0364
2000   30                  23.83   0.7483               1.0507
2001   30                  25.18   0.7683               1.0489
2002   29                  26.08   0.7549               1.0459

64 The composite spatial filter constructed with common vectors
year   SF     residuals MC   residuals GR
1996   0.67   0.21           0.62
1997   0.73   0.20           0.66
1998   0.76   0.20           0.64
1999   0.79   0.21           0.61
2000   0.83   0.25           0.59
2001   0.85   0.27           0.57
2002   0.85   0.27           0.56
SF            1.14           0.15
Map legend: dark red = very high; light red = high; gray = medium; light green = low; dark green = very low. Map annotation: former east-west divide.

65 Generated space-time predictions the lack of serial correlation information in 1996 is conspicuous the best fit is in the center of the space-time series

66 % urban in Puerto Rico: SF-logistic with a spatial structured random effect

