Presentation on theme: "Spatial statistics in practice"— Presentation transcript:
1 Spatial statistics in practice
Lecture #5: MAPS WITH GAPS -- Small geographic area estimation, kriging, and kernel smoothing
Center for Tropical Ecology and Biodiversity, Tunghai University & Fushan Botanical Garden
2 Topics for today’s lecture
- The E-M algorithm
- The spatial E-M algorithm
- Kriging in ArcGIS
- Geographically weighted regression (GWR)
- Approaches to map smoothing
3 THEOREM 1
When missing values occur only in a response variable, Y, the iterative solution of the EM algorithm produces the regression coefficients calculated with only the complete data.
PF: Let b denote the vector of regression coefficients that is converged upon. Then if ,
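Theorem 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are simulated, the names (`X`, `y`, `missing`, `b_em`, `b_complete`) are hypothetical, and the EM iteration updates only the regression coefficients (the E-step replaces each missing Y with its current fitted value).

```python
import numpy as np

# Simulated (hypothetical) regression data with three missing responses.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
missing = np.zeros(n, dtype=bool)
missing[[3, 11, 25]] = True

# Reference: ordinary least squares on the complete cases only.
b_complete, *_ = np.linalg.lstsq(X[~missing], y[~missing], rcond=None)

# EM iteration: start from the observed mean, then alternate
# M-step (refit on the filled-in data) and E-step (refill with fitted values).
y_work = y.copy()
y_work[missing] = y[~missing].mean()
for _ in range(500):
    b_em, *_ = np.linalg.lstsq(X, y_work, rcond=None)   # M-step
    new_fill = X[missing] @ b_em                         # E-step
    if np.max(np.abs(new_fill - y_work[missing])) < 1e-12:
        break
    y_work[missing] = new_fill

print(b_complete)
print(b_em)
```

At convergence the filled-in observations have zero residuals, so they no longer influence the normal equations, which is why the two coefficient vectors agree.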
4 THEOREM 2
When missing values occur only in a response variable, Y, then by replacing the missing values with zeroes and introducing a binary 0/-1 indicator variable covariate -Im for each missing value m, such that Im is 0 for all but missing value observation m and 1 for missing value observation m, the estimated regression coefficient bm is equivalent to the point estimate for a new observation, and hence furnishes EM algorithm imputations.
PF: Let bm denote the vector of regression coefficients for the missing values, and partition the data matrices such that
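A small numerical sketch of Theorem 2, assuming simulated data and hypothetical variable names: the missing responses are set to 0, one -Im column is appended per missing value, and a single least-squares fit is compared with the usual "fit on complete cases, then predict" route.

```python
import numpy as np

# Simulated (hypothetical) data: intercept plus one covariate.
rng = np.random.default_rng(1)
n = 25
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([3.0, 1.5]) + rng.normal(scale=1.0, size=n)
miss_idx = np.array([4, 17])              # positions treated as missing
obs = np.ones(n, dtype=bool)
obs[miss_idx] = False

# Conventional route: fit on complete cases, predict the new observations.
b_obs, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
pred_new = X[miss_idx] @ b_obs

# Theorem 2 route: zeros in Y, plus one 0/-1 indicator column per missing value.
y0 = y.copy()
y0[miss_idx] = 0.0
neg_I = np.zeros((n, len(miss_idx)))
neg_I[miss_idx, np.arange(len(miss_idx))] = -1.0
X_aug = np.hstack([X, neg_I])
coef, *_ = np.linalg.lstsq(X_aug, y0, rcond=None)
b_aug, b_m = coef[:2], coef[2:]           # covariate coefficients, imputations

print(pred_new, b_m)
```

Each indicator column absorbs the residual of its own observation, so the covariate coefficients are determined by the complete cases and each bm equals the prediction for that observation.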
5 The EM algorithm solution
where:
- the missing values are replaced by 0 in Y, and
- Im is an indicator variable for missing value m that contains n-1 0s and a single 1
6 THEOREM 3
For imputations computed based upon Theorem 2, each standard error of the estimated regression coefficients bm is equivalent to the conventional standard deviation used to construct a prediction interval for a new observation, and as such furnishes the corresponding EM algorithm imputation standard error.
PF:
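Theorem 3 can be illustrated the same way. The sketch below assumes simulated data and hypothetical names; it compares the standard errors of the bm coefficients from the augmented regression with the textbook prediction standard error, sqrt(s^2 (1 + x_m'(X'X)^{-1} x_m)), computed from the complete cases.

```python
import numpy as np

# Simulated (hypothetical) data; p counts the columns of X, intercept included.
rng = np.random.default_rng(2)
n, p = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([3.0, 1.5]) + rng.normal(scale=1.0, size=n)
miss_idx = np.array([4, 17])
m = len(miss_idx)
obs = np.ones(n, dtype=bool)
obs[miss_idx] = False

# Conventional prediction standard error from the complete cases.
Xo, yo, Xm = X[obs], y[obs], X[miss_idx]
XtX_inv = np.linalg.inv(Xo.T @ Xo)
b = XtX_inv @ Xo.T @ yo
s2 = np.sum((yo - Xo @ b) ** 2) / (len(yo) - p)
leverage = np.einsum("ij,jk,ik->i", Xm, XtX_inv, Xm)   # x_m'(X'X)^-1 x_m
se_pred = np.sqrt(s2 * (1.0 + leverage))

# Standard errors of b_m from the augmented (zeros plus -I_m) regression.
y0 = y.copy()
y0[miss_idx] = 0.0
neg_I = np.zeros((n, m))
neg_I[miss_idx, np.arange(m)] = -1.0
X_aug = np.hstack([X, neg_I])
A_inv = np.linalg.inv(X_aug.T @ X_aug)
coef = A_inv @ X_aug.T @ y0
s2_aug = np.sum((y0 - X_aug @ coef) ** 2) / (n - p - m)
se_bm = np.sqrt(s2_aug * np.diag(A_inv)[p:])

print(se_pred, se_bm)
```

The residual degrees of freedom agree because the m indicator columns use up exactly the m filled-in observations.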
7 What is the set of equations for the following case?
[Figure residue: "107y4 = ?"; y4 is the missing value]
11 EM algorithm solution for aggregated georeferenced data: vandalized turnip plots
12 MTB > regress c4 8 c7-c14
Regression Analysis: C4 versus C7, C8, C9, C10, C11, C12, C13, C14
(The fitted regression equation and the Coef, SE Coef, T, and P values are not recoverable.)
Predictor   Definition
Constant
C7          I1-I6
C8          I2-I6
C9          I3-I6
C10         I4-I6
C11         I5-I6
C12         plot(6,5)
C13         plot(5,6)
C14         plot(6,6)
13 Analysis of Variance for C4
Source   DF   SS       MS
C5       5    1289.0   257.8
(The remaining ANOVA entries, the second analysis of variance table, and the plot of individual 95% CIs for the means based on the pooled StDev are not recoverable.)
14 Residual spatial autocorrelation
What does this mean?
18 What is the set of equations for the following case?
[Figure residue: "7Y2 = ?10"; Y2 is the missing value]
19 spatial autoregressive (AR)
kriging
estimate with
semivariogram model
fit semivariogram model with
20 The pure spatial autocorrelation CAR model
NOTE: exactly the same algebraic structure as the kriging equation
Dispersed missing values:
Imputation = the observed mean plus a weighted average of the surrounding residuals
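The imputation rule above can be sketched for a single missing cell on a small grid. Everything specific here is an assumption for illustration: the 3-by-3 values are hypothetical, the adjacency is rook's case with binary weights, and the autocorrelation parameter `rho` is simply fixed rather than estimated. Under a CAR model with precision proportional to (I - rho C), the conditional mean of cell m is mu + rho * sum_j c_mj (y_j - mu).

```python
import numpy as np

def rook_adjacency(nrow, ncol):
    """Binary rook's-case connectivity matrix for a regular grid (row-major order)."""
    n = nrow * ncol
    C = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if r + 1 < nrow:                    # neighbor below
                C[i, i + ncol] = C[i + ncol, i] = 1.0
            if c + 1 < ncol:                    # neighbor to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
    return C

# Hypothetical 3-by-3 map with the center cell missing.
grid = np.array([[8.0, 6.0, 7.0],
                 [5.0, np.nan, 9.0],
                 [4.0, 6.0, 3.0]])
y = grid.ravel()
missing = np.isnan(y)
C = rook_adjacency(3, 3)
rho = 0.2                                       # assumed value, not estimated

mu = y[~missing].mean()                         # observed mean
resid = np.where(missing, 0.0, y - mu)          # residuals of observed cells
imputed = mu + rho * (C[missing] @ resid)       # mean + weighted neighbor residuals

print(mu, imputed)
```

With these hypothetical values the observed mean is 6, the four rook neighbors have residuals 0, -1, 3, and 0, and the imputation is 6 + 0.2 * 2 = 6.4.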
21 Employing rook’s adjacency and a CAR model, what is the equation for the following imputation?
[Figure residue: "10376y5 = ?495"; y5 is the missing value]
22 The spatial filter EM algorithm solution
where:
- the missing values are replaced by 0 in Y, and
- Im is an indicator variable for missing value m that contains n-1 0s and a single 1
26 Missing 1992 georeferenced density of milk production in Puerto Rico: constrained (total = 1918)
[Figure panels: predicted from 1991 DMILK; predicted from spatial filter; predicted from both; predictions; Moran scatterplot. The tabulated values are not recoverable.]
27 USDA-NASS estimation of Pennsylvania crop production
[Figure labels: covariate; total constraints; map gaps]
39 Cross-validation of spatial filter for observed turnip data
40 Kriging: best linear unbiased spatial interpolator (i.e., predictor)
The accompanying table contains a test set of sixteen random samples (#17-32) used to evaluate three maps. The “Actual” column lists the measured values for the test locations identified by “Col, Row” coordinates. The differences between these values and those predicted by the three interpolation techniques form the residuals shown in parentheses. The “Average” column compares each test location with the whole-field arithmetic mean of 23 (i.e., guessing 23 everywhere).
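As a rough illustration of kriging as a best linear unbiased predictor, the following is a minimal ordinary kriging sketch. It is not the ArcGIS implementation; the sample coordinates and values are hypothetical, and the exponential semivariogram parameters (`psill`, `range_par`, `nugget`) are assumed rather than fitted.

```python
import numpy as np

def exp_semivariogram(h, psill=1.0, range_par=3.0, nugget=0.0):
    """Exponential semivariogram; gamma(0) is defined as 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-h / range_par)), 0.0)

def ordinary_kriging(coords, z, target, **sv):
    """Return (prediction, kriging variance, weights) at one target location."""
    n = len(z)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))            # bordered system with Lagrange row/column
    A[:n, :n] = exp_semivariogram(d, **sv)
    A[n, n] = 0.0
    d0 = np.linalg.norm(coords - target, axis=1)
    rhs = np.append(exp_semivariogram(d0, **sv), 1.0)
    sol = np.linalg.solve(A, rhs)
    w, lam = sol[:n], sol[n]               # weights and Lagrange multiplier
    return w @ z, w @ rhs[:n] + lam, w

# Hypothetical sample locations and measurements.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
z = np.array([23.0, 20.0, 25.0, 30.0, 27.0])

pred, var, w = ordinary_kriging(coords, z, np.array([1.0, 1.0]))
print(pred, var, w.sum())
```

The weights sum to 1 (the unbiasedness constraint), and predicting at a sampled location returns the sampled value with zero kriging variance, which is the exact-interpolator property that makes held-out test locations necessary for an honest comparison of maps.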
41 ArcGIS: Geostatistical Wizard
[Figure labels: density of German workers; anisotropy check]
42 Cross-validation check of kriged values
This is one use of the missing spatial data imputation methods.
43 Unclipped kriged surface
[Figure labels: exponential semivariogram model; values increase with darkness of brown; extrapolation; kriged (mean response) surface; prediction error surface]
44 Clipped kriged surface
[Figure labels: kriged (mean response) surface; values increase with darkness of brown; prediction error surface]
45 Detrended population density across China
[Figure label: anisotropy check]
46 Cross-validation check of kriged values
This is one use of the missing spatial data imputation methods.
47 Unclipped kriged surface
[Figure labels: exponential semivariogram model; values increase with darkness of brown; extrapolation; kriged (mean response) surface; prediction error surface]
48 Clipped kriged surface
[Figure labels: kriged (mean response) surface; values increase with darkness of brown; prediction error surface]
49 THEOREM 4
The maximum likelihood estimate for missing georeferenced values described by a spatial autoregressive model specification is equivalent to the best linear unbiased predictor kriging equation of geostatistics.
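The equivalence in Theorem 4 can be checked numerically in a simplified setting. The sketch assumes a CAR specification on a hypothetical 4-by-4 rook lattice with the mean `mu` and autocorrelation parameter `rho` treated as known; the simulated values and all names are illustrative. The autoregressive form works with the precision matrix Q = I - rho C, while the kriging form works with the covariance matrix Sigma = Q^{-1}.

```python
import numpy as np

# Rook's-case adjacency for a 4-by-4 grid (cells at Manhattan distance 1).
nrow = ncol = 4
n = nrow * ncol
rr, cc = np.divmod(np.arange(n), ncol)
C = ((np.abs(rr[:, None] - rr[None, :])
      + np.abs(cc[:, None] - cc[None, :])) == 1).astype(float)

rho, mu = 0.25, 10.0                      # assumed known parameters
Q = np.eye(n) - rho * C                   # CAR precision (variance scale cancels)
Sigma = np.linalg.inv(Q)                  # implied covariance matrix

rng = np.random.default_rng(4)
y = mu + rng.normal(size=n)               # hypothetical map values
m = np.zeros(n, dtype=bool)
m[[1, 6, 12]] = True                      # cells treated as missing
o = ~m
r_o = y[o] - mu                           # observed residuals

# Autoregressive form: maximizes the joint normal likelihood over the missing values.
ar_form = mu - np.linalg.solve(Q[np.ix_(m, m)], Q[np.ix_(m, o)] @ r_o)

# Kriging (BLUP) form: covariance between missing and observed sites.
krig_form = mu + Sigma[np.ix_(m, o)] @ np.linalg.solve(Sigma[np.ix_(o, o)], r_o)

print(ar_form)
print(krig_form)
```

The two vectors agree to rounding error, which follows from the block-inverse identity relating the partitions of Q and Sigma.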
50 Geographically weighted regression: GWR
Spatial filtering enables easier implementation of GWR, as well as proper assessment of its degrees of freedom.
Step #1: compute the eigenvectors of a geographic connectivity matrix, say C
Step #2: compute all of the interaction terms XjEk for the P covariates times the K candidate eigenvectors (e.g., with MC > 0.25)
Step #3: select from the total set, including the individual eigenvectors, with stepwise regression
51 Step #4: the geographically varying intercept term is given by:
Step #5: the geographically varying covariate coefficient is given by factoring Xj out of its appropriate selected interaction terms:
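Steps #1 through #5 can be sketched as follows. Several choices here are assumptions rather than the lecture's implementation: the eigenvectors come from the doubly centered matrix (I - 11'/n) C (I - 11'/n), each eigenvector's MC is computed as n / (1'C1) times its eigenvalue, the stepwise selection of Step #3 is replaced by simply keeping the first five candidates, and the data are simulated on a hypothetical 8-by-8 rook lattice with one covariate. Under those assumptions the varying intercept is the constant plus the selected eigenvector terms, and the varying slope is the coefficient of X plus the eigenvector terms factored out of the X-by-E interactions.

```python
import numpy as np

# Step 1: eigenvectors of a (centered) rook connectivity matrix on an 8-by-8 grid.
nrow = ncol = 8
n = nrow * ncol
rr, cc = np.divmod(np.arange(n), ncol)
C = ((np.abs(rr[:, None] - rr[None, :])
      + np.abs(cc[:, None] - cc[None, :])) == 1).astype(float)
M = np.eye(n) - np.ones((n, n)) / n
lam, E = np.linalg.eigh(M @ C @ M)
order = np.argsort(lam)[::-1]             # largest eigenvalue first
lam, E = lam[order], E[:, order]
mc = n / C.sum() * lam                    # Moran coefficient of each eigenvector
cand = E[:, mc > 0.25]                    # candidate eigenvectors

# Step 3 stand-in: keep five candidates instead of running stepwise selection.
E_sel = cand[:, :5]
K = E_sel.shape[1]

# Simulated response with a spatially varying intercept and slope.
rng = np.random.default_rng(3)
x = rng.normal(size=n)
y = (2 + 3 * E_sel[:, 0]) + (1 - 4 * E_sel[:, 1]) * x + rng.normal(scale=0.1, size=n)

# Step 2: interaction terms x * E_k; then fit by least squares.
D = np.column_stack([np.ones(n), E_sel, x, x[:, None] * E_sel])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

# Steps 4 and 5: location-specific intercept and slope.
intercept = coef[0] + E_sel @ coef[1:1 + K]
slope = coef[1 + K] + E_sel @ coef[2 + K:]
fitted = D @ coef

print(intercept[:3], slope[:3])
```

Because every term is an ordinary regressor, the fitted values equal intercept + slope * x at each location, and the degrees of freedom are just those of the linear model.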
52 A Puerto Rico DEM example
Mean elevation (Y) is a function of: standard deviation of elevation (X), eigenvectors E1-E18, and 18 interaction terms (XE)
Results
- intercept: 1, E2, E5-E7, E9, E11-E13, E15, E18
- slope: 1, E4, E6, E9, E10
- R2 increases from (with X only) to (with geographically varying coefficients)
- P(S-W) = 0.52 for the final model
56 A summary: what have we learned during the 5 lectures?
- The nature of data and its information content.
- What is spatial autocorrelation?
- Visualizing spatial autocorrelation: Moran scatterplots, semivariogram plots, and maps.
- Defining and articulating spatial structure: topology and distance perspectives; contagion and hierarchy concepts.
- Necessary concepts from multivariate statistics.
- An example of the elusive negative spatial autocorrelation.
- Some comments about spatial sampling.
- Implications about space-time data structure.
57 Multivariate grouping, and location-allocation modeling.
Lecture #2
- Going from the global to the local: variability and heterogeneity.
- Impacts of spatial autocorrelation on histograms.
- The LISA and Getis-Ord statistics.
- Cluster analysis: multivariate analysis, cluster detection, and spider diagrams.
- An overview of geographic and space-time clusters.
- Regression diagnostics and geographic clusters.
58 Lecture #3
- Autoregressive specifications and normal curve theory (PROC NLIN).
- Auto-binomial and auto-Poisson models: the need for MCMC.
- Relationships between spatial autoregressive and geostatistical models.
- Spatial filtering specifications and linear and generalized linear models (PROC GENMOD).
- Autoregressive specifications and linear mixed models (PROC MIXED).
- Implications for space-time datasets (PROC NLMIXED).
59 Lecture #4
- Frequentist versus Bayesian perspectives.
- Implementing random effects models in GeoBUGS.
- Spatially structured and unstructured random effects: the CAR, the ICAR, and the spatial filter specifications.
Lecture #5
- The E-M algorithm
- The spatial E-M algorithm
- Kriging in ArcGIS
- Approaches to map smoothing