Local Indicators of Categorical Data
Boots, B. (2003). Developing local measures of spatial association for categorical data. Journal of Geographical Systems, 5(2), 139-160.

Why does space matter?
- Tobler's First Law: "Everything is related to everything else, but near things are more related than distant things." [1]
- Spatial autocorrelation
- Observations are located in space or have a spatial component
  - Where did someone get sick?
  - Where do richer people live?
- A wide range of questions can be evaluated from a spatial perspective
- Similar properties are likely when distance (physical, but also social etc.) is small
- Data often has distinct spatial characteristics: clustering vs. randomness vs. uniform distribution

Spatial Data Basics
- Spatial data is stored together with attributes in two formats
- Raster data: area represented by equally sized squares
- Vector data: data represented as points, lines or polygons

Global and local measures
- Both express spatial value similarity
- Global measures: a single value for the entire data set
  - Moran's I (deviation from mean)
  - Geary's C (actual values)
  - Getis-Ord (identifies general clustering of high or low values)
  - Join-count statistic (binary data)
- Local measures: a value for each observation
  - E.g. local Getis-Ord and local Moran's I

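Example: Global Moran's I

$$I = \frac{N}{\sum_i \sum_j W_{ij}} \cdot \frac{\sum_i \sum_j W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$

- $N$ is the number of observations (points or polygons)
- $\bar{X}$ is the mean of the variable
- $X_i$ is the variable value at a particular location
- $X_j$ is the variable value at another location
- $W_{ij}$ is a weight indexing the location of $i$ relative to $j$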
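To make the formula concrete, here is a minimal sketch in plain Python. The function name `morans_i`, the four-location toy data and the binary adjacency weights are illustrative assumptions, not part of the original slides.

```python
def morans_i(values, weights):
    """Global Moran's I for values X_i and a weight matrix W_ij (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]                    # X_i - mean
    w_sum = sum(sum(row) for row in weights)            # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))      # weighted cross-products
    den = sum(d * d for d in dev)                       # sum of squared deviations
    return (n / w_sum) * (num / den)

# Hypothetical toy data: four locations on a line, neighbours share an edge.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = morans_i([1, 1, 0, 0], w)     # similar values sit next to each other
alternating = morans_i([1, 0, 1, 0], w)   # neighbours always differ
print(clustered, alternating)
```

In this toy case the clustered pattern gives a positive value and the alternating pattern a negative one, which is the sign convention the slide describes as "value similarity".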

Measures of Local Spatial Association: Common Uses
- Assessing the assumption of stationarity for a given study region
- Identifying the existence of pockets of distinctive data values (hot and cold spots)
- Identifying the scale (spatial extent) at which there is no discernible association of data values

Measures of Local Spatial Association
- Examples
  - Local Moran's I: measures similarity for each region
  - Local Getis-Ord
- The sum of the local values gives (up to a constant factor) the global test statistic
- All common measures are designed for continuous (and ordinal) variables
- Developed in the context of regression to identify residuals
- Applied to categorical data, they would quantify the categories, implying a measurable distance between them
- No established measures for local spatial association of categorical data

Categorical Data
- The join-count statistic is widely applied as a global measure
  - Mostly for binary data
  - More classes are problematic and require large subregions to ensure sufficient counts
  - Only for cells and polygons
- Counts links (joins) between cells
  - Values are assigned based on occurrence or non-occurrence
  - A join is a border shared between cells
- From now on, assume a raster dataset with black and white cells
- New: local join-count statistic
- Different from quantitative data; two base concepts:
  - Composition: relates to the aspatial characteristics of the different classes (global and local concentration)
  - Configuration: refers to the characteristics of the spatial distribution of the classes (clustering)

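Categorical Data
- Global composition: share of one class in the overall count
  - 15 cells black, 85 white → total: 100
  - Share: 15% black
- Local composition: if the global composition is known, the likelihood of finding $x$ members of a class in a subregion is given by the binomial distribution:

$$\Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$$

  where $n = m^2$ is the number of cells in an m by m subregion and $p$ is the global share of the class
- Significant presence or absence of cells is evaluated with this formula for a specific m by m subregion, with adjustment for multiple testing and assuming no spatial dependence
- Significance criterion: Pr(X = x) < 0.05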
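The local composition test can be sketched as a moving window over a 0/1 raster. This is an illustration under stated assumptions, not the procedure from the paper: the function names are hypothetical, windows are indexed by their top-left cell and must lie fully inside the grid, tail probabilities Pr(X >= x) and Pr(X <= x) are used instead of the point probability, and no multiple-testing adjustment is applied.

```python
from math import comb

def binom_pmf(x, n, p):
    """Pr(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def upper_tail(x, n, p):
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def lower_tail(x, n, p):
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def local_composition(grid, m, alpha=0.05):
    """Flag m-by-m windows with unusually many ('hot') or few ('cold') black cells."""
    rows, cols = len(grid), len(grid[0])
    p = sum(map(sum, grid)) / (rows * cols)      # global composition
    n = m * m                                    # cells per window
    flags = {}
    for r in range(rows - m + 1):
        for c in range(cols - m + 1):
            x = sum(grid[r + i][c + j] for i in range(m) for j in range(m))
            if upper_tail(x, n, p) < alpha:
                flags[(r, c)] = "hot"
            elif lower_tail(x, n, p) < alpha:
                flags[(r, c)] = "cold"
    return p, flags

# Hypothetical 6x6 raster: 1 = black, 0 = white, with a 3x3 black block.
grid = [[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]
p, flags = local_composition(grid, m=3)
print(p, flags)
```

With a global share of 0.25, an all-white 3x3 window is not flagged as cold here, because nine white cells in a row are not unusual enough at that share.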

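Join-Count Test Statistic
- The test statistic is given by:

$$Z = \frac{\text{Observed} - \text{Expected}}{\text{SD of Expected}}$$

- The expected value and its standard deviation follow from a random assignment of black and white cells; in the standard free-sampling form for black/black joins:

$$E = k\,p_b^2, \qquad SD = \sqrt{k\,p_b^2 + 2m\,p_b^3 - (k + 2m)\,p_b^4}$$

- $k$ is the total number of joins
- $p_b$ is the expected proportion of black cells (random or given); $p_w$ is the proportion of white cells
- $m$ is based on the number of joins $k_i$ of each cell $i$ via $m = \frac{1}{2} \sum_i k_i (k_i - 1)$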
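As a rough illustration of the statistic, the sketch below counts black/black joins on a small raster and standardizes them. It assumes rook adjacency (shared edges only), the textbook free-sampling formulas for the expectation and variance of black/black joins, and a proportion of black cells estimated from the data; the function name and the toy grid are hypothetical, and other implementations may use a different sampling assumption.

```python
from math import sqrt

def join_count_bb(grid):
    """Z statistic for black/black joins on a 0/1 raster (rook adjacency, free sampling)."""
    rows, cols = len(grid), len(grid[0])
    p_b = sum(map(sum, grid)) / (rows * cols)    # proportion of black cells
    bb = 0      # observed black/black joins
    k = 0       # total number of joins
    m = 0.0     # 1/2 * sum of k_i * (k_i - 1)
    for r in range(rows):
        for c in range(cols):
            # Look right and down only, so each join is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    k += 1
                    bb += grid[r][c] * grid[rr][cc]
            # Number of rook neighbours k_i of this cell.
            k_i = (r > 0) + (r < rows - 1) + (c > 0) + (c < cols - 1)
            m += 0.5 * k_i * (k_i - 1)
    expected = k * p_b**2
    variance = k * p_b**2 + 2 * m * p_b**3 - (k + 2 * m) * p_b**4
    z = (bb - expected) / sqrt(variance)
    return bb, k, expected, z

# Hypothetical 4x4 raster with a 2x2 black block in one corner.
grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
bb, k, expected, z = join_count_bb(grid)
print(bb, k, expected, z)
```

The sketch breaks down when the grid is entirely one colour (the variance is zero), which a real implementation would need to guard against.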

Categorical Data
- Global configuration
  - Counts all possible links and the share of b/b, w/w and b/w links among them
  - Rarely used
  - A high share of b/b and w/w relative to b/w indicates clustering
  - A high relative share of b/w indicates dispersal
- Local configuration
  - Depends on global and local composition → conditional relationship
  - Is the number of join counts different from that under a random distribution of black cells?

Categorical Data: local configuration, continued
- Using the global composition, derive the join counts expected under a random distribution
- Distribution of join counts
  - Large datasets with b/b, w/w and b/w counts all larger than 30: normal approximation
  - Smaller datasets: complete enumeration or simulation of sample configurations
- Count all links in the subregion around each spatial unit
- Identify all cells whose b/b, w/w and b/w counts differ significantly from the global value, assuming randomness
- Local composition and configuration can be combined as a visualization tool
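For small subregions, the simulation route could look roughly like the sketch below. It makes one specific assumption that the slides leave open: the reference distribution is built by shuffling the cells of the window itself, so the test is conditional on the local composition. Function names, the seed and the number of simulations are illustrative choices.

```python
import random

def bb_joins(window):
    """Count black/black joins (rook adjacency) in a 0/1 window."""
    rows, cols = len(window), len(window[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                count += window[r][c] * window[r][c + 1]
            if r + 1 < rows:
                count += window[r][c] * window[r + 1][c]
    return count

def local_configuration_p(window, n_sim=999, seed=0):
    """Pseudo p-value: share of random arrangements with at least as many b/b joins."""
    rng = random.Random(seed)
    rows, cols = len(window), len(window[0])
    observed = bb_joins(window)
    cells = [v for row in window for v in row]
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(cells)   # keeps the number of black cells fixed
        sim = [cells[r * cols:(r + 1) * cols] for r in range(rows)]
        if bb_joins(sim) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_sim + 1)

# Hypothetical 5x5 subregion with six black cells forming one compact clump.
window = [[1, 1, 1, 0, 0],
          [1, 1, 1, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]
observed, p_value = local_configuration_p(window)
print(observed, p_value)
```

Running this for the window around every cell, and for w/w and b/w joins as well, would give the per-cell configuration map the slide refers to.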

An example
- Perennial shrub Atriplex hymenelytra
- Study area: Death Valley, CA
- Black: presence of the shrub / White: absence
- Global composition: 65/256 = 0.254
- Global test insignificant: no global spatial association
- Local tests for subregions of 3x3, 5x5 and 7x7 cells

Example: significant deviations from global composition under the assumption of non-dependence

Example: significant deviations from global configuration under the assumption of non-dependence

Example: combination of both
- Interpretation can be difficult
- Hot clumps, and hot-only or clump-only cells, indicate areas specifically suitable for growth of the shrub
- Exploratory data analysis; next question: what makes this area special?

Problems
- Assumption of global spatial non-dependence
  - Problematic: truly random patterns are very unlikely
  - With global spatial dependence the tests are too liberal: many local hotspots are identified
- Suggested method:
  1. Identify cells with significant local composition
  2. Compare their number and distribution with random simulations
  3. Identify cells with significant local configuration (clumps)
  4. Compute the probability of encountering black cells in clumps and outside of clumps
  5. Evaluate local composition using the additive binomial with all subregions
- Useful?
  - Step two enables evaluation via Monte Carlo simulation of whether numbers and distribution vary significantly
  - We are still often interested in the hotspots → targets for intervention etc.
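Steps one and two of the suggested method might be sketched as follows. This only compares the number of significant windows (not their spatial distribution) against random rearrangements of the same cells; the upper-tail binomial test, the top-left window indexing, and all function names are assumptions made for illustration.

```python
import random
from math import comb

def upper_tail(x, n, p):
    """Pr(X >= x) for a binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def count_hot_windows(grid, m, alpha=0.05):
    """Step 1: number of m-by-m windows with significantly many black cells."""
    rows, cols = len(grid), len(grid[0])
    p = sum(map(sum, grid)) / (rows * cols)
    hot = 0
    for r in range(rows - m + 1):
        for c in range(cols - m + 1):
            x = sum(grid[r + i][c + j] for i in range(m) for j in range(m))
            if upper_tail(x, m * m, p) < alpha:
                hot += 1
    return hot

def monte_carlo_hot_count(grid, m, n_sim=199, seed=0):
    """Step 2: compare the observed count with counts from shuffled grids."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    observed = count_hot_windows(grid, m)
    cells = [v for row in grid for v in row]
    as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(cells)   # random pattern with the same global composition
        sim = [cells[r * cols:(r + 1) * cols] for r in range(rows)]
        if count_hot_windows(sim, m) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_sim + 1)

# Hypothetical 8x8 raster with a 4x4 black block in one corner.
grid = [[1 if r < 4 and c < 4 else 0 for c in range(8)] for r in range(8)]
observed, p_value = monte_carlo_hot_count(grid, m=3)
print(observed, p_value)
```

A small pseudo p-value would suggest that the observed pattern produces more hot windows than random patterns with the same composition typically do.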

Potential Problems
- Vector data is characterized by unequally sized polygons
  - How to define areas? By steps to the central polygon
  - Potential bias towards large polygons with many boundaries
  - Highly complex data is problematic
  - What if a polygon has multiple borders with a second polygon?
- Other methods also yield results
  - Moran's I and Getis-Ord produce results with binary data
  - Though conceptually inappropriate, they might provide hints, and they include global composition and standardize
  - The scan statistic identifies hot spots but requires conversion to point data

Potential Problems
- Edge effects
  - What to do with missing values at the edge of the study area?
  - Using count data to estimate edge effects is highly problematic
- Modifiable areal unit problem (MAUP)
  - Testing across varying subregion sizes (steps)
  - Clustering varies across geographic scales
- Multiple testing
  - Adjustment can be too conservative
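The slides do not name a specific adjustment for multiple testing. Assuming a Bonferroni-style correction purely for illustration, the sketch below shows why such an adjustment can become conservative: the per-test threshold shrinks with the number of local tests. The function names and p-values are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction (assumed here)."""
    return alpha / n_tests

def significant(p_values, alpha=0.05):
    """Flag each local test against the adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Hypothetical p-values from four local tests.
flags = significant([0.04, 0.001, 0.2, 0.0001])
print(flags)

# With one test per cell of a 256-cell raster, the threshold becomes very small.
print(bonferroni_alpha(0.05, 256))
```

Because neighbouring subregions overlap and their tests are not independent, a correction of this kind treats the tests as more numerous than they effectively are.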

Conclusion
- Join counts are a well-established measure of global spatial association for categorical data
- Development of local spatial statistics for categorical data
  - More accurate conceptual treatment of categorical data
  - Can visualize clustering and concentration of categorical data
  - Useful for exploratory spatial data analysis
  - But often limited to binary problems
- Practical improvement?
  - A local Moran's I may provide an indication
  - It depends on the question asked
- Assessing the impact of global measures
  - Complicated and not fully developed
  - Necessity depends on the question asked

Software to deal with spatial problems
- GIS
  - Spatial data tool
  - Spatial properties (adjacency…) are inherent to the datasets: worry-free
  - Tools can be created in Python; integrated tools for spatial statistics
  - Push a button, but limited options in non-spatial statistics
- R
  - Flexible, with a large variety of available tools
  - Data has to be preprocessed to allow spatial calculation (adjacency etc.)
  - Can take some time
- (Matlab)
- (SAS)
  - Seems to have a variety of procedures for point data analysis

Code in R
- Introduction to spatial R: https://pakillo.github.io/R-GIS-tutorial/
- Creating neighbors in spatial data: https://cran.r-project.org/web/packages/spdep/vignettes/nb.pdf
  - This can also be used to create all subregions
- Global join count (spdep package): http://www.inside-r.org/packages/cran/spdep/docs/joincount.test
  - Perform this test on all subregions, using the global values as expected values
- Test for differences in local composition, to be performed for all spatial units (stats package): https://stat.ethz.ch/R-manual/R-devel/library/stats/html/prop.test.html

References
- Anselin, L. (1995). Local indicators of spatial association-LISA. Geographical Analysis, 27(2), 93-115.
- Boots, B. (2003). Developing local measures of spatial association for categorical data. Journal of Geographical Systems, 5(2), 139-160.
- Rogerson, P., & Yamada, I. (2008). Statistical detection and surveillance of geographic clusters. CRC Press.
- [1] Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2), 234-240.
