4What is spatial autocorrelation? Based on Tobler’s first law of geography, “Everything is related to everything else, but near things are more related than distant things.”It’s the correlation of a variable with itself through space.Patterns may indicate that data are not independent of one another, violating the assumption of independence for some statistical tests.
5Tests for spatial autocorrelation will allow you to answer the following questions about your data: How are the features distributed?What is the pattern created by the features?Where are the clusters?How do patterns and clusters of different variables compare to one another?
6PatternsUseful to:Better understand geographic phenomena (ex. Habitats)Monitor conditions (ex. Level of clustering)Compare different sets of features (ex. Patterns of different types of crimes)Track change
7PatternsYou can measure the pattern formed by the location of features or patterns of attribute values associated with features (ex. median home value, percent female, etc.).New AIDS cases in 1994New AIDS cases in 2003
8Types of data often analyzed Location of crimes, animals, retail, industry, etc.Land coverLand useCensus/social data
9Software ArcGIS GeoDa – open source Complete GIS software with hundreds of toolsCan work with several datasets (layers) at once.GeoDa – open sourceSolely for spatial statisticsUse one dataset (layer) at a time.Simple, easy-to-use, interfaceAvailable with registration at:
11Spatial Neighborhoods and Weights Neighborhood = area in which the GIS will compare the target values to neighboring valuesNeighborhoods are most often defined based on adjacency or distance, but can be defined based on travel time, travel cost, etc.You can also define a cutoff distance, the amount of adjacency (borders vs. corners), or the amount of influence at different distancesA table of spatial weights is used to incorporate these definitions into statistical analysis.
12Distance ModelsInverse distance – all features influence all other features, but the closer something is, the more influence it hasDistance band – features outside a specified distance do not influence the features within the areaZone of indifference – combines inverse distance and distance bandDistance could be Euclidean or ManhattanYou choose the model for certain calculations based on what you know about the data.
13Adjacency ModelsK Nearest Neighbors – a specified number of neighboring features are included in calculationsPolygon Contiguity – polygons that share an edge or node influence each otherSpatial weights – specified by user (ex. Travel times or distances)
14Types of Contiguity Rook = Share edges Bishop = share corners Queen = share edges or cornersSecondary order contiguity = neighbor of neighborImage from:
16Average Nearest Neighbor Measures how similar the actual mean distance is to the expected mean distance for a random distributionMeasures clustering vs. dispersion of feature locationsCan be used to compare distributions to one anotherConcerns: one point on a line is chosen for analysis, extent of study area can affect results (many features near the edge of the study bias results)Z score used to determine significanceFeatures at an edge bias the study because they do not have as many neighbors
17Ripley’s K-functionGIS counts the number of neighboring features within a given distance to each feature.Like Nearest Neighbor, the K-function measures clustering/dispersion of feature locations, but includes neighbors occurring within a certain distance.It is often used with individual points.The test compares the observed K value at each distance to the expected K value for a random distribution at each distance.Concerns: points at the edge of the study area may have few neighbors
18Ripley’s K-functionWe won’t be using these in the exercises, but they are useful to know they exist.Assaults are clustered until about 13,000 ft. and then dispersed beyond 15,000 ft.
20Global vs. Local Statistics Global statistics – identify and measure the pattern of the entire study areaDo not indicate where specific patterns occurLocal Statistics – identify variation across the study area, focusing on individual features and their relationships to nearby features (i.e. specific areas of clustering)
21Spatial Autocorrelation (Moran’s I) Global statisticMeasures whether the pattern of feature values is clustered, dispersed, or random.Compares the difference between the mean of the target feature and the mean for all features to the difference between the mean for each neighbor and the mean for all features.For more information on the equation, see ESRI online helpMean of Target FeatureMean of each neighborMean of all features
22Spatial Autocorrelation (Moran’s I) Calculates I values:I=0=random distributionI<0=Values dispersed0<I=Values clusteredClusteredDispersedRandom
23Spatial Autocorrelation (Moran’s I) I= -.12, slightly dispersedMoran’s I shows the similarity of nearby features through the I value (-1 to 1), but does not indicate if the clustering is for high values or low values.I= .26, clustered
24Anselin Local Moran’s I Local statisticMeasures the strength of patterns for each specific feature.Compares the value of each feature in a pair to the mean value for all features in the study area.See ESRI online help for the equation.
25Anselin Local Moran’s I Positive I value:Feature is surrounded by features with similar values, either high or low.Feature is part of a cluster.Statistically significant clusters can consist of high values (HH) or low values (LL)Negative I value:Feature is surrounded by features with dissimilar values.Feature is an outlier.Statistically significant outliers can be a feature with a high value surrounded by features with low values (HL) or a feature with a low value surrounded by features with high values (LH).
26Anselin Local Moran’s I What do the dark blue and bright red splotches tell us in the map on the right?Census tracts for percentage 65 and aboveZ-scoresI values
27Getis-Ord General G Global statistic Indicates that high or low values are clusteredThe value of the target feature itself is not included in the equation so it is useful to see the effect of the target feature on the surrounding area, such as for the dispersion of a disease.Works best when either high or low values are clustered (but not both).See ESRI online help for the equation.Provides one statistic so if there are both high and low clusters then stat could be inaccurate.
28Getis-Ord General GHigh G score: Statistically significant clustering of high values.Low G value: Slight clustering of low values.
29Hot Spot Analysis (Getis-Ord Gi*) Local version of the G statisticThe value of the target feature is included in analysis, which shows where hot spots (clusters of high values) or cold spots (clusters of low values) exist in the area.To be statistically significant, the hot spot or cold spot will have a high/low value and be surrounded by other features with high/low values.See ESRI online help for the equation.Local equivalent of getis-ord general G
31G statistics vs. Local Moran’s I G statistics are useful when negative spatial autocorrelation (outliers) is negligible.Local Moran’s I calculates spatial outliers.
32G statistics in GeodaGi > The value itself at a location (i) is not included in the analysisGi* > The value (i) is included in the numerator and denominator.
33Permutations in GeodaPermutation inference is shuffling values around and re-computing statistics each time with a different set of random numbers to construct a reference distribution.Permutations are used to determine how likely it would be to observe the Moran’s I value of an actual distribution under conditions of spatial randomness.P-values are dependent on the number of permutations so they are “pseudo p-values”
34Reference Distribution Geoda generates a historgram of the Moran’s I values compared to the observed Moran’s I.
35Bivariate Moran’s I An option in Geoda It tests 2 variables: the correlation between a given variable (x) at a location and a different variable (y) at surrounding locations.Results are difficult to interpret.It is useful in examining the range of interaction provided x and y are correlated at the same location.