Published by Christian Kent. Modified over 4 years ago.

1
www.spatialanalysisonline.com Chapter 5 Part A: Spatial data exploration

2
3rd edition, www.spatialanalysisonline.com
Spatial data exploration
Spatial analysis and data models (Anselin, 2002)

                      Object                    Field
GIS                   vector                    raster
Spatial Data          points, lines, polygons   surfaces
Location              discrete                  continuous
Observations          process realisation       sample
Spatial Arrangement   spatial weights           distance function
Statistical Analysis  lattice                   geostatistics
Prediction            extrapolation             interpolation
Models                lag and error             error
Asymptotics           expanding domain          infill

3
Sampling frameworks
- Pure random sampling
- Stratified random – by class/strata (proportionate, disproportionate)
- Randomised within defined grids
- Uniform
- Uniform with randomised offsets
- Sampling and declustering
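Several of the frameworks listed above can be sketched in a few lines. The following is a minimal illustration only, assuming a rectangular study area with its origin at (0, 0); the function names and the `max_offset` parameter are hypothetical and do not come from the slides.

```python
import random

def pure_random(n, width, height, rng):
    """n points placed uniformly at random in the rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def uniform_grid(nx, ny, width, height):
    """One point at the centre of each cell of an nx-by-ny grid."""
    dx, dy = width / nx, height / ny
    return [((i + 0.5) * dx, (j + 0.5) * dy)
            for i in range(nx) for j in range(ny)]

def random_within_grid(nx, ny, width, height, rng):
    """One point placed at random inside each grid cell."""
    dx, dy = width / nx, height / ny
    return [((i + rng.random()) * dx, (j + rng.random()) * dy)
            for i in range(nx) for j in range(ny)]

def uniform_with_offsets(nx, ny, width, height, max_offset, rng):
    """Cell centres shifted by a random offset in x and y.

    max_offset is a fraction of the cell size (assumption); values up to
    0.5 keep every point inside its own cell.
    """
    dx, dy = width / nx, height / ny
    return [(cx + rng.uniform(-max_offset, max_offset) * dx,
             cy + rng.uniform(-max_offset, max_offset) * dy)
            for cx, cy in uniform_grid(nx, ny, width, height)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = {
    "pure random": pure_random(16, 100.0, 100.0, rng),
    "uniform": uniform_grid(4, 4, 100.0, 100.0),
    "random within grid": random_within_grid(4, 4, 100.0, 100.0, rng),
    "uniform with offsets": uniform_with_offsets(4, 4, 100.0, 100.0, 0.25, rng),
}
for name, pts in samples.items():
    print(name, len(pts))
```

Stratified sampling and declustering are not covered here, since both need attribute data in addition to coordinates.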

4
Sampling frameworks – point sampling

5
Sampling frameworks – within zones
- Selection of 5 random points per zone
- Grid generation: square grid within field boundaries
- Grid generation (hexagonal): selection of 1 point per cell, random offset from centre

6
A. 10% random sample from existing point set
   800 radio-activity monitoring sites in Germany. Random sample of 80 (red/large dots)
B. Stratified random selection, 30% of each stratum
   200 radio-activity monitoring sites in Germany. Random sample of 30 (red/large dots)
Legend: [symbol] = 100 units of radiation
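The two selection schemes in panels A and B can be sketched as follows. This is an illustrative sketch, not the procedure used to produce the maps: the site list, the stratum labels and the `stratified_sample` helper are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, rng):
    """Draw the same fraction, without replacement, from every stratum."""
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = round(fraction * len(members))  # sample size for this stratum
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)

# Hypothetical monitoring sites: (site id, stratum label).
sites = [(i, "low" if i < 100 else "medium" if i < 160 else "high")
         for i in range(200)]

# A. Simple random selection of 10% of the existing point set.
simple = rng.sample(sites, len(sites) // 10)

# B. Stratified random selection, 30% of each stratum.
chosen = stratified_sample(sites, lambda s: s[1], 0.30, rng)

print(len(simple), len(chosen))
```

Sampling each stratum separately guarantees that small strata are represented, which a simple random draw does not.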

7
Random points on a network

8
EDA, ESDA and ESTDA
EDA – basic aims (after NIST)
- maximize insight into a data set
- uncover underlying structure
- extract important variables
- detect outliers and anomalies
- test underlying assumptions
- develop parsimonious models
- determine optimal factor settings

9
ESDA (see GeoDa and STARS)
Extending EDA ideas to the spatial domain (lattice/zone models)
- Brushing
- Linking
- Mapped histograms
- Outlier mapping
- Box plots
- Conditional choropleth plots
- Rate mapping

10
ESDA: Brushing & linking

11
ESDA: Histogram linkage

12
ESDA: Parallel coordinate plot & star plot

13
ESDA: Mapped box plots

14
ESDA: Conditional choropleth mapping

15
ESDA: Mapped point data
A. Variable point size
B. Variable colour
C. Semivariogram pairs
D. Voronoi analysis

16
ESDA: Trend analysis (continuous spatial data)

17
ESDA: Cluster hunting – GAM/K (steps)
1. Read data for the population at risk
2. Identify the MBR containing the data, identify starting circle radius, and degree of overlap
3. Generate a grid covering the MBR
4. For each grid-intersection generate a circle of radius r
5. Retrieve two counts for the population at risk and the variable of interest
6. Apply some significance test procedure
7. Keep the result if significant
8. Repeat Steps 5 to 7 until all circles have been processed
9. Increase circle radius by dr and return to Step 3, else go to Step 10
10. Create a smoothed density surface of excess incidence for the significant circles
11. Map this surface and inspect the results
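Steps 1 to 9 can be sketched as below. This is a simplified illustration under stated assumptions, not the GAM/K implementation itself: the significance test is a one-sided Poisson test against the overall rate, the grid spacing is derived from the overlap as `2 * r * (1 - overlap)`, and the loop stops at a hypothetical `r_max`. The MBR is the minimum bounding rectangle of the data. Steps 10 and 11 (smoothing and mapping) are omitted.

```python
import math

def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu); adequate for modest expected counts."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)
    cdf = term
    for i in range(1, k):
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def gam_scan(zones, r_start, r_max, dr, overlap=0.5, alpha=0.01):
    """zones: list of (x, y, population at risk, cases). Returns the
    significant circles as (cx, cy, radius, observed, expected)."""
    rate = sum(z[3] for z in zones) / sum(z[2] for z in zones)
    # Step 2: minimum bounding rectangle of the data.
    xmin, xmax = min(z[0] for z in zones), max(z[0] for z in zones)
    ymin, ymax = min(z[1] for z in zones), max(z[1] for z in zones)
    hits = []
    r = r_start
    while r <= r_max:
        # Step 3: grid over the MBR (spacing rule is an assumption).
        step = 2 * r * (1 - overlap)
        nx = math.ceil((xmax - xmin) / step)
        ny = math.ceil((ymax - ymin) / step)
        for i in range(nx + 1):
            for j in range(ny + 1):
                cx, cy = xmin + i * step, ymin + j * step
                # Steps 4-5: circle of radius r and the two counts.
                pop = cases = 0
                for x, y, p, c in zones:
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        pop += p
                        cases += c
                if pop == 0:
                    continue
                # Steps 6-7: test for excess incidence, keep if significant.
                expected = rate * pop
                if cases > expected and poisson_upper_tail(cases, expected) < alpha:
                    hits.append((cx, cy, r, cases, expected))
        r += dr  # Step 9: enlarge the circles and repeat.
    return hits

# Hypothetical data: 10 x 10 zones of 100 people, one small cluster of cases.
zones = [(float(x), float(y), 100, 10 if x in (2, 3) and y in (2, 3) else 1)
         for x in range(10) for y in range(10)]
hits = gam_scan(zones, r_start=1.0, r_max=2.0, dr=1.0)
print(len(hits), "significant circles")
```

Because many overlapping circles are tested, the retained circles are not independent tests; the output is best read as an exploratory map of excess incidence rather than a set of formal results.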

18
Grid-based statistics
- Univariate analysis of attribute data (non-spatial metrics)
- Cross-classification and cross-tab analyses
- Spatial pattern analysis for grid data (including Landscape metrics)
- Patch metrics; Class-level metrics; Landscape-level metrics
- Quadrat analysis
- Multi-grid regression analysis

19
Grid-based statistics
Landscape metrics
- Non-spatial: Proportional abundance; Richness; Evenness; Diversity
- Spatial: Patch size distribution and density; Patch shape complexity; Core Area; Isolation/Proximity; Contrast; Dispersion; Contagion and Interspersion; Subdivision; Connectivity
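The non-spatial metrics depend only on class counts, so they are straightforward to compute from a classified grid. The sketch below assumes Shannon's index for diversity and the ratio of that index to its maximum for evenness; these are common choices, but the slides do not say which definitions are intended, and the `composition_metrics` name is hypothetical.

```python
import math
from collections import Counter

def composition_metrics(grid):
    """Non-spatial composition metrics for a grid of class labels."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    # Proportional abundance of each class.
    proportions = {cls: n / total for cls, n in counts.items()}
    # Richness: number of classes present.
    richness = len(counts)
    # Diversity: Shannon index (assumed definition).
    shannon = -sum(p * math.log(p) for p in proportions.values())
    # Evenness: Shannon index over its maximum, ln(richness);
    # undefined when only one class is present, so None is returned.
    evenness = shannon / math.log(richness) if richness > 1 else None
    return proportions, richness, shannon, evenness

# Hypothetical land-cover grid: F = forest, W = water, U = urban.
grid = [
    ["F", "F", "W", "W"],
    ["F", "F", "W", "U"],
    ["F", "U", "U", "U"],
]
proportions, richness, shannon, evenness = composition_metrics(grid)
print(proportions, richness, round(shannon, 3), round(evenness, 3))
```

The spatial metrics (patch size, shape, contagion and so on) need patch delineation first and are not attempted here.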

20
Point (event) based statistics
- Typically analysis of point-pair distances
- Points vs events
- Distance metrics: Euclidean, spherical, L_p or network
- Weighted or unweighted events
- Events, NOT computed points (e.g. centroids)
- Classical statistical models vs Monte Carlo and other computational methods
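Three of the distance metrics named above are shown in the sketch below. It assumes planar coordinates for the Euclidean and L_p cases and a spherical Earth with a mean radius of 6371 km for the great-circle case; network distance is left out because it requires a graph of the network. The function names are illustrative.

```python
import math

def minkowski(p, q, power=2.0):
    """L_p distance between two points; power=2 is Euclidean, power=1 is
    the city-block (Manhattan) distance."""
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1.0 / power)

def euclidean(p, q):
    return minkowski(p, q, 2.0)

def great_circle(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Spherical distance in km via the haversine formula; inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

print(euclidean((0, 0), (3, 4)))
print(minkowski((0, 0), (3, 4), power=1.0))
print(great_circle(0.0, 0.0, 0.0, 90.0))
```

The choice of metric changes every point-pair distance, so it should be fixed before any of the nearest-neighbour or K-function statistics are computed.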

21
Point (event) based statistics
Basic Nearest neighbour (NN) model
- Input coordinates of all points
- Compute (symmetric) distances matrix D
- Sort the distances to identify the 1st, 2nd,...kth nearest values
- Compute the mean of the observed 1st, 2nd,...kth nearest values
- Compare this mean with the expected mean under Complete Spatial Randomness (CSR or Poisson) model

22
Point (event) based statistics – NN model

23
Point (event) based statistics – NN model
- Mean NN distance
- Variance
- NN Index (Ratio)
- Z-transform
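The formulas on this slide did not survive extraction. As a hedged stand-in, the sketch below uses the standard Clark and Evans first-order expressions under CSR: expected mean NN distance 1/(2*sqrt(m)) and standard error of the mean sqrt((4 - pi)/(4*pi*m*n)), where m = n/A is the point density. Whether the slide used exactly these forms is an assumption, and no edge correction is applied.

```python
import math

def nearest_neighbour_distances(points):
    """First-order NN distance for every point (brute force, O(n^2))."""
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]

def nn_index(points, area):
    """Return (observed mean, expected mean, NN index R, z-score)."""
    n = len(points)
    m = n / area                                   # density estimate
    r_obs = sum(nearest_neighbour_distances(points)) / n
    r_exp = 1.0 / (2.0 * math.sqrt(m))             # expected mean under CSR
    se = math.sqrt((4.0 - math.pi) / (4.0 * math.pi * m * n))
    return r_obs, r_exp, r_obs / r_exp, (r_obs - r_exp) / se

# Hypothetical regular 5 x 5 lattice in a study area of 25 square units.
lattice = [(float(x), float(y)) for x in range(5) for y in range(5)]
r_obs, r_exp, ratio, z = nn_index(lattice, area=25.0)
print(r_obs, r_exp, ratio, round(z, 2))
```

An index near 1 is consistent with CSR, values below 1 suggest clustering and values above 1 suggest a more regular (dispersed) pattern, as with the lattice used here.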

24
Point (event) based statistics
Issues
- Are observations n discrete points?
- Sample size (esp. for kth order NN, k>1)
- Model requires density estimation, m
- Boundary definition problems (density and edge effects) – affects all methods
- NN reflexivity of point sets
- Limited use of frequency distribution
- Validity of Poisson model vs alternative models

25
Frequency distribution of nearest neighbour distances, i.e. the frequency of NN distances in distance bands, say 0-1 km, 1-2 km, etc.
The cumulative frequency distribution is usually denoted
  G(d) = #(d_i < d)/n, where d_i are the NN distances and n is the number of measurements, or
  F(d) = #(d_i < d)/m, where m is the number of random points used in sampling

26
Computing G(d) [computing F(d) is similar]
- Find all the NN distances
- Rank them and form the cumulative frequency distribution
- Compare to expected cumulative frequency distribution
- Similar in concept to K-S test with quadrat model, but compute the critical values by simulation rather than table lookup
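The procedure above can be sketched as follows. The expected distribution under CSR is taken to be 1 - exp(-m*pi*d^2), with m the point density; that is the standard result, but the slide's own formula was lost, so treat it as an assumption. The simulation envelope uses a rectangular study area with no edge correction, and all names are illustrative.

```python
import math
import random

def nn_distances(points):
    """First-order NN distance for every point (brute force)."""
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]

def g_function(nn, d_values):
    """Empirical G(d): share of NN distances smaller than d."""
    n = len(nn)
    return [sum(1 for x in nn if x < d) / n for d in d_values]

def g_expected(density, d_values):
    """G(d) under CSR for a given point density (assumed form)."""
    return [1.0 - math.exp(-density * math.pi * d * d) for d in d_values]

def csr_envelope(n, width, height, d_values, n_sims, rng):
    """Pointwise min and max of G(d) over n_sims simulated CSR patterns."""
    lo = [1.0] * len(d_values)
    hi = [0.0] * len(d_values)
    for _ in range(n_sims):
        pts = [(rng.uniform(0, width), rng.uniform(0, height))
               for _ in range(n)]
        g = g_function(nn_distances(pts), d_values)
        lo = [min(a, b) for a, b in zip(lo, g)]
        hi = [max(a, b) for a, b in zip(hi, g)]
    return lo, hi

rng = random.Random(7)
width = height = 10.0
# Hypothetical observed pattern: 50 events, here simply drawn at random.
observed = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(50)]
d_values = [0.25 * k for k in range(1, 9)]

g_obs = g_function(nn_distances(observed), d_values)
g_csr = g_expected(len(observed) / (width * height), d_values)
lo, hi = csr_envelope(len(observed), width, height, d_values, 99, rng)
for d, g, e, a, b in zip(d_values, g_obs, g_csr, lo, hi):
    print(f"d={d:.2f} G={g:.2f} expected={e:.2f} envelope=[{a:.2f}, {b:.2f}]")
```

Observed values above the upper envelope at short distances point towards clustering, and values below the lower envelope towards regularity.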

27
Point (event) based statistics – clustering (ESDA)
- Is the observed clustering due to natural background variation in the population from which the events arise?
- Over what spatial scales does clustering occur?
- Are clusters a reflection of regional variations in underlying variables?
- Are clusters associated with some feature of interest, such as a refinery, waste disposal site or nuclear plant?
- Are clusters simply spatial or are they spatio-temporal?

28
Point (event) based statistics – clustering
- kth order NN analysis
- Cumulative distance frequency distribution, G(r)
- Ripley K (or L) function – single or dual pattern
- PCP
- Hot spot and cluster analysis methods

29
Point (event) based statistics – Ripley K or L
- Construct a circle, radius d, around each point (event), i
- Count the number of other events, labelled j, that fall inside this circle
- Repeat these first two stages for all points i, and then sum the results
- Increment d by a small fixed amount
- Repeat the computation, giving values of K(d) for a set of distances, d
- Adjust to provide normalised measure L
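The steps above translate fairly directly into code. The sketch below assumes the simple estimator K(d) = (A/n^2) * (number of ordered pairs within distance d) and the normalisation L(d) = sqrt(K(d)/pi). The slide's own formula for L was lost and other variants exist (for example dividing by n(n-1), or reporting L(d) - d), so both forms are assumptions; no edge correction is applied.

```python
import math

def ripley_k(points, area, distances):
    """K(d) for each d: scaled count of ordered pairs (i, j), i != j,
    with event j inside the circle of radius d centred on event i."""
    n = len(points)
    k_values = []
    for d in distances:
        count = 0
        for i, (xi, yi) in enumerate(points):
            for j, (xj, yj) in enumerate(points):
                if i != j and math.hypot(xi - xj, yi - yj) <= d:
                    count += 1
        k_values.append(area * count / (n * n))
    return k_values

def ripley_l(k_values):
    """Normalised measure: under CSR, L(d) is approximately d."""
    return [math.sqrt(k / math.pi) for k in k_values]

# Hypothetical pattern: four events on the corners of a unit square,
# in a study area of 4 square units.
events = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
distances = [0.5, 1.0, 1.5]
k_vals = ripley_k(events, area=4.0, distances=distances)
l_vals = ripley_l(k_vals)
print(k_vals)
print([round(v, 3) for v in l_vals])
```

In practice L(d) is plotted against d and compared with an envelope from simulated patterns, in the same spirit as the Monte Carlo approach used for G(d).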

30
Point (event) based statistics – Ripley K

31
Point (event) based statistics – comments
- CSR vs PCP vs other models
- Data: location, time, attributes, error, duplicates
- Duplicates: deliberate rounding, data resolution, genuine duplicate locations, agreed surrogate locations, deliberate data modification
- Multi-approach analysis is beneficial
- Methods: choice of methods and parameters
- Other factors: borders, areas, metrics, background variation, temporal variation, non-spatial factors
- Rare events and small samples
- Process-pattern vs cause-effect
- ESDA in most instances

32
Hot spot and cluster analysis – questions
- Where are the main (most intensive) clusters located?
- Are clusters distinct or do they merge into one another?
- Are clusters associated with some known background variable?
- Is there a common size to clusters or are they variable in size?
- Do clusters themselves cluster into higher order groupings?
- If comparable data are mapped over time, do the clusters remain stable or do they move and/or disappear?

33
Hot spot (and cool-spot) analysis
- Visual inspection of mapped patterns
- Scale issues
- Proximal and duplicate points
- Point representation (size)
- Background variation/controls (risk adjustment)
- Weighted or unweighted
- Hierarchical or non-hierarchical
- Kernel & K-means methods

34
Hot spot analysis – Hierarchical NN
Cancer incidence data: 1st and 2nd order clusters
