Spatial analysis: a roadmap David O’Sullivan University of Auckland School of Geography and Environmental Science

Spatial analysis: a roadmap David O’Sullivan University of Auckland School of Geography and Environmental Science d.osullivan@auckland.ac.nz

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan2 / 51 Overview A definition of spatial analysis –Data types and basic questions The bad news: classic problems of spatial analysis –Spatial dependence The good news: potential of spatial analysis –Some generally useful concepts in the analysis of spatial data

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan3 / 51 One definition of spatial analysis According to O’Sullivan and Unwin (2003) in Geographic Information Analysis: “concerned with investigating the patterns that arise as a result of processes that may be operating in space. Techniques and methods to enable the representation, description, measurement, comparison and generation of spatial patterns are central to the study of geographic information analysis” O'Sullivan, D. and Unwin, D. J. 2003. Geographic Information Analysis. Wiley: Hoboken, NJ.

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan4 / 51 What else is ‘spatial analysis’? It depends on your point of view: 1.Spatial data manipulation, often in a GIS, is frequently referred to as ‘spatial analysis’ 2.Spatial data analysis is descriptive and exploratory 3.Spatial statistical analysis employs statistical methods to investigate data with respect to some statistical model 4.Spatial modeling is about constructing models to predict or to better understand spatial phenomena

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan5 / 51 Auckland schools

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan6 / 51 Two key aspects in spatial analysis Spatial data –What types of spatial data exist? Applying standard statistical ideas to spatial data –What problems does this introduce? –Why is spatial data special?

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan7 / 51 Spatial data There are, broadly speaking, two main ways of representing the world: Vector objectsRaster fields

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan8 / 51 Vector object types Points Cities, people, houses, schools, crimes, disease incidence and mortality… Lines Drainage, road, communications, power networks, commutes… Areas Census districts, cities, boroughs, townships, school enrolment zones, police precincts, land cover units…

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan9 / 51 Field data are useful when a phenomenon is measurable at all locations –Field data are common for natural phenomena—air pressure, wind speed etc. –In social/human/population geography field data are sometimes used to represent estimated densities, since a density may be considered to be measurable everywhere (e.g., crime mapping) Fields

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan10 / 51 Attribute data types At each location, we have measurements of various attributes There are two broad classes (familiar from statistics): –Numerical data, which are either ratio or interval; and –Categorical data, which are either ordinal or nominal

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan11 / 51 The entity-attribute model Points Lines Areas Fields Nominal Ordinal Interval Ratio State highway Spot height Electoral districts

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan12 / 51 Some possible combinations Source: O'Sullivan and Unwin. 2003.

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan13 / 51 Some reservations … or … not so fast! This model is reductive –In particular, it is a statistician’s and GIS person’s view –How do you represent complex ethnographic data in this framework? ‘Home’? Photographs? Sound-clips? Hyperlinks? –In addition scale is often a complicating factor

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan14 / 51 OK, but… why the interest in spatial data? We assume that location makes a difference, so… –statistical distributions remain relevant… –… but now spatial patterns in the data are also of interest… –… and the possible relationships between the two are really what matter

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan15 / 51 Source: O'Sullivan and Unwin. 2003. Statistics and spatial analysis Statistics is about –Describing observed data –Comparing observed data to expectations based on a statistical model –Inferring from the comparison whether the observed data is compatible with the assumptions

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan16 / 51 Simplification in statistics To make the mathematics work, assumptions about observed data are made. In particular that –Observations are random samples from a population –Observations are independent of one another But… these assumptions are never true of spatial data

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan17 / 51 The problem with spatial data or: why “spatial [data] is special” In a nutshell: “Everything is related to everything else, but near things are more related than distant things ” Tobler, W. 1970. A computer movie simulating urban growth in the Detroit region, Economic Geography 46, 234–40 –This is sometimes called the First Law of Geography … because it is generally true! –It follows that: spatial data cannot be considered independent random samples from a population

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan18 / 51 Why “spatial is special” more specifically Some commonly identified ‘problems’ with spatial data are: –Autocorrelation –The modifiable areal unit problem (or MAUP) –Scale effects –Non-uniformity of space and ‘edge effects’

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan19 / 51 Spatial autocorrelation This follows directly from the observation that “… near things are more related than distant things” –Spatial data are self-correlated –There is redundancy in spatial data because observations made at locations near one another tend to be similar

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan20 / 51 Percent Pakeha by meshblock Positive autocorrelation is much more common in socio-economic data

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan21 / 51 Problem, or opportunity? Autocorrelation is only a problem if we choose to see it as one. Equally, we can –Describe or measure the autocorrelation structure of spatial data, in order to characterize it –Potentially use the description to improve subsequent analysis (e.g., simple interpolation becomes kriging in Geostatistics)

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan22 / 51 Describing autocorrelation Two broad effects can be considered: –First order variation Large scale variation in the mean value—a trend or background effect –Second order variation Local variation perhaps due to interaction effects between observations May be isotropic (no directional effects) or anisotropic (with a directional component)

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan23 / 51 First or second order? In practice, 1 st and 2 nd order effects are hard to distinguish Source: O’Sullivan and Unwin. 2003.

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan24 / 51 Autocorrelation statistics A number of formal measures exist: –Joins count statistics for binary or classed data –Moran’s I and Geary’s c for numeric data, at points or aggregated to areas –Semivariogram and covariogram functions for point data

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan25 / 51 Results For the Pakeha (European) population in Auckland City, using Moran’s I, we get:

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan26 / 51 The modifiable areal unit problem (MAUP) Areas are ‘arbitrary’: they are designed for convenience of data collection, not with respect to underlying patterns –Standard statistical techniques are sensitive to the choice of units –In one study* it was shown that the correlation between two variables can be estimated anywhere between –1 and +1, depending on the spatial units used! * Openshaw, S. and P. J. Taylor. 1979. A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In N. Wrigley (ed.) Statistical Methods in the Spatial Sciences, Pion: London, 127-44.

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan27 / 51 Redistricting Perhaps the clearest example of MAUP in practice… Source: The Economist

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan28 / 51 Scale Geographic scale effects are fundamental: –Different data types are appropriate at different scales, so available spatial data may be dependent on scale, ruling out some types of analysis –Scale is a factor in autocorrelation, and in the distinction between 1 st and 2 nd order effects –It is also a factor in MAUP with respect to the level of aggregation used

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan29 / 51 Non-uniformity of space The non-uniformity of space refers to problems arising from tacit assumptions about the uniform spatial density of ‘background’ populations –For example, ‘clusters’ of crime are expected in urban areas because more people live there –This leads to numerous analytic complexities Example: ISO9000 certified firms in the United States

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan30 / 51 Edge effects Entities on the edge of a study area only have neighbors in one direction (toward the middle) –Unless care is taken, this can distort things –Again, coping with this leads to numerous analytic complexities

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan31 / 51 The bad news summarized Data are spatially dependent –Spatial autocorrelation Data are also dependent on how you look at them spatially: –Aggregation –Scale –Non-uniformity of space –Edge-effects

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan32 / 51 Some good news Spatial data do have an intrinsic advantage, however… In addition to data we have a record of where the data were observed Making the most of this extra information lies at the heart of spatial analysis

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan33 / 51 Some useful general concepts A number of concepts are frequently invoked in spatial analysis –Distance –Adjacency –Interaction –Neighborhood –Proximity polygons

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan34 / 51 Distance Easily calculated from two coordinates –Use Pythagoras’s theorem –This is trickier on a sphere, but for projected data at sub-regional scales the Euclidean approximation is adequate yy xx d

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan35 / 51 Other distance metrics A variety of non-Euclidean measures Network distance on a transport system Travel time Perceived distance

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan36 / 51 Adjacency This is a sort of binary distance: two spatial objects are either adjacent or not –Often we use distance to decide: if d = 0, then two objects are adjacent –This can get complex for some kinds of object –The meaning is clearest for polygons

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan37 / 51 Queens and rooks These terms are fairly self explanatory, referring to which types of adjacency we choose to ‘allow’

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan38 / 51 A simple example

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan39 / 51 Adjacency applied to measuring autocorrelation For each pair of neighboring cases, calculate the covariance: –This produces a positive number when two values are similar, and a negative number when they are different Averaged over all neighboring cases, and scaled by dividing by the variance of the data, we get a number between –1 and +1 –This is interpreted in the same way as a standard correlation coefficient

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan40 / 51 Election 2000 again Those county level results expressed in terms of the percent share for George W. Bush The clear spatial pattern in these data is confirmed by a Moran’s I value of around 0.45

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan41 / 51 Interaction Interaction (often denoted w ij ) is a measure of the likely strength of relationship between two entities –The most common form is inverse-distance

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan42 / 51 Other measures of interaction –Inverse distance powered (usually squared) –Negative exponential –Weighted inverse distance

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan43 / 51 Matrices and spatial pattern Many of these concepts assign a value to describe the relationship between every pair of objects This lends itself to being recorded in a matrix:

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan44 / 51 Spatial weights matrices A particularly common matrix is the spatial weights matrix, usually denoted by W This records the interaction between each pair of objects, and appears in –autocorrelation, point pattern analysis, interpolation, spatial regression, geographically weighted regression, spatial interaction modeling…

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan45 / 51 Neighborhood Neighborhood is a less clear-cut concept –It can mean the region of space around some object –Or the set of objects considered to be neighbors of that object Some notion of neighborhood is implied by any given weights matrix

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan46 / 51 Proximity polygons Proximity polygons are an increasingly important example of the neighborhood concept A proximity polygon is associated with each spatial object and is the region of space nearer to that object than to any other A good demonstration of the idea is Voroglide by: Praktische Informatik VI, FernUniversität Hagen, Christian Icking, Rolf Klein, Peter Köllner, Lihong Ma

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan47 / 51 Uses of proximity polygons Proximity polygons are commonly used in –Interpolation –Point pattern analysis Increasingly they are used throughout spatial analysis, especially in location decision making

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan48 / 51 Just to review… Distance Adjacency Interaction Neighborhood

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan49 / 51 GIS and spatial analysis GIS vendors often claim to offer ‘spatial analysis’ –This usually doesn’t mean statistical spatial analysis, but spatial data manipulation— buffering, overlay etc. –However, GIS has increased the need for spatial analysis, because more people are making maps, and asking questions about them!

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan50 / 51 If spatial analysis is so useful, why is it not integrated into GIS?! Different perspectives on spatial data –GIS is built around the entity-attribute model. Spatial analysis uses this data (because that’s the way it comes). Conceptually, spatial analysis sees data as patterns which are the outcomes of processes, which can be quite different. Spatial analysis is not widely understood –It has been a specialized field, and therefore hard to justify incorporating into GIS, as a standard tool Spatial analysis can make GIS hard to sell –Spatial analysis is about asking difficult questions, not about easy answers

GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan51 / 51 Questions? David O’Sullivan University of Auckland School of Geography and Environmental Science d.osullivan@auckland.ac.nz

Spatial analysis: a roadmap David O’Sullivan University of Auckland School of Geography and Environmental Science

Similar presentations

Presentation on theme: "Spatial analysis: a roadmap David O’Sullivan University of Auckland School of Geography and Environmental Science"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Spatial analysis: a roadmap David O’Sullivan University of Auckland School of Geography and Environmental Science

Similar presentations

Presentation on theme: "Spatial analysis: a roadmap David O’Sullivan University of Auckland School of Geography and Environmental Science"— Presentation transcript:

Similar presentations

About project

Feedback