An Overview of Spatial Data Analysis
Stefan Falke, stefan@me.wustl.edu
http://capita.wustl.edu/ENVE424/REU/SpatialAnalysis.htm


1 An Overview of Spatial Data Analysis
Stefan Falke, stefan@me.wustl.edu

2 Pop vs Soda vs Coke

3 Pop vs Soda vs Coke by County

4 2000 Presidential Election Results
- Bush: States: 30, Votes: 50,456,169
- Gore: States: 21, Votes: 50,996,116

5 2000 Presidential Election Results by County
Bush Gore

6 Environmental Pattern and Trend Analysis
When analyzing environmental data we examine:
- Spatial patterns
- Temporal trends
We are particularly interested in changes in these patterns and trends, and in their relationships with other patterns and trends.
The analysis also strives to determine why we see these patterns and trends: what are the causal factors and what are their impacts?

7 Spatial and Temporal Data Analysis
Turns raw data into useful information by adding greater informative content and value.
Levels shown on the slide: Data, Information, Knowledge / Evidence, Wisdom

8 What is Spatial Data Analysis?
Spatial analysis is the quantitative and qualitative study of phenomena that are located in space.
Environmental spatial data analysis:
- Describes characteristics and behavior of the environment
- Explores patterns, trends, and relationships in environmental data
- Seeks to explain these patterns, trends, and relationships
It differs from general data analysis and statistics in that spatial data:
- Are dependent on location and related by location (they do not adhere to the independence assumption made in regular data analysis)
- Have properties that require special analysis methods
Why is spatial analysis such a big deal? About 85% of environmental data is spatial.

9 What is GIS?
The traditional definition is that GIS is a set of computer tools for accessing, processing, visualizing, analyzing, interpreting, and presenting spatial data.
'GIS' is Geographical Information System. OR IS IT Geographical Information Science?
- GISystems: emphasis on technology and tools
- GIScience: fundamental issues raised by the use of GIS, such as:
  - Spatial analysis
  - Map projections
  - Accuracy
  - Scientific visualization
Implementation and application of GIS covers a wide spectrum:
- Simple maps
- Overlaying multiple map "layers"
- Conducting proximity or cluster analysis based on distance
- Comparing data sets (simple spatial statistics)
- Complex statistical analysis

10 Nature Vol 427 22 January 2004

11 Special Spatial Nomenclature
- Geographic: limited to phenomena and problems relating to Earth's surface and near-surface
- Spatial: any space, including geographic, but not restricted to geographic coordinate space, e.g. medical imaging
- Geospatial: a recent term to represent the subset of spatial applied specifically to the Earth's surface (synonymous with geographic)

12

13 Tobler’s First Law of Geography
“Everything is related to everything else, but near things are more related than distant things.” Tobler, 1970
This general assumption is what makes spatial data subject to special statistical laws.

14 Types of Spatial Analysis
There are literally thousands of techniques.
Bailey and Gatrell (1995) offer four spatial data analysis classes; three are described here:
- Point Data Analysis: do the locations of point data and the relationships among the points represent a 'significant' pattern?
- Continuous Data Analysis: what are the spatial pattern and characteristics over a region, given a set of samples?
- Area Data Analysis: analysis of data that have been aggregated over a spatial zone, e.g. a county

15 The John Snow Map
A classic example of the use of location to draw inferences:
- 1854 cholera outbreak in London
- A point data map indicated some spatial clustering
- Overlaying a map of water pump locations showed many cases were concentrated around a single pump

16 Continuous Data Analysis
Temperature data is well suited for converting from point to continuous data:
- It has high spatial density
- Ambient temperature is relatively spatially homogeneous (no sharp gradients)

17 County Level Aggregated Data
Also known as a choropleth map

18 Scale
The most appropriate analysis method to use depends on the spatial and temporal scales of the problem.
The spatial variability of temperature at a 'local' scale is not necessarily significant when conducting an analysis at the 'regional' or 'global' scale.

19 Scale Dependent Measurements
How long is Maine's coastline? The answer depends on the scale of measurement:
- length = 340 km
- length = 355 km
- length = 415 km
From Longley et al., 2001

20 What's in a map, anyway?
Theme: static map
- Maps of entities whose location is known and (relatively) constant
- Roads, borders, locations of buildings
- These types of layers are often referred to as "thematic" layers
- Usually used to provide context to other spatial data
Statistical: realization of one of the many possible patterns that may have been generated by a process
- Given a set of conditions, a given spatial pattern is just one instance among a distribution of possible patterns
- The question is: is the observed realization significantly different from what would be expected by chance?

21 Deterministic versus Stochastic Processes
Deterministic processes have one realization: the value at a given location is always the same, regardless of the number of times the process occurs.
Stochastic processes have multiple realizations that are not precisely predictable and involve a random component.
For our purposes, random refers to the method used to generate a pattern, not the resulting pattern itself.

22 Examples of Deterministic & Stochastic Processes
random variable

23 Random Spatial Processes
A random process does not mean that all events are independent of one another, as is the case with flipping a coin or rolling dice. Rather, spatial random processes are random with dependence (or rules).
Consider a "conditionally" random display of 4 coins:
- Flip the first 3 coins and display each by its flipped side (heads or tails)
- The 4th coin is not flipped; it is displayed as follows:
  - If the 2nd and 3rd flipped coins are heads, the 4th is the same as the first
  - Otherwise, the 4th is the opposite of the first
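The four-coin rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide; the function name `conditional_coins` and the use of "H"/"T" labels are assumptions made here.

```python
import random


def conditional_coins(rng):
    """Flip three coins, then derive the fourth from the rule on the slide."""
    first, second, third = (rng.choice("HT") for _ in range(3))
    if second == "H" and third == "H":
        fourth = first  # same as the first coin
    else:
        fourth = "T" if first == "H" else "H"  # opposite of the first coin
    return first, second, third, fourth


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(5):
    print(" ".join(conditional_coins(rng)))
```

The fourth coin is never flipped, yet across many displays it still shows heads about half the time: the pattern is random, but with dependence.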

24 Basic Statistical Concepts
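- Mean: the arithmetic average of the values
- Variance: the average squared deviation of the values from the mean
- Median: the value in the distribution at which 50% of the data points lie above and 50% lie below
- Covariance: the average product of the deviations of two variables from their respective means
Frequency/Probability Distributions:
- Normal or Gaussian: mean = median
- Poisson: mean = variance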

25 Distribution Summary Statistics
The features of a distribution can be summarized using:
- Measures of Location
  - Mean
  - Median
  - Quantiles
- Measures of Spread
  - Standard Deviation = Square Root of Variance
- Measures of Shape
  - Coefficient of skewness: a measure of symmetry
  - Kurtosis: a measure of the likelihood of outliers
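As a rough illustration, the measures listed above can be computed with the Python standard library. This sketch assumes population (divide-by-n) moments for variance, skewness, and kurtosis; the slides do not say which convention was intended, and the helper name `summarize` is hypothetical.

```python
import math
import statistics


def summarize(values):
    """Location, spread, and shape measures using population moments."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)  # divide-by-n variance
    sd = math.sqrt(variance)
    # Standardized third and fourth moments (assumes sd > 0).
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "quartiles": statistics.quantiles(values, n=4),
        "variance": variance,
        "std_dev": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


print(summarize([2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 12.0]))
```

A skewness near zero suggests a symmetric distribution; a large kurtosis suggests heavier tails and therefore more outliers.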

26 Complete Spatial Randomness
Take as an example a randomly generated point data set where:
1) the chance of a given x,y point existing is equal to the chance of any other point existing (uniform probability distribution)
2) the existence of an x,y point is independent of the existence of any other point
These two conditions constitute an independent random process (IRP) or complete spatial randomness (CSR).
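A minimal sketch of generating such a point set, assuming a rectangular study area; the function name `csr_points` and the area dimensions are illustrative choices, not part of the original slides.

```python
import random


def csr_points(n, width, height, rng):
    """Draw n points uniformly and independently over a width x height area."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the pattern can be reproduced
points = csr_points(100, 10.0, 5.0, rng)
print(points[:3])
```

Each coordinate is drawn from a uniform distribution (condition 1) and each draw ignores all earlier draws (condition 2).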

27 Exploratory Spatial Data Analysis (ESDA)
- Aim is to identify data properties for purposes of pattern detection
- Based on the use of graphical and visual methods and the use of numerical techniques that are statistically robust, i.e. not much affected by extreme or atypical data values
The ArcGIS Geostatistical Analyst extension contains a set of ESDA tools:
- Histogram (Frequency Distribution)
- Voronoi Map
- QQPlot
- Trend Analysis

28 Exploratory Analysis Example

29 Summary Statistics

30 Quantile Plots
Graphs the quantiles of a dataset against the quantiles of a normal distribution
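The pairs plotted in a normal quantile plot can be sketched as follows. This is an assumption-laden illustration: it fits the reference normal distribution to the sample mean and standard deviation and uses the plotting position (i + 0.5) / n, which is one common convention among several. The name `normal_qq_pairs` is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def normal_qq_pairs(values):
    """Return (normal quantile, data quantile) pairs for a QQ plot."""
    data = sorted(values)
    n = len(data)
    reference = NormalDist(fmean(data), pstdev(data))
    # Plotting position (i + 0.5) / n keeps probabilities strictly inside (0, 1).
    return [(reference.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(data)]


for theoretical, observed in normal_qq_pairs([2, 4, 4, 5, 7, 9]):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the data are close to normally distributed, the pairs fall near a straight line.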

31 Voronoi Plot
Voronoi plots assign or calculate a value for each point's polygon, including:
- the value itself
- mean of neighboring polygons
- most frequent value among neighboring polygons
- unique value among neighbors
- variation among neighbors

32 Spatial Smoothing/Averaging

33 Data Types
Two general views to organizing spatial data:
- Entities or objects
  - Point measurements, rivers, structures
  - Have attributes or features attached to them
  - Point, vector or area format
  - Values exist at discrete locations
- Fields
  - Continuous data such as temperature gradient fields and satellite imagery
  - Values exist over an area
  - Raster format (grids)

34 Data Types
Entities and fields can each be transformed into the other type

35 Raster and Vector Data Models
[Figure: a real-world scene (trees, house, river) shown as a raster representation (10 x 10 grid with cells coded B, G, and BK) and as a vector representation on X and Y axes running from 100 to 600. Adapted from Lembo, 2003]

36 Landcover Raster Grid
[Figure: raster grid of numbered cells with value classes (1-5), (6-10), (11-15), (16-20). Legend: Mixed conifer, Douglas fir, Oak savannah, Grassland]

37 What is GIS?
The traditional definition is that GIS is a set of computer tools for accessing, processing, visualizing, analyzing, interpreting, and presenting spatial data.
'GIS' is Geographical Information System. OR IS IT Geographical Information Science?
- GISystems: emphasis on technology and tools
- GIScience: fundamental issues raised by the use of GIS, such as:
  - Spatial analysis
  - Map projections
  - Accuracy
  - Scientific visualization
Implementation and application of GIS covers a wide spectrum:
- Simple maps
- Overlaying multiple map "layers"
- Conducting proximity or cluster analysis based on distance
- Comparing data sets (simple spatial statistics)
- Complex statistical analysis

38 GIS Functionality
- Filtering: retrieves a subset of a dataset
  - Example: query (search)
- Aggregation: combines attributes or features within data sources (layers)
  - Examples: reclassify, dissolve
- Integration: combines two or more data sources (layers)
  - Examples: polygon overlay, table joining

39 Spatial Queries (Filter)
Identifying features based on spatial criteria
Criteria include variations on adjacency, containment, arrangement, and connectivity:
- Adjacency: which states are adjacent to the State of Missouri?
- Containment: which states "contain" the Mississippi River and its tributaries?

40 Reclassification (Aggregation)
An assignment of a class or value based on the attributes or geography of an object

41 Reclassification & Dissolve

42 Variable Distance Buffering

43 Polygon Overlay (Integration)
Topology describes the relationships between elements of a map. A topological data structure defines the elements of the map in a way that makes it possible to know which line segments are connected to each other and to know what polygon is adjacent to each side of a line segment.

44 Polygon Overlay Examples
“Cookie-cutter” method

45 Coordinate Systems
- A geographical coordinate system uses a three-dimensional spherical surface to define locations on the earth
- Divides space into an orderly structure of locations
- Two types: Cartesian and angular (spherical)
© Paul Bolstad, GIS Fundamentals

46 Parallels and Meridians
- Meridians are great circles of constant longitude. Example: the prime meridian
- Parallels are circles of constant latitude. Example: the equator
- latitude (φ): angular distance from the equator
- longitude (λ): angular distance from the standard meridian

City          Latitude     Longitude
St. Louis     38° 39' N    90° 38' W
New York      40° 47' N    73° 58' W
Los Angeles   34° 3' N     118° 14' W
Rome          41° 48' N    12° 36' E
Sydney        33° 52' S    151° 12' E
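Software usually wants these coordinates as signed decimal degrees rather than degrees and minutes. The sketch below assumes the usual sign convention (north and east positive, south and west negative); the helper name `dms_to_decimal` is made up for this example.

```python
def dms_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60.0
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# St. Louis, using the values given on the slide.
lat = dms_to_decimal(38, 39, "N")
lon = dms_to_decimal(90, 38, "W")
print(f"{lat:.4f}, {lon:.4f}")
```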

47 Earth’s Expanding Waistline
From the Chronicle of Higher Education Jan 17, 2003

48 Datum
- While a spheroid approximates the shape of the earth, a datum defines the position of the spheroid relative to the center of the Earth
- The datum provides a frame of reference for measuring locations on the surface of the Earth
- A datum is chosen to align a spheroid to closely fit the Earth's surface in a particular area

49 Map Projections and Distortions
Three general types of projections:
- Equal area: the ratio of areas on the earth and on the map is constant. Shape, angle, and scale are distorted.
- Conformal: the shape of any small surface of the map is preserved in its original form. If meridians and parallels meet at 90-degree angles, then angles are also preserved.
- Equidistant: preserves distances between certain points. Scale is not maintained correctly everywhere; typically, however, one or more lines have their scale maintained.

50 Comparing Projections

51 Summary Statistics of a Point Pattern
- Mean center: average of the x and y coordinates (geographic mean)
- Standard distance: average distance of points from the center (provides a measure of dispersion)
- Summary circle: centered at the mean center with a radius of the standard distance
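These summary statistics can be sketched as below. Note the assumption: standard distance is computed here as the root-mean-square distance from the mean center, which is the usual textbook definition and slightly different from a plain average of distances. The function names are illustrative.

```python
import math


def mean_center(points):
    """Average of the x coordinates and of the y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = mean_center(pts)
radius = standard_distance(pts)
print(f"summary circle: center={center}, radius={radius:.3f}")
```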

52 US Population Density

53 Geographic Center of US Population
The center of the US population is calculated as the average latitude and longitude, weighted by the population at a uniformly spaced set of points
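A population-weighted center of this kind might be computed as follows. The sample coordinates and populations are invented for illustration, and plain averaging of latitude and longitude is a simplification that ignores the curvature of the Earth.

```python
def weighted_center(points):
    """points: sequence of (latitude, longitude, population) tuples."""
    total = sum(pop for _, _, pop in points)
    lat = sum(la * pop for la, _, pop in points) / total
    lon = sum(lo * pop for _, lo, pop in points) / total
    return lat, lon


# Hypothetical grid points with made-up populations.
sample = [(40.0, -100.0, 1000), (30.0, -80.0, 3000)]
print(weighted_center(sample))
```

The more populous point pulls the center toward itself, which is why the result differs from the unweighted mean center.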

54 Quadrat Count
A quadrat count is conducted by superimposing a regular grid over the data, counting the number of events in each grid cell, and dividing each count by the cell area to get intensity.
The example on the slide uses 40 grid cells and compares the variance of the cell counts (s²) with the mean cell count (µ).
A ratio of s² to µ greater than 1 indicates clustering.
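The grid-cell count and the variance-to-mean ratio can be sketched as follows. The study-area dimensions, the grid size, and the function names are assumptions for illustration, and the sample (n - 1) variance is used here since the slide does not specify which variance it means.

```python
import statistics


def quadrat_counts(points, width, height, nx, ny):
    """Count points in each cell of an nx-by-ny grid over a width x height area."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)  # clamp points on the far edge
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts


def variance_mean_ratio(counts):
    """Sample variance of the cell counts divided by their mean."""
    mean = statistics.fmean(counts)
    return statistics.variance(counts, mean) / mean


clustered = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.1, 0.3)]
counts = quadrat_counts(clustered, 1.0, 1.0, 2, 2)
print(counts, variance_mean_ratio(counts))
```

Dividing each count by the cell area gives intensity; because all cells have the same area, the conclusion drawn from the ratio is usually the same either way.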

55 Spatial Autocorrelation
Defines the correlation between values of the same variable at different spatial locations
- Positive spatial autocorrelation: like values tend to cluster in space
- Negative spatial autocorrelation: neighbors are dissimilar
- Zero spatial autocorrelation: no correlation
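One widely used measure of spatial autocorrelation is Moran's I. The slides do not name a specific statistic, so this is offered only as an example. The sketch assumes binary neighbor weights (1 for neighbors, 0 otherwise) and a simple chain of locations in which each site neighbors the sites next to it; both helper names are hypothetical.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights; neighbors maps index -> neighbor indices."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0
    total_weight = 0
    for i, js in neighbors.items():
        for j in js:
            cross += dev[i] * dev[j]  # product of deviations at neighboring sites
            total_weight += 1
    return (n / total_weight) * cross / sum(d * d for d in dev)


def chain_neighbors(n):
    """Each location neighbors the locations immediately before and after it."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


print(morans_i([1, 2, 3, 4, 5, 6], chain_neighbors(6)))  # smooth gradient
print(morans_i([1, 5, 1, 5, 1, 5], chain_neighbors(6)))  # alternating values
```

A smooth gradient gives a positive value (like values sit next to each other), while the alternating sequence gives a negative value (neighbors are dissimilar).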

56 From points to fields
A spatial estimation method turns point monitoring data into a continuous surface of estimates (map).
- ci is the estimated value at location i
- n is the number of data points
- cj is the value at data point j
- wij is the weight assigned to data point j: the factor that determines how much influence a data point is assigned during the calculation of the estimate
The weighting factor is usually the distinguishing feature of interpolation methods.
Biggest challenge: how to determine the weights?
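With the symbols defined on this slide, a weighted estimate is commonly written as a normalized weighted average. The exact form used on the original slide is not recoverable from the transcript, so the expression below is an assumption consistent with those symbol definitions:

```latex
c_i = \frac{\sum_{j=1}^{n} w_{ij}\, c_j}{\sum_{j=1}^{n} w_{ij}}
```

Dividing by the sum of the weights means that, for non-negative weights, each estimate lies between the smallest and largest data values.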

57 Inverse Distance Interpolation
- k is the exponent of the power-law distance weighting
- Estimates are constrained to lie between the minimum and maximum values in the point data set
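A minimal sketch of inverse distance interpolation, assuming weights of the form 1 / d^k and a normalized weighted average (the slide's own equation is not preserved in the transcript). The function name `idw_estimate` and the sample values are illustrative.

```python
import math


def idw_estimate(target, samples, k=2.0):
    """Estimate the value at target from ((x, y), value) samples using 1/d**k weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for location, value in samples:
        d = math.dist(target, location)
        if d == 0.0:
            return value  # target coincides with a data point
        w = 1.0 / d ** k
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0), ((0.0, 3.0), 14.0)]
print(idw_estimate((1.0, 1.0), stations, k=2.0))
```

A larger k gives nearby points more influence. Because the weights are positive and normalized, the estimate never falls outside the range of the data values.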

58 Spatial Smoothing/Averaging

59 Landcover Raster Grid
[Figure: raster grid of numbered cells with value classes (1-5), (6-10), (11-15), (16-20). Legend: Mixed conifer, Douglas fir, Oak savannah, Grassland]

60 Raster Analysis (Continuous Data)
Moving Windows
Example 3 x 3 window:
2 3 5
2 3 6
3 5 7
minimum = 2, maximum = 7, range = 5, mean = 4
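A moving-window operation can be sketched as below, using the 3 x 3 grid of values from the slide (2 3 5 / 2 3 6 / 3 5 7). The sketch assumes that windows at the grid edge are simply clipped to the cells that exist; real GIS packages offer several edge-handling options. The function name is hypothetical.

```python
from statistics import fmean


def moving_window(grid, func):
    """Apply func to the 3 x 3 neighborhood of every cell (clipped at the edges)."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            out_row.append(func(window))
        result.append(out_row)
    return result


grid = [[2, 3, 5], [2, 3, 6], [3, 5, 7]]
print(moving_window(grid, min)[1][1])                          # center-cell minimum
print(moving_window(grid, max)[1][1])                          # center-cell maximum
print(moving_window(grid, lambda w: max(w) - min(w))[1][1])    # center-cell range
print(moving_window(grid, fmean)[1][1])                        # center-cell mean
```

For the center cell the window covers the whole grid, so the four results correspond to the minimum, maximum, range, and mean of all nine values.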

61 Slope
- Slope is the change in elevation (rise) with a change in horizontal position (run)
- The steepest descent between a cell and its neighbors is known as the gradient
- Slope is often reported in degrees (0° is flat, 90° is vertical) but is also expressed as a percent
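The two ways of reporting slope are related through the arctangent of rise over run. A small sketch, with illustrative function names:

```python
import math


def slope_degrees(rise, run):
    """Slope angle in degrees: 0 is flat, 90 is vertical."""
    return math.degrees(math.atan2(rise, run))


def slope_percent(rise, run):
    """Slope as a percent: rise divided by run, times 100."""
    return 100.0 * rise / run


# A 10 m rise over a 100 m run.
print(f"{slope_degrees(10, 100):.2f} degrees, {slope_percent(10, 100):.1f} percent")
```

Note that a 45° slope corresponds to 100 percent, and percent slope grows without bound as the slope approaches vertical.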

62 Hands-on Exercise: Mapping Census Data
- Database manipulation (table joins)
- Reprojecting maps
- Calculating derived values (population density, change in population over time)
- Visualization

63 ArcGIS Main Components
ArcCatalog ArcToolbox ArcMap

64

65 Data Quality
- It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable
- Uncertainty is found in data and in its processing and analysis
- The outputs from spatial data analysis and GIS are only as good as the inputs and associated assumptions

66 Logical Consistency
Representation of data that does not make sense:
- Road in the water
- Contours that cross or end
- Features on steep slopes

67 Modifiable areal unit problem
There are multiple ways to aggregate data into zones, and different zonings can yield different results.

68 Anscombe’s Quartet
These four data sets look identical from a statistical perspective.

69 Anscombe’s Quartet
They don’t look anything alike from a graphical perspective!
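The statistical similarity of the four data sets can be checked directly. The values below are the widely reproduced Anscombe (1973) data, transcribed here rather than taken from the slides, so they should be verified against the original table before reuse. The helper `pearson_r` is written out by hand to keep the sketch self-contained.

```python
import math
from statistics import fmean

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


for name, (xs, ys) in QUARTET.items():
    print(f"{name:>3}: mean x={fmean(xs):.2f}  mean y={fmean(ys):.2f}  "
          f"r={pearson_r(xs, ys):.3f}")
```

The means and correlations agree to two or three decimal places across the four sets, even though scatter plots of them look completely different, which is the argument for always plotting data before summarizing it.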

