Geographical analysis: overlay, cluster analysis, auto-correlation, trends, models, network analysis, spatial data mining


Geographical analysis
- Combination of different geographic data sets or themes by overlay or statistics
- Discovery of patterns and dependencies
- Discovery of trends and changes (over time)
- Development of models
- Interpolation, extrapolation, prediction
- Spatial decision support, planning
- Consequence analysis ("What if?")

Example overlay
- Two subdivisions with labeled regions
- Soil: soil type 1, soil type 2, soil type 3, soil type 4
- Vegetation: birch forest, beech forest, mixed forest
- Overlay result, e.g.: birch forest on soil type 2

Kinds of overlay
- Two subdivisions with the same boundaries
  - nominal and nominal: religion and voting per municipality
  - nominal and ratio: voting and income per municipality
  - ratio and ratio: average income and age of employees
- Two subdivisions with different boundaries: soil type and vegetation
- Subdivision and elevation model: vegetation and precipitation

Kinds of overlay, cont'd
- Subdivision and point set: quarters in a city, occurrences of violence on the street
- Two elevation models: elevation and precipitation
- Elevation model and point set: elevation and epicenters of earthquakes
- Two point sets: money machines, street robbery locations
- Network and subdivision, other network, or elevation model

Result of overlay
- New subdivision or map layer, e.g. for further processing
- Table with combined data: count, surface area

Soil     Vegetation   Area    #patches
Type 1   Beech        30 ha   2
Type 2   Birch        15 ha   2
Type 3   Mixed         8 ha   1
Type 4   Beech         2 ha   1
…

Buffer and overlay
- Neighborhood analysis: data of one theme within a given distance (buffer) of objects of another theme
- Sightings of nesting locations of the great blue heron (point set)
- Rivers; buffer with width 500 m around a river
- Overlay → nesting locations of the great blue heron near a river
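The heron example can be sketched in a few lines. This is only an illustration under assumptions: the river is a polyline, coordinates are in meters, and the coordinates themselves are made up. A point lies in the buffer when its distance to the nearest river segment is at most the buffer width.

```python
import math

def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the line segment ab."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(points, polyline, width):
    """Select the points that lie within `width` of the polyline."""
    segments = list(zip(polyline, polyline[1:]))
    return [p for p in points
            if min(dist_point_segment(p, a, b) for a, b in segments) <= width]

# Hypothetical data: a river and four nesting locations (meters).
river = [(0, 0), (1000, 0), (2000, 1000)]
nests = [(500, 300), (500, 800), (1500, 0), (2600, 1000)]
near_river = within_buffer(nests, river, 500)
```

For large data sets a GIS would use a spatial index instead of testing every point against every segment.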

Overlay: ways of combination
- Combination (join) of attributes
- One layer as selection for the other
  - Vegetation types only for soil type 2
  - Land use within 1 km of a river

Overlay in raster
- Pixel-wise operation, if the rasters have the same coordinate (reference) system
- Example: raster "Forest" and raster "Population increase above 2% per year"; a pixel-wise AND gives the pixels where both hold
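A pixel-wise AND can be sketched as follows. The two small rasters are made-up values, and the sketch assumes both rasters already share the same grid and reference system.

```python
# Two co-registered boolean rasters; the cell values are hypothetical.
forest = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
pop_increase_above_2pct = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
]

def raster_and(a, b):
    """Pixel-wise AND of two rasters with identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same dimensions")
    return [[int(bool(x) and bool(y)) for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

# Cells that are forest AND have a population increase above 2% per year.
both = raster_and(forest, pop_increase_above_2pct)
```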

Overlay in vector
- E.g. the plane sweep algorithm as given in Computational Geometry (line segment intersection), to get the overlay in a topological structure
- Using R-trees as an indexing structure to find intersections of boundaries

Combined (multi-way) overlays
- Site planning: new construction sites depending on multiple criteria
- Another example (earth sciences): parametric land classification, i.e. partitioning of the land based on chosen, classified themes

[Figure: three themes: elevation, annual precipitation, types of rock]
Overlay: partitioning based on the three themes

Analysis point set
- Points in an attribute space: statistics, e.g. regression, principal component analysis, dendrograms
- Example tuples (area, population, #crimes): (12, , 34) (14, , 31) (15, , 14) (17, , 82) (17, , 79) …
[Figure: scatter plot of #crimes against #population]

Analysis point set
- Points in geographical space without associated value: clusters, patterns, regularity, spread
- Actual average nearest neighbor distance versus the expected average nearest neighbor distance for this number of points in the region
- For example: volcanoes in a region; crimes in a city
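A minimal sketch of the nearest neighbor comparison follows. The slide does not give a formula for the expected distance; the sketch assumes the usual value for a completely random pattern, 1 / (2 * sqrt(n / area)), ignores edge effects, and uses made-up points and a made-up region area.

```python
import math

def average_nn_distance(points):
    """Observed average distance from each point to its nearest neighbor."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def expected_nn_distance(n, area):
    """Expected average NN distance for n random points (assumed formula)."""
    return 0.5 / math.sqrt(n / area)

# Hypothetical point set in a region of area 100 (e.g. a 10 x 10 square).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (9, 9)]
observed = average_nn_distance(points)
expected = expected_nn_distance(len(points), 100.0)
ratio = observed / expected  # below 1 suggests clustering, above 1 regularity
```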

Analysis point set
- Points in geographical space with value: up to what distance are measured values "similar" (or correlated)?

Analysis point set
- Temperature at location x and 5 km away from x is expected to be nearly the same
- Elevation (in Switzerland) at location x and 5 km away from x is not expected to be related (not even over 1 km), but it is expected to be nearly the same 100 meters away
- Other examples:
  - depth to groundwater
  - soil humidity
  - nitrate concentration in the soil

Analysis point set
- Points in geographical space with value: auto-correlation (~ up to what distance are measured values "similar", or correlated)
- n points → (n choose 2) pairs; each pair has a distance and a difference in value

[Figure: squared difference in value plotted against distance for all pairs; average squared difference per distance class, observed versus expected]
Classify distances and determine the average per class
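The classification of pairs by distance can be sketched as below. The sample values and the class width are hypothetical. Following the slide, the sketch averages the squared difference per class; many texts define the semivariogram as half of this value.

```python
import math
from itertools import combinations

def empirical_variogram(samples, class_width):
    """samples: list of (x, y, value).

    Returns {class index: average squared value difference}, where class k
    holds the pairs with distance in [k * class_width, (k + 1) * class_width).
    """
    sums, counts = {}, {}
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) // class_width)
        sums[k] = sums.get(k, 0.0) + (v1 - v2) ** 2
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical measurements along a line: (x, y, value).
samples = [(0, 0, 1.0), (1, 0, 1.5), (2, 0, 2.0), (3, 0, 4.0)]
variogram = empirical_variogram(samples, class_width=1.5)
```

In this toy data set the average squared difference grows with distance, which is the behavior expected of an auto-correlated variable below its range.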

[Figure: variogram: average squared difference against distance, showing the observed variogram and a (linear) model variogram, with nugget, sill and range marked]
Smaller distances → more correlation, smaller variance

Importance auto-correlation
- Descriptive statistic of a data set: describes the distance-dependency of auto-correlation
- Interpolation based on data further away than the range is nonsense

Importance auto-correlation
- If the range of a geographic variable is small, more sample point measurements are needed to obtain a good representation of the geographic variable through spatial interpolation
  → influences the cost of an analysis or decision procedure, and the quality of the outcome of the analysis

Analysis subdivision
- Nominal subdivision: auto-correlation (~ clustering of equivalent classes)
- Ratio subdivision: auto-correlation
[Figure: two maps of a subdivision with classes PvdA, CDA, VVD: one with auto-correlation, one with no auto-correlation]

Auto-correlation nominal subdivision (join count statistic)
- 22 neighbor relations (adjacencies) among 12 provinces
- Pr(province A = VVD and province B = VVD) = 4/12 * 3/11
- E(VVD adj. VVD) = 22 * 12/132 = 2; reality: 4 times
- E(CDA adj. PvdA) = 4/12 * 4/11 * 2 * 22 = 5.33; reality: once
[Figure: map of the provinces labeled PvdA, CDA, VVD]
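The expected join counts can be reproduced with exact fractions. The sketch assumes, as the factors 4/12 in the slide imply, that VVD, CDA and PvdA each hold 4 of the 12 provinces and that labels are assigned to provinces at random; the function names are made up for illustration.

```python
from fractions import Fraction

def expected_same_class_joins(n_regions, n_class, n_adjacencies):
    """Expected number of adjacencies whose two regions both have the class."""
    p = Fraction(n_class, n_regions) * Fraction(n_class - 1, n_regions - 1)
    return n_adjacencies * p

def expected_mixed_joins(n_regions, n_a, n_b, n_adjacencies):
    """Expected number of adjacencies joining class a with class b.

    The factor 2 covers both orders: (a, b) and (b, a).
    """
    p = 2 * Fraction(n_a, n_regions) * Fraction(n_b, n_regions - 1)
    return n_adjacencies * p

e_vvd_vvd = expected_same_class_joins(12, 4, 22)   # 22 * 4/12 * 3/11
e_cda_pvda = expected_mixed_joins(12, 4, 4, 22)    # 22 * 2 * 4/12 * 4/11
```

Comparing these expectations with the observed counts (4 and 1 in the slide) indicates that equal classes are adjacent more often than chance would suggest.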

Geographical models
Properties of (geographical) models:
- selective (simplification, more ideal)
- approximative
- analogous (resembles reality)
- structured (usable, analyzable, transformable)
- suggestive
- re-usable (usable in related situations)

Geographical models
Functions of models:
- psychological (for understanding, visualization)
- organizational (framework for definitions)
- explanatory
- constructive (beginning of theories, laws)
- communicative (transfer of scientific ideas)
- predictive

Example: forest fire
- Is the Kröller-Müller museum well enough protected against (forest) fire?
- Data: proximity of the fire department, burning properties of land cover, wind, origin of fire
- Model for fire spread: b * ws * (1 - sh) * (0.2 + cos α)
  - b = burn factor
  - ws = wind speed
  - α = angle between the wind and the direction of the pixel
  - sh = soil humidity
- Time neighbor pixel on fire: [1.41 *]
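The spread expression can be evaluated per neighbor pixel as in the sketch below. It only evaluates the formula b * ws * (1 - sh) * (0.2 + cos of the angle between the wind and the direction of the pixel); it does not model the time until a neighbor ignites. The burn factor 0.8 and wind speed 3 are values that appear in the example, the soil humidity 0.5 is a made-up value, and the function name is hypothetical.

```python
import math

def fire_spread(b, ws, sh, angle_deg):
    """Evaluate b * ws * (1 - sh) * (0.2 + cos(angle)) for one neighbor pixel.

    b: burn factor, ws: wind speed, sh: soil humidity (assumed in [0, 1]),
    angle_deg: angle between the wind and the direction of the pixel, in degrees.
    """
    return b * ws * (1.0 - sh) * (0.2 + math.cos(math.radians(angle_deg)))

downwind = fire_spread(0.8, 3.0, 0.5, 0.0)     # pixel straight downwind
crosswind = fire_spread(0.8, 3.0, 0.5, 90.0)   # pixel perpendicular to the wind
```

Note that the expression becomes negative for pixels almost straight upwind (cos of the angle below -0.2); how the model treats that case is not stated in the slides.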

Forest fire
[Figure: raster map with origin of the fire, museum, soil humidity, and wind (speed 3); situation after 9 minutes]
- Forest: burn factor 0.8
- Heath: burn factor 0.6
- Road: burn factor 0.2

Forest fire model
- Selective: only surface cover, humidity and wind; no temperature, seasonal differences, …
- Approximative: surface cover in 4 classes; no distinction in forest type, etc.; pixel-based, so direction is discretized
- Structured: pixels; simple definition of relations between pixels
- Re-usable: approach/model also applies to other locations (and other spread processes)

Network analysis
- When distance or travel time on a network (graph) is considered
- Dijkstra's shortest path algorithm
- Reachability measure for a destination i: potential value, defined in terms of
  - w_j = weight of origin j
  - β = distance decay parameter
  - c_ij = distance cost between origin j and destination i
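A compact sketch of both ingredients follows: Dijkstra's algorithm for network distances, and a potential value for a destination. The transcript does not preserve the potential formula itself, so the power-law form used here (sum over origins of weight divided by cost raised to the decay parameter) is an assumption, as are the toy network, the weights and all function names.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path cost from source to every reachable node.

    graph: {node: [(neighbor, edge_cost), ...]} with non-negative costs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def potential(graph, destination, weights, decay):
    """Assumed potential: sum over origins of weight / cost ** decay."""
    cost = dijkstra(graph, destination)  # undirected graph: cost is symmetric
    return sum(w / cost[j] ** decay
               for j, w in weights.items()
               if j != destination and j in cost)

# Hypothetical undirected network with travel costs.
graph = {
    "A": [("B", 2.0), ("C", 10.0)],
    "B": [("A", 2.0), ("C", 3.0)],
    "C": [("A", 10.0), ("B", 3.0)],
}
costs_from_a = dijkstra(graph, "A")
pot_a = potential(graph, "A", {"B": 100.0, "C": 50.0}, decay=1.0)
```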

Example reachability
- Law Ambulance Transport: every location must be reachable within 15 minutes (from the origin of the ambulance)

Example reachability
Physician's practice:
- optimal practice size: 2350 (minimum: 800)
- minimize distance to practice
- improve the current situation with as few changes as possible

Current situation: 16 practices, people, average 1875 per practice Computed, improved situation: 13 practices

Example in table

                                  Original   New
Number of practices
Number of practice locations      9          7
Number of practices < 800 size    2          0
Number of people > 3 km
Average travel distance (km)      0.9        1.2
Largest distance (km)             5.2        5.4

Analysis elevation model
- Landscape shape recognition:
  - peaks and pits
  - valleys and ridges
  - convexity, concavity
- Water flow, erosion, watershed regions, landslides, avalanches
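As a small illustration of shape recognition on a gridded elevation model, the sketch below marks peaks and pits. The definition used (a cell strictly higher or strictly lower than all 8 neighbors, interior cells only) is one simple convention chosen for the example, not necessarily the one intended in the slides, and the grid values are made up.

```python
def peaks_and_pits(grid):
    """Return (peaks, pits) as lists of (row, col) for interior cells.

    A peak is strictly higher than all 8 neighbors, a pit strictly lower.
    """
    peaks, pits = [], []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            neighbors = [grid[r + dr][c + dc]
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                         if (dr, dc) != (0, 0)]
            if all(grid[r][c] > v for v in neighbors):
                peaks.append((r, c))
            elif all(grid[r][c] < v for v in neighbors):
                pits.append((r, c))
    return peaks, pits

# Hypothetical elevation grid.
elevation = [
    [5, 5, 5, 5, 5],
    [5, 9, 5, 1, 5],
    [5, 5, 5, 5, 5],
]
peaks, pits = peaks_and_pits(elevation)
```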

Spatial data mining
- Finding spatial patterns in large spatial data sets
  - within one spatial data set
  - across two or more data sets
- With time: spatio-temporal data mining

Spatial data mining and computation
- "Geographic data mining involves the application of computational tools to reveal interesting patterns in objects and events distributed in geographic space and across time" (Miller & Han, 2001)
- Large data sets
  → attempt to carefully define interesting patterns (to avoid finding non-interesting patterns)
  → advanced algorithms needed for efficiency

Clustering?
- Are the people clustered in this room? → How do we define a cluster?
- In spatial data mining we have objects/entities with a location given by coordinates
- Cluster definitions involve distance between locations

Clustering - options
- Determine whether clustering occurs
- Determine the degree of clustering
- Determine the clusters
- Determine the largest cluster
- Determine the outliers

Co-location
- Are the men clustered?
- Are the women clustered?
- Is there a co-location of men and women? → co-location pattern

Co-location
Like before, we may be interested in
- is there co-location?
- the degree of co-location
- the largest co-location
- the co-locations themselves
- the objects not involved in co-location

Spatio-temporal data
- Locations have a time stamp
- Interesting patterns involve space and time
- Example here: time-stamped point set

Trajectory data
- Entities with a trajectory (time-stamped motion path)
- Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ...
- n entities = trajectories; n = 10 to 100,000
- t time steps; t = 10 to 100,000 → input size is nt
- m = size of subgroup (unknown); m = 10 to 100,000

Trajectory data
- Migration patterns of animals
- Trajectories of tornadoes
- Tracking of (suspect) individuals for security
- Lifelines of people for social behavior

Example pattern in trajectories
- What is the location visited by most entities?
- location = circular region of specified radius
[Figure: example trajectories with one circular region visited by 4 entities and another visited by 3 entities]

Example pattern in trajectories
- Compute the buffer of each trajectory
- Compute the arrangement of the buffers and the cover count of each cell

Example pattern in trajectories
- One trajectory has t time stamps; its buffer can be computed in O(t log t) time
- All buffers can be computed in O(nt log t) time
- The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)^2) is the complexity of the arrangement
- Cell cover counts are determined in O(k) time

Example pattern in trajectories
- Total: O(nt log (nt) + k) time
- If the most visited location is visited by m entities, this is O(nt log (nt) + ntm)
- Note: input size is nt; n entities, each with a location at t moments
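The notion of cover count can be illustrated with a brute-force sketch. This is not the arrangement-based algorithm analyzed above: it simply counts, for a candidate center, how many trajectories pass within the given radius (equivalently, how many buffers contain the center), and samples candidate centers on a grid. Trajectories, radius, grid and function names are made up for the example.

```python
import math

def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the line segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def cover_count(center, trajectories, radius):
    """Number of trajectories that come within `radius` of `center`."""
    return sum(
        1 for traj in trajectories
        if any(dist_point_segment(center, a, b) <= radius
               for a, b in zip(traj, traj[1:]))
    )

def most_visited_on_grid(trajectories, radius, xs, ys):
    """Best (count, center) among grid-sampled candidate centers."""
    return max((cover_count((x, y), trajectories, radius), (x, y))
               for x in xs for y in ys)

# Three hypothetical trajectories (polylines).
trajectories = [
    [(0, 0), (10, 0)],
    [(5, -5), (5, 5)],
    [(0, 10), (10, 10)],
]
best_count, best_center = most_visited_on_grid(
    trajectories, 1.0, range(0, 11), range(0, 11))
```

Grid sampling can miss the true optimum between grid points; the arrangement of buffers avoids this by considering every cell exactly.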

Patterns in entity data
Spatial data
- n points (locations)
- Distance is important: clustering pattern
- Presence of attributes (e.g. male/female): co-location patterns
Spatio-temporal data
- n trajectories, each with t time steps
- Distance is time-dependent: flock pattern, meet pattern
- Heading and speed are important and are also time-dependent

Patterns in trajectories
- n trajectories, each with t time steps → n polygonal lines with t vertices

Patterns in trajectories
- Flock and meet patterns: a large enough subset that has the same "character" during a time interval
  - close to each other
  - same direction of motion
  - ...
- Flock: changing location
- Meet: fixed location
- Determine the longest duration pattern

Patterns in trajectories
- Longest flock: given a radius r and subset size m, determine the longest time interval for which m entities were within each other's proximity (circle of radius r)
[Figure: trajectories along a time axis; for m = 3 the longest flock is in the time interval [1.9, 6.4]]

Patterns in trajectories
- Computing the longest flock is NP-hard
- This remains true for radius-cr approximations with c < 2
- A radius-2 approximation of the longest flock can be computed in O(n^2 t log n) time
- Meaning: if the longest flock for radius r has duration τ, then we surely find a flock of duration ≥ τ for radius 2r

Patterns in trajectories
[Figure: a flock and a meet, with fixed subset size m = 3 and fixed radius]

Patterns in trajectories
- Go into 3D (space-time) for algorithms
[Figure: flock and meet drawn in space-time, with time as the third axis]

Patterns in trajectories
Exact radius results
- flock: NP-hard
- meet: O(n^2 t^2 (n^2 log n + t))

Patterns in trajectories
Approximate radius results
- flock: O(n^2 t log n), factor 2
- meet: O((n^2 t log n) / (m ε^2)), factor 1+ε

Patterns in trajectories
- Flock and meet patterns require algorithms in 3-dimensional space (space-time)
- Exact algorithms are inefficient → only suitable for smaller data sets
- Approximation can reduce the running time by an order of magnitude

Summary
- There are many types of geographical analysis; it is the main task of a GIS
- Overlay and buffer analysis are the most important
- Statistics is also very important
- Spatial and spatio-temporal data mining give new types of analysis of geographic data