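Urban Spatial Information Technology
Chapter 10: Data Exploration
胡嘉骢, PhD, Associate Professor, Chair of the Department of Urban Planning, School of Real Estate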
CHAPTER 11 DATA EXPLORATION
11.1 Data Exploration
11.2 Attribute Data Query
11.3 Spatial Data Query
11.4 Raster Data Query
11.5 Geographic Visualization
CHAPTER 11 DATA EXPLORATION
Beginning of GIS analysis
What do you do with a database of dozens of layers and hundreds of attributes?
Data exploration allows you to examine trends and focus on relationships
Better understand data
Link maps, graphs, and tables

Data Exploration
Exploratory data analysis
–Statistical analysis
Dynamic graphics
Data visualization
–Finding Gestalt (finding patterns and properties in a data set)
–Posing queries
–Making comparisons

Descriptive Statistics
Summarize values of a data set
–Range
–Median
–Mean
–Mode
–Quantile analysis
–Variance
–Standard deviation
–z-score
GIS packages offer descriptive statistics
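The statistics listed above can be computed with Python's standard library. This is a minimal sketch on made-up values; the data, the variable names, and the choice of population (rather than sample) variance are all assumptions for illustration.

```python
import statistics

# Hypothetical sample values (e.g., % population change for seven areas).
values = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0, 12.0]

value_range = max(values) - min(values)
median = statistics.median(values)
mean = statistics.mean(values)
mode = statistics.mode(values)
quartiles = statistics.quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
variance = statistics.pvariance(values)        # population variance (assumed)
std_dev = statistics.pstdev(values)            # population standard deviation

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(v - mean) / std_dev for v in values]

print(value_range, median, mean, mode, quartiles, variance, std_dev)
```

A z-score of 0 marks a value equal to the mean; positive and negative scores mark values above and below it.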
Graphs
Visual display of data
Numerous possibilities

Figure 11.1 A line graph.

Figure 11.2 A histogram.

Figure 11.3 A cumulative distribution graph.

Figure 11.4 A scatterplot plotting % persons under 18 years old in 2000 against % population change, 1990–2000. A weak positive relationship, with a correlation coefficient of 0.376, is present between the two variables.
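The slide does not say which correlation coefficient is reported for the scatterplot; Pearson's r is the usual choice and is assumed here. A minimal sketch of the computation, using made-up values rather than the data behind the figure:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, not the data plotted in the figure.
pct_under_18 = [22.0, 25.0, 24.0, 27.0, 23.0]
pct_pop_change = [5.0, 9.0, 14.0, 12.0, 10.0]
r = pearson_r(pct_under_18, pct_pop_change)
print(round(r, 3))
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.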
Figure 11.5 A bubble plot showing % population change, 1990–2000, along the x-axis; % persons under 18 years old in 2000 along the y-axis; and state population in 2000 by the bubble size.

Figure 11.6 A boxplot based on the % population change, 1990–2000, data set.

Figure 11.7 Boxplot (a) suggests that the data values follow a normal distribution. Boxplot (b) shows a positively skewed distribution with a higher concentration of data values near the high end. The x’s in (b) may represent outliers, which are more than 1.5 box lengths from the end of the box. Boxplot (c) shows a negatively skewed distribution with a higher concentration of data values near the low end.
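The outlier rule in the caption (more than 1.5 box lengths from the end of the box) can be sketched as follows. The function name, the sample data, and the quartile convention (`method="inclusive"`) are assumptions; GIS and statistics packages differ in how they compute quartiles, so their flagged outliers may differ slightly.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return values lying more than k box lengths beyond either end of the box."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1                 # interquartile range
    low_fence = q1 - k * box_length
    high_fence = q3 + k * box_length
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical data with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
outliers = boxplot_outliers(sample)
print(outliers)
```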
Figure 11.8 A QQ plot plotting the % population change data values against the standardized values from a normal distribution.

Figure 11.9 A 3-D plot showing annual precipitation at 105 weather stations in Idaho. A north-to-south decreasing trend is apparent in the plot.
Dynamic Graphics
Graphs displayed in multiple, dynamically linked windows
Directly manipulate data points
–Pose a query in one window and get the response in another window
Multiple linked windows are an optimal framework for posing queries

Brushing
Figure: The scatterplot on the left is dynamically linked to the map on the right. The “brushing” of two data points in the scatterplot highlights the corresponding states (Washington and New Mexico) on the map.

Other Dynamic Graphic Manipulation Methods
Rotation
Deletion
Transformation

Data Exploration and GIS
Similar to exploratory data analysis in statistics, with two differences
–In GIS, it involves both spatial and attribute data
–Media for data exploration in GIS include maps and map features
Attribute Data Query
Search attribute data to retrieve a data subset
The selected subset can be examined in a table, displayed in charts, or linked to map features
Query expressions must be interpretable by the GIS

SQL (Structured Query Language)
Data query language designed for relational databases
Designed by IBM in the 1970s and used by many commercial database management systems

SQL Structure (Syntax)
select <fields> from <tables> where <condition>
The select keyword selects fields
The from keyword selects tables
The where keyword specifies the condition or criterion for the data query

Figure: PIN relates the owner and parcel tables and allows use of SQL with both tables.
SQL Examples
Query the sale date of the parcel coded P101:
select Parcel.Sale_date from Parcel where Parcel.PIN = 'P101'

Query the parcels larger than 2 acres that are zoned commercial:
select Parcel.PIN from Parcel where Parcel.Acres > 2 AND Parcel.Zone_code = 2

Query the sale date of the parcel owned by Costello:
select Parcel.Sale_date from Parcel, Owner where Parcel.PIN = Owner.PIN AND Owner.Owner_name = 'Costello'
This query involves two tables, which must be joined first
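The three example queries can be tried against a small in-memory SQLite database. This is a sketch under stated assumptions: the table contents, column types, and dates are invented for illustration, and zone code 2 is taken to mean commercial, as the slide implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Parcel and Owner tables related by the PIN key.
conn.executescript("""
    CREATE TABLE Parcel (PIN TEXT PRIMARY KEY, Sale_date TEXT, Acres REAL, Zone_code INTEGER);
    CREATE TABLE Owner (PIN TEXT, Owner_name TEXT);
    INSERT INTO Parcel VALUES
        ('P101', '1998-01-10', 1.0, 1),
        ('P102', '1968-04-25', 3.0, 2),
        ('P103', '1979-03-17', 2.5, 2),
        ('P104', '1995-10-20', 1.0, 3);
    INSERT INTO Owner VALUES
        ('P101', 'Wang'),
        ('P102', 'Costello'),
        ('P103', 'Smith'),
        ('P104', 'Chevez');
""")

# Sale date of the parcel coded P101.
q1 = conn.execute(
    "SELECT Parcel.Sale_date FROM Parcel WHERE Parcel.PIN = 'P101'"
).fetchall()

# Parcels larger than 2 acres that are zoned commercial (code 2, assumed).
q2 = conn.execute(
    "SELECT Parcel.PIN FROM Parcel "
    "WHERE Parcel.Acres > 2 AND Parcel.Zone_code = 2 ORDER BY Parcel.PIN"
).fetchall()

# Sale date of the parcel owned by Costello; the two tables are joined on PIN.
q3 = conn.execute(
    "SELECT Parcel.Sale_date FROM Parcel, Owner "
    "WHERE Parcel.PIN = Owner.PIN AND Owner.Owner_name = 'Costello'"
).fetchall()

print(q1, q2, q3)
```

Each result is a list of one-element tuples, one tuple per selected record.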
Query Expressions
The where clause consists of Boolean expressions and Boolean connectors

Boolean Expressions
Contain two operands and a logical operator, e.g., Parcel.PIN = 'P101'
Operators include =, <, >, <=, >=, and <>

Boolean Connectors
AND, OR, XOR, NOT
Used to connect two or more expressions

Figure: The shaded portion represents the complement of data subset A (top), the union of data subsets A and B (middle), and the intersection of A and B (bottom).
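The Boolean connectors correspond to set operations on the records that satisfy each expression. A minimal sketch with Python sets, where the record IDs and the subsets A and B are made up for illustration:

```python
# Hypothetical record IDs in a table of eight records.
all_records = set(range(1, 9))
A = {1, 2, 3, 4}   # records satisfying expression A
B = {3, 4, 5, 6}   # records satisfying expression B

not_a = all_records - A   # NOT A: complement of A
a_or_b = A | B            # A OR B: union
a_and_b = A & B           # A AND B: intersection
a_xor_b = A ^ B           # A XOR B: records in exactly one of the subsets

print(not_a, a_or_b, a_and_b, a_xor_b)
```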
Type of Operation
Select a subset and divide the data into two groups
–Those containing the selected records
–Those containing the unselected records
Three types of operations
–Add more records
–Subtract records
–Select a smaller subset

Figure: Three types of operation may be performed on the subset of 40 records: add more records to the subset (+2), remove records from the subset (-5), or select a smaller subset (20).
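The three operations on a selected subset, plus switching the selection, can be sketched with sets of record IDs. The table size (100), the IDs, and the even-ID criterion used to pick the smaller subset are assumptions chosen only to reproduce the counts in the figure.

```python
# Hypothetical table of 100 record IDs with an initial selection of 40.
table = set(range(100))
selected = set(range(40))

# Add more records to the subset (+2).
added = selected | {50, 51}

# Remove records from the subset (-5).
removed = selected - {0, 1, 2, 3, 4}

# Select a smaller subset from the current selection (20 records).
smaller = {r for r in selected if r % 2 == 0}

# Switch selection: the unselected records become the selected ones.
switched = table - selected

print(len(added), len(removed), len(smaller), len(switched))
```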
Examples of Query Operations
Select a data subset and add more records to it
Select a data subset and switch selection
Select a data subset and select a smaller subset from it

Relational Database Query
A relational database often consists of many tables
A relational database query selects overlapping records from all tables
Must understand the structure of the database
Can either join or relate the tables

Figure: The keys relating three dBASE files in the MUIR database and the soil attribute table.
Spatial Data Query
Retrieve a data subset from a layer by working directly with features
Select features using a cursor, a graphic, or the spatial relationship between features
Results can be displayed on a map, linked to records in a table, displayed in charts, or saved as a new data set for further processing

Feature Selection by Cursor
Select features by pointing at them or by dragging a box around them

Feature Selection by Graphic
Uses a graphic, such as a circle, box, line, or polygon, to select features that fall inside or are intersected by the graphic
Examples: selecting restaurants within a one-mile radius of a hotel, selecting land parcels that intersect a proposed highway, or finding owners of land parcels within a proposed nature reserve

Figure: Select features by a circle centered at Sun Valley.
Feature Selection by Spatial Relationship
Select features based on their spatial relationship to other features
Features may be in the same layer or in different layers
Relationships: containment, intersect, proximity

Containment
Select features that fall completely within the features used for selection
Examples: schools within a particular county, state parks within a particular state

Intersect
Select features that intersect other features
Examples: land parcels that intersect a proposed road, urban areas that intersect a fault line

Proximity
Select features within a specified distance of other features
Example: state parks within ten miles of an interstate highway
Adjacency: features to be selected and selection features share a common boundary

Combining Attribute and Spatial Data Queries
Used when data exploration requires both attribute and spatial queries
Example: gas stations that are within one mile of a freeway exit and have annual revenue exceeding $2 million
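The gas station example combines a proximity query with an attribute query. The sketch below uses straight-line distance on made-up planar coordinates (in miles) and invented revenue figures. A real GIS would work with projected coordinates and its own selection tools, so treat the names and data here as hypothetical.

```python
import math

# Hypothetical planar coordinates in miles; revenue in dollars.
freeway_exits = [(0.0, 0.0), (5.0, 0.0)]
gas_stations = [
    {"name": "A", "xy": (0.5, 0.5), "revenue": 2_500_000},
    {"name": "B", "xy": (0.3, 0.0), "revenue": 1_200_000},
    {"name": "C", "xy": (5.0, 0.9), "revenue": 3_100_000},
    {"name": "D", "xy": (2.5, 2.0), "revenue": 4_000_000},
]

def within_distance(point, targets, max_dist):
    """True if point lies within max_dist of at least one target."""
    return any(math.dist(point, t) <= max_dist for t in targets)

# Spatial query: stations within one mile of any freeway exit.
near_exit = [s for s in gas_stations if within_distance(s["xy"], freeway_exits, 1.0)]

# Attribute query on the spatially selected subset: revenue above $2 million.
result = [s["name"] for s in near_exit if s["revenue"] > 2_000_000]
print(result)
```

The order of the two queries does not change the result; running the cheaper test first only reduces the work.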
Raster Data Query
Concepts and some methods are the same as for vector data query
There are practical differences

Query by Cell Value
The operand is the raster itself rather than a field, as in vector data query
A Boolean statement separates cells that satisfy the query statement from those that do not

Figure: Raster data query with slope = 2 and aspect = 1. Selected cells are coded 1 and others 0 in the output raster.
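The raster query in the figure can be sketched cell by cell with nested lists. The two input rasters below are made up (they are not the grids shown in the figure), and a GIS would normally run this as a map algebra expression on its own raster type.

```python
# Hypothetical 4 x 4 input rasters holding class codes.
slope = [
    [1, 2, 3, 1],
    [2, 2, 1, 3],
    [1, 2, 2, 2],
    [3, 1, 2, 1],
]
aspect = [
    [1, 1, 2, 1],
    [3, 1, 1, 1],
    [1, 2, 1, 3],
    [1, 1, 1, 4],
]

# Query: slope = 2 AND aspect = 1. Cells that satisfy it are coded 1, others 0.
output = [
    [1 if s == 2 and a == 1 else 0 for s, a in zip(slope_row, aspect_row)]
    for slope_row, aspect_row in zip(slope, aspect)
]

for row in output:
    print(row)
```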
Query by Select Features
Query using features such as points, circles, boxes, or polygons

Geographic Visualization
Cartographic visualization
Using maps to process visual information
Data classification, spatial aggregation, map comparison

Data Classification
Groups data based on statistics

Figure: Two classification schemes, above or below the national average (a) and mean and standard deviation (SD) (b).
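The two classification schemes in the figure can be sketched as follows. The rates, the national average, and the class breaks at whole standard deviations are assumptions for illustration; the figure may use different breaks.

```python
import statistics

# Hypothetical rates (in %) keyed by state abbreviation.
rates = {"ID": 5.0, "WA": 7.0, "OR": 6.0, "MT": 3.0, "NM": 9.0}
national_average = 6.0  # assumed value for illustration

# Scheme (a): above the national average, or at or below it.
scheme_a = {
    state: "above" if rate > national_average else "at or below"
    for state, rate in rates.items()
}

# Scheme (b): classes defined by the mean and standard deviation (SD).
mean = statistics.mean(list(rates.values()))
sd = statistics.pstdev(list(rates.values()))

def sd_class(rate):
    """Classify a rate by its distance from the mean, measured in SDs."""
    z = (rate - mean) / sd
    if z < -1:
        return "< -1 SD"
    if z < 0:
        return "-1 to 0 SD"
    if z <= 1:
        return "0 to 1 SD"
    return "> 1 SD"

scheme_b = {state: sd_class(rate) for state, rate in rates.items()}
print(scheme_a)
print(scheme_b)
```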
Spatial Aggregation
Groups data spatially

Figure: Two levels of spatial aggregation, by state (a) and by region (b).
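Aggregating from states to regions amounts to grouping state values by a region lookup and summing them. In this sketch the populations (in thousands) and the region assignments are invented for illustration.

```python
from collections import defaultdict

# Hypothetical state populations (in thousands) and a state-to-region lookup.
state_population = {"ID": 1294, "WA": 5894, "OR": 3421, "NM": 1819, "AZ": 5131}
state_region = {
    "ID": "Northwest",
    "WA": "Northwest",
    "OR": "Northwest",
    "NM": "Southwest",
    "AZ": "Southwest",
}

# Sum the state values within each region.
region_population = defaultdict(int)
for state, population in state_population.items():
    region_population[state_region[state]] += population

print(dict(region_population))
```

Counts such as population can be summed directly; rates and percentages would need a weighted average instead.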
Map Comparison
Compare data from different layers to examine relationships

Figure: An example of map comparison. Deer relocations tend to be concentrated along the clear-cut/old forest edge.

Other Options
Place all layers on a screen and view them one at a time
Use a set of adjacent views
Use map symbols to show multiple data sets

Figure: A bivariate map showing (1) the rate of unemployment in 1997, either above or below the national average, and (2) the rate of income change, 1996–1998, either above or below the national average.
Thank you!