Chapters 9 and 10, Longley et al. Data Bases: Population and Maintenance Geog 176B Lecture 8
Data Collection One of the most expensive GIS activities Many diverse sources (source integration, data fusion, interoperability) Two broad types of collection Data capture (direct collection) Data transfer Two broad capture methods Primary (direct measurement) Secondary (indirect derivation)
Stages in Data Collection Projects Planning Preparation Digitizing / Transfer Editing / Improvement Evaluation
Data Collection Techniques
            Raster                          Vector
Primary     Digital remote sensing images   GPS measurements
            Digital aerial photographs      Survey measurements
Secondary   Scanned maps                    Topographic surveys
            DEMs from maps                  Toponymy data sets from atlases
Primary Data Capture Capture specifically for GIS use Raster – remote sensing e.g. SPOT and IKONOS satellites and aerial photography Passive and active sensors Resolution is key consideration Spatial Spectral Temporal
Imagery for GIS
Vector Primary Data Capture Surveying Locations of objects determined by angle and distance measurements from known locations Uses expensive field equipment and crews Most accurate method for large scale, small areas GPS Collection of satellites used to fix locations on Earth’s surface Differential GPS used to improve accuracy
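The angle-and-distance idea behind surveying can be sketched in a few lines. This is a minimal illustration, assuming planar coordinates and an azimuth measured clockwise from grid north; the helper name `traverse_point` and the station values are hypothetical, and real survey reduction also applies instrument and projection corrections that are ignored here.

```python
import math

def traverse_point(known_e, known_n, azimuth_deg, distance_m):
    """Return (easting, northing) of a point observed from a known station.

    Assumes planar coordinates and an azimuth measured clockwise from north.
    """
    az = math.radians(azimuth_deg)
    easting = known_e + distance_m * math.sin(az)
    northing = known_n + distance_m * math.cos(az)
    return easting, northing

# Hypothetical observation: a target 100 m due east of a station at (1000, 2000).
e, n = traverse_point(1000.0, 2000.0, 90.0, 100.0)
```

Each new point can then serve as the known location for the next observation, which is how a traverse is built up across a site.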
Total Station
Pen/Portable PC and GPS
Secondary Geographic Data Capture Data collected for other purposes can be converted for use in GIS Raster conversion Scanning of maps, aerial photographs, documents, etc Important scanning parameters are spatial and spectral (bit depth) resolution
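The two scanning parameters trade off against storage, which a small calculation makes concrete. This is a rough sketch, assuming an uncompressed image and a hypothetical 20 x 30 inch map sheet; the function names are illustrative and not from any scanner software.

```python
import math

def scan_size_bytes(width_in, height_in, dpi, bit_depth):
    """Uncompressed size in bytes of a scan at a given resolution and bit depth."""
    cols = round(width_in * dpi)   # pixels across
    rows = round(height_in * dpi)  # pixels down
    return math.ceil(cols * rows * bit_depth / 8)

def ground_pixel_m(dpi, scale_denominator):
    """Ground size in metres of one scanned pixel of a map at 1:scale_denominator."""
    return 0.0254 / dpi * scale_denominator  # 1 inch = 0.0254 m

# Hypothetical sheet: 20 x 30 inches scanned at 300 dpi.
size_8bit = scan_size_bytes(20, 30, 300, 8)    # greyscale
size_24bit = scan_size_bytes(20, 30, 300, 24)  # colour
pixel_m = ground_pixel_m(300, 24000)           # pixel footprint for a 1:24,000 map
```

Raising the bit depth from 8 to 24 triples the file size, while doubling the dpi quadruples it.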
Scanner
Raster to vector conversion
Vector Secondary Data Capture Collection of vector objects from maps, photographs, plans, etc. Digitizing Manual (table) Heads-up and vectorization Photogrammetry – the science and technology of making measurements from photographs, etc.
Digitizer
Data Transfer Buy vs. build is an important question Many widely distributed sources of GI Includes geocoding Key catalogs include Geodata.gov Geography Network Access technologies Translation Direct read
Managing Data Capture Projects Key principles Clear plan, adequate resources, appropriate funding, and sufficient time Fundamental tradeoff among Quality, accuracy, speed and price Two strategies Incremental ‘Blitzkrieg’ Alternative resource options In house Specialist external agency
Map scale       Ground distance corresponding to 0.5 mm map distance
1:              cm
1:              m
1:              m
1:10,000        5 m
1:24,000        12 m
1:50,000        25 m
1:100,000       50 m
1:250,          m
1:1,000,        m
1:10,000,000    5 km
A useful rule of thumb is that positions measured from maps are accurate to about 0.5 mm on the map. Multiplying this by the scale of the map gives the corresponding distance on the ground.
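The rule of thumb is a single multiplication, sketched here for checking any scale. The function name `ground_accuracy_m` is an illustrative assumption; the 0.5 mm default comes from the slide.

```python
def ground_accuracy_m(scale_denominator, map_error_mm=0.5):
    """Ground distance in metres for a given map error at 1:scale_denominator."""
    return map_error_mm / 1000.0 * scale_denominator

# Scales taken from the rows of the table that are legible above.
for denom in (10_000, 24_000, 50_000, 100_000, 10_000_000):
    print(f"1:{denom:,} -> {ground_accuracy_m(denom):g} m")
```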
Positional Accuracy (cont.) within a database a typical UTM coordinate pair might be: Easting m Northing m If the database was digitized from a 1:24,000 map sheet, the last four digits in each coordinate (units, tenths, hundredths, thousandths) would be questionable
Testing Positional Accuracy Use an independent source of higher accuracy: find a larger scale map use precision GPS Use internal evidence: digitized polygons that are unclosed, lines that overshoot or undershoot nodes, etc. are indications of error sizes of gaps, overshoots, etc. may be a measure of positional accuracy
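One piece of internal evidence, the gap left by an unclosed polygon, is easy to measure. This is a minimal sketch, assuming a ring stored as a list of (x, y) vertices in map units; the function names, the sample ring and the tolerance are hypothetical.

```python
import math

def closure_gap(ring):
    """Distance between the first and last vertex of a digitized ring."""
    (x0, y0), (x1, y1) = ring[0], ring[-1]
    return math.hypot(x1 - x0, y1 - y0)

def is_closed(ring, tolerance):
    """Treat the ring as closed when the gap is within the given tolerance."""
    return closure_gap(ring) <= tolerance

# Hypothetical digitized square whose last vertex misses the start point.
ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.3, 0.4)]
gap = closure_gap(ring)
```

Collecting such gaps over many polygons gives a rough distribution of digitizing error. As the slide cautions, it only may be a measure of positional accuracy.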
Testing Accuracy (cont.) Compute accuracy from knowledge of the errors introduced by different sources e.g., 1 mm in source document 0.5 mm in map registration for digitizing 0.2 mm in digitizing if sources combine independently, we can get an estimate of overall accuracy...
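One common way to combine independent error sources is the root sum of squares. The slide does not state which formula it has in mind, so this is a sketch under that assumption, using the three example values listed above; `combined_error` is an illustrative name.

```python
import math

def combined_error(*errors_mm):
    """Root-sum-of-squares combination, assuming the errors are independent."""
    return math.sqrt(sum(e * e for e in errors_mm))

# Source document, map registration, digitizing (values from the slide, in mm).
total_mm = combined_error(1.0, 0.5, 0.2)
```

With these values the combined error is about 1.14 mm on the map, dominated by the largest source. For a 1:24,000 sheet that would be roughly 27 m on the ground.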
Definitions Database – an integrated set of data (attributes) on a particular subject Geographic (=spatial) database - database containing geographic data of a particular subject for a particular area Database Management System (DBMS) – software to create, maintain and access databases
A GIS links attribute and spatial data Attribute Data Flat File Relations Map Data Point File Line File Area File Topology Theme
Advantages of Databases over Files Avoids redundancy and duplication Reduces data maintenance costs Faster for large datasets Applications are separated from the data Applications persist over time Support multiple concurrent applications Better data sharing Security and standards can be defined and enforced
Disadvantages of Databases over Files Expense Complexity Performance – especially complex data types Integration with other systems can be difficult
Types of DBMS Model Hierarchical Network Relational - RDBMS Object-oriented - OODBMS Object-relational - ORDBMS
Relational Databases rule now
Characteristics of DBMS (1) Data model support for multiple data types e.g MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, OLE Object (MS Object linking and embedding), Hyperlink, Lookup Wizard Load data from files, databases and other applications Index for rapid retrieval
Characteristics of DBMS (2) Query language – SQL Security – controlled access to data Multi-level groups (e.g. census, NGA) Controlled update using a transaction manager Versioning Backup and recovery
Characteristics of DBMS (3) Applications Forms builder Report writer Internet Application Server CASE tools Programmable API (application programming interface)
Role of DBMS (figure)
System: Geographic Information System. Tasks: Data load, Editing, Visualization, Mapping, Analysis
System: Database Management System. Tasks: Storage, Indexing, Security, Query
Data
Relational DBMS (1) Data stored as tuples (tup-el), conceptualized as tables Table – data about a class of objects Two-dimensional list (array) Rows = objects Columns = object states (properties, attributes)
Table Row = object Vector feature Column = attribute
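The row and column vocabulary can be illustrated without any database at all. This is a toy sketch, assuming a hypothetical parcels table; the column names and values are invented for illustration.

```python
# Column names describe the attributes every object in the class shares.
columns = ("parcel_id", "land_use", "area_ha")

# Each tuple is one row, i.e. one object (here, one vector feature).
rows = [
    (101, "residential", 0.8),
    (102, "commercial", 1.5),
    (103, "residential", 0.6),
]

def column(name):
    """Return the values of one attribute across all rows."""
    i = columns.index(name)
    return [row[i] for row in rows]

def select(name, value):
    """Return the rows whose attribute `name` equals `value`."""
    i = columns.index(name)
    return [row for row in rows if row[i] == value]

residential = select("land_use", "residential")
```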
Relational DBMS (2) Most popular type of DBMS Over 95% of data in DBMS is in RDBMS Commercial systems IBM DB2 Informix Microsoft Access Microsoft SQL Server Oracle Sybase
SQL Structured (Standard) Query Language – (pronounced SEQUEL) Developed by IBM in 1970s Now de facto and de jure standard for accessing relational databases Three types of usage Stand alone queries High level programming Embedded in other applications
Types of SQL Statements Data Definition Language (DDL) Create, alter and delete database structures CREATE TABLE, CREATE INDEX Data Manipulation Language (DML) Retrieve and manipulate data SELECT, UPDATE, DELETE, INSERT Data Control Language (DCL) Control security of data GRANT, CREATE USER, DROP USER
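The DDL and DML statement types can be tried out with Python's built-in `sqlite3` module. This is a small sketch on a hypothetical `roads` table. DCL statements such as GRANT are not shown because SQLite has no user accounts; they would need a server DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define the structure.
cur.execute("CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT, lanes INTEGER)")
cur.execute("CREATE INDEX idx_roads_name ON roads (name)")

# DML: insert, update, delete and retrieve rows (hypothetical data).
cur.executemany(
    "INSERT INTO roads (id, name, lanes) VALUES (?, ?, ?)",
    [(1, "State St", 4), (2, "Main St", 2), (3, "Elm St", 2)],
)
cur.execute("UPDATE roads SET lanes = 3 WHERE name = ?", ("Main St",))
cur.execute("DELETE FROM roads WHERE name = ?", ("Elm St",))
result = cur.execute("SELECT name, lanes FROM roads ORDER BY id").fetchall()

conn.close()
```

This is also an instance of SQL embedded in another application, one of the three usage types listed on the previous slide.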
Relational Join Fundamental query operation Occurs because Data created/maintained by different users, but integration needed for queries Table joins use common keys (column values) Table (attribute) join concept has been extended to geographic case
Join (figure) Table columns: Record ID, Address, # cars Addresses: State St, Main St, Elm St, Pine Drive Car makes: Ford, Subaru, Honda Joined result: State St. / Ford; State St. / Subaru; State St. / Honda; Elm St. / Kia
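A join on a common key can be reproduced with `sqlite3`. This is a sketch loosely modelled on the join figure. The record IDs and car counts did not survive in these notes, so the values below are hypothetical, chosen only to be consistent with the joined rows that are listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE households (record_id INTEGER PRIMARY KEY, address TEXT, n_cars INTEGER)"
)
conn.execute("CREATE TABLE cars (record_id INTEGER, make TEXT)")

# Hypothetical keys and counts; addresses and makes follow the figure.
conn.executemany(
    "INSERT INTO households VALUES (?, ?, ?)",
    [(1, "State St", 3), (2, "Main St", 0), (3, "Elm St", 1), (4, "Pine Drive", 0)],
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [(1, "Ford"), (1, "Subaru"), (1, "Honda"), (3, "Kia")],
)

# The common key record_id links each car to its household.
joined = conn.execute(
    "SELECT h.address, c.make "
    "FROM households AS h JOIN cars AS c ON c.record_id = h.record_id "
    "ORDER BY h.record_id, c.rowid"
).fetchall()
conn.close()
```

Households with no matching car rows drop out of an inner join, which is why only State St and Elm St appear in the result.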
Spatial indexing Many maps tiled B-tree (Balanced) Grid indexing Quad tree: Points/regions R-tree (Based on MBR)
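Of the index types listed, grid indexing is the simplest to sketch. This is a minimal point index, assuming a fixed cell size and planar coordinates; the class name `GridIndex` and the sample points are hypothetical. Quadtrees and R-trees refine the same idea with adaptive cells and bounding rectangles.

```python
from collections import defaultdict

class GridIndex:
    """Fixed-size grid index for points: each cell lists the points inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Integer (column, row) of the grid cell containing (x, y).
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, item_id, x, y):
        self.cells[self._cell(x, y)].append((item_id, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return ids of points inside the box, visiting only overlapping cells."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for item_id, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(item_id)
        return hits

index = GridIndex(cell_size=10)
for name, x, y in [("a", 5, 5), ("b", 15, 5), ("c", 95, 95)]:
    index.insert(name, x, y)
found = index.query(0, 0, 20, 10)
```

The query never examines point "c", because its cell does not overlap the search box. Skipping such points is the purpose of a spatial index.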
New global/spatial grids: QTM
Go2 Grids 38:53:22.08N 077:02:06.86W US.DC.WAS US.CA.SBA.UCSB.UCEN
Spatial Search: Gateway to Spatial Analysis Overlay is a spatial retrieval operation that is equivalent to an attribute join. Buffering is a spatial retrieval around points, lines, or areas based on distance.
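Buffering around a point reduces to a distance test. This is a minimal sketch, assuming planar coordinates in consistent units; the function name and the sample sites are hypothetical. Buffers around lines and areas need distance-to-geometry calculations that are not shown.

```python
import math

def within_buffer(center, candidates, radius):
    """Return names of candidate points within `radius` of `center` (inclusive)."""
    cx, cy = center
    return [
        name
        for name, (x, y) in candidates.items()
        if math.hypot(x - cx, y - cy) <= radius
    ]

# Hypothetical sites around a point at the origin, with a buffer radius of 5.
sites = {"A": (3.0, 4.0), "B": (6.0, 0.0), "C": (1.0, 1.0)}
nearby = within_buffer((0.0, 0.0), sites, 5.0)
```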