Presentation on theme: "Representation of spatial data"— Presentation transcript:
1 Representation of spatial data. GIS thematic layers, raster and vector, conversion, subdivision representation, continuous data: contours, DEMs, TINs.
2 Thematic map layers. Separate storage of data according to theme: map layers (or data layers). A GIS typically uses tens to hundreds of map layers. For example: municipality borders, land use, cadastral boundaries, water pipes, churches, etc.
5 Geometry, topology and attributes. Geometry: coordinates. Topology: adjacency relations of objects. Attributes: properties, values. Example: country map of South America. Geometry: coordinates of the borders. Topology: which countries border which. Attributes: names of countries, population, etc.
6 Representation of geometry. Two main approaches: raster and vector. The two can also be mixed in a GIS: any map layer can use either. Conversion from raster to vector and vice versa is possible. The representation depends on the type of data, the way of acquisition, the desired operations, etc.
7 Raster structure. Division of space into equal-size cells (squares, pixels). The theme gives each cell a value (nominal, ordinal, interval, ratio, vector, …). Cells should not contain any further spatial information (more detail).
8 Data in raster form. [Figure: a point object, a line object and a plane object, each in raster form]
10 Raster: pros and cons. Pros: simple structure; simple operations; obtained after scanning and remote sensing. Cons: less suitable for point and line objects (the representation does not follow intuition); network analysis is difficult; not adaptive (no difference in detail possible in different regions); either expensive in memory or low in precision; not obtained after digitizing.
11 Raster: memory reduction. Run-length encoding: no 2-dim array, but coding of the start pixel with its value and the length of the run, e.g. (34,67) forest 9. Block encoding: the 2-dim version, e.g. (34,67) forest 4,6. Disadvantage: makes structure and operations much more complex.
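A minimal sketch of run-length encoding for a single raster row, to make the "start pixel, value, length of run" idea concrete. The function names and the (row, column) convention for the start pixel are assumptions for illustration, not taken from the slides.

```python
from itertools import groupby

def rle_encode_row(row_index, row):
    """Encode one raster row as (start_pixel, value, run_length) triples."""
    runs = []
    col = 0
    for value, group in groupby(row):
        length = len(list(group))
        # The start pixel is stored as (row, column); this ordering is an assumption.
        runs.append(((row_index, col), value, length))
        col += length
    return runs

def rle_decode_row(runs):
    """Expand the runs back into the flat list of cell values."""
    row = []
    for _start, value, length in runs:
        row.extend([value] * length)
    return row

# Hypothetical row 34 of a land-use raster: 9 forest cells, then 3 water cells.
row = ["forest"] * 9 + ["water"] * 3
runs = rle_encode_row(34, row)
restored = rle_decode_row(runs)
```

The twelve cells are stored as two runs. Any operation that needs the value of an arbitrary cell now has to search the runs, which is the added complexity the slide mentions.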
12 Vector structure. Objects are stored as points, lines and areas. Points have coordinates; lines connect points; areas are delimited by lines. Attributes are stored with the objects (point, line or areal).
13 Vector: pros and cons. Pros: elegant structure that fits point, line and areal objects alike; small storage consumption; precise; adaptive (additional control points possible); network and cluster analysis possible; obtained after digitizing. Cons: relatively complex; map overlay and buffer computation are complex.
14 Vector representation of a region. Not necessarily simply-connected: NL has islands. NL has holes (Baarle-Nassau / Baarle-Hertog); there are even regions in these holes.
17 Subdivisions: spaghetti model. Every chain is represented by a list of coordinate pairs. Split nodes are doubly stored. Areas are not present explicitly. [Figure: a subdivision with chains C1 to C6]
C1: (..,..), (..,..), (..,..), ...
C2: (..,..), (..,..), (..,..), ...
C3: (..,..), (..,..), (..,..), ...
18 Subdivisions: polygon ring structure. Every area is represented by a list of coordinate pairs. Control points are doubly stored. Neighbor areas are difficult to determine. Consistency is difficult to maintain. [Figure: a subdivision with polygons P1, P2, P3]
P1: (..,..), (..,..), (..,..), ...
P2: (..,..), (..,..), (..,..), ...
P3: (..,..), (..,..), (..,..), ...
Note: Consistency here refers to the property that a set of polygons should form a partition of a region. If you change the boundary of one polygon, you must change the boundaries of one or more other polygons as well to maintain the property that the polygons form a subdivision. Technically, consistency is not difficult to maintain, but it is an implementation hassle and makes updates inefficient, because these other polygons must be retrieved and changed as well.
19 Subdivisions: topological structure (node-link structure). Nodes are objects with coordinates. Edges are connections of nodes. Sequences of edges along polygon boundaries form cycles. Polygons are objects that can access their boundaries. This structure is the doubly-connected edge list.
20 Subdivisions: topological structure. Edges are split into directed half-edges. Half-edges have pointers to: the Twin half-edge; the Origin vertex; the Next and Prev half-edges of the incident polygon; the incident polygon. Polygons have pointers to half-edges, one in each bounding cycle. [Figure: a half-edge with its Origin, Twin, Prev and Next, and the polygons on either side]
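The pointer layout of the half-edge structure can be sketched as follows. This is a minimal illustration, not a full doubly-connected edge list implementation: the class names, the helper functions `make_edge`, `link_cycle` and `adjacent_faces`, and the single-triangle example are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Vertex:
    x: float
    y: float

@dataclass(eq=False)
class Face:
    name: str
    # One half-edge per bounding cycle (outer boundary and each hole).
    boundary: List["HalfEdge"] = field(default_factory=list)

@dataclass(eq=False)
class HalfEdge:
    origin: Vertex
    twin: Optional["HalfEdge"] = None
    next: Optional["HalfEdge"] = None
    prev: Optional["HalfEdge"] = None
    face: Optional[Face] = None

def make_edge(u, v, left, right):
    """Create the twin pair u->v (bounding `left`) and v->u (bounding `right`)."""
    a = HalfEdge(origin=u, face=left)
    b = HalfEdge(origin=v, face=right)
    a.twin, b.twin = b, a
    return a, b

def link_cycle(half_edges):
    """Set the Next and Prev pointers along one bounding cycle."""
    n = len(half_edges)
    for i, h in enumerate(half_edges):
        h.next = half_edges[(i + 1) % n]
        h.prev = half_edges[i - 1]

def adjacent_faces(face):
    """Walk every bounding cycle and collect the faces across each twin."""
    found = []
    for start in face.boundary:
        h = start
        while True:
            neighbor = h.twin.face
            if neighbor is not face and neighbor not in found:
                found.append(neighbor)
            h = h.next
            if h is start:
                break
    return found

# Hypothetical example: one triangle separating an inside and an outside face.
a, b, c = Vertex(0, 0), Vertex(1, 0), Vertex(0, 1)
inside, outside = Face("inside"), Face("outside")
ab, ba = make_edge(a, b, inside, outside)
bc, cb = make_edge(b, c, inside, outside)
ca, ac = make_edge(c, a, inside, outside)
link_cycle([ab, bc, ca])  # counterclockwise around `inside`
link_cycle([ac, cb, ba])  # clockwise around `outside`
inside.boundary.append(ab)
outside.boundary.append(ba)
neighbors = [f.name for f in adjacent_faces(inside)]
```

Finding the neighbors of a polygon takes one step per half-edge on its boundary, which is why the adjacency query is linear in the number of boundary edges for this structure.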
21 Subdivisions: topological chain structure. Splitting nodes are objects with coordinates. Chains are connections between splitting nodes and contain zero or more nodes with coordinates. Sequences of chains along polygon boundaries form cycles. Polygons are objects that can access their boundaries. Chains are split into half-chains. This structure is the doubly-connected chain list.
22 Vector structures. [Table: rows Spaghetti, Polygon ring, DC edge list, DC chain list; columns Memory, Duplication, Polygon retrieve, Topology retrieve; cell entries missing]
Note: Topology retrieve refers to the topology of the subdivision: region objects have a way to access adjacent regions efficiently. With a doubly-connected chain list, this comes down to following the chains and always looking at the other side of the chain for an adjacent region. This takes time linear in the number of adjacent regions, while for the doubly-connected edge list it takes time linear in the number of edges in the boundary of the region (which is much more). For the spaghetti and polygon ring structures, there is no easy access to adjacent regions and you have to traverse whole tables in the database.
23 Raster-vector conversion. E.g. for data integration. Vector-to-raster: as in computer graphics, scan-conversion of lines, etc. Raster-to-vector: consider pixel sides between pixels with different values as boundary and put these in a vector representation. Thinning, line simplification.
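As an illustration of vector-to-raster conversion, here is a sketch of scan-converting one line segment with Bresenham's line algorithm. The slides only say "scan-conversion of lines", so the choice of Bresenham is an assumption (it is one common method from computer graphics), as is the function name `rasterize_segment`; endpoints are assumed to be integer cell coordinates.

```python
def rasterize_segment(x0, y0, x1, y1):
    """Return the raster cells covered by the segment (x0, y0)-(x1, y1).

    Bresenham's line algorithm on integer cell coordinates.
    """
    cells = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    err = dx + dy  # running error term that decides which axis to step along
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += step_x
        if e2 <= dx:
            err += dx
            y0 += step_y
    return cells

horizontal = rasterize_segment(0, 0, 3, 0)
diagonal = rasterize_segment(0, 0, 2, 2)
shallow = rasterize_segment(0, 0, 5, 2)
```

Each returned cell would then receive the value of the line's theme in the target raster layer.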
25 Line simplification. Douglas-Peucker algorithm from 1973. Input: chain p1, …, pn and error ε. [Figure: a chain from p1 to pn]
26 DP-algorithm. Draw a line segment between the first and last point. If all points in between are within error ε of this segment: done. Otherwise, determine the farthest point and recursively continue on the part up to the farthest point and on the part after the farthest point.
27 DP-algorithm. DP-standard(i, j, ε): Determine the farthest point pk between pi and pj. If distance(pk, pi pj) > ε, then call DP-standard(i, k, ε) and DP-standard(k, j, ε) and return the concatenation of the two simplifications. Otherwise, return the segment pi pj.
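The recursion of DP-standard can be sketched in Python as follows. This is a minimal sketch, assuming points are (x, y) tuples, that i < j, and that distance is measured to the segment pi pj rather than to the infinite line; the helper name `point_segment_distance` and the example chain are made up for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def dp_standard(points, i, j, eps):
    """Return the indices of the points kept when simplifying points[i..j]."""
    if j <= i + 1:
        return [i, j]  # no points in between
    k = max(range(i + 1, j),
            key=lambda m: point_segment_distance(points[m], points[i], points[j]))
    if point_segment_distance(points[k], points[i], points[j]) > eps:
        left = dp_standard(points, i, k, eps)
        right = dp_standard(points, k, j, eps)
        return left[:-1] + right  # index k appears in both halves; keep it once
    return [i, j]

# Hypothetical chain: a small bump at index 1 and a large spike at index 3.
chain = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
kept = dp_standard(chain, 0, len(chain) - 1, 0.5)
```

With ε = 0.5 the small bump at index 1 is dropped while the spike and its neighbors are kept. With a very large ε only the two endpoints remain.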
30 Properties of the DP-algorithm. The DP-algorithm does not minimize the number of points in the simplification. [Figure: output of the DP-algorithm next to an optimal simplification]
31 Properties of the DP-algorithm. Determining the farthest point takes O(n) time. The whole algorithm takes T(n) = T(m) + T(n-m+1) + O(n) time, with T(2) = O(1), when splitting into m and n-m+1 points. A “fair” split gives O(n log n) time. The worst case gives quadratic time.
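The two bounds follow from the recurrence by filling in the two extreme splits. The constants c and d below are introduced here only to stand for the O(n) and O(1) terms.

```latex
% Fair split: m \approx n/2 at every level of the recursion.
T(n) = 2\,T(n/2) + c\,n
\quad\Longrightarrow\quad
T(n) = O(n \log n)

% Worst case: m = 2 at every level, so one part always has n-1 points.
T(n) = T(2) + T(n-1) + c\,n = T(n-1) + c\,n + d
\quad\Longrightarrow\quad
T(n) = T(2) + \sum_{i=3}^{n} (c\,i + d) = O(n^2)
```

The fair split is the same recurrence as merge sort; the worst case peels off one point per level, so the linear work is repeated about n times.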
32 Properties of the DP-algorithm. The DP-algorithm may give self-intersections in the output. Solution: test the output for self-intersections and continue adding control points if necessary.
33 Improved DP-algorithm. DP-improved(i, j, ε): Simp = DP-standard(i, j, ε). V = set of intersecting segments of Simp. Repeat: for all segments s ∈ V, Refine(s) in Simp (do 1 refinement à la DP by adding the farthest point, giving a new Simp); then V = set of intersecting segments of Simp. Until V is empty.
34 Continuous data representation. Digital Elevation Model (DEM). Data on an interval or ratio measurement scale. Data values of nearby points will usually not be very different. The representation is necessarily an approximation: a finite representation of information with infinite detail. Raster (1 form: grid) or vector (2 forms: contours, TIN).
38 Elevation models. Contour model: well-suited for visualisation, not for representation or storage. Interpretations of a grid: (1) the elevation holds for the whole cell: not a continuous model; (2) the elevation holds at the middle of the cell: interpolation needed; how? Advantage of a grid: simple storage, and operations are simple too. Advantage of a TIN: more efficient in storage, adaptive.
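39 Interpolation for grid. [Figure: four neighboring grid points with elevations 20, 18, 18 and 22, shown for three interpolation methods] Linear interpolation; saddle point problem. Linear interpolation with an additional point: (20 + 18 + 18 + 22) / 4 = 19.5. Non-linear interpolation.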
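The three options can be compared numerically at the center of the cell. This is a sketch under two assumptions that the transcript does not confirm: that the elevations 20 and 22 sit on diagonally opposite corners (which is what produces a saddle), and that bilinear interpolation is an acceptable stand-in for the unnamed non-linear method. All function names are illustrative.

```python
def center_via_diagonal(z_a, z_b):
    """Linear interpolation along one diagonal: the center is its midpoint."""
    return (z_a + z_b) / 2.0

def center_via_extra_point(z00, z10, z01, z11):
    """Additional center point with the average of the four corner elevations."""
    return (z00 + z10 + z01 + z11) / 4.0

def bilinear(z00, z10, z01, z11, x, y):
    """Bilinear interpolation inside the unit cell, with 0 <= x, y <= 1."""
    bottom = z00 * (1 - x) + z10 * x
    top = z01 * (1 - x) + z11 * x
    return bottom * (1 - y) + top * y

# Assumed layout: 20 and 22 on one diagonal, 18 and 18 on the other.
z00, z10, z01, z11 = 20.0, 18.0, 18.0, 22.0

ridge = center_via_diagonal(z00, z11)    # split the cell along the 20-22 diagonal
valley = center_via_diagonal(z10, z01)   # split the cell along the 18-18 diagonal
extra = center_via_extra_point(z00, z10, z01, z11)
smooth = bilinear(z00, z10, z01, z11, 0.5, 0.5)
```

The two diagonals give different center elevations (21 versus 18), which is the saddle point problem: linear interpolation over two triangles depends on an arbitrary choice of diagonal. The additional point and the bilinear surface both give 19.5 at the center and do not depend on that choice.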
40 Topological TIN structure. With explicit vertex and triangle representation. [Figure: triangle t with vertices u, v, w and adjacent triangles t1, t2, t3] Vertices store x, y-coordinates and elevation.
41 Topological TIN structure. With explicit vertex and triangle representation. [Figure: triangle t with vertices u, v, w and adjacent triangles t1, t2, t3] Because t1 has pointers to two of the same vertices as t, we can determine their shared edge, even though it is not represented explicitly.
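The shared-edge argument can be sketched as follows. The record layout (three vertex pointers and three neighbor pointers per triangle), the class names, and the extra vertices p, q, r are assumptions for illustration; the slides only state that vertices and triangles are represented explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(eq=False)
class TinVertex:
    x: float
    y: float
    z: float  # elevation

@dataclass(eq=False)
class Triangle:
    vertices: Tuple[TinVertex, TinVertex, TinVertex]
    # Pointers to the (up to three) adjacent triangles; None on the TIN boundary.
    neighbors: List[Optional["Triangle"]] = field(
        default_factory=lambda: [None, None, None])

def shared_edge(t, other):
    """Return the two vertices both triangles point to, or None if not adjacent."""
    common = [v for v in t.vertices if any(v is w for w in other.vertices)]
    return tuple(common) if len(common) == 2 else None

# Hypothetical data: t = (u, v, w) and t1 = (v, p, w) share the edge v-w.
u = TinVertex(0, 0, 10)
v = TinVertex(4, 0, 12)
w = TinVertex(2, 3, 15)
p = TinVertex(6, 3, 11)
q = TinVertex(-3, 1, 9)
r = TinVertex(-1, -3, 8)
t = Triangle((u, v, w))
t1 = Triangle((v, p, w))
t_far = Triangle((u, q, r))  # touches t only in the single vertex u
t.neighbors[0] = t1
t1.neighbors[0] = t

edge = shared_edge(t, t1)
```

No edge object is stored anywhere; the edge is recovered by comparing the vertex pointers of the two triangle records.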
42 Topological TIN structure. With explicit vertex and triangle representation. [Figure: triangle t with vertices u, v, w and adjacent triangles t1, t2, t3]
43 Topological TIN structure. Alternatively, edges have an explicit representation too. [Figure: triangle t with vertices u, v, w, edges e1, e2, e3 and adjacent triangles t1, t2, t3]
44 Summary representation. Objects have geometry and attributes; at least the attributes are in a database. Geometry can be stored in raster or vector form; each has advantages and disadvantages. Important geometric types of representations are those for subdivisions and for elevation models. For subdivisions, the doubly-connected chain list is the most suitable structure. For elevation models, grids or TINs are most useful.