2 GIS Concepts
- Information cycle: Data / Information / System / Information System
- Geographic Information System: main components and characteristics
- Geographic Database
- Data Modeling
- Data Representation
- Spatial Analysis
- Implementing a GIS
4 Data / Information
Information is the result of interpreting the relations that exist between a number of single elements (called data).
Example: The Museum located at 5th Avenue, NY, was built in 1898.
Data: Museum, address, year of construction.
5 System
A system is an organized whole whose elements work together to produce a result.
Example: Water supply system
Elements: pipes, valves, hydrants, water meters, pumps, reservoirs, etc.
6 Information System (IS)
An Information System is an organized whole whose elements (data, equipment, procedures, users) work together to produce a result (information).
7 GIS: "G" & "IS"
Definition: A GIS is a collection of computer hardware and software, geographic data, methods, and personnel assembled to capture, store, analyze and display geographically referenced information in order to resolve complex problems of management and planning.
A GIS is an IS with geographic data as input and geographic information as output.
Geographic Data? Geographic Information?
9 GIS: Input, Components, Output (diagram)
- Input (geographic data): maps, census, field data, RS data, others
- GIS components: data capture, storage, manipulation, analysis, display, user interface, models, other GIS
- Output (geographic information): reports, maps, photo. products, statistics, input data for models
10 GIS: Main Characteristics
- Integration of multiple data: sources, scales, formats
- Geographic Database
- Spatial Analysis
11 Data from multiple sources, at multiple scales, in multiple formats
- Census / tabular data
- Maps
- Pictures & multimedia
- GPS / air photos / satellite images
12 Referencing map features: Coordinate systems & map projections
To integrate geographic data from many different sources, we need to use a consistent spatial referencing system for all data sets.
13 The Latitude/Longitude reference system
- latitude φ: angle from the equator to the parallel
- longitude λ: angle from the Greenwich meridian
14 Map Projections
- The curved surface of the earth needs to be "flattened" to be presented on a map. Projection is the method by which the curved surface is converted into a flat representation.
- Projections are classified according to which properties they preserve: area, shape, angles, distance.
- Some distortion is inevitable: it is small if a map shows only a small area, but large if the entire earth is shown.
15 UTM: Universal Transverse Mercator
One cartographic reference system that deserves more detailed discussion is the UTM system.
- Minimal distortions of area, angles, distance and shape at large and medium scales
- Very popular for large and medium scale mapping (e.g., topographic maps)
- Cylindrical projection with a central meridian that is specific to a standard UTM zone
- 60 zones around the world
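To make the "60 zones, each with its own central meridian" idea concrete, here is a minimal sketch in Python. It assumes the standard layout of 6-degree-wide zones numbered eastward from 180°W and ignores the special-case zones around Norway and Svalbard; the function names are illustrative only.

```python
import math


def utm_zone(longitude_deg: float) -> int:
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    # Zones are 6 degrees wide and numbered eastward starting at 180 degrees West.
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    # Longitude +180 would give 61; fold it into the last zone.
    return min(zone, 60)


def central_meridian(zone: int) -> float:
    """Return the central meridian (degrees) of a standard UTM zone."""
    if not 1 <= zone <= 60:
        raise ValueError("zone must be within 1..60")
    return zone * 6.0 - 183.0


zone = utm_zone(-74.0)  # a longitude near New York
print(zone, central_meridian(zone))
```

Distortion grows with distance from the central meridian, which is why each zone gets its own.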
17 The concept of scale
- Scale is the ratio between distances on a map and the corresponding distances on the earth's surface.
- E.g., a scale of 1:100,000 means that 1 cm on the map corresponds to 100,000 cm or 1 km in the real world.
- Small scale: a small fraction such as 1:10,000,000 shows only large features.
- Large scale: a large fraction such as 1:25,000 shows great detail for a small area.
- "Small scale" and "large scale" are often confused.
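The scale arithmetic above can be checked with a small Python sketch. The function names are illustrative; the only assumption is that map distances are given in centimetres and ground distances in kilometres.

```python
CM_PER_KM = 100_000.0


def ground_distance_km(map_distance_cm: float, scale_denominator: float) -> float:
    """Ground distance for a map distance at scale 1:scale_denominator."""
    return map_distance_cm * scale_denominator / CM_PER_KM


def map_distance_cm(ground_km: float, scale_denominator: float) -> float:
    """Map distance needed to draw a ground distance at scale 1:scale_denominator."""
    return ground_km * CM_PER_KM / scale_denominator


def is_larger_scale(denominator_a: float, denominator_b: float) -> bool:
    """True if 1:denominator_a is a larger scale (larger fraction) than 1:denominator_b."""
    return 1.0 / denominator_a > 1.0 / denominator_b


print(ground_distance_km(1, 100_000))        # 1 cm at 1:100,000
print(map_distance_cm(1, 25_000))            # 1 km at 1:25,000
print(is_larger_scale(25_000, 10_000_000))   # 1:25,000 vs 1:10,000,000
```

The last call shows why the terms get confused: the larger scale has the smaller denominator.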
18 Multi-scale
The same feature is represented differently at different scales. Example: lake
- a large scale map of 1:25,000 may show individual houses
- a smaller scale map of 1:500,000 shows only points representing villages
Figure labels: Large scale (1:25,000); Small scale 1:
20 Geographic Database
- Geographic data: characteristics / examples
- Definitions: entity / attribute / dataset / database
- Data modeling
- Spatial representation: vector / raster
- Topology
21 Descriptive Data vs Geographic Data
- Descriptive data: descriptive attributes
- Geographic data: spatial attributes (location, form)
Geographic location is the element that distinguishes geographic information from all other types.
22 Geographic Data Characteristics
Position:
- explicit geographic reference: Cartesian coordinates (X, Y, Z); geographic coordinates (lat, long)
- implicit geographic reference: address, place-name, etc.
Geometric form:
- e.g. a polygon representing a parcel of land
Spatial data describes the location and shape of geographic features and their relationship to other features; descriptive data characterizes the geographic features.
23 Example 1: Parcel of land
Attribute (descriptive) data:
- Landowner
- Area
- Etc.
Spatial data:
- Position: located at 100 Nelson Mandela Ave; X = a, Y = b within system (X, Y)
- Form: dimensions (sides and arcs, constituting a polygon)
25 Spatial entity
- We use the term entity to refer to a phenomenon that cannot be subdivided into like units.
- Example: a house is not divisible into houses, but can be split into rooms. Others: a lake, a statistical unit, a school, etc.
- In database management systems, an entity is the collection of objects that share the same attributes.
- An entity is referenced by a single identifier, perhaps a place-name, or just a code number.
26 Attribute
- Each spatial entity has one or more attributes that identify what the entity is, describe it, or represent some magnitude associated with it.
- Example: you can categorize roads by whether they are local roads, highways, etc.; by their length; their width; their pavement; etc.
- The type of analysis you plan to do depends on the type of attributes you are working with.
27 Dataset
"A dataset is a single collection of values or objects without any particular requirement as to form of organization."
Example: Streets, rivers, cities, etc.
28 Geographic Database
"A geographic database is a collection of spatial data and related descriptive data organized for efficient storage, manipulation and analysis by many users."
It supports all the different types of data that can be used by a GIS, such as:
- Attribute tables
- Geographic features
- Satellite and aerial imagery
- Surface modeling data
- Survey measurements
Fundamentally, a GIS is based on a structured database that describes the world in geographic terms.
29 Data Modeling
Data modeling is the process of defining the geographic features to be included in the database, their attributes and relationships, and their internal representation in the database. It involves the development of conceptual, logical and physical models of the geographic database.
The outcomes include a Data Dictionary.
30 Modeling Process: Abstracting the Real World
Diagram: Reality, then Modeling (data & treat.), then Geographic Database
We start from the real world as perceived by users, and we want to build a physical data model. We insert an intermediate step: the CDM (conceptual data model).
31 ANSI/SPARC: Study Group on Data Base Management Systems (1975)
Different users have different views of the world.
Diagram: "Real World", then External Model 1 / External Model 2 / External Model 3, then Conceptual Model, then Logical Model, then Physical Model
32 Conceptual Model
- The conceptual level is a synthesis of all external models (users' views).
- Although an abstraction of the real world, the conceptual model consists of schematic representations of phenomena and how they are related.
- The organization scheme created at this stage generally deals only with the information content of the database, not the physical storage, so the same conceptual model may be appropriate for diverse physical implementations.
- Therefore, the conceptual model is independent of technology.
33 Conceptual Model (cont.)
- Easy to read
- Conceived for the analyst or designer
- Objective representation of reality, and therefore independent of the selected GDB system
- One conceptual model for the database
34 Logical Data Model & Physical Model
- We transform the conceptual model into a new modeling level which is more computing oriented: the logical model (example: the relational database approach).
- We transform the logical model into an internal model (physical model), which is concerned with the byte-level data structure of the database.
- Whereas the logical model is concerned with tables and data records, the physical model deals with storage devices, file structure, access methods, and locations of data.
35 Several types of data organization
- Hierarchical model: hierarchical relationships between data (parent-child). Obsolete.
- Network model: focus on connections (e.g. airline booking systems). Advantages: fast, efficient; disadvantage: inflexible. Technically obsolete.
- Relational model: data is organized within tables (files), and relationships are expressed between tables and data elements. True relational DBMS use the Structured Query Language (SQL).
- Object-oriented model: focus on objects; efficient storage and retrieval. (Future + Web)
36 Entity-relationship Formalism
Diagram: ENTITY_NAME1 (attribute 1, attribute 2, …), linked by an association with cardinalities 0-N and 0-1 to ENTITY_NAME2 (attribute 1, attribute 2, …)
Diagram legend:
- Entity name
- Attributes
- Identifier (key attribute)
- Association (relationship)
- Minimum cardinality
- Maximum cardinality (N: indeterminable / any number)
(0,N) refers to the cardinality of the relationship. It means that, for example, at a minimum …
38 The E/R diagram for land parcels
Entities (attributes):
- STREET (name)
- SEGMENT (number)
- PARCEL (number)
- POINT (number, x, y)
- LANDOWNER (name, date-of-birth)
Relationships:
- A: streets have edges (segments)
- B: parcels have boundaries (segments)
- C: lines have two endpoints
- D: parcels have owners, and people own land
Cardinality labels shown in the diagram: 2-N, 0-1, 1-2, 3-N, 1-N, 2-2, 2-N, 1-N
40 Data Dictionary
Definition: a data catalog that describes the contents of a database. It lists information about each field in the attribute tables and about the format, definitions and structures of those tables.
A data dictionary is an essential component of metadata information.
41 Example
Definition of entities:
- RAIL: way of communication and transportation
Definition of attributes:
- RAIL_ID: reference numbers for rail segments
- RAIL_CLASS: single track, double track, electrified, etc.
- RAIL_NAME: name for a particular railway
Explanations for measurements of attributes (type of attribute values) or coding practices:
- RAIL_ID: INTEGER
- RAIL_NAME: CHARACTER, LENGTH=30
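A data dictionary like the RAIL example can also be kept in machine-readable form and used to check records. The following is a minimal Python sketch under stated assumptions: the field names use underscores, RAIL_CLASS is assumed to be text, and the `validate` helper is hypothetical rather than part of any GIS package.

```python
# Machine-readable version of the RAIL data dictionary (layout is an assumption).
DATA_DICTIONARY = {
    "RAIL_ID": {"type": int, "description": "reference number for a rail segment"},
    "RAIL_CLASS": {"type": str, "description": "single track, double track, electrified, etc."},
    "RAIL_NAME": {"type": str, "max_length": 30, "description": "name of the railway"},
}


def validate(record: dict) -> list:
    """Return a list of problems found when checking a record against the dictionary."""
    errors = []
    for field, spec in DATA_DICTIONARY.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, spec["type"]):
            errors.append(f"{field}: expected {spec['type'].__name__}")
        elif "max_length" in spec and len(value) > spec["max_length"]:
            errors.append(f"{field}: longer than {spec['max_length']} characters")
    return errors


good = {"RAIL_ID": 7, "RAIL_CLASS": "single track", "RAIL_NAME": "Coast Line"}
bad = {"RAIL_ID": "7", "RAIL_CLASS": "electrified", "RAIL_NAME": "x" * 31}
print(validate(good))
print(validate(bad))
```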
43 EA database entities
- Street: Number, Name, ---
- Admin. Unit: AU, AU_Pop., ---
- EA: EA-code, Area, Pop.
- Buildings: Number, HHs, etc.
- Crew leader area: CL-code, Name, RO responsible
- Landmark: ---
44 Example of Relations
The EA entity can be linked to the entity crew leader area. The table for this entity could have attributes such as the name of the crew leader, the regional office responsible, contact information, and the crew leader code (CL-code) as primary key, which is also present in the EA entity.
Diagram: EA (EA-code, Area, Pop.) linked by relation R to Crew leader area (CL-code, Name, RO responsible); cardinalities 1-1 and 1-N
45 Entity: Enumeration areas
Type (attributes): EA-code (identifier), Area, Pop, CL-code, …
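The link between the EA entity and the crew leader area can be sketched as two relational tables joined on the CL code. This example uses Python's built-in sqlite3 module; the table names, column names and sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crew_leader_area (
    cl_code        TEXT PRIMARY KEY,   -- primary key of the crew leader area
    name           TEXT,
    ro_responsible TEXT
);
CREATE TABLE ea (
    ea_code  TEXT PRIMARY KEY,
    area_km2 REAL,
    pop      INTEGER,
    cl_code  TEXT REFERENCES crew_leader_area(cl_code)  -- link to the crew leader area
);
INSERT INTO crew_leader_area VALUES ('CL01', 'A. Example', 'North office');
INSERT INTO ea VALUES ('EA001', 2.5, 640, 'CL01');
INSERT INTO ea VALUES ('EA002', 1.8, 520, 'CL01');
""")

# One crew leader area covers several EAs; each EA belongs to exactly one.
rows = conn.execute("""
    SELECT ea.ea_code, ea.pop, cl.name, cl.ro_responsible
    FROM ea
    JOIN crew_leader_area AS cl ON ea.cl_code = cl.cl_code
    ORDER BY ea.ea_code
""").fetchall()

for row in rows:
    print(row)
```

Storing the crew leader details once and referencing them by code avoids repeating them in every EA record.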
46Components of a digital EA database Boundary database
47 A Simpler Alternative
- In many countries, EA map design may be simpler than in this example.
- Instead of a fully integrated digital base map in vector format, rasterized images of topographic maps may be used as a backdrop for EA boundaries.
- In some instances, map features may be more generalized, for instance by using only the centerlines for the streets and polygons for entire city blocks rather than for individual houses.
- This can include the use of free data as a baseline or starting point in the creation or updating of census related maps.
49 Two Fundamental Types of Data
GIS work with two fundamentally different types of geographic information:
- Vector
- Raster (or Grid)
Both types have unique advantages and disadvantages. A GIS should be able to handle both types.
50 Vector vs Raster, or Discrete vs Continuous
Raster-based analysis:
- The area of analysis is divided into squares of uniform size.
- Each cell characterizes the feature of interest within its area with a single value.
Vector data:
- Coordinate-based data structure commonly used to represent linear features.
- Each feature is represented as an ordered list of x,y coordinates.
Figure labels: River, from x1,y1 to xn,yn
51 Raster Data
- A raster image is a collection of grid cells, like a scanned map or picture.
- Raster data is extremely useful for continuous data representation: elevation, slope, modeling surfaces.
- Satellite imagery and aerial photos are commonly used raster data sets.
- Easier data structure
- Certain analysis operations are more easily implemented.
- Good for representing continuous variation (vegetation, contour lines, etc.)
52 Vector Data
- Vector data are stored as a series of x,y coordinates.
- Good for discrete data representation:
- points: wells, town centroids
- lines: roads, rivers, contours
- polygons: enumeration areas, districts, town boundaries, building footprints
53 Raster-Vector conversion ("vectorization")
Computer algorithms are used to convert data of one type to the other:
- Vector to raster is trivial
- Raster to vector is harder
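To illustrate why the vector-to-raster direction is considered the easy one, here is a minimal Python sketch that burns a polyline into a grid by sampling points along each segment. It is an approximation under stated assumptions (grid origin at (0, 0), row 0 at the bottom, sampling every half cell) and not the exact cell-traversal algorithm a GIS package would use.

```python
import math


def rasterize_polyline(coords, cell_size, n_cols, n_rows):
    """Mark with 1 every grid cell touched by sample points along a polyline."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Sample at half-cell spacing so that no crossed cell is skipped in simple cases.
        steps = max(1, int(length / (cell_size / 2.0)))
        for i in range(steps + 1):
            t = i / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = int(x // cell_size)
            row = int(y // cell_size)
            if 0 <= col < n_cols and 0 <= row < n_rows:
                grid[row][col] = 1
    return grid


# A hypothetical river running east, then north, on a 3 x 3 grid of unit cells.
river = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5)]
grid = rasterize_polyline(river, cell_size=1.0, n_cols=3, n_rows=3)
for row in reversed(grid):  # print with the top row first
    print(row)
```

The reverse direction has to recover ordered coordinates and feature boundaries from cell values, which is why it is the harder conversion.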
55 Vector: Points, lines, polygons
Set of geometric primitives:
- points
- lines
- polygons
Figure labels: x and y axes, node, vertex
56 Vector Structure
The vector model has several semantic levels, from the most structured (topological) to the simplest (spaghetti):
- Spaghetti model
- Network model (arbitrary graph)
- Topological model
57 Spaghetti File
No topology = raw file or "spaghetti file". Lines are not connected and have no "intelligence".
59 Topology
A data structure in which each point, line, and piece or whole of a polygon:
- "knows" where it is
- "knows" what is around it
- "understands" its environment
- "knows" how to get around
It helps answer the question: what is where?
60 Topology: Spatial Relationships
- Adjacency: left polygon = A, right polygon = B
- Connectivity: node 1 = chains A, B, C; chain A is connected to chains B & C
- Containment: polygon B is contained within polygon A
61 Example of Topological data structure
(O = "outside" polygon)

Line table:
Line   From node   To node   Left poly   Right poly
1      I           III       O           A
2      I           IV        B           O
3      III         IV        O           C
4      I           II        A           B
5      II          III       A           C
6      II          IV        C           B

Node table:
Node   X   Y   Lines
I      …   …   1, 2, 4
II     …   …   4, 5, 6
III    …   …   1, 3, 5
IV     …   …   2, 3, 6

Polygon table:
Poly   Lines
A      1, 4, 5
B      2, 4, 6
C      3, 5, 6
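The point of storing the line table is that the node and polygon tables, and polygon adjacency, can be derived from it without any geometry. The Python sketch below reads the line table of this slide, assuming the six rows are lines 1 to 6 in order; the helper names are illustrative.

```python
# line number -> (from node, to node, left polygon, right polygon); "O" is the outside polygon
ARCS = {
    1: ("I", "III", "O", "A"),
    2: ("I", "IV", "B", "O"),
    3: ("III", "IV", "O", "C"),
    4: ("I", "II", "A", "B"),
    5: ("II", "III", "A", "C"),
    6: ("II", "IV", "C", "B"),
}


def lines_at_node(arcs, node):
    """Connectivity: all lines that start or end at the node."""
    return sorted(n for n, (start, end, _, _) in arcs.items() if node in (start, end))


def boundary_lines(arcs, polygon):
    """All lines that have the polygon on their left or right side."""
    return sorted(n for n, (_, _, left, right) in arcs.items() if polygon in (left, right))


def polygon_adjacency(arcs):
    """Adjacency: two polygons are neighbours if they share a line."""
    adjacency = {}
    for _, _, left, right in arcs.values():
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


print(lines_at_node(ARCS, "I"))
print(boundary_lines(ARCS, "A"))
print(sorted(polygon_adjacency(ARCS)["A"]))
```

The derived lists agree with the node and polygon tables on the slide, which is a useful consistency check on a topological data set.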
64 Comparison: Advantages
Spaghetti:
- Set of independent objects
- Representation of heterogeneous objects within the same model
- Appropriate to CAD
Topology:
- Pre-calculation of topological relations
- Maintenance of topological constraints
- Correspondence with exchange formats
65 …cont.: Disadvantages
Spaghetti:
- Spatial relationships must be calculated
- Risk of incoherence (duplication of common boundaries)
Topology:
- High cost of updating
- Many levels of indirection for complex objects
- Maintenance
66 Some well known Topological models
- TIGER: Topologically Integrated Geographic Encoding and Referencing (Census Bureau of the USA). The line is the principal element, to which point and area features are related.
- ARC/INFO model (ESRI): point, line, polygon
67 TIGER Data: Polygon
- Zip Codes
- Cities
- Voting Districts
- Census Tracts
- Counties
- MCDs
- Block Groups
70 Recapitulation on spatial models
Transformations between models:
- "vectorization" of raster images (costly)
- topology toward spaghetti (easy)
- spaghetti toward topology (possible but costly)
The vector model is the most used, essentially with topology; it is useful to integrate raster and vector.
71 Spatial Analysis: Query
- Select features by their attributes: "find all districts with literacy rates < 60%"
- Select features by geographic relationships: "find all family planning clinics within this district"
- Combined attribute/geographic queries: "find all villages within 10km of a health facility that have high child mortality"
Query operations are based on the SQL (Structured Query Language) concept.
72 Examples:
- What is at…?
- Features that meet a set of criteria
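The attribute query quoted above maps directly onto a SQL WHERE clause. This is a minimal sketch using Python's built-in sqlite3 module; the `districts` table and its values are invented for illustration, and a real GIS would run the same kind of query against the layer's attribute table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE districts (name TEXT, literacy_rate REAL)")
conn.executemany(
    "INSERT INTO districts VALUES (?, ?)",
    [("North", 72.5), ("East", 58.0), ("South", 44.3)],  # hypothetical values
)

# "Find all districts with literacy rates < 60%"
low_literacy = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM districts WHERE literacy_rate < 60 ORDER BY name"
    )
]
print(low_literacy)
```

In a GIS the selected rows stay linked to their geometries, so the result can be highlighted on the map.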
73 Spatial Analysis (cont.)
- Buffer: find all settlements that are more than 10km from a health clinic
- Point-in-polygon operations: identify, for each village, the vegetation zone into which it falls
- Polygon overlay: combine administrative records with health district data
- Network operations: find the shortest route from village to hospital
74 Modeling/Geoprocessing
- modeling: identify or predict a process that has created or will create a certain spatial pattern
- diffusion: how is the epidemic spreading in the province?
- interaction: where do people migrate to?
- what-if scenarios: if the dam is built, how many people will be displaced?
75 Spatial relationships
Logical connections between spatial objects represented by points, lines and polygons, e.g.:
- point-in-polygon
- line-line
- polygon-polygon
Fundamental questions addressed by GIS:
- What is in a particular place?
- Find the position of some object
- What is the area of, or the distance from, a specific object?
- Find patterns by looking at the distribution of features on the map instead of just an individual feature
Combination of data layers:
- Point-in-polygon operation
- Line-in-polygon operation
- Polygon overlay
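The network operation mentioned above ("find the shortest route from village to hospital") is commonly solved with Dijkstra's algorithm on a graph of road segments. The sketch below is one such implementation in Python; the road network, place names and lengths are hypothetical.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total length, list of nodes) or None if unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    return None


# Road segments with lengths in km (hypothetical network).
roads = {
    "village": {"junction": 4.0, "bridge": 7.0},
    "junction": {"village": 4.0, "bridge": 2.0, "hospital": 9.0},
    "bridge": {"village": 7.0, "junction": 2.0, "hospital": 5.0},
    "hospital": {"junction": 9.0, "bridge": 5.0},
}
print(shortest_route(roads, "village", "hospital"))
```

This only works on data with connectivity information, which is one reason the topological vector model matters for network analysis.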
79 "is near to": Buffer Operations
Point buffer:
- Affected area around a polluting facility
- Catchment area of a water source
80 Buffer Operations
Line buffer:
- How many people live near the polluted river?
- What is the area impacted by highway noise?
81 Buffer Operations
Polygon buffer:
- Area around a reservoir where development should not be permitted
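A point buffer query reduces to a distance test. The following minimal Python sketch finds settlements lying outside every clinic's 10 km buffer, assuming planar (projected) coordinates in kilometres; the settlement and clinic locations are invented. Line and polygon buffers need a point-to-segment distance instead and are not covered here.

```python
import math


def within_buffer(point, center, radius):
    """True if the point lies inside or on the circular buffer around center."""
    return math.dist(point, center) <= radius


def outside_all_buffers(settlements, clinics, radius_km):
    """Names of settlements that are farther than radius_km from every clinic."""
    return sorted(
        name
        for name, xy in settlements.items()
        if not any(within_buffer(xy, clinic, radius_km) for clinic in clinics)
    )


# Projected coordinates in km (hypothetical).
settlements = {"A": (0.0, 0.0), "B": (12.0, 0.0), "C": (3.0, 4.0), "D": (20.0, 20.0)}
clinics = [(0.0, 0.0), (20.0, 12.0)]
print(outside_all_buffers(settlements, clinics, radius_km=10.0))
```

The distance test is only meaningful in a projected coordinate system; latitude/longitude values would first need to be projected.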
82 "is within": point in polygon
Which of the cholera cases are within the containment area?
83 Problem: We may have a set of point coordinates representing clusters from a demographic survey, and we would like to combine the survey information with data from the census that is available by enumeration area.
Solution: The "point-in-polygon" operation will identify for each point the EA into which it falls and will attach the census data to the attribute record of that survey point.
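The solution described above can be sketched with the classic ray-casting test: count how many polygon edges a horizontal ray from the point crosses. This Python example is illustrative only; the EA polygons, codes, populations and survey points are invented, and points exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: an odd number of edge crossings means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attach_census_data(points, enumeration_areas):
    """For each survey point, return the code and population of the EA containing it."""
    result = {}
    for point_id, (x, y) in points.items():
        result[point_id] = None
        for code, ea in enumeration_areas.items():
            if point_in_polygon(x, y, ea["polygon"]):
                result[point_id] = (code, ea["pop"])
                break
    return result


# Hypothetical EAs (two adjacent squares) and survey cluster points.
eas = {
    "EA001": {"polygon": [(0, 0), (4, 0), (4, 4), (0, 4)], "pop": 640},
    "EA002": {"polygon": [(4, 0), (8, 0), (8, 4), (4, 4)], "pop": 520},
}
clusters = {"c1": (1, 1), "c2": (6, 2), "c3": (9, 9)}
print(attach_census_data(clusters, eas))
```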
86 Data Layers
- A GIS combines layers of information about a place to give you a better understanding of that place. Which layers of information you combine depends on your purpose.
- GIS uses geography, or space, as the common key element between datasets: space acts as an indexing system. Information is linked only if it relates to the same geographic area.
87 Spatial aggregation
Example of spatial aggregation: fusion of many provinces constituting an economic region
88 Spatial data transformation: interpolation
Example 1: Based on precipitation estimates at a set of stations, we can create a raster surface that shows rainfall in the entire region.
Station values shown in the figure: 13.5, 20.1, 26.0, 27.2, 12.7, 15.9, 24.5, 26.1
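The slide does not say which interpolation method is used. One common choice is inverse distance weighting (IDW), sketched below in Python: each cell value is a weighted mean of the station values, with weights that fall off with distance. The station positions are hypothetical; only the two rainfall values are taken from the figure.

```python
def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) stations."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value  # exactly on a station: use its value
        weight = 1.0 / d2 ** (power / 2.0)
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def idw_surface(stations, n_cols, n_rows, cell_size):
    """Raster of IDW estimates evaluated at cell centres (row 0 is the bottom row)."""
    return [
        [idw((c + 0.5) * cell_size, (r + 0.5) * cell_size, stations) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two stations with assumed positions (x, y in km) and rainfall values from the figure.
stations = [(0.0, 0.0, 13.5), (10.0, 0.0, 20.1)]
surface = idw_surface(stations, n_cols=5, n_rows=2, cell_size=2.0)
print(round(idw(5.0, 0.0, stations), 2))
```

Because IDW is a weighted mean, every estimate stays between the smallest and largest station value; other methods such as kriging or splines behave differently.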
90 Implementing a GIS
- Consider the strategic purpose
- Plan for the planning
- Determine technology requirements
- Determine the end products
- Define the system scope
- Create a data design
- Choose a data model
- Determine system requirements
- Analyze benefits and costs
- Make an implementation plan
Source: Thinking About GIS, Third Edition: Geographic Information System Planning for Managers
91 GIS: Enables us to handle very large amounts of data
Example: census data
- thousands of EAs
- hundreds of variables
- many complementary data layers (roads, rivers, public facilities)
Example: remote sensing
- satellites send huge amounts of data that need to be processed, interpreted and stored
92 GIS: Helps to make data re-usable and useful to many more users
Census geography:
- EA maps do not have to be redrawn every time, only updated
- census information can be used for many more applications
- data sharing among agencies
93 In Conclusion
- GIS for inventory/visualization: GIS creates maps from data pulled from databases, at any time, at any scale, for anyone
- GIS for database management
- GIS for spatial analysis/modeling
GIS is a tool to query, analyze, and map data in support of the decision making process.
GIS can be viewed three ways:
- The Database View
- The Map View
- The Model View
Improve organizational integration: a GIS can link data sets together by common locational data, such as addresses, which helps departments and agencies share their data. By creating a common database, data can be collected once and used many times.
94 What is Not GIS
- GPS (Global Positioning System)
- …not just software!
- …not just for making maps!
Maps are both input data to and a "product" of a GIS: a way to visualize the analysis.
95 Literature related to Census Mapping & GIS
- US National Research Council: Tools and Methods for Estimating Populations at Risk
- David Martin (1996): Geographic Information Systems: Socioeconomic Applications
- Longley et al. (2005): Geographic Information Systems and Science, second edition. Wiley
- ESRI Press: Unlocking the Census with GIS; Mapping the Census 2000
96 Contact Information: Demographic Statistics Section, UN Statistics Division, New York
97 Compromise projections
Compromise projections do not preserve any single property but represent a good compromise between the different objectives, e.g. Robinson's projection of the world.