Presentation on theme: "Data Quality GiGo: garbage in, garbage out"— Presentation transcript:
1 Data Quality GiGo: garbage in, garbage out
‘Cos it’s in the computer, don’t mean it’s right.
“It’s not the things you don’t know that matter, it’s the things you know that aren’t so.” Will Rogers, Famous Okie GI specialist
“But there are also unknown unknowns: the ones we don't know we don't know.” Donald Rumsfeld
“Fast is fine, but accuracy is everything.” Wyatt Earp
2 Horwood’s Short Laws on Data
Dr. Edgar Horwood, founder of the Urban and Regional Information Systems Association (URISA) and Professor of Civil Engineering and Urban Planning at the University of Washington, was a pioneer of computer mapping in the early 1960s.
Good data are the data you already have.
Bad data drives out good.
The data you have for the present crisis was collected to relate to the previous one.
The respectability of existing data grows with elapsed time and distance from the source of the data.
Data can be moved from one office to another but cannot be created or destroyed.
If you have the right data, you have the wrong problem; and vice versa.
The important thing is not what you do but how you measure it.
In complex systems there is no relationship between the information gathered and the decision made.
The acquisition of knowledge from experience is an exception.
Knowledge grows at half the rate at which academic courses proliferate.
For more information, go to:
3 Data Quality: How good is your data?
Scale
  ratio of distance on a map to the equivalent distance on the earth's surface
  Primarily an output issue; at what scale do I wish to display?
Precision or Resolution
  the exactness of measurement or description
  Determined by input; can output at lower (but not higher) resolution
Accuracy
  the degree of correspondence between data and the real world
  Fundamentally controlled by the quality of the input
Lineage
  The original sources for the data and the processing steps it has undergone
Currency
  the degree to which data represents the world at the present moment in time
Documentation or Metadata
  data about data: recording all of the above
Standards
  Common or “agreed-to” ways of doing things
  Data built to standards is more valuable since it’s more easily shareable
4 Scale
ratio of distance on a map to the equivalent distance on the earth's surface
Large scale --> large detail, small area covered (1”=200’ or 1:2,400)
Small scale --> small detail, large area (1:250,000)
A given object (e.g. land parcel) appears larger on a large scale map
scale can never be constant everywhere on a map because of map projection
  problem is worst for small scale maps & certain projections (e.g. Mercator)
  can be true from a single point to everywhere
  can be true along a line, or a set of lines
  on large scale maps, adjustments are often made to achieve ‘close to true’ scale everywhere (e.g. State Plane and UTM systems)
scale representation
  verbal: one inch equals one statute mile (good for interpretation)
  representative fraction (RF): 1:63,360 (good for measurement) (smaller fraction = smaller scale: 1:2,000,000 is smaller than 1:2,000)
  scale bar: (good if enlarged/reduced)
  use them all on a map!
5 Scale Examples
Common Scales
  1: (1”=16.8ft)
  1:2, (1”=56 yards; 1cm=20m)
  1:20, (5cm=1km)
  1:24, (1”=2,000ft)
  1:25, (1cm=.5km)
  1:50, (2cm=1km)
  1:62, (1.6cm=1km; 1”=.986mi)
  1:63, (1”=1mile; 1cm=.634km)
  1:100,000 (1”=1.58mi; 1cm=1km)
  1:500,000 (1”=7.9mi; 1cm=5km)
  1:1,000,000 (1”=15.8mi; 1cm=10km)
  1:7,500,000 (1”=118mi; 1cm=75km)
Large versus Small
  large: above 1:12,500
  medium: 1:13, :126,720
  small: :130, :1,000,000
  very small: below 1:1,000,000
  (really, relative to what’s available for a given area; Maling 1989)
Map sheet examples:
  1:24,000: 7.5 minute USGS Quads (17 by 22 inches; 6 by 8 miles)
  1:7,500,000: US wall map (26 by 16 inches)
  1:20,000,000: US 8.5” X 11”
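The equivalences in a scale table like this one are plain unit arithmetic on the representative fraction. A minimal Python sketch follows; the function names are illustrative and do not come from the slides.

```python
# Convert a representative fraction 1:denominator into ground distance
# per map unit. All function names here are hypothetical helpers.
INCHES_PER_FOOT = 12
INCHES_PER_MILE = 63_360
CM_PER_KM = 100_000


def feet_per_inch(denominator):
    """Ground feet represented by one map inch."""
    return denominator / INCHES_PER_FOOT


def miles_per_inch(denominator):
    """Ground miles represented by one map inch."""
    return denominator / INCHES_PER_MILE


def km_per_cm(denominator):
    """Ground kilometers represented by one map centimeter."""
    return denominator / CM_PER_KM


if __name__ == "__main__":
    for d in (24_000, 63_360, 100_000, 1_000_000, 7_500_000):
        print(f"1:{d:,}  1 in = {feet_per_inch(d):,.0f} ft "
              f"= {miles_per_inch(d):.2f} mi;  1 cm = {km_per_cm(d):.2f} km")
```

Running the loop gives a quick check on any hand-written scale table before it goes onto a slide or a map legend.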
6 Scale, Resolution & Accuracy in GIS Systems
On paper maps, scale is hard to change, thus it generally determines resolution and accuracy, and consistent decisions are made for these.
A GIS is scale independent since output can be produced at any scale, irrespective of the characteristics of the input data, at least in theory.
In practice, an implicit range of scales or maximum scale for anticipated output should be chosen and used to determine:
  what features to show
    manholes only on large scale maps
  how features will be represented
    manhole a polygon at 1:50; cities a point at 1:1,000,000
  appropriate levels for accuracy and precision
    Larger scale generally requires greater resolution
    Larger scale necessitates a higher level of accuracy
GIS also helps with the generalization problem implicit in paper maps
  A road drawn with a 0.5 mm wide line (the smallest for decent visibility)
    At 1:24,000 implies the road is 12 meters (about 39 feet) wide
    At 1:250,000 implies the road is 125 meters (about 410 feet) wide
  At least in a GIS you can store the true road width, but be careful with plots!
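The road-width example can be checked by multiplying the drawn line width by the scale denominator. A small sketch, assuming a hypothetical helper name and the standard conversion of 3.28084 feet per meter:

```python
# Ground width implied by a line of a given width drawn at a given scale.
# implied_ground_width_m is an illustrative name, not from the slides.
FEET_PER_METER = 3.28084


def implied_ground_width_m(line_width_mm, scale_denominator):
    """Width on the ground (meters) that a drawn line appears to cover."""
    return line_width_mm * scale_denominator / 1000.0


if __name__ == "__main__":
    for denominator in (24_000, 250_000):
        meters = implied_ground_width_m(0.5, denominator)
        print(f"0.5 mm line at 1:{denominator:,} covers "
              f"{meters:.0f} m ({meters * FEET_PER_METER:.0f} ft)")
```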
7 Precision or Resolution: it’s not the same as scale or accuracy!
Precision: the exactness of measurement or description
  the “size” of the “smallest” feature which can be displayed, recognized, or described
  Can apply to space, time (e.g. daily versus annual), or attribute (Douglas fir v. conifer)
for raster data, it is the size of the pixel (resolution)
  e.g. for NTGISC digital orthos it is 1.6 ft (half meter)
  raster data can be resampled by combining adjacent cells; this decreases resolution but saves storage
    e.g. 1.6 ft to 3.2 ft (1/4 storage); to 6.4 ft (1/16 storage)
resolution and scale
  generally, increasing to larger scale allows features to be observed better and requires higher resolution
  but, because of the human eye’s ability to recognize patterns, features in a lower resolution data set can sometimes be observed better by decreasing the scale (6.4 ft resolution shown at 1:400 rather than 1:200)
resolution and positional accuracy
  you can see a feature (resolution), but it may not be in the right place (accuracy)
  higher accuracy generally costs much more to obtain than higher resolution
  accuracy cannot be greater (but may be much less) than resolution (e.g. if pixel size is one meter, then best accuracy possible is one meter)
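The storage figures for resampling follow from the cell count falling with the square of the pixel size. The sketch below illustrates this, and also shows one possible way of "combining adjacent cells". Block averaging is an assumption here (the slide does not say which method is used), and both function names are hypothetical.

```python
# Resampling a raster to a coarser pixel: storage effect and a simple
# block-mean aggregation. Averaging is an assumed combining rule.

def storage_fraction(original_pixel, new_pixel):
    """Fraction of the original cell count left after coarsening."""
    factor = new_pixel / original_pixel
    return 1.0 / factor ** 2


def aggregate(grid, factor):
    """Combine factor x factor blocks of cells into one cell by averaging.

    Rows or columns left over when the grid size is not a multiple of
    factor are dropped.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


if __name__ == "__main__":
    print(storage_fraction(1.6, 3.2))  # pixel size doubled
    print(storage_fraction(1.6, 6.4))  # pixel size quadrupled
    cells = [[1, 1, 2, 2],
             [1, 1, 2, 2],
             [3, 3, 4, 4],
             [3, 3, 4, 4]]
    print(aggregate(cells, 2))
```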
8 Accuracy: rests on at least four legs, not one!
Positional Accuracy (sometimes called Quantitative accuracy)
  Spatial
    horizontal accuracy: distance from true location
    vertical accuracy: difference from true height
  Temporal
    Difference from actual time and/or date
Attribute Accuracy or Consistency -- the validity concept in experimental design/stat. inf.
  a feature is what the GIS/map purports it to be
    a railroad is a railroad, and not a road
    A soil sample agrees with the type mapped
Completeness -- the reliability concept from experimental design/stat. inf.
  Are all instances of a feature the GIS/map claims to include, in fact, there?
  Partially a function of the criteria for including features: when does a road become a track?
  Simply put, how much data is missing?
Logical Consistency: whether contradictory relationships are present in the database
  Non-Spatial
    Some crimes recorded at place of occurrence, others at place where report taken
    Data for one country is for 2000, for another it’s for 2001
    Annual data series not taken on same day/month etc. (sometimes called lineage error)
    Data uses different source or estimation technique for different years (again, lineage)
  Overshoots and gaps in road networks or parcel polygons
9 Sources of Error
Error is the inverse of accuracy. It is a discrepancy between the coded and actual values.
Sources
  Inherent instability of the phenomenon itself
    E.g. random variation of most phenomena (e.g. leaf size)
  Measurement
    E.g. surveyor or instrument error
  Model used to represent data
    E.g. choice of spheroid, or classification systems
  Data encoding and entry
    E.g. keying or digitizing errors
  Data processing
    E.g. single versus double precision; algorithms used
  Propagation or cascading from one data set to another
    E.g. using inaccurate layer as source for another layer
Example for Positional Accuracy
  choice of spheroid and datum
  choice of map projection and its parameters
  accuracy of measured locations (surveying) of features on earth
  media stability (stretching, folding, wrinkling of maps, photos)
  human drafting, digitizing or interpretation error
  resolution &/or accuracy of drafting/digitizing equipment
    Thinnest visible line: 0.2 millimeters
    At scale of 1:20,000 = about 13 feet (20,000 x 0.2 = 4,000mm = 4m = 13.1 feet)
  registration accuracy of tics
  machine precision: coordinate rounding error in storage and manipulation
  other unknown
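The "single versus double precision" item can be made concrete by storing a large projected coordinate in IEEE 754 single precision. This is a sketch using only the Python standard library; the easting value is a made-up example and `as_float32` is an illustrative helper name.

```python
# Coordinate rounding error when a large projected coordinate is held in
# single precision rather than double precision.
import struct


def as_float32(value):
    """Round-trip a Python float (double) through 4-byte single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical State Plane style easting in feet (assumed value).
easting_ft = 2_500_000.123
stored = as_float32(easting_ft)
error_ft = abs(stored - easting_ft)

if __name__ == "__main__":
    print(f"double precision: {easting_ft:.3f} ft")
    print(f"single precision: {stored:.3f} ft")
    print(f"rounding error:   {error_ft:.3f} ft")
```

At this magnitude single precision can only represent values a quarter of a foot apart, so the fractional part of the coordinate is lost before any processing has taken place.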
10 Measurement of Positional Accuracy
usually measured by root mean square error (rmse): the square root of the average squared error
Usually expressed as a probability that no more than P% of points will be further than S distance from their true location.
Loosely, we say that the rmse tells us how far recorded points in the GIS are from their true location on the ground, on average.
More correctly, based on the normal distribution of errors, 68% of points will be rmse distance or less from their true location, and 95% will be no more than twice this distance, provided the errors are random and not systematic (i.e. the mean of the errors is zero)
  e.g. for NTGISC digital orthos RMSE is 3.2 feet (one meter)
  for USGS Digital Ortho Quads RMSE spec. is approx. 33 feet or 10 meters (but in reality much better)
  with GPS, height is 2 or 3 times less accurate in practice at high precision than horizontal (officially the spec is 1.5, but data collection errors affect vertical the most)
$$\mathrm{rmse} = \sqrt{\frac{e_1^2 + e_2^2 + \dots + e_n^2}{n-1}}$$
where e_i is the distance (horizontally or vertically) between the true location of point i on the ground and its location represented in the GIS.
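A short Python sketch of the rmse calculation follows. It assumes the n-1 denominator that appears in the slide's formula; some accuracy standards divide by n instead, so the divisor is exposed as a parameter. The function names and sample coordinates are illustrative only.

```python
# Root mean square error of positional errors, as a minimal sketch.
import math


def horizontal_errors(gis_points, true_points):
    """Distance between each GIS point and its surveyed 'true' location."""
    return [math.hypot(gx - tx, gy - ty)
            for (gx, gy), (tx, ty) in zip(gis_points, true_points)]


def rmse(errors, sample=True):
    """Square root of the mean squared error.

    sample=True divides by n-1 (as in the slide's formula);
    sample=False divides by n.
    """
    n = len(errors)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough points to compute rmse")
    divisor = n - 1 if sample else n
    return math.sqrt(sum(e * e for e in errors) / divisor)


if __name__ == "__main__":
    # Made-up check points: (x, y) in feet.
    gis = [(100.0, 200.0), (305.0, 410.0), (498.0, 603.0)]
    truth = [(103.0, 204.0), (300.0, 410.0), (500.0, 600.0)]
    errs = horizontal_errors(gis, truth)
    print("errors:", [round(e, 2) for e in errs])
    print("rmse (n-1):", round(rmse(errs), 2))
    print("rmse (n):  ", round(rmse(errs, sample=False), 2))
```

With only a handful of check points the two divisors give noticeably different answers, which is one reason a testing standard needs to state the number of points as well as the statistic.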
11 National Map Accuracy Standards: 1941/47
established in 1941 by the US Bureau of the Budget (now OMB) for use with US Geological Survey maps (Maling, 1989, p. 146)
horizontal accuracy: not more than 10% of tested, ‘well defined’ points shall be more than the following distances from their true location:
  1:62,500: 1/50th of an inch (.02”)
  1:24,000: 1/40th of an inch (amended to 1/50 = .02” in 1947)
  1:12,000: 1/30th of an inch (.033”)
  (1/50 = .02” applies at 1:20,000 and smaller scales; 1/30 = .033” applies at larger scales)
Thus,
  on maps with a scale of 1:63,360 (1”=1 mile), 90% of points should be within 105.6 feet [(63,360 X .02)/12] of their true location.
  on USGS quads with a scale of 1:24,000 (1”=2,000ft), 90% of points should be within 40 feet [(24,000 X .02)/12] of their true location.
  on a map with a scale of 1:12,000 (1”=1,000ft), 90% of points should be within 33 feet (1,000 X .033), approx. 10 meters
    gives rise to the loose, but often used, statement that the “NMAS is 10 meters”
Inadequate for the computer age
  how many points? how are they selected?
  how is their ‘true’ location determined?
  what about attribute completeness?
Unfortunately, the “new standard” doesn’t address all these issues either
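The NMAS arithmetic above can be sketched as a small function that turns a map scale into a ground tolerance. It assumes the post-1947 rule of 1/30 inch for scales larger than 1:20,000 and 1/50 inch otherwise; the function name is hypothetical.

```python
# Ground distance within which 90% of well-defined test points should
# fall under the National Map Accuracy Standards, as a minimal sketch.
INCHES_PER_FOOT = 12
FEET_PER_METER = 3.28084


def nmas_tolerance_ft(scale_denominator):
    """NMAS horizontal tolerance in ground feet for scale 1:denominator."""
    # Larger scale means a smaller denominator.
    divisor = 30 if scale_denominator < 20_000 else 50
    map_inches_on_ground = scale_denominator / divisor
    return map_inches_on_ground / INCHES_PER_FOOT


if __name__ == "__main__":
    for d in (12_000, 24_000, 63_360):
        feet = nmas_tolerance_ft(d)
        print(f"1:{d:,}  {feet:.1f} ft  ({feet / FEET_PER_METER:.1f} m)")
```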
12 National Standard for Spatial Data Accuracy (NSSDA) 1998
Geospatial Positioning Accuracy Standard (FGDC-STD-007)
Part 3, National Standard for Spatial Data Accuracy FGDC-STD
“replacement” for National Map Accuracy Standard of 1941/47
specifies a statistic and testing methodology for positional (horizontal and vertical) accuracy of maps and digital data
no single threshold metric to achieve (as with old Standard), but users encouraged to establish thresholds for specific applications
accuracy reported in ground units (not map units as in 1941 standard [1/30th inch])
testing method compares data set point coordinate values with coordinate values from a higher accuracy source for readily visible or recoverable ground points
although it uses points, principles apply to all geospatial data including point, vector and raster objects
other standards for data content will adopt NSSDA for particular spatial objects
copies of the standard available at:
Accuracy Standard has 7 parts, of which parts 4-7 apply to specific data types
13 GPS and Positional Accuracy
Global Positioning System satellite positioning with WAAS (wide area augmentation system) adjustment gives positional accuracy within about 3 meters (10ft).
This is more accurate than most printed maps and nautical charts!
It is also more accurate than most digital maps and charts since these often derive from paper maps and surveys conducted prior to GPS.
Your integrated GPS/digital chart can show you nicely heading down the center of a channel, but positional inaccuracy in the chart can leave you grounded!
[Figure: vessel position according to chart versus in reality]
14 Summary: Resolution, Scale, Accuracy & Storage: illustrating the relationship Largest (maximum) scale for given pixel size.Storage is for USGS 7.5 quad. area(in Texas, USGS quad is about 7 mi x 8.5 mi=60 sq. miles--16 quads for Dallas County)Source: GPS Technology Corporation
15 Examples of Accuracy
Go to quality_graphics.ppt
16 Lineage
identifies the original sources from which the data was derived
details the processing steps through which the data has gone to reach its current form
Both impact its accuracy
Both should be in the metadata, and are required by the Content Standard for Metadata (see below)
Michael Goodchild (the guru of GIS) advocates:
  Measurement-based GIS, in which how data were collected and how measurements were made are part of the record (as in surveying)
  Coordinate-based GIS is the current approach, and it tracks none of this.
  (see Shi, Fisher and Goodchild, Spatial Data Quality, London: Taylor and Francis, 2002)
17 Currency: Is my data “up-to-date”?
data is always relative to a specific point in time, which must be documented.
there are important applications for historical data (e.g. analyzing trends), so don’t necessarily trash old data
“current” data requires a specific plan for on-going maintenance
  may be continuous, or at pre-defined points in time.
  otherwise, data becomes outdated very quickly
currency is not really an independent quality dimension; it is simply a factor contributing to lack of accuracy regarding
  consistency: some GIS features do not match those in the real world today
  completeness: some real world features are missing from the GIS database
Many organizations spend substantial amounts acquiring a data set without giving any thought to how it will be maintained.
18 Standards: common “agreed-to” ways of doing things
May exist for:
  Data itself [including process (the way it’s produced) and product (the outcome)]
    Utilities Data Content Standard, FGDC-STD
  Accuracy of data
    Geospatial Positioning Accuracy Standard, Part 3, National Standard for Spatial Data Accuracy, FGDC-STD
  Documentation about the data (metadata)
    Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD
  Transfer of data and its documentation
    Spatial Data Transfer Standard (SDTS), FGDC-STD-002
  Symbology and presentation
    Digital Geologic Map Symbolization
May address:
  Content (what is recorded)
  Format (how it’s recorded: file format, .tif, shapefile, etc.)
May be a product of:
  An organization’s internal actions [private or organization standards]
  An external government body (Federal Geographic Data Committee) or third sector body (Open GIS Consortium) [public or de jure standards]
  Laissez-faire market-place forces leading to one dominant approach, e.g. “Wintel standard” [industry or de facto standards]
19 Who Sets Public Standards?
Federal Geographic Data Committee
  Sets standards for geospatial data which all federal agencies are required to follow
  Has representatives from most federal agencies
National Institute of Standards and Technology (NIST) sets federal gov. standards for other things (e.g. IT in general)
national standards bodies
  American National Standards Institute (ANSI)
    has the US’s single vote at ISO
    InterNational Committee for Information Technology Standards (INCITS) handles IT standards for ANSI
    Several FGDC standards have been submitted for approval
  Most countries in the world have their equivalent to ANSI
international standards bodies
  ISO (International Organization for Standardization)
other assorted vendor groups, professional associations, trade associations, and consortia
  Open GIS Consortium (OGC) is the main player in GIS
20 The Process for Setting de jure standards! Source: URISA News Issue 197, Sept/Oct. 2003Go to the following web site for excellent overview of standard making: process
21 Adopting Standards: What you should do
Data quality is achieved by adoption and use of standards: Do it!
Common ways of doing things are essential for using & sharing data internally and externally
  only federal agencies are required to use FGDC standards; it’s optional for any others (e.g. state, local)
  power of feds often results in adoption by everybody, although there are some noted failures (e.g. the OSI, GOSIP, & POSIX standards in computing in the 1980s failed and were withdrawn)
FGDC or ISO standards provide an excellent starting point for local standards, and should be adopted unless there are compelling reasons otherwise
Standards for metadata (“documenting your data”) are the most important and should be first priority.
  Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD
  ISO Document Geographic Information-Metadata (content) and 19139, Geographic Information--Metadata--Implementation Specification (format for storing ISO metadata in XML format)
  If not one of these standards for metadata, adopt some standard!
22 Content Standards for Digital Geospatial Metadata: What and Why?
Chesapeake Bay Program, Data Center Workgroup, April 11, 1996
Metadata describes the content, quality, format, source and other characteristics of data.
Allows you and others to:
  Locate data (find, discover)
  Evaluate data (quality, restrictions, reputation)
  Extract (order, download, pay)
  Employ (apply, use)
  and automate this process.
Definition: Metadata are “data about data.” They describe the content, quality, condition, and other characteristics of data. Metadata help a person to locate and understand data.
Major uses of metadata:
Organize and maintain an organization’s investment in data. Metadata help insure an organization’s investment in data. As personnel change or time passes, information about an organization’s data will be lost and the data may lose their value. Later workers may have little understanding of the content and uses for a digital data base and may find that they can’t trust results generated from these data. Complete metadata descriptions of the content and accuracy of a geospatial data set will encourage appropriate use of the data. Such descriptions also may provide some protection for the producing organization if conflicts arise over the misuse of data.
Provide information to data catalogs and clearinghouses. Applications of geographic information systems often require many themes of data. Few organizations can afford to create all data they need. Often data created by an organization also may be useful to others. By making metadata available through data catalogs and clearinghouses, organizations can find data to use, partners to share data collection and maintenance efforts, and customers for their data. The FGDC is sponsoring the development of the National Geospatial Data Clearinghouse through which data producers can provide metadata to others using the Internet.
Provide information to aid data transfer. Metadata should accompany the transfer of a data set. The metadata will help the organization receiving the data to process and interpret the data, incorporate it into its holdings, and update internal catalogs describing its data holdings.
Michael A. Domaratz, FGDC Secretariat
23 Main Sections of the US Federal Content Standard for Digital Geospatial Metadata
Identification
  Title? Area covered? Themes? Currency? Restrictions?
Data Quality (5 aspects)
  Positional & Attribute Accuracy? Completeness? Logical Consistency? Lineage?
Spatial Data Organization
  Indirect? Vector? Raster? Type of elements? Number?
Spatial Reference
  Projection? Grid system? Datum? Coordinate system?
Entity and Attribute Information
  Features? Attributes? Attribute values?
Distribution
  Distributor? Formats? Media? Online? Price?
Metadata Reference
  Metadata currency? Responsible party?
For more info, go to:
By law (Executive Order 12906, 1994), all federal agencies must document their data according to:
  Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD
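One way to put the seven main sections to work is a simple completeness check on a metadata record before a data set is released. The sketch below is only an illustration: the dictionary keys are hypothetical labels rather than the element names defined in the standard, and treating every section as expected is a simplification.

```python
# Check which of the seven main metadata sections a record leaves empty.
# Keys are illustrative labels, not the standard's own element names.
SECTIONS = (
    "identification",
    "data_quality",
    "spatial_data_organization",
    "spatial_reference",
    "entity_and_attribute",
    "distribution",
    "metadata_reference",
)


def missing_sections(record):
    """Return the section labels that are absent or empty in record."""
    return [name for name in SECTIONS if not record.get(name)]


if __name__ == "__main__":
    # Made-up record for a hypothetical parcel layer.
    parcels = {
        "identification": {"title": "County parcels", "currency": "2004"},
        "data_quality": {"lineage": "digitized from 1:2,400 plats"},
        "spatial_reference": {"datum": "NAD83", "projection": "State Plane"},
        "metadata_reference": {"responsible_party": "GIS office"},
    }
    print("missing:", missing_sections(parcels))
```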
24 Traditional Minimum Documentation Requirements for Maps/GIS
geodetic datum name (e.g. NAD27), which implies:
  ellipsoid/spheroid name (earth model), e.g. Clarke 1866
  point of origin (ties ellipsoid to earth), e.g. Meades Ranch
  required for all GIS data bases and maps
projection name and its parameters and its measurement units (see terrestrial lecture for exact details)
  Required for all maps since 2-D by nature
  Required for GIS if data is in X-Y projected form
Source information
  accuracy standard(s) to which built
  author/publisher/creator name and/or data source
  date(s) of data collection/update, and of map/GIS creation
Cartographers demand all maps have
  north arrow
  map scale
  graticule indication
    at least four latitude/longitude tic marks, with values in degrees
    at least four X-Y tic marks, with values and units of measurement (feet, meters, etc.)
If GIS data is in lat/long, must know datum. If GIS data is in XY, must know datum and projection info.
tic marks: points of positional reference used to relate map to ground or other map
25 Texas Standards http://www.dir.state.tx.us/tgic/pubs/pubs.htm
Standards for digital spatial data (raster and vector) for State agencies in Texas were established in 1992
Currently (2004) being reviewed by the Texas Geographic Information Council (TGIC) for possible update
Apply to map scales of 1:24,000 and smaller (e.g., 1:100,000; 1:250,000).
Cover a variety of issues including data layers, datum, projections, accuracy, metadata, etc.
Two major planning reports on GIS in state gov. in Texas are:
  Digital Texas: 2002 Biennial Report on Geographic Information Systems Technology
  Geographic Information Framework for Texas (1999)
26 Importance of Standards
Great Baltimore Fire: fire engines from different regions responded, only to be found useless since they had different hose coupling sizes that did not fit Baltimore hydrants. The fire burned over 30 hours and resulted in destruction of 1526 buildings covering 17 city blocks.
Fire in Fall River, MA: the town was saved when over 20 neighboring fire departments responded to a town fire, since they had standardized on hydrant and hose coupling sizes.
9/11: Response in NY and DC severely hampered by
  incompatibilities between GIS data sets, and lack of data
  Also, incompatibilities between communications systems
The most important standard?
  Railroad track gauge: adopted by US, UK, Canada, and much of Europe.
  South America still hampered by differing railroad gauges between countries.
27 The Best Time to Adopt a Standard? Now?Now?Before!
28 Appendix FGDC Standards (status as of March 2004) For latest, go to:
29 FGDC: Metadata Standards
Content Standard for Digital Geospatial Metadata (version 2.0) FGDC-STD
Content Standard for Digital Geospatial Metadata, Part 1: Biological Data Profile FGDC-STD
Metadata Profile for Shoreline Data (FGDC-STD )
Content Standard for Digital Geospatial Metadata: extension for remote sensing data (FGDC-STD )
Encoding Standard for Geospatial Metadata (Draft)
Metadata Profile for Cultural and Demographic Data (dropped)
Current thrust is to integrate FGDC Metadata standards (and other FGDC standards eventually) into International Organization for Standardization (ISO) standards.
30 FGDC: Data Accuracy Standard
Geospatial Positioning Accuracy Standard (FGDC-STD-007)
  Part 1, Reporting Methodology FGDC-STD
  Part 2, Geodetic Control Networks FGDC-STD
  Part 3, National Standard for Spatial Data Accuracy FGDC-STD
  Part 4: Architecture, Engineering Construction, and Facilities Management (FGDC-STD )
  Part 5: Standard for Hydrographic Surveys and Nautical Charts (Review)
An umbrella incorporating several accuracy standards.
Part 3 is the general standard.
  It essentially updates the National Map Accuracy Standard of 1941/47
31 FGDC: Data Content Standards
Facility ID Data Standard (Review)
Address Content Standard (Review)
US National Grid (FGDC-STD )
Earth Cover Classification System (Draft)
Geologic Data Model (Draft)
Governmental Unit Boundary Data Content Standard (Draft)
Biological Nomenclature and Taxonomy Data Standard (Draft)
National Hydrography Framework Geospatial Data Content Standard (proposal)
Environmental Hazards Geospatial Data Content Standard (dropped)
NSDI Framework Data layers (under Review; see next slide)
Cadastral Data Content Standard FGDC-STD-003
Classification of Wetlands and Deep Water Habitats FGDC-STD-004
Vegetation Classification Standard FGDC-STD-005
Soils Geographic Data Standard FGDC-STD-006
Content Standard for Digital Orthoimagery (FGDC-STD )
Content Standard for Remote Sensing Swath Data (FGDC-STD )
Utilities Data Content Standard (FGDC-STD )
NSDI Framework Transportation Identification Standard (Review)
Hydrographic Data Content Standard for Coastal and Inland Waterways (Review)
Content Standard for Framework Land Elevation Data (Review)
32 FGDC: Framework Data Standards
establish data content requirements for the seven layers of geospatial data that comprise the National Spatial Data Infrastructure (NSDI), the base layers needed for any geographic area
  geodetic control
  elevation
  orthoimagery
  hydrography (water)
  transportation
  cadastral (landownership)
  governmental unit boundaries
Goals are to
  Facilitate and promote exchange of framework layers between producers, consumers, and vendors through a common content and way of describing that content
  Lower the cost of data for everyone
For each layer, specifies an integrated application schema in Unified Modeling Language (UML) including feature types, attribute types, attribute domain, feature relationships, spatial representation, data organization, and metadata
no standard specified for data format, but an appendix describes a possible implementation using the Geography Markup Language (GML) Version 3.0, developed through the Open GIS Consortium, Inc. (OGC).
33 FGDC: Data Transfer Standards
Spatial Data Transfer Standard (SDTS) FGDC-STD-002
  SDTS, Part 1: Logical Specification (FIPSPUB 173-1, July 1994)
  SDTS, Part 2: Spatial Features (FIPSPUB 173-1, July 1994)
  SDTS, Part 3: ISO 8211 Encoding (FIPSPUB 173-1, July 1994)
  SDTS, Part 4: Topological Vector Encoding (FIPSPUB 173-1, July 1994)
  SDTS, Part 5: Raster Profile and Extensions (FGDC-STD-002.5, 2000)
  SDTS, Part 6: Point Profile (FGDC-STD-002.6, 2000)
  SDTS, Part 7: Computer-Aided Design and Drafting (CADD) Profile (FGDC-STD-002.7, 2000)
One of the first of the FGDC standards (along with metadata).
Intended to facilitate transfers between different GIS systems.
Competitive pressures plus internal weaknesses hindered adoption.
34 FGDC: Data Symbology and Presentation Standards Digital Geologic Map Symbolization, (Review)