Data Quality GiGo: garbage in, garbage out

1 Data Quality GiGo: garbage in, garbage out
‘Cos it’s in the computer, don’t mean it’s right.
“It’s not the things you don’t know that matter, it’s the things you know that aren’t so.” Will Rogers, Famous Okie GI specialist
“But there are also unknown unknowns: the ones we don't know we don't know.” Donald Rumsfeld
“Fast is fine, but accuracy is everything.” Wyatt Earp

2 Horwood’s Short Laws on Data
Dr. Edgar Horwood, founder of the Urban and Regional Information Systems Association (URISA) and Professor of Civil Engineering and Urban Planning at the University of Washington, was a pioneer of computer mapping in the early 1960s.
  Good data are the data you already have.
  Bad data drives out good.
  The data you have for the present crisis was collected to relate to the previous one.
  The respectability of existing data grows with elapsed time and distance from the source of the data.
  Data can be moved from one office to another but cannot be created or destroyed.
  If you have the right data, you have the wrong problem; and vice versa.
  The important thing is not what you do but how you measure it.
  In complex systems there is no relationship between the information gathered and the decision made.
  The acquisition of knowledge from experience is an exception.
  Knowledge grows at half the rate at which academic courses proliferate.
For more information, go to:

3 Data Quality: How good is your data?
Scale: ratio of distance on a map to the equivalent distance on the earth's surface
  Primarily an output issue; at what scale do I wish to display?
Precision or Resolution: the exactness of measurement or description
  Determined by input; can output at lower (but not higher) resolution
Accuracy: the degree of correspondence between data and the real world
  Fundamentally controlled by the quality of the input
Lineage: the original sources for the data and the processing steps it has undergone
Currency: the degree to which data represents the world at the present moment in time
Documentation or Metadata: data about data, recording all of the above
Standards: common or “agreed-to” ways of doing things
  Data built to standards is more valuable since it’s more easily shareable

4 Scale
ratio of distance on a map to the equivalent distance on the earth's surface
Large scale --> large detail, small area covered (1”=200’ or 1:2,400)
Small scale --> small detail, large area (1:250,000)
A given object (e.g. land parcel) appears larger on a large scale map
Scale can never be constant everywhere on a map because of map projection
  the problem is worst for small scale maps and certain projections (e.g. Mercator)
  scale can be true from a single point to everywhere
  scale can be true along a line, or a set of lines
  on large scale maps, adjustments are often made to achieve ‘close to true’ scale everywhere (e.g. State Plane and UTM systems)
Scale representation
  verbal: one inch equals one statute mile (good for interpretation)
  representative fraction (RF): 1:63,360 (good for measurement)
    (smaller fraction = smaller scale: 1:2,000,000 is smaller than 1:2,000)
  scale bar: (good if the map is enlarged/reduced)
  use them all on a map!

5 Scale Examples
Common Scales
  1:200 (1”=16.7ft)
  1:2,000 (1”=56 yards; 1cm=20m)
  1:20,000 (5cm=1km)
  1:24,000 (1”=2,000ft)
  1:25,000 (1cm=.25km)
  1:50,000 (2cm=1km)
  1:62,500 (1.6cm=1km; 1”=.986mi)
  1:63,360 (1”=1mile; 1cm=.634km)
  1:100,000 (1”=1.58mi; 1cm=1km)
  1:500,000 (1”=7.9mi; 1cm=5km)
  1:1,000,000 (1”=15.8mi; 1cm=10km)
  1:7,500,000 (1”=118mi; 1cm=75km)
Large versus Small
  large: above 1:12,500
  medium: 1:13,000 to 1:126,720
  small: 1:130,000 to 1:1,000,000
  very small: below 1:1,000,000
  (really, relative to what’s available for a given area; Maling 1989)
Map sheet examples:
  1:24,000: 7.5 minute USGS Quads (17 by 22 inches; 6 by 8 miles)
  1:7,500,000: US wall map (26 by 16 inches)
  1:20,000,000: US 8.5” X 11”
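The equivalences in the scale list can be checked by converting both sides to a common unit. The sketch below is illustrative only: the function name `scale_denominator` and the unit table are assumptions introduced here, not part of the original slides.

```python
# Unit lengths in metres (international foot and mile definitions).
UNITS_IN_METRES = {
    "mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0,
    "in": 0.0254, "ft": 0.3048, "yd": 0.9144, "mi": 1609.344,
}


def scale_denominator(map_len, map_unit, ground_len, ground_unit):
    """Return N in the representative fraction 1:N for a verbal scale
    such as '1 inch equals 2,000 feet'."""
    map_m = map_len * UNITS_IN_METRES[map_unit]
    ground_m = ground_len * UNITS_IN_METRES[ground_unit]
    return ground_m / map_m


if __name__ == "__main__":
    # Verbal scales taken from the slide text.
    for args in [(1, "in", 2000, "ft"), (1, "in", 1, "mi"), (1, "cm", 1, "km")]:
        print(args, "-> 1:%d" % round(scale_denominator(*args)))
```

Rounding is applied only for display, because floating-point unit conversion can leave a tiny residue.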

6 Scale, Resolution & Accuracy in GIS Systems
On paper maps, scale is hard to change, so it generally determines resolution and accuracy, and consistent decisions are made for these.
A GIS is scale independent, since output can be produced at any scale irrespective of the characteristics of the input data, at least in theory.
In practice, an implicit range of scales or maximum scale for anticipated output should be chosen and used to determine:
  what features to show
    manholes only on large scale maps
  how features will be represented
    manhole a polygon at 1:50; cities a point at 1:1,000,000
  appropriate levels for accuracy and precision
    larger scale generally requires greater resolution
    larger scale necessitates a higher level of accuracy
GIS also helps with the generalization problem implicit in paper maps
  A road drawn with a 0.5 mm wide line (the smallest for decent visibility)
    at 1:24,000 implies the road is 12 meters (about 39 feet) wide
    at 1:250,000 implies the road is 125 meters (about 410 feet) wide
  At least in a GIS you can store the true road width, but be careful with plots!
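The road-width arithmetic is just line width multiplied by the scale denominator. A minimal sketch, with the hypothetical helper name `ground_width_m` chosen for illustration:

```python
def ground_width_m(line_width_mm, scale_denominator):
    """Ground width, in metres, implied by a line of the given drawn
    width (millimetres) on a map at scale 1:scale_denominator."""
    return line_width_mm * scale_denominator / 1000.0


def metres_to_feet(metres):
    """Convert metres to international feet."""
    return metres / 0.3048


if __name__ == "__main__":
    for denominator in (24_000, 250_000):
        width = ground_width_m(0.5, denominator)
        print(f"1:{denominator:,}: {width:.0f} m ({metres_to_feet(width):.0f} ft)")
```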

7 Precision or Resolution it’s not the same as scale or accuracy!
Precision: the exactness of measurement or description
  the “size” of the “smallest” feature which can be displayed, recognized, or described
  can apply to space, time (e.g. daily versus annual), or attribute (Douglas fir v. conifer)
For raster data, it is the size of the pixel (resolution)
  e.g. for NTGISC digital orthos it is 1.6ft (half meter)
  raster data can be resampled by combining adjacent cells; this decreases resolution but saves storage
    e.g. 1.6 ft to 3.2 ft (1/4 storage); to 6.4 ft (1/16 storage)
Resolution and scale
  generally, increasing to larger scale allows features to be observed better and requires higher resolution
  but, because of the human eye’s ability to recognize patterns, features in a lower resolution data set can sometimes be observed better by decreasing the scale (6.4 ft resolution shown at 1:400 rather than 1:200)
Resolution and positional accuracy
  you can see a feature (resolution), but it may not be in the right place (accuracy)
  higher accuracy generally costs much more to obtain than higher resolution
  accuracy cannot be greater (but may be much less) than resolution (e.g. if pixel size is one meter, then best accuracy possible is one meter)
[Figure: pixel grids at 1.6 ft and 3.2 ft resolution.]
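The storage figures follow from the fact that doubling the pixel size quarters the number of cells. The sketch below assumes simple averaging of 2 x 2 blocks as the resampling rule; that rule and both function names are illustrative choices, not something the slides specify.

```python
def storage_fraction(old_pixel_size, new_pixel_size):
    """Fraction of the original cell count left after resampling a raster
    from old_pixel_size to a coarser new_pixel_size (same units)."""
    return (old_pixel_size / new_pixel_size) ** 2


def aggregate_2x2(grid):
    """Combine adjacent cells by averaging each 2 x 2 block.
    Assumes a rectangular grid with an even number of rows and columns."""
    out = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[r]), 2):
            block = (grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1])
            row.append(sum(block) / 4.0)
        out.append(row)
    return out


if __name__ == "__main__":
    print(storage_fraction(1.6, 3.2))   # one doubling of pixel size
    print(storage_fraction(1.6, 6.4))   # two doublings
    print(aggregate_2x2([[1, 3, 2, 2], [5, 7, 2, 2]]))
```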

8 Accuracy: rests on at least four legs, not one!
Positional Accuracy (sometimes called Quantitative accuracy)
  Spatial
    horizontal accuracy: distance from true location
    vertical accuracy: difference from true height
  Temporal
    difference from actual time and/or date
Attribute Accuracy or Consistency: the validity concept in experimental design/stat. inf.
  a feature is what the GIS/map purports it to be
    a railroad is a railroad, and not a road
    a soil sample agrees with the type mapped
Completeness: the reliability concept from experimental design/stat. inf.
  Are all instances of a feature the GIS/map claims to include, in fact, there?
  Partially a function of the criteria for including features: when does a road become a track?
  Simply put, how much data is missing?
Logical Consistency: the presence of contradictory relationships in the database
  Non-Spatial
    Some crimes recorded at place of occurrence, others at place where report taken
    Data for one country is for 2000, for another it’s for 2001
    Annual data series not taken on same day/month etc. (sometimes called lineage error)
    Data uses different source or estimation technique for different years (again, lineage)
  Overshoots and gaps in road networks or parcel polygons

9 Sources of Error Error is the inverse of accuracy
Error is the inverse of accuracy. It is a discrepancy between the coded and actual values.
Sources
  Inherent instability of the phenomenon itself
    e.g. random variation of most phenomena (e.g. leaf size)
  Measurement
    e.g. surveyor or instrument error
  Model used to represent data
    e.g. choice of spheroid, or classification systems
  Data encoding and entry
    e.g. keying or digitizing errors
  Data processing
    e.g. single versus double precision; algorithms used
  Propagation or cascading from one data set to another
    e.g. using inaccurate layer as source for another layer
Example for Positional Accuracy
  choice of spheroid and datum
  choice of map projection and its parameters
  accuracy of measured locations (surveying) of features on earth
  media stability (stretching, folding, wrinkling of maps, photos)
  human drafting, digitizing or interpretation error
  resolution and/or accuracy of drafting/digitizing equipment
    thinnest visible line: 0.2 millimeters
    at a scale of 1:20,000 this is about 13 feet on the ground (20,000 x 0.2 mm = 4,000 mm = 4 m = 13.1 feet)
  registration accuracy of tics
  machine precision: coordinate rounding error in storage and manipulation
  other unknown
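The "single versus double precision" and "coordinate rounding error" items can be illustrated with Python's standard `struct` module. This is a sketch under stated assumptions: the northing value is a made-up, UTM-sized coordinate in metres, and `as_float32` is a helper name introduced here.

```python
import struct


def as_float32(value):
    """Round-trip a Python float (double precision) through a 4-byte
    IEEE 754 single, returning the value that would actually be stored."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical projected northing in metres (illustrative value only).
northing = 3_645_123.456

stored = as_float32(northing)
error = stored - northing

if __name__ == "__main__":
    print(f"double precision: {northing!r}")
    print(f"single precision: {stored!r}")
    print(f"rounding error:   {error:+.3f} m")
```

For magnitudes between about 2.1 and 4.2 million, adjacent single-precision values are 0.25 apart, so a coordinate of this size stored as a single can be off by up to 12.5 cm before any processing happens.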

10 Measurement of Positional Accuracy
Usually measured by root mean square error (rmse): the square root of the average squared errors.
Usually expressed as a probability that no more than P% of points will be further than S distance from their true location.
Loosely, we say that the rmse tells us how far recorded points in the GIS are from their true location on the ground, on average.
More correctly, based on the normal distribution of errors, 68% of points will be rmse distance or less from their true location, and 95% will be no more than twice this distance, providing the errors are random and not systematic (i.e. the mean of the errors is zero).
  e.g. for NTGISC digital orthos, RMSE is 3.2 feet (one meter)
  for USGS Digital Ortho Quads, the RMSE spec. is approx. 33 feet or 10 meters (but in reality much better)
  with GPS, height is 2 or 3 times less accurate in practice at high precision than horizontal (officially the spec is 1.5, but data collection errors affect vertical the most)

$$\mathrm{rmse} = \sqrt{\frac{e_1^2 + e_2^2 + \cdots + e_n^2}{n-1}}$$

where $e_i$ is the distance (horizontally or vertically) between the true location of point $i$ on the ground and its location represented in the GIS.
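A minimal sketch of the rmse calculation, using the n - 1 denominator that appears on this slide (other definitions divide by n). The check-point coordinates are invented for illustration, and the function names are assumptions.

```python
import math


def rmse(errors):
    """Root mean square error with an n - 1 denominator, as on the slide.
    `errors` holds the distance e_i between each point's true location
    and its location in the GIS."""
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two errors for an n - 1 denominator")
    return math.sqrt(sum(e * e for e in errors) / (n - 1))


def horizontal_errors(gis_points, true_points):
    """Horizontal distance between paired (x, y) coordinates."""
    return [math.hypot(gx - tx, gy - ty)
            for (gx, gy), (tx, ty) in zip(gis_points, true_points)]


if __name__ == "__main__":
    # Hypothetical check points, in metres.
    gis = [(100.0, 200.0), (310.5, 402.0), (520.0, 610.0)]
    truth = [(101.0, 200.0), (310.5, 400.0), (517.0, 614.0)]
    errs = horizontal_errors(gis, truth)
    print("errors:", errs)
    print("rmse:  ", rmse(errs))
```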

11 National Map Accuracy Standards: 1941/47
established in 1941 by the US Bureau of the Budget (now OMB) for use with US Geological Survey maps (Maling, 1989, p. 146)
horizontal accuracy: not more than 10% of tested, ‘well defined’ points shall be more than the following distances from their true location:
  1:62,500: 1/50th of an inch (.02”)
  1:24,000: 1/40th of an inch (amended to 1/50=.02” in 1947)
  1:12,000: 1/30th of an inch (.033”)
Thus,
  on maps with a scale of 1:63,360 (1”=1 mile), 90% of points should be within 105.6 feet [(63,360 X .02)/12] of their true location.
  on USGS quads with a scale of 1:24,000 (1”=2,000ft), 90% of points should be within 40 feet [(24,000 X .02)/12] of their true location.
  on a map with a scale of 1:12,000 (1”=1,000ft), 90% of points should be within 33 feet (1,000 X .033), approx. 10 meters
    gives rise to the loose, but often used, statement that the “NMAS is 10 meters”
Inadequate for the computer age
  how many points?
  how to select them?
  how to determine their ‘true’ location?
  what about attribute completeness?
Unfortunately, the “new standard” doesn’t address all these issues either.
[Figure: tolerance by scale, with the break at 1:20,000; 1/50”=.02” for smaller scales, 1/30”=.033” for larger scales.]
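The bracketed arithmetic converts a tolerance in map inches to ground feet, and the 10% rule can be applied to a list of tested points. A sketch only; the function names are made up here, and the standard itself does not say how many points to test or how to pick them.

```python
def nmas_tolerance_ft(scale_denominator, map_tolerance_in):
    """Ground distance, in feet, corresponding to a tolerance in map inches."""
    return scale_denominator * map_tolerance_in / 12.0


def passes_nmas(point_errors_ft, tolerance_ft):
    """True if not more than 10% of tested points exceed the tolerance."""
    exceeding = sum(1 for e in point_errors_ft if e > tolerance_ft)
    return exceeding <= 0.10 * len(point_errors_ft)


if __name__ == "__main__":
    for denominator, inches in [(63_360, 0.02), (24_000, 0.02), (12_000, 1 / 30)]:
        feet = nmas_tolerance_ft(denominator, inches)
        print(f"1:{denominator:,}: {feet:.1f} ft")
```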

12 National Standard for Spatial Data Accuracy (NSSDA) 1998
Geospatial Positioning Accuracy Standard (FGDC-STD-007), Part 3, National Standard for Spatial Data Accuracy FGDC-STD
“replacement” for National Map Accuracy Standard of 1941/47
specifies a statistic and testing methodology for positional (horizontal and vertical) accuracy of maps and digital data
no single threshold metric to achieve (as with old Standard), but users encouraged to establish thresholds for specific applications
accuracy reported in ground units (not map units as in 1941 standard [1/30th inch])
testing method compares data set point coordinate values with coordinate values from a higher accuracy source for readily visible or recoverable ground points
although it uses points, the principles apply to all geospatial data including point, vector and raster objects
other standards for data content will adopt NSSDA for particular spatial objects
copies of the standard available at:
The Accuracy Standard has 7 parts, of which parts 4-7 apply to specific data types
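The slide does not give the NSSDA statistic itself. As a hedged sketch: the published standard, as I understand it, reports accuracy at the 95% confidence level as 1.7308 x RMSE_r horizontally (when RMSE_x and RMSE_y are about equal) and 1.9600 x RMSE_z vertically, with RMSE computed over n check points. Those factors come from outside this presentation and should be verified against the standard before use; the function names are illustrative.

```python
import math


def _rmse(diffs):
    """RMSE over n check points (n denominator)."""
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def nssda_horizontal(dx, dy):
    """Horizontal accuracy at 95% confidence, in ground units.
    Assumes RMSE_x and RMSE_y are approximately equal.
    dx, dy: data set coordinate minus higher-accuracy source coordinate."""
    rmse_r = math.sqrt(_rmse(dx) ** 2 + _rmse(dy) ** 2)
    return 1.7308 * rmse_r  # factor assumed from the published standard


def nssda_vertical(dz):
    """Vertical accuracy at 95% confidence, in ground units."""
    return 1.9600 * _rmse(dz)  # factor assumed from the published standard


if __name__ == "__main__":
    # Hypothetical coordinate differences in metres.
    dx = [0.4, -0.3, 0.2, -0.5, 0.1]
    dy = [-0.2, 0.3, -0.4, 0.1, 0.5]
    print(f"horizontal accuracy (95%): {nssda_horizontal(dx, dy):.2f} m")
```

Reporting in ground units this way is what distinguishes the approach from the 1941 standard's map-inch thresholds.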

13 GPS and Positional Accuracy
Global Positioning System satellite positioning with WAAS (wide area augmentation system) adjustment gives positional accuracy within about 3 meters (10ft).
This is more accurate than most printed maps and nautical charts!
It is also more accurate than most digital maps and charts, since these often derive from paper maps and surveys conducted prior to GPS.
Your integrated GPS/digital chart can show you nicely heading down the center of a channel, but positional inaccuracy in the chart can leave you grounded!
[Figure: vessel position according to the chart versus in reality.]

14 Summary: Resolution, Scale, Accuracy & Storage: illustrating the relationship
Largest (maximum) scale for given pixel size. Storage is for USGS 7.5 quad. area (in Texas, USGS quad is about 7 mi x 8.5 mi=60 sq. miles--16 quads for Dallas County) Source: GPS Technology Corporation

15 Examples of Accuracy
Go to quality_graphics.ppt

16 Lineage
identifies the original sources from which the data was derived
details the processing steps through which the data has gone to reach its current form
Both impact its accuracy
Both should be in the metadata, and are required by the Content Standard for Metadata (see below)
Michael Goodchild (the guru of GIS) advocates:
  Measurement-based GIS, in which how data were collected and how measurements were made are a part of the record (as in surveying)
  Coordinate-based GIS is the current approach, and it tracks none of this.
  (see Shi, Fisher and Goodchild, Spatial Data Quality, London: Taylor and Francis, 2002)

17 Currency: Is my data “up-to-date”?
data is always relative to a specific point in time, which must be documented.
there are important applications for historical data (e.g. analyzing trends), so don’t necessarily trash old data
“current” data requires a specific plan for on-going maintenance
  may be continuous, or at pre-defined points in time
  otherwise, data becomes outdated very quickly
currency is not really an independent quality dimension; it is simply a factor contributing to lack of accuracy regarding
  consistency: some GIS features do not match those in the real world today
  completeness: some real world features are missing from the GIS database
Many organizations spend substantial amounts acquiring a data set without giving any thought to how it will be maintained.

18 Standards: common “agreed-to” ways of doing things
May exist for: Data itself [including process (the way it’s produced) and product (the outcome)] Utilities Data Content Standard, FGDC-STD   Accuracy of data Geospatial Positioning Accuracy Standard, Part 3, National Standard for Spatial Data Accuracy, FGDC-STD   Documentation about the data (metadata) Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD   Transfer of data and its documentation Spatial Data Transfer Standard (SDTS), FGDC-STD-002 For symbology and presentation Digital Geologic Map Symbolization   May address: Content (what is recorded) Format (how it’s recorded: file format, .tif, shapefile, etc) May be a product of: An organization’s internal actions [private or organization standards] An external government body (Federal Geographic Data Committee) or third sector body (Open GIS Consortium) [public or de jure standards] Laissez-faire market-place-forces leading to one dominant approach e.g. “Wintel standard” [industry or de facto standards]

19 Who Sets Public Standards ?
Federal Geographic Data Committee
  Sets standards for geospatial data which all federal agencies are required to follow
  Has representatives from most federal agencies
National Institute of Standards and Technology (NIST)
  sets federal gov. standards for other things (e.g. IT in general)
national standards bodies
  American National Standards Institute (ANSI) has the US’s single vote at ISO
  United States InterNational Committee on Information Technology Standards (INCITS) handles IT standards for ANSI
    Several FGDC standards have been submitted for approval
  Most countries in the world have their equivalent to ANSI
international standards bodies
  ISO (International Organization for Standardization)
other assorted vendor groups, professional associations, trade associations, and consortia
  Open GIS Consortium (OGC) is the main player in GIS

20 The Process for Setting de jure standards!
Source: URISA News Issue 197, Sept/Oct. 2003 Go to the following web site for excellent overview of standard making: process

21 Adopting Standards: What you should do
Data quality is achieved by adoption and use of standards: Do it!
Common ways of doing things are essential for using and sharing data internally and externally
  only federal agencies are required to use FGDC standards; it’s optional for any others (e.g. state, local)
  power of the feds often results in adoption by everybody, although there are some noted failures (e.g. the OSI, GOSIP, and POSIX standards in computing in the 1980s failed and were withdrawn)
  FGDC or ISO standards provide an excellent starting point for local standards, and should be adopted unless there are compelling reasons otherwise
Standards for metadata (“documenting your data”) are the most important and should be first priority.
  Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD
  ISO Document Geographic Information-Metadata (content) and 19139, Geographic Information - Metadata - Implementation Specification (format for storing ISO metadata in XML format)
  If not one of these standards for metadata, adopt some standard!

22 Content Standards for Digital Geospatial Metadata What and Why?
Chesapeake Bay Program, Data Center Workgroup, April 11, 1996
Metadata describes the content, quality, format, source and other characteristics of data. It allows you and others to:
  Locate data (find, discover)
  Evaluate data (quality, restrictions, reputation)
  Extract (order, download, pay)
  Employ (apply, use)
and to automate this process.
Definition: Metadata are “data about data.” They describe the content, quality, condition, and other characteristics of data. Metadata help a person to locate and understand data.
Major uses of metadata:
Organize and maintain an organization’s investment in data. Metadata help insure an organization’s investment in data. As personnel change or time passes, information about an organization’s data will be lost and the data may lose their value. Later workers may have little understanding of the content and uses for a digital data base and may find that they can’t trust results generated from these data. Complete metadata descriptions of the content and accuracy of a geospatial data set will encourage appropriate use of the data. Such descriptions also may provide some protection for the producing organization if conflicts arise over the misuse of data.
Provide information to data catalogs and clearinghouses. Applications of geographic information systems often require many themes of data. Few organizations can afford to create all data they need. Often data created by an organization also may be useful to others. By making metadata available through data catalogs and clearinghouses, organizations can find data to use, partners to share data collection and maintenance efforts, and customers for their data. The FGDC is sponsoring the development of the National Geospatial Data Clearinghouse through which data producers can provide metadata to others using the Internet.
Provide information to aid data transfer. Metadata should accompany the transfer of a data set. The metadata will aid the organization receiving the data to process and interpret data, incorporate data into its holdings, and update internal catalogs describing its data holdings.
Michael A. Domaratz, FGDC Secretariat

23 Main Sections of the US Federal Content Standard for Digital Geospatial Metadata
Identification: Title? Area covered? Themes? Currency? Restrictions?
Data Quality (5 aspects): Positional & Attribute Accuracy? Completeness? Logical Consistency? Lineage?
Spatial Data Organization: Indirect? Vector? Raster? Type of elements? Number?
Spatial Reference: Projection? Grid system? Datum? Coordinate system?
Entity and Attribute Information: Features? Attributes? Attribute values?
Distribution: Distributor? Formats? Media? Online? Price?
Metadata Reference: Metadata currency? Responsible party?
For more info, go to:
By law (Executive Order 12906, 1994), all federal agencies must document their data according to: Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD

24 Traditional Minimum Documentation Requirements for Maps/GIS
geodetic datum name (e.g. NAD27), which implies:
  ellipsoid/spheroid name (earth model), e.g. Clarke 1866
  point of origin (ties ellipsoid to earth), e.g. Meades Ranch
  required for all GIS data bases and maps
projection name, its parameters, and its measurement units (see terrestrial lecture for exact details)
  required for all maps, since they are 2-D by nature
  required for GIS if data is in X-Y projected form
source information
  accuracy standard(s) to which built
  author/publisher/creator name and/or data source
  date(s) of data collection/update, and of map/GIS creation
Cartographers demand that all maps have:
  north arrow
  map scale
  graticule indication
    at least four latitude/longitude tic marks, with values in degrees
    at least four X-Y tic marks, with values and units of measurement (feet, meters, etc.)
If GIS data is in lat/long, you must know the datum. If GIS data is in XY, you must know the datum and projection info.
tic marks: points of positional reference used to relate map to ground or other map
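These minimum requirements lend themselves to an automated check. The sketch below is one possible encoding, not a standard schema: every field name (`datum`, `coordinates`, `projection`, and so on) is hypothetical and chosen only to mirror the list above.

```python
# Fields required for every data set (hypothetical field names).
REQUIRED_ALWAYS = ("datum", "source", "accuracy_standard", "collection_date")

# Extra fields required only when coordinates are projected X-Y.
REQUIRED_IF_PROJECTED = ("projection", "projection_parameters", "units")


def missing_documentation(record):
    """Return the names of required fields that are absent or empty."""
    required = list(REQUIRED_ALWAYS)
    if record.get("coordinates") == "projected":
        required.extend(REQUIRED_IF_PROJECTED)
    return [name for name in required if not record.get(name)]


if __name__ == "__main__":
    # Illustrative record; all values are invented.
    layer = {
        "coordinates": "projected",
        "datum": "NAD27",
        "source": "digitized from paper quad",
        "accuracy_standard": "NMAS",
        "collection_date": "1998",
    }
    print("missing:", missing_documentation(layer))
```

Running a check like this at load time makes the "must know datum and projection" rule enforceable rather than advisory.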

25 Texas Standards
Standards for digital spatial data (raster and vector) for State agencies in Texas were established in 1992 Currently (2004), being reviewed by the Texas Geographic Information Council (TGIC) for possible update Apply to map scales of 1:24,000 and smaller (e.g., 1:100,000; 1:250,000). Cover variety of issues including data layers, datum, projections, accuracy, metadata, etc.. Two major planning reports on GIS in state gov. in Texas are: Digital Texas: 2002 Biennial Report on Geographic Information Systems Technology Geographic Information Framework for Texas (1999)

26 Importance of Standards
Great Baltimore Fire: fire engines from different regions responded, only to be found useless since they had different hose coupling sizes that did not fit Baltimore hydrants. The fire burned over 30 hours and resulted in the destruction of 1,526 buildings covering 17 city blocks.
Fall River, MA fire: the town was saved when over 20 neighboring fire departments responded, since they had standardized on hydrant and hose coupling sizes.
9/11: Response in NY and DC was severely hampered by incompatibilities between GIS data sets, and lack of data
  Also, incompatibilities between communications systems
The most important standard?
  Railroad track gauge: adopted by the US, UK, Canada, and much of Europe.
  South America is still hampered by differing railroad gauges between countries.

27 The Best Time to Adopt a Standard?
Now? Now? Before!

28 Appendix FGDC Standards (status as of March 2004) For latest, go to:

29 FGDC: Metadata Standards
Content Standard for Digital Geospatial Metadata (version 2.0) FGDC-STD Content Standard for Digital Geospatial Metadata, Part 1: Biological Data Profile FGDC-STD Metadata Profile for Shoreline Data (FGDC-STD ) Content Standard for Digital Geospatial Metadata: extension for remote sensing data (FGDC-STD ) Encoding Standard for Geospatial Metadata (Draft) Metadata Profile for Cultural and Demographic Data (dropped) Current thrust is to integrate FGDC Metadata standards (and other FGDC standards eventually) into International Standards Organization (ISO) standards.

30 FGDC: Data Accuracy Standard
Geospatial Positioning Accuracy Standard (FGDC-STD-007) Part 1, Reporting Methodology FGDC-STD Part 2, Geodetic Control Networks FGDC-STD Part 3, National Standard for Spatial Data Accuracy FGDC-STD Part 4: Architecture, Engineering Construction, and Facilities Management (FGDC-STD ), Part 5: Standard for Hydrographic Surveys and Nautical Charts (Review) An umbrella incorporating several accuracy standards. Part 3 is the general standard. It essentially updates the National Map Accuracy Standard of 1941/47

31 FGDC: Data Content Standards
Facility ID Data Standard, (Review) Address Content Standard (Review) US National Grid (FGDC-STD ) Earth Cover Classification System, (draft) Geologic Data Model, (Draft) Governmental Unit Boundary Data Content Standard, (Draft) Biological Nomenclature and Taxonomy Data Standard (draft) National Hydrography Framework Geospatial Data Content Standard (proposal) Environmental Hazards Geospatial Data Content Standard, (dropped) NSDI Framework Data layers (under Review—see next slide) Cadastral Data Content Standard FGDC-STD-003 Classification of Wetlands and Deep Water Habitats FGDC-STD-004 Vegetation Classification Standard FGDC-STD-005 Soils Geographic Data Standard, FGDC-STD-006 Content Standard for Digital Orthoimagery, (FGDC-STD ) Content Standard for Remote Sensing Swath Data, (FGDC-STD ) Utilities Data Content Standard, (FGDC-STD ) NSDI Framework Transportation Identification Standard, (Review) Hydrographic Data Content Standard for Coastal and Inland Waterways, (Review) Content Standard for Framework Land Elevation Data, (Review)

32 FGDC: Framework Data Standards
establish data content requirements for the seven layers of geospatial data that comprise the National Spatial Data Infrastructure (NSDI), the base layers needed for any geographic area geodetic control, elevation, Orthoimagery Hydrography (water) Transportation Cadastral (landownership) governmental unit boundaries Goals are to Facilitate and promote exchange of framework layers between producers, consumers, and vendors thru a common content and way of describing that content Lower the cost of data for everyone For each layer, specifies an integrated application schema in Unified Modeling Language (UML) including feature types, attribute types, attribute domain, feature relationships, spatial representation, data organization, and metadata no standard specified for data format, but an appendix describes a possible implementation using the Geography Markup Language (GML) Version 3.0, developed through the Open GIS Consortium, Inc. (OGC).

33 FGDC: Data Transfer Standards
Spatial Data Transfer Standard (SDTS) FGDC-STD-002 SDTS, Part 1 Logical Specification (FIPSPUB 173-1, July 1994) SDTS, Part 2 Spatial Features (FIPSPUB 173-1, July 1994) SDTS, Part 3 ISO 8211 Encoding (FIPSPUB 173-1, July 1994) SDTS, Part 4 Topological Vector Encoding (FIPSPUB 173-1, July 1994) SDTS, Part 5 Raster Profile and Extensions (FGDC-STD-002.5, 2000) SDTS, Part 6: Point Profile, FGDC-STD-002.6, 2000 SDTS Part 7: Computer-Aided Design and Drafting (CADD) Profile (FGDC-STD-002.7, 2000) One of the first of the FGDC standards (along with metadata). Intended to facilitate transfers between different GIS systems. Competitive pressures plus internal weaknesses hindered adoption.

34 FGDC: Data Symbology and Presentation Standards
Digital Geologic Map Symbolization, (Review)
