Lecture 23: Brief Introduction to Data quality By Austin Troy ------Using GIS-- Introduction to GIS.


1 Lecture 23: Brief Introduction to Data quality By Austin Troy ------Using GIS-- Introduction to GIS

2 ©2005 Austin Troy Data Quality: Two key components of quality in data are accuracy and precision. Error is a result of both inaccuracy and imprecision in the data; it is a general term encompassing lack of reliability. GIS data quality is, in theory, a compromise between needs and costs. In practice it is usually about what is available.

3 ©2005 Austin Troy Data Quality: The cost of data reflects its accuracy and precision. Because lower-quality data tend to be cheaper and more available, a very common problem in GIS is the inappropriate use of data. A critical step in developing a GIS is deciding “what is accurate enough?” This is a function of needs, cost, accessibility and time. User needs determine accuracy and, in general, accuracy determines price.

4 ©2005 Austin Troy Accuracy: What is accuracy? “The degree to which information on a map or in a digital database matches true or accepted values.” (From Kenneth E. Foote and Donald J. Huebner, http://www.colorado.edu/geography/gcraft/notes/error/error_f.html) It is also a reflection of how closely a measurement represents the actual quantity measured. Accuracy is a reflection of the number and severity of errors in a dataset or map.

5 ©2005 Austin Troy Precision: Quality is also a function of “precision”. Precision is the level of exactitude in measurements. The more precise a measurement is, the smaller the unit in which it is recorded. Hence, a measurement down to a fraction of a cm is more precise than a measurement to a cm. However, data with a high level of precision can still be inaccurate; this is due to errors. Each application requires a different level of precision.

6 ©2005 Austin Troy Precision: Each application requires a different level of precision. Engineering and surveying applications typically require high levels of precision; they may be measuring to a millimeter. At the other end of the spectrum, studies of weather patterns or crop cover require much less precision. Precise data are costly: for example, carefully surveyed point locations needed by utilities to record the locations of pumps, wires, pipes and transformers cost $5-20 per point to collect.

7 ©2005 Austin Troy Positional Accuracy and Precision: One of the primary types of error in GIS is positional error, that is, errors in 2D (x,y) and in the 3rd dimension (height). Positional accuracy and precision are functions of the scale at which the digital layer was created. If the layer was created by digitizing a paper map, the minimum usable scale of the digital layer is considered to be the scale of that map. Scale is a function of the map’s resolution.

8 ©2005 Austin Troy Positional Accuracy: Positional accuracy standards specify that acceptable positional error varies with scale. Data can have a high level of precision but still be positionally inaccurate. Positional error is inversely related to precision and to the amount of processing.

9 ©2005 Austin Troy Measurement of Accuracy: Accuracy is often stated as a confidence interval: e.g. 104.2 cm ± 0.01 means the true value lies between 104.19 and 104.21 cm. One of the key measurements of positional accuracy is root mean squared error (RMSE): take the squared difference between the observed and expected value for each observation i, sum across all observations, divide by the total number of observations, and take the square root. This is just a standardized measure of error: how close the predicted measure is to the observed.
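
The RMSE calculation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `rmse` and the check-point coordinates are hypothetical, and only one coordinate axis is used to keep the arithmetic easy to follow.

```python
import math

def rmse(observed, expected):
    """Root mean squared error between paired observed and expected values."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference for each observation i
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    # Mean of the squared differences, then the square root
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points: surveyed (expected) vs. digitized (observed)
# x-coordinates, in metres.
expected_x = [100.0, 200.0, 300.0, 400.0]
observed_x = [101.0, 199.0, 302.0, 398.0]

print(f"RMSE: {rmse(observed_x, expected_x):.3f} m")
```

Because the differences are squared before averaging, a few large positional errors raise the RMSE more than many small ones.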

10 ©2005 Austin Troy Positional Error: Different agencies have different standards for positional error. Example: USGS horizontal positional requirements state that 90% of all tested points must be within 1/30th of an inch, measured on the map, for maps at scales larger than 1:20,000, and within 1/50th of an inch for maps at scales of 1:20,000 or smaller.

11 ©2005 Austin Troy Positional Error: USGS accuracy standards on the ground:
1:4,800 ± 13.33 feet
1:10,000 ± 27.78 feet
1:12,000 ± 33.33 feet
1:24,000 ± 40.00 feet
1:63,360 ± 105.60 feet
1:100,000 ± 166.67 feet
See image from U. Colorado showing accuracy standards visually. Hence, a point on a map represents the center of a spatial probability distribution of its possible locations. Thanks to Kenneth E. Foote and Donald J. Huebner, The Geographer's Craft Project, Department of Geography, The University of Colorado at Boulder, for links.
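
The ground tolerances listed on this slide follow from the map tolerances (1/30 or 1/50 of an inch) multiplied by the scale denominator. The sketch below reproduces that arithmetic; the function name `ground_tolerance_feet` is hypothetical, and treating a scale of exactly 1:20,000 as belonging to the 1/50-inch group is an assumption.

```python
def ground_tolerance_feet(scale_denominator):
    """Ground distance, in feet, matching the allowed error on the map.

    Assumes 1/30 inch for scales larger than 1:20,000 (smaller
    denominator) and 1/50 inch otherwise; the handling of exactly
    1:20,000 is an assumption.
    """
    map_error_inches = 1 / 30 if scale_denominator < 20000 else 1 / 50
    ground_error_inches = map_error_inches * scale_denominator
    return ground_error_inches / 12  # 12 inches per foot

for denominator in (4800, 10000, 12000, 24000, 63360, 100000):
    print(f"1:{denominator:,}  ± {ground_tolerance_feet(denominator):.2f} feet")
```

The same map tolerance therefore corresponds to a much larger ground error on a small-scale map than on a large-scale one.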

12 ©2005 Austin Troy Positional Error: A critical point to remember is that “zooming” in on a digital map does not increase the level of accuracy. The accuracy and precision are based on the scale of the digital layer’s original parent source. To see this, let’s look at river data derived from sources at three scales and three levels of precision: 1:2,000,000 (small scale), 1:100,000 (medium scale) and 1:24,000 (large scale).

13 ©2005 Austin Troy Positional Error: some examples

14 ©2005 Austin Troy Attribute Precision: Attribute accuracy and precision refer to the quality of non-spatial, attribute data. Precision for numeric data means lots of digits. Example: recording income down to cents, rather than just dollars. Precision for categorical data means lots of categories. Example: Anderson LU level 3 versus level 1.

15 ©2005 Austin Troy Conceptual Accuracy: Misclassifications result from differences in judgment or in the automated classification tools. The accuracy of classifications will depend on the precision. The less precise your classifications, the less likely errors are. If you are just classifying as “land” and “water”, that is not very precise, and not likely to result in an error.

16 ©2005 Austin Troy Other measures of data quality: logical consistency; completeness; data currency/timeliness; accessibility. These apply to both attribute and positional data.

17 ©2005 Austin Troy Logical Consistency: Do data follow rules of logic? Attribute example: is something classified as both water and as commercially zoned land? Geospatial example: do lines intersect when they should not (e.g., with power lines)? Do polygons fail to close on themselves?
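
The two geospatial checks named on this slide can be sketched with plain coordinate tuples. This is an illustrative sketch only: the function names are hypothetical, coordinates are assumed to be planar (x, y) pairs, and the crossing test deliberately ignores lines that merely touch at an endpoint or overlap collinearly.

```python
def ring_is_closed(ring):
    """A polygon ring closes on itself when its first and last vertices match."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def _orientation(p, q, r):
    """Signed area test: > 0 if r is left of line pq, < 0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segment ab and segment cd cross at an interior point."""
    d1 = _orientation(c, d, a)
    d2 = _orientation(c, d, b)
    d3 = _orientation(a, b, c)
    d4 = _orientation(a, b, d)
    # Each segment's endpoints must lie strictly on opposite sides of the other
    return d1 * d2 < 0 and d3 * d4 < 0

# Hypothetical data: one closed parcel, one unclosed parcel, two power lines
closed_parcel = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
open_parcel = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(ring_is_closed(closed_parcel), ring_is_closed(open_parcel))
print(segments_cross((0, 0), (2, 2), (0, 2), (2, 0)))
```

In practice a GIS package would run such checks as topology rules across a whole layer; the sketch only shows the logic for a single ring and a single pair of segments.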

18 ©2005 Austin Troy Completeness: Is a data layer complete or lacking in coverage? Examples: does a layer on roads leave out some roads? If so, does it do so systematically or randomly? Does a database of buildings in a city leave out some buildings? Example where completeness is crucial: a database of houses used to notify neighbors when a noxious facility is proposed. Imagine if a number of people were left out.

19 ©2005 Austin Troy Currency and Timeliness: Since some things change faster than others, the importance of timeliness in data depends on what is being displayed. By the time data have been digitized, they are often out of date, e.g. tax parcels. Updates are key, but the frequency of updates should depend on what is being displayed. Temporal validity must be stated: this tells someone using a map how long the data are considered valid.

20 ©2005 Austin Troy Currency and Timeliness

21 ©2005 Austin Troy Currency and Timeliness: Streets are another data set where currency is important; blue represents all the additional streets built between 1990 and 2000.

22 ©2005 Austin Troy Conflation: Used when one layer is better in one way, another is better in another way, and you wish to get the best of both. It is a way of reconciling the best geometric and attribute features from two layers into a new one. It is very commonly used where one layer has better attribute accuracy or completeness and another has better geometric accuracy or resolution. It is also used where a newer layer is produced for some theme but it has lower resolution than the older one.

23 ©2005 Austin Troy Two general types of Conflation: Attribute conflation: transferring attributes from an attribute-rich layer to features in an attribute-poor layer. Feature conflation: improvement of features in one layer based on coordinates and shapes in another, often called rubber sheeting. The user either transforms all features or specifies certain features to be kept fixed.

24 ©2005 Austin Troy Attribute conflation: The more spatially accurate layer is referred to as the base, coordinate or target layer. The layer with more accurate attribution is referred to as the reference, or non-base, layer. TIGER line files: good attribution, poor accuracy; USGS DLGs: the opposite. Attribute conflation is frequently used by third-party vendors to assign the rich attribute data of TIGER to the positionally accurate DLGs. Nodes are matched by iteratively rubber sheeting the reference layer to the base layer until matching nodes fall within a certain tolerance. Then line features are matched up.
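
The node-matching idea behind attribute conflation can be sketched as a tolerance-based nearest-node search. This is a simplified, single-pass illustration under stated assumptions: it does not perform the iterative rubber sheeting described on the slide, and the function name, node identifiers, coordinates and attributes are all hypothetical.

```python
import math

def transfer_attributes(base_nodes, reference_nodes, tolerance):
    """Copy attributes to each base node from the nearest reference node.

    base_nodes: dict of node id -> (x, y) from the positionally accurate layer.
    reference_nodes: list of ((x, y), attributes) from the attribute-rich layer.
    A base node with no reference node within `tolerance` gets None.
    """
    matched = {}
    for node_id, (bx, by) in base_nodes.items():
        best_attrs, best_dist = None, tolerance
        for (rx, ry), attrs in reference_nodes:
            dist = math.hypot(rx - bx, ry - by)
            if dist <= best_dist:
                best_attrs, best_dist = attrs, dist
        matched[node_id] = best_attrs
    return matched

# Hypothetical layers, coordinates in metres
base_nodes = {"n1": (0.0, 0.0), "n2": (100.0, 0.0), "n3": (500.0, 500.0)}
reference_nodes = [
    ((3.0, 4.0), {"name": "Main St"}),
    ((98.0, 1.0), {"name": "Elm St"}),
]
matched = transfer_attributes(base_nodes, reference_nodes, tolerance=10.0)
print(matched)
```

Nodes left unmatched (here `n3`) are the ones a real workflow would revisit after further rubber sheeting or manual review.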

25 ©2005 Austin Troy Conflation examples. Source: Stanley Dalal, GIS cafe

