GIS Data Quality.


1 GIS Data Quality

2 Lecture Outline
Accuracy
Precision
Error Sources
Using Multiple Data Sets Together
Completeness
Compatibility
Consistency
Applicability
Error Propagation
Recognizing and Avoiding Error
Metadata

Here’s the outline of what we’ll be talking about today. We’ll start by discussing error in individual data sets, and then expand that to multiple data sets. I’ll briefly give you some suggestions on how to recognize and attempt to avoid error. Finally, I’ll talk about documenting your data with metadata.

So, to start, what is error? Anybody? Is it good or bad? Do you think it’s possible to be 100% error-free in a GIS data set? Why or why not?

3 Accuracy and Precision
Accuracy: the degree to which information on a map or in a digital database matches true or accepted values
Precision: the level of measurement and exactness of description in a GIS database
Error: inaccuracy and imprecision of data
Accuracy and precision can be applied to both spatial and non-spatial data
  Spatial: often refers to the scale of the map from which the data is derived; can also refer to GPS data
  Non-spatial: describes the level of detail in the attribute data
“Garbage in, garbage out”

We describe error using two measures of quality: accuracy and precision. Error is the amount of inaccuracy and imprecision. There is also data quality, which indicates how good the data is; in other words, it is a combination of accuracy and precision. Error can apply to both spatial and non-spatial data.

4 Accuracy and Precision
[Figure labels: Accurate, Imprecise, Inaccurate, Precise]

Here are some graphical examples of what is meant by accuracy and precision in GIS. These are spatial examples.
Non-spatial example of accuracy: population counts that are close to the true number of people.
Non-spatial example of precision: the more information, the more precise (i.e., a better description). Precision can also be the number of decimal places.
A good rule of thumb for precision when creating derivatives of data: add one decimal place. Example: if your population data is rounded to a whole number, then any data created from it (like population density) should only be taken out to 1 decimal place.
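
As a rough illustration of the rule of thumb above, the sketch below rounds a derived value to one more decimal place than the source value it was computed from. The function names and the sample figures (a population of 15,342 over 12.7 square kilometers) are made up for the example and do not come from the lecture.

```python
from decimal import Decimal


def decimal_places(value: str) -> int:
    """Count the decimal places in a number given as text, e.g. "12.50" -> 2."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def derived_value(source: str, divisor: float) -> float:
    """Divide a source value by a divisor, keeping one extra decimal place.

    The source is passed as text so that its recorded precision
    (its number of decimal places) is not lost.
    """
    places = decimal_places(source) + 1
    return round(float(source) / divisor, places)


# Hypothetical figures: whole-number population, area in square kilometers.
density = derived_value("15342", 12.7)
print(density)
```

Passing the source value as text is a deliberate choice here: a float such as 15342.0 no longer records how many decimal places the original data actually had.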

5 Sources of Inaccuracy and Imprecision
Obvious Sources of Error
  Age of Data
  Areal Coverage
  Map Scale
  Density of Observations
  Relevance (use of “surrogate” data)
  Data Format
  Accessibility
  Cost
Error from Natural Variation or Original Measurements
  Positional Accuracy
  Accuracy of Content
  Variation in the Data
Processing Errors
  Numerical Errors
  Topological Errors
  Classification and Generalization
  Digitizing and Geocoding

Age: the data may be too old to be relevant.
Areal Coverage: the data may not cover the full area completely; coverage may be lacking at borders; imagery may have cloud cover.
Scale: think about points between contour lines. Larger-scale maps have more detail (more contour lines, smaller contour intervals) and allow for better estimates of those points.
Density of Observations: are there enough observations to justify your level of detail?
Relevance: if you’re using data to indirectly measure something, is it appropriate?
Format: has the data been transformed from one format to another many times? Think about raster compression formats like MrSID; these save a lot of space, but also lose detail.
Accessibility: can you get the data that you need? If not, then less accurate data may have to be used.
Cost: can you afford data of high accuracy? If not, then less accurate data may have to be used.
Positional Accuracy: fuzzy boundaries, as with soil, vegetation, and biomes.
Accuracy of Content: is the attribution correct? Errors are caused by sloppiness or bad calibration of equipment.
Variation in the Data: is the nature of the data constant throughout time, or does it vary?
Numerical Errors: errors in computation by the computer.
Topological Errors: overshoots, slivers, and dangles from overlay analysis.
Classification and Generalization: any analysis done on classes is affected by the classification scheme, which can introduce error.
Digitizing and Geocoding: errors in line digitizing.

6 Scale Effects on Position
Map Scale      Horizontal Accuracy
1:12,500       9.5 m
1:25,000       12.7 m
1:50,000       25.4 m
1:100,000      50.8 m
1:250,000      126.9 m
1:1,000,000    507.9 m
From: US National Map Accuracy Standards
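
Figures like these come from converting a tolerance measured on the printed map into a ground distance. A minimal sketch, assuming the National Map Accuracy Standards tolerances of 1/30 inch for scales larger than 1:20,000 and 1/50 inch for scales of 1:20,000 or smaller (the function name is hypothetical):

```python
INCH_TO_M = 0.0254  # exact definition of the international inch


def nmas_horizontal_tolerance_m(scale_denominator: int) -> float:
    """Ground distance in meters for the NMAS horizontal tolerance.

    scale_denominator is the N in a 1:N map scale. A smaller N means a
    larger-scale map, which gets the 1/30 inch tolerance.
    """
    map_inches = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return map_inches * scale_denominator * INCH_TO_M


for n in (25_000, 50_000, 100_000):
    print(f"1:{n:,} -> {nmas_horizontal_tolerance_m(n):.1f} m")
```

For 1:25,000, 1:50,000 and 1:100,000 this gives 12.7 m, 25.4 m and 50.8 m.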

7 Error Sources Associated With Digitizing

8 Spatial Data Error
Location errors
  Example: a schoolhouse is located 30 feet away from its marked location on a map
  A 300 meter contour line is offset 5 meters to the northwest
  A satellite image pixel is located 2.4 meters away from its actual location on the ground
Attribute errors
  A schoolhouse is incorrectly labeled as a church
  A 300 meter contour line is actually supposed to be a 310 meter contour line
  A 300 meter contour line actually represents an elevation of 302 meters
  A classified satellite image pixel is labeled forest when it is actually a field

9 Spatial Data Error
One data point: error/accuracy can be easily defined.
Data sets/maps: error/accuracy must be summarized.
How is accuracy determined and summarized?
  Very accurate data must be collected (sampled) about a subset of the full dataset/map.
  This accurate sample is then compared with the original data.
  A summary is created that compares these 2 datasets (the sample with the same measurements from the original data).

10 Spatial Data Error
Locational data accuracy can be summarized with Root Mean Square Error (RMSE): a kind of average of the distances between where points/pixels are represented and their actual locations on the ground.
Locational data can also be summarized in other ways. For example, for horizontal data, the USGS uses the US National Map Accuracy Standards: 90% of all measurable points are within 1/50 of an inch (measured on the map) for maps at scales of 1:20,000 or smaller, and within 1/30 of an inch for maps at scales larger than 1:20,000.
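
Both summaries can be computed from the same list of point-by-point distances. The sketch below is one plausible way to do it, assuming planar coordinates in the same units for the mapped points and the more accurate reference sample. The function names and the sample coordinates are invented for illustration.

```python
import math


def horizontal_errors(mapped, reference):
    """Distance between each mapped point and its reference location."""
    return [
        math.hypot(mx - rx, my - ry)
        for (mx, my), (rx, ry) in zip(mapped, reference)
    ]


def rmse(errors):
    """Root mean square error: square root of the mean squared distance."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def share_within(errors, tolerance):
    """Fraction of points whose error does not exceed the tolerance."""
    return sum(e <= tolerance for e in errors) / len(errors)


# Hypothetical check points (meters): positions on the map vs. surveyed positions.
mapped = [(103.0, 204.0), (50.0, 50.0), (10.0, 12.0), (80.0, 75.0)]
reference = [(100.0, 200.0), (50.0, 50.0), (10.0, 10.0), (80.0, 74.0)]

errors = horizontal_errors(mapped, reference)
print(rmse(errors))
print(share_within(errors, tolerance=12.7))  # compare with a 90% requirement
```

Because the errors are squared before averaging, RMSE weights a few large misses more heavily than a plain mean distance would.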

11 Different scales can lead to different boundaries, even though the boundaries may be fuzzy and inexact

12 Error
Error is unbiased when the error is in ‘random’ directions
  GPS data
  Human error in surveying points
Error is biased when there is systematic variation in accuracy within a geographic data set
  Example: a GIS tech mistypes coordinate values when entering control points to register a map to a digitizing tablet; all coordinate data from this map is then systematically offset (biased)
  Example: the wrong datum is being used
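
One rough way to tell the two cases apart is to average the signed offsets between mapped and reference positions: errors in random directions tend to cancel, while a systematic shift leaves a mean offset well away from zero. The sketch below assumes planar coordinates. The function name and sample points are hypothetical.

```python
def mean_offset(mapped, reference):
    """Average signed (dx, dy) offset of mapped points from reference points."""
    n = len(mapped)
    dx = sum(m[0] - r[0] for m, r in zip(mapped, reference)) / n
    dy = sum(m[1] - r[1] for m, r in zip(mapped, reference)) / n
    return dx, dy


reference = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

# Errors in different directions that cancel out on average (unbiased).
scattered = [(1.0, 0.0), (9.0, 0.0), (0.0, 11.0), (10.0, 9.0)]

# The same points with every coordinate shifted by (2, -1), as a
# mis-entered control point or wrong datum might do (biased).
shifted = [(x + 2.0, y - 1.0) for x, y in scattered]

print(mean_offset(scattered, reference))
print(mean_offset(shifted, reference))
```

A mean offset near zero does not prove the data is accurate, only that the errors show no consistent direction.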

13 Error when Using Multiple Data Sets
Error Propagation: one error leads to another
  using a mis-registered point to register another layer
  additive effect
  E.g., what happens if a layer digitized with a spatial bias problem is used as the spatial reference to create another, new layer?
Error Cascading: erroneous, imprecise and inaccurate information will skew a GIS solution when information is combined selectively into new layers
  errors propagate from layer to layer repeatedly
  effect can be additive or multiplicative

When using multiple data sets, any error in one data set may influence another data set, and it may be compounded through your analyses. When error in one data set leads to error in a second data set, we call that error propagation. This often has an additive effect. When the error propagates from data set to data set repeatedly and ultimately skews the results, this is error cascading. The effect can be additive or multiplicative, but it is often very difficult to tell.
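
The additive case can be sketched with simple arithmetic. The example below assumes two things that the lecture does not state: that systematic offsets add as vectors when one layer is registered to another, and that independent random errors combine as a root sum of squares. The function names and numbers are illustrative only, and the multiplicative case is not covered.

```python
import math


def propagate_bias(*offsets):
    """Total systematic offset when each layer inherits the previous shift."""
    return (sum(o[0] for o in offsets), sum(o[1] for o in offsets))


def combine_random_rmse(*rmses):
    """Combined RMSE of independent random errors (root sum of squares)."""
    return math.sqrt(sum(r * r for r in rmses))


# Layer A is shifted 5 m east; layer B is registered to A and adds 3 m north.
print(propagate_bias((5.0, 0.0), (0.0, 3.0)))

# Two independent random error sources of 3 m and 4 m RMSE.
print(combine_random_rmse(3.0, 4.0))
```

Under these assumptions the new layer can never be better than the layer it was registered to, which is the practical point of the slide.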

14 Propagation & Cascading

15 Using Multiple Data Sets
Four Data Quality Considerations:
Completeness
  A complete data set will cover the study area and time period in its entirety
  No data set is 100% complete
Compatibility
  Data sets must be compatible with one another
  Scale, data capture methods, etc.
Consistency
  There must be consistency between and within data sets
  Data development, data capture methods
Applicability
  Data must be appropriate for your intended use

When using multiple data sets together, there are four quality considerations that you must keep in mind.
Completeness: the data covers the area and time period completely; it is never 100% complete. Sample data is not complete by its nature. Time-series data is never complete. You must determine an acceptable level of completeness for all data sets.
Compatibility: use data sets together in a sensible manner, with similar scales, similar data capture methods, etc.
Consistency: there must be consistency between and within data sets (data development, data capture, etc.). If different parts of a data set were done by more than one person, were the same standards kept (e.g., contours)? This often goes hand-in-hand with compatibility.
Applicability: is your data appropriate for your uses? Don’t use a DEM to help you determine the spread of HIV!

16 Documenting Your Data – Metadata
Metadata: data about data
Used to document all aspects of a data set
Allows the user to determine the usefulness of a data set
Organizations want to maintain their investment
To share information about available data
Data catalogs & clearinghouses
To aid data transfer & appropriate use
Metadata standards are set by the Federal Geographic Data Committee (FGDC)
All data distributed on the web and by sanctioned data distributors should have FGDC-compliant metadata

Metadata is data about data. This documentation allows the user to understand all aspects of the data set, from its coordinate system and projection, to its attributes, to its processing history, and so on. It also allows the user to determine if the data is useful for his/her purposes.
The official metadata standards were originally set down in 1994 by the FGDC. Here are a couple of web sites that you can look at if you’re interested. There are strict regulations on the construction of metadata that must be followed for it to be FGDC compliant. Compliance sets a common standard for everyone to use, and makes the exchange of data much easier.

17

18 Metadata in ArcGIS
Visible in ArcCatalog
Contained in the .xml part of a shapefile
Maintain investment by cataloguing & noting appropriate use of data
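
Because the metadata is plain XML, the data quality fields can be read with a few lines of code. This is a minimal sketch, assuming the file holds FGDC CSDGM-style XML with the standard's short element names (idinfo, dataqual, and so on). The sample record is invented, and real files may be laid out differently, so treat the paths as assumptions to check against your own data.

```python
import xml.etree.ElementTree as ET

# Invented sample record using CSDGM-style short element names.
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo><title>Hypothetical county roads</title></citeinfo></citation>
  </idinfo>
  <dataqual>
    <complete>Covers the full county as of the survey date.</complete>
    <posacc><horizpa><horizpar>Digitized from 1:24,000 source maps.</horizpar></horizpa></posacc>
  </dataqual>
</metadata>"""

# Paths are relative to the <metadata> root element.
FIELDS = {
    "title": "idinfo/citation/citeinfo/title",
    "completeness": "dataqual/complete",
    "logical_consistency": "dataqual/logic",
    "horizontal_accuracy": "dataqual/posacc/horizpa/horizpar",
}


def summarize_quality(xml_text: str) -> dict:
    """Return the selected fields; a missing element comes back as None."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(path) for name, path in FIELDS.items()}


for name, value in summarize_quality(SAMPLE).items():
    print(f"{name}: {value}")
```

A field that comes back as None is itself useful information: it shows which parts of the data quality documentation were never filled in.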

19

20 Reminders
Case study #7 will be on Friday (Oct. 5th)
Mid-term study guide will be posted online on Friday (Oct. 5th)
Mid-term review will be on Monday (Oct. 8th); come with questions
Mid-term exam will be next Wednesday (Oct. 10th)
Lab 3 is due next Friday (Oct. 12th); this was written incorrectly in the Lab 3 document

