
1 Geographic data validation

2 Index
Basic concepts
Why do we need validation?
How to assess geographic data
Initial checks
Intermediate checks
Advanced checks
Some final considerations

4 Basic concepts
Quality
Faithful representation of a feature
The quality of the data determines the quality of the output
GIGO (garbage in, garbage out) principle
Data have the potential to be used in ways unforeseen when collected.
The value of the data is directly related to its fitness for a variety of uses.

5 Basic concepts
Fitness-for-use
The suitability of a set of data for a specific purpose
A.K.A. usability
Should not be confused with quality
  o Quality: abstract
  o Usability: specific
A low-quality dataset may still have high usability

6 Basic concepts
Precision
  o Closeness of repeated measurements to one another, whether or not the value they agree on is correct
Accuracy
  o Closeness of a measurement to the true value

7 Precision vs Accuracy

8 Basic concepts
Precision
  o Closeness of repeated measurements to one another, whether or not the value they agree on is correct
Accuracy
  o Closeness of a measurement to the true value
Precision is intrinsic to the measurements
Accuracy depends on knowing the true value of the variable
Data validation: assessing the accuracy
  o Compare against a reference value
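
The distinction can be made concrete with a few numbers. The sketch below is only an illustration: the readings and the reference latitude are invented, and using the standard deviation for precision and the mean offset for accuracy is one common choice, not something the slides prescribe.

```python
from statistics import mean, stdev


def precision_and_accuracy(measurements, reference):
    """Return (spread, bias) for repeated measurements of one quantity.

    spread: sample standard deviation (small spread = high precision).
    bias:   mean measurement minus the reference (small |bias| = high accuracy).
    """
    spread = stdev(measurements)
    bias = mean(measurements) - reference
    return spread, bias


# Hypothetical repeated latitude readings for a site whose reference
# latitude is assumed to be 40.0 (both invented for illustration).
readings = [40.5001, 40.5002, 40.5000, 40.5001]
spread, bias = precision_and_accuracy(readings, reference=40.0)
print(f"spread={spread:.6f} bias={bias:.4f}")
```

Here the readings agree closely with one another (precise) but sit about half a degree from the reference (inaccurate), which can only be seen because a reference value is available.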

10 Why do we need validation?

11

12
This was a striking example, but more subtle issues can (and actually do) happen
We need to develop techniques and methodologies to explore the data
In other words, we need to validate the data
Validating gives a sense of the reliability of the records, and clues on how to improve it

14 How to assess?
Different techniques apply, depending on the aim of the assessment
Remember that high-quality datasets are more likely to show high fitness-for-use
Ideally, check for quality
If we know the purpose, check for fitness for that purpose

15 How to assess?
Work with geographic information à la Darwin Core
Work with individual records as well as collections of data
Start with the most basic pieces of information
Look for coherence with other pieces of information
  o If they are not coherent, why?
  o Modify the information to see whether it then fits
At more advanced levels, make use of available taxonomic or temporal information

16 How to assess? Tools
Spreadsheet: Microsoft Excel, LibreOffice Calc…
  o Well-known environment
  o Visually easy
OpenRefine
  o Spreadsheet-like, but with some enhanced features
Scripts
  o Database scripts: work directly at the source
  o Other programming languages: enhanced capabilities
GIS software
  o Often linked with other tools, such as spreadsheets or scripts

17

18 Visualizations
Visual exploration of the record set
Useful for a first-level assessment
Primary visualization for geographic data: maps
The next picture has several issues that can be detected using a map…

19

20 Coordinate transposition
This happens when the latitude is stored in the longitude field and vice versa
Usually difficult to detect record by record
But when we look at the whole picture…
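
One narrow case of transposition can be caught record by record: a latitude outside the range -90 to 90 is impossible, so if swapping the two values yields a valid pair, the fields were probably exchanged. The sketch below covers only that case; the function names are made up for illustration, and swapped pairs where both values lie within ±90 still need a map or a country check.

```python
def is_valid(lat, lon):
    """True if the pair lies within the legal ranges for decimal degrees."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


def looks_transposed(lat, lon):
    """True if the pair is invalid as given but valid once swapped."""
    return not is_valid(lat, lon) and is_valid(lon, lat)


# Hypothetical record with the longitude stored in the latitude field.
print(looks_transposed(-122.4, 37.8))
print(looks_transposed(37.8, -122.4))
```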

21 Zero vs Null
One of the most common issues
Storing 0 (zero) instead of leaving the field empty
This happens with some data management systems
Latitude 0 and longitude 0 are stored to mean “unknown coordinates”
But users of the data cannot know that, and it is not what the standard says: 0, 0 is a valid position
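
A minimal sketch of how such records could be spotted, assuming each record is a dictionary keyed by the Darwin Core terms decimalLatitude and decimalLongitude (the dictionary layout and the helper names are assumptions for illustration). The check only reports suspects; deciding whether 0, 0 really means "unknown" remains a human task.

```python
def _to_float(value):
    """Convert a field to float, treating None and blank strings as missing."""
    if value is None or str(value).strip() == "":
        return None
    return float(value)


def has_zero_placeholder(record):
    """True if both coordinates are present and exactly zero."""
    lat = _to_float(record.get("decimalLatitude"))
    lon = _to_float(record.get("decimalLongitude"))
    return lat == 0.0 and lon == 0.0


# Hypothetical records: a zero placeholder, an empty pair, a normal pair.
records = [
    {"decimalLatitude": "0", "decimalLongitude": "0.0"},
    {"decimalLatitude": "", "decimalLongitude": None},
    {"decimalLatitude": "57.3", "decimalLongitude": "11.9"},
]
suspects = [r for r in records if has_zero_placeholder(r)]
print(len(suspects))
```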

22 Negation
Forgetting or altering the sign (positive/negative) of the coordinates
Usually forgetting the minus sign
The most common source: converting from degrees, minutes, seconds (DMS) to decimal degrees (DD) without taking “W” or “S” into account
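
The conversion itself is short, which is part of why the sign is easy to forget. The following sketch shows one way to keep the hemisphere letter in the calculation; the function name and argument layout are illustrative assumptions.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus a hemisphere letter to decimal degrees.

    South and West produce negative values; North and East positive ones.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# 78° 47' 52" S must come out negative.
print(round(dms_to_dd(78, 47, 52, "S"), 6))
```

Rejecting an unknown hemisphere letter, rather than silently assuming a positive value, is a deliberate choice here: a missing letter is exactly the situation that produces negation errors.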

23 Check against country
The easiest way to catch these issues is to check whether the coordinates fall inside the specified country…
Provided, of course, that we have a country value to check against
Two ways:
  o Use GIS software
  o Use web services like GeoNames (we will see this in the OpenRefine session)
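
As a rough stand-in for a GIS or web-service lookup, the check can be sketched with a bounding box per country. Everything specific below is an assumption for illustration: the country code "XX", its box, and the function names are invented, and a rectangle is far cruder than a real country outline. The sketch also tries the modifications discussed earlier (swapping the fields, flipping signs) to see whether one of them makes the record fit.

```python
# Hypothetical bounding box: (min_lat, max_lat, min_lon, max_lon).
COUNTRY_BBOX = {"XX": (36.0, 44.0, -10.0, 4.0)}


def in_bbox(lat, lon, bbox):
    """True if the point lies inside the rectangular box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def suggest_fix(lat, lon, country_code):
    """Return a label for the first variant of the pair that falls in the country box."""
    bbox = COUNTRY_BBOX[country_code]
    candidates = [
        ("ok", lat, lon),
        ("transposed", lon, lat),
        ("latitude sign", -lat, lon),
        ("longitude sign", lat, -lon),
        ("both signs", -lat, -lon),
    ]
    for label, c_lat, c_lon in candidates:
        if in_bbox(c_lat, c_lon, bbox):
            return label
    return "unresolved"


print(suggest_fix(42.9, 8.5, "XX"))
```

A returned label is a hint for a human reviewer, not a correction to apply automatically.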

24 Georeferencing
Intermediate check
If we have both locality information and coordinates, we can check whether they match
Georeferencing is a tough task and prone to uncertainty, so some level of imprecision is to be expected
Make good use of the “uncertainty” fields in Darwin Core!
But still…
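
One possible way to express "do they match?" is to georeference the locality independently, measure the distance to the recorded coordinates, and compare it with the stated uncertainty (for example the Darwin Core field coordinateUncertaintyInMeters). The sketch assumes the locality has already been georeferenced by some other means; the haversine formula on a spherical Earth and the example values are simplifying assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def matches_locality(record_coords, locality_coords, uncertainty_m):
    """True if the recorded point lies within the uncertainty radius of the locality."""
    return haversine_m(*record_coords, *locality_coords) <= uncertainty_m


# Hypothetical check: recorded point vs. an independently georeferenced locality.
print(matches_locality((55.932576, 13.132359), (57.3, 11.9), 1000))
```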

25 55.932576, 13.132359 Anahuac NWR (UTC 049) Grandville POINT(-1.3223333 53.44958) Marine Nature Study Area 78º 47’ 52” S; 35º 50’ 31” E Stewart Park POINT(-1.1735004 53.358746) Backyard My Habitat 55.932576, 13.132359 Wilderness Park, north of 14th St. 28054 Delaney Conservation Area 57.3, 11.9

26 Multi-domain checks
Using information from different sources to check quality
In particular, use taxonomic information to improve geospatial data
Most basic example: check the data against a range map
  o If the point falls inside the range map of the specified species, OK
Sometimes, temporal information is useful
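
The range-map check boils down to a point-in-polygon test. The sketch below uses the classic ray-casting method on plain (longitude, latitude) pairs; the rectangular "range" is invented, and treating coordinates as planar is an assumption that breaks down near the poles or the antimeridian, where GIS software is the safer choice.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex is joined
    back to the first automatically.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical range map for a species, as a simple rectangle.
species_range = [(-10.0, 35.0), (5.0, 35.0), (5.0, 45.0), (-10.0, 45.0)]
print(point_in_polygon(-3.7, 40.4, species_range))
```

A point outside the range is not necessarily wrong (ranges shift, and range maps have their own errors), so the result is best treated as a flag for review.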

28 Considerations
NEVER modify the original data
  o Data cleaning is a human task and thus not error-free
  o Information we believe is wrong may be right
  o Make an “improved copy” of the data
  o Or “flag” the records as inaccurate
Re-share the improvements
  o With the community: so that others don’t have to reinvent the wheel
  o With the original owners of the data: so that they can correct the errors at the source
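
Flagging without touching the source can be sketched as follows. The "flags" key, the check name, and the dictionary-based record layout are assumptions made for illustration; the point is only that the original list stays unchanged while the copy carries the annotations.

```python
import copy


def flag_records(records, checks):
    """Return a deep copy of the records, each with a list of triggered flags.

    checks maps a flag name to a function that takes a record and returns
    True when the record looks suspicious. The input list is not modified.
    """
    flagged = copy.deepcopy(records)
    for record in flagged:
        record["flags"] = [name for name, check in checks.items() if check(record)]
    return flagged


# Hypothetical data and a single hypothetical check.
original = [
    {"decimalLatitude": 0, "decimalLongitude": 0},
    {"decimalLatitude": 57.3, "decimalLongitude": 11.9},
]
checks = {
    "ZERO_COORDINATES": lambda r: r["decimalLatitude"] == 0 and r["decimalLongitude"] == 0,
}
improved = flag_records(original, checks)
print(improved[0]["flags"], "flags" in original[0])
```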

