Data Quality Why should I care?


Presentation on theme: "Data Quality Why should I care?"— Presentation transcript:

1 Data Quality Why should I care?

2 A Collection’s Assets
Its people, its material, its data
Institution, Structure, Database

3 A Collection’s Investments
Its people, its material, its data
Skills, Preparations, Data Quality

4 Value Proposition
The value of a natural history collection is measured by the extent to which it is used. Otherwise it is just taking up resources.

5 Complication
We want people to use collections, but... “Data quality is related to use and cannot be assessed independently of the user.” Chrisman (1991), Strong et al. (1997)
Who? For what?

6 Vision
“At this point I wish to emphasize what I believe will ultimately prove to be the greatest value of our museum. This value will not, however, be realized until the lapse of many years, possibly a century, assuming that our material is safely preserved. And this is that the student of the future will have access to the original record of faunal conditions in California and the west, wherever we now work.” Joseph Grinnell, 1910, "The Uses and Methods of a Research Museum", Popular Science Monthly
Hook: Now then, what kinds of notes are taken in the field?

7 Field Notes
Data collection, observations, maps, diaries, photographs (2003, 1911)
“Our field-records will be perhaps the most valuable of all our results. ...any and all (as many as you have time to record) items are liable to be just what will provide the information wanted. You can't tell in advance which observations will prove valuable. Do record them all!” (Grinnell, 1908)

8 Why I Should Care
Axiom: The value of a natural history collection is measured by the extent to which it is used.
Observation: We don’t know who will ultimately use the collection, nor in what ways.
Conclusion: We should make as much data as possible fit for as many uses as possible.

9 Fitness for Use
For data to be fit for use they must be: accessible, easy to read, timely, consistent with other sources, easy to interpret, complete, accurate, relevant, comprehensive, specific. Redman (2001)

10 The Data Value Chain
[Diagram: Collection, Management, Publication, leading to Value and Impact]
Collection: field notes, lab notes; cleaning; standards
Management: labels, catalogs, legacy; digitization; cleaning; standards
Publication: books, journals, online databases; cleaning; enhancement; standards

11 Make data accessible
Data Publication, Data Aggregators
From the publisher's point of view, this has the advantages of: sharing with more people at once; drawing on the input of all those people.
“We ain't one-at-a-timin' here. We're MASS communicatin’!” Pappy O’Daniel (O Brother, Where Art Thou?)

12 Data Aggregation

13 Data Aggregation
A record is complete if it includes an identification at least to species, valid coordinates, a full date, and a basis of record (e.g., observation, specimen).
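
The completeness rule on this slide can be sketched as a small check in Python. This is only an illustration, not how any aggregator actually implements it: the field names (taxonRank, decimalLatitude, decimalLongitude, year, month, day, basisOfRecord) are assumed Darwin Core-style keys, and the set of ranks counted as "at least to species" is an assumption made for the example.

```python
from datetime import date

# Assumption for illustration: ranks counted as "at least to species".
SPECIES_OR_FINER = {"species", "subspecies", "variety", "form"}


def has_valid_coordinates(record):
    """True if latitude and longitude parse as numbers within range."""
    try:
        lat = float(record.get("decimalLatitude"))
        lon = float(record.get("decimalLongitude"))
    except (TypeError, ValueError):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def has_full_date(record):
    """True if year, month and day together form a real calendar date."""
    try:
        date(int(record.get("year")), int(record.get("month")), int(record.get("day")))
    except (TypeError, ValueError):
        return False
    return True


def is_complete(record):
    """Apply the four criteria from the slide to one record (a dict)."""
    rank = str(record.get("taxonRank", "")).lower()
    return (
        rank in SPECIES_OR_FINER
        and has_valid_coordinates(record)
        and has_full_date(record)
        and bool(record.get("basisOfRecord"))
    )
```

For example, a record identified to species, with coordinates, a full date and basisOfRecord set to PreservedSpecimen would pass, while the same record with month = 0 would not.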

14 Aggregators help us find data through countryCode
Searching by countryCode “CD” finds records whose country reads “DRC”, “D.R. Congo”, or “Dem. Rep. Congo”: each is indexed with countryCode “CD”.
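
A minimal sketch of this kind of indexing, in Python. The lookup table and the function names derive_country_code and index_record are hypothetical and exist only for this example; a real aggregator would rely on a much larger controlled vocabulary. The original country value is kept untouched and the derived code is stored alongside it.

```python
# Hypothetical vocabulary: spelling variants mapped to the ISO 3166-1 alpha-2 code.
COUNTRY_VARIANTS = {
    "drc": "CD",
    "d.r. congo": "CD",
    "dem. rep. congo": "CD",
}


def derive_country_code(country):
    """Return a country code for a free-text country value, or None if unknown."""
    if not country:
        return None
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(country.lower().split())
    return COUNTRY_VARIANTS.get(key)


def index_record(record):
    """Return a copy of the record with a derived countryCode added."""
    indexed = dict(record)  # the original record is not modified
    indexed["countryCode"] = derive_country_code(record.get("country"))
    return indexed


records = [{"country": "DRC"}, {"country": "D.R. Congo"}, {"country": "Dem. Rep.  Congo"}]
indexed = [index_record(r) for r in records]
# A single search on the derived field now finds all three spellings.
matches = [r for r in indexed if r["countryCode"] == "CD"]
```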

15 Aggregators help us find data
They keep our original data and “index” it for us: voucher is indexed as PreservedSpecimen.
They flag data quality issues for us: month = 0.
Basically, they do data quality improvement at the level of the aggregation.
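
Flagging without altering the source data might look like the following Python sketch. The function name flag_issues and the issue label MONTH_OUT_OF_RANGE are made up for this example and are not taken from any aggregator's actual list of assertions.

```python
def flag_issues(record):
    """Return the untouched record together with a list of issue flags."""
    issues = []
    month = record.get("month")
    if month not in (None, ""):  # a missing month is not flagged in this sketch
        try:
            month_ok = 1 <= int(month) <= 12
        except (TypeError, ValueError):
            month_ok = False
        if not month_ok:
            # Hypothetical flag name, used only for illustration.
            issues.append("MONTH_OUT_OF_RANGE")
    return {"original": record, "issues": issues}
```

Calling flag_issues({"month": 0}) returns the record unchanged along with the MONTH_OUT_OF_RANGE flag, so the data owner can decide how to correct the source.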

16 How do they do that?
They have code and vocabularies integrated either into their aggregation workflows or into the data publishing workflow. Efforts are underway to use the same data quality assertions for all aggregators.

17 Can I do that?
Yes. It requires an investment in skills.
Programming (Java, Python, R)
Web services (GeoLocate, Global Names Resolver)
Desktop Tools (Excel, Access, OpenRefine)
Web Apps (Integrated Publishing Toolkit)
Workflows (Kurator, Kurator-Web)
List of data quality related tools:

18 References
Chapman, A. D. Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. ISBN Available online at
Chrisman, N.R. 1991. The Error Component in Spatial Data. pp in: Maguire D.J., Goodchild M.F. and Rhind D.W. (eds) Geographical Information Systems Vol. 1, Principles: Longman Scientific and Technical.
Grinnell, J. 1908. Field Notes. Museum of Vertebrate Zoology, University of California, Berkeley. [Grinnell’s personal field notes].
Grinnell, J. 1910. The Uses and Methods of a Research Museum. Popular Science Monthly. 77:
Redman, T.C. 2001. Data Quality: The Field Guide. Boston, MA: Digital Press. ISBN:
Strong, D.; Lee, Y.; and Wang, R. 1997. Data quality in context. Communications of the ACM, 40(5), 103–110.

