Data cleansing for Dummies: Google to the rescue!! Dave Smith, Petrology Collections Manager.


1 Data cleansing for Dummies: Google to the rescue!!
Dave Smith, Petrology Collections Manager

2 The Natural History Museum, London

3 Architectural wonders
Waterhouse building, opened in 1881
Steel frame and terracotta
Purpose-built for natural history collections

4 The Museum
1,000 staff
350 science staff
72 million specimens (estimated)
Life Sciences
– Plants, animals, birds, insects
Earth Sciences
– Minerals & gems, rocks, fossils, meteorites

5 My role
Geologist by training
Collections Manager for the rock collections
– 125,000 rocks
– 10,000 decorative stones
– 37,000 ocean sediments
– 16,000 ore specimens
Departmental EMu administrator
– Registry management
– Report writing
– Training & documentation
– EMu support & upgrade testing
– Communication

6 Fingers in lots of pies
Involved in cross-museum initiatives involving EMu.

7 Data cleansing for Dummies: Google to the rescue!!
Dave Smith, Petrology Collections Manager

8 The problem

9 Core Information
89,000 records (73%)
– Identification = 52,100
– Provenance = 64,215
– Acquisition = 38,700
– Storage = 14,300


13 Numbers
Register volume | Acquisition records | Specimen records
… (supplementary) | 501 (490) | 1965 (1927)
1-5 (merged) | 1124 | 21, ,080
Geological Society | 510 | 9,852
TOTAL | 3466 | 63,107

14 The Problem
Data sits outside EMu: how do we get it in?
Not as easy as it sounds; there are many barriers:
– Notes field used as a catch-all for data whose correct field was uncertain.
– Site data atomised to varying levels, depending on the experience of the digitiser.

15 Acquisition Lot entry


17 The Problem
Data sits outside EMu: how do we get it in?
Not as easy as it sounds; there are many barriers:
– Notes field used as a catch-all for data whose correct field was uncertain.
– Site data atomised to varying levels, depending on the experience of the digitiser.
– Approx. 95% of specimens already have a record in EMu, with at least a registration number.
– Once the data is cleaned, how do we update records without overwriting enhanced data?
– Unfamiliarity with Access.
– Short time periods available for data cleansing.
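The barrier of updating records without overwriting data already enhanced in EMu can be expressed as a simple merge rule: fill a field only where the existing record leaves it blank, and flag any field where the cleaned value disagrees so a person can decide. The Python below is a minimal, hypothetical sketch of that rule, not the method used in the talk; the field names (`RegNumber`, `Locality`, `Rock`) and the registration number are invented for illustration, and it does not reproduce EMu's own import behaviour.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: field name -> value, as exported to CSV.
Record = Dict[str, str]


def merge_cleaned(existing: Record, cleaned: Record) -> Tuple[Record, List[str]]:
    """Merge a cleaned row into an existing record without overwriting data.

    Blank fields in `existing` are filled from `cleaned`. Fields that already
    hold a different value are kept as they are and reported as conflicts,
    so enhanced data is never silently replaced.
    """
    merged = dict(existing)  # work on a copy; leave the original untouched
    conflicts: List[str] = []
    for field, new_value in cleaned.items():
        new_value = new_value.strip()
        if not new_value:
            continue  # the cleaned row has nothing to add for this field
        old_value = merged.get(field, "").strip()
        if not old_value:
            merged[field] = new_value  # safe to fill: field was empty
        elif old_value != new_value:
            conflicts.append(field)  # keep existing value, flag for review
    return merged, conflicts


if __name__ == "__main__":
    # Invented example data, for illustration only.
    existing = {"RegNumber": "X.0001", "Locality": "Skye", "Rock": ""}
    cleaned = {"RegNumber": "X.0001", "Locality": "Isle of Skye", "Rock": "Gabbro"}
    merged, conflicts = merge_cleaned(existing, cleaned)
    print(merged)     # {'RegNumber': 'X.0001', 'Locality': 'Skye', 'Rock': 'Gabbro'}
    print(conflicts)  # ['Locality']
```

Reporting conflicts instead of resolving them automatically keeps the curator in control of any field that has already been improved by hand.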

18 The Solution
Google Refine, now OpenRefine (on GitHub)
Runs as a personal web service on your own machine
Used through your web browser

19 The demo

20 Benefits
Intuitive user interface
Powerful editing / data manipulation functions
Can't make mistakes! Endless undo!
Pick up where you left off
Maintains a history of every change
Links to open data sources to validate your data
Augment your data with free open data sources
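One of OpenRefine's editing functions is clustering, which groups variant spellings of the same value so they can be merged in one step; this suits inconsistently entered site data. Its simplest method builds a "fingerprint" key for each value. The Python below is a simplified sketch in that spirit, not OpenRefine's own code: it lowercases, strips accents, treats punctuation as a word separator (an assumption of this sketch), and sorts the unique words, so values differing only in those respects share a key. The locality strings are invented examples.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List


def fingerprint(value: str) -> str:
    """Build a normalised key so trivially different spellings collide."""
    value = value.strip().lower()
    # Split accented characters into base + combining mark, then drop the marks.
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Treat punctuation as a word separator (assumption of this sketch).
    value = re.sub(r"[^\w\s]", " ", value)
    # Sorting unique tokens makes the key independent of word order and repeats.
    return " ".join(sorted(set(value.split())))


def cluster(values: List[str]) -> Dict[str, List[str]]:
    """Group values by fingerprint; keep only groups with 2+ distinct spellings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


if __name__ == "__main__":
    # Invented locality strings, for illustration only.
    sites = ["Isle of Skye, Scotland", "Scotland, Isle of Skye",
             "isle of skye scotland", "Cornwall"]
    for key, variants in cluster(sites).items():
        print(key, "->", variants)
```

As in OpenRefine, a cluster is only a suggestion: a person still chooses which spelling to keep before the values are merged.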

