Data cleansing for Dummies: Google to the rescue!!

Presentation on theme: "Data cleansing for Dummies: Google to the rescue!!"— Presentation transcript:

1 Data cleansing for Dummies: Google to the rescue!!
Dave Smith, Petrology Collections Manager

2 The Natural History Museum, London
I work at the Natural History Museum in London, in this fantastic building, designed to look like a cathedral. When it was built it was described as the ‘Temple of Nature’.

3 Architectural wonders
Waterhouse building opened in 1881. Steel frame and terracotta. Purpose built for natural history collections.
The Waterhouse building was designed by the English architect Alfred Waterhouse and opened in 1881. It consists of a steel frame covered in terracotta tiles, many of them depicting animals and plants. It was purpose built for our natural history collections, but as we have come to understand more about how the different materials that make up our collection behave over time, we have realised that the environmental conditions are not always appropriate for some collections, and so we have built a couple of modern extensions (Darwin Centre 1 and 2).

4 The Museum
1000 staff
350 science staff
72 million specimens (estimated)
Life Sciences: plants, animals, birds, insects
Earth Sciences: minerals & gems, rocks, fossils, meteorites

5 My role
Geologist by training. I joined the museum 19 years ago as a curator.
Collections Manager for rock collections: 125,000 rocks; 10,000 decorative stones; 37,000 ocean sediments; 16,000 ore specimens.
My role is somewhat schizophrenic, as I also administer the part of EMu and the data relevant to the Mineralogy section. EMu was implemented department by department, rather than as one system across the whole of science. I managed the implementation for Mineralogy and have since continued to act as a focal point in my role as departmental EMu administrator, dealing with: registry management; report writing; training & documentation; EMu support & upgrade testing; communication.

6 ‘Fingers in lots of pies’
Have been involved in cross-museum initiatives involving EMu. I have ‘fingers in lots of pies’, an English phrase essentially meaning that I’m involved in many projects. Being the first to engage with EMu, as well as finding myself on many committees, has meant that I have been able to see the potential of using EMu’s toolkit to manage information relating to the collections and collections management activities. I shall introduce you to some of these during my talk.

7 Data cleansing for Dummies: Google to the rescue!!
Dave Smith, Petrology Collections Manager

8 The problem

9 Core Information
89,000 records (73%)
Identification = 52,100
Provenance = 64,215
Acquisition = 38,700
Storage = 14,300

10

11

12

13 Numbers
Register volume        Acquisition records    Specimen records
1-5                    634                    19,283
1-5 (supplementary)    501 (490)              1965 (1927)
1-5 (merged)           1124                   21,210
6-11                   1832                   30,080
Geological Society     510                    9,852
TOTAL                  3466                   63,107

14 The Problem
Data sits outside EMu – how to get it in?
Not as easy as it sounds – many barriers…
Notes field used for data with uncertain placeholder.
Sites data of variable levels of atomisation, depending on experience of digitiser.

15 Acquisition Lot entry

16

17 The Problem
Data sits outside EMu – how to get it in?
Not as easy as it sounds – many barriers…
Notes field used for data with uncertain placeholder.
Sites data of variable levels of atomisation, depending on experience of digitiser.
Approx. 95% of specimens have a record in EMu with a minimum of registration number.
Once cleaned: how to update records without overwriting enhanced data?
Unfamiliarity with Access.
Short time periods for data cleansing.
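
The concern about updating records without overwriting enhanced data can be illustrated with a fill-only-blanks merge. This is a minimal sketch and not EMu's import behaviour: the `merge_fill_blanks` helper, the field names (`reg_no`, `locality`, `rock_name`) and the sample registration numbers are all hypothetical. Records are matched on registration number only because the slide says that is the minimum most EMu records hold.

```python
def merge_fill_blanks(existing, cleaned, key="reg_no"):
    """Return copies of `existing` records with blank fields filled from `cleaned`.

    A field that already holds a value is never overwritten, so any
    enhancement made in the existing record survives the update.
    """
    cleaned_by_key = {row[key]: row for row in cleaned}
    merged = []
    for record in existing:
        updated = dict(record)  # work on a copy; leave the input untouched
        source = cleaned_by_key.get(record[key], {})
        for field, value in source.items():
            current = updated.get(field)
            # Only fill a field that is empty, and only with a non-empty value.
            if current in (None, "") and value not in (None, ""):
                updated[field] = value
        merged.append(updated)
    return merged


# Hypothetical sample data: the first record already has an enhanced locality.
existing = [
    {"reg_no": "BM.1234", "locality": "Cornwall, curated entry", "rock_name": ""},
    {"reg_no": "BM.5678", "locality": "", "rock_name": ""},
]
cleaned = [
    {"reg_no": "BM.1234", "locality": "Cornwall", "rock_name": "Granite"},
    {"reg_no": "BM.5678", "locality": "Skye", "rock_name": "Basalt"},
]
result = merge_fill_blanks(existing, cleaned)
```

In this sketch the curated locality on the first record is kept, while its empty rock name and both empty fields on the second record are filled from the cleaned data.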

18 The Solution
Google Refine
OpenRefine (GitHub)
Personal web service
Runs in your browser

19 The demo

20 Benefits
Intuitive user interface
Powerful editing / data manipulation functions
Can’t make mistakes: endless undo!
Pick up where you left it: maintains history
Link to open-data sources to validate your data
Augment your data with free open data sources
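
The "endless undo" and "maintains history" points can be pictured with a toy edit history. This is only a sketch of the general idea, assuming each edit is kept as a labelled snapshot; the `EditHistory` class is hypothetical and does not describe how Google Refine / OpenRefine actually stores its history.

```python
class EditHistory:
    """Toy model: every edit adds a snapshot, so any edit can be undone."""

    def __init__(self, rows):
        self._states = [list(rows)]  # snapshot 0 is the untouched data
        self._labels = []            # one label per applied edit

    @property
    def rows(self):
        """Current state of the data (a copy)."""
        return list(self._states[-1])

    def apply(self, label, transform):
        """Apply `transform` to every row and record it under `label`."""
        self._states.append([transform(row) for row in self._states[-1]])
        self._labels.append(label)

    def undo(self):
        """Step back one edit; the original data can never be lost."""
        if len(self._states) > 1:
            self._states.pop()
            self._labels.pop()

    def log(self):
        """Labels of the edits currently applied, oldest first."""
        return list(self._labels)


history = EditHistory(["  Granite ", "BASALT", "gneiss"])
history.apply("trim whitespace", str.strip)
history.apply("lowercase", str.lower)
history.undo()  # changed our mind about lowercasing
```

After the undo, the rows are trimmed but keep their original capitalisation, and the log holds only the "trim whitespace" step.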

