Data cleansing for Dummies: Google to the rescue!!

Presentation transcript:

Data cleansing for Dummies: Google to the rescue!!
Dave Smith, Petrology Collections Manager

The Natural History Museum, London
I work at the Natural History Museum in London, in this fantastic building designed to look like a cathedral. When it was built it was described as the ‘Temple of Nature’.

Architectural wonders
Waterhouse building opened in 1881
Steel frame and terracotta
Purpose built for natural history collections
The Waterhouse building was designed by the English architect Alfred Waterhouse and opened in 1881. It consists of a steel frame covered in terracotta tiles, many of them depicting animals and plants. It was purpose built for our natural history collections, but as we have come to understand more about how the different materials in our collection behave over time, we have realised that the environmental conditions are not always appropriate for some collections. So we have built a couple of modern extensions (Darwin Centre 1 and 2).

The Museum
1000 staff
350 science staff
72 million specimens (estimated)
Life Sciences: plants, animals, birds, insects
Earth Sciences: minerals & gems, rocks, fossils, meteorites

My role
Geologist by training
Collections Manager for rock collections: 125,000 rocks; 10,000 decorative stones; 37,000 ocean sediments; 16,000 ore specimens
Departmental EMu administrator: registry management; report writing; training & documentation; EMu support & upgrade testing; communication
I am a geologist by training and joined the museum 19 years ago as a curator. I am now Collections Manager for the rock collections. My role is a somewhat divided one, as I also administer the part of EMu, and the data, relevant to the Mineralogy section. EMu was implemented department by department, rather than in one go across the whole of science. I managed the implementation for Mineralogy and have since continued to act as a focal point for the EMu administration tasks listed above.

‘Fingers in lots of pies’
Have been involved in cross-museum initiatives involving EMu.
I have ‘fingers in lots of pies’, an English phrase essentially meaning that I’m involved in many projects. A combination of being the first to engage with EMu and finding myself on many committees has meant that I have been able to see the potential of using EMu’s toolkit to manage information relating to the collections and collections management activities. I shall introduce you to some of these during my talk.

The problem

Core Information
89,000 records (73%)
Identification = 52,100
Provenance = 64,215
Acquisition = 38,700
Storage = 14,300

Numbers
Register volume        Acquisition records    Specimen records
1-5                    634                    19,283
1-5 (supplementary)    501 (490)              1,965 (1,927)
1-5 (merged)           1,124                  21,210
6-11                   1,832                  30,080
Geological Society     510                    9,852
TOTAL                  3,466                  63,107

The Problem
Data sits outside EMu: how to get it in?
Not as easy as it sounds, with many barriers:
Notes field used to hold data whose correct field was uncertain.
Sites data atomised to varying levels, depending on the experience of the digitiser.
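To make the atomisation problem concrete, the following is a minimal Python sketch of splitting a free-text site description into separate fields. The field names (country, region, locality), the comma separator, and the assumption that the text runs from most specific to broadest are all illustrative; they are not EMu's actual Sites schema.

```python
def atomise_locality(text, levels=("country", "region", "locality")):
    """Split a comma-separated site string into named levels.

    Assumes (hypothetically) that the text is ordered from the most
    specific place to the broadest, e.g. "Land's End, Cornwall, England".
    """
    parts = [p.strip() for p in text.split(",") if p.strip()]
    parts.reverse()  # broadest first, to line up with `levels`

    record = {level: "" for level in levels}
    for level, part in zip(levels, parts):
        record[level] = part

    # If there are more parts than levels, keep the surplus detail
    # together in the most specific field rather than dropping it.
    if len(parts) > len(levels):
        extra = parts[len(levels) - 1:]
        record[levels[-1]] = ", ".join(reversed(extra))
    return record


full = atomise_locality("Land's End, Cornwall, England")
partial = atomise_locality("England")
detailed = atomise_locality("Quarry 3, Land's End, Cornwall, England")
```

A record entered only as "England" ends up with empty region and locality fields, which is one way the differing levels of atomisation show up in the data.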

Acquisition Lot entry

The Problem
Data sits outside EMu: how to get it in?
Not as easy as it sounds, with many barriers:
Notes field used to hold data whose correct field was uncertain.
Sites data atomised to varying levels, depending on the experience of the digitiser.
Approx. 95% of specimens have a record in EMu with, at minimum, a registration number.
Once cleaned, how to update records without overwriting enhanced data?
Unfamiliarity with Access.
Short time periods for data cleansing.
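One way to approach updating records without overwriting enhanced data is to fill only those fields that are still blank in the existing record. A minimal Python sketch follows; the field names, the reg_number key and the sample values are hypothetical, and this is not the EMu import mechanism.

```python
def merge_without_overwrite(existing, cleaned):
    """Return a copy of `existing` with only its blank fields filled from `cleaned`."""
    merged = dict(existing)
    for field, value in cleaned.items():
        current = str(merged.get(field) or "").strip()
        if value and not current:
            merged[field] = value
    return merged


def build_updates(existing_records, cleaned_rows, key="reg_number"):
    """Match cleaned rows to existing records by `key` and list the changed ones."""
    by_key = {record[key]: record for record in existing_records}
    updates = []
    for row in cleaned_rows:
        current = by_key.get(row[key])
        if current is None:
            continue  # no matching record; handle separately
        merged = merge_without_overwrite(current, row)
        if merged != current:
            updates.append(merged)
    return updates


# Hypothetical sample data for illustration only.
existing = [
    {"reg_number": "R1001", "identification": "Granite (verified)", "locality": ""},
    {"reg_number": "R1002", "identification": "Basalt", "locality": "Skye"},
]
cleaned = [
    {"reg_number": "R1001", "identification": "granite", "locality": "Cornwall"},
    {"reg_number": "R1002", "identification": "basalt", "locality": "Isle of Skye"},
]
updates = build_updates(existing, cleaned)
```

Here only R1001 is updated, and only its empty locality is filled; the identification already held in the record is left alone.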

The Solution
Google Refine
OpenRefine (GitHub)
Personal web service
Runs in your browser

The demo

Benefits
Intuitive user interface
Powerful editing / data manipulation functions
Can’t make mistakes: endless undo!
Pick up where you left off: maintains history
Link to open-data sources to validate your data
Augment your data with free open data sources
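As an illustration of the kind of data manipulation meant here, the sketch below groups variant spellings of the same term under a normalised key, in the spirit of the key-collision "fingerprint" clustering that OpenRefine offers. It is a simplified Python approximation, not OpenRefine's own code, and the sample rock names are invented.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalised key: lowercase, ASCII, no punctuation, sorted unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; return only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Invented sample values for illustration only.
names = ["Granite, grey", "grey granite", "Grey  Granite.", "Basalt"]
clusters = cluster(names)
```

Each cluster is a candidate set of spellings to merge into one preferred term; a person still decides which merges are correct.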