All Hands Meeting 2010, Cardiff The MusicNet Composer URI Project Today’s speaker: David Bretherton

Presentation transcript:

All Hands Meeting 2010, Cardiff
The MusicNet Composer URI Project
Today’s speaker: David Bretherton
Co-Authors: Daniel A. Smith, Joe Lambert, mc schraefel

Contents
1. Project Outline
2. Motivation
3. Towards a Solution
4. Up-Take

1. Project Outline

Key facts
• MusicNet runs July 2010 – June
• Funded by the ‘JISC Expose’ scheme.
• In partnership with leading music(ological) data providers.
• MusicNet is a spin-off from musicSpace.

Deliverables
• Mint URIs for composers so that content providers can unambiguously identify them.
  – We hope to expand this work to include all music-related persons and musical works.
• Publish alignment data to back-link into our data partners’ catalogues (and to other resources).
• Build a suite of tools to support the alignment and integration of new linked data resources.
• Build a demonstration service to illustrate the uses and benefits of the URIs.

2. Motivation
Addressing Non-Alignment

Non-alignment
The impetus for MusicNet was musicSpace. musicSpace:
• Integrated access to leading online music resources using the mSpace faceted browser.
• Demonstrated that integration could support rapid exploration & knowledge building.
• Enabled complex, multipart queries.

musicSpace demo
‘What recordings of works by Cage exist, which performers have recorded a particular work by Cage, and what else by Cage have they recorded?’
Screencast 1:

But partners’ data isn’t ‘clean’...
Schubert
Schubert, Franz
Schubert, Franz Peter
Shu-po-tʻe, ‡d
Schubert ‡d
F. P. Schubert
Schubert,... ‡d
Schubert, F.
Schubert, F. ‡d
Schubert, Fr.
Schubert, Fr. ‡d
Schubert, Franciszek.
Schubert, Franç. ‡d
Schubert, François ‡d
Schubert, Franz P. ‡d
Schubert, Franz Peter
Schubert, Franz Peter, ‡d
Schubert, Franz Peter ‡d
Schubert, François, ‡d
Schubert.
Schubert ‡d
Shu-po-tʿe ‡d
Shubert, F. (Frant︠s︡) ‡d
Shubert, F. ‡q (Frant︠s︡), ‡d
Shubert, Frant︠s︡, ‡d
Shubert, Frant︠s︡ ‡d
Shūberuto, F.
Shūberuto, Furantsu ‡d
Šubert, Franc ‡d
Šubertas, F. (Francas), ‡d
Šubertas, Francas Peteris, ‡d
Šubert, F.
Šubertas, F. ‡d
שוברט, פרנץ
シューベルト, F.,
シューベルト, フランツ ‡d
舒柏特, 弗朗茨
Schubert, François ‡d
Schubert, Franz Peter ‡d

Causes of dirty data
• Different naming conventions;
  – e.g. ‘Bach, Johann Sebastian’ or ‘J. S. Bach’
• Inclusion of non-name data in name field;
  – e.g. ‘Schubert, Franz, Songs’, or ‘Allen, Betty (Teresa)’
• Different languages (and alphabets);
• User input errors.
  – e.g. ‘Bach, Johan Sebastien’

Dirty data degrades the user experience
Searching for compositions by the composer Franz Schubert (1797–1828)...
Screencast 2

3. Towards a Solution
Assisted Synonymous Entity Alignment

Prototype 1 (musicSpace era)

Used Alignment API & Google Docs
We used Alignment API to compare the names as strings, using WordNet to enable word stemming, synonym support, etc.
• Alignment API produces a similarity measure for each possible match.
• We planned to set a threshold for automatic approval.
• Matches below that threshold would be sent to a Google Docs spreadsheet for expert review.
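The planned threshold workflow can be sketched roughly as follows. This is an illustration only: Python's `difflib.SequenceMatcher` stands in for Alignment API's similarity measure, and the names `AUTO_APPROVE_THRESHOLD`, `similarity` and `triage`, as well as the threshold value, are assumptions made for the example rather than anything taken from the project.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off; the project never settled on a workable value.
AUTO_APPROVE_THRESHOLD = 0.95


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] (not Alignment API's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def triage(candidate_pairs, threshold=AUTO_APPROVE_THRESHOLD):
    """Split candidate matches into auto-approved and expert-review lists."""
    approved, for_review = [], []
    for a, b in candidate_pairs:
        score = similarity(a, b)
        if score >= threshold:
            approved.append((a, b, score))
        else:
            # In the plan described above, these would go to a spreadsheet.
            for_review.append((a, b, score))
    return approved, for_review


pairs = [
    ("Schubert, Franz", "Schubert, Franz"),
    ("Schubert, Franz", "Shūberuto, Furantsu"),
]
approved, for_review = triage(pairs)
```

Here the identical pair is approved automatically and the transliterated pair falls through to manual review.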

Shortcoming 1: no threshold
It was not possible to identify a threshold for automatic approval.
• Terms are judged to be similar if they differ by, say, just one character, but a difference of one character is significant in a name.
• Names are proper nouns, and so are unsuited to WordNet’s assumptions about misspelling.

Shortcoming 1: no threshold
• False matches with high similarity measures:
• True matches with low similarity measures:
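The threshold problem can be reproduced with any generic string similarity measure. The sketch below uses Python's `difflib` as a stand-in (it is not Alignment API), and the two name pairs are illustrative choices, not the examples shown on the original slide: two different composers whose names differ only in the middle name, and two renderings of the same composer in different transliterations.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Generic character-based similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


# Different people (father and son), yet the strings are nearly identical.
false_match = ("Bach, Johann Sebastian", "Bach, Johann Christian")

# The same person, yet the strings share relatively few characters.
true_match = ("Schubert, Franz", "Shūberuto, Furantsu")

false_score = similarity(*false_match)
true_score = similarity(*true_match)

print(f"false match: {false_score:.2f}")
print(f"true match:  {true_score:.2f}")
```

Because the false match scores higher than the true match, no single cut-off can approve the second pair without also approving the first.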

Shortcoming 2: no context
• Alignment API compares names as strings, and the system strips the names of their context (i.e. additional metadata).
  – Lack of context meant the musicologist had no way to verify the match.
This was a significant flaw: automation had failed, so we were relying on manual review.

Prototype 2 (building a custom tool for MusicNet)

Lessons learned
• From Prototype 1:
  – A completely automated solution is out of the question (for the moment...).
  – We needed a custom tool with a human-friendly UI (we also wanted keyboard shortcuts for speed).
  – We needed access to additional metadata (i.e. context), so matches can be researched by the reviewer.
• From experience with faceted browsers:
  – Alphabetically sorted columns enable one to spot synonymous names at a glance.
  – Normally sources give names surname first; duplication arises from the different representation of given names.

Alignment process (flow diagram)
• Data* → algorithm compares hash of alpha-only, lower-case version of name.
• Suggested groups → user verified* or rejected*.
• No groups suggested → manual grouping (research*).
• Synonym groups → URIs, alternative names, back links*.
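The grouping step (comparing a hash of the alpha-only, lower-case version of each name) might look something like the sketch below. The function names, the choice of SHA-1 and the use of `str.isalpha` to decide what counts as a letter are all assumptions for illustration; the transcript does not say how the project implemented this step.

```python
import hashlib
from collections import defaultdict


def name_key(name: str) -> str:
    """Hash of the lower-cased, letters-only form of a name."""
    alpha_only = "".join(ch for ch in name.lower() if ch.isalpha())
    return hashlib.sha1(alpha_only.encode("utf-8")).hexdigest()


def suggest_groups(names):
    """Bucket names by key; buckets with several names become suggestions."""
    buckets = defaultdict(list)
    for name in names:
        buckets[name_key(name)].append(name)
    suggested = [group for group in buckets.values() if len(group) > 1]
    ungrouped = [group[0] for group in buckets.values() if len(group) == 1]
    return suggested, ungrouped


names = [
    "Schubert, Franz",
    "Schubert, Franz.",
    "schubert franz",
    "Schubert, F.",
    "Bach, Johann Sebastian",
]
suggested, ungrouped = suggest_groups(names)
```

Variants that differ only in case, spacing or punctuation land in one suggested group for the reviewer to verify or reject, while ‘Schubert, F.’ hashes differently and is left for manual grouping.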

UI of Prototype 2

Prototype 2 demo
Screencast 3:

4. Up-Take

Indicative use cases
• Composer URIs:
  – Music(ological) content providers
  – Basis of a research portal
• Alignment tool:
  – Aligning databases with no authorities;
  – Or where authorities are inconsistent.

Thank you for listening!
Acknowledgments
• JISC
• The British Library
• Copac
• Grove Music Online (OUP)
• RISM UK and Ireland