Music Linked Data Workshop 12 May 2011 JISC, London MusicNet: Aligning Musicology’s Metadata David Bretherton (Music), Daniel Alexander Smith, Joe Lambert.

Music Linked Data Workshop 12 May 2011 JISC, London MusicNet: Aligning Musicology’s Metadata David Bretherton (Music), Daniel Alexander Smith, Joe Lambert and mc schraefel (Electronics and Computer Science)

David Bretherton

musicSpace, the precursor to MusicNet

Problem

Digitised data is often ‘siloed’.
Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by:
– Media type (text, image, audio, video)
– Date of creation/publication
– Subject
– Language
– Copyright holder
– Ad hoc/insecure nature of project funding

Interoperability has generally not been given a high enough priority. And, because the datasets are ‘mature’, the data isn’t Linked Data.

Solution

‘musicSpace’ is a faceted browser

Demonstration
‘What recordings of works by Cage exist, which performers have recorded a particular work by Cage, and what else by Cage have they recorded?’
Screencast 1:

How musicSpace provided the motivation for MusicNet

Problem: you can align metadata fields, but this doesn’t align the data in those fields

Name forms found for one composer:
Schubert
Schubert.
Schubert,...
Schubert, F.
Schubert, Fr.
Schubert, Franciszek.
Schubert, Franç.
Schubert, François
Schubert, François,
Schubert, Franz
Schubert, Franz P.
Schubert, Franz Peter
Schubert, Franz Peter,
F. P. Schubert
Shu-po-tʻe,
Shu-po-tʿe
Shubert, F.
Shubert, F. (Frant︠s︡)
Shubert, F. (Frant︠s︡),
Shubert, Frant︠s︡
Shubert, Frant︠s︡,
Shūberuto, F.
Shūberuto, Furantsu
Šubert, F.
Šubert, Franc
Šubertas, F.
Šubertas, F. (Francas),
Šubertas, Francas Peteris,
שוברט, פרנץ
シューベルト, F.,
シューベルト, フランツ
舒柏特, 弗朗茨

Causes of ‘dirty’ data (for names)
• Different naming conventions;
  – e.g. ‘Bach, Johann Sebastian’ or ‘J. S. Bach’
• Inclusion of non-name data in name field;
  – e.g. ‘Schubert, Franz, Songs’, or ‘Allen, Betty (Teresa)’
• Different languages (and alphabets);
• User input errors.
  – e.g. ‘Bach, Johhan Sebastien’

Dirty data degrades the user experience
Searching for compositions by the composer Franz Schubert (1797–1828)...
Screencast 2:

MusicNet’s alignment tool

Prototype 1 (musicSpace era)

Used Alignment API & Google Docs
We used the Alignment API to compare the names as strings, using WordNet to enable word stemming, synonym support, etc.
• Alignment API produces a similarity measure for each possible match.
• We planned to set a threshold for automatic approval.
• Matches below that threshold would be sent to a Google Docs spreadsheet for expert review.

Shortcoming: no threshold
• False matches with high similarity measures
• True matches with low similarity measures
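The shortcoming can be illustrated with a small sketch. It is only an illustration: it uses Python's standard `difflib` as a stand-in for the Alignment API (which the slides do not show), and the name pairs, the `route` helper and the threshold value are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 string similarity (difflib stand-in, not the Alignment API)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route(a: str, b: str, threshold: float) -> str:
    """Auto-approve pairs at or above the threshold; send the rest to review."""
    return "auto-approve" if similarity(a, b) >= threshold else "expert review"


# Hypothetical pairs chosen to illustrate the problem.
false_match = ("Bach, Johann Sebastian", "Bach, Johann Christian")  # different composers
true_match = ("Schubert, Franz Peter", "F. P. Schubert")            # same composer

THRESHOLD = 0.7  # arbitrary illustrative value

for pair in (false_match, true_match):
    print(pair, round(similarity(*pair), 2), route(*pair, THRESHOLD))
```

With these inputs the two different Bachs score higher than the two forms of Schubert's name, so any threshold that auto-approves the first pair cannot also catch the second.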

Prototype 2 (building a custom tool for MusicNet)

Design considerations
• From Prototype 1:
  – A completely automated solution is out of the question (for the moment...).
  – We needed a custom tool with a human-friendly UI (we also wanted keyboard shortcuts for speed).
  – We needed access to additional metadata (i.e. context), so matches can be researched by the reviewer.
• From experience with faceted browsers:
  – Alphabetically sorted columns enable one to spot synonymous names at a glance.
• Normally sources give names surname first; duplication arises from the different representation of given names.

Alignment process
• Data*
• Algorithm compares hash of alpha-only l.c. (lower-case) version of name
  – Suggested groups: user verified* or rejected*
  – No groups suggested: manual grouping (research*)
• Synonym groups
• URIs, alternative names, back links*
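The grouping step can be sketched as follows. This is a minimal sketch, not the project's code: the slide says only that a hash of the alphabetic-only, lower-case name is compared, so the choice of SHA-1 and the helper names `name_key` and `suggest_groups` are assumptions.

```python
import hashlib
from collections import defaultdict


def name_key(name: str) -> str:
    """Hash the alphabetic-only, lower-case form of a name (SHA-1 is an assumption)."""
    alpha_only = "".join(ch for ch in name if ch.isalpha()).lower()
    return hashlib.sha1(alpha_only.encode("utf-8")).hexdigest()


def suggest_groups(names):
    """Split names into suggested groups (shared key) and ungrouped names."""
    buckets = defaultdict(list)
    for name in names:
        buckets[name_key(name)].append(name)
    suggested = [group for group in buckets.values() if len(group) > 1]
    ungrouped = [group[0] for group in buckets.values() if len(group) == 1]
    return suggested, ungrouped


# Hypothetical input names.
names = [
    "Schubert, Franz",
    "Schubert, Franz.",
    "SCHUBERT, Franz",
    "Schubert, Franz Peter",
    "Bach, Johann Sebastian",
]
suggested, ungrouped = suggest_groups(names)
print(suggested)   # groups offered to the user to verify or reject
print(ungrouped)   # names left for manual grouping
```

Names that differ only in punctuation, spacing or case share a key and are suggested as a group. Variants such as ‘Schubert, Franz Peter’ do not, which is why manual grouping remains part of the process.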

UI of Prototype 2

Prototype 2 demo
Screencast 3:

Daniel Alexander Smith

Linked Data
• URI for everything
• e.g. Beethoven is:
  – 07e07a7f9db8aed7c72d2ebeab2#id
  – eethoven
  – 92-a621-4f c5373b7eac9#artist

Contribution
• MusicNet provides links between composers in multiple scholarly repositories
• We also link to MusicBrainz and BBC /music
• This can be fed back into projects like musicSpace where disambiguation is a problem


MusicNet Published Data
• Links between multiple URIs
• Representations from each source
• Machine-readable, standardised to build applications over this data
• Human searchable and usable too
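As a rough illustration of ‘links between multiple URIs’ in machine-readable form, the sketch below writes N-Triples statements. The slides do not name the linking property, so `owl:sameAs` is an assumption here, and the example.org URIs and the `same_as_triples` helper are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed linking property


def same_as_triples(uris):
    """Link the first URI to each of the others, one N-Triples line per link."""
    first, *rest = uris
    return [f"<{first}> <{OWL_SAME_AS}> <{other}> ." for other in rest]


# Hypothetical URIs standing in for one composer in three sources.
beethoven = [
    "http://example.org/musicnet/beethoven#id",
    "http://example.org/source-a/beethoven",
    "http://example.org/source-b/beethoven#artist",
]

for line in same_as_triples(beethoven):
    print(line)
```

Each printed line is one statement that an application can load to treat the three identifiers as the same composer.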


Provenance
• Retains source of information
• e.g. that Grove says “Schubert, Franz (Peter)” and the British Library says “Schubert, Franz” and “Schubert”

Provenance
• When they don’t exist already, MusicNet provides individual URIs for a composer from each source, e.g.:
  – f11c7d625d9aabb27a6174#blcollection
• Then links back to search URLs, e.g.:
  – b&request=Schubert%2C+Franz&find_code=WNA
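A minimal sketch of how per-source name forms and a back link might be represented. The name forms and the `request`/`find_code=WNA` parameters come from the slides; the base URL, the dictionary layout and the `back_link` helper are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real catalogue address is truncated in the slides.
SEARCH_BASE = "http://catalogue.example.org/search"

# Name forms per source, as quoted on the Provenance slide.
labels_by_source = {
    "Grove": ["Schubert, Franz (Peter)"],
    "British Library": ["Schubert, Franz", "Schubert"],
}


def back_link(name: str, find_code: str = "WNA") -> str:
    """Build a search URL that leads back to the source catalogue."""
    query = urlencode({"request": name, "find_code": find_code})
    return f"{SEARCH_BASE}?{query}"


for source, labels in labels_by_source.items():
    for label in labels:
        print(source, "|", label)

print(back_link("Schubert, Franz"))
```

Keeping the labels keyed by source preserves who said what, and the generated query string has the same shape as the search URL fragment shown on the slide.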


Links from BBC /music
• Harvested links from BBC to:
  – DBPedia
  – New York Times
  – IMDB
  – PBS
  – etc.

Thank you for listening!