Supporting the Digital Humanities Vienna, 19–20 October 2010 Findings and Outcomes of the musicSpace Project Speaker: David Bretherton


Supporting the Digital Humanities Vienna, 19–20 October 2010 Findings and Outcomes of the musicSpace Project Speaker: David Bretherton Co-Authors: Daniel A. Smith, mc schraefel, Joe

Presentation overview
• I am going to focus on one particular outcome of musicSpace: a successor project called ‘MusicNet’.
• I will concentrate on how musicSpace provided the motivation for MusicNet.

musicSpace
3-year project that concluded September

musicSpace’s goals
• To integrate access to leading online music resources using the mSpace faceted browser.
• To demonstrate that integration could support rapid exploration and knowledge building.
• To enable complex, multipart queries.

MusicNet
July 2010 – June

MusicNet’s goals
• Mint URIs for composers so that content providers can unambiguously identify them.
  – We hope to expand this to include all music-related entities.
• Publish alignment data to back-link into our data partners’ catalogues, and to other resources.
• Build a suite of tools to support the alignment and integration of new linked data resources.
• Build a demonstration service to illustrate the uses and benefits of the URIs and alignment data.

Contents
1. Brief overview of musicSpace
2. How musicSpace provided the motivation for ‘MusicNet’
3. MusicNet’s alignment tool

1. Brief overview of musicSpace

Problem

Centuries of material...

... is now increasingly digitised

Yet data is often ‘siloed’.
Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by:
  – Media type (text, image, audio, video)
  – Date of creation/publication
  – Subject
  – Language
  – Copyright holder
  – Ad hoc/insecure nature of project funding
Interoperability has generally not been given a high enough priority.

Using current online music data resources presents barriers at all stages of the research process:
• It is hard to speculatively browse around a subject area.
• ‘Real-world’ multipart queries are effectively intractable.

The barriers to tractability and their solutions
Barriers:
• Need to consult several sources ... and metadata from one source cannot guide searches of another source.
• Insufficient granularity of data and/or search options.
• Multipart queries have to be broken down and results collated manually.
Solutions:
• Integration
• Increase granularity
• Optimally interactive UI (‘mSpace’)

Solution

‘musicSpace’ is a faceted browser

Demonstration
‘What recordings of works by Cage exist, which performers have recorded a particular work by Cage, and what else by Cage have they recorded?’
Screencast 1:
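
The demonstration query above is a chain of facet selections. A minimal sketch of that chain over an in-memory list follows; it is an illustration only, not the mSpace implementation. The `Recording` type, the `select` and `other_works_by_same_performers` helpers, and the records themselves (in particular the placeholder performer names) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    composer: str
    work: str
    performer: str


# Hypothetical records; performer names are placeholders, not catalogue data.
RECORDINGS = [
    Recording("Cage, John", "Sonatas and Interludes", "Performer A"),
    Recording("Cage, John", "Sonatas and Interludes", "Performer B"),
    Recording("Cage, John", "In a Landscape", "Performer A"),
    Recording("Cage, John", "Dream", "Performer C"),
    Recording("Satie, Erik", "Gymnopédies", "Performer B"),
]


def select(records, **facets):
    """Keep the records whose fields match every chosen facet value."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in facets.items())]


def other_works_by_same_performers(records, composer, work):
    """For each performer of `work`, list what else by `composer` they recorded."""
    performers = sorted({r.performer
                         for r in select(records, composer=composer, work=work)})
    result = {}
    for performer in performers:
        works = {r.work
                 for r in select(records, composer=composer, performer=performer)}
        result[performer] = sorted(works - {work})
    return result


# Part 1: what recordings of works by Cage exist?
cage_recordings = select(RECORDINGS, composer="Cage, John")
# Parts 2 and 3: who recorded one particular work, and what else by Cage did they record?
answer = other_works_by_same_performers(
    RECORDINGS, "Cage, John", "Sonatas and Interludes")
print(answer)
```

Each part of the question narrows or pivots on the result of the previous part, which is why the query is laborious when the records sit in separate, unintegrated sources.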

2. How musicSpace provided the motivation for MusicNet

Data is not ‘clean’
Schubert
Schubert, Franz
Schubert, Franz Peter
Shu-po-tʻe,
Schubert
F. P. Schubert
Schubert,...
Schubert, F.
Schubert, F.
Schubert, Fr.
Schubert, Fr.
Schubert, Franciszek.
Schubert, Franç.
Schubert, François
Schubert, Franz P.
Schubert, Franz Peter
Schubert, Franz Peter,
Schubert, Franz Peter
Schubert, François,
Schubert.
Schubert
Shu-po-tʿe
Shubert, F. (Frant︠s︡)
Shubert, F. ‡q (Frant︠s︡),
Shubert, Frant︠s︡,
Shubert, Frant︠s︡
Shūberuto, F.
Shūberuto, Furantsu
Šubert, Franc
Šubertas, F. (Francas),
Šubertas, Francas Peteris,
Šubert, F.
Šubertas, F.
שוברט, פרנץ
シューベルト, F.,
シューベルト, フランツ
舒柏特, 弗朗茨
Schubert, François
Schubert, Franz Peter

Causes of dirty data
• Different naming conventions
  – e.g. ‘Bach, Johann Sebastian’ or ‘J. S. Bach’
• Inclusion of non-name data in the name field
  – e.g. ‘Schubert, Franz, Songs’ or ‘Allen, Betty (Teresa)’
• Different languages (and alphabets)
• User input errors
  – e.g. ‘Bach, Johan Sebastien’

Dirty data degrades the user experience
Searching for compositions by the composer Franz Schubert (1797–1828)...
Screencast 2:

3. MusicNet’s alignment tool

Prototype 1 (musicSpace era)

Used Alignment API & Google Docs
We used Alignment API to compare the names as strings, using WordNet to enable word stemming, synonym support, etc.
• Alignment API produces a similarity measure for each possible match.
• We planned to set a threshold for automatic approval.
• Matches below that threshold would be sent to a Google Docs spreadsheet for expert review.

Shortcoming 1: no threshold
It was not possible to identify a threshold for automatic approval.
• Terms are judged to be similar if they differ by, say, just one character, but a difference of one character is usually significant in a name.
• Names are proper nouns, and so are unsuited to WordNet’s assumptions about misspelling.

Shortcoming 1: no threshold
False matches with high similarity measures:
True matches with low similarity measures:
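
The threshold problem can be reproduced with any generic string-similarity measure. The sketch below assumes Python's standard `difflib` as a stand-in; it is not Alignment API and does not reproduce its measure or its use of WordNet. The name pairs are illustrative choices, not examples taken from the project data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio, a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two different composers whose names differ by only a few characters.
false_match = similarity("Bach, Johann Sebastian", "Bach, Johann Christian")

# The same composer under two naming conventions.
true_match = similarity("Bach, Johann Sebastian", "J. S. Bach")

print(f"different people: {false_match:.2f}")
print(f"same person:      {true_match:.2f}")
```

Here the pair of different people scores higher than the pair of names for the same person, so no single cut-off can approve the true match while rejecting the false one.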

Shortcoming 2: no context
• Alignment API compares names as strings, and the system strips the names of their context (i.e. additional metadata).
  – Lack of context meant the musicologist had no way to verify the match.
This was a significant flaw: automation had failed, so we were relying on manual review.

Prototype 2 (building a custom tool for MusicNet)

Lessons learned
• From Prototype 1:
  – A completely automated solution is out of the question (for the moment...).
  – We needed a custom tool with a human-friendly UI (we also wanted keyboard shortcuts for speed).
  – We needed access to additional metadata (i.e. context), so that matches can be researched by the reviewer.
• From experience with faceted browsers:
  – Alphabetically sorted columns enable one to spot synonymous names at a glance.
  – Normally sources give names surname first; duplication arises from the different representation of given names.
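
The point about alphabetically sorted columns is easy to see in a small sketch. The list below is illustrative, not project data, and the case-insensitive sort key is an assumption about how such a column might be ordered.

```python
names = [
    "Schubert, Franz Peter",
    "Bach, J. S.",
    "Schubert, F.",
    "Cage, John",
    "Bach, Johann Sebastian",
    "Schubert, Franz",
]

# casefold() makes the ordering case-insensitive.
sorted_names = sorted(names, key=str.casefold)
for name in sorted_names:
    print(name)
```

Because the surname comes first, variants that differ only in how the given names are written end up on adjacent rows, where a reviewer can spot them.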

Alignment process
1. Data*
2. Algorithm compares hash of alpha-only, lower-case (l.c.) version of name
  – Suggested groups: user verified* or rejected*
  – No groups suggested: manual grouping (research*)
3. Synonym groups
4. URIs, alternative names, back links*
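
The grouping step (comparing a hash of an alphabetic-only, lower-case version of each name) might look roughly like the sketch below. The function names, the choice of SHA-1, and the use of Python's Unicode-aware `str.isalpha` are assumptions; the slides do not say which hash or which definition of "alpha-only" the MusicNet tool uses.

```python
import hashlib
from collections import defaultdict


def name_key(name: str) -> str:
    """Hash of the name reduced to lower-case alphabetic characters only."""
    reduced = "".join(ch for ch in name if ch.isalpha()).lower()
    # SHA-1 is an arbitrary choice for this sketch.
    return hashlib.sha1(reduced.encode("utf-8")).hexdigest()


def suggest_groups(names):
    """Return groups of two or more distinct name strings that share a key."""
    buckets = defaultdict(set)
    for name in names:
        buckets[name_key(name)].add(name)
    return [sorted(group) for group in buckets.values() if len(group) > 1]


names = [
    "Schubert, Franz Peter",
    "Schubert, Franz Peter,",
    "schubert franz peter",
    "Schubert, F.",
    "Schubert, F",
    "Bach, Johann Sebastian",
    "Bach, Johan Sebastien",
]
groups = suggest_groups(names)
print(groups)
```

Names that differ only in punctuation, spacing or case fall into the same suggested group. The misspelt ‘Bach, Johan Sebastien’ does not match ‘Bach, Johann Sebastian’, which is the kind of case left to manual grouping by the reviewer.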

UI of Prototype 2

Prototype 2 demo
Screencast 3:

Indicative use cases
• Composer URIs:
  – Music(ological) content providers
  – Basis of a (re)search portal
• Alignment tool:
  – Aligning databases with no authorities
  – Or where authorities are inconsistent

Thank you for listening!