Search and Annotation Tool for Oral History INTER-VIEWS
Henk van den Heuvel, Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, the Netherlands

Presentation transcript:

Search and Annotation Tool for Oral History INTER-VIEWS Henk van den Heuvel, Centre for Language and Speech Technology (CLST) Radboud University Nijmegen, the Netherlands

The Data
IPNV Interview corpus:
- Over 1,100 audio-recorded interviews with veterans
- Various missions of the Dutch Army
- Collected by Stef Scagliola for the Veterans Institute, Doorn, Netherlands
Selection of 250 interviews to be included for the tool:
- 120: World War II
- 100: Netherlands East Indies
- 30: New Guinea
- Interviews of about 2 to 2.5 hours
- Stored as 16 kHz, 16-bit WAV files
- Stored at DANS using Persistent Identifiers
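The audio format and interview lengths above imply a rough storage footprint. The sketch below works through that arithmetic. It assumes mono recordings and ignores the WAV header, since the slides state neither; the function name is illustrative only.

```python
# Rough storage estimate for uncompressed PCM audio.
# Assumption: mono recordings (the slides do not state the channel count).
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
CHANNELS = 1  # assumed, not stated in the source


def pcm_size_bytes(duration_hours: float) -> int:
    """Size in bytes of the raw PCM payload (WAV header ignored)."""
    seconds = duration_hours * 3600
    bytes_per_sample = BITS_PER_SAMPLE // 8
    return int(seconds * SAMPLE_RATE_HZ * bytes_per_sample * CHANNELS)


per_interview_mb = pcm_size_bytes(2.5) / 1_000_000
selection_gb = 250 * pcm_size_bytes(2.5) / 1_000_000_000
print(f"{per_interview_mb:.0f} MB per 2.5-hour interview")
print(f"{selection_gb:.0f} GB upper bound for 250 interviews")
```

Under these assumptions a 2.5-hour interview takes about 288 MB, so the selection of 250 interviews stays below roughly 72 GB.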

Background of the tool
A result of a range of projects in the years
With the CLST, the Veterans Institute & DANS
Veteran Tapes:
Enhanced publication:
Living Oral History Workbench:
- index 246 interviews with relevant search terms
- using Automatic Speech Recognition,
- annotate retrieved fragments in a Wiki-like environment

Background of the tool
INTER-VIEWS:
- All 246 interviews were persistently stored at DANS
- Persistent Identifiers are used to refer to the individual interviews
- The metadata components of the interviews were registered in CLARIN’s metadata component registry and linked to ISOcat categories (CMDI files)
- The same was done for the metadata of the Oral History Annotation Tool (e.g. tool name, owner, makers, description, input, output, availability)
Further data curation in CLARIN:
- Completing the CLARIN metadata for 950 IPNV interviews in total (including the 246 interviews curated before)

Objectives of the tool
1. Find relevant fragments in a large collection of audio data
2. Add annotations / comments to selected fragments
3. Make annotations public to other researchers (or not)
4. Verification of research results and claims in publications
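Objectives 2 and 3 can be pictured as a small data model: an annotation attached to a fragment, with a flag that decides whether other researchers see it. The sketch below is only an illustration; the class, field, and function names are hypothetical and do not describe the tool's actual storage.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    author: str          # hypothetical user name of a registered user
    fragment_id: str     # hypothetical identifier of the annotated fragment
    text: str
    public: bool = False  # assumed default: private until the author publishes


def visible_annotations(annotations, user):
    """Return annotations the given user may see: public ones plus their own."""
    return [a for a in annotations if a.public or a.author == user]


notes = [
    Annotation("anna", "frag-1", "Mentions the voyage out.", public=True),
    Annotation("anna", "frag-2", "Check date against summary."),
    Annotation("ben", "frag-1", "Compare with another interview."),
]
for note in visible_annotations(notes, "ben"):
    print(note.author, note.fragment_id, note.text)
```

Keeping the flag on the annotation itself lets an author publish or withhold each comment separately, which matches the "(or not)" in objective 3.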

Challenges: Disclosing the audio
Automatic Speech Recognition
- Speaker adaptation on 2.5 minutes per speaker
- Lexicon with keywords
- Language model
- Keywords from:
  – Thesaurus
  – Summaries
No exact transcripts, but effective keyword spotting!
[Diagram: ASR pipeline with the components Feature Extraction, Decoding Search, Acoustic Models, Lexicon, LM, and Result]
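The idea of keyword spotting on recognizer output, rather than relying on exact transcripts, can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes the recognizer yields words with start and end times, and the names `RecognizedWord`, `Hit`, `spot_keywords`, and the `context` padding are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedWord:
    word: str
    start: float  # seconds from the start of the interview
    end: float


@dataclass(frozen=True)
class Hit:
    keyword: str
    start: float
    end: float


def spot_keywords(words, keywords, context=5.0):
    """Return one padded fragment per recognized word that is in the keyword list."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for w in words:
        if w.word.lower() in wanted:
            # Pad the hit so the fragment carries some surrounding audio.
            hits.append(Hit(w.word.lower(), max(0.0, w.start - context), w.end + context))
    return hits


words = [
    RecognizedWord("we", 0.0, 0.2),
    RecognizedWord("landed", 0.2, 0.7),
    RecognizedWord("in", 0.7, 0.8),
    RecognizedWord("Batavia", 0.8, 1.5),
]
hits = spot_keywords(words, ["Batavia", "convoy"])
print(hits)
```

Because only keyword matches are used, recognition errors in the surrounding words do not prevent a fragment from being found.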

Features of the tool
- Retrieval of interviews and fragments of interviews based on Automatic Speech Recognition output
- Audio playback for retrieved fragments
- Metadata of all interviews
- Transcription of audio segments
- Annotations to fragments to be added by registered users
- A user administration to restrict the transcription & annotation facilities to registered users
- Adjustment of a fragment’s start and end point
- Advanced search options
- The tool is compliant with CLARIN-NL standards
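Adjusting a fragment's start and end point amounts to replacing two time codes while keeping them inside the interview. The sketch below shows one way to do that; the `Fragment` class, its fields, and the `adjust` function are hypothetical and are not taken from the tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fragment:
    interview_id: str  # hypothetical identifier, e.g. derived from the persistent identifier
    start: float       # seconds from the start of the interview
    end: float


def adjust(fragment, new_start, new_end, interview_duration):
    """Return a copy of the fragment with boundaries clamped to the interview."""
    start = max(0.0, new_start)
    end = min(interview_duration, new_end)
    if start >= end:
        raise ValueError("fragment must have positive duration")
    return replace(fragment, start=start, end=end)


original = Fragment("interview-001", 120.0, 150.0)
widened = adjust(original, 110.0, 165.0, interview_duration=7200.0)
print(widened)
```

Returning a new object instead of modifying the old one keeps the originally retrieved fragment available, which is convenient if a user wants to undo an adjustment.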

Publications
Heuvel, H. van den, Sanders, E., Rutten, R., Scagliola, S., Witkamp, P. (2012): An Oral History Annotation Tool for INTER-VIEWs. In: Proceedings LREC 2012, Istanbul, pp
Heuvel, H. van den, Oostdijk, N. (2016): Falling silent, lost for words... Tracing personal involvement in interviews with Dutch war veterans. In: Proceedings LREC 2016, May 2016, Portorož, Slovenia.

The tool

Desirable extensions of the tool
- The option to navigate through the audio of the full interview
- Extend the search facility to metadata, annotations, and summary texts
- Integrate the tool with the fragment fitter so as to make it suitable for Enhanced Publications
- Visualisation by a timeline to show the chronological order of inserted annotations
- Introduce a shopping cart in which a user can collect relevant fragments for his/her own use
- Employ the tool for other audio collections
NB: A newer tool for document retrieval for full interviews (600 interviews in total) is available at:

Login screen
Oral History Search and Annotation Tool, CLST, Nijmegen

Search by interview & time code

Search by word(s)

Hits: Fragment list; Information for selected fragment; Change time interval; Sound

Add annotation; Add transcription

Publish annotation; Add attachments; My annotations