MALACH: Multilingual Access to Large spoken ArCHives

Presentation transcript:

MALACH: Multilingual Access to Large spoken ArCHives
- Survivors of the Shoah Visual History Foundation
- Human Language Technologies, IBM T. J. Watson Research Center
- Center for Language and Speech Processing, Johns Hopkins University
- Charles University, Prague / University of West Bohemia
- HCIL and College of Information Studies, University of Maryland
  - UMIACS/HCIL: Douglas Oard, David Doermann
  - CLIS: Dagobert Soergel, Doug Oard, Bruce Dearstyne

MALACH
- NSF Information Technology Research project, 5 years
- Goals:
  - Facilitate access to spoken collections
  - Advance the state of the art in:
    - Automatic speech recognition (ASR), especially of spontaneous speech
    - Topic segmentation in speech
    - Automatic summarization
    - Automatic cataloging, retrieval algorithms, and search interfaces

Survivors of the Shoah Visual History Foundation Digital Archive
- Established 1994 by Steven Spielberg after filming Schindler's List
  - 52,000 Nazi Holocaust survivors, liberators & witnesses from 57 countries
  - 116,000 hours of speech in 32 languages (60 years of listening)
  - In the process of being manually cataloged
- World's largest coherent archive of digitized videotaped oral history
- The test bed
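The "60 years of listening" figure can be sanity-checked with a little arithmetic. The slide does not say how the figure was derived, so the sketch below only shows what 116,000 hours means under two readings: continuous round-the-clock playback, and the yearly listening load implied by spreading the archive over 60 years. All names are illustrative.

```python
# Figures taken from the slide.
ARCHIVE_HOURS = 116_000
STATED_YEARS = 60

# Assumption: a non-leap year of round-the-clock listening.
HOURS_PER_YEAR_CONTINUOUS = 24 * 365


def continuous_years(hours: float) -> float:
    """Years needed to play the archive without ever stopping."""
    return hours / HOURS_PER_YEAR_CONTINUOUS


def implied_hours_per_year(hours: float, years: float) -> float:
    """Listening hours per year implied by a stated number of years."""
    return hours / years


if __name__ == "__main__":
    print(f"Continuous playback: {continuous_years(ARCHIVE_HOURS):.1f} years")
    per_year = implied_hours_per_year(ARCHIVE_HOURS, STATED_YEARS)
    print(f"Implied load over {STATED_YEARS} years: {per_year:.0f} hours/year")
```

The implied yearly load is close to a full-time working year, which suggests (but does not confirm) that the slide counts working hours rather than continuous playback.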

Video here

MALACH Architecture
- Speech recognition
- Summarization
- Categorization
- Manual cataloging
- Information retrieval algorithms
- User interface
- Metadata store
- Thesaurus and lexical databases
- Person, place, event databases
- User requirements

User requirements analysis methods
- Discount requirements analysis
  - Consult experts and literature on potential users and the nature of their work
  - Talk to curators about intended use of collection
  - Informed intuition
- Request analysis
  - 280 "Advance Access" requests
  - Coded by discipline, access points needed, pieces of information required, etc.

User requirements analysis results
- A wide variety of users and uses
- Arts, humanities, and social sciences
  - History
  - Social sciences
  - Literature and linguistics
  - Publishing and journalism
  - Material and non-material culture
- Education
- Science
- Psychology
- Law enforcement

User requirements analysis results
- For history and education: importance of context
- Interview mentions:
  - Person
  - Place
  - Event
  - Time
- Context the user may want:
  - More info on this person
  - More info on this place
  - More info on related policy
  - More info on related event
  - More info on this event
  - More info on this time
  - More info on event at time

Interface sketch
[Figure: screen layout. Panels: video display area, query box, transcript area, Scratchpad, History, ConnectionView, and display areas for context information. Facet labels shown in the sketch: Question, Subject, Person, Place, Event, Time.]

Interface ideas
- In panes on the right, use colors to distinguish them and a task bar to select among open ones; as many open as the user wishes (needs a drop-down or drop-up from the task bar)
- In any of the panes on the right, names, places, etc. are clickable
- Scratch pad functionalities from Anita's dissertation, esp.
  - Presentation outline; can link to headings, insert text at headings
  - Can drag and drop links to items or actual items
  - For example, could enter a transcript of a portion of the video
- ConnectViews designed by user
- Time-stamped to video location in video window
- Support collaboration among users; possibly put user-entered info, such as transcript pieces, into a public database. Could link to that database from the video location viewed. Need to make availability known to users
- Time line window: interview in parallel with general history
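One way to picture the time-stamped scratch pad idea is a small store of notes keyed by video offset, with a flag for entries the user chooses to share. This is a minimal sketch, not the project's design: the class names, fields, and the `public_only` filter are all hypothetical, and a real shared database would sit behind this rather than an in-memory list.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Note:
    """A user-entered item tied to a location in the video (hypothetical model)."""
    offset_s: float                                  # seconds from start of interview
    text: str = field(compare=False)                 # e.g. a transcript fragment
    author: str = field(compare=False, default="anonymous")
    public: bool = field(compare=False, default=False)  # shared with other users?


class Scratchpad:
    """Keeps notes sorted by video offset so they can follow playback."""

    def __init__(self) -> None:
        self._notes: list[Note] = []

    def add(self, note: Note) -> None:
        # insort keeps the list ordered by offset_s (the only compared field).
        insort(self._notes, note)

    def notes_between(self, start_s: float, end_s: float,
                      public_only: bool = False) -> list[Note]:
        """Return notes whose offset falls in [start_s, end_s]."""
        return [
            n for n in self._notes
            if start_s <= n.offset_s <= end_s and (n.public or not public_only)
        ]


if __name__ == "__main__":
    pad = Scratchpad()
    pad.add(Note(120.0, "Private reminder about this passage"))
    pad.add(Note(30.0, "Transcript fragment entered by user", public=True))
    for note in pad.notes_between(0, 200):
        print(f"{note.offset_s:>6.1f}s  {note.text}")
```

Keeping notes ordered by offset makes it cheap to show whatever has been attached near the current playback position, which is what linking "from video location viewed" would need.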