MALACH Multilingual Access to Large spoken ArCHives Survivors of the Shoah Visual History Foundation Human Language Technologies IBM T. J. Watson Research.


1 MALACH: Multilingual Access to Large spoken ArCHives
- Survivors of the Shoah Visual History Foundation
- Human Language Technologies, IBM T. J. Watson Research Center
- Center for Language and Speech Processing, Johns Hopkins University
- Charles University, Prague / University of West Bohemia
- HCIL and College of Information Studies, University of Maryland
  - UMIACS/HCIL: Douglas Oard, David Doermann
  - CLIS: Dagobert Soergel, Doug Oard, Bruce Dearstyne

2 MALACH
NSF Information Technology Research project, 5 years
Goals:
- Facilitate access to spoken collections
- Advance the state of the art in
  - Automatic speech recognition (ASR), especially of spontaneous speech
  - Topic segmentation in speech
  - Automatic summarization
  - Automatic cataloging, retrieval algorithms, and search interfaces

3 Survivors of the Shoah Visual History Foundation Digital Archive
Established 1994 by Steven Spielberg after filming Schindler’s List
- 52,000 Nazi Holocaust survivors, liberators & witnesses from 57 countries
- 116,000 hours of speech in 32 languages (60 years of listening)
- In the process of being manually cataloged
World’s largest coherent archive of digitized videotaped oral history
The test bed
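The slide does not say how "60 years of listening" follows from 116,000 hours. A rough check, assuming a listener who works about 40 hours a week for 48 weeks a year (these working-time figures are assumptions, not from the source), suggests the figure refers to full-time listening rather than round-the-clock playback:

```python
ARCHIVE_HOURS = 116_000  # total hours of speech, from the slide


def listening_years(total_hours: float, hours_per_week: float, weeks_per_year: float) -> float:
    """Years needed to listen to total_hours at the given weekly schedule."""
    return total_hours / (hours_per_week * weeks_per_year)


# Assumed full-time schedule: 40 hours/week, 48 weeks/year (not stated on the slide).
full_time_years = listening_years(ARCHIVE_HOURS, 40, 48)

# Continuous playback, 24 hours a day, 365 days a year, for comparison.
continuous_years = ARCHIVE_HOURS / (24 * 365)

print(f"full-time listening: {full_time_years:.1f} years")
print(f"continuous playback: {continuous_years:.1f} years")
```

Under these assumptions the full-time estimate comes out at roughly 60 years, while continuous playback would take a little over 13 years.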

4 Video here

5 MALACH Architecture
- Speech recognition
- Summarization
- Categorization
- Manual cataloging
- Information retrieval algorithms
- User interface
- Metadata store
- Thesaurus and lexical databases
- Person, place, event databases
- User requirements

6 User requirements analysis methods
Discount requirements analysis
- Consult experts and literature on potential users and the nature of their work
- Talk to curators about intended use of collection
- Informed intuition
Request analysis
- 280 “Advance Access” requests
- Coded by discipline, access points needed, pieces of information required, etc.

7 User requirements analysis results
A wide variety of users and uses
Arts, humanities, and social sciences
- History
- Social sciences
- Literature and linguistics
- Publishing and journalism
- Material and non-material culture
Education
Science
Psychology
Law enforcement

8 User requirements analysis results
For history and education: importance of context
Interview mentions: Person, Place, Event, Time
Context links:
- More info on this person
- More info on this place
- More info on related policy
- More info on related event
- More info on this event
- More info on this time
- More info on event at time
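The relationship between interview mentions and context links could be sketched as a small lookup structure. This is only an illustration: the `Mention` class, the `context_options` function, and the grouping of links under each mention type are assumptions, since the flattened slide does not show which link belongs to which type.

```python
from dataclasses import dataclass

# Assumed grouping of the slide's context links by mention type.
# The link labels come from the slide; the grouping itself is a guess.
CONTEXT_LINKS = {
    "person": ["More info on this person"],
    "place": ["More info on this place"],
    "event": [
        "More info on this event",
        "More info on related event",
        "More info on related policy",
    ],
    "time": ["More info on this time", "More info on event at time"],
}


@dataclass(frozen=True)
class Mention:
    """A hypothetical record for something mentioned in an interview."""
    kind: str              # "person", "place", "event", or "time"
    text: str              # the name or phrase as mentioned
    offset_seconds: float  # position of the mention in the video


def context_options(mention: Mention) -> list[str]:
    """Return the context links an interface could offer for a mention."""
    return [f"{label}: {mention.text}" for label in CONTEXT_LINKS.get(mention.kind, [])]


print(context_options(Mention("place", "Prague", 120.0)))
```

An unknown mention type simply yields no links, so the interface can still display the mention without offering context.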

9 Interface sketch
[Figure: screen layout with a video display area, transcript area, query box, history, scratchpad, ConnectionView, and display areas for context information; panel labels include Question, Person, Place, Event, Subject, and Time.]

10 Interface ideas
- In panes on the right, use colors to distinguish them and a task bar to select from the open ones; as many open as the user wishes (needs a drop-down or drop-up from the task bar)
- In any of the panes on the right, names, places, etc. are clickable
- Scratch pad functionalities from Anita’s dissertation, esp. presentation outline: can link to headings, insert text at headings
  - Can drag and drop links to items or actual items
  - For example, could enter a transcript of a portion of the video
- ConnectViews designed by user
- Time-stamped to video location in video window
- Support collaboration among users, possibly by putting user-entered info, such as transcript pieces, into a public database
  - Could link to that database from the video location viewed
  - Need to make availability known to users
- Time line window: interview in parallel with general history


