November 15, 2003. CLIS Alumni Chapter. Talking to the Future: The MALACH Project. Douglas W. Oard, Joanne Archer, Ammie Feijoo, Xiaoli Huang. College of Information.

1 November 15, 2003. CLIS Alumni Chapter. Talking to the Future: The MALACH Project. Douglas W. Oard, Joanne Archer, Ammie Feijoo, Xiaoli Huang. College of Information Studies

2 Telling Our Stories

3 Shoah Foundation’s Collection Enormous scale –116,000 hours; 52,000 interviews; 180 TB Grand challenges –32 languages, accents, elderly, emotional, … Accessible –$100 million collection and digitization investment Annotated –10,000 hours (~200,000 segments) fully described Users –A department working full time on dissemination
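The scale figures on this slide imply a few averages that are easy to check. The following is a minimal sketch of that arithmetic, not something shown in the talk; the variable names are illustrative, and the storage estimate assumes 1 TB = 1000 GB.

```python
# Figures quoted on the slide.
total_hours = 116_000          # hours of recorded testimony
interviews = 52_000            # number of interviews
storage_tb = 180               # total digitized size in terabytes
described_hours = 10_000       # hours with full description
described_segments = 200_000   # approximate count ("~200,000")

# Derived averages (assumption: 1 TB = 1000 GB).
avg_interview_hours = total_hours / interviews
avg_segment_minutes = described_hours * 60 / described_segments
gb_per_hour = storage_tb * 1000 / total_hours
described_fraction = described_hours / total_hours

print(f"Average interview length: {avg_interview_hours:.2f} hours")
print(f"Average described segment: {avg_segment_minutes:.1f} minutes")
print(f"Storage per recorded hour: {gb_per_hour:.2f} GB")
print(f"Share of collection fully described: {described_fraction:.1%}")
```

On these numbers, fewer than one hour in ten has been fully described by hand, which is the gap that the automatic methods later in the talk are meant to address.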

4 Who Uses the Collection?
Discipline: History, Linguistics, Journalism, Material culture, Education, Psychology, Political science, Law enforcement
Products: Book, Documentary film, Research paper, CDROM, Study guide, Obituary, Evidence, Personal use
Based on analysis of 280 access requests

5 Question Types Content –Person, organization –Place, type of place (e.g., camp, ghetto) –Time, time period –Event, subject Mode of expression –Language –Displayed artifacts (photographs, objects, …) –Affective reaction (e.g., vivid, moving, …) Age appropriateness

6 Full-Description Cataloguing
Columns: Subject, Person, Location-Time (segments ordered by interview time)
Berlin-1939: Employment; Josef Stein
Berlin-1939: Family life; Gretchen Stein, Anna Stein
Dresden-1939: Schooling; Gunter Wendt, Maria
Dresden-1939: Relocation, Transportation-rail

7 “Real-Time” Cataloguing
Columns: Subject, Person, Location-Time (entries placed along interview time)
Berlin-1939 Dresden-1939 Employment Josef Stein Gretchen Stein Anna Stein Relocation Transportation-rail Schooling Gunter Wendt Family Life Maria

8 Thesaurus-Based Search
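The slide gives only the title, so the following is a minimal sketch of what thesaurus-based search over catalogued segments could look like, not the Foundation's actual system. The `NARROWER` hierarchy, the segment ids, and the "Daily life" and "Transportation" parent terms are all hypothetical; the descriptors are loosely borrowed from the cataloguing example slides.

```python
# Hypothetical thesaurus: each term maps to its narrower terms.
NARROWER = {
    "Transportation": ["Transportation-rail"],
    "Daily life": ["Employment", "Family life", "Schooling"],
}

# Hypothetical catalogued segments: segment id -> assigned descriptors.
SEGMENTS = {
    1: {"Employment", "Berlin-1939"},
    2: {"Family life", "Berlin-1939"},
    3: {"Schooling", "Dresden-1939"},
    4: {"Relocation", "Transportation-rail", "Dresden-1939"},
}


def expand(term):
    """Return the term together with every term below it in the hierarchy."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def search(term):
    """Return ids of segments tagged with the term or any narrower term."""
    terms = expand(term)
    return sorted(seg for seg, descs in SEGMENTS.items() if descs & terms)


print(search("Transportation"))
print(search("Daily life"))
print(search("Berlin-1939"))
```

The point of the expansion step is that a searcher can ask for a broad concept and still reach segments that cataloguers tagged with a more specific descriptor.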

9 The Goal Dramatically improve access to large multilingual spoken word collections by capitalizing on the unique characteristics of the Survivors of the Shoah Visual History Foundation's collection of videotaped oral history interviews.

10 Joanne Archer

11 Observational Studies
Workshop 1 (June): Four searchers –History/Political Science –Holocaust studies –Documentary filmmaker Sequential observation Rich data collection –Intermediary interaction –Semi-structured interviews –Observational notes –Think-aloud –Screen capture
Workshop 2 (August): Four searchers –Ethnography –German Studies –Sociology –High school teacher Simultaneous observation Opportunistic data collection –Intermediary interaction –Semi-structured interviews –Observational notes –Focus group discussions

12 Observed Selection Criteria Topicality (57%), judged based on: Person, place, … Accessibility (23%), judged based on: Time to load video Comprehensibility (14%), judged based on: Language, speaking style

13 Functionality
Needed Function: Boolean Search and Ranked Retrieval (13); Testimony summary (12); Pre-Interview Questionnaire search/viewer (9); Rapid access (7); Related/Alternative search terms (3); Adding multiple search terms at once (2); Keywords linked to segment number for easy access (1); Multi-tasking (1); Searching testimonies by places under ‘Experience Search’ (1); Extensive editing within ‘My Project’ (1)
Desired Function: Temporary saving of selected testimonies (4); Remote access (3); Integrated user tools for note taking (3); Map presentation (2); Reference tool (1); More repositories (1); Introductory video of system tutorial (1); Help (1)

14 Xiaoli Huang

15 Supporting Information Access Source Selection Search Query Selection Ranked List Examination Recording Delivery Recording Query Formulation Search System Query Reformulation and Relevance Feedback Source Reselection

16 Automatic Search, Boundary Detection, Interactive Selection, Content Tagging, Speech Recognition, Query Formulation
ASR: Spontaneous, Accented, Language switching
NLP Components: Multi-scale segmentation, Multilingual classification, Entity normalization
Prototype: Evidence integration, Multilingual search, Spatial/temporal
User Needs: Observational studies, Formative evaluation, Summative evaluation

17 Description Strategies Transcription –Manual transcription (with optional post-editing) Annotation –Manually assign descriptors to points in a recording –Recommender systems (ratings, link analysis, …) Associated materials –Interviewer’s notes, speech scripts, producer’s logs Automatic –Create access points with automatic speech processing

18 English ASR Error Rate Training: 65 hours (acoustic model)/200 hours (language model)
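The chart from this slide did not survive in the transcript. Assuming the error rate reported is the standard word error rate (WER), the measure can be sketched as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the example sentences below are illustrative and are not taken from the project.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = cur[j - 1] + 1   # extra word in recognizer output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# One substitution ("moved" -> "move") and one insertion ("the")
# against a four-word reference.
print(word_error_rate("we moved to dresden", "we move to the dresden"))
```

Because insertions count as errors, WER can exceed 100% when the recognizer outputs many more words than the reference contains.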

19 Effect of ASR Errors

20 Building a Test Collection
The overall relevance assessment is informed by the assessments for the individual reasons for relevance (categories of relevance), but the relationship is not straightforward.
–Provides direct evidence
–Provides indirect / circumstantial evidence
–Provides context (e.g., causes for the phenomenon of interest)
–Provides comparison (similarity or contrast, same phenomenon in different environment, similar phenomenon)
–Provides pointer to source of information

21 Ammie Feijoo

22 Some Statistics 2,000 U.S. radio stations Webcasting; 250,000 hours of oral history in British Library; 35,000,000 audio streams on the Web

23 Spoken Word Collections Broadcast programming –News, interview, talk radio, sports, entertainment Scripted stories –Books on tape, poetry reading, theater Spontaneous storytelling –Oral history, folklore Incidental recording –Speeches, oral arguments, meetings, phone calls

24 Building a Web of Spoken Words Affordable storage –For $1, you can store 1.5 million spoken words Adequate network capacity –Internet capacity: 30 million simultaneous programs Works with any modem –You can even read email while playing audio Replay capabilities –38% of US users recently used streaming audio Effective search capabilities –Not quite yet …

25 Looking Forward: 2006 Working systems in five languages –Real users searching real data Rich experience beyond broadcast news –Frameworks, components, systems Affordable application-tuned systems –Oral history, lectures, speeches, meetings, …

26 For More Information The MALACH project –http://www.clsp.jhu.edu/research/malach/ NSF/EU Spoken Word Access Group –http://www.dcs.shef.ac.uk/spandh/projects/swag/ Speech-based retrieval –http://www.glue.umd.edu/~dlrg/speech/

