November 15, 2003 · CLIS Alumni Chapter · Talking to the Future: The MALACH Project · Douglas W. Oard, Joanne Archer, Ammie Feijoo, Xiaoli Huang · College of Information.

Presentation transcript:

Talking to the Future: The MALACH Project
Douglas W. Oard
Joanne Archer, Ammie Feijoo, Xiaoli Huang
College of Information Studies
CLIS Alumni Chapter, November 15, 2003

Telling Our Stories

Shoah Foundation’s Collection
Enormous scale
– 116,000 hours; 52,000 interviews; 180 TB
Grand challenges
– 32 languages, accents, elderly, emotional, …
Accessible
– $100 million collection and digitization investment
Annotated
– 10,000 hours (~200,000 segments) fully described
Users
– A department working full time on dissemination
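The scale figures on this slide imply a few averages that help when reading the rest of the talk. The sketch below only divides the numbers quoted above; the function name and the decimal conversion 1 TB = 1000 GB are assumptions for illustration, not part of the original slides.

```python
# Figures quoted on the slide above.
TOTAL_HOURS = 116_000
INTERVIEWS = 52_000
STORAGE_TB = 180
ANNOTATED_HOURS = 10_000
ANNOTATED_SEGMENTS = 200_000  # the slide gives this as an approximate count


def collection_averages() -> dict:
    """Derive rough per-interview and per-segment averages from the slide figures."""
    return {
        # Average length of one interview, in hours.
        "hours_per_interview": TOTAL_HOURS / INTERVIEWS,
        # Average length of one fully described segment, in minutes.
        "minutes_per_segment": ANNOTATED_HOURS * 60 / ANNOTATED_SEGMENTS,
        # Storage per recorded hour; assumes 1 TB = 1000 GB (hypothetical convention).
        "gb_per_hour": STORAGE_TB * 1000 / TOTAL_HOURS,
        # Share of the collection that has been fully described.
        "fraction_annotated": ANNOTATED_HOURS / TOTAL_HOURS,
    }


if __name__ == "__main__":
    for name, value in collection_averages().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions an interview averages a little over two hours, a described segment averages about three minutes, and less than a tenth of the collection is fully described, which is the motivation for automating access to the remainder.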

Who Uses the Collection?
Discipline
– History
– Linguistics
– Journalism
– Material culture
– Education
– Psychology
– Political science
– Law enforcement
Products
– Book
– Documentary film
– Research paper
– CDROM
– Study guide
– Obituary
– Evidence
– Personal use
Based on analysis of 280 access requests

Question Types
Content
– Person, organization
– Place, type of place (e.g., camp, ghetto)
– Time, time period
– Event, subject
Mode of expression
– Language
– Displayed artifacts (photographs, objects, …)
– Affective reaction (e.g., vivid, moving, …)
Age appropriateness

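Full-Description Cataloguing
[Figure: descriptors in three columns (Subject, Person, Location-Time) assigned to successive stretches of the interview along an "interview time" axis]
– Berlin-1939: Employment; Josef Stein
– Berlin-1939: Family life; Gretchen Stein, Anna Stein
– Dresden-1939: Schooling; Gunter Wendt, Maria
– Dresden-1939: Relocation, Transportation-rail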

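“Real-Time” Cataloguing
[Figure: descriptors in three columns (Subject, Person, Location-Time) placed along an "interview time" axis]
– Location-Time: Berlin-1939, Dresden-1939
– Subject: Employment, Relocation, Transportation-rail, Schooling, Family Life
– Person: Josef Stein, Gretchen Stein, Anna Stein, Gunter Wendt, Maria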

Thesaurus-Based Search

The Goal
Dramatically improve access to large multilingual spoken word collections …
… by capitalizing on the unique characteristics of the Survivors of the Shoah Visual History Foundation's collection of videotaped oral history interviews.

Joanne Archer

Observational Studies
Workshop 1 (June)
Four searchers
– History/Political Science
– Holocaust studies
– Documentary filmmaker
Sequential observation
Rich data collection
– Intermediary interaction
– Semi-structured interviews
– Observational notes
– Think-aloud
– Screen capture
Workshop 2 (August)
Four searchers
– Ethnography
– German Studies
– Sociology
– High school teacher
Simultaneous observation
Opportunistic data collection
– Intermediary interaction
– Semi-structured interviews
– Observational notes
– Focus group discussions

Observed Selection Criteria
Topicality (57%)
– Judged based on: person, place, …
Accessibility (23%)
– Judged based on: time to load video
Comprehensibility (14%)
– Judged based on: language, speaking style

Functionality Needed
Function
– Boolean Search and Ranked Retrieval (13)
– Testimony summary (12)
– Pre-Interview Questionnaire search/viewer (9)
– Rapid access (7)
– Related/Alternative search terms (3)
– Adding multiple search terms at once (2)
– Keywords linked to segment number for easy access (1)
– Multi-tasking (1)
– Searching testimonies by places under ‘Experience Search’ (1)
– Extensive editing within ‘My Project’ (1)
Desired Function
– Temporary saving of selected testimonies (4)
– Remote access (3)
– Integrated user tools for note taking (3)
– Map presentation (2)
– Reference tool (1)
– More repositories (1)
– Introductory video of system tutorial (1)
– Help (1)

Xiaoli Huang

Supporting Information Access
[Figure: information access process diagram. Labels: Source Selection, Query Formulation, Search, Ranked List, Selection, Recording, Examination, Recording, Delivery, Search System, Query Reformulation and Relevance Feedback, Source Reselection]

[Figure: processing stages and project components. Stage labels: Speech Recognition, Boundary Detection, Content Tagging, Query Formulation, Automatic Search, Interactive Selection]
ASR
– Spontaneous
– Accented
– Language switching
NLP Components
– Multi-scale segmentation
– Multilingual classification
– Entity normalization
Prototype
– Evidence integration
– Multilingual search
– Spatial/temporal
User Needs
– Observational studies
– Formative evaluation
– Summative evaluation

Description Strategies
Transcription
– Manual transcription (with optional post-editing)
Annotation
– Manually assign descriptors to points in a recording
– Recommender systems (ratings, link analysis, …)
Associated materials
– Interviewer’s notes, speech scripts, producer’s logs
Automatic
– Create access points with automatic speech processing

English ASR Error Rate
Training: 65 hours (acoustic model) / 200 hours (language model)
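The slide does not say how the error rate was measured. A minimal sketch, assuming the conventional word error rate (word-level edit distance divided by the number of reference words), is given below; the function name and the example sentences are invented for illustration and are not MALACH data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Invented example: one substitution ("moved" -> "move") and one deletion ("in").
    print(word_error_rate("we moved to dresden in 1939", "we move to dresden 1939"))
```

Because insertions count as errors, this measure can exceed 1.0 on very poor output, which is worth keeping in mind when comparing systems on spontaneous, accented speech.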

Effect of ASR Errors

Building a Test Collection
Overall relevance assessment is informed by the assessments for the individual reasons for relevance (categories of relevance), but the relationship is not straightforward.
Categories of relevance
– Provides direct evidence
– Provides indirect / circumstantial evidence
– Provides context (e.g., causes for the phenomenon of interest)
– Provides comparison (similarity or contrast, same phenomenon in different environment, similar phenomenon)
– Provides pointer to source of information

Ammie Feijoo

Some Statistics
– 2,000 U.S. radio stations Webcasting
– 250,000 hours of oral history in British Library
– 35,000,000 audio streams on the Web

Spoken Word Collections
Broadcast programming
– News, interview, talk radio, sports, entertainment
Scripted stories
– Books on tape, poetry reading, theater
Spontaneous storytelling
– Oral history, folklore
Incidental recording
– Speeches, oral arguments, meetings, phone calls

Building a Web of Spoken Words
Affordable storage
– For $1, you can store 1.5 million spoken words
Adequate network capacity
– Internet capacity: 30 million simultaneous programs
Works with any modem
– You can even read while playing audio
Replay capabilities
– 38% of US users recently used streaming audio
Effective search capabilities
– Not quite yet …

Looking Forward: 2006
Working systems in five languages
– Real users searching real data
Rich experience beyond broadcast news
– Frameworks, components, systems
Affordable application-tuned systems
– Oral history, lectures, speeches, meetings, …

For More Information
The MALACH project –
NSF/EU Spoken Word Access Group –
Speech-based retrieval –