Julia Hirschberg, Michiel Bacchiani, Phil Isenhour, Aaron Rosenberg, Larry Stead, Steve Whittaker, Jon Wright, and Gary Zamchick (with Martin Jansche, Meredith Ringel, and Litza Stark). SCANMAIL: Audio Browsing and Retrieval in a Voicemail Domain

2 The Problem: Navigating Audio Data. Increasing amounts of audio data are available in corporate, public and private collections (recorded meetings, broadcast news and entertainment, voicemail), but they are useless without tools for searching. SCANMail prototype: a tool for searching speech data in the voicemail domain.

3 SCANMail: Inspired by interviews, surveys and usage logs identifying problems of heavy voicemail users. It’s hard to quickly scan through new messages to find the ones you need to deal with (e.g. during a meeting break). It’s hard to find the message you want in your archive. It’s hard to locate the information you want in any message (e.g. the telephone number). SCANMail provides technology to help solve these problems, supporting content-based audio navigation.

4 Related Research: Cambridge video mail retrieval by voice (1994); NIST TREC Spoken Document Retrieval track; IBM voicemail transcription (1998) and information extraction (2001); AT&T voicemail user studies (1998); AT&T automatic speaker identification and browsing/search for voicemail (2000, 2001).

SCANMail Architecture

6 Training Corpus: Messages collected from 138 AT&T Labs voicemail boxes. 100 hr corpus includes ~10K messages from 2500 speakers. Hand-labeled for caller id, gender, age, recording condition, entities (names, dates, telephone numbers). Gender balanced, ~12% non-native speakers. ~10% of calls not from ordinary handsets. Mean message duration 36.4 secs, median 30.0 secs.

7 ASR Server: baseline system. Trained on 60 hour training set. Gender independent, 8k tied states, emission probabilities modeled by 12-component Gaussian mixtures. Uses 14k vocabulary and Katz-style backoff trigram trained on 700k words. Lexicon automatically generated by the AT&T Labs NextGen text to speech system. Decoder uses finite state transducers to construct recognition network. Initial search pass produces lattices used as grammars in all subsequent search passes.

8 Accuracy: 24.4% WER → ~21% with adaptation. Speed: 2x real time for first pass; will approach 5-6x real time for final transcription. Details: Bacchiani (HLT2000, ICASSP2000); Hirschberg et al. (Eurospeech2001).
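The WER figures quoted on this slide follow the usual definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a minimal Python illustration of that definition, not the scoring tool used for SCANMail; the function names and the example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement when an error rate drops from before to after."""
    return (before - after) / before


# Hypothetical example: one deleted word and one substituted word.
example_wer = word_error_rate("please call me back at five",
                              "please call back at nine")
# Using the slide's figures (24.4% and roughly 21%):
gain = relative_reduction(24.4, 21.0)
```

With the slide's numbers, going from 24.4% to roughly 21% WER corresponds to a relative reduction of about 14%; because the 21% figure is itself approximate, so is this ratio.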

9 ASR Server: rescoring passes. Compensation techniques for speaker/channel variation and invalid modeling assumptions: Gender dependency (GD); Vocal Tract Length Normalization (VTLN) (Kamm et al. 1995, Wegmann et al. 1996); Semi-Tied Covariances (STC) (Gales 1999); Constrained Model-space Adaptation (CMA) (Gales 1998); Maximum Likelihood Linear Regression (Leggetter and Woodland 1995). MLLR likelihood-based clustering algorithm to ensure sufficient data for compensation algorithms (Bacchiani 2000).

10 ASR Transcription Accuracy

System          Normalization        WER (%)
Baseline
GD
GD              VTLN                 32.3
VTLN                                 32.0
VTLN+STC                             30.8
VTLN+STC+CMA                         29.3
VTLN+STC+CMA    VTLN+STC+CMA+MLLR    28.7

12 Information Retrieval: Uses SMART IR engine (Salton 1971, Buckley 1985). Generates weighted term vectors for ASR transcripts and queries and computes similarity based on vector inner products. Both ASR transcripts and queries are preprocessed into tokens by removing common words (stop-listing) and stemming.
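The retrieval steps named on this slide (stop-listing, stemming, weighted term vectors, inner-product similarity) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SMART engine: the stop list, the crude suffix-stripping stemmer, the tf-idf weighting, and the example transcripts are all stand-ins chosen for brevity.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop list, not the one SMART uses.
STOP_WORDS = {"a", "about", "an", "and", "at", "for", "i", "is", "me",
              "of", "on", "the", "this", "to", "you", "your"}


def stem(token):
    """Crude suffix stripping (an assumption; real stemmers are more careful)."""
    if token.endswith("s") and len(token) > 3:
        token = token[:-1]
    for suffix in ("ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_idf(docs):
    """Inverse document frequency for every term seen in the collection."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(preprocess(doc)))
    return {term: math.log(1 + len(docs) / count)
            for term, count in doc_freq.items()}


def weight_vector(text, idf):
    """Sparse tf-idf vector; terms unknown to the collection are dropped."""
    term_freq = Counter(preprocess(text))
    return {t: c * idf[t] for t, c in term_freq.items() if t in idf}


def inner_product(u, v):
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v[t] for t, w in u.items() if t in v)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    idf = build_idf(docs)
    q = weight_vector(query, idf)
    scored = [(inner_product(q, weight_vector(d, idf)), i)
              for i, d in enumerate(docs)]
    return sorted(scored, reverse=True)


# Hypothetical ASR transcripts of three messages.
MESSAGES = [
    "hi this is larry calling about the budget meeting on friday",
    "please call me back at five five five one two one two",
    "the meeting room for the demo has changed",
]
ranking = rank("budget meetings", MESSAGES)
```

Because the similarity is a plain inner product with no length normalization, longer transcripts that repeat a query term score higher; whether that is desirable depends on the weighting scheme configured in the real engine.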

14 Information Extraction: Extracts entities from the ASR transcripts. Old implementation used finite state transducers with hand-designed costs. New statistical (trainable) system extracts phone numbers and caller names.

16 Caller Identification: Proposes caller names by matching new incoming messages against existing Text Independent Gaussian Mixture Models (TIGMMs). If no PBX-supplied caller identification, caller ID hypothesis presented to user. Caller models trained/adapted based on user feedback. Initial model trained after 1 minute of speech collected from single caller. Model updates with each 20 sec increment up to 180 sec (mature model).

17 With thresholds set to keep outgroup acceptance low (2.7%), the system had 11.5% ingroup rejection and 1.2% ingroup confusion for a 20-caller ingroup. For more detailed experimental results see Rosenberg (ICSLP 2000, Eurospeech 2001).
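The caller identification described on the two slides above is an open-set decision: score the new message against each known caller's Gaussian mixture model, pick the best match, and reject it if the score falls below a threshold (which is what trades outgroup acceptance against ingroup rejection). The Python sketch below illustrates that idea only. The diagonal-covariance models, the raw log-likelihood threshold, the toy two-dimensional features, and all names are assumptions; the actual SCANMail scoring and normalization are described in the Rosenberg papers cited above. The checkpoint helper is one reading of the slide's update schedule.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)


def identify_caller(frames, caller_models, threshold):
    """Return (best caller or None if rejected, all scores)."""
    scores = {name: gmm_log_likelihood(frames, gmm)
              for name, gmm in caller_models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


def training_checkpoints(initial=60, step=20, mature=180):
    """Seconds of collected speech at which a caller model is (re)trained."""
    return list(range(initial, mature + 1, step))


# Hypothetical two-caller ingroup with toy 2-D features.
MODELS = {
    "alice": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "bob": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
}
known, _ = identify_caller([[0.1, -0.2], [0.0, 0.3]], MODELS, threshold=-5.0)
unknown, _ = identify_caller([[20.0, 20.0]], MODELS, threshold=-5.0)
```

Raising the threshold rejects more unknown callers at the cost of rejecting more known ones, which is the trade-off behind the outgroup acceptance and ingroup rejection rates reported on the slide.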

20 Server: Composes a multi-part message and sends it to the address specified in the user profile. The message contains the ASR transcript, the speech file, and entity transcriptions with their speech segments. Uses time-aligned ASR transcript and IE information to include audio excerpts corresponding to entities.
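As an illustration of the message layout described on this slide, the sketch below builds a multi-part message with Python's standard `email` package: the transcript and entity text as the body, the full recording as one audio attachment, and one audio excerpt per extracted entity. It assumes the message is an email (MIME) message, which the slide text as preserved does not state outright. The function names, subject line, file names, and the word-timing tuple format are hypothetical.

```python
from email.message import EmailMessage


def entity_time_span(word_timings, first_word, last_word):
    """Start and end time (seconds) of an entity in a time-aligned transcript.

    word_timings is a list of (word, start_sec, end_sec) tuples; first_word
    and last_word are the indices of the entity's first and last words.
    """
    return word_timings[first_word][1], word_timings[last_word][2]


def compose_message(to_addr, from_addr, transcript, speech_wav, entities):
    """Build a multi-part message.

    entities is a list of (label, text, clip_bytes) tuples, where clip_bytes
    is the audio excerpt already cut out for that entity.
    """
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = "New voicemail message"

    # Body: ASR transcript followed by the extracted entities.
    body_lines = [transcript, ""]
    body_lines += [f"{label}: {text}" for label, text, _clip in entities]
    msg.set_content("\n".join(body_lines))

    # Attachments: the whole recording, then one excerpt per entity.
    msg.add_attachment(speech_wav, maintype="audio", subtype="wav",
                       filename="message.wav")
    for n, (label, _text, clip) in enumerate(entities, start=1):
        msg.add_attachment(clip, maintype="audio", subtype="wav",
                           filename=f"{label}-{n}.wav")
    return msg


# Hypothetical usage with placeholder audio bytes.
timings = [("call", 0.0, 0.3), ("five", 0.3, 0.6), ("five", 0.6, 0.9)]
span = entity_time_span(timings, 1, 2)
message = compose_message(
    "user@example.com", "scanmail@example.com",
    "hi call me at five five five one two one two",
    b"RIFF-placeholder",
    [("phone", "555 1212", b"RIFF-clip-placeholder")],
)
```

Cutting the excerpt itself (turning the time span into audio bytes) is left out, since it depends on the audio format and tooling in use.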

21 Evaluation: User Studies. Compared SCANMail with standard over-the-phone interface (Audix). 8 subjects performed fact-finding, relevance ranking and summarization tasks. SCANMail was better for fact-finding and ranking tasks in quality/time measures (p<0.05), gave faster solutions for the fact-finding task (p<0.01), and was rated higher on all subjective measures. Normalized performance scores were higher when subjects employed successful IR searches (p<0.05).

22 Trials: 18 subjects in 2 month field trial. Usage: 52% of messages weren’t played completely through; only ~1% of messages deleted. After using SCANMail people thought: “Scanning messages is difficult” (2.8 → 4.7); “I frequently replay messages” (1.9 → 3.5); “I frequently take notes” (2.6 → 4.3); “It’s hard to locate old messages” (2.7 → 5.0); “It’s hard to extract info from messages” (2.5 → 5.0).

23 Current Status: 37 users. Recent improvements: more accurate ASR; lighter-weight IR (Lucene); presentation of information as it becomes available (e.g. audio only, rough transcript of message). Options for SCANMail. First versions of phone and iPAQ interfaces built (many interface issues).

24 Research Foci: Additional information extracted from messages (Jansche & Abney): dates, times. Message gisting. Message threading. ‘Urgent’ and ‘personal’ messages automatically identified (Ringel & Hirschberg). Faster/more accurate ASR. Migrate client features to