2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007.

Slides:

Advertisements

Similar presentations

Estimating the detector coverage in a negative selection algorithm Zhou Ji St. Jude Childrens Research Hospital Dipankar Dasgupta The University of Memphis.

Advertisements

Machine Learning Approaches to the Analysis of Large Corpora : A Survey Xunlei Rose Hu and Eric Atwell University of Leeds.

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Non-Native Users in the Let s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch Antoine Raux & Maxine Eskenazi Language Technologies Institute.

Text Categorization.

Atomatic summarization of voic messages using lexical and prosodic features Koumpis and Renals Presented by Daniel Vassilev.

Dialogue Policy Optimisation

Language Modeling.

Language Models Naama Kraus (Modified by Amit Gross) Slides are based on Introduction to Information Retrieval Book by Manning, Raghavan and Schütze.

1 VLDB 2006, Seoul Mapping a Moving Landscape by Mining Mountains of Logs Automated Generation of a Dependency Model for HUG’s Clinical System Mirko Steinle,

Acoustic Model Adaptation Based On Pronunciation Variability Analysis For Non-Native Speech Recognition Yoo Rhee Oh, Jae Sam Yoon, and Hong Kook Kim Dept.

5/10/20151 Evaluating Spoken Dialogue Systems Julia Hirschberg CS 4706.

Development of Automatic Speech Recognition and Synthesis Technologies to Support Chinese Learners of English: The CUHK Experience Helen Meng, Wai-Kit.

Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.

1 LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.

Introduction to Language Models Evaluation in information retrieval Lecture 4.

Multimedia Databases Text II. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Image and video.

1 UCB Digital Library Project An Experiment in Using Lexical Disambiguation to Enhance Information Access Robert Wilensky, Isaac Cheng, Timotius Tjahjadi,

Why is ASR Hard? Natural speech is continuous

Chapter 5: Information Retrieval and Web Search

Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.

DIVINES – Speech Rec. and Intrinsic Variation W.S.May 20, 2006 Richard Rose DIVINES SRIV Workshop The Influence of Word Detection Variability on IR Performance.

Natural Language Understanding

Introduction to Automatic Speech Recognition

Image Annotation and Feature Extraction

1 Vector Space Model Rong Jin. 2 Basic Issues in A Retrieval Model How to represent text objects What similarity function should be used? How to refine.

Li Deng Microsoft Research Redmond, WA Presented at the Banff Workshop, July 2009 From Recognition To Understanding Expanding traditional scope of signal.

Language Identification of Search Engine Queries Hakan Ceylan Yookyung Kim Department of Computer Science Yahoo! Inc. University of North Texas 2821 Mission.

A Phonotactic-Semantic Paradigm for Automatic Spoken Document Classification Bin MA and Haizhou LI Institute for Infocomm Research Singapore.

1 High Resolution Statistical Natural Language Understanding: Tools, Processes, and Issues. Roberto Pieraccini SpeechCycle

Interaction Design Session 12 LBSC 790 / INFM 718B Building the Human-Computer Interface.

Experiments on Building Language Resources for Multi-Modal Dialogue Systems Goals identification of a methodology for adapting linguistic resources for.

Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.

Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.

Recognition of spoken and spelled proper names Reporter : CHEN, TZAN HWEI Author :Michael Meyer, Hermann Hild.

Chapter 6: Information Retrieval and Web Search

LML Speech Recognition Speech Recognition Introduction I E.M. Bakker.

DIALOG SYSTEMS FOR AUTOMOTIVE ENVIRONMENTS Presenter: Joseph Picone Inst. for Signal and Info. Processing Dept. Electrical and Computer Eng. Mississippi.

1 Boostrapping language models for dialogue systems Karl Weilhammer, Matthew N Stuttle, Steve Young Presenter: Hsuan-Sheng Chiu.

Survey of Approaches to Information Retrieval of Speech Message Kenney Ng Spoken Language Systems Group Laboratory for Computer Science Massachusetts Institute.

LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.

Evaluation of Agent Building Tools and Implementation of a Prototype for Information Gathering Leif M. Koch University of Waterloo August 2001.

1 Sentence Extraction-based Presentation Summarization Techniques and Evaluation Metrics Makoto Hirohata, Yousuke Shinnaka, Koji Iwano and Sadaoki Furui.

Cluster-specific Named Entity Transliteration Fei Huang HLT/EMNLP 2005.

Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.

DIALOG SYSTEMS FOR AUTOMOTIVE ENVIRONMENTS Presenter: Joseph Picone Inst. for Signal and Info. Processing Dept. Electrical and Computer Eng. Mississippi.

Vector Space Models.

Problems of Modeling Phone Deletion in Conversational Speech for Speech Recognition Brian Mak and Tom Ko Hong Kong University of Science and Technology.

Voice Activity Detection based on OptimallyWeighted Combination of Multiple Features Yusuke Kida and Tatsuya Kawahara School of Informatics, Kyoto University,

Modern Information Retrieval Lecture 2: Key concepts in IR.

Copyright © 2013 by Educational Testing Service. All rights reserved. Evaluating Unsupervised Language Model Adaption Methods for Speaking Assessment ShaSha.

Multi-Modal Dialogue in Personal Navigation Systems Arthur Chan.

Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:

1 ICASSP Paper Survey Presenter: Chen Yi-Ting. 2 Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis.

Maximum Entropy techniques for exploiting syntactic, semantic and collocational dependencies in Language Modeling Sanjeev Khudanpur, Jun Wu Center for.

Gaussian Mixture Language Models for Speech Recognition Mohamed Afify, Olivier Siohan and Ruhi Sarikaya.

ASSESSING SEARCH TERM STRENGTH IN SPOKEN TERM DETECTION Amir Harati and Joseph Picone Institute for Signal and Information Processing, Temple University.

Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.

Christoph Prinz / Automatic Speech Recognition Research Progress Hits the Road.

Evolvable dialogue systems

Information Retrieval: Models and Methods

Korean version of GloVe Applying GloVe & word2vec model to Korean corpus speaker : 양희정 date :

Information Retrieval: Models and Methods

College of Engineering

3.0 Map of Subject Areas.

Multimedia Information Retrieval

Automatic Speech Recognition

CS4705 Natural Language Processing

Hsien-Chin Lin, Chi-Yu Yang, Hung-Yi Lee, Lin-shan Lee

Presentation transcript:

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Its an hot and important application of speech understanding technology Its a challenging problem and provides a fertile ground for research Typical a utomation rate 30%~60%

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Residential DA French Telecom Bell Canada Tele Auto- attendant IBM AT&T Microsoft Stock Quote Tellme Business DA Tellme Nuance Jingle Networks Google Microsoft Product Rating Microsoft Music search Daimler Microsoft Conference papers Carnegie Mellon AT&T,ICSI, Edinburg Univ. etc

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Form Filling (ATIS) Call Routing (AT&T HMIHY) Voice Search (Directory Assistance) ShowFlights From=SeattleTo=BostonDate=12/23 Show flights from Seattle to Boston on December 23 NetworkingOS Hardware I need the number of Big 5 My outlook does not show any new messages for about 5 hours th Ave Redmond, WA (425) Sears Roebuck & Company 132 Lake St S, Kirkland, WA (425) Calabria Ristorante Italiano 4315 University Way Seattle WA (206) Big 5 Sporting Goods

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 ASRSLU/Search Dialog Manager TTSUser

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Y. Gao, et al. "Innovative approaches for large vocabulary name recognition." ICASSP 2001

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Noisy environment Different channel conditions Speaker variance Acoustic Model Foreign names Unseen words Pronunciation variance Pronunciation Model Large vocabulary Linguistic variance Little training data from users Language Model

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Huge semantic space Linguistic variance SLU/Search Multi-source uncertainty High level confusability Dialog Management Large vocabulary Foreign names Unseen words TTS System tuning is a necessity High maintenance cost Feedback Loop

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Acoustic model clustering More precise modeling of noisy environment and channel conditions Massive adaptation More precise modeling of speaker variance Self-adaptation Better models for unseen speakers Y. Gao, et al. "Innovative approaches for large vocabulary name recognition." ICASSP 2001

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Pronunciation variants – Common Approach ASR Auto Phonetic Transcription Canonical Pronunciation Learning Variation rules/model Word Dictionary Requires canonical pronunciation (No OOV) Derive pronunciation from acoustics only B. Ramabhadtan, L. R. Bahl, P. V. deSouza, and M. Padmanabhan, "Acoustics-only based automatic phonetic baseform generation," ICASSP 1998

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 F. Béchet, R. De Mori, and G. Subsol, "Dynamic generation of proper name pronunciations for directory assistance," ICASSP 2002 : from acoustic model : from data/knowledge source

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Calabria Ristorante Italiano Calabria Jack J DO Calabria Electric Calabria Ristorante Italiano Electric Jack J DO (425) (425) (206) DB listings dont match users expression Exact rule matching – lack of robustness High perplexity due to the lack of probabilities

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Signature: Subsequence of a listing that uniquely identifies the listing Listing 1: 3-L Hair World on North 3rd Street Listing 2: Suzies Hair World on Main Street Signatures: 3-L, Hair 3rd, and Hair Main Non-Signatures: Hair World and World on FST LM: S:= 3-L Hair World? On? North? 3rd ? Street? :1 | 3-L Hair? World? On? North 3rd ? Street? :1 | 3-L Hair? World on? North? 3rd ? Street? :1 | 3-L? Hair World? On? North 3rd ? Street? :1 | Suzies? Hair World? On? Main Street? :2 | Suzies Hair World? On? Main? Street? :2 | Suzies Hair? World on? Main? Street? :2 | Suzies? Hair? World on? Main Street? :2

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 LM is constructed from listing database. No training data from users is required. Presumption: users are more likely to say the words to avoid ambiguity and skip the rest May not be true: users may not have the knowledge about confusable entries. E.g. Calabria for Calabria Ristorante Italiano when Calabria Electric is in the DB E. E. Jan, B. ı. Maison, L. Mangu, and G. Zweig, "Automatic Construction of Unique Signatures and Confusable Sets for Natural Language Directory Assistance Applications," Eurospeech 2003.

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Finite state transducer LM – not an issue May be good enough for residential DA Poor coverage, not robust for business DA Statistical n-gram language model SLU/Search is required Statistical model for robustness

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 : static rank Designed for Frequently requested names (FRNs) : directly estimated from data P. Natarajan, R. Prasad, R. M. Schwartz, and J. Makhoul, "A scalable architecture for directory assistance automation," ICASSP 2002

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Tf*Idf weighted vector space model Represent queries and listings as vectors Each dimension represents the importance of a term (e.g., word, word bigram) in a query/document The importance is proportional to the terms frequency in the query/document (TF). It reduces as the term occurs in many different documents – proportional to the logarithm of the inverse document frequency (IDF). Measure the similarity as the cosine of the vector

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Enhancements Duplicated words for short documents (listing) and queries – term frequencies are not reliable Big 5 will get Big 5 Sorting Goods instead of 5 star 5 Category smoothing Calabria Ristorante Italiano is favored over Calabria Electric on the query Calabria Restaurant Character-ngrams as terms Lime Wire vs. Dime Wired (ASR Error) $Lim Lime ime_ me_W e_Wi _Wir Wire ire$ $Dim Dime ime_ me_W e_Wi _Wir Wire ired red$

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 User: Tell me about restaurants in London. System: I know of 596 restaurants in London. All price ranges are represented. Some of the cuisine options are Italian, British, European, and French. User: Im interested in Chinese food. System: I know of 27 restaurants in London that serve Chinese cuisine. All price ranges are represented. Some are near the Leicester square tube station. User: How about a cheap one? System: I know of 14 inexpensive restaurants that serve Chinese cuisine. Some are near the Leicester Square tube station. Some are in Soho. J. Polifroni and M. Walker, "An Analysis of Automatic Content Selection Algorithms for Spoken Dialogue System Summaries," SLT 2006.

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 ASR Search Features Search DM Features DM Correct Incorrect MaxEnt Classifier Y.-Y. Wang, D. Yu, Y.-C. Ju, G. Zweig, and A. Acero, "Confidence Measures for Voice Search Applications," INTERSPEECH 2007.

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 ASR confidence score ASR semantic confidence [No ASR internal features] ASR TF*IDF VSM scores w/ and w/o category smoothing VSM score gap from the best hypothesis Normalized # of char. matches btw. ASR and hypo. Covered/uncovered words IDF ratio Search Same hypothesis made in previous turn Dialog turn City match (application-specific) DM ASR confidence on the word with max. IDF value Joint ASR confidence score/TF*IDF Combined

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 Two sweet features added... gas and voice... this software rocks! Isnt this program great! I feel like the NSA guys in enemy of the state Hey Google map user put up your stylus and check out Live Search. No stylus needed. Now that's handy or should I say one-handed.

2007 IEEE Automatic Speech Recognition and Understanding Workshop Kyoto, 12/10/2007 A Confidence Measure that is generally applicable for Voice Search Applications Roadmap to the goal As many initial easy-to-obtain features as possible Feature elimination with significance testing Confidence measure with 5 domain independent features illustrates good performance Acceptation/rejection rate, Confidence distribution, Correlation with end-to-end accuracy