1 Bootstrapping language models for dialogue systems Karl Weilhammer, Matthew N. Stuttle, Steve Young Presenter: Hsuan-Sheng Chiu

2 Introduction Poor speech recognition performance is often cited as the single most important factor preventing wider use of spoken dialogue systems
–Many dialogue systems use recognition grammars instead of statistical language models (SLMs)
–In this paper, the authors compare the standard grammar-based approach with SLMs trained on different corpora

3 Experimental Setup Collection of example utterances
–Asked 9 co-researchers who were familiar with the task to submit a set of 10 “example” interactions with the system and a number of more advanced dialogues
–The training set was mostly used as held-out data for interpolation, selection of the best model and other purposes
–The test set was used for all test runs in the tourist information domain

4 Experimental Setup (cont.) Generation and recognition grammar
–A simple HTK grammar of around 80 rules in EBNF was written and used in two ways: it was converted into a word network (16996 nodes and transitions), and a corpus of random sentences was generated from it
–The task grammar was structured as:
1. Task-specific semantic concepts (prices, hotel names, …)
2. General concepts (local relations, numbers, dates, …)
3. Query predicates (Want, Find, Exists, Select, …)
4. Basic phrases (Yes, No, DontMind, Grumble, …)
5. List of sub-grammars for user answers to all prompts
6. Main grammar
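The grammar-to-corpus step above can be sketched with a toy random expansion. The rule names and phrases below are invented for illustration and are far simpler than the ~80-rule HTK task grammar:

```python
import random

# Hypothetical toy grammar in the spirit of the EBNF task grammar;
# non-terminals are dict keys, terminals are plain lowercase strings.
GRAMMAR = {
    "S": [["WANT", "ITEM"], ["FIND", "ITEM", "PLACE"]],
    "WANT": [["i", "want"], ["i", "would", "like"]],
    "FIND": [["find"], ["show", "me"]],
    "ITEM": [["a", "hotel"], ["a", "cheap", "restaurant"]],
    "PLACE": [["near", "the", "station"], ["in", "the", "centre"]],
}

def generate(symbol="S"):
    """Expand a non-terminal by picking one of its right-hand sides
    uniformly at random; a terminal is returned as-is."""
    if symbol not in GRAMMAR:
        return [symbol]
    out = []
    for sym in random.choice(GRAMMAR[symbol]):
        out.extend(generate(sym))
    return out

# A small random corpus, analogous to the grammar-generated training data.
corpus = [" ".join(generate()) for _ in range(5)]
```

Because every right-hand side is picked uniformly, short derivations come out more often than long ones, which is exactly the over-weighting of short sentences noted in the experiments.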

5 Acoustic Models Use the trigram decoder in ATK (Application Toolkit for HTK)
Acoustic models:
–Training: WSJCAM0, internal triphone models (92 speakers of British English, 7900 read sentences, 130k words)
–Adaptation: SACTI (43 users, 3000 sentences, 20k words)
–MAP and HLDA
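The MAP adaptation mentioned above interpolates the speaker-independent model with statistics from the adaptation data. A minimal sketch for a single Gaussian mean, assuming the standard count-weighted update and a hypothetical prior weight tau (the paper does not give its settings):

```python
def map_adapt_mean(prior_mean, adapt_frames, tau=10.0):
    """MAP update of a scalar Gaussian mean: interpolate the prior
    (speaker-independent) mean with the sample mean of the adaptation
    frames, weighted by the frame count n versus the prior weight tau."""
    n = len(adapt_frames)
    sample_mean = sum(adapt_frames) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

With little adaptation data the estimate stays close to the prior; as the frame count grows it approaches the sample mean, which is why MAP is well suited to the fairly small SACTI adaptation set.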

6 Experiments Baseline: the generation grammar was compiled into a recognition network and yielded a WER of 40.4%
Grammar networks vs. statistical language models
–Used the HTK random sentence generator to generate a corpus (30k sentences)
–Relative improvement of 29%
–A. The random sentence generator overweights common n-grams by repeating short sentences more often than long ones
–B. WERs are less sensitive to the size of the corpus

7 Experiments (cont.) In-domain language model
–A WOZ corpus in the tourist information domain (SACTI) was collected
–The major part of the dialogues was recorded over a simulated automatic speech recognition channel
–Half of the corpus consists of speech-only dialogues
–In the other half, an interactive map interface could be used along with speech

8 Experiments (cont.) Ideal dialogue data for statistical methods
–Human-human (HH) channel: symmetric, prosodic information, few “recognition” errors
–ASR channel (human-computer, HC): use of an end-pointer eliminates the prosodic information
–It is not desirable for the collection framework to use a fixed or random policy (a policy is a mapping from states to system actions)
–Modified framework: simulated ASR channel
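The policy as a state-to-action mapping can be illustrated with a toy lookup table; the state and action names below are invented for illustration, not taken from the paper:

```python
# Hypothetical dialogue policy for a tourist-information system:
# each dialogue state maps to exactly one system action.
policy = {
    "no_info": "ask_constraints",    # nothing known yet
    "got_area": "ask_price_range",   # area known, price still missing
    "got_all": "offer_venue",        # all constraints filled
}

def next_action(state):
    """Deterministic policy lookup; a fixed policy like this is what
    the slide argues against using during data collection."""
    return policy[state]
```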

9 Experiments (cont.) Wizard of Oz Experiments

10 Experiments (cont.) The problem with this corpus is that it was recorded well before the dialogue system assumed for the collection of the test set was defined
Given the near-identical WER, it is much faster and cheaper to adapt a simple grammar than to record and transcribe in-domain data

11 Experiments (cont.) Combining grammar and “in-domain” language models
–The perplexity on the held-out data is a good predictor of the test-set WER
–Interpolating the grammar corpus with a corpus of natural speech straightens out most of the artifacts caused by the random generator
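Linear interpolation with the mixture weight chosen by held-out perplexity, as used above, can be sketched as follows. The per-word probabilities are toy numbers, not values from the paper:

```python
import math

def interp_logprob(p1, p2, lam):
    """Log-probability of one word under the linear interpolation
    lam * p1 + (1 - lam) * p2 of two component LMs."""
    return math.log(lam * p1 + (1.0 - lam) * p2)

def heldout_perplexity(probs1, probs2, lam):
    """Perplexity of the interpolated model on held-out data, given the
    per-word probabilities each component LM assigns."""
    ll = sum(interp_logprob(p1, p2, lam) for p1, p2 in zip(probs1, probs2))
    return math.exp(-ll / len(probs1))

# Hypothetical per-word probabilities from a grammar LM and a WOZ LM
# on four held-out words.
grammar_p = [0.20, 0.01, 0.30, 0.02]
woz_p = [0.05, 0.10, 0.08, 0.15]

# Pick the weight that minimizes held-out perplexity, as the slide's
# use of held-out data for interpolation suggests.
best_lam = min((l / 10 for l in range(1, 10)),
               key=lambda l: heldout_perplexity(grammar_p, woz_p, l))
```

In practice the same held-out set is reused to select the best model overall, which is consistent with perplexity being a good predictor of test-set WER here.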

12 Experiments (cont.) Interpolating language models derived from a grammar and a standard corpus
–The Fisher corpus contains transcriptions of conversations about different topics
–The idea is that the grammar-generated data will contribute in-domain n-grams and the general corpus will add colloquial phrases
–For comparison, the vocabulary used to build the Fisher SLM contained all words of the grammar and the WOZ data collection
–The absolute WER improvement of the model that would have been selected at the perplexity minimum on the held-out set is only 0.25%

13 Experiments (cont.) Sentence selection using perplexity filtering
–Collected a corpus of 2.4M words (188,909 sentences) from the Internet
–It might be rewarding to extract all relevant sentences of the corpus and give them a higher weight than the remaining sentences
–Build a LM from a seed corpus: LM_Seed
–Build a LM from the large corpus: LM_Large
–Calculate PP_Rel for each sentence and sort the corpus according to PP_Rel
–Select the n lines with lowest PP_Rel and build a LM: LM_Selected
–Do the same with the remaining lines: LM_NotSelected
–Interpolate LM_Seed, LM_Selected and LM_NotSelected
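The selection loop above can be sketched with smoothed unigram LMs, assuming PP_Rel is the ratio of seed-LM to large-LM sentence perplexity (the slide does not spell out the exact definition); the corpora and sentences below are toy examples:

```python
import math
from collections import Counter

def unigram_lm(corpus_words, alpha=1.0):
    """Add-alpha smoothed unigram LM; returns a word -> probability
    function that also assigns mass to unseen words."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for unseen words
    return lambda w: (counts[w] + alpha) / (total + alpha * vocab)

def perplexity(lm, sentence):
    """Per-word perplexity of a sentence under the given LM."""
    words = sentence.split()
    return math.exp(-sum(math.log(lm(w)) for w in words) / len(words))

def select_by_relative_ppl(sentences, lm_seed, lm_large, n):
    """Rank sentences by PP_Rel = PP_Seed / PP_Large and split them
    into the n best-matching lines and the remainder."""
    ranked = sorted(
        sentences,
        key=lambda s: perplexity(lm_seed, s) / perplexity(lm_large, s))
    return ranked[:n], ranked[n:]
```

In-domain web sentences score low on PP_Rel (the seed LM likes them about as much as the background LM does), so they end up in LM_Selected; off-topic lines get a high ratio and go to LM_NotSelected.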

14 Experiments (cont.)
–LM_Seed: 10M grammar-generated sentences
–Second round of LM_Seed: the interpolated grammar-Fisher SLM
–Interpolating with the web corpus results in a relative improvement of 3%
–Applying perplexity filtering improved WER by another 2% relative

15 Experiments (cont.) After a noisy initial segment (one-word sentences), the perplexity stabilizes at a certain level

16 Conclusions Compared speech recognition results using a recognition grammar with different kinds of SLMs:
1. A corpus of 30k sentences generated by the grammar improved WER by 29% relative
2. An in-domain corpus gave a 21% relative improvement
3. A standard corpus with sentence filtering
Effective language models can be built for bootstrapping a spoken dialogue system without recourse to expensive WOZ data collections