Evaluating Unsupervised Language Model Adaptation Methods for Speaking Assessment

Presentation transcript:

Copyright © 2013 by Educational Testing Service. All rights reserved.
Evaluating Unsupervised Language Model Adaptation Methods for Speaking Assessment
ShaSha Xie* (Microsoft) and Lei Chen (ETS)
6/13/2013

Model Adaptation, Key to ASR Success

Adaptation
Modern ASR systems are statistics-rich
– The acoustic model (AM) uses GMMs or DNNs
– The language model (LM) uses n-grams (n can be large)
Parameters need to be adjusted for new accents, speakers, environments, content domains, etc.
Adaptation is important for achieving satisfactory recognition accuracy
It needs, and prefers, large amounts of transcribed speech

LM Adaptation Methods
Supervised LM adaptation
– A large amount of in-domain, human-annotated transcripts
Unsupervised LM adaptation
– No human transcripts
Using web data to support LM adaptation
– Large amounts, no cost, up-to-date, but noisy
Semi-supervised LM adaptation
– Minimizes human supervision
– A small amount of human transcripts
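Whatever the source of the adaptation text, the common final step is to combine an in-domain LM estimated from it with the background LM. Below is a minimal sketch of linear interpolation, one standard way to do this; the talk does not specify its interpolation scheme, and the toy unigram probabilities and the weight lam are illustrative assumptions.

```python
# Linear interpolation of a background LM with an in-domain adaptation LM:
#   P(w) = (1 - lam) * P_background(w) + lam * P_adapt(w)
# Toy unigram models; real systems interpolate full n-gram distributions.

def interpolate(p_background, p_adapt, lam=0.3):
    vocab = set(p_background) | set(p_adapt)
    return {w: (1.0 - lam) * p_background.get(w, 0.0) + lam * p_adapt.get(w, 0.0)
            for w in vocab}

p_bg = {"the": 0.060, "lecture": 0.0010, "meeting": 0.0005}   # background (TOEFL-like)
p_in = {"the": 0.050, "meeting": 0.0040, "invoice": 0.0020}   # in-domain (TOEIC-like)
print(interpolate(p_bg, p_in))
```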

Supervised Adaptation
– Given a large amount of in-domain speech recordings
– Randomly sample from them
– Obtain human transcripts of the selected speech
– Adapt the model
(The figure is from Riccardi, Giuseppe, and Dilek Hakkani-Tür, "Active Learning: Theory and Applications to Automatic Speech Recognition," IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 4, 2005.)

Unsupervised Adaptation
ASR transcripts of in-domain data; options for using the ASR transcripts:
– All of the transcripts
– A random selection (Bacchiani and Roark, 2003)
– Selected ASR transcripts with fewer recognition errors (Gretter and Riccardi, 2001)
(Figure: ASR transcript selection)
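A minimal sketch of the third option, confidence-based selection: keep only utterances whose average word confidence from the recognizer is high, on the assumption that they contain fewer recognition errors. The Word/Utterance records and the 0.8 threshold are illustrative assumptions, not details from the talk.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    token: str
    confidence: float          # word posterior from the ASR lattice, in [0, 1]

@dataclass
class Utterance:
    words: List[Word]

def avg_confidence(utt: Utterance) -> float:
    return sum(w.confidence for w in utt.words) / len(utt.words)

def select_for_adaptation(utts: List[Utterance], threshold: float = 0.8):
    """Keep utterances the recognizer is confident about (fewer likely errors)."""
    return [u for u in utts if avg_confidence(u) >= threshold]

utts = [Utterance([Word("hello", 0.95), Word("world", 0.90)]),
        Utterance([Word("uh", 0.40), Word("hmm", 0.35)])]
print(len(select_for_adaptation(utts)))   # -> 1: only the confident utterance survives
```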

Using Web Data in LM Adaptation
Use Web data to obtain in-domain LM training material: from the ASR task, generate search queries and use search engines, e.g., Google or Bing, to retrieve a large amount of (noisy) LM training data in a very fast and economical way.
(Figure: searching the Web to collect LM training data)

Semi-supervised LM Adaptation
Use active learning to let human transcribers focus on "critical" utterances:
– Run the seed ASR on the entire un-transcribed adaptation data set
– Select the utterances that are poorly recognized to be transcribed by humans
– This can be an iterative process (sketched below)
(Figure: sampling responses for transcription based on seed ASR performance)
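A minimal sketch of that loop. recognize(), human_transcribe(), and adapt_lm() are stand-ins for the seed ASR, the human transcription step, and LM re-estimation; none of them is an actual API from the talk, and the per-round budget of 130 merely echoes the ~130 transcribed responses reported later in the results.

```python
import random

def recognize(utt):
    """Stand-in ASR: returns (hypothesis, confidence)."""
    return "hypothesized text", random.random()

def human_transcribe(utt):
    """Stand-in for sending an utterance to a human transcriber."""
    return f"human transcript of {utt}"

def adapt_lm(transcripts):
    """Stand-in for re-estimating the LM on the collected transcripts."""
    return {"adapted_on": len(transcripts)}

def active_learning(pool, budget_per_round=130, rounds=3):
    human = {}                                    # utterance -> human transcript
    for _ in range(rounds):
        remaining = [u for u in pool if u not in human]
        hyps = {u: recognize(u) for u in remaining}
        # send the utterances the current system is least confident about to humans
        worst = sorted(remaining, key=lambda u: hyps[u][1])[:budget_per_round]
        for u in worst:
            human[u] = human_transcribe(u)
        lm = adapt_lm(list(human.values()))       # re-adapt after each round
    return lm

print(active_learning([f"utt{i}" for i in range(500)]))
```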

Related Work
– Michiel Bacchiani and Brian Roark, "Unsupervised Language Model Adaptation," Proc. ICASSP, 2003.
– Giuseppe Riccardi and Dilek Hakkani-Tür, "Active Learning: Theory and Applications to Automatic Speech Recognition," IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 4, 2005.
– Roberto Gretter and Giuseppe Riccardi, "On-line Learning of Language Models with Word Error Probability Distributions," Proc. ICASSP, 2001.
– Tim Ng, Mari Ostendorf, Mei-Yuh Hwang, Manhung Siu, Ivan Bulyko, and Xin Lei, "Web-data Augmented Language Models for Mandarin Conversational Speech Recognition," Proc. ICASSP, Vol. 1, pp. 589–593, 2005.

Experiment Setup
Data
– In-domain data: TOEIC®, a workplace English assessment
– Out-of-domain data: TOEFL®, an academic English assessment
Seed ASR
– A state-of-the-art HMM ASR trained on TOEFL data
– A WER of 33.0% was observed when testing on in-domain data

Experiment on Unsupervised Adaptation
Data
– Training data for the seed ASR: ~700 hrs (out-of-domain)
– Adaptation: 1,470 responses, about 24.5 hrs (in-domain)
– Testing: 184 responses, about 3 hrs (in-domain)
Baselines
– No adaptation: 42.8% WER
– Supervised adaptation: 34.7% WER
Selection methods (sketched below)
– Average word confidence scores
– Duration-based confidence scores
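The two selection scores, sketched side by side: the plain average of word confidences, and a duration-weighted variant in which each word's confidence counts in proportion to how long it lasts, so long misrecognized words weigh more heavily. The Word record and the example values are illustrative assumptions.

```python
from collections import namedtuple

Word = namedtuple("Word", ["token", "confidence", "duration"])   # duration in seconds

def average_confidence(words):
    return sum(w.confidence for w in words) / len(words)

def duration_weighted_confidence(words):
    total_dur = sum(w.duration for w in words)
    return sum(w.confidence * w.duration for w in words) / total_dur

words = [Word("the", 0.98, 0.10), Word("invoice", 0.55, 0.70)]
print(average_confidence(words))            # 0.765: both words count equally
print(duration_weighted_confidence(words))  # ~0.604: the long uncertain word dominates
```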

Experimental Results

Using the Web as a Corpus
A recent trend in NLP and ASR research is to use the Web as a corpus. For example, Ng et al. (2005) used Chinese data collected from the Web for topic adaptation and achieved a 28% reduction in LM perplexity and a 7% reduction in character recognition errors.

Using Web Data in LM Adaptation
– Create query terms based on the test prompts
– Use a search engine to find URLs
– Collect and process the text content of those URLs
– Apply the collected Web texts to the LM adaptation task (a sketch of the pipeline follows)
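A minimal sketch of those four steps, assuming the requests and beautifulsoup4 packages. make_query() and search_engine() are hypothetical stand-ins (a real system would call a search API and clean the fetched text far more aggressively); a fuller query-formation sketch follows the example on the next slide.

```python
import requests
from bs4 import BeautifulSoup

def make_query(prompt: str) -> str:                 # stand-in; see the later sketch
    return '"advances in technology" made "the world" "a better place"'

def search_engine(query: str) -> list:              # stand-in for a real search API
    return ["https://example.com/essay-on-technology"]

def collect_web_text(prompt: str, max_urls: int = 50) -> list:
    query = make_query(prompt)                      # step 1: query from the prompt
    urls = search_engine(query)[:max_urls]          # step 2: retrieve candidate URLs
    texts = []
    for url in urls:                                # steps 3 and 4: fetch, extract text
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")
        texts.append(soup.get_text(separator=" "))
    return texts                                    # in-domain text for LM adaptation
```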

An Example
Prompt
– Do you agree or disagree with the following statement? Advances in technology have made the world a better place. Use specific reasons and examples to support your opinion.
Query
– "advances in technology" made "the world" "a better place"
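One way such a query could be formed: drop function and instruction words, then quote the surviving runs of content words as phrases. This heuristic is an assumption for illustration; the talk does not specify its exact query-formation rule, and the stopword list here is deliberately tiny.

```python
import re

STOPWORDS = {"do", "you", "agree", "or", "disagree", "with", "the", "following",
             "statement", "have", "use", "specific", "reasons", "and", "examples",
             "to", "support", "your", "opinion"}

def make_query(prompt: str) -> str:
    words = re.findall(r"[a-z']+", prompt.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:                 # stopwords end the current phrase
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # quote multi-word phrases so the search engine matches them exactly
    return " ".join(f'"{" ".join(p)}"' if len(p) > 1 else p[0] for p in phrases)

prompt = ("Do you agree or disagree with the following statement? Advances in "
          "technology have made the world a better place. Use specific reasons "
          "and examples to support your opinion")
print(make_query(prompt))   # -> "advances in technology" made "world a better place"
```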

An Example (continued)
Cleaned URLs (the list is truncated in this transcript; the surviving path fragments point to essay and forum pages on the prompt's topic):
– …life-easier-and-safer/page0.html
– …changed-the-world-page5.html
– …world-better-place.html

A More Promising Collection
Query
– "reading a book is more relaxing than watching television"
Returned URLs
– Several Q/A posts related to the prompt


Experimental Results
– Web: using 5,312 utterances collected from the Web
– Web + ASR: using both the unsupervised ASR transcripts and the Web data

Semi-Supervised Adaptation
Our approach
– Train the initial ASR system on out-of-domain data
– Obtain the ASR transcripts of the adaptation data
– Select low-certainty responses and replace their ASR transcripts with human transcripts, using either average confidence scores or duration-based confidence scores
– Adapt on the high-certainty ASR transcripts plus the human transcripts of the selected responses (sketched below)
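A minimal sketch of assembling that adaptation corpus: responses the recognizer is confident about keep their ASR transcripts, while the selected low-certainty responses contribute human transcripts instead. The Response record, the 0.8 threshold, and human_transcribe() are illustrative assumptions.

```python
from collections import namedtuple

Response = namedtuple("Response", ["rid", "asr_transcript", "confidence"])

def human_transcribe(response):                  # stand-in for the human step
    return f"human transcript of {response.rid}"

def build_adaptation_corpus(responses, threshold=0.8):
    corpus = []
    for r in responses:
        if r.confidence >= threshold:
            corpus.append(r.asr_transcript)      # high certainty: trust the ASR
        else:
            corpus.append(human_transcribe(r))   # low certainty: use the human version
    return corpus

demo = [Response("r1", "i agree with the statement", 0.91),
        Response("r2", "uh the the uh", 0.42)]
print(build_adaptation_corpus(demo))
```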

Semi-Supervised Adaptation: Experimental Results
– Similar performance for the two selection methods (average and duration-based confidence scores)
– Significant WER reduction from using human transcripts of only ~130 responses

Summary of Results

Conclusions
Several unsupervised LM adaptation methods that have been actively investigated in the ASR field were tried on the ASR-improvement task needed by speaking assessment. These methods were found useful for achieving faster and more economical LM adaptation:
– Unsupervised => no transcription needed
– Web-as-a-corpus => can even bypass pre-tests, since only the prompts are needed
– Semi-supervised => jointly uses machine efficiency and human wisdom

Future Work
– Multiple iterations of active learning
– More refined web-resource collection methods, such as query formation and web-data cleaning
– Trying other new subset-selection methods (which I just saw at NAACL 2013)