Improving Out-of-Vocabulary Name Resolution. David Palmer and Mari Ostendorf, Computer Speech and Language 19 (2005). Presented by Aasish Pappu, Oct 26, 2009.



Introduction
- OOVs ~ Names. Mountainous vocabulary?? John, Jean, Joana... okay, a multiple personality disorder!
- Each OOV token contributes on average 1.5 errors (Hetherington '95).
- A major source of word errors in ASR hypotheses. Why? [From the TDT broadcast news corpus:]
  - 9.4% of the words are part of name phrases.
  - 45.1% of the utterances contain at least one name phrase.
  - WER: 38.6% for words within name phrases.
  - WER: 29.4% for non-name words.
  - The OOV rate is less than 1% for large-vocabulary (48-64k) systems, but significantly higher for words in name phrases.

Primary sources of OOV person names
1. "New" names of global importance
   - Newsworthy: world leaders, terrorists, criminals, and corporate leaders.
   - Assuming entities of global importance appear in both broadcast and print during the same period.
2. News reporters
   - "CNN's John Zarrella has the story..."
   - Readily available from the news agency itself.
3. Spelling and morphological variants
4. Sports figures
5. Villagers and human-interest personalities (Joe the Plumber is an outlier?)

Approach
[System diagram from D.D. Palmer, M. Ostendorf / Computer Speech and Language 19 (2005)]

Name Error Detection
- Named entity recognition: an HMM-like model with state-dependent bigrams to detect NEs (Palmer and Ostendorf 2001a).
- Finding OOV names by detecting word errors in the hypothesis:
  - Acoustic cues, ASR error patterns.
  - More information sources, such as surrounding language context.
- Integration of word confidences into a probabilistic model to jointly identify names and errors.
- A simple lattice built from the hypothesis, with error arcs in parallel.
- Iterative refinement of word confidence estimates (Gillick et al. '97; Palmer and Ostendorf 2001b).
- Viterbi decoding to find the best path through the lattice.

Name Error Detection
- Errors are explicitly modeled using parallel arcs: a sequence of error indicator variables K = (k_1, ..., k_N), where k_t = 1 if the hypothesized word h_t is an error and k_t = 0 otherwise.
- A: confidence scores and other confidence-related evidence.
- Find the maximum posterior probability state sequence, assuming that the specific value of h_t at an error does not provide additional information.

Name Error Detection
- Part 1: the error model P(K | H, A), where errors are assumed to be conditionally independent given the hypothesis H and evidence A, i.e. P(K | H, A) = ∏_t P(k_t | h_t, a_t).
- Part 2: the joint name/error sequence model; there is no efficient exact decoding algorithm for the full model, hence an approximation is used.
- Goal: to find words that are in error (for subsequent correction) as well as the NEs.
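The conditionally independent error model above can be sketched in a few lines. This is a toy illustration, not the paper's trained model: the mapping from a word's confidence score to its error probability P(k_t = 1 | h_t, a_t) is an assumed placeholder (one minus the confidence), and the word list and threshold are invented for the example.

```python
# Sketch of the conditionally independent error model: each hypothesized
# word h_t gets an error indicator k_t whose posterior depends only on
# that word's evidence a_t (here, a single confidence score).
# Assumption: P(error | confidence) = 1 - confidence (placeholder model).

def error_posteriors(confidences):
    """P(k_t = 1 | a_t): low confidence -> high error probability."""
    return [1.0 - c for c in confidences]

def flag_errors(words, confidences, threshold=0.5):
    """Return (word, is_error) pairs; flagged words become OOV-name candidates."""
    posts = error_posteriors(confidences)
    return [(w, p > threshold) for w, p in zip(words, posts)]

hyp = ["cnn's", "john", "zarella", "has", "the", "story"]
conf = [0.95, 0.80, 0.30, 0.97, 0.99, 0.92]
print(flag_errors(hyp, conf))  # only the low-confidence "zarella" is flagged
```

Because independence is assumed, each word is scored in isolation; the paper's full model additionally couples these decisions with the name-state sequence during decoding.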

Offline Name List Generation
- Identify good lexical resources.
- Rank words based on frequency statistics (from the text sources).
  - Alternatively, filter the text sources based on document relevance (Iyer and Ostendorf '97).
- The final list contains both IV and OOV items (to allow the option of not changing the recognizer's output).
- Run G2P: produce phoneme-based pronunciation strings for each word (for use in online scoring).
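The offline steps above can be sketched as follows. The frequency ranking is straightforward; the G2P step is stubbed out with a letter-per-phone placeholder, since a real system would use a trained grapheme-to-phoneme model (the slides mention the T2P tool). The example names are illustrative.

```python
from collections import Counter

# Offline name-list generation sketch: count name tokens from a text
# source, rank by frequency, and keep the top entries (both IV and OOV).

def build_name_list(name_tokens, max_size=10000):
    """Rank candidate names by corpus frequency, most frequent first."""
    counts = Counter(name_tokens)
    return counts.most_common(max_size)

def g2p_stub(word):
    """Placeholder G2P: one 'phone' per letter (assumption, not T2P)."""
    return list(word)

names = ["zarrella", "lewinsky", "zarrella", "ostendorf"]
ranked = build_name_list(names)
# Attach a pronunciation string to each entry for later online scoring.
pron_list = [(word, freq, g2p_stub(word)) for word, freq in ranked]
```

Keeping the frequency alongside each entry allows the later online ranking to optionally combine phonetic distance with word frequency, as the pruning slide describes.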

Online list pruning
- Input: a candidate name error and the phone sequence for that word.
- Compare pronunciations for each of the words in the extended word list:
  - Compute distance using a string matching procedure and a set of phone (substitution, insertion, deletion) costs.
  - Rank according to distance and, optionally, word frequency.
- Did you say phonetic distance???

Phonetic Distance
- Akin to a noisy channel approach (a stochastic transduction model):
  - Measure the edit distance between two phoneme sequences,
  - according to a trainable weighting system (edit weights based on all possible sequences).
- A phonetic-feature-based weighting function (Bates and Ostendorf 2001).
- Weights automatically derived from training data using EM (Ristad and Yianilos '97).
  - Weight estimation: used a set of ASR output from a portion of the TDT data, separate from the experiments.
  - Automatic alignment of (reference, ASR words) and conversion with T2P (Lenzo '98).
  - In essence, ASR output is treated as phonemic misspellings.
- Applications of phonetic distance: name-list pruning, error correction, and name normalization.
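A minimal version of the weighted edit distance between two phone sequences is a standard dynamic program. Here the substitution/insertion/deletion costs are uniform constants as a stand-in assumption; in the paper's setup they would be trained (by phonetic features or by EM as cited above).

```python
# Weighted phone edit distance (Levenshtein-style dynamic programming).
# Assumption: uniform sub/ins/del costs; a trained model would supply
# per-phone-pair weights instead.

def phone_edit_distance(seq1, seq2, sub=1.0, ins=1.0, dele=1.0):
    n, m = len(seq1), len(seq2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele
    for j in range(1, m + 1):
        d[0][j] = j * ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if seq1[i - 1] == seq2[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match / substitute
                          d[i - 1][j] + dele,      # delete from seq1
                          d[i][j - 1] + ins)       # insert into seq1
    return d[n][m]

def rank_candidates(error_phones, name_list):
    """Rank (name, phones) candidates by distance to the error's phones."""
    return sorted(name_list,
                  key=lambda entry: phone_edit_distance(error_phones, entry[1]))
```

The online pruning step then keeps only the top-ranked candidates, optionally breaking ties by the word frequency stored in the offline list.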

Error Resolution
- Objective: error correction in the regions of high information content.
- Impact: the quality of information extraction of NEs.
- Error token detection algorithm (automatic & oracle) plus name detection.
- Several candidates from the pruned set:
  - chosen by phonetic or LM score, or via an additional pass.
- Re-running recognition: larger gains, but impractical (say, for IR applications).
- Using an adapted language model based on temporally or topically relevant text containing the target words achieves high accuracy, e.g. for resolving spelling alternatives (Lewinsky vs. Lewinski).
- Valuable hindsight about the context in which the candidate OOVs appeared.
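Selecting among the pruned candidates "by phonetic or LM score" can be sketched as a simple linear combination. The weights, the candidate tuples, and the probabilities below are illustrative assumptions, not values from the paper.

```python
import math

# Candidate selection sketch for error resolution: trade off phonetic
# distance to the misrecognized word against a language-model probability
# for the candidate in context. Assumption: a weighted sum of negated
# distance and LM log-probability, with made-up weights and scores.

def select_candidate(candidates, alpha=1.0, beta=1.0):
    """candidates: list of (name, phone_distance, lm_probability)."""
    def score(c):
        name, dist, lm_p = c
        return -alpha * dist + beta * math.log(lm_p)
    return max(candidates, key=score)

# Same phonetic distance, so the adapted LM decides the spelling variant.
cands = [("lewinsky", 0.5, 0.02), ("lewinski", 0.5, 0.005)]
best = select_candidate(cands)
```

This mirrors the Lewinsky/Lewinski example: when two variants are phonetically indistinguishable, only a language model adapted on temporally relevant text can separate them.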

Numb3rs
- Data: TDT4 broadcast news.
- Error detection: 65.7% recall, 59.0% precision, F-measure 62.2 (with iterative confidence estimation, Gillick et al. '97).
- With a simple confidence threshold: 66.1% R, 48.8% P, and 56.1% F.
- For OOV correction, recall is more important than precision, since the correction step retains the option of leaving the hypothesized word unchanged.

More numb3rs: error correction using phonetic distance
- Data: NYT/APW; coverage: 43%; 40% of corrected names are covered.
- Although there is a direct impact on IE, there is only a minor improvement in the overall WER of the data.

Recap
- Detect OOV errors.
- Generate targeted name lists for candidate OOVs: offline generation of a large name list, and online pruning based on a phonetic distance.
- The resulting list can be used in a rescoring pass in automatic speech recognition.
- A wide variety of sources, including automatic name-phrase tagging of temporally relevant news text, can be used for NE correction.

Conclusion
- Error detection combined with a phonetically ranked list helps.
- The same name-list generation could be useful for generating homophone lists.
- A phoneme lattice could be a richer representation than a word lattice.
- Correction of multi-word phrases would help, as opposed to single words, because of automatic alignment issues.
- Dealing with plural and possessive forms could be addressed.

Thanks! The Hanks