Efficient Language Model Look-ahead Probabilities Generation Using Lower Order LM Look-ahead Information Langzhou Chen and K. K. Chin Toshiba Research.


Efficient Language Model Look-ahead Probabilities Generation Using Lower Order LM Look-ahead Information Langzhou Chen and K. K. Chin Toshiba Research Europe Limited, Cambridge Research Lab, Cambridge, UK Presented by Patty Liu

2 Introduction
The basic idea of language model look-ahead (LMLA) is to use look-ahead probabilities as linguistic scores when the current word is not yet known. This paper focuses on efficient LMLA probability generation. A new method is presented that generates the higher order LMLA probabilities from the lower order LMLA trees. The method takes advantage of the sparseness of the n-gram LM to avoid unnecessary computation: only nodes related to explicitly estimated n-gram values are updated, while the remaining nodes take the back-off values of the corresponding nodes in the (n-1)-gram LMLA tree.

3 The Computation Cost of LMLA
Given a particular LM context, the calculation of LM look-ahead probabilities can be divided into 2 parts:
- The first part calculates the LM probabilities of every word in the vocabulary based on the LM context.
- The second part assigns an LM look-ahead probability to every node in the LM look-ahead network through a dynamic programming procedure.
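To make the two-part computation concrete, here is a minimal Python sketch of the baseline per-context LMLA generation, assuming a hypothetical prefix-tree representation (the names baseline_lmla, lm_prob, Node, word, children are illustrative, not the paper's code):

# Minimal sketch of the baseline per-context LMLA computation (illustrative only).
# Assumed structures: `lm_prob(w, h)` returns P(w|h); each tree node has a list
# `children`, and each leaf node carries its vocabulary word in `word`.

def baseline_lmla(root, vocab, h, lm_prob):
    # Part 1: LM probability of every word in the vocabulary for this context (V lookups).
    word_prob = {w: lm_prob(w, h) for w in vocab}

    # Part 2: dynamic programming over the tree -- every node gets the maximum
    # probability over all words reachable from it (M node updates).
    def visit(node):
        if not node.children:                      # leaf: exactly one word
            node.lmla = word_prob[node.word]
        else:
            node.lmla = max(visit(c) for c in node.children)
        return node.lmla

    visit(root)
    return root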

4 The Computation Cost of LMLA
Example: suppose the vocabulary contains V words and the LM look-ahead network contains M nodes. For each LM history occurring in the search space, the LVCSR system has to
- step 1: look up V probabilities
- step 2: generate M look-ahead probabilities
The values of V and M are large in an LVCSR system. Typically, during the recognition of one sentence, several hundred bi-gram contexts and several thousand trigram contexts occur in the search space. For higher order n-grams, the number of LM contexts in the search space is even larger.
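As a rough, hypothetical illustration of the resulting load (V and M below are assumed values, not figures from the paper; the context counts follow the "several hundred" and "several thousand" mentioned above):

# Back-of-the-envelope cost of naive per-context LMLA (illustrative numbers only).
V = 65_000                # assumed vocabulary size
M = 300_000               # assumed number of nodes in the look-ahead tree
bigram_contexts = 500     # "several hundred" bi-gram contexts per sentence
trigram_contexts = 5_000  # "several thousand" trigram contexts per sentence

per_context = V + M       # step 1 lookups + step 2 node updates
total = (bigram_contexts + trigram_contexts) * per_context
print(f"~{total:,} operations per sentence")   # about 2 billion with these assumptions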

5 The New Method of Efficient LMLA Probability Generation
I. The data sparseness of the n-gram model
When the history-word pair can be found in the n-gram data, the explicit n-gram probability is used; otherwise the lower order model is used as the back-off estimate:

  P(w|h) = f(w|h)                 if C(h,w) > 0
  P(w|h) = backoff(h) * P(w|h')   if C(h,w) = 0

f(.) : the discounted LM probability read from the n-gram file
C(.) : the frequency of the event occurring in the training corpus
backoff(h) : the back-off parameter of history h
h' : the lower order history of h
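A minimal sketch of this back-off lookup, assuming the discounted probabilities and back-off weights have already been read from an ARPA-style n-gram file into dictionaries (the names f, backoff and the tuple layout are assumptions for illustration):

# Illustrative back-off lookup (a sketch, not the paper's implementation).
# Assumed layout: `f[(h, w)]` holds the discounted probability for history tuple h
# and word w; `backoff[h]` holds the back-off weight of history h.

def lm_prob(w, h, f, backoff):
    if (h, w) in f:                          # C(h,w) > 0: explicit n-gram entry
        return f[(h, w)]
    if len(h) == 0:                          # unigram level: nothing left to back off to
        return 0.0
    h_prime = h[1:]                          # lower order history h'
    return backoff.get(h, 1.0) * lm_prob(w, h_prime, f, backoff)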

6 The New Method of Efficient LMLA Probability Generation
Practically speaking, for large vocabulary applications, given a history h, the number of different history-word pairs that can be found in the training data is much smaller than the size of the vocabulary V. This means that for every word history h, most of the n-gram probabilities are given by the back-off estimate. This phenomenon can be used to accelerate the calculation in language modeling.

7 The New Method of Efficient LMLA Probability Generation
II. Calculating the LMLA probabilities from lower order LMLA information
In this new method, only the LM look-ahead probabilities in a small subset of the nodes need to be updated, while for most of the nodes in the LM look-ahead tree, the LM look-ahead probability can be copied directly from the back-off LM look-ahead tree.
The LM look-ahead probability of node n is defined as the maximum LM probability over all the words that can be reached from n:

  pi_h(n) = max_{w in W(n)} P(w|h)

where W(n) represents the set of the words that can be reached from node n.

8 The New Method of Efficient LMLA Probability Generation
Using the back-off n-gram formula, the definition of LM look-ahead can be re-written by splitting the words reachable from a node into those with an explicit n-gram entry for h and those covered by the back-off estimate. Therefore, the nodes in the LMLA tree can be divided into 2 parts: nodes from which at least one explicitly estimated word (C(h,w) > 0) can be reached, and nodes whose reachable words are all given by the back-off estimate.
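A plausible reconstruction of this re-written definition in LaTeX (the split of W(n) by whether C(h,w) > 0 is inferred from the surrounding text; the exact notation on the original slide may differ):

\pi_h(n) = \max\Big( \max_{w \in W(n),\, C(h,w) > 0} f(w \mid h),\;\; \mathrm{backoff}(h) \cdot \max_{w \in W(n),\, C(h,w) = 0} P(w \mid h') \Big)

For a node n where no reachable word has C(h,w) > 0, only the second term remains, so pi_h(n) is just backoff(h) times the lower order look-ahead value of n, which is why those nodes can be copied directly from the scaled (n-1)-gram LMLA tree.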

9 The New Method of Efficient LMLA Probability Generation
The new method can be divided into 4 steps:
Step 1: generate the lower order LM look-ahead tree T, i.e. the look-ahead probability for the lower order history h' at each node n in T.
Step 2: multiply the lower order LM look-ahead probabilities by the back-off parameter of history h to generate a back-off LM look-ahead tree T', i.e. each node n in T' starts with backoff(h) times the value of the corresponding node in T.
Step 3: for each word w that co-occurred with the LM context h in the training corpus, replace the back-off LM probability in the leaf node of T' with the explicit LM probability from the n-gram model, i.e. if C(h,w) > 0, use f(w|h) to replace f(w|h')*backoff(h) in T'.
Step 4: for each word w in W = {w | C(h,w) > 0}, update the LM look-ahead probabilities in the nodes from which w can be reached, using the dynamic programming procedure.
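A minimal Python sketch of these four steps, reusing the hypothetical prefix-tree and n-gram dictionary layout from the earlier sketches (the names explicit, backoff, leaf_of and lmla_lower are assumptions, not the paper's code):

# Sketch of efficient LMLA generation for context h (illustrative only).
# Assumptions: `node.lmla_lower` already holds the (n-1)-gram look-ahead value for h';
# `leaf_of[w]` maps each vocabulary word to its leaf node; nodes know `parent`
# (None at the root) and `children`; `explicit[h]` maps history h to {w: f(w|h)}
# for the words with C(h,w) > 0; `backoff[h]` is the back-off weight of h.

def efficient_lmla(root, h, explicit, backoff, leaf_of):
    bo = backoff.get(h, 1.0)

    # Steps 1-2: initialise every node from the lower order tree, scaled by backoff(h).
    stack = [root]
    while stack:
        node = stack.pop()
        node.lmla = bo * node.lmla_lower
        stack.extend(node.children)

    # Step 3: overwrite the leaves of words that co-occur with h by their explicit f(w|h).
    seen = explicit.get(h, {})
    for w, prob in seen.items():
        leaf_of[w].lmla = prob

    # Step 4: re-propagate maxima only along the paths from those leaves up to the root.
    for w in seen:
        node = leaf_of[w].parent
        while node is not None:
            node.lmla = max(child.lmla for child in node.children)
            node = node.parent
    return root

With this organisation the per-context work is dominated by the number of words with C(h,w) > 0 and the lengths of their tree paths, rather than by V + M, which is the saving described on the following slides.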

10 The New Method of Efficient LMLA Probability Generation
Given the LM context h, suppose that only 2 words, w1 and w3, have explicit LM probabilities. The new method then only needs to recalculate the LMLA probabilities in the nodes from which w1 and w3 can be reached. The proposed method reduces the CPU cost significantly by recalculating only a subset of the nodes in the LM look-ahead tree.

11 Multi-layer Cache System for LMLA

12 Experimental Results

13 Experimental Results
Because the trigram data is very sparse compared to the bi-gram data, far fewer nodes need to be updated in trigram LMLA than in bi-gram LMLA. Therefore, most of the calculation cost comes from bi-gram LMLA even though it is invoked less frequently.