University of Alberta Letter-to-phoneme conversion Sittichai Jiampojamarn CMPUT 500 / HUCO 612 September 26, 2007

University of Alberta Outline Part I – Introduction to letter-to-phoneme conversion Part II – Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion (NAACL-HLT 2007) Part III – Ongoing work: discriminative approaches for letter-to-phoneme conversion Part IV – Possible term projects for CMPUT 500 / HUCO 612

University of Alberta The task Converting words to their pronunciations – study → [ s t ʌ d I ] – band → [ b æ n d ] – phoenix → [ f i n I k s ] – king → [ k I ŋ ] Words → sequences of letters. Pronunciations → sequences of phonemes. – Ignoring syllabification and stress.

University of Alberta Why is it important? Major component in speech synthesis systems Word similarity based on pronunciation – Spelling correction (Toutanova and Moore, 2001) Linguistic interest in the relationship between letters and phonemes Not a trivial task, but tractable.

University of Alberta Trivial solutions? Dictionary – looking up answers in a database – Great effort is needed to construct such a large lexicon. – Can't handle new words and misspellings. Rule-based approaches – Work well on non-complex languages – Fail on complex languages, where each word ends up creating its own rules – effectively memorizing word-phoneme pairs.

University of Alberta John Kominek and Alan W. Black, "Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies", in Proceedings of HLT-NAACL 2006, June 4-9.

University of Alberta Learning-based approaches Training data – Examples of words and their phonemes. Hidden structure – band → [ b æ n d ] b → [b], a → [æ], n → [n], d → [d] – abode → [ə b o d] a → [ə], b → [b], o → [o], d → [d], e → [ _ ]

University of Alberta Alignments To train L2P, we need alignments between letters and phonemes: a → [ə], b → [b], o → [o], d → [d], e → [_]

University of Alberta Overview: the standard process

University of Alberta Letter-to-phoneme alignments Previous work assumed one-to-one alignment for simplicity (Daelemans and Bosch, 1997; Black et al., 1998; Damper et al., 2005). Expectation-Maximization (EM) algorithms are used to optimize the alignment parameters. Matching all possible letters and phonemes iteratively until the parameters converge.

University of Alberta 1-to-1 alignments Initially, the alignment parameters can start from a uniform distribution, or from counting all possible letter-phoneme mappings. Ex. abode → [ə b o d] P(a, ə) = 4/5, P(b, b) = 3/5, …

University of Alberta 1-to-1 alignments Find the best possible alignments based on current alignment parameters. Based on the alignments found, update the parameters.

University of Alberta Finding the best possible alignments Dynamic programming: – Standard weighted minimum edit distance algorithm style. – The alignment parameter P(l, p) serves as the mapping score component. – Find the alignments that give the maximum score. – Null phonemes are allowed, but not null letters; it is hard to incorporate null letters in the testing data.
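
A compact Python sketch of this alignment step and the surrounding E/M loop. This is a hard (Viterbi-style) EM; the function names, the additive scoring, and the co-occurrence initialization are illustrative choices, not the exact implementation behind the slides.

```python
from collections import defaultdict

NULL = "_"  # null phoneme: the letter produces no sound

def best_alignment(letters, phonemes, prob):
    """Edit-distance-style DP: highest-scoring 1-to-1 alignment.
    A letter may map to the null phoneme; null letters are not allowed,
    so the word must have at least as many letters as phonemes."""
    n, m = len(letters), len(phonemes)
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(m + 1):
            # option 1: letter i-1 maps to the null phoneme
            s = score[i - 1][j] + prob[(letters[i - 1], NULL)]
            if s > score[i][j]:
                score[i][j], back[i][j] = s, (i - 1, j, NULL)
            # option 2: letter i-1 maps to phoneme j-1
            if j > 0:
                s = score[i - 1][j - 1] + prob[(letters[i - 1], phonemes[j - 1])]
                if s > score[i][j]:
                    score[i][j], back[i][j] = s, (i - 1, j - 1, phonemes[j - 1])
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):              # trace back the best path
        pi, pj, ph = back[i][j]
        pairs.append((letters[i - 1], ph))
        i, j = pi, pj
    return pairs[::-1]

def em_align(words, prons, iterations=10):
    """Hard EM: align with the current parameters, then re-count."""
    counts = defaultdict(float)
    # initialize by counting all possible letter-phoneme co-occurrences
    for w, p in zip(words, prons):
        for letter in w:
            for ph in list(p) + [NULL]:
                counts[(letter, ph)] += 1.0
    prob = defaultdict(float)
    for _ in range(iterations):
        total = sum(counts.values())
        prob = defaultdict(float, {k: v / total for k, v in counts.items()})
        counts = defaultdict(float)
        for w, p in zip(words, prons):
            for letter, ph in best_alignment(list(w), list(p), prob):
                counts[(letter, ph)] += 1.0
    return prob

# e.g. em_align(["abode"], [["ə", "b", "o", "d"]]) ends up favouring
# a→ə, b→b, o→o, d→d, e→_
```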

University of Alberta Visualization (a sequence of slides stepping through the dynamic-programming alignment table)

University of Alberta Problems with 1-to-1 alignments Double letters: two letters map to one phoneme (e.g. ng → [ŋ], sh → [ʃ], ph → [f]).

University of Alberta Problem with 1-to-1 alignments Double phonemes: one letter maps to two phonemes (e.g. x → [k s], u → [j u]).

University of Alberta Previous solutions for double phonemes Preprocess using a fixed list of double phonemes: – [k s] → [X] – [j u] → [U]

University of Alberta Applying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion Sittichai Jiampojamarn, Grzegorz Kondrak and Tarek Sherif. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, NY, April 2007.

University of Alberta Overview of the system Prediction process Alignment process

University of Alberta Many-to-many alignments EM-based method, extended from the forward-backward training of a one-to-one stochastic transducer (Ristad and Yianilos, 1998). Allows one or two letters to map to null, one, or two phonemes.
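
To make the extension concrete, here is an illustrative sketch of the decoding (best-alignment) step with the moves widened to substrings; the actual training uses soft forward-backward EM over these moves in the style of Ristad and Yianilos (1998), and the scoring table and function name below are assumptions, not the paper's code.

```python
from collections import defaultdict

def best_m2m_alignment(letters, phonemes, prob, max_l=2, max_p=2):
    """Extend the 1-to-1 DP so that 1-2 letters can map to 0-2 phonemes.
    prob: defaultdict(float) scoring (letter substring, phoneme substring) pairs.
    Assumes at least one complete path through the table exists."""
    NULL = "_"
    n, m = len(letters), len(phonemes)
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if score[i][j] == float("-inf"):
                continue
            for dl in range(1, max_l + 1):          # consume 1 or 2 letters
                if i + dl > n:
                    continue
                ls = "".join(letters[i:i + dl])
                for dp in range(0, max_p + 1):      # produce 0, 1, or 2 phonemes
                    if j + dp > m:
                        continue
                    ps = " ".join(phonemes[j:j + dp]) if dp else NULL
                    s = score[i][j] + prob[(ls, ps)]
                    if s > score[i + dl][j + dp]:
                        score[i + dl][j + dp] = s
                        back[i + dl][j + dp] = (i, j, ls, ps)
    pairs, i, j = [], n, m                          # trace back
    while (i, j) != (0, 0):
        pi, pj, ls, ps = back[i][j]
        pairs.append((ls, ps))
        i, j = pi, pj
    return pairs[::-1]

# e.g. with suitable scores, "longs" vs. [l ɒ ŋ z] comes out as
# l→l, o→ɒ, ng→ŋ, s→z
```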

University of Alberta Many-to-many alignments (example slides illustrating the extended alignment moves)

University of Alberta Prediction problem Should the prediction model generate phonemes from one letter or two? – gash → [g æ ʃ] vs. gasholder → [g æ s h o l d ə r]

University of Alberta Letter chunking A bigram letter-chunking predictor automatically discovers double letters. Ex. longs
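
One plausible reading of this step as code: derive per-letter "merge with the next letter?" labels from the many-to-many alignments, then decide per bigram. The paper uses an instance-based learner over bigram features; the frequency-threshold decision and greedy segmentation below are simplifications, and the helper names are made up.

```python
from collections import Counter

def chunk_labels(alignment):
    """Per-letter labels from a many-to-many alignment: 1 if the letter starts a
    two-letter chunk, 0 otherwise. e.g. [("l","l"), ("o","ɒ"), ("ng","ŋ"), ("s","z")]"""
    labels = []
    for letter_sub, _ in alignment:
        labels.extend([1, 0] if len(letter_sub) == 2 else [0])
    return labels

def train_chunker(alignments):
    """For each letter bigram, record how often it was aligned as one chunk."""
    merged, seen = Counter(), Counter()
    for alignment in alignments:
        letters = "".join(ls for ls, _ in alignment)
        labels = chunk_labels(alignment)
        for k in range(len(letters) - 1):
            bigram = letters[k:k + 2]
            seen[bigram] += 1
            merged[bigram] += labels[k]
    return {bg: merged[bg] / seen[bg] > 0.5 for bg in seen}

def chunk(word, chunker):
    """Greedy left-to-right chunking, e.g. 'longs' -> ['l', 'o', 'ng', 's']."""
    out, k = [], 0
    while k < len(word):
        if k + 1 < len(word) and chunker.get(word[k:k + 2], False):
            out.append(word[k:k + 2]); k += 2
        else:
            out.append(word[k]); k += 1
    return out
```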

University of Alberta Overview of the system Prediction process Alignment process

University of Alberta Phoneme prediction Once the training examples are aligned, we need a phoneme prediction model. “Classification task” or “sequence prediction”?

University of Alberta Instance-based learning Store the training examples. The predicted class is assigned by searching for the "most similar" training instance. Similarity functions: – Hamming distance, Euclidean distance, etc.
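
A bare-bones sketch of instance-based prediction with Hamming distance over a letter context window; in the experiments this role is played by the IB1 algorithm in TiMBL, and the window size and helper names here are illustrative assumptions.

```python
def features(letters, i, window=2):
    """Feature tuple for predicting the phoneme of letter i:
    the letter plus `window` letters of context on each side (padded with '#')."""
    padded = "#" * window + "".join(letters) + "#" * window
    return tuple(padded[i:i + 2 * window + 1])

def hamming(a, b):
    """Number of positions where two equal-length feature tuples disagree."""
    return sum(x != y for x, y in zip(a, b))

def nn_predict(instance, training):
    """training: list of (feature tuple, phoneme). Return the phoneme of the
    most similar stored training instance (1-nearest neighbour)."""
    _, phoneme = min(training, key=lambda ex: hamming(instance, ex[0]))
    return phoneme
```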

University of Alberta Basic HMMs A basic sequence-based prediction method. In L2P, –letters are observations –phonemes are states Output phoneme sequences depend on both emission and transition probabilities.

University of Alberta Applying HMMs Use instance-based learning to produce a list of candidate phonemes with confidence values conf(phone_i) for each letter i (the emission probability). Use a language model of phoneme sequences in the training data to obtain the transition probability P(phone_i | phone_{i-1}, …, phone_{i-n}).
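
A Viterbi sketch of this combination, assuming bigram transitions for simplicity (the slide allows a longer n-gram history); the candidate confidences and the transition table are taken as given, and unseen transitions are smoothed with a small constant.

```python
import math

def viterbi(candidates, trans):
    """candidates[i]: {phoneme: conf} for letter i (emission scores from the
    instance-based learner). trans[(prev, cur)]: bigram transition probability
    from the phoneme language model. Returns the best phoneme sequence."""
    # states[i][ph] = (best log score ending in ph at position i, previous phoneme)
    states = [dict() for _ in candidates]
    for ph, conf in candidates[0].items():
        states[0][ph] = (math.log(conf), None)
    for i in range(1, len(candidates)):
        for ph, conf in candidates[i].items():
            states[i][ph] = max(
                ((s + math.log(trans.get((prev, ph), 1e-9)) + math.log(conf), prev)
                 for prev, (s, _) in states[i - 1].items()),
                key=lambda t: t[0])
    # trace back from the best final phoneme
    ph, _ = max(states[-1].items(), key=lambda kv: kv[1][0])
    seq = [ph]
    for i in range(len(candidates) - 1, 0, -1):
        seq.append(states[i][seq[-1]][1])
    return seq[::-1]
```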

University of Alberta Visualization Buried → [ b E r aI d ] = 2.38 × Buried → [ b E r I d ] = 2.23 × 10⁻⁶

University of Alberta Evaluation Data sets – English: CMUDict (112K), Celex (65K). – Dutch: Celex (116K). – German: Celex (49K). – French: Brulex (27K). The IB1 algorithm implemented in the TiMBL package is used as the classifier (W. Daelemans et al., 2004). Results are reported as word accuracy rate based on 10-fold cross validation.
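
For concreteness, a small sketch of the evaluation protocol under these definitions: word accuracy counts a word as correct only if its entire phoneme sequence matches. How the folds were actually drawn in the experiments may differ from this simple interleaved split.

```python
def word_accuracy(predicted, gold):
    """Word accuracy rate: fraction of words whose whole phoneme sequence is correct."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def ten_fold(data):
    """Yield (train, test) splits for 10-fold cross validation."""
    for k in range(10):
        test = data[k::10]
        train = [d for i, d in enumerate(data) if i % 10 != k]
        yield train, test
```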

University of Alberta Messages Many-to-many alignments show significant improvements over traditional one-to-one alignments. The HMM-like approach helps when a local classifier has difficulty predicting phonemes.

University of Alberta Criticism Joint models – alignments, chunking, prediction, and HMM. Error propagation – errors from one model pass to the other models and are unlikely to be corrected. Can we combine and optimize everything at once? Or at least allow the system to correct past errors?

University of Alberta Ongoing work Discriminative approaches for letter-to-phoneme conversion

University of Alberta Online discriminative learning Let x be an input word and y an output phoneme sequence. Φ(x, y) represents features describing x and y, and w is a weight vector over Φ(x, y).

University of Alberta Online training algorithm 1. Initially, w = 0. 2. For k iterations: for all letter-phoneme sequence pairs (x, y), predict ŷ = argmax_y' w · Φ(x, y') and update the weights according to Φ(x, y) and Φ(x, ŷ).

University of Alberta Perceptron update (Collins, 2002) A simple update rule: when the prediction ŷ is wrong, move the weights toward the correct answer, w ← w + Φ(x, y) − Φ(x, ŷ).
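
A minimal sketch of the online training loop with the perceptron update; phi and the argmax decoder predict are assumed to be supplied, and the weight averaging that Collins recommends is omitted.

```python
from collections import defaultdict

def perceptron_train(data, phi, predict, k=10):
    """Structured perceptron training in the style of Collins (2002).
    data: (word, phoneme sequence) pairs; phi(x, y): sparse feature dict;
    predict(x, w): the decoder returning argmax_y w . phi(x, y)."""
    w = defaultdict(float)                       # 1. initially, w = 0
    for _ in range(k):                           # 2. for k iterations
        for x, y in data:                        #    for all letter-phoneme pairs
            y_hat = predict(x, w)
            if y_hat != y:                       # move weights toward the correct answer
                for f, v in phi(x, y).items():
                    w[f] += v
                for f, v in phi(x, y_hat).items():
                    w[f] -= v
    return w
```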

University of Alberta Examples Separable case Adapted from Dan Klein’s tutorial slides at NAACL 2007.

University of Alberta Examples Non-separable case Adapted from Dan Klein’s tutorial slides at NAACL 2007.

University of Alberta Issues with Perceptron Overtraining: test / held-out accuracy usually rises, then falls. Regularization: –if the data isn’t separable, weights often thrash around. –Finds a “barely” separating solution Taken from Dan Klein’s tutorial slides at NAACL 2007.

University of Alberta Margin Infused Relaxed Algorithm (MIRA) (Crammer and Singer, 2003) Uses an n-best list to update the weights: separate the correct output from the incorrect ones by a margin at least as large as the loss function, while keeping the weight changes as small as possible.

University of Alberta Loss function in letter-to-phoneme Describes the loss of an incorrect prediction compared to the correct one: word error (0/1), phoneme error, or a combination.
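
A sketch of the single-constraint (1-best) MIRA update under these definitions; the system described on the previous slide updates against an n-best list, which requires solving a small quadratic program rather than the closed form used here, and the function name is an assumption.

```python
def mira_update(w, phi_gold, phi_pred, loss):
    """1-best MIRA update: enforce a margin of at least `loss` between the
    correct and the predicted output while changing w as little as possible.
    phi_gold / phi_pred: sparse feature dicts for the two outputs."""
    feats = set(phi_gold) | set(phi_pred)
    diff = {f: phi_gold.get(f, 0.0) - phi_pred.get(f, 0.0) for f in feats}
    margin = sum(w.get(f, 0.0) * v for f, v in diff.items())
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return w
    tau = max(0.0, (loss - margin) / norm_sq)    # smallest step that satisfies the margin
    for f, v in diff.items():
        w[f] = w.get(f, 0.0) + tau * v
    return w

# loss, as on this slide: word error is 0/1, phoneme error could be the number
# of mismatched phonemes, and a combination simply adds the two.
```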

University of Alberta Results Incomplete!!! – MIRA outperforms the Perceptron. – The 0/1 loss and the combination loss work better than the phoneme loss alone. – Overall, results show better performance than previous work.

University of Alberta Possible term projects

University of Alberta Possible term projects 1. Explore more linguistic features. 2. Explore machine translation systems for letter-to-phoneme conversion. 3. Unsupervised approaches for letter-to-phoneme conversion. 4. Other cool ideas to improve on a partial system – Data for evaluation are provided. – Alignments are provided. – An L2P model is provided.

University of Alberta Linguistic features Looking for linguistic features to help L2P – Most systems already incorporate letter n-gram features in some way. The new features must be obtained using only the word itself. Work already done – Syllabification: Susan's thesis finds syllabification breaks on letters using an SVM approach.

University of Alberta Machine translation approach The L2P problem can be seen as a (simple) machine translation problem, where we would like to translate letters into phonemes. – Consider: L2P ↔ MT, letters ↔ words, words ↔ sentences, phonemes ↔ target sentences Moses – a baseline SMT system, ACL 2007 – May also need to look at GIZA++, Pharaoh, Carmel, etc.
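
As a starting point, the pronunciation dictionary can be written out as a tiny parallel corpus in the format SMT toolkits expect (one "sentence" per line, tokens separated by spaces). The file names and helper below are made-up examples.

```python
def to_parallel_corpus(pairs, src_path="l2p.letters", tgt_path="l2p.phonemes"):
    """Write word-pronunciation pairs as a parallel corpus so an SMT toolkit
    such as Moses/GIZA++ can treat letters as 'words' and words as 'sentences'."""
    with open(src_path, "w", encoding="utf-8") as src, \
         open(tgt_path, "w", encoding="utf-8") as tgt:
        for word, phonemes in pairs:
            src.write(" ".join(word) + "\n")        # "band"        -> "b a n d"
            tgt.write(" ".join(phonemes) + "\n")    # [b, æ, n, d]  -> "b æ n d"

# to_parallel_corpus([("band", ["b", "æ", "n", "d"])])
```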

University of Alberta Unsupervised approaches Assume we don't have examples of word-phoneme pairs to train a model; we can start from a list of possible letter-phoneme mappings. Or assume we have only a small set of example pairs (~100 pairs). Don't expect to outperform the supervised approach, but take advantage of being an unsupervised method.

University of Alberta References
Collins, M. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (EMNLP), Volume 10. Association for Computational Linguistics, Morristown, NJ, 1-8.
Crammer, K. and Singer, Y. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research 3 (Mar. 2003).
Kristina Toutanova and Robert C. Moore. Pronunciation modeling for improved spelling correction. In ACL '02.
John Kominek and Alan W. Black. Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies. In HLT-NAACL 2006.
Walter M. P. Daelemans and Antal P. J. van den Bosch. 1997. Language-independent data-oriented grapheme-to-phoneme conversion. In Progress in Speech Synthesis. Springer, New York.
Alan W. Black, Kevin Lenzo, and Vincent Pagel. 1998. Issues in building general letter to sound rules. In The Third ESCA Workshop on Speech Synthesis.

University of Alberta References
Robert I. Damper, Yannick Marchand, John DS. Marsters, and Alexander I. Bazin. 2005. Aligning text and phonemes for speech technology applications using an EM-like algorithm. International Journal of Speech Technology, 8(2), June 2005.
Eric Sven Ristad and Peter N. Yianilos. 1998. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5).
Walter Daelemans, Jakub Zavrel, Ko Van Der Sloot, and Antal Van Den Bosch. 2004. TiMBL: Tilburg Memory Based Learner, version 5.1, reference guide. ILK Technical Report Series.