Construction of phoneme-to-phoneme converters

Presentation transcript:

Towards improved proper name recognition
Bert Réveil and Jean-Pierre Martens
DSSP group, Ghent University, Department of Electronics and Information Systems
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium

Topic description

Automatic proper name recognition is a key component of many speech-based applications (e.g. voice-driven navigation systems). Recognition is challenged by the mismatch between the way names are represented in the recognizer and the way they are actually pronounced:
- Incorrect phonemic name transcriptions: common grapheme-to-phoneme (G2P) converters cannot cope with archaic spellings and foreign name parts (e.g. Ugchelsegrensweg, Haînautlaan), and manual transcriptions are too costly
- Multiple plausible name pronunciations, within or across languages (e.g. Roger)
- Cross-lingual pronunciation variation: foreign names, foreign application users

To improve the phonemic transcriptions and to capture the pronunciation variation, we adopt acoustic and lexical modeling approaches. Acoustic modeling targets a better modeling of the expected utterance sounds; lexical modeling tries to foresee the most plausible phonemic transcription(s) for each name in the recognition lexicon.

Experimental set-up

Database: Autonomata Spoken Name Corpus (ASNC)
- 120 Dutch, 40 English, 20 French, 40 Moroccan and 20 Turkish speakers
- Every speaker reads 181 names of Dutch, English, French, Moroccan or Turkish origin
- Non-overlapping train and test sets (disjoint names and speakers)
- Human expert transcriptions:
  - TY: typical Dutch transcription (one per name, from TeleAtlas)
  - AV: auditorily verified Dutch transcription (one per name utterance)
- This work: only Dutch native utterances + non-native utterances of Dutch names

Speech recognizer: state-of-the-art VoCon 3200 from Nuance
- Grammar: name loop with 21K different names (3.5K ASNC names + …K others)

[Figure: recognition system schematic — a GPS query "Please guide me towards ‘A&u.stIn" is decoded with HMMs and a lexicon containing the entry Austin → 'O.stIn]

Table 1: Number of utterances for all (speaker, name) pairs in train and test set (rows: (DU,*) and (*,DU), each split into train and test; columns: DU, EN, FR, MO, TU; the counts did not survive in this transcript)

Construction of phoneme-to-phoneme converters

P2P learning requires the orthographic transcription, an initial G2P transcription and a target phonemic transcription (e.g. TY or AV) of a sufficiently large collection of name utterances. These 3-tuples are supplied to a 4-step training procedure:
1. Two-fold alignment: orthography ↔ initial transcription ↔ target transcription
2. Transformation retrieval
3. Generation of training examples that describe the linguistic context:
   - previous and next phonemes and graphemes
   - lexical context (part of speech)
   - prosodic context (stressed syllable or not)
   - morphological context (word prefix/suffix)
   - external features, e.g. name type, name source, speaker tongue
4. Rule induction:
   - learn a decision tree per input pattern, with stochastic rules in the leaf nodes
   - rule formalism: if context → leaf node, then [input pattern] → [output pattern] with probability P

In generation mode, the rules are applied to the initial G2P transcription of an unseen name, yielding pronunciation variants with probabilities.

[Figure: P2P training flow — orthography (~Dirk Van Den ~Bossche) and initial transcription ('dIrk_fAn_dEn_'bO.s$) pass through a letter-to-sound and a sound-to-sound alignment against the target transcription ('dirk_vAn_d$m_'bO.s$); then transformation learning, example generation with high-level features, learning of morphological classes, and stochastic rule induction]

Acoustic and lexical modeling strategies

The modeling approaches are conceived first of all for the primary targeted users, the native (NAT) users (in our case Dutch natives). With respect to these users, two types of non-native languages are distinguished: foreign languages that most NAT speakers are familiar with (NN1), and other foreign languages (NN2).

Strategy 1: Incorporating NN1 language knowledge
- Acoustic modeling: two model sets
  - AC-MONO: standard NAT Dutch model (trained on Dutch speech alone)
  - AC-MULTI: trained on Dutch (20%) and NN1 data (English, French and German)
- Lexical modeling:
  - G2P transcribers for the NAT and NN1 languages (Nuance RealSpeak TTS); foreign transcriptions are nativized in combination with AC-MONO
  - data-driven selection of one extra G2P converter per name origin

Strategy 2: Creating pronunciation variants (lexical modeling)
- computed per (speaker, name) combination
- created from the initial G2P transcriptions by means of automatically learned phoneme-to-phoneme (P2P) converters

Experimental assessment

Incorporating NN1 language knowledge
- Including extra G2P transcriptions (acoustic model = AC-MONO):
  - boost for (DU, non-DU): NAT speakers use NN1 knowledge when reading foreign names, including NN2 names
  - degradation for (DU, DU), reduced by selecting only one extra G2P
- Decoding with the multilingual acoustic model:
  - NAT speakers: loss for NAT names, boost for English names only — Dutch sounds are not as well modeled as before; is English better known than French, or do the English and Dutch sound inventories differ more than the French and Dutch ones?
  - foreign speakers: boost for both NN1 name origins — mother-tongue sounds are better modeled
- Plain (non-nativized) multilingual G2P transcriptions bring no improvement

Creating pronunciation variants
- Baseline P2Ps: Dutch G2P transcriptions as initial transcriptions, AV transcriptions as targets
- Alternative P2Ps for the (DU, NN1) and (NN1, DU) cells:
  - create an additional P2P that starts from the NN1 G2P transcriptions
  - combine the most probable variants generated by both P2P converters
- P2P variants lead to significant improvements for all (speaker, name) cells: …% relative for NAT + foreign names, …% for foreign speakers

Table 2: Name Error Rate (%) for systems with G2P lexicons (rows: (DU,*) and (*,DU) with systems AC-MONO + DUN G2P, AC-MONO + 4 G2P (nativized), AC-MONO + G2P-selection (nativized), AC-MULTI + G2P-selection (nativized), AC-MULTI + G2P-selection (plain); columns: DU, EN, FR, MO, TU; values not preserved in this transcript)

Table 3: Name Error Rate (%) for systems with P2P transcription variants (same layout, comparing AC-MULTI + G2P-selection (nativized) against P2P variants (baseline) and P2P variants (alternative); values not preserved in this transcript)

Acknowledgments

The presented work was carried out in the Autonomata TOO project, granted under the Dutch-Flemish STEVIN programme, with partners RU Nijmegen, Universiteit Utrecht, Nuance and TeleAtlas.

References

[1] B. Réveil, J.-P. Martens and B. D'hoore, "How speaker tongue and name source language affect the automatic recognition of spoken names", in Proc. Interspeech 2009, Brighton, UK.
[2] H. van den Heuvel, B. Réveil and J.-P. Martens, "Pronunciation-based ASR for names", in Proc. Interspeech 2009, Brighton, UK.
[3] B. Réveil, J.-P. Martens and H. van den Heuvel, "Improving proper name recognition by adding automatically learned pronunciation variants to the lexicon", in Proc. LREC 2010, Valletta, Malta.
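The sound-to-sound alignment in the two-fold alignment step can be illustrated with a standard dynamic-programming (Levenshtein-style) alignment of two phoneme sequences. This is an illustrative sketch only, not the authors' implementation; the function name and the unit substitution/indel costs are assumptions:

```python
def align(source, target, sub_cost=1, indel_cost=1):
    """Align two phoneme sequences (e.g. an initial G2P transcription and
    a target AV transcription); return (source, target) phoneme pairs,
    with None marking an insertion or deletion."""
    n, m = len(source), len(target)
    # dp[i][j] = minimal edit cost of aligning source[:i] with target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + (0 if source[i - 1] == target[j - 1] else sub_cost)
            dp[i][j] = min(match, dp[i - 1][j] + indel_cost, dp[i][j - 1] + indel_cost)
    # trace back to recover the aligned phoneme pairs
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + (0 if source[i - 1] == target[j - 1] else sub_cost)):
            pairs.append((source[i - 1], target[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + indel_cost:
            pairs.append((source[i - 1], None)); i -= 1   # deletion
        else:
            pairs.append((None, target[j - 1])); j -= 1   # insertion
    return pairs[::-1]

# Toy example with one-character phoneme symbols:
print(align(list("dIrk"), list("dirk")))
# → [('d', 'd'), ('I', 'i'), ('r', 'r'), ('k', 'k')]
```

Each aligned pair is one observation of a phoneme-to-phoneme transformation, which is what the subsequent transformation retrieval step collects.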
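The rule induction and generation-mode steps can be sketched in the same spirit. The toy converter below is a deliberate simplification of the system described above: instead of a decision tree per input pattern with the full set of context features, it uses a plain conditional frequency table P(output phoneme | input phoneme, previous phoneme), and a small beam to enumerate the most probable variants. The class name and API are invented for illustration:

```python
from collections import defaultdict
import heapq

class ToyP2P:
    """Toy phoneme-to-phoneme converter: learns stochastic substitution
    rules from aligned (initial, target) phoneme pairs, then generates
    pronunciation variants with probabilities for unseen transcriptions."""

    def __init__(self):
        # counts[(prev_phoneme, input_phoneme)][output_phoneme] = frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, aligned_pairs):
        """aligned_pairs: list of alignments, each a list of
        (initial_phoneme, target_phoneme) pairs (None = deletion)."""
        for pairs in aligned_pairs:
            prev = '#'  # word-boundary context symbol
            for src, tgt in pairs:
                if src is None:          # ignore insertions in this toy model
                    continue
                self.counts[(prev, src)][tgt] += 1
                prev = src

    def variants(self, initial, top_k=3):
        """Apply the learned rules to an initial transcription (list of
        phonemes); return up to top_k (variant, probability) pairs."""
        beam = [(1.0, [])]               # hypotheses: (probability, phonemes)
        prev = '#'
        for src in initial:
            dist = self.counts.get((prev, src))
            total = sum(dist.values()) if dist else 0
            if not total:                # unseen context: copy input phoneme
                new_beam = [(p, phones + [src]) for p, phones in beam]
            else:
                new_beam = []
                for out, c in dist.items():
                    for p, phones in beam:
                        ext = phones + ([out] if out is not None else [])
                        new_beam.append((p * c / total, ext))
            beam = heapq.nlargest(top_k, new_beam, key=lambda h: h[0])
            prev = src
        return [(''.join(ph), p) for p, ph in beam]
```

With two aligned training examples where Dutch /I/ was realized once as /i/ and once as /I/, generation mode produces both variants with probability 0.5 each, mimicking the "variants with probabilities" output of the real P2P converters.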
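For the alternative P2Ps, the poster states that the most probable variants generated by both converters (the one starting from the Dutch G2P and the one starting from the NN1 G2P) are combined. One plausible reading of that combination, sketched here as an assumption rather than the authors' exact procedure, is to pool both variant lists, keep each distinct transcription at its highest probability, and retain the overall best ones:

```python
def combine_variants(variants_a, variants_b, max_variants=4):
    """Merge (transcription, probability) lists from two P2P converters:
    deduplicate transcriptions, keeping the higher probability, then
    return the max_variants most probable ones."""
    best = {}
    for trans, p in variants_a + variants_b:
        if p > best.get(trans, 0.0):
            best[trans] = p
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max_variants]
```

All retained variants would then be entered into the recognition lexicon for that (speaker, name) cell.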
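Tables 2 and 3 report Name Error Rate. Taking it as the fraction of name utterances for which the recognized name differs from the reference name (a common definition, assumed here), the metric is:

```python
def name_error_rate(recognized, reference):
    """Name Error Rate in percent: fraction of utterances whose
    recognized name does not match the reference name."""
    assert len(recognized) == len(reference)
    errors = sum(r != t for r, t in zip(recognized, reference))
    return 100.0 * errors / len(reference)
```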