Advances in WP2 Nancy Meeting – 6-7 July 2006 www.loquendo.com.


1 Advances in WP2 Nancy Meeting – 6-7 July 2006

2 Recent Work on NN Adaptation in WP2
– State-of-the-art LIN adaptation method implemented and experimented on the benchmarks (m12)
– Innovative LHN adaptation method implemented and experimented on the benchmarks (m21)
– Experimental results on the benchmark corpora and the HIWIRE database with LIN and LHN (m21)
– Further advances on new adaptation methods (m24)

3 LIN Adaptation
[Figure: a speaker-independent MLP (SI-MLP) maps speech signal parameters through an input layer, two hidden layers and an output layer to the emission probabilities of the acoustic-phonetic units; a trainable LIN layer is inserted before the input layer.]
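As a concrete illustration, the LIN scheme can be sketched in a few lines of PyTorch (a minimal sketch under assumed names; the original work used Loquendo's own ANN/HMM system, not PyTorch). A square linear layer, initialized to the identity, is prepended to the frozen speaker-independent MLP and is the only part updated on the adaptation data:

    import torch
    import torch.nn as nn

    class LINAdaptedMLP(nn.Module):
        """Frozen speaker-independent MLP preceded by a trainable LIN."""

        def __init__(self, si_mlp: nn.Module, input_dim: int):
            super().__init__()
            # Square linear layer initialized to the identity, so adaptation
            # starts exactly from the speaker-independent network.
            self.lin = nn.Linear(input_dim, input_dim)
            nn.init.eye_(self.lin.weight)
            nn.init.zeros_(self.lin.bias)
            self.si_mlp = si_mlp
            for p in self.si_mlp.parameters():  # SI weights stay fixed
                p.requires_grad = False

        def forward(self, x):
            # Speech parameters are linearly remapped before entering the SI-MLP.
            return self.si_mlp(self.lin(x))

During adaptation only the LIN weights are passed to the optimizer, e.g. torch.optim.SGD(model.lin.parameters(), lr=1e-3); error back-propagation still flows through the frozen SI-MLP to reach them.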

4 LHN Adaptation
[Figure: the same SI-MLP, with the trainable LHN layer inserted after a hidden layer instead of before the input layer.]
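LHN can be sketched the same way, except that the identity-initialized linear transform is inserted after a hidden layer rather than before the input. This sketch assumes the SI-MLP is an nn.Sequential and that k indexes the hidden layer after which the LHN is placed:

    import torch.nn as nn

    def insert_lhn(si_mlp: nn.Sequential, k: int, hidden_dim: int) -> nn.Sequential:
        """Return si_mlp with an identity-initialized LHN inserted after layer k."""
        for p in si_mlp.parameters():            # freeze all SI weights
            p.requires_grad = False
        lhn = nn.Linear(hidden_dim, hidden_dim)  # the only trainable block
        nn.init.eye_(lhn.weight)
        nn.init.zeros_(lhn.bias)
        layers = list(si_mlp.children())
        return nn.Sequential(*layers[:k + 1], lhn, *layers[k + 1:])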

5 Results Summary (% WER)
[Table: baseline WER, LIN-adapted WER with error reduction (E.R.) and LHN-adapted WER with error reduction, on three test sets: WSJ0 (16 kHz, bigram LM), WSJ1 Spoke-3 (16 kHz, bigram LM) and HIWIRE (8 kHz).]

6 Papers presented:
– Roberto Gemello, Franco Mana, Stefano Scanzio, Pietro Laface, Renato De Mori, “Adaptation of Hybrid ANN/HMM models using hidden linear transformations and conservative training”, Proc. ICASSP 2006, Toulouse, France, May 2006
– Dario Albesano, Roberto Gemello, Pietro Laface, Franco Mana, Stefano Scanzio, “Adaptation of Artificial Neural Networks Avoiding Catastrophic Forgetting”, Proc. IJCNN 2006, Vancouver, Canada, July 2006

7 The “Forgetting” Problem in ANN Adaptation
– It is well known in connectionist learning that acquiring new information during adaptation can damage previously learned information (catastrophic forgetting).
– This effect must be taken into account when adapting an ANN with a limited amount of data that does not include enough samples for all the classes.
– The “absent” classes may be forgotten during adaptation, because discriminative training (error back-propagation) always assigns zero targets to absent classes.

8 “Forgetting” in ANNs for ASR
– When adapting an ASR ANN/HMM model, this problem can arise when the adaptation set contains no examples of some phonemes, due to the limited amount of adaptation data or a limited vocabulary.
– ANN training is discriminative, unlike that of GMM-HMMs: absent phonemes are penalized by assigning them a zero target during adaptation.
– This destroys the ANN's ability to classify the absent phonemes. Thus, while HMM models for phonemes with no observations simply remain unadapted, the ANN output units for those phonemes lose their characterization rather than staying unadapted.

9 Example of Forgetting
– Adaptation examples only of E, U, O (e.g. from the words uno, due, tre); no examples of the other vowels (A, I, ə).
– The classes with examples adapt themselves, but tend to invade the classes with no examples, which are partially “forgotten”.
[Figure: two F1-F2 (kHz) vowel charts, before and after adaptation, showing the vowel classes I, E, e, A, U, O; the regions of E, U, O expand into those of the absent vowels.]
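The invasion effect is easy to reproduce with a toy softmax classifier (a hypothetical 2-D dataset standing in for the F1-F2 vowel space; the architecture, data and hyper-parameters below are all invented for the demonstration). Class 0 plays the absent vowel: after adaptation on classes 1 and 2 only, the probability assigned to held-out class-0 examples typically collapses:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy 2-D classifier; class 0 stands in for a vowel absent from adaptation.
    net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 3))
    means = torch.tensor([[0.0, 2.0], [2.0, 0.0], [-2.0, 0.0]])

    def sample(cls, n):
        # Gaussian cloud around the class mean
        return means[cls] + 0.3 * torch.randn(n, 2)

    # "Speaker-independent" pre-training on all three classes.
    x = torch.cat([sample(c, 200) for c in range(3)])
    y = torch.arange(3).repeat_interleave(200)
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.cross_entropy(net(x), y).backward()
        opt.step()

    probe = sample(0, 100)  # held-out examples of the soon-to-be-absent class
    print("P(class 0) before adaptation:",
          net(probe).softmax(-1)[:, 0].mean().item())

    # Standard-policy adaptation: only classes 1 and 2 occur, so class 0
    # implicitly receives a zero target on every frame.
    xa = torch.cat([sample(c, 200) for c in (1, 2)])
    ya = torch.tensor([1] * 200 + [2] * 200)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.cross_entropy(net(xa), ya).backward()
        opt.step()

    print("P(class 0) after adaptation:",
          net(probe).softmax(-1)[:, 0].mean().item())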

10 “Conservative” Training
– We have introduced “conservative” training to avoid the forgetting of absent phonemes.
– The idea is to avoid a zero target for the absent phonemes, using for them the output of the original NN as target.
– Let F_P be the set of phonemes present in the adaptation set, F_A the set of absent ones, c the correct class for the current frame, and o_i the posterior computed by the original (unadapted) NN for class i. The targets t_i are assigned according to the following equations:

    Standard policy:      t_c = 1;  t_i = 0 for all i ≠ c

    Conservative policy:  t_i = o_i                 for i ∈ F_A
                          t_i = 0                   for i ∈ F_P, i ≠ c
                          t_c = 1 − Σ_{j ∈ F_A} o_j
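A small NumPy sketch of the target-assignment rule above (the function name is invented, and the explicit sum-to-one handling of the correct class is my reading of the policy, not code from the slides):

    import numpy as np

    def conservative_targets(orig_posteriors, correct, absent):
        """Targets for one frame under the conservative policy.

        orig_posteriors: outputs of the original, unadapted NN (one per class)
        correct:         index of the correct phoneme class
        absent:          indices of phonemes absent from the adaptation set (F_A)
        """
        t = np.zeros_like(orig_posteriors)
        t[absent] = orig_posteriors[absent]  # absent classes keep the original
                                             # network's response as target
        t[correct] = 1.0 - t[absent].sum()   # correct class takes the remainder,
                                             # so the targets still sum to one
        return t

    # Example: 5 classes, class 1 correct, classes 0 and 4 absent
    p = np.array([0.05, 0.60, 0.20, 0.10, 0.05])
    print(conservative_targets(p, correct=1, absent=[0, 4]))
    # -> [0.05 0.9  0.   0.   0.05]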

11 Conservative Training target assignment policy
[Figure: target assignment for five output classes A1, P1, P2, P3, A2, where P2 is the class corresponding to the correct phoneme, P_x are classes present in the adaptation set and A_x are absent classes. Under the standard policy only P2 receives a non-zero target; under the conservative policy the absent classes A1 and A2 receive the posterior probabilities computed by the original network.]

12 “Conservative” Training
– In this way, the phonemes that are absent from the adaptation set are “represented” by the response of the original NN.
– Thus, the absent phonemes are not “absorbed” by the neighboring present phonemes.
– The results of adaptation with conservative training are:
  – comparable performance on the target environment;
  – preservation of performance on the generic environment;
  – a large improvement in speaker adaptation, when only a few sentences are available.

13 Adaptation tasks
– Application data adaptation: Directory Assistance, 9325 Italian city names, with training and test utterance sets
– Vocabulary adaptation: command words, a 30-word command vocabulary, 6189 training utterances plus a test set
– Channel-environment adaptation: Aurora-3, with training and test utterance sets

14 Adaptation Results on different tasks (% WER)
[Table: WER for each adaptation method (no adaptation, LIN, LIN + CT, LHN, LHN + CT) on the three tasks: Application (Directory Assistance), Vocabulary (Command Words) and Channel-Environment (Aurora-3 CH1).]

15 Mitigation of Catastrophic Forgetting using Conservative Training
[Table: tests using the adapted models on Italian continuous speech (% WER). Models adapted with LIN, LIN + CT, LHN and LHN + CT on the Application (Directory Assistance), Vocabulary (Command Words) and Channel-Environment (Aurora-3 CH1) tasks, against a no-adaptation baseline of 29.3% WER.]

16 Conclusions
– The new LHN adaptation method, developed within the project, outperforms standard LIN adaptation.
– In adaptation tasks with missing classes, conservative training reduces the catastrophic forgetting effect, preserving performance on a generic task.

17 Workplan
– Selection of suitable benchmark databases (m6)
– Baseline set-up for the selected databases (m8)
– LIN adaptation method implemented and experimented on the benchmarks (m12)
– Experimental results on the HIWIRE database with LIN (m18)
– Innovative NN adaptation methods and algorithms for acoustic modeling, with experimental results (m21)
– Further advances on new adaptation methods (m24)
– Unsupervised adaptation: algorithms and experimentation (m33)