LABORATOIRE D'INFORMATIQUE CERI 339 Chemin des Meinajariès BP 1228 84911 AVIGNON CEDEX 09 Tél. + 33 (0)4 90 84 35 09 Fax. + 33 (0)4 90 84 35 01

Presentation transcript:

The SPEERAL Decoder
NOCERA Pascal
Laboratoire d'Informatique d'Avignon, AGROPARC BP 1228, AVIGNON Cedex 9

FLAVOR workshop

The SPEERAL System
Stochastic approach
Find the best hypothesis among all the possible hypotheses with the A* algorithm.

Acoustic Models
Hidden Markov Models
Gaussian Mixture Models
Contextual Models (Phonemes)

Acoustic Model Toolkit
Parameterization program
Text-to-phone program
Alignment program
HMM learning program
Supervised and unsupervised Model Adaptation
  – MLLR
  – MAP
  – Structural Model Space Transformation

Linguistic Models
Stochastic Language Models
  – N-grams
  – Class-based language models

Linguistic Model Toolkit
Text Normalization Tools
Language Model Training
  – CMU toolkit
  – SRI toolkit
  – AT&T toolkit
Language Model Compilation
Lexicon Compilation

Standard A* algorithm
"Best-first" search algorithm
  – Extend the best path to generate new candidates
  – Assign a score F(x) = g(x) + h(x) to every explored path
      g(x) combines Language Model and acoustic scores
      h(x) estimates the probability of the best extension
  – Keep the list of explored paths as a priority queue
  – Stop when the best path reaches the end

Standard A* algorithm (2/2)
Requires an admissible heuristic function
  – h(x) underestimates the true remaining path cost (the more accurate the better).
Heuristic examples
  – h(x) = 0: uninformed search (uniform-cost, i.e. breadth-first when all steps cost the same)
  – h(x) = true remaining cost (i.e. F(x) never changes): deterministic search
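The best-first loop described on the two A* slides can be sketched as follows. This is a generic textbook A* over a toy graph, not the SPEERAL implementation: scores are treated as additive costs (for example negative log probabilities), and the names `a_star`, `GRAPH`, `H_ADMISSIBLE` and `demo_search` are hypothetical, introduced only for illustration.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Node = Hashable


def a_star(
    start: Node,
    is_goal: Callable[[Node], bool],
    successors: Callable[[Node], Iterable[Tuple[Node, float]]],
    h: Callable[[Node], float],
) -> Optional[Tuple[float, List[Node]]]:
    """Return (cost, path) of the cheapest path to a goal, or None."""
    counter = 0  # tie-breaker so the heap never has to compare paths
    # Heap entries are (F = g + h, tie-breaker, g, path).
    frontier = [(h(start), counter, 0.0, [start])]
    best_g: Dict[Node, float] = {start: 0.0}
    while frontier:
        _, _, g, path = heapq.heappop(frontier)
        node = path[-1]
        if is_goal(node):
            # With an admissible h, the first goal popped is optimal.
            return g, path
        if g > best_g[node]:
            continue  # stale entry: a cheaper path to this node is known
        for nxt, step_cost in successors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                counter += 1
                heapq.heappush(frontier, (g2 + h(nxt), counter, g2, path + [nxt]))
    return None


# Toy graph (hypothetical): node -> list of (successor, step cost).
GRAPH = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}
# Admissible heuristic: never larger than the true remaining cost.
H_ADMISSIBLE = {"A": 2.0, "B": 2.0, "C": 1.0, "D": 0.0}


def demo_search(heuristic: Callable[[Node], float]):
    return a_star("A", lambda n: n == "D", lambda n: GRAPH[n], heuristic)
```

Calling `demo_search(H_ADMISSIBLE.get)` and `demo_search(lambda n: 0.0)` returns the same optimal path; the zero heuristic simply gives the search no guidance toward the goal.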

The SPEERAL System
Language model
  – Stochastic n-gram LM (n=3)
Lexical, phonetic and acoustic knowledge source
  – Acoustic model (HMM, …)
  – Decoding vocabulary (lexicon)
  – Input signal
Phoneme lattice (p, beg, end, sc) with score sc = P(X_{beg..end} | p) + …/…

Sounding function h
Remaining path estimation
  – Acoustic score only
  – Computed with a backward Viterbi pass during phoneme lattice generation
Heuristic admissibility
  – Underestimates the remaining cost: no LM information
  – Cannot be the true cost (lack of LM information)
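One way to picture the acoustic-only estimate is a backward dynamic-programming pass over lattice arcs. The sketch below is a simplification under stated assumptions: it works on (phoneme, beg, end, cost) arcs rather than on HMM states, costs are taken to be negative log acoustic scores, and `backward_heuristic` and `ARCS` are hypothetical names, not part of SPEERAL.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

Arc = Tuple[str, int, int, float]  # (phoneme, beg frame, end frame, cost)


def backward_heuristic(arcs: Sequence[Arc], n_frames: int) -> List[float]:
    """h[t] = cheapest acoustic-only cost from frame t to the last frame.

    Assumes beg < end for every arc. Frames from which the end cannot
    be reached keep an infinite cost.
    """
    h = [math.inf] * (n_frames + 1)
    h[n_frames] = 0.0
    by_start = defaultdict(list)
    for _phoneme, beg, end, cost in arcs:
        by_start[beg].append((end, cost))
    # Sweep backward in time so h[end] is final before it is used.
    for t in range(n_frames - 1, -1, -1):
        for end, cost in by_start[t]:
            h[t] = min(h[t], cost + h[end])
    return h


# Toy lattice (hypothetical values).
ARCS = [
    ("a", 0, 2, 1.0),
    ("b", 0, 3, 2.5),
    ("c", 2, 4, 1.0),
    ("d", 3, 4, 0.2),
]
```

Because language model costs can only add to a path, an estimate built from acoustic costs alone never exceeds the true remaining cost, which is the admissibility argument on the slide.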

Lexicon
Prefix-tree organization
  – Widely applied
  – Compact representation
  – Search effort occurs at word beginnings
[Figure: example lexicon of three words W1, W2, W3 (W1: p1 p2 p3) and the corresponding prefix tree over phonemes p1, p2, p3 with leaves W1, W2, W3]
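A prefix-tree lexicon can be sketched in a few lines. The pronunciations in `LEXICON` are hypothetical (only W1 is legible in the transcript), and `LexNode`, `build_lexicon_tree` and `count_nodes` are illustrative names.

```python
from typing import Dict, List


class LexNode:
    """One node of the lexicon prefix tree (one phoneme per edge)."""

    def __init__(self) -> None:
        self.children: Dict[str, "LexNode"] = {}
        self.words: List[str] = []  # words whose pronunciation ends here


def build_lexicon_tree(lexicon: Dict[str, List[str]]) -> LexNode:
    root = LexNode()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            # Words with a common phoneme prefix share these nodes.
            node = node.children.setdefault(ph, LexNode())
        node.words.append(word)
    return root


def count_nodes(node: LexNode) -> int:
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical pronunciations, for illustration only.
LEXICON = {
    "W1": ["p1", "p2", "p3"],
    "W2": ["p1", "p3"],
    "W3": ["p2", "p1"],
}
ROOT = build_lexicon_tree(LEXICON)
```

Here W1 and W2 share the node for their first phoneme, so the acoustic score of that phoneme is computed once for both words.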

Search space
Phoneme lattice
Concatenation of lexical trees
[Figure: copies of the lexical tree (words W1, W2, W3) chained one after another from the sentence beginning]

LM look-ahead
Word anticipation
  – n is a lexicon node
  – w_n is any leaf (i.e. word) of the sub-tree starting at n
P(n | ... w_{i-2} w_{i-1}) = Part_LM(n, w_{i-2} w_{i-1})
Part_LM(n, w_{i-2} w_{i-1}) = max_{w_n} [P(w_n | w_{i-2} w_{i-1})]
Paths leading to improbable words are penalized early
[Figure: lexicon prefix tree over phonemes p1, p2, p3 with leaves W1, W2, W3]
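The Part_LM definition above amounts to a maximum over the words reachable from a tree node. The sketch below assumes a dictionary-based prefix tree and a toy trigram table; `build_tree`, `subtree_words`, `part_lm`, `LEXICON` and `TRIGRAM` are hypothetical names and values. A real decoder would likely precompute or cache these maxima rather than walk the sub-tree on every call.

```python
from typing import Callable, Dict, Iterator, List, Tuple

History = Tuple[str, str]  # (w_{i-2}, w_{i-1})


def build_tree(lexicon: Dict[str, List[str]]) -> dict:
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node["children"].setdefault(ph, {"children": {}, "words": []})
        node["words"].append(word)
    return root


def subtree_words(node: dict) -> Iterator[str]:
    """Yield every word w_n that is a leaf of the sub-tree starting at node."""
    yield from node["words"]
    for child in node["children"].values():
        yield from subtree_words(child)


def part_lm(node: dict, history: History, lm_prob: Callable[[str, History], float]) -> float:
    """Part_LM(n, w_{i-2} w_{i-1}) = max over w_n of P(w_n | w_{i-2} w_{i-1})."""
    return max((lm_prob(w, history) for w in subtree_words(node)), default=0.0)


# Hypothetical lexicon and trigram probabilities, for illustration only.
LEXICON = {"W1": ["p1", "p2", "p3"], "W2": ["p1", "p3"], "W3": ["p2", "p1"]}
TRIGRAM = {
    ("the", "big", "W1"): 0.5,
    ("the", "big", "W2"): 0.1,
    ("the", "big", "W3"): 0.4,
}


def lm_prob(word: str, history: History) -> float:
    return TRIGRAM.get(history + (word,), 0.0)


TREE = build_tree(LEXICON)
```

With these values, a path that has only consumed p1 is scored with the probability of W1, and the score drops as soon as the path commits to the branch that can only end in W2.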

Start-synchronous tree
Asynchronous search
  – The search processes the same part (lexicon) with a different history.
With start-synchronous capabilities
  – The most advanced path can be reused when encountered twice.
For each frame x, the lexicon starting at x is stored.
Only the deepest nodes (or leaves) are stored.

Principle (1/5) to (5/5)
[Figure sequence: lexicon prefix trees (phonemes p1, p2, p3; words W1, W2, W3) started at frame 0 and at frame t, with the deepest lexicon nodes reached from each start frame marked; the last step adds a tree started at frame t+n]

Search space pruning
Optimization
  – If two candidates end with the same 3 words, only the best is kept.
Cut
  – Short candidates are dropped when they fall too far behind the deepest candidate.
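The two pruning rules can be illustrated with a small filter over candidate hypotheses. This is a sketch under assumptions that the slide does not state: candidates are (words, end frame, cost) tuples with lower cost being better, two candidates are only merged when they also end at the same frame, and "too far behind" is measured in frames through a hypothetical `max_lag` parameter.

```python
from typing import Dict, List, Tuple

Hyp = Tuple[Tuple[str, ...], int, float]  # (words, end frame, cost)


def prune(hyps: List[Hyp], max_lag: int) -> List[Hyp]:
    # Rule 1 (optimization): among candidates ending with the same
    # last 3 words (and, by assumption, the same end frame), keep the best.
    best: Dict[Tuple[Tuple[str, ...], int], Hyp] = {}
    for words, end_frame, cost in hyps:
        key = (words[-3:], end_frame)
        if key not in best or cost < best[key][2]:
            best[key] = (words, end_frame, cost)
    if not best:
        return []
    # Rule 2 (cut): drop candidates lagging too far behind the deepest one.
    deepest = max(end_frame for _, end_frame, _ in best.values())
    return [hyp for hyp in best.values() if deepest - hyp[1] <= max_lag]


# Toy candidates (hypothetical values).
HYPS = [
    (("a", "b", "c", "d"), 40, 10.0),
    (("x", "b", "c", "d"), 40, 9.0),   # same last 3 words, better cost
    (("a", "b"), 5, 1.0),              # short candidate, far behind
    (("a", "b", "e"), 38, 8.0),
]
```

The three-word key matches the trigram language model: once two candidates share their last words, their language model contexts no longer differ, so only their accumulated scores distinguish them.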

ASR
Output
  – 1-best hypothesis
  – N-best hypotheses
  – Word graph
Applications
  – Transcription
  – Question answering
  – Named entity extraction
  – Information retrieval
  – Call-type classification
  – …

French Broadcast News Campaign ESTER
[Figure: processing of a broadcast news show (1 h long); components shown: acoustic segmentation, speaker segmentation, speech transcription, acoustic models, language models, information extraction]

System Description
Acoustic models: 10k context-dependent HMMs, 3.6k states, 230k Gaussians
Lexicon: 65k words
Language model combination: (Le Monde 87-02, 0.41), (Le Monde 02-03, 0.24), (ESTER, 0.35)
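If the three numbers in the language model combination are read as linear interpolation weights (an assumption, supported only by the fact that they sum to 1), the combined probability of a word is the weighted sum of the component model probabilities. The function name `interpolate` and the component probabilities in `EXAMPLE_PROBS` are hypothetical.

```python
from typing import Dict

# Weights as listed on the slide, interpreted as interpolation weights.
WEIGHTS = {
    "le_monde_87_02": 0.41,
    "le_monde_02_03": 0.24,
    "ester": 0.35,
}


def interpolate(component_probs: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """P(w | h) = sum_i lambda_i * P_i(w | h) for one word and history."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(weights[name] * component_probs[name] for name in weights)


# Hypothetical P_i(w | h) values from the three component models.
EXAMPLE_PROBS = {"le_monde_87_02": 0.010, "le_monde_02_03": 0.020, "ester": 0.030}
COMBINED = interpolate(EXAMPLE_PROBS)
```

Such weights are commonly tuned to minimize perplexity on held-out data from the target domain, which would be consistent with the comparatively large weight given to the ESTER component.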

Results and Demonstration
WER: 25 % (10 RT)
Demonstration on TV
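For reference, word error rate is conventionally the word-level edit distance between the reference and the hypothesis (substitutions, deletions and insertions) divided by the number of reference words. The sketch below shows that standard definition; it is not the official ESTER scoring tool, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words.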