
ATraNoS: Automatic Transcription and Normalisation of Speech
8 October 2002, DARTS 2002
Jacques Duchateau, Patrick Wambacq, Johan Depoortere, Jean-Pierre Martens, Vincent Vandeghinste, Frank Van Eynde, Erik Tjong Kim Sang, Walter Daelemans

Outline
- Project overview
- Tasks and results
- Conclusions

ATraNoS
- Automatic Transcription and Normalisation of Speech
- IWT-STWW TOP project, 2 x 2 years, €1.25M
- Started 1 October 2000
- Partners: ESAT/KULeuven, ELIS/UGent, CCL/KULeuven, CNTS/UIA

Project aims
- Automatic transcription of spontaneous speech
- Conversion of the transcriptions according to the application, e.g. subtitling (the test vehicle in this project)

Work packages
- WP1: segmentation of the audio stream into homogeneous segments (ELIS):
  - preprocessor for the speech decoder
  - segments containing a single type of signal (wideband speech, telephone speech, background, etc.)
  - label segments, cluster speakers
  - induce only a small delay

WP1 results
Speech/non-speech segmentation using GMMs (Gaussian Mixture Models)
- 65% of the non-speech removed while preserving more than 98% of the speech
- mean duration of the speech segments is 40 seconds (already easy to handle)
- performance in accordance with the literature
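
The speech/non-speech step above can be sketched as a small likelihood-ratio classifier. This is a toy illustration, not the project's actual models: a single 1-D Gaussian per class over a made-up frame-energy feature stands in for the full multivariate mixtures, and all training values are invented.

```python
# Toy sketch of GMM-style speech/non-speech frame classification (WP1).
# One 1-D Gaussian per class over a frame-energy feature; real systems use
# multivariate mixtures over spectral features. All numbers are illustrative.
import math

def gaussian_logpdf(x, mean, var):
    # Log density of a 1-D Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit(values):
    # Maximum-likelihood mean and variance (variance floored for stability).
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)

def classify(frame, speech_model, nonspeech_model):
    # Pick the class whose model assigns the higher likelihood.
    ls = gaussian_logpdf(frame, *speech_model)
    ln = gaussian_logpdf(frame, *nonspeech_model)
    return "speech" if ls > ln else "non-speech"

# Train on labelled frame energies (made-up numbers).
speech_model = fit([8.0, 9.5, 10.2, 7.8, 9.1])
nonspeech_model = fit([1.0, 0.5, 1.8, 0.9, 1.2])

labels = [classify(e, speech_model, nonspeech_model) for e in [9.0, 1.1, 8.5]]
```

Consecutive frames with the same label would then be merged into the homogeneous segments that the decoder consumes.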

WP1 results
Segmentation of speech segments using the BIC (Bayesian Information Criterion)
- Recall = 65%: detection of 72.5% of the speaker changes and 24.3% of the acoustic condition changes, with 19.0% false alarms
- Recall = 72%: detection of 78.5% of the speaker changes and 37.4% of the acoustic condition changes, with 41.3% false alarms
- Results competitive with the literature
- Very fast algorithm (1 minute of processing per hour of audio)
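
The BIC test behind this slide compares modelling a window with one Gaussian against splitting it into two, minus a complexity penalty. A minimal 1-D sketch of that criterion, with invented data (real systems apply it to multivariate cepstral features and scan over candidate split points):

```python
# Sketch of the BIC change-point criterion used for speaker/condition
# segmentation (WP1). 1-D features and the penalty term are illustrative.
import math

def bic_delta(x, split, lam=1.0):
    """Delta-BIC for splitting the 1-D sequence x at index `split`.
    Positive values suggest a change point (two models fit better)."""
    def nll(seg):
        # Negative log-likelihood of a Gaussian fit, up to constants that
        # cancel between the one-model and two-model terms.
        n = len(seg)
        m = sum(seg) / n
        var = max(sum((v - m) ** 2 for v in seg) / n, 1e-6)
        return 0.5 * n * math.log(var)
    # Two extra parameters (mean, variance) in the 1-D two-model case.
    penalty = lam * 0.5 * 2 * math.log(len(x))
    return nll(x) - nll(x[:split]) - nll(x[split:]) - penalty

changed = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]   # clear change at 4
steady = [0.0, 0.1, -0.1, 0.05, 0.02, 0.08, -0.05, 0.03]  # one speaker

d_change = bic_delta(changed, 4)
d_steady = bic_delta(steady, 4)
```

Tuning the penalty weight `lam` trades recall against the false-alarm rate, which is the trade-off the two operating points on the slide illustrate.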

Work packages (cont'd)
- WP2: detection and handling of OOV (out-of-vocabulary) words:
  - extension of the lexicon (CCL): compounding module to reduce the OOV rate
  - augmenting recognition results with confidence measures (ESAT): OOV detection
  - phoneme-to-grapheme conversion (CNTS): transcribing OOV words

Architecture
Speech Recognizer (input: speech, output: text) → confidence threshold → suspected OOV words → Phoneme Recognizer → phoneme string → P2G Converter (trained on training data) → spelling → spelling correction with a large vocabulary
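
The control flow of this architecture can be sketched in a few lines. Everything here is a stand-in for the project's actual modules: the confidence values, the phoneme strings, and the tiny `p2g_convert` lookup are all hypothetical.

```python
# Sketch of the WP2 OOV-handling pipeline: recognized words below a
# confidence threshold are treated as suspected OOVs and routed through the
# phoneme-recognizer + P2G-converter branch. All components are stand-ins.

def p2g_convert(phonemes):
    # Hypothetical phoneme-to-grapheme step; a real converter is a trained
    # model, not a lookup table.
    table = {"k a t": "kat", "h OY s": "huis"}
    return table.get(phonemes, phonemes.replace(" ", ""))

def handle_utterance(recognized, threshold=0.5):
    """recognized: list of (word, confidence, phoneme_string) triples."""
    output = []
    for word, conf, phonemes in recognized:
        if conf >= threshold:
            output.append(word)            # trust the recognizer
        else:
            output.append(p2g_convert(phonemes))  # suspected OOV
    return " ".join(output)

result = handle_utterance([("de", 0.9, "d @"), ("???", 0.2, "k a t")])
```

A final large-vocabulary spelling-correction pass, as on the slide, would then clean up near-miss spellings produced by the P2G branch.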

WP2 results
Detection and handling of Out-Of-Vocabulary (OOV) words
- Compounding module in combination with ASR: recognition accuracy does not drop because of the shorter lexical units; after recomposition: 10 to 20% relative improvement in OOV rate compared with the baseline
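
The compounding idea is that an OOV compound can be covered by in-vocabulary parts and recomposed after recognition. A minimal sketch, with an invented four-word lexicon and a greedy decomposition that is far simpler than the actual CCL module:

```python
# Sketch of lexicon extension via compounding (WP2): an OOV compound is
# decomposed into known parts for recognition and recomposed afterwards.
# Lexicon and decomposition strategy are illustrative only.
LEXICON = {"voetbal", "club", "wereld", "kampioen"}

def decompose(word):
    """Split `word` into lexicon entries (longest prefix first), else None."""
    if word in LEXICON:
        return [word]
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if head in LEXICON:
            rest = decompose(tail)
            if rest:
                return [head] + rest
    return None

def recompose(parts):
    # After recognition, glue the recognized parts back together.
    return "".join(parts)

parts = decompose("wereldkampioen")  # OOV compound covered by two parts
word = recompose(parts)
```

Because the parts are ordinary lexicon entries, recognition accuracy need not suffer, while recomposition recovers the original compound, which is the effect the slide reports.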

WP2 results
Detection and handling of Out-Of-Vocabulary (OOV) words
- Confidence measures with ASR: based on a combination of measures from the literature, plus own work
- Phoneme-to-grapheme conversion based on machine learning methods
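
Machine-learning P2G conversion can be illustrated with a memory-based classifier: each training instance is a phoneme with its left/right context, and the class is the grapheme it maps to. This is only in the spirit of such methods; the two-word training set, the window size, and the overlap similarity are all invented.

```python
# Memory-based phoneme-to-grapheme sketch (WP2). Each phoneme is classified
# from a context window by its nearest stored training instance. Training
# data and similarity measure are illustrative.

def windows(phonemes):
    # One (left, focus, right) window per phoneme, padded at the edges.
    padded = ["_"] + phonemes + ["_"]
    return [tuple(padded[i:i + 3]) for i in range(len(phonemes))]

def train(pairs):
    # Store every (window, grapheme) instance verbatim.
    memory = []
    for phonemes, graphemes in pairs:
        for feat, g in zip(windows(phonemes), graphemes):
            memory.append((feat, g))
    return memory

def classify(memory, feat):
    # Nearest neighbour by positional feature overlap.
    def overlap(a, b):
        return sum(x == y for x, y in zip(a, b))
    return max(memory, key=lambda item: overlap(item[0], feat))[1]

def p2g(memory, phonemes):
    return "".join(classify(memory, f) for f in windows(phonemes))

memory = train([(["k", "a", "t"], "kat"), (["b", "a", "l"], "bal")])
spelling = p2g(memory, ["k", "a", "l"])
```

Even an unseen phoneme string gets a plausible spelling by analogy with the stored instances, which is the appeal of memory-based methods for this task.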

P2G converter results
- Performance: measured at grapheme level and word level, for all words and for OOVs
- Spelling correction: net effect of 8.6 (OOVs)
- (Simulated) interaction with the speech recognizer: increases WER, but improves readability

Work packages (cont'd)
- WP3: spontaneous speech problems:
  - detection of disfluencies (ELIS): use acoustic/prosodic features; supply the information to the HMM recognizer
  - statistical language model (ESAT): extend the traditional trigram LM to incorporate hesitations, filled pauses, self-corrections and repetitions, modelling a sequence of clean speech islands
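
The "clean speech islands" idea can be sketched for the filled-pause case: fillers are skipped when forming the n-gram history, so the model still conditions on fluent context. The tiny corpus, filler list, and unsmoothed counts are illustrative, not the ESAT model.

```python
# Sketch of a trigram LM extended for spontaneous speech (WP3): filled
# pauses are removed from the n-gram history so that probabilities are
# conditioned on the surrounding clean speech. Corpus and counts are toys;
# a real model also needs smoothing.
from collections import defaultdict

FILLERS = {"uh", "um"}

def clean_history(words):
    # Drop fillers so fluent words remain adjacent in the history.
    return [w for w in words if w not in FILLERS]

def train_trigrams(sentences):
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        clean = ["<s>", "<s>"] + clean_history(sent)
        for i in range(2, len(clean)):
            counts[(clean[i - 2], clean[i - 1])][clean[i]] += 1
    return counts

def prob(counts, history, word):
    hist = tuple(clean_history(list(history))[-2:])
    following = counts[hist]
    total = sum(following.values())
    return following[word] / total if total else 0.0

counts = train_trigrams([["the", "cat", "sat"], ["the", "cat", "slept"]])
# The filler "uh" does not disrupt the trigram context ("the", "cat").
p = prob(counts, ("the", "uh", "cat"), "sat")
```

Without the filler-skipping step, the history would be the unseen bigram ("uh", "cat") and the estimate would collapse, which is exactly the problem the extended LM addresses.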

Work packages (cont'd)
- WP4: subtitling:
  - data collection and automatic alignment (CNTS)
  - input/output specifications (CCL): linguistic characteristics
  - subtitling: statistical approach (CNTS)
  - subtitling: linguistic approach (CCL)
  - hybrid system possible?

Data collection and alignment
News autocues and subtitles → (semi-)automatic data capture → (semi-)automatic alignment → linguistic annotation → training data → machine learner → classifier (autocues → subtitles)
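
The alignment step pairs each subtitle with the autocue sentence it corresponds to, so the pairs can serve as training data for a subtitling classifier. A minimal sketch using a bag-of-words Jaccard similarity; the example texts and the similarity measure are illustrative, not the project's actual alignment procedure.

```python
# Sketch of autocue-subtitle alignment for training-data collection (WP4):
# each subtitle is paired with the autocue sentence it overlaps most with.
# Texts and the word-overlap similarity are illustrative only.

def similarity(a, b):
    # Jaccard overlap between the word sets of two sentences.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def align(autocues, subtitles):
    # Greedy best-match pairing; real alignment also exploits ordering.
    pairs = []
    for sub in subtitles:
        best = max(autocues, key=lambda cue: similarity(cue, sub))
        pairs.append((best, sub))
    return pairs

autocues = ["The minister announced new measures today",
            "Heavy rain caused flooding in the south"]
subtitles = ["Minister announces new measures",
             "Rain floods the south"]
pairs = align(autocues, subtitles)
```

Each resulting (autocue, subtitle) pair is one training instance showing how full sentences are condensed into subtitles, which is what the machine learner on this slide is trained on.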

Conclusions