Linguistic knowledge for Speech recognition


Linguistic knowledge for Speech recognition
By: Ahmed Aly
06/05/2013

Project description
The main goal of this project is to study the effect of using linguistic knowledge on the task of speech recognition. I am studying the use of such knowledge in the following contexts:
- Using higher-level linguistic knowledge for speech recognition error correction
- Using prosody models for conversational speech recognition
- The effect of using syntactic and semantic information on the performance of speech recognition error detection
- The effect of using prosody features in subjectivity analysis of speech

Work done so far …
- Studying the theory behind the automatic speech recognizer module
- Literature review of the use of linguistic knowledge in the speech recognition task
- Literature review of the use of prosody in subjectivity analysis of speech

Main findings so far …
In the context of error correction:
- Syllable-based models have been shown to outperform word-based models in domain-specific IR applications. They need a relatively smaller training set, and they can handle both intra-word transformations and syllable-to-syllable transformations.
- Semantic knowledge has also proved successful in the task of error correction. One example of such knowledge is the lexico-semantic pattern (LSP), a structure in which linguistic entries and semantic types are used in combination to abstract certain sequences of words in a text.
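As a rough illustration of the lexico-semantic pattern idea, the sketch below abstracts a word sequence by replacing words with a semantic type where one is known and keeping the literal word otherwise. The lexicon, the type names (CITY, DAY), the template, and the function names are all hypothetical and chosen only for this example; they are not taken from the systems covered in the literature review.

```python
# Hypothetical toy lexicon mapping words to semantic types.
# The entries and type names are assumptions made for illustration.
SEMANTIC_TYPES = {
    "boston": "CITY",
    "denver": "CITY",
    "monday": "DAY",
    "friday": "DAY",
}


def to_lsp(words):
    """Abstract a word sequence into a mix of literal words and semantic types.

    A word that appears in the lexicon is replaced by its semantic type;
    any other word is kept as a lowercased literal entry.
    """
    return [SEMANTIC_TYPES.get(w.lower(), w.lower()) for w in words]


def matches(template, words):
    """Return True if the abstracted word sequence equals the template."""
    return to_lsp(words) == template


# Hypothetical pattern for a flight query.
TEMPLATE = ["flights", "from", "CITY", "to", "CITY", "on", "DAY"]

hypothesis = "flights from Boston to Denver on Friday".split()
print(to_lsp(hypothesis))
print(matches(TEMPLATE, hypothesis))
```

One template covers every sentence that differs only in the city or day, which is the sense in which a pattern abstracts over word sequences. A recognizer hypothesis that almost matches a known pattern could then be flagged as a candidate for correction, although how a given system uses the match is not specified here.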

Main findings so far … (continued)
In the context of prosody:
- Experiments in acoustic model clustering show that representing syllable position and stress as conditioning factors leads to gains in recognition performance and reduced system complexity.
- Other experiments with pronunciation modeling show gains in surface-form phone prediction from conditioning directly on acoustic-prosodic features.
For subjectivity analysis:
- Prosody features have been shown not to be very useful: character n-gram and word n-gram features outperformed them.
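For readers unfamiliar with the n-gram features mentioned in the subjectivity finding, here is a minimal sketch of how character and word n-gram counts might be extracted from a transcript. The function names and the whitespace tokenization are assumptions for illustration; the reviewed studies may tokenize and weight features differently.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every run of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count every run of n consecutive whitespace-separated tokens."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


utterance = "i really really like it"
print(char_ngrams(utterance, 3).most_common(3))
print(word_ngrams(utterance, 2).most_common(3))
```

The resulting counts can be fed to any standard text classifier. They are computed from the transcript alone, so unlike prosody features they need no access to the audio signal.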

Q&A