Spotting Multilingual Consonant-Vowel Units of Speech using Neural Network Models
Suryakanth V. Gangashetty, C. Chandra Sekhar, and B. Yegnanarayana

Presentation transcript:

Spotting Multilingual Consonant-Vowel Units of Speech using Neural Network Models
Suryakanth V. Gangashetty, C. Chandra Sekhar, and B. Yegnanarayana
Speech and Vision Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
{svg,chandra,

Speech Signal-to-Symbol Transformation
[Figure: example syllable-level transcriptions of utterances in Indian languages]
Phonetic engine: capable of speech signal-to-symbol transformation independent of vocabulary and language

Approaches to Speech Signal-to-Symbol Transformation
Based on segmentation and labeling
–Segmentation of continuous speech signal into regions of subword units
–Assignment of labels to the segmented regions using a subword unit classifier
Based on spotting subword units in continuous speech
–Detection of anchor points in continuous speech
–Assignment of labels to the segments around the anchor points using a subword unit classifier

Spotting CV Units in Continuous Speech
CV type units have the highest frequency of occurrence in speech in Indian languages
Subword units of CCV, CCCV and CVC types also contain CV segments
Vowel onset point (VOP) can be used as an anchor point for recognition of CV units
Detection of VOPs using distributions of feature vectors of C and V regions
Models for classification of CV segments

Significant Events in a CV Unit

VOP Detection using AANN Models
AANN (autoassociative neural network) models for capturing the distribution of data
One AANN for the consonant region of a CV unit
Another AANN for the vowel region of a CV unit
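As a rough illustration of the idea, the sketch below trains a small autoassociative (reconstruction) network on the feature vectors of one region and scores a frame by exp(-squared reconstruction error). The 13-dimensional input, the expansion/compression layer sizes and the training loop are assumptions of this sketch, not the configuration reported on the slides; PyTorch is used only for convenience.

```python
# Minimal AANN sketch: the network is trained to reproduce its own input,
# so low reconstruction error means the frame fits the captured distribution.
import torch
import torch.nn as nn

class AANN(nn.Module):
    """Autoassociative net: the target output is the input itself."""
    def __init__(self, dim=13, expand=38, compress=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, expand), nn.Tanh(),
            nn.Linear(expand, compress), nn.Tanh(),
            nn.Linear(compress, expand), nn.Tanh(),
            nn.Linear(expand, dim),
        )

    def forward(self, x):
        return self.net(x)

def train_aann(model, frames, epochs=200, lr=1e-3):
    """Fit the AANN to the feature vectors of one region (C or V)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(frames), frames)
        loss.backward()
        opt.step()
    return model

def confidence(model, frames):
    """Frame-wise confidence: exp(-squared reconstruction error)."""
    with torch.no_grad():
        err = ((model(frames) - frames) ** 2).sum(dim=1)
    return torch.exp(-err)

# One AANN is trained on consonant-region frames and another on vowel-region
# frames, e.g.:
#   aann_c = train_aann(AANN(), c_frames)
#   aann_v = train_aann(AANN(), v_frames)
```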

System for Detection of VOPs using AANNs
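A minimal sketch of how the two AANN confidences could be turned into VOP hypotheses: each frame is labelled by the higher-scoring model, and a VOP is hypothesised wherever the label changes from C to V. The 10 ms frame shift and the absence of any smoothing of the label sequence are assumptions of this sketch, not details given on the slides.

```python
import numpy as np

def hypothesise_vops(conf_c, conf_v, frame_shift_ms=10.0):
    """Label each frame with the AANN that scores it higher and hypothesise
    a VOP (in ms) wherever the label changes from C to V."""
    labels = np.where(np.asarray(conf_v) > np.asarray(conf_c), 'V', 'C')
    vops = [i * frame_shift_ms
            for i in range(1, len(labels))
            if labels[i - 1] == 'C' and labels[i] == 'V']
    return labels, vops
```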

Illustration of Detection of VOPs
(a) Waveform, (b) Hypothesised region labels for each frame, (c) Hypothesised VOPs, and (d) Manually marked (actual) VOPs for the Tamil language sentence /kArgil pahudiyilirundu UDuruvalkArarhaL/

Broadcast News Corpus of Indian Languages
Description (number of)    Tamil     Telugu    Hindi     Multilingual
Bulletins
Training bulletins
Testing bulletins
CV classes considered
Training CV segments       43,541    41,725    20,236    1,05,502
Sentences for testing      1,416     1,…       …         …,094

Performance for Detection of VOPs
Matching hypothesis: a hypothesis with a deviation of up to 25 ms from an actual VOP
Missing hypothesis: there is no hypothesis with a deviation of up to 25 ms from an actual VOP
Spurious hypothesis:
–Multiple hypotheses with a deviation of up to 25 ms from the same actual VOP
–A hypothesis with a deviation greater than 25 ms from every actual VOP
VOP hypotheses (in %)    Matching    Missing    Spurious
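The counting rules above can be made concrete with a small scoring routine. The greedy pairing used here (each actual VOP matched by at most one hypothesis, the closest one) is an assumption of this sketch.

```python
def score_vop_hypotheses(hypothesised, actual, tol_ms=25.0):
    """Return (matching, missing, spurious) counts for VOP hypotheses; all
    times in ms. A hypothesis within tol_ms of a still-unmatched actual VOP
    counts as matching; remaining hypotheses are spurious; actual VOPs left
    unmatched are missing."""
    matched = set()
    matching = spurious = 0
    for h in hypothesised:
        near = [i for i, a in enumerate(actual)
                if abs(h - a) <= tol_ms and i not in matched]
        if near:
            matched.add(min(near, key=lambda i: abs(h - actual[i])))
            matching += 1
        else:
            spurious += 1
    missing = len(actual) - len(matched)
    return matching, missing, spurious
```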

Classification of CV Segments using SVMs
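A hedged sketch of the CV-segment classifier: it assumes fixed-length feature vectors extracted around each VOP and uses scikit-learn's SVC as a stand-in multi-class SVM, since the kernel, features and toolkit are not specified on the slide.

```python
from sklearn.svm import SVC

def train_cv_classifier(X_train, y_train):
    """X_train: (num_segments, feature_dim) array of CV-segment features;
    y_train: CV class labels such as 'kA', 'pa', 'mu'."""
    clf = SVC(kernel='rbf', probability=True)
    clf.fit(X_train, y_train)
    return clf

def five_best(clf, x):
    """Return the 5 most probable CV classes for one segment feature vector."""
    probs = clf.predict_proba([x])[0]
    order = probs.argsort()[::-1][:5]
    return [(clf.classes_[i], float(probs[i])) for i in order]
```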

System for Spotting CV Units
The system gives a 5-best performance of about 74.63% for spotting CV units in 300 test sentences containing 3,924 syllable-like units.
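Stitching the sketches above together (confidence, hypothesise_vops and five_best), a spotting pass over one utterance might look like the following; the 4-frame context window around each VOP and the 10 ms frame shift are assumed values for illustration, not the authors' settings.

```python
def spot_cv_units(frames, aann_c, aann_v, clf, context=4, frame_shift_ms=10.0):
    """frames: (num_frames, feature_dim) torch tensor of per-frame features.
    Returns a lattice of (VOP time in ms, 5-best CV hypotheses)."""
    conf_c = confidence(aann_c, frames).numpy()
    conf_v = confidence(aann_v, frames).numpy()
    _, vops = hypothesise_vops(conf_c, conf_v, frame_shift_ms)
    lattice = []
    for t in vops:
        centre = int(t / frame_shift_ms)
        if centre - context < 0 or centre + context + 1 > len(frames):
            continue  # skip VOPs too close to the utterance boundaries
        window = frames[centre - context: centre + context + 1]
        x = window.reshape(-1).numpy()  # flatten the context window
        lattice.append((t, five_best(clf, x)))
    return lattice
```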

Illustration of Spotting CV Units
[Table: for each hypothesised VOP location (sample number), the lattice of 5-best hypothesised CVs is shown alongside the actual syllable; a few actual syllables have no hypotheses because their VOP was missed.]

Summary and Conclusions
Spotting multilingual CV units in continuous speech
AANN models for detecting VOPs
SVM classifier for recognition of CV units around the VOPs
Need to reduce the number of missing VOPs
Further processing of the hypothesised CV lattice

Thank You