Progress Report - V Ravi Chander


ASR for the Elderly: Progress Report - V Ravi Chander

Organisation of Talk
- Goal and Research Questions
- Experimental set-up
- Results
- Next Steps

Goal
- Project: MATCH (Mobilizing Advanced Technologies for Care at Home)
- Role: ASR for a spoken dialogue system in a home care environment for elderly people.

Research Questions (1)
- Adaptation: reduce the amount of annotation required.
- [Diagram: Existing Recogniser -> MATCH, across a different domain and different demographics]

Research Questions (2)
Use a corpus from a different domain to improve performance in the target domain.

Approach
- Only a small corpus of MATCH data exists.
- Validate the ideas using the AMI corpus.

Experimental Set-up
[Block diagram: baseline recogniser trained on CTS-NIST; adaptation data drawn from AMI 1-4; the adapted ASR is tested on AMI 5.]

Sampling Techniques
- Random
- Supervised (likelihood based): at speaker level or utterance level
- Unsupervised: GMM-based closest speakers; utterances selected based on potential informativeness

GMM-based Closest Speakers
- Number of speakers: 136
- One GMM per speaker
- Feature vectors: 13 MFCC coefficients
- Only speech frames used for modelling
- Number of Gaussian components: 32
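As a concrete illustration of this setup, here is a minimal sketch of per-speaker GMM training with the settings above (13 MFCC coefficients, 32 diagonal-covariance components, speech frames only). The librosa/scikit-learn toolchain and the crude energy-based speech selection are assumptions, not the original implementation.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(wav_paths, n_components=32, n_mfcc=13):
    """Fit one GMM on the pooled MFCC frames of a single speaker."""
    frames = []
    for path in wav_paths:
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, 13)
        energy = librosa.feature.rms(y=y).flatten()
        n = min(len(mfcc), len(energy))
        # crude energy threshold as a stand-in for the talk's speech/non-speech selection
        keep = energy[:n] > 0.5 * energy.mean()
        frames.append(mfcc[:n][keep])
    X = np.vstack(frames)
    return GaussianMixture(n_components=n_components, covariance_type="diag").fit(X)

# One GMM per speaker, e.g. for all 136 AMI speakers
# (ami_speaker_files is a hypothetical {speaker: [wav paths]} mapping):
# speaker_gmms = {spk: train_speaker_gmm(paths) for spk, paths in ami_speaker_files.items()}
```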

Distance Measure
- KL divergence between Gaussians
- Distance computed between speaker GMMs f and g
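A minimal sketch of one way to realise this distance, assuming the speaker models are scikit-learn GaussianMixture objects: the closed-form KL divergence between diagonal Gaussians, plus a Monte-Carlo symmetrised KL between the GMMs f and g. The talk does not specify which GMM-level approximation was used, so the Monte-Carlo estimate here is an assumption.

```python
import numpy as np

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """Closed-form KL( N(mu1, var1) || N(mu2, var2) ) for diagonal covariances."""
    return 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def mc_kl_gmm(f, g, n_samples=5000):
    """Monte-Carlo estimate of KL(f || g) for two fitted GaussianMixture models."""
    X, _ = f.sample(n_samples)
    return float(np.mean(f.score_samples(X) - g.score_samples(X)))

def gmm_distance(f, g, n_samples=5000):
    """Symmetrised KL used as the distance between two speaker GMMs."""
    return 0.5 * (mc_kl_gmm(f, g, n_samples) + mc_kl_gmm(g, f, n_samples))
```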

Clustering (using CluTo)
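CLUTO is the toolkit named on the slide; as a stand-in, the sketch below clusters speakers hierarchically from their pairwise GMM distances with scipy, reusing gmm_distance from the previous sketch. The number of clusters and the average-link criterion are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_speakers(speaker_gmms, n_clusters=4):
    """Agglomerative clustering of speakers from pairwise symmetrised-KL distances."""
    names = list(speaker_gmms)
    D = np.zeros((len(names), len(names)))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            D[i, j] = D[j, i] = gmm_distance(speaker_gmms[names[i]], speaker_gmms[names[j]])
    Z = linkage(squareform(D), method="average")     # average-link on the condensed matrix
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels))
```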

Clustering Results
- A clear pattern is seen for gender:
  - At the level of the top two clusters: females 77.3%, males 62.5%
  - At the level of 4 clusters (regrouping according to known information): females 92.45%, males 98.79%
- No pattern is seen with respect to age.

Next Steps (1)
- Create one GMM from half of the AMI 5 test speakers' data.
- Sample the speakers from AMI 3 and AMI 4 that are closest to this GMM and use their data to adapt the baseline recogniser (see the sketch after this list).
- Test on the other half of the AMI 5 speakers.
- Repeat the experiment with the two halves interchanged.
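A minimal sketch of the closest-speaker selection step, reusing gmm_distance from the earlier sketch. The variable names and the selection size k are illustrative assumptions.

```python
def closest_speakers(target_gmm, candidate_gmms, k=10):
    """Return the k candidate speakers whose GMMs are nearest to the target GMM."""
    ranked = sorted(candidate_gmms,
                    key=lambda spk: gmm_distance(target_gmm, candidate_gmms[spk]))
    return ranked[:k]

# e.g. adaptation_speakers = closest_speakers(ami5_half_gmm, ami34_speaker_gmms)
```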

Next Steps (2)
- Utterance sampling based on the potential informativeness of the sample (a sketch follows below).
- [Diagram: data buffer -> baseline ASR -> ranking -> sampling -> labelling -> adaptation corpus]
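A rough sketch of the ranking/sampling step in this loop. The informativeness measure is not fixed in the talk; low average ASR confidence is used here as a common proxy, and the baseline_asr.decode interface is purely hypothetical.

```python
def select_utterances_for_labelling(utterances, baseline_asr, budget=100):
    """Rank buffered utterances by (low) ASR confidence and return the top ones for labelling."""
    scored = []
    for utt in utterances:
        hyp = baseline_asr.decode(utt)             # hypothetical decoder interface
        scored.append((hyp.avg_confidence, utt))   # low confidence ~ potentially informative
    scored.sort(key=lambda pair: pair[0])
    return [utt for _, utt in scored[:budget]]
```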

Thank you!! Questions / Suggestions