ALISP-based Improvement of GMMs for Text-Independent Speaker Verification
Dijana Petrovska-Delacrétaz 1, Asmaa el Hannani 1, Gérard Chollet 2
1: DIVA Group, University of Fribourg; 2: GET-ENST, CNRS-LTCI, Paris
3-4 December 2003, Biometrics Tutorials, University of Fribourg

Biometrics, 3-4 Dec. 2003, Fribourg

Overview
1. Why segmental speaker verification systems?
2. Speech segmentation problems
3. Proposed segmental system based on a DTW distance measure
4. Experimental setup
5. Results
6. Conclusions and perspectives

1 Why segmental speaker verification systems?
- Current reference speaker verification systems are based on Gaussian Mixture Models (each speech frame is treated independently)
- Speech is composed of different sounds
- Phonemes have different discriminant characteristics for speaker verification (see Eatock et al. '94, J. Olsen '97, Petrovska et al. '98, 2000...)
- Nasals and vowels convey more speaker characteristics than other speech classes; we would like to exploit this fact
- We need an automatic speech segmentation tool!
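To make the frame-independence of the GMM baseline concrete, here is a minimal pure-Python sketch (illustrative only, not the system used in the slides; function names are hypothetical and diagonal covariances are assumed). The test-utterance score is just the average per-frame log-likelihood, with no use of segment structure:

```python
import math

def gmm_frame_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    log_probs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        log_probs.append(ll)
    m = max(log_probs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(p - m) for p in log_probs))

def gmm_score(frames, weights, means, variances):
    """Average per-frame log-likelihood: every frame is scored independently."""
    return sum(gmm_frame_loglik(f, weights, means, variances)
               for f in frames) / len(frames)
```

Because the sum runs over frames with no notion of which phone each frame belongs to, the model cannot weight nasals and vowels differently, which is exactly what the segmental approach tries to fix.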

Advantages and disadvantages of speech segmentation
Problems:
- Need for a speech segmentation tool
- Speaker modeling per speech class => more data needed
- More complicated systems
Advantages:
- Possibility to use it in combination with dialogue-based systems, for which a speech segmentation is already done
- Possibility to introduce text-prompted speaker verification, designed to include a maximum number of speaker-specific units

2 Speech Segmentation
Large Vocabulary Continuous Speech Recognition (LVCSR) systems:
- good results for a small set of languages
- need huge amounts of annotated speech data
- language (and task) dependent
- we do not have such a system for American English

ALISP Speech Segmentation
Data-driven speech segmentation:
- not yet usable for speech recognition purposes
- no annotated databases needed
- language and task independent
- we could use it to segment the speech data for a text-independent speaker verification task
We will use the data-driven speech segmentation method ALISP (Automatic Language Independent Speech Processing).

ALISP principles

3 Proposed speaker verification system: ALISP segments and DTW
3.1 Segmentation problem
- Segmentation of the speech data with N ALISP HMM models (N = 64 speech classes)
- Need for (untranscribed) speech data to train the 64 ALISP HMM models
- With so many speech classes we have to change the speaker modeling method: not enough data per class for GMM adaptation => use of Dynamic Time Warping (DTW)
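The ALISP decoder itself is HMM-based, but the idea of turning a frame stream into labeled segments can be sketched with a much cruder stand-in (hypothetical helper names; nearest-centroid labeling replaces the HMM Viterbi decoding, purely for illustration):

```python
import math

def label_frames(frames, centroids):
    """Assign each frame to its nearest class centroid (a crude stand-in
    for the 64-class HMM-based ALISP decoder)."""
    return [min(range(len(centroids)),
                key=lambda k: math.dist(f, centroids[k])) for f in frames]

def to_segments(labels):
    """Merge runs of identical frame labels into (class, start, end) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[i - 1]:
            segments.append((labels[start], start, i))
            start = i
    return segments
```

The output segments play the role of the ALISP units: each one is a variable-length stretch of frames tagged with one of the N speech classes, ready to be compared by DTW.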

DTW distance measure for speaker verification
- Dynamic Time Warping (DTW) has already been used for speaker verification, in a text-dependent mode (Rosenberg '76, Rabiner & Schafer '76, Furui '81, Pandit & Kittler '98...)
- The DTW distance measure between two speech segments conveys speaker-specific characteristics
- Originality: DTW used in a text-independent mode
- We first segment the speech data into ALISP classes
- We then measure the "distance" between speaker and non-speaker segments
- Speaker-specific information is extracted from:
  - the client's ALISP-based speech segments => Client Dictionary
  - the non-speakers' (world speakers') ALISP-based speech segments => World Dictionary
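A textbook DTW distance between two variable-length feature sequences can be sketched as follows (a minimal sketch with a hypothetical function name; the slides do not specify the local distance or path constraints, so Euclidean frame distance and the basic three-step recursion are assumed):

```python
import math

def dtw_distance(seg_a, seg_b):
    """DTW alignment cost between two variable-length feature sequences,
    normalized by an upper bound on the path length."""
    n, m = len(seg_a), len(seg_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seg_a[i - 1], seg_b[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m] / (n + m)
```

Because the alignment absorbs timing differences, what remains in the accumulated cost is dominated by spectral differences between the two talkers, which is why the distance carries speaker-specific information.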

Searching in the client and world speech dictionaries for speaker verification purposes
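One plausible way to turn the client/world dictionary search into a verification score is sketched below (hypothetical function name and scoring rule, not necessarily the authors' exact formula; `dist` stands for any segment-level distance, e.g. a DTW distance, and dictionaries would in practice be indexed per ALISP class):

```python
def verification_score(test_segments, client_dict, world_dict, dist):
    """Score a claimed identity: for each test segment, compare its distance
    to the nearest client-dictionary segment against its distance to the
    nearest world-dictionary segment. Positive totals favor the client."""
    total = 0.0
    for seg in test_segments:
        d_client = min(dist(seg, ref) for ref in client_dict)
        d_world = min(dist(seg, ref) for ref in world_dict)
        total += d_world - d_client  # > 0 when the client dictionary is closer
    return total / len(test_segments)
```

The world dictionary plays the same normalizing role as the world model in a GMM-UBM system: it keeps segments that are simply easy (or hard) to match from biasing the client score.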

4 Evaluation of the proposed system: experimental setup
- Development data: one subset of the NIST 2002 cellular data (American English)
- World speakers (60 female + 59 male): used to train the ALISP speech segmenter and to model the non-speakers (world speakers)
- Evaluated on another subset of NIST 2002 (male speakers)

Speech segmentation example
Two other occurrences of the English phone "ay"; the corresponding ALISP sequences: HX - Hf and (HM) - Hf - Ha
Previous slide: (Hf) - Ha or (HM) - HZ - Ha

Results: GMM, ALISP-DTW systems and their fusion

Results: EER comparison
Systems compared (EER %):
- ALISP-DTW
- GMM
- Linear fusion (no score normalization)
- LR fusion (no score normalization)
- LR fusion (normalized scores)
- Linear fusion (normalized scores)

Importance of fusion (33% improvement)
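Linear score fusion and the EER metric used throughout these slides can be sketched as follows (hypothetical function names; a simple threshold sweep is assumed for the EER, and the fusion weight would be tuned on development data):

```python
def linear_fusion(gmm_score, dtw_score, alpha=0.5):
    """Weighted sum of the two subsystem scores."""
    return alpha * gmm_score + (1 - alpha) * dtw_score

def equal_error_rate(client_scores, impostor_scores):
    """Sweep a threshold over all observed scores and return the operating
    point where false-rejection and false-acceptance rates are closest."""
    best = None
    for t in sorted(client_scores + impostor_scores):
        frr = sum(s < t for s in client_scores) / len(client_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        if best is None or abs(frr - far) < abs(best[0] - best[1]):
            best = (frr, far)
    return (best[0] + best[1]) / 2
```

Fusion helps most when the two subsystems make different errors, which is the case here: the segmental DTW system and the frame-level GMM system rely on complementary information.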

Using only the GMM scores, applied per segment => segmental GMM system

Conclusions
- State-of-the-art NIST 2002 results for EER: best 8% to worst 28%
- Fusion of the classical system with a segmental system: big improvements
- Why: the higher-level information present in the segmental system usefully complements the short-term frequency information present in the GMM system