Occasion: HUMAINE / WP4 / Workshop "From Signals to Signs of Emotion and Vice Versa", Santorini / Fira, 18th-22nd September 2004. Talk: Ronald Müller, "Speech Emotion Recognition Combining Acoustic and Semantic Analyses".



Speech Emotion Recognition Combining Acoustic and Semantic Analyses
Ronald Müller, Institute for Human-Machine Communication, Technische Universität München

Slide -2- Outline
- System Overview
- Emotional Speech Corpus
- Acoustic Analysis
- Semantic Analysis
- Stream Fusion
- Results

Slide -3- System Overview
Speech signal -> Prosodic features -> Classifier (SVM)
Speech signal -> ASR-unit -> Semantic interpretation (Bayesian Networks)
Both streams -> Stream fusion (MLP) -> Emotion

Slide -4- Emotional Speech Corpus
- Emotion set: anger, disgust, fear, joy, neutrality, sadness, surprise
- Corpus 1: practical course
  - 404 acted samples per emotion
  - 13 speakers (1 female)
  - Recorded within one year
- Corpus 2: driving simulator
  - 500 spontaneous emotion samples
  - 200 acted samples (disgust, sadness)

Slide -5- System Overview
Speech signal -> Prosodic features -> Classifier (SVM)
Speech signal -> ASR-unit -> Semantic interpretation (Bayesian Networks)
Both streams -> Stream fusion (MLP) -> Emotion

Slide -6- Acoustic Analysis
- Low-level features
  - Pitch contour (AMDF, low-pass filtering)
  - Energy contour
  - Spectrum
  - Signal
- High-level features
  - Statistical analysis of contours
  - Elimination of mean, normalization to standard deviation
  - Duration of one utterance (1-5 seconds)
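The pitch contour above is obtained with the Average Magnitude Difference Function. A minimal sketch of AMDF pitch estimation for a single voiced frame, with illustrative function and parameter names (not taken from the talk):

```python
import math

def amdf_pitch(frame, sample_rate, f_lo=80.0, f_hi=400.0):
    """Estimate the pitch of one voiced frame via the Average Magnitude
    Difference Function: D(lag) dips at multiples of the pitch period,
    so the first (shortest) minimising lag gives the fundamental."""
    lag_min = int(sample_rate / f_hi)        # shortest plausible period
    lag_max = int(sample_rate / f_lo)        # longest plausible period
    n = len(frame)
    best_lag, best_d = lag_min, float("inf")
    for lag in range(lag_min, min(lag_max, n - 1) + 1):
        d = sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag)) / (n - lag)
        if d < best_d:
            best_d, best_lag = d, lag
        if best_d < 1e-9:                    # exact dip (pure tone): keep shortest lag
            break
    return sample_rate / best_lag            # fundamental frequency in Hz

# A 200 Hz sine sampled at 8 kHz: the AMDF dips at a lag of 40 samples.
sr = 8000
frame = [math.sin(2 * math.pi * 200 * t / sr) for t in range(400)]
pitch = amdf_pitch(frame, sr)
```

A real front end would additionally low-pass filter the signal and track the contour across overlapping frames, as the slide indicates.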

Slide -7- Acoustic Analysis
- Feature selection (1/2)
  - Initial set of 200 statistical features
  - Ranking 1: single performance of each feature (nearest-mean classifier)
  - Ranking 2: Sequential Forward Floating Search, wrapped by a nearest-mean classifier
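Ranking 2 can be sketched as follows: a toy Sequential Forward Floating Search wrapping a nearest-mean classifier. The data, scoring details and stopping criterion are illustrative assumptions, not the talk's 200-feature setup:

```python
def nearest_mean_accuracy(X, y, feats):
    """Wrapper criterion: accuracy of a nearest-mean classifier
    restricted to the feature subset `feats` (list of column indices)."""
    classes = sorted(set(y))
    means = {c: [sum(x[f] for x, lab in zip(X, y) if lab == c) /
                 sum(1 for lab in y if lab == c) for f in feats]
             for c in classes}
    correct = 0
    for x, lab in zip(X, y):
        pred = min(classes, key=lambda c: sum((x[f] - m) ** 2
                   for f, m in zip(feats, means[c])))
        correct += pred == lab
    return correct / len(y)

def sffs(X, y, n_feats, target):
    """Sequential Forward Floating Search: greedy forward inclusion,
    then conditional backward exclusion while dropping a feature helps."""
    selected = []
    while len(selected) < target:
        # forward step: add the single best remaining feature
        best = max((f for f in range(n_feats) if f not in selected),
                   key=lambda f: nearest_mean_accuracy(X, y, selected + [f]))
        selected.append(best)
        # floating backward step: drop a feature if that improves the score
        improved = True
        while improved and len(selected) > 2:
            improved = False
            score = nearest_mean_accuracy(X, y, selected)
            for f in list(selected):
                rest = [g for g in selected if g != f]
                if nearest_mean_accuracy(X, y, rest) > score:
                    selected, improved = rest, True
                    break
    return selected

X = [(0.0, 9.0), (0.1, 1.0), (1.0, 5.0), (0.9, 4.0)]   # toy data, 2 features
y = [0, 0, 1, 1]                                        # feature 0 is informative
selected = sffs(X, y, n_feats=2, target=1)
```

On this toy data the search keeps the informative feature 0 and discards the noisy one.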

Slide -8- Acoustic Analysis
- Feature selection (2/2): top 10 features

  Acoustic feature                                      | SFFS rank | Single perf., %
  Pitch, maximum gradient                               |     1     | 31.5
  Pitch, std. dev. of distance between reversal points  |     2     |
  Pitch, mean value                                     |     3     | 25.6
  Signal, number of zero-crossings                      |     4     | 16.9
  Pitch, standard deviation                             |     5     | 27.6
  Duration of silences, mean value                      |     6     | 17.5
  Duration of voiced sounds, mean value                 |     7     | 18.5
  Energy, median of fall-time                           |     8     | 17.8
  Energy, mean distance between reversal points         |     9     |
  Energy, mean of rise-time                             |    10     | 17.6

Slide -9- Acoustic Analysis
- Classification
  - Evaluation of various classification methods on 33 features: kMeans, kNN, GMM, MLP, SVM and ML-SVM, comparing error rates (%) for the speaker-independent and speaker-dependent cases
- Output: vector of (pseudo-)recognition confidences

Slide -10- Acoustic Analysis
- Classification: Multi-Layer Support Vector Machines, a binary tree of SVMs over the acoustic feature vector
  - Root: ang, ntl, fea, joy / dis, sur, sad
  - Next level: ang, ntl / fea, joy and dis, sur / sad
  - Next level: ang / ntl, fea / joy, dis / sur
  - Leaves: ang, ntl, fea, joy, dis, sur, sad
- Drawback: no confidence vector to forward to fusion
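The routing logic of such a tree-structured classifier can be sketched as follows; the per-node decision functions below are hypothetical threshold stand-ins for the trained pairwise SVMs on the slide:

```python
def classify(tree, x):
    """Route a feature vector down a binary tree of classifiers:
    internal nodes are (decision, left, right); leaves are labels."""
    while not isinstance(tree, str):
        decide, left, right = tree
        tree = left if decide(x) else right
    return tree

# Tree mirroring the slide's splits: {ang, ntl, fea, joy} / {dis, sur, sad},
# then {ang, ntl} / {fea, joy} and {dis, sur} / {sad}, and so on.
# The lambdas are illustrative stand-ins, not trained SVM decision functions.
tree = (lambda x: x[0] < 0.5,
        (lambda x: x[1] < 0.5,
         (lambda x: x[2] < 0.5, "ang", "ntl"),
         (lambda x: x[2] < 0.5, "fea", "joy")),
        (lambda x: x[1] < 0.5,
         (lambda x: x[2] < 0.5, "dis", "sur"),
         "sad"))
```

Each input takes exactly one path to a leaf, which is why the tree emits a single hard label rather than the confidence vector the fusion stage would prefer.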

Slide -11- System Overview
Speech signal -> Prosodic features -> Classifier (SVM)
Speech signal -> ASR-unit -> Semantic interpretation (Bayesian Networks)
Both streams -> Stream fusion (MLP) -> Emotion

Slide -12- Semantic Analysis
- ASR-unit
  - HMM-based
  - 1300-word German vocabulary
  - No language model
  - 5-best phrase hypotheses
  - Recognition confidences per word
  - Example output (first hypothesis): "I can't stand this every tray traffic-jam"

Slide -13- Semantic Analysis
- Conditions
  - Natural language
  - Erroneous speech recognition
  - Uncertain knowledge
  - Incomplete knowledge
  - Superfluous knowledge
- Approach: probabilistic spotting with Bayesian Belief Networks

Slide -14- Semantic Analysis: Bayesian Belief Networks
- Acyclic graph of nodes and directed edges
- One state variable per node
- Node dependencies set via conditional probability matrices
- Initial probabilities set in root nodes
- An observation A causes evidence in a child node (i.e. its state is known)
- Inference to direct parent nodes and finally to root nodes via Bayes' rule:
  P(H | A) = P(A | H) P(H) / P(A)
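The inference step can be illustrated for a root node with one observed child. The emotion states, words and probability values below are made-up numbers for the sketch, not taken from the talk's networks:

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule for a root node with one observed child:
    P(state | evidence) is proportional to P(evidence | state) * P(state)."""
    unnorm = {s: prior[s] * likelihood[s][evidence] for s in prior}
    z = sum(unnorm.values())                 # P(evidence), the normaliser
    return {s: p / z for s, p in unnorm.items()}

# Hypothetical two-state root node and a conditional probability table
# P(word | emotion) for the observed child.
prior = {"disgust": 0.2, "neutral": 0.8}
likelihood = {
    "disgust": {"nasty": 0.30, "fine": 0.05},
    "neutral": {"nasty": 0.02, "fine": 0.40},
}
post = posterior(prior, likelihood, "nasty")
```

Observing the word "nasty" flips the belief: despite the low prior, "disgust" ends up with roughly 79 percent posterior probability. In the real networks this update propagates through intermediate nodes up to the emotion root.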

Slide -15- Semantic Analysis
- Emotion modelling (network diagram): input levels words, superwords, phrases and superphrases, connected by spotting, clustering and sequence handling
  - Example input: "I can't stand this nasty / every tray traffic-jam"
  - Word level: I, can't, stand, nasty (cannot, stand, bad, disgusting, ...)
  - Superwords: first_person, I_hate, I_like, Bad, Adhorrence, ...
  - Interpretation level: Positive, Negative, Good -> emotion nodes (Disgust, Anger, Joy, ...)
- Output: vector of "real" recognition confidences

Slide -16- System Overview
Speech signal -> Prosodic features -> Classifier (SVM)
Speech signal -> ASR-unit -> Semantic interpretation (Bayesian Networks)
Both streams -> Stream fusion (MLP) -> Emotion

Slide -17- Stream Fusion
- Pairwise mean
- Discriminative fusion applying an MLP
  - Input layer: 2 x 7 confidences
  - Hidden layer: 100 nodes
  - Output layer: 7 recognition confidences
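A single forward pass of such a fusion MLP might look as follows. The random weights, tanh hidden activation and softmax output are illustrative assumptions; the talk's network was trained and its activations are not specified on the slide:

```python
import math
import random

def mlp_fuse(acoustic_conf, semantic_conf, w1, b1, w2, b2):
    """One forward pass of the fusion MLP: the 2 x 7 stream confidences
    are concatenated, passed through a tanh hidden layer and a softmax
    output layer to give 7 fused emotion confidences."""
    x = acoustic_conf + semantic_conf                     # 14 inputs
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)                                       # stabilised softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

random.seed(0)
HIDDEN = 100                                              # as on the slide
w1 = [[random.uniform(-0.1, 0.1) for _ in range(14)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(7)]
b2 = [0.0] * 7
fused = mlp_fuse([0.60, 0.10, 0.05, 0.05, 0.10, 0.05, 0.05],
                 [0.40, 0.20, 0.10, 0.10, 0.10, 0.05, 0.05],
                 w1, b1, w2, b2)
```

With untrained weights the output is near-uniform; training teaches the network which stream to trust per emotion, which is what makes the fusion discriminative rather than a fixed pairwise mean.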

Slide -18- Results
- Acoustic recognition rates (SVM) per emotion (ang, dis, fea, joy, ntl, sad, sur) and mean, in %
- Semantic recognition rates per emotion and mean, in %

Slide -19- Results
- Recognition rates after discriminative fusion per emotion (ang, dis, fea, joy, ntl, sad, sur) and mean, in %
- Overview: acoustic information, language information, fusion by means, fusion by MLP, in %

Slide -20- Summary
- Acted emotions: 7 discrete emotion categories
- Prosodic feature selection via
  - single-feature performance
  - Sequential Forward Floating Search
- Evaluative comparison of different classifiers, with SVMs outperforming the rest
- Semantic analysis applying Bayesian Networks
- Significant gain by discriminative stream fusion
