“Articulatory Talking Head” Showcase Project, INRIA, KTH. Articulatory Talking Head driven by Automatic Speech Recognition INRIA, Parole Team KTH, Centre.

Presentation transcript:

“Articulatory Talking Head” Showcase Project, INRIA, KTH.
Articulatory Talking Head driven by Automatic Speech Recognition
INRIA, Parole Team
KTH, Centre for Speech Technology

Aim of the INRIA-KTH collaboration
Recreate in real time the articulatory movements of a speaker with a talking head, using the speech signal only.
Applications:
– Communication help for hard-of-hearing (HOH) people
– Second language learning
– Speech therapy

Articulation
Displaying articulation is important for understanding English pronunciation and also supports perception.
In this demonstration:
1. Voice Activity Detection is performed (separation of speech and non-speech)
2. English phoneme recognition is performed
3. English articulation is analysed
4. Articulation is displayed

New VAD: a combination of GMM and automaton (64 Gaussians)

All values are percentages of frames. Columns: overall recognition; speech frames correctly recognized; frames recognized as speech instead of non-speech; non-speech frames correctly recognized; frames recognized as non-speech instead of speech.

MFCC type  Algorithm       Overall  Speech correct  Speech for non-speech  Non-speech correct  Non-speech for speech
energy                     79.62    98.64           24.55                  50.02               4.042
espere     gmm_direct      91.34    95.87           9.519                  84.29               7.072
espere     micro_viterbi   74.77    84.42           23.45                  59.74               28.86
espere     micro_viterbi2  86.14    87.07           10.15                  84.68               19.19
htk        gmm_direct      87.98    97.99           15.31                  72.41               4.131
htk        micro_viterbi   83.19    94.68           19.05                  65.31               11.24
htk        micro_viterbi2  87.39    96.81           15.31                  72.74               6.382
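The slide names the VAD as a GMM combined with an automaton but gives no further detail. The sketch below shows one plausible way such a combination can work, not the project's actual implementation. It assumes that each frame already has a log-likelihood ratio from a speech GMM and a non-speech GMM (positive means "speech"), and that a two-state automaton only changes state after several consecutive frames disagree with the current state. The function name `smooth_vad` and the parameters `min_speech` and `min_silence` are hypothetical.

```python
from typing import List


def smooth_vad(llr: List[float], min_speech: int = 3, min_silence: int = 5) -> List[bool]:
    """Smooth per-frame GMM decisions with a two-state automaton.

    llr[t] is assumed to be log p(frame | speech GMM) - log p(frame | non-speech GMM).
    The automaton starts in the non-speech state and switches only after
    `min_speech` (or `min_silence`) consecutive frames vote for the other state.
    Returns one boolean per frame, True meaning "speech".
    """
    state = False  # current automaton state: False = non-speech, True = speech
    run = 0        # consecutive frames voting against the current state
    decisions = []
    for score in llr:
        vote = score > 0.0
        if vote != state:
            run += 1
            needed = min_speech if vote else min_silence
            if run >= needed:
                state = vote
                run = 0
        else:
            run = 0  # the disagreement streak is broken
        decisions.append(state)
    return decisions


# Hypothetical scores: one isolated "speech" frame, then a real speech segment
# containing one isolated "non-speech" frame, then silence.
scores = [-1, 2, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, -1, -1, -1]
labels = smooth_vad(scores)
```

With these settings the isolated votes are ignored, at the cost of a delay of a few frames at each real transition.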

3D Reconstruction
The reconstruction was made using a semi-polar grid of 20 gridlines, with one contour per image. A polygon mesh of 420 vertices and about 800 polygons was constructed.

Models have been adapted to English
Movetrack Electromagnetic Articulograph: 6 coils (upper lip, upper and lower incisors, and three tongue coils at 8, 20 and 52 mm from the tip).
Qualisys optical motion tracking: 4 IR cameras, 28 reflectors, 3 reference reflectors on the headmount.
Audio and video recorders.

Prosody
Prosody is important for understanding the message. It is present both in the speech sound and in facial expressions and gestures. Some prosodic information is extracted from the signal and displayed with the talking head:
– Fundamental frequency (F0)
– Energy
– Speech rate

Pitch (F0): comb-filter estimation
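The slide does not say which comb-filter variant is used. As an illustration only, the sketch below uses a simple time-domain variant: the inverse comb filter y[n] = x[n] - x[n-T] cancels every harmonic of sample_rate / T, so the lag T that leaves the least energy is taken as the pitch period. The function name `comb_filter_f0`, the search range of 60 to 400 Hz and the `tolerance` value are assumptions, and the sketch makes no voiced/unvoiced decision.

```python
import math
from typing import List


def comb_filter_f0(frame: List[float], sample_rate: int,
                   f0_min: float = 60.0, f0_max: float = 400.0,
                   tolerance: float = 0.05) -> float:
    """Estimate F0 (Hz) of one frame by minimising inverse comb filter energy."""
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(len(frame) - 1, int(sample_rate / f0_min))
    if max_lag <= min_lag:
        raise ValueError("frame too short for the requested F0 range")

    def residual(lag: int) -> float:
        # Energy left after the comb filter, normalised by the input energy.
        num = 0.0
        den = 0.0
        for n in range(lag, len(frame)):
            diff = frame[n] - frame[n - lag]
            num += diff * diff
            den += frame[n] ** 2 + frame[n - lag] ** 2
        return num / den if den > 0.0 else 1.0

    lags = list(range(min_lag, max_lag + 1))
    scores = [residual(lag) for lag in lags]

    # Multiples of the true period also give a small residual, so take the
    # shortest lag that is close to the global minimum ...
    threshold = min(scores) + tolerance
    i = next(k for k, s in enumerate(scores) if s <= threshold)
    # ... and slide forward to the bottom of that local dip.
    while i + 1 < len(scores) and scores[i + 1] < scores[i]:
        i += 1
    return sample_rate / lags[i]


# Synthetic 50 ms frame at 8 kHz: a 200 Hz tone with three harmonics.
rate = 8000
frame = [sum(math.sin(2 * math.pi * 200 * k * n / rate) / k for k in (1, 2, 3))
         for n in range(400)]
f0 = comb_filter_f0(frame, rate)
```

The resolution is limited to integer lags; a practical estimator would usually interpolate around the selected lag and smooth the result across frames.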

F0 comparison between a French speaker and a native speaker
Please note that the F0 and narrow-band spectrogram scales are different.

Speech rate
Speech rate can be computed as the average number of phonemes produced per second. We define it as the ratio between:
– the average duration of the produced phonemes
– the average duration of the same phonemes in the phoneme recognizer training database
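The ratio defined on this slide can be sketched as follows. The function name `relative_speech_rate`, the input shapes and the direction of the ratio (produced over reference, so values above 1 mean longer phonemes and slower speech than in the training database) are assumptions; the slide only states that the two averages are compared. The durations in the example are made up for illustration.

```python
from typing import Dict, List, Tuple


def relative_speech_rate(segments: List[Tuple[str, float]],
                         reference_durations: Dict[str, float]) -> float:
    """Ratio of mean produced phoneme duration to mean reference duration.

    segments: (phoneme label, duration in seconds) pairs from the recognizer.
    reference_durations: mean duration in seconds of each phoneme in the
    training database. Phonemes without a reference value are skipped.
    """
    produced_total = 0.0
    reference_total = 0.0
    count = 0
    for phoneme, duration in segments:
        if phoneme not in reference_durations:
            continue
        produced_total += duration
        reference_total += reference_durations[phoneme]
        count += 1
    if count == 0 or reference_total == 0.0:
        raise ValueError("no phoneme with a usable reference duration")
    # Both averages are taken over the same phonemes, as on the slide.
    return (produced_total / count) / (reference_total / count)


# Illustrative values only.
produced = [("h", 0.08), ("e", 0.12), ("l", 0.10), ("o", 0.20)]
reference = {"h": 0.05, "e": 0.10, "l": 0.05, "o": 0.20}
ratio = relative_speech_rate(produced, reference)
```

Because each produced phoneme is compared with the reference duration of the same phoneme, the ratio is not biased by which phonemes happen to occur in the utterance.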

[Figure: Speech rate]

Usage Scenario
1. The teacher, the learner or the speech therapist speaks.
2. The talking head reproduces what has been uttered, showing the articulators.
3. The talking head shows what should have been articulated.
This is a first step towards an interactive learning loop.

A French student pronounces an English sentence…

The student and the teacher can have a closer look at the articulation and prosody…

The teacher can pronounce the sentence as it should be…

The student and the teacher can watch the correct articulation and prosody together…

And of course the teacher can give more detailed explanations and advice…

Thank you.