Collection of multimodal data: Face – Speech – Body
George Caridakis (ICCS), Ginevra Castellano (DIST), Loic Kessous (TAU)

Overview  Objectives  Scenario  Equipment specifications  Subjects & Procedure  Visual aspects  Acoustic aspects  Future processing  Please try this at home…

Objectives  Collection of emotional multimodal data  Process different modalities  Holy Grail: “EMOTION RECOGNITION”

Scenario  Inspired by GEMEP corpus  Pseudo-language sentence (“Toko”, damato ma gali sa)  Standing body posture  10 subjects  8 emotions uniformly distributed through the quadrants (2D emotion theory, valence-arousal)  3 repetitions of emotion specific gesture  3 repetitions of emotion independent gesture

Emotion-specific gestures
- despair: “leave me alone”
- hot anger: violent descent of hands
- irritation: smooth “go away”
- sadness: smooth falling hands
- interest: raise hands
- pleasure: open hands
- joy: italianate/explain
- pride: close hands
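When labelling takes, the gesture table can be kept as a simple lookup. In this sketch the dictionary transcribes the slide, while `quadrant` is a hypothetical helper that names the valence-arousal quadrant of a point with both axes centred on zero. The slides give no coordinates for the eight emotions, so none are assumed here.

```python
# Emotion-specific gestures, transcribed from the slide.
EMOTION_GESTURES = {
    "despair": "leave me alone",
    "hot anger": "violent descent of hands",
    "irritation": 'smooth "go away"',
    "sadness": "smooth falling hands",
    "interest": "raise hands",
    "pleasure": "open hands",
    "joy": "italianate/explain",
    "pride": "close hands",
}


def gesture_for(emotion: str) -> str:
    """Gesture the subject is asked to perform for a given emotion."""
    return EMOTION_GESTURES[emotion]


def quadrant(valence: float, arousal: float) -> str:
    """Name the valence-arousal quadrant of a point (both axes centred on 0).

    Points exactly on an axis go to the positive side; this tie-breaking
    convention is an arbitrary choice made for the sketch.
    """
    v = "positive" if valence >= 0 else "negative"
    a = "high" if arousal >= 0 else "low"
    return f"{v} valence / {a} arousal"


print(len(EMOTION_GESTURES))    # 8
print(gesture_for("sadness"))   # smooth falling hands
print(quadrant(-0.5, 0.7))      # negative valence / high arousal
```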

Equipment specifications  2 DV cameras Full body Face  Wireless microphone (shirt-mounted)  PC + External sound card  Uniform dark background  2 artificial light sources  Light coloured, long sleeves shirt ;)

Subjects & Procedure  Subjects 10 “actors” 6 males 4 females  despair, hot anger, irritation sadness, interest, pleasure, joy, pride Procedure Subject instructions Clap before every execution: synchronize streams

Video quality issues  Highest possible resolution  Progressive video (not interlaced)  Correct exposure  Good color quality  No compression artifacts  Uniform lighting

Interlacing / Over-exposure  Interlacing / De- Interlacing  Over-exposure 70% zebra pattern Prefer lower-exposure so signal will not be clipped

Colour/Lighting  Medium Y/C Resolution  Compression Artifacts  Exposure  Good Video quality  Source: DV

Archiving
PAL: 720x576 @ 25 frames/s
- DV format: ~36 Mbit/s, ~16 GB/hour
- MPEG-2 @ 4-8 Mbit/s (DVD quality): ~1.8-3.5 GB/hour
- MPEG-1 @ 1.1 Mbit/s: ~500 MB/hour
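The per-hour figures follow directly from the bitrates: bytes per hour = bitrate / 8 × 3600. A quick check, assuming decimal units (1 Mbit = 10^6 bits, 1 GB = 10^9 bytes) and a constant bitrate; `gb_per_hour` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600


def gb_per_hour(mbit_per_s: float) -> float:
    """Storage for one hour at a constant bitrate, in decimal gigabytes."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return bytes_per_s * SECONDS_PER_HOUR / 1e9


# Bitrates taken from the slide.
for name, rate in [("DV", 36), ("MPEG-2 low", 4), ("MPEG-2 high", 8), ("MPEG-1", 1.1)]:
    print(f"{name:12s} {rate:>5} Mbit/s -> {gb_per_hour(rate):.3f} GB/hour")
```

This gives about 16.2 GB/hour for DV, 1.8-3.6 GB/hour for MPEG-2 and 0.5 GB/hour for MPEG-1, in line with the rounded figures on the slide.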

Visual Aspects Summary  Video Camera DV or Better Progressive Scan Capability Over-Exposure Indication, Zebra Patterns  Shooting Use the zebra patterns at 70% Zoom in as much as possible to increase subject’s resolution Facial features must be visible for facial analysis Try to avoid occlusions (hair, glasses, clothes, hand movement) Uniform lighting conditions  Archive DV tapes, DV Video or Frames, (not MPEG-1)

Acoustic aspects  Why: “Toko, damato ma gali sa”? Toko: solicitation by naming the interlocutor Vowels found in majority of language Meaning: Toko, can you open it? (request) for maintaining semantic aspect  Sampling frequency 44.1 kHz  16 bits mono information depth  Uncompressed.wav files

Future processing  Process different modalities Facial feature extraction Gesture expressiveness analysis Acoustic analysis  Gesture recognition  Synchronization  Modalities fusion RNN RSOM + Markov SVM …  Emotion recognition
