Collection of multimodal data Face – Speech – Body George Caridakis ICCS Ginevra Castellano DIST Loic Kessous TAU.

2 Overview  Objectives  Scenario  Equipment specifications  Subjects & Procedure  Visual aspects  Acoustic aspects  Future processing  Please try this at home…

3 Objectives  Collection of emotional multimodal data  Process different modalities  Holy Grail: “EMOTION RECOGNITION”

4 Scenario  Inspired by GEMEP corpus  Pseudo-language sentence (“Toko”, damato ma gali sa)  Standing body posture  10 subjects  8 emotions uniformly distributed through the quadrants (2D emotion theory, valence-arousal)  3 repetitions of emotion specific gesture  3 repetitions of emotion independent gesture

5 Emotion-specific gestures
• despair: “leave me alone”
• hot anger: violent descent of the hands
• irritation: smooth “go away”
• sadness: smooth falling hands
• interest: raise hands
• pleasure: open hands
• joy: italianate/explain
• pride: close hands

6 Equipment specifications  2 DV cameras Full body Face  Wireless microphone (shirt-mounted)  PC + External sound card  Uniform dark background  2 artificial light sources  Light coloured, long sleeves shirt ;)

7 Subjects & Procedure  Subjects 10 “actors” 6 males 4 females  despair, hot anger, irritation sadness, interest, pleasure, joy, pride Procedure Subject instructions Clap before every execution: synchronize streams

8 Video quality issues  Highest possible resolution  Progressive video (not interlaced)  Correct exposure  Good color quality  No compression artifacts  Uniform lighting

9 Interlacing / Over-exposure  Interlacing / De- Interlacing  Over-exposure 70% zebra pattern Prefer lower-exposure so signal will not be clipped

10 Colour/Lighting  Medium Y/C Resolution  Compression Artifacts  Exposure  Good Video quality  Source: DV

11 Archiving
• PAL: 720x576 @ 25 frames/second
• DV format: ~36 Mbit/s, ~16 GB/hour
• MPEG-2 @ 4-8 Mbit/s (DVD quality): ~1.8-3.5 GB/hour
• MPEG-1 @ 1.1 Mbit/s: ~500 MB/hour
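The per-hour figures follow directly from the bit rates: bytes per hour = bit rate × 3600 s / 8. A small sketch to check them, in decimal units (1 GB = 1000 MB); the function name is ours. At 8 Mbit/s the result is 3.6 GB/hour, in line with the slide's approximate upper bound.

```python
SECONDS_PER_HOUR = 3600

def gigabytes_per_hour(mbit_per_s):
    """Storage for one hour at a constant bit rate, in decimal GB."""
    megabytes = mbit_per_s * SECONDS_PER_HOUR / 8  # Mbit -> MB
    return megabytes / 1000                        # MB -> GB

rates = {"DV": 36, "MPEG-2 (4 Mbit/s)": 4, "MPEG-2 (8 Mbit/s)": 8, "MPEG-1": 1.1}
for name, rate in rates.items():
    print(f"{name}: {gigabytes_per_hour(rate):.2f} GB/hour")
```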

12 Visual Aspects Summary  Video Camera DV or Better Progressive Scan Capability Over-Exposure Indication, Zebra Patterns  Shooting Use the zebra patterns at 70% Zoom in as much as possible to increase subject’s resolution Facial features must be visible for facial analysis Try to avoid occlusions (hair, glasses, clothes, hand movement) Uniform lighting conditions  Archive DV tapes, DV Video or Frames, (not MPEG-1)

13 Acoustic aspects  Why: “Toko, damato ma gali sa”? Toko: solicitation by naming the interlocutor Vowels found in majority of language Meaning: Toko, can you open it? (request) for maintaining semantic aspect  Sampling frequency 44.1 kHz  16 bits mono information depth  Uncompressed.wav files

14 Future processing  Process different modalities Facial feature extraction Gesture expressiveness analysis Acoustic analysis  Gesture recognition  Synchronization  Modalities fusion RNN RSOM + Markov SVM …  Emotion recognition

