eNTERFACE 08 Project 2 “multimodal high-level data integration” Mid-term presentation August 19th, 2008
Team
Olga Vybornova (Université catholique de Louvain, UCL-TELE, Belgium)
Hildeberto Mendonça (Université catholique de Louvain, UCL-TELE, Belgium)
Ao Shen (University of Birmingham, UK)
Daniel Neiberg (TMH/CTT, KTH Royal Institute of Technology, Sweden)
David Antonio Gomez Jauregui (TELECOM and Management SudParis, France)
Project objectives
To augment and improve the previous work and look for new methods of data fusion
To resolve the problem of distinguishing between data from different modalities that should be fused and data that should not be fused but analyzed separately, and to implement a technique for doing so
To explore and employ a context-aware cognitive architecture for decision-making purposes
Background - Multimodality
A set of variables describing states of the world (user's input, an object, an event, behavior, etc.) represented in different media and through different information channels.
Goal of data fusion: the result of the fusion (merging semantic content from multiple streams) should give an efficient joint interpretation of the multimodal behavior of the user(s), in order to provide effective and advanced interaction.
[Architecture diagram]
Components: Speech Recognizer, Syntactic Analyzer, Semantic Analyzer, Video Analyzer, Human Behavior Analyzer, Knowledge Base, Fusion Mechanism
Data labels: Audio Stream, Sound Waves, Recognized String, Syntactic Triple, Linguistic Meanings, Video Stream, Sequence of Images, Movement Coordinates, Movement Meanings, People, Advise
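[Architecture diagram, with the tool used for each component]
Speech recognition: Sphinx-4
Syntactic analysis: C&C Tools parser
Semantic analysis: C&C Tools Boxer
Video analysis: OpenCV
Knowledge base: Protégé, Jena
Other components: Human Behavior Analyzer, Fusion Mechanism
Data labels: Audio Stream, Sound Waves, Recognized String, Syntax Analysis, Linguistic Meanings, Semantic Validation, Video Stream, Sequence of Images, Movement Coordinates, Movement Meanings, People, Advise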
Integration
All tools are integrated through socket communication
  C++ and Java components interoperate normally
The data interchange format is XML
  Verifiable
  Easy data identification
  Easy data compatibility
  Low cost of manipulation
  XML processed on demand
Main issues: transparency, extensibility and customization
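As a rough illustration of the integration approach above, the following Python sketch passes one XML message between two endpoints over a local socket pair and parses it on the receiving side. The element and attribute names (`message`, `modality`, `hypothesis`, `rank`), the sample hypotheses and the newline framing are assumptions made for this example, not the project's actual schema or protocol. The project's own components are written in C++ and Java.

```python
import socket
import xml.etree.ElementTree as ET


def build_message(modality, hypotheses):
    """Serialize recognizer output as one newline-terminated XML message."""
    root = ET.Element("message", {"modality": modality})  # hypothetical schema
    for rank, text in enumerate(hypotheses, start=1):
        hyp = ET.SubElement(root, "hypothesis", {"rank": str(rank)})
        hyp.text = text
    # "unicode" yields a str without an XML declaration; encode it ourselves.
    return ET.tostring(root, encoding="unicode").encode("utf-8") + b"\n"


def read_message(conn):
    """Read bytes until the newline delimiter, then parse the XML."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buffer += chunk
    return ET.fromstring(buffer)


def demo():
    """Send one message through a connected socket pair and decode it."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(build_message("speech", ["hello nick", "hello mick"]))
        root = read_message(receiver)
    finally:
        sender.close()
        receiver.close()
    return root.get("modality"), [h.text for h in root.findall("hypothesis")]
```

Because the message is self-describing, the receiver can identify the modality and read only the elements it needs, which is one way to interpret "processed on demand".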
Speech Recognition
Sphinx-4, integrated into the system
Fine-tuned for the maximum length of n-best lists
Two language models created:
  Scenario-dependent: 3-grams, 150 words, 86.9% accuracy, speed 0.94 × real time
  Wall Street Journal + scenarios: 3-grams, 5000 words, 68.6% accuracy, speed 3.19 × real time
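The speed figures above are given as multiples of real time. The slide does not define the measure; assuming the common convention that it is processing time divided by audio duration, this small sketch converts it into an expected decoding time. The function names are hypothetical.

```python
def processing_seconds(audio_seconds, times_real_time):
    """Estimated decoding time, assuming speed = processing time / audio duration."""
    return audio_seconds * times_real_time


def is_faster_than_real_time(times_real_time):
    """A factor below 1.0 means decoding finishes before the audio would end."""
    return times_real_time < 1.0
```

Under this assumption, 10 seconds of speech would take about 9.4 seconds to decode with the scenario-dependent model and about 31.9 seconds with the larger model.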
Speaker Identification
Standard GMM-based speaker identification system
Developed in Matlab
[Figure: results on a 2-person development set as a function of the number of Gaussians]
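The idea behind GMM-based speaker identification can be sketched as follows: each speaker has a Gaussian mixture model, and an utterance is assigned to the speaker whose model gives its feature frames the highest log-likelihood. This is a minimal Python sketch, not the team's Matlab system. It assumes scalar features and already-trained models, and the speaker names and mixture parameters are made up for illustration (real systems typically use multi-dimensional acoustic features and train the mixtures with EM).

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_loglik(frames, components):
    """Total log-likelihood of scalar frames under a Gaussian mixture.

    components is a list of (weight, mean, variance) tuples.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + gaussian_logpdf(x, m, v) for w, m, v in components]
        peak = max(logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(value - peak) for value in logs))
    return total


def identify(frames, speaker_models):
    """Return the speaker whose model scores the frames highest."""
    return max(speaker_models, key=lambda name: gmm_loglik(frames, speaker_models[name]))


# Hypothetical two-speaker setup with two Gaussians per speaker.
MODELS = {
    "speaker_a": [(0.5, -1.0, 0.5), (0.5, 0.0, 0.5)],
    "speaker_b": [(0.5, 3.0, 0.5), (0.5, 4.0, 0.5)],
}
```

The number of `(weight, mean, variance)` tuples per speaker corresponds to the number of Gaussians, which is the quantity the reported results are plotted against.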
Speech Recognition Output
yesterday i received an from nick
yesterday i received an from nick
yesterday i received an from nick to
yesterday i received an from nick for
Syntax and Semantics
Image Processing
OpenCV library (open source)
Motion history to calculate the motion direction
Template matching to identify objects in the scene
Gaussian probability distribution to model the color of clothes
Background subtraction to detect the foreground
Blob identification to track people in the scene
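Two of the steps listed above, background subtraction and blob identification, can be illustrated with a toy example. This pure-Python sketch works on small grayscale grids and is not the project's OpenCV code: the fixed threshold, the static background frame and the 4-connected flood fill are simplifying assumptions, and the function names are hypothetical.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than the threshold."""
    return [
        [abs(pixel - ref) > threshold for pixel, ref in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def find_blobs(mask):
    """Group 4-connected foreground pixels and return their bounding boxes.

    Each box is (top, left, bottom, right), in row-major order of discovery.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:  # iterative flood fill over one blob
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            blobs.append((top, left, bottom, right))
    return blobs


# Toy scene: an empty background and a frame containing two bright regions.
BACKGROUND = [[0] * 6 for _ in range(4)]
FRAME = [
    [200, 200, 0, 0, 0, 0],
    [200, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 200, 0],
    [0, 0, 0, 0, 200, 200],
]
```

Calling `find_blobs(foreground_mask(FRAME, BACKGROUND))` yields one bounding box per region. Tracking people would then amount to matching these boxes from frame to frame.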
Image Processing Output
Ontology
Restricted-domain ontology: structure and its instantiation
Pattern situations (semantic frames)
User profile: information about users collected a priori (preferences, social relationships, etc.) and data obtained dynamically
Protégé used to create and edit the ontology
Jena used to manage the ontology data
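Ontology data of this kind is commonly stored as subject-predicate-object triples, which is the model Jena works with. The sketch below shows, in plain Python rather than Jena, how user-profile facts might be held and queried with wildcard patterns. All names, predicates and facts are invented for illustration and are not taken from the project's ontology.

```python
def match(triples, pattern):
    """Return the triples that match a (subject, predicate, object) pattern.

    None in the pattern acts as a wildcard for that position.
    """
    return [
        triple
        for triple in triples
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]


# Hypothetical user-profile facts (a priori information about users).
ONTOLOGY = [
    ("Nick", "isA", "Person"),
    ("Anna", "isA", "Person"),
    ("Nick", "colleagueOf", "Anna"),
    ("Anna", "prefers", "Email"),
]
```

For example, `match(ONTOLOGY, (None, "isA", "Person"))` lists the known people, and `match(ONTOLOGY, ("Anna", None, None))` returns everything recorded about one user.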
Project schedule
Overall progress: 65%
WP1: Workshop preparation (done)
WP2: Integration of multimodal components (done)
WP3: Multimodal fusion implementation (running)
WP4: Scenario implementation and reporting (to do)
Strategic changes to achieve the goal:
  Everybody focuses on the fusion mechanism
  Lower priority on improving the individual modalities
  Each risky task has an associated plan B that is less time-consuming but also less robust
Next Steps
Integration of WordNet into the ontology
Rules to process human behavior
Mapping the semantic analysis to the ontology
Fusion mechanism