NM – LREC 2008 /1
Data Collection for the CHIL CLEAR 2007 Evaluation Campaign
N. Moreau 1, D. Mostefa 1, R. Stiefelhagen 2, S. Burger 3, K. Choukri 1
1 ELDA (Evaluations and Language Resources Distribution Agency), 2 UKA-ISL, 3 CMU

NM – LREC 2008 /2
Plan
1. CHIL project
2. Evaluation campaigns
3. Data recordings
4. Annotations
5. Evaluation package
6. Conclusion

NM – LREC 2008 /3
CHIL Project
CHIL: Computers in the Human Interaction Loop
Integrated project funded by the European Commission (FP6)
January 2004 – August 2007
15 partners, 9 countries (ELDA responsible for data collection and evaluations)
Multimodal and perceptual user interface technologies
Context:
- Real-life meetings (small meeting rooms)
- Activities and interactions of attendees

NM – LREC 2008 /4
CHIL Evaluation Campaigns
June 2004: Dry run
January 2005: Internal evaluation campaign
February 2006: CLEAR 2006 campaign
February 2007: CLEAR 2007 campaign
CLEAR = Classification of Events, Activities and Relationships
- Open to external participants
- Supported by CHIL and NIST (VACE Program)
- Co-organized with the NIST RT (Rich Transcription) evaluation

NM – LREC 2008 /5
CLEAR 2007 Evaluation Campaign
9 technologies evaluated:
- Vision technologies
  - Face Detection and Tracking
  - Visual Person Tracking
  - Visual Person Identification
  - Head Pose Estimation
- Acoustic technologies
  - Acoustic Person Tracking
  - Acoustic Speaker Identification
  - Acoustic Event Detection
- Multimodal technologies
  - Multimodal Person Tracking
  - Multimodal Speaker Identification

NM – LREC 2008 /6
CHIL Scenarios
- Non-Interactive Lectures
- Interactive Seminars

NM – LREC 2008 /7
CHIL Data Sets
CLEAR 2007 data collection:
- 25 highly interactive seminars
- Attendees: between 3 and 7
- Events: several presenters, discussions, coffee breaks, people entering / leaving the room, ...

Campaign      # Lectures   # Interactive Seminars
Internal      12           0
CLEAR 2006    –            –
CLEAR 2007    0            25

NM – LREC 2008 /8
Recording Set-up
5 recording rooms
Sensors:
- Audio:
  - 64-channel microphone array
  - 4-channel T-shaped microphones
  - Table-top microphones
  - Close-talking microphones
- Video:
  - 4 fixed corner cameras
  - 1 ceiling wide-angle camera
  - Pan-tilt-zoom (PTZ) cameras

NM – LREC 2008 /9
Camera Views

NM – LREC 2008 /10
Quality Standards
Recording of 25 seminars in 2007 (5 per CHIL room)
Audio-visual clap at beginning and end
Cameras (JPEG files at 15, 25 or 30 fps):
- Max. desynchronisation = 200 ms
Microphone array:
- Max. desynchronisation = 200 ms
Other microphones (T-shaped, table-top):
- Max. desynchronisation = 50 ms
If the desynchronisation exceeded the maximum, the recording had to be remade.
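The synchronisation rule above lends itself to a simple automated accept/reject check. Here is a minimal Python sketch of that logic; the sensor names, the threshold table and the idea of measuring offsets from the audio-visual clap are assumptions about tooling, not the published CHIL pipeline.

```python
# Hypothetical sketch of the synchronisation check described above.
# Maximum allowed desynchronisation per sensor type (milliseconds).
MAX_DESYNC_MS = {
    "camera": 200,            # JPEG streams at 15, 25 or 30 fps
    "microphone_array": 200,
    "t_shaped_mic": 50,
    "table_top_mic": 50,
}

def recording_passes(measured_desync_ms: dict[str, float]) -> bool:
    """Return True if every sensor stays within its tolerance.

    `measured_desync_ms` maps a sensor type to the offset (in ms)
    measured from the audio-visual clap at the start and end of the
    seminar. If any offset exceeds its limit, the seminar must be
    re-recorded.
    """
    return all(
        offset <= MAX_DESYNC_MS[sensor]
        for sensor, offset in measured_desync_ms.items()
    )

# Example: a T-shaped microphone drifting by 80 ms fails the check.
print(recording_passes({"camera": 120, "t_shaped_mic": 80}))  # False
```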

NM – LREC 2008 /11
Annotations
CLEAR 2007 annotations:
- Audio: transcriptions, acoustic events
- Video: facial features, head pose

Campaign      Development data   Evaluation data
Internal      2h 20              1h 40
CLEAR 2006    2h 30              3h 10
CLEAR 2007    2h 45              3h 25

NM – LREC 2008 /12
Audio Annotations
Orthographic transcriptions:
- 2 channels:
  - Based on near-field recordings (close-talking microphones)
  - Compared with one far-field recording
- Speaker turns
- Non-verbal events (laughs, pauses, ...)
- See: S. Burger, “The CHIL RT07 Evaluation Data”
Acoustic events:
- Based on one microphone array channel
- 15 categories of sounds: speech, door slam, step, chair moving, cup jingle, applause, laugh, key jingle, cough, keyboard, phone, music, knock, paper wrapping, unknown
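As an illustration of how the acoustic event layer could be handled programmatically, here is a small sketch; the `AcousticEvent` class and its field names are hypothetical, and only the 15 category names come from the slide above.

```python
# Illustrative representation of one acoustic event annotation.
from dataclasses import dataclass

# The 15 sound categories listed above.
EVENT_CATEGORIES = {
    "speech", "door slam", "step", "chair moving", "cup jingle",
    "applause", "laugh", "key jingle", "cough", "keyboard",
    "phone", "music", "knock", "paper wrapping", "unknown",
}

@dataclass
class AcousticEvent:
    start_s: float   # event start time in seconds
    end_s: float     # event end time in seconds
    category: str    # one of EVENT_CATEGORIES

    def __post_init__(self) -> None:
        if self.category not in EVENT_CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")

# Example annotation from one microphone-array channel:
event = AcousticEvent(start_s=12.4, end_s=13.1, category="door slam")
```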

NM – LREC 2008 /13
Video Annotations
Facial features (face detection, person tracking):
- Annotations every 1 second
- All attendees
- 4 camera views
- Facial labels:
  - head centroid
  - left and right eyes
  - nose bridge
  - face bounding box
- 2D head centroids → 3D “ground truth” (see the triangulation sketch below)
Person Identification database:
- 28 persons to identify
- Audio-visual excerpts for each person ID
- Video labels every 200 ms
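The slide says the 2D head centroids from the calibrated camera views were combined into a 3D “ground truth”. The standard way to do this is linear (DLT) triangulation; the sketch below assumes known 3x4 projection matrices per camera and is illustrative, not the actual CHIL tool.

```python
import numpy as np

def triangulate(points_2d: list[np.ndarray],
                projections: list[np.ndarray]) -> np.ndarray:
    """Least-squares 3D point from two or more views.

    points_2d:   per-camera pixel coordinates (u, v)
    projections: per-camera 3x4 projection matrices
    """
    rows = []
    for (u, v), P in zip(points_2d, projections):
        # Each view contributes two linear constraints on X = (x, y, z, 1).
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous solution = right singular vector of the smallest value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # back to Cartesian coordinates
```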

NM – LREC 2008 /14
Video Annotations

NM – LREC 2008 /15
Head Pose Data Set
Persons captured with different head orientations:
- Standing in the middle of a CHIL room (ISL)
- Captured by the 4 corner cameras
Annotations:
- Head bounding box
- Head orientation: pan, tilt, roll
10 persons for development
5 persons for evaluation
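Pan, tilt and roll angles together define a head rotation. A common convention composes them as R = Rz(roll) · Rx(tilt) · Ry(pan); the convention actually used in the CHIL annotations is not specified here, so this sketch is an assumption.

```python
import numpy as np

def head_rotation(pan: float, tilt: float, roll: float) -> np.ndarray:
    """Rotation matrix from pan/tilt/roll given in degrees."""
    p, t, r = np.radians([pan, tilt, roll])
    ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [ 0,         1, 0        ],
                   [-np.sin(p), 0, np.cos(p)]])   # pan: yaw about y
    rx = np.array([[1, 0,          0         ],
                   [0, np.cos(t), -np.sin(t)],
                   [0, np.sin(t),  np.cos(t)]])   # tilt: pitch about x
    rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r),  np.cos(r), 0],
                   [0,          0,         1]])   # roll: about z
    return rz @ rx @ ry
```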

NM – LREC 2008 /16
Head Pose Data Set

NM – LREC 2008 /17
Evaluation Package
The CLEAR 2007 evaluation package is publicly available through the ELRA catalog
It enables external players to evaluate their systems offline
For each of the evaluated technologies:
- Data sets (development / evaluation)
- Evaluation and scoring tools
- Results of the official campaign

NM – LREC 2008 /18
Conclusion
9 technologies evaluated during the 3rd CHIL evaluation campaign
The CLEAR 2007 evaluation package is available through the ELRA catalog
For more on the evaluations see:
- CLEAR 2007:
- RT 2007: