
1 Data Collection for the CHIL CLEAR 2007 Evaluation Campaign
N. Moreau 1, D. Mostefa 1, R. Stiefelhagen 2, S. Burger 3, K. Choukri 1
1 ELDA, 2 UKA-ISL, 3 CMU
E-mails: {moreau;mostefa;choukri}@elda.org, stiefel@ira.uka.de, sburger@cs.cmu.edu
Evaluations and Language resources Distribution Agency (ELDA): www.elda.org

2 Plan
1. CHIL project
2. Evaluation campaigns
3. Data recordings
4. Annotations
5. Evaluation package
6. Conclusion

3 CHIL Project
CHIL: Computers in the Human Interaction Loop
Integrated project funded by the European Commission (FP6), January 2004 – August 2007
15 partners, 9 countries (ELDA responsible for data collection and evaluations)
Multimodal and perceptual user interface technologies
Context:
– Real-life meetings (small meeting rooms)
– Activities and interactions of attendees

4 CHIL evaluation campaigns
June 2004: Dry run
January 2005: Internal evaluation campaign
February 2006: CLEAR 2006 campaign
February 2007: CLEAR 2007 campaign
CLEAR = Classification of Events, Activities and Relationships
– Open to external participants
– Supported by CHIL and NIST (VACE Program)
– Co-organized with the NIST RT (Rich Transcription) evaluation

5 CLEAR 2007 evaluation campaign
9 technologies evaluated:
– Vision technologies: Face Detection and Tracking, Visual Person Tracking, Visual Person Identification, Head Pose Estimation
– Acoustic technologies: Acoustic Person Tracking, Acoustic Speaker Identification, Acoustic Event Detection
– Multimodal technologies: Multimodal Person Tracking, Multimodal Speaker Identification

6 CHIL Scenarios
– Non-interactive lectures
– Interactive seminars

7 CHIL Data Sets
CLEAR 2007 data collection:
– 25 highly interactive seminars
– Attendees: between 3 and 7
– Events: several presenters, discussions, coffee breaks, people entering / leaving the room, ...

Campaign     | # Lectures | # Interactive Seminars
Internal     | 12         | 0
CLEAR 2006   | 34         | 15
CLEAR 2007   | 0          | 25

8 Recording setup
5 recording rooms
Sensors:
Audio:
– 64-channel microphone array
– 4-channel T-shaped microphones
– Table-top microphones
– Close-talking microphones
Video:
– 4 fixed corner cameras
– 1 ceiling wide-angle camera
– Pan-tilt-zoom (PTZ) cameras
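
For illustration, one room's sensor inventory could be captured in a small configuration structure such as the sketch below; the counts the slide does not give (table-top, close-talking and PTZ units) are placeholders, and all field names are hypothetical:

```python
# Hypothetical per-room sensor inventory based on the sensor types
# listed above. Counts marked as assumed are NOT from the source.
ROOM_CONFIG = {
    "audio": {
        "mic_array_channels": 64,                           # 64-channel array
        "t_shaped_mics": {"count": 4, "channels_each": 4},  # count assumed
        "table_top_mics": 3,                                # count assumed
        "close_talking_mics": 4,                            # count assumed
    },
    "video": {
        "fixed_corner_cameras": 4,
        "ceiling_wide_angle_cameras": 1,
        "ptz_cameras": 2,                                   # count assumed
    },
}
```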

9 NM – LREC 2008 /9 Camera Views

10 Quality Standards
Recording of 25 seminars in 2007 (5 per CHIL room)
Audio-visual clap at beginning and end of each recording
Cameras (JPEG files at 15, 25 or 30 fps):
– Max. desynchronisation = 200 ms
Microphone array:
– Max. desynchronisation = 200 ms
Other microphones (T-shape, table-top):
– Max. desynchronisation = 50 ms
If the desynchronisation exceeds the maximum, the recording has to be remade (see the sketch below).
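
A minimal sketch of how such a synchronisation check could work, assuming the audio-visual clap has already been located in each sensor stream and expressed as a timestamp in milliseconds; the thresholds come from the slide, while the function, data layout and sensor typing are illustrative assumptions:

```python
# Hypothetical sync check using the slide's thresholds. The data
# layout (sensor -> clap timestamp) is an assumption, not the
# actual CHIL tooling.
MAX_DESYNC_MS = {
    "camera": 200,      # JPEG streams at 15, 25 or 30 fps
    "mic_array": 200,   # 64-channel microphone array
    "other_mic": 50,    # T-shaped and table-top microphones
}

def check_recording(clap_times_ms, reference_sensor):
    """Compare each sensor's clap time against a reference sensor.

    clap_times_ms: dict mapping (sensor_name, sensor_type) to the
    detected clap timestamp in milliseconds.
    Returns the sensors that exceed their tolerance.
    """
    ref = clap_times_ms[reference_sensor]
    failures = []
    for (name, sensor_type), t in clap_times_ms.items():
        desync = abs(t - ref)
        if desync > MAX_DESYNC_MS[sensor_type]:
            failures.append((name, desync))
    return failures

claps = {
    ("ref_mic", "other_mic"): 1000,
    ("cam1", "camera"): 1130,          # 130 ms off: within tolerance
    ("table_mic", "other_mic"): 1080,  # 80 ms off: recording remade
}
print(check_recording(claps, ("ref_mic", "other_mic")))
```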

11 Annotations
CLEAR 2007 annotations:
– Audio: transcriptions, acoustic events
– Video: facial features, head pose

Campaign     | Development data | Evaluation data
Internal     | 2h 20            | 1h 40
CLEAR 2006   | 2h 30            | 3h 10
CLEAR 2007   | 2h 45            | 3h 25

12 Audio Annotations
Orthographic transcriptions:
– 2 channels: based on near-field recordings (close-talking microphones), compared with one far-field recording
– Speaker turns
– Non-verbal events (laughs, pauses, ...)
– See: S. Burger, "The CHIL RT07 Evaluation Data"
Acoustic events (see the sketch below):
– Based on one microphone array channel
– 15 categories of sounds: speech, door slam, step, chair moving, cup jingle, applause, laugh, key jingle, cough, keyboard, phone, music, knock, paper wrapping, unknown
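
To make the annotation scheme concrete, here is a minimal sketch of how a single acoustic event label could be represented in code; the `AcousticEvent` structure is a hypothetical illustration, not the campaign's actual annotation format (only the 15 categories are from the slide):

```python
# Illustrative representation of an acoustic event annotation.
# The CLEAR category list is from the slide; the data structure
# itself is hypothetical.
from dataclasses import dataclass

CATEGORIES = {
    "speech", "door slam", "step", "chair moving", "cup jingle",
    "applause", "laugh", "key jingle", "cough", "keyboard",
    "phone", "music", "knock", "paper wrapping", "unknown",
}

@dataclass
class AcousticEvent:
    start_s: float  # event start time in seconds
    end_s: float    # event end time in seconds
    label: str      # one of the 15 CLEAR categories

    def __post_init__(self):
        if self.label not in CATEGORIES:
            raise ValueError(f"unknown category: {self.label}")
        if self.end_s <= self.start_s:
            raise ValueError("event must have positive duration")

events = [
    AcousticEvent(12.3, 13.1, "door slam"),
    AcousticEvent(45.0, 47.5, "applause"),
]
```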

13 Video Annotations
Facial features (face detection, person tracking):
– Annotations every 1 second
– All attendees
– 4 camera views
– Facial labels: head centroid, left and right eyes, nose bridge, face bounding box
– 2D head centroids → 3D "ground truth" (see the triangulation sketch below)
Person Identification database:
– 28 persons to identify
– Audio-visual excerpts for each person ID
– Video labels every 200 ms
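
The 2D-to-3D step amounts to multi-view triangulation. Below is a minimal sketch using the standard linear (DLT) method, assuming each camera's 3x4 projection matrix is known from calibration; this illustrates the general technique, not the actual CHIL tooling:

```python
# Hypothetical multi-view triangulation of a 3D head centroid from
# 2D annotations in several calibrated camera views (linear DLT).
import numpy as np

def triangulate(points_2d, projections):
    """Least-squares 3D point from two or more views.

    points_2d: list of (u, v) pixel coordinates, one per camera
    projections: list of 3x4 projection matrices, same order
    """
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])  # u * p3^T - p1^T
        rows.append(v * P[2] - P[1])  # v * p3^T - p2^T
    A = np.stack(rows)
    # Homogeneous solution: right singular vector associated with
    # the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenise to (x, y, z)
```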

14 NM – LREC 2008 /14 Video Annotations

15 Head Pose Data Set
Persons captured with different head orientations:
– Standing in the middle of a CHIL room (ISL)
– Captured by the 4 corner cameras
Annotations:
– Head bounding box
– Head orientation: pan, tilt, roll
10 persons for development, 5 persons for evaluation
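
To make the pan/tilt/roll annotation concrete, here is a small sketch converting the three angles into a rotation matrix; the axis conventions and composition order are assumptions, since the slide does not specify them:

```python
# Hypothetical conversion of a head pose annotation (pan, tilt, roll,
# in degrees) into a 3x3 rotation matrix. Axis conventions assumed:
# pan about the vertical axis, tilt about the lateral axis, roll
# about the viewing axis, applied in that order.
import numpy as np

def head_pose_matrix(pan_deg, tilt_deg, roll_deg):
    p, t, r = np.radians([pan_deg, tilt_deg, roll_deg])
    # Pan: rotation about the vertical (y) axis
    Ry = np.array([[np.cos(p), 0, np.sin(p)],
                   [0,         1, 0        ],
                   [-np.sin(p), 0, np.cos(p)]])
    # Tilt: rotation about the lateral (x) axis
    Rx = np.array([[1, 0,         0         ],
                   [0, np.cos(t), -np.sin(t)],
                   [0, np.sin(t), np.cos(t) ]])
    # Roll: rotation about the viewing (z) axis
    Rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r), np.cos(r),  0],
                   [0,         0,          1]])
    return Ry @ Rx @ Rz

print(head_pose_matrix(30, -10, 5).round(3))
```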

16 NM – LREC 2008 /16 Head Pose Data Set

17 Evaluation package
The CLEAR 2007 evaluation package is publicly available through the ELRA catalog.
It enables external players to evaluate their systems offline.
For each of the evaluated technologies, it includes:
– Data sets (development/evaluation)
– Evaluation and scoring tools
– Results of the official campaign

18 Conclusion
9 technologies were evaluated during the 3rd CHIL evaluation campaign.
The CHIL 2007 evaluation package is available through the ELRA catalog: http://catalog.elra.info/
For more on the evaluations, see:
– CLEAR 2007: http://www.clear-evaluation.org/
– RT 2007: http://www.nist.gov/speech/tests/rt/

