Data collection and Multimodal Annotation Tools Dagstuhl 2001 Workgroup 2.


1 Data collection and Multimodal Annotation Tools Dagstuhl 2001 Workgroup 2

2 Members Permanent members: Lisa Harper, Michael Kipp, Emiel Krahmer, Jean-Claude Martin, Dagmar Schmauks Visiting scientists: Harry Bunt, Koiti Hasida, John Lee, Thomas Rist, Laurent Romary

3 Needs What is a multimodal corpus? What corpora exist? How to collect a corpus? How to develop/choose a coding scheme? What tool to develop/choose? What is the organizational infrastructure? What is the future?

4 Multimodal Corpus Sound –human speech (e.g. MPEG7) transcription, (morphology) part-of-speech syntax (linguistic DS) binary relations –thematic roles –rhetorical relations –co-reference –computer voice, sound, music –environmental sounds

5 Multimodal Corpus (2) Vision –head: movement, gaze, facial expression –gesture: hands/arms basic phases formal features (handshape, trajectories, direction, location etc.) encode qualities (Laban efforts?) functional/semiotic categories (emblem, iconic, deictic, self-adaptors etc.) –posture: including feet/legs –computer graphics (charts/tables), characters –static/dynamic environment (people/objects): moving camera

6 Multimodal Corpus (3) Haptic –pressure of feet/hands/back/on seat, texture –force feedback Biometric –heart rate, pupil dilation, skin sensitivity, eyebrow movement, breathing Smell & taste (VR) Balance (VR) Thermal (VR) –body/object temperature, conduit properties

7 Multimodal Corpus (4) Within-modality/cross-modality relations –mirror behavior, synchronized behavior, repeated behavior, postural congruence –distance and touch Behavioral/Social units? often across modalities!

8 Needs What is a multimodal corpus? What corpora exist? How to collect a corpus? How to develop/choose a coding scheme? What tool to develop/choose? What is the organizational infrastructure? What is the future?

9 Existing Corpora: Meta Survey Existing surveys: –ISLE and NIMM (D8, EU & US) –ELRA (EU) –COCOSDA (Japan) –LDC (US) –TalkBank (US)

10 Existing Corpora: Dagstuhl 2001 Survey with Dagstuhl participants Collected 28 questionnaires from 24 different institutes
number of corpora    number of participants
0                    6
1                    12
2-9                  8
10+                  2

11 Questionnaire
annotated modalities –speech: 20 –gestures: 17 –facial expression: 5 –gaze: 3 –posture: 3
file format –analogue: 4 –digital: 12 –I don't know: 4
tool –own tool: 9 –other tool: 3 –no tool: 8 –I don't know: 1
application areas –tourism/navigation (10), consumer electronics, info kiosk, realty, storytelling, instruction, cinema, graphical design, everyday gestures, education, car, face guessing, games, talk shows

12 Questionnaire (2) Languages –English: 11 –German: 5 –French: 2 –Japanese: 3 –Italian: 2 –Dutch, Swedish, Finnish: 1 Planning to collect: 21

13 Needs What is a multimodal corpus? What corpora exist? How to collect a corpus? How to develop/choose a coding scheme? What tool to develop/choose? What is the organizational infrastructure? What is the future?

14 Data Collection: Methodology? Legal issues: –ethical –commercial –country-dependent legislation Practical guidelines (best practice) –technical setup for recording –field-specific coder training, models for coding manuals Specify meta-data

15 Needs What is a multimodal corpus? What corpora exist? How to collect a corpus? How to develop/choose a coding scheme? What tool to develop/choose? What is the organizational infrastructure? What is the future?

16 Coding Schemes Survey on existing schemes: ISLE D9 Guidelines for developing schemes: –encoding vs. inference –can the scheme accommodate semantics or generation languages for MM players (MPEG)? Standardization –partial standards, as in speech –standards for computer output log files (graphics output, locations, XML, trajectories, time-stamping, granularity)
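The idea of a standardized, time-stamped XML log of computer output can be illustrated with a minimal sketch. The element and attribute names used here (`log`, `event`, `type`, `t`, `target`, `x`, `y`, `text`) and the millisecond granularity are assumptions made for illustration only; the slides do not propose a concrete format.

```python
import xml.etree.ElementTree as ET

def log_event(kind, t_ms, **attrs):
    """Build one time-stamped log entry (hypothetical element/attribute names)."""
    event = ET.Element("event", {"type": kind, "t": str(t_ms)})
    for key, value in attrs.items():
        event.set(key, str(value))
    return event

# Root element records the time granularity once for the whole log.
log = ET.Element("log", {"timeunit": "ms"})
log.append(log_event("graphics", 1200, target="chart1", x=40, y=80))
log.append(log_event("speech_output", 1350, text="Here is the chart"))

# Serialize to a string that an annotation tool could import.
xml_text = ET.tostring(log, encoding="unicode")
print(xml_text)
```

Recording the time unit on the root element keeps the granularity explicit, which matters when such a log is later aligned with recorded speech or gesture annotations.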

17 Needs What is a multimodal corpus? What corpora exist? How to collect a corpus? How to develop/choose a coding scheme? What tool to develop/choose? What is the organizational infrastructure? What is the future?

18 Tools Surveys of existing tools: –ISLE D11 –(Bigbee, Loehr, Harper 2001) –TalkBank proposal Underlying frameworks: –track-based –annotation graphs –spatial annotation?
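A track-based framework can be sketched roughly as follows: each modality gets its own track of time-aligned, labeled intervals, and cross-modality relations are found by comparing time spans. The class names `Annotation` and `Track`, the helper `overlapping`, and the sample labels are hypothetical and are not taken from any of the surveyed tools.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Annotation:
    start: float  # seconds from the start of the recording
    end: float
    label: str

@dataclass
class Track:
    name: str
    annotations: list = field(default_factory=list)

    def add(self, start, end, label):
        # Reject empty or reversed intervals.
        if end <= start:
            raise ValueError("end must be later than start")
        self.annotations.append(Annotation(start, end, label))

def overlapping(track_a, track_b):
    """Return pairs of annotations from two tracks whose time spans overlap."""
    return [
        (a, b)
        for a in track_a.annotations
        for b in track_b.annotations
        if a.start < b.end and b.start < a.end
    ]

speech = Track("speech")
speech.add(0.0, 1.2, "put that")
speech.add(1.2, 2.0, "there")

gesture = Track("gesture")
gesture.add(1.0, 1.9, "deictic")

pairs = overlapping(speech, gesture)
for s, g in pairs:
    print(f"{s.label!r} overlaps gesture {g.label!r}")
```

In this toy data the deictic gesture spans both speech segments, which is the kind of cross-modality relation a track-based tool makes visible.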

19 Tools: Checklists Checklist for coding support –fast and efficient annotation –efficient view, search & find, customizable –extensibility of annotation –easy access to scheme definitions (online) –automatic extraction of modality-specific specimen (images, sound bits, transcription sequences) Checklist for multi-coder support –update/merge, concurrent coding, reliability Checklist for Import/Export
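For the reliability item on the multi-coder checklist, one common measure of agreement between two coders is Cohen's kappa. The slides do not name a specific metric, so its use here is an assumption; the function name `cohen_kappa` and the sample gesture labels are illustrative only.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same sequence of items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)

coder_a = ["deictic", "iconic", "deictic", "emblem"]
coder_b = ["deictic", "iconic", "iconic", "emblem"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Here the coders agree on 3 of 4 items (0.75), chance agreement is 5/16, and kappa works out to 7/11, about 0.636. A coding tool with merge support could compute such a value per track after combining the coders' files.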

20 Tools: Visions Bootstrapping (semi-automatic or fully automatic annotation) Use MM techniques for coding tools (3D, haptic, VR) Standardized analysis (e.g. metrics) and visualization (metaphors) Modular generic framework for –tools –schemes

21 Tools (4) [Diagram; labels: Annotation Framework (tracks, types, objects etc.); Coding Tool; Coding Scheme; specific analysis; ML classifier; parser; Logical Layer; data viewer; general analysis]

22 [Diagram; labels: Annotation Framework (tracks, types, objects etc.); Coding Tool; scheme framework; analysis module; ML classifier; parser; speech; gaze; gesture]

23 Needs What is a multimodal corpus? What corpora exist? How to collect a corpus? How to develop/choose a coding scheme? What tool to develop/choose? What is the organizational infrastructure? What is the future?

24 Organizations Initiatives: –EAGLES/ISLE –ATLAS, MATE/NITE –TalkBank, CHILDES International: –ELRA/ELDA, US? Asia? National agencies (Eurospeech): –BAS, LDC, MPI Nijmegen

25 Needs What is a multimodal corpus? What corpora exist? How to collect a corpus? How to develop/choose a coding scheme? What tool to develop/choose? What is the organizational infrastructure? What is the future?

26 Future Data collection project –sample videos with illustrative MM data –pre-coded minimal data (speech transcription) Comparison/integration of schemes Encourage collaborative coding?

27 Future (2) Workshop at LREC (Language Resources and Evaluation), Canary Islands! May 2002 –deadline: 20 Nov 2001 –paper on Dagstuhl and follow-ups –coding exercise based on data collection –questionnaire based on Dagstuhl survey

