Exploiting Subjective Annotations
Dennis Reidsma and Rieks op den Akker
Human Media Interaction, University of Twente

Types of content
Annotation as a task of subjective judgments?
- Manifest content
- Pattern latent content
- Projective latent content
Cf. Potter and Levine-Donnerstein (1999)

Projective latent content
Why annotate data as projective latent content?
- Because it cannot be defined exhaustively, whereas annotators have good `mental schemas' for it
- Because the data should be annotated in terms that fit the understanding of `naïve users'

Inter-annotator agreement and projective content
Disagreements may be caused by:
- Errors by annotators
- An invalid scheme (no true label exists)
- Different annotators having different `truths' in their interpretation of behavior (subjectivity)

Subjective annotation
People communicate in different ways and therefore, as observers, may also judge the behavior of others differently.
Projective content may be especially vulnerable to this problem.
How can we work with subjectively annotated data?

Subjective annotation
How to work with subjectively annotated data? Unfortunately, subjectivity leads to low levels of agreement, so such data would usually be discarded as `unproductive material'.

I. Predicting agreement
One way to work with subjective data is to find out in which contexts annotators would agree, and to focus on those situations.
Result: a classifier that does not classify every instance, but when it does, it does so with greater accuracy.
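As a rough illustration of this idea (my sketch, not the authors' implementation): a base model predicts the label, a second model predicts whether annotators would agree in the current context, and the combined classifier abstains when agreement is unlikely. The class and variable names, and the use of a multiply-annotated subset to learn agreement, are assumptions for illustration.

```python
# Illustrative sketch of a "cautious classifier" (not the authors' code):
# a base model predicts labels, a meta-model predicts whether annotators
# would agree in this context, and the ensemble abstains otherwise.

ABSTAIN = None  # sentinel meaning "no judgment rendered"

class CautiousClassifier:
    def __init__(self, base_model, agreement_model):
        self.base_model = base_model            # any fit/predict estimator
        self.agreement_model = agreement_model  # predicts 1 = "would agree"

    def fit(self, X, y, X_multi, agreed):
        # X, y: singly-annotated training data for the actual labels.
        # X_multi, agreed: a (smaller) multiply-annotated subset, where
        # agreed[i] = 1 iff the annotators assigned the same label.
        self.base_model.fit(X, y)
        self.agreement_model.fit(X_multi, agreed)
        return self

    def predict(self, X):
        labels = self.base_model.predict(X)
        agree = self.agreement_model.predict(X)
        # Commit only in contexts where annotators would likely agree.
        return [lab if a == 1 else ABSTAIN for lab, a in zip(labels, agree)]
```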

II. Explicitly modeling intersubjectivity
A second way: model the different annotators separately, then find the cases where the models agree, and assume that those are the cases where the annotators would have agreed, too.
Result: a classifier that tells you for which instances other annotators would most probably agree with its classification.

Advantages
- Both solutions lead to `cautious classifiers' that only render a judgment in those cases where annotators would have been expected to agree.
- This may carry over to users, too.
- Neither solution needs all data to be multiply annotated.

Pressing questions so far? (The remainder of the talk will give two case studies.)

Case studies
I. Predicting agreement from information in other (easier) modalities: the case of contextual addressing
II. Explicitly modeling intersubjectivity in dialog markup: the case of voting classifiers

Data used: the AMI Corpus
100 hours of recorded meetings, annotated with dialog acts, focus of attention, gestures, addressing, decision points, and other layers.

I. Contextual addressing
Addressing and focus of attention (FOA): inter-annotator agreement on addressing is highest in certain FOA contexts, and in those contexts the classifier also performed better. (More detail in the paper.)
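In code, this gating might look like the following sketch; the FOA context name is an invented placeholder, since the actually reliable contexts come from the per-context agreement analysis in the paper.

```python
# Sketch of gating an addressee classifier on focus-of-attention (FOA)
# context. The set of reliable contexts below is a made-up placeholder;
# in practice it would be derived from per-context agreement figures.
RELIABLE_FOA_CONTEXTS = {"gaze_at_single_participant"}  # hypothetical name

def classify_addressee(model, features, foa_context):
    if foa_context not in RELIABLE_FOA_CONTEXTS:
        return None  # abstain: annotators tend to disagree here
    return model.predict([features])[0]
```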

II. Modeling intersubjectivity
Modeling single annotators, for `yeah' utterances. Data annotated non-overlapping by 3 annotators (d, s, v):

Annotator   Train   Test
d            3585   2289
s            1753    528
v            3500   1362

II. Modeling intersubjectivity
Cross-annotator training and testing: each annotator-specific classifier (C_d, C_s, C_v) is tested against each annotator's test set (TST_d, TST_s, TST_v) as well as the combined set (TST_all). (Results matrix shown on the original slide.)

II. Modeling intersubjectivity
Building a voting classifier: only classify an instance when all three annotator-specific expert classifiers agree.
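A minimal sketch of such a unanimous-voting wrapper, under the assumption that the three expert models have already been trained on annotators d, s, and v; all names are illustrative.

```python
# Sketch of a unanimous voting classifier: one expert per annotator,
# each trained on that annotator's non-overlapping data; the ensemble
# only outputs a label when all experts agree. Illustrative only.

ABSTAIN = None

class UnanimousVoter:
    def __init__(self, experts):
        self.experts = experts  # e.g. [model_d, model_s, model_v]

    def predict(self, X):
        all_votes = [expert.predict(X) for expert in self.experts]
        decisions = []
        for votes in zip(*all_votes):
            # Unanimity proxies "the annotators would have agreed here".
            decisions.append(votes[0] if len(set(votes)) == 1 else ABSTAIN)
        return decisions
```

Coverage drops, since disputed instances are left unlabeled, but precision on the instances that do get labeled rises, consistent with the gain reported on the next slide.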

II. Modeling intersubjectivity
In the unanimous-voting setting, performance is higher, owing to increased precision (6% on average).

Conclusions
- Possible subjective aspects of annotation should be taken into account.
- Agreement metrics are not designed to handle this.
- We proposed two methods designed to cope with subjective data.

Thank you! Questions?