
1 ACM email corpus annotation analysis Andrew Rosenberg 2/26/2004

2 Overview
– Motivation
– Corpus Description
– Kappa Shortcomings
– Kappa Augmentation
– Classification of messages
– Corpus annotation analysis
– Next step: Sharpening method
– Summary

3 Motivation
The ACM email corpus annotation raises two problems:
– Because annotators may assign a message one or two labels, there is no clear way to calculate an agreement statistic. An augmentation to the kappa statistic is proposed.
– Interannotator reliability is low (K < 0.3). Annotator re-education and/or redesign of the annotation materials is most likely necessary. Hypothetically, the available annotated data can be used to improve category assignment.

4 Corpus Description
– 312 email messages exchanged among members of the Columbia chapter of the ACM.
– Each message was annotated by 2 annotators with one or two of the following 10 labels: question, answer, broadcast, attachment transmission, planning, planning scheduling, planning-meeting scheduling, action item, technical discussion, social chat.

5 Kappa Shortcomings
– Before running ML procedures, we need confidence in the labels assigned to the messages.
– In order to compute kappa (below), we need to count the number of agreements.
– How do you determine agreement when there is an optional secondary label? Ignore the secondary label?
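For reference, the kappa referred to above (presumably shown as an image on the original slide) has the standard two-coder form:

    kappa = (P(A) - P(E)) / (1 - P(E))

where P(A) is the observed agreement and P(E) the agreement expected by chance.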

6 Kappa Shortcomings (ctd.)
Ignoring the secondary label isn’t acceptable for two reasons:
– It is inconsistent with the annotation guidelines.
– It ignores partial agreements:
  {a, ba} – singleton matches secondary
  {ab, ca} – primary matches secondary
  {ab, cb} – secondary matches secondary
  {ab, ba} – secondary matches primary, and vice versa
Note: the purpose is not to inflate the kappa value, but to accurately assess the data.

7 Kappa Augmentation
– When a labeler employs a secondary label, treat it as a single annotation divided between two categories.
– Select a value of p, where 0.5 ≤ p ≤ 1.0, according to how heavily the secondary label should be weighted:
  – Singleton annotations are assigned a score of 1.0
  – Primary label: p
  – Secondary label: 1 − p
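As a rough illustration of this weighting (a sketch in Python; the category set and function name are mine, not from the slides), one annotation becomes a weighted row over the categories:

    CATEGORIES = ["a", "b", "c", "d"]

    def label_row(labels, p=0.6, categories=CATEGORIES):
        # One or two labels -> a weight for every category.
        # Singletons get 1.0; otherwise the primary gets p and the secondary 1 - p.
        row = {c: 0.0 for c in categories}
        if len(labels) == 1:
            row[labels[0]] = 1.0
        else:
            primary, secondary = labels
            row[primary] = p
            row[secondary] = 1.0 - p
        return row

    # Message 1 of the example on the next slide: Judge A labeled it (a, b).
    print(label_row(("a", "b")))   # {'a': 0.6, 'b': 0.4, 'c': 0.0, 'd': 0.0}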

8 Kappa Augmentation example

Annotator labels:
  msg 1: A = a, b   B = b, d
  msg 2: A = b, a   B = a, b
  msg 3: A = b      B = b
  msg 4: A = c      B = a, d
  msg 5: A = b, c   B = c

Annotation matrices with p = 0.6 (rows = messages, columns = a, b, c, d):

Judge A
  1:     0.6   0.4   0     0
  2:     0.4   0.6   0     0
  3:     0     1     0     0
  4:     0     0     1     0
  5:     0     0.6   0.4   0
  Total: 1     2.6   1.4   0    (sum = 5)

Judge B
  1:     0     0.6   0     0.4
  2:     0.6   0.4   0     0
  3:     0     1     0     0
  4:     0.6   0     0     0.4
  5:     0     0     1     0
  Total: 1.2   2     1     0.8  (sum = 5)

9 Kappa Augmentation example (ctd.)

Agreement matrix (rows = messages, columns = a, b, c, d):
  1:     0      0.24   0     0
  2:     0.24   0.24   0     0
  3:     0      1.00   0     0
  4:     0      0      0     0
  5:     0      0      0.4   0
  Total: 0.24   1.48   0.4   0    (sum = 2.12)

(The annotation matrices for Judges A and B are repeated from the previous slide.)
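The agreement values above are consistent with multiplying the two judges' weights category by category and summing. A minimal sketch under that assumption (values hard-coded from the example at p = 0.6) reproduces the slide's total of 2.12 and gives an observed agreement P(A) = 2.12 / 5 = 0.424:

    # Annotation rows from the example; categories not listed have weight 0.
    judge_a = [
        {"a": 0.6, "b": 0.4},   # msg 1: a, b
        {"a": 0.4, "b": 0.6},   # msg 2: b, a
        {"b": 1.0},             # msg 3: b
        {"c": 1.0},             # msg 4: c
        {"b": 0.6, "c": 0.4},   # msg 5: b, c
    ]
    judge_b = [
        {"b": 0.6, "d": 0.4},   # msg 1: b, d
        {"a": 0.6, "b": 0.4},   # msg 2: a, b
        {"b": 1.0},             # msg 3: b
        {"a": 0.6, "d": 0.4},   # msg 4: a, d
        {"c": 1.0},             # msg 5: c
    ]

    # Per-message, per-category agreement = product of the two judges' weights.
    agreement = [{c: ra.get(c, 0.0) * rb.get(c, 0.0) for c in "abcd"}
                 for ra, rb in zip(judge_a, judge_b)]
    total = sum(sum(row.values()) for row in agreement)
    print(total, total / len(agreement))   # ~2.12 and P(A) ~0.424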

10 Kappa Augmentation example (ctd.)
To calculate p(E), use the relative frequencies of each annotator's label usage.

  Topic   P(Judge A)   P(Judge B)   P(A)*P(B)
  a       0.2          0.24         0.048
  b       0.52         0.4          0.208
  c       0.28         0.2          0.056
  d       0            0.16         0
                                    p(E) = 0.312

Kappa is then computed as originally.
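Putting the example together: P(E) is the sum over categories of the product of the two judges' marginal frequencies, and P(A) = 2.12 / 5 comes from the agreement matrix. A short check (the final kappa' of roughly 0.163 is my arithmetic; the slide reports only the inputs):

    # Marginal label frequencies from the annotation-matrix totals (5 messages each).
    freq_a = {"a": 1.0 / 5, "b": 2.6 / 5, "c": 1.4 / 5, "d": 0.0}
    freq_b = {"a": 1.2 / 5, "b": 2.0 / 5, "c": 1.0 / 5, "d": 0.8 / 5}

    p_e = sum(freq_a[c] * freq_b[c] for c in "abcd")   # chance agreement, 0.312
    p_a = 2.12 / 5                                      # observed agreement, 0.424
    kappa = (p_a - p_e) / (1 - p_e)
    print(round(p_e, 3), round(kappa, 3))               # 0.312  0.163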

11 Classification of messages
This augmentation allows us to classify messages based on their individual kappa’ values at different values of p:
– Class 1: high kappa’ at all values of p
– Class 2: low kappa’ at all values of p
– Class 3: high kappa’ only at p = 1.0
– Class 4: high kappa’ only at p = 0.5
Note: mathematically, kappa’ need not be monotonic w.r.t. p, but with 2 annotators it is.
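A minimal sketch of this classification for a single message, using only the endpoint values of p (sufficient because kappa' is monotonic in p with 2 annotators). The threshold separating "high" from "low" is an assumption, not a number from the slides:

    def classify(kappa_p10, kappa_p05, high=0.6):
        # high: assumed cut-off between "high" and "low" kappa'.
        high_at_1 = kappa_p10 >= high
        high_at_half = kappa_p05 >= high
        if high_at_1 and high_at_half:
            return 1    # Class 1: high kappa' at all values of p
        if not high_at_1 and not high_at_half:
            return 2    # Class 2: low kappa' at all values of p
        return 3 if high_at_1 else 4   # Class 3: high only at p = 1.0; Class 4: high only at p = 0.5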

12 Corpus Annotation Analysis
Agreement is low at all values of p:
– K’(p = 1.0) = 0.299
– K’(p = 0.5) = 0.281
Other views of the data will provide some insight into how to revise the annotation scheme:
– Category distribution
– Category co-occurrence
– Category confusion
– Class distribution
– Category by class distribution

13 Corpus Annotation Analysis: Category Distribution

  Category                      total   gr    db
  Question                      175     86    89
  Answer                        169     90    79
  Broadcast                     132     23    109
  Attachment Transmission       3       1     2
  Planning Meeting Scheduling   63      32    31
  Planning Scheduling           27      2     25
  Planning                      92      76    16
  Action Item                   19      10    9
  Technical Discussion          31      22    9
  Social Chat                   36      29    7

14 Corpus Annotation Analysis: Category Co-occurrence
Upper-triangular co-occurrence counts over the categories Q, A, B, A.T., P.M.S., P.S., P, A.I., T.D., S.C. ("x" marks cells at or below the diagonal):

  Question:                     x 191218617167
  Answer:                       x x 201534172
  Broadcast:                    x x x 0 2 2 8 0 0 1
  Attachment Transmission:      x x x x 0 0 0 0 0 0
  Planning Meeting Scheduling:  x x x x x 2 1 0 0 0
  Planning Scheduling:          x x x x x x 0 0 0 0
  Planning:                     x x x x x x x 3 2 0
  Action Item:                  x x x x x x x x 1 0
  Technical Discussion:         x x x x x x x x x 1
  Social Chat:                  x x x x x x x x x x

15 Corpus Annotation Analysis: Category Confusion
Upper-triangular confusion counts over the categories Q, A, B, A.T., P.M.S., P.S., P, A.I., T.D., S.C. ("x" marks cells below the diagonal):

  Question:                     623621018134771310
  Answer:                       x 60150247195173
  Broadcast:                    x x 1401213523822
  Attachment Transmission:      x x x 0 0 0 1 0 0 1
  Planning Meeting Scheduling:  x x x x 1363200
  Planning Scheduling:          x x x x x 2 4 1 1 0
  Planning:                     x x x x x x 7 5 5 0
  Action Item:                  x x x x x x x 1 2 1
  Technical Discussion:         x x x x x x x x 2 1
  Social Chat:                  x x x x x x x x x 4

16 Corpus Annotation Analysis: Class Distribution

  Constant High (Class 1):   82    (0.262821)
  Constant Low (Class 2):    150   (0.480769)
  Low to High (Class 3):     40    (0.128205)
  High to Low (Class 4):     40    (0.128205)
  Total Messages:            312

17 Corpus Annotation Analysis: Category by Class Distribution (1/2)

Class 1 (constant high):
  Category                      Num messages   Class : Total
  Question                      52             0.29714
  Answer                        62             0.36686
  Broadcast                     16             0.12121
  Attachment Transmission       0              0
  Planning Meeting Scheduling   18             0.28571
  Planning Scheduling           2              0.07407
  Planning                      8              0.08695
  Action Item                   0              0
  Technical Discussion          2              0.06451
  Social Chat                   4              0.11111

Class 2 (constant low):
  Category                      Num messages   Class : Total
  Question                      37             0.21142
  Answer                        42             0.24852
  Broadcast                     92             0.69697
  Attachment Transmission       3              1
  Planning Meeting Scheduling   24             0.38095
  Planning Scheduling           13             0.48148
  Planning                      60             0.65217
  Action Item                   14             0.73684
  Technical Discussion          17             0.54838
  Social Chat                   22             0.61111

18 Corpus Annotation Analysis: Category by Class Distribution (2/2)

Class 3 (low to high):
  Category                      Num messages   Class : Total
  Question                      46             0.26285
  Answer                        40             0.23668
  Broadcast                     6              0.04545
  Attachment Transmission       0              0
  Planning Meeting Scheduling   4              0.06349
  Planning Scheduling           5              0.18518
  Planning                      5              0.05434
  Action Item                   4              0.21052
  Technical Discussion          11             0.35483
  Social Chat                   6              0.16666

Class 4 (high to low):
  Category                      Num messages   Class : Total
  Question                      40             0.22857
  Answer                        25             0.14972
  Broadcast                     18             0.13636
  Attachment Transmission       0              0
  Planning Meeting Scheduling   17             0.26984
  Planning Scheduling           7              0.25925
  Planning                      19             0.20652
  Action Item                   1              0.05263
  Technical Discussion          1              0.03225
  Social Chat                   2              0.11111

19 Next step: Sharpening method
In determining interannotator agreement with kappa, etc., two available pieces of information are overlooked:
– Some annotators are “better” than others
– Some messages are “easier to label” than others
By limiting the contribution of known poor annotators and difficult messages, we gain confidence in the final category assignment of each message.
How do we rank annotators? Messages?

20 Sharpening Method (ctd.)
Ranking annotators:
– Calculate kappa between each annotator and the rest of the group.
– “Better” annotators have higher agreement with the group.
Ranking messages:
– Compute the variance (or the entropy, −p·log p summed over categories) of each message's label vector summed over annotators.
– Messages with high variance (equivalently, low entropy) are more consistently annotated.
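A sketch of the message score (illustrative; the function name and the use of entropy rather than variance are my choices): pool each message's labels over the annotators and measure how concentrated the resulting count vector is.

    import math
    from collections import Counter

    def label_entropy(labels_per_annotator):
        # labels_per_annotator: one tuple of labels per annotator for a single message.
        counts = Counter(l for labels in labels_per_annotator for l in labels)
        total = sum(counts.values())
        return -sum((n / total) * math.log(n / total) for n in counts.values())

    print(label_entropy([("b",), ("b",)]))           # 0.0   -> perfectly consistent
    print(label_entropy([("a", "b"), ("c", "d")]))   # ~1.39 -> inconsistent

Low entropy corresponds to the "high variance" case on the slide: the annotators concentrated their labels on the same category.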

21 Sharpening Method (ctd.)
How do we use these ranks?
– Weight the annotators based on their rank.
– Recompute the message matrix with weighted annotator contributions.
– Weight the messages based on their rank.
– Recompute the kappa values with weighted message contributions.
– Repeat these steps until the change in the weights falls below a threshold.
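The loop can be sketched as below. The two update functions are placeholders standing in for the annotator- and message-ranking steps above; the starting weights and convergence test are assumptions, not a reconstruction of the author's code:

    def sharpen(update_annotator_weights, update_message_weights,
                n_annotators, n_messages, tol=1e-3, max_iter=100):
        ann_w = [1.0] * n_annotators   # start with uniform annotator weights
        msg_w = [1.0] * n_messages     # and uniform message weights
        for _ in range(max_iter):
            new_ann_w = update_annotator_weights(msg_w)    # e.g. kappa vs. the group
            new_msg_w = update_message_weights(new_ann_w)  # e.g. variance / entropy rank
            change = max(max(abs(a - b) for a, b in zip(ann_w, new_ann_w)),
                         max(abs(a - b) for a, b in zip(msg_w, new_msg_w)))
            ann_w, msg_w = new_ann_w, new_msg_w
            if change < tol:           # stop once the weights settle
                break
        return ann_w, msg_w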

22 Summary
The ACM email corpus annotation raises two problems:
– Because annotators may assign a message one or two labels, there is no clear way to calculate an agreement statistic. An augmentation to the kappa statistic is proposed.
– Interannotator reliability is low (K < 0.3). Annotator re-education and/or redesign of the annotation materials is most likely necessary. Hypothetically, the available annotated data can be used to improve category assignment.

