
1 Overview of the TDT-2003 Evaluation and Results
Jonathan Fiscus, NIST, Gaithersburg, Maryland
November 17-18, 2003

2 Outline
TDT Evaluation Overview
TDT-2003 Evaluation Result Summaries
  New Event Detection
  Topic Detection
  Topic Tracking
  Link Detection
Other Investigations

3 TDT 101: “Applications for organizing text”
Five TDT applications for organizing terabytes of unorganized data:
  Story Segmentation
  Topic Tracking
  Topic Detection
  New Event Detection
  Link Detection

4 TDT’s Research Domain
Technology challenge: develop applications that organize and locate relevant stories from a continuous feed of news stories.
Research is driven by evaluation tasks.
Composite applications are built from:
  Automatic Speech Recognition
  Story Segmentation
  Document Retrieval

5 Definitions
An event is a specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences.
A topic is an event or activity, along with all directly related events and activities.
A broadcast news story is a section of transcribed text with substantive information content and a unified topical focus.

6 TDT-2003 Evaluation Corpus: TDT4
TDT4 corpus, also used for last year’s evaluation
October 1, 2000 to January 31, 2001
20 sources: 8 English, 5 Arabic, 7 Mandarin Chinese
90,735 news stories and 7,513 non-news stories
80 annotated topics: 40 topics from 2002 and 40 new topics
See LDC’s presentation for more details

7 What was new in 2003
40 new topics
Same number of “On-Topic” stories
20, 10, and 10 seed stories for Arabic, English, and Mandarin respectively
Many more Arabic “On-Topic” stories, with a large influence on scores

8 Participants
Carnegie Mellon Univ. (CMU)
Royal Melbourne Institute of Technology (RMIT)
Stottler Henke Associates, Inc. (SHAI)
Univ. Massachusetts (UMass)
Submitted runs by task (New Event Detection, Topic Detection, Topic Tracking, Link Detection): CMU 22611, RMIT 12, SHAI 10, UMass 831817

9 TDT Evaluation Methodology
Evaluation tasks are cast as detection tasks: YES there is a target, or NO there is not.
Performance is measured in terms of detection cost, “a weighted sum of missed detection and false alarm probabilities”:
  C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)
where C_Miss = 1 and C_FA = 0.1 are preset costs, and P_target = 0.02 is the a priori probability of a target.
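To make the cost formula concrete, here is a minimal sketch in Python; the function name and the example error rates are invented for illustration, and it simply evaluates the weighted sum using the preset constants above.

```python
# Minimal sketch of the TDT detection cost (constants from the slide above;
# the function name and example error rates are invented for illustration).

C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02   # preset costs and target prior

def detection_cost(p_miss: float, p_fa: float) -> float:
    """C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)."""
    return C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)

# e.g. a system that misses 20% of targets and false-alarms on 1% of non-targets
print(detection_cost(0.20, 0.01))   # 0.00498
```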

10 TDT Evaluation Methodology (cont’d)
Detection cost is normalized to generally lie between 0 and 1:
  (C_Det)_Norm = C_Det / min{C_Miss * P_target, C_FA * (1 - P_target)}
When based on the YES/NO decisions, it is referred to as the actual decision cost.
Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between P_Miss and P_FA, making use of likelihood scores attached to the YES/NO decisions.
The minimum DET point is the best score a system could achieve with proper thresholds.
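A rough illustration of the normalization and of the minimum-DET-point search follows; this is not the official scoring software, and the trial format and toy numbers are assumptions. Every system likelihood score is tried as a YES threshold and the lowest normalized cost is kept.

```python
# Rough illustration (not the official scoring tools): normalized detection
# cost plus a brute-force search for the minimum DET point over a list of
# (likelihood_score, is_target) trials. The toy trials are invented.

C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02
NORM = min(C_MISS * P_TARGET, C_FA * (1.0 - P_TARGET))   # = 0.02 here

def norm_cost(p_miss, p_fa):
    return (C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)) / NORM

def min_det_point(trials):
    """Try every likelihood score as a YES threshold; keep the lowest cost."""
    n_tgt = sum(1 for _, is_target in trials if is_target)
    n_non = len(trials) - n_tgt
    best = float("inf")
    for threshold, _ in trials:
        p_miss = sum(1 for s, t in trials if t and s < threshold) / n_tgt
        p_fa = sum(1 for s, t in trials if not t and s >= threshold) / n_non
        best = min(best, norm_cost(p_miss, p_fa))
    return best

trials = [(0.9, True), (0.7, True), (0.6, False), (0.4, True), (0.2, False)]
print(min_det_point(trials))   # ~0.33 for this toy data
```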

11 TDT: Experimental Control
Good research requires experimental controls. Conditions that affect performance in TDT:
  Newswire vs. broadcast news
  Manual vs. automatic transcription of broadcast news
  Manual vs. automatic story segmentation
  Mono- vs. multilingual language material
  Topic training amounts and languages
  Default automatic English translation vs. native orthography
  Decision deferral periods

12 Outline
TDT Evaluation Overview
TDT-2003 Evaluation Result Summaries
  New Event Detection (NED)
  Topic Detection
  Topic Tracking
  Link Detection
Other Investigations

13 New Event Detection Task
System goal: to detect the first story that discusses each topic, i.e., each new event.
This evaluates “part” of a Topic Detection system: deciding when to start a new cluster.
(Diagram: a story stream with the first stories of Topic 1 and Topic 2 marked as new events, and later stories on those topics marked as not first stories.)
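As a rough illustration of the “when to start a new cluster” decision, the sketch below flags a story as a first story when its best similarity to everything seen so far falls below a threshold. The bag-of-words cosine representation and the threshold value are assumptions made for the example, not any site’s evaluated system.

```python
# Illustrative first-story detection (not any site's system): a story is a
# new event if its best cosine similarity against all earlier stories is
# below a threshold. Tokenization, weighting, and the threshold are assumed.

import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())          # crude bag of words

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def new_event_decisions(stories, threshold=0.2):
    seen, decisions = [], []
    for story in stories:
        vec = vectorize(story)
        best = max((cosine(vec, old) for old in seen), default=0.0)
        decisions.append(best < threshold)        # YES = first story of a new event
        seen.append(vec)
    return decisions
```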

14 TDT-03 Primary NED Results (SR=nwt+bnasr, TE=eng,nat, boundary, DEF=10)

15 Primary NED Results 2002 vs. 2003 Topics

16 Topic Detection Task
System goal: to detect topics in terms of the (clusters of) stories that discuss them.
“Unsupervised” topic training: new topics must be detected as the incoming stories are processed, and input stories are then associated with one of the topics.
(Diagram: a story stream clustered into Topic 1 and Topic 2.)
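A minimal sketch of this unsupervised detection as single-pass incremental clustering (an assumed baseline, not a specific evaluated system): each incoming story joins the closest existing cluster or seeds a new one. It reuses the vectorize and cosine helpers from the new event detection sketch above.

```python
# Single-pass incremental clustering sketch for topic detection (assumed
# baseline, not an evaluated system). Reuses vectorize/cosine from the new
# event detection sketch above.

from collections import Counter

def detect_topics(stories, threshold=0.25):
    clusters = []   # each: {"centroid": Counter, "stories": [...]}
    for story in stories:
        vec = vectorize(story)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["stories"].append(story)         # assign to an existing topic
            best["centroid"].update(vec)          # crude centroid update
        else:
            clusters.append({"centroid": Counter(vec), "stories": [story]})
    return clusters
```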

17 TDT-03 Topic Detection Results
Conditions: multilingual sources, English translations, reference boundaries, 10-file deferral period
(Result charts for Newswire+BNews ASR and Newswire+BNews manual transcripts; one entry was not a primary system.)

18 Topic Tracking Task
System goal: to detect stories that discuss the target topic, in multiple source streams.
Supervised training: given Nt sample stories that discuss a given target topic.
Testing: find all subsequent stories that discuss the target topic.
(Diagram: a source stream split into training data and test data; training stories are labeled on-topic, test stories are unknown.)
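Under a condition like Nt=1, Nn=0, tracking can be sketched as building a centroid from the supervised training stories and thresholding each later story’s similarity to it. Again, the representation and threshold are placeholder assumptions (reusing the helpers above) rather than an evaluated system.

```python
# Tracking sketch for the Nt training-story condition (placeholder
# assumptions, not an evaluated system): centroid of the on-topic training
# stories, then a similarity threshold on each test story.

from collections import Counter

def track_topic(training_stories, test_stories, threshold=0.2):
    centroid = Counter()
    for story in training_stories:                # the Nt supervised samples
        centroid.update(vectorize(story))
    return [(story, cosine(vectorize(story), centroid) >= threshold)
            for story in test_stories]            # (story, YES/NO on-topic)
```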

19 TDT-03 Primary TRK Results
Conditions: Newswire+BNews human transcripts and Newswire+BNews ASR, multilingual sources, English translations, reference boundaries, 1 training story (Nt=1), 0 negative training stories (Nn=0)
(DET plots for the primary systems RMIT1, UMass01, and CMU1.)

20 Primary Topic Tracking Results 2002 vs. 2003 Topics Minimum DET Cost

21 Link Detection Task
System goal: to detect whether a pair of stories discuss the same topic.
(Can be thought of as a “primitive operator” from which a variety of applications can be built.)
(Diagram: two stories connected by a “?” link.)
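Treated as that primitive operator, link detection can be sketched as a single pairwise comparison that returns both a likelihood score and a YES/NO decision; the cosine similarity and threshold are assumptions reused from the sketches above.

```python
# Link detection as a single pairwise comparison (cosine and threshold are
# assumptions): returns a likelihood score and a YES/NO same-topic decision.

def link_detect(story_a: str, story_b: str, threshold=0.2):
    score = cosine(vectorize(story_a), vectorize(story_b))
    return score, score >= threshold
```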

22 TDT-03 Primary LNK Results (Newswire+BNews ASR, multilingual sources, English or native translations, reference boundaries, 10-file deferral period)

23 TDT-03 Primary LNK Results: 2002 vs. 2003 Topics
Topic-weighted minimum DET cost for UMass01 and CMU1.

24 Outline
TDT Evaluation Overview
TDT-2003 Evaluation Result Summaries
  New Event Detection (NED)
  Topic Detection
  Topic Tracking
  Link Detection
Other Investigations

25 History of performance

26 Evaluation Performance History: Link Detection
Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TE=eng,nat boundary DEF=10 | CMU1 | 1.0943
2000 | SR=nwt+bnasr TE=eng+man,eng boundary DEF=10 | UMass1 | .3134
2001 | (same as 2000) | CMU1 | .2421
2002 | SR=nwt+bnasr TE=eng+man+arb,eng boundary DEF=10 | PARC1 | .1947
2003 | SR=nwt+bnasr TE=eng+man+arb,eng boundary DEF=10 | UMass01 | .1839*
* 0.1798 on 2002 topics

27 Evaluation Performance History: Topic Tracking
Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TR=eng TE=eng+man,eng boundary Nt=4 | BBN1 | .0922
2000 | SR=nwt+bnman TR=eng TE=eng+man,eng boundary Nt=1 Nn=0 | IBM1 | .1248
2001 | (same as 2000) | LIMSI1 | .1213
2002 | SR=nwt+bnman TR=eng TE=eng+man+arb,eng boundary Nt=1 Nn=0 | UMass1 | .1647
2003 | SR=nwt+bnman TR=eng TE=eng+man+arb,eng boundary Nt=1 Nn=0 | UMass1 | .1949*
* 0.1618 on 2002 topics

28 Evaluation Performance History: Topic Detection
Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TE=eng+man,eng boundary DEF=10 | IBM1 | .2645
2000 | SR=nwt+bnasr TE=eng+man,eng noboundary DEF=10 | Dragon1 | .3326
2001 | (same as 2000) | TNO1 (late) | .3551
2002 | SR=nwt+bnasr TE=eng+man+arb,eng boundary DEF=10 | UMass1 | .2021
2003 | (same as 2002) | CMU1 | .3035*
* 0.3007 on 2002 topics

29 Evaluation Performance History: New Event Detection
Year | Conditions | Site | Score
1999 | SR=nwt+bnasr TE=eng,nat boundary DEF=10 | UMass1 | .8110
2000 | SR=nwt+bnasr TE=eng,nat noboundary DEF=10 | UMass1 | .7581
2001 | (same as 2000) | UMass1 | .7729
2002 | SR=nwt+bnasr TE=eng,nat boundary DEF=10 | CMU1 | .4449
2003 | (same as 2002) | CMU1 | .5971*
* 0.4283 on 2002 topics

30 Summary and Issues to Discuss
TDT Evaluation Overview
2003 TDT Evaluation Results
  The 2002 and 2003 topic sets are very different; the 2003 set was weighted more towards Arabic
  Dramatic increase in error rates with the new topics in link detection, topic tracking, and new event detection
  Need to calculate the effect of the topic set on topic detection
TDT 2004
  Release the 2003 topics and TDT4 corpus?
  Ensure the 2004 evaluation will support Go/No-Go decisions
  What tasks will 2004 include?

