
Results of the 2000 Topic Detection and Tracking Evaluation in Mandarin and English Jonathan Fiscus and George Doddington.


1 Results of the 2000 Topic Detection and Tracking Evaluation in Mandarin and English Jonathan Fiscus and George Doddington

2 What’s New in TDT 2000
TDT3 corpus used in both 1999 and 2000
–120 topics used in the 2000 test: 60 topics from 1999, 60 new topics
–Of 44K news stories, 24% were at least singly judged YES or BRIEF
–The 1999 and 2000 topic sets are very different in terms of size and cross-language makeup
Annotation of new topics using search-engine-guided annotation:
–“Use a search engine with on-topic story feedback and interactive searching techniques to limit the number of stories read by annotators”
Evaluation protocol changes
–Only minor changes to Tracking (negative example stories)
–Link Detection test-set selection changed in light of last year’s experience

3 Search-Guided Annotation: How Will It Affect Scores?
Simulate search-guided annotation using 1999 topics, 1999 annotations, and 1999 systems
–Probability of a human reading a judged story
–Number of read stories
–Stability in the “region of interest”

4 TDT Topic Tracking Task
[Diagram: training data and test data streams, with on-topic vs. unknown stories]
7 participants:
–Dragon, IBM, Texas A&M Univ., TNO, Univ. of Iowa, Univ. of Massachusetts, Univ. of Maryland
System goal:
–To detect stories that discuss the target topic, in multiple source streams
Supervised training:
–Given N_t sample stories that discuss a given target topic
Testing:
–Find all subsequent stories that discuss the target topic
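The tracking task above (train on a few on-topic stories, then flag subsequent stories) can be sketched with a simple bag-of-words centroid and cosine similarity. This is a minimal illustration, not any participant's actual system; the vectorizer and the threshold value are assumptions.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term-frequency vector for a story (assumed representation)."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def track(training_stories, stream, threshold=0.2):
    """Sum the N_t training vectors into a topic centroid, then flag each
    incoming story whose similarity to the centroid exceeds the threshold."""
    centroid = Counter()
    for s in training_stories:
        centroid.update(vectorize(s))
    return [cosine(vectorize(story), centroid) >= threshold for story in stream]
```

Real systems weighted terms (e.g. tf-idf) and tuned the threshold per condition; this sketch only shows the supervised-training/testing split the slide describes.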

5 Topic Tracking Results
[DET curves shown with and without negative training examples]
Basic condition: Newswire + BNews, reference story boundaries, English training: 1 on-topic story
Challenge condition: Newswire + BNews ASR, automatic story boundaries, English training: 4 on-topic, 2 negative training stories
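The DET curves on these slides trade off miss and false-alarm probabilities, which TDT combines into a single normalized detection cost. A sketch of that metric follows; the constants (C_miss = 1.0, C_fa = 0.1, P_target = 0.02) are the values commonly used in TDT-style evaluations, assumed here rather than taken from the slides.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized detection cost, TDT-style (constants are assumed):
    C_det = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target),
    divided by the cost of the better trivial system (say YES to all
    or NO to all) so that 1.0 marks trivial performance."""
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return cost / norm
```

A system that misses everything (P_miss = 1, P_fa = 0) scores exactly 1.0 under this normalization, which is why normalized costs below 1.0 indicate a system better than the trivial baselines.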

6 Topic Tracking Results (Expanded Basic Condition DET Curve)

7 Topic Tracking Results (Expanded Challenge Condition DET Curve)
[Curves shown with and without negative training examples]

8 Effect of Automatic Story Boundaries
Evaluation conditioned jointly on source and language
–Newswire, Broadcast News; English and Mandarin
Degradation due to automatic story boundaries on ASR sources
Test condition: NWT+BNasr, 4 English training stories, reference boundaries
[Results shown for IBM1 and UMass1]

9 Variability of Tracking Performance Based on Training Stories
BBN ran their 1999 system on this year’s index files:
–Same topics, but different training stories
–One caveat: these results are based on different “test epochs”; the 2000 index files contain more stories
There could be several reasons for the difference; this needs further investigation

10 TDT Link Detection Task (NIST Speech Group)
One participant: University of Massachusetts
System goal:
–To detect whether a pair of stories discuss the same topic (can be thought of as a “primitive operator” for building a variety of applications)

11 2000 Link Detection Results
A lot was learned last year:
–The test set must be properly sampled
–“Linked” story pairs were selected by randomly sampling all possible on-topic story pairs
–“Unlinked” pairs used an on-topic story as one member of the pair and a randomly selected story as the second
–This year, the task was made multilingual
–More story pairs were used
[Table: Link Detection test set composition]
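The sampling procedure described above can be sketched directly. This is an illustrative reconstruction of the stated procedure, not NIST's actual sampling code; the function name and data layout are assumptions.

```python
import random

def sample_pairs(topics, all_stories, n_linked, n_unlinked, seed=0):
    """topics: dict topic_id -> list of on-topic story ids.
    Linked pairs: random sample of all possible same-topic story pairs.
    Unlinked pairs: an on-topic story paired with a randomly chosen
    second story (as the slide describes)."""
    rng = random.Random(seed)
    # Enumerate every same-topic pair, then sample the linked set.
    candidates = [(a, b) for stories in topics.values()
                  for i, a in enumerate(stories) for b in stories[i + 1:]]
    linked = rng.sample(candidates, n_linked)
    on_topic = [s for stories in topics.values() for s in stories]
    unlinked = []
    while len(unlinked) < n_unlinked:
        a = rng.choice(on_topic)
        b = rng.choice(all_stories)
        if b != a:
            unlinked.append((a, b))
    return linked, unlinked
```

Note that a "unlinked" pair sampled this way can occasionally land on the same topic by chance, which is presumably why proper sampling of the test set was a lesson from the previous year.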

12 Link Detection Results
Required condition:
–Multilingual texts
–Newswire + Broadcast News ASR
–Reference story boundaries
–10-file decision deferral
[Overall DET curve shown]

13 TDT Topic Detection Task
Three participants: Chinese Univ. of Hong Kong, Dragon, Univ. of Massachusetts
System goal:
–To detect topics in terms of the (clusters of) stories that discuss them
“Unsupervised” topic training:
–New topics must be detected as the incoming stories are processed
–Input stories are then associated with one of the topics
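The unsupervised, incremental clustering the task requires can be sketched as single-pass clustering: each arriving story joins its best-matching cluster, or starts a new topic cluster if nothing matches well enough. This is a minimal sketch under assumed bag-of-words vectors and an assumed threshold, not any participant's system.

```python
from collections import Counter
from math import sqrt

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def detect_topics(stream, threshold=0.3):
    """Single-pass clustering: a story joins the most similar existing
    cluster, or founds a new one if no cluster scores above threshold."""
    clusters = []      # list of [centroid Counter, member indices]
    assignments = []
    for i, story in enumerate(stream):
        vec = Counter(story.lower().split())
        best, best_sim = None, 0.0
        for j, (centroid, _) in enumerate(clusters):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = j, sim
        if best is not None and best_sim >= threshold:
            clusters[best][0].update(vec)
            clusters[best][1].append(i)
            assignments.append(best)
        else:
            clusters.append([Counter(vec), [i]])
            assignments.append(len(clusters) - 1)
    return assignments
```

Because the pass is strictly online, a story can never be reassigned once clustered, which mirrors the constraint that topics be detected as incoming stories are processed.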

14 2000 Topic Detection Evaluation
Required condition (in yellow):
–Multilingual topic detection
–Newswire + Broadcast News ASR
–Automatic story boundaries
Performance on the 1999 and 2000 topic sets differs
[Results shown using English translations for Mandarin and using native orthography]

15 Effect of Topic Size on Detection Performance
The 1999 topics have more on-topic stories than the 2000 topics
The distribution of scores is related to topic size
–Bigger topics tend to have higher scores
–Is this behavior induced by setting a topic-size parameter in training?
[Dragon1 results: NWT+BNasr, reference boundaries, multilingual texts]

16 Fractional Components of Detection Cost
Evaluations conditioned on factors (like language) are problematic
Instead, compute the additive contributions to detection cost from different subsets of the data
[Dragon1 results: NWT+BNasr, reference boundaries, multilingual texts; note the interesting reversal]
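The additive decomposition works if each subset's miss and false-alarm counts are divided by the global trial totals rather than the subset's own totals; the per-subset costs then sum exactly to the overall detection cost. A sketch under that interpretation (the function name, data layout, and cost constants are assumptions):

```python
def cost_contributions(by_subset, total_targets, total_nontargets,
                       c_miss=1.0, c_fa=0.1, p_target=0.02):
    """by_subset: dict name -> (n_miss, n_fa) error counts in that subset.
    Denominators are the GLOBAL trial counts, so the per-subset costs
    are additive: they sum to the overall (unnormalized) detection cost."""
    out = {}
    for name, (n_miss, n_fa) in by_subset.items():
        p_miss = n_miss / total_targets
        p_fa = n_fa / total_nontargets
        out[name] = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    return out
```

Conditioning instead on each subset alone changes the denominators per subset, which is exactly what makes conditioned comparisons across unevenly sized subsets problematic.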

17 Effects of Automatic Boundaries on Detection Performance
–Multilingual topic detection
–Newswire + Broadcast News ASR
–Reference vs. automatic story boundaries
19%, 21%, and 41% relative increases in cost, respectively

18 TDT Segmentation Task
[Diagram: transcription (text/words) divided into story and non-story segments]
One participant: MITRE
(For TDT 2000, story segmentation is an integral part of the other tasks, not just a separate evaluation task)
System goal:
–To segment the source stream into its constituent stories, for all audio sources
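One common family of approaches to this kind of segmentation looks for lexical-cohesion dips: place a boundary where the vocabulary before a point stops resembling the vocabulary after it. The sketch below is a toy illustration of that idea (window size and threshold are assumptions), not MITRE's method.

```python
from collections import Counter
from math import sqrt

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def segment(words, window=3, threshold=0.1):
    """Hypothesize a story boundary wherever the similarity between
    the `window` words before and after a position drops below threshold."""
    boundaries = []
    for i in range(window, len(words) - window):
        left = Counter(words[i - window:i])
        right = Counter(words[i:i + window])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries
```

Real systems also exploited audio cues (pauses, speaker changes) unavailable to this text-only toy, which is one reason automatic boundaries on ASR sources degrade the downstream tasks.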

19 Story Segmentation Results Required Condition: –Broadcast News ASR

20 TDT First Story Detection (FSD) Task
Two participants: National Taiwan University and University of Massachusetts
System goal:
–To detect the first story that discusses a topic, for all topics
Evaluates “part” of a Topic Detection system (i.e., deciding when to start a new cluster)
[Diagram: first stories vs. non-first stories on two topics]
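FSD can be sketched as novelty detection: a story is "first" when it is sufficiently dissimilar from everything seen so far, which is exactly the new-cluster decision the slide mentions. A minimal sketch with assumed vectors and threshold, not either participant's system:

```python
from collections import Counter
from math import sqrt

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def first_story_flags(stream, threshold=0.25):
    """Flag a story as a first story when its maximum similarity
    to every earlier story falls below the novelty threshold."""
    seen, flags = [], []
    for story in stream:
        vec = Counter(story.lower().split())
        max_sim = max((cosine(vec, v) for v in seen), default=0.0)
        flags.append(max_sim < threshold)
        seen.append(vec)
    return flags
```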

21 First Story Detection Results
Required condition:
–English newswire and Broadcast News ASR transcripts
–Automatic story boundaries
–One-file decision deferral

22 1999 and 2000 Topic Set Differences in FSD Evaluation
For UMass there is only a slight difference, but a marked one for the NTU system

23 Summary
Many things remain to look at:
–Results appear to be a function of topic size and topic set in the detection task, but it’s unclear why
–The reusability of last year’s detection system outputs enables valuable studies
–Conditioned detection evaluation should be replaced with a “contribution to cost” model
–Performance variability with respect to tracking training stories should be further investigated
–…and the list goes on
When should the annotations be released?
Need to find a cost-effective annotation technique
–Consider TREC ad-hoc style annotation via simulation

