CMU TDT Report 12-13 November 2001 The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Chun Jin, Jian Zhang Language Technologies Institute, CMU.


1 CMU TDT Report 12-13 November 2001 The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Chun Jin, Jian Zhang Language Technologies Institute, CMU

2 Time Line for TDT Activities
- (Re)Start: Summer 2001
- Baseline FSD, Link, Det: Sept 2001
- Evaluation (of baseline): Oct 2001
- New Techniques: Nov 2001 onwards
  - Topic-conditional novelty
  - Situated NEs (all tasks)
  - Source-conditional interpolated training

3 Baseline FSD Method
- (Unconditional) dissimilarity with past
- Decision threshold on most-similar story
- (Linear) temporal decay
- Length filter (for teasers)
- Cosine similarity with standard weights (formula not preserved in this transcript)
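The baseline on this slide can be sketched roughly as follows. This is a minimal illustration, not the CMU system: the log-scaled term-frequency weighting stands in for the slide's unspecified "standard weights", and the names `vectorize`, `detect_first_stories`, `threshold`, `window`, and `min_length`, along with their default values, are assumptions made for the example.

```python
import math
from collections import Counter

def vectorize(text):
    """Log-scaled term frequencies (assumed stand-in for the 'standard weights')."""
    counts = Counter(text.lower().split())
    return {term: 1.0 + math.log(tf) for term, tf in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def detect_first_stories(stories, threshold=0.3, window=100, min_length=5):
    """Return one boolean per story: True if it is flagged as a first story.

    threshold, window and min_length are hypothetical values, not tuned ones.
    """
    history = []  # (position, vector) for each past story that passed the length filter
    flags = []
    for i, text in enumerate(stories):
        if len(text.split()) < min_length:
            flags.append(False)  # length filter: very short teasers are never flagged
            continue
        vec = vectorize(text)
        best = 0.0
        for j, past in history:
            decay = max(0.0, 1.0 - (i - j) / window)  # linear temporal decay
            best = max(best, decay * cosine(vec, past))
        # Decision threshold on the most similar (decayed) past story.
        flags.append(best < threshold)
        history.append((i, vec))
    return flags
```

Here a story is new when even its closest predecessor, after decay, falls below the threshold; measuring age in story positions rather than timestamps is a simplification.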

4 FSD Results (story weighted / topic weighted)
- P(miss): .6028
- P(F/A): .0207 / .0186
- Cost: .0141 / .0143
- Norm Cost: .7043 / .7217
- Opt N. Cost: .6807
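The relationship between the error rates and the cost figures can be checked with a short sketch. The constants below (miss cost 1.0, false-alarm cost 0.1, target prior 0.02) are not stated on the slide; they are assumed here as the usual TDT evaluation settings, and pairing P(miss) .6028 with P(F/A) .0207 under them reproduces the slide's .0141 and .7043 to within rounding.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Weighted sum of miss and false-alarm rates (constants are assumptions)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Cost divided by the cost of the better trivial system (always yes or always no)."""
    trivial = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / trivial

if __name__ == "__main__":
    print(round(detection_cost(0.6028, 0.0207), 4))
    print(round(normalized_cost(0.6028, 0.0207), 4))
```

A normalized cost of 1.0 corresponds to a system that does no better than answering the same way for every story, which is why values near .7 leave substantial room for improvement.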

5 Comparative FSD DET Curves

6 FSD Observations
- Cross-site comparable baselines (cost = .7)
- Data/labeling issues (from error analysis)
  - “Events-vs-Topics” issue (e.g. Asia crisis)
  - A few mislabeled stories wreak havoc for FSD
  - Eager auto-segmentation a problem (misses)
- Recommendations for TDT labeling
  - FSD on true events, or events within topic(s)
  - Change auto-segmentation optimality criterion ??
- Recommendations for TDT researchers
  - Keep working hard on FSD: not cracked yet

7 New FSD Directions
- Topic-conditional models
  - E.g. “airplane,” “investigation,” “FAA,” “FBI,” “casualties” → topic, not event
  - “TWA 800,” “March 12, 1997” → event
  - First categorize into topic, then use maximally discriminative terms within topic
- Rely on situated named entities
  - E.g. “Arcan as victim,” “Sharon as peacemaker”
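One way to picture "maximally discriminative terms within topic" is to recompute inverse document frequency over the stories of a single topic only. The sketch below is an illustrative assumption, not the method the slides describe in detail: `within_topic_idf` is a hypothetical helper, and the three stories are toy data.

```python
import math

def within_topic_idf(topic_stories):
    """Inverse document frequency computed only over stories of one topic.

    Terms shared by every story in the topic get weight 0, so only terms
    that separate events inside the topic keep any weight.
    """
    n = len(topic_stories)
    doc_freq = {}
    for text in topic_stories:
        for term in set(text.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n / df) for term, df in doc_freq.items()}

if __name__ == "__main__":
    # Toy stories, all assumed to be already categorized into an "air crash" topic.
    air_crash_topic = [
        "airplane crash investigation twa",
        "airplane crash investigation faa report",
        "airplane crash investigation casualties",
    ]
    weights = within_topic_idf(air_crash_topic)
    print(weights["airplane"], weights["twa"])
```

Under this weighting, "airplane" and "investigation" stop contributing to similarity once a story is known to belong to the topic, while an event-specific term such as "twa" dominates.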

8 A New Approach to First Story Detection for TDT

9 Baseline Story-Link Detection
- Use same term weighting and cosine similarity as FSD and detection
- Decision thresholds conditioned on language and source
  - Lower threshold for cross-language
  - Lower threshold for cross-ASR/newswire
- Thresholds trained on development set
- 15% improvement over universal threshold
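The conditioned thresholds can be sketched as below. The slide says the thresholds were trained on a development set; the numeric values here are hypothetical placeholders, as are the story fields `text`, `language`, and `source`, and the sketch assumes cross-language stories have already been mapped into a common vocabulary.

```python
import math
from collections import Counter

# Hypothetical values; the slide's thresholds were trained, not hand-set.
BASE_THRESHOLD = 0.40
CROSS_LANGUAGE_DISCOUNT = 0.10
CROSS_SOURCE_DISCOUNT = 0.05

def vectorize(text):
    """Log-scaled term frequencies (assumed stand-in for the shared term weighting)."""
    counts = Counter(text.lower().split())
    return {term: 1.0 + math.log(tf) for term, tf in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def link_threshold(a, b):
    """Lower the threshold when the pair crosses languages or source types."""
    threshold = BASE_THRESHOLD
    if a["language"] != b["language"]:
        threshold -= CROSS_LANGUAGE_DISCOUNT
    if a["source"] != b["source"]:  # e.g. "asr" vs "newswire"
        threshold -= CROSS_SOURCE_DISCOUNT
    return threshold

def are_linked(a, b):
    """Decide whether two stories discuss the same topic."""
    return cosine(vectorize(a["text"]), vectorize(b["text"])) >= link_threshold(a, b)
```

Lowering the threshold for mixed pairs compensates for the similarity lost to translation and recognition errors, which would otherwise push true links below a single universal cutoff.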

10 Primary Link

11 CMU Link

12 CMU2 Link

13 CMU Detection
- C det (basic): .0076 (auto-segmented boundaries), .0063 (pre-established boundaries)
- C det (norm): .3786 (auto-segmented boundaries), .3138 (pre-established boundaries)
- Incremental retrospective clustering
- Group-average in forward deferral window
- Same cosine similarity and term weights as FSD
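A simplified picture of incremental group-average clustering is sketched below. It is an assumption-laden illustration rather than the CMU detection system: it processes stories in a single pass and omits the forward deferral window, and `cluster_stories`, the term weighting, and the threshold value are all hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Log-scaled term frequencies (assumed stand-in for the shared term weighting)."""
    counts = Counter(text.lower().split())
    return {term: 1.0 + math.log(tf) for term, tf in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def cluster_stories(stories, threshold=0.3):
    """Assign each story a cluster id, in arrival order.

    A story joins the cluster with the highest average similarity to its
    members (group-average), or starts a new cluster if none reaches the
    threshold. The threshold is a hypothetical value.
    """
    clusters = []  # each cluster is a list of member vectors
    labels = []
    for text in stories:
        vec = vectorize(text)
        best_id, best_sim = None, 0.0
        for cid, members in enumerate(clusters):
            sim = sum(cosine(vec, m) for m in members) / len(members)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= threshold:
            clusters[best_id].append(vec)
            labels.append(best_id)
        else:
            clusters.append([vec])
            labels.append(len(clusters) - 1)
    return labels
```

A deferral window would let the system postpone each decision until a fixed number of later stories had arrived, so that stories inside the window could be grouped with each other before being matched against existing clusters.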

