TDT 2004 Unsupervised and Supervised Tracking Hema Raghavan UMASS-Amherst at TDT 2004.


1 TDT 2004 Unsupervised and Supervised Tracking Hema Raghavan UMASS-Amherst at TDT 2004

2 TDT 2004 Outline
  Create a training corpus
  Unsupervised tracking
  Supervised tracking
  Discussion

3 TDT 2004 Creating a training corpus
  For tracking
    –50% of topics are English
    –50% are multilingual
  Created a training corpus (supervised and unsupervised)
    –30 topics from TDT4
    –50% stories with primarily English topics
    –50% multilingual stories

4 TDT 2004 Unsupervised Tracking
  Ideas
    –Models
        Vector Space
        Relevance Models
    –Adaptation
    –Native language comparisons

5 TDT 2004 Unsupervised Tracking Models
  Vector Space
    –TF-IDF
    –IDF is incremental
  Relevance Models
    –State of the art, high performance system
  Adaptation
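The "IDF is incremental" point can be illustrated with a minimal sketch: document frequencies are updated as each story arrives, so term weights reflect only the stream seen so far. The class name, the smoothing in `idf`, and the use of cosine similarity are assumptions made for illustration; the slides do not specify the exact weighting used in the UMass system.

```python
import math
from collections import Counter


class IncrementalTfIdf:
    """Vector-space scorer whose document frequencies grow as stories arrive."""

    def __init__(self):
        self.doc_count = 0
        self.doc_freq = Counter()

    def observe(self, tokens):
        # Update corpus statistics with one newly arrived story.
        self.doc_count += 1
        self.doc_freq.update(set(tokens))

    def idf(self, term):
        # Smoothed IDF (an assumed variant); rarer terms get larger weights.
        return math.log((self.doc_count + 1) / (self.doc_freq[term] + 0.5))

    def vector(self, tokens):
        # Raw term frequency times the current IDF estimate.
        tf = Counter(tokens)
        return {term: count * self.idf(term) for term, count in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In a tracker built this way, each incoming story would first be passed to `observe` and then scored against the topic vector with `cosine`.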

6 TDT 2004 Native Language Hypothesis
  TDT tasks involve comparisons of models:
    –Story link detection: sim(S_i, S_j)
    –Topic tracking: sim(S_i, T_j)
  It is more effective to measure similarity between models in the original language of the stories than after machine translation into English
    –Quality of translation
    –Differences in score distributions
    –Trivially obvious? Hard to demonstrate in tracking
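One way to read the native language hypothesis for tracking is as a routing rule: if a topic model exists in the story's own language, compute sim(S_i, T_j) there, and fall back to the machine-translated English text otherwise. The sketch below is an assumption-laden illustration, not the system's implementation: the story fields, the `"eng"` key, and the Jaccard stand-in for the similarity function are all hypothetical.

```python
def jaccard(a, b):
    # Stand-in similarity (hypothetical); any sim(S_i, T_j) could be used instead.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def track_score(story, topic_models, sim=jaccard):
    """Score a story against a topic, preferring a same-language comparison.

    story: dict with 'lang', 'native_tokens', and 'english_tokens'
           (the last holding the machine-translated text).
    topic_models: dict mapping a language code to that topic's token list;
                  an 'eng' entry is assumed to always exist.
    """
    lang = story["lang"]
    if lang in topic_models:
        # Native comparison: no translation noise on either side.
        return sim(story["native_tokens"], topic_models[lang])
    # Fallback: compare in English after machine translation.
    return sim(story["english_tokens"], topic_models["eng"])
```

Because scores from different languages may be distributed differently, a real system would likely also need per-language normalization or thresholds, which this sketch leaves out.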

7 TDT 2004 Topic tracking with Native Models [SIGIR 2004]

8 TDT 2004 Unsupervised Tracking Results (training set: nwt+TDT4)

9 TDT 2004 Submitted Runs
  TF-IDF (UMASS4)
  TF-IDF + adaptation (UMASS1)
  TF-IDF + adaptation + native models (UMASS2)
  Relevance Models + adaptation (UMASS5)
  All submissions are for the primary evaluation condition.

10 TDT 2004

11 Unsupervised Tracking Results
  Model                                Min-Cost   System Cost
  TF-IDF                               0.0736     0.0973
  TF-IDF + adaptation                  0.0905     0.1545
  TF-IDF + adaptation + native lang.   0.0910     0.1186
  RM + adapt                           0.0561     0.0616
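For readers unfamiliar with the cost columns, the sketch below computes a normalized detection cost from a miss rate and a false-alarm rate. The constants (C_miss = 1, C_fa = 0.1, P_target = 0.02) are assumed here to be the standard TDT evaluation settings; the slides do not restate them. Under that reading, "Min-Cost" is the lowest value of this quantity over all possible thresholds, and "System Cost" is its value at the threshold the system actually used.

```python
def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized detection cost; the default constants are assumptions."""
    # Expected cost of misses plus expected cost of false alarms.
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Normalize by the cheaper trivial system: answer NO always, or YES always.
    baseline = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return cost / baseline
```

With this normalization, a system that rejects every story scores exactly 1.0, so values well below 1.0 indicate useful tracking.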

12 TDT 2004 Supervised Tracking
  Creating a newswire-only training corpus.
  Ideas
    –Models
        Vector Space
        Relevance Models
    –Native language comparisons
    –Incremental thresholds
    –Negative feedback

13 TDT 2004 Incremental Thresholds
  Utility
  Relevance judgments for both hits and false alarms
  Increment the YES/NO threshold by [symbol lost in extraction] when Utility falls below zero.
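The incremental-threshold idea can be sketched as follows. Two things are assumptions, since the transcript does not preserve them: the utility is taken to be a linear filtering-style utility (2 × hits − false alarms), and the step size `step` is a hypothetical parameter. Only the rule itself, raising the YES/NO threshold whenever utility drops below zero, comes from the slide.

```python
class IncrementalThreshold:
    """Raise the YES/NO threshold whenever running utility falls below zero."""

    def __init__(self, threshold, step=0.05, hit_credit=2.0, fa_penalty=1.0):
        self.threshold = threshold
        self.step = step              # hypothetical increment size
        self.hit_credit = hit_credit  # assumed credit per hit
        self.fa_penalty = fa_penalty  # assumed penalty per false alarm
        self.hits = 0
        self.false_alarms = 0

    def utility(self):
        # Assumed linear utility over the judged YES decisions so far.
        return self.hit_credit * self.hits - self.fa_penalty * self.false_alarms

    def decide(self, score):
        # YES if the story scores at or above the current threshold.
        return score >= self.threshold

    def feedback(self, on_topic):
        # Called with the relevance judgment for a story the system said YES to.
        if on_topic:
            self.hits += 1
        else:
            self.false_alarms += 1
        if self.utility() < 0:
            self.threshold += self.step
```

The effect is that a run of early false alarms makes the tracker more conservative, while hits leave the threshold where it is.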

14 TDT 2004 Negative Feedback
  Relevance judgments for both hits and false alarms
    –[formula lost in extraction] for a hit.
    –[formula lost in extraction] for a false alarm.
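The update formulas for hits and false alarms are not preserved in this transcript. As a stand-in, the sketch below uses a Rocchio-style update, which is a common way to apply positive and negative relevance feedback to a vector-space topic model; it is not confirmed to be the method used here, and the weights `beta` and `gamma` are hypothetical.

```python
def apply_feedback(topic, story, on_topic, beta=0.5, gamma=0.25):
    """Rocchio-style update of a sparse topic vector (assumed technique).

    topic, story: dicts mapping terms to weights.
    on_topic: True for a judged hit, False for a judged false alarm.
    """
    # Hits pull the topic toward the story; false alarms push it away.
    weight = beta if on_topic else -gamma
    updated = dict(topic)
    for term, value in story.items():
        updated[term] = updated.get(term, 0.0) + weight * value
    # Drop terms whose weight is no longer positive.
    return {term: w for term, w in updated.items() if w > 0.0}
```

For example, a false alarm that shares a heavily weighted term with the topic lowers that term's weight, which reduces the scores of similar off-topic stories later in the stream.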

15 TDT 2004 From Unsupervised to Supervised

16 TDT 2004 Native Language Comparisons

17 TDT 2004 Submitted Runs
  Rel. Models (UMASS-2)
    –Optimized for TDT cost
  Rel. Models + Inc. Thresholds (UMASS-1)
  TF-IDF + adaptation + neg. feedback + inc. thresholds (UMASS-3)
  TF-IDF + adaptation + native models (UMASS-4)
  TF-IDF + adaptation + native models + neg. feedback + inc. thresholds (UMASS-7)
  Optimized for T11SU

18 TDT 2004 Supervised Tracking Results Cost: 0.0467

19 TDT 2004 Results and Discussion
  Supervision clearly helps.
  Relevance models: a clear winner.
  Negative feedback helps.
  Training set did not reflect the test set very well.
  Min-cost versus T11SU

20 TDT 2004 Future Work
  Exploration-exploitation trade-off.
  What about feedback that is less on demand?
    –More realistic
    –Can add costs for judgments.
  What about feedback like in the HARD task?
    –Clarification forms?

