TDT 2004 Unsupervised and Supervised Tracking Hema Raghavan UMASS-Amherst at TDT 2004.

Presentation transcript:

TDT 2004 Unsupervised and Supervised Tracking Hema Raghavan UMASS-Amherst at TDT 2004

Outline
Create a training corpus
Unsupervised tracking
Supervised Tracking
Discussion

Creating a training corpus
For Tracking
  – 50% topics are English
  – 50% are multilingual
Created a training corpus (supervised and unsupervised)
  – 30 topics from TDT4
  – 50% stories with primarily English topics.
  – 50% multilingual stories

Unsupervised Tracking Ideas
Ideas
  – Models
      Vector Space
      Relevance Models
  – Adaptation
  – Native Language comparisons

Unsupervised Tracking Models
Vector Space
  – TF-IDF
  – IDF is incremental
Relevance Models
  – State of the art, high performance system
Adaptation
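
A minimal sketch of what "IDF is incremental" can mean in a vector space tracker: document frequencies are updated as each story arrives, so term weights reflect only the stories seen so far. The class name, the smoothing in the IDF formula, and the toy stories are assumptions for illustration, not the UMass implementation.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """TF-IDF weighting whose document frequencies grow as stories arrive."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def add(self, tokens):
        # Update corpus statistics with one newly arrived story.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))

    def vector(self, tokens):
        # Weight each term by tf * idf using only the statistics seen so far.
        # The smoothing constants are an assumption chosen to keep idf positive.
        tf = Counter(tokens)
        return {
            term: count * math.log((self.num_docs + 1) / (self.doc_freq[term] + 0.5))
            for term, count in tf.items()
        }


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


model = IncrementalTfidf()
stories = [
    ["election", "vote", "recount"],
    ["earthquake", "rescue"],
    ["election", "ballot"],
]
for story in stories:
    model.add(story)

topic = model.vector(stories[0])
on_topic = cosine(topic, model.vector(stories[2]))
off_topic = cosine(topic, model.vector(stories[1]))
```

The story that shares a term with the topic scores above the one that shares none; a tracker would compare such scores against a YES/NO threshold.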

Native Language Hypothesis
TDT tasks involve comparisons of models:
  – Story link detection: sim(S_i, S_j)
  – Topic tracking: sim(S_i, T_j)
It is more effective to measure similarity between models in the original language of the stories, than after machine translation into English
  – Quality of translation
  – Differences in score distributions
  – Trivially obvious? Hard to demonstrate in tracking
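
One way to picture the native language comparison is a tracker that keeps one topic model per language and scores each story against the model in its own language, falling back to the English translation only when no native model exists. The sketch below assumes a hypothetical story layout (`lang`, `native_tokens`, `english_tokens`), placeholder tokens, and a simple Jaccard overlap as the similarity; none of these come from the slides.

```python
def jaccard(tokens, model_terms):
    """Set overlap between a story's tokens and a topic model's terms."""
    a, b = set(tokens), set(model_terms)
    return len(a & b) / len(a | b) if a | b else 0.0


def track_score(story, topic_models, similarity=jaccard):
    """Return (score, language used) for sim(S_i, T_j).

    The story is compared with the topic model in its original language
    when one is available; otherwise its English translation is compared
    with the English topic model.
    """
    lang = story["lang"]
    if lang in topic_models:
        return similarity(story["native_tokens"], topic_models[lang]), lang
    return similarity(story["english_tokens"], topic_models["en"]), "en"


# Hypothetical topic models; the "zh" terms are romanized placeholders.
topic_models = {
    "en": ["election", "recount"],
    "zh": ["xuanju", "jipiao"],
}
story_zh = {
    "lang": "zh",
    "native_tokens": ["xuanju", "jieguo"],
    "english_tokens": ["vote", "result"],
}
story_ar = {
    "lang": "ar",
    "native_tokens": ["intikhabat"],
    "english_tokens": ["election", "result"],
}

score_zh, used_zh = track_score(story_zh, topic_models)
score_ar, used_ar = track_score(story_ar, topic_models)
```

Because score distributions differ across languages, a real system would likely also need per-language thresholds or score normalization, which this sketch omits.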

Topic tracking with Native Models [SIGIR 2004]

Unsupervised Tracking Results (training set: nwt+TDT4)

Submitted Runs
TF-IDF (UMASS4)
TF-IDF + adaptation (UMASS1)
TF-IDF + adaptation + native models (UMASS2)
Relevance Models + adaptation (UMASS5)
All submissions for primary evaluation condition.

Unsupervised Tracking Results
Model | Min-Cost | System Cost
TF-IDF | |
TF-IDF + adaptation | |
TF-IDF + adaptation + native lang | |
RM + adapt | |

Supervised Tracking
Creating a newswire only training corpus.
Ideas
  – Models
      Vector Space
      Relevance Models
  – Native Language comparisons
  – Incremental Thresholds
  – Negative Feedback

Incremental Thresholds
Utility
Relevance judgments for both Hits and False-Alarms
Increment the YES/NO threshold by when Utility falls below zero.
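
A rough sketch of the incremental threshold idea follows. The transcript preserves neither the utility formula nor the increment, so the per-story credit and cost (`hit_credit`, `false_alarm_cost`), the `step` size, and the choice to raise the threshold after every judgment that leaves utility negative are all assumptions made for illustration.

```python
def track_with_incremental_threshold(scores, labels, threshold, step,
                                     hit_credit=2.0, false_alarm_cost=1.0):
    """Deliver stories scoring at or above the threshold, raising it on negative utility.

    scores: similarity of each incoming story to the topic
    labels: True if the story is on topic (only consulted for delivered stories)
    """
    utility = 0.0
    decisions = []
    for score, on_topic in zip(scores, labels):
        deliver = score >= threshold
        decisions.append(deliver)
        if not deliver:
            continue  # no relevance judgment is available for undelivered stories
        # Judged stories are either hits or false alarms.
        utility += hit_credit if on_topic else -false_alarm_cost
        if utility < 0:
            # Too many false alarms so far: make the tracker more conservative.
            threshold += step
    return decisions, threshold, utility


decisions, final_threshold, final_utility = track_with_incremental_threshold(
    scores=[0.30, 0.28, 0.35, 0.60],
    labels=[False, False, False, True],
    threshold=0.25,
    step=0.05,
)
```

In this toy run two false alarms each push the threshold up by one step, which blocks the low-scoring second story, and the final hit brings utility back to zero.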

Negative Feedback
Relevance judgments for both Hits and False-Alarms
  – for a hit.
  – for a false alarm.
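
The update formulas for hits and false alarms did not survive in the transcript. As a stand-in, the sketch below uses a Rocchio-style update, a standard relevance feedback technique: a judged hit is added to the topic vector and a judged false alarm is subtracted from it. The weights `hit_weight` and `false_alarm_weight` and the decision to drop non-positive terms are assumptions, not the authors' formulas.

```python
def apply_feedback(topic, story, is_hit, hit_weight=1.0, false_alarm_weight=0.5):
    """Rocchio-style update of a sparse topic vector from one judged story."""
    sign = hit_weight if is_hit else -false_alarm_weight
    updated = dict(topic)
    for term, weight in story.items():
        updated[term] = updated.get(term, 0.0) + sign * weight
    # Drop terms whose weight was pushed to zero or below by negative feedback.
    return {term: weight for term, weight in updated.items() if weight > 0.0}


topic = {"election": 2.0, "vote": 1.0}

# Positive feedback: a delivered story judged on topic.
after_hit = apply_feedback(topic, {"election": 1.0, "recount": 1.0}, is_hit=True)

# Negative feedback: a delivered story judged off topic.
after_false_alarm = apply_feedback(after_hit, {"vote": 2.0, "soccer": 1.0}, is_hit=False)
```

After the false alarm, the term it shares with the topic is removed and the unrelated term never enters the model, so later stories resembling the false alarm score lower.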

From Unsupervised to Supervised

Native Language Comparisons

Submitted Runs
Rel. Models (UMASS-2)
  – Optimized for TDT cost
Rel. Models + Inc. Thresholds (UMASS-1)
TF-IDF + adaptation + neg. feedback + inc thresholds (UMASS-3)
TF-IDF + adaptation + native models (UMASS-4)
TF-IDF + adaptation + native models + neg feedback + increase thresh. (UMASS-7)
Optimized for T11SU

Supervised Tracking Results
Cost:

Results and Discussion
Supervision clearly helps.
Relevance models – a clear winner.
Negative Feedback helps.
Training set did not reflect test very well.
Min-cost versus T11SU
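
To make the "Min-cost versus T11SU" contrast concrete, the sketch below computes both measures from the same set of decisions. The slides give neither formula; the code assumes the commonly cited definitions (TDT normalized detection cost with C_miss = 1, C_fa = 0.1, P_target = 0.02, and the TREC-11 filtering utility 2·hits − false alarms scaled with a floor of −0.5). These constants should be checked against the evaluation plan before being relied on, and the counts are invented.

```python
def normalized_detection_cost(misses, hits, false_alarms, off_topic_total,
                              c_miss=1.0, c_fa=0.1, p_target=0.02):
    """TDT-style detection cost, normalized by the cost of a trivial system."""
    p_miss = misses / (misses + hits)
    p_fa = false_alarms / off_topic_total
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # A trivial system answers YES to everything or NO to everything.
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))


def t11su(hits, misses, false_alarms, min_nu=-0.5):
    """Scaled linear utility in [0, 1]; higher is better."""
    utility = 2.0 * hits - false_alarms
    max_utility = 2.0 * (hits + misses)
    normalized = utility / max_utility
    return (max(normalized, min_nu) - min_nu) / (1.0 - min_nu)


# Invented counts for one topic: 10 on-topic stories, 1000 off-topic stories.
cost = normalized_detection_cost(misses=2, hits=8, false_alarms=10, off_topic_total=1000)
utility = t11su(hits=8, misses=2, false_alarms=10)
```

Under these assumed definitions, the cost measure divides false alarms by the large pool of off-topic stories, while the utility counts each false alarm directly against the hits, so a threshold tuned for one measure need not be good for the other.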

Future Work
Exploration Exploitation trade-off.
What about feedback that is less on demand?
  – more realistic
  – Can add costs for judgments.
What about feedback like in the HARD task – Clarification forms?