Published by Maude Holmes. Modified over 8 years ago.
1
Planning for the TREC 2008 Legal Track
Douglas Oard
Stephen Tomlinson
Jason Baron
2
Thursday’s Discussion

Deciding on a document collection
“Beating Boolean”
Handling nasty OCR
Making the best use of the metadata
Ad hoc task design
Interactive task design
Relevance feedback task design
3
Choosing a Collection

FERC Enron (w/attachments, full headers)
  – Email is high-interest for E-discovery practice!
IIT CDIP version 1.0 (same as 2006/07)
  – Same 83 topics, plus some new ones
State Department Cables
  – Task: Freedom of Information Act requests
4
Plans for 2008

Some things stay the same:
  – Same collection
  – Same three tasks (Ad Hoc, RF, Interactive)
Some new things:
  – Deep assessment (fewer new topics)
  – Additional ranking-sensitive eval measures
5
Backup Slides
6
Handling Nasty OCR

Index pruning
Error estimation
Character n-grams
Duplicate detection
Expansion using a cleaner collection
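
Of the options above, character n-grams are the easiest to illustrate. The sketch below is only an assumption about how the idea might be applied, not the track's method: the function names `char_ngrams` and `dice_overlap` are hypothetical, and the choice of trigrams and the Dice coefficient is arbitrary. It shows that a term with a single OCR error still shares most of its n-grams with the clean term.

```python
def char_ngrams(text, n=3):
    """Return overlapping character n-grams for each whitespace-separated token.

    Tokens shorter than n are kept whole so that short words are not lost.
    """
    grams = []
    for token in text.lower().split():
        if len(token) < n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def dice_overlap(a, b, n=3):
    """Dice coefficient between the n-gram sets of two strings (hypothetical helper)."""
    grams_a = set(char_ngrams(a, n))
    grams_b = set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# "tobacc0" stands in for an OCR misreading of "tobacco" (illustrative only).
print(char_ngrams("tobacco"))
print(dice_overlap("tobacco", "tobacc0"))
```

An exact term match would miss the corrupted form entirely, while four of its five trigrams still match the clean term.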
7
How to “Beat Boolean”

Work from reference Boolean?
  – Swap out low-ranked-in for high-ranked-out
Relax Boolean somehow?
  – Cover density, proximity perturbation, …
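
One possible reading of the "swap" idea is sketched below. It assumes a ranked list from some separate retrieval system plus the set of documents matched by the reference Boolean query; the function name `swap_boolean` and the parameter `k` are hypothetical. The k lowest-ranked documents inside the Boolean set are exchanged for the k highest-ranked documents outside it, so the result set keeps its size.

```python
def swap_boolean(boolean_set, ranked_ids, k):
    """Swap the k lowest-ranked Boolean matches for the k highest-ranked non-matches.

    boolean_set: ids matched by the reference Boolean query.
    ranked_ids:  ids ordered best-first by some ranked retrieval system (assumed unique).
    Boolean matches that the ranked system did not score are left in place.
    """
    boolean_set = set(boolean_set)
    ranked_in = [d for d in ranked_ids if d in boolean_set]
    ranked_out = [d for d in ranked_ids if d not in boolean_set]

    # Never swap more documents than are available on either side.
    k = min(k, len(ranked_in), len(ranked_out))
    if k <= 0:
        return boolean_set

    dropped = set(ranked_in[-k:])   # low-ranked documents inside the Boolean set
    added = set(ranked_out[:k])     # high-ranked documents outside the Boolean set
    return (boolean_set - dropped) | added


result = swap_boolean({"a", "b", "c"}, ["x", "a", "y", "b", "c"], k=1)
print(sorted(result))
```

With `k=0` this reduces to the reference Boolean run, which makes it easy to compare against the baseline at the same result-set size.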
8
Using Metadata

Title (term match)
Author (social network)
Bates number (sequence)
9
Ad Hoc Task Design

Evaluation measures
  – R@B?, P@R?, Index size?
  – Error bars / Statistical significance testing
  – Limits on post-hoc use of the collection?
  – What are “meaningful” differences?
Topic design
  – Negotiation transcript?
Inter-annotator agreement
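
The two candidate measures can be sketched as follows. This is a simplified illustration under stated assumptions: B is taken to be the number of documents returned by the reference Boolean query, R is taken to be the number of known relevant documents (so P@R is read here as R-precision), and judgments are treated as complete rather than estimated from a sample. The function names are hypothetical.

```python
def recall_at_k(ranked_ids, relevant, k):
    """Fraction of the relevant documents found in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for d in ranked_ids[:k] if d in relevant)
    return hits / len(relevant)


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top k positions that hold a relevant document."""
    if k <= 0:
        return 0.0
    hits = sum(1 for d in ranked_ids[:k] if d in relevant)
    return hits / k


ranked = ["d1", "d2", "d3", "d4", "d5"]   # best-first, ids assumed unique
relevant = {"d1", "d4", "d9"}             # toy judgments, not real data
B = 2                                     # assumed size of the Boolean result set

r_at_b = recall_at_k(ranked, relevant, B)                  # R@B
p_at_r = precision_at_k(ranked, relevant, len(relevant))   # P@R, with R = 3
print(r_at_b, p_at_r)
```

Cutting the ranking at B puts a ranked run and the Boolean run on equal footing, since both are judged on the same number of retrieved documents.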
10
Interactive Track Design

Evaluation measure
  – Precision-oriented?
  – Recall-oriented?
  – Effect of assessor disagreement
11
Relevance Feedback Task

Evaluation measure
  – Residual recall at B_Residual?
Two-stage feedback?
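
A rough sketch of residual recall follows, assuming that "residual" means the documents already judged (and therefore available as feedback) are removed from the ranking, from the relevant set, and from the Boolean result set before scoring, and that B_Residual is the size of what remains of the Boolean set. The function name and all variable names are hypothetical.

```python
def residual_recall_at_b(ranked_ids, relevant, previously_judged, boolean_set):
    """Recall at B_Residual after removing previously judged documents.

    ranked_ids:        ids ordered best-first (assumed unique).
    relevant:          all documents judged relevant for the topic.
    previously_judged: documents whose judgments were available as feedback.
    boolean_set:       documents matched by the reference Boolean query.
    """
    judged = set(previously_judged)
    residual_ranking = [d for d in ranked_ids if d not in judged]
    residual_relevant = set(relevant) - judged
    b_residual = len(set(boolean_set) - judged)

    if not residual_relevant:
        return 0.0
    hits = sum(1 for d in residual_ranking[:b_residual] if d in residual_relevant)
    return hits / len(residual_relevant)


score = residual_recall_at_b(
    ranked_ids=["d1", "d2", "d3", "d4", "d5", "d6"],
    relevant={"d1", "d3", "d6"},
    previously_judged={"d1", "d2"},
    boolean_set={"d2", "d4", "d5"},
)
print(score)
```

Removing the judged documents first keeps a system from earning credit simply for returning documents it was already told were relevant.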