Lessons Learned from Information Retrieval
Chris Buckley, Sabir Research

1 Lessons Learned from Information Retrieval
Chris Buckley, Sabir Research
chrisb@sabir.com

2 Legal E-Discovery
Important, growing problem
Current solutions not fully understood by people using them
Imperative to find better solutions that scale
Evaluation required
– How do we know we are doing better?
– Can we prove a level of performance?

3 Lack of Shared Context
The basic problem of both search and e-discovery
Searcher does not necessarily know beforehand the "vocabulary" or background of either the author or the intended audience of the documents to be searched

4 Relevance Feedback
Human judges some documents as relevant, system finds others based on those judgements
Only general technique for improving the system's knowledge of context that has proven successful
– works from the small collections of the 1970s to the large collections of the present (TREC HARD track)
Difficult to apply to discovery
– Need to change the entire discovery process
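
The mechanism behind relevance feedback can be illustrated with the classic Rocchio vector-space update, one standard way the technique is implemented (not necessarily the exact method behind the results discussed here). A minimal sketch in Python, with toy term weights chosen purely for illustration:

    from collections import defaultdict

    def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
        """Rocchio-style feedback: move the query vector toward the centroid of
        judged-relevant documents and away from judged-nonrelevant ones.
        Vectors are sparse dicts mapping term -> weight."""
        new_q = defaultdict(float)
        for term, w in query.items():
            new_q[term] += alpha * w
        for doc in relevant:
            for term, w in doc.items():
                new_q[term] += beta * w / len(relevant)
        for doc in nonrelevant:
            for term, w in doc.items():
                new_q[term] -= gamma * w / len(nonrelevant)
        # Terms that end up with negative weight are usually dropped.
        return {t: w for t, w in new_q.items() if w > 0}

    # Toy example: one document judged relevant, one judged nonrelevant.
    query = {"contract": 1.0, "breach": 1.0}
    rel = [{"contract": 0.8, "breach": 0.5, "indemnity": 0.6}]
    nonrel = [{"contract": 0.2, "football": 0.9}]
    print(rocchio(query, rel, nonrel))

The expanded query picks up vocabulary ("indemnity") the searcher never typed, which is exactly the shared-context gap described on the previous slide.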

5 Toolbox of Other Techniques
Many other aids to search
– Ontologies, linguistic analysis, semantic analysis, data mining, term relationships
Good IR techniques uniformly:
– Give big wins for some searches
– Give mild losses for others
Need a set of techniques, a toolbox
In practice, the issue for IR research is not finding the big wins but avoiding the losses
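
The win/loss trade-off is easy to make concrete: score each search with and without the tool and tally the outcome. A minimal sketch, where the per-topic scores are invented for illustration:

    def win_loss_summary(baseline, with_tool):
        """Compare per-topic scores (e.g. average precision) for a baseline run
        and the same run with one extra tool enabled."""
        wins = sum(1 for b, t in zip(baseline, with_tool) if t > b)
        losses = sum(1 for b, t in zip(baseline, with_tool) if t < b)
        mean_delta = sum(t - b for b, t in zip(baseline, with_tool)) / len(baseline)
        return wins, losses, mean_delta

    # Hypothetical per-topic scores for six searches.
    baseline  = [0.30, 0.25, 0.40, 0.10, 0.55, 0.20]
    with_tool = [0.45, 0.22, 0.38, 0.30, 0.50, 0.19]
    wins, losses, delta = win_loss_summary(baseline, with_tool)
    print(f"wins={wins} losses={losses} mean change={delta:+.3f}")

In this made-up example two big wins outweigh four mild losses on average; deciding whether that trade is acceptable requires looking at the whole set of searches, which is the point of the slide.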

6 Implications of the Toolbox
No silver-bullet AI solution should be expected
Boolean search will not expand to accommodate combinations of solutions
Test collections are critical

7 Test Collection Importance
Needed to develop tools
Needed to develop decision procedures for when to use tools
The toolbox requirement means we must be able to distinguish a good overall system from one with a good tool
– All systems can show searches on which individual tools work well
– A good system shows a performance gain on the entire set of searches

8 Test Collection Composition
Large set of realistic documents
Set (at least 30) of topics or information needs
Set of judgements: what documents are responsive (or non-responsive) to each topic
– Judgements are expensive and limit how test collection results can be interpreted
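
In practice such a collection is distributed as a document set, a topics file, and a judgements ("qrels") file. A minimal sketch of reading the common TREC-style qrels layout (topic, iteration, document id, judgement); the file name and topic number are placeholders:

    from collections import defaultdict

    def load_qrels(path):
        """Read TREC-style qrels lines: <topic> <iteration> <docno> <judgement>.
        Returns {topic: {docno: judgement}}."""
        qrels = defaultdict(dict)
        with open(path) as f:
            for line in f:
                topic, _iteration, docno, rel = line.split()
                qrels[topic][docno] = int(rel)
        return dict(qrels)

    # Example use (placeholder path):
    # qrels = load_qrels("legal07.qrels")
    # responsive = {d for d, r in qrels["51"].items() if r > 0}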

9 Incomplete Judgements
Judgements are too time-consuming and expensive to be complete (judge every document)
Pool retrieved documents from a variety of systems
Feasible, but:
– Known to be incomplete
– We can’t even accurately estimate how incomplete
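
Pooling itself is mechanical: take the top k documents from every participating system's ranked list for a topic and judge only their union. A minimal sketch with invented run data:

    def build_pool(runs, depth=100):
        """runs: {system_name: [docno, ...]} ranked lists for one topic.
        Returns the set of documents sent to the assessors."""
        pool = set()
        for ranking in runs.values():
            pool.update(ranking[:depth])
        return pool

    # Two hypothetical systems, pool depth 3.
    runs = {
        "sysA": ["d1", "d2", "d3", "d4"],
        "sysB": ["d3", "d5", "d1", "d6"],
    }
    print(sorted(build_pool(runs, depth=3)))   # ['d1', 'd2', 'd3', 'd5']

Anything no system ranked highly (here d4 and d6) is never judged at all, which is why the judgements are known to be incomplete.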

10 Inexact Judgements
Humans differ substantially on judgements
Standard TREC collections:
– Topics include 1-3 paragraphs describing what makes a document relevant
– Given the same pool of documents, 2 humans overlap on 70% of their relevant sets
76% agreement on a small TREC Legal test
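
The overlap figure is a simple set statistic between two assessors' relevant sets; one common definition is intersection over union. A sketch with made-up judgements:

    def overlap(rel_a, rel_b):
        """Overlap of two assessors' relevant sets: |A intersect B| / |A union B|."""
        if not rel_a and not rel_b:
            return 1.0
        return len(rel_a & rel_b) / len(rel_a | rel_b)

    # Hypothetical judgements from two assessors over the same pool.
    assessor_1 = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
    assessor_2 = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d9", "d10"}
    print(f"{overlap(assessor_1, assessor_2):.0%}")   # 70%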

11 Implications of Judgements
No gold standard of perfect performance is even possible
Any system claiming better than 70% precision at 70% recall is working on a problem other than general search
Almost impossible to get useful absolute measures of performance
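
For reference, the precision and recall figures quoted here are the usual set-based definitions. A small sketch with invented document sets:

    def precision_recall(retrieved, relevant):
        """Set-based precision and recall of one retrieved set against the
        judged-relevant set for a topic."""
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    retrieved = {"d1", "d2", "d3", "d4", "d5"}
    relevant  = {"d1", "d2", "d3", "d6", "d7", "d8"}
    p, r = precision_recall(retrieved, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.60 recall=0.50

With only about 70% assessor agreement, the judged-relevant set itself carries that much noise, which is what makes claims of near-perfect absolute scores suspect.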

12 Comparative Evaluation
Comparisons between systems on moderate-size collections (several GBytes) are solid
Comparative results on larger collections (500 GBytes) are showing strains
– Believable, but with a larger error margin
– Active area of research
The overall goal for e-discovery has to be comparative evaluation
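
Comparative evaluation means scoring both systems on the same topics with the same judgements and looking at the paired per-topic differences rather than at a single absolute number. A minimal sketch with illustrative scores:

    from statistics import mean, stdev

    def compare_systems(scores_a, scores_b):
        """Paired per-topic comparison of two systems on a shared topic set.
        Returns the mean difference and a rough standard error."""
        diffs = [b - a for a, b in zip(scores_a, scores_b)]
        return mean(diffs), stdev(diffs) / len(diffs) ** 0.5

    # Hypothetical per-topic average-precision scores on eight shared topics.
    sys_a = [0.31, 0.12, 0.45, 0.28, 0.50, 0.22, 0.19, 0.40]
    sys_b = [0.35, 0.15, 0.42, 0.33, 0.55, 0.25, 0.18, 0.44]
    delta, se = compare_systems(sys_a, sys_b)
    print(f"B - A per-topic mean difference: {delta:+.3f} (std err {se:.3f})")

The error margin on such comparisons grows with collection size and judgement incompleteness, which is the strain the slide refers to.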

13 Sabir TREC Legal Results
Submitted 7 runs
– Very basic approach (1995 technology)
– 3 tools from my toolbox
– 3 query variations
One of the top systems
All results basically the same
– Tools did not help on average

