1 The Lemur Toolkit
Thi Nhu Truong, ChengXiang Zhai, Paul Ogilvie, Bill Jerome, John Lafferty, Jamie Callan
Carnegie Mellon University
David Fisher, Fangfang Feng, Victor Lavrenko, James Allan, Bruce Croft
University of Massachusetts

2 Outline
What is the Lemur toolkit
Release 1.9
Release 2.0
Plans for the future
Audience comments and suggestions

3 What is Lemur?
Objective: A flexible toolkit to support research on language modeling applied to text retrieval and other language technologies
Written in C++
Three releases, available at http://www.cs.cmu.edu/~lemur
Developed as part of the Lemur Project sponsored by ARDA
Open source (BSD-style license)
  – Use, change, distribute as you see fit

4 Current Components
Three Indexers
General retrieval architecture
Specific retrieval models (TFIDF, Okapi, Unigram LM)
General support for language models
Smoothing algorithms
Feedback algorithms
Distributed information retrieval
Text summarization
Various utilities
Runs on Unix (Solaris, Linux) and Windows (NT, 2000, XP)
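To make the "Unigram LM" and "Smoothing algorithms" items concrete, here is a minimal standalone sketch of query-likelihood scoring with Dirichlet prior smoothing. The slide does not say which smoothing methods are included, so Dirichlet smoothing is an assumption chosen for illustration. The names `TermCounts`, `totalCount`, `countOf`, and `queryLogLikelihood` are hypothetical and are not part of the Lemur API.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>
#include <vector>

// Bag-of-words counts for one document or for the whole collection.
// (Hypothetical type for this sketch; not a Lemur class.)
using TermCounts = std::unordered_map<std::string, double>;

// Sum of all term counts in a bag of words.
double totalCount(const TermCounts& counts) {
    double total = 0.0;
    for (const auto& entry : counts) total += entry.second;
    return total;
}

// Count of `term`, or 0 if the term does not occur.
double countOf(const TermCounts& counts, const std::string& term) {
    auto it = counts.find(term);
    return it == counts.end() ? 0.0 : it->second;
}

// log p(query | doc) under a Dirichlet-smoothed unigram model:
//   p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu)
// Assumption of this sketch: query terms that never occur in the
// collection are skipped rather than penalized.
double queryLogLikelihood(const std::vector<std::string>& query,
                          const TermCounts& doc,
                          const TermCounts& collection,
                          double mu) {
    const double docLen = totalCount(doc);
    const double collLen = totalCount(collection);
    if (collLen <= 0.0) return 0.0;  // empty collection: nothing to score
    double logLik = 0.0;
    for (const auto& term : query) {
        const double pColl = countOf(collection, term) / collLen;
        if (pColl <= 0.0) continue;  // out-of-vocabulary query term
        logLik += std::log((countOf(doc, term) + mu * pColl) / (docLen + mu));
    }
    return logLik;
}
```

Documents are ranked by this score in descending order. A larger `mu` pulls every document model closer to the collection model, which matters most for short documents.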

5 Lemur Architecture
Architecture features
  – Modular API
  – Minimum dependency among modules
“Horizontally” extendable, allowing
  – easy support for different text formats
  – easy support for different index data structures
  – easy incorporation of new algorithms
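One way to picture the "modular API" and "horizontally extendable" points is an abstract index interface that retrieval code depends on, with concrete index structures plugged in behind it. The sketch below is only an illustration of that design idea. `Index`, `InMemoryIndex`, and `rawTermFrequencyScore` are hypothetical names invented here and do not reproduce Lemur's actual classes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical abstract interface: retrieval code sees only this class,
// so a different index data structure can be swapped in without changes.
class Index {
public:
    virtual ~Index() = default;
    virtual int docCount() const = 0;
    virtual int termCount(const std::string& term, int docId) const = 0;
};

// One possible implementation: per-document term counts held in memory.
class InMemoryIndex : public Index {
public:
    // Adds a tokenized document and returns its id (0-based).
    int addDocument(const std::vector<std::string>& tokens) {
        std::map<std::string, int> counts;
        for (const auto& token : tokens) ++counts[token];
        docs_.push_back(counts);
        return static_cast<int>(docs_.size()) - 1;
    }

    int docCount() const override { return static_cast<int>(docs_.size()); }

    // Throws std::out_of_range if docId is not a valid document id.
    int termCount(const std::string& term, int docId) const override {
        const auto& counts = docs_.at(static_cast<std::size_t>(docId));
        auto it = counts.find(term);
        return it == counts.end() ? 0 : it->second;
    }

private:
    std::vector<std::map<std::string, int>> docs_;
};

// A deliberately trivial scoring function that depends only on the
// interface: the sum of raw query-term frequencies in the document.
int rawTermFrequencyScore(const Index& index,
                          const std::vector<std::string>& query,
                          int docId) {
    int score = 0;
    for (const auto& term : query) score += index.termCount(term, docId);
    return score;
}
```

A new index format would add another subclass of `Index`, and a new algorithm would add another function or class that takes `const Index&`. Neither change touches the other side.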

6 Lemur Usage
The Lemur Toolkit has played an important role in recent research at CMU and UMass
Also being used at other locations
  – 95 people on Lemur email list
  – Downloaded to over 600 locations worldwide
    » http://www.cs.cmu.edu/~lemur/
  – Example uses: Question answering, filtering, educational uses, noun phrases, cross-lingual IR

7 Lemur Version 1.9
Released July 2002
New features:
  – Two-stage smoothing
  – Text summarization: whole document, query-based
  – Distributed IR: query-based sampling, DB selection, result merging
  – Simple document manager
  – Index upgrades
  – Bug fixes
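The slide names two-stage smoothing without stating the formula. Assuming the feature follows the two-stage language model described by Zhai and Lafferty, the smoothed document model can be written as below. The symbols are notation introduced here, not taken from the slide: c(w;d) is the count of word w in document d, |d| is the document length, p(w|C) is the collection model, p(w|U) is a query background model, and μ and λ are the two smoothing parameters.

```latex
p_{\lambda,\mu}(w \mid d)
  = (1-\lambda)\,\frac{c(w;d) + \mu\,p(w \mid C)}{|d| + \mu}
  + \lambda\,p(w \mid U)
```

The first stage is Dirichlet prior smoothing with μ, and the second stage interpolates the result with the background model using λ. Setting λ = 0 reduces the formula to plain Dirichlet smoothing. In practice p(w|U) is often approximated by p(w|C).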

8 Lemur Version 2.0
Planned for late September 2002
New features:
  – Upgraded distributed IR result merging
  – Integrate UMass additions
    » Chinese and Arabic retrieval, preliminary support for multilingual operations
    » Inquery-style query operators
    » Simple incremental indexing
    » Passage indexer
    » New KL relevance models
    » Cosine similarity retrieval method
    » Query clarity
  – Bug fixes (and probably new bugs)
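As a reference for the "Cosine similarity retrieval method" item, the following standalone sketch computes cosine similarity between two sparse term-weight vectors. The slide does not say how terms are weighted, so the weights are left to the caller. `SparseVector`, `dot`, and `cosineSimilarity` are hypothetical names for this illustration, not Lemur functions.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Sparse term-weight vector: term -> weight (for example a TF-IDF weight).
// (Hypothetical type for this sketch; not a Lemur class.)
using SparseVector = std::unordered_map<std::string, double>;

// Dot product over the terms the two vectors share.
double dot(const SparseVector& a, const SparseVector& b) {
    // Iterate over the shorter vector and probe the longer one.
    const SparseVector& shorter = a.size() <= b.size() ? a : b;
    const SparseVector& longer = a.size() <= b.size() ? b : a;
    double sum = 0.0;
    for (const auto& entry : shorter) {
        auto it = longer.find(entry.first);
        if (it != longer.end()) sum += entry.second * it->second;
    }
    return sum;
}

// Cosine of the angle between the two vectors.
// Returns 0 when either vector has zero length.
double cosineSimilarity(const SparseVector& a, const SparseVector& b) {
    const double normA = std::sqrt(dot(a, a));
    const double normB = std::sqrt(dot(b, b));
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot(a, b) / (normA * normB);
}
```

To rank documents for a query, build one vector for the query and one per document, then sort documents by `cosineSimilarity(query, document)` in descending order.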

9 Plans for the Future
Efficiency (speed)
Incremental indexing
Structured query operators for language modeling retrieval
Better support for interactive applications (and a GUI)
Quality assurance and regression test sets
Better modularity, “Lemur lite”
Locality-based retrieval
Clustering
Aspect-oriented retrieval
Filtering
Multilingual capability
New feedback models
Fields and Metadata
Your suggestions and comments...?

