TextMOLE: Text Mining Operations Library and Environment Daniel B. Waegel and April Kontostathis, Ph.D. Ursinus College Collegeville PA.

1 TextMOLE: Text Mining Operations Library and Environment Daniel B. Waegel and April Kontostathis, Ph.D. Ursinus College Collegeville PA

2 What? Advanced application for indexing and searching a text database. Allows users to quickly analyze a corpus of documents and determine which parameters will provide maximal retrieval performance.

3 Who?
Instructors – demonstrate information retrieval concepts in the classroom
Students – hands-on exploration of concepts often covered in an introductory course in information retrieval or artificial intelligence
Researchers – ‘quick and dirty’ analysis of an unfamiliar collection
Juniors and Seniors – capstone experiences in computer science

4 Why?
Students are unfamiliar with applications that require manipulation of unstructured text
IR students develop basic IR systems, but do not have time to implement and test a variety of parameters
Existing systems do not tightly integrate indexing and retrieval functions
– R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley/ACM Press, New York, 1999.
– R. K. Belew. Finding Out About. Cambridge University Press, 2000.
– G. Salton. The SMART Retrieval System–Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, New Jersey, 1971.
Time! Students in AI do not even have time to implement a basic IR system.

5 How?
Overview of the Application
– Indexing
– Single Query Retrieval
– Multiple Query Retrieval
Sample Assignments
– Artificial Intelligence
– Information Retrieval
– Capstone Projects

6 Indexing

7 Single Query Specification

8 Single Query Results

9 Multiple Query Specification

10 Multiple Query Results

11 How?
Overview of the Application
– Indexing
– Single Query Retrieval
– Multiple Query Retrieval
Sample Assignments
– Artificial Intelligence
– Information Retrieval
– Capstone Projects

12 Information Retrieval Course
Assignment 2
– Assumes Assignment 1 had students develop their own rudimentary IR systems
– Using a corpus provided by the instructor or developed by the student (min. 100 documents)
  Convert to XML format
  Parse with TextMOLE
  Identify a set of standard queries for the collection (truth set not necessary)
  Vary parameters (stemming vs. no stemming, various weighting schemes, various stop lists)
  Decide which set of parameters works best for your collection.
  Write a paper describing your experiments and the results. Be sure to defend your conclusions!
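The "vary parameters" step can be sketched outside TextMOLE as a small grid over indexing settings. This is a minimal illustration only: the toy corpus, the stop list, and the `crude_stem` helper are hypothetical stand-ins (a real experiment would use a proper stemmer such as Porter's), and none of it reflects how TextMOLE itself is implemented.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy corpus and stop list; not taken from TextMOLE.
DOCS = [
    "Indexing the documents makes searching the collection fast",
    "The students searched the indexed collection of documents",
]
STOP_WORDS = {"the", "of", "a", "an", "and"}


def crude_stem(token):
    """Strip a few common suffixes. A stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(docs, use_stemming, use_stop_list):
    """Return a Counter mapping each index term to its collection frequency."""
    counts = Counter()
    for doc in docs:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if use_stop_list and token in STOP_WORDS:
                continue
            counts[crude_stem(token) if use_stemming else token] += 1
    return counts


def compare_settings(docs):
    """Map each (stemming, stop list) combination to the resulting vocabulary size."""
    return {
        (stem, stop): len(index_terms(docs, stem, stop))
        for stem, stop in product([False, True], repeat=2)
    }


if __name__ == "__main__":
    for (stem, stop), size in compare_settings(DOCS).items():
        print(f"stemming={stem!s:5} stop_list={stop!s:5} vocabulary={size}")
```

Vocabulary size is only one observable effect of these settings. The assignment asks students to judge retrieval quality on their own queries, which this sketch does not measure.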

13 Information Retrieval Course
Assignment 3 or 4
– Using the corpus from the previous assignment (minimum of 100 documents)
– Develop a set of standard queries
– Determine which documents are truly relevant to these queries (involves lots of reading and frustration)
– Use the Multiple Query function of TextMOLE to determine precision and recall
Alternate
– Use one or more of the Gold Standard Collections that have a set of standard queries with truth sets (TextMOLE can convert them to XML format)
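For reference, precision and recall over a set of queries can be computed as in the sketch below. The function names and the dictionary layout (query id mapped to document ids) are assumptions made for illustration, not TextMOLE's interface. Averaging per-query values, as done here, is one common choice among several.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / total retrieved
    recall    = relevant retrieved / total relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def average_over_queries(results, truth):
    """Average precision and recall across all queries that have a truth set.

    results: dict of query id -> retrieved document ids (hypothetical layout)
    truth:   dict of query id -> truly relevant document ids
    """
    pairs = [precision_recall(results.get(q, []), truth[q]) for q in truth]
    if not pairs:
        return 0.0, 0.0
    mean_precision = sum(p for p, _ in pairs) / len(pairs)
    mean_recall = sum(r for _, r in pairs) / len(pairs)
    return mean_precision, mean_recall


if __name__ == "__main__":
    truth = {"q1": ["d1", "d3", "d9"], "q2": ["d2"]}
    results = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d2"]}
    print(average_over_queries(results, truth))
```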

14 Artificial Intelligence Course
IR Assignment
– Instructor provides a set of documents in XML format and a set of standard queries (with or without result set)
– Instructor provides students with parameters to use (e.g., stemming, log-entropy weighting for both indexing and retrieval)
– Students try to find the ‘best’ stop word list for this collection
– Write a brief paper describing experiments and results
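Log-entropy weighting, mentioned above as a sample parameter, is usually defined as a local weight log(1 + tf) multiplied by a global weight 1 + (sum over documents of p log p) / log n, where p is the share of a term's occurrences that fall in a given document and n is the number of documents. The sketch below follows that textbook definition. The input layout and function name are assumptions, the local logarithm base varies between sources (natural log is used here), and whether TextMOLE computes it exactly this way is not stated in the slides.

```python
import math


def log_entropy_weights(term_doc_counts, num_docs):
    """Compute log-entropy weights.

    term_doc_counts: dict of term -> {doc id: raw term frequency} (hypothetical layout)
    num_docs:        total number of documents in the collection
    Returns a dict of term -> {doc id: weight}.
    """
    weights = {}
    for term, doc_counts in term_doc_counts.items():
        # Keep only positive counts so log(p) is always defined.
        positive = {d: tf for d, tf in doc_counts.items() if tf > 0}
        global_freq = sum(positive.values())
        entropy_sum = 0.0
        for tf in positive.values():
            p = tf / global_freq
            entropy_sum += p * math.log(p)
        # With a single document log(n) is 0, so fall back to a neutral weight.
        if num_docs > 1:
            global_weight = 1.0 + entropy_sum / math.log(num_docs)
        else:
            global_weight = 1.0
        weights[term] = {
            d: math.log(1 + tf) * global_weight for d, tf in positive.items()
        }
    return weights


if __name__ == "__main__":
    counts = {
        "retrieval": {"d1": 3},                            # concentrated in one document
        "the": {"d1": 1, "d2": 1, "d3": 1, "d4": 1},       # spread evenly
    }
    for term, per_doc in log_entropy_weights(counts, num_docs=4).items():
        print(term, per_doc)
```

A term spread evenly across every document receives a global weight near zero, while a term concentrated in one document keeps its full local weight. This is why the scheme interacts with the choice of stop word list.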

15 Capstone Experiences in Computer Science
Migrate TextMOLE to another platform
– OpenGL
– Java
– Web based
– Relational Database
– Library Functions
Add additional parameters to basic Search and Retrieval
– N-grams instead of words
– Noun phrases (using a tool like flex)
– Clustering
– Latent Semantic Indexing
Add additional IR applications
– Emerging trend detection
– Classification
– First Story Detection
– Filtering
– Summarization
Research in Computer Science
– Develop your own weighting scheme
– Identify additional features for indexing
– Develop a new Gold Standard collection
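As one illustration of the "N-grams instead of words" idea, the sketch below indexes overlapping character n-grams rather than whole words. The slides do not say whether character or word n-grams are intended, so the character-level reading is an assumption, as are the function names.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, after lowercasing
    and collapsing runs of whitespace to single spaces."""
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def ngram_index(docs, n=3):
    """Map each document id to a Counter of its character n-grams.

    docs: dict of doc id -> document text (hypothetical layout)
    """
    return {doc_id: Counter(char_ngrams(text, n)) for doc_id, text in docs.items()}


if __name__ == "__main__":
    print(char_ngrams("Text MOLE", n=4))
    print(ngram_index({"d1": "indexing and searching"}, n=3)["d1"].most_common(3))
```

Character n-grams need no stemmer or tokenizer, which makes them a convenient first capstone extension, at the cost of a larger index.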

16 Where?
Version 1.0 now available online! http://webpages.ursinus.edu/akontostathis/TextMOLE
Contact akontostathis@ursinus.edu with questions and comments

