
Text Mining Application Programming Chapter 1 Introduction Manu Konchady, 2006.


1 Text Mining Application Programming Chapter 1 Introduction Manu Konchady, 2006


3 Definition: Text Mining
- All types of text processing that deal with finding, organizing, and analyzing information.
- (Formal) The creation of new information that is not obvious in a collection of documents.
- New information is defined as a pattern, trend, or relationship that can’t be easily gleaned by reading individual documents.
- The term "document" refers to any unit of text, such as a Web page, an e-mail, a formatted article, a set of slides, or a plain text file.

4 Data Mining vs. Text Mining
- Data mining deals with structured numeric data; text mining deals with unstructured text.
- Data used for data mining is extracted, transformed, and loaded into a data warehouse.
- Text mining attempts to build a model from data that is assumed to be imprecise.
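To make the structured/unstructured contrast concrete, here is a minimal sketch (not taken from the slides) of one common first step: turning free text into term counts that a model can work with. The function name `term_counts` and the simple letters-only tokenizer are assumptions chosen for illustration, and real text would need more careful handling.

```python
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text, split it into alphabetic tokens, and count them.

    Hypothetical helper for illustration; the tokenizer ignores digits,
    punctuation, and anything that is not a-z.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

doc = "Text mining deals with unstructured text."
counts = term_counts(doc)

# Print each term with its frequency, e.g. "text 2".
for term, freq in counts.items():
    print(term, freq)
```

The result is a small table of terms and frequencies, which is closer to the structured input that data mining expects, while still being an imprecise summary of the original sentence.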

5 Origins of Text Mining
- Information Retrieval
- Natural Language Processing

6 Understanding Text
- “Alice saw the rabbit with glasses.”
- Polysemy
  - “In what state would you find Lincoln?”
  - “free software”
- Synonymy
  - More than one word can express the same meaning.
  - Exuberant: lush, luxuriant, profuse, and riotous.
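Synonymy matters in practice because a search for one word can miss documents that use another word with the same meaning. The sketch below is an assumption-laden illustration, not the book's method: it expands a query with a tiny hand-built synonym table based on the "exuberant" example above. The names `SYNONYMS`, `expand_query`, and `matches` are hypothetical.

```python
import re

# Hand-built synonym table using the slide's example; a real system
# would draw on a thesaurus or a lexical database instead.
SYNONYMS = {"exuberant": {"lush", "luxuriant", "profuse", "riotous"}}

def expand_query(terms, synonyms=SYNONYMS):
    """Return the query terms (lowercased) plus any listed synonyms."""
    expanded = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded |= synonyms.get(key, set())
    return expanded

def matches(document, terms):
    """True if the document contains any expanded query term as a whole word."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return bool(words & expand_query(terms))

print(matches("A lush garden surrounded the house.", ["exuberant"]))
print(matches("A dry garden surrounded the house.", ["exuberant"]))
```

Note that expansion of this kind does nothing for polysemy: a word such as "state" or "free" still matches documents that use it in an unintended sense.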

7 An Architecture for Text Mining Applications

8 Text Mining Functions
- Searching
- Information Extraction
- Clustering
- Categorization
- Summarization
- Information Monitor
- Question and Answer

9 A Layered Model

10 Text Mining Installation
- Text Mine (http://textmine.sf.net) is a collection of Perl modules and code on SourceForge to index, cluster, classify, and summarize text.

11 Usage
- Command line
- Web-based interface

12 Web Interface

