NITISH MANOCHA. Platforms: AIX workstation, OS/390, Sun Solaris, Windows NT.

1 NITISH MANOCHA

2 Platforms
- AIX workstation
- OS/390
- Sun Solaris
- Windows NT

3 Tools to Use
- Topic Categorization Tool
  - Categorizing emails
  - Categorizing Web pages

4 Text Analysis Tool
- Topic Categorization Tool

5 Text Analysis Tool
- Topic Categorization Tool
  - Category 1 (AI Schedule)

6 Text Analysis Tool
  - Category 2 (Database Schedule)

7 Text Analysis Tool
- Target Category (Data Mining Schedule)

8 Text Analysis Tool
- Result: Category 2 (Databases)
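
Slides 5 to 8 assign a target document (Data Mining Schedule) to the closer of two trained categories. The idea can be sketched with a bag-of-words cosine similarity. This is only an illustration, not the algorithm the Topic Categorization Tool uses: the function names and the sample texts below are invented and merely stand in for the schedule pages shown on the slides.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(target, training):
    """Return (best_category, scores) for the target text."""
    target_vec = bag_of_words(target)
    scores = {name: cosine(target_vec, bag_of_words(text))
              for name, text in training.items()}
    return max(scores, key=scores.get), scores


# Invented sample texts (assumption), one per category.
training = {
    "AI": "search heuristics planning logic agents neural networks reasoning",
    "Databases": "relational model sql queries transactions indexing data warehouse",
}
# Invented target text (assumption), standing in for the Data Mining Schedule.
target = "data mining association rules over relational data and sql queries"

best, scores = categorize(target, training)
print(best, scores)
```

With these sample texts the target shares vocabulary only with the Databases category, which matches the outcome reported on slide 8.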

9 Tools to Use
- Clustering Tool (Finding Similar Information)
  - Dividing documents into groups
  - Identifying hidden similarities in documents
  - Identifying duplicate documents in a collection
  - Finding documents that are out of place

10 Text Analysis Tool
- Hierarchical Clustering: imzhclst
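
To make the idea of hierarchical clustering concrete, here is a minimal single-link agglomerative sketch over word sets, using Jaccard similarity. It is an assumption-laden illustration and does not describe how imzhclst works internally; the documents, function names, and similarity measure are all chosen for the example.

```python
import re


def words(text):
    """Set of lower-cased alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def agglomerate(docs):
    """Single-link agglomerative clustering; returns the merges in order."""
    sets = {name: words(text) for name, text in docs.items()}
    clusters = [(name,) for name in docs]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: similarity of the closest pair of members.
                sim = max(jaccard(sets[x], sets[y])
                          for x in clusters[i] for y in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((merged, round(sim, 2)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Invented documents (assumption).
docs = {
    "a": "sql query optimization in relational databases",
    "b": "relational databases and sql query tuning",
    "c": "neural networks for image recognition",
    "d": "image recognition with deep neural networks",
}
merges = agglomerate(docs)
for members, similarity in merges:
    print(members, similarity)
```

The merge order forms the hierarchy: the two most similar documents join first, and the last merge (at similarity 0.0 here) joins two groups that share no words, which is how an out-of-place document would show up.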

11 Text Analysis Tool
- Binary Clustering: imzcrlst

12 Text Analysis Tool
- Results

13 Text Analysis Tool
- Results

14 Tools to Use
- Feature Extraction Tool
  - Name extraction
  - Abbreviation extraction
  - Relation extraction

15 Text Analysis Tool
- Using the Feature Extraction Tool to extract names
  - imzxrun -b 2 -f C -x n -o faculty.out faculty.htm
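
As a rough picture of what name and abbreviation extraction involve, the sketch below uses two simple regular-expression heuristics. It is not what imzxrun does; the patterns, function names, and the sample sentence (including the placeholder names) are assumptions, and relation extraction is not covered.

```python
import re

# Two or more consecutive capitalized words, e.g. "Jane Doe".
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
# An all-caps token in parentheses, e.g. "(NLP)".
ABBR_RE = re.compile(r"\(([A-Z]{2,})\)")


def extract_names(text):
    """Return capitalized word sequences as candidate names."""
    return NAME_RE.findall(text)


def extract_abbreviations(text):
    """Map each parenthesized abbreviation to the preceding words
    whose initials spell it out."""
    found = {}
    for match in ABBR_RE.finditer(text):
        abbr = match.group(1)
        preceding = text[:match.start()].split()
        candidate = preceding[-len(abbr):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == abbr:
            found[abbr] = " ".join(candidate)
    return found


# Invented sample sentence with placeholder names (assumption).
sample = ("The course on natural language processing (NLP) "
          "is taught by Jane Doe and John Smith.")
print(extract_names(sample))
print(extract_abbreviations(sample))
```

A heuristic like this misses single-word names and titles, which is one reason a dictionary-backed extraction tool is preferable for real collections.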

16 Text Analysis Tool

17 Tools to Use
- Language Identification Tool
  - Organize a collection of documents by language
  - Restrict search results to documents in a particular language

18 Text Analysis Tool
- Using the Language Identification Tool
  - imzlgini -b 2 -v < mydoc.htm

19 Text Analysis Tool
- Language Identification Tool results
  - Supports 13 languages; new languages can be trained
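
The notion of a trainable language identifier can be sketched as follows: each language is represented by the words seen in a training sample, and a document is assigned to the language whose profile it overlaps most. This is a deliberately simple stand-in, not the method imzlgini uses; the class, its methods, and the tiny training samples are all assumptions.

```python
import re
from collections import Counter


class LanguageIdentifier:
    """Toy word-profile language identifier (illustrative only)."""

    def __init__(self):
        self.profiles = {}

    @staticmethod
    def _words(text):
        # Runs of Unicode letters, lower-cased.
        return re.findall(r"[^\W\d_]+", text.lower())

    def train(self, language, sample):
        """Add or replace the profile for a language."""
        self.profiles[language] = Counter(self._words(sample))

    def identify(self, text):
        """Return (best_language, scores); score = tokens found in profile."""
        tokens = self._words(text)
        scores = {lang: sum(1 for t in tokens if t in profile)
                  for lang, profile in self.profiles.items()}
        return max(scores, key=scores.get), scores


identifier = LanguageIdentifier()
# Tiny invented training samples of common function words (assumption).
identifier.train("English", "the of and to in is that it for with as on this are")
identifier.train("German", "der die das und ist nicht ein eine zu mit von den für auf")

print(identifier.identify("das ist ein Dokument mit Text")[0])
print(identifier.identify("this is the text of the document")[0])

# Training a new language only requires another sample.
identifier.train("French", "le la les de et est un une dans pour que qui pas")
print(identifier.identify("le document est dans la langue")[0])
```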

20 Text Analysis Tool
- Using the Summarizer Tool
  - imzsum -l 4 project.html

21 Text Analysis Tool
- Summarizer Tool: results
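
One common way to summarize is extractive: score each sentence by how frequent its words are in the whole document and keep the top few in their original order. The sketch below shows that idea only; it is not a description of imzsum. The `max_sentences` parameter, the stopword list, and the sample text are assumptions, and the source does not say what `-l 4` controls.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "for", "on", "with", "that", "this", "are", "was"}


def content_words(sentence):
    """Lower-cased alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences):
    """Keep the highest-scoring sentences, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        ws = content_words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


# Invented sample text (assumption).
text = ("Text mining extracts information from text. "
        "The weather was pleasant on Tuesday. "
        "Mining tools cluster text and categorize text. "
        "Lunch is served at noon.")
summary = summarize(text, 2)
print(summary)
```

The two off-topic sentences use words that occur only once, so they score lowest and are dropped.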

22 Tools to Use
- Web Crawler
  - Follows the link topology for a fast search
  - Produces a Web site map
  - Used to recognize authoritative pages
  - Provides a filtered collection of pages

23 Web Crawler
- imyclean: defines a web space
  - Created include.re, exclude.re, types.re
- imycrawl: crawls a defined web space
  - imycrawl url webspace
- imystat: tracks what happens during a crawl
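
The file names include.re and exclude.re suggest that a web space is delimited by regular expressions. A breadth-first crawl restricted in that way can be sketched as below. The link graph, the patterns, and the function names are hypothetical; the real file formats and the behavior of imycrawl are not described in the slides, and the sketch uses an in-memory graph instead of fetching pages.

```python
import re
from collections import deque

# Hypothetical link graph standing in for real HTTP fetches (assumption).
LINKS = {
    "http://example.org/": [
        "http://example.org/courses/",
        "http://example.org/private/notes.html",
        "http://other.example.com/",
    ],
    "http://example.org/courses/": [
        "http://example.org/courses/ai.html",
        "http://example.org/logo.gif",
        "http://example.org/",
    ],
    "http://example.org/courses/ai.html": [],
}

# Hypothetical include and exclude patterns (assumption).
INCLUDE = [re.compile(r"^http://example\.org/")]
EXCLUDE = [re.compile(r"/private/"), re.compile(r"\.gif$")]


def in_web_space(url):
    """A URL is in the web space if some include pattern matches
    and no exclude pattern does."""
    return (any(p.search(url) for p in INCLUDE)
            and not any(p.search(url) for p in EXCLUDE))


def crawl(start):
    """Breadth-first crawl; returns a site map {page: allowed links}."""
    seen = {start}
    queue = deque([start])
    site_map = {}
    while queue:
        url = queue.popleft()
        allowed = [t for t in LINKS.get(url, []) if in_web_space(t)]
        site_map[url] = allowed
        for target in allowed:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return site_map


site_map = crawl("http://example.org/")
for page, links in site_map.items():
    print(page, "->", links)
```

The returned dictionary doubles as a simple site map and as the filtered collection of pages mentioned on slide 22.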

24 Tools to Use
- Text Search Engine
  - Complicated text search
  - Powerful linguistic capabilities
  - Fuzzy searches
  - Query based on structure of document
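
A fuzzy search tolerates misspelled query terms by matching them against the indexed vocabulary. A minimal sketch using Python's standard `difflib` module follows; the vocabulary, the cutoff value, and the function name are assumptions, and the search engine in the slides may implement fuzzy matching quite differently.

```python
import difflib


def fuzzy_lookup(term, vocabulary, cutoff=0.8):
    """Return vocabulary entries whose similarity to the term is at
    least the cutoff, best match first."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)


# Invented vocabulary of indexed terms (assumption).
vocabulary = ["categorization", "clustering", "crawler", "summarizer", "extraction"]
print(fuzzy_lookup("clustring", vocabulary))
```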

25 Text Search Engine
- Operates on a previously built index

26 Text Search Engine
- Types of Index
  - Linguistic Index (bought as buy)
  - Feature Index (Linguistics + Names)
  - Precise Index (bought as bought)
  - Normalized Precise Index (case insensitive)
  - Ngram Index
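
The index types on slide 26 differ mainly in how a token is normalized before it is stored. The sketch below builds a small inverted index with four of those normalizations. It illustrates the distinction only and does not reflect the engine's actual index formats; the lemma table, the function names, and the documents are invented, and the Feature Index is not modeled.

```python
import re
from collections import defaultdict

# Tiny hand-made lemma table (assumption); a real linguistic index
# would rely on a full morphological dictionary.
LEMMAS = {"bought": "buy", "buys": "buy", "buying": "buy"}


def precise(token):
    """Store the token exactly as written."""
    return [token]


def normalized_precise(token):
    """Store the token case-insensitively."""
    return [token.lower()]


def linguistic(token):
    """Store the base form, so 'bought' is indexed as 'buy'."""
    lowered = token.lower()
    return [LEMMAS.get(lowered, lowered)]


def ngrams(token, n=3):
    """Store overlapping character n-grams of the token."""
    lowered = token.lower()
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)] or [lowered]


def build_index(docs, normalize):
    """Inverted index: normalized key -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text):
            for key in normalize(token):
                index[key].add(doc_id)
    return index


# Invented documents (assumption).
docs = {1: "She bought a Book", 2: "They buy books"}
precise_idx = build_index(docs, precise)
normalized_idx = build_index(docs, normalized_precise)
linguistic_idx = build_index(docs, linguistic)
ngram_idx = build_index(docs, ngrams)

print(sorted(precise_idx["buy"]))
print(sorted(linguistic_idx["buy"]))
print(sorted(normalized_idx["book"]))
print(sorted(ngram_idx["boo"]))
```

Searching for "buy" finds only document 2 in the precise index but both documents in the linguistic index, which is the "bought as buy" behavior noted on the slide.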

27 Combining Tools for Solutions
- Searching with categories
  - by combining the Text Search Engine and the Topic Categorization Tool
- Surviving a flood of email
  - by using the Topic Categorization Tool
- Selectively indexing Web pages
  - by combining the Web Crawler, the Topic Categorization Tool and the Text Search Engine
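
"Searching with categories" amounts to tagging each document with a category at indexing time and then filtering search hits by that tag. A minimal sketch, assuming a keyword-overlap categorizer and a plain inverted index, neither of which is taken from the actual tools; the keywords, documents, and function names are invented.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text, keywords):
    """Pick the category whose keyword set overlaps the text most."""
    present = set(tokens(text))
    return max(keywords, key=lambda c: len(present & keywords[c]))


def build(docs, keywords):
    """Return (inverted index, category per document)."""
    index = defaultdict(set)
    category_of = {}
    for doc_id, text in docs.items():
        category_of[doc_id] = categorize(text, keywords)
        for token in tokens(text):
            index[token].add(doc_id)
    return index, category_of


def search(term, category, index, category_of):
    """Documents containing the term, restricted to one category."""
    return sorted(d for d in index.get(term.lower(), ())
                  if category_of[d] == category)


# Invented category keywords and documents (assumption).
keywords = {
    "AI": {"agents", "planning", "learning"},
    "Databases": {"sql", "transactions", "tables"},
}
docs = {
    "a.html": "Planning agents schedule their actions",
    "b.html": "SQL transactions schedule locks on tables",
    "c.html": "Learning agents improve over time",
}
index, category_of = build(docs, keywords)
print(search("schedule", "Databases", index, category_of))
print(search("schedule", "AI", index, category_of))
```

The same word, "schedule", returns different documents depending on the category filter, which is the benefit of combining the two tools.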

28 Views of the Tool
- Command line (good for Unix)
- Not very useful on Windows NT
- Not a good stand-alone tool
- Should be viewed as a library

