Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology): TIARA: A Visual Exploratory Text Analytic System

Presentation transcript:

1 TIARA: A Visual Exploratory Text Analytic System
Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
Presenter: Wei-Hao Huang
Authors: Furu Wei, Shixia Liu, Yangqiu Song, Shimei Pan, Michelle X. Zhou, Weihong Qian, Lei Shi, Li Tan, Qiang Zhang
SIGKDD 2010

2 Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

3 Motivation
• Going through a large collection of text to locate needed information, or simply to make a decision, is very costly and time-consuming.
• Although a number of text analysis technologies exist, their results are often abstract and complex and may not be consumable by users.

4 Objectives
• To present an exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics).
• To combine text analytics and interactive visualization to help users explore and analyze large collections of text.
[Diagram: Documents → TIARA System]

5 Methodology
• TIARA
  ─ Topic Analysis
  ─ Topic Ranking
  ─ Keyword-based Topic Summarization
  ─ Time-sensitive Keyword Extraction

6 TIARA
[Figure: TIARA interface]

7 TIARA System Architecture
[Figure: system architecture; recoverable labels: Database, File system]

8 Topic Analysis
• Uses unsupervised learning methods.
• N is the number of documents
• [symbol lost in extraction] is a word of a document
• [symbol lost in extraction] is the vocabulary, of size V
• K is the number of topics
• [symbol lost in extraction] is the document-topic distribution matrix
• [symbol lost in extraction] is the topic-word distribution matrix

Topics × documents:
      N1    N2
K1    0     1
K2    1     1

Words × topics:
      K1    K2
V1    0.3   0.7
V2    0.8   0.1

Term frequencies in each cluster
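
As a rough illustration of how the two matrices on this slide fit together, the sketch below multiplies a topic-word matrix by a document-topic matrix to get a word distribution for each document. The names `phi`, `theta`, and `word_given_doc` are assumptions made for this example, and the numbers are invented toy values (chosen so every column sums to 1), not the values shown on the slide.

```python
# Hypothetical toy model: V = 2 words, K = 2 topics, N = 2 documents.
# phi[v][k]: probability of word v under topic k (each column sums to 1).
phi = [
    [0.3, 0.8],
    [0.7, 0.2],
]
# theta[k][n]: proportion of topic k in document n (each column sums to 1).
theta = [
    [0.5, 0.0],
    [0.5, 1.0],
]


def word_given_doc(phi, theta):
    """Return p[v][n] = sum over k of phi[v][k] * theta[k][n]."""
    num_words, num_topics, num_docs = len(phi), len(theta), len(theta[0])
    return [
        [
            sum(phi[v][k] * theta[k][n] for k in range(num_topics))
            for n in range(num_docs)
        ]
        for v in range(num_words)
    ]


p = word_given_doc(phi, theta)
for v, row in enumerate(p):
    print(f"word V{v + 1}:", [round(x, 2) for x in row])
```

Document N2 is made up entirely of topic K2 in this toy setup, so its word distribution is just the K2 column of `phi`.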

9 Topic Ranking
• Topic rank is measured by a combination of both topic content coverage and topic variance.
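
The slide does not give the ranking formula, so the following is only a sketch of one plausible reading. It assumes coverage is the document-length-weighted mean of a topic's proportion, variance is the matching weighted standard deviation, and the two are multiplied with tunable exponents. The function name `topic_rank` and the parameters `lam_cov` and `lam_var` are hypothetical.

```python
import math


def topic_rank(theta_k, doc_lengths, lam_cov=1.0, lam_var=1.0):
    """Score one topic from its per-document proportions (assumed formula).

    theta_k[d]      : proportion of the topic in document d
    doc_lengths[d]  : number of words in document d, used as a weight
    lam_cov, lam_var: assumed exponents balancing coverage and variance
    """
    total = sum(doc_lengths)
    # Coverage: length-weighted mean of the topic proportion.
    coverage = sum(n * t for n, t in zip(doc_lengths, theta_k)) / total
    # Variance term: length-weighted standard deviation around that mean.
    spread = math.sqrt(
        sum(n * (t - coverage) ** 2 for n, t in zip(doc_lengths, theta_k)) / total
    )
    return (coverage ** lam_cov) * (spread ** lam_var)


lengths = [100, 100, 200]
background = [0.5, 0.5, 0.5]  # present everywhere, never distinctive
focused = [0.9, 0.1, 0.5]     # same coverage, concentrated in some documents
print(topic_rank(background, lengths), topic_rank(focused, lengths))
```

Under these assumptions a topic spread evenly over every document scores zero, while a topic with the same coverage but uneven presence scores higher, which matches the intent of combining coverage with variance.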

10 Keyword-based Topic Summarization

11 Time-sensitive Keyword Extraction

12 Time-sensitive Keyword Extraction (continued)

13 Experiments
• Time-sensitive keyword extraction procedure
  ─ Completeness
  ─ Distinctiveness
• Response time
• Data sets:
  ─ A personal email collection with 8,326 email messages
  ─ An emergency room data set containing 23,501 patient records

14 Completeness
• Defined as whether we can recover the original keywords of a topic by combining the keywords associated with each time segment.
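
The slide defines completeness in words only. One simple way to turn that definition into a number, assumed here for illustration, is the fraction of a topic's keywords that reappear in the union of its per-segment keywords. The function name `completeness` and the example keywords are hypothetical.

```python
def completeness(topic_keywords, segment_keywords):
    """Fraction of a topic's keywords recovered by the union of its segments."""
    topic = set(topic_keywords)
    if not topic:
        return 1.0  # nothing to recover
    # Combine the keywords of every time segment into one set.
    recovered = set().union(*segment_keywords)
    return len(topic & recovered) / len(topic)


topic = ["meeting", "budget", "travel", "review"]
segments = [["meeting", "budget"], ["travel", "hotel"]]
print(completeness(topic, segments))
```

In this toy case three of the four topic keywords appear in some segment, so the score is 0.75.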

15 Distinctiveness
• Defined as whether we can distinguish one topic segment from another based on their associated keywords to avoid redundancy.
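
Distinctiveness is also given only as a verbal definition. A minimal sketch, assuming it can be approximated by the mean pairwise Jaccard distance between the keyword sets of a topic's segments, is shown below. This metric is a stand-in chosen for illustration, not necessarily the measure used in the experiments, and the name `distinctiveness` is hypothetical.

```python
from itertools import combinations


def distinctiveness(segment_keywords):
    """Mean pairwise Jaccard distance between segment keyword sets (0 to 1)."""
    sets = [set(s) for s in segment_keywords]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0  # a single segment has nothing to be redundant with

    def jaccard_distance(a, b):
        union = a | b
        # Two empty sets are treated as identical (distance 0).
        return 1.0 - len(a & b) / len(union) if union else 0.0

    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


print(distinctiveness([["budget", "meeting"], ["meeting", "travel"]]))
```

Segments with identical keywords score 0 (fully redundant) and segments with no shared keywords score 1.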

16 Completeness and Distinctiveness Results
[Figure: results]

17 Response Time
[Figure: results]

18 Conclusions
• TIARA tightly integrates text analytics with interactive visualization to support effective exploratory text analysis.
• Future work
  ─ Add sentence-based summaries
  ─ Support other languages
  ─ Improve performance

19 Comments
• Advantages
  ─ Explores and analyzes large text collections with interactive visualization
• Applications
  ─ Text mining

