Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining Qiaozhu Mei and ChengXiang Zhai Department of Computer Science.

Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining
Qiaozhu Mei and ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign
SIGKDD'05

Introduction
Temporal Text Mining (TTM): discovering temporal patterns in text information collected over time.
In this paper:
– Discovering and summarizing the evolutionary patterns of themes in a text stream
– Revealing the life cycle of a theme

Introduction (cont.)
We solve this problem through:
1. Discovering latent themes from text
2. Constructing an evolution graph of themes
3. Analyzing life cycles of themes
Evaluation:
– News articles
– The abstracts of the ACM KDD conference papers

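Definitions
– Time-indexed documents: $C = \{d_1, d_2, \ldots, d_T\}$
– Vocabulary set: $V = \{w_1, \ldots, w_{|V|}\}$
– Theme $\theta$: a unigram language model $\{p(w \mid \theta)\}_{w \in V}$
– Theme span: $\gamma = \langle \theta, s(\gamma), t(\gamma) \rangle$, where $s(\gamma)$ and $t(\gamma)$ are the start and end times of the span. If $s(\gamma) = 1$ and $t(\gamma) = T$, then $\gamma$ is a trans-collection theme.
– Evolutionary transition: if $t(\gamma_1) \le s(\gamma_2)$ and $\mathrm{similarity}(\gamma_1, \gamma_2) > \text{threshold}$, we say that there is an evolutionary transition from $\gamma_1$ to $\gamma_2$, written $\gamma_1 \prec \gamma_2$.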
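A minimal Python sketch of these definitions, assuming a theme is stored as a word-to-probability dict and time points are the integers 1..T. The names `ThemeSpan`, `is_trans_collection`, `is_transition`, and `overlap` are hypothetical; the similarity function is passed in because this slide leaves it abstract, and `overlap` is only a stand-in for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A theme is a unigram language model: word -> p(w | theta).
Theme = Dict[str, float]


@dataclass
class ThemeSpan:
    """gamma = <theta, s(gamma), t(gamma)> (hypothetical container)."""
    theme: Theme
    start: int  # s(gamma)
    end: int    # t(gamma)


def is_trans_collection(span: ThemeSpan, T: int) -> bool:
    # The span covers the whole time line 1..T.
    return span.start == 1 and span.end == T


def is_transition(g1: ThemeSpan, g2: ThemeSpan,
                  similarity: Callable[[Theme, Theme], float],
                  threshold: float) -> bool:
    # g1 must end no later than g2 starts, and the themes must be similar enough.
    return g1.end <= g2.start and similarity(g1.theme, g2.theme) > threshold


def overlap(p: Theme, q: Theme) -> float:
    # Hypothetical stand-in similarity: sum_w min(p(w), q(w)), which lies in [0, 1].
    return sum(min(pw, q.get(w, 0.0)) for w, pw in p.items())
```

For example, the spans with themes {tsunami: 0.6, aid: 0.4} over [1, 10] and {tsunami: 0.5, aid: 0.5} over [10, 20] have an overlap of 0.9, so with a threshold of 0.8 the first span transitions into the second, but not the reverse, because the time-order condition fails.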

Definitions (cont.)
– Theme evolution thread: a sequence of theme spans $\gamma_0, \gamma_1, \ldots, \gamma_n$ such that there is an evolutionary transition from each $\gamma_i$ to $\gamma_{i+1}$
– Theme evolution graph: a weighted directed graph $G = (N, E)$, where $N$ is the set of all theme spans and $E$ is the set of evolutionary transitions.
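The graph and its threads can be sketched in Python as an adjacency list. This assumes the edge weight is whatever score decided the transition (the slide only says the graph is weighted) and treats a thread as a maximal path that starts at a span with no incoming transition. `build_evolution_graph`, `evolution_threads`, and the `weight` callback are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")
Graph = Dict[int, List[Tuple[int, float]]]


def build_evolution_graph(spans: Sequence[S],
                          weight: Callable[[S, S], Optional[float]]) -> Graph:
    """Adjacency list: node i -> [(j, weight)] for every transition spans[i] -> spans[j].

    `weight(a, b)` returns None when there is no evolutionary transition a -> b.
    """
    graph: Graph = {i: [] for i in range(len(spans))}
    for i, a in enumerate(spans):
        for j, b in enumerate(spans):
            if i != j:
                w = weight(a, b)
                if w is not None:
                    graph[i].append((j, w))
    return graph


def evolution_threads(graph: Graph) -> List[List[int]]:
    """Maximal paths that start at nodes without predecessors."""
    has_pred = {j for edges in graph.values() for j, _ in edges}
    threads: List[List[int]] = []

    def walk(path: List[int]) -> None:
        # Skip nodes already on the path so spans sharing a time point cannot loop forever.
        nxt = [j for j, _ in graph[path[-1]] if j not in path]
        if not nxt:
            threads.append(path)
            return
        for j in nxt:
            walk(path + [j])

    for i in graph:
        if i not in has_pred:
            walk([i])
    return threads
```

With spans given as toy `(start, end, label)` tuples and a weight of 1.0 whenever two spans share a label and are in time order, the spans (1, 5, x), (5, 9, x), (9, 12, x), (5, 9, y) yield the threads [0, 1, 2], [0, 2], and [3].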

Example of Theme Evolution Graph (figure; labels: theme span, theme evolution thread, evolutionary transition)

Evolution Graph Discovery
1. Partition the documents into sub-collections: $C = C_1 \cup C_2 \cup \cdots \cup C_n$
2. Extract the most salient themes $\Theta_i = \{\theta_{i,1}, \ldots, \theta_{i,k_i}\}$ from each sub-collection $C_i$
3. For any themes $\theta_1 \in \Theta_i$ and $\theta_2 \in \Theta_j$ where $i < j$, decide whether there is an evolutionary transition.
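A sketch of the partitioning step and of enumerating the cross-window theme pairs (window i before window j) that must be compared. It assumes non-overlapping windows of a fixed width in integer time units, which the slide does not specify; `partition_by_time` and `candidate_pairs` are hypothetical helpers, and theme extraction itself is left out.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def partition_by_time(docs: Sequence[Tuple[int, str]], width: int) -> Dict[int, List[str]]:
    """Group (timestamp, text) pairs into windows [t0 + i*width, t0 + (i+1)*width)."""
    if not docs:
        return {}
    t0 = min(t for t, _ in docs)
    parts: Dict[int, List[str]] = defaultdict(list)
    for t, text in docs:
        parts[(t - t0) // width].append(text)
    return dict(sorted(parts.items()))


def candidate_pairs(themes_by_window: Dict[int, List[object]]):
    """Every (theme from window i, theme from window j) pair with i < j."""
    windows = sorted(themes_by_window)
    pairs = []
    for pos, i in enumerate(windows):
        for j in windows[pos + 1:]:
            for a in themes_by_window[i]:
                for b in themes_by_window[j]:
                    pairs.append(((i, a), (j, b)))
    return pairs
```

Each candidate pair would then go through the transition test; the number of pairs grows quadratically with the number of windows, which is one reason to keep the number of themes per window small.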

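Theme Extraction
Let $\theta_1, \ldots, \theta_k$ be $k$ themes and $\theta_B$ be a background model for the whole collection $C$. A document $d$ is regarded as a sample of the following mixture model:
$$p(w \mid d) = \lambda\, p(w \mid \theta_B) + (1 - \lambda) \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)$$
where $w$ is a word in $d$, $\pi_{d,j}$ is the mixing weight for $d$ choosing $\theta_j$, and $\lambda$ is the mixing weight for $\theta_B$. The log-likelihood of sub-collection $C_i$ is
$$\log p(C_i) = \sum_{d \in C_i} \sum_{w \in V} c(w, d) \log\Big[\lambda\, p(w \mid \theta_B) + (1 - \lambda) \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)\Big]$$
where $c(w, d)$ is the count of $w$ in $d$. The parameters are estimated with the EM algorithm.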
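The mixture log-likelihood of a sub-collection, the sum over documents d and words w of c(w, d) log[λ p(w | θ_B) + (1 − λ) Σ_j π_{d,j} p(w | θ_j)], can be evaluated directly on a toy collection. This Python sketch assumes documents are word-count `Counter`s and that every observed word has a nonzero background probability (otherwise `math.log` raises an error); `log_likelihood` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List

Theme = Dict[str, float]


def log_likelihood(docs: List[Counter], themes: List[Theme], background: Theme,
                   pi: List[List[float]], lam: float) -> float:
    """sum_d sum_w c(w,d) * log(lam*p(w|B) + (1-lam) * sum_j pi[d][j] * p(w|theta_j))."""
    total = 0.0
    for d, counts in enumerate(docs):
        for w, c in counts.items():  # only words with c(w, d) > 0 contribute
            p_themes = sum(pi[d][j] * themes[j].get(w, 0.0) for j in range(len(themes)))
            total += c * math.log(lam * background.get(w, 0.0) + (1 - lam) * p_themes)
    return total
```

With one theme that puts all its mass on "a", a uniform background over {a, b}, and λ = 0.5, the document {a: 2, b: 1} gets p(a) = 0.75 and p(b) = 0.25, so its log-likelihood is 2 log 0.75 + log 0.25.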

Parameter Estimation
$z_{d,w}$ is a hidden variable: $p(z_{d,w} = j)$ is the probability that word $w$ in document $d$ is generated using theme $j$, given that $w$ is not generated from the background model.
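One EM iteration for this model can be sketched in Python. This is the standard EM derivation for a mixture with a fixed background weight λ, written here for illustration rather than copied from the paper: the E-step computes p(z_{d,w} = B) and p(z_{d,w} = j), and the M-step re-estimates π_{d,j} and p(w | θ_j) from the expected counts c(w, d)(1 − p(z_{d,w} = B)) p(z_{d,w} = j). `em_step` is a hypothetical name, and observed words are assumed to have nonzero background probability.

```python
from collections import Counter
from typing import Dict, List, Tuple

Theme = Dict[str, float]


def em_step(docs: List[Counter], themes: List[Theme], background: Theme,
            pi: List[List[float]], lam: float) -> Tuple[List[Theme], List[List[float]]]:
    """One EM iteration; lam (the background weight) stays fixed."""
    k = len(themes)
    theme_counts = [Counter() for _ in range(k)]  # expected counts of w generated by theme j
    new_pi: List[List[float]] = []
    for d, counts in enumerate(docs):
        doc_counts = [0.0] * k  # expected number of words in d generated by theme j
        for w, c in counts.items():
            # E-step: posterior over background vs. themes for this (d, w).
            weighted = [pi[d][j] * themes[j].get(w, 0.0) for j in range(k)]
            mix = sum(weighted)
            bg = lam * background.get(w, 0.0)
            p_b = bg / (bg + (1 - lam) * mix)  # p(z_{d,w} = B)
            for j in range(k):
                p_j = weighted[j] / mix if mix > 0 else 0.0  # p(z_{d,w} = j), not background
                expected = c * (1 - p_b) * p_j
                doc_counts[j] += expected
                theme_counts[j][w] += expected
        # M-step for pi_{d,j}: normalize the expected theme counts within d.
        total = sum(doc_counts)
        new_pi.append([x / total for x in doc_counts] if total > 0 else list(pi[d]))
    # M-step for p(w | theta_j): normalize expected counts over the vocabulary.
    new_themes: List[Theme] = []
    for j in range(k):
        total = sum(theme_counts[j].values())
        new_themes.append({w: v / total for w, v in theme_counts[j].items()}
                          if total > 0 else dict(themes[j]))
    return new_themes, new_pi
```

In practice the step is repeated until the log-likelihood stops improving; words that receive no expected counts drop out of a theme, which corresponds to probability zero.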

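Evolutionary Transition Discovery
For every pair of theme spans $\gamma_1 = \langle \theta_1, s(\gamma_1), t(\gamma_1) \rangle$ and $\gamma_2 = \langle \theta_2, s(\gamma_2), t(\gamma_2) \rangle$ where $t(\gamma_1) \le s(\gamma_2)$, measure how far $\theta_2$ diverges from $\theta_1$ with the Kullback-Leibler divergence:
$$D(\theta_2 \,\|\, \theta_1) = \sum_{w \in V} p(w \mid \theta_2) \log \frac{p(w \mid \theta_2)}{p(w \mid \theta_1)}$$
If $D(\theta_2 \,\|\, \theta_1) < \xi$ for a threshold $\xi$, then there is an evolutionary transition from $\gamma_1$ to $\gamma_2$. A smaller divergence means more similar themes, so only sufficiently close pairs are linked.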
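A sketch of the divergence test in Python. The slides do not say how zero probabilities are handled, so this version assumes an `eps` floor on p(w | θ_1); `kl_divergence` and `has_transition` are hypothetical names, and the test links two spans only when D(θ_2 || θ_1) is below ξ, since a small divergence means similar themes.

```python
import math
from typing import Dict

Theme = Dict[str, float]


def kl_divergence(p2: Theme, p1: Theme, eps: float = 1e-12) -> float:
    """D(theta_2 || theta_1) = sum_w p(w|theta_2) * log(p(w|theta_2) / p(w|theta_1))."""
    total = 0.0
    for w, q in p2.items():
        if q > 0:  # terms with p(w|theta_2) = 0 contribute nothing
            # The eps floor on p(w|theta_1) is an assumption, not from the slides.
            total += q * math.log(q / max(p1.get(w, 0.0), eps))
    return total


def has_transition(theta1: Theme, t1: int, theta2: Theme, s2: int, xi: float) -> bool:
    # gamma_1 must end no later than gamma_2 starts: t(gamma_1) <= s(gamma_2).
    return t1 <= s2 and kl_divergence(theta2, theta1) < xi
```

Because KL divergence is asymmetric, the argument order matters: D(θ_2 || θ_1) measures how poorly the earlier theme θ_1 explains words drawn from the later theme θ_2.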

Analysis of Theme Life Cycles
Theme life cycle: the strength distribution of the trans-collection theme over the entire time line.
Assume the collection is generated from an HMM:
– States → themes
– Output symbol set → $V$
– Output probability distribution → the multinomial distribution of words of that state
Obtain the state sequence with the Viterbi algorithm.
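A log-space Viterbi decoder that labels each word of a token stream with a theme state. The slide does not say where the initial and transition probabilities come from, so here the caller supplies them as assumptions; `viterbi` and its arguments are hypothetical names, and the output probabilities are each theme's p(w | θ).

```python
import math
from typing import Dict, List, Sequence

Theme = Dict[str, float]


def _log(x: float) -> float:
    return math.log(x) if x > 0 else float("-inf")


def viterbi(words: Sequence[str], themes: List[Theme],
            start: List[float], trans: List[List[float]]) -> List[int]:
    """Most likely theme-state sequence for `words`.

    themes[s] is the output distribution p(w | theta_s) of state s;
    start[s] and trans[r][s] are assumed initial and transition probabilities.
    """
    if not words:
        return []
    n = len(themes)
    # score[s]: best log-probability of any path ending in state s at the current word.
    score = [_log(start[s]) + _log(themes[s].get(words[0], 0.0)) for s in range(n)]
    backpointers: List[List[int]] = []
    for w in words[1:]:
        new_score, pointers = [], []
        for s in range(n):
            # Best predecessor r for state s.
            r = max(range(n), key=lambda q: score[q] + _log(trans[q][s]))
            pointers.append(r)
            new_score.append(score[r] + _log(trans[r][s]) + _log(themes[s].get(w, 0.0)))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

For example, with two themes favoring "wave" and "aid" respectively (0.9 vs. 0.1), uniform start probabilities, and a 0.8 probability of staying in the same state, the stream wave, wave, aid, aid is labeled [0, 0, 1, 1].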

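Analysis of Theme Life Cycles (cont.)
The absolute and relative strengths of theme $i$ at time $t$ are computed from the Viterbi labels using the indicator
$$\delta(w, i) = \begin{cases} 1 & \text{if word } w \text{ is labeled as theme } i \\ 0 & \text{otherwise} \end{cases}$$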
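Given per-word theme labels, the strengths can be sketched by summing the indicator per time point. The exact normalization is not recoverable from the slide, so this sketch assumes the absolute strength is the number of words labeled i at time t and the relative strength is that count divided by the number of words at time t; `theme_strengths` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def theme_strengths(labeled_docs: Sequence[Tuple[int, Sequence[int]]],
                    theme: int) -> Tuple[Dict[int, int], Dict[int, float]]:
    """labeled_docs: (time, per-word theme labels) pairs, e.g. from Viterbi decoding."""
    hits: Counter = Counter()    # words labeled `theme` at time t
    totals: Counter = Counter()  # all words at time t
    for t, labels in labeled_docs:
        totals[t] += len(labels)
        hits[t] += sum(1 for label in labels if label == theme)  # sum of delta(w, i)
    absolute = {t: hits[t] for t in sorted(totals)}
    relative = {t: hits[t] / totals[t] for t in sorted(totals) if totals[t] > 0}
    return absolute, relative
```

The relative strength factors out changes in total text volume over time, which is why the absolute and normalized life cycles of the same theme can look different.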

Experimental Data Sets
– News about the Asia tsunami: Dec. to Feb. (50 days), downloaded with the query "tsunami"
– The abstracts in the KDD conference proceedings from 1999 to 2004

Theme Spans from Tsunami

Theme Evolution Graph for Tsunami

Theme Life Cycle in CNN (figure: absolute life cycle in CNN data)

Theme Life Cycle in XINHUA (figures: absolute life cycle in XINHUA data; normalized life cycle in XINHUA data)

Theme Spans from KDD

Theme Evolution Graph for KDD (figure; node labels include classification, Web classification, clustering & random variables, and a: typical classification techniques)

Theme Life Cycle for KDD (figure legend: Business, Biology Data, Web Info., Time Series, Classification, Association Rule, Clustering)

Conclusions
– We propose methods to discover evolutionary theme patterns and to analyze the life cycle of each theme.
– The proposed methods can generate meaningful temporal theme structures on the two experimental data sets.
– Our methods are generally applicable to any text stream data.
Future work:
– Hierarchical theme clustering
– A temporal theme mining system