Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining Qiaozhu Mei and ChengXiang Zhai Department of Computer Science.


1 Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining Qiaozhu Mei and ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign SIGKDD’05

2 Introduction Temporal Text Mining (TTM): discovering temporal patterns in text information collected over time. In this paper –Discovering and summarizing the evolutionary patterns of themes in a text stream –Revealing the life cycle of a theme

3 Introduction (cont.) We solve this problem through –1. Discovering latent themes from text –2. Constructing an evolution graph of themes –3. Analyzing life cycles of themes. Evaluation –News articles –The abstracts of the ACM KDD conference papers

4 Definitions Time-indexed documents C = {d_1, d_2, …, d_T}; vocabulary set V = {w_1, …, w_|V|} Theme θ –A unigram language model {p(w|θ)} Theme Span γ = 〈θ, s(γ), t(γ)〉 –If s(γ) = 1 and t(γ) = T, then γ is a trans-collection theme Evolutionary Transition –If t(γ_1) ≤ s(γ_2) and similarity(γ_1, γ_2) > threshold, we say that there is an evolutionary transition from γ_1 to γ_2, written γ_1 ≺ γ_2

5 Definitions (cont.) Theme Evolution Thread –A sequence of theme spans γ_0, γ_1, …, γ_n such that γ_i ≺ γ_i+1 (there is an evolutionary transition from γ_i to γ_i+1) Theme Evolution Graph –Weighted directed graph G = (N, E), where N is the set of all theme spans and E is the set of evolutionary transitions.
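The definitions above can be sketched as a small data structure. This is only an illustration: `ThemeSpan`, `build_graph`, `threads`, the toy similarity table, and the threshold value are hypothetical names and numbers, not taken from the slides. The sketch also assumes every span satisfies s(γ) < t(γ), so the graph has no cycles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThemeSpan:
    name: str   # hypothetical label standing in for the language model θ
    start: int  # s(γ)
    end: int    # t(γ)

def build_graph(spans, similarity, threshold):
    """Return {source: [(target, weight), ...]} holding every evolutionary transition."""
    edges = {span: [] for span in spans}
    for g1 in spans:
        for g2 in spans:
            # A transition needs t(γ1) <= s(γ2); this also rules out self-loops
            # because each span is assumed to have start < end.
            if g1.end > g2.start:
                continue
            weight = similarity(g1, g2)
            if weight > threshold:
                edges[g1].append((g2, weight))
    return edges

def threads(edges, span):
    """Yield every maximal theme evolution thread that starts at `span`."""
    successors = edges[span]
    if not successors:
        yield [span]
        return
    for nxt, _ in successors:
        for tail in threads(edges, nxt):
            yield [span] + tail

# Toy data (assumed): three spans and a hand-written similarity table.
spans = [ThemeSpan("aid", 1, 3), ThemeSpan("donations", 3, 5), ThemeSpan("politics", 5, 7)]
toy_similarity = {("aid", "donations"): 0.8, ("donations", "politics"): 0.2, ("aid", "politics"): 0.6}
edges = build_graph(spans, lambda a, b: toy_similarity[(a.name, b.name)], threshold=0.5)
thread_names = [[s.name for s in th] for th in threads(edges, spans[0])]
```

A dictionary of adjacency lists keeps the edge weights next to the targets, which is enough to enumerate threads by depth-first traversal.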

6 Example of Theme Evolution Graph [Figure: example graph with labels Theme Span, Theme Evolution Thread, Evolutionary Transition]

7 Evolution Graph Discovery Partition the documents into sub-collections C = C_1 ∪ C_2 ∪ … ∪ C_n. Extract the most salient themes Θ_i = {θ_i,1, …, θ_i,k} from each sub-collection C_i. For any themes θ ∈ Θ_i and θ' ∈ Θ_j where i < j, decide whether there is an evolutionary transition.
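The partitioning step can be sketched as follows. The slide does not say how the sub-collections are chosen, so the fixed-width, non-overlapping intervals used here are an assumption, and `partition_by_interval` is a hypothetical helper name.

```python
from collections import defaultdict

def partition_by_interval(docs, width):
    """Group (time_index, text) pairs into consecutive intervals of `width` time points.

    Time indices are assumed to start at 1, matching d_1 ... d_T.
    Returns {interval_index: [text, ...]} ordered by interval.
    """
    parts = defaultdict(list)
    for t, text in docs:
        parts[(t - 1) // width].append(text)
    return dict(sorted(parts.items()))

# Toy stream (assumed): five documents, two time points per sub-collection.
docs = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
sub_collections = partition_by_interval(docs, width=2)
```

Each resulting sub-collection would then be passed to the theme extraction step independently.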

8 Theme Extraction Let θ_1, …, θ_k be k themes and θ_B be a background model for the whole collection C. A document d is regarded as a sample of a mixture model over these themes and the background, where w is a word in d, π_d,j is the mixing weight for d choosing θ_j, and λ is the mixing weight for θ_B. The parameters are trained with the EM algorithm to maximize the log-likelihood of C_i.
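The equations on this slide did not survive extraction. A plausible reconstruction from the symbols the slide defines is given below; the word count c(w, d) and the exact layout are assumptions, so treat this as a sketch of the standard background-plus-themes mixture rather than the authors' exact notation.

```latex
% Mixture model for a word w in document d (reconstruction, assumed form)
p(w \mid d) = \lambda \, p(w \mid \theta_B)
            + (1 - \lambda) \sum_{j=1}^{k} \pi_{d,j} \, p(w \mid \theta_j)

% Log-likelihood of sub-collection C_i, with c(w, d) the count of w in d (assumed symbol)
\log p(C_i) = \sum_{d \in C_i} \sum_{w \in V} c(w, d)
  \log \Big[ \lambda \, p(w \mid \theta_B)
           + (1 - \lambda) \sum_{j=1}^{k} \pi_{d,j} \, p(w \mid \theta_j) \Big]
```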

9 Parameter Estimation {z_d,w} is a hidden variable; p(z_d,w = j) is the probability that the word w in document d is generated using theme j, given that w is not generated from the background model.
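A minimal EM sketch for this kind of mixture is shown below. The update formulas on the slide were lost, so this follows the textbook EM derivation for a fixed background model and fixed λ; the function name `em_themes`, the default λ = 0.9, the random initialization, and the toy documents are all assumptions. Documents are assumed non-empty and the background model is assumed to cover every word.

```python
import random

def em_themes(counts, background, k, lam=0.9, iters=50, seed=0):
    """counts: list of {word: count}, one dict per document.
    background: {word: p(w|θ_B)}. Returns (themes, pi) where
    themes[j][w] = p(w|θ_j) and pi[d][j] = π_{d,j}."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})
    # Random positive initialization of the theme models, uniform mixing weights.
    themes = []
    for _ in range(k):
        raw = {w: rng.random() + 0.1 for w in vocab}
        total = sum(raw.values())
        themes.append({w: v / total for w, v in raw.items()})
    pi = [[1.0 / k] * k for _ in counts]

    for _ in range(iters):
        word_acc = [{w: 0.0 for w in vocab} for _ in range(k)]
        new_pi = []
        for d, doc in enumerate(counts):
            doc_acc = [0.0] * k
            for w, c in doc.items():
                # E-step: p(z = j) given that the word is not from the background,
                # and the probability p_bg that it is from the background.
                joint = [pi[d][j] * themes[j][w] for j in range(k)]
                mix = sum(joint)
                p_bg = lam * background[w] / (lam * background[w] + (1 - lam) * mix)
                for j in range(k):
                    resp = c * (1 - p_bg) * joint[j] / mix
                    doc_acc[j] += resp
                    word_acc[j][w] += resp
            total = sum(doc_acc)
            new_pi.append([v / total for v in doc_acc])
        # M-step: renormalize the expected counts.
        pi = new_pi
        for j in range(k):
            total = sum(word_acc[j].values())
            themes[j] = {w: v / total for w, v in word_acc[j].items()}
    return themes, pi

# Toy collection (assumed); the background is the collection-wide word frequency.
counts = [{"wave": 3, "quake": 2, "the": 4}, {"aid": 3, "donation": 2, "the": 4}]
grand_total = sum(c for doc in counts for c in doc.values())
background = {}
for doc in counts:
    for w, c in doc.items():
        background[w] = background.get(w, 0.0) + c / grand_total
themes, pi = em_themes(counts, background, k=2)
```

Because λ is held fixed, the background model absorbs common words and the learned themes concentrate on more discriminative vocabulary.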

10 Evolutionary Transition Discovery For every pair of theme spans γ_1 = 〈θ_1, s(γ_1), t(γ_1)〉 and γ_2 = 〈θ_2, s(γ_2), t(γ_2)〉 where t(γ_1) ≤ s(γ_2), compute the Kullback-Leibler divergence D(θ_2 || θ_1). If D(θ_2 || θ_1) > ξ, then there is an evolutionary transition from γ_1 to γ_2.
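The divergence itself can be computed with a few lines of Python. The sketch below represents each theme as a dictionary of word probabilities; the `eps` floor for words missing from the second model is an assumed smoothing choice, not something the slide specifies. The comparison against ξ is left to the caller, since the slide states the test as D > ξ while a smaller divergence means the two models are closer.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) = sum over w of p(w) * log(p(w) / q(w)).

    p and q map words to probabilities; words absent from q are floored
    at eps (an assumption) so the logarithm stays finite.
    """
    total = 0.0
    for w, pw in p.items():
        if pw > 0:
            total += pw * math.log(pw / max(q.get(w, 0.0), eps))
    return total

# Toy themes (assumed values): θ_1 from the earlier span, θ_2 from the later one.
theta_1 = {"wave": 0.9, "aid": 0.1}
theta_2 = {"wave": 0.5, "aid": 0.5}
d_21 = kl_divergence(theta_2, theta_1)  # D(θ_2 || θ_1)
d_12 = kl_divergence(theta_1, theta_2)  # differs from d_21: KL is asymmetric
```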

11 Analysis of Theme Life Cycles Theme Life Cycle: the strength distribution of the trans-collection theme over the entire time line. Assume the collection is generated from an HMM –States → Themes –Output symbol set → V –Output probability distribution → the multinomial distribution of words of that state Obtain the state sequence with the Viterbi algorithm
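The decoding step can be illustrated with a standard log-space Viterbi implementation. The state names, start, transition, and emission probabilities below are made-up toy values; the slide does not say how the transition probabilities are obtained, so they are simply taken as given here.

```python
import math

def logs(dist):
    """Convert a {key: probability} dict to log-probabilities."""
    return {key: math.log(value) for key, value in dist.items()}

def viterbi(words, states, log_start, log_trans, log_emit):
    """Most likely state (theme) sequence for a word sequence, in log space."""
    # best[s] = best log-probability of any path that ends in state s.
    best = {s: log_start[s] + log_emit[s][words[0]] for s in states}
    back = []
    for w in words[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            src = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[src] + log_trans[src][s] + log_emit[s][w]
            pointers[s] = src
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy HMM (assumed values): two themes with "sticky" transitions.
states = ["relief", "science"]
log_start = logs({"relief": 0.5, "science": 0.5})
log_trans = {"relief": logs({"relief": 0.8, "science": 0.2}),
             "science": logs({"relief": 0.2, "science": 0.8})}
log_emit = {"relief": logs({"aid": 0.6, "wave": 0.1, "donation": 0.3}),
            "science": logs({"aid": 0.1, "wave": 0.8, "donation": 0.1})}
path = viterbi(["wave", "wave", "aid", "donation"], states, log_start, log_trans, log_emit)
```

Working in log space avoids numerical underflow on long word streams, which matters when the whole collection is decoded as one sequence.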

12 Analysis of Theme Life Cycles (cont.) The absolute and relative strengths of theme i at time t are defined through an indicator that equals 1 if a word is labeled as theme i and 0 otherwise.
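One way to turn the decoded labels into strengths is sketched below. The slide's formulas were lost, so this assumes the absolute strength is the count of words labeled with theme i at time t and the relative strength is that count normalized across themes at the same time; any smoothing window over neighbouring time points is omitted. `theme_strengths` and the toy labels are hypothetical.

```python
from collections import Counter

def theme_strengths(labels_by_time, themes):
    """labels_by_time: {t: [theme label of each word observed at time t]}.

    Returns (absolute, relative), each of the form {t: {theme: value}}.
    """
    absolute, relative = {}, {}
    for t, labels in labels_by_time.items():
        counts = Counter(labels)
        # Summing the 0/1 indicator over words is the same as counting labels.
        absolute[t] = {i: counts[i] for i in themes}
        total = sum(absolute[t].values())
        relative[t] = {i: (absolute[t][i] / total if total else 0.0) for i in themes}
    return absolute, relative

# Toy labels (assumed), e.g. the output of a Viterbi pass split by time point.
labels_by_time = {1: ["relief", "relief", "science"], 2: ["science"]}
absolute, relative = theme_strengths(labels_by_time, ["relief", "science"])
```

Plotting `absolute[t][i]` or `relative[t][i]` against t for a fixed theme i gives the kind of life-cycle curve the later slides refer to.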

13 Experimental Data Sets News about the Asia Tsunami –Dec. 19, 2004 to Feb. 8, 2005 (50 days) –Downloaded with query “tsunami” The abstracts in the KDD conference proceedings from 1999 to 2004

14 Theme Spans from Tsunami

15 Theme Evolution Graph for Tsunami

16 Theme Life Cycle in CNN Absolute life cycle in CNN data

17 Theme Life Cycle in XINHUA [Figures: Absolute life cycle in XINHUA data; Normalized life cycle in XINHUA data]

18 Theme Spans from KDD

19 Theme Evolution Graph for KDD [Figure: node labels include classification, Web classification, clustering & random variables, typical classification tech.]

20 Theme Life Cycle for KDD [Figure: theme legend: Business, Biology Data, Web Info., Time series, Classification, Association Rule, Clustering]

21 Conclusions We propose methods to discover evolutionary theme patterns and analyze the life cycle of each theme. The proposed methods can generate meaningful temporal theme structures on the two experimental data sets. Our methods are generally applicable to any text stream data. Future work –Hierarchical theme clustering –Temporal theme mining system

