
1 HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades
Xinran He (1), Theodoros Rekatsinas (2), James Foulds (3), Lise Getoor (3) and Yan Liu (1)
07/08/2015
(1) University of Southern California
(2) University of Maryland, College Park
(3) University of California, Santa Cruz

2 Introduction
Diffusion is an important and fundamental phenomenon:
- Viral marketing, detection of rumors, modeling news dynamics, …
Text-based cascades are abundant on a variety of social platforms.
[Figure: example cascade among nodes A to G with posting times t=0, 1, 1.5, 2, 3.5]
He et al. HawkesTopic ICML 2015 01/17

3 Traditional vs Text-based Cascades
Traditional cascades:
- Temporal information
Text-based cascades:
- Temporal information
- Content information
[Figure: the same cascade among nodes A to G (t=0, 1, 1.5, 2, 3.5), shown without and with document content]
Incorporating content information => better model of diffusion
Incorporating temporal information => better model of documents

4 Network Inference
[Figure: documents posted by nodes A to G at times t=0 to t=3.5, and the weighted diffusion network inferred among the nodes]
Network inference focuses on inferring a hidden diffusion network.
Related work:
- NetInf, NetRate [Gomez et al. 11,12], MMHP [Yang and Zha 13], KernelCascades [Du et al. 12]
- TopicCascades [Du et al. 13]

5 Topic Modeling
[Figure: a corpus of documents posted by nodes A to G, decomposed into Topic 1, Topic 2 and Topic 3]
Topic modeling aims to discover the latent thematic topics of a corpus.
Related work:
- LDA [Blei et al. 03], CTM [Blei and Lafferty 06]
- Citation Influence model [Dietz et al. 07], TIR model [Foulds et al. 13]

6 Our Contribution
[Figure: a text-based cascade among nodes A to G (t=0 to t=3.5) feeding both Topic Modeling (Topics 1 to 3) and Network Inference (weighted diffusion network)]
HawkesTopic: a joint model for simultaneous network inference and topic modeling from text-based cascades.

7 HawkesTopic: Intuition
[Figure: timelines of documents posted by users v_1 and v_2]
- Mutually exciting nature: a posting event can trigger future events.
- Content cascades: the content of a document should be similar to that of the document that triggers its publication.

8 Modeling Posting Times
The mutually exciting nature is captured via a Multivariate Hawkes Process (MHP) [Liniger 09]. For an MHP, the intensity process $\lambda_v(t)$ takes the form:
$$\lambda_v(t) = \mu_v + \sum_{e:\, t_e < t} A_{v_e, v}\, f_\Delta(t - t_e)$$
That is, rate = base intensity + influence from previous events, where:
- $\mu_v$: base intensity of node $v$
- $A_{u,v}$: influence strength from $u$ to $v$
- $f_\Delta(\cdot)$: probability density function of the delay distribution
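As a rough illustration of the intensity formula, the sketch below evaluates $\lambda_v(t)$ for a small event history. The slide does not specify the delay density $f_\Delta$; the exponential density used here, the function names, and the toy parameter values are all assumptions made for the example.

```python
import numpy as np

def exp_delay_pdf(dt, rate=1.0):
    """Assumed delay density f_Delta: exponential with the given rate."""
    return rate * np.exp(-rate * dt) if dt > 0 else 0.0

def intensity(v, t, events, mu, A, delay_pdf=exp_delay_pdf):
    """lambda_v(t) = mu[v] + sum over events e with t_e < t of A[v_e, v] * f_Delta(t - t_e)."""
    rate = mu[v]                      # base intensity of node v
    for t_e, v_e in events:           # each event is (posting time, posting node)
        if t_e < t:
            rate += A[v_e, v] * delay_pdf(t - t_e)
    return rate

# Toy setup (hypothetical values): two nodes, A[u, v] is the influence from u to v.
mu = np.array([0.1, 0.2])
A = np.array([[0.0, 0.5],
              [0.3, 0.0]])
events = [(0.0, 0), (1.0, 1)]

lam = intensity(1, 2.0, events, mu, A)
print(lam)
```

Only the event posted by node 0 contributes to node 1's rate here, because `A[1, 1]` is zero in this toy matrix.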

9 Generating Posting Times
[Figure: events of users v_1 and v_2 on a timeline, grouped into Level 0, Level 1 and Level 2]
- Generate events and their posting times in breadth-first order by interpreting the MHP as a clustered Poisson process [Simma 10].
- This provides an explicit parent relationship for the evolution of the content information.
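The breadth-first generation can be sketched as a branching simulation: spontaneous (level 0) events come from a homogeneous Poisson process per node, and each event then spawns children on every node. This is a minimal sketch, not the authors' code; it assumes an exponential delay distribution and a finite time horizon, and the names `simulate_mhp`, `horizon` and `rate` are hypothetical.

```python
import numpy as np
from collections import deque

def simulate_mhp(mu, A, horizon, rate=1.0, seed=0):
    """Simulate a multivariate Hawkes process on [0, horizon) level by level.

    mu[v] is the base intensity of node v; A[u, v] is the expected number of
    events on v triggered by one event on u (f_Delta integrates to 1).
    The exponential delay with the given rate is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_nodes = len(mu)
    events = []          # dicts with time, node, parent index, level
    queue = deque()      # FIFO queue gives breadth-first order

    # Level 0: spontaneous events, homogeneous Poisson with rate mu[v].
    for v in range(n_nodes):
        for t in rng.uniform(0.0, horizon, size=rng.poisson(mu[v] * horizon)):
            events.append({"time": float(t), "node": v, "parent": None, "level": 0})
            queue.append(len(events) - 1)

    # Levels 1, 2, ...: each event triggers Poisson(A[u, v]) children on node v.
    while queue:
        idx = queue.popleft()
        parent = events[idx]
        for v in range(n_nodes):
            n_children = rng.poisson(A[parent["node"], v])
            for delay in rng.exponential(1.0 / rate, size=n_children):
                t = parent["time"] + float(delay)
                if t < horizon:  # drop children that fall outside the window
                    events.append({"time": t, "node": v, "parent": idx,
                                   "level": parent["level"] + 1})
                    queue.append(len(events) - 1)
    return events

events = simulate_mhp(np.array([0.5, 0.5]),
                      np.array([[0.0, 0.4], [0.3, 0.0]]),
                      horizon=10.0)
print(len(events), "events; max level", max((e["level"] for e in events), default=0))
```

Each event records the index of its parent, which is the explicit parent relationship that the document model can then use.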

10 Modeling Documents
[Figure: documents posted by users v_1 and v_2 on a timeline, with user topic priors $\alpha_1, \alpha_2, \ldots$ and topics $\beta_{1:K}$ (Topic 1, Topic 2)]
Step 1: Generate the topics $\beta_{1:K}$: $\beta_k \sim \mathrm{Dir}(\alpha)$
Step 2: For spontaneous events (level = 0): $\eta_e \sim N(\alpha_v, \sigma^2 I)$
Step 3: For triggered events (level > 0): $\eta_e \sim N(\eta_{\mathrm{parent}[e]}, \sigma^2 I)$
Step 4: For each word in each document: $z_{e,n} \sim \mathrm{Discrete}(\pi(\eta_e))$, $x_{e,n} \sim \mathrm{Discrete}(\beta_{z_{e,n}})$
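Steps 1 to 4 can be sketched as a small sampler. The slide does not define $\pi(\cdot)$; this sketch assumes it is the softmax map used in correlated topic models. The function names, the Dirichlet concentration value, and the toy sizes are hypothetical choices for illustration only.

```python
import numpy as np

def softmax(eta):
    """Assumed form of pi(eta): maps topic parameters to topic proportions."""
    z = np.exp(eta - eta.max())
    return z / z.sum()

def generate_documents(parents, nodes, alpha_user, beta, sigma, n_words, seed=0):
    """Sample eta_e and words for each event; parents must precede their children.

    parents[e] is the index of the triggering event, or None for a spontaneous one.
    """
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = beta.shape
    etas, docs = [], []
    for e, parent in enumerate(parents):
        # Step 2 (spontaneous): centre on the user's prior alpha_v.
        # Step 3 (triggered): centre on the parent's topic parameters.
        mean = alpha_user[nodes[e]] if parent is None else etas[parent]
        eta = rng.normal(mean, sigma)            # N(mean, sigma^2 I)
        etas.append(eta)
        # Step 4: a topic assignment z, then a word x, for every word slot.
        z = rng.choice(n_topics, size=n_words, p=softmax(eta))
        docs.append(np.array([rng.choice(vocab_size, p=beta[k]) for k in z]))
    return np.array(etas), docs

rng = np.random.default_rng(1)
K, V = 3, 20
beta = rng.dirichlet(np.full(V, 0.5), size=K)    # Step 1: beta_k ~ Dir(.)
alpha_user = rng.normal(size=(2, K))             # one prior vector per user
parents = [None, 0, 1, None]                     # event 1 is triggered by event 0, etc.
nodes = [0, 1, 0, 1]
etas, docs = generate_documents(parents, nodes, alpha_user, beta, sigma=0.1, n_words=8)
print(etas.shape, [d.tolist() for d in docs])
```

With a small `sigma`, a triggered document's topic parameters stay close to its parent's, which is the content-cascade intuition from the earlier slide.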

11 Inference
Joint variational inference based on a full mean-field approximation:
$$Q(\boldsymbol{\eta}, \boldsymbol{z}, \boldsymbol{P}) = \prod_{e \in E} \left[ q(\eta_e \mid \hat{\eta}_e)\, q(P_e \mid r_e) \prod_{n=1}^{N_e} q(z_{e,n} \mid \phi_{e,n}) \right]$$
-- Laplace approximation for the non-conjugate variable: $\eta_e \sim N(\hat{\eta}_e, \sigma^2 I)$
-- Other variables: $P_e \sim \mathrm{Discrete}(r_e)$, $z_{e,n} \sim \mathrm{Discrete}(\phi_{e,n})$
Update for $q(P_e \mid r_e)$:
$$r_{e,e'} \propto N(\hat{\eta}_e \mid \hat{\eta}_{e'}, \sigma^2 I) \times A_{v_{e'}, v_e} \times f_\Delta(t_e - t_{e'})$$
The three factors are, in order:
- Similarity between document topics
- Influence between users (Hawkes process)
- Proximity of events in time (Hawkes process)
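The parent update can be illustrated by computing the three factors for every earlier event and normalizing. This is a sketch under assumptions: the delay density is taken to be exponential, the normalization runs only over candidate parent events (the slide shows the proportionality but not the normalizer or a "no parent" option), and all names are hypothetical.

```python
import numpy as np

def parent_responsibilities(e, eta_hat, times, nodes, A, sigma, rate=1.0):
    """Return {e_prime: r[e, e_prime]} over earlier events, normalized to sum to 1."""
    n_topics = eta_hat.shape[1]
    log_w = {}
    for ep in range(len(times)):
        dt = times[e] - times[ep]
        influence = A[nodes[ep], nodes[e]]
        if dt <= 0 or influence <= 0:
            continue                                   # not a possible parent
        diff = eta_hat[e] - eta_hat[ep]
        # log N(eta_hat[e] | eta_hat[ep], sigma^2 I): topic similarity
        log_gauss = (-0.5 * diff @ diff / sigma**2
                     - 0.5 * n_topics * np.log(2 * np.pi * sigma**2))
        # log f_Delta(dt) for the assumed exponential delay: time proximity
        log_delay = np.log(rate) - rate * dt
        log_w[ep] = log_gauss + np.log(influence) + log_delay
    if not log_w:
        return {}
    shift = max(log_w.values())                        # for numerical stability
    w = {ep: np.exp(lw - shift) for ep, lw in log_w.items()}
    total = sum(w.values())
    return {ep: val / total for ep, val in w.items()}

# Toy example: event 2 (node 1) could have been triggered by event 0 or event 1 (both node 0).
times = [0.0, 1.0, 2.0]
nodes = [0, 0, 1]
A = np.array([[0.0, 0.5],
              [0.5, 0.0]])
eta_hat = np.array([[2.0, 0.0],
                    [0.0, 2.0],
                    [0.1, 1.9]])
r = parent_responsibilities(2, eta_hat, times, nodes, A, sigma=0.5)
print(r)
```

In this toy case event 1 receives nearly all the responsibility, since it is both closer in time and much closer in topic space to event 2.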

12 Experiments: setting
EventRegistry: "Ebola" news articles
- ~4 months
- ~9k articles, 330 news media sites
- Copying information as ground truth
ArXiv: high-energy physics theory papers
- ~12 years
- Top 50/100/200 researchers
- Citation network as ground truth
Evaluation metrics:
-- Topic modeling: document completion likelihood [Wallach et al. 09]
-- Network inference: AUC against the ground-truth network
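For the network-inference metric, AUC can be read as the probability that a randomly chosen true edge receives a higher inferred weight than a randomly chosen non-edge. The sketch below computes it directly from that definition; excluding self-loops and counting ties as one half are assumptions of this example, and `edge_auc` is a hypothetical name, not part of the authors' evaluation code.

```python
import numpy as np

def edge_auc(scores, truth):
    """AUC of inferred edge weights against a binary ground-truth adjacency matrix."""
    mask = ~np.eye(truth.shape[0], dtype=bool)      # ignore self-loops (assumption)
    s = scores[mask]
    y = truth[mask].astype(bool)
    pos, neg = s[y], s[~y]
    # Compare every true edge with every non-edge; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy ground truth: a directed 3-cycle; the inferred weights rank all true edges first.
truth = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])
scores = np.array([[0.0, 0.9, 0.1],
                   [0.2, 0.0, 0.8],
                   [0.7, 0.3, 0.0]])
auc = edge_auc(scores, truth)
print(auc)  # 1.0: every true edge outscores every non-edge
```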

13 Experiments: algorithms
Method      Description                                                      Topic Modeling  Network Inference
HTM         Our method, with K=50 topics (K=100 for ArXiv with 200 authors)  yes             yes
LDA         Latent Dirichlet Allocation with collapsed Gibbs sampling        yes             no
CTM         Correlated topic modeling with variational inference             yes             no
Hawkes      Hawkes process considering only event posting times              no              yes
Hawkes-LDA  Two-step approach that first infers topics with LDA              no              yes
Hawkes-CTM  Two-step approach that first infers topics with CTM              no              yes

14 Result: EventRegistry
Network inference accuracy (AUC): 10% improvement
             Hawkes  Hawkes-LDA  Hawkes-CTM  HTM
Component 1  0.622   0.669       0.673       0.697
Component 2  0.670   0.704       0.716       0.730
Component 3  0.666, 0.665, 0.700 (only three of the four values survive; column assignment unclear)
Topic modeling accuracy (document completion likelihood):
             LDA     CTM     HTM
Component 1  -42945  -42458  -42325
Component 2  -22558  -22181  -22164
Component 3  -17574, -17571 (only two of the three values survive; column assignment unclear)

15 Result: EventRegistry

16 Result: ArXiv
Network inference accuracy (AUC): 40% improvement
        Hawkes  Hawkes-LDA  Hawkes-CTM  HTM
Top50   0.594   0.656       0.645       0.807
Top100  0.588   0.589       0.614       0.687
Top200  0.618   0.630       0.629       0.659
Topic modeling accuracy (document completion likelihood):
        LDA     CTM     HTM
Top50   -11074  -10769  -10708
Top100  -15711  -15477  -15252
Top200  -27758  -27630  -27443

17 Result: ArXiv

18 Conclusion
The HawkesTopic model unifies the Correlated Topic Model and the Hawkes process. It:
- infers the hidden diffusion network
- discovers the thematic topics of documents
Jointly modeling temporal information and content information in text-based cascades achieves the best results.
Experiments on the ArXiv and EventRegistry datasets:
- EventRegistry: 10% improvement in AUC
- ArXiv: 40% improvement in AUC

19 Thank You Questions?

20 Result: ArXiv Inferred Topics (Appendix)
Author: Andrei Linde
  LDA                                       CTM                                       HTM
  black, hole, holes                        black, holes, entropy                     black, holes, hole
  supersymmetry, supersymmetric, solutions  supersymmetry, supersymmetric, superspace  universe, inflation, may
  universe, cosmological, cosmology         metrics, holonomy, spaces                 supersymmetry, supersymmetric, breaking
Author: Arkady Tseytlin
  LDA                                       CTM                                       HTM
  magnetic, field, conformal                solutions, solution, x                    string, theory, type
  type, iib, theory                         action, effective, background             action, actions, duality
  action, superstring, actions              type, iib, iia                            bound, configurations, states

