
HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades
Xinran He (1), Theodoros Rekatsinas (2), James Foulds (3), Lise Getoor (3) and Yan Liu (1)
(1) University of Southern California, (2) University of Maryland, College Park, (3) University of California, Santa Cruz
07/08/2015

Introduction
- Diffusion is an important and fundamental phenomenon: viral marketing, detection of rumors, modeling news dynamics, ...
- Abundant text-based cascades in a variety of social platforms
[Figure: example cascade over nodes A-G with posting times t=0, t=1, t=1.5, t=2, t=3.5]
He et al. HawkesTopic ICML 2015

Traditional vs Text-based Cascades
- Traditional cascades: temporal information
- Text-based cascades: temporal information and content information
- Incorporate content information => better model of diffusion
- Incorporate temporal information => better model of documents
[Figure: the same cascade over nodes A-G (posting times t=0, t=1, t=1.5, t=2, t=3.5) shown once without and once with document content]

Network Inference
- Network inference focuses on inferring a hidden diffusion network.
- Related work:
  - NetInf, NetRate [Gomez et al. 11,12], MMHP [Yang and Zha 13], KernelCascades [Du et al. 12]
  - TopicCascades [Du et al. 13]
[Figure: a text-based cascade over nodes A-G (posting times t=0 to t=3.5) and the inferred weighted diffusion network]

Topic Modeling
- Topic modeling aims to discover the latent thematic topics.
- Related work:
  - LDA [Blei et al. 03], CTM [Blei and Lafferty 06]
  - Citation Influence model [Dietz et al. 07], TIR model [Foulds et al. 13]
[Figure: a corpus of documents posted by nodes A-G, grouped by word into Topic 1, Topic 2 and Topic 3]

Our Contribution
- HawkesTopic: joint model for simultaneous Network Inference and Topic Modeling from text-based cascades
[Figure: a text-based cascade over nodes A-G feeding both Topic Modeling (Topic 1, Topic 2, Topic 3) and Network Inference (weighted diffusion network)]

HawkesTopic: Intuition
- Mutually exciting nature: a posting event can trigger future events.
- Content cascades: the content of a document should be similar to that of the document that triggers its publication.
[Figure: documents posted by users $v_1$ and $v_2$ along a timeline $t$]

Modeling Posting Times
- The mutually exciting nature is captured via a Multivariate Hawkes Process (MHP) [Liniger 09].
- For the MHP, the intensity process $\lambda_v(t)$ takes the form (rate = base intensity + influence from previous events):
  $\lambda_v(t) = \mu_v + \sum_{e:\, t_e < t} A_{v_e, v}\, f_\Delta(t - t_e)$
- $A_{u,v}$: influence strength from $u$ to $v$
- $f_\Delta(\cdot)$: probability density function of the delay distribution
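The intensity above can be evaluated directly from a list of past events. The following is a minimal sketch, not the authors' implementation: the exponential delay density, the function names and the toy values of `mu`, `A` and `events` are all assumptions made for illustration.

```python
import math

def exp_delay_pdf(dt, rate=1.0):
    """Assumed delay density f_Delta: exponential, zero for non-positive delays."""
    return rate * math.exp(-rate * dt) if dt > 0 else 0.0

def intensity(v, t, mu, A, events, delay_pdf=exp_delay_pdf):
    """lambda_v(t) = mu[v] + sum over events e with t_e < t of A[v_e][v] * f_Delta(t - t_e).

    events is a list of (time, node) pairs.
    """
    total = mu[v]
    for t_e, v_e in events:
        if t_e < t:
            total += A[v_e][v] * delay_pdf(t - t_e)
    return total

# Toy example (hypothetical numbers): two nodes, one earlier event at each.
mu = [0.1, 0.2]                # base intensities mu_v
A = [[0.0, 0.5], [0.3, 0.0]]   # A[u][v]: influence strength from u to v
events = [(0.0, 0), (1.0, 1)]  # (t_e, v_e)
lam = intensity(1, 2.0, mu, A, events)
```

Here node 1 at time 2.0 receives its base rate plus the contribution of node 0's event at time 0.0; its own earlier event contributes nothing because `A[1][1]` is zero in this toy setup.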

Generating Posting Times
- Generate events and their posting times in breadth-first order by interpreting the MHP as a clustered Poisson process [Simma 10].
- This provides an explicit parent relationship for the evolution of the content information.
[Figure: events of users $v_1$ and $v_2$ on a timeline $t$, grouped into Level 0, Level 1 and Level 2]
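One way to read the breadth-first generation is sketched below. It is a sketch under stated assumptions rather than the procedure from the talk: level-0 events come from a homogeneous Poisson process with rate `mu[v]`, each event at node `u` spawns a Poisson number of children at node `v` with mean `A[u][v]`, and delays are exponential. The function names, the time horizon and the toy parameters are hypothetical.

```python
import random
from collections import deque

def sample_poisson(rng, lam):
    """Count arrivals of a unit-rate Poisson process in [0, lam]."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count

def generate_cascade(mu, A, horizon, delay_rate=1.0, seed=0):
    """Return events as dicts with time, node, parent index and level."""
    rng = random.Random(seed)
    n = len(mu)
    events, queue = [], deque()
    # Level 0: spontaneous events of each node.
    for v in range(n):
        if mu[v] <= 0:
            continue
        t = rng.expovariate(mu[v])
        while t < horizon:
            events.append({"time": t, "node": v, "parent": None, "level": 0})
            queue.append(len(events) - 1)
            t += rng.expovariate(mu[v])
    # Levels 1, 2, ...: a FIFO queue processes one level before the next.
    while queue:
        idx = queue.popleft()
        parent = events[idx]
        for v in range(n):
            for _ in range(sample_poisson(rng, A[parent["node"]][v])):
                t = parent["time"] + rng.expovariate(delay_rate)
                if t < horizon:
                    events.append({"time": t, "node": v, "parent": idx,
                                   "level": parent["level"] + 1})
                    queue.append(len(events) - 1)
    return events

cascade = generate_cascade(mu=[0.5, 0.5], A=[[0.0, 0.4], [0.3, 0.0]], horizon=10.0)
```

Each generated event records the index of its parent, which is the explicit parent relationship the document model can then use.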

Modeling Documents
- Step 1: Generate the topics $\beta_{1:K}$: $\beta_k \sim \mathrm{Dir}(\alpha)$
- Step 2: For spontaneous events (level = 0): $\eta_e \sim N(\alpha_v, \sigma^2 I)$
- Step 3: For triggered events (level > 0): $\eta_e \sim N(\eta_{\mathrm{parent}[e]}, \sigma^2 I)$
- Step 4: For each word in each document: $z_{e,n} \sim \mathrm{Discrete}(\pi(\eta_e))$, $x_{e,n} \sim \mathrm{Discrete}(\beta_{z_{e,n}})$
[Figure: users $v_1$ and $v_2$ with priors $\alpha_1$, $\alpha_2$, topics $\beta_{1:K}$ (Topic 1, Topic 2) and the documents posted along a timeline $t$]
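The four steps can be simulated with the standard library alone. This is an illustrative sketch: the slide does not define $\pi(\cdot)$, so a softmax is assumed here, and the helper names, the symmetric Dirichlet parameter and the toy events are hypothetical.

```python
import math
import random

def softmax(eta):
    """Assumed form of pi(eta): map a real vector to topic proportions."""
    m = max(eta)
    exps = [math.exp(x - m) for x in eta]
    s = sum(exps)
    return [x / s for x in exps]

def sample_dirichlet(rng, alpha, size):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    s = sum(draws)
    return [d / s for d in draws]

def generate_documents(events, user_prior, beta, sigma, n_words, rng):
    """events: (node, parent_index or None) pairs, parents listed before children."""
    etas, docs = [], []
    for node, parent in events:
        # Step 2 (spontaneous) or Step 3 (triggered): choose the Gaussian mean.
        mean = user_prior[node] if parent is None else etas[parent]
        eta = [rng.gauss(m, sigma) for m in mean]
        pi = softmax(eta)
        words = []
        for _ in range(n_words):
            # Step 4: draw a topic, then a word from that topic.
            z = rng.choices(range(len(beta)), weights=pi)[0]
            words.append(rng.choices(range(len(beta[z])), weights=beta[z])[0])
        etas.append(eta)
        docs.append(words)
    return etas, docs

rng = random.Random(0)
K, V = 3, 6                                                # topics, vocabulary size
beta = [sample_dirichlet(rng, 0.5, V) for _ in range(K)]   # Step 1
user_prior = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]            # alpha_v for two users
events = [(0, None), (1, 0), (0, 1)]                       # (node, parent index)
etas, docs = generate_documents(events, user_prior, beta, 0.5, 8, rng)
```

Because a triggered event draws its topic vector around its parent's, documents further down a cascade drift away from the original content only gradually.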

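Inference
- Joint variational inference based on a full mean-field approximation:
  $Q(\boldsymbol{\eta}, \boldsymbol{z}, \boldsymbol{P}) = \prod_{e \in E} \left[ q(\eta_e \mid \hat{\eta}_e)\, q(P_e \mid r_e) \prod_{n=1}^{N_e} q(z_{e,n} \mid \phi_{e,n}) \right]$
  - Laplace approximation for the non-conjugate variable: $\eta_e \sim N(\hat{\eta}_e, \sigma^2 I)$
  - Other variables: $P_e \sim \mathrm{Discrete}(r_e)$, $z_{e,n} \sim \mathrm{Discrete}(\phi_{e,n})$
- Update for $q(P_e \mid r_e)$:
  $r_{e,e'} \propto N(\hat{\eta}_e \mid \hat{\eta}_{e'}, \sigma^2 I) \times A_{v_{e'}, v_e} \times f_\Delta(t_e - t_{e'})$
  - $N(\hat{\eta}_e \mid \hat{\eta}_{e'}, \sigma^2 I)$: similarity between document topics
  - $A_{v_{e'}, v_e}$: influence between users (Hawkes process)
  - $f_\Delta(t_e - t_{e'})$: proximity of events in time (Hawkes process)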
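The parent update multiplies three factors and normalizes over candidate parents. The sketch below computes it in log space for numerical stability. It makes several assumptions: an exponential delay density, candidates restricted to strictly earlier events, and hypothetical names and toy values. The slide only gives the factors for a candidate triggering event, so the spontaneous case is left out here.

```python
import math

def log_gauss_iso(x, mean, sigma):
    """Log density of an isotropic Gaussian N(x | mean, sigma^2 I)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * sq / sigma ** 2 - 0.5 * len(x) * math.log(2 * math.pi * sigma ** 2)

def parent_responsibilities(e, events, eta_hat, A, sigma, delay_rate=1.0):
    """Return {e': r_{e,e'}} over earlier events e', normalized to sum to 1.

    events[i] = (time, node); eta_hat[i] is the topic vector of event i.
    """
    t_e, v_e = events[e]
    log_w = {}
    for ep, (t_p, v_p) in enumerate(events):
        dt = t_e - t_p
        if dt <= 0 or A[v_p][v_e] <= 0:
            continue
        log_w[ep] = (log_gauss_iso(eta_hat[e], eta_hat[ep], sigma)  # topic similarity
                     + math.log(A[v_p][v_e])                        # user influence
                     + math.log(delay_rate) - delay_rate * dt)      # time proximity
    if not log_w:
        return {}
    m = max(log_w.values())
    w = {k: math.exp(val - m) for k, val in log_w.items()}
    z = sum(w.values())
    return {k: val / z for k, val in w.items()}

events = [(0.0, 0), (1.0, 1), (2.0, 0)]
eta_hat = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]]
A = [[0.1, 0.5], [0.5, 0.1]]
r = parent_responsibilities(2, events, eta_hat, A, sigma=1.0)
```

In this toy case event 1 is closer to event 2 in topic, influence and time, so it receives most of the responsibility.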

Experiments: setting
- "Ebola" news articles
  - ~4 months
  - ~9k articles, 330 news media sites
  - Copying information as ground truth
- High-energy physics theory papers
  - ~12 years
  - Top 50/100/200 researchers
  - Citation network as ground truth
- Evaluation metrics:
  - Topic modeling: document completion likelihood [Wallach et al. 09]
  - Network inference: AUC against the ground truth network
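The slide names AUC against the ground-truth network as the network inference metric but does not spell out the computation. A common reading, assumed here, treats every candidate edge as a scored example and measures how often a true edge outscores a non-edge. The function name and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Inferred influence strengths for four candidate edges and whether each
# edge exists in the ground-truth network (toy values).
inferred = [0.6, 0.4, 0.3, 0.2]
is_true_edge = [1, 0, 1, 0]
score = auc(inferred, is_true_edge)  # 3 of the 4 positive/negative pairs are ordered correctly
```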

Experiments: algorithms
Algorithm    Topic Modeling  Network Inference  Description
HTM          x               x                  Our method with topic number K=50, and K=100 for ArXiv with 200 authors
LDA          x                                  Latent Dirichlet Allocation with collapsed Gibbs sampling
CTM          x                                  Correlated topic modeling with variational inference
Hawkes                       x                  Hawkes process considering only event posting time
Hawkes-LDA                   x                  Two-step approach that first infers topics with LDA
Hawkes-CTM                   x                  Two-step approach that first infers topics with CTM

Result: EventRegistry
Network Inference accuracy: 10% improvement
             Hawkes  Hawkes-LDA  Hawkes-CTM  HTM
Component 1  0.622   0.669       0.673       0.697
Component 2  0.670   0.704       0.716       0.730
Component 3  0.666, 0.665, 0.700 (only three of the four values survive in the transcript)
Topic modeling accuracy:
             LDA     CTM     HTM
Component 1  -42945  -42458  -42325
Component 2  -22558  -22181  -22164
Component 3  -17574, -17571 (only two of the three values survive in the transcript)

Result: EventRegistry
[Figure]

Result: ArXiv
Network Inference accuracy: 40% improvement
        Hawkes  Hawkes-LDA  Hawkes-CTM  HTM
Top50   0.594   0.656       0.645       0.807
Top100  0.588   0.589       0.614       0.687
Top200  0.618   0.630       0.629       0.659
Topic modeling accuracy:
        LDA     CTM     HTM
Top50   -11074  -10769  -10708
Top100  -15711  -15477  -15252
Top200  -27758  -27630  -27443

Result: ArXiv
[Figure]

Conclusion
- HawkesTopic model unifies Correlated Topic Model and Hawkes process:
  - infers hidden diffusion network
  - discovers thematic topics of documents
- Joint model of temporal information and content information in text-based cascades gets the best result
- Experiments on ArXiv and EventRegistry datasets:
  - EventRegistry: 10% improvement in AUC
  - ArXiv: 40% improvement in AUC

Thank You Questions?

Appendix. Result: ArXiv, Inferred Topics
Author: Andrei Linde
  LDA                                       CTM                                       HTM
  black, hole, holes                        black, holes, entropy                     black, holes, hole
  supersymmetry, supersymmetric, solutions  supersymmetry, supersymmetric, superspace universe, inflation, may
  universe, cosmological, cosmology         metrics, holonomy, spaces                 supersymmetry, supersymmetric, breaking
Author: Arkady Tseytlin
  LDA                                       CTM                                       HTM
  magnetic, field, conformal                solutions, solution, x                    string, theory, type
  type, iib, theory                         action, effective, background             action, actions, duality
  action, superstring, actions              type, iib, iia                            bound, configurations, states