Anant Pradhan PET: A Statistical Model for Popular Events Tracking in Social Communities Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han (UIUC)

Presentation transcript:

Introduction
Challenge: tracking the evolution of a popular topic.

Introduction
Observing and tracking:
– Popular events
– Topics that evolve over time
Existing approaches focus on:
– Burstiness
– Evolution of networks
They ignore the interplay between textual topics and network structures.

Introduction
The authors propose a novel statistical method (PET) that models the popularity of events over time, taking into account:
– Burstiness of user interest
– Information diffusion on the network structure
– Evolution of textual topics

Gibbs Random Field, used to model:
– Influence of historical status
– Dependency relationships in the graph
Topic model:
– Designed to explain the generation of text data
The two components interact by regularizing each other.

Problem Definition
– Network stream: $G = \{G_1, G_2, \dots, G_T\}$
– Snapshot of the network: $G_k = \{V_k, E_k\}$, where $V_k$ is the set of vertices and $E_k$ is the set of edges
– Document stream: $D = \{D_1, D_2, \dots, D_T\}$
– Topic: $\theta$
– Event: $\Theta^E = \{\theta^E_0, \theta^E_1, \theta^E_2, \dots, \theta^E_T\}$
– Interest: $H_k = \{h_k(1), h_k(2), \dots, h_k(N)\}$
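The streams defined above can be held in a small container. The following is only a sketch: the class names, the choice to key documents and interest values by an integer user id, and the representation of a topic as a word-to-probability dictionary are all assumptions made for illustration, not structures taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One network snapshot G_k = {V_k, E_k}."""
    vertices: set[int]
    edges: set[tuple[int, int]]


@dataclass
class EventStreams:
    """Observed and latent streams for one event, indexed by time step k.

    Layout is hypothetical: documents and interests are keyed by user id,
    and each topic is a word -> probability mapping.
    """
    networks: list[Snapshot]                 # G = {G_1, ..., G_T}, observed
    documents: list[dict[int, str]]          # D = {D_1, ..., D_T}, observed
    interests: list[dict[int, float]] = field(default_factory=list)  # H_k, latent
    topics: list[dict[str, float]] = field(default_factory=list)     # theta_k, latent

    def horizon(self) -> int:
        """Return T, checking that both observed streams cover the same steps."""
        if len(self.networks) != len(self.documents):
            raise ValueError("network and document streams differ in length")
        return len(self.networks)
```

The two latent lists start empty because they are what inference is supposed to fill in, one entry per time step.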

Problem Definition
Event-related information in a social community:
– An observed stream of network structures
– An observed stream of text documents
– A latent stream of topics about the event
– A latent stream of interests

The General Model
The task is cast as the inference of the current $H_k$ and $\Theta_k$: $P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})$
Assumption 1: Given $G_k$ and $H_{k-1}$, the current interest status $H_k$ is independent of the document collection $D_k$.
Assumption 2: Given $H_k$ and $D_k$, the current topic model $\Theta_k$ is independent of the network structure $G_k$ and the previous interest status $H_{k-1}$.

The General Model
From the assumptions:
$P(H_k, \Theta_k \mid G_k, D_k, H_{k-1}) = P(H_k \mid G_k, H_{k-1}) \cdot P(\Theta_k \mid H_k, D_k)$
The first factor is the interest model; the second factor is the topic model.
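The factorization can be checked step by step. The derivation below uses only the chain rule and reads Assumptions 1 and 2 as conditional independence statements (each holding given the variables that remain on the right of the bar), which is the reading the factorization requires.

```latex
\begin{align*}
P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})
  &= P(H_k \mid G_k, D_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(chain rule)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(Assumption 1)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, D_k)
     && \text{(Assumption 2)}
\end{align*}
```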

The Interest Model
– Modelled as a Gibbs Random Field on the network $G_k$
– Uses specially designed potential functions
– Uses a weighting scheme motivated by real-world networks
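To make the Gibbs Random Field idea concrete, here is a minimal sketch in which the probability of an interest assignment is proportional to exp(-energy). The slides do not give the potential functions, so the quadratic node and edge potentials below are stand-ins chosen for illustration, not the paper's specially designed ones. The default weights reuse the λ_T = 1 and λ_A = 3 values from the experimental setup.

```python
import math


def energy(h, h_prev, edges, lam_t=1.0, lam_a=3.0):
    """Energy U(h) of an interest assignment; P(h) is proportional to exp(-U(h)).

    Hypothetical potentials (NOT the paper's):
      node potential  lam_t * (h[i] - h_prev[i])**2  pulls each user toward history
      edge potential  lam_a * (h[i] - h[j])**2       pulls neighbors together
    """
    node_term = sum(lam_t * (h[i] - h_prev[i]) ** 2 for i in h)
    edge_term = sum(lam_a * (h[i] - h[j]) ** 2 for i, j in edges)
    return node_term + edge_term


def unnormalized_prob(h, h_prev, edges, lam_t=1.0, lam_a=3.0):
    """Gibbs weight exp(-U(h)); the normalizing constant is left out."""
    return math.exp(-energy(h, h_prev, edges, lam_t, lam_a))


# Two connected users: user 0 was interested at time k-1, user 1 was not.
h_prev = {0: 1.0, 1: 0.0}
edges = [(0, 1)]
unchanged = {0: 1.0, 1: 0.0}   # keeps history, neighbors disagree
diffused = {0: 0.5, 1: 0.5}    # interest spreads along the edge
```

With these stand-in potentials the diffused assignment has lower energy than the unchanged one, so it receives more probability mass: the structural weight outweighs the historical weight when λ_A > λ_T.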

The Topic Model
– Models historical interest status and relationships on the network.
– Allows the topics and popularity of the events to mutually influence each other over time.
$P(\Theta_k \mid H_k, D_k) \propto P(D_k \mid H_k, \Theta_k) \, P(\Theta_k \mid H_k)$
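The posterior above has the usual "likelihood times prior" shape. The sketch below shows one simple way such a posterior can be maximized, under assumptions that are not from the slides: the topic is a single word distribution, each document's word counts are weighted by its author's interest level, and the prior is a Dirichlet centered on the previous topic with strength `mu` (a simplification, since the slide's prior is conditioned on $H_k$). The function name and argument layout are hypothetical.

```python
from collections import Counter


def map_topic(docs, weights, prev_topic, mu=1.0):
    """MAP word distribution under a Dirichlet prior centered on prev_topic.

    docs:       list of token lists, one per user (hypothetical layout)
    weights:    interest level per document, used as a soft count weight
    prev_topic: dict word -> probability for time k-1, assumed to sum to 1
    mu:         prior strength, i.e. pseudo-count mass given to prev_topic (> 0)
    """
    counts = Counter()
    for tokens, w in zip(docs, weights):
        for tok in tokens:
            counts[tok] += w
    total = sum(counts.values())
    vocab = set(counts) | set(prev_topic)
    # Weighted counts plus prior pseudo-counts, renormalized to sum to 1.
    return {
        word: (counts[word] + mu * prev_topic.get(word, 0.0)) / (total + mu)
        for word in vocab
    }


topic = map_topic(
    docs=[["avatar", "movie"], ["golf"]],
    weights=[1.0, 0.0],            # the second user shows no interest
    prev_topic={"avatar": 1.0},
    mu=1.0,
)
```

A larger `mu` keeps the estimate closer to the previous topic, which is one way a topic can evolve smoothly rather than jump between time steps.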

Connection to Existing Models
Two existing models are special cases of PET under certain conditions:
– The state automaton model: when the network effect is omitted
– The contagion model: when the topic effect is omitted

Complexity Analysis
– PLSA (Probabilistic Latent Semantic Analysis): $O((N + M)mt)$
– PET: $O(NMmT)$
Here there are $N$ documents involving $t$ topics with $M$ words, $m$ rounds, and time $T$. The slide calls this cost reasonable.
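To get a feel for how the two bounds compare, the expressions can be evaluated directly. This is rough arithmetic only: big-O bounds hide constant factors, and the sizes plugged in below are made-up values, not figures from the experiments.

```python
def plsa_cost(n_docs, n_words, rounds, n_topics):
    """Leading term of the PLSA bound, (N + M) * m * t."""
    return (n_docs + n_words) * rounds * n_topics


def pet_cost(n_docs, n_words, rounds, n_steps):
    """Leading term of the PET bound, N * M * m * T."""
    return n_docs * n_words * rounds * n_steps


# Hypothetical sizes, chosen only to illustrate the growth rates.
n_docs, n_words, rounds, n_topics, n_steps = 1000, 5000, 50, 10, 16

ratio = pet_cost(n_docs, n_words, rounds, n_steps) / plsa_cost(
    n_docs, n_words, rounds, n_topics
)
```

The PET term multiplies documents by words where the PLSA term adds them, so the gap widens as either grows.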

Experiments
– JonK: the state automaton model; first baseline.
– Cont: the contagion model; second baseline.
– PET-: PET minus network structures.
– BOM: box office earnings; gold standard for movie-related events.
– GInt: Google Insights for Search; gold standard for news-related events.

Experiments
Twitter data set:
– 5000 users
– 1,438,826 tweets
– From Oct 2009 to Jan 2010
– Events: 2 movies (Avatar, Twilight) and 2 news events (Tiger Woods affair, Copenhagen Climate Conference)

Experiments
Setup:
– $\lambda_T$: interest model; weight for historical information. Set to $\lambda_T = 1$.
– $\lambda_A$: interest model; weight for structural information. Set to $\lambda_A = 3$.
– $\mu_E$: topic model parameter. Set to $\mu_E = 1$.

Result Analysis
– PET has the best performance.
– Cont has the worst performance.
– JonK generally performs well but is less accurate than PET.

Network Diffusion Analysis
– Cont cannot distinguish between interest levels.
– Both PET and PET- are able to catch the rising trend of popularity.
– PET is still superior.

Events Analysis on DBLP
For popular events, PET generates:
– More accurate trends
– Smoother diffusion
– Meaningful content evolution

Future Work
– Apply this model to track the evolution of ideas and scientific innovation.
– Build a real-time event search system.

Conclusion
– A novel approach.
– Experimental evidence is convincing.
– Complexity might be a cause for concern.

Thank you. Questions?