A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs. Qiaozhu Mei, Chao Liu, Hang Su, and ChengXiang Zhai. University of Illinois.

Presentation transcript:

1 A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs
Qiaozhu Mei†, Chao Liu†, Hang Su‡, and ChengXiang Zhai†
†: University of Illinois at Urbana-Champaign
‡: Vanderbilt University

2 Weblog as an emerging new data…

3 An Example of a Weblog Article
[Figure: a weblog article annotated with its time stamp, location info, and blog contents]

4 Characteristics of Weblogs
A weblog article is:
  – Highly personal
  – With opinions
  – With mixed topics
  – Associated with time & location
  – Interlinking & forming communities
  – An immediate response to events

5 Existing Work on Weblog Analysis
Interlinking and Community Analysis
  – Identifying communities
  – Monitoring the evolution and bursting of communities
  – E.g., [Kumar et al. 2003]
  [Figure: # of nodes in communities; # of communities]
Content Analysis
  – Blog level topic analysis
  – Information diffusion through blogspace
  – Use topic bursting to predict sales spikes
  – E.g., [Gruhl et al. 2005]
  [Figure: sales rank; blog mentions]

6 How to Perform Spatiotemporal Theme Mining?
Given a collection of Weblog articles about a topic with time and location information:
  – Discover multiple themes (i.e., subtopics) being discussed in these articles
  – For a given location, discover how each theme evolves over time (generate a theme life cycle)
  – For a given time, reveal how each theme spreads over locations (generate a theme snapshot)
  – Compare theme life cycles in different locations
  – Compare theme snapshots in different time periods
  – …

7 Spatiotemporal Theme Patterns
[Figure: a theme snapshot over locations (discussion about Government Response in articles about Hurricane Katrina) and theme life cycles, plotted as strength over time for the United States, China, and Canada (discussion about Release of iPod Nano in articles about iPod Nano). Date label: 09/20/05 – 09/26/05]

8 Applications of Spatiotemporal Theme Mining
Help answer questions like:
  – Which country responded first to the release of iPod Nano? China, UK, or Canada?
  – Do people in different states (e.g., Illinois vs. Texas) respond differently or similarly to the increase in gas prices during Hurricane Katrina?
Potentially useful for:
  – Summarizing search results
  – Monitoring public opinions
  – Business intelligence
  – …

9 Challenges in Spatiotemporal Theme Mining
How to represent a theme?
How to model the themes in a collection?
How to model their dependency on time and location?
How to compute the theme life cycles and theme snapshots?
All these must be done in an unsupervised way…

10 Our Solution: Use a Probabilistic Spatiotemporal Theme Model
Each theme is represented as a multinomial distribution over the vocabulary (a language model).
Consider the collection as a sample from a mixture of these theme models.
Fit the model to the data and estimate the parameters.
Spatiotemporal theme patterns can then be computed from the estimated model parameters.

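11 Probabilistic Spatiotemporal Theme Model
[Diagram: generating each word of a document d with Time = t and Location = l]
Themes θ_1, θ_2, …, θ_k and the background θ_B are word distributions. Examples on the slide (some probabilities are missing in the transcript):
  – price 0.3, oil
  – donate 0.1, relief 0.05, help
  – city 0.2, new 0.1, orleans
  – Background θ_B: is 0.05, the 0.04, a
Choose a theme θ_i, then draw a word from θ_i (e.g., oil, donate, city, the).
Probability of choosing theme θ_i = (1 − λ_TL) P(θ_i|d) + λ_TL P(θ_i|t, l)
λ_B = weight on the background theme
λ_TL = weight on spatiotemporal theme distribution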

12 The Generation Process
A document d of location l and time t is generated, word by word, as follows:
  – First, decide whether to use the background theme θ_B. With probability λ_B, we'll use the background theme and draw a word w from p(w|θ_B).
  – If the background theme is not to be used, we'll decide how to choose a topic theme. With probability λ_TL, we'll sample a theme using the shared spatiotemporal distribution p(θ|t,l). With probability 1 − λ_TL, we'll sample a theme using p(θ|d).
  – Draw a word w from the selected theme distribution p(w|θ_i).
Parameters:
  – {p(w|θ_B), p(w|θ_i), p(θ|t,l), p(θ|d)} (will be estimated)
  – λ_B = background noise; λ_TL = weight on spatiotemporal modeling (will be manually set)
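The generation process above can be sketched as a small sampler. This is only an illustration under stated assumptions: the function name `generate_document`, the dict/list data layout, and the toy distributions in the usage note are hypothetical and do not come from the slides.

```python
import random

def generate_document(n_words, p_w_background, p_w_theme,
                      p_theme_tl, p_theme_d, lambda_b, lambda_tl, rng=None):
    """Sample n_words words for one document, word by word.

    p_w_background: dict word -> p(w | theta_B)
    p_w_theme:      list of dicts, one per theme, word -> p(w | theta_i)
    p_theme_tl:     list of p(theta_i | t, l) for the document's time and location
    p_theme_d:      list of p(theta_i | d) for this document
    (All names and the data layout are illustrative assumptions.)
    """
    rng = rng or random.Random()

    def draw(dist):
        # Draw one key of `dist` with probability proportional to its value.
        keys = list(dist)
        return rng.choices(keys, weights=[dist[key] for key in keys])[0]

    words = []
    for _ in range(n_words):
        # With probability lambda_b, use the background theme.
        if rng.random() < lambda_b:
            words.append(draw(p_w_background))
            continue
        # Otherwise choose how to pick a topic theme: with probability lambda_tl
        # use the shared spatiotemporal distribution, else the document's own.
        coverage = p_theme_tl if rng.random() < lambda_tl else p_theme_d
        i = rng.choices(range(len(coverage)), weights=coverage)[0]
        # Draw the word from the selected theme.
        words.append(draw(p_w_theme[i]))
    return words
```

For instance, `generate_document(10, {"the": 1.0}, [{"oil": 1.0}, {"city": 1.0}], [0.7, 0.3], [0.2, 0.8], 0.3, 0.5)` returns ten words drawn from "the", "oil", and "city".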

13 The Likelihood Function
Annotations on the terms of the likelihood formula:
  – Count of word w in document d
  – Generating w using the background theme
  – Generating w using a topic theme
  – Choosing a topic theme according to the document
  – Choosing a topic theme according to the spatiotemporal context
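The formula itself is not present in this transcript. A reconstruction that follows from the generation process on slide 12 and matches the five annotations is given below; the symbols c(w,d) for the word count, C for the collection, V for the vocabulary, and t_d, l_d for the time and location of document d are notation assumed here, not taken from the slide.

```latex
\log p(C) = \sum_{d \in C} \sum_{w \in V} c(w,d) \,
  \log \Big[ \lambda_B \, p(w \mid \theta_B)
  + (1 - \lambda_B) \sum_{j=1}^{k} p(w \mid \theta_j)
    \big( (1 - \lambda_{TL}) \, p(\theta_j \mid d)
        + \lambda_{TL} \, p(\theta_j \mid t_d, l_d) \big) \Big]
```

Reading the terms in order: c(w,d) is the count of word w in document d; the first term inside the logarithm generates w using the background theme; the sum over j generates w using a topic theme, chosen either according to the document or according to the spatiotemporal context.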

14 Parameter Estimation
Use the maximum likelihood estimator.
Use the Expectation-Maximization (EM) algorithm.
p(w|θ_B) is set to the collection word probability.
[Formulas: E step; M step]
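The E-step and M-step formulas are not present in this transcript. The sketch below is one standard EM derivation for the mixture described on slide 12, not necessarily the authors' exact updates. The function name `fit_theme_model`, the dense count matrix, the integer context index `ctx` (one id per time and location pair), and the default values of `lambda_b` and `lambda_tl` are all assumptions made for illustration.

```python
import numpy as np

def fit_theme_model(counts, ctx, k, lambda_b=0.5, lambda_tl=0.5, n_iter=50, seed=0):
    """EM sketch for the spatiotemporal theme mixture (illustrative only).

    counts: (D, V) array, counts[d, w] = count of word w in document d
    ctx:    (D,) int array, ctx[d] = id of the (time, location) pair of document d
    Returns phi (k, V) = p(w|theta_j), pi_d (D, k) = p(theta_j|d),
            pi_c (n_ctx, k) = p(theta_j|t,l).
    """
    counts = np.asarray(counts, dtype=float)
    ctx = np.asarray(ctx)
    n_docs, n_words = counts.shape
    n_ctx = int(ctx.max()) + 1
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero

    bg = counts.sum(axis=0) / counts.sum()          # p(w|theta_B): collection word probability, kept fixed
    phi = rng.dirichlet(np.ones(n_words), size=k)   # random initialisation
    pi_d = rng.dirichlet(np.ones(k), size=n_docs)
    pi_c = rng.dirichlet(np.ones(k), size=n_ctx)

    for _ in range(n_iter):
        # E step: posteriors of the hidden choices for each (document, word).
        from_ctx = lambda_tl * pi_c[ctx]                   # (D, k)
        mix = (1 - lambda_tl) * pi_d + from_ctx            # (D, k) theme coverage
        joint = mix[:, :, None] * phi[None, :, :]          # (D, k, V)
        theme_w = joint.sum(axis=1)                        # (D, V)
        p_theme = joint / (theme_w[:, None, :] + eps)      # p(theme j | d, w, not background)
        p_bg = lambda_b * bg / (lambda_b * bg + (1 - lambda_b) * theme_w + eps)
        p_tl = from_ctx / (mix + eps)                      # p(theme chosen via (t, l) | d, j)

        # M step: re-estimate parameters from expected counts.
        n = (counts * (1 - p_bg))[:, None, :] * p_theme    # (D, k, V)
        n_dj = n.sum(axis=2)                               # (D, k)
        pi_d = n_dj * (1 - p_tl)
        pi_d /= pi_d.sum(axis=1, keepdims=True) + eps
        pi_c = np.zeros((n_ctx, k))
        np.add.at(pi_c, ctx, n_dj * p_tl)                  # pool documents sharing a context
        pi_c /= pi_c.sum(axis=1, keepdims=True) + eps
        phi = n.sum(axis=0)
        phi /= phi.sum(axis=1, keepdims=True) + eps
    return phi, pi_d, pi_c
```

The dense (D, k, V) array keeps the sketch short but only suits small collections; a realistic implementation would use sparse counts.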

15 Probabilistic Analysis of Spatiotemporal Themes
Once the parameters are estimated, we can easily perform probabilistic analysis of spatiotemporal themes:
  – Computing theme life cycles given location
  – Computing theme snapshots given time
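The slide does not give the formulas for these two computations. One plausible reading, assumed here, applies Bayes' rule to the estimated p(θ|t,l) together with a prior p(t,l) over time and location pairs (for example, the share of the collection's words in each pair). The function name, the array layout, and the choice of p(l|θ,t) as the snapshot quantity are assumptions of this sketch.

```python
import numpy as np

def life_cycles_and_snapshots(p_theme_tl, p_tl):
    """Bayes-rule sketch (assumed formulas, not taken from the slides).

    p_theme_tl[t, l, j] = p(theta_j | t, l), the estimated theme coverage
    p_tl[t, l]          = p(t, l), a prior over (time, location) pairs
    Returns:
      life_cycle[t, l, j] = p(t | theta_j, l): strength of theme j over time at location l
      snapshot[t, l, j]   = p(l | theta_j, t): spread of theme j over locations at time t
    Assumes every normalising sum is positive.
    """
    joint = p_theme_tl * p_tl[:, :, None]                   # p(theta_j, t, l)
    life_cycle = joint / joint.sum(axis=0, keepdims=True)   # normalise over time
    snapshot = joint / joint.sum(axis=1, keepdims=True)     # normalise over locations
    return life_cycle, snapshot
```

Plotting `life_cycle[:, l, j]` against time for a fixed location gives a curve like those on the life-cycle slides, and `snapshot[t, :, j]` gives the per-location strengths for one time period.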

16 Experiments and Results
Three time-stamped data sets of weblogs, each about one event (broad topic):

Data Set    # docs   Time Span (2005)   Query
Katrina     9377     08/16 - 10/04      Hurricane Katrina
Rita        1754     08/ /04            Hurricane Rita
iPod Nano   1720     09/ /26            iPod Nano

Extract location information from author profiles.
On each data set, we extract a set of salient themes and their life cycles / theme snapshots.

17 Theme Life Cycles for Hurricane Katrina
[Figure: life cycles of two themes]
  – New Orleans: city, orleans, new, louisiana, flood, evacuate, storm, …
  – Oil Price: price, oil, gas, increase, product, fuel, company, …

18 Theme Snapshots for Hurricane Katrina
Week 1: The theme is the strongest along the Gulf of Mexico
Week 2: The discussion moves towards the north and west
Week 3: The theme distributes more uniformly over the states
Week 4: The theme is again strong along the east coast and the Gulf of Mexico
Week 5: The theme fades out in most states

19 Theme Life Cycles for Hurricane Rita
[Figure: life cycles of three themes: Hurricane Katrina: Government Response; Hurricane Rita: Government Response; Hurricane Rita: Storms]
A theme in Hurricane Katrina is inspired again by Hurricane Rita.

20 Theme Snapshots for Hurricane Rita
Both Hurricane Katrina and Hurricane Rita have the theme Oil Price.
The spatiotemporal patterns of this theme at the same time period are similar.

21 Theme Life Cycles for iPod Nano
[Figure: life cycles of the theme Release of Nano in the United States, China, United Kingdom, and Canada]
  – Release of Nano: ipod, nano, apple, september, mini, screen, new, …

22 Contributions and Future Work
Contributions:
  – Defined a new problem: spatiotemporal text mining
  – Proposed a general mixture model for the mining task
  – Proposed methods for computing two spatiotemporal patterns: theme life cycles and theme snapshots
  – Applied it to Weblog mining with interesting results
Future work:
  – Capture content dependency between adjacent time stamps and locations
  – Study granularity selection in spatiotemporal text mining

23 Thank You!