Mining Event Periodicity from Incomplete Observations. Zhenhui (Jessie) Li*, Jingjing Wang, Jiawei Han, University of Illinois at Urbana-Champaign (*now at Penn State University). KDD 2012, Beijing, China.
Prologue: Detect Periodicity in Movements [Li et al., KDD’10]. Problem: What is the periodicity of the movement? Bee example: 8 hours in the hive, 16 hours flying nearby.
Prologue: Detect Periodicity in Movements [Li et al., KDD’10]. Observe the in-and-out movements relative to the reference spot (i.e., the hive). The two-dimensional movement then becomes a one-dimensional binary sequence over time (in hive / outside hive), in which the periodicity is easy to see.
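The reduction from two-dimensional movement to a binary in/out sequence can be sketched as follows. This is only an illustration, not the authors' procedure: the circular reference region, the function name `to_binary_sequence`, and the toy track are all assumptions made for the example.

```python
import math

def to_binary_sequence(points, ref, radius):
    """Map (t, x, y) samples to (t, 1) if inside the reference spot, else (t, 0).

    Assumption: the reference spot is modeled as a circle of the given
    radius around `ref`; the slides do not say how the spot is defined.
    """
    rx, ry = ref
    return [
        (t, 1 if math.hypot(x - rx, y - ry) <= radius else 0)
        for t, x, y in points
    ]

# Hypothetical track: two samples near the hive, then two far away.
track = [(0, 0.1, 0.0), (1, 0.2, 0.1), (2, 5.0, 4.0), (3, 6.0, 1.0)]
seq = to_binary_sequence(track, ref=(0.0, 0.0), radius=1.0)
```

Only the timestamps and the in/out labels are kept, so later steps never need the coordinates again.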
Challenge: Periodicity Detection for Incomplete Observations. Two factors result in incomplete observations: an inconsistent sampling rate and a low sampling rate. Movement data collection in real scenarios:
– Human movement data collected from cellphones: locations are only reported when calls are made
– Animal movement data: 2~3 locations in 3~5 days
[Figure: complete observations vs. incomplete observations of an in-hive / outside-hive sequence. Sample records: :03 in, :30 out, :12 in, :03 in, :14 out, :15 in, …]
A Challenging Case of Detecting Periodicity for Incomplete Observations. Sparse raw data: :03 in, :30 out, :12 in, :03 in, :14 out, :15 in, … [Figure: the resulting sparse in/out sequence.] Any periodicity in the above sequence?
Mining Periodicity in Incomplete Data. Example: the event has a period of 20, and occurrences of the event happen between 20k+5 and 20k+10.
A Probabilistic Model for Periodic Event. Example: a person's daily periodicity in visiting the office. The period is 24 (hours), with office visits at 10:00-11:00 and 14:00-16:00.
A Probabilistic Model for Periodic Event with Random Observation. The model generates observations only at the timestamps that happen to be observed, e.g., x(5)=1 and x(62)=0.
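A minimal simulation can make the generative picture concrete. It assumes (these details are not in the transcript) that the event at time t occurs independently with probability p[t mod T], and that each timestamp is observed independently with a fixed probability `obs_prob`. The name `simulate_observations` and the value 0.9 are hypothetical.

```python
import random

def simulate_observations(p, n, obs_prob, seed=0):
    """Return {t: x(t)} for the timestamps that happen to be observed.

    p        -- periodic probability vector of length T (assumed Bernoulli model)
    n        -- number of timestamps 0..n-1 in the complete sequence
    obs_prob -- assumed independent chance that a timestamp is observed
    """
    rng = random.Random(seed)
    period = len(p)
    observed = {}
    for t in range(n):
        x = 1 if rng.random() < p[t % period] else 0  # hidden event value
        if rng.random() < obs_prob:                   # random observation
            observed[t] = x
    return observed

# Office example from the slides: period 24, visits at 10-11 and 14-16.
p_office = [0.0] * 24
for hour in (10, 14, 15):
    p_office[hour] = 0.9  # illustrative probability, not from the talk
sample = simulate_observations(p_office, n=24 * 30, obs_prob=0.1)
```

The result is a sparse dictionary of positive (1) and negative (0) observations, which is the kind of input the later slides work with.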
Periodicity Detection by Overlaying Observations. True period: skewed distribution. Wrong period: even distribution.
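The overlay step can be sketched by folding every observed timestamp onto its position t mod T and counting positives and negatives per position. The helper name `overlay` and the toy data are assumptions; the toy event follows the earlier example with period 20 and occurrences between 20k+5 and 20k+10, sampled at every third timestamp.

```python
from collections import Counter

def overlay(obs, period):
    """Fold observations {t: 0/1} onto positions 0..period-1.

    Returns two Counters: positive and negative counts per position.
    """
    pos, neg = Counter(), Counter()
    for t, x in obs.items():
        (pos if x else neg)[t % period] += 1
    return pos, neg

# Toy incomplete data: only every third timestamp is observed.
obs = {t: (1 if 5 <= t % 20 <= 10 else 0) for t in range(0, 200, 3)}

pos_true, _ = overlay(obs, 20)   # candidate equals the true period
pos_wrong, _ = overlay(obs, 19)  # a wrong candidate period
```

With the true period the positive observations pile up on positions 5 to 10, while the wrong candidate smears them over many more positions.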
Relationship between Observation Ratio and Probabilistic Model. [Figure: pos/neg ratio; periodic distribution vector.]
Discrepancy Score to Measure Periodicity. If T (=24) is the correct period, the discrepancy score should be large for a certain set of timestamps. If T (=23) is a wrong period, the discrepancy scores are likely to be zero for any set of timestamps.
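The transcript does not preserve the formula for the score, so the sketch below is one plausible reading rather than the authors' definition. For a set of positions I, it takes the fraction of positive observations falling in I minus the fraction of negative observations falling in I, and reports the largest value over all I, which is reached by keeping exactly the positions where the positive fraction exceeds the negative one. The name `discrepancy_score` and the toy data are assumptions.

```python
def discrepancy_score(obs, period):
    """Largest (positive share - negative share) over sets of positions.

    obs maps timestamp -> 0/1. This formalization is an assumption made
    for illustration; it is not taken from the slides.
    """
    pos = [0] * period
    neg = [0] * period
    for t, x in obs.items():
        if x:
            pos[t % period] += 1
        else:
            neg[t % period] += 1
    n_pos, n_neg = sum(pos), sum(neg)
    if n_pos == 0 or n_neg == 0:
        return 0.0  # score is undefined without both kinds of observation
    # Only positions with a positive difference contribute to the maximum.
    return sum(max(pos[i] / n_pos - neg[i] / n_neg, 0.0) for i in range(period))

# Toy data: period 20, event at 20k+5..20k+10, every third timestamp observed.
obs = {t: (1 if 5 <= t % 20 <= 10 else 0) for t in range(0, 200, 3)}
score_true = discrepancy_score(obs, 20)
score_wrong = discrepancy_score(obs, 19)
```

Scanning this score over candidate periods and picking the largest is one simple way to use it. In noisy or unevenly sampled data the raw score may need some normalization before candidates are compared.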
Periodicity Measure
Performance Comparisons. Sampling rate: the ratio of observed points in the complete sequence.
Experiment on Real Human Data. One person’s visits to a specific location. [Figure panels: sampling rate 20 min; sampling rate 1 hour.]
Problems with Using Fourier Transform to Detect Periodicity. [Figure panels: T=4; T=16.]
Summary: Mining Event Periodicity from Incomplete Observations.
Motivation
– Challenge of the real data: incomplete observations (inconsistent + low sampling rate)
Method
– Overlay the segments and measure the “skewness” of the distribution
– Theoretically prove the correctness of the method
Application
– Location prediction
– 2nd place in Nokia Mobile Data Challenge 2012
– Periodicity-based feature + SVM
Thanks! Questions?