1
**Mining Event Periodicity from Incomplete Observations**

Zhenhui (Jessie) Li*, Jingjing Wang, Jiawei Han
University of Illinois at Urbana-Champaign
*Now at Penn State University
KDD 2012, Beijing, China

Key points: (1) data is widely available; (2) lots of applications; (3) data complexity; (4) data scalability.

Methodologies: (1) real data; (2) real problem; (3) collaboration with biologists; (4) fundamental patterns that can be applied to many areas.

Before presentation: motivate myself; important applications; biologists.

Some pronunciations: periodicity (perio’dicity), consecutive [kuhn-sek-yuh-tiv], longitude [lon-ji-tood, -tyood], parameter [puh-ram-i-ter], hypothesis [hahy-poth-uh-sis, hi-], hurricane [hur-i-keyn]. Wording: "These kinds of methods", "This kind of methods".

Comment: Application, Challenge, Solutions. What do you expect your audience to take away? The major difference with statistical methods, signal processing, stream processing, dynamic system. Not contradictory.

Zhenhui Jessie Li

2
**Prologue: Detect Periodicity in Movements [Li et al., KDD’10]**

Problem: What is the periodicity of the movement?

Bee example: 8 hours in the hive, 16 hours flying nearby.

Note: mention the “reference spot”.

3
**Prologue: Detect Periodicity in Movements [Li et al., KDD’10]**

Observe the in-and-out movements relative to the reference spot (i.e., the hive). The periodicity is then easy to see.

[Figure: two-dimensional movement converted into a one-dimensional binary sequence ("in hive" / "outside hive") over time.]

Note: mention the “reference spot”.
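The conversion from two-dimensional movement to a binary in/out sequence can be sketched as follows. This is only an illustration, assuming the reference spot is a circle; the function name, the coordinates, and the radius are hypothetical and not taken from the slides.

```python
import math

def to_binary_sequence(points, spot_center, spot_radius):
    """Map 2-D locations to 1 (inside the reference spot) or 0 (outside)."""
    cx, cy = spot_center
    return [1 if math.hypot(x - cx, y - cy) <= spot_radius else 0
            for x, y in points]

# Hypothetical track: two points near the hive, two points away from it.
track = [(0.1, 0.0), (0.0, 0.2), (3.0, 4.0), (5.0, 1.0)]
sequence = to_binary_sequence(track, spot_center=(0.0, 0.0), spot_radius=1.0)
print(sequence)
```

Period detection then only needs this one-dimensional sequence, not the raw coordinates.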

4
**Challenge: Periodicity Detection for Incomplete Observations**

[Figure: complete observations (a continuous "in hive" / "outside hive" sequence) versus incomplete observations (sparse records such as ":03 in, :30 out, :12 in, :03 in, :14 out, :15 in, …").]

Two factors result in incomplete observations: an inconsistent sampling rate and a low sampling rate.

Movement data collection in real scenarios:
Human movement data collected from cellphones: locations are reported only when making calls.
Animal movement data: 2~3 locations in 3~5 days.

Please note that in this work we assume the observation spots are already detected, and we are only interested in detecting periods from the in-and-out sequence.

5
**A Challenging Case of Detecting Periodicity for Incomplete Observations**

[Figure: sparse raw data (":03 in, :30 out, :12 in, :03 in, :14 out, :15 in, …") shown as scattered in/out observations.]

Any periodicity in the above sequence?

It is hard, but we do believe we can find the period if the observations are generated from some periodic pattern. So now let’s introduce the idea of our approach.

6
**Mining Periodicity in Incomplete Data**

Example: the event has a period of 20, and occurrences of the event happen between 20k+5 and 20k+10.

Even though the observations are sparse, the sparsity has little effect on the overall distribution when we segment and overlay the data using the correct period. So our high-level idea is a generate-and-test framework: we try all potential periods and see which one results in a skewed distribution of observations. [pause]

There are many ways to measure the skewness. In our work, we propose to use the discrepancy between the ratios of in and out observations.

Comment: The high-level idea is enough; probability is not the major contribution. Put the observation graph first, pause a few seconds, let people guess the periods. (Zhai) Mixture probabilistic model; entropy? (Zhai) Baseline is too weak. Why not other probabilistic methods? (Zhai)
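A small sketch of the overlay idea, using the period-20 example from this slide. The subsampling rule (keep every 7th occurrence) and the wrong candidate period 19 are assumptions chosen only to make the example deterministic.

```python
from collections import Counter

def overlay_counts(timestamps, period):
    """Fold timestamps onto [0, period) and count how many land on each phase."""
    return Counter(t % period for t in timestamps)

# The event has period 20 and happens at 20k+5 .. 20k+10,
# but (by assumption) only every 7th occurrence is recorded.
occurrences = [20 * k + offset for k in range(200) for offset in range(5, 11)]
observed = occurrences[::7]

right = overlay_counts(observed, 20)   # correct period
wrong = overlay_counts(observed, 19)   # wrong period

print("phases hit with T=20:", sorted(right))
print("distinct phases hit with T=19:", len(wrong))
```

With the correct period the sparse observations pile up on a few phases; with the wrong period they spread over all phases, which is the skewed-versus-even contrast the generate-and-test framework looks for.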

7
**A Probabilistic Model for Periodic Event**

Example: a person's daily periodicity of visiting the office. The period is 24 (hours), with office visits at 10:00-11:00 and 14:00-16:00.

Notes: periodic distribution vector; human example with period 24; the person visits the office at some timestamps and has a low probability of visiting the office at other times; each timestamp is modeled as a Bernoulli distribution.
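The office example can be written down as a periodic distribution vector with one Bernoulli draw per timestamp. This is a minimal sketch: the probabilities 0.9 and 0.05 are made up, and reading "10-11" and "14-16" as hours 10, 14, and 15 is an assumption.

```python
import random

PERIOD = 24
OFFICE_HOURS = {10, 14, 15}  # assumed reading of 10-11 and 14-16

# Periodic distribution vector: p[h] is the probability that the event
# (being at the office) happens at hour h. Values are illustrative only.
p = [0.9 if h in OFFICE_HOURS else 0.05 for h in range(PERIOD)]

def sample_events(num_timestamps, rng):
    """One Bernoulli draw per timestamp t, with success probability p[t mod PERIOD]."""
    return [1 if rng.random() < p[t % PERIOD] else 0
            for t in range(num_timestamps)]

rng = random.Random(0)
num_days = 100
events = sample_events(PERIOD * num_days, rng)

# Empirical frequency of the event during the assumed office hours.
office_rate = sum(
    events[t] for t in range(len(events)) if t % PERIOD in OFFICE_HOURS
) / (len(OFFICE_HOURS) * num_days)
print(round(office_rate, 2))
```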

8
**A Probabilistic Model for Periodic Event with Random Observation**

Each observation x(t) takes a value in {1, 0, -1}.

[Figure: the generative model generates observations such as x(5)=1 and x(62)=0.]

Notes: generative model; overlay idea. We formally use this generative model to understand and justify our overlay idea.
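A hedged sketch of a generative model with random observation. It assumes that x(t) = -1 marks an unobserved timestamp and that every timestamp is observed independently with the same probability; neither assumption is spelled out on the slide, and `observe_prob` is a hypothetical parameter name.

```python
import random

UNOBSERVED = -1  # assumed meaning of x(t) = -1

def generate_observations(p, num_timestamps, observe_prob, rng):
    """Return x(t): 1 or 0 when t is observed, UNOBSERVED otherwise."""
    period = len(p)
    x = []
    for t in range(num_timestamps):
        if rng.random() < observe_prob:
            # Observed timestamp: Bernoulli draw from the periodic vector.
            x.append(1 if rng.random() < p[t % period] else 0)
        else:
            x.append(UNOBSERVED)
    return x

# Illustrative periodic distribution vector with period 24.
p = [0.9 if h in (10, 14, 15) else 0.05 for h in range(24)]
x = generate_observations(p, num_timestamps=24 * 50, observe_prob=0.1,
                          rng=random.Random(1))
observed = [(t, v) for t, v in enumerate(x) if v != UNOBSERVED]
print(len(observed), "of", len(x), "timestamps observed")
```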

9
**Periodicity Detection by Overlaying Observations**

[Figure: overlaying with the true period gives a skewed distribution; overlaying with a wrong period gives an even distribution.]

Note: say the definition clearly. Suppose we have already segmented and overlaid the original sequence using length T. Then, for any set of timestamps from 1 to T, we can compute the number of positive samples that fall into these timestamps. We define the ratio of positive observations as this number divided by the total number of positive samples.
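The ratio of positive observations defined here can be computed directly. In this sketch the function name and the sample timestamps are hypothetical, and phases are indexed 0 to T-1 (via t mod T) rather than 1 to T.

```python
def positive_ratio(positive_timestamps, period, phases):
    """Fraction of positive observations whose phase (t mod period) lies in `phases`."""
    if not positive_timestamps:
        return 0.0
    hits = sum(1 for t in positive_timestamps if t % period in phases)
    return hits / len(positive_timestamps)

# Hypothetical positive observations of an event with true period 20.
positives = [5, 27, 46, 68, 109, 125]
target = {5, 6, 7, 8, 9, 10}

ratio_true = positive_ratio(positives, 20, target)
ratio_wrong = positive_ratio(positives, 19, target)
print(ratio_true, ratio_wrong)
```

The ratio of negative observations is computed the same way from the negative samples.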

10
**Relationship between Observation Ratio and Probabilistic Model**

[Figure labels: positive/negative ratio; periodic distribution vector; generative model.]
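One way to make the link between the observed ratios and the periodic distribution vector explicit is sketched below. It assumes that T is the true period, that the event at each timestamp is an independent Bernoulli draw with probability p_t, and that every timestamp is observed independently with the same probability. The symbols \mu^{+}, \mu^{-}, S^{+}, and S^{-} are notation introduced here for illustration.

```latex
% S^{+}, S^{-}: sets of positive / negative observations;
% S^{+}_{I}, S^{-}_{I}: those that fall on the timestamp set I after overlaying with length T.
\mu^{+}(I, T) = \frac{|S^{+}_{I}|}{|S^{+}|}
  \;\longrightarrow\; \frac{\sum_{t \in I} p_t}{\sum_{t=1}^{T} p_t},
\qquad
\mu^{-}(I, T) = \frac{|S^{-}_{I}|}{|S^{-}|}
  \;\longrightarrow\; \frac{\sum_{t \in I} (1 - p_t)}{\sum_{t=1}^{T} (1 - p_t)}
\qquad \text{as the number of observations grows.}
```

Under these assumptions the observation probability cancels out of both ratios, which is why sparse sampling does not change the shape of the overlaid distribution.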

11
**Discrepancy Score to Measure Periodicity**

If T (=24) is the correct period, the discrepancy score should be large for a certain set of timestamps.

If T (=23) is a wrong period, the discrepancy scores are likely to be zero for any set of timestamps.

12
**Periodicity Measure**

The discrepancy score will be large for certain timestamps when using the correct period T. However, when the potential period T is wrong, the discrepancy scores are likely to be 0 for ANY set of timestamps. So we propose the periodicity score… Though such measures are simple, the nice thing is that we can formally prove that …

Note: motivate the discrepancy scores, and explain the differences between the periodicity measures for T0 and T.

The discrepancy score of a set of timestamps I and a potential period T is defined as the difference between the ratios of positive and negative observations. As we discussed, if T is the true period, then it is very likely that this discrepancy score will be very high for some timestamps. But if T is not the true period, then for any set of timestamps we expect the score to be approximately zero. Therefore, we define the periodicity measure as the maximal discrepancy among all sets of timestamps.

Our main contribution is that we can formally prove that, if the observations have a true period T0, then the periodicity score of T0 will be no less than any other periodicity score. So we can treat the period with the highest periodicity score as the discovered period.

Comment: (1) one of the main results is that we theoretically prove that.... (QQ)
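The discrepancy score and the periodicity measure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: function names, the toy data, and the candidate range are hypothetical, and no correction for chance effects on small samples is included. Because the discrepancy of a set of timestamps is a sum of per-phase terms, the maximum over all sets is reached by keeping exactly the phases whose term is positive.

```python
from collections import Counter

def periodicity_score(positives, negatives, period):
    """Maximal discrepancy over all sets of phases for one candidate period."""
    if not positives or not negatives:
        return 0.0
    pos = Counter(t % period for t in positives)
    neg = Counter(t % period for t in negatives)
    score = 0.0
    for phase in range(period):
        # Per-phase term: share of positives minus share of negatives.
        diff = pos[phase] / len(positives) - neg[phase] / len(negatives)
        if diff > 0:  # only phases with a positive term raise the discrepancy
            score += diff
    return score

def detect_period(positives, negatives, candidates):
    """Generate-and-test: return the candidate with the highest score."""
    return max(candidates,
               key=lambda T: periodicity_score(positives, negatives, T))

# Toy data: true period 20, event at phases 5..10, every 7th timestamp observed.
observed = range(0, 4000, 7)
positives = [t for t in observed if 5 <= t % 20 <= 10]
negatives = [t for t in observed if not 5 <= t % 20 <= 10]

best = detect_period(positives, negatives, range(2, 31))
print(best, round(periodicity_score(positives, negatives, best), 3))
```

The candidate range stops at 30 on purpose: in this noise-free toy data, multiples of the true period (40, 60, and so on) separate positives from negatives just as well as 20, so a plain maximum cannot tell them apart.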

13
**Performance Comparisons**

Notes: say what the data is and what the methods are. No need to explain each existing period detection method for time series.

Settings: T = 24, SEG = [9:10, 14:16], TN = 1000, Gamma = 0.1, alpha = 0.5, and beta = 0.2.

[Figure axis: sampling rate (ratio of observed points in the complete sequence).]

14
**Experiment on Real Human Data**

[Figure: one person’s visits to a specific location, at sampling rates of 20 minutes and 1 hour.]

To evaluate our method on real human movements, we use the data from the Nokia Mobile Data Challenge. The data contains the movements of 80 persons across 200 to 500 days. The raw location data (based on GPS and WLAN) is first transformed into a set of symbolic places. Each place corresponds to a circle with a radius of 100 meters. For this case study, we select one person who has tracking records for 492 days.

15
**Problems with Using Fourier Transform to Detect Periodicity**


16
**Summary: Mining Event Periodicity from Incomplete Observations**

Motivation
Challenge of the real data (real scenario): incomplete observations (inconsistent + low sampling rate)

Method
Overlay the segments and measure the “skewness” of the distribution
Theoretically prove the correctness of the method

Application
Location prediction: 2nd place in Nokia Mobile Data Challenge 2012
Periodicity-based feature + SVM

Thanks! Questions?
