DS595/CS525 Team#2 - Mi Tian, Deepan Sanghavi, Dhaval Dholakia


1 A Traffic Flow Approach to Early Detection of Gathering Events
X. Zhou, A. V. Khezerlou, A. Liu, Z. Shafiq, F. Zhang

2 Motivation
Why is detecting gathering events IMPORTANT?

3 Challenge
Why is detecting gathering events DIFFICULT?
Many candidate gathering footprints
Need to balance result quality and computation time

4 Outline
Problem Formulation
Computational Solution
Case Study and Experimental Evaluations

5 Problem Formulation

6 Traffic Flow
Spatial field "s"
Directed edge e = (si, sj), where si and sj are adjacent grids
Observed traffic flow (Ce)
Baseline traffic flow (Be)
What is an abnormal flow? (EBP model)

7 LLR
The EBP test maximizes the likelihood ratio between H0 and H1 (LLR)
Significant Flow
Lemma 1
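
A minimal Python sketch of the LLR test on a single edge. The slide does not reproduce the formula, so this assumes the usual expectation-based Poisson form: under H0 the observed flow Ce is Poisson with mean Be, under H1 it is Poisson with mean q*Be for some q > 1. The function names and the threshold argument are hypothetical; in the deck, significance is controlled by a p-value threshold rather than a raw LLR cutoff.

```python
import math

def ebp_llr(observed: float, baseline: float) -> float:
    """Log-likelihood ratio of H1 (elevated flow) against H0 for one edge.

    Assumed form: maximizing the Poisson likelihood over q > 1 gives q = C/B,
    so LLR = C*log(C/B) + B - C when C > B, and 0 otherwise.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if observed <= baseline:
        return 0.0
    return observed * math.log(observed / baseline) + baseline - observed

def is_significant(observed: float, baseline: float, threshold: float) -> bool:
    """Flag an edge as a significant flow when its LLR exceeds a cutoff.

    The cutoff is a stand-in (assumption) for whatever value corresponds to
    the chosen p-value threshold.
    """
    return ebp_llr(observed, baseline) > threshold
```

An edge whose observed flow does not exceed its baseline always scores 0, so only flows above baseline can become significant.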

8 Definitions
Shortest Path constraint
Most likely destination?

9 G-Score

10 G-Graph
k-dominant G-Graph set

11 Problem Definition

12 Computational Solution

13 Brute-Force Algorithm
Significant flows are identified
G-graphs are constructed at each grid
Significant flows are fetched for each root
The most likely path is found
The G-score is calculated
G-graphs are sorted and scanned
The top-k G-graphs are reported
Disadvantages:
A G-graph is supposed to be created only when a significant flow exists
Costly exhaustive search
No ability to prune candidate G-graphs

14 Smart Edge Algorithm
To address the computational bottlenecks mentioned above, we present a new algorithm, SmartEdge, with three design decisions for better computational efficiency:
Candidate Root Filter
Building G-graphs with dynamic programming
G-graph pruning (G-score upper bound, G-graph pruning strategy)

15 Candidate Root Filter
Filters out locations with no significant flow
Candidate Root Index (CRI) data structure (hash table)
The number of significant flows is stored in bins, as required to calculate upper bound values
The total number of significant flows in each bin and near the root is calculated
When a significant flow (Esig) is identified, the roots of that flow are found and their counters in the CRI are updated
Roots with no values are filtered out
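
The index can be sketched as a nested hash table in Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a flow is assumed to be relevant to every root within `max_dist` cells (Chebyshev distance) of the flow's source cell, and the bin is taken to be that distance.

```python
from collections import defaultdict

def build_cri(significant_flows, max_dist, n_rows, n_cols):
    """Map each candidate root cell to {distance bin: count of significant flows}.

    significant_flows: iterable of directed edges ((r1, c1), (r2, c2)).
    Assumption: the bin of a flow relative to a root is the Chebyshev
    distance between the root and the edge's source cell.
    """
    cri = defaultdict(lambda: defaultdict(int))
    for src, _dst in significant_flows:
        r0, c0 = src
        for r in range(max(0, r0 - max_dist), min(n_rows, r0 + max_dist + 1)):
            for c in range(max(0, c0 - max_dist), min(n_cols, c0 + max_dist + 1)):
                d = max(abs(r - r0), abs(c - c0))
                cri[(r, c)][d] += 1  # update this root's counter for the bin
    return cri

def total_flows(cri, root):
    """Total number of significant flows near a root (0 if the root was filtered)."""
    return sum(cri[root].values()) if root in cri else 0
```

Cells that never receive a count are simply absent from the table, which is what filters them out as candidate roots.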

16 Generate G-Graph
For a given root and a list of significant flows nearby (fetched from the Candidate Root Index), the algorithm picks each significant flow and traverses all the grids in the rectangular area bounded by the root and this significant flow in a breadth-first manner.
The most likely path to the root from every grid is calculated until the significant flow is reached.
After the most likely path is found, all the flows along this path are added to the G-Graph.
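
The traversal can be sketched in Python as follows. The transcript does not define "most likely path", so this sketch assumes it is the shortest path (4-connected moves inside the rectangle) with the largest total edge LLR; that scoring rule, the function name, and the `edge_llr` dictionary are all assumptions. For simplicity it sweeps the whole rectangle instead of stopping when the flow is reached.

```python
from collections import deque

def most_likely_path(root, flow_cell, edge_llr):
    """Best shortest path from flow_cell to root inside their bounding rectangle.

    root, flow_cell: (row, col) cells; flow_cell is the source cell of the
    significant flow. edge_llr: {(from_cell, to_cell): llr}, missing edges count as 0.
    Returns (path from flow_cell to root, total LLR along the path).
    """
    r_lo, r_hi = sorted((root[0], flow_cell[0]))
    c_lo, c_hi = sorted((root[1], flow_cell[1]))
    dist = {root: 0}    # breadth-first layer of each cell
    best = {root: 0.0}  # best total LLR from the cell to the root
    nxt = {}            # next hop toward the root on the best path
    queue = deque([root])
    while queue:
        cur = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (cur[0] + dr, cur[1] + dc)
            if not (r_lo <= nb[0] <= r_hi and c_lo <= nb[1] <= c_hi):
                continue  # stay inside the rectangle
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                best[nb] = float("-inf")
                queue.append(nb)
            if dist[nb] == dist[cur] + 1:  # cur is one step closer to the root
                score = best[cur] + edge_llr.get((nb, cur), 0.0)
                if score > best[nb]:
                    best[nb] = score
                    nxt[nb] = cur
    path = [flow_cell]
    while path[-1] != root:
        path.append(nxt[path[-1]])
    return path, best[flow_cell]
```

Every consecutive pair of cells on the returned path is a directed flow toward the root, which is what would be added to the G-Graph.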

17 G-Score Upper Bound
Ne(r): upper bound on the number of insignificant flows
LLR(eins): maximum LLR score of insignificant flows
Calculating LLR(eins)
Calculating Ne(r)

18 G-Graph Pruning
A higher G-score means a higher number of significant flows
Candidates are sorted in descending order by number of significant flows
The G-score upper bound is calculated and compared with the lowest value in the priority queue
If the upper bound is not higher than the lowest value in the priority queue, the candidate is pruned; otherwise the actual G-score is calculated and compared
A particular node is not added until we can verify that it is not dominated by any other node
This is done by recursively checking all the roots
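
A minimal Python sketch of upper-bound pruning against a top-k priority queue. The `upper_bound` and `exact_score` callables are hypothetical placeholders for the G-score upper bound and the actual G-score computation, and the dominance check between nodes is deliberately left out.

```python
import heapq

def top_k_with_pruning(candidates, k, upper_bound, exact_score):
    """Return the k best (score, root) pairs and how many exact scores were computed.

    candidates: roots, ideally sorted by descending number of significant flows
    so that strong candidates fill the queue early and later ones get pruned.
    """
    heap = []      # min-heap: heap[0] holds the lowest score currently in the top k
    evaluated = 0
    for root in candidates:
        if len(heap) == k and upper_bound(root) <= heap[0][0]:
            continue  # even the optimistic bound cannot enter the top k: prune
        score = exact_score(root)  # the expensive step: build the G-graph and score it
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, root))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, root))
    return sorted(heap, reverse=True), evaluated
```

As k grows, the lowest value in the queue drops and fewer candidates fall below it, so the pruning step saves less work.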

19 Evaluations

20 Case Study
Dataset
10,000 taxis in Shenzhen
31 days in August 2013
128x64 grids
Data Processing
Traffic flow: taxi GPS traces passing between two neighboring grids
Baseline flow: average monthly flow crossing the same boundary
Separate baselines for weekends and weekdays
Run Algorithm
Find the top 5 gathering events for every 10-minute interval
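
The data processing step can be sketched in Python. This assumes GPS points have already been mapped to grid cells, that "neighbor grids" means 4-connected cells, and that the baseline is a per-edge mean over days of the same type; the function names and input layout are hypothetical.

```python
from collections import Counter

def count_flows(trace_cells):
    """Count directed flows between adjacent grids for one taxi trace.

    trace_cells: (row, col) cells visited in time order within one interval.
    Jumps between non-adjacent cells (e.g. GPS gaps) are ignored.
    """
    flows = Counter()
    for a, b in zip(trace_cells, trace_cells[1:]):
        if abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1:
            flows[(a, b)] += 1
    return flows

def baselines(daily_flows):
    """Average flow per edge, kept separately for weekends and weekdays.

    daily_flows: list of (is_weekend, Counter of flows) for the same
    time-of-day interval on different days of the month.
    Returns {is_weekend: {edge: mean flow}}.
    """
    sums = {True: Counter(), False: Counter()}
    days = {True: 0, False: 0}
    for is_weekend, flows in daily_flows:
        sums[is_weekend].update(flows)  # Counter.update adds counts
        days[is_weekend] += 1
    return {
        w: {edge: total / days[w] for edge, total in sums[w].items()}
        for w in (True, False)
        if days[w]
    }
```

The observed flow of an interval can then be compared with the baseline of the matching day type on each edge.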

21 Result
Pop music concert @ 8 PM, Friday, August 16th
Two waves
Direction information
G-scores shown: G = 329, G = 459, G = 493, G = 632, G = 634, G = 506, G = 717

22 Performance Experiment
Dataset: same as the case study, for the entire month of August 2013
Report average CPU time
Conditions:
Brute Force
CRF: candidate root filtering
CRF+DP: adds G-Graph building with dynamic programming
CRF+DP+GPR: adds G-Graph pruning

23 Number of Grids
From 16x32 to 64x128
SmartEdge: sublinear increase, because it filters impossible roots and G-Graphs
Brute-Force: linear increase
Best case: 50% time reduction

24 Distance Threshold
From 500 m (1) to 4.5 km (9)
Brute-Force and CRF: exponential growth, because they list all possible paths when generating a G-Graph
DP G-Graph building reduces this to super-linear
Best case: 49% time reduction
When the distance threshold increases, the algorithm needs to consider a bigger range for every G-Graph. As shown in the picture, from 500 meters to 4.5 kilometers, the Brute Force method and SmartEdge without optimization grow exponentially. Dynamic programming and G-Graph pruning were able to reduce this significantly, to sub-linear. The best condition can still reduce time by about 50% compared to Brute Force.

25 Result Size k
From k = 1 to 10
Increasing k only impacts CRF+DP+GPR, because it uses the top-k list to do pruning
GPR has no advantage for big k
Best case: ~50% time reduction

26 P-Value Threshold
From 0.01% to 0.1%
Larger total number of significant flows
All conditions increase linearly, but DP and GPR grow more slowly
Best case: 52% time reduction

27 Conclusion
A computationally effective solution to the problem of "early detection of gathering events"
Offers gathering directions in addition to destinations
G-Graph, G-Score
Smart Edge algorithm with two optimizations
50% time reduction over the Brute-Force approach

28 Questions

