DS595/CS525 Team#2 - Mi Tian, Deepan Sanghavi, Dhaval Dholakia

A Traffic Flow Approach to Early Detection of Gathering Events
X. Zhou, A. V. Khezerlou, A. Liu, Z. Shafiq, F. Zhang

Motivation
Why is detecting gathering events IMPORTANT?

Challenge
Why is detecting gathering events DIFFICULT?
There are many candidate gathering footprints, and we need to balance result quality against computation time.

Outline
Problem Formulation; Computational Solution; Case Study and Experimental Evaluations

Problem Formulation

Traffic Flow
Spatial field “s”. Directed edge e = (si, sj), where si and sj are adjacent grids. Observed traffic flow (Ce). Baseline traffic flow (Be). What is an abnormal flow? (EBP model)
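A minimal sketch of the per-edge abnormality score, assuming the standard expectation-based Poisson (EBP) form, since the slide names the model without showing its formula: under H0 the observed flow Ce is Poisson with mean Be, under H1 it is Poisson with mean q·Be for some q > 1, and maximizing over q gives q = Ce/Be. The function name `ebp_llr` is hypothetical.

```python
import math


def ebp_llr(observed: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio for one directed flow edge.

    Assumed form: H0: Ce ~ Poisson(Be); H1: Ce ~ Poisson(q * Be), q > 1.
    The maximum-likelihood q is Ce / Be, giving Ce*log(Ce/Be) + Be - Ce.
    """
    if baseline <= 0:
        raise ValueError("baseline flow Be must be positive")
    if observed <= baseline:
        return 0.0  # no evidence of a higher-than-expected flow on this edge
    return observed * math.log(observed / baseline) + baseline - observed
```

For example, an edge observed at twice its baseline (Ce = 20, Be = 10) scores 20·ln 2 - 10 ≈ 3.86, while any edge at or below its baseline scores 0.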

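LLR
The EBP test computes the likelihood ratio between H0 (the observed flow follows its baseline) and H1 (the flow is higher than its baseline), maximized over the alternative; its logarithm is the LLR.
Significant Flow: Lemma 1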
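The statement of Lemma 1 is not on the slide, so this is only an illustration of one common way to decide significance, not the lemma itself: we assume a flow is significant when its one-sided Poisson p-value P(X ≥ Ce | Be) falls below a threshold α (the experiments later vary the p-value threshold from 0.01% to 0.1%). The names `poisson_upper_tail` and `is_significant` are hypothetical.

```python
import math


def poisson_upper_tail(c: int, mean: float) -> float:
    """P(X >= c) for X ~ Poisson(mean), summed term by term in log space."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    if c <= 0:
        return 1.0
    total = 0.0
    i = c
    while True:
        term = math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
        total += term
        # Past the mean the terms shrink geometrically; stop once they are negligible.
        if i > mean and term <= 1e-17 * total:
            break
        i += 1
    return min(total, 1.0)


def is_significant(observed: int, baseline: float, alpha: float) -> bool:
    """Assumed rule: a flow is significant when its one-sided p-value is below alpha."""
    return poisson_upper_tail(observed, baseline) < alpha
```

Summing the upper tail directly (rather than computing 1 minus the CDF) keeps very small p-values accurate, which matters for thresholds as low as 0.01%.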

Definitions
Shortest Path Constraint; Most Likely Destination?

G-Score

G-Graph; k-dominant G-Graph set

Problem Definition

Computational Solution

Brute-Force Algorithm
Significant flows are identified. A G-graph is constructed at each grid: the significant flows are fetched for each root, the most likely path is found, and the G-Score is calculated. The G-graphs are then sorted and scanned, and the top-k G-graphs are reported. Disadvantages: a G-graph should only be created where at least one significant flow exists, yet the brute force builds one at every grid; the exhaustive search is costly; and there is no ability to prune candidate G-graphs.
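A minimal sketch of the brute-force scan described above. G-graph construction and scoring are abstracted into a caller-supplied `g_score(root, flows)` function (hypothetical), and "nearby" is assumed to mean the flow's head grid lies within `max_dist` rows and columns of the root. The point is the shape of the search: every grid is scored and nothing is filtered or pruned.

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int]
Flow = Tuple[Cell, Cell]  # directed edge (tail grid, head grid)


def brute_force_top_k(
    rows: int,
    cols: int,
    significant_flows: List[Flow],
    g_score: Callable[[Cell, List[Flow]], float],
    max_dist: int,
    k: int,
) -> List[Tuple[float, Cell]]:
    results: List[Tuple[float, Cell]] = []
    for r in range(rows):
        for c in range(cols):
            root = (r, c)
            # Fetch the significant flows whose head lies within max_dist of the root.
            nearby = [
                f for f in significant_flows
                if max(abs(f[1][0] - r), abs(f[1][1] - c)) <= max_dist
            ]
            # Every grid is scored, even when `nearby` is empty: no filtering, no pruning.
            results.append((g_score(root, nearby), root))
    results.sort(reverse=True)  # sort by G-score, then scan from the top
    return results[:k]
```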

SmartEdge Algorithm
To address these computational bottlenecks, we present a new algorithm, SmartEdge, with three design decisions for better computational efficiency: (1) Candidate Root Filter; (2) building G-graphs with dynamic programming; (3) G-graph pruning, using a G-score upper bound and a G-graph pruning strategy.

Candidate Root Filter
Filter out locations with no significant flow.
Candidate Root Index (CRI) data structure (hash table). The number of significant flows is stored in bins, as required to calculate upper-bound values, and the total number of significant flows in each bin near the root is calculated. When a significant flow Esig is identified, the roots of that flow are found and their counters in the CRI are updated. Roots with no counts are filtered out.
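A minimal sketch of a Candidate Root Index as a plain hash table (`dict`). Two details are assumptions, since the slide does not define them: a flow's roots are all grids within `max_dist` rows and columns of the flow's head grid, and the bin key is the Manhattan distance from that head grid to the root. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Flow = Tuple[Cell, Cell]  # directed edge (tail grid, head grid)


def build_candidate_root_index(
    significant_flows: List[Flow], rows: int, cols: int, max_dist: int
) -> Dict[Cell, Dict[int, int]]:
    """Hash table: candidate root -> {distance bin -> number of significant flows}."""
    cri: Dict[Cell, Dict[int, int]] = {}
    for _tail, (hr, hc) in significant_flows:
        # Update the counter of every root this significant flow could belong to.
        for r in range(max(0, hr - max_dist), min(rows, hr + max_dist + 1)):
            for c in range(max(0, hc - max_dist), min(cols, hc + max_dist + 1)):
                bins = cri.setdefault((r, c), {})
                d = abs(r - hr) + abs(c - hc)  # assumed bin key: Manhattan distance
                bins[d] = bins.get(d, 0) + 1
    # Any grid absent from `cri` has no significant flow nearby and is filtered out.
    return cri
```

The candidate roots are then `list(cri)`, the total number of significant flows near a root is `sum(cri[root].values())`, and the per-bin counts can feed an upper bound on the G-score.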

Generate G-Graph
For a given root and a list of nearby significant flows (fetched from the Candidate Root Index), the algorithm picks each significant flow and traverses all the grids in the rectangular area bounded by the root and that flow, in a breadth-first manner. The most likely path to the root is calculated from every grid until the significant flow is reached. Once the most likely path is found, all the flows along it are added to the G-Graph.
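A minimal dynamic-programming sketch of the most-likely-path step, under two assumptions not stated on the slide: the shortest path constraint means each step moves one grid toward the root (so the path stays inside the bounding rectangle), and the "most likely" path is the one with the largest total edge LLR. `edge_llr(a, b)` is a hypothetical caller-supplied score for the directed edge a → b. Cells are solved in order of their row and column distance from the root, so each cell's best continuation is already known, much like the layer-by-layer traversal on the slide.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]


def most_likely_path(
    start: Cell, root: Cell, edge_llr: Callable[[Cell, Cell], float]
) -> Tuple[float, List[Cell]]:
    sr, sc = start
    rr, rc = root
    dr = (rr > sr) - (rr < sr)  # row direction from start toward root: +1, -1 or 0
    dc = (rc > sc) - (rc < sc)
    n_rows, n_cols = abs(rr - sr), abs(rc - sc)

    def cell(i: int, j: int) -> Cell:
        # The grid i rows and j columns away from the root, on the start's side.
        return (rr - dr * i, rc - dc * j)

    best: Dict[Cell, float] = {}  # best total LLR from a grid to the root
    nxt: Dict[Cell, Cell] = {}    # next grid on that best path
    for i in range(n_rows + 1):
        for j in range(n_cols + 1):
            here = cell(i, j)
            if i == 0 and j == 0:
                best[here] = 0.0  # the root itself
                continue
            options = []
            if i > 0:  # step one row toward the root
                step = cell(i - 1, j)
                options.append((edge_llr(here, step) + best[step], step))
            if j > 0:  # step one column toward the root
                step = cell(i, j - 1)
                options.append((edge_llr(here, step) + best[step], step))
            best[here], nxt[here] = max(options)
    path = [start]
    while path[-1] != root:
        path.append(nxt[path[-1]])
    return best[start], path
```

The flows added to the G-Graph would then be the consecutive pairs `zip(path, path[1:])`. Each grid is solved once, so the work is proportional to the rectangle's area rather than to the number of distinct paths through it.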

G-Score Upper Bound
Ne(r): upper bound on the number of insignificant flows in the G-Graph of root r
LLR(eins): maximum LLR score among insignificant flows
Calculating LLR(eins); calculating Ne(r)
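A minimal sketch of how such a bound could be assembled, assuming (this is not shown on the slides) that the G-score adds up edge LLRs over the G-Graph. Under that assumption the score cannot exceed the LLRs of the nearby significant flows plus Ne(r) insignificant edges, each scoring at most LLR(eins). Ne(r) is bounded here from distance bins like those of a Candidate Root Index: a shortest grid path from a flow d steps away contributes at most d edges. Both function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set


def max_insignificant_llr(edge_llrs: Dict[Hashable, float], significant: Set[Hashable]) -> float:
    """LLR(e_ins): the largest LLR among edges that did not pass the significance test."""
    return max((llr for e, llr in edge_llrs.items() if e not in significant), default=0.0)


def g_score_upper_bound(
    significant_llrs: Iterable[float],
    distance_bins: Dict[int, int],
    llr_ins: float,
) -> float:
    """Optimistic G-score for one root, assuming the G-score is a sum of edge LLRs."""
    # N_e(r): a flow d grid steps from the root adds at most d edges on its path,
    # so summing d * count over the bins bounds the number of insignificant edges.
    n_e = sum(d * count for d, count in distance_bins.items())
    return sum(significant_llrs) + n_e * llr_ins
```

For example, two significant flows with LLRs 3.0 and 4.0, bins {1: 2, 3: 1} and LLR(eins) = 0.5 give a bound of 7.0 + 5 × 0.5 = 9.5.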

G-Graph Pruning
A root with more significant flows tends to produce a higher G-score, so candidate roots are sorted in descending order of their number of significant flows. For each candidate, the G-score upper bound is calculated and compared with the lowest score in the top-k priority queue. If the upper bound is not higher than that score, the candidate is pruned; otherwise its actual G-score is calculated and compared. A particular node is not added until we can verify that it is not dominated by any other node; this is done by recursively checking all the roots.
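A minimal sketch of the top-k pruning loop with Python's `heapq` as the priority queue (a min-heap, so `heap[0]` is the current k-th best score). The `upper_bound` and `exact_score` callables are hypothetical stand-ins for the G-score bound and for building and scoring a G-graph, and the dominance check from the slide is deliberately left out.

```python
import heapq
from typing import Callable, List, Tuple

Cell = Tuple[int, int]


def top_k_with_pruning(
    candidates: List[Tuple[int, Cell]],
    k: int,
    upper_bound: Callable[[Cell], float],
    exact_score: Callable[[Cell], float],
) -> Tuple[List[Tuple[float, Cell]], int]:
    """Return the top-k (G-score, root) pairs and how many exact scores were computed.

    `candidates` holds (number of significant flows, root) pairs.
    """
    if k <= 0:
        return [], 0
    heap: List[Tuple[float, Cell]] = []  # min-heap: heap[0] is the k-th best so far
    evaluated = 0
    # Roots with more significant flows are likely to score higher, so try them first.
    for _n_sig, root in sorted(candidates, key=lambda cand: cand[0], reverse=True):
        if len(heap) == k and upper_bound(root) <= heap[0][0]:
            continue  # even the optimistic bound cannot enter the top-k: prune
        score = exact_score(root)  # build the G-graph and compute its actual G-score
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, root))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, root))
    return sorted(heap, reverse=True), evaluated
```

Trying high-count roots first fills the queue with strong scores early, which raises the pruning threshold sooner. It also explains why a large k weakens pruning: the k-th best score that candidates must beat is lower.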

Evaluations

Case Study
Dataset: 10,000 taxis in Shenzhen over 31 days in August 2013, on 128x64 grids.
Data processing: a traffic flow is counted when taxi GPS data pass between two neighboring grids; the baseline flow is the average monthly flow crossing the same boundary, with separate baselines for weekends and weekdays.
Run algorithm: find the top 5 gathering events for every 10-minute interval.
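A minimal sketch of the data-processing step, assuming each taxi trajectory has already been mapped to a sequence of grid cells for one 10-minute interval, and that consecutive points falling in non-adjacent cells are simply skipped (the slides do not say how such jumps are handled). Both function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]
Edge = Tuple[Cell, Cell]


def observed_flows(trajectories: Iterable[List[Cell]]) -> Counter:
    """Ce: number of times taxis crossed from one grid cell into a neighboring one."""
    counts: Counter = Counter()
    for cells in trajectories:
        for a, b in zip(cells, cells[1:]):
            if abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1:  # adjacent grids only
                counts[(a, b)] += 1
    return counts


def baseline_flows(daily_counts: Dict[date, Counter]) -> Dict[str, Dict[Edge, float]]:
    """Be: average flow per edge over the month, separately for weekdays and weekends."""
    groups: Dict[str, List[Counter]] = defaultdict(list)
    for day, counts in daily_counts.items():
        groups["weekend" if day.weekday() >= 5 else "weekday"].append(counts)
    baselines: Dict[str, Dict[Edge, float]] = {}
    for kind, days in groups.items():
        total: Counter = Counter()
        for counts in days:
            total.update(counts)
        # Dividing by the number of days counts days without a crossing as zero.
        baselines[kind] = {edge: n / len(days) for edge, n in total.items()}
    return baselines
```

Here `daily_counts` would hold, for each day of August 2013, the `observed_flows` output for the same 10-minute interval.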

Result
Pop music concert at 8 PM on Friday, August 16th: two waves.
[Figure: direction information; G-graph labels G = 329, 459, 493, 632, 634, 506, 717]

Performance Experiment
Same settings as the case study, over the entire month of August 2013; the average CPU time is reported.
Conditions: Brute Force; CRF: candidate root filtering; CRF+DP: adds G-Graph building with dynamic programming; CRF+DP+GPR: adds G-Graph pruning.

Number of Grids
From 16x32 to 64x128.
SmartEdge: sublinear increase, because it filters out impossible roots and G-Graphs. Brute-Force: linear increase. Best: 50% time reduction.

Distance Threshold
From 500 m (1) to 4.5 km (9).
Brute-Force and CRF: exponential growth, because they list all possible paths when generating a G-Graph. DP: G-Graph building reduced to super-linear growth. Best: 49% time reduction.
When the distance threshold increases, the algorithm needs to consider a larger area for every G-Graph. As shown in the figure, from 500 meters to 4.5 kilometers, the Brute-Force method and SmartEdge without optimizations grow exponentially. Dynamic programming and G-Graph pruning reduce this growth significantly. The best condition still reduces time by about 50% compared to Brute Force.

Result Size k
From k = 1 to 10.
Increasing k only impacts CRF+DP+GPR, because it uses the top-k list for pruning; GPR offers no advantage for large k. Best: ~50% time reduction.

P-Value Threshold
From 0.01% to 0.1%.
A higher threshold yields more significant flows in total. All conditions increase linearly, but DP and GPR grow more slowly. Best: 52% time reduction.

Conclusion
A computationally efficient solution to the problem of “early detection of gathering events”. It offers gathering directions in addition to destinations. Key concepts: G-Graph and G-Score. The SmartEdge algorithm with two optimizations achieves a 50% time reduction over the Brute-Force approach.

Questions
