A K-Main Routes Approach to Spatial Network Activity Summarization

1 A K-Main Routes Approach to Spatial Network Activity Summarization
Authors: Dev Oliver, Shashi Shekhar, James M. Kang, Renee Bousselaire, Abdussalam Bannur

2 Outline
- Motivation
- Problem Statement
- Contributions
- Validation: Analytical, Experimental, Case Studies
- Summary and Future Work

3 Motivation: Crime Analysis (application domain)
- Street, Place, Neighborhood
- Crime hotspot: area of concentrated crime
- “Most clustering algorithms will show areas of concentration even when a line is the most appropriate dimension.” – National Institute of Justice
- Source: J. E. Eck et al., Mapping Crime: Understanding Hot Spots, US National Inst. of Justice (http://www.ncjrs.gov/pdffiles1/nij/ pdf), 2005.
- Star Tribune, January 26, 2011

4 Examples of Linear Patterns
- Linear patterns resulting from deforestation in Brazil
- Linear patterns of crime in a major US city

5 Motivation: Environmental Criminology (scientific domain)
Spatial theories in Environmental Criminology
- Routine Activity Theory [1]: crime location is related to the criminal’s frequently visited areas
- Crime Pattern Theory [2]: based on a spatial model
  - Nodes (e.g., home, work, entertainment), Paths (e.g., routes between nodes), Edges
  - Crime locations are close to edges: near the criminal’s activity boundaries, where residents may not recognize him/her
  - Source: Rossmo, Kim (2000). Geographic Profiling. Boca Raton, FL: CRC Press.
Network-based summarization adds value to Environmental Criminology
- Assist with large-scale verification of real-world data matching theories
- Opportunities to develop hypotheses for new theory formulation
[1] L. E. Cohen et al., Social change and crime rate trends: A routine activity approach, American Sociological Review, 1979.
[2] P. L. Brantingham et al., Environmental Criminology, Waveland Press, 1990.

6 Other Domains
- Disaster Relief
- Accident Analysis and Prevention

7 Key Concepts
- Activity: object of interest located at a node or edge
- Summary path: a path chosen by KMR to summarize activities
- Activity coverage: total number of activities on a path or set of paths
- Active node: a node having n ≥ 1 activities or joined by an edge having n ≥ 1 activities, e.g., A, B, C, D, E
- Inactive node: a node having n = 0 activities and joined only by edges having n = 0 activities, e.g., F
- Active node ratio: total # active nodes / total # nodes, e.g., 5/6
- Each edge has a weight of 1
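The active-node definitions above can be checked with a minimal sketch. The six-node chain, the activity counts, and the function names below are assumptions chosen only so that the ratio comes out to the 5/6 quoted on the slide; they are not the network drawn in the original figure.

```python
def active_nodes(nodes, edges, node_activities, edge_activities):
    """Nodes with >= 1 activity, or touching an edge with >= 1 activity."""
    active = {n for n in nodes if node_activities.get(n, 0) >= 1}
    for u, v in edges:
        if edge_activities.get((u, v), 0) >= 1:
            active.update((u, v))
    return active


def active_node_ratio(nodes, edges, node_activities, edge_activities):
    """Total number of active nodes divided by total number of nodes."""
    found = active_nodes(nodes, edges, node_activities, edge_activities)
    return len(found) / len(nodes)


# Hypothetical network: a chain A-B-C-D-E-F (illustration only).
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
node_activities = {"A": 1}                        # one activity on node A
edge_activities = {("B", "C"): 2, ("D", "E"): 1}  # activities located on edges

active = active_nodes(nodes, edges, node_activities, edge_activities)
ratio = active_node_ratio(nodes, edges, node_activities, edge_activities)
```

In this toy network F is the only inactive node, because it carries no activity and its single edge (E, F) carries none either.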

8 Problem Statement
Given
- A spatial network G = (N, E)
- A set of activities, A, and their locations (e.g., a node or edge)
- A set of paths, P
- k (number of routes)
- Edge weights
Find
- A cardinality-k subset P′ of P, i.e., a subset P′ ⊆ P with |P′| = k
Objective
- Maximize the activity coverage (AC) of P′
Constraints
- 1 ≤ k ≤ |P|
Example (figure): P = the set of shortest paths, k = 2, edge weights are 1
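The objective can be illustrated with a brute-force sketch that tries every cardinality-k subset of P and keeps the one with the largest activity coverage. This is only an illustration of the problem statement, not the KMR algorithm; the path names, the activity ids, and the representation of a path as the set of activities it covers are assumptions.

```python
from itertools import combinations


def activity_coverage(chosen, path_activities):
    """Number of distinct activities covered by the chosen paths."""
    covered = set()
    for p in chosen:
        covered |= path_activities[p]
    return len(covered)


def best_k_paths(path_activities, k):
    """Exhaustively pick the k-subset of paths with maximum activity coverage."""
    if not 1 <= k <= len(path_activities):
        raise ValueError("k must satisfy 1 <= k <= |P|")
    candidates = combinations(sorted(path_activities), k)
    return max(candidates, key=lambda c: activity_coverage(c, path_activities))


# Hypothetical candidate paths, each mapped to the activity ids lying on it.
path_activities = {
    "A-B": {1, 2},
    "B-C": {2, 3},
    "A-B-C": {1, 2, 3},
    "D-E": {4, 5},
}

best = best_k_paths(path_activities, k=2)
best_coverage = activity_coverage(best, path_activities)
```

Exhaustive search is only practical for tiny inputs, since the number of k-subsets grows very quickly with |P|.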

9 Challenges
- Measures of interestingness: activity coverage, average distance, etc.
- Computational complexity
  - Choose(N, 2) paths, given N nodes
  - Exponential number of k-subsets of paths
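A quick calculation shows how fast the search space grows. The sketch assumes one candidate shortest path per unordered pair of nodes, which is how the Choose(N, 2) count above reads; the function names are illustrative.

```python
import math


def candidate_path_count(n_nodes):
    """Choose(N, 2): one candidate path per unordered pair of nodes."""
    return math.comb(n_nodes, 2)


def k_subset_count(n_nodes, k):
    """Number of ways to pick k summary paths from the candidate paths."""
    return math.comb(candidate_path_count(n_nodes), k)


paths_for_1000_nodes = candidate_path_count(1000)
subsets_for_6_nodes = k_subset_count(6, 2)
```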

10 SNAS is NP-Complete (Proof Sketch)
Devising an NP-completeness proof for a decision problem Π [1]:
1. Show that Π is in NP
2. Select a known NP-complete problem Π’
3. Construct a transformation f from Π’ to Π
4. Prove that f is a polynomial transformation
[1] M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, 1979.

11 Step 1: SNAS is in NP
Verify in polynomial time whether the activity coverage of P′ ≥ B
SNAS decision problem
Given
- A spatial network G = (N, E)
- A set of activities, A, and their locations (e.g., a node or edge)
- A set of paths, P
- k (number of routes)
- Edge weights
- B (bound on number of activities)
Find
- A cardinality-k subset P′ of P, i.e., a subset P′ ⊆ P with |P′| = k
Objective
- Activity coverage (AC) of P′ ≥ B
Constraints
- 1 ≤ k ≤ |P|
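Membership in NP rests on a certificate check that runs in polynomial time. A minimal sketch of such a check follows; `verify_snas`, the path names, and the encoding of each path as the set of activities it covers are assumptions made for illustration.

```python
def verify_snas(path_activities, chosen, k, bound):
    """Check a proposed answer P' in time polynomial in the input size.

    path_activities: candidate path -> set of activity ids on that path
    chosen:          proposed subset P' (list of path names)
    """
    if len(chosen) != k or len(set(chosen)) != k:
        return False                      # P' must contain exactly k distinct paths
    if any(p not in path_activities for p in chosen):
        return False                      # P' must be a subset of P
    covered = set().union(*(path_activities[p] for p in chosen))
    return len(covered) >= bound          # activity coverage of P' >= B


# Hypothetical instance (illustration only).
path_activities = {
    "A-B": {1, 2},
    "B-C": {2, 3},
    "A-B-C": {1, 2, 3},
    "D-E": {4, 5},
}

accepted = verify_snas(path_activities, ["A-B-C", "D-E"], k=2, bound=5)
rejected = verify_snas(path_activities, ["A-B", "B-C"], k=2, bound=5)
```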

12 Step 2: Select a known NP-Complete Problem
Maximum Coverage
Input
- Sets s1, s2, …, sm over elements E = {e1, e2, …, en} (the sets may have some elements in common)
- A number k < min(m, n)
Output
- k sets such that the maximum number of elements is covered, i.e., the union of the selected sets has maximal size

13 Step 3: Construct a transformation f from Π’ to Π (1/3)
- Known NP-complete problem (Maximum Coverage) → polynomial transformation → new problem (SNAS)
- Solution to the new problem → solution to the NP-complete problem

14 Step 3: Construct a transformation f from Π’ to Π (2/3)
Maximum Coverage input to SNAS input
- Impose a total order, TO, on the n elements E = {e1, e2, …, en}
- Convert each element in E into a node with one activity
- Convert each set si to a path pi
  - Sort the elements in si using TO
  - Add edge (eij, eij+1) ∀ j ∈ 1 … |si| − 1
Example
- Maximum Coverage: E = {e1, e2, e3, e5, e6}, K = 2, S1 = {e1, e2}, S2 = {e2, e3}, S3 = {e1, e2, e3}, S4 = {e5, e6}
- KMR: P = {(e1→e2), (e2→e3), (e1→e2→e3), (e5→e6)}, K = 2
- Activity = {a1, a2, a3, a5, a6}
- Activity node = {a1–e1, a2–e2, a3–e3, a5–e5, a6–e6}
- Candidate solutions: (e1→e2→e3), (e5→e6)
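The input conversion can be sketched in a few lines. The function name is hypothetical, and Python's default string ordering stands in for the total order TO, which is an assumption that happens to work for labels e1 to e6.

```python
def max_coverage_to_snas(sets):
    """Turn a Maximum Coverage instance into nodes, edges, and paths.

    sets: set name -> collection of element labels.
    Each element becomes a node carrying exactly one activity; each set
    becomes a path that visits its elements in sorted order.
    """
    nodes = sorted(set().union(*sets.values()))
    node_activities = {e: 1 for e in nodes}
    edges = set()
    paths = {}
    for name, members in sets.items():
        ordered = tuple(sorted(members))         # sort elements using TO
        paths[name] = ordered
        edges.update(zip(ordered, ordered[1:]))  # edge between consecutive elements
    return nodes, node_activities, edges, paths


# The example instance from the slide.
sets = {
    "S1": {"e1", "e2"},
    "S2": {"e2", "e3"},
    "S3": {"e1", "e2", "e3"},
    "S4": {"e5", "e6"},
}

nodes, node_activities, edges, paths = max_coverage_to_snas(sets)
```

Every step is a sort or a single pass over the sets, so the conversion runs in polynomial time.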

15 Step 3: Construct a transformation f from Π’ to Π (3/3)
SNAS output to Maximum Coverage output
- For each of the K routes, Ri, produced by SNAS, convert the activities on the route into elements and form a set Si
Example
- Given the K routes (e1→e2→e3), (e5→e6): S1 = {e1, e2, e3}, S2 = {e5, e6}
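The reverse mapping is equally mechanical. In this sketch each route is a tuple of node labels, and because every node carries exactly one activity in the constructed instance, a route's nodes stand in for its activities; the function name is hypothetical.

```python
def routes_to_sets(routes):
    """Convert each SNAS route back into a Maximum Coverage set."""
    return [set(route) for route in routes]


routes = [("e1", "e2", "e3"), ("e5", "e6")]
selected_sets = routes_to_sets(routes)
covered_elements = set().union(*selected_sets)
```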

16 Related Work
Network summarization by grouping/clustering
- Zero or one routes
- Multiple routes
- Clumping (Okabe), e.g., NT-VCM (Shiode)
- Max. subgraph, e.g., path, tree (Buchin)
- Our work

17 Contributions
- K-Main Routes (KMR) algorithm
  - Finds a set of k routes to group activities
  - New design decisions added: Network Voronoi activity assignment, divide and conquer summary path recomputation
- Spatial network activity summarization is shown to be NP-complete
- Analytically demonstrate correctness of design decisions and show cost analysis
- Experimental evaluation of the various algorithms
  - Performance evaluated using synthetic and real-world datasets
  - Case study comparing KMR with geometry-based summarization

18 K-Main Routes (KMR) Algorithm
Example (figure): P = the set of shortest paths, K = 2
K-Main Routes Algorithm
1. Select k paths as initial summary paths
2. Repeat
   - Form k clusters by assigning each activity to its closest summary path
   - Recompute the summary path of each cluster
3. Until summary paths do not change
Design decisions
- Inactive node pruning
- Network Voronoi activity assignment
- Divide and conquer summary path recomputation
The lower left graph shows 2 active nodes, N7 and N8. With inactive node pruning, we only need to calculate and store shortest paths between these 2 nodes, as opposed to calculating and storing shortest paths between all the nodes in the graph.
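The assign-and-recompute loop can be sketched with the two steps passed in as functions. The toy below is an assumption made to keep the example short: the network is a straight line of integer-labelled nodes, a path is an interval (lo, hi), and all names are hypothetical. It illustrates the control flow only, not the authors' implementation.

```python
def k_main_routes(initial_paths, assign, recompute, max_iterations=100):
    """Iterate assignment and recomputation until summary paths stop changing."""
    summary = list(initial_paths)
    clusters = assign(summary)
    for _ in range(max_iterations):
        new_summary = [recompute(c, s) for c, s in zip(clusters, summary)]
        if new_summary == summary:        # summary paths did not change
            break
        summary = new_summary
        clusters = assign(summary)
    return summary, clusters


# Toy setting: nodes 0..9 on a line, one activity at each listed node.
activities = [0, 1, 2, 7, 8, 9]


def assign(summary):
    """Assign each activity to its closest interval (ties go to the first)."""
    clusters = [[] for _ in summary]
    for a in activities:
        dists = [max(lo - a, 0, a - hi) for lo, hi in summary]
        clusters[dists.index(min(dists))].append(a)
    return clusters


def recompute(cluster, old_path):
    """On a line, the path covering the most activities spans the cluster."""
    return (min(cluster), max(cluster)) if cluster else old_path


summary, clusters = k_main_routes([(0, 1), (8, 9)], assign, recompute)
```

Passing `assign` and `recompute` as parameters mirrors how the design decisions on the following slides swap in faster versions of each step without changing the outer loop.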

19 Design Decision: Inactive Node Pruning
- Only consider paths between active nodes
- The optimal solution will still be in this set
- Given the set of shortest paths: 20 shortest paths calculated and stored versus 30
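The 20 versus 30 figure is consistent with counting ordered node pairs in a six-node network with five active nodes, as in the Key Concepts example. That reading is an assumption, as are the node labels in this sketch.

```python
from itertools import permutations


def ordered_pair_count(nodes):
    """Number of (source, destination) pairs with source != destination."""
    return sum(1 for _ in permutations(nodes, 2))


all_nodes = ["A", "B", "C", "D", "E", "F"]
active_only = ["A", "B", "C", "D", "E"]   # F is inactive and is pruned

paths_without_pruning = ordered_pair_count(all_nodes)
paths_with_pruning = ordered_pair_count(active_only)
```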

20 Design Decision: Network Voronoi (NV) Activity Assignment
Goals
- Form k clusters by assigning each activity to its closest summary path
- Improve execution time of the current assignment strategy
Example (execution trace): next slides
K-Main Routes Algorithm (current)
1. Select k shortest paths as initial summary paths
2. Repeat
   - Form k clusters by assigning each activity to its closest summary path
   - Recompute the summary path of each cluster
3. Until summary paths do not change
K-Main Routes Algorithm (with Network Voronoi activity assignment)
1. Select k shortest paths as initial summary paths
2. Repeat
   - Network Voronoi activity assignment
   - Recompute the summary path of each cluster
3. Until summary paths do not change

21 Design Decision: Network Voronoi (NV) Activity Assignment
Execution trace (figure). Nodes A to H, activities 1 to 10, virtual node X, summary paths AE and DH. Legend: activity, active node, inactive node, virtual node, summary path, edge weight = 1, edge weight = 0, closed node.
Open: X, A, E, D, H. Closed: X.

22 Design Decision: Network Voronoi (NV) Activity Assignment
Open: A, E, D, H, B. Closed: X, A. Distance check shown: 1 < 0?

23 Design Decision: Network Voronoi (NV) Activity Assignment
Open: E, D, H, B, F. Closed: X, A, E.

24 Design Decision: Network Voronoi (NV) Activity Assignment
Open: D, H, B, F, C. Closed: X, A, E, D. Distance check shown: 1 < 0?

25 Design Decision: Network Voronoi (NV) Activity Assignment
Open: H, B, F, C, G. Closed: X, A, E, D, H.

26 Design Decision: Network Voronoi (NV) Activity Assignment
Open: B, F, C, G. Closed: X, A, E, D, H, B. Distance check shown: 2 < 1?

27 Design Decision: Network Voronoi (NV) Activity Assignment
Open: F, C, G. Closed: X, A, E, D, H, B, F. Distance check shown: 2 < 1?

28 Design Decision: Network Voronoi (NV) Activity Assignment
Open: C, G. Closed: X, A, E, D, H, B, F, C. Distance check shown: 2 < 1?

29 Design Decision: Network Voronoi (NV) Activity Assignment
Network Voronoi Activity Assignment algorithm
Input: Graph G = (N, E), a set of activities A, a set of k summary paths S
Output: A set of k clusters formed by assigning every ai ∈ A to one si ∈ S, where dist(ai, si) ≤ dist(ai, sj) for all sj ∈ S with sj ≠ si
1. Open ← all nodes ∈ S, Closed ← Ø
2. Tnodes ← all nodes ∈ S
3. Tactivities ← activities on si ∈ S
4. repeat
     nc ← next node ∈ Open
     remove nc from Open
     Closed ← nc
     X ← neighbors of nc
     foreach xi ∈ X
       if xi ∉ Tnodes and xi ∉ Closed
         Tnodes ← xi
         xi.prev ← nc, xi.dist ← dist(xi, nc) + nc.dist
         xi.sp ← nc.sp
       else if xi ∈ Tnodes
         update xi if new dist < xi.dist
       if xi ∉ Open
         Open ← xi
       Y ← activities on edge {nc, xi}
       foreach yi ∈ Y
         if yi ∉ Tactivities
           Tactivities ← yi
           yi.prev ← nc
           yi.dist ← xi.dist
           yi.sp ← xi.sp
         else
           update yi if new dist < yi.dist
   until all active nodes ∈ Closed
   return currentClusters
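One way to realize this assignment is a multi-source Dijkstra search, where seeding every summary-path node at distance 0 plays the role of the virtual node X with zero-weight edges. The sketch below is written under that assumption and is not the authors' code. It assumes a connected network, assigns an edge activity by its nearer endpoint (ignoring its position along the edge), and uses a hypothetical 2×4 grid rather than the exact network in the trace.

```python
import heapq


def network_voronoi_assign(adj, summary_paths, activities):
    """Assign each activity to the index of its closest summary path.

    adj:           node -> list of (neighbor, weight)
    summary_paths: list of node lists
    activities:    activity id -> (u, v); u == v means the activity is on a node
    """
    dist, owner, heap = {}, {}, []
    for idx, path in enumerate(summary_paths):
        for n in path:
            if n not in dist:             # seed at distance 0 (virtual node X)
                dist[n], owner[n] = 0, idx
                heap.append((0, n))
    heapq.heapify(heap)
    closed = set()
    while heap:
        d, n = heapq.heappop(heap)
        if n in closed:
            continue                      # stale queue entry
        closed.add(n)
        for m, w in adj[n]:
            if m not in dist or d + w < dist[m]:
                dist[m], owner[m] = d + w, owner[n]
                heapq.heappush(heap, (d + w, m))
    assignment = {}
    for a, (u, v) in activities.items():
        nearest = u if dist[u] <= dist[v] else v
        assignment[a] = owner[nearest]
    return assignment


# Hypothetical 2x4 grid with unit weights: A-B-C-D on top, E-F-G-H below.
edge_list = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G"),
             ("G", "H"), ("A", "E"), ("B", "F"), ("C", "G"), ("D", "H")]
adj = {n: [] for n in "ABCDEFGH"}
for u, v in edge_list:
    adj[u].append((v, 1))
    adj[v].append((u, 1))

summary_paths = [["A", "E"], ["D", "H"]]
activities = {"a1": ("A", "E"), "a2": ("D", "H"), "a3": ("A", "B"),
              "a4": ("E", "F"), "a5": ("G", "H"), "a6": ("C", "D"),
              "a7": ("C", "C")}

assignment = network_voronoi_assign(adj, summary_paths, activities)
```

A single search labels every node with its nearest summary path, so the cost no longer depends on computing a separate distance for every activity and cluster pair.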

30 Design Decision: Divide and Conquer Summary Path Recomputation
Goals
- Recompute the summary path of each cluster
- Improve execution time of the current recomputation strategy
Example (execution trace): next slide
K-Main Routes Algorithm (current)
1. Select k shortest paths as initial summary paths
2. Repeat
   - Network Voronoi activity assignment
   - Recompute the summary path of each cluster
3. Until summary paths do not change
K-Main Routes Algorithm (with divide and conquer summary path recomputation)
1. Select k shortest paths as initial summary paths
2. Repeat
   - Network Voronoi activity assignment
   - Divide and conquer summary path recomputation
3. Until summary paths do not change

31 Design Decision: Divide and Conquer Summary Path Recomputation
Summary Path Recomputation Algorithm
Input: Graph G = (N, E), a set of clusters, C
Output: A set of summary paths, S, where si ∈ S has max coverage for ci ∈ C and si ∈ ci
nextClusters ← Ø
foreach ci ∈ C
  X ← active nodes of ci
  maxP ← Ø
  foreach xi ∈ X
    foreach xj ∈ X
      if (i ≠ j)
        cP ← getSP(xi, xj)
        if (maxP = Ø) maxP ← cP
        if (maxP.activities < cP.activities) maxP ← cP
  if (maxP ≠ ci.summaryPath) nextClusters ← maxP
  else nextClusters ← ci.summaryPath
return nextClusters
Figure: nodes A to H, activities 1 to 10. Legend: activity, active node, inactive node, summary path, cluster. Edge weights are 1.
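The per-cluster search can be sketched as follows: for one cluster, try the shortest path between every pair of its active nodes and keep the one covering the most of that cluster's activities. This is a sketch under assumptions, not the authors' code. It uses breadth-first search because edge weights are 1, assumes a connected network, and starts from the current summary path so that ties keep it; all names and the four-node line network are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, src, dst):
    """BFS shortest path for unit edge weights; returns a list of nodes."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        n = queue.popleft()
        if n == dst:
            break
        for m in adj[n]:
            if m not in prev:
                prev[m] = n
                queue.append(m)
    path, n = [], dst
    while n is not None:
        path.append(n)
        n = prev[n]
    return path[::-1]


def coverage(path, cluster_activities):
    """Count the cluster's activities lying on the path's nodes or edges."""
    on_path = set(path)
    path_edges = {frozenset(e) for e in zip(path, path[1:])}
    count = 0
    for u, v in cluster_activities.values():
        if u == v:
            count += u in on_path
        else:
            count += frozenset((u, v)) in path_edges
    return count


def recompute_summary_path(adj, cluster_activities, current):
    """Best-covering shortest path between active nodes of one cluster."""
    active = sorted({n for loc in cluster_activities.values() for n in loc})
    best, best_cov = current, coverage(current, cluster_activities)
    for u, v in combinations(active, 2):
        candidate = shortest_path(adj, u, v)
        cov = coverage(candidate, cluster_activities)
        if cov > best_cov:
            best, best_cov = candidate, cov
    return best


# Hypothetical cluster on a line network A-B-C-D.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
cluster_activities = {"a1": ("A", "B"), "a2": ("B", "C"), "a3": ("C", "C")}

new_path = recompute_summary_path(adj, cluster_activities, current=["A", "B"])
```

Restricting the candidate endpoints to one cluster's active nodes is what keeps the search small compared with scanning every node pair in the network.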

32 Validation
- Analytical
  - Cost analysis explaining computational savings
- Experimental
  - Comparative analysis of KMR with various design decisions
  - Performed on real and synthetic data
  - Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  - Savings increase with the number of nodes, routes, and activities, and with the active node ratio
- Case studies
  - Qualitatively show the usefulness of network-based summarization on crime data

33 Analytical Evaluation: Computational Analysis
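KMR execution time = number of iterations × (activity assignment cost + summary path recomputation cost)

$$T_{KMR} = I \times \left( \left[ K \times |A| \times \mathrm{cost}(a_i, c_i) \right] + \left[ K \times d_c \times |N|^2 \right] \right)$$

$$T_{KMR\_I} = I \times \left( \left[ K \times |A| \times \mathrm{cost}(a_i, c_i) \right] + \left[ K \times d_c \times (|N| \times r)^2 \right] \right)$$

$$T_{KMR\_IAS} = I \times \left( \left[ |E| + |N| \times \log |N| \right] + \left[ K \times d_c \times (|N|/K \times r)^2 \right] \right)$$

- I = number of iterations
- K = number of clusters
- A = set of activities
- cost(a_i, c_i) = cost of calculating the distance between activity a_i and cluster c_i
- d_c = cost of looking up a path
- N = set of nodes
- E = set of edges
- r = active node ratio, 0 ≤ r ≤ 1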
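Plugging numbers into the three cost expressions shows how the terms compare. The sketch assumes that the two bracketed terms of the third expression are added, like the other two, and that the logarithm is base 2; the parameter values and function names are illustrative assumptions, and the results are abstract cost units rather than measured times.

```python
import math


def t_kmr(I, K, A, cost, dc, N):
    """Baseline: pairwise assignment plus recomputation over all nodes."""
    return I * (K * A * cost + K * dc * N ** 2)


def t_kmr_i(I, K, A, cost, dc, N, r):
    """With inactive node pruning: only the r * |N| active nodes are considered."""
    return I * (K * A * cost + K * dc * (N * r) ** 2)


def t_kmr_ias(I, K, dc, N, E, r):
    """Third expression: graph-search assignment plus per-cluster recomputation."""
    return I * ((E + N * math.log2(N)) + K * dc * (N / K * r) ** 2)


# Hypothetical parameter values (illustration only).
params = dict(I=10, K=2, A=1200, cost=1, dc=1, N=1000)
baseline = t_kmr(**params)
pruned = t_kmr_i(**params, r=0.2)
all_decisions = t_kmr_ias(I=10, K=2, dc=1, N=1000, E=3000, r=0.2)
```

With r = 1 the pruned expression reduces to the baseline, which matches the intuition that pruning saves nothing when every node is active.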

34 Experimental Evaluation
Goal: comparative analysis
Candidates: KMR with various design decisions
- KMR_I: KMR with inactive node pruning
- KMR_IV: KMR with inactive node pruning and Network Voronoi activity assignment
- KMR_ID: KMR with divide and conquer summary path recomputation
- KMR_IVD: KMR with all three design decisions
Variables: #Nodes, #Routes, #Activities, Active Node Ratio
Measure: CPU time (Unix time command)
Implementation: Java-based simulator
Platform: Mac Pro, 2 x Xeon Quad Core 2.26 GHz, 16 GB RAM
Fixed parameters: unit edge length
Datasets: Synthetic and Real (Haiti Earthquake)

35 Data Description and Characteristics
Synthetic data
- 2010 Census TIGER/Line® Shapefiles used for the road network
- Activities randomly assigned to each edge
Real-world data: Haiti data set
- Geospatial and temporal dataset describing recent events post-disaster
- Collected from Jan 12, 2010 to March 23, 2010
- 1,677 records
- Attributes: Incident Title (e.g., “Food, Water, Tents needed…”), Incident Date and Time, Location (city, port name), Category (numeric category), Latitude/Longitude
- Sources: Crisis Map of Haiti, OpenStreetMap

36 Effect of Number of Nodes
Synthetic data set: Number of Activities = 1200, Active Node Ratio = 0.2, K = 2
Real data set: Number of Activities = 1206, Active Node Ratio = (value missing in transcript), K = 2
Trends
- Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
- Savings increase with the number of nodes

37 Effect of Number of Routes, K
Synthetic data set: Number of Nodes = 1000, Number of Activities = 1200, Active Node Ratio = 0.2
Real data set: Number of Nodes = 1000, Number of Activities = 202, Active Node Ratio = 0.219
Trends
- Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
- Savings increase with the number of routes

38 Effect of Number of Activities
Synthetic data set: Number of Nodes = 1000, Active Node Ratio = 0.2, K = 2
Trends
- Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
- Savings increase with the number of activities

39 Effect of Active Node Ratio
Synthetic data set: Number of Nodes = 1000, Number of Activities = 1200, K = 2
Trends
- Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
- Savings increase with the active node ratio

40 Case Study: Crime Analysis
Figure panels:
- Input (a set of crime incidents, k = 5)
- KMR output
- CrimeStat K-Means (Euclidean distance)
- CrimeStat K-Means (network distance)

43 Summary
- Spatial network activity summarization was shown to be NP-complete
- K-Main Routes (KMR) algorithm and its design decisions described
  - Inactive node pruning
  - Network Voronoi activity assignment
  - Divide and conquer summary path recomputation
- Analytically demonstrated correctness of design decisions and showed cost analysis
- Experimental evaluation
  - Performance evaluated using synthetic and real-world datasets
  - Case study comparing KMR with geometry-based summarization

44 Future Work
Short term
- Usefulness
  - When is it useful to domain professionals (crime analysts, emergency managers)?
  - For which use cases is the proposed solution appropriate?
  - For which geographies is the proposed solution appropriate?
- Distance-based objective function instead of coverage-based
- Overlapping paths
Long term
- Dynamically changing incidents
- Edge lengths, e.g., activities on a small section of a long edge

45 Acknowledgements
- Members of the Spatial Database and Spatial Data Mining Research Group, University of Minnesota, Twin Cities
- This work was supported by grants from USARMY and USDOD
Thank you for your time! Any questions or comments?

