
1 Location-Based Services & Continuous kNN Query Processing Tai Do Data Systems Group, UCF Fall 2005

2 Outline Introduction to data management in mobile computing. Discussion of Location-Based Services and their enabling technologies. In-depth discussion of continuous kNN queries (two recent papers, [MHP05] and [XMA05]).

3 Data Management in Mobile Computing Our interest: application-driven research that involves data management in mobile computing. Services/applications that inspire data management research: Location-based services, Transactional services, Data mining applications. Research problems to support these novel services efficiently: Spatiotemporal Query Processing. Data dissemination over limited bandwidth channels. Data consistency guarantees. Advanced interfaces for mobile computers.

4 Location-Based Services (LBS) Location-Based Services can be defined as services that integrate a mobile device's location or position with other information so as to provide added value to the user. Examples: military and government use; emergency services (E911 in the US, 112 in Europe); the commercial sector: Advanced Traveler Information Systems (DoT), location-aware games, advertising services. Commercial potential of LBS ([S03]): optimistic prediction: $4B by 2002, $81.9B by 2005 (Europe only); pessimistic prediction: $11M by 2002, $167M by 2005 (USA only). Enabling technologies: mobile positioning methods, location update techniques, location-based query processing.

5 Mobile Positioning GPS (Global Positioning System): accuracy down to about 3 meters. Cell-ID (Europe): accuracy of 100 m to 3 km. See the overview of LBS applications and the level of accuracy each requires in [SV04].

6 Location Update Techniques Dead-Reckoning Location Update Policies ([GS05]) Periodic Updates

7 Concept of Uncertainty Uncertainty is an inherent feature of databases storing location information. Sources of uncertainty: mobile positioning methods and location update techniques. Capturing uncertainty in the data model and query language is an ongoing research area.

8 Location-Based Queries Two kinds of location-based queries: snapshot queries ("Tell me the 3 nearest cars around me now") and continuous queries ("Monitor the 3 nearest restaurants around me over the next 10 minutes"). We focus on continuous kNN (CkNN) query processing. Main-memory solution: Conceptual Partitioning Model (CPM) [MHP05]. Disk-based solution: Shared Execution Algorithm (SEA-CNN) [XMA05].

9 Common Assumptions The design space (parameter: possible values):
Underlying network: unconstrained (Euclidean) vs. transportation network (shortest path).
Movement pattern: unpredictable vs. trajectory-based.
Location update: query-aware (safe region) vs. query-blind (periodic or dead reckoning).
Mutability: moving queries over static objects, static queries over moving objects, or moving queries over moving objects.
Processing type: distributed vs. centralized.
Storage: disk-resident vs. main memory.

10 SEA-CNN: Overview Objects are stored on disk; everything else is in memory. Centralized processing. Supports all kinds of mutability between objects and queries. No assumed movement pattern; open (Euclidean) space. Goal: minimize I/O cost and CPU time. Two important features: incremental evaluation of queries, and shared execution.

11 SEA-CNN: Data Structures

12 SEA-CNN: Incremental Search
Key points: for each query q, define a search region based on its past answer and the recent movements of q and of the objects; only objects inside the search region are checked against q.
Let q.AR_t0 be the answer radius of q at time t0 (the distance from q to its k-th NN object). At time t1, the search radius q.SR_t1 is computed as follows:
Step 1: if any object update falls inside q.AR_t0 during [t0, t1], set q.SR_t1 = q.AR_t0; otherwise set q.SR_t1 = 0.
Step 2: if any object that was inside q.AR_t0 moved out of it during [t0, t1], set q.SR_t1 to the distance from q to the farthest such object.
Step 3: if q itself moved during [t0, t1]:
  if q.SR_t1 = 0, then q.SR_t1 = q.AR_t0 + |q.Loc_t1 - q.Loc_t0|;
  otherwise q.SR_t1 = q.SR_t1 + |q.Loc_t1 - q.Loc_t0|.
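The three steps translate into a small routine. Below is a minimal Python sketch (not the paper's actual code), assuming each object that issued an update during [t0, t1] reports an (old location, new location) pair; all names and parameters are illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two 2-D points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def search_radius(q_loc_t0, q_loc_t1, answer_radius_t0, moved_objects):
    """Compute a query's SEA-CNN search radius at time t1.

    q_loc_t0, q_loc_t1 : query location at t0 and t1
    answer_radius_t0   : q.AR at t0 (distance from q to its k-th NN)
    moved_objects      : list of (old_loc, new_loc) pairs for objects that
                         issued a location update during [t0, t1]
    """
    sr = 0.0
    # Step 1: some object update falls inside the old answer region.
    if any(euclidean(q_loc_t0, new) <= answer_radius_t0
           for _, new in moved_objects):
        sr = answer_radius_t0
    # Step 2: an object that was inside the answer region left it; the
    # search radius must reach its new position.
    for old, new in moved_objects:
        if (euclidean(q_loc_t0, old) <= answer_radius_t0
                and euclidean(q_loc_t0, new) > answer_radius_t0):
            sr = max(sr, euclidean(q_loc_t0, new))
    # Step 3: the query itself moved; enlarge the radius by its displacement.
    if q_loc_t1 != q_loc_t0:
        shift = euclidean(q_loc_t0, q_loc_t1)
        sr = answer_radius_t0 + shift if sr == 0.0 else sr + shift
    return sr
```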

13 SEA-CNN: Incremental Search (An Example)
Q1: O_5 and Q_1 move during [T0, T1], so Q1.SR_T1 = Q1.AR_T0 + |Q1.Loc_T1 - Q1.Loc_T0|.
Q2: O_8 moves out of Q2.AR_T0 during [T0, T1], so Q2.SR_T1 = |Q2.Loc_T1 - O8.Loc_T0|.

14 SEA-CNN: Shared Execution Key points: Utilize shared execution to reduce repeated I/O operations. Group similar queries together. Evaluating this set of queries is reduced to a spatial join between the objects and the queries.
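To make the shared-execution idea concrete, here is a hedged Python sketch that joins many queries against an in-memory grid in one pass. The layout (a dict of cells, query records with hypothetical 'center', 'k', and 'search_cells' fields) is an assumption for illustration only; the real SEA-CNN join runs over the disk-resident grid, which is where the I/O savings come from.

```python
import math
from collections import defaultdict

def shared_evaluation(grid_cells, queries):
    """Join all queries against the object grid in a single pass.

    grid_cells : dict cell_id -> list of (obj_id, (x, y)) stored in that cell
    queries    : dict query_id -> {'center': (x, y), 'k': int,
                                   'search_cells': iterable of cell ids}
    Returns dict query_id -> list of (obj_id, dist): the k closest candidates
    found inside the query's search region.
    """
    # Invert the relation: cell id -> queries interested in that cell.
    interested = defaultdict(list)
    for qid, q in queries.items():
        for cell_id in q['search_cells']:
            interested[cell_id].append(qid)

    # Scan each cell once, instead of once per query (the shared execution).
    candidates = defaultdict(list)
    for cell_id, objects in grid_cells.items():
        for qid in interested.get(cell_id, ()):
            cx, cy = queries[qid]['center']
            for obj_id, (x, y) in objects:
                candidates[qid].append((obj_id, math.hypot(x - cx, y - cy)))

    # Keep the k nearest candidates per query.
    return {qid: sorted(cands, key=lambda t: t[1])[:queries[qid]['k']]
            for qid, cands in candidates.items()}
```

The benefit is that each cell is fetched once per evaluation cycle, rather than once per query whose search region covers it.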

15 SEA-CNN: Algorithm

16 CPM: Overview Overview: Objects and queries are stored in memory. Centralized processing. Support all kinds of mutability between objects and queries. No movement pattern, in open space. Goal: Minimize CPU time. Important features: Conceptual Partitioning Simulate traditional kNN search (using branch-and-bound search with breadth-first (or best-first) traversal) Roadmap: Initial NN Computation (conceptual partitioning + branch and bound search + breadth-first traversal) Handling Updates

17 CPM: Data Structures

18 CPM: NN Computation (Conceptual Partitioning) What is conceptual partitioning (CP)? A partitioning of the grid cells into rectangles based on their proximity to the query cell; each rectangle has a direction and a level. Why CP? It gives a natural processing order of the cells and facilitates NN search (only a minimal set of cells is searched).

19 CPM: NN Computation (Algorithm by Example)
The search heap H is always kept sorted by mindist to the query.
Deheap c_4 (the query cell): it is empty, so do nothing.
Deheap U_0: insert the cells of U_0, then insert the next rectangle U_1.
Continue deheaping until the first candidate p_1 is found: best_dist = dist(p_1, q) = 1.7.
Continue until c_{2,4} is deheaped and p_2 is found: best_dist = dist(p_2, q) = 1.3.
Terminate, because the next entry in the heap has mindist >= best_dist.
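The traversal above can be sketched as a generic best-first search over cells and conceptual rectangles. A minimal Python sketch follows; the helper callbacks (mindist, cells_of, next_rect, objects_in) are assumed to be supplied by the grid structure, and it finds only the first NN, without the per-query bookkeeping CPM keeps for k results and later updates.

```python
import heapq
import itertools
import math

def cpm_nn(q, query_cell, level0_rects, mindist, cells_of, next_rect, objects_in):
    """Best-first 1-NN search over the conceptual partitioning.

    q             : query point (x, y)
    query_cell    : the grid cell containing q
    level0_rects  : the level-0 rectangles surrounding query_cell
    mindist(e)    : minimum distance from q to a cell or rectangle e
    cells_of(r)   : the grid cells covered by rectangle r
    next_rect(r)  : the next-level rectangle in the same direction (or None)
    objects_in(c) : list of (obj_id, (x, y)) stored in cell c
    """
    tie = itertools.count()                       # heap tie-breaker
    heap = [(0.0, next(tie), 'cell', query_cell)]
    for r in level0_rects:
        heapq.heappush(heap, (mindist(r), next(tie), 'rect', r))

    best_dist, best_nn = math.inf, None
    # Stop as soon as the closest unvisited entry cannot beat the best found.
    while heap and heap[0][0] < best_dist:
        _, _, kind, entry = heapq.heappop(heap)
        if kind == 'rect':
            # Expand the rectangle: enqueue its cells and its successor.
            for c in cells_of(entry):
                heapq.heappush(heap, (mindist(c), next(tie), 'cell', c))
            nxt = next_rect(entry)
            if nxt is not None:
                heapq.heappush(heap, (mindist(nxt), next(tie), 'rect', nxt))
        else:
            # Visit a cell: test its objects against the current best.
            for obj_id, (x, y) in objects_in(entry):
                d = math.hypot(x - q[0], y - q[1])
                if d < best_dist:
                    best_dist, best_nn = d, obj_id
    return best_nn, best_dist
```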

20 CPM: Handling Updates Key points: focus on moving objects and static queries (moving queries are treated as new queries). Re-examine only queries whose influence regions overlap with updated cells. Re-compute affected queries incrementally, based on bookkeeping information, to save computation time.

21 CPM: Handling Updates (Algorithm by Example)
p_2 moves from c_{2,4} to c_{0,6}.
c_{2,4} has q in its influence list and dist(q, p_2') > best_NN = dist(q, p_2), so q is marked as an affected query.
c_{0,6} has an empty influence list, so the incoming update is ignored.
q's NN is then re-computed with the NN re-computation algorithm:
  Input: grid G, affected query q. Output: new NN for q.
  /* Similar to the initial NN computation; reuses the bookkeeping information in visit_list and the search heap. */
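A simplified sketch of the update-filtering step, under assumed data layouts (influence lists as a dict of cell to query ids, per-query best_dist and result set). It flags a query for re-computation in both the outgoing and incoming cases; CPM itself can absorb an incoming closer object without a full re-search, so treat this as an approximation of the paper's logic rather than the algorithm itself.

```python
import math

def affected_queries(update, influence_list, queries):
    """Decide which monitored queries must be re-examined after one update.

    update         : (obj_id, old_cell, new_cell, new_loc)
    influence_list : dict cell_id -> set of query ids whose influence region
                     covers that cell
    queries        : dict query_id -> {'loc': (x, y), 'best_dist': float,
                                       'result': set of obj_ids}
    """
    obj_id, old_cell, new_cell, new_loc = update
    affected = set()

    def dist_to(qid):
        qx, qy = queries[qid]['loc']
        return math.hypot(new_loc[0] - qx, new_loc[1] - qy)

    # Outgoing case: a current answer object moved beyond the k-th NN distance.
    for qid in influence_list.get(old_cell, ()):
        if obj_id in queries[qid]['result'] and dist_to(qid) > queries[qid]['best_dist']:
            affected.add(qid)

    # Incoming case: the object moved close enough to matter for the answer.
    for qid in influence_list.get(new_cell, ()):
        if dist_to(qid) < queries[qid]['best_dist']:
            affected.add(qid)

    # Each affected query is then re-computed incrementally (CPM reuses the
    # visit_list and search heap kept per query).
    return affected
```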

22 SEA-CNN & CPM: A Comparison Common features between the two: Performance metrics: Use query processing time (or CPU time) at the centralized server as the primary metric. Ignore communication cost. Employ Grid-based Indexing (simple, fast maintenance). Keep a search region for each query to handle updates. Are the differences significant? CPM saves some computations over SEA-CNN (as shown in the CPM paper) because CPM uses an optimal search algorithm. However, is saving in CPU time still very important?

23 Summary Monitoring queries to support LBS has been an intensive research area over the past few years. The short-term trend seems to be proposals of new, more advanced query types (our next presentation will discuss Reverse NN and Group NN). A long-term direction is Moving Object Databases; the "Moving Objects Databases" textbook is recommended to gain perspective (location-management perspective vs. spatio-temporal data perspective). Many LBS-based commercial products exist: Verilocation, uLocate, meetro, EarthComber, CellSpotting. Standards and development software: Natural Area Coding System, Mobile Location Services Reference Architecture by Sun. For updated LBS info, try LBSZone.

24 References
{B99} D. Barbara. Mobile Computing and Databases: A Survey. IEEE Transactions on Knowledge and Data Engineering, 11(1), 108-117, 1999.
{S03} http://www.wirelessdevnet.com/features/nacjan03/
{GS05} R. H. Guting, M. Schneider. Moving Object Databases. Book.
{SV04} J. Schiller, A. Voisard. Location-Based Services. Book.
{MHP05} K. Mouratidis, M. Hadjieleftheriou, D. Papadias. Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring. SIGMOD 2005.
{YPK05} X. Yu, K. Pu, N. Koudas. Monitoring K-Nearest Neighbor Queries Over Moving Objects. ICDE 2005.
{XMA05} X. Xiong, M. Mokbel, W. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. ICDE 2005.
{CDT+00} J. Chen, D. J. DeWitt, F. Tian, Y. Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. SIGMOD 2000.
{CF02} S. Chandrasekaran, M. J. Franklin. Streaming Queries over Streaming Data. VLDB 2002 (the PSoup system).

25 Note Due date of your presentation slides is November 14 2005.

26 Aggregate NN Queries in Spatial Databases and Location-based Services Tai Do Data Systems Group, UCF November 11, 2005

27 Outline Aggregate Nearest Neighbor (ANN) queries: Introduction to ANN. Solutions for Group Nearest Neighbor (GNN) Queries, a specific type of ANN. Solutions for Continuous Group Nearest Neighbor Queries (CGNN).

28 Aggregate NN: Examples and Applications Applications: Business decision making (construction of new facilities) Military Rescue (earliest pick-up time) Severe weather monitoring (most dangerous area)

29 Aggregate NN: Definition What is ANN? A generalized form of NN search (multiple query points instead of a single query point). Formally: given P = {p_1, ..., p_N} (a set of data points), Q = {q_1, ..., q_n} (a set of query points), and an aggregate distance function adist(p, Q) = f(|p q_1|, ..., |p q_n|), an ANN query returns the data point p with the minimum aggregate distance. Note: AkNN is defined analogously (find k >= 1 data points); we focus on ANN (k = 1). When f = sum, the ANN query is called a Group Nearest Neighbor (GNN) query.
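The definition is easy to state in code. A tiny sketch of adist and a brute-force ANN, useful as a correctness baseline rather than as one of the indexed algorithms discussed next:

```python
import math

def adist(p, Q, f=sum):
    """Aggregate distance adist(p, Q) = f(|p q_1|, ..., |p q_n|).

    f = sum gives Group NN; max ("minimize the worst case") and min are
    other common aggregates.
    """
    return f(math.hypot(p[0] - q[0], p[1] - q[1]) for q in Q)

def ann(P, Q, f=sum):
    """Brute-force ANN: the data point with minimum aggregate distance."""
    return min(P, key=lambda p: adist(p, Q, f))

# Example: with Q = {(0, 0), (4, 0)} and f = sum, the point (2, 0) wins.
print(ann([(2, 0), (5, 5), (0, 3)], [(0, 0), (4, 0)]))   # -> (2, 0)
```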

30 Group NN Queries Assumptions: query points are in memory; data points are on disk and indexed by an R-tree. Goal: minimize the extent and cost of the search (I/O and CPU time). Roadmap: three solutions: the multiple query method, the single point method, and the minimum bound method.

31 Multiple Query Method (MQM) Apply multiple conventional NN queries, then combine the results. MQM is a straightforward application of the threshold algorithm ([FLN03]): each query point incrementally visits its NN data points (1st NN, then 2nd NN, ...); compute the aggregate distance of each newly retrieved data point; repeat until we are sure the best data point has been seen. Main idea: how do we know that the aggregate distance of the best seen data point is smaller than that of every unseen data point? Answer: maintain a lower bound (threshold) on the minimum aggregate distance of the unseen data points.
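A minimal Python sketch of this loop for f = sum, with sorted lists standing in for the incremental R-tree NN search that the paper actually uses; on the example that follows it produces the same trace (p_10 retrieved first, then p_11, stopping once best_dist = 6 <= T = 6).

```python
import math
from itertools import cycle

def mqm_gnn(P, Q):
    """Multiple Query Method for Group NN (aggregate function = sum).

    Each query point consumes the data points in increasing distance order.
    T is the threshold: the sum of the distances already reached per query
    point; no unseen data point can have an aggregate distance below T.
    """
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    adist = lambda p: sum(dist(p, q) for q in Q)

    # One "incremental NN" stream per query point (here: a pre-sorted list).
    streams = [iter(sorted(P, key=lambda p: dist(p, q))) for q in Q]
    t = [0.0] * len(Q)                      # frontier distance per query point
    best_dist, best_nn = math.inf, None

    for i in cycle(range(len(Q))):          # round-robin over the query points
        p = next(streams[i], None)
        if p is None:
            break                           # stream exhausted
        t[i] = dist(p, Q[i])                # update this stream's frontier
        a = adist(p)
        if a < best_dist:
            best_dist, best_nn = a, p
        if best_dist <= sum(t):             # threshold reached: stop
            break
    return best_nn, best_dist
```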

32 MQM: An Example (1) Q = {q_1, q_2}, P = {p_1, ..., p_12}.

33 MQM: An Example (2) Initial state: t_1 = 0, t_2 = 0, T = 0, best_dist = infinity, best_NN = null. The candidate table (ID, dist(q_1), dist(q_2), sum/adist) is empty.

34 MQM: An Example (3) Step 1: find the next (1st) NN of q_1, which is (p_10, 2); update t_1 and T. Now t_1 = 2, t_2 = 0, T = t_1 + t_2 = 2 + 0 = 2.

35 MQM: An Example (4) Step 2: if the aggregate distance of the retrieved point is smaller than best_dist, update best_dist and best_NN; if best_dist <= T, stop; otherwise move to the next query point and repeat Step 1. Here p_10 has dist(q_1) = 2, dist(q_2) = 5, adist = 7; since 7 < infinity, best_dist = 7 and best_NN = p_10. best_dist = 7 > T = 2, so continue.

36 MQM: An Example (5) Step 1: find the next (1st) NN of q_2, which is (p_11, 3); update t_2 and T. Now t_1 = 2, t_2 = 3, T = t_1 + t_2 = 2 + 3 = 5. Table so far: p_10 (2, 5, 7); best_dist = 7, best_NN = p_10.

37 MQM: An Example (6) Step 2: p_11 has dist(q_1) = 3, dist(q_2) = 3, adist = 6; since 6 < 7, best_dist = 6 and best_NN = p_11. best_dist = 6 > T = 5, so continue with the next query point.

38 MQM: An Example (7) Step 1: find the next (2nd) NN of q_1, which is (p_11, 3); update t_1 and T. Now t_1 = 3, t_2 = 3, T = t_1 + t_2 = 3 + 3 = 6. Table so far: p_10 (2, 5, 7), p_11 (3, 3, 6); best_dist = 6, best_NN = p_11.

39 MQM: An Example (8) Step 2: p_11 was already retrieved and its adist (6) is not smaller than best_dist, so there is no update (best_dist = 6, best_NN = p_11). Now best_dist = 6 <= T = 6, so the algorithm STOPS: no unseen data point can have a smaller aggregate distance. The group nearest neighbor is p_11.

40 Single Point Method (SPM) Problem with MQM: multiple accesses to the same node, and the same data point (e.g., p_11) is retrieved through different query points. SPM processes the query with a single traversal. Strategy: compute the centroid q of Q, a point with small adist(q, Q); the GNN is a point of P "near" q. Challenges: (i) the computation of q; (ii) the range around q within which we should look for points of P before concluding that no better GNN can be found.

41 SPM: Illustration

42 SPM: The Computation of q

43 SPM: Finding the Range
To define the range around q, find heuristics that can safely prune R-tree nodes.
Lemma 1: for each query point q_i, |p q_i| + |q_i q| >= |p q| (triangle inequality).
Summing the n inequalities: Σ|p q_i| + Σ|q_i q| >= n·|p q|, i.e., adist(p, Q) >= n·|p q| - adist(q, Q)   (1)
Lemma 1 can be used to prune intermediate nodes: node N can be pruned if mindist(N, q) >= (1/n)·[best_dist + adist(q, Q)]   (2)
Why: rearranging (2) gives n·mindist(N, q) - adist(q, Q) >= best_dist   (3). For any p in node N, |p q| >= mindist(N, q), so n·|p q| - adist(q, Q) >= best_dist   (4). Combining (1) and (4): adist(p, Q) >= best_dist, hence node N can be safely pruned.
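Pruning rule (2) reduces to a one-line test once mindist is available. A small sketch, assuming axis-aligned node MBRs given as ((xmin, ymin), (xmax, ymax)):

```python
import math

def mindist_node(node_mbr, q):
    """Minimum distance from point q to an R-tree node's MBR."""
    (xmin, ymin), (xmax, ymax) = node_mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def spm_prune(node_mbr, q, Q, best_dist):
    """Inequality (2): node N can be skipped if
    mindist(N, q) >= (best_dist + adist(q, Q)) / n."""
    adist_q = sum(math.hypot(q[0] - qx, q[1] - qy) for qx, qy in Q)
    return mindist_node(node_mbr, q) >= (best_dist + adist_q) / len(Q)
```

Plugging in the numbers of the next slide (best_dist = 9, adist(q, Q) = 3, n = 2) gives a threshold of 6, so nodes at mindist 10 and 6 are both pruned.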

44 SPM: Pruning Illustration Both N_1 and N_2 can be pruned: best_dist = adist(best_NN, Q) = 9 and adist(q, Q) = 3, so (1/n)(best_dist + adist(q, Q)) = (1/2)(9 + 3) = 6; mindist(N_1, q) = 10 >= 6 and mindist(N_2, q) = 6 >= 6.

45 Minimum Bound Method (MBM) Like SPM, MBM performs a single query, but uses the minimum bounding rectangle M of Q (instead of the centroid q) to prune the search space. Is MBM obviously better than SPM? There is no clear a priori reason; it must be evaluated through experiments. Strategy: use good heuristics to identify the qualifying nodes.

46 Minimum Bound Method: Heuristics
Heuristic 1: a node N cannot contain qualifying points if mindist(N, M) >= (1/n)·best_dist, because for any data point p in N, adist(p, Q) >= n·mindist(N, M) >= best_dist. Heuristic 1 prunes N_1 but not N_2.
Heuristic 2: a node N can be safely pruned if Σ_i mindist(N, q_i) >= best_dist. Heuristic 2 prunes both N_1 and N_2.
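Both heuristics are simple geometric tests. A sketch, assuming nodes and the bounding rectangle M are axis-aligned rectangles ((xmin, ymin), (xmax, ymax)); heuristic 2 is the tighter (and slightly costlier) of the two.

```python
import math

def mindist_rect_point(rect, p):
    """Minimum distance from point p to an axis-aligned rectangle."""
    (xmin, ymin), (xmax, ymax) = rect
    dx = max(xmin - p[0], 0.0, p[0] - xmax)
    dy = max(ymin - p[1], 0.0, p[1] - ymax)
    return math.hypot(dx, dy)

def mindist_rects(a, b):
    """Minimum distance between two axis-aligned rectangles."""
    (axmin, aymin), (axmax, aymax) = a
    (bxmin, bymin), (bxmax, bymax) = b
    dx = max(bxmin - axmax, 0.0, axmin - bxmax)
    dy = max(bymin - aymax, 0.0, aymin - bymax)
    return math.hypot(dx, dy)

def prune_h1(node_mbr, M, n, best_dist):
    """Heuristic 1: prune N if mindist(N, M) >= best_dist / n."""
    return mindist_rects(node_mbr, M) >= best_dist / n

def prune_h2(node_mbr, Q, best_dist):
    """Heuristic 2: prune N if sum_i mindist(N, q_i) >= best_dist."""
    return sum(mindist_rect_point(node_mbr, qi) for qi in Q) >= best_dist
```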

47 Performance Study

48 Continuous Group NN Assumptions: Both query points and data points are in memory. Method: Use a grid index. Utilize conceptual partitioning of the space around query Q. Apply Minimum Bound Method.

49 Continuous GNN: Details amindist(c, Q) = Σ_{q_i in Q} mindist(c, q_i). amindist(c, Q) is a lower bound on adist(p, Q) for any data point p in cell c. The GNN computation is then similar to the NN computation presented in the previous class.
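A small sketch of amindist for a rectangular grid cell; using it as the heap key in a best-first search like the CPM sketch earlier gives the cell visiting order for continuous GNN monitoring (an assumption about how the pieces fit, not the paper's exact code).

```python
import math

def amindist(cell, Q):
    """Lower bound on adist(p, Q) for any point p inside a grid cell.

    cell : ((xmin, ymin), (xmax, ymax)) bounds of the cell
    Q    : list of (x, y) query points
    """
    (xmin, ymin), (xmax, ymax) = cell
    total = 0.0
    for qx, qy in Q:
        dx = max(xmin - qx, 0.0, qx - xmax)   # 0 if q is inside the x-range
        dy = max(ymin - qy, 0.0, qy - ymax)   # 0 if q is inside the y-range
        total += math.hypot(dx, dy)
    return total
```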

50 Summary The threshold algorithm: simple, useful, and reusable. Aggregate nearest neighbor queries in spatial databases have practical applications, and good heuristics are important. Whether ANN search can be performed optimally remains an open question.

51 References
{GS05} R. H. Guting, M. Schneider. Moving Object Databases. Book.
{PTM+05} D. Papadias, Y. Tao, K. Mouratidis, C. K. Hui. Aggregate Nearest Neighbor Queries in Spatial Databases. ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005, pages 529-576.
{MHP05} K. Mouratidis, M. Hadjieleftheriou, D. Papadias. Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring. SIGMOD 2005.
{PST+04} D. Papadias, Q. Shen, Y. Tao, K. Mouratidis. Group Nearest Neighbor Queries. ICDE 2004.
{XZ06} T. Xia, D. Zhang. Continuous Reverse Nearest Neighbor Monitoring. ICDE 2006.
[FLN03] R. Fagin, A. Lotem, M. Naor. Optimal Aggregation Algorithms for Middleware. Journal of Computer and System Sciences, 66 (2003), 614-656. www.cs.fiu.edu/~vagelis/classes/COP6727/slides/fagin.ppt (the animation for the MQM example comes from these slides).

