Efficient Processing of Top-k Spatial Preference Queries
João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg VLDB’ Seattle, USA

Outline
Top-k spatial preference queries
Current approaches
Our approach
  Mapping to distance-score space
  Query processing
  Materialization (index construction)
Experimental evaluation
Conclusion

Motivation
Increasing number of Web information systems specialized in location-based queries
Systems are limited to simple spatial queries
  Example: return objects in a given spatial location
  Limited to queries restricted to spatial constraints
Top-k spatial preference query
  Ranks data objects based on the score of feature objects in their spatial neighborhood
  Combines spatial and non-spatial scores
  This query takes into account the quality (score) of the features

Top-k spatial preference queries
Given a set of data objects and scored feature objects
Query
  Spatial neighborhood
  Features of interest (e.g., bars)
Returns
  Ranked set of k best data objects
Score of a data object
  Obtained from feature objects in its spatial neighborhood
[Figure: x-y plane with hotels p1, p2, p3, bars b1(0.9), b2(0.6), b3(0.3), and cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8); Top-1 labels appear next to p1, p2, and p3]

Score function
Aggregation of partial scores
  Any monotone function: sum, max, and min
Partial score
  Score of a data object for a set of feature objects
  Defined by the score of a single feature object
    Highest score
    Satisfies the spatial constraint
Spatial constraint
  Range, nearest neighbor, and influence
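The score function above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the slides do not give formulas, so the three partial-score definitions below (in particular the influence discount 2^(-dist/r)) follow the definitions commonly used for this query type in [1,2] and should be read as assumptions. All function names and the coordinates in the usage note are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def range_score(p, features, r):
    """Highest score among features within distance r of p (0 if none)."""
    return max((s for loc, s in features if dist(p, loc) <= r), default=0.0)

def nn_score(p, features):
    """Score of the feature nearest to p (0 if the feature set is empty)."""
    if not features:
        return 0.0
    return min(features, key=lambda f: dist(p, f[0]))[1]

def influence_score(p, features, r):
    """Highest score after discounting each feature by 2^(-dist/r) (assumed form)."""
    return max((s * 2 ** (-dist(p, loc) / r) for loc, s in features), default=0.0)

def score(p, feature_sets, partial, agg=sum):
    """Aggregate one partial score per feature set with a monotone function."""
    return agg(partial(p, fs) for fs in feature_sets)

# Hypothetical data: one hotel, two feature sets of (location, score) pairs.
hotel = (0.0, 0.0)
bars = [((1.0, 0.0), 0.9), ((3.0, 0.0), 0.6)]
cafes = [((0.0, 2.0), 0.6), ((0.0, 5.0), 0.8)]

print(score(hotel, [bars, cafes], lambda p, fs: range_score(p, fs, 2.5)))
print(score(hotel, [bars, cafes], nn_score))
print(score(hotel, [bars, cafes], lambda p, fs: influence_score(p, fs, 1.0), agg=max))
```

Passing `agg=max` or `agg=min` instead of the default `sum` swaps the aggregation without touching the partial-score functions.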

Example (agg=sum)
[Figure: three panels (range, nearest neighbor, influence) with score(p)=1.5, score(p)=1.0, and score(p)=0.6]

Current approaches
Naïve
  Compute the score of all objects, select the top-k
  Very costly
State-of-the-art [1,2]
  Data objects and feature objects are indexed by multi-dimensional indices
[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.
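To make the cost of the naïve approach concrete, here is a small sketch that scores every data object and then keeps the k best. It assumes range queries with sum aggregation; the function names and sample data are hypothetical. Every object is compared against every feature, which is why this baseline is costly.

```python
import heapq
import math

def range_score(p, features, r):
    """Highest score among features within distance r of p (0 if none)."""
    return max((s for loc, s in features if math.dist(p, loc) <= r), default=0.0)

def naive_top_k(objects, feature_sets, r, k):
    """Score all objects (sum of range partial scores), then select the top-k.

    objects: dict name -> (x, y); feature_sets: list of [((x, y), score), ...].
    Returns a list of (score, name) pairs, best first.
    """
    scored = (
        (sum(range_score(loc, fs, r) for fs in feature_sets), name)
        for name, loc in objects.items()
    )
    return heapq.nlargest(k, scored)

# Hypothetical data: two hotels, one bar set, one café set.
hotels = {"p1": (0.0, 0.0), "p2": (10.0, 0.0)}
bars = [((1.0, 0.0), 0.9)]
cafes = [((10.0, 1.0), 0.4)]
print(naive_top_k(hotels, [bars, cafes], r=2.0, k=1))
```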

Current approaches
Probing algorithms (SP and GP)
  Require computing the score for all objects
Branch and bound algorithms (BB and BB*)
  Compute an upper-bound score for the entries in the data objects R-tree
  Prune entries whose upper-bound score is smaller than the score of the k-th object found
Feature join algorithm (FJ)
  Create combinations of feature sets with high score
  Combinations whose score is smaller than the score of the k-th object found are pruned

Motivation behind our idea…
Few feature objects are necessary to compute the score of a data object
  Features not dominated by any other feature in terms of both distance and score
  A feature dominates another if it is at least as close to the data object and has at least as high a score, and is strictly better in one of the two
Nice properties
  Small size in practice
  Sufficient to support any neighborhood condition and query parameter
[Figure: x-y plane with hotel p1 and cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8)]

Our framework
Mapping to distance-score space
  Pairs of objects (p, t) with t ∈ Fi to be examined
Identify SKY(p, Fi)
  Minimum set of pairs required to compute the score of p according to Fi for any query
Materialize SKY(p, Fi)
  Stored in an R-tree, one R-tree Ri per feature set Fi
  Efficient query processing and maintenance
Query processing algorithm

Mapping to the distance-score space
Mapping
  Pairs (object, feature)
  Space [distance × score]
Skyline
  Minimize: distance
  Maximize: score
[Figure: hotels p1, p2 and cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3); each pair (p1,c1) … (p1,c4) and (p2,c1) … (p2,c4) is plotted as a point in the distance-score space]
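The mapping and the skyline can be illustrated with a short sketch. It maps each (object, feature) pair to a (distance, score) point and keeps the points that no other point beats on both criteria. The sort-and-sweep approach, the function name, and the café coordinates are assumptions made for illustration; only the café scores are taken from the figure.

```python
import math

def skyline(p, features):
    """Return the (distance, score) pairs for p that are not dominated.

    A pair is dominated if another pair is at least as close and scores
    at least as high, and is strictly better in one of the two.
    """
    # Closest first; among equally distant features, highest score first.
    pairs = sorted(
        ((math.dist(p, loc), s) for loc, s in features),
        key=lambda ds: (ds[0], -ds[1]),
    )
    result, best = [], float("-inf")
    for d, s in pairs:
        # Keep a pair only if it beats every closer (or equally close) pair on score.
        if s > best:
            result.append((d, s))
            best = s
    return result

# Hypothetical coordinates for cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3).
p1 = (0.0, 0.0)
cafes = [((4.0, 0.0), 0.9), ((2.0, 0.0), 0.7), ((3.0, 0.0), 0.5), ((1.0, 0.0), 0.3)]
print(skyline(p1, cafes))
```

With these coordinates c3 is dropped: c2 is both closer and better scored, so c3 can never determine the partial score of p1.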

Theoretical properties
SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  Maintaining SKY(p, Fi) is sufficient to answer any spatial preference query (stored in an R-tree)
SKY(p, Fi) is the minimum set required
The data required to process range queries permits processing nn and influence queries
The proofs of the theorems can be found in the paper
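The sufficiency property can be checked informally with a sketch that answers all three query types from a skyline alone. It assumes the skyline is given as (distance, score) pairs sorted by increasing distance (and therefore by increasing score), and it assumes the influence score has the form score · 2^(-dist/r), which the slides do not state. The function names and sample pairs are hypothetical; the formal argument is in the paper.

```python
from bisect import bisect_right

def range_from_skyline(sky, r):
    """Best score within distance r: the last skyline pair with distance <= r."""
    i = bisect_right([d for d, _ in sky], r)
    return sky[i - 1][1] if i else 0.0

def nn_from_skyline(sky):
    """Score of the nearest feature: the first skyline pair."""
    return sky[0][1] if sky else 0.0

def influence_from_skyline(sky, r):
    """Best discounted score; a dominated pair can never beat its dominator."""
    return max((s * 2 ** (-d / r) for d, s in sky), default=0.0)

# Hypothetical (distance, score) pairs for one object and one feature set.
all_pairs = [(1.0, 0.3), (2.0, 0.7), (3.0, 0.5), (4.0, 0.9)]
sky = [(1.0, 0.3), (2.0, 0.7), (4.0, 0.9)]  # (3.0, 0.5) is dominated by (2.0, 0.7)

for r in (0.5, 1.0, 2.5, 3.5, 10.0):
    brute = max((s for d, s in all_pairs if d <= r), default=0.0)
    print(r, range_from_skyline(sky, r), brute)
```

The loop prints the skyline-based answer next to a brute-force answer over all pairs, so the two can be compared for several radii.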

Access to partial scores
Only node entries that satisfy the spatial constraint are accessed
Items are retrieved in decreasing order of score
Minor modifications to support nn and influence
[Figure: R-tree with root entries e1 and e2; e1 holds (p3,t4), (p2,t1), (p1,t3); e2 holds (p3,t4), (p2,t4), (p3,t4); max-heap states <e1(0.8)> and <p3(0.8), p2(0.6)>]

Query processing
Compute top-k data objects progressively aggregating partial scores retrieved from Ri
  Similar to Fagin’s algorithm (NRA)
Algorithm
  Each time an object p is retrieved from Ri, any unseen object p’ in Ri has score(p’) ≤ score(p)
  Keep track of lower and upper-bound score of the seen objects
  Terminates when the lower-bound of the k-th object is better than the upper-bound of the remaining objects
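The bound bookkeeping described above can be sketched as an NRA-style loop. This is a simplified illustration rather than the authors' algorithm: it assumes sum aggregation, models each Ri as an in-memory list already sorted by decreasing partial score, treats objects missing from a list as having partial score 0 there, and stops when the k-th lower bound is at least as large as every other upper bound. The function name and sample lists are hypothetical.

```python
def nra_top_k(lists, k):
    """lists[i]: (object, partial score) pairs from Ri, sorted by descending score.

    Returns up to k (object, lower-bound score) pairs, best first.
    """
    m = len(lists)
    longest = max((len(lst) for lst in lists), default=0)
    seen = {}          # object -> {list index: partial score}
    last = [0.0] * m   # last score read from each list (0 once exhausted)
    depth = 0
    while True:
        # One round of sorted access: read the next entry of every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, s = lst[depth]
                seen.setdefault(obj, {})[i] = s
                last[i] = s
            else:
                last[i] = 0.0
        depth += 1

        # Lower bound: known partial scores. Upper bound: unknown partial
        # scores replaced by the last score read from the corresponding list.
        lower = {o: sum(ps.values()) for o, ps in seen.items()}
        upper = {o: lower[o] + sum(last[i] for i in range(m) if i not in ps)
                 for o, ps in seen.items()}
        ranked = sorted(seen, key=lower.get, reverse=True)
        top, rest = ranked[:k], ranked[k:]

        if depth >= longest:  # every list fully read: all scores are exact
            return [(o, lower[o]) for o in top]
        if len(top) == k:
            kth = lower[top[-1]]
            # sum(last) bounds the score of any object not seen yet.
            others = max([sum(last)] + [upper[o] for o in rest])
            if kth >= others:
                return [(o, lower[o]) for o in top]

# Sample lists loosely modeled on a two-feature-set range query.
R1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
R2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]
print(nra_top_k([R1, R2], k=1))
```

When the loop stops early, the returned scores are lower bounds: they identify the top-k set but may be smaller than the exact scores of objects whose partial scores were not all read.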

Example (range, r=4.5)
[Figure: hotels, restaurants, and bars; range r=4.5]
R1: p3(0.8) + R2: p1(0.9) = 1.7
Object  R1   R2   Score  Upper-bound
p3      0.8  -           1.7
p1      -    0.9         1.7

Example (range, r=4.5)
R1: p2(0.6) + R2: p2(0.6) = 1.2
Object  R1   R2   Score  Upper-bound
p3      0.8  -           1.4
p1      -    0.9         1.5
p2      0.6  0.6  1.2

Example (range, r=4.5)
R1: p1(0.2) + R2: p3(0.3) = 0.5
Object  R1   R2   Score  Upper-bound
p3      0.8  0.3  1.1
p1      0.2  0.9  1.1
p2      0.6  0.6  1.2    Top-1

Materialization
Objects are partitioned into regions
  The distance among objects in the same region is small
  The skyline set of the objects in the same region is similar with high probability
Compute SKY(R, Fi) for the region R
  SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
Advantage
  The feature set is accessed only once to compute the dynamic skyline of all objects in the region
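One way to obtain a region-level candidate set with the superset property is sketched below. This is an assumed construction, not necessarily the one used in the paper: the region is an axis-aligned rectangle, and a feature is discarded only if some other feature is strictly closer to every point of the region (its maximum distance to the rectangle is below the other's minimum distance) and scores at least as high. All names and coordinates are hypothetical.

```python
import math

def mindist(rect, pt):
    """Smallest distance from pt to any point of rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - pt[0], 0.0, pt[0] - xmax)
    dy = max(ymin - pt[1], 0.0, pt[1] - ymax)
    return math.hypot(dx, dy)

def maxdist(rect, pt):
    """Largest distance from pt to any point of rect (reached at a corner)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(abs(pt[0] - xmin), abs(pt[0] - xmax))
    dy = max(abs(pt[1] - ymin), abs(pt[1] - ymax))
    return math.hypot(dx, dy)

def region_candidates(rect, features):
    """Features that may belong to the skyline of some object inside rect."""
    kept = []
    for loc, s in features:
        dominated = any(
            maxdist(rect, loc2) < mindist(rect, loc) and s2 >= s
            for loc2, s2 in features
        )
        if not dominated:
            kept.append((loc, s))
    return kept

def skyline(p, features):
    """Non-dominated (distance, score) pairs for a single object p."""
    pairs = sorted(((math.dist(p, loc), s) for loc, s in features),
                   key=lambda ds: (ds[0], -ds[1]))
    result, best = [], float("-inf")
    for d, s in pairs:
        if s > best:
            result.append((d, s))
            best = s
    return result

# Hypothetical region and feature set.
region = (0.0, 0.0, 1.0, 1.0)
features = [((2.0, 0.5), 0.5), ((10.0, 10.0), 0.4), ((10.0, 0.0), 0.9), ((3.0, 1.0), 0.5)]
candidates = region_candidates(region, features)
print(len(candidates), skyline((0.5, 0.5), candidates))
```

The feature set is scanned once per region to build the candidates; each object in the region then derives its own skyline from the much smaller candidate list.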

Experimental evaluation
We compare our approach (SFA) against SP, GP, BB, BB*, and FJ algorithms [1,2]
All approaches are implemented in Java
Measures: response time, I/O, update time, index construction time, and index size

Variables studied
Data distribution
  Uniform (UN), Synthetic (CN), Real (RL)
Cardinality (object and features)
  50K, 100K, 200K, 400K, 800K, 1600K
Number of results (k)
  10, 20, 30, 40, 50
Number of feature sets
  1, 2, 3, 4, 5
Query range (r), for range and influence queries
  10, 40, 160, 640, 2560

Datasets
Columns: dataset, number of data objects, number of feature objects, dynamic skyline set
Wal-Mart (WM): 11K, 4K, 1.98
Hotels (HT): 31K, 4.82
Synthetic (CN): 100K, 11.26
Uniform (UN): 12.04

Number of features
[Figure: a) I/O varying the number of feature sets; b) response time varying the number of feature sets]

Scalability
[Figure: a) response time varying |Fi|; b) response time varying |O|]

Real datasets
[Figure: a) range; b) influence; c) nearest neighbor]

Conclusion
Top-k spatial preference queries are a useful tool for novel location-based applications
We propose a new approach for processing top-k spatial preference queries efficiently
  We find and materialize SKY(p, Fi)
  We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  The size of SKY(p, Fi) is small in practice
  We propose algorithms to process queries using our index
The efficiency of our approach is verified through experiments on synthetic and real datasets

Thanks!
More information: João B. Rocha-Junior
