Presentation is loading. Please wait.

Presentation is loading. Please wait.

Answering Metric Skyline Queries by PM-tree Tomáš Skopal, Jakub Lokoč Department of Software Engineering, FMP, Charles University in Prague.

Similar presentations


Presentation on theme: "Answering Metric Skyline Queries by PM-tree Tomáš Skopal, Jakub Lokoč Department of Software Engineering, FMP, Charles University in Prague."— Presentation transcript:

1 Answering Metric Skyline Queries by PM-tree Tomáš Skopal, Jakub Lokoč Department of Software Engineering, FMP, Charles University in Prague

2 Similarity search content-based similarity search single-example queries – range query – kNN query multi-example queries – combination of single-example queries is not sufficient – should support partial matching compromise – metric skyline DATESO 2010, Štědronín - Plazy

3 Metric skyline query (MSQ) traditional skyline operator – linearly order- ed attribute domains – dominance relation + MDDRs (minimum dominating-dominated rectangles) – static schema metric skyline – multi-example query (not just operator) – attributes specified at query time – ith attribute = distance of database object to ith query example Q i – result set interpretation: objects similar to all query examples yet distinct (dissimilar to each other) – dynamic schema, cannot be reduced to the classic skyline operator for efficient skyline processing i.e., the coordinate system is established at query time DATESO 2010, Štědronín - Plazy

4 Generic algorithm for a hierarchic metric index branch-and-bound algorithm (originally developed for R-tree and classic/spatial skyline operator) dynamic mapping of the metric space into L 1 vector space (examples) heuristics: data/regions processed in L 1 order guarantee no false dismissals – a priority heap is used, storing index entries equipped by MDDRs to be inspected (higher priority = lower L 1 order of MDDR) The algorithm: 0)The entry of the entire index is pushed on the heap (e.g., M-tree root node). 1)An entry with the lowest L 1 distance of its MDDR is popped from the heap. 2)If the entry contains just one data object (e.g., entry in an M-tree leaf), it is added to the skyline set, while removing all entries from the heap dominated by the entry. Jump to 1. 3)If the entry is a region (e.g., entry in an M-tree inner node), its child node is fetched. The MDDRs of the child node’s entries are checked for dominance by the already determined skyline set, while the dominated ones are filtered from further processing. 4)The MDDRs of the non-filtered child entries are derived, while those not dominated by the current skyline set are pushed into the heap. Jump to 1. DATESO 2010, Štědronín - Plazy L1L1 L1L1

5 M-tree metric index based on B+-tree inner node contains routing entries – ball regions (object and radius) + distance to parent region + pointer to subtree leaf node contains ground entries – object + distance to parent region 2 types of filtering by querying – parent filtering (cheap) stored distance to parent is used – basic filtering (expensive) distance computation needed DATESO 2010, Štědronín - Plazy

6 MSQ implementation using M-tree DATESO 2010, Štědronín - Plazy uses the generic algorithm enhanced by specific M-tree MDDRs, mapping the M-tree regions from metric space into L 1 vector space (dimensions are distances of data/regions to the query examples Q i ) 2 types of M-tree MDDR – Par-MDDR the mapped oversized region ball (using the distance to parent) – B-MDDR the mapped region ball

7 PM-tree combination of M-tree and pivot tables (LAESA) M-tree balls reduced by rings centered in global pivots P i routing and ground entries store also the ring radii enhanced filtering – cheaply in pivot space (mapping of data/balls into L ∞ vector space) mapping of the query object into the pivot space is the only extra computation costs – if not filtered out in pivot space, regular M-tree filtering DATESO 2010, Štědronín - Plazy

8 Paper contribution: PM-tree MSQ implementation B-MDDR, Par-MDDR (inherited from M-tree) Piv-MDDR – using PM-tree rings the MDDR can be tightened – for each dimension (example Q i ) the maximal lower bound and minimal upper bound distance to the region is found (to the rings intersection) pivot skyline – skyline initialized by pivots mapped to the L 1 space – heavy optimization (reduction of heap size) deferred heap processing – reinsertions into heap to save distance computations DATESO 2010, Štědronín - Plazy

9 Experiments subset of the CoPhIR database, one million 76-dimensional tuples representing 2 MPEG7 features on flickr images, Euclidean distance used Polygons database, 250k 30-dimensional tuples representing 5-15 vertex 2D polygons, Hausdorff distance used average over 200 metric skyline queries each metric skyline query defined by 2-5 query examples DATESO 2010, Štědronín - Plazy

10 Experiments DATESO 2010, Štědronín - Plazy

11 Experiments DATESO 2010, Štědronín - Plazy

12 Conclusions PM-tree based metric skyline query implementation – up to 2x faster in terms of distance computations and I/O cost (wrt original M-tree implementation) – up to 20x faster in terms of heap operations – needs up to 20x less space for the heap Thank you for your attention! Questions? DATESO 2010, Štědronín - Plazy


Download ppt "Answering Metric Skyline Queries by PM-tree Tomáš Skopal, Jakub Lokoč Department of Software Engineering, FMP, Charles University in Prague."

Similar presentations


Ads by Google