Skyline Charuka Silva

Outline
- Motivation
- Skyline Definition
- Applications
- Skyline Query
- Similar Interesting Problem
- Algorithms
  - Divide and Conquer Algorithm
  - Index based Algorithm
  - Nearest Neighbor

Trip to Nassau (Bahamas)
- We want a hotel that is cheap and close to the beach.
- The two goals conflict, since hotels near the beach tend to be more expensive.
- A travel agent can suggest all interesting hotels.
- A hotel is interesting if no other hotel is at least as good in both dimensions (price and distance) and strictly better in one of them.
- We call this set of interesting hotels the Skyline.

Distribution of Hotels

Formal Skyline Definition
The skyline is the set of points that are not dominated by any other point. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension.
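The definition can be sketched in Python. This is a minimal illustration, assuming that smaller values are better in every dimension; the function names `dominates` and `naive_skyline` and the sample hotels are hypothetical and do not come from the slides.

```python
def dominates(p, q):
    """Return True if p dominates q (smaller is better in every dimension)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(points):
    """Keep every point that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach) pairs.
hotels = [(50, 8), (80, 2), (90, 3), (60, 9), (120, 1)]
print(naive_skyline(hotels))
```

The quadratic scan is only meant to make the definition concrete; the algorithms on the later slides exist precisely to avoid comparing every pair of points.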

Where Does It Apply?
The skyline operator is important for applications involving multi-criteria decision making.

Some Applications
- Customer information systems, travel agencies, and mobile city guides. The skyline has to be recomputed as the user moves.
- The skyline of Manhattan, for instance, can be computed as the set of buildings that are high and close to the Hudson River.
- Decision support (business intelligence), e.g. customers who buy a lot and complain little.
- Data visualization, e.g. determining which points of an object are visible from a certain perspective.
- Distributed query optimization, e.g. finding the set of interesting sites that have high computation power and are close to the data needed to execute the query.

Skyline Query
    SELECT * FROM Hotels
    SKYLINE OF price MIN, distance MIN
What else: MAX, joins, GROUP BY, and so on.
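The effect of the MIN and MAX annotations can be sketched in Python. This is only an illustration under stated assumptions: the function `skyline_of`, the row layout (a list of dicts), and the sample data are hypothetical, and the sketch does not model joins or GROUP BY.

```python
def skyline_of(rows, criteria):
    """Return the rows not dominated under the given criteria.

    rows: list of dicts; criteria: list of (column, "min" or "max") pairs.
    """
    def key(row):
        # Negate MAX columns so that smaller is always better.
        return tuple(row[col] if direction == "min" else -row[col]
                     for col, direction in criteria)

    def dominates(p, q):
        # No worse everywhere and different somewhere means strictly better somewhere.
        return all(a <= b for a, b in zip(p, q)) and p != q

    keyed = [(key(row), row) for row in rows]
    return [row for k, row in keyed
            if not any(dominates(other, k) for other, _ in keyed)]


# Hypothetical Hotels table.
hotels = [
    {"name": "A", "price": 50, "distance": 8},
    {"name": "B", "price": 80, "distance": 2},
    {"name": "C", "price": 90, "distance": 3},
    {"name": "D", "price": 120, "distance": 1},
]
result = skyline_of(hotels, [("price", "min"), ("distance", "min")])
print([row["name"] for row in result])
```

Negating the MAX columns keeps a single dominance test for both directions, which is a convenience of this sketch rather than anything prescribed by the slides.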

Skyline Query Results
The result of the query is {a, i, k}.

Top-K Queries vs. Skyline
- Top-K (or ranked) queries retrieve the best K objects, i.e. those that minimize a specific preference function.
- E.g., given the preference function f(x, y) = x + y, the top-3 query retrieves the three objects with the smallest values of f, in that order.
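A top-K query under a preference function can be sketched as follows. The helper `top_k` and the sample points are hypothetical; they are not the objects shown on the slide.

```python
import heapq


def top_k(points, k, preference):
    """Return the k points with the smallest preference score, best first."""
    return heapq.nsmallest(k, points, key=preference)


# Hypothetical points; the preference function is f(x, y) = x + y.
points = [(1, 9), (3, 2), (9, 1), (4, 3), (6, 2)]
print(top_k(points, 3, lambda p: p[0] + p[1]))
```

With these sample points the top-3 answer contains (4, 3) and (6, 2), both of which are dominated by (3, 2), while the undominated points (1, 9) and (9, 1) are left out. The top-K result depends on the chosen preference function, whereas the skyline does not.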

Algorithm 1: Divide-and-Conquer (D&C)
- Divide the dataset into several partitions so that each partition fits in memory.
- Compute the partial skyline of the points in every partition.
- Merge the partial skylines to obtain the full skyline.
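The three steps can be sketched in Python. This is a simplified illustration, assuming smaller is better in every dimension: it partitions by list position rather than by region of the data space, and it merges by recomputing a skyline over the partial results. The names `dc_skyline` and `partition_size` are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """In-memory skyline of a single partition (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dc_skyline(points, partition_size):
    """Divide-and-conquer skyline; partition_size must be at least 1."""
    # Step 1: divide the data into partitions that fit in memory.
    partitions = [points[i:i + partition_size]
                  for i in range(0, len(points), partition_size)]
    # Step 2: compute the partial skyline of every partition.
    partials = [skyline(part) for part in partitions]
    # Step 3: merge. A point of the full skyline survives in its own
    # partition, so it is enough to filter the union of the partial skylines.
    candidates = [p for part in partials for p in part]
    return skyline(candidates)


# Hypothetical sample data.
points = [(1, 9), (2, 10), (4, 8), (3, 2), (9, 1), (6, 2), (4, 3)]
print(dc_skyline(points, partition_size=3))
```

The merge step here compares all surviving candidates with each other; a region-based partitioning allows many of those comparisons to be skipped.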

Partitioned Space
Partial skylines of the four partitions: {a, c, g}, {d}, {i}, {m, k}

Divide and Conquer
- All points in the skyline of s3 must remain.
- The points in s2 are discarded, because they are dominated by the points in s3.
- Each skyline point in s1 is compared only with the points in s3, since no point in s2 or s4 can dominate a point in s1.

Drawbacks
- D&C is efficient only for small data sets. If the data set is large, the partitioning process requires reading and writing the entire data set at least once, which incurs a high I/O cost.
- It is not suitable for online applications, because it cannot report any results until the partitioning process completes.

Algorithm 2: Index-Based Skyline
- Organize the set of d-dimensional points into d lists. A point p = (p1, p2, ..., pd) is assigned to the i-th list (1 ≤ i ≤ d) when pi is the smallest of its coordinates.
- The points in each list are sorted in ascending order of their minimum coordinate (minC).
- A batch in the i-th list consists of the points that have the same i-th coordinate.

Index List

Processing a batch
- Compute the skyline inside the batch.
- Among the points computed this way, add to the skyline list those that are not dominated by any of the already-found skyline points.

Processing a batch (example)
- Load the first batch of each list (i.e. {a} and {k}) and handle the one with the minimum minC: add a to the skyline list.
- Compare the batches {b} and {k}, and add k to the skyline list.
- Load {b} and {i, m}. Find the skyline inside {i, m} first, which is {i}.
- Compare {i} and {b}, and add i to the skyline list.
- The algorithm stops, since both coordinates of i are less than or equal to the minC of every remaining batch, so all remaining points are dominated.
- The skyline is {a, k, i}.
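The list construction and batch processing can be sketched in Python. This is an in-memory illustration under stated assumptions: smaller is better in every dimension, plain sorted lists stand in for a disk-based index, and the sample coordinates are hypothetical (the coordinates of the points on the slides are not given in the transcript). The stopping test uses a strict comparison so that points identical to a skyline point are not cut off.

```python
from itertools import groupby


def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def index_skyline(points):
    """Skyline via d lists keyed on each point's minimum coordinate (minC)."""
    if not points:
        return []
    d = len(points[0])
    # Assign each point to the list of its smallest coordinate.
    lists = [[] for _ in range(d)]
    for p in points:
        lists[min(range(d), key=lambda j: p[j])].append(p)
    # Sort each list by minC and cut it into batches of equal minC.
    batches = []
    for lst in lists:
        lst.sort(key=min)
        for min_c, group in groupby(lst, key=min):
            batches.append((min_c, list(group)))
    batches.sort(key=lambda batch: batch[0])  # handle smallest minC first

    skyline = []
    for min_c, batch in batches:
        # Stop once some skyline point is strictly better than minC in every
        # dimension: every remaining point is then dominated.
        if any(max(s) < min_c for s in skyline):
            break
        # Skyline inside the batch first, then against the skyline so far.
        local = [p for p in batch if not any(dominates(q, p) for q in batch)]
        for p in local:
            if not any(dominates(s, p) for s in skyline):
                # Defensive: drop earlier entries that p dominates (possible
                # only when batches from different lists share the same minC).
                skyline = [s for s in skyline if not dominates(p, s)]
                skyline.append(p)
    return skyline


# Hypothetical 2-d data set.
points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(index_skyline(points))
```

Because batches are visited in ascending minC, skyline points are reported progressively and the scan can stop long before the lists are exhausted.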

Pros and Cons
- The hashing technique is straightforward and incurs low CPU overhead.
- However, it incurs a high I/O cost, since multiple queries access a large part of the space.
- Propagate and merge incur a high cost, since the to-do list must be scanned every time a point is discovered and whenever the best fit for a merge is sought.

Algorithm 3: Nearest Neighbor (NN)
- Perform an NN query on the R-tree to find the point with the minimum distance from the origin of the axes (point o).
- Distances are computed according to the L1 norm.
- All points in the dominance region of the point found are exempt from further consideration.
- The result of the NN search is used to partition the data universe recursively.

Nearest Neighbor (NN)
Two partitions: (i) [0, i_x) × [0, ∞) and (ii) [0, ∞) × [0, i_y)
- Partition 1: subdivisions 1 and 3
- Partition 2: subdivisions 1 and 2

Nearest Neighbor (NN)
- The partitions that result from the discovery of a skyline point are inserted into a to-do list.
- While the to-do list is not empty, NN removes one of the partitions from the list and recursively repeats the same process on it.
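The to-do list loop can be sketched in Python for the two-dimensional case. This is a simplified illustration: a linear scan stands in for the NN query on the R-tree, coordinates are assumed to be non-negative so that the L1 distance from the origin is x + y, and points are assumed to be distinct. The function name and the sample data are hypothetical.

```python
import math


def nn_skyline_2d(points):
    """2-d skyline by repeated nearest-neighbor search from the origin."""
    skyline = []
    # Each to-do entry is a region [0, bx) x [0, by), stored as (bx, by).
    todo = [(math.inf, math.inf)]
    while todo:
        bx, by = todo.pop()
        region = [p for p in points if p[0] < bx and p[1] < by]
        if not region:
            continue
        # Stand-in for the R-tree NN query: smallest L1 distance from origin.
        nearest = min(region, key=lambda p: p[0] + p[1])
        skyline.append(nearest)
        # Everything dominated by `nearest` lies outside both new partitions.
        todo.append((nearest[0], by))  # [0, nearest_x) x [0, by)
        todo.append((bx, nearest[1]))  # [0, bx) x [0, nearest_y)
    return skyline


# Hypothetical 2-d data set.
points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(nn_skyline_2d(points))
```

In two dimensions the two new partitions overlap only in the area that is closer to the origin than the point just found, and that area is empty, so no skyline point is reported twice.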

Nearest Neighbor (NN)
- [0, a_x) × [0, ∞): subdivisions 1 and 3
- [0, i_x) × [0, a_y): subdivisions 1 and 2

NN Concepts
- Laisser-faire: A main-memory hash table stores the skyline points found so far.
- Propagate: When a point p is found, all the partitions in the to-do list that contain p are removed and re-partitioned according to p.
- Merge: The main idea is to merge partitions in the to-do list, thus reducing the number of queries that have to be performed.
- Fine-grained Partitioning: The original NN algorithm generates d partitions after a skyline point is found. An alternative approach is to generate 2d non-overlapping subdivisions.
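The laisser-faire idea can be sketched for d dimensions, where the d partitions created after each skyline point overlap and the same point can be found more than once. This is only an illustration under stated assumptions: a Python set plays the role of the main-memory hash table, a linear scan replaces the NN query on the R-tree, coordinates are non-negative, and the names and sample data are hypothetical.

```python
import math


def nn_skyline_laisser_faire(points):
    """d-dimensional NN skyline; duplicates are filtered with a hash set."""
    if not points:
        return []
    d = len(points[0])
    seen = set()   # hash table of the skyline points found so far
    skyline = []
    todo = [(math.inf,) * d]  # exclusive upper bounds of each region
    while todo:
        bounds = todo.pop()
        region = [p for p in points
                  if all(c < b for c, b in zip(p, bounds))]
        if not region:
            continue
        nearest = min(region, key=sum)  # L1 distance from the origin
        if nearest not in seen:         # discard duplicates, keep new points
            seen.add(nearest)
            skyline.append(nearest)
        # Generate d partitions, shrinking one dimension at a time.
        for i in range(d):
            todo.append(bounds[:i] + (nearest[i],) + bounds[i + 1:])
    return skyline


# Hypothetical 3-d data set.
points = [(1, 5, 5), (5, 1, 5), (5, 5, 1), (2, 2, 2),
          (3, 3, 3), (6, 6, 6), (1, 1, 9)]
print(nn_skyline_laisser_faire(points))
```

The hash lookup itself is cheap, but overlapping regions are still searched repeatedly.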
