Skyline Queries. HKU CSIS DB Seminar, 9 April 2003. Speaker: Eric Lo.


Skyline
A new operator in database systems.
Filters out a set of interesting points from a potentially large set of data points.
A data point is interesting if it is not dominated by any other point.
A point dominates another point if it is at least as good in all dimensions and better in at least one.
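The dominance test behind this definition can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function name `dominates` and the `senses` argument (one "min" or "max" per dimension) are assumptions introduced here.

```python
def dominates(p, q, senses):
    """Return True if point p dominates point q.

    p dominates q when p is at least as good as q in every dimension
    and strictly better in at least one. `senses` holds "min" or "max"
    for each dimension (a hypothetical convention for this sketch).
    """
    strictly_better = False
    for a, b, sense in zip(p, q, senses):
        if sense == "max":
            a, b = -a, -b  # turn "larger is better" into "smaller is better"
        if a > b:
            return False  # p is worse than q in this dimension
        if a < b:
            strictly_better = True
    return strictly_better
```

A point never dominates an identical point, because the "strictly better in at least one dimension" condition fails.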

Example
Find some good places for us to hold the next DB Seminar.
Dataset (table Homes):

Home      Distance from HKU   Area (m²)
Kevin     1 km                10
Ben       9 km                100
Felix     5 km                2
K.K Loo   8 km                250

Good: close to HKU (Min.)
Good: large area (Max.)
Return those homes that are not worse than some other home in ALL DIMENSIONS.
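As a rough sketch, the example can be checked by brute force: compare every home with every other home and keep the ones that no other home beats. The tuple layout (name, distance in km, area) and the function names are assumptions made for this illustration.

```python
# (name, distance from HKU in km, area) taken from the Homes table
HOMES = [
    ("Kevin", 1, 10),
    ("Ben", 9, 100),
    ("Felix", 5, 2),
    ("K.K Loo", 8, 250),
]


def dominates(a, b):
    """a dominates b: a is no farther and no smaller, and better in at least one."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    better = a[1] < b[1] or a[2] > b[2]
    return no_worse and better


def skyline(homes):
    """Keep every home that is not dominated by any other home."""
    return [h for h in homes if not any(dominates(o, h) for o in homes)]


print([h[0] for h in skyline(HOMES)])  # ['Kevin', 'K.K Loo']
```

Felix is dominated by Kevin (closer and larger), and Ben is dominated by K.K Loo, so only Kevin and K.K Loo remain.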

Outline
Introduction to Skyline Queries
Skyline Operator in SQL
Implementation Algorithms
Progressive Algorithms
Variations of Skyline Queries
Experimental Result
Conclusion

The Skyline Operator
ICDE 2001. S. Börzsönyi, D. Kossmann, K. Stocker
1. Define the skyline operator in databases
2. Extension of SQL for skyline
3. Block-nested-loops algorithm
4. Divide-and-conquer algorithm

Problem Definition
Related to:
  the maximum vector problem
  the contour problem
  the convex hull of a data set
Assume the whole dataset fits in memory

SQL Extensions
SELECT … FROM … WHERE … GROUP BY … HAVING …
SKYLINE OF [DISTINCT] d1 [MIN | MAX], …, dm [MIN | MAX]
ORDER BY …
d1, …, dm denote the dimensions that participate in the skyline.
Example:
SELECT * FROM HOMES WHERE CITY = 'HK' SKYLINE OF DIST MIN, AREA MAX;
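SQLite has no SKYLINE OF clause, but the same result can be expressed in standard SQL with a NOT EXISTS subquery. The sketch below runs that rewrite through Python's built-in `sqlite3` module. The table layout, the `city` values, and the rows (borrowed from the Homes example) are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (name TEXT, city TEXT, dist REAL, area REAL)")
conn.executemany(
    "INSERT INTO homes VALUES (?, ?, ?, ?)",
    [
        ("Kevin", "HK", 1, 10),
        ("Ben", "HK", 9, 100),
        ("Felix", "HK", 5, 2),
        ("K.K Loo", "HK", 8, 250),
    ],
)

# Emulates: SELECT * FROM HOMES WHERE CITY = 'HK' SKYLINE OF DIST MIN, AREA MAX
# A home is kept when no other home is at least as good in both
# dimensions and strictly better in one of them.
SKYLINE_SQL = """
SELECT h.name
FROM homes h
WHERE h.city = 'HK'
  AND NOT EXISTS (
    SELECT 1
    FROM homes o
    WHERE o.city = 'HK'
      AND o.dist <= h.dist AND o.area >= h.area
      AND (o.dist < h.dist OR o.area > h.area)
  )
ORDER BY h.name
"""

names = [row[0] for row in conn.execute(SKYLINE_SQL)]
print(names)  # ['K.K Loo', 'Kevin']
```

The nested query compares every tuple with every other tuple, which is the cost a dedicated skyline operator tries to avoid.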

Naïve Approach for Skyline
A 1D skyline is equivalent to computing min or max in SQL.
Naïve 2D skyline:
  Sort the data on the 2 dimensions.
  Compare every tuple with its predecessor (the last tuple kept in the skyline).
Sorting may need 2 or more passes if the data do not fit into memory: use existing external sorting techniques.

Naïve 2D

Home    Distance from HKU   Area
Kevin   1 km                10
Felix   5 km                2
KK      8 km                250
Ben     9 km                100

1. Sort by "Distance"
2. Compare "Felix" with "Kevin": eliminate "Felix"
3. Compare "KK" with "Kevin": incomparable, so "KK" is part of the skyline
4. Compare "Ben" with "KK": eliminate "Ben"
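The four steps above can be sketched as a sort followed by a single scan. The function name and the tuple layout (name, distance, area) are assumptions for this illustration. Ties on distance are broken by larger area first, and an exact duplicate of a kept point is dropped.

```python
def skyline_2d(homes):
    """Sort-based 2D skyline: minimise distance, maximise area."""
    # Sort by distance ascending; among equal distances, larger area first.
    ordered = sorted(homes, key=lambda h: (h[1], -h[2]))
    result = []
    for home in ordered:
        # Every kept tuple is at least as close as `home`, so `home`
        # survives only if its area beats the last kept tuple's area.
        if not result or home[2] > result[-1][2]:
            result.append(home)
    return result


homes = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("KK", 8, 250)]
print([h[0] for h in skyline_2d(homes)])  # ['Kevin', 'KK']
```

Comparing with the last kept tuple is enough in two dimensions because the kept areas grow monotonically along the sort order.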

Naïve 2D does not work for more than 2 dimensions
If the skyline involves more than 2 dimensions, sorting does not work.

Home    Distance from HKU   Area   Rent
Kevin   1 km                10     $9
Felix   5 km                2      $5
KK      8 km                250    $10
Ben     9 km                100    $9

2D:
  Cmp Felix, Kevin: eliminated
  Cmp KK, Kevin: part of skyline
  Cmp Ben, KK: eliminated
3D:
  Cmp Felix, Kevin: part of skyline
  Cmp KK, Felix: part of skyline
  Cmp Ben, KK: part of skyline? No! Comparing with KK alone proves nothing: Ben could still be dominated by an earlier tuple such as Kevin, so the predecessor check does not work.
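A small sketch makes the failure concrete. The three points below are hypothetical data chosen for this illustration (they are not the homes from the slide). Each point is (distance, area, rent), with distance and rent minimised and area maximised.

```python
def dominates(p, q):
    """p, q = (dist, area, rent): minimise dist and rent, maximise area."""
    no_worse = p[0] <= q[0] and p[1] >= q[1] and p[2] <= q[2]
    better = p[0] < q[0] or p[1] > q[1] or p[2] < q[2]
    return no_worse and better


def predecessor_only(points):
    """Naive scan: compare each tuple with the last kept tuple only."""
    kept = []
    for p in sorted(points):  # sorted by distance first
        if not kept or not dominates(kept[-1], p):
            kept.append(p)
    return kept


def true_skyline(points):
    """Reference answer: compare each tuple with all tuples."""
    return [p for p in sorted(points) if not any(dominates(q, p) for q in points)]


# Hypothetical data: the third point is dominated by the first one,
# but not by its direct predecessor.
POINTS = [(1, 100, 5), (2, 50, 1), (3, 90, 6)]

print(predecessor_only(POINTS))  # [(1, 100, 5), (2, 50, 1), (3, 90, 6)]
print(true_skyline(POINTS))      # [(1, 100, 5), (2, 50, 1)]
```

The naive scan keeps (3, 90, 6) because its predecessor (2, 50, 1) has a smaller area, even though (1, 100, 5) beats it in every dimension.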

Block-nested-loops Algorithm
A straightforward approach:
  Compare each point p with every other point.
  If p is not dominated, it is part of the skyline.
Scan the data file and keep a list of candidate skyline points in main memory.

BNL cont.
1. Insert the 1st data point into the list
2. For each subsequent point p:
   1. If p is dominated by any point in the list, discard p
   2. If p dominates any point in the list, insert p into the list and remove all points dominated by p
   3. If p is neither dominated by, nor dominates, any point in the list, insert p into the list as part of the skyline
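A minimal in-memory sketch of these steps follows. It assumes that every dimension is minimised and that the candidate list always fits in memory. The function names are chosen for this illustration.

```python
def dominates(p, q):
    """p dominates q when all dimensions are minimised."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl(points):
    """Block-nested-loops skyline with an unbounded candidate list."""
    candidates = []
    for p in points:
        # Step 2.1: p is dominated by a candidate, so discard it.
        if any(dominates(c, p) for c in candidates):
            continue
        # Step 2.2: remove every candidate that p dominates.
        candidates = [c for c in candidates if not dominates(p, c)]
        # Steps 2.2 and 2.3: p itself joins the candidate list.
        candidates.append(p)
    return candidates


points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(bnl(points))  # [(1, 9), (3, 2), (9, 1)]
```

The first point needs no special case here: with an empty list it is neither dominated nor dominating, so it is simply appended.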

BNL cont.
The candidate list is self-organizing:
  Points that have dominated other points are moved to the top of the list.
  This reduces the number of comparisons.
E.g. the self-organizing list holds a partial skyline like:

Home    Distance from HKU   Area   Rent
Kevin   1 km                249    $1
K.K     8 km                250    $ …

followed by other skyline points that are not as strong as Kevin except in a few dimensions.

More on BNL
Point 3 in BNL: if p is neither dominated by, nor dominates, any point in the list, insert it into the list as part of the skyline.
If there is no more space in the list, write p to a temporary file on disk.
Tuples in the temporary file are processed further in the next iteration of the algorithm.

More on BNL (cont.)
Figure: BNL over the input A, B, C, D, E, F, G, H, I, J with a list that holds 4 points.
  A: inserted into the list
  B, C, D: dominated by A
  E: incomparable with A
  F: incomparable with A, E
  G: dominates F, replaces F
  H: incomparable with A, E, G
  I: incomparable with A, E, G, H, but the list is full!
  J: incomparable with A, E, G, H, but the list is full! J has not been compared with I
  Temporary file: I, J
After the 1st iteration, A, E, G, H are output as skyline points; then clear the list, treat I, J, … as a new data set, and perform BNL again.
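The overflow handling can be sketched as follows, with a Python list standing in for the temporary file. This is one possible way to keep the iterations correct, not necessarily the exact bookkeeping of the original algorithm. Each list entry remembers how many points had already overflowed when it was inserted. Entries inserted before any overflow have met every other point and can be output. Later entries are carried into the next iteration. All dimensions are assumed to be minimised, and `window_size` is assumed to be at least 1.

```python
def dominates(p, q):
    """p dominates q when all dimensions are minimised."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_limited(points, window_size):
    """BNL with a bounded candidate list and an overflow 'file'."""
    skyline = []
    pending = list(points)
    while pending:
        window = []    # entries: (point, overflow count at insertion time)
        overflow = []  # stands in for the temporary file on disk
        for p in pending:
            if any(dominates(q, p) for q, _ in window):
                continue  # p is dominated: discard
            window = [(q, seen) for q, seen in window if not dominates(p, q)]
            if len(window) < window_size:
                window.append((p, len(overflow)))
            else:
                overflow.append(p)  # list is full: defer p
        # Entries inserted before any overflow were compared with everything.
        skyline.extend(q for q, seen in window if seen == 0)
        # Later entries still have to meet the earlier overflow points.
        carried = [q for q, seen in window if seen > 0]
        pending = carried + overflow
    return skyline


points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(sorted(bnl_limited(points, window_size=2)))  # [(1, 9), (3, 2), (9, 1)]
```

Each iteration either outputs or discards at least one point, so the loop terminates even with a list of size 1.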

Short summary on BNL
Easy to implement.
Works for any number of dimensions without using an index or sorting.
Relies on main memory: may need many iterations.
Not adequate for on-line processing: it has to read the entire data file before it returns the first skyline point (not progressive).

Divide-and-Conquer Algorithm
(Data with dimensions Price and Dist.)
1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)
3) Compute skyline S1 of P1 (price < 0.3) and skyline S2 of P2 (price >= 0.3) by recursive partitioning. [Note: S1 is better than S2 on price]
4) Partition recursively until a partition contains very few (or 1) tuples
5) With only a few tuples, finding the skyline is very easy
6) Merge the skylines of the partitions by eliminating those tuples of S2 that are dominated by a tuple of S1. [Note: no tuple in S1 can be dominated by a tuple in S2, as all tuples in S1 are better than those in S2 on price, i.e. tuples in UPPER are never eliminated]

Divide-and-Conquer Algorithm
(Data with dimensions Price and Dist.)
1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)

Divide-and-Conquer Algorithm
(Figure over Price and Dist, with labels S1, S2, S3, S4, S5, S6, S7.)

Divide-and-Conquer Algorithm
(Figure over Price and Dist, with labels S1 to S7: the partial results S1, S2 and S4, S5, S7 merge into S1, S2, S7.)
i.e. tuples in UPPER are never eliminated
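The divide-and-conquer idea can be sketched as below. This is a simplified illustration that assumes all dimensions are minimised, always splits on the first dimension, and falls back to a brute-force check when a split would leave one side empty (many tuples sharing the median value). The function and variable names are chosen for this sketch.

```python
def dominates(p, q):
    """p dominates q when all dimensions are minimised."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def dc_skyline(points):
    """Divide-and-conquer skyline, splitting on the first dimension."""
    points = list(points)
    if len(points) <= 1:
        return points
    values = sorted(p[0] for p in points)
    median = values[len(values) // 2]
    p1 = [p for p in points if p[0] < median]   # strictly better on dimension 0
    p2 = [p for p in points if p[0] >= median]
    if not p1:
        # No useful split is possible: resolve this partition by brute force.
        return [p for p in points if not any(dominates(q, p) for q in points)]
    s1 = dc_skyline(p1)
    s2 = dc_skyline(p2)
    # Tuples of s1 can never be dominated by tuples of s2, so only s2 is filtered.
    return s1 + [q for q in s2 if not any(dominates(p, q) for p in s1)]


points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(sorted(dc_skyline(points)))  # [(1, 9), (3, 2), (9, 1)]
```

Using a strict "less than median" test for the first partition is what guarantees that the merge step only has to look in one direction.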

Efficient Progressive Skyline Computation
VLDB 2001. K.L. Tan, P.K. Eng, B.C. Ooi
Previous approaches require at least one pass over the dataset before they return the first interesting point.
The authors propose:
1. Bitmap-based algorithm
2. B+-tree-based algorithm
Both can return an interesting point as soon as it is identified.

Progressive?
Both the bitmap and the tree-based algorithm return the first skyline points very quickly.
Useful if you are not willing to wait long for the first few interesting homes out of a large dataset.
They also outperform BNL and D&C in overall response time.

Skyline by Bitmap
Main idea: given a point p, if "something" can tell you that
  p is not dominated by any other point in the DB: skyline!
  p is dominated by some point in the DB: throw it away
Non-blocking! Skyline points can be returned immediately.

Bitmap
All the information required to decide whether a point is in the skyline is encoded in bitmaps.
A data point p = (p1, p2, …, pd), where d is the number of dimensions, is mapped to an m-bit vector, where m is the total number of distinct values over all dimensions.

Bitmap
(Data with dimensions Price and Dist.)
The numbers of distinct values on price and distance are 7 and 4, so m = 7 + 4 = 11.

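Bitmap representation
Distinct values on x: 10. Distinct values on y: 10. So m = 20: a 20-bit vector.
E.g. (4,8): 4 is the 4th smallest value on dimension x, so the bits from the 4th (counting from the right) up to the leftmost are set to 1; 8 is the 8th smallest value on y, so the bits from the 8th up to the leftmost are set to 1.

Point    Bitmap representation
(1,9)    (1111111111, 1100000000)
(2,10)   (1111111110, 1000000000)
(4,8)    (1111111000, 1110000000)
(6,7)    (1111100000, 1111000000)
(9,10)   (1100000000, 1000000000)
(7,5)    (1111000000, 1111110000)
(5,6)    (1111110000, 1111100000)
(4,3)    (1111111000, 1111111100)
(3,2)    (1111111100, 1111111110)
(9,1)    (1100000000, 1111111111)
(10,4)   (1000000000, 1111111000)
(6,2)    (1111100000, 1111111110)
(8,3)    (1110000000, 1111111100)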

Bitmap representation
Is (4,8) a skyline point? (min x, y)
Create bit-strings Cx and Cy (Not CY Ng!): Cx takes the 4th bit (from the right) of every point's x bitmap, Cy the 8th bit of every point's y bitmap, one bit per point.
Points, in order: (1,9), (2,10), (4,8), (6,7), (9,10), (7,5), (5,6), (4,3), (3,2), (9,1), (10,4), (6,2), (8,3)
Cx      = 1110000110000
Cy      = 0011011111111
Cx & Cy = 0010000110000
If Cx & Cy has more than one '1', the point is dominated by some other point: here the 1s belong to (4,8), (4,3) and (3,2), so (4,8) is not in the skyline.

Bitmap representation
Is (3,2) a skyline point? (min x, y)
Create bit-strings Cx and Cy: Cx takes the 3rd bit (from the right) of every point's x bitmap, Cy the 2nd bit of every point's y bitmap, one bit per point.
Points, in order: (1,9), (2,10), (4,8), (6,7), (9,10), (7,5), (5,6), (4,3), (3,2), (9,1), (10,4), (6,2), (8,3)
Cx      = 1100000010000
Cy      = 0000000011010
Cx & Cy = 0000000010000
If Cx & Cy has only one '1' (the point itself), the point is in the skyline: (3,2) is a skyline point.
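The check from these slides can be sketched with Python integers used as bit-strings. The sketch follows the simplified test shown here (count the 1s in Cx & Cy) and therefore assumes that no two points are identical and that all dimensions are minimised. The function names and the in-memory dictionary of bit-slices are assumptions for this illustration, not the storage layout of the original paper.

```python
def build_slices(points):
    """For every dimension and distinct value v, build a bit-string whose
    bit i is 1 when point i has a value <= v in that dimension."""
    dims = len(points[0])
    slices = []
    for d in range(dims):
        per_value = {}
        for v in {p[d] for p in points}:
            bits = 0
            for i, q in enumerate(points):
                if q[d] <= v:
                    bits |= 1 << i
            per_value[v] = bits
        slices.append(per_value)
    return slices


def is_skyline(p, slices):
    """AND the slices for p's values; p is in the skyline when only
    one bit (p itself) survives."""
    combined = -1  # all bits set
    for d, v in enumerate(p):
        combined &= slices[d][v]
    return bin(combined).count("1") == 1


points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
slices = build_slices(points)
print(is_skyline((4, 8), slices))  # False
print(is_skyline((3, 2), slices))  # True
print([p for p in points if is_skyline(p, slices)])  # [(1, 9), (3, 2), (9, 1)]
```

Each point is tested independently of the others, which is why results can be returned progressively.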

Short summary on Bitmap
The bitmap representation of every point must be pre-computed.
For each point, all bitmaps must be retrieved in order to obtain the juxtapositions (Cx and Cy).
Large storage is needed if the domains of the attributes are large.

Some other progressive algorithms
B+-tree index (also proposed by Tan, Eng and Ooi)
  Organizes the points into d lists (d is the number of dimensions in the data)
  Builds a B+-tree on the lists for retrieving skyline points
  Suffers from similar problems as the bitmap approach

Some other progressive algorithms (cont.)
NN algorithm (by Donald Kossmann again) [VLDB 02]

NN skyline

Outline
Introduction to Skyline Queries
Skyline Operator in SQL
Implementation Algorithms
Progressive Algorithms
Variations of Skyline Queries
Experimental Result
Conclusion

An Optimal and Progressive Algorithm for Skyline Queries
SIGMOD 2003. D. Papadias, Y. Tao, G. Fu, B. Seeger
The authors propose:
1. An NN algorithm that is more efficient and I/O optimal
2. Ranked skyline queries
3. Constrained skyline queries
4. Dynamic skyline queries
5. K-dominating queries

Ranked Skyline
A ranked skyline returns the K skyline points that have the minimum (or maximum) score according to a function f.
In our example, f = 3*Dist + 7*Area: return the top K homes.
Although the skyline returns interesting points, we may want the most interesting ones according to our own preferences, especially when the data set is large (say, hotels) and the skyline is also large!
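A ranked skyline can be sketched as "compute the skyline, then sort it by f and keep K points". The rows come from the Homes example and f is the function from the slide. The function names and the `largest` flag (to choose between minimum and maximum score) are assumptions for this illustration.

```python
# (name, distance from HKU in km, area) from the Homes example
HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("K.K Loo", 8, 250)]


def dominates(a, b):
    """Minimise distance, maximise area."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    better = a[1] < b[1] or a[2] > b[2]
    return no_worse and better


def skyline(homes):
    return [h for h in homes if not any(dominates(o, h) for o in homes)]


def score(home):
    """f = 3*Dist + 7*Area, as on the slide."""
    return 3 * home[1] + 7 * home[2]


def ranked_skyline(homes, k, largest=False):
    """Return the k skyline points with the smallest (or largest) score."""
    return sorted(skyline(homes), key=score, reverse=largest)[:k]


print(ranked_skyline(HOMES, 1))                # [('Kevin', 1, 10)]
print(ranked_skyline(HOMES, 1, largest=True))  # [('K.K Loo', 8, 250)]
```

Kevin scores 3*1 + 7*10 = 73 and K.K Loo scores 3*8 + 7*250 = 1774, so the direction of the ranking decides which home comes first.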

Constrained Skyline
Returns the most interesting points in a specific region of the data space.
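One way to picture a constrained skyline: restrict the data to the region of interest first, then compute the skyline of what is left. The sketch reuses the rows of the Homes example. The distance range of 4 to 9 km is a hypothetical constraint picked for this illustration.

```python
# (name, distance from HKU in km, area) from the Homes example
HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("K.K Loo", 8, 250)]


def dominates(a, b):
    """Minimise distance, maximise area."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    better = a[1] < b[1] or a[2] > b[2]
    return no_worse and better


def constrained_skyline(homes, min_dist, max_dist):
    """Skyline of the homes whose distance lies in [min_dist, max_dist]."""
    region = [h for h in homes if min_dist <= h[1] <= max_dist]
    return [h for h in region if not any(dominates(o, h) for o in region)]


print([h[0] for h in constrained_skyline(HOMES, 4, 9)])  # ['Felix', 'K.K Loo']
```

Felix is not in the unconstrained skyline because Kevin dominates it, but it enters the constrained skyline once Kevin falls outside the region.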

Dynamic Skyline
Returns an updated skyline dynamically.
E.g. ask for hotels with minimum distance and price (again?)
The minimum distance now depends on my current location.
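A sketch of the idea: the distance dimension is not stored but computed from the user's current location at query time, so the same data can give different skylines. The three hotels, their coordinates, and their prices are hypothetical data for this illustration. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical hotels: (name, (x, y) position, price)
HOTELS = [("A", (0, 0), 100), ("B", (10, 0), 50), ("C", (5, 0), 120)]


def dynamic_skyline(hotels, location):
    """Skyline on (distance to `location`, price), both minimised."""
    scored = [(name, math.dist(pos, location), price)
              for name, pos, price in hotels]

    def dominated(h):
        return any(
            o[1] <= h[1] and o[2] <= h[2] and (o[1] < h[1] or o[2] < h[2])
            for o in scored
        )

    return [h[0] for h in scored if not dominated(h)]


print(dynamic_skyline(HOTELS, (0, 0)))  # ['A', 'B']
print(dynamic_skyline(HOTELS, (6, 0)))  # ['B', 'C']
```

Moving from (0, 0) to (6, 0) removes hotel A from the skyline and brings hotel C in, although the stored data did not change.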

Enumerating Skyline
Enumerating queries return, for each skyline point p, the number of points dominated by p.
Sometimes useful: suppose skyline hotel C dominates 1000 hotels and another skyline hotel Y dominates only 1 hotel. Maybe C is better than Y in many properties (e.g. price, dist, etc.), while Y has only 1 property better than C, e.g. it comes with a PS2.
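An enumerating query can be sketched by brute force: find the skyline points, then count for each of them how many points it dominates. The sketch assumes all dimensions are minimised. The function names and the sample points are chosen for this illustration.

```python
def dominates(p, q):
    """p dominates q when all dimensions are minimised."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def enumerate_skyline(points):
    """Map every skyline point to the number of points it dominates."""
    counts = {}
    for p in points:
        if any(dominates(q, p) for q in points):
            continue  # p is not a skyline point
        counts[p] = sum(1 for q in points if dominates(p, q))
    return counts


points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(enumerate_skyline(points))  # {(1, 9): 2, (3, 2): 9, (9, 1): 2}
```

In this sample, (3, 2) dominates nine points while the other two skyline points dominate two each, which is the kind of difference the query is meant to expose.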

Experimental Evaluation
Running time comparison of the progressive algorithms, without the NN approaches.
(Figure legend: Index, Bitmap, D&C, BNL.)

Conclusion
Introduction to skyline queries
How to implement (support) the skyline operator in a DBMS?
Variations of skyline queries
If the information is stored in different places, how can skyline queries be answered on a mobile device?

References
S. Börzsönyi, D. Kossmann, K. Stocker. The Skyline Operator. ICDE 2001.
K.L. Tan, P.K. Eng, B.C. Ooi. Efficient Progressive Skyline Computation. VLDB 2001.
D. Kossmann, F. Ramsak, S. Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. VLDB 2002.
D. Papadias, Y. Tao, G. Fu, B. Seeger. An Optimal and Progressive Algorithm for Skyline Queries. SIGMOD 2003.