Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces. Jian Pei, Wen Jin, Martin Ester, Yufei Tao

Presentation transcript:

Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces
Jian Pei, Wen Jin, Martin Ester (Simon Fraser University, Canada) and Yufei Tao (City University of Hong Kong)

J. Pei, W. Jin, M. Ester, and Y. Tao: Catching the Best Views of Skyline
VLDB'05 at Trondheim, Let's Go!
Flights to Trondheim? Price, travel time and number of stops all matter!
A (long) list of all feasible flights?
– Reviewing many flights is tedious
A better idea: present only some selected flights. But how?
– Vancouver → Seattle → Munich → London → Oslo → Trondheim, $7200, 38 hours, 4 stops (bad)
– Vancouver → Amsterdam → Trondheim, $2200, 14 hours, 1 stop (good)
– Vancouver → Amsterdam → Oslo → Trondheim, $1600, 18 hours, 2 stops (also good)
Only the skyline routes are interesting: they cover all possible trade-offs among price, travel time and number of stops that are superior to the others

Domination and Skyline
A set of objects $S$ in an $n$-dimensional space $D = (D_1, \ldots, D_n)$
– $D_1, \ldots, D_n$ are numeric domains
– Can be extended to other domains
For $u, v \in S$, $u$ dominates $v$ if $u.D_i \le v.D_i$ for $1 \le i \le n$, and $u.D_j < v.D_j$ on at least one dimension $D_j$
$u \in S$ is a skyline object if $u$ is not dominated by any other object in $S$
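
The domination test and the skyline definition above can be sketched in a few lines of Python. This is a naive quadratic illustration, not one of the algorithms cited in the talk; the function names `dominates` and `skyline` are chosen here for illustration, and the three tuples are the (price, hours, stops) values of the flights on the previous slide, with smaller values being better.

```python
def dominates(u, v):
    """u dominates v: u <= v on every dimension and u < v on at least one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline(points):
    """Return the points that are not dominated by any other point (naive O(n^2) scan)."""
    return [u for u in points if not any(dominates(v, u) for v in points)]


# (price in $, travel time in hours, number of stops) from the flight example
flights = [(7200, 38, 4), (2200, 14, 1), (1600, 18, 2)]
print(skyline(flights))  # [(2200, 14, 1), (1600, 18, 2)]
```

The first route is dominated by the second one on all three dimensions, while the second and third routes are incomparable, so both stay in the skyline.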

Finding the Skyline in Full Space
Many existing methods
Divide-and-conquer and block nested loops by Börzsönyi et al.
Sort-filter-skyline (SFS) by Chomicki et al.
Using bitmaps and the relationships between the skyline and the minimum coordinates of individual points, by Tan et al.
Using nearest-neighbor search by Kossmann et al.
The progressive method by Papadias et al.

Full Space Skyline Is Not Enough!
Skylines in subspaces
– Mr. Richer does not care about the price. How can we derive the superior trade-offs between travel time and number of stops from the full space skyline?
Sky cube: computing skylines in all non-empty subspaces (Yuan et al., VLDB'05)
– Any subspace skyline query can be answered (efficiently)

Even Sky Cube May Not Be Enough!
Understanding skyline objects
– Both Wilt Chamberlain and Michael Jordan are in the full space skyline of the Great NBA Players. Which merits, respectively, really make them outstanding?
– How are they different?
Finding the decisive subspaces: the minimal combinations of factors that determine the (subspace) skyline membership of an object
– Total rebounds for Chamberlain; (total points, total rebounds, total assists) and (games played, total points, total assists) for Jordan

Intuition
a, b and c are in the skyline of (X, Y)
– Both a and c are in some subspace skylines
– b is not in any subspace skyline
d and e are not in the skyline of (X, Y)
– d is in the skyline of subspace X
– e is not in any subspace skyline
Why and in which subspaces is an object in the skyline?

Observations
Is subspace skyline membership monotonic?
– x is in the skylines of spaces ABCD and A, but it is not in the skyline of ABD, where it is dominated by y
x and y collapse in AD: x and y are in the skylines of the same subspaces of AD

Coincident Groups
How to capture groups of objects that share values in subspaces?
(G, B) is a coincident group (c-group) if all objects in G share the same values on all dimensions in B
– $G_B$ is the projection
A c-group (G, B) is maximal if no further objects or dimensions can be added to the group
– Example: (xy, AD)
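
A small sketch may help make the c-group definition concrete. The table below is hypothetical: the slide's figure is not part of the transcript, so the values of x, y and z were chosen here only so that (xy, AD) comes out as a maximal c-group, as in the slide's example. The helper names `is_c_group` and `maximal_c_group` are also assumptions.

```python
def is_c_group(data, group, dims):
    """True if all objects in `group` share the same values on every dimension in `dims`."""
    projections = {tuple(data[o][d] for d in dims) for o in group}
    return len(projections) == 1


def maximal_c_group(data, group, all_dims):
    """Grow `group` into a maximal c-group.

    First keep every dimension on which the group agrees, then add every
    object that shares the group's values on those dimensions.
    """
    first = next(iter(group))
    dims = [d for d in all_dims if all(data[o][d] == data[first][d] for o in group)]
    members = {o for o in data if all(data[o][d] == data[first][d] for d in dims)}
    return members, dims


# Hypothetical values, smaller is better
data = {
    "x": {"A": 1, "B": 2, "C": 3, "D": 4},
    "y": {"A": 1, "B": 1, "C": 5, "D": 4},
    "z": {"A": 2, "B": 2, "C": 1, "D": 5},
}

print(is_c_group(data, {"x", "y"}, "AD"))         # True
print(is_c_group(data, {"x", "y"}, "ABD"))        # False, x and y differ on B
print(maximal_c_group(data, {"x", "y"}, "ABCD"))  # members {x, y}, dimensions ['A', 'D']
```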

C-Group Lattices
All coincident groups form a lattice (c-group lattice)
All maximal c-groups form a lattice (maximal c-group lattice)
The maximal c-group lattice is a quotient lattice of the c-group lattice
Where are the (multidimensional) skyline objects in the (maximal) c-group lattice?
– Do they also form some good structure?

Skyline Groups
A maximal c-group (G, B) is a skyline group if $G_B$ is in the subspace skyline of B
How to characterize the subspaces where $G_B$ is in the skyline?
– (x, ABCD) is a skyline group
– If the set of subspaces is convex, we can use bounds

Decisive Subspaces
A space $C \subseteq B$ is decisive if
– $G_C$ is in the subspace skyline of $C$
– No other object shares the same values with the objects in $G$ on $C$
– $C$ is minimal: no $C' \subset C$ has the above two properties
(x, ABCD) is a skyline group; AC and CD are decisive
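
The three conditions above can be checked by brute force over all subsets of B. The sketch below is only an illustration of the definition, not the algorithm of the talk; the table is hypothetical and was chosen so that AC and CD come out as the decisive subspaces of (x, ABCD), matching the slide's example, and `decisive_subspaces` is a name introduced here.

```python
from itertools import combinations


def dominates(u, v):
    """u dominates v: u <= v everywhere and u < v somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def project(obj, dims):
    """Values of one object on the given dimensions."""
    return tuple(obj[d] for d in dims)


def decisive_subspaces(data, group, dims):
    """Brute-force search for the decisive subspaces of the skyline group (group, dims)."""
    rep = data[next(iter(group))]  # all group members agree on dims
    others = [obj for name, obj in data.items() if name not in group]
    found = []
    for k in range(1, len(dims) + 1):            # small subspaces first
        for c in combinations(dims, k):
            if any(set(f) <= set(c) for f in found):
                continue                          # a subset already qualifies, so c is not minimal
            p = project(rep, c)
            in_skyline = not any(dominates(project(o, c), p) for o in others)
            exclusive = not any(project(o, c) == p for o in others)
            if in_skyline and exclusive:
                found.append(c)
    return found


# Hypothetical values, smaller is better
data = {
    "x": {"A": 1, "B": 2, "C": 3, "D": 4},
    "y": {"A": 1, "B": 1, "C": 5, "D": 4},
    "z": {"A": 2, "B": 2, "C": 1, "D": 5},
}

print(decisive_subspaces(data, {"x"}, "ABCD"))  # [('A', 'C'), ('C', 'D')]
```

With these values x is dominated by y in ABD, which is consistent with the earlier observation that subspace skyline membership is not monotonic.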

Semantics
Problem: in which subspaces is an object or a group of objects in the skyline?
The skyline membership of a skyline group is established by its decisive subspaces
– For a skyline group (G, B), if C is decisive, then G is in the skyline of any subspace C' where $C \subseteq C' \subseteq B$
Signature of a skyline group: $Sig(G, B) = (G_B, C_1, \ldots, C_k)$, where $C_1, \ldots, C_k$ are all the decisive subspaces

Example
The skyline membership of an object is determined by the skyline groups in which it participates
An object u is in the skyline of subspace C if and only if there exists a skyline group (G, B) with a decisive subspace C' such that $u \in G$ and $C' \subseteq C \subseteq B$
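
The membership rule above turns a set of signatures into a simple lookup. The sketch below assumes signatures are stored as (group, subspace, decisive subspaces) triples; that layout, the function name `in_subspace_skyline`, and the two signatures listed are hypothetical. The first triple follows the slide's example (x, ABCD) with decisive subspaces AC and CD, the second is an assumed signature for the group (xy, AD).

```python
def in_subspace_skyline(u, subspace, signatures):
    """u is in the skyline of `subspace` iff some skyline group (G, B) containing u
    has a decisive subspace C' with C' <= subspace <= B."""
    c = set(subspace)
    return any(
        u in group and any(set(d) <= c <= set(b) for d in decisive)
        for group, b, decisive in signatures
    )


# Hypothetical signatures: (objects in the group, subspace B, decisive subspaces)
signatures = [
    ({"x"}, "ABCD", ["AC", "CD"]),
    ({"x", "y"}, "AD", ["A", "D"]),
]

print(in_subspace_skyline("x", "ACD", signatures))  # True: AC is contained in ACD
print(in_subspace_skyline("x", "A", signatures))    # True: via the group (xy, AD)
print(in_subspace_skyline("x", "ABD", signatures))  # False: no decisive subspace fits
```

The third query reproduces the non-monotonic case from the Observations slide: x is in the skylines of ABCD and A but not of ABD.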

Subspace Skyline Analysis
All skyline projections form a lattice (skyline projection lattice)
– A sub-lattice of the c-group lattice
All skyline groups form a lattice (skyline group lattice)
– A quotient lattice of the skyline projection lattice
– A sub-lattice of the maximal c-group lattice

Relationship Among Lattices
C-group lattice to maximal c-group lattice: quotient
C-group lattice to skyline projection lattice: sub-lattice
Maximal c-group lattice to skyline group lattice: sub-lattice
Skyline projection lattice to skyline group lattice: quotient

OLAP Analysis on Skylines
Subspace skylines
Relationships between skylines in subspaces
Closure information

Full Space vs. Subspace Skylines
For any skyline group (G, B), there exists at least one object $u \in G$ such that u is in the full space skyline
– Can use u as the representative of the group
An object not in the full space skyline can be in some subspace skyline only if it collapses to some full space skyline object
– All objects that are not in the full space skyline and do not collapse to any full space skyline object can be removed from skyline analysis
– If only the projections are of concern, the full space skyline objects alone are sufficient for skyline analysis
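
The pruning rule above can be sketched as a filter. The sketch reads "collapses to a full space skyline object" as "shares the value of at least one dimension with it", which is the weakest form of collapsing and therefore a safe condition for keeping an object; that reading, the function name, and the coordinates of a to e (picked to agree with the Intuition slide) are assumptions.

```python
def dominates(u, v):
    """u dominates v: u <= v everywhere and u < v somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def prune_for_subspace_analysis(data):
    """Return (full space skyline, objects worth keeping for subspace analysis)."""
    names = list(data)
    full = [n for n in names if not any(dominates(data[m], data[n]) for m in names)]

    def collapses(n):
        # n shares a value on at least one dimension with a full space skyline object
        return any(
            data[n][i] == data[s][i] for s in full for i in range(len(data[n]))
        )

    kept = [n for n in names if n in full or collapses(n)]
    return full, kept


# Hypothetical (X, Y) coordinates consistent with the Intuition slide
data = {"a": (1, 5), "b": (3, 3), "c": (5, 1), "d": (1, 7), "e": (4, 4)}

full, kept = prune_for_subspace_analysis(data)
print(full)  # ['a', 'b', 'c']
print(kept)  # ['a', 'b', 'c', 'd']
```

Object d survives because it shares its X value with a, while e shares nothing with a, b or c and is dropped.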

Subspace Skyline Computation
Compute the set of skyline groups and their signatures
Top-down enumeration of subspaces
– Similar ideas in skyline cube computation
For each subspace, find skyline groups and decisive subspaces
– Find (subspace) skylines by sorting
– Share sorting and use merge sort as much as possible

Enumerating Subspaces
Using a top-down enumeration tree
– Each child explores a proper subspace with one dimension fewer
– All objects that are not in the skyline of the parent subspace and do not collapse to a skyline object of the parent subspace can be removed
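
The slide's tree figure is not in the transcript, so the sketch below shows one common way to build such a top-down enumeration tree, under the assumption that each node may only drop dimensions that come after the last one dropped on its path. That ordering rule and the name `enumerate_subspaces` are choices made here; they guarantee that every non-empty subspace is visited exactly once.

```python
def enumerate_subspaces(dims):
    """Yield every non-empty subspace once, parents before children."""
    dims = tuple(dims)

    def visit(space, start):
        yield space
        # Only drop dimensions at positions >= start so no subspace repeats.
        for i in range(start, len(dims)):
            child = tuple(d for d in space if d != dims[i])
            if child:
                yield from visit(child, i + 1)

    yield from visit(dims, 0)


subspaces = list(enumerate_subspaces("ABCD"))
print(len(subspaces))  # 15
print(subspaces[:4])   # [('A', 'B', 'C', 'D'), ('B', 'C', 'D'), ('C', 'D'), ('D',)]
```

In a full implementation the object list passed to each child would already be reduced by the pruning rule on the slide; the sketch only covers the enumeration order.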

Computing Skylines by Sorting
Sort all objects in lexicographic ascending order
– a-d-b-e-c
Check the objects in the sorted list: an object is in the skyline if it is not dominated by any skyline object before it in the list
– {a, b, c} are skyline objects
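
The sorting procedure above can be sketched as follows. It works because any object that dominates another one also precedes it in lexicographic order, so each object only needs to be compared with the skyline objects found so far. The coordinates are hypothetical (the figure is not in the transcript) and were picked so that the sorted order is a-d-b-e-c and the skyline is {a, b, c}, as on the slide.

```python
def dominates(u, v):
    """u dominates v: u <= v everywhere and u < v somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_by_sorting(objects):
    """Return (names in lexicographic order of their points, names of skyline objects)."""
    ordered = sorted(objects, key=lambda name: objects[name])
    result = []
    for name in ordered:
        # Only earlier skyline objects can dominate the current one.
        if not any(dominates(objects[s], objects[name]) for s in result):
            result.append(name)
    return ordered, result


# Hypothetical (X, Y) coordinates
objects = {"a": (1, 5), "b": (3, 3), "c": (5, 1), "d": (1, 7), "e": (4, 4)}

ordered, result = skyline_by_sorting(objects)
print("-".join(ordered))  # a-d-b-e-c
print(result)             # ['a', 'b', 'c']
```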

Efficient Local Sorting
Not necessary to sort for each subspace
– A sorted list in subspace (A, B, C, D) can be used in subspaces (A), (A, B), (A, B, C)
– To generate a sorted list in subspace (B, C, D), we can use merge sort to merge the sublists of different values on A
If a non-skyline object collapses to a skyline object, the skyline object "absorbs" the non-skyline object by taking the non-skyline object's id
– A non-skyline object may be "absorbed" by multiple skyline objects
– Recursively reduce the number of objects and shorten the sorted lists
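
The merge step for subspace (B, C, D) can be sketched with the standard library. Rows that share a value on A are already sorted on (B, C, D) inside the (A, B, C, D) list, so the sublists only need a k-way merge. The rows are made-up values and `resort_without_first` is a name introduced here; the talk does not specify the merge implementation.

```python
import heapq
from itertools import groupby


def resort_without_first(sorted_rows):
    """Given rows sorted on (A, B, C, D), return them sorted on (B, C, D) by merging."""
    # One sublist per distinct value of A; each is already sorted on (B, C, D).
    sublists = [list(g) for _, g in groupby(sorted_rows, key=lambda r: r[0])]
    return list(heapq.merge(*sublists, key=lambda r: r[1:]))


# Hypothetical rows (A, B, C, D), already in lexicographic order
rows = [(1, 3, 2, 5), (1, 4, 1, 1), (2, 1, 9, 9), (2, 3, 2, 4), (3, 2, 2, 2)]

merged = resort_without_first(rows)
print(merged)
# [(2, 1, 9, 9), (3, 2, 2, 2), (2, 3, 2, 4), (1, 3, 2, 5), (1, 4, 1, 1)]
```

The prefix subspaces (A), (A, B) and (A, B, C) need no work at all, because a list sorted on (A, B, C, D) is also sorted on every prefix.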

Results on Great NBA Players
17,266 records
4 attributes are selected
67 skyline records in the full space, 146 decisive subspaces

# Skyline Groups vs. Dimensionality
Dimensionality: the complexity of subspaces
– A 1-d subspace has only one skyline group
– A high-dimensional subspace may have many skyline groups
– # skyline groups tends to increase when dimensionality increases
Number of subspaces
– An n-d data set has n 1-d subspaces, 1 n-d (sub-)space, and $\binom{n}{n/2} = \frac{n!}{(n/2)!\,(n/2)!}$ n/2-d subspaces (if n is even)
The number of skyline groups in subspaces of dimensionality k depends on the joint effect of the two factors
– When k < n/2, the two factors are consistent
– When k > n/2, the two factors are contrasting
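
The subspace counts quoted above are binomial coefficients: an n-d data set has C(n, k) subspaces of dimensionality k. A quick check for n = 4, the number of attributes used in the NBA experiment, is sketched below; `subspace_counts` is a helper name introduced here.

```python
from math import comb


def subspace_counts(n):
    """Map each dimensionality k (1..n) to the number of k-d subspaces, C(n, k)."""
    return {k: comb(n, k) for k in range(1, n + 1)}


counts = subspace_counts(4)
print(counts)                # {1: 4, 2: 6, 3: 4, 4: 1}
print(sum(counts.values()))  # 15, i.e. 2**4 - 1 non-empty subspaces
```

The count peaks at k = n/2 and falls off symmetrically on both sides, which is why the two factors on the slide pull in the same direction below n/2 and in opposite directions above it.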

About the Synthetic Data Sets
Independent: attribute values are uniformly distributed
Correlated: if a record is good in one dimension, it is likely to be good in the others as well
Anti-correlated: if a record is good in one dimension, it is unlikely to be good in the others

Scalability w.r.t. Database Size
[Figure panels: Independent, Correlated, Anti-correlated]

Scalability w.r.t. Dimensionality

Conclusions
Skyline analysis is important in many applications
– Skyline objects in the full space alone may not be enough
The skyline cube is a powerful way to answer subspace skyline queries
– But it is also interesting to ask why an object is in the subspace skylines, and more
Skyline groups and decisive subspaces: capturing the semantics of subspace skylines
OLAP subspace skyline analysis
An efficient algorithm to compute skyline groups

Thank You!
[Photo captions: Vancouver, BC, Canada; Trondheim, Norway; by Gerold Jung; Hong Kong skyline on a cloudy night around Central]