VLDB 2011. Pohang University of Science and Technology (POSTECH), Republic of Korea. 2011. 9. 1. Jongwuk Lee, Seung-won Hwang.

Presentation transcript:

Pohang University of Science and Technology (POSTECH), Republic of Korea. Jongwuk Lee, Seung-won Hwang. VLDB 2011

Outline
- Motivation
- Skycube Computation
- Experiments
- Conclusion

What is a Skyline?
- Alice looks for the cheapest and lightest cell phone.
- Skyline: the set of points that are not dominated by any other point, i.e., the set of top-1 candidates.
- a dominates b: a is no worse than b on all dimensions and strictly better on at least one.
- a is incomparable with b: a is not dominated by b and vice versa.

[Figure: cell phones plotted by Price (low to high) and Weight (light to heavy), illustrating "dominates" and "is incomparable with" for a point a.]
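The dominance test and the skyline definition above can be sketched in a few lines. This is a minimal quadratic-time illustration, not the algorithm from the talk; the phone data and the names `dominates` and `skyline` are hypothetical, and smaller values are assumed to be better on every dimension.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse on every dimension, strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (O(n^2) tests)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, weight) pairs; smaller is better on both.
phones = [(100, 180), (150, 120), (200, 110), (160, 170), (250, 200)]
result = skyline(phones)
print(result)
```

Here (160, 170) drops out because (150, 120) is both cheaper and lighter, while the three remaining phones are pairwise incomparable.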

Subspace Skyline
- What if users issue skyline queries on arbitrary subsets of dimensions?
- Subspace skylines can vary significantly, depending on user-specific preferences.

[Figure: two plots. Left: skyline on {Price, Weight}, with Price from low to high and Weight from light to heavy. Right: skyline on {Price, LCD size}, with Price from low to high and LCD size from big to small.]
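To see how the answer changes with the chosen dimensions, the sketch below restricts the dominance test to a subset of dimension indices. The data are hypothetical, and the LCD size is stored negated as an assumption so that "smaller is better" holds on every dimension.

```python
def dominates_on(a, b, dims):
    """a dominates b when only the dimensions in dims are compared."""
    return (all(a[i] <= b[i] for i in dims)
            and any(a[i] < b[i] for i in dims))


def subspace_skyline(points, dims):
    """Names of the points not dominated by any other point on dims."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates_on(q, p, dims) for q in points.values())
    )


# Hypothetical phones: (price, weight, -lcd_size).
phones = {
    "a": (100, 180, -3.0),
    "b": (150, 120, -3.5),
    "c": (200, 110, -5.0),
    "d": (160, 170, -4.0),
}

price_weight = subspace_skyline(phones, (0, 1))
price_lcd = subspace_skyline(phones, (0, 2))
print(price_weight, price_lcd)
```

Phone d is dominated on {Price, Weight} but belongs to the skyline on {Price, LCD size}, which is the kind of variation the slide refers to.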

What is a Skycube?
- A skycube is the collection of all possible subspace skylines. A d-dimensional space contains 2^d - 1 subspaces.
- Naïve approach: serially compute the skyline of each subspace.
- Is it possible to reuse subspace skyline computation?

[Figure: lattice of subspaces (D1D2D3; D1D2, D1D3, D2D3; D1, D2, D3) over the points a to g. Dimensions: Price, Weight, Size.]

Skycube of the example:
Subspace U    SKY_U(S)
D1            g
D2            f
D3            d, e, g
D1D2          a, e, f, g
D1D3          g
D2D3          e, f
D1D2D3        a, e, f, g

Source: Yidong Yuan et al. "Efficient Computation of the Skyline Cube", VLDB 2005
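The naïve approach can be written directly: enumerate all 2^d - 1 non-empty subspaces and compute each skyline from scratch. This is only a baseline sketch with hypothetical data and function names; nothing is shared between subspaces.

```python
from itertools import combinations


def dominates_on(a, b, dims):
    """a dominates b on dims (smaller is better)."""
    return (all(a[i] <= b[i] for i in dims)
            and any(a[i] < b[i] for i in dims))


def naive_skycube(points):
    """Map each non-empty subspace (tuple of indices) to its skyline."""
    d = len(next(iter(points.values())))
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            cube[dims] = sorted(
                name for name, p in points.items()
                if not any(dominates_on(q, p, dims) for q in points.values())
            )
    return cube


# Hypothetical points with three dimensions, smaller is better.
phones = {
    "a": (100, 180, -3.0),
    "b": (150, 120, -3.5),
    "c": (200, 110, -5.0),
    "d": (160, 170, -4.0),
}

cube = naive_skycube(phones)
for dims, sky in cube.items():
    print(dims, sky)
```

With d = 3 the loop visits 7 subspaces; the cost grows exponentially in d, which is why reusing work across subspaces matters.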


Strategies for Computing the Skycube (1 / 2)
- Sharing result: exploit pre-computed subspace skylines to compute another subspace skyline.
  - If U ⊂ V, then SKY_U(S) ⊆ SKY_V(S) under the distinct value condition (no two points have the same value on any dimension).
- Bottom-up skycube algorithm (BUS) [VLDB 2005]
  - Computes the skycube in a level-wise, bottom-up manner.
  - Reduces the number of dominance tests for SKY_V(S).
  - Limitation: the dominance tests of non-skyline points cannot be reused.

[Figure: SKY_D1(S) is contained in SKY_D1D2(S), where U = D1 ⊂ V = D1D2.]
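The containment SKY_U(S) ⊆ SKY_V(S) is what a bottom-up strategy reuses: every point already found in a child skyline is accepted in the parent without any dominance test. The sketch below illustrates that idea only; it is not the BUS implementation, the data are hypothetical, and it assumes the distinct value condition holds.

```python
from itertools import combinations


def dominates_on(a, b, dims):
    """a dominates b on dims (smaller is better)."""
    return (all(a[i] <= b[i] for i in dims)
            and any(a[i] < b[i] for i in dims))


def bottom_up_skycube(points, d):
    """Level-wise skycube; child skylines seed each parent skyline."""
    ids = range(len(points))
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            # Points in any child skyline are in SKY_dims as well
            # (valid only under the distinct value condition).
            seed = set()
            for child in combinations(dims, k - 1):
                if child:
                    seed |= cube[child]
            # Only the remaining points need dominance tests.
            found = {
                i for i in ids
                if i not in seed
                and not any(dominates_on(points[j], points[i], dims)
                            for j in ids)
            }
            cube[dims] = seed | found
    return cube


# Hypothetical points; all values are distinct on every dimension.
pts = [(100, 180, -3.0), (150, 120, -3.5), (200, 110, -5.0), (160, 170, -4.0)]
cube = bottom_up_skycube(pts, 3)
print(cube[(0, 1)], cube[(0, 1, 2)])
```

The saving applies to skyline points only: a point rejected in a child subspace still has to be tested again in the parent, which matches the limitation noted on the slide.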

Strategies for Computing the Skycube (2 / 2)
- Sharing structure: exploit a shared structure to compute skylines on overlapping subspaces.
- Top-down skycube algorithm (TDS) [VLDB 2005]
  - Computes the skycube in a top-down manner.
  - Exploits the two-dimensional space partitioning derived from the DC (divide-and-conquer) algorithm.
  - Limitation: dominance and incomparability relationships cannot be optimized in high-dimensional data.

[Figure: SKY_D1D2(S) and SKY_D1(S) computed over a shared partitioning of D1 and D2.]

One Summary Slide
- Existing algorithms still have room for optimization. How can the skycube be computed more efficiently?
- Main idea
  - Exploit a finer structure to share both dominance and incomparability relationships. → Point-based space partitioning
  - Sharing result for a single parent can be extended to multiple parents.

Point-Based Space Partitioning (1 / 3)
- Basic idea
  - A skyline point is selected as the pivot point.
  - The pivot point partitions the d-dimensional space into 2^d subregions.
- For d = 2, each subregion is mapped to a 2-bit binary vector.

[Figure: pivot o splits the D1-D2 plane into four subregions containing the points a to j; annotations mark which subregion is dominated and which subregions are mutually incomparable.]
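The mapping from points to subregions can be sketched as follows. The bit convention is an assumption chosen to agree with the vertical relationships later in the deck: bit i is 1 when the point is no better than the pivot on dimension i, and 0 when it is strictly better (smaller). The function names and sample points are hypothetical.

```python
def region_vector(pivot, q):
    """Binary vector of q relative to pivot (1 = q is no better on D_i)."""
    return tuple(int(qi >= pi) for pi, qi in zip(pivot, q))


def partition(pivot, points):
    """Group all points other than the pivot by their binary vector."""
    regions = {}
    for q in points:
        if q != pivot:
            regions.setdefault(region_vector(pivot, q), []).append(q)
    return regions


# Hypothetical 2-D points; (5, 5) plays the role of the pivot.
pivot = (5, 5)
points = [(5, 5), (7, 8), (3, 9), (8, 2), (9, 9), (1, 7)]
regions = partition(pivot, points)
for vec, members in sorted(regions.items()):
    print(vec, members)
```

Under this convention every point in region (1, 1) is dominated by the pivot and can be discarded, while points in (0, 1) and (1, 0) can never dominate each other, so no tests between those two regions are needed.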

Point-Based Space Partitioning (2 / 3)
- Binary vectors restrict the subspaces in which a point can be a skyline point.

[Figure: two panels around pivot o. Left: computing SKY_D1D2(S). Right: computing SKY_D1(S).]

Point-Based Space Partitioning (3 / 3)
- Projecting D1D2D3 onto D1D2
  - Identify the relationships between points by projecting binary vectors.

[Figure: the 3-bit vectors projected onto D1D2 as 00*, 01*, 10*, 11*.]
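Projection of a binary vector amounts to masking the bits of the dropped dimensions. The helper below is a hypothetical illustration that writes a masked bit as `*`, following the notation on the slide.

```python
from itertools import product


def project_vector(vector, kept):
    """Mask every bit whose dimension index is not in kept with '*'."""
    return "".join(str(b) if i in kept else "*" for i, b in enumerate(vector))


# All eight 3-bit vectors of a D1D2D3 partitioning, projected onto D1D2.
all_vectors = list(product((0, 1), repeat=3))
projected = sorted({project_vector(v, {0, 1}) for v in all_vectors})
print(projected)
```

Vectors that differ only on D3 collapse to the same projected vector, so the relationships recorded for D1D2D3 can be read off for D1D2 without repartitioning the points.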

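Constructing a SkyTree
- Skyline algorithm using point-based space partitioning
  - Partition subregions in a recursive way.
  - Construct a skytree while computing the skyline.

[Figure: the D1-D2 plane with points a to j is partitioned around pivot e and then recursively inside each subregion. Each skytree node is a pair (selected pivot point, partitioned point set): the root is (e, S), where S is the entire point set, followed by (b, {a, b, c}) and (h, {h, i, j}), then (j, {j}); pairs such as (null, {a, b, c}) and (null, {h, i, j}) appear before a pivot is selected.]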
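A simplified recursive construction is sketched below. It is not the authors' implementation: the pivot is chosen as the point with the smallest coordinate sum (which no point can dominate), the bit convention is the assumed one (1 = no better than the pivot), and the tree is a plain `(pivot, children)` tuple. All names and data are hypothetical.

```python
def dominates(a, b):
    """a dominates b on all dimensions (smaller is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def tree_points(tree):
    """All pivots stored in a skytree, i.e. the skyline points."""
    pivot, children = tree
    out = [pivot]
    for sub in children.values():
        out += tree_points(sub)
    return out


def build_skytree(points):
    """Return (pivot, {binary vector: subtree}) or None for no points."""
    if not points:
        return None
    pivot = min(points, key=sum)  # minimal sum => cannot be dominated
    regions = {}
    for q in points:
        if q == pivot:
            continue
        vec = tuple(int(qi >= pi) for pi, qi in zip(pivot, q))
        if all(vec):
            continue  # region 1...1 is dominated by the pivot
        regions.setdefault(vec, []).append(q)
    children = {}
    for vec in sorted(regions, key=sum):  # fewer 1-bits first
        # Only earlier regions whose vector is bitwise <= vec can
        # contain points that dominate points of this region.
        rivals = [s for other, sub in children.items()
                  if all(o <= v for o, v in zip(other, vec))
                  for s in tree_points(sub)]
        survivors = [q for q in regions[vec]
                     if not any(dominates(s, q) for s in rivals)]
        sub = build_skytree(survivors)
        if sub is not None:
            children[vec] = sub
    return pivot, children


# Hypothetical 2-D points.
points = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1),
          (5, 6), (8, 8), (3, 8), (6, 3)]
tree = build_skytree(points)
print(sorted(tree_points(tree)))
```

Because each region is filtered against the skyline points of compatible earlier regions before recursing, every node in the resulting tree is a skyline point of the whole input, and the links keep the binary vectors that later slides reuse.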

Sharing a SkyTree (1 / 3)
- Vertical relationship 1: for the skytree on V, if a link between a parent p and a child q is associated with a binary vector B such that ∀ d_i ∈ U : B_i = 1, then p dominates q on U, where U ⊂ V.
- Vertical relationship 2: for the skytree on V, if a link between a parent p and a child q is associated with a binary vector B such that ∀ d_i ∈ U : B_i = 0, then q dominates p on U, where U ⊂ V.

[Figure: skytree with root t and nodes p, q, r; link vectors 00*, 11*, 01* after projecting onto D1D2.]
- Vertical relationship 1: t dominates {q, r} on D1D2.
- Vertical relationship 2: p dominates t on D1D2.
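Both vertical relationships reduce to inspecting the bits of a link's vector on the dimensions of U. The sketch below assumes the distinct value condition and the bit convention "1 = the child is worse than the parent on that dimension"; the function name, the returned strings, and the sample points are hypothetical.

```python
def relation_on(vector, subspace):
    """Relationship between parent and child on the given dimensions."""
    bits = [vector[i] for i in subspace]
    if all(b == 1 for b in bits):
        return "parent dominates child"   # vertical relationship 1
    if all(b == 0 for b in bits):
        return "child dominates parent"   # vertical relationship 2
    return "incomparable"


# Hypothetical parent and child in D1D2D3 (smaller is better).
parent = (5, 5, 5)
child = (7, 8, 1)
link = tuple(int(c >= p) for p, c in zip(parent, child))

print(link)
print(relation_on(link, (0, 1)))
print(relation_on(link, (2,)))
print(relation_on(link, (0, 2)))
```

No coordinate comparison is repeated for the subspaces: the vector computed once on D1D2D3 already answers the question for D1D2, D3, and D1D3.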

Sharing a SkyTree (2 / 3)
- Horizontal relationship: exploit the transitivity between two vertical relationships.
- Propagate the relationships by combining both vertical and horizontal relationships.

[Figure: skytree with root t and nodes p, q, r; link vectors 00*, 01*, 10* after projecting onto D1D2.]
- Horizontal relationship: p dominates {q, r} on D2.
- Vertical relationship: q dominates r on D1.
- If p dominates q on D1, then p dominates r on D1D2.
- If p dominates q, and q dominates r, then p dominates r.

Sharing a SkyTree (3 / 3)
- Identify skyline candidates by traversing the skytree.
  - Access nodes in a topological order that preserves the dominance relationships between nodes.

[Figure: skytree with nodes e, b, h, j over the points a to j in the D1-D2 plane, with links annotated by {D1} and {D2}.]
- SKY_D1D2(S) = {b, e, h, j}
- SKY_D1(S) = {b}
- SKY_D2(S) = {j}

Sharing Multiple Parents

[Figure: SKY_D1(S) derived from both SKY_D1D2(S) and SKY_D1D3(S).]

Proposed Algorithm: QSkycube
- Compute the skycube in a top-down manner.
  - Compute the skyline and construct the corresponding skytree.
  - Sharing a skytree: traverse the skytree in a depth-first way and eliminate non-skyline points.
  - Sharing multiple parents: when computing SKY_D1(S), both SKY_D1D2(S) and SKY_D1D3(S) are used.

[Figure: lattice of subspaces (D1D2D3; D1D2, D1D3, D2D3; D1, D2, D3) with the subspace skylines below.]

Subspace      Skyline
D1            g
D2            f
D3            d, e, g
D1D2          a, e, f, g
D1D3          g
D2D3          e, f
D1D2D3        a, e, f, g
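The "sharing multiple parents" step can be illustrated on its own, leaving the skytree out. Under the distinct value condition, SKY_U(S) is contained in the skyline of every parent subspace, so the intersection of all parent skylines is a valid and usually smaller candidate set. The sketch below shows only this idea and should not be read as QSkycube itself; the names and data are hypothetical.

```python
from itertools import combinations


def dominates_on(a, b, dims):
    """a dominates b on dims (smaller is better)."""
    return (all(a[i] <= b[i] for i in dims)
            and any(a[i] < b[i] for i in dims))


def skyline_on(candidates, points, dims):
    """Skyline on dims, testing candidates only against each other."""
    return {
        i for i in candidates
        if not any(dominates_on(points[j], points[i], dims)
                   for j in candidates)
    }


def top_down_skycube(points, d):
    """Top-down skycube; candidates come from all parent skylines."""
    full = tuple(range(d))
    cube = {full: skyline_on(set(range(len(points))), points, full)}
    for k in range(d - 1, 0, -1):
        for dims in combinations(range(d), k):
            parents = [v for v in cube
                       if len(v) == k + 1 and set(dims) <= set(v)]
            # Assumes distinct values: SKY_dims is inside every parent.
            candidates = set.intersection(*(cube[v] for v in parents))
            cube[dims] = skyline_on(candidates, points, dims)
    return cube


# Hypothetical points; all values are distinct on every dimension.
pts = [(100, 180, -3.0), (150, 120, -3.5), (200, 110, -5.0), (160, 170, -4.0)]
cube = top_down_skycube(pts, 3)
print(cube[(0,)], cube[(1,)], cube[(2,)])
```

For the subspace D2 in this data, the two parents leave a single candidate, so its skyline is found without any dominance test that could fail; with a single parent the candidate set would be larger.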


Experiments (1 / 5)
- Experimental settings
  - Distribution: Independent, Anti-correlated
  - Dimensionality d: 4 ~ 22 (default d = 12)
  - Cardinality n: 200K ~ 1,000K (default n = 200K)
- Compared algorithms (vs. QSkycube)
  - BUS: exploits sharing result, based on SFS.
  - TDS: exploits sharing structure, based on DC.
  - BSkytreeS: serially computes each subspace skyline using BSkyTree-P.

Source: Jongwuk Lee et al. "BSkyTree: Scalable Skyline Computation Using a Balanced Pivot Point Selection", EDBT 2010

Experiments (2 / 5)
- Scalability of our proposed algorithm over dimensionality

[Figure: two panels, Independent and Anti-correlated; the proposed algorithm is marked "Ours".]

Experiments (3 / 5)
- Scalability of our proposed algorithm over dimensionality

[Figure: two panels, Independent and Anti-correlated; the proposed algorithm is marked "Ours".]

Experiments (4 / 5)
- Scalability of our proposed algorithm over cardinality

[Figure: two panels, Independent and Anti-correlated; the proposed algorithm is marked "Ours".]

Experiments (5 / 5)
- Effect of sharing multiple parents
  - Sharing Single Parent (SSP) vs. Sharing Multiple Parents (SMP)

[Figure: one panel, Anti-correlated; the proposed algorithm is marked "Ours".]


Conclusion
- We studied an efficient skycube algorithm based on point-based space partitioning.
  - QSkycube exploits sharing structure at a finer granularity and sharing result for multiple parents.
- The proposed algorithm is significantly faster than state-of-the-art algorithms.
  - QSkycube is about 4 ~ 5 times faster than existing algorithms.

Q & A
Thank you!

References
- Yidong Yuan et al. "Efficient Computation of the Skyline Cube", International Conference on Very Large Data Bases (VLDB), 2005
- Jongwuk Lee et al. "BSkyTree: Scalable Skyline Computation Using a Balanced Pivot Point Selection", International Conference on Extending Database Technology (EDBT), 2010