Authors: Kenneth S. Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper. Presented by: Dardan Xhymshiti.

Authors: Kenneth S. Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper. Presented by: Dardan Xhymshiti, Fall 2015.

- Introduction
- Skyline computation
- Related work
- GPU-friendly partitioning
- The SkyAlign algorithm
- Experimental evaluation

Skyline operator: first introduced by Stephan Börzsönyi, Donald Kossmann, and Konrad Stocker in 2001 (Universität Passau and Technische Universität München, Germany).

Skyline operator, example:
1. You go skiing for a day at one of Colorado's ski resorts.
2. You have already spent a lot of money.
3. Your car breaks down.
4. You need to find the nearest and cheapest hotel.
5. You take out your phone and launch an unfamiliar tourist application.
6. It lists many hotels in different locations at a variety of prices.
7. You want the CHEAPEST and the NEAREST one.

Skyline operator, example, query results:

Result query | Price | Distance (miles)
Hotel A      | $…    | …
Hotel B      | $140  | 1
Hotel C      | $200  | 2
Hotel D      | $…    | …
…            | …     | …

Skyline set = {Hotel A, Hotel B, Hotel D}

Key term: dominance. One hotel dominates another if it is no worse in both price and distance and strictly better in at least one of them. Hotel C is dominated by Hotel B, so it is not in the skyline set.
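The dominance relation and the resulting skyline set can be sketched in a few lines of Python. This is a minimal, quadratic illustration and not the algorithm from the paper. The prices and distances for Hotels A and D are hypothetical values chosen for this sketch, because the slide's values for them did not survive; only the Hotel B and Hotel C values come from the table.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in every dimension and strictly
    better in at least one (smaller values are better)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better

def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (price in dollars, distance in miles).
# A and D are hypothetical; B and C follow the table above.
hotels: Dict[str, Point] = {
    "A": (100.0, 3.0),  # hypothetical
    "B": (140.0, 1.0),
    "C": (200.0, 2.0),
    "D": (120.0, 2.0),  # hypothetical
}

sky = skyline(list(hotels.values()))
names = sorted(name for name, value in hotels.items() if value in sky)
print(names)
```

With these assumed values, Hotel C drops out because Hotel B is both cheaper and closer, which agrees with the skyline set given on the slide.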

Major problems:
- Multidimensional data.
- Computation-intensive.
- Tuple-to-tuple (point-to-point) comparisons.

What has been done so far:
- State-of-the-art sequential algorithms.
- Parallel skyline query processing algorithms.
- Parallel algorithms often try to reach the device's maximum theoretical compute throughput.
- Throughput is costly: the most efficient GPU algorithm, GSS, does up to 650 times more work than the best sequential algorithm, even when executing on 2688 cores.
- On benchmark datasets, sequential algorithms run 3x faster than GPU ones.

Should we use the GPU or not?

Sequential algorithms achieve high performance by using:
- Trees
- Recursion
- Strict ordering of computation
- Unpredictable branching

Many of these techniques are not compatible with the GPU.

Motivation: a new algorithm called SkyAlign, which:
- MAIN POINT: avoids point-to-point comparisons as much as it can.
- Employs a globally static grid scheme to make the dataset compatible with the GPU.
- Does not maximize THROUGHPUT but is WORK-EFFICIENT.

[Figure: computing the skyline set from a dataset, sequential vs. parallel]

GPU computation:
- Tesla K80: 4992 CUDA cores.
- Threads are grouped into warps, usually of 32 threads.
- Warps are grouped into thread blocks.
- All threads within a warp execute the same instruction at the same time.
- Problem: branch divergence. When threads of the same warp take different branches, the branches are executed one after another.
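The cost of branch divergence can be illustrated with a toy cost model in Python. This is a deliberate simplification and not a hardware simulator: it assumes that a warp of 32 threads needs one serialized pass per distinct branch taken by its threads. The function name `warp_passes` and the branch labels are hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide

def warp_passes(branch_ids):
    """Toy cost model: for each warp, count one serialized pass
    per distinct branch taken by the threads in that warp."""
    total = 0
    for start in range(0, len(branch_ids), WARP_SIZE):
        warp = branch_ids[start:start + WARP_SIZE]
        total += len(set(warp))
    return total

# 64 threads (two warps). Each entry is the branch a thread takes.
uniform = [0] * 64                       # every thread takes the same branch
divergent = [i % 4 for i in range(64)]   # four different branches per warp

print(warp_passes(uniform), warp_passes(divergent))
```

Under this model the divergent workload needs four times as many passes as the uniform one, even though both run the same number of threads.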

Partition-based skyline algorithms:
- Divide-and-conquer: the data space is halved recursively at the median of an arbitrarily chosen dimension, each half is solved, and the results are then merged.
- Sequential partition-based algorithms: these employ recursive, point-based partitioning. For each partition, a skyline point (the pivot) is found, and the other points are partitioned based on their relationship to the pivot. The work performed varies with the pivot selected.
- SkyAlign: a partition-based method, but it is not recursive and has no merge step.

Getting to know point-based methods:
- Point-based recursive partitioning methods use a quad-tree partitioning of the dataset and record skyline points in a tree as they are found.

[Figure: tree with nodes A, B, C, D, E, F]

Why is recursive partitioning not preferred?
- High divergence.
  - Traversal: consider comparing the points in F with the points in D. First, a dominance test (DT) against the root E is performed for each point, which generates a bitmask per point. These bitmasks then determine which branches of D each point of F should traverse. The results often diverge.
  - Partitioning: each partition has to be sub-partitioned relative to its own pivot, and the pivot needs to be a skyline point.
- High dimensions: quad-tree partitioning does not scale well with dimensionality.
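The role of the pivot bitmasks can be sketched as follows. This is a minimal illustration of the general idea and not the code of any specific algorithm named in the presentation. It assumes smaller values are better and sets bit i when a point is no better than the pivot in dimension i; the function names are hypothetical. If p dominates q, every bit set in p's mask must also be set in q's mask, so a cheap bitwise check can rule out many full dominance tests.

```python
def pivot_mask(point, pivot):
    """Bit i is set when the point is no better than the pivot
    in dimension i (smaller values are better)."""
    mask = 0
    for i, (x, v) in enumerate(zip(point, pivot)):
        if x >= v:
            mask |= 1 << i
    return mask

def may_dominate(mask_p, mask_q):
    """Necessary condition for p to dominate q: the bits set in
    p's mask form a subset of the bits set in q's mask."""
    return (mask_p & ~mask_q) == 0

pivot = (5, 5)
p = (2, 7)   # better than the pivot in dimension 0 only
q = (7, 2)   # better than the pivot in dimension 1 only
r = (7, 7)   # no better than the pivot in either dimension

mp, mq, mr = pivot_mask(p, pivot), pivot_mask(q, pivot), pivot_mask(r, pivot)
print(may_dominate(mp, mq), may_dominate(mq, mp), may_dominate(mp, mr))
```

Here p and q fall into incomparable regions, so neither needs to be tested against the other, while p may still dominate r and requires a full test. Which branch each point follows depends on its mask, which is the data-dependent behavior that leads to divergence on a GPU.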

Mask assignment: each point is assigned a mask based on the quartiles of the dataset in each dimension.
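One way to picture quartile-based mask assignment is the sketch below. It is an assumption-laden simplification: the exact mask encoding used by SkyAlign is not given on the slide, so this version simply stores a two-bit quartile index (0 to 3) per dimension. The helper names `quartile_bounds` and `assign_masks` are hypothetical.

```python
from bisect import bisect_right

def quartile_bounds(values):
    """Three cut points that split the sorted values into four
    roughly equal parts (a simple index-based estimate)."""
    s = sorted(values)
    n = len(s)
    return [s[n // 4], s[n // 2], s[(3 * n) // 4]]

def assign_masks(points):
    """Give each point a mask holding, per dimension, the index
    (0..3) of the quartile cell the point falls into."""
    dims = len(points[0])
    bounds = [quartile_bounds([p[d] for p in points]) for d in range(dims)]
    masks = []
    for p in points:
        mask = 0
        for d in range(dims):
            cell = bisect_right(bounds[d], p[d])  # 0, 1, 2, or 3
            mask |= cell << (2 * d)               # two bits per dimension
        masks.append(mask)
    return masks

points = [(1, 8), (2, 7), (3, 6), (4, 5), (5, 4), (6, 3), (7, 2), (8, 1)]
masks = assign_masks(points)
print(masks)
```

Because the quartiles are computed once for the whole dataset, every point is placed on the same static grid, in contrast to recursive schemes where each partition has its own pivot.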

Data sorting: the data points are sorted in the order of their masks.
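Sorting by mask places points from the same grid cell next to each other in memory. The sketch below assumes that masks are plain integers compared numerically; the actual ordering used by the algorithm may differ. The function names and the sample masks are hypothetical.

```python
from itertools import groupby

def sort_by_mask(points, masks):
    """Return the points and masks reordered by ascending mask."""
    order = sorted(range(len(points)), key=lambda i: masks[i])
    return [points[i] for i in order], [masks[i] for i in order]

def partitions(sorted_points, sorted_masks):
    """Group consecutive points that share a mask into (mask, points) pairs."""
    pairs = zip(sorted_masks, sorted_points)
    return [(m, [p for _, p in grp])
            for m, grp in groupby(pairs, key=lambda mp: mp[0])]

points = [(7, 2), (1, 8), (4, 5), (8, 1), (2, 7)]
masks = [3, 12, 9, 3, 12]  # hypothetical masks for the points above

sorted_points, sorted_masks = sort_by_mask(points, masks)
cells = partitions(sorted_points, sorted_masks)
print(cells)
```

After sorting, each cell is a contiguous run, so neighboring threads would tend to work on points with the same mask and follow the same control flow.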

Work-efficiency: compare the performance of the four algorithms with respect to:
1. Dominance tests (DT)
2. Work-efficiency
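Counting dominance tests gives a hardware-independent measure of how much work an algorithm does. The sketch below compares two simple sequential strategies on a small hypothetical dataset. Neither is one of the four algorithms evaluated in the presentation; they only illustrate how the DT count can differ for the same result.

```python
def dominates(p, q):
    """p dominates q: no worse everywhere, strictly better somewhere."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline_all_pairs(points):
    """Compare every point with every point; return (skyline, DT count)."""
    tests, result = 0, []
    for p in points:
        dominated = False
        for q in points:
            tests += 1
            if dominates(q, p):
                dominated = True
        if not dominated:
            result.append(p)
    return result, tests

def skyline_sorted_window(points):
    """Sort by coordinate sum so that no later point can dominate an
    earlier one, then test each point only against the skyline so far."""
    tests, window = 0, []
    for p in sorted(points, key=sum):
        dominated = False
        for s in window:
            tests += 1
            if dominates(s, p):
                dominated = True
                break
        if not dominated:
            window.append(p)
    return window, tests

# Hypothetical dataset for illustration.
data = [(1, 8), (2, 7), (3, 6), (4, 5), (5, 5), (6, 6), (7, 7), (8, 8)]
sky_naive, dt_naive = skyline_all_pairs(data)
sky_window, dt_window = skyline_sorted_window(data)
print(dt_naive, dt_window)
```

Both functions return the same skyline, but the sorted variant performs far fewer dominance tests, which is the sense in which an algorithm can be called work-efficient.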

Thank You