Sameh Shohdy, Yu Su, and Gagan Agrawal

Load Balancing and Accelerating Parallel Spatial Join Operations using Bitmap Indexing
Sameh Shohdy, Yu Su, and Gagan Agrawal
Computer Science and Engineering, The Ohio State University, Columbus, OH
HiPC15 (Bengaluru, India)

Introduction
Spatial data is proliferating across a wide variety of spatial applications. Common spatial queries include range queries, nearest neighbor (NN) queries, and spatial join (SJ) queries. This work focuses on spatial join queries and aims to improve existing sequential and parallel SJ algorithms using bitmap indexing.

Background
A spatial join combines two spatial datasets to find the pairs of spatial objects that satisfy a given spatial relation. Examples:
Find all the cities located in the state of Ohio: Ohio (state) contains Columbus, Cleveland, Dayton, etc. (cities).
Find all the Interstate highways that intersect I-70 in Ohio: I-70 intersects I-71 in Columbus, as well as I-75 and I-77 elsewhere in the state.
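To make the definition concrete, here is a minimal Python sketch of the simplest strategy, a nested-loop spatial join, assuming every object is reduced to an axis-aligned rectangle (minx, miny, maxx, maxy). The predicate names `intersects` and `contains` and all coordinates are hypothetical; they only mirror the two example queries.

```python
from typing import Callable, List, Tuple

# An axis-aligned MBR: (minx, miny, maxx, maxy)
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(a: Rect, b: Rect) -> bool:
    """True if rectangle a completely contains rectangle b."""
    return a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]


def nested_loop_join(r: List[Rect], s: List[Rect],
                     pred: Callable[[Rect, Rect], bool]) -> List[Tuple[int, int]]:
    """Return every index pair (i, j) with pred(r[i], s[j]); costs |r| * |s| tests."""
    return [(i, j) for i, a in enumerate(r) for j, b in enumerate(s) if pred(a, b)]


# Hypothetical coordinates: one "state" and three "cities".
states = [(0.0, 0.0, 10.0, 10.0)]
cities = [(1.0, 1.0, 2.0, 2.0), (5.0, 5.0, 6.0, 6.0), (11.0, 11.0, 12.0, 12.0)]
print(nested_loop_join(states, cities, contains))  # [(0, 0), (0, 1)]
```

The quadratic number of predicate tests is exactly what the filter step, partitioning, and indexing techniques in the following slides try to avoid.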

Background (2)
Existing SJ algorithms fall into two classes.
In-memory SJ algorithms, e.g., nested-loop join, indexed nested-loop join, R-tree join, and plane sweep. These work in two steps (filter and refine) to find related objects:
Filter: represent each object by its Minimum Bounding Rectangle (MBR) and select a set of candidate pairs, which reduces the cost of the join.
Refine: use the full geometric representation to evaluate each candidate.
Disk-based SJ algorithms, e.g., Partition Based Spatial-Merge join (PBSM) and Size Separation Spatial join (S3).
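A minimal sketch of the filter and refine steps, assuming (for illustration only) that the full geometry of every object is a circle: the filter compares cheap MBRs, and the refine step applies the exact circle test to the surviving candidates. The function names and data are hypothetical.

```python
import math
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (cx, cy, radius): stands in for the full geometry
Rect = Tuple[float, float, float, float]  # MBR: (minx, miny, maxx, maxy)


def mbr(c: Circle) -> Rect:
    cx, cy, r = c
    return (cx - r, cy - r, cx + r, cy + r)


def mbr_intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def exact_intersects(a: Circle, b: Circle) -> bool:
    # Two discs intersect iff the distance between centres is <= the sum of radii.
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]


def filter_refine_join(r: List[Circle], s: List[Circle]):
    # Filter: the cheap MBR test yields candidates, possibly including false hits.
    candidates = [(i, j) for i, a in enumerate(r) for j, b in enumerate(s)
                  if mbr_intersects(mbr(a), mbr(b))]
    # Refine: the exact test on the full geometry discards the false hits.
    results = [(i, j) for i, j in candidates if exact_intersects(r[i], s[j])]
    return candidates, results


r = [(0.0, 0.0, 1.0)]
s = [(1.8, 1.8, 1.0), (1.5, 0.0, 1.0), (5.0, 5.0, 1.0)]
print(filter_refine_join(r, s))  # ([(0, 0), (0, 1)], [(0, 1)])
```

Pair (0, 0) survives the filter because the two bounding boxes overlap, but the refine step rejects it since the centre distance (about 2.55) exceeds the sum of radii (2).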

Partition Based Spatial-Merge join (PBSM)
PBSM divides the space into a set of partitions, each containing a small subset of the spatial objects from both datasets. The objects in each partition can then be joined using any existing in-memory spatial join algorithm.
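The partition-then-join idea can be sketched as follows, assuming a uniform n x n grid with one cell per partition, replication of objects that span several cells, and a Python set to drop duplicate result pairs. This is a simplified illustration, not PBSM itself: the published algorithm is more elaborate (for example, it maps grid tiles to partitions and has its own duplicate-elimination step). All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Cell = Tuple[int, int]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def overlapped_cells(rect: Rect, extent: Rect, n: int) -> Iterator[Cell]:
    """Yield every (col, row) of an n x n grid over `extent` that rect overlaps."""
    x0, y0, x1, y1 = extent
    w, h = (x1 - x0) / n, (y1 - y0) / n

    def clamp(v: float) -> int:
        return min(max(int(v), 0), n - 1)

    for col in range(clamp((rect[0] - x0) // w), clamp((rect[2] - x0) // w) + 1):
        for row in range(clamp((rect[1] - y0) // h), clamp((rect[3] - y0) // h) + 1):
            yield (col, row)


def partition_join(r: List[Rect], s: List[Rect], extent: Rect, n: int) -> Set[Tuple[int, int]]:
    parts_r: Dict[Cell, List[int]] = defaultdict(list)
    parts_s: Dict[Cell, List[int]] = defaultdict(list)
    # Partitioning: an object that spans several cells is replicated into each.
    for i, a in enumerate(r):
        for cell in overlapped_cells(a, extent, n):
            parts_r[cell].append(i)
    for j, b in enumerate(s):
        for cell in overlapped_cells(b, extent, n):
            parts_s[cell].append(j)
    # Per-partition join (a nested loop here; any in-memory method would do).
    # Replication can report a pair in several cells, so a set removes duplicates.
    result: Set[Tuple[int, int]] = set()
    for cell, ids_r in parts_r.items():
        for i in ids_r:
            for j in parts_s.get(cell, []):
                if intersects(r[i], s[j]):
                    result.add((i, j))
    return result


extent = (0.0, 0.0, 10.0, 10.0)
r = [(4.0, 4.0, 6.0, 6.0), (1.0, 1.0, 2.0, 2.0)]
s = [(5.5, 5.5, 7.0, 7.0), (8.0, 8.0, 9.0, 9.0), (1.5, 0.0, 3.0, 1.5), (4.5, 4.5, 5.5, 5.5)]
print(sorted(partition_join(r, s, extent, 2)))  # [(0, 0), (0, 3), (1, 2)]
```

Here r[0] and s[3] both straddle all four cells, so the pair (0, 3) is found four times and kept once; each partition's join is independent, which is what makes the scheme easy to parallelize.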

Challenge
Data skew hurts the performance of such partitioning algorithms: with skewed data, some partitions receive far more objects than others.
[Figure: load variation of the parallel PBSM join algorithm with 64 cores]
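The effect of skew on a uniform grid can be illustrated with a small sketch, assuming point data, a 2 x 2 grid, and a max/mean load ratio as the imbalance measure (both the metric and the toy data are assumptions, not the paper's setup). In a parallel run the busiest partition tends to determine the finishing time, so a high ratio signals idle cores.

```python
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]
Extent = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def grid_cell(p: Point, extent: Extent, n: int) -> Tuple[int, int]:
    x0, y0, x1, y1 = extent
    col = min(int((p[0] - x0) / (x1 - x0) * n), n - 1)
    row = min(int((p[1] - y0) / (y1 - y0) * n), n - 1)
    return col, row


def loads(points: List[Point], extent: Extent, n: int) -> List[int]:
    """Number of objects that fall into each cell of a uniform n x n grid."""
    counts = Counter(grid_cell(p, extent, n) for p in points)
    return [counts.get((c, r), 0) for c in range(n) for r in range(n)]


def imbalance(work: List[int]) -> float:
    """Max/mean load: 1.0 is perfectly balanced; larger means the busiest
    worker has proportionally more to do than the average one."""
    return max(work) / (sum(work) / len(work))


extent = (0.0, 0.0, 4.0, 4.0)
uniform = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
skewed = [(0.5, 0.5)] * 13 + [(0.5, 3.5), (3.5, 0.5), (3.5, 3.5)]
print(loads(uniform, extent, 2), imbalance(loads(uniform, extent, 2)))  # [4, 4, 4, 4] 1.0
print(loads(skewed, extent, 2), imbalance(loads(skewed, extent, 2)))    # [13, 1, 1, 1] 3.25
```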

Contributions
We describe an effective load-balancing approach, Bitmap Partitioning-based Spatial Join (BPSJ), that uses bitmap indexing as a summary of the data. We also use bitmaps to speed up the in-memory spatial join operation, In-memory Bitmap-based Spatial Join (IBSJ). BPSJ outperforms the well-known parallel PBSM method by an average of 6.1x, while IBSJ achieves average speedups of 4.2x over plane sweep and 3.06x over the R-tree method.

Bitmap Indexing
A set of bitmap vectors is built without reorganizing the data, and queries are answered with lightweight logical operations (AND, OR, and NOT).
[Figure: bitmap vectors V0-V3 for attributes att0 and att1 over records R0-R4]
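A minimal sketch of an equality-encoded bitmap index, assuming one bit vector per distinct attribute value and using Python's arbitrary-precision integers as bit vectors. The attribute values below are hypothetical (the slide's example table did not survive extraction); the point is that AND, OR, and NOT over vectors answer queries without touching or reordering the records.

```python
from typing import Dict, List


def build_bitmaps(values: List[int]) -> Dict[int, int]:
    """Equality-encoded bitmap index stored as Python ints:
    bit i of bitmaps[v] is 1 iff record i has value v."""
    bitmaps: Dict[int, int] = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps


def records(bits: int) -> List[int]:
    """Decode a bit vector into the ids of the records whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


att0 = [1, 3, 4, 6, 3]  # hypothetical values for records R0..R4
att1 = [2, 1, 2, 5, 1]
all_records = (1 << len(att0)) - 1  # mask needed to implement NOT on a finite vector
b0, b1 = build_bitmaps(att0), build_bitmaps(att1)

print(records(b0[3] & b1[1]))          # att0 = 3 AND att1 = 1  -> [1, 4]
print(records(b0[1] | b0[4] | b1[5]))  # att0 in {1, 4} OR att1 = 5 -> [0, 2, 3]
print(records(~b0[3] & all_records))   # NOT att0 = 3 -> [0, 2, 3]
```

Production bitmap indexes usually bin values and compress the vectors; plain integers are used here only to keep the logic visible.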

Bitmap Partitioning-based Spatial Join (BPSJ)
BPSJ uses bitmaps to efficiently divide the space into a set of partitions. Using the pre-generated bitmap vectors, the actual number of objects per partition can be calculated, so the split lines can be placed to balance the load. A set of splits is generated for each dimension.
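The idea of choosing split lines from bitmap counts can be sketched in one dimension as below. This is a hypothetical illustration, not the paper's BPSJ procedure: it assumes each object is assigned to a single bin (e.g., by its binned min-x), that per-bin counts are taken as popcounts of the pre-built vectors, and that a split is emitted each time the running count reaches the next multiple of total / num_parts.

```python
from typing import Dict, List


def build_bitmaps(values: List[int]) -> Dict[int, int]:
    """Equality-encoded bitmaps: bit i of bitmaps[v] is set iff object i is in bin v."""
    bitmaps: Dict[int, int] = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps


def popcount(bits: int) -> int:
    return bin(bits).count("1")


def choose_splits(bitmaps: Dict[int, int], num_parts: int) -> List[int]:
    """Split values along one dimension so each slab holds about
    total / num_parts objects. Per-bin counts come only from popcounts of the
    pre-built vectors, so the objects themselves are never scanned."""
    keys = sorted(bitmaps)
    total = sum(popcount(b) for b in bitmaps.values())
    splits: List[int] = []
    running, target = 0, 1
    for k in keys:
        running += popcount(bitmaps[k])
        if target < num_parts and running >= target * total / num_parts:
            splits.append(k)  # slab ends after bin k
            # a heavy bin may satisfy several quotas at once; skip those
            while target < num_parts and running >= target * total / num_parts:
                target += 1
    return splits


def slab_sizes(bitmaps: Dict[int, int], splits: List[int]) -> List[int]:
    """Objects per slab, by OR-ing the bin vectors of each slab."""
    sizes, lo = [], None
    for hi in splits + [None]:
        vec = 0
        for k, bits in bitmaps.items():
            if (lo is None or k > lo) and (hi is None or k <= hi):
                vec |= bits
        sizes.append(popcount(vec))
        lo = hi
    return sizes


# Hypothetical binned x-coordinates of 12 objects, crowded at the low end.
bm = build_bitmaps([0, 0, 1, 1, 2, 2, 3, 3, 8, 9, 9, 9])
splits = choose_splits(bm, 3)
print(splits, slab_sizes(bm, splits))  # [1, 3] [4, 4, 4]
```

For comparison, three equal-width slabs [0, 10/3), [10/3, 20/3), and [20/3, 10) over the same data would hold 8, 0, and 4 objects.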

Example
[Figure: bitmap indexes on attributes minx and maxx of ds1, used to process a query over the spatial dataset in the x-dimension]
ANDing the bitmap vector for values < 4 with the vector for values >= 2 selects the qualifying objects; in the example, the result contains 7 objects.
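A sketch of how such a range count can be computed from bitmaps, assuming an object's x-extent [minx, maxx] overlaps a slab [lo, hi) exactly when minx < hi AND maxx >= lo. The eight objects are hypothetical (the slide's own data could not be recovered), and `range_vector` simply ORs the equality vectors of all qualifying values.

```python
from typing import Callable, Dict, List


def build_bitmaps(values: List[int]) -> Dict[int, int]:
    """Equality-encoded bitmaps: bit i of bitmaps[v] is set iff object i has value v."""
    bitmaps: Dict[int, int] = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps


def range_vector(bitmaps: Dict[int, int], pred: Callable[[int], bool]) -> int:
    """OR together the vectors of every indexed value that satisfies pred."""
    vec = 0
    for value, bits in bitmaps.items():
        if pred(value):
            vec |= bits
    return vec


def overlapping(minx_bm: Dict[int, int], maxx_bm: Dict[int, int], lo: int, hi: int) -> int:
    """Vector of objects whose x-extent [minx, maxx] overlaps the slab [lo, hi):
    (minx < hi) AND (maxx >= lo)."""
    return range_vector(minx_bm, lambda v: v < hi) & range_vector(maxx_bm, lambda v: v >= lo)


# Hypothetical x-extents of 8 objects (not the slide's data).
minx = [0, 1, 2, 3, 5, 6, 1, 7]
maxx = [1, 3, 4, 5, 6, 8, 2, 9]
vec = overlapping(build_bitmaps(minx), build_bitmaps(maxx), 2, 4)
print(format(vec, "08b")[::-1], bin(vec).count("1"))  # 01110010 4  (bit i = object i)
```

The popcount of the ANDed vector is the partition's object count, obtained without reading any geometry.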

In-memory Bitmap-based Spatial Join (IBSJ)
Like most existing in-memory spatial join methods, IBSJ uses filter and refine steps. Using the objects' MBRs, a set of bitmap index vectors is generated for the min and max values in each dimension. Depending on the spatial relation, the join operation can then be performed using only the index vectors.
[Figure: example of an intersection join using the bitmap vectors]
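A minimal sketch of an MBR intersection filter computed purely with bit vectors, assuming a range-encoded layout (cumulative "value <= v" and "value >= v" vectors over the sorted distinct values of each attribute of dataset S, located with `bisect`). The class and function names are hypothetical, the encoding is an assumption rather than IBSJ's exact design, and only the filter step is shown; a refine step on full geometries would follow.

```python
import bisect
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


class RangeIndex:
    """Cumulative bit vectors over one attribute of S: le(v) gives the objects
    with value <= v and ge(v) the objects with value >= v."""

    def __init__(self, values: List[float]):
        self.keys = sorted(set(values))
        eq: Dict[float, int] = {}
        for i, v in enumerate(values):
            eq[v] = eq.get(v, 0) | (1 << i)
        self.prefix: List[int] = []  # prefix[k]: values <= keys[k]
        acc = 0
        for k in self.keys:
            acc |= eq[k]
            self.prefix.append(acc)
        self.suffix = [0] * len(self.keys)  # suffix[k]: values >= keys[k]
        acc = 0
        for idx in range(len(self.keys) - 1, -1, -1):
            acc |= eq[self.keys[idx]]
            self.suffix[idx] = acc

    def le(self, v: float) -> int:
        pos = bisect.bisect_right(self.keys, v) - 1
        return self.prefix[pos] if pos >= 0 else 0

    def ge(self, v: float) -> int:
        pos = bisect.bisect_left(self.keys, v)
        return self.suffix[pos] if pos < len(self.keys) else 0


def bitmap_intersection_join(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    minx, miny, maxx, maxy = (RangeIndex([o[d] for o in s]) for d in range(4))
    out = []
    for i, a in enumerate(r):
        # MBRs intersect iff they overlap on both axes; each test is one bitwise AND.
        vec = minx.le(a[2]) & maxx.ge(a[0]) & miny.le(a[3]) & maxy.ge(a[1])
        j = 0
        while vec:  # decode the set bits into S indices
            if vec & 1:
                out.append((i, j))
            vec >>= 1
            j += 1
    return out


R = [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 6.0, 6.0)]
S = [(1.0, 1.0, 3.0, 3.0), (2.5, 2.5, 4.0, 4.0), (5.5, 0.0, 7.0, 5.0)]
print(bitmap_intersection_join(R, S))  # [(0, 0), (1, 2)]
```

Because the vectors for S are built once, each object of R costs four vector lookups and three ANDs instead of |S| rectangle comparisons.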

Fixed-precision Approximation Technique

Experiments
All experiments were performed on the RI cluster of the CSE department at The Ohio State University.
Datasets:
Synthetic datasets: we use the Spatial Index Library (SaIL) to generate datasets with uniform and Gaussian distributions.
Real datasets: TIGER/Line geographic data (US Census Bureau, 2014):

Dataset   Geometry   Number of Objects
ROAD      Polyline   19,637,275
ZIPCODE   Polygon    33,144
RAIL      -          182,965
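For readers who want similar synthetic inputs without SaIL, here is a hypothetical stand-in built on Python's `random` module; it does not reproduce SaIL's generator. The unit-square data space, the maximum side length, the Gaussian mean of 0.5 and standard deviation `sigma` are all assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def make_rects(n: int, dist: str, max_side: float = 0.01,
               sigma: float = 0.1, seed: int = 0) -> List[Rect]:
    """Generate n MBRs inside the unit square whose centres follow a uniform
    or a Gaussian (mean 0.5, std `sigma`) distribution."""
    rng = random.Random(seed)  # fixed seed -> reproducible datasets
    rects: List[Rect] = []
    for _ in range(n):
        if dist == "uniform":
            cx, cy = rng.random(), rng.random()
        elif dist == "gaussian":
            cx, cy = rng.gauss(0.5, sigma), rng.gauss(0.5, sigma)
        else:
            raise ValueError(f"unknown distribution: {dist}")
        # keep the centre inside the unit square, then clip the box to it
        cx, cy = min(max(cx, 0.0), 1.0), min(max(cy, 0.0), 1.0)
        w, h = rng.uniform(0.0, max_side), rng.uniform(0.0, max_side)
        rects.append((max(0.0, cx - w / 2), max(0.0, cy - h / 2),
                      min(1.0, cx + w / 2), min(1.0, cy + h / 2)))
    return rects


uniform_ds = make_rects(1000, "uniform")
gaussian_ds = make_rects(1000, "gaussian")
```

The Gaussian variant concentrates objects around the centre of the space, which is the kind of skew that stresses grid-based partitioning.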

Performance of Parallel BPSJ Algorithm

Results (cont'd)

In-memory IBSJ Algorithm Evaluation (Uniform Distribution)

In-memory IBSJ Algorithm Evaluation (Gaussian Distribution)

In-memory IBSJ Algorithm Evaluation (Real Datasets)

Index Size Comparison

Dataset   Dataset Size   R-Tree Index   Bitmap Index
ROAD      1010.89 MB     989.869 MB     106.93 MB
ZIPCODE   1.61 MB        1.89 MB        1.55 MB
RAIL      9.03 MB        10.2 MB        2.21 MB
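The relative sizes in the index size comparison can be checked with a few lines of arithmetic; the numbers below are copied from that table, and the helper name is hypothetical.

```python
# Sizes in MB, copied from the index size comparison table:
# (dataset, R-tree index, bitmap index)
SIZES_MB = {
    "ROAD": (1010.89, 989.869, 106.93),
    "ZIPCODE": (1.61, 1.89, 1.55),
    "RAIL": (9.03, 10.2, 2.21),
}


def bitmap_ratios(sizes):
    """Bitmap index size relative to the R-tree index and to the raw dataset."""
    return {name: (bm / rtree, bm / data) for name, (data, rtree, bm) in sizes.items()}


for name, (vs_rtree, vs_data) in bitmap_ratios(SIZES_MB).items():
    print(f"{name}: {vs_rtree:.2f} of R-tree size, {vs_data:.2f} of dataset size")
```

The bitmap index comes out at roughly 0.11, 0.82, and 0.22 of the R-tree size for ROAD, ZIPCODE, and RAIL, so the relative saving is largest on the biggest dataset.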

Conclusion
This work improves the spatial join operation using bitmap indexing. We developed two algorithms: BPSJ, for large datasets that do not fit in memory, and the in-memory algorithm IBSJ. IBSJ achieves average speedups of 4.2x over plane sweep and 3.06x over the R-tree method, and BPSJ achieves an average speedup of 6.1x over the PBSM partitioning algorithm.