Sameh Shohdy, Yu Su, and Gagan Agrawal


1 Load Balancing and Accelerating Parallel Spatial Join Operations using Bitmap Indexing
Sameh Shohdy, Yu Su, and Gagan Agrawal. Computer Science and Engineering, The Ohio State University, Columbus, OH. HiPC15 (Bengaluru, India).

2 Introduction Spatial data is proliferating across a variety of spatial applications. Common spatial queries: Range Queries, Nearest Neighbor (NN) Queries, and Spatial Join (SJ) Queries. This work focuses on Spatial Join queries. It aims to improve existing sequential and parallel algorithms using bitmap indexing.

3 Background Spatial Join: combines two spatial datasets to find pairs of spatial objects that satisfy a certain spatial relation. Examples: Find all the cities in the state of Ohio: Ohio (State) contains Columbus, Cleveland, Dayton, etc. (Cities). Find all the Interstate Highways that intersect I-70 in Ohio: I-70 intersects I-71, I-75, and I-77 in Ohio.

4 Background (2) Existing SJ algorithms are divided into:
In-memory SJ algorithms. Examples: nested-loop join, indexed nested-loop join, R-Tree, and Plane Sweep. They perform two passes through both datasets (Filter and Refine) to find related objects. Filter: represents the objects by their Minimum Bounding Rectangles (MBRs) and selects a set of candidates to reduce the complexity of the join operation. Refine: uses the full representation to evaluate the candidates.
Disk-based SJ algorithms. Examples: Partition Based Spatial-Merge join (PBSM) and Size Separation Spatial Join (S3).
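The Filter and Refine passes can be illustrated with a minimal nested-loop sketch. It is not taken from the slides: objects are assumed to be lists of (x, y) vertices, and the exact relation used in the refine step is a hypothetical caller-supplied predicate (`exact_test`).

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)

def mbr(points: List[Point]) -> MBR:
    """Minimum Bounding Rectangle of an object given as a list of vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_intersect(a: MBR, b: MBR) -> bool:
    """Two rectangles intersect iff they overlap in both dimensions."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def filter_step(ds1, ds2) -> List[Tuple[int, int]]:
    """Filter: compare MBRs only and keep candidate index pairs."""
    boxes1 = [mbr(o) for o in ds1]
    boxes2 = [mbr(o) for o in ds2]
    return [(i, j)
            for i, a in enumerate(boxes1)
            for j, b in enumerate(boxes2)
            if mbrs_intersect(a, b)]

def refine_step(candidates, ds1, ds2,
                exact_test: Callable[[list, list], bool]):
    """Refine: evaluate each candidate on the full object representation."""
    return [(i, j) for i, j in candidates if exact_test(ds1[i], ds2[j])]

# Toy exact relation (an assumption for this sketch): objects share a vertex.
shares_vertex = lambda a, b: bool(set(a) & set(b))

ds1 = [[(0, 0), (2, 2)]]
ds2 = [[(2, 2), (3, 3)], [(1, 0), (2, 1)], [(5, 5), (6, 6)]]
candidates = filter_step(ds1, ds2)
matches = refine_step(candidates, ds1, ds2, shares_vertex)
```

The filter step discards the third object of `ds2` cheaply, while the second survives the filter and is only rejected by the more expensive exact test.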

5 Partition based Spatial-Merge join (PBSM)
Divides the space into a set of partitions. Each partition contains a small set of spatial objects from both datasets. Objects in each partition can be joined using any existing in-memory join algorithm.
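A rough sketch of the partition-then-join idea follows. It assumes a uniform n-by-n grid over a known extent and replicates an MBR into every cell it overlaps; these choices, and all function names, are illustrative assumptions rather than details given in the slides.

```python
from collections import defaultdict

def mbrs_intersect(a, b):
    # Boxes are (minx, miny, maxx, maxy).
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cells_for(box, extent, n):
    """Grid cells (row, col) overlapped by a box on an n x n uniform grid."""
    ex0, ey0, ex1, ey1 = extent
    cw, ch = (ex1 - ex0) / n, (ey1 - ey0) / n
    clamp = lambda i: max(0, min(n - 1, i))
    c0, c1 = clamp(int((box[0] - ex0) // cw)), clamp(int((box[2] - ex0) // cw))
    r0, r1 = clamp(int((box[1] - ey0) // ch)), clamp(int((box[3] - ey0) // ch))
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def partition(boxes, extent, n):
    """Map each cell to the ids of the boxes that overlap it."""
    cells = defaultdict(list)
    for i, box in enumerate(boxes):
        for cell in cells_for(box, extent, n):
            cells[cell].append(i)
    return cells

def pbsm_join(boxes1, boxes2, extent, n):
    """Join each partition independently; a set removes duplicate pairs."""
    p1, p2 = partition(boxes1, extent, n), partition(boxes2, extent, n)
    result = set()
    for cell, ids1 in p1.items():
        for i in ids1:
            for j in p2.get(cell, []):
                if mbrs_intersect(boxes1[i], boxes2[j]):
                    result.add((i, j))
    return sorted(result)

extent = (0.0, 0.0, 4.0, 4.0)
boxes1 = [(0.5, 0.5, 1.0, 1.0), (1.5, 1.5, 2.5, 2.5)]
boxes2 = [(0.8, 0.8, 1.2, 1.2), (3.0, 3.0, 3.5, 3.5), (2.2, 2.2, 2.4, 2.4)]
pairs = pbsm_join(boxes1, boxes2, extent, 2)
```

Because the grid is fixed in advance, the number of objects per cell depends entirely on how the data is distributed, which is why skewed inputs lead to uneven load.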

6 Challenge Data skew negatively impacts the performance of such partitioning algorithms. [Figure: load variation of the parallel PBSM join algorithm with 64 cores.]

7 Contributions We describe an effective approach for load balancing that uses bitmap indexing as a summary of the data: Bitmap Partitioning-based Spatial Join (BPSJ). We also use bitmaps to speed up the in-memory spatial join operation: In-memory Bitmap-based Spatial Join (IBSJ). Our BPSJ algorithm outperforms the well-known parallel PBSM method by an average of 6.1x, while IBSJ has an average speedup of 4.2x over plane sweep and 3.06x over the R-tree method.

8 Bitmap Indexing A set of bitmap vectors is built without reorganizing the data. Depends on lightweight logical operations (AND, OR, and NOT). [Figure: a table of records R0 to R4 with attributes att0 and att1, shown next to the bitmap vectors V0 to V3 built for each attribute.]
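As a minimal illustration of the idea, the sketch below builds one bit vector per distinct attribute value, stored as a Python integer whose bit i is set when record i holds that value. The attribute values are made up for the example and are not the ones from the slide's figure; helper names are likewise hypothetical.

```python
def build_index(values):
    """One bit vector per distinct value; bit i is set if record i has it."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def select(index, predicate):
    """OR together the vectors of every value that satisfies the predicate."""
    bits = 0
    for value, vector in index.items():
        if predicate(value):
            bits |= vector
    return bits

def popcount(bits):
    """Number of records selected by a bit vector."""
    return bin(bits).count("1")

def record_ids(bits):
    """Positions of the set bits, i.e. the matching record numbers."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical attribute columns for records R0..R4.
att0 = [1, 3, 4, 6, 3]
att1 = [1, 2, 5, 2, 1]
idx0, idx1 = build_index(att0), build_index(att1)

# Query: att0 >= 3 AND att1 <= 2, answered with OR and AND on the vectors only.
hits = select(idx0, lambda v: v >= 3) & select(idx1, lambda v: v <= 2)
```

The original data is never reordered or rescanned during the query; only the index vectors are combined.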

9 Bitmap Partitioning-based Spatial Join (BPSJ)
Depends on bitmaps to efficiently divide the space into a set of partitions. Using pre-generated bitmap vectors, the actual number of objects per partition can be calculated and the split line can be determined. A set of splits is generated for each dimension.
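One way to picture this step is sketched below. The slides do not spell out how split lines are chosen, so the rule used here (walk the value bins in order and place a split whenever the running object count reaches the next equal share) is an assumption, as are all function names.

```python
from bisect import bisect_left

def build_index(values):
    """One bit vector (Python int) per distinct value; bit i = object i."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def popcount(bits):
    return bin(bits).count("1")

def choose_splits(index, parts):
    """Pick split values so partitions hold roughly equal object counts.

    A split value s means: objects with value <= s fall to the left of s.
    Counts come from popcounts of the bitmap vectors, not from the raw data.
    """
    keys = sorted(index)
    total = sum(popcount(index[k]) for k in keys)
    target = total / parts
    splits, running = [], 0
    for k in keys[:-1]:
        running += popcount(index[k])
        if len(splits) < parts - 1 and running >= target * (len(splits) + 1):
            splits.append(k)
    return splits

def partition_sizes(index, splits):
    """Objects per partition for a given list of split values."""
    sizes = [0] * (len(splits) + 1)
    for k, vector in index.items():
        sizes[bisect_left(splits, k)] += popcount(vector)
    return sizes

# Skewed hypothetical minx values (already rounded to integer bins).
minx = [0, 0, 0, 0, 1, 2, 3, 3]
index = build_index(minx)
splits = choose_splits(index, 2)
sizes = partition_sizes(index, splits)
```

On this skewed input the split lands right after the dense value 0, giving two partitions of four objects each, whereas a split at the midpoint of the value range would give five and three.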

10 Example [Figure: bitmap indexing of attributes minx and maxx in ds1. The bitmap vectors selected by the conditions <4 and >=2 are combined with AND, yielding 7 objects.] An example of spatial dataset query processing in the X-dimension.

11 In-memory Bitmap-based Spatial Join (IBSJ)
We use Filter and Refine steps, as do most existing in-memory spatial join methods. Using the objects' MBRs, a set of bitmap index vectors is generated for the Max and Min values in each dimension. Based on the spatial relation, the join operation can be performed using only the index vectors. Example (Intersection): [figure]
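The intersection case can be sketched as follows. Two MBRs intersect when each one's minimum is no greater than the other's maximum in every dimension, so the candidates for a query rectangle are the AND of four range selections over the min/max index vectors. This is a simplified illustration with hypothetical names; the authors' actual procedure, including any value binning, is not reproduced here.

```python
def build_index(values):
    """One bit vector (Python int) per distinct value; bit i = object i."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def select(index, predicate):
    """OR together the vectors of all values satisfying the predicate."""
    bits = 0
    for value, vector in index.items():
        if predicate(value):
            bits |= vector
    return bits

def ibsj_filter(boxes1, boxes2):
    """Filter step: candidate pairs found using only the index vectors.

    Boxes are (minx, miny, maxx, maxy). Four indexes are built over boxes2.
    """
    minx = build_index([b[0] for b in boxes2])
    miny = build_index([b[1] for b in boxes2])
    maxx = build_index([b[2] for b in boxes2])
    maxy = build_index([b[3] for b in boxes2])
    pairs = []
    for i, a in enumerate(boxes1):
        hits = (select(minx, lambda v: v <= a[2])    # b.minx <= a.maxx
                & select(maxx, lambda v: v >= a[0])  # b.maxx >= a.minx
                & select(miny, lambda v: v <= a[3])  # b.miny <= a.maxy
                & select(maxy, lambda v: v >= a[1])) # b.maxy >= a.miny
        pairs.extend((i, j) for j in range(hits.bit_length())
                     if (hits >> j) & 1)
    return pairs

boxes1 = [(0, 0, 2, 2)]
boxes2 = [(1, 1, 3, 3), (3, 3, 4, 4), (2, 0, 5, 1)]
candidates = ibsj_filter(boxes1, boxes2)
```

The resulting candidate pairs would still go through the Refine step on the full geometries.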

12 Fixed-precision Approximation Technique

13 Experiments All experiments were performed on the RI Cluster of the CSE department at The Ohio State University. Datasets: Synthetic datasets: we use the Spatial Index Library (SaIL) to generate datasets with uniform and Gaussian distributions. Real datasets: TIGER/LINE geographical data (US Census Bureau, 2014).
Dataset: Geometry, Number of Objects
ROAD: Polyline, [count missing]
ZIPCODE: Polygon, 33144
RAIL: [geometry missing], 182965

14 Performance of Parallel BPSJ Algorithm

15 Results (Contd)

16 In-memory IBSJ Algorithm Evaluation (Uniform Distribution)

17 In-memory IBSJ Algorithm Evaluation (Gaussian Distribution)

18 In-memory IBSJ Algorithm Evaluation (Real Datasets)

19 Index Size Comparison
Dataset: DS Size, R-Tree Index, Bitmap Index
ROAD: [missing] MB, [missing] MB, [missing] MB
ZIPCODE: 1.61 MB, 1.89 MB, 1.55 MB
RAIL: 9.03 MB, 10.2 MB, 2.21 MB

20 Conclusion This work focuses on improving the spatial join operation using bitmap indexes. We have developed two algorithms: BPSJ, for large datasets that cannot fit in memory, and the in-memory algorithm IBSJ. IBSJ has an average speedup of 4.2x over Plane-Sweep and 3.06x over the R-tree method. BPSJ has a speedup of 6.1x over the PBSM partitioning algorithm.


