
1 An Incremental Refining Spatial Join Algorithm for Estimating Query Results in GIS. Wan D. Bae, Shayma Alkobaisi, Scott T. Leutenegger. Department of Computer Science, University of Denver. {wbae, salkobai, leut}@cs.du.edu

2 Outline: Introduction; Motivation; Spatial Join Estimation; IRSJ Algorithm (Sampling, Joining, Statistics); Experiments; Conclusion

3 Introduction GIS data is used to describe the geometry and location of geographic phenomena. GIS data is represented in two ways: –Raster: divides the world into cells. –Vector: defines features using coordinate-based structures (e.g. point, line, polygon). This paper targets vector data.

4 Introduction Cont. Geographic or spatial queries are applied to spatially indexed databases. –e.g. containment and intersection. We focus on finding the number of intersections of two spatial datasets (spatial joins). –e.g. number of roads that intersect rivers in the US.

5 Spatial Joins Spatial joins relate two data sets that share locations in space. The processing of spatial queries can be accelerated when a spatial index such as an R-tree exists. A spatial join of two R-trees can be computed by traversing both trees in a synchronized fashion to find intersecting items.
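The synchronized traversal can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Node`, `intersects`, and `rtree_join` are hypothetical names, and the tree is a toy in-memory structure rather than a disk-based R-tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)  # empty for data entries
    label: str = ""


def rtree_join(r: Node, s: Node) -> List[Tuple[str, str]]:
    """Descend both trees together, pruning pairs whose MBRs do not intersect."""
    if not intersects(r.mbr, s.mbr):
        return []
    if not r.children and not s.children:
        return [(r.label, s.label)]
    # Expand every side that still has children; a childless side stays fixed.
    r_side = r.children or [r]
    s_side = s.children or [s]
    pairs: List[Tuple[str, str]] = []
    for rc in r_side:
        for sc in s_side:
            pairs.extend(rtree_join(rc, sc))
    return pairs
```

The pruning test at the top is what saves work: whole subtrees are skipped as soon as their bounding rectangles are disjoint.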

6 Spatial Joins Cont. [Figure: R-trees over R: Rivers (nodes R1, R2; entries r1-r5) and S: Cities (nodes S1, S2; entries s1-s4).]

7 Motivation GIS supports very large data sets, so finding exact answers to spatial queries can be very time consuming. In GIS data analysis, a fast estimate of the final result with an error bound of 2%-10% is often sufficient. We therefore provide an approximate answer through an incremental refining process, which allows for more interactive data exploration.

8 Examples 1. “What are the intersections of mineral plants and radiometric ages areas in the US?” 2. “Where do mineral resources intersect geochemical sediments in the US?”

9 Examples Cont. [Figure panels: Locations of geochemical sediments in CO; Locations of mineral resources in CO; Intersections.]

10 Spatial Join Estimation Parametric: uses some properties of data distribution to present a formula for the estimation. (e.g. power law, fractal dimension). Histograms: keep certain information for different regions of the data to be used when a query is given. (e.g. Geometric Histogram, Euler Histogram).

11 Spatial Join Estimation Cont. Sampling: uses smaller data sets (samples) to calculate an estimate of the final result by applying the join on the sample.
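As a rough illustration of sampling-based estimation, the sketch below joins a random sample of one data set against the other and scales the count up by the sampling fraction. The function names and the brute-force inner loop are assumptions made for illustration; the slides do not specify this exact estimator.

```python
import random


def intersects(a, b):
    """Axis-aligned rectangle overlap test on (xmin, ymin, xmax, ymax) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def estimate_join_size(r_items, s_items, sample_size, seed=0):
    """Estimate |R join S| from a simple random sample of R (hypothetical helper)."""
    rng = random.Random(seed)
    sample = rng.sample(r_items, sample_size)
    # Join only the sample against S, here by brute force.
    hits = sum(1 for r in sample for s in s_items if intersects(r, s))
    # Scale the sample count up to the full size of R.
    return hits * len(r_items) / sample_size
```

When the sample covers all of R, the estimate equals the exact join size, which is a convenient sanity check.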

12 Incremental Refining Join Process [Flow diagram: Dataset 1 (R) → Sampling → Samples → WQ against Dataset 2 (S) → # intersections (intermediate result) → Statistics → Final Estimation w/ CI → Report → User. The loop is labeled "Incremental Process".]

13 Random Sampling We assume that both data sets are indexed using R-trees. Samples are chosen from one R-tree called the outer relation R. Samples are used as window queries to query the inner relation S. Randomness: –Acceptance/Rejection method: inclusion probability is proportional to some parameter of the item sampled.
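One common way to apply acceptance/rejection to an R-tree is sketched below. It assumes the "parameter" is the number of entries in a leaf, so that fuller leaves are accepted more often and every tuple ends up equally likely; the slides do not say which parameter the authors use, and `sample_tuple` is a hypothetical name.

```python
import random


def sample_tuple(leaves, max_entries, rng):
    """Draw one tuple uniformly at random across all leaves.

    leaves: list of leaf pages, each a list of tuples (at least one non-empty).
    max_entries: maximum number of entries a leaf can hold.
    """
    while True:
        leaf = rng.choice(leaves)  # pick a leaf uniformly
        # Accept the leaf with probability proportional to its fill level.
        if rng.random() < len(leaf) / max_entries:
            return rng.choice(leaf)  # then pick one entry uniformly
```

Without the acceptance step, tuples in sparsely filled leaves would be over-represented in the sample.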

14 Tuple and Page Level Sampling Tuple-level: –A page is selected at random from R and one tuple (MBR) of that page is chosen at random. Page-level: –A page is selected at random from R and all tuples (MBRs) of that page are used as a sample.
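The difference between the two sampling levels can be shown in a few lines. This is a sketch that assumes leaf pages are plain Python lists of MBR tuples; both function names are hypothetical.

```python
import random


def tuple_level_sample(pages, rng):
    """Pick a random page, then one random tuple (MBR) from that page."""
    page = rng.choice(pages)
    return [rng.choice(page)]


def page_level_sample(pages, rng):
    """Pick a random page and use all of its tuples (MBRs) as the sample."""
    return list(rng.choice(pages))
```

Page-level sampling yields more window queries per page read, while tuple-level sampling yields one.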

15 Window Query The chosen MBR from one data set (R-tree) serves as a window query to the other data set to find the intersections. The query returns all the objects from the second data set that overlap the query window. The number of intersections found is used in the process of finding an approximate answer to the query.
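A window query over a tree-structured index can be sketched as a recursive descent that skips subtrees whose bounding rectangle misses the window. The dictionary-based node layout below (`mbr`, `children`, `entries`) is an assumption for illustration, not the authors' data structure.

```python
def intersects(a, b):
    """Axis-aligned rectangle overlap test on (xmin, ymin, xmax, ymax) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(node, window):
    """Return all entries under `node` whose MBR overlaps `window`.

    Internal nodes look like {"mbr": rect, "children": [...]};
    leaves look like {"mbr": rect, "entries": [rect, ...]}.
    """
    if not intersects(node["mbr"], window):
        return []  # prune the whole subtree
    if "entries" in node:
        return [e for e in node["entries"] if intersects(e, window)]
    hits = []
    for child in node["children"]:
        hits.extend(window_query(child, window))
    return hits
```

The length of the returned list is the intersection count that feeds the estimate.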

16 Window Query Example [Figure: R-trees over R: Rivers (nodes R1, R2; entries r1-r5) and S: Cities (nodes S1, S2; entries s1-s4).]

17 Estimated Value and Confidence Interval Estimated value: the statistic computed from sample information. Population proportion: the fraction of the population that has a particular characteristic of interest. Confidence interval: a range of values that contains the population parameter with a specified probability. –The specified probability is called the level of confidence.
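To make these terms concrete, the sketch below computes an estimated join size and a confidence interval half-width from per-sample intersection counts. It uses the standard normal approximation (z = 1.96 for a 95% level); this is an assumption, since the slides do not give the exact formula, and `estimate_with_ci` is a hypothetical name.

```python
import math
import statistics


def estimate_with_ci(counts, population_size, z=1.96):
    """Return (estimated total, CI half-width) from per-sample counts.

    counts: number of intersections found for each sampled window query.
    population_size: number of tuples in the sampled relation.
    """
    n = len(counts)
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts) if n > 1 else 0.0
    half_width = z * sd / math.sqrt(n)  # normal-approximation interval
    # Scale the per-tuple mean and its interval up to the whole relation.
    return population_size * mean, population_size * half_width
```

The interval narrows roughly with the square root of the number of samples, which is why an incremental process can report progressively tighter estimates.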

18 IRSJ_t Algorithm
C ← 0; C_I ← 0 {count, confidence interval}
repeat
  for i = 0 to k do
    L ← choose a leaf from R at random
    M ← MBR of a randomly chosen tuple within L
    I ← number of intersections of a Window Query (M, S)
    C ← C + I
  end for
  C_I ← compute confidence interval using C
  EV ← compute estimated value using C
until the desired confidence interval C_f is attained
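The loop above can be sketched in runnable Python under several assumptions: S is scanned by brute force instead of through an R-tree, the confidence interval is a normal-approximation half-width expressed relative to the mean, and leaves are chosen uniformly (which is only unbiased when leaves hold equal numbers of tuples). All names and parameters are hypothetical.

```python
import math
import random
import statistics


def intersects(a, b):
    """Axis-aligned rectangle overlap test on (xmin, ymin, xmax, ymax) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def irsj_t(r_leaves, s_items, n_r, target_rel_ci,
           k=30, z=1.96, seed=0, max_rounds=1000):
    """Tuple-level incremental estimate of the join size.

    Returns (estimated value, relative CI half-width, samples used).
    """
    rng = random.Random(seed)
    counts = []
    estimate, rel_ci = 0.0, float("inf")
    for _ in range(max_rounds):
        for _ in range(k):
            leaf = rng.choice(r_leaves)   # L: random leaf of R
            m = rng.choice(leaf)          # M: random tuple MBR within L
            # I: window query of M against S (brute force here)
            counts.append(sum(1 for s in s_items if intersects(m, s)))
        mean = statistics.mean(counts)
        half = z * statistics.stdev(counts) / math.sqrt(len(counts))
        estimate = n_r * mean             # EV: scale up to all of R
        rel_ci = half / mean if mean > 0 else float("inf")
        if rel_ci <= target_rel_ci:       # desired interval attained
            break
    return estimate, rel_ci, len(counts)
```

For example, `irsj_t(leaves, s_items, n_r=len_of_R, target_rel_ci=0.05)` keeps sampling in batches of `k` until the half-width falls within 5% of the running mean or `max_rounds` is reached.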

19 Experiments (settings and data sets) IRSJ compared to full R-tree join. Confidence level set to 95%. Varied buffer size and data size. Data sets: –Synthetic: U x S, S x U, U x U (number of tuples in each relation varied from 100,000 to 600,000). –Real: from the U.S. Geological Survey: 1. Mineral Resources in the US 2005 (300,432 tuples). 2. Geochemistry of unconsolidated sediments in the US 2001 (199,850 tuples).

20 Experiments Cont. (Synthetic data results) Estimated Value U-600K x S-400K

21 Synthetic Cont. Confidence Interval U-600K x S-400K

22 Synthetic Cont. I/Os with 10% buffer

23 Synthetic Cont. R-tree join ratio to IRSJ_t

24 Experiments Cont. (Real data results) Estimated Value

25 Real Cont. Confidence Interval

26 Real Cont. I/O and Node Accesses of IRSJ_t and a full R-tree join

Metric         Buffer size  CI = 10  CI = 5  CI = 3  CI = 2  CI = 1  R-join
I/Os           5%           233      403     770     1896    8159    27756
I/Os           10%          182      333     752     1164    3795    24756
Node accesses  5%           780      2591    6933    18299   87233   262200
Node accesses  10%          668      2136    9028    19671   85822   262200

The source gives a single R-join node-access value (262200) for both buffer sizes.

27 Conclusion Proposed Incremental Refining Spatial Join: –Page-level (IRSJ_p) –Tuple-level (IRSJ_t) Experimental results showed: –IRSJ provides a reasonably accurate estimate much earlier than a full R-tree join produces the exact answer. –IRSJ_t performs better than IRSJ_p. –As the data size increased, the improvement of IRSJ_t over full R-tree join increased.

28 [Flow diagram: Incremental Joining Process. Dataset 1 (R) → Sampling → Samples → WQ against Dataset 2 (S) → # intersections (intermediate result) → Statistics → Final Estimation w/ CI → Report → User. Legend: Dataset; Output / Input; Process.]

29 [Figure: R-trees over R: Rivers (leaves L1, L2; entries r1-r5) and S: Cities (leaves S1, S2; entries s1-s4), with entry r3 labeled separately; the figure appears twice.]

30 [Figure: R-tree over R: Rivers (leaves L1, L2; entries r1-r5).]

31 [Figure: R-tree with leaves L1-L6, entries r1-r12, and subtrees ST1, ST2, ST3.]

32 [Figure: R-tree over R: Rivers with leaves L1-L6 and entries r1-r12.]

33 [Flow diagram: Dataset 1 (R) → Sampling → Samples → WQ against Dataset 2 (S) → # intersections (intermediate result) → Statistics → Final Estimation w/ CI → Report → User → Stop? (Yes: End; No: continue sampling). Legend: Dataset; Output / Input; Process; Transition step.]

34 [Figure: R-tree over R: Rivers with nodes R1-R4, leaves L1-L8, entries r1-r12, and subtrees ST1, ST2.]

35 [Figure: R-tree over R: Rivers with nodes R1-R4, leaves L1-L8, entries r1-r18, and subtrees ST1, ST2.]

