# Spatial (or N-Dimensional) Search in a Relational World Jim Gray, Microsoft Alex Szalay, Johns Hopkins U.


1 Spatial (or N-Dimensional) Search in a Relational World Jim Gray, Microsoft Alex Szalay, Johns Hopkins U.

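2 Equations Define Subspaces

A point (x, y) above the line satisfies $ax + by > c$. Reverse the space by negating the constraint: $-ax - by > -c$.

Intersect 3 half-spaces:

$a_1 x + b_1 y > c_1$

$a_2 x + b_2 y > c_2$

$a_3 x + b_3 y > c_3$

[Figure: the line $ax + by = c$ in the (x, y) plane, with intercepts $x = c/a$ and $y = c/b$.]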
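The half-space test on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than code from the talk: the names `in_half_space`, `reverse`, and `in_convex`, and the sample triangle, are hypothetical.

```python
from typing import Iterable, Tuple

# A half-space (a, b, c) stands for the constraint a*x + b*y > c.
HalfSpace = Tuple[float, float, float]


def in_half_space(x: float, y: float, h: HalfSpace) -> bool:
    """True if the point (x, y) satisfies a*x + b*y > c."""
    a, b, c = h
    return a * x + b * y > c


def reverse(h: HalfSpace) -> HalfSpace:
    """Negate the constraint: -a*x - b*y > -c selects the other side."""
    a, b, c = h
    return (-a, -b, -c)


def in_convex(x: float, y: float, half_spaces: Iterable[HalfSpace]) -> bool:
    """A point is in the convex region if it is inside every half-space."""
    return all(in_half_space(x, y, h) for h in half_spaces)


# Hypothetical triangle: x > 0, y > 0, and x + y < 1 (written as -x - y > -1).
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, -1.0, -1.0)]

print(in_convex(0.25, 0.25, triangle))  # True
print(in_convex(1.0, 1.0, triangle))    # False
```

Storing each constraint as a plain tuple is what makes the later move to table rows natural: one tuple becomes one row.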

3 Domain is Union of Convex Hulls

Simple volumes are unions of convex hulls. Higher-order curves also work. Complex volumes have holes, and their holes have holes (that is harder).

[Figure: a shape labeled "Not a convex hull".]

4 Now in Relational Terms

    create table HalfSpace (
        domainID    int not null              -- domain name
            foreign key references Domain(domainID),
        convexID    int not null,             -- grouping a set of ½ spaces
        halfSpaceID int identity,             -- a particular ½ space
        x           float not null,           -- the (a,b,..) parameters
        y           float not null,           -- defining the ½ space
        z           float not null,
        c           float not null,           -- the constant (c above)
        primary key (domainID, convexID, halfSpaceID)
    )

(x,y,z) is inside a convex if it is inside all lines of the convex. Equivalently, (x,y,z) is inside a convex if it is NOT OUTSIDE ANY line of the convex. With (@x, @y, @z) as the point being tested:

    select convexID                           -- return the convex hulls
    from HalfSpace                            -- from the constraints
    where @x * x + @y * y + @z * z < c        -- point outside the line?
    group by all convexID                     -- consider all the lines of a convexID
    having count(*) = 0                       -- count outside == 0
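The "not outside any line" query can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under stated assumptions, not the authors' code: SQLite has no `group by all`, so the version below counts violated constraints with a conditional `sum` instead, and the sample rows, the parameter names `:px`, `:py`, `:pz`, and the helper `convexes_containing` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table HalfSpace (
        domainID    integer not null,
        convexID    integer not null,
        halfSpaceID integer not null,
        x real not null,
        y real not null,
        z real not null,
        c real not null,
        primary key (domainID, convexID, halfSpaceID)
    )
    """
)

# Hypothetical sample data: each row is a constraint x*px + y*py + z*pz > c.
half_spaces = [
    # convex 1: the tetrahedron px > 0, py > 0, pz > 0, px + py + pz < 1
    (1, 1, 1, 1.0, 0.0, 0.0, 0.0),
    (1, 1, 2, 0.0, 1.0, 0.0, 0.0),
    (1, 1, 3, 0.0, 0.0, 1.0, 0.0),
    (1, 1, 4, -1.0, -1.0, -1.0, -1.0),
    # convex 2: the single half-space px > 5
    (1, 2, 1, 1.0, 0.0, 0.0, 5.0),
]
conn.executemany("insert into HalfSpace values (?, ?, ?, ?, ?, ?, ?)", half_spaces)


def convexes_containing(conn, px, py, pz):
    """Return the convexIDs for which no half-space has the point outside."""
    rows = conn.execute(
        """
        select convexID
        from HalfSpace
        group by domainID, convexID
        -- the comparison yields 1 for each violated constraint, 0 otherwise
        having sum(:px * x + :py * y + :pz * z < c) = 0
        order by convexID
        """,
        {"px": px, "py": py, "pz": pz},
    )
    return [row[0] for row in rows]


print(convexes_containing(conn, 0.2, 0.2, 0.2))     # [1]
print(convexes_containing(conn, 6.0, 1.0, 1.0))     # [2]
print(convexes_containing(conn, -1.0, -1.0, -1.0))  # []
```

The conditional `sum` keeps every convex in the grouping even when none of its constraints is violated, which is the role `group by all` plays in the T-SQL form.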

5 The Algebra is Simple

    … = spDomainNew …
    … = spDomainNewConvex …
    … = spDomainNewConvexConstraint …
    select * from … float)

Once constructed they can be manipulated with the Boolean operations:

    … = spDomainOr …
    … = spDomainAnd …
    … = spDomainNot … varchar(8000))

6 What! No Bounding Box?

A bounding box limits the search to a subset of the convex hulls. If the query runs at 3M half-spaces/sec, there is no need for a bounding box unless you have more than 10,000 lines. But if you have a lot of half-spaces, a bounding box is good.

7 A Different Problem

Table-valued function to find points near a point:

– select * from fGetNearbyEq(ra,dec,r)

Use Hierarchical Triangular Mesh:

– Space-filling curve, bounding triangles…

– Standard approach: 13 ms/call… so about 70 objects/second.

Too slow, so precompute neighbors as a materialized view. At 70 objects/sec it takes 6 months to compute a billion objects.
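The throughput figures on this slide follow from simple arithmetic, reproduced here as a quick check. The 30-day month is an assumption made only for the conversion.

```python
MS_PER_CALL = 13            # from the slide: 13 ms per call
RATE_PER_SECOND = 70        # the slide's rounded rate
OBJECTS = 1_000_000_000     # a billion objects

calls_per_second = 1000 / MS_PER_CALL   # exact rate implied by 13 ms/call
seconds = OBJECTS / RATE_PER_SECOND     # time to process every object
days = seconds / 86_400
months = days / 30                      # assuming 30-day months

print(round(calls_per_second), round(days), round(months, 1))  # 77 165 5.5
```

So 13 ms/call gives roughly 77 calls per second, which the slide rounds down to 70, and a billion objects at 70 per second takes about 165 days, or close to 6 months.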

8 Zone Based Spatial Join

Divide space into zones. Key points by (zone, offset); on the sphere this needs a wrap-around margin. A point search looks in a few zones at a limited offset, ra ± r: a bounding box that has 1-π/4 false positives.

– All inside the relational engine

– Avoids impedance mismatch

– Can batch all-all comparisons

– 33x faster and parallel: 6 days, not 6 months!

[Figure: zone geometry around a search circle of radius r, with labels r, ra-zoneMax, zoneMax, x, ra ± x, and the terms (r^2 + (ra-zoneMax)^2) and cos(radians(zoneMax)).]
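The zone bookkeeping and the 1-π/4 figure can be illustrated with a short Python sketch. The zone formula used here, floor((dec + 90) / zoneHeight), is an assumption and does not appear on the slide; the names `zone_id` and `zones_overlapping` and the 0.5 degree zone height are hypothetical.

```python
import math


def zone_id(dec_deg: float, zone_height_deg: float) -> int:
    """Assumed zone numbering: horizontal stripes of equal height in dec."""
    return math.floor((dec_deg + 90.0) / zone_height_deg)


def zones_overlapping(dec_deg: float, r_deg: float, zone_height_deg: float):
    """Zone numbers that a circle of radius r around dec can touch."""
    lo = zone_id(max(dec_deg - r_deg, -90.0), zone_height_deg)
    hi = zone_id(min(dec_deg + r_deg, 90.0), zone_height_deg)
    return list(range(lo, hi + 1))


# A circle of radius r has area pi*r^2; its bounding square has area 4*r^2.
# The part of the square outside the circle is the false-positive fraction.
false_positive_fraction = 1 - math.pi / 4

print(zone_id(0.0, 0.5))                   # 180
print(zones_overlapping(0.0, 0.6, 0.5))    # [178, 179, 180, 181]
print(round(false_positive_fraction, 4))   # 0.2146
```

The 6-days figure is consistent with the speedup: 6 months at 33x is a little under 6 days.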

9 In SQL: points near point

    select o1.objID                   -- find objects
    from zone o1                      -- in the zoned table
    where o1.zoneID between …         -- where zone #
                        and …         -- overlaps the circle
      and o1.ra between … and …       -- quick filter on ra
      and o1.dec between … and …      -- quick filter on dec
      and ( (sqrt( …                  -- careful filter on distance

The careful filter eliminates the ~21% = 1-π/4 false positives of the bounding box.
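A runnable approximation of this query, using Python's built-in `sqlite3`, shows the three filters working together. It is a simplified sketch, not the slide's query: it treats (ra, dec) as flat coordinates, ignores the cos(dec) correction and the wrap-around margin, and assumes the zone formula floor((dec + 90) / zoneHeight). The sample objects, the 0.5 degree zone height, and the helper `nearby` are hypothetical.

```python
import math
import sqlite3

ZONE_HEIGHT = 0.5  # degrees; hypothetical value


def zone_id(dec):
    """Assumed zone numbering: stripes of height ZONE_HEIGHT in dec."""
    return math.floor((dec + 90.0) / ZONE_HEIGHT)


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table zone (zoneID integer, ra real, dec real, objID integer primary key)"
)
conn.execute("create index zone_idx on zone (zoneID, ra)")

# Hypothetical objects: (objID, ra, dec)
objects = [
    (1, 10.0, 0.0),
    (2, 10.3, 0.0),
    (3, 10.45, 0.45),  # inside the bounding box but outside the circle
    (4, 12.0, 0.0),
    (5, 10.0, -0.3),   # in the zone below
]
conn.executemany(
    "insert into zone (zoneID, ra, dec, objID) values (?, ?, ?, ?)",
    [(zone_id(dec), ra, dec, obj_id) for obj_id, ra, dec in objects],
)


def nearby(conn, ra, dec, r):
    """objIDs within distance r of (ra, dec), flat-plane approximation."""
    params = {
        "ra": ra, "dec": dec, "r": r,
        "zlo": zone_id(dec - r), "zhi": zone_id(dec + r),
    }
    rows = conn.execute(
        """
        select objID
        from zone
        where zoneID between :zlo and :zhi              -- zones overlapping the circle
          and ra  between :ra  - :r and :ra  + :r       -- quick filter on ra
          and dec between :dec - :r and :dec + :r       -- quick filter on dec
          and (ra - :ra) * (ra - :ra)
            + (dec - :dec) * (dec - :dec) <= :r * :r    -- careful filter on distance
        order by objID
        """,
        params,
    )
    return [row[0] for row in rows]


print(nearby(conn, 10.0, 0.0, 0.5))  # [1, 2, 5]
```

Object 3 passes both quick filters and is rejected only by the distance test, which is exactly the kind of false positive the bounding box lets through.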

10 Do the following for each zone offset in {-1, 0, 1}. The example ignores some spherical geometry details covered in the full paper.

    insert neighbors                            -- insert one zone's neighbors
    select o1.objID as objID,                   -- object pairs
           o2.objID as NeighborObjID,
           …                                    -- other fields elided
    from zone o1 join zone o2                   -- force a nested loop
      on … = o2.zoneID                          -- using zone number and ra
     and o2.ra between o1.ra … and o1.ra …      -- elided
    where …                                     -- elided margin logic, see paper.
      and o2.dec between … and …                -- quick filter on dec
      and sqrt(power(o1.x-o2.x,2)+power(o1.y-o2.y,2)+power(o1.z-o2.z,2)) …  -- careful filter on distance
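The batch neighbor computation can be sketched as a self-join in Python's built-in `sqlite3`, run once per zone offset. This is a simplified illustration, not the paper's query: it uses flat (ra, dec) distances instead of the (x, y, z) unit vectors, omits the margin logic, assumes the search radius is no larger than the zone height so that offsets -1, 0, 1 are enough, and assumes the zone formula floor((dec + 90) / zoneHeight). The table layout, sample objects, and `build_neighbors` helper are hypothetical.

```python
import math
import sqlite3

ZONE_HEIGHT = 0.5  # degrees; hypothetical, and must be >= the search radius


def zone_id(dec):
    """Assumed zone numbering: stripes of height ZONE_HEIGHT in dec."""
    return math.floor((dec + 90.0) / ZONE_HEIGHT)


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table zone (zoneID integer, ra real, dec real, objID integer primary key)"
)
conn.execute("create index zone_idx on zone (zoneID, ra)")
conn.execute("create table neighbors (objID integer, neighborObjID integer)")

# Hypothetical objects: (objID, ra, dec)
objects = [(1, 10.0, 0.0), (2, 10.3, 0.0), (3, 10.45, 0.45), (4, 12.0, 0.0), (5, 10.0, -0.3)]
conn.executemany(
    "insert into zone (zoneID, ra, dec, objID) values (?, ?, ?, ?)",
    [(zone_id(dec), ra, dec, obj_id) for obj_id, ra, dec in objects],
)


def build_neighbors(conn, r):
    """Insert all ordered pairs closer than r, one zone offset at a time."""
    for delta in (-1, 0, 1):
        conn.execute(
            """
            insert into neighbors (objID, neighborObjID)
            select o1.objID, o2.objID
            from zone o1 join zone o2
              on o2.zoneID = o1.zoneID + :delta                -- using zone number
            where o2.ra  between o1.ra  - :r and o1.ra  + :r   -- quick filter on ra
              and o2.dec between o1.dec - :r and o1.dec + :r   -- quick filter on dec
              and o1.objID <> o2.objID
              and (o1.ra - o2.ra) * (o1.ra - o2.ra)
                + (o1.dec - o2.dec) * (o1.dec - o2.dec) <= :r * :r  -- careful filter
            """,
            {"delta": delta, "r": r},
        )


build_neighbors(conn, 0.5)
pairs = conn.execute(
    "select objID, neighborObjID from neighbors order by objID, neighborObjID"
).fetchall()
print(pairs)
# [(1, 2), (1, 5), (2, 1), (2, 3), (2, 5), (3, 2), (5, 1), (5, 2)]
```

Because each offset is a single set-oriented statement, the engine is free to choose the join strategy and to run the work in parallel, which is the point the slide makes about batching all-all comparisons.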

11 Summary

SQL is a set-oriented language. You can express constraints as rows. Then you:

– Can evaluate LOTS of predicates per second

– Can do set algebra on the predicates.

Benefits from SQL parallelism.

SQL == Prolog?

12 References

Representing Polygon Areas and Testing Point-in-Polygon Containment in a Relational Database

A Purely Relational Way of Computing Neighbors on a Sphere
