Spatial (or N-Dimensional) Search in a Relational World Jim Gray
Equations Define Subspaces
For (x, y) above the line: $ax + by > c$
Reverse the space by: $-ax - by > -c$
Intersect 3 volumes:
$a_1 x + b_1 y > c_1$
$a_2 x + b_2 y > c_2$
$a_3 x + b_3 y > c_3$
[Figure: the line $ax + by = c$ in the (x, y) plane, with intercepts $x = c/a$ and $y = c/b$]
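A minimal sketch of the half-space idea in Python, added for illustration. The names `in_half_plane`, `reverse`, and `in_intersection`, and the sample triangle, are hypothetical and do not come from the slides.

```python
from typing import Sequence, Tuple

# A half-plane is stored as (a, b, c) and means: a*x + b*y > c.
HalfPlane = Tuple[float, float, float]


def in_half_plane(h: HalfPlane, x: float, y: float) -> bool:
    """True when (x, y) satisfies a*x + b*y > c."""
    a, b, c = h
    return a * x + b * y > c


def reverse(h: HalfPlane) -> HalfPlane:
    """Flip the half-plane: -a*x - b*y > -c selects the other side of the line."""
    a, b, c = h
    return (-a, -b, -c)


def in_intersection(hs: Sequence[HalfPlane], x: float, y: float) -> bool:
    """A point is in the intersection when it satisfies every half-plane."""
    return all(in_half_plane(h, x, y) for h in hs)


# Illustrative triangle: x > 0, y > 0, and x + y < 1 (written as -x - y > -1).
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, -1.0, -1.0)]
```

Calling `in_intersection(triangle, 0.25, 0.25)` tests a point against all three constraints at once, which is the same shape of computation the later SQL performs over rows.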
Domain is Union of Convex Hulls
Simple volumes are unions of convex hulls. Higher-order curves also work. Complex volumes have holes, and their holes have holes (that is harder).
[Figure: a shape labeled "Not a convex hull"]
Now in Relational Terms
    create table HalfSpace (
        domainID    int not null           -- domain name
            foreign key references Domain(domainID),
        convexID    int not null,          -- grouping a set of ½ spaces
        halfSpaceID int identity,          -- a particular ½ space
        x float not null,                  -- the (a,b,..) parameters
        y float not null,                  -- defining the ½ space
        z float not null,
        c float not null,                  -- the constant (c above)
        primary key (domainID, convexID, halfSpaceID)
    )
(x,y,z) is inside a convex if it is inside all lines of the convex.
Equivalently, (x,y,z) is inside a convex if it is NOT OUTSIDE ANY line of the convex.
    select convexID                        -- return the convex hulls
    from HalfSpace                         -- from the constraints
    where @x * x + @y * y + @z * z < c     -- point outside the line?
    group by all convexID                  -- consider all the lines of a convexID
    having count(*) = 0                    -- count outside == 0
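The "not outside any line" query can be sketched in plain Python to show what the grouping and the `count(*) = 0` test compute. This is an illustration under assumptions, not the slide's implementation: the `HalfSpaceRow` type, the function name, and the two sample cubes are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class HalfSpaceRow(NamedTuple):
    """One row of a HalfSpace-like table (hypothetical in-memory stand-in)."""
    convex_id: int
    x: float
    y: float
    z: float
    c: float


def convexes_containing(rows: Iterable[HalfSpaceRow],
                        px: float, py: float, pz: float) -> List[int]:
    """Return convex IDs for which the point is outside none of the half-spaces."""
    outside_count = {}
    for row in rows:
        # Register every convex, even if no row ends up counted as "outside";
        # this mirrors keeping groups whose filtered count is zero.
        outside_count.setdefault(row.convex_id, 0)
        if px * row.x + py * row.y + pz * row.z < row.c:  # point outside?
            outside_count[row.convex_id] += 1
    return sorted(cid for cid, n in outside_count.items() if n == 0)


def box(convex_id: int, x_lo: float, x_hi: float) -> List[HalfSpaceRow]:
    """Six half-spaces for the box x_lo <= x <= x_hi, 0 <= y <= 1, 0 <= z <= 1."""
    return [
        HalfSpaceRow(convex_id, 1, 0, 0, x_lo), HalfSpaceRow(convex_id, -1, 0, 0, -x_hi),
        HalfSpaceRow(convex_id, 0, 1, 0, 0), HalfSpaceRow(convex_id, 0, -1, 0, -1),
        HalfSpaceRow(convex_id, 0, 0, 1, 0), HalfSpaceRow(convex_id, 0, 0, -1, -1),
    ]


# Illustrative domain: the union of two boxes, convex 1 and convex 2.
rows = box(1, 0.0, 1.0) + box(2, 2.0, 3.0)
```

A point belongs to the domain when `convexes_containing(rows, ...)` returns at least one ID, since the domain is the union of its convexes.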
The Algebra is Simple
    … = spDomainNew …
    … = spDomainNewConvex …
    … = spDomainNewConvexConstraint …
    select * from … float)
Once constructed, they can be manipulated with the Boolean operations:
    … = spDomainOr …
    … = spDomainAnd …
    … = spDomainNot … varchar(8000))
What! No Bounding Box?
A bounding box limits the search. A subset of the convex hulls. If the query runs at 3M half-spaces/sec, then there is no need for a bounding box unless you have more than 10,000 lines. But if you have a lot of half-spaces, then a bounding box is good.
A Different Problem
Table-valued function to find points near a point:
– Select * from fGetNearbyEq(ra,dec,r)
Use Hierarchical Triangular Mesh:
– Space-filling curve, bounding triangles…
– Standard approach
13 ms/call… So 70 objects/second.
Too slow, so precompute neighbors: materialized view.
At 70 objects/sec it takes 6 months to compute a billion objects.
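The throughput figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide; the 30-day month is an assumption made for the conversion.

```python
MS_PER_CALL = 13.0                       # cost of one lookup, from the slide
calls_per_second = 1000.0 / MS_PER_CALL  # about 76.9; the slide rounds down to 70

OBJECTS = 1_000_000_000                  # a billion objects
SLIDE_RATE = 70.0                        # objects/second, as quoted on the slide

seconds = OBJECTS / SLIDE_RATE
days = seconds / 86_400                  # seconds per day
months = days / 30.0                     # assuming 30-day months

print(f"{calls_per_second:.1f} calls/s, {days:.0f} days, {months:.1f} months")
```

The result is roughly 165 days, which the slide rounds up to 6 months.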
Zone Based Spatial Join
Divide space into zones.
Key points by (zone, offset); on the sphere this needs a wrap-around margin.
Point search: look in a few zones at a limited offset, ra ± r, a bounding box that has 1-π/4 false positives.
All inside the relational engine:
– Avoids impedance mismatch
– Can batch all-all comparisons
– 33x faster and parallel
– 6 days, not 6 months!
[Figure: a search circle of radius r and a zone boundary at zoneMax; labels: r, ra-zoneMax, (r² + (ra-zoneMax)²), cos(radians(zoneMax)), zoneMax, x, ra ± x]
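A minimal sketch of zone keying, under assumptions the slide does not state: zones are taken to be equal-height declination stripes, the zone number is computed as floor((dec + 90) / height), and the 0.5 degree height is an arbitrary illustrative value. The function names are hypothetical.

```python
import math

ZONE_HEIGHT_DEG = 0.5  # hypothetical zone height in degrees


def zone_id(dec_deg: float, zone_height: float = ZONE_HEIGHT_DEG) -> int:
    """Map a declination in [-90, 90] to an integer zone number (assumed scheme)."""
    return int(math.floor((dec_deg + 90.0) / zone_height))


def zones_for_search(dec_deg: float, r_deg: float,
                     zone_height: float = ZONE_HEIGHT_DEG) -> range:
    """Zones that a circle of radius r_deg centered at dec_deg can overlap."""
    lo = zone_id(max(dec_deg - r_deg, -90.0), zone_height)
    hi = zone_id(min(dec_deg + r_deg, 90.0), zone_height)
    return range(lo, hi + 1)


# A square of side 2r encloses a circle of area pi*r^2, so the share of the
# box lying outside the circle is 1 - pi/4, about 21%.
BOX_FALSE_POSITIVE_FRACTION = 1.0 - math.pi / 4.0
```

With these assumptions a search only touches the handful of zones returned by `zones_for_search`, rather than the whole table.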
In SQL
    select o1.objID                      -- find objects
    from zone o1                         -- in the zoned table
    where o1.zoneID between … and …      -- where zone # overlaps the circle
      and o1.ra …                        -- quick filter on ra
      and o1.dec between … and …         -- quick filter on dec
      and ( (sqrt( …                     -- careful filter on distance
The quick filters on ra and dec form the bounding box. The careful filter on distance eliminates the ~21% = 1-π/4 false positives.
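The two-stage filter (cheap bounding box first, exact distance second) can be sketched in Python. This is a deliberately simplified planar version: it ignores the spherical cos(dec) correction and ra wrap-around, and the function names and the integer test grid are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def box_candidates(points: Iterable[Point], cx: float, cy: float, r: float) -> List[Point]:
    """Quick filter: keep points inside the square of side 2r around (cx, cy)."""
    return [(x, y) for x, y in points
            if cx - r <= x <= cx + r and cy - r <= y <= cy + r]


def nearby(points: Iterable[Point], cx: float, cy: float, r: float) -> List[Point]:
    """Careful filter: of the box candidates, keep those within distance r."""
    return [(x, y) for x, y in box_candidates(points, cx, cy, r)
            if math.hypot(x - cx, y - cy) <= r]


# Measure the false-positive share of the box on a dense integer grid.
grid = [(float(i), float(j)) for i in range(-100, 101) for j in range(-100, 101)]
candidates = box_candidates(grid, 0.0, 0.0, 100.0)
hits = nearby(grid, 0.0, 0.0, 100.0)
false_positive_fraction = 1.0 - len(hits) / len(candidates)
```

On this grid `false_positive_fraction` comes out close to 1-π/4, which is the share of box candidates the distance test has to discard.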
Summary
SQL is a set-oriented language.
You can express constraints as rows.
Then you:
– Can evaluate LOTS of predicates per second
– Can do set algebra on the predicates.
Benefits from SQL parallelism.
SQL == Prolog?