2003 May 24 Clive Page Implementation of XMATCH function.


1 2003 May 24 Clive Page Implementation of XMATCH function

2 Cross-matching
Very important functionality: by combining datasets we often get new scientific results.
In DBMS terms it needs a spatial join; the join criterion is the overlap of the error-regions.
The error-region is always a small patch of sky, never just a point, because of errors of measurement, extended objects, proper motions, etc.
Shapes of error-regions vary: often elliptical, sometimes circular, occasionally more complex.
Size depends on the confidence level of the match, often expressed as, say, x% confidence or y-sigma (the latter assumes some error distribution, e.g. Gaussian).
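To make the link between confidence level and error-region size concrete, here is a minimal Python sketch. It assumes the simplest case only: circular error-regions with independent Gaussian errors of equal size on both axes, so that offsets between true counterparts follow a Rayleigh distribution. The function names and the (ra, dec, sigma) tuple layout are illustrative assumptions, not part of ADQL or of the slides.

```python
import math

def match_radius(sigma1, sigma2, confidence):
    """Radius expected to contain `confidence` of true matches.

    Assumes circular Gaussian errors (hypothetical simplification);
    sigma1 and sigma2 are the per-axis 1-sigma errors of the two sources.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    sigma = math.hypot(sigma1, sigma2)  # combined per-axis error
    # Rayleigh distribution: P(r < R) = 1 - exp(-R^2 / (2 sigma^2))
    return sigma * math.sqrt(-2.0 * math.log(1.0 - confidence))

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def is_match(src1, src2, confidence=0.95):
    """Each source is (ra_deg, dec_deg, sigma_deg); True if within the radius."""
    radius = match_radius(src1[2], src2[2], confidence)
    return angular_separation(src1[0], src1[1], src2[0], src2[1]) <= radius
```

Under these assumptions a radius of 3 sigma contains about 98.9% of true matches rather than the 99.7% of the one-dimensional case, which is one reason a percentage is less ambiguous than an N-sigma value. Elliptical or more complex regions need a different test.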

3 Other cross-match complications
Difference in epochs of catalogues means some objects will have moved: apply proper motions?
–Need epoch metadata.
Different users will want different confidence levels and hence different sizes of error regions.
–A large region may produce too many false positives.
–May need to adjust the confidence level in the light of experience.
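As an illustration of why epoch metadata is needed, the sketch below moves a catalogue position from one epoch to another before matching. It is a first-order, small-angle approximation in Python; the function name, the units (milliarcseconds per year) and the convention that the RA proper motion already includes the cos(dec) factor are all assumptions made for the example.

```python
import math

def apply_proper_motion(ra_deg, dec_deg, pm_ra_cosdec_mas_yr, pm_dec_mas_yr,
                        epoch_from, epoch_to):
    """Propagate a position linearly between two epochs (in years).

    Linear approximation only: not suitable close to the poles or for
    very long time baselines.
    """
    dt = epoch_to - epoch_from          # elapsed time in years
    mas_to_deg = 1.0 / 3.6e6            # milliarcseconds to degrees
    dec_new = dec_deg + pm_dec_mas_yr * dt * mas_to_deg
    # Divide by cos(dec) to turn an on-sky offset into an RA coordinate offset.
    ra_new = ra_deg + (pm_ra_cosdec_mas_yr * dt * mas_to_deg
                       / math.cos(math.radians(dec_deg)))
    return ra_new % 360.0, dec_new
```

Without both epochs the time difference cannot be computed, so the correction cannot be applied at all.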

4 Current ADQL Syntax
…WHERE XMATCH(x, y, !z) > 3 AND …
Some problems:
XMATCH is not quite a function: if a confidence level or N-sigma value is needed it should be one of the arguments.
Better to express matching probability as X% confidence than as N-sigma, as the former makes no assumptions about the functional form of the error distribution.
No syntax for LEFT OUTER JOIN (return unmatched sources as well as matched ones).

5 Specifying a join in SQL
Two methods in the standards:
–SELECT * FROM t1, t2 WHERE t1.x = t2.y …
–SELECT * FROM t1 [LEFT OUTER] JOIN t2 ON t1.x = t2.y WHERE …
The latter form is more verbose but
–allows a number of different types of join,
–makes the join criterion explicit.
Propose that we use the latter, e.g. ON XMATCH(…)
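The two join forms can be compared on a toy example. The sketch below uses Python's built-in sqlite3 module with made-up tables t1 and t2 and a plain equality condition standing in for XMATCH, which no standard DBMS provides; the data values are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INTEGER, x INTEGER);
    CREATE TABLE t2 (id INTEGER, y INTEGER);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO t2 VALUES (7, 10), (8, 30);
""")

# Form 1: join criterion hidden among the WHERE conditions.
implicit = con.execute(
    "SELECT t1.id, t2.id FROM t1, t2 WHERE t1.x = t2.y ORDER BY t1.id"
).fetchall()

# Form 2: explicit JOIN ... ON, same result for an inner join.
explicit = con.execute(
    "SELECT t1.id, t2.id FROM t1 JOIN t2 ON t1.x = t2.y ORDER BY t1.id"
).fetchall()

# Only the explicit form extends naturally to an outer join, which also
# returns the rows of t1 that have no partner in t2 (partner shown as None).
outer = con.execute(
    "SELECT t1.id, t2.id FROM t1 LEFT OUTER JOIN t2 ON t1.x = t2.y "
    "ORDER BY t1.id"
).fetchall()
```

Here the inner forms return the two matched pairs, while the outer join additionally returns source 2 with an empty counterpart.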

6 Some cross-match implementations
Cross-match of two catalogues of ~N sources is an O(N²) operation unless some form of indexing/sorting is used, which can reduce it to O(N log N) or better.
Known algorithms:
–Join using a spatial index such as R-tree (Oracle, Sybase, MySQL, Postgres…) or Grid-file (DB2)
–Join using pixel code and B-tree (more complex and slower, but feasible with just about any DBMS)
–Sort/sweep algorithm of Dave Abel and colleagues.
Note: all these use bounding boxes drawn around the error ellipses (or whatever shape the error-regions have). A refinement stage weeds out the false matches.
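The filter-then-refine pattern common to these algorithms can be sketched in a few lines of Python. This is a simplified stand-in, loosely in the spirit of the pixel-code approach: each source is assigned to a square grid cell, a dictionary plays the role of the index, and candidates from neighbouring cells are then checked exactly. It assumes flat (tangent-plane) coordinates and a single match radius, and it is not the R-tree join or the sort/sweep algorithm named above. All names are illustrative.

```python
import math
from collections import defaultdict

def crossmatch(cat1, cat2, radius):
    """Match two catalogues of (id, x, y) rows; returns sorted (id1, id2) pairs.

    Flat-plane coordinates and one common radius are assumed for simplicity.
    """
    cell = radius  # cell side equals the radius, so any match lies within
                   # the 3x3 block of cells around the source's own cell
    grid = defaultdict(list)
    for id2, x, y in cat2:
        grid[(math.floor(x / cell), math.floor(y / cell))].append((id2, x, y))

    pairs = []
    for id1, x1, y1 in cat1:
        cx, cy = math.floor(x1 / cell), math.floor(y1 / cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # Filter stage: only sources in nearby cells are candidates.
                for id2, x2, y2 in grid.get((cx + dx, cy + dy), ()):
                    # Refinement stage: exact test weeds out false candidates.
                    if math.hypot(x2 - x1, y2 - y1) <= radius:
                        pairs.append((id1, id2))
    return sorted(pairs)
```

For catalogues of roughly uniform density each source is compared with only a handful of candidates, instead of with every source in the other catalogue.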

7 Algorithm limitations
With most algorithms, if the size of the error region changes it is necessary to generate new indices, which is slow.
But: the hard part is done by the DBMS in reducing an O(N²) problem to one of O(N log N) or better.
Proposed solution:
–Always carry out the cross-match using the largest error region that is scientifically justifiable (99.9% or 3σ).
–The user can then refine the crude selection using the relatively small table of results, rejecting sources too far apart for the actual error-regions and confidence level.
–In this case the confidence level (or N-σ) value can be omitted from the XMATCH function and applied only in the refine stage.
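The refine stage of this proposal might look like the Python sketch below. It assumes the coarse match has already produced a small table of candidate pairs, each with a separation and per-source errors in the same units, and it assumes circular Gaussian errors so that the radius for a given confidence follows a Rayleigh distribution. The row layout and function name are hypothetical.

```python
import math

def refine(candidates, confidence):
    """Keep only candidate pairs compatible with the requested confidence.

    candidates: rows of (id1, id2, separation, sigma1, sigma2) produced by
    a coarse cross-match at the largest justifiable radius.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Radius in units of the combined sigma (circular Gaussian assumption).
    k = math.sqrt(-2.0 * math.log(1.0 - confidence))
    return [(id1, id2)
            for id1, id2, sep, s1, s2 in candidates
            if sep <= k * math.hypot(s1, s2)]
```

Because the function only scans the result table, the user can try several confidence levels without the indices ever being rebuilt.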

8 Selecting sources which have no counterpart
Syntax is: XMATCH(x, !z)
A standard RDBMS can do this using:
–LEFT OUTER JOIN of x and z
–INNER JOIN of x and z
–Take the difference between the results of the last two.
Or is there a simpler way?
I think it is at least as important to have a defined syntax for LEFT OUTER JOIN, as knowing which sources have no counterpart is often scientifically important.
Propose plus symbol, e.g. XMATCH(x+, y)
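One commonly used shortcut in standard SQL is to filter the outer join for rows whose right-hand side is NULL (an anti-join). The Python sketch below compares it with the three-step difference method, using sqlite3, invented tables x and z, and an equality condition on a hypothetical "cell" column in place of XMATCH.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE x (id INTEGER, cell INTEGER);
    CREATE TABLE z (id INTEGER, cell INTEGER);
    INSERT INTO x VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO z VALUES (9, 200);
""")

# Three-step method: outer join result minus inner join result.
difference = sorted(con.execute("""
    SELECT x.id FROM x LEFT OUTER JOIN z ON x.cell = z.cell
    EXCEPT
    SELECT x.id FROM x INNER JOIN z ON x.cell = z.cell
""").fetchall())

# Single-query alternative: keep the outer-join rows with no partner in z.
unmatched = sorted(con.execute("""
    SELECT x.id FROM x LEFT OUTER JOIN z ON x.cell = z.cell
    WHERE z.id IS NULL
""").fetchall())
```

Both queries return sources 1 and 3 here. The shortcut relies on z.id never being NULL in a real row of z, and whether it carries over to a spatial join depends on the DBMS supporting the match criterion in the ON clause.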

9 Cross-match can also find clusters of objects
Scientific examples: find clusters of stars, galaxies, objects affected by gravitational lensing.
Method: cross-match the catalogue with itself but with a much larger maximum offset than the error-regions.
Problem: needs an index generated using bounding boxes much larger than the error-regions used for finding counterparts.
May need an additional function like XMATCH but with additional parameters, e.g. for maximum offset.
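A self-match of this kind can be sketched in Python as follows. The slides only describe the self-join with a large maximum offset; chaining the resulting pairs into groups with a union-find structure (a "friends-of-friends" grouping) is an added assumption for illustration, as are the flat coordinates and the brute-force pair loop, which an indexed join would replace in practice.

```python
import math

def find_clusters(points, max_offset):
    """Group (x, y) points linked by chains of pairs closer than max_offset.

    Returns a sorted list of clusters, each a list of point indices;
    isolated points are omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Self-match: every pair once (O(N^2) here, for clarity only).
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_offset:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)
```

Note that max_offset here is a free parameter unrelated to the positional errors, which is the extra argument an XMATCH-like clustering function would need.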

