Presentation on theme: "John Cunniffe Dunsink Observatory Dublin Institute for Advanced Studies Evert Meurs (Dunsink Observatory) Aaron Golden (NUI Galway) Aus VO 18/11/03 Efficient."— Presentation transcript:
John Cunniffe Dunsink Observatory Dublin Institute for Advanced Studies Evert Meurs (Dunsink Observatory) Aaron Golden (NUI Galway) Aus VO 18/11/03 Efficient X-ray Data Mining
2 Once you make doing science with your VO service easy, everyone will want to use your server. This is analogous to oversubscribed observatory time: how do users successfully compete for query time? Query modelling in a proposal? Need data simulators/previewers to run the query on, and/or a data subset for a test run.
3 Future X-ray missions. Current missions (XMM/Chandra/RXTE): download data (typ. a few GB/pointing); processed on local machine. XEUS, Constellation-X, Astro-E2, etc.: very large data sets (a few 100 GB/pointing); online data processing; proposed framework involves users submitting web-based requests for processing pipelines; derived data products very important (source catalogues, images, spectra, lightcurves, etc.).
4 Efficient X-ray Data Mining. Efficient: don't want to reprocess the data archive unless really needed; maximise use of metadata. X-ray: data processing pipelines are more complicated (than e.g. optical); treatment of faint sources/sky background is statistically complex; instrument response is complex (not exclusive to X-ray). Data Mining: interested in the sources found in the data but also in the context (i.e. why we found them in that selection); not simply interested in finding objects through cone searches and stopping there.
5 Science Use Case. Interested in variable/transient X-ray objects: short-term, e.g. flare stars (~1 dataset); long-term, e.g. variability of normal/active galaxies (multi-dataset). Current approach: use HTTP GET scripts to HEASARC to create cross-correlated source catalogues; where known objects are not present in a catalogue, retrieve the original dataset and calculate an upper flux limit (expensive). N.B. if the source catalogue was generated from the whole data archive then we may need to re-analyse a significant fraction of it. To understand the space density/flaring rate/etc. of populations in the catalogues we need to know the volume of space covered by the archive: area coverage (RA, Dec); temporal coverage (t_1, t_2, ..., t_i); spectral coverage (energy); flux limit.
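The link between coverage, flux limit, and survey volume can be made concrete with a minimal sketch. It assumes a Euclidean, non-evolving universe and a single source luminosity L, so a region of solid angle Ω with flux limit F_lim reaches d_max = sqrt(L / (4π F_lim)) and contributes a volume Ω d_max³ / 3. The function names and the idea of describing the archive as a list of (area, flux limit) regions are illustrative assumptions, not part of the talk.

```python
import math

# Conversion from square degrees to steradians.
SQDEG_TO_SR = (math.pi / 180.0) ** 2


def max_distance_cm(luminosity_erg_s, flux_limit_cgs):
    """Largest distance (cm) at which a source of the given luminosity
    is still brighter than the flux limit (erg cm^-2 s^-1)."""
    return math.sqrt(luminosity_erg_s / (4.0 * math.pi * flux_limit_cgs))


def survey_volume_cm3(regions, luminosity_erg_s):
    """Total Euclidean volume probed by a survey.

    regions: iterable of (area_sqdeg, flux_limit_cgs) pairs, one per patch
    of sky with roughly uniform sensitivity (a hypothetical description).
    """
    total = 0.0
    for area_sqdeg, flux_limit in regions:
        d_max = max_distance_cm(luminosity_erg_s, flux_limit)
        # Volume of a cone-like sector: solid angle * d^3 / 3.
        total += area_sqdeg * SQDEG_TO_SR * d_max ** 3 / 3.0
    return total
```

Because the volume scales as F_lim^(-3/2), a region that is four times deeper probes eight times the volume per unit area, which is why regions of different sensitivity cannot simply be added together by area.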
6 [Figure: "6 binned image of RASS data set"; label: "Survey depth"]
7 Regions with different sensitivity are included in the same source catalogues, cf. the XMM Serendipitous Source Catalogue (created from pointed-mode observations with different exposure times and instrument modes). Need a good coverage/sensitivity model of the data archive to understand the volume of space contained in a source catalogue.
8 Model Method 1: Upper limit predictor. Combine: instrument model (ARFs, PSFs, modes, ...); exposure time (0-30 ksec); N_H information; source spectral model; to create a high-resolution flux limit map of the RASS sky. In progress.
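A minimal sketch of how such a predictor might combine its inputs, under strong simplifying assumptions: the detection threshold is a fixed number of counts, and the instrument model, N_H, and source spectral model are all folded into one per-pixel conversion factor from count rate to flux. The names `flux_limit`, `flux_limit_map`, `min_counts`, and `flux_per_count_rate`, and all numerical values, are placeholders for illustration only.

```python
import math


def flux_limit(exposure_s, min_counts, flux_per_count_rate):
    """Predicted flux limit (erg cm^-2 s^-1) for one sky pixel.

    flux_per_count_rate converts counts/s to flux; in a real predictor it
    would be derived from the instrument response, N_H, and the assumed
    source spectrum (here it is just a supplied number).
    """
    if exposure_s <= 0.0:
        return math.inf  # no exposure: no source can be detected
    return (min_counts / exposure_s) * flux_per_count_rate


def flux_limit_map(exposure_map, min_counts, conversion_map):
    """Apply flux_limit pixel by pixel to two equally shaped 2-D grids."""
    return [
        [flux_limit(t, min_counts, c) for t, c in zip(t_row, c_row)]
        for t_row, c_row in zip(exposure_map, conversion_map)
    ]


# Placeholder inputs: exposures in seconds, uniform conversion factor.
exposure_map = [[0.0, 400.0], [10000.0, 30000.0]]
conversion_map = [[1.0e-11, 1.0e-11], [1.0e-11, 1.0e-11]]
limits = flux_limit_map(exposure_map, min_counts=10, conversion_map=conversion_map)
```

The appeal of this approach is that it needs only the exposure map and calibration products, not the photon data, so it avoids reprocessing the archive.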
9 Model Method 2: Upper limit flux tabulation. Reprocess the data archive and determine the upper limit statistics from the photon data directly; combine with N_H information and a source spectral model to create a high-resolution flux limit map of the RASS sky. In progress.
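One common way to derive an upper limit directly from photon counts is the classical Poisson construction: given n observed counts in an aperture and an expected background b, find the source counts s for which P(N ≤ n | s + b) = 1 − CL. The sketch below is an assumption about how the tabulation could be done, not the method used in the talk; the function names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def upper_limit_counts(n_obs, background, cl=0.90):
    """Classical upper limit on source counts at confidence level cl."""
    target = 1.0 - cl
    # If the background alone already makes n_obs this unlikely,
    # the limit on the source counts is pinned at zero.
    if poisson_cdf(n_obs, background) <= target:
        return 0.0
    lo, hi = 0.0, 10.0
    # Grow the bracket until the CDF drops below the target.
    while poisson_cdf(n_obs, hi + background) > target:
        hi *= 2.0
    # Bisection: the CDF decreases monotonically with the source counts.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For zero observed counts and no background the 90% limit is ln 10 ≈ 2.30 counts. Dividing the count limit by the exposure and applying a count-rate-to-flux conversion (which depends on N_H and the spectral model) would give the flux limit for that position.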
10 Results in a sensitivity map of the RASS sky, which adds usefulness to the source catalogue. Doing this with RASS is straightforward (though not quick), as the total data archive is a few 10s of GB. For future observatories it will have to be done on the archive curator's server.
11 The role of Archive/Source Catalogue Metadata. [Diagram labels: Data Archive (X-ray photon lists/ancillary instrument data; computationally expensive to reprocess); Selection Criteria; Source Catalogue.] How should the contents (not parameters) of a source catalogue best be described in the metadata? Why are the sources in it, in it? Describe the selection criteria.
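As one possible illustration of selection criteria carried in catalogue metadata, the sketch below stores them as a small machine-readable record that can both be serialised and be used to test whether a candidate detection would have entered the catalogue. The class, its field names, and the values are hypothetical and do not correspond to any existing VO data model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SelectionCriteria:
    """Hypothetical machine-readable description of why sources are in a catalogue."""

    energy_band_kev: tuple          # (low, high) band used for detection
    min_detection_likelihood: float
    min_exposure_s: float
    notes: str = ""

    def admits(self, detection_likelihood, exposure_s):
        """Would a detection with these properties have been catalogued?"""
        return (
            detection_likelihood >= self.min_detection_likelihood
            and exposure_s >= self.min_exposure_s
        )


# Illustrative values only.
criteria = SelectionCriteria(
    energy_band_kev=(0.1, 2.4),
    min_detection_likelihood=10.0,
    min_exposure_s=100.0,
    notes="placeholder values for illustration",
)

# Serialised form that could travel with the catalogue metadata.
metadata_json = json.dumps(asdict(criteria))
```

A record like this keeps the metadata small while still letting a query tool reason about why a known object is absent, without going back to the photon data.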
12 Flux limit maps, limiting magnitude calculators, observation simulators ... VO Data Model? These are an integral part of the sensitivity/coverage description: enhance the metadata (face larger metadata). Theory? This is really telescope simulation: build a separate model/simulator.
13 Other wavebands. Similar challenges in other wavebands: complex coverage and sensitivity descriptions plus catalogue selection criteria. How many brown dwarfs are there? In general, how much data description should go in the metadata and how much should be left in secondary resources?
14 Final Questions. How big (Kbytes) should data archive metadata be? Should it include preview data (e.g. large FITS files)? Should selection criteria be described in the metadata (or simply a reference to the original publication)? Provide partially reduced or preview data as an externally held addendum to the metadata? (Much bigger than standard metadata; much smaller than the whole archive.) What other tools are needed to allow astronomers to assess the usefulness of, and justify to Time Allocation Committees, large proposals/queries in a VO context?
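To put a rough number on the metadata size question, here is a back-of-the-envelope sketch of the storage needed for an all-sky flux limit map. It assumes approximately equal-area square pixels and one 4-byte value per pixel; the pixel sizes and the function name are illustrative choices, not figures from the talk.

```python
import math

# Area of the full sky in square degrees (4*pi steradians).
FULL_SKY_SQDEG = 4.0 * math.pi * (180.0 / math.pi) ** 2


def flux_map_size_bytes(pixel_arcmin, bytes_per_pixel=4):
    """Approximate size of an all-sky map with square pixels of the given side."""
    pixel_area_sqdeg = (pixel_arcmin / 60.0) ** 2
    n_pixels = math.ceil(FULL_SKY_SQDEG / pixel_area_sqdeg)
    return n_pixels * bytes_per_pixel


# Example: 6-arcminute pixels, single-precision values.
size_mb = flux_map_size_bytes(6.0) / 1e6
```

With these assumptions a 6-arcminute map comes to roughly 16.5 MB, and halving the pixel size quadruples it: far larger than a typical metadata record, yet still tiny next to an archive of tens of GB.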