WFCAM Science Archive Critical Design Review, April 2003


1 The SuperCOSMOS Science Archive (SSA)
- WFCAM Science Archive prototype
- existing ad hoc flat-file archive (inflexible, restricted access) re-implemented in an RDBMS
- catalogue data only (no image pixel data)
- 1.26 Tbytes of catalogue data
- implement a working service for users & developers to exercise prior to arrival of Tbytes of WFCAM data

2 SSA has several similarities to WSA:
- spatial indexing is required over the celestial sphere
- many source attributes in common, e.g. position, brightness, colour, shape, …
- multi-colour, multi-epoch merged source information results from multiple measurements of the same source
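To illustrate why positions on the celestial sphere need special treatment, here is a minimal sketch of a cone search using unit vectors. It is not the SSA or WSA indexing scheme (the slides do not specify one); the function names `radec_to_unit` and `cone_search` and the sample sources are hypothetical, and the linear scan stands in for whatever spatial index the archive actually uses.

```python
import math

def radec_to_unit(ra_deg, dec_deg):
    """Convert right ascension and declination (degrees) to a unit vector."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def cone_search(sources, ra0_deg, dec0_deg, radius_deg):
    """Return ids of sources within radius_deg of (ra0_deg, dec0_deg).

    Brute-force scan: a source is inside the cone when the dot product of
    its unit vector with the cone centre is at least cos(radius).
    """
    centre = radec_to_unit(ra0_deg, dec0_deg)
    cos_radius = math.cos(math.radians(radius_deg))
    hits = []
    for source_id, ra_deg, dec_deg in sources:
        vec = radec_to_unit(ra_deg, dec_deg)
        if sum(a * b for a, b in zip(centre, vec)) >= cos_radius:
            hits.append(source_id)
    return hits

# Hypothetical sources as (id, RA, Dec); two of them straddle RA = 0.
sources = [(1, 359.9, 0.0), (2, 0.1, 0.0), (3, 10.0, 0.0)]
result = cone_search(sources, 0.0, 0.0, 0.5)
print(result)
```

Working with unit vectors avoids the wrap-around at RA = 0/360 degrees that a naive range test on RA would get wrong.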

3 Entity-Relationship Models (ERMs):
- generalised, DBMS-independent
- simple, pictorial summary of relational design
- ERMs map directly to table design

4 SSA relational model:
- very simple relational model
- total of 5 entities
- catalogues have ~256 byte records with mainly 4-byte attributes, i.e. 50 to 60 per record
- so 2 tables dominate the DB:
  - SurveyCat: 0.82 Tbyte
  - MergedCat: 0.44 Tbyte
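The sizes quoted above can be cross-checked with a little arithmetic. The sketch below assumes 1 Tbyte = 10^12 bytes and exactly 256 bytes per record (the slide says "~256"), so the derived row counts are rough estimates only, not figures from the presentation.

```python
RECORD_BYTES = 256   # approximate record size quoted on the slide
ATTR_BYTES = 4       # most attributes are 4 bytes wide
TBYTE = 1e12         # assumption: decimal terabyte

# Upper bound on attributes per record if every attribute were 4 bytes.
max_attrs = RECORD_BYTES // ATTR_BYTES

# Table sizes in Tbyte, as quoted on the slide.
table_tbytes = {"SurveyCat": 0.82, "MergedCat": 0.44}
total_tbytes = sum(table_tbytes.values())

# Estimated row counts under the assumptions above.
est_rows = {name: size * TBYTE / RECORD_BYTES
            for name, size in table_tbytes.items()}

print(max_attrs)
print(round(total_tbytes, 2))
for name, rows in est_rows.items():
    print(name, f"{rows:.2e}")
```

The two dominant tables sum to the 1.26 Tbyte of catalogue data quoted for the SSA, and 50 to 60 four-byte attributes (200 to 240 bytes) fit within a ~256 byte record with room left for a few wider attributes.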

5 SSA has been implemented and 1% of data ingested:
- as prototype for V1.0 WSA
- Windows/SQL Server => “SkyServer”
- real-world queries used to exercise SSA
- 100% ingested and online by end Q2 2003
- test-bed for user access tools and archive scientist curation

6 Development method: “20 queries approach”
- a set of real-world astronomical queries, expressed in SQL
- includes joint queries between the SSA and SDSS
- queries have so far been exercised in the EDR region:
  - SSA: 13 million records; ~3 Gbyte
  - SDSS: 14 million records; ~22 Gbyte
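As a rough idea of what a joint SSA/SDSS query might look like, the sketch below runs a three-way join in an in-memory SQLite database. Everything here is an assumption made for illustration: the table and column names (`ssa_source`, `sdss_source`, `ssa_sdss_match`, `mag_r`), the sample rows, and the use of SQLite rather than SQL Server. It is not one of the actual 20 queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ssa_source  (ssa_id  INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, mag_r REAL);
CREATE TABLE sdss_source (sdss_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, mag_r REAL);
-- Hypothetical precomputed cross-match between the two surveys.
CREATE TABLE ssa_sdss_match (ssa_id INTEGER, sdss_id INTEGER);

INSERT INTO ssa_source  VALUES (1, 10.0, 0.0, 18.0), (2, 10.1, 0.0, 19.0);
INSERT INTO sdss_source VALUES (101, 10.0, 0.0, 18.1), (102, 10.1, 0.0, 20.0);
INSERT INTO ssa_sdss_match VALUES (1, 101), (2, 102);
""")

# Find matched pairs whose r magnitudes disagree by more than 0.5 mag,
# e.g. as candidates for variability or a bad measurement.
rows = conn.execute("""
SELECT s.ssa_id, d.sdss_id, s.mag_r - d.mag_r AS delta_r
FROM ssa_source AS s
JOIN ssa_sdss_match AS m ON m.ssa_id = s.ssa_id
JOIN sdss_source AS d ON d.sdss_id = m.sdss_id
WHERE ABS(s.mag_r - d.mag_r) > 0.5
ORDER BY s.ssa_id
""").fetchall()

print(rows)
```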

7 WSA has significant differences, however:
- catalogue and pixel data;
- calibration and other extensive metadata;
- science-driven, nested survey programmes (as opposed to SSA “atlas” maps of whole sky) result in complex data structure;
- curation & update within DBMS (whereas SSA is a finished data product ingested once into the DBMS).

8 WSA key requirements:
- flexibility:
  - ingested data are rich in structure
  - ingest occurs daily, curation daily/weekly/monthly …
  - many varied usage modes
  - protect proprietary rights for much of the data
- scalability:
  - ~2 Tbyte of new catalogue & ancillary data per year
- rapid response:
  - need to maintain rapid response despite increasing data volumes

9 Schematic picture of the WSA:
- Pixels:
  - one flat-file image store; access layer restricts public access
  - filenames and all metadata are tracked in DBMS tables with unrestricted access
- Catalogues:
  - WFAU incremental (no public access)
  - public, released DBs
  - external survey datasets also held

10 WFCAM pixel data
- pixel data consist of multiframes and combiframes in WSA parlance;
- stored as flat files (not BLOBs in the DBMS)
- metadata are stored in the DBMS
- library calibration frames are held
- default image products are held

11 WFCAM multiframe: any pipeline product that:
- retains instrumental “paw print” as distinct images (WSA calls these “detector frames”)
- is not made up from other ingested frames (e.g. microstep interleave is a multiframe)
WSA includes difference images as multiframes

12 WFCAM combiframe: any pipeline or archive product that:
- is the result of a combination process on stored multiframes
  - e.g. pipeline dither/stack/mosaic product
  - e.g. archive default stack/mosaic product
(NB: a combiframe may still reflect the “paw print” so can have multiframe characteristics)

13 Multiframe ERM:
- Programme & Field => vital
- library calibration multiframes stored & related
- primary/extension HDU keys logically stored & related
- this will work for VISTA

14 Combiframe ERM:
- every combiframe has provenance linking to multiframes
- individual calibration frames are not required, but individual confidence frames are
- a combiframe may consist of multiframe-like detector combiframes

15 Astrometric and photometric calibration data:
- calibration information must be stored (SRAD)
- recalibration is required, especially photometric (SRAD)
- old calibration coefficients must be stored (SRAD)
- time-dependence (versioning) complicates the relational model
Calibration data are related to images; source detections are related to images and hence to their relevant calibration data

16 Multiframe calibration data:
- “set-ups” define nightly detector & filter combinations:
  - extinctions have nightly values
  - zero points (zps) have detector & nightly values
- coefficients split into current & previous entities
- versioning & timing recorded
- highly non-linear systematics are allowed for via 2D maps
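The split into current and previous coefficient entities can be sketched as follows. This is only one possible reading of the slide: the table names `current_zp` and `previous_zp`, their columns, the `recalibrate` helper and the zero-point values are all hypothetical, and SQLite stands in for the actual DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per set-up holding the coefficients currently in force.
CREATE TABLE current_zp  (setup_id INTEGER PRIMARY KEY, zp REAL,
                          version INTEGER, valid_from TEXT);
-- Superseded coefficients, kept so old calibrations remain recoverable.
CREATE TABLE previous_zp (setup_id INTEGER, zp REAL,
                          version INTEGER, valid_from TEXT, valid_to TEXT);
""")

def recalibrate(conn, setup_id, new_zp, when):
    """Install a new zero point, archiving the one it replaces."""
    row = conn.execute(
        "SELECT zp, version, valid_from FROM current_zp WHERE setup_id = ?",
        (setup_id,)).fetchone()
    if row is None:
        # First calibration for this set-up: version 1, nothing to archive.
        conn.execute("INSERT INTO current_zp VALUES (?, ?, 1, ?)",
                     (setup_id, new_zp, when))
    else:
        old_zp, version, valid_from = row
        conn.execute("INSERT INTO previous_zp VALUES (?, ?, ?, ?, ?)",
                     (setup_id, old_zp, version, valid_from, when))
        conn.execute(
            "UPDATE current_zp SET zp = ?, version = ?, valid_from = ? "
            "WHERE setup_id = ?",
            (new_zp, version + 1, when, setup_id))
    conn.commit()

recalibrate(conn, 1, 22.75, "2003-04-01")
recalibrate(conn, 1, 22.80, "2003-04-15")

current = conn.execute("SELECT zp, version FROM current_zp").fetchall()
previous = conn.execute("SELECT zp, version, valid_to FROM previous_zp").fetchall()
print(current, previous)
```

Keeping superseded rows in a separate table keeps lookups of the current calibration simple, at the cost of a second entity in the relational model.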

17 Combiframe calibration data:
- no “set-ups”: each image separately calibrated;
- detector combiframes are catered for
- “luptitude” parameters stored for each image separately
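For readers unfamiliar with the term, "luptitudes" are the asinh magnitudes of Lupton, Gunn & Szalay (1999), which depend on a softening parameter b; this is presumably the kind of parameter that would be stored per image. A minimal sketch follows, where the function names and the value of b are hypothetical and the flux is expressed as a ratio to the zero-point flux.

```python
import math

POGSON = 2.5 / math.log(10)  # 2.5 / ln(10), about 1.0857

def luptitude(flux_ratio, b):
    """Asinh magnitude for flux_ratio = f / f0 and softening parameter b."""
    return -POGSON * (math.asinh(flux_ratio / (2.0 * b)) + math.log(b))

def pogson_mag(flux_ratio):
    """Conventional logarithmic magnitude, defined only for positive flux."""
    return -2.5 * math.log10(flux_ratio)

b = 1e-10  # hypothetical softening parameter for one image

bright = luptitude(1e-4, b)       # agrees with the conventional magnitude
zero_flux = luptitude(0.0, b)     # finite, unlike the conventional magnitude
negative = luptitude(-1e-10, b)   # still defined for a negative measured flux

print(bright, pogson_mag(1e-4), zero_flux, negative)
```

For fluxes well above b the two definitions agree, while at zero or negative measured flux the asinh form stays finite, which is why the parameter has to be kept alongside the photometry.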

18 Catalogue data: general model
- related back through progenitor image to calibration data
- detection list for each programme (or set of sub-surveys)
- merged source entity is maintained
- merge events recorded
- list re-measurements derived

19 Example: UKIDSS LAS, GPS & GCS
- LAS, GPS & GCS share one detection & one list re-measurement entity
- individual merged source and source re-measurement entities
- note curation information:
  - merge log (one per programme)
  - current/repeat detections
  - primary/secondary (e.g. overlaps)
  - new/old merge image flag (to trigger list re-measurement)

20 Non-WFCAM data: general model
- each non-WFCAM survey has a stored catalogue
- cross-neighbour table:
  - records nearby sources between any two surveys
  - yields associated (“nearest”) source
- non-WFCAM list measurements where image data are available (NB: V2.0 requirement)
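The cross-neighbour idea can be sketched in a few lines: record every pair of sources from two surveys that lie within some radius, then pick the closest partner per source. The function names, the 10 arcsec radius and the sample catalogues below are hypothetical, and the nested loop is a brute-force stand-in for whatever indexed join the archive would use.

```python
import math

def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def build_neighbours(cat_a, cat_b, radius_deg):
    """Return (id_a, id_b, separation) for every pair within radius_deg."""
    rows = []
    for id_a, ra_a, dec_a in cat_a:
        for id_b, ra_b, dec_b in cat_b:
            sep = ang_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg:
                rows.append((id_a, id_b, sep))
    return rows

def nearest(rows):
    """Map each id_a to the id_b of its closest recorded neighbour."""
    best = {}
    for id_a, id_b, sep in rows:
        if id_a not in best or sep < best[id_a][1]:
            best[id_a] = (id_b, sep)
    return {id_a: id_b for id_a, (id_b, _) in best.items()}

# Hypothetical catalogues as (id, RA, Dec) in degrees.
las = [(1, 150.0, 2.0), (2, 150.5, 2.0)]
sdss = [(10, 150.0, 2.0003), (11, 150.0, 2.001), (12, 151.0, 2.0)]

neighbours = build_neighbours(las, sdss, radius_deg=10.0 / 3600.0)
pairs = [(id_a, id_b) for id_a, id_b, _ in neighbours]
associated = nearest(neighbours)
print(pairs, associated)
```

Storing all neighbours rather than only the nearest keeps the table useful for queries that need every candidate counterpart; the "nearest" association is then a simple selection over it.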

21 Example: UKIDSS LAS & relationship to SDSS
- UKIDSS LAS overlaps with SDSS
- list measurements:
  - at positions defined by IR source, but in optical image data;
  - do not currently envisage implementing this the other way (i.e. optical source positions placed in IR image data)

22 Curation: set of entities to track in-DBMS processing
- archived programmes have:
  - required filter set
  - required join(s)
  - required list-driven measurement product(s)
  - release date(s)
  - final curation task
  - one or more curation timestamps
- a set of curation procedures is defined for the archive

23 The V2.0 R&D programme for the WSA: scalability issues
- speed: wish to maintain query response performance as catalogue data accumulate to many Tbytes
  - goal is ~100 sec for Tbyte trawls (i.e. non-indexed)
- data volume: wish to cope ultimately with single tables of size 10s of Tbytes or more …
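The speed goal translates directly into a required scan rate. In the sketch below, the 1 Tbyte table size and the 100 second target come from the slide, while the decimal terabyte and the per-disk sustained read rate are assumptions chosen only to show the scale of the problem.

```python
import math

TBYTE = 1e12                   # assumption: decimal terabyte
table_bytes = 1 * TBYTE        # a "Tbyte trawl"
target_seconds = 100           # response-time goal from the slide

# Aggregate sequential read rate needed for a full, non-indexed scan.
required_bytes_per_s = table_bytes / target_seconds

# Hypothetical sustained read rate of a single disk (not from the slides).
ASSUMED_DISK_BYTES_PER_S = 50e6

# Number of disks that would have to be read in parallel at that rate.
disks_needed = math.ceil(required_bytes_per_s / ASSUMED_DISK_BYTES_PER_S)

print(required_bytes_per_s / 1e9, "Gbyte/s")
print(disks_needed, "disks in parallel")
```

Whatever per-disk figure is assumed, the aggregate requirement of about 10 Gbyte/s makes clear that the goal can only be met by reading many devices in parallel.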

24 But: we’re poor academics!
- limited financial resources
  - can’t afford a Rolls-Royce SAN-type solution, for example
- staff resources limited
  - need low-maintenance systems

