VO as a Data Grid, NeSC ‘03 WFCAM Science Archive Nigel Hambly Wide Field Astronomy Unit Institute for Astronomy, University of Edinburgh.


1 VO as a Data Grid, NeSC ‘03 WFCAM Science Archive Nigel Hambly Wide Field Astronomy Unit Institute for Astronomy, University of Edinburgh

2 Background & context
Wide Field Astronomy:
- large-scale public surveys
- multi-colour, multi-epoch imaging data sets
Developments over recent decades:
- whole-sky Schmidt telescope surveys (eg. SuperCOSMOS)
- current generation optical/IR, eg. SDSS, WFCAM
- next generation, eg. VISTA
=> Prime examples of key datasets that will be the cornerstone of the VO datagrid

3 SuperCOSMOS scans photographic media:
10 Gbyte/day
3 colours: B, R & I
1 colour (R) at 2 epochs
0.7"/pixel
2 byte/pixel
whole sky
total data volume (pix): ~15 Tbyte
S hemisphere completed 2002 (N hemisphere by end 2005)

4 WFCAM will image the sky directly using IR-sensitive detectors; deployment on a 4m telescope (UKIRT):
100 Gbyte/night
5 colours: ZYJHK; some multi-epoch imaging
0.4"/pixel
4 byte/pixel
~10% sky coverage in selected areas (various depths)
total data volume (pix): ~100 Tbyte
observations start in 2004; 7 yr programme planned

5 VISTA (also 4m) will have 4x as many IR detectors as WFCAM:
500 Gbyte/night
4 colours: zJHK
targeted surveys (various depths & areas)
0.34"/pixel
total data volume (pix): ~0.5 Pbyte
observations start at the end of 2006

6 Characteristics of astronomy DBs (I)
pixel images processed into lists of parameterised detections known as “catalogues” (parameterised data typically <10% of pixel data volume)
detection association within survey data yielding multi-colour, multi-epoch source record

7 Characteristics of astronomy DBs (II)
detailed (but relatively small) amount of descriptive data with images and catalogues
required to track descriptive data and images along with catalogue data for current/future generation surveys
processing and ingest dictated by observing patterns but users require well defined, stable catalogue products on which to do their science
=> hence require periodic release of stable, well-defined, read-only catalogues

8 Typical usages (I)
increasingly involve jointly querying different survey datasets in different databases
- example shows stellar population discrimination using SDSS colours and SSA proper motions (Digby et al., astro-ph/0304056, MNRAS in print)

9 Typical usages (II)
position & proximity searches very common
- spatial indexing (2d, spherical geom.) required
statistical studies: ensemble characteristics of different species of source
one-in-a-million searches for peculiar sources with highly detailed, specific properties
- whole table scans …?
=> enable flexible interrogation to inspire new, innovative usage and promote new science
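A proximity search of the kind listed above comes down to an angular-separation test on the sphere. The following is a minimal sketch in Python, assuming positions are RA and Dec in decimal degrees; the names `angular_separation_deg` and `cone_search` and the toy positions are illustrative only and are not part of any archive interface.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding just above 1 for antipodal points.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(sources, ra0, dec0, radius_arcsec):
    """Return the sources (dicts with 'ra' and 'dec') within the radius of (ra0, dec0)."""
    radius_deg = radius_arcsec / 3600.0
    return [s for s in sources
            if angular_separation_deg(ra0, dec0, s["ra"], s["dec"]) <= radius_deg]

# Toy catalogue with made-up positions.
catalogue = [
    {"id": 1, "ra": 180.0000, "dec": 0.0000},
    {"id": 2, "ra": 180.0005, "dec": 0.0003},   # roughly 2 arcsec from source 1
    {"id": 3, "ra": 181.0000, "dec": 0.5000},
]
matches = cone_search(catalogue, 180.0, 0.0, radius_arcsec=5.0)
print([s["id"] for s in matches])
```

This version tests every row, which is the whole-table scan that a spatial index is meant to avoid on large tables.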

10 Science archive development at WFAU:
SSA: a few Tbytes
WSA = 10x SSA
VSA = 5x WSA
=> approach is to set up a prototype archive system now (SSA), expand and implement WSA to coincide with WFCAM ops, then scale to VSA.

11 Database design: key requirements (I)
Flexibility:
- ingested data are rich in structure
- daily ingest; daily/weekly/monthly curation
- many varied usage modes
- protect proprietorial rights
- allow for changes/enhancements in design

12 Database design: key requirements (II)
Scalability:
- ~2 Tbytes of new data per year
- operating lifetime > 5 years
- maintain performance for increasing data volumes
Portability:
- V1.0/V2.0 phased approach to hardware/OS/DBMS

13 Database design: fundamentals (I)
RDBMS, not OODBMS
WSA V1.0: Windows/SQL Server (“SkyServer”)
- V2.0 may be the same, DB2, or Oracle
Image data stored as external flat files, not BLOBs
- but image metadata stored in DBMS
All attributes “not null”, ie. mandatory values
Archive curation information stored in DBMS

14 Database design: fundamentals (II)
Calibration coefficients stored for astrometry & photometry
- instrumental quantities stored (XY in pix; flux in ADU)
- calibrated quantities stored based on current calibration
- all previous coefficients and versioning stored

15 Database design: fundamentals (III)
Reruns: reprocessed image data
- same observations yield new source attribute values
- re-ingest, but retain old parameterisation
Repeats: better measurements of the same source
- eg. stacked image detections
- again, retain old parameterisation
Duplicates: same source & filter but different observation
- eg. overlap regions
- store all data, and flag “best”
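The "store all data, and flag best" rule for duplicates can be sketched as follows. The slides do not say how "best" is chosen, so this example assumes the lowest quality-flag value wins, with ties broken by the smaller magnitude error. The field names (`source_id`, `filter`, `qual`, `mag_err`) and the toy rows are hypothetical.

```python
from collections import defaultdict

def flag_best(detections):
    """Mark one detection per (source_id, filter) group as best; keep every row.

    ASSUMPTION: 'best' = lowest quality flag, then smallest magnitude error.
    The real selection criterion is not given in the slides.
    """
    groups = defaultdict(list)
    for det in detections:
        det["best"] = False
        groups[(det["source_id"], det["filter"])].append(det)
    for group in groups.values():
        winner = min(group, key=lambda d: (d["qual"], d["mag_err"]))
        winner["best"] = True
    return detections

# Source 1 was observed twice in K (eg. in an overlap region); source 2 once.
detections = flag_best([
    {"det_id": 10, "source_id": 1, "filter": "K", "qual": 16, "mag_err": 0.02},
    {"det_id": 11, "source_id": 1, "filter": "K", "qual": 0, "mag_err": 0.03},
    {"det_id": 12, "source_id": 2, "filter": "K", "qual": 0, "mag_err": 0.05},
])
best_ids = [d["det_id"] for d in detections if d["best"]]
print(best_ids)
```

No row is deleted; queries that want one measurement per source filter on the flag, while the full set of duplicates stays available.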

16 Hardware design (I)
separate servers for
- pixels
- catalogue curation
- catalogue public access
- web services
different hardware solutions
- mass storage on IDE with HW RAID5
- high bandwidth catalogue servers using SCSI and SW RAID

17 Hardware design (II)
mass storage of pixels using low-cost IDE

18 Hardware design (III)
dual P4 Xeon server
independent PCI-X buses for maximum b/w
dual channel Ultra320 SCSI adapters
=> High bandwidth catalogue server

19 Hardware design (IV)
individual Seagate 146 Gbyte disks sustain > 50 Mbyte/s sequential read
Ultra320 saturates at 200 Mbyte/s in one channel
4 disks per channel
SW RAID striping across disks (following SkyServer design of Gray, Szalay & colleagues)
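The figures on this slide imply that four striped disks at 50 Mbyte/s each just match the 200 Mbyte/s limit of one Ultra320 channel. The following back-of-envelope sketch turns that into a sequential scan time for a table the size of the SSA Detection table (0.83 Tbyte, quoted elsewhere in the talk). The number of channels is an assumption, since the slides do not state the total, and decimal units are assumed throughout.

```python
DISK_MB_PER_S = 50.0             # sustained sequential read per disk (quoted lower bound)
CHANNEL_LIMIT_MB_PER_S = 200.0   # Ultra320 saturation per channel (quoted)
DISKS_PER_CHANNEL = 4            # quoted

def channel_bandwidth(disks=DISKS_PER_CHANNEL):
    """Aggregate read rate of one channel: striped disks, capped by the bus."""
    return min(disks * DISK_MB_PER_S, CHANNEL_LIMIT_MB_PER_S)

def scan_time_minutes(table_tbyte, n_channels):
    """Time to read a whole table sequentially, ignoring CPU and DBMS overheads."""
    total_mb = table_tbyte * 1.0e6   # 1 Tbyte taken as 1e6 Mbyte
    return total_mb / (n_channels * channel_bandwidth()) / 60.0

# ASSUMPTION: 1 and 4 channels are illustrative configurations only.
for n_channels in (1, 4):
    minutes = scan_time_minutes(0.83, n_channels)
    print(f"{n_channels} channel(s): {minutes:.0f} min to scan 0.83 Tbyte")
```

Adding a fifth disk to a channel would not help in this model, because the bus rather than the disks becomes the limit.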

20 The SuperCOSMOS Science Archive (SSA)
WFCAM Science Archive prototype
Existing ad hoc flat file archive (inflexible, restricted access) re-implemented in an RDBMS
Catalogue data only (no image pixel data)
1.3 Tbytes of catalogue data
Implement a working service for users & developers to exercise prior to arrival of Tbytes of WFCAM data

21 SSA has several similarities to WSA:
spatial indexing is required over celestial sphere
many source attributes in common, eg. position, brightness, colour, shape, …
multi-colour, multi-epoch detection information results from multiple measurements of the same source

22 Development method: “20 queries approach”
a set of real-world astronomical queries, expressed in SQL
includes joint queries between the SSA and SDSS
Example:
/* Q14: Provide a list of stars with multiple epoch measurements, which have light variations >0.5 mag. */
select objid into results
from Source
where (classR1=1 and classR2=1 and qualR1<128 and qualR2<128)
  and abs(bestmagR1-bestmagR2) > 0.5
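To see what Q14 selects, the query can be run on a toy table. The sketch below uses Python's built-in sqlite3 rather than SQL Server, so `CREATE TABLE ... AS` stands in for `select ... into`. The column names come from the query itself; the column types and the four rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Source (
        objid     INTEGER PRIMARY KEY,
        classR1   INTEGER NOT NULL,
        classR2   INTEGER NOT NULL,
        qualR1    INTEGER NOT NULL,
        qualR2    INTEGER NOT NULL,
        bestmagR1 REAL NOT NULL,
        bestmagR2 REAL NOT NULL
    )
""")
rows = [
    (1, 1, 1, 0, 0, 15.20, 15.95),    # class 1 at both epochs, clean, differs by 0.75 mag
    (2, 1, 1, 0, 0, 17.10, 17.15),    # class 1, clean, but only 0.05 mag difference
    (3, 2, 2, 0, 0, 18.00, 19.00),    # fails the class = 1 test
    (4, 1, 1, 256, 0, 16.00, 17.00),  # fails the quality cut at the first epoch
]
conn.executemany("INSERT INTO Source VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# SQLite has no SELECT ... INTO, so CREATE TABLE ... AS is used instead.
conn.execute("""
    CREATE TABLE results AS
    SELECT objid FROM Source
    WHERE classR1 = 1 AND classR2 = 1
      AND qualR1 < 128 AND qualR2 < 128
      AND abs(bestmagR1 - bestmagR2) > 0.5
""")
selected = [r[0] for r in conn.execute("SELECT objid FROM results ORDER BY objid")]
print(selected)
```

Only the first row passes all three conditions: class 1 at both epochs, quality below 128 at both epochs, and a magnitude difference above 0.5.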

23 SSA relational model:
relatively simple
catalogues have ~256 byte records with mainly 4-byte attributes, ie. 50 to 60 per record
so 2 tables dominate the DB
- Detection: 0.83 Tbyte
- Source: 0.44 Tbyte

24 SSA has been implemented & data are being ingested:

25 WSA has significant differences, however:
catalogue and pixel data;
science-driven, nested survey programmes (as opposed to SSA “atlas” maps of whole sky) result in complex data structure;
curation & update within DBMS (whereas SSA is a finished data product ingested once into the DBMS).

26 WFCAM Science Archive: relational design

27 WFCAM Science Archive
Schematic picture of the WSA:
Pixels:
- one flat-file image store; access layer restricts public access
- filenames and all metadata are tracked in DBMS tables with unrestricted access
Catalogues:
- WFAU incremental (no public access)
- Public, released DBs
- external survey datasets also held

28 Image metadata relational model
Programme & Field => vital
library calibration frames stored & related
primary/extension HDU keys logically stored & related
this will work for VISTA

29 Astrometric and photometric calibration data:
require to store calibration information
recalibration is required, esp. photometric
old calibration coefficients must be stored
time-dependence (versioning) complicates the relational model
Calibration data are related to images; source detections are related to images and hence their relevant calibration data

30 Image calibration data:
“set-ups” define nightly detector & filter combinations:
- extinctions have nightly values
- zps have detector & nightly values
coefficients split into current & previous entities
Versioning & timing recorded
highly non-linear systematics are allowed for via 2D maps
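The split between stored instrumental fluxes and versioned coefficients can be illustrated with a small sketch. The slides do not give the calibration equation, so the code assumes a conventional form, m = zp - 2.5 log10(flux / t) - k (X - 1), with zero point zp, exposure time t, extinction k and airmass X. The `Calibration` class and all numerical values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Calibration:
    version: int
    zeropoint: float    # mag; one value per detector and night in the slides
    extinction: float   # mag per airmass; one value per night in the slides

def calibrated_mag(flux_adu, exptime_s, airmass, cal):
    """ASSUMED conventional form: m = zp - 2.5 log10(flux / t) - k (X - 1)."""
    return (cal.zeropoint
            - 2.5 * math.log10(flux_adu / exptime_s)
            - cal.extinction * (airmass - 1.0))

# Every version is retained; the last entry plays the role of the current one.
history = [
    Calibration(version=1, zeropoint=22.50, extinction=0.05),
    Calibration(version=2, zeropoint=22.46, extinction=0.05),  # after recalibration
]
current = history[-1]

# The instrumental flux (ADU) is stored once and never changes.
flux_adu, exptime_s, airmass = 10000.0, 10.0, 1.2
mag_current = calibrated_mag(flux_adu, exptime_s, airmass, current)
mag_previous = calibrated_mag(flux_adu, exptime_s, airmass, history[0])
print(f"v{current.version}: {mag_current:.3f}  v{history[0].version}: {mag_previous:.3f}")
```

Because the instrumental quantity is kept, a recalibration only adds a coefficient row, and any earlier magnitude can be reproduced from the retained version.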

31 Catalogue data: general model
related back through progenitor image to calibration data
detection list for each programme (or set of sub-surveys)
merged source entity is maintained
merge events recorded
list re-measurements derived

32 Non-WFCAM data: general model
each non-WFCAM survey has a stored catalogue (currently locally stored).
cross-neighbour table:
- records nearby sources between any two surveys
- yields associated (“nearest”) source
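A cross-neighbour table of this kind can be sketched with a brute-force pairing of two small catalogues. The function names, the 10 arcsec pairing radius and the toy positions are assumptions for illustration; a production system would use a spatial index rather than the nested loop shown here.

```python
import math

def sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def build_neighbours(primary, secondary, radius_arcsec):
    """Rows (primary_id, secondary_id, dist_arcsec) for every pair within the radius."""
    rows = []
    for p_id, p_ra, p_dec in primary:
        for s_id, s_ra, s_dec in secondary:
            dist = sep_arcsec(p_ra, p_dec, s_ra, s_dec)
            if dist <= radius_arcsec:
                rows.append((p_id, s_id, dist))
    return sorted(rows)

def nearest(rows):
    """Reduce the neighbour rows to the closest secondary source per primary source."""
    best = {}
    for p_id, s_id, dist in rows:
        if p_id not in best or dist < best[p_id][1]:
            best[p_id] = (s_id, dist)
    return best

# Made-up (id, ra, dec) tuples standing in for two surveys.
wfcam = [(101, 150.0000, 2.0000), (102, 150.0100, 2.0100)]
sdss = [(9001, 150.0002, 2.0001), (9002, 150.0010, 2.0000), (9003, 150.5000, 2.5000)]

rows = build_neighbours(wfcam, sdss, radius_arcsec=10.0)
associated = nearest(rows)
print([(p, s, round(d, 2)) for p, s, d in rows])
```

Keeping all pairs within the radius, and not only the closest one, lets a user apply a tighter or different association criterion at query time.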

33 Example: UKIDSS LAS & relationship to SDSS
UKIDSS LAS overlaps with SDSS
list measurements:
- at positions defined by IR source, but in optical image data;
- do not currently envisage implementing this the other way (ie. optical source positions placed in IR image data)

34 Curation: set of entities to track in-DBMS processing
archived programmes have:
- required filter set
- required join(s)
- required list-driven measurement product(s)
- release date(s)
- final curation task
- one or more curation timestamps
a set of curation procedures is defined for the archive

35 WFCAM Science Archive: V1.0 schema implementation

36 Implementation: unique identifiers (UIDs)
meaningful UIDs, not arbitrary DBMS-assigned sequence no.
following relational model, compound UIDs from appropriate attributes, eg.
- detection UID is a combination of sequence no. on detector and detector UID
- detector UID is a combination of extension no. of detector and multiframe UID
but: top-level UIDs compounded into new attribute to avoid copying many columns down the relational hierarchy, eg.
- meaningful multiframe UID is made up from UKIRT run no., and observation and ingest dates.
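The compound-key scheme above can be sketched as nested tuples, with the top-level multiframe key folded into a single attribute so that child keys carry one column instead of three. The string layout produced by `multiframe_uid`, the class names and the example values are all hypothetical; the slides do not specify the actual encoding.

```python
from datetime import date
from typing import NamedTuple

def multiframe_uid(run_no, obs_date, ingest_date):
    """Fold the three top-level attributes into ONE key (hypothetical layout)."""
    return f"{obs_date:%Y%m%d}_{run_no:05d}_{ingest_date:%Y%m%d}"

class DetectorUID(NamedTuple):
    multiframe: str   # compounded top-level UID
    ext_no: int       # extension no. of the detector

class DetectionUID(NamedTuple):
    multiframe: str
    ext_no: int
    seq_no: int       # sequence no. on the detector

    @property
    def detector(self):
        """The parent detector key is a prefix of the detection key."""
        return DetectorUID(self.multiframe, self.ext_no)

mf = multiframe_uid(run_no=123, obs_date=date(2004, 5, 1), ingest_date=date(2004, 5, 2))
det = DetectionUID(mf, ext_no=2, seq_no=4711)
print(mf)
print(det.detector)
```

The key stays meaningful (it can be read back to a run and a date), while a detection needs three columns for its key rather than five.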

37 Implementation: SQL Server database picture (I)
Multiframe & nearest neighbour tables
(Critical Design Review, April 2003)

38 Implementation: SQL Server database picture (II)
UKIDSS LAS & nearest neighbour tables

39 Implementation: spatial index attributes
Hierarchical Triangular Mesh algorithm (courtesy of P. Kunszt, A. Szalay & colleagues)
HTM attribute HTMID for each occurrence of RA & Dec
SkyServer functions & stored procedures:
- spHTM_Lookup, spHTM_Cover, spHTM_To_String, fHTM_Cover etc.
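The idea behind an HTM index attribute can be shown with a simplified sketch: start from the eight faces of an octahedron projected onto the sphere, then repeatedly split the triangle containing the point into four, appending two bits per level. The function `htm_id` below is a hypothetical stand-in written for illustration. It is not the SkyServer `spHTM_Lookup` procedure, and its output is not guaranteed to match the IDs produced by the real library.

```python
import math

# Octahedron vertices and the eight root triangles, listed counter-clockwise.
_V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
_ROOTS = [(1, 5, 2), (2, 5, 3), (3, 5, 4), (4, 5, 1),   # southern roots, ids 8..11
          (1, 0, 4), (4, 0, 3), (3, 0, 2), (2, 0, 1)]   # northern roots, ids 12..15
_EPS = -1e-12  # small tolerance so points on an edge are still assigned

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _midpoint(a, b):
    """Midpoint of two unit vectors, pushed back onto the unit sphere."""
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(_dot(m, m))
    return (m[0] / n, m[1] / n, m[2] / n)

def _inside(p, v0, v1, v2):
    """True if p lies on the inner side of all three great-circle edges."""
    return (_dot(_cross(v0, v1), p) >= _EPS and
            _dot(_cross(v1, v2), p) >= _EPS and
            _dot(_cross(v2, v0), p) >= _EPS)

def htm_id(ra_deg, dec_deg, depth):
    """Integer ID of the triangle containing (ra, dec) after `depth` subdivisions."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    p = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    for i, (a, b, c) in enumerate(_ROOTS):
        tri = (_V[a], _V[b], _V[c])
        if _inside(p, *tri):
            hid = 8 + i
            break
    else:
        raise ValueError("point not located in any root triangle")
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for k, child in enumerate(children):
            if _inside(p, *child):
                hid, tri = hid * 4 + k, child
                break
        else:
            raise ValueError("point not located in any child triangle")
    return hid

print(htm_id(10.0, 20.0, 8))
```

Because each level appends two bits, every point inside a given triangle has an ID in one contiguous integer range at any deeper level, which is what lets an ordinary B-tree index on the ID serve spatial range queries.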

40 Implementation: table indexing
standard RDBMS practice: index tables on commonly used fields
one “clustered” index per table based on primary key (default)
- results in re-ordering of data on disk
further non-clustered indices:
- when indexing on more than one field, put in order of decreasing selectivity
- HTM index attribute is included as most selective in at least one non-clustered index on appropriate tables
- index files stored on different disk volumes to tables to help minimise disk “thrashing”
=> experimentation required with real astronomical data and queries: SSA prototype
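The rule of putting the most selective field first in a composite index can be tried out on a toy table. The sketch uses Python's sqlite3, not SQL Server, so it is only a rough analogue: SQLite has no clustered/non-clustered distinction as such, although a table with an INTEGER PRIMARY KEY is stored in key order. The table contents, the index name and the made-up `htmid` values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Source (
        objid     INTEGER PRIMARY KEY,  -- rows are stored in key order in SQLite
        htmid     INTEGER NOT NULL,
        classR1   INTEGER NOT NULL,
        bestmagR1 REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Source VALUES (?, ?, ?, ?)",
    [(i, 1000 + i, 1 + i % 2, 15.0 + 0.001 * i) for i in range(1, 10001)],
)

# Composite secondary index: the highly selective htmid first, then the class.
conn.execute("CREATE INDEX idx_source_htm ON Source (htmid, classR1)")

query = "SELECT objid FROM Source WHERE htmid BETWEEN ? AND ? AND classR1 = 1"
params = (5000, 5010)

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params))
hits = sorted(r[0] for r in conn.execute(query, params))
print(plan)
print(hits)
```

The plan text shows whether the optimiser chose the index for the range on `htmid`; the same kind of check on real data and real queries is the experimentation the slide calls for.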

41 User interface & Grid context (I)
“traditional” interfaces (ftp/http), eg. existing implementations:
- WWW form interface
- Access via CDS Aladin tool

42 User interface & Grid context (II)
SQL form interfaces:

43 User interface & Grid context (III)
web services under development (XML/SOAP/VOTable)
other data (eg. SDSS, 2MASS, …) mirrored locally initially
but aspiration is to enable usages employing distributed resources (both data and CPU) ultimately
=> recast web services as Grid services to integrate WSA into the VO Data Grid
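As an illustration of the VOTable output mentioned above, the sketch below serialises a small query result into the basic VOTable element hierarchy (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) using only the Python standard library. The function name `rows_to_votable`, the field list and the values are hypothetical, and the output omits the namespace declarations and descriptive metadata that a real service would include.

```python
import xml.etree.ElementTree as ET

def rows_to_votable(fields, rows):
    """Serialise rows to a minimal VOTable document.

    fields: list of (name, datatype, unit) tuples, unit may be None.
    rows:   list of value tuples, one per table row.
    """
    votable = ET.Element("VOTABLE", version="1.0")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name="results")
    for name, datatype, unit in fields:
        attrs = {"name": name, "datatype": datatype}
        if unit:
            attrs["unit"] = unit
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")

# Made-up result set.
fields = [("objid", "long", None), ("ra", "double", "deg"), ("dec", "double", "deg")]
rows = [(1, 180.0, 0.0), (2, 180.0005, 0.0003)]
xml_text = rows_to_votable(fields, rows)
print(xml_text)
```

A client can parse the result with any XML library and recover both the column descriptions and the row values, which is the point of exchanging tables in a self-describing format.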

