Data Access Layer Servers
NVO Summer School, Aspen, Sept. 2006
Doug Tody (NRAO)
US National Virtual Observatory

Data Access Layer (DAL) Services
Goals
–Understand what the DAL services are
–… and what is involved in implementing them
Agenda
–Review current and planned DAL services
–Introduce options/issues faced in implementing the DAL services

Current and Planned DAL Services
Dataset         Generic dataset, complex data aggregates and associations (proposed)
Cone (SCS)      Catalog data (released)
SIAP V1.0       Image data (released)
SSAP            1D spectra (near PR; 2nd gen DAL prototype)
SLAP            Spectral line lists (near PR)
[S]TAP          Table/catalog access (proposed)
SSAP follow-on  Spectral Energy Distributions (SEDs)
SSAP follow-on  Time series
SIAP V2.0       Major upgrade - cube data etc.
SNAP            Numerical models / theory data

Major elements of a DAL service
Discovery query (queryData)
–Discover data matching query
–Access metadata ("headers") for candidate datasets
–Negotiate contract for virtual data generation
  This is a web/database type of operation
Data access (getData; acref URL)
–Retrieve selected datasets (URL-based)
–May be archival data, or virtual data computed on the fly
–In general the dataset may be computed, like a CGI web page
  This is a numerical/scientific computing type of operation
Interface
–RESTful; only the parameter-based interface is currently available
–Syntax-based query (ADQL/SQL) will be added as an option
–SOAP will be added, but the RESTful interface will be retained

Simple Cone Search
Summary
–Simplest possible access to astronomical catalogs
–By far the most widely implemented VO data service
–Prototypical DAL service
Query Parameters
–RA, DEC  Position on the sky (J2000, decimal degrees)
–SR       Search radius (decimal degrees)
–VERB     Verbosity (levels 1-3, optional)
Query Response
–VOTable; UCDs describe columns
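As a rough illustration of how a client might form a Cone Search request from the parameters above, here is a minimal sketch. The base URL `http://example.org/scs` and the helper name `cone_search_url` are hypothetical; only the parameter names RA, DEC, SR, and VERB come from the slide.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra, dec, sr, verb=None):
    """Build a Cone Search query URL; RA, DEC, SR are in decimal degrees."""
    if not 0.0 <= ra <= 360.0:
        raise ValueError("RA must be in [0, 360] degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    params = {"RA": ra, "DEC": dec, "SR": sr}
    if verb is not None:
        params["VERB"] = verb  # optional verbosity, levels 1-3
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


# Hypothetical service endpoint, used only for illustration.
url = cone_search_url("http://example.org/scs", 180.0, -30.5, 0.25, verb=2)
print(url)
```

The response to such a request would be a VOTable whose columns are identified by UCD.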

Simple Image Access (SIA V1.0)
Summary
–Uniform access to 2+ dimensional images
  Basically 2-D, but data model and interface are more general
–Same service profile as Cone, but adds getData
  The query is now used for data discovery instead of data access as for Cone; data access is a separate operation
–Prototype for 2nd generation DAL interfaces
  Data models, multiple output formats, virtual data generation, etc.

SIA Concepts
Types of Services
–Atlas    Precomputed survey image (entire image)
–Pointed  Image from pointed observation (entire image)
–Cutout   Cutout of existing image (pixels unchanged)
–Mosaic   Reprojected image (pixels resampled)
Virtual Data
–Data model mediation
–Subsetting, filtering, transformation, etc. on the fly
–Possible to view same data in different ways
SIA data model is the familiar "astronomical image"
–Generally this means a 2D sky projection, but cubes too
–Data array is logically a regular grid of pixels
–Encoded as a FITS image, GIF/JPEG, etc.

SIA Input Parameters
Required parameters
–POS     center of ROI (ra, dec decimal degrees ICRS)
–SIZE    width; or width, height
–FORMAT  ALL, GRAPHIC, image/fits, image/jpeg, text/html, …
  FORMAT=metadata returns service metadata
Optional parameters
–INTERSECT  values: covers, enclosed, center, overlaps
–VERB       table verbosity
Service-defined parameters
–used to further refine queries, but not yet standardized
  e.g., BAND, SURVEY, etc.
Image generation parameters
–NAXIS, CFRAME, EQUINOX, CRPIX, CRVAL, CDELT, ROTANG, PROJ
  used for cutout/mosaic services to specify image to be generated
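The required and optional SIA parameters listed above can be assembled into a query URL along these lines. This is only a sketch: the endpoint `http://example.org/sia` and the function name are hypothetical, and the INTERSECT check simply mirrors the four values named on the slide.

```python
from urllib.parse import urlencode

INTERSECT_VALUES = {"COVERS", "ENCLOSED", "CENTER", "OVERLAPS"}


def sia_query_url(base_url, ra, dec, size, fmt="image/fits", intersect=None):
    """Build an SIA query URL. `size` is a width, or a (width, height) pair."""
    if isinstance(size, (tuple, list)):
        size_value = ",".join(str(s) for s in size)
    else:
        size_value = str(size)
    params = {"POS": f"{ra},{dec}", "SIZE": size_value, "FORMAT": fmt}
    if intersect is not None:
        if intersect.upper() not in INTERSECT_VALUES:
            raise ValueError(f"unknown INTERSECT value: {intersect}")
        params["INTERSECT"] = intersect.upper()
    separator = "&" if "?" in base_url else "?"
    # Keep commas and slashes readable instead of percent-encoding them.
    return base_url + separator + urlencode(params, safe=",/")


# Hypothetical endpoint and coordinates, for illustration only.
simple = sia_query_url("http://example.org/sia", 83.63, 22.01, 0.2)
boxed = sia_query_url("http://example.org/sia", 83.63, 22.01, (0.2, 0.1),
                      intersect="overlaps")
print(simple)
print(boxed)
```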

SIA Query Response
Output is a VOTable
–Must contain a RESOURCE element with type="results", containing the results of the query
The results resource contains a single table
–Each row of the table describes a single data object which can be retrieved
The fields of the table describe the attributes of the dataset
–These are the attributes of the SIA data model
–In SIA 1.0, the UCD is used to identify the data model attribute
  e.g., POS_EQ_RA_MAIN, VOX:Image_Scale, etc.
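A client can use the UCDs to pick out data model attributes without depending on column names. The sketch below uses only the Python standard library and a small hand-written VOTable; the sample document, its column names, and its values are invented for illustration. It assumes the results resource is marked with `type="results"` and that the document carries no XML namespace, which real responses often do.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified query response (no XML namespace).
SAMPLE_VOTABLE = """<VOTABLE version="1.1">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" datatype="char" arraysize="*" ucd="VOX:Image_Title"/>
      <FIELD name="ra" datatype="double" ucd="POS_EQ_RA_MAIN"/>
      <FIELD name="dec" datatype="double" ucd="POS_EQ_DEC_MAIN"/>
      <DATA><TABLEDATA>
        <TR><TD>Survey tile A</TD><TD>83.63</TD><TD>22.01</TD></TR>
        <TR><TD>Survey tile B</TD><TD>83.70</TD><TD>22.10</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def rows_by_ucd(votable_xml):
    """Return one dict per table row, keyed by the UCD of each FIELD."""
    root = ET.fromstring(votable_xml)
    resource = next(r for r in root.iter("RESOURCE")
                    if r.get("type") == "results")
    table = resource.find("TABLE")
    ucds = [field.get("ucd") for field in table.findall("FIELD")]
    rows = []
    for tr in table.iter("TR"):
        cells = [td.text or "" for td in tr.findall("TD")]
        rows.append(dict(zip(ucds, cells)))
    return rows


rows = rows_by_ucd(SAMPLE_VOTABLE)
for row in rows:
    print(row["VOX:Image_Title"], row["POS_EQ_RA_MAIN"], row["POS_EQ_DEC_MAIN"])
```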

SIA Query Response
Image metadata
–Describes the image object (required)
Coordinate system metadata
–Image WCS
Spectral bandpass metadata
–Prototype data model describing spectral bandpass of image
Processing metadata
–Tells whether the service modified the image data
Access metadata
–Tells client how to access the dataset (required)
Resource-specific metadata
–Additional optional service-defined metadata describing image

SIA Image Metadata (UCDs)
VOX:Image_Title      Brief description of image
POS_EQ_RA_MAIN       RA (ICRS)
POS_EQ_DEC_MAIN      Dec (ICRS)
INST_ID              Instrument name
VOX:Image_MJDateObs  MJD of observation
VOX:Image_Naxes      Number of image axes
VOX:Image_Naxis      Length of each axis
VOX:Image_Scale      Image scale, deg/pix
VOX:Image_Format     Image file format

Image Retrieval
Retrieval is optional
–Typically only a fraction of the available images are retrieved
Based on query response
–If an access reference is provided, the data can be retrieved
–SIAP can also be used to describe data which is not online
–The same data may be available in multiple formats
Image retrieval
–Very simple; access reference is a URL
–Standard tools can be used to fetch the data (browser, wget, curl, i/o library, etc.)
–Data is often computed on-the-fly
–All retrieval is synchronous (currently)
–No provision for restricting access (currently)
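Because the access reference is just a URL, retrieval needs nothing beyond a standard I/O library. A minimal sketch with the Python standard library follows; the function name `fetch_dataset` and the example URL in the comment are hypothetical.

```python
import os
import shutil
from urllib.request import urlopen


def fetch_dataset(access_url, dest_path, timeout=60):
    """Synchronously copy the resource at access_url to dest_path.

    Returns the number of bytes written. The call blocks until the
    service has produced the whole dataset, which may take a while
    when the data is computed on the fly.
    """
    with urlopen(access_url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return os.path.getsize(dest_path)


# Example call (hypothetical access reference taken from a query response):
# fetch_dataset("http://example.org/sia/getimage?id=1234", "image.fits")
```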

Simple Spectral Access (SSA)
Summary
–Uniform access to 1-D spectra
  Can also handle spectral aggregates via association
  Support for SEDs and time series will be added
–First of the 2nd generation DAL interfaces
  Basic approach does not change (queryData, getData)
  Query interface and metadata are generalized
  SIA upgrade (etc.) will share the same basic interface
–Includes a standard data model for spectral datasets
  Needed, as there is no standard way to represent spectra
  Standard serializations are defined (VOTable, FITS, etc.)
  Returned data is typically generated on the fly
  External stored spectra may be in any form

SSA Interface Overview
Service Operations
–queryData        Discovery query
–(getData)        URL-based currently, as for SIA
–(stageData)      Reserved; used to asynchronously stage data
–getCapabilities  Query service metadata and capabilities
Complexity
–Basic usage is quite simple
  queryData; examine VOTable
  fetch data by access reference URL
–Basic Spectrum object
  general metadata ("header")
  spectral coordinate vector
  flux vector
  optional error vector
–Formats
  VOTable, FITS, XML, etc.; user or service choice

SSA Query Interface
Mandatory query parameters
–POS     X, Y, [FRAME (ICRS)]
–SIZE    diameter (decimal degrees)
–BAND    spectral region (1-2 num or name)
–TIME    date1/date2 (ISO8601)
–FORMAT  VOTable, FITS, XML, text, graphics, html, native

SSA Query Interface
Optional query parameters
–specres      minimum spectral resolution (L/dL)
–spatres      minimum spatial resolution (decimal degrees)
–timeres      minimum time resolution (seconds)
–SNR          minimum SNR
–redshift     redshift interval (1-2 decimal values)
–targetname   target name, e.g., "mars"
–targetclass  target class, e.g., star, QSO, AGN, etc.

SSA Query Interface
Optional query parameters
–pubDID      publisherID string
–creatorDID  creatorID string
–collection  collection ID (shortName, minimum match)
–top         max top-ranked entries to be returned
–token       continuation token for multipage queries
–maxrec      maximum records in query response
–mtime       create/modify time in given range (ISO8601)
–runid       passed on to any other services
–compress    enable compression
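Several SSA parameters take either a single value or a range. The sketch below shows one way a client could serialize such parameters; it assumes ranges are written `lo/hi`, following the `date1/date2` form given for TIME. The endpoint, function name, and example values are hypothetical, and the interface was still a draft at the time of these slides.

```python
from urllib.parse import urlencode


def format_range(value):
    """Write a (lo, hi) pair as 'lo/hi'; pass single values through."""
    if isinstance(value, (tuple, list)):
        return "/".join(str(v) for v in value)
    return str(value)


def ssa_query_url(base_url, pos, size, band, time, fmt, **optional):
    """Build an SSA queryData URL from mandatory and optional parameters."""
    params = {
        "POS": ",".join(str(c) for c in pos),
        "SIZE": str(size),
        "BAND": format_range(band),
        "TIME": format_range(time),
        "FORMAT": fmt,
    }
    for name, value in optional.items():
        params[name] = format_range(value)
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params, safe=",/:")


# Hypothetical endpoint and values, for illustration only.
ssa_url = ssa_query_url(
    "http://example.org/ssa",
    pos=(180.0, -30.5),
    size=0.1,
    band=(3.45e-7, 8.76e-6),
    time=("2005-01-01", "2005-12-31"),
    fmt="votable",
    targetclass="star",
    maxrec=100,
)
print(ssa_url)
```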

SSA Query Response
Classes of Query Metadata
–Query          Describes the query itself
–Association    Logical associations (aggregation)
–Access         Access metadata for data retrieval
–Dataset        General dataset metadata (type etc.)
–DataID         Dataset identification - what is it
–Curation       How data is published and made available
–Target         Astronomical target observed, if any
–Derived        Derived quantities (SNR, redshift, etc.)
–Char.Coverage  Coverage of spatial, spectral, time axes
–Char.Accuracy  Calibration, resolution, sampling, errors
–CoordSys       Coordinate system reference frames (STC)

SSA Query Response
Query Metadata
–Query.Score        Degree of match to query params
–Query.Token        Step through large query response
Association Metadata
–Association.Type   Type of association
–Association.ID     Instance ID linking associated records
–Association.Key    Unique key identifying each member
Access Metadata
–Access.Reference   URL of data product to be retrieved
–Access.ServiceDID  DataID of virtual data product
–Access.Format      MIME type of dataset
–Access.Size        Approximate dataset size (bytes)

SSA Query Response
DataID - Dataset Identification Metadata
–DataID.Title         One-line description of dataset (String)
–DataID.Collection    Collection name (shortName)
–DataID.Creator       Creator of dataset (String)
–DataID.CreatorID     Identifier for VO Creator (URI)
–DataID.CreatorDID    Dataset ID assigned by creator (URI)
–DataID.CreatorLogo   URL for Creator logo (URI)
–DataID.Contributor   Contributor (may be multiple instances)
–DataID.Date          Date last modified (ISO Date string)
–DataID.Version       Version of dataset instance (String)
–DataID.Instrument    Instrument description (String)
–DataID.Bandpass      Spectral bandpass, e.g., filter (String)
–DataID.DataSource    Original source of data (String)
–DataID.CreationType  How was dataset created (String)

Some SSA Concepts
DataSource
–survey, pointed, theory, artificial
CreationType
–native, archival, cutout, filtered, mosaic, projection, spectral extraction, catalog extraction, etc.
Provenance
–Where did this data come from?
  especially important for virtual data generated by the service
–DataID (Collection, CreatorDID, etc.) refers to the original data
–Curation (PublisherDID etc.) refers to the data from the service
–CreationType indicates how the data was derived

Some SSA Concepts
Associations
–Use association metadata to link related records (datasets)
–An association is a complex dataset
Data Models
–Data models formalize the content of data or metadata
–Container/component architecture
  Component data models aggregated in a container and associated logically (similar to a relational database)
–Dataset, Spectrum, Characterization, STC, etc.
Characterization
–Physically characterize the data
  Spatial, spectral, and temporal axes
  Coverage, sampling, resolution, accuracy
–Applies to any dataset (not specific to spectra)
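One way to picture associations: a client collects the rows of a query response that share an Association.ID and treats each group as one complex dataset. The sketch below assumes the response rows have already been read into dictionaries keyed by metadata name; the record values and association type are invented for illustration.

```python
from collections import defaultdict


def group_by_association(records):
    """Group query response records by their Association.ID value.

    Records without an association are left out of the result.
    """
    groups = defaultdict(list)
    for record in records:
        association_id = record.get("Association.ID")
        if association_id:
            groups[association_id].append(record)
    return dict(groups)


# Hypothetical response rows, already parsed into dictionaries.
records = [
    {"Association.Type": "MultiFormat", "Association.ID": "a1",
     "Association.Key": "fits", "Access.Format": "image/fits"},
    {"Association.Type": "MultiFormat", "Association.ID": "a1",
     "Association.Key": "votable", "Access.Format": "application/x-votable+xml"},
    {"Access.Format": "image/fits"},  # standalone dataset, no association
]

groups = group_by_association(records)
for association_id, members in groups.items():
    print(association_id, [m["Association.Key"] for m in members])
```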

SIA Upgrade Preview (SIA V2.0)
Main objectives
–Upgrade metadata, query interface as for SSA
  standard generic dataset metadata
  more powerful query interface
  more comprehensive output metadata
–Precision image data access enhancements
  e.g., cube data, image slicing, projection, filtering
  (TBD whether this is folded into basic SIA or done as a separate service class)
–Advanced service capabilities
  versioning, metadata query
  asynchronous data staging, authentication, VOStore integration

Cube Data
Overview
–Motivated primarily by radio data surveys (CGPS, Arecibo)
–Many O/IR integral field unit (IFU) instruments coming online as well
–Challenge: datasets can be both large and complex
Large datasets
–Current data cubes are several hundred MB up to several GB
–Future wide-field wide-band: 2048x2048x8192x4 = 128 GB
–With polarization, multiple bands, could have 1/2 TB datasets!
Complex datasets
–e.g., CGPS: HI cube, CO cube, continuum, IQUV, IRAS same field
–Multiple ways to view the same data
–Multi-band surveys are a simpler example of this trend
Use-Cases for recent study
–CGPS, SGPS, GALFA (Arecibo), SINFONI (ESO IFU)
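The size estimate above can be checked with a few lines of arithmetic. The sketch assumes the trailing factor of 4 is the number of bytes per pixel (32-bit values) and that the polarization case means four Stokes planes (IQUV); neither assumption is stated on the slide.

```python
GIB = 2 ** 30
TIB = 2 ** 40


def cube_size_bytes(nx, ny, nchan, bytes_per_pixel=4, n_pol=1):
    """Uncompressed size of a cube with nx * ny * nchan pixels per polarization."""
    return nx * ny * nchan * bytes_per_pixel * n_pol


single = cube_size_bytes(2048, 2048, 8192)
full_stokes = cube_size_bytes(2048, 2048, 8192, n_pol=4)

print(single / GIB)       # 128.0 (GB in binary units)
print(full_stokes / TIB)  # 0.5 (TB in binary units)
```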

Cube Data
Data access considerations
–Network download of large cubes can be impractical
–VO-style virtual data access to remote data is required
  subsetting, filtering (spectral or time regions), transformations (projections, spectrum extraction)
–Strategy: iteratively download data subset, visualize locally
Typical access modes
–Whole image
–Spectrum extraction
–Cutout 2D planes
–Cutout 3D sub-cube (permits local full 3D analysis)
–2D projection along one axis
–3D projection (general 3D transformation)
–2D slice through 3D cube at arbitrary 3D position, orientation

Cube Data
Typical access scenario
–Discovery query to discover data, get access metadata
–Access query to set up virtual data access (WCS based)
–Data access, dynamically generating virtual data
–Repeat for a different region or view
Example: Compute 2D projection with spectral filtering
–View 2D preview or projection, e.g., continuum
–Extract 1D spectra in sky regions (SSA with synthetic aperture)
–Analyze sky spectrum to determine night sky lines (SLAP)
–Compute 2D projection of cube excluding sky emission, absorption
Other examples
–Extract 3D sub-cube for full 3D analysis locally
–2D slice at arbitrary position and orientation

Cube Examples
Extract 2-D plane from cube, same orientation
–queryData PubDID= POS= SIZE=
–(cutout of smaller region also possible here)
  BAND= NAXES=2 FORMAT=FITS

Cube Examples
2-D Projection with spectral filtering
–queryData PubDID= POS= SIZE=
–(cutout of smaller region also possible here)
  BAND= NAXES=2 FORMAT=FITS
  (in SINFONI case original cube is in Euro-3D format)

Cube Examples
Extract 3-D Sub-Cube
–queryData PubDID= POS= SIZE= BAND=3.45E-7/8.76E-6 NAXES=3 FORMAT=FITS
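Seen from the service side, a request like the sub-cube example has to be turned into typed values before any data can be generated. The sketch below parses such a query string with the Python standard library. The PubDID, POS, and SIZE values are hypothetical placeholders (the slide leaves them blank), and the `CubeRequest` class is an illustration, not part of any DAL specification.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass
class CubeRequest:
    pubdid: str
    pos: tuple    # (ra, dec) in decimal degrees
    size: tuple   # one or two values in decimal degrees
    band: tuple   # (lo, hi) spectral range
    naxes: int
    fmt: str


def parse_cube_request(query_string):
    """Parse a queryData query string into a CubeRequest."""
    # Parameter names are matched case-insensitively here (an assumption).
    raw = {k.upper(): v[0] for k, v in parse_qs(query_string).items()}
    lo, _, hi = raw["BAND"].partition("/")
    return CubeRequest(
        pubdid=raw["PUBDID"],
        pos=tuple(float(x) for x in raw["POS"].split(",")),
        size=tuple(float(x) for x in raw["SIZE"].split(",")),
        band=(float(lo), float(hi or lo)),  # single value: lo == hi
        naxes=int(raw["NAXES"]),
        fmt=raw["FORMAT"],
    )


# Placeholder PubDID, POS, and SIZE; BAND, NAXES, FORMAT follow the slide.
request = parse_cube_request(
    "PubDID=ivo://example.org/cube1&POS=83.6,22.0&SIZE=0.5"
    "&BAND=3.45E-7/8.76E-6&NAXES=3&FORMAT=FITS"
)
print(request)
```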

Implementing DAL Services
Overall Process
–Determine what subclass of service to implement
  do we return whole files, cutouts, extract spectra, etc.?
–Select service technology
  Java, dotNet/Mono, Ruby, etc.
–Implement
  Reference code or a template would be useful here
–Test
  Service verification tools
–Register
  As soon as you do this you are online!

Cone Search
queryData operation
–SQL select operation on an RDBMS
–Transform output into VOTable format
  a VOTable package can be useful here
Issues
–May need to assign UCDs to your catalog fields
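The two steps of the queryData operation can be sketched with the Python standard library: a select on a database, then a transformation of the rows into VOTable markup. The in-memory SQLite table, its column names, and its three sources are hypothetical. The selection uses a coarse declination cut in SQL followed by an exact great-circle test, which is one simple approach among several, and the output is a bare-bones VOTable without namespace or schema declarations.

```python
import math
import sqlite3
import xml.etree.ElementTree as ET


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(conn, ra, dec, sr):
    """Select catalog rows within sr degrees of (ra, dec)."""
    cursor = conn.execute(
        "SELECT id, ra_deg, dec_deg FROM catalog "
        "WHERE dec_deg BETWEEN ? AND ?",
        (dec - sr, dec + sr),
    )
    return [row for row in cursor
            if angular_separation(ra, dec, row[1], row[2]) <= sr]


def to_votable(rows):
    """Serialize (id, ra, dec) rows as a minimal VOTable string."""
    votable = ET.Element("VOTABLE", version="1.1")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    fields = (("id", "char", "ID_MAIN"),
              ("ra", "double", "POS_EQ_RA_MAIN"),
              ("dec", "double", "POS_EQ_DEC_MAIN"))
    for name, datatype, ucd in fields:
        attrs = {"name": name, "datatype": datatype, "ucd": ucd}
        if datatype == "char":
            attrs["arraysize"] = "*"
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# Hypothetical three-source catalog held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (id TEXT, ra_deg REAL, dec_deg REAL)")
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)",
                 [("src1", 10.0, 20.0), ("src2", 10.05, 20.02),
                  ("src3", 12.0, 20.0)])

matches = cone_search(conn, ra=10.0, dec=20.0, sr=0.1)
votable_xml = to_votable(matches)
print(votable_xml)
```

A production service would more likely use a VOTable package, as the slide suggests, and a spatial index rather than a declination cut.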

Simple Image Access
queryData operation
–Select operation on an RDBMS
–Compute SIA query response metadata
–Transform output into VOTable format
Issues
–Computing the SIA query response metadata can be nontrivial
  e.g., for a cutout or mosaic
  don't forget you should return WCS information
–Metadata generation
  This is much easier if image metadata is cached in the DBMS
  For virtual data, the service must compose the access reference command
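For a cutout service, the query response has to describe an image that does not exist yet. The sketch below shows the kind of computation involved, under strong simplifying assumptions: a square cutout, no rotation, north up and east left, and a reference pixel at the cutout center. The service URL, the `ID` parameter, and the function name are hypothetical.

```python
from urllib.parse import urlencode


def cutout_record(service_url, image_id, ra, dec, size_deg, scale_deg_per_pix):
    """Return (wcs, access_reference) describing a virtual square cutout."""
    naxis = max(1, round(size_deg / scale_deg_per_pix))
    wcs = {
        "NAXIS": (naxis, naxis),
        # FITS pixel coordinates are 1-based, so the center is (n + 1) / 2.
        "CRPIX": ((naxis + 1) / 2.0, (naxis + 1) / 2.0),
        "CRVAL": (ra, dec),
        # RA increases to the left on the sky, hence the negative first step.
        "CDELT": (-scale_deg_per_pix, scale_deg_per_pix),
    }
    # The access reference encodes the command that generates the cutout.
    query = urlencode(
        {"ID": image_id, "POS": f"{ra},{dec}", "SIZE": size_deg,
         "FORMAT": "image/fits"},
        safe=",/",
    )
    return wcs, service_url + "?" + query


# Hypothetical service and image identifier, for illustration only.
wcs, acref = cutout_record("http://example.org/sia/cutout", "tile42",
                           83.63, 22.01, 0.25, 0.0005)
print(wcs)
print(acref)
```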

Simple Image Access (contd)
getData operation
–Atlas, Pointed
  only input is an access URL pointing to the file
  return FITS file
–Cutout, Mosaic
  access URL is the command which generates the virtual data
  may require significant, complex computation!
getCapabilities
–For SIA V1.0 this is FORMAT=metadata
–Tells client service capabilities and any optional parameters

Implementing DAL Services
Web Service Frameworks
–LAMP - Linux, Apache, MySQL, Python/Perl/PHP etc.
  Apache Web server, Tomcat, Java servlets
–dotNET/Mono
  Microsoft approach; SQL server, C#
–Ruby on Rails
  Trendy new alternative
Virtual Data Generation
–Backend may require significant computation
  Re-use some science package (IRAF, IDL, AIPS, CASA, etc.)
  Or at least CFITSIO, WCSTOOLS, and other libraries

