NVO Summer School, Aspen 9-Sep
Data Access Layer
Doug Tody (NRAO)
US NATIONAL VIRTUAL OBSERVATORY

Data Access Layer
What does it do?
– Provides access to data
    data discovery
    mediation to a standard model
    data retrieval
    on-demand data generation
    server-side computation (subsetting, filtering)
What is it for?
– Supports client data analysis
    distributed, multiwavelength
How does it work?
– Object (dataset) oriented
    catalog, image, spectrum, time series, SED, etc.
– Services
    cone search (also SkyNode), SIA, SSA

Cone Search

Cone Search
Provides basic catalog access
– Query by position and aperture (cone in space)
– Query consists of base-URL (service endpoint) plus parameters
    e.g., http://base-url?RA=12.0&DEC=0.0&SR=1.0
– Catalog returned as a VOTable
Advantages
– Simple but powerful, provides standard interface
– Easy to implement and use
Limitations
– Catalog metadata is not defined
– No data model support
Future
– Supplanted by basic SkyNode (Greene, Saturday)
– Supports metadata discovery, SQL-like syntactical queries
– We will continue to support the basic cone search query, however!
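A cone search request (base URL plus RA, DEC, SR parameters) could be assembled roughly as follows. This is a minimal sketch, not part of the original slides: `http://base-url` is the slide's placeholder endpoint and the function name `cone_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra, dec, sr):
    """Build a cone search query URL; ra, dec and sr are in decimal degrees."""
    params = {"RA": ra, "DEC": dec, "SR": sr}
    # Choose the separator depending on whether the base URL already
    # ends with, or already contains, a query string.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode(params)


# Placeholder endpoint taken from the slide, not a real service.
print(cone_search_url("http://base-url", 12.0, 0.0, 1.0))
```

The service would answer such a request with a VOTable containing the catalog rows inside the cone.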

Simple Image Access

Simple Image Access (SIA)
Basic Usage, Highest Level
– Client queries Registry to find interesting services
– Each service is queried (in turn or simultaneously) for data
– Client collates and analyzes results
– Selected datasets are retrieved
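The "query each service, then collate" step could be sketched as below. This is an illustration under stated assumptions only: the registry lookup is assumed to have already produced a list of endpoint URLs, and `query_services`, `query_one` and `fake_query` are hypothetical names, with `fake_query` standing in for a real SIA query.

```python
from concurrent.futures import ThreadPoolExecutor


def query_services(services, query_one, max_workers=4):
    """Query every service concurrently and collate the candidate rows.

    `services` is a list of endpoint URLs (e.g. from a registry lookup);
    `query_one` is a caller-supplied function: endpoint -> list of row dicts.
    """
    collated = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `services`.
        for endpoint, rows in zip(services, pool.map(query_one, services)):
            for row in rows:
                # Remember which service each candidate dataset came from.
                collated.append({"service": endpoint, **row})
    return collated


def fake_query(endpoint):
    """Stand-in for a real query function, used only for illustration."""
    return [{"title": "image from " + endpoint}]


results = query_services(["http://svc-a", "http://svc-b"], fake_query)
```

The client would then analyze the collated rows and retrieve only the datasets it selects.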

Simple Image Access (SIA)
Basic Usage, Single Service
– Query
    find data of interest from a single service
    http://base-url?POS=12.0,0.0&SIZE=0.2&FORMAT=image/fits
– Query response
    VOTable, one row per candidate dataset
    "access reference" (a URL) points to data
– Data selection
    Performed by the client using query response metadata
– Dataset retrieval
    Retrieve actual datasets, if any
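A single-service SIA query URL could be built along these lines. This is a sketch only; `sia_query_url` is a hypothetical helper and `http://base-url` is the slide's placeholder endpoint.

```python
from urllib.parse import urlencode


def sia_query_url(base_url, ra, dec, size, fmt="image/fits"):
    """Build an SIA query URL from POS (ra,dec), SIZE and FORMAT."""
    params = {"POS": f"{ra},{dec}", "SIZE": size, "FORMAT": fmt}
    # Keep ',' and '/' unescaped so the URL reads like the slide's example.
    return base_url + "?" + urlencode(params, safe=",/")


print(sia_query_url("http://base-url", 12.0, 0.0, 0.2))
```

Percent-encoding the comma and slash would normally be equally valid; they are left readable here purely to mirror the example on the slide.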

Service Capabilities
Types of Services
– Atlas    Precomputed survey image (entire image)
– Pointed  Image from pointed observation (entire image)
– Cutout   Cutout of existing image (pixels unchanged)
– Mosaic   Reprojected image (pixels resampled)
Virtual Data
– Data model mediation
– Subsetting, filtering, etc. on the fly
– Possible to view same data in different ways
Interface
– RESTful interface currently (HTTP GET)
– Document oriented (VOTable, FITS, JPEG, etc.)

Data Model
SIA data model is the familiar "astronomical image"
– Generally this means a 2D sky projection
– Data array is logically a regular grid of pixels
– Encoded as a FITS image, GIF/JPEG, etc.
Standardized dataset metadata
– Provenance
– Image geometry
– Scale
– Format
– Position, WCS
– Time of observation
– Spectral bandpass
– Access information

Input Parameters
Required parameters
– POS     center of ROI (ra, dec in decimal degrees, ICRS)
– SIZE    width; or width, height
– FORMAT  ALL, GRAPHIC, image/fits, image/jpeg, text/html,…
Optional parameters
– INTERSECT  values: covers, enclosed, center, overlaps
– VERB       table verbosity
Service-defined parameters
– used to further refine queries, but not yet standardized
    e.g., BAND, SURVEY, etc.
Image generation parameters
– NAXIS, CFRAME, EQUINOX, CRPIX, CRVAL, CDELT, ROTANG, PROJ
    used for cutout/mosaic services to specify image to be generated
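On the service side, the required POS and SIZE values might be parsed and checked as sketched below. The function names are hypothetical, and the range checks and the reading of a single SIZE value as a square region are assumptions of this sketch rather than statements from the slide.

```python
def parse_pos(value):
    """Parse POS ('ra,dec' in decimal degrees) into a (ra, dec) tuple."""
    ra_text, dec_text = value.split(",")  # raises ValueError if not 2 parts
    ra, dec = float(ra_text), float(dec_text)
    # Assumed sanity limits for equatorial coordinates in degrees.
    if not (0.0 <= ra <= 360.0 and -90.0 <= dec <= 90.0):
        raise ValueError(f"POS out of range: {value!r}")
    return ra, dec


def parse_size(value):
    """Parse SIZE ('width' or 'width,height') into a (width, height) tuple."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) == 1:
        # Assumption: a single value applies to both axes.
        parts = parts * 2
    if len(parts) != 2 or any(p < 0 for p in parts):
        raise ValueError(f"bad SIZE: {value!r}")
    return tuple(parts)


print(parse_pos("12.0,0.0"), parse_size("0.2"))
```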

Query Response
Output is a VOTable
– Must contain a RESOURCE element, identified with the tag type="results", containing the results of the query.
The results resource contains a single table
– Each row of the table describes a single data object which can be retrieved.
The fields of the table describe the attributes of the dataset
– These are the attributes of the SIA data model
– In SIA 1.0, the UCD is used to identify the data model attribute
    e.g., POS_EQ_RA_MAIN, VOX:Image_Scale, etc.
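A client might read such a response by locating the results resource and keying each row by the UCD of its field. The sketch below uses only the Python standard library. The `SAMPLE` document is a made-up minimal VOTable, the function names are hypothetical, and the results resource is assumed to carry a `type="results"` attribute as in SIA 1.0.

```python
import xml.etree.ElementTree as ET

# Made-up minimal response, for illustration only.
SAMPLE = """<VOTABLE version="1.0">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" ucd="VOX:Image_Title" datatype="char" arraysize="*"/>
      <FIELD name="ra" ucd="POS_EQ_RA_MAIN" datatype="double"/>
      <FIELD name="url" ucd="VOX:Image_AccessReference" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>Example image</TD><TD>12.0</TD><TD>http://base-url/img1.fits</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Strip an XML namespace prefix such as '{uri}TR' down to 'TR'."""
    return tag.rsplit("}", 1)[-1]


def rows_by_ucd(votable_xml):
    """Return the rows of the results table as dicts keyed by field UCD."""
    root = ET.fromstring(votable_xml)
    for resource in root.iter():
        if local(resource.tag) == "RESOURCE" and resource.get("type") == "results":
            break
    else:
        raise ValueError("no RESOURCE with type='results' found")
    # FIELD order defines the column order of every TR row.
    ucds = [f.get("ucd") for f in resource.iter() if local(f.tag) == "FIELD"]
    rows = []
    for tr in resource.iter():
        if local(tr.tag) == "TR":
            cells = [td.text or "" for td in tr if local(td.tag) == "TD"]
            rows.append(dict(zip(ucds, cells)))
    return rows


rows = rows_by_ucd(SAMPLE)
```

Keying by UCD rather than by column name reflects the slide's point that, in SIA 1.0, the UCD identifies the data model attribute.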

Query Response
Image metadata
– Describes the image object (required)
Coordinate system metadata
– Image WCS
Spectral bandpass metadata
– Prototype data model describing spectral bandpass of image
Processing metadata
– Tells whether the service modified the image data
Access metadata
– Tells client how to access the dataset (required)
Resource-specific metadata
– Additional optional service-defined metadata describing image

Image Metadata
VOX:Image_Title      Brief description of image
POS_EQ_RA_MAIN       RA (ICRS)
POS_EQ_DEC_MAIN      Dec (ICRS)
INST_ID              Instrument name
VOX:Image_MJDateObs  MJD of observation
VOX:Image_Naxes      Number of image axes
VOX:Image_Naxis      Length of each axis
VOX:Image_Scale      Image scale, deg/pix
VOX:Image_Format     Image file format

Image Retrieval
Completely optional
– Typically only a fraction of the available images are retrieved
Query response
– If an access reference is provided, the data can be retrieved
– SIAP can also be used to describe data which is not online
– The same data may be available in multiple formats
Image retrieval
– Very simple; access reference is a URL
– Standard tools can be used to fetch the data (browser, wget, curl, i/o library, etc.)
– Data is often computed on-the-fly
– All retrieval is synchronous (currently)
– No provision for restricting access (currently)
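Because the access reference is just a URL, retrieval with a standard I/O library can be as short as the sketch below. The helper name `fetch` and the example paths are hypothetical, and no error handling or retry logic is attempted.

```python
import shutil
from urllib.request import urlopen


def fetch(access_reference, dest_path):
    """Synchronously copy the dataset behind an access reference to a file."""
    with urlopen(access_reference) as response, open(dest_path, "wb") as out:
        # Stream in chunks so large images need not fit in memory.
        shutil.copyfileobj(response, out)
    return dest_path


# Usage (placeholder URL, not a real service):
# fetch("http://base-url/img1.fits", "img1.fits")
```

Since the data is often computed on the fly and retrieval is synchronous, a call like this may block for as long as the service needs to generate the image.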

Service Registration

Future Development
SIA V1.1
– Based on work done on SSA
– Expanded query interface
    no longer limited to positional queries
– Much richer query response
    generic dataset identification, characterization, etc.
    metadata extension mechanism
– Selected features
    VOTable 1.1 with UCD 1+, GROUP, UTYPE
    query response can be ordered by "score"
    logical groupings of related query records
    compression support
– Versioning
    required to make protocol upgrades manageable

Future Development
Service verification
– for testing at development time
– when registered; level of compliance metric
Grid capabilities
– Data staging
    asynchronous image generation (long running jobs)
    batch generation of images (multiple images)
– Data management
    support for single sign-on authentication, authorization
    network data caching, third party delivery (VOStore etc.)
– Web service interface
    resource metadata
    service availability (etc.)
ADQL integration
– Capability to use query language for queries

Simple Spectral Access

Simple Spectral Access (SSA)
What is it?
– Provides access to 1D spectra, time series, SEDs
– Tabular spectrophotometric data (photometry points)
– Represents second generation, data model-based DAL interfaces
Status
– Draft V0.9 query interface reviewed in Kyoto (May 05)
– Revisions in progress; draft PR targeted for Madrid (Oct 05)
– Much work done on data models; however, these are still being revised
– Some initial prototypes already exist (services, client apps)
IVOA/Madrid discussions will be held immediately after the ADASS and are open to all

Basic Usage
SSA specification may be complex, but basic usage is simple
Simple query
– POS, SIZE, FORMAT - like cone search, SIA
– Possibly refined by spectral or time bandpass, etc.
– Most metadata in query response is optional
Data retrieval
– Simple retrieval is again URL-based
– Get back a dataset "document" (VOTable, FITS, JPEG, etc.)
– In simplest case could be wavelength, flux as text (for Spectrum)
– Pass-through of external data is permitted
Data Analysis
– Standard data model isolates application from quirks of external project data

Concepts - Dataset-oriented
Data object type
– Spectrum, TimeSeries, SED
Dataset creation type
– Atlas      Whole datasets, uniform survey data
– Pointed    Whole datasets, variable instrumental data
– Cutout     Subset, data samples are not modified
– Resampled  Subset, data samples computed by service
Dataset derivation
– Observed   An observation
– Composite  Combination of several observations
– Simulated  Simulated observation made from real data
– Synthetic  Data from a theoretical model

Data Models
Data models used in SSA
– Spectral data     Spectrum, TimeSeries, SED
– Dataset           Generic dataset descriptor
– Target            Astronomical target observed
– Curation          Origin of data
– Characterization  Physical characteristics of data
– Provenance        Instrument which generated the data
User defined data models
– Metadata extension mechanisms
    additional data model attributes (table fields)
    additional resources in VOTable, linked back to main table
– Provide a mechanism to "subclass" dataset to tailor it for a given data collection

Spectral Data (SED)
[Figure labels: spectrum segment, photometry point]

Spectral/SED Data Model

Query Interface
Mandatory query parameters
– POS     RA, DEC (ICRS)
– SIZE    diameter (decimal degrees)
– TIME    date1,date2 (epoch in decimal years UTC)
– BAND    wave1,wave2 (meters in vacuum; source or observer)
– FORMAT  VOTable, fits, xml, text, graphics, html, external

Query Interface
Recommended query parameters
– APERTURE    approx spatial resolution (decimal degrees)
– SPECRES     spectral resolution (meters)
– TOP         number of top-ranked records to return
– OBJTYPE     mandatory if service returns multiple object types
– COLLECTION  data collection identifier

Query Interface
Optional parameters
– CREATORID    creator-assigned dataset identifier (at most 1)
– PUBID        publisher-assigned dataset identifier (at most N)
– COMPRESS     enable compression (for both data _and_ queries?)
– SNR          signal-to-noise ratio
– REDSHIFT     redshift range (dlambda/lambda)
– TARGETCLASS  star, galaxy, pulsar, PN, QSO, AGN, etc.
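An SSA query combining the parameters listed on the Query Interface slides might be assembled as sketched below. The parameter names come from the draft V0.9 slides and may change in later revisions. The helper `ssa_query_url` is hypothetical, and treating BAND and TIME as omissible refinements is an assumption of this sketch.

```python
from urllib.parse import urlencode


def ssa_query_url(base_url, ra, dec, size, band=None, time=None,
                  fmt="VOTable", **extra):
    """Build an SSA query URL.

    band  -- (wave1, wave2) in meters
    time  -- (date1, date2) as decimal years
    extra -- further parameters from the slides, e.g. TOP=10, SNR=5
    """
    params = {"POS": f"{ra},{dec}", "SIZE": size, "FORMAT": fmt}
    if band is not None:
        params["BAND"] = f"{band[0]},{band[1]}"
    if time is not None:
        params["TIME"] = f"{time[0]},{time[1]}"
    params.update(extra)
    # Leave ',' and '/' readable, as in the cone search and SIA examples.
    return base_url + "?" + urlencode(params, safe=",/")


# 400-700 nm expressed in meters; placeholder endpoint.
url = ssa_query_url("http://base-url", 12.0, 0.0, 0.2,
                    band=(4e-07, 7e-07), TOP=10)
print(url)
```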

Query Response
Classes of query metadata
– Query metadata       Describes the query itself
– Dataset metadata     Describes data object; object-specific
– Target metadata      Astronomical target
– Curation metadata    External identification of dataset
– Characterization     Coverage, Accuracy, Frame, etc.
– Instrument metadata  Service-defined; hard to standardize
– Access metadata      Describes how to access the dataset

Query Response
Query Metadata
– Query.Score     How well object matches query
– Query.LName     Logical name (identifier)
– Query.LNameKey  Logical name key (id-ref)
Example: LName="MyObj123" LNameKey="server,format"
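On the client side, Query.Score and Query.LName could be used to rank records and to group alternative versions of the same logical dataset. The sketch below assumes that a higher score means a better match and that records arrive as plain dicts keyed by attribute name. Both are assumptions of this illustration, as are the function names.

```python
from collections import defaultdict


def top_ranked(records, top=None):
    """Sort records by Query.Score (assumed: higher is better); keep `top`."""
    ranked = sorted(records, key=lambda r: float(r["Query.Score"]), reverse=True)
    return ranked if top is None else ranked[:top]


def group_by_lname(records):
    """Group records that share a logical name (same dataset, other forms)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["Query.LName"]].append(record)
    return dict(groups)


# Made-up records for illustration.
records = [
    {"Query.Score": "0.4", "Query.LName": "MyObj123", "Access.Format": "fits"},
    {"Query.Score": "0.9", "Query.LName": "MyObj456", "Access.Format": "votable"},
    {"Query.Score": "0.7", "Query.LName": "MyObj123", "Access.Format": "votable"},
]
best = top_ranked(records, top=2)
groups = group_by_lname(records)
```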

Query Response
Dataset Metadata
– Dataset.Type              Spectrum, TimeSeries, SED, etc.
– Dataset.DataModel         DM name, e.g., "SSA-V0.90"
– Dataset.Title             Brief descriptive title of dataset
– Dataset.SSA.NSamples      Total samples in dataset
– Dataset.SSA.Aperture      Characteristic aperture diameter
– Dataset.SSA.TimeAxis      TimeCoord axis (external data)
– Dataset.SSA.SpectralAxis  SpectralCoord axis (external data)
– Dataset.SSA.FluxAxis      Flux axis (external data)
– Dataset.CreationType      atlas, pointed, cutout, resampled
– Dataset.Derivation        observed, composite, simulated, synthetic

Query Response
Target Metadata
– Target.Name           Name of astronomical object
– Target.Class          Target class (star, galaxy, QSO, etc.)
– Target.SpectralClass  Spectral class (e.g., 'O', 'B', etc.)
– Target.Redshift       Nominal redshift for object
– Derived.VarAmpl       Variability amplitude (fraction 0-1)
– Derived.SNR           Observed signal to noise ratio

Query Response
Curation Metadata
– Curation.Collection   Data collection name (identifier)
– Curation.Creator      Creator identity (identifier)
– Curation.CreatorID    Creator-assigned dataset identifier
– Curation.PublisherID  Publisher-assigned dataset identifier
– Curation.Date         Dataset creation date (ISO date string)
– Curation.Version      Dataset version (within same ID)

Query Response
Characterization 1 - Coverage
– .Location.Spatial          Position (e.g., RA, DEC)
– .Location.Time             Observation time characteristic value
– .Location.Spectral         Spectral bandpass characteristic value
– .Location.Spectral.BandID  Bandpass ID (band or filter name)
– .Bounds.Spatial            Aperture footprint (polygon on sky)
– .Bounds.Time               Low/High time values
– .Bounds.Spectral           Low/High spectral values
– .Bounds.Flux               Limiting flux, saturation limit (Jansky)
– .Fill.Spatial              Spatial sampling filling factor (0-1)
– .Fill.Time                 Time sampling filling factor (0-1)
– .Fill.Spectral             Spectral sampling filling factor (0-1)

Query Response
Characterization 2 - Accuracy
– Accuracy.*.Calibrated  uncalibrated, relative, absolute
– Accuracy.*.Resolution  Resolution of measured signal
– Accuracy.*.StatErr     Statistical error (measured)
– Accuracy.*.SysErr      Systematic error (estimated)
('*' = Spatial, Time, Spectral, Flux)

Query Response
Characterization 3 - Reference Frames
– Frame.Spatial.Type     Coordinate frame (default ICRS)
– Frame.Spatial.Equinox  Coordinate system equinox (J2000)
– Frame.Time.System      Timescale (TT)
– Frame.Time.SIDim       SI factor and dimension
– Frame.Spectral.SIDim   SI factor and dimension
– Frame.Flux.SIDim       SI factor and dimension
– Frame.Flux.UCD         UCD of flux value (flux type)
(These apply only to the query response)
(SIDim metadata still under construction)

Query Response
Instrument Metadata
– Instrument.Name      Instrument name (identifier)
– Instrument.Exposure  Total exposure time (seconds)
– Instrument.          Service-defined
Notes
– Optional; provided for instrumental data collections
– In general, Collection, Bounds.Time, etc. are preferred
– In general, Instrument metadata is service-defined
– Use Observation model as a starting point

Query Response
Access Metadata
– Access.Reference  Data access URL
– Access.Format     MIME type of returned dataset
– Access.Size       Approximate dataset size (bytes)
– Access.Server     Server endpoint URL
Staging support goes here in the future
– e.g., will dataset access require asynchronous staging
– estimated cost to construct dataset

Service Metadata
Usage
– Describe service type and capabilities
– Characterize service (data resources served, coverage, etc.)
– Describe interface (optional query parameters)
Interface
– Requires new service metadata query method
– Returns resource metadata descriptor (XML)
Format
– Registry resource descriptor (XML)

Data Retrieval
Based on GET as with SIA
– Variety of formats available
– Compression supported
Data representation
– Data model defines logical content of data
– The same data object may be represented in various formats
– Hence we need to specify both the data model and the file format

Data Retrieval
Data models
– SSA data model for fully-compliant data
– Provider-defined data model for external data
Data formats
– VOTable (a container), native XML (direct serialization)
– FITS binary table (another container; uses FITS spectral WCS)
– Text, e.g., CSV
– Graphics (JPEG etc.)
– text/html (rendered into browser page)
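For the text format, a spectrum returned as plain wavelength, flux pairs could be read as sketched below. The exact layout is an assumption of this example, not something the slides define: two comma-separated columns, with optional `#` comment lines.

```python
import csv
import io


def read_text_spectrum(text):
    """Parse 'wavelength,flux' lines into a list of (wavelength, flux) floats."""
    points = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and '#' comments (assumed convention).
        if not row or row[0].lstrip().startswith("#"):
            continue
        points.append((float(row[0]), float(row[1])))
    return points


# Made-up sample data for illustration.
sample = "# wavelength_m,flux\n4.0e-7,1.5\n\n5.0e-7,2.0\n"
spectrum = read_text_spectrum(sample)
```

A container format such as VOTable or a FITS binary table would additionally carry the data model metadata that this bare text form leaves out.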