Aggregating Metadata from Multiple Archives: a Non-VO Approach. Stephen Gwyn, Canadian Astronomy Data Centre.

Presentation transcript:

Aggregating Metadata from Multiple Archives: a Non-VO Approach. Stephen Gwyn, Canadian Astronomy Data Centre (CADC)

- Astronomy is using more and more archival data
- More than 50% of HST papers are archival
- Similar trends for other telescopes
- Harder for solar system astronomy
SSOIS (Solar System Object Image Search) allows users to search for images of moving targets.


Initially, only data from CFHT/MegaCam was searched.

Next, data from external telescope archives were added: NEAT, CFHT, Subaru, ESO, Gemini, AAT, SDSS, NOAO, ING, HST, WISE.


Scraping external archives:
For each image, we need:
- position (RA, Dec)
- field of view
- MJD of mid-exposure
- filter
- exposure time
- target name
- URL to data
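The per-image fields listed above map naturally onto a small record type. The following is only a sketch: the class name, field names, and units are illustrative choices, not the SSOIS schema. It also assumes that an archive reports the start of the exposure rather than its midpoint, in which case the mid-exposure MJD has to be derived from the start time and the exposure time.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400.0


@dataclass
class ImageRecord:
    """One harvested image; field names and units are hypothetical."""
    ra_deg: float       # position: right ascension, degrees
    dec_deg: float      # position: declination, degrees
    fov_deg: float      # field of view, degrees
    mjd_mid: float      # MJD of mid-exposure
    filter_name: str
    exptime_s: float    # exposure time, seconds
    target: str
    url: str            # link back to the data


def mjd_mid_exposure(mjd_start: float, exptime_s: float) -> float:
    """Shift a start-of-exposure MJD by half the exposure time."""
    return mjd_start + 0.5 * exptime_s / SECONDS_PER_DAY
```

For a moving target the midpoint matters more than the start time, because the object's predicted position is evaluated at a single instant per image.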


There are a variety of data archive interfaces....

Scraping external archives:
- In an ideal world: one query to get all metadata
- In real life: row limits
- As the archives are updated, they need to be re-scraped periodically
- Programmatic retrieval is required

Use SIAP?
Advantages:
- A single tool can scrape multiple archives
Disadvantages:
- Not all archives have an SIAP interface
- Many SIAP services do not conform to the VO standard
- Not all SIAP services contain all the necessary metadata
- Most archives have at least one heavily observed patch of sky: hit the row limit again
- SIAP services vary in their ability to handle positional queries:
  - maximum search area
  - search is circle or box
  - may require 10^5 queries: may be perceived as a DoS attack
Far better off scraping by day/night/MJD:
- Almost all telescopes take <10000 observations per 24 hours
- Can re-scrape with fewer queries
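Scraping by night can be sketched as a loop over one-day MJD windows, with a fallback that halves any window whose result count reaches the row limit. This is a minimal illustration, not the SSOIS code: `query` stands for any archive-specific function (hypothetical here) that returns the rows with start <= MJD < end, truncated at the archive's row limit.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Query = Callable[[float, float], List[Row]]  # (mjd_start, mjd_end) -> rows


def harvest(query: Query, mjd_start: float, mjd_end: float,
            row_limit: int, min_span: float = 1.0 / 1440) -> List[Row]:
    """Fetch one window; if the row limit was hit, split it in half and retry."""
    rows = query(mjd_start, mjd_end)
    if len(rows) < row_limit or (mjd_end - mjd_start) <= min_span:
        return rows
    mid = 0.5 * (mjd_start + mjd_end)
    return (harvest(query, mjd_start, mid, row_limit, min_span)
            + harvest(query, mid, mjd_end, row_limit, min_span))


def harvest_by_night(query: Query, first_mjd: int, last_mjd: int,
                     row_limit: int) -> List[Row]:
    """One query per 24-hour window covering [first_mjd, last_mjd)."""
    rows: List[Row] = []
    for night in range(first_mjd, last_mjd):
        rows.extend(harvest(query, float(night), float(night + 1), row_limit))
    return rows
```

Re-scraping then only needs the windows since the last harvest, which is where the saving in query count comes from.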

Scraping by RA/Dec
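As a rough sanity check on the query count for positional scraping, the number of box queries needed to tile the whole sky can be estimated from the sky area divided by the box area. The box side used below (0.65 degrees) is an assumed value chosen for illustration, not a limit quoted by any particular archive, and the estimate ignores tile overlap, so it is a lower bound.

```python
import math

# Full sky: 4*pi steradians, converted to square degrees (about 41253).
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def positional_queries(box_side_deg: float) -> int:
    """Lower bound on box queries needed to cover the whole sky."""
    return math.ceil(FULL_SKY_SQ_DEG / box_side_deg ** 2)


def nightly_queries(years: float) -> int:
    """Queries needed when issuing one per 24-hour window."""
    return math.ceil(years * 365.25)


if __name__ == "__main__":
    print(positional_queries(0.65), "positional queries")
    print(nightly_queries(10), "nightly queries for ten years")
```

Under that assumption the positional approach lands near 10^5 queries, while a decade of nightly windows needs a few thousand.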

Scraping by Date

Older archive interfaces:
- Query page + simple CGI result page
- View source on the query page
- Get form inputs
- Issue repeated queries to the CGI result page using GET or POST with wget/curl/scripting API
- Easy
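The "view source, get form inputs" step can be automated with the Python standard library. The sketch below pulls the form action and named inputs out of a query page and builds a GET URL; the HTML and field names in the usage note are made up, and a real archive form may need a POST or extra hidden fields.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormInputs(HTMLParser):
    """Collect the action, method, and named inputs of a query form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action")
            self.method = (a.get("method") or "get").lower()
        elif tag in ("input", "select", "textarea") and a.get("name"):
            # Keep the page's default value; callers override what they need.
            self.fields[a["name"]] = a.get("value") or ""


def build_get_url(page_html: str, overrides: dict) -> str:
    """Combine the form defaults with per-query values into a GET URL."""
    parser = FormInputs()
    parser.feed(page_html)
    params = {**parser.fields, **overrides}
    return f"{parser.action}?{urlencode(params)}"
```

Given a page containing `<form action="/cgi-bin/query"><input name="date" value=""></form>`, calling `build_get_url(page, {"date": "2010-01-01"})` yields a URL that can be fetched repeatedly with `urllib.request.urlopen` (after resolving it against the page address with `urllib.parse.urljoin`), one date per request.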

Newer archive interfaces:
- AJAX/HTML5/etc. page
- Download JavaScript and run it through a de-obfuscator
- Locate the relevant XMLHttpRequest
- Determine if cookies are necessary
- Issue repeated queries to the XMLHttpRequest URLs
- Much harder
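Two of these steps lend themselves to a small helper: searching the de-obfuscated JavaScript for `XMLHttpRequest` `open(...)` calls, and keeping cookies between requests. The sketch below is one possible approach, not the method used for SSOIS; the regular expression only catches calls whose method and URL are string literals, so URLs assembled at run time still need manual inspection.

```python
import re
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

# Matches  .open("GET", "/some/url"  with either quote style.
XHR_OPEN = re.compile(
    r"""\.open\(\s*["'](GET|POST)["']\s*,\s*["']([^"']+)["']"""
)


def find_xhr_endpoints(js_source: str):
    """Return (method, url) pairs for literal XMLHttpRequest.open calls."""
    return [(m.group(1), m.group(2)) for m in XHR_OPEN.finditer(js_source)]


def make_session():
    """An opener that stores and resends cookies, in case the archive needs them."""
    return build_opener(HTTPCookieProcessor(CookieJar()))
```

Fetching the landing page once through the opener returned by `make_session()` before issuing the data queries is usually enough to find out whether cookies are required.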

Easiest of all...

A script to get all Subaru/SuprimeCam metadata... #!/bin/bash wget wget wget wget wget wget wget wget wget wget wget wget wget wget wget wget

The second easiest: CADC's Advanced Search


cnrc.gc.ca/tap/sync?LANG=ADQL&REQUEST=doQuery&QUERY=SELECT%20Observation.observationURI%20AS%20%22Preview%22%2C%20Observation.collection%20AS%20%22Collection%22%2C%20Observation.observationID%20AS%20%22Obs.%20ID%22%2C%20COORD1(CENTROID(Plane.position_bounds))%20AS%20%22RA%20(J2000.0)%22%2C%20COORD2(CENTROID(Plane.position_bounds))%20AS%20%22Dec.%20(J2000.0)%22%2C%20Plane.time_bounds_cval1%20AS%20%22Start%20Date%22%2C%20Observation.instrument_name%20AS%20%22Instrument%22%2C%20Plane.time_exposure%20AS%20%22Int.%20Time%22%2C%20Observation.target_name%20AS%20%22Target%20Name%22%2C%20Plane.energy_bandpassName%20AS%20%22Filter%22%2C%20Plane.calibrationLevel%20AS%20%22Cal.%20Lev.%22%2C%20Observation.type%20AS%20%22Obs.%20Type%22%2C%20Plane.energy_bounds_cval1%20AS%20%22Min.%20Wavelength%22%2C%20Plane.energy_bounds_cval2%20AS%20%22Max.%20Wavelength%22%2C%20Observation.proposal_id%20AS%20%22Proposal%20ID%22%2C%20Observation.proposal_pi%20AS%20%22P.I.%20Name%22%2C%20Plane.productID%20AS%20%22Product%20ID%22%2C%20Plane.dataRelease%20AS%20%22Data%20Release%22%2C%20AREA(Plane.position_bounds)%20AS%20%22Field%20of%20View%22%2C%20Plane.position_sampleSize%20AS%20%22Pixel%20Scale%22%2C%20Plane.dataProductType%20AS%20%22Data%20Type%22%2C%20Plane.position_timeDependent%20AS%20%22Moving%20Target%22%2C%20Plane.provenance_name%20AS%20%22Provenance%20Name%22%2C%20Plane.provenance_keywords%20AS%20%22Provenance%20Keywords%22%2C%20Observation.intent%20AS%20%22Intent%22%2C%20Observation.target_type%20AS%20%22Target%20Type%22%2C%20Observation.target_standard%20AS%20%22Target%20Standard%22%2C%20Plane.metaRelease%20AS%20%22Meta%20Release%22%2C%20Observation.sequenceNumber%20AS%20%22Sequence%20Number%22%2C%20Observation.algorithm_name%20AS%20%22Algorithm%20Name%22%2C%20Observation.proposal_title%20AS%20%22Proposal%20Title%22%2C%20Observation.proposal_keywords%20AS%20%22Proposal%20Keywords%22%2C%20Observation.proposal_project%20AS%20%22Proposal%20Project%22%2C%20Plane.position_bounds%20AS%20%22Polygon%22%2C%20Plane.energy_emBand%20AS%20%22Band%22%2C%20Plane.provenance_reference%20AS%20%22Prov.%20Reference%22%2C%20Plane.provenance_version%20AS%20%22Prov.%20Version%22%2C%20Plane.provenance_project%20AS%20%22Prov.%20Project%22%2C%20Plane.provenance_producer%20AS%20%22Prov.%20Producer%22%2C%20Plane.provenance_runID%20AS%20%22Prov.%20Run%20ID%22%2C%20Plane.provenance_lastExecuted%20AS%20%22Prov.%20Last%20Executed%22%2C%20Plane.provenance_inputs%20AS%20%22Prov.%20Inputs%22%2C%20Plane.energy_restwav%20AS%20%22Rest-frame%20Spectral%20Coverage%22%2C%20Plane.planeID%20AS%20%22planeID%22%2C%20isDownloadable(Plane.planeURI)%20AS%20%22DOWNLOADABLE%22%2C%20Plane.planeURI%20AS%20%22CAOM%20Plane%20URI%22%2C%20Observation.instrument_keywords%20AS%20%22Instrument%20Keywords%22%2C%20Plane.energy_transition_species%20AS%20%22Molecule%22%2C%20Plane.energy_transition_transition%20AS%20%22Transition%22%2C%20Plane.position_resolution%20AS%20%22IQ%22%20FROM%20caom2.Plane%20AS%20Plane%20JOIN%20caom2.Observation%20AS%20Observation%20ON%20Plane.obsID%20%3D%20Observation.obsID%20WHERE%20%20(%20Observation.instrument_name%20%3D%20%27MegaPrime%27%20AND%20Observation.collection%20%3D%20%27CFHT%27%20)&FORMAT=tsv
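A URL of this shape does not have to be assembled by hand. The sketch below builds the same four parameters (`LANG`, `REQUEST`, `QUERY`, `FORMAT`) from a readable ADQL string. The base URL is deliberately left as an argument because the host is truncated in the slide; `https://example.org/tap/sync` in the usage note is a placeholder, not the CADC address.

```python
from urllib.parse import quote, urlencode


def tap_sync_url(base_url: str, adql: str, fmt: str = "tsv") -> str:
    """Build a synchronous TAP query URL with percent-encoded ADQL."""
    params = {
        "LANG": "ADQL",
        "REQUEST": "doQuery",
        "QUERY": adql,
        "FORMAT": fmt,
    }
    # quote_via=quote encodes spaces as %20 (as in the slide) instead of '+'.
    return f"{base_url}?{urlencode(params, quote_via=quote)}"
```

For example, `tap_sync_url("https://example.org/tap/sync", "SELECT ... WHERE Observation.collection = 'CFHT'")` encodes the spaces, equals sign, and quotes the same way as the URL above. Python also encodes the parentheses as `%28` and `%29`, which is equivalent once the server decodes it.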

SELECT
  Observation.observationURI AS "Preview",
  Observation.collection AS "Collection",
  Observation.observationID AS "Obs. ID",
  COORD1(CENTROID(Plane.position_bounds)) AS "RA (J2000.0)",
  COORD2(CENTROID(Plane.position_bounds)) AS "Dec. (J2000.0)",
  Plane.time_bounds_cval1 AS "Start Date",
  Observation.instrument_name AS "Instrument",
  Plane.time_exposure AS "Int. Time",
  Observation.target_name AS "Target Name",
  Plane.energy_bandpassName AS "Filter",
  Plane.calibrationLevel AS "Cal. Lev.",
  Observation.type AS "Obs. Type",
  Plane.energy_bounds_cval1 AS "Min. Wavelength",
  Plane.energy_bounds_cval2 AS "Max. Wavelength",
  Observation.proposal_id AS "Proposal ID",
  Observation.proposal_pi AS "P.I. Name",
  Plane.productID AS "Product ID",
  Plane.dataRelease AS "Data Release",
  AREA(Plane.position_bounds) AS "Field of View",
  Plane.position_sampleSize AS "Pixel Scale",
  Plane.dataProductType AS "Data Type",
  Plane.position_timeDependent AS "Moving Target",
  Plane.provenance_name AS "Provenance Name",
  Observation.intent AS "Intent",
  Observation.target_type AS "Target Type",
  Observation.target_standard AS "Target Standard",
  Observation.sequenceNumber AS "Sequence Number",
  Observation.algorithm_name AS "Algorithm Name",
  Observation.proposal_title AS "Proposal Title",
  Observation.proposal_keywords AS "Proposal Keywords",
  Plane.energy_emBand AS "Band",
  Plane.provenance_version AS "Prov. Version",
  Plane.provenance_project AS "Prov. Project",
  Plane.provenance_runID AS "Prov. Run ID",
  Plane.provenance_lastExecuted AS "Prov. Last Executed",
  Plane.energy_restwav AS "Rest-frame Spectral Coverage",
  isDownloadable(Plane.planeURI) AS "DOWNLOADABLE",
  Plane.planeURI AS "CAOM Plane URI",
  Observation.instrument_keywords AS "Instrument Keywords",
  Plane.energy_transition_species AS "Molecule",
  Plane.energy_transition_transition AS "Transition",
  Plane.position_resolution AS "IQ"
FROM caom2.Plane AS Plane
JOIN caom2.Observation AS Observation ON Plane.obsID = Observation.obsID
WHERE ( Observation.collection = 'CFHT' )
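With `FORMAT=tsv`, the response can be read with the standard `csv` module. The sketch below assumes that the first line of the response is a header row carrying the `AS` aliases from the query; that is worth confirming against a real response before relying on it. The sample text in the usage note is invented.

```python
import csv
import io
from typing import Dict, List


def parse_tsv(text: str) -> List[Dict[str, str]]:
    """Turn a tab-separated response into one dict per row, keyed by column alias."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in values as plain text
    )
    return list(reader)
```

For instance, `parse_tsv("Obs. ID\tFilter\tInt. Time\n1000\tr\t300.0\n")` gives a single row whose `"Filter"` entry is `"r"`; numeric columns arrive as strings and need explicit conversion.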

The other hard part:
- Parsing downloaded metadata
- Which observations are images?
- Quality control:
  - is MJD right?
  - Are coordinates or ?
- Sorting out filters:
  - remove narrow band filter data
  - remove bad filters
  - remove grism data
  - maybe homogenize filter names (B vs Bj vs Bjohnson vs Johnson B vs...)
- Telescope footprint not typically part of the metadata
- Work out links back to original images
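Some of these checks are easy to express in code. The sketch below homogenizes the filter-name variants mentioned above and applies simple plausibility tests to MJD and coordinates. The alias table covers only the four B variants from the slide, and the MJD bounds (40000 to 70000, roughly 1968 to 2050) are assumed values chosen for illustration, so both would need extending for a real archive.

```python
import re

# Normalized spelling -> canonical name; only the variants named on the slide.
FILTER_ALIASES = {"b": "B", "bj": "B", "bjohnson": "B", "johnsonb": "B"}


def homogenize_filter(name: str) -> str:
    """Map known spelling variants to one canonical filter name."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    return FILTER_ALIASES.get(key, name.strip())


def plausible_mjd(mjd: float, earliest: float = 40000.0,
                  latest: float = 70000.0) -> bool:
    """Reject values outside an assumed observing era (e.g. a JD given as MJD)."""
    return earliest <= mjd <= latest


def plausible_coords(ra_deg: float, dec_deg: float) -> bool:
    """Check that RA and Dec are within the valid range in degrees."""
    return 0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0
```

A Julian Date entered in an MJD column is larger by 2400000.5, so it fails `plausible_mjd` immediately and can be flagged for correction rather than silently ingested.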

SSOIS saves the Earth....

Summary:
- SSOIS allows multi-archive searches for moving objects
- Metadata is harvested from external archives
- Lessons learned:
  - SIAP is not useful for metadata harvesting
  - multiple queries by time not by position
  - older interfaces are easier to scrape
  - parsing metadata often harder than retrieving it
