Data Centres in the Virtual Observatory Age


Data Centres in the Virtual Observatory Age
David Schade, Canadian Astronomy Data Centre
PV2009, Madrid, December 3, 2009

A few things I've learned in the past two days
- There exist serious efforts at Long-Term Data Preservation: the Alliance for Permanent Access; CASPAR, an interesting top-down approach to the broad study of digital data preservation
- Older data degrades in value as understanding of it gets fuzzy
- I don't know much about Long-Term Preservation

CADC History: Data Collections
Formed in 1986 to distribute HST data to Canadian astronomers. Collections: HST.

CADC History: Data Collections
Added CFHT, the Canada France Hawaii Telescope. Collections: HST, CFHT.

CADC History: Data Collections
Added Gemini, the Gemini Observatories in Hawaii and Chile. Collections: HST, CFHT, Gemini.

CADC History: Data Collections
Added JCMT, the James Clerk Maxwell Telescope (sub-mm). Collections: HST, CFHT, Gemini, JCMT.

CADC 2009: Data Collections
Collections: HST, CFHT, Gemini, JCMT, CGPS, MOST, FUSE, BLAST, MACHO.
- Deliver 2 terabytes per week to users
- 2,500 distinct users in 87 countries
- Serve all Canadian astronomy research universities
- Large astronomy data centre (400 TB)
- 25 staff

CADC Data Flows (last year)

The key to our success
An intimate and intense working relationship between CADC software developers, scientists, and operations staff.

Added value is our business
- Query capability and web interfaces
- On-the-fly calibration for HST
- Data processing: WFPC2, ACS, Hubble Legacy Archive, CFHT mosaic cameras
- Virtual Observatory: integration of services locally and globally

Old model for data management
CFHT → Principal Investigator.

Old model for data management
CFHT → Principal Investigator, with a Data Archive serving Archive Users on the side.

Current model for data management
CADC is in the main stream of data flow: all data and metadata flow through the data centre.
CFHT → Processing Centre → Data Centre → Principal Investigator, Survey Teams, Archival Researchers, Virtual Observatory.

Long Term Data Preservation
- We are not in the LTDP business
- We do not have the right staff for LTDP
- We don't have funding for LTDP
- We pursue funding based on support of leading-edge science, especially surveys
- Our work lays the foundation for LTDP
- We need a proper model for LTDP for these major collections

Data Security
- We maintain 2 copies of each file on spinning disk (GEN 5 now)
- We maintain 2 copies on tape (offsite backup)
- Our recovery-from-backup system is not well tested
- This is not a very high level of security, and we are not satisfied with this situation
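
One inexpensive guard against silent divergence between the two disk copies is a periodic checksum audit. The following is a minimal sketch only, not CADC's actual tooling; the mount points and function names are hypothetical:

```python
import hashlib
from pathlib import Path

def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large FITS files never sit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit_replicas(primary_root: Path, replica_root: Path) -> list:
    """Return relative paths whose replica is missing or whose checksum
    disagrees with the primary copy."""
    problems = []
    for primary in primary_root.rglob("*"):
        if not primary.is_file():
            continue
        rel = primary.relative_to(primary_root)
        replica = replica_root / rel
        if not replica.is_file() or file_md5(primary) != file_md5(replica):
            problems.append(rel)
    return problems

if __name__ == "__main__":
    # Hypothetical mount points for the two spinning-disk copies.
    for rel in audit_replicas(Path("/archive/copy1"), Path("/archive/copy2")):
        print(f"replica mismatch or missing: {rel}")
```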

Important roles for CADC
CFHT → Data Centre → Principal Investigator, Survey Teams, Archival Researchers, Virtual Observatory.

Important roles for CADC
- We play an important role in working with observatories (HST, JCMT, Gemini, CFHT) to improve their data and metadata collection methods and quality (INTENSE interaction)
- This results in great improvements to "archived" data quality
- PI and archive data streams are the same: this is good for data quality

Integration of data and services
We have taken a few steps beyond the "archive-specific" data management paradigm: CFHT, Gemini, JCMT, HST → USER.

Common Archive Observation Model
We have taken a few steps beyond the "archive-specific" data management paradigm: HST, CFHT, JCMT, Gemini → Integration → USER.

Common Archive Observation Model
- Transform "archive-specific metadata" [fitstoCAOM] into a common model that represents various data products, provenance, etc. for multi-facility data
- All downstream software is a single stack (query, data access, VO services)
- CAOM supports a science user view of the observations
- It does not necessarily support LTDP
HST, CFHT, JCMT, Gemini → CAOM → Integration.

Common Archive Observation Model
To be implemented in each archive; mirrored in the data warehouse.
Purpose:
- Standardize the core of every archive
- The only metadata interface between archives and the data warehouse
- A general-purpose infrastructure to respond to evolving VO standards
Model:
- Inspired by VO work: Observation, Characterization, SIA, SSA, Authentication, etc.
- Built on our archive, data-modelling, and VO experience
- Characterisation based on FITS WCS Papers I, II, and III
Tools: data access (web pages and proxies), discovery agents, advanced query capabilities, advanced applications.
A minimal code sketch of such a model follows.
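
To make the "common model" idea concrete, here is a hedged sketch in Python. These are not the real CAOM class definitions (the actual model, maintained by CADC, is far richer); the class names, fields, and FITS keywords below are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Provenance:
    """Minimal provenance: what produced this product, from which inputs."""
    producer: str
    inputs: list = field(default_factory=list)

@dataclass
class Plane:
    """One data product (e.g. raw or calibrated) within an observation."""
    product_id: str
    calibration_level: int                 # illustrative: 0 = raw, 2 = calibrated
    provenance: Optional[Provenance] = None

@dataclass
class Observation:
    """Archive-neutral description of a single observation."""
    collection: str                        # e.g. "CFHT", "HST"
    observation_id: str
    instrument: str
    planes: list = field(default_factory=list)

def fits_to_common_model(collection: str, header: dict) -> Observation:
    """A toy 'fitstoCAOM'-style mapper: translate archive-specific FITS
    keywords into the common model. Each archive needs its own mapping;
    the keywords used here are merely common conventions."""
    return Observation(
        collection=collection,
        observation_id=str(header.get("OBSID", "unknown")),
        instrument=header.get("INSTRUME", "unknown"),
        planes=[Plane(product_id=header.get("FILENAME", "unknown"),
                      calibration_level=0)],
    )

# The payoff: the same downstream code handles any facility.
obs = fits_to_common_model("CFHT", {"OBSID": "123456", "INSTRUME": "MegaPrime"})
print(obs.collection, obs.observation_id, obs.instrument)
```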

Common Archive Observation Model

CAOM Data Types

Common Archive Observation Model

Access Control

Common Archive Observation Model

Processing and Provenance

Virtual Observatory
The requirement for integration is driven by science practice: multi-facility, multi-wavelength datasets used by multi-national teams.
CFHT, Gemini, JCMT, HST → CAOM → Integration → USER.
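
Because all collections sit behind one model, a single query can span facilities. As a present-day illustration (the pyvo library postdates this talk, and the endpoint, table, and column names are assumptions to verify against CADC's current documentation), such a query might look like:

```python
import pyvo

# Assumed endpoint of CADC's public TAP service; verify before use.
service = pyvo.dal.TAPService("https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/tap")

# One ADQL query spans every collection because the archives share a
# common model; no per-archive query code is needed.
query = """
SELECT TOP 10 collection, observationID, instrument_name
FROM caom2.Observation
WHERE collection IN ('CFHT', 'HST', 'JCMT', 'GEMINI')
"""
for row in service.search(query):
    print(row["collection"], row["observationID"], row["instrument_name"])
```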

Science Data Centres
Future steps will take us beyond the "science-specific" data management paradigm: integration across physics data, ocean sciences data, chemistry data, and biology data, alongside CFHT, Gemini, JCMT, HST → USER.

VO Challenges
- A fundamental problem in VO is that we have been focusing on INTEROPERABILITY while many (most?) data collections and services are not of high enough quality to support VO-level services
- Substantial data engineering is needed to bring these services up to the required standard
- The concept of VO as a lightweight layer on top of archives is incorrect
- Competing pressures: make it easy to implement vs. make it powerful

Data Centres
Future steps will take us beyond the "domain-specific" data management paradigm: integration across science data, medical data, economic data, and security data for the USER.
*Are we really looking at "data" at this level?

Long Term Data Preservation
- The growth of instrumental data output with time has meant that migrating old collections to new media has been negligible in cost: each new generation of data dwarfs everything that came before, so the old data rides along almost for free
- Soon the JCMT will close; later, CFHT will close
- We will be faced for the first time with significant costs of data preservation for inactive data collections
- Where will the funding come from?
- We need a proper model for LTDP for these major collections

Canadian Investments in Computational Infrastructure
- Neither the grid computing facilities of Compute Canada nor the network facilities of CANARIE are delivering the performance needed by observational astronomers
- "Configuration" issues are the problem
- CANFAR is attempting to address these issues

Proposed model for data management
CFHT → (CANARIE network) → Processing Centre → Data Centre → Principal Investigator, Survey Teams, Archival Researchers, Virtual Observatory.
Cloud storage and processing on the Compute Canada grid (Infrastructure as a Service).
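
The appeal of the IaaS model is elastic processing: the worker pool scales with the data rate. The sketch below uses a local process pool as a stand-in for cloud-provisioned workers; the calibrate step and file names are placeholders, not CADC's actual pipeline:

```python
from concurrent.futures import ProcessPoolExecutor

def calibrate(exposure: str) -> str:
    """Placeholder for a real calibration step (detrending, astrometry, ...)."""
    return exposure + ".calibrated"

if __name__ == "__main__":
    # Under Infrastructure as a Service these workers would be virtual
    # machines provisioned on the Compute Canada grid; local processes
    # stand in for them in this sketch.
    exposures = [f"exposure_{i:04d}.fits" for i in range(8)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        for product in pool.map(calibrate, exposures):
            print("produced", product)
```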

Cloud Computing: Implications for LTDP
CFHT → COMPUTE CANADA → Data Centre → Archival Researchers, Virtual Observatory.
- What are the implications?
- What commitment exists from Compute Canada?
- What will happen 10 years from now?
- New uncertainties for LTDP

Summary
- Key to success: an intimate working relationship between CADC software developers, scientists, and operations staff
- We add value to data (we are in the mainstream of data flow)
- Long Term Data Preservation is not a primary driver of our activities (yet our work lays the foundation for LTDP)
- We invest heavily in data integration (CAOM) and the VO
- Data collections need to be upgraded as well as made interoperable; we help to improve data quality at source
- The future of our data collections is uncertain to a disturbing degree