Archiving of solar data
Luis Sanchez, Solar and Heliospheric Archive Scientist
Research and Scientific Support Department

Contents
- A bit of history on ESA archives.
- The SOHO archive.
- Evolution: New ESA approach to science operations and archiving.
- Archive infrastructure and data products.
- The virtual observatory layer.
- NASA’s Heliophysics Virtual Observatory.

ESA’s old approach to archiving science data
- ‘Traditional’ approach: no science archiving was done at ESA (except Hipparcos); the funding agencies supporting the PIs were responsible for archiving data.
- The Infrared Space Observatory changed that: ESA’s RSSD established an archive for this observatory-type mission.
- This triggered involvement in archiving of astronomy data at VILSPA: active participation in the IVOA, with the development of virtual-observatory-aware mission archives for astronomy missions (ISO, XMM, Integral…).
- The same path was also followed for planetary data: establishment of the Planetary Science Archive at VILSPA, with very close ties to the PDS.
- Meanwhile, other mission-specific archives were established elsewhere: SOHO at GSFC, Ulysses and the CAA at ESTEC. These have limited interoperability with virtual observatories.

Archiving of SOHO data
- The SOHO archive was developed by ESA with the collaboration of NASA:
  - Servers supplied by ESA.
  - Software designed and developed by ESA in 1997.
  - Storage provided by NASA as part of the Solar Data Analysis Center (SDAC) and shared with other missions.
  - Network infrastructure contributed by NASA.
- Simple, modular design based on several components:
  - Relational Database Management System (RDBMS).
  - Web-based user interface (UI).
  - Middleware for passing information between the UI and the RDBMS (a minimal sketch of this step follows below).
  - Validation and ingestion of data products.
  - Off-line (batch) distribution of data products.
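To make the UI / middleware / RDBMS split concrete, here is a minimal, illustrative sketch of the pattern. It is not the SOHO archive's actual code (which was Perl::DBI based); Python's DB-API plays the analogous engine-agnostic role here, and the table and column names (data_products, obs_start, ...) are hypothetical.

```python
# Illustrative sketch of the UI -> middleware -> RDBMS pattern; not the real
# SOHO archive code. Table/column names are hypothetical.
import sqlite3
from typing import Any


def search_data_products(conn: sqlite3.Connection,
                         instrument: str,
                         start: str,
                         end: str) -> list[tuple[Any, ...]]:
    """Middleware call: translate a UI search form into a parameterised
    SQL query and return the matching data product records."""
    cur = conn.execute(
        "SELECT product_id, instrument, obs_start, file_path "
        "FROM data_products "
        "WHERE instrument = ? AND obs_start BETWEEN ? AND ? "
        "ORDER BY obs_start",
        (instrument, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # In-memory database standing in for the archive RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data_products "
                 "(product_id TEXT, instrument TEXT, obs_start TEXT, file_path TEXT)")
    conn.execute("INSERT INTO data_products VALUES "
                 "('EIT_195_001', 'EIT', '1997-05-12T04:50', '/data/eit/...')")
    print(search_data_products(conn, "EIT", "1997-05-01", "1997-06-01"))
```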

The SOHO archive’s place in SOHO operations

The ingestion is instrument-based:
- A software module is written to validate and extract metadata from all the data products provided by a given instrument.
- Addition of new data products, or modification of existing ones, does not affect data products from other instruments.
(A sketch of this per-instrument module pattern follows below.)
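A minimal sketch, assuming a plug-in style ingestion pipeline, of how one validation/metadata-extraction module per instrument keeps instruments independent of each other. The class names, metadata fields and file-naming convention are hypothetical, not the SOHO archive's actual interfaces.

```python
# Hypothetical per-instrument ingestion modules: adding or changing one
# instrument's products only touches that instrument's ingestor.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ProductMetadata:
    instrument: str
    product_id: str
    obs_start: str
    file_path: str


class InstrumentIngestor(ABC):
    @abstractmethod
    def validate(self, path: Path) -> bool: ...

    @abstractmethod
    def extract_metadata(self, path: Path) -> ProductMetadata: ...


class EITIngestor(InstrumentIngestor):
    def validate(self, path: Path) -> bool:
        # e.g. check file naming convention and required header keywords
        return path.suffix == ".fits" and path.name.startswith("eit")

    def extract_metadata(self, path: Path) -> ProductMetadata:
        # Real code would read the FITS headers; hard-coded here for brevity.
        return ProductMetadata("EIT", path.stem, "1997-05-12T04:50", str(path))


def ingest(path: Path, ingestors: dict[str, InstrumentIngestor]) -> ProductMetadata:
    instrument = path.name.split("_")[0].upper()   # hypothetical naming convention
    ingestor = ingestors[instrument]
    if not ingestor.validate(path):
        raise ValueError(f"{path} failed validation for {instrument}")
    return ingestor.extract_metadata(path)
```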

The SOHO archive
Pros:
- Easy software maintenance.
- Designed to be used with any RDBMS supported by Perl::DBI.
- Designed to be used with a variety of user interfaces.
- Runs on any major operating system.
Cons:
- Primitive interface with virtual observatories.
- Not easy to run applications on top of it (for example, basic data analysis).
- 10-year-old technology.
A new archive is being developed at ESAC, reusing the existing code base for science archives, to fit with the new approach to science operations and archiving.

ESA’s new approach to science operations and archiving
- VILSPA is now ESAC (European Space Astronomy Centre): the focal point for science operations and archiving.
- ESA supports the establishment of long-term science archives across all disciplines, reusing the infrastructure already developed at ESAC for astronomy and planetary missions.
- ESA’s RSSD is discussing a renewed approach to science operations and archiving for Solar System missions:
  - An on-going process tied to the RSSD reorganization.
  - More resources to PI teams to get calibrated data.
  - Improved consolidation across missions for operations and archiving.
  - Development of mission archives for all science disciplines which support existing virtual observatories.

Archive building blocks
Mission- or discipline-oriented long-term archives consist of two parts:
- Archive infrastructure, which can be common to ‘active’ and long-term archives. This is the ‘technical layer’:
  - Hardware (servers, storage, and networks).
  - Operating system and application/utility-level software such as the RDBMS.
  - Great scope for infrastructure consolidation (lower costs, more efficiency), but it has to work properly with the archive holdings to be held.
- Archive holdings. This is where the science is:
  - Data products in the traditional sense.
  - Software.
  - Science applications.
  - Procedures.
  - Logs.
  - Documentation.
(A sketch of a simple holdings manifest follows below.)
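A minimal sketch, assuming a simple manifest kept alongside the archive, of how the different kinds of holdings listed above might be tracked together. The field names and example paths are hypothetical.

```python
# Hypothetical holdings manifest mirroring the categories listed above.
from dataclasses import dataclass, field


@dataclass
class ArchiveHoldings:
    data_products: list[str] = field(default_factory=list)       # science files
    software: list[str] = field(default_factory=list)            # pipelines, libraries
    science_applications: list[str] = field(default_factory=list)
    procedures: list[str] = field(default_factory=list)          # operational procedures
    logs: list[str] = field(default_factory=list)
    documentation: list[str] = field(default_factory=list)


holdings = ArchiveHoldings(
    data_products=["eit/1997/eit_195_19970512.fits"],
    software=["pipelines/eit_calibrate-2.1.tar.gz"],
    documentation=["docs/eit_user_guide.pdf"],
)
```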

Archive infrastructure requirements
Some basic requirements for the archive infrastructure (a sketch of fixity checking and audit logging, covering the integrity and accountability requirements, follows below):
- Completeness: all data from the mission stored together with software and procedures; different levels of data products also stored.
- Longevity: hardware and software ought to be upgraded as easily as possible during the life of the archive.
- Integrity: data products should not change (see also ‘security’).
- Availability: data products should be accessible to PIs and other scientists without restrictions and in a timely manner.
- Accountability: every operation with the archive is documented and traceable.
- Security: against tampering and denial-of-service attacks.
- Status information: the status of the archive, including data holdings but also operational status (users, queries executed, data distributed…).
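A minimal, illustrative sketch (an assumed design, not an ESA implementation) of two of these requirements: integrity via fixity checksums recorded at ingestion and re-verified later, and accountability via an append-only audit log. File paths and record fields are hypothetical.

```python
# Hypothetical fixity check (integrity) and audit log (accountability).
import hashlib
import json
import time
from pathlib import Path


def fixity(path: Path) -> str:
    """Integrity: SHA-256 checksum recorded at ingestion and re-verified
    periodically to detect any change to a data product."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(logfile: Path, user: str, action: str, target: str) -> None:
    """Accountability: append one JSON line per archive operation so every
    ingestion, query and distribution is documented and traceable."""
    record = {"time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
              "user": user, "action": action, "target": target}
    with logfile.open("a") as f:
        f.write(json.dumps(record) + "\n")
```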

Data products (including software, documentation, etc.)
Some aspects to take into account when defining data products (an illustrative data product definition covering these aspects follows below):
- Intended usage (science analysis, housekeeping, public relations…).
- Intended audience (PI team, engineering team, wider scientific community…).
- Tools to be used when accessing and using it.
- Turnaround times for generation, expiration and access.
- Dependencies on other data products (for generation, expiration, access).
- Versions to be produced (perhaps for different calibrations or purposes).
- Metadata required to fully describe it.
- Relationship between the metadata used and those used by the science community for similar or related data products.
- Format for data and metadata representation.
- Physical implementation of the chosen format.
- Documentation on the procedure and software used to generate it.
- Documentation on what the data product represents.
- Data quality information (very hard to do properly a posteriori).
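One possible way, sketched for illustration only, to capture the aspects above as a structured definition of a data product. All field names and defaults are hypothetical; a real archive would express this in its own metadata standard.

```python
# Hypothetical structured description of a data product, mirroring the
# aspects listed above.
from dataclasses import dataclass, field


@dataclass
class DataProductDefinition:
    product_type: str                   # e.g. "EIT level-1 image"
    intended_usage: str                 # science analysis, housekeeping, PR...
    audience: list[str]                 # PI team, engineering, wider community
    access_tools: list[str]             # software expected to read the product
    generation_turnaround: str          # e.g. "24 h after downlink"
    dependencies: list[str] = field(default_factory=list)   # other products needed
    versions: list[str] = field(default_factory=list)       # calibrations/purposes
    data_format: str = "FITS"           # format for data representation
    metadata_standard: str = "SPASE"    # relationship to community metadata
    generation_procedure_doc: str = ""  # how the product is generated
    description_doc: str = ""           # what the product represents
    quality_flags: dict[str, str] = field(default_factory=dict)
```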

Virtual Observatory layer
An additional layer on top of existing archives and services (the order below is roughly chronological, from now onwards to some future time):
- Archive location is irrelevant to the user (distributed access).
- Data and metadata may be held in different locations.
- Searches are independent of the archive holding the data.
- Searches use a common set of parameters based on a data model.
- Data retrieval is done from one or many data repositories.
- Possibility to run science applications on the data holdings.
- Eventually, data retrieval might not even be necessary: GRID computing (remote data, services, and computation).
Virtual observatory ‘added value’:
- Working with science data is easier (less boring, non-productive work to do).
- Opens up new science (that was too work-intensive, or because ‘data mining’ makes it possible to find new relationships between data products).
- Making new data products accessible to the science community is also easier.
(A sketch of an archive-independent search fanned out to multiple repositories follows below.)
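A minimal, illustrative sketch (not any real VO protocol) of the central idea above: the user expresses a search once, in common data-model parameters, and the layer fans it out to several archives whose locations are irrelevant to the user. The endpoint class, query fields and example holdings are hypothetical.

```python
# Hypothetical archive-independent search fanned out to several repositories.
from dataclasses import dataclass


@dataclass
class VOQuery:
    observatory: str   # common data-model parameters, not archive-specific ones
    instrument: str
    time_start: str
    time_end: str


class ArchiveEndpoint:
    """Stands in for one resident archive; a real layer would translate the
    common query into that archive's own interface (HTTP service, SQL, ...)."""

    def __init__(self, name: str, holdings: list[dict]):
        self.name = name
        self.holdings = holdings

    def search(self, q: VOQuery) -> list[dict]:
        return [h for h in self.holdings
                if h["instrument"] == q.instrument
                and q.time_start <= h["time"] <= q.time_end]


def vo_search(q: VOQuery, endpoints: list[ArchiveEndpoint]) -> list[dict]:
    """The user issues one query; results come back from many repositories."""
    results = []
    for ep in endpoints:
        for hit in ep.search(q):
            results.append({"archive": ep.name, **hit})
    return results
```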

NASA Virtual Observatories initiative
Heliophysics Virtual Observatory:
- Data from existing missions (SOHO, TRACE, RHESSI, Wind, Cluster, ACE, Polar, Geotail, FAST, IMAGE, TIMED, SORCE, Ulysses, Voyager…) and upcoming ones (STEREO, Solar-B, SDO…).
- Heliophysics becomes separated from Earth Sciences.
- Distributed environment:
  - ‘Small box’ approach, with the Virtual Solar Observatory (VSO) as pathfinder.
  - Resident archives (the existing ones) to retain the data collections.
  - Virtual observatories for convenient search, with access to all data.
  - Distributed funding and implementation.
- SPASE data model as the ‘Rosetta stone’ for interoperability of heliophysics data (an illustrative, simplified SPASE-style record follows below).
- Magnetospheric data in the PDS to be made compatible with SPASE so that they become accessible to the space physics scientific community.
- The Heliophysics Virtual Observatory is the umbrella, or the sum, of all these virtual observatories.
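For illustration, a heavily simplified sketch of the kind of information a SPASE-style metadata record carries (resource identity, instrument, observed region, time coverage, access). The field names paraphrase these concepts rather than reproduce the exact SPASE schema, and the identifiers and URL shown are hypothetical.

```python
# Simplified, hypothetical SPASE-style record: not the exact SPASE schema.
spase_like_record = {
    "ResourceID": "spase://Example/NumericalData/SOHO/EIT/195",  # hypothetical ID
    "ResourceName": "SOHO EIT 195 A synoptic images",
    "InstrumentID": "spase://Example/Instrument/SOHO/EIT",       # hypothetical ID
    "ObservedRegion": "Sun.Corona",                              # illustrative value
    "TemporalCoverage": {"StartDate": "1996-01-01T00:00:00Z",
                         "StopDate": "2006-12-31T23:59:59Z"},
    "AccessURL": "https://example.org/soho/eit",                 # placeholder URL
    "Format": "FITS",
}
```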