ESAC Science Archives - Cluster Final Archive Pedro Osuna Head of the Science Archives and VO Team Science Operations Department CAA-CFA Review Meeting ESTEC, May 2011

European Space Astronomy Centre
- ESAC is the default location for:
  - Science operations: a long history with astronomical missions, now also solar system missions
  - Science archives: astronomy, planetary and solar system; long-term preservation of ESA science data
  - ESA Virtual Observatory activities: ESAC is the European VO node for space-based astronomy
- Located near Madrid, Spain
ESAC Science Archives- Cluster Final Archive| P. Osuna | CAA-CFA Review Meeting May 2011 | Pag. 2

Astronomy Archives at ESAC (current)

Solar System Archives at ESAC

Science Archives at ESAC
- More than 13 years' experience across many different missions:
  - astronomy / planetary / solar system; observatory / survey / PI missions
  - missions in development, in operations, in post-operations and in archive phases
  - raw data, calibrated processed data, high-level data products
  - data processed at ESAC by the SOC or by PI teams
  - standard processing, bulk reprocessing, on-the-fly reprocessing
- Financed by project funds (~11 FTE) as part of their SOC activities
- Some SRE-O department effort (~3 FTE) for core archive activities (e.g. "re-engineering"), for maintaining archives long term (i.e. ISO, EXOSAT) and for small VO activities
- The ESAC Science Archives Team is highly involved in all VO activities within ESA

ESAC Science Archives: main characteristics
- Complete mission scientific data, all stored on hard disks:
  - images, spectra, measurements (photon counts, temperatures, wind speeds, ...), catalogues or maps of astronomical objects, ...
  - around 50 TB of science data, increasing quickly, up to ~1 PB with Gaia
  - new data ingested regularly
- All archive data management systems are automatic:
  - all data is distributed through the Internet/FTP
  - no archive operator
  - development, maintenance, operations and monitoring are done by the Science Archives Team
  - exceptions for some complex data ingestion, e.g. planetary missions requiring technical validation, or one-off ingestion of point-source catalogues
- Data is made available to the scientific community through the Internet:
  - through a standard browser (Internet Explorer, Firefox, ...)
  - search, preview, select and download
  - public access after a proprietary period

Archive and Data Management
- OAIS: Reference Model for an Open Archival Information System, a Recommendation for Space Data System Standards
- An archive is more than a "JBOD" (Just a Bunch Of Disks): dozens of TB, millions of files, complex queries
- Proper software engineering is required to ensure the various OAIS functions:
  - powerful and user-friendly access interfaces
  - complex data and metadata management
  - well-modelled databases
  - flexible data distribution
  - interoperability
  - added-value services
  - logging and statistics
  - long-term preservation

ESAC Archives Common Architecture

Archives Building System Infrastructure (ABSI)
- All new ESAC Science Archives are based on ABSI; the 1st generation of ESAC archives is being re-engineered into the ABSI framework
- ABSI provides a set of Building Blocks, modular enough to be reused across different scientific archives (cf. SOHO, EXOSAT, Planck)
- These Building Blocks are divided into:
  - Interfaces: object-oriented interfaces
  - Modules: a software package packing self-contained functionality; it must be accompanied by an API (or similar) describing how to consume it
  - Component samples: wrap up sample code implementing certain functionality; may contain Modules and/or glue code
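The Interface / Module / Component split above can be sketched in Java. All names below are illustrative assumptions, not the actual ABSI API:

```java
// Hypothetical sketch of an ABSI-style building block. An Interface
// defines the contract; a Module packs self-contained functionality
// behind it; a Component sample glues them together. None of these
// names come from the real ABSI code base.

interface ProductRetriever {            // "Interface" building block
    byte[] retrieve(String productId);  // contract that consumers program against
}

class FileSystemRetriever implements ProductRetriever {  // "Module"
    private final String rootDir;
    FileSystemRetriever(String rootDir) { this.rootDir = rootDir; }
    public byte[] retrieve(String productId) {
        // a real module would read from the archive repository;
        // this stub just returns the resolved path as a placeholder payload
        return (rootDir + "/" + productId).getBytes();
    }
}

public class AbsiSketch {               // "Component sample": glue code
    public static void main(String[] args) {
        ProductRetriever r = new FileSystemRetriever("/data/soho");
        System.out.println(new String(r.retrieve("obs-001")));  // -> /data/soho/obs-001
    }
}
```

Because each archive depends only on the interface, a module can be swapped (file system, database, remote service) without touching its consumers, which is what makes the blocks reusable across SOHO, EXOSAT, Planck and the rest.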

ABSI Elements

Handling large amounts of data
- We are dealing with more than a million observations (the granularity in SOHO differs from the astronomical cases); database table indexing gets overly complicated and joins perform poorly.
- To decide how to apply the joins to the different attributes requested (the "where" part of the query), we implement Dijkstra's algorithm (the shortest-path algorithm from graph theory).
- Dijkstra's algorithm, conceived by Dutch computer scientist Edsger Dijkstra in 1959, is a graph search algorithm that solves the single-source shortest-path problem for a graph with non-negative edge costs, producing a shortest-path tree.
- The algorithm is often used in routing; we have applied it to our database tables and their relationships.
- On-line examples:
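As a sketch of the idea (not the archive's actual code), Dijkstra's algorithm can run over a graph whose nodes are tables and whose edge weights stand in for join costs; the resulting path is the cheapest chain of joins between the two tables named in the query:

```java
import java.util.*;

// Illustrative join-path search: tables are nodes, relationships are
// weighted edges, and Dijkstra's algorithm returns the cheapest join chain.
public class JoinPathFinder {
    private final Map<String, Map<String, Integer>> graph = new HashMap<>();

    public void addRelation(String a, String b, int cost) {  // undirected edge
        graph.computeIfAbsent(a, k -> new HashMap<>()).put(b, cost);
        graph.computeIfAbsent(b, k -> new HashMap<>()).put(a, cost);
    }

    /** Cheapest join path from source table to target table. */
    public List<String> shortestPath(String source, String target) {
        Map<String, Integer> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<String> pq = new PriorityQueue<>(
            Comparator.comparingInt(n -> dist.getOrDefault(n, Integer.MAX_VALUE)));
        dist.put(source, 0);
        pq.add(source);
        while (!pq.isEmpty()) {
            String u = pq.poll();
            if (u.equals(target)) break;           // target settled: done
            for (Map.Entry<String, Integer> e : graph.getOrDefault(u, Map.of()).entrySet()) {
                int alt = dist.get(u) + e.getValue();
                if (alt < dist.getOrDefault(e.getKey(), Integer.MAX_VALUE)) {
                    dist.put(e.getKey(), alt);
                    prev.put(e.getKey(), u);
                    pq.remove(e.getKey());         // re-insert with updated priority
                    pq.add(e.getKey());
                }
            }
        }
        LinkedList<String> path = new LinkedList<>();
        for (String n = target; n != null; n = prev.get(n)) path.addFirst(n);
        return path;
    }

    public static void main(String[] args) {
        JoinPathFinder f = new JoinPathFinder();   // hypothetical table names
        f.addRelation("observation", "instrument", 1);
        f.addRelation("observation", "proposal", 4);
        f.addRelation("instrument", "proposal", 1);
        // joining via "instrument" (cost 2) beats the direct join (cost 4)
        System.out.println(f.shortestPath("observation", "proposal"));
        // -> [observation, instrument, proposal]
    }
}
```

Non-negative edge weights are exactly what join costs give us, which is why Dijkstra applies directly here.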

Indexing spherical data in databases
- Indexing spherical data for searches in a database is a classic problem.
- Coordinate searches were done "by hand" in our archives: plain SQL searches with indices on RA and Dec.
- This gets complicated when coordinate operations, transformations, functions, etc. have to be executed -> poor performance, low flexibility.
- PgSphere is a module that implements spherical types (database types) in PostgreSQL (an open-source database system). It provides:
  - input and output of spherical data
  - containing, overlapping and other operators
  - various input and conversion functions and operators
  - circumference and area of an object
  - spherical transformations
  - indexing of spherical data types
  - several input and output formats
- Implemented for EXOSAT for the first time; being reused for Planck and Herschel.
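To illustrate why plain RA/Dec indices fall short: even the basic angular separation between two sky positions is a non-linear spherical computation that a flat B-tree index cannot help with, which is the kind of arithmetic modules like PgSphere push into the database. Below is a small stand-alone Java sketch using the standard haversine formula (the class and method names are ours, not PgSphere's):

```java
// Great-circle (angular) separation between two sky positions via the
// haversine formula. This is standard spherical trigonometry, shown here
// only to illustrate the computations an archive must run per coordinate
// query when no spherical index is available.
public class SkyDistance {
    /** Angular separation in degrees between (ra1,dec1) and (ra2,dec2), all in degrees. */
    public static double separationDeg(double ra1, double dec1, double ra2, double dec2) {
        double p1 = Math.toRadians(dec1), p2 = Math.toRadians(dec2);
        double dPhi = Math.toRadians(dec2 - dec1), dLam = Math.toRadians(ra2 - ra1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(p1) * Math.cos(p2) * Math.sin(dLam / 2) * Math.sin(dLam / 2);
        return Math.toDegrees(2 * Math.asin(Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // the celestial poles are ~180 degrees apart regardless of RA
        System.out.println(separationDeg(0, 90, 120, -90));
    }
}
```

Running this per row in a WHERE clause forces a full scan; a spherical index type lets the database prune candidates before any trigonometry runs.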

New ESAC Science Archive creation process
- Bottom-up approach: first build the general UML model for the overall project, then start building from there:
  UML -> DB design -> repository design -> DAO (Data Access Objects) design -> user interface design
- A good UML design for the project is extremely important.
- Proper knowledge of the data by the SAT is crucial in order to build good data repository and data distribution systems (hundreds of emails exchanged between the SOHO Archive Scientist and the SAT on data issues).
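A minimal sketch of the DAO step in that chain, with hypothetical names (the real archives sit on Hibernate/PostgreSQL rather than the in-memory map used here): the DAO hides the repository behind a plain Java interface, so the user-interface layer never touches SQL directly.

```java
import java.util.*;

// Hedged illustration of the UML -> DB -> DAO layering; all names are
// invented for this sketch, not taken from the actual SAT code base.

class Observation {                       // entity derived from the UML model
    final String id; final double ra, dec;
    Observation(String id, double ra, double dec) { this.id = id; this.ra = ra; this.dec = dec; }
}

interface ObservationDao {                // Data Access Object contract
    Optional<Observation> findById(String id);
    List<Observation> findInBox(double raMin, double raMax, double decMin, double decMax);
}

// A real implementation would map to the database via Hibernate;
// here an in-memory map stands in for the repository.
class InMemoryObservationDao implements ObservationDao {
    private final Map<String, Observation> store = new HashMap<>();
    void save(Observation o) { store.put(o.id, o); }
    public Optional<Observation> findById(String id) { return Optional.ofNullable(store.get(id)); }
    public List<Observation> findInBox(double raMin, double raMax, double decMin, double decMax) {
        List<Observation> out = new ArrayList<>();
        for (Observation o : store.values())
            if (o.ra >= raMin && o.ra <= raMax && o.dec >= decMin && o.dec <= decMax) out.add(o);
        return out;
    }
}

public class DaoSketch {
    public static void main(String[] args) {
        InMemoryObservationDao dao = new InMemoryObservationDao();
        dao.save(new Observation("obs-1", 83.6, 22.0));
        System.out.println(dao.findInBox(80, 90, 20, 25).size());  // -> 1
    }
}
```

Because the GUI depends only on `ObservationDao`, the storage layer can be re-engineered (as in the ABSI migration) without changes rippling up to the interface.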

ESAC Archives Common Architecture - technologies used:
- Java Rich Client (Webstart, InfoNode, JGoodies, Swing)
- Web Services (Java, Tomcat, Spring)
- PostgreSQL with PgSphere
- NAS, NFS (NetApp Filer)
- Hibernate (Java)

ESAC Archives Interfaces
- Complementary interfaces serve various types of users: the scientific community (public access), PI teams and observers (controlled access), science operations teams (privileged access)
- Powerful web-based Java GUI interface:
  - standard access to the archives
  - simple to use, with powerful search and results facilities
  - handling of proprietary and public data
  - direct download, shopping basket
  - on-the-fly reprocessing (for some archives)
  - links (back and forth) to the science literature
  - Mars Map Browser
- Scriptable interface: a machine interface, mainly used by science operations teams and mirror sites
- Interoperability with other external archives and tools through Virtual Observatory protocols