ES3 architecture ES3’s provenance capture system uses a client- server architecture. The collector client augments a science process’s execution environment.

Slides:



Advertisements
Similar presentations
1 Copyright © 2002 Pearson Education, Inc.. 2 Chapter 1 Introduction to Perl and CGI.
Advertisements

Copyright © 2008 SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks.
1 1999/Ph 514: Channel Access Concepts EPICS Channel Access Concepts Bob Dalesio LANL.
PROVENANCE FOR THE CLOUD (USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES(FAST `10)) Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer Harvard.
From UV to fluorescence, a semi-analytical ocean color model for MODIS and beyond Stéphane Maritorena & Dave Siegel Earth Research Institute University.
1 UIM with DAML-S Service Description Team Members: Jean-Yves Ouellet Kevin Lam Yun Xu.
Beyond Chlorophyll: Ocean color ESDRs and new products S. Maritorena, D. A. Siegel and T. Kostadinov Institute for Computational Earth System Science University.
GridRPC Sources / Credits: IRISA/IFSIC IRISA/INRIA Thierry Priol et. al papers.
® IBM Software Group © 2006 IBM Corporation Rational Software France Object-Oriented Analysis and Design with UML2 and Rational Software Modeler 04. Other.
The GSM merging model. Previous achievements and application to GlobCOLOUR Globcolour / Medspiration user consultation, Dec 4-6, 2006, Villefranche/mer.
ARCS Data Analysis Software An overview of the ARCS software management plan Michael Aivazis California Institute of Technology ARCS Baseline Review March.
Client/Server Databases and the Oracle 10g Relational Database
1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Guide To UNIX Using Linux Third Edition
ASP.NET Programming with C# and SQL Server First Edition Chapter 8 Manipulating SQL Server Databases with ASP.NET.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
Chapter 11 ASP.NET JavaScript, Third Edition. 2 Objectives Learn about client/server architecture Study server-side scripting Create ASP.NET applications.
1 CS 502: Computing Methods for Digital Libraries Lecture 4 Identifiers and Reference Links.
Chapter 7 Managing Data Sources. ASP.NET 2.0, Third Edition2.
Maintaining and Updating Windows Server 2008
Basic Concepts Architecture Topology Protocols Basic Concepts Open e-Print Archive Open Archive -- generalization of e-print Data Provider and Service.
Creating Web Page Forms
The SAM-Grid Fabric Services Gabriele Garzoglio (for the SAM-Grid team) Computing Division Fermilab.
INTRODUCTION TO WEB DATABASE PROGRAMMING
Apache Airavata GSOC Knowledge and Expertise Computational Resources Scientific Instruments Algorithms and Models Archived Data and Metadata Advanced.
8 Chapter Eight Server-side Scripts. 8 Chapter Objectives Create dynamic Web pages that retrieve and display database data using Active Server Pages Process.
1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Requirements for caBIG Infrastructure to Support Semantic Workflows Yolanda.
JavaScript, Fourth Edition Chapter 12 Updating Web Pages with AJAX.
MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat.
9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
Configuration Management (CM)
The Ocean Color MEASURES Project S. Maritorena 1, J. Frew 2, N.B. Nelson 1, D.A. Siegel 1,3, M. Behrenfeld 4 1. Institute for Computational Earth System.
Smith’s Aerospace © P. Bailey & K. Vander Linden, 2005 Architecture: Component and Deployment Diagrams Patrick Bailey Keith Vander Linden Calvin College.
ES 3 Takes the Provenance Challenge J AMES F REW Donald Bren School of Environmental Science and Management University of California, Santa Barbara
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Ocean Observatories Initiative OOI Cyberinfrastructure Life Cycle Objectives Review January 8-9, 2013 Tom O’Reilly Monterey Bay Aquarium Research Institute.
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
PI Data Archive Server COM Points Richard Beeson.
ASSESSMENT OF OPTICAL CLOSURE USING THE PLUMES AND BLOOMS IN-SITU OPTICAL DATASET, SANTA BARBARA CHANNEL, CALIFORNIA Tihomir S. Kostadinov, David A. Siegel,
Declaratively Producing Data Mash-ups Sudarshan Murthy 1, David Maier 2 1 Applied Research, Wipro Technologies 2 Department of Computer Science, Portland.
1 MSCS 237 Overview of web technologies (A specific type of distributed systems)
National Geospatial Digital Archive Greg Janée University of California at Santa Barbara.
A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer.
VAPOR 3.0 Architecture Overview. Purpose of this document Provide understandable overview of 3.0 architecture Enable the development team to maintain.
National Aeronautics and Space Administration Jet Propulsion Laboratory California Institute of Technology Pasadena, California EDGE: The Multi-Metadata.
Apr. 8, 2002Calibration Database Browser Workshop1 Database Access Using D0OM H. Greenlee Calibration Database Browser Workshop Apr. 8, 2002.
IS-907 Java EE World Wide Web - Overview. World Wide Web - History Tim Berners-Lee, CERN, 1990 Enable researchers to share information: Remote Access.
Lesson 3-Touring Utilities and System Features. Overview Employing fundamental utilities. Linux terminal sessions. Managing input and output. Using special.
Recording Actor Provenance in Scientific Workflows Ian Wootten, Shrija Rajbhandari, Omer Rana Cardiff University, UK.
Web Design and Development. World Wide Web  World Wide Web (WWW or W3), collection of globally distributed text and multimedia documents and files 
8 Chapter Eight Server-side Scripts. 8 Chapter Objectives Create dynamic Web pages that retrieve and display database data using Active Server Pages Process.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
1 Channel Access Concepts – IHEP EPICS Training – K.F – Aug EPICS Channel Access Concepts Kazuro Furukawa, KEK (Bob Dalesio, LANL)
Application Web Service Toolkit Allow users to quickly add new applications GGF5 Edinburgh Geoffrey Fox, Marlon Pierce, Ozgur Balsoy Indiana University.
Aggregator Stage : Definition : Aggregator classifies data rows from a single input link into groups and calculates totals or other aggregate functions.
Simulation Production System Science Advisory Committee Meeting UW-Madison March 1 st -2 nd 2007 Juan Carlos Díaz Vélez.
CMPE 226 Database Systems April 19 Class Meeting Department of Computer Engineering San Jose State University Spring 2016 Instructor: Ron Mak
Maintaining and Updating Windows Server 2008 Lesson 8.
Two New UML Diagram Types Component Diagram Deployment Diagram.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
1 Chapter 1 INTRODUCTION TO WEB. 2 Objectives In this chapter, you will: Become familiar with the architecture of the World Wide Web Learn about communication.
Architecture Review 10/11/2004
湖南大学-信息科学与工程学院-计算机与科学系
IS-ENES Cases Seven use cases are listed as data lifecycle steps A B C
EE 122: HyperText Transfer Protocol (HTTP)
Overview of Workflows: Why Use Them?
Channel Access Concepts
An Introduction to JavaScript
Presentation transcript:

ES3 architecture ES3’s provenance capture system uses a client- server architecture. The collector client augments a science process’s execution environment (e.g., the bash shell) with plugins (e.g., strace) that monitor the science process's system calls. The collector plugins communicate synchronously with a logger process that identifies provenance events (file access, program execution, etc.) and writes them to local log files. When a logger process exits, a transmitter process scans the log files, assigns a universally unique identifier (UUID) to each provenance event, and formats each event (and any available metadata for the objects involved) as an XML message. Provenance messages are submitted to the ES3 core: a provenance database with a web service interface. Introduction Provenance is metadata about an object’s origin and history. For an Earth science data product, such as a satellite-derived map of global ocean color, provenance comprises the source data sets, processing steps, and parameter settings that produced the final product. Provenance is conventionally recorded either manually (e.g., by a human taking notes), which is tedious and error-prone, or prescriptively, as in a workflow graph which must be then be executed in a specific workflow system. We have developed a third, passive approach to provenance capture as part of our Earth System Science Server (ES3) project (Frew et al., 2007, DOI: /cpe.1247). Our passive provenance capture implementation works in any Linux- derived computing environment, without any changes to the science processing codes. GSM ocean color model The Garver-Siegel-Maritorena (GSM) model (Maritorena and Siegel, 2005, DOI: /j.rse ) uses normalized water-leaving radiances from any available sensors to invert a semianalytical ocean color model and simultaneously retrieve chlorophyll a concentration, combined dissolved/detrital particulate absorption (CDM), and particulate backscatter (BBP). GSM is unique in that it calculates the best combination of input sensors independently for each output value. The model is not computationally intensive, so its application to generating climate data records allows for frequent reprocessing. ES3 helps maintain the provenance of these multiple versions, without any changes to the current GSM implementation (a combination of UNIX shell scripts and interpreted IDL programs.) Automatic Provenance Collection and Publishing in a Science Data Production Environment—Early Results James Frew, Greg Janée, and Peter Slaughter Institute for Computational Earth System Science, University of California, Santa Barbara This work was supported by NASA Cooperative Agreement NNG04GE66G What’s this? → The large “wiring diagram” across the center of the poster illustrates the complete processing history of a single run of GSM, as recorded and then reconstituted by ES3. Events in ES3 The request-response excerpt at right (end tags omitted for brevity) shows how an application can request information from ES3 about single events (in this case, those involving files that match the wildcard expression “*.hdf”.) The response (single file excerpt) shows how, in addition to the usual filesystem metadata, ES3 saves the file’s UUID, digital signature, and the UUID of the workflow (e.g., shell script) in whose context the file was accessed. SeaWiFS … + MODIS Aqua … + MERIS= GSM output (partial*) Merged chlorophyll a concentration on 21 March 2008 GSM inputs (partial*) Normalized water-leaving radiances at 412 nm on 21 March 2003 *Full GSM inputs include radiances in 5 wavelength bands from each sensor. ← Legend Circles: files read and/or written Boxes: programs executed. Nesting indicates programs that execute other programs and wait for them to terminate (e.g., command interpreters) Arrows: data transfer (I/O) (graph rendered by yEd) (*GSM outputs also include CDM, BBP, and estimated error fields) Provenance in ES3 ES3 maintains provenance as a directed graph of input/output relationships between events. Provenance information is retrieved by specifying an event’s UUID, a direction in which to proceed (forward, backward, or both), and an output graph representation. In the example to the left, the excerpted GraphML shows how objects and links in ES3 are represented as nodes and edges in GraphML. Objects in ES3 ES3 manages provenance events, but those events usually involve objects (e.g., data or program files) that persist across events. ES3 therefore allows objects to be registered by an external name (currently, a filename.) This registration persists until the object’s content is determined to have changed (e.g. in a “write” event.).*hdf fd63a471-f c-8dac-975cbc es3 …/GSMchl L3b_DAY.1.v5.hdf T061106Z e24d09744e4f26adb549035f520e7609 2b9b441e-106e-45e1-8ab4-c7e792510fbb fd63a471-f c-8dac-975cbc both link graphml nested,yfiles full <node id="fd63a471-f c-8dac-975cbc138036" name="GSMchl L3b_DAY.1.v5.hdf" objectType="file"> … <edge source= "fd63a471f c-8dac-975cbc138036” target= "55cd8e66-71a7-4a10-ac3b-cbb42e93cc14”> … Identity Management for ES3 ES3’s current use of filenames as external identifiers is expedient but problematic. Filenames are easily obtained by ES3, but there is no guarantee that a file won’t be modified without ES3 being notified. Further, obtaining provenance for a file is currently a two-step process: obtain the file’s UUID, then obtain the UUID’s provenance. For more robust external identity management, we are developing an extension for ES3 based on Archival Resource Keys (ARKs) (Kunze et al., ARKs allow a persistent, protocol-independent nomenclature that easily interoperates with both filenames and URLs. Object properties that would be published as versions will be embedded in ARKs. individual granule locators will be appended to these ARKs.