Coding Provenance in Software and Matching Tools to Data OPeNDAP Provenance Project And ESIP ToolMatch Project Patrick West, Tetherless World Constellation.

Coding Provenance in Software and Matching Tools to Data OPeNDAP Provenance Project And ESIP ToolMatch Project Patrick West, Tetherless World Constellation Rensselaer Polytechnic Institute

What is Provenance
Provenance is information about the entities, activities, and people involved in producing a piece of data or thing.
In data science we are interested in keeping track of, or being able to trace back, how a data product was generated and from what.
For example, the Ecosystem Status Report has an interesting plot in one of its chapters that I'm interested in learning more about.

Generating a Plot

How did I get there?

I know how it was generated
Because I'm the one who added the plot to the document, I know how the plot was generated.
I wrote parts of the software in OPeNDAP Hyrax that does the data access, manipulation, and transformation.
So I know: a plot is generated by accessing a set of data using OPeNDAP Hyrax, which generates a DAP DataDDS object by reading in a set of NetCDF files, constraining and projecting the data, running a server-side function or two, and doing an aggregation. That data product is then used to generate the plot.
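As a rough illustration of the "constraining and projecting" step, the sketch below builds a DAP2-style request URL whose query string projects one variable and constrains its index ranges. The server address, file name, and variable name are hypothetical, and the helper functions are not part of Hyrax; the `.dods` suffix is assumed here to be the response that carries the data object.

```python
from urllib.parse import quote


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """Format one DAP2 array index range as [start:stride:stop]."""
    return f"[{start}:{stride}:{stop}]"


def build_request_url(dataset_url, variable, ranges, response="dods"):
    """Append a response suffix and a projection constraint to a dataset URL."""
    projection = variable + "".join(hyperslab(*r) for r in ranges)
    # Keep the constraint-expression punctuation readable in the URL.
    return f"{dataset_url}.{response}?{quote(projection, safe='[]:,')}"


# Hypothetical dataset and variable, for illustration only.
url = build_request_url(
    "http://example.org/opendap/data/sst.nc",
    "sst",
    [(0, 9), (10, 20, 2)],
)
print(url)
```

The request URL is the natural starting point for provenance, because it already encodes which data were asked for and how they were subset.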

Generating a Plot
[Diagram labels: OPeNDAP Request URL; OPeNDAP Hyrax (reads in data, spits out data); "Badda Bing Badda Boom"; IPython Notebook cell (uses data, generates plot)]
BUT I WANT TO KNOW MORE

Some information I WANT to know
How was that plot generated?
What software was used to generate the plot and any intermediary data?
What data files were read in to generate the plot, what was done to the data, and by what?
Where did those data files come from?
What parameters are in there?
What sensors measured those parameters?
Tell me information about the measuring of the data.

Generating a Plot
[Diagram labels: OPeNDAP Request URL; OPeNDAP Hyrax (reads in data, spits out data); IPython Notebook cell (uses data, generates plot)]
Where did the data files come from?

Linked Data
I am also interested in the developers of the software, who publishes the software, the licensing of the software, and how I could use it.
I'm interested in what IPython Notebooks are, what they can do, and whether I could use them for other projects.
And I want to be able to let the "owner" of the data files know that I've used the results of an access in a publication, presentation, article, or whatever.

What the project focuses on
[Diagram labels: Request URL; OPeNDAP Hyrax; OLFS; BES; NetCDF; dap; Server Side Functions; aggregate; transform]

W3C Prov

Prov-O
:dds_of_reading
    a prov:Entity;
    dcterms:format opendap:DataDDS;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used [
            a vsto:Dataset, prov:Entity, toolmatch:DataCollection;
            toolmatch:hasAccessURL ;
        ];
        prov:used [
            a vsto:Dataset, prov:Entity, toolmatch:DataCollection;
            toolmatch:hasAccessURL ;
        ];
        prov:wasAssociatedWith ;
    ];
.

Prov-O
:aggregated_dds
    a prov:Entity;
    dcterms:format opendap:DataDDS;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used :constrained_dds;
        prov:wasAssociatedWith ;
    ];
.
:result
    a foaf:Document;
    nfo:fileName "thursday.h5";
    dcterms:format netcdf;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used :aggregated_dds;
        prov:wasAssociatedWith ;
    ];
.
:constrained_dds
    a prov:Entity;
    dcterms:format opendap:DataDDS;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used :dds_of_reading;
        prov:wasAssociatedWith ;
    ];
.
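The chain of prov:wasGeneratedBy and prov:used statements is what makes it possible to trace a result back to its inputs. The sketch below is a simplified stand-in, not an RDF implementation: it flattens each "generated by an activity that used X" pair into a plain dictionary and walks it. The entity names come from the slides, except `dataset_1` and `dataset_2`, which are hypothetical labels for the two source datasets.

```python
# Each generated entity maps to the entities its generating activity used.
USED_BY_GENERATION = {
    "result": ["aggregated_dds"],
    "aggregated_dds": ["constrained_dds"],
    "constrained_dds": ["dds_of_reading"],
    "dds_of_reading": ["dataset_1", "dataset_2"],  # hypothetical names
}


def trace_back(entity, used=USED_BY_GENERATION):
    """Return every upstream entity, nearest first, by following the links."""
    lineage = []
    seen = set()
    queue = list(used.get(entity, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # guard against repeated or cyclic links
        seen.add(current)
        lineage.append(current)
        queue.extend(used.get(current, []))
    return lineage


print(trace_back("result"))
```

With a real triple store the same walk would be a query over the provenance graph; the dictionary only shows the shape of the traversal.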

DOAP – Description of a Project

DOAP – Description of a Project
a doap:Project, prov:Entity;
    doap:name "OPeNDAP Back-End Server (BES)";
    doap:developer ;
    doap:homepage ;
    doap:vendor ;
    doap:repository ;
    doap:bug-database ;
    doap:release ;
    doap:description "BES is a high-performance back-end server software framework that allows data providers more flexibility in providing end users views of their data.";
    doap:license ;
.
a doap:Version, prov:Entity;
    prov:specializationOf ;
    doap:name "BES ";
    doap:revision "3.12.0";
    doap:download-page ;
    doap:repository ;
    doap:license ;
    doap:created ;
.

DOAP – Description of a Project
a doap:SVNRepository;
    doap:location
    doap:browse.

dc:description "This software is distributed under the GNU Lesser General Public License ";
    doap:name "GNU LESSER GENERAL PUBLIC LICENSE";
    rdfs:seeAlso ;
.

# The hash above is: HASH(config file, BES version that read it)
a prov:Agent;
    prov:wasDerivedFrom, ;
.

prov:wasDerivedFrom :config_file_hash;
    # b/c BES set it up:
    prov:wasAttributedTo ;
.
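The comment HASH(config file, BES version that read it) suggests identifying a configured server instance by a digest of its configuration and version. The slides do not say which hash function or encoding is used, so the sketch below simply assumes SHA-256 over the version string followed by the configuration bytes; the function name and the sample configuration text are hypothetical.

```python
import hashlib


def agent_hash(config_bytes: bytes, bes_version: str) -> str:
    """Digest a configuration together with the version that read it."""
    digest = hashlib.sha256()
    digest.update(bes_version.encode("utf-8"))
    digest.update(b"\0")  # separator so version and config cannot run together
    digest.update(config_bytes)
    return digest.hexdigest()


# Hypothetical configuration content; "3.12.0" is the revision shown on the slide.
config = b"example.setting=value\n"
print(agent_hash(config, "3.12.0"))
```

Because both inputs feed the digest, a change to either the configuration or the server version yields a different agent identifier.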

What We're Trying
The BES loads shared modules at startup that handle specific tasks.
Our first attempt was to use something called a Reporter, which reports on the completion of a request, but that is too long after the fact.
Our second thought is that the modules themselves add provenance information on the fly, which to me is ideal but unrealistic.
The probable implementation is that the BES, the software framework that communicates with the modules, is where the provenance is tracked.
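The third option, tracking provenance in the framework rather than in each module, can be sketched as a dispatcher that records an activity every time it hands work to a module. This is a toy model in Python, not the BES API (the BES is a separate server framework); the class names, module names, and handlers below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Activity:
    module: str      # which module performed the work
    used: str        # identifier of the input entity
    generated: str   # identifier of the output entity


@dataclass
class Framework:
    modules: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    log: List[Activity] = field(default_factory=list)

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self.modules[name] = handler

    def run(self, name: str, input_id: str) -> str:
        """Call a module and record what it used and generated."""
        output_id = self.modules[name](input_id)
        self.log.append(Activity(module=name, used=input_id, generated=output_id))
        return output_id


framework = Framework()
framework.register("read", lambda source: "dds_of_reading")
framework.register("constrain", lambda dds: "constrained_dds")
framework.register("aggregate", lambda dds: "aggregated_dds")

entity = "source_files"  # hypothetical input identifier
for step in ("read", "constrain", "aggregate"):
    entity = framework.run(step, entity)

for activity in framework.log:
    print(activity)
```

The modules stay unaware of provenance; only the dispatcher changes, which is the appeal of this approach over asking every module author to emit provenance.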

What's next
Get more use cases about what types of information we want to collect
Write the story about what we're trying to do
Come up with software use cases for the implementation
Continue discussing provenance with the core OPeNDAP group
Continue to work with the original Prov group (Tim, Jim, and Stephan) in discussions

Questions

ToolMatch Use Case
"I need data for carbon dioxide (CO2) concentrations, a climate change indicator, for the summer of 2012, that can be accessed via OPeNDAP Hyrax and plotted as a time series."
"I need data with measurements of atmospheric aerosol optical depth sliced along latitude and longitude, returned as NetCDF data, and accessible in MATLAB."

Using SADL

Inferencing
Equivalent Class:
    DataCollection
    and (isAccessedBy value OPeNDAP) or (hasDataStorageFormat value NetCDF)
    and (usesGridType value AuxiliaryLatLonGrid) or (usesGridType value RegularLatLonGrid)
    and usesConvention value ClimateForecast_CF
Subclass Of (inferred):
    mappedBy value IDV
    and mappedBy value McIDAS-V
    and mappedBy value Panoply

Inferencing
Equivalent Class:
    DataCollection
    and (isAccessedBy value OPeNDAP) or (hasDataFormat value NetCDF)
    and usesConvention value CF1Convention
    and usesConvention value RegularLatLonGrid
Subclass Of (inferred):
    mappedBy value Ferret
    and mappedBy value GrADS

Inferencing
Equivalent Class:
    DataCollection
    and (isAccessedBy value GrADSDataServer) or (isAccessedBy value Hyrax) or (isAccessedBy value ThreddsDataServer) or (isAccessedBy value erddap)
Subclass Of (inferred):
    isAccessedBy value OPeNDAP
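The way these class definitions chain together can be mimicked in a few lines of ordinary code. The sketch below hand-codes two of the rules (the server-to-OPeNDAP rule and the IDV/McIDAS-V/Panoply rule) over a dictionary of property values. It is not how an OWL reasoner works internally, and it assumes a grouping the slide text leaves ambiguous: (OPeNDAP or NetCDF) and (either lat/lon grid type) and the CF convention.

```python
OPENDAP_SERVERS = {"GrADSDataServer", "Hyrax", "ThreddsDataServer", "erddap"}
LAT_LON_GRIDS = {"AuxiliaryLatLonGrid", "RegularLatLonGrid"}


def infer(collection: dict) -> dict:
    """Return the collection's facts plus whatever the two rules add."""
    facts = {prop: set(values) for prop, values in collection.items()}

    # Rule 1: anything served by one of these servers is accessed by OPeNDAP.
    accessed = facts.setdefault("isAccessedBy", set())
    if accessed & OPENDAP_SERVERS:
        accessed.add("OPeNDAP")

    # Rule 2 (assumed grouping): reachable data on a lat/lon grid that
    # follows the CF convention can be mapped by three tools.
    reachable = ("OPeNDAP" in accessed
                 or "NetCDF" in facts.get("hasDataStorageFormat", set()))
    on_grid = bool(facts.get("usesGridType", set()) & LAT_LON_GRIDS)
    uses_cf = "ClimateForecast_CF" in facts.get("usesConvention", set())
    if reachable and on_grid and uses_cf:
        facts.setdefault("mappedBy", set()).update({"IDV", "McIDAS-V", "Panoply"})
    return facts


# Hypothetical data collection description.
example = {
    "isAccessedBy": ["Hyrax"],
    "usesGridType": ["RegularLatLonGrid"],
    "usesConvention": ["ClimateForecast_CF"],
}
print(sorted(infer(example)["mappedBy"]))
```

Note that the example never states that the collection is accessed by OPeNDAP or mapped by any tool; both facts follow from the rules, which is the point of describing tools and data by their properties.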

Resulting Query
The resulting query to find the set of tools available to visualize a data collection becomes very simple:
DESCRIBE ?tool WHERE {
    toolmatch:visualizedBy ?tool.
    ?tool rdf:type toolmatch:Tool.
}
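For a specific data collection, the query needs that collection's IRI as the subject of the first pattern. The sketch below fills it in from a parameter. The collection IRI and the toolmatch namespace shown are placeholders, since the real ones are not given in this transcript; only the rdf namespace is the standard one.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def tools_query(collection_iri: str, toolmatch_ns: str) -> str:
    """Build a DESCRIBE query for the tools that visualize one collection."""
    return (
        f"PREFIX rdf: <{RDF_NS}>\n"
        f"PREFIX toolmatch: <{toolmatch_ns}>\n"
        "DESCRIBE ?tool WHERE {\n"
        f"  <{collection_iri}> toolmatch:visualizedBy ?tool .\n"
        "  ?tool rdf:type toolmatch:Tool .\n"
        "}"
    )


# Placeholder IRIs, for illustration only.
query = tools_query(
    "http://example.org/data/collection1",
    "http://example.org/toolmatch#",
)
print(query)
```

The query stays this short because the reasoning happens beforehand: the visualizedBy links are already inferred, so no format, grid, or convention conditions need to appear in it.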

The Result
[Figure labels: Description; Tools]

Where are we and what's next
We've got part of the ontology done
We've got stuff in the triple store
We need to complete the dataset ontology piece
We need to verify the ontology and rules
We need crowdsourcing for more tools and information about tools
Patrick needs to understand rules better

Questions

References
OPeNDAP Provenance Project
    Prov Overview -
    OPeNDAP Prov -
    OPeNDAP LODSPeaKr -
    OPeNDAP Endpoint -
    OPeNDAP –
ToolMatch Project
    ToolMatch -
    ToolMatch Virtual Server -
    ToolMatch Schema -
    ToolMatch Endpoint