Coding Provenance in Software and Matching Tools to Data OPeNDAP Provenance Project And ESIP ToolMatch Project Patrick West, Tetherless World Constellation.


1 Coding Provenance in Software and Matching Tools to Data: OPeNDAP Provenance Project and ESIP ToolMatch Project. Patrick West, Tetherless World Constellation, Rensselaer Polytechnic Institute

2 What is Provenance? Provenance is information about the entities, activities, and people involved in producing a piece of data or thing. In data science we’re interested in keeping track of, or being able to trace back, how a data product was generated and from what. For example, one of the chapters of the Ecosystem Status Report contains an interesting plot that I’d like to learn more about.

3 Generating a Plot

4 How did I get there?

5 I know how it was generated: Because I’m the one who added the plot to the document, I know how the plot was generated, and I wrote parts of the OPeNDAP Hyrax software that does the data access, manipulation, and transformation. So I know that the plot is generated by accessing a set of data through OPeNDAP Hyrax, which builds a DAP DataDDS object by reading in a set of NetCDF files, constraining and projecting the data, running a server-side function or two, and performing an aggregation; that data product is then used to generate the plot.

6 Generating a Plot [Diagram: an IPython Notebook cell sends an OPeNDAP request URL to OPeNDAP Hyrax; Hyrax reads in data and spits out data ("Badda Bing Badda Boom"); the notebook cell uses the data and generates the plot.] BUT I WANT TO KNOW MORE

7 Some information I WANT to know: How was that plot generated? What software was used to generate the plot and any intermediate data? What data files were read in to generate the plot, what was done to the data, and by what? Where did those data files come from? What parameters are in them? What sensors measured those parameters? Tell me about how the data were measured.

8 Generating a Plot [Diagram: the same flow, in which an IPython Notebook cell sends an OPeNDAP request URL to OPeNDAP Hyrax, Hyrax reads in data and spits out data, and the notebook cell uses the data and generates the plot.] Where did the data files come from?

9 Linked Data: I am also interested in who develops and publishes the software, how the software is licensed, and how I could use it. I’m interested in what IPython Notebooks are, what they can do, and whether I could use them for other projects. And I want to be able to let the "owner" of the data files know that I’ve used the results of an access in a publication, presentation, article, or whatever.

10 What the project focuses on [Diagram: a request URL goes to OPeNDAP Hyrax, shown as the OLFS and the BES; components shown inside the BES: NetCDF, dap, and server-side functions (aggregate, transform).]

11 W3C Prov

12 Prov-O
:dds_of_reading a prov:Entity;
    dcterms:format opendap:DataDDS;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used [
            a vsto:Dataset, prov:Entity, toolmatch:DataCollection;
            toolmatch:hasAccessURL ;
        ];
        prov:used [
            a vsto:Dataset, prov:Entity, toolmatch:DataCollection;
            toolmatch:hasAccessURL ;
        ];
        prov:wasAssociatedWith ;
    ];.

13 Prov-O
:aggregated_dds a prov:Entity;
    dcterms:format opendap:DataDDS;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used :constrained_dds;
        prov:wasAssociatedWith ;
    ];.
:result a foaf:Document;
    nfo:fileName "thursday.h5";
    dcterms:format netcdf;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used :aggregated_dds;
        prov:wasAssociatedWith ;
    ];.
:constrained_dds a prov:Entity;
    dcterms:format opendap:DataDDS;
    prov:wasGeneratedBy [
        a prov:Activity;
        prov:used :dds_of_reading;
        prov:wasAssociatedWith ;
    ];.
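The chain above can be read backwards to answer "where did this result come from?". What follows is a minimal sketch in Python, assuming the triples are held in memory as plain (subject, predicate, object) tuples; the blank-node labels (`_:act1`, `_:ds1`, and so on) are made up for illustration, and the stripped access URLs and agents are left out. It follows `prov:wasGeneratedBy` from an entity to its activity, then `prov:used` to that activity's inputs, and repeats until it reaches entities with no recorded generation.

```python
from collections import deque

# Hypothetical in-memory copy of the PROV-O chain from slides 12 and 13.
# Blank nodes get made-up labels; access URLs and agents are omitted.
TRIPLES = [
    (":dds_of_reading", "prov:wasGeneratedBy", "_:act1"),
    ("_:act1", "prov:used", "_:ds1"),
    ("_:act1", "prov:used", "_:ds2"),
    (":constrained_dds", "prov:wasGeneratedBy", "_:act2"),
    ("_:act2", "prov:used", ":dds_of_reading"),
    (":aggregated_dds", "prov:wasGeneratedBy", "_:act3"),
    ("_:act3", "prov:used", ":constrained_dds"),
    (":result", "prov:wasGeneratedBy", "_:act4"),
    ("_:act4", "prov:used", ":aggregated_dds"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def trace_back(entity, triples=TRIPLES):
    """Breadth-first walk from an entity to everything upstream of it:
    entity -> generating activity -> entities that activity used -> ...
    Returns the upstream entities in the order they are first reached."""
    seen, order = {entity}, []
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for activity in objects(current, "prov:wasGeneratedBy", triples):
            for used in objects(activity, "prov:used", triples):
                if used not in seen:
                    seen.add(used)
                    order.append(used)
                    queue.append(used)
    return order


if __name__ == "__main__":
    print(trace_back(":result"))
```

Against a SPARQL 1.1 store, the same walk could be written as a property path such as `:result (prov:wasGeneratedBy/prov:used)+ ?upstream`.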

14 DOAP – Description of a Project

15 DOAP – Description of a Project
a doap:Project, prov:Entity;
    doap:name "OPeNDAP Back-End Server (BES)";
    doap:developer ;
    doap:homepage ;
    doap:vendor ;
    doap:repository ;
    doap:bug-database ;
    doap:release ;
    doap:description "BES is a high-performance back-end server software framework that allows data providers more flexibility in providing end users views of their data.";
    doap:license ;.
a doap:Version, prov:Entity;
    prov:specializationOf ;
    doap:name "BES-3.12.0";
    doap:revision "3.12.0";
    doap:download-page ;
    doap:repository ;
    doap:license ;
    doap:created "2013-08-27";.

16 DOAP – Description of a Project
a doap:SVNRepository;
    doap:location doap:browse.
dc:description "This software is distributed under the GNU Lesser General Public License ";
    doap:name "GNU LESSER GENERAL PUBLIC LICENSE";
    rdfs:seeAlso ;.
# The hash above is: HASH(config file, BES version that read it)
a prov:Agent;
    prov:wasDerivedFrom , ;.
prov:wasDerivedFrom :config_file_hash;
    # b/c BES set it up:
    prov:wasAttributedTo ;.
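The comment in the listing says the agent's identifier is a hash of the configuration file together with the BES version that read it. Here is a minimal sketch of such a fingerprint, assuming SHA-256 (the slide does not name a hash function); `config_fingerprint` and `fingerprint_config_file` are hypothetical names, the version string comes from the DOAP release above, and the sample config contents are made up.

```python
import hashlib


def config_fingerprint(config_bytes: bytes, bes_version: str) -> str:
    """Hypothetical agent identifier: SHA-256 over the BES version string
    and the raw bytes of the configuration file that version read."""
    digest = hashlib.sha256()
    digest.update(bes_version.encode("utf-8"))
    digest.update(b"\x00")  # separator so version and file bytes cannot run together
    digest.update(config_bytes)
    return digest.hexdigest()


def fingerprint_config_file(path: str, bes_version: str) -> str:
    """Read the config file in binary mode so the hash covers its exact bytes."""
    with open(path, "rb") as f:
        return config_fingerprint(f.read(), bes_version)


if __name__ == "__main__":
    sample = b"# made-up configuration contents\n"
    print(config_fingerprint(sample, "3.12.0"))
```

Folding the version into the hash means the same configuration file read by a different BES release yields a different agent identifier, which matches the "BES version that read it" part of the comment.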

17 What We’re Trying: The BES loads shared modules at startup that handle specific tasks. Our first attempt was to use something called a Reporter, which reports on the completion of a request, but that happens too far after the fact. Our second thought was to have the modules themselves add provenance information on the fly; to me that is ideal, but it is unrealistic. The probable implementation is that provenance is tracked in the BES, the software framework that communicates with the modules.
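To make the "track it in the framework" option concrete, here is a minimal sketch under stated assumptions: `ProvenanceRecorder`, `Entity`, `Activity`, and the module names are hypothetical and are not part of the BES. The point it illustrates is that the modules are plain functions with no provenance code; the framework wraps each dispatch and records which inputs were used, which output was generated, and which module the step was associated with.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator, List


@dataclass(frozen=True)
class Entity:
    """A piece of data flowing through the framework (a PROV entity)."""
    name: str
    value: Any


@dataclass
class Activity:
    """One module invocation dispatched by the framework (a PROV activity)."""
    name: str
    module: str        # agent the activity is associated with
    used: List[str]    # names of the input entities
    generated: str     # name of the output entity


@dataclass
class ProvenanceRecorder:
    """Hypothetical framework-side recorder; the modules never see it."""
    activities: List[Activity] = field(default_factory=list)
    _counter: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def source(self, name: str, value: Any) -> Entity:
        """Wrap an input (e.g. a data file) that no recorded step produced."""
        return Entity(name, value)

    def run(self, module: str, step: Callable[..., Any], *inputs: Entity) -> Entity:
        """Call a module's step on the inputs' values and record the activity."""
        n = next(self._counter)
        output = Entity(f"entity{n}", step(*(e.value for e in inputs)))
        self.activities.append(
            Activity(f"activity{n}", module, [e.name for e in inputs], output.name)
        )
        return output


if __name__ == "__main__":
    rec = ProvenanceRecorder()
    a = rec.source("file_a.nc", [1.0, 2.0])
    b = rec.source("file_b.nc", [3.0, 4.0])
    # Hypothetical module names standing in for reader, constraint, and aggregation steps.
    read = rec.run("reader_module", lambda x, y: x + y, a, b)
    constrained = rec.run("constraint_module", lambda xs: xs[1:], read)
    aggregated = rec.run("aggregation_module", sum, constrained)
    for activity in rec.activities:
        print(activity)
```

Compared with a Reporter, each record is written at the moment the step runs rather than after the request completes; compared with per-module instrumentation, no module has to change.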

18 What’s next: Get more use cases about what types of information we want to collect; write the story about what we’re trying to do; come up with software use cases for the implementation; continue discussing provenance with the core OPeNDAP group; and continue to work with the original Prov group (Tim, Jim, and Stephan) in discussions.

19 Questions

20 ToolMatch Use Case: "I need data for carbon dioxide (CO2) concentrations, a climate change indicator, for the summer of 2012, that can be accessed via OPeNDAP Hyrax and plotted as a time series." "I need data with measurements of atmospheric aerosol optical depth sliced along latitude and longitude, returned as NetCDF data, and accessible in MATLAB."

21 Using SADL

22 Inferencing
Equivalent Class: DataCollection and (isAccessedBy value OPeNDAP) or (hasDataStorageFormat value NetCDF) and (usesGridType value AuxiliaryLatLonGrid) or (usesGridType value RegularLatLonGrid) and usesConvention value ClimateForecast_CF
Subclass Of (Inferred): mappedBy value IDV and mappedBy value McIDAS-V and mappedBy value Panoply

23 Inferencing
Equivalent Class: DataCollection and (isAccessedBy value OPeNDAP) or (hasDataFormat value NetCDF) and usesConvention value CF1Convention and usesConvention value RegularLatLonGrid
Subclass Of (Inferred): mappedBy value Ferret and mappedBy value GrADS

24 Inferencing
Equivalent Class: DataCollection and (isAccessedBy value GrADSDataServer) or (isAccessedBy value Hyrax) or (isAccessedBy value ThreddsDataServer) or (isAccessedBy value erddap)
Subclass Of (Inferred): isAccessedBy value OPeNDAP
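The slides do not show how the reasoner combines these definitions, and the flattened text loses the grouping of the `and`/`or` terms. As a stand-in for the OWL reasoner (this is naive forward chaining, not the SADL rules or a description-logic reasoner), the sketch below shows the two-step effect: the slide 24 definition infers OPeNDAP access from Hyrax, which then lets the slide 22 definition fire. The grouping used for slide 22 is an assumption, and `infer`, `has`, and the dictionary representation of a data collection are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Collection = Dict[str, Set[str]]  # property name -> set of values
Rule = Tuple[Callable[[Collection], bool], List[Tuple[str, str]]]


def has(c: Collection, prop: str, value: str) -> bool:
    return value in c.get(prop, set())


def served_via_opendap(c: Collection) -> bool:
    # Slide 24: access through any of these servers counts as OPeNDAP access.
    return any(has(c, "isAccessedBy", s)
               for s in ("GrADSDataServer", "Hyrax", "ThreddsDataServer", "erddap"))


def cf_latlon_collection(c: Collection) -> bool:
    # Slide 22 with an ASSUMED grouping:
    # (OPeNDAP or NetCDF) and (auxiliary or regular lat/lon grid) and CF convention.
    return (
        (has(c, "isAccessedBy", "OPeNDAP") or has(c, "hasDataStorageFormat", "NetCDF"))
        and (has(c, "usesGridType", "AuxiliaryLatLonGrid")
             or has(c, "usesGridType", "RegularLatLonGrid"))
        and has(c, "usesConvention", "ClimateForecast_CF")
    )


RULES: List[Rule] = [
    (served_via_opendap, [("isAccessedBy", "OPeNDAP")]),
    (cf_latlon_collection,
     [("mappedBy", "IDV"), ("mappedBy", "McIDAS-V"), ("mappedBy", "Panoply")]),
]


def infer(collection: Collection, rules: List[Rule] = RULES) -> Collection:
    """Apply rules repeatedly until no new facts appear (a fixpoint)."""
    facts = {p: set(vs) for p, vs in collection.items()}  # don't mutate the input
    changed = True
    while changed:
        changed = False
        for condition, conclusions in rules:
            if condition(facts):
                for prop, value in conclusions:
                    if value not in facts.setdefault(prop, set()):
                        facts[prop].add(value)
                        changed = True
    return facts


if __name__ == "__main__":
    example = {
        "isAccessedBy": {"Hyrax"},
        "usesGridType": {"RegularLatLonGrid"},
        "usesConvention": {"ClimateForecast_CF"},
    }
    print(sorted(infer(example)["mappedBy"]))
```

Because the loop repeats until nothing new is added, the order of the rules does not affect the final set of inferred facts.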

25 Resulting Query: The resulting query to find the set of tools available to visualize a data collection becomes very simple:
DESCRIBE ?tool
WHERE {
    toolmatch:visualizedBy ?tool .
    ?tool rdf:type toolmatch:Tool .
}
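The query's subject, the data collection URI, did not survive in this transcript. To show what the `WHERE` clause computes, here is a minimal sketch of basic graph pattern matching in Python over a handful of made-up triples; `ex:collection1`, the tool resources, and the `match`/`evaluate` helpers are hypothetical. Each pattern extends the bindings produced by the previous one, so a value for `?tool` survives only if it satisfies both patterns. A real `DESCRIBE` would go on to return RDF descriptions of each bound `?tool`; the sketch stops at the bindings.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Made-up data; the collection and tool URIs are hypothetical.
TRIPLES: List[Triple] = [
    ("ex:collection1", "toolmatch:visualizedBy", "ex:Panoply"),
    ("ex:collection1", "toolmatch:visualizedBy", "ex:IDV"),
    ("ex:collection1", "toolmatch:visualizedBy", "ex:SomeWebPage"),  # not typed as a Tool
    ("ex:Panoply", "rdf:type", "toolmatch:Tool"),
    ("ex:IDV", "rdf:type", "toolmatch:Tool"),
    ("ex:collection2", "toolmatch:visualizedBy", "ex:Ferret"),
    ("ex:Ferret", "rdf:type", "toolmatch:Tool"),
]

QUERY: List[Triple] = [
    ("ex:collection1", "toolmatch:visualizedBy", "?tool"),
    ("?tool", "rdf:type", "toolmatch:Tool"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern (terms starting with '?' are variables) with
    one triple, extending an existing binding; return None on a mismatch."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns (a naive join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    print(sorted(b["?tool"] for b in evaluate(QUERY, TRIPLES)))
```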

26 The Result [Screenshot: results table with Description and Tools columns.]

27 Where are we and what’s next: We’ve got part of the ontology done; we’ve got stuff in the triple store; we need to complete the dataset ontology piece; we need to verify the ontology and rules; we need crowdsourcing for more tools and information about tools; and Patrick needs to understand rules better.

28 Questions

29 References
OPeNDAP Provenance Project
Prov Overview - http://www.w3.org/TR/prov-overview/
OPeNDAP Prov - https://github.com/tetherless-world/opendap/
OPeNDAP LODSPeaKr - http://opendap.tw.rpi.edu/index.html
OPeNDAP Endpoint - http://opendap.tw.rpi.edu/virtuoso/sparql
OPeNDAP - http://opendap.org
ToolMatch Project
ToolMatch - http://wiki.esipfed.org/index.php/ToolMatch
ToolMatch Virtual Server - http://toolmatch.tw.rpi.edu/
ToolMatch Schema - http://toolmatch.tw.rpi.edu/docs/index
ToolMatch Endpoint - http://toolmatch.tw.rpi.edu/sparql

