Modeling Data Set Versioning Operations

Slides:



Advertisements
Similar presentations
Towards a Common Provenance Model for Research Publications Linyun Fu Xiaogang Ma Patrick West Stace Beaulieu.
Advertisements

Complexity must become Linear or Decrease Smart data infrastructure: The sixth generation of mediation for data science Peter Fox 1
DCO-VIVO: A Collaborative Data Platform for the Deep Carbon Science Communities Han Wang 1 ( ), Yu Chen 1 Patrick West.
Evolving the BCO-DMO search interface - experience with semantic and smart search Cyndy Chandler (WHOI) Peter Fox (RPI and WHOI) Robert Groman, Dicky Allison.
McGuinness – Microsoft eScience – December 8, Semantically-Enabled Science Informatics: With Supporting Knowledge Provenance and Evolution Infrastructure.
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines James Michaelis ( ), Deborah L. McGuinness
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Semantic Similarity Computation and Concept Mapping in Earth and Environmental Science Jin Guang Zheng Xiaogang Ma Stephan.
ToolMatch: Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Patrick West 1 Nancy Hoebelheinrich.
Key integrating concepts Groups Formal Community Groups Ad-hoc special purpose/ interest groups Fine-grained access control and membership Linked All content.
Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator Session: Managing Ecological Data for Effective Use and Reuse Patrice Seyed.
Provenance-Aware Faceted Search Deborah L. McGuinness 1,2 Peter Fox 1 Cynthia Chang 1 Li Ding 1.
Beyond a Data Portal: A Collaborative Environment for the Deep Carbon Science Communities Han Wang, Yu Chen, Patrick West, John Erickson, Xiaogang Ma,
Progress in Open-World, Integrative, Web-based Collaborative Research Platforms Peter Fox and the DCO-DS* Team Tetherless World Constellation.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
References: [1] [2] [3] Acknowledgments:
Deep Carbon Observatory Data Science and Data Management Infrastructure Overview and Demonstration Patrick West – Tetherless World Constellation Rensselaer.
Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal.
Discovering accessibility, display, and manipulation of data in a data portal Nancy Hoebelheinrich Patrick West 2
TWC Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal Stephan Zednik, Xiaogang Ma, John Erickson, Patrick West, Peter Fox, & DCO-Data.
Motivations and Challenges: Proper data management hinges on recording and maintaining “steps” applied to create data. Consumers require methods to assess.
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
Local global disambiguation of terms and concepts The BCO-DMO metadata database uses controlled vocabularies to record many of the important pieces of.
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
Tetherless World Constellation Open Government Data Jim Hendler Tetherless World Professor of Computer and Cognitive Science Assistant Dean of Information.
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
TWC Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Observatory Community Xiaogang (Marshall) Ma, Yu Chen, Han Wang, Patrick West,
Prof. Peter #twcrpi) Tetherless World Constellation Chair, Earth and Environmental Science/ Computer Science/ Cognitive.
1 Semantic Provenance and Integration Peter Fox and Deborah L. McGuinness Joint work with Stephan Zednick, Patrick West, Li Ding, Cynthia Chang, … Tetherless.
 The many scales encountered in assessing and managing large marine ecosystems (LMEs) presents a level of diversity and heterogeneity, or complexity,
Applying Provenance Extensions to OPeNDAP Framework Patrick West, James Michaelis, Tim Lebo, Deborah L. McGuinness Rensselaer Polytechnic Institute Tetherless.
Deepcarbon.net Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox Tetherless World Constellation Rensselaer Polytechnic.
TWC Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal Stephan Zednik, Xiaogang Ma, John Erickson, Patrick West, Peter Fox, & DCO-Data.
Beginning with an NSF INTEROP project whose goal is to facilitate the deployment of an Integrated Ecosystem Approach (IEA) to management in the Northeast.
ToolMatch Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Products Patrick West 1 Nancy Hoebelheinrich.
Resource Discovery for Extreme Scale Collaboration Benno Lee Patrick West 1 William Smith 2
DCO-VIVO: A Collaborative Data Platform for the Deep Carbon Science Communities Han Wang 1 ( ), Yu Chen 1 Patrick West.
VIVO Conference 2013 Panel on VIVO Use-Cases for Collaborative Science: From Researcher Networks to Semantic User Interfaces for Data Patrick West – Tetherless.
References: [1] Lebo, T., Sahoo, S., McGuinness, D. L. (eds.), PROV-O: The PROV Ontology. Available via: [2]
Deepcarbon.net Xiaogang Ma, Patrick West, John Erickson, Stephan Zednik, Yu Chen, Han Wang, Hao Zhong, Peter Fox Tetherless World Constellation Rensselaer.
Semantic Similarity Computation and Concept Mapping in Earth and Environmental Science Jin Guang Zheng Xiaogang Ma Stephan.
Determining Fitness-For-Use of Ontologies through Change Management, Versioning and Publication Best Practices Patrick West 1 Stephan.
THE SEMANTIC WEB By Conrad Williams. Contents  What is the Semantic Web?  Technologies  XML  RDF  OWL  Implementations  Social Networking  Scholarly.
 Key integrating concepts  Groups  Formal Community Groups  Ad-hoc special purpose/ interest groups  Fine-grained access control and membership 
TWC Illuminate Knowledge Elements in Geoscience Literature Xiaogang (Marshall) Ma, Jin Guang Zheng, Han Wang, Peter Fox Tetherless World Constellation.
Determining Fitness-For-Use of Ontologies through Change Management, Versioning and Publication Best Practices Patrick West 1 Stephan.
Supported by ESIP Semantic Web Cluster A service based on community-built semantic web applications Provide users with the means to match their datasets.
Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal.
Deep Carbon Observatory Data Science and Data Management Infrastructure Overview and Demonstration Patrick West – Tetherless World Constellation Rensselaer.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
Presenting Semantic Data Through “Instance Hubs” Using Authoritative URI Design Schemes Alexei Bulazel 1 ( ), Dominic Difranzo 1 (
Oceanobservatories.org Funding for the Ocean Observatories Initiative is provided by the National Science Foundation through a Cooperative Agreement with.
Tetherless World Constellation Open Government Data Jim Hendler Tetherless World Professor of Computer and Cognitive Science Assistant Dean of Information.
Social and Personal Factors in Semantic Infusion Projects Patrick West 1 Peter Fox 1 Deborah McGuinness 1,2
TWC Adoption* of RDA DTR and PIT in the Deep Carbon Observatory Data Portal Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox, & the.
Poster: EGU Glossary: USGCRP – United States Global Change Research Program NCA – National Climate Assessment GCIS – Global Change Information.
Scaling the Wall: Experiences adapting a Semantic Web application to utilize social networks on mobile devices Evan W. Patton 1 ( ) &
British Oceanographic Data Centre, Liverpool, England, United Kingdom
Get the poster at Semantic Visualization Provenance Records:
Linking Environmental Data and Samples
Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox,
Deep Carbon Observatory Data Science Platform
CMSP / OCM Vocabulary Services rpi
Data types and persistent identifiers in
ToolMatch Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Products Patrick West1 Nancy
Adoption of RDA DTR and PIT in the Deep Carbon Observatory Data Portal
Towards Executable Provenance Graphs for Reported Results in Research Publications Linyun Fu Xiaogang Ma Patrick West
Modeling Data Set Versioning Operations
Presentation transcript:

Modeling Data Set Versioning Operations Benno Lee1 (leeb5@rpi.edu), Peter Fox1 (pfox@cs.rpi.edu), (1Rensselaer Polytechnic Institute 110 8th St., Troy, NY, 12180 United States) Introduction Data sets do not remain stagnant after collection.  They must often be corrected and grown to ensure data quality and coverage.  Popular version naming methods do not capture the magnitude of change a data set undergoes. We use a script to automatically document the changes undergone by two versions of mineralogy data stored in Excel spreadsheets during a series of workshops to develop visualizations for the paragenetic mode of copper data. Given mappings from Version June to Version August Use the versioning concept model to categorize the changes made to the files Instantiate relationships into a machine-readable semantic graph Use Resource Description Framework in Attributes (RDFa) to embed machine-readable change information into the change log 1. Concept Model Addition Invalidation 2. RDF Graph The graph displays a selection of modifications made in the data set that correspond to the Addition, Invalidation, and Modification operations in the concept model, respectively. Modification 3. RDF Embedded Changelog Conclusion Conclusion Conclusion The process of developing and applying a conceptual model of "changes" and embedding machine readable annotations is an initial demonstration that autonomous systems can collect and encode detailed change information to provide better insight into the transformation of data. The production of change logs into a web accessible format increases the transparency and availability of change information for data consumers. RDFa embedded change logs also provide a means for machines to assist consumers in adapting to data evolution. Further work needs to be completed to ensure the model fully extends to capturing database and web ontology evolution. The graph structure of linked data also reveals a potential to transform the graph into a flow and provide a quantitative measure for data change. Embedding the change information using RDFa keeps the change log human readable, while also allowing for machine consumptions UTF encoding artifact from browser Sponsors: National Science Foundation Poster: Glossary: RPI – Rensselaer Polytechnic Institute TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute RDFa – Resource Description Framework in Attributes Artifact source Acknowledgments: Shaunna Morrison and the Deep Time Data Infrastructure group for the use of their data Igor Tolstikhin from Deep Carbon Observatory The Marine Biodiversity Virtual Laboratory group at Woods Hole Oceanographic Institution Patrick West from the Tetherless World Constellation at RPI