
Modeling Data Set Versioning Operations


Benno Lee (1), Peter Fox (1)
(1) Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, United States

Introduction

Data sets do not remain static after collection. They must often be corrected and extended to ensure data quality and coverage. Popular version naming methods do not capture the magnitude of change a data set undergoes. We use a script to automatically document the changes between two versions of mineralogy data, stored in Excel spreadsheets, that were produced during a series of workshops to develop visualizations for the paragenetic mode of copper data.

Given mappings from Version June to Version August:
- Use the versioning concept model to categorize the changes made to the files
- Instantiate relationships into a machine-readable semantic graph
- Use Resource Description Framework in Attributes (RDFa) to embed machine-readable change information into the change log

1. Concept Model

[Figure panels: Addition, Invalidation, Modification]

2. RDF Graph

The graph displays a selection of modifications made in the data set that correspond to the Addition, Invalidation, and Modification operations in the concept model, respectively.

3. RDF Embedded Changelog

Embedding the change information using RDFa keeps the change log human readable while also allowing for machine consumption.

[Figure callouts: "UTF encoding artifact from browser"; "Artifact source"]

Conclusion

The process of developing and applying a conceptual model of "changes" and embedding machine-readable annotations is an initial demonstration that autonomous systems can collect and encode detailed change information to provide better insight into the transformation of data. Producing change logs in a web-accessible format increases the transparency and availability of change information for data consumers. RDFa-embedded change logs also provide a means for machines to assist consumers in adapting to data evolution.

Further work is needed to ensure the model fully extends to capturing database and web ontology evolution. The graph structure of linked data also reveals a potential to transform the graph into a flow and provide a quantitative measure for data change.

Sponsors: National Science Foundation

Poster:

Glossary:
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
RDFa – Resource Description Framework in Attributes

Acknowledgments:
Shaunna Morrison and the Deep Time Data Infrastructure group for the use of their data
Igor Tolstikhin from Deep Carbon Observatory
The Marine Biodiversity Virtual Laboratory group at Woods Hole Oceanographic Institution
Patrick West from the Tetherless World Constellation at RPI
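To make the workflow described in the poster more concrete, the following is a minimal sketch, not the authors' script. It assumes each version can be reduced to a mapping from a row key to a value, and that Addition means a key present only in the newer version, Invalidation a key present only in the older version, and Modification a key whose value differs between the two. The vocabulary IRI, the property names (`attribute`, `previousValue`, `newValue`), and the sample mineral records are all hypothetical; the poster does not name the vocabulary it used.

```python
from html import escape

# Hypothetical vocabulary IRI; the poster does not state the one it used.
VOCAB = "http://example.org/versioning#"


def classify_changes(old, new):
    """Compare two versions of a keyed table (key -> value).

    Returns (operation, key, old_value, new_value) tuples, using the three
    operation names from the concept model. The mapping of each name to a
    set difference is an assumption made for this sketch.
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("Addition", key, None, new[key]))
        elif key not in new:
            changes.append(("Invalidation", key, old[key], None))
        elif old[key] != new[key]:
            changes.append(("Modification", key, old[key], new[key]))
    return changes


def to_rdfa(changes):
    """Render changes as an HTML list annotated with RDFa attributes."""
    items = []
    for op, key, before, after in changes:
        parts = [f'<span property="attribute">{escape(str(key))}</span>']
        if before is not None:
            parts.append(
                f'<span property="previousValue">{escape(str(before))}</span>'
            )
        if after is not None:
            parts.append(f'<span property="newValue">{escape(str(after))}</span>')
        items.append(f'  <li typeof="{op}">{op}: ' + " ".join(parts) + "</li>")
    return f'<ul vocab="{VOCAB}">\n' + "\n".join(items) + "\n</ul>"


# Hypothetical sample data standing in for the June and August spreadsheets.
june = {"azurite": "carbonate", "bornite": "sulfide", "cuprite": "oxide"}
august = {"azurite": "copper carbonate", "chalcocite": "sulfide", "cuprite": "oxide"}

print(to_rdfa(classify_changes(june, august)))
```

The printed list stays readable in a browser as an ordinary change log, while the `vocab`, `typeof`, and `property` attributes let an RDFa parser extract one typed resource per change.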

