Resource Discovery for Extreme Scale Collaboration


Benno Lee 1 (leeb5@rpi.edu), Patrick West 1 (westp@rpi.edu), William Smith 2 (william.smith@pnnl.gov), Sumit Purohit 2 (sumit.purohit@pnnl.gov), Karen Schuchardt 2 (karen.schuchardt@pnnl.gov), Alan Chappell 2 (alan.chappell@pnnl.gov), Peter Fox 1 (pfox@cs.rpi.edu), Jesse Weaver 2 (Jesse.Weaver@pnnl.gov)

(1 Rensselaer Polytechnic Institute, 2 Pacific Northwest National Laboratory)

The amount of data produced in the practice of science is growing rapidly. Despite the accumulation of, and demand for, scientific data, relatively little is actually made available to the broader scientific community. We surmise that the root of the problem is the perceived difficulty of electronically publishing scientific data and associated metadata in a way that makes it discoverable. We propose to exploit Semantic Web technologies and practices to make (meta)data discoverable and easy to publish. We share our experiences in curating metadata to illustrate both the flexibility of our approach and the pain of discovering data in the current research environment. We also recommend, by concrete example, how data publishers can provide their (meta)data by adding a limited amount of markup to HTML pages on the Web. With little additional effort from data publishers, the difficulty of data discovery/access/sharing can be greatly reduced and the impact of research data greatly enhanced.

RDESC Architecture

TWC/RPI S2S Faceted Browser

Facets on the left allow users to constrain their search based on data resources, GCMD Keywords, Special Measured Parameters, and lat/lon coordinates. The facets changed over time based on the metadata extracted while ingesting the various data resources.
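The faceted constraint described for the S2S browser can be illustrated with a minimal Python sketch. It is not the S2S implementation: the `DatasetRecord` type, the `facet_search` and `facet_counts` functions, and every record in `RECORDS` are hypothetical and invented for illustration. The sketch only shows the idea of narrowing a set of metadata records by data resource, keyword, measured parameter, and a lat/lon bounding box.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; field names are assumptions."""
    title: str
    resource: str          # data resource the record was ingested from
    keywords: frozenset    # keyword facet values
    parameters: frozenset  # measured-parameter facet values
    lat: float
    lon: float


def facet_search(records, resource=None, keyword=None, parameter=None, bbox=None):
    """Return the records that satisfy every facet that was supplied.

    bbox is (south, west, north, east) in degrees. Boxes that cross the
    antimeridian are not handled in this sketch.
    """
    matches = []
    for rec in records:
        if resource is not None and rec.resource != resource:
            continue
        if keyword is not None and keyword not in rec.keywords:
            continue
        if parameter is not None and parameter not in rec.parameters:
            continue
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= rec.lat <= north and west <= rec.lon <= east):
                continue
        matches.append(rec)
    return matches


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    counts = Counter()
    for rec in records:
        value = getattr(rec, facet)
        values = value if isinstance(value, frozenset) else [value]
        counts.update(values)
    return dict(counts)


# Invented example records, not real catalogue entries.
RECORDS = [
    DatasetRecord("Example aerosol profile", "GCMD",
                  frozenset({"ATMOSPHERE", "AEROSOLS"}),
                  frozenset({"aerosol optical depth"}), 36.6, -97.5),
    DatasetRecord("Example surface meteorology stream", "ARM",
                  frozenset({"ATMOSPHERE"}),
                  frozenset({"air temperature", "wind speed"}), 36.6, -97.5),
    DatasetRecord("Example ocean temperature grid", "GCMD",
                  frozenset({"OCEANS"}),
                  frozenset({"sea surface temperature"}), -10.0, 150.0),
]

if __name__ == "__main__":
    hits = facet_search(RECORDS, keyword="ATMOSPHERE",
                        bbox=(30.0, -110.0, 45.0, -90.0))
    for rec in hits:
        print(rec.resource, rec.title)
    print(facet_counts(hits, "resource"))
```

Recomputing `facet_counts` over the current result set is one simple way a browser could show which facet values remain available after each constraint is applied.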
RDESC RDF Graphs

An example description of a GCMD dataset as an RDF graph, using the initial ontology.

The current ontology. Ovals represent classes/concepts, and arrows indicate subClassOf relationships. Classes are colored so that darker classes were established in the ontology prior to lighter classes.

An example of an RDF description for an ARM data stream, showing how the ARM measured property hierarchy is used to link data streams to measured properties of interest.

Conclusion

We have emphasized the importance of data publishers providing their (meta)data in a way that makes structural and semantic integration a natural process. This is accomplished by following a shared vocabulary of terms embodied as an ontology, and by expressing metadata as RDF triples that utilize the ontology. Although this can sound daunting, we showed that doing so is actually quite easy in practice. We demonstrated the flexibility of this approach by curating existing metadata into the recommended format. Publishing (meta)data in this (or a similar) way will ameliorate (at least in part) the poor data sharing practices that currently pervade the practice of science.

No matter what dataset we have ingested, we will be able to present the metadata in search and browse interfaces, like S2S above, and provide splash pages for each dataset with the information retrieved from the external system. And as you can see, the metadata retrieved from the various systems can be quite different.
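As a rough illustration of expressing metadata as RDF triples over a shared vocabulary, and of linking data streams to measured properties through a hierarchy, here is a small Python sketch. It does not use the RDESC ontology: the `http://example.org/rdesc-sketch#` namespace, the terms `DataStream`, `title`, `measures`, and `broader`, and all of the example data are hypothetical placeholders. Only the `rdf:type` IRI is a real W3C term.

```python
# Real W3C term.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical namespace and terms, standing in for a shared ontology.
EX = "http://example.org/rdesc-sketch#"


def iri(local_name):
    """Build an N-Triples IRI term in the hypothetical namespace."""
    return "<" + EX + local_name + ">"


def literal(text):
    """Build a plain N-Triples string literal with basic escaping."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


MEASURES = iri("measures")
BROADER = iri("broader")

# Invented metadata: two data streams and a tiny property hierarchy.
TRIPLES = [
    (iri("stream1"), RDF_TYPE, iri("DataStream")),
    (iri("stream1"), iri("title"), literal("Example surface meteorology stream")),
    (iri("stream1"), MEASURES, iri("AirTemperature")),
    (iri("AirTemperature"), BROADER, iri("Temperature")),
    (iri("stream2"), RDF_TYPE, iri("DataStream")),
    (iri("stream2"), MEASURES, iri("WindSpeed")),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def narrower_or_equal(triples, prop):
    """Collect prop plus every property below it in the hierarchy."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == BROADER and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def streams_measuring(triples, prop):
    """Find subjects that measure prop or any narrower property."""
    wanted = narrower_or_equal(triples, prop)
    return sorted(s for s, p, o in triples if p == MEASURES and o in wanted)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(streams_measuring(TRIPLES, iri("Temperature")))
```

Because every record reduces to the same three-part statements, metadata curated from different systems can be merged by concatenating triple lists, and a query on a general property also reaches streams described with a more specific one.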
Acknowledgments: Eric Rozell, Master's student at Rensselaer Polytechnic Institute, now with Microsoft

Sponsors: US Department of Energy

Glossary:
ARM – Atmospheric Radiation Measurement
OWL – Web Ontology Language
PNNL – Pacific Northwest National Laboratory
RDESC – Resource Discovery for Extreme Scale Collaboration
RDFS – RDF Schema (Resource Description Framework Schema)
RPI – Rensselaer Polytechnic Institute
SPARQL – an RDF query language
S2S – a faceted web browser
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute

Resources:
http://rdesc.org - site developed for the RDESC project
http://rdesc.org/2014/ - the RDESC ontology

