Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web

Benno Lee (1) (leeb5@rpi.edu), Sumit Purohit (2) (sumit.purohit@pnnl.gov), William Smith (2) (william.smith@pnnl.gov), Jesse Weaver (2) (Jesse.Weaver@pnnl.gov), Alan Chappell (2) (alan.chappell@pnnl.gov), Patrick West (1) (westp@rpi.edu), Peter Fox (1) (pfox@cs.rpi.edu)
(1) Rensselaer Polytechnic Institute, Troy, NY 12180, United States
(2) Pacific Northwest National Laboratory, Richland, WA, United States

Poster: IN33C-3785

Glossary:
RDESC – Resource Discovery for Extreme Scale Collaboration
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute

Acknowledgments: Eric Rozell – RPI Master's student, now with Microsoft

Sponsors: Department of Energy

Abstract

The volume and variety of data generated in science is rapidly increasing. Geophysical science is no exception: various independent projects produce disparate, heterogeneous datasets. While researchers typically make this data available to others, there is a need to make these valuable resources more discoverable and understandable to user communities in order to accelerate scientific research. The cost of making data discoverable and understandable depends on how the original data was curated, transformed, generated, and published. User interfaces and visualizations that support exploration of and interaction with the data further enhance understanding of the available content. This presentation describes research and development conducted under the Resource Discovery for Extreme Scale Collaboration (RDESC) project. As part of RDESC we curate, clean, publish, and visualize scientific data following Linked Data principles.

To enable discovery and understandability, we curated data from multiple, interdisciplinary science domains and represented the metadata using standard Semantic Web and Web technologies. This transformation generated some 1.4 billion RDF triples that describe these previously existing data resources. These efforts led us to formulate a number of suggested best practices for data publishers to reduce the cost of, and barriers to, making data discoverable and understandable to research communities. Additionally, we developed a set of tools that provide scalable visualizations of this large-scale metadata to make the data resources more understandable to prospective users.

Resource splash pages dynamically generated using the twsparql module.
TWC S2S Faceted browser interface allowing search for collected resources.
First attempt at curating information from various sources: crawling OPeNDAP Hyrax installations to grab resources.
Overall architecture of RDESC: curating information, trying different systems of curating, translating the information into a semantic representation, different triple stores to store the semantic information, and different ways of visualizing the information.
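As an illustration of what representing curated metadata as RDF triples can look like, the following is a minimal sketch that serializes one dataset description as N-Triples using Dublin Core terms, FOAF, and Schema.org, the vocabularies the information model draws on. The record, its field names, the example.org URLs, and the choice of predicates are all assumptions made for illustration; this is not the RDESC ontology or the project's actual pipeline.

```python
# Namespaces of the public vocabularies named on the poster.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
SCHEMA = "http://schema.org/"


def iri(value):
    """Wrap an absolute IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value):
    """Render a plain string literal, escaping backslash, quote, and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def dataset_to_ntriples(record):
    """Turn one curated metadata record (hypothetical layout) into N-Triples lines."""
    subject = iri(record["url"])
    triples = [
        (subject, iri(RDF_TYPE), iri(SCHEMA + "Dataset")),
        (subject, iri(DCT + "title"), literal(record["title"])),
        (subject, iri(DCT + "description"), literal(record["description"])),
        (subject, iri(FOAF + "page"), iri(record["landing_page"])),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Hypothetical record, invented for this sketch.
EXAMPLE_RECORD = {
    "url": "http://example.org/dataset/sst-monthly",
    "title": "Monthly sea surface temperature",
    "description": "Hypothetical gridded dataset used only for illustration.",
    "landing_page": "http://example.org/dataset/sst-monthly.html",
}

for line in dataset_to_ntriples(EXAMPLE_RECORD):
    print(line)
```

Each output line is one triple; a real deployment would more likely use an RDF library and load the result into a triple store rather than build strings by hand.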
RDESC Information Model utilizing already existing models:
FOAF – Friend of a Friend
DC – Dublin Core terms
Schema.org – common set of schemas for structured data and markup for the web

RDESC web site http://rdesc.org using simple, standard web technologies

Total number of triples currently being used: 230,743,316
Total number of triples available

Web Presence: RDESC Ontology resolvable at http://rdesc.org/2014/

Virtuoso, Stardog

Take Away:
Multiple sources of data curated into a seamless Semantic Knowledge Store for searching, browsing, and visualization
Information represented in a common semantic information model using RDF
Research into the use of various semantic technologies with billions of triples – storage, search, browse, visualization
Best practices showing the importance of providing rich information, context, and experience with existing metadata

Future Work:
Trying different content management systems with the large number of triples
Distributed/Federated system

Semantically represented information flattened and pushed into Apache Solr (left), or retrieved directly from the RDESC Knowledge Store (right).
From either Solr or the S2S Faceted browser, resources are displayed within the content management system.
Showing the difference between limited provided information (left) and semantically rich information (right).
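The flattening step mentioned in the Solr caption can be sketched roughly as follows: triples about the same subject are grouped into one flat, multi-valued document of the kind a search index expects. The field-naming rule (predicate local name), the `id` field, and the sample triples are assumptions for illustration; the poster does not describe the actual Solr schema or indexing code.

```python
def local_name(predicate_iri):
    """Return the part of an IRI after the last '#' or '/'."""
    tail = predicate_iri.rstrip("/#")
    cut = max(tail.rfind("#"), tail.rfind("/"))
    return tail[cut + 1:]


def flatten(triples):
    """Group (subject, predicate, object) tuples into one flat dict per subject.

    Every predicate becomes a multi-valued field named after its local name.
    A predicate whose local name is 'id' would collide with the document key;
    this sketch does not handle that case.
    """
    docs = {}
    for subject, predicate, obj in triples:
        doc = docs.setdefault(subject, {"id": subject})
        doc.setdefault(local_name(predicate), []).append(obj)
    return list(docs.values())


# Hypothetical triples, invented for this sketch.
TRIPLES = [
    ("http://example.org/dataset/1", "http://purl.org/dc/terms/title", "Ocean temperature"),
    ("http://example.org/dataset/1", "http://purl.org/dc/terms/subject", "oceanography"),
    ("http://example.org/dataset/1", "http://purl.org/dc/terms/subject", "temperature"),
    ("http://example.org/dataset/2", "http://purl.org/dc/terms/title", "Soil moisture"),
]

for document in flatten(TRIPLES):
    print(document)
```

Flattening trades the graph structure for fast keyword and facet search, which is presumably why the poster pairs the index with direct retrieval from the knowledge store when richer semantics are needed.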
