Published by Audrey Cunningham. Modified over 9 years ago.
1
Data Intensive Techniques to Boost the Real-time Performance of Global Agricultural Data Infrastructures
SemaGrow: Using a POWDER Triple Store for Boosting the Real-time Performance of Global Agricultural Data Infrastructures
KREAM 2013, 5 June 2013
Pythagoras Karampiperis, National Centre for Scientific Research “Demokritos”
2
Outline
- Introduction / Problem Statement
- The SemaGrow Solution
- The POWDER W3C Recommendation
- SemaGrow Architecture
- The SemaGrow Stack
- SemaGrow Maintenance Components
3
Moving Forward with “Old” Technologies
- How many?
- Big Data problem!
- Is it feasible?
4
What the Semantic Web can bring into the picture
- Going beyond existing distributed triple store implementations
  - Link heterogeneous but semantically connected data
  - Index extremely large information volumes (petabyte sizes)
  - Improve information retrieval response
- Data (+ metadata) physically stored at the data provider
  - No need for harvesting
- Vocabularies / thesauri / ontologies of the data provider's choice
  - No need for aligning according to common schemas
- One data access point for the entire data cloud
  - Enabling service-data level agreements with data providers
- Application-level vocabularies / thesauri / ontologies
  - Enabling different application facets for different communities of users over the SAME data pool
5
The SemaGrow Solution
- Use POWDER to mass-annotate large subspaces
- Exploit naming convention regularities to compress the indexes used by the system
- Partition triple patterns in the original query
- Annotate each fragment with an ordered list of data sources most likely to contain relevant data
- Distribute and transform the query fragments
- Collect and align the results
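The partition, annotate and distribute steps on this slide can be pictured with a minimal sketch. It is not SemaGrow code: the `SOURCE_INDEX` catalogue, the source names, and both helper functions are hypothetical, and each triple pattern is simply treated as its own fragment.

```python
from collections import defaultdict

# Hypothetical catalogue: predicate -> data sources, most likely first.
SOURCE_INDEX = {
    "dc:title": ["source_a", "source_b"],
    "dc:subject": ["source_b"],
}

def annotate_fragments(triple_patterns):
    """Pair each triple pattern with its ordered list of candidate sources."""
    return [(tp, SOURCE_INDEX.get(tp[1], [])) for tp in triple_patterns]

def group_by_first_source(annotated):
    """Group patterns by their most likely source, ready for dispatch."""
    plan = defaultdict(list)
    for tp, sources in annotated:
        if sources:  # patterns with no known source are left out of the plan
            plan[sources[0]].append(tp)
    return dict(plan)

patterns = [("?doc", "dc:title", "?t"), ("?doc", "dc:subject", "?s")]
plan = group_by_first_source(annotate_fragments(patterns))
```

Here `plan` maps each source to the patterns it would receive. A real system would also fall back to the next source in each ordered list and then join the collected results.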
6
The POWDER W3C Recommendation
- Exploits natural groupings of URIs to annotate all resources in a subset of the URI space
  - Regular expression based grouping
- Allows properties and their values to be associated with an arbitrary number of subjects within a fully defined semantic framework
- POWDER Description Resources: http://www.w3.org/TR/powder-dr/
- POWDER Formal Semantics: http://www.w3.org/TR/powder-formal/
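The idea of regular expression based grouping can be illustrated with a short sketch. This is not POWDER syntax (POWDER documents are XML) and does not use any POWDER library. The `example.org` URI patterns, the `ex:` properties, and the `describe` function are all made up for illustration.

```python
import re

# Hypothetical descriptions: each pairs a URI pattern with the properties
# that apply to every resource whose URI matches it.
DESCRIPTIONS = [
    (re.compile(r"^http://example\.org/crops/\d+$"), {"ex:topic": "ex:Crop"}),
    (re.compile(r"^http://example\.org/soils/"), {"ex:topic": "ex:Soil"}),
]

def describe(uri):
    """Collect the properties of every description whose pattern matches."""
    props = {}
    for pattern, attached in DESCRIPTIONS:
        if pattern.search(uri):
            props.update(attached)
    return props

crop_props = describe("http://example.org/crops/42")
other_props = describe("http://example.org/other")
```

Two patterns annotate an unbounded number of URIs, which is why this style of grouping can keep an index much smaller than one entry per resource.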
7
The SemaGrow Stack
- Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources
- Targets the federation of independently provided data sources
8
SemaGrow Architecture
- Query Decomposition
- Resource Discovery
- Data Summaries Endpoint
- Federated Endpoint Wrapper
9
Query Decomposition
- Analyses SPARQL queries
- Decides on the optimal way to create query fragments to be dispatched to the sources' endpoints
- Components
  - Query Decomposition: suggests possible decompositions
  - Selector: evaluates these suggestions based on information and predictions from the Resource Discovery component
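The split between suggesting decompositions and selecting one can be sketched as follows. This is an illustrative assumption, not the SemaGrow algorithm: decompositions are enumerated as all partitions of the triple patterns, and the `SOURCES_FOR` table and the `cost` function (one unit per fragment, a large penalty when no single source can answer a whole fragment) are hypothetical stand-ins for Resource Discovery predictions.

```python
def partitions(items):
    """Yield every way to split `items` into non-empty groups (fragments)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # `first` as its own fragment
        for i in range(len(part)):                  # or joined to an existing one
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

# Hypothetical predictions: which sources can answer each pattern.
SOURCES_FOR = {"p1": {"A"}, "p2": {"A"}, "p3": {"B"}}

def cost(decomposition):
    """One request per fragment; heavy penalty if no single source covers it."""
    total = 0
    for fragment in decomposition:
        common = set.intersection(*(SOURCES_FOR[p] for p in fragment))
        total += 1 if common else 100
    return total

best = min(partitions(["p1", "p2", "p3"]), key=cost)
```

Exhaustive enumeration grows very quickly with the number of patterns, so a practical selector would have to prune or use heuristics.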
10
Resource Discovery
- Provides an annotated list of candidate data sources that (possibly) hold triples matching a query pattern
- Sources are annotated with additional information
  - Schema-level metadata
  - Instance-level metadata
  - Predicted response volume
  - Run-time information about current source load
  - Semantic proximity of source and query schemas
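One way such annotations could be used to order candidate sources is sketched below. The slides do not give a ranking formula, so the `Candidate` fields, their value ranges, and the `score` function are assumptions chosen only to show the idea.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_volume: int  # predicted number of matching triples
    load: float            # 0.0 (idle) to 1.0 (saturated)
    proximity: float       # 0.0 to 1.0, semantic proximity to the query schema

def score(c):
    """Assumed scoring: prefer close schemas on lightly loaded sources."""
    if c.predicted_volume == 0:
        return 0.0
    return c.proximity * (1.0 - c.load)

def rank(candidates):
    """Drop sources with a zero score and order the rest, best first."""
    useful = [c for c in candidates if score(c) > 0]
    return sorted(useful, key=score, reverse=True)

candidates = [
    Candidate("a", predicted_volume=100, load=0.5, proximity=0.8),
    Candidate("b", predicted_volume=50, load=0.1, proximity=0.6),
    Candidate("c", predicted_volume=0, load=0.0, proximity=1.0),
]
ranked_names = [c.name for c in rank(candidates)]
```

Source `b` ranks ahead of `a` because its low load outweighs its lower proximity, and `c` is dropped because it is predicted to return nothing.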
11
Data Summaries Endpoint
- Serves metadata about the schema and instances of the various federated data stores
- Receives entity URIs
- Returns the repositories where these entities are located (either at the schema or instance level)
- Returns ontology alignment knowledge regarding entity equivalence between different sources
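The lookup described on this slide can be pictured as an index from entity URIs to repositories, extended with equivalence knowledge. The sketch below assumes in-memory dictionaries. The URIs, repository names, and the `locate` function are hypothetical, and equivalences are followed for one hop only.

```python
# Hypothetical instance-level index: entity URI -> repositories holding it.
INSTANCE_INDEX = {
    "http://example.org/a/rice": {"repo_a"},
    "http://example.org/b/oryza": {"repo_b"},
}

# Hypothetical alignment knowledge: entity URI -> equivalent URIs elsewhere.
EQUIVALENT = {
    "http://example.org/a/rice": {"http://example.org/b/oryza"},
}

def locate(uri):
    """Return repositories holding the entity or one of its known equivalents."""
    uris = {uri} | EQUIVALENT.get(uri, set())
    repos = set()
    for u in uris:
        repos |= INSTANCE_INDEX.get(u, set())
    return repos

rice_repos = locate("http://example.org/a/rice")
```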
12
Federated Endpoint Wrapper
- Manages the communication with external data sources federated by the SemaGrow Stack
- Query Manager
  - Calls the Query Transformation Service when necessary
  - Forwards query fragments to the Query Results Merger
  - Collects and forwards run-time statistics to the Resource Discovery component
- Query Results Merger
  - Pay-as-you-go behaviour
  - Provides first approximations and iteratively refines them if more computational resources are warranted by the reactivity parameters
- Query Transformation Service
  - Accesses the Schema Mappings Repository
  - Rewrites query fragments from the original query schema to that of the data source that will be used for the fragment
  - Rewrites query results from the source schema to the query schema
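The two rewriting directions of the Query Transformation Service can be sketched with a simple term mapping. The `MAPPING` table stands in for the Schema Mappings Repository. Its `q:` and `src:` terms and both functions are hypothetical, and the sketch assumes a one-to-one mapping between terms, which real schema alignments often are not.

```python
# Hypothetical schema mapping: query-schema term -> source-schema term.
MAPPING = {"q:title": "src:name", "q:Grain": "src:Cereal"}

def rewrite_fragment(patterns, mapping):
    """Rewrite every mapped term of each triple pattern into the source schema."""
    return [tuple(mapping.get(term, term) for term in tp) for tp in patterns]

def rewrite_results(rows, mapping):
    """Rewrite bound values in result rows back into the query schema."""
    inverse = {src: qry for qry, src in mapping.items()}
    return [{var: inverse.get(val, val) for var, val in row.items()} for row in rows]

fragment = [("?d", "q:title", "?t"), ("?d", "rdf:type", "q:Grain")]
sent = rewrite_fragment(fragment, MAPPING)
received = rewrite_results([{"?c": "src:Cereal", "?t": "Rice"}], MAPPING)
```

Variables and unmapped terms pass through unchanged, so a fragment addressed to a source that already uses the query schema is left as it is.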
13
Maintenance Components
- Authoring Tool
  - Visual tool for assisting data providers
  - Construction of POWDER statements
  - Provenance and cataloguing metadata
- Ontology Alignment Tool
  - Semi-automatic (with human intervention) alignment of the semantic vocabularies used by data providers and consumers
- Content Classification and Ontology Evolution
  - Refine coarsely annotated data to a level of detail where they can be more accurately aligned with other schemas within the federation
14
Project info
SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures
FP7-ICT-2011.4.4 (Intelligent Information Management)
Partners (No., Name):
1. Universidad de Alcala
2. NCSR “Demokritos”
3. Universita Degli Studi di Roma Tor Vergata
4. Semantic Web Company
5. Institut Za Fiziku
6. Stichting Dienst Landbouwkundig Onderzoek
7. Food and Agriculture Organization of the UN
8. Agroknow Technologies
15
Thank You!
Dr. Pythagoras P. Karampiperis (pythk@iit.demokritos.gr)
Institute of Informatics & Telecommunications (IIT), NCSR “Demokritos” (NCSR)
www.semagrow.eu