Data Intensive Techniques to Boost the Real-time Performance of Global Agricultural Data Infrastructures
SemaGrow
Using a POWDER Triple Store for Boosting the Real-Time Performance of Global Agricultural Data Infrastructures
KREAM, June 2013
Pythagoras Karampiperis
National Centre for Scientific Research “Demokritos”
Outline
KREAM, 5 June 2013
- Introduction / Problem Statement
- The SemaGrow Solution
- The POWDER W3C Recommendation
- SemaGrow Architecture
- The SemaGrow Stack
- SemaGrow Maintenance Components
Moving Forward with “Old” Technologies
- How many?
- Big Data problem!
- Is it feasible?
What the Semantic Web Can Bring into the Picture
- Going beyond existing distributed triple store implementations
  - Link heterogeneous but semantically connected data
  - Index extremely large information volumes (peta-scale)
  - Improve information retrieval response
- Data (and metadata) physically stored at the data provider
  - No need for harvesting
- Vocabularies / thesauri / ontologies of the data provider's choice
  - No need for aligning according to common schemas
- One data access point for the entire data cloud
  - Enabling service-data level agreements with data providers
- Application-level vocabularies / thesauri / ontologies
  - Enabling different application facets for different communities of users over the SAME data pool
The SemaGrow Solution
- Use POWDER to mass-annotate large subspaces
- Exploit naming convention regularities to compress the indexes used by the system
- Partition triple patterns in the original query
- Annotate each fragment with an ordered list of data sources most likely to contain relevant data
- Distribute and transform the query fragments
- Collect and align the results
The POWDER W3C Recommendation
- Exploits natural groupings of URIs to annotate all resources in a subset of the URI space
  - Regular expression based grouping
- Allows properties and their values to be associated with an arbitrary number of subjects within a fully defined semantic framework
- POWDER Description Resources:
- POWDER Formal Semantics:
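
The regular-expression grouping idea above can be illustrated with a minimal Python sketch. It is not POWDER's XML syntax and not SemaGrow code: the `DescriptionResource` class, the `example.org` URI patterns, and the `topic`/`source` properties are all hypothetical, chosen only to show how one rule can annotate every resource in a URI subspace without indexing each resource individually.

```python
import re
from dataclasses import dataclass


@dataclass
class DescriptionResource:
    """Hypothetical POWDER-like rule: a URI pattern plus shared properties."""
    iri_pattern: str   # regular expression over resource URIs
    properties: dict   # property -> value asserted for every matching URI

    def matches(self, uri):
        return re.fullmatch(self.iri_pattern, uri) is not None


# Assumed example rules; real deployments would derive these from provider metadata.
RULES = [
    DescriptionResource(r"http://example\.org/crops/.*",
                        {"topic": "crops", "source": "endpoint-A"}),
    DescriptionResource(r"http://example\.org/soil/\d+",
                        {"topic": "soil", "source": "endpoint-B"}),
]


def describe(uri, rules=RULES):
    """Collect the properties of every rule whose pattern covers the URI."""
    merged = {}
    for rule in rules:
        if rule.matches(uri):
            merged.update(rule.properties)
    return merged
```

Two rules here stand in for an index over arbitrarily many resources, which is the compression effect the slides attribute to naming-convention regularities.
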
The SemaGrow Stack
- Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources
- Targets the federation of independently provided data sources
SemaGrow Architecture
- Query Decomposition
- Resource Discovery
- Data Summaries Endpoint
- Federated Endpoint Wrapper
Query Decomposition
- Analyses SPARQL queries
- Decides on the optimal way to create query fragments to be dispatched to the sources' endpoints
- Components
  - Query Decomposition: suggests possible decompositions
  - Selector: evaluates these suggestions based on information and predictions from the Resource Discovery component
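
The suggest-then-select split described above might look roughly like the following Python sketch. The two candidate plans, the `source_of` callback, and the cost constants are illustrative assumptions, not the SemaGrow algorithm; a real selector would use predictions from the Resource Discovery component rather than fixed weights.

```python
def suggest_decompositions(patterns, source_of):
    """Return two candidate plans as lists of (source, patterns) fragments.

    Plan 1 sends every triple pattern as its own fragment;
    plan 2 groups all patterns answered by the same source into one fragment.
    """
    per_pattern = [(source_of(p), (p,)) for p in patterns]
    grouped = {}
    for p in patterns:
        grouped.setdefault(source_of(p), []).append(p)
    per_source = [(src, tuple(ps)) for src, ps in grouped.items()]
    return [per_pattern, per_source]


def select_plan(plans, request_overhead=1.0, pattern_cost=0.1):
    """Pick the plan with the lowest estimated cost (hypothetical cost model)."""
    def cost(plan):
        return sum(request_overhead + pattern_cost * len(ps) for _, ps in plan)
    return min(plans, key=cost)


# Usage with made-up prefixes and endpoints:
patterns = [("?x", "ex:name", "?n"), ("?x", "ex:yield", "?y"), ("?x", "geo:lat", "?lat")]
source_of = lambda p: "geo-endpoint" if p[1].startswith("geo:") else "crop-endpoint"
best = select_plan(suggest_decompositions(patterns, source_of))
```

Under this toy cost model the grouped plan wins because it issues two requests instead of three.
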
Resource Discovery
- Provides an annotated list of candidate data sources that (possibly) hold triples matching a query pattern
- Sources are annotated with additional information
  - Schema-level metadata
  - Instance-level metadata
  - Predicted response volume
  - Run-time information about current source load
  - Semantic proximity of source and query schemas
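
As a rough illustration of how such annotations could produce an ordered list of candidate sources, the sketch below ranks sources with a simple weighted score. The `CandidateSource` fields mirror three of the annotations listed above, but the scoring formula and its weights are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class CandidateSource:
    endpoint: str
    predicted_volume: int      # predicted number of matching triples
    current_load: float        # 0.0 (idle) to 1.0 (saturated)
    semantic_proximity: float  # 0.0 (unrelated) to 1.0 (same schema)


def rank_sources(candidates, w_prox=0.6, w_load=0.4):
    """Drop sources predicted to return nothing, then order the rest by score."""
    useful = [c for c in candidates if c.predicted_volume > 0]
    return sorted(
        useful,
        key=lambda c: w_prox * c.semantic_proximity - w_load * c.current_load,
        reverse=True,
    )


# Usage with invented endpoints and measurements:
ranked = rank_sources([
    CandidateSource("endpoint-A", 1000, 0.9, 0.9),
    CandidateSource("endpoint-B", 200, 0.1, 0.7),
    CandidateSource("endpoint-C", 0, 0.0, 1.0),
])
```

In this example the lightly loaded `endpoint-B` is ranked ahead of the closer but busier `endpoint-A`, and `endpoint-C` is excluded because it is predicted to hold no matching triples.
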
Data Summaries Endpoint
- Serves metadata about the schema and instances of the various federated data stores
- Receives entity URIs
- Returns the repositories where these entities are located (either at the schema or instance level)
- Returns ontology alignment knowledge regarding entity equivalence between different sources
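
The lookup behaviour described above can be sketched as a small Python function. The in-memory dictionaries, URIs, and repository names are hypothetical placeholders; the slides do not specify how the endpoint stores or serves its summaries.

```python
# Assumed toy index: entity URI -> repositories that mention it.
SCHEMA_INDEX = {
    "http://example.org/a#yield": {"repo-A"},
    "http://example.org/b#cropYield": {"repo-B"},
}

# Assumed alignment knowledge: entity URI -> equivalent URIs in other sources.
EQUIVALENT = {
    "http://example.org/a#yield": {"http://example.org/b#cropYield"},
}


def locate(uri):
    """Return repositories for the URI and for each known equivalent entity."""
    uris = {uri} | EQUIVALENT.get(uri, set())
    return {u: sorted(SCHEMA_INDEX.get(u, set())) for u in sorted(uris)}
```

Returning the equivalents together with their repositories lets a caller route a query fragment to a source that uses a different vocabulary for the same concept.
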
Federated Endpoint Wrapper
- Manages the communication with external data sources federated by the SemaGrow Stack
- Query Manager
  - Calls the Query Transformation Service when necessary
  - Forwards query fragments to the Query Results Merger
  - Collects and forwards run-time statistics to the Resource Discovery component
- Query Results Merger
  - Pay-as-you-go behaviour
  - Provides first approximations and iteratively refines them if more computational resources are warranted by the reactivity parameters
- Query Transformation Service
  - Accesses the Schema Mappings Repository
  - Rewrites query fragments from the original query schema to that of the data source that will be used for the fragment
  - Rewrites query results from the source schema to the query schema
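
Two of the behaviours above, schema rewriting and pay-as-you-go merging, can be sketched in Python as follows. The predicate mapping, the `budget` parameter (standing in for the reactivity parameters), and the list-based result streams are assumptions made for illustration; they are not taken from the SemaGrow implementation.

```python
def rewrite_fragment(patterns, mapping):
    """Rewrite predicates from the query schema to the source schema.

    Predicates without a mapping are passed through unchanged.
    """
    return [(s, mapping.get(p, p), o) for s, p, o in patterns]


def merge_pay_as_you_go(result_streams, budget):
    """Yield a growing, de-duplicated result list after each source is consumed.

    Stops once `budget` sources have been read, so a caller can use the first
    approximation immediately and keep iterating only if refinement is worth it.
    """
    seen = []
    for i, stream in enumerate(result_streams):
        if i >= budget:
            break
        for row in stream:
            if row not in seen:
                seen.append(row)
        yield list(seen)


# Usage with invented data:
rewritten = rewrite_fragment([("?x", "q:yield", "?y")], {"q:yield": "src:cropYield"})
snapshots = list(merge_pay_as_you_go([[1, 2], [2, 3], [4]], budget=2))
```

Because the merger is a generator, each snapshot is available as soon as its source has been consumed, which matches the idea of refining an initial approximation step by step.
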
Maintenance Components
- Authoring Tool
  - Visual tool for assisting data providers
  - Construction of POWDER statements
  - Provenance and cataloguing metadata
- Ontology Alignment Tool
  - Semi-automatic (with human intervention) alignment of the semantic vocabularies used by data providers and consumers
- Content Classification and Ontology Evolution
  - Refine coarsely annotated data to a level of detail where they can be more accurately aligned with other schemas within the federation
Project Info
SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures
FP7-ICT (Intelligent Information Management)

No.  Name
1    Universidad de Alcala
2    NCSR “Demokritos”
3    Universita Degli Studi di Roma Tor Vergata
4    Semantic Web Company
5    Institut Za Fiziku
6    Stichting Dienst Landbouwkundig Onderzoek
7    Food and Agriculture Organization of the UN
8    Agroknow Technologies
Thank You!
Dr. Pythagoras P. Karampiperis
Institute of Informatics & Telecommunications (IIT), NCSR “Demokritos” (NCSR)