SemaGrow: Using a POWDER Triple Store for Boosting the Real-time Performance of Global Agricultural Data Infrastructures

1 Using a POWDER Triple Store for Boosting the Real-time Performance of Global Agricultural Data Infrastructures
SemaGrow: Data Intensive Techniques to Boost the Real-time Performance of Global Agricultural Data Infrastructures
KREAM 2013, 5 June 2013
Pythagoras Karampiperis, National Centre for Scientific Research “Demokritos”

2 Outline
- Introduction / Problem Statement
- The SemaGrow Solution
- The POWDER W3C Recommendation
- SemaGrow Architecture
- The SemaGrow Stack
- SemaGrow Maintenance Components

3 Moving Forward with “Old” Technologies
How many? Big Data problem! Is it feasible?

4 What Semantic Web can bring into the picture
- Going beyond existing Distributed Triple Store Implementations
  - Link Heterogeneous but Semantically Connected Data
  - Index Extremely Large Information Volumes (Peta Sizes)
  - Improve Information Retrieval response
- Data (+Metadata) physically stored in Data Provider
  - No need for harvesting
- Vocabularies / Thesauri / Ontologies of Data Provider choice
  - No need for aligning according to common schemas
- One Data Access Point for the entire Data Cloud
  - Enabling Service-Data level agreements with Data providers
- Application-level Vocabularies / Thesauri / Ontologies
  - Enabling different application facets for different communities of users over the SAME data pool

5 The SemaGrow Solution
- Use POWDER to mass-annotate large subspaces of the URI space
- Exploit naming convention regularities to compress the indexes used by the system
- Partition triple patterns in the original query
- Annotate each fragment with an ordered list of data sources most likely to contain relevant data
- Distribute and transform the query fragments
- Collect and align the results

6 The POWDER W3C Recommendation
- Exploits natural groupings of URIs to annotate all resources in a subset of the URI space
  - Regular expression based grouping
- Allows properties and their values to be associated with an arbitrary number of subjects within a fully-defined semantic framework
- POWDER Description Resources: http://www.w3.org/TR/powder-dr/
- POWDER Formal Semantics: http://www.w3.org/TR/powder-formal/
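The regular-expression grouping described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not POWDER syntax or the SemaGrow implementation: the `DescriptionResource` class, the `describe` function, the example.org URIs, and the property names are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class DescriptionResource:
    """Hypothetical stand-in for one POWDER-style description."""
    include_regex: str  # pattern selecting a subset of the URI space
    properties: dict    # property -> value asserted for every matching URI


def describe(uri, resources):
    """Collect the properties of every description whose pattern matches the URI."""
    result = {}
    for dr in resources:
        if re.search(dr.include_regex, uri):
            result.update(dr.properties)
    return result


# Two illustrative descriptions: one for a whole host, one for a path below it.
RESOURCES = [
    DescriptionResource(r"^http://data\.example\.org/", {"dc:publisher": "Example Provider"}),
    DescriptionResource(r"^http://data\.example\.org/crops/", {"dc:subject": "crops"}),
]

if __name__ == "__main__":
    print(describe("http://data.example.org/crops/wheat", RESOURCES))
```

Two patterns are enough here to annotate every URI under the host, which is the sense in which naming regularities keep the index small: the cost grows with the number of groupings, not with the number of resources.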

7 The SemaGrow Stack
- Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources
- Targets the federation of independently provided data sources

8 SemaGrow Architecture
[Architecture diagram. Components: Query Decomposition, Resource Discovery, Data Summaries Endpoint, Federated Endpoint Wrapper]

9 Query Decomposition
- Analyses SPARQL queries
- Decides on the optimal way to create query fragments to be dispatched to sources’ endpoints
- Components
  - Query Decomposition: Suggestions of possible decompositions
  - Selector: Evaluates these suggestions based on information and predictions from the Resource Discovery Component
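The split between suggesting decompositions and selecting one can be illustrated with a toy sketch. Everything here is assumed for the example: triple patterns are plain tuples, `predicate_index` is a hypothetical predicate-to-sources map standing in for the Resource Discovery Component, and the cost function (number of source requests) is a placeholder for whatever predictions the real Selector uses.

```python
from collections import defaultdict

# A triple pattern is a (subject, predicate, object) tuple; "?x" marks a variable.


def candidate_sources(pattern, predicate_index):
    """Sources that may hold triples for this pattern (looked up by predicate)."""
    return frozenset(predicate_index.get(pattern[1], ()))


def suggest_decompositions(patterns, predicate_index):
    """Return two candidate decompositions of the query into fragments."""
    # Suggestion 1: every pattern is its own fragment.
    singletons = [[p] for p in patterns]
    # Suggestion 2: patterns with identical candidate sources share a fragment.
    groups = defaultdict(list)
    for p in patterns:
        groups[candidate_sources(p, predicate_index)].append(p)
    return [singletons, list(groups.values())]


def select(decompositions, predicate_index):
    """Pick the decomposition that needs the fewest source requests."""
    def cost(decomposition):
        # All patterns in a fragment share one source set, so the first is representative.
        return sum(len(candidate_sources(f[0], predicate_index)) for f in decomposition)
    return min(decompositions, key=cost)


PATTERNS = [("?c", "ex:name", "?n"), ("?c", "ex:yield", "?y"), ("?c", "geo:region", "?r")]
INDEX = {"ex:name": ["A"], "ex:yield": ["A"], "geo:region": ["A", "B"]}

if __name__ == "__main__":
    print(select(suggest_decompositions(PATTERNS, INDEX), INDEX))
```

With the sample data, the grouped suggestion wins because the two patterns answerable only by source A travel together in one fragment, so three requests are issued instead of four.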

10 Resource Discovery
- Provides an annotated list of candidate data sources that (possibly) hold triples matching a query pattern
- Sources are annotated with additional information
  - Schema-level metadata
  - Instance-level metadata
  - Predicted Response Volume
  - Run-time information about current source load
  - Semantic proximity of source and query schemas
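One way to picture how such annotations could yield an ordered list of candidate sources is a simple scoring pass. The slide does not say how the annotations are combined, so the `Candidate` fields, their value ranges, and the scoring formula below are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical annotation record for one candidate data source."""
    source: str
    predicted_volume: int  # predicted number of matching triples
    current_load: float    # 0.0 (idle) to 1.0 (saturated)
    proximity: float       # 0.0 (unrelated schema) to 1.0 (same schema)


def score(candidate):
    """Illustrative score: prefer semantically close, lightly loaded sources."""
    if candidate.predicted_volume == 0:
        return 0.0  # nothing expected from this source
    return candidate.proximity * (1.0 - candidate.current_load)


def rank(candidates):
    """Drop sources with a zero score and order the rest, best first."""
    useful = [c for c in candidates if score(c) > 0]
    return sorted(useful, key=score, reverse=True)


CANDIDATES = [
    Candidate("A", predicted_volume=100, current_load=0.5, proximity=1.0),
    Candidate("B", predicted_volume=50, current_load=0.1, proximity=0.8),
    Candidate("C", predicted_volume=0, current_load=0.0, proximity=1.0),
]

if __name__ == "__main__":
    print([c.source for c in rank(CANDIDATES)])
```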

11 Data Summaries Endpoint
- Serves metadata about the schema and instances of the various federated data stores
- Receives entity URIs
- Returns the repositories where these entities are located (either at the schema or instance level)
- Returns ontology alignment knowledge regarding entity equivalence between different sources
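The request and response behaviour listed on this slide can be mimicked with an in-memory lookup. This is a sketch under stated assumptions, not the endpoint's actual interface: `PREFIX_INDEX`, `ALIGNMENTS`, the repository names, and the example URIs are invented, and equivalence is treated as a flat, non-transitive list of pairs.

```python
def equivalents(uri, alignments):
    """Return the URI together with every entity directly aligned to it."""
    found = {uri}
    for left, right in alignments:
        if left == uri:
            found.add(right)
        elif right == uri:
            found.add(left)
    return found


def locate(uri, prefix_index, alignments):
    """Repositories holding the entity or one of its equivalents, sorted by name."""
    repositories = set()
    for candidate in equivalents(uri, alignments):
        for prefix, repository in prefix_index.items():
            if candidate.startswith(prefix):
                repositories.add(repository)
    return sorted(repositories)


# Hypothetical summaries: URI prefix -> repository, plus known equivalences.
PREFIX_INDEX = {
    "http://a.example.org/": "repoA",
    "http://b.example.org/": "repoB",
}
ALIGNMENTS = [("http://a.example.org/maize", "http://b.example.org/corn")]

if __name__ == "__main__":
    print(locate("http://a.example.org/maize", PREFIX_INDEX, ALIGNMENTS))
```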

12 Federated Endpoint Wrapper
- Manages the communication with external data sources federated by the SemaGrow Stack
- Query Manager
  - Calls the Query Transformation Service when necessary
  - Forwards query fragments to the Query Results Merger
  - Collects and forwards run-time statistics to the Resource Discovery Component
- Query Results Merger
  - Pay-as-you-go behaviour
  - Provides first approximations and iteratively refines them if more computational resources are warranted by the reactivity parameters
- Query Transformation Service
  - Accesses the Schema Mappings Repository
  - Rewrites query fragments from the original query schema to that of the data source that will be used for the fragment
  - Rewrites query results from the source schema to the query schema
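The two rewriting directions of the Query Transformation Service can be sketched as a pair of dictionary lookups. The mapping below plays the role of one entry in the Schema Mappings Repository; its contents, the tuple representation of triple patterns, and the function names are assumptions for illustration, and real schema mappings are generally richer than one-to-one term replacement.

```python
def rewrite_fragment(fragment, mapping):
    """Rewrite each term of each triple pattern from the query schema to the source schema."""
    return [tuple(mapping.get(term, term) for term in pattern) for pattern in fragment]


def rewrite_results(rows, mapping):
    """Rewrite result bindings from the source schema back to the query schema."""
    inverse = {source_term: query_term for query_term, source_term in mapping.items()}
    return [{var: inverse.get(value, value) for var, value in row.items()} for row in rows]


# Hypothetical mapping: query-schema term -> source-schema term.
MAPPING = {"q:Crop": "src:Plant", "q:yield": "src:harvestAmount"}

FRAGMENT = [("?c", "rdf:type", "q:Crop"), ("?c", "q:yield", "?y")]
ROWS = [{"?t": "src:Plant", "?y": "42"}]

if __name__ == "__main__":
    print(rewrite_fragment(FRAGMENT, MAPPING))
    print(rewrite_results(ROWS, MAPPING))
```

Terms without a mapping entry (variables, literals, shared vocabulary such as `rdf:type`) pass through unchanged in both directions.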

13 Maintenance Components
- Authoring Tool
  - Visual tool for assisting data providers
    - Construction of POWDER statements
    - Provenance and cataloguing metadata
- Ontology Alignment Tool
  - Semi-automatic (human intervention) alignment of Semantic Vocabularies used by data providers and consumers
- Content Classification and Ontology Evolution
  - Refine coarsely annotated data to a level of detail where they can be more accurately aligned with other schemas within the federation

14 Project info
- SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures
- FP7-ICT-2011.4.4 (Intelligent Information Management)

No.  Name
1    Universidad de Alcala
2    NCSR “Demokritos”
3    Universita Degli Studi di Roma Tor Vergata
4    Semantic Web Company
5    Institut Za Fiziku
6    Stichting Dienst Landbouwkundig Onderzoek
7    Food and Agriculture Organization of the UN
8    Agroknow Technologies

15 Thank You!
Dr. Pythagoras P. Karampiperis (pythk@iit.demokritos.gr)
Institute of Informatics & Telecommunications (IIT), NCSR “Demokritos” (NCSR)
www.semagrow.eu

