Intelligent Technologies Module: Ontologies and their use in Information Systems Part II Alex Poulovassilis November/December 2009.



In the previous lecture we discussed:
- What is an ontology?
- How are ontologies developed?
- Reasoning over ontologies
- Usage scenarios for ontologies in Information Systems: supporting personalisation

Outline of this lecture

In this lecture we will look at two other major usage scenarios for ontologies in Information Systems:
- Ontology as global schema, and in particular: integrating and querying heterogeneous databases under an ontology
- Enabling interoperability between different systems, via ontology-assisted service reconciliation

6. Ontology as Global Schema

We will look at:
- the challenges faced in integrating heterogeneous data
- the main approaches to heterogeneous data integration
- our work at Birkbeck in this area
- three case studies, illustrating three different ways in which heterogeneous databases may be integrated under an ontology, and how querying is supported in each case

Challenges of heterogeneous data integration

- Increasingly large volumes of data are being made available
- Data sources are often developed by different people, with differing requirements, for differing purposes
- Data sources may therefore be heterogeneous in terms of their:
  - data model
  - query interfaces
  - query processing capabilities
  - database schema or data exchange format
  - data types used
  - nomenclature adopted
- Integrating data sources to meet the needs of new users or new applications requires reconciliation of such heterogeneities

Two main approaches to data integration (DI)

Materialised DI:
- import the source data into a centralised staging area
- define a Data Warehouse (DW) schema, typically within a relational Database Management System (DBMS) such as Oracle, DB2, SQL Server etc.
- clean, transform and aggregate the imported data so that it conforms to the DW schema
- load the data into the DW
- query the DW via the DBMS's querying facilities
- incrementally update the DW when data sources change (insertions, deletions, updates ...), utilising facilities that the DBMS provides
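
The materialised (extract-transform-load) cycle above can be sketched in a few lines of Python; the source rows, cleaning rules and warehouse layout are invented for illustration and stand in for a real DBMS-backed data warehouse:

```python
# Minimal sketch of a materialised data-integration (ETL) cycle.
# The sources, warehouse layout and cleansing rules are illustrative
# assumptions, not any specific DBMS's facilities.

def extract(source):
    """Pull raw rows from a (here: in-memory) data source."""
    return list(source)

def transform(rows):
    """Clean and conform rows to the warehouse schema (name, price_gbp)."""
    out = []
    for r in rows:
        name = r["name"].strip().title()      # data cleansing
        price = round(float(r["price"]), 2)   # type conformance
        out.append({"name": name, "price_gbp": price})
    return out

def load(warehouse, rows):
    """Append conformed rows into the central warehouse table."""
    warehouse.extend(rows)

warehouse = []   # stands in for a DW table
source_a = [{"name": "  widget ", "price": "9.99"}]
source_b = [{"name": "GADGET", "price": "12"}]

for src in (source_a, source_b):
    load(warehouse, transform(extract(src)))

print(warehouse)
# [{'name': 'Widget', 'price_gbp': 9.99}, {'name': 'Gadget', 'price_gbp': 12.0}]
```

Incremental maintenance (the last step on the slide) would re-run transform/load only on the changed source rows rather than rebuilding the warehouse.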

Two main approaches to DI

Virtual DI:
- define an integrated schema (IS)
- wrap the data sources (DSs), using wrapper software
- construct mappings between the DSs and the IS, using mediator software
- different mediator systems support different types of mappings, e.g.:
  - Global-as-view (GAV) mappings, where each schema object o in the IS is defined as a view over the DSs: o = φ(DS1, ..., DSn) for some formula φ
  - Local-as-view (LAV) mappings, where each schema object o in a DS is defined as a view over the IS: o = ψ(IS) for some formula ψ
  - Global-local-as-view (GLAV) mappings, where the mapping relates a view over a DS with a view over the IS: φ(DSi) = ψ(IS) for some formulae φ, ψ
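
As a concrete (if toy) illustration of the GAV case, the following Python sketch defines an integrated-schema object as a view φ over two hypothetical source tables; all table and attribute names here are assumptions for illustration only:

```python
# Sketch: a GAV mapping as a Python function over source data.
# The IS object item(id, kind, amount) is defined as a view
# phi(ds1, ds2); the source tables below are invented.

ds1 = [("p1", "book"), ("p2", "dvd")]    # product(id, kind) at source 1
ds2 = [("p1", 20.0), ("p2", 15.0)]       # price(id, amount) at source 2

def item():
    """GAV view: item = phi(ds1, ds2), here a join on product id."""
    prices = dict(ds2)
    return [(pid, kind, prices[pid]) for pid, kind in ds1 if pid in prices]

print(item())   # [('p1', 'book', 20.0), ('p2', 'dvd', 15.0)]
```

A LAV mapping would run in the opposite direction, defining each source table as a view over the integrated schema, which makes adding sources easier but query reformulation harder.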

Two main approaches to DI

Virtual DI (contd):
- the integrated schema is queried by submitting queries to the mediator system
- the mediator coordinates query evaluation:
  - it uses the mappings to reformulate queries expressed on the IS into queries expressed on the DS schemas
  - it submits subqueries to the DS wrappers for evaluation at the data sources
  - it merges the subquery results returned by the wrappers into an overall query result for the original query
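
The mediator loop described above (reformulate via the mappings, push subqueries to the wrappers, merge the answers) can be sketched as follows; the wrappers, mapping table and query shape are illustrative assumptions, not any particular mediator's API:

```python
# Sketch of the mediator loop: reformulate an IS query into subqueries,
# evaluate each at its source wrapper, and merge the results.

def make_wrapper(rows):
    """A wrapper that answers 'select rows where kind = ?' at one source."""
    return lambda kind: [r for r in rows if r[1] == kind]

wrappers = {
    "ds1": make_wrapper([("p1", "book"), ("p2", "dvd")]),
    "ds2": make_wrapper([("p3", "book")]),
}

# GAV-style mapping: the IS relation 'item' is the union of both sources
mapping = {"item": ["ds1", "ds2"]}

def mediate(is_relation, kind):
    """Reformulate, push subqueries to the wrappers, merge the answers."""
    result = []
    for ds in mapping[is_relation]:          # reformulation via the mapping
        result.extend(wrappers[ds](kind))    # subquery evaluated at the source
    return result                            # merged overall answer

print(mediate("item", "book"))   # [('p1', 'book'), ('p3', 'book')]
```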

Comparing the approaches

Materialised and virtual DI both allow the integrated resource to be queried as though it were a single data source:
- both support a single integrated schema
- users/applications do not generally need to be aware of the data source schemas, formats or content

A materialised DI approach is generally adopted for:
- better query performance (centralised rather than distributed)
- greater ease of data cleansing and data annotation (on materialised rather than virtual data)

A virtual DI approach is generally adopted for:
- lower cost of storing and maintaining the integrated resource (no replication of the DSs' data)
- greater currency of the integrated resource (the IS reflects the current content of the DSs)

Virtual DI: the integrated schema

The integrated schema may be defined in a standard data modelling language. Or, it may be a source-independent ontology:
- defined in a higher-level ontology language
- serving as a global schema for multiple potential data sources, beyond the ones being integrated

The integrated schema does not necessarily have to encompass all of the data in the data sources:
- it may be enough to capture just the data that is needed for answering the main queries that users will want to submit
- this avoids the possibly complex process of creating a complete integrated schema and a complete set of mappings between the data sources and the integrated schema

Our work: the AutoMed Project

The AutoMed project at Birkbeck and Imperial has developed tools for the semi-automatic integration of heterogeneous information sources:
- AutoMed supports a graph-based metamodel, the Hypergraph Data Model (HDM)
- Higher-level modelling languages can be defined in terms of the HDM, via the API of AutoMed's Model Definitions Repository, so the system is extensible with new modelling languages, e.g. relational, XML, RDF/S, OWL
- After a modelling language has been defined in this way, a set of primitive transformations becomes available that can be applied to schemas expressed in that language
- Schemas may or may not have a data source associated with them, i.e. they may be materialised or virtual schemas
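
To give a feel for a graph-based metamodel, here is a minimal sketch of a relational table represented as nodes and edges. This is only in the spirit of the HDM: AutoMed's actual encoding conventions are richer and are not reproduced here.

```python
# Sketch: a relational schema encoded as a simple graph of nodes and
# edges, loosely in the spirit of AutoMed's Hypergraph Data Model.
# The encoding convention used here is an illustrative assumption.

nodes = set()
edges = set()

def add_table(table, columns):
    """Encode a table as a node, plus one node and edge per column."""
    nodes.add(table)
    for col in columns:
        nodes.add(f"{table}:{col}")
        edges.add((table, f"{table}:{col}"))

add_table("Learner", ["id", "name"])

print(sorted(nodes))   # ['Learner', 'Learner:id', 'Learner:name']
print(sorted(edges))   # [('Learner', 'Learner:id'), ('Learner', 'Learner:name')]
```

Because every higher-level modelling language is reduced to the same small graph vocabulary, schemas from different languages can be transformed and compared uniformly.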

AutoMed features

Transformations are applied to schemas via the API of AutoMed's Schemas & Transformations Repository. The set of primitive transformations supported is as follows:
- add(construct, query) and delete(construct, query)
- extend(construct, Range q1 q2) and contract(construct, Range q1 q2)
- rename(construct, old_name, new_name)

The queries within these transformations allow automatic data and query translation along sequences of transformations (which we term transformation pathways). Using sequences of AutoMed primitive transformations, we can specify GAV, LAV and GLAV mappings between an integrated schema and a set of data source schemas. So we term AutoMed's mapping approach Both-As-View (BAV).
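
A transformation pathway can be pictured as a list of primitive steps applied in sequence to a schema. The sketch below models a schema simply as a set of construct names and omits the queries attached to add/extend steps; the step semantics shown are a simplification, not AutoMed's implementation:

```python
# Sketch: a transformation pathway as a list of primitive steps applied
# to a schema (modelled here as a set of construct names). The queries
# carried by add/extend steps are elided in this simplification.

def apply_step(schema, step):
    op, args = step[0], step[1:]
    s = set(schema)
    if op in ("add", "extend"):
        s.add(args[0])
    elif op in ("delete", "contract"):
        s.discard(args[0])
    elif op == "rename":
        old, new = args
        s.discard(old)
        s.add(new)
    return s

def apply_pathway(schema, pathway):
    for step in pathway:
        schema = apply_step(schema, step)
    return schema

source = {"person", "person:fullname"}
pathway = [
    ("rename", "person:fullname", "person:name"),
    ("add", "person:id"),          # query omitted in this sketch
    ("contract", "person"),
]
print(sorted(apply_pathway(source, pathway)))
# ['person:id', 'person:name']
```

Because each step is individually reversible (add/delete and extend/contract are duals, rename is its own dual), a pathway can be traversed in either direction, which is the essence of the BAV approach.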

AutoMed Architecture

[Diagram: the AutoMed software components, deployed at a single site – Global Query Processor, Global Query Optimiser, Schema Evolution Tool, Schema Transformation and Integration Tools, and Model Definition Tool, all built over the AutoMed Metadata Repository (comprising the Schemas and Transformations Repository and the Model Definitions Repository), with Wrappers connecting to the Distributed Data Sources]

Virtual Data Integration using AutoMed

Integrating heterogeneous databases under an ontology

There are several possible architectures:
- ontology-based access to multiple ontology-mapped databases
- ontology-based access to multiple databases
- ontology-based access to an integrated virtual database resource

Ontology-based access to multiple ontology-mapped databases

[Diagram: a Global Ontology is linked by Mappings to a per-source ontology maintained by an Ontology Mapper for each data source; a Query Processor takes the user's query, consults this metadata, and returns the result]

Case Study E: SemWIQ (Semantic Web Integrator and Query Engine)

- The aim of the SemWIQ middleware is to support data sharing in distributed scientific communities
- An RDF mapping service (specifically, D2R Server) interacts with each of the data sources
- The global ontology is expressed in OWL-DL
- The mappings consist of GAV views, which map terms in the ontology to the data sources' RDF representations
- Users' queries are expressed with respect to the ontology, using SPARQL
- The SemWIQ Query Processor reformulates user queries into sub-queries expressed on the data sources' RDF representations, using the mappings
- The sub-queries are directed to the RDF mapping services over the data sources, which translate the SPARQL sub-queries into native queries that are submitted to the actual data sources for evaluation

SPARQL – a query language for RDF

See the W3C specification "SPARQL Query Language for RDF". Here is an example SPARQL query from that document:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?title ?price
WHERE { ?x ns:price ?price .
        FILTER (?price < 30.5)
        ?x dc:title ?title . }
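
To see what evaluating such a query involves, the toy Python matcher below evaluates the same pattern-plus-FILTER shape over an in-memory set of triples. The data mirrors the spec's book example; this is a hand-rolled illustration, not a real SPARQL engine:

```python
# Sketch: evaluating a SPARQL-style basic graph pattern with a FILTER
# over a small set of (subject, predicate, object) triples.
# The triples below are invented to mirror the spec's book example.

triples = [
    ("book1", "dc:title", "SPARQL Tutorial"),
    ("book1", "ns:price", 42.0),
    ("book2", "dc:title", "The Semantic Web"),
    ("book2", "ns:price", 23.0),
]

def query():
    """SELECT ?title ?price WHERE { ?x ns:price ?price .
       FILTER (?price < 30.5)  ?x dc:title ?title . }"""
    titles = {s: o for s, p, o in triples if p == "dc:title"}
    return [(titles[s], o)                       # bind ?title and ?price
            for s, p, o in triples
            if p == "ns:price" and o < 30.5      # pattern + FILTER
            and s in titles]                     # join on ?x

print(query())   # [('The Semantic Web', 23.0)]
```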

Ontology-based access to multiple databases

[Diagram: a Global Ontology is linked by Mappings directly to each source DB schema; a Query Processor takes the user's query, consults this metadata, and returns the result]

Case Study F: MASTRO

- The global ontology is expressed in the DL-Lite_A language [Poggi et al., 2008]
- The data in the data sources is treated as instance data of the ontology
- Mappings consist of a query Q_D over a data source mapped to a query Q_O over the ontology (so these are GLAV mappings)
- User queries are expressed with respect to the ontology
- User queries are first expanded according to the constraints contained in the ontology
- The Query Processor then reformulates these queries into sub-queries expressed on the data sources, using the mappings
- The sub-queries are evaluated at the data sources, and their results are merged to give the overall answer to the original query
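
The expansion step can be illustrated with a toy example: a query over a concept is expanded with its subclasses before being answered. The tiny ontology and the locally stored instance data below are illustrative assumptions; in MASTRO itself the instances live in the underlying sources and only the expanded query is shipped there.

```python
# Sketch: expanding a query over an ontology concept using subclass
# constraints before answering it. The ontology and instance data are
# invented; only direct subclasses are handled, for brevity.

subclass = {            # child -> parent axioms
    "Student": "Person",
    "Lecturer": "Person",
}

def expand(concept):
    """All concepts whose instances answer a query over `concept`."""
    return {concept} | {c for c, p in subclass.items() if p == concept}

instances = {"Student": ["alice"], "Lecturer": ["bob"], "Person": ["carol"]}

def answer(concept):
    """Evaluate the expanded query and merge the per-concept answers."""
    out = []
    for c in sorted(expand(concept)):
        out.extend(instances.get(c, []))
    return out

print(answer("Person"))   # ['bob', 'carol', 'alice']
```

A query over Person alone would miss alice and bob; expansion under the ontology's constraints is what makes the sources' data behave as instance data of the ontology.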

Ontology-based access to an Integrated Virtual Database Resource

[Diagram: a Global Ontology is mapped to an integrated DB schema, which is in turn mapped to the source DB schemas; a Query Processor takes the user's query, consults this mapping metadata, and returns the result]

Case Study G: ASSIST

- The ASSIST project aims to facilitate cervical cancer research by integrating several medical databases containing information about patients and the examinations they have undergone
- There is an existing domain ontology, expressed in OWL-DL, together with an additional set of medical rules expressed in EL++
- AutoMed is used to define the mappings shown in the diagram above, i.e. between the ontology and the (virtual) integrated database schema, and between this integrated database schema and the data source schemas
- User queries are expressed with respect to the ontology
- Queries are first expanded according to the medical rules
- AutoMed is then used to evaluate the expanded queries

ASSIST Integration Strategy

Step 1: integrate the data source schemas into a relational schema, R1; however, R1 may contain information that is not encompassed by the ontology
Step 2: transform R1 into a relational schema R2, which drops information from R1 that is not in the ontology, and includes new concepts aligned with those of the ontology and defined in terms of concepts of R1
Step 3: automatically translate R2 into OWL-DL using a relational-to-RDF algorithm (Lausen, ODBIS07)
Step 4: transform this OWL-DL intermediate ontology into the final ASSIST OWL-DL domain ontology
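
The relational-to-RDF idea behind Step 3 can be sketched very simply: each row becomes a subject, and each column value a property of that subject. The table, column and namespace names below are invented, and this sketch is not Lausen's actual algorithm:

```python
# Sketch of a relational-to-RDF translation: one subject per row, one
# triple per column value, plus a type triple per row. Names such as
# "ex:" and "Patient" are illustrative assumptions.

def table_to_triples(table_name, columns, rows, ns="ex:"):
    triples = []
    for i, row in enumerate(rows):
        subject = f"{ns}{table_name}/{i}"
        triples.append((subject, "rdf:type", f"{ns}{table_name}"))
        for col, val in zip(columns, row):
            triples.append((subject, f"{ns}{col}", val))
    return triples

t = table_to_triples("Patient", ["id", "age"], [("p1", 34)])
print(t)
# [('ex:Patient/0', 'rdf:type', 'ex:Patient'),
#  ('ex:Patient/0', 'ex:id', 'p1'),
#  ('ex:Patient/0', 'ex:age', 34)]
```

Step 4 would then align the classes and properties produced this way with the final ASSIST domain ontology.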

7. Enabling interoperability between systems

- A plethora of services are being made available in many domains: education, science, business ...
- Similar problems arise when combining such services into larger-scale workflows as when integrating heterogeneous data sources: the necessary services are often created independently by different parties, using different technologies, data formats and data types
- Therefore, additional code needs to be developed to transform the output of one service into a format that can be consumed by another, pointing to the need for service reconciliation techniques

Service reconciliation

Previous approaches include:
- Shims, developed as part of the myGrid project. These are services that act as intermediaries between specific pairs of services and reconcile their inputs and outputs. They need to be programmed individually.
- Bowers & Ludäscher (DILS '04) use mappings to an ontology for reconciling services. An XQuery query is automatically generated from these mappings. Applied to the output of one service, the XQuery query transforms it into a form that can be accepted by the next service.

Our approach (1): We adopt XML as the common representation format for service inputs and outputs:
- Suppose we need to automatically translate the output of service S1 (getIPIEntry) so that it can be passed to the next service S2 (getPfamEntry)
- We assume the availability of format converters that can convert the output of service S1 into XML, if it is not XML; and, similarly, that can convert from an XML representation into the input format expected by service S2

Our approach (2): We adopt XMLDSS as the schema type:
- We use our own XMLDSS schema type as the common schema type for XML data that is output from, or input to, services
- An XMLDSS schema is a structural summary of an XML document
- It can be automatically derived from a DTD/XML Schema associated with the XML document, if one is available
- If not, an XMLDSS schema can be automatically extracted from the XML document itself – there is an AutoMed tool that can do this
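
Extracting a structural summary from an XML document can be sketched with the standard library. The representation below (element paths plus attribute names) is a simplification for illustration, not the exact XMLDSS format used by AutoMed:

```python
# Sketch: deriving a structural summary of an XML document, loosely in
# the spirit of an XMLDSS schema. The summary maps each distinct element
# path to the set of attribute names seen at that path.

import xml.etree.ElementTree as ET

def summarise(elem, path=""):
    """Collect each distinct element path and its attribute names."""
    path = f"{path}/{elem.tag}"
    summary = {path: set(elem.attrib)}
    for child in elem:
        for p, attrs in summarise(child, path).items():
            summary.setdefault(p, set()).update(attrs)
    return summary

doc = ET.fromstring(
    '<user id="1"><fullname>Ada</fullname><interests>maths</interests></user>'
)
print(summarise(doc))
# {'/user': {'id'}, '/user/fullname': set(), '/user/interests': set()}
```

Two services whose XML payloads yield the same summary are already structurally compatible; when the summaries differ, the restructuring described next is needed.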

Our approach (3): We use correspondences to a domain ontology:
- We assume the availability of a set of correspondences between each XMLDSS schema and a domain ontology (a set of correspondences C1 for schema X1 and a set of correspondences C2 for schema X2)
- An XMLDSS element can correspond to some concept or path in the ontology
- An XMLDSS attribute can correspond to some literal-valued property, or to some path ending in such a property
- In general, there may be several correspondences for an element or an attribute with respect to the ontology

Our approach (4): Schema and data transformation:

A schema transformation pathway is then automatically generated that can transform X1 to X2:
- The correspondences are used to create pathways X1 → X1' and X2 → X2', where the intermediate schemas X1' and X2' are conformed w.r.t. the ontology
- Our XMLDSS restructuring algorithm then automatically creates the pathway X1' → X2'
- Hence we obtain an overall pathway X1 → X1' → X2' → X2; this can now be used to transform data output from S1 into data that can be input to S2
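
Treating each pathway as a function on documents, the overall transformation is just their composition. The sketch below models documents as dicts; the specific renamings and the reshaping step are invented for illustration and do not reproduce the XMLDSS restructuring algorithm:

```python
# Sketch: composing the conformance pathways X1 -> X1' and X2' -> X2
# with the restructuring step X1' -> X2' into one overall data
# transformation. Each pathway is modelled as a function on documents.

def conform_x1(doc):
    """X1 -> X1': rename X1's tags to the ontology's terms (invented)."""
    return {("name" if k == "fullname" else k): v for k, v in doc.items()}

def restructure(doc):
    """X1' -> X2': reshape to X2's structure (invented)."""
    return {"profile": doc}

def deconform_x2(doc):
    """X2' -> X2: rename ontology terms to X2's tags (identity here)."""
    return doc

def compose(*steps):
    def pathway(doc):
        for step in steps:
            doc = step(doc)
        return doc
    return pathway

x1_to_x2 = compose(conform_x1, restructure, deconform_x2)
print(x1_to_x2({"fullname": "Ada"}))   # {'profile': {'name': 'Ada'}}
```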

Correspondences to multiple ontologies

In a setting where:
a) X1 corresponds to an ontology O1 using a set of correspondences C1,
b) X2 corresponds to a different ontology O2 using a set of correspondences C2, and
c) there is an AutoMed transformation pathway O1 → O2,
then we can automatically generate a new set of correspondences C1' for X1 with respect to O2 (using the information in the pathway O1 → O2). The setting is now identical to the single-ontology setting earlier. Proviso: the generated C1' must conform to our language for specifying correspondences.

Possible deployments

A Workflow Tool could use our approach either dynamically or statically:

a) Dynamically, as a run-time mediation service:
- The workflow tool invokes a service S1 and receives its output
- The tool submits the output of S1, the schema of the next service S2, and the two sets of correspondences to the AutoMed XML data transformation service
- The AutoMed service transforms the output of S1 into a suitable input for submission to S2, and returns this back to the workflow tool
- The workflow tool invokes service S2 with this data

b) Statically, for shim generation:
- AutoMed is used to generate a shim between services S1 and S2
- The AutoMed XML data transformation service can generate an XQuery query which, when applied to output from S1, creates data corresponding to the input that is expected by S2
- The workflow tool can use these shims at run-time

Case Study H: Interoperability in the MyPlan project

- The MyPlan project aimed to support learners in planning their lifelong learning
- One project goal was to facilitate interoperability between existing systems that target the lifelong learner
- In general, direct access to these systems' metadata repositories is not possible; but some systems do provide services that can be invoked by other systems
- Therefore, we applied our service reconciliation techniques described above in this setting

MyPlan project

Our proof-of-concept implementation involves transferring learners' data between two systems:
- L4All (www.lkl.ac.uk/research/l4all), whose services' inputs and outputs correspond to the L4ALL ontology, defined in RDFS
- eProfile (www.schools.bedfordshire.gov.uk/im/EProfile), whose services' inputs and outputs correspond to the FOAF ontology, defined in OWL-DL

In particular, we discuss next how data output from an L4All service S1, which semantically corresponds to the L4ALL ontology, can be automatically transformed into data that semantically corresponds to FOAF and can be input to service S2 of eProfile.

Reconciling services S1 of L4All and S2 of eProfile

1. Manually integrate, using AutoMed, the L4ALL and FOAF ontologies into a global ontology for the Lifelong Learning domain – the LLO ontology (defined in OWL-DL)
2. Automatically generate (using the AutoMed XMLDSS schema generation tool) a schema X1 for the output of service S1 and a schema X2 for the input of service S2

Reconciling services S1 of L4All and S2 of eProfile

4. Manually specify, or semi-automatically derive, a set of correspondences C1 for X1 with respect to L4ALL, and a set of correspondences C2 for X2 with respect to FOAF.
5. Automatically translate C1 to C1', now targeting FOAF, using the L4ALL → LLO → FOAF transformation pathway.
6. Automatically produce schemas X1' and X2' that are now conformed w.r.t. FOAF.
7. Automatically produce the transformation pathway X1 → X2.

Fragments of the L4ALL, FOAF and LLO ontologies

Correspondences C1 between X1 and L4ALL

Construct – Path
user$1 – [c | c l4:Learner]
userID$1 – [id | l,id l4:id; l4:Learner; l4:Identification; id,lit l4:username; l4:Identification; rdfs:Literal]
fullname$1 – [id | l,id l4:id; l4:Learner; l4:Identification; id,lit l4:name; l4:Identification; rdfs:Literal]
$1 – [id | l,id l4:id; l4:Learner; l4:Identification; id,lit l4: ; l4:Identification; rdfs:Literal]
interests$1 – [p | l,p l4:learning-prefs; l4:Learner; l4:Learning_Prefs; id,lit l4:interests; l4:Learning_Prefs; rdfs:Literal]

Correspondences C1' between X1 and FOAF

Construct – Path
user$1 – [c | c foaf:Agent]
userID$1 – [id | l,id (generateProperty foaf:Agent); id,oa foaf:holdsAccount, foaf:Agent, foaf:OnlineAccount; oa,userlit foaf:accountName, foaf:OnlineAccount, rdfs:Literal]
fullname$1 – [id | l,id (generateProperty foaf:Agent); id,lit foaf:name, owl:Thing, rdfs:Literal]
$1 – [id | l,id (generateProperty foaf:Agent); x,id,lit (generateProperty foaf:mbox, foaf:Agent, rdfs:Literal)]
interests$1 – [p | l,p,z (generateProperty foaf:topic_interest, foaf:Person, owl:Thing); x,id,lit (generateProperty foaf:topic_interest, foaf:Person, owl:Thing)]

Reading for this week

- Flexible data integration and ontology-based data access to medical records. L. Zamboulis, A. Poulovassilis, G. Roussos. Proc. BIBE'08, Athens, pp 1-6. [You can skip the Data Cleansing section (III D), the Relational-to-OWL-DL translation section (III E (1)), and the Query Processing section (III F)]
- Ontology-assisted data transformation and integration. L. Zamboulis, A. Poulovassilis, J. Wang. Proc. ODBIS'08, Auckland. [You can skip Sections 2.1, 3.1, 4]

Question for you to consider: what are the similarities and dissimilarities of the three ways of accessing heterogeneous databases through an ontology, as illustrated by Case Studies E, F and G?

References

- A semantic web middleware for virtual data integration on the Web. A. Langegger et al. Proc. ESWC 2008.
- Linking data to ontologies. A. Poggi et al. Journal of Data Semantics, 2008.
- Bioinformatics service reconciliation by heterogeneous schema transformation. L. Zamboulis, N. Martin, A. Poulovassilis. Proc. DILS'07, Philadelphia.