Kamran Munir, M. Odeh, R. McClatchey

Slides:



Advertisements
Similar presentations
1 Ontolog OOR Use Case Review Todd Schneider 1 April 2010 (v 1.2)
Advertisements

Database Planning, Design, and Administration
SEVENPRO – STREP KEG seminar, Prague, 8/November/2007 © SEVENPRO Consortium SEVENPRO – Semantic Virtual Engineering Environment for Product.
Prentice Hall, Database Systems Week 1 Introduction By Zekrullah Popal.
Xyleme A Dynamic Warehouse for XML Data of the Web.
1 Chapter 2 Database Environment Transparencies © Pearson Education Limited 1995, 2005.
Chapter 2 Database Environment.
1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2.
An Agent-Oriented Approach to the Integration of Information Sources Michael Christoffel Institute for Program Structures and Data Organization, University.
Chapter 2 Database Environment Pearson Education © 2014.
Distributed Database Management Systems. Reading Textbook: Ch. 4 Textbook: Ch. 4 FarkasCSCE Spring
Lecture Nine Database Planning, Design, and Administration
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING PROJECT VISTA: Integrating Heterogeneous Utility Data A very brief overview.
The information integration wizard (Iwiz) project Report on work in progress Joachim Hammer Presented by Muhammed Al-Muhammed.
Course Instructor: Aisha Azeem
Chapter 1 Introduction to Databases
Database System Concepts and Architecture Dr. Ali Obaidi.
Cloud based linked data platform for Structural Engineering Experiment Xiaohui Zhang
LEVERAGING THE ENTERPRISE INFORMATION ENVIRONMENT Louise Edmonds Senior Manager Information Management ACT Health.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
Database Environment 1.  Purpose of three-level database architecture.  Contents of external, conceptual, and internal levels.  Purpose of external/conceptual.
By N.Gopinath AP/CSE. Why a Data Warehouse Application – Business Perspectives  There are several reasons why organizations consider Data Warehousing.
Chapter 9 Database Planning, Design, and Administration Sungchul Hong.
Database System Development Lifecycle © Pearson Education Limited 1995, 2005.
Overview of the Database Development Process
Database Design - Lecture 1
DBS201: DBA/DBMS Lecture 13.
1 Introduction An organization's survival relies on decisions made by management An organization's survival relies on decisions made by management To make.
ITEC224 Database Programming
Database System Concepts and Architecture
Introduction to MDA (Model Driven Architecture) CYT.
1 Minggu 9, Pertemuan 17 Database Planning, Design, and Administration Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Lecture On Introduction (DBMS) By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang 5-1 Chapter 5 Business Intelligence: Data.
I Information Systems Technology Ross Malaga 4 "Part I Understanding Information Systems Technology" Copyright © 2005 Prentice Hall, Inc. 4-1 DATABASE.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
1 Ontology-based Semantic Annotatoin of Process Template for Reuse Yun Lin, Darijus Strasunskas Depart. Of Computer and Information Science Norwegian Univ.
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Chapter 6 Architectural Design.
ICS (072)Database Systems: An Introduction & Review 1 ICS 424 Advanced Database Systems Dr. Muhammad Shafique.
Lesson Overview 3.1 Components of the DBMS 3.1 Components of the DBMS 3.2 Components of The Database Application 3.2 Components of The Database Application.
Interoperability & Knowledge Sharing Advisor: Dr. Sudha Ram Dr. Jinsoo Park Kangsuk Kim (former MS Student) Yousub Hwang (Ph.D. Student)
Object Oriented Multi-Database Systems An Overview of Chapters 4 and 5.
Database Environment Chapter 2. Data Independence Sometimes the way data are physically organized depends on the requirements of the application. Result:
Bayu Adhi Tama, M.T.I 1 © Pearson Education Limited 1995, 2005.
1 Chapter 1 Introduction to Databases Transparencies.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
Automatic Metadata Discovery from Non-cooperative Digital Libraries By Ron Shi, Kurt Maly, Mohammad Zubair IADIS International Conference May 2003.
Database Systems Database Systems: Design, Implementation, and Management, Rob and Coronel.
Object storage and object interoperability
Chapter 2 Database Environment.
Semantic Data Extraction for B2B Integration Syntactic-to-Semantic Middleware Bruno Silva 1, Jorge Cardoso 2 1 2
Lecture On Introduction (DBMS) By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Chapter 9 Database Planning, Design, and Administration Transparencies © Pearson Education Limited 1995, 2005.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
CCNT Lab of Zhejiang University
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Chapter 2 Database Environment Pearson Education © 2009.
Chapter 2: Database System Concepts and Architecture
Basic Concepts in Data Management
Chapter 1 Database Systems
Data Warehousing and Data Mining
Database Environment Transparencies
ece 627 intelligent web: ontology and beyond
Ontology-Based Approaches to Data Integration
Chapter 1 Database Systems
Database System Concepts and Architecture
Presentation transcript:

Semantic Information Retrieval from Distributed Heterogeneous Data Sources Kamran Munir, M. Odeh, R. McClatchey CCS Research Centre, University of the West of England, Frenchay, Bristol, UK I Habib, S. Khan, A. Ali, National University of Sciences and Technology, Pakistan

In this paper we describe a framework for a data integration system which provides access to distributed heterogeneous data sources. Our research examines the problem of query reformulation across biomedical data sources, based on merged ontologies and the underlying heterogeneous descriptions of the respective data sources. The data integration and semantic information retrieval concept presented in this paper will lead towards the construction of powerful query reformulation rules to be utilized in the European Health-e-Child (HeC) [J. Freund 2006] project.

Health-e-Child project aims to develop an integrated healthcare platform for European paediatrics, providing seamless integration of traditional and emerging sources of biomedical information. The long-term goal of the project is to provide uninhibited access to universal biomedical knowledge repositories for personalised and preventive healthcare, large-scale information-based biomedical research and training, and informed policy making.

The stakeholders of this project are Siemens- Germany, Lynkeus Srl-Italy, Giannina Gaslini-Italy, University College London - UK, CERN- Switzerland, Maat G Knowledge- Spain, University of the West of England- UK, University of Athens - Greece, University of Genoa - Italy, INRIA - France, European Genetics Foundation-Italy, Aktiaselts Asper Biotech- Estonia Gerolamo Gaslini Foundation-Italy. ……….

Health-e-Child comes to initiate a long journey towards filling the gap between what is current practice and the needs of modern health provision and research. Ultimately, with the Health-e-Child system, information will have no conceptual, logical, physical, temporal, or personal borders or barriers, but will be available to all professionals with the appropriate level of clearance.

Health-e-Child architecture

Introduction Over the past few years, the biomedical domain has been witnessing a tremendous increase in the number of data providers, the volume, and heterogeneity of generated data. We describe an ontology assisted system that allows users to query distributed heterogeneous data sources by hiding details like location, information structure, access pattern and semantic structure of the data. To enable knowledge discovery, clinicians’ queries generally require an integrated and merged view of the data available across distributed data sources.

Why we are using ontologies? Designing a data integration system is a complex task which involves major issues that include the heterogeneity of the underlying data sources, the difference in access mechanisms and the support of query languages and aspects of semantic heterogeneity in relation to their data models. Currently ontologies are being widely used to overcome the problem of semantic heterogeneity. Ontologies are being used as the basis for communication for representing and storing data, for knowledge sharing, classification and organization of data resources and for policy enforcement etc.

Related Work In recent years, several methods have been proposed that use semantic knowledge and mapping details to reformulate a user query in order to provide quick and intelligent answers to the queries. The associated query processing is therefore based on searching for information in documents, searching within (often very heterogeneous) databases and searching for metadata or descriptions of data. Biomedical information integration systems can be based on a data warehousing or mediation approach which are then enriched by ontologies to manage and extract semantic knowledge. In the following sections, these approaches are compared.

The Data Warehousing Approach with an Ontology Based Query Facility The data warehousing approach uses a single, centralized data storage to physically retain a copy of the data from each data source. The schema in the data warehouse holds the collective schema of all data sources. Mappings are provided between a schema and the ontology to link them. User queries are formulated on the global ontology and all requests are directly answerable by the warehouse. This can result in fast responses and enables multifaceted results from a centralized data store. Data Wrapper Source 1 Source 2 Source 3 ……….. Source n Warehouse Manager Source Manager Data Manager Data Warehouse Ontology based query interface (OWL) Query Results Data Merging Specifications Extract – Transform – Load Figure 1: Data warehousing approach with ontology based query facility

Issues in utilising “The Data Warehousing Approach” There are many issues in utilising this kind of system for the integration of biomedical data sources. Generally the intention is to avoid duplicating terabytes of data. Moreover, each biomedical data source may have local arrangements for its storage and access already agreed. Moving all this data from sources into a warehouse involves a huge rebuild of data administration and security infrastructures. Managing a data warehouses is also not a simple task. Whenever new data is added or removed from any of the source systems the update has to be reflected in the warehouse and this may require suspension of the execution of user data requests.

The Mediation Approach with Individual Data Source Ontologies We have also realized the feasibility of the mediation approach by linking different source systems to wrappers and mediators as a result of their market availability. Many different approaches have been proposed in order to build mediators for information retrieval systems (e.g. see [H. Nottelmann 2001 and N. Fuhr 1999]) and for web-based sources (e.g. see [J. L. Ambite 1998 and C.-C. K. Chang 1999]). This is a form of interactive system where a query is built by asking questions of the user. Mediator User Interface Ontology Selection Data Source Ontologies (Source 1,2…n) Results Query Save / Update Ontology Wrapper Source 1 Source 2 Source 3 ……….. Source n Figure 2: The mediation approach with individual data source ontologies Query Request: Answer:

Issues in utilising “The Mediation Approach with Individual Data Source Ontologies” Utilising this kind of architecture for the integration of a number of biomedical data sources can cause performance overheads where it is required to identify semantic relationship like homonyms and synonyms from all source system ontologies. Secondly each data source is represented by its own ontology and there is no global view of the data. Thirdly there is also an overhead of maintaining the relationships between individual data source ontologies. Finally all of these limitations will also lead towards more complicated results merging.

A Methodology for Semantic Information Retrieval from Distributed Heterogeneous Data Sources The approach presented here is based on the availability/generation of ontologies for each data source and the use of a global merged ontology which defines the integrated and virtual view of the underlying distributed heterogeneous data sources. The merged ontology provides a unified representation of all underlying ontologies and will be used in query generation and reformulation. The reformulation of user queries consists of two steps: “data source selection” and “query translation”. When a user query enters the system it is divided into multiple sub-queries for execution at individual data sources; once these queries are executed their results are merged into a final query answer.

Semantic information retrieval form distributed heterogeneous data sources User Queries User Interface Parse Query Build sub-queries using integrated/merge ontology and mapping information Query Module Execution and result merging Wrappers Source 1 Source 2 Source 3 …………….. Source n Ontology Mapping Info. Ontology Integration/Merge Merged Ontology The merged ontology provides a unified representation of all underlying ontologies and will be used in query generation and reformulation, which can be utilised for knowledge discovery.

Some Facts Data Duplication We are not duplicating or copying source data to any other locations and are using only structural and semantic information of the data to reformulate user queries, data residing in existing arrangements can be deleted and new data can be added at any point in time. Security and Confidentiality of data While dealing with biomedical data one of the major concerns is the security and confidentiality of data. For each of the data source there can be agreed approved local arrangements (subject to suitable ethical clearance) for data storage and access between administrators and users of that particular data source, while utilizing this architecture there is no need to rebuild the whole security infrastructure. Keeping the sources intact will also enable us to keep existing applications running above that data sources.

Who will build these ontologies? In order to use this approach there is an overhead in building an ontology for each data source since a detailed merged ontology is required for query resolution. This begs the question:, “Who will build these ontologies?” Ontology development can only be done by the person or by group of people who have a clear understanding of the vocabulary used in the ontology. Ontologies can be reused, extended or partially utilised and considered as long term assets that can be utilized for both resolving semantic conflicts and for communication in different application domains.

Conclusion The design of a data integration system can be a complex task and involves major issues to be handled that include: heterogeneity of data, differences in access mechanisms, support for query languages and semantic heterogeneity. In this paper we have described a framework for the data integration system which provides access to distribute heterogeneous data sources by utilising merged ontology and mapping information. Finally we have discussed and compared two general data integration approaches that utilises ontologies to provide access to distribute heterogeneous data sources namely data warehouse and mediation approach.

Future Work In future we aim to develop novel approaches using merged ontologies to reformulate a user query into a set of queries that are respectively associated with distributed heterogeneous data source ontologies.

Thanks