1 Course on Information Architectures, Academic Year 2009-2010, Carlo Batini. 5.1.2 Data Integration systems: architectural elements.



2 Data Integration (or mediator) systems

3 Data integration: definition. Data integration is a major research and business area whose main purpose is to provide a user with uniform access to multiple, autonomous, heterogeneous data sources through a unified view of these data. Building this unified view is complex, because the differences and similarities among the source schemas must be identified before the schemas can be conformed.

4 Advantages of data integration architectures with respect to federated architectures. They manage: – schema-level heterogeneities that are more complex than those handled in federated databases – (to some extent) instance-level heterogeneities due to quality errors in the data (accuracy, currency, incompleteness, inconsistencies, etc.)

5 Data integration: several approaches. Data integration covers several approaches for combining data from different data sources [Hull, 1997]: – Integrated read-only views (mediation): an integrated, read-only view of data that resides in multiple databases (the approach of most academic and commercial systems). – Integrated read-write views (mediation with update): an extension of the mediation architecture that supports updates against an integrated view. Initially, we will deal only with the first approach.

6 Schema-level heterogeneities

7 NB: in the following, "heterogeneity" and "conflict" are used as synonyms. Heterogeneities are of two types: – name heterogeneities – type heterogeneities

8 Name heterogeneities. Synonyms: different names for the same concept – employee, clerk – exam, course – code, num. Homonyms: the same name for different concepts – "Employee" denotes an employee in one schema and a vendor in another schema.
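Name conflicts are typically resolved at design time with a thesaurus. The thesaurus maps every local name to a concept of the global schema. The sketch below is a minimal illustration under that assumption. THESAURUS, resolve_name and the source labels S1, S2 and S3 are hypothetical. The key is the pair (source, name), which is what keeps homonyms apart; synonyms simply point to the same global concept.

```python
# Hypothetical thesaurus built at design time: (source, local name) -> global concept.
THESAURUS = {
    ("S1", "employee"): "Employee",
    ("S2", "clerk"): "Employee",    # synonym of S1.employee
    ("S1", "exam"): "Course",
    ("S2", "course"): "Course",     # synonym of S1.exam
    ("S3", "employee"): "Vendor",   # homonym: same word, different concept
}


def resolve_name(source: str, local_name: str) -> str:
    """Return the global concept behind a local name, or fail if it is unmapped."""
    key = (source, local_name.lower())
    if key not in THESAURUS:
        raise KeyError(f"no mapping for {local_name!r} in source {source}")
    return THESAURUS[key]


print(resolve_name("S2", "Clerk"))     # Employee
print(resolve_name("S3", "Employee"))  # Vendor
```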

9 Examples of name heterogeneities (figure). Homonyms: the attribute price of Product denotes the production price in one schema and the sale price in another. Synonyms: Department and Division denote the same concept.

10 Type conflicts. The same concept is represented with different conceptual structures in two schemas. Cases include: – different definition domains for the same attribute – an attribute in one schema and a derived value in another – an attribute in one schema and an entity in another – an attribute in one schema and a generalization hierarchy in another – an entity in one schema and a relationship in another – different abstraction levels for the same concept, e.g. two entities with homonymous names related by an IS-A hierarchy in the two schemas – different granularities in the definition domains – different cardinalities for the same relationship – key conflicts. See the next slides for examples.

11 Examples of type conflicts - 1. Type conflicts in a single attribute (e.g. NUMERIC, ALPHANUMERIC, ...), e.g. the attribute "gender": – Male/Female – M/F – 0/1 – in Italy, it is implicit in the "codice fiscale" (SSN). The year has a four-digit domain in one schema and a two-digit domain in another.
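Attribute-level type conflicts such as the gender example can be removed by normalizing every source value into one global domain. This is a minimal sketch resting on several assumptions:

- the global domain is {"F", "M"};
- the source labels S1 to S3 are hypothetical;
- the meaning of the 0/1 encoding is assumed;
- the two-digit year pivot is an arbitrary choice.

For the codice fiscale, the sketch relies on a known rule: characters 10-11 hold the day of birth, and the day is increased by 40 for women.

```python
# Hypothetical per-source encodings of "gender", mapped to the global domain {"F", "M"}.
GENDER_CODES = {
    "S1": {"Female": "F", "Male": "M"},
    "S2": {"F": "F", "M": "M"},
    "S3": {"1": "F", "0": "M"},  # assumption: 1 = female, 0 = male
}


def normalize_gender(source: str, value) -> str:
    """Translate a source-specific gender code into the global domain."""
    return GENDER_CODES[source][str(value)]


def gender_from_codice_fiscale(cf: str) -> str:
    """Characters 10-11 of the codice fiscale are the day of birth, +40 for women.
    Assumes no omocodia letter substitution; the check character is not validated."""
    day = int(cf[9:11])
    return "F" if day > 40 else "M"


def normalize_year(year: int, pivot: int = 30) -> int:
    """Expand a two-digit year; the pivot (00-29 -> 2000s) is an assumption."""
    if year >= 100:
        return year
    return 2000 + year if year < pivot else 1900 + year


print(normalize_gender("S3", 1), gender_from_codice_fiscale("RSSMRA80A41H501X"))
```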

12 Examples of type conflicts - 2: – different currencies (euros, US dollars, etc.) – different measurement systems (kilograms vs. pounds, Celsius vs. Fahrenheit) – different granularities (grams, kilograms, etc.)
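Unit, currency and granularity conflicts are usually resolved by converting every value to one reference unit of the global schema. This minimal sketch assumes kilograms, degrees Celsius and euros as the global units. The pound factor and the Fahrenheit formula are standard. The USD rate is a placeholder, not a real quote.

```python
# Conversion factors to the (assumed) global units.
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}  # 1 lb = 0.45359237 kg exactly
EUR_PER_UNIT = {"EUR": 1.0, "USD": 0.9}                  # hypothetical exchange rate


def to_kg(value: float, unit: str) -> float:
    """Resolve measurement-system and granularity conflicts for weights."""
    return value * KG_PER_UNIT[unit]


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature to degrees Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit {unit!r}")


def to_eur(amount: float, currency: str) -> float:
    """Resolve currency conflicts using the placeholder rates above."""
    return amount * EUR_PER_UNIT[currency]


print(to_kg(500, "g"), to_celsius(212, "F"))
```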

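13 Examples of type conflicts - 3: structure conflicts (figure). Panels: Person with the attribute GENDER vs. Person specialized into WOMAN and MAN; PUBLISHER and BOOK modeled differently in the two schemas; EMPLOYEE, DEPARTMENT and PROJECT in one schema vs. EMPLOYEE and PROJECT in the other.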
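Take the first structure conflict: one schema has an attribute GENDER, the other generalizes Person into WOMAN and MAN. It can be resolved by choosing one representation for the global schema and translating the other into it. This minimal sketch assumes the global schema keeps Person with a gender attribute. The function names and the F/M codes are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def from_attribute_schema(persons: List[Record]) -> List[Record]:
    """Source A already stores gender as an attribute of Person."""
    return [{"name": p["name"], "gender": p["gender"]} for p in persons]


def from_hierarchy_schema(women: List[Record], men: List[Record]) -> List[Record]:
    """Source B specializes Person into WOMAN and MAN:
    membership in a subset becomes the value of the global attribute."""
    return ([{"name": w["name"], "gender": "F"} for w in women]
            + [{"name": m["name"], "gender": "M"} for m in men])


global_person = (from_attribute_schema([{"name": "Anna", "gender": "F"}])
                 + from_hierarchy_schema([{"name": "Bea"}], [{"name": "Carlo"}]))
print(global_person)
```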

14 Examples of type conflicts - 4: dependency (or cardinality) conflicts (figure: EMPLOYEE, DEPARTMENT and PROJECT in one schema, EMPLOYEE and PROJECT in the other, with relationship cardinalities 1:1 and 1:n).
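One common way to resolve a cardinality conflict is to adopt the least restrictive constraint among the sources in the integrated schema. That way, no source instance violates it. The sketch below assumes that policy. Each participation is represented as a (min, max) pair, with None standing for n. most_general is a hypothetical helper.

```python
from typing import Optional, Tuple

# (min, max) participation of an entity in a relationship; max=None means "n".
Card = Tuple[int, Optional[int]]


def most_general(*cards: Card) -> Card:
    """Least restrictive cardinality compatible with all the given ones."""
    lowest_min = min(c[0] for c in cards)
    if any(c[1] is None for c in cards):
        return (lowest_min, None)
    return (lowest_min, max(c[1] for c in cards))


# EMPLOYEE works on exactly one PROJECT in one schema, on many in the other.
print(most_general((1, 1), (1, None)))  # (1, None), i.e. 1:n
```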

15 Examples of type conflicts - 5: key conflicts (figure: the entity PRODUCT in two schemas; labels CODE, LINE in one and CODE, DESCRIPTION in the other).

16 Data integration. The research community has been investigating data integration for about 20 years, and different research communities (databases, artificial intelligence, semantic web) have been addressing its issues: – definitions, architectures, and classifications of the problems to be addressed – data integration problems have been analyzed from different perspectives, and different approaches have been proposed – benchmarks such as THALIA allow the approaches to be evaluated and compared – several commercial software suites have been released and are being tested in real environments

17 Integration of heterogeneous and distributed data sources. "Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data" (Global Virtual Schema (GS)) [Lenzerini, 2002]. (Figure: a query is posed on the Global Schema (GS), which is connected through mappings to the local schemas of a DB, a file and an XML source.)

18 Main elements of a DI architecture. Three main elements of the architecture of a schema integration system can be distinguished: – a global schema – one or more source (local) schemas – mappings between the global schema and the source (local) schemas

19 Typical architecture of a data integration system (figure: the user query is posed on the global schema; the mediator uses the mapping to reach the wrappers over Source 1, Source 2, ..., Source n, described by local schema 1, local schema 2, ..., local schema n).

20 Definitions of global schema and mappings. The global schema describes the structure of the whole universe of discourse. The mappings, or connections, describe how each element of the local schemas relates to the global schema (REMARK: mappings can be expressed in either direction).

21 Typical architecture of a data integration system: the two directions of the mappings (figure: the architecture of slide 19 shown twice, once with mappings from the local schemas to the global schema and once with mappings from the global schema to the local schemas).

22 Definitions of global schema and mappings. The global schema describes the structure of the whole universe of discourse. The mappings, or connections, describe how each element of the local schemas relates to the global schema. Mappings can be expressed in either direction. In summary, the essence of integration is to combine information logically, so that it can be queried as a whole through a common interface. To enable querying, the schema of each information source must be connected through a mapping to the global schema of the common interface.

23 Mediators (1) [Wise 2009 – Poznan (PL), Università di Modena e Reggio Emilia & Milano Bicocca] (Figure: query interface, global schema, view mapping, local schemata, local sources.) SOURCE 1: Professor(first_name, last_name, e-mail, area). SOURCE 2: Faculty_member(name, mail, research_topic). GLOBAL SCHEMA: Full_professor(name, mail, area). Query: search the mail of professors whose research activities are in the "Database" area. Sub-query on SOURCE 1: Select e-mail From Professor Where area = "Database". Sub-query on SOURCE 2: Select mail From Faculty_member Where research_topic = "Database". The answers are combined into one result set.
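The rewriting step of this example can be sketched with a global-as-view mapping. For each source, the mapping records which local attribute stands behind each global attribute. Some points about this minimal sketch:

- MAPPING and unfold are hypothetical names.
- Identifiers are double-quoted because e-mail is not a valid bare SQL identifier.
- The global attribute name is left out. In SOURCE 1 it would need a concatenation of first_name and last_name.

```python
from typing import Dict, List, Tuple

# Hypothetical GAV mapping for Full_professor(name, mail, area):
# (source, local relation, global attribute -> local attribute).
MAPPING: Dict[str, List[Tuple[str, str, Dict[str, str]]]] = {
    "Full_professor": [
        ("SOURCE 1", "Professor", {"mail": "e-mail", "area": "area"}),
        ("SOURCE 2", "Faculty_member", {"mail": "mail", "area": "research_topic"}),
    ]
}


def unfold(relation: str, select_attr: str, where_attr: str) -> List[Tuple[str, str]]:
    """Rewrite a selection on a global relation into one SQL sub-query per source.
    The selection value is left as a '?' parameter."""
    subqueries = []
    for source, local_rel, attrs in MAPPING[relation]:
        sql = (f'SELECT "{attrs[select_attr]}" FROM {local_rel} '
               f'WHERE "{attrs[where_attr]}" = ?')
        subqueries.append((source, sql))
    return subqueries


for source, sql in unfold("Full_professor", "mail", "area"):
    print(source, sql)
```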

24 Mediators (2) The mediator builds a unified schema of several (heterogeneous) information sources and allows a user to formulate queries on it. The user query is transformed into a set of sub-queries, one for each data source involved in the query. The results are collected by the mediator, merged and shown to the user.
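The three steps of the mediator (decompose, send the sub-queries, merge) can be sketched with Python's built-in sqlite3 module. Two in-memory databases stand in for SOURCE 1 and SOURCE 2 of the Professor/Faculty_member example. The sample rows and mediator_query are hypothetical. Merging here is a set union, so an e-mail present in both sources appears once.

```python
import sqlite3

# Two hypothetical in-memory sources with the local schemas of the example.
s1 = sqlite3.connect(":memory:")
s1.execute('CREATE TABLE Professor (first_name TEXT, last_name TEXT, "e-mail" TEXT, area TEXT)')
s1.executemany("INSERT INTO Professor VALUES (?, ?, ?, ?)", [
    ("Ada", "Rossi", "rossi@uni.example", "Database"),
    ("Bruno", "Verdi", "verdi@uni.example", "Networks"),
])

s2 = sqlite3.connect(":memory:")
s2.execute("CREATE TABLE Faculty_member (name TEXT, mail TEXT, research_topic TEXT)")
s2.executemany("INSERT INTO Faculty_member VALUES (?, ?, ?)", [
    ("Ada Rossi", "rossi@uni.example", "Database"),
    ("Carla Bianchi", "bianchi@uni.example", "Database"),
])

# One sub-query per source, as on the slide ("e-mail" quoted for SQLite).
SUBQUERIES = [
    (s1, 'SELECT "e-mail" FROM Professor WHERE area = ?'),
    (s2, "SELECT mail FROM Faculty_member WHERE research_topic = ?"),
]


def mediator_query(topic: str) -> list:
    """Send each sub-query to its source, then merge and de-duplicate the answers."""
    merged = set()
    for conn, sql in SUBQUERIES:
        merged.update(row[0] for row in conn.execute(sql, (topic,)))
    return sorted(merged)


print(mediator_query("Database"))  # ['bianchi@uni.example', 'rossi@uni.example']
```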

25 Functional architecture of a Data Integration system (figure: client, MultiDBMS, mediator, wrappers, DBMSs and databases). The mediator: – provides users with a single virtual representation of the sources, given by the global schema – translates queries into fragments, which are sent to the wrappers – recomposes the results returned by the wrappers – performs data fusion and resolves heterogeneities among values

26 Instance-level heterogeneities

27 Mediators: object fusion and reconciliation. A mediator's main functionality is object fusion: – grouping together information about the same real-world entity – removing redundancy among the data sources – resolving inconsistencies among the data sources – achieving accuracy, completeness, currency (and other data quality dimensions) for data coming from different sources
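Object fusion needs two decisions: how to recognize records about the same real-world entity, and how to reconcile conflicting values. This minimal sketch assumes that the lowercased e-mail address identifies a person. It also assumes that the most recently updated non-null value wins. Both policies are hypothetical; real systems usually rely on record linkage techniques rather than an exact key.

```python
from typing import Dict, List


def fuse(records: List[dict], key: str = "mail") -> Dict[str, dict]:
    """Group records about the same entity (by normalized key) and fuse them.
    Hypothetical policy: the most recently updated non-null value wins."""
    fused: Dict[str, dict] = {}
    # Older records first, so newer non-null values overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        entity = fused.setdefault(rec[key].strip().lower(), {})
        for attr, value in rec.items():
            if value is not None:  # a missing value never hides a known one
                entity[attr] = value
    return fused


records = [
    {"mail": "Rossi@uni.example", "name": "A. Rossi", "phone": "555-0100", "updated": 2008},
    {"mail": "rossi@uni.example", "name": "Ada Rossi", "phone": None, "updated": 2009},
    {"mail": "bianchi@uni.example", "name": "Carla Bianchi", "phone": None, "updated": 2009},
]
print(fuse(records)["rossi@uni.example"])
```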

28 Functional architecture of a Data Integration system (figure: client, DI system, mediator, wrappers, DBMSs and databases). The wrapper: – translates the request coming from the mediator into the logical and physical representation of the underlying local schema
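A wrapper can be sketched as an object that receives a request phrased with global attribute names and evaluates it against the source's own representation. This sketch assumes a CSV file source read with Python's csv module. CsvWrapper, its renaming table and the sample content are hypothetical. A real wrapper would read from a file or a remote system.

```python
import csv
import io
from typing import Dict, List


class CsvWrapper:
    """Hypothetical wrapper for a flat-file source: translates a request
    expressed with global attribute names into the file's own column names."""

    def __init__(self, text: str, renaming: Dict[str, str]):
        self.text = text          # file content (stands in for a real file)
        self.renaming = renaming  # global attribute -> local column

    def select(self, attr: str, where_attr: str, value: str) -> List[str]:
        local_attr = self.renaming[attr]
        local_where = self.renaming[where_attr]
        reader = csv.DictReader(io.StringIO(self.text))
        return [row[local_attr] for row in reader if row[local_where] == value]


SOURCE_FILE = ("name,mail,research_topic\n"
               "Carla Bianchi,bianchi@uni.example,Database\n"
               "Dino Neri,neri@uni.example,Logic\n")
wrapper = CsvWrapper(SOURCE_FILE, {"mail": "mail", "area": "research_topic"})
print(wrapper.select("mail", "area", "Database"))  # ['bianchi@uni.example']
```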

29 Mediators (3) The interactions with a mediator can be divided into two phases: 1. the creation of the unified representation (publishing phase, at design time); 2. the formulation and execution of a query on the unified representation (querying phase).

30 Functional architecture of an MDBS in our example (figure: client, MultiDBMS, mediator, wrappers, DBMSs and databases; the global schema contains Studente (Student), Corso (Course) and Professore (Professor)).

31 Functional architecture of a mediator system: example (figure: client, MultiDBMS, mediator, wrappers, DBMSs and databases; the local schema contains Studente (Student), Corso (Course), Professore (Professor) and Modulo (Module)).

32 Virtual integration architecture including optimization functionality (figure: user queries are posed on the mediated schema; the mediator comprises a reformulator, an optimizer and an execution engine, and uses a data source catalog; each data source is accessed through a wrapper). Sources can be relational, hierarchical (IMS), structured files or web sites.
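The roles of the mediator's components can be sketched as three small functions over a data source catalog:

- the reformulator keeps the sources that can answer the queried relation;
- the optimizer orders them;
- the execution engine calls their wrappers.

The catalog fields, the cost estimates and the cheapest-first ordering are all assumptions made for illustration. Real optimizers consider much more than a single cost number.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class SourceEntry:
    """One hypothetical entry of the data source catalog."""
    name: str
    relations: Set[str]                      # global relations the source can answer
    est_cost: float                          # assumed cost estimate (e.g. latency)
    wrapper: Callable[[str], List[tuple]]    # stand-in for the source's wrapper


def reformulate(catalog: List[SourceEntry], relation: str) -> List[SourceEntry]:
    """Keep only the sources whose catalog entry covers the queried relation."""
    return [s for s in catalog if relation in s.relations]


def optimize(plan: List[SourceEntry]) -> List[SourceEntry]:
    """A deliberately simple optimizer: contact cheaper sources first."""
    return sorted(plan, key=lambda s: s.est_cost)


def execute(plan: List[SourceEntry], relation: str) -> List[tuple]:
    """Execution engine: call each wrapper in plan order and collect the answers."""
    results: List[tuple] = []
    for s in plan:
        results.extend(s.wrapper(relation))
    return results


catalog = [
    SourceEntry("web_site", {"Course"}, 5.0, lambda r: [("web", r)]),
    SourceEntry("relational_db", {"Course", "Professor"}, 1.0, lambda r: [("db", r)]),
    SourceEntry("structured_file", {"Professor"}, 2.0, lambda r: [("file", r)]),
]
plan = optimize(reformulate(catalog, "Course"))
print([s.name for s in plan])   # ['relational_db', 'web_site']
print(execute(plan, "Course"))  # [('db', 'Course'), ('web', 'Course')]
```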

33 DI systems: design-time vs. run-time issues. Publishing phase (or design time): – the global schema and the mappings must be defined from the source schemas. Run time: – queries are executed, and – the global schema, the local schemas and the mappings are maintained.

34 Mediators: relevant challenges (figure: layers User Interface, Mediator, Data Sources). Publishing phase: – visualizing the unified schema – model and language for representing the unified schema – building the unified schema – matching and mapping the unified schema and the local sources – managing updates – schema extraction. Querying phase: – model and language for formulating queries – model and language for querying the schema – query unfolding/rewriting – data fusion and cleaning – query transformation and execution.


36 (Figure: mediated schema, semantic mappings, wrappers, query reformulation, optimization and execution, arranged along design time and run time.)

37 Basic properties of a DI system. A system providing: – Uniform (same query interface to all sources) – Access to (queries; possibly updates too) – Multiple (we want many, but even 2 is hard) – Autonomous (the DBA doesn't report to you) – Heterogeneous (data models are different) – Structured (or at least semi-structured) – Data Sources (not only databases).

