1 Berendt: Advanced databases, winter term 2007/08, 1 Advanced databases – Defining and combining.


1 Berendt: Advanced databases, winter term 2007/08, http://www.cs.kuleuven.ac.be/~berendt/teaching/2007w/adb/
Advanced databases – Defining and combining heterogeneous databases: Basics and overview
Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science
http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/
Last update: 17 October 2007

2 Motivation: Price comparison engines search & combine heterogeneous travel-agency DBs, which search & combine heterogeneous airline DBs

3 Agenda
- Goals and challenges
- Global schema integration (short survey)
- Federated database systems
- An example: IBM's DB2
- User sovereignty & multidatabase language approach

4 Multi database systems
Multiple databases created for the same functionality
- Different operating systems, data formats, query languages etc.
Typically DBs managed by DBMSs running on heterogeneous computing platforms
Information sharing across dissimilar platforms
- Interconnect previously isolated software systems (DBMS)
- Not only invoke but also coordinate interactions

5 Interoperating with heterogeneous databases – requirements (1)
- Distribution transparency
  - Users must be able to access a number of different databases in the same way as they access a single database.
- Heterogeneity transparency
  - Users must be able to access other schemas in the same way as they access their local database (using a familiar model and language).
- The existing database systems and applications must not be changed.

6 Interoperating with heterogeneous databases – requirements (2)
- The system must easily accommodate the addition of new databases.
- The databases have to be accessible for both retrievals and updates.
- The performance of heterogeneous systems has to be comparable to that of homogeneous distributed systems.

7 Autonomy and heterogeneity
Interconnection and cooperation of autonomous and heterogeneous databases must address
- Distribution
- Autonomy
- Heterogeneity

8 Heterogeneity
- Heterogeneity is independent of the location of data
- When is an information system homogeneous?
  - The software that creates and manipulates data is the same
  - All data follows the same structure and data model and is part of a single universe of discourse
- Different levels of heterogeneity
  - Different languages to write applications
  - Different query languages
  - Different models
  - Different DBMSs
  - Different file systems
  - Semantic heterogeneity etc.

9 Autonomy
Databases are usually under separate and independent control
Aspects of autonomy
- Design autonomy: Local DBs choose their own data model, query language, interpretation of data etc.
- Communication autonomy: Local DBs decide when and how to respond to requests from other DBs
- Execution autonomy: Execution of local/external operations/transactions is not controlled by any external DBMS
- Association autonomy: Local DBs can decide how much of their data/functions/operations to share with other classes of users
- Another kind of autonomy: User autonomy / sovereignty!

10 Balancing autonomy and heterogeneity
Different degrees of autonomy:
- No/little autonomy (intra-corporate, poor networking infrastructure)
- More autonomy and flexible bridging of heterogeneity (federated approach)
- Autonomy over heterogeneity (multi database language approach)

11 Interoperability
The ability of the interoperating systems to request and receive services from each other and to use each other's functionality.
Systems are considered interoperable if
- They can exchange messages and requests
- They can receive services and operate as a unit in solving a common problem

12 Heterogeneous Distributed Databases
Information systems that provide interoperation and varying degrees of integration among multiple DBs are called
- Multi database systems or
- Federated (database) systems or
- More generally, heterogeneous distributed database systems (HDDBSs)

13 Solutions to integrating HDDBSs
- Global schema integration
- Federated database systems
- Multi database language approach

14 Agenda: Global schema integration (short survey)

15 Definition and advantages
Global database integration:
- Based on complete integration to provide a single view
Advantages:
- Consistent, uniform view of and access to data for users
- Users are unaware of the multiple existing DBs

16 Disadvantages
- Hard to automate the creation of a global schema: structural, semantic or behavioral conflicts
- Autonomy, esp. association autonomy, is sacrificed: all local data and operations have to be revealed
- Loss of semantic information, depending on how the schema integration is performed
- Correctness of the global schema is hard to prove because of context-dependent meanings
- Error-prone, time-consuming
- Unsuitable for frequent dynamic changes to schemas
- Does not scale well with the size of DB networks

17 Agenda: Federated database systems

18 Taxonomy - based on autonomy
A DBS is either centralized or distributed
- Centralized: a single DBMS managing a single DB
- Distributed: a single distributed DBMS managing multiple DBs
An MDBS supports operations on multiple DBs

19 What is a federated system? (check that not redundant)
A federated system integrates existing, possibly heterogeneous, databases while preserving their autonomy*. The main difference between federated systems and traditional distributed systems is that in federated systems each component remains autonomous. Autonomy of a component system means that the local administrator maintains some control over his/her system.
* A. P. Sheth and J. A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, 22(3):183-236, 1990.

20 Definition
- A Federated Database System (FDBS) is a collection of cooperating but autonomous component DBSs.
- Aim: remove the need for static global schema integration
- Allows each local DB to have more control over the shareable information
- Control is decentralized
- Integration need not be complete but depends on the needs of users
- More terminology: FDBMS = the software that controls and coordinates the component DBSs of an FDBS

21 FDBS coupling
Loosely coupled FDBS
- If it is the user's responsibility to create and maintain the federation. No control is enforced by the federation admin.
Tightly coupled FDBS
- If the federation admin has the responsibility for creating and maintaining the federation and actively controls access to the component DBSs.
Association autonomy of the individual component DBs still exists

22 FDBSs as a compromise
Compromise between
- No integration, in which users must explicitly interface between multiple autonomous DBs
AND
- Total integration, in which the autonomy of each component DBS is sacrificed so that users can access data through a single global interface but not as a local user
Support local and global (federated) operations

23 An FDBS and its components – cooperation among independent systems
(Figure annotations:) Can continue local operations and participate in more than 1 federation. Can be (de)centralized or another FDBMS.

24 Basic system components of the data management architecture

25 FDBS schemas
Local schema
- Conceptual schema of a component DB
Component schema
- Local schema translated to a common data model of the FDBS. Alleviates data model heterogeneity.
Export schema
- Specifies the objects shareable with other members or classes of members of the FDBS.
Federated schema
- A statically integrated schema or dynamic view of multiple export schemas. There can be multiple federated schemas.
External schema
- For customization when the federated schema is large and complicated. Another level of abstraction, for example for a class of users.
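The five schema levels can be read as a chain of transformations from a component DB up to a class of users. The following is only a minimal sketch under invented assumptions: the field names, the `EXPORTED` tuple and all function names are hypothetical and do not come from the slides or from any FDBMS product.

```python
# Hypothetical illustration of the five schema levels; every name is invented.

# 1. Local schema: the component DB's own conceptual schema and naming.
local_rows = [
    {"CUST_NO": 1, "NM": "Ada", "SALARY": 5000},
    {"CUST_NO": 2, "NM": "Bob", "SALARY": 4000},
]


def to_component(row):
    """2. Component schema: local names translated to the common data model."""
    return {"id": row["CUST_NO"], "name": row["NM"], "salary": row["SALARY"]}


EXPORTED = ("id", "name")  # what this component agrees to share


def to_export(row):
    """3. Export schema: only the shareable attributes are visible."""
    return {key: row[key] for key in EXPORTED}


def federated(export_sets):
    """4. Federated schema: integrate the export schemas of several components."""
    return [dict(row, source=name) for name, rows in export_sets.items() for row in rows]


def external(fed_rows):
    """5. External schema: a customised view for one class of users."""
    return sorted(row["name"] for row in fed_rows)


component_rows = [to_component(r) for r in local_rows]
export_rows = [to_export(r) for r in component_rows]
fed_rows = federated({"hr": export_rows})
names = external(fed_rows)
```

In this sketch the salary never leaves the component, which is the point of the export schema: the component keeps its association autonomy.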

26 The system of schemas needs to be extended → Five level schema architecture of an FDBS

27 Loosely coupled FDBSs
- User creates and maintains the federated schema
- Creating the schema corresponds to creating a view against the relevant export schemas
- Therefore, each user must be aware of the information and structure of the export schemas
- Hard to support view updates; therefore, assume highly autonomous read-only DBs

28 Loosely coupled FDBSs - Advantages
- Flexibility: different interpretations are possible for the same federated schema
- Easier to cope with dynamic changes in schemas since it is easier to create views. Detection of changes is, however, expensive.

29 Loosely coupled FDBSs - Disadvantages
- Duplicated effort in the creation of similar federated schemas.
- Difficulty in understanding the semantics of the schemas available to the user.
- Because multiple views may be created, view updating cannot be supported.

30 Tightly coupled FDBSs
- Aim: provide location, replication and distribution transparency
- Federation administrators have full control over the creation and maintenance of federated schemas and over access to other export schemas
- A single federated schema is the same as a global schema, but view updates are possible if administrators understand the mappings.

31 Tightly coupled FDBSs – Disadvantages
- The FDBS administrator and the component DBSs negotiate the creation of export schemas, during which the administrator has complete read access to the component schema and/or data. This violates autonomy.
- Changes in export/component schemas imply redoing the creation of the federated schema.

32 Basic system components of the data management architecture

33 Processors in an FDBS
- Transforming processors
  - Use mappings to transform commands from the internal command language to the local query language etc.
- Filtering processors
  - Use the access control specified in the export schema to limit the allowable operations submitted to the corresponding component schemas
- Constructing processors
  - Perform query decomposition and merge data
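How the three processor types cooperate on one federated request can be sketched in a few lines. This is an illustration only: the dictionary-based "command language", the component names `sf` and `ny`, and the in-memory tables are assumptions made for the example, not part of the architecture described by Sheth and Larson.

```python
# Hypothetical sketch of the three processor types; all names and formats are invented.

# Data held by two component DBSs, each with its own local table and column names.
LOCAL_TABLES = {
    "TRANS_SF": [{"ITM": "tv", "QTY": 2}],
    "TRANS_NY": [{"ITM": "tv", "QTY": 3}, {"ITM": "radio", "QTY": 1}],
}

# Export schemas: which local table is shared and which operations are allowed.
EXPORT = {
    "sf": {"table": "TRANS_SF", "allowed": {"read"}},
    "ny": {"table": "TRANS_NY", "allowed": {"read"}},
}


def filtering(operation, component):
    """Filtering processor: enforce the access control of the export schema."""
    if operation not in EXPORT[component]["allowed"]:
        raise PermissionError(f"{operation!r} is not allowed on {component!r}")


def transforming(command, component):
    """Transforming processor: map an internal command to a local query."""
    return {"table": EXPORT[component]["table"], "where": {"ITM": command["item"]}}


def run_local(query):
    """Stand-in for the component DBMS executing its local query."""
    rows = LOCAL_TABLES[query["table"]]
    return [r for r in rows if all(r[k] == v for k, v in query["where"].items())]


def constructing(command):
    """Constructing processor: one subquery per component, then merge the results."""
    total = 0
    for component in EXPORT:
        filtering("read", component)
        local_query = transforming(command, component)
        total += sum(row["QTY"] for row in run_local(local_query))
    return total


tv_total = constructing({"item": "tv"})
```

Keeping the filter separate from the transformation mirrors the slide: access control belongs to the export schema, while the mapping to the local query language belongs to the component schema.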

34 System architecture of an FDBS – schemas and processors

35 Data integration approaches
- Multidatabase languages and declarative integration languages
  - Collective identifiers, semantic variables, virtual classes that form a global schema
- Conceptual-level abstraction from data sources
  - Data integration performed on top of this conceptual layer
- Object-oriented virtual integration approaches
  - Enable the user to express specific views and ways to compose integrated data objects
- Ontology-based integration approaches
  - Single-ontology (global) or multi-ontologies
- Semantic Web approaches
  - Ontology-based
- Taxonomic database systems
  - Support multiple, overlapping classifications in centralized, non-integrated DB systems

36 Examples of integration approaches (1)

37 Examples of integration approaches (2)

38 Examples of integration approaches (3)

39 Agenda: An example: IBM's DB2

40 Background
- Garlic
  - research project
  - wrapper architecture (→ virtual integration)
  - start from a standard relational database, extend language and data model to support some object-oriented features
  - cross-source query optimization
- DB2 DataJoiner
  - commercial system
  - combines multiple heterogeneous relational sources
  - focus on query optimization
- DB2 (see Haas et al. 2002)
  - incorporates ideas of both Garlic and DataJoiner
  - user-defined functions to "federate" simple data sources

41 DB2 architecture for database federation

42 Styles of federation
- Scalar UDFs (user-defined functions)
  - Input: data from the surrounding SQL statement
  - Output: a single scalar result
  → Can federate function (combine data from one source with a function provided by another, in a single statement)
- Table UDFs
  - Input: as in scalar UDFs
  - Output: a table
  → Can federate data
  - Note: UDFs can also be used to access Web services
- Wrappers: Federate function and data
  - A wrapper transforms an external data source to table form
  - This data source / table is then identified by a nickname (and can be queried like a "normal" local table)
  - Wrappers for a variety of relational and non-relational sources are supplied (e.g., Oracle, Excel, XML)
  - + a toolkit for developing wrappers for other data sources
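The difference between federating function and federating data can be tried out with SQLite from Python's standard library. This is a stand-in, not DB2: `to_eur`, its made-up exchange rate, the `addressbook()` function and the sample rows are all hypothetical. Since Python's `sqlite3` module has no table-valued UDFs, the table UDF is emulated here by loading the function's result into a temporary table.

```python
import sqlite3


def to_eur(amount_usd):
    """Stand-in for a function provided by another source (the rate is made up)."""
    return round(amount_usd * 0.5, 2)


def addressbook():
    """Stand-in for an external, non-SQL data source that yields rows."""
    return [("Ada", "Acme"), ("Bob", "Initech")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company_Profiles (name TEXT, revenue_usd REAL)")
conn.executemany(
    "INSERT INTO Company_Profiles VALUES (?, ?)",
    [("Acme", 100.0), ("Initech", 40.0)],
)

# Scalar UDF: federate function. Local data is combined with an external
# function inside a single SQL statement.
conn.create_function("to_eur", 1, to_eur)
scalar_rows = conn.execute(
    "SELECT name, to_eur(revenue_usd) FROM Company_Profiles ORDER BY name"
).fetchall()

# Table UDF (emulated): federate data. The rows produced by the function are
# exposed as a table and joined with local data.
conn.execute("CREATE TEMP TABLE addressbook (first TEXT, company_name TEXT)")
conn.executemany("INSERT INTO addressbook VALUES (?, ?)", addressbook())
table_rows = conn.execute(
    "SELECT a.first FROM addressbook a "
    "JOIN Company_Profiles c ON c.name = a.company_name "
    "WHERE c.revenue_usd > 50 ORDER BY a.first"
).fetchall()
```

A wrapper goes one step further than either UDF style: the external source would be registered once under a nickname and then queried like a local table, without the explicit load step used above.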

43 Examples: Scalar UDFs
- Send a message to an MQSeries queue: db2mq.mqsend()
  - Built-in function
  - [MQSeries: a middleware that allows the exchange of messages between independent applications; all messages are transferred via this queue]
- Send a message with database content to the client application:
    SELECT db2mq.mqsend(a.headline)
    FROM Articles a
    WHERE a.article_timestamp >= CURRENT TIMESTAMP

44 Examples: Table UDFs
Data source: address book in a Lotus Notes database
    SELECT a.first, a.last, a.phone, a.email
    FROM TABLE (addressbook()) AS a, Company_Profiles c
    WHERE c.industry = 'FINANCIAL'
      AND c.revenue > 50000000
      AND c.name = a.company_name
Data source: local file system
    SELECT f.filename, f.author, f.last_modified_date
    FROM TABLE (dir('\laura\papers', '.pdf')) AS f
    WHERE f.last_modified_date [comparison operator missing in transcript] '07/04/2002'

45 Using wrappers to integrate different relational databases (overview)

46 Using wrappers to integrate different relational databases (sample queries)
1. Register nicknames for transactions from 2 company branches: sf.Transactions, ny.Transactions
2. Create federated view
    CREATE VIEW National_Transactions (store_id, tran_date, tran_id, item_id) AS
      SELECT store_id, tran_date, tran_id, item_id FROM sf.Transactions
      UNION ALL
      SELECT store_id, tran_date, tran_id, item_id FROM ny.Transactions
3. Generate a national sales report
    SELECT MONTH(tran_date), item_id, COUNT(*)
    FROM National_Transactions
    WHERE YEAR(tran_date)=2001
    GROUP BY MONTH(tran_date), item_id
NB: Can also generate materialized views (cache information locally): CREATE TABLE... AS...
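The three steps above can be imitated without DB2 by letting two attached SQLite databases play the role of the two branches. This is a sketch under stated assumptions: `ATTACH DATABASE` is SQLite's mechanism and only stands in for DB2 nicknames, the sample rows are invented, and `strftime` replaces DB2's `MONTH()`/`YEAR()` functions. The view is created as a `TEMP` view because SQLite does not let an ordinary view reference tables in another attached database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1 (stand-in for registering nicknames): attach one database per branch.
for branch in ("sf", "ny"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {branch}")
    conn.execute(
        f"CREATE TABLE {branch}.Transactions "
        "(store_id INTEGER, tran_date TEXT, tran_id INTEGER, item_id INTEGER)"
    )

# Invented sample data: three transactions in 2001, one in 2000.
conn.executemany(
    "INSERT INTO sf.Transactions VALUES (?, ?, ?, ?)",
    [(1, "2001-01-15", 10, 7), (1, "2001-02-03", 11, 7)],
)
conn.executemany(
    "INSERT INTO ny.Transactions VALUES (?, ?, ?, ?)",
    [(2, "2001-01-20", 20, 7), (2, "2000-12-31", 21, 7)],
)

# Step 2: the federated view over both branches.
conn.execute(
    """
    CREATE TEMP VIEW National_Transactions AS
    SELECT store_id, tran_date, tran_id, item_id FROM sf.Transactions
    UNION ALL
    SELECT store_id, tran_date, tran_id, item_id FROM ny.Transactions
    """
)

# Step 3: the national sales report for 2001, grouped by month and item.
report = conn.execute(
    """
    SELECT CAST(strftime('%m', tran_date) AS INTEGER) AS mon, item_id, COUNT(*)
    FROM National_Transactions
    WHERE strftime('%Y', tran_date) = '2001'
    GROUP BY mon, item_id
    ORDER BY mon
    """
).fetchall()
```

The report query never mentions `sf` or `ny`; it only sees the view, which is the location transparency the federated view is meant to provide.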

47 Federation of nonrelational structured data (overview) (A single XML document may be mapped to multiple nicknames)

48 Federation of nonrelational structured data (sample query)
Excel spreadsheets, nicknames: Items, Suppliers
    SELECT i.mfg, s.id
    FROM Items i, Suppliers s
    WHERE i.id = s.id
      AND i.id = (SELECT g.id
                  FROM (SELECT g.id, COUNT(*),
                               ROWNUMBER() OVER (ORDER BY COUNT(*) DESC) AS rownum
                        FROM National_Transactions g, Items it
                        WHERE it.cat = 'television'
                          AND g.id = it.id
                          AND YEAR(tran_date) = 2001
                        GROUP BY g.id) AS tv_total_2001
                  WHERE rownum = 1)

49 Agenda: User sovereignty & multidatabase language approach

50 Autonomy of data sources – autonomy and sovereignty of users
- Autonomy of data sources is valued highly
  - The degree to which a local data source can operate independently must not be reduced by the integration system
- But what about the autonomy of data receivers?
  - Human users and applications
  - Autonomous: have different information needs, vary in the ways they perceive their domain of interest
  - Using integrated data should be non-intrusive: users should not be forced to adapt to any standard concerning the structure and semantics of the data they desire

51 The ASME criteria for evaluating data integration approaches
- Abstraction
  - Shield users from low-level heterogeneities and underlying data sources
- Selection
  - The possibility of user-specific selection of data and data sources for individual integration
- Modeling
  - The availability of means to incorporate, in the process of data integration, user-specific ways of perceiving the domain of interest for which integrated data is desired
- Explicit semantics
  - Means for explicitly representing the real-world semantics of data
→ Do different approaches realize these (or not)?
→ Can we "have it all"?

52 Evaluation results (1)

53 Evaluation results (2)

54 Evaluation results (3)

55 Conclusion: Current state of the multi database language approach – disadvantages and future work needed
Lack of distribution and location transparency for users. Users are responsible for
- finding relevant DBs,
- understanding schemas,
- detecting and resolving semantic conflicts,
- performing view integration
Some support is offered by the language constructs
"abstracting the user from technical-level issues and supporting user-specific data selection and modeling are conflicting goals" (Ziegler 2004, p. 6)

56 What about user sovereignty like this?
- "Yahoo! Pipes is an interactive data aggregator and manipulator that lets you mashup your favorite online data sources.
- Like Unix pipes, simple commands can be combined together to create output that meets your needs:
  - combine many feeds into one, then sort, filter and translate to create your ultimate custom feed.
  - remix your favorite data sources and use the Pipe to power a new application.
  - ..." (http://pipes.yahoo.com/pipes/)

57 Next lecture
- Goals and challenges
- Global schema integration (short survey)
- Federated database systems
- An example: IBM's DB2
- User sovereignty & multidatabase language approach
- Schema integration

58 References / background reading; acknowledgements
- Slides 2-35 are based on
  - Meena Nagarajan (2006). Federated database systems. Part I. http://lsdis.cs.uga.edu/~meena/Spring06/ADB/Federated%20Database%20Systems.ppt
  - which in turn reports the classic survey paper: Amit P. Sheth, James A. Larson: Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Comput. Surv. 22(3): 183-236 (1990) (available for example at http://www.cs.auc.dk/~tbp/Teaching/DAT5E00/sheth.pdf)
- The slides on DB2 are based on the paper
  - Haas, L.M., Lin, E.T., & Roth, M.A. (2002). Data integration through database federation. IBM Systems Journal, 41(4), 578-596. http://researchweb.watson.ibm.com/journal/sj/414/haas.pdf
- The slides on user sovereignty and slides 35-38 are based on the paper
  - Ziegler, P. (2004). User-specific semantic integration of heterogeneous data: What remains to be done? IFI, University of Zurich, Technical Report ifi-2004.01. ftp://ftp.ifi.unizh.ch/pub/techreports/TR-2004/ifi-2004.01.pdf
- p. 40:
  - Garlic: M. Tork Roth, P. Schwarz, and L. Haas, "An Architecture for Transparent Access to Diverse Data Sources," Component Database Systems, K. R. Dittrich, A. Geppert, Editors, Morgan-Kaufmann Publishers, San Mateo, CA (2001), pp. 175–206.
  - DataJoiner: IBM Corporation, DataJoiner, http://www.software.ibm.com/data/datajoiner

