21.1 Introduction to Information Integration
CS257
Fan Yang
Need for Information Integration
- Ideal: all the data in the world could be put in a single database (ideal database system)
- In the real world, a single database is impossible:
  - databases are created independently
  - it is hard to design a database that supports every future use
How to Integrate
- Start over
  - build one database that contains all the data of the legacy databases
  - rewrite all the applications
  - result: painful
- Build a layer of abstraction (middleware) on top of all the legacy databases
  - this layer is often defined by a collection of classes
  - BUT…
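As a rough illustration of the middleware option, the sketch below puts a small collection of classes on top of two legacy databases. Everything in it is an assumption made for illustration: the class names (DealerSource, CarsTableSource, AutosTableSource, Middleware), the table layouts CARS(MODEL) and AUTOS(NAME), and the dealer names are hypothetical, and in-memory SQLite databases stand in for the legacy systems.

```python
import sqlite3
from abc import ABC, abstractmethod


class DealerSource(ABC):
    """Hypothetical wrapper interface: one subclass per kind of legacy database."""

    @abstractmethod
    def find_model(self, model):
        """Return matching cars as dicts with 'dealer' and 'model' keys."""


class CarsTableSource(DealerSource):
    """Wraps a legacy database assumed to store its stock in CARS(MODEL)."""

    def __init__(self, name, conn):
        self.name = name
        self.conn = conn

    def find_model(self, model):
        rows = self.conn.execute(
            "SELECT MODEL FROM CARS WHERE MODEL = ?", (model,)
        ).fetchall()
        return [{"dealer": self.name, "model": row[0]} for row in rows]


class AutosTableSource(DealerSource):
    """Wraps a legacy database assumed to use AUTOS(NAME) instead."""

    def __init__(self, name, conn):
        self.name = name
        self.conn = conn

    def find_model(self, model):
        rows = self.conn.execute(
            "SELECT NAME FROM AUTOS WHERE NAME = ?", (model,)
        ).fetchall()
        return [{"dealer": self.name, "model": row[0]} for row in rows]


class Middleware:
    """Abstraction layer: applications query this object, not the legacy databases."""

    def __init__(self, sources):
        self.sources = list(sources)

    def find_model(self, model):
        results = []
        for source in self.sources:
            results.extend(source.find_model(model))
        return results


def build_demo():
    """Create two in-memory stand-ins for legacy databases and wrap them."""
    first = sqlite3.connect(":memory:")
    first.execute("CREATE TABLE CARS (MODEL TEXT)")
    first.executemany("INSERT INTO CARS VALUES (?)", [("A6",), ("A4",)])

    second = sqlite3.connect(":memory:")
    second.execute("CREATE TABLE AUTOS (NAME TEXT)")
    second.executemany("INSERT INTO AUTOS VALUES (?)", [("A6",)])

    return Middleware([
        CarsTableSource("dealer1", first),
        AutosTableSource("dealer2", second),
    ])
```

Calling build_demo().find_model("A6") asks both wrapped databases and merges the answers. The legacy databases and their applications stay untouched; only the wrapper classes know each database's layout.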
Heterogeneity Problem
- What is the heterogeneity problem?
- Example: the dealers of Aardvark Automobile Co. maintain 1000 independent databases
- To find a model at another dealer, can we simply use this command?
    SELECT * FROM CARS WHERE MODEL = 'A6';
Types of Heterogeneity
- Communication heterogeneity
- Query-language heterogeneity
- Schema heterogeneity
- Data type differences
- Value heterogeneity
- Semantic heterogeneity
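Three of the types listed above can be made concrete with a small sketch. It assumes two hypothetical dealer databases, simulated with in-memory SQLite: one stores CARS(SERIAL, MODEL, COLOR) with colors spelled out, the other stores AUTOS(SERIALNO, NAME, COLOR) with the serial number as text and colors as codes. The table layouts, the sample rows, and the COLOR_CODES mapping are assumptions, not taken from any real dealer system.

```python
import sqlite3

# Dealer 1 (assumed layout): integer serial numbers, colors spelled out.
d1 = sqlite3.connect(":memory:")
d1.execute("CREATE TABLE CARS (SERIAL INTEGER, MODEL TEXT, COLOR TEXT)")
d1.execute("INSERT INTO CARS VALUES (123, 'A6', 'black')")

# Dealer 2 (assumed layout): different table and column names (schema
# heterogeneity), serial numbers stored as text (data type difference),
# and colors stored as codes (value heterogeneity).
d2 = sqlite3.connect(":memory:")
d2.execute("CREATE TABLE AUTOS (SERIALNO TEXT, NAME TEXT, COLOR TEXT)")
d2.execute("INSERT INTO AUTOS VALUES ('456', 'A6', 'BL')")

QUERY = "SELECT * FROM CARS WHERE MODEL = 'A6'"


def try_query(conn, sql):
    """Run sql and return the rows, or an error string if the query fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return "error: " + str(exc)


# Hypothetical mapping from dealer 2's color codes to dealer 1's color names.
COLOR_CODES = {"BL": "black"}


def normalized_dealer2(conn):
    """Translate dealer 2's rows into dealer 1's schema, types, and values."""
    rows = conn.execute("SELECT SERIALNO, NAME, COLOR FROM AUTOS").fetchall()
    return [
        (int(serial), name, COLOR_CODES.get(color, color))
        for serial, name, color in rows
    ]


result1 = try_query(d1, QUERY)   # works: dealer 1 has a CARS table
result2 = try_query(d2, QUERY)   # fails: dealer 2 has no CARS table
unified = result1 + normalized_dealer2(d2)
```

The same query succeeds against the first database and fails against the second, and the rows only become comparable after names, types, and values have been translated. Communication, query-language, and semantic heterogeneity are not covered by this sketch.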
Conclusion
- One database system for everything would be ideal, but it is impossible
- Independent databases are inconvenient
- Ways to integrate databases:
  1. start over
  2. middleware
- Heterogeneity problem