Chapter 11 Information Integration Spring 2001 Prof. Sang Ho Lee School of Computing, Soongsil Univ.

How to integrate information, which is usually scattered physically?
This is a question that none of us can avoid.
Approaches
–(Homogeneous) distributed DBMS (80’s)
–Federated databases, multidatabases, remote data access (90’s)
–Data warehouse, mediator (late 90’s)

Why Information Integration is Difficult (1)
Heterogeneous sources
Example (Aardvark Automobile Co.)
–1000 dealers
–Each dealer maintains a database of its cars in stock
–Aardvark wants to create an integrated database
–The 1000 dealers do not all use the same database schema
–Dealer 1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
–Dealer 2: Autos(serial, model, color), Options(serial, option)

Why Information Integration is Difficult (2)
Furthermore …
–Data type differences: Serial numbers might be represented by character strings or integers
–Value differences: The color black might be represented by an integer code, the string BLACK, or the code BL
–Semantic differences: One dealer distinguishes station wagons from minivans, while another doesn’t
–Missing values: A source does not record information that all or most of the other sources provide
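The data type and value differences above can be illustrated with a small normalization step. This is only a sketch: the integer code 1 for black and the helper names are assumptions made for illustration, since the slides name the three encodings of black but not a concrete code table.

```python
# Hypothetical code tables: only "BLACK" and "BL" come from the slides;
# the integer code 1 is an assumed example of a dealer-specific code.
COLOR_CODES = {1: "black"}
COLOR_ABBREVIATIONS = {"BL": "black"}


def normalize_serial(serial):
    """Represent every serial number as a string, whether stored as int or str."""
    return str(serial).strip()


def normalize_color(value):
    """Map an integer code, an abbreviation, or a spelled-out name to one form."""
    if value is None:
        return None  # missing value: the source did not record a color
    if isinstance(value, int):
        return COLOR_CODES.get(value)
    text = value.strip().upper()
    return COLOR_ABBREVIATIONS.get(text, text.lower())


# The same car as three differently designed sources might record it.
records = [
    {"serial": 1234, "color": 1},
    {"serial": "1234", "color": "BLACK"},
    {"serial": " 1234 ", "color": "BL"},
]
normalized = [
    (normalize_serial(r["serial"]), normalize_color(r["color"])) for r in records
]
print(normalized)
```

Semantic differences (station wagon versus minivan) cannot be repaired by a lookup like this one, because the coarser source simply lacks the distinction.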

Modes of Information Integration
Federated databases
–The sources are independent, but one source can call on others to supply information
Warehousing
–Copies of data from several sources are stored in a single database, called a (data) warehouse
Mediation
–A mediator is a software component that supports a virtual database, which the user may query as if it were materialized
–The mediator stores no data of its own

Federated Database Systems
A federated database system is a federation of existing database systems (called local database systems, LDBSs) that provides applications with a uniform means of access to data managed by more than one of these systems
In theory, local databases should preserve local autonomy

Local Autonomy (1)
Design autonomy
–Ability of an LDBS to make its own design decisions with respect to any matter, including data model, query language, constraints, system functions, semantic interpretation of data, …
Execution autonomy
–Ability of an LDBS to execute local operations without interference from external operations and to decide the order in which to schedule external operations

Local Autonomy (2)
Communication autonomy
–Ability of an LDBS to decide whether and when to communicate with other database systems
Association autonomy
–Ability of an LDBS to decide whether and how much to share its functionality and resources with others. For example, an LDBS may export only part of its database to external users, or even disassociate itself from the federation for some reason.

Federated Database Example
[Figure: a federated collection of four local databases, including DB1 and DB2]

Federated Database
If n databases each need to talk to the n – 1 other databases, then we must write n(n – 1) pieces of code to support queries between systems
This approach is easy to implement in some circumstances!
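To get a feel for the n(n – 1) growth, the count can be computed for the 1000 Aardvark dealers. The comparison with one wrapper per source is our reading of the mediator architecture described in this chapter, not a figure given in the slides, and the function names are illustrative.

```python
def pairwise_translators(n: int) -> int:
    """Pieces of translation code when each of n databases queries the other n - 1."""
    return n * (n - 1)


def wrappers_for_mediator(n: int) -> int:
    """Assumption: a mediator-based design needs one wrapper per source."""
    return n


dealers = 1000
federated_cost = pairwise_translators(dealers)
mediated_cost = wrappers_for_mediator(dealers)
print(federated_cost, mediated_cost)  # 999000 1000
```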

Query Translation Example
Dealer 1: NeededCars(model, color, autoTrans)
Dealer 2: Autos(serial, model, color), Options(serial, option)
/* Dealer 1 queries Dealer 2 for needed cars */
for (each tuple (:m, :c, :a) in NeededCars) {
    if (:a = TRUE) { /* automatic transmission wanted */
        SELECT Autos.serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial
          AND Options.option = 'autoTrans'
          AND Autos.model = :m
          AND Autos.color = :c;
    } else { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m
          AND Autos.color = :c
          AND NOT EXISTS (
              SELECT * FROM Options
              WHERE serial = Autos.serial
                AND option = 'autoTrans'
          );
    }
}
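The translation loop can be tried out against an in-memory SQLite database standing in for Dealer 2. This is a sketch under assumptions: the sample rows, the Python list playing the role of NeededCars, and the helper find_serials are all illustrative, and `?` placeholders replace the :m and :c host variables.

```python
import sqlite3

# In-memory stand-in for Dealer 2's database (sample rows are made up).
dealer2 = sqlite3.connect(":memory:")
dealer2.executescript("""
    CREATE TABLE Autos(serial INTEGER PRIMARY KEY, model TEXT, color TEXT);
    CREATE TABLE Options(serial INTEGER, option TEXT);
    INSERT INTO Autos VALUES (1, 'Gobi', 'blue'), (2, 'Gobi', 'blue'), (3, 'Gobi', 'red');
    INSERT INTO Options VALUES (1, 'autoTrans'), (3, 'autoTrans'), (3, 'cdPlayer');
""")

WITH_AUTO = """
    SELECT Autos.serial FROM Autos, Options
    WHERE Autos.serial = Options.serial AND Options.option = 'autoTrans'
      AND Autos.model = ? AND Autos.color = ?
"""
WITHOUT_AUTO = """
    SELECT Autos.serial FROM Autos
    WHERE Autos.model = ? AND Autos.color = ?
      AND NOT EXISTS (SELECT * FROM Options
                      WHERE Options.serial = Autos.serial
                        AND Options.option = 'autoTrans')
"""

# Dealer 1's NeededCars(model, color, autoTrans) tuples.
needed_cars = [("Gobi", "blue", True), ("Gobi", "blue", False), ("Gobi", "red", False)]


def find_serials(model, color, auto_trans):
    """Pick the query form that matches the autoTrans flag and run it at Dealer 2."""
    sql = WITH_AUTO if auto_trans else WITHOUT_AUTO
    return sorted(row[0] for row in dealer2.execute(sql, (model, color)))


matches = {car: find_serials(*car) for car in needed_cars}
print(matches)
```

The two query forms exist because Dealer 1 stores automatic transmission as an attribute, while Dealer 2 stores it as a row in Options.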

Mediators
A mediator supports a virtual view or a collection of views
It does not store any data of its own
[Figure: a mediator above wrappers for Source 1 and Source 2, with query and result arrows between the layers]

Mediator Example (1)
–A view that is a single relation
  AutosMed(serialNo, model, color, autoTrans, dealer)
–A query to the mediator
  SELECT serialNo, model FROM AutosMed WHERE color = 'red'
–The mediator can forward the same query to each of the two wrappers
–The translation work can be done by the wrappers alone

Mediator Example (2)
–A suitable translation for Dealer 1
  Cars(serialNo, model, color, autoTrans, cdPlayer, …)
  SELECT serialNo, model FROM Cars WHERE color = 'red';
–A suitable translation for Dealer 2
  Autos(serial, model, color), Options(serial, option)
  SELECT serial, model FROM Autos WHERE color = 'red';
–The two wrappers return to the mediator a set of serialNo-model pairs and a set of serial-model pairs, respectively
–The mediator can take the union of these sets and return the result to the user
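The mediator example can be sketched with two in-memory SQLite databases acting as the dealers. The sample rows (including the model name 'Sahara'), the cut-down Cars schema, and the function names wrapper1, wrapper2, and mediator_query are assumptions made for illustration only.

```python
import sqlite3


def make_source(script):
    """Create an in-memory database that plays the role of one dealer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


dealer1 = make_source("""
    CREATE TABLE Cars(serialNo INTEGER, model TEXT, color TEXT,
                      autoTrans INTEGER, cdPlayer INTEGER);
    INSERT INTO Cars VALUES (101, 'Gobi', 'red', 1, 0), (102, 'Gobi', 'blue', 0, 1);
""")
dealer2 = make_source("""
    CREATE TABLE Autos(serial INTEGER, model TEXT, color TEXT);
    CREATE TABLE Options(serial INTEGER, option TEXT);
    INSERT INTO Autos VALUES (201, 'Sahara', 'red'), (202, 'Gobi', 'green');
""")


def wrapper1(color):
    """Translate the mediator query into Dealer 1's schema."""
    sql = "SELECT serialNo, model FROM Cars WHERE color = ?"
    return set(dealer1.execute(sql, (color,)))


def wrapper2(color):
    """Translate the mediator query into Dealer 2's schema."""
    sql = "SELECT serial, model FROM Autos WHERE color = ?"
    return set(dealer2.execute(sql, (color,)))


def mediator_query(color):
    """Answer SELECT serialNo, model FROM AutosMed WHERE color = <color>."""
    return wrapper1(color) | wrapper2(color)


red_cars = mediator_query("red")
print(sorted(red_cars))
```

The mediator itself holds no rows; it only forwards the query and unions what the wrappers return.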

Wrappers in Mediator-Based Systems
Sources could be DBMSs (in various models), file systems, Web servers, …
A wrapper handles all connection and query-translation problems peculiar to its source
Mediator systems require more complex wrappers than do most warehouse systems
Techniques
–Wrapper generator
–Template-based
–Filter techniques
–Etc.

Templates for Query Patterns
Templates are queries with parameters that represent constants
–Example
  SELECT * FROM AutosMed WHERE color = '$c'
  =>
  SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color = '$c';
In general there would be 2^n templates if we have the option of specifying n attributes
The number of templates could grow unreasonably large
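The 2^n growth can be made concrete by generating one template per subset of attributes that a query may specify. This is a sketch, not a real wrapper generator: the attribute list and the function all_templates are assumptions, and `?` placeholders stand in for parameters such as $c.

```python
from itertools import combinations

# Attributes of AutosMed that a query might constrain (an assumed choice).
ATTRIBUTES = ["model", "color", "autoTrans"]


def all_templates(attributes):
    """Build one Dealer 1 template for every subset of specifiable attributes."""
    base = "SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars"
    templates = {}
    for k in range(len(attributes) + 1):
        for subset in combinations(attributes, k):
            where = " AND ".join(f"{a} = ?" for a in subset)
            templates[frozenset(subset)] = base + (f" WHERE {where}" if where else "")
    return templates


templates = all_templates(ATTRIBUTES)
print(len(templates))                    # 8, i.e. 2**3
print(templates[frozenset({"color"})])   # the color-only template from the example
```

With three attributes there are already eight templates; each further attribute doubles the count, which motivates the filter technique.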

Wrapper Generators
Wrapper generator
–The software that creates the wrapper
–It produces a table that holds the various query patterns contained in the templates
[Figure: wrapper generator architecture with the labels Templates, Wrapper generator, Table, Driver, Source, Queries from mediator, Queries, and Results]

Filters
It is not always realistic to write a template for every possible form of query
Another approach to supporting more queries is to have the wrapper filter the results of queries

Filters Example
–Suppose the only template we have is the one that finds cars given a color
–The mediator needs to find blue ‘Gobi’ model cars
  SELECT * FROM AutosMed WHERE color = 'blue' AND model = 'Gobi'
–A possible way to answer the query
  Use the template (with $c = 'blue')
  Store the result in a temporary relation, TempAutos
  Select the Gobis from TempAutos
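The three filter steps can be sketched with SQLite. The sample rows, the cut-down Cars schema, and the function answer_with_filter are illustrative assumptions; the temporary relation is created once and refilled on each call, which is one possible way to realize the "store the result" step.

```python
import sqlite3

# In-memory stand-in for Dealer 1 (sample rows are made up).
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE Cars(serialNo INTEGER, model TEXT, color TEXT, autoTrans INTEGER);
    INSERT INTO Cars VALUES (1, 'Gobi', 'blue', 1), (2, 'Sahara', 'blue', 0),
                            (3, 'Gobi', 'red', 1);
    CREATE TEMP TABLE TempAutos(serialNo INTEGER, model TEXT, color TEXT,
                                autoTrans INTEGER, dealer TEXT);
""")

# The only template the wrapper has: find cars of a given color.
COLOR_TEMPLATE = (
    "SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color = ?"
)


def answer_with_filter(color, model):
    """Run the color template, store its result, then filter by model."""
    source.execute("DELETE FROM TempAutos")
    source.execute("INSERT INTO TempAutos " + COLOR_TEMPLATE, (color,))
    return source.execute(
        "SELECT * FROM TempAutos WHERE model = ?", (model,)
    ).fetchall()


blue_gobis = answer_with_filter("blue", "Gobi")
print(blue_gobis)
```

The wrapper fetches more rows than the mediator asked for (all blue cars) and discards the extras locally, trading some transfer cost for not needing a color-and-model template.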

Data Warehousing
Growing industry since the mid-90’s
Ranges from desktop to huge
Lots of buzzwords, hype
–Slice & dice, rollup, MOLAP, pivot, …

Information as a Competitive Weapon
Organizations have collected large amounts of data.
Now it is time to use it to their advantage.

Can You Easily Answer These Questions?
–What are Personnel Services costs across all departments for all funding sources?
–What are the effects of outsourcing specific services?
–What is the correlation between expenditures and collection of delinquent taxes?
–What is the impact on revenues and expenditures of changing the operating hours of the Dept. of Motor Vehicles?
–What is the economic impact of the small business initiative in our district?

What is a Warehouse (1)
Collection of diverse data
–Subject oriented
–Aimed at executive, decision maker
–Often a copy of operational data
–With value-added data (e.g., summaries, history)
–Integrated
–Time-varying
–Non-volatile
AND …

What is a Warehouse (2)
Collection of tools
–Gathering data
–Cleansing, integrating
–Querying, reporting, analysis
–Data mining
–Monitoring, administering warehouse

Warehouse Architecture
[Figure: warehouse architecture with the labels Client, Query & Analysis, Warehouse, Metadata, Integration, and Source]

Motivating Examples
–Forecasting
–Comparing performance of units
–Monitoring, detecting fraud
–Visualization

Why a Warehouse
Two approaches:
–Query-driven (lazy)
–Warehouse (eager)
[Figure labels: Source, ?]

Query-driven approach
[Figure: query-driven architecture with the labels Client, Mediator, Wrapper, and Source]

Advantages of Query-driven
No need to copy data
–Less storage
–No need to purchase data
More up-to-date data
Query needs can be unknown
Only query interface needed at sources
May be less draining on sources

Advantages of Warehousing
High query performance
Queries not visible outside warehouse
Local processing at sources unaffected
Can operate when sources unavailable
Can query data not stored in a DBMS
Extra information at warehouse
–Modify, summarize (store aggregates)
–Add historical information

OLTP vs. OLAP
OLTP (On-Line Transaction Processing)
–Describes processing at operational sites
OLAP (On-Line Analytical Processing)
–Describes processing at warehouse

OLTP vs. OLAP
OLTP
–Mostly updates
–Many small transactions
–MB-TB of data
–Current snapshot
–Raw data
–Clerical users
–Consistency, recoverability critical
Warehouse
–Mostly reads
–Queries are long and complex
–GB-TB of data
–History
–Summarized, consolidated data
–Decision-makers, analysts as users

OLAP Example
The schema for the warehouse
–Sales(serialNo, date, dealer, price)
–Autos(serialNo, model, color)
–Dealers(name, city, state, phone)
A typical decision-support query
–SELECT state, AVG(price) FROM Sales, Dealers WHERE Sales.dealer = Dealers.name AND date >= ' ' GROUP BY state;
Common OLTP query
–“Find the price at which the auto with serial number 123 was sold”
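The contrast between the two query styles can be tried on a toy warehouse in SQLite. Everything beyond the schema is an assumption: the dealer names, prices, and in particular the cutoff date '2001-01-01', which is a placeholder because the date literal is missing from the slide. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE Sales(serialNo INTEGER, date TEXT, dealer TEXT, price REAL);
    CREATE TABLE Dealers(name TEXT, city TEXT, state TEXT, phone TEXT);
    INSERT INTO Dealers VALUES
        ('North Motors', 'Fresno', 'CA', '555-0100'),
        ('Bay Autos', 'Oakland', 'CA', '555-0101'),
        ('Lone Star', 'Austin', 'TX', '555-0102');
    INSERT INTO Sales VALUES
        (121, '2001-02-01', 'North Motors', 20000),
        (122, '2001-03-01', 'Bay Autos', 30000),
        (123, '2001-03-05', 'Lone Star', 18000),
        (124, '2000-12-01', 'Lone Star', 99000);
""")

CUTOFF = "2001-01-01"  # assumed placeholder; the slide's date literal is lost

# Decision-support (OLAP) query: touches many rows and aggregates them.
avg_price_by_state = warehouse.execute(
    """
    SELECT state, AVG(price)
    FROM Sales, Dealers
    WHERE Sales.dealer = Dealers.name AND date >= ?
    GROUP BY state
    ORDER BY state
    """,
    (CUTOFF,),
).fetchall()

# OLTP query: looks up a single row by key.
price_of_123 = warehouse.execute(
    "SELECT price FROM Sales WHERE serialNo = ?", (123,)
).fetchone()[0]

print(avg_price_by_state, price_of_123)
```

The sale dated 2000-12-01 falls before the assumed cutoff, so it does not affect the Texas average.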

Warehouse Models and Operations
Data models
–Relations
–Stars and snowflakes
–Cubes
Operations
–Slice and dice
–Roll-up, drill-down
–Pivoting
–Other

Star Schemas
Star schema = fact table + dimension tables
[Figure: star schema with the labels Fact table, Dimension table, and Dependent attributes]

Example-1 (1)
Sales(serialNo, date, dealer, price)
Autos(serialNo, model, color)
Dealers(name, city, state, phone)
Sales is a fact table
–serialNo, date, dealer are dimensions
–The one dependent attribute is price, which is what OLAP queries will typically request in an aggregation
The Autos and Dealers relations are dimension tables
–Attribute serialNo in the fact table is a foreign key, referencing serialNo of dimension table Autos
Joins between the fact table and dimension tables are frequent
[Figure: axes date, dealer, car]

Example-1 (2)
A time dimension table
  Days(day, week, month, year)
–Since analysts frequently want to group by various time units, it helps to build into the database a notion of time, as if there were a time dimension table such as the one above
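One way such a Days table could be populated is to derive week, month, and year from each calendar date. This is a sketch under assumptions: the slides do not say how weeks are numbered, so ISO week numbers are used here, and the function name days_rows is illustrative.

```python
from datetime import date, timedelta


def days_rows(start, end):
    """Build (day, week, month, year) rows for every date from start to end."""
    rows = []
    current = start
    while current <= end:
        iso_week = current.isocalendar()[1]  # assumption: ISO week numbering
        rows.append((current.isoformat(), iso_week, current.month, current.year))
        current += timedelta(days=1)
    return rows


rows = days_rows(date(2001, 1, 1), date(2001, 1, 3))
print(rows)
```

Rows like these can be loaded into Days once, after which a query can group sales by week, month, or year with a plain join on the day.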

Example-2 (1)

Example-2 (2)

Slicing and Dicing
Dicing
–For example, in the time dimension, we might partition (“group by” clause) according to days, weeks, months, years, or not partition at all
–Partitioning is also possible for cars and dealers
Slicing
–Through the “where” clause, a query focuses on partitions along one or more dimensions
[Figure: axes dealer, date, car]

Example 1
A query in which we ask for a slice in one dimension (the date) and dice in two other dimensions (car and dealer)
The date is divided into four groups, …
[Figure: axes car, date, dealer]

More Examples
SELECT color, SUM(price) FROM Sales NATURAL JOIN Autos WHERE model = 'Gobi' GROUP BY color;
–This query dices by color and then slices by model
SELECT dealer, month, SUM(price) FROM (Sales NATURAL JOIN Autos) JOIN Days ON date = day WHERE model = 'Gobi' AND color = 'red' GROUP BY month, dealer;
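Both slice-and-dice queries can be run on a small star schema in SQLite. The sample rows and dealer names are made up. The second query groups by month and dealer so that the grouping matches the selected columns, drops the parentheses around the natural join (joins associate left to right, so the result is the same), and both queries gain an ORDER BY purely for predictable output.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Sales(serialNo INTEGER, date TEXT, dealer TEXT, price REAL);
    CREATE TABLE Autos(serialNo INTEGER, model TEXT, color TEXT);
    CREATE TABLE Days(day TEXT, week INTEGER, month INTEGER, year INTEGER);
    INSERT INTO Autos VALUES (1, 'Gobi', 'red'), (2, 'Gobi', 'red'),
                             (3, 'Gobi', 'blue'), (4, 'Sahara', 'red');
    INSERT INTO Sales VALUES (1, '2001-01-10', 'D1', 100), (2, '2001-02-15', 'D1', 200),
                             (3, '2001-01-20', 'D2', 300), (4, '2001-01-25', 'D2', 400);
    INSERT INTO Days VALUES ('2001-01-10', 2, 1, 2001), ('2001-01-20', 3, 1, 2001),
                            ('2001-01-25', 4, 1, 2001), ('2001-02-15', 7, 2, 2001);
""")

# Slice on model = 'Gobi', dice by color.
by_color = db.execute("""
    SELECT color, SUM(price)
    FROM Sales NATURAL JOIN Autos
    WHERE model = 'Gobi'
    GROUP BY color
    ORDER BY color
""").fetchall()

# Slice on red Gobis, dice by month and dealer.
by_month_dealer = db.execute("""
    SELECT dealer, month, SUM(price)
    FROM Sales NATURAL JOIN Autos JOIN Days ON date = day
    WHERE model = 'Gobi' AND color = 'red'
    GROUP BY month, dealer
    ORDER BY month, dealer
""").fetchall()

print(by_color)
print(by_month_dealer)
```

The WHERE clause picks the slice and the GROUP BY clause decides how finely the remaining data is diced.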

How to support cube-structured data for OLAP
ROLAP, or Relational OLAP
–Data may be stored in relations with a specialized structure called a “star schema”
MOLAP, or Multidimensional OLAP
–A specialized structure, the “data cube”, is used to hold the data