Introduction to Data Warehousing Enrico Franconi CS 636.

Slides:



Advertisements
Similar presentations
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
Advertisements

Data Warehouse Architecture Sakthi Angappamudali Data Architect, The Oregon State University, Corvallis 16 th May, 2005.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #15.
Data Management for Decision Support Session - 1 Prof. Bharat Bhasker.
ICS 421 Spring 2010 Data Warehousing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/18/20101Lipyeow.
CSE6011 Data Warehouse and OLAP  Why data warehouse  What’s data warehouse  What’s multi-dimensional data model  What’s difference between OLAP and.
COMP 578 Data Warehouse and Data Warehousing: An Introduction
Distributed DBMSs A distributed database is a single logical database that is physically distributed to computers on a network. Homogeneous DDBMS has the.
SLIDE 1IS 257 – Fall 2011 Data Warehousing University of California, Berkeley School of Information IS 257: Database Management.
11/1/2001Database Management -- R. Larson Data Warehouses, Decision Support and Data Mining University of California, Berkeley School of Information Management.
Data Mining and Data Warehousing – a connected view.
SLIDE 1IS Fall 2002 Data Warehouses, Decision Support and Data Mining University of California, Berkeley School of Information Management.
10/30/2001Database Management -- R. Larson Data Warehousing University of California, Berkeley School of Information Management and Systems SIMS 257: Database.
SLIDE 1IS 257 – Spring 2004 Data Warehouses, Decision Support and Data Mining University of California, Berkeley School of Information Management.
Chapter 13 The Data Warehouse
11/2/2000Database Management -- R. Larson Data Warehouses, Decision Support and Data Mining University of California, Berkeley School of Information Management.
DATA WAREHOUSE (Muscat, Oman).
Data Warehousing DSCI 4103 Dr. Mennecke Introduction and Chapter 1.
An Overview of Data Warehousing and OLTP Technology Presenter: Parminder Jeet Kaur Discussion Lead: Kailang.
CS 345: Topics in Data Warehousing Tuesday, September 28, 2004.
A Comparsion of Databases and Data Warehouses Name: Liliana Livorová Subject: Distributed Data Processing.
M ODULE 5 Metadata, Tools, and Data Warehousing Section 4 Data Warehouse Administration 1 ITEC 450.
Joachim Hammer 1 Data Warehousing Overview, Terminology, and Research Issues Joachim Hammer.
1 Database Administration (CG168) – Lecture 10a: Introduction to Data Warehousing Data Warehousing “An Introduction” Dr. Akhtar Ali School of Computing,
Data Management for Decision Support Session-2 Prof. Bharat Bhasker.
DW-1: Introduction to Data Warehousing. Overview What is Database What Is Data Warehousing Data Marts and Data Warehouses The Data Warehousing Process.
Data Warehouse Overview September 28, 2012 presented by Terry Bilskie.
AN OVERVIEW OF DATA WAREHOUSING
Intro. to Data Warehouse
© 2007 by Prentice Hall 1 Introduction to databases.
Data warehousing and online analytical processing- Ref Chap 4) By Asst Prof. Muhammad Amir Alam.
Data Warehouse Fundamentals Rabie A. Ramadan, PhD 2.
1 Data Warehouses BUAD/American University Data Warehouses.
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
1 Reviewing Data Warehouse Basics. Lessons 1.Reviewing Data Warehouse Basics 2.Defining the Business and Logical Models 3.Creating the Dimensional Model.
Announcements. Data Management Chapter 12 Traditional File Approach  Structure Field  Record  File  Fixed All records have common fields, and a field.
SLIDE 1IS 257 – Fall 2014 Data Warehousing University of California, Berkeley School of Information IS 257: Database Management.
Data Management for Decision Support Session-3 Prof. Bharat Bhasker.
Sachin Goel (68) Manav Mudgal (69) Piyush Samsukha (76) Rachit Singhal (82) Richa Somvanshi (85) Sahar ( )
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
CISB594 – Business Intelligence Data Warehousing Part I.
Data Warehouses and OLAP Data Management Dennis Volemi D61/70384/2009 Judy Mwangoe D61/73260/2009 Jeremy Ndirangu D61/75216/2009.
DBSQL 9-1 Copyright © Genetic Computer School 2009 Chapter 9 Data Mining and Data Warehousing.
MAIN BOOKS 1. DATA WAREHOUSING IN THE REAL WORLD : Sam Anshory & Dennis Murray, Pearson 2. DATA MINING CONCEPTS AND TECHNIQUES : Jiawei Han & Micheline.
Management Information Systems, 4 th Edition 1 Chapter 8 Data and Knowledge Management.
Acct 6910 Building Business Intelligence Systems An Introduction to Data Warehouse.
Data Warehousing MEC 623 – Data Warehousing and Data Mining.
Data Warehousing/Mining 1 Data Warehousing/Mining Introduction.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 5: Data Warehousing.
Data Warehouse – Your Key to Success. Data Warehouse A data warehouse is a  subject-oriented  Integrated  Time-variant  Non-volatile  Restructure.
Chapter 1 Overview of Databases and Transaction Processing.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 8: Data Warehousing.
2 Copyright © 2006, Oracle. All rights reserved. Defining Data Warehouse Concepts and Terminology.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
Data Warehouse and OLAP
Advanced Applied IT for Business 2
Data warehouse.
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Data warehouse and OLAP
Instructor: Dan Hebert
Data Warehouse and OLAP
Introduction to Data Warehousing
Data Warehouse and OLAP
Data and Interoperability:
Data Warehouse.
Introduction to Data Warehousing
Data Warehouse and OLAP
Data Warehouse and OLAP Technology
Presentation transcript:

Introduction to Data Warehousing Enrico Franconi CS 636

CS 3362 Problem: Heterogeneous Information Sources “Heterogeneities are everywhere”  Different interfaces  Different data representations  Duplicate and inconsistent information Personal Databases Digital Libraries Scientific Databases World Wide Web

CS 3363 Problem: Data Management in Large Enterprises  Vertical fragmentation of informational systems (vertical stove pipes)  Result of application (user)-driven development of operational systems Sales AdministrationFinanceManufacturing... Sales Planning Stock Mngmt... Suppliers... Debt Mngmt Num. Control... Inventory

CS 3364 Goal: Unified Access to Data Integration System  Collects and combines information  Provides integrated view, uniform user interface  Supports sharing World Wide Web Digital LibrariesScientific Databases Personal Databases

CS 3365  Two Approaches:  Query-Driven (Lazy)  Warehouse (Eager) Source ? Why a Warehouse?

CS 3366 The Traditional Research Approach Source... Integration System... Metadata Clients Wrapper  Query-driven (lazy, on-demand)

CS 3367 Disadvantages of Query-Driven Approach  Delay in query processing  Slow or unavailable information sources  Complex filtering and integration  Inefficient and potentially expensive for frequent queries  Competes with local processing at sources  Hasn’t caught on in industry

CS 3368 The Warehousing ApproachDataWarehouse Clients Source... Extractor/ Monitor Integration System... Metadata Extractor/ Monitor Extractor/ Monitor  Information integrated in advance  Stored in wh for direct querying and analysis

CS 3369 Advantages of Warehousing Approach  High query performance  But not necessarily most current information  Doesn’t interfere with local processing at sources  Complex queries at warehouse  OLTP at information sources  Information copied at warehouse  Can modify, annotate, summarize, restructure, etc.  Can store historical information  Security, no auditing  Has caught on in industry

CS Not Either-Or Decision  Query-driven approach still better for  Rapidly changing information  Rapidly changing information sources  Truly vast amounts of data from large numbers of sources  Clients with unpredictable needs

CS What is a Data Warehouse? A Practitioners Viewpoint “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” -- Barry Devlin, IBM Consultant

CS What is a Data Warehouse? An Alternative Viewpoint “A DW is a  subject-oriented,  integrated,  time-varying,  non-volatile collection of data that is used primarily in organizational decision making.” -- W.H. Inmon, Building the Data Warehouse, 1992

CS A Data Warehouse is...  Stored collection of diverse data  A solution to data integration problem  Single repository of information  Subject-oriented  Organized by subject, not by application  Used for analysis, data mining, etc.  Optimized differently from transaction- oriented db  User interface aimed at executive

CS … Cont’d  Large volume of data (Gb, Tb)  Non-volatile  Historical  Time attributes are important  Updates infrequent  May be append-only  Examples  All transactions ever at Sainsbury’s  Complete client histories at insurance firm  LSE financial information and portfolios

CS Generic Warehouse Architecture Extractor/ Monitor Extractor/ Monitor Extractor/ Monitor Integrator Warehouse Client Design Phase Maintenance Loading... Metadata Optimization Query & Analysis

CS Data Warehouse Architectures: Conceptual View  Single-layer  Every data element is stored once only  Virtual warehouse  Two-layer  Real-time + derived data  Most commonly used approach in industry today “Real-time data” Operational systems Informational systems Derived Data Real-time data Operational systems Informational systems

CS Three-layer Architecture: Conceptual View  Transformation of real-time data to derived data really requires two steps Derived Data Real-time data Operational systems Informational systems Reconciled Data Physical Implementation of the Data Warehouse View level “Particular informational needs”

CS Data Warehousing: Two Distinct Issues (1) How to get information into warehouse “Data warehousing” (2) What to do with data once it’s in warehouse “Warehouse DBMS”  Both rich research areas  Industry has focused on (2)

CS Issues in Data Warehousing  Warehouse Design  Extraction  Wrappers, monitors (change detectors)  Integration  Cleansing & merging  Warehousing specification & Maintenance  Optimizations  Miscellaneous (e.g., evolution)

CS  OLTP: On Line Transaction Processing  Describes processing at operational sites  OLAP: On Line Analytical Processing  Describes processing at warehouse OLTP vs. OLAP

CS Warehouse is a Specialized DB Standard DB (OLTP)  Mostly updates  Many small transactions  Mb - Gb of data  Current snapshot  Index/hash on p.k.  Raw data  Thousands of users (e.g., clerical users) Warehouse (OLAP)  Mostly reads  Queries are long and complex  Gb - Tb of data  History  Lots of scans  Summarized, reconciled data  Hundreds of users (e.g., decision-makers, analysts)