CISB594 – Business Intelligence Data Warehousing Part I.


Reference
Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.

Objectives
At the end of this lecture, you should be able to:
- Understand the basic definitions and concepts of data warehouses
- Understand how a data warehouse differs from a database
- Describe the characteristics of a data warehouse
- Describe the data warehouse process overview
- Describe the different types of data warehouse architectures

Data Warehouse
- A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.
- “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time.”
- A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. (Wikipedia)
- In your own words?

Characteristics of data warehousing: main
- Subject oriented: data are organized by detailed subject and contain only information relevant for decision support, unlike operational databases, which are product oriented.
- Integrated: data from different sources must be placed into a consistent format, which means resolving naming conflicts and discrepancies.
- Time variant (time series): the warehouse maintains historical data. Data for analysis from multiple sources contain multiple time points.
- Nonvolatile: after data are entered into a data warehouse, users cannot change or update them.
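The integrated, time-variant, and nonvolatile properties can be illustrated with a small in-memory sketch. The class `AppendOnlyWarehouse`, the record type `SalesFact`, and the source field names `cust_id` and `CustomerNo` are hypothetical and chosen only for illustration; a real warehouse would enforce these properties in the database and in the load process.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored fact cannot be modified in place
class SalesFact:
    customer_id: str  # subject: customer
    amount: float
    as_of: date       # time variant: every row carries its point in time


class AppendOnlyWarehouse:
    """Hypothetical in-memory stand-in for a warehouse table."""

    def __init__(self) -> None:
        self._rows: list[SalesFact] = []

    def load(self, row: SalesFact) -> None:
        # Nonvolatile: loading only appends; there is no update or delete.
        self._rows.append(row)

    def history(self, customer_id: str) -> list[SalesFact]:
        # Historical rows stay available, returned in time order.
        return sorted(
            (r for r in self._rows if r.customer_id == customer_id),
            key=lambda r: r.as_of,
        )


def integrate(source_record: dict) -> SalesFact:
    # Integrated: two (hypothetical) source systems name the same field
    # differently, so the naming conflict is resolved before loading.
    customer = source_record.get("cust_id") or source_record.get("CustomerNo")
    return SalesFact(
        str(customer),
        float(source_record["amount"]),
        date.fromisoformat(source_record["date"]),
    )


warehouse = AppendOnlyWarehouse()
warehouse.load(integrate({"cust_id": "C1", "amount": "120.5", "date": "2023-02-01"}))
warehouse.load(integrate({"CustomerNo": "C1", "amount": 80, "date": "2023-01-01"}))
amounts = [r.amount for r in warehouse.history("C1")]
```

Both source records end up in one consistent format, and the history for customer `C1` contains two time points rather than a single overwritten value.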

Characteristics of data warehousing: additional
- Relational/multidimensional
- Client/server
- Real-time
- Includes metadata

How is a data warehouse different from a database?
- Technically, a data warehouse is a database with certain characteristics that facilitate its role in decision support.
- However, it is an “integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization.” These characteristics are not necessarily true of databases in general.
- As a practical matter, most databases are highly normalized, in part to avoid update anomalies.
- Data warehouses are often denormalized for performance reasons. This is acceptable because their content is never updated, just added to (historical data are static).
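The contrast between a normalized operational layout and a denormalized warehouse layout can be sketched with Python's built-in `sqlite3` module. The table and column names (`customer`, `orders`, `sales_wide`) are assumptions made for this example and do not come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (operational style): each fact is stored once, linked by keys.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Acme', 'North'), (2, 'Bolt', 'South');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);

    -- Denormalized (warehouse style): customer attributes are copied onto
    -- every order row, so analytical queries need no join.
    CREATE TABLE sales_wide AS
        SELECT o.id AS order_id, c.name AS customer, c.region, o.amount
        FROM orders o JOIN customer c ON c.id = o.customer_id;
""")

# The analytical query reads a single wide table.
by_region = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales_wide GROUP BY region"
).fetchall())
```

The region value is repeated on every row of `sales_wide`. That redundancy would cause update anomalies in an operational system, but it is harmless when rows are only ever added.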

Data Warehousing - Concept
Data mart
- Smaller than a data warehouse and focused on a particular subject or department.
- A subset of a data warehouse, or a departmental data warehouse.
- Designed around one problem, organizational function, topic, or other focus area.
A data mart can be either:
- Dependent data mart: a subset created directly from a data warehouse. It ensures that the end user views the same version of the data that all other data warehouse users access.
- Independent data mart: a small data warehouse designed for a strategic business unit or a department.
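One way to picture a dependent data mart is as a view defined directly on a warehouse table, so that mart users always see the same version of the data. This is only a sketch: the names `edw_sales` and `marketing_mart` are hypothetical, and real dependent marts are often physical copies refreshed from the warehouse rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edw_sales (department TEXT, product TEXT, amount REAL);
    INSERT INTO edw_sales VALUES
        ('marketing', 'ads', 40.0),
        ('finance', 'audit', 25.0);

    -- Dependent mart: a departmental subset defined on the warehouse table.
    CREATE VIEW marketing_mart AS
        SELECT product, amount FROM edw_sales WHERE department = 'marketing';
""")

# A later warehouse load is visible through the mart without a copy step.
conn.execute("INSERT INTO edw_sales VALUES ('marketing', 'events', 60.0)")
mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
```

An independent data mart would instead be loaded from the source systems on its own, which is why its contents can drift from what other warehouse users see.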

Data Warehousing - Concept
Operational data store (ODS)
- A type of database often used as an interim area for a data warehouse, especially for customer information files.
- Used for short-term decisions rather than medium- and long-term ones.
- Similar to short-term memory: it stores only recent information.
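The "short-term memory" behaviour of an ODS can be sketched as a retention filter. The 30-day window, the function name `refresh_ods`, and the row layout are assumptions for illustration; the lecture does not specify how long an ODS keeps its data.

```python
from datetime import date, timedelta

# Assumed retention window; the slides do not give a figure.
RETENTION = timedelta(days=30)


def refresh_ods(ods_rows: list[dict], today: date) -> list[dict]:
    """Keep only rows updated within the retention window."""
    return [r for r in ods_rows if today - r["updated"] <= RETENTION]


ods = [
    {"customer": "C1", "updated": date(2024, 3, 1)},
    {"customer": "C2", "updated": date(2024, 1, 5)},
]
recent = refresh_ods(ods, today=date(2024, 3, 10))
```

Only the recently updated customer remains in `recent`. A data warehouse, by contrast, would keep both rows as history.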

Data Warehousing - Concept
Oper mart
- An operational data mart: a small-scale data mart typically used by a single department or functional area in an organization.

Data Warehousing - Concept
Enterprise data warehouse (EDW)
- A large-scale data warehouse used across the enterprise for decision support.
- Used to provide data for many types of DSS, including CRM, supply chain management, BPM, and KMS.
Metadata
- Data about data. In a data warehouse, metadata describe the contents of the data warehouse and the manner of its use.

Data Warehousing Process Overview
- Organizations continuously collect data, information, and knowledge at an increasingly accelerated rate and store them in computerized systems.
- The number of users needing to access the information continues to increase as a result of improved reliability and availability of network access, especially the Internet.
- Creating a data warehouse involves the following:
  - Data are imported from various internal and external sources.
  - The data are cleansed and organized to suit the organization’s needs.
  - Data marts can be loaded for a specific department or area (alternatively, data marts are created first and later integrated into the EDW).

Data Warehousing Process Overview
The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources
2. Data are cleansed and organized consistently with the organization’s needs
3a. Data are loaded into the enterprise data warehouse
4a. If desired, data marts are created as subsets of the EDW
or
3b. Data are loaded into data marts
4b. The data marts are consolidated into the EDW
5. Analyses are performed as needed
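The steps above, following the 3a/4a path, can be sketched in a few lines of Python. The sample rows, the field names `dept` and `amount`, and the function names are all hypothetical; plain lists stand in for the EDW and the data mart.

```python
def extract() -> list[dict]:
    # Step 1: rows from a hypothetical internal system and an external feed.
    internal = [{"dept": "Sales ", "amount": "100"}, {"dept": "hr", "amount": "20"}]
    external = [{"dept": "SALES", "amount": "50"}, {"dept": "hr", "amount": None}]
    return internal + external


def cleanse(rows: list[dict]) -> list[dict]:
    # Step 2: drop incomplete rows and standardize names and types.
    return [
        {"dept": r["dept"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
        if r["amount"] is not None
    ]


def build_mart(edw_rows: list[dict], dept: str) -> list[dict]:
    # Step 4a: a data mart created as a subset of the EDW.
    return [r for r in edw_rows if r["dept"] == dept]


edw = cleanse(extract())                             # Step 3a: load the EDW
sales_mart = build_mart(edw, "sales")                # Step 4a: departmental mart
total_sales = sum(r["amount"] for r in sales_mart)   # Step 5: analysis
```

The 3b/4b path would reverse the middle of this flow: each department's rows would be loaded into its own mart first, and the marts would then be merged into the EDW.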

Data Warehousing - Process Overview
The major components of a data warehousing process:
- Data sources. Data are sourced from operational systems and possibly from external data sources.
- Data extraction. Data are extracted using custom-written or commercial software called ETL.
- Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
- Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
- Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search.
- Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.

Data Warehousing Architectures
- There are several basic architectures for data warehousing.
- To distinguish the architectures, a data warehouse is divided into three parts:
  - The data warehouse itself
  - Data acquisition (back-end) software, which extracts data from legacy systems and external sources, consolidates them, and loads them into the data warehouse
  - Client (front-end) software, which allows users to access and analyze data from the warehouse

Three-tier DW architecture
- 1st tier: operational systems contain the data and the software for data acquisition
- 2nd tier: the data warehouse
- 3rd tier: DSS/BI/BA engines
- Data from the data warehouse are processed and deposited in a multidimensional database and organized for easy analysis and presentation.
- Advantage: separating the functions of the data warehouse eliminates resource constraints and makes it easy to create data marts.

Two-tier DW architecture
- 1st tier: operational systems contain the data and the software for data acquisition (i.e., the server)
- 2nd tier: DSS/BI/BA engines and the data warehouse
- DSS engines run on the same hardware platform as the data warehouse, so this architecture is more economical.
- Advantage: economical
- Disadvantage: performance problems for large data warehouses with data-intensive decision support applications

Web-based DW architecture
- Data warehousing and the Internet are two key technologies that offer important solutions for managing corporate data.
- The integration of these two technologies produced Web-based data warehousing.
- On the client side, the user needs an Internet connection and a Web browser with a GUI.
- The Internet/intranet/extranet is the communication medium between clients and servers.
- On the server side, a Web server manages the flow of information between client and server.
- Advantages: ease of access, platform independence, lower cost
- Disadvantages: server capacity must be planned carefully; page loading speed

Data Warehousing Architectures
Issues to consider when deciding which architecture to use:
- Which database management system (DBMS) should be used? Most data warehouses are built using an RDBMS (Oracle, SQL Server, and DB2 are commonly used). Each supports client/server and Web-based architectures.
- Will parallel processing and/or partitioning be used? Parallel processing enables multiple CPUs to process data warehouse requests simultaneously and provides scalability. Partitioning splits large tables into smaller tables for access efficiency.
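Partitioning and parallel processing can be sketched together in plain Python. Partitioning by `year`, the sample rows, and the helper `partition_total` are assumptions for this example, and threads from the standard `concurrent.futures` module stand in for the multiple CPUs a warehouse DBMS would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

rows = [
    {"year": 2021, "amount": 10.0},
    {"year": 2021, "amount": 15.0},
    {"year": 2022, "amount": 30.0},
    {"year": 2023, "amount": 5.0},
]

# Partitioning: split one large table into smaller ones by a key (year here).
partitions: dict[int, list[dict]] = defaultdict(list)
for row in rows:
    partitions[row["year"]].append(row)


def partition_total(part: list[dict]) -> float:
    # Aggregate a single partition independently of the others.
    return sum(r["amount"] for r in part)


# Parallel processing: each partition is aggregated by a separate worker.
with ThreadPoolExecutor() as pool:
    totals = dict(zip(partitions, pool.map(partition_total, partitions.values())))

# A query for one year only needs that year's partition.
total_2021 = totals[2021]
```

Because each partition can be scanned on its own, a request limited to one year avoids reading the rest of the table, and requests over all years can be spread across workers.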

Data Warehousing Architectures
Issues to consider when deciding which architecture to use:
- Will data migration tools be used to load the data warehouse?
- What tools will be used to support data retrieval and analysis?

Data Warehousing Architectures
Ten factors that potentially affect the architecture selection decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors

Now ask if...
You are able to:
- Understand the basic definitions and concepts of data warehouses
- Understand how a data warehouse differs from a database
- Describe the characteristics of a data warehouse
- Describe the data warehouse process overview
- Describe the different types of data warehouse architectures