CISB594 – Business Intelligence Data Warehousing Part I.



Reference
Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.

Objectives
At the end of this lecture, you should be able to:
– Understand the basic definitions and concepts of data warehouses
– Understand how a data warehouse differs from a database
– Describe the characteristics of a data warehouse
– Describe the data warehousing process
– Describe the different types of data warehouse architectures

Data Warehouse
– “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time” (Inmon)
– A copy of transaction data specifically structured for query and analysis (Kimball)
– A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. (Wikipedia)

Data Warehouse
– A decision support database that is maintained separately from the organization’s operational database
– Supports information processing by providing a solid platform of consolidated, historical data for analysis
– In your own words?

4 main characteristics of data warehousing: 1. Subject oriented
– Organized around major subjects, such as customer and sales, containing only information relevant for decision support, unlike operational databases, which are product oriented
– Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
– Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

4 main characteristics of data warehousing: 1. Subject oriented
For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
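
The "best customer for this item last year" question can be sketched as a single aggregate query. This is only an illustration using Python's built-in sqlite3 module; the `sales` table, its columns, and the sample rows are assumptions made up for the example, not part of the lecture.

```python
import sqlite3

# Hypothetical sales-subject table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, item TEXT, sale_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Acme", "widget", 2023, 500.0),
        ("Acme", "widget", 2023, 250.0),
        ("Birch", "widget", 2023, 600.0),
        ("Birch", "gadget", 2023, 900.0),
        ("Acme", "widget", 2022, 5000.0),
    ],
)


def best_customer(conn, item, year):
    """Return (customer, total) with the highest total sales of an item in a year."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM sales
        WHERE item = ? AND sale_year = ?
        GROUP BY customer
        ORDER BY total DESC
        LIMIT 1
        """,
        (item, year),
    ).fetchone()


result = best_customer(conn, "widget", 2023)
print(result)
```

Because the table holds only sales facts, the question maps directly onto one GROUP BY over one subject area.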

4 main characteristics of data warehousing: 2. Integrated
– Constructed by integrating multiple, various data sources
– Must place data from different sources into a consistent format; to do so, it must deal with naming conflicts and discrepancies
– Data cleaning and data integration techniques are applied
– Ensures consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
– When data is moved to the warehouse, it is converted
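
As a rough sketch of what "converted into a consistent format" can mean, the snippet below maps two source systems onto one naming convention, one encoding, and one unit of measure. The source layouts (`crm_rows`, `billing_rows`), the gender codes, and the inch-to-centimetre conversion are hypothetical examples, not taken from the slides.

```python
# Two hypothetical source systems that describe customers differently.
crm_rows = [{"cust_name": "Ann Lee", "sex": "F", "height_cm": 170.0}]
billing_rows = [{"CustomerName": "BOB RAY", "gender": 1, "height_in": 70.0}]

# Assumed encodings: the CRM uses letters, billing uses 0/1.
GENDER_CODES = {"F": "female", "M": "male", 0: "female", 1: "male"}


def from_crm(row):
    """Convert a CRM row to the warehouse naming, encoding, and units."""
    return {
        "customer_name": row["cust_name"].title(),
        "gender": GENDER_CODES[row["sex"]],
        "height_cm": round(row["height_cm"], 1),
    }


def from_billing(row):
    """Convert a billing row; inches become centimetres (1 in = 2.54 cm)."""
    return {
        "customer_name": row["CustomerName"].title(),
        "gender": GENDER_CODES[row["gender"]],
        "height_cm": round(row["height_in"] * 2.54, 1),
    }


warehouse_rows = [from_crm(r) for r in crm_rows] + [
    from_billing(r) for r in billing_rows
]
for row in warehouse_rows:
    print(row)
```

Each source keeps its own converter, so naming conflicts and encoding discrepancies are resolved once, before the data reach the warehouse.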

4 main characteristics of data warehousing: 3. Time variant (time series)
– Maintains historical data; data for analysis from multiple sources contain multiple time points
– A data warehouse focuses on change over time
– The time horizon for the data warehouse is significantly longer than that of operational systems
  – Operational database: current-value data
  – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)

4 main characteristics of data warehousing: 4. Non-volatile
– After data are entered into a data warehouse, users cannot change or update the data
– Data are never overwritten or deleted
– Operational update of data does not occur in the data warehouse environment
– Does not require transaction processing, recovery, and concurrency control mechanisms
– Requires only two operations in data accessing: initial loading of data and access of data
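
One way to picture "load and read only" is a table that accepts inserts and queries but rejects updates and deletes. The sketch below enforces this with SQLite triggers through Python's sqlite3 module; the `sales_history` table and the trigger names are invented for the illustration, and real warehouse products may enforce non-volatility by other means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_history (sale_date TEXT, customer TEXT, amount REAL);

    -- Reject any attempt to change or remove rows that were already loaded.
    CREATE TRIGGER block_update BEFORE UPDATE ON sales_history
    BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;

    CREATE TRIGGER block_delete BEFORE DELETE ON sales_history
    BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
    """
)

# Operation 1: loading data.
conn.execute("INSERT INTO sales_history VALUES ('2024-01-05', 'Acme', 120.0)")


def try_statement(sql):
    """Return True if the statement ran, False if the database rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


update_allowed = try_statement("UPDATE sales_history SET amount = 0")
delete_allowed = try_statement("DELETE FROM sales_history")

# Operation 2: accessing data.
rows = conn.execute("SELECT * FROM sales_history").fetchall()
print(update_allowed, delete_allowed, rows)
```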

Data Warehouse
– Runs on a DBMS such as Oracle, SQL Server, DB2 …
– Keeps a large amount of data from different points in time for a long period (time variant)
– Data in a data warehouse cannot be overwritten (non-volatile)
– Data come from various sources, internal and external (integrated)
– Carefully designed to allow for analysis and pattern discovery on identified subject matter (subject-oriented)

Data Warehouse vs. Operational DBMS
– OLTP (on-line transaction processing)
  – Major task of traditional relational DBMS
  – Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
– OLAP (on-line analytical processing)
  – Major task of data warehouse system
  – Data analysis and decision making
– Distinct features (OLTP vs. OLAP):
  – User and system orientation: customer vs. market
  – Data contents: current, detailed vs. historical, consolidated
  – Database design: ER + application vs. star + subject
  – View: current, local vs. evolutionary, integrated
  – Access patterns: update vs. read-only but complex queries

OLTP
– OLTP (on-line transaction processing)
  – Major task of traditional relational DBMS
  – Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

OLAP
– Online Analytical Processing (OLAP) is an industry-accepted reporting technology that provides high-performance analysis and easy reporting on large volumes of data
– The goal of OLAP:
  – Multidimensional data analysis
  – Provide fast and flexible data summarization, analysis, and reporting capabilities
  – Ability to view trends over time
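
Multidimensional summarization can be illustrated by totalling the same facts along different combinations of dimensions. The sketch below is plain Python rather than an OLAP engine; the fact rows, the dimension names, and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales amount).
facts = [
    (2022, "North", "widget", 100),
    (2022, "South", "widget", 150),
    (2023, "North", "widget", 120),
    (2023, "North", "gadget", 80),
    (2023, "South", "gadget", 200),
]
DIMENSIONS = ("year", "region", "product")


def summarize(facts, by):
    """Total the sales amount for every combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


# Coarse view: the trend over time.
by_year = summarize(facts, ["year"])
# Finer view: the same facts broken down by one more dimension.
by_year_region = summarize(facts, ["year", "region"])
print(by_year)
print(by_year_region)
```

Moving from `by_year_region` to `by_year` corresponds to summarizing at a coarser level; going the other way adds detail.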

OLTP vs OLAP
                    OLTP                                                      OLAP
Users               Clerk, IT professional                                    Knowledge worker
Function            Day-to-day operations                                     Decision support
DB design           Application-oriented                                      Subject-oriented
Data                Current, up-to-date, detailed, flat relational, isolated  Historical, summarized, multidimensional, integrated, consolidated
Usage               Repetitive                                                Ad hoc
Access              Read/write; index/hash on primary key                     Lots of scans
Unit of work        Short, simple transaction                                 Complex query
# Records accessed  Tens                                                      Millions
# Users             Thousands                                                 Hundreds
DB size             100MB-GB                                                  100GB-TB

What the database looks like for the two types
The operational database (relational):

What the database looks like for the two types
The data warehouse (star schema):
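
Since the diagram is not part of this transcript, here is a minimal sketch of what a star schema can look like: one central fact table whose rows reference surrounding dimension tables. All table names, column names, and rows are assumptions made for illustration (Python's sqlite3 module is used only to keep the example runnable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive context for each fact.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);

    -- Fact table: numeric measures plus one key per dimension.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        quantity INTEGER,
        amount REAL
    );

    INSERT INTO dim_date VALUES (1, 2023, 11), (2, 2023, 12);
    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
    INSERT INTO dim_customer VALUES (1, 'Acme', 'Kajang'), (2, 'Birch', 'Ipoh');
    INSERT INTO fact_sales VALUES (1, 1, 1, 2, 40.0), (2, 1, 2, 1, 20.0), (2, 2, 1, 3, 45.0);
    """
)

# A typical star query: join the fact table to one dimension and aggregate.
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(sales_by_category)
```

Every analysis follows the same shape: start at the fact table, join outward to whichever dimensions the question mentions, then aggregate.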

Why …
Why can we not run BI on the operational database to obtain the answers to our business questions?
Answer: BI requires complex query formulation and preparation of data to address the query. If the operational database is used, the process will be very slow because of complex joins and multiple scans.
– A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."
– A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
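
The two example requests can be written side by side to show how differently they touch the data. This is a toy sketch with an assumed `orders` table and a hard-coded "last month" of 2024-03; with only four rows it cannot demonstrate the speed difference, only the shape of each query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, order_month, amount) VALUES (?, ?, ?)",
    [
        ("Acme", "2024-02", 10.0),
        ("Birch", "2024-03", 20.0),
        ("Acme", "2024-03", 30.0),
        ("Cedar", "2024-03", 40.0),
    ],
)

# OLTP-style request: "Retrieve the current order for this customer."
# It targets one customer and returns a single row.
current_order = conn.execute(
    "SELECT order_id, amount FROM orders "
    "WHERE customer = ? ORDER BY order_id DESC LIMIT 1",
    ("Acme",),
).fetchone()

# Warehouse-style request: "Find the total sales for all customers last month."
# It has to read every row for the month before it can answer.
total_last_month = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE order_month = ?",
    ("2024-03",),
).fetchone()[0]

print(current_order, total_last_month)
```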

Ask yourself
– Explain what a data warehouse is. How does it differ from an operational database? Provide an example to support your answer.
– Explain the 4 main characteristics of a data warehouse.
– Compare and contrast OLAP and OLTP.

Data Warehousing - Concept
Data mart
– Smaller, and focuses on a particular subject or department
– A subset of a data warehouse, or a departmental data warehouse
– A data mart is a smaller DW designed around one problem, organizational function, topic, or other focus area
A data mart can be either of the following:
– Dependent data mart
  – A subset that is created directly from a data warehouse
  – Ensures that the end user is viewing the same version of the data that are accessed by all other data warehouse users
– Independent data mart
  – A small data warehouse designed for a strategic business unit or a department
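
A dependent data mart can be pictured as a filtered view over the warehouse, which is why its users see the same version of the data as everyone else. The sketch below uses a SQLite view for this; the `edw_sales` table and `mart_marketing` view are hypothetical names, and real data marts are often materialized as separate tables rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical enterprise warehouse table.
    CREATE TABLE edw_sales (department TEXT, region TEXT, amount REAL);
    INSERT INTO edw_sales VALUES
        ('marketing', 'North', 100.0),
        ('finance', 'North', 300.0),
        ('marketing', 'South', 50.0);

    -- Dependent data mart: a subset created directly from the warehouse.
    CREATE VIEW mart_marketing AS
        SELECT region, amount FROM edw_sales WHERE department = 'marketing';
    """
)

total_before = conn.execute("SELECT SUM(amount) FROM mart_marketing").fetchone()[0]

# A new load into the warehouse is immediately visible through the mart.
conn.execute("INSERT INTO edw_sales VALUES ('marketing', 'East', 25.0)")
total_after = conn.execute("SELECT SUM(amount) FROM mart_marketing").fetchone()[0]

print(total_before, total_after)
```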

Data Warehousing - Concept
Enterprise data warehouse (EDW)
– A large-scale data warehouse used across the enterprise for decision support
– Used to provide data for many types of DSS, including CRM, supply chain management, BPM, KMS, etc.
Metadata
– Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use.
– Metadata in layman's terms: metadata describes other data. It provides information about a certain item's content. For example, an image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data.

Data Warehousing Process Overview
The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources
2. Data are cleansed and organized consistently with the organization’s needs
3a. Data are loaded into the enterprise data warehouse
4a. If desired, data marts are created as subsets of the EDW
or
3b. Data are loaded into data marts
4b. The data marts are consolidated into the EDW
5. Analyses are performed as needed

Data Warehousing - Process Overview
The major components of a data warehousing process:
– Data sources. Data are sourced from operational systems and possibly from external data sources.
– Data extraction. Data are extracted using custom-written or commercial software called ETL.
– Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
– Data warehouse/Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
– Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search.
– Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.
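
The flow from sources through staging into the warehouse can be sketched as three small functions. Everything here is a simplified assumption: the two in-memory "sources", the cleansing rules (trim names, drop rows whose amount is not numeric), and the `edw_sales` table are invented, and a commercial ETL tool would do far more.

```python
import sqlite3


def extract():
    """Pull raw rows from two hypothetical sources (operational and external)."""
    operational = [{"cust": " acme ", "amt": "100.50"}, {"cust": "Birch", "amt": "n/a"}]
    external = [{"cust": "CEDAR", "amt": "75"}]
    return operational + external


def transform(raw_rows):
    """Staging step: cleanse names and drop rows whose amount is not a number."""
    staged = []
    for row in raw_rows:
        try:
            amount = float(row["amt"])
        except ValueError:
            continue  # unusable amount, so the row is not loaded
        staged.append((row["cust"].strip().title(), amount))
    return staged


def load(conn, rows):
    """Load the cleansed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS edw_sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO edw_sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT customer, amount FROM edw_sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the transformation separate from the load mirrors the staging area described above: data reach the warehouse only after they have been cleansed.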

Data Warehousing - Process Overview

Data Warehousing Architectures
There are several basic architectures for data warehousing. To distinguish the architectures, a data warehouse is divided into three parts:
– The data warehouse itself
– Data acquisition (back-end) software, which extracts data from legacy systems and external sources, consolidates them, and loads them into the data warehouse
– Client (front-end) software, which allows users to access and analyze data from the warehouse

Data Warehousing Architectures

Data Warehousing Architectures
Factors that potentially affect the architecture selection decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Constraints on resources, funding
5. Strategic view of the data warehouse prior to implementation
6. Compatibility with existing systems
7. Perceived ability of the in-house IT staff
8. Technical issues, technology
9. Social/political factors/nature of users

Now ask if you are able to:
– Understand the basic definitions and concepts of data warehouses
– Understand how a data warehouse differs from a database
– Describe the characteristics of a data warehouse
– Describe the data warehousing process