Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor

Presentation transcript:

Week 6 Lecture: The Data Warehouse
Samuel Conn, Asst. Professor
Professor's Notes: In this lecture you will learn about data warehouses. More and more businesses are creating decision support systems, in which the data warehouse is an essential component. The data warehouse has a different architecture than a transactional relational database, so we will explore the differences between the two and how the data warehouse is used to advance information into business knowledge.

In this lecture, you will learn:
How operational data and decision support data differ
What a data warehouse is and how its data are prepared
What star schemas are and how they are constructed
What steps are required to implement a data warehouse successfully
What data mining is and what role it plays in decision support
Professor's Notes: The key points of this lecture are how operational data from a transactional system differ from decision support data. We also want to look at the basic construct of the data warehouse, the steps for implementing a data warehouse, and how data mining is used in decision support.

The Need for Data Analysis
External and internal forces require tactical and strategic decisions
Search for competitive advantage
Business environments are dynamic
Decision-making cycle time is reduced
Different managers require different decision support systems (DSS)
Professor's Notes: Businesses need to make both tactical (short-term) and strategic (long-term) decisions. They are always searching for a competitive advantage in the marketplace. Since the business environment is dynamic and always changing, the amount of time that a business has to make decisions is shortened. Decision support systems can help by providing data “analytics” that are positioned against various time intervals. The different managers within the organization all require different types of business knowledge, or intelligence.

Decision Support Systems
Decision support:
Is a methodology
Extracts information from data
Uses information as basis for decision making
Professor's Notes: Decision support systems (DSS) are based on a methodology. The methodology is a combination of processes and technologies that extract information from data, and knowledge from information. The idea is to use the decision support data as the basis for making decisions.

Decision Support Systems
Decision support system (DSS):
Arrangement of computerized tools
Used to assist managerial decisions
Extensive data “massaging” to produce information
Used at all levels in the organization
Tailored to focus on specific areas and needs
Interactive
Provides ad hoc query tools
Professor's Notes: A DSS is a complex environment. It involves many computers and analytic software tools. Data is said to be “massaged” in order to advance the IT value proposition from data to information, and then from information to knowledge.

DSS Components (Figure 13.1)
Professor's Notes: Here are the components of the DSS. You can see that it starts with operational data. This is the transactional data found in your organization’s relational database. The data is extracted and reformatted into the business data, or the data store. From the data store, business model data and rules can be applied, and the end user can generate graphic representations of the analyzed data. We use “data visualization” tools to accomplish this.

Operational vs. Decision Support Data
Operational data: relational, normalized database; optimized to support transactions; real-time updates
DSS data: snapshot of operational data; summarized; large amounts of data
Data analyst viewpoint: timespan; granularity; dimensionality
Professor's Notes: There is a big difference between what is considered “operational” data and “DSS” data. Operational data is normalized and stored in a relational database. It is optimized for transactions from the customer or user. DSS data is a “snapshot” of the operational data. It is summarized and generally involves large amounts of data. DSS data have characteristics that operational data do not have. Principally, the data is viewed from the standpoint of timespan, granularity, and dimensionality.
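The snapshot-and-summarize idea can be sketched in a few lines of Python. This is only an illustration: the sample transactions, the tuple layout, and the `summarize` helper are assumptions made for the example, not part of the lecture material. It shows the same operational rows rolled up at two levels of granularity.

```python
from collections import defaultdict

# Hypothetical operational rows: (date, region, product, amount), one per transaction.
transactions = [
    ("2024-01-03", "East", "Widget", 120.0),
    ("2024-01-17", "East", "Widget", 80.0),
    ("2024-01-20", "West", "Gadget", 200.0),
    ("2024-02-05", "East", "Gadget", 50.0),
    ("2024-02-11", "West", "Gadget", 150.0),
]

def summarize(rows, granularity="month"):
    """Aggregate transaction amounts by (period, region).

    The period is taken from the ISO date prefix: 4 characters for a year,
    7 for a month, 10 for a day.
    """
    cut = {"year": 4, "month": 7, "day": 10}[granularity]
    totals = defaultdict(float)
    for date, region, _product, amount in rows:
        totals[(date[:cut], region)] += amount
    return dict(totals)

monthly = summarize(transactions, "month")  # finer granularity
yearly = summarize(transactions, "year")    # coarser granularity
```

Here timespan is the range of dates covered, granularity is the period length chosen for the roll-up, and region acts as one dimension of the summary.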

The DSS Database Requirements
Database schema: support complex (non-normalized) data; extract multidimensional time slices
Data extraction and filtering
End-user analytical interface
Database size: very large databases (VLDBs); contains redundant and duplicated data
Professor's Notes: The database for the DSS has a different architecture, or schema, than the relational architecture found in the database that hosts the operational data. It has to support complex, non-normalized data, and it has to be able to extract multidimensional slices of data based on time. So data extraction and filtering become important, along with an end-user interface that is able to run “analytics” on the data, or “data mine” it. Also, the database is generally very large because you are dealing with the accumulation of the operational data.

Data Warehouse
Integrated: centralized; holds data retrieved from the entire organization
Subject-oriented: optimized to give answers to diverse questions; used by all functional areas
Time variant: flow of data through time; projected data
Non-volatile: data never removed; always growing
Professor's Notes: Here are some characteristics of the data warehouse. One is that it is integrated: it is centralized and holds the data for the entire organization. Next, it is subject-oriented: the data are organized by subject and optimized to answer diverse questions from all functional areas. It is also time variant, showing the flow of the data through time. Finally, the data is considered to be non-volatile: it is never deleted or removed, and it is always growing from the continual feed of operational data.

Creating a Data Warehouse (Figure 13.3)
Professor's Notes: This illustration shows the process of constructing a data warehouse. From the operational data sources, you use extraction, transformation, and loading (ETL) tools to extract, clean, and load the data into the data warehouse schema. The schema used in most data warehouses is not the normalized schema of the operational database, but generally what is referred to as a “star” schema.
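A very small sketch of the extract, clean, and load steps, using Python's built-in sqlite3 module as a stand-in for the warehouse. The source records, the `sales_fact` table, and the `transform` and `run_etl` helpers are hypothetical names chosen for illustration; real ETL tools do far more than this.

```python
import sqlite3

# Hypothetical raw extracts from operational sources with inconsistent formats.
source_rows = [
    {"cust": " alice ", "amount": "120.50", "date": "2024-01-03"},
    {"cust": "BOB", "amount": "80", "date": "2024-01-17"},
    {"cust": "bob", "amount": "n/a", "date": "2024-01-18"},  # dirty record
]

def transform(row):
    """Clean one record; return None if the amount cannot be parsed."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    # Normalize customer names so both sources agree on one spelling.
    return (row["cust"].strip().title(), amount, row["date"])

def run_etl(rows, conn):
    """Transform the extracted rows and load the clean ones into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    clean = [t for t in map(transform, rows) if t is not None]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", clean)
    conn.commit()
    return len(clean)

conn = sqlite3.connect(":memory:")  # in-memory database used only for the demo
loaded = run_etl(source_rows, conn)
```

In this sketch the unparseable record is simply dropped; whether to drop, repair, or quarantine such rows is a design decision that the lecture does not prescribe.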

Data Marts
Single-subject data warehouse subset
Decision support to a small group
Can serve as a test for exploring the potential benefits of data warehouses
Address local or departmental problems
Professor's Notes: Data marts are single-subject “subsets” of the data warehouse that serve some single entity, usually a business entity like a department. So the marketing department, the sales department, the finance department, the human resources department, and so on, would each have their own data mart.

DSS Architectural Styles
Traditional mainframe-based OLTP
Managerial information system (MIS) with 3GL
First-generation departmental DSS
First-generation enterprise data warehouse using RDBMS
Second-generation data warehouse using MDBMS
Professor's Notes: There are various styles and configurations of architecture for the DSS. They run from the legacy mainframe systems, to the third-generation language (3GL) management information systems, through the first-generation departmental DSS and the first-generation enterprise data warehouse using a relational design, to the second-generation data warehouse using multidimensional database management systems. Data warehouse concepts are continuing to grow.

Online Analytical Processing (OLAP)
Advanced data analysis environment
Supports decision making, business modeling, and operations research activities
Characteristics of OLAP: use multidimensional data analysis techniques; provide advanced database support; provide easy-to-use end-user interfaces; support client/server architecture
Professor's Notes: The online analytical processing (OLAP) environment is different from the online transaction processing (OLTP) environment. The OLAP environment supports decision making, business modeling, and operations research, and it has the ability to use multidimensional data analysis techniques.

OLAP Client/Server Architecture (Figure 13.6)
Professor's Notes: Here is the basic OLAP environment implemented on a client/server platform architecture. The OLAP system can take feeds from both operational data and data warehouse data. This allows the OLAP system to present to the user, in a GUI format, both analytical processing and data processing logic from the operational data and the data warehouse data.

OLAP Server Arrangement (Figure 13.7)
Professor's Notes: This illustration depicts the OLAP server configuration. The thing to note here is that the OLAP server, or “engine,” provides the front end to the data warehouse.

OLAP Server with Multidimensional Data Store Arrangement (Figure 13.8)
Professor's Notes: This illustration shows an advanced architecture for OLAP systems where the OLAP “engine” can construct reporting from multidimensional data in the data warehouse. On the right, you will see the multiple GUI interfaces that can be built to access the OLAP engine.

OLAP Server with Local Mini-Data-Marts (Figure 13.9)
Professor's Notes: A final complexity added to the architecture is the addition of local data marts built to interface with the OLAP GUIs. The local data marts cache the data “cube” that is built when the data is represented in multidimensional form. So the data cubes may be built to support the business intelligence needs of various departments within the organization.

Relational OLAP (ROLAP)
OLAP functionality
Uses relational DB query tools
Extensions to RDBMS: multidimensional data schema support; data access language and query performance optimized for multidimensional data; support for very large databases (VLDBs)
Professor's Notes: When OLAP functionality uses relational query tools (SQL-based tools) that have “extensions” built in to support the data warehouse’s multidimensional data schema, we call it relational online analytical processing, or ROLAP for short.

Typical ROLAP Client/Server Architecture (Figure 13.10)
Professor's Notes: A typical ROLAP environment implemented on client/server architecture would look like this.

Multidimensional OLAP (MOLAP)
Extends OLAP functionality to multidimensional database management systems (MDBMS)
Stores data in a multidimensional data cube
N-dimensional cubes are called hypercubes
Cube cache memory speeds processing
Affected by how the database system handles sparsity, a measure of the density of the data held in the cube
Professor's Notes: Then we have multidimensional online analytical processing, or MOLAP. This is when OLAP functionality extends to the multidimensional data cube. In this architecture, the caching of the data cube is done on the MOLAP server, on the MOLAP client, or both. MOLAP databases are typically known to be faster than ROLAP databases.
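To make the density idea concrete, here is a minimal sketch that stores a three-dimensional cube as a Python dictionary holding only the populated cells. The dimension members, the cell values, and the `density` helper are assumptions for illustration. The function reports the fraction of possible cells that actually hold data; conventions differ on whether the published "sparsity" figure is this ratio or its complement.

```python
from itertools import product

# Hypothetical dimension members.
products = ["Widget", "Gadget", "Gizmo"]
regions = ["East", "West"]
months = ["2024-01", "2024-02"]

# Only cells with recorded sales are stored; empty cells are simply absent.
cube = {
    ("Widget", "East", "2024-01"): 200.0,
    ("Gadget", "West", "2024-01"): 200.0,
    ("Gadget", "East", "2024-02"): 50.0,
}

def density(cells, *dimensions):
    """Return populated cells divided by all possible cells in the cube."""
    total = 1
    for members in dimensions:
        total *= len(members)
    populated = sum(1 for key in product(*dimensions) if key in cells)
    return populated / total

d = density(cube, products, regions, months)
empty_fraction = 1 - d  # share of the cube that holds no data
```

A cube with low density wastes space if every cell is physically allocated, which is why the way a multidimensional database handles sparse cubes affects both storage and speed.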

MOLAP Client/Server Architecture (Figure 13.11)
Professor's Notes: Compare and contrast this client/server platform architecture implementation of a MOLAP system with that of the ROLAP system.

Star Schema
Data-modeling technique
Maps multidimensional decision support data into a relational database
Yields a model for multidimensional data analysis while preserving the relational structure of the operational DB
Four components: facts; dimensions; attributes; attribute hierarchies
Professor's Notes: We said that the principal schema design used to host the data warehouse data is the “star” schema. The star schema is a different “topology” than the relational (normalized) schema. A star schema gives up what is effectively 100% indexing. It has four components: fact tables, dimension tables, attributes, and attribute hierarchies.

Simple Star Schema (Figure 13.12)
Professor's Notes: Here is a simple example of a star schema. You see the fact table in the middle, with dimension tables (location, product, and time) linked to it.
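A simple star of this shape can be sketched with Python's built-in sqlite3 module. The table and column names (`location`, `product`, `time_dim`, `sales`) and the sample rows are assumptions chosen to mirror the description, not the contents of Figure 13.12. The query at the end joins the fact table to two of its dimensions and totals sales by region and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per location, product, or time period.
CREATE TABLE location (loc_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table in the middle: one foreign key per dimension plus the measure.
CREATE TABLE sales (
    loc_id  INTEGER REFERENCES location(loc_id),
    prod_id INTEGER REFERENCES product(prod_id),
    time_id INTEGER REFERENCES time_dim(time_id),
    amount  REAL
);

INSERT INTO location VALUES (1, 'East'), (2, 'West');
INSERT INTO product  VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO time_dim VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO sales VALUES
    (1, 1, 1, 100.0), (1, 2, 1, 40.0), (2, 1, 2, 75.0), (1, 1, 2, 60.0);
""")

# Total sales by region and month, reached through the dimension tables.
rows = conn.execute("""
    SELECT l.region, t.month, SUM(s.amount)
    FROM sales s
    JOIN location l ON l.loc_id = s.loc_id
    JOIN time_dim t ON t.time_id = s.time_id
    GROUP BY l.region, t.month
    ORDER BY l.region, t.month
""").fetchall()
```

Grouping by a different pair of dimension attributes, such as product category and year, gives another view of the same facts without changing the schema.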

Slice and Dice View of Sales (Figure 13.14)
Professor's Notes: The nice thing about the construct is that you can see different “views” of the data. You can “slice and dice” the data into a variety of views.
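As a rough illustration of slicing and dicing, the sketch below keeps sales in a dictionary keyed by (product, region, month). The cell values and the `slice_` and `dice` helpers are hypothetical. Note that this simplified slice keeps the fixed dimension in each key rather than dropping it, which some OLAP tools do.

```python
# Hypothetical cube cells keyed by (product, region, month).
cube = {
    ("Widget", "East", "Jan"): 100.0,
    ("Widget", "West", "Jan"): 70.0,
    ("Gadget", "East", "Jan"): 40.0,
    ("Widget", "East", "Feb"): 60.0,
    ("Gadget", "West", "Feb"): 90.0,
}
DIMS = ("product", "region", "month")  # position of each dimension in the key

def dice(cells, **allowed):
    """Keep the cells whose members fall inside the allowed set for each named dimension."""
    def keep(key):
        return all(key[DIMS.index(dim)] in members for dim, members in allowed.items())
    return {key: value for key, value in cells.items() if keep(key)}

def slice_(cells, dim, member):
    """Fix one dimension to a single member (a dice with a one-element set)."""
    return dice(cells, **{dim: {member}})

jan = slice_(cube, "month", "Jan")                        # one time period
east_widgets = dice(cube, product={"Widget"}, region={"East"})  # a sub-cube
```

The manager interested in January sees `jan`, while the manager responsible for widgets in the East sees `east_widgets`, both drawn from the same underlying cube.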

Star Schema Representation
Facts and dimensions are represented by physical tables in the data warehouse DB
The fact table is related to each dimension table (M:1)
Fact and dimension tables are related by foreign keys
Subject to the primary/foreign key constraints
Professor's Notes: Here are the basics of a star schema’s representation of the data. You have tables that store facts and dimensions. The fact table is related to each dimension table in an M:1 relationship: many fact rows can refer to one dimension row. The tables are related by foreign keys that are subject to the primary/foreign key constraints.
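The M:1 relationship and the key constraints can be demonstrated with sqlite3. The `product` and `sales` tables and the `try_insert` helper are assumptions for the example. SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on, so the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    prod_id INTEGER NOT NULL REFERENCES product(prod_id),
    amount  REAL
);
INSERT INTO product VALUES (1, 'Widget');
-- Many fact rows point at the same dimension row (M:1).
INSERT INTO sales VALUES (1, 1, 100.0), (2, 1, 60.0);
""")

def try_insert(prod_id):
    """Attempt to add a fact row; return False if the key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO sales (prod_id, amount) VALUES (?, ?)", (prod_id, 10.0)
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert(1)             # product 1 exists, so the row is accepted
rejected = not try_insert(99)  # no product 99, so the row is refused
```

The rejected insert is the constraint at work: a fact row cannot reference a dimension row that does not exist.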

Star Schema for Sales (Figure 13.17)
Professor's Notes: This is an example of a star schema at the table level. You see the SALES “fact table” in the middle, with relations to the location, customer, product, and time dimension tables.

Performance-Improving Techniques for Star Schema
Normalization of dimensional tables
Multiple fact tables representing different aggregation levels
Denormalization of the fact tables
Table partitioning and replication
Professor's Notes: There are some ways to improve the performance of the star schema. One way is to normalize the dimension tables. You can also create multiple fact tables that represent different aggregation levels. Another technique is to denormalize the fact tables. Or one “physical” method is to partition the fact table.
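One of these techniques, keeping a second fact table at a coarser aggregation level, can be sketched as follows. The `sales_daily` and `sales_monthly` tables and their rows are hypothetical. The monthly table is computed once from the daily facts so that month-level queries do not have to rescan the detail rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detailed fact table: one row per day and region.
CREATE TABLE sales_daily (sale_date TEXT, region TEXT, amount REAL);
INSERT INTO sales_daily VALUES
    ('2024-01-03', 'East', 100.0),
    ('2024-01-17', 'East', 40.0),
    ('2024-01-20', 'West', 75.0),
    ('2024-02-05', 'East', 60.0);

-- Second fact table at a coarser aggregation level, built from the detail.
CREATE TABLE sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS amount
    FROM sales_daily
    GROUP BY substr(sale_date, 1, 7), region;
""")

monthly = conn.execute(
    "SELECT month, region, amount FROM sales_monthly ORDER BY month, region"
).fetchall()
```

The trade-off is extra storage and the need to refresh the aggregate table whenever new detail rows are loaded.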

Data Warehouse Implementation Road Map (Figure 13.21)
Professor's Notes: This is a road map to the implementation of a data warehouse. In effect, this is the life cycle model of development that you would follow if you were creating a data warehouse for a DSS.

Data Mining
Seeks to discover unknown data characteristics
Automatically searches data for anomalies and relationships
Data mining tools: analyze data; uncover problems or opportunities; form computer models based on findings; predict business behavior with models; require minimal end-user intervention
Professor's Notes: Data mining is what we do to the data once we have it in a multidimensional format. It seeks to discover patterns, trends, and things unknown about the data.
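As a toy example of automatically searching data for anomalies, the sketch below flags values that lie far from the mean in units of standard deviation. The sales figures, the threshold of 2.0, and the `find_anomalies` helper are assumptions for illustration. Commercial data mining tools use much richer models than this.

```python
from statistics import mean, stdev

# Hypothetical daily sales totals for one store.
daily_sales = [102.0, 98.0, 105.0, 99.0, 101.0, 240.0, 97.0, 103.0]

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least two values
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

anomalies = find_anomalies(daily_sales)
```

The flagged day is a candidate problem or opportunity for an analyst to investigate; the tool finds it without the user having to ask about that specific day.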

Extraction of Knowledge from Data (Figure 13.22)
Professor's Notes: Study this illustration of the IT value proposition. At the bottom of the pyramid you have the Data, which is next transformed into Information, and the final transformation is to Knowledge. This illustration identifies the various technologies and processes associated with the transformation process.

Data Mining Process (Figure 13.23)
Professor's Notes: This illustration shows the data mining process. There are essentially four phases that begin with the Data preparation phase and end with the Prognosis phase.