Presentation is loading. Please wait.

Presentation is loading. Please wait.

CISB594 – Business Intelligence Data Warehousing Part I.

Similar presentations


Presentation on theme: "CISB594 – Business Intelligence Data Warehousing Part I."— Presentation transcript:

1 CISB594 – Business Intelligence Data Warehousing Part I

2 CISB594 – Business Intelligence Reference Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.

3 CISB594 – Business Intelligence Objectives At the end of this lecture, you should be able to: Understand the basic definitions and concepts of data warehouses Understand how a data warehouse differs from a database Describe the characteristics of data warehouse Describe data warehouse process overview Describe the different types of data warehouse architectures CISB594 – Business Intelligence

4 Data Warehouse A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format “The data warehouse is a collection of integrated, subject- oriented databases design to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time” A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. (Wikipedia) In your own words?

5 CISB594 – Business Intelligence Characteristics of data warehousing Main Subject oriented - Data organized by detailed subject, containing only information relevant for decision support, unlike operational database which are product oriented Subject oriented - Data organized by detailed subject, containing only information relevant for decision support, unlike operational database which are product oriented Integrated – must place data from different sources into a consistent format, to do so they must deal with naming conflict and discrepancies Integrated – must place data from different sources into a consistent format, to do so they must deal with naming conflict and discrepancies Time variant (time series) - maintains historical data. Data for analysis from multiple sources contain multiple time points Time variant (time series) - maintains historical data. Data for analysis from multiple sources contain multiple time points Nonvolatile - after data are entered into a data warehouse, users cannot change or update the data. Nonvolatile - after data are entered into a data warehouse, users cannot change or update the data.

6 CISB594 – Business Intelligence Characteristics of data warehousing Additional Relational/multidimensional Client/server Real-time Include metadata

7 CISB594 – Business Intelligence How is a data warehouse different from a database? Technically a data warehouse is a database, with certain characteristics to facilitate its role in decision support. Technically a data warehouse is a database, with certain characteristics to facilitate its role in decision support. However, it is an “integrated, time-variant, non­volatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organi­ zation.” - These characteristics, are not necessarily true of databases in general. However, it is an “integrated, time-variant, non­volatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organi­ zation.” - These characteristics, are not necessarily true of databases in general. As a practical matter most databases are highly normalized, in part to avoid update anomalies. As a practical matter most databases are highly normalized, in part to avoid update anomalies. Data warehouses are often denormalized for performance reasons. This is acceptable because their content is never updated, just added to. (Historical data are static) Data warehouses are often denormalized for performance reasons. This is acceptable because their content is never updated, just added to. (Historical data are static)

8 CISB594 – Business Intelligence Data Warehousing - Concept Data mart – Smaller and focuses on a particular subject or department. – It is a subset of data warehouse/departmental data warehouse – A data mart is a smaller DW designed around one problem, organizational function, topic, or other focus area. Can be Dependent data mart – A subset that is created directly from a data warehouse – Ensures that the end user is viewing the same version of the data that are accessed by all other data warehouse users Or Independent data mart – A small data warehouse designed for a strategic business unit or a department

9 CISB594 – Business Intelligence Data Warehousing - Concept Operational data stores (ODS) – It is a type of database often used as an interim area for a data warehouse, especially for customer information files. – Use for short term decisions rather than medium and long term – Similar to short term memory, stores only recent information

10 CISB594 – Business Intelligence Data Warehousing - Concept Oper marts An operational data mart. An oper mart is a small-scale data mart typically used by a single department or functional area in an organization

11 CISB594 – Business Intelligence Data Warehousing - Concept Enterprise data warehouse (EDW) – A large scale data warehouse used across the enterprise for decision support – Used to provide data for many types of DSS, including CRM, supply chain management, BPM, KMS etc Metadata – Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use

12 CISB594 – Business Intelligence Data Warehousing Process Overview Organizations continuously collect data, information, and knowledge at an increasingly accelerated rate and store them in computerized systems The number of users needing to access the information continues to increase as a result of improved reliability and availability of network access, especially the Internet Creating of data warehouse involves the following: – Data are imported from various internal and external resources – Cleansed and organized to suit the organization’s needs. – Data marts can be loaded for specific department/area (alternatively data marts are created first and later integrated into EDW)

13 CISB594 – Business Intelligence Data Warehousing Process Overview The data warehousing process consists of the following steps: 1.Data are imported from various internal and external sources 2.Data are cleansed and organized consistently with the organization’s needs 3a. Data are loaded into the enterprise data warehouse 4a.If desired, data marts are created as subsets of the EDW —or— 3b.Data are loaded into data marts 4b.The data marts are consolidated into the EDW 5.Analyses are performed as needed

14 CISB594 – Business Intelligence Data Warehousing - Process Overview The major components of a data warehousing process Data sources Data sources. Data are sourced from operational systems and possibly from external data sources. Data extraction Data extraction. Data are extracted using custom-written or commercial software called ETL. Data loading Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse. Comprehensive database Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information. Metadata Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search. Middleware tools Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.

15 Data Warehousing - Process Overview

16 CISB594 – Business Intelligence Data Warehousing Architectures There are several basic architectures for data warehousing To distinguished the architectures data warehouse is divided into three parts: The data warehouse itself Data acquisition (back-end) software, which extracts data from legacy systems and external sources, consolidates and loads into the data warehouse Client (front-end) software, which allows users access and analyze data from the warehouse

17 CISB594 – Business Intelligence Three-tier DW architecture 1 st tier : operational systems contain data and software for data acquisition 2 nd tier : the data warehouse 3 rd tier : DSS/BI/BA engines Data from data warehouse are processed and deposited in multidimensional database and organized for easy analysis and presentation Advantage: its separation of the functions of the data warehouse, which eliminates resource constraints and makes it easy to create data marts

18 Data Warehousing Architectures CISB594 – Business Intelligence

19 Two-tier DW architecture 1 st tier : operational systems contain data and software for data acquisition (i.e the server) 2 nd tier : DSS/BI/BA engines and the data DSS engines run on the same hardware platform as the data warehouse, hence more economical Advantage: economical Disadvantage: performance problem for large data warehouse with data intensive applications for decision support

20 Data Warehousing Architectures CISB594 – Business Intelligence

21 Web-based DW architecture Data warehousing and the Internet are two key technologies that offer important solutions for managing corporate data The integration of these two produced Web-based data warehousing On the client side, the user needs an Internet connection and a Web browser using GUI The Internet/Intranet/Extranet is the communication medium between client and servers On the server side, a Web server is used to manage the flow of info between client and server Advantage: ease of access, platform independence, lower cost Disadvantage: server capacity must be well planned carefully, page loading speed

22 Data Warehousing Architectures CISB594 – Business Intelligence

23 Data Warehousing Architectures CISB594 – Business Intelligence

24 Data Warehousing Architectures CISB594 – Business Intelligence

25 Data Warehousing Architectures CISB594 – Business Intelligence

26 Data Warehousing Architectures CISB594 – Business Intelligence

27 Data Warehousing Architectures CISB594 – Business Intelligence

28 Data Warehousing Architectures CISB594 – Business Intelligence

29 Data Warehousing Architectures CISB594 – Business Intelligence

30 Data Warehousing Architectures Issues to consider when deciding which architecture to use: – Which database management system (DBMS) should be used? Most are built using RDBMS (Oracle, SQL server, DB2 are commonly used) Each supports client/server and Web- based architecture – Will parallel processing and/or partitioning be used? Parallel processing enables multiple CPUs to process data warehouse requests simultaneously and provide scalability. Partitioning will split into smaller tables for access effeciency

31 CISB594 – Business Intelligence Data Warehousing Architectures Issues to consider when deciding which architecture to use: – Will data migration tools be used to load the data warehouse? – What tools will be used to support data retrieval and analysis?

32 Data Warehousing Architectures 1.Information interdependence between organizational units 2.Upper management’s information needs 3.Urgency of need for a data warehouse 4.Nature of end-user tasks 1.5. Constraints on resources 2.6. Strategic view of the data warehouse prior to implementation 3.7. Compatibility with existing systems 4.8. Perceived ability of the in-house IT staff 5.9. Technical issues 6.10. Social/political factors Ten factors that potentially affect the architecture selection decision: CISB594 – Business Intelligence

33 Now ask if.. You are able to: Understand the basic definitions and concepts of data warehouses Understand how a data warehouse differs from a database Describe the characteristics of data warehouse Describe data warehouse process overview Describe the different types of data warehouse architectures CISB594 – Business Intelligence


Download ppt "CISB594 – Business Intelligence Data Warehousing Part I."

Similar presentations


Ads by Google