Data Warehouse Fundamentals

1 Data Warehouse Fundamentals
Rabie A. Ramadan, PhD 6

2 ARCHITECTURAL COMPONENTS

3 UNDERSTANDING DATA WAREHOUSE ARCHITECTURE
Objective: We will study the architectural components in the order in which they enable the flow of data from the sources to the end-users as business intelligence. We will be able to look at each area of the architecture and examine the functions, procedures, and features in that area.

4 Architecture: Definitions
The structure that brings all the components of a data warehouse together is known as the architecture. Example: A school building includes the various classrooms, offices, library, corridors, gymnasiums, doors, windows, roof, and a large number of other such components. The structure that ties all of the components together is the architecture of the school building. Let us say that the builders were told to make the classrooms large. So they made the classrooms larger but eliminated the offices altogether, thus constructing the school building with a faulty architecture. Correct architecture is critical for the success of your data warehouse.

5 Architecture Factors
Data warehouse architecture includes a number of factors: The integrated data that is the centerpiece. Everything that is needed to prepare the data and store it. All the means for delivering information from your data warehouse. The rules, procedures, and functions that enable your data warehouse to work and fulfill the business requirements. Finally, the technology that empowers your data warehouse. The architecture defines the standards, measurements, general design, and support techniques.

6 Architecture in Three Major Areas
Data acquisition Data storage Information delivery

7 Architecture in Three Major Areas
OnLine Analytic Processing (OLAP) is a loosely defined set of software tools that provides a dimensional framework for decision support. The term OLAP also defines a confederation of vendors who offer non-relational, proprietary products aimed at decision support.

8 Different Objectives and Scope
In operational systems, the user requires a single piece of information, for example, information about a single order. In a data warehouse, the required data is large, for example, a view of the whole year's sales divided into quarters. So the scope is different, and defining the scope for a data warehouse is also difficult. Online transaction processing, or OLTP, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.
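
To make the difference in scope concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical and do not come from the presentation. The first function is the operational-style lookup of a single order; the second is the warehouse-style view of a whole year of sales divided into quarters.

```python
import sqlite3

# Hypothetical sample data: (order_id, order_date, amount)
ORDERS = [
    (1, "2023-01-15", 100.0),
    (2, "2023-02-20", 50.0),
    (3, "2023-05-03", 75.0),
    (4, "2023-08-11", 120.0),
    (5, "2023-11-30", 200.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)


def single_order(order_id):
    """Operational (OLTP) style: fetch one piece of information."""
    return conn.execute(
        "SELECT order_id, order_date, amount FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def sales_by_quarter(year):
    """Warehouse style: summarize a whole year, divided into quarters."""
    rows = conn.execute(
        """
        SELECT (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS quarter,
               SUM(amount)
        FROM orders
        WHERE strftime('%Y', order_date) = ?
        GROUP BY quarter
        ORDER BY quarter
        """,
        (str(year),),
    ).fetchall()
    return dict(rows)
```

The single-order lookup touches one row, while the quarterly view has to scan and summarize every order of the year, which is why the two kinds of system are scoped so differently.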

9 What are all the factors you must consider for defining the scope?
You must consider the number and extent of the data sources. How many legacy systems are you going to extract the data from? What are the external sources? Are you planning to include departmental files, spreadsheets, and private databases? What about including the archived data? In a data warehouse, data granularity and data volumes are also important considerations.

10 What is the scope of the election data warehouse?

11 Data Content
The “read-only” data in the data warehouse is the primary component in the architecture. Operational data is not “read-only” data. Data warehouse architecture must support the storing of data grouped by business subjects, not grouped by applications. The data warehouse does not represent a snapshot.

12 Complex Analysis and Quick Response
Your data warehouse architecture must, therefore, support variations for providing analysis. Users must be able to drill down, roll up, slice and dice data, and play with “what-if” scenarios. Users must have the capability to review the result sets in different output options. Users are no longer content with textual result sets or results displayed in tabular formats. Every result set in tabular format must be translated into graphical charts.
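
The analysis operations named above can be illustrated with a small, hedged Python sketch. The fact rows, the dimension names, and the helper functions `summarize`, `slice_`, and `dice` are assumptions made for illustration; real OLAP tools offer these operations through their own interfaces.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "Central", "Widget", 100),
    (2023, "Q1", "East", "Widget", 80),
    (2023, "Q2", "Central", "Gadget", 60),
    (2023, "Q2", "Central", "Widget", 40),
    (2023, "Q2", "East", "Gadget", 90),
]
DIMENSIONS = ("year", "quarter", "region", "product")


def summarize(facts, group_by):
    """Total sales grouped by the named dimensions.

    Fewer dimensions is a roll-up; more dimensions is a drill-down.
    """
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose dimensions fall in the allowed value sets."""
    checks = [(DIMENSIONS.index(d), set(vals)) for d, vals in allowed.items()]
    return [row for row in facts if all(row[i] in vals for i, vals in checks)]
```

For example, `summarize(FACTS, ["year"])` rolls everything up to one yearly total, and `summarize(FACTS, ["year", "quarter"])` drills down one level to the quarters.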

13 Complex Analysis and Quick Response
Provision of strategic information is meant for making rapid decisions and dealing with situations quickly. For example, let us say your vice president of marketing wants to quickly discover the reasons for the drop in sales for three consecutive weeks in the central region and make prompt decisions to remedy the situation. Your data warehouse must give him or her the tools and information for a quick response to the problem.

14 Complex Analysis and Quick Response
If your data warehouse supports real time information retrieval, the architecture has to expand to accommodate real time data capture and the ability to obtain strategic information in real time to make on-the-spot decisions. Real time data warehousing means delivery of information to a larger number of users both inside and outside the organization.

15 ARCHITECTURAL FRAMEWORK
Architecture Supporting Flow of Data: Data warehousing just means taking all the necessary source data, preparing it, storing it in suitable formats, and then delivering useful information to the end-users.
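
The four steps of this flow can be sketched as a chain of small Python functions. This is only an illustration under stated assumptions: the record layout (name, amount), the cleaning rules, and all function names are hypothetical, and a plain dictionary stands in for the warehouse repository.

```python
def acquire(sources):
    """Take the necessary source data (here, lists of raw records)."""
    return [record for source in sources for record in source]


def prepare(records):
    """Prepare the data: tidy names, convert amounts, drop unusable rows."""
    prepared = []
    for name, amount in records:
        name = name.strip().title()
        try:
            prepared.append((name, float(amount)))
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
    return prepared


def store(prepared, repository):
    """Store the data in a format suited to delivery (totals per customer)."""
    for name, amount in prepared:
        repository[name] = repository.get(name, 0.0) + amount
    return repository


def deliver(repository):
    """Deliver useful information: customers ranked by total amount."""
    return sorted(repository.items(), key=lambda item: item[1], reverse=True)


def run_flow(sources):
    """Run the whole flow from the sources to the end-user result."""
    return deliver(store(prepare(acquire(sources)), {}))
```

Each function corresponds to one step of the flow, so a change in how data is prepared or delivered stays local to that step.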

16

17 Questions Need to be Answered
What happens at critical points of the flow of data? What are the architectural components, and how do these components enable the data flow?

18 At the Data Source
Source data governs the extraction of data for preparation and storage in the data warehouse. The data staging architectural component governs the transformation, cleansing, and integration of data.

19 In the Data Warehouse Repository
Includes the loading of data from the staging area and the storing of the data in suitable formats for information delivery. The metadata architectural component is also a storage mechanism to contain data about the data at every point of the flow of data from beginning to end.

20 At the User End
The information delivery architectural component includes dependent data marts, special multidimensional databases, and a full range of query and reporting facilities, including dashboards and scorecards.

21 The Management and Control Module
An overall module manages and controls the entire data warehouse environment. This component has two major functions: first, to constantly monitor all the ongoing operations, and next, to step in and recover from problems when things go wrong.
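
The two functions of this module, monitoring and recovery, might be sketched as follows. This is a hedged illustration only: `run_monitored`, the job list, and the simple retry policy are assumptions, not something the presentation prescribes.

```python
def run_monitored(jobs, max_attempts=2):
    """Run each named job, log every attempt, and retry failed jobs.

    jobs is a list of (name, callable) pairs. The returned log is the
    monitoring record; the retry loop is the recovery step.
    """
    log = []
    for name, job in jobs:
        for attempt in range(1, max_attempts + 1):
            try:
                job()
            except Exception as exc:
                log.append((name, attempt, f"failed: {exc}"))
            else:
                log.append((name, attempt, "ok"))
                break  # job succeeded, move on to the next one
    return log
```

A job that keeps failing is logged once per attempt and then left for an operator to investigate; the sketch deliberately does not stop later jobs from running.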

22 Management Operations
Relating to data acquisition: Extracting data from the source systems either for full refresh or for incremental loads. Moving the data into the staging area and performing the data transformation. The module manages and controls these data acquisition functions, ensuring that extracts and transformations are carried out correctly and in a timely fashion.

23 Management Operations
Relating to data storage: Manages backing up significant parts of the data warehouse and recovering from failures. Monitors the growth and periodically archives data from the data warehouse. Governs data security and provides authorized access to the data warehouse.

24 Management Operations
Relating to end-user information delivery: Ensures that information delivery is carried out properly.

25 Architecture Factors
Data warehouse architecture includes a number of factors: The integrated data that is the centerpiece. Everything that is needed to prepare the data and store it. All the means for delivering information from your data warehouse. The rules, procedures, and functions that enable your data warehouse to work and fulfill the business requirements. Finally, the technology that empowers your data warehouse. The architecture defines the standards, measurements, general design, and support techniques.

26 TECHNICAL ARCHITECTURE
Technical architecture of a data warehouse is the complete set of functions and services provided within its component structures. It includes the procedures and rules that are required to perform the functions and provide the services. It encompasses the data stores needed for each component to provide the services. The architecture is not the set of tools needed to perform the functions and provide the services. Tools are the means to implement the technical architecture.

27 Technical architecture for Data Acquisition
Major architectural components are source data and data staging.

28 Technical architecture for Data Acquisition
Data Flow: Begins at the data sources and pauses at the staging area. After transformation and integration, the data is ready for loading into the data warehouse repository. Data Sources: The enterprise’s operational systems. May use an SQL-based language for extracting data. For including data from outside sources, you will have to create temporary files to hold the data received from the outside sources.

29 Technical architecture for Data Acquisition
Intermediary Data Stores: As data gets extracted from the data sources, it moves through temporary files. Sometimes, extracts of homogeneous data from several source applications are pulled into separate temporary files and then merged into another temporary file before moving it to the staging area.

30 Technical architecture for Data Acquisition
Staging Area: All the extracted data is put together and prepared for loading into the data warehouse. The staging area is like an assembly plant or a construction area. In this area, you examine each extracted file, review the business rules, perform the various data transformation functions, sort and merge data, resolve inconsistencies, and cleanse the data.

31 Functions and Services of the Data Acquisition
Data Extraction: Select data sources and determine the types of filters to be applied to individual sources. Generate automatic extract files from operational systems using replication and other techniques. Create intermediary files to store selected data to be merged later. Transport extracted files from multiple platforms. Provide automated job control services for creating extract files. Reformat input from outside sources. Reformat input from departmental data files, databases, and spreadsheets. Generate common application codes for data extraction. Resolve inconsistencies for common data elements from multiple sources.
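
A few of these extraction services (per-source filters, intermediary extract files, reformatting, and common codes) can be sketched in Python. The two source systems, their field names, and the status code table are hypothetical; in-memory lists stand in for the intermediary files.

```python
# Hypothetical source systems, each with its own layout and status codes.
BILLING = [
    {"cust": "C1", "status": "A", "balance": 120.0},
    {"cust": "C2", "status": "I", "balance": 0.0},
]
SUPPORT = [
    {"customer_id": "C1", "state": "active", "tickets": 3},
    {"customer_id": "C3", "state": "closed", "tickets": 1},
]

# Common codes that every extract is mapped to (assumed for this sketch).
STATUS_CODES = {"A": "ACTIVE", "I": "INACTIVE", "active": "ACTIVE", "closed": "INACTIVE"}


def extract(source, row_filter, mapping, status_field):
    """Filter one source and reformat its rows into the common layout."""
    extract_file = []
    for row in source:
        if not row_filter(row):
            continue
        out = {target: row[field] for field, target in mapping.items()}
        out["status"] = STATUS_CODES[row[status_field]]
        extract_file.append(out)
    return extract_file


def merge_extracts(*extract_files):
    """Merge intermediary extract files into one record per customer."""
    merged = {}
    for extract_file in extract_files:
        for row in extract_file:
            merged.setdefault(row["customer_id"], {}).update(row)
    return merged


# Billing keeps only active accounts; support is taken unfiltered.
billing_extract = extract(
    BILLING, lambda r: r["status"] == "A",
    {"cust": "customer_id", "balance": "balance"}, "status",
)
support_extract = extract(
    SUPPORT, lambda r: True,
    {"customer_id": "customer_id", "tickets": "tickets"}, "state",
)
merged = merge_extracts(billing_extract, support_extract)
```

When two sources disagree on a common field, this sketch simply lets the later extract win; a real extraction process would apply an agreed business rule instead.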

32 Functions and Services of the Data Acquisition
Data Transformation: Map input data to data for the data warehouse repository. Clean data, deduplicate, and merge/purge. Denormalize extracted data structures as required by the dimensional model of the data warehouse. Convert data types. Calculate and derive attribute values. Check for referential integrity. Aggregate data as needed. Resolve missing values. Consolidate and integrate data.
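
Several of these transformation functions can be shown together in one short, hedged Python sketch. The raw sales rows, the `KNOWN_PRODUCTS` set, and the choice of a default quantity of 1 for missing values are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical extracted rows; every value arrives as text.
RAW_SALES = [
    {"sale_id": "1", "product": "P1", "qty": "2", "unit_price": "9.50", "sold_on": "2023-03-01"},
    {"sale_id": "1", "product": "P1", "qty": "2", "unit_price": "9.50", "sold_on": "2023-03-01"},
    {"sale_id": "2", "product": "P2", "qty": "", "unit_price": "4.00", "sold_on": "2023-03-02"},
    {"sale_id": "3", "product": "P9", "qty": "1", "unit_price": "1.00", "sold_on": "2023-03-02"},
]
KNOWN_PRODUCTS = {"P1", "P2"}  # stand-in for the product dimension table


def transform(raw_rows, known_products, default_qty=1):
    """Return (clean_rows, rejected_rows) after basic transformations."""
    seen, clean, rejected = set(), [], []
    for row in raw_rows:
        if row["sale_id"] in seen:  # deduplicate on the business key
            continue
        seen.add(row["sale_id"])
        if row["product"] not in known_products:  # referential integrity check
            rejected.append(row)
            continue
        qty = int(row["qty"]) if row["qty"] else default_qty  # resolve missing value
        price = float(row["unit_price"])  # convert data types
        clean.append({
            "sale_id": int(row["sale_id"]),
            "product": row["product"],
            "qty": qty,
            "unit_price": price,
            "amount": round(qty * price, 2),  # calculated, derived attribute
            "sold_on": date.fromisoformat(row["sold_on"]),
        })
    return clean, rejected
```

Rows that fail the referential integrity check are returned separately rather than dropped, so they can be reviewed and corrected at the source.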

33 Functions and Services of the Data Acquisition
Data Staging: Provide backup and recovery for staging area repositories. Sort and merge files. Create files as input to make changes to dimension tables. If data staging storage is a relational database, create and populate database. Preserve audit trail to relate each data item in the data warehouse to input source. Resolve and create primary and foreign keys for load tables. Consolidate datasets and create flat files for loading through DBMS utilities. If staging area storage is a relational database, extract load files.
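
Two of the staging services, creating keys for load tables and preserving an audit trail back to the input source, can be sketched as below. The row layout, the `customer_key` column, and the `assign_keys` helper are hypothetical names chosen for this illustration.

```python
from itertools import count


def assign_keys(staged_rows, key_map, next_key):
    """Attach a generated key and source lineage to each staged row.

    key_map remembers which key was given to each
    (source_system, source_id) pair, so later loads reuse the same key.
    """
    load_rows = []
    for row in staged_rows:
        natural_key = (row["source_system"], row["source_id"])
        if natural_key not in key_map:
            key_map[natural_key] = next(next_key)
        load_rows.append({
            "customer_key": key_map[natural_key],  # primary key for the load table
            "name": row["name"],
            "audit_source": row["source_system"],  # audit trail back to the input
            "audit_source_id": row["source_id"],
        })
    return load_rows


# Example usage with a fresh key map and a counter starting at 1.
key_map = {}
next_key = count(1)
first_load = assign_keys(
    [
        {"source_system": "billing", "source_id": "C1", "name": "Alice"},
        {"source_system": "crm", "source_id": "77", "name": "Bob"},
    ],
    key_map,
    next_key,
)
```

Because the key map persists between runs, a row that arrives again from the same source keeps its original key, and the audit columns always show where each warehouse row came from.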

34 Data Storage
The process of loading the data from the staging area into the data warehouse repository.

35 Data Storage Data Flow
The data flow begins at the data staging area. The transformed and integrated data is moved from the staging area to the data warehouse repository.

36 Data Storage Data Groups
The first group is the set of files or tables containing data for a full refresh. This group of data is usually meant for the initial loading of the data warehouse. The other group of data is the set of files or tables containing ongoing incremental loads. Most of these relate to nightly loads. Some incremental loads of dimension data may be performed at less frequent intervals.
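
The two data groups correspond to two kinds of load, which the following Python sketch illustrates with the standard-library sqlite3 module. The `product_dim` table and its rows are hypothetical, and overwriting changed rows is only one possible policy for incremental loads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_dim (product_id TEXT PRIMARY KEY, name TEXT, price REAL)"
)


def full_refresh(rows):
    """Initial load: empty the table and reload every row."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM product_dim")
        conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", rows)


def incremental_load(changed_rows):
    """Ongoing load: insert new rows and overwrite changed ones."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO product_dim VALUES (?, ?, ?)", changed_rows
        )


def snapshot():
    """Return the current table contents, ordered by key."""
    return conn.execute("SELECT * FROM product_dim ORDER BY product_id").fetchall()
```

A full refresh would typically run once with the complete data set, after which a nightly job would call `incremental_load` with only the rows that were added or changed.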

37 Data Storage The Data Repository
Almost all of today’s data warehouse databases are relational databases. All the power, flexibility, and ease of use capabilities of the RDBMS become available for the processing of data.

38 Data Storage Functions and Services
Load data for full refreshes of data warehouse tables. Perform incremental loads at regular prescribed intervals. Support loading into multiple tables at the detailed and summarized levels. Optimize the loading process. Provide automated job control services for loading the data warehouse. Provide backup and recovery for the data warehouse database. Provide security. Monitor and fine-tune the database. Periodically archive data from the database according to preset conditions.

39 Technical Architecture Information Delivery

40 Technical Architecture Information Delivery
Almost all modern data warehouses provide for online analytical processing (OLAP). In this case, the primary data warehouse feeds data to proprietary multidimensional databases (MDDBs) where summarized data is kept as multidimensional cubes of information. The users perform complex multidimensional analysis using the information cubes in the MDDBs.
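
As a rough illustration of how summarized data can be kept as a multidimensional cube, the Python sketch below precomputes a total for every combination of dimension values. A plain dictionary stands in for the proprietary MDDB, and the detail rows, dimension names, and the `"*"` marker for "all values" are assumptions of this sketch.

```python
from itertools import combinations

# Hypothetical detailed rows from the warehouse: (region, product, month, sales)
DETAIL = [
    ("Central", "Widget", "Jan", 10),
    ("Central", "Gadget", "Jan", 5),
    ("East", "Widget", "Jan", 7),
    ("East", "Widget", "Feb", 3),
]
DIMS = ("region", "product", "month")


def build_cube(detail):
    """Summarize sales for every combination of dimensions.

    A cell key holds one entry per dimension; "*" means "all values".
    """
    cube = {}
    for row in detail:
        *coords, sales = row
        for size in range(len(DIMS) + 1):
            for kept in combinations(range(len(DIMS)), size):
                key = tuple(
                    coords[i] if i in kept else "*" for i in range(len(DIMS))
                )
                cube[key] = cube.get(key, 0) + sales
    return cube
```

Once the cube is built, a question such as "total Widget sales in January across all regions" becomes a single lookup, `cube[("*", "Widget", "Jan")]`, instead of a scan of the detailed data.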

41 Technical Architecture Information Delivery
Data Flow: Recently, progressive organizations have been implementing dashboards and scorecards as part of information delivery. Dashboards are real-time or near-real-time information display devices. Data flows to the dashboards in real time from the data warehouse.

42 Technical Architecture Information Delivery
Service Locations: You may provide query services from the user desktop, from an application server, or from the database itself. This will be one of the critical decisions for your architecture design.

43 Technical Architecture Information Delivery
Data Stores: You may consider the following intermediary data stores: proprietary temporary stores to hold results of individual queries and reports for repeated use; data stores for standard reporting; data stores for dashboards; and proprietary multidimensional databases.

44 Technical Architecture Information Delivery
Functions and Services: Provide security to control information access. Monitor user access to improve service and for future enhancements. Allow users to browse data warehouse content. Simplify access by hiding internal complexities of data storage from users. Automatically reformat queries for optimal execution. Enable queries to be aware of aggregate tables for faster results. Govern queries and control runaway queries.
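
Making queries aware of aggregate tables can be pictured with a small Python sketch that routes a request to the smallest table able to answer it. The table catalog, the row counts, and the functions `choose_table` and `build_query` are hypothetical; real products implement this inside the query layer or the database optimizer.

```python
# Hypothetical catalog: table name -> (dimensions it contains, row count)
TABLES = {
    "sales_detail": ({"day", "month", "store", "product"}, 5_000_000),
    "sales_by_month_store": ({"month", "store"}, 12_000),
    "sales_by_month": ({"month"}, 36),
}


def choose_table(requested_dims, tables=TABLES):
    """Pick the smallest table that contains every requested dimension."""
    candidates = [
        (rows, name)
        for name, (dims, rows) in tables.items()
        if set(requested_dims) <= dims
    ]
    if not candidates:
        raise LookupError(f"no table covers {sorted(requested_dims)}")
    return min(candidates)[1]


def build_query(requested_dims, measure="sales"):
    """Rewrite a request into SQL text against the chosen table."""
    table = choose_table(requested_dims)
    cols = ", ".join(sorted(requested_dims))
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols}"
```

The user asks only for dimensions and a measure; which physical table answers the question stays hidden, which also serves the goal of hiding storage complexities from users.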

45 ARCHITECTURAL TYPES Centralized Corporate Data Warehouse

46 ARCHITECTURAL TYPES Independent Data Marts
A data warehouse could be a combination of independent data marts.

47 ARCHITECTURAL TYPES Hub-and-Spoke
Data marts depend on the enterprise data warehouse for their data feed.

48 Assignment is posted

