Data Warehousing Concepts

Supervisor: Prof. Abbdolahzadeh

Contents
- Introduction to Data Warehousing
- Data Warehouse Architecture
- Dimensional Modeling
- Data warehouse techniques
- Data mining
- OLAP

1. Introduction to Data Warehousing
- Data Warehousing Definition
- Online Transaction Processing (OLTP) System
- Data Warehousing System
- Difference between OLTP and DW System
- Reasons for Building a Data Warehouse
- Benefits of Data Warehousing

1.1 What is a Data Warehouse?
A data warehouse:
- is primarily a centralized repository of an organization’s data.
- holds a large amount of data, including historical information.
- is designed to support efficient data analysis and reporting.

1.2 OLTP Systems
Focus:
- Designed to get data in quickly and to analyze current events.
- Transaction oriented.
- Organized around business processes such as Order Entry, Purchasing, Campaign Management, and Trading.
- Design goals include avoidance of data duplication and maintainability.
Characteristics:
- Process oriented.
- Normalized data.
- Current data.
- Volatile data.
- Real-time updates.

1.3 Data Warehousing Systems
Focus:
- Designed to get data out and analyze it quickly.
- Concerned with subjects such as customer and product rather than processes such as order entry and campaign management.
- Focus on easy data access.
- Contains slices of data across different periods of time.
- Historical data supports trending, forecasting, and time-based performance reporting.
Characteristics:
- Subject oriented rather than process oriented.
- Integrated across subjects and the entire enterprise.
- De-normalized data.
- Time-variant.
- Historical data.
- Non-volatile.
- Atomic and summary data.

1.4 OLTP vs. Data Warehouse

OLTP Systems                             | Data Warehouse
-----------------------------------------|----------------------------------------
Normalized data                          | De-normalized data
Used to run the business                 | Used to analyze the business
Real-time data update                    | Updated on a predefined schedule
Volatile data                            | Non-volatile data
Current data                             | Historical data
Wider audience; transaction throughput   | Limited audience; fast query response
Small to large database                  | Large to very large database
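
The normalized versus de-normalized contrast can be made concrete with a small sketch. This is an illustration only, using Python's built-in sqlite3 module. The table names (customer, orders, sales_report), the column names, and the sample rows are all hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP side (hypothetical schema): normalized tables, each fact stored once.
cur.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    order_date  TEXT,
    amount      REAL
);
INSERT INTO customer VALUES (1, 'Acme', 'Corvallis'), (2, 'Globex', 'Portland');
INSERT INTO orders VALUES
    (10, 1, '2005-01-15', 100.0),
    (11, 1, '2005-04-02', 250.0),
    (12, 2, '2005-04-20', 75.0);
""")

# Warehouse side (hypothetical): one de-normalized table that repeats the
# customer attributes on every row so that reports need no joins.
cur.execute("""
CREATE TABLE sales_report AS
SELECT o.order_id, o.order_date, o.amount,
       c.name AS customer_name, c.city AS customer_city
FROM orders AS o
JOIN customer AS c ON c.customer_id = o.customer_id
""")

# An analytical query against the de-normalized table.
rows = cur.execute("""
SELECT customer_city, SUM(amount)
FROM sales_report
GROUP BY customer_city
ORDER BY customer_city
""").fetchall()
print(rows)
```

The trade-off is the one the table describes: the de-normalized copy duplicates data and is refreshed on a schedule, in exchange for simpler and faster analytical queries.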

1.5 Why Build a Data Warehouse?
- No single version of the truth.
- Lack of standardized data across the enterprise for easy understanding and further decision making.
- Absence of historical data for the purpose of analysis and decision making.

1.6 Benefits of Data Warehousing
- Rapid access to data.
- Integrated data.
- Reliable reporting.
- Better decision making.

2. Data Warehouse Architecture
- Logical Architecture
- Elements of a Data Warehouse

Data Warehouse - Logical Architecture
[Figure: logical architecture diagram. Labels shown: ETL, Staging Area, DQ, Datamarts, BI Tools, Query Tools, OLAP Tools, Data Mining, Data Visualization]

Elements of a Data Warehouse
[Figure: elements of a data warehouse, drawn as numbered stages 1, 2, 3 over a METADATA layer. The labels are listed under each element below.]
- Operational Data Storage: SAP, CRM, Inventory, Manufacturing.
- ETL & Staging: The ETL tool interfaces with all the sources in the enterprise and extracts data in a batch cycle or in real time. 70% of the effort in a data warehousing solution goes into developing a successful ETL strategy.
- Data Storage: Enterprise information is stored in the warehouse structure (Quality, Accounts, Inventory).
- Reporting Layer: BI tools and portals interface with the databases to generate reports, with secured access (Quality, Finance, Marketing).

Extracting
- The extract step is the first step in getting data into the data warehouse environment.
- Extracting means reading and understanding the source data, and copying the parts that are needed to the data staging area for further work.
- Extraction needs to be done carefully so as not to affect production environments.
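
A minimal sketch of the extract step as described above: read the source records and copy only the needed parts into a staging area, leaving the source untouched. The function name extract, the field names, and the sample rows are hypothetical. A real extract would read from the operational systems rather than from an in-memory list.

```python
def extract(source_rows, needed_columns):
    """Copy only the needed columns of each source row into new staging rows.

    The source rows are read, never modified, so the production data
    is not affected by the extract.
    """
    staged = []
    for row in source_rows:
        staged.append({column: row.get(column) for column in needed_columns})
    return staged


# Hypothetical export from an operational order-entry system.
source_rows = [
    {"order_id": 10, "customer": "Acme", "amount": 100.0, "internal_note": "call back"},
    {"order_id": 11, "customer": "Globex", "amount": 75.0, "internal_note": ""},
]

staging_area = extract(source_rows, ["order_id", "customer", "amount"])
print(staging_area)
```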

Transforming
Once the data is extracted into the data staging area, there are many possible transformation steps, including cleaning the data by correcting misspellings, resolving domain conflicts (such as a city name that is incompatible with a postal code), dealing with missing data elements, and parsing into standard formats.

Staging Area
- A storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse.
- The data staging area is everything in between the source system and the presentation server.
- The data staging area is not part of the physical data warehouse.
- The staging area is dominated by the simple activities of sorting and sequential processing.
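
The cleaning steps named above (correcting misspellings, handling missing elements, parsing into a standard format, and removing duplicates) might look like the following sketch. The correction table CITY_FIXES, the accepted date formats, the "UNKNOWN" placeholder, and the field names are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical lookup of known misspellings and their corrections.
CITY_FIXES = {"corvalis": "Corvallis", "portlnd": "Portland"}

# Hypothetical set of date formats seen in the source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Parse a date in any accepted format and return it as ISO YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable dates are left for manual review


def clean(rows):
    """Correct misspellings, fill missing cities, standardize dates, de-duplicate."""
    seen = set()
    cleaned = []
    for row in rows:
        city = (row.get("city") or "").strip()
        city = CITY_FIXES.get(city.lower(), city) or "UNKNOWN"
        record = {
            "customer": row["customer"].strip(),
            "city": city,
            "order_date": parse_date(row["order_date"]),
        }
        key = (record["customer"], record["city"], record["order_date"])
        if key in seen:
            continue  # same record already staged
        seen.add(key)
        cleaned.append(record)
    return cleaned


staged = [
    {"customer": " Acme ", "city": "corvalis", "order_date": "2005-01-15"},
    {"customer": "Acme", "city": "Corvallis", "order_date": "15/01/2005"},
    {"customer": "Globex", "city": None, "order_date": "2005-04-20"},
]
cleaned_rows = clean(staged)
print(cleaned_rows)
```

The first two input rows describe the same order in different spellings and date formats, so only one of them survives cleaning.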

Loading Data
- At the end of the transformation process, the data is ready to be loaded into the target warehouse.
- A first-time bulk load brings the historical data into the data warehouse.
- Periodic incremental loads bring in modified data.
- Loading in the data warehouse environment usually takes the form of inserting data into dimension tables and fact tables. These are the tables that users and tools typically query when executing reports.
- Bulk loading is a very important capability, to be contrasted with record-at-a-time loading, which is far slower and can push load times into the 10+ hour range.
- It may be necessary to drop and recreate indexes on the target warehouse structure each time data loading occurs.
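
The loading pattern above (an initial bulk load, later incremental loads, and dropping and recreating indexes around each load) can be sketched with Python's built-in sqlite3 module. The table sales_fact, the index idx_sales_date, and the sample rows are hypothetical. Selecting new rows by comparing against the latest loaded date is one simple assumption about how modified data is detected, not something the slides prescribe.

```python
import sqlite3


def bulk_load(conn, rows):
    """Insert many rows in one batch, rebuilding the index around the load."""
    cur = conn.cursor()
    # Dropping the index first avoids maintaining it row by row during the load.
    cur.execute("DROP INDEX IF EXISTS idx_sales_date")
    cur.executemany(
        "INSERT INTO sales_fact (sale_date, product_id, amount) VALUES (?, ?, ?)",
        rows,
    )
    cur.execute("CREATE INDEX idx_sales_date ON sales_fact (sale_date)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, product_id INTEGER, amount REAL)"
)

# First-time bulk load of the historical data.
history = [
    ("2005-01-15", 1, 100.0),
    ("2005-02-03", 2, 40.0),
    ("2005-04-02", 1, 250.0),
]
bulk_load(conn, history)

# Periodic incremental load: only rows newer than the latest loaded date.
source_now = history + [("2005-05-16", 3, 75.0)]
last_loaded = conn.execute("SELECT MAX(sale_date) FROM sales_fact").fetchone()[0]
new_rows = [row for row in source_now if row[0] > last_loaded]
bulk_load(conn, new_rows)

total = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(total, new_rows)
```

Here executemany stands in for a bulk loader. Production warehouses typically use the database's own bulk-load utility instead.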

Data warehouse techniques
- Data Mining
- OLAP

Data Mining
Data mining access of a database differs from traditional access in several ways:
- Query: The query might not be well formed or precisely stated. The data miner might not even be exactly sure of what he wants to see.
- Data: The data accessed is usually a different version from that of the original operational database. The data have been cleaned and modified to better support the mining process.
- Output: The output of the data mining query probably is not a subset of the database. Instead, it is the output of some analysis of the contents of the database.

Data mining algorithms can be characterized as consisting of three parts:
- Model: The purpose of the algorithm is to fit a model to the data.
- Preference: Some criteria must be used to fit one model over another.
- Search: All algorithms require some technique to search the data.
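
The three parts can be seen in even a very small fitting problem. The sketch below is an illustration chosen for brevity, not an algorithm from the slides. The model is a straight line, the preference criterion is the sum of squared errors, and the search is an exhaustive scan over a small grid of candidate parameters. The sample data and the candidate grid are made up.

```python
def model(params, x):
    """Model: a straight line y = slope * x + intercept."""
    slope, intercept = params
    return slope * x + intercept


def preference(params, data):
    """Preference: sum of squared errors; lower means a better fit."""
    return sum((model(params, x) - y) ** 2 for x, y in data)


def search(data, candidates):
    """Search: try every candidate and keep the one the preference favors."""
    return min(candidates, key=lambda params: preference(params, data))


# Made-up observations that lie exactly on y = 2x + 1.
data = [(1, 3), (2, 5), (3, 7), (4, 9)]

# Candidate (slope, intercept) pairs on a small integer grid.
candidates = [(slope, intercept) for slope in range(5) for intercept in range(5)]

best = search(data, candidates)
print(best, preference(best, data))
```

Practical algorithms differ mainly in how rich the model is and how cleverly the search avoids trying every candidate.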

4. OLAP

[Figure: SALES CUBE. Dimensions: Customer (A to D), Product (1 to 5), and Time (Q1, Q2). Measure: Sales.]

- The general activity of querying and presenting text and number data from data warehouses in a dimensional format is known as OLAP.
- The OLAP vendors’ technology is non-relational and is almost always based on an explicit multidimensional cube of data.
- OLAP databases are also known as multidimensional databases, or MDDBs.
- OLAP installations would be classified as small, individual data marts when viewed against the full range of data warehouse applications.

SALES CUBE (front face of the cube: Sales by Customer and Product)

Customer | Product 1 | Product 2 | Product 3 | Product 4 | Product 5
---------|-----------|-----------|-----------|-----------|----------
A        | 11        | 43        | 12        | 49        | 71
B        | 33        | 15        | 65        | 94        | 45
C        | 59        | 77        | 37        | 78        | 12
D        | 09        | 53        | 20        | 73        | 32
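
A small sketch of how such a cube can be held and queried, using the customer-by-product values shown on the slide. The slide does not say which quarter those values belong to, so assigning them to Q1 is an assumption. The function names roll_up and slice_cube are hypothetical. A dictionary keyed by (customer, product, quarter) stands in for a multidimensional store.

```python
from collections import defaultdict

# Values from the slide: one row per customer, one column per product 1..5.
slide_values = {
    "A": [11, 43, 12, 49, 71],
    "B": [33, 15, 65, 94, 45],
    "C": [59, 77, 37, 78, 12],
    "D": [9, 53, 20, 73, 32],
}
QUARTER = "Q1"  # assumption: the slide does not state the quarter

# Cube cells keyed by (customer, product, quarter); the value is Sales.
cube = {
    (customer, product, QUARTER): sales
    for customer, row in slide_values.items()
    for product, sales in enumerate(row, start=1)
}


def roll_up(cube, keep):
    """Sum Sales over every dimension except the one at index `keep`."""
    totals = defaultdict(int)
    for key, sales in cube.items():
        totals[key[keep]] += sales
    return dict(totals)


def slice_cube(cube, axis, value):
    """Keep only the cells whose coordinate on `axis` equals `value`."""
    return {key: sales for key, sales in cube.items() if key[axis] == value}


sales_by_customer = roll_up(cube, keep=0)
sales_by_product = roll_up(cube, keep=1)
customer_c = slice_cube(cube, axis=0, value="C")
print(sales_by_customer)
print(sales_by_product)
```

Rolling up along one dimension and slicing on another are the basic dimensional operations; dedicated OLAP engines precompute and index these aggregates rather than scanning every cell.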

Thank You