Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction DATAWAREHOUSE www.notesvillage.com. What is Data Warehousing? A process of transforming data into information and making it available to.

Similar presentations


Presentation on theme: "Introduction DATAWAREHOUSE www.notesvillage.com. What is Data Warehousing? A process of transforming data into information and making it available to."— Presentation transcript:

1 Introduction DATAWAREHOUSE www.notesvillage.com

2 What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research] Data Information www.notesvillage.com

3 What are the users saying... Data should be integrated across the enterprise Summary data has a real value to the organization Historical data holds the key to understanding data over time What-if capabilities are required www.notesvillage.com

4 What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context. [Barry Devlin] www.notesvillage.com

5 Data, Data everywhere yet... I can’t find the data I need data is scattered over the network many versions, subtle differences I can’t get the data I need –need an expert to get the data I can’t understand the data I found –available data poorly documented I can’t use the data I found –results are unexpected –data needs to be transformed from one form to other www.notesvillage.com

6 Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? What impact will new products/services have on revenue and margins? What product prom- -otions have the biggest impact on revenue? What is the most effective distribution channel? A producer wants to know…. www.notesvillage.com

7 Necessity Need for Strategic Information Retain consumer base Increase customer base by 20% Enhance consumer satisfaction level Launch two new products Information Crisis How to maintain lots of data in the organization IT resources and strategy is not effective for data Opportunities and Risks: Quick decisions Managerial analysis Crisis management www.notesvillage.com

8 History of DSS Ad hoc reports Small applications Information centers Decision support system www.notesvillage.com

9 Disadvantages Many adhoc requests. Supplementary report problems No interactivity of the user Flexible informations. www.notesvillage.com

10 OPERATIONAL VS INFORMATIONAL Current Values Optimized for transaction High frequency Read Update Delete Large user Response time is very fast Archived value Optimized for complex queries Low Read Adhoc,Random Relatively more several seconds www.notesvillage.com

11 Scenerio 1 ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Banglore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system. www.notesvillage.com

12 Scenario 1 : ABC Pvt Ltd. Mumbai Delhi Chennai Banglore Sales Manager Sales per item type per branch for first quarter. www.notesvillage.com

13 Solution 1:ABC Pvt Ltd Extract sales information from each database. Store the information in a common repository at a single site. www.notesvillage.com

14 Solution 1:ABC Pvt Ltd. Mumbai Delhi Chennai Banglore Data Warehouse Sales Manager Query & Analysis tools Report www.notesvillage.com

15 Scenario 2 : One Stop Shopping Operational Database Data Entry Operator Management Wait Report www.notesvillage.com

16 Solution 2 Extract data needed for analysis from operational database. Store it in warehouse. Refresh warehouse at regular interval so that it contains up to date information for analysis. Warehouse will contain data with historical perspective. www.notesvillage.com

17 Solution 2 Operational database Data Warehouse Extract data Data Entry Operator Data Entry Operator Manager Report Transaction www.notesvillage.com

18 Scenario 3 Cakes & Cookies is a small,new company.President of the company wants his company should grow.He needs information so that he can make correct decisions. www.notesvillage.com

19 Solution 3 Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query analysis tools to support adhoc queries. www.notesvillage.com

20 Solution 3 Query and Analysis tool President Expansio n Improvemen t sales time Data Warehouse www.notesvillage.com

21 Definitions of a Data Warehouse - W.H. Inmon “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process” - Ralph Kimball “A copy of transaction data, specifically structured for query and analysis” 1. 2. www.notesvillage.com

22 Data Warehouse For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW) DW allows an organization (enterprise) to remember what it has noticed about its data Data Mining techniques make use of the data in a DW www.notesvillage.com

23 What is Data Warehouse?? Inmons’s definition A data warehouse is -subject-oriented, -integrated, -time-variant, -nonvolatile collection of data in support of management’s decision making process www.notesvillage.com

24 Subject-oriented Data warehouse is organized around subjects such as sales, product, customer. It focuses on modeling and analysis of data for decision makers. Excludes data not useful in decision support process. www.notesvillage.com

25 Integrated Data Integration Data Warehouse is constructed by integrating multiple heterogeneous sources. Data Preprocessing are applied to ensure consistency. RDBMS Legacy System Data Warehouse Flat File www.notesvillage.com

26 Integration In terms of data. encoding structures. Measurement of attributes. physical attribute. of data naming conventions. Data type format www.notesvillage.com

27 Time-variant Provides information from historical perspective e.g. past 5-10 years Data is stored as snapshots Every key structure contains either implicitly or explicitly an element of time Allows analysis for the past Relates information to the present Enables forecasts for the future www.notesvillage.com

28 Nonvolatile Data once recorded cannot be updated. Data warehouse requires two operations in data accessing Initial loading of data Access of data www.notesvillage.com

29 Data Granularity Operational system is kept at the lowest level Data is summarized at different levels The lower the level the more data granularity. www.notesvillage.com

30 Operational v/s Information System FeaturesOperationalInformation CharacteristicsOperational processing Informational processing OrientationTransactionAnalysis UserClerk,DBA,database professional Knowledge workers FunctionDay to day operationDecision support DataCurrentHistorical ViewDetailed,flat relationalSummarized, multidimensional DB designApplication orientedSubject oriented Unit of workShort,simple transaction Complex query AccessRead/writeMostly read www.notesvillage.com

31 Data Mart A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. A Data Mart typically reflects the business rules of a specific business unit within an enterprise. www.notesvillage.com

32 Data Warehouse to Data Mart Data Warehouse Data Mart Decision Support Information Decision Support Information Decision Support Information www.notesvillage.com

33

34 Data Mart Layer Presentation/ Desktop Access Layer Meta-data Repository Layer Warehouse Management Layer Core DW Layer Data Staging and Quality Layer Data Access Layer Operational Data Layer External Data Layer Data Feed/ Data Mining/ Indexing Layer Virtual DW Coarse DW Central DW Distributed DW Application Messaging (Transport) Layer Internet/Intranet Layer direct queries virtual queries ad hoc queries 1 2a 2b 2c 3 4 56 7 8 9 10 11 Non-operational Data Layer Data Marts vs Data Warehouses www.notesvillage.com

35 Operational vs. Informational Systems OperationalSystems Information Delivery System DataWarehouse InformationalSystems DataMarts www.notesvillage.com

36 Virtual Date Warehouse A Virtual Data Warehouse approach is often chosen when there are infrequent demands for data and management wants to determine if/how users will use operational data. One of the weaknesses of a Virtual Data Warehouse approach is that user queries a made against operational DBs. One way to minimize this problem is to build a “Query Monitor” to check the performance characteristics of a query before executing it. www.notesvillage.com

37 A Coarse Data Warehouse is often chosen when the organization has a relatively clean/new operational system and management wants to make the operational data more easily available for just that system. A Central Data Warehouse is often chosen when the organization has a clear understanding about it Information Access needs and wants to provide “quality”, “integrated”, information to its knowledge workers A Distributed Data Warehouse is similar in most respects to a Central Data Warehouse, except that the data is distributed to separate mini-Data Warehouses (Data Marts )on local or specialized servers www.notesvillage.com

38 Data Mart Layer Presentation/ Desktop Access Layer Meta-data Repository Layer Warehouse Management Layer Core DW Layer Data Staging and Quality Layer Data Access Layer Operational Data Layer External Data Layer Data Feed/ Data Mining/ Indexing Layer Virtual DW Coarse DW Central DW Application Messaging (Transport) Layer Distributed DW Internet/Intranet Layer direct queries virtual queries ad hoc queries 1 2a 2b 2c 3 4 56 7 8 9 10 11 Non-operational Data Layer Central Data Warehouse www.notesvillage.com

39 Top-down approach Single, Central storage of data about the content Centralized Rule and Content May seek quick result if implemented with Iteration Needs high level cross functional skills High outlay with proof of concepts High exposure to risk/Failures www.notesvillage.com

40

41 Data Warehouse Architectures Generic Two-Level Architecture Independent Data Mart Dependent Data Mart and Operational Data Store Logical Data Mart and Real-Time Data Warehouse Three-Layer architecture ETL All involve some form of extraction, transformation and loading (ETL) www.notesvillage.com

42 Process Architecture These interrelated sub-processes are sometimes referred to as an “ETL” process. 1)Extract- Since data for the data warehouse can come from different sources and may be of different types, the plan to extract the data along with appropriate compression and encryption techniques is an important requirement for consideration. 2)Transform- Transformation of data with appropriate conversion, aggregation and cleaning besides de-normalization and surrogate key management is also an important process to be planned for building a data warehouse. 3)Load- Steps to be considered to load data with optimization by considering the multiple areas where the data is targeted to be loaded and retrieved is also an important part of the data warehouse architecture plan www.notesvillage.com

43 Data Model Architecture In Data Model Architecture (also known as Dimensional Data Model), there are 3 main data modeling styles for enterprise warehouses:Dimensional Data Model 3rd Normal Form - Top Down Architecture, Top Down Implementation Federated Star Schemas - Bottom Up Architecture, Bottom Up ImplementationStar Schemas Data Vault - Top Down Architecture, Bottom Up Implementation. www.notesvillage.com

44 Technology Architecture Scalability and flexibility is required in all facets. The extent of these features are largely depend upon organizational size, business requirements, nature of business etc. Technology or Technical architecture primary evolved from derivations from the process architecture, meta data management requirements based on business rules and security levels implementations and technology tool specific evaluation. Besides these, the Technology architecture also looks into the various technology implementation standards in database management, database connectivity protocols (ODBC, JDBC, OLE DB etc), Middleware (based on ORB, RMI, COM/DOM etc.), Network protocols (DNS, LDAP etc) and other related technologies. www.notesvillage.com

45 Information Architecture Information Architecture is the process of translating the information from one form to another in a step by step sequence so as to manage the storage, retrieval, modification and deletion of the data in the data warehouse. Information Architecture www.notesvillage.com

46 Resource Architecture Resource architecture is related to software architecture in that many resources come from software resources. Resources are important because they help determine performance. Workload is the other part of the equation. If you have enough resources to complete the workload in the right amount of time, then performance will be high. www.notesvillage.com

47 Figure 11-2: Generic two-level data warehousing architecture E T L One, company- wide warehouse Periodic extraction  data is not completely current in warehouse www.notesvillage.com

48 Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality Fixing errors: Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies Also: Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data Figure 11-10: Steps in data reconciliation (cont.) www.notesvillage.com

49 Figure 11-3 Independent data mart data warehousing architecture Data marts: Mini-warehouses, limited in scope E T L Separate ETL for each independent data mart Data access complexity due to multiple data marts www.notesvillage.com

50 Figure 11-4 Dependent data mart with operational data store: a three-level architecture E T L Single ETL for enterprise data warehouse(EDW) Simpler data access ODS ODS provides option for obtaining current data Dependent data marts loaded from EDW www.notesvillage.com

51 E T L Near real-time ETL for Data Warehouse ODS data warehouse ODS and data warehouse are one and the same Data marts are NOT separate databases, but logical views of the data warehouse  Easier to create new data marts Figure 11-5 Logical data mart and real time warehouse architecture www.notesvillage.com

52 Figure 11-6 Three-layer data architecture for a data warehouse www.notesvillage.com


Download ppt "Introduction DATAWAREHOUSE www.notesvillage.com. What is Data Warehousing? A process of transforming data into information and making it available to."

Similar presentations


Ads by Google