Designing the data warehouse / data marts Methodologies and Techniques.

1 Designing the data warehouse / data marts Methodologies and Techniques

2 Basic principles

3 Life cycle of the DW Operational Databases feed the Warehouse Database: First-time load, Refresh (repeated), Purge or Archive

4 Oracle Warehouse Components Relational tools Applications / Web Any Data Any Access Any Source External data Operational data OLAP tools Text, image Oracle Medi` Relational / Multidimensional Spatial Audio, video Web

5 Oracle Intelligence Tools IS develops user’s Views Oracle Reports Current Business users Oracle Discoverer Tactical Analysts Oracle Express Strategic

6 Oracle Data Mart Suite
–Data Modeling: Oracle Data Mart Designer
–Data Management: Oracle Enterprise Manager
–Data Extraction: Oracle Data Mart Builder
–Data Access & Analysis: Discoverer & Oracle Reports
–OLTP Engines: OLTP Databases
–Warehousing Engines: Data Mart Database, Oracle8, SQL*PLUS

7 “Big Bang” Approach: Advantages and Disadvantages
Advantages:
–Warehouse is built as part of a major project (e.g. BPR)
–Gives a “big picture” of the data warehouse before the project starts
Disadvantages:
–Involves high risk and takes longer
–Runs the risk that requirements change during the project
–Costly, and harder to win user support for

8 Incremental Approach to Warehouse Development
–Multiple iterations
–Shorter implementations
–Validation of each phase
Phases: Strategy, Definition, Analysis, Design, Build, Production

9 Benefits of an Incremental Approach
–Delivers a strategic data warehouse solution through incremental development efforts
–Provides an extensible, scalable architecture
–Quickly provides business benefits and ensures a much earlier return on investment
–Allows a data warehouse to be built one subject or application area at a time
–Allows the construction of an integrated data mart environment

10 Data Mart A subset of a data warehouse that supports the requirements of a particular department or business function. Characteristics include: –Do not normally contain detailed operational data unlike data warehouses. –May contain certain levels of aggregation

11 Dependent Data Mart (diagram: Operational Systems, Flat Files and External Data feed the Data Warehouse, which in turn feeds the Data Marts; labels shown: Marketing, Sales, Finance, Human Resources)

12 Independent Data Mart Sales or Marketing External Data Flat Files Operational Systems

13 Reasons for Creating a Data Mart To give users more flexible access to the data they need to analyse most often. To provide data in a form that matches the collective view of a group of users To improve end-user response time. Potential users of a data mart are clearly defined and can be targeted for support

14 Reasons for Creating a Data Mart To provide appropriately structured data as dictated by the requirements of the end-user access tools. Building a data mart is simpler compared with establishing a corporate data warehouse. The cost of implementing data marts is far less than that required to establish a data warehouse.

15 Data Mart Issues
–Data mart functionality
–Data mart size
–Data mart load performance
–User access to data in multiple data marts
–Data mart Internet / Intranet access
–Data mart administration
–Data mart installation

16 Example of a DW tool: OLAP
–Rotate and drill down to successive levels of detail.
–Create and examine calculated data interactively on large volumes of data.
–Determine comparative or relative differences.
–Perform exception and trend analysis.
–Perform advanced analytical functions, for example forecasting, modeling, and regression analysis.
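Drill-down and rotation can be illustrated with a few lines of Python. This is only a sketch: the rows, the year > quarter > month hierarchy, and the function names `drill` and `pivot` are hypothetical and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical rows: (year, quarter, month, region, sales)
rows = [
    (1999, "Q1", "Jan", "North", 100),
    (1999, "Q1", "Feb", "North", 120),
    (1999, "Q1", "Jan", "South", 90),
    (1999, "Q2", "Apr", "South", 60),
]
LEVELS = ("year", "quarter", "month")


def drill(depth):
    """Total sales grouped by the first `depth` levels of the time hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[4]
    return dict(totals)


def pivot(row_axis, col_axis):
    """Cross-tabulate sales; swapping the two arguments rotates the view."""
    names = LEVELS + ("region",)
    r, c = names.index(row_axis), names.index(col_axis)
    table = defaultdict(dict)
    for row in rows:
        table[row[r]][row[c]] = table[row[r]].get(row[c], 0) + row[4]
    return dict(table)
```

Calling `drill(1)`, then `drill(2)`, then `drill(3)` moves from yearly totals down to monthly totals; `pivot("region", "quarter")` and `pivot("quarter", "region")` show the same figures with the axes exchanged.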

17 Original OLAP Rules 1. Multidimensional conceptual view 2. Transparency 3. Accessibility 4. Consistent reporting performance 5. Client-server architecture

18 Original OLAP Rules 6. Multiuser support 7. Unrestricted cross-dimensional operations 8. Intuitive data manipulation 9. Flexible reporting 10. Unlimited dimensions and aggregation levels

19 Relational Database Model
        Emp No.  Name      Age  Gender
Row 1   1001     Anderson  31   F
Row 2   1007     Green     42   M
Row 3   1010     Lee       22   M
Row 4   1020     Ramos     32   F
Columns: Attribute 1 Name, Attribute 2 Age, Attribute 3 Gender, Attribute 4 Emp No.
The table above illustrates the employee relation.

20 Multidimensional Database Model The data is found at the intersection of dimensions. Store GL_Line Time FINANCE Store Product Time SALES Customer
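The idea that data sits at the intersection of dimensions can be sketched with a dictionary keyed by one value per dimension. The SALES dimensions Store, Product and Time come from the slide; the cell values and the helper names `cell` and `slice_total` are invented for illustration.

```python
# Hypothetical cube: each cell sits at the intersection of (store, product, month).
DIMENSIONS = ("store", "product", "month")

sales_cube = {
    ("Store 1", "Product A", "Jan"): 120,
    ("Store 1", "Product B", "Jan"): 80,
    ("Store 2", "Product A", "Jan"): 200,
    ("Store 2", "Product A", "Feb"): 150,
}


def cell(store, product, month):
    """Look up one cell; an empty intersection returns None."""
    return sales_cube.get((store, product, month))


def slice_total(**fixed):
    """Sum every cell whose coordinates match the fixed dimension values."""
    total = 0
    for coords, value in sales_cube.items():
        point = dict(zip(DIMENSIONS, coords))
        if all(point[dim] == wanted for dim, wanted in fixed.items()):
            total += value
    return total
```

Fixing one dimension, as in `slice_total(product="Product A")`, gives a slice of the cube; fixing none gives the grand total.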

21 Two dimensions

22 Three dimensions

23 Specialised Multidimensional tool
Benefits:
–Quick access to very large volumes of data
–Extensive and comprehensive libraries of complex analysis functions
–Strong modeling and forecasting capabilities
–Can access multidimensional and relational database structures
–Caters for calculated fields
Disadvantages:
–Difficulty of changing model
–Lack of support for very large volumes of data
–May require significant processing power

24 MOLAP Server
–The application layer stores data in a multidimensional structure
–The presentation layer provides the multidimensional view
–Efficient storage and processing
–Complexity hidden from the user
–Analysis using preaggregated summaries and precalculated measures
Diagram: DSS client, Application layer (MOLAP Engine), Warehouse
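Analysis over preaggregated summaries can be sketched as follows. This is an assumption-laden toy, not how any particular MOLAP engine stores data: it simply builds one summary per subset of dimensions at load time so that queries become lookups. The rows and the names `preaggregate` and `query` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("store", "product", "month")

# Hypothetical atomic rows loaded from the warehouse.
atomic = [
    {"store": "S1", "product": "A", "month": "Jan", "units": 5},
    {"store": "S1", "product": "B", "month": "Jan", "units": 3},
    {"store": "S2", "product": "A", "month": "Feb", "units": 7},
]


def preaggregate(rows):
    """Build one summary per subset of dimensions at load time."""
    summaries = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row["units"]
            summaries[dims] = dict(totals)
    return summaries


summaries = preaggregate(atomic)


def query(dims, key):
    """Answer from the stored summary; the atomic rows are not scanned again."""
    return summaries[dims].get(key, 0)
```

The trade-off is visible even here: three dimensions already produce eight summaries, so storage grows quickly as dimensions are added.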

25 ROLAP Server
–The warehouse stores atomic data.
–The application layer generates SQL for the three-dimensional view.
–The presentation layer provides the multidimensional view.
Diagram: DSS client, Application layer (ROLAP engine), Warehouse server, Multiple SQL
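A minimal sketch of an application layer that generates SQL over atomic data, using Python's built-in `sqlite3` module as a stand-in warehouse. The table `sales`, its columns, and the functions `build_rollup_sql` and `view` are assumptions made for the example; table and column names are interpolated directly, so they are assumed to come from trusted code rather than user input.

```python
import sqlite3


def build_rollup_sql(fact_table, measure, dimensions):
    """Return a query that aggregates `measure` over the requested dimensions."""
    cols = ", ".join(dimensions)
    if not cols:
        return f"SELECT SUM({measure}) AS total FROM {fact_table}"
    return (f"SELECT {cols}, SUM({measure}) AS total "
            f"FROM {fact_table} GROUP BY {cols} ORDER BY {cols}")


# Stand-in warehouse holding atomic rows only (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, month TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("S1", "A", "Jan", 5),
    ("S1", "B", "Jan", 3),
    ("S2", "A", "Feb", 7),
])


def view(dimensions):
    """Run the generated SQL and return the rows for one multidimensional view."""
    return conn.execute(build_rollup_sql("sales", "units", dimensions)).fetchall()
```

Each change of view, such as `view(["product"])` followed by `view(["store", "month"])`, issues a fresh statement against the warehouse, which is one reading of the "Multiple SQL" label in the diagram.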

26 MOLAP Express Server Express user Warehouse Query Data MDDB Periodic load

27 ROLAP Express Server Express user Warehouse Data cache Live fetch Cache Query Data Also Hybrid (HOLAP)

28 Choosing a Reporting Architecture
–Business needs
–Potential for growth
–Interface
–Enterprise architecture
–Network architecture
–Speed of access
–Openness
Chart: MOLAP and ROLAP compared by query performance (Good, OK) and analysis (Simple, Complex)

29 Data Acquisition Identify, extract, transform, and transport source data Consider internal and external data Perform gap analysis between source data and target database objects Plan move of data between sources and target Define first-time load and refresh strategy Define tool requirements Build, test, and execute data acquisition modules
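The extract, transform and transport steps can be sketched end to end with the standard library. Everything concrete here is an assumption for illustration: the CSV source text, the target table `employee_dim`, and the cleaning rules. `INSERT OR REPLACE` is used so that the same routine can serve both the first-time load and later refreshes.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with inconsistent spacing and case.
SOURCE_CSV = "emp_no,name,gender\n1001, anderson ,f\n1007,Green,M\n"


def extract(text):
    """Read source records as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Clean and type the records to match the target table."""
    return [
        (int(r["emp_no"]), r["name"].strip().title(), r["gender"].strip().upper())
        for r in records
    ]


def load(conn, rows):
    """Write rows to the target; re-running refreshes rows with the same key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employee_dim "
        "(emp_no INTEGER PRIMARY KEY, name TEXT, gender TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO employee_dim VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))  # first-time load
```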

30 Modeling
Warehouses differ from operational structures:
–Analytical requirements
–Subject orientation
Data must map to subject-oriented information:
–Identify business subjects
–Define relationships between subjects
–Name the attributes of each subject
Modeling is iterative
Modeling tools are available

31 Modeling the Data Warehouse
Select a business process, then:
1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model

32 Identifying Business Rules
Product
  Type: PC, Server
  Monitor: 15 inch, 17 inch, 19 inch, None
  Status: New, Rebuilt, Custom
Location
  Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles
Store
  Store > District > Region
Time
  Month > Quarter > Year
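Rules like these become small lookup functions once they are written down. The sketch below encodes the proximity bands and the two hierarchies; the store, district and region names are invented, and the handling of the band boundaries (exactly 1 or 5 miles falls in the lower band) is an assumption, since the slide does not say.

```python
# Hypothetical Store > District > Region assignments.
STORE_TO_DISTRICT = {"Store 1": "District A", "Store 2": "District A", "Store 3": "District B"}
DISTRICT_TO_REGION = {"District A": "Region X", "District B": "Region X"}


def proximity_band(miles):
    """Map a distance to one of the three geographic proximity bands."""
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= 1:   # assumption: exactly 1 mile belongs to the lower band
        return "0 - 1 miles"
    if miles <= 5:   # assumption: exactly 5 miles belongs to the middle band
        return "1 - 5 miles"
    return "> 5 miles"


def region_of(store):
    """Roll a store up through its district to its region."""
    return DISTRICT_TO_REGION[STORE_TO_DISTRICT[store]]


def quarter_of(month):
    """Roll a month number (1-12) up to its quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (month - 1) // 3 + 1
```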

33 Creating the Dimensional Model
Identify fact tables
–Translate business measures into fact tables
–Analyze source system information for additional measures
–Identify base and derived measures
–Document additivity of measures
Identify dimension tables
Link fact tables to the dimension tables
Create views for users
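The distinction between base and derived measures, and why additivity needs documenting, shows up in a two-row example. The figures and field names below are hypothetical: `units` and `unit_price` are treated as base measures, `revenue` as a derived one.

```python
# Hypothetical fact rows with two base measures.
facts = [
    {"units": 10, "unit_price": 2.0},
    {"units": 30, "unit_price": 4.0},
]


def with_derived(row):
    """Add the derived measure revenue = units * unit_price."""
    return {**row, "revenue": row["units"] * row["unit_price"]}


rows = [with_derived(r) for r in facts]

# Additive measures can simply be summed across rows.
total_units = sum(r["units"] for r in rows)
total_revenue = sum(r["revenue"] for r in rows)

# unit_price is not additive: summing it across rows is meaningless.
# A correct overall price is derived from the additive totals instead.
average_unit_price = total_revenue / total_units
```

Averaging the two prices directly would give 3.0, while the ratio of the additive totals gives 3.5, which is why the additivity of each measure is worth recording in the model.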

34 Dimension Tables
Dimension tables have the following characteristics:
–Contain textual information that represents the attributes of the business
–Contain relatively static data
–Are joined to a fact table through a foreign key reference
Diagram: Product, Channel, Customer and Time around Facts (units, price)

35 Fact Tables
Fact tables have the following characteristics:
–Contain numeric measures (metrics) of the business
–May contain summarized (aggregated) data
–May contain date-stamped data
–Are typically additive
–Have a key that is typically a concatenated key composed of the primary keys of the dimensions
–Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
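The concatenated key and the foreign key references can be expressed directly in DDL. The sketch below uses SQLite through Python's `sqlite3` module; the table and column names are assumptions loosely based on the Product, Customer and Time dimensions with units and price as facts (Channel is left out to keep it short). SQLite only enforces foreign keys when `PRAGMA foreign_keys` is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, product_desc  TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE time_dim (day_id      INTEGER PRIMARY KEY, month_id      INTEGER);
CREATE TABLE sales_fact (
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    day_id      INTEGER NOT NULL REFERENCES time_dim(day_id),
    units       INTEGER,
    price       REAL,
    PRIMARY KEY (product_id, customer_id, day_id)  -- concatenated key
);
INSERT INTO product  VALUES (1, 'Widget');
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO time_dim VALUES (19990101, 199901);
INSERT INTO sales_fact VALUES (1, 1, 19990101, 5, 9.99);
""")


def try_insert(row):
    """Return True if the fact row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

A second row with the same three key values is rejected by the primary key, and a row that names a product missing from the dimension is rejected by the foreign key.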

36 Dimensional Model (Star Schema)
Dimension tables: Product, Channel, Customer, Time
Fact table: Facts (units, price)

37 Star Schema Model
Central fact table, radiating dimensions, denormalized model
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Product Table: Product_id, Product_desc, …
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
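A query against a star schema joins the fact table to each dimension it needs in a single hop. The sketch below borrows the Store, Time and Sales Fact column names from the slide and runs on SQLite; the rows are made up, the Item and Product tables are omitted for brevity, and the Time table is called `time_dim` here as an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);
CREATE TABLE sales_fact (store_id INTEGER, day_id INTEGER,
                         sales_dollars REAL, sales_units INTEGER);
INSERT INTO store VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO time_dim VALUES (19990105, 199901, 1999), (19990210, 199902, 1999);
INSERT INTO sales_fact VALUES
    (1, 19990105, 100.0, 4),
    (2, 19990105,  50.0, 2),
    (3, 19990105,  70.0, 3),
    (1, 19990210,  30.0, 1);
""")

# Sales dollars by district and month: one join per dimension, no chains.
STAR_QUERY = """
SELECT s.district_id, t.month_id, SUM(f.sales_dollars)
FROM sales_fact AS f
JOIN store    AS s ON s.store_id = f.store_id
JOIN time_dim AS t ON t.day_id   = f.day_id
GROUP BY s.district_id, t.month_id
ORDER BY s.district_id, t.month_id
"""

result = conn.execute(STAR_QUERY).fetchall()
```

Because `district_id` is stored on the store row itself (the denormalized part), rolling up to district needs no extra table.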

38 Star Schema Model
Advantages:
–Easy for users to understand
–Fast response to queries
–Simple metadata
–Supported by many front end tools
Disadvantages:
–Less robust to change
–Slower to build
–Does not support history

39 Snowflake Schema Model
Time Table: Week_id, Period_id, Year_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
Product Table: Product_id, Product_desc
Item Table: Item_id, Item_desc, Dept_id
Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
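In a snowflake schema the dimensions are normalized, so a query may have to walk a chain of tables. The sketch below follows the Item, Dept and Mgr tables from the slide on SQLite, with invented rows; as a simplification, the Mgr table here carries only Mgr_id and Mgr_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mgr  (mgr_id  INTEGER PRIMARY KEY, mgr_name TEXT);
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_desc TEXT,
                   mgr_id  INTEGER REFERENCES mgr(mgr_id));
CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_desc TEXT,
                   dept_id INTEGER REFERENCES dept(dept_id));
CREATE TABLE sales_fact (item_id INTEGER REFERENCES item(item_id),
                         store_id INTEGER, sales_dollars REAL, sales_units INTEGER);
INSERT INTO mgr  VALUES (1, 'Lee'), (2, 'Ramos');
INSERT INTO dept VALUES (10, 'Toys', 1), (20, 'Books', 2);
INSERT INTO item VALUES (100, 'Kite', 10), (101, 'Puzzle', 10), (200, 'Atlas', 20);
INSERT INTO sales_fact VALUES (100, 1, 20.0, 2), (101, 1, 30.0, 1), (200, 2, 15.0, 3);
""")

# Sales by manager: the fact table reaches Mgr only through Item and Dept.
SNOWFLAKE_QUERY = """
SELECT m.mgr_name, SUM(f.sales_dollars)
FROM sales_fact AS f
JOIN item AS i ON i.item_id = f.item_id
JOIN dept AS d ON d.dept_id = i.dept_id
JOIN mgr  AS m ON m.mgr_id  = d.mgr_id
GROUP BY m.mgr_name
ORDER BY m.mgr_name
"""

result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

The three-join chain for a single attribute is the kind of extra work that can weigh on query performance in a snowflake design.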

40 Snowflake Schema Model
Advantages:
–Direct use by some tools
–More flexible to change
–Provides for speedier data loading
Disadvantages:
–May become large and unmanageable
–Degrades query performance
–More complex metadata

41 Using Summary Data Provides fast access to precomputed data Reduces use of I/O, CPU, and memory Is distilled from source systems and precalculated summaries Usually exists in summary fact tables Phase 3: Modeling summaries

42 Designing Summary Tables Units Sales(€) Store Product A Total Product B Total Product C Total Average Maximum Total Percentage

43 Summary Tables Example
SALES FACTS
Sales    Region  Month
10,000   North   Jan 99
12,000   South   Feb 99
11,000   North   Jan 99
15,000   West    Mar 99
18,000   South   Feb 99
20,000   North   Jan 99
10,000   East    Jan 99
2,000    West    Mar 99
SALES BY MONTH/REGION
Month    Region  Tot_Sales$
Jan 99   North   41,000
Jan 99   East    10,000
Feb 99   South   30,000
Mar 99   West    17,000
SALES BY MONTH
Month    Tot_Sales
Jan 99   51,000
Feb 99   30,000
Mar 99   17,000
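The two summary tables are plain roll-ups of the fact rows, which a short sketch can reproduce. The eight fact rows are copied from the SALES FACTS table as listed; the function name `summarize` and the dictionary layout are illustrative choices.

```python
from collections import defaultdict

# The eight SALES FACTS rows: (sales, region, month).
SALES_FACTS = [
    (10_000, "North", "Jan 99"),
    (12_000, "South", "Feb 99"),
    (11_000, "North", "Jan 99"),
    (15_000, "West", "Mar 99"),
    (18_000, "South", "Feb 99"),
    (20_000, "North", "Jan 99"),
    (10_000, "East", "Jan 99"),
    (2_000, "West", "Mar 99"),
]


def summarize(facts, key):
    """Total the sales of all fact rows that share the same summary key."""
    totals = defaultdict(int)
    for sales, region, month in facts:
        totals[key(region, month)] += sales
    return dict(totals)


# SALES BY MONTH/REGION: one total per (month, region) pair.
by_month_region = summarize(SALES_FACTS, lambda region, month: (month, region))

# SALES BY MONTH: a coarser summary with one total per month.
by_month = summarize(SALES_FACTS, lambda region, month: month)
```

Recomputing summaries this way after each load is a cheap check that the stored summary tables still agree with the detail rows.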

44 Summary Management in Oracle8i Product Region Time Sales summary City Sales State Summary usage Summary advisor Space requirements Summary recommendations

45 The Time Dimension How and where should it be stored? Time dimension Sales fact Time is critical to the data warehouse. A consistent representation of time is required for extensibility.
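One common answer to the storage question is a dedicated time dimension table with one row per day, generated once and shared by every fact table. The sketch below builds such rows; the column names and the `YYYYMMDD` integer key are assumptions, not something the slides prescribe.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),  # e.g. 19990401
            "date": day.isoformat(),
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(date(1999, 1, 1), date(1999, 12, 31))
```

Because every fact row points at the same `day_id` values, month, quarter and year roll-ups stay consistent across subject areas as new ones are added.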

