Designing the data warehouse / data mart Methodologies and Techniques.

Presentation on theme: "Designing the data warehouse / data mart Methodologies and Techniques."— Presentation transcript:

1 Designing the data warehouse / data mart Methodologies and Techniques

2 Basic principles

3 Life cycle of the DW
[Diagram: Operational Databases feed the Warehouse Database through a first-time load followed by repeated refreshes; warehouse data is eventually purged or archived]

4 Data transfers into a database
First time system implementation
–From a manual system
Data warehousing projects
Database version upgrade
ERP projects
Migration
–From old to new system

5 Data transfers between systems Dynamic data (eg. sales orders) –Interface required? Static data (eg. customers) –Conversion required?

6 What can go wrong
Data not available
–Feature activated from implementation onwards
–Massive data entry
–E.g. different account structure
Data incomplete
Data inconsistent (e.g. engineering vs accounts)
Wrong level of granularity
Data not clean
New system requires changes – new product codes

7 Data cleaning must address
Different departments recording the same information under different codes
Multiple records of the same company (under different names)
Fields missing in input tables (e.g. c/o)
Different departments recording different addresses for the same customer
Use of different units for time periods
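
One of the cleaning problems above, multiple records of the same company under different names, can be partly automated. The following is a minimal sketch, not a method taken from the slides: the suffix list, the record layout and the sample data are all assumptions made for illustration, and real name matching usually needs manual review as well.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes to ignore when comparing company names.
SUFFIXES = {"ltd", "limited", "inc", "plc", "co", "company"}

def normalise_name(name: str) -> str:
    """Lower-case the name, strip punctuation and drop legal-form suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def group_duplicates(records):
    """Group records by normalised name and keep only groups with more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise_name(rec["name"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}

# Made-up sample: two departments hold the same firm under different codes and names.
records = [
    {"code": "A017", "name": "Acme Ltd.", "dept": "accounts"},
    {"code": "ENG-3", "name": "ACME Limited", "dept": "engineering"},
    {"code": "A099", "name": "Bolt & Co", "dept": "accounts"},
]
duplicates = group_duplicates(records)
```

Each group returned is a candidate conflict to resolve, for example by allocating one new code to all records in the group.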

8 Labour intensive tasks
Data entry
Data checks
Working on solving conflicts
Allocating new codes
Solution = introduce as much automation as possible
–SQL / SQL*Loader (Oracle)
–Custom conversion programmes to extract, modify and upload data
–Filtering
–Parsing (e.g. Excel)
–Staging areas for conversion in progress

9 Data utilities
ORACLE is king of data handling
Export: to transfer data between DBs
–Extracts both table structure and data content into a dump file
Import: the corresponding facility
SQL*Loader: automatic import from a variety of file formats into DB files
–Needs a control file

10 Control files: using SQL*Loader
Data transfers in and out of the DB can be automated using the loader
–Create a data file with the data(!)
–Create a control file to guide the operation
Load creates two files
–Log file
–“bad transactions” file
Also a discard file if the control file has selection criteria in it
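
To make the roles of the log, bad and discard files concrete, here is a rough Python analogue of a load run. It is not SQL*Loader and does not use its control-file syntax; the function name `load`, the column names and both rules are hypothetical. It only shows how incoming rows end up in three buckets plus a log line.

```python
import csv
import io

def load(data_file, select, validate):
    """Split incoming rows into loaded, bad and discarded, and build a log line."""
    loaded, bad, discarded = [], [], []
    for row in csv.DictReader(data_file):
        if not select(row):        # fails the selection criteria: goes to the discard file
            discarded.append(row)
        elif not validate(row):    # selected but invalid: goes to the "bad" file
            bad.append(row)
        else:
            loaded.append(row)
    total = len(loaded) + len(bad) + len(discarded)
    log = f"read={total} loaded={len(loaded)} bad={len(bad)} discarded={len(discarded)}"
    return loaded, bad, discarded, log

# In-memory stand-in for the data file (made-up rows).
data = io.StringIO(
    "sup_code,sup_name,city\n"
    "1001,Acme,Cork\n"
    "10X2,Bolt,Dublin\n"
    "1003,Cog,Paris\n"
)
loaded, bad, discarded, log = load(
    data,
    select=lambda r: r["city"] in {"Cork", "Dublin"},  # hypothetical selection criterion
    validate=lambda r: r["sup_code"].isdigit(),        # hypothetical validity rule
)
```

Keeping rejected and discarded rows separate matters in practice: bad rows need correcting and reloading, while discarded rows were excluded on purpose.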

11 Example 1 – the supplier file
OLD: Sup code (4 digits) | Sup name | Sup address | City | Phone
New supplier code to include city where firm is based
Assignation of category based on amounts purchased

12 Example 1 – the supplier file
OLD: Sup code (4 digits) | Sup name | Sup address | City | Phone
NEW: Sup code (3 letters + 4 digits) | Sup name | Sup address | … | Phone | Cat (1, 2 or 3 depending on total purchases last year)
New supplier code to include city where firm is based
Assignation of category based on amounts purchased
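
A conversion programme for this example could look like the sketch below. The slide only says the new code is 3 letters plus 4 digits and that the category is 1, 2 or 3; taking the first three letters of the city, and the two purchase thresholds, are assumptions chosen for illustration.

```python
def new_supplier_code(old_code: str, city: str) -> str:
    """Prefix the old 4-digit code with three letters taken from the city name."""
    if not (len(old_code) == 4 and old_code.isdigit()):
        raise ValueError(f"expected a 4-digit supplier code, got {old_code!r}")
    letters = "".join(ch for ch in city if ch.isalpha())[:3].upper()
    if len(letters) < 3:
        raise ValueError(f"city {city!r} has fewer than three letters")
    return letters + old_code

def category(total_purchases: float, high: float = 100_000, medium: float = 10_000) -> int:
    """Assign category 1, 2 or 3 from last year's purchases (thresholds are hypothetical)."""
    if total_purchases >= high:
        return 1
    if total_purchases >= medium:
        return 2
    return 3

# Made-up supplier record converted from the OLD layout to the NEW one.
old = {"sup_code": "0042", "sup_name": "Acme", "city": "Cork", "phone": "021-555"}
new = {
    "sup_code": new_supplier_code(old["sup_code"], old["city"]),
    "sup_name": old["sup_name"],
    "phone": old["phone"],
    "cat": category(250_000),
}
```

Rows that raise `ValueError` are the ones that need manual attention before upload.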

13 Example 2 – New Cost Accounting Structure
Maintenance department expenditure: 1 account => separate accounts for different production activities
OLD: Intervention code | Desc. | Date | Labour | Parts | Total

14 Example 2 – New Cost Accounting Structure
Maintenance department expenditure: 1 account => separate accounts for different production activities
OLD: Intervention code | Desc. | Date | Labour | Parts | Total
NEW: Intervention code | Desc. | Date | Labour | Parts | Total | Account

15 Example 3: merging files
Complete customer file based on Accounts and Sales and Shipping
OLD (finance): CustID | name | address | city | account number | credit limit | balance
OLD (sales): CustID* | name | address | city | discount rates | sales_to_date | rep_name
OLD (shipping): CustID** | name | address | city | Preferred haulier
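
The merge can be sketched as follows. The asterisks on CustID suggest the three systems do not share customer identifiers, so this sketch matches on a normalised (name, city) pair instead; that matching rule, the field names and the "first source wins" conflict policy are assumptions, not something the slides specify.

```python
def match_key(rec):
    """Build a match key from name and city, ignoring case and extra spaces."""
    return (" ".join(rec["name"].lower().split()), rec["city"].lower().strip())

def merge_customers(finance, sales, shipping):
    """Merge the three customer files into one record per matched customer."""
    merged = {}
    sources = (("finance", finance), ("sales", sales), ("shipping", shipping))
    for source, records in sources:
        for rec in records:
            target = merged.setdefault(match_key(rec), {"source_ids": {}})
            target["source_ids"][source] = rec["cust_id"]  # keep every original ID
            for field, value in rec.items():
                if field != "cust_id":
                    target.setdefault(field, value)        # first source wins on conflicts
    return list(merged.values())

# Made-up records describing the same customer in all three systems.
finance = [{"cust_id": "F1", "name": "Acme", "address": "1 Quay", "city": "Cork",
            "account_number": "IE01", "credit_limit": 5000, "balance": 120}]
sales = [{"cust_id": "S9", "name": "ACME ", "address": "1 Quay St", "city": "cork",
          "discount_rate": 0.05, "sales_to_date": 9000, "rep_name": "Ann"}]
shipping = [{"cust_id": "H3", "name": "Acme", "address": "1 Quay", "city": "Cork",
             "preferred_haulier": "FastFreight"}]
customers = merge_customers(finance, sales, shipping)
```

Keeping the original IDs in `source_ids` preserves a trail back to each source system, which helps when conflicting addresses have to be resolved by hand.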

16 Example 4: change of business practices
Payment by bank draft for international customers
Automatic payment into account for national customers
Payment direct into account for all customers

17 Data Staging Area
The construction site for the warehouse
Required by most scenarios
Connected to wide variety of sources
Clean / aggregate / compute / validate data
[Diagram: Operational system, Extract, Data staging area (Transform), Transport (Load), Warehouse]

18 Remote Staging Model
Data staging area within the warehouse environment
[Diagram: Operational system in the operational environment; extract, transform, transport (load) into a Data staging area and Warehouse, both in the warehouse environment]
Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Diagram: Operational system in the operational environment; Data staging area in a staging environment; Warehouse in the warehouse environment; extract, transform, transport (load) between them]

19 Onsite Staging Model
Data staging area within the operational environment, possibly affecting the operational system
[Diagram: Operational system and Data staging area in the operational environment; extract and transform there, then transport (load) into the Warehouse in the warehouse environment]

20 Data Mart A subset of a data warehouse that supports the requirements of a particular department or business function. Characteristics include: –Do not normally contain detailed operational data unlike data warehouses. –May contain certain levels of aggregation

21 Dependent Data Mart
[Diagram: Operational Systems, Flat Files and External Data feed the Data Warehouse (Marketing, Sales, Finance, Human Resources), which in turn feeds the Data Marts (Marketing, Sales, Finance)]

22 Independent Data Mart
[Diagram: Operational Systems, Flat Files and External Data feed a Sales or Marketing data mart]

23 Reasons for Creating a Data Mart To give users more flexible access to the data they need to analyse most often. To provide data in a form that matches the specific needs of a group of users To improve end-user response time. Potential users of a data mart are clearly defined and can be targeted for support

24 Reasons for Creating a Data Mart To provide appropriately structured data as dictated by the requirements of the end-user access tools. Building a data mart is simpler (and much quicker) compared with establishing a corporate data warehouse. The cost of implementing data marts is far less than that required to establish a data warehouse.

25 Exploiting the DW data
DW is a platform for creating a wide array of reports
It solves data feed problems, but does not lead to specific decision support
Need a model for organising data into meaningful reports
Need specific interfaces for users

26 Exploiting the DW data
[Diagram: Source Systems; Data Staging Area (extraction, cleaning, transformation, loading); Data Warehouse (relational database on a dedicated server, denormalised data); then three uses: static reporting, scrutinising (multidimensional data cubes, OLAP tools), discovering (data mining)]

27 Multidimensional Models
The data is found at the intersection of dimensions.
[Diagram: FINANCE cube with dimensions Product, P/L_Line, Time; SALES cube with dimensions Market, Product, Time, Customer]
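
The idea that a value sits at the intersection of dimensions can be illustrated with a dictionary keyed by one coordinate per dimension. This is a toy sketch with made-up figures and a three-dimension SALES cube (product, market, time); it is not how an OLAP engine stores data.

```python
from collections import defaultdict

DIMENSIONS = ("product", "market", "time")

# One cell per (product, market, time) intersection; all values are made up.
sales_cube = {
    ("Widget", "North", "1999-Q1"): 120,
    ("Widget", "South", "1999-Q1"): 80,
    ("Gadget", "North", "1999-Q1"): 50,
    ("Widget", "North", "1999-Q2"): 140,
}

def cell(product, market, time):
    """Return the value at one intersection, or 0 if the cell is empty."""
    return sales_cube.get((product, market, time), 0)

def roll_up(cube, keep):
    """Aggregate the cube over every dimension not named in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[tuple(coords[i] for i in idx)] += value
    return dict(totals)

by_product = roll_up(sales_cube, ["product"])
by_market_time = roll_up(sales_cube, ["market", "time"])
```

Rolling up along different dimension subsets from the same cells is what lets one cube serve many report layouts.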

28 Representing multidimensional data

29 MOLAP Server
The application layer stores data in a multidimensional structure
The presentation layer provides the multidimensional view
Efficient storage and processing
Complexity hidden from the user (but NOT from developer)
Analysis using preaggregated summaries and precalculated measures
[Diagram: DSS client, application layer (MOLAP engine), Warehouse]

30 ROLAP Server
The warehouse stores atomic data.
The application layer generates SQL for the three-dimensional view.
The presentation layer provides the multidimensional view.
[Diagram: DSS client, application layer (ROLAP engine), Warehouse server; link labelled "Multiple SQL"]
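
The SQL-generation step can be sketched with Python's built-in `sqlite3` module standing in for the warehouse server. The table `sales`, its columns and the function `build_query` are hypothetical; a real ROLAP engine generates far more elaborate SQL, but the principle of turning a requested view into GROUP BY queries over atomic rows is the same.

```python
import sqlite3

ALLOWED = ("product", "region", "month")  # hypothetical dimension columns

def build_query(dimensions):
    """Generate the aggregate SQL for the requested combination of dimensions."""
    if not dimensions or any(d not in ALLOWED for d in dimensions):
        raise ValueError(f"dimensions must be a non-empty subset of {ALLOWED}")
    cols = ", ".join(dimensions)
    return (f"SELECT {cols}, SUM(units) AS units FROM sales "
            f"GROUP BY {cols} ORDER BY {cols}")

# The warehouse holds atomic (unaggregated) rows; values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, month TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Widget", "North", "Jan", 10),
    ("Widget", "North", "Feb", 5),
    ("Widget", "South", "Jan", 7),
    ("Gadget", "North", "Jan", 3),
])

# Each view the user asks for becomes its own SQL statement.
by_product = conn.execute(build_query(["product"])).fetchall()
by_product_region = conn.execute(build_query(["product", "region"])).fetchall()
```

Restricting column names to a fixed list keeps the generated SQL safe even though it is built by string formatting.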

31 MOLAP Server
[Diagram: the user sends a Query to the MDDB and receives Data; the MDDB is filled from the Warehouse by a periodic load]

32 ROLAP Server
[Diagram: the user sends a Query and receives Data; the server uses a data cache and live fetch from the Warehouse]
Also Hybrid (HOLAP)

33 Choosing a Reporting Architecture
Business needs
Potential for growth
Interface
Enterprise architecture
Network architecture
Speed of access
Openness
[Chart: MOLAP and ROLAP positioned by query performance (Good, OK) against analysis (Simple, Complex)]

34 Modeling
Warehouses differ from operational structures:
–Analytical requirements
–Subject orientation
Data must map to subject oriented information:
–Identify business subjects
–Define relationships between subjects
–Name the attributes of each subject
Modeling is iterative
Modeling tools are available

35 Modeling the Data Warehouse
1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model
[Diagram: Select a business process, then steps 1, 2, 3 and 4, ending in the Physical model]

36 Identifying Business Rules
Product
–Type: PC, Server
–Monitor: 15 inch, 17 inch, 19 inch, None
–Status: New, Rebuilt, Custom
Location
–Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles
Store
–Store > District > Region
Time
–Month > Quarter > Year
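
A hierarchy rule such as Month > Quarter > Year translates directly into a roll-up function. The sketch below assumes calendar quarters (January to March is quarter 1), which the slide does not state, and uses made-up monthly figures.

```python
from collections import defaultdict

def quarter_of(month: int) -> int:
    """Map a calendar month (1-12) to its calendar quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return (month - 1) // 3 + 1

def roll_up(monthly):
    """Take {(year, month): value} and return totals by quarter and by year."""
    by_quarter, by_year = defaultdict(int), defaultdict(int)
    for (year, month), value in monthly.items():
        by_quarter[(year, quarter_of(month))] += value
        by_year[year] += value
    return dict(by_quarter), dict(by_year)

# Made-up monthly sales keyed by (year, month).
monthly_sales = {(1999, 1): 10, (1999, 2): 20, (1999, 4): 5, (2000, 12): 7}
by_quarter, by_year = roll_up(monthly_sales)
```

The Store > District > Region hierarchy works the same way, except that the mapping from store to district comes from a lookup table rather than arithmetic.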

37 Creating the Dimensional Model
Identify fact tables
–Translate business measures into fact tables
–Analyze source system information for additional measures
–Identify base and derived measures
–Document additivity of measures
Identify dimension tables
Link fact tables to the dimension tables
Create views for users

38 Dimension Tables
Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference
[Diagram: Facts (units, price) linked to Product, Channel, Customer and Time]

39 Fact Tables
Fact tables have the following characteristics:
Contain numeric measures (metrics) of the business
May contain summarized (aggregated) data
May contain date-stamped data
Are typically additive
Have a key value that is typically a concatenated key composed of the primary keys of the dimensions
Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

40 Dimensional Model (Star Schema)
[Diagram: fact table Facts (units, price) at the centre, surrounded by the dimension tables Product, Channel, Customer and Time]

41 Star Schema Model
Central fact table
Radiating dimensions
Denormalized model
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Product Table: Product_id, Product_desc, …
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
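
A cut-down version of this star schema can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `_dim` and `_fact` table-name suffixes, the column types, the omission of the Item table and Period_id, and all data values are choices made here for brevity, not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

-- Fact table: foreign keys to each dimension, concatenated into the primary key.
CREATE TABLE sales_fact (
    product_id    INTEGER REFERENCES product_dim(product_id),
    store_id      INTEGER REFERENCES store_dim(store_id),
    day_id        INTEGER REFERENCES time_dim(day_id),
    sales_dollars REAL,
    sales_units   INTEGER,
    PRIMARY KEY (product_id, store_id, day_id)
);

INSERT INTO store_dim   VALUES (1, 10), (2, 10);
INSERT INTO product_dim VALUES (100, 'Monitor'), (200, 'Server');
INSERT INTO time_dim    VALUES (19990101, 199901, 1999), (19990201, 199902, 1999);
INSERT INTO sales_fact  VALUES
    (100, 1, 19990101, 300.0, 2),
    (100, 2, 19990201, 150.0, 1),
    (200, 1, 19990101, 900.0, 1);
""")

# A typical star query: join the fact table to the dimensions it needs, then aggregate.
rows = conn.execute("""
    SELECT p.product_desc, t.month_id, SUM(f.sales_units), SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim    t ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.month_id
    ORDER BY p.product_desc, t.month_id
""").fetchall()
```

Every query follows the same shape, one join per dimension used, which is a large part of why the model is easy for users and front-end tools to work with.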

42 Star Schema Model
Easy for users to understand
Fast response to simple queries
Simple metadata
Supported by many front end tools
Less robust to change
Does not support history

43 Using Summary Data
Phase 3: Modeling summaries
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
Is distilled from source systems and precalculated summaries
Usually exists in summary fact tables

44 Designing Summary Tables
[Diagram: summary table of Units and Sales (€) by Store, with a Total row for each of Product A, Product B and Product C, plus Average, Maximum, Total and Percentage]

45 Summary Tables Example

SALES FACTS
Sales     Region   Month
10,000    North    Jan 99
12,000    South    Feb 99
11,000    North    Jan 99
15,000    West     Mar 99
18,000    South    Feb 99
20,000    North    Jan 99
10,000    East     Jan 99
2,000     West     Mar 99

SALES BY MONTH/REGION
Month     Region   Tot_Sales$
Jan 99    North    41,000
Jan 99    East     10,000
Feb 99    South    40,000
Mar 99    West     17,000

SALES BY MONTH
Month     Tot_Sales
Jan 99    51,000
Feb 99    40,000
Mar 99    17,000
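
The two summary tables can be derived from the fact rows with GROUP BY, sketched here with Python's built-in `sqlite3` module; the table and column names are adapted from the slide and are otherwise assumptions. Computed from the eight fact rows as transcribed, Feb 99 South comes to 30,000 (12,000 + 18,000), whereas the slide's summary tables show 40,000, so one of the transcribed figures is probably off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_facts (sales INTEGER, region TEXT, month TEXT)")
conn.executemany("INSERT INTO sales_facts VALUES (?, ?, ?)", [
    (10000, "North", "Jan 99"),
    (12000, "South", "Feb 99"),
    (11000, "North", "Jan 99"),
    (15000, "West", "Mar 99"),
    (18000, "South", "Feb 99"),
    (20000, "North", "Jan 99"),
    (10000, "East", "Jan 99"),
    (2000, "West", "Mar 99"),
])

# First-level summary: one row per month and region.
conn.execute("""
    CREATE TABLE sales_by_month_region AS
    SELECT month, region, SUM(sales) AS tot_sales
    FROM sales_facts GROUP BY month, region
""")

# Second-level summary built from the first, not from the detailed facts.
conn.execute("""
    CREATE TABLE sales_by_month AS
    SELECT month, SUM(tot_sales) AS tot_sales
    FROM sales_by_month_region GROUP BY month
""")

by_month_region = {
    (month, region): total
    for month, region, total in conn.execute(
        "SELECT month, region, tot_sales FROM sales_by_month_region")
}
by_month = dict(conn.execute("SELECT month, tot_sales FROM sales_by_month"))
```

Building the monthly summary from the month/region summary works because sales is an additive measure; averages or percentages could not be rolled up this way.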

