Lesson 4: Modeling the Data Warehouse
Copyright © Oracle Corporation, 2002. All rights reserved.


4-2 Objectives
After completing this lesson, you should be able to do the following:
Discuss data warehouse environment data structures
Discuss data warehouse database design phases:
– Defining the business model
– Defining the dimensional model
– Defining the physical model

4-3 Data Warehouse Modeling Issues
Among the main issues that data warehouse data modelers face are:
Different data types
Many ways to use warehouse data
Many ways to structure the data
Multiple modeling techniques
Planned replication
Large volumes of data

4-5 Data Warehouse Environment Data Structures
The data modeling structures that are commonly found in a data warehouse environment are:
Third normal form (3NF)
Star schema
Snowflake schema

4-6 Star Schema Model
(Figure: a central fact table surrounded by denormalized dimension tables.)
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...
Product Table: Product_id, Product_desc, ...
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Year_id, ...
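As a rough illustration of the star layout, the following Python sketch builds the slide's tables in an in-memory SQLite database. Here the standard-library sqlite3 module stands in for Oracle. The sketch then answers a query that joins the fact table directly to two dimensions. The time_dim table name and all sample rows are hypothetical.

```python
import sqlite3

# A toy star schema modeled on the slide; the rows are invented for illustration.
SCHEMA = """
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE item     (item_id INTEGER PRIMARY KEY, item_desc TEXT);
CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);
CREATE TABLE sales_fact (
    product_id   INTEGER REFERENCES product(product_id),
    store_id     INTEGER REFERENCES store(store_id),
    item_id      INTEGER REFERENCES item(item_id),
    day_id       INTEGER REFERENCES time_dim(day_id),
    sales_amount REAL,
    sales_units  INTEGER
);
INSERT INTO product  VALUES (1, 'Desktop'), (2, 'Server');
INSERT INTO store    VALUES (10, 100), (11, 100), (12, 200);
INSERT INTO item     VALUES (1000, '15 inch monitor'), (1001, '17 inch monitor');
INSERT INTO time_dim VALUES (1, 1, 2002), (2, 1, 2002), (32, 2, 2002);
INSERT INTO sales_fact VALUES
    (1, 10, 1000, 1, 1200.0, 1),
    (2, 11, 1001, 2, 300.0, 2),
    (2, 12, 1000, 32, 150.0, 1);
"""

def sales_by_district_and_month(conn):
    # Every dimension is exactly one join away from the central fact table.
    return conn.execute("""
        SELECT s.district_id, t.month_id, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN store    AS s ON s.store_id = f.store_id
        JOIN time_dim AS t ON t.day_id   = f.day_id
        GROUP BY s.district_id, t.month_id
        ORDER BY s.district_id, t.month_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(sales_by_district_and_month(conn))  # [(100, 1, 1500.0), (200, 2, 150.0)]
```

Because each dimension is reached with a single join, the query stays short even as more dimensions are added to the fact table.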

4-7 Snowflake Schema Model
Sales Fact Table: Item_id, Store_id, Product_id, Week_id, Sales_amount, Sales_units
Product Table: Product_id, Product_desc
Time Table: Week_id, Period_id, Year_id
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
Item Table: Item_id, Item_desc, Dept_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
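To contrast with the star layout, this sketch normalizes two of the slide's dimensions in an in-memory SQLite database: Store is split into Store and District, and Item is split into Item and Dept. It then runs a comparable aggregate. The trimmed column lists and the sample rows are assumptions made for brevity. The slide's Product, Time, and Mgr tables are omitted.

```python
import sqlite3

# A trimmed snowflake schema: Store and Item are normalized into District and Dept.
SCHEMA = """
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, store_desc TEXT,
                       district_id INTEGER REFERENCES district(district_id));
CREATE TABLE dept     (dept_id INTEGER PRIMARY KEY, dept_desc TEXT, mgr_id INTEGER);
CREATE TABLE item     (item_id INTEGER PRIMARY KEY, item_desc TEXT,
                       dept_id INTEGER REFERENCES dept(dept_id));
CREATE TABLE sales_fact (item_id INTEGER, store_id INTEGER, week_id INTEGER,
                         sales_amount REAL, sales_units INTEGER);
INSERT INTO district VALUES (1, 'North'), (2, 'South');
INSERT INTO store    VALUES (10, 'Store A', 1), (11, 'Store B', 2);
INSERT INTO dept     VALUES (5, 'Hardware', 500);
INSERT INTO item     VALUES (100, 'Monitor', 5), (101, 'Keyboard', 5);
INSERT INTO sales_fact VALUES (100, 10, 1, 300.0, 1), (101, 10, 1, 40.0, 2),
                              (100, 11, 2, 280.0, 1);
"""

def sales_by_district_and_dept(conn):
    # Reaching district_desc and dept_desc takes two joins per dimension
    # (fact -> store -> district, fact -> item -> dept) instead of one.
    return conn.execute("""
        SELECT d.district_desc, dp.dept_desc, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN store    AS s  ON s.store_id    = f.store_id
        JOIN district AS d  ON d.district_id = s.district_id
        JOIN item     AS i  ON i.item_id     = f.item_id
        JOIN dept     AS dp ON dp.dept_id    = i.dept_id
        GROUP BY d.district_desc, dp.dept_desc
        ORDER BY d.district_desc
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(sales_by_district_and_dept(conn))
```

The extra joins are the price of the normalized dimensions, which is the query-performance trade-off listed on the next slide.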

4-8 Snowflake Schema Model
Advantages:
– Direct use by some tools
– More flexible to change
– Speedier data loading
Disadvantages:
– Can become large and unmanageable
– Degrades query performance
– More complex metadata
Example of a normalized hierarchy: Country > State > County > City

4-9 Data Warehouse Database Design Phases
Phase 1: Defining the business model
Phase 2: Defining the dimensional model
Phase 3: Defining the physical model

4-10 Phase 1: Defining the Business Model
Performing strategic analysis
Creating the business model
Documenting metadata

4-11 Performing Strategic Analysis
Identify crucial business processes
Understand business processes
Prioritize and select the business processes to implement
(Figure: a grid rating business processes by business benefit, low to high, against feasibility, low to high.)

4-12 Creating the Business Model
Defining business requirements:
– Identifying the business measures
– Identifying the dimensions
– Identifying the grain
– Identifying the business definitions and rules
Verifying data sources

4-13 Business Requirements Drive the Design Process
Primary input: business requirements research
Secondary input: existing metadata and the production ERD model

4-15 Identifying Measures and Dimensions
Dimensions: the attribute is perceived as constant or discrete (Product, Location, Time, Size)
Measures: the attribute varies continuously (Balance, Units Sold, Cost, Sales)

4-17 Using a Business Process Matrix
Sample business process matrix:
Business processes (columns): Sales, Returns, Inventory
Business dimensions (rows): Customer, Date, Product, Channel, Promotion

4-18 Determining Granularity
YEAR? QUARTER? MONTH? WEEK? DAY?
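The grain decision matters because detail can always be summarized upward but can never be recovered downward. This hypothetical Python sketch stores sales at day grain and rolls them up to month, quarter, or year. The sample rows and the roll_up helper are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-grain sales: (sale_date, sales_amount)
daily_sales = [
    (date(2002, 1, 5), 120.0),
    (date(2002, 1, 20), 80.0),
    (date(2002, 2, 3), 50.0),
]

def roll_up(rows, level):
    """Aggregate day-grain rows to a coarser level: 'month', 'quarter', or 'year'."""
    keys = {
        "month": lambda d: (d.year, d.month),
        "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
        "year": lambda d: (d.year,),
    }
    totals = defaultdict(float)
    for day, amount in rows:
        totals[keys[level](day)] += amount
    return dict(totals)

print(roll_up(daily_sales, "month"))  # {(2002, 1): 200.0, (2002, 2): 50.0}
```

A table stored only at month grain could not answer a question such as "sales on January 5", which is why the grain is usually set at the finest level the business needs.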

4-19 Identifying Business Rules
Store: Store > District > Region
Location: geographic proximity bands of 0 - 1 miles, 1 - 5 miles, and > 5 miles
Product:
– Type: PC, Server
– Monitor: 15 inch, 17 inch, 19 inch, None
– Status: New, Rebuilt, Custom
Time: Month > Quarter > Year
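One way to capture these rules so that load and query code can share them is as plain data plus small helper functions. The Python sketch below is hypothetical and the names are illustrative. The slide does not say which band owns a boundary distance, so the sketch assumes that exactly 1 mile falls in "0 - 1 miles" and exactly 5 miles falls in "1 - 5 miles".

```python
# Hypothetical encoding of the hierarchies on the slide, lowest level first.
STORE_HIERARCHY = ["Store", "District", "Region"]
TIME_HIERARCHY = ["Month", "Quarter", "Year"]

def parent_level(hierarchy, level):
    """Return the next level up in a hierarchy, or None at the top."""
    position = hierarchy.index(level)
    return hierarchy[position + 1] if position + 1 < len(hierarchy) else None

def proximity_band(miles):
    """Classify a distance into the slide's geographic proximity bands.

    Assumption: boundary values go to the lower band (1 -> "0 - 1 miles", 5 -> "1 - 5 miles").
    """
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= 1:
        return "0 - 1 miles"
    if miles <= 5:
        return "1 - 5 miles"
    return "> 5 miles"

print(parent_level(STORE_HIERARCHY, "District"))  # Region
print(proximity_band(3.2))                        # 1 - 5 miles
```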

4-20 Documenting Metadata
Documenting metadata should include:
Documenting the design process
Documenting the development process
Providing a record of changes
Recording enhancements over time

4-21 Metadata Documentation Approaches
Automated
– Data modeling tools
– ETL tools
– End-user tools
Manual

4-22 Phase 2: Defining the Dimensional Model
Identify fact tables:
– Translate business measures into fact tables
– Analyze source system information for additional measures
Identify dimension tables
Link fact tables to the dimension tables
Model the time dimension

4-23 Star Dimensional Modeling
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...
Product Table: Product_id, Product_desc, ...
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id

4-24 Fact Table Characteristics
Fact tables:
Contain numerical metrics of the business
Can hold large volumes of data
Can grow quickly
Can contain base, derived, and summarized data
Are typically additive
Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Example: Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...
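"Additive" means that a measure can be summed along any dimension and still yield consistent totals. The sketch below groups a few invented fact rows by store and by day. It then checks that both breakdowns add up to the same grand total. The column positions and the total_by helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product_id, store_id, day_id, sales_amount, sales_units)
FACTS = [
    (1, 10, 1, 1200.0, 1),
    (2, 10, 1, 300.0, 2),
    (2, 11, 2, 150.0, 1),
]
STORE, DAY, AMOUNT = 1, 2, 3  # column positions in FACTS

def total_by(rows, key_col, measure_col):
    """Sum one measure, grouped by one dimension key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_col]] += row[measure_col]
    return dict(totals)

by_store = total_by(FACTS, STORE, AMOUNT)
by_day = total_by(FACTS, DAY, AMOUNT)
print(by_store)  # {10: 1500.0, 11: 150.0}
print(by_day)    # {1: 1500.0, 2: 150.0}
# Because sales_amount is additive, both breakdowns roll up to the same grand total.
print(sum(by_store.values()) == sum(by_day.values()))  # True
```

Not every measure behaves this way. An account balance is typically semi-additive: summing balances across stores makes sense, but summing them across days does not.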

4-25 Dimension Table Characteristics
Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference

4-26 Star Dimensional Model Characteristics
The model is easy for users to understand.
Primary keys represent a dimension.
Non-foreign-key columns are values.
Facts are usually highly normalized.
Dimensions are completely denormalized.
Queries receive fast responses.
Performance is improved by reducing table joins.
End users can express complex queries.
Many front-end tools support the model.

4-27 Using Time in the Data Warehouse
Defining standards for time is critical.
Aggregation based on time is complex.

4-28 The Time Dimension
Time is critical to the data warehouse.
A consistent representation of time is required for extensibility.
Where should the element of time be stored?
(Figure: the two options, a time dimension table or the sales fact table.)
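A common answer is to store time in its own dimension table with one row per day, so that every fact row references one consistent representation of time. This Python sketch generates such rows. The YYYYMMDD day_id, the month_id and quarter_id encodings, and the column names are all assumptions, not an Oracle-defined format.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """Return one row per calendar day with a Day > Month > Quarter > Year hierarchy."""
    rows = []
    day = start
    while day <= end:
        quarter = (day.month - 1) // 3 + 1
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),   # assumed key format YYYYMMDD
            "calendar_date": day.isoformat(),
            "month_id": day.year * 100 + day.month,  # e.g. 200203
            "quarter_id": day.year * 10 + quarter,   # e.g. 20021
            "year_id": day.year,
            "iso_weekday": day.isoweekday(),         # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows

for row in build_time_dimension(date(2002, 3, 30), date(2002, 4, 2)):
    print(row["day_id"], row["month_id"], row["quarter_id"])
```

Generating the table once, instead of deriving calendar attributes in every query, is one way to meet the slide's requirement for a consistent representation of time.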

4-29 Using Data Modeling Tools
Tools with a GUI enable definition, modeling, and reporting.
Avoid a mix of modeling techniques caused by:
– Development pressures
– Developers' lack of knowledge
– Lack of a strategy
Determine a strategy.
Write and publish it formally.
Make it available electronically.

4-31 Phase 3: Defining the Physical Model
Translate the dimensional design to a physical model for implementation.
Define a storage strategy for tables and indexes.
Perform database sizing.
Define an initial indexing strategy.
Define a partitioning strategy.
Update the metadata document with physical information.

4-32 Physical Model Design Tasks
Define naming and database standards.
Perform database sizing.
Develop an initial indexing strategy.
Develop a data partitioning strategy.
Define storage parameters.
Set initialization parameters.
Use parallel processing.
Define summary data.
Determine the hardware architecture.

4-33 Database Object Naming Conventions
Develop a reasonable list of abbreviations.
List all the objects’ names, and work with the user community to define them.
Resolve name disputes.
Document your naming standards in the metadata document.
Plan for the naming standards to be a living document.

4-34 Architectural Requirements
Requirements: Scalability, Manageability, Availability, Extensibility, Flexibility, Integration
Considerations: User, Budget, Business, Technology

4-35 Strategy for Architecture Definition
Obtain existing architecture plans.
Obtain existing capacity plans.
Document existing interfaces.
Prepare a capacity plan.
Prepare a technical architecture.
Document operating system requirements.
Develop recovery plans.
Develop security and control plans.
Create the architecture.
Create a technical risk assessment.

4-36 Hardware Requirements
SMP (symmetric multiprocessing)
Cluster
MPP (massively parallel processing)
NUMA (non-uniform memory access)
Hybrids (employing both SMP and MPP)

4-37 Making the Right Choice
Requirements differ from those of operational systems.
Benchmark:
– Available from vendors
– Develop your own
– Use realistic queries
Scalability is important.

4-38 Storage and Performance Considerations
Database sizing
– Test load sampling
Data partitioning
– Horizontal partitioning
– Vertical partitioning
Indexing
– B-tree indexes
– Bitmap indexes
– Bitmap join indexes
Star query optimization
– Star transformation

4-39 Database Sizing
Sizing influences capacity planning and systems environment management.
Sizing is required for:
– The database
– Other storage areas
Sizing is not an exact science.
Techniques vary.

4-40 Test Load Sampling
Analyze a representative sample of the data, chosen using proven statistical methods.
Ensure that the sample reflects:
– Test loads for different periods
– Day-to-day operations
– Seasonal data and worst-case scenarios
– Indexes and summaries
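The arithmetic behind extrapolating from a test load can be sketched as follows. The function name, the index and growth ratios, and the example figures are all hypothetical. In practice the ratios would be measured from the sample loads themselves, including the seasonal and worst-case loads listed above.

```python
def estimate_table_bytes(sample_rows, sample_bytes, projected_rows,
                         index_ratio=0.0, growth_ratio=0.0):
    """Extrapolate table size from a test-load sample.

    sample_rows, sample_bytes: rows loaded in the test and the space they used.
    projected_rows: expected row count, for example at a seasonal peak.
    index_ratio: index space as a fraction of table space (assumed; measure it).
    growth_ratio: extra headroom as a fraction of table space (assumed).
    """
    if sample_rows <= 0:
        raise ValueError("sample must contain rows")
    bytes_per_row = sample_bytes / sample_rows
    table_bytes = bytes_per_row * projected_rows
    return table_bytes * (1 + index_ratio + growth_ratio)

# Example: 100,000 sampled rows used 12 MB (120 bytes/row); 50 million rows projected.
estimate = estimate_table_bytes(100_000, 12_000_000, 50_000_000,
                                index_ratio=0.5, growth_ratio=0.2)
print(f"{estimate / 1e9:.1f} GB")  # 10.2 GB
```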

4-41 Oracle9i Database Architectural Advantages
New and improved technologies:
Real Application Clusters and Cache Fusion
Self-managing in critical areas
Flashback Query
Data Guard and Recovery Manager

4-42 Data Partitioning
Data partitioning is the breaking up of data into separate physical units that can be handled independently.
Data partitioning provides ease of:
– Restructuring
– Reorganization
– Removal
– Recovery
– Monitoring
– Management
– Archiving
– Indexing

4-44 Horizontal Partitioning
Table and index data are split by:
– Time
– Sales region or person
– Geography
– Organization
– Line of business
Candidate columns appear in a WHERE clause.
Analysis determines requirements.

4-45 Vertical Partitioning
You can use vertical partitioning when:
– It improves the speed of query and update actions
– Users require access to specific columns
– Some data is changed infrequently
– Descriptive dimension text may be better moved away from the dimension itself

4-46 Partitioning Methods
Range partitioning
List partitioning
Hash partitioning
Composite partitioning
– Composite range-hash partitioning
– Composite range-list partitioning
Index partitioning
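The routing rules behind the first three methods, and one composite method, can be sketched in Python to show how a row's key selects its partition. The sketch models only the decision logic. The partition names, bounds, value lists, and the use of CRC32 as the hash function are assumptions. The sketch does not reproduce Oracle's internal hashing.

```python
import bisect
import zlib
from datetime import date

# Range partitioning: a row goes to the first partition whose (exclusive) upper
# bound exceeds its key, in the spirit of VALUES LESS THAN. Bounds are hypothetical.
RANGE_BOUNDS = [date(2002, 4, 1), date(2002, 7, 1), date(2002, 10, 1), date(2003, 1, 1)]
RANGE_NAMES = ["sales_q1", "sales_q2", "sales_q3", "sales_q4"]

def range_partition(sale_date):
    position = bisect.bisect_right(RANGE_BOUNDS, sale_date)
    if position == len(RANGE_BOUNDS):
        raise ValueError(f"no partition for {sale_date}")
    return RANGE_NAMES[position]

# List partitioning: each partition owns an explicit set of values.
LIST_PARTITIONS = {"sales_east": {"NY", "NJ", "MA"}, "sales_west": {"CA", "OR", "WA"}}

def list_partition(state):
    for name, values in LIST_PARTITIONS.items():
        if state in values:
            return name
    raise ValueError(f"no partition for {state!r}")

# Hash partitioning: a stable hash spreads keys over a fixed number of partitions.
# zlib.crc32 is used because Python's built-in hash() of a str varies between runs.
def hash_partition(key, partitions=4):
    return f"sales_h{zlib.crc32(str(key).encode()) % partitions}"

# Composite range-hash: pick the range partition first, then a hash subpartition.
def range_hash_partition(sale_date, customer_id, subpartitions=4):
    return range_partition(sale_date), hash_partition(customer_id, subpartitions)

print(range_partition(date(2002, 4, 1)))  # sales_q2
print(list_partition("OR"))               # sales_west
```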

4-48 Indexing
Indexing is used for the following reasons:
It can save significant cost by greatly improving performance and scalability.
It can replace a full table scan with a quick read of the index followed by a read of only those disk blocks that contain the rows needed.

4-49 B-Tree Index
Most common type of index
Used for high-cardinality columns
Designed for queries that return few rows

4-50 Bitmap Indexes
Provide performance benefits and storage savings
Store values as 1s and 0s
Use instead of B-tree indexes when:
– Tables are large
– Columns have relatively low cardinality
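To make the 1s-and-0s idea concrete, this Python sketch keeps one bit vector per distinct value of a low-cardinality column. Python integers serve as arbitrary-length bitsets. A predicate on two columns then becomes a bitwise AND (or OR) of their bitmaps. The column data and helper names are invented, and the sketch does not model Oracle's compressed on-disk format.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int used as a bit vector (bit i set = row i has the value)."""
    index = defaultdict(int)
    for row_number, value in enumerate(values):
        index[value] |= 1 << row_number
    return dict(index)

def matching_rows(bitmap):
    """Decode a bit vector into the list of row numbers whose bits are set."""
    rows, row_number = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_number)
        bitmap >>= 1
        row_number += 1
    return rows

# Hypothetical low-cardinality columns of a five-row table.
REGION = ["east", "west", "east", "north", "east"]
STATUS = ["new", "new", "rebuilt", "new", "new"]
region_idx = build_bitmap_index(REGION)
status_idx = build_bitmap_index(STATUS)

# WHERE region = 'east' AND status = 'new' becomes one bitwise AND.
print(matching_rows(region_idx["east"] & status_idx["new"]))  # [0, 4]
```

With few distinct values, each bitmap is compact, which is why the slide recommends bitmap indexes for large tables with low-cardinality columns.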

4-51 Bitmap Join Indexes
A bitmap join index is a bitmap index for the join of two or more tables:
They are new in Oracle9i.
They provide better performance and storage savings.
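A bitmap join index stores, for each value of a dimension attribute, a bitmap over the fact table rows. In effect, the join is computed once, when the index is built. This Python sketch shows the idea with hypothetical store and district data. It is not how Oracle physically stores such an index.

```python
from collections import defaultdict

# Hypothetical data: the store dimension and the store_id column of sales_fact.
STORE_DISTRICT = {10: "North", 11: "North", 12: "South"}  # store_id -> district
FACT_STORE_IDS = [10, 12, 11, 10, 12]                     # store_id of fact rows 0..4

def build_bitmap_join_index(fact_fk, dim_attribute):
    """Map each dimension attribute value to a bitmap over fact rows (bit i = fact row i)."""
    index = defaultdict(int)
    for row_number, fk in enumerate(fact_fk):
        # The join to the dimension happens here, once, at build time.
        index[dim_attribute[fk]] |= 1 << row_number
    return dict(index)

BJI = build_bitmap_join_index(FACT_STORE_IDS, STORE_DISTRICT)

def fact_rows_for(district):
    """Answer 'fact rows WHERE store.district = ?' without reading the store table."""
    bitmap = BJI.get(district, 0)
    return [row for row in range(len(FACT_STORE_IDS)) if (bitmap >> row) & 1]

print(fact_rows_for("South"))  # [1, 4]
```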

4-52 Star Query Optimization
Star query optimization requires the following:
Tuning star queries
– A bitmap index should be built on each of the foreign key columns of the fact table.
– The initialization parameter STAR_TRANSFORMATION_ENABLED should be set to TRUE.
– The cost-based optimizer should be used.
Using star transformation

4-53 Star Transformation
A cost-based query transformation aimed at executing star queries efficiently
An alternative to star optimization, which works well for schemas with a small number of dimensions and dense fact tables; star transformation is worth considering when the number of dimensions is large or the fact table is sparse
Oracle processes a star query in two basic phases:
1. The first phase retrieves exactly the necessary rows from the fact table (the result set).
2. The second phase joins this result set to the dimension tables.
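The two phases can be sketched in Python using bitmaps on the fact table's foreign key columns. This is a simplified model under invented data, and all names are hypothetical. In phase 1, each dimension predicate becomes a set of keys, the bitmaps for those keys are merged with OR, and the results are combined across dimensions with AND. In phase 2, only the surviving fact rows are joined back to the dimensions.

```python
from collections import defaultdict

# Hypothetical star schema data.
FACTS = [(10, 1, 100.0), (11, 2, 250.0), (10, 2, 75.0), (12, 1, 60.0)]  # (store_id, product_id, sales_amount)
STORES = {10: "North", 11: "South", 12: "North"}  # store_id -> district
PRODUCTS = {1: "PC", 2: "Server"}                 # product_id -> product type

def bitmap_index(column):
    """Bitmap index on one fact table column: value -> bit vector over fact rows."""
    index = defaultdict(int)
    for row_number, value in enumerate(column):
        index[value] |= 1 << row_number
    return dict(index)

STORE_FK_BITMAPS = bitmap_index([store_id for store_id, _, _ in FACTS])
PRODUCT_FK_BITMAPS = bitmap_index([product_id for _, product_id, _ in FACTS])

def merged_bitmap(fk_bitmaps, keys):
    """OR together the bitmaps for every dimension key that passed its predicate."""
    bits = 0
    for key in keys:
        bits |= fk_bitmaps.get(key, 0)
    return bits

def star_query(district, product_type):
    # Phase 1: evaluate predicates on the dimensions, then AND the merged fact-key bitmaps.
    store_keys = [k for k, d in STORES.items() if d == district]
    product_keys = [k for k, t in PRODUCTS.items() if t == product_type]
    selected = (merged_bitmap(STORE_FK_BITMAPS, store_keys)
                & merged_bitmap(PRODUCT_FK_BITMAPS, product_keys))
    result_set = [FACTS[row] for row in range(len(FACTS)) if (selected >> row) & 1]
    # Phase 2: join only the retrieved fact rows back to the dimension tables.
    return [(STORES[s], PRODUCTS[p], amount) for s, p, amount in result_set]

print(star_query("North", "PC"))  # [('North', 'PC', 100.0), ('North', 'PC', 60.0)]
```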

4-55 Parallelism
(Figure: parallel execution servers P1, P2, and P3 working on the Sales and Customers tables.)
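Dividing one scan among several parallel execution servers can be imitated with Python's concurrent.futures module. In this hypothetical sketch, each worker sums a round-robin slice of the rows, and the partial sums are then combined. Threads keep the example simple. CPU-bound Python code would need ProcessPoolExecutor to actually run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done by one worker: sum the sales_amount column of its slice."""
    return sum(amount for _row_id, amount in chunk)

def parallel_total(rows, degree=3):
    # Deal the rows round-robin into `degree` slices, one per worker (P1, P2, P3 on the slide).
    chunks = [rows[i::degree] for i in range(degree)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        # The coordinator combines the partial results.
        return sum(pool.map(partial_sum, chunks))

SALES = [(row_id, float(row_id)) for row_id in range(1, 101)]  # hypothetical (row_id, sales_amount)
print(parallel_total(SALES))  # 5050.0
```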

4-56 Using Summary Data
Designing summary tables offers the following benefits:
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
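The sketch below precomputes monthly totals into a summary table in an in-memory SQLite database. It then checks that reading the summary gives the same answer as aggregating the detail rows. Table names and rows are hypothetical. The month_id column is stored directly on the fact table only to keep the example short; a real design would take it from the time dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (day_id INTEGER, month_id INTEGER, store_id INTEGER,
                             sales_amount REAL);
    INSERT INTO sales_fact VALUES (1, 1, 10, 100.0), (2, 1, 10, 50.0),
                                  (3, 1, 11, 25.0), (32, 2, 10, 70.0);
    -- Precompute monthly totals once, for example at the end of each load.
    CREATE TABLE sales_by_month AS
        SELECT month_id, SUM(sales_amount) AS total_amount, COUNT(*) AS row_count
        FROM sales_fact
        GROUP BY month_id;
""")

def monthly_from_detail(conn):
    return conn.execute("""SELECT month_id, SUM(sales_amount) FROM sales_fact
                           GROUP BY month_id ORDER BY month_id""").fetchall()

def monthly_from_summary(conn):
    # Reads one row per month instead of every fact row.
    return conn.execute("""SELECT month_id, total_amount FROM sales_by_month
                           ORDER BY month_id""").fetchall()

print(monthly_from_summary(conn))  # [(1, 175.0), (2, 70.0)]
```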

4-57 Query Rewrite with Oracle9i
(Figure: the query is rewritten and a plan generated; a plan is also generated for the original query; the optimizer chooses between them based on cost and executes the chosen plan.)
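The decision in the diagram can be modeled as follows: generate one plan for the original query and one for each usable summary, then keep the cheapest. This Python sketch is a deliberate simplification, not Oracle's optimizer. It assumes that cost equals the number of rows read. It also assumes that a summary can answer any query at its own grain or a coarser one. The plan descriptions and table names are hypothetical.

```python
from dataclasses import dataclass

GRAIN_ORDER = ["day", "month", "quarter", "year"]  # finest to coarsest

@dataclass
class Plan:
    description: str
    estimated_cost: int  # assumption: cost is simply the number of rows read

def can_answer(summary_grain, query_grain):
    """A summary can answer a query at its own grain or any coarser grain."""
    return GRAIN_ORDER.index(summary_grain) <= GRAIN_ORDER.index(query_grain)

def choose_plan(query_grain, fact_rows, summaries):
    """Generate the unrewritten plan plus one rewritten plan per usable summary; pick the cheapest."""
    candidates = [Plan("no rewrite: aggregate sales_fact", fact_rows)]
    for grain, rows in summaries.items():
        if can_answer(grain, query_grain):
            candidates.append(Plan(f"rewrite: aggregate sales_by_{grain}", rows))
    return min(candidates, key=lambda plan: plan.estimated_cost)

print(choose_plan("quarter", 1_000_000, {"month": 120, "year": 10}).description)
```

In this model, the year summary is cheaper but cannot answer a quarterly query, so the month summary wins.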

4-58 Summary
In this lesson, you should have learned how to:
Describe data warehouse environment data structures
Define the business model:
– Performing strategic analysis
– Creating the business model
– Identifying business rules
Define the dimensional model:
– Star dimensional model characteristics
Define the physical model:
– Physical model design tasks
– Architectural and hardware requirements
– Storage and performance considerations

4-59 Practice 4-1 Overview
This practice covers the following topics:
Marking a series of statements as true or false
Completing a series of sentences accurately
Identifying a simple business model
Identifying an indexing method
