The Data Warehouse and Design. Summary The design of the data warehouse begins with the data model The primary concern of the data warehouse developer.


1 The Data Warehouse and Design

2 Summary
– The design of the data warehouse begins with the data model
– The primary concern of the data warehouse developer is managing volume
– The data warehouse is fed data as it passes from the legacy operational environment. Data goes through a complex process of conversion, reformatting, and integration as it passes from the legacy operational environment into the data warehouse environment
– The data model exists at three levels: high level, mid level, and low level
– The creation of a data warehouse record is triggered by an activity or an event that has occurred in the operational environment
– A profile record is a composite record made up of many different historical activities
– The star join is a database design technique that is sometimes mistakenly applied to the data warehouse environment

3 Beginning with Operational Data
Three types of loads are made into the data warehouse from the operational environment:
– Archival data
– Data currently contained in the operational environment
– Ongoing changes to the data warehouse environment from the changes (updates) that have occurred in the operational environment since the last refresh

4 Beginning with Operational Data (cont’d)
Five common techniques are used to limit the amount of operational data scanned:
1. Scan data that has been timestamped
2. Scan a ‘delta’ file
3. Scan a log file or an audit file
4. Modify application code
5. Rub a ‘before’ and an ‘after’ image of the operational file together
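Techniques 1 and 5 can be illustrated with a minimal Python sketch. The record layout (an `updated_at` field, images held as dictionaries keyed by record key) and the function names are assumptions made for illustration; the slides do not prescribe an implementation.

```python
from datetime import datetime


def scan_timestamped(records, last_refresh):
    """Technique 1: keep only records stamped after the last refresh.

    Assumes each record carries a hypothetical 'updated_at' datetime field.
    """
    return [r for r in records if r["updated_at"] > last_refresh]


def diff_images(before, after):
    """Technique 5: compare a 'before' and an 'after' image of the file.

    Both images are assumed to be dicts mapping record key -> record value.
    Returns the inserted, updated, and deleted entries.
    """
    inserted = {k: v for k, v in after.items() if k not in before}
    deleted = {k: v for k, v in before.items() if k not in after}
    updated = {
        k: v for k, v in after.items() if k in before and before[k] != v
    }
    return inserted, updated, deleted


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": datetime(2024, 1, 1)},
        {"id": 2, "updated_at": datetime(2024, 3, 1)},
    ]
    print(scan_timestamped(rows, datetime(2024, 2, 1)))
    print(diff_images({"a": 1, "b": 2}, {"b": 3, "c": 4}))
```

The timestamp scan touches only recently changed rows, whereas the image comparison must read both files in full, which is one reason it tends to be treated as a last resort.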

5 Data/Process Model and the Architected Environment
The process model applies only to the operational environment
The data model applies to both the operational environment and the data warehouse environment
A process model typically consists of the following (in whole or in part):
– Functional decomposition
– Context-level zero diagram
– Data Flow Diagram
– Structure Chart
– State Transition Diagram
– HIPO chart
– Pseudocode

6 The Data Warehouse and Data Models

7

8

9 The Data Warehouse Data Model
There are three levels of data modeling:
– High-level modeling (ERD)
– Middle-level modeling (DIS = Data Item Set)
– Low-level modeling (physical model)

10 Snapshots in the Data Warehouse
Snapshots are created as a result of some event occurring. The snapshot triggered by an event has four basic components:
– A key
– A unit of time
– Primary data that relates only to the key
– Secondary data captured as part of the snapshot process that has no direct relationship to the primary data or key
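The four components can be pictured as a simple record type. The following Python sketch is illustrative only: the class name, field names, and the shape of the triggering event are assumptions, not something the slides define.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    key: str                       # identifies the subject of the snapshot
    moment: date                   # the unit of time the snapshot refers to
    primary: Dict[str, Any]        # data that relates only to the key
    secondary: Dict[str, Any] = field(default_factory=dict)  # incidental data


def snapshot_from_event(
    event: Dict[str, Any], context: Optional[Dict[str, Any]] = None
) -> Snapshot:
    """Build a snapshot from a hypothetical operational event.

    'event' is assumed to carry 'key' and 'date' entries; every other entry
    is treated as primary data. 'context' holds whatever secondary data
    happened to be available when the snapshot was taken.
    """
    primary = {k: v for k, v in event.items() if k not in ("key", "date")}
    return Snapshot(event["key"], event["date"], primary, dict(context or {}))


if __name__ == "__main__":
    order = {"key": "ORD-1", "date": date(2024, 5, 2), "amount": 120}
    print(snapshot_from_event(order, {"prime_rate": 0.05}))
```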

11 Complexity of Transformation and Integration At first glance, when data is moved from the legacy environment to the data warehouse environment, it appears that nothing more is going on than simple extraction of data from one place to the next

12 Complexity of Transformation and Integration (cont’d)
Some of the functionality required as data passes from the operational, legacy environment to the data warehouse environment:
– The extraction of data from the operational environment to the data warehouse environment requires a change in technology (DBMS technology)
– The selection of data may be very complex
– Operational input keys need to be restructured and converted
– Nonkey data is reformatted
– Data is cleansed
– Multiple input sources of data exist and must be merged
– Key resolution must be done
– Input files need to be resequenced
– Default values must be supplied
– And many more…
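A few of these steps (key restructuring, nonkey reformatting, default values, merging multiple sources, and resequencing) can be sketched in Python. Everything specific here is a hypothetical choice for illustration: the key layout, the field names `id`, `opened`, and `region`, and the per-source date formats.

```python
from datetime import datetime

# Hypothetical default values supplied when a source leaves a field empty.
DEFAULTS = {"region": "UNKNOWN"}


def transform(record, source, date_format):
    """Convert one operational record into a warehouse-style record."""
    out = dict(DEFAULTS)
    # Copy over only the fields the source actually filled in.
    out.update({k: v for k, v in record.items() if v not in (None, "")})
    # Restructure the key: prefix with the source name, zero-pad the id.
    out["key"] = f"{source}-{int(record['id']):06d}"
    # Reformat nonkey data: normalize every source's date to ISO 8601.
    out["opened"] = (
        datetime.strptime(record["opened"], date_format).date().isoformat()
    )
    del out["id"]
    return out


def integrate(sources):
    """Merge several input sources and resequence the result by key.

    'sources' maps a source name to a (date_format, rows) pair.
    """
    merged = [
        transform(row, name, fmt)
        for name, (fmt, rows) in sources.items()
        for row in rows
    ]
    return sorted(merged, key=lambda r: r["key"])


if __name__ == "__main__":
    feeds = {
        "B": ("%d/%m/%Y", [{"id": "7", "opened": "31/01/2024", "region": ""}]),
        "A": ("%Y%m%d", [{"id": "12", "opened": "20240215", "region": "EU"}]),
    }
    for row in integrate(feeds):
        print(row)
```

Even this small sketch needs per-source knowledge (date formats, key layouts), which hints at why the move is more than a simple extraction.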

13 Profile Records
– Profile records represent snapshots of data, just like individual activity records
– A profile record is created from the grouping of many detailed records
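As a rough illustration of that grouping, the Python sketch below rolls detailed activity records up into one profile record per customer per month. The choice of grouping key, the field names (`customer`, `date`, `amount`), and the aggregates kept are all assumptions made for the example.

```python
from collections import defaultdict


def build_profiles(activities):
    """Group detailed activity records into monthly profile records.

    Each activity is assumed to be a dict with 'customer', 'date'
    (an ISO 'YYYY-MM-DD' string), and 'amount' entries.
    """
    profiles = defaultdict(
        lambda: {"count": 0, "total": 0.0, "first": None, "last": None}
    )
    for act in activities:
        month = act["date"][:7]  # 'YYYY-MM'
        p = profiles[(act["customer"], month)]
        p["count"] += 1
        p["total"] += act["amount"]
        # ISO date strings compare correctly as plain strings.
        if p["first"] is None or act["date"] < p["first"]:
            p["first"] = act["date"]
        if p["last"] is None or act["date"] > p["last"]:
            p["last"] = act["date"]
    return dict(profiles)


if __name__ == "__main__":
    calls = [
        {"customer": "C1", "date": "2024-03-02", "amount": 10.0},
        {"customer": "C1", "date": "2024-03-20", "amount": 5.5},
        {"customer": "C1", "date": "2024-04-01", "amount": 2.0},
    ]
    for key, profile in build_profiles(calls).items():
        print(key, profile)
```

The individual activities are lost once aggregated, so the aggregates retained in the profile should be chosen with the expected analysis in mind.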

