
1 An Introduction to Data Warehousing
Presented by Joseph M. Wilson
EPA
Use or disclosure of data contained on this sheet is subject to the restriction on the title page of this proposal or quotation.

2 In the Beginning, life was simple…

3 But…

4 Our information needs…

5 Kept growing. (The Spider web)
SOURCE: William H. Inmon

6 Purpose
To explore and discuss the purpose and principles of data warehousing.

7 Briefing Contents

8 So What Is a Data Warehouse?
- Definition: A data warehouse is the data repository of an enterprise. It is generally used for research and decision support.
- By comparison: an OLTP (online transaction processing) or operational system is used to deal with the everyday running of one aspect of an enterprise.
- OLTP systems are usually designed independently of each other, so it is difficult for them to share information.

9 Why Do We Need Data Warehouses?
- Consolidation of information resources
- Improved query performance
- Separate research and decision support functions from the operational systems
- Foundation for data mining, data visualization, advanced reporting and OLAP tools

10 What Is a Data Warehouse Used for?
- Knowledge discovery
  - Making consolidated reports
  - Finding relationships and correlations
  - Data mining
  - Examples
    - Banks identifying credit risks
    - Insurance companies searching for fraud
    - Medical research

11 How Do Data Warehouses Differ From Operational Systems?
- Goals
- Structure
- Size
- Performance optimization
- Technologies used

12 Comparison Chart of Database Types
Data warehouse | Operational system
Subject oriented | Transaction oriented
Large (hundreds of GB up to several TB) | Small (MB up to several GB)
Historic data | Current data
De-normalized table structure (few tables, many columns per table) | Normalized table structure (many tables, few columns per table)
Batch updates | Continuous updates
Usually very complex queries | Simple to complex queries

13 Design Differences
- Data Warehouse: Star Schema
- Operational System: ER Diagram

14 Supporting a Complete Solution
- Operational System: Data Entry
- Data Warehouse: Data Retrieval

15 Data Warehouses, Data Marts, and Operational Data Stores
- Data Warehouse – The queryable source of data in the enterprise. It is comprised of the union of all of its constituent data marts.
- Data Mart – A logical subset of the complete data warehouse. Often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.
- Operational Data Store (ODS) – A point of integration for operational systems that developed independent of each other. Since an ODS supports day to day operations, it needs to be continually updated.
SOURCE: Ralph Kimball
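
The idea of a data mart as a "logical subset" of the warehouse can be sketched with a database view. This is only one possible reading, not something the slides prescribe: the table `warehouse_fact`, the view `claims_mart`, and the sample rows are all hypothetical, and SQLite stands in for whatever database a real warehouse would use.

```python
import sqlite3

# Hypothetical warehouse table covering several business processes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_fact (business_process TEXT, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_fact VALUES (?, ?, ?)",
    [("claims", "A", 120.0), ("billing", "A", 40.0), ("claims", "B", 75.0)],
)

# The "data mart" here is a view that restricts the warehouse to one process.
conn.execute(
    """
    CREATE VIEW claims_mart AS
    SELECT customer, amount
    FROM warehouse_fact
    WHERE business_process = 'claims'
    """
)

claims_rows = conn.execute(
    "SELECT customer, amount FROM claims_mart ORDER BY customer"
).fetchall()
print(claims_rows)
```

A view keeps the mart in step with the warehouse automatically. A separately loaded set of tables is an equally valid way to build one.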

16 Briefing Contents

17 Building a Data Warehouse
Data Warehouse Lifecycle
- Analysis
- Design
- Import data
- Install front-end tools
- Test and deploy

18 Stage 1: Analysis
- Identify:
  - Target Questions
  - Data needs
  - Timeliness of data
  - Granularity
- Create an enterprise-level data dictionary
- Dimensional analysis
  - Identify facts and dimensions

19 Stage 2: Design
- Star schema (dimensional modeling)
- Data Transformation
- Aggregates
- Pre-calculated Values
- HW/SW Architecture
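
As a rough illustration of the "Aggregates" and "Pre-calculated Values" items, the sketch below builds a monthly summary table once, so that reports can read it instead of rescanning the detail rows. The table names `fact_sales` and `agg_sales_month` and the data are assumptions made for the example, not part of the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detail-level fact rows.
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2004-01-05", "East", 100.0),
        ("2004-01-20", "East", 50.0),
        ("2004-01-11", "West", 70.0),
        ("2004-02-02", "East", 30.0),
    ],
)

# Pre-calculate totals per month and region, e.g. after each batch load.
conn.execute(
    """
    CREATE TABLE agg_sales_month AS
    SELECT substr(sale_date, 1, 7) AS month,
           region,
           SUM(amount) AS total_amount,
           COUNT(*) AS sale_count
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), region
    """
)

monthly = conn.execute(
    "SELECT month, region, total_amount, sale_count "
    "FROM agg_sales_month ORDER BY month, region"
).fetchall()
print(monthly)
```

The trade-off is that the aggregate must be rebuilt or incrementally updated whenever new detail rows are loaded, which fits the batch-update style of a warehouse.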

20 Dimensional Modeling
- Fact Table – The primary table in a dimensional model that is meant to contain measurements of the business.
- Dimension Table – One of a set of companion tables to a fact table. Most dimension tables contain many textual attributes that are the basis for constraining and grouping within data warehouse queries.
SOURCE: Ralph Kimball
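
A minimal sketch of these two definitions, assuming a made-up schema: `fact_result` holds the measurements, while `dim_station` and `dim_date` hold the descriptive attributes. The query constrains on one dimension attribute (`year`) and groups by another (`state`). None of these names come from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_station (
        station_key INTEGER PRIMARY KEY,
        station_name TEXT,
        state TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year INTEGER,
        month INTEGER
    );
    -- The fact table stores measurements plus keys into each dimension.
    CREATE TABLE fact_result (
        station_key INTEGER REFERENCES dim_station(station_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        result_value REAL
    );
    """
)
conn.executemany(
    "INSERT INTO dim_station VALUES (?, ?, ?)",
    [(1, "River A", "VA"), (2, "Lake B", "MD")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20040101, 2004, 1), (20040201, 2004, 2)],
)
conn.executemany(
    "INSERT INTO fact_result VALUES (?, ?, ?)",
    [(1, 20040101, 7.0), (1, 20040201, 8.0), (2, 20040101, 6.5)],
)


def average_by_state(connection, year):
    """Constrain on a date attribute and group by a station attribute."""
    return connection.execute(
        """
        SELECT s.state, AVG(f.result_value)
        FROM fact_result f
        JOIN dim_station s ON s.station_key = f.station_key
        JOIN dim_date d ON d.date_key = f.date_key
        WHERE d.year = ?
        GROUP BY s.state
        ORDER BY s.state
        """,
        (year,),
    ).fetchall()


state_averages = average_by_state(conn, 2004)
print(state_averages)
```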

21 Stage 3: Import Data
- Identify data sources
- Extract the needed data from existing systems to a data staging area
- Transform and Clean the data
  - Resolve data type conflicts
  - Resolve naming and key conflicts
  - Remove, correct, or flag bad data
  - Conform Dimensions
- Load the data into the warehouse
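
The extract, transform, and load steps above could look roughly like the following. Everything here is an assumption made for illustration: the two staged source systems, their field names (`cust_id`, `customer_no`), the source-prefixed key scheme, and the `fact_payment` table.

```python
import sqlite3

# Hypothetical rows already extracted into a staging area.
# System A stores amounts as text and names its key "cust_id";
# system B stores numbers and names its key "customer_no".
staged_a = [{"cust_id": "17", "amount": "12.50"}, {"cust_id": "18", "amount": "n/a"}]
staged_b = [{"customer_no": 19, "amount": 3.0}]


def transform(row, key_field, source):
    """Map one staged row onto a common key name and numeric type."""
    try:
        amount = float(row["amount"])  # resolve the data type conflict
        is_bad = 0
    except (TypeError, ValueError):
        amount = None  # keep the row, but flag it as bad data
        is_bad = 1
    # Prefix keys with the source system to avoid key collisions.
    return (f"{source}-{row[key_field]}", amount, is_bad)


cleaned = [transform(r, "cust_id", "A") for r in staged_a]
cleaned += [transform(r, "customer_no", "B") for r in staged_b]

# Load the cleaned rows into the warehouse table in one batch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_payment (customer_key TEXT, amount REAL, is_bad INTEGER)"
)
conn.executemany("INSERT INTO fact_payment VALUES (?, ?, ?)", cleaned)
conn.commit()

loaded = conn.execute(
    "SELECT customer_key, amount, is_bad FROM fact_payment ORDER BY customer_key"
).fetchall()
print(loaded)
```

Flagging the unparseable amount rather than dropping the row is just one of the three options the slide lists (remove, correct, or flag).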

22 Importing Data Into the Warehouse
[Figure: Operational Systems (source systems)]

23 Stage 4: Install Front-end Tools
- Reporting tools
- Data mining tools
- GIS
- Etc.

24 Stage 5: Test and Deploy
- Usability tests
- Software installation
- User training
- Performance tweaking based on usage

25 Special Concerns
- Time and expense
- Managing the complexity
- Update procedures and maintenance
- Changes to source systems over time
- Changes to data needs over time

26 Briefing Contents

27 Goals of the STORET Central Warehouse
- Improved performance and faster data retrieval
- Ability to produce larger reports
- Ability to provide more data query options
- Streamlined application navigation

28 Old Web Application Flow

29 Central Warehouse Application Flow
Search Criteria Selection -> Report Size Feedback / Report Customization -> Report Generation
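
One way to picture this three-step flow is to count the matching rows first, report that size back to the user, and only then build the report with the columns they chose. The sketch below is not the STORET implementation: the `results` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (state TEXT, characteristic TEXT, value REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("VA", "pH", 7.1), ("VA", "Temperature", 18.0), ("MD", "pH", 6.9)],
)

# Columns a user may pick during report customization.
ALLOWED_COLUMNS = ("state", "characteristic", "value")


def report_size(connection, state):
    """Step 2: tell the user how many rows the search criteria match."""
    row = connection.execute(
        "SELECT COUNT(*) FROM results WHERE state = ?", (state,)
    ).fetchone()
    return row[0]


def generate_report(connection, state, columns):
    """Step 3: build the report using only the allowed, chosen columns."""
    chosen = [c for c in columns if c in ALLOWED_COLUMNS]
    if not chosen:
        raise ValueError("no valid report columns selected")
    sql = f"SELECT {', '.join(chosen)} FROM results WHERE state = ? ORDER BY value"
    return connection.execute(sql, (state,)).fetchall()


size = report_size(conn, "VA")  # step 1 is the choice of state = "VA"
report = generate_report(conn, "VA", ["characteristic", "value"])
print(size, report)
```

Returning the row count before generation lets a user narrow the criteria instead of waiting on an unexpectedly large report.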

30 STORET Central Warehouse: Web Application Demo
http://epa.gov/storet/dw_home.html

31 STORET Central Warehouse – Potential Future Enhancements
- More query functionality
- Additional report types
- Web Services
- Additional source systems?

32 Data Warehouse Components
SOURCE: Ralph Kimball

33 Data Warehouse Components – Detailed
SOURCE: Ralph Kimball

34 Briefing Contents

