Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 15 Data Warehousing, OLAP, and Data Mining

Similar presentations


Presentation on theme: "Chapter 15 Data Warehousing, OLAP, and Data Mining"— Presentation transcript:

1 Chapter 15 Data Warehousing, OLAP, and Data Mining

2 Introduction Data, data, data…everywhere!
Information…that’s another story! Especially, the right the right time! Data warehousing’s goal is to make the right information the right time Data warehousing is a data store (eg., a database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes

3 Introduction Data warehouses are natural allies for data mining (work together well) Data mining can help fulfill some of the goal of data warehouses – right the right time Relational database management systems (RDBMS), such as Oracle, DB2, Sybase, Informix, Focus, SQL Server, etc. are often used for data warehousing

4 Definitions of a Data Warehouse
“A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process” 1. - W.H. Inmon “A copy of transaction data, specifically structured for query and analysis” 2. - Ralph Kimball

5 Data Warehouse For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW) DW allows an organization (enterprise) to remember what it has noticed about its data Data Mining techniques make use of the data in a Data Warehouse

6 Data Warehouse Enterprise “Database” Transactions Data Data Mining
Customers Orders Transactions Vendors Etc… Etc… Data Miners: “Farmers” – they know “Explorers” - unpredictable Copied, organized summarized Data Warehouse Data Mining

7 Data Warehouse A data warehouse is a copy of transaction data specifically structured for querying, analysis, reporting, and more rigorous data mining Note that the data warehouse contains a copy of the transactions which are not updated or changed later by the transaction system Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse

8 Data Mart A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. A Data Mart typically reflects the business rules of a specific business unit within an enterprise.

9 Data Warehouse to Data Mart
Decision Support Information Data Mart Data Warehouse Decision Support Information Data Mart Decision Support Information Data Mart

10 Generic Architecture of Data
(synonym) Transaction data

11 Transaction (Operational) Data
Operational (production) systems create (massive number of) transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, etc… Transactions are the base level of data – the raw material for understanding customer behavior Unfortunately, operational systems change due to changing business needs Fortunately, operational systems can usually be changed to support changing business needs Data warehousing strategies need to be aware of operational system changes

12 Operational Summary Data
Summaries are for a specific time period and utilize the transaction data for that time period Other Examples???

13 Decision Support Summary Data
The data that are used to help make decisions about the business Financial Data, such as: Income Statements (Profit & Loss) Balance Sheets (Assets – Liabilities = Net Worth) Sales summaries Other examples??? Data warehouses maintain this type of data, however financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing???) Generally, it is a bad idea to use the same system for analytic and operational purposes

14 Database Schema Database schema defines the structure of data, not the values of the data (e.g., first name, last name = structure; Ron Norman = values of the data) In RDBMS: Columns = fields = attributes (A,B,C) Rows = records = tuples (1-7)

15 Logical & Physical Database Schema
Describes data in a way that is familiar to business users Describes the data the way it will be stored in an RDBMS which might be different than the way the logical shows it

16 Metadata General definition: Data about data !!!
Examples: A library’s card catalog (metadata) describes publications (data) A file system maintains permissions (metadata) about files (data) A form of system documentation including: Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.) Description of the contents of each field (e.g., start date) Date when data were loaded Indication of currency of the data (last updated) Mappings between systems (e.g., A.this = B.that) Invaluable, otherwise have to research to find it

17 Business Rules Highest level of abstraction from operational (transaction) data Describes why relationships exist and how they are applied Examples: Need to have 3 forms of ID for credit Only allow a maximum daily withdrawal of $200 After the 3rd log-in attempt, lock the log-in screen Accept no bills larger than $20 Others???

18 General Architecture for Data Warehousing
Source systems Extraction, (Clean), Transformation, & Load (ETL) Central repository Metadata repository Data marts Operational feedback End users (business)

19 Where does OLAP fit in?

20 OLAP Overview Interactive, exploratory analysis of multidimensional data to discover patterns

21 OLAP Architecture

22 Server Options Single processor Symmetric multiprocessor (SMP)
Massively parallel processor (MPP)

23 OLAP Server Options ROLAP (Relational) MOLAP (Multidimensional)
HOLAP (Hybrid)

24 OLAP – Online Analytical Processing
A definition: Data representation is in the form of a CUBE OLAP goes beyond SQL with its analysis capabilities Key feature of OLAP: Relevant multi-dimensional views such as products, time, geography

25 OLAP Cube - 1

26 OLAP Cube - 2

27 OLAP Cube - 3 Star Structure (quite common)

28 OLAP Cube - 4 The Cube

29 OLAP Cube - 5 Three- Dimensional Cube Display

30 OLAP Cube - 6 Six- Dimensional Cube

31 Rotation (Pivot Table)

32 Drill Down

33 OLAP Examples Excel Pivot Table example (similar to OLAP cube)

34 Sample of OLAP products
Just a snippet from ; not an endorsement

35 Data Mining versus OLAP

36 Data Mining versus OLAP
OLAP - Online Analytical Processing Provides you with a very good view of what is happening, but can not predict what will happen in the future or why it is happening

37 Results of Data Mining Include:
Forecasting what may happen in the future Classifying people or things into groups by recognizing patterns Clustering people or things into groups based on their attributes Associating what events are likely to occur together Sequencing what events are likely to lead to later events

38 End of Chapter 15


Download ppt "Chapter 15 Data Warehousing, OLAP, and Data Mining"

Similar presentations


Ads by Google