Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.

1 Data Mining and Data Warehousing: Concepts and Techniques
Course outline:
- What is a Data Warehouse?
- Data Warehouse vs. other systems, OLTP vs. OLAP
- Conceptual Modeling of Data Warehouses
- Defining a Snowflake Schema in Data Mining Query Language (DMQL)
- Multi-Tiered Architecture
- Approaches to Building an OLAP Server
- Indexing OLAP Data: Bitmap Index
- Data Warehouse Back-End Tools and Utilities
- From OLAP to On-Line Analytical Mining (OLAM); an OLAM Architecture

2 What is a Warehouse?
Collection of diverse data:
- subject oriented
- aimed at executives and decision makers
- often a copy of operational data
- with value-added data (e.g., summaries, history)
- integrated
- time-varying
- non-volatile

3 What is a Warehouse?
Collection of tools for:
- gathering data
- cleansing, integrating, ...
- querying, reporting, analysis
- data mining
- monitoring and administering the warehouse

4 Warehouse Architecture
[Figure: warehouse architecture diagram with the components Client, Query & Analysis, Warehouse, Metadata, Integration, and Source]

5 Multi-Tiered Architecture
[Figure: data warehouse software architecture. Data Sources (Operational DBs, Other sources); Extract, Transform, Load (ETL), Refresh; Monitor & Integrator; Metadata; Data Storage (Data Warehouse, Data Marts); Serve; OLAP Engine (OLAP Server); Front-End Tools (Analysis, Query, Reports, Data mining)]

6 What is a Data Warehouse?
Defined in many different ways, but not rigorously.
- A decision support database that is maintained separately from the organization’s operational database
- Supports information processing by providing a solid platform of consolidated, historical data for analysis
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
Data warehousing: the process of constructing and using data warehouses.
[Figure: knowledge discovery process with the stages Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation, Knowledge]

7 Data Warehouse (1/2)
Subject Oriented
- Organized around major subjects, such as customer, product, sales.
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
- Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Integrated
- Integrates multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
- Data cleaning and data integration techniques are applied.
- Ensures consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources (e.g., hotel price: currency, tax, breakfast covered, etc.).
- When data is moved to the warehouse, it is converted.
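The hotel-price example can be sketched in Python as follows. This is only an illustration of converting records on the way into the warehouse: the two source layouts, the field names, the exchange rate, and the tax rate are all assumptions made up for the sketch, not taken from the slides.

```python
from dataclasses import dataclass

# Hypothetical exchange rate and tax rate, chosen only for illustration.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}
TAX_RATE = 0.10


@dataclass(frozen=True)
class HotelPrice:
    """Unified warehouse record: price is always in USD with tax included."""
    hotel: str
    price_usd: float
    breakfast_included: bool


def from_source_a(row: dict) -> HotelPrice:
    """Assumed source A: price in EUR, tax included, breakfast coded 'Y'/'N'."""
    return HotelPrice(
        hotel=row["hotel_name"].strip().title(),
        price_usd=round(row["price_eur"] * RATES_TO_USD["EUR"], 2),
        breakfast_included=(row["bkfst"] == "Y"),
    )


def from_source_b(row: dict) -> HotelPrice:
    """Assumed source B: price in USD before tax, breakfast as a boolean."""
    return HotelPrice(
        hotel=row["name"].strip().title(),
        price_usd=round(row["net_usd"] * (1 + TAX_RATE), 2),
        breakfast_included=row["breakfast"],
    )


# Rows from both sources end up in one consistent representation.
warehouse_rows = [
    from_source_a({"hotel_name": "grand plaza ", "price_eur": 80.0, "bkfst": "Y"}),
    from_source_b({"name": "Sea View", "net_usd": 100.0, "breakfast": False}),
]
```

After conversion, both rows share the same naming, currency, tax treatment, and breakfast encoding, so they can be compared directly.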

8 Data Warehouse (2/2)
Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems.
  - Operational database: current value data.
  - Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years).
- Every key structure in the data warehouse contains an element of time, explicitly or implicitly, but the key of operational data may or may not contain a “time element”.
Non-Volatile
- A physically separate store of data transformed from the operational environment.
- Operational update of data does not occur in the data warehouse environment.
  - Does not require transaction processing, recovery, and concurrency control mechanisms.
  - Requires only two operations in data accessing: initial loading of data and access of data.
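A minimal Python sketch of these two properties, under assumed names: `SnapshotStore`, its methods, and the account data are hypothetical. Every key includes a date (time variant), and the store offers only loading and reading, with no in-place update (non-volatile).

```python
from datetime import date


class SnapshotStore:
    """Append-only store: rows are loaded and read, never updated in place."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, date], float] = {}

    def load(self, account: str, as_of: date, balance: float) -> None:
        key = (account, as_of)  # the key carries an explicit time element
        if key in self._rows:
            raise ValueError(f"snapshot {key} already loaded; rows are not updated")
        self._rows[key] = balance

    def history(self, account: str) -> list[tuple[date, float]]:
        """Read access: all snapshots for one account, oldest first."""
        return sorted((d, b) for (a, d), b in self._rows.items() if a == account)


store = SnapshotStore()
store.load("acct-1", date(2023, 1, 31), 500.0)
store.load("acct-1", date(2023, 2, 28), 650.0)
```

An operational system would instead overwrite the single current balance; here each month's value is kept as a separate row, so the history remains available for analysis.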

9 Motivating Examples
- Forecasting
- Comparing performance of units
- Monitoring, detecting fraud
- Visualization

10 Why a Warehouse?
Two approaches:
- Query-driven
- Warehouse

11 Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration:
- Build wrappers/mediators on top of heterogeneous databases.
- Query-driven approach: when a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set.
- Requires complex information filtering, and queries compete for resources at the sources.
Data warehouse: update-driven, high performance.
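The contrast between the two approaches can be sketched in Python. Everything here is hypothetical: the two toy sources, the wrapper functions, and the names `mediator_query` and `extract_all` are assumptions for illustration, not an actual mediator system.

```python
from typing import Callable, Iterable

# A wrapper translates a global query (here: a product name) into a
# source-specific lookup and returns rows in a shared (product, qty) shape.
Wrapper = Callable[[str], Iterable[tuple[str, int]]]

# Two hypothetical sources with different native layouts.
SOURCE_A = {"widget": 3, "gadget": 5}      # dict: product -> quantity
SOURCE_B = [("WIDGET", 4), ("BOLT", 10)]   # list of rows with upper-case names


def wrap_a(product: str):
    if product in SOURCE_A:
        yield (product, SOURCE_A[product])


def wrap_b(product: str):
    for name, qty in SOURCE_B:
        if name.lower() == product:
            yield (product, qty)


def mediator_query(product: str, wrappers: list[Wrapper]) -> int:
    """Query-driven: contact every source at query time and merge the answers."""
    return sum(qty for wrapper in wrappers for _, qty in wrapper(product))


def extract_all() -> dict[str, int]:
    """Update-driven: copy and integrate the source data ahead of any query."""
    totals: dict[str, int] = {}
    rows = list(SOURCE_A.items()) + [(n.lower(), q) for n, q in SOURCE_B]
    for name, qty in rows:
        totals[name] = totals.get(name, 0) + qty
    return totals


warehouse = extract_all()                                  # built once, in advance
live_answer = mediator_query("widget", [wrap_a, wrap_b])   # touches both sources
stored_answer = warehouse["widget"]                        # touches no source
```

Both paths give the same answer for this data; the difference is when the sources are contacted. The mediator hits every source per query, while the warehouse lookup works even if the sources are unavailable, at the cost of possibly stale data.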

12 Query-Driven Approach
[Figure: query-driven architecture diagram with the components Client, Mediator, Wrapper, and Source]

13 Advantages of Warehousing
- High query performance
- Queries not visible outside warehouse
- Local processing at sources unaffected
- Can operate when sources unavailable
- Can query data not stored in a DBMS
- Extra information at warehouse
  - Modify, summarize (store aggregates)
  - Add historical information

14 Advantages of Query-Driven
- No need to copy data
  - less storage
  - no need to purchase data
- More up-to-date data
- Query needs can be unknown
- Only query interface needed at sources
- May be less draining on sources

15 Data Warehouse vs. Operational DB Systems
OLTP (on-line transaction processing)
- Describes processing at operational sites
- Major task of traditional relational DBMS: most database operations are of a type called OLTP.
- Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
- Major task of data warehouse system
- Data analysis and decision making

16 Data Warehouse vs. Operational DB Systems
Distinct features (OLTP vs. OLAP):
- User and system orientation: customer vs. market
- Data contents: current, detailed vs. historical, consolidated
- Database design: ER + application vs. star + subject
- View: current, local vs. evolutionary, integrated
- Access patterns: update vs. read-only but complex queries
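The difference in access patterns can be illustrated with Python's standard `sqlite3` module. This is a sketch only: the `sales` table and its rows are invented, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [
        ("north", 2022, 100.0),
        ("north", 2023, 150.0),
        ("south", 2022, 80.0),
        ("south", 2023, 120.0),
    ],
)

# OLTP-style access: a short transaction that updates one current record.
with conn:
    conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = ?", (1,))

# OLAP-style access: a read-only query that scans and consolidates many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The update touches a single row by key, while the aggregate reads the whole table and returns one consolidated row per region.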

17 OLTP vs. OLAP - Common architecture
- Local databases, say one per branch store, handle OLTP, while a warehouse integrating information from all branches handles OLAP.
- The most complex OLAP queries are often referred to as data mining.

18 OLAP – Online Analytical Processing
A definition:
- Data representation is in the form of a CUBE
- OLAP goes beyond SQL with its analysis capabilities
- Key feature of OLAP: relevant multi-dimensional views such as products, time, geography
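A small Python sketch of the cube idea follows. The dimension names, the fact rows, and the `build_cube` helper are assumptions for illustration. Each fact is aggregated into every combination of kept and rolled-up dimensions, with `"*"` standing for "all values" of a dimension.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "quarter", "region")

# Hypothetical fact rows: (product, quarter, region, sales amount).
FACTS = [
    ("tv", "Q1", "east", 100),
    ("tv", "Q2", "east", 120),
    ("tv", "Q1", "west", 90),
    ("radio", "Q1", "east", 30),
]


def build_cube(facts):
    """Aggregate the sales amount for every subset of dimensions."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *coords, amount in facts:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                # Dimensions that are not kept are rolled up to "*" (all).
                cell = tuple(coords[i] if i in kept else "*" for i in range(n))
                cube[cell] += amount
    return cube


cube = build_cube(FACTS)

total_sales = cube[("*", "*", "*")]        # grand total over all dimensions
tv_sales = cube[("tv", "*", "*")]          # one product, all quarters and regions
q1_east_sales = cube[("*", "Q1", "east")]  # all products for one quarter and region
```

Precomputing every cell like this trades storage for fast lookups; with many dimensions or many distinct values the number of cells grows quickly, so this exhaustive approach only suits small examples.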

19 Why Separate Data Warehouse?
High performance for both systems:
- DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
- Warehouse, tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
Different functions and different data:
- Missing data: decision support requires historical data, which operational DBs do not typically maintain.
- Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources.
- Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.

