Presentation is loading. Please wait.

Presentation is loading. Please wait.

Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema.

Similar presentations


Presentation on theme: "Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema."— Presentation transcript:

1 Background of OLAP, DW, Data Mining

2 Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema –Fact, dimension –Dimensional modeling –Data mining

3 Introductory Concepts OLAP Data Warehouse Data Mining Multidimensional Cube Star Schema Relevance of Terms

4 Introductory Concepts OLAP(On-Line Analytical Processing) –Complex query processing for decision making –Report generation for the summary/aggregate /statistics over source data –Advanced data analysis techniques OLTP(On-Line Transaction Processing) –Simple transaction processing for regular (daily) business operations –Manipulating databases maintained for the organization’s activities –Optimized for simple operations (select,insert...)

5 Introductory Concepts 본부 현업 부서 정보 시스템 총장, 교육부, 국회 업무 데이타베이스 OLTP DB Transaction 요청 통계 자료 요청 보고서 요청

6 Introductory Concepts 본부 현업 부서 정보 시스템 총장, 교육부, 국회 업무 데이타베이스 OLTP DB Transaction 요청 통계 자료 요청 보고서 요청 Solution (1) 통계 / 보고서 생성 모듈

7 Introductory Concepts 본부 현업 부서 정보 시스템 총장, 교육부, 국회 업무 데이타베이스 OLTP DB Transaction 요청 통계 자료 요청 보고서 요청 Solution (2) 임의의 통계 생성 도구 OLAP 시스템

8 Introductory Concepts 본부 현업 부서 정보 시스템 총장, 교육부, 국회 업무 데이타베이스 OLTP DB Transaction 요청 통계 자료 요청 보고서 요청 Solution (2) OLAP 시스템 분석용 / 요약 자료 Data Warehouse

9 Introductory Concepts OLAP systems/tools –Present summary/aggregate data for a large OLTP databases in several ways to decision makers –Manipulate DW over several dimensions dynamically to find useful information Data warehouse –A collection of data which are used for OLAP –Summary/aggregate data extracted from source OLTP databases which can support every possible OLAP requests. –Represented by star schema

10 Introductory Concepts OLAP Data Warehouse Data Mining Multidimensional Cube Star Schema Relevance of Terms

11 Page 11

12 Introductory Concepts OLTP(Online Transaction Processing) –Typical daily business query and update processing OLAP (Online Analytical Processing) –Complex query processing or report generation Advanced data analysis techniques ROLAP: Relational OLAP MOLAP: Multi-dimensional OLAP Data Warehouse –An enterprise-wide data repository for decision support Data Mart –A smaller targeted DW for a business process Star Schema –A DB structure for data warehouse

13 Data Warehouse DW is an integrated repository of data that is put into a form that can be easily understood, interpreted, and analyzed by the people who need to use it to make decisions Data are extracted from operational systems, then cleansed, integrated, transformed, and aggregated, into a read-only database that is optimized for decision making Data Warehouse is a –Subject-oriented –Integrated –Time-variant –Non-volatile –Collection of data in support of management’s decision- making process (W.H. Inmon)

14 Data Warehouse Motivation for DW –An enterprise-wide repository of data, information, knowledge, and meta-data Gather all the information into a single place for in-depth analysis Decouple such analysis from OLTP systems –Transform the data into information Provide right information in the right format at the right time –Perform sophisticated analysis of data Perform trend analysis, time series analysis, risk analysis, etc. Perform DSS exploration such as alternative formation, alternatives testing, decision-making, etc. Discover/visualize hidden facts, patterns, correlations, rules, exceptions using data mining techniques

15 Data Warehouse OLTP queries –How many shoes did we sell last month? –What are the age, address, phone of a certain student (e.g., Hong Gildong)

16 Data Warehouse OLAP queries –How many size 10 shoes in red did we sell last month in the Midwest, the Northeast, compared with the same month last year, actual vs. budget? –What are the top 25 brands, by products, styles, and regions, for this period for total US based on sales dollars? –How much promotional expenses did we spend on customers who purchased less than $100 worth of products? –How much discount should we offer to boost the sales volume significantly? –Find the correlation between buying patterns of products of type A and those of type B. –What are sales trends? –What percent of the market do we own? –How are our defect rates improving? –Are our profits are increasing or decreasing?

17 OLTPData Warehouses & OLAP PurposeDaily business support, Transaction processing Decision support, Analytic processing UserData entry clerk, administrator, developer Decision maker, executives DB designApplication orientedSubject-oriented DB design modelER modelStar, snowflake, multidimensional model Data structuresNormalized, complexDenormalized, simple DataCurrent, up-to-date operational data Atomic Isolated Historical Atomic and summarized Integrated UsageRepetitive, RoutineAd-hoc UpdateTransaction constantly generate New data, read/write Data is relatively static Often refreshed weekly, read mostly Response timeSubsecond to secondSeconds, minutes, worse Index typesB+ treesB+ trees, bitmap index, join index Systems requirements Transaction throughput, Data consistency Query throughput, Data accurary Comparison of OLTP and Data Warehouse –DWs require a new query-centric view of the data

18 Data Warehouse Example Benefits of Data Warehousing & Data Mining –Fast sophisticated report generations –4-15 times faster delivery –Young men buy beer on Friday nights when they buy diapers –More athletic shoes are sold on Friday evenings and Saturdays than during the rest of the week combined –More athletic shoes are sold when white tube socks are prominently displayed as part of a 2-for-1 sale. –In a retail chain, potato chip purchases were accompanied by a soda purchase in half the cases. That figure increases to 75% when there is a marketing promotion. –Blue Cross found some providers had superior treatment success rates for some fatal diseases –Victoria Secret found a particular incentive was ineffective, saving $300K per week.

19 Data Warehouse Database Design for DW –Objective of a DW Creating a database optimized for decision support –Limitations of ER model for DW applications Normalized data model support large numbers of transactions with very few records. ER models tend to be very complex and difficult to navigate. ER model identifies first entities, then relationships –Four basic requirements of a warehouse design The schema must be simple. The data must be clean, consistent, and accurate Query processing must be fast Load the data into the warehouse quickly –Two types of data representation in DW Star Schema Multi-dimensional array

20 Star Schema Dimensional Model (Star Schema)Star Schema –A database schema for data warehousing –Initially developed to simplify SQL queries (by Ralph Ki mball) –Consists of a few central fact tables and many dimensi on tables Dimension = analysis criteria Fact = measurements aggregated over dimensions to be analyzed –Simplifies end-user query processing and high query p erformance –Used to reduce joins by OLAP/Relational engines –Relatively few tables and well-defined join paths

21 Star Schema

22 Dimensional Model (Star Schema) consists of: –Fact table:Fact table: Stores all transactions or factual data that are analyzed Typically numeric measures From millions to more than billion rows Example: Revenue, Actuals, Budgets, Sales, Orders, Bookings, Claims –Dimension table:Dimension table: Attributes about facts Supports grouping, browsing, constraining Provides the entry points into the DW Example: Time, Customer, Promotions, Demographics, LifeStyles, Prod ucts, Stores, Markets

23 Star Schema Dimension hierarchy –Each dimension table has several attributes –These attributes may have a hierarchy Different level of aggregation for fact data –User may want aggregate data by some level of the di mension hierarchy Year Month day State County City Street

24 Star Schema The Strengths of Dimensional Modeling (Kimball 98) –Provides a predictable, standard framework Report writers, query tools, and UIs could take advantage of DM structure Supports presentation and performance Simplifies the understanding and navigation of metadata –Is robust against unexpected changes in user behavior Logical design is independent of the use of the schema All dimensions are equally entry points to the system –Can be changed gracefully Could add new unanticipated facts of the same grain Could add new dimensions Add new unanticipated dimensional attributes Break existing dimension records down to a lower level of granularity –Availability of common modeling situations Slowly changing dimensions Heterogeneous products Pay-in-advance databases Factless facts as in event-handling –Availability of administrative utilities and SW processes that manage an d use aggregates

25 Star Schema Variations of Star Schema –Star SchemaStar Schema All dimensions are denormalizeddenormalized Wide dimensions and deep facts –Snowflake Schema All dimensions are normalized into 3NF

26 Star Schema Summary of DW and Star Schema –DW is databases specially maintained for analytical processing –DW is organized by facts and dimensions –Star schema is a way of representing DW schema Cube, Data Mining??

27 Page 27

28 Multidimensional Cube Motivation of Cube –Similar OLAP queries can be executed repeatedly to find some valuable information. –Users want to view data in several different perspectives –It is time-consuming to calculate these data which use similar facts and dimensions –Cube is a special data structure to meet this kind of OLAP queries’ requirements dim1 dim2 dim3 dim4 dim5 dim6 dim7 dim8 dim9 dim10 fact1 fact2 fact3 dim1 dim2 dim7 fact1 dim1 dim7 fact1 dim1fact1

29 Multidimensional Cube –Maintain every possible pairs of dimensions and aggregation of fact data –Types of OLAP ROLAP : maintaining cube by relational table MOLAP : maintaining cube by multidimensional array dim1 dim2 dim3 dim4 dim5 dim6 dim7 dim8 dim9 dim10 fact1 fact2 fact3 dim1 dim7 dim2 Query1 Query2 Query3

30 Multidimensional Cube –Maintain every possible pairs of dimensions and aggregation of fact data –Types of OLAP ROLAP : maintaining cube by relational table MOLAP : maintaining cube by multidimensional array dim1 dim2 dim3 dim4 dim5 dim6 dim7 dim8 dim9 dim10 fact1 fact2 fact3 Query1 Query2 Query3

31 Multidimensional Cube Multidimension Cube –Consist of several dimensions and facts –Its schema can be represented by star schema –Two different perspectives Cube is a part of data warehouses Cube is a data structure for OLAP tool which is extracted from data warehouses

32 Introductory Concepts OLAP Operations –Drill down. View the ata in a more specialized level within a dimension Show me total sales for by quarter Show me total sales for this quarter by department Show me total sales for this quarter, dept 22, by state –Roll-up. Converse of drill-down Drilling down is adding new headers from the dimension tables Rolling up is moving row headers –Slice and dice. Slice (dice) data on columns (rows), eliminating the rest of the display. Keep a subset of dimensions for selected values. –Pivot. Switching the arrangement of the dimensions –Cross-tab. A matrix report with X and Y axis –Other Operations Ranking and computed attribute definitions Multiple granularity aggregates (e.g., cube operator)

33 Introductory Concepts OLAP Data Warehouse Data Mining Multidimensional Cube Star Schema Relevance of Terms

34 Page 34 Data Warehouse

35 Page 35 OLAP tools

36 Data Mining What is Data Mining? –Is the process of knowledge discovery (useful information, patterns, groups, etc.) from a large of data. –Is the principle of soring through large amounts of data and picking out relevant information (by Wikipedia). –Is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. –Is the science of extracting useful information from large data sets or databases. –Data may be stored in files, databases, webs, data warehouses, etc.

37 Data Mining Data Mining algorithm Mining Algorithm Large input data Knowledge Information Data Warehouse Multidimen- sional Cube OLAP tools

38 Summary OLAP Data Warehouse Data Mining Multidimensional Cube Star Schema Relevance of Terms

39 Page 39


Download ppt "Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema."

Similar presentations


Ads by Google