Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chap8: Trends in DBMS 8.1 Database support for Field Entities 8.2 Content-based retrieval 8.3 Introduction to spatial data warehouses 8.4 Summary.

Similar presentations


Presentation on theme: "Chap8: Trends in DBMS 8.1 Database support for Field Entities 8.2 Content-based retrieval 8.3 Introduction to spatial data warehouses 8.4 Summary."— Presentation transcript:

1 Chap8: Trends in DBMS 8.1 Database support for Field Entities 8.2 Content-based retrieval 8.3 Introduction to spatial data warehouses 8.4 Summary

2 Learning Objectives Learning Objectives (LO) LO1: Learn about field data LO2 : Learn about storage and retrieval of field data LO3: Learn about spatial data warehouses What are data warehouses? Why are they interesting? What are aggregate functions? Which ones are easy to compute? Mapping Sections to learning objectives LO1-8.1 LO2- 8.1.2, 8.2 LO3-8.3

3 8.3 Why are Data Warehouses Interesting? Data Warehouse facilitate group decision making Consider a dataset 1 measure (i.e. Sales) 3 dimensions (e.g. Company, Year, Region) Analysis questions Q1. Rank Regions by total sales. Q2. Rank years by total sales. Q3. Where are sales consistently growing? Cross tabulates summaries reports used to analyze the trends Example:

4 8.3 Generating cross-tabulation summaries Traditional Approach Use custom software pulling data out of a DBMS Limitations: redundant of work, inefficient use of resources Data Warehouse approach Cross-tab. Can be generated using a set of simple report Each report is generated from a SQL “Select... group by” statement Example: Fig. 8.19 (pp. 244) and Table 8.3 (pp. 245) Cross-tab example in last slide is a union of SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2 Table 8.3 shows SQL queries to compute each part Advantage Rest of SQL is available for pre/post processing of data Performance gains by eliminating unnecessary copying of data

5 Example Data Warehouse (Fig. 8.19) Fig 8.19

6 8.3.4 Cross-tabulation vs.report hierarchy Spreadsheet view of a report Views a report a N-dim. Spreadsheet N = number of dimension attributes Each cell contains value of “measure” Cross-tabulation view of a Report hierarchy Example: report hierarchy for SALES-L0-A, SALES-L2-A, SALES-L1-B, SALES-L2, Fig. 8.19 (pp. 244)

7 8.3 What is a Data Warehouse? Data Warehouse is a special purpose database Primarily used for specialized data analysis purposes Facilitates generation and navigation of a hierarchy of reports Special purpose data-sets and queries Data consists of a few measure attributes a set of dimension attributes The measure attribute depends on dimension attributes Queries generate reports Report measure for selected values of dimensions Aggregate measure for given subset of dimensions What is a spatial data warehouse? Data warehouses with spatial measures or dimensions Example: census data - census tract is a spatial dimension Example: logistics data - route is a spatial dimension

8 8.3.4 Data Warehouse Operations Operations on a data warehouse Roll-up, Drill-down Slice, Dice Pivot Roll-up Inputs: A report R, A subset S of dimensions in R Output: A sequence of reports summarizing R Example 1: R = SALES-Base, S = (Year, Region) in Fig. 8.19 (pp. 244) Output consists of reports SALES-L0-A, SALES-L1-B, SALES-L2 Example 2: R = SALES-Base, S = (Region, Year) Output consists of reports SALES-L0-A, SALES-L1-A, SALES-L2 Drill-down Inputs: A report R, A dimension D not in R Output: A reports detailing R on D Example: R = SALES-L1-B, D = Region in Fig. 8.19 (pp. 244) Output : report SALES-L0-A

9 8.3.4 Data Warehouse Operations Slice, Dice Reduce dimensions in a table- (Fig. 8.7, pp 232). Inputs: A report R, A value V for a dimension D in R Output: A subset of R where D =V Example: R = SALES-L0-A, D = Year, V = 1994 in Fig. 8.19 (pp. 244) Output: Table 8.5 (pp. 246) includes tuple (ALL, 1994, America, 35) Fig 8.7

10 8.3.4 Data Warehouse Operations Pivot For a spreadsheet view of reports Transposes a spreadsheet Example Inputs: A spreadsheet view of a report R Output: A transposed spreadsheet Ex.: R= SALES-L0-A, Fig. 8.19 (pp. 244)

11 Logical Data Model of a DWH Purpose of a logical data model Specify a framework to specify computational structure Allow extension of SQL to model new needs Cube operation Input : A fact table Output: A set of summary reports covering all subsets of dimension columns Equivalent to union of all tables and reports in Fig. 8.19 (pp. 244) Ex. Fig. 8.18, pp. 243 SELECT Company, Year, Region, Sum(Sales) AS Sales FROMSALES GROUP BY CUBE Company, Year, Region

12 Fig 8.18

13 Physical Data Model of a DWH Purpose: Computationally efficient implementation Ideas: Pre-computation - pre-compute some of reports and use those to compute other reports New indexing methods, e.g. bit-map index Query Processing Strategies Strategies for aggregate functions New strategies for multi-table joins Let us look at strategies for aggregate functions

14 DWH Physical Model: Aggregate function strategies Aggregate Functions Compute summary statistics for a given set of values Examples: sum, average, centroid (Table 8.1, pp. 238) Strategies for efficient computation Characterize easy to compute aggregate functions 3 categories Distributive Algebraic Holistic First 2 categories can be computed easily in one scan of the dataset

15 Definitions of Aggregate Function Categories Notation: F, G, G1, G2, … Gn are aggregate functions where n is small S is a set of values, e.g. S = (1, 2, 3, 4) P = (S1, S2, …, Sp) is a partition of S, e.g. P = (S1, S2), S1 = (1, 2), S2 = (3, 4) Distributive( F ) if there exists a G such that F( S ) = G ( F(S1), F(S2), …, F(Sn) ) Example: sum is distributive Illustration: sum(1, 2, 3, 4) = sum ( sum(1, 2), sum(3, 4) Algebraic( F ) if there exists G1, …, Gn, (where n is small) and F( S ) = G ( G1(S1), …, Gn(S1), G2(S1), …, Gn(S2), …, G1(Sp), …, Gn(Sp) ) Example: average is distributive Illustration: average(1, 2, 3, 4) = { count(1, 2) * average(1, 2) + count(3, 4) * average } / { count(1,2) + count(3,4) }

16 Example: Distributive Aggregate Function Fig 8.14 Examples in cross-tabulation scenario (Fig. 8.14, pp.238): Example 1. Min is distributive Example 2. Count is distributive

17 Examples: Algebraic Aggregate Functions Fig 8.15 Examples in cross-tabulation scenario (Fig. 8.15, pp.239): Average and Variance are algebraic

18 Discussion - Spatial Data Warehouse Example Consider the example in Fig. 8.16, pp. 241 A map interpretation may be attached to each report Each row has a spatial footprint, which can be aggregated by geometric-union The collection of maps may be called a mapcube Issues: What is needed in OGIS standard to support map-cube operation? Hierarchical collection of maps in mapcube What is an appropriate cartography to convey the relationship among maps?

19 Spatial Data Warehouses and Mapcube Fig 8.16

20 Fig 8.17

21 Summary Field data useful in many applications due to rich content Represented as raster or image Operations can be categorized into local, focal, zonal, and global Field data storage and retrieval Tiling is a preferred way to divide raster data into disk blocks Meta-data based query is often used for retrieval Content based retrieval may be used for similarity searches Data warehouses support analysis e.g. cross-tabulation reports SQL CUBE operator support generation of DWH reports Distributive and Algebraic aggregate functions can be computed easily


Download ppt "Chap8: Trends in DBMS 8.1 Database support for Field Entities 8.2 Content-based retrieval 8.3 Introduction to spatial data warehouses 8.4 Summary."

Similar presentations


Ads by Google