Chapter 8 Newer Database Topics Based on G. Post, DBMS: Designing & Building Business Applications University of Manitoba Asper School of Business 3500.


1 Chapter 8: Newer Database Topics
Based on G. Post, DBMS: Designing & Building Business Applications
University of Manitoba, Asper School of Business
3500 DBMS
Bob Travica
Updated 2010

2 OLAP & Data Warehouse
Online Transaction Processing (OLTP):
• Querying databases with 3NF tables
• Operations' data
• Predefined reports
Online Analytical Processing (OLAP); data warehousing; data mining:
• Usually denormalized data
• Periodical transfers
• Interactive data analysis
• Flat files
DBSYSTEMS, slide 2 of 20, MIS 3500

3 OLTP vs. OLAP

4 Warehousing Goals
• Integrate data from different sources to get a larger picture of the business
• Data aggregations (summaries on different dimensions)
• Ad hoc queries (support non-routine decision making)
• Statistical analysis (test hypotheses on relationships between pieces of data)
• Discover new relationships (data mining)

5 Extraction, Transformation, and Transportation
Transaction data from diverse systems is extracted, transformed, and transported into the data warehouse.
Data warehouse: all data must be consistent.
Preparations performed on data:
• Convert “Client” to “Customer”
• Apply standard product numbers
• Convert currencies
• Fix region codes
Diagram labels: Customers; Extract; Transform; Transport
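The preparation steps listed on this slide can be sketched in Python. This is a minimal sketch under stated assumptions: the record layout, the exchange rates in `RATES_TO_CAD`, and the corrections in `REGION_FIXES` are hypothetical and do not come from the slides. Applying standard product numbers is omitted because it would follow the same lookup pattern.

```python
# Hypothetical exchange rates and region-code corrections (illustrative only).
RATES_TO_CAD = {"CAD": 1.0, "USD": 1.25}
REGION_FIXES = {"N.E.": "NE", "north-east": "NE"}

def transform(record):
    """Return a cleaned copy of one extracted transaction record."""
    clean = dict(record)
    # Convert "Client" to "Customer": unify the field name across sources.
    if "Client" in clean:
        clean["Customer"] = clean.pop("Client")
    # Convert currencies to a single reporting currency.
    rate = RATES_TO_CAD[clean.pop("Currency")]
    clean["AmountCAD"] = round(clean.pop("Amount") * rate, 2)
    # Fix region codes so that every source uses the same spelling.
    clean["Region"] = REGION_FIXES.get(clean["Region"], clean["Region"])
    return clean

# Two records extracted from two hypothetical source systems.
extracted = [
    {"Client": "Acme", "Amount": 100.0, "Currency": "USD", "Region": "N.E."},
    {"Customer": "Zenith", "Amount": 80.0, "Currency": "CAD", "Region": "NE"},
]
warehouse_rows = [transform(r) for r in extracted]
```

After the transform, both rows share the same field names, currency, and region code, which is the consistency the warehouse requires.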

6 Three-Dimensional View of Data: Cube
Dimensions (cube axes): Sale Date, Customer Location, Category
Similar ideas are used in crosstab queries and pivot tables.

7 Data Hierarchy
Levels: Year, Quarter, Month, Week, Day
• Roll-up: to get higher-level totals
• Drill-down: to get lower-level details
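Roll-up and drill-down amount to summing the same detail rows at different levels of the date hierarchy. The following is a minimal Python sketch; the `daily_sales` figures and the helper names `level_key` and `roll_up` are hypothetical, and the week level is left out for brevity.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales amounts (illustrative only).
daily_sales = [
    (date(2010, 1, 15), 100.0),
    (date(2010, 2, 3), 50.0),
    (date(2010, 4, 20), 70.0),
    (date(2010, 11, 5), 30.0),
]

def level_key(day, level):
    """Map a date to its key at the requested hierarchy level."""
    quarter = (day.month - 1) // 3 + 1
    if level == "year":
        return (day.year,)
    if level == "quarter":
        return (day.year, quarter)
    if level == "month":
        return (day.year, quarter, day.month)
    raise ValueError(f"unknown level: {level}")

def roll_up(rows, level):
    """Sum amounts at the given level of the date hierarchy."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[level_key(day, level)] += amount
    return dict(totals)

by_quarter = roll_up(daily_sales, "quarter")  # roll-up: higher-level totals
by_month = roll_up(daily_sales, "month")      # drill-down: lower-level details
```

Moving from `by_month` to `by_quarter` is a roll-up; going the other way is a drill-down over the same data.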

8 Star Design
Fact table: Sale (SaleDate, SalePrice, Quantity)
Measures: Amount = SalePrice * Quantity
Dimension tables: Product Category; Customer Location
Amounts are broken down by product category, period, and customer location.
Hierarchical: dimension tables can link only via the fact table.
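As a rough illustration of the star layout, the sketch below keeps the measures in one fact table and reaches each dimension only through a key stored in the fact row. All rows, the key names `CategoryID` and `LocationID`, and the helper `amount_by` are hypothetical.

```python
from collections import defaultdict

# Dimension tables keyed by hypothetical surrogate IDs.
product_category = {1: "Bird", 2: "Cat"}
customer_location = {10: "Winnipeg", 20: "Brandon"}

# Fact table: each row references the dimensions and carries the measures.
sale_facts = [
    {"CategoryID": 1, "LocationID": 10, "SalePrice": 45.0, "Quantity": 2},
    {"CategoryID": 2, "LocationID": 10, "SalePrice": 30.0, "Quantity": 1},
    {"CategoryID": 1, "LocationID": 20, "SalePrice": 45.0, "Quantity": 1},
]

def amount_by(dimension_key, dimension_table):
    """Total Amount = SalePrice * Quantity, broken down by one dimension."""
    totals = defaultdict(float)
    for fact in sale_facts:
        amount = fact["SalePrice"] * fact["Quantity"]
        totals[dimension_table[fact[dimension_key]]] += amount
    return dict(totals)

by_category = amount_by("CategoryID", product_category)
by_location = amount_by("LocationID", customer_location)
```

The two dimension tables never reference each other; every breakdown goes through `sale_facts`.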

9 Snowflake Design
Tables:
• OLAPItems (SaleID, ItemID, Quantity, SalePrice, Amount)
• Merchandise (ItemID, Description, QuantityOnHand, ListPrice, Category)
• Sale (SaleID, SaleDate, EmployeeID, CustomerID, SalesTax)
• Customer (CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID)
• City (CityID, ZipCode, City, State)
Network-like design: dimension tables can link directly.

10 Excel Pivot Table Reports
• Data can be placed in rows or columns.
• By grouping months, you can instantly get quarterly or monthly totals.

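11 CUBE Option (SQL 99)

    SELECT Category, Month, Sum,
           GROUPING (Category) AS Gc,
           GROUPING (Month) AS Gm
    FROM …
    GROUP BY CUBE (Category, Month...)

Category  Month   Amount    Gc  Gm
Bird      1       135.00    0   0
Bird      2       45.00     0   0
…
Bird      (null)  32.00     0   0
Bird      (null)  607.50    0   1
Cat       1       396.00    0   0
Cat       2       113.85    0   0
…
Cat       (null)  1293.30   0   1
(null)    1       1358.8    1   0
(null)    2       1508.94   1   0
(null)    3       2362.68   1   0
…
(null)    (null)  8451.79   1   1

GROUPING returns 1 in rows where the column has been aggregated away, so Gm = 1 marks a category subtotal and Gc = 1 marks a month subtotal. The Bird row with a (null) month and both flags at 0 is a genuine null in the data, not a subtotal.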
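To make the CUBE result easier to picture, here is a small Python emulation of GROUP BY CUBE (Category, Month) together with the two GROUPING flags. It is a sketch, not how a DBMS implements the option; the detail `rows` are hypothetical, and the flags follow the SQL convention that GROUPING returns 1 when the column has been aggregated away.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail rows: (Category, Month, Amount).
rows = [
    ("Bird", 1, 100.0),
    ("Bird", 2, 50.0),
    ("Cat", 1, 200.0),
    ("Cat", None, 25.0),   # a genuine NULL month in the data
]

def cube(rows):
    """Emulate GROUP BY CUBE (Category, Month) with GROUPING() flags.

    Returns {(Category, Month, Gc, Gm): total}. A flag of 1 means the
    column was aggregated away in that result row.
    """
    totals = defaultdict(float)
    for category, month, amount in rows:
        # Each detail row contributes to all four grouping combinations.
        for gc, gm in product((0, 1), repeat=2):
            key = (None if gc else category, None if gm else month, gc, gm)
            totals[key] += amount
    return dict(totals)

result = cube(rows)
```

In `result`, the key `("Cat", None, 0, 0)` is the detail group with a real null month, while `("Cat", None, 0, 1)` is the Cat subtotal; without the flags the two rows would be indistinguishable.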

12 GROUPING SETS: Hiding Details

    SELECT Category, Month, Sum
    FROM …
    GROUP BY GROUPING SETS (ROLLUP (Category), ROLLUP (Month), ( ) )

Category  Month   Amount
Bird      (null)  607.50
Cat       (null)  1293.30
…
(null)    1       1358.8
(null)    2       1508.94
(null)    3       2362.68
…
(null)    (null)  8451.79

13 SQL RANK Functions

    SELECT Employee, SalesValue,
           RANK() OVER (ORDER BY SalesValue DESC) AS rank,
           DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense
    FROM Sales
    ORDER BY SalesValue DESC, Employee;

Employee  SalesValue  rank  dense
Jones     18,000      1     1
Black     16,000      2     2
Smith     16,000      2     2
White     14,000      4     3

DENSE_RANK does not skip numbers after a tie; RANK does.
Takeaway: advances in SQL motivate DBMS vendors to support OLAP and data warehousing.
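The difference between the two functions can be reproduced in a few lines of Python. This is a sketch of the ranking logic only, using the four employees from the slide; the function name `rank_and_dense_rank` is hypothetical.

```python
def rank_and_dense_rank(values):
    """Return (rank, dense_rank) pairs for values already sorted descending."""
    results = []
    rank = dense = 0
    previous = object()  # sentinel that equals no real value
    for position, value in enumerate(values, start=1):
        if value != previous:
            rank = position   # RANK jumps to the row position after a tie
            dense += 1        # DENSE_RANK advances by exactly one
            previous = value
        results.append((rank, dense))
    return results

# Sales values from the slide, ordered by SalesValue DESC, Employee.
sales = [("Jones", 18000), ("Black", 16000), ("Smith", 16000), ("White", 14000)]
ranks = rank_and_dense_rank([value for _, value in sales])
```

Tied rows share both numbers; the row after the tie gets its row position under RANK and the next consecutive integer under DENSE_RANK.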

14 Data Mining
• Goal: to discover unknown relationships in the data that can be used to make better decisions.
• Exploratory analysis.
• A bottom-up approach that scans the data to find relationships.
• Some statistical routines are used, but they are not sufficient:
  • Statistics relies on averages.
  • Sometimes the important data lies in more detailed pairs.
• Supervised by developer vs. unsupervised (self-organizing artificial neural networks).

15 Common Techniques
1. Classification/Prediction
2. Association Rules/Market Basket Analysis
3. Clustering

16 1. Classification (Prediction)
• Purpose: “Classify” things that are causes and those that are effects.
• Examples:
  • Which borrowers/loans are most likely to be successful?
  • Which customers are most likely to want a new item?
  • Which companies are likely to file bankruptcy?
  • Which workers are likely to quit in the next six months?
  • Which startup companies are likely to succeed?
  • Which tax returns are fraudulent?

17 Classification Process
• Clearly identify the outcome/dependent variable.
• Identify potential variables that might affect the outcome.
• Use sample data to test and validate the model.
• Regression/correlation analysis, decision tables and trees, etc.

Income  Credit History  Job Stability  Credit Success
50000   Good                           Yes
75000   Mixed           Bad            No
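The process above can be sketched with a deliberately simple one-rule classifier: pick one predictor, learn the majority outcome for each of its values from training data, then validate on held-out sample data. This stands in for the regression and decision-tree methods the slide names and is not one of them. All loan records and the helper names `fit_one_rule` and `accuracy` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (Income, CreditHistory, JobStability, CreditSuccess).
# CreditSuccess (last field) is the outcome/dependent variable.
training = [
    (52000, "Good", "Good", "Yes"),
    (71000, "Mixed", "Bad", "No"),
    (62000, "Good", "Bad", "Yes"),
    (41000, "Bad", "Good", "No"),
]
validation = [
    (58000, "Good", "Good", "Yes"),
    (47000, "Mixed", "Good", "Yes"),
]

def fit_one_rule(records, feature_index):
    """Learn the majority outcome for each value of one predictor."""
    outcomes = defaultdict(Counter)
    for record in records:
        outcomes[record[feature_index]][record[-1]] += 1
    return {value: counts.most_common(1)[0][0]
            for value, counts in outcomes.items()}

def accuracy(rule, records, feature_index, default="No"):
    """Share of records whose outcome the rule predicts correctly."""
    hits = sum(rule.get(r[feature_index], default) == r[-1] for r in records)
    return hits / len(records)

rule = fit_one_rule(training, feature_index=1)   # predictor: CreditHistory
score = accuracy(rule, validation, feature_index=1)
```

A low validation score, as with this toy data, signals that the chosen predictor alone does not explain the outcome and other variables should be tried.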

18 2. Association/Market Basket
• Purpose: Determine what events or items go together/co-occur.
• Examples:
  • What items are customers likely to buy together? (Business use: consider putting the two together to increase cross-selling.)
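One common way to quantify which items go together is to compute support and confidence for a candidate rule. These two measures are standard in market basket analysis but are not named on the slide, and the `baskets` below are hypothetical.

```python
# Hypothetical market baskets (illustrative only).
baskets = [
    {"burger", "fries", "cola"},
    {"burger", "fries"},
    {"burger", "salad"},
    {"fries", "cola"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated share of baskets with the antecedent that also hold the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

both = support({"burger", "fries"}, baskets)            # how often they co-occur
rule = confidence({"burger"}, {"fries"}, baskets)       # burger -> fries
```

A rule with high confidence but very low support rests on only a handful of baskets, so both numbers need to be read together.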

19 Association Challenges
• If an item is rarely purchased, any other item bought with it seems important. So combine items into categories.
• Some relationships are obvious.
  • Burger and fries.
• Some relationships are puzzling/meaningless.
  • A hardware store found that toilet rings sell well only when a new store first opens. But what does it mean?

20 3. Cluster Analysis
• Purpose: Identify groups of people or other entities.
• Examples:
  • Are there groups of customers? (If so, we could target them: market segmentation.)
  • Do the locations for our stores have elements in common? (If so, we can search for similar clusters for new locations.)
  • Do employees have common characteristics? (If so, we can hire similar, or dissimilar, people.)
Figure labels: small intracluster distance; large intercluster distance
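The two distances named on this slide can be computed directly once points are assigned to groups. The sketch below assumes the clusters are already known and uses average pairwise Euclidean distance, which is only one of several possible definitions; the customer points and both function names are hypothetical.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Hypothetical customers as (age, annual spend in $000), already grouped.
cluster_a = [(25, 10), (27, 12), (24, 11)]
cluster_b = [(55, 40), (58, 42), (56, 39)]

def intracluster_distance(cluster):
    """Average distance between pairs of points inside one cluster."""
    return mean(dist(p, q) for p, q in combinations(cluster, 2))

def intercluster_distance(first, second):
    """Average distance between points drawn from two different clusters."""
    return mean(dist(p, q) for p, q in product(first, second))

within_a = intracluster_distance(cluster_a)
within_b = intracluster_distance(cluster_b)
between = intercluster_distance(cluster_a, cluster_b)
```

Well-separated groups show `within_a` and `within_b` much smaller than `between`; a clustering algorithm searches for an assignment with that property.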

