Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 18: Data Analysis and Mining Kat Powell. Chapter 18: Data Analysis and Mining ➔ Decision Support Systems ➔ Data Analysis and OLAP ➔ Data Warehousing.

Similar presentations


Presentation on theme: "Chapter 18: Data Analysis and Mining Kat Powell. Chapter 18: Data Analysis and Mining ➔ Decision Support Systems ➔ Data Analysis and OLAP ➔ Data Warehousing."— Presentation transcript:

1 Chapter 18: Data Analysis and Mining Kat Powell

2 Chapter 18: Data Analysis and Mining ➔ Decision Support Systems ➔ Data Analysis and OLAP ➔ Data Warehousing ➔ Data Mining

3 Decision Support Systems ➔ Make business decisions, often based on data collected by on-line transaction-processing systems ➔ What items to stock? ➔ To whom to send advertisements? ➔ Examples of data used for making decisions ➔ Retail sales transaction details ➔ Customer profiles (income, age, gender, etc.)

4 Decision-Support Systems Overview ➔ Data analysis tasks are simplified by specialized tools and SQL extensions ➔ Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from large databases. ➔ A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site. ✔ Important for large businesses that generate data from multiple divisions, possibly at multiple sites

5 Data Analysis and OLAP ➔ OnLine Analytical Processing (OLAP) ➔ Online interactive tools for analysis and summary of data that present it to humans in understandable format ➔ Multidimensional data – data modeled as dimension attributes and measure attributes in table summary format – Measure attributes measure some value and can be aggregated upon – Dimension attributes define the dimensions on which measure attributes (or aggregates thereof) are viewed

6 Cross Tabulation (aka cross-tab) of sales by item-name and color ➔ item_name, color, and size are the dimension attributes of the sales relation ➔ the number values in each cell represent measure attributes ➔ Extra column and row store the cell totals for each column and row Summary of sales

7 Relational Representation of Cross-tabs ➔ Cross-tabs also can be represented as relations ➔ The value 'all' is used to represent an aggregate, that is, the set of all values for an attribute ➔ The SQL:1999 standard actually uses 'null' values in place of 'all' despite confusion with regular 'null' values

8 Data Cube (Sales relation) ➔ Multidimensional generalization of a cross-tab ➔ Can have n dimensions; we see 3 here ➔ Cross-tabs can be used as views on a data cube ➔ All cells contain values...even the invisible (inner) cells

9 Online Analytical Processing ➔ Pivoting: changing the dimensions used in a cross-tab EX: An analyst may select a cross-tab on item_name and size OR on color and size ➔ Slicing: creating a cross-tab for fixed values only ● Sometimes called dicing, particularly when values for multiple dimensions are fixed. EX: An analyst may wish to see cross-tab on item_name and color for a fixed value of size (for instance...large), instead of the sum across all sizes.

10 Online Analytical Processing – OLAP systems permit users to view data at any desired level of granularity – Rollup: moving from finer-granularity data to a coarser granularity EX: Starting from the data cube on the sales table, got the cross-tab by rolling up on the attribute 'size' – Drill down: The opposite operation - that of moving from coarser-granularity data to finer- granularity data EX: usually generated from the original data like drilling for fine, rare jewels

11 Data Warehousing ➔ Data sources often store only current data, not historical data ➔ Corporate decision making requires a unified view of all organizational data, including historical data ➔ A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site ➔ Greatly simplifies querying, permits study of historical trends ➔ Shifts decision support query load away from transaction processing systems

12 Data Warehousing

13 Warehouse Design Issues ➔W➔When and how to gather data ➔S➔Source driven architecture: data sources transmit new information to warehouse, either continuously or periodically (e.g. at night) ➔D➔Destination driven architecture: warehouse periodically requests new information from data sources ➔K➔Keeping warehouse exactly synchronized with data sources (e.g. using two-phase commit) is too expensive ➔U➔Usually OK to have slightly out-of-date data at warehouse –D–D ata/updates are periodically downloaded form online transaction processing (OLTP) systems.

14 More Warehouse Design Issues ➔ What schema to use ➔ Schema integration – Data cleansing ● E.g. correct mistakes in addresses (misspellings, zip code errors, etc.) ● Merge address lists from different sources and purge duplicates – How to propagate updates ● Warehouse schema may be a (materialized) view of schema from data sources

15 Data Mining ➔ The process of semi-automatically analyzing large databases to find useful patterns ➔ Prediction based on past history ➔ Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age,..) and past history ➔ Predict if a pattern of phone calling card usage is likely to be fraudulent

16 Data Mining (Cont.) – Some other examples of prediction mechanisms: ● Classification – Given a new item whose class is unknown, predict to which class it belongs ● Regression formulae – Given a set of mappings for an unknown function, predict the function result for a new parameter value

17 Data Mining (Cont.) – Descriptive Patterns ● Associations – Find books that are often bought by “similar” customers. If a new such customer buys one such book, suggest the others too. ● Associations may be used as a first step in detecting causation – E.g. association between exposure to chemical X and cancer ● Clusters – E.g. typhoid cases were clustered in an area surrounding a contaminated well – remains important in detecting epidemics

18 Other Types of Mining ➔ Text mining: application of data mining to textual documents ➔ cluster Web pages to find related pages ➔ cluster pages a user has visited to organize their visit history ➔ classify Web pages automatically into a Web directory

19 Other Types of Mining ➔ Data visualization systems help users examine large volumes of data and detect patterns visually ➔ Can visually encode large amounts of information on a single screen ➔ Humans are very good a detecting visual patterns

20 The END


Download ppt "Chapter 18: Data Analysis and Mining Kat Powell. Chapter 18: Data Analysis and Mining ➔ Decision Support Systems ➔ Data Analysis and OLAP ➔ Data Warehousing."

Similar presentations


Ads by Google