Data Mining and Data Visualization


1 Data Mining and Data Visualization
SOM 485 Fall 2007

2 Getting Started
What is Data Mining?
Online Analytical Processing
Data Mining Techniques
Market Basket Analysis
Limitations and Challenges to Data Mining
Data Visualization
Siftware Technologies

3 What is Data Mining (DM)?
A group of activities used to find different patterns in data
Information is provided through a data warehouse
Provides valuable information for different types of research
-DM is a set of activities used to find new, hidden, or unexpected patterns in data.
-A data warehouse is the main store of an organization's data, typically held in a database.
-The research may be used for marketing or Customer Relationship Management (CRM).

4 Applications of DM
Customer Relationship Management (CRM) software is an application that can benefit from DM.
Activities of CRM:
One-to-One Marketing
Sales Force Automation
Sales Campaign Management
Marketing Encyclopedia
Call Center Automation
Information found in Concepts in Enterprise Resource Planning by Brady, Monk, and Wagner

5 Verification of DM
Requires a lot of prior knowledge on the decision maker's part
Example: casinos, where it can determine whether a new customer is a high roller, a souvenir buyer, a ticket purchaser, etc.
Uses Siftware to help discover new patterns in customers' spending habits
Allows effective targeting of a specific group of customers
-Requires a great deal of a priori knowledge on the part of the decision maker in order to verify a suspected relationship through a query.
-The ability to categorize a new customer through the casino's database has proven highly profitable.
-Siftware: software specifically designed to find new and previously unclassified patterns in data.

6 Online Analytical Processing
Online Analytical Processing (OLAP) was introduced by E. F. Codd in 1993.
OLAP: a computer process that allows a user to extract and view data from different viewpoints.
Scientific and academic organizations store about 1 terabyte (1 trillion bytes) of new data each day.
-Codd argued that the standard relational database used for transaction processing had reached its limits.
-Example: a user can request an analysis that displays a spreadsheet showing all of a company's beach ball products sold in Florida in July, compares revenue figures with those for the same products in September, and then shows a comparison of other product sales in Florida in the same time period.
-Source: a 2000 report from the GTE research center.
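The beach ball request in the notes above is essentially a slice-and-compare over data organized by product, state, and month. The following is a minimal Python sketch of that idea. The products, states, and revenue figures are hypothetical, and a real OLAP server would precompute and index these aggregates rather than scanning rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, state, month, revenue). Invented data.
SALES = [
    ("beach ball", "FL", "Jul", 1200.0),
    ("beach ball", "FL", "Jul", 800.0),
    ("beach ball", "FL", "Sep", 500.0),
    ("sunscreen", "FL", "Jul", 900.0),
    ("sunscreen", "FL", "Sep", 300.0),
    ("beach ball", "GA", "Jul", 400.0),
]

def rollup(rows, state):
    """Slice the data to one state and total revenue by (product, month)."""
    totals = defaultdict(float)
    for product, st, month, revenue in rows:
        if st == state:
            totals[(product, month)] += revenue
    return dict(totals)

def compare_months(totals, product, first, second):
    """Return (revenue in first month, revenue in second month) for one product."""
    return totals.get((product, first), 0.0), totals.get((product, second), 0.0)

florida = rollup(SALES, "FL")
print(compare_months(florida, "beach ball", "Jul", "Sep"))  # (2000.0, 500.0)
print(compare_months(florida, "sunscreen", "Jul", "Sep"))   # (900.0, 300.0)
```

Calling `rollup` with a different state gives the same comparison from another viewpoint without changing the underlying data.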

7 OLAP continued… Codd's 12 Rules for OLAP
Multidimensional View
Transparent to the User
Accessible
Consistent Reporting Performance
Client-Server Architecture
Generic Dimensionality
Dynamic Sparse Matrix Handling
Multi-user Support
Cross-Dimensional Operations
Intuitive Data Manipulation
Flexible Reporting
Unlimited Levels of Dimension and Aggregation
-Codd developed 12 rules.
-To this day, no implementation exists that strictly obeys all 12 rules; it may even be impossible.

8 OLAP: MOLAP & ROLAP
OLAP data is often stored in a Multidimensional Database (MDB)
MOLAP: an OLAP application that accesses data from a multidimensional database
MDBs are frequently created using input from an existing relational database
ROLAP: OLAP on a relational database server, which can work with SQL for portability and scalability
-MOLAP stores data in multidimensional cubes (often pictured as three-dimensional), whereas ROLAP stores it in two-dimensional relational tables.
Information found on
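The storage difference can be sketched in Python. The same facts are kept first as flat, two-dimensional rows (the ROLAP view) and then loaded into a dense multidimensional array indexed by dimension position (the MOLAP view). The dimension members and values are hypothetical, and this is only an illustration of the layout, not how any particular product stores data.

```python
# ROLAP-style storage: a flat, two-dimensional table (dimension columns + one measure).
ROWS = [
    ("beach ball", "FL", "Jul", 2000),
    ("beach ball", "FL", "Sep", 500),
    ("sunscreen", "GA", "Jul", 400),
]

def build_cube(rows):
    """Load relational rows into a dense 3-D array (MOLAP-style cube).

    Returns the cube plus the ordered members of each dimension, so a cell is
    addressed by position instead of by scanning rows.
    """
    products = sorted({r[0] for r in rows})
    states = sorted({r[1] for r in rows})
    months = sorted({r[2] for r in rows})
    # Every combination gets a cell; combinations with no data stay 0 (sparse cells).
    cube = [[[0 for _ in months] for _ in states] for _ in products]
    for product, state, month, value in rows:
        cube[products.index(product)][states.index(state)][months.index(month)] += value
    return cube, (products, states, months)

cube, (products, states, months) = build_cube(ROWS)
cell = cube[products.index("beach ball")][states.index("FL")][months.index("Jul")]
print(cell)  # 2000
```

Here 3 rows become 2 x 2 x 2 = 8 cells, 5 of them empty. This shows why sparse-matrix handling matters once a cube has many dimensions.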

9 DATA MINING TECHNIQUES
The popularity of data mining is growing at an astounding rate, and new and innovative techniques for mining the warehouse are emerging at an unprecedented rate. Data mining techniques are implemented in sophisticated statistical and modeling software.

10 FOUR MAJOR CATEGORIES
Classification
Association
Sequence
Cluster
-What techniques are used to mine the data? Data mining methods may be classified by the function they perform or by their class of application.

11 CLASSIFICATION
Mining processes intended to discover rules that define whether an item belongs to a particular class of data
Two sub-processes: 1) building a model, 2) predicting classifications
Suppose we want to look for undiscovered buying patterns among customers. A classification model can be constructed that maps customer attributes, such as age, gender, and income, to product purchases, such as automobiles, clothing, and books. Then, given a set of predictor attributes, the model can be run against a list of customers to determine those most likely to make a particular purchase.
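The two sub-processes, building a model and then predicting, can be illustrated with a deliberately simple classifier. This Python sketch uses a one-rule ("OneR") learner. It is a swapped-in example technique, not the method the slide has in mind, and the customer records and attribute values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training data: customer attributes plus the product they bought.
TRAINING = [
    ({"age": "young", "income": "low"}, "clothing"),
    ({"age": "young", "income": "high"}, "automobile"),
    ({"age": "middle", "income": "high"}, "automobile"),
    ({"age": "senior", "income": "low"}, "books"),
    ({"age": "young", "income": "low"}, "clothing"),
    ({"age": "senior", "income": "high"}, "books"),
]

def build_model(examples):
    """Sub-process 1: for each attribute, map every value to its most common class,
    then keep the attribute whose rule misclassifies the fewest training examples."""
    best = None
    for attr in examples[0][0]:
        counts = defaultdict(Counter)
        for features, label in examples:
            counts[features[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[f[attr]] != label for f, label in examples)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best[0], best[1]

def predict(model, features, default=None):
    """Sub-process 2: apply the learned rule to a new customer."""
    attr, rule = model
    return rule.get(features[attr], default)

model = build_model(TRAINING)
print(model[0])                                           # age
print(predict(model, {"age": "senior", "income": "low"}))  # books
```

Real classifiers (decision trees, neural networks) follow the same build-then-predict pattern with far richer models.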

12 ASSOCIATION
Association techniques search all details from operational systems for patterns with a high probability of repetition
Example: Market Basket Analysis
Using a linkage approach, a retailer can mine data generated by a point-of-sale system, such as the price scanner at the grocery store, and find less obvious associations, such as that 68 percent of the time a customer buys beverages, he or she also buys pretzels. This type of information can be used to determine the location and content of promotional or end-of-aisle displays.

13 SEQUENCE
Time series analysis methods relate events in time based on a series of preceding events
Through this analysis, various hidden trends, often highly predictive of future events, can be discovered
Example: Mail Industry
In the direct mail industry, a customer's information can be used to target-mail a catalog containing specific product types to a customer associated with a known sequence of purchases.
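Matching customers against a known sequence of purchases, as in the direct mail example, can be sketched as an ordered subsequence test over each customer's purchase history. The customer IDs, products, and target sequence below are hypothetical.

```python
# Hypothetical purchase histories, already sorted by purchase date.
HISTORIES = {
    "c1": ["tent", "sleeping bag", "camp stove", "lantern"],
    "c2": ["camp stove", "tent"],
    "c3": ["tent", "boots", "camp stove"],
}

# Hypothetical known sequence that has preceded catalog purchases in the past.
TARGET_SEQUENCE = ["tent", "camp stove"]

def contains_sequence(history, sequence):
    """True if the items of `sequence` appear in `history` in the same order,
    not necessarily back to back."""
    remaining = iter(history)
    # `item in remaining` consumes the iterator up to the match, which enforces order.
    return all(item in remaining for item in sequence)

def mailing_list(histories, sequence):
    """Customers whose history matches the sequence, i.e. candidates for the catalog."""
    return sorted(c for c, h in histories.items() if contains_sequence(h, sequence))

print(mailing_list(HISTORIES, TARGET_SEQUENCE))  # ['c1', 'c3']
```

Customer `c2` bought both items but in the opposite order, so the sequence test excludes them where a plain association test would not.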

14 CLUSTER
Creates partitions so that all members of each set are similar according to some metric
A cluster is simply a set of objects grouped together by virtue of their similarity or proximity to each other
Example: Credit Card Transactions
For instance, this approach might be used to mine credit card purchase data to discover that meals charged on a business-issued gold card are typically purchased on weekdays and have an average value greater than $250, whereas meals purchased using a personal platinum card occur mostly on weekends and have an average value of $175.
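Partitioning by a similarity metric can be sketched with k-means, one common clustering algorithm (the slide does not name a specific one). The meal amounts below are hypothetical. For simplicity, the sketch clusters on a single attribute, the charged amount, and uses fixed starting centroids so the result is deterministic.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one numeric attribute.

    Each pass assigns every value to its nearest centroid (absolute distance is
    the similarity metric), then moves each centroid to the mean of its members.
    """
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical meal charges in dollars.
meals = [260, 280, 300, 170, 180, 175]
centroids, clusters = kmeans_1d(meals, centroids=[100, 400])
print(centroids)  # [175.0, 280.0]
print(clusters)   # [[170, 180, 175], [260, 280, 300]]
```

Adding attributes such as card type or weekday would turn each value into a vector and the absolute difference into a multidimensional distance.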

15 DATA MINING TECHNOLOGIES
Providing new answers to old questions
Developing new knowledge and understanding through discovery
Statistical Analysis – statistically evaluating products and making decisions based on logical reasoning
Neural Networks – attempt to mirror the way the human brain recognizes patterns by developing mathematical structures with the ability to learn
-There are numerous techniques available to assist in mining the data.
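The idea of "mathematical structures with the ability to learn" can be illustrated with the smallest such structure, a single perceptron. This Python sketch is a toy, not a production neural network. It learns the logical AND of two binary inputs by adjusting its integer weights whenever it misclassifies a training example.

```python
def predict(w, b, x):
    """Fire (1) when the weighted sum of inputs plus bias exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, epochs=20, rate=1):
    """Learn weights w and bias b so that predict(w, b, x) matches each 0/1 label."""
    w = [0] * len(samples[0][0])
    b = 0
    for _ in range(epochs):
        for x, label in samples:
            error = label - predict(w, b, x)
            # Perceptron rule: nudge weights toward the correct answer when wrong.
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
            b += rate * error
    return w, b

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_SAMPLES)
print(w, b)                                        # [2, 1] -2
print([predict(w, b, x) for x, _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

Practical networks stack many such units and use smoother update rules, but the "learn by correcting errors" loop is the same.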

16 DATA MINING TECHNOLOGIES CONT.
Genetic Algorithms and Fuzzy Logic – machine learning techniques that derive meaning from complicated and imprecise data and can extract patterns and detect trends that are far too complex to be noticed by humans
Decision Trees – assist data mining applications by classifying the items or events contained within the warehouse
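Decision tree learners build a tree by repeatedly choosing the attribute whose split best separates the classes. This is a minimal Python sketch of that choice using information gain (entropy reduction), as in ID3-style trees. The records and attribute names are hypothetical. A full tree would apply the same step recursively to each branch.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on `attr`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

def best_split(rows, attrs, target):
    """The attribute a decision tree would test first at this node."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))

# Hypothetical transactions: did the customer respond to a promotion?
ROWS = [
    {"card": "business", "day": "weekday", "responded": "yes"},
    {"card": "business", "day": "weekend", "responded": "yes"},
    {"card": "personal", "day": "weekday", "responded": "no"},
    {"card": "personal", "day": "weekend", "responded": "no"},
]
print(best_split(ROWS, ["card", "day"], "responded"))  # card
```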

17 NEW APPLICATIONS FOR DATA MINING
Two new categories of applications:
1) Text Mining – summarizes, navigates, and clusters documents contained in a database
2) Web Mining – integrates data and text mining within a Web site; enhances the Web site with intelligent behavior, such as suggesting related links or recommending new products to the consumer
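Clustering documents in text mining usually starts by turning each document into a vector of term counts and measuring how similar those vectors are. This is a minimal Python sketch using bag-of-words counts and cosine similarity, one common choice assumed here rather than taken from the slides. The sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = {
    "d1": "data warehouse and data mining",
    "d2": "mining the data warehouse",
    "d3": "beach ball sales in July",
}
vectors = {name: term_vector(text) for name, text in docs.items()}
# Most similar document to d1: d2 shares several terms, d3 shares none.
print(max(("d2", "d3"), key=lambda d: cosine(vectors["d1"], vectors[d])))  # d2
```

A clustering algorithm such as k-means can then group documents using these similarity scores.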

18 Market Basket Analysis

19 Market Basket Analysis

20 Market Basket Analysis
Market Basket Analysis is an algorithm that examines a long list of transactions in order to determine which items are most frequently purchased together. It takes its name from the idea of a person in a supermarket throwing all of their items into a shopping cart (a "market basket").
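The core counting step can be sketched in a few lines of Python: count every pair of items that appears in the same transaction, then rank the pairs by frequency. The baskets below are hypothetical. Real implementations (for example, Apriori) add pruning so they do not have to count every pair in very large data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale transactions (one set of items per basket).
BASKETS = [
    {"frozen pizza", "cola", "potato chips"},
    {"milk", "cola"},
    {"frozen pizza", "cola"},
    {"milk", "pretzels"},
    {"frozen pizza", "cola", "pretzels"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, so (a, b) and (b, a) are counted together.
        counts.update(combinations(sorted(basket), 2))
    return counts

top_pair, top_count = pair_counts(BASKETS).most_common(1)[0]
print(top_pair, top_count)  # ('cola', 'frozen pizza') 3
```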

21 Market basket analysis is one of the most common and useful types of data analysis for marketing.
With the data gathered from MBA, marketers can identify products that customers buy together and group them. Market basket analysis can improve the effectiveness of marketing and sales tactics.

22 Benefits of Market Basket Analysis:
A good indication of consumer behavior
Increase in sales
Improves customer satisfaction
Tracks what types of products interest the consumer and finds related alternatives to introduce to the consumer

23 ASSOCIATION RULES for MBA
Support
Confidence
Lift
Method
Association rules are a common undirected data mining technique and complement market basket analysis.
These rules are unidirectional: left-hand side IMPLIES right-hand side. Ex. Pasta IMPLIES Wine, but Wine IMPLIES Pasta may not hold.

24 40% of transactions that contain Pasta also contain Wine
4% of transactions contain both of these items.
Support: the percentage of baskets in which the association rule is true, i.e., baskets that contain both the left-hand side and the right-hand side. Ex. 4% of transactions contain both.
Confidence: the probability that the right-hand side item is present given that the left-hand side item is present. Ex. 40% of transactions that contain Pasta also contain Wine, p = .40.
Lift: compares the confidence with the likelihood of finding the right-hand side item in any random basket. It measures how well an association rule performs by comparing how well the right-hand side item sells with the left-hand side item versus without it (improvement).
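The three measures can be written out directly:
- Support is the share of all baskets that contain both sides.
- Confidence is support divided by the share of baskets that contain the left-hand side.
- Lift is confidence divided by the share of baskets that contain the right-hand side. A lift above 1 means the left-hand side makes the right-hand side more likely than in a random basket.

In the Pasta/Wine example, support = 4% and confidence = 40%, so Pasta appears in 10% of baskets. The slide does not give the share of baskets with Wine, so the 8% used in this Python sketch is a hypothetical value.

```python
def rule_metrics(baskets, lhs, rhs):
    """Support, confidence, and lift for the rule lhs IMPLIES rhs."""
    n = len(baskets)
    n_lhs = sum(lhs in b for b in baskets)
    n_rhs = sum(rhs in b for b in baskets)
    n_both = sum(lhs in b and rhs in b for b in baskets)
    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return support, confidence, lift

# Hypothetical 100 baskets matching the slide's figures (4% contain both items,
# 40% of Pasta baskets contain Wine); the 8% Wine rate is an assumption.
baskets = (
    [{"pasta", "wine"}] * 4
    + [{"pasta"}] * 6
    + [{"wine"}] * 4
    + [{"bread"}] * 86
)
support, confidence, lift = rule_metrics(baskets, "pasta", "wine")
print(support, confidence, lift)  # 0.04 0.4 5.0
```

Swapping the arguments to `rule_metrics(baskets, "wine", "pasta")` gives a confidence of 0.5 instead of 0.4. This illustrates why the rules are unidirectional.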

25 Method [Table: Frozen Pizza, Milk, Cola, Potato Chips, Pretzels]

26 Market Basket Analysis
Market basket analysis determines which products customers purchase together

27 Limits to Market Basket Analysis
A large amount of data is required to obtain meaningful results, but accuracy is compromised if the products do not occur with similar frequency. Ex. milk sells in almost every transaction, but Elmer's glue sells sporadically, so it is not effective to put them in the same basket analysis.
Sometimes presents results that are actually due to the success of previous marketing campaigns. Ex. a discounted price on cola with the purchase of pizza.

28 Using Data from MBA
Once information has been gathered about different items and how they sell with respect to other items, a store may want to change the layout of its items to improve profits. Ex. lunchboxes and school supplies.
Businesses without a physical storefront may instead offer promotions for products that sell together, increasing sales.

29 MARKET BASKET ANALYSIS In a Nutshell

30 Current Limitations and Challenges to Data Mining

31 Current Limitations & Challenges to Data Mining
New and underdeveloped field
Identification of missing information
Most companies run legacy systems that are not DW (data warehouse) friendly
DW designers have to convert existing ODSs (operational data stores) into the homogeneous form of the DW

32 Current Limitations & Challenges to Data Mining
Not all knowledge about application domains is present in the data
ODSs are normally limited to the data needed by the operational application associated with that DB
Data warehouse designers need to include mechanisms for “inventorying” data

33 Data noise & missing values
Most operational databases contain data errors in their values and/or classification
Errors lead to misclassification
Future data mining systems must incorporate more sophisticated mechanisms for treating “noisy data”
Bayesian technique – a statistical technique

34 Large Databases & high dimensionality
Databases are large & dynamic
Contents are always changing
Data patterns must be constantly updated
New discovery applications have to partition problems into smaller chunks of manageable data without losing any essential attributes of the data
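One way to split a large data set into manageable chunks without losing information is to compute partial results per chunk and then merge them. This works whenever the statistic combines exactly, as item counts do. This is a minimal Python sketch; the chunk size and transactions are hypothetical.

```python
from collections import Counter
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` records from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def count_items_in_chunks(transactions, chunk_size):
    """Count item frequencies one chunk at a time, then merge the partial counts.

    Because counts add up exactly, the merged result equals a single pass over
    the whole data set, so no essential information is lost by chunking.
    """
    total = Counter()
    for batch in chunks(transactions, chunk_size):
        partial = Counter(item for basket in batch for item in basket)
        total.update(partial)
    return total

transactions = [["milk", "cola"], ["milk"], ["pretzels", "cola"], ["milk", "glue"]]
print(count_items_in_chunks(transactions, chunk_size=2)["milk"])  # 3
```

If `transactions` is a generator that reads from disk or a database cursor, only one chunk needs to be in memory at a time.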

35 Data Visualization
Process by which numerical data are converted into meaningful 3-D images
Intended to analyze complex data
Data from: satellite photos, sonar measurements, surveys, or computer simulations

36 History of Data Visualization
Originated from statistics and science
Advancement credited to the NCSA (National Center for Supercomputing Applications)
Newest developments by Xerox PARC in virtual reality

37 Human Visual Perception
Human visual cortex dominates our perception
Accelerates the identification of hidden patterns in data
“A picture is worth a thousand words”

38 Geographical Information Systems (GIS)
A special-purpose database in which a common spatial coordinate system is the primary means of reference
Requires:
Data input
Data storage, retrieval, and query
Data transformation, analysis, and modeling
Data reporting
Integrates information and aids in decision making

39 GIS continued
Spatial Data – elements stored in map form
Contains three basic components: Points, Lines, Polygons
Attribute Data – describes spatial data
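Points and polygons become useful once the GIS can relate them, for example by asking which polygon (a sales region, say) contains a customer's point. This is a minimal Python sketch of the standard ray-casting (even-odd) test on planar coordinates. The region and points are hypothetical. Real GIS software also handles map projections and boundary edge cases that this sketch ignores.

```python
Point = tuple[float, float]

def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how many polygon edges a horizontal ray from the point
    crosses to the right; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge straddles the ray's y-coordinate (half-open test avoids double-counting vertices).
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical sales region (spatial data) with attribute data attached to it.
region = {"name": "North", "boundary": [(0, 0), (10, 0), (10, 10), (0, 10)]}
print(point_in_polygon((5, 5), region["boundary"]))   # True
print(point_in_polygon((15, 5), region["boundary"]))  # False
```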

40 Applications of Data Visualization Techniques
Retail Banking
Government
Insurance
Health Care and Medicine
Telecommunications
Transportation
Capital Markets
Asset Management

41 Siftware Technologies

42 Siftware Technologies
IBM
Informix
Red Brick
DB2
Oracle
Silicon Graphics
Sybase

43 Offers several data mining solutions, depending on users' needs.
IBM Information Warehouse Solutions
IBM Visualizer
Red Brick

44 Informix
Three-tier model
Tier 1: “Client” presentation layer
Tier 2: Hewlett-Packard hardware
Tier 3: Data layer, INFORMIX-OnLine database

45 Sybase Warehouse WORKS
Assemble data from many sources
Transform data for a consistent and understandable view
Distribute data where needed
Provide high-speed access to the data

46 Leading company for large-scale data mining
Data spread across multiple databases
Data spread across processors for faster queries

47 Discover new patterns and trends that may not be realized using traditional SQL
Three-dimensional Visualization
Visual models can save days and even months in the review process

48 Review
Data mining (DM)
Techniques used to mine data
Market Basket Analysis: The King of DM Algorithms

49 Review continued…
Current Limitations and Challenges to Data Mining
Data Visualization
Siftware Technologies

