Knowledge Discovery & Data Mining process of extracting previously unknown, valid, and actionable (understandable) information from large databases Data.


1 Knowledge Discovery & Data Mining
Knowledge discovery in databases (KDD): the process of extracting previously unknown, valid, and actionable (understandable) information from large databases
Data mining is a step in the KDD process: applying data analysis and discovery algorithms
Related fields: machine learning, pattern recognition, statistics, databases, data visualization
Traditional techniques may be inadequate
–large data

2 Why Mine Data?
Huge amounts of data being collected and warehoused
–Walmart records 20 million transactions per day
–health care transactions: multi-gigabyte databases
–Mobil Oil: geological data of over 100 terabytes
Affordable computing
Competitive pressure
–gain an edge by providing improved, customized services
–information as a product in its own right

3 Data mining
Pattern
–1212121?
–The '12' pattern is found often enough, so with some confidence we can say '?' is 2
–"If '1' then '2' follows"
–Pattern --> Model
Confidence
–12121?
–12121231212123121212?
–121212 --> 3
Models are created from historical data by detecting patterns. A model is a calculated guess about the likelihood that a pattern will repeat.
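The "pattern with some confidence" idea on this slide can be sketched as a frequency count. The function name `next_symbol_confidence` and the choice to measure confidence as a simple relative frequency are assumptions made for illustration, not something the slides specify.

```python
from collections import Counter

def next_symbol_confidence(history, context):
    """Return {symbol: fraction of times it followed `context` in `history`}."""
    followers = Counter()
    n = len(context)
    # Slide a window over the history and record what comes right after each match.
    for i in range(len(history) - n):
        if history[i:i + n] == context:
            followers[history[i + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return {}
    return {symbol: count / total for symbol, count in followers.items()}

# "If '1' then '2' follows" holds every time in the first sequence.
print(next_symbol_confidence("1212121", "1"))
# In the longer sequence, a '2' is followed by '1' more often than by '3'.
print(next_symbol_confidence("12121231212123121212", "2"))
# After the longer context '121212', the observed follower is always '3'.
print(next_symbol_confidence("12121231212123121212", "121212"))
```

A longer context gives a more specific rule but is seen fewer times in the history, which is one reason a model is only a calculated guess.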

4 Where are Models Used?
1. Selection
Business trying to select prospective customers (profitability). Example: a model that predicts LD usage based on credit history.
2. Acquisition
Selection is deciding whom you would like to invite to a party; acquisition is about getting them to agree. It means putting together a plan that will make them say yes. Again, a model.
3. Retention
Keeping your flock together! Sensing it before they jump ship.
4. Extension
Extending services to existing customers. Cross-selling.

5 Data Mining Techniques
Classification
Clustering
Association Rule Discovery
Sequential Pattern Discovery

6 Classification
Data is defined in terms of attributes, one of which is the class.
Find a model for the class attribute as a function of the values of the other (predictor) attributes, such that previously unseen records can be assigned a class as accurately as possible.
Training data: used to build the model
Test data: used to validate the model (determine the accuracy of the model)
The given data is usually divided into training and test sets.
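The training/test workflow can be sketched in a few lines. This is a minimal illustration, not the method from the slides: the records (age, income, label) are made up, and a 1-nearest-neighbour rule stands in for whatever classifier would actually be learned.

```python
import math
import random

def split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and divide them into training and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def predict(train, features):
    """Assign the class of the closest training record (1-nearest neighbour)."""
    nearest = min(train, key=lambda record: math.dist(record[0], features))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(predict(train, features) == label for features, label in test)
    return correct / len(test)

# Hypothetical records: ((age, income in thousands), class attribute).
records = [
    ((25, 90), "buy"), ((30, 85), "buy"), ((35, 80), "buy"), ((28, 95), "buy"),
    ((55, 20), "not buy"), ((60, 25), "not buy"),
    ((65, 30), "not buy"), ((58, 15), "not buy"),
]

train, test = split(records)
print(len(train), len(test), accuracy(train, test))
```

The test set plays the role of "previously unseen records": it is never shown to the model while it is built, so the accuracy measured on it is an estimate of how the model would do on new data.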

7 Classification: Example

8 Classification: Direct Marketing
Goal: reduce the cost of soliciting (mailing) by targeting a set of consumers likely to buy a new product.
Data
–for a similar product introduced earlier
–we know which customers decided to buy and which did not: the {buy, not buy} class attribute
–collect various demographic, lifestyle, and company-related information about all such customers as possible predictor variables
Learn a classifier model

9 Classification: Fraud Detection
Goal: predict fraudulent cases in credit card transactions.
Data
–use credit card transactions and information on the account holder as input variables
–label past transactions as fraud or fair
Learn a model for the class of transactions.
Use the model to detect fraud by observing credit card transactions on a given account.

10 Clustering
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
–data points in one cluster are more similar to one another
–data points in separate clusters are less similar to one another
Similarity measures
–Euclidean distance if attributes are continuous
–problem-specific measures
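As one possible illustration of clustering with Euclidean distance, the sketch below uses k-means. The slides do not name an algorithm, so k-means, the starting centroids, and the sample points are all assumptions chosen to keep the example small.

```python
import math

def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters

def mean(cluster):
    """Coordinate-wise average of the points in a cluster."""
    return tuple(sum(values) / len(cluster) for values in zip(*cluster))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, assign(points, centroids)

# Hypothetical data: two well-separated groups of continuous attributes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(clusters)
```

For non-continuous attributes, `math.dist` would be replaced by a problem-specific similarity measure, as the slide notes.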

11 Clustering: Market Segmentation
Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
Approach:
–collect different attributes on customers based on geographical and lifestyle-related information
–identify clusters of similar customers
–measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters

12 Association Rule Discovery
Given a set of records, each of which contains some number of items from a given collection
–produce dependency rules that predict the occurrence of an item based on occurrences of other items
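A dependency rule is usually judged by two standard measures, support and confidence. The slides do not define them, so the sketch below is an assumption about how a rule such as {Diapers, Milk} --> {Beer} might be scored; the five baskets are invented for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / antecedent_support

# Hypothetical point-of-sale records, one set of items per transaction.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diapers", "Milk"},
    {"Beer", "Bread", "Diapers", "Milk"},
    {"Coke", "Diapers", "Milk"},
]

print(support(transactions, {"Diapers", "Milk", "Beer"}))       # 2 of 5 baskets
print(confidence(transactions, {"Diapers", "Milk"}, {"Beer"}))  # 2 of the 3 baskets with Diapers and Milk
```

A rule is typically kept only if both numbers exceed thresholds chosen by the analyst.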

13 Association Rules: Application
Marketing and sales promotion
Consider a discovered rule: {Bagels, … } --> {Potato Chips}
–Potato Chips as consequent: can be used to determine what may be done to boost sales
–Bagels as an antecedent: can be used to see which products may be affected if bagels are discontinued
–Can be used to see which products should be sold with Bagels to promote the sale of Potato Chips

14 Association Rules: Application
Supermarket shelf management
Goal: identify items that are bought together (by sufficiently many customers)
Approach: process point-of-sale data (collected with barcode scanners) to find dependencies among items
Example
–If a customer buys Diapers and Milk, then he is very likely to buy Beer
–so stack six-packs next to diapers?
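Finding items "bought together by sufficiently many customers" can be sketched as counting item pairs across baskets. The function name `frequent_pairs`, the `min_count` threshold, and the baskets are illustrative assumptions; real point-of-sale data would be far larger and would call for a more scalable approach.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count):
    """Count how many baskets contain each pair of items and keep the pairs
    that appear in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical baskets read from barcode-scanner data.
baskets = [
    ["Bread", "Coke", "Milk"],
    ["Beer", "Bread"],
    ["Beer", "Coke", "Diapers", "Milk"],
    ["Beer", "Bread", "Diapers", "Milk"],
    ["Coke", "Diapers", "Milk"],
]

print(frequent_pairs(baskets, min_count=3))
```

Pairs that pass the threshold are candidates for being shelved near each other.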

15 Sequential Pattern Discovery
Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events, of the form (A B) (C) (D E) --> (F)
xg: max allowed time between consecutive event-sets
ng: min required time between consecutive event-sets
ws: window size, max time difference between earliest and latest events in an event-set (events within an event-set may occur in any order)
ms: max allowed time between earliest and latest events of the sequence
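The four timing parameters can be read as a filter on a candidate match. The sketch below is one interpretation and makes assumptions the slide leaves open: each event-set is represented by the timestamps of its matched events, the gap between consecutive event-sets is measured from the latest event of one set to the earliest event of the next, and all bounds are inclusive.

```python
def satisfies_constraints(event_sets, xg, ng, ws, ms):
    """Check a matched sequence against the timing constraints.

    event_sets: list of lists of timestamps, one inner list per event-set,
                given in sequence order, e.g. (A B) (C) (D E).
    """
    # ws: events inside one event-set must fall within the window size.
    for times in event_sets:
        if max(times) - min(times) > ws:
            return False
    # ng / xg: the gap between consecutive event-sets must lie in [ng, xg].
    for previous, following in zip(event_sets, event_sets[1:]):
        gap = min(following) - max(previous)
        if gap < ng or gap > xg:
            return False
    # ms: the whole sequence must not span more than ms.
    all_times = [t for times in event_sets for t in times]
    return max(all_times) - min(all_times) <= ms

# Hypothetical timestamps for (A B) (C) (D E).
match = [[1, 2], [5], [7, 8]]
print(satisfies_constraints(match, xg=4, ng=1, ws=2, ms=10))  # True
print(satisfies_constraints(match, xg=2, ng=1, ws=2, ms=10))  # False: the gap of 3 exceeds xg
```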

16 Sequential Pattern Discovery: Examples
Sequences in which customers purchase goods/services
Understanding long-term customer behavior -- timely promotions
In point-of-sale transaction sequences
–Computer bookstore: (Intro to Visual C++) (C++ Primer) --> (Perl for Dummies, TCL/TK)
–Athletic apparel store: (Shoes) (Racket, Racketball) --> (Sports Jacket)

