2 Introduction to Data Mining Dr. Hany Saleeb

3 Why Data Mining? Potential Applications
- Direct marketing
  - Identify which prospects should be included in a mailing list
- Market segmentation
  - Identify common characteristics of customers who buy the same products
- Market basket analysis
  - Identify what products are likely to be bought together
- Insurance claims analysis
  - Discover patterns of fraudulent transactions
  - Compare current transactions against those patterns

4 What Is Data Mining?
- Combination of AI and statistical analysis to discover information that is “hidden” in the data
  - Associations (e.g. linking purchase of pizza with beer)
  - Sequences (e.g. tying events together: marriage and purchase of furniture)
  - Classifications (e.g. recognizing patterns such as the attributes of employees that are most likely to quit)
  - Forecasting (e.g. predicting buying habits of customers based on past patterns)
Expert systems or small ML/statistical programs
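The pizza-and-beer association above can be made concrete with the two usual rule measures, support and confidence. The following is a minimal sketch, not a method taken from the slides: the basket data and the names `baskets`, `support`, and `confidence` are made up for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical transactions: each entry is the set of items in one basket.
baskets: List[FrozenSet[str]] = [
    frozenset({"pizza", "beer"}),
    frozenset({"pizza", "beer", "chips"}),
    frozenset({"pizza", "soda"}),
    frozenset({"beer", "chips"}),
    frozenset({"pizza", "beer", "soda"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


# 3 of the 5 baskets contain both pizza and beer; 4 of the 5 contain pizza.
rule_support = support({"pizza", "beer"}, baskets)
rule_confidence = confidence({"pizza"}, {"beer"}, baskets)
print(f"pizza -> beer: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is usually kept only when both measures clear thresholds chosen by the analyst. Those thresholds are a modelling decision and are not specified in the slides.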

5 What can data mining do?
- Classification
  - Classify credit applicants as low, medium, high risk
  - Classify insurance claims as normal, suspicious
- Estimation
  - Estimate the probability of a direct mailing response
  - Estimate the lifetime value of a customer
- Prediction
  - Predict which customers will leave within six months
  - Predict the size of the balance that will be transferred by a credit card prospect
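As an illustration of the classification task, the sketch below labels a credit applicant as low, medium, or high risk using a k-nearest-neighbour vote. The technique is one possible choice and is not named in the slides. The two features, the training examples, and the scaling ranges are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled history:
# (annual income in $1000s, debt-to-income ratio) -> known risk class.
training = [
    ((90.0, 0.10), "low"),
    ((85.0, 0.15), "low"),
    ((55.0, 0.30), "medium"),
    ((50.0, 0.35), "medium"),
    ((25.0, 0.60), "high"),
    ((20.0, 0.70), "high"),
]


def scale(point):
    """Bring both features to a roughly comparable 0..1 range (assumed ranges)."""
    income, ratio = point
    return (income / 100.0, ratio)


def classify(applicant, examples=training, k=3):
    """Return the majority class among the k nearest labeled examples."""
    target = scale(applicant)
    by_distance = sorted(examples, key=lambda ex: math.dist(scale(ex[0]), target))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((88.0, 0.12)))
print(classify((22.0, 0.65)))
```

The classes are known in advance and come from labeled history, which is what distinguishes classification from clustering.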

6 What can data mining do? (cont’d)
- Association
  - Find out items customers are likely to buy together
  - Find out what books to recommend to Amazon.com users
- Clustering
  - Difference from classification: classes are unknown!
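To show what "classes are unknown" means in practice, here is a minimal k-means sketch. The algorithm receives only unlabeled points and a number of groups, and it discovers the grouping itself. k-means is used here as one common clustering method and is not necessarily the one the presenter had in mind. The customer data and the choice of starting centroids are hypothetical.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled customers: (age, monthly spend).
customers = [(22, 30), (25, 35), (24, 28), (58, 120), (61, 130), (60, 115)]
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

No risk labels or segment names are supplied. Interpreting what each discovered group means is left to the analyst.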

7 Market Analysis and Management
- Where are the data sources for analysis?
  - Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
- Target marketing
  - Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
- Determine customer purchasing patterns over time
  - Conversion of a single to a joint bank account: marriage, etc.
- Cross-market analysis
  - Associations/correlations between product sales
  - Prediction based on the association information
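For the cross-market analysis item, a correlation between two products' sales series can be computed directly. The sketch below implements the Pearson correlation coefficient by hand. The weekly sales figures are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly unit sales of two products.
pizza_sales = [10, 12, 15, 20, 22]
beer_sales = [18, 25, 29, 41, 43]
r = pearson(pizza_sales, beer_sales)
print(f"correlation between pizza and beer sales: {r:.3f}")
```

A value near +1 suggests the products sell together, and a value near -1 suggests they substitute for each other. Either way, correlation alone does not establish why.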

8 Data Mining: Confluence of Multiple Disciplines
[Diagram: Data Mining shown as the meeting point of Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]

9 Data Mining: On What Kind of Data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced DB and information repositories
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases and multimedia databases
  - Heterogeneous and legacy databases
  - WWW

10 Data Mining Process
[Diagram labels: Collecting relevant data; Model building; Understanding of business; Problem identification; Business strategy and evaluation; Learning; Action]

11 Requirements/challenges in Data Mining
- User interface
- Mining methodology
- Performance
- Data source
- Social and security

12 Requirements/challenges in Data Mining (2)
- User interface
  - Data visualization
    - Understandability and interpretation of results
    - Information representation and rendering
    - Screen real estate
  - Interactivity
    - Manipulation of mined knowledge
    - Focus and refine mining tasks
    - Focus and refine mining results

13 Requirements/challenges in Data Mining (3)
- Mining methodology
  - Mining different kinds of knowledge in databases
  - Interactive mining of knowledge at multiple levels of abstraction
  - Incorporation of background knowledge
  - Query languages
  - Expression and visualization of results
  - Handling noise and incomplete data
  - Pattern evaluation
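One simple way to handle incomplete data, listed above as a methodology challenge, is to fill each missing value with the mean of the observed values. The sketch below shows that idea only. It is one of several possible strategies, it is not prescribed by the slides, and the function name `impute_mean` and the sample ages are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical customer ages with two missing records.
ages = [34.0, None, 41.0, 29.0, None, 36.0]
print(impute_mean(ages))
```

Mean imputation keeps every record usable but shrinks the variance of the column, so it can distort patterns that depend on spread.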

14 Requirements/challenges in Data Mining (4)
- Performance
  - Efficiency and scalability of data mining algorithms
    - Linear algorithms needed
  - Parallel and distributed methods
    - Incremental methods
    - Divide and conquer?
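An incremental method updates its result as each new record arrives instead of rescanning the whole data set. As a small illustration, the sketch below maintains a running mean and sample variance in a single pass using Welford's update. The class name `RunningStats` and the sample purchase amounts are assumptions made for the example.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's online update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics in O(1) time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical stream of purchase amounts arriving one at a time.
stats = RunningStats()
for amount in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(amount)
print(stats.n, stats.mean, stats.variance)
```

Each update touches only three numbers, so the total cost is linear in the number of records and memory use stays constant.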

15 Requirements/challenges in Data Mining (5)
- Data source
  - Diversity of data types
    - Handling complex types of data
    - Mining information from heterogeneous databases or information repositories
    - Can we expect a DM algorithm to do well on all types of data?
  - Data glut
    - Are we collecting the right data for the right answer?
    - Distinguish between important and unimportant data

16 Requirements/challenges in Data Mining (6)
- Social and security
  - Social impact
    - Private and sensitive data is gathered and mined without the individual’s knowledge and/or consent
    - Appropriate use and distribution of discovered knowledge
  - Regulations
    - Need for privacy and DM policies

17 Data Mining Tools

18 Summary
- Knowing one’s business is critical; technologies are coming together to support data mining.
- Data mining is the process and result of knowledge production, knowledge discovery and knowledge management.

