Data Mining and Application Part 1: Data Mining Fundamentals Part 2: Tools for Knowledge Discovery Part 3: Advanced Data Mining Techniques Part 4: Intelligent.


1 Data Mining and Application
Part 1: Data Mining Fundamentals
Part 2: Tools for Knowledge Discovery
Part 3: Advanced Data Mining Techniques
Part 4: Intelligent Systems

2 Data Mining: A First View
Chapter 1

3 1.1 Data Mining: A Definition
The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

4 Induction-based Learning
The process of forming general concept definitions by observing specific examples of concepts to be learned.
–Many televised golf tournaments are sponsored by online brokerage firms
–Advertise rap music in magazines for senior citizens
–Suspect a stolen credit card

5 Knowledge Discovery in Databases (KDD)
The application of the scientific method to data mining. Data mining is one step of the KDD process.

6 1.2 What Can Computers Learn?

7 Four Levels of Learning
Facts
–Sea is blue
Concepts
–Trees, rules, networks, and mathematical equations
Procedures
–A step-by-step course of action to achieve a goal
Principles
–General truths or laws

8 Concepts
Computers are good at learning concepts. Concepts are the output of a data mining session.
Three concept views
–Classical view
–Probabilistic view
–Exemplar view

9 Classical View
All concepts have definite defining properties.
IF Annual Income >= 30,000 & Years at Current Position >= 5 & Owns Home = True
THEN Good Credit Risk = True
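Read literally, the classical-view rule is a strict predicate: an instance belongs to the concept only if every defining property holds. The sketch below is one way to express that; the `Applicant` record and its field names are hypothetical and simply mirror the attributes named in the rule.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical field names that mirror the attributes in the rule.
    annual_income: float
    years_at_current_position: float
    owns_home: bool


def is_good_credit_risk(a: Applicant) -> bool:
    """Classical view: all defining properties must hold at once."""
    return (
        a.annual_income >= 30_000
        and a.years_at_current_position >= 5
        and a.owns_home
    )
```

Under this reading an applicant who misses a single condition, for example one who rents, is simply not a member of the concept; there is no partial membership.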

10 Probabilistic View
Concepts are represented by properties that are probable of concept members.
–The majority of good credit risks own their own home
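A statement such as "the majority of good credit risks own their own home" amounts to saying that the share of concept members with that property is above one half. The sketch below computes such a share; the helper name `property_probability` and the four-record data set are invented purely for illustration.

```python
def property_probability(members, has_property):
    """Fraction of concept members for which has_property(member) is true."""
    members = list(members)
    if not members:
        raise ValueError("need at least one concept member")
    return sum(1 for m in members if has_property(m)) / len(members)


# Toy data, invented for illustration: known good credit risks.
good_credit_risks = [
    {"owns_home": True},
    {"owns_home": True},
    {"owns_home": False},
    {"owns_home": True},
]

p_owns_home = property_probability(good_credit_risks, lambda m: m["owns_home"])
```

With this toy data three of four members own a home, so home ownership is probable of the concept without being required.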

11 Exemplar View
A given instance is determined to be an example of a particular concept by comparing it with known examples of that concept.
Example of a good credit risk:
–Annual Income = 32,000
–Number of Years at Current Position = 6
–Homeowner

12 Supervised Learning
–Build a learner model using data instances of known origin.
–Use the model to determine the outcome of new instances of unknown origin.

13 Decision Tree
A tree structure where nonterminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.


17 Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
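The three production rules map directly onto nested tests, which is also how a decision tree over the same two attributes would be walked. A minimal sketch follows; the function name `diagnose` and the boolean encoding of Yes/No are assumptions.

```python
def diagnose(swollen_glands: bool, fever: bool) -> str:
    """Apply the three production rules; True stands for Yes, False for No."""
    if swollen_glands:
        # Rule 1 does not test Fever at all.
        return "Strep Throat"
    if fever:
        # Rule 2: Swollen Glands = No & Fever = Yes.
        return "Cold"
    # Rule 3: Swollen Glands = No & Fever = No.
    return "Allergy"
```

Because the rules are mutually exclusive and cover every combination of the two attributes, every instance receives exactly one diagnosis.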

18 Unsupervised Clustering
A data mining method that builds models from data without predefined classes.
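The slide does not name a clustering algorithm. As one familiar illustration of building groups without predefined classes, the sketch below uses plain k-means; the function name `k_means`, the starting centers, and the six data points are all assumptions made for the example.

```python
import math


def k_means(points, centers, iterations=10):
    """Group points around the given starting centers (plain k-means)."""
    centers = [tuple(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        # Assign each point to the index of its nearest center.
        assignment = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return assignment, centers


# Toy data invented for illustration: (age, annual income in thousands).
points = [(22, 45), (25, 50), (28, 55), (51, 85), (55, 90), (58, 88)]
assignment, centers = k_means(points, centers=[points[0], points[3]])
```

No class labels are supplied; the two groups emerge only from attribute similarity, and interpreting what each group means is left to the analyst.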


20 Questions
–Can I develop a general profile of an online investor?
–Can I determine if a new customer who does not initially open a margin account is likely to do so in the future?
–Can I build a model able to accurately predict the average number of trades per month for a new investor?
–What characteristics differentiate female and male investors?

21 Candidate questions for unsupervised clustering
–What attribute similarities group customers?
–What differences in attribute values segment the customer database?

22 Three Clusters
IF (Conditions) Margin Account = Yes & Age = 20-29 & Annual Income = 40-59K
THEN Cluster = 1
Accuracy = 0.8, Coverage = 0.5
–Accuracy: rule confidence over all instances. Example: this rule is wrong for 20% of the instances that satisfy its conditions.
–Coverage: rule significance for the cluster. Example: 50% of the instances in the cluster satisfy the conditions.

23 Other two rules
IF Account Type = Custodial & Favorite Recreation = Skiing & Annual Income = 80-90K
THEN Cluster = 2
Accuracy = 0.95, Coverage = 0.35
IF Account Type = Joint & Trades/Month > 5 & Transaction Method = Online
THEN Cluster = 3
Accuracy = 0.82, Coverage = 0.65
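The accuracy and coverage figures attached to the cluster rules can be computed by counting. The sketch below assumes the reading given for the first rule: accuracy is the share of instances satisfying the conditions that fall in the named cluster, and coverage is the share of that cluster's instances that satisfy the conditions. The function name and the nine-record data set are invented, and the data are chosen so the result matches the first rule's 0.8 and 0.5.

```python
def rule_accuracy_and_coverage(instances, condition, cluster):
    """Return (accuracy, coverage) for the rule IF condition THEN cluster.

    accuracy: of the instances satisfying the condition, the fraction in `cluster`.
    coverage: of the instances in `cluster`, the fraction satisfying the condition.
    """
    matching = [x for x in instances if condition(x)]
    in_cluster = [x for x in instances if x["cluster"] == cluster]
    hits = [x for x in matching if x["cluster"] == cluster]
    accuracy = len(hits) / len(matching) if matching else 0.0
    coverage = len(hits) / len(in_cluster) if in_cluster else 0.0
    return accuracy, coverage


# Toy data invented for illustration; only the margin-account condition is modeled.
instances = (
    [{"margin_account": True, "cluster": 1}] * 4
    + [{"margin_account": True, "cluster": 2}]
    + [{"margin_account": False, "cluster": 1}] * 4
)

accuracy, coverage = rule_accuracy_and_coverage(
    instances, lambda x: x["margin_account"], cluster=1
)
```

Here five instances satisfy the condition and four of them are in cluster 1 (accuracy 0.8), while cluster 1 has eight members of which four satisfy the condition (coverage 0.5).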

24 1.3 Is Data Mining Appropriate for My Problem?

25 Data Mining or Data Query?
Shallow Knowledge
–Is factual
Multidimensional Knowledge
–Is factual and stored in a multidimensional format
Hidden Knowledge
–Patterns or regularities
Deep Knowledge
–Need some direction to find it

26 Data Mining vs. Data Query
–Use data query if you already know, or almost know, what you are looking for.
–Use data mining to find regularities in data that are not obvious.

27 1.4 Expert Systems or Data Mining?

28 Expert System
A computer program that emulates the problem-solving skills of one or more human experts.

29 Knowledge Engineer
A person trained to interact with an expert in order to capture their knowledge.


31 1.5 A Simple Data Mining Process Model
–Assembling the Data
–Mining the Data
–Interpreting the Results
–Result Application


33 Assembling the Data
The Data Warehouse
–Only data useful for decision support is extracted from the operational environment
Relational Databases and Flat Files

34 Mining the Data
–Supervised learning or unsupervised?
–Which instances will be used?
–Which attributes will be selected?
–Setting learning parameters

35 Interpreting the Results
If the results are less than optimal, we can repeat the data mining step using new attributes and/or instances.

36 Result Application
Apply what has been discovered to new situations.
–Baby diapers and beer

37 1.6 Why Not Simple Search?
–Nearest Neighbor Classifier
–K-nearest Neighbor Classifier
Problems:
–Computation times
–Differentiating between relevant and irrelevant attributes
–Which attributes are able to differentiate the classes
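A k-nearest neighbor classifier needs no training step: it keeps every instance and searches them at classification time. The sketch below is a minimal version using Euclidean distance and majority vote; the function name `knn_classify`, the default `k=3`, and the toy instances are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training instances.

    `training` is a list of (feature_vector, label) pairs. Every call measures
    the distance to every stored instance, which is the computation-time cost.
    """
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: two numeric attributes per instance.
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B"),
]
```

Because Euclidean distance weights every attribute equally, an irrelevant attribute with a wide range of values can dominate the distance, so simple search gives no help in deciding which attributes actually separate the classes.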

38 1.7 Data Mining Applications
–Fraud Detection
–Health Care
–Business and Finance
–Scientific Applications
–Sports and Gaming

39 Customer Intrinsic Value
A customer's expected value based on the historical value of similar customers. Once it is determined, an appropriate marketing strategy can be applied.
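The slide does not say how "similar customers" are identified or how their historical value is combined. The sketch below makes two loud assumptions: customers are similar when they share a segment label, and the expected value is the mean historical value of that segment. The function name, segment names, and amounts are all invented for illustration.

```python
from statistics import mean


def intrinsic_value(history, segment):
    """Mean historical value of past customers in the same segment.

    `history` is a list of (segment, historical_value) pairs. Treating
    "same segment" as "similar" is an assumption, not the slide's definition.
    """
    values = [value for seg, value in history if seg == segment]
    if not values:
        raise ValueError(f"no historical customers in segment {segment!r}")
    return mean(values)


# Toy data invented for illustration.
history = [
    ("online_trader", 1200.0),
    ("online_trader", 800.0),
    ("custodial", 300.0),
]
expected = intrinsic_value(history, "online_trader")
```

A new customer placed in the `online_trader` segment would be assigned the average of that segment's past values, and the marketing strategy could then be chosen against that figure.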


