Some working definitions…. ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably Data mining = –the discovery of interesting,


1 Some working definitions
‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably
Data mining =
–the discovery of interesting, meaningful and actionable patterns hidden in large amounts of data
Multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, econometrics, …

2 Data mining is a process…
Business objectives
Model development
–Model objective
–Data collection & preparation
–Model construction
–Model evaluation
–Combining models with business knowledge into decision logic
Model / decision logic deployment
Model / decision logic monitoring

3 Data mining is a process… a marketing example
Business objectives
–Cross-sell the MMS bundle to lapsed users / non-users
Model development
–Model objective: for consumers with no MMS bundle in the past 6 months, predict MMS bundle ownership (yes/no) in the next three months
–Data collection & preparation: all fields for all active customers as of end APR05; remove all customers with an MMS bundle in NOV04-APR05; left join the MMS Bundle field from MAY05, JUNE05, JULY05
–Model construction: build various models to predict MMS Bundle MAY or JUNE or JULY = ‘N’ on 70% of the data
–Model evaluation: evaluate predictive power on the 70% of the data used for model development and on the 30% test set
–Combining models with business knowledge into decision logic: target the top 30% and randomly test two propositions (50 MMS for 5 Euro; 100 MMS for 7.50 Euro) across two channels (direct mail and SMS)
Model / decision logic deployment
–Run the campaign
Model / decision logic monitoring
–Compare predictions against actual response to evaluate model quality and robustness
–Determine which propositions / channels work best
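The mechanics of this example (a 70/30 split, targeting the top 30% by score, and randomly assigning proposition and channel) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer IDs and scores are synthetic placeholders, and the helper names `split_train_test`, `top_fraction` and `assign_test_cells` are hypothetical, not taken from the presentation.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into a development set and a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def top_fraction(scored, fraction=0.3):
    """Return the customer IDs with the highest scores (top `fraction`)."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    keep = int(len(ranked) * fraction)
    return [customer_id for customer_id, _ in ranked[:keep]]


def assign_test_cells(customer_ids, propositions, channels, seed=0):
    """Randomly assign each targeted customer a (proposition, channel) cell."""
    rng = random.Random(seed)
    cells = [(p, c) for p in propositions for c in channels]
    return {customer_id: rng.choice(cells) for customer_id in customer_ids}


# Synthetic stand-in for the prepared customer table (assumption).
customers = list(range(1000))
train, test = split_train_test(customers)

# Placeholder scores; in practice these would come from the fitted model.
score_rng = random.Random(1)
scored = [(customer_id, score_rng.random()) for customer_id in customers]

targets = top_fraction(scored, fraction=0.3)
propositions = ["50 MMS for 5 Euro", "100 MMS for 7.50 Euro"]
channels = ["Direct mail", "SMS"]
cells = assign_test_cells(targets, propositions, channels)
```

Random assignment to the four proposition/channel cells is what later allows the monitoring step to compare response rates between cells fairly.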

4 Data mining tasks
Undirected, explorative, descriptive, ‘unsupervised’ data mining
–Matching & search
–Profile & rule extraction
–Clustering & segmentation; dimension reduction
Directed, predictive, ‘supervised’ data mining
–Predictive modeling

5 Data mining task example: Clustering & segmentation

7 Start Looking Glass Source: Sentient Information Systems (www.sentient.nl)

8 Intermediate result Looking Glass Source: Sentient Information Systems (www.sentient.nl)

9 Result Looking Glass Source: Sentient Information Systems (www.sentient.nl)

10 Result Looking Glass Source: Sentient Information Systems (www.sentient.nl)

11 Data mining task example: predictive modeling
[Figure: past experience (data on good and bad behaviour) is used to build a model; the model places each case on a 1 to 10 score axis whose ends are labelled ‘Worse business’ and ‘Better business’. Case A scores 7, Case B scores 4.]

12 Collected data

13 Data mining task example: predictive modeling
score = (0 x Income) + (-1 x Age) + (25 x Children)

14 Data mining techniques for predictive modeling
–Linear and logistic regression
–Decision trees
–Neural networks
–Nearest neighbor
–Genetic algorithms
–…

15 Linear Regression Models
score = (0 x Income) + (-1 x Age) + (25 x Children)
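As a small worked example, the linear score on this slide can be evaluated directly. The weights (0, -1, 25) are the ones shown on the slide; the example customer and the names `WEIGHTS` and `score` are made up for illustration.

```python
# Weights as shown on the slide: score = 0*Income + (-1)*Age + 25*Children
WEIGHTS = {"income": 0, "age": -1, "children": 25}


def score(customer):
    """Weighted sum of the customer's attributes (a linear model without intercept)."""
    return sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)


# Hypothetical customer, not from the presentation.
example = {"income": 40000, "age": 35, "children": 2}
print(score(example))  # 0*40000 - 35 + 25*2 = 15
```

With a weight of 0, income has no influence on the score, however large it is; only age and the number of children move the result.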

16 Regression in pattern space
[Figure: pattern space with axes age and income; classes ‘circle’ and ‘square’]
Only a single line is available in pattern space to separate the classes

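17 Decision Trees
[Figure: decision tree with branches labelled no / yes / no]
Root: 20000 customers, response 1%; split: Income > 150000?
Node: 18800 customers; split: Purchases > 10?
Node: 1200 customers; split: Balance > 50000?
Leaf: 800 customers, response 1.8%; etc.
Leaf: 400 customers, response 0.1%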

18 Decision Trees in Pattern Space
[Figure: pattern space with axes age and income]
Line pieces perpendicular to the axes
Each line is a split in the tree: two answers to a question

19 Decision Trees in Pattern Space
[Figure: pattern space with axes age and weight]
The goal of the classifier is to separate the classes (circle, square) on the basis of the attributes age and income
Each line corresponds to a split in the tree
Decision areas are ‘tiles’ in pattern space
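To make the idea of axis-perpendicular splits concrete, here is a minimal sketch that searches for the single best split on a toy data set. It assumes Gini impurity as the split criterion, which the slides do not specify; the data points and the function names `gini` and `best_split` are illustrative only.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(points, labels):
    """Try every axis-perpendicular threshold and return the one with the lowest
    weighted impurity, as (feature_index, threshold, impurity)."""
    best = None
    n = len(points)
    for feature in range(len(points[0])):
        values = sorted(set(p[feature] for p in points))
        # Candidate thresholds lie halfway between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [l for p, l in zip(points, labels) if p[feature] <= threshold]
            right = [l for p, l in zip(points, labels) if p[feature] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[2]:
                best = (feature, threshold, impurity)
    return best


# Toy pattern space: (age, income) with two classes (made-up data).
points = [(25, 20000), (30, 30000), (45, 25000), (50, 40000)]
labels = ["circle", "circle", "square", "square"]
print(best_split(points, labels))  # (0, 37.5, 0.0): split on age at 37.5
```

A full tree repeats this search inside each resulting region, which is why the decision areas end up as rectangular tiles.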

20 Nearest Neighbour
The data itself is the classification model, so there is no abstraction such as a tree
For a given instance x, search for the k instances that are most similar to x
Classify x as the most frequently occurring class among those k most similar instances
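The procedure described on this slide can be sketched as follows. The sketch assumes that "most similar" means smallest Euclidean distance, which the slide does not state, and it uses made-up (age, weight) data; `knn_classify` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_classify(x, instances, labels, k=3):
    """Classify x by majority vote among its k nearest stored instances."""
    # Rank all stored instances by their distance to x (Euclidean: an assumption).
    ranked = sorted(zip(instances, labels), key=lambda pair: math.dist(x, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: (age, weight) pairs with their classes (illustrative only).
instances = [(25, 60), (30, 65), (28, 62), (55, 90), (60, 95), (58, 88)]
labels = ["circle", "circle", "circle", "square", "square", "square"]

print(knn_classify((27, 63), instances, labels, k=3))
print(knn_classify((57, 91), instances, labels, k=3))
```

There is no training step: all the work happens at classification time, and every stored instance is compared with x. In practice attributes on different scales are usually normalised first so that one attribute does not dominate the distance.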

21 Nearest Neighbor in Pattern Space: classification
[Figure: pattern space with axes e.g. age and e.g. weight; a marker denotes the new instance]
Any decision area possible
Condition: enough data available

22 Nearest Neighbor in Pattern Space: prediction
[Figure: pattern space with axes e.g. age and e.g. weight]
Any decision area possible
Condition: enough data available

23 Example classification algorithm 3: Neural Networks
Inspired by neuronal computation in the brain (McCulloch & Pitts 1943 (!))
Input (attributes) is coded as activation on the input-layer neurons; activation feeds forward through a network of weighted links between neurons and causes activations on the output neurons (for instance diabetic yes/no)
The algorithm learns the optimal weights from the training instances using a general learning rule

24 Neural Networks: example simple network (2 layers)
Probability of being diabetic = f(age * weight_age + body_mass_index * weight_body_mass_index)
[Figure: inputs age and body_mass_index are connected by the links weight_age and weight_body_mass_index to the output ‘Probability of being diabetic’]
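The formula on this slide can be sketched as a single output neuron. The slide leaves f unspecified; the logistic (sigmoid) function is assumed here because it maps any weighted sum to a value between 0 and 1. The weight values are arbitrary placeholders, not learned from data.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation; assumed choice for f, not given on the slide."""
    return 1.0 / (1.0 + math.exp(-z))


def diabetic_probability(age, body_mass_index, weight_age, weight_body_mass_index):
    """Two-layer network from the slide: weighted sum of the inputs passed through f."""
    activation = age * weight_age + body_mass_index * weight_body_mass_index
    return logistic(activation)


# Placeholder weights for illustration only.
p = diabetic_probability(age=50, body_mass_index=30,
                         weight_age=0.01, weight_body_mass_index=0.05)
print(round(p, 3))
```

Because the logistic function is monotonically increasing, comparing the output with a fixed threshold is the same as comparing the weighted sum with a constant, so the boundary between the two predicted classes is a straight line in the (age, body mass index) plane.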

25 Neural Networks in Pattern Space: classification
[Figure: pattern space with axes e.g. age and e.g. weight]
Simple network: only a line is available (why?) to separate the classes
Multilayer network: any classification boundary possible

26 Dilbert’s Perspective on Data Mining

