Presentation is loading. Please wait.

Presentation is loading. Please wait.

VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS Ivo Kondapaneni, Pavel Kordík, Pavel Slavík Department of Computer Science and Engineering,

Similar presentations


Presentation on theme: "VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS Ivo Kondapaneni, Pavel Kordík, Pavel Slavík Department of Computer Science and Engineering,"— Presentation transcript:

1 VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS Ivo Kondapaneni, Pavel Kordík, Pavel Slavík Department of Computer Science and Engineering, Faculty of Eletrical Engineering, Czech Technical University in Prague, Czech Republic Presenting author: Pavel Kordík (kordikp@fel.cvut.cz) Ivo Kondapaneni, Pavel Kordík, Pavel Slavík Department of Computer Science and Engineering, Faculty of Eletrical Engineering, Czech Technical University in Prague, Czech Republic Presenting author: Pavel Kordík (kordikp@fel.cvut.cz)

2 2 Overview Motivation Data mining models Visualization based on sensitivity analysis Regression problems Classification problems Definition of interesting plots Genetic search for 2D and 3D plots

3 3 Motivation Data mining – extracting new, potentially useful information from data DM Models are automatically generated Are models always credible? Are models comprehensible? How to extract information from models? Visualization

4 4 Data mining models Often black-box models generated from data E.g. Neural networks What is inside? Data mining black box model Input variables Output variable (s)

5 5 Inductive model Estimates output from inputs Generated automatically Evolved by niching GA Grows from minimal form Contains hybrid units Several training methods Ensemble of models

6 6 Example: Housing data CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA MEDV Per capita crime rate by town Proportion of owner-occupied units built prior to 1940 Weighted distances to five Boston employment centres Input variables Output variable Median value of owner-occupied homes in $1000's

7 7 Housing data – records CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA MEDV Input variables Output variable 24 0.00632 18 2.31 53.8 6.575 65.2 4.09 1 296 15.3 396.9 4.98 21.6 0.02731 0 7.07 46.9 6.421 78.9 4.9671 2 242 17.8 396.9 9.14 … … …

8 8 Housing data – inductive model CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA MEDV Input variables Output variable Niching genetic algorithm evolves units in first layer sigmoid MEDV=1/(1-exp(-5.724*CRIM+ 1.126)) sigmoid MEDV=1/(1-exp(-5.861*AGE+ 2.111)) Error: 0.13Error: 0.21

9 9 Housing data – inductive model CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA MEDV Input variables Output variable Niching genetic algorithm evolves units in second layer sigmoid Error: 0.13Error: 0.21 sigmoid Error: 0.26 linear Error: 0.24 polyno mial MEDV=0.747*(1/(1-exp(-5.724*CRIM+ 1.126))) +0.582*(1/(1-exp(-5.861*AGE+ 2.111))) 2 +0.016 Error: 0.10

10 10 Housing data – inductive model CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA MEDV Input variables Output variable sigmoid linear polyno mial polyno mial linear expo nential Error: 0.08 Constructed model has very low validation error!

11 11 Housing data – inductive model CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA MEDV Input variables Output variable S SSL P PL E Error: 0.08 MEDV=(exp((0.038* 3.451*(1/(1-exp(-5.724*CRIM+ 1.126)))*(1/(1- exp(2.413*DIS-2.581)))*(1/(1-exp(2.413*DIS-2.581)))+0.429*(1/(1- exp(-5.861*AGE+ 2.111)))+0.024*(1/(1-exp(2.413*DIS- 2.581)))+0.036+ 0.038*0.350*(1/(1-exp(-3.613*RAD-0.088)))+ 0.999*( 0.747*(1/(1-exp(-5.724*CRIM+ 1.126)))+0.582*(1/(1-exp(- 5.861*AGE+ 2.111)))*(1/(1-exp(-5.861*AGE+ 2.111)))+0.016)- 0.046*(1/(1-exp(-5.724*CRIM+ 1.126)))-0.079+ 0.002*INDUS- 0.001*LSTA+ 0.150)*0.860)*13.072)-14.874 Math equation is not comprehensible any more – we have to treat it as a black box model!

12 12 Visualization based on sensitivity analysis GAME

13 13 Sensitivity analysis of inductive model of MEDV House no. 189House no. 164 What will happen with the value of house when criminality in the area decreases/increases? Credible output?

14 14 Ensemble of inductive models Random initialization Random initialization Developing on the same Developing on the same training set Training affect just well Training affect just well defined areas of input space Each model - unique architecture, Each model - unique architecture, similar complexity similar transfer functions similar transfer functions Similar behavior for well defined areas Similar behavior for well defined areas Different behavior – under-defined areas Different behavior – under-defined areas ykyk y k-1 y k+1 i = x 2 minmax GAME

15 15 Credibility of models: Artificial data set Credibility: the criterion is a dispersion of models` responses. Advantages: No need of the training data set, Modeling method success considered, Inputs importance considered.

16 16 Example: Models of hot water consumption

17 17 Cold water consumption, increasing humidity

18 18 Models on Housing data Single modelEnsemble of 10 models BeforeAfter

19 19 Classification problems Data: Setoza class Virginica class Versicolor class Petal length Petal width Blue GAME model (Iris Setoza class) output = 1 decision boundary output = 0

20 20 Credibility of classifiers

21 21 Overlapping models X ensemble

22 22 Random behavior filtered out BeforeAfter

23 23 Problem – how to find information in n-dim space? Multidimensional space of input variables What we are looking for? –Interesting relationship of IO variables –Regions of high sensitivity –Credible models (compromise response) Can we automate the search?

24 24 When a plot is interesting for us? x istart x isize xixi

25 25 Definition of interesting plot Minimal volume of the envelope p min Maximal sensitivity of the output to the change of x i input variable – y size max Maximal size of the area – x isize max

26 26 Multiobjective optimization Interestingness: Unknown variables: –x 1,x 2,..., x i-1,x i+1,…x n x istart, x isize We will use “Niching” genetic algorithm Chromosome: x 1 x 2... x i-1 x i+1 … x n x istart x isize

27 27 Niching GA on simple data GAME ensemble Training data Chromosome: x start x size fitness = x size * 1/p * y size fitness x size x start Very simple problem Search space is 2D, can be visualized

28 28 Niching GA locates also local optima Three subpopulations (niches) of individuals survived

29 29 Automated retrieval of plots showing interesting behavior Genetic Algorithm Genetic algorithm with special fitness function is used to adjust all other inputs (dimensions) Best so far individual found (generation 0 – 17)

30 30 Housing data – interesting plot retrieved BeforeAfter Low fitness High fitness

31 31 Conclusion Credible regression Credible classification Automated retrieval

32 32 Future work I Automated knowledge extraction from data

33 33 Future work II FAKE GAME framework

34 34 Future work III Just released as open source project –Automated data preprocessing –Automated model building, validation –Optimization methods –Visualization see and join us: http://www.sourceforge.net/projects/fakegame


Download ppt "VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS Ivo Kondapaneni, Pavel Kordík, Pavel Slavík Department of Computer Science and Engineering,"

Similar presentations


Ads by Google