Understanding Medical Data Włodzisław Duch Department of Informatics Nicholas Copernicus University, Toruń, Poland

2 Plan
1. What's the problem?
2. What we would like to have
3. How to get it
4. Some methods to understand the data
5. Some results
6. Examples of applications
7. Expert system for psychometry
8. Conclusions

3 Department of Computer Methods
Computational intelligence methods: neural networks, decision trees, similarity-based methods, visualization.
Cognitive and brain science: theory of mind, modeling human attention.
Applications: data from psychometrics, medicine, astronomy, physics, chemistry...

4 with help from...

5 What's the problem?
44,000 to 98,000 patients die from medical errors every year in US hospitals. More people die from medical errors during hospitalization than from motor vehicle accidents, breast cancer, or AIDS (Institute of Medicine, Dec. 1999).
Decision Support Systems (DSS):
- Expert systems: knowledge is spoon-fed to the DSS; this works if the domain knowledge is well understood.
- Intuition, experience, implicit knowledge: discover the knowledge hidden in the data and use it in the DSS.

6 What we would like to have
Start from good and bad examples evaluated by experts.
Understand the data: provide rules; provide prototype cases and similarity measures; provide visualization.
Rules should be: simple and/or accurate; reliable and/or general; robust and stable; they should include alternatives and eliminate improbable ones.

7 Logical explanations
Logical rules, if simple enough, are usually preferred.
Rules may expose the limitations of black-box methods: statistical, neural or computational intelligence (CI).
Only relevant features are used in rules.
Rules are sometimes more accurate than neural networks (NN) and other CI methods.
Overfitting is easy to control, since rules usually have a small number of parameters.
Rules forever?! IF the number of rules is relatively small AND the accuracy is sufficiently high, THEN rules may be an optimal choice.

8 Data example
Cleveland Clinic Foundation heart-disease data: numerical, symbolic, logical and missing features.
44, male,atyp_angina,120,263,f,normal,173,no,0, up,0, reversible_defect,'<50'
52, male,non_anginal,172,199,t,normal,162,no,0.5, up,0, reversible_defect,'<50'
48, male,atyp_angina,110,229,f,normal,168,no,1, down,0, reversible_defect,'>50_1'
54, male,asympt,140,239,f,normal,160,no,1.2, up,0, normal,'<50'
48, female,non_anginal,130,275,f,normal,139,no,0.2, up,0, normal,'<50'
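A minimal sketch of how such mixed-type records might be parsed in Python. The column names are hypothetical and assume the usual UCI Cleveland attribute order (age, sex, chest pain, resting blood pressure, cholesterol, fasting sugar, resting ECG, maximum heart rate, exercise angina, ST depression, slope, number of vessels, thal, diagnosis). The `?` missing-value marker is also an assumption (it is the UCI convention), not something shown on the slide.

```python
import csv
import io

# Hypothetical column names, assuming the usual UCI Cleveland attribute order.
COLUMNS = ["age", "sex", "chest_pain", "rest_bp", "cholesterol",
           "fasting_sugar", "rest_ecg", "max_heart_rate", "exercise_angina",
           "oldpeak", "slope", "vessels", "thal", "diagnosis"]
NUMERIC = {"age", "rest_bp", "cholesterol", "max_heart_rate", "oldpeak", "vessels"}
LOGICAL = {"fasting_sugar": {"t": True, "f": False},
           "exercise_angina": {"yes": True, "no": False}}


def parse_value(name, raw):
    """Convert one raw field into a number, a Boolean, a symbol or None."""
    raw = raw.strip().strip("'")
    if raw == "?":                      # assumed missing-value marker
        return None
    if name in NUMERIC:
        return float(raw)
    if name in LOGICAL:
        return LOGICAL[name][raw]
    return raw                          # symbolic feature kept as a string


def parse_records(text):
    """Parse comma-separated rows into one dict per patient."""
    return [{n: parse_value(n, v) for n, v in zip(COLUMNS, row)}
            for row in csv.reader(io.StringIO(text)) if row]


# Two rows copied from the slide.
DATA = """44, male,atyp_angina,120,263,f,normal,173,no,0, up,0, reversible_defect,'<50'
54, male,asympt,140,239,f,normal,160,no,1.2, up,0, normal,'<50'"""

records = parse_records(DATA)
```

Keeping symbolic values as strings and logical ones as Booleans leaves the choice of encoding (one-hot, linguistic variables, similarity measures) to the method that consumes the data.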

9 Logical rules
Crisp logic rules: for continuous x use linguistic variables (predicate functions):
$s_k(x) \equiv \mathrm{True}[X_k \le x \le X'_k]$, for example:
$\mathrm{low}(x) = \mathrm{True}\{x \mid x < 70\}$
$\mathrm{normal}(x) = \mathrm{True}\{x \mid x \in [70, 120]\}$
$\mathrm{high}(x) = \mathrm{True}\{x \mid x > 120\}$
Linguistic variables are used in crisp (propositional, Boolean) logic rules:
IF small-height(X) AND has-hat(X) AND has-beard(X) THEN (X is a Brownie) ELSE IF ... ELSE ...
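The crisp linguistic variables above can be sketched in Python as plain predicates over intervals. The cut-offs 70 and 120 come from the slide's example. The function names and the helper `linguistic_value` are illustrative only.

```python
# Crisp linguistic variables: each one is a predicate over an interval.
def low(x):
    return x < 70


def normal(x):
    return 70 <= x <= 120


def high(x):
    return x > 120


def linguistic_value(x):
    """Return the name of the single crisp linguistic value that is true for x."""
    for name, predicate in (("low", low), ("normal", normal), ("high", high)):
        if predicate(x):
            return name
    raise ValueError(f"no linguistic value covers {x!r}")


# A crisp propositional rule built from predicates (the slide's Brownie example).
def is_brownie(small_height, has_hat, has_beard):
    return small_height and has_hat and has_beard
```

Because the intervals partition the real line, exactly one value is true for any number: the "rectangular membership" behaviour described on the next slide.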

10 Crisp/fuzzy logic decisions
Crisp logic is based on rectangular membership functions: True/False values jump from 0 to 1. Step functions partition the input space.
Fuzzy logic: $x \in X$ (no/yes) is replaced by a degree of membership $\mu_X(x) \in [0, 1]$. Triangular, trapezoidal, Gaussian or other membership functions are used.
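For comparison, here is a short Python sketch of the crisp (rectangular) membership function next to the triangular, trapezoidal and Gaussian ones named above. The parameter names (`a`, `b`, `m`, `center`, `width`) are our own choices.

```python
import math


def rectangular(x, a, b):
    """Crisp membership: jumps from 0 to 1 at a and back to 0 after b."""
    return 1.0 if a <= x <= b else 0.0


def triangular(x, a, m, b):
    """Rises linearly from a to the peak m, then falls linearly to b."""
    if x <= a or x >= b:
        return 0.0
    return (x - a) / (m - a) if x <= m else (b - x) / (b - m)


def trapezoidal(x, a, b, c, d):
    """0 below a, ramps up on [a, b], equals 1 on [b, c], ramps down on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, center, width):
    """Smooth membership degree, equal to 1 at the center."""
    return math.exp(-((x - center) / width) ** 2 / 2)
```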

11 Providing rules
CI approaches: neural networks, decision trees, similarity-based methods, inductive machine learning methods...
Neural networks: many types, a large field. The most popular is the threshold logic (perceptron) unit, which realizes an M-of-N rule: IF (M of N conditions are true) THEN ...
Multi-layer perceptron (MLP) networks stack many such units.
Problem: for N inputs the number of subsets is $2^N$, so the number of possible conjunctive rules grows exponentially.
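The M-of-N behaviour of a threshold unit can be checked directly. In this Python sketch, a unit with all weights equal to +1 and threshold M is compared with the equivalent disjunction of C(N, M) conjunctive rules. The function names are illustrative.

```python
from itertools import combinations


def m_of_n_unit(inputs, m):
    """Threshold logic unit with all weights +1 and threshold m:
    fires when at least m of the N binary conditions are true."""
    return sum(1 for c in inputs if c) >= m


def as_conjunctions(n, m):
    """The same M-of-N rule written as an OR of conjunctions, one per
    subset of exactly m conditions: C(n, m) conjunctive rules."""
    return list(combinations(range(n), m))


def m_of_n_via_conjunctions(inputs, m):
    return any(all(inputs[i] for i in subset)
               for subset in as_conjunctions(len(inputs), m))
```

A single unit replaces up to C(N, M) conjunctions, for example 252 for 5-of-10. Summing over all M gives the $2^N$ subsets that a naive search over conjunctive rules would face.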

12 MLP2LN
Converts MLP neural networks into networks performing logical operations (LN).
- Input layer.
- Aggregation: better features.
- Output: one node per class.
- Rule units: threshold logic.
- Linguistic units: windows, filters.

13 MLP2LN training
Constructive algorithm: add as many nodes as needed.
Optimize a cost function: minimize errors, enforce zero connections, and leave only +1 and -1 weights, which makes interpretation easy.
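One cost function with the three properties listed is sketched below in Python: squared error, plus weight decay that favours zero connections, plus a term that vanishes only at w ∈ {-1, 0, +1}. The polynomial $w^2(w-1)^2(w+1)^2$ and the coefficients `lam1` and `lam2` are assumptions chosen for illustration. Consult the IEEE TNN paper for the authors' exact cost and training schedule.

```python
def mlp2ln_style_penalty(weights, lam1, lam2):
    """Regularization added to the error (assumed form): lam1 favours zero
    weights (weight decay); lam2 vanishes only at w in {-1, 0, +1}."""
    decay = sum(w * w for w in weights)
    snap = sum(w * w * (w - 1) ** 2 * (w + 1) ** 2 for w in weights)
    return 0.5 * lam1 * decay + 0.5 * lam2 * snap


def penalty_gradient(w, lam1, lam2):
    """Derivative of the per-weight penalty, usable in gradient descent.
    d/dw [w^2 (w^2 - 1)^2] = 2 w (w^2 - 1)(3 w^2 - 1)."""
    return lam1 * w + lam2 * w * (w * w - 1) * (3 * w * w - 1)


def cost(errors, weights, lam1, lam2):
    """Squared output error plus the weight penalty."""
    return 0.5 * sum(e * e for e in errors) + mlp2ln_style_penalty(weights, lam1, lam2)
```

Raising `lam2` late in training pushes the surviving weights onto ±1, so each unit can be read as a logical rule.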

14 L-units
Create linguistic variables.
Numerical representation for R-nodes: $V_{s_k} = (\ldots)$ for $s_k$ = low, $V_{s_k} = (\ldots)$ for $s_k$ = normal.
L-units: 2 thresholds as adaptive parameters; logistic $\sigma(x)$ or $\tanh(x) \in [-1, +1]$.
As the slopes increase, soft trapezoidal functions change into rectangular filters (Parzen windows).
4 types, depending on the signs $S_i$.
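A window-shaped L-unit can be sketched in Python as the difference of two logistic functions with adaptive thresholds `b` < `b_prime`. The `slope` parameter is our illustrative name for the steepness. The other three L-unit types, which depend on the signs $S_i$, are not reproduced here.

```python
import math


def logistic(z):
    """Numerically stable logistic function sigma(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l_unit(x, b, b_prime, slope):
    """Soft trapezoidal window: about 1 inside [b, b_prime] and about 0 outside.
    As slope grows it approaches a rectangular filter."""
    return logistic(slope * (x - b)) - logistic(slope * (x - b_prime))
```

With a small slope the unit gives a graded degree of membership. With a large slope the output is effectively the crisp predicate "b ≤ x ≤ b_prime".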

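15 Iris example
Network after training (weights for $x_1; x_2; x_3; x_4$, three linguistic values each, and the unit threshold $\theta$):
iris setosa: $\theta = 1$, (0,0,0; 0,0,0; +1,0,0; +1,0,0)
iris versicolor: $\theta = 2$, (0,0,0; 0,0,0; 0,+1,0; 0,+1,0)
iris virginica: $\theta = 1$, (0,0,0; 0,0,0; 0,0,+1; 0,0,+1)
Rules:
IF ($x_3 = s \lor x_4 = s$) THEN setosa
IF ($x_3 = m \land x_4 = m$) THEN versicolor
IF ($x_3 = l \lor x_4 = l$) THEN virginica
The rules make 3 errors (98% accuracy).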
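The trained weights and thresholds can be read as rules in Python. Two +1 weights with threshold 1 act as OR, and with threshold 2 they act as AND. The slide does not give the cut-offs that define s/m/l, so this sketch assumes the inputs are already discretized into linguistic values.

```python
LEVELS = ("s", "m", "l")  # small, medium, large linguistic values

# Class -> (threshold, weights for features x1..x4, three levels each), from the slide.
UNITS = {
    "setosa":     (1, [(0, 0, 0), (0, 0, 0), (1, 0, 0), (1, 0, 0)]),
    "versicolor": (2, [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0)]),
    "virginica":  (1, [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1)]),
}


def encode(levels):
    """One-hot encode the linguistic value of each of the four features."""
    return [tuple(1 if lv == name else 0 for name in LEVELS) for lv in levels]


def fired_classes(levels):
    """Return every class whose threshold unit fires for the given input."""
    x = encode(levels)
    fired = []
    for cls, (threshold, w) in UNITS.items():
        activation = sum(wi * xi for wf, xf in zip(w, x) for wi, xi in zip(wf, xf))
        if activation >= threshold:
            fired.append(cls)
    return fired
```

For example, `fired_classes(["m", "m", "s", "s"])` returns `["setosa"]`. Inputs such as (x3 = s, x4 = l) fire two units, which is where the rules can conflict.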

16 Learning dynamics
Decision regions, shown every 200 training epochs in $x_3, x_4$ coordinates; the borders are optimally placed with wide margins.

17 Other rule extraction methods
Feature Space Mapping (FSM) neurofuzzy system: neural adaptation, estimating the probability density distribution with a single-hidden-layer network whose nodes realize separable functions, $G(X) = \prod_i G_i(X_i)$.
Separability Split Value (SSV) decision tree: based on maximizing the number of correctly separated pairs of vectors. Uni- or multivariate tests, easy to convert to rules.
Similarity-Based Learner (SBL): a framework for many similarity-based methods. It searches the space of all models for the best one for the given data and gives prototype-based rules, which are more general than fuzzy rules.
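Here is a Python sketch of an SSV-style split criterion on one continuous feature. It counts pairs of vectors from different classes that the split separates (the main term) and subtracts a penalty for same-class vectors split apart. The exact weighting (2× separated pairs minus the sum of minima) follows our reading of the SSV literature and should be treated as an assumption. `best_split` is a hypothetical helper that only tries midpoints between distinct values, so it needs at least two distinct values.

```python
def ssv(values, labels, split):
    """SSV-style score of the test 'value < split' on one feature."""
    classes = set(labels)
    left = [c for v, c in zip(values, labels) if v < split]
    right = [c for v, c in zip(values, labels) if v >= split]
    # Pairs (left vector of class c, right vector of another class) separated.
    separated = sum(left.count(c) * sum(1 for r in right if r != c) for c in classes)
    # Penalty: vectors of the same class sent to both sides.
    same_class_split = sum(min(left.count(c), right.count(c)) for c in classes)
    return 2 * separated - same_class_split


def best_split(values, labels):
    """Try midpoints between consecutive distinct values; keep the best score."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda s: ssv(values, labels, s))
```

For values `[1, 2, 3, 10, 11, 12]` labelled `a, a, a, b, b, b`, the best split is 6.5 with score 18, and that test reads directly as a rule: IF x < 6.5 THEN a ELSE b.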

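18 Applying rules
How to get probabilities out of rules?
Gaussian uncertainties: x is a Gaussian fuzzy number.
A set R of crisp logical rules (or any other system) applied to fuzzy inputs X gives probabilities $p(C_i|X)$ via Monte Carlo sampling.
For the crisp rule $R_a(x) = \{x \ge a\}$ and the fuzzy input value $G_x$, analytical evaluation of probabilities is based on the cumulant:
$p(R_a(G_x) = T) = \int_a^{\infty} G(y; x, s_x)\,dy = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - a}{s_x\sqrt{2}}\right)\right] \approx \sigma(\beta(x - a)), \quad \beta = \frac{2.4}{\sqrt{2}\,s_x}$
The error of this logistic approximation is below 0.02.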
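This Python sketch compares three ways of getting the probability that the crisp rule "y ≥ a" holds when the input is a Gaussian fuzzy number centred at x with dispersion `s_x`: the exact Gaussian cumulant (via `math.erf`), a logistic approximation, and Monte Carlo sampling. The slope β = 2.4/(√2 s_x) is taken from the MLP2LN literature and is an assumption here. The sample size and seed are arbitrary.

```python
import math
import random


def p_rule_exact(x, a, s_x):
    """P(y >= a) for y ~ N(x, s_x^2): the Gaussian cumulant."""
    return 0.5 * (1.0 + math.erf((x - a) / (s_x * math.sqrt(2.0))))


def p_rule_logistic(x, a, s_x):
    """Logistic approximation sigma(beta (x - a)) with beta = 2.4 / (sqrt(2) s_x)."""
    beta = 2.4 / (math.sqrt(2.0) * s_x)
    return 1.0 / (1.0 + math.exp(-beta * (x - a)))


def p_rule_monte_carlo(x, a, s_x, n=100_000, seed=0):
    """Sample the fuzzy input and count how often the crisp rule fires."""
    rng = random.Random(seed)
    return sum(rng.gauss(x, s_x) >= a for _ in range(n)) / n
```

Over the range |x − a| ≤ 5 s_x, the logistic curve stays within 0.02 of the exact cumulant. This is why a sigmoidal unit can stand in for a crisp rule applied to uncertain inputs.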

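19 Analytical solution
The rule $R_{ab}(x) = \{x \in [a, b]\}$ with the input $G_x$ has probability
$p(R_{ab}(G_x) = T) \approx \sigma(\beta(x - a)) - \sigma(\beta(x - b))$
This is exact for $\sigma(x)(1 - \sigma(x))$ (logistic) error distributions instead of Gaussian ones. In MLP neural networks this difference of sigmoids is what L-units compute.
Rules with two or more features in their conditions: add the probabilities and subtract the overlap, e.g. for two rules $p(C|X) = p(R_1) + p(R_2) - p(R_1 \land R_2)$.
Fuzzy rules applied to crisp data are equivalent to crisp rules applied to fuzzy input.
Large receptive fields: linguistic variables; small receptive fields: smooth edges.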
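The interval formula and the "add probabilities, subtract the overlap" step can be sketched in Python as follows. This sketch assumes independent uncertainties for different features, so a conjunction of conditions multiplies per-feature probabilities. It also uses the same assumed slope β = 2.4/(√2 s_x) as the logistic approximation. The rule representation (a dict mapping a feature name to an interval) is hypothetical.

```python
import math


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_interval(x, a, b, s_x):
    """P(a <= y <= b) for the fuzzy input: a difference of two logistics."""
    beta = 2.4 / (math.sqrt(2.0) * s_x)
    return logistic(beta * (x - a)) - logistic(beta * (x - b))


def p_rule(conditions, x, s_x):
    """Conjunction of interval conditions {feature: (a, b)}, independent features."""
    p = 1.0
    for feature, (a, b) in conditions.items():
        p *= p_interval(x[feature], a, b, s_x[feature])
    return p


def p_class_two_rules(r1, r2, x, s_x):
    """p(R1 or R2) = p(R1) + p(R2) - p(R1 and R2)."""
    overlap = dict(r1)
    for f, (a, b) in r2.items():
        if f in overlap:                      # shared feature: intersect the intervals
            lo, hi = max(overlap[f][0], a), min(overlap[f][1], b)
            if lo >= hi:                      # disjoint conditions: no overlap
                return p_rule(r1, x, s_x) + p_rule(r2, x, s_x)
            overlap[f] = (lo, hi)
        else:
            overlap[f] = (a, b)
    return p_rule(r1, x, s_x) + p_rule(r2, x, s_x) - p_rule(overlap, x, s_x)
```

When s_x becomes very small, `p_interval` tends to 1 inside [a, b] and to 0 outside, which recovers the crisp rule.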

20 Optimization
Confidence-rejection tradeoff.
Confusion matrix $F(C_i, C_j|M) = N_{ij}/N$: the frequency of assigning class $C_j$ to class $C_i$ by the model M.
Sensitivity: $S_+ = F_{++}/(F_{++} + F_{+-}) \in [0, 1]$
Specificity: $S_- = F_{--}/(F_{--} + F_{-+}) \in [0, 1]$
$S_- = 1$: class $-$ (sick) is never assigned to class $+$ (healthy); $S_+ = 1$: class $+$ is never assigned to class $-$.
Perfect sensitivity/specificity: minimize the off-diagonal elements of $F(C_i, C_j|M)$.
Maximize the number of correctly assigned cases (the diagonal).
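In Python, the confusion matrix and the two ratios above can be computed from lists of true and predicted labels. The class names `"+"` and `"-"` follow the slide, and the helper names are illustrative.

```python
from collections import Counter


def confusion(true_labels, predicted, classes=("+", "-")):
    """F[(i, j)] = N_ij / N: fraction of all cases of class i assigned to class j."""
    n = len(true_labels)
    counts = Counter(zip(true_labels, predicted))
    return {(i, j): counts[(i, j)] / n for i in classes for j in classes}


def sensitivity(F):
    """S+ = F++ / (F++ + F+-)."""
    return F[("+", "+")] / (F[("+", "+")] + F[("+", "-")])


def specificity(F):
    """S- = F-- / (F-- + F-+)."""
    return F[("-", "-")] / (F[("-", "-")] + F[("-", "+")])
```

With 4 positive and 6 negative cases, one positive missed and one negative misassigned, this gives S+ = 0.75, S- = 5/6, and a diagonal sum (accuracy) of 0.8.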

21 Advantages
Generation, uncertainty, optimization:
1. Network regularization parameters allow one to discover different sets of rules: simplest, most general and less accurate vs. more complex, specialized and more accurate.
2. Continuous probabilities instead of Yes/No answers stabilize the model: there are no sudden jumps in predictions.
3. New data always have some probability, but they may be labeled as 'unknown'; elimination instead of classification may be used if classes are strongly mixed.
4. Multivariate ROC curves may be generated by setting the output thresholds.
5. Data uncertainties $s_x$ may be used as adaptive parameters.
Large-scale gradient optimization of a cost function.

22 Applications
In medicine, science, technology...
Fun and benchmark: mushrooms. Stylometry: who wrote 'The Two Noble Kinsmen'?
Medical: recurrence of breast cancer (Ljubljana); diagnosis of breast cancer (Wisconsin); thyroid screening (J. Cook University, Australia); melanoma skin cancer (Rzeszów, Poland); hepatobiliary disorders (Tokyo).
Chemical: antibiotic activity of pyrimidine compounds; carcinogenicity of organic chemicals.
Psychometry: MMPI evaluations.

23 Mushrooms
The Mushroom Guide: there is no simple rule for mushrooms, no rule like 'leaflets three, let it be' for poison oak and poison ivy.
8124 cases, 51.8% edible, the rest non-edible.
22 symbolic attributes, up to 12 values each, equivalent to 118 logical features, or $2^{118} \approx 3 \cdot 10^{35}$ possible input vectors.
Odor: almond, anise, creosote, fishy, foul, musty, none, pungent, spicy.
Spore print color: black, brown, buff, chocolate, green, orange, purple, white, yellow.
Safe rule for edible mushrooms: odor = (almond $\lor$ anise $\lor$ none) $\land$ spore-print-color = $\lnot$green; 48 errors, 99.41% correct.
This is why animals have such a good sense of smell! What does it tell us about odor receptors?

24 Mushroom rules
To eat or not to eat, this is the question! Not any more...
A mushroom is poisonous if:
R1) odor = $\lnot$(almond $\lor$ anise $\lor$ none): 120 errors, 98.52%
R2) spore-print-color = green: 48 errors, 99.41%
R3) odor = none $\land$ stalk-surface-below-ring = scaly $\land$ stalk-color-above-ring = $\lnot$brown: 8 errors, 99.90%
R4) habitat = leaves $\land$ cap-color = white: no errors!
R1 + R2 are quite stable, found even with 10% of the data; R3 and R4 may be replaced by other rules, e.g.:
R'3) gill-size = narrow $\land$ stalk-surface-above-ring = (silky $\lor$ scaly)
R'4) gill-size = narrow $\land$ population = clustered
Only 5 of 22 attributes are used! Simplest possible rules? 100% in CV tests: the structure of this data is completely clear.
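Rules R1 to R4 translate directly into a Python predicate over a record. The record is a dict keyed by the attribute names used on the slide, and the example records in the test are hypothetical. The helper `accuracy` reproduces the quoted percentages from the error counts, assuming each count refers to all 8124 cases and that the counts are cumulative (R1, then R1 + R2, and so on).

```python
def poisonous(m):
    """Rules R1-R4 from the slide; m maps attribute names to values."""
    r1 = m["odor"] not in ("almond", "anise", "none")
    r2 = m["spore-print-color"] == "green"
    r3 = (m["odor"] == "none"
          and m["stalk-surface-below-ring"] == "scaly"
          and m["stalk-color-above-ring"] != "brown")
    r4 = m["habitat"] == "leaves" and m["cap-color"] == "white"
    return r1 or r2 or r3 or r4


def accuracy(errors, n=8124):
    """Fraction of correctly classified cases for a given error count."""
    return 1 - errors / n
```

For instance, `round(100 * accuracy(48), 2)` gives 99.41, matching R1 + R2.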

25 Recurrence of breast cancer
Institute of Oncology, University Medical Center, Ljubljana.
286 cases: 201 no-recurrence cases (70.3%), 85 recurrence cases (29.7%).
9 symbolic features: age (9 bins), tumor-size (12 bins), nodes involved (13 bins), degree-malignant (1, 2, 3), area, radiation, menopause, node-caps.
Example record: no-recurrence-events,40-49,premeno,25-29,0-2,?,2,left,right_low,yes
Many systems have been tried, with 65-78% accuracy reported.
Single rule: IF (nodes-involved $\notin [0, 2] \land$ degree-malignant = 3) THEN recurrence ELSE no-recurrence
This gives 77% accuracy; there is only trivial knowledge in the data: highly malignant cancer involving many nodes is likely to strike back.
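Here is the single rule in Python, applied to records in the format of the sample line. The field order (class label first, then age, menopause, tumor-size, inv-nodes, node-caps, deg-malig, breast, breast-quad, irradiat) is assumed from the UCI distribution of this data set. The check "not in [0,2]" assumes bins are disjoint ranges such as "0-2", "3-5".

```python
# Assumed field order of the UCI breast-cancer file (class label first).
FIELDS = ["class", "age", "menopause", "tumor-size", "inv-nodes",
          "node-caps", "deg-malig", "breast", "breast-quad", "irradiat"]


def parse(line):
    """Split one comma-separated record into a dict keyed by field name."""
    return dict(zip(FIELDS, line.strip().split(",")))


def predict(record):
    """IF nodes-involved not in [0,2] AND degree-malignant = 3 THEN recurrence."""
    low_end = int(record["inv-nodes"].split("-")[0])   # bins such as "0-2", "3-5"
    many_nodes = low_end > 2
    if many_nodes and record["deg-malig"] == "3":
        return "recurrence-events"
    return "no-recurrence-events"
```

The rule's 77% should be judged against the 70.3% obtained by always predicting the majority class (201 of 286 cases).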

26 Breast cancer diagnosis
Data obtained from the University of Wisconsin Hospitals, Madison, collected by Dr. W.H. Wolberg.
699 cases, 9 features quantized from 1 to 10: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, mitoses.
Task: distinguish benign from malignant cases.
Simplest rules from MLP2LN, large regularization:
IF $f_2 \ge 7 \lor f_7 \ge 6$ THEN malignant ELSE benign
($f_2$: uniformity of cell size; $f_7$: bland chromatin)
Overall accuracy (including the ELSE condition) is 94.9%.

27 Breast cancer rules
Using lower regularization, hierarchical sets of rules with increasing accuracy are created. Optimized set of rules:
R1: IF $f_2 < 6 \land f_4 < 3 \land f_8 < 8$ THEN malignant (99.8%)
R2: $f_2 < 9 \land f_5 < 4 \land f_7 < 2 \land f_8 < 5$ (100%)
R3: $f_2 < 10 \land f_4 < 4 \land f_5 < 4 \land f_7 < 3$ (100%)
R4: $f_2 < 7 \land f_4 < 9 \land f_5 < 3 \land f_7 \in [4, 9] \land f_8 < 4$ (100%)
R5: $f_2 \in [3, 4] \land f_4 < 9 \land f_5 < 10 \land f_7 < 6 \land f_8 < 8$ (99.8%)
ELSE benign
6 errors, overall reclassification accuracy 99.0%. R1 and R5 misclassify the same single benign vector.
A 100% reliable set of rules rejects 51 cases (7.3%).
Other solutions: 3 rules from the SSV decision tree.

28 Breast cancer: comparison
Results from 10-fold (stratified) crossvalidation, rule-based and other classifiers.
Method | 10xCV accuracy %
SSV, 3 crisp rules | 96.3 ± 0.2
FSM, 14 Gaussians | 96.4 ± 0.2
IncNet | 97.1
k-NN, k=3, Manhattan | 97.0 ± 0.1
Fisher LDA | 96.8
MLP + backprop | 96.7
LVQ | 96.6
Bayes (pairwise dep.) | 96.6
Naive Bayes | 96.4
LDA | 96.0
LFC, ASI, ASR dec. trees | 94.4-95.6
CART (dec. tree) | 93.5
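Stratified 10-fold crossvalidation keeps the class proportions roughly equal in every fold. This Python sketch shows one way to assign folds, by dealing the shuffled cases of each class round-robin, and to average the fold accuracies. `train_and_score` is a hypothetical callback supplied by the user. The slide does not say how its ± values were computed, so the standard deviation over folds returned here is only one possible choice.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each case to one of k folds, keeping class proportions per fold."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            fold_of[idx] = (offset + pos) % k   # deal cases round-robin
        offset += len(members)
    return fold_of


def cross_validate(X, labels, train_and_score, k=10, seed=0):
    """Mean and standard deviation of the k test-fold accuracies."""
    fold_of = stratified_folds(labels, k, seed)
    scores = []
    for f in range(k):
        train = [i for i in range(len(X)) if fold_of[i] != f]
        test = [i for i in range(len(X)) if fold_of[i] == f]
        scores.append(train_and_score(X, labels, train, test))
    mean = sum(scores) / k
    std = (sum((s - mean) ** 2 for s in scores) / (k - 1)) ** 0.5
    return mean, std
```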

29 Melanoma skin cancer
- Collected in the Outpatient Center of Dermatology in Rzeszów, Poland.
- Four types of melanoma: benign, blue, suspicious, or malignant.
- 250 cases, with an almost equal class distribution.
- Each record in the database has 13 attributes: asymmetry, border, color (6), diversity (5).
- TDS (Total Dermatoscopy Score): a single index.
- Only 26 new test cases.
- Goal: understand the data, find a simple description, and make a hardware scanner for preliminary diagnosis.
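The 13 attributes (asymmetry, border, 6 colors, 5 "diversity" structures) match the inputs of the ABCD rule of dermatoscopy. This Python sketch computes a TDS with the standard ABCD weights (1.3, 0.1, 0.5, 0.5) and the commonly quoted cut-offs 4.75 and 5.45. That this data set's TDS was computed this way is our assumption. The score also does not separate the "blue" class.

```python
# ABCD rule of dermatoscopy weights; assumed to be how the TDS attribute is formed.
def total_dermatoscopy_score(asymmetry, border, colors, structures):
    """asymmetry: 0-2 axes; border: 0-8 segments;
    colors: 6 presence flags; structures: 5 presence flags ('diversity')."""
    if len(colors) != 6 or len(structures) != 5:
        raise ValueError("expected 6 color flags and 5 structure flags")
    return (1.3 * asymmetry + 0.1 * border
            + 0.5 * sum(map(bool, colors)) + 0.5 * sum(map(bool, structures)))


def tds_category(tds):
    """Commonly quoted ABCD cut-offs (assumption)."""
    if tds < 4.75:
        return "benign"
    if tds <= 5.45:
        return "suspicious"
    return "malignant"
```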

30 Melanoma results
Method | Rules | Training % | Test %
MLP2LN, crisp rules | 4 | 98.0 (all) | 100
SSV Tree, crisp rules | 4 | 97.5 ± 0.3 | 100
FSM, rectangular f. | 7 | 95.5 ± 1.0 | 100
kNN + prototype selection | 13 | 97.5 ± 0.0 | 100
FSM, Gaussian f. | 15 | 93.7 ± 1.0 | 95 ± 3.6
kNN, k=1, Manhattan, 2 features | 250 | 97.4 ± 0.3 | 100
LERS, rough rules | 21 | -- | 96.2

31 Antibiotic activity of pyrimidine compounds
Pyrimidines: which compound has stronger antibiotic activity?
Common template, with substitutions added at 3 positions: R3, R4 and R5.
27 features are taken into account: polarity, size, hydrogen-bond donor or acceptor, pi-donor or acceptor, polarizability, sigma effect.
Pairs of chemicals (54 features) are compared. Two classes: the first compound has higher activity, or vice versa.
2788 cases, 5-fold crossvalidation tests.
The mean Spearman's rank correlation coefficient is used.
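This Python sketch shows the two ingredients named above. First, pairwise examples are built by concatenating the feature vectors of two compounds (27 + 27 = 54 features) and labelling whether the first is more active. Second, Spearman's rank correlation $r_s = 1 - 6\sum_i d_i^2 / (n(n^2 - 1))$ compares a predicted ordering with the true one. Skipping pairs with equal activity and assuming no tied ranks are our simplifications.

```python
def make_pairs(features, activity):
    """Pairwise examples: concatenated feature vectors, label 1 if the
    first compound is more active; pairs of equal activity are skipped."""
    pairs = []
    for i in range(len(features)):
        for j in range(len(features)):
            if i != j and activity[i] != activity[j]:
                pairs.append((features[i] + features[j], int(activity[i] > activity[j])))
    return pairs


def ranks(values):
    """Rank positions 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def spearman(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), d_i = rank difference."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

The coefficient ranges from -1 (reversed order) to +1 (identical order).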

32 Psychometrics
MMPI (Minnesota Multiphasic Personality Inventory) test (v. 1). Forms were scanned, or a computerized version of the test is used.
1. Raw data: 550 questions, e.g. 'I am getting tired quickly': Yes / Don't know / No.
2. Results are combined into 10 clinical scales and 4 validity scales using fixed coefficients.
3. Each scale measures tendencies towards hypochondria, schizophrenia, hypomania, psychopathic deviations, symptoms of depression, hysteria, paranoia, social introversion etc., but there is no simple correlation between single values and the final diagnosis.
4. Results are displayed in the form of a histogram, called a 'psychogram'. Interpretation depends on the experience and skill of an expert and takes into account correlations between peaks.
Goal: an expert system providing evaluation and interpretation of MMPI tests at an expert level. Experts agree only about 70% of the time; alternative diagnoses and personality changes over time are important.
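Steps 1 to 4 can be illustrated in Python: answers to individual questions are combined into scale scores, which are then converted into the profile shown in the psychogram. The scoring keys and norms here are hypothetical placeholders. The real MMPI keys and "fixed coefficients" are not reproduced. The conversion to T-scores (mean 50, SD 10) is the usual MMPI convention.

```python
# Hypothetical scoring keys: scale -> {item number: answer that scores a point}.
KEYS = {
    "scale_A": {1: "yes", 2: "no", 5: "yes"},
    "scale_B": {2: "yes", 3: "yes", 4: "no"},
}


def raw_scale_scores(answers, keys=KEYS):
    """answers: item number -> 'yes' | 'no' | "don't know".
    A raw score counts keyed answers; "don't know" never scores."""
    return {scale: sum(1 for item, keyed in key.items() if answers.get(item) == keyed)
            for scale, key in keys.items()}


def psychogram(raw, means, sds):
    """Convert raw scores to T-scores (mean 50, SD 10) using the given norms."""
    return {s: 50 + 10 * (raw[s] - means[s]) / sds[s] for s in raw}
```

The expert system then works on the 14 scale values rather than on the individual answers.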

33 Psychometric data
1600 cases for women, a similar number for men.
27 classes: norm, psychopathia, schizophrenia, paranoia, neurosis, mania, simulation, alcoholism, drug addiction, criminal tendencies, abnormal behavior due to...
Extraction of rules: 14 scales; define linguistic variables and use FSM, MLP2LN and SSV, giving about 2-3 rules per class.
Method | Data | N. rules | Accuracy % | + $G_x$ %
C4.5 | ♀ | 55 | 93.0 | 93.7
C4.5 | ♂ | 61 | 92.5 | 93.1
FSM | ♀ | 69 | 95.4 | 97.6
FSM | ♂ | 98 | 95.9 | 96.9
10-fold CV gives 82-85% with FSM and 79-84% with C4.5. An input uncertainty of around 1.5% improves the FSM results to 90-92%.

34 Results
Probabilities for different classes; for greater uncertainties more classes are predicted.
Fitting the rules to the conditions: typically 3-5 conditions per rule; the Gaussian distributions around measured values that fall into the rule interval are shown in green.
Verbal interpretation of each case, rule- and scale-dependent.

35 Visualization
Probability of classes versus input uncertainty.
Detailed input probabilities around the measured values vs. change in a single scale; changes over time define the 'patient's trajectory'.
Interactive multidimensional scaling: zooming in on the new case to inspect its similarity to other cases.

36 Bioinformatics example
Evaluation of the similarity of E. coli gene sequences. Promoters: red; non-promoters: green. Left: standard similarity measure; right: new measure.

37 Summary
Understanding data: extracting rules, prototypes, visualizing.
Computational intelligence methods (neural, decision trees, similarity-based and others) help to understand the data. We are slowly getting there.
All this and more is included in GhostMiner, data mining software (in collaboration with Fujitsu), soon to be finished :)
Small is beautiful: simple is the best!
Simplest possible, but not simpler: regularization of models; accurate, but not too accurate: handling of uncertainty; high confidence, but not paranoid: rejecting some cases.
Challenges: hierarchical systems, discovery of theories rather than models, integration with image/signal analysis, reasoning in complex domains, applications in bioinformatics...

38 References
IEEE Transactions on Neural Networks 12 (2001) 277-306 (March issue); paper archive.
