Introduction to Neural Networks in Medical Diagnosis Włodzisław Duch PhD, DSc Dept. of Informatics, Nicholas Copernicus University, Poland.


What is it about?
- Data is precious! But also overwhelming...
- Statistical methods are important, but new techniques may frequently be more accurate and give more insight into the data.
- Data analysis requires intelligence.
- Inspirations come from many sources, including biology: artificial neural networks, evolutionary computing, immune systems...

Computational Intelligence
[Diagram: Data + Knowledge linked to the fields that make up Computational Intelligence: Artificial Intelligence, Expert systems, Fuzzy logic, Pattern Recognition, Machine learning, Probabilistic methods, Multivariate statistics, Visualization, Evolutionary algorithms, Neural networks.]

What do these methods do?
- Provide non-parametric models of data.
- Classify new data into pre-defined categories, supporting diagnosis and prognosis.
- Discover new categories.
- Help to understand the data by creating fuzzy or crisp logical rules.
- Help to visualize multi-dimensional relationships among data samples.
- Help to model real neural networks!

Neural networks
- Inspired by neurobiology: simple elements cooperate by changing internal parameters.
- Large field, dozens of different models, over 500 papers on NN in medicine each year.
- Supervised networks: heteroassociative mapping X => Y, symptoms => diseases, universal approximators.
- Unsupervised networks: clusterization, competitive learning, autoassociation.
- Reinforcement learning: modeling behavior, playing games, sequential data.

Supervised learning
Compare the desired with the achieved outputs … you can’t always get what you want.

Unsupervised learning
Find interesting structures in data.

Reinforcement learning
Reward comes after the sequence of actions.

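Real and artificial neurons
[Figure: a real neuron (synapses, axon, dendrites) next to an artificial network, where synapses correspond to weights, nodes are artificial neurons, and signals pass between them.]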

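Neural network for MI diagnosis
[Figure: network with inputs Sex, Age, Smoking, ECG: ST Elevation, Pain Intensity, Pain Duration; input weights feed the hidden nodes and output weights feed a single output, Myocardial Infarction ~ p(MI|X). The example input and output values shown in the figure were lost in extraction.]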

MI network function
- Training: setting the values of weights and thresholds; efficient algorithms exist.
- Effect: a non-linear regression function.
- Such networks are universal approximators: they may learn any mapping X => Y.
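The non-linear regression function computed by such a network can be sketched as a forward pass through one hidden layer. This is only an illustration under stated assumptions: the slide does not give the architecture or the trained parameters, so the two hidden nodes, all weights and thresholds, and the example patient below are hypothetical.

```python
import math

def sigmoid(a):
    """Logistic squashing function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def mlp_output(x, hidden_weights, hidden_thresholds, output_weights, output_threshold):
    """One-hidden-layer network: weighted sums minus thresholds, passed through sigmoids."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) - theta)
        for row, theta in zip(hidden_weights, hidden_thresholds)
    ]
    activation = sum(w * h for w, h in zip(output_weights, hidden)) - output_threshold
    return sigmoid(activation)

# Hypothetical, untrained parameters for 6 inputs and 2 hidden nodes.
HIDDEN_WEIGHTS = [
    [0.5, 0.02, 0.8, 1.5, 0.3, 0.1],
    [-0.2, 0.01, 0.4, 2.0, 0.6, 0.2],
]
HIDDEN_THRESHOLDS = [2.0, 3.0]
OUTPUT_WEIGHTS = [1.2, 1.8]
OUTPUT_THRESHOLD = 1.0

# Made-up patient: sex, age, smoking, ST elevation, pain intensity, pain duration.
patient = [1, 65, 1, 1, 3, 5]
p_mi = mlp_output(patient, HIDDEN_WEIGHTS, HIDDEN_THRESHOLDS,
                  OUTPUT_WEIGHTS, OUTPUT_THRESHOLD)
print(f"p(MI|X) is approximately {p_mi:.2f}")
```

Training would adjust the weights and thresholds so that this output matches the known diagnoses as closely as possible; in practice the inputs would also be rescaled to comparable ranges first.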

Knowledge from networks
- Simplify networks: force most weights to 0, quantize remaining parameters, be constructive!
- Regularization: a mathematical technique improving the predictive abilities of the network.
- Result: MLP2LN neural networks that are equivalent to logical rules.
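One way to push most weights to 0 and quantize the rest is to add a penalty term to the training error. The slide does not give a formula, so the form below is an assumption: a weight-decay term plus a term whose minima lie at 0, +1 and -1. The function name and the two lambda parameters are hypothetical.

```python
def simplification_penalty(weights, lambda_zero=0.01, lambda_quant=0.01):
    """Penalty added to the training error (assumed form, not taken from the slides).

    The first term grows with every non-zero weight, so it favors weights near 0.
    The second term vanishes only at 0, +1 and -1, so it favors quantized weights.
    """
    decay = sum(w ** 2 for w in weights)
    quant = sum((w ** 2) * ((w - 1) ** 2) * ((w + 1) ** 2) for w in weights)
    return 0.5 * lambda_zero * decay + 0.5 * lambda_quant * quant

# Already-quantized weights pay only the decay term; others pay both.
print(simplification_penalty([0.0, 1.0, -1.0]))
print(simplification_penalty([0.5, 1.7, -0.3]))
```

A network whose surviving weights are all +1 or -1 can be read off as a set of logical conditions, which is the sense in which the simplified network becomes equivalent to rules.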

Recurrence of breast cancer
Data from: Institute of Oncology, University Medical Center, Ljubljana, Yugoslavia.
- 286 cases: 201 no recurrence (70.3%), 85 recurrence (29.7%).
- Sample record: no-recurrence-events, 40-49, premeno, 25-29, 0-2, ?, 2, left, right_low, yes
- 9 nominal features: age (9 bins), menopause, tumor-size (12 bins), nodes involved (13 bins), node-caps, degree-malignant (1, 2, 3), breast, breast quad, radiation.

Recurrence of breast cancer
Data from: Institute of Oncology, University Medical Center, Ljubljana, Yugoslavia.
- Many systems used, 65-78% accuracy reported.
- Single rule: IF (nodes-involved ∉ [0,2] ∧ degree-malignant = 3) THEN recurrence, ELSE no-recurrence.
- 76.2% accuracy; only trivial knowledge in the data: highly malignant breast cancer involving many nodes is likely to strike back.
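The single rule is small enough to write out directly. The sketch below assumes that the nodes-involved feature arrives as a bin label such as "0-2" or "3-5", as in the sample record; the function names are hypothetical.

```python
def nodes_lower_bound(inv_nodes_bin):
    """Return the lower edge of a bin label such as '0-2' or '3-5'."""
    return int(inv_nodes_bin.split("-")[0])

def predict_recurrence(inv_nodes_bin, degree_malignant):
    """Apply the single rule: nodes involved outside [0, 2] AND malignancy degree 3."""
    many_nodes = nodes_lower_bound(inv_nodes_bin) > 2
    if many_nodes and degree_malignant == 3:
        return "recurrence"
    return "no-recurrence"

# The sample record has nodes involved "0-2" and degree-malignant 2.
print(predict_recurrence("0-2", 2))
# A made-up case with many involved nodes and the highest malignancy degree.
print(predict_recurrence("6-8", 3))
```

For the sample record the rule agrees with the recorded class, no-recurrence-events.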

Recurrence - comparison.

Method                    10xCV accuracy (%)
MLP2LN, 1 rule            76.2
SSV DT, stable rules      75.7 ± 1.0
k-NN, k=10, Canberra      74.1 ± 1.2
MLP+backprop              [mean lost in extraction] ± 9.4 (Zarndt)
CART DT                   71.4 ± 5.0 (Zarndt)
FSM, Gaussian nodes       71.7 ± 6.8
Naive Bayes               69.3 ± 10.0 (Zarndt)
Other decision trees      < 70.0

Breast cancer diagnosis.
Data from University of Wisconsin Hospital, Madison, collected by Dr. W.H. Wolberg.
- 699 cases, 9 features quantized from 1 to 10: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, mitoses.
- Task: distinguish benign from malignant cases.

Breast cancer rules.
Data from University of Wisconsin Hospital, Madison, collected by Dr. W.H. Wolberg.
- Simplest rule from MLP2LN, large regularization: IF uniformity of cell size [comparison operator lost in extraction] 3 THEN benign ELSE malignant. Sensitivity = 0.97, Specificity = 0.85.
- More complex NN solutions, from 10CV estimate: Sensitivity = 0.98, Specificity = 0.94.
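Sensitivity and specificity are both simple ratios taken from a confusion matrix. The sketch below assumes that malignant is treated as the positive class, which the slide does not state. The counts are made up, chosen only so that the ratios match the values quoted for the simplest rule; they are not the real counts from the Wisconsin data.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive cases that the classifier flags as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of truly negative cases that the classifier flags as negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts per 100 positive and 100 negative cases.
sens = sensitivity(true_pos=97, false_neg=3)
spec = specificity(true_neg=85, false_pos=15)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both numbers matters here because overall accuracy alone hides how the errors are split between missed malignant cases and false alarms.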

Breast cancer comparison.

Method                        10xCV accuracy (%)
k-NN, k=3, Manh               97.0 ± 2.1 (GM)
FSM, neurofuzzy               96.9 ± 1.4 (GM)
Fisher LDA                    96.8
MLP+backprop                  [value lost in extraction] (Ster, Dobnikar)
LVQ                           96.6 (Ster, Dobnikar)
IncNet (neural)               96.4 ± 2.1 (GM)
Naive Bayes                   96.4
SSV DT, 3 crisp rules         96.0 ± 2.9 (GM)
LDA (linear discriminant)     96.0
Various decision trees        [value lost in extraction]

Melanoma skin cancer
- Collected in the Outpatient Center of Dermatology in Rzeszów, Poland.
- Four types of Melanoma: benign, blue, suspicious, or malignant.
- 250 cases, with almost equal class distribution.
- Each record in the database has 13 attributes: asymmetry, border, color (6), diversity (5).
- TDS (Total Dermatoscopy Score): a single index.
- Goal: hardware scanner for preliminary diagnosis.

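Melanoma results
Columns: Method, Rules, Training %, Test %. Most numeric values were lost in extraction; only the fragments below survive.
- MLP2LN, crisp rules: "all", 100 [other values lost]
- SSV Tree, crisp rules: 4 rules, 97.5 ± [lost]
- FSM, rectangular f.: [values lost]
- knn + prototype selection: [values lost]
- FSM, Gaussian f.: [lost] ± 1.0, 95 ± 3.6
- knn k=1, Manh, 2 features: [values lost]
- LERS, rough rules: [values lost]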

Antibiotic activity of pyrimidine compounds.
Pyrimidines: which compound has stronger antibiotic activity?
- Common template, substitutions added at 3 positions: R3, R4 and R5.
- 27 features taken into account: polarity, size, hydrogen-bond donor or acceptor, pi-donor or acceptor, polarizability, sigma effect.
- Pairs of chemicals (54 features) are compared: which one has higher activity?
- 2788 cases, 5-fold crossvalidation tests.

Antibiotic activity - results.
Pyrimidines: which compound has stronger antibiotic activity?
Mean Spearman's rank correlation coefficient used: ⟨r_s⟩

Method                    Rank correlation
FSM, 41 Gaussian rules    0.77 ± 0.03
Golem (ILP)               0.68
Linear regression         0.65
CART (decision tree)      0.50
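Spearman's rank correlation is the Pearson correlation computed on ranks rather than on raw values, so it rewards getting the ordering of compounds right. A minimal sketch follows, assuming that a predicted ordering is compared with an observed one; the function names and the example numbers are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(predicted, observed):
    """Spearman's r_s: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(observed))

# Made-up activities for five compounds: predicted versus observed.
r_s = spearman([0.2, 0.9, 0.4, 0.7, 0.1], [1.0, 4.5, 3.0, 2.5, 0.5])
print(f"r_s = {r_s:.2f}")
```

The coefficient ranges from -1 (reversed ordering) to +1 (identical ordering); averaging it over the crossvalidation folds gives a mean value of the kind reported in the table.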

Thyroid screening.
Garavan Institute, Sydney, Australia.
- Inputs: 15 binary, 6 continuous.
- Training and validation set sizes: [numbers lost in extraction].
- Determine important clinical factors.
- Calculate the probability of each diagnosis.
[Figure: network with inputs (clinical findings: age, sex, …, TSH, T4U, T3, TT4, TBG), hidden units, and outputs for the final diagnoses: normal, hyperthyroid, hypothyroid.]

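Thyroid: some results.
Accuracy of diagnoses obtained with different systems. Columns: Method, Rules/Features, Training %, Test %. The accuracy values and feature counts were lost in extraction.
- MLP2LN optimized: 4 rules / [lost]
- CART/SSV Decision Trees: 3 rules / [lost]
- Best Backprop MLP: - / [lost]
- Naïve Bayes: - / [lost]
- k-nearest neighbors: - / [lost]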

Psychometry
- MMPI (Minnesota Multiphasic Personality Inventory) psychometric test.
- Printed forms are scanned, or a computerized version of the test is used.
- Raw data: 550 questions, e.g. "I am getting tired quickly": Yes - Don’t know - No.
- Results are combined into 10 clinical scales and 4 validity scales using fixed coefficients.
- Each scale measures tendencies towards hypochondria, schizophrenia, psychopathic deviations, depression, hysteria, paranoia etc.

Scanned form

Computer input

Scales

Psychometry
- There is no simple correlation between single values and the final diagnosis.
- Results are displayed in the form of a histogram, called ‘a psychogram’. Interpretation depends on the experience and skill of an expert and takes into account correlations between peaks.
- Goal: an expert system providing evaluation and interpretation of MMPI tests at an expert level.
- Problem: experts agree only 70% of the time; alternative diagnoses and personality changes over time are important.

Psychogram

Psychometric data
- 1600 cases for women, same number for men.
- 27 classes: norm, psychopathic, schizophrenia, paranoia, neurosis, mania, simulation, alcoholism, drug addiction, criminal tendencies, abnormal behavior due to...
- Extraction of logical rules: 14 scales = features.
- Define linguistic variables and use FSM, MLP2LN, SSV, giving about 2-3 rules/class.

Psychometric data
- 10-CV accuracy for FSM is 82-85%, for C4.5 it is 79-84%.
- Input uncertainty +G_x around 1.5% (best ROC) improves FSM results to 90-92%.
[Table: columns Method, Data, N. rules, Accuracy, +G_x %; rows for C4.5 (♀, ♂) and FSM (♀, ♂). The numeric values were lost in extraction.]

Psychometric Expert
- Probabilities for different classes. For greater uncertainties more classes are predicted.
- Fitting the rules to the conditions: typically 3-5 conditions per rule; Gaussian distributions around measured values that fall into the rule interval are shown in green.
- Verbal interpretation of each case, rule and scale dependent.
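The idea of a Gaussian distribution around a measured value falling into a rule interval can be sketched numerically: the probability that a condition holds is the Gaussian mass inside the interval. The slides do not specify how conditions are combined, so multiplying them (which treats the scale uncertainties as independent) is an assumption here, as are the function names and the example numbers.

```python
import math

def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def condition_probability(measured, sigma, low, high):
    """Probability mass of N(measured, sigma) inside the rule interval [low, high]."""
    return gaussian_cdf(high, measured, sigma) - gaussian_cdf(low, measured, sigma)

def rule_probability(measurements, sigma, intervals):
    """Product over conditions, assuming independent uncertainties on each scale."""
    p = 1.0
    for value, (low, high) in zip(measurements, intervals):
        p *= condition_probability(value, sigma, low, high)
    return p

# Made-up rule with two conditions on two scales, and a made-up measurement.
intervals = [(40.0, 60.0), (55.0, 80.0)]
for sigma in (0.5, 2.0, 5.0):
    p = rule_probability([58.0, 57.0], sigma, intervals)
    print(f"sigma={sigma}: rule probability {p:.2f}")
```

With a small uncertainty a value inside the interval satisfies the rule almost surely; as the uncertainty grows, probability leaks out of the interval, which is consistent with more classes being predicted for greater uncertainties.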

MMPI probabilities

MMPI rules

MMPI verbal comments

Visualization
- Probability of classes versus input uncertainty.
- Detailed input probabilities around the measured values vs. change in a single scale; changes over time define the ‘patient's trajectory’.
- Interactive multidimensional scaling: zooming in on the new case to inspect its similarity to other cases.

Class probability/uncertainty

Class probability/feature

MDS visualization

Summary
- Neural networks and other computational intelligence methods are useful additions to the multivariate statistical tools.
- They support diagnosis, predictions, and data understanding: extracting rules, prototypes.
- FDA has approved many devices that use ANNs: Oxford Instruments Ltd EEG analyzer, Cardionetics (UK) ECG analyzer, PAPNET (NSI) for analysis of Pap smears …

Challenges
- Discovery of theories rather than data models.
- Integration with image/signal analysis.
- Integration with reasoning in complex domains.
- Combining expert systems with neural networks.
- Fully automatic universal data analysis systems: press the button and wait for the truth …
We are slowly getting there. More and more computational intelligence tools (including our own) are available.