Presentation on theme: "W. Art Chaovalitwongse Industrial & Systems Engineering"— Presentation transcript:
1 Medical Diagnosis Decision-Support System: Optimizing Pattern Recognition of Medical Data
W. Art Chaovalitwongse, Industrial & Systems Engineering, Rutgers University
Center for Discrete Mathematics & Theoretical Computer Science (DIMACS)
Center for Advanced Infrastructure & Transportation (CAIT)
Center for Supply Chain Management, Rutgers Business School
This work is supported in part by research grants from NSF CAREER CCF and the Rutgers Computing Coordination Council (CCC).
2 Outline
Introduction
Classification: Model-Based versus Pattern-Based
Medical Diagnosis
Pattern-Based Classification Framework
Application in Epilepsy: Seizure (Event) Prediction; Identifying Epilepsy and Non-Epilepsy Patients
Application to Other Diagnosis Data
Conclusion and Envisioned Outcome
Notes: Here is the outline of this talk. The focus will be on epilepsy and brain disorders. First I will try to convince the audience why this problem is important and why these patients need our help. Then I will identify the research goals and discuss how to acquire and process data from the brain, specifically to predict seizures. The second research challenge is how to use optimization and data-mining techniques to recognize or classify normal and abnormal brain data; this framework can be applied to other medical data or to data in other real-life problems.
3 Pattern Recognition: Classification
Supervised learning: a class (category) label for each pattern in the training set is provided.
Positive class or negative class?
4 Model-Based Classification
Linear Discriminant Functions
Support Vector Machines
Neural Networks
5 Support Vector Machine
A and B are data matrices of normal and pre-seizure samples, respectively
e is the vector of ones
w is a vector of real numbers (the normal to the separating plane)
γ is a scalar (the plane's location relative to the origin)
u, v are the misclassification errors
m = number of samples in class 1; n = number of samples in class 2
Notes: Bradley, Fung and Mangasarian revamped this idea with a robust optimization model; it is very fast and scalable.
Mangasarian, Operations Research (1965); Bradley et al., INFORMS Journal on Computing (1999)
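The cited papers solve this formulation as a linear program with an optimization solver. Purely as an illustrative stand-in (the toy data, learning rate, and regularization constant below are my own choices, not from the talk), a soft-margin linear separator can also be fit by subgradient descent on the hinge loss:

```python
# Toy linearly separable data: class +1 near (2, 2), class -1 near (-2, -2)
X = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (-2.0, -2.0), (-1.5, -2.5), (-2.5, -1.5)]
y = [1, 1, 1, -1, -1, -1]

def train_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the soft-margin hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), yi in zip(X, y):
            margin = yi * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point violates the margin: move the plane
                w[0] += lr * (yi * x1 - lam * w[0])
                w[1] += lr * (yi * x2 - lam * w[1])
                b += lr * yi
            else:           # otherwise only shrink w (regularization)
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

w, b = train_svm(X, y)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1 for x1, x2 in X]
```

On this toy set the learned plane classifies every training point correctly; the LP formulation on the slide reaches the same goal by minimizing the misclassification errors u, v directly.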
6 Pattern-Based Classification: Nearest Neighbor Classifiers
Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.
Training records → test record → compute distance → choose k of the "nearest" records
7 Traditional Nearest Neighbor
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
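A minimal sketch of the k-nearest-neighbor rule described above (the toy 2-D records and labels are hypothetical, chosen only to illustrate the vote):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (point, label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"),
         ((5, 5), "pre-seizure"), ((5, 6), "pre-seizure"), ((6, 5), "pre-seizure")]
print(knn_predict(train, (1, 1)))  # the three nearest records are all "normal"
```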
8 Drawbacks
Sensitive to noisy features → optimizing feature selection: with n features there are 2^n combinations, a combinatorial optimization problem.
Unbalanced data: biased toward the class (category) with more samples.
Remedies: distance-weighted nearest neighbors; or pick the k nearest neighbors from each class (category) to the training sample and compare the average distances.
9 Multidimensional Time Series Classification in Medical Data
Positive versus Negative; Responsive versus Unresponsive; Normal versus Abnormal
Multisensor medical signals (e.g., EEG, ECG, EMG)
Multivariate analysis is ideal but computationally impossible here.
Notes: Physicians commonly use baseline data as a reference for diagnosis, which naturally lends itself to nearest neighbor classification. For multidimensional time series, multivariate analysis would be ideal but is computationally impossible in our application. In our work we use univariate analysis, performing classification on one electrode at a time, and then use ensemble classification to make the final decision.
10 Ensemble Classification for Multidimensional Time Series Data
Use each electrode as a base classifier; each base classifier makes its own decision.
Multiple decision makers: how to combine them? Vote on the final decision, or average the prediction scores.
Example: suppose there are 25 base classifiers, each with error rate ε = 0.35, and the classifiers are independent. The probability that the ensemble classifier makes a wrong prediction (voting) is the probability that 13 or more base classifiers are wrong simultaneously.
Notes: Most ensemble methods deal with how to sample the data (bagging, bootstrapping, boosting); here we use voting and averaging/accumulating prediction scores. This example shows why we use ensemble classification.
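The ensemble error quoted above is the upper tail of a binomial distribution; a short check of the slide's example (25 independent base classifiers, error rate 0.35):

```python
from math import comb

def ensemble_error(n=25, eps=0.35):
    """Probability that a majority of n independent base classifiers,
    each with individual error rate eps, are wrong simultaneously."""
    k = n // 2 + 1  # number of wrong votes needed for a wrong majority
    return sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(k, n + 1))

print(round(ensemble_error(), 3))  # ≈ 0.06, far below the base error rate 0.35
```

This is why voting helps: the majority is wrong far less often than any single electrode, provided the base classifiers' errors are roughly independent.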
11 Modified K-Nearest Neighbor for MDTS
Abnormal versus Normal; K = 3; distance D(X, Y)
Time series distances: (1) Euclidean, (2) T-statistical, (3) Dynamic Time Warping
12 Dynamic Time Warping (DTW)
The minimum-distance warp path is the optimal alignment of two time series. The distance of a warp path W is Dist(W) = Σ_{k=1..K} Dist(w_k), where Dist(w_k) is the distance between the two data-point indices (one from L_i, one from L_j) in the kth element of the warp path.
Dynamic programming: the optimal warping distance at cell (i, j) is D(i, j) = Dist(i, j) + min{D(i−1, j), D(i, j−1), D(i−1, j−1)}.
There is an exponential number of warp paths, so we need to put some constraint on the warp path.
Figure (B) is from Keogh and Pazzani, SDM (2001).
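The dynamic program above can be sketched directly. This minimal version uses absolute difference as the pointwise distance and omits the warp-path constraint that the slide notes would be added in practice:

```python
def dtw_distance(a, b):
    """Dynamic-programming DTW: D[i][j] = d(a[i], b[j]) plus the cheapest of
    the three neighboring sub-path costs (match, insertion, deletion)."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# Identical shapes shifted in time align almost perfectly:
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # 0.0
```

The table has O(nm) cells, which is why unconstrained DTW is quadratic; a band constraint on the warp path (as mentioned above) reduces the cells that must be filled.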
14 Support Feature Machine
Given an unlabeled sample A, we calculate the average statistical distances of A↔Normal and A↔Abnormal samples in the baseline (training) dataset per electrode (channel).
Statistical distances: Euclidean, T-statistics, Dynamic Time Warping
Combining all electrodes, A is classified to the group (normal or abnormal) that yields the minimum average statistical distance, or the maximum number of votes.
Can we optimize the selection of a subset of electrodes that maximizes the number of correctly classified samples?
15 SFM: Averaging and Voting
Two distances are calculated for each sample at each electrode:
Intra-class: average distance from the sample to all other samples in the same class at Electrode j
Inter-class: average distance from the sample to all other samples in the different class at Electrode j
Averaging: if, averaged over the selected electrodes, Sample i's intra-class distance < inter-class distance, we claim that Sample i is correctly classified.
Voting: Electrode j casts a good vote for Sample i if its intra-class distance < inter-class distance there. Based on the selected electrodes, if # of good votes > # of bad votes, then Sample i is correctly classified.
Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
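A minimal sketch of the intra-/inter-class distance computation and the two decision rules. The one-dimensional toy samples and the absolute-difference distance are my own choices; the papers use the statistical time-series distances listed above:

```python
from statistics import mean

def intra_inter(samples, labels, i, dist):
    """Average distance from sample i to same-class (intra) and
    other-class (inter) samples, for one feature/electrode."""
    same = [dist(samples[i], samples[j]) for j in range(len(samples))
            if j != i and labels[j] == labels[i]]
    diff = [dist(samples[i], samples[j]) for j in range(len(samples))
            if labels[j] != labels[i]]
    return mean(same), mean(diff)

def correctly_classified(per_electrode, rule="averaging"):
    """per_electrode: list of (intra, inter) pairs over the selected electrodes."""
    if rule == "averaging":
        return mean(d[0] for d in per_electrode) < mean(d[1] for d in per_electrode)
    good = sum(1 for intra, inter in per_electrode if intra < inter)
    return good > len(per_electrode) - good

# Toy 1-D example: two "normal" samples near 1, two "pre-seizure" near 5.
samples = [1.0, 1.2, 5.0, 5.3]
labels = ["N", "N", "P", "P"]
pe = [intra_inter(samples, labels, 0, lambda a, b: abs(a - b))]
print(correctly_classified(pe), correctly_classified(pe, "voting"))  # True True
```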
16 Distance Averaging: Training
Sample i at Feature 1, Feature 2, …, Feature m
Select a subset of features such that as many samples as possible satisfy the averaging rule (average intra-class distance < average inter-class distance over the selected features).
17 Majority Voting: Training
Sample i is (Correct) if its good votes outnumber its bad votes over the selected features; (Incorrect) otherwise.
[Figure: negative and positive samples i and i′ being voted on at Feature j]
18 SFM Optimization Model
Intra-class and inter-class distances
Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
19 Averaging SFM
Maximize the number of correctly classified samples
Logical constraints on intra-class and inter-class distances if a sample is correctly classified
Must select at least one electrode
Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
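The papers solve this selection problem as an integer program. As an illustrative stand-in, a brute-force search over feature subsets applies the same averaging criterion (the toy distance values are hypothetical, and exhaustive search is only viable for small feature counts):

```python
from itertools import combinations
from statistics import mean

def best_feature_subset(intra, inter):
    """Brute-force stand-in for the Averaging-SFM integer program: choose the
    nonempty feature subset maximizing correctly classified samples.
    intra[i][j] / inter[i][j]: distances for sample i at feature j."""
    n_samples, n_features = len(intra), len(intra[0])
    best, best_score = None, -1
    for r in range(1, n_features + 1):           # "select at least one" constraint
        for S in combinations(range(n_features), r):
            score = sum(
                1 for i in range(n_samples)
                if mean(intra[i][j] for j in S) < mean(inter[i][j] for j in S))
            if score > best_score:
                best, best_score = S, score
    return best, best_score

# Toy example: feature 0 separates both samples, feature 1 misleads.
intra = [[0.1, 5.0], [0.2, 6.0]]
inter = [[4.0, 0.5], [3.0, 0.4]]
print(best_feature_subset(intra, inter))  # ((0,), 2)
```

With n features this loop visits 2^n − 1 subsets, which is exactly the combinatorial blow-up motivating the integer-programming formulation on the slide.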
20 Voting SFM
Precision matrix A contains elements indicating, for each sample and electrode, whether the electrode casts a good vote (intra-class distance < inter-class distance)
Maximize the number of correctly classified samples
Logical constraints: a sample must win the voting if it is counted as correctly classified
Must select at least one electrode
Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
24 Facts about Epilepsy
About 3 million Americans and another 60 million people worldwide (about 1% of the population) suffer from epilepsy.
Epilepsy is the second most common brain disorder (after stroke); it causes recurrent seizures (not vice versa).
Seizures usually occur spontaneously, in the absence of external triggers.
Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.
Seizures cause temporary disturbances of brain functions such as motor control, responsiveness, and recall, which typically last from seconds to a few minutes.
Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion* in the U.S. in associated health care costs and losses in employment, wages, and productivity.
Cost per patient ranged from $4,272 for persons** with remission after initial diagnosis and treatment to $138,602 for persons** with intractable and frequent seizures.
Notes: During a seizure, a massive group of neurons hypersynchronizes in a highly organized rhythmic pattern lasting roughly 20 seconds to a few minutes. The per-patient cost figures are more than 10 years old. By now I hope I've convinced the audience that we should do something about this disease; next I will discuss standard diagnosis, treatment, the acquired data, and how we can help these patients.
*Begley et al., Epilepsia (2000); **Begley et al., Epilepsia (1994).
25 Simplified EEG System and Intracranial Electrode Montage
The electroencephalogram (EEG) is a traditional tool for evaluating the physiological state of the brain by measuring the voltage potentials produced by brain cells as they communicate.
27 Goals: How can we help?
Seizure Prediction: recognizing (data-mining) abnormality patterns in EEG signals preceding seizures
Normal versus Pre-Seizure; alert when pre-seizure samples are detected (online classification)
e.g., statistical process control in production systems, attack alerts from sensor data, stock market analysis
EEG Classification (routine EEG check): quickly identify whether a patient has epilepsy
Epilepsy versus Non-Epilepsy
Many causes of seizures: convulsive or other seizure-like activity can be non-epileptic in origin and is observed in many other medical conditions. These non-epileptic seizures can be hard to differentiate and may lead to misdiagnosis.
e.g., medical check-ups, normal and abnormal samples
Notes: Given multidimensional time series and a set of events/episodes, how can we predict the event? Classification of medical data (normal and abnormal) can guide future diagnosis; feature selection points to the most differentiable initiating events.
29 10-second EEGs: Seizure Evolution
Normal → Pre-Seizure → Seizure Onset → Post-Seizure
Chaovalitwongse et al., Annals of Operations Research (2006)
30 Normal versus Pre-Seizure Data Set: EEG Dataset Characteristics
Patient ID | Seizure types | Duration of EEG (days) | # of seizures
1 | CP, SC | 3.55 | 7
2 | CP, GTC, SC | 10.93 | 3
3 | CP | 8.85 | 22
4 | SC | 5.93 | 19
5 | | 13.13 | 17
6 | | 11.95 | 3
7 | | 3.11 | 9
8 | | 6.09 | 23
9 | | 11.53 | 20
10 | | 9.65 | 12
Total | | 84.71 | 153
CP: Complex Partial; SC: Subclinical; GTC: Generalized Tonic/Clonic
31 Sampling Procedure
Randomly and uniformly sample 3 EEG epochs per seizure from each of the normal and pre-seizure states.
For example, Patient 1 has 7 seizures, so 21 normal and 21 pre-seizure EEG epochs are sampled.
Use leave-one(seizure)-out cross validation to perform training and testing.
[Timeline figure: pre-seizure epochs are drawn from the 30 minutes preceding a seizure; normal epochs from at least 8 hours away.]
32 Information/Feature Extraction from EEG Signals
Measure the brain dynamics from EEG signals.
Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 2048 points.
Maximum Short-Term Lyapunov Exponent (STLmax): measures the stability/chaoticity of EEG signals; measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space.
Pardalos, Chaovalitwongse, et al., Mathematical Programming (2004)
33 Evaluation
Sensitivity = TP/(TP+FN): the fraction of positive cases that are classified as positive.
Specificity = TN/(TN+FP): the fraction of negative cases that are classified as negative.
Type I error = 1 − Specificity; Type II error = 1 − Sensitivity
Chaovalitwongse et al., Epilepsy Research (2005)
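The four quantities above can be computed directly from binary labels (the toy label vectors are hypothetical):

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity and specificity from binary labels (1 = positive/abnormal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# 1 = pre-seizure, 0 = normal; one missed positive, one false alarm
sens, spec = confusion_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print(sens, spec)  # both 2/3 on this toy split
```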
34 Leave-One-Seizure-Out Cross Validation
[Figure: normal (N1–N5) and pre-seizure (P1–P5) EEG samples over electrodes 1–26; one seizure's samples are held out as the testing set while the rest form the training set fed to SFM to select electrodes.]
N – EEGs from the normal state; P – EEGs from the pre-seizure state. Assume there are 5 seizures in the recordings.
35 EEG Classification
Support Vector Machine [Chaovalitwongse et al., Annals of OR (2006)]: project time series data into a high-dimensional (feature) space; generate a hyperplane that separates the two groups of data, minimizing the errors.
Ensemble K-Nearest Neighbor [Chaovalitwongse et al., IEEE SMC: Part A (2007)]: use each electrode as a base classifier; apply the NN rule using statistical time series distances and optimize the value of k in training; voting and averaging.
Support Feature Machine [Chaovalitwongse et al., SIGKDD (2007); Chaovalitwongse et al., Operations Research (forthcoming)]: apply the NN rule to the entire baseline data; optimize by selecting the best group of classifiers (electrodes/features). Voting optimizes the ensemble classification; averaging uses the concept of inter-class and intra-class distances (or prediction scores).
Notes: First we implemented a modified support vector machine, one of the most commonly used classification techniques.
37 Performance Characteristics: Upper Bound
SVM → Chaovalitwongse et al., Annals of Operations Research (2006)
SFM → Chaovalitwongse et al., SIGKDD (2007); Chaovalitwongse et al., Operations Research (forthcoming)
KNN → Chaovalitwongse et al., IEEE Trans. Systems, Man, and Cybernetics: Part A (2007)
38 Separation of Normal and Pre-Seizure EEGs
From 3 electrodes selected by SFM versus from 3 electrodes not selected by SFM
39 Performance Characteristics: Validation
Overfitting the data; sample size; CPU time
SVM → Chaovalitwongse et al., Annals of Operations Research (2006)
SFM → Chaovalitwongse et al., SIGKDD (2007); Chaovalitwongse et al., Operations Research (forthcoming)
KNN → Chaovalitwongse et al., IEEE Trans. Systems, Man, and Cybernetics: Part A (2007)
41 Epilepsy versus Non-Epilepsy Data Set
Routine EEG check: minutes of recordings with scalp electrodes.
Each sample is a 5-minute EEG epoch (30 points of STLmax values).
Each sample is in the form of 18 electrodes × 30 points.
45 Selected electrodes: Fp1–C3, T6–Oz, Fz–Oz
Notes: The issue is not just getting 100% classification; rather, we focus on why we get those results and on understanding the data. For example, we look at the selected electrodes that help distinguish epilepsy and non-epilepsy patients. We found 3 electrodes that play a major role; when we went back to the neurologist, he was very surprised. One would not expect the selected electrodes to be involved in epilepsy mechanisms. It could also be a scalp-electrode effect: a focus on the left may be picked up first by electrodes on the right.
47 Other Medical Datasets
Breast Cancer: features of cell nuclei (radius, perimeter, smoothness, etc.); malignant or benign tumors
Diabetes: patient records (age, body mass index, blood pressure, etc.); diabetic or not
Heart Disease: general patient info, symptoms (e.g., chest pain), blood tests; identify presence of heart disease
Liver Disorders: features of blood tests; detect the presence of liver disorders from excessive alcohol consumption
49 Average Number of Selected Features
Dataset | LP SVM | NLP SVM | V-SFM | A-SFM
WDBC | 30 | 11.6 | 8.5 | —
HD | 13 | 7.4 | 8.7 | —
PID | 8 | 4.3 | 4.5 | —
BLD | 6 | 3.3 | 3.7 | —
50 Medical Data Signal Processing Apparatus (MeDSPA)
Quantitative analyses of medical data: neurophysiological data (e.g., EEG, fMRI) acquired during brain diagnosis.
Envisioned as an automated decision-support system configured to accept input medical signal data (associated with a spatial position or feature) and provide measurement data to help physicians reach a more confident diagnosis.
The aim is to improve current medical diagnosis and prognosis by assisting physicians in: recognizing (data-mining) abnormality patterns in medical data; recommending the diagnosis outcome (e.g., normal or abnormal); identifying a graphical indication (or feature) of abnormality (localization).
Notes: We envision the outcome of our research as a tool or apparatus to process medical signal data. This is a vision, and we still have a long way to go. We started with neurophysiological signals such as electroencephalograms and fMRI, then used the tools developed over the course of this research as an automated decision-support system to help physicians recognize abnormal patterns in medical data, localize the source of abnormality, and recommend, or improve confidence in, the diagnosis outcome.
51 Automated Abnormality Detection Paradigm
[Flow diagram] Data acquisition (multichannel brain activity) → optimization: feature extraction/clustering → statistical analysis: pattern recognition → interface technology → user/patient: initiate a warning or a variety of therapies (e.g., electrical stimulation, drug injection), administered via nurse, stimulator, or drug.
52 Acknowledgement: Collaborators
E. Micheli-Tzanakou, PhD; L.D. Iasemidis, PhD; R.C. Sachdeo, MD; R.M. Lehman, MD; B.Y. Wu, MD, PhD
Students: Y.J. Fan, MS; other undergraduate students