Optimization and Data Mining in Epilepsy Research W. Art Chaovalitwongse Assistant Professor Industrial and Systems Engineering Rutgers University.

Acknowledgements: Comprehensive Epilepsy Center, St. Peter’s University Hospital (Rajesh C. Sachdeo, MD; Deepak Tikku, MD). Brain Institute, University of Florida (Panos M. Pardalos, PhD; J. Chris Sackellares, MD; Paul R. Carney, MD). Bioengineering, Arizona State University (Leonidas D. Iasemidis, PhD).

Agenda: Background: Epilepsy; Electroencephalogram (EEG) Time Series; Chaos Theory: Dimensionality Reduction; Seizure Prediction; Feature Selection; Process Monitoring; Concluding Remarks.

Facts About Epilepsy: At least 2 million Americans and another 40-50 million people worldwide (about 1% of the population) suffer from epilepsy. Epilepsy is the second most common brain disorder (after stroke). The hallmark of epilepsy is recurrent seizures. Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.

Epileptic Seizures: Seizures usually occur spontaneously, in the absence of external triggers. Seizures cause temporary disturbances of brain functions such as motor control, responsiveness, and recall, which typically last from seconds to a few minutes. Seizures may be followed by a post-ictal period of confusion or impaired sensorium that can persist for several hours.

Rationale Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion in the U.S. in associated health care costs and losses in employment, wages, and productivity. Cost per patient ranged from $4,272 for persons with remission after initial diagnosis and treatment to $138,602 for persons with intractable and frequent seizures.

How To Fight Epilepsy: Anti-Epileptic Drugs (AEDs): the mainstay of epilepsy treatment; approximately 25 to 30% of patients remain unresponsive. Epilepsy surgery: requires long-term invasive EEG monitoring; 50% of pre-surgical candidates do not undergo resective surgery (multiple epileptogenic zones, or an epileptogenic zone located in functional brain tissue); only 60% of surgery cases result in seizure freedom. Electrical stimulation (vagus nerve stimulator): parameters (amplitude and duration of stimulation) are arbitrarily adjusted; as effective as one additional AED dose; side effects. Seizure prediction?

Vagus Nerve Stimulator

Open Problems: Is seizure occurrence random? If not, can seizures be predicted? If yes, are there seizure pre-cursors preceding seizures? If yes, what measurement can be used to indicate these pre-cursors? Does normal brain activity differ from abnormal brain activity?

Electroencephalogram (EEG): …is a tool for evaluating the physiological state of the brain. …offers excellent spatial and temporal resolution to characterize the rapidly changing electrical activity of brain activation. …captures voltage potentials produced by brain cells while communicating. In an EEG, electrodes are implanted deep in the brain or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.

From Microscopic to Macroscopic Level (Electroencephalogram - EEG)

Depth and Subdural electrode placement for EEG recordings (figure; electrode labels: LOF, ROF, LTD, RTD, LST, RST)

Scalp EEG Data Acquisition

EEG Data Acquisition

Typical EEG Time Series Data

Goals of Research: Test the hypothesis that seizures are not a random process. Employ data mining techniques to differentiate normal and abnormal EEGs. Employ quantitative analysis to identify seizure pre-cursors. Demonstrate that seizures could be predicted. Develop a closed-loop seizure control device (Brain Pacemaker).

10-second EEGs: Seizure Evolution (figure panels: Normal, Pre-Seizure, Seizure, Post-Seizure)

Dimensionality Reduction: The brain is a non-stationary system. The EEG time series is non-stationary. With 200 Hz sampling, 1 hour of EEG comprises 200 × 60 × 60 × 30 = 21,600,000 data points = 43.2 MB (assuming 16 bits, i.e. 2 bytes, per data point). 1 day = 1 hour × 24; 1 week = 1 hour × 168; 20 patients = 1 hour × 3,360. Kilobytes → Megabytes → Gigabytes → Terabytes.
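The storage arithmetic on this slide can be checked with a short script. This is only a sketch: it assumes that the factor of 30 is the number of recorded channels and that every data point takes 2 bytes, and the function name `eeg_bytes` is made up for illustration.

```python
# Back-of-the-envelope EEG storage estimate.
SAMPLING_RATE_HZ = 200
SECONDS_PER_HOUR = 60 * 60
CHANNELS = 30          # assumption: the factor of 30 on the slide is the channel count
BYTES_PER_SAMPLE = 2   # assumption: 16 bits per data point


def eeg_bytes(hours: int) -> int:
    """Bytes needed to store `hours` of multi-channel EEG."""
    points = SAMPLING_RATE_HZ * SECONDS_PER_HOUR * CHANNELS * hours
    return points * BYTES_PER_SAMPLE


for label, hours in [("1 hour", 1), ("1 day", 24), ("1 week", 168),
                     ("20 patients, 1 week each", 3360)]:
    print(f"{label}: {eeg_bytes(hours) / 1e6:,.1f} MB")
```

Under these assumptions one week for 20 patients already comes to roughly 145 GB, which is the motivation for reducing each epoch to a few dynamical measures.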

Dimensionality Reduction Using Chaos Theory Chaos in Brain? Chaos in Stock Market? Chaos in Foreign Exchanges (Swedish Currency)? Measure the brain dynamics from EEG time series. Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points. Maximum Short-Term Lyapunov Exponent measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space Measures the chaoticity of the brain waves

Embed the data set (EEG): $X_i = (x(t_i), x(t_i + \tau), \ldots, x(t_i + (p-1)\tau))^T$, where $\tau$ is the selected time lag between the components of each vector in the phase space, $p$ is the selected dimension of the embedding phase space, and $t_i \in [1, T - (p-1)\tau]$.
1. Pick a point $x(t_0)$ somewhere in the middle of the trajectory.
2. Find that point's nearest neighbor. Call that point $z_0(t_0)$.
3. Compute $|z_0(t_0) - x(t_0)| = L_0$.
4. Follow the "difference trajectory" (the dashed line in the figure) forwards in time, computing $|z_0(t_i) - x(t_i)| = L_0(i)$ and incrementing $i$, until $L_0(i) > \epsilon$. Call that value $L_0'$ and that time $t_1$.
5. Find $z_1(t_1)$, the nearest neighbor of $x(t_1)$, and go to step 3. Repeat the procedure to the end of the fiducial trajectory $t = t_n$, keeping track of the $L_i$ and $L_i'$.
6. Compute $\lambda_1 = \frac{1}{N \Delta t} \sum_{i=0}^{M-1} \log_2 \frac{L_i'}{L_i}$, where $M$ is the number of times we went through the loop above, $N$ is the number of time steps in the fiducial trajectory, and $N \Delta t = t_n - t_0$.
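The first two ingredients of this procedure, the delay embedding and the nearest-neighbor search, can be sketched as follows. This is a minimal illustration and not the STLmax implementation used in the talk; the function names and the `min_separation` parameter are assumptions introduced here.

```python
import math


def delay_embed(x, p, tau):
    """Return the delay vectors X_i = (x[i], x[i+tau], ..., x[i+(p-1)*tau])."""
    n_vectors = len(x) - (p - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this p and tau")
    return [tuple(x[i + k * tau] for k in range(p)) for i in range(n_vectors)]


def nearest_neighbor(vectors, i, min_separation=1):
    """Index of, and distance to, the vector closest to vectors[i].

    Candidates closer than `min_separation` steps in time are skipped
    (with the default of 1 only the point itself is excluded).
    """
    best_j, best_dist = None, math.inf
    for j, v in enumerate(vectors):
        if abs(j - i) < min_separation:
            continue
        d = math.dist(vectors[i], v)  # Euclidean distance in phase space
        if d < best_dist:
            best_j, best_dist = j, d
    return best_j, best_dist


# Toy usage: embed a sine wave with p = 3 and tau = 4.
signal = [math.sin(0.3 * t) for t in range(200)]
vectors = delay_embed(signal, p=3, tau=4)
j, L0 = nearest_neighbor(vectors, i=len(vectors) // 2, min_separation=10)
print(len(vectors), j, L0)
```

The distance returned for the chosen point plays the role of L_0 in step 3; following the pair forward in time and re-selecting neighbors would complete the loop.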

2-D Example: Circle of initial conditions evolves into an ellipse.

STLmax Profiles (figure panels: Pre-Ictal, Ictal, Post-Ictal)

Hidden Synchronization Patterns

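Statistics to quantify the convergence of STLmax: how similar are they? By paired-T statistic: per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are $L_i = \{STL_{\max,i}^{1}, \ldots, STL_{\max,i}^{60}\}$ and $L_j = \{STL_{\max,j}^{1}, \ldots, STL_{\max,j}^{60}\}$, with pairwise differences $D_{ij} = L_i - L_j$. Then we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_{d}$, of $D_{ij}$. The T-index between EEG signal epochs i and j is defined as $T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_{d} / \sqrt{60}}$.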

Statistically Quantifying the Convergence

IID (Independent and Identically Distributed) Test. Assumption 1: Within a window of 30 STLmax points, the differences of STLmax values (D_ij) between two electrode sites i and j are independent. To verify this assumption, we employed the "portmanteau" test of white noise developed by Ljung and Box. Assumption 2: Within a window of 60 points, the differences of STLmax values between two electrode sites i and j are normally distributed. To check this assumption, we employed the Shapiro-Wilk W test, which is a well-established and powerful test of departure from normality.
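The independence check in Assumption 1 can be illustrated with a small Ljung-Box implementation. This is a sketch using only the standard library, not the code behind the slides: the function name is made up, and to avoid a statistics package the p-value uses the closed-form chi-square tail that holds only for an even number of lags. The Shapiro-Wilk test of Assumption 2 is not reimplemented here; it is available in SciPy as `scipy.stats.shapiro`.

```python
import math


def ljung_box(series, lags):
    """Ljung-Box Q statistic and p-value for the null hypothesis of white noise.

    `lags` must be even so the chi-square survival function has the
    closed form exp(-q/2) * sum_{k < lags/2} (q/2)**k / k!.
    """
    if lags <= 0 or lags % 2 != 0:
        raise ValueError("this sketch supports a positive, even number of lags only")
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    half = q / 2
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k)
                                    for k in range(lags // 2))
    return q, p_value


# A steadily drifting difference series is clearly not independent.
drifting = [0.1 * t for t in range(60)]
q_stat, p = ljung_box(drifting, lags=10)
print(q_stat, p)
```

A small p-value rejects independence; for the T-index to be valid, the STLmax differences within a window should not be rejected.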

Convergence of STL max

Models: N coupled oscillators (N = number of oscillators), equations (1)-(3), each with intrinsic parameters and directional coupling strengths. Homoclinic chaos (Shilnikov’s theorem): Rössler systems, Lorenz systems, population dynamical systems.

STLmax versus time and coupling

Why Feature Selection? Not every electrode site shows the convergence. Feature Selection: Select the electrodes that are most likely to show the convergence preceding the next seizure.

Optimization Problem: We apply optimization techniques to find a group of electrode sites such that (1) they are the most converged (in STLmax) electrode sites during the 10-minute window before the seizure, and (2) they show dynamical resetting (divergence in STLmax) during the 10-minute window after the seizure. Such electrode sites are defined as “critical electrode sites”. Hypothesis: The critical electrode sites should be the most likely to show convergence in STLmax again before the next seizure.

Multi-Quadratic Integer Programming: To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with an objective function that minimizes the average T-index among electrode sites, a linear constraint that fixes the number of critical electrode sites, and a quadratic constraint that ensures that the selected electrode sites show the dynamical resetting.

Notation and Modeling: x is an n-dimensional column vector of decision variables, where each x_i represents electrode site i: x_i = 1 if electrode i is selected as one of the critical electrode sites, and x_i = 0 otherwise. Q is an (n × n) matrix whose element q_ij is the T-index between electrodes i and j during the 10-minute window before a seizure. b is an integer constant (the number of critical electrode sites). D is an (n × n) matrix whose element d_ij is the T-index between electrodes i and j during the 10-minute window after a seizure. α = 2.662·b·(b-1) is a constant, where 2.662 is the critical value of the T-index, as previously defined, to reject H_0: “two brain sites acquire identical STLmax values within a 10-minute window”.
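With this notation the selection problem reads: minimize x'Qx subject to the sum of x_i being b and x'Dx being at least α. For a handful of electrodes it can be solved by plain enumeration, which makes the roles of Q, D, b, and α concrete. The sketch below is a brute-force illustration only, not the MILP reformulation discussed in the talk, and the matrices are invented toy values.

```python
from itertools import combinations


def select_critical_sites(Q, D, b, t_critical=2.662):
    """Enumerate all b-subsets of electrodes and keep the feasible one with
    the smallest pre-seizure T-index sum (x'Qx), subject to x'Dx >= alpha."""
    n = len(Q)
    alpha = t_critical * b * (b - 1)
    best_sites, best_value = None, float("inf")
    for sites in combinations(range(n), b):
        pre = sum(Q[i][j] for i in sites for j in sites)    # x'Qx
        post = sum(D[i][j] for i in sites for j in sites)   # x'Dx
        if post >= alpha and pre < best_value:
            best_sites, best_value = sites, pre
    return best_sites, best_value


# Toy T-index matrices for 4 electrodes (hypothetical numbers).
Q = [[0, 1, 5, 6],
     [1, 0, 4, 7],
     [5, 4, 0, 0.5],
     [6, 7, 0.5, 0]]
D = [[0, 3, 4, 4],
     [3, 0, 4, 4],
     [4, 4, 0, 1],
     [4, 4, 1, 0]]
print(select_critical_sites(Q, D, b=2))
```

In this toy case electrodes 2 and 3 are the most converged before the seizure, but they fail the resetting constraint, so the pair (0, 1) is chosen instead. Enumeration grows combinatorially with n, which is why a linearized formulation matters for realistic electrode counts.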

Conventional Linearization Approach for Multi-Quadratic 0-1 Problem

KKT Conditions Approach: Consider the quadratic 0-1 programming problem, where Q is an (n × n) matrix, b is an integer constant, x is an n-dimensional column vector, and e^T = (1, 1, …, 1). Relaxing x to x ≥ 0, we then have the following KKT conditions:

KKT Conditions Approach (continued): Fix x ∈ {0,1}. Add slack variables a and define s = u·e + a. Minimizing the slack variables, we can formulate this problem as a mixed 0-1 linear problem. Note that this formulation remains efficient as n increases, because it has the SAME number of 0-1 variables (n), plus 2n additional continuous variables.

Connections Between QIP Problems and MILP Problems: For any matrix Q where q_ij ≥ 0, we want to prove that the QIP problem and the corresponding MILP problem are equivalent.

Consider the MQIP problem We proved that the MQIP program is EQUIVALENT to a MILP problem with the SAME number of integer variables. Theoretical Results: MILP formulation for MQIP problem Equivalent

Reference: P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004.

Empirical Results: Performance on Larger Problems. Reference: W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004.

Empirical Results: Performance on Larger Problems

Hypothesis Testing - Simulation. Hypothesis: The critical electrode sites should be the most likely to show convergence in STLmax (a drop in T-index below the critical value) again before the next seizure. The critical electrode sites are the electrode sites that are the most converged (in STLmax) during the 10-minute window before the seizure and that show the dynamical resetting (divergence in STLmax) during the 10-minute window after the seizure. Simulation: Based on 3 patients with 20 seizures, we compare the probability of showing convergence in STLmax (a drop in T-index below the critical value) before the next seizure between two groups of electrode sites: the critical electrode sites, and randomly selected electrode sites (5,000 times).

Optimal VS Non-Optimal

Simulation - Results

How to automate the system

Automated Seizure Warning System (ASWS): Continuously calculate STLmax from the multi-channel EEG signals. Select critical electrode sites after every subsequent seizure. Monitor the average T-index of the critical electrodes. Give a warning when the T-index value is greater than 5 and then drops to a value of 2.662 or less.
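The warning rule on this slide amounts to a two-state monitor on the average T-index. A minimal sketch, assuming the T-index is available as a list of successive values; the function and parameter names are hypothetical, and only the thresholds 5 and 2.662 come from the slide.

```python
def warning_indices(t_index_series, upper=5.0, lower=2.662):
    """Return the positions at which a warning is issued.

    The monitor becomes armed once the T-index exceeds `upper` and fires
    the first time it subsequently drops to `lower` or less.
    """
    warnings = []
    armed = False
    for k, value in enumerate(t_index_series):
        if value > upper:
            armed = True
        elif armed and value <= lower:
            warnings.append(k)
            armed = False  # must rise above `upper` again before the next warning
    return warnings


profile = [6.0, 4.0, 2.5, 2.0, 5.5, 3.0, 2.6]
print(warning_indices(profile))  # [2, 6]
```

Requiring the index to rise above the upper threshold first means that a T-index that simply stays low does not produce repeated warnings.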

Data Characteristics

Performance Evaluation for ASWS: To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after the warning. Sensitivity = (number of seizures preceded by a true warning) / (total number of seizures). False Prediction Rate = average number of false warnings per hour.
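These two measures can be computed directly from warning times and seizure times. The sketch below assumes that times are given in hours from the start of the recording and that a seizure counts as predicted if at least one warning falls within the 3-hour horizon before it; the exact bookkeeping used in the study may differ, and the function name is made up.

```python
def evaluate_warnings(warning_hours, seizure_hours, recording_hours, horizon=3.0):
    """Return (sensitivity, false predictions per hour)."""
    def is_hit(warning, seizure):
        # The seizure occurs after the warning, at most `horizon` hours later.
        return 0 < seizure - warning <= horizon

    predicted = sum(any(is_hit(w, s) for w in warning_hours) for s in seizure_hours)
    false_warnings = sum(not any(is_hit(w, s) for s in seizure_hours)
                         for w in warning_hours)
    return predicted / len(seizure_hours), false_warnings / recording_hours


# Hypothetical example: 3 warnings, 2 seizures, 50 hours of recording.
print(evaluate_warnings([1.0, 10.0, 20.0], [3.0, 25.0], 50.0))  # (0.5, 0.04)
```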

Performance characteristics of automated seizure warning algorithm with the best parameter-settings of training data set. Training Results

Receiver Operating Characteristics (ROC): The ROC curve is used to indicate an appropriate trade-off that one can achieve between the false positive rate (1-Specificity, plotted on the X-axis), which needs to be minimized, and the detection rate (Sensitivity, plotted on the Y-axis), which needs to be maximized.

ROC curve analysis for the best parameter settings of 10 patients

Test Results Performance characteristics of automated seizure warning algorithm with the best parameter settings on testing data set.

Validation of the ASWS algorithm. Temporal properties: surrogate seizure time data set (100 surrogate data sets). Spatial properties: non-optimized ASWS, selecting non-optimal electrode sites (100 randomly selected electrodes).

Prediction Scores: ASWS

Prediction Scores: Surrogate Data and Non-Optimized ASWS W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.

Prediction Scores: Surrogate Data and Non-Optimal ASWS

Concluding Remarks. Overview of epilepsy research. Applications of data mining and optimization techniques: interplay between theory and application; the first online real-time seizure prediction system. Seizure prediction: predicting ~70% of temporal lobe seizures on average, with a false alarm rate of ~0.16 per hour on average. Ongoing and future research: classification of EEGs from normal and epileptic patients; classification of abnormal brain activity; cluster analysis of epileptic brains; analysis of scalp EEGs.

References
W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. EEG Classification in Epilepsy. To appear in Annals of Operations Research.
W. Chaovalitwongse and P.M. Pardalos. Optimization Approaches to Characterize the Hidden Dynamics of the Epileptic Brain: Seizure Prediction and Localization. To appear in SIAG/OPT Views-and-News.
W. Chaovalitwongse, P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20(2-3): 383-394, 2005.
L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Clinical Neurophysiology, 116(3): 532-544, 2005.
P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004. (INFORMS Pierskalla Best Paper Award 2004)
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004. (Ranked 5th in Top 25 Articles in Operations Research Letters)

Questions? Thank you

Classification of Brain Activity

Phase Profiles

Entropy H of Attractor

Classification of Physiological States

Nearest Neighbor Time Series Classification (figure: unknown epoch A compared against the Normal, Pre-Seizure, and Post-Seizure groups)

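Similarity Measure for EEG Time Series: T-test. By paired-T statistic: per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 30 points, 5 minutes) are $L_i = \{STL_{\max,i}^{1}, \ldots, STL_{\max,i}^{30}\}$ and $L_j = \{STL_{\max,j}^{1}, \ldots, STL_{\max,j}^{30}\}$, with pairwise differences $D_{ij} = L_i - L_j$. Then we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_{d}$, of $D_{ij}$. The T-index between EEG signal epochs i and j is defined as $T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_{d} / \sqrt{30}}$.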

T-Statistics Distance: The T-index, T_xy, between the time series x and y is then defined as $T_{xy} = \frac{|E[x - y]|}{\sigma_{xy} / \sqrt{n}}$, where E[·] denotes the average of the value within an epoch of the time series, n is the length of the time series epoch, and σ_xy is the sample standard deviation of the difference in value of x and y. Asymptotically, the T_xy index follows a t-distribution with n-1 degrees of freedom.
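The T-index is a paired t statistic on the pointwise differences of two epochs, so it takes only a few lines to compute. A minimal sketch using the standard library; the function name and the handling of a zero standard deviation are choices made here, not part of the slides.

```python
import math
from statistics import mean, stdev


def t_index(x, y):
    """T-index between two equal-length epochs: |mean(x - y)| / (sd(x - y) / sqrt(n))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("epochs must have the same length, at least 2")
    d = [a - b for a, b in zip(x, y)]
    sigma = stdev(d)  # sample standard deviation of the differences
    if sigma == 0:
        # Constant difference: identical epochs give 0, a pure offset gives infinity.
        return 0.0 if mean(d) == 0 else math.inf
    return abs(mean(d)) * math.sqrt(len(d)) / sigma


print(t_index([1, 2, 3, 4], [0, 0, 1, 1]))  # about 4.899
```

Small values mean the two profiles are statistically indistinguishable over the epoch, and large values mean they are well separated.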

Nearest Neighbor Classification Rules: Given an unknown-state epoch of EEG signals A, we calculate statistical distances between the EEG epoch and the groups of Normal, Pre-Seizure, and Post-Seizure EEGs in our database. EEG sample A will be classified into the group of patient’s states (normal, pre-seizure, or post-seizure) that yields the minimum T-index distance. Multiple Electrodes = Multiple Decisions, combined by either Averaging or Voting (majority voting selects the action with the maximum number of votes).
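The classification rule with majority voting can be sketched as below. The slides do not say how the distance to a whole group is aggregated; this sketch assumes the average T-index over the reference epochs of that group, and all function names and data are illustrative.

```python
import math
from collections import Counter
from statistics import mean, stdev


def t_index(x, y):
    """Paired T-index between two equal-length epochs."""
    d = [a - b for a, b in zip(x, y)]
    sigma = stdev(d)
    if sigma == 0:
        return 0.0 if mean(d) == 0 else math.inf
    return abs(mean(d)) * math.sqrt(len(d)) / sigma


def classify_channel(epoch, references):
    """Pick the state whose reference epochs are closest on average.

    `references` maps a state name to a list of reference epochs for
    one electrode (assumed aggregation rule: mean T-index).
    """
    avg = {state: mean(t_index(epoch, ref) for ref in refs)
           for state, refs in references.items()}
    return min(avg, key=avg.get)


def classify_by_voting(channel_epochs, channel_references):
    """One decision per electrode, combined by majority vote."""
    votes = Counter(classify_channel(e, r)
                    for e, r in zip(channel_epochs, channel_references))
    return votes.most_common(1)[0][0]


# Hypothetical STLmax-like values for one electrode.
refs = {"normal": [[1.0, 1.0, 1.1, 0.9, 1.1]],
        "pre-seizure": [[3.0, 3.2, 2.9, 3.1, 3.0]]}
low = [1.0, 1.1, 0.9, 1.0, 1.2]
high = [3.1, 3.0, 3.0, 3.2, 2.9]
print(classify_by_voting([low, low, high], [refs, refs, refs]))  # normal
```

Averaging, the other combination rule named on the slide, would instead average the per-electrode distances before taking the minimum.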

Preliminary Data Set: 132 5-minute epochs of pre-seizure EEGs; 132 5-minute epochs of post-seizure EEGs; 300 5-minute epochs of normal EEGs. Pre-seizure = 0-30 minutes before a seizure. Post-seizure = 2-10 minutes after a seizure. Normal = 10 hours away from a seizure.

Probability of Correct Classifications

Metrics for Performance Evaluation

                     PREDICTED Class=Yes   PREDICTED Class=No
ACTUAL Class=Yes     a                     b
ACTUAL Class=No      c                     d

a: TP (true positive); b: FN (false negative); c: FP (false positive); d: TN (true negative)

Sensitivity and Specificity Sensitivity measures the fraction of positive cases that are classified as positive. Specificity measures the fraction of negative cases classified as negative. Sensitivity = TP/(TP+FN) Specificity = TN/(TN+FP) Sensitivity can be considered as a detection (prediction or classification) rate that one wants to maximize. Maximize the probability of correctly classifying patient states. False positive rate can be considered as 1-Specificity which one wants to minimize.
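The two formulas translate directly into code. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of positive cases classified as positive
    specificity = tn / (tn + fp)  # fraction of negative cases classified as negative
    return sensitivity, specificity


sens, spec = sensitivity_specificity(tp=90, fn=10, fp=25, tn=75)
print(sens, spec, 1 - spec)  # 0.9 0.75 0.25
```

The last printed value, 1 - specificity, is the false positive rate.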

Receiver Operating Characteristics (ROC): The ROC curve is used to indicate an appropriate trade-off that one can achieve between the false positive rate (1-Specificity, plotted on the X-axis), which needs to be minimized, and the detection rate (Sensitivity, plotted on the Y-axis), which needs to be maximized.

ROC: Performance Characteristics (figure; measures: Entropy, Phase, Lmax; combination rules: Average, Voting). Sensitivity = 95.7%, Specificity = 75.4%.

Results

Any More Sophisticated Method?

Support Vector Machines 2-Class Linearly Separable Case

Mathematical Modeling

Leave-one-out Cross Validation: Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies. K-fold cross validation: divide all the data into k subsets of equal size; train a classifier using k-1 groups of training data; test the classifier on the omitted subset; iterate k times. Leave-one-out cross validation is the special case in which k equals the number of samples.
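The splitting scheme described here can be sketched as a small generator of train/test index lists. The function name is hypothetical, and when the data do not divide evenly the first folds simply receive one extra sample.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Leave-one-out is k-fold with k equal to the number of samples.
for train, test in k_fold_indices(5, 5):
    print(train, test)
```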

Classification Results

QP for Clustering Clustering Epileptic Brains

Hierarchical Clustering (figure: dendrogram over points a, b, c, d, e with clusters {a, d}, {b, c}, {b, c, e}, and {a, b, c, d, e}; directions: Agglomerative and Divisive)

Clustering via Concave Quadratic Programming (CCQP): Formulate a clustering problem as a Quadratic Integer Program (QIP), where A is an n × n T-index matrix of pairwise distances, λ is a parameter adjusting the degree of similarity within a cluster, and x_i is a 0-1 decision variable indicating whether or not point i is selected (assigned) to be in the cluster.

Advantages: In some instances, λ is large enough to make the quadratic function concave. The QIP can then be converted to a continuous problem (minimizing a concave quadratic function over a sphere).

CCQP Algorithm

Patient 1: Box Plot of Average Solution Lmax

Patient 1: Box Plots of Average Solution (Lmax, Phase)

Patient 2: Box Plots of Average Solution (Lmax, Phase)

Kruskal-Wallis Test …is a nonparametric version of the one-way ANOVA …is an extension of the Wilcoxon rank sum test to more than two groups …compares samples from two or more groups. …compares the medians of the samples in X, and returns the p-value for the null hypothesis that all samples are drawn from the same population (or equivalently, from different populations with the same distribution).

Assumptions The Kruskal-Wallis test makes the following assumptions about the data in X: All samples come from populations having the same continuous distribution, apart from possibly different locations due to group effects. All observations are mutually independent. The classical one-way ANOVA test replaces the first assumption with the stronger assumption that the populations have normal distributions.

T-test: Test the hypothesis of the difference in means of two samples. Determine whether two samples, x and y, could have the same mean when the standard deviations are unknown but assumed equal. Asymptotically, the T_xy index follows a t-distribution with n-1 degrees of freedom.

Results – Significance Level

Concluding Remarks Overview of Epilepsy Research Applications of Data Mining and Optimization Techniques Interplay between theory and application Quadratic Programming for Feature Selection Quadratic Programming for Clustering Long-Term Monitoring Analysis
