
1 Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

2 Introduction. Data Mining: “the practice of searching through large amounts of computerized data to find useful patterns or trends.” Optimization: “An act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, or effective as possible; specifically: the mathematical procedures (as finding the maximum of a function) involved in this.” (Merriam-Webster Dictionary)

3 The combination of data mining and optimization: Find the “best” way to extract meaningful “patterns” from data. Not always an easy task.

4 How difficult can optimization be? Given integers N_1, N_2, …, N_k and M, find a subset of N_1, N_2, …, N_k whose sum equals M. Can you find an algorithm better than O(2^k), i.e., better than exponential complexity?
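To make the O(2^k) baseline concrete, here is a minimal brute-force sketch of the subset-sum question on the slide. The function name and the sample numbers are illustrative assumptions, not part of the talk; the point is only that the loop visits every one of the 2^k subsets in the worst case.

```python
from itertools import combinations

def subset_sum_bruteforce(numbers, target):
    """Return one subset of `numbers` whose sum is `target`, or None.

    Enumerates subsets by increasing size, so in the worst case all
    2^k subsets of the k input integers are examined.
    """
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return list(subset)
    return None

# Hypothetical example data (not from the presentation).
print(subset_sum_bruteforce([3, 34, 4, 12, 5, 2], 9))
```

Each additional integer doubles the number of candidate subsets, which is why this approach stops being practical well before k reaches 100.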

5 Hard Drive Cost: falls to approximately 1/10 every 5 years

6 Hard Drive Capacity: approximately 10 times more every 5 years

7 Processing Power: the number of transistors in a computer processor doubles every two years

8 References Handbook of Massive Data Sets, co-editors: J. Abello, P.M. Pardalos, and M. Resende, Kluwer Academic Publishers, (2002).

9 Main problems in data mining: Data preprocessing; Dimensionality reduction; Feature selection; Regression; Clustering (unsupervised learning); Classification (supervised learning); Semi-supervised learning (between supervised and unsupervised); Biclustering; Result validation; Data visualization/representation. Biomedical Informatics is a challenging area with lots of these problems.

10 Agenda Research Background Epilepsy Seizure Prediction Sources of Data Electroencephalogram (EEG) Time Series Dimensionality Reduction Chaos Theory Feature Selection for Brain Monitoring Time Series Classification of Neuro-Physiological States Brain Clustering Brain Network Models Concluding Remarks

11 Facts About Epilepsy. At least 2 million Americans and another 40-50 million people worldwide (about 1% of the population) suffer from epilepsy. Epilepsy, which causes recurrent seizures, is the second most common brain disorder (after stroke). Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.

12 Epileptic Seizures. Seizures usually occur spontaneously, in the absence of external triggers. Seizures cause temporary disturbances of brain functions such as motor control, responsiveness, and recall, which typically last from seconds to a few minutes. Seizures may be followed by a post-ictal period of confusion or impaired sensorium that can persist for several hours.

13 10-second EEGs: Seizure Evolution. Panels: Normal, Pre-Seizure, Seizure Onset, Post-Seizure

14 Why do we care? Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion* in the U.S. in associated health care costs and losses in employment, wages, and productivity. Cost per patient ranged from $4,272 for persons** with remission after initial diagnosis and treatment to $138,602 for persons** with intractable and frequent seizures.

15 Current Epilepsy Treatment. Pharmacological therapy: anti-epileptic drugs (AEDs) are the mainstay of epilepsy treatment; approximately 25 to 30% of patients remain unresponsive. Epilepsy resective surgery: requires long-term invasive EEG monitoring to locate a specific, localized part of the brain where the seizures are thought to originate; 50% of pre-surgical candidates do not undergo resective surgery (multiple epileptogenic zones, or an epileptogenic zone located in functional brain tissue); only 50-60% of surgery cases result in seizure freedom.

16 Current Epilepsy Treatment Electrical Stimulation (Vagus nerve stimulator) Parameters (amplitude and duration of stimulation) arbitrarily adjusted As effective as one additional AED dose Side Effects Seizure Prediction? Monitoring Unit? Forecasting Impending Seizures? Seizure Control? Deep Brain Stimulator?

17 Electroencephalogram (EEG) …is a traditional tool for evaluating the physiological state of the brain. …offers excellent spatial and temporal resolution to characterize rapidly changing electrical activity of brain activation …captures voltage potentials produced by brain cells while communicating. In an EEG, electrodes are implanted in deep brain or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.

18 From Microscopic to Macroscopic Level (Electroencephalogram - EEG)

19 Electrode Montage and EEGs

20 Scalp EEG Data Acquisition

21 Open Problems. Is seizure occurrence random? If not, can seizures be predicted? If yes, are there seizure pre-cursors (in EEGs) preceding seizures? If yes, what data mining techniques can be used to indicate these pre-cursors? Does normal brain activity differ from abnormal brain activity?

22 Goals of Research Test the hypothesis that seizures are not a random process. Demonstrate that seizures could be predicted Feature Selection to identify seizure pre-cursors (Statistical Process Control) Demonstrate that normal and abnormal EEGs can be differentiated Time Series Classification Better understand the epileptogenic process – how seizures are initiated and propagated. Brain Clustering Develop a closed-loop seizure control device (Brain Pacemaker)

23 Dimensionality Reduction Chaos Theory

24 EEGs with the Curse of Dimensionality. The brain is a non-stationary system, and the EEG time series is non-stationary. With 200 Hz sampling, 1 hour of EEGs comprises 200*60*60*30 = 21,600,000 data points = 43.2 MB (assuming a 16-bit ASCII format); 1 day = 1.04 GB; 1 week = 7.28 GB; 20 patients ≈ 0.15 TB. Kilobytes → Megabytes → Gigabytes → Terabytes
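The storage figures on this slide can be re-derived in a few lines. This is only a sketch of the arithmetic; it assumes that the factor 30 is the number of EEG channels (the talk later mentions 30 electrodes), that each sample occupies 2 bytes, and that the 20-patient figure refers to one week of recording per patient.

```python
SAMPLING_RATE_HZ = 200
CHANNELS = 30          # assumption: the factor 30 on the slide is the channel count
BYTES_PER_SAMPLE = 2   # 16-bit samples

points_per_hour = SAMPLING_RATE_HZ * 60 * 60 * CHANNELS
mb_per_hour = points_per_hour * BYTES_PER_SAMPLE / 1e6
gb_per_day = mb_per_hour * 24 / 1e3
gb_per_week = gb_per_day * 7
tb_20_patients = gb_per_week * 20 / 1e3   # assumption: one week per patient

print(points_per_hour, mb_per_hour, gb_per_day, gb_per_week, tb_20_patients)
```

Computed without intermediate rounding, one week comes to about 7.26 GB; the slide's 7.28 GB follows from multiplying the already rounded 1.04 GB per day by 7.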

25 Data Transformation Using Chaos Theory. Measure the brain dynamics from time series, as is done for the stock market and currency exchanges (e.g., Swedish Kroner). Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points. The Maximum Short-Term Lyapunov Exponent (STLmax) measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space, and measures the stability/chaoticity of EEG signals.

26 Measure of Chaos

27 STLmax Profiles Pre-Seizure Seizure OnsetPost-Seizure

28 Hidden Synchronization Patterns

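29 Statistics to quantify the convergence of STLmax: how similar are they? By the paired-T statistic: for the EEG signals at electrode sites i and j, suppose their STLmax values over a window of 60 points (10 minutes) are $STLmax_i^t$ and $STLmax_j^t$, $t = 1, \dots, 60$, with differences $D_{ij}^t = STLmax_i^t - STLmax_j^t$. Then we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_{ij}$, of $D_{ij}^t$. The T-index between EEG signals i and j is defined as $T_{ij} = |\bar{D}_{ij}| / (\hat{\sigma}_{ij} / \sqrt{60})$.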
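A paired-T statistic of this kind can be sketched as follows. The slide's formula did not survive in the transcript, so this assumes the usual paired-t form: the absolute mean of the pointwise STLmax differences divided by their sample standard deviation over the square root of the window length. The function name and the toy inputs are hypothetical.

```python
import math
from statistics import mean, stdev

def t_index(stl_i, stl_j):
    """Paired-t style T-index between two equally long STLmax windows.

    Assumed form: |mean(d)| / (stdev(d) / sqrt(n)), where d holds the
    pointwise differences. Undefined when all differences are identical
    (zero standard deviation).
    """
    d = [a - b for a, b in zip(stl_i, stl_j)]
    n = len(d)
    return abs(mean(d)) / (stdev(d) / math.sqrt(n))

# Toy windows of 4 points (the talk uses 60 points, i.e. about 10 minutes).
print(t_index([1, 2, 3, 4], [0, 2, 2, 4]))
```

A small T-index means the two sites have statistically similar STLmax values over the window, which is what the talk calls convergence.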

30 Statistically Quantifying the Convergence

31 Convergence of STL max

32 Why Feature Selection? Not every electrode site shows the convergence. Feature Selection: Select the electrodes that are most likely to show the convergence preceding the next seizure.

33 Feature Selection Quadratic Integer Programming with Quadratic Constraints

34 Optimization Problem. We apply optimization techniques to find a group of electrode sites such that (1) they are the most converged (in STLmax) electrode sites during the 10-min window before the seizure, and (2) they show dynamical resetting (divergence in STLmax) during the 10-min window after the seizure. Such electrode sites are defined as “critical electrode sites”. Hypothesis: The critical electrode sites should be most likely to show the convergence in STLmax again before the next seizure.

35 Notation and Modeling. x is an n-dimensional column vector of decision variables, where each x_i represents electrode site i: x_i = 1 if electrode i is selected as one of the critical electrode sites, and x_i = 0 otherwise. Q is an (n × n) matrix whose element q_ij is the T-index between electrodes i and j during the 10-minute window before a seizure. b is an integer constant (the number of critical electrode sites). D is an (n × n) matrix whose element d_ij is the T-index between electrodes i and j during the 10-minute window after a seizure. α = 2.662*b*(b-1) is a constant, where 2.662 is the critical value of the T-index, as previously defined, for rejecting H_0: “two brain sites acquire identical STLmax values within a 10-minute window”.

36 Multi-Quadratic Integer Programming. To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with: an objective function to minimize the average T-index among electrode sites; a linear constraint to identify the number of critical electrode sites; and a quadratic constraint to ensure that the selected electrode sites show the dynamical resetting.
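Read together with the notation slide, the model can plausibly be written as: minimize x'Qx subject to sum(x_i) = b, x'Dx ≥ α, x binary. The sketch below solves that form by plain enumeration for tiny instances. It is not the linearization approach of the talk; the function name and matrices are hypothetical, and it assumes Q and D are symmetric with the diagonal ignored.

```python
from itertools import combinations

T_CRITICAL = 2.662  # critical T-index value quoted in the talk

def select_critical_sites(Q, D, b):
    """Enumerate all b-subsets of sites and return (sites, objective).

    Objective: sum of pre-seizure T-indices q_ij over ordered pairs of
    selected sites (x'Qx without the diagonal). Feasibility: the same sum
    over post-seizure T-indices d_ij must reach alpha = 2.662 * b * (b - 1).
    Returns (None, inf) if no subset is feasible.
    """
    n = len(Q)
    alpha = T_CRITICAL * b * (b - 1)
    best_sites, best_value = None, float("inf")
    for sites in combinations(range(n), b):
        pre = sum(Q[i][j] for i in sites for j in sites if i != j)
        post = sum(D[i][j] for i in sites for j in sites if i != j)
        if post >= alpha and pre < best_value:
            best_sites, best_value = sites, pre
    return best_sites, best_value

# Hypothetical 4-site example.
Q = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 2], [5, 5, 2, 0]]
D = [[0, 1, 9, 9], [1, 0, 9, 9], [9, 9, 0, 4], [9, 9, 4, 0]]
print(select_critical_sites(Q, D, 2))
```

In this toy instance sites 0 and 1 are the most converged before the seizure but fail the resetting constraint, so the pair (2, 3) is selected instead. Enumeration grows combinatorially with n, which is the motivation for the MILP reformulation discussed next in the talk.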

37 Conventional Linearization Approach for Multi-Quadratic 0-1 Problem

38 Theoretical Results: MILP formulation for the MQIP problem. Consider the MQIP problem. We proved that the MQIP program is EQUIVALENT to a MILP problem with the SAME number of integer variables.

39 Empirical Results: Performance on Larger Problems

40 Hypothesis Testing - Simulation. Hypothesis: The critical electrode sites should be most likely to show the convergence in STLmax (drop in T-index below the critical value) again before the next seizure. The critical electrode sites are the electrode sites that (1) are the most converged (in STLmax) during the 10-min window before the seizure and (2) show dynamical resetting (divergence in STLmax) during the 10-min window after the seizure. Simulation: Based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (drop in T-index below the critical value) before the next seizure between (a) the critical electrode sites and (b) randomly selected electrode sites (5,000 times).

41 Optimal VS Non-Optimal

42 Simulation - Results

43 Statistical Process Control: How to automate the system?

44 Automated Seizure Warning System (ASWS). EEG signals → continuously calculate STLmax from multi-channel EEG → select critical electrode sites after every subsequent seizure → monitor the average T-index of the critical electrodes → give a warning when the T-index value drops below a critical value.

45 Data Characteristics

46 Performance Evaluation for ASWS. To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after the warning. Sensitivity = fraction of seizures preceded by a true warning. False Prediction Rate = average number of false warnings per hour.
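The evaluation rule above can be sketched in code. The sensitivity formula is missing from the transcript, so this assumes the standard reading: the fraction of seizures that had at least one warning in the preceding 3 hours. Function and parameter names, and the toy timeline, are hypothetical.

```python
def evaluate_warnings(warning_times, seizure_times, total_hours, horizon_hours=3.0):
    """Return (sensitivity, false_prediction_rate) for a warning log.

    All times are in hours from the start of the recording. A warning is
    true if some seizure starts within `horizon_hours` after it; a seizure
    counts as predicted if some warning precedes it within that horizon.
    """
    def covers(w, s):
        return 0 < s - w <= horizon_hours

    predicted = sum(any(covers(w, s) for w in warning_times) for s in seizure_times)
    false_warnings = sum(not any(covers(w, s) for s in seizure_times) for w in warning_times)
    sensitivity = predicted / len(seizure_times)
    false_prediction_rate = false_warnings / total_hours
    return sensitivity, false_prediction_rate

# Hypothetical 40-hour recording with three warnings and two seizures.
print(evaluate_warnings([1.0, 10.0, 20.0], [2.5, 30.0], total_hours=40.0))
```

Changing `horizon_hours` trades sensitivity against the false prediction rate, which is the trade-off the ROC slide later visualizes.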

47 Performance characteristics of automated seizure warning algorithm with the best parameter-settings of training data set. Training Results

48 Receiver Operating Characteristics (ROC). The ROC (receiver operating characteristic) curve is used to indicate an appropriate trade-off that one can achieve between the false positive rate (1-Specificity, plotted on the X-axis), which needs to be minimized, and the detection rate (Sensitivity, plotted on the Y-axis), which needs to be maximized.

49 Test Results Performance characteristics of automated seizure warning algorithm with the best parameter settings on testing data set.

50 Validation of the ASWS algorithm Temporal Properties Surrogate Seizure Time Data Set 100 Surrogate Data Sets Spatial Properties Non-Optimized ASWS – Selecting non-optimal electrode sites 100 Randomly Selected Electrodes

51 Prediction Scores: Surrogate Data and Non-Optimized ASWS

52 Remarks. Optimization as feature selection for brain monitoring. Developed an online real-time seizure prediction system. Tested on a dataset of 10 patients suffering from temporal lobe seizures: ~90 days (2100 hours) of EEG data, 58 seizures. Seizure prediction: predicting ~70% of temporal lobe seizures on average, with a false alarm rate of ~0.16 per hour on average. What’s next? Fundamental questions on brain physiology.

53 Time Series Classification I Support Vector Machines with Dynamic Time Warping

54 Other Dynamical Measures: Phase Profiles

55 Other Dynamical Measures: Entropy H of Attractor

56 Classification of Physiological States

57 Support Vector Machines From 1 electrode

58 Input. Standard SVM input: 30 electrodes × 30 data points × 3 dynamical features = 2,700 features. Time series SVM input: 30*29 data pairs × 3 dynamical features = 2,700 – 90 = 2,610 features.

59 Dynamic Time Warping

60 Preliminary Data Set 132 5-minute epochs of pre-seizure EEGs 300 5-minute epochs of normal EEGs Pre-seizure = 0-30 minutes before seizure Normal = 10 hours away from seizure

61 Metrics for Performance Evaluation

                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL CLASS  Class=Yes   a            b
              Class=No    c            d

a: TP (true positive); b: FN (false negative); c: FP (false positive); d: TN (true negative)

62 Sensitivity and Specificity Sensitivity measures the fraction of positive cases that are classified as positive. Specificity measures the fraction of negative cases classified as negative. Sensitivity = TP/(TP+FN) Specificity = TN/(TN+FP) Sensitivity can be considered as a detection (prediction or classification) rate that one wants to maximize. Maximize the probability of correctly classifying patient states. False positive rate can be considered as 1-Specificity which one wants to minimize.
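The two formulas on this slide translate directly into code. This is a minimal sketch with a hypothetical function name and made-up counts; it also returns the false positive rate as 1 - specificity, as described above.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity, false_positive_rate).

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP),
    false positive rate = 1 - specificity.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, 1 - specificity

# Hypothetical confusion-matrix counts (a, b, c, d in the previous slide's notation).
print(sensitivity_specificity(tp=8, fn=2, fp=5, tn=15))
```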

63 Leave-one-out Cross Validation Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies. K-fold cross validation: Divide all the data into k subsets of equal size. Train a classifier using k-1 groups of training data. Test a classifier on the omitted subset. Iterate k times.
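The k-fold procedure listed above can be sketched as an index generator. Leave-one-out cross validation is the special case where k equals the number of samples. The function name is hypothetical, and the sketch keeps the samples in their original order rather than shuffling, which a real experiment would usually do first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 5-fold split of 10 samples; use k_fold_splits(n, n) for leave-one-out.
for train_idx, test_idx in k_fold_splits(10, 5):
    print(test_idx, train_idx)
```

A classifier would be trained on `train_idx` and scored on `test_idx` in each iteration, and the k scores averaged.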

64 Empirical Results

65 Automated Seizure Prediction Paradigm. Diagram labels: Multichannel Data Acquisition; Feature Extraction / Cluster Analysis; Pattern Recognition; User Interface Technology; Initiate a variety of therapies (e.g., electrical stimulation, drug injection); VNS; Drug.

66 Related Patents Multi-dimensional multi-parameter time series processing for seizure warning and prediction Patent 7,263,467 (Issued on August 28, 2007). Optimization of Multi-dimensional Time Series Processing for seizure warning and prediction Patent 7,373,199 (Issued on May 13, 2008). Optimization of spatio-temporal pattern processing for seizure warning and prediction Patent 7,461,045 (Issued on December 2, 2008). Multi-dimensional dynamical analysis U.S. Utility Patent application filed on December 21, 2006, Serial No.: 11/339,606. Closed-Loop State-Dependent Seizure Prevention Systems U.S. Utility Patent application filed on December 19, 2006, Serial No.: 11/641,292.

67 Brain Network Models Brain Connectivity Networks Based on fMRI Data

68 The Problem. Certain neurological diseases are very difficult to diagnose at early stages. The Functional Magnetic Resonance Imaging (fMRI) technique provides a vast amount of information about the structure and function of the human brain, but there is a lack of methods to analyse these data. Computational methods and algorithms based on mathematical models should be applied in order to find and recognize key patterns in this “ocean” of data.

69 Network models of human brain Partition of the brain into regions of interest Functional interconnections between regions in brain Network Models

70 Connectivity Networks

71 Blood flow level as an indicator of neuronal activity Representation of values of signal in spatial voxels as 2D and 3D images MRI Data

72 fMRI Data. The measurements are performed every 2 seconds over 6 minutes for each brain voxel of size 2mm x 2mm x 2mm. The fMRI data is therefore a set of time series, each corresponding to a particular elementary volume of the brain. In our data set each series contains 180 elements.

73 fMRI Data, Vector Representation Z X Y 0 (x, y, z)

74 Small World Networks. The small-world phenomenon was first described by Stanley Milgram in the 1960s. “Six degrees of separation”; “Erdos number”.

75 From Random Graph to Regular Lattice. Random graphs generally have a low mean shortest path length and a low clustering coefficient. A regular lattice has a high mean shortest path length and a high clustering coefficient. Small world networks have a low mean shortest path length while still having a high clustering coefficient.

76 Random Graph vs Regular Lattice

77 Small World Network

78 Quantitative Measures of “Small World” Property: characteristic path length; clustering coefficient; global efficiency; nodal efficiency.
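The four measures named on this slide can be sketched for an unweighted, connected graph as follows. The slide's own formulas are not in the transcript, so these are the commonly used textbook definitions (efficiency as the mean inverse shortest path length), and the talk's brain networks are weighted, which this sketch does not cover. All function names and the ring-lattice test graph are illustrative assumptions.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search distances (in hops) from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Mean shortest path length over all ordered node pairs (connected graph)."""
    n = len(adj)
    total = sum(d for u in adj for d in shortest_path_lengths(adj, u).values())
    return total / (n * (n - 1))

def nodal_efficiency(adj, u):
    """Mean inverse distance from node u to every other reachable node."""
    dist = shortest_path_lengths(adj, u)
    return sum(1 / d for v, d in dist.items() if v != u) / (len(adj) - 1)

def global_efficiency(adj):
    """Average of the nodal efficiencies."""
    return sum(nodal_efficiency(adj, u) for u in adj) / len(adj)

def clustering_coefficient(adj):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

def ring_lattice(n, k):
    """Regular ring lattice: each node is linked to k neighbours on each side."""
    return {u: {(u + s) % n for s in range(-k, k + 1) if s != 0} for u in range(n)}

lattice = ring_lattice(12, 2)
print(characteristic_path_length(lattice), clustering_coefficient(lattice),
      global_efficiency(lattice))
```

Comparing these values for a lattice, a random graph, and a partially rewired lattice is the usual way to show the small-world regime: short paths combined with high clustering.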

79 Brain Connectivity Networks. Brain connectivity networks possess small world properties. We predicted that network characteristics, such as global and local efficiency values, would be decreased for people with Parkinson’s disease.

80 How to define brain regions – nodes in the network? Clustering problem Standard MNI template Nodes in Connectivity Network

81 Signal Time Series Form Clusters time

82 Clustering Problem. Each data set contains roughly 100,000 time series, each of which consists of 180 elements. Efficient algorithms should be developed in order to solve this problem.

83 Partition of the brain into 116 brain regions Standard MNI Brain Atlas

84 Weighted graph with nodes corresponding to MNI brain regions Weights of edges defined based on correlation between averaged neural activity over the regions Edges in Connectivity Network

85 Neural activity Head movements during the MR session Respiratory and heart rhythms Noise Signal Processing

86 Maximal Overlap Discrete Wavelet Transform. A wavelet is a “small wave”. The wavelet transform is a decomposition of the initial signal into a linear combination of wavelets.

87 Time Series Decomposition by Wavelets

88 Wavelet Coefficients Correlation. Inter-regional correlations in resting state fMRI data are particularly salient at frequencies below 0.1 Hz. Second scale wavelet coefficients correspond to the 0.06 – 0.12 Hz frequency range.

89 Connectivity Strength. Take the signal vectors averaged over the regions. Define the level 2 wavelet coefficients of the averaged signals. The connectivity between regions A and B is defined from the correlation between these coefficients.

90 Definition of Distance Between Nodes. For each time series S = {s_1, s_2, …, s_n} of size n there is a corresponding point in n-dimensional space. For normalized (zero-mean, unit-length) vectors x and y, the distance between their end points is equal to $\|x - y\| = \sqrt{2\,(1 - \mathrm{corr}(x, y))}$. Therefore, (1 – corr(x,y)) may serve as a measure of distance between time series.
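The link between correlation and Euclidean distance can be checked numerically. The sketch below assumes that "normal vectors" means series that have been mean-centered and scaled to unit length; under that assumption the squared distance between two such vectors equals 2(1 - corr). Function names and sample series are hypothetical.

```python
import math

def normalize(series):
    """Center a series to zero mean and scale it to unit Euclidean length."""
    m = sum(series) / len(series)
    centered = [s - m for s in series]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

def correlation(x, y):
    """Pearson correlation, computed as the dot product of normalized series."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))

def distance(x, y):
    """Euclidean distance between the end points of the normalized series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(normalize(x), normalize(y))))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
print(distance(x, y) ** 2, 2 * (1 - correlation(x, y)))
```

Because the two printed quantities agree, ranking pairs of time series by (1 - corr) gives the same ordering as ranking them by geometric distance.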

91 Geometrical Representation x y 0 x - y S = (s 1, s 2, …, s n )

92 Data Set. 15 healthy controls, 14 Parkinson patients. The network for each subject consists of 116 nodes.

93 Averaged Connectivity Networks. Panels: Control, Parkinson

94 Global Network Efficiency Values Control (1.85 +/- 0.57), Parkinson (1.12 +/- 0.55), independent t-test p-value = 0.0017

95 Top 30 Nodal Efficiency Values

96 Nodal Efficiency Plot Red line – Control set, blue line - PD set

97 Discussion. Brain network properties in Parkinson’s disease show measurable alterations in comparison with those of healthy subjects. Further research, in particular with different network models, may reveal patterns in brain networks that could be used as a diagnostic criterion.

98 Concluding Remarks Overview of Epilepsy Research Applications of Data Mining and Optimization Techniques Interplay between theory and application Feature Selection Time Series Classification Brain Clustering Brain Network Models

99 Related Patents Sensor registration by global optimization procedures Patent 7,653,513 (Issued January 26, 2010). Atomic Magnetometer Sensor Array Magnetoencephalogram Systems and Method United States Patent Application 20100219820 (Filed April 14, 2008)

100 References Handbook of Massive Data Sets, co-editors: J. Abello, P.M. Pardalos, and M. Resende, Kluwer Academic Publishers, (2002).

101 References “Feature Selection for Consistent Biclustering via Fractional 0-1 Programming” (with Stanislav Busygin and Oleg A. Prokopyev), Journal of Combinatorial Optimization, Volume 10, Number 1 (2005), pp. 7-21. “Biclustering in Data Mining” (with S. Busygin, and O. Prokopyev), Computers & Operations Research, Volume 35, Issue 9 (2008), pp. 2964-2987. “On Biclustering with Features Selection for Microarray Data Set” (with S. Busygin and O. Prokopyev), In (BIOMAT 2005) Proceedings of the International Symposium on Mathematical and Computational Biology (Edited by R. Mondaini & R. Dilao), World Scientific (2006), pp. 367- 377. “Biclustering: algorithms and applications in data mining and forecasting” (with P. Xanthopoulos, N. Boyko and N. Fan) In Encyclopedia of Operations Research and Management Science (accepted to appear) Wiley(2010). “Clustering Challenges on Biological Networks” S. Butenko, W. A. Chaovalitwongse and P. M. Pardalos, World Scientific (2009).

102 Quantitative Neuroscience, co-editors: P.M. Pardalos, C. Sackellares, P. Carney, and L. Iasemidis, Kluwer Academic Publishers, (2004). Biocomputing, co-editors: P.M. Pardalos and J. Principe, Kluwer Academic Publishers, (2002). References

103 New in 2010: Computational Neuroscience, co-editors: W.A. Chaovalitwongse, P.M. Pardalos, P. Xanthopoulos (Eds.) Series: Springer Optimization and Its Applications, Vol. 38.

104 References Optimization in Medicine, Carlos Alves, Panos M. Pardalos, Luis Vicente (Eds.), 2008

105 References Handbook of Optimization in Medicine, Panos M. Pardalos, Edwin H. Romeijn (Eds.), 2009

106 References
W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Electroencephalogram (EEG) time series classification: Applications in epilepsy. Annals of Operations Research, 148, 1 (2006), pp. 227-250.
Jicong Zhang, Petros Xanthopoulos, Chang-Chia Liu, Panos M. Pardalos. Real-time differentiation of nonconvulsive status epilepticus from other encephalopathies using quantitative EEG analysis: A pilot study. Epilepsia, 51, 2 (2010), pp. 243-250.
W. Chaovalitwongse, P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20 (2-3): 383-394, 2005.
L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Journal of Clinical Neurophysiology, 116 (3): 532-544, 2005.
P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004. (INFORMS Pierskalla Best Paper Award 2004)
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004. (Rank 5th in Top 25 Articles in Operations Research Letters)

107 Thank you for your attention! Questions?

108 Conference in 2011
