
Statistical Comparison of REF vs. LE Referenced EEG Signals
Aaron Gross, Silvia Lopez, Dr. Iyad Obeid and Dr. Joseph Picone
The Neural Engineering Data Consortium, College of Engineering, Temple University
www.nedcdata.org

Abstract
High performance automatic labeling of EEG signals for clinical applications requires statistical modeling systems that are trained on large amounts of data. The Temple University Hospital EEG Corpus (TUH EEG) is the world’s largest publicly available resource of its kind and contains over 28,000 EEG records. However, it is the nature of clinical data that there are many forms of inconsistency, including channel labels, sampling rates and electrode positioning. In this study, we examine one such issue: 51% of the recordings in TUH EEG are REF referenced, while the other 49% are LE referenced. It is unclear whether this difference will have an effect on statistical machine learning algorithms trained using the data. A rudimentary statistical analysis suggests that the means and variances of features generated from these data are significantly different, and that some form of postprocessing will be required if the data are to be used together to train a single statistical model.

Introduction
Since EEG signals are fundamentally voltages, each channel must be measured with respect to some common reference voltage for a meaningful comparison of the channels to be possible. In TUH EEG:
• 51% of the recordings are REF referenced: an electrode on the scalp is used as the recording reference, and the mean of all the electrodes is then used to create an average reference.
• 49% of the recordings are LE referenced: voltages are measured using the left ear lobe as the reference.

Figure 1. The International 10-20 electrode placement system
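To make the two referencing schemes concrete, here is a minimal sketch of how each could be computed from a raw multichannel recording. This is an illustrative reconstruction, not the corpus’ actual preprocessing pipeline; the array layout, channel count and sampling rate are assumptions.

```python
import numpy as np

def average_reference(signals):
    """REF-style (average) referencing: subtract the mean across all
    electrodes from every channel at each time sample."""
    # signals: (n_channels, n_samples) array of raw voltages
    return signals - signals.mean(axis=0, keepdims=True)

def left_ear_reference(signals, le_index):
    """LE-style referencing: subtract the left ear lobe channel
    (assumed to be stored at row le_index) from every channel."""
    return signals - signals[le_index]

# Hypothetical example: 22 channels, 10 seconds sampled at 250 Hz.
x = np.random.randn(22, 2500)
x_ref = average_reference(x)       # REF referenced
x_le = left_ear_reference(x, 21)   # LE referenced
```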
The features calculated from these two types of recordings could carry a systematic bias that would have a substantial negative impact on our machine learning algorithms. To probe this issue, we performed two types of statistical analyses:
• Simple descriptive statistical measures, such as the mean and variance that form the basis of our feature extraction process, were analyzed.
• More advanced measures based on Principal Component Analysis (PCA) were employed to analyze how the variance in the data is distributed as its dimensionality is varied.

Feature Extraction
Feature extraction generates a vector of measurements 10 times a second using a 200 msec analysis window. For each frame, 9 base features are computed:
• frequency domain energy (Ef)
• 7 cepstral coefficients (c1-c7)
• differential energy (Ed)
From these base features, derivative and second derivative terms are calculated, excluding the second derivative of the differential energy term. Any systematic bias in the statistics can be amplified by this differentiation process, so the feature extraction process could yield features that are incompatible across the two reference classes. If so, normalization techniques must be used; the goal of this study is to investigate the need for such techniques.

Figure 2. Frame-based feature extraction
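A minimal sketch of the framing and differentiation steps is shown below. The 10 Hz frame rate and 200 msec window come from the text; the simple first-difference delta and the omission of the cepstral computation itself are assumptions made for brevity, since the poster does not specify those details.

```python
import numpy as np

FRAME_RATE = 10    # frames per second (from the text)
WINDOW_SEC = 0.2   # 200 msec analysis window (from the text)

def frame_signal(x, fs):
    """Slice one channel into overlapping 200 msec analysis windows,
    advancing by 1/FRAME_RATE seconds per frame."""
    win, step = int(WINDOW_SEC * fs), int(fs / FRAME_RATE)
    n_frames = 1 + (len(x) - win) // step
    return np.stack([x[i*step : i*step + win] for i in range(n_frames)])

def delta(features):
    """First-difference approximation of the derivative terms (the
    actual regression window used is not specified in the poster)."""
    return np.diff(features, axis=0, prepend=features[:1])

# frames = frame_signal(channel, fs); each frame would then yield the
# 9 base features (Ef, c1-c7, Ed), to which delta() and delta(delta())
# terms are appended, skipping the second derivative of Ed.
```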
Methodology
Descriptive Statistics:
• The mean and variance of each element of the feature vector were calculated for both subsets of the data to determine whether there were systematic biases in the vectors.
• These were also compared to the global mean of the pooled data to determine the significance and direction of any bias.
PCA Analysis:
• The covariance of the base features was calculated, resulting in a 9x9 covariance matrix, and the eigenvectors and eigenvalues of this matrix were computed.
• Our baseline technology, which uses hidden Markov models, typically assumes the features are uncorrelated and models only the diagonal elements of this covariance matrix; this is known as variance normalization.
• A major goal of this study was to assess the extent to which the features are correlated and cross-correlated, so that we can assess the need for something more sophisticated than a simple variance normalization approach (see the sketch after this list).
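Both analyses reduce to a few lines of linear algebra. The sketch below is a minimal illustration under assumed inputs: feats_ref and feats_le are hypothetical (n_frames x 9) arrays of the base features for each class, filled with random data here only so the example runs.

```python
import numpy as np

# Hypothetical per-class feature matrices: one row per frame,
# one column per base feature (Ef, c1-c7, Ed).
feats_ref = np.random.randn(100000, 9)
feats_le = np.random.randn(100000, 9)

# Descriptive statistics: per-feature mean and variance for each
# class, compared against the global mean of the pooled data.
pooled = np.vstack([feats_ref, feats_le])
for name, f in [("REF", feats_ref), ("LE", feats_le)]:
    print(name, f.mean(axis=0), f.var(axis=0))
print("pooled mean:", pooled.mean(axis=0))

# PCA analysis: eigendecomposition of the 9x9 covariance matrix.
# eigh is used because a covariance matrix is symmetric.
for name, f in [("REF", feats_ref), ("LE", feats_le)]:
    cov = np.cov(f, rowvar=False)           # 9x9 covariance matrix
    evals, evecs = np.linalg.eigh(cov)      # ascending eigenvalues
    explained = evals[::-1] / evals.sum()   # variance per component
    print(name, explained)
```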
Descriptive Statistics
The means of all but the c3 feature exhibit a significant difference between the two classes. The variances of each feature, on the other hand, tend to be fairly close, with the exception of the energy feature.

Table 1. The mean and variance of the base features.

Feature   Mean (REF)   Mean (LE)   Variance (REF)   Variance (LE)
Ef           13.4332      15.1166           7.4078        37.5394
c1            1.9400       3.0693           1.9488         1.6702
c2            0.8939       1.4836           0.9112         0.8668
c3            0.3581       0.3924           0.2636         0.2273
c4            0.0055      -0.1224           0.1372         0.1692
c5            0.0287      -0.0544           0.0543         0.0553
c6           -0.0435       0.0022           0.0275         0.0390
c7            0.0412       0.0743           0.0185         0.0260
Ed            2.6291       1.7141           4.0941         3.4898

PCA
For the most part, the components are similar for both classes. A notable exception is the sixth component of the third eigenvector, which indicates a problem with polarity.

Figure 3. Variance as a function of the magnitude of the eigenvalues for the REF and LE classes
Figure 4. Covariance matrix eigenvector components

Mean and Variance Normalization
In mean and variance normalization, features are transformed to have zero mean and unit variance. This process is important for a number of reasons. Decorrelated features with unit variance along each dimension have an identity covariance matrix, meaning the features are uncorrelated (and, for Gaussian models, statistically independent); this makes factoring a joint probability distribution simpler. Mean and variance normalization is also an important step before performing PCA: since PCA depends on the magnitudes of the variances in the data, feature scaling reduces the bias towards larger features. Similarly, in processes like gradient descent, larger features cause certain weights to update faster than others. In summary, mean and variance normalization of features is an important step in machine learning that avoids many problems that can arise during training, and generally leads to improved performance. The current baseline system uses a single global whitening transformation on the features; this portion of the experiment is ongoing.
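A minimal sketch of per-feature normalization and a global whitening transform follows. It illustrates the standard technique rather than the baseline system’s actual code, and the input array is hypothetical.

```python
import numpy as np

def mean_variance_normalize(feats):
    """Transform each feature column to zero mean, unit variance."""
    return (feats - feats.mean(axis=0)) / feats.std(axis=0)

def whiten(feats, eps=1e-8):
    """Global whitening: rotate into the PCA basis and rescale so the
    covariance of the output is (approximately) the identity."""
    centered = feats - feats.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    return centered @ evecs / np.sqrt(evals + eps)

# Hypothetical usage on pooled REF + LE features:
feats = np.random.randn(1000, 9)
print(np.cov(whiten(feats), rowvar=False).round(2))  # ~ identity
```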


Summary
Systematic biases, such as those found in the differences in mean and variance between the REF and LE classes, can cause incompatibilities between the features generated from the two. Normalization can be used to overcome such incompatibilities, and generally improves performance. This analysis was performed for only the first channel of data, i.e., one location on the scalp. This poses a problem because channel features could change depending on how near or far the channel electrodes are from the reference. To obtain more conclusive results, the analysis will be repeated for several other channels to measure the degree of channel mismatch for each class.

Acknowledgements
This research was funded by the Temple University Honors Program’s Merit Scholar Stipend Program.