
Results from Mean and Variance Calculations

The overall mean of the data for all features was 2.2290 for the REF class and 2.4612 for the LE class. The overall variance of the data for all features was 19.1401 for the REF class and 26.1208 for the LE class. These statistics were calculated only for the first channel's worth of features.

Abstract

The Neural Engineering Data Consortium (NEDC) is researching seizure detection through the analysis of electroencephalogram (EEG) signals. The most significant body of data used by NEDC is the Temple University Hospital electroencephalogram (TUH EEG) corpus, which contains over 28,00 EEG records. There are many forms of inconsistency in the TUH corpus, one of which is that about 51% of the recordings are REF referenced, while the other 49% are LE referenced. It is unclear whether the difference in reference will have an effect on any statistical machine learning algorithms trained using the data. Several forms of statistical analysis on the features extracted from these signals suggest that the two types of referencing are compatible, but more testing needs to be done before a conclusion can be reached.
Statistical Comparison of REF vs. LE referenced EEG signals

Aaron Gross, Silvia Lopez, Dr. Iyad Obeid and Dr. Joseph Picone
The Neural Engineering Data Consortium, Temple University

Introduction

Since EEG signals are fundamentally voltages, each channel needs to be measured with respect to some common reference voltage before the channels can be meaningfully compared. Some of the EEG recordings in the TUH corpus are REF referenced, others are LE referenced. LE signals are referenced via an electrode on the left ear, but REF is a vague descriptor that could mean any number of referencing schemes. The difference in referencing scheme could have a significant effect on the features extracted from the signals, and consequently on the machine learning algorithms trained on those features. There are several avenues through which to compare the two different classes of signals. In this case, both simple statistical methods, such as taking the mean and variance, and the more complicated process of principal component analysis (PCA) were employed.
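To make the role of the reference concrete, here is a minimal sketch using synthetic data, not TUH recordings. The arrays `scalp`, `ref_a` and `ref_b` are hypothetical stand-ins for electrode potentials and two candidate reference sites. It illustrates that changing the reference shifts every channel by the same time-varying offset, so any feature computed on a single channel (such as energy) can change, while differences between channels do not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical potentials: 4 scalp electrodes, 1000 samples each.
scalp = rng.normal(size=(4, 1000))
# Hypothetical potentials at two different reference sites.
ref_a = rng.normal(size=1000)  # e.g. an ear electrode
ref_b = rng.normal(size=1000)  # some other referencing scheme

# A recorded channel is the electrode potential minus the reference.
recorded_a = scalp - ref_a
recorded_b = scalp - ref_b

# The two recordings differ by the same offset on every channel.
offset = recorded_a - recorded_b

# Differences between two channels do not depend on the reference.
diff_a = recorded_a[0] - recorded_a[1]
diff_b = recorded_b[0] - recorded_b[1]

# Per-channel energy, however, generally does depend on it.
energy_a = np.mean(recorded_a ** 2, axis=1)
energy_b = np.mean(recorded_b ** 2, axis=1)
```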
College of Engineering
Temple University

Results from PCA

[These figures are placeholders, showing the correct results, but need to be remade so the text is more visible.] The percent variance plots for both classes are very similar.

Summary

The overall variance of the LE class is much higher than that of the REF class, although most of this variance seems to arise in the energy feature. The means of the REF and LE classes are very similar for each feature, as well as being close overall. The percent variance curves obtained from performing PCA look the same. These curves were also obtained after performing mean-variance normalization on the features, indicating that this step may make the REF and LE features functionally the same. This analysis was performed for only the first channel of data, meaning one location on the scalp. To obtain more conclusive results, the analysis will be repeated for several other channels.

Acknowledgements

Thanks to the Temple University Honors Program, which funded this research via the Merit Stipend Summer Scholar program.
Principal Component Analysis

Principal component analysis (PCA) is a method by which higher-dimensional clusters of points can be projected onto a lower-dimensional space, preserving the components of the vectors that account for most of the variance in the data. PCA is performed by calculating the covariance matrix of a set of vectors and using the eigenvectors corresponding to the largest eigenvalues to perform a linear transformation on the data and then project it into an n-dimensional space. The magnitude of each eigenvalue of this matrix is proportional to the percent variance in the data accounted for by the corresponding eigenvector. Thus, the distribution of the variance of the data over the eigenvectors can be plotted for multiple dimensions to examine how the distribution of the data varies.
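The PCA procedure described here can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic two-dimensional points, not the code used for the poster. The function name `pca_percent_variance` and the test data are assumptions introduced for the example.

```python
import numpy as np

def pca_percent_variance(data, n_components):
    """Project the rows of `data` onto the top principal components.

    Returns the projected data and the percent of the total variance
    accounted for by each retained component.
    """
    centered = data - data.mean(axis=0)
    # Covariance matrix with columns treated as variables.
    cov = np.cov(centered, rowvar=False)
    # eigh is suited to symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Linear transformation onto the leading eigenvectors.
    projected = centered @ eigvecs[:, :n_components]
    percent = 100.0 * eigvals[:n_components] / eigvals.sum()
    return projected, percent

# Synthetic, correlated 2-D points (hypothetical data).
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

projected, percent = pca_percent_variance(points, n_components=1)
_, percent_all = pca_percent_variance(points, n_components=2)
cumulative = np.cumsum(percent_all)  # cumulative percent variance curve
```

Plotting `cumulative` against the component index gives the kind of cumulative percent variance curve that the poster's PCA figures report.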
Methodology

For the mean and variance calculations: For each class of EEG, a supervector was formed by vertically concatenating the 400th through 1000th rows in the first nine elements of the feature vectors, creating a long matrix with nine columns. Next, the mean and the variance of each column were found. Finally, the overall mean and variance were found.

For the PCA: For each class of EEG, a supervector was formed by horizontally concatenating the 400th through 1000th observations in the first nine elements of the feature vectors, creating a 600 x (9*n) matrix, where n is the number of files of each class. PCA was then used to reduce the dimensionality of the data for each class. The final dimension count was varied between 20 and 220 in increments of 50. Once the number of dimensions was reduced, the percent variance as a function of the eigenvalues of the covariance matrix was plotted.
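The two supervector constructions can be sketched as follows. This is a rough illustration under several assumptions, not the original analysis code: each file is taken to be a (frames x features) array, rows are selected with the zero-based slice `400:1000` (600 rows, matching the stated 600 x (9*n) shape), the "overall" statistics are computed over all pooled values, and the variance is the population variance (NumPy's default). The function names and the random input are hypothetical.

```python
import numpy as np

def select_frames(feature_files, first=400, last=1000, n_features=9):
    """Keep rows first..last-1 and the first n_features columns of each file."""
    return [np.asarray(f)[first:last, :n_features] for f in feature_files]

def mean_variance_summary(feature_files):
    """Vertical concatenation: one long matrix with nine columns."""
    stacked = np.vstack(select_frames(feature_files))
    return {
        "column_mean": stacked.mean(axis=0),
        "column_var": stacked.var(axis=0),
        # Assumption: "overall" means pooled over every value.
        "overall_mean": stacked.mean(),
        "overall_var": stacked.var(),
    }

def pca_supervector(feature_files):
    """Horizontal concatenation: a 600 x (9*n) matrix for n files."""
    return np.hstack(select_frames(feature_files))

# Hypothetical input: three files of random features, 1200 frames x 12 features.
rng = np.random.default_rng(0)
files = [rng.normal(size=(1200, 12)) for _ in range(3)]

summary = mean_variance_summary(files)
wide = pca_supervector(files)
```

Because every column contributes the same number of values, the pooled overall mean equals the average of the column means. The pooled overall variance is generally not the average of the column variances, which is why the choice of definition is flagged as an assumption.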
Figure 1. A simplified electrode placement diagram, showing an LE reference system.

Figure 2. Eigenvectors of the covariance matrix for a set of points in two-dimensional space.

Figure 3. The mean of the first eight features for both classes.

Feature    REF       LE
Energy     13.804    15.402
Mfcc1       2.181     3.099
Mfcc2       0.904     1.516
Mfcc3       0.330     0.399
Mfcc4      -0.015    -0.134
Mfcc5       0.0200   -0.063
Mfcc6      -0.039     0.009
Mfcc7       0.045     0.075

Figure 4. The variance of the first eight features for both classes.

Feature    REF       LE
Energy      6.806    27.401
Mfcc1       1.849     1.721
Mfcc2       0.875     0.898
Mfcc3       0.262     0.224
Mfcc4       0.140     0.174
Mfcc5       0.054    [one value missing]
Mfcc6       0.028     0.043
Mfcc7       0.019     0.026

Figure 5. Cumulative percent variance as a function of the eigenvalue number for varying dimensions of REF class features.

Figure 6. The same as Figure 5, but for LE class features.

www.isip.piconepress.com