THE TUH EEG CORPUS: A Big Data Resource for Automated EEG Interpretation A. Harati, S. López, I. Obeid and J. Picone Neural Engineering Data Consortium.

Presentation transcript:

THE TUH EEG CORPUS: A Big Data Resource for Automated EEG Interpretation
A. Harati, S. López, I. Obeid and J. Picone, Neural Engineering Data Consortium, Temple University
M. P. Jacobson, M.D. and S. Tobochnik, Department of Neurology, Lewis Katz School of Medicine, Temple University

S. Lopez: Automatic Interpretation of EEGs December 13,

Manual Interpretation of EEGs
• A technician administers a 30-minute recording session.
• An EEG specialist (neurologist) interprets the EEG.
• An EEG report is generated with the diagnosis.
• The patient is billed once the report is coded and signed off.

Automatic Interpretation
• Machine learning is used to map signals to event and epoch labels.
• Algorithms typically require “truth-marked” data for supervised learning.
• Such data is very difficult to create for clinical applications.
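Mapping signals to epoch labels starts by turning continuous recordings and annotated events into labeled epochs, which are the training pairs that supervised learning needs. The following is a minimal Python sketch. The 1-second epoch length, the `(start_sec, stop_sec, label)` event format and the function name `label_epochs` are assumptions made for illustration. They are not the corpus's actual annotation format.

```python
import numpy as np

def label_epochs(signal, fs, events, epoch_sec=1.0, default="BCKG"):
    """Split a 1-D signal into non-overlapping epochs and label each one.

    Hypothetical helper. `events` is an assumed list of
    (start_sec, stop_sec, label) tuples. Each epoch takes the label of the
    event that overlaps it the most, or `default` if no event overlaps it.
    """
    n = int(round(epoch_sec * fs))            # samples per epoch
    n_epochs = len(signal) // n               # drop any trailing partial epoch
    epochs = np.asarray(signal[: n_epochs * n]).reshape(n_epochs, n)
    labels = []
    for i in range(n_epochs):
        t0, t1 = i * epoch_sec, (i + 1) * epoch_sec
        best, best_overlap = default, 0.0
        for start, stop, lab in events:
            overlap = min(t1, stop) - max(t0, start)   # seconds of overlap
            if overlap > best_overlap:
                best, best_overlap = lab, overlap
        labels.append(best)
    return epochs, labels
```

For example, a 5-second signal at 250 Hz with one SPSW event from 1.5 s to 3.0 s yields five epochs labeled BCKG, SPSW, SPSW, BCKG, BCKG. The largest-overlap rule is one simple design choice. A stricter rule, such as requiring the event to cover half the epoch, would trade recall for label purity.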

EEG Reports
Two types of reports:
• Preliminary report: contains a summary diagnosis (usually in a spreadsheet format).
• EEG report: the final “signed off” report that triggers billing.
Inconsistent report formats:
• The reporting format has changed several times over the past 12 years.
Report databases:
• MedQuist (MS Word .rtf)
• Alpha (OCR’ed .pdf)
• EPIC (text)
• Physician’s hardcopies (OCR’ed .pdf)

The TUH EEG Corpus
• Number of sessions: 25,000+
• Number of patients: ~15,000
• Frequent flyer (most sessions for a single patient): 42 sessions
• Age range (years): 16 to 90+
• Sampling rates: 250, 256 or 512 Hz
• Resolution: 16 bits
• Data format: European Data Format (EDF)
• Number of channels: variable, ranging from 28 to 129 (including one annotation channel per EDF file)
Variations in channels and electrode labels are very real challenges. Over 90% of the alternate channel assignments can be mapped to the standard configuration.
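One way to map alternate channel assignments to a standard configuration is to normalize raw EDF labels with a small set of rewrite rules. The following is a sketch under assumptions. The label patterns (an `EEG ` prefix and reference suffixes such as `-REF` or `-LE`), the alias table and the helper names are hypothetical. A real mapping would be built from the labels actually found in the corpus headers.

```python
import re

# Assumed target set: the 19 standard 10-20 electrodes, upper-cased.
STANDARD_10_20 = {
    "FP1", "FP2", "F7", "F3", "FZ", "F4", "F8",
    "T3", "C3", "CZ", "C4", "T4",
    "T5", "P3", "PZ", "P4", "T6", "O1", "O2",
}

# Hypothetical alias table: 10-10 names for the same scalp positions.
ALIASES = {"T7": "T3", "T8": "T4", "P7": "T5", "P8": "T6"}

def normalize_label(raw):
    """Map a raw EDF channel label to a standard 10-20 name, or None."""
    label = raw.strip().upper()
    label = re.sub(r"^EEG\s+", "", label)          # drop an assumed "EEG " prefix
    label = re.sub(r"-(REF|LE|AVG)$", "", label)    # drop an assumed reference suffix
    label = ALIASES.get(label, label)
    return label if label in STANDARD_10_20 else None

def map_channels(raw_labels):
    """Return {standard_name: channel_index}; unmappable channels are skipped."""
    mapping = {}
    for idx, raw in enumerate(raw_labels):
        name = normalize_label(raw)
        if name is not None and name not in mapping:
            mapping[name] = idx
    return mapping
```

Channels that fail to map, such as an EKG lead or the EDF annotation channel, are simply left out. The count of mapped channels then gives a quick check of how close a file is to the standard configuration.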

The TUH EEG Corpus
• The corpus is growing at a rate of about 2,750 EEGs per year.
• Two general types of EEGs:
  - Short-term: 20 to 30 minutes
  - Long-term: 18 to 36 hours
• In 2014, more 40-minute EEGs are being administered.
• Data has been carefully de-identified (e.g., removal of medical record number, patient name and exact birthdate).
• “Pruned EEGs” are being used.
[Figure: a sample EDF header]
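The fixed part of an EDF header is a 256-byte block of space-padded ASCII fields. The local patient identification occupies bytes 8 to 87. The following sketch parses that block and blanks the patient field, which is only one small piece of de-identification. It does not touch the per-signal header, the start date or any annotations. The replacement string `X X X X` and all function names are illustrative choices, not the procedure actually used for TUH EEG.

```python
# (field name, width in bytes) for the fixed 256-byte EDF header, in file order.
EDF_FIXED_FIELDS = [
    ("version", 8), ("patient_id", 80), ("recording_id", 80),
    ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
    ("reserved", 44), ("n_records", 8), ("record_duration", 8),
    ("n_signals", 4),
]

def parse_fixed_header(buf: bytes) -> dict:
    """Split the first 256 bytes of an EDF file into stripped ASCII fields."""
    if len(buf) < 256:
        raise ValueError("EDF fixed header must be 256 bytes")
    fields, pos = {}, 0
    for name, width in EDF_FIXED_FIELDS:
        fields[name] = buf[pos:pos + width].decode("ascii").strip()
        pos += width
    return fields

def recording_seconds(fields: dict) -> float:
    """Recording length = number of data records x duration of one record."""
    return int(fields["n_records"]) * float(fields["record_duration"])

def deidentify_fixed_header(buf: bytes, replacement: str = "X X X X") -> bytes:
    """Return a copy with the 80-byte patient field overwritten (hypothetical helper)."""
    start = 8                                   # patient_id follows the 8-byte version
    field = replacement.encode("ascii").ljust(80, b" ")[:80]
    return buf[:start] + field + buf[start + 80:]
```

Because the replacement is padded to exactly 80 bytes, every later field keeps its offset. Any bytes after the header (for example, a whole file read into memory) pass through unchanged.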

Manual Annotations
Epileptiform events:
1) SPSW: spike and sharp wave
2) GPED: generalized periodic epileptiform discharges and triphasic waves
3) PLED: periodic lateralized epileptiform discharges
Background:
4) ARTF: artifact
5) EYBL: eye blink
6) BCKG: background

Two-Level Machine Learning Architecture
[Figure: block diagram showing feature extraction on each epoch, a sequential modeler based on hidden Markov models, and a post processor based on a finite state machine that uses temporal and spatial context to produce the epoch label]
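The two levels can be illustrated with a toy decoder. First, an HMM Viterbi pass turns per-epoch log-likelihoods into a label sequence. Then a simple rule-based post-processor removes implausibly short events. This is a sketch under stated assumptions. The minimum-duration rule stands in for the finite state machine on the slide and uses only temporal context, not spatial context. The names `viterbi` and `min_duration_filter` are hypothetical.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely HMM state sequence.

    log_emit:  (T, S) log-likelihood of each epoch's features under each state
    log_trans: (S, S) log transition probabilities, rows = from-state
    log_init:  (S,)   log initial-state probabilities
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # cand[i, j]: best path via i into j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                  # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def min_duration_filter(labels, min_len, background):
    """Toy second level: relabel non-background runs shorter than min_len epochs."""
    out = list(labels)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1                                 # j = end of the current run
        if out[i] != background and (j - i) < min_len:
            out[i:j] = [background] * (j - i)
        i = j
    return out
```

With two states (0 = background, 1 = event), clear emissions and "sticky" transitions, the first level recovers runs such as `[0, 0, 1, 1, 0]`. The second level then discards isolated one-epoch detections that the HMM lets through.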

Unsupervised Training Through Active Learning
Active learning:
1) Seed models with a small amount of transcribed data, using reports that clearly indicate the existence of the desired events.
2) Classify the data.
3) Train models based on the generated labels.
4) Select high-confidence data and iterate.
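The loop on this slide (classify, retrain on the generated labels, keep only high-confidence data) is commonly called self-training. The following is a minimal sketch in which a nearest-centroid classifier stands in for the real HMM-based models. The classifier, the softmax confidence, the 0.9 threshold and all function names are assumptions for illustration.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Stand-in 'model': one mean vector per class (assumes every class is present)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_with_confidence(centroids, X):
    """Label = nearest centroid; confidence = softmax over negative squared distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 - (-d2).max(axis=1, keepdims=True)   # shift for numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return p.argmax(axis=1), p.max(axis=1)

def self_train(X_seed, y_seed, X_pool, n_classes, threshold=0.9, n_iter=5):
    """Seed, classify, keep confident labels, retrain, and repeat."""
    X_train, y_train, remaining = X_seed, y_seed, X_pool
    for _ in range(n_iter):
        centroids = fit_centroids(X_train, y_train, n_classes)       # train
        if len(remaining) == 0:
            break
        labels, conf = predict_with_confidence(centroids, remaining)  # classify
        keep = conf >= threshold                                      # select confident data
        if not keep.any():
            break
        X_train = np.vstack([X_train, remaining[keep]])
        y_train = np.concatenate([y_train, labels[keep]])
        remaining = remaining[~keep]
    return fit_centroids(X_train, y_train, n_classes), len(y_train)
```

The threshold controls the usual trade-off in self-training. A high value adds fewer but cleaner labels per iteration, while a low value grows the training set faster at the risk of reinforcing early mistakes.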

Performance on TUH EEG
• Correct recognition rates for the three primary event classes (SPSW, PLED and GPED) are above 40%, but misrecognition rates are also about 40%.
• To be clinically relevant, a system does not need to detect every spike correctly; a high false alarm rate, however, is of great concern.
• [Table: confusion matrix for the HMM-based system on the evaluation data]
• Detections and false alarms can be adjusted using confidence measures. [Figure not available]
• The same baseline technology provides state-of-the-art results on epileptic seizure detection (CHB-MIT), but it performs extremely poorly on TUH EEG.
• Performance goal: 95% detection and 5% false alarm.
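Adjusting detections and false alarms with a confidence measure amounts to sweeping a decision threshold over per-event scores. The following sketch computes both rates at a given threshold and checks an operating point against the stated 95% / 5% goal. Defining the false alarm rate as false positives divided by the number of non-events is an assumption, since the slide does not give its exact definition. The function names are hypothetical.

```python
import numpy as np

def det_fa(scores, is_event, threshold):
    """(detection rate, false alarm rate) when scores >= threshold are flagged.

    Detection rate   = flagged events / all events.
    False alarm rate = flagged non-events / all non-events (assumed definition).
    Both classes must be present in `is_event`.
    """
    scores = np.asarray(scores, dtype=float)
    is_event = np.asarray(is_event, dtype=bool)
    flagged = scores >= threshold
    det = (flagged & is_event).sum() / is_event.sum()
    fa = (flagged & ~is_event).sum() / (~is_event).sum()
    return float(det), float(fa)

def sweep(scores, is_event, thresholds):
    """Trade-off curve as (threshold, detection rate, false alarm rate) tuples."""
    return [(t, *det_fa(scores, is_event, t)) for t in thresholds]

def meets_goal(det, fa, det_min=0.95, fa_max=0.05):
    """Check an operating point against the 95% detection / 5% false alarm goal."""
    return det >= det_min and fa <= fa_max
```

Raising the threshold can only lower both rates, so the sweep traces the curve from which an operating point is chosen. The goal is met only if some threshold lands in the high-detection, low-false-alarm corner.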

Analysis of Performance
• Bayesian problem: an extremely small percentage of the data are SPSW, yet this class is crucial to good clinical performance.
• Traditional Bayesian techniques weight each decision by the class prior, so with such a small SPSW prior they effectively learn to ignore SPSW.
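A small worked example shows why a prior-weighted decision ignores a rare class. The numbers are hypothetical and are not measured from TUH EEG: a 1% SPSW prior, a 20:1 likelihood ratio in favor of SPSW, and a miss costing 10 times as much as a false alarm. The cost-sensitive rule below is one common remedy, not necessarily the one used in this work. The function names are illustrative.

```python
def posterior(prior_event, lik_event, lik_bckg):
    """P(SPSW | x) for a two-class problem via Bayes' rule."""
    num = prior_event * lik_event
    return num / (num + (1.0 - prior_event) * lik_bckg)

def map_decision(prior_event, lik_event, lik_bckg):
    """Minimum-error (MAP) rule: pick the class with the larger posterior."""
    return "SPSW" if posterior(prior_event, lik_event, lik_bckg) > 0.5 else "BCKG"

def cost_sensitive_decision(prior_event, lik_event, lik_bckg, miss_cost, fa_cost):
    """Minimum-expected-cost rule: choose SPSW when the expected cost of missing
    it (posterior * miss_cost) exceeds the expected cost of a false alarm."""
    p = posterior(prior_event, lik_event, lik_bckg)
    return "SPSW" if p * miss_cost > (1.0 - p) * fa_cost else "BCKG"

p = posterior(0.01, 20.0, 1.0)
print(f"posterior = {p:.3f}")                                    # about 0.168
print(map_decision(0.01, 20.0, 1.0))                             # BCKG
print(cost_sensitive_decision(0.01, 20.0, 1.0, 10.0, 1.0))       # SPSW
```

Even though the evidence favors SPSW 20 to 1, the 1% prior pulls the posterior down to about 0.17, so the minimum-error rule answers BCKG. Charging more for a miss than for a false alarm moves the decision boundary and recovers the rare class.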

Summary
The TUH EEG Corpus:
• Represents a unique opportunity to advance EEG analysis using state-of-the-art machine learning.
• Has been under development for two years, with an initial release in February.
• The official release will be done in phases during 1Q 2015, with a maintenance release expected in Summer.
• See for more details.
Machine learning results using unsupervised training are promising:
• The baseline two-level classification system, using sequential decoding for event detection, achieves 70% detection (DET) at a 7% false alarm (FA) rate.
• More sophisticated systems under development deliver much higher performance, approaching the level needed to be clinically relevant.
• A high-performance system can run in hyper real time (e.g., 100 times faster than real time).
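The real-time claim is easy to put in concrete terms. Assuming the 100x speedup given as an example in the summary, the following computes processing times for the short-term and long-term recording durations quoted for the corpus. The function names are illustrative.

```python
def processing_time(recording_sec, speedup):
    """Seconds needed to process a recording when running `speedup` times faster than real time."""
    return recording_sec / speedup

def real_time_factor(processing_sec, recording_sec):
    """Processing time divided by recording duration; below 1.0 means faster than real time."""
    return processing_sec / recording_sec

short_term = processing_time(30 * 60, 100)    # 30-minute short-term EEG
long_term = processing_time(36 * 3600, 100)   # 36-hour long-term EEG
print(f"30-minute EEG: {short_term:.0f} s; 36-hour EEG: {long_term / 60:.1f} min")
# 30-minute EEG: 18 s; 36-hour EEG: 21.6 min
```

A 100x speedup corresponds to a real-time factor of 0.01. At that rate even the longest recordings are processed in well under an hour.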


The Neural Engineering Data Consortium
Mission:
• To focus the research community on a progression of research questions and to generate massive data sets used to address those questions.
• To broaden participation by making data available to research groups who have significant expertise but lack capacity for data generation.
Impact:
• Big data resources enable the application of state-of-the-art machine learning algorithms.
• A common evaluation paradigm ensures consistent progress towards long-term research goals.
• Publicly available data and performance baselines eliminate specious claims.
• Technology can leverage advances in data collection to produce more robust solutions.
Expertise:
• Experimental design and instrumentation of bioengineering-related data collection
• Signal processing and noise reduction
• Preprocessing and preparation of data for distribution and research experimentation
• Automatic labeling, alignment and sorting of data
• Metadata extraction for enhancing machine learning applications for the data
• Statistical modeling, mining and automated interpretation of big data
To learn more, visit

The Temple University Hospital EEG Corpus
Synopsis: The world’s largest publicly available EEG corpus, consisting of 20,000+ EEGs collected from 15,000 patients over 12 years. It includes physicians’ diagnoses and patient medical histories. The number of channels varies from 24 to 36. Signal data is distributed in EDF format.
Impact:
• Sufficient data to support the application of state-of-the-art machine learning algorithms.
• Patient medical histories, particularly drug treatments, support statistical analysis of correlations between signals and treatments.
• The historical archive also supports investigation of EEG changes over time for a given patient.
• Enables the development of real-time monitoring.
Database Overview:
• 21,000+ EEGs collected at Temple University Hospital from 2002 to 2013 (an ongoing process)
• Recordings vary from 24 to 36 channels of signal data sampled at 250 Hz
• Patients range in age from 18 to 90, with an average of 1.4 EEGs per patient
• Data includes a test report generated by a technician, an impedance report and a physician’s report; data from 2009 forward includes ICD-9 codes
• A total of 1.8 TBytes of data
• Personal information has been redacted
• Clinical history and medication history are included
• Physician notes are captured in three fields: description, impression and correlation
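The storage figures follow from the sampling parameters. The following sketch estimates EDF file sizes from the number of channels, the sampling rate and the 2-byte integer samples that EDF uses. It assumes every channel shares one sampling rate and ignores any annotation channel. It is therefore a rough estimate, not an exact accounting of the 1.8 TBytes quoted above. The function names are hypothetical.

```python
def edf_header_bytes(n_signals):
    """EDF header size: a 256-byte fixed part plus 256 bytes per signal."""
    return 256 + 256 * n_signals

def edf_signal_bytes(n_channels, fs_hz, seconds, bytes_per_sample=2):
    """Raw sample payload; EDF stores each sample as a 2-byte integer.
    Assumes all channels share one sampling rate."""
    return n_channels * fs_hz * seconds * bytes_per_sample

def edf_file_bytes(n_channels, fs_hz, seconds):
    """Rough file size: header plus samples, ignoring any annotation channel."""
    return edf_header_bytes(n_channels) + edf_signal_bytes(n_channels, fs_hz, seconds)

size = edf_file_bytes(32, 250, 20 * 60)   # 32 channels, 250 Hz, 20 minutes
print(f"{size:,} bytes (about {size / 1e6:.1f} MB)")
# 19,208,448 bytes (about 19.2 MB)
```

The header is negligible next to the samples. File size therefore scales almost linearly with channel count, sampling rate and duration, which is why long-term recordings dominate storage.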

Automated Interpretation of EEGs
Goals:
(1) To assist healthcare professionals in interpreting electroencephalography (EEG) tests, thereby improving the quality and efficiency of a physician’s diagnostic capabilities;
(2) To provide a real-time alerting capability that addresses a critical gap in long-term monitoring technology.
Impact:
• Patients and technicians will receive immediate feedback rather than waiting days or weeks for results.
• Physicians receive decision-making support that reduces their time spent interpreting EEGs.
• Medical students can be trained with the system and can use search tools that make it easy to view patient histories and comparable conditions in other patients.
• Uniform diagnostic techniques can be developed.
Milestones:
• Develop an enhanced set of features based on temporal and spectral measures (1Q’2014)
• Statistical modeling of time-varying data sources in bioengineering using deep learning (2Q’2014)
• Label events at an accuracy of 95% measured on held-out data from the TUH EEG Corpus (3Q’2014)
• Predict diagnoses with an F-score (the harmonic mean of precision and recall) of 0.95 (4Q’2014)
• Demonstrate a clinically relevant system and assess the impact on physician workflow (4Q’2014)
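In its common F1 form, the F-score named in the milestones is the harmonic mean of precision and recall. The more general F-beta treats recall as beta times as important as precision. The following is a minimal sketch with hypothetical function names.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def f_score_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_score(precision, recall, beta)
```

Because the harmonic mean is pulled toward the smaller value, an F-score of 0.95 requires both precision and recall to be high. For example, perfect precision with 50% recall gives only about 0.67.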