Ramon Maldonado, Travis Goodwin, Sanda M. Harabagiu


1 Active Deep Learning-Based Annotation of Electroencephalography Reports for Cohort Identification
Ramon Maldonado, Travis Goodwin, Sanda M. Harabagiu
The University of Texas at Dallas, Human Language Technology Research Institute

2 There are no conflicts of interest

3 Outline
- Introduction
- The data
- Multi-task Active Deep Learning
- Deep Learning Architectures
- Sampling Method
- Experimental Results
- Conclusion
First, I'm going to introduce our method and why we developed it.

4 Introduction
- Clinical electroencephalography (EEG) is the most important investigation in the diagnosis and management of epilepsies.
- As more clinical EEG data becomes available, the interpretation of EEG signals can be improved by providing neurologists with search results for patients who exhibit similar EEG characteristics.
- MERCuRY (Multi-modal ElectroencephalogRam patient Cohort discoveRY), Goodwin & Harabagiu (2016) [1], for cohort identification.
The EEG signal is complex, and thus its interpretation, as documented in EEG reports, is known to have moderate inter-observer agreement.
Recently, Goodwin & Harabagiu (2016) [1] described the MERCuRY system, which operates on a multi-modal EEG index created by automatically processing both the EEG signals and the EEG reports that document and interpret those signals.
The MERCuRY system allows neurologists to search a vast archive of clinical EEG signals and EEG reports, enabling them to discover patient populations relevant to specific queries.

5 Introduction
QUERY: Patients taking topiramate (Topomax) with a diagnosis of headache and EEGs demonstrating sharp waves, spikes or spike/polyspike and wave activity
EXAMPLE RECORD:
CLINICAL HISTORY: Recently [seizure]PROB-free but with [episodes of light flashing in her peripheral vision]PROB followed by [blurry vision]PROB and [headaches]PROB
MEDICATIONS: [Topomax]TR
DESCRIPTION OF THE RECORD: There are also bursts of irregular, frontally predominant [sharply contoured delta activity]ACT, some of which seem to have an underlying [spike complex]ACT from the left mid-temporal region.
The discovery of relevant patient cohorts satisfying the characteristics expressed in queries like this one relies on the ability to automatically and accurately recognize medical concepts both in the queries and throughout the collection of EEG reports.
This example record is clearly relevant to the query because it shares key medical concepts with it, including Topomax, headache, and spikes.
In general, to determine which records are relevant to a query, we can automatically annotate medical concepts in both the query and the records.
As more EEG data becomes available, new deep learning techniques show promise for producing such annotations with high efficiency and accuracy.
The context surrounding EEG activity mentions contains relevant, characteristic information about the EEG activities. Instead of annotating the entire span of relevant text, we encode much of this information in the form of attributes. For example, [sharply contoured delta activity] has location=frontal.

6 Introduction
- Active Learning has been shown to reduce the amount of human annotation and validation needed when an efficient sampling mechanism is used, because it selects for validation the instances whose annotation will have the most impact on learning quality.
- In the work of Hahn et al. (2012) [2], active-learning-based annotation operating on MEDLINE abstracts was used to identify medical concepts.
- In our work, in addition to annotating medical concepts in biomedical text, we annotate attributes of those concepts and annotate non-contiguous mentions of one type of medical concept (EEG Activities), using an annotation schema that captures the semantic richness of the attributes of EEG Activities.
To automatically annotate these kinds of medical concepts, we first need to manually annotate a subset of the reports to use as training data.
Because EEG Activities themselves are so complex, their representations in the text are equally complex.

8 The Data
EEG reports from Temple University Hospital (TUH)
- 25,000 reports from 15,000 patients collected over 12 years
Sections:
- Clinical History: lists past and current medical problems, symptoms, signs, and treatments, as well as significant medical events
- Medications
- Introduction: a depiction of the techniques used for the EEG
- Description: a complete and objective description of the EEG, noting all observed activity, patterns, and events
- Impression: states whether the EEG test is normal or abnormal and, if abnormal, lists the abnormalities in order of importance
- Clinical Correlation: explains what the EEG findings mean in terms of clinical interpretation
- American Clinical Neurophysiology Society Guidelines for writing EEG reports
- Idiopathic generalized epilepsy

9 Outline Introduction The data Multi-task Active Deep Learning
Deep Learning Architectures Sampling Method Experimental Results Conclusion
To both reduce the amount of manual annotation and train deep learning systems capable of automatic annotation, we developed the Multi-task Active Deep Learning paradigm.

10 Multi-task Active Deep Learning
The goal of the Multi-task Active Deep Learning (MTADL) paradigm is to concurrently perform multiple annotation tasks corresponding to the identification of:
- EEG Activities
- EEG Events
- Medical Problems
- Medical Treatments
- Medical Tests
- The relevant attributes for each medical concept type
- The Modality [3] of each of the above medical concepts
- The Polarity [3] of each of the above medical concepts
Slide labels: Medical Concept Type; EEG Activity attributes
- EEG event: any extracerebral force that activates the EEG
- Modality and polarity follow the i2b2 challenge on evaluating temporal relations in medical text [3]
- Modality: used to determine whether a medical concept mention has actually occurred, has possibly occurred, or is proposed to occur now or in the future

11 Multi-task Active Deep Learning
The MTADL paradigm consists of 5 steps:
- STEP 1: The development of an annotation schema
- STEP 2: Annotation of initial training data
- STEP 3: Design of deep learning methods capable of learning from the data
- STEP 4: Development of sampling methods for MTADL
- STEP 5: Usage of the Active Learning system, involving:
  - STEP 5.a: Accepting/editing annotations of sampled examples
  - STEP 5.b: Re-training the deep learning methods
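The way steps 2 through 5 fit together can be sketched as a small loop. This is only an illustration under stated assumptions: the function name `mtadl_loop`, its callback parameters (`train`, `annotate`, `select`, `validate`), and the toy stand-ins below are hypothetical and are not taken from the authors' system.

```python
from typing import Any, Callable, Dict, List, Tuple

Annotation = Dict[str, Any]


def mtadl_loop(
    corpus: Dict[str, str],
    seed: Dict[str, Annotation],
    train: Callable[[Dict[str, Annotation]], Any],
    annotate: Callable[[Any, str], Annotation],
    select: Callable[[Dict[str, Annotation], int], List[str]],
    validate: Callable[[str, Annotation], Annotation],
    iterations: int,
    batch_size: int,
) -> Tuple[Any, Dict[str, Annotation]]:
    """Hypothetical skeleton of the active learning cycle (steps 2 to 5)."""
    training = dict(seed)              # STEP 2: initial manual annotations
    model = train(training)            # STEP 3: learn from the current data
    for _ in range(iterations):
        pool = {d: t for d, t in corpus.items() if d not in training}
        if not pool:
            break
        # Automatically annotate every report that is not yet validated.
        auto = {d: annotate(model, t) for d, t in pool.items()}
        # STEP 4: the sampling method picks reports to validate.
        for doc_id in select(auto, batch_size):
            # STEP 5.a: a human accepts or edits the automatic annotation.
            training[doc_id] = validate(corpus[doc_id], auto[doc_id])
        model = train(training)        # STEP 5.b: re-train
    return model, training


# Toy stand-ins (all hypothetical) so the loop can be run end to end.
toy_corpus = {f"report_{i}": "word " * (i + 1) for i in range(6)}
toy_seed = {"report_0": {"validated": True}}


def toy_train(training):
    return {"n_docs": len(training)}


def toy_annotate(model, text):
    return {"validated": False, "uncertainty": len(text.split())}


def toy_select(auto, k):
    return sorted(auto, key=lambda d: auto[d]["uncertainty"], reverse=True)[:k]


def toy_validate(text, annotation):
    return {**annotation, "validated": True}


final_model, final_training = mtadl_loop(
    toy_corpus, toy_seed, toy_train, toy_annotate, toy_select, toy_validate,
    iterations=2, batch_size=2,
)
```

Passing the learner and the sampling method in as callbacks keeps the loop independent of any particular model, which matches the way the talk treats the architectures and the sampling method as separate components.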

12 MTADL – Annotation Schema
Medical Concept Annotation Schema
- Type: Medical Problem, Medical Treatment, Medical Test, EEG Event, EEG Activity
- Modality: Factual, Possible, Proposed
- Polarity: Positive, Negative

13 MTADL – Annotation Schema
EEG Activity Attributes
- Morphology: represents the type or “form” of EEG waves. Values shown on the slide: Rhythm, Transient, Single Wave, Spike, Sharp Wave, Complex, K-complex, Polyspike complex, Pattern, PLED, Suppression
- Frequency Band: alpha, beta, delta, theta, gamma
- Background: whether the EEG activity is in the background
- Magnitude: describes the amplitude of the EEG activity if it is emphasized
- Recurrence: describes how often the EEG activity occurs (continuous, repeated, none)
- Dispersal: describes the spread of the activity over regions of the brain
- Hemisphere: describes which hemisphere of the brain the activity occurs in
- Brain Location: the region of the brain in which the activity occurs (9 locations, based on the standard system of electrode placement)

14 MTADL – Annotation Schema
“When the patient relaxes and the eye blinks stop, there are frontally predominant generalized spike and wave discharges as well as polyspike and wave discharges at 4 to 4.5 Hz.”
“spike and wave discharges”
- Morphology: Spike and Slow Wave Complex
- Freq. Band: Theta
- Background: No
- Magnitude: Normal
- Recurrence: Repeated
- Dispersal: Generalized
- Hemisphere: n/a
- Brain Location: Frontal
“polyspike and wave discharges”
- Morphology: Polyspike and Slow Wave Complex
- Freq. Band: Theta
- Background: No
- Magnitude: Normal
- Recurrence: Repeated
- Dispersal: Generalized
- Hemisphere: n/a
- Brain Location: Frontal
If we wanted to annotate the entire span of text that describes each activity, we would end up annotating the same span for both activities in this example. Instead, we annotate the span of text corresponding to each activity's morphology. We refer to these spans as EEG Activity Anchors. The rest of the information is encoded in the form of attributes.
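One way to picture the anchor-plus-attributes representation is as a small record type. The sketch below is an assumption for illustration only: the class name `EEGActivity` and its field names are hypothetical, and only the attribute values come from the slide.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EEGActivity:
    """Hypothetical record: an anchor span plus the attributes from the slide."""
    anchor: str                 # text span for the activity's morphology
    morphology: str
    frequency_band: str
    background: bool
    magnitude: str
    recurrence: str
    dispersal: str
    hemisphere: Optional[str]   # None stands for "n/a"
    brain_location: str


spike_wave = EEGActivity(
    anchor="spike and wave discharges",
    morphology="Spike and Slow Wave Complex",
    frequency_band="Theta",
    background=False,
    magnitude="Normal",
    recurrence="Repeated",
    dispersal="Generalized",
    hemisphere=None,
    brain_location="Frontal",
)

# The second activity shares every contextual attribute with the first;
# only the anchor and its morphology differ.
polyspike_wave = replace(
    spike_wave,
    anchor="polyspike and wave discharges",
    morphology="Polyspike and Slow Wave Complex",
)

differing = [
    name for name, value in asdict(spike_wave).items()
    if asdict(polyspike_wave)[name] != value
]
```

Here `differing` lists only the anchor and morphology fields, which mirrors the point of the slide: two activities described by the same stretch of text stay distinguishable through their anchors.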

15 Multi-task Active Deep Learning
From the full corpus of EEG reports, we randomly select a small subset and manually annotate medical concepts and their attributes.
We use this initial training data to train two deep learning systems:
- The first learns to detect EEG Activity anchors and the textual boundaries of the other medical concepts.
- The second learns to predict the attributes of each medical concept.
Once the two deep learners are trained, they are used to automatically annotate the entire corpus of EEG reports.
We then use our sampling method to select a subset of the automatically annotated reports, and we manually validate and edit the automatic annotations in those reports.
We introduce the newly validated documents into the training set and begin the active learning process anew by retraining the deep learning models.
In addition to the two deep learning architectures shown here, the sampling method used to select new documents for annotation is integral to achieving efficiency during active learning.

17 Deep Learning Architectures
- Stacked Long Short-Term Memory (LSTM) network [6]
  - EEG Activity Anchors
  - Medical Concept Boundaries
- Deep Rectified Linear Network (DRLN) [7]
  - EEG Activity attributes, including modality and polarity
  - Medical Concept type (EEG Event, medical problem, medical treatment, medical test), modality, and polarity
LSTM: boundary detection, including anchors
DRLN: attribute classification

18 Deep Learning Architectures – Stacked LSTM
- Operates at the sentence level
- Assigns a label from {I, O, B} to each token in the sentence, corresponding to whether the token is inside, outside, or at the beginning of a medical concept mention
- Example: occasional left anterior temporal sharp and slow wave complexes
Token Features:
- Lemma of the token and previous/next tokens
- PoS of the token and previous/next tokens
- Phrase chunk of the token and the previous/next tokens
- Brown cluster [5] of the token
- UMLS Concept Unique Identifier (CUI) of UMLS concepts containing the token
- Title of the section containing the token
Two Models
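The I/O/B labelling can be illustrated on the slide's example phrase. This is a sketch under an assumption: the anchor is taken to be the morphology span "sharp and slow wave complexes", which follows the definition of anchors given earlier but is not marked explicitly on the slide. The helper name `spans_to_iob` is hypothetical.

```python
from typing import List, Tuple


def spans_to_iob(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Convert token spans (start inclusive, end exclusive) to B/I/O labels."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"                 # first token of the mention
        for i in range(start + 1, end):
            labels[i] = "I"                 # remaining tokens of the mention
    return labels


tokens = "occasional left anterior temporal sharp and slow wave complexes".split()
# Assumed anchor: tokens 4..8, i.e. "sharp and slow wave complexes".
anchor_spans = [(4, 9)]
labels = spans_to_iob(tokens, anchor_spans)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

The words before the anchor ("occasional", "left", "anterior", "temporal") are labelled O by the anchor model; in the schema described above, that information is carried by attributes such as recurrence, hemisphere, and brain location rather than by a longer span.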

19 Deep Learning Architectures – Stacked LSTM
- Updates a memory state that is shared throughout the network.
- Each LSTM cell incorporates information about the current token and all previous tokens in the sentence.
- A softmax layer produces a probability distribution over the labels.
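The final softmax step can be sketched in a few lines. The scores below are made-up numbers standing in for the output of the last LSTM layer for one token; only the label set {I, O, B} comes from the slides.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ("I", "O", "B")
token_scores = [0.5, 2.0, -1.0]     # hypothetical scores for one token
probs = softmax(token_scores)
predicted = LABELS[probs.index(max(probs))]
```

The predicted label is the one with the highest probability; the full distribution is also what an entropy-based sampling score would be computed from.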

20 Deep Learning Architectures – DRLN
Deep Rectified Linear Network for Attribute Classification
Traditionally, attribute classification is performed by training a classifier, such as an SVM, to determine the value of each attribute. This approach would require training 18 separate attribute classifiers for EEG Activities and 3 classifiers for all other medical concepts. By leveraging deep learning, we can simplify this task: we learn one multi-task embedding (a low-dimensional vector representation of a medical concept) and use this representation to determine every attribute simultaneously with the same deep learning network.

21 Deep Learning Architectures – DRLN
There are two Deep Rectified Linear Networks, one for EEG Activity attribute detection and one for attribute detection for the other types of medical concepts. Both networks pass a feature vector representing a medical concept through five fully connected rectified linear units to produce the multi-task embedding. The multi-task embedding is then passed to one softmax layer per attribute type to produce a probability distribution over that attribute’s values.
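The forward pass described here can be sketched in plain Python. This is a toy illustration under several assumptions: "five fully connected rectified linear units" is read as five fully connected ReLU layers, the layer sizes and random weights are arbitrary and untrained, and the helper names (`dense`, `init_layer`, `drln_forward`) are hypothetical. Only the modality and polarity value counts come from the annotation schema.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    peak = max(v)
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random, untrained weights; zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def drln_forward(features, hidden_layers, heads):
    h = features
    for weights, bias in hidden_layers:      # shared ReLU stack
        h = relu(dense(h, weights, bias))
    embedding = h                            # the multi-task embedding
    # One softmax head per attribute type, all reading the same embedding.
    distributions = {name: softmax(dense(embedding, w, b))
                     for name, (w, b) in heads.items()}
    return embedding, distributions


rng = random.Random(0)
N_FEATURES, N_HIDDEN = 6, 8                  # arbitrary toy sizes
sizes = [N_FEATURES] + [N_HIDDEN] * 5        # five hidden layers
hidden_layers = [init_layer(rng, n_in, n_out)
                 for n_in, n_out in zip(sizes, sizes[1:])]
heads = {
    "modality": init_layer(rng, N_HIDDEN, 3),   # Factual, Possible, Proposed
    "polarity": init_layer(rng, N_HIDDEN, 2),   # Positive, Negative
}
features = [rng.random() for _ in range(N_FEATURES)]
embedding, distributions = drln_forward(features, hidden_layers, heads)
```

Because every head reads the same embedding, adding another attribute only adds one more softmax layer instead of a separately trained classifier.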

22 Deep Learning Architectures – DRLN
DRLN Features
- The text of the medical concept mention itself
- The lemmas of each token in the medical concept mention
- The PoS of each token in the medical concept mention
- The lemmas of 3 tokens before/after the medical concept mention
- The title of the containing section
Context Features: for each token, t, in the sentence:
- The syntactic dependency path to t
- The number of words between the medical concept mention and t
- The number of “hops” in the syntactic dependency path from the head of the medical concept mention to t
- The number of medical concepts between the medical concept mention and t
The features used by the DRLN are described in the paper, but it should be noted that several context features are used to encode information in the sentence that might pertain to the medical concept but may not be near it.

24 Sampling Method
Rank Combination Protocol [4]: combine several single-task active learning selection decisions into one
Usefulness rank:
- The usefulness score $s_{X_j}(d)$ of each un-validated EEG report $d$ is calculated with respect to each annotation task $X_j$
- Each score is translated into a rank $r_{X_j}(d)$, where higher usefulness means lower rank
- For each EEG report, we sum the ranks over the annotation tasks to get the overall rank, $r(d) = \sum_j r_{X_j}(d)$
- All reports are sorted by this rank, and the reports with the lowest rank are selected for validation
When choosing a new record to manually annotate, the sampling method we use must be able to incorporate information about each annotation task we are trying to perform.
By combining the individual ranks for each annotation task, we are able to choose the documents that are most useful for all the tasks as a whole.
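The rank combination step can be sketched as follows. The function names, task names, document ids, and scores are made up for illustration, and ties are broken by input order, which is an assumption the slides do not address.

```python
from typing import Dict, List, Tuple


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 goes to the most useful report (highest score)."""
    ordered = sorted(scores, key=lambda d: scores[d], reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(ordered, start=1)}


def combine_ranks(
    task_scores: Dict[str, Dict[str, float]], k: int
) -> Tuple[List[str], Dict[str, int]]:
    """Sum per-task ranks and return the k reports with the lowest total."""
    per_task = [ranks_from_scores(s) for s in task_scores.values()]
    doc_ids = list(next(iter(task_scores.values())))
    overall = {d: sum(r[d] for r in per_task) for d in doc_ids}
    selected = sorted(doc_ids, key=lambda d: overall[d])[:k]
    return selected, overall


# Hypothetical usefulness scores for three un-validated reports, two tasks.
task_scores = {
    "anchor_boundaries": {"d1": 0.9, "d2": 0.2, "d3": 0.5},
    "attributes": {"d1": 0.4, "d2": 0.6, "d3": 0.8},
}
selected, overall = combine_ranks(task_scores, k=2)
# overall ranks: d1 = 1 + 3, d2 = 3 + 2, d3 = 2 + 1
```

Report d3 is selected first even though it is not the top report for the anchor task, because it is useful for both tasks at once.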

25 Sampling Method To calculate the usefulness score for a report with respect to an annotation task, we use the average Shannon entropy of each annotation for that task in the report. For example, to get the score for EEG Activity Anchor boundary detection, we average the Shannon entropy over each token in that document given by softmax layers of the stacked LSTM used for anchor detection.
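A minimal sketch of this usefulness score is shown below. The log base is not stated on the slide, so base 2 is an assumption, and the per-token distributions are made-up numbers standing in for softmax outputs over {I, O, B}.

```python
import math
from typing import List, Sequence


def shannon_entropy(dist: Sequence[float]) -> float:
    """Entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def usefulness(distributions: List[Sequence[float]]) -> float:
    """Average entropy over all annotation decisions in one report."""
    return sum(shannon_entropy(d) for d in distributions) / len(distributions)


# Hypothetical per-token label distributions for two reports.
confident_report = [[0.98, 0.01, 0.01], [1.0, 0.0, 0.0]]
uncertain_report = [[1 / 3, 1 / 3, 1 / 3], [0.5, 0.5, 0.0]]

score_confident = usefulness(confident_report)
score_uncertain = usefulness(uncertain_report)
```

The report whose label distributions are closer to uniform gets the higher score, so it would be ranked as more useful for that task.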

27 Experimental Results
- Boundary Detection
  - EEG Activity Anchors
  - Other Medical Concepts
  - Precision, Recall, F1
- Attribute Classification
  - 10 attribute classes for EEG Activities
  - 3 attribute classes for other medical concepts
  - Precision, Recall, F1, Accuracy
- Active Learning
  - Learning curve as active learning progresses
  - F1 by active learning iteration
For both boundary detection and attribute classification, we use precision, recall, and the F1 measure, which combines the two.
For attribute classification, we also report accuracy.
The learning curve reported for active learning shows the F1 measure of each task as a function of the active learning iteration.

28 Experimental Results – Boundary Detection
The performance of the stacked LSTM models when automatically detecting anchors and boundaries

Measure      EEG Activity Anchors      Other Medical Concept Boundaries
             Exact      Partial        Exact      Partial
Precision    .8949      .9591          .9169      .9469
Recall       .8125      .8228          .8797      .8831
F1           .8517      .8857          .8975      .9139

As we can see, we achieve an F1 score of .8517 when detecting the exact spans of EEG Activity anchors and an F1 score of .8975 on all other medical concept boundaries. Both numbers increase if we relax the evaluation to allow partial matches, as was done in the i2b2 shared task.
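The difference between exact and partial matching can be sketched as follows. This is an illustration under an assumption: a partial match is taken to mean any token overlap between a predicted span and a gold span, which may differ from the shared task's exact definition. The function names and spans are made up; the last line reuses the precision and recall reported for exact anchor matching.

```python
from typing import List, Tuple

Span = Tuple[int, int]   # (start, end), end exclusive


def f1_score(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: List[Span], predicted: List[Span], partial: bool = False):
    """Precision, recall, and F1 under exact or overlap-based matching."""
    def match(a: Span, b: Span) -> bool:
        if partial:
            return a[0] < b[1] and b[0] < a[1]   # spans overlap
        return a == b

    hits_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    hits_gold = sum(any(match(g, p) for p in predicted) for g in gold)
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_gold / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


gold_spans = [(4, 9), (12, 14)]
predicted_spans = [(5, 9), (12, 14), (20, 22)]   # first span is off by one

exact = evaluate(gold_spans, predicted_spans)
relaxed = evaluate(gold_spans, predicted_spans, partial=True)

# F1 recomputed from the reported exact-match anchor precision and recall.
anchor_exact_f1 = round(f1_score(0.8949, 0.8125), 4)
```

In the toy example the off-by-one prediction counts as a miss under exact matching and as a hit under partial matching, which is why relaxing the criterion raises both precision and recall.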

29 Experimental Results – Attribute Classification
The performance of the DRLN models when automatically detecting attributes. The first ten rows correspond to EEG Activity attributes; the last three rows are attributes of the other four medical concept types.

Attribute     Accuracy  Precision  Recall  F1
Morphology    0.990     0.757      0.704   0.724
Hemisphere    0.924     0.775      0.754   0.762
Magnitude     0.909     0.806      0.710   0.750
Freq. Band    0.982     0.664      0.620   0.640
Background    0.960     0.890      0.820   0.854
Location      0.970     0.653      0.560   0.602
Modality      0.977     0.527      0.397   0.426

Rows with values missing from this transcript (listed in the order given; column alignment is not recoverable):
Recurrence: 0.831 0.739 0.731
Dispersal: 0.871 0.733 0.751
Polarity: 0.741 0.816
Type: 0.943 0.936 0.939
(row label missing): 0.973 0.742 0.605 0.659
(row label missing): 0.978 0.829 0.719 0.770

Here we see the experimental results for attribute classification. As the high accuracies show, our method accurately classifies the attributes of medical concepts in the majority of cases. However, the moderate F1 measures show that there is still work to be done. For instance, consider the morphology attribute, with an accuracy of .99 but an F1 score of .724. This is because there are 25 morphology classes, some of which are under-represented in the data, which skews the F1 score because it is averaged over the classes.
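The gap between accuracy and class-averaged F1 can be reproduced with a toy example. The sketch assumes that "averaged among the classes" means an unweighted (macro) average; the labels "common" and "rare" and the counts are made up.

```python
from typing import List, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    f1s: List[float] = []
    for cls in sorted(set(gold) | set(pred)):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# 95 instances of a common class, all predicted correctly, and 5 instances
# of a rare class, of which only 1 is predicted correctly.
gold = ["common"] * 95 + ["rare"] * 5
pred = ["common"] * 95 + ["rare"] + ["common"] * 4

acc = accuracy(gold, pred)
macro = macro_f1(gold, pred)
```

Accuracy stays high because the common class dominates, while the macro average is pulled down by the poorly predicted rare class, the same pattern the notes describe for morphology.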

30 Experimental Results – Active Learning
Figure: learning curves for the first 100 EEG reports annotated, evaluated with the F1 measure.
Here we see the learning curves for the first 100 EEG reports annotated, evaluated with the F1 measure. Each curve shows a clear increase from beginning to end.
Interestingly, the first two iterations of active learning produce decreases in the performance of EEG Activity Anchor detection. Anchors are spans of text corresponding to the morphology of the activity. Since the active learning system selects the documents it is most uncertain about, it is likely to home in on documents containing activities with morphologies as yet unseen in the training data. This causes performance to drop, since these new morphologies may be severely under-represented in the rest of the training data. However, as active learning progresses, this becomes less and less of a problem, and performance increases.

31 Experimental Results - Discussion
Rare attribute values
- F1 score for morphology: 0.724
- F1 score for morphology for classes with >=10 instances: 0.875
- Future work may benefit from incorporating domain knowledge (neurological ontologies, general knowledge representations)
Ungrammatical sentences
- “There are rare sharp transients noted in the record but without after going slow waves as would be expected in epileptiform sharp waves.”
The annotations produced by MTADL enable the generation of EEG-specific qualified medical knowledge
- Graphical representations
- Embedded knowledge graphs
The largest problem brought to light by the evaluations is the difficulty our methods have predicting attribute values that are uncommon in the data.

33 Conclusion
In this paper, we described a novel active learning annotation framework that operates on a large corpus of EEG reports using two deep learning architectures.
- We devised an annotation schema capable of capturing the complexity and semantic richness of EEG activity mentions in the reports.
- We designed two deep learning architectures to:
  - Discover the textual boundaries of medical concepts in the reports
  - Perform multi-task attribute detection
- We used a sampling method that allows the MTADL system to incorporate information about each task into one active learning sampling decision.
The experimental evaluations have yielded promising results.

34 Acknowledgements Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under award number 1U01HG The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

35 References
[1] Goodwin TR, Harabagiu SM. Multimodal Patient Cohort Identification from EEG Report and Signal Data. In: AMIA Annual Symposium Proceedings. American Medical Informatics Association; 2016.
[2] Hahn U, Beisswanger E, Buyko E, Faessler E. Active Learning-Based Corpus Annotation: The PathoJen Experience. In: AMIA Annual Symposium Proceedings. American Medical Informatics Association; 2012.
[3] Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc. 20(5):806–13.
[4] Reichart R, Tomanek K, Hahn U, Rappoport A. Multi-Task Active Learning for Linguistic Annotations. In: ACL. p. 861–9.
[5] Brown PF, Desouza PV, Mercer RL, Pietra VJD, Lai JC. Class-based n-gram models of natural language. Comput Linguist. 1992;18(4):467–79.
[6] Pascanu R, Gulcehre C, Cho K, Bengio Y. How to construct deep recurrent neural networks. arXiv preprint.
[7] Glorot X, Bordes A, Bengio Y. Deep Sparse Rectifier Neural Networks. In: AISTATS.

36 Questions ???

