CRF & SVM in Medication Extraction


1 CRF & SVM in Medication Extraction
A Cascade Approach to Extracting Medication Events
Dongfang Xu, School of Information

2 Outline
Methodology: Extraction Definition; Data Preparation; System Architecture
Results & Discussion: NER (CRF) Experiment; Relationship Classification (SVM); CONTEXT Engine Evaluation
Conclusion

3 Extraction Definition
Extract the following information (called fields) on medications experienced by the patient from the discharge summary:
Medications (m): names, brand names, generics, and collective names of prescription substances, over-the-counter medications, and other biological substances.
Dosages (do): indicating the amount of a medication.
Modes (mo): indicating the route for administering the medication.
Frequencies (f): indicating how often each dose of the medication should be taken.
Durations (du): indicating how long the medication is to be administered.
Reasons (r): stating the medical reason for which the medication is given.
List/narrative (ln): indicating whether the medication information appears in a list structure or in narrative running text in the discharge summary.

4 Data Preparation
160 discharge summaries, annotated by a physician and revised by the researcher.
Training data: 130
Testing data: 30
The annotation process took approximately 1.5 hours per record due to the length of the clinical records.

5 Outline
Methodology: Extraction Definition; Data Preparation; System Architecture
Results & Discussion: NER (CRF) Experiment; Relationship Classification (SVM); CONTEXT Engine Evaluation
Conclusion

6 System Architecture
Basic strategy for medication event extraction:
1. Use a CRF to identify the entities, including medication, dosage, frequency, mode, etc.
2. Build pairs for each medication relationship (only a drug and its related entities are considered).
3. Classify each binary relationship with an SVM.
4. Generate medication entries based on the results from the CRF and SVM.

7 System Architecture Basic strategy for medication event extraction

8 System Architecture CRF feature builder
Seven feature sets were built for the CRF: drug, dosage, mode, frequency, duration, reason, and morphology. Many other features were also used, such as the medical category of each word and whether the word is capitalized. Backward elimination was used to select the most useful feature sets. The context window for the CRF was set to five words.
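The windowed token features can be illustrated with a minimal sketch. It assumes that "five words" means the current token plus two neighbours on each side, which the slides do not state; the function name `token_features`, the feature keys and the `<PAD>` marker are hypothetical, and the lexicon-based feature sets (drug, dosage, SNOMED CT category, etc.) are omitted.

```python
def token_features(tokens, i, window=5):
    """Build a feature dict for tokens[i] from a symmetric context window.

    Assumption: a window of 5 means the token itself plus 2 words on each side.
    """
    half = window // 2
    feats = {
        "word.lower": tokens[i].lower(),
        # One of the surface features named on the slide.
        "is_capitalized": tokens[i][:1].isupper(),
    }
    for offset in range(-half, half + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the sentence get a padding marker.
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    return feats


def sentence_features(tokens, window=5):
    """One feature dict per token, the usual input shape for CRF toolkits."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]
```

For example, `token_features(["Take", "aspirin", "81", "mg", "daily"], 1)` describes "aspirin" with its neighbours "take", "81" and "mg", plus one padding slot to the left.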

9 System Architecture Basic strategy for medication event extraction

10 System Architecture
The SVM Convertor converts CRF results into SVM input:
Unigram sentences: each pair of medication elements at the unigram sentence level is used to build an SVM training record.
Sentence pairs: a MEDICATION and its REASON can span two sentences. As with the unigram sentence input, medication pairs are also built at the sentence pair level.
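A minimal sketch of the pair-building step follows. The `Entity` record and the `max_sentence_gap` parameter are hypothetical, and the sketch assumes that every candidate pair links one medication to one non-medication entity, with a gap of 0 standing for the unigram sentence level and a gap of 1 for the sentence pair level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    etype: str  # e.g. "medication", "dosage", "reason" (as labelled by the CRF)
    text: str
    sent: int   # index of the sentence containing the entity


def build_pairs(entities, max_sentence_gap=0):
    """Pair each medication with the other entities that are close enough.

    max_sentence_gap=0 keeps only same-sentence pairs; 1 also allows the
    entity to sit in an adjacent sentence (assumed reading of "sentence pairs").
    """
    pairs = []
    for med in entities:
        if med.etype != "medication":
            continue
        for other in entities:
            if other.etype == "medication":
                continue
            if abs(other.sent - med.sent) <= max_sentence_gap:
                pairs.append((med, other))
    return pairs
```

Each returned pair would become one SVM record, labelled as related or unrelated from the annotations during training.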

11 System Architecture SVM Features built based on the input data:
1. Three words before and after the first entity.
2. Three words before and after the second entity.
3. Words between the two entities.
4. Words inside each entity.
5. The types of the two entities determined by the CRF classifier.
6. The types of the entities between the two entities.
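The six feature groups can be sketched as string features over token spans. This is an illustration only: the `(start, end, type)` span representation, the `pair_features` name and the feature prefixes are assumptions, and the first entity is assumed to precede the second.

```python
def pair_features(tokens, first, second, others=(), k=3):
    """Build string features for a candidate entity pair.

    first, second, and each item of others are (start, end, type) token
    spans with an exclusive end; first must come before second.
    """
    f_start, f_end, f_type = first
    s_start, s_end, s_type = second
    feats = []
    # 1. k words before and after the first entity.
    feats += [f"e1_before={w}" for w in tokens[max(0, f_start - k):f_start]]
    feats += [f"e1_after={w}" for w in tokens[f_end:f_end + k]]
    # 2. k words before and after the second entity.
    feats += [f"e2_before={w}" for w in tokens[max(0, s_start - k):s_start]]
    feats += [f"e2_after={w}" for w in tokens[s_end:s_end + k]]
    # 3. Words between the two entities.
    feats += [f"between={w}" for w in tokens[f_end:s_start]]
    # 4. Words inside each entity.
    feats += [f"e1_word={w}" for w in tokens[f_start:f_end]]
    feats += [f"e2_word={w}" for w in tokens[s_start:s_end]]
    # 5. Types of the two entities, as predicted by the CRF.
    feats.append(f"types={f_type}|{s_type}")
    # 6. Types of any entities lying between the two.
    feats += [
        f"type_between={t}"
        for (st, en, t) in others
        if f_end <= st and en <= s_start
    ]
    return feats
```

The resulting strings could then be one-hot encoded before being passed to an SVM; the encoding used in the original system is not described on the slide.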

12 System Architecture Basic strategy for medication event extraction

13 System Architecture Context Engine Medication Entry Generation
CONTEXT Engine: identifies medication entries under special section headings, such as “MEDICATIONS ON ADMISSION:” and “DISCHARGE MEDICATIONS:”, or in the narrative part of the clinical record.
Medication Entry Generation: combines the results from the previous steps, namely the CRF, SVM and CONTEXT engine.
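How the three outputs might be combined into entries can be sketched as follows. The data shapes are hypothetical: `related_pairs` stands in for the pairs the SVM accepted, `in_list` stands in for the CONTEXT engine's list/narrative decision, and keeping only the first value per field is a simplification, not the original system's rule.

```python
FIELDS = ("dosage", "mode", "frequency", "duration", "reason")


def generate_entries(medications, related_pairs, in_list):
    """Assemble one entry per medication from accepted relations.

    medications: medication names found by the NER step.
    related_pairs: (medication, (field_type, text)) tuples judged related.
    in_list: dict mapping a medication to True if it sits under a list heading.
    """
    entries = []
    for med in medications:
        entry = {"medication": med, **{field: None for field in FIELDS}}
        for m, (etype, text) in related_pairs:
            # Simplification: keep the first value seen for each field.
            if m == med and etype in FIELDS and entry[etype] is None:
                entry[etype] = text
        entry["list_or_narrative"] = "list" if in_list.get(med, False) else "narrative"
        entries.append(entry)
    return entries
```

A medication with no accepted relations still yields an entry, with every other field left empty.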

14 Outline
Methodology: Extraction Definition; Data Preparation; System Architecture
Results & Discussion: NER (CRF) Experiment; Relationship Classification (SVM); CONTEXT Engine Evaluation; Final Output Evaluation

15 NER (CRF) results
Comparison of exact-match performance using the 7 feature sets and the bag-of-words feature set (baseline, in brackets).

16 NER (CRF) results
The F-scores for DURATION and REASON using the 7 feature sets are approximately 10% higher than the baseline, because:
The frequencies of the REASON and DURATION entities are much smaller than those of the other four entity types.
For the DURATION entities, the rule-based regular expressions can match other, non-medication terms (low precision).
Some DURATION terms cannot be discovered by our rules (low recall).
REASON extraction depends highly on the Finding category in SNOMED CT and the performance of TTSCT (Patrick et al. 2007).

17 NER (CRF) results
The F-scores for Mode, Dosage and Frequency were improved by 2%. These errors come from:
1. Misspelling of drug names, such as “nitrog-lycerin”.
2. Drug names used in other contexts, such as the “coumadin” in the “Coumadin Clinic” phrase.
3. The drug allergies detector cannot cover all situations.

18 Relation Classification
Comparison of relation classification performance in the SVM using all feature sets and a subset of the feature sets (1, 2, 4; baseline, in brackets). At the unigram sentence level, the baseline F-score is 70.38% for the HAS RELATIONSHIP set and 95.20% for the NO RELATIONSHIP set. The difference can be attributed to the fact that the NO RELATIONSHIP set is 7 times larger than the HAS RELATIONSHIP set. High performance is achieved: the F-score for the HAS RELATIONSHIP set is 98.39% at the unigram sentence level and 96.47% at the bigram sentence level, indicating few if any systematic errors.
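The effect of class imbalance on per-class F-scores can be seen with a small worked sketch. The labels and counts below are toy values chosen only to mimic a 7:1 ratio; they are not the presentation's data, and the function names are hypothetical.

```python
def f_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_f(gold, predicted, label):
    """F1 for one class, treating every other label as negative."""
    tp = sum(g == label and p == label for g, p in zip(gold, predicted))
    fp = sum(g != label and p == label for g, p in zip(gold, predicted))
    fn = sum(g == label and p != label for g, p in zip(gold, predicted))
    return f_score(tp, fp, fn)


# Toy data: 2 related pairs and 14 unrelated pairs (a 7:1 ratio).
gold = ["HAS"] * 2 + ["NO"] * 14
# A single mistake: one related pair is predicted as unrelated.
predicted = ["HAS", "NO"] + ["NO"] * 14
```

With that one error, `per_class_f(gold, predicted, "HAS")` is about 0.67 while `per_class_f(gold, predicted, "NO")` is about 0.97, since the same mistake costs the minority class far more.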

19 CONTEXT Engine Evaluation
The CONTEXT engine was adopted to discover the span of the medication list (the span between the medication heading and the next heading).
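Span detection of this kind can be sketched with two regular expressions. Both patterns are assumptions made for illustration (an all-caps line ending in a colon counts as a heading, and any heading containing "MEDICATION" opens a list); the actual CONTEXT engine rules are not given in the slides.

```python
import re

# Assumed heading shape: an upper-case line ending with a colon.
HEADING = re.compile(r"^[A-Z][A-Z /]*:$")
# Assumed marker for medication headings.
MED_HEADING = re.compile(r"MEDICATIONS?")


def medication_list_spans(lines):
    """Return (start, end) line ranges, end exclusive, of medication lists.

    A span starts on the line after a medication heading and stops at the
    next heading of any kind, or at the end of the record.
    """
    spans = []
    start = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if HEADING.match(stripped):
            if start is not None:
                spans.append((start, i))
                start = None
            if MED_HEADING.search(stripped):
                start = i + 1
    if start is not None:
        spans.append((start, len(lines)))
    return spans
```

Lines inside a returned span would be treated as list entries, and medications found elsewhere as narrative mentions.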

20 Final Output Evaluation
Due to errors in the NER, Relationship Classification and Medication Entry Generator, the final F-scores for each entity type are lower than in the NER processing. The final scores for the medication event are between 86.23%. The identification of medication is low, which causes lower relationship identification among other medication events. Multiple REASONs appear relatively frequently, and the multiple REASONs should be used to construct multiple medication entries in the gold standard. As a result, a loss in REASON recognition decreases the recall of all other entity types and of the medication event.

21 Thank you!

