
1 Andreea Bodnari,¹ Peter Szolovits,¹ Ozlem Uzuner²
¹ MIT, CSAIL, Cambridge, MA, USA
² Department of Information Studies, University at Albany SUNY, Albany, NY, USA
10.16.2012, Rochester, MN
MCORES: a system for noun phrase coreference resolution for clinical records
2012 SHARPn Summit “Secondary Use”

2 Medical coreference resolution system (MCORES)
Experimental results
Conclusion

3 Electronic Medical Records (EMRs) – large information repositories
Clinical information requires processing
  • Lower level: sentence parsing, tokenization
  • Higher level: coreference resolution, semantic disambiguation
Coreference resolution: a fundamental step in text processing

4 English medical corpus provided by i2b2 National Center for Biomedical Computing
  • De-identified medical discharge summaries
    ▪ Source: PH & BIDMC
    ▪ Content: 230 (PH) + 196 (BIDMC) discharge summaries
  • Annotated concepts and coreference chains
Concept types: Persons, Problems, Treatments, Tests, Pronouns

5 NP → Instance Creation → Feature Generation → Classification → Clustering → Output

6 Markables of the same semantic category are paired together
MCORES creates positive instances only from neighboring markable pairs in a chain¹
¹ Instance creation akin to McCarthy and Lehnert
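The pairing rule on this slide can be sketched roughly as follows. This is a minimal illustration, not the MCORES implementation: the `Markable` record and `create_instances` function are hypothetical names, and the treatment of negative instances (every same-category pair drawn from different chains) is an assumption, since the slide only specifies how positive instances are formed.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Markable(NamedTuple):
    mid: int               # position of the markable in document order
    category: str          # semantic category, e.g. "problem" or "test"
    chain: Optional[int]   # gold chain id, or None for a singleton


def create_instances(markables):
    """Return (antecedent_id, anaphor_id, label) triples for same-category pairs."""
    ordered = sorted(markables, key=lambda m: m.mid)

    # For each chained markable, remember its immediate predecessor in the chain.
    last_in_chain = {}
    prev_in_chain = {}
    for m in ordered:
        if m.chain is not None:
            prev_in_chain[m.mid] = last_in_chain.get(m.chain)
            last_in_chain[m.chain] = m.mid

    instances = []
    for a, b in combinations(ordered, 2):
        if a.category != b.category:
            continue  # only markables of the same semantic category are paired
        if a.chain is not None and a.chain == b.chain:
            # Positive instance only when a and b are neighbors in the chain;
            # non-neighboring same-chain pairs are skipped.
            if prev_in_chain.get(b.mid) == a.mid:
                instances.append((a.mid, b.mid, True))
        else:
            # Assumption: pairs from different chains become negative instances.
            instances.append((a.mid, b.mid, False))
    return instances
```

With a chain of three problems, only the two neighboring links produce positive instances; the first-to-third pair is left out.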

7 Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.

8 Multi-perspective features
  • Antecedent perspective
  • Anaphor perspective
  • Greedy perspective
  • Stingy perspective
Feature categories: phrase-level lexical, sentence-level lexical, syntactic, semantic, miscellaneous

9 Phrase-level lexical
  Token overlap*
  Normalized token overlap
  Edit distance
  Normalized edit distance
Sentence-level lexical
  Sentence-level token overlap*
  Filtered sentence-level token overlap*
  Left and right mention overlap
    • stingy and greedy perspectives only
* multi-perspective feature
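To make the phrase-level features concrete, here is a small sketch of token overlap and edit distance. The slides do not define the four perspectives, so the reading used here is an assumption: the antecedent and anaphor perspectives normalize the shared-token count by the respective phrase length, greedy takes the larger of the two values, and stingy takes the smaller. The function names are illustrative, and normalizing edit distance by the longer string is likewise an assumed choice.

```python
def token_overlap(antecedent: str, anaphor: str) -> dict:
    """Shared-token ratio under four assumed perspectives."""
    a = set(antecedent.lower().split())
    b = set(anaphor.lower().split())
    shared = len(a & b)
    by_antecedent = shared / len(a) if a else 0.0
    by_anaphor = shared / len(b) if b else 0.0
    return {
        "antecedent": by_antecedent,
        "anaphor": by_anaphor,
        # Assumption: greedy = most favorable value, stingy = least favorable.
        "greedy": max(by_antecedent, by_anaphor),
        "stingy": min(by_antecedent, by_anaphor),
    }


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (cs != ct),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(s: str, t: str) -> float:
    """Edit distance divided by the longer string's length (assumed scheme)."""
    longest = max(len(s), len(t))
    return edit_distance(s, t) / longest if longest else 0.0
```

For the pair "chest pain" and "severe chest pain", the antecedent perspective gives full overlap while the anaphor perspective gives two thirds, which is the kind of asymmetry a multi-perspective feature exposes to the classifier.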

10 Syntactic
  Number agreement
  Noun overlap*
  Surname match
Semantic
  UMLS CUI overlap*
  UMLS CUI token overlap*
  UMLS semantic type overlap*
  Anaphor UMLS semantic type
* multi-perspective feature

11 Token distance
Mention distance
All-mention distance
Sentence distance
Section match
Section distance
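The distance features listed on this slide could be computed along these lines. The slides give only the feature names, so the definitions below are assumptions: mention distance counts intervening markables of the anaphor's category, while all-mention distance counts intervening markables of any category. The `Mention` record and its fields are hypothetical, and section distance is omitted because it would need an ordering over sections that the slides do not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start_token: int   # index of the first token
    end_token: int     # index of the last token (inclusive)
    sentence: int      # sentence index within the document
    section: str       # name of the enclosing section
    category: str      # semantic category


def distance_features(antecedent: Mention, anaphor: Mention, all_mentions) -> dict:
    """Distance-style features for an (antecedent, anaphor) pair."""
    # Markables lying strictly between the antecedent and the anaphor.
    between = [
        m for m in all_mentions
        if antecedent.end_token < m.start_token and m.end_token < anaphor.start_token
    ]
    return {
        "token_distance": anaphor.start_token - antecedent.end_token - 1,
        # Assumed: same-category markables in between.
        "mention_distance": sum(m.category == anaphor.category for m in between),
        # Assumed: markables of any category in between.
        "all_mention_distance": len(between),
        "sentence_distance": anaphor.sentence - antecedent.sentence,
        "section_match": antecedent.section == anaphor.section,
    }
```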

12 C4.5 decision tree algorithm
  • Flexible
  • Readable prediction model
Classify pairs of markables based on values of the feature vectors
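C4.5 chooses its splits by gain ratio, that is, information gain divided by the split information. The sketch below scores a single threshold split on one numeric feature; it illustrates the criterion only and is not a full C4.5 learner or the classifier configuration used by MCORES. The function names and the example feature values are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold: float) -> float:
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info
```

A tree learner would evaluate candidate thresholds for each feature, such as a token-overlap value, and split on the one with the highest gain ratio. The resulting tree is readable because each path is a sequence of such threshold tests.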

13 Classifier makes pairwise predictions only
Pairwise predictions are clustered into coreference chains
  • Aggressive-merge¹ clustering algorithm: a prediction [M1] - [M2] is merged with all preceding pairwise predictions linked to [M1] or [M2]
¹ Aggressive-merge algorithm proposed by McCarthy and Lehnert
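One way to realize this kind of merging is to treat every positive pairwise prediction as a link and take the connected components. The sketch below uses a union-find structure for that; it is an illustration under that reading of aggressive merging, not the MCORES code, and the function name and integer markable ids are assumptions.

```python
def aggressive_merge(num_markables: int, positive_pairs):
    """Cluster markables 0..num_markables-1 into chains from positive pair links."""
    parent = list(range(num_markables))

    def find(x: int) -> int:
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in positive_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Merge the two clusters; the smaller id becomes the root.
            parent[max(ra, rb)] = min(ra, rb)

    chains = {}
    for m in range(num_markables):
        chains.setdefault(find(m), []).append(m)
    # Singletons are not coreference chains, so drop them.
    return [chain for chain in chains.values() if len(chain) > 1]
```

Because merging is transitive, a single false-positive link can join two otherwise separate chains, which is why pairwise precision matters for this clustering step.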

14 Feature set evaluation
Perspectives evaluation
Performance evaluation against
  • In-house baseline
  • Third-party systems (RECONCILE ACL09 & BART)
Evaluation metric: unweighted averages of Recall, Precision, and F-measure over
  • MUC
  • B³
  • CEAF
  • BLANC
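As an illustration of one of the listed metrics, the sketch below computes B³ precision, recall, and F-measure. It assumes gold and predicted chains cover the same set of mentions, with singletons given as one-element chains; the function name is illustrative, and this is not the scoring script used in the evaluation.

```python
def b_cubed(gold_chains, predicted_chains):
    """Return (precision, recall, f_measure) under the B-cubed metric."""
    gold_of = {m: frozenset(chain) for chain in gold_chains for m in chain}
    pred_of = {m: frozenset(chain) for chain in predicted_chains for m in chain}
    if gold_of.keys() != pred_of.keys():
        raise ValueError("gold and predicted chains must cover the same mentions")

    precision = recall = 0.0
    for m, gold in gold_of.items():
        pred = pred_of[m]
        correct = len(gold & pred)         # mentions correctly grouped with m
        precision += correct / len(pred)
        recall += correct / len(gold)
    n = len(gold_of)
    precision, recall = precision / n, recall / n
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

The unweighted average named on the slide would then be the plain mean of such scores across MUC, B³, CEAF, and BLANC.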


16 MCORES’ advantage comes from linking markables with no token overlap
Phrase-level sub-MCORES performs similarly to MCORES
Greedy perspective system is the most favorable single-perspective system
Multi-perspective system performs as well as or better than single-perspective systems
Error analysis
  • MCORES fails to classify misspelled person pairs
  • Medical problems: false positives due to the difference between new and recurring events
  • Treatments: false positives due to medications with different routes of administration
  • Tests: false positives due to the large number of full-overlap instances that did not corefer

17 Developed a coreference resolution system for the medical domain (MCORES)
MCORES innovates through a multi-perspective and knowledge-based feature set
MCORES outperforms third-party systems and an in-house baseline, improving coreference resolution on clinical records

