Published by Eunice Adams
Henk Harkema, Andrea Setzer, Ian Roberts, Rob Gaizauskas, Mark Hepple (University of Sheffield)
Jeremy Rogers (University of Manchester)
Richard Power (Open University)
Extraction and Analysis of Information from Structured and Unstructured Clinical Records
AHM 2005 Text Mining Workshop, 29/9/5
2 Overview
Background
Information Extraction
Information Integration
3 Background: CLEF
Clinical e-Science Framework
Objective: To develop a high-quality, secure and interoperable information repository, derived from operational electronic patient records, to enable ethical and user-friendly access to patient information in support of clinical care and biomedical research
Duration, funding, participants:
  2003 – 2005 (CLEF), 2005 – 2007 (CLEF-Services)
  Funded by the Medical Research Council (MRC)
  Six universities, Royal Marsden Hospital, industrial partners engaged through CLEF Industrial Forum Meetings
4 Sheffield NLP & CLEF
Information Extraction
  Analyzing clinical narratives to extract medically relevant entities and events, and their properties and relationships
Information Importation
  Importing extracted information into the CLEF repository
Information Integration
  Combining extracted information with structured information (i.e., non-narrative data) already in the repository in order to build a summary of the patient’s conditions and treatment over time
5 Medical IE
Standard Information Extraction tasks:
  Entity/event extraction & relationship extraction
Additional challenges:
  Cross-document event co-reference
    Same event mentioned in multiple documents; many documents provide only partial descriptions of events
  Modality of Information
    Negation: “I cannot feel any lump in her right supraclavicular fossa”
    Uncertainty: “I just wonder if there is an outside possibility that she might have mediastinal fibrosis to account for her symptomology”
  Temporality of Information
6 Entities, Events & Relationships
Entities, events:
  Problem: melanoma, swelling, …
    Present/absent
    Clinical course: getting worse, getting better, no change
  Intervention: amputation, chemotherapy, …
    Status: planned, booked, started, completed, …
  Investigation: CT scan, ultrasound, …
    Status: planned, booked, started, completed, …
  Goal: treat, cure, palliate
  Drug: Atenolol, antibiotics, …
  Locus: abdomen, blood, …
    Laterality: left, right
7 Entities, Events & Relationships
Relationships:
  Location of problem (problem, locus):
    hip pain
    lesions in her liver
  Finding of investigation (investigation, problem):
    An ECG examination revealed atrial fibrillation
    CT scan of her thorax and abdomen shows progressive disease
  Target of intervention (intervention, locus):
    radiotherapy to back
    breast radiotherapy
  Further relationships
8 IE Approach
Pipeline of processing modules
  Pre-processing:
    Tokenization, sentence splitting
  Lexical & terminological processing:
    Morphological analysis, term look-up, term parsing
  Syntactic & semantic processing:
    Sentence-based syntactic, semantic analysis
  Discourse processing & IE pattern application:
    Integration of semantic representations into discourse model
    Application of patterns to collect information to be extracted
9 Terminology Processing
Termino: a large-scale terminological resource to support term processing for information extraction, retrieval, and navigation
Termino contains a database holding large numbers of terms imported from various existing terminological resources, including UMLS
Efficient recognition of terms in text is achieved through the use of finite state recognizers compiled from the contents of the database
The results of lexical look-up in Termino can feed into further term processing components, e.g., a term parser
10 Terminology Processing
Termino for CLEF
  Imported 160,000 terms from UMLS drawn from semantic types such as pharmacologic substances, anatomical structures, therapeutic procedures, diagnostic procedures, …
Term grammars
  Rules for combining terms identified by term look-up in Termino into longer terms
  Example: locations in the lung
    Grammar rule: location_np → latitude_adj area_noun
    Termino entries:
      latitude_adj: upper, middle, lower, mid, basal
      area_noun: zone, region, area, field, lung, lobe
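The term grammar example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LEXICON` dictionary is a hypothetical stand-in for Termino look-up (the real system uses compiled finite state recognizers), and the function names `lookup` and `find_location_nps` are invented here.

```python
# Hypothetical mini-lexicon standing in for Termino look-up results.
LEXICON = {
    "latitude_adj": {"upper", "middle", "lower", "mid", "basal"},
    "area_noun": {"zone", "region", "area", "field", "lung", "lobe"},
}


def lookup(token):
    """Return the lexical category of a token, or None if it is not listed."""
    word = token.lower()
    for category, words in LEXICON.items():
        if word in words:
            return category
    return None


def find_location_nps(text):
    """Apply the rule location_np -> latitude_adj area_noun to adjacent tokens."""
    # Naive tokenization (assumption): strip periods and commas, split on spaces.
    tokens = text.replace(".", " ").replace(",", " ").split()
    found = []
    for left, right in zip(tokens, tokens[1:]):
        if lookup(left) == "latitude_adj" and lookup(right) == "area_noun":
            found.append(f"{left} {right}")
    return found


print(find_location_nps("Shadowing in the left upper zone and the right lower lobe."))
```

The rule only fires on an adjective immediately followed by an area noun, so "upper zone" and "lower lobe" are combined into longer terms while isolated words such as "left" are ignored.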
11 Information Extraction Patterns
IE patterns inspect syntactic and semantic analyses and assert properties of entities and relationships between entities
Example: finding of investigation
  “CT scan of her thorax shows progressive disease”
  IE pattern:
    invest_finding(I, P) if
      investigation(I), problem(P), show_event(S),
      lsubj(S, I), lobj(S, P).
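To make the pattern concrete, here is a minimal Python sketch that evaluates the invest_finding rule over a small set of facts. The fact representation (tuples of predicate name and arguments) and the identifiers e1 to e4 are assumptions made for illustration; the slides do not show how the system stores its semantic analyses.

```python
def invest_finding(facts):
    """Return (I, P) pairs where a show event S has investigation I as its
    logical subject and problem P as its logical object."""
    unary = {"investigation": set(), "problem": set(), "show_event": set()}
    lsubj, lobj = set(), set()
    for pred, *args in facts:
        if pred in unary:
            unary[pred].add(args[0])
        elif pred == "lsubj":
            lsubj.add(tuple(args))
        elif pred == "lobj":
            lobj.add(tuple(args))

    results = []
    for s in sorted(unary["show_event"]):
        for s1, i in sorted(lsubj):
            for s2, p in sorted(lobj):
                if (s1 == s and s2 == s
                        and i in unary["investigation"]
                        and p in unary["problem"]):
                    results.append((i, p))
    return results


# Hypothetical analysis of "CT scan of her thorax shows progressive disease".
FACTS = [
    ("investigation", "e1"),  # CT scan
    ("problem", "e2"),        # progressive disease
    ("show_event", "e3"),     # shows
    ("lsubj", "e3", "e1"),    # logical subject of "shows" is the CT scan
    ("lobj", "e3", "e2"),     # logical object of "shows" is the disease
    ("locus", "e4"),          # thorax; not used by this pattern
]

print(invest_finding(FACTS))
```

A logic programming engine would express the same join declaratively; the nested loops here only spell out the conjunction of conditions in the rule body.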
12 Information Extraction Patterns
Finding patterns
  Hand-crafted patterns
  “Redundancy” approach:
    given a patient for whom a relationship between two particular entities is known to exist (e.g., we know the patient has a tumor in his lung), …
    find all sentences in all notes of this patient that contain these two entities, …
    and assume these sentences express the same relationship
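The sentence-collection step of the “redundancy” approach might look roughly like the sketch below. It assumes plain substring matching and splitting on periods, both simplifications introduced here; the actual system would rely on its own entity recognition and sentence splitter. The function name `candidate_sentences` and the sample notes are hypothetical.

```python
def candidate_sentences(notes, entity_a, entity_b):
    """Collect every sentence, across all notes of one patient, that mentions
    both entities. These sentences are assumed to express the known relation."""
    a, b = entity_a.lower(), entity_b.lower()
    hits = []
    for note in notes:
        for sentence in note.split("."):  # naive sentence splitting (assumption)
            lowered = sentence.lower()
            if a in lowered and b in lowered:
                hits.append(sentence.strip())
    return hits


# Invented notes for a patient known to have a tumor located in the lung.
NOTES = [
    "There is a tumor in the left lung. She feels well.",
    "The lung tumor has not grown. No new tumor elsewhere.",
]

for sentence in candidate_sentences(NOTES, "tumor", "lung"):
    print(sentence)
```

The collected sentences can then serve as examples from which new extraction patterns are derived, at the cost of some noise when two entities co-occur without standing in the known relationship.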
13 Information Integration
Combining structured information in the repository with information extracted from narratives into a coherent overview of the patient’s condition and treatment over time
Issues in Information Integration:
  Ambiguity: given an event extracted from a narrative, to which event in the structured data does it correspond?
  Fragmentation & duplication: Information Extraction over narrative data produces a collection of potentially fragmented and duplicated descriptions of medical events, which need to be sorted out
Investigation of the contribution of temporal information found within narratives to Information Integration
14 Linking extracted and structured events
Reduce ambiguity through use of:
  Medical information: type of event, relationships, …
  Temporal information: time stamps, temporal expressions, verbal tense & aspect, …
Events in structured data:
  Type: X-RAY, Location: chest, Date: 2000-05-23
  Type: X-RAY, Location: chest, Date: 2000-05-26
  Type: MRI, Location: abdomen, Date: 2000-05-23
  Type: X-RAY, Location: chest, Date: 2000-07-19
Events in narratives:
  2000-05-16: Chest X-RAY arranged for next week.
  2000-05-24: The chest X-RAY performed …
15 Constraint Satisfaction
Ambiguity reduction as a Constraint Satisfaction problem
  Each narrative event is associated with a time domain, i.e., the set of possible dates on which the event could have taken place
  Temporal and medical information extracted from narratives is formulated as a set of constraints on the time domain of the narrative event
  Use Constraint Logic Programming tools to resolve the time domains of narrative events
  If the resolved time domain of a narrative event contains the date of a structured event, link the narrative event to the structured event
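The idea can be illustrated with plain Python filtering over a finite set of dates. This is not a Constraint Logic Programming tool and not the authors' implementation; it is a sketch under several assumptions: the initial domain is taken to be 90 days either side of the letter date, “next week” is read as the Monday-to-Sunday week after the letter, and a past event is constrained to lie strictly before the letter date. All function names are invented, and the data is the example from the linking slide.

```python
from datetime import date, timedelta


def initial_domain(letter_date, window_days=90):
    """Time domain before any constraint: all dates near the letter (assumption)."""
    return {letter_date + timedelta(days=d)
            for d in range(-window_days, window_days + 1)}


def resolve(domain, constraints):
    """Keep only the dates that satisfy every constraint."""
    return {d for d in domain if all(check(d) for check in constraints)}


def next_week(letter_date):
    """Constraint for 'next week': the calendar week after the letter's week."""
    monday = letter_date - timedelta(days=letter_date.weekday()) + timedelta(days=7)
    sunday = monday + timedelta(days=6)
    return lambda d: monday <= d <= sunday


def before(letter_date):
    """Constraint for a past event: strictly earlier than the letter date."""
    return lambda d: d < letter_date


def link(resolved_domain, structured_events, event_type):
    """Link to structured events of the same type dated inside the domain."""
    return [e["date"] for e in structured_events
            if e["type"] == event_type and e["date"] in resolved_domain]


STRUCTURED = [
    {"type": "X-RAY", "location": "chest", "date": date(2000, 5, 23)},
    {"type": "X-RAY", "location": "chest", "date": date(2000, 5, 26)},
    {"type": "MRI", "location": "abdomen", "date": date(2000, 5, 23)},
    {"type": "X-RAY", "location": "chest", "date": date(2000, 7, 19)},
]

# "Chest X-RAY arranged for next week." in a letter dated 2000-05-16
letter1 = date(2000, 5, 16)
domain1 = resolve(initial_domain(letter1), [next_week(letter1)])
print(link(domain1, STRUCTURED, "X-RAY"))

# "The chest X-RAY performed ..." in a letter dated 2000-05-24
letter2 = date(2000, 5, 24)
domain2 = resolve(initial_domain(letter2), [before(letter2)])
print(link(domain2, STRUCTURED, "X-RAY"))
```

Under these assumptions the first narrative event keeps two candidates (2000-05-23 and 2000-05-26), while the second resolves to the single X-ray of 2000-05-23. Temporal constraints reduce the ambiguity but do not always remove it.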
16 Evaluation
Evaluation of the effectiveness of temporal constraints in Information Integration
  Link each narrative event to the set of potentially matching events of the same type in the structured data according to medical constraints
  Measure how well the application of temporal constraints narrows down this initial set of “structured” candidates
We used a semi-automated pipeline to produce an idealised version of what a fully automatic system would provide as the input to the CSP component
  Results must be viewed in the light of the idealised input
17 Data and Gold Standard
Confined to investigation events
Patient notes of 5 patients analysed and annotated (large overhead of manual annotation)
446 documents, of which 94 contain 152 investigation events
Manually created Gold Standard linking each narrative event to the structured events of the same type and to its correct targets
18 Annotating Temporal Information
We annotate times, events (i.e., investigations) and the temporal relations holding between these
The annotation scheme used is a subset of the TimeML annotation scheme
Example: “We have arranged an MRI scan for next week.”
  Temporal relation between “MRI scan” and “next week”: during
19 Evaluation: Recall & Precision
We want to quantify the impact of using temporal constraints to reduce the ambiguity of mapping narrative events to structured events
Ideally, temporal constraints should greatly reduce ambiguity by eliminating incorrect candidates from the set of possible targets in the structured data, but not eliminate the true target
Global evaluation measures:
  Recall: proportion of correct targets recognised as possible targets
  Precision: proportion of recognised possible targets that are correct
We applied both metrics before and after application of temporal constraints in CSP and compared the results
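A small Python sketch of the two global measures follows. It assumes the gold standard and the system output are both given as mappings from narrative event ids to sets of structured event ids, and that counts are pooled over all events before dividing; the identifiers n1, s1, etc. are invented.

```python
def recall_precision(gold, candidates):
    """Global recall and precision, pooled over all narrative events.

    gold:       narrative event id -> set of correct structured targets
    candidates: narrative event id -> set of targets still considered possible
    """
    correct_total = sum(len(targets) for targets in gold.values())
    proposed_total = sum(len(candidates.get(ev, set())) for ev in gold)
    hits = sum(len(gold[ev] & candidates.get(ev, set())) for ev in gold)
    recall = hits / correct_total if correct_total else 0.0
    precision = hits / proposed_total if proposed_total else 0.0
    return recall, precision


# Toy data: n1 keeps its true target among three candidates, n2 has lost it.
GOLD = {"n1": {"s1"}, "n2": {"s2"}}
CANDIDATES = {"n1": {"s1", "s3", "s4"}, "n2": {"s1"}}

print(recall_precision(GOLD, CANDIDATES))
```

In this toy case one of two correct targets survives (recall 0.5) and one of four remaining candidates is correct (precision 0.25).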
20 Evaluation: Strict & Liberal Accuracy
The limitation of the Recall and Precision metrics is that they are computed over the data set as a whole, i.e., over all events for all 5 patients
If even a small number of events retain a large number of possible targets, the overall precision score will be low even though most events are close to being correctly resolved
Consequently, we developed two “accuracy” based scores (liberal and strict), which quantify for each narrative event the extent to which it is correctly resolved, and then average across all narrative events
  Liberal score for a single event: 1 if at least one true target is correctly preserved, 0 otherwise
  Strict score for a single event: proportion of recognised possible targets that are correct
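The per-event scores can be sketched as follows, using the same assumed data layout as for recall and precision (mappings from narrative event ids to sets of structured event ids). Scoring an event with no remaining candidates as 0 under the strict measure is an assumption made here; the slides do not say how that case is handled.

```python
def liberal_score(gold_targets, candidates):
    """1 if at least one true target is still among the candidates, else 0."""
    return 1.0 if gold_targets & candidates else 0.0


def strict_score(gold_targets, candidates):
    """Proportion of the remaining candidates that are true targets."""
    if not candidates:
        return 0.0  # assumption: an event with no candidates scores 0
    return len(gold_targets & candidates) / len(candidates)


def accuracy(gold, candidates, score):
    """Average a per-event score across all narrative events."""
    events = list(gold)
    return sum(score(gold[ev], candidates.get(ev, set())) for ev in events) / len(events)


# Toy data: n1 keeps its true target among three candidates, n2 has lost it.
GOLD = {"n1": {"s1"}, "n2": {"s2"}}
CANDIDATES = {"n1": {"s1", "s3", "s4"}, "n2": {"s1"}}

print(accuracy(GOLD, CANDIDATES, liberal_score))
print(accuracy(GOLD, CANDIDATES, strict_score))
```

Because each event contributes equally to the average, a single event with many leftover candidates cannot dominate the score the way it does in pooled precision.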
21 Results
                   Before CSP   After CSP
Recall             1.0          0.94
Precision          0.05         0.09
Liberal Accuracy   0.83         0.78
Strict Accuracy    0.08         0.27
22 Discussion
The results show that there is a substantial amount of ambiguity at the start, which is reduced by the application of temporal constraints, as best shown by the strict accuracy score
A large degree of ambiguity remains, but …
  Use of temporal information is conservative
    E.g., a “past” narrative event is linked to all structured events dated before the date of the letter, but could heuristically be linked to the one structured event dated immediately before the date of the letter
  We have not yet exploited additional medical information, e.g., the locus of an investigation, nor additional temporal information, e.g., temporal relationships between events
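The difference between the conservative linking and the suggested heuristic can be shown in a few lines of Python. The function names and the letter date are hypothetical; the structured dates are the chest X-rays from the earlier linking example.

```python
from datetime import date


def conservative_targets(letter_date, structured_dates):
    """Conservative linking: every structured event dated before the letter."""
    return sorted(d for d in structured_dates if d < letter_date)


def nearest_preceding_target(letter_date, structured_dates):
    """Heuristic linking: only the event dated immediately before the letter."""
    earlier = conservative_targets(letter_date, structured_dates)
    return earlier[-1] if earlier else None


XRAY_DATES = [date(2000, 5, 23), date(2000, 5, 26), date(2000, 7, 19)]
LETTER = date(2000, 7, 25)  # hypothetical letter mentioning a past chest X-ray

print(conservative_targets(LETTER, XRAY_DATES))
print(nearest_preceding_target(LETTER, XRAY_DATES))
```

The heuristic trades recall for precision: it leaves a single candidate, but that candidate is wrong whenever the letter refers to an older investigation.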
23 Conclusions & Future Work
Information Extraction
  Essential functionality implemented
  Extending coverage of system
  Evaluating performance
Information Integration
  Initial assessment of approach
  Automating processing pipeline
  Extending method to other events