Henk Harkema Andrea Setzer Ian Roberts Rob Gaizauskas Mark Hepple University of Sheffield Jeremy Rogers University of Manchester Richard Power Open University.


Henk Harkema Andrea Setzer Ian Roberts Rob Gaizauskas Mark Hepple University of Sheffield Jeremy Rogers University of Manchester Richard Power Open University Extraction and Analysis of Information from Structured and Unstructured Clinical Records AHM 2005 Text Mining Workshop 29/9/5

2 Overview Background Information Extraction Information Integration

3 Background: CLEF Clinical e-Science Framework Objective: To develop a high quality, secure and interoperable information repository, derived from operational electronic patient records to enable ethical and user-friendly access to patient information in support of clinical care and biomedical research Duration, funding, participants: 2003 – 2005 (CLEF), 2005 – 2007 (CLEF-Services) Funded by Medical Research Council (MRC) Six universities, Royal Marsden Hospital, industrial partners engaged through CLEF Industrial Forum Meetings

4 Sheffield NLP & CLEF Information Extraction Analyzing clinical narratives to extract medically relevant entities and events, and their properties and relationships Information Importation Importing extracted information into the CLEF repository Information Integration Combining extracted information with structured information (i.e., non-narrative data) already in repository in order to build summary of patient’s conditions and treatment over time

5 Medical IE Standard Information Extraction tasks: Entity/event extraction & relationship extraction Additional challenges: Cross-document event co-reference Same event mentioned in multiple documents; many documents provide only partial descriptions of events Modality of Information Negation: “I cannot feel any lump in her right supraclavicular fossa” Uncertainty: “I just wonder if there is an outside possibility that she might have mediastinal fibrosis to account for her symptomology” Temporality of Information

6 Entities, Events & Relationships Entities, events: Problem: melanoma, swelling, … Present/absent Clinical course: getting worse, getting better, no change Intervention: amputation, chemotherapy, … Status: planned, booked, started, completed, … Investigation: CT scan, ultrasound, … Status: planned, booked, started, completed, … Goal: treat, cure, palliate Drug: Atenolol, antibiotics, … Locus: abdomen, blood, … Laterality: left, right

7 Entities, Events & Relationships Relationships: Location of problem: problem → locus ("hip pain", "lesions in her liver") Finding of investigation: investigation → problem ("An ECG examination revealed atrial fibrillation", "CT scan of her thorax and abdomen shows progressive disease") Target of intervention: intervention → locus ("radiotherapy to back", "breast radiotherapy") Further relationships

8 IE Approach Pipeline of processing modules Pre-processing: Tokenization, sentence splitting Lexical & terminological processing: Morphological analysis, term look-up, term parsing Syntactic & semantic processing: Sentence-based syntactic, semantic analysis Discourse processing & IE pattern application: Integration of semantic representations into discourse model Application of patterns to collect information to be extracted
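The pipeline idea on this slide can be sketched as a chain of stage functions that each enrich a shared document record. This is only a minimal illustration covering the pre-processing step: the stage names, the dictionary layout, and the regular expressions are assumptions made for the example, not the modules used in CLEF.

```python
import re

def split_sentences(doc):
    # Naive sentence splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"])
    doc["sentences"] = [s.strip() for s in parts if s.strip()]
    return doc

def tokenize(doc):
    # Naive tokenizer (assumption): runs of word characters, or single punctuation marks.
    doc["tokens"] = [re.findall(r"\w+|[^\w\s]", s) for s in doc["sentences"]]
    return doc

def run_pipeline(text, stages):
    # Each stage receives the document record and returns it with new fields added.
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc

doc = run_pipeline(
    "CT scan shows progressive disease. Review in clinic.",
    [split_sentences, tokenize],
)
```

Later stages (term look-up, parsing, discourse processing) would be added to the list in the same way, each reading the fields written by the stages before it.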

9 Terminology Processing Termino: a large-scale terminological resource to support term processing for information extraction, retrieval, and navigation Termino contains a database holding large numbers of terms imported from various existing terminological resources, including UMLS Efficient recognition of terms in text is achieved through use of finite state recognizers compiled from contents of database The results of lexical look-up in Termino can feed into further term processing components, e.g., term parser

10 Terminology Processing Termino for CLEF Imported 160,000 terms from UMLS drawn from semantic types such as pharmacologic substances, anatomical structures, therapeutic procedures, diagnostic procedures, … Term grammars Rules for combining terms identified by term look-up in Termino into longer terms Example: locations in the lung location_np → latitude_adj area_noun latitude_adj: upper, middle, lower, mid, basal area_noun: zone, region, area, field, lung, lobe
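As a rough illustration of the term grammar rule for lung locations, the sketch below scans a token list for a latitude adjective immediately followed by an area noun. The word lists come from the slide; the function name and the simple linear scan are assumptions for the example (Termino itself is described as using compiled finite state recognizers).

```python
# Word classes taken from the slide's example rule.
LATITUDE_ADJ = {"upper", "middle", "lower", "mid", "basal"}
AREA_NOUN = {"zone", "region", "area", "field", "lung", "lobe"}

def find_location_nps(tokens):
    """Return (start, end, text) spans matching latitude_adj followed by area_noun."""
    spans = []
    for i in range(len(tokens) - 1):
        first, second = tokens[i].lower(), tokens[i + 1].lower()
        if first in LATITUDE_ADJ and second in AREA_NOUN:
            spans.append((i, i + 2, " ".join(tokens[i:i + 2])))
    return spans

# Made-up example sentence, already tokenized on whitespace.
example = "Shadowing in the left lower lobe and right mid zone".split()
matches = find_location_nps(example)
```

Here `matches` holds the two-token spans "lower lobe" and "mid zone", which a later component could treat as single location terms.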

11 Information Extraction Patterns IE patterns inspect syntactic and semantic analyses and assert properties of entities and relationships between entities Example: finding of investigation “CT scan of her thorax shows progressive disease” IE pattern: invest_finding(I, P) if investigation(I), problem(P), show_event(S), lsubj(S, I), lobj(S, P).
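The rule on this slide reads like a logic-programming clause. A minimal Python sketch of the same check is given below, assuming the syntactic and semantic analysis has already been flattened into a set of fact tuples; the tuple layout and the identifiers such as `ct_scan` are hypothetical and chosen only for the example.

```python
def invest_findings(facts):
    """Apply invest_finding(I, P) :- investigation(I), problem(P),
    show_event(S), lsubj(S, I), lobj(S, P) to a set of fact tuples."""
    investigations = {f[1] for f in facts if f[0] == "investigation"}
    problems = {f[1] for f in facts if f[0] == "problem"}
    show_events = {f[1] for f in facts if f[0] == "show_event"}
    results = set()
    for s in show_events:
        # Logical subjects and objects attached to this particular show event.
        subjects = {f[2] for f in facts if f[0] == "lsubj" and f[1] == s}
        objects = {f[2] for f in facts if f[0] == "lobj" and f[1] == s}
        for i in subjects & investigations:
            for p in objects & problems:
                results.add((i, p))
    return results

# Hypothetical facts for "CT scan of her thorax shows progressive disease".
facts = {
    ("investigation", "ct_scan"),
    ("problem", "progressive_disease"),
    ("show_event", "shows"),
    ("lsubj", "shows", "ct_scan"),
    ("lobj", "shows", "progressive_disease"),
}
findings = invest_findings(facts)
```

With these facts the pattern asserts a single finding relationship between the CT scan and the progressive disease.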

12 Information Extraction Patterns Finding patterns Hand-crafted patterns “Redundancy” approach: given a patient for whom a relationship between two particular entities is known to exist (e.g., we know patient has a tumor in his lung), … find all sentences in all notes of this patient that contain these two entities, … and assume these sentences express the same relationship
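The "redundancy" approach amounts to collecting every sentence in a patient's notes that mentions both entities of a known relationship. The sketch below shows that collection step only; the function name, the plain substring matching, and the split on full stops are simplifying assumptions, and the notes are invented for illustration.

```python
def candidate_sentences(notes, entity_a, entity_b):
    """Collect sentences that mention both entities of a known relationship."""
    hits = []
    for note in notes:
        # Naive sentence split on full stops (assumption for this sketch).
        for sentence in note.split("."):
            text = sentence.strip()
            lowered = text.lower()
            if entity_a.lower() in lowered and entity_b.lower() in lowered:
                hits.append(text)
    return hits

# Invented notes for a patient known to have a tumor in the lung.
notes = [
    "She has a tumor in the left lung. No other findings.",
    "The lung tumor is stable. Review in 3 months.",
]
sentences = candidate_sentences(notes, "tumor", "lung")
```

The collected sentences would then serve as candidate examples from which patterns for the relationship could be derived, under the assumption stated on the slide that they express the same relationship.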

13 Information Integration Combining structured information in repository with information extracted from narratives into coherent overview of patient’s condition and treatment over time Issues in Information Integration: Ambiguity: given an event extracted from a narrative, to which event in the structured data does it correspond? Fragmentation & duplication: Information Extraction over narrative data produces collection of potentially fragmented and duplicated descriptions of medical events which need to be sorted out Investigation of contribution of temporal information found within narratives to Information Integration

14 Linking extracted and structured events Reduce ambiguity through use of: Medical information: type of event, relationships, … Temporal information: time stamps, temporal expressions, verbal tense & aspect, … [Figure: events in structured data (three X-RAY events with location chest and one MRI event with location abdomen, each with a date) to be linked with events in narratives ("Chest X-RAY arranged for next week", "The chest X-RAY performed …")]

15 Constraint Satisfaction Ambiguity reduction as a Constraint Satisfaction problem Each narrative event is associated with a time domain, i.e., set of possible dates on which event could have taken place Temporal and medical information extracted from narratives is formulated as set of constraints on time domain of narrative event Use Constraint Logic Programming tools to resolve time domains of narrative events If resolved time domain of narrative event contains date of structured event, link narrative event to structured event
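The time-domain idea can be illustrated with plain set filtering: start from a set of possible dates, keep only those that satisfy every constraint, and link the narrative event to structured events whose dates survive. This is a sketch under stated assumptions, not the Constraint Logic Programming set-up the slide refers to; the dates, the 14-day reading of "next week", and all function names are hypothetical.

```python
from datetime import date, timedelta

def initial_domain(start, end):
    # Time domain: every calendar day from start to end inclusive.
    return {start + timedelta(days=n) for n in range((end - start).days + 1)}

def resolve(domain, constraints):
    # Keep only the dates that satisfy all constraints.
    return {d for d in domain if all(check(d) for check in constraints)}

def link(resolved_domain, structured_events):
    # Link to structured events whose date lies in the resolved domain.
    return [e for e in structured_events if e["date"] in resolved_domain]

letter_date = date(2005, 3, 10)  # hypothetical date of the letter
domain = initial_domain(date(2005, 3, 1), date(2005, 3, 31))

# "Chest X-RAY arranged for next week": after the letter, within 14 days (assumed reading).
constraints = [
    lambda d: d > letter_date,
    lambda d: d <= letter_date + timedelta(days=14),
]

# Hypothetical structured events of the same type and location.
structured = [
    {"type": "X-RAY", "location": "chest", "date": date(2005, 3, 2)},
    {"type": "X-RAY", "location": "chest", "date": date(2005, 3, 16)},
    {"type": "X-RAY", "location": "chest", "date": date(2005, 4, 20)},
]
linked = link(resolve(domain, constraints), structured)
```

In this made-up case only the X-RAY dated 16 March falls inside the resolved time domain, so the narrative event is linked to that one structured event.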

16 Evaluation Evaluation of effectiveness of temporal constraints in Information Integration Link each narrative event to set of potentially matching events of same type in structured data according to medical constraints Measure how well application of temporal constraints narrows down this initial set of “structured” candidates We used a semi-automated pipeline to produce an idealised version of what a fully automatic system would provide as the input to the CSP component Results must be viewed in the light of the idealised input

17 Data and Gold Standard Confined to investigation events Patient notes of 5 patients analysed and annotated (large overhead of manual annotation) 446 documents, of which 94 contain 152 investigation events Manually created Gold Standard linking each narrative event to structured events of the same type, and correct targets

18 Annotating Temporal Information We annotate times, events (i.e., investigations) and temporal relations holding between these The annotation scheme used is a subset of the TimeML annotation scheme Example: “We have arranged an MRI scan for next week.” (temporal relation annotated in the example: during)

19 Evaluation: Recall & Precision We want to quantify the impact of using temporal constraints to reduce the ambiguity of mapping narrative events to structured events Ideally, temporal constraints should greatly reduce ambiguity by eliminating incorrect candidates from the set of possible targets in structured data – but not eliminate the true target Global evaluation measures: Recall: proportion of correct targets recognised as possible targets Precision: proportion of recognised possible targets that are correct We applied both metrics before and after application of temporal constraints in CSP and compared the results
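The two global measures can be computed by pooling counts over all narrative events. The sketch below assumes each narrative event maps to a set of candidate structured events and a set of gold-standard targets; the identifiers and the toy data are invented and are not the study's results.

```python
def recall_precision(candidates, gold):
    """Global recall and precision pooled over all narrative events.

    candidates, gold: dict mapping narrative event id -> set of structured event ids.
    """
    correct_kept = sum(len(candidates.get(e, set()) & gold[e]) for e in gold)
    total_gold = sum(len(gold[e]) for e in gold)
    total_candidates = sum(len(candidates.get(e, set())) for e in gold)
    recall = correct_kept / total_gold if total_gold else 0.0
    precision = correct_kept / total_candidates if total_candidates else 0.0
    return recall, precision

# Toy data: two narrative events, each with one true target.
gold = {"n1": {"s1"}, "n2": {"s4"}}
before_csp = {"n1": {"s1", "s2", "s3"}, "n2": {"s4", "s5"}}
after_csp = {"n1": {"s1"}, "n2": {"s4", "s5"}}

before = recall_precision(before_csp, gold)
after = recall_precision(after_csp, gold)
```

In this toy case recall stays at 1.0 because no true target is eliminated, while precision rises from 2/5 to 2/3 as incorrect candidates are removed.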

20 Evaluation: Strict & Liberal Accuracy The limitation of the Recall and Precision metrics is that they score for the overall data set – i.e. over all events for all 5 patients If even only a small number of events retain a large number of possible targets, the overall precision score will be low even though most events are close to being correctly resolved Consequently, we developed two “accuracy” based scores (liberal and strict), which quantify for each narrative event the extent to which it is correctly resolved, and then average across all narrative events Liberal score for single event: 1 if at least one true target is correctly preserved, 0 otherwise Strict score for single event: proportion of recognised possible targets that are correct
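The per-event scores defined on this slide can be sketched as follows. The data layout matches a mapping from narrative events to candidate sets; scoring an event with no remaining candidates as 0 on the strict measure is an assumption of this sketch, and the toy data is invented.

```python
def liberal_strict(candidates, gold):
    """Average liberal and strict accuracy over all narrative events."""
    if not gold:
        return 0.0, 0.0
    liberal_scores, strict_scores = [], []
    for event, true_targets in gold.items():
        kept = candidates.get(event, set())
        hits = kept & true_targets
        # Liberal: 1 if at least one true target is preserved, 0 otherwise.
        liberal_scores.append(1.0 if hits else 0.0)
        # Strict: proportion of kept candidates that are correct (0 if none kept, assumed).
        strict_scores.append(len(hits) / len(kept) if kept else 0.0)
    n = len(gold)
    return sum(liberal_scores) / n, sum(strict_scores) / n

# Toy data: event n1 is fully resolved, event n2 keeps one wrong candidate.
gold_targets = {"n1": {"s1"}, "n2": {"s4"}}
resolved = {"n1": {"s1"}, "n2": {"s4", "s5"}}
scores = liberal_strict(resolved, gold_targets)
```

Because the scores are averaged per event, one event with many leftover candidates lowers the strict score only in proportion to its share of events, rather than dominating a pooled precision figure.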

21 Results [Table: Recall, Precision, Liberal Accuracy and Strict Accuracy, each reported Before CSP and After CSP; the values are not recoverable from this transcript]

22 Discussion The results show that there is a substantial amount of ambiguity at the start, which is reduced by application of temporal constraints, as best shown by the strict accuracy score A large degree of ambiguity remains, but … Use of temporal information is conservative E.g., a “past” narrative event is linked to all structured events dated before the date of the letter, but could heuristically be linked to the one structured event dated immediately before the date of the letter We have not yet exploited additional medical information, e.g., the locus of an investigation, nor additional temporal information, e.g., temporal relationships between events

23 Conclusions & Future Work Information Extraction Essential functionality implemented Extending coverage of system Evaluating performance Information Integration Initial assessment of approach Automating processing pipeline Extending method to other events