Information Extraction from Clinical Reports
Wendy W. Chapman, PhD
University of Pittsburgh, Department of Biomedical Informatics



Background
- 1994: B.A. in Linguistics & Chinese, University of Utah
- 2000: Ph.D. in Medical Informatics, University of Utah (Peter Haug)
- 2003: Postdoctoral Fellowship, University of Pittsburgh (Bruce Buchanan, Greg Cooper)
- 2003-present: Faculty, University of Pittsburgh

Problems Being Addressed with IE: My Work
- Identifying patients with pneumonia from chest radiograph reports
- Understanding the components of a clearly written radiology report
  – To train radiologists to dictate
- Classifying patients into syndrome categories from chief complaints
  – Cough/SOB → Respiratory patient
- Characterizing a patient's clinical state from ED reports
  – Outbreak detection
  – Outbreak investigation
- NLP-assisted ontology learning
- Locating pathology specimens

Problems Being Addressed with IE: Future Areas of Application I Would Like to Work On
- Learning genotype-phenotype patterns for diseases
- Quality control
  – Ensuring physicians comply with the core measures required by Medicare
  – Looking for medical errors
- Automatically assigning billing codes

Where Is the Field Now?
- The field has mainly focused on sentence-level problems
  – Identifying clinical conditions, therapies, and medications
- A few systems encode characterizing information for conditions

- There has been less work on discourse-level tasks, which are crucial for successful annotation of clinical texts
  – Contextual features: negation, uncertainty, experiencer, temporality, finding validation
  – Coreference resolution
  – Inference

What Technologies Work? IE of Clinical Concepts (80% "simple", 20% difficult)
- Shallow parsing is quite effective
  – MetaMap can identify many of the UMLS concepts in texts
- Concept-value pairs are important, and regular expressions are quite effective for them
  – "temperature 39C"
- The structure of the report is important
  – Neck: no lymphadenopathy → cervical lymphadenopathy
  – CXR: evidence of pneumonia → radiological evidence of pneumonia
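The concept-value and report-structure points can be sketched with Python's `re` module. This is a minimal illustration under stated assumptions, not the pattern set used in the work described: the temperature pattern, the `SECTION_QUALIFIERS` mapping, and the function names are all hypothetical, and negation is only flagged crudely here (ConText-style rules would handle it properly).

```python
import re

# Hypothetical pattern for temperature concept-value pairs such as
# "temperature 39C", "Temp: 38.5 C" or "temp 101.2F".
TEMP_PATTERN = re.compile(
    r"\b(?:temperature|temp)\s*[:=]?\s*(\d{2,3}(?:\.\d+)?)\s*°?\s*([CF])\b",
    re.IGNORECASE,
)

# Hypothetical mapping from a section heading to the qualifier it adds
# to findings written under that heading.
SECTION_QUALIFIERS = {
    "neck": "cervical",
    "cxr": "radiological",
}


def extract_temperatures(text):
    """Return (value, unit) pairs for every temperature mention in text."""
    return [(float(value), unit.upper()) for value, unit in TEMP_PATTERN.findall(text)]


def qualify_finding(line):
    """Turn 'Heading: finding' into (qualified concept, negated flag).

    Returns None when the line has no 'Heading:' prefix.
    """
    heading, sep, body = line.partition(":")
    if not sep:
        return None
    body = body.strip().rstrip(".")
    # Only a leading "no" is detected; real negation needs NegEx/ConText.
    negated = re.match(r"no\s+", body, re.IGNORECASE) is not None
    concept = re.sub(r"^no\s+", "", body, flags=re.IGNORECASE)
    qualifier = SECTION_QUALIFIERS.get(heading.strip().lower())
    if qualifier:
        concept = f"{qualifier} {concept}"
    return concept, negated


print(extract_temperatures("Vitals: temperature 39C, pulse 110"))  # [(39.0, 'C')]
print(qualify_finding("Neck: no lymphadenopathy"))  # ('cervical lymphadenopathy', True)
print(qualify_finding("CXR: evidence of pneumonia"))  # ('radiological evidence of pneumonia', False)
```

A lookup table keyed on the section heading keeps the report-structure knowledge separate from the patterns, so new headings can be added without touching the extraction code.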

Where Do We Need More Work?
- Non-contiguous information
  – Needs a deep parse
- Inference
  – "pain when press on left side of sternum" → non-pleuritic chest pain (semantic networks)
  – "Opacity consistent with pneumonia" → localized infiltrate (Bayesian networks)
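The Bayesian-network style of inference in the second example can be illustrated with the simplest such network: one hidden node (localized infiltrate present or absent) whose children are phrase features observed in the report, i.e. a naive Bayes structure. Every probability below is a made-up placeholder, not a value estimated from data, and the phrase features and function names are hypothetical; a real system would use a richer network learned from data or elicited from experts.

```python
# Hypothetical prior P(localized infiltrate); not estimated from any data.
PRIOR_INFILTRATE = 0.2

# Hypothetical likelihoods for each phrase feature:
# (P(feature seen | infiltrate), P(feature seen | no infiltrate))
LIKELIHOODS = {
    "opacity": (0.8, 0.1),
    "consistent with pneumonia": (0.6, 0.05),
}


def features_in(sentence):
    """Return the set of known phrase features that occur in the sentence."""
    text = sentence.lower()
    return {feature for feature in LIKELIHOODS if feature in text}


def posterior_infiltrate(observed):
    """P(infiltrate | observed features), enumerating both values of the
    hidden node; features are assumed independent given that node."""
    joint_present = PRIOR_INFILTRATE
    joint_absent = 1.0 - PRIOR_INFILTRATE
    for feature, (p_if_present, p_if_absent) in LIKELIHOODS.items():
        seen = feature in observed
        joint_present *= p_if_present if seen else 1.0 - p_if_present
        joint_absent *= p_if_absent if seen else 1.0 - p_if_absent
    return joint_present / (joint_present + joint_absent)


evidence = features_in("Opacity consistent with pneumonia.")
print(round(posterior_infiltrate(evidence), 3))  # 0.96
```

With both phrases observed, the placeholder numbers give 0.096 / (0.096 + 0.004) = 0.96; absent features also count as evidence, which is why the posterior drops well below the prior when neither phrase appears.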

What Technologies Work? Contextual Features (80% "simple", 20% difficult)
- Rules based on trigger terms work quite well
  – NegEx
  – ConText

Three Contextual Features
- Negation: Is the condition negated? (Negated / Affirmed)
- Patient experience: Did the patient experience the condition? (Yes / No)
- Temporality: When did the condition occur? (Historical / Recent / Hypothetical)
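One way to hold the three features is a small record per clinical condition. This Python sketch is an assumption about representation, not ConText's own data model; in particular, treating affirmed, patient, and recent as the values used when no trigger term applies is an assumption of this example.

```python
from dataclasses import dataclass
from enum import Enum


class Negation(Enum):
    AFFIRMED = "affirmed"
    NEGATED = "negated"


class Experiencer(Enum):
    PATIENT = "patient"  # "Yes": the patient experienced the condition
    OTHER = "other"      # "No": someone else did (e.g., a family member)


class Temporality(Enum):
    RECENT = "recent"
    HISTORICAL = "historical"
    HYPOTHETICAL = "hypothetical"


@dataclass
class ConditionAnnotation:
    """Contextual feature values for one clinical condition mention.

    The defaults are this sketch's assumption for mentions with no trigger term.
    """
    condition: str
    negation: Negation = Negation.AFFIRMED
    experiencer: Experiencer = Experiencer.PATIENT
    temporality: Temporality = Temporality.RECENT


cough = ConditionAnnotation("cough", negation=Negation.NEGATED)
print(cough)
```

Using enums rather than free strings means a misspelled value fails immediately instead of silently producing a new category.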

ConText Algorithm
- Four elements
  – Trigger terms
  – Pseudo-trigger terms
  – Scope of the trigger term
  – Termination terms
- Assigns the appropriate value to each contextual feature for clinical conditions within the scope of a trigger term
- The scope usually extends to the end of the sentence or until a termination term

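ConText: Determining Values for Contextual Features
- Based on the negation algorithm NegEx
- Example: "Patient denies cough but complains of headache."
  – "denies" is a trigger term and "but" is a termination term, so the scope covers "cough" but not "headache"
  – Clinical condition: Cough; Negation: Negated
- Example: "No change in the patient's chest pain."
  – "No change" is a pseudo-trigger term, so "No" does not negate chest pain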
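The four elements can be seen working together in a minimal, negation-only sketch of the ConText approach. The three lexicons are tiny hypothetical samples (the NegEx and ConText lexicons are far larger and cover all contextual features), the function names are made up, and each call processes a single sentence, so the end of the sentence always closes any open scope.

```python
import re

# Tiny hypothetical lexicons for illustration only.
NEGATION_TRIGGERS = {"denies", "no", "without", "negative for"}
PSEUDO_TRIGGERS = {"no change", "no increase"}
TERMINATION_TERMS = {"but", "however", "although"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def match_length(tokens, i, phrases):
    """Length in tokens of the longest phrase starting at position i (0 if none)."""
    best = 0
    for phrase in phrases:
        words = phrase.split()
        if tokens[i:i + len(words)] == words:
            best = max(best, len(words))
    return best


def negation_values(sentence, conditions):
    """Label each condition 'negated' if it falls inside a trigger's scope,
    otherwise 'affirmed'."""
    tokens = tokenize(sentence)
    values = {condition: "affirmed" for condition in conditions}
    in_scope = False
    i = 0
    while i < len(tokens):
        # Pseudo-triggers are checked first: they look like triggers but open no scope.
        n = match_length(tokens, i, PSEUDO_TRIGGERS)
        if n:
            i += n
            continue
        n = match_length(tokens, i, TERMINATION_TERMS)
        if n:
            in_scope = False  # a termination term closes the scope
            i += n
            continue
        n = match_length(tokens, i, NEGATION_TRIGGERS)
        if n:
            in_scope = True  # scope runs to a termination term or the sentence end
            i += n
            continue
        if in_scope:
            for condition in conditions:
                words = condition.lower().split()
                if tokens[i:i + len(words)] == words:
                    values[condition] = "negated"
        i += 1
    return values


print(negation_values("Patient denies cough but complains of headache.", ["cough", "headache"]))
# {'cough': 'negated', 'headache': 'affirmed'}
print(negation_values("No change in the patient's chest pain.", ["chest pain"]))
# {'chest pain': 'affirmed'}
```

Checking pseudo-triggers before triggers is what keeps "No change" from opening a negation scope; swapping that order would wrongly negate "chest pain".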

Evaluation of ConText
- Test set: 90 ED reports
- Reference standard: physician annotations with NLP-assisted review
  – 55 conditions
  – 3 contextual features
- Outcome measures: recall and precision

ConText's Performance (1,620 annotations)

Feature (n)          Recall   Precision
Negation (773)       97%      97%
Historical (98)      67%      74%
Hypothetical (40)    83%      94%
Experiencer (8)      100%     100%
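Recall and precision for a contextual feature can be computed by comparing the system's values with the reference standard, condition by condition. This is a sketch under assumptions: annotations are keyed by a hypothetical (report id, condition) pair, and the toy data below are invented for illustration, not taken from the evaluation above.

```python
def recall_precision(reference, system, target):
    """Recall and precision of `system` for one feature value, e.g. "negated".

    reference, system: dicts mapping (report_id, condition) -> value.
    """
    ref_positive = {key for key, value in reference.items() if value == target}
    sys_positive = {key for key, value in system.items() if value == target}
    true_positives = len(ref_positive & sys_positive)
    recall = true_positives / len(ref_positive) if ref_positive else 0.0
    precision = true_positives / len(sys_positive) if sys_positive else 0.0
    return recall, precision


# Invented toy annotations for illustration only.
reference = {("r1", "cough"): "negated", ("r1", "headache"): "affirmed",
             ("r2", "fever"): "negated"}
system = {("r1", "cough"): "negated", ("r1", "headache"): "affirmed",
          ("r2", "fever"): "affirmed"}
print(recall_precision(reference, system, "negated"))  # (0.5, 1.0)
```

Computing the measures per value (negated, historical, hypothetical, other experiencer) matches how the table above reports one row per feature.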

What Is Needed for the 20%?
- More knowledge modeling
  – Historicity often depends on the condition, not on explicit time triggers
  – Coreference resolution needs fine-grained semantic knowledge
- Statistical techniques
  – Integrating sentence-level and discourse-level information
- Annotated data sets

Why Haven't We Implemented Many NLP Applications?
- Are we addressing the best application areas?
- Do we need more semi-automated applications?

Sharing Clinical Reports
- University of Pittsburgh IRB
  – Chief complaints are non-human-subjects data: they can be shared openly as long as they cannot be used to triangulate a patient (chief complaint, age, hospital → patient)
  – To use clinical reports, we must apply De-ID software; once De-ID is applied, the reports are considered de-identified
- caBIG project: de-identified reports can be shared
- I hope to establish a repository
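The triangulation concern can be checked mechanically by counting how many records share each combination of chief complaint, age, and hospital; a combination that occurs only once could point to a single patient. This k-anonymity-style check is an illustrative assumption, not the IRB's criterion or the De-ID software mentioned above, and the field names and records are hypothetical.

```python
from collections import Counter

# Hypothetical field names for the quasi-identifiers named on the slide.
QUASI_IDENTIFIERS = ("chief_complaint", "age", "hospital")


def risky_combinations(records, k=2, fields=QUASI_IDENTIFIERS):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return [combo for combo, n in counts.items() if n < k]


# Invented records for illustration only.
records = [
    {"chief_complaint": "cough", "age": 34, "hospital": "A"},
    {"chief_complaint": "cough", "age": 34, "hospital": "A"},
    {"chief_complaint": "rash", "age": 91, "hospital": "B"},
]
print(risky_combinations(records))  # [('rash', 91, 'B')]
```

Raising `k` makes the check stricter; flagged combinations could then be generalized (for example, age bands instead of exact ages) before release.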

National Sharing
- Maybe as some institutions begin sharing, others will follow?
- Can the NLM help?
  – Apply de-identification
  – Encrypted hospital information
  – Password protection
  – Repository of texts and annotations
  – Folk annotations?

Annotation Sets of Ours
- Chief complaints (40,000)
  – Syndrome classifications
- ED reports
  – Syndrome classification
  – 55 respiratory-related clinical conditions, annotated for negation, experiencer, historical, and hypothetical
- 6 report types
  – All clinical conditions
  – Contextual features

Annotation Evaluation
- Measuring annotators'
  – Reliability
  – Agreement: more difficult if measuring agreement on what text was marked (F-measure)
- Measuring the quality of the annotation schema
  – Dependent variable = agreement between annotators
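Both kinds of measurement can be sketched briefly. For agreement on what text was marked, one common choice treats one annotator as the reference: precision is |A ∩ B| / |A|, recall is |A ∩ B| / |B|, and their harmonic mean simplifies to 2|A ∩ B| / (|A| + |B|), which is the same whichever annotator is the reference. For reliability on a fixed set of items, Cohen's kappa corrects observed agreement for chance. Exact-span matching, the (start, end, label) span format, and the toy data are assumptions of this sketch; lenient overlap matching is a common alternative.

```python
from collections import Counter


def span_f_measure(spans_a, spans_b):
    """Pairwise F-measure between two annotators' spans (exact match).

    Spans are (start, end, label) tuples; the result is symmetric in a and b.
    """
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: treated as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # kappa is undefined here; this sketch reports 1.0 by convention
    return (observed - expected) / (1 - expected)


# Invented toy annotations for illustration only.
spans_a = [(0, 5, "cough"), (10, 15, "fever")]
spans_b = [(0, 5, "cough"), (20, 24, "rash")]
print(span_f_measure(spans_a, spans_b))  # 0.5
print(cohen_kappa(["neg", "aff", "neg", "aff"], ["neg", "aff", "aff", "aff"]))  # 0.5
```

Kappa needs a shared set of items to label, which is why span-level agreement, where each annotator decides what to mark, is usually reported as an F-measure instead.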

[Figure: two panels, "Baseline Schema Stage" and "Annotation Schema Stage", each showing A, B, and C]

Photos courtesy Brian Chapman