Biomedical Informatics and Clinical NLP in Translational Science Research Piet C. de Groen, M.D.
Overview - Examples Patient-specific research – N=1 study Understanding a disease Finding the right MD, diagnosis and treatment
Renal Transplant patient May, 2005 Hepatobiliary Clinic Consultation Abnormal liver tests – using Lipitor™ Diarrhea and weight loss Challenge Very complex medical history Nobody understands the case HUGE history with hundreds of notes
Patient January 16, 2006 Total weight of printed pages presented for review: 5 lbs.
Patient January 16, 2006 Total number of X-rays presented for review: 16,902
Questions What exactly is the patient’s problem? –Are the abnormal liver tests and weight loss due to Lipitor? –When did she use Lipitor? –What was the weight on what date? Impossible to review all notes! –Which notes are relevant to current symptoms? –Which notes have weights and drug information?
What I need I need to see trends over time –Weight –Lipitor use –Effects of Lipitor on lipids and liver tests But I cannot see trends over time –EMR does not have structured data for weight or Lipitor use –EMR only allows for display of laboratory test results in very large tables or simple graphs
Data Warehouse to the Rescue! Demographics –MC # = xx-xxx-xxx Clinical Notes –Patient Vitals Weight exists Result –243 notes: 43 had weight (Chart annotations: Start Dialysis, Transplant, New Problem)
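The weight query above can be sketched in a few lines. This is only an illustration, assuming each note is available as a (date, free text) pair; the pattern `WEIGHT_RE`, the function `weight_trend`, and the sample notes are hypothetical and are not the warehouse's actual query interface.

```python
import re
from datetime import date

# Hypothetical pattern: matches e.g. "Weight: 54.0 kg" or "Weight 150 lbs".
WEIGHT_RE = re.compile(r"\bweight\b[:\s]*(\d+(?:\.\d+)?)\s*(kg|lbs?)\b", re.IGNORECASE)
LB_TO_KG = 0.45359237


def weight_trend(notes):
    """Return [(note_date, weight_kg)] sorted by date, for notes that record a weight."""
    trend = []
    for note_date, text in notes:
        match = WEIGHT_RE.search(text)
        if match is None:
            continue  # most notes carry no weight
        value = float(match.group(1))
        unit = match.group(2).lower()
        kg = value if unit == "kg" else value * LB_TO_KG
        trend.append((note_date, round(kg, 1)))
    return sorted(trend)


# Invented sample notes, for illustration only.
sample_notes = [
    (date(2005, 5, 2), "Vitals: Weight: 54.0 kg. Diarrhea persists."),
    (date(2003, 1, 10), "Follow-up visit, no vitals recorded."),
    (date(2001, 3, 7), "Weight 150 lbs, BP 130/80."),
]

for note_date, kg in weight_trend(sample_notes):
    print(note_date.isoformat(), kg)
```

Plotting the returned pairs against the dates of dialysis and transplant gives the kind of trend over time that the EMR itself could not display.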
What happened to Cholesterol? She was on Lipitor, but: –When was it discontinued? –Did it do anything to her lipid levels?
NLP to the rescue! Sort the 33 identified Clinical Notes by date First note is from 1997 –Lipitor is highlighted in the note –…Dr. X recommended discontinuation of Pravachol and initiation of Lipitor … have written a prescription for Lipitor … Last note is from 2005 –… Lipitor was discontinued in 2004 … –March 2004 note confirms discontinuation
Warehouse to the Rescue! Demographics –MC # = xx-xxx-xxx Tests –Cholesterol exists Clinical Notes –“Lipitor” Result –22 cholesterol levels –243 notes: 33 mentioned “Lipitor” Lipitor
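The drug search can be approximated as a keyword filter followed by a sort on date. The sketch below assumes notes are (date, text) pairs; `notes_mentioning`, `highlight`, and the sample notes are hypothetical names and data, not the system used in the case.

```python
import re
from datetime import date


def _drug_pattern(drug):
    # Whole-word, case-insensitive match on the drug name.
    return re.compile(r"\b" + re.escape(drug) + r"\b", re.IGNORECASE)


def notes_mentioning(notes, drug):
    """Return (date, text) pairs whose text mentions `drug`, oldest first."""
    pattern = _drug_pattern(drug)
    hits = [(d, t) for d, t in notes if pattern.search(t)]
    return sorted(hits, key=lambda pair: pair[0])


def highlight(text, drug):
    """Wrap every mention of `drug` in square brackets for quick reading."""
    return _drug_pattern(drug).sub(lambda m: "[" + m.group(0) + "]", text)


# Invented sample notes, for illustration only.
sample_notes = [
    (date(2005, 2, 1), "Lipitor was discontinued in 2004."),
    (date(1999, 4, 4), "Dialysis started."),
    (date(1997, 8, 15), "Have written a prescription for Lipitor."),
]

hits = notes_mentioning(sample_notes, "Lipitor")
first_date, first_text = hits[0]
last_date, last_text = hits[-1]
print(first_date.year, highlight(first_text, "Lipitor"))
print(last_date.year, highlight(last_text, "Lipitor"))
```

A keyword match only finds the notes; it does not interpret them. Whether a mention means "started", "continued", or "discontinued" still has to be read from the highlighted text, as was done here with the first and last notes.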
Recommendations 72 hour stool fat on 100 gram fat diet –689 gram, 23 gram fat/day (2-7 Normal) EGD/EUS with biopsies and aspirate –Esophagitis - ? Candida – biopsy negative –Duodenal diverticula, normal pancreas –Duodenal biopsy normal –Aerobes > 100,000 Gram negative bacillus cfu/mL –Anaerobes > 10,000 Bacteroides Fragilis cfu/mL –Yeast 1,000-10,000 cfu/mL Small Bowel X-ray –Numerous diverticula
Understanding a disease Hepatocellular Cancer in Obesity
Spring 2006 Based on simple queries of MCLSS For NASH the ICD-9 code was used; this code may include other diagnoses, but the vast majority are NASH For Primary Liver Cancer the ICD-9 codes and were used For Obesity ICD-9 code was used, or the Diagnosis section of Clinical Notes BMI was retrieved from Clinical Notes; the maximum value during the patient's lifetime was used
Primary Liver Cancer NASH Cases with BMI>30 Cases
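The "maximum BMI during lifetime" rule and the BMI>30 subset can be sketched as follows. The record layout, the function names, and the sample values are assumptions for illustration; the ICD-9 codes that define the case lists are not reproduced here.

```python
def max_bmi_by_patient(bmi_records):
    """Collapse (patient_id, bmi) records to the highest BMI seen per patient."""
    peak = {}
    for patient_id, bmi in bmi_records:
        if patient_id not in peak or bmi > peak[patient_id]:
            peak[patient_id] = bmi
    return peak


def cases_with_high_bmi(case_ids, peak_bmi, threshold=30.0):
    """Return the cases whose lifetime maximum BMI exceeds `threshold`.

    Cases with no BMI on record are left out rather than guessed.
    """
    return sorted(pid for pid in case_ids if pid in peak_bmi and peak_bmi[pid] > threshold)


# Invented sample data, for illustration only.
bmi_records = [("P1", 27.5), ("P1", 33.1), ("P2", 24.0), ("P3", 30.0), ("P1", 29.0)]
liver_cancer_cases = ["P1", "P2", "P3", "P4"]  # would come from the diagnosis-code query

peak = max_bmi_by_patient(bmi_records)
print(cases_with_high_bmi(liver_cancer_cases, peak))
```

Using the lifetime maximum rather than the most recent value matters for cancer patients, who often lose weight before diagnosis.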
Cancers with Increasing Incidence 2012 report US: 1999 through 2008 CA: A Cancer Journal for Clinicians Volume 62, Issue 2, pages , 4 JAN 2012 DOI: /caac
Finding the right MD, diagnosis and treatment Interval Colorectal Cancer
Time Line Example of Interval Colorectal Cancer Pathology Endoscopy Diagnoses Time Line Year Benign Colon, Colon Cancer, Non-Colon Disease < 3 years
Colon Cancer (Pathology data) 4,203,857 specimens 238,177 specimens Part description = “COL/RECT” AND Valid MCN 19,259 specimens 13,477 specimens (10,136 patients) (Endoscopy data) 325,370 Procedures 2,692 patients 4,743 procedures (date, other features) Missed Lesions (Anatomic location, tumor size, other characteristics) Diagnosis_code = One of 50 identified cancer diagnosis codes Unique? (One specimen may have multiple diagnosis codes) Patients with CC diagnosis and C procedure Extract all C procedures, the date and other features Compare the CC diagnosis and C dates Remove Patients with Research Authorization = ‘No’ Colonoscopy
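The selection steps on this slide can be read as a chain of filters followed by a date comparison. The sketch below is one possible reading, assuming one row per (specimen, diagnosis code) pair; the field names, the placeholder codes "CA1" and "CA2", and the sample rows are all hypothetical, and the order of the filters on the original flow chart may differ.

```python
from datetime import date


def colorectal_cancer_specimens(rows, cancer_codes):
    """Keep one row per colorectal cancer specimen from patients who authorized research use."""
    seen = set()
    kept = []
    for row in rows:
        if row["part_description"] != "COL/RECT" or not row["mcn"]:
            continue  # wrong site or no valid patient number
        if row["diagnosis_code"] not in cancer_codes:
            continue  # not one of the identified cancer codes
        if row["research_authorization"] == "No":
            continue  # patient declined research use
        if row["specimen_id"] in seen:
            continue  # a specimen may carry several diagnosis codes
        seen.add(row["specimen_id"])
        kept.append(row)
    return kept


def prior_colonoscopies(specimens, procedures):
    """Pair each cancer specimen with the patient's earlier colonoscopies and the gap in days."""
    by_patient = {}
    for mcn, proc_date in procedures:
        by_patient.setdefault(mcn, []).append(proc_date)
    pairs = []
    for spec in specimens:
        for proc_date in sorted(by_patient.get(spec["mcn"], [])):
            days = (spec["date"] - proc_date).days
            if days > 0:
                pairs.append((spec["mcn"], proc_date, days))
    return pairs


def make_row(specimen_id, mcn, part, code, auth, when):
    return {"specimen_id": specimen_id, "mcn": mcn, "part_description": part,
            "diagnosis_code": code, "research_authorization": auth, "date": when}


# Invented sample data, for illustration only.
rows = [
    make_row(1, "A", "COL/RECT", "CA1", "Yes", date(2004, 6, 1)),
    make_row(1, "A", "COL/RECT", "CA2", "Yes", date(2004, 6, 1)),  # same specimen, second code
    make_row(2, "B", "SKIN", "CA1", "Yes", date(2004, 7, 1)),
    make_row(3, "C", "COL/RECT", "CA1", "No", date(2004, 8, 1)),
]
procedures = [("A", date(2002, 6, 1)), ("A", date(2005, 1, 1)), ("C", date(2003, 1, 1))]

specimens = colorectal_cancer_specimens(rows, {"CA1", "CA2"})
pairs = prior_colonoscopies(specimens, procedures)
print(pairs)
```

The resulting (patient, colonoscopy date, interval) triples are the input for classifying each cancer as missed, seen, or recurrent.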
Methods Pathology = Colorectal Cancer Negative History Year Truly Missed No lesions at colonoscopy Probably Missed Seen, removed Lesions at colonoscopy Seen, not removed Colorectal Cancer History Recurrent, 2nd, 3rd cancer not prevented
Results Summary Truly missed case –90 days to 3 years Probably missed case –3 to 5 years A lesion was seen –removed <5 years –not removed <5 years Local recurrence or 2nd, 3rd cancer >44 54 >283 ©Ralph A. Clevenger
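The categories above amount to a small decision rule. The sketch below is an assumed encoding: the function name, the use of 365-day years, the handling of the exact boundaries, and the two labels for cases outside the listed windows are not stated on the slides.

```python
DAYS_3_YEARS = 3 * 365
DAYS_5_YEARS = 5 * 365


def classify_interval_cancer(days_since_colonoscopy, lesion_seen, lesion_removed, prior_crc):
    """Assign a colorectal cancer to one of the categories in the results summary."""
    if prior_crc:
        # Patients with a colorectal cancer history form their own group.
        return "recurrent, 2nd or 3rd cancer"
    if days_since_colonoscopy > DAYS_5_YEARS:
        return "outside 5-year window"  # assumed label, not on the slide
    if lesion_seen:
        return "seen, removed" if lesion_removed else "seen, not removed"
    if days_since_colonoscopy < 90:
        # Assumption: a cancer found within 90 days counts as detected by that colonoscopy.
        return "detected at colonoscopy"
    if days_since_colonoscopy <= DAYS_3_YEARS:
        return "truly missed"
    return "probably missed"


print(classify_interval_cancer(400, False, False, False))
print(classify_interval_cancer(1500, False, False, False))
print(classify_interval_cancer(400, True, True, False))
```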
Tumor Growth Curves Truly Missed Probably Missed Seen & Removed Recurrent, 2nd, 3rd Time Interval (days) Tumor Size (mm) t = 3 yrs 3 Months Doubling Time
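A growth curve with a fixed doubling time can be used to estimate how large a tumor was at the earlier colonoscopy. The sketch below rests on assumptions that the slide does not state: growth is exponential, the doubling time refers to tumor volume, and the tumor is roughly spherical, so diameter scales with the cube root of volume. The function name and the 91-day approximation of 3 months are also illustrative.

```python
def size_at_colonoscopy(size_mm_at_diagnosis, interval_days, doubling_days=91.0):
    """Back-extrapolate tumor diameter (mm) to the time of the earlier colonoscopy.

    Assumes exponential volume growth with a constant doubling time and a
    spherical tumor, so each volume doubling multiplies diameter by 2 ** (1/3).
    """
    doublings = interval_days / doubling_days
    diameter_factor = 2.0 ** (doublings / 3.0)
    return size_mm_at_diagnosis / diameter_factor


# Three years at a 3-month doubling time is 12 volume doublings,
# i.e. a 16-fold increase in diameter.
estimate = size_at_colonoscopy(32.0, interval_days=12 * 91)
print(round(estimate, 2))
```

Under these assumptions a 32 mm cancer found 3 years after a negative colonoscopy would have measured about 2 mm at that examination, which is why the shorter intervals point toward a lesion that was present and missed.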
Number Not Detected Number Seen Numbers for each Endoscopist Truly Missed Probably Missed Seen & Removed Recurrent, 2nd, 3rd
% Not Detected Miss Rate for each Endoscopist Truly Missed Probably Missed Seen & Removed
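A per-endoscopist miss rate can be computed from the classified cases. The definition used below, not detected divided by all cases attributed to that endoscopist, is an assumption, as are the function name and the sample data.

```python
from collections import Counter


def miss_rates(cases):
    """Return {endoscopist: percent of cases not detected}.

    `cases` is an iterable of (endoscopist_id, detected) pairs, where
    `detected` is False for a missed lesion.
    """
    totals = Counter()
    missed = Counter()
    for endoscopist, detected in cases:
        totals[endoscopist] += 1
        if not detected:
            missed[endoscopist] += 1
    return {e: 100.0 * missed[e] / totals[e] for e in totals}


# Invented sample data, for illustration only.
sample_cases = [
    ("E1", True), ("E1", True), ("E1", False), ("E1", True),
    ("E2", True),
]
rates = miss_rates(sample_cases)
print(rates)
```

Rates based on a handful of cases, such as E2 here, are unstable, so the counts per endoscopist should be shown next to the percentages.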
Detection of cancers in previously seen patients (self) Detection of cancers in patients seen by colleagues (others) Endoscopist