
Learning Cue Phrase Patterns from Radiology Reports Using a Genetic Algorithm
Robert M. Patton, Ph.D.
Managed by UT-Battelle for the Department of Energy


1 Learning Cue Phrase Patterns from Radiology Reports Using a Genetic Algorithm
Robert M. Patton, Ph.D., Applied Software Engineering Research

2 Current Status
• Worldwide, more women than ever are getting mammograms:
  – Radiologists cannot keep up with the growing number of readings they are facing
  – Growing shortage of radiologists
  – Human error in reading the images
  – Exams do not necessarily take the patient’s prior exams into consideration
  – Computer-aided detection (CAD) systems still need improvement
Images courtesy of Memorial Sloan-Kettering Cancer Center via http://www.cancerquest.org

3 Challenges
• What is “normal”? What is “abnormal”?
• Reports vary widely in the words used, regardless of BI-RADS rating
• Some radiologists “talk” a lot (i.e., write very wordy reports), while others say very little
• Longitudinal view: one patient’s exams may be read by N different radiologists over time

4 Characteristics of Reports
• Abnormal reports tend to show a wider variation in the language used
• Normal reports use more negation phrases than abnormal reports:
  – “no new focal masses”
  – “no radiographic lesions seen”
• Multiple ways to say the same thing:
  – “no radiographic evidence of malignancy”
  – “no mammographic findings of malignancy”
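The negation observation can be illustrated with a small Python sketch that scores each report by the fraction of its tokens that are negation cues. The cue set `NEGATION_CUES` and the function names are hypothetical; the slides only show negations beginning with “no”, and this is not presented as the author’s measure.

```python
import re

# Hypothetical cue list; the slides only show negations beginning with "no".
NEGATION_CUES = {"no", "not", "without"}


def tokenize(text: str) -> list:
    """Lowercase a report and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def negation_rate(report: str) -> float:
    """Fraction of a report's tokens that are negation cues (0.0 for empty text)."""
    tokens = tokenize(report)
    if not tokens:
        return 0.0
    return sum(tok in NEGATION_CUES for tok in tokens) / len(tokens)


normal = "No new focal masses. No radiographic lesions seen."
abnormal = "Left breast demonstrating apparent distortion."
print(negation_rate(normal))    # 0.25 (2 of 8 tokens)
print(negation_rate(abnormal))  # 0.0
```

A per-report rate normalizes for length, which matters because some radiologists write far more than others.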

5 Example Reports: Normal

6 Example Reports: Abnormal

7 Multiple Ways to Say the Same Thing
• Use skip-grams (s-grams) to represent patterns of phrases that have similar meaning
• S-grams are word pairs, kept in their sentence order, that allow arbitrary gaps between the two words
• Example:
  – “no radiographic evidence of malignancy”
  – “no mammographic findings of malignancy”
  – S-gram: the words “no” & “malignancy”
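The s-gram definition translates directly into code: every ordered pair of words from one sentence, with any number of words allowed in between. A minimal Python sketch (the function names and the optional `max_gap` cap are hypothetical additions, not from the slides):

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def sgrams(sentence, max_gap=None):
    """Ordered word pairs (w_i, w_j) with i < j from one sentence.

    Any number of words may separate the pair; pass max_gap to cap the
    number of intervening words (an optional, hypothetical knob).
    """
    tokens = tokenize(sentence)
    return {
        (tokens[i], tokens[j])
        for i, j in combinations(range(len(tokens)), 2)
        if max_gap is None or j - i - 1 <= max_gap
    }


a = sgrams("no radiographic evidence of malignancy")
b = sgrams("no mammographic findings of malignancy")
print(sorted(a & b))
# [('no', 'malignancy'), ('no', 'of'), ('of', 'malignancy')]
```

The shared pair (“no”, “malignancy”) is the slide’s example; the other shared pairs show that s-grams built on common words also match, which suggests some filtering or ranking would be useful in practice.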

8 Mammogram Classifier
• Goal: Develop a classifier based on s-grams that distinguishes abnormal from normal reports without labeled data or a training set (unsupervised learning)
• The classifier is based on Maximum Variation Sampling implemented as a genetic algorithm (GA):
  – 1st objective: Identify the most diverse reports (typically abnormal reports), then extract the s-grams they have in common
  – 2nd objective: From the failed individuals in the GA (typically normal reports), extract the negation s-grams they have in common
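One simple reading of “extract the s-grams they have in common” is a document-frequency count over the selected reports. The sketch below assumes that reading: `common_sgrams` keeps s-grams found in at least `min_count` reports, and `negation_sgrams` keeps only those whose first word is “no”, mirroring the second objective. The sentence splitting, the threshold, and the function names are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def report_sgrams(report):
    """Set of ordered word pairs taken within each sentence of a report."""
    pairs = set()
    for sentence in re.split(r"[.;?!]", report.lower()):
        tokens = re.findall(r"[a-z0-9']+", sentence)
        pairs.update(
            (tokens[i], tokens[j]) for i, j in combinations(range(len(tokens)), 2)
        )
    return pairs


def common_sgrams(reports, min_count=2):
    """(s-gram, number of reports containing it) for s-grams in >= min_count reports."""
    doc_freq = Counter()
    for report in reports:
        doc_freq.update(report_sgrams(report))  # a set, so each report counts once
    return [(sg, n) for sg, n in doc_freq.most_common() if n >= min_count]


def negation_sgrams(reports, min_count=2):
    """Second objective: keep only common s-grams that start with "no"."""
    return [(sg, n) for sg, n in common_sgrams(reports, min_count) if sg[0] == "no"]


failed = ["No new focal masses.", "No dominant focal lesion.", "No focal masses seen."]
print(negation_sgrams(failed))
# [(('no', 'focal'), 3), (('no', 'masses'), 2)]
```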

9 Maximum Variation Sampling
• Non-probabilistic sampling technique
• Seeks a sample that represents the largest diversity of data in the population
• Abnormal reports are easily identified with this approach, without any prior labeling or use of keywords:
  – Abnormal reports tend to be longer and to use more diverse and unique language than normal reports
• Implemented using a genetic algorithm
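The slides do not say how diversity is measured. As a stand-in, the sketch below scores a sample by the mean pairwise Jaccard distance between the reports’ word sets (1 minus shared words over all words), so a higher score means a more varied sample. The choice of Jaccard distance and the function names are assumptions.

```python
import re
from itertools import combinations


def words(report):
    """Lowercased word set of a report."""
    return set(re.findall(r"[a-z0-9']+", report.lower()))


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0.0 for identical word sets, 1.0 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def sample_diversity(reports):
    """Mean pairwise Jaccard distance over a sample (higher = more diverse)."""
    sets = [words(r) for r in reports]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


print(sample_diversity(["No new focal masses.", "No new focal masses."]))  # 0.0
print(sample_diversity(["No new focal masses.", "Spot compression views requested."]))  # 1.0
```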

10 GA Implementation
• Genetic representation: each individual is a sample of size N
  – Each gene value is a unique, real-valued document ID
  – Individual i: Gene 1 = Document 1 | Gene 2 = Document 2 | … | Gene N = Document N
• Fitness function: minimized [equation not preserved in the transcript]
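A compact Python sketch of this representation follows. Because the slide’s fitness equation is not available, `fitness` here is an assumed stand-in (mean pairwise Jaccard similarity of the sampled reports, to be minimized), and genes are integer list indices rather than real-valued IDs. Truncation selection, the de-duplicating crossover, and the replacement mutation are illustrative choices, not the author’s operators.

```python
import random
import re
from itertools import combinations


def words(report):
    """Lowercased word set of a report."""
    return frozenset(re.findall(r"[a-z0-9']+", report.lower()))


def fitness(individual, doc_words):
    """ASSUMED fitness, not the slide's equation: mean pairwise Jaccard
    similarity of the sampled documents. Lower means a more diverse sample."""
    total, pairs = 0.0, 0
    for i, j in combinations(individual, 2):
        union = doc_words[i] | doc_words[j]
        total += len(doc_words[i] & doc_words[j]) / len(union) if union else 1.0
        pairs += 1
    return total / pairs


def crossover(p1, p2, rng):
    """Child draws N genes from the parents' combined, de-duplicated IDs."""
    pool = list(dict.fromkeys(p1 + p2))
    return rng.sample(pool, len(p1))


def mutate(individual, num_docs, rng, rate=0.1):
    """Swap genes for document IDs not already in the sample (keeps IDs unique)."""
    child = list(individual)
    for k in range(len(child)):
        if rng.random() < rate:
            unused = [d for d in range(num_docs) if d not in child]
            if unused:
                child[k] = rng.choice(unused)
    return child


def mvs_ga(reports, n, pop_size=30, generations=50, seed=0):
    """Evolve samples of n document IDs; return the one with the lowest fitness."""
    rng = random.Random(seed)
    doc_words = [words(r) for r in reports]

    def score(ind):
        return fitness(ind, doc_words)

    pop = [rng.sample(range(len(reports)), n) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=score)[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(*rng.sample(elite, 2), rng), len(reports), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=score)


reports = ["No new focal masses."] * 6 + [
    "Left breast apparent distortion.",
    "Stereotactic core biopsy of microcalcification.",
    "Ultrasound-guided needle localization procedure.",
]
best = mvs_ga(reports, n=3)
print(sorted(best))
```

With six identical normal reports and three distinct ones, a good sample holds at most one of the identical reports, since any two of them have similarity 1. Keeping IDs unique in both crossover and mutation respects the slide’s requirement that each gene is a unique document.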

11 Data Set
• Primary data set: 9,000 patients studied over a 5-year period
• 120,000 reports, including:
  – Duplicate reports
  – Cancellation reports
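Since the 120,000 reports include duplicates and cancellation reports, these would presumably be removed before sampling. A minimal sketch, assuming duplicates are identical after case and whitespace normalization and that cancellations can be spotted by a word such as “cancelled” (the pattern `CANCEL_PATTERN` is hypothetical; real reports may mark cancellations differently):

```python
import re

# Hypothetical marker for cancellation reports; adjust to the real report format.
CANCEL_PATTERN = re.compile(r"\bcancel(?:l?ed|lation)?\b")


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def clean_reports(reports):
    """Drop repeated reports (after normalization) and apparent cancellation reports."""
    seen, kept = set(), []
    for report in reports:
        key = normalize(report)
        if key in seen or CANCEL_PATTERN.search(key):
            continue
        seen.add(key)
        kept.append(report)
    return kept


raw = ["No new masses.", "no new  masses.", "Exam cancelled by patient.", "Spot views requested."]
print(clean_reports(raw))  # ['No new masses.', 'Spot views requested.']
```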

12 Top Ten S-grams from the MVS-GA Best Solution
Rank | S-gram | Example | Observed variants
1 | left & breast | left breast demonstrating apparent distortion | 3640
2 | core & biopsy | stereotactic guided core biopsy of microcalcification | 636
3 | compression & views | additional bilateral anterior compression mlo views | 762
4 | spot & views | laterally exaggerated craniocaudal spot views | 838
5 | magnification & views | magnification views requested | 648
6 | spot & compression | mediolateral oblique spot compression views | 1094
7 | needle & localization | ultrasound-guided needle localization procedure | 233
8 | nodular & density | showing questionable increased nodular density | 2701
9 | lymph & node | atypically located intramammary lymph node | 717
10 | spot & magnification | breasts requiring spot magnification imaging | 624
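The slides do not define “Observed variants.” Under one plausible reading (an assumption), it counts the distinct word sequences that instantiate an s-gram, that is, each different span running from the first word to the second word within a sentence. A sketch of that count, with a hypothetical `variants` function:

```python
import re


def variants(sgram, reports):
    """Distinct spans from sgram[0] to the nearest following sgram[1] in a sentence."""
    first, second = sgram
    found = set()
    for report in reports:
        for sentence in re.split(r"[.;?!]", report.lower()):
            tokens = re.findall(r"[a-z0-9'-]+", sentence)
            for i, tok in enumerate(tokens):
                if tok != first:
                    continue
                for j in range(i + 1, len(tokens)):
                    if tokens[j] == second:
                        found.add(" ".join(tokens[i : j + 1]))
                        break  # keep only the nearest occurrence of the second word
    return found


reports = ["No new focal masses.", "No dominant focal lesion. No focal masses seen.",
           "No new focal masses."]
print(sorted(variants(("no", "masses"), reports)))
# ['no focal masses', 'no new focal masses']
```

The table’s examples include words outside the pair (for example, “stereotactic guided core biopsy of microcalcification”), so the original count may have been computed over wider phrases than this sketch uses.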

13 Top Ten S-grams Containing the Word "no"
Rank | S-gram | Example | Observed variants
1 | no & suspicious | no finding strongly suspicious | 1231
2 | no & masses | no new focal masses | 365
3 | no & focal | no dominant focal lesion | 210
4 | no & evidence | no evidence of cyst | 716
5 | no & specific | no specific palpable abnormality detected | 187
6 | no & findings | no current physical findings | 308
7 | no & mass | no development of abnormal dominant mass | 534
8 | no & mammographic | no persisting mammographic abnormalities | 390
9 | no & radiographic | no radiographic lesions seen | 285
10 | no & calcifications | no clear cut clustered punctate calcifications | 138

14 Future Work
• Additional work is needed to classify and identify more than two classes of reports
• Fuse text features with image features to develop a CAD training set
• Longitudinal analysis: develop methods to recognize precursors to abnormal conditions in patients

15 Longitudinal Patient Analysis
[Timeline figure; labels: spot magnification, simple cyst, nodular density, successful aspiration; span: 7 years]

16 Longitudinal Patient Analysis
May 1984:
  “There is prominent nodular density posteriorly and inferiorly in both breasts on the mediolateral oblique views, left more than right.” … “Prominent nodular tissue bilaterally in the posterior inferior breasts is interpretated as normal breast tissue”
Dec 1991:
  “A 9 mm nodule is present in the left breast at the 6 o'clock position. It was not definitely seen previously and may be a new finding.” … “New left inferior nodule and questionable new right superolateral microcalcifications. The patient should return for additional views with compression spot magnification and possible ultrasound for further evaluation.”
* Note: Emphasis added
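A first step toward this kind of longitudinal analysis could be to record, for each patient, the earliest report in which each s-gram of interest appears. The sketch below assumes reports arrive as (patient ID, date, text) tuples; the patient ID `"patient-1"` and the day-of-month in the example dates are placeholders, and the text snippets are adapted from the 1984 and 1991 excerpts above.

```python
import re
from collections import defaultdict
from datetime import date


def has_sgram(text, sgram):
    """True if sgram[0] is followed, with any gap, by sgram[1] within one sentence."""
    first, second = sgram
    for sentence in re.split(r"[.;?!]", text.lower()):
        tokens = re.findall(r"[a-z0-9']+", sentence)
        if first in tokens and second in tokens[tokens.index(first) + 1 :]:
            return True
    return False


def first_seen(reports, watch):
    """Map patient ID -> {s-gram: date of the earliest report containing it}."""
    timeline = defaultdict(dict)
    for patient, when, text in sorted(reports, key=lambda r: (r[0], r[1])):
        for sg in watch:
            if sg not in timeline[patient] and has_sgram(text, sg):
                timeline[patient][sg] = when
    return dict(timeline)


reports = [
    ("patient-1", date(1991, 12, 1),
     "The patient should return for additional views with compression spot magnification."),
    ("patient-1", date(1984, 5, 1),
     "There is prominent nodular density posteriorly and inferiorly in both breasts."),
]
watch = [("nodular", "density"), ("spot", "magnification")]
print(first_seen(reports, watch))
```

Because the input is sorted by patient and then date, reports can arrive in any order; the 1991 report is deliberately listed first here.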

17 Acknowledgements
• Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy.
• Our thanks to Robert M. Nishikawa, Ph.D., Department of Radiology, University of Chicago, for providing the large dataset of unstructured mammography reports.

18 Questions?
Robert M. Patton, Ph.D.
Applied Software Engineering Research
Oak Ridge National Laboratory
(865) 576-3832
pattonrm@ornl.gov
http://aser.ornl.gov

