Automatically Identifying Candidate Treatments from Existing Medical Literature Catherine Blake Information & Computer Science University.


2 Automatically Identifying Candidate Treatments from Existing Medical Literature Catherine Blake (cblake@ics.uci.edu) Information & Computer Science University of California, Irvine Wanda Pratt (wpratt@u.washington.edu) Information School and Division of Biomedical & Health Informatics University of Washington

3 Motivation Information overload –MEDLINE = 11 million citations –additional 8,000 each week Specialization of research –low communication between scientific areas –little focus on ‘big picture’

4 Goal Provide scientists with promising new treatment strategies Assumptions –Medical literature has implicit links –Deductive logic can identify these links: if A then B, and if B then C, then A → C

5 Previous Approach Swanson and Smalheiser (1997) Source Literature C: Migraine B-terms: Calcium Channel Blockers, Platelet Activity, Serotonin, ... Target Literature A: Magnesium
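The chaining idea behind this approach can be sketched in a few lines of Python. This is only an illustration, not Swanson and Smalheiser's system: the function name `candidate_links` and the two toy dictionaries are assumptions, and the terms are taken from the slide above.

```python
from collections import defaultdict

def candidate_links(c_to_b, b_to_a):
    """Chain C -> B and B -> A associations into candidate C -> A links.

    Returns a dict mapping (c, a) to the sorted B-terms that connect them.
    """
    links = defaultdict(set)
    for c, b_terms in c_to_b.items():
        for b in b_terms:
            # A B-term with no known A association contributes nothing.
            for a in b_to_a.get(b, ()):
                links[(c, a)].add(b)
    return {pair: sorted(bs) for pair, bs in links.items()}

# Toy associations (hypothetical data, terms borrowed from the slide).
c_to_b = {"migraine": {"serotonin", "platelet activity", "calcium channel blockers"}}
b_to_a = {"serotonin": {"magnesium"}, "platelet activity": {"magnesium"}}

result = candidate_links(c_to_b, b_to_a)
print(result)
```

Each candidate pair keeps the B-terms that support it, so a reader can inspect why a treatment was suggested.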

6 Current Pruning
                  Words    Distinct Words
No Pruning        14,051   2,762
Stemmed           13,112   2,492
Manual Pruning             150 - 200
Remove ‘redundancies and non-useful terms’: ~92-94% of B-terms are manually pruned!

7 Our Approach Semantic representation –Unify synonymous text expressions –e.g. Serotonin = {5-HT, 5HT, Enteramine, 5-Hydroxytryptamine, 3-(2-Aminoethyl)-1H-indol-5-ol} Prune using semantic types –e.g. Serotonin is a {Organic Chemical, Pharmacologic Substance, Neuroreactive Substance or Biogenic Amine}
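Unifying synonymous expressions amounts to a lookup from surface strings to one concept name. The sketch below assumes a hand-built synonym table (the Serotonin entry comes from the slide; `SYNONYMS`, `LOOKUP`, and `unify` are hypothetical names, not part of UMLS).

```python
# Synonym table keyed by preferred concept name (toy data from the slide).
SYNONYMS = {
    "Serotonin": [
        "5-HT", "5HT", "Enteramine", "5-Hydroxytryptamine",
        "3-(2-Aminoethyl)-1H-indol-5-ol",
    ],
}

# Invert the table: every lowercase surface form points at its concept.
LOOKUP = {
    surface.lower(): concept
    for concept, synonyms in SYNONYMS.items()
    for surface in [concept] + synonyms
}

def unify(expression):
    """Return the concept name for a known synonym, else the input unchanged."""
    return LOOKUP.get(expression.strip().lower(), expression)

unified = [unify(t) for t in ["5-HT", "enteramine", "Magnesium"]]
print(unified)
```

Unknown expressions pass through unchanged here; a real system would need a policy for unmapped text.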

8 Unified Medical Language System (UMLS) (1) Metathesaurus –311 vocabularies –776,940 concepts –~11 million relationships –2.10 million strings (2) Semantic Network –134 semantic types –54 semantic relations (3) SPECIALIST lexicon –POS + morphological –163,899 entries –133,945 nouns –13,179 verbs

9 Methodology Collect migraine citations Generate alternative features –word –concept –semantically pruned concepts Evaluate C → B connections

10 Word Representation Domain independent Common choice Title words (to compare with Swanson) Removed –417 generic stopwords* e.g. a, and, between, their, really, room, said, think, the, ... –31 medical stopwords e.g. clinical, observed, provide, selection, study, therapy, test, ... * Source: Sanderson, M. (1999) Available at http://www.dcs.gla.ac.uk/idom/ir_resources
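A minimal sketch of the word representation: tokenize a title and drop stopwords. The two sets below hold only the example words listed on the slide, not the full 417 and 31 word lists, and the tokenizer regex is an assumption.

```python
import re

# Only the examples shown on the slide; the real lists are much longer.
GENERIC_STOPWORDS = {"a", "and", "between", "their", "really", "room",
                     "said", "think", "the"}
MEDICAL_STOPWORDS = {"clinical", "observed", "provide", "selection",
                     "study", "therapy", "test"}
STOPWORDS = GENERIC_STOPWORDS | MEDICAL_STOPWORDS

def title_words(title):
    """Lowercase a title, split it into word tokens, and remove stopwords."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
    return [t for t in tokens if t not in STOPWORDS]

words = title_words("The clinical test: magnesium and migraine")
print(words)
```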

11 Concept Representation Medical specific Titles mapped to UMLS concepts Mapped automatically (1) partition title sentences into phrases (2) for each phrase (2a) direct concept match (UMLS API) (2b) if not found, approximate match (UMLS API) and select the best concept
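The mapping steps above can be sketched as follows. This does not call the UMLS API: `CONCEPTS` is a toy table standing in for the Metathesaurus, the phrase splitter is a crude assumption, and word-overlap (Jaccard) scoring is a stand-in for whatever approximate matching the API provides.

```python
import re

# Toy concept table standing in for the Metathesaurus (hypothetical data).
CONCEPTS = {
    "calcium channel blockers": "Calcium Channel Blockers",
    "migraine": "Migraine Disorders",
    "serotonin": "Serotonin",
}

def split_phrases(title):
    """(1) Partition a title at punctuation and a few function words."""
    parts = re.split(r"[,;:()]|\b(?:in|of|and|for|with|the)\b", title.lower())
    return [p.strip() for p in parts if p.strip()]

def exact_match(phrase):
    """(2a) Direct match of the whole phrase against the concept table."""
    return CONCEPTS.get(phrase)

def approx_match(phrase):
    """(2b) Pick the concept whose name shares the most words with the phrase."""
    words = set(phrase.split())
    best, best_score = None, 0.0
    for name, concept in CONCEPTS.items():
        name_words = set(name.split())
        score = len(words & name_words) / len(words | name_words)
        if score > best_score:
            best, best_score = concept, score
    return best  # None when no concept shares any word

def map_title(title):
    """Map a title to an ordered list of distinct concepts."""
    concepts = []
    for phrase in split_phrases(title):
        concept = exact_match(phrase) or approx_match(phrase)
        if concept and concept not in concepts:
            concepts.append(concept)
    return concepts

mapped = map_title("Calcium channel blockers in the treatment of migraine headache")
print(mapped)
```

Phrases with no exact or approximate match (here, "treatment") are simply dropped in this sketch.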

12 Semantically Pruned Concept Used 37 of 134 semantic types in UMLS: Substance; Chemical; Hormone; Gene or Genome; Enzyme; Cell; Amino Acid, Peptide or Protein; Neuroreactive Substance or Biogenic Amine; ... Goal: generalize semantic types not blinded to B-terms
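Semantic pruning can be illustrated as a set-intersection filter: a concept survives if at least one of its semantic types is in the kept set. `KEPT_TYPES` lists only some of the types named on the slide (not all 37), and the type assignments for "Headache" and "Patients" are illustrative assumptions.

```python
# A subset of the kept semantic types named on the slide.
KEPT_TYPES = {
    "Hormone", "Gene or Genome", "Enzyme", "Cell",
    "Amino Acid, Peptide or Protein",
    "Neuroreactive Substance or Biogenic Amine",
}

# Concept -> semantic types (Serotonin from the slides; others assumed).
CONCEPT_TYPES = {
    "Serotonin": {"Organic Chemical", "Pharmacologic Substance",
                  "Neuroreactive Substance or Biogenic Amine"},
    "Headache": {"Sign or Symptom"},
    "Patients": {"Patient or Disabled Group"},
}

def prune(concepts):
    """Keep concepts that have at least one semantic type in KEPT_TYPES."""
    return [c for c in concepts if CONCEPT_TYPES.get(c, set()) & KEPT_TYPES]

pruned = prune(["Serotonin", "Headache", "Patients"])
print(pruned)
```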

13 Evaluation Number of Relevant Items Step 1: Find potentially relevant titles –any representation + synonyms –e.g. calcium channel blockers: any word in { calcium, channel, blockers, blocker } Step 2: Verify each title –Not all relevant B-terms indicated relevant links –e.g. Timolol maleate, a beta blocker, in the treatment of common migraine headache ≠ calcium channel blocker 461 366

14 Evaluation - Metrics
(1) Precision = Number of relevant B-terms / Number of B-terms returned
(2) Recall = Number of relevant B-terms / Number of relevant titles
(3) Number of C → B links identified
(4) Feature space dimensionality
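As a worked example of metrics (1) and (2), the sketch below computes both ratios from counts, reading the numerators and denominators as they appear on the slide. The counts in the usage lines are made up for illustration and are not results from the study.

```python
def precision(relevant_b_terms, b_terms_returned):
    """Fraction of returned B-terms that are relevant."""
    return relevant_b_terms / b_terms_returned if b_terms_returned else 0.0

def recall(relevant_b_terms, relevant_titles):
    """Relevant B-terms found, relative to the number of relevant titles."""
    return relevant_b_terms / relevant_titles if relevant_titles else 0.0

# Hypothetical counts, for illustration only.
p = precision(20, 50)
r = recall(20, 80)
print(p, r)
```

Both functions return 0.0 for an empty denominator rather than raising, which is one reasonable convention among several.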

15 Interpolated Precision

16 Number of Links Identified

17 Dimensionality

18 Future Work Extend to B → A connections Use abstracts –dimensionality consequences Generalize –Raynaud’s disease and fish oil –other research questions

19 Conclusions Concept vs Words –improved precision and recall –more of the 11 connections in top 50 B-terms Semantic Pruning vs Concept –degraded recall –improved precision –more of the 11 connections in top 50 B-terms

20 http://www.ics.uci.edu/~cblake Catherine Blake (cblake@ics.uci.edu) Wanda Pratt (wpratt@u.washington.edu)

21 References
Davis, R. (1989). The Creation of New Knowledge by Information Retrieval and Classification. Journal of Documentation 45(4): 273-301.
Lindsay, R. K. and M. D. Gordon (1999). Literature-Based Discovery by Lexical Statistics. Journal of the American Society for Information Science 50(7): 574-587.
Sanderson, M. (1999). Stop word list. Available at: http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/
Swanson, D. R. (1988). Migraine and magnesium: eleven neglected connections. Perspect. Biol. Med. 31: 526-557.
Swanson, D. R. and N. R. Smalheiser (1997a). An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence: 183-203.
Weeber, M., Klein, H., Mork, J. G., Jong-van den Berg, L., Vos, R. (2000). Text-Based Discovery in Biomedicine: The Architecture of the DAD-system. AMIA.

