Free-text Medical Document Retrieval via Phrase-based Vector Space Model. Wenlei Mao, MS and Wesley W. Chu, PhD.

Presentation transcript:

Free-text Medical Document Retrieval via Phrase-based Vector Space Model
Wenlei Mao, MS and Wesley W. Chu, PhD
Computer Science Department, University of California, Los Angeles

11/9-13/2002 AMIA
Outline
  Vector space model (VSM) in document retrieval
  Stem-based VSM
  Concept-based VSM
  Conceptual similarity
  Phrase-based VSM
  Retrieval effectiveness comparison
  Conclusion

Document Retrieval
  Find free-text documents to answer queries like:
  “Hyperthermia, leukocytosis, increased intracranial pressure, and central herniation. Cerebral edema secondary to infection, diagnosis and treatment.”

Vector Space Model (VSM)
  Words as terms
  [Figure: document vector d and query vector q drawn against the term axes “Hyperthermia” and “Leukocytosis”]
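The slide does not state the similarity formula. In the standard VSM the similarity between a document and a query is usually taken as the cosine of the angle between their term-weight vectors. The sketch below assumes that convention; the function name and the unit term weights are illustrative and not taken from the slides.

```python
import math

def cosine_similarity(d, q):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * q.get(term, 0.0) for term, w in d.items())
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_d * norm_q)

# Illustrative weights only: words as terms, each with weight 1.
doc = {"hyperthermia": 1.0, "leukocytosis": 1.0}
query = {"hyperthermia": 1.0}
score = cosine_similarity(doc, query)
```

In practice the weights would come from a scheme such as tf-idf rather than the constant 1.0 used here.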

Stem-based VSM
  Morphological variants bear similar content, e.g., “edema” and “edemas”
  Use a stemmer to extract stems
    Lovins stemmer and Porter stemmer
  Query: “Hyperthermia, leukocytosis, increased intracranial pressure”…
  Stems: “hypertherm”, “leukocytos”, “increas”, “intracran”, “pressur”…
  Baseline of comparison
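To illustrate why stemming helps, the toy sketch below maps the morphological variants “edema” and “edemas” to one index term. It only strips a plural “s” and is deliberately not the Lovins or Porter algorithm, so it will not reproduce the stems listed on the slide; `toy_stem` and `index_terms` are hypothetical names.

```python
import re

def toy_stem(word):
    """Strip a trailing plural 's'. A stand-in, NOT the Lovins or Porter stemmer."""
    w = word.lower()
    # Leave short words and endings such as -ss, -is, -us untouched.
    if len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "is", "us")):
        return w[:-1]
    return w

def index_terms(text):
    """Tokenize on alphabetic runs and stem every token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens]

terms = index_terms("Edema and edemas")
```

Both variants end up as the same term, so a query containing one matches a document containing the other.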

Shortcomings of Stem-based VSM
  Inability to capture multi-word concepts:
    1. “Increased intracranial pressure”
  Inability to utilize the relations between concepts:
    2. Synonyms: “hyperthermia” and “fever”
    3. IS-A relation: “hyperthermia” and “body temperature elevation”

Concept-based VSM
  Uses concepts in knowledge base (KB) as terms
    KB: Metathesaurus in UMLS
  Captures multi-word concepts
  Captures synonyms
  Query: “Hyperthermia, leukocytosis, increased intracranial pressure”…
  CUIs: (C ), (C ), (C )…

Shortcomings of Concept-based VSM
  Concepts may be related:
    E.g., “hyperthermia” and “body temperature elevation” are not identical but related concepts
    Need to quantify conceptual relations
  Knowledge bases are often incomplete, which reduces the retrieval effectiveness

Conceptual Similarity Evaluation
  [Figure: concept hierarchy with nodes c1, c2, c3, c4, labeled “Body temperature elevation”, “Hyperthermia”, “Disease”, “Animal disease”]
  Node distance: d(c3,c4) = 1
  Descendant count: D(c3) = 2, D(c4) = 0
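The two quantities named on this slide can be computed from a parent map. The sketch below is a minimal illustration: the figure's layout did not survive in the transcript, so the topology (c3 under c1, with c2 and c4 under c3) is an assumption chosen only because it reproduces the three values quoted above. It also assumes node distance means the number of edges between two nodes and that the descendant count excludes the node itself.

```python
# Assumed child -> parent links; the real hierarchy in the figure may differ.
PARENT = {"c3": "c1", "c2": "c3", "c4": "c3"}

def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def node_distance(a, b):
    """Number of edges on the path between a and b via their lowest common ancestor."""
    up_a = ancestors(a)
    steps_from_a = {n: i for i, n in enumerate(up_a)}
    # Walking up from b, the first node also above a is the lowest common ancestor.
    for steps_from_b, n in enumerate(ancestors(b)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same hierarchy")

def descendant_count(node):
    """Number of nodes strictly below the given node."""
    nodes = set(PARENT) | set(PARENT.values())
    return sum(1 for n in nodes if n != node and node in ancestors(n))
```

How these two quantities are combined into the similarity s(c_i, c_j) is not recoverable from the transcript, so the sketch stops at computing them.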

Deriving Conceptual Similarity From Hypernym Hierarchy
  [Figure: hypernym hierarchy with nodes c1, c2, c3, c4, labeled “Body temperature elevation”, “Hyperthermia”, “Disease”, “Animal disease”]

Shortcomings of Concept-based VSM
  Concepts may be related:
    The conceptual similarity measure, s(c_i, c_j), quantifies relations between concepts.
  Knowledge bases are often incomplete, which reduces the retrieval effectiveness.

Incompleteness of the Knowledge Bases
  Missing concepts in KB, e.g., “Infiltrative small bowel process”: (), (C ), ()
  Missing links between related concepts, e.g., (cerebral edema) and (cerebral lesion)
  In general, concept-based VSM cannot outperform stem-based VSM

Phrase-based Indexing Examples
  “Infiltrative small bowel process”
    [(); “infiltr”] [(C ); “smal”, “bowel”] [(); “proces”]
  Query: “Cerebral edema”
    [(C ); “cerebr”, “edem”]
  Document: “Cerebral lesion”
    [(C ); “cerebr”, “lesion”]
  Query: “Hyperthermia, leukocytosis, increased intracranial pressure…”
    Phrases: [(C ); “hypertherm”] [(C ); “leukocytos”] [(C ); “increas”, “intracran”, “pressur”]…

Evaluate Phrase-based Document Similarity
  The similarity has two contributions:
    Due to the conceptual similarity s(c_i, c_j) between concepts in p_q and p_d
    Due to the stem overlap in p_q and p_d

To Compare Retrieval Effectiveness
  The test set: OHSUMED
    106 queries, 14K documents
    Expert relevance judgment: R or N
  Retrieval effectiveness:
    Recall: the percentage of relevant documents retrieved so far
    Precision: the percentage of retrieved documents that are relevant
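The two measures defined above can be computed at any cutoff in a ranked result list. The following is a minimal sketch; the document ids and the cutoff `k` are made up for illustration and do not come from OHSUMED.

```python
def precision_recall(ranked_ids, relevant_ids, k):
    """Precision and recall after the top k documents have been retrieved."""
    retrieved = ranked_ids[:k]
    hits = sum(1 for doc_id in retrieved if doc_id in relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Hypothetical ranking and relevance judgments (R = in the set, N = not).
ranked = ["d3", "d7", "d1", "d9"]
relevant = {"d3", "d1", "d5"}
p, r = precision_recall(ranked, relevant, k=2)
```

Sweeping `k` over the whole ranking gives the precision values at successive recall levels that retrieval comparisons of this kind are usually plotted against.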

Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
  [Figure: retrieval effectiveness comparison chart; annotations: “16%, 100 queries” vs. “5%, 50 queries”]

Stem and Concept Similarity Contribution Weights
  f_c: similarity contribution weight for concepts
  f_s: similarity contribution weight for stems

Sensitivity of Retrieval Effectiveness to f_s and f_c
  [Figure: retrieval effectiveness plotted against the stem and concept weights; labels: “Stems”, “Concepts”, “Optimal region”]

Computation Complexity Using Phrase-based VSM
  Data reorganization:
    Build separate indexes on stems and concepts
    Keep a list of related concepts c_j's and conceptual similarity s(c_i, c_j) with c_i
  Time complexities of document similarity calculation for stem-based VSM and phrase-based VSM: same order of magnitude
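One way to picture the data reorganization is as two inverted indexes plus a per-concept table of related concepts. The sketch below is an assumed layout, not the authors' implementation; the concept ids (`CUI_EDEMA`, `CUI_LESION`) are hypothetical placeholders because the real CUIs were lost from the transcript, and the similarity value 0.5 is invented.

```python
from collections import defaultdict

def build_indexes(docs):
    """docs maps doc_id -> list of phrases, each phrase being (concept_or_None, [stems])."""
    stem_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, phrases in docs.items():
        for concept, stems in phrases:
            if concept is not None:  # phrases missing from the KB carry no concept
                concept_index[concept].add(doc_id)
            for stem in stems:
                stem_index[stem].add(doc_id)
    return stem_index, concept_index

def candidate_docs(query_phrases, stem_index, concept_index, related):
    """Documents sharing a stem, a concept, or a related concept with the query."""
    found = set()
    for concept, stems in query_phrases:
        for stem in stems:
            found |= stem_index.get(stem, set())
        if concept is not None:
            found |= concept_index.get(concept, set())
            # Precomputed list of related concepts c_j with s(c_i, c_j) for this c_i.
            for other in related.get(concept, {}):
                found |= concept_index.get(other, set())
    return found

docs = {
    "doc1": [("CUI_LESION", ["cerebr", "lesion"])],
    "doc2": [(None, ["infiltr"])],
}
related = {"CUI_EDEMA": {"CUI_LESION": 0.5}}
stem_index, concept_index = build_indexes(docs)
candidates = candidate_docs([("CUI_EDEMA", ["cerebr", "edem"])], stem_index, concept_index, related)
```

Because the related-concept lists are precomputed, a query touches only the postings of its own stems, its concepts, and their listed neighbors, which is in line with the slide's claim that the cost stays in the same order of magnitude.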

Conclusion
  A new document indexing paradigm based on phrases is proposed
    Use phrases (concept and its word stems) as terms
    Document similarity is derived from both the stem and the concept contributions
  Conceptual similarity quantifies the concept relations and improves retrieval effectiveness
  Stems remedy the incomplete coverage of the knowledge base (missing concepts and missing links between related concepts)
  Experimental results reveal a significant retrieval effectiveness improvement of the phrase-based VSM over the stem-based VSM

Acknowledgement
  This research is supported in part by NIC/NIH Grant#

Model Comparison
  [Figure: similarity between two terms under each model; most entries were symbols that are not preserved]
  Stems: s1, s2
  Concepts: c1, c2 (concept unrelated; concept related)
  Phrases p1, p2, concepts unrelated: stem overlap in p1 and p2
  Phrases p1, p2, concepts related: max(s(c1,c2), stem overlap in p1 and p2)
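The two phrase rows of this comparison can be read as a rule for scoring one pair of phrases. The sketch below follows that rule under stated assumptions: the slide does not define how stem overlap is measured, so a Jaccard ratio is used as a stand-in; identical concepts are assumed to have similarity 1; and the contribution weights f_s and f_c are left out. The concept ids and the value 0.5 are hypothetical.

```python
def stem_overlap(stems1, stems2):
    """Assumed overlap measure (Jaccard ratio); the slide does not specify one."""
    a, b = set(stems1), set(stems2)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def phrase_similarity(p1, p2, concept_sim):
    """p1, p2 are (concept_or_None, [stems]); concept_sim maps concept pairs to s(c1, c2)."""
    (c1, stems1), (c2, stems2) = p1, p2
    overlap = stem_overlap(stems1, stems2)
    s = 0.0
    if c1 is not None and c2 is not None:
        if c1 == c2:
            s = 1.0  # assumption: a concept is fully similar to itself
        else:
            s = concept_sim.get((c1, c2), concept_sim.get((c2, c1), 0.0))
    # Concepts related: max(s(c1,c2), stem overlap). Unrelated: stem overlap only.
    return max(s, overlap) if s > 0 else overlap

edema = ("CUI_EDEMA", ["cerebr", "edem"])
lesion = ("CUI_LESION", ["cerebr", "lesion"])
with_link = phrase_similarity(edema, lesion, {("CUI_EDEMA", "CUI_LESION"): 0.5})
without_link = phrase_similarity(edema, lesion, {})
```

The second call shows the fallback the conclusion slide describes: when the knowledge base lacks the link between the two concepts, the shared stem “cerebr” still yields a nonzero score.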