Published by Micah Poulton
Modified about 1 year ago
©2013 MFMER | slide-1 An Incremental Approach to MEDLINE MeSH Indexing. Presenter: Hongfang Liu. BioASQ 2013. Team Members: Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu; University of Delaware: Dongqing Zhu, Ben Carterette
©2013 MFMER | slide-2 Outline: Motivation & Task; Incremental Systems (MetaMap-based, Search-based, LLDA-based); Experiment Setup; Evaluation; Conclusion
©2013 MFMER | slide-3 Motivation of BioASQ Task. Reduce human effort in MeSH indexing: increasing number of new articles; low consistency among annotators [Funk and Reid]. Automatic MeSH indexing: suggest MeSH terms for a given new article.
©2013 MFMER | slide-4 Motivation of Mayo’s Participation. Information retrieval (IR)-based ontology annotation: the traditional approach has been information extraction-based. Three levels of intelligence in artificial intelligence: knowledge-base intelligence, data intelligence, user intelligence. > Explore the use of topic modeling and distant supervision for ontology annotation.
©2013 MFMER | slide-5 Proposed Approaches: MetaMap-based, Search-based, LLDA-based. The three approaches can work either independently or together in an incremental way. Output: DUI.
©2013 MFMER | slide-6 MetaMap-based System. Title: Age-period-cohort effect on mortality from cervical cancer. Abstract: to estimate the effect of age, period and birth cohort … MetaMap, restricted to the MeSH ontology, returns CUI candidates with scores. A ranked list of CUI => a ranked list of DUI
©2013 MFMER | slide-7 MetaMap-based System: Parameter Tuning. Title concepts are more important. Low threshold roughly leads to high precision/recall. Tradeoff between P/R.
©2013 MFMER | slide-8 Search-based System. Retrieval Model => ranked Docs (D01, D02, D03 …; D08, D03, D01 …; D02, D03, D01 …) => DUI Aggregation: DUI ranked by tf * score(Q, D)
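The aggregation step on this slide can be sketched as follows. This is only one reading of "tf * score(Q, D)": tf is taken to be how often a DUI appears on a retrieved document, and the products are summed over the top-ranked documents. The function name, the `top_k` cutoff default, the placeholder DUI strings, and the assumption that retrieval scores are positive are all illustrative and not taken from the slides.

```python
from collections import Counter, defaultdict


def aggregate_duis(retrieved, doc_duis, top_k=20):
    """Rank DUIs by the sum of tf * score over the top_k retrieved documents.

    retrieved: list of (doc_id, score) pairs, best first; scores assumed positive.
    doc_duis:  dict mapping doc_id -> list of DUIs annotated on that document.
    """
    totals = defaultdict(float)
    for doc_id, score in retrieved[:top_k]:
        # tf = number of times the DUI is listed on this document (usually 1).
        for dui, tf in Counter(doc_duis.get(doc_id, [])).items():
            totals[dui] += tf * score
    # Highest aggregated score first; ties broken by DUI for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    retrieved = [("D01", 2.0), ("D02", 1.0), ("D03", 0.5)]
    doc_duis = {"D01": ["DUI_A", "DUI_B"], "D02": ["DUI_B"], "D03": ["DUI_C"]}
    for dui, total in aggregate_duis(retrieved, doc_duis):
        print(dui, total)
```

If the retrieval engine returns log-probabilities (negative values), they would need to be rescaled before summing, otherwise frequently occurring DUIs would be penalized rather than rewarded.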
©2013 MFMER | slide-9 Search-based System. Example queries: #weight(2.0 examination 2.0 cow 2.0 ultrasonographic 3.0 navel 3.0 urachal 3.0 extra-abdominal 2.0 pathologic 2.0 abscess) #weight(3.5 #uw2(hiv-1 infection) 4.5 #uw2(differential susceptibility) 2.0 #uw2(actin dynamics) 2.0 actin 4.5 #uw2(cortical actin) 4.5 #uw3(naive t cells) 2.5 dichotomy 3.5 #uw2(human memory) 3.5 #uw3(chemotactic actin activity) 2.0 cd45ro)
©2013 MFMER | slide-10 Search-based System: Parameter Tuning. Less smoothing => better performance. A small set of highly relevant documents. Tradeoff between P/R.
©2013 MFMER | slide-11 Systems: LLDA-based. LDA process: each document is a mixture of topics; each topic is a multinomial word distribution. Labeled LDA: incorporates label information.
©2013 MFMER | slide-12 Systems: LLDA-based. Top categories in MeSH: the top-level categories under the root (e.g., Anatomy Category, Chemicals and Drugs Category, etc.) serve as topics. Each label below them is converted to the corresponding top-level labels.
©2013 MFMER | slide-13 Systems: LLDA-based. DUI candidate list pruning: for each doc, the search-based system supplies the DUI candidates and the LLDA-based system supplies categories, yielding a pruned rank list.
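A minimal sketch of the label conversion described on this slide, assuming each MeSH descriptor is available with its tree numbers. In MeSH, the first letter of a tree number identifies the top-level category (for example A for Anatomy, C for Diseases, D for Chemicals and Drugs). The function name and the sample tree numbers below are made up for illustration, and only three categories are listed in the lookup table.

```python
# Partial lookup of MeSH top-level categories, keyed by tree-number prefix.
CATEGORY_NAMES = {
    "A": "Anatomy",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
}


def top_level_categories(tree_numbers):
    """Return the sorted set of top-level category letters for one descriptor.

    tree_numbers: iterable of tree-number strings; a descriptor may sit in
    several branches, so it can map to more than one category.
    """
    return sorted({tn[0].upper() for tn in tree_numbers if tn})


if __name__ == "__main__":
    # Illustrative tree numbers only; not real MeSH entries.
    letters = top_level_categories(["A01.111", "D02.222.333", "a05.444"])
    print([(x, CATEGORY_NAMES.get(x, "unknown")) for x in letters])
```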
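The pruning step can be sketched as a simple filter. It assumes the search-based system has already produced a ranked DUI list, that a DUI-to-category mapping is available, and that the LLDA-based system outputs a set of top-level categories for the document. All names and sample values here are hypothetical, and the slides do not state how DUIs without any category information are handled; this sketch drops them.

```python
def prune_candidates(ranked_duis, dui_categories, suggested_categories):
    """Keep only candidates that share a top-level category with the LLDA output.

    ranked_duis:          DUIs in rank order from the search-based system.
    dui_categories:       dict mapping DUI -> iterable of top-level categories.
    suggested_categories: categories the topic model suggests for the document.
    The original rank order is preserved.
    """
    keep = set(suggested_categories)
    return [d for d in ranked_duis if keep & set(dui_categories.get(d, ()))]


if __name__ == "__main__":
    ranked = ["DUI_A", "DUI_B", "DUI_C", "DUI_D"]
    cats = {"DUI_A": ["A"], "DUI_B": ["C", "D"], "DUI_C": ["B"]}
    print(prune_candidates(ranked, cats, {"A", "D"}))
```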
©2013 MFMER | slide-14 Data Training -- Testing -- input: output:
©2013 MFMER | slide-15 Evaluation. MM: MetaMap-based system; Mi: micro; LCA: lowest common ancestor
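For readers unfamiliar with the "Mi" (micro) measures, the sketch below shows micro-averaged precision, recall, and F-measure over a set of documents: label decisions are pooled across all documents before the ratios are taken. The function name and the toy labels are illustrative; the LCA-based hierarchical measures are not covered here.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over parallel lists of label sets.

    gold, predicted: lists of label collections, one entry per document.
    """
    tp = sum(len(set(g) & set(p)) for g, p in zip(gold, predicted))
    n_pred = sum(len(set(p)) for p in predicted)
    n_gold = sum(len(set(g)) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["a", "b"], ["c"]]
    pred = [["a"], ["c", "d", "e"]]
    print(micro_prf(gold, pred))
```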
©2013 MFMER | slide-16 Conclusion and Future Work. Three systems: MetaMap-based, search-based, LLDA-based. Research findings: explored the impact of various parameters on performance; promising results from search-based labeling. Future direction: better concept weighting strategies (e.g., corpus-level statistics, external resources); comprehensive comparisons with existing methods; a better strategy for incorporating hierarchical information into LLDA.
©2013 MFMER | slide-17 Questions & Discussion
©2013 MFMER | slide-18 Baseline: MetaMap-based Labeling. CONCEPT DETECTION: 1. Concepts (K): phrases or terms mapping to UMLS CUI. 2. List (L) of CUI (c) with confidence scores (S_c). 3. Negation information for each K. CONCEPT WEIGHTING: 1. Select non-negated CUI (c) with score higher than threshold h. 2. Merge & rank c with weighted scores, where α -> weights assigned to T(itle) and β -> weights assigned to A(bstract). 3. β fixed to 1.0 while optimizing α. Convert the high-ranked list of c to MeSH Descriptor Unique Identifiers (DUI).
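The concept weighting steps can be sketched as below. The merging formula itself is not preserved in this transcript, so the sketch assumes the simplest form consistent with the slide: each CUI's score is α times its title scores plus β times its abstract scores. The function names, the tuple layout, the default α, and the sample CUI and DUI strings are all hypothetical; the threshold default of 600 is only a placeholder.

```python
from collections import defaultdict


def rank_cuis(title_hits, abstract_hits, h=600, alpha=2.0, beta=1.0):
    """Filter and merge MetaMap-style candidates into one ranked CUI list.

    title_hits / abstract_hits: lists of (cui, score, negated) tuples.
    Negated concepts and concepts scoring at or below h are discarded.
    Assumed merge rule: alpha * title score + beta * abstract score.
    """
    merged = defaultdict(float)
    for weight, hits in ((alpha, title_hits), (beta, abstract_hits)):
        for cui, score, negated in hits:
            if not negated and score > h:
                merged[cui] += weight * score
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


def cuis_to_duis(ranked_cuis, cui_to_dui):
    """Map ranked CUIs to DUIs, skipping unmapped CUIs and repeated DUIs."""
    seen, out = set(), []
    for cui, _ in ranked_cuis:
        dui = cui_to_dui.get(cui)
        if dui is not None and dui not in seen:
            seen.add(dui)
            out.append(dui)
    return out


if __name__ == "__main__":
    title = [("CUI_1", 900, False), ("CUI_2", 800, True)]
    abstract = [("CUI_1", 700, False), ("CUI_3", 1000, False), ("CUI_4", 500, False)]
    ranked = rank_cuis(title, abstract)
    print(ranked)
    print(cuis_to_duis(ranked, {"CUI_1": "DUI_X", "CUI_3": "DUI_Y"}))
```

Keeping β fixed at 1.0 and varying only α, as the slide describes, reduces tuning to a one-dimensional search over the title weight.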
©2013 MFMER | slide-19 Incremental Labeling: Search-based Labeling 1. Index training set with Indri; retrieve MeSH for testing set. Indexing: filter out words with a medical stoplist; extract stems with the Porter stemmer; indexed fields include titles and abstracts. Retrieval Model: w_i -> weight for the ith matched query term q_i; f(q_i, D) -> the query term matching function, defined in terms of |D| and |C| (lengths of the document and the collection), tf_{q_i,D} and tf_{q_i,C} (document and collection term frequencies of q_i), and μ (the Dirichlet smoothing parameter). Query Formulation. Result Aggregation.
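The scoring formula is not preserved in this transcript. The sketch below assumes the standard Dirichlet-smoothed query-likelihood form, which uses exactly the quantities listed on the slide: f(q_i, D) = log((tf_{q_i,D} + μ · tf_{q_i,C} / |C|) / (|D| + μ)), with the document score being the weighted sum of f over the query terms. The function names, the default μ, and the sample counts are illustrative assumptions.

```python
import math


def term_match(tf_d, tf_c, doc_len, coll_len, mu=2500.0):
    """Dirichlet-smoothed log-probability of one query term in a document."""
    return math.log((tf_d + mu * tf_c / coll_len) / (doc_len + mu))


def score(query, doc_tf, doc_len, coll_tf, coll_len, mu=2500.0):
    """Weighted sum of term matches: sum_i w_i * f(q_i, D).

    query:   list of (weight, term) pairs.
    doc_tf:  dict term -> frequency in the document.
    coll_tf: dict term -> frequency in the whole collection.
    Terms absent from the collection are skipped to avoid log(0).
    """
    total = 0.0
    for w, term in query:
        tf_c = coll_tf.get(term, 0)
        if tf_c == 0:
            continue
        total += w * term_match(doc_tf.get(term, 0), tf_c, doc_len, coll_len, mu)
    return total


if __name__ == "__main__":
    query = [(3.0, "navel"), (2.0, "cow")]
    coll_tf = {"navel": 10, "cow": 200}
    print(score(query, {"navel": 3, "cow": 1}, 100, coll_tf, 10000))
    print(score(query, {}, 100, coll_tf, 10000))
```

A smaller μ gives the document's own term counts more influence relative to the collection statistics.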
©2013 MFMER | slide-20 Search-based Labeling 2. Index training set with Indri; retrieve MeSH for testing set. Retrieval Model. Query Formulation: K_T: terms in title extracted by MetaMap; K_A: terms in abstract likewise. Query types: Long Query (LQ), Phrase Query (PQ), Term Query (TQ). TQ Example: PQ Example: Longer query than phrase, order & proximity considered. PQ: considers collocations. Result Aggregation.
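A minimal sketch of how term and phrase queries in the Indri query language could be assembled from weighted concepts. It emits a `#weight(...)` expression in which single words appear as plain terms and multi-word phrases are wrapped in `#uwN(...)` (unordered window), with N set to the number of words in the phrase. The function name and the way weights are supplied are assumptions; the slides do not say how the individual weights were chosen.

```python
def build_weight_query(weighted_items):
    """Build an Indri-style #weight query string.

    weighted_items: list of (weight, text) pairs. Single-word text becomes a
    plain term; multi-word text becomes an unordered-window phrase #uwN(...).
    """
    parts = []
    for weight, text in weighted_items:
        words = text.lower().split()
        if not words:
            continue  # ignore empty concept strings
        if len(words) == 1:
            expr = words[0]
        else:
            expr = "#uw{}({})".format(len(words), " ".join(words))
        parts.append("{:.1f} {}".format(weight, expr))
    return "#weight({})".format(" ".join(parts))


if __name__ == "__main__":
    print(build_weight_query([(2.0, "examination"), (3.0, "navel")]))
    print(build_weight_query([(3.5, "HIV-1 infection"), (4.5, "naive T cells"), (2.0, "actin")]))
```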
©2013 MFMER | slide-21 Parameter Explorations 2. Parameter setting for MetaMap-based Labeling: a) Figure a shows that the higher the weight for Title, the better the results. b) Figure b shows the best CI threshold at 600. c) Figure c shows that recall rises with the number of DUI while precision falls.
©2013 MFMER | slide-22 Parameter Explorations 3. Parameter setting for Search-based Labeling: Figure d: more smoothing hurts the performance. Figure e: the best results come when the number of top documents is 20. Figure f: similar to Figure c, recall rises with the number of DUI while precision falls.
©2013 MFMER | slide-23 Incremental filtering with Labeled Latent Dirichlet Allocation (LLDA). [Plate diagram: variables θ_d, L_mesh, w, α, z, γ, ψ; plates N, D.] Generative Story: 1) A generative topic model. 2) Both α and ψ play the role of prior for topic generation. 3) Θ_d generates document topics, tuned by both α and MeSH labels L. 4) Word-topic distribution γ and doc topic z_d generate word w_i. Training and Testing: Training: parameter estimation with Gibbs sampling for Θ and γ using 10% of the provided PubMed corpus. Testing: the trained model suggests multiple MeSH terms for testing data. Filtering: the suggested MeSH term sets are used to filter the results obtained from search-based labeling.
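The generative story can be illustrated with a small sampler. This is a simplified sketch of Labeled LDA's forward process only, not the Gibbs-sampling training used in the system: the document's topic mixture is drawn from a symmetric Dirichlet restricted to the document's own labels, then each word is produced by first drawing a topic and then drawing a word from that topic's word distribution. The function name, the toy topics, and the vocabulary are invented for illustration.

```python
import random


def generate_document(labels, topic_words, n_words, alpha=1.0, seed=0):
    """Sample one document from a Labeled-LDA-style generative process.

    labels:      topics (labels) the document is allowed to use.
    topic_words: dict topic -> dict word -> probability (the word-topic
                 distributions, gamma in the slide's notation).
    """
    rng = random.Random(seed)
    # theta_d ~ Dirichlet(alpha) over the document's labels only,
    # sampled as normalized Gamma(alpha, 1) draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in labels]
    total = sum(draws)
    theta = [d / total for d in draws]
    words = []
    for _ in range(n_words):
        z = rng.choices(labels, weights=theta, k=1)[0]       # topic for this word
        vocab, probs = zip(*topic_words[z].items())
        words.append(rng.choices(vocab, weights=probs, k=1)[0])  # word given topic
    return words


if __name__ == "__main__":
    topics = {
        "Anatomy": {"navel": 0.5, "cortex": 0.5},
        "Diseases": {"abscess": 0.7, "infection": 0.3},
        "Organisms": {"cow": 1.0},
    }
    print(generate_document(["Anatomy", "Diseases"], topics, n_words=8))
```

Because the mixture is restricted to the document's labels, words from topics outside that label set (here "Organisms") can never be generated, which is the property that lets the label set act as a filter.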