Published by Micah Poulton
Modified over 2 years ago
©2013 MFMER | slide-1 An Incremental Approach to MEDLINE MeSH Indexing
  Presenter: Hongfang Liu
  BioASQ 2013
  Team Members:
    Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu
    University of Delaware: Dongqing Zhu, Ben Carterette
slide-2 Outline
  Motivation & Task
  Incremental Systems
    MetaMap-based
    Search-based
    LLDA-based
  Experiment Setup
  Evaluation
  Conclusion
slide-3 Motivation of BioASQ Task
  Reduce human effort in MeSH indexing
    Increasing number of new articles
    Low consistency among annotators [Funk and Reid]
  Automatic MeSH indexing
    Suggest MeSH terms for a given new article
slide-4 Motivation of Mayo's Participation
  Information retrieval (IR)-based ontology annotation
    The traditional approach has been information extraction-based
  Three levels of intelligence in artificial intelligence
    Knowledge-base intelligence
    Data intelligence
    User intelligence
  > Explore the use of topic modeling and distant supervision for ontology annotation
slide-5 Proposed Approaches
  MetaMap-based
  Search-based
  LLDA-based
  The three approaches can work either independently or together, in an incremental way, to produce DUIs.
slide-6 MetaMap-based System
  Input:
    Title: Age-period-cohort effect on mortality from cervical cancer.
    Abstract: to estimate the effect of age, period and birth cohort …
  MetaMap, restricted to the MeSH ontology, produces CUI candidates:
    CUI        Score
    C0007847   1000
    C0302592   1000
    C0998265   861
    …          …
  A ranked list of CUI => a ranked list of DUI
slide-7 MetaMap-based System: Parameter Tuning
  Title concepts are more important
  Low threshold roughly leads to high precision/recall
  Tradeoff between P/R
slide-8 Search-based System
  Retrieval Model -> Docs -> DUI Aggregation
  Docs (DUI lists of the retrieved documents):
    D01, D02, D03 …
    D08, D03, D01 …
    D02, D03, D01 …
  DUI ranked by tf * score(Q, D)
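The aggregation step on this slide can be sketched as follows. This is one plausible reading of "tf * score(Q, D)", not necessarily the team's implementation: each DUI accumulates the retrieval score of every top-ranked document annotated with it, so a DUI that occurs in many highly ranked documents rises to the top. The function name, the `top_k` cutoff, and the assumption that scores are positive (raw Indri log-probabilities would need rescaling first) are illustrative.

```python
from collections import defaultdict

def aggregate_duis(retrieved, top_k=20):
    """Rank DUIs from retrieved training documents.

    retrieved: list of (score, [DUI, ...]) pairs, one per retrieved document.
    Scores are assumed to be positive (hypothetical normalization).
    """
    # Keep only the top_k highest-scoring documents.
    top_docs = sorted(retrieved, key=lambda pair: pair[0], reverse=True)[:top_k]
    totals = defaultdict(float)
    for score, duis in top_docs:
        for dui in duis:
            # Each occurrence of a DUI contributes the score of its document.
            totals[dui] += score
    # Highest total first; ties broken by DUI for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

docs = [
    (0.9, ["D01", "D02", "D03"]),
    (0.5, ["D08", "D03", "D01"]),
    (0.4, ["D02", "D03", "D01"]),
]
ranking = aggregate_duis(docs)
```

Here D01 and D03 appear in all three documents and therefore outrank D02 (two documents) and D08 (one document).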
slide-9 Search-based System
  Example queries:
    #weight(2.0 examination 2.0 cow 2.0 ultrasonographic 3.0 navel 3.0 urachal 3.0 extra-abdominal 2.0 pathologic 2.0 abscess)
    #weight(3.5 #uw2(hiv-1 infection) 4.5 #uw2(differential susceptibility) 2.0 #uw2(actin dynamics) 2.0 actin 4.5 #uw2(cortical actin) 4.5 #uw3(naive t cells) 2.5 dichotomy 3.5 #uw2(human memory) 3.5 #uw3(chemotactic actin activity) 2.0 cd45ro)
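A minimal sketch of how queries of this shape could be assembled. The `#weight` and `#uwN` (unordered window) operators are part of the Indri query language; the helper name and the rule that the window size equals the number of tokens in the phrase are assumptions read off the examples above, and the weights themselves are taken as given.

```python
def build_weight_query(weighted_items):
    """Build an Indri #weight query from (weight, text) pairs.

    Single-token items are used as plain terms; multi-token items are
    wrapped in an unordered window as wide as the phrase (an assumption
    based on the example queries).
    """
    parts = []
    for weight, text in weighted_items:
        tokens = text.split()
        if len(tokens) == 1:
            expr = tokens[0]
        else:
            expr = f"#uw{len(tokens)}({' '.join(tokens)})"
        parts.append(f"{weight:.1f} {expr}")
    return "#weight(" + " ".join(parts) + ")"

term_query = build_weight_query([(2.0, "examination"), (3.0, "navel")])
phrase_query = build_weight_query([(3.5, "hiv-1 infection"), (4.5, "naive t cells"), (2.0, "actin")])
```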
slide-10 Search-based System: Parameter Tuning
  Less smoothing => better performance
  A small set of highly relevant documents
  Tradeoff between P/R
slide-11 Systems: LLDA-based
  LDA process
    Each document is a mixture of topics
    Each topic is a multinomial word distribution
  Labeled LDA
    Incorporates label information
slide-12 Systems: LLDA-based
  Top categories in MeSH
    Top-level categories, directly under the root, serve as topics (e.g., Anatomy Category, Chemicals and Drugs Category, etc.)
    Each label below the top level is converted to its corresponding top-level label(s)
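The label conversion can be illustrated with MeSH tree numbers, whose first letter identifies the top-level category (A for Anatomy, B for Organisms, C for Diseases, D for Chemicals and Drugs, and so on). The sketch below assumes each descriptor's tree numbers are already available; the category table is deliberately partial and the sample tree numbers are for illustration only.

```python
# Partial table of MeSH top-level categories, keyed by tree-number prefix.
TOP_LEVEL = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
}

def top_level_labels(tree_numbers):
    """Map a descriptor's tree numbers to its set of top-level categories."""
    # Unknown prefixes fall back to the bare letter rather than being dropped.
    return sorted({TOP_LEVEL.get(t[0], t[0]) for t in tree_numbers if t})

labels = top_level_labels(["C04.588.945", "C13.351", "A05.360"])
```

A descriptor listed under several branches of the tree can map to more than one top-level label, which is why the function returns a list.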
slide-13 Systems: LLDA-based
  DUI candidate list pruning
    doc -> Search-based -> DUI candidate list
    doc -> LLDA-based -> Categories
    DUI candidate list + Categories -> a pruned ranked list
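A minimal sketch of the pruning step, assuming the search-based system returns a ranked DUI list, the LLDA model returns a set of predicted top-level categories, and a lookup from DUI to its categories is available. All names and sample values below are hypothetical.

```python
def prune_candidates(ranked_duis, dui_categories, predicted_categories):
    """Keep only the DUIs whose top-level category was predicted by LLDA.

    ranked_duis: DUIs in rank order from the search-based system.
    dui_categories: dict mapping DUI -> set of top-level categories.
    predicted_categories: categories suggested by the LLDA model.
    """
    keep = set(predicted_categories)
    # The original rank order is preserved; DUIs with no matching category are dropped.
    return [d for d in ranked_duis if dui_categories.get(d, set()) & keep]

ranked = ["D01", "D03", "D02", "D08"]
categories = {
    "D01": {"Diseases"},
    "D02": {"Anatomy"},
    "D03": {"Diseases", "Organisms"},
    "D08": {"Chemicals and Drugs"},
}
pruned = prune_candidates(ranked, categories, {"Diseases", "Anatomy"})
```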
slide-14 Data Training -- Testing -- input: output:
slide-15 Evaluation
  MM: MetaMap-based system
  Mi: micro
  LCA: lowest common ancestor
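For readers unfamiliar with the "micro" abbreviation, the sketch below shows the usual micro-averaged precision, recall, and F1 for multi-label output: true positives and label counts are pooled over all articles before the ratios are taken. It uses the standard definitions rather than the official BioASQ evaluation code, and the sample labels are made up.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over a list of articles.

    gold, predicted: parallel lists of label sets, one set per article.
    """
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [{"D01", "D02"}, {"D03"}]
predicted = [{"D01", "D04"}, {"D03", "D05"}]
p, r, f = micro_prf(gold, predicted)
```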
slide-16 Conclusion and Future Work
  Three systems
    MetaMap-based, search-based, LLDA-based
  Research findings
    Explored the impact of various parameters on performance
    Promising results from search-based labeling
  Future directions
    Better concept weighting strategies
      E.g., corpus-level statistics, external resources
    Comprehensive comparisons with existing methods
    A better strategy for incorporating hierarchical information into LLDA
slide-17 Questions & Discussion
slide-18 Baseline: MetaMap-based Labeling
  CONCEPT DETECTION
    1. Concepts (K): phrases or terms mapping to UMLS CUI
    2. List (L) of CUI (c) with confidence scores (S_c)
    3. Negation information for each K
  CONCEPT WEIGHTING
    1. Select non-negated CUI (c) with score higher than threshold h
    2. Merge & rank c with weighted scores [equation not preserved in this transcript]
       α -> weights assigned to T(itle)
       β -> weights assigned to A(bstract)
    3. β fixed to 1.0 while optimizing α
  Convert the highly ranked list of c to MeSH Descriptor Unique Identifiers (DUI)
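The weighting steps can be sketched as below. Because the slide's scoring equation did not survive, the linear combination used here (α times the title score plus β times the abstract score, summed per CUI) is an assumption, as are the function name and the default values. The CUI "C0000001" is a made-up placeholder used to show negation handling.

```python
from collections import defaultdict

def rank_cuis(title_cuis, abstract_cuis, alpha=2.0, beta=1.0, threshold=600):
    """Rank CUIs from MetaMap-style output.

    title_cuis, abstract_cuis: lists of (cui, score, negated) tuples.
    alpha, beta: weights for title and abstract concepts (beta fixed to 1.0).
    threshold: minimum confidence score h for a CUI to be kept.
    """
    combined = defaultdict(float)
    for weight, concepts in ((alpha, title_cuis), (beta, abstract_cuis)):
        for cui, score, negated in concepts:
            # Step 1: drop negated concepts and those at or below the threshold.
            if negated or score <= threshold:
                continue
            # Step 2: merge by summing weighted scores (assumed linear form).
            combined[cui] += weight * score
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))

title = [("C0007847", 1000, False)]
abstract = [
    ("C0007847", 861, False),
    ("C0302592", 1000, False),
    ("C0998265", 500, False),   # below the threshold
    ("C0000001", 900, True),    # hypothetical negated concept
]
ranked_cuis = rank_cuis(title, abstract)
```

Mapping the ranked CUIs to DUIs would then be a lookup against the UMLS-to-MeSH mapping, which is not shown here.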
slide-19 Incremental Labeling: Search-based Labeling 1
  Index training set with Indri
    Filter out words with a medical stoplist
    Extract stems with the Porter stemmer
    Indexing fields include titles and abstracts
  Retrieve MeSH for testing set
    Retrieval Model
      w_i -> weight for the i-th matched query term q_i
      f(q_i, D) -> the query term matching function [equation not preserved in this transcript]
      |D| and |C|: lengths of the document and the collection
      tf_{q_i,D} and tf_{q_i,C}: document and collection term frequencies of q_i
      μ: the Dirichlet smoothing parameter
    Query Formulation
    Result Aggregation
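The symbols listed on this slide match the standard Dirichlet-smoothed query-likelihood model, so a sketch under that assumption is given below: f(q_i, D) = log((tf_{q_i,D} + μ · tf_{q_i,C} / |C|) / (|D| + μ)), and the document score is the weighted sum of f over the query terms. This is the textbook form, not a formula recovered from the slide, and any weight normalization a retrieval engine may apply is left out.

```python
import math

def term_score(tf_d, doc_len, tf_c, coll_len, mu=2500.0):
    """Dirichlet-smoothed log-probability of one query term in a document."""
    return math.log((tf_d + mu * tf_c / coll_len) / (doc_len + mu))

def query_score(weighted_terms, doc_tf, doc_len, coll_tf, coll_len, mu=2500.0):
    """Weighted sum of term scores.

    weighted_terms: list of (weight, term) pairs.
    doc_tf, coll_tf: dicts mapping term -> frequency in document / collection.
    Terms absent from the collection are skipped to avoid log(0).
    """
    total = 0.0
    for weight, term in weighted_terms:
        tf_c = coll_tf.get(term, 0)
        if tf_c == 0:
            continue
        total += weight * term_score(doc_tf.get(term, 0), doc_len, tf_c, coll_len, mu)
    return total

# A term missing from the document still gets a smoothed, finite score.
unseen = term_score(0, 100, 10, 1000, mu=100.0)
seen = term_score(5, 100, 10, 1000, mu=100.0)
```

A smaller μ puts more trust in the document's own term counts, which is the knob behind the deck's observation that less smoothing performed better.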
slide-20 Search-based Labeling 2
  Index training set with Indri
  Retrieve MeSH for testing set
    Retrieval Model
    Query Formulation
      K_T: terms in the title extracted by MetaMap
      K_A: terms in the abstract, likewise
      Term Query (TQ)
        TQ Example:
      Phrase Query (PQ)
        PQ Example:
        PQ: consider collocations
      Long Query (LQ)
        Longer query than phrase, order & proximity considered
    Result Aggregation
slide-21 Parameter Explorations 2
  Parameter setting for MetaMap-based Labeling
    a) Figure a: the higher the weight for Title, the better the results
    b) Figure b: the best CI threshold is 600
    c) Figure c: recall increases with the number of DUIs while precision decreases
slide-22 Parameter Explorations 3
  Parameter setting for Search-based Labeling
    a) Figure d: more smoothing hurts performance
    b) Figure e: the best results come when the number of top documents is 20
    c) Figure f: as in Figure c, recall increases with the number of DUIs while precision decreases
slide-23 Incremental filtering with Labeled Latent Dirichlet Allocation (LLDA)
  [Plate diagram of LLDA: variables α, ψ, θ_d, L_mesh, z, w, γ; plates N and D]
  Generative Story:
    1) A generative topic model
    2) Both α and ψ play the role of priors for topic generation
    3) θ_d generates document topics, tuned by both α and the MeSH labels L
    4) The word-topic distribution γ and the document topic z_d generate word w_i
  Training and Testing
    Training: parameter estimation with Gibbs sampling for θ and γ, using 10% of the provided PubMed corpus
    Testing: the trained model suggests multiple MeSH terms for the testing data
    Filtering: the suggested MeSH term sets are used to filter the results obtained from search-based labeling