Julia Stoyanovich, William Mee, Kenneth A. Ross New England DB Summit 2010 Semantic Ranking and Result Visualization for Life Sciences Publications.

Presentation transcript:

Julia Stoyanovich, William Mee, Kenneth A. Ross New England DB Summit 2010 Semantic Ranking and Result Visualization for Life Sciences Publications


3 Data and Query Processing
PubMed corpus
– over 19 million articles and growing
– articles annotated with MeSH terms
– annotators are instructed to annotate with the most specific term possible
Medical Subject Headings (MeSH) annotations
– over 25K term descriptors
– organized into a polyhierarchy
– 17 trees, almost no cycles
Entrez search engine
– query translation, synonym & ontology expansions
– example: mosquito -> "culicidae"[MeSH Terms] OR "culicidae"[All Fields] OR "mosquito"[All Fields]
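The expansion shown on this slide can be mimicked with a small string-building sketch. This only reproduces the format of the example above; the real translation is performed by Entrez on the server side. The helper names `expand_term` and `esearch_url` are hypothetical, and the MeSH heading is passed in by hand rather than looked up. The URL targets the public E-utilities `esearch` endpoint with its `db`, `term`, `retstart`, and `retmax` parameters; no request is actually sent.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def expand_term(user_term, mesh_heading):
    """Build an expanded query in the format shown on the slide.

    The mapping from user_term to mesh_heading is supplied by the caller;
    this sketch does not perform any ontology lookup.
    """
    return (
        f'"{mesh_heading}"[MeSH Terms] OR '
        f'"{mesh_heading}"[All Fields] OR '
        f'"{user_term}"[All Fields]'
    )


def esearch_url(query, retstart=0, retmax=100):
    """Compose an esearch URL for one page of PubMed results."""
    params = {"db": "pubmed", "term": query, "retstart": retstart, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


expanded = expand_term("mosquito", "culicidae")
print(expanded)
print(esearch_url(expanded, retstart=0, retmax=100))
```

Advancing `retstart` by `retmax` would fetch successive pages of results, which is one plausible way to retrieve a large result set in batches.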

4 MeSH: A Scoped Polyhierarchy
[Figure: fragment of the MeSH hierarchy. Category nodes: Diseases; Skin & Connective Tissue Diseases; Skin Diseases; Connective Tissue Diseases; Rheumatic Diseases; Immune System Diseases; Autoimmune Diseases. Disease terms: RA; SLE; Felty’s Syndrome; Caplan Syndrome; Sjögren’s Syndrome; Rheumatoid Nodule; Arthritis, Juvenile Rheumatoid; Still’s Disease, Adult-Onset; Scleroderma, Systemic; Lupus Nephritis; Lupus Vasculitis. Several terms, including RA, Felty’s Syndrome, and Lupus Nephritis, appear more than once in the figure.]


6 Semantics of Query Relevance
Q = { C }   D = { G, E }
term-similarity (Q, D) = | term-scope(Q) ∩ term-scope(D) |
[Figure: example hierarchy with nodes T, A, B, C, D, E, F, G, H; the members of term-scope(Q) and term-scope(D) are marked.]
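A minimal sketch of this definition follows. It assumes that the term scope of a term is the term itself plus all of its descendants, which is consistent with the later statement that distance is measured via descendants, and it ignores scoping. The hierarchy below is hypothetical; the edges of the slide's figure are not recoverable from the transcript.

```python
# Hypothetical hierarchy (parent -> children); NOT the one drawn on the slide.
CHILDREN = {
    "T": ["A", "B"],
    "A": ["C", "D"],
    "B": ["E"],
    "C": ["F", "G"],
    "D": ["G"],
    "E": ["G", "H"],
}


def term_scope(term):
    """Assumed definition: the term plus every descendant reachable from it."""
    scope, stack = set(), [term]
    while stack:
        node = stack.pop()
        if node not in scope:
            scope.add(node)
            stack.extend(CHILDREN.get(node, []))
    return scope


def set_scope(terms):
    """Union of the term scopes of a set of terms."""
    result = set()
    for t in terms:
        result |= term_scope(t)
    return result


def term_similarity(query, document):
    """Size of the overlap between the query scope and the document scope."""
    return len(set_scope(query) & set_scope(document))


Q = {"C"}
D = {"G", "E"}
print(sorted(set_scope(Q)))   # ['C', 'F', 'G']
print(sorted(set_scope(D)))   # ['E', 'G', 'H']
print(term_similarity(Q, D))  # 1
```

In this toy hierarchy the only shared term is G, so the score is 1; with the slide's actual hierarchy the value may differ.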

7 Semantics of Query Relevance
Q = { E, B }   D = { F, G }
term-similarity (Q, D) = 2
[Figure: example hierarchy with nodes T, A, B, C, D, E, F, G, H; the members of term-scope(Q) and term-scope(D) are marked.]
But F contributes to both query terms, while G contributes to only one!
Idea: count occurrences of document terms within the context of query terms.

8 Semantics of Query Relevance
Q = { E, B }   D = { F, G }
conditional-similarity (Q, D): count the number of ancestor-descendant pairs
balanced-similarity (Q, D): normalize the contribution of each query term
[Figure: example hierarchy with nodes T, A, B, C, D, E, F, G, H.]
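The slide gives these two measures only in words, so the sketch below is one plausible reading and not necessarily the authors' exact formulas. It assumes that conditional-similarity sums, over query terms q, the number of document-scope terms falling inside term-scope(q), and that balanced-similarity divides each query term's contribution by the size of its own scope. The hierarchy is hypothetical, chosen so that F lies below both query terms and G below only one.

```python
from fractions import Fraction

# Hypothetical hierarchy (parent -> children); F is a descendant of both E and B.
CHILDREN = {
    "T": ["A", "B"],
    "A": ["E"],
    "B": ["F"],
    "E": ["F", "G"],
}


def term_scope(term):
    """Assumed definition: the term plus all of its descendants."""
    scope, stack = set(), [term]
    while stack:
        node = stack.pop()
        if node not in scope:
            scope.add(node)
            stack.extend(CHILDREN.get(node, []))
    return scope


def set_scope(terms):
    result = set()
    for t in terms:
        result |= term_scope(t)
    return result


def term_similarity(query, document):
    return len(set_scope(query) & set_scope(document))


def conditional_similarity(query, document):
    """Assumed reading: count (query term, document-scope term) pairs
    in which the document-scope term lies within the query term's scope."""
    doc_scope = set_scope(document)
    return sum(len(term_scope(q) & doc_scope) for q in query)


def balanced_similarity(query, document):
    """Assumed reading: each query term contributes the fraction of its
    own scope that is covered by the document scope."""
    doc_scope = set_scope(document)
    return sum(
        Fraction(len(term_scope(q) & doc_scope), len(term_scope(q)))
        for q in query
    )


Q = {"E", "B"}
D = {"F", "G"}
print(term_similarity(Q, D))         # 2
print(conditional_similarity(Q, D))  # 3
print(balanced_similarity(Q, D))     # 7/6
```

Here term-similarity counts F once, while the conditional variant counts it once for E and once for B; the balanced variant then keeps a query term with a large scope from dominating the score.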

9 Computation of term-similarity
Q = { q_1, …, q_n }   D = { d_1, …, d_m }
1. term-scope (Q) = term-scope(q_1) ∪ … ∪ term-scope(q_n)
2. term-scope (D) = term-scope(d_1) ∪ … ∪ term-scope(d_m)
3. term-similarity (Q, D) = | term-scope(Q) ∩ term-scope(D) |
Can be expensive for queries and documents with large term scopes!
| (A ∪ B) ∩ (Y ∪ Z) | = | (A ∩ Y) ∪ (A ∩ Z) ∪ (B ∩ Y) ∪ (B ∩ Z) | ≤ |A ∩ Y| + |A ∩ Z| + |B ∩ Y| + |B ∩ Z|
Pre-compute term-similarity (s, t) for all term pairs (s, t)
– Practical, since 160K pairs have term-similarity(s, t) > 0, out of over 600M
At query time
– Compute score upper-bounds for all documents
– Compute term-similarity only for the promising documents
Useful upper-bounds also hold for conditional and balanced-similarity
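The pruning idea on this slide can be sketched as a top-k loop. The sketch assumes the upper bound for a document is the sum of pre-computed pairwise similarities over all (query term, document term) pairs, as the inequality above suggests. The term scopes, document contents, and the function name `top_k` are hypothetical, and the authors' actual algorithm may differ in its details.

```python
import heapq
from itertools import product

# Hypothetical term scopes, given directly to keep the example short.
SCOPE = {
    "B": {"B", "F"},
    "E": {"E", "F", "G"},
    "F": {"F"},
    "G": {"G"},
    "H": {"H"},
}

# Pre-computed pairwise term-similarity; only non-zero pairs are stored.
PAIR_SIM = {
    (s, t): len(SCOPE[s] & SCOPE[t])
    for s, t in product(SCOPE, repeat=2)
    if SCOPE[s] & SCOPE[t]
}


def upper_bound(query, doc):
    """Sum of pairwise similarities; never smaller than the exact score."""
    return sum(PAIR_SIM.get((q, d), 0) for q in query for d in doc)


def exact_score(query, doc):
    """Exact term-similarity: overlap of the two unioned scopes."""
    q_scope = set().union(*(SCOPE[q] for q in query))
    d_scope = set().union(*(SCOPE[d] for d in doc))
    return len(q_scope & d_scope)


def top_k(query, docs, k):
    """Return the k best (score, doc_id) pairs and the number of exact evaluations."""
    bounds = {doc_id: upper_bound(query, terms) for doc_id, terms in docs.items()}
    best = []  # min-heap holding the current top-k exact scores
    evaluated = 0
    for doc_id in sorted(docs, key=bounds.get, reverse=True):
        # No remaining document can beat the current k-th best score.
        if len(best) == k and bounds[doc_id] <= best[0][0]:
            break
        score = exact_score(query, docs[doc_id])
        evaluated += 1
        if len(best) < k:
            heapq.heappush(best, (score, doc_id))
        elif score > best[0][0]:
            heapq.heapreplace(best, (score, doc_id))
    return sorted(best, reverse=True), evaluated


docs = {"d1": {"F", "G"}, "d2": {"G"}, "d3": {"H"}, "d4": {"F"}}
print(top_k({"E", "B"}, docs, k=1))  # ([(2, 'd1')], 1)
```

In this toy run only d1 is scored exactly: its exact score of 2 already matches or exceeds the upper bounds of the remaining documents, so they are skipped.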

10 System Architecture
[Figure: architecture diagram. Labels: query, Query Manager, eUtils API, PubMed, batch 1, batch 2, batch 3, Java RMI, In-memory DB.]

11 Performance: Ranked Retrieval * results for 150 queries in our workload

12 Performance: Ranked Retrieval * results are cumulative over 150 queries

13 Performance: Skyline for term-similarity * large queries > 20K results; 30% of the workload, 75% of the time
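For readers unfamiliar with skylines, the sketch below computes a 2D skyline with the standard sort-and-scan approach. The transcript does not say which two dimensions the system uses, so the points are generic (x, y) score pairs where larger is better in both coordinates. This is a baseline illustration and does not include the upper-bound pruning or lazy evaluation of coordinates mentioned elsewhere in the talk.

```python
def skyline_2d(points):
    """Return the points not dominated by any other point.

    A point p dominates q if p is at least as good as q in both
    coordinates and strictly better in at least one.
    """
    result = []
    best_y = float("-inf")
    # Scan by decreasing x (ties broken by decreasing y); a point is on the
    # skyline only if its y exceeds every y seen so far.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


points = [(1, 5), (2, 4), (3, 3), (2, 2), (3, 1), (1, 5)]
print(skyline_2d(points))  # [(3, 3), (2, 4), (1, 5)]
```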

14 Evaluation of Effectiveness
User study
– 8 users, researchers in medicine, biology, bioinformatics
– 1 query per user; 670 individual and 335 pair-wise relevance judgments in total
– conducted free-form interviews with some users
– 2 baselines: distance-based and information-theoretic
Quantitative analysis of results
– We appear to outperform baselines for queries with polyhierarchy features
– Baselines appear to outperform our measures for several other queries
– For some queries no measure correlated with user’s perception of quality
Qualitative analysis of results
– Many aspects inform a user’s judgment, ontology is one of them
– Both general and specific concepts are important
Plan to scale up the evaluation by making our system available to the scientific community at large

15 Related Work
Hierarchy-based similarity measures
– [Ganesan et al, 2003] compare sets / multisets of terms, leaf nodes, hierarchy is a tree
– [Rada & Bicknell, 1989] distance is a mean-path length between pairs of query & document terms
– [Lin & Kim, 1993; Resnik, 1995] information-theoretic measures, typically distance via ancestor
Weighted set similarity [Hadjieleftheriou, 2007]
Bibliographic search in life sciences
– Entrez, GoPubMed, NextBio
Efficient computation of skylines
– [Bentley 1980; Borzsonyi et al 2001; ….]

16 Contributions
Similarity measures for scoped polyhierarchies
– Distance is via descendants, not via ancestors
– Scoping is exploited
– Alternative semantics of combining contributions of individual terms to the score
Efficient computation of similarity using score upper-bounds
Efficient computation of a 2D skyline using score upper-bounds, with lazy evaluation of coordinates
Experimental evaluation
– Efficiency
– User study

17 Thank you!
