Indexing with semantic components improves information retrieval in a domain-specific web portal. Marianne Lykke Nielsen, September 2008.

Presentation transcript:

Indexing with semantic components improves information retrieval in a domain-specific web portal
Marianne Lykke Nielsen
Information Interaction and Information Architecture, Royal School of Library and Information Science
NKOS Workshop 2008 at ECDL, Århus, Denmark, September 19, 2008

Agenda
- Indexing with semantic components
- Indexing and searching problems in domain-specific IR systems
- User evaluation: indexing and retrieval
- Results
- Summing up

Searching problems
- Domain-specific IR systems often contain a large number of documents about a limited set of topics, which makes it difficult to distinguish between documents
- Domain experts often have very specific information needs related to specific work tasks
- Situational relevance, determined by contextual dimensions, is often more important than topical relevance
- Limited time for searching

Indexing problems
- Topics: which topics should be indexed?
- Exhaustivity and specificity: what level of completeness (exhaustivity) and specificity should be aimed for?
- Terminology: which terms should be used to express the topics?
- Consistency: how do we ensure consistency?
- Costs: indexing takes time and demands both domain knowledge and indexing skills

Indexing with semantic components (SC)
- Marking up semantic sections of text (semantic components)
- Sections contain information about specific aspects or facets of the overall topic, e.g. diagnosis, treatment, referral, risk factors
- The method is based on assumptions and research results that show:
  - Domain experts know the document types within a certain domain. They know the document structure, and use this knowledge when reading and using documents (Dillon, 1991; Orlikowski & Yates, 1994; Bishop, 1999)
  - Content and structure in domain-specific documents correspond to aspects/facets of domain-specific information needs (Ely et al., 1999, 2000; Price, Delcambre & Nielsen, 2006)
- Supplementary to basic indexing methods, manual as well as automatic
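One way to picture the markup described on this slide is as labelled character spans laid over the document text. The sketch below is only an illustration under that assumption: the class names, the `mark` helper and the sample text are hypothetical and are not taken from the sundhed.dk prototype.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpan:
    label: str  # semantic component name, e.g. "Referral"
    start: int  # first character of the span (inclusive)
    end: int    # end of the span (exclusive)


@dataclass
class IndexedDocument:
    doc_type: str
    text: str
    spans: list = field(default_factory=list)

    def mark(self, label, passage):
        """Mark the first occurrence of `passage` as belonging to `label`."""
        start = self.text.index(passage)  # raises ValueError if absent
        self.spans.append(ComponentSpan(label, start, start + len(passage)))

    def component_text(self, label):
        """Return all text marked with the given component label."""
        return " ".join(
            self.text[s.start:s.end] for s in self.spans if s.label == label
        )


# Placeholder text, not real portal content.
doc = IndexedDocument(
    doc_type="Clinical problem",
    text="General text about asthma. Text about treatment options. "
         "Text about when to refer.",
)
doc.mark("Treatment", "Text about treatment options.")
doc.mark("Referral", "Text about when to refer.")
```

Storing spans next to the text, rather than replacing keyword indexing, matches the slide's point that the method is supplementary to existing indexing.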

Research team
General medicine
- Peter Vedsted, MD, Ph.D., Research Unit General Medicine, Århus University
- Jens Rubak, MD, Praksis.dk, Region Midtjylland
Information and computer science
- Lois Delcambre, Ph.D., Professor, Computer Science Department, Portland State University, USA
- Susan Price, MD, Ph.D. student, Computer Science Department, Portland State University, USA
- Marianne Lykke Nielsen, Ph.D., Associate professor, Information Interaction and Architecture, Royal School of Library and Information Science, Denmark
sundhed.dk
- Vibeke Luk, Information specialist, sundhed.dk
- Frans la Cour, IT consultant, Autonomy
Supported by the US National Science Foundation, sundhed.dk and Region Midtjylland

Case study
sundhed.dk: the national Danish health portal
- Established in 2001 (the number of documents is not preserved in this transcript)
- Two main target groups: citizens and professionals in the health care sector
- Use of automatic as well as manual indexing methods: ICPC, ICD-10, Citizens Thesaurus
- Large and varied group of indexers: 17 regions, up to 250 indexers per region
- The research project focuses on family practitioners

[Screenshot labels: document type Clinical method; semantic components General information, Practical information]

[Screenshot labels: document type Clinical method; semantic components General information, Risk factors, After treatment]

Semantic components (SC) in sundhed.dk
In sundhed.dk we have identified 6 document types:
- Clinical problem
- Clinical method
- Services
- Drug product
- Clinical unit
- Note
Each document type has its own set of semantic components. For example, Clinical method contains 6 semantic components:
- General information
- Practical information
- Referral
- After treatment
- Risk factors
- Expected result

… enter a term to search within a semantic component in addition to your main search term …

Enter an asterisk to find all (and only) the Clinical problem documents about asthma that contain the Referral semantic component.
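The two search options shown on these slides (a term inside a semantic component, or an asterisk meaning "the component must be present") can be sketched as a simple filter. This is a minimal illustration with naive substring matching, not the logic of the actual prototype; the function name `sc_search`, the dictionary layout and the sample documents are all hypothetical.

```python
def sc_search(docs, main_term, doc_type=None, component=None, component_term="*"):
    """Return ids of documents that match the main term and, optionally,
    a document type and a condition on one semantic component.

    component_term="*" only requires the component to be present.
    """
    hits = []
    for d in docs:
        if main_term.lower() not in d["text"].lower():
            continue
        if doc_type is not None and d["type"] != doc_type:
            continue
        if component is not None:
            if component not in d["components"]:
                continue
            component_text = d["components"][component].lower()
            if component_term != "*" and component_term.lower() not in component_text:
                continue
        hits.append(d["id"])
    return hits


# Hypothetical sample collection with placeholder text.
docs = [
    {"id": "d1", "type": "Clinical problem", "text": "asthma in children",
     "components": {"Referral": "refer to paediatric clinic"}},
    {"id": "d2", "type": "Clinical problem", "text": "asthma in adults",
     "components": {"General information": "background"}},
    {"id": "d3", "type": "Clinical method", "text": "spirometry for asthma",
     "components": {"Referral": "refer to hospital"}},
]

# All (and only) the Clinical problem documents about asthma with a Referral component.
referral_hits = sc_search(docs, "asthma", "Clinical problem", "Referral", "*")
```

In this sketch the asterisk narrows the result by structure alone, while a concrete component term narrows it by both structure and content.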

User evaluation: searching
- Comparison between a standard control retrieval system (System 1) and an experimental retrieval system with semantic components (System 2)
- Test persons: 30 family practitioners with experience of sundhed.dk and online information retrieval
- Training sessions: introduction to semantic components and the two retrieval systems
- Search jobs: 4 controlled, simulated search scenarios
  - 2 search jobs are carried out in System 1
  - 2 search jobs are carried out in System 2
- Random distribution of search jobs and retrieval systems
- Data collection: searching behavior, graded relevance assessments (user and system relevance), time used, ease of use, and user satisfaction
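The design above (4 scenarios per test person, 2 in each system, randomly distributed) can be sketched as follows. The slide does not say how the randomization was done, so the shuffling scheme, the function name `assign_sessions` and the seed are assumptions made for illustration only; a real study might use a balanced design instead.

```python
import random

SCENARIOS = ["A", "B", "C", "D"]


def assign_sessions(n_participants, seed=0):
    """Give each participant the four scenarios in random order,
    with two randomly chosen scenarios in each system."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    sessions = []
    for participant in range(1, n_participants + 1):
        order = SCENARIOS[:]
        rng.shuffle(order)
        in_system_1 = set(rng.sample(order, 2))
        for scenario in order:
            system = "System 1" if scenario in in_system_1 else "System 2"
            sessions.append((participant, scenario, system))
    return sessions


plan = assign_sessions(30)
```

With 30 test persons this yields 120 search sessions, 60 per system.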

Search scenarios
Search scenario C
Anna is childless. She has had two spontaneous (unintended) abortions. She is now ready to try to get pregnant again. There is something about folate. Should she take it, and at what dose?
Search task: Find documents that help you decide whether Anna should take folate, and at what dose.

Test persons (n = 30)

Experience                                 Mean       Maximum   Minimum   Std
Internet search engines (years)            7.2        12.5      1         2.82
sundhed.dk (years)                         2.4        5         0.6       1.43
Information retrieval (1 Much - 5 None)    (missing)  4         1         0.93
Medical professional (years)               21.4       34.2      6.3       7.59

Search performance
Two tables: MAP for best query per session (mean ± SE) and nDCG for best query per session (mean ± SE), each reported for search scenarios A, B, C and D and in total, with one row per system. Only the standard errors of the totals are preserved in this transcript: MAP ± 0.03 and ± 0.04; nDCG ± 0.03 and ± 0.03 (first and second table row, respectively).
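For readers unfamiliar with the two measures, the sketch below shows how MAP and nDCG are commonly computed from a ranked result list. It assumes binary relevance for average precision and a log2 rank discount for DCG, and it normalizes against the ideal ordering of the retrieved list only; the exact variants used in the study are not stated on the slide, so treat this as a generic illustration.

```python
import math


def average_precision(ranked_relevance):
    """Average precision for one query; ranked_relevance is a list of 0/1
    relevance judgments in rank order."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: mean of the average precision over several queries or sessions."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg(gains):
    """Discounted cumulated gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the same gains in ideal (descending) order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


# Two hypothetical sessions with binary relevance judgments.
example_map = mean_average_precision([[1, 0, 1, 0], [0, 1]])
# One hypothetical session with graded relevance (0 to 3).
example_ndcg = ndcg([1, 3, 0, 2])
```

Graded relevance assessments, as collected in this study, are what make nDCG applicable alongside MAP.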

Ease of use
Ease of use, measured by queries and time (n = 120)

           No. of queries per session        Time used (minutes) per session
           System 1      System 2            System 1      System 2
           (n = 60)      (n = 60)            (n = 60)      (n = 60)
Mean       2.55          3.2                 5.8           7.2
Minimum    1             1                   1.9           2.2
Maximum    8             11                  16.8          16.6
Std        1.908         2.386               -             -

Ease of use (1 Good - 5 Bad)

Question                                       System 1   System 2
How easy to formulate                          2.02       2.07
How satisfied with result                      2.12       2.28
How useful to specify with Information type    1.90       1.62
How useful to specify with Region              1.54       1.65
How useful to specify with SC                  -          1.58
How useful to search with SC                   -          1.73

User satisfaction
- 73% of users indicate that they will use SC for more than 50% of their daily searches
- SC is useful for specific search tasks
  - SC is useful for complex, specific tasks, i.e. tasks where you are "on new and unknown territory"
- SC is not intuitive and demands training
  - Users need training to use SC effectively
  - Labels are not intuitive
- Better functionality
  - Simpler: fewer SCs and no division into document types
  - Personalization
  - Direct access from the interface to the semantic component in the text

User evaluation: indexing
- Comparison of keyword indexing with indexing with semantic components
- Test persons: 16 Danish sundhed.dk indexers
- Training sessions: introduction to indexing with semantic components (SC)
- Indexing tasks: 12 sundhed.dk documents
  - 6 documents were indexed by semantic components (SC)
  - 6 documents were indexed by keywords: free, ICPC, ICD-10, domain-specific thesaurus (Borgertesaurus)
- Random distribution of indexing tasks and indexing methods
- Data collection:
  - Indexing data
  - Time used
  - User satisfaction

Evaluation metrics: indexing
Semantic components (SC)
- Accuracy (by use of reference standard): recall and precision of characters in each component
- Consistency: binary K; agreement of marking up a semantic component, over all characters in each component
Keywords
- Accuracy (by use of reference standard): recall and precision of keywords
- Consistency: binary K; agreement of keyword assignment, over all keywords selected by one indexer or in the reference standard; traditional consistency metrics
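The character-level accuracy metrics can be illustrated with a short sketch: an indexer's marked spans for one component are compared with the reference standard, character by character. The slide does not define "binary K"; the sketch assumes it is Cohen's kappa over binary decisions, which is an assumption on our part, and all function names and sample spans are hypothetical.

```python
def char_set(spans):
    """Expand (start, end) spans, end exclusive, into a set of character positions."""
    return {i for start, end in spans for i in range(start, end)}


def char_recall_precision(indexer_spans, reference_spans):
    """Recall and precision of the characters an indexer marked for one
    semantic component, measured against the reference standard."""
    marked = char_set(indexer_spans)
    reference = char_set(reference_spans)
    overlap = len(marked & reference)
    recall = overlap / len(reference) if reference else 0.0
    precision = overlap / len(marked) if marked else 0.0
    return recall, precision


def binary_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 decisions
    (assumed here to be what the slide calls binary K)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1.0:
        return 1.0  # both raters made the same constant decision
    return (observed - expected) / (1 - expected)


# Hypothetical example: the indexer marked characters 0-9, the reference 5-14.
recall, precision = char_recall_precision([(0, 10)], [(5, 15)])
# Hypothetical per-character decisions (1 = inside the component) by two indexers.
kappa = binary_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Measuring over characters rather than whole components gives partial credit when an indexer marks a section that only partly overlaps the reference.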

Accuracy: semantic components
Table columns: Document; Semantic component (recall macroaverage, precision macroaverage); Keywords (recall macroaverage, precision macroaverage). The per-document values (mean ± deviation) are not preserved in this transcript; only the final entry, ± 0.34, survives.

Summing up
- We developed the method Indexing with semantic components to improve precision in retrieval
- We tested the retrieval effectiveness of the method in a case study with 30 family practitioners in sundhed.dk
- Over all scenarios, the improvement in retrieval performance compared with the standard control system was 35.5% as measured by MAP and 25.6% as measured by nDCG
- We tested indexing accuracy in an indexing test with 16 sundhed.dk indexers
- Findings indicate that it is easier to obtain higher accuracy with semantic component indexing as regards recall
Future research
- Develop the method, e.g. a simpler version, automation of mark-up, use of XML for mark-up, end-user mark-up
- Evaluation in other domains

Literature
Dillon, M (1991). Reader's model of text structures: the case of academic articles. International Journal of Man-Machine Studies, – 925.
Ely, J, Osheroff, J, Ebell, M, Bergus, G, Levy, B, Chambliss, M & Evans, E (1999). Analysis of questions asked by family doctors regarding patient care. BMJ, 319 (7206), 358-361.
Ely, J, Osheroff, J, Gorman, P, Ebell, M, Bergus, G, Levy, B, Chambliss, M, Pifer, E & Stavri, P (2000). A taxonomy of generic clinical questions: classification study. BMJ, 321 (7278).
Hearst, M & Plaunt, C (1993). Subtopic structuring for full length document access. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 59-69.
Järvelin, K, Price, S, Delcambre, L M L & Nielsen, M L (2008). Discounted Cumulated Gain based Evaluation of Multiple-Query IR Sessions. Proceedings of the 30th European Conference on Information Retrieval (ECIR), Glasgow, March 30 - April 3, 2008.
Orlikowski, W J & Yates, J (1994). Genre repertoire: the structuring of communicative practices in organizations. Administrative Science Quarterly, – 574.
Price, S, Delcambre, L & Nielsen, M L (2006). Using semantic components to express questions against document collections. Proceedings of the International Workshop on Health Information and Knowledge Management (HIKM 2006), Arlington (VA).
Price, S, Nielsen, M L, Delcambre, L & Vedsted, P (2007). Semantic components enhance retrieval of domain-specific documents. Proceedings of the ACM Sixteenth Conference on Information and Knowledge Management (CIKM), Lisboa, November 6-8, 2007.