Intra- and interdisciplinary cross-concordances for information retrieval. Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences, Bonn, Germany.

Presentation transcript:

1 Intra- and interdisciplinary cross-concordances for information retrieval
Philipp Mayr
GESIS – Leibniz Institute for the Social Sciences, Bonn, Germany
8th NKOS Workshop at the 13th ECDL Conference, Corfu, Greece, 1 October 2009

2 KoMoHe Project ( )
KoMoHe (Competence Center Modeling and Treatment of Semantic Heterogeneity)
Goals:
– Models for searching heterogeneous collections
– Development, organization & management of cross-walks between controlled vocabularies
– IR evaluation of the mappings (effectiveness of intellectual mapping)

3 Relations
Manually created, directed relations between controlled terms of two knowledge organization systems (KOS):
– Equivalence (=)
– Narrower Term (>)
– Broader Term (<)
– Related Term (^)
– Null (0): no mapping

KOS 1       Relation   KOS 2
Library     =          Bibliothèque
Library     >          Special library
Thesaurus   <          KOS
Hacker      ^          Computers + Security
Virus       0
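The relation table above can be read as a small data structure. The following is a minimal sketch, not the KoMoHe data model: the class and field names (`Relation`, `Mapping`, `source_term`, `target_terms`) and the helper `targets` are assumptions made for illustration, and only the example rows from the slide are encoded.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation types from the slide, keyed by their table symbols."""
    EQUIVALENCE = "="
    NARROWER = ">"
    BROADER = "<"
    RELATED = "^"
    NULL = "0"


@dataclass(frozen=True)
class Mapping:
    """One directed relation from a term in KOS 1 to term(s) in KOS 2."""
    source_term: str
    relation: Relation
    # Empty for a null mapping; several terms for a combination such as
    # "Computers + Security".
    target_terms: tuple = ()


# The example rows from the slide, encoded as records.
EXAMPLES = [
    Mapping("Library", Relation.EQUIVALENCE, ("Bibliothèque",)),
    Mapping("Library", Relation.NARROWER, ("Special library",)),
    Mapping("Thesaurus", Relation.BROADER, ("KOS",)),
    Mapping("Hacker", Relation.RELATED, ("Computers", "Security")),
    Mapping("Virus", Relation.NULL),
]


def targets(term, relation, mappings=EXAMPLES):
    """Return every target-term combination recorded for a term and relation."""
    return [m.target_terms for m in mappings
            if m.source_term == term and m.relation is relation]
```

Because the relations are directed, a record for KOS 1 to KOS 2 says nothing about the reverse direction; a reverse cross-concordance would need its own records.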

4 Cross-concordances
25 vocabularies in 64 cross-concordances:
– Thesauri (16)
– Descriptor lists (4)
– Classifications (3)
– Subject heading lists (2)
380,000 mapped terms
465,000 relations
205,000 equivalence relations
Languages: 13 German, 8 English, 1 Russian, 3 multilingual

5 Disciplines

6 Net of Cross-concordances Each node represents a KOS

7 Objectives
Translate search terms into other terminologies
Increase diversity of documents from different databases
Improve search experience without effort for searcher
Test the effect for IR in different disciplines (social science and others)

8 Main questions
What is examined?
– the quality of the mappings
– or the quality of the associated search
Can we enable distributed search with the subject access tools over several information systems?
– In one discipline
– Between at least two disciplines
Is the impact of terminology mapping on recall and precision measurable?
To whom are the mappings helpful?

9 Information Retrieval Test
Question: How effective are the mappings in an actual search? Does the application of term mappings improve search over a non-transformed subject (i.e. controlled vocabulary) search?

10 Information Retrieval Tests
Thesauri mappings only
Only equivalence relations
Real queries (~6 per tested cross-concordance)
Databases: 80,000 to 16 million documents
Test 1 (CT → TT): 13 cross-concordances
Test 2 (FT → FT+TT): 8 cross-concordances

11 Mayr & Petras, 2008

12 Steps
Requesting recent research topics from our partners (social science and others)
Intellectually translating the topics into controlled term searches in a KOS A
Automatically translating the controlled terms via HTS into the controlled terms of a KOS B
Retrieving documents from two runs:
1. Controlled term (CT) search (KOS A) in database B
2. Translated term (TT) search (KOS B) in database B
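The translation and retrieval steps above can be sketched in a few lines. This is an illustration only: the real HTS is a web service whose interface is not shown in the slides, so an in-memory dictionary stands in for it here. The terms, the document index, the helper names `translate` and `search`, and the Boolean OR matching are all assumptions.

```python
# Hypothetical equivalence mappings from KOS A to KOS B (invented terms).
EQUIVALENCES = {
    "youth": ["adolescent"],
    "unemployment": ["joblessness", "unemployed persons"],
}

# Hypothetical database B: document id -> controlled terms assigned in KOS B.
# "doc2" happens to carry a term that is spelled identically in KOS A.
DATABASE_B = {
    "doc1": {"adolescent", "joblessness"},
    "doc2": {"youth"},
    "doc3": {"unemployed persons"},
}


def translate(controlled_terms, equivalences=EQUIVALENCES):
    """Replace each KOS A term by its KOS B equivalents; unmapped terms are dropped."""
    translated = []
    for term in controlled_terms:
        translated.extend(equivalences.get(term, []))
    return translated


def search(terms, database=DATABASE_B):
    """Return ids of documents indexed with at least one of the terms (Boolean OR)."""
    wanted = set(terms)
    return sorted(doc for doc, index in database.items() if index & wanted)


query_ct = ["youth", "unemployment"]
run_ct = search(query_ct)             # run 1: original KOS A terms in database B
run_tt = search(translate(query_ct))  # run 2: translated KOS B terms in database B
```

In this toy setup the CT run only finds documents whose KOS B terms coincide with the untranslated KOS A terms, which is the situation the comparison of the two runs is designed to expose.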

13 Information Retrieval Test CT-TT
HTS (Heterogeneity Service) ~ Web service providing the mappings
Diagram labels: Run 1, Run 2

14 Information Retrieval Tests
Test 1, intradisciplinary (Social sc. – Social sc.)
  Cross-concordances: TheSoz – DZI, DZI – TheSoz, TheSoz – SWD, SWD – TheSoz, CSA – TheSoz
  5 concordances, 3 databases, 35 topics
Test 2, interdisciplinary (Social sc. – Psychology; Social sc. – Economics)
  Cross-concordances: TheSoz – Psyndex, Psyndex – TheSoz, TheSoz – STW, STW – TheSoz
  4 concordances, 3 databases, 19 topics
Test 3, interdisciplinary (Int. Relations – Economics; Medical sc. – Psychology)
  Cross-concordances: IBLK – STW, STW – IBLK, MeSH – Psyndex, Psyndex – MeSH
  4 concordances, 4 databases, 28 topics

15 Methodology
Downloading the documents for both runs (CT, TT), cut-off: 1,000 docs
Pooling both runs (CT, TT) for each topic
Importing the documents into an assessment tool
Relevance assessment of the documents by experts
Analysis of the assessment data:
– Retrieved: average number of retrieved documents (across all search types)
– Relevant: average number of relevant retrieved documents (across all search types)
– Rel_ret: average number of relevant retrieved documents for a particular search type
– Recall: proportion of relevant retrieved documents out of all relevant documents (averaged across all queries of one search type)
– Precision: proportion of relevant retrieved documents out of all retrieved documents (averaged across all queries of one search type)
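The recall and precision definitions above can be made concrete with a short sketch. It assumes that "all relevant documents" means the documents judged relevant in the pool of both runs, and that the cut-off is applied before scoring. The topic data, document ids, and function names are invented for illustration and are not results from the study.

```python
def recall_precision(retrieved, relevant_pool, cutoff=1000):
    """Recall and precision of one run against the pooled relevant documents."""
    retrieved = set(retrieved[:cutoff])  # apply the cut-off before scoring
    rel_ret = retrieved & relevant_pool  # relevant retrieved documents
    recall = len(rel_ret) / len(relevant_pool) if relevant_pool else 0.0
    precision = len(rel_ret) / len(retrieved) if retrieved else 0.0
    return recall, precision


def average(values):
    """Arithmetic mean, or 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical assessments for two topics: the documents of each run plus
# the set judged relevant in the pool of both runs.
topics = [
    {"CT": ["d1", "d2"], "TT": ["d2", "d3", "d4", "d5"],
     "relevant": {"d2", "d3", "d4"}},
    {"CT": ["d6"], "TT": ["d6", "d7"], "relevant": {"d6", "d7"}},
]

# Average recall and precision across all topics, per search type.
for run in ("CT", "TT"):
    scores = [recall_precision(t[run], t["relevant"]) for t in topics]
    mean_recall = average(r for r, _ in scores)
    mean_precision = average(p for _, p in scores)
    print(run, round(mean_recall, 3), round(mean_precision, 3))
```

Because relevance is only judged within the pool, recall computed this way is relative to the documents found by at least one of the two runs, not to the whole database.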

16 Assessment of the documents: by experts

17 Information Retrieval Tests - Results
CT → TT (improvements in %)
                     Recall (= hit rate)   Precision (= accuracy)
Intradisciplinary    +39%                  +34%
Interdisciplinary    +136%                 +68%

FT → FT+TT (improvements in %)
                     Recall (= hit rate)   Precision (= accuracy)
Intradisciplinary    +20%                  -12%
Interdisciplinary    +24%                  -24%

Detailed results can be found in Mayr & Petras, 2008.

18 Discussion
Overlap and more identical terms in intradisciplinary mappings
– Mapping in one discipline is simpler: just one expert
– Smaller effect on search
– Automatic mapping may be more useful in intradisciplinary sets: mainly syntactic matching
Language plays a major role
– We had just one bilingual mapping in the test
Restrictions of the study: no real users or interactions, only thesauri, KOS in German

19 Summary
Why are cross-concordances in one discipline less effective for IR?
The number of identical terms is significantly higher within one discipline (one language)
No effective transformation is possible for IR if the terms are identical
Mapping projects should perform IR tests more often to measure the effect of their mappings.

20 Conclusion
Cross-concordances improve subject search with controlled terms & free-text search: larger measurable effects on interdisciplinary mappings
Only 24% of the relations were utilized (equivalence)
Potential:
– Other relations
– STR → CT translation
More mappings that have not yet been evaluated
Mappings are used e.g. in portals like sowiport, vascoda, ireon, … and other projects

21 Next steps
Visualization of the terminology network
Combined evaluation with other value-added services (search term recommendation)
Conversion to SKOS
Evaluation of other disciplines
Evaluation of indirect term transformation (term – switching term – end term)
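One way the planned SKOS conversion could look is sketched below, using the standard SKOS mapping properties. This is an assumption about a possible encoding, not the project's actual conversion: the function name `to_turtle`, the triple layout, and the example.org base URIs are hypothetical, and combined targets such as "Computers + Security" are not handled.

```python
# SKOS mapping properties for the four non-null relation symbols.
SKOS_PROPERTY = {
    "=": "skos:exactMatch",
    "<": "skos:broadMatch",    # the target concept is broader than the source
    ">": "skos:narrowMatch",   # the target concept is narrower than the source
    "^": "skos:relatedMatch",
}


def to_turtle(mappings, source_base, target_base):
    """Serialize (source_id, symbol, target_id) triples as Turtle lines."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for source_id, symbol, target_id in mappings:
        prop = SKOS_PROPERTY.get(symbol)
        if prop is None:  # null mappings ("0") have no target to link to
            continue
        lines.append(
            f"<{source_base}{source_id}> {prop} <{target_base}{target_id}> ."
        )
    return "\n".join(lines)


# Hypothetical concept identifiers and base URIs.
print(to_turtle(
    [("library", "=", "bibliotheque"), ("thesaurus", "<", "kos")],
    "http://example.org/kos1/",
    "http://example.org/kos2/",
))
```

Null relations carry information too (a term was checked and has no counterpart), so a fuller conversion would need a way to record them instead of skipping them.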

22 Publications
Mayr, Philipp; Petras, Vivien (2008): Cross-concordances: terminology mapping and its effectiveness for information retrieval. In: 74th IFLA World Library and Information Congress. Québec, Canada.
Mayr, Philipp; Mutschke, Peter; Petras, Vivien (2008): Reducing semantic complexity in distributed Digital Libraries: treatment of term vagueness and document re-ranking. In: Library Review. 57 (2008) 3. pp

23 Indirect term transformations Social sciences – gerontology – medicine
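An indirect transformation (term, switching term, end term) can be thought of as chaining two cross-concordances. The sketch below assumes that both mappings are restricted to equivalence relations and are held as plain dictionaries; the function name `compose` and the example terms are invented and do not come from the vocabularies named in the slides.

```python
def compose(first, second):
    """Chain two equivalence mappings: term -> switching terms -> end terms."""
    indirect = {}
    for term, switching_terms in first.items():
        end_terms = []
        for switching in switching_terms:
            for end in second.get(switching, []):
                if end not in end_terms:  # keep order, avoid duplicates
                    end_terms.append(end)
        if end_terms:  # terms with no complete path are left out
            indirect[term] = end_terms
    return indirect


# Hypothetical mappings: social sciences -> gerontology -> medicine.
social_to_gerontology = {"old age": ["ageing"], "retirement": ["pension age"]}
gerontology_to_medicine = {"ageing": ["aged", "geriatrics"]}

social_to_medicine = compose(social_to_gerontology, gerontology_to_medicine)
```

A term survives the chain only if every hop has a mapping, so indirect transformations can be expected to cover fewer terms than either direct cross-concordance.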

24 Sowiport Search

25 KoMoHe Project information_technology/komohe.htm