Thesaurus Mapping, Martin Doerr, ICS-FORTH (Foundation for Research and Technology - Hellas, Institute of Computer Science). Bath, UK, January 11, 2000.

1. Thesaurus Mapping
Martin Doerr, Foundation for Research and Technology - Hellas, Institute of Computer Science, Centre for Cultural Informatics and Documentation Systems
Bath, UK, January 11, 2000

2. The Problem
- Logical aspects
  - Semantics of the entities involved
  - Notions of translation
  - Objectives and logics of mapping
- Production of mappings
  - Human
  - Language engineering, cluster analysis
- Architecture
  - Mapping management
  - Mapping service
  - Integration in the IT environment

3. Why do we need mapping?
- Thesauri for information retrieval depend on:
  - Viewpoint (e.g. functional, morphological, social, special database fields etc.)
  - Language or social group (experts, laypeople etc.)
  - Size and distribution of the target material (effective partitioning)
- Therefore:
  - Concepts differ
  - Use of concepts differs
  - Semantic embedding differs
- This holds even if we agree on the same world
  - Research topic: formalisation of views and context

4. Semantics of entities
- Concepts are defined by agreement, e.g. orange (colour)
- Concepts identify sets of real-world objects
- Concepts are identified by:
  - scope notes, literature references, examples, images
- Concepts should not be changed:
  - they should be created or abandoned
  - they should be understood, accepted or rejected
- A descriptor is a concept identifier

5. Semantics of entities
- Links should express opinions and differences:
  - about set relations between concepts: subsumption, disjointness etc.
  - about derived concepts
  - about term usage
  - opinions may be human or computational!
- Terms (noun phrases) should be used:
  - by social groups to refer to (multiple) concepts
  - without direct linguistic meaning
  - one term is selected as the concept identifier

6. Semantics of entities
- Concept-concept relations:
  - set semantics: BT, between thesauri/versions; for query expansion and for users
  - associative: RTs, BTP etc.; for user guidance
- Concept-term relations:
  - authoritative: preferred, "used for"; for cataloguers and users
  - statistical, possible synonyms; for information retrieval
- Term-term relations:
  - dictionary entries: limited precision, within language engineering (LE) tools
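
Because BT links carry set semantics, a query on one descriptor can be expanded to all of its narrower descriptors. A minimal sketch, assuming the hierarchy is given as hypothetical (narrower, broader) pairs; the descriptors are illustrative only:

```python
from collections import defaultdict

def narrower_index(bt_links):
    """Invert (narrower, broader) BT pairs into broader -> set of narrower terms."""
    index = defaultdict(set)
    for narrower, broader in bt_links:
        index[broader].add(narrower)
    return index

def expand_query(descriptor, bt_links):
    """Return the descriptor plus all transitively narrower descriptors."""
    index = narrower_index(bt_links)
    result, stack = set(), [descriptor]
    while stack:
        term = stack.pop()
        if term not in result:  # the visited check also guards against cycles
            result.add(term)
            stack.extend(index[term])
    return result

# Hypothetical BT hierarchy as (narrower term, broader term) pairs.
BT = [("windmill", "mill"), ("watermill", "mill"),
      ("mill", "industrial building"), ("factory", "industrial building")]

print(sorted(expand_query("mill", BT)))  # ['mill', 'watermill', 'windmill']
```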

7. What is a Multilingual Thesaurus?
- A translated thesaurus: for comprehension
  - Established concepts and terms from one user group
  - Optimally interpreted in the words of one or more other languages
  - Translations are not established terms
- Mapped thesauri (ISO 5964): for transition
  - Independent thesauri, each from a different user group
  - Established concepts and terms
  - Links declare "overlap" between concepts
- Interlingua: for communication and knowledge sharing
  - Compromise to share concepts between many user groups
  - Optimally interpreted in the words of another language

8. Functionality of Mapping
- Transparent query transformation (Z39.50!)
  - Replace a Boolean term combination from thesaurus A with the optimal term combination from thesaurus B that retrieves equivalent results
  - Guaranteed transition needed (if necessary, to broader concepts)
  - Need controlled loss of precision or recall (research!)
  - Combinatorial explosion: need cascading Thes A => Thes B => Thes C
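
The transformation can be pictured as substituting each descriptor of a Boolean query with its mapped expression, one thesaurus at a time. A minimal sketch, assuming hypothetical mapping tables that hold exact equivalences only (a descriptor or a Boolean combination of target descriptors) and a query encoded as nested tuples:

```python
from typing import Dict, Tuple, Union

# A query is a descriptor string or a tuple ("AND", left, right), ("OR", left, right)
# or ("NOT", operand).
Expr = Union[str, Tuple]

def translate(expr: Expr, mapping: Dict[str, Expr]) -> Expr:
    """Replace every descriptor in a Boolean query by its mapped target expression."""
    if isinstance(expr, str):
        return mapping[expr]  # a KeyError signals a descriptor without a mapping
    op, *operands = expr
    return (op, *(translate(o, mapping) for o in operands))

def cascade(query: Expr, *mappings: Dict[str, Expr]) -> Expr:
    """Translate a query step by step: Thes A => Thes B => Thes C."""
    for mapping in mappings:
        query = translate(query, mapping)
    return query

# Hypothetical mapping tables.
A_TO_B = {"a:mills": ("OR", "b:windmills", "b:watermills"), "a:ruins": "b:ruins"}
B_TO_C = {"b:windmills": "c:moulin-a-vent", "b:watermills": "c:moulin-a-eau",
          "b:ruins": "c:ruine"}

print(cascade(("AND", "a:mills", "a:ruins"), A_TO_B, B_TO_C))
# ('AND', ('OR', 'c:moulin-a-vent', 'c:moulin-a-eau'), 'c:ruine')
```

With n thesauri, direct pairwise mapping needs n(n-1)/2 tables, whereas a cascade only needs tables between neighbours; this is one way to read the remark on combinatorial explosion.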

9. Logics of Mapping
- Interthesaurus relations (ISO 5964), from a descriptor of thesaurus A to a descriptor of thesaurus B:
  - partial equivalence; better: broader equivalence, narrower equivalence
  - exact equivalence
  - inexact equivalence ("+/-"): good for FTR only
  - single-to-multiple equivalence; better: exact equivalence to a Boolean combination of target terms: "AND" (intersection), "OR" (union), "NOT" (complement)
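
Under set semantics, the relation between two descriptors follows from comparing their instance sets. A minimal sketch, assuming each concept's extension is available as a Python set of hypothetical object IDs, and assuming the convention that "broader equivalence" means the target concept is broader than the source:

```python
def classify(source: set, target: set) -> str:
    """Classify the link from a source concept to a target concept by comparing
    their (assumed known) instance sets."""
    if source == target:
        return "exact equivalence"
    if source < target:
        return "broader equivalence"   # the target concept is broader
    if source > target:
        return "narrower equivalence"  # the target concept is narrower
    if source & target:
        return "inexact equivalence (+/-)"  # overlap without inclusion
    return "no equivalence"

mills = {1, 2, 3}  # hypothetical instance IDs
print(classify(mills, {1, 2, 3, 4}))  # broader equivalence
print(classify(mills, {1, 2}))        # narrower equivalence
print(classify(mills, {3, 4}))        # inexact equivalence (+/-)
```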

10. Translation and Mapping
[Diagram: English Heritage Thesaurus, Merimee Thesaurus, English Vocabulary, French Vocabulary and an Interlingua, connected by interthesaurus relations (including an AND combination), linguistic translation links and inexact ("+/-") links.]

11. Boolean OR-Combinations
[Diagram: concept A linked by exact equivalence to the Boolean compound B OR C; BT links shown.]
- The Boolean compound B OR C:
  - combines the instances of B and C
  - uses the properties of either B or C
  - is a BT of B and C, and an NT of their common broader terms
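
Reading B OR C as the union of instance sets reproduces the slide's claims. A small check, with hypothetical instance IDs and concept names:

```python
# Hypothetical instance sets (object IDs) for the concepts on the slide.
A = {1, 2, 3}        # thesaurus 1: "mills"
B = {1, 2}           # thesaurus 2: "windmills"
C = {3}              # thesaurus 2: "watermills"
D = {1, 2, 3, 4}     # thesaurus 2: "industrial buildings", a common BT of B and C

b_or_c = B | C       # the Boolean compound B OR C: union of instances

print(A == b_or_c)                  # True: exact equivalence A = B OR C
print(B <= b_or_c and C <= b_or_c)  # True: B OR C is a BT of B and C
print(b_or_c <= D)                  # True: B OR C is an NT of the common BT D
```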

12. Boolean AND-Combinations
[Diagram: concept A linked by exact equivalence to the Boolean compound B AND C; BT links shown.]
- The Boolean compound B AND C:
  - covers only the instances belonging to both B and C
  - combines the properties of B and C
  - is an NT of B and C, and a BT of their common narrower terms
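
Correspondingly, B AND C behaves as the intersection of the instance sets. A small check, again with hypothetical instance IDs and concept names:

```python
# Hypothetical instance sets (object IDs) for the concepts on the slide.
A = {2, 5}           # thesaurus 1: "water-powered mills"
B = {1, 2, 3, 5}     # thesaurus 2: "mills"
C = {2, 5, 7}        # thesaurus 2: "water-powered structures"
E = {5}              # thesaurus 2: "tide mills", a common NT of B and C

b_and_c = B & C      # the Boolean compound B AND C: intersection of instances

print(A == b_and_c)                   # True: exact equivalence A = B AND C
print(b_and_c <= B and b_and_c <= C)  # True: B AND C is an NT of B and C
print(E <= b_and_c)                   # True: B AND C is a BT of the common NT E
```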

13. Approximation by Inclusion
[Diagram: concept A approximated by concepts B and C of another thesaurus through a broader equivalence and narrower equivalences; BT links shown.]
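
When no exact counterpart exists, the two kinds of approximation trade precision against recall in opposite directions. A minimal sketch, assuming hypothetical instance sets and measuring each approximation against the source concept's own extension:

```python
def precision_recall(wanted: set, retrieved: set):
    """Precision and recall of `retrieved` with respect to `wanted`."""
    hits = len(wanted & retrieved)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(wanted) if wanted else 1.0
    return precision, recall

A = {1, 2, 3, 4}          # source concept, no exact counterpart in thesaurus 2
B = {1, 2, 3, 4, 5, 6}    # broader equivalence: B contains A
C1, C2 = {1, 2}, {3}      # narrower equivalences: each is contained in A

print(precision_recall(A, B))        # precision 4/6, recall 1.0
print(precision_recall(A, C1 | C2))  # (1.0, 0.75)
```

Substituting the broader concept keeps full recall at the cost of precision; substituting the union of narrower concepts keeps full precision at the cost of recall.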

14. Avoid redundant linking!
[Diagram: concepts A and B with broader, narrower and exact equivalence links; BT links shown.]
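
One plausible reading of this slide: once A has an exact equivalence to B, a broader equivalence from A to any broader term of B adds nothing, since it follows from the target thesaurus's BT hierarchy. A minimal sketch of that check, assuming a mono-hierarchical target thesaurus and hypothetical descriptors and relation labels:

```python
def ancestors(term: str, bt: dict) -> set:
    """All transitive broader terms of `term` (bt maps narrower -> broader)."""
    result = set()
    while term in bt and bt[term] not in result:  # stop on cycles
        term = bt[term]
        result.add(term)
    return result

def redundant_links(links, bt_target):
    """Broader-equivalence links implied by an exact equivalence of the same
    source plus the BT hierarchy of the target thesaurus."""
    exact = {src: tgt for src, rel, tgt in links if rel == "exact"}
    return [(src, rel, tgt) for src, rel, tgt in links
            if rel == "broader" and src in exact
            and tgt in ancestors(exact[src], bt_target)]

BT_B = {"b:windmill": "b:mill", "b:mill": "b:building"}  # hypothetical target BTs
LINKS = [("a:windmill", "exact", "b:windmill"),
         ("a:windmill", "broader", "b:mill"),
         ("a:windmill", "broader", "b:building"),
         ("a:mill", "broader", "b:building")]

print(redundant_links(LINKS, BT_B))
# [('a:windmill', 'broader', 'b:mill'), ('a:windmill', 'broader', 'b:building')]
```

The link from "a:mill" is kept because that source has no exact equivalence from which it could be derived.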

15. Problems of Mapping
- Consistency and reasoning (Description Logics!)
- Optimal substitution of combined query terms
- Protocol to propagate recall/precision control
- Inverse reading of one-to-many links
- Postcoordination: unclear semantics! E.g. "grinding & factories": solution by DL?
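
For an exact one-to-many link, set semantics determine what the reverse reading can be: from each component, in general only an inclusion link back to the source follows, not an exact one. A minimal sketch, assuming the convention that "broader equivalence" means the target is broader than the source, and hypothetical descriptors:

```python
def invert(source: str, op: str, targets: list) -> list:
    """Read an exact link `source = op(targets)` in the reverse direction."""
    if op == "OR":     # source is the union, so each target is contained in source
        relation = "broader equivalence"
    elif op == "AND":  # source is the intersection, so each target contains source
        relation = "narrower equivalence"
    else:
        raise ValueError(f"unsupported operator: {op}")
    return [(t, relation, source) for t in targets]

print(invert("a:mills", "OR", ["b:windmills", "b:watermills"]))
# [('b:windmills', 'broader equivalence', 'a:mills'), ('b:watermills', 'broader equivalence', 'a:mills')]
```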

16. Production of Mappings
- Human assessment needs (see Term-IT):
  - CSCW, workflow, decentralised management tools
  - Excellent comparative presentation of thesaurus contents
- Language engineering (see Term-IT):
  - termhood recognition, automatic translation from parallel texts, filtering by occurrence in the target indexing language
  - Excellent for preprocessing!
- Analysis of use:
  - Cluster analysis with doubly indexed entries
  - Libraries: problem of identifying the same "work"!

17. SIS - Thesaurus Management System: Co-operative linking
[Diagram: Group 1 and Group 2, each working in a new workspace on successive thesaurus versions (Version 0, Version 1, Version 2) with BT links; shows an obsolete term and the links made by each group.]

18. Users' Environment
[Diagram]

19. Three-level Architecture
[Diagram: National Authority Providers; CMSs with their CMS Maintainers and a local TMS; a cascaded mapping service; an end user with a Search Aid Tool. Flows are labelled "concept proposal", "thesaurus initialization" and "update term use".]

20. Architectural Considerations
- We propose to distinguish:
  - Collection Management Systems (CMS) with local term management
  - National authority providers
  - Mapping service
- Mapping service:
  - Co-operative mapping production environment and system, for few languages (3?), domain-specific?
  - Large-scale mapping tables detached from the production system, accessible as a replicated Web resource
- Integration:
  - Access engines connect to mapping resources on demand
  - Provision of suitable metadata describing CMS capabilities

21. Conclusions
- Thesaurus mapping is feasible, and it is the best means of coherently accessing multiple CMSs that use controlled vocabularies
- Thesaurus mapping is a major investment in human resources and IT environment
- Targeted research can considerably improve what is currently feasible in:
  - quality of mapping
  - quality of service
  - production cost

