
The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context
Antoine Isaac, Claus Zinn, Henk Matthezing, Lourens van der Meij, Stefan Schlobach, Shenghui Wang
Cultural Heritage on the Semantic Web Workshop, Oct. 12th, 2007

OAEI 2007: Results from the Library Track

Introduction
- One important problem in cultural heritage (CH): heterogeneity of description resources
  - Thesauri (at large): classification schemes, subject heading lists, …
  - This heterogeneity hampers access across collections

Introduction
- Ontology alignment can help: semantic links between ontology elements, e.g. o1:Cat owl:equivalentClass o2:Chat
- Such links can be found with automatic tools, e.g. by exploiting labels or structure
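The idea of exploiting labels can be sketched with a minimal label matcher. It proposes an owl:equivalentClass link whenever two concepts share a label after lower-casing and accent stripping. The concept identifiers, the label sets (including a French alternative label "chat" on o1:Cat) and the function names are hypothetical. Real alignment tools combine many more lexical and structural signals.

```python
import unicodedata


def normalise(label: str) -> str:
    """Lower-case a label and strip accents and surrounding whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.strip().lower()


def label_matches(o1: dict[str, set[str]], o2: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Propose (c1, 'owl:equivalentClass', c2) links for concepts sharing a normalised label."""
    # Index the second vocabulary by normalised label.
    by_label: dict[str, set[str]] = {}
    for concept, labels in o2.items():
        for label in labels:
            by_label.setdefault(normalise(label), set()).add(concept)
    links = set()
    for concept, labels in o1.items():
        for label in labels:
            for other in by_label.get(normalise(label), set()):
                links.add((concept, "owl:equivalentClass", other))
    return sorted(links)


# Hypothetical toy vocabularies: o1:Cat carries an alternative French label.
o1 = {"o1:Cat": {"Cat", "chat"}, "o1:Dog": {"Dog"}}
o2 = {"o2:Chat": {"Chat"}, "o2:Chien": {"Chien"}}
print(label_matches(o1, o2))
# [('o1:Cat', 'owl:equivalentClass', 'o2:Chat')]
```

Exact label identity is the crudest lexical signal. It finds o1:Cat/o2:Chat only because of the alternative label, which is why tools also use structure.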

Introduction
- Problem: little work so far on alignment applications
- Further research is needed on context-specific alignment: generation, deployment, evaluation
- An important context dimension: application scenarios

Agenda
- Introduction
- Dutch National Library
- Scenarios for alignment
- Book re-indexing
- Scenario-specific evaluation

KB and Thesaurus Alignment
- National Library of the Netherlands (KB): 2 main collections
- Each is described (indexed) by its own thesaurus
- Problem: maintenance; is it optimized with respect to redundancy/size?

Usage Scenarios for Thesaurus Alignment at KB
- Concept-based search: retrieving GTT-indexed books using Brinkman concepts
- Re-indexing: indexing GTT-indexed books with Brinkman concepts
- Integration of one thesaurus into the other: inserting GTT elements into the Brinkman thesaurus
- Thesaurus merging: building a new thesaurus from GTT and Brinkman
- Free-text search: matching user search terms to GTT or Brinkman concepts
- Navigation: browsing the 2 collections through a merged version of the thesauri
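The concept-based search scenario can be illustrated with a minimal sketch. It assumes the alignment is available as a map from each Brinkman concept to the GTT concepts judged equivalent to it. It also assumes that a query lists Brinkman concepts that must all be matched. The `brinkman:` and `gtt:` identifiers, the books and the function name are hypothetical.

```python
def concept_search(
    query: set[str],
    alignment: dict[str, set[str]],
    gtt_index: dict[str, set[str]],
) -> set[str]:
    """Return GTT-indexed books matching every Brinkman concept in the query.

    A query concept is matched when the book's GTT index contains at least
    one GTT concept aligned to it; unaligned query concepts match nothing.
    """
    targets = [alignment.get(b, set()) for b in query]
    return {
        book
        for book, index in gtt_index.items()
        if all(index & target for target in targets)
    }


# Hypothetical alignment (Brinkman -> GTT) and GTT-indexed books.
alignment = {
    "brinkman:geography": {"gtt:geografie"},
    "brinkman:netherlands": {"gtt:nederland"},
}
gtt_index = {
    "book1": {"gtt:geografie", "gtt:nederland"},
    "book2": {"gtt:geografie"},
    "book3": {"gtt:geschiedenis", "gtt:nederland"},
}
print(sorted(concept_search({"brinkman:geography"}, alignment, gtt_index)))
# ['book1', 'book2']
print(sorted(concept_search({"brinkman:geography", "brinkman:netherlands"}, alignment, gtt_index)))
# ['book1']
```

Conjunctive matching is one design choice. A disjunctive search would replace `all` with `any`.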

The Book Re-indexing Scenario
- Scenario: re-indexing GTT-indexed books with Brinkman concepts

The Book Re-indexing Scenario
- If one of the thesauri is dropped, legacy data has to be indexed according to the other vocabulary
  - Automatically, or
  - Semi-automatically, with users presented with candidate annotations

Scenario Requirements
- Mapping sets of GTT concepts to sets of Brinkman concepts: $\mathit{align}_{\mathit{reindex}}: \{g_1,\dots,g_m\} \to \{b_1,\dots,b_n\}$
- Option where users select based on probabilities: candidate concepts are given weights (e.g. in $[0,1]$): $\mathit{align}'_{\mathit{reindex}}: \{g_1,\dots,g_m\} \to \{(b_1,w_1),\dots,(b_n,w_n)\}$
- The generated index should generally be small: 99.2% of depot books are indexed with no more than 3 Brinkman concepts
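For the weighted re-indexing function, one simple way to keep generated indices small is to rank candidates by weight, drop weak ones and cap the size. The sketch below assumes weights in [0, 1]. The default cap of 3 echoes the 99.2% figure above. The 0.5 threshold and the function name are hypothetical choices.

```python
def select_index(
    candidates: list[tuple[str, float]],
    max_size: int = 3,
    min_weight: float = 0.5,
) -> list[str]:
    """Turn weighted Brinkman candidates into a small index.

    Candidates are ranked by decreasing weight (ties broken by concept id),
    those below min_weight are dropped, and at most max_size are kept.
    """
    for concept, weight in candidates:
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight {weight} for {concept} is outside [0, 1]")
    ranked = sorted(candidates, key=lambda cw: (-cw[1], cw[0]))
    return [concept for concept, weight in ranked if weight >= min_weight][:max_size]


candidates = [("b1", 0.9), ("b2", 0.4), ("b3", 0.7), ("b4", 0.8), ("b5", 0.6)]
print(select_index(candidates))
# ['b1', 'b4', 'b3']
```

In the supervised setting, the full ranked list could instead be shown to the indexer, who makes the final selection.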

Semantic Interpretation of the Re-indexing Function
- 1-1 case: g1 -> b1
  - b1 is semantically equivalent to g1: OK
  - b1 is more general than g1: loss of information; acceptable only if b1 is the most specific subsumer of g1's meanings (indexing specificity rule)
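The indexing specificity rule can be checked mechanically once the Brinkman hierarchy is known. This minimal sketch assumes the hierarchy is given as an acyclic map from each concept to its broader terms. It also assumes the alignment has already produced a set of Brinkman concepts that each subsume g1. Only candidates that are not broader than another candidate are kept. The concept names and function names are hypothetical.

```python
def ancestors(concept: str, broader: dict[str, set[str]]) -> set[str]:
    """All concepts reachable from `concept` by following broader-term (BT) links."""
    seen: set[str] = set()
    stack = list(broader.get(concept, set()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, set()))
    return seen


def most_specific(subsumers: set[str], broader: dict[str, set[str]]) -> set[str]:
    """Keep the candidate subsumers that are not broader than another candidate."""
    too_general: set[str] = set()
    for candidate in subsumers:
        too_general |= ancestors(candidate, broader) & subsumers
    return subsumers - too_general


# Hypothetical Brinkman-like hierarchy: painting BT visual_arts BT arts.
broader = {"painting": {"visual_arts"}, "visual_arts": {"arts"}}
# Suppose the alignment says all three concepts subsume a GTT concept g1.
print(most_specific({"arts", "visual_arts", "painting"}, broader))
# {'painting'}
```

If several incomparable candidates remain, each is a most specific subsumer. Choosing among them is left to the scenario.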

Semantic Interpretation of the Re-indexing Function
- Generic cases: combinations of concepts
  - Considerations on semantic links are the same
  - But combination matters: indexing is post-coordinated, e.g. {“Geography”; “the Netherlands”} in GTT -> a book about the geography of the Netherlands
  - Different granularities and indexing points of view: Brinkman has “Netherlands; Geography”

Problem of Alignment Deployment
- Results of existing tools may need re-interpretation
  - Unclear semantics of mapping links: "=", "<", weights
  - Only single concepts are involved in mappings

Example of an Alignment Deployment Approach
- Starting from similarity measures over both thesauri: sim(X,Y) = n
- Aggregation strategy: simple ranking
  - For a concept, take the top k most similar concepts
  - Gather the GTT concepts and the Brinkman ones
- The re-indexing function is specified by conditions for firing rules, e.g. a rule fires if the book indexing contains the left part of the rule
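The ranking-based aggregation and the rule-firing condition can be sketched together. This is one possible reading of the slide, with several assumptions:
- The pivot concept is included in its own group.
- Candidates whose similarity is not above a hypothetical `min_sim` threshold are ignored.
- The GTT concepts of a group form the left part of a rule and the Brinkman ones its right part.
- A rule fires when its whole left part occurs in the book's GTT index.

The identifiers, similarity values and function names are invented for illustration.

```python
from collections.abc import Callable

Rule = tuple[frozenset[str], frozenset[str]]  # (GTT left part, Brinkman right part)


def build_rules(
    concepts: list[str],
    sim: Callable[[str, str], float],
    is_gtt: Callable[[str], bool],
    k: int = 2,
    min_sim: float = 0.0,
) -> set[Rule]:
    """Group each concept with its top-k similar concepts and split the group by thesaurus."""
    rules: set[Rule] = set()
    for pivot in concepts:
        others = [c for c in concepts if c != pivot and sim(pivot, c) > min_sim]
        top = sorted(others, key=lambda c: (-sim(pivot, c), c))[:k]
        group = {pivot, *top}
        left = frozenset(c for c in group if is_gtt(c))
        right = frozenset(c for c in group if not is_gtt(c))
        if left and right:
            rules.add((left, right))
    return rules


def reindex(book_index: set[str], rules: set[Rule]) -> set[str]:
    """Fire every rule whose whole left part is contained in the book's GTT index."""
    result: set[str] = set()
    for left, right in rules:
        if left <= book_index:
            result |= right
    return result


# Hypothetical similarity scores (symmetric, 0.0 when not listed).
scores = {
    frozenset({"gtt:geografie", "br:nederland;geografie"}): 0.8,
    frozenset({"gtt:nederland", "br:nederland;geografie"}): 0.7,
    frozenset({"gtt:geografie", "gtt:nederland"}): 0.3,
}
concepts = ["gtt:geografie", "gtt:nederland", "gtt:geschiedenis", "br:nederland;geografie"]
rules = build_rules(
    concepts,
    sim=lambda x, y: scores.get(frozenset({x, y}), 0.0),
    is_gtt=lambda c: c.startswith("gtt:"),
)
print(reindex({"gtt:geografie", "gtt:nederland"}, rules))
# {'br:nederland;geografie'}
print(reindex({"gtt:geografie", "gtt:geschiedenis"}, rules))
# set()
```

The second call shows post-coordination at work: "gtt:geografie" without "gtt:nederland" does not trigger the combined Brinkman heading.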

Evaluation Design
- We do not assess the rules themselves
- We assess their application to book indexing
- 2 classical aspects: correctness (cf. precision) and completeness (cf. recall)

Evaluation Design: Different Variants and Settings
- Fully automatic evaluation: using the set of dually indexed books as gold standard
- Manual evaluation 1: a human expert assesses candidate indices
  - Unsupervised setting: the margin of error should be very low
  - Supervised setting: less strict, but size also matters
- Manual evaluation 2: same as 1, but a first index has been produced by the expert
  - The distance between the two indices is assessed
  - Possibly changing the original index

Human Evaluation vs. Automatic Evaluation
Taking into account:
- Indexing variability: automatic evaluation compares with one specific indexing choice; this is especially important if the thesaurus doesn't match the book's subject
- Evaluation variability: only one expert judgment is considered per book
- Evaluation set bias: dually indexed books may present specific characteristics

New Developments, Outside of the Paper!
- Reviews: "you should add actual results of good general alignment systems as compared to your scenario"
- Ontology Alignment Evaluation Initiative: http://oaei.ontologymatching.org/2007
  - State-of-the-art aligners applied to specific cases
- This paper provided the grounding for an OAEI Library track
  - KB vocabularies
  - Evaluation in the re-indexing scenario

Automatic Evaluation
- There is a gold standard for the re-indexing scenario
- General method: for dually indexed books, compare the existing Brinkman annotations with the new ones

Automatic Evaluation
- Book level: precision & recall for matched books
  - Matched books: books for which there is at least one good annotation
  - A minimal hint about users' (dis)satisfaction
- Annotation level: P & R for candidate annotations
- Note: counting is over annotations and books, not rules and concepts
  - Rules and concepts that are used more often are more important
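Both levels of measures can be computed as below. This sketch uses assumed definitions, since the slide gives no formulas:
- A book is "matched" when at least one candidate annotation is in its gold-standard Brinkman index.
- Book-level precision divides matched books by the books that received candidates. Book-level recall divides them by all evaluated books.
- Annotation-level counts are summed over all books, so frequently used concepts weigh more.

The function and variable names are hypothetical.

```python
def ratio(num: float, den: float) -> float:
    """Safe division: 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(candidates: dict[str, set[str]], gold: dict[str, set[str]]) -> dict[str, float]:
    """Book-level and annotation-level precision/recall against dually indexed books."""
    books = list(gold)
    cand = {b: candidates.get(b, set()) for b in books}
    n_candidates = sum(len(cand[b]) for b in books)
    n_gold = sum(len(gold[b]) for b in books)
    n_correct = sum(len(cand[b] & gold[b]) for b in books)
    annotated = [b for b in books if cand[b]]
    matched = [b for b in annotated if cand[b] & gold[b]]
    return {
        "P_book": ratio(len(matched), len(annotated)),
        "R_book": ratio(len(matched), len(books)),
        "P_annotation": ratio(n_correct, n_candidates),
        "R_annotation": ratio(n_correct, n_gold),
    }


gold = {"book1": {"b1", "b2"}, "book2": {"b3"}, "book3": {"b4"}}
candidates = {"book1": {"b1", "b5"}, "book2": {"b6"}}
print(evaluate(candidates, gold))
# {'P_book': 0.5, 'R_book': 0.3333333333333333, 'P_annotation': 0.3333333333333333, 'R_annotation': 0.25}
```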

Automatic Evaluation Results
- Note: results are for exactMatch links only

Manual Evaluation Method
- Variant 1, for the supervised setting
- Selection of 100 books
- 4 KB evaluators
- Paper forms + copies of the books

Paper Forms

Annotation Translation: Manual Evaluation Results
- Research question: quality of candidate annotations
- Measures used: as in the automatic evaluation
- Performance is consistently higher than in the automatic evaluation
[Figure: left, manual evaluation; right, automatic evaluation]

Annotation Translation: Manual Evaluation Results
- Research question: evaluation variability
- Krippendorff's agreement coefficient (alpha)
- High variability: overall alpha = 0.62, below 0.67, the classic threshold for Computational Linguistics (CL) tasks
- But indexing seems to be more variable than usual CL tasks
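Krippendorff's alpha for nominal judgements compares observed disagreement with the disagreement expected by chance: $\alpha = 1 - D_o/D_e$. The sketch below implements the standard coincidence-matrix computation. It assumes each evaluated item (e.g. a candidate annotation) comes with the labels given by the evaluators who judged it. Items judged by only one evaluator are skipped. The labels in the example are invented.

```python
from collections import defaultdict
from itertools import permutations


def krippendorff_alpha_nominal(units: list[list[str]]) -> float:
    """Krippendorff's alpha for nominal data, tolerating missing judgements."""
    # Coincidence matrix: each ordered pair of values within a unit adds 1/(m_u - 1).
    coincidences: dict[tuple[str, str], float] = defaultdict(float)
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a single judgement cannot be paired
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)
    # Marginal totals n_c and grand total n.
    totals: dict[str, float] = defaultdict(float)
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when all pairable values are identical")
    # alpha = 1 - D_o / D_e with D_o = observed / n and D_e = expected / (n (n - 1)).
    return 1.0 - (n - 1) * observed / expected


judgements = [["ok", "ok"], ["ok", "wrong"], ["wrong", "wrong"], ["wrong", "wrong"]]
print(round(krippendorff_alpha_nominal(judgements), 3))
# 0.533
```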

Annotation Translation: Manual Evaluation Results
- Research question: indexing variability
- Measuring the acceptability of the original book indices
- Krippendorff's agreement for the indices chosen by evaluators: an overall alpha of 0.59 confirms high variability

Conclusions
- A better characterization of alignment scenarios is needed
- For a single case there are many scenarios and variants
- This requires eliciting requirements
- And evaluation criteria

Discussion: Differences between Scenarios?
- Concept-based search
- Re-indexing
- Integration of one thesaurus into the other
- Thesaurus merging
- Free-text search aided by thesauri
- Navigation

Discussion: Differences between Scenarios?
- Semantics of alignment
  - A core of primitives that are useful: broader/narrower, related
  - Some constructs are more specific: "AND" combination for re-indexing
  - Interpretation of equivalence?
    - Thesaurus merging: “excavation” = “excavation”
    - Query reformulation: “excavation” = “archeology; Netherlands”

Discussion: Differences between Scenarios?
- Multi-concept alignment
  - Useful for re-indexing or concept-based search
  - Less so for thesaurus re-engineering scenarios
  - Combinations are not fully handled by thesaurus formats
  - But simple links involving the same concept can be useful: C1 BT C2, C1 BT C3
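The remark about simple links can be made concrete under a set-inclusion reading of BT. This is an assumption, since thesaurus BT links are often looser. If C1 is narrower than the AND-combination of C2 and C3, it is narrower than each of them. The pair "C1 BT C2", "C1 BT C3" therefore expresses the same constraint. If the original link was an equivalence, as in the query-reformulation example, the BT links keep only one direction. The function name and flag are hypothetical.

```python
def combination_to_bt(
    c1: str, combination: frozenset[str], equivalent: bool
) -> tuple[list[tuple[str, str, str]], bool]:
    """Approximate a link from c1 to an AND-combination with simple BT links.

    Returns the links and whether they preserve the original meaning, under a
    set-inclusion reading of BT: 'c1 narrower than AND(C2, C3)' holds exactly
    when 'c1 BT C2' and 'c1 BT C3' both hold, whereas for an equivalence the
    converse direction (AND(C2, C3) narrower than c1) cannot be expressed.
    """
    links = [(c1, "BT", member) for member in sorted(combination)]
    return links, not equivalent


links, lossless = combination_to_bt("C1", frozenset({"C2", "C3"}), equivalent=True)
print(links, lossless)
# [('C1', 'BT', 'C2'), ('C1', 'BT', 'C3')] False
```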

Discussion: Differences between Scenarios?
- Precision and recall
  - Browsing -> emphasis on recall
  - For other scenarios, it depends on the setting: supervised vs. unsupervised

Thanks!
