Aligning Thesauri for an integrated Access to Cultural Heritage Collections Antoine ISAAC (including slides by Frank van Harmelen) STITCH Project UDC Conference.

1 Aligning Thesauri for an integrated Access to Cultural Heritage Collections
Antoine ISAAC (including slides by Frank van Harmelen)
STITCH Project
UDC Conference, June 5th, 2007

2 Background
CATCH: Continuous Access To Cultural Heritage
  Funded by NWO
  10 computer science research projects applied to the Cultural Heritage field
STITCH: SemanTic Interoperability To access Cultural Heritage
  Exchanging and integrating metadata
Beware: this is research!

3 Agenda
The Semantic Interoperability problem
Demo
Semantic Web solutions for interoperability
  Conceptual vocabulary alignment
  Conceptual vocabulary representation

4 KB Illustrated Manuscripts

5 KB Illustrated Manuscripts

6 BNF Mandragore

7 BNF Mandragore

8 The Semantic Interoperability Problem
Trend: simultaneous access to different collections
Problem: conceptual heterogeneity
  No standard vocabulary/thesaurus
    “classical ruins” vs. “landscape with ruins”
    “the Virgin Mary” vs. “Saint Mary”
  And we don’t really want one: different vocabularies suit different domains, traditions, tasks
Practical consequence:
  Searching for “the Virgin Mary” misses “Saint Mary”
  Unless we know both vocabularies

9 Old situation

10 Vocabulary alignment
Find semantic correspondences between vocabulary elements
  “classical ruins” ≈ “landscape with ruins”
  “the Virgin Mary” = “Saint Mary”
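To illustrate how such correspondences could address the missed-result problem, here is a minimal Python sketch of query expansion over an alignment table. Only the two label pairs come from the slide. The relation names, the two collection indexes, the record identifiers and the `expand` / `search` helpers are hypothetical and chosen purely for illustration.

```python
# Alignment as (label in vocabulary 1, relation, label in vocabulary 2).
# The two label pairs are the slide's examples; the relation names are assumed.
ALIGNMENT = [
    ("classical ruins", "close", "landscape with ruins"),
    ("the Virgin Mary", "exact", "Saint Mary"),
]

# Hypothetical subject indexes: subject label -> record identifiers.
COLLECTION_1 = {"the Virgin Mary": ["c1-001"], "classical ruins": ["c1-002"]}
COLLECTION_2 = {"Saint Mary": ["c2-017"], "landscape with ruins": ["c2-018"]}


def expand(label, alignment):
    """Return the label plus every label aligned to it, in either direction."""
    labels = {label}
    for source, _relation, target in alignment:
        if source == label:
            labels.add(target)
        elif target == label:
            labels.add(source)
    return labels


def search(label, collections, alignment=()):
    """Look up a subject in all collections, expanding it with aligned labels."""
    hits = []
    for wanted in expand(label, alignment):
        for index in collections:
            hits.extend(index.get(wanted, []))
    return sorted(hits)


collections = [COLLECTION_1, COLLECTION_2]
without_alignment = search("the Virgin Mary", collections)
with_alignment = search("the Virgin Mary", collections, ALIGNMENT)
```

Without the alignment, the query only reaches the collection indexed with the same vocabulary. With it, the record indexed as “Saint Mary” is returned as well.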

11 New situation

12 Demo
http://stitch.cs.vu.nl/rp33333/MANDRA-SV-ICE- mandraNewNONE
amphibians, Wheat
[Screenshots at the end of these slides]

13 Agenda
The Semantic Interoperability problem
Demo
Semantic Web solutions for interoperability
  Conceptual vocabulary alignment
  Conceptual vocabulary representation

14 Vocabulary alignment
Find correspondences between vocabulary elements
  “klassieke ruïnes” (classical ruins) ≈ “landschap met ruïnes” (landscape with ruins)
  “maagd Maria” (Virgin Mary) = “Heilige Moeder” (Holy Mother)
STITCH aim: doing it (semi-)automatically
  Vocabularies are big
  They evolve over time
Using techniques from the Semantic Web research domain
  Problem comparable to ontology alignment
  Techniques already investigated there
  Linguistics, statistics

15 Automatic alignment techniques
Lexical
Structural
Statistical
Background knowledge

16 Lexical alignment
Labels of entities, textual definitions
[Diagram residue: labels “tumor”, “brain”, “Long”, “tumor”, “Long”, linked by “More specific than”]
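A lexical technique of this kind can be sketched as a comparison of label tokens. This is only one simple heuristic, not necessarily the one used in STITCH: it assumes that a label containing all the words of another label, plus extra ones, denotes a more specific concept. The function names and the example labels in the usage lines are illustrative.

```python
import re


def tokens(label):
    """Lower-case a label and split it into a set of word tokens."""
    return set(re.findall(r"\w+", label.lower()))


def lexical_relation(label_a, label_b):
    """Guess a relation between two labels from their shared words.

    Returns "equivalent", "more specific than", "more general than",
    or None when the token sets are not nested.
    """
    a, b = tokens(label_a), tokens(label_b)
    if not a or not b:
        return None
    if a == b:
        return "equivalent"
    if a > b:  # a contains every word of b, plus extra modifiers
        return "more specific than"
    if a < b:
        return "more general than"
    return None


# Illustrative labels (assumed, not taken from a real thesaurus).
relation = lexical_relation("brain tumor", "tumor")
```

Such a heuristic is cheap but brittle: it ignores synonyms, inflection and word order, which is one reason the slides list other techniques alongside it.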

17 Automatic Alignment Techniques
Lexical
Structural
Statistical
Background knowledge

18 Statistical alignment
Object information (e.g. book indexing)

19 Statistical alignment: KB collections
(4951 1152 613) Nederlands - Nederlandse taalkunde [Dutch - Dutch linguistics]
(280 714 243) Diabetes mellitus - suikerziekte [diabetes mellitus - diabetes]
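The slide does not say what the three numbers stand for or which measure is applied. As a hedged illustration, the sketch below assumes they are, in order, the number of books indexed with the first concept, with the second concept, and with both, and it scores each pair with the Jaccard coefficient. Both the reading of the triples and the choice of measure are assumptions.

```python
def jaccard(count_a, count_b, count_both):
    """Jaccard overlap of two sets, computed from their sizes and the size of their intersection."""
    union = count_a + count_b - count_both
    return count_both / union if union else 0.0


# Counts copied from the slide; their interpretation as (|A|, |B|, |A and B|) is assumed.
PAIRS = {
    ("Nederlands", "Nederlandse taalkunde"): (4951, 1152, 613),
    ("Diabetes mellitus", "suikerziekte"): (280, 714, 243),
}

scores = {pair: jaccard(*counts) for pair, counts in PAIRS.items()}

for (concept_a, concept_b), score in scores.items():
    print(f"{concept_a} / {concept_b}: {score:.3f}")
```

Under that reading, the second pair overlaps proportionally more than the first, even though its absolute counts are much smaller.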

20 Automatic Alignment Techniques
Lexical
Structural
Statistical
Background knowledge

21 Alignment using shared background knowledge
Using a shared conceptual reference to find links
[Diagram: thesaurus 1 and thesaurus 2, both connected to background knowledge]
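The idea can be sketched as a join: each thesaurus is first anchored to a shared reference, and two concepts are linked when they share an anchor. In the Python sketch below, the anchor tables, the `ref:` identifiers and the function name are hypothetical. How the anchors are obtained in the first place is left out.

```python
from collections import defaultdict


def align_via_background(anchors_1, anchors_2):
    """Link concepts of two thesauri that are anchored to the same reference concept.

    Each argument maps a thesaurus concept to a concept of the shared reference.
    Returns sorted (concept_1, concept_2, reference_concept) triples.
    """
    by_reference = defaultdict(list)
    for concept, reference in anchors_2.items():
        by_reference[reference].append(concept)

    links = []
    for concept, reference in anchors_1.items():
        for other in by_reference.get(reference, []):
            links.append((concept, other, reference))
    return sorted(links)


# Hypothetical anchors of each thesaurus in a shared background resource.
THESAURUS_1 = {"the Virgin Mary": "ref:mary", "classical ruins": "ref:ruins"}
THESAURUS_2 = {"Saint Mary": "ref:mary", "wheat": "ref:wheat"}

links = align_via_background(THESAURUS_1, THESAURUS_2)
```

Concepts whose anchors have no counterpart in the other thesaurus simply stay unlinked, so the coverage of the background resource bounds what this technique can find.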

22 Alignment: no universal solution
No single technique gives an ideal solution
Different techniques have to be selected/combined, depending on the application case
  Poor vs. rich semantic structure
  Extensive vs. limited lexical coverage
  Existence of collections described by several vocabularies
Alignment is a difficult research problem

23 Agenda
The Semantic Interoperability problem
Demo
Semantic Web solutions for interoperability
  Conceptual vocabulary alignment
  Conceptual vocabulary representation

24 Representing Vocabularies
Many different models and formats to represent vocabularies
Need for standard formats to develop standardized tools and methods
  Alignment process
  Browsing/information retrieval tools using vocabularies
Need to represent features commonly used by these tools
  Especially lexical information and semantic links

25 SKOS (Simple Knowledge Organisation System)
World Wide Web Consortium (W3C)
Model to represent simple conceptual vocabularies (thesauri, classification schemes) on the Semantic Web
  Comparable to Dublin Core, for conceptual vocabularies
SKOS offers building blocks to create XML/RDF data
  Concepts and ConceptSchemes
  Lexical properties (prefLabel, altLabel)
  Semantic relations (broader, related)
  Notes (scopeNote, definition)

26 SKOS: Small UDC Example
skos:Concept http://www.udcc.org/udc/class_512
  skos:prefLabel 512@zxx
  skos:prefLabel Algebra@en
  skos:broader http://www.udcc.org/udc/class_51
Beware: this is a standard, not everything can be represented!
  E.g. for UDC, difficult to represent all types of auxiliaries
  Is “-2 Evidence of religion” a standard concept?
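For readers who want to see the example as RDF statements, the following Python sketch writes it out as N-Triples using only the standard library. The concept URIs and labels are those of the slide, and the RDF and SKOS namespace URIs are the standard W3C ones. The helper functions and the choice of N-Triples (the slides mention XML/RDF) are assumptions made for brevity.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
UDC = "http://www.udcc.org/udc/"  # base of the concept URIs shown on the slide


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept, pref_labels, broader=None):
    """Return N-Triples lines describing one SKOS concept."""
    subject = uri(concept)
    lines = [f"{subject} {uri(RDF_TYPE)} {uri(SKOS + 'Concept')} ."]
    for text, lang in pref_labels:
        lines.append(f"{subject} {uri(SKOS + 'prefLabel')} {literal(text, lang)} .")
    if broader:
        lines.append(f"{subject} {uri(SKOS + 'broader')} {uri(broader)} .")
    return lines


triples = concept_triples(
    UDC + "class_512",
    [("512", "zxx"), ("Algebra", "en")],
    broader=UDC + "class_51",
)
print("\n".join(triples))
```

The notation “512” carries the language tag `zxx` (no linguistic content), as on the slide, which keeps it distinct from the English caption.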

27 Agenda
The Semantic Interoperability problem
Demo
Semantic Web solutions for interoperability
  Conceptual vocabulary alignment
  Conceptual vocabulary representation

28 Conclusion: New opportunities for making knowledge accessible
Integration of collections at the semantic level
  Semantic integration and vocabulary alignment
Representation and publication of conceptual vocabularies
  SKOS is an open, web-compatible standard
Semantic Web research can help Cultural Heritage
Vision: a global network of interconnected collections and vocabularies that can be exploited by standard tools?
  Or somewhere in-between present situation and the vision

29 Discussion: UDC and Semantic Interoperability?
UDC as pivot language (spine) for multilingual access
  Ideal for multilingual scenarios
  Compatible with common information needs
“Front-office” scenario
  Aligning initial vocabularies to UDC
  Using UDC in the access system
MSAC: Multilingual Subject Access to Catalogues of National Libraries
  UDC as a searching/browsing means, with other vocabularies
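The pivot idea can be sketched as follows: once each initial vocabulary is aligned to UDC, a subject in one vocabulary reaches its counterparts in another through the shared UDC class. The two local vocabularies and their mappings below are hypothetical. Only the UDC notations 512 (Algebra) and its broader class 51 appear on the SKOS example slide.

```python
def labels_for_class(udc_class, mapping):
    """Return the labels of one vocabulary that are aligned to a given UDC class."""
    return sorted(label for label, cls in mapping.items() if cls == udc_class)


def translate(label, source, target):
    """Map a label of the source vocabulary to target labels through the UDC pivot."""
    udc_class = source.get(label)
    if udc_class is None:
        return []
    return labels_for_class(udc_class, target)


# Hypothetical alignments of two local vocabularies to UDC notations.
VOCABULARY_EN = {"Algebra": "512", "Mathematics": "51"}
VOCABULARY_FR = {"Algèbre": "512", "Mathématiques": "51"}

french_for_algebra = translate("Algebra", VOCABULARY_EN, VOCABULARY_FR)
```

With n vocabularies, this needs n alignments to the pivot instead of one alignment per pair of vocabularies, at the price of losing distinctions that UDC itself does not make.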

30 Discussion: UDC and Semantic Interoperability?
“Back-office” scenario?
  UDC as a background resource for automatic pairwise alignment between the initial vocabularies
  Multilingual information, rich semantic structure
Both scenarios require more accessible UDC
  And experimentation…

31 Thanks!

32 Links
STITCH: http://stitch.cs.vu.nl
Demo collections
  BNF Mandragore: http://mandragore.bnf.fr
  KB illuminated manuscripts: http://www.kb.nl/manuscripts/
Library-originated integration projects
  MSAC search interface: http://sigma.nkp.cz
  MACS project: http://macs.cenl.org
Semantic Web links
  Semantic Web at W3C: http://www.w3.org/2001/sw/
  SKOS: http://www.w3.org/2004/02/skos/
Semantic Web projects dealing with Cultural Heritage
  MuseumFinland: http://www.museosuomi.fi/
  eCulture: http://e-culture.multimedian.nl/

33 Demo (1)
Subject vocabulary, collection 1
Subjects

34 Demo (2)
Hierarchical path from root to selected subject
Possible specialization for selected subject

35 Demo (3)
Document from Collection 2
Semantic alignment of subjects activated

36 Demo (4)
Subject from voc2 aligned to voc1: “amphibians”
Back

