Published by Amberly Newman. Modified over 9 years ago
1
Expressing Lexical Complexity in SKOS(XL)
Thomas Bandholtz
5th ECOTERM MEETING at FAO, Rome, Italy, 05-06 October 2009
innoQ Deutschland GmbH, D-40880 Ratingen
www.innoq.com
thomas.bandholtz@innoQ.com
2
Content
Expressing Lexical Complexity in SKOS(XL)
- Motivation
- Thesaurus models with regard to lexical complexity
- UMTHES extensions of SKOSXL
- Examples using RDF Turtle syntax
5/6 October 2009, Ecoterm 2009: Lexical Complexity SKOS(XL)
3
Motivation
- What is “lexical complexity”?
- Why should we care?
- The case: UMTHES in SKOS
- Umweltbundesamt (DE) & innoQ develop iQvoc
4
What is “lexical complexity”?
- Each Concept may be represented by multiple terms
  - preferred / non-preferred term, multilingualism, etc.
- Each term may have many lexical representations
  - inflection
  - abbreviation
  - “legal” variants in orthography
  - historical versions of “legal” orthography (in German: 1880 - 2006)
  - common misspellings
  - regional variants in the same language
- Each term may be a compound term
  - a compound term may contain term delimiters (spaces or hyphens)
  - the components may appear dispersed within a sentence
  - the components may designate different concepts by themselves
5
(a side note about orthography)
“Before compulsory education was established, it was something to be able to write.”
tb: just like Cervantes, Dante, Goethe, Shakespeare, Whitman, etc.
“Since then, you have to be a proper speller.”
(Peter Bichsel, Der Leser. Das Erzählen. Frankfurter Poetik-Vorlesungen. 1982)
6
Why should we care?
- Traditional (nice-to-have): alphabetic lists of subject indices show some lexical variants.
- Contemporary (prerequisite): automatic (machine-made) detection of Concepts covered by a natural language document (“Named Entity Recognition”)
  - must capture a covered Concept as precisely as possible
  - considering all possible lexical appearances, including term composition
- Language dependent: English is comparatively simple in this regard. German is awful! (add your language here)
7
The case: UMTHES in SKOS
The German Environmental Thesaurus UMTHES
- ~ 12,000 preferred + 25,000 non-preferred terms + 11,000 'multiple-composition' (spelling) forms
- needs to be serialized in SKOS for migration into the iQvoc vocabulary management tool
- includes sophisticated knowledge about lexical complexity
- we don't want to lose this when moving to SKOS(XL)
8
UBA(de) & innoQ develop …
iQvoc - Open Source Vocabulary Management Tool
- Totally Web-based, supports distributed editorial teams
- Safe and comfortable, schema-driven editing features
- Simple but powerful workflow implementation
Conformance
- W3C “Cool URI” design and deployment
- W3C SKOS Recommendation
Availability
- GNU General Public License (GPL)
- iQvoc version 1 demo (GEMET) at: http://apps.innoq.com/iqvoc/about.html
- iQvoc 2 availability planned for Q1 2010
9
Thesaurus models with regard to lexical complexity
- Traditional - ISO 2788:1986
- ISO Model revised (Draft 2008-11-18)
- SKOS W3C Recommendation 2009-08-18
10
Traditional - ISO 2788:1986
“Guidelines for the establishment and development of monolingual thesauri”
- indexing language: “A controlled set of terms selected from natural language and used to represent, in summary form, the subjects of documents.”
- thesaurus: “The vocabulary of a controlled indexing language, formally organized …”
- preferred term: “A term used consistently when indexing to represent a given concept … sometimes known as descriptor.”
- non-preferred term: “The synonym or quasi-synonym of a preferred term. A non-preferred term is not assigned to documents but is provided as an entry point … sometimes known as a non-descriptor.”
11
ISO 2788:1986 Model (1)
[Diagram of the ISO 2788:1986 model, with these annotations:]
- hierarchical and associative relations between preferred terms are not in focus here
- term equivalence: see next slide
12
ISO 2788:1986 Model (2)
compound term: “An indexing term which can be factored morphologically into separate components, each of which could be expressed, or re-expressed, as a noun that is capable of serving independently as an indexing term.
a) The focus or head, i.e. the noun component which identifies the general class of concepts to which the term as a whole refers. Examples: ‘printed indexes’, ‘hospitals for children’.
b) The difference or modifier, i.e. one or more further components which serve to narrow the extension of the focus and so specify one of its subclasses. Examples: ‘printed indexes’, ‘hospitals for children’.
The focus and its difference(s) may be written as separate words, as in ‘dining rooms’ and ‘soup spoons’, or they may be concatenated into single words, as in ‘bedrooms’ and ‘teaspoons’.”
13
ISO Model revised (Draft 2008-11-18)
Leonard Will, 2009-02-13, on the public SKOS mailing list:
“I write as Chair of the ‘Data Modeling, Exchange Formats and Protocols’ subgroup of the ISO working group SC9WG8/Project 25964, currently revising the ISO standard for thesauri for information retrieval, but as these standards are still in draft form anything I say here is my own interpretation of the way we are going, and is not authoritative.”
…
“The ISO model is firmly based on relationships between concepts, not terms. Terms are used as labels for concepts, as in SKOS.”
http://lists.w3.org/Archives/Public/public-esw-thes/2009Feb/0033.html
(see diagram on next slide)
14
ISO Model revised (Draft 2008-11-18)
[Diagram of the revised ISO model]
15
W3C SKOS Recommendation
Simple Knowledge Organization System
“SKOS is an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web …”
- Started in 2004: http://www.w3.org/2004/02/skos/
- 2009-08-18: W3C Recommendation status
- SKOS Reference: http://www.w3.org/TR/2009/REC-skos-reference-20090818/
- SKOS Primer: http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
- SKOS Use Cases and Requirements: http://www.w3.org/TR/2009/NOTE-skos-ucr-20090818/
16
SKOS Model: about Concepts, not terms
[Diagram of the SKOS model, with these annotations:]
- “anything” can have these labels (~ terms) and notes
- includes relations known from ISO “preferred term”: hierarchical, associative, but not equivalence
- ~ ISO node label
17
ISO 2788:1986 mapped to SKOS
ISO 2788:1986 | ~ SKOS (without XL)
document | out of scope
indexing language | n/a (may be described as the set of all values assigned to prefLabel or altLabel properties of Concept instances in a ConceptScheme)
thesaurus | ConceptScheme (any kind of “controlled structured vocabulary”)
mentioned but not defined | Concept: “An idea or notion; a unit of thought.”
indexing term | n/a, indexing should use Concept references
preferred term | value of prefLabel assigned to a Concept instance
non-preferred term | value of altLabel assigned to a Concept instance
compound term | n/a
node label | Collection
term hierarchy | broader/narrower, not between terms but between Concept instances
term association | related, not between terms but between Concept instances
term equivalence | n/a (may be seen between values assigned to prefLabel / altLabel of the same Concept instance)
scope note, definition | note (changeNote, definition, editorialNote, example, scopeNote, …)
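To make the mapping concrete, here is a minimal sketch of a tiny thesaurus fragment in plain SKOS. The namespace `http://example.org/thesaurus/`, the concepts, and the scope note text are all invented for illustration; only the `skos:` terms come from the W3C Recommendation.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/thesaurus/> .   # hypothetical namespace

# ISO "thesaurus" ~ skos:ConceptScheme
:scheme rdf:type skos:ConceptScheme .

:waste rdf:type skos:Concept ;
    skos:inScheme  :scheme ;
    skos:prefLabel "waste"@en ;       # ISO "preferred term"
    skos:altLabel  "garbage"@en ;     # ISO "non-preferred term"
    skos:scopeNote "Invented scope note, for illustration only."@en ;
    skos:narrower  :hazardousWaste ;  # ISO "term hierarchy", stated between Concepts
    skos:related   :recycling .       # ISO "term association", stated between Concepts

:hazardousWaste rdf:type skos:Concept ;
    skos:inScheme  :scheme ;
    skos:prefLabel "hazardous waste"@en ;
    skos:broader   :waste .

:recycling rdf:type skos:Concept ;
    skos:inScheme  :scheme ;
    skos:prefLabel "recycling"@en .

# ISO "node label" ~ skos:Collection
:wasteByKind rdf:type skos:Collection ;
    skos:prefLabel "waste by kind"@en ;
    skos:member    :hazardousWaste .
```

Note that nothing in this fragment can say that “hazardous waste” is composed of “hazardous” and “waste”: the labels are plain literals, which is why the compound term row maps to n/a.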
18
What is added by SKOSXL?
- skosxl:Label is a Class, not a literal
- skosxl:Label has (exactly one) literalForm
- skosxl:Label can have a labelRelation to another Label
What you don't see in the diagram:
- skos:prefLabel etc. are extended by a “property chain” (seen from an rdfs:Resource): the skosxl:literalForm of a skosxl:Label assigned via skosxl:prefLabel also counts as a value of skos:prefLabel for the same resource (likewise for altLabel and hiddenLabel).
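The property chain can be sketched as follows. The chain is written here with OWL 2's `owl:propertyChainAxiom` purely as notation: because it ends in a literal-valued property, it falls outside OWL 2 DL, so treat the axiom as an illustration of the intended entailment, not as something every reasoner will accept. The namespace `http://example.org/umthes/` is a placeholder.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix :       <http://example.org/umthes/> .   # hypothetical namespace

# The chain: skosxl:prefLabel followed by skosxl:literalForm implies skos:prefLabel
skos:prefLabel owl:propertyChainAxiom ( skosxl:prefLabel skosxl:literalForm ) .

# Asserted data
:4711 rdf:type skos:Concept ;
    skosxl:prefLabel :waste .
:waste rdf:type skosxl:Label ;
    skosxl:literalForm "waste"@en .

# Entailed by the chain (written out only for illustration)
:4711 skos:prefLabel "waste"@en .
```

The entailment runs in one direction only: a plain skos:prefLabel literal does not create a skosxl:Label resource.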
19
Extensions of SKOSXL by UMTHES
properties of skosxl:Label complementing skosxl:literalForm
- baseForm: inflectional “root” of the term (add suffixes to this)
- inflectionalCode: encoding of a regular inflectional pattern
- lexicalVariant: any lexical variant that may appear in a written document
  - inflectional - derived by inflection
  - acronym - any kind of abbreviation
  - cultural - any (sub)cultural variation
  - misspelled - common spelling errors
subProperties of skosxl:labelRelation
- homograph: homograph part of a qualified name
- hasQualifier: qualifier part of a qualified name
- lexicalExtension: may point to historical orthography, or verb form, etc.
- compoundFrom: composition (value is an rdf:List)
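One way these extensions might be declared as an RDF vocabulary is sketched below. This is not the UMTHES schema itself: the `ext:` namespace IRI is a placeholder, and two modelling choices are assumptions made here, namely that the four kinds of variant are sub-properties of ext:lexicalVariant, and that ext:compoundFrom is declared on its own (with an rdf:List range) instead of under skosxl:labelRelation, whose range is skosxl:Label.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/umthes/ext#> .   # hypothetical namespace

# Literal-valued properties of skosxl:Label, complementing skosxl:literalForm
ext:baseForm         rdf:type owl:DatatypeProperty ; rdfs:domain skosxl:Label .
ext:inflectionalCode rdf:type owl:DatatypeProperty ; rdfs:domain skosxl:Label .
ext:lexicalVariant   rdf:type owl:DatatypeProperty ; rdfs:domain skosxl:Label .

# Assumption: the four kinds of variant specialise ext:lexicalVariant
ext:inflectional rdfs:subPropertyOf ext:lexicalVariant .
ext:acronym      rdfs:subPropertyOf ext:lexicalVariant .
ext:cultural     rdfs:subPropertyOf ext:lexicalVariant .
ext:misspelled   rdfs:subPropertyOf ext:lexicalVariant .

# Label-to-label relations
ext:homograph        rdfs:subPropertyOf skosxl:labelRelation .
ext:hasQualifier     rdfs:subPropertyOf skosxl:labelRelation .
ext:lexicalExtension rdfs:subPropertyOf skosxl:labelRelation .

# Assumption: compoundFrom points at an rdf:List of Labels
ext:compoundFrom rdf:type owl:ObjectProperty ;
    rdfs:domain skosxl:Label ;
    rdfs:range  rdf:List .
```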
20
Examples using SKOS(XL) (mostly stripped down to a topic)
21
Switching to Turtle Syntax
- Terse RDF Triple Language
- W3C Team Submission 14 January 2008, http://www.w3.org/TeamSubmission/turtle/ by TBL
- Used in W3C SKOS Recommendation as well as in OWL 2 Draft
- Everything can be expressed in XML as well.
- Turtle syntax makes more sense for human reading.
- see for yourself …
22
UMTHES in SKOS(XL) examples
Namespace prefixes used in the following:
@prefix rdf:.
@prefix rdfs:.
@prefix owl:.
@prefix skos:.
@prefix skosxl:.
@prefix ext:.
# no prefix means: defined in the local namespace
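The namespace IRIs did not survive in this transcript. A complete prefix header that would make the following examples parse could look like this. The first five IRIs are the standard W3C namespaces; the `ext:` and default namespaces are placeholders chosen for illustration, since the IRIs actually used by UMTHES are not given here.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/umthes/ext#> .   # hypothetical
@prefix :       <http://example.org/umthes/> .       # hypothetical local namespace
```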
23
waste & garbage
# SKOS only
:4711 rdf:type skos:Concept;
    skos:prefLabel "waste";
    skos:altLabel "garbage".
# exactly the same in SKOSXL
:4711 rdf:type skos:Concept;
    skosxl:prefLabel :waste;
    skosxl:altLabel :garbage.
:waste rdf:type skosxl:Label;
    skosxl:literalForm "waste".
:garbage rdf:type skosxl:Label;
    skosxl:literalForm "garbage".
NOTE: Local instance identifiers (:4711, :waste, :garbage, etc.) in these examples follow a local naming convention which addresses human reading only. “4711” used to be the brand name of a Cologne-based perfume manufacturer (“Eau de Cologne”). It emerged as a generic ID symbol in informatics in the 1980s and 90s. So, :4711 stands for “any kind of unique, but by itself meaningless ID”. The only functional requirements for IDs in this place are: being unique within the assigned namespace; being part of a working http URI.
24
waste & garbage (continued)
# this looks like saying the same stuff in a more complicated way
# but wait...
25
“waste water” composition
:4711 rdf:type skos:Concept;
    skosxl:prefLabel :wasteWater.
:wasteWater rdf:type skosxl:Label;
    skosxl:literalForm "waste water";
    ext:lexicalVariant "wastewater";
    ext:compoundFrom (:waste :water).
# already defined in the previous slide, could skip it here:
:waste rdf:type skosxl:Label;
    skosxl:literalForm "waste".
# only the noun, "wasted water" is NOT "waste water"!
:water rdf:type skosxl:Label;
    skosxl:literalForm "water";
    ext:inflectional "waters".
26
Multiple Composition in German
# @en: technique of facilities for the recycling of waste water
:4711 rdf:type skos:Concept;
    skosxl:prefLabel :abwasserAufbereitungsAnlagenTechnik.
:abwasserAufbereitungsAnlagenTechnik rdf:type skosxl:Label;
    skosxl:literalForm "Abwasseraufbereitungsanlagentechnik";
    ext:compoundFrom (:abwasser :aufbereitung :anlage :technik);
    ext:compoundFrom (:abwasserAufbereitung :anlage :technik);
    ext:compoundFrom (:abwasserAufbereitungsAnlage :technik);
    ext:compoundFrom (:abwasser :aufbereitungsAnlage :technik);
    ext:compoundFrom (:abwasserAufbereitung :anlagenTechnik);
    ext:compoundFrom (:abwasser :aufbereitung :anlagenTechnik);
    ext:compoundFrom (:abwasser :aufbereitungsAnlagenTechnik).
# maybe I missed some composition variant? Not joking!
27
Lexical extension example in German
# in English: "cleaning"
:reinigung rdf:type skosxl:Label;
    skosxl:literalForm "Reinigung"@de;
    ext:lexicalExtension :reinigen.
# extended by the verb form, English "to clean"
# Caution: see "wasted water"
:reinigen rdf:type skosxl:Label;
    skosxl:literalForm "reinigen"@de;
    ext:baseForm "reinig";
    ext:inflectionalCode "007";
    ext:inflectional "reinige";
    ext:inflectional "reinigen";
    ext:inflectional "reinigte";
    ext:inflectional "gereinigt";
    ext:inflectional "gereinigte";
    ext:inflectional "gereinigter";
    ext:inflectional "gereinigtes";
    ext:inflectional "reinigend";
    ext:inflectional "reinigende";
    ext:inflectional "reinigender";
    ext:inflectional "reinigendes".
# to be continued …
28
Homograph & qualifier
:4711 rdf:type skos:Concept;
    skosxl:prefLabel :bass--fish.    # [ˈbas]
:4712 rdf:type skos:Concept;
    skosxl:prefLabel :bass--music.   # [ˈbās]
:bass rdf:type skosxl:Label;
    skosxl:literalForm "bass".
:fish rdf:type skosxl:Label;
    skosxl:literalForm "fish".
:bass--fish rdf:type skosxl:Label;
    skosxl:literalForm "bass (fish)";
    ext:homograph :bass;
    ext:hasQualifier :fish.
# add Labels :music and :bass--music using the same pattern
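Following the same pattern, the :music and :bass--music Labels might be written as below. The `ext:` and default namespace IRIs are placeholders, as the real ones are not given in this transcript.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/umthes/ext#> .   # hypothetical
@prefix :       <http://example.org/umthes/> .       # hypothetical

:4712 rdf:type skos:Concept ;
    skosxl:prefLabel :bass--music .

# the shared homograph part, same Label as used by :bass--fish
:bass rdf:type skosxl:Label ;
    skosxl:literalForm "bass" .

# the qualifier part
:music rdf:type skosxl:Label ;
    skosxl:literalForm "music" .

# the qualified name, linked to both parts
:bass--music rdf:type skosxl:Label ;
    skosxl:literalForm "bass (music)" ;
    ext:homograph    :bass ;
    ext:hasQualifier :music .
```

Both qualified Labels point to the same :bass Label through ext:homograph, so a tool that finds the bare word “bass” in a text can retrieve both candidate Concepts and use the qualifiers to disambiguate.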
29
Multilingualism (symmetric)
# symmetric (in SKOS, can be expressed in SKOSXL likewise)
:4711 rdf:type skos:Concept;
    skos:prefLabel "organisation"@en;
    skos:prefLabel "organization"@en-US;
    # add your language here... (GEMET has more than 20)
    skos:prefLabel "Organisation"@de.
SKOS integrity condition S14: “A resource has no more than one value of skos:prefLabel per language tag.”
NOTE: this does not mean it must have prefLabel values in multiple languages
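A short sketch of what S14 allows and forbids. The concept IDs :4713 and :4714 and the namespace are invented for illustration. The point is that "en" and "en-US" are distinct language tags, so the slide's example is consistent, while two prefLabel values sharing one tag are not.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/umthes/> .   # hypothetical namespace

# Consistent with S14: each language tag occurs once
:4711 rdf:type skos:Concept ;
    skos:prefLabel "organisation"@en ;
    skos:prefLabel "organization"@en-US .

# NOT consistent with S14: two prefLabel values share the tag "en"
:4713 rdf:type skos:Concept ;
    skos:prefLabel "organisation"@en ;
    skos:prefLabel "organization"@en .

# One possible repair: keep one prefLabel, demote the other to altLabel
:4714 rdf:type skos:Concept ;
    skos:prefLabel "organisation"@en ;
    skos:altLabel  "organization"@en .
```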
30
Multilingualism (language-centric)
# UMTHES is German-centric with altLabel values also in English
:4711 rdf:type skos:Concept;
    skos:prefLabel "Organisation"@de;
    skos:altLabel "organisation"@en;
    skos:altLabel "organization"@en-US.
# or use skosxl: in the above to refer to:
:Organisation rdf:type skosxl:Label;
    skosxl:literalForm "Organisation"@de;
    ext:inflectional "Organisationen";
    ext:inflectional "Organisations-".
:organisation rdf:type skosxl:Label;
    skosxl:literalForm "organisation"@en;
    ext:inflectional "organisations".
:organization rdf:type skosxl:Label;
    skosxl:literalForm "organization"@en-US;
    ext:inflectional "organizations".
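Spelled out, the skosxl: variant of the Concept would presumably look like this, with the three Labels defined as on the slide. The default namespace IRI is a placeholder.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix :       <http://example.org/umthes/> .   # hypothetical namespace

# German-centric: the German Label is preferred, the English ones are alternatives
:4711 rdf:type skos:Concept ;
    skosxl:prefLabel :Organisation ;
    skosxl:altLabel  :organisation ;
    skosxl:altLabel  :organization .
```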
31
Multilingualism (asymmetric)
# full asymmetric pattern (currently not used by UMTHES)
:4711 rdf:type skos:Concept;
    skosxl:prefLabel :Organisation;
    ext:hasTranslation :4712.
:4712 rdf:type skos:Concept;
    skosxl:prefLabel :organisation;
    ext:hasTranslation :4711.
# :Organisation & :organisation already known from previous slide
32
About Federation
- UMTHES has been one of the 8 sources of GEMET
- UMTHES extends GEMET with more detailed German Concepts and their lexical complexity.
@prefix gemet:.
# GEMET URIs do resolve in SKOS since 2009-09 !!!
:14452 rdf:type skos:Concept;
    skosxl:prefLabel :klimaAenderung;
    skosxl:altLabel :klimaWandel;
    skosxl:altLabel :climateChange;
    # referencing GEMET "climatic change" from here
    skos:closeMatch gemet:1471.
:klimaAenderung rdf:type skosxl:Label;
    ext:compoundFrom (:klima :aenderung);
    #... etc, as exemplified before
33
preferred, non-preferred term again
# you may define such classes in SKOS (OWL) at any time
# but they will never be exactly equivalent to ISO 2788 (why?)
:isPrefLabelOf owl:inverseOf skosxl:prefLabel.
:isAltLabelOf owl:inverseOf skosxl:altLabel.
:PreferredTerm owl:equivalentClass
    [ rdf:type owl:Restriction ;
      owl:onProperty :isPrefLabelOf ;
      owl:someValuesFrom skos:Concept ].
:NonPreferredTerm owl:equivalentClass
    [ rdf:type owl:Class ;
      owl:intersectionOf (
        [ rdf:type owl:Class ; owl:complementOf :PreferredTerm ]
        [ rdf:type owl:Restriction ;
          owl:onProperty :isAltLabelOf ;
          owl:someValuesFrom skos:Concept ] ) ].
34
Finally …
# you may express anything in RDF / Turtle …
@prefix foaf:.
:ecoTerm2009 rdf:type :meeting;
    :hasOnAgenda :theseSlides.
:theseSlides rdf:type :presentation;
    skos:prefLabel "Expressing Lexical Complexity in SKOS(XL)";
    :hasPresenter :tb.
:tb rdf:type foaf:Person;
    foaf:mbox ;
    foaf:isPrimaryTopicOf ;
    foaf:workplaceHomepage ;
    foaf:currentProject ;
    # add your assertions here...
    :says "Goodbye!".