Linguistic enrichment of ontologies: a glance to the role of previously existing linguistic resources Maria Teresa Pazienza, Armando Stellato


1 Linguistic enrichment of ontologies: a glance to the role of previously existing linguistic resources Maria Teresa Pazienza, Armando Stellato ART group, Dept. of Computer Science, Systems and Production

2 06/04/ Motivation Ontologies provide vocabularies through which agents in the Semantic Web will be able to communicate –Every specific ontology bears its semantics, which is specified by: the interpretation given by people using the ontology inside a given framework the consistent use that applications make of ontological knowledge How can we recognize if and when these constraints are considered? Or, at least…

3 06/04/ Role of Natural Language What information can both humans and machines rely on? …natural language Natural Language is the last exploitable resource –…to convey data semantics It helps humans understand how formal objects relate to their world knowledge It may help machines harmonize different conceptualizations –Pros and cons: Pros: it offers a rich and universally accepted means of expressing meaning Cons: it is ambiguous; phenomena like synonymy and homonymy must be taken into consideration Possible uses of a linguistically motivated approach to ontology development: –Provides useful linguistic anchors for improving knowledge sharing efforts –Strengthens relationships between ontology and raw textual information (for tasks like information extraction, ontology population, etc.) –Enhances knowledge understanding and reuse even for humans

4 06/04/ Enriching ontologies with lexical information Possible scenarios for linguistic enrichment: –Explicit Linguistic Enrichment Ontology Linguistic Resource

5 06/04/ Enriching ontologies with lexical information Possible scenarios for linguistic enrichment: –Producing Multilingual Ontologies Ontology Bilingual Linguistic Resource

6 06/04/ Enriching ontologies with lexical information Possible scenarios for linguistic enrichment: –LexicoSemantic Enrichment of Ontologies [Figure: an Ontology linked to a Linguistic Resource with a semantic structure (e.g. WordNet). Node labels shown: Worker, Employee, Craftsman, Academic, Technician, Administrative, Professor, Researcher, event, ...]

7 06/04/ Exploiting Linguistic Resources Different Linguistic Resources (LRs) are available on the Web These resources differ in: –Trustworthiness: from free initiatives to coordinated research projects –Complexity: quantity and quality of detailed information, adopted model, morphology… –Representation: there is no standard for representing linguistic resources –Implementation: available as databases, huge XML repositories, proprietary text formats, etc.

8 06/04/ Tools for Linguistic Enrichment: Requirements (possibly) embedded in ontology editing applications Browsing different linguistic resources Providing functionalities for: –Querying LRs with terms from ontology –Enriching ontology concepts with linguistic information –Synonyms –Rich textual descriptions –Translations in different languages –Semantic Indexes from LR –Supporting ontology development by reusing semantic information from linguistic resources (when available)

9 06/04/ Infrastructure The Linguistic Watermark –Offers a classification of different LRs –Provides API for accessing their content

10 06/04/ Infrastructure The Linguistic Watermark –Offers a classification of different LRs –Provides API for accessing their content WordNet

11 06/04/ Infrastructure The Linguistic Watermark –Offers a classification of different LRs –Provides API for accessing their content Freelang

12 06/04/ Infrastructure The Linguistic Watermark –Offers a classification of different LRs –Provides API for accessing their content Dict

13 06/04/ OntoLing: a tool for semi-automatic linguistic enrichment of ontologies Deployed as a plug-in for the popular ontology editing tool Protégé ( then go plugins -> OntoLing ) Exploits the Linguistic Watermark API for accessing LRs Supports linguistic enrichment of ontologies and ontology development Different resources may be plugged in and recognized at run time, by inspection of their Linguistic Watermark [Figure: OntoLing Architecture. Components shown: GUI Facade (Linguistic Browser, Ontology Browser), OntoLing Core, Protégé API, Linguistic Interface; pluggable implementations of the Linguistic Interface: WordNet Interface (WordNet 1.7, 2.0, 2.1), FreeDict Interface (dictionaries involving Italian, Hungarian, English, Danish, ...), and further interfaces.]

14 06/04/ Linguistic Enrichment of the Ontology Search linguistic expressions inside the LR Explore semantic relationships which characterize the LR …and linguistic relationships Integration between ontology and linguistic resource: search ontology terms inside the linguistic resource Assist ontology creation by extracting portions of knowledge from the LR Linguistic Metadata for Concepts: …documentation… …synonyms… …semantic pointers to the LR… Ontology concepts bear a greater linguistic expressivity: this helps in identifying similarities with other conceptualizations.

15 06/04/ Adaptive behaviour and Graphical User Interface Linguistic Resources may be loaded into OntoLing at run time Upon initialization they declare themselves and their specific Linguistic Watermark OntoLing understands their capabilities and rearranges its Linguistic Browser according to the properties and characteristics exhibited by the LR Different functionalities for enriching ontologies with content from the loaded LR are also activated depending on its watermark Support for semi-automatic enrichment also takes into consideration which ki

16 06/04/ Dynamic Functionalities The Linguistic Watermark provides a generic interface which embraces typical LR configurations and structures Three methods act as service providers, in that they allow the definition of functionalities dedicated to the exploration of particular aspects of a given LR –exploitSearchMethod –exploreSemanticRelation –exploreLexicalRelation

17 06/04/ Representing Linguistic Information inside Ontologies Standard Protégé Model –Use of meta-classes Linguistic-class Linguistic-slot –A terminology slot (one for each language) for indicating synonyms –Frame Documentation Slot Protégé-OWL –Use of standard rdfs properties: rdfs:label to indicate synonyms (also specifying the language) rdfs:comment to provide documentation about ontology objects

18 06/04/ Summarizing The attention paid to formal conceptual representation in the Semantic Web is not being matched by an equivalent interest in how this information will be made easily accessible to humans, and to machines not sharing any form of semantic commitment. A wider and deeply aware adoption of Natural Language in representing knowledge could fill this gap We developed infrastructure and a tool for: –Describing different kinds of LRs through a general framework –Providing functionalities for accessing their content –Enriching ontologies with information from LRs –Supporting linguistically aware ontology development Future Work: –Integrate as many lexical resources as possible! –Include interfaces for accessing and exploiting other kinds of linguistic resources (e.g. FrameNet) –Establish more complex connections between lexical resources and ontologies

19 06/04/ Automatic Lexico-Semantic Enrichment (LSE) of Ontologies Objective: –identify pointers (lexico-semantic anchors) from ontological objects to semantic entities (e.g. synsets, for WordNet) of a linguistic resource Through: –Observed linguistic/semantic similarities between the ontology and the Linguistic Resource (LR) exploited for enrichment Exploitable Linguistic Watermarks: –ConceptualizedLR –At least one from: TaxonomicalLR LRWithGlosses

20 06/04/ Automatic Lexico-Semantic Enrichment (LSE) of Ontologies Intuition behind the strategy: If a semantic pointer links a frame-synset pair (F, S), then other frame-synset pairs (where the frame is more specific/more generic than F and the synset is correspondingly narrower/broader than S) have a good probability of being linked through a semantic pointer
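The intuition above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionaries `onto_isa` and `lr_hypernyms`, the synset identifiers, and the function names are all hypothetical, and the code only checks the positive pattern (a candidate pair lying below or above an already linked pair in both hierarchies).

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (frame, synset id); both are illustrative names


def ancestors(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Transitive closure of the parent relation, starting from node."""
    seen: Set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def supported_pairs(confirmed: Pair, candidates: Set[Pair],
                    onto_isa: Dict[str, Set[str]],
                    lr_hypernyms: Dict[str, Set[str]]) -> Set[Pair]:
    """Candidate pairs aligned with the confirmed pair in both taxonomies."""
    frame_h, synset_h = confirmed
    supported: Set[Pair] = set()
    for frame, synset in candidates:
        # Candidate is more specific than the confirmed pair on both sides.
        below = (frame_h in ancestors(frame, onto_isa)
                 and synset_h in ancestors(synset, lr_hypernyms))
        # Candidate is more generic than the confirmed pair on both sides.
        above = (frame in ancestors(frame_h, onto_isa)
                 and synset in ancestors(synset_h, lr_hypernyms))
        if below or above:
            supported.add((frame, synset))
    return supported
```

A pair whose frame is a descendant of F but whose synset is unrelated to S receives no support from this check, which is the behaviour the intuition calls for.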

21 06/04/ Automatic LSE of Ontologies: the Framework $O$: space of ontological objects, called Frames (classes, properties, individuals) $L$: space of semantic indexes (semex) in the LR Plausibility Matrix $M_P$ (defined over an $O \times L$ space) –$M_P(i,j)$ represents the plausibility that the ontological object $i$ is matched with the semantic index $j$ Evidence Matrix $M_E$ (defined over an $O \times L$ space) –each element $M_E(i,j)$ contains the set of evidences which contribute to the computation of element $M_P(i,j)$ in the Plausibility Matrix.

22 06/04/ Automatic LSE of Ontologies: the Framework Discovery Phase –Objective: reduce the dimension of the $L$ space –Process: find candidate (lexical) anchors between elements in $O$ and elements in $L$, through: Search filtered by string similarity measures Exploitation of translation and/or synonym vocabularies (possibly the LR itself) –Output: $L_A \subseteq L$ (all synsets bound by candidate anchors) –Notes: Maximize recall
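As a rough illustration of the discovery phase, the sketch below looks up each ontology label in a lemma index of the LR and keeps every synset whose lemma is similar enough. The slide does not name a string similarity measure; `difflib.SequenceMatcher` from the Python standard library is used here purely as a stand-in, and the index layout, the threshold, and the function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set


def discover_candidates(frame_labels: Dict[str, List[str]],
                        lr_index: Dict[str, Set[str]],
                        threshold: float = 0.8) -> Dict[str, Set[str]]:
    """Map each frame to the synsets whose lemma resembles one of its labels.

    frame_labels: frame -> labels (synonyms or translations may be added here)
    lr_index:     lemma -> synset ids (hypothetical view of the LR)
    """
    candidates: Dict[str, Set[str]] = {}
    for frame, labels in frame_labels.items():
        matched: Set[str] = set()
        for label in labels:
            for lemma, synsets in lr_index.items():
                ratio = SequenceMatcher(None, label.lower(), lemma.lower()).ratio()
                if ratio >= threshold:
                    matched |= synsets
        if matched:
            candidates[frame] = matched
    return candidates


def reduced_space(candidates: Dict[str, Set[str]]) -> Set[str]:
    """All synsets bound by at least one candidate anchor (the reduced L space)."""
    return set().union(*candidates.values())
```

A permissive threshold keeps near matches such as inflected forms, which is in line with the slide's note that this phase should maximize recall and leave the filtering to later steps.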

23 06/04/ Automatic LSE of Ontologies: the Framework Semantic Enrichment function $f_{se}$: [formula not preserved in this transcript] Implemented through: –Extraction of semantic/linguistic similarity evidences ($M_E$) –Computation of $M_P$ Due to mutual dependencies between evidences for different candidate anchors: [formulas not preserved in this transcript]

24 06/04/ Automatic LSE of Ontologies: the Framework Legend: –candidate pair: $(f, s)$ with $f \in O$ and $s \in L_A$, where $p(f,s,0) \neq 0$. –Smarter notation for plausibility: [formula not preserved in this transcript]

25 06/04/ Implementing $f_{se}$ Guidelines 1. reward candidate pairs characterized by positive evidences 2. penalize candidate pairs characterized by negative evidences 3. evaluate quantitative factors associated with different kinds of evidence (representing the strength, or presence, of the evidence) 4. take into account the inherent ambiguity (polysemy) of every label associated with ontology concepts

26 06/04/ Implementing $f_{se}$ [Formula not preserved in this transcript; its annotated quantities are: plausibility at time 0; plausibility at time t; plausibility threshold for an anchor to be confirmed; plausibility threshold for an anchor to be discarded; ambiguity (polysemy) of the term binding synset to frame]

27 06/04/ Implementing $f_{se}$ [Formula not preserved in this transcript; its annotated quantities are: plausibility at time 0; plausibility at time t; weight related to a single evidence at time t; positive evidences contribution; negative evidences contribution; normalization factor]
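The actual formulas for the enrichment function did not survive in this transcript, so the following Python sketch should be read only as one possible way to honour the four guidelines and the named quantities (initial plausibility, per-evidence weights, positive and negative contributions, confirmation and discard thresholds). The update rule, the 1/polysemy initialization, and the default thresholds are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Iterable, Tuple

Evidence = Tuple[float, float]  # (weight, strength), both assumed to lie in [0, 1]


def initial_plausibility(polysemy: int) -> float:
    """Assumed starting point: a label with n senses gives each sense 1/n."""
    return 1.0 / max(polysemy, 1)


def update_plausibility(p_prev: float,
                        positive: Iterable[Evidence],
                        negative: Iterable[Evidence]) -> float:
    """One iteration step: positive evidence pulls towards 1, negative towards 0."""
    pos = sum(weight * strength for weight, strength in positive)
    neg = sum(weight * strength for weight, strength in negative)
    p = p_prev + pos * (1.0 - p_prev) - neg * p_prev
    return min(1.0, max(0.0, p))  # clamp instead of a normalization factor


def classify(p: float, confirm_at: float = 0.8, discard_at: float = 0.1) -> str:
    """Compare a plausibility value against the two (hypothetical) thresholds."""
    if p >= confirm_at:
        return "confirmed"
    if p <= discard_at:
        return "discarded"
    return "candidate"
```

Because evidence for one pair may depend on the plausibility of other pairs, a step like `update_plausibility` would be applied repeatedly over all candidate pairs until anchors are confirmed or discarded.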

28 06/04/ Extracting evidences(1) Establishing proper context for each type of frame and for each type of evidence
computeConceptualSphere(Frame frm, int DepthRange) : SET OF Frame
input
  frm: the class, property or individual which has been selected for linguistic enrichment
  DepthRange: the number of allowed hops along the IS-A relation for retrieving super concepts of frm
output
  ConceptualSphere: the conceptual sphere surrounding frm
begin
  FrameType type ← getOntoType(frm)
  SET OF Frame ConceptualSphere ← {}
  if (type = class or type = property)
    ConceptualSphere ← ConceptualSphere ∪ getSuperConcepts(frm, DepthRange)
  else // frm is an instance
    Classes ← getClasses(frm)
    for each class ∈ Classes do
      ConceptualSphere ← ConceptualSphere ∪ {class} ∪ getSuperConcepts(class, DepthRange)
    end for
  end if
  if (type = class)
    for each property p, class c | frm.hasRestriction(p,c) or c.hasRestriction(p,frm) do
      ConceptualSphere ← ConceptualSphere ∪ { c } ∪ { p }
    end for
  end if
  if (type = instance)
    for each property p ∈ frm.getOwnRelationalProperties() do
      ConceptualSphere ← ConceptualSphere ∪ { p } ∪ frm.getOwnPropertyValues(p)
    end for
  end if
  if (type = property)
    for each class c ∈ ( domain(frm) ∪ range(frm) ) do
      ConceptualSphere ← ConceptualSphere ∪ { c }
    end for
  end if
  return ConceptualSphere
end
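A runnable Python rendering of the computeConceptualSphere procedure is sketched below. The `ToyOntology` container is a hypothetical stand-in for whatever ontology API the real tool uses (it is not the Protégé or OntoLing API); only the control flow follows the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ToyOntology:
    """Hypothetical in-memory ontology used only for this illustration."""
    kind: Dict[str, str]  # frame -> "class" | "property" | "instance"
    supers: Dict[str, Set[str]] = field(default_factory=dict)  # direct IS-A parents
    types: Dict[str, Set[str]] = field(default_factory=dict)   # instance -> classes
    restrictions: Set[Tuple[str, str, str]] = field(default_factory=set)  # (class, prop, class)
    domains: Dict[str, Set[str]] = field(default_factory=dict)
    ranges: Dict[str, Set[str]] = field(default_factory=dict)
    values: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)  # instance -> prop -> values

    def super_concepts(self, frame: str, depth: int) -> Set[str]:
        """Super concepts reachable in at most `depth` IS-A hops."""
        found: Set[str] = set()
        frontier = {frame}
        for _ in range(depth):
            frontier = {p for f in frontier for p in self.supers.get(f, set())} - found
            found |= frontier
        return found


def compute_conceptual_sphere(onto: ToyOntology, frm: str, depth_range: int) -> Set[str]:
    kind = onto.kind[frm]
    sphere: Set[str] = set()
    if kind in ("class", "property"):
        sphere |= onto.super_concepts(frm, depth_range)
    else:  # frm is an instance: start from its classes
        for cls in onto.types.get(frm, set()):
            sphere |= {cls} | onto.super_concepts(cls, depth_range)
    if kind == "class":
        # Classes and properties tied to frm by a restriction, in either direction.
        for subj, prop, obj in onto.restrictions:
            if subj == frm:
                sphere |= {obj, prop}
            if obj == frm:
                sphere |= {subj, prop}
    elif kind == "instance":
        for prop, vals in onto.values.get(frm, {}).items():
            sphere |= {prop} | vals
    elif kind == "property":
        sphere |= onto.domains.get(frm, set()) | onto.ranges.get(frm, set())
    return sphere
```

With a class `League` whose parent is `Organization` and a restriction `(League, division, Division)`, calling `compute_conceptual_sphere(onto, "League", 1)` gathers `Organization`, `Division` and `division`.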

29 06/04/ Extracting evidences(2) Examined evidences –Analysis of Taxonomical alignment ConceptualSphere (context) := the transitive closure of the IS-A relationship in the ontology (and hyponymy relation for LRs) Requirements: TaxonomicalLR compliant Linguistic Resource –Analysis of glosses from the LR ConceptualSphere := depends on frame type (see example in previous slide) Requirements: LRWithGlosses compliant Linguistic Resource

30 06/04/ Extracting evidences(3) Evidences based on Taxonomical Alignment Reflect alignment between the respective structures of the ontology and the linguistic resource exploited for enrichment Captured taxonomy patterns may have positive as well as negative influence over the plausibility of a given pair [Figure: two alignment squares between the ontology (ONT) and the LR, one labelled Positive Evidence and one Negative Evidence. Each involves frames F_H and F_L, semantic indexes S_H and S_L, IS-A links, an existing semantic pointer, and a pair that is a candidate for a semantic pointer.]

31 06/04/ Extracting evidences(3) Evidences based on Taxonomical Alignment Reflect alignment between the respective structures of the ontology and the linguistic resource exploited for enrichment Captured taxonomy patterns may have positive as well as negative influence over the plausibility of a given pair [Formula not preserved in this transcript; its annotated quantities are: weighting coefficient for Taxonomy Alignment; sign; plausibility at step t-1 of the frame/semex pair closing the alignment square]

32 06/04/ Extracting evidences(4) Evidences extracted through Analysis of Glosses Glosses bear a lot of semantic information; it is not formally explicit but, once unveiled, it can provide useful hints on how to properly match ontology concepts and linguistic expressions Gloss Analysis generates three kinds of evidence, provided by: glosses which contain linguistic references to concepts expressed in the ontology and which are semantically related to the concept being enriched glosses which contain linguistic references to concepts which at least exist in the ontology linguistic overlap between glosses of synsets which are candidates to enrich related concepts Next slides: examples for enrichment of baseball ontology from:

33 06/04/ Glosses containing linguistic reference to semantically related concepts
for each Frame rc ∈ ConceptualSphere do
  MtchLvl ← match(rc, gloss)
  if MtchLvl ≠ 0
    Evidences ← Evidences ∪ evd(GR, rc, MtchLvl)
  end if
end for
Example. Ontology: rdf triple: League division Division. Linguistic Resource: Division, Noun, Gloss: A league ranked by quality; he played baseball in class D… Resulting evidence: GlossRelateds, League, prop(class,domain), 1
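The loop for this first kind of gloss evidence can be sketched in Python as follows. The slide does not define `match`; here it is assumed to be a simple case-insensitive token lookup returning 1 or 0, and the evidence is represented as a plain tuple. Both choices are illustrative only.

```python
import re
from typing import Set, Tuple

Evidence = Tuple[str, str, int]  # (evidence type, related frame, match level)


def tokenize(text: str) -> Set[str]:
    """Lowercased alphabetic tokens of a gloss."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match(label: str, gloss_terms: Set[str]) -> int:
    """Assumed match level: 1 if the frame label occurs in the gloss, else 0."""
    return 1 if label.lower() in gloss_terms else 0


def gloss_relateds(conceptual_sphere: Set[str], gloss: str) -> Set[Evidence]:
    """Collect GR evidences for related frames mentioned in the gloss."""
    terms = tokenize(gloss)
    evidences: Set[Evidence] = set()
    for rc in conceptual_sphere:
        level = match(rc, terms)
        if level != 0:
            evidences.add(("GR", rc, level))
    return evidences
```

For the slide's example, the conceptual sphere of `Division` contains `League`, and the gloss of the candidate synset mentions "league", so a single GR evidence is produced.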

34 06/04/ Glosses containing linguistic reference to concepts which exist in the ontology
for each term t ∈ gloss do
  Frame rc ← find(Ontology, t, MtchLvl)
  if rc ≠ null
    Evidences ← Evidences ∪ evd(GG, rc, MtchLvl)
  end if
end for
Example. Ontology: Run, Inning (Inning ∈ O). Linguistic Resource: Run, Noun, Gloss: A score in baseball made by a runner touching all four bases safely; "the Yankees scored 3 runs in the bottom of the 9th"; "their first tally came in the 3rd inning" Resulting evidence: GlossGeneral, Inning, 1
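A minimal Python sketch of this second kind of gloss evidence follows. The `find` step is assumed to be an exact lookup in a dictionary from lowercase labels to frames, with a fixed match level of 1; a real implementation would presumably use a more tolerant search.

```python
import re
from typing import Dict, Set, Tuple

Evidence = Tuple[str, str, int]  # (evidence type, frame found, match level)


def tokenize(text: str) -> Set[str]:
    """Lowercased alphabetic tokens of a gloss."""
    return set(re.findall(r"[a-z]+", text.lower()))


def gloss_general(gloss: str, ontology_labels: Dict[str, str]) -> Set[Evidence]:
    """Collect GG evidences for gloss terms that name any frame in the ontology.

    ontology_labels: lowercase label -> frame (hypothetical lookup table)
    """
    evidences: Set[Evidence] = set()
    for term in tokenize(gloss):
        frame = ontology_labels.get(term)
        if frame is not None:
            evidences.add(("GG", frame, 1))
    return evidences
```

Unlike the previous evidence type, the frame found here does not need to belong to the conceptual sphere of the concept being enriched; it only has to exist in the ontology.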

35 06/04/ Overlap between glosses of synsets which are candidate to enrich related concepts
for each Frame rf_i ∈ ConceptualSphere do
  for each synset s_ij ∈ candidateSynsets(rf_i) do
    let rfgloss[i,j] ← s_ij.getGloss()
    for each term t, t ∈ gloss and t ∈ rfgloss[i,j]
      let freq = LR.getGlossFrequency(t)
      if !filter(freq)
        Evidences ← Evidences ∪ evd(GO, rf_i, s_ij, freq)
      end if
    end for
  end for
end for
Example. Ontology: rdf triple: WorldSeries home Team. Linguistic Resource: WorldSeries, Noun, Gloss: series that constitutes the playoff for the baseball championship; home, Noun, Gloss: (baseball) base consisting of a rubber slab where the batter stands Resulting evidence: GlossOverlap, baseball, home-noun, 1
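The gloss-overlap loop can be sketched in Python as below. The slide leaves `filter` and `getGlossFrequency` unspecified; this sketch assumes the filter discards terms whose frequency across all glosses exceeds a cutoff (so that function words do not count as overlap). The frequency table, the cutoff, and the tuple layout are made up for illustration.

```python
import re
from typing import Dict, Set, Tuple

Evidence = Tuple[str, str, str, str]  # (evidence type, related frame, synset, shared term)


def tokenize(text: str) -> Set[str]:
    """Lowercased alphabetic tokens of a gloss."""
    return set(re.findall(r"[a-z]+", text.lower()))


def gloss_overlap(gloss: str,
                  sphere_candidates: Dict[str, Dict[str, str]],
                  gloss_frequency: Dict[str, int],
                  max_freq: int) -> Set[Evidence]:
    """Collect GO evidences from terms shared with glosses of related candidates.

    sphere_candidates: related frame -> {candidate synset id -> gloss}
    gloss_frequency:   term -> number of glosses containing it (hypothetical)
    """
    terms = tokenize(gloss)
    evidences: Set[Evidence] = set()
    for related_frame, synsets in sphere_candidates.items():
        for synset, related_gloss in synsets.items():
            for term in terms & tokenize(related_gloss):
                freq = gloss_frequency.get(term, 0)
                if freq <= max_freq:  # assumed filter: skip very common terms
                    evidences.add(("GO", related_frame, synset, term))
    return evidences
```

In the slide's example the two glosses share "the" and "baseball"; with a frequency cutoff in place only "baseball" survives as overlap evidence.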

36 06/04/ Testing our framework Experimental setup: Fine tuning of the evidence-typed σ-parameters has been performed over a collection of several small ontologies and/or portions of them Two ontologies used for testing, WordNet used for enrichment in both cases: 1. BASEBALL ontology (http://www.daml.org/2001/08/baseball/baseball-ont) –Original version in DAML+OIL, converted to OWL –78 classes, 26 properties and 13 individuals –75.3% of ambiguous concepts, average ambiguity ~9.16 –Inter-annotator agreement = 98.76% (one contrasting decision out of the whole oracle) 2. MOSES ontology about university (http://www.mondeca.com/owl/moses/ita.owl) –developed in the context of the EU funded project MOSES (IST ) –built, in the OWL language, over a pre-existing DAML ontology, and aimed at representing the Italian university domain –192 classes, 122 properties –73.1% of ambiguous concepts, average ambiguity ~5.23

37 06/04/ Experimental results
Ontology       Precision  Recall
Baseball Ont   80%        39.5%
Moses Italian  81.48%     42.72%
Detailed analysis of the test data on the first experiment revealed that, though only 40% of the original corpus (ontology) has been correctly enriched, another 50% contains the right choice as first (but still under the acceptance threshold), second or third in order of plausibility

38 06/04/ Conclusions The attention paid to formal conceptual representation in the Semantic Web is not being matched by an equivalent interest in how this information will be made easily accessible to humans, and to machines not sharing any form of semantic commitment. A wider and deeply aware adoption of Natural Language in representing knowledge (or, at least, in supporting knowledge representation) could fill this gap We defined a first framework for: –Describing LRs (from an operational point of view) and enriching ontologies with their content –(Semi)automatically enriching the content of ontologies with information from linguistic resources Future work: –Large scale (ontologies) testing! –Improving gloss processing (POS tagging, shallow parsing…) –Development of new techniques for multilingual ontology enrichment (possibly exploiting more than one LR at a time) –Embedding all these techniques inside existing frameworks for ontology editing

39 References
Maria Teresa Pazienza, Armando Stellato. An Environment for Semi-automatic Annotation of Ontological Knowledge with Linguistic Content. 3rd European Semantic Web Conference (ESWC 2006), Budva, Montenegro, June 11-14, 2006.
Maria Teresa Pazienza, Armando Stellato. Exploiting Linguistic Resources for building linguistically motivated ontologies in the Semantic Web. Second Workshop on Interfacing Ontologies and Lexical Resources for Semantic Web Technologies (OntoLex2006), held jointly with LREC2006, Magazzini del Cotone Conference Center, Genoa, Italy, May 2006.
Maria Teresa Pazienza, Armando Stellato. Linguistic Enrichment of Ontologies: a methodological framework. Second Workshop on Interfacing Ontologies and Lexical Resources for Semantic Web Technologies (OntoLex2006), held jointly with LREC2006, Magazzini del Cotone Conference Center, Genoa, Italy, May 2006.

40 06/04/ References
Maria Teresa Pazienza, Armando Stellato. Linguistically motivated Ontology Mapping for the Semantic Web. SWAP 2005, the 2nd Italian Semantic Web Workshop, Trento, Italy, December 14-16, 2005.
Maria Teresa Pazienza, Armando Stellato. The Protégé Ontoling Plugin - Linguistic Enrichment of Ontologies in the Semantic Web. 4th International Semantic Web Conference (ISWC-2005), Galway, Ireland, November 2005.
Armando Stellato, Michele Vindigni, Fabio Massimo Zanzotto. XeOML: An XML-based extensible Ontology Mapping Language. Workshop on Meaning Coordination and Negotiation, held in conjunction with the 3rd International Semantic Web Conference (ISWC-2004), Hiroshima, Japan, November 8, 2004.

41 06/04/ Thanks for your attention …. see you in Roma for Aiia07 congress

