Complete and Consistent Annotation of WordNet with the Top Concept Ontology Javier Álvez, Jordi Atserias, Jordi Carrera, Salvador Climent, Egoitza Laparra,


1 Complete and Consistent Annotation of WordNet with the Top Concept Ontology Javier Álvez, Jordi Atserias, Jordi Carrera, Salvador Climent, Egoitza Laparra, Antoni Oliver and German Rigau Basque Country Univ., Pompeu Fabra Univ. (Barcelona), Open Univ. of Catalonia (Barcelona)

2 Introduction 4 years of work. Full annotation of WordNet’s nouns with semantic features (EWN TCO). Aimed to be an important semantic resource for NLP (selectional preferences, synset clustering, reasoning…).

3 Result 65,989 noun concepts (synsets) = 116,364 noun lexemes (variants) consistently annotated Average of 6.47 features per synset –Features organized in a multilevel hierarchy

4 Structure of the talk Methodology Examples and Discussion Conclusions

5 Methodology Annotation of the Inter-Lingual Index (=EnWn1.6, SpaWN, mapping to other WNs...) with the nodes/features of the TCO (a shallow ontology defined in the EWN Project [Vossen et al. 1998]) Methodology based on: –INCOMPATIBILITY OF ONTOLOGICAL INFORMATION –SUBSUMPTION BLOCKAGE POINTS

6 The Top Concept Ontology Organized in three orders of entities: –1st Order (physical entities) –2nd Order (situations) –3rd Order (abstract entities)

7 The Top Concept Ontology 1st Order entities organized in four Qualia-like features: –Origin (Artifact, Natural..) –Form (Object, Substance…) –Composition (Group, Part) –Function (Building, Container, Vehicle…)

8 The Top Concept Ontology 2nd Order Entities organized in two dimensions –Situation Type: Dynamic (Bounded Events, Unbounded Events) & Static (Properties, Relations) –Situation Component: (Cause, Manner, Modal…) 3rd Order Entities, no further subdivided

9 Methodology We modify the structure of neither the TCO nor WN (=> future work). We just annotate. We declared pairs of TCO properties as incompatible (e.g. natural vs. artifact, substance vs. object). Initial annotation situation: In EWN, TCO features were manually assigned to a basic set of 1024 EWN synsets (= Base Concepts)

10 Methodology 1. We automatically annotated the rest of the Top Synsets (from the BCs up to the Top) using a table of equivalence between WordNet's Semantic Files and the TCO (e.g. NounAct → Agentive, NounAttribute → Property) 2. We performed a fully automatic top-down expansion of this information via the WN1.6 hierarchy (feature inheritance)
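The top-down expansion in step 2 can be pictured as propagating each synset's features to all of its hyponyms. The sketch below is an assumption about how such inheritance might be computed; the toy hierarchy, the synset names and the function `expand_top_down` are hypothetical and not taken from the resource.

```python
from collections import deque


def expand_top_down(hyponyms, assigned):
    """Propagate features from every synset to all of its descendants.

    hyponyms: dict mapping a synset to a list of its direct hyponyms
    assigned: dict mapping a synset to its directly assigned features
    """
    inherited = {synset: set(feats) for synset, feats in assigned.items()}
    queue = deque(inherited)
    while queue:
        parent = queue.popleft()
        for child in hyponyms.get(parent, []):
            child_feats = inherited.setdefault(child, set())
            before = len(child_feats)
            child_feats |= inherited[parent]
            # Revisit the child only if it gained features, so the loop
            # terminates even with multiple inheritance.
            if len(child_feats) > before:
                queue.append(child)
    return inherited


# Toy hierarchy (hypothetical): object > artifact > city
toy_hyponyms = {"object": ["artifact"], "artifact": ["city"]}
toy_assigned = {"object": {"Object"}, "artifact": {"Artifact"}}
result = expand_top_down(toy_hyponyms, toy_assigned)
print(sorted(result["city"]))
```

Because a synset with several hypernyms receives the union of their features, this is also where incompatible features can end up on the same synset.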

11 Methodology This caused feature incompatibilities to arise: about 225,000 conflicts in 25,000 synsets Causes: Wrong manual annotation in EWN Wrong TCO-SF equivalence... but basically: –Subsumption in WN does not always work »ISA Overloading etc. –Multiple inheritance in WN

12 Methodology We manually checked all feature incompatibilities in order to: – (i) add and/or delete ontological features – (ii) set inheritance blockage points. A blockage point is an annotation in WN1.6 which breaks the ISA relation between two synsets, so that no inheritance is allowed.

13 A simple example Bandung Java island city

14 A simple example Bandung Java island =NATURAL city =ARTIFACT

15 A simple example Bandung +NATURAL +ARTIFACT Java +NATURAL island =NATURAL city =ARTIFACT

16 A simple example Bandung +ARTIFACT Java +NATURAL island =NATURAL city =ARTIFACT
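The Bandung example can be replayed in a few lines. The sketch assumes that "=" marks a directly assigned feature and "+" an inherited one, that Bandung has both city and Java as hypernyms, and that the blockage point sits on the Bandung-Java link; the slides show the result but the flattened transcript does not state the edges, so these are assumptions. The function name `features` is hypothetical.

```python
# Assumed hypernym links for the slide's example.
HYPERNYMS = {"Bandung": ["city", "Java"], "Java": ["island"]}
# Directly assigned features ("=" on the slides).
ASSIGNED = {"island": {"Natural"}, "city": {"Artifact"}}


def features(synset, blocked=frozenset()):
    """Collect own and inherited features, skipping blocked ISA links."""
    feats = set(ASSIGNED.get(synset, ()))
    for parent in HYPERNYMS.get(synset, []):
        if (synset, parent) not in blocked:
            feats |= features(parent, blocked)
    return feats


# Without a blockage point, Bandung inherits both incompatible features.
print(sorted(features("Bandung")))
# Blocking the Bandung -> Java link leaves only the feature from city.
blockage = frozenset({("Bandung", "Java")})
print(sorted(features("Bandung", blockage)))
```

Java itself still inherits Natural from island, since only the link below it is blocked.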

17 Methodology Information used for decision making Relational information regarding every synset and neighbours; i.e. the WN structure Synsets' glosses as provided by EWN Glosses, descriptions and examples of the TCO features as provided in [Alonge et al. 1998] Usual word-substitution tests to acknowledge hyponymy, as in [Cruse 1986]

18 Methodology When all incompatibilities were fixed, a new automatic re-expansion was launched, which resulted in a new (smaller) number of conflicts. Following this iterative and incremental approach, inheritance was re-calculated and the data were re-examined several times. The task finished when a new cycle of re-expansion of properties did not result in new conflicts.

19 Methodology Then, two final steps were applied: 1. Since the TCO is itself a hierarchy, for every synset, its annotation was expanded up-feature; e.g. Animal expands to Living, Natural, Origin and 1stOrderEntity 2. The whole hierarchy was checked for consistency using formal theorem provers like Vampire and E-prover –This step resulted in a number of new conflicts which were finally fixed.
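The up-feature expansion in step 1 amounts to adding every ancestor of a feature in the TCO hierarchy. The sketch below encodes only the chain named in the slide's example, read as a single path (an assumption); the table `TCO_PARENT` and the function `expand_up` are hypothetical names.

```python
# Parent links for the one chain named on the slide; the rest of the TCO
# hierarchy is not encoded here.
TCO_PARENT = {
    "Animal": "Living",
    "Living": "Natural",
    "Natural": "Origin",
    "Origin": "1stOrderEntity",
}


def expand_up(features):
    """Add every TCO ancestor of each feature to the annotation."""
    expanded = set()
    for feature in features:
        # Climb until the top of the hierarchy or an already visited node.
        while feature is not None and feature not in expanded:
            expanded.add(feature)
            feature = TCO_PARENT.get(feature)
    return expanded


print(sorted(expand_up({"Animal"})))
```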

20 Typology of miscategorizations (IS-A Overload) (in black:[Guarino 1998] original typology)

21 Typology of miscategorizations Overgeneralization = Hypernym has more features than Hyponym should have Reduction of Sense = Hypernym fails to capture part of the Hyponym’s meaning Confusion of senses = Multiple inheritance where hypernyms are incompatible

22 Typology of miscategorizations Extensional ambiguity = e.g. “layer”: is it an object or a substance? 3rd Order Entities vs Mental 2nd Order Entities (TCO labels) = e.g. “discipline” (process, thus 2ndOrder) IS-A “knowledge domain” (3rdOrder) Technical inconsistencies = e.g. Hyponymy-Meronymy confusion

23 Conclusions WN1.6 (= ILI) fully and consistently annotated for Nouns with 60 semantic features organized in a shallow ontology –65,000 synsets, 116,000 variants –Average of 6.48 TCO features per synset 350 inheritance-blocking points detected in WN –28,000 synsets have at least one in their hypernymy chain [= they are affected by WN hierarchy mistakes or inadequacies] The resource is free. It can be downloaded from our web site (see the proceedings)

24 The Statue of Liberty +OBJECT +IMAGE_REPRESENTATION +CONCEPT monument +OBJECT artifact +OBJECT art +OBJECT sculpture =IMAGE_REPRESENTATION +CONCEPT +OBJECT impressionism +OBJECT figure +CONCEPT shape +CONCEPT abstraction =CONCEPT object =OBJECT

25 The Statue of Liberty +OBJECT +IMAGE_REPRESENTATION monument +OBJECT artifact +OBJECT art =CONCEPT sculpture =IMAGE_REPRESENTATION =OBJECT impressionism +CONCEPT figure +CONCEPT shape +CONCEPT abstraction =CONCEPT object =OBJECT

