Applying Ontology-Based Lexicons to the Semantic Annotation of Learning Objects Kiril Simov and Petya Osenova BulTreeBank Project

1 Applying Ontology-Based Lexicons to the Semantic Annotation of Learning Objects Kiril Simov and Petya Osenova BulTreeBank Project Linguistic Modelling Laboratory Bulgarian Academy of Sciences RANLP, Borovets 2007, Bulgaria

2 Outline of the Talk Introductory notes LT4eL Domain Ontology Ontology-based Lexicon Model Annotation Grammar Semantic Annotation of Learning Objects Bottlenecks within the Semantic Annotation Discussion and Conclusions

3 Acknowledgments We would like to thank our partners from the project for their useful feedback, especially: Paola Monachesi, Cristina Vertan, Lothar Lemnitzer, Corina Forascu, Claudia Borg, Rosa Del Gaudio, Jantine Trapman, Beata Wójtowicz, Eelco Mossel

4 Introductory notes (1) The LT4eL European project aims to demonstrate the relevance of language technology and ontology-based document annotation for improving the usability of learning management systems (LMS). This paper discusses the role of the ontology in the definition of domain lexicons in several languages and its use for the semantic annotation of Learning Objects (LOs).

5 Introductory notes (2) The relation between the domain ontology and the domain texts (the learning objects) is mediated by two kinds of information: domain lexicons and concept annotation grammars. We assume that the ontology is defined first in a formal way, and that the lexicons are then built on the basis of the concepts and relations defined in the ontology.

6 LT4eL Domain Ontology: general issues The domain: Computer Science for Non-Computer Scientists Coverage: operating systems; programs; document preparation – creation, formatting, saving, printing; Web, Internet, computer networks; HTML, websites, HTML documents The role of the ontology: indexing of the LOs both at the metadata level and inline

7 Synopsis of the Ontology Construction Methodologies (3) Conclusions (continued): Ontology development is necessarily an iterative process An evaluation of the sources of information is essential to the development of an ontology

8 LT4eL Domain Ontology: creation [Diagram: Keywords annotation (BG, EN, PT, NL, MT, CZ, PO, RO); Translation into EN; Definition Collection; Concept creation]

9 LT4eL Domain Ontology: verification The created domain concepts are the backbone Connection to an upper ontology –DOLCE –Via OntoWordNet Extension of the concepts –Restrictions on an existing concept –Superconcept –Subconcepts

10 Examples for the extension Available restriction: if a program has a creator, the concept for program creator is also added to the ontology Superconcept: if the concept for text editor is in the ontology, then we added also the concept of editor (as a kind of program) to the ontology Subconcept: if left margin and right margin are represented as concepts in the ontology, then we add also concepts for top margin and bottom margin

11 Current state of the ontology About 750 domain concepts, about 50 concepts from DOLCE, and about 250 intermediate concepts from OntoWordNet. We have also added about 200 new concepts extracted from LOs. They were: –More specific concepts of an existing concept, and hence expressed by more complex NPs –Concepts with new domain meanings, which were missing

12 Why are keywords not enough…? not all keywords are shared by the annotations in all languages some concepts from the extension of the first set of concepts (created on the basis of the keywords) appear in the texts, but were not annotated as keywords Therefore: NEED FOR A LEXICON

13 Ontology-Based Lexicon Model (1) The lexicons represent the main interface between the user's query, the ontology and the ontological search engine Close to LingInfo model (Buitelaar 2006) Using ideas of Pustejovsky applied in SIMPLE and mapping to WordNet The terminological lexicons were constructed on the basis of the formal definitions of the concepts within the ontology

14 Ontology-Based Lexicon Model (2) The problems in EuroWordNet: for some concepts there is no lexicalized term in a given language; some important term in a given language has no appropriate concept in the ontology to represent its meaning Our solution: we allow the lexicons to also contain non-lexicalized phrases, which express the meaning of concepts that have no lexicalization in a given language (mapping variety); we include all the important concepts within a domain (see how)

15 Mapping varieties Ontology Lexicalized Terms Free Phrases

16 Solutions The specific solutions for the lexical terms without appropriate concept in the ontology are the following: –More detailed classes in the ontology added –More complex mapping between the ontology and some lexicons is performed

17 Example from the Dutch lexicon A horizontal or vertical bar as a part of a window, that contains buttons, icons. werkbalk balk balk met knoppen menubalk
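The Dutch example can be pictured as a small data structure. The following is only a minimal sketch in Python, not the project's lexicon format (the DTD itself is not reproduced in this transcript). The class name `LexiconEntry`, the concept identifier `Toolbar`, and the split of the four Dutch forms into lexicalized terms and free phrases are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One concept with the word forms that express it in one language."""
    concept_id: str   # hypothetical ontology identifier
    definition: str
    lexicalized_terms: list[str] = field(default_factory=list)
    free_phrases: list[str] = field(default_factory=list)

    def all_forms(self) -> list[str]:
        # Lexicalized terms and free phrases both map to the same concept.
        return self.lexicalized_terms + self.free_phrases


# The definition and the four Dutch forms come from the slide; which forms
# count as "free phrases" is an assumption.
entry = LexiconEntry(
    concept_id="Toolbar",
    definition=("A horizontal or vertical bar as a part of a window, "
                "that contains buttons, icons."),
    lexicalized_terms=["werkbalk", "balk", "menubalk"],
    free_phrases=["balk met knoppen"],
)


def build_index(entries):
    """Map each lower-cased surface form to the set of concepts it may denote."""
    index = {}
    for e in entries:
        for form in e.all_forms():
            index.setdefault(form.lower(), set()).add(e.concept_id)
    return index


index = build_index([entry])
print(index["balk met knoppen"])
```

Keeping a set of concepts per surface form leaves room for ambiguous terms, which the later slides on constraints and bottlenecks discuss.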

18 Example part from the DTD

19 Concept Annotation Grammar It encodes the connection of the lexicon to the concepts in the ontology It is a cascaded regular grammar It is written and executed in CLaRK

20 Part of the BG annotation grammar

21 The relations between the lexical items, grammar rules and the text [Diagram: Lexical Items; Grammar Rules; Domain Texts]

22 Semantic Annotation of Learning Objects Within the project we performed both types of annotation: –inline –through metadata The metadata annotation is used during the retrieval of learning objects from the repository The inline annotation will be used: –as a step to metadata annotation of the learning objects –as a mechanism to validate the coverage of the ontology –as an extension of the retrieval of learning objects

23 Stages of Annotation preparation for the semantic annotation (grammar, layouts, DTD, constraints) actual annotation – semi-automatic, involves annotator choice points

24 The role of the grammar and constraints Grammar – assigns all possible concepts per text segment Constraints – two variants: –Constraint 1 (Select Concept): stops at each annotated node; introduces artificial ambiguities – EXTEND and ERASE Example: Internet vs. Wireless Internet –Constraint 2 (Select LT4eL Concept) – at a later stage, when the grammar shows a better performance
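The "Internet vs. Wireless Internet" case can be illustrated with a short sketch. This is plain Python, not CLaRK and not the project's grammar: the functions `find_candidates` and `select_longest`, the toy lexicon, and the concept names are hypothetical. The longest-match filter is a stand-in heuristic for the choice that, in the project, the annotator makes at each choice point.

```python
def find_candidates(tokens, lexicon, max_len=4):
    """Return every (start, end, concept) whose term matches tokens[start:end].

    Mirrors the idea that the grammar assigns all possible concepts per
    text segment, including overlapping ones.
    """
    candidates = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            phrase = " ".join(tokens[start:end]).lower()
            for concept in lexicon.get(phrase, ()):
                candidates.append((start, end, concept))
    return candidates


def select_longest(candidates):
    """Drop any candidate that lies inside a strictly longer candidate."""
    kept = []
    for c in candidates:
        contained = any(
            o[0] <= c[0] and c[1] <= o[1] and (o[1] - o[0]) > (c[1] - c[0])
            for o in candidates
        )
        if not contained:
            kept.append(c)
    return kept


# Toy lexicon with hypothetical concept names.
lexicon = {
    "internet": ["Internet"],
    "wireless internet": ["WirelessInternet"],
}
tokens = "We use wireless Internet at home".split()

candidates = find_candidates(tokens, lexicon)
selected = select_longest(candidates)
print(candidates)
print(selected)
```

Both concepts are proposed for the overlapping span, which is the kind of ambiguity the Select Concept constraint puts in front of the annotator; the filter here simply prefers the longer term.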

25 Bottlenecks (1) (1) some concepts in the ontology might be too specific or too broad with respect to the term to which they were assigned (Size is defined as number of unique entries in a database) (2) some general terms in a connected text refer to specific entities (in anaphoric relations) (Systems for Personalization = Systems) (3) the boundaries of the terms might need extension (Agent technology)

26 Bottlenecks (2) (4) the ambiguity within concepts might be fake (Metadata and DescriptiveMetadata) (5) the detected term also has a common-use meaning, and in the context this common meaning is triggered (Help as command and as request) (6) should verbs receive semantic annotation? (Button Save and the command (to) Save)

27 Discussion of the Lexicon Model (1) WordNet –Similar: grouping lexical items around a common meaning in synsets –Different: the meaning is defined independently in the ontology SIMPLE –Similar: defining the meaning of lexical items by means of the ontology –Different: the selection of the ontology which in our case represents the domain of interest, and in the case of SIMPLE reflects the lexicon model

28 Discussion of the Lexicon Model (2) LingInfo model –Similar: the idea that the grammatical and context information also needs to be presented in a connection to the ontology –Different: implementation of the model and the degree of realization of the concrete language resources and tools

29 Formal Evaluation of Semantic Search Search for paragraphs with query formed on the basis of Concepts from ontology Search for paragraphs with query formed on the basis of Terms in the lexicons Cases: –Ambiguous term – depends on WSD - help –Unambiguous specific terms – slide and presentation –General terms – program (ontology inference)
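The "ontology inference" case for general terms can be sketched as follows. This is a minimal illustration in Python, not the project's search engine: a concept query is expanded to all of its subconcepts before matching annotated paragraphs. The hierarchy follows the earlier extension example (text editor is a kind of editor, editor is a kind of program); the paragraph annotations and the function names are invented for the example.

```python
# Direct subconcept links; the Program > Editor > TextEditor chain follows
# the extension example on slide 10, the identifiers are hypothetical.
SUBCONCEPTS = {
    "Program": ["Editor"],
    "Editor": ["TextEditor"],
}

# Hypothetical paragraphs, each with the set of concepts annotated in it.
PARAGRAPHS = {
    "p1": {"TextEditor"},
    "p2": {"Slide"},
    "p3": {"Program"},
}


def expand(concept, subconcepts):
    """Return the concept together with all of its transitive subconcepts."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subconcepts.get(current, []))
    return seen


def concept_search(concept, paragraphs, subconcepts):
    """Return ids of paragraphs annotated with the concept or a subconcept."""
    wanted = expand(concept, subconcepts)
    return sorted(pid for pid, found in paragraphs.items() if found & wanted)


print(concept_search("Program", PARAGRAPHS, SUBCONCEPTS))
```

A plain text search for "program" would miss `p1`, which mentions only a text editor; the expansion step is what lets a general query reach paragraphs about more specific concepts.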

30 Results for Bulgarian Concept search – #Program* + #Slide Text search – Program, Software, Editor, Slide Concept search: Precision 0.73, Recall 1.00, F 0.84 Text search: Precision 1.00, Recall 0.375, F 0.55
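The reported F values are consistent with the balanced F-measure, the harmonic mean of precision and recall. The slide does not name the formula, so the check below is a sketch under that assumption; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported on the slide.
concept_f = f_measure(0.73, 1.00)
text_f = f_measure(1.00, 0.375)

print(round(concept_f, 2))  # 0.84
print(round(text_f, 2))     # 0.55
```

Concept search trades some precision for full recall, while text search is precise but finds only 37.5% of the relevant paragraphs, which is why its F value is lower.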

31 Conclusion In the future, more work has to be done on: –the extension of the annotation grammars –the implementation of disambiguation rules –the connection of the lexicon and the grammars to non-domain-dependent concepts –the addition of domain relations
