Context Problem Research Question Background Framework Results Demo Conclusions Further Work Ricardo Gacitua 1, Pete Sawyer 1, Paul Rayson 1, Scott Piao.


1 A Framework to Experiment with Different NLP Techniques
Ricardo Gacitua (1), Pete Sawyer (1), Paul Rayson (1), Scott Piao (2)
(1) Computing Department, Lancaster University, Lancaster, UK
(2) School of Computer Science, Manchester University, UK
Workshop - Issues in Ontology Development and Use, Nottingham, UK. 2007

2 Index
- Context
- Problems
- Research Question
- Objectives
- Framework
- Brief Demo (OntoLancs Workbench)
- Further Work

3 Context
Most initiatives for ontology learning combine techniques to find concepts and the relationships between them.
Focus:
- Learning taxonomic relations between concepts
- Deriving a concept hierarchy organizing these concepts
- Extracting the relevant domain terminology and synonyms from a text collection
- Extending an existing concept hierarchy with new concepts
- Discovering concepts which can be regarded as abstractions of human thought
- Populating the ontology with instances of relations and concepts
- Learning non-taxonomic relations between concepts
- Discovering other axiomatic relationships or rules involving concepts and relations
Methods for term extraction can be as simple as counting raw frequency, applying information retrieval methods such as TF-IDF (Baeza-Yates & Ribeiro-Neto, 1999), or applying more sophisticated methods such as the C-value / NC-value method [Frantzi & Ananiadou 1999].
Unsupervised clustering techniques known from Machine Learning are also used [Cimiano et al. 2005, Faure & Nedellec 1999, Caraballo 1999].
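As a rough illustration of the simpler term-extraction methods named on this slide, the sketch below scores terms with a basic TF-IDF variant (raw term count multiplied by ln(N / df)). The toy corpus, the function name `tf_idf`, and this particular weighting are assumptions made for illustration; the slides do not say which TF-IDF formulation was used.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = raw count of the term in the document
    idf = ln(N / df), where df is the number of documents containing the term
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document for the document frequency.
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus: three tiny pre-tokenised "documents".
corpus = [
    "the ontology defines the concept hierarchy".split(),
    "the corpus contains the domain text".split(),
    "the concept is extracted from the corpus".split(),
]
scores = tf_idf(corpus)
# Highest-scoring candidate terms of the first document.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:3]
```

A word such as "the", which occurs in every document, gets a score of zero, while words confined to one document rank highest. Raw frequency alone would have ranked "the" first.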

4 Context
However, researchers have realised that the output of the ontology learning process is far from perfect [Cimiano, 2005].
Reference: Philipp Cimiano, Johanna Völker, Rudi Studer. Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text. Information, Wissenschaft und Praxis 57 (6-7): 315-320. October 2006.

5 Problem
A challenging issue is to quantitatively evaluate the usefulness and accuracy of techniques, and of combinations of techniques, when applied to ontology learning [1].
A key issue not yet addressed: Reinberger and Spyns (2005) point out the importance of evaluating the effectiveness of the techniques for ontology learning:
"To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning." (page 2)
In most cases, it is not obvious how to use, configure and combine techniques from different fields for a specific domain.
[1] Reinberger, M. L. and P. Spyns (2005). Unsupervised Text Mining for the Learning of DOGMA-inspired Ontologies. Ontology Learning from Text: Methods, Evaluation and Applications, Advances in Artificial Intelligence. P. Buitelaar, P. Cimiano, B. Magnini (eds.). Amsterdam, IOS Press. vol. 24: pages 305-339.

6 Research Question
Can shallow semantic analysis of the kind enabled by semantic tagging, together with a range of other statistical NLP techniques, identify key domain concepts?
Can it do so with sufficient confidence in the correctness and completeness of the result?

7 Background
A number of frameworks that support the ontology learning process have been reported: ASIUM, OntoLT, DODDLE, Text2Onto and OntoLearn.
They implement several techniques from different fields such as knowledge acquisition, machine learning, information retrieval, natural language processing, artificial intelligence reasoning and database management.
Most frameworks use a pre-defined combination of techniques. Thus, they do not include any mechanism for carrying out experiments with combinations, or the ability to include new techniques.
Text2Onto is based on the GATE framework, which is flexible with respect to the set of algorithms.

8 A Flexible Framework
Phase 1: Part-of-Speech (POS) and semantic annotation of the corpus. Domain texts are tagged morpho-syntactically and semantically.
Phase 2: Extraction of concepts. The domain terminology is extracted from the tagged domain corpus by identifying a list of domain candidate terms. The system provides a set of statistical and linguistic techniques which an ontology engineer can combine. An existing DAML ontology can be used as a reference to calculate precision and recall.
Phase 3: Domain ontology construction. Concepts extracted during the previous phase are added to a concept hierarchy.
Phase 4: Domain ontology edition. The bootstrap ontology is turned into OWL. It is then processed using an ontology editor (Protégé) to manage the versioning of the domain ontology and to modify or improve it.
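The evaluation step in Phase 2 can be sketched as follows: candidate terms are compared with the concept labels of a reference ontology to obtain precision and recall. The function name, the case-insensitive exact matching, and the toy term lists are assumptions for illustration only; the slides do not describe how terms are matched against the DAML ontology.

```python
def precision_recall(candidate_terms, reference_concepts):
    """Compare extracted candidate terms with reference concept labels.

    precision = matched / number of candidates
    recall    = matched / number of reference concepts
    Matching here is a simple case-insensitive exact match (an assumption).
    """
    candidates = {term.lower() for term in candidate_terms}
    reference = {concept.lower() for concept in reference_concepts}
    matched = candidates & reference
    precision = len(matched) / len(candidates) if candidates else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: four extracted terms, five concepts in the reference.
extracted = ["ontology", "concept", "corpus", "thing"]
reference = ["Ontology", "Concept", "Relation", "Instance", "Corpus"]
precision, recall = precision_recall(extracted, reference)
```

With these toy lists, three of the four candidates are found in the reference (precision 0.75) and three of the five reference concepts are recovered (recall 0.6).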

9 Preliminary Results
Some researchers use different text processing techniques such as stopword filtering, lemmatization or stemming:
- Stopword filtering: [Bloehdorn et al., 2006]
- Lemmatization: [Buitelaar and Ramaka, 2005]
- Stemming: [Kietz et al., 2000]
From the preliminary experiments, we can conclude that the lemmatization technique (Group 3) produces better results than the stemming technique (Group 2) for the domain concept acquisition process.
Our results are consistent with other studies. For instance, Alkula [3] suggests that lemmatization may be a better approach than stemming.
References:
[3] Alkula, R. 2001. From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software. Inf. Retr. 4, 3-4 (Sep. 2001), 195-208.
S. Bloehdorn, P. Cimiano and A. Hotho. Learning Ontologies to Improve Text Clustering and Classification. Proc. of GFKL, 2005.
P. Buitelaar and S. Ramaka. Unsupervised Ontology-based Semantic Tagging for Knowledge Markup. In: Proc. of the Workshop on Learning in Web Search at the International Conference on Machine Learning, Bonn, Germany, August 2005.
J. Kietz et al. A Method for Semi-automatic Ontology Acquisition from a Corporate Intranet. In: Proc. EKAW-2000, France, 2000.
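To make the difference between the three preprocessing techniques concrete, here is a toy sketch. The stopword list, the suffix-stripping stemmer and the lemma lookup table are all hypothetical stand-ins written for this example. They are not the Porter algorithm or the tools used in the experiments, and the sketch does not reproduce the reported results.

```python
# Tiny illustrative stopword list (hypothetical).
STOPWORDS = {"the", "a", "of", "and", "are", "is", "in"}

# Toy lemma lookup table (hypothetical); a real lemmatizer relies on a
# lexicon and part-of-speech information.
LEMMAS = {
    "ontologies": "ontology",
    "hierarchies": "hierarchy",
    "classes": "class",
    "defined": "define",
}


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def crude_stem(token):
    """Strip the first matching suffix; deliberately naive."""
    for suffix in ("ies", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Map a token to its dictionary form, or leave it unchanged."""
    return LEMMAS.get(token, token)


tokens = "the ontologies and hierarchies of classes are defined".split()
content = remove_stopwords(tokens)
stems = [crude_stem(t) for t in content]   # ['ontolog', 'hierarch', 'class', 'defin']
lemmas = [lemmatize(t) for t in content]   # ['ontology', 'hierarchy', 'class', 'define']
```

The stems are truncated strings that are not always words, whereas the lemmas are dictionary forms that can be matched directly against concept labels. This is one plausible reason why lemmatization could help concept acquisition, though the slides do not state the cause.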

10 Brief Demo
Ontology Framework

11 Conclusions
Main challenge: Our research project addresses an important challenge of ontology research, i.e. how to quantitatively evaluate the usefulness and accuracy of both techniques and combinations of techniques when they are applied to ontology learning.
This framework is designed as a cyclical process to experiment with different techniques. Techniques are included as plug-ins.
1. It provides support to determine which techniques, or combinations of techniques, provide optimal performance for ontology learning.
2. Our ontology learning environment is unique in providing not only a framework for integrating linguistic techniques, but also, possibly, an experimental platform for identifying the most effective technique or combination.
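The idea of plug-in techniques that can be combined and compared can be sketched as below. This is a minimal illustration, not the OntoLancs implementation: the three plug-in functions, the `PLUGINS` registry and the use of F1 as the ranking score are all assumptions introduced here.

```python
from itertools import combinations


def lowercase(tokens):
    return [t.lower() for t in tokens]


def drop_short(tokens):
    return [t for t in tokens if len(t) > 3]


def drop_stopwords(tokens):
    stop = {"the", "with", "from", "this"}
    return [t for t in tokens if t not in stop]


# Hypothetical plug-in registry: each technique is a function from a token
# list to a token list, so techniques can be chained in any combination.
PLUGINS = {
    "lowercase": lowercase,
    "drop_short": drop_short,
    "drop_stopwords": drop_stopwords,
}


def run_pipeline(names, tokens):
    """Apply the named plug-ins in order."""
    for name in names:
        tokens = PLUGINS[name](tokens)
    return tokens


def f1_score(candidates, reference):
    """Harmonic mean of precision and recall over term sets."""
    candidates, reference = set(candidates), set(reference)
    hits = len(candidates & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(candidates)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


def rank_combinations(tokens, reference):
    """Score every non-empty combination of plug-ins, best first."""
    results = []
    for size in range(1, len(PLUGINS) + 1):
        for combo in combinations(PLUGINS, size):
            output = run_pipeline(combo, tokens)
            results.append((f1_score(output, reference), combo))
    return sorted(results, reverse=True)


# Hypothetical input text and reference concepts.
tokens = "The Ontology is built from the Corpus with this Tool".split()
reference = {"ontology", "corpus"}
ranking = rank_combinations(tokens, reference)
best_score, best_combo = ranking[0]
```

On this toy input, chaining all three plug-ins gives the highest score. Because a technique is just a function in the registry, adding one means adding one entry, after which every combination that includes it is evaluated automatically.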

12 Future Work
Our Project: OntoLancs - A Flexible Framework for Ontology Learning
- Including new techniques (plug-ins) from different tools
- A graphical workflow engine will provide support for the composition of complex ensemble techniques
- Experimenting with techniques in supervised and unsupervised modes
- Integration with Protégé (editor)

13 The End
OntoLancs, Computing Department, Lancaster University, UK, 2006

14 Text2Onto vs. OntoLancs
- Text2Onto defines user interaction as a core aspect, whereas our framework provides support for running algorithms in an unsupervised mode.
- Our framework provides a graphical workflow engine to support the composition of complex ensemble techniques.
- Like Text2Onto, our framework uses a plug-in-based structure. In contrast, however, it can include techniques from existing linguistic and ontology tools by using Java APIs.

15 Techniques included in OntoLancs
1. Grouping by POS
2. Raw Frequency Filtering
3. POS Filtering
4. Lemmatization
5. Stemming
6. StopWord Filtering
7. Frequency Profiling
8. Syntactic Pattern Co-occurrences
9. Window-based Collocations
10. Semantic Filter (soon)
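The slides do not define "Frequency Profiling". One common reading, assumed here, is a comparison of word frequencies in the domain corpus against a general reference corpus using a log-likelihood statistic, keeping the words that are overused in the domain corpus. The sketch below follows that assumption; the function names and the toy corpora are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_domain, freq_reference, size_domain, size_reference):
    """Log-likelihood statistic for one word across two corpora.

    LL = 2 * sum(O_i * ln(O_i / E_i)), with
    E_i = N_i * (O_1 + O_2) / (N_1 + N_2).
    """
    total = freq_domain + freq_reference
    combined_size = size_domain + size_reference
    expected_domain = size_domain * total / combined_size
    expected_reference = size_reference * total / combined_size
    ll = 0.0
    if freq_domain > 0:
        ll += freq_domain * math.log(freq_domain / expected_domain)
    if freq_reference > 0:
        ll += freq_reference * math.log(freq_reference / expected_reference)
    return 2 * ll


def frequency_profile(domain_tokens, reference_tokens):
    """Rank words that are relatively more frequent in the domain corpus."""
    domain_counts = Counter(domain_tokens)
    reference_counts = Counter(reference_tokens)
    n_dom, n_ref = len(domain_tokens), len(reference_tokens)
    profile = []
    for term, freq in domain_counts.items():
        ref_freq = reference_counts[term]
        # Keep only words overused in the domain corpus.
        if freq / n_dom > ref_freq / n_ref:
            profile.append((log_likelihood(freq, ref_freq, n_dom, n_ref), term))
    return sorted(profile, reverse=True)


# Hypothetical toy corpora.
domain = "ontology concept ontology relation the the concept ontology".split()
general = "the cat sat on the mat and the dog sat".split()
profile = frequency_profile(domain, general)
ranked_terms = [term for _, term in profile]
```

In this toy run the word "the" is filtered out because it is relatively more frequent in the reference corpus, and the remaining words are ranked by how strongly they characterise the domain text.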

