Context Problem Research Question Background Framework Results Demo Conclusions Further Work Ricardo Gacitua 1, Pete Sawyer 1, Paul Rayson 1, Scott Piao.

Presentation transcript:

A Framework to Experiment with Different NLP Techniques
Ricardo Gacitua 1, Pete Sawyer 1, Paul Rayson 1, Scott Piao 2
1 Computing Department, Lancaster University, Lancaster, UK
2 School of Computer Science, Manchester University, UK
Workshop - Issues in Ontology Development and Use, Nottingham, UK, 2007

Index
Context
Problems
Research Question
Objectives
Framework
Brief Demo - OntoLancs Workbench
Further Work

Context
Most initiatives for ontology learning combine techniques to find concepts and the relationships between them. Focus:
Learning taxonomic relations between concepts
Deriving a concept hierarchy organizing these concepts
Extracting the relevant domain terminology and synonyms from a text collection
Extending an existing concept hierarchy with new concepts
Discovering concepts which can be regarded as abstractions of human thought
Populating the ontology with instances of relations and concepts
Learning non-taxonomic relations between concepts
Discovering other axiomatic relationships or rules involving concepts and relations
Methods for term extraction can be as simple as counting raw frequency, can apply information retrieval methods such as TFIDF [Baeza-Yates & Ribeiro-Neto, 1999], or can apply sophisticated methods such as the C-value / NC-value method [Frantzi & Ananiadou, 1999].
Unsupervised clustering techniques known from machine learning are also used [Cimiano et al., 2005; Faure & Nedellec, 1999; Caraballo, 1999].
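The TFIDF weighting mentioned on this slide can be sketched in a few lines. This is a minimal illustration, not the OntoLancs implementation: the function name `tfidf_scores`, the toy documents and the choice to keep each term's best score across documents are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_scores(documents):
    """Return {term: best TF-IDF score across documents} for tokenized documents."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = {}
    for tokens in documents:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)                  # relative term frequency
            idf = math.log(n_docs / doc_freq[term])   # 0 when the term is in every document
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores

# Hypothetical toy corpus, already tokenized.
docs = [
    "the ontology defines the concept hierarchy".split(),
    "the parser tags the corpus".split(),
    "the ontology editor exports the ontology".split(),
]
scores = tfidf_scores(docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

A word such as "the", which occurs in every document, receives a score of zero, whereas raw frequency counting would rank it first.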

Context
However, researchers have realised that the output of the ontology learning process is far from perfect [Cimiano, 2005].
Philipp Cimiano, Johanna Völker, Rudi Studer. Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text. Information, Wissenschaft und Praxis 57 (6-7), October.

Problem
A challenging issue is to quantitatively evaluate the usefulness and accuracy of the techniques, and of combinations of techniques, when applied to ontology learning [1].
A key issue not yet addressed: Reinberger and Spyns (2005) point out the importance of evaluating the effectiveness of the techniques for ontology learning: "To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning." (page 2)
In most cases, it is not obvious how to use, configure and combine techniques from different fields for a specific domain.
[1] Reinberger, M. L. and P. Spyns (2005). Unsupervised Text Mining for the Learning of DOGMA-inspired Ontologies. In: P. Buitelaar, P. Cimiano, B. Magnini (eds.), Ontology Learning from Text: Methods, Evaluation and Applications, Advances in Artificial Intelligence, vol. 24. Amsterdam, IOS Press.

Research Question
Can shallow semantic analysis of the kind enabled by semantic tagging, together with a range of other statistical NLP techniques, identify key domain concepts? Can it do so with sufficient confidence in the correctness and completeness of the result?

Background
A number of frameworks that support the ontology learning process have been reported: ASIUM, OntoLT, DODDLE, Text2Onto and OntoLearn. They implement several techniques from different fields such as knowledge acquisition, machine learning, information retrieval, natural language processing, artificial intelligence reasoning and database management.
Most frameworks use a pre-defined combination of techniques. Thus, they do not include any mechanism for carrying out experiments with combinations of techniques, or the ability to include new ones.
Text2Onto is based on the GATE framework, which is flexible with respect to the set of algorithms.

A Flexible Framework
Phase 1: Part-of-Speech (POS) and semantic annotation of the corpus. Domain texts are tagged morpho-syntactically and semantically.
Phase 2: Extraction of concepts. The domain terminology is extracted from the tagged domain corpus by identifying a list of domain candidate terms. The system provides a set of statistical and linguistic techniques which an ontology engineer can combine. An existing DAML ontology can be used as a reference to calculate precision and recall.
Phase 3: Domain ontology construction. Concepts extracted during the previous phase are added to a concept hierarchy.
Phase 4: Domain ontology editing. The bootstrap ontology is converted into OWL. It is then processed using an ontology editor (Protégé) to manage the versioning of the domain ontology and to modify or improve it.
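The evaluation step in Phase 2 compares extracted candidate terms with the concepts of a reference ontology. A minimal sketch of that comparison follows, assuming exact case-insensitive string matching between candidates and reference concept names; the slides do not say how matching is done, and the function name and sample terms are hypothetical.

```python
def precision_recall(extracted, reference):
    """Compare extracted candidate terms with reference ontology concepts."""
    # Assumption: a candidate matches a concept when the lowercased strings are equal.
    extracted = {term.lower() for term in extracted}
    reference = {term.lower() for term in reference}
    matched = extracted & reference
    precision = len(matched) / len(extracted) if extracted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical candidate terms and reference concepts.
candidates = ["Ontology", "concept", "corpus", "table"]
reference_concepts = ["ontology", "concept", "relation", "instance", "corpus"]
p, r = precision_recall(candidates, reference_concepts)
```

Here three of the four candidates appear in the reference (precision 0.75) and three of the five reference concepts are found (recall 0.6).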

Preliminary Results
Some researchers use different text processing techniques such as stopword filtering, lemmatization or stemming:
Stopword filtering: [Bloehdorn et al., 2006]
Lemmatization: [Buitelaar and Ramaka, 2005]
Stemming: [Kietz et al., 2000]
From the preliminary experiments, we can conclude that the lemmatization technique (Group 3) produces better results than the stemming technique (Group 2) for the domain concept acquisition process.
Our results are consistent with other studies. For instance, Alkula [3] suggests that lemmatization may be a better approach than stemming.
[3] Alkula, R. From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software. Inf. Retr. 4, 3-4 (Sep. 2001).
S. Bloehdorn, P. Cimiano and A. Hotho. Learning Ontologies to Improve Text Clustering and Classification. Proc. of GFKL.
Paul Buitelaar, Srikanth Ramaka. Unsupervised Ontology-based Semantic Tagging for Knowledge Markup. In: Proc. of the Workshop on Learning in Web Search at the International Conference on Machine Learning, Bonn, Germany, August.
J. Kietz, et al. A Method for Semi-automatic Ontology Acquisition from a Corporate Intranet. In: Proc. EKAW-2000, France.
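To make the distinction between the two normalizations concrete, the toy sketch below contrasts a deliberately crude suffix stripper with a dictionary lookup. Both are hypothetical stand-ins: `naive_stem` is not a real stemming algorithm, `LEMMAS` is an invented four-entry lexicon, and the output says nothing about the Group 2 and Group 3 experiments reported on the slide.

```python
def naive_stem(word):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ies", "ing", "es", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma lexicon; a real lemmatizer would use a full dictionary
# and part-of-speech information.
LEMMAS = {
    "ontologies": "ontology",
    "taxonomies": "taxonomy",
    "classes": "class",
    "properties": "property",
}

def lemmatize(word):
    """Map a word form to its dictionary form, or return it unchanged."""
    return LEMMAS.get(word, word)

terms = ["ontologies", "ontology", "classes", "class"]
stems = {naive_stem(t) for t in terms}
lemmas = {lemmatize(t) for t in terms}
```

With this toy stemmer the four word forms end up as four different stems, some of which are not words, while the lexicon lookup reduces them to the two dictionary forms "ontology" and "class".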

Brief Demo
Ontology Framework

Conclusions
Main challenge: Our research project addresses an important challenge of ontology research, i.e. how to quantitatively evaluate the usefulness and accuracy of both techniques and combinations of techniques when they are applied to ontology learning.
This framework is designed as a cyclical process for experimenting with different techniques. Techniques are included as plug-ins.
1. It provides support for determining which techniques, or combinations of techniques, provide optimal performance for ontology learning.
2. Our ontology learning environment is unique in providing not only a framework for integrating linguistic techniques, but also, possibly, an experimental platform for identifying the most effective technique or combination of techniques.

Future Work
Our project: OntoLancs - A Flexible Framework for Ontology Learning
Including new techniques (plug-ins) from different tools.
A graphical workflow engine will provide support for the composition of complex ensemble techniques.
Experimenting with techniques in supervised and unsupervised modes.
Integration with Protégé (editor).

The End
OntoLancs, Computing Department, Lancaster University, UK, 2006

Text2Onto vs. OntoLancs
Text2Onto defines user interaction as a core aspect, whereas our framework provides support for processing algorithms in an unsupervised mode.
Our framework provides a graphical workflow engine to support the composition of complex ensemble techniques.
Like Text2Onto, our framework uses a plug-in-based structure. In contrast, however, it can include techniques from existing linguistic and ontology tools by using Java APIs.
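The plug-in idea can be illustrated with a small registry in which each technique is a function from term counts to term counts, so that techniques can be combined in any order. This is only a sketch in Python under assumed names (`REGISTRY`, `register`, `run_pipeline`, the stopword list and the threshold are all hypothetical); the slides state that OntoLancs integrates tools through Java APIs and do not show its plug-in interface.

```python
from collections import Counter
from typing import Callable, Dict

# A technique maps term counts to (filtered) term counts.
Technique = Callable[[Counter], Counter]
REGISTRY: Dict[str, Technique] = {}

def register(name):
    """Decorator that adds a technique to the registry under the given name."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator

@register("stopword_filter")
def stopword_filter(counts):
    stopwords = {"the", "of", "a", "and"}  # hypothetical stopword list
    return Counter({t: c for t, c in counts.items() if t not in stopwords})

@register("raw_frequency_filter")
def raw_frequency_filter(counts, min_count=2):
    # Keep only terms that occur at least min_count times.
    return Counter({t: c for t, c in counts.items() if c >= min_count})

def run_pipeline(tokens, technique_names):
    """Apply the named techniques, in order, to the token counts."""
    counts = Counter(tokens)
    for name in technique_names:
        counts = REGISTRY[name](counts)
    return counts

tokens = "the ontology of the domain and the ontology of a corpus".split()
result = run_pipeline(tokens, ["stopword_filter", "raw_frequency_filter"])
frequency_only = run_pipeline(tokens, ["raw_frequency_filter"])
```

Running both filters leaves only "ontology", while the frequency filter alone also keeps "the" and "of", which shows why comparing combinations of techniques matters.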

Techniques included in OntoLancs
1. Grouping by POS
2. Raw Frequency Filtering
3. POS Filtering
4. Lemmatization
5. Stemming
6. StopWord Filtering
7. Frequency Profiling
8. Syntactic Pattern Co-occurrences
9. Window-based Collocations
10. Semantic Filter (soon)
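As an illustration of one listed technique, window-based collocations can be sketched as counting ordered word pairs that occur within a fixed distance of each other. The window size, the function name and the sample sentence are assumptions; the slides do not describe how OntoLancs implements this technique.

```python
from collections import Counter

def window_collocations(tokens, window=2):
    """Count ordered pairs (left, right) where right follows left within `window` tokens."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with each of the next `window` tokens.
        for right in tokens[i + 1 : i + 1 + window]:
            pairs[(left, right)] += 1
    return pairs

# Hypothetical tokenized sentence.
tokens = "domain ontology learning uses domain ontology editors".split()
pairs = window_collocations(tokens, window=2)
top_pair, top_count = pairs.most_common(1)[0]
```

In this sample the pair ("domain", "ontology") occurs twice and every other pair once, so it is the strongest collocation candidate.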