Theory and Application of Database Systems A Hybrid Approach for Extending Ontology from Text He Wei.


Outline
• Introduction
• Related Work
• Our Proposed Method
• Experiment and Evaluation
• Conclusion

Introduction
• Since ontology was first introduced, it has attracted the attention of many researchers at home and abroad and has been applied in various fields of computer science.
• Once an ontology has been constructed, manually adding a new concept to it is time-consuming and laborious, and extending an existing ontology automatically remains a great challenge.
• To address this problem, we propose a hybrid approach for semi-automatically extending an ontology from text using the semantic relatedness between words.

Related Work
• Automatic and semi-automatic ontology extension has been studied for two decades. There are three kinds of approaches to ontology extension: natural language processing (NLP) based approaches, network based approaches, and user interaction approaches.
• We propose a semi-automatic method for extending an ontology from text. It uses the semantic relatedness between terms to discover new concepts and positions them in the seed ontology through several kinds of rules.

Our Proposed Method
• Co-occurrence analysis and a word filter are used to acquire, from documents, the candidate concepts for each concept of the seed ontology.
• To speed up ontology extension, we use the semantic relatedness between words to compress the extended concept space.
• Extension rules and subsumption analysis are used to add the extended concepts to the seed ontology, generating the extended ontology.

Our Proposed Method
• Identifying the Candidate Concepts Using Co-occurrence Analysis and Word Filter
• Use a search engine to obtain a set of domain documents related to concept C, named D, and find the words that co-occur with C in D, CoWord(C), to generate the co-occurrence word set CoWordSet(C) = {wi | wi ∈ CoWord(C)}. Then count CoFreq(wi) and AFreq(wi) for each wi in CoWordSet(C) over the document set D, and discard the words for which AFreq(wi) >> CoFreq(wi) and CoFreq(wi) < 5. Finally, rank the remaining co-occurrence words in CoWordSet(C) by CoFreq(wi) in descending order.
• Calculate the relative importance of each wi in CoWordSet(C), RI(wi).
• Compute the entropy of each wi in CoWordSet(C), Entropy(wi).
• Select the overlapping words, that is, those in CoWordSet(C) with both high RI and high Entropy scores, to generate the candidate concept set of concept C, denoted CandCpt(C).
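
The counting, filtering, and ranking step above can be sketched in Python. This is only an illustration under several assumptions that the slides do not state: CoFreq(w) is taken to be the number of occurrences of w in documents that mention C, AFreq(w) the number of occurrences of w in all of D, ">>" is approximated by a hypothetical ratio threshold `max_ratio`, and a word is discarded when either condition holds. The function name and whitespace tokenization are also illustrative; RI and Entropy are not covered because their definitions are not given.

```python
from collections import Counter


def cooccurrence_candidates(documents, concept, min_cofreq=5, max_ratio=10.0):
    """Rank the words that co-occur with `concept`, after frequency filtering.

    Assumptions (not from the slides): CoFreq counts occurrences in documents
    that contain the concept, AFreq counts occurrences in all documents, and
    "AFreq >> CoFreq" means AFreq > max_ratio * CoFreq.
    """
    concept = concept.lower()
    cofreq = Counter()  # CoFreq(w): occurrences alongside the concept
    afreq = Counter()   # AFreq(w): occurrences anywhere in the document set
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenization
        afreq.update(tokens)
        if concept in tokens:
            cofreq.update(t for t in tokens if t != concept)

    # Drop rare co-occurrences and words that are common everywhere.
    kept = {
        w: c
        for w, c in cofreq.items()
        if c >= min_cofreq and afreq[w] <= max_ratio * c
    }
    # Rank by CoFreq in descending order; ties are broken alphabetically.
    return sorted(kept, key=lambda w: (-kept[w], w))
```

With a small synthetic corpus, a word that appears mostly outside the concept's documents is filtered out together with words that co-occur fewer than five times.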

Our Proposed Method
• Obtaining the Extended Concepts Using Semantic Relatedness between Words
• We use the semantic relatedness between words to compress the extended concept space, selecting only a portion of the concepts in CandCpt(C) as the extended concepts.
• The selection process is as follows: for each concept Ci in CandCpt(C), we measure the semantic relatedness between Ci and C and select the concepts with the highest scores as the extended concepts. In this paper, we use only the top 3 concepts.
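
A minimal sketch of this top-3 selection follows. The slides do not say which semantic relatedness measure is used, so the sketch takes it as a caller-supplied function `relatedness(c, ci)`; that parameter and the function name are hypothetical.

```python
def select_extended_concepts(concept, candidates, relatedness, k=3):
    """Return the k candidates most related to `concept`, with their scores.

    `relatedness` is a placeholder for whatever measure is actually used;
    it must return a number for a (concept, candidate) pair.
    """
    scored = [(relatedness(concept, ci), ci) for ci in candidates]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(ci, score) for score, ci in scored[:k]]
```

Keeping the measure as a parameter lets the same selection step be reused with any scoring function that maps a pair of concepts to a number.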

Our Proposed Method
• Extending Ontology Using Extension Rules and Subsumption Analysis
• Rule 1: If the semantic relatedness score between concepts Ci and C is equal or approximately equal to 1, the two concepts are semantically consistent and hold a synonym relationship. We add Ci to the synonym attribute of C.

Our Proposed Method
• Extending Ontology Using Extension Rules and Subsumption Analysis
• Rule 2: If the semantic relatedness score between concepts Ci and C is the maximum but does not satisfy Rule 1, we use subsumption analysis to identify the semantic relationship between Ci and C.
• Subsumption analysis: Given two concepts Ci and C, concept C is said to be more general than concept Ci if the following condition holds:

Our Proposed Method
• Extending Ontology Using Extension Rules and Subsumption Analysis
• Rule 3: If the semantic relatedness score between concepts Ci and C is the maximum and satisfies neither Rule 1 nor Rule 2, we consider Ci and C to hold a related relationship. We add Ci to the related attribute of C.
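
The three extension rules can be read as a simple decision procedure, sketched below. Several details are assumptions: "approximately 1" is modelled by a hypothetical `synonym_threshold`, the outcome of subsumption analysis is passed in as a boolean because its condition is not reproduced in this transcript, and a positive subsumption result is assumed to make Ci a hyponym (is-a child) of C. Candidates that are not the top-scoring one fall outside the stated rules, so the sketch returns `None` for them.

```python
def classify_relation(score, is_max, c_subsumes_ci, synonym_threshold=0.95):
    """Pick the relation to record between C and a candidate concept Ci.

    score          -- semantic relatedness between Ci and C
    is_max         -- True if Ci has the highest score among the candidates
    c_subsumes_ci  -- result of subsumption analysis (C more general than Ci)
    The threshold value is illustrative, not taken from the slides.
    """
    if score >= synonym_threshold:
        return "synonym"   # Rule 1: add Ci to the synonym attribute of C
    if is_max and c_subsumes_ci:
        return "hyponym"   # Rule 2 (assumed outcome): Ci is-a C
    if is_max:
        return "related"   # Rule 3: add Ci to the related attribute of C
    return None            # not covered by the three rules
```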

Our Proposed Method
• Experiment and Evaluation
• Experiment
• We select some terms related to the education field, a sub-field of E-government, and construct a seed ontology for our experiment. The seed ontology consists of 10 concepts and includes three kinds of relationships between these concepts: synonym, hyponym/hypernym (is-a), and related.
• We download about 4,000 pages related to education from the website of the Ministry of Education of China, then use htmlparser to extract the content of these pages and generate the domain document set D.
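
The content extraction step can be illustrated with Python's standard `html.parser` module. This is a stand-in for the htmlparser tool named above, not a description of it; the class and function names are illustrative, and the sketch simply keeps visible text while skipping `script` and `style` blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML page, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the fragments and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Applying `html_to_text` to each downloaded page would yield the plain-text documents that make up D.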

Our Proposed Method
• Experiment and Evaluation
• Evaluation
• As our reference ontology, we choose the education-related part of a gold-standard E-government domain ontology constructed from the E-government thesaurus [11]. It has about 4,500 terms and three kinds of relationships between terms: synonym, hyponym/hypernym (is-a), and related.
• The improved recall, precision, and F1-measure are used to evaluate our proposed method.

Our Proposed Method
• Experiment and Evaluation
• Evaluation
• Because an ontology consists of concepts and the relationships between them, we define the improved recall, precision, and F1-measure by the following formulas.
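
The improved formulas themselves are not reproduced in this transcript. For orientation only, the sketch below computes the standard set-based precision, recall, and F1 between an extended ontology and a reference ontology. Representing each ontology as a set of items (concepts, or (concept, relation, concept) triples) is an assumption, and these standard definitions should not be taken as the improved measures used in the evaluation.

```python
def precision_recall_f1(extracted, reference):
    """Standard precision, recall and F1 between two sets of ontology items.

    Items may be concepts or (concept, relation, concept) triples; this
    representation is an assumption made for illustration.
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```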

Our Proposed Method
The F1-measure rises as the number of ontology extension iterations increases. It reaches after the fifth iteration, which is a promising value and indicates that the proposed method is valuable. The precision is maintained at a high level. It ranges from to

Conclusion
• With the massive amount of new web information, existing ontologies lag seriously behind the emergence of new concepts and are not suited to organizing and managing the new information.
• To solve this problem, we propose a hybrid approach for extending an ontology from text using the semantic relatedness between words, adding the new concepts discovered in documents to the existing ontology.
• Evaluation results on the improved recall, precision, and F1-measure demonstrate that the proposed method is promising and reasonable.
• A minor drawback remains, arising from how relationships are defined during ontology extension.