Hierarchical, Perceptron-like Learning for OBIE


Hierarchical, Perceptron-like Learning for OBIE
Yaoyong Li, Kalina Bontcheva
Department of Computer Science, University of Sheffield
{yaoyong,kalina}@dcs.shef.ac.uk
http://gate.ac.uk/
http://nlp.shef.ac.uk/

Outline
Ontology-based information extraction (OBIE) and the Semantic Web
Perceptron-like algorithm for OBIE
Experimental results

Semantic Annotation
Most material on the Web is written text.
Annotating text according to an ontology is an important aspect of the Semantic Web.
Automatic annotation is desirable.

Ontology Based Information Extraction
Information extraction (IE): automatically extract pre-defined information from text.
Ontology based IE (OBIE): given some text, identify the mentions of the ontology's concepts.

Machine Learning (ML) for OBIE
OBIE systems based on ML
Magpie, SemTag: match text with instances in the ontology.
C-PANKOW: unsupervised learning.
KIM: rule based, human-engineered, using the ontology structure during pattern matching.
Few OBIE systems have exploited the ontology structure.

Hierarchical Classification (HC)
Class labels are organised in a hierarchy.
The learning algorithm takes into account the relations among labels.
It can therefore achieve better performance than a flat learning algorithm on HC problems.

Adapting Hieron for OBIE
Hieron is an effective and efficient learning algorithm for HC, proposed by Dekel et al. (2004).
OBIE differs from HC in two respects:
IE vs. classification.
Ontology vs. taxonomy.
Hieron therefore has to be adapted for OBIE.

Hieron Learning
Learn a Perceptron classifier for each concept.
The difference between two classifiers is proportional to the cost of misclassifying one concept as another.
Given one training example, update the Perceptrons along the path from the true concept to the predicted one.
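The update on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a tree-shaped hierarchy, takes the cost to be the number of edges between two concepts, and follows the update rule as commonly stated for Hieron (Dekel et al., 2004). The `Hieron` class, the `parent` map and the concept names are hypothetical.

```python
import math

class Hieron:
    """Sketch of a Hieron-style online learner over a tree of concepts."""

    def __init__(self, parent, dim):
        # parent maps each concept to its parent; the root maps to None.
        self.parent = parent
        # One Perceptron weight vector (prototype) per concept.
        self.w = {c: [0.0] * dim for c in parent}

    def path(self, concept):
        # Concepts on the path from `concept` up to the root.
        nodes = []
        while concept is not None:
            nodes.append(concept)
            concept = self.parent[concept]
        return nodes

    def score(self, concept, x):
        # A concept's classifier is the sum of the prototypes on its root path.
        return sum(wi * xi
                   for c in self.path(concept)
                   for wi, xi in zip(self.w[c], x))

    def predict(self, x):
        return max(self.w, key=lambda c: self.score(c, x))

    def update(self, x, true):
        pred = self.predict(x)
        if pred == true:
            return pred  # no mistake, no update
        # Concepts on the tree path between the true and the predicted concept.
        up = set(self.path(true)) - set(self.path(pred))
        down = set(self.path(pred)) - set(self.path(true))
        gamma = len(up) + len(down)  # assumed cost: tree distance in edges
        loss = self.score(pred, x) - self.score(true, x) + math.sqrt(gamma)
        tau = loss / (gamma * sum(xi * xi for xi in x))
        for c in up:    # pull the true concept's path towards x
            self.w[c] = [wi + tau * xi for wi, xi in zip(self.w[c], x)]
        for c in down:  # push the predicted concept's path away from x
            self.w[c] = [wi - tau * xi for wi, xi in zip(self.w[c], x)]
        return pred

# Hypothetical toy hierarchy and a single training example.
parent = {"Entity": None, "Person": "Entity",
          "Organization": "Entity", "Company": "Organization"}
model = Hieron(parent, dim=2)
x = [1.0, 0.0]
first_guess = model.update(x, "Company")
```

After one update the margin between the true and the previously predicted concept equals the square root of their tree distance, which is how the cost of a mistake shapes the classifiers in this sketch.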

Our Modification to Hieron
Added a regularisation parameter for learning, so that learning stops after a finite number of loops on any training data.

Adaptation to OBIE
Learn two Hierons, one for the start tokens of information entities and another for the end tokens.
Add one concept to the ontology, representing the non-class.
When there is more than one path between two concepts, select the shortest path during training.
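Selecting the shortest path between two concepts can be sketched with a breadth-first search. This is an illustration under assumptions: the ontology is treated as an undirected graph of concept links, and the `shortest_path` helper and the concept names below are hypothetical, not taken from Proton or from the authors' code.

```python
from collections import deque

def shortest_path(edges, source, target):
    """Return one shortest path between two concepts, or None if unconnected."""
    # Build an undirected adjacency map from the concept links.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical ontology fragment with two routes from Agent to Company.
edges = [("Entity", "Agent"), ("Agent", "Organization"),
         ("Organization", "Company"), ("Entity", "LegalEntity"),
         ("LegalEntity", "Company")]
route = shortest_path(edges, "Agent", "Company")
```

Breadth-first search visits concepts in order of distance from the source, so the first time the target is reached the recovered path has the fewest links.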

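Ontology Sensitive F-measure
Cost of misclassifying an example of concept A as another concept B: ecost(A, B)
Overall accuracy $A_n$ for n entities is the sum of the n individual accuracies:
$A_n = \sum_{i=1}^{n} (1 - \mathrm{ecost}(A_i, B_i))$
$\mathrm{precision} = A_n / (A_n + N_{\mathrm{spurious}})$
$\mathrm{recall} = A_n / (A_n + N_{\mathrm{missing}})$
$F_1 = 2 \cdot \mathrm{precision} \cdot \mathrm{recall} / (\mathrm{precision} + \mathrm{recall})$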
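The measure above can be computed directly from its definition. The sketch below is only an illustration: the function name `ontology_f1` and the cost function `toy_ecost` (0 for a correct concept, 0.5 for any other) are assumptions made for the example, and the counts are invented.

```python
def ontology_f1(pairs, ecost, n_spurious, n_missing):
    """Ontology-sensitive precision, recall and F1.

    pairs holds (true concept, predicted concept) for each matched entity.
    """
    # Each entity contributes 1 - ecost, so a near miss still earns partial credit.
    a_n = sum(1.0 - ecost(a, b) for a, b in pairs)
    precision = a_n / (a_n + n_spurious) if a_n + n_spurious else 0.0
    recall = a_n / (a_n + n_missing) if a_n + n_missing else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def toy_ecost(a, b):
    # Hypothetical cost: no penalty for the right concept, half credit otherwise.
    return 0.0 if a == b else 0.5

pairs = [("Company", "Company"),
         ("Company", "Organization"),
         ("Person", "Person")]
precision, recall, f1 = ontology_f1(pairs, toy_ecost, n_spurious=1, n_missing=2)
```

With a cost that is 0 for correct concepts and 1 for every other pair, the same function reduces to the conventional precision, recall and F1.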

Experimental Dataset
Sekt ontology-annotated news corpus
Consists of 290 news articles, divided into three themes: business, international politics and UK politics.
Manually annotated according to the Proton ontology: 146 concepts were used for annotation.
Created within the EU project Sekt.
Pre-processed the corpus using ANNIE
Obtained domain-independent linguistic features, such as each token's form, lemma, simple type, POS and named entity type.

Experimental Results (1): Conventional F1
                PAUM    SVM     Hieron
Business        0.741   0.753   0.827
Int.-politics   0.771   0.801   0.833
UK-politics     0.820   0.829   0.825

Experimental Results (2): Ontology Based F1
                PAUM    SVM     Hieron
Business        0.788   0.793   0.912
Int.-politics   0.830   0.859   0.913
UK-politics     0.836   0.844   0.901

Experimental Results (3): Computational time
              PAUM   SVM      Hieron
Training      552s   11450s   3815s
Application   33s    111s     109s

Regularisation Parameter for Hieron
                Single loop   300 loops   Regularisation
F1              0.798         0.813       0.825
Ontology F1     0.890         0.893       0.901
Training time   510s          54173s      3815s

Conclusions
Explored the structure of the ontology in semantic annotation.
Hieron, after adaptation, performed well for OBIE.
Future research:
Use other cost measures instead of distance.
SVM based learning algorithms.