SVM Based Learning System for F-term Patent Classification

Yaoyong Li, Kalina Bontcheva, Hamish Cunningham
Department of Computer Science, University of Sheffield
{yaoyong,kalina,hamish}@dcs.shef.ac.uk
http://gate.ac.uk/
http://nlp.shef.ac.uk/

SVM for F-term Classification
- Experiments with different parts of the patent.
- Adapting the SVM to the F-term patent classification subtask.
- Differences between conventional document classification and F-term patent classification.
- A hierarchical SVM for the F-term taxonomy.

Using Different Parts of the Patent
Results obtained using different parts of the patent:

Content used                     A-Precision  R-Precision  F-measure
Abstract only                    0.4279       0.3908       0.3647
Full text of patent              0.4688       0.4270       0.3998
Full text + F-term description   0.4779       0.4363       0.4125
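The three measures in the table are computed per patent over the ranked list of candidate F-terms. The slides do not define them, so the sketch below assumes the usual definitions: A-Precision as non-interpolated average precision, R-Precision as precision within the top R ranks (R being the number of true F-terms), and F-measure over a thresholded set of predicted F-terms. The F-term labels in the example are hypothetical.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Assumed A-Precision: mean precision at the rank of each true F-term."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, fterm in enumerate(ranked, start=1):
        if fterm in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # true F-terms that are never ranked contribute 0


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision within the top R ranks, where R = number of true F-terms."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for fterm in ranked[:r] if fterm in relevant) / r


def f_measure(predicted: Set[str], relevant: Set[str]) -> float:
    """Harmonic mean of precision and recall of a thresholded prediction set."""
    tp = len(predicted & relevant)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical patent: four F-terms ranked by score, two of them correct.
ranked = ["t1", "t2", "t3", "t4"]
relevant = {"t1", "t3"}
print(average_precision(ranked, relevant))  # (1/1 + 2/3) / 2
print(r_precision(ranked, relevant))        # top 2 contains only t1 -> 0.5
print(f_measure({"t1", "t2"}, relevant))    # P = R = 0.5 -> 0.5
```

Averaging these per-patent values over all test patents would give one figure per measure, as in the table.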

Evaluation of F-term Classification
- Conventional document classification is category-based: a classifier is learned for each category and evaluated on new documents.
- F-term patent classification is patent-based: for each patent, the scores of all the F-term classifiers are compared with one another.
- The scores of the different classifiers therefore need to be normalised so that they can be compared fairly.
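Why normalisation matters can be seen with two classifiers whose raw outputs live on different scales. The slides do not say which normalisation was used; this sketch assumes per-classifier z-score normalisation over the test patents, which is only one common choice, and uses made-up scores and hypothetical names.

```python
import statistics
from typing import Dict, List

# scores[fterm][patent]: raw output of the classifier for `fterm` on `patent`.
Scores = Dict[str, Dict[str, float]]


def zscore_normalise(scores: Scores) -> Scores:
    """Assumed scheme: rescale each classifier's outputs to mean 0, std 1."""
    normalised: Scores = {}
    for fterm, per_patent in scores.items():
        values = list(per_patent.values())
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # avoid dividing by zero
        normalised[fterm] = {p: (s - mean) / std for p, s in per_patent.items()}
    return normalised


def rank_fterms(scores: Scores, patent: str) -> List[str]:
    """Patent-based evaluation: order every F-term by its score for one patent."""
    return sorted(scores, key=lambda fterm: scores[fterm][patent], reverse=True)


# Hypothetical outputs: classifier "A" scores everything high, "B" is spread out.
raw = {
    "A": {"p1": 2.0, "p2": 2.2, "p3": 2.4},
    "B": {"p1": 1.0, "p2": -1.0, "p3": -1.0},
}
print(rank_fterms(raw, "p1"))                    # ['A', 'B']: A wins on scale alone
print(rank_fterms(zscore_normalise(raw), "p1"))  # ['B', 'A']
```

Without normalisation, A's uniformly large outputs put it first for every patent; after normalisation, p1 ranks B first because p1 is B's strongest patent and A's weakest.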

Benefit of Normalisation
Results with and without score normalisation:

                        A-Precision  R-Precision  F-measure
Without normalisation   0.4643       0.4330       0.3677
With normalisation      0.4779       0.4363       0.4125

Hierarchical SVM
- Flat SVM: learn an SVM classifier for each class using the one vs. all others approach, ignoring the relations between classes.
- Hierarchical SVM (H-SVM) for the taxonomy: learn an SVM classifier for each class using only the training examples that are positive examples of its parent class.
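The two schemes differ only in which training patents each class's SVM sees. The sketch below builds the labelled training set for one class under each scheme. The toy taxonomy, patent IDs and class codes are hypothetical, label sets are assumed to already include every ancestor class, and the SVM training step itself is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

Example = Tuple[str, Set[str]]  # (patent id, true classes incl. ancestors)


def flat_training_set(examples: List[Example], cls: str) -> List[Tuple[str, int]]:
    """One vs. all others: every patent is used, +1 if it carries `cls`, else -1."""
    return [(doc, 1 if cls in labels else -1) for doc, labels in examples]


def hierarchical_training_set(
    examples: List[Example], cls: str, parent: Dict[str, Optional[str]]
) -> List[Tuple[str, int]]:
    """H-SVM: keep only patents that are positive for the parent of `cls`."""
    par = parent[cls]
    return [
        (doc, 1 if cls in labels else -1)
        for doc, labels in examples
        if par is None or par in labels  # top-level classes see every patent
    ]


# Hypothetical taxonomy and training data.
parent = {"AA": None, "AA01": "AA", "AA02": "AA", "BB": None}
examples = [("p1", {"AA", "AA01"}), ("p2", {"AA", "AA02"}), ("p3", {"BB"})]
print(flat_training_set(examples, "AA01"))
# [('p1', 1), ('p2', -1), ('p3', -1)]
print(hierarchical_training_set(examples, "AA01", parent))
# [('p1', 1), ('p2', -1)]  (p3 is not under AA, so it is dropped)
```

Restricting the negatives to siblings under the same parent makes each H-SVM classifier a finer discriminator, but it also means a child classifier is only meaningful for patents that its parent classifier accepted.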

Comparison of H-SVM and Flat SVM
Using the conventional measures:

           A-Precision  R-Precision  F-measure
Flat SVM   0.4779       0.4363       0.4125
H-SVM      0.2376       0.2164       0.2257

Comparison of H-SVM and Flat SVM
Using measures that take into account the relations between F-terms:

           A-Precision  R-Precision  F-measure
Flat SVM   0.6269       0.6194       0.4429
H-SVM      0.5193       0.5414       0.3605

Possible Reasons for the Failure of H-SVM
- F-term classification is a multi-label problem: one instance may have more than one true class. However, the H-SVM we used was designed for the case where each instance has only one true class.
- The F-terms under a given theme are not, in the strict sense, hierarchically related to one another.
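The first reason can be illustrated with a generic top-down decoder. The slides do not describe the H-SVM's decoding procedure; this sketch assumes a single-label design that follows the best-scoring child at each level, and contrasts it with a thresholded multi-label descent, which is shown for comparison only and is not the authors' method. The taxonomy, scores and 0.5 threshold are hypothetical.

```python
from typing import Dict, List

# Hypothetical taxonomy: node -> children; "root" is a virtual top node.
CHILDREN: Dict[str, List[str]] = {
    "root": ["AA", "BB"],
    "AA": ["AA01", "AA02"],
    "BB": ["BB01"],
}


def single_path_decode(scores: Dict[str, float], node: str = "root") -> List[str]:
    """Single-label top-down decoding: follow the best-scoring child at each level."""
    path: List[str] = []
    while node in CHILDREN:
        node = max(CHILDREN[node], key=lambda child: scores[child])
        path.append(node)
    return path


def multi_label_decode(scores: Dict[str, float], threshold: float,
                       node: str = "root") -> List[str]:
    """Multi-label alternative: descend into every child scoring above threshold."""
    selected: List[str] = []
    for child in CHILDREN.get(node, []):
        if scores[child] > threshold:
            selected.append(child)
            selected.extend(multi_label_decode(scores, threshold, child))
    return selected


# Hypothetical patent whose true F-terms are AA01 and BB01.
scores = {"AA": 0.9, "BB": 0.6, "AA01": 0.7, "AA02": 0.2, "BB01": 0.8}
print(single_path_decode(scores))       # ['AA', 'AA01']: BB01 can never be reached
print(multi_label_decode(scores, 0.5))  # ['AA', 'AA01', 'BB', 'BB01']
```

Because the single-path decoder commits to one branch at the top level, it can return at most one leaf per patent, so any second true F-term in another branch is always missed.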

Conclusions
- Successfully adapted the SVM to the F-term patent classification subtask.
- Demonstrated that using more information (the full text plus the F-term descriptions) improves the results.
- Analysed the failure of the H-SVM.
- Open question: how can the H-SVM be adapted to this task?