Transductive Inference for Text Classification using Support Vector Machines - Thorsten Joachims (1999) University of Seoul, School of Electrical and Computer Engineering, Data Mining Lab, G 노준호
Table of Contents - Introduction, Text Classification, Transductive Support Vector Machines, Experiments
Introduction - Text classification (e.g., with SVMs) can be used to organize document databases, filter spam, and learn users' newsreading preferences. Problem: there is little training data but a large test set. Solution: transductive inference (semi-supervised learning).
Text Classification - Text classification with machine learning: (1) learn a classifier from labeled examples; (2) the classifier then assigns categories to new documents automatically. Documents are strings of characters, so they must be mapped to features (here: words). Information Retrieval (IR) research suggests that word stems work well as features (computes, computing, computer → comput) and that word ordering can be ignored.
Text Classification - Representing text as a feature vector
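A minimal sketch of the representation described above, assuming whitespace tokenization and a deliberately crude suffix stripper. The helpers `crude_stem` and `bag_of_words` and the `SUFFIXES` list are hypothetical stand-ins for a real stemmer such as Porter's; the point is only that stemming merges word variants and that counting stems discards word order.

```python
from collections import Counter

# Hypothetical suffix list for illustration only; a real system would use
# a proper stemming algorithm (e.g. Porter's).
SUFFIXES = ("ing", "er", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text: str) -> Counter:
    """Map a document to stem counts; word order is discarded."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return Counter(crude_stem(t) for t in tokens if t)
```

For example, `bag_of_words("Computer computes; computing!")` yields `Counter({'comput': 3})`, and `"dog bites man"` and `"man bites dog"` map to the same vector.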
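Text Classification - Representation of text: TF-IDF. $\mathrm{TF}(w_i, \vec{x})$ (term frequency) is the number of times word $w_i$ occurs in document $\vec{x}$. IDF (inverse document frequency) is computed from the document frequency $\mathrm{DF}(w_i)$, the number of documents in which $w_i$ occurs: $\mathrm{IDF}(w_i) = \log \frac{n}{\mathrm{DF}(w_i)}$, where $n$ is the total number of documents. The IDF of a word is low if it occurs in many documents and highest if it occurs in only one.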
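The weighting above can be sketched in a few lines, assuming documents are already tokenized (and stemmed) into lists of words. `tfidf_vectors` is a hypothetical helper name; it multiplies $\mathrm{TF}(w_i, \vec{x})$ by $\log(n / \mathrm{DF}(w_i))$ and returns sparse vectors as dictionaries.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by TF(w, x) * IDF(w), with IDF(w) = log(n / DF(w))."""
    n = len(docs)
    # DF(w): number of documents that contain word w at least once
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # TF(w, x): occurrences of w in document x
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors
```

A word that appears in every document gets weight 0, matching the intuition on the slide. Many setups additionally scale each vector to unit length; that step is not shown here.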
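Transductive Support Vector Machines - SVM (inductive): minimize $\frac{1}{2}\lVert\vec{w}\rVert^2$ over $\vec{w}, b$ subject to $\forall_{i=1}^{n}: y_i[\vec{w}\cdot\vec{x}_i + b] \ge 1$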
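To make the constraints concrete, here is a small sketch that does not solve the optimization problem but evaluates a candidate hyperplane $(\vec{w}, b)$: it checks every constraint and reports the objective $\frac{1}{2}\lVert\vec{w}\rVert^2$ and $1/\lVert\vec{w}\rVert$. The helper names `dot` and `hard_margin_svm_check` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hard_margin_svm_check(w, b, X, y):
    """Return (feasible, objective, margin) for the inductive SVM problem:
    minimize 0.5*||w||^2 subject to y_i * (w . x_i + b) >= 1 for all i."""
    feasible = all(yi * (dot(w, xi) + b) >= 1 for xi, yi in zip(X, y))
    objective = 0.5 * dot(w, w)
    norm = math.sqrt(dot(w, w))
    # If feasible, every training point lies at least 1/||w|| from the hyperplane.
    margin = 1 / norm if norm > 0 else math.inf
    return feasible, objective, margin
```

With `X = [(2.0, 0.0), (-2.0, 0.0)]` and `y = [1, -1]`, the hyperplane `w = (0.5, 0.0), b = 0.0` is feasible with objective 0.125 and margin 2.0; `w = (1.0, 0.0)` is also feasible but has a larger objective, while `w = (0.4, 0.0)` violates the constraints. Minimizing $\lVert\vec{w}\rVert$ therefore maximizes the margin.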
Transductive Support Vector Machines - [Figure: decision boundaries of SVM vs. TSVM; training examples shown as +/-, test examples as dots]
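Transductive Support Vector Machines - TSVM: minimize over $y_1^*, \dots, y_k^*, \vec{w}, b, \xi_1, \dots, \xi_n, \xi_1^*, \dots, \xi_k^*$: $\frac{1}{2}\lVert\vec{w}\rVert^2 + C\sum_{i=1}^{n}\xi_i + C^*\sum_{j=1}^{k}\xi_j^*$ subject to $\forall_{i=1}^{n}: y_i[\vec{w}\cdot\vec{x}_i + b] \ge 1 - \xi_i$, $\forall_{j=1}^{k}: y_j^*[\vec{w}\cdot\vec{x}_j^* + b] \ge 1 - \xi_j^*$, $\forall_{i=1}^{n}: \xi_i \ge 0$, $\forall_{j=1}^{k}: \xi_j^* \ge 0$. Notation: * marks test data; $C$ (and $C^*$ for the test examples) is the parameter that trades off margin size against training error; $\xi$ measures the degree of misclassification of the data.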
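The objective above can be evaluated directly for a fixed hyperplane and a fixed labeling of the test data, with each slack set to the smallest value that satisfies its constraint (the hinge loss). The sketch below also brute-forces the test labels $y^*$ for a fixed $(\vec{w}, b)$; this is exponential in $k$ and is only a toy illustration, not the training procedure from the paper. All function names are hypothetical.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(label, x, w, b):
    """Smallest slack xi >= 0 with label * (w . x + b) >= 1 - xi."""
    return max(0.0, 1.0 - label * (dot(w, x) + b))


def tsvm_objective(w, b, X, y, X_test, y_test, C, C_star):
    """0.5*||w||^2 + C * sum(xi_i) + C* * sum(xi*_j) for a fixed test labeling."""
    train_slack = sum(hinge(yi, xi, w, b) for xi, yi in zip(X, y))
    test_slack = sum(hinge(yj, xj, w, b) for xj, yj in zip(X_test, y_test))
    return 0.5 * dot(w, w) + C * train_slack + C_star * test_slack


def best_test_labels(w, b, X, y, X_test, C, C_star):
    """Try every labeling of the test set for a FIXED (w, b); toy sizes only."""
    return min(
        product((-1, 1), repeat=len(X_test)),
        key=lambda ys: tsvm_objective(w, b, X, y, X_test, ys, C, C_star),
    )
```

For example, with training points `(2, 0)` (+1) and `(-2, 0)` (-1), test points `(1.5, 1.0)` and `(-3.0, 0.5)`, `w = (0.5, 0.0)`, `b = 0.0` and `C = C_star = 1.0`, the best labeling is `(1, -1)` with objective 0.375. A full TSVM also moves the hyperplane while choosing the labels, which is what makes the problem hard.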
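Transductive Support Vector Machines - How can a TSVM do better? It exploits strong co-occurrence patterns of words in the test documents. Training data: D1 (category A), D6 (category B). SVM: the category of test document D3 is unclear (?). TSVM: test document D3 is assigned to category A.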
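The intuition can be illustrated with a hypothetical toy corpus (the documents below are invented, not those on the slide): D3 shares no word with the labeled document D1, yet it is linked to D1 through an unlabeled document that shares words with both. The sketch follows such shared-word links breadth-first; it only shows why unlabeled documents carry information and is not how a TSVM computes its solution.

```python
# Hypothetical toy corpus: word sets per document (invented for illustration).
DOCS = {
    "D1": {"nuclear", "physics"},
    "D2": {"physics", "atom"},
    "D3": {"atom", "reactor"},
    "D4": {"basil", "salt"},
    "D5": {"salt", "parsley"},
    "D6": {"parsley", "recipe"},
}
LABELS = {"D1": "A", "D6": "B"}  # only two labeled training documents


def co_occurrence_label(doc_id, docs, labels):
    """Follow links between documents that share at least one word
    (breadth-first) and return the label of the first labeled document reached."""
    seen, frontier = {doc_id}, [doc_id]
    while frontier:
        next_frontier = []
        for d in frontier:
            if d in labels:
                return labels[d]
            for other, words in docs.items():
                if other not in seen and words & docs[d]:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return None
```

Here D3 reaches D1 via D2 ("atom", then "physics") and gets category A, while D4 reaches D6 via D5 and gets category B.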
Experiments - Test collections: Reuters dataset, WebKB collection, Ohsumed corpus. Performance measure: precision/recall breakeven point (PRBEP), the value at which precision equals recall; at that point the F1 measure takes the same value.
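A sketch of both measures, assuming binary labels (1 = belongs to the category) and real-valued classifier scores. One common way to obtain a breakeven point is to rank documents by score and cut at $k$ = number of positive documents: there the number of predicted positives equals the number of actual positives, so precision and recall coincide. The function names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def breakeven_point(y_true, scores):
    """Rank by score and cut at k = number of positives, where precision == recall.
    Ties in scores are broken arbitrarily by the sort."""
    k = sum(1 for t in y_true if t == 1)
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, t in ranked[:k] if t == 1)
    return hits / k if k else 0.0
```

For `y_true = [1, 1, 0, 1, 0]` and `scores = [0.9, 0.2, 0.8, 0.7, 0.1]`, the top three documents contain two positives, so the breakeven point is 2/3; thresholding the scores at 0.5 happens to give precision = recall = F1 = 2/3 as well.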
Experiments - Reuters (average)
Experiments - WebKB (category "course")
Experiments - WebKB (category "project")
Experiments - Reuters - Ohsumed - WebKB