Graph-based Text Classification: Learn from Your Neighbors. Ralitsa Angelova, Gerhard Weikum, Max Planck Institute for Informatics, Stuhlsatzenhausweg, Saarbrücken, Germany. Presented by Chia-Hao Lee.

2 Outline: Introduction, Graph-based Classification, Incorporating Metric Label Distances, Experiments, Conclusion

3 Introduction Automatic classification is a supervised learning technique for assigning thematic categories to data items such as customer records, gene-expression records, Web pages, or text documents. The standard approach represents each data item by a feature vector and learns the parameters of a mathematical decision model. Such a classifier is context-free: the decision is based only on the feature vector of a given data item, disregarding the other data items in the test set.
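To make the context-free baseline concrete, here is a minimal sketch (not from the paper; the library choice, toy documents, and category names are assumptions) of a classifier that looks only at each document's own feature vector:

```python
# Minimal context-free baseline: each document is classified from its own
# TF-IDF feature vector, ignoring all other documents in the test set.
# (Illustrative sketch only; data and category names are made up.)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["stocks fell sharply today", "the striker scored twice"]
train_labels = ["business", "sports"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)    # local feature vectors
model = MultinomialNB().fit(X_train, train_labels)

test_docs = ["the goalkeeper saved a penalty"]
X_test = vectorizer.transform(test_docs)
print(model.predict(X_test))                      # decided per document, in isolation
```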

4 Introduction In many settings, this “context-free” approach does not exploit the available information about relationships between data items. Using the relationship information, we can construct a graph G in which each data item is a node and each relationship instance forms an edge between the corresponding nodes. In the following we will mostly focus on text documents with links to and from other documents.

5 Introduction A straightforward approach to capturing a document’s neighbors would be to incorporate the features and feature weights of the neighbors into the feature vector of the given document itself. A more advanced approach is to model the mutual influence between neighboring documents, aiming to estimate the class labels of all test documents simultaneously.
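As a concrete illustration of the first, straightforward approach (a sketch under assumed toy data structures, not the paper's implementation), the link structure can be kept as an adjacency list and each document's feature vector extended with a down-weighted sum of its neighbors' features:

```python
# Sketch: augment each document's feature vector with its neighbors' features.
# The graph is a plain adjacency list; alpha is an assumed down-weighting factor.
import numpy as np

features = {                     # toy local feature vectors (assumed)
    "d1": np.array([1.0, 0.0, 0.0]),
    "d2": np.array([0.0, 1.0, 0.0]),
    "d3": np.array([0.0, 0.0, 1.0]),
}
links = {"d1": ["d2", "d3"], "d2": ["d1"], "d3": ["d1"]}   # link structure
alpha = 0.5                      # weight given to neighbor content

augmented = {}
for doc, vec in features.items():
    neighbor_sum = sum(features[n] for n in links.get(doc, []))
    augmented[doc] = vec + alpha * neighbor_sum             # context-enriched vector

print(augmented["d1"])           # [1.0, 0.5, 0.5]
```

The second, more advanced approach (modeling mutual influence between neighbors) is what the following slides develop.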

6 Introduction A simple example of RL (relaxation labeling) is shown in Figure 1. Given a set of classes, we wish to assign to every document marked "?" its most probable label. The contingency matrix in Figure 1b) is estimated from the training data.
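Since Figure 1 is not reproduced here, a hypothetical two-class version of the example may help. Suppose the classes are A and B, and the contingency matrix estimated from the training links gives $P(\lambda(d)=A \mid \lambda(u)=A) = 0.8$ and $P(\lambda(d)=A \mid \lambda(u)=B) = 0.3$. For a document marked "?" whose already-labeled neighbors carry labels A, A, B, treating the neighbors independently gives

$$\mathrm{score}(A) \propto 0.8 \cdot 0.8 \cdot 0.3 = 0.192, \qquad \mathrm{score}(B) \propto 0.2 \cdot 0.2 \cdot 0.7 = 0.028,$$

so the "?" node is assigned label A. This is the kind of neighborhood evidence that relaxation labeling propagates iteratively.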

7 Introduction The theory paper by Kleinberg and Tardos views the classification problem for nodes in an undirected graph as a metric labeling problem, in which we minimize a combinatorial objective consisting of assignment costs and separation costs.
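For reference, the metric labeling objective has the following standard form (the standard formulation, not copied from the paper), where $c(u, f(u))$ is the assignment cost of giving node $u$ label $f(u)$, $w_{uv}$ is the edge weight, and $d(\cdot,\cdot)$ is a metric on the label set:

$$\min_{f} \; \sum_{u \in V} c\big(u, f(u)\big) \;+\; \sum_{(u,v) \in E} w_{uv}\, d\big(f(u), f(v)\big).$$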

8 Graph-Based Classification Our approach is based on a probabilistic formulation of the classification problem and uses a relaxation labeling technique, from which two major approaches for finding the most likely labeling λ of the given test graph are derived: hard and soft labeling. Notation: D is a set of documents; G is a graph whose vertices correspond to the documents and whose edges represent the link structure of D; λ(u) is the label of node u; each document d also has a feature vector that locally captures its content.

9 Graph-Based Classification Taking into account the underlying link structure and document d's content-based feature vector, we consider the probability that a given label is assigned to d. In the spirit of the introduction's discussion on emphasizing the influence of the immediate neighbors of each document, we assume that the label of d, given the labels of its immediate neighbors, is independent of the labels of all other nodes in the graph, and we abbreviate the resulting conditional probability accordingly.
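In symbols (notation such as $x(d)$ for d's feature vector and $N(d)$ for its immediate neighbors is chosen here for readability and is not necessarily the paper's), the quantity of interest and the neighborhood assumption read

$$P\big(\lambda(d)=c \mid G, x(d)\big) \;\approx\; P\big(\lambda(d)=c \mid \lambda(N(d)), x(d)\big),$$

i.e. given the labels of its immediate neighbors $N(d)$, the label of d is assumed independent of the labels of all other nodes.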

10 Graph-Based Classification We likewise abbreviate the graph-unaware probability that is based only on d's local content. Under the additional independence assumption that there is no direct coupling between the content of a document and the labels of its neighbors, a central equation holds for the total probability: it sums up the posterior probabilities over all possible labelings of the neighborhood.
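Written out under the assumptions above (a hedged reconstruction of the equation described in the prose, in the same hypothetical notation), the total probability sums over all possible labelings $\lambda_N$ of the neighborhood $N(d)$:

$$P\big(\lambda(d)=c \mid G, x(d)\big) \;=\; \sum_{\lambda_N} P\big(\lambda(d)=c \mid x(d), \lambda_N\big)\, P\big(\lambda_N\big),$$

where $P(\lambda(d)=c \mid x(d))$ denotes the graph-unaware probability based only on d's local content.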

11 Graph-Based Classification In the same vein, if we further assume independence among all neighbor labels of the same node, the sum over neighborhood labelings factorizes, which yields the formulation of our neighborhood-conscious classification problem. This can be computed in an iterative manner.
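Assuming the neighbor labels are independent of one another (again a hedged reconstruction in the hypothetical notation), the neighborhood prior factorizes and the estimate can be updated iteratively from round $r$ to round $r+1$:

$$P(\lambda_N) \;=\; \prod_{u \in N(d)} P\big(\lambda(u)\big), \qquad P^{(r+1)}\big(\lambda(d)=c\big) \;=\; \sum_{\lambda_N} P\big(\lambda(d)=c \mid x(d), \lambda_N\big) \prod_{u \in N(d)} P^{(r)}\big(\lambda(u)=\lambda_N(u)\big),$$

where $P^{(0)}$ is initialized with the content-only (graph-unaware) probabilities.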

12 Graph-Based Classification Hard labeling: in contrast to the soft labeling approach, we also consider a method that treats only the most probable label assignments in the test document's neighborhood as significant for the computation; each neighbor contributes only its currently most probable label.
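In this hard variant, each neighbor $u$ contributes only its current maximum-probability label (hypothetical notation as above):

$$\hat{\lambda}^{(r)}(u) \;=\; \arg\max_{c'} P^{(r)}\big(\lambda(u)=c'\big), \qquad P^{(r+1)}\big(\lambda(d)=c\big) \;\propto\; P\Big(\lambda(d)=c \;\Big|\; x(d),\, \{\hat{\lambda}^{(r)}(u) : u \in N(d)\}\Big).$$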

13 Graph-Based Classification Soft labeling: the soft labeling approach aims to achieve better classification accuracy by avoiding the overly eager "rounding" performed by the hard labeling approach.
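A compact sketch of the soft iterative scheme follows (an illustrative implementation under the independence assumptions above, not the authors' code; the link contingency model and the content-only probabilities are assumed inputs):

```python
# Soft relaxation labeling sketch: label distributions of test nodes are updated
# from (i) the content-only probability and (ii) the current soft distributions
# of their neighbors, weighted by a link contingency model cond[c_doc][c_nbr].
import numpy as np

classes = ["A", "B"]
content_prob = {"d1": np.array([0.6, 0.4]),   # graph-unaware P(label | content), assumed
                "d2": np.array([0.5, 0.5])}
neighbors = {"d1": ["d2"], "d2": ["d1"]}
cond = np.array([[0.8, 0.3],                  # P(doc label = row | neighbor label = col), assumed
                 [0.2, 0.7]])

probs = dict(content_prob)                    # initialize with content-only estimates
for _ in range(10):                           # fixed number of relaxation rounds
    new_probs = {}
    for d in probs:
        score = content_prob[d].copy()
        for u in neighbors[d]:
            # expected neighbor influence under u's current soft label distribution
            score *= cond @ probs[u]
        new_probs[d] = score / score.sum()    # renormalize to a distribution
    probs = new_probs

print({d: dict(zip(classes, p.round(3))) for d, p in probs.items()})
```

The hard variant would replace each neighbor's soft distribution probs[u] with a one-hot vector at its current argmax label.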

14 Incorporating Metric Label Distance Intuitively, neighboring documents should receive similar class labels. For example, suppose we have a set of classes and we wish to find the most probable label for a test document d. A document discussing scientific problems (S) would be much farther away from both C and E. So, a similarity measure imposed on the set of labels would have a high value for the pair (C, E) and small values for the pairs (C, S) and (E, S).

15 Incorporating Metric Label Distance This is why introducing a metric on the labels should help improve the classification result: in this metric, similar classes are separated by a shorter distance and impose a smaller separation cost on an edge labeling. Our approach, in contrast, is general: we construct the metric Γ automatically from the training data. We incorporate the label metric into the iterations for computing the probability of an edge labeling by treating the label distance as a scaling factor.

16 Incorporating Metric Label Distance In this way, we magnify the impact of edges between nodes with similar labels and scale down the impact of edges between dissimilar ones.
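One plausible way to realize this scaling (an assumption for illustration; the exact functional form in the paper may differ) is to multiply each neighbor's contribution in the update by a similarity factor derived from the label metric Γ, e.g.

$$s(c, c') \;=\; 1 - \frac{\Gamma(c, c')}{\max_{a,b} \Gamma(a, b)}, \qquad P^{(r+1)}\big(\lambda(d)=c\big) \;\propto\; P\big(\lambda(d)=c \mid x(d)\big) \prod_{u \in N(d)} \sum_{c'} s(c, c')\, P\big(\lambda(d)=c \mid \lambda(u)=c'\big)\, P^{(r)}\big(\lambda(u)=c'\big),$$

so that edges toward neighbors with similar labels are emphasized and edges toward dissimilar ones are damped.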

17 Experiments We have tested our graph-based classifier on three different data sets. The first one includes approximately 16,000 scientific publications chosen from the DBLP database. The second dataset has been selected from the Internet Movie Database (IMDB). The third dataset used in the experiments was drawn from the online encyclopedia Wikipedia.

18-22 Experiments (result figures and tables)

23 Conclusion The presented GC method for graph-based classification is a way of exploiting the context relationships of data items. Incorporating metric distances among different labels contributed to the very good performance of the GC method. This is a new form of exploiting knowledge about the relationships among category labels and thus about the structure of the classifier's target space.