Document Clustering with Prior Knowledge
Xiang Ji et al. Document Clustering with Prior Knowledge. SIGIR 2006.
Presenter: Suhan Yu


Traditional Clustering Methods
Methods:
– K-Means
– Ratio Cut
– Average Association
– Normalized Cut
– Min-Max Cut
Important question:
– For each given data set, there are always many possible ways of partitioning the data set.

Traditional Clustering Methods
Related Work on Semi-Supervised Learning:
– Wagstaff et al. introduced two types of constraints: “must link” and “cannot link”.
– Basu et al. developed a semi-supervised K-means that makes use of labeled data to generate initial seed clusters and to guide the clustering process.

Normalized Cut
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Model the given document set using an undirected graph G(V,E,W):
– V: vertex set; each vertex represents a document vector.
– E: edge set; each edge is assigned a weight that reflects the similarity between the two documents it connects.
– W: graph affinity matrix.
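A minimal sketch of how the affinity matrix W could be built from document vectors. It assumes cosine similarity as the edge weight and leaves the diagonal at zero (no self-loops); the slide does not say which similarity measure is used, so both choices, along with the function name `cosine_affinity`, are illustrative assumptions.

```python
import math


def cosine_affinity(docs):
    """Return the n x n affinity matrix W, W[i][j] = cosine similarity.

    Assumption: cosine similarity is the edge weight; the diagonal is
    left at 0.0 so that no document is linked to itself.
    """
    norms = [math.sqrt(sum(v * v for v in d)) for d in docs]
    n = len(docs)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # an all-zero vector has no defined direction
            dot = sum(a * b for a, b in zip(docs[i], docs[j]))
            # The graph is undirected, so W is symmetric.
            W[i][j] = W[j][i] = dot / (norms[i] * norms[j])
    return W


# Toy term-frequency vectors: documents 0 and 1 share a direction,
# document 2 uses a disjoint vocabulary.
docs = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
W = cosine_affinity(docs)
```

Documents 0 and 1 come out with a similarity close to 1, while document 2 has weight 0 to both of them.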

Normalized Cut
– Measures how tightly the cluster S is connected with the rest of the data set.
– Measures how compact the entire data set is.
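As a hedged illustration, the sketch below assumes these two measures are the standard cut and association quantities from Shi and Malik: cut(S, V−S) sums the weights leaving S, and assoc(S, V) sums all weights attached to vertices in S. The function name `cut_and_assoc` and the toy matrix are hypothetical.

```python
def cut_and_assoc(W, S):
    """Return (cut(S, V-S), assoc(S, V)) for affinity matrix W.

    Assumption: these are the standard Shi-Malik quantities.
    """
    S = set(S)
    n = len(W)
    # Total weight of edges from S to the rest of the graph.
    cut = sum(W[i][j] for i in S for j in range(n) if j not in S)
    # Total weight of edges from S to every vertex in the graph.
    assoc = sum(W[i][j] for i in S for j in range(n))
    return cut, assoc


# Two tight pairs {0, 1} and {2, 3}, weakly linked to each other.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
cut, assoc = cut_and_assoc(W, [0, 1])
ratio = cut / assoc  # this cluster's term in the normalized cut
```

A small ratio means the cluster is weakly tied to the rest of the data relative to its total connectivity.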

Normalized Cut
Let each cluster have an indicator vector, each element of which takes a binary value in {1, 0}. Then we get:
– D: diagonal matrix
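For reference, the standard normalized cut of Shi and Malik can be written in indicator-vector form as below. The symbols $x_k$, $S_k$, $K$ and $n$ are notation assumed here and may differ from the paper's own; $D$ is taken to be the diagonal degree matrix of $W$.

```latex
% Assumed notation: x_k \in \{0,1\}^n is the indicator vector of cluster S_k,
% and D is the diagonal degree matrix of the affinity matrix W.
\begin{align}
  D_{ii} &= \sum_{j=1}^{n} W_{ij}, \\
  \operatorname{cut}(S_k, V \setminus S_k)
    &= \sum_{i \in S_k,\; j \notin S_k} W_{ij}
     = x_k^{\top} (D - W)\, x_k, \\
  \operatorname{assoc}(S_k, V)
    &= \sum_{i \in S_k,\; j \in V} W_{ij}
     = x_k^{\top} D\, x_k, \\
  \operatorname{NCut}(S_1, \dots, S_K)
    &= \sum_{k=1}^{K}
       \frac{x_k^{\top} (D - W)\, x_k}{x_k^{\top} D\, x_k}.
\end{align}
```

Minimizing this over binary indicator vectors is a combinatorial problem, which is why spectral methods relax the vectors to real values and solve an eigenvalue problem instead.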

Normalized Cut Minimize the cost function

Normalized Cut

Incorporating Prior Knowledge
The prior knowledge is provided as several pairs of documents that the user wishes to be grouped into the same cluster.
– Constraint vector:
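One common way to encode such pairs, sketched here as an assumption rather than as the paper's exact definition, is a constraint vector per pair with +1 at one document, −1 at the other, and 0 elsewhere. Stacking these vectors as rows gives a constraint matrix U, and a cluster indicator vector x satisfies every constraint exactly when Ux = 0. The names `build_constraint_matrix` and `violation` are hypothetical.

```python
def build_constraint_matrix(pairs, n_docs):
    """Stack one constraint vector per must-link pair (i, j).

    Assumed encoding: +1 at position i, -1 at position j, 0 elsewhere.
    """
    U = []
    for i, j in pairs:
        row = [0.0] * n_docs
        row[i] = 1.0
        row[j] = -1.0
        U.append(row)
    return U


def violation(U, x):
    """Squared norm of U x; zero when x keeps every pair together."""
    return sum(sum(u * v for u, v in zip(row, x)) ** 2 for row in U)


# The user asks for documents 0 and 2 to share a cluster.
U = build_constraint_matrix([(0, 2)], n_docs=4)

satisfied = violation(U, [1, 0, 1, 0])  # 0 and 2 in the same cluster
broken = violation(U, [1, 0, 0, 0])     # 0 and 2 in different clusters
```

The penalty is zero for the first indicator vector and positive for the second, which is what lets it act as an extra term in a clustering cost function.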

The flow path of CNC
– Create the graph affinity matrix W, in which each element represents the similarity between two documents.
– Compute the diagonal matrix D.
– Form the constraint matrix U from the constraints provided by the user.
– Form the matrix and compute its K smallest eigenvalues and the corresponding eigenvectors.
– Project each document into the eigen-space spanned by the K eigenvectors.
– Apply the K-means algorithm to find the K document clusters within this eigen-space.
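The steps above can be sketched end to end with NumPy. This is an illustration under stated assumptions, not the paper's implementation: the transcript does not preserve the matrix whose eigenvectors are computed, so the form D^(-1/2) (D − W + β UᵀU) D^(-1/2) used below is an assumption, as are the penalty weight `beta`, the ±1 constraint encoding, and the small deterministic K-means helper.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Tiny K-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def constrained_spectral_clustering(W, must_link, k, beta=10.0):
    """Cluster documents from affinity matrix W and must-link pairs."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Diagonal (degree) matrix D, stored as a vector.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Constraint matrix U: one row per pair, +1 / -1 (assumed encoding).
    U = np.zeros((len(must_link), n))
    for r, (i, j) in enumerate(must_link):
        U[r, i], U[r, j] = 1.0, -1.0
    # ASSUMED matrix: D^(-1/2) (D - W + beta * U^T U) D^(-1/2).
    L = np.diag(d) - W + beta * (U.T @ U)
    M = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the K smallest.
    _, eigvecs = np.linalg.eigh(M)
    Y = eigvecs[:, :k]  # row i is document i in the eigen-space
    return kmeans(Y, k)


# Two groups of three documents: strong links inside, weak links across.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)
labels = constrained_spectral_clustering(W, must_link=[(0, 1)], k=2)
print(labels)
```

With this toy input the two groups of three documents are recovered, and the must-link pair (0, 1) ends up in the same cluster. A larger `beta` enforces the constraints more strictly at the cost of the unconstrained normalized cut objective.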

Data description
The paper evaluates its document clustering model on two data sets: the Reuters and 20 Newsgroups document corpora.
– The Newsgroups data set contains documents that were collected from 20 newsgroups in the public domain.

Evaluation
Given two sets of document clusters C and C’, their mutual information metric is defined as:
– 0: the two sets are independent
– 1: the two sets are identical
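A minimal sketch of a mutual information metric with this 0-to-1 range. It assumes the mutual information is normalized by the larger of the two cluster entropies, a common choice that matches the stated range; the exact formula is not preserved here, so treat the normalization and the function name as assumptions.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """MI(C, C') / max(H(C), H(C')), using base-2 logarithms.

    Assumption: normalization by the larger entropy.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))

    h_a = -sum((c / n) * math.log2(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log2(c / n) for c in count_b.values())
    denom = max(h_a, h_b)
    # Both clusterings put everything in one cluster: treat as identical.
    return mi / denom if denom > 0 else 1.0


# Same partition with renamed cluster ids versus an unrelated partition.
same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The metric does not depend on how clusters are numbered, so a clustering can be scored against ground-truth classes without first matching cluster ids to class labels.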

Result

Conclusion
– This paper proposed a constrained spectral clustering method (CNC) that incorporates the user’s prior knowledge into document cluster analysis.
– The CNC model is a very effective semi-supervised document clustering tool, especially with very few training samples.
– The CNC model does not form constraints for cannot-link prior knowledge.