Adaptive Multi-view Clustering via Cross Trace Lasso


Adaptive Multi-view Clustering via Cross Trace Lasso Dong Wang, Ran He, Liang Wang and Tieniu Tan Center for Research on Intelligent Perception and Computing National Lab of Pattern Recognition Institute of Automation, Chinese Academy of Sciences

Outline
Introduction to multi-view clustering
Existing methods
Multi-view clustering via trace lasso
Empirical results and analysis
Conclusion and future work

Multi-view data is pervasive
Example: webpage clustering from textual content, hyperlinks, images and user click data.
The same class of entities can be observed or modeled from various perspectives, leading to multi-view representations. Here each view is a (different) feature set or modality, rather than a physical viewing angle.
Goal: discover the true underlying cluster membership of the data points given their feature representations under multiple views.

Another example: community detection in social media
Social circles in an ego network: groups with different relations or interests.
User demographic features (view 1): age, gender, job …
User online behavior features (view 2): likes, comments, clicks …

Main challenges for multi-view clustering
Data affinity is key to multi-view clustering: how can we use it effectively?
Correlated data under one view may not stay correlated under other views: how can we obtain a consistent clustering result?
Computational efficiency: how can we deal with high-dimensional data?

Existing methods for multi-view clustering
PairwiseSC [16]: co-regularizes the eigenvectors of all views' Laplacian matrices.
Co-Training [15]: alternately modifies one view's graph structure using the other view's information.
Multi-NMF [18]: a multi-view non-negative matrix factorization method to group the multi-view data.
Multi-SS [21]: a structured-sparsity-based multi-view clustering and feature learning framework.

We then propose: Adaptive Multi-view Clustering via Cross Trace Lasso

Trace lasso was introduced by Grave, Obozinski and Bach, "Trace Lasso: a trace norm regularization for correlated designs", NIPS 2011.

Trace lasso
A regularized least squares with the trace lasso regularizer:
  min_w (1/2) ||y − Xw||₂² + λ ||X Diag(w)||_*
where ||·||_* denotes the trace (nuclear) norm, i.e. the sum of singular values.
Property: if the data matrix X is column-wise normalized so that every column has unit norm, then
  ||w||₂ ≤ ||X Diag(w)||_* ≤ ||w||₁
so the penalty interpolates between the L2 norm and LASSO (Least Absolute Shrinkage and Selection Operator).
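The penalty and the regularized objective above are straightforward to evaluate numerically; a minimal NumPy sketch (function names are our own, not from the paper):

```python
import numpy as np

def trace_lasso_penalty(X, w):
    # Trace lasso regularizer: nuclear norm of X @ Diag(w)
    return np.linalg.norm(X @ np.diag(w), ord="nuc")

def objective(X, y, w, lam):
    # Regularized least squares with the trace lasso penalty
    return 0.5 * np.sum((y - X @ w) ** 2) + lam * trace_lasso_penalty(X, w)
```

For X = I (orthonormal columns), the penalty reduces to the L1 norm of w, matching the property above.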

Trace lasso
Consider two extreme cases (with unit-norm columns):
If all the predictors are orthogonal, then ||X Diag(w)||_* = ||w||₁, i.e. it equals the L1 norm. Indeed, we have the decomposition
  X Diag(w) = Σᵢ wᵢ xᵢ eᵢᵀ
where the eᵢ are the vectors of the canonical basis. Since the predictors xᵢ are orthogonal and the eᵢ are orthogonal too, this gives the singular value decomposition of X Diag(w), and we get ||X Diag(w)||_* = Σᵢ |wᵢ| = ||w||₁.
On the other hand, if all the predictors are equal to x⁽¹⁾ (the first column of X), then X Diag(w) = x⁽¹⁾ wᵀ has rank one, and ||X Diag(w)||_* = ||x⁽¹⁾||₂ ||w||₂ = ||w||₂, the L2 norm.
Under other circumstances, trace lasso behaves in between the L1 norm and the L2 norm.
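The two extreme cases can be verified numerically; a small self-contained check (the vectors chosen here are our own illustration):

```python
import numpy as np

def trace_lasso(X, w):
    """Trace lasso penalty: nuclear norm of X @ Diag(w)."""
    return np.linalg.norm(X @ np.diag(w), ord="nuc")

w = np.array([0.5, -2.0, 1.5])

# Orthogonal unit-norm predictors -> penalty equals the L1 norm of w
X_orth = np.eye(3)
assert np.isclose(trace_lasso(X_orth, w), np.abs(w).sum())

# Identical unit-norm predictors -> penalty equals the L2 norm of w
x = np.ones(3) / np.sqrt(3)
X_dup = np.tile(x[:, None], (1, 3))
assert np.isclose(trace_lasso(X_dup, w), np.linalg.norm(w))
```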

Solution to trace lasso based least squares: Iterative Reweighted Least Squares (IRLS)
Using the variational form of the trace norm, ||A||_* = (1/2) inf_{S ≻ 0} [ tr(Aᵀ S⁻¹ A) + tr(S) ], attained at S = (A Aᵀ)^{1/2}, the penalty can be rewritten as
  ||X Diag(w)||_* = (1/2) inf_{S ≻ 0} [ wᵀ Diag(diag(Xᵀ S⁻¹ X)) w + tr(S) ].
IRLS then alternates two steps until convergence:
1. With S fixed, solve the reweighted ridge problem: w ← (Xᵀ X + λ D)⁻¹ Xᵀ y, where D = Diag(diag(Xᵀ S⁻¹ X)).
2. With w fixed, update S ← (X Diag(w)² Xᵀ + μ I)^{1/2}, where a small μ > 0 keeps the inverse well defined.

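The two alternating steps can be sketched as follows (a minimal NumPy sketch of IRLS for the trace-lasso-regularized least squares; λ, μ and the iteration count are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def irls_trace_lasso(X, y, lam=0.1, mu=1e-6, n_iter=50):
    """min_w 0.5*||y - Xw||^2 + lam*||X Diag(w)||_* via IRLS."""
    n, p = X.shape
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # warm start
    for _ in range(n_iter):
        A = X @ np.diag(w)
        # S = (A A^T + mu*I)^{1/2}; the smoothing term mu keeps S invertible
        evals, evecs = np.linalg.eigh(A @ A.T + mu * np.eye(n))
        S_inv = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
        # D = Diag(diag(X^T S^{-1} X)) gives the per-coefficient weights
        D = np.diag(np.diag(X.T @ S_inv @ X))
        # reweighted ridge step: (X^T X + lam*D) w = X^T y
        w = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return w
```

With a small λ the solution stays close to ordinary least squares; as λ grows, correlated columns receive similar (grouped) coefficients, which is the adaptive behavior exploited later for clustering.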

The proposed method
Given data matrices under different views: X_A, X_B (for ease of exposition, we describe the two-view case).
An auto-regression framework: learn new feature representations Z that not only reconstruct the data itself, but also reflect the underlying data correlation across views.
Use trace lasso to enforce that the distributed feature representation derived from view A adapts to the data correlation under view B, and vice versa.
Z_A and Z_B are concatenated into a larger matrix that is further subject to a joint low-rank constraint, which encourages corresponding columns of Z_A and Z_B to be similar.
The intuition: no matter which view we look at, the same group of candidates should be selected and play equally important roles in reconstructing the target.
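One plausible form of the two-view objective implied by this description (an assumption for illustration only; the exact formulation, weights and constraints are in the paper):

```latex
\[
\min_{Z_A, Z_B}\;
  \|X_A - X_A Z_A\|_F^2 + \|X_B - X_B Z_B\|_F^2
  + \lambda \sum_{j}\Big( \big\|X_B \,\mathrm{Diag}\big((Z_A)_{:,j}\big)\big\|_*
                        + \big\|X_A \,\mathrm{Diag}\big((Z_B)_{:,j}\big)\big\|_* \Big)
  + \gamma \,\Big\| \begin{bmatrix} Z_A \\ Z_B \end{bmatrix} \Big\|_*
\]
```

The first two terms are the auto-regression (self-reconstruction), the cross trace lasso terms regularize each view's coefficients with the other view's data, and the last term is the joint low-rank constraint on the stacked coefficients.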

Solution to the proposed method
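Once Z_A and Z_B are learned, a common final step for representation-based clustering (an assumption here, since the slide's derivation is lost) is to build a symmetric affinity matrix from the coefficients and feed it to spectral clustering:

```python
import numpy as np

def affinity_from_coefficients(Z_A, Z_B):
    """Symmetric affinity from two views' coefficient matrices (illustrative)."""
    W = (np.abs(Z_A) + np.abs(Z_A).T + np.abs(Z_B) + np.abs(Z_B).T) / 2.0
    np.fill_diagonal(W, 0.0)  # no self-affinity
    return W
```

W can then be partitioned into k clusters with any spectral clustering implementation that accepts a precomputed affinity (e.g. sklearn.cluster.SpectralClustering with affinity="precomputed").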

Experiments: datasets
Movies617: 617 movies with 17 labels extracted from IMDb. The two views are 1878 keywords and 1398 actors, keeping keywords used in at least 2 movies and actors appearing in at least 3 movies.
Animal: 30475 images of 50 animals with six pre-extracted features per image; PyramidHOG (PHOG), color-SIFT and SURF are chosen as three views.
UCI Handwritten Digit: 76 Fourier coefficients of the character shapes and 216 profile correlations as two views of the original dataset.
Pascal VOC 2007: color features and bag-of-words (BoW) features as the two-view visual representation.
NUS-WIDE: 500 images selected from each of the five classes with the largest numbers of images; color correlogram and wavelet texture as the two-view representation.

Compared baselines
1) S_Spectral: run spectral clustering on each view's data and report the best single-view result.
2) S_LowRank: use single-view low-rank representation to construct the affinity matrix, then apply spectral clustering; the best single-view result is reported.
3) Combined: concatenate features from the two views and apply low-rank representation on the combined feature to perform clustering.
4) PairwiseSC: co-regularizes the eigenvectors of all views' Laplacian matrices.
5) Co-Training: alternately modifies one view's graph structure using the other view's information.
6) Multi-NMF: a multi-view non-negative matrix factorization method to group the multi-view data. Not applicable on the NUS-WIDE dataset, since it requires all input features to be non-negative.
7) Multi-SS: a structured-sparsity-based multi-view clustering and feature learning framework.
The parameters of these methods are carefully tuned to achieve their best results. Whenever K-means is used, it is run 20 times with random initializations. Clustering quality is measured by accuracy (ACC) and normalized mutual information (NMI); both mean and standard deviation are reported.
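The ACC metric requires the best one-to-one mapping between predicted cluster labels and ground-truth classes, conventionally found with the Hungarian algorithm; a sketch assuming SciPy (NMI is available directly, e.g. in scikit-learn):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction correct under the best cluster-to-class label mapping."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    # Contingency table: counts of (true class, predicted cluster) pairs
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    # Hungarian algorithm on the negated table maximizes matched samples
    row, col = linear_sum_assignment(-cost)
    return cost[row, col].sum() / len(y_true)
```

For example, a prediction that is a pure relabeling of the ground truth scores ACC = 1.0.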

Experimental results: NMI
S_Spectral and S_LowRank fail to give good results, as they rely on only one source of information to partition the data.
Low-rank representation on a simple concatenation of features from multiple views tends to improve clustering performance to some extent, but is still less competitive: feature vectors from different views are heterogeneous, so data that are correlated under individual views can become less correlated after feature concatenation.
Our auto-regression framework instead uses homogeneous features within each view and cross-regularizes the regression coefficients to adapt to the intrinsic data correlation across views.

Experimental results: accuracy
Multi-SS uses structured sparsity to encourage sparsity between views, but this makes the method less adaptive to data correlation within each view.
PairwiseSC, Co-Training and Multi-NMF capture the data affinity either by constructing similarity graphs or by learning latent representations via matrix decomposition.
Our method treats all data points as a global basis and learns distributed feature representations under the auto-regression framework. It beats these baselines possibly because the regression coefficients reflect the intra- and inter-cluster structures more precisely, thanks to the adaptive ability of trace lasso.
Further in-depth investigation is needed to see which approach has a clear edge in capturing data affinity.

Conclusion
Data can have multiple feature modalities at the same time, and data that are correlated under one view need not stay correlated under another.
To better capture the data correlation across views, we have proposed an auto-regression framework in which the regression coefficients are required to adapt to the data correlation from all views.
The trace lasso regularizer flexibly adjusts the sparsity of the regression coefficients and encourages correlated data to fall into the same cluster. The resulting regression coefficients serve as a new distributed feature representation over the basis of all data points.
An additional joint low-rank constraint is imposed so that the same samples have similar distributed feature representations across views.

Thank you! Q&A