An Efficient Projection for l1,∞ Regularization

Presentation transcript:

An Efficient Projection for l1,∞ Regularization
Ariadna Quattoni, Xavier Carreras, Michael Collins, Trevor Darrell
MIT Computer Science and Artificial Intelligence Laboratory

Problem: Efficient training of jointly sparse models in high-dimensional spaces.

Approach: The l1,∞ norm has been proposed for jointly sparse regularization.

Contributions: We derive an efficient projected gradient method for l1,∞ regularization. Our projection runs in O(n log n) time, the same cost as the l1 projection. We test our algorithm on a multi-task image annotation problem and show that it can discover jointly sparse solutions and leads to better performance than l2 and l1 regularization.

Application: Multitask Learning. Given a collection of related tasks, we seek a joint sparse approximation: all tasks are fit with models that draw on a small, shared subset of features.

Joint Regularization Penalty: How do we penalize solutions that use too many features? The l∞ norm on each row (the coefficients of one feature across all tasks) promotes non-sparsity within the row, so that tasks share features; an l1 norm on the maximum absolute values of the coefficients across tasks promotes sparsity over rows, so that few features are used overall. (The penalty and the resulting training problem are written out after the transcript.)
[Figure: the coefficient matrix, with row labels such as "Coefficients for feature 2" and "Feature I-IV" and column labels such as "Coefficients for task 2".]

Optimization: We use a Projected SubGradient method to minimize a convex function subject to convex constraints. Advantages: simple, scalable, guaranteed convergence rates. Such methods have been proposed for l2 regularization [Shalev-Shwartz et al. 2007] and for l1 regularization [Duchi et al. 2008]. (A schematic of the loop follows the transcript.)

Euclidean Projection onto the l1,∞ Ball: A characterization of the solution allows a mapping to a simpler problem. The total cost of the algorithm is dominated by a sort of the entries of A, so the total cost is O(n log n), as noted above.

Dataset: Image Annotation. The task is to predict the 40 top content words (e.g., president, actress, team). Image representation: Vocabulary Tree (Nistér 2006), 11,000 dimensions.

Dataset: Indoor Scene Recognition. 67 indoor scene categories (e.g., bakery, bar, train station). Image representation: similarities to a set of unlabeled images, 2,000 dimensions.

Results: On both datasets our algorithm discovers jointly sparse solutions, and the differences from l2 and l1 regularization are statistically significant.
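In symbols, a reconstruction of the penalty from the description above, assuming the coefficients are collected in a matrix W ∈ R^{d×m} with W_{jk} the weight of feature j in task k (this notation is an assumption, not taken from the slides):

$$
\|W\|_{1,\infty} \;=\; \sum_{j=1}^{d} \max_{1 \le k \le m} |W_{jk}|
$$

The joint sparse approximation problem is then a convex program of the (assumed) form

$$
\min_{W \in \mathbb{R}^{d \times m}} \; \sum_{k=1}^{m} \sum_{i=1}^{n_k} L\big(y_i^k, \langle w_k, x_i^k \rangle\big)
\quad \text{subject to} \quad \|W\|_{1,\infty} \le C,
$$

where w_k is the k-th column of W, (x_i^k, y_i^k) are the n_k training examples for task k, and L is a convex loss.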
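A minimal sketch of the projected subgradient loop, assuming two hypothetical helpers: `subgradient(W)`, returning a subgradient of the convex loss at W, and `project(W, C)`, the Euclidean projection onto the constraint set of radius C. This is a generic illustration of the method, not the authors' code.

```python
import numpy as np

def projected_subgradient(W0, subgradient, project, C, n_iters=1000, eta0=1.0):
    """Minimize a convex function over a convex constraint set by
    alternating a subgradient step with a Euclidean projection.
    The diminishing step size eta0 / sqrt(t) gives the standard
    O(1/sqrt(T)) convergence guarantee for convex objectives."""
    W = project(W0, C)                        # start from a feasible point
    for t in range(1, n_iters + 1):
        eta = eta0 / np.sqrt(t)               # diminishing step size
        W = project(W - eta * subgradient(W), C)
    return W
```

For the joint model, `project` would be the paper's O(n log n) l1,∞ projection; the l1-ball sketch at the end of this page would serve the same role for a single-task l1-constrained model.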
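The projection step solves the following subproblem for the current iterate A (the standard Euclidean-projection formulation, as the slide title "Euclidean Projection onto the l1,∞ Ball" suggests):

$$
\min_{B \in \mathbb{R}^{d \times m}} \; \tfrac{1}{2}\,\|A - B\|_F^2
\quad \text{subject to} \quad \sum_{j=1}^{d} \max_{k} |B_{jk}| \le C
$$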
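The "characterization of the solution" and "mapping to a simpler problem" can be read as follows (a reconstruction from the accompanying paper, Quattoni et al., ICML 2009, rather than from the slides themselves): when the constraint is active, the optimal B clips each row of A at a row-specific threshold,

$$
B_{jk} = \operatorname{sign}(A_{jk}) \, \min\!\big(|A_{jk}|,\, \mu_j\big),
\qquad \mu_j \ge 0, \qquad \sum_{j=1}^{d} \mu_j = C,
$$

so the matrix problem reduces to the simpler problem of finding the scalars μ_j, which can be done after sorting the entries of A; that sort is what dominates the O(n log n) total cost.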
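For comparison, a minimal sketch of the sort-based projection onto the l1 ball in the style of Duchi et al. [2008], which the poster cites as having the same O(n log n) cost (an illustration, not the cited authors' code):

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of vector v onto the l1 ball of radius z.
    The cost is dominated by the O(n log n) sort of the magnitudes."""
    if np.abs(v).sum() <= z:
        return v.copy()                           # already inside the ball
    u = np.sort(np.abs(v))[::-1]                  # magnitudes, descending
    cssv = np.cumsum(u)
    # last index rho (0-based) with u[rho] > (cssv[rho] - z) / (rho + 1)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > cssv - z)[0][-1]
    theta = (cssv[rho] - z) / (rho + 1.0)         # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

For example, `project_l1_ball(np.array([0.5, -2.0, 1.0]), z=1.0)` returns `[0., -1., 0.]`, whose l1 norm is exactly 1.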