1 Blockwise Coordinate Descent Procedures for the Multi-task Lasso with Applications to Neural Semantic Basis Discovery
ICML 2009
Han Liu, Mark Palatucci, Jian Zhang
Carnegie Mellon University / Purdue University
June 16, 2009

2 Overview
Sparsity and the Multi-task Lasso
- Review the Lasso and the Multi-task Lasso (MTL)
- Discuss a new technique for solving the MTL that is very fast and highly scalable
Cognitive Neuroscience
- Goal: predict the brain's neural activity in response to some stimulus
- Show how we can use the MTL to learn good features for prediction

3 Sparse Models (diagram): a model maps input features to an output prediction; only a small subset of relevant features contributes.

4 Sparsity through Regularization
Model: Loss Function + Penalty Term
Lasso: squared loss + sum of absolute weights (L1 penalty)
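Written out, the loss-plus-penalty form above is the standard Lasso objective (notation mine: response y, design X with p features):

\hat{w} \;=\; \arg\min_{w \in \mathbb{R}^{p}} \; \tfrac{1}{2}\,\| y - Xw \|_2^2 \;+\; \lambda \sum_{j=1}^{p} |w_j|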

5 Joint Sparsity of Related Tasks (diagram): several related tasks, each with its own model and output prediction, drawing on the same small set of relevant features.

6 Joint Sparsity: Shared Features (diagram): the tasks' models select a shared set of relevant features.

7 Lasso Model vs. Multi-Task Lasso Model (equations shown on slide): the multi-task penalty is a sum over features of the sup-norm across the K tasks (a.k.a. the sum of sup-norms).
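Written out for comparison (my notation, assuming for simplicity a single design X shared by all K tasks; W is the p x K coefficient matrix with rows W_j):

Lasso (one task):
\min_{w} \; \tfrac{1}{2}\,\| y - Xw \|_2^2 \;+\; \lambda \sum_{j=1}^{p} |w_j|

Multi-task Lasso (K tasks):
\min_{W} \; \tfrac{1}{2} \sum_{k=1}^{K} \| y^{(k)} - X W_{\cdot k} \|_2^2 \;+\; \lambda \sum_{j=1}^{p} \max_{1 \le k \le K} |W_{jk}|

The penalty is the sum over features of the sup-norm of each row of W, which is what encourages entire rows to be zero.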

8 Lasso Penalty vs. Multi-Task Lasso Penalty
- Lasso penalty learns a sparse solution: most elements are zero, only relevant features are non-zero.
- Multi-task Lasso penalty learns a row-sparse solution: most rows of the (features x tasks) coefficient matrix are zero, while the remaining rows have non-zero elements across the task columns.

9 Solving the Lasso and the Multi-task Lasso
Lasso: LARS, interior point, primal/dual, gradient projection, coordinate descent
Multi-task Lasso:
- Interior point (Turlach et al. 2004)
- Gradient projection (Quattoni et al. 2009)
- Coordinate descent (this work, 2009)

10 What’s the best method? For the single-task Lasso: coordinate descent.
- Friedman, Hastie, Tibshirani (2008)
- Wu, Lange (2008)
- Duchi, Shalev-Shwartz, Singer, and Chandra (2008)

11 Coordinate Descent
Why is coordinate descent so good for the Lasso?
1. Each iteration can be computed using a closed-form soft-thresholding operator.
2. If the solution vector is truly sparse, it can avoid updating irrelevant parameters through a simple check.
3. Many computational tricks: warm starts, covariance pre-computing, adaptive/greedy updates.
Can we develop a coordinate descent procedure for the Multi-task Lasso that has a similar closed-form update for each iteration?
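To make item 1 concrete, here is a minimal sketch (my own illustration, not the authors' code) of one coordinate-descent sweep for the single-task Lasso, using the closed-form soft-thresholding update:

import numpy as np

def soft_threshold(z, t):
    # closed-form prox of t*|.|: shrink z toward zero by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd_sweep(X, y, w, lam):
    """One full pass of coordinate descent on 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    r = y - X @ w                      # current residual
    for j in range(p):
        xj = X[:, j]
        aj = xj @ xj
        rho = xj @ r + aj * w[j]       # partial residual correlation for coordinate j
        w_new = soft_threshold(rho, lam) / aj
        r += xj * (w[j] - w_new)       # keep the residual up to date incrementally
        w[j] = w_new
    return w

In practice the sweep is repeated until the coefficients stop changing, and the tricks in item 3 (warm starts along the regularization path, pre-computed covariance terms, skipping coordinates that stay at zero) are layered on top.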

12 Coordinate Descent for the Multi-task Lasso: Yes we can!
Main result: the soft-thresholding update for the Lasso generalizes to multiple tasks as a Winsorization operator applied to each feature's row of coefficients (update equations shown on slide).
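As a rough sketch of how the blockwise generalization can look, here is my reconstruction (not the paper's exact pseudocode), assuming a single design X shared across the K tasks and the sum-of-sup-norms penalty from slide 7. The key piece is the proximal operator of lam*||.||_inf, which winsorizes (clips) the entries of a coefficient row at a data-dependent threshold:

import numpy as np

def winsorize(c, lam):
    """Prox of lam*||.||_inf at c: clip |c| at a threshold t chosen so that the
    total mass clipped off equals lam; returns the zero row if ||c||_1 <= lam."""
    a = np.abs(c)
    if a.sum() <= lam:
        return np.zeros_like(c)            # the whole feature row is zeroed out
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.max(np.where(u > (css - lam) / ks)[0])
    t = (css[rho] - lam) / (rho + 1)
    return np.sign(c) * np.minimum(a, t)   # Winsorization: clip at t

def mtl_bcd_sweep(X, Y, W, lam):
    """One blockwise coordinate-descent pass on
    0.5*||Y - X W||_F^2 + lam * sum_j max_k |W[j, k]|."""
    n, p = X.shape
    R = Y - X @ W                          # residual matrix, n x K
    for j in range(p):
        xj = X[:, j]
        aj = xj @ xj
        cj = (xj @ R) / aj + W[j, :]       # unpenalized solution for row j
        Wj_new = winsorize(cj, lam / aj)
        R += np.outer(xj, W[j, :] - Wj_new)
        W[j, :] = Wj_new
    return W

As in the single-task case, a row whose unpenalized solution is small enough (its scaled L1 norm falls below the regularization level) is set exactly to zero, which is what produces the row-sparsity pictured on slide 8.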

13 Take-away Points from Part I
- We can efficiently solve the multi-task Lasso using a closed-form Winsorization operator that is easy to implement
- This leads to a coordinate descent method that can take advantage of all the usual computational tricks

14 Overview
Sparsity and the Multi-task Lasso
- Review the Lasso and the Multi-task Lasso (MTL)
- Discuss a new technique for solving the MTL that is very fast and highly scalable
Cognitive Neuroscience
- Goal: predict the brain's neural activity in response to some stimulus
- Show how we can use the MTL to learn good features for prediction

15 Neural Activity Prediction (diagram, after Mitchell et al. 2008): a model takes a word such as “apple” or “airplane”, encodes it with semantic features (e.g., co-occurrence with verbs such as eat, taste, ride: 0.836, 0.346, 0.000, …), and outputs the predicted fMRI activity.
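For context, my understanding of the Mitchell et al. (2008) encoding model sketched in this diagram is a separate linear model per voxel (notation mine):

\hat{y}_v(w) \;=\; \sum_{i=1}^{n} c_{vi}\, f_i(w)

where f_i(w) is the value of the i-th semantic feature for word w (e.g., its co-occurrence with “eat”, “taste”, “ride”) and c_{vi} is a learned weight for voxel v.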

16 Neural Activity Prediction Can we train a model to automatically discover a good set of semantic features?

17 Neural Activity Prediction (diagram): the model for a word such as “apple” now has a large number of possible features to choose from. This is a multi-task Lasso problem! Each “task” is the neural activity of a voxel (see the worked instance below).
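In the notation of slide 7, this instance of the multi-task Lasso reads roughly as follows (my mapping, not stated explicitly on the slide; each of the V voxels is one task and the candidate word features are the columns of the design):

\min_{W \in \mathbb{R}^{5000 \times V}} \; \tfrac{1}{2}\,\| Y - X W \|_F^2 \;+\; \lambda \sum_{j=1}^{5000} \max_{1 \le v \le V} |W_{jv}|

where X is the 60 x 5,000 matrix of word features, Y is the 60 x V matrix of observed voxel activations, and the row-sparsity of W picks out a common set of features shared by all voxels.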

18 Evaluation Data
Response: collected fMRI images of neural activity for 60 words (5 exemplars from each of 12 categories)

19 Data – fMRI Examples: 12 categories x 5 exemplars = 60 words
BODY PARTS: leg, arm, eye, foot, hand
FURNITURE: chair, table, bed, desk, dresser
VEHICLES: car, airplane, train, truck, bicycle
ANIMALS: horse, dog, bear, cow, cat
KITCHEN UTENSILS: glass, knife, bottle, cup, spoon
TOOLS: chisel, hammer, screwdriver, pliers, saw
BUILDINGS: apartment, barn, house, church, igloo
PART OF A BUILDING: window, door, chimney, closet, arch
CLOTHING: coat, dress, shirt, skirt, pants
INSECTS: fly, ant, bee, butterfly, beetle
VEGETABLES: lettuce, tomato, carrot, corn, celery
MAN MADE OBJECTS: refrigerator, key, telephone, watch, bell

20 Evaluation Data
Response: collected fMRI images of neural activity for 60 words (5 exemplars from each of 12 categories)
Design: each word represented as a co-occurrence vector with the 5,000 most frequent words in English

21 Evaluation Experiment
- Train the model using 58 of the 60 word stimuli (leave-two-out cross-validation)
- Run the Multi-task Lasso to select features from co-occurrences with the 5,000 most frequent words
- Use those features in the same linear model as in Mitchell et al. (2008)
- Apply the model to predict activation for the 2 held-out words
- Label the 2 held-out words using cosine similarity with the predictions
- Repeat for all (60 choose 2) = 1,770 iterations
(A sketch of this evaluation loop follows.)
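A minimal sketch of this leave-two-out loop, assuming a placeholder train_and_predict function that performs the feature-selection and regression steps listed above (my illustration, not the authors' code):

import numpy as np
from itertools import combinations

def cosine(a, b):
    # cosine similarity between two fMRI images flattened to vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def leave_two_out_accuracy(F, A, train_and_predict):
    """F: 60 x p word-feature matrix, A: 60 x V observed fMRI images.
    train_and_predict(F_train, A_train, F_test) returns predicted images for F_test."""
    n = F.shape[0]
    pairs = list(combinations(range(n), 2))
    correct = 0
    for i, j in pairs:
        train = [t for t in range(n) if t not in (i, j)]
        pred = train_and_predict(F[train], A[train], F[[i, j]])
        # label the two held-out words by whichever assignment of predictions
        # to observations has the higher total cosine similarity
        right = cosine(pred[0], A[i]) + cosine(pred[1], A[j])
        wrong = cosine(pred[0], A[j]) + cosine(pred[1], A[i])
        correct += int(right > wrong)
    return correct / len(pairs)

With 60 words this runs 1,770 train/test splits, matching the count on the slide.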

22 (figure) Predicted and observed fMRI images for “celery” and “airplane” after training on the 58 other words; the color scale ranges from below average to high fMRI activation. [Mitchell et al. 2008]

23 Results: feature sets compared
- Random: 25 features (words)
- Handcrafted: 25 features from domain experts
- Learned: features selected by the Multi-task Lasso (set sizes of 25, 140, and 350 features)

24 Results

25 Interpreting the Model
25 Top Features (Words) Selected by the Multi-task Lasso:
Tools, Car, Dog, Wine, Pottery,
Model, Station, Bedroom, Breakfast, Cup,
Mad, Rentals, Fishing, Cake, Tip,
Arms, Walk, Cleaning, Cheese, Gay,
Right, White, Front, Contents, Result

26 Interpreting the Model How does a feature contribute to observed neural activation?

27 Interpreting the Model
Analyzing the weights learned for the “Tools” feature:
- Postcentral gyrus: believed to be associated with pre-motor planning
- Superior temporal sulcus: believed to be associated with the perception of biological motion

28 Take-away Points: the Multi-task Lasso
- Can learn common features of related tasks
- Scales to thousands of features and tasks with our coordinate descent procedure
- Can build interpretable models useful for natural-science applications
- Can learn features for neural prediction that perform better than handcrafted features on the majority of fMRI subjects
See Liu, Palatucci, and Zhang (2009): Blockwise Coordinate Descent Procedures for the Multi-task Lasso.

29 Thanks to: Tom Mitchell, the W.M. Keck Foundation, the National Science Foundation, Intel Research, and Google. ICML 2009.