MLSLP-2012
Learning Deep Architectures Using Kernel Modules
(with thanks to many people for collaborations and discussions)
Li Deng, Microsoft Research, Redmond
Outline
– Deep neural net ("modern" multilayer perceptron): hard to parallelize in learning
– Deep Convex Net (Deep Stacking Net): limited hidden-layer size, and learning of part of the parameters is not convex
– Tensor DSN/DCN and Kernel DCN
– K-DCN: combines the elegance of kernel methods with the high performance of deep learning; linearity of pattern functions (kernel) plus nonlinearity of deep nets
Deep Neural Networks
Deep Stacking Network (DSN)
"Stacked generalization" in machine learning:
– Use a high-level model to combine low-level models
– Aim to achieve greater predictive accuracy
This principle has been reduced to practice:
– Learning parameters in DSN/DCN (Deng & Yu, Interspeech 2011; Deng, Yu & Platt, ICASSP 2012)
– Parallelizable, scalable learning (Deng, Hutchinson & Yu, Interspeech 2012)
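A minimal sketch of the stacking principle, assuming hypothetical fit_module / predict_module helpers standing in for any low-level learner (e.g. the DCN module described below): each new module sees the raw input augmented with the predictions of all modules beneath it.

    import numpy as np

    def fit_stack(X, T, num_modules, fit_module, predict_module):
        """Stacked generalization: module l is trained on the raw input
        concatenated with the outputs of all lower modules."""
        modules = []
        inputs = X
        for _ in range(num_modules):
            m = fit_module(inputs, T)          # train one low-level model
            Y = predict_module(m, inputs)      # its predictions on the training data
            modules.append(m)
            inputs = np.hstack([inputs, Y])    # augment the input for the next module
        return modules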
DCN Architecture Example: L = …
– Many modules, still easily trainable
– Alternating linear & nonlinear sub-layers
– Actual architecture for digit image recognition (10 classes)
– MNIST: 0.83% error rate (LeCun's MNIST site)
Anatomy of a Module in DCN
[Module diagram: input x (784 linear units) feeds hidden units h through weights W, initialized randomly ("W rand") or from an RBM ("W RBM"); linear output units with weights U map h to the targets t, with the closed-form solution U = pinv(h) t.]
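A minimal numpy sketch of one module, assuming a sigmoid hidden layer; W would come from random initialization or RBM pretraining as in the diagram, and the linear output weights U are the closed-form least-squares solution U = pinv(h) t noted above.

    import numpy as np

    def fit_dcn_module(X, T, num_hidden, rng=np.random.default_rng(0)):
        """One DCN module: X is (n, 784) for MNIST, T is (n, 10) target codes."""
        W = 0.1 * rng.standard_normal((X.shape[1], num_hidden))  # random (or RBM) init
        H = 1.0 / (1.0 + np.exp(-X @ W))   # nonlinear hidden layer h
        U = np.linalg.pinv(H) @ T          # convex output fit: U = pinv(h) t
        return W, U

    def predict_dcn_module(W, U, X):
        H = 1.0 / (1.0 + np.exp(-X @ W))
        return H @ U                       # linear output units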
From DCN to Kernel-DCN
Kernel-DCN
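The kernel module replaces the finite sigmoid hidden layer with an implicit feature map. A sketch assuming the standard kernel ridge regression form (Gaussian kernel of width gamma, regularizer lam; the names are mine): the module output is y(x) = k(x)' (K + lam I)^-1 T, linear in the kernel coefficients yet nonlinear in x.

    import numpy as np

    def rbf_kernel(A, B, gamma):
        """Gaussian kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-gamma * sq)

    def fit_kernel_module(X, T, gamma, lam):
        K = rbf_kernel(X, X, gamma)
        return np.linalg.solve(K + lam * np.eye(len(X)), T)  # alpha = (K + lam I)^-1 T

    def predict_kernel_module(alpha, X_train, X, gamma):
        return rbf_kernel(X, X_train, gamma) @ alpha         # y(x) = k(x)' alpha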
Nyström-Woodbury Approximation
[Figure: block decomposition of the kernel matrix; C denotes the sampled column block.]
[Figure: eigenvalues of W, with 5000 selected samples.]
Nyström-Woodbury Approximation (cont.)
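The n x n solve in the kernel module is the bottleneck. A sketch combining the Nyström approximation with the Woodbury identity, assuming m randomly chosen landmark samples: with K ~ C W^-1 C' (C = K(X, landmarks), W = K(landmarks, landmarks)), Woodbury gives (lam I + K)^-1 T ~ (1/lam) (T - C (lam W + C'C)^-1 C'T), so only an m x m system is solved.

    import numpy as np

    def rbf_kernel(A, B, gamma):  # Gaussian kernel, as in the sketch above
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-gamma * sq)

    def nystrom_woodbury_fit(X, T, gamma, lam, m, rng=np.random.default_rng(0)):
        """Approximate alpha = (lam I + K)^-1 T using K ~ C W^-1 C'."""
        idx = rng.choice(len(X), size=m, replace=False)      # m landmark samples
        C = rbf_kernel(X, X[idx], gamma)                     # n x m cross-kernel
        W = rbf_kernel(X[idx], X[idx], gamma)                # m x m landmark kernel
        inner = np.linalg.solve(lam * W + C.T @ C, C.T @ T)  # only an m x m solve
        return (T - C @ inner) / lam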
K-DSN Using Reduced Rank Kernel Regression
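A sketch of reduced-rank kernel regression in the usual subset-of-regressors form (the sampling scheme and names are my assumptions): only m basis points carry coefficients, so training solves the m x m normal equations (Knm' Knm + lam Kmm) a = Knm' T.

    import numpy as np

    def rbf_kernel(A, B, gamma):  # Gaussian kernel, as above
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-gamma * sq)

    def fit_reduced_rank(X, T, gamma, lam, m, rng=np.random.default_rng(0)):
        """Reduced-rank kernel regression: f(x) = sum_j a_j k(x, z_j) over
        m basis points z_j, so the solve is m x m instead of n x n."""
        Z = X[rng.choice(len(X), size=m, replace=False)]  # reduced basis
        Knm = rbf_kernel(X, Z, gamma)                     # n x m
        Kmm = rbf_kernel(Z, Z, gamma)                     # m x m
        a = np.linalg.solve(Knm.T @ Knm + lam * Kmm, Knm.T @ T)
        return Z, a  # predict with rbf_kernel(X_new, Z, gamma) @ a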
Computation Analysis (run time)

  Per layer                     Evaluation formula                        Multiplications   Nonlinearities    Memory
  MLP/DNN/DSN                   h = sigma(W'x)                            O(d·h)            h sigmoids        O(d·h) weights
  K-DSN/SVM/SVR (RBF kernel)    y = sum_i a_i exp(-gamma·||x - x_i||^2)   O(N·d)            N exponentials    O(N·d) samples

  (d = input dimension, h = hidden units, N = stored training/support samples)
K-DCN: Layer-Wise Regularization
– Two hyper-parameters in each module
– Tuned using cross-validation data
– Relaxation at lower modules
– Different regularization procedures for lower vs. higher modules
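A sketch of the per-module tuning loop, assuming the two hyper-parameters are the kernel width gamma and the regularizer lam, selected by grid search on held-out cross-validation data; fit_kernel_module / predict_kernel_module are the helpers from the kernel-module sketch above.

    import numpy as np

    def tune_module(X_tr, T_tr, X_cv, T_cv, gammas, lams):
        """Grid-search one module's two hyper-parameters on held-out data."""
        best, best_err = None, np.inf
        for gamma in gammas:
            for lam in lams:
                alpha = fit_kernel_module(X_tr, T_tr, gamma, lam)
                Y = predict_kernel_module(alpha, X_tr, X_cv, gamma)
                err = np.mean((Y - T_cv) ** 2)   # cross-validation error
                if err < best_err:
                    best, best_err = (gamma, lam), err
        return best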
SLT-2012 paper: