MLSLP-2012: Learning Deep Architectures Using Kernel Modules (with thanks for collaborations and discussions with many people) Li Deng, Microsoft Research, Redmond.


Outline
- Deep neural net ("modern" multilayer perceptron): hard to parallelize in learning → Deep Convex Net (Deep Stacking Net)
- DSN: limited hidden-layer size, and part of the parameters are not convex in learning → Tensor DSN/DCN and Kernel DCN
- K-DCN: combines the elegance of kernel methods with the high performance of deep learning; linearity of pattern functions (kernel) and nonlinearity of deep nets

Deep Neural Networks


Deep Stacking Network (DSN)
"Stacked generalization" in machine learning:
- Use a high-level model to combine low-level models
- Aim to achieve greater predictive accuracy
This principle has been reduced to practice:
- Learning parameters in DSN/DCN (Deng & Yu, Interspeech-2011; Deng, Yu & Platt, ICASSP-2012)
- Parallelizable, scalable learning (Deng, Hutchinson & Yu, Interspeech-2012)
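The stacking principle above can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: the sigmoid hidden layer, random lower-weight initialization, and all sizes are assumptions. Each module fits its upper weights in closed form, and its predictions are concatenated with the raw input to feed the next module.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dsn(X, T, n_modules=3, n_hidden=50, seed=0):
    """Greedy, module-by-module training: no global backpropagation needed."""
    rng = np.random.default_rng(seed)
    modules, Z = [], X
    for _ in range(n_modules):
        W = 0.1 * rng.standard_normal((Z.shape[1], n_hidden))  # lower weights (random init)
        H = sigmoid(Z @ W)                                     # hidden representation
        U = np.linalg.pinv(H) @ T                              # closed-form upper weights
        modules.append((W, U))
        Z = np.hstack([X, H @ U])  # next module sees raw input + this module's predictions
    return modules

def dsn_predict(X, modules):
    Z = X
    for W, U in modules:
        Y = sigmoid(Z @ W) @ U
        Z = np.hstack([X, Y])
    return Y
```

Because each module's only trained parameters are solved in closed form, the per-module fits are independent least-squares problems, which is what makes the learning parallelizable and scalable.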

DCN Architecture Example: L=
- Many modules, still easily trainable
- Alternating linear and nonlinear sub-layers
- Actual architecture for digit image recognition (10 classes)
- MNIST: 0.83% error rate (LeCun's MNIST site)

Anatomy of a Module in DCN
- Input x: 784 linear units
- Lower weights W: initialized randomly (W_rand) or from an RBM (W_RBM)
- Hidden layer h
- Upper weights fit to targets t in closed form: U = pinv(h) t
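The closed-form solve on this slide can be checked directly in NumPy (sizes here are illustrative, not from the slide): U = pinv(h) t is exactly the least-squares fit of the linear output layer to the targets.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((200, 50))   # hidden-unit activations: 200 samples x 50 units
T = rng.standard_normal((200, 10))   # targets, e.g. 10 one-hot classes

# Upper weights in closed form, as on the slide: U = pinv(h) t
U = np.linalg.pinv(H) @ T

# lstsq returns the same minimizer of ||H @ U - T||_F, confirming the
# pseudoinverse solve is ordinary least squares for the linear output layer.
U_ls, *_ = np.linalg.lstsq(H, T, rcond=None)
```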

From DCN to Kernel-DCN

Kernel-DCN

Nyström-Woodbury Approximation
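A sketch of how the Nyström approximation combines with the Woodbury identity (illustrative only; the landmark count, kernel width, and regularizer are assumed values): with C the n×m cross-kernel and W the m×m landmark kernel, K ≈ C W⁻¹ Cᵀ, and Woodbury turns the n×n solve (λI + K)⁻¹ y into an m×m one.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom_woodbury_solve(X, y, m, lam, sigma=1.0, seed=0):
    """Approximate alpha = (lam*I + K)^-1 y with m landmarks: O(n m^2), not O(n^3)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], sigma)        # n x m cross-kernel
    W = rbf_kernel(X[idx], X[idx], sigma)   # m x m landmark kernel; K ~ C W^-1 C^T
    # Woodbury: (lam*I + C W^-1 C^T)^-1 y = (y - C (lam*W + C^T C)^-1 C^T y) / lam
    inner = lam * W + C.T @ C               # only an m x m system to solve
    return (y - C @ np.linalg.solve(inner, C.T @ y)) / lam
```

When the landmarks include all n points the approximation is exact, so the solver reproduces the direct kernel ridge solution; with m ≪ n it trades a controlled approximation for a large drop in cost.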

[Figure: eigenvalue spectrum of the landmark kernel matrix W, for 5,000 selected samples]

Nyström-Woodbury Approximation

K-DSN Using Reduced-Rank Kernel Regression

Computation Analysis (run time)
[Table: per-layer evaluation formula, multiplication count, nonlinearity, and memory for MLP/DNN/DSN vs. KDSN/SVM/SVR with an RBF kernel]

K-DCN: Layer-Wise Regularization
- Two hyper-parameters in each module
- Tuned on cross-validation data
- Relaxation at the lower modules
- Different regularization procedures for lower vs. higher modules
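The per-module tuning above can be sketched as a grid search on held-out data. This is a hedged illustration: kernel ridge regression with an RBF width sigma and regularizer lambda is assumed as the module form, and the grids are arbitrary.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def tune_module(X_tr, y_tr, X_cv, y_cv, sigmas, lams):
    """Pick the (sigma, lambda) pair with the lowest error on cross-validation data."""
    best = (None, None, np.inf)
    for sigma in sigmas:
        K_tr = rbf_kernel(X_tr, X_tr, sigma)
        K_cv = rbf_kernel(X_cv, X_tr, sigma)
        for lam in lams:
            alpha = np.linalg.solve(K_tr + lam * np.eye(len(X_tr)), y_tr)
            err = np.mean((K_cv @ alpha - y_cv) ** 2)
            if err < best[2]:
                best = (sigma, lam, err)
    return best  # (best sigma, best lambda, its held-out MSE)
```

Repeating this search module by module, with a relaxed (coarser or wider) grid at the lower modules, mirrors the layer-wise procedure described on the slide.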

SLT-2012 paper: