
1 MLSLP-2012: Learning Deep Architectures Using Kernel Modules (thanks to collaborations/discussions with many people) Li Deng, Microsoft Research, Redmond

2 Outline
–Deep neural net (“modern” multilayer perceptron): hard to parallelize in learning → Deep Convex Net (Deep Stacking Net)
–Limited hidden-layer size, and part of the parameters not convex in learning → (Tensor DSN/DCN) and Kernel-DCN
–K-DCN: combines the elegance of kernel methods with the high performance of deep learning
–Linearity of pattern functions (kernel) and nonlinearity in deep nets

3 Deep Neural Networks

4 (no text transcribed)

5 (no text transcribed)

6 Deep Stacking Network (DSN)
“Stacked generalization” in machine learning:
–Use a high-level model to combine low-level models
–Aim to achieve greater predictive accuracy
This principle has been reduced to practice:
–Learning parameters in DSN/DCN (Deng & Yu, Interspeech-2011; Deng, Yu, Platt, ICASSP-2012)
–Parallelizable, scalable learning (Deng, Hutchinson, Yu, Interspeech-2012)

7 DCN Architecture Example: L=3
–Many modules, still easily trainable
–Alternating linear & nonlinear sub-layers
–Actual architecture for digit image recognition (10 classes)
–MNIST: 0.83% error rate (LeCun’s MNIST site)
(Diagram: stacked modules with layers of 784, 3000, and 10 units)

8 Anatomy of a Module in DCN
(Module diagram: input x of 784 linear units → hidden layer h of 3000 units, with lower weights W initialized randomly (W_rand) or from an RBM (W_RBM) → 10 linear output units; upper weights solved in closed form as U = pinv(h) t against the targets t)
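A minimal Python sketch of this module (assumptions: sigmoid hidden units and the random lower-weight path; the RBM-initialized path is analogous):

import numpy as np

# Minimal sketch of one DCN module, assuming sigmoid hidden units and
# least-squares upper weights via the pseudo-inverse, as on the slide.
def dcn_module(X, T, n_hidden=3000, seed=0):
    """X: (n, 784) input batch; T: (n, 10) one-hot targets."""
    rng = np.random.default_rng(seed)
    # Lower weights W: random init here; the slide also shows RBM init.
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    h = 1.0 / (1.0 + np.exp(-(X @ W)))   # hidden activations h
    U = np.linalg.pinv(h) @ T            # closed form: U = pinv(h) t
    return W, U

To stack a further module, the DSN/DCN papers feed the raw input concatenated with this module's prediction h @ U as the next module's input.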

9 From DCN to Kernel-DCN

10 Kernel-DCN

11 Nyström-Woodbury Approximation
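The slide's equations are not transcribed; the sketch below shows the standard Nyström construction of a low-rank surrogate for the full Gram matrix from m sampled landmark columns (the RBF kernel and gamma are illustrative assumptions):

import numpy as np

def rbf(A, B, gamma):
    # Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_gram(X, m, gamma, seed=0):
    """Low-rank Nystrom surrogate of the full n x n Gram matrix:
    K ~= C @ pinv(W) @ C.T, with C the n x m landmark columns."""
    idx = np.random.default_rng(seed).choice(len(X), size=m, replace=False)
    C = rbf(X, X[idx], gamma)        # n x m columns for the landmarks
    W = C[idx]                       # m x m landmark block
    return C, W, C @ np.linalg.pinv(W) @ C.T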

12 Eigenvalues of W (5,000 samples selected; spectrum plot not transcribed)

13 Nyström-Woodbury Approximation
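A hedged sketch of how the Woodbury identity combines with the Nyström factors C and W to solve the regularized kernel system without forming the n x n matrix (the precise use on the slide is not transcribed):

import numpy as np

# Woodbury identity applied to the Nystrom factorization: solve
# (lam*I + C @ inv(W) @ C.T) @ A = T using only m x m algebra, since
#   (lam*I + C inv(W) C.T)^{-1} = (1/lam) (I - C (lam*W + C.T C)^{-1} C.T)
def woodbury_solve(C, W, T, lam):
    inner = np.linalg.solve(lam * W + C.T @ C, C.T @ T)   # m x outputs
    return (T - C @ inner) / lam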

14 K-DSN Using Reduced Rank Kernel Regression
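One standard reading of reduced-rank kernel regression, assumed here since the slide body is not transcribed, is the subset-of-regressors form, where the solution is restricted to m landmark basis functions Z:

import numpy as np

# Subset-of-regressors sketch: restrict f(x) = sum_i alpha_i k(x, z_i)
# to m landmarks z_i; minimizing ||T - C a||^2 + lam * a' W a gives
#   (C.T @ C + lam * W) @ alpha = C.T @ T
def reduced_rank_krr(X, T, Z, lam, gamma):
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    C = rbf(X, Z)                                  # n x m cross-kernel
    alpha = np.linalg.solve(C.T @ C + lam * rbf(Z, Z), C.T @ T)
    return lambda Xnew: rbf(Xnew, Z) @ alpha       # predictor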

15 Computation Analysis (run time)
Table (per-layer costs): rows are MLP/DNN/DSN and K-DSN/SVM/SVR (RBF kernel); columns are the evaluation formula, number of multiplications, number of nonlinearity evaluations, and memory. (Cell entries were not transcribed.)

16 K-DCN: Layer-Wise Regularization
–Two hyper-parameters in each module
–Tuning them using cross-validation data
–Relaxation at lower modules
–Different regularization procedures for lower vs. higher modules
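A minimal tuning sketch for one module, assuming (the slide does not name them) that the two hyper-parameters are the RBF kernel width gamma and the ridge weight lam:

import numpy as np

# Per-module grid search over the two assumed hyper-parameters,
# scored on held-out cross-validation data (Xcv, Tcv).
def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def tune_module(X, T, Xcv, Tcv, gammas, lams):
    best = None
    for gamma in gammas:
        K = rbf(X, X, gamma)
        Kcv = rbf(Xcv, X, gamma)
        for lam in lams:
            alpha = np.linalg.solve(K + lam * np.eye(len(X)), T)
            err = np.mean((Kcv @ alpha - Tcv) ** 2)   # CV squared error
            if best is None or err < best[0]:
                best = (err, gamma, lam, alpha)
    return best   # (cv_error, gamma, lam, alpha)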

17 SLT-2012 paper:

18

