How Microsoft Had Made Deep Learning Red-Hot in IT Industry Zhijie Yan, Microsoft Research Asia USTC visit, May 6, 2014.

1 How Microsoft Had Made Deep Learning Red-Hot in IT Industry
Zhijie Yan, Microsoft Research Asia
USTC visit, May 6, 2014

2 Self Introduction
• 鄢志杰 (Zhijie Yan)
    ◦ 996 – studied at USTC from 1999 to 2008
    ◦ Graduate student – studied in the iFlytek speech lab from 2003 to 2008, supervised by Prof. Renhua Wang
    ◦ Intern – worked at MSR Asia from 2005 to 2006
    ◦ Visiting scholar – visited Georgia Tech in 2007
    ◦ FTE – has worked at MSR Asia since 2008
• Research interests
    ◦ Speech, deep learning, large-scale machine learning

3 In Today’s Talk
• Deep learning has become very hot in the past few years
• How Microsoft made deep learning hot in the IT industry
• Deep learning basics
• Why Microsoft can turn all these ideas into reality
• Further reading materials

4 How Hot is Deep Learning
• “This announcement comes on the heels of a $600,000 gift Google awarded Professor Hinton’s research group to support further work in the area of neural nets.” – U. of T. website

5 How Hot is Deep Learning

9 Microsoft Had Made Deep Learning Hot in IT Industry
• Initial attempts by the University of Toronto showed promising results using DL for speech recognition on the TIMIT phone recognition task
• A student of Prof. Hinton visited MSR as an intern, and good results were obtained on the Microsoft Bing voice search task
• MSR Asia and Redmond collaborated and got amazing results on the Switchboard task, which shocked the whole industry

10 Microsoft Had Made Deep Learning Hot in IT Industry *figure borrowed from MSR principal researcher Li DENG

11 Microsoft Had Made Deep Learning Hot in IT Industry
• Others followed, and the results were confirmed on various speech recognition tasks
    ◦ Google / IBM / Apple / Nuance / Baidu / iFlytek
• Continuously advanced by MSR and others
• Expanded to solve more and more problems
    ◦ Image processing
    ◦ Natural language processing
    ◦ Search
    ◦ …

12 Deep Learning From Speech to Image
• ILSVRC-2012 competition on ImageNet
    ◦ Classification task: classify an image into 1 of the 1,000 classes; a prediction counts as correct if the true class is among your 5 guesses
[Figure: example images labeled airliner, lifeboat, school bus]
Institution                Error rate (%)
University of Amsterdam    29.6
XRCE/INRIA                 27.1
Oxford                     27.0
ISI                        26.2

13 Deep Learning From Speech to Image
• ILSVRC-2012 competition on ImageNet
    ◦ Classification task: classify an image into 1 of the 1,000 classes; a prediction counts as correct if the true class is among your 5 guesses
[Figure: example images labeled airliner, lifeboat, school bus]
Institution                Error rate (%)
University of Amsterdam    29.6
XRCE/INRIA                 27.1
Oxford                     27.0
ISI                        26.2
SuperVision                16.4
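The five-guess rule on these slides is the top-5 error rate. A minimal sketch of how it can be computed, assuming each prediction is a list of per-class scores; the helper name `top_k_error` and the toy scores are hypothetical, not part of the competition toolkit:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes.

    scores: one list of per-class scores per example (hypothetical format).
    labels: the true class index of each example.
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this example.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 10 classes (ILSVRC-2012 uses 1,000).
scores = [
    [0.0, 0.1, 0.9, 0.3, 0.2, 0.05, 0.0, 0.0, 0.4, 0.0],  # true class 8 ranked 2nd: top-5 hit
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0],  # true class 9 ranked last: miss
]
labels = [8, 9]
print(top_k_error(scores, labels, k=5))  # 0.5
print(top_k_error(scores, labels, k=1))  # 1.0
```

By this metric, SuperVision's 16.4% against ISI's 26.2% is an absolute reduction of 9.8 points, or about 37% relative.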

14 Deep Learning Basics
• Deep learning → deep neural networks → multi-layer perceptron (MLP) with a deep structure (many hidden layers)
[Figure: a shallow MLP (input layer, one hidden layer, output layer; weights W0, W1) next to a deep MLP (input layer, several hidden layers, output layer; weights W0, W1, W2, W3)]
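A deep MLP is a stack of weight matrices W0, W1, …, each followed by a nonlinearity. A minimal pure-Python forward pass, assuming sigmoid hidden units, a softmax output layer, and bias vectors (the activations, layer sizes, and random weights are illustrative assumptions, not taken from the talk):

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, b, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(layers, x):
    """Forward pass: sigmoid hidden layers, softmax output layer.

    layers: list of (W, b) pairs ordered from input to output (W0, W1, ...).
    """
    h = x
    for W, b in layers[:-1]:
        h = [sigmoid(v) for v in affine(W, b, h)]
    W_out, b_out = layers[-1]
    return softmax(affine(W_out, b_out, h))


# Hypothetical deep MLP: 3 inputs -> three hidden layers of 4 units -> 2 outputs,
# i.e. four weight matrices W0..W3, randomly initialized for illustration.
random.seed(0)
sizes = [3, 4, 4, 4, 2]
layers = [
    ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
probs = mlp_forward(layers, [0.5, -0.2, 0.1])
print(probs)  # two class posteriors that sum to 1 (up to rounding)
```

Setting `sizes = [3, 4, 2]` instead gives the shallow network with only W0 and W1; "deep" only changes how many (W, b) pairs are stacked.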

15 Deep Learning Basics
• Doesn’t sound new at all? Sounds like something you learned in class?
• Things that have not changed over the years
    ◦ Network topology / activation functions / …
    ◦ Backpropagation (BP)
• Things that have changed recently
    ◦ Data → big data
    ◦ General-purpose computing on graphics processing units (GPGPU)
    ◦ “A bag of tricks” accumulated over the years
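Backpropagation itself is the part that has not changed: the chain rule applied layer by layer, followed by a gradient-descent step. A minimal sketch for a one-hidden-layer sigmoid network with squared-error loss, trained on XOR; the task, hidden size, learning rate, and epoch count are illustrative assumptions, not settings from the talk:

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_xor(hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a 2-input, one-hidden-layer, 1-output sigmoid network on XOR.

    Uses per-example backpropagation with squared-error loss and plain SGD.
    Returns the mean squared error before and after training.
    """
    rng = random.Random(seed)
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    W0 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]  # input -> hidden
    b0 = [0.0] * hidden
    W1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output
    b1 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W0, b0)]
        y = sigmoid(sum(w * hj for w, hj in zip(W1, h)) + b1)
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    loss_before = mse()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # Backward pass for L = 0.5 * (y - t)^2 with a sigmoid output unit.
            delta_out = (y - t) * y * (1 - y)
            # Propagate the error to the hidden layer through W1 and the sigmoid derivative.
            delta_h = [delta_out * W1[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient-descent updates (delta_h was computed with the old W1).
            for j in range(hidden):
                W1[j] -= lr * delta_out * h[j]
                b0[j] -= lr * delta_h[j]
                for i in range(2):
                    W0[j][i] -= lr * delta_h[j] * x[i]
            b1 -= lr * delta_out
    return loss_before, mse()


before, after = train_xor()
print(f"MSE before: {before:.3f}, after: {after:.3f}")
```

Deeper networks repeat the hidden-layer step once per layer; what changed recently, per the slide, is the data, the hardware, and the tricks around this loop.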

16 E.g. Deep Neural Network for Speech Recognition
• Three key components that make DNN-HMM work:
    ◦ Tied triphones as the basic units for HMM states
    ◦ Many layers of nonlinear feature transformation
    ◦ A long window of frames
*figure borrowed from MSR senior researcher Dong YU
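The "long window of frames" means each DNN input is a frame stacked with its neighbors rather than a single acoustic frame. A minimal sketch of such context-window splicing, assuming ±5 frames of context (an 11-frame window) and repetition of the first/last frame at utterance boundaries; both choices and the helper name `splice_frames` are hypothetical:

```python
def splice_frames(frames, left=5, right=5):
    """Stack each frame with `left` previous and `right` following frames.

    frames: list of feature vectors (lists of floats), one per frame.
    At utterance boundaries the first/last frame is repeated as padding.
    Returns one vector of length (left + 1 + right) * dim per frame.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-left, right + 1):
            # Clamp the index so edge frames are reused instead of going out of range.
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy utterance: 4 frames of 2-dim features.
frames = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
out = splice_frames(frames, left=1, right=1)
print(out[0])  # [0.0, 0.1, 0.0, 0.1, 1.0, 1.1]
print(len(splice_frames(frames)[0]))  # 22 = 11 frames x 2 dims
```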

17 E.g. Deep Neural Network for Image Classification  The ILSVRC-2012 winning solution *figure copied from Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
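The winning ILSVRC-2012 solution is a deep convolutional neural network. Its basic building block can be sketched as a 2-D convolution followed by a ReLU nonlinearity and max pooling. The sketch below is single-channel, uses stride 1 without padding and non-overlapping 2×2 pooling, and, as in most CNN code, computes the "convolution" as a cross-correlation (unflipped kernel); these are simplifying assumptions, not the configuration of the winning network, and the toy image and kernel are hypothetical:

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (stride 1, no padding) of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Toy 5x5 image with a vertical edge, and a 2x2 vertical-edge detector.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]
features = max_pool2x2(relu(conv2d(image, kernel)))
print(features)  # [[2.0, 0.0], [2.0, 0.0]]
```

A real CNN learns many such kernels per layer over multi-channel inputs and stacks several conv/pool stages before fully connected layers.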

18 Scale Out Deep Learning
• Training speed was a major problem for DL
    ◦ Training a speech recognition model on 1,800 hours of data (~650,000,000 frames of feature vectors) takes 2 weeks on 1 GPU
    ◦ Training an image classification model on ~1,000,000 images takes 1 week on 2 GPUs*
• How can we scale out when 10x or 100x as much training data becomes available?
*Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
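The frame count quoted for the speech model is consistent with the data size under the common assumption of a 10 ms frame shift, i.e. 100 frames per second (the slide itself does not state the frame rate):

```latex
1{,}800~\text{h} \times 3{,}600~\frac{\text{s}}{\text{h}} \times 100~\frac{\text{frames}}{\text{s}}
  = 648{,}000{,}000~\text{frames} \approx 6.5 \times 10^{8}~\text{frames}
```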

19 DNN-GMM-HMM
• Joint work with USTC-MSRA Ph.D. program student Jian XU (许健, 0510)
• The “DNN-GMM-HMM” approach for speech recognition*
    ◦ DNN as a hierarchical nonlinear feature extractor, trained on a subset of the training data
    ◦ GMM-HMM as the acoustic model, trained on the full data
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”

20 DNN-GMM-HMM
[Figure: modeling pipeline with blocks DNN-derived features, PCA, HLDA, tied-state WE-RDLT, MMI sequence training, CMLLR unsupervised adaptation]
• GMM-HMM modeling of DNN-derived features: combine the best of both worlds
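The diagram lists PCA directly after the DNN-derived features, i.e. their dimensionality is reduced before GMM-HMM modeling. A minimal pure-Python sketch of PCA by power iteration with deflation, assuming the features are already extracted as one vector per frame; the toy data, target dimension, and iteration count are hypothetical, the later stages (HLDA onward) are not sketched, and a production system would use a linear-algebra library:

```python
import random


def mean_center(X):
    """Subtract the per-dimension mean from every feature vector."""
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    return [[x[j] - mu[j] for j in range(d)] for x in X]


def covariance(Xc):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(x[i] * x[j] for x in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(C, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric covariance matrix via power iteration and deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]  # copy, because deflation modifies the matrix
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break  # no variance left in this subspace
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        comps.append(v)
    return comps


def pca_project(X, k):
    """Project feature vectors onto their top-k principal components."""
    Xc = mean_center(X)
    comps = top_components(covariance(Xc), k)
    return [[sum(c[j] * x[j] for j in range(len(x))) for c in comps] for x in Xc]


# Hypothetical 3-dim "DNN-derived" features for 10 frames, reduced to 2 dims.
X = [[float(t), t + (0.1 if t % 2 else -0.1), 0.05 * (t % 3)] for t in range(10)]
Z = pca_project(X, k=2)
print(len(Z), len(Z[0]))  # 10 2
```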

21 Experimental Results
• DNN trained on 300 hours (18k states, 7 hidden layers) + GMM-HMM trained on 2,000 hours (18k states)*
• Training time reduced from 2 weeks to 3–5 days
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
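Expressed as a speedup factor, the reduction from 2 weeks (14 days) to 3 to 5 days is roughly 2.8x to 4.7x:

```latex
\frac{14~\text{days}}{5~\text{days}} = 2.8, \qquad \frac{14~\text{days}}{3~\text{days}} \approx 4.7
```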

22 A New Optimization Method
• Joint work with USTC-MSRA Ph.D. program student Kai Chen (陈凯, 0700)
• Using 20 GPUs, the time needed to train a 1,800-hour acoustic model is cut from 2 weeks to 12 hours, without accuracy loss
    ◦ The magic is yet to be published
• We believe the scalability issue in DNN training for speech recognition is now solved!
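Taken at face value against the 2-week, 1-GPU figure quoted earlier for the same 1,800-hour task, this is a 28x speedup on 20 GPUs. That exceeds the 20x of ideal linear scaling, which suggests (as the slide title implies) that part of the gain comes from the new optimization method itself rather than from parallelism alone:

```latex
\frac{2~\text{weeks}}{12~\text{h}} = \frac{336~\text{h}}{12~\text{h}} = 28,
\qquad \frac{28}{20~\text{GPUs}} = 1.4~\text{(speedup per GPU)}
```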

23 Why Microsoft Can Do All These Good Things
• Research
    ◦ Bridge the gap between academia and industry via our intern and visiting scholar programs
    ◦ Scale out from toy problems to real-world, industry-scale applications
• Product team
    ◦ Solve practical issues and deploy technologies to serve users worldwide via our services
• All together
    ◦ We continuously improve our work toward larger scale and higher accuracy, and tackle more challenging tasks
• Finally
    ◦ We have big data + world-leading computational infrastructure

24 If You Want to Know More About Deep Learning
• Neural networks for machine learning: https://class.coursera.org/neuralnets
• Prof. Hinton’s homepage:
• DeepLearning.net:
• Open-source
    ◦ Kaldi (speech):
    ◦ cuda-convnet (image):

25 Thanks!

