BigDL Deep Learning Library on HDInsight

Presentation on theme: "BigDL Deep Learning Library on HDInsight"— Presentation transcript:

1 BigDL Deep Learning Library on HDInsight
5/23/2018 1:25 PM
THR3040: BigDL Deep Learning Library on HDInsight
Microsoft Ignite, September 2017
Xiaoyong Zhu, Microsoft
Sergey Ermolin, Intel
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 BigDL Deep Learning Library on HDInsight
Microsoft Ignite, September 2017
Xiaoyong Zhu, Microsoft
Sergey Ermolin, Intel

3 BIGDL WITHIN SPARK FRAMEWORK
End-to-end big data analytics with deep learning functionality directly on Spark
- Natively integrated with the big data (Hadoop/Spark) ecosystem
- Massively distributed, scale-out
- Sends compute to data
- Fault tolerance
- Elasticity
- Incremental scaling
- Dynamic resource sharing

4 BigDL features
[Diagram labels]
- APIs: Python API, Scala API
- Examples and documents
- Models: Seq2Seq, VGG, ResNet, LeNet, Inception
- Training: SGD, Adagrad, Cross Entropy, Distributed Training, Batch Normalization
- Layers: Spatial Convolution, ReLU, LRN, RNN, Pooling, 100+ other layers
- Foundation: Tensor, MKL integration

5 BigDL features
- Distributed deep learning applications (training, fine-tuning & prediction) on Apache Spark*
- No changes to the existing Hadoop/Spark clusters needed
https://software.intel.com/bigdl

6 BigDL benefits
- Lets you write deep learning applications as standard Spark programs
- Runs on top of existing Spark or Hadoop/Hive clusters
- Adds rich deep learning functionality to Apache Spark
- Feature parity with Caffe and TensorFlow
- High performance: Intel MKL and multi-threaded programming
- Efficient scale-out with all-reduce communication on Spark
- BigDL has been open source since 2016: https://github.com/intel-analytics/BigDL
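
To make the "all-reduce on Spark" bullet concrete, here is a minimal pure-Python sketch of synchronous data-parallel SGD. It is an illustration of the general technique, not BigDL's actual implementation: the function names, the toy linear model, and the data are all assumptions made for this example. Each "worker" computes a gradient on its own partition, the gradient vector is split into one slice per worker, each slice is averaged, and every worker then applies the same update.

```python
def local_gradient(weights, batch):
    """Mean-squared-error gradient of a linear model y = w . x on one worker's batch."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def all_reduce_mean(grads):
    """Average per-worker gradients slice by slice.

    Slice k is conceptually reduced by worker k and then shared with everyone;
    here the slices are simply processed in a loop on one machine.
    """
    n_workers = len(grads)
    dim = len(grads[0])
    bounds = [(k * dim) // n_workers for k in range(n_workers + 1)]
    reduced = [0.0] * dim
    for k in range(n_workers):
        for i in range(bounds[k], bounds[k + 1]):
            reduced[i] = sum(g[i] for g in grads) / n_workers
    return reduced


def train(partitions, dim, lr=0.1, steps=200):
    """Run synchronous SGD: local gradients, all-reduce, identical update everywhere."""
    weights = [0.0] * dim
    for _ in range(steps):
        grads = [local_gradient(weights, part) for part in partitions]
        avg = all_reduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, avg)]
    return weights


# Two hypothetical "workers"; the toy data follows y = 2*x0 - 1*x1.
partitions = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)],
    [((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)],
]
weights = train(partitions, dim=2)
print(weights)
```

Because every worker applies the same averaged gradient, all model replicas stay identical after each step, which is what lets the training job be expressed as an ordinary iterative Spark program.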

7 BigDL can re-use/fine-tune models from other frameworks
Load existing Caffe/Torch/TensorFlow models
- Allows for transition from single-node to distributed application deployment
- Useful for inference
- Allows for minor model tuning
- Allows for model sharing between data scientists and production engineers
- Scoring can be done *outside of Spark*, as a Java app
[Diagram labels: Caffe model file, TensorFlow model file, Torch model file -> Load -> BigDL -> Save -> BigDL model file (storage)]
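
A rough sketch of what loading a model from another framework might look like from Python. The `load_pretrained` wrapper, its `kind` labels, and its keyword names are hypothetical and exist only for this example. The `Model.load`, `Model.load_caffe_model`, `Model.load_torch`, and `Model.load_tensorflow` calls are written as they appear in the BigDL 0.x Python API to the best of our knowledge; check them against the installed version. BigDL is imported lazily, and a real call also assumes a running SparkContext with the BigDL engine initialized.

```python
SUPPORTED_KINDS = ("bigdl", "caffe", "torch", "tensorflow")


def load_pretrained(kind, **paths):
    """Load a saved model into BigDL (hypothetical convenience wrapper).

    kind  -- one of SUPPORTED_KINDS
    paths -- keyword arguments naming the files to read (names are illustrative)
    """
    if kind not in SUPPORTED_KINDS:
        raise ValueError("unsupported model kind: %r" % (kind,))

    # Imported here so the sketch can be read and checked without BigDL installed.
    from bigdl.nn.layer import Model

    if kind == "bigdl":
        return Model.load(paths["model_path"])
    if kind == "caffe":
        # Caffe needs both the network definition and the trained weights.
        return Model.load_caffe_model(paths["def_path"], paths["model_path"])
    if kind == "torch":
        return Model.load_torch(paths["model_path"])
    # TensorFlow: a frozen graph plus the names of its input and output nodes.
    return Model.load_tensorflow(paths["model_path"], paths["inputs"], paths["outputs"])
```

A loaded model can then be fine-tuned or used for prediction like any other BigDL model.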

8 BigDL integration with Spark Streaming
Integration with Spark Streaming for runtime training and prediction
[Diagram labels: sources (HDFS/S3, Kafka, Flume, Kinesis, Twitter); Spark Streaming; RDDs; Train; BigDL Model; Predict; Evaluator; StreamWriter]

9 Python API support
Based on PySpark, the Python API in BigDL allows use of existing Python libraries:
- NumPy
- SciPy
- Pandas
- Scikit-learn
- Matplotlib
$ pip install bigdl
https://software.intel.com/bigdl
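
After `pip install bigdl`, a quick way to confirm that BigDL and the libraries listed above are importable is a small standard-library check. This is only a convenience sketch; the `missing_packages` helper is a name introduced here, and note that scikit-learn is imported as `sklearn`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages mentioned on the slide.
required = ["bigdl", "numpy", "scipy", "pandas", "sklearn", "matplotlib"]
print("missing:", missing_packages(required))
```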

10 Jupyter Notebook support
Running BigDL applications directly in Jupyter notebooks
Share and reproduce
- Notebooks can be shared with others
- Easy to reproduce and track
Rich content
- Text, images, videos, LaTeX and JavaScript
- Code can also produce rich content
Rich toolbox
- Apache Spark, from Python, R and Scala
- Pandas, scikit-learn, ggplot2, dplyr, etc.

11 Visualization of the optimization process: TensorBoard
BigDL integration with TensorBoard
TensorBoard is a suite of web applications from Google for visualizing and understanding deep learning applications.

12 HDInsight on Linux Overview

13 HDInsight (Linux) supports…
- Hive, Hive LLAP & standard Hadoop: ETL, reporting, ad hoc queries, data mining and analysis, log analysis, data warehousing…
- Spark: real-time analysis, streaming analysis, machine learning, ETL, graph analysis, real-time SQL queries
- R Server: advanced analytics over big data, machine learning, statistical analysis
- HBase & Phoenix: NoSQL storage with SQL-friendly interfaces (Phoenix), suitable for key-value stores or schema-changing logs
- Storm: real-time streaming analysis
- Kafka: high-throughput data ingestion engine

14 Scale compute & storage independently
[Diagram labels: gateway nodes; head, worker, edge, and ZooKeeper nodes; Azure Blob Storage or Azure Data Lake Store]
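
Because data lives in Azure storage rather than on the cluster nodes, Spark jobs refer to it by URI. The sketch below builds such URIs using the `wasb[s]://` scheme for Azure Blob Storage and the `adl://` scheme for Azure Data Lake Store (Gen1). The helper names and the account, container, and path values are placeholders invented for this example.

```python
def wasb_uri(container, account, path, secure=True):
    """Build an Azure Blob Storage URI as used by Hadoop/Spark on HDInsight."""
    scheme = "wasbs" if secure else "wasb"
    return "%s://%s@%s.blob.core.windows.net/%s" % (
        scheme, container, account, path.lstrip("/"))


def adl_uri(account, path):
    """Build an Azure Data Lake Store (Gen1) URI."""
    return "adl://%s.azuredatalakestore.net/%s" % (account, path.lstrip("/"))


# Placeholder names, for illustration only.
print(wasb_uri("mycontainer", "myaccount", "/data/mnist"))
print(adl_uri("mylake", "/data/mnist"))
```

Since the cluster only holds compute, it can be resized or deleted without touching the data behind these URIs.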

15 Demo

16 Train a CNN model on MNIST dataset
- Install BigDL on HDInsight: easy as 1-2-3
- Configure Spark settings
- Set up BigDL parameters
- Set up network topologies
- Run, train, and see results
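
The demo's actual network is not preserved in this transcript. As a worked example of the "set up network topologies" step, the sketch below traces tensor shapes through a LeNet-style CNN on 28x28 MNIST images. The layer sizes (5x5 convolutions with 6 and 12 output channels, 2x2 max pooling) are assumptions chosen for illustration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution or pooling window along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def lenet_shapes(height=28, width=28):
    """Return (layer, channels, height, width) after each layer of a LeNet-style net."""
    layers = [
        ("conv1 5x5", 6, 5, 1),      # name, output channels, kernel, stride
        ("pool1 2x2", None, 2, 2),   # pooling keeps the channel count
        ("conv2 5x5", 12, 5, 1),
        ("pool2 2x2", None, 2, 2),
    ]
    c, h, w = 1, height, width
    shapes = [("input", c, h, w)]
    for name, out_channels, kernel, stride in layers:
        c = out_channels or c
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
        shapes.append((name, c, h, w))
    return shapes


shapes = lenet_shapes()
for shape in shapes:
    print(shape)

# Size of the flattened vector fed to the first fully connected layer.
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
print("flattened features:", flattened)
```

Knowing the flattened size matters because the first fully connected layer must be declared with exactly that many inputs.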

17 Set up HDInsight Cluster in a few steps

18 Monitor HDInsight Cluster via Ambari GUI

19 BigDL is easily installed and built (“Deploy to Azure”)

20 Spark Session configuration
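
The configuration shown on this slide is not preserved in the transcript. One constraint worth keeping in mind when choosing Spark and BigDL settings is that BigDL expects the training batch size to be a multiple of the total core count (number of executors times cores per executor). The helper below is a small sketch of that arithmetic; the function names and the example numbers are assumptions, not values from the demo.

```python
def total_cores(num_executors, cores_per_executor):
    """Total number of cores the training job runs on."""
    return num_executors * cores_per_executor


def round_batch_size(requested, num_executors, cores_per_executor):
    """Round a requested batch size down to a multiple of the total core count.

    Never returns less than one sample per core.
    """
    cores = total_cores(num_executors, cores_per_executor)
    return max(cores, (requested // cores) * cores)


# Hypothetical cluster: 4 executors with 6 cores each.
print(round_batch_size(128, num_executors=4, cores_per_executor=6))
```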

22 Network Layout

26 To learn more about BigDL + HDInsight
- github.com/intel-analytics/BigDL
- software.intel.com/bigdl
- …w-to-use-bigdl-on-apache-spark-for-azure-hdinsight/

28 Please evaluate this session
From your PC or tablet: visit MyIgnite
From your phone: download and use the Microsoft Ignite mobile app
Your input is important!
