Build interactive data analysis environments using Apache Spark

1 Build interactive data analysis environments using Apache Spark
Microsoft Ignite 2016, 5/29/2018 4:13 PM, BRK3226
Maxim Lukiyanov, Senior Program Manager, Big Data
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Agenda
How it all fits together
Components: Apache Spark, Notebooks, Job submission server, BI Tools, Developer Tools, Azure Cloud
Resource management

3 What is your top concern for big data projects?

4 Length of Development Cycle
#1

5 Length of development cycle
Universal metric to track and improve
Affects productivity
Predicts project risk

6 Development phases
Data exploration and experimentation
Data sharing
Development of production code
Debugging

7 Interactive Spark on Azure
[Architecture diagram. Surviving labels: YARN with a Default Queue and a Thrift Queue, each running Spark Applications. Entry points: Jupyter notebooks, IntelliJ/Eclipse, Livy server (REST), command line (SSH), BI Tools, Thrift server (ODBC). Storage: Local HDFS, Blob Storage, Data Lake Store.]
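
The diagram lists a Livy server reached over REST. As a rough illustration of what that traffic might look like, the sketch below builds two Livy requests with the Python standard library. The base URL `http://localhost:8998` is an assumed placeholder (8998 is Livy's default port), the helper names are hypothetical, and the requests are only constructed, never sent.

```python
import json
import urllib.request

# Assumed placeholder endpoint; replace with your cluster's Livy URL.
LIVY_URL = "http://localhost:8998"


def livy_request(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for a Livy endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind="pyspark", queue=None):
    """Request for POST /sessions, which starts an interactive Spark session."""
    payload = {"kind": kind}
    if queue is not None:
        payload["queue"] = queue  # YARN queue the session should run in
    return livy_request("/sessions", payload)


def run_statement(session_id, code):
    """Request for POST /sessions/{id}/statements, which runs a code snippet."""
    return livy_request(f"/sessions/{session_id}/statements", {"code": code})


if __name__ == "__main__":
    req = run_statement(0, "sc.parallelize(range(10)).count()")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing a request to `urllib.request.urlopen` would actually submit it; a real cluster would typically also need authentication, which is left out here.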

8 Components

9 Apache Spark
Interactive compute engine
  Interactive on small datasets
  Interactive on large datasets on large clusters with in-memory or SSD caching
  Built-in sampling
New in Spark 2.0
  Tungsten Phase 2 (3-10x speedup)
  Structured Streaming
Great momentum
  Active and large community
  Supported by all major big data vendors
  Fast release cadence

10 Evolution of big data
[Figure. Surviving label: Data Sources]

11 Spark on Azure Cloud (HDInsight)
Fully Managed Service
  100% open source Apache Spark and Hadoop bits
  Latest releases of Spark (2.0 is coming later this week)
  Fully supported by Microsoft and Hortonworks
  99.9% Azure Cloud SLA
  Certifications: PCI, ISO 27018, SOC, HIPAA, EU-MC
Tools for data exploration, experimentation and development
  Jupyter Notebooks (Scala, Python, automatic data visualizations)
  IntelliJ/Eclipse plugin (job submission, remote debugging)
  ODBC connector for Power BI, Tableau, Qlik, SAP, Excel, etc.

12 Demo: Components in action
Maxim Lukiyanov

13 Resource Management

14 Interactive Spark on Azure
[Architecture diagram, same as slide 7. Surviving labels: YARN with a Default Queue and a Thrift Queue, each running Spark Applications. Entry points: Jupyter notebooks, IntelliJ/Eclipse, Livy server (REST), command line (SSH), BI Tools, Thrift server (ODBC). Storage: Local HDFS, Blob Storage, Data Lake Store.]

15 YARN resource management
Dynamic resource allocation (Thrift)
  Thrift server adds executors when processing SQL queries
  After timeout it shrinks back
Resource preemption (between queues)
  Thrift will take resources from other apps during activity and vice versa
  When multiple apps are active the resources are shared fairly
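
The grow-and-shrink behaviour described on this slide corresponds to Spark's dynamic allocation properties. The sketch below shows one way they might be passed to the Thrift server; the property names are standard Spark settings, but the numeric values, the queue name `thrift`, and the helper `to_conf_args` are illustrative assumptions, not the configuration used in the talk.

```python
# Spark properties that enable dynamic executor allocation on YARN.
# The numeric values and the queue name are illustrative assumptions.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # The external shuffle service lets executors be removed safely.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Idle executors are released after this timeout.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    # YARN queue the application is submitted to (hypothetical name).
    "spark.yarn.queue": "thrift",
}


def to_conf_args(conf):
    """Render a property dict as repeated --conf key=value arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # start-thriftserver.sh accepts the same options as spark-submit.
    command = ["sbin/start-thriftserver.sh"] + to_conf_args(DYNAMIC_ALLOCATION_CONF)
    print(" ".join(command))
```

Running the Thrift server in its own queue is what lets the preemption rules on this slide apply between it and applications in the default queue.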

16 YARN resource management: Limitations
Bugs
  Capacity resource scheduler + Default resource calculator configuration works
  Dominant resource calculator breaks preemption logic
Limitations
  No resource preemption between applications
  No application sharing between notebooks in Livy
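
The combination the slide reports as working (Capacity scheduler with the Default resource calculator, plus preemption between queues) could be expressed roughly as below. This is a sketch under assumptions: the property names are standard YARN settings, while the queue name `thrift`, the 50/50 capacity split, and the helper `to_hadoop_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# capacity-scheduler.xml: two queues and the default resource calculator.
# Queue name "thrift" and the 50/50 split are illustrative assumptions.
CAPACITY_SCHEDULER = {
    "yarn.scheduler.capacity.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
    "yarn.scheduler.capacity.root.queues": "default,thrift",
    "yarn.scheduler.capacity.root.default.capacity": "50",
    "yarn.scheduler.capacity.root.thrift.capacity": "50",
    # Let each queue grow into capacity the other queue is not using.
    "yarn.scheduler.capacity.root.default.maximum-capacity": "100",
    "yarn.scheduler.capacity.root.thrift.maximum-capacity": "100",
}

# yarn-site.xml: select the scheduler and turn on the monitor that
# drives preemption between queues.
YARN_SITE = {
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager"
        ".scheduler.capacity.CapacityScheduler",
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
}


def to_hadoop_xml(props):
    """Render a property dict as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(CAPACITY_SCHEDULER))
    print(to_hadoop_xml(YARN_SITE))
```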

17 Summary
Components
  Apache Spark
  Jupyter + sparkmagic kernel (or Zeppelin)
  Livy job server
  Apache YARN resource management using queues and preemption
  Columnar file formats (Parquet, ORC)
  IntelliJ IDEA + plugin for HDInsight
  [Non-OSS]
  BI Tools: Power BI, Tableau, Qlik, SAP, Excel, etc.
  Azure Cloud
Techniques
  Sample, sample, sample
  CACHE TABLE (or auto-caching using Alluxio)
  Scale out on demand using elasticity of the cloud
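
The first two techniques can be combined in Spark SQL: sample a large table, then cache the sample for interactive queries. The sketch below only builds the SQL text; the table names and both helper functions are hypothetical, and in a notebook the resulting string would be handed to something like `spark.sql(...)`.

```python
def sample_query(table, percent):
    """SELECT over a random sample of a table using Spark SQL TABLESAMPLE."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"SELECT * FROM {table} TABLESAMPLE ({percent} PERCENT)"


def cache_table(name, query):
    """CACHE TABLE statement that keeps a query result in memory."""
    return f"CACHE TABLE {name} AS {query}"


if __name__ == "__main__":
    # Table names are hypothetical.
    stmt = cache_table("trips_sample", sample_query("trips", 1))
    print(stmt)
```

Working against a small cached sample keeps each exploratory query short, which is the point of the development-cycle argument made earlier in the talk.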

18 Resources
SparkMagic kernel for Jupyter notebook
Livy job server
IntelliJ IDEA plug-in documentation
NYTaxi data science notebooks

19 Q & A Maxim Lukiyanov
