1
Machine Learning with Databricks
SQL Saturday Madison 5/21/2019 © 2018 TALAVANT. All Rights Reserved.
2
Thanks to our GOLD Sponsor
3
Thanks to our SILVER Sponsors
4
Introductions
Daniel Woods
Senior BI Talavant
Working with BI since 2015; started as a BI Developer
Transitioned to Consulting in 2018
Interested in the marriage between BI and Mathematics
6
Agenda
Discuss Machine Learning and its uses within a business
Overview of Databricks and its role in Machine Learning
Demo Overview
Demo
Debrief
Wrap-up
7
Machine Learning
Machine Learning (ML) is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience.
There are four main types of Machine Learning: supervised, unsupervised, semi-supervised, and reinforcement.
In business, ML can be used to identify and act on trends in data that lead to better outcomes.
For example, ML can be used to predict customer churn.
8
Where Databricks comes in
Databricks is an online, collaborative notebook environment that runs on Apache Spark.
Although it is an independent product, Microsoft Azure incorporates it into its solution space, making it easily accessible to any of its other cloud services.
Databricks can run Python, SQL, R, or Scala, so it offers great flexibility for Data Science and Engineering.
9
Why Databricks?
Databricks distributes many tasks across worker nodes, which can help speed up long-running processes.
This environment is well suited to Machine Learning programs, since we are often working with larger data sets and running programs that require substantial compute power and memory.
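The idea of spreading work across worker nodes can be illustrated on a single machine. The following is a minimal sketch, not Databricks or Spark code: it uses Python's standard thread pool as a stand-in for worker nodes, and the rows, the partition count, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical customer rows; every fourth customer is marked as churned.
rows = [{"customer_id": i, "churn": i % 4 == 0} for i in range(1000)]


def split_into_partitions(data, num_partitions):
    """Split the rows into roughly equal chunks, one per worker."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def count_churned(partition):
    """Work done independently on each partition."""
    return sum(1 for row in partition if row["churn"])


partitions = split_into_partitions(rows, 4)

# Each "worker" processes one partition; the partial results are then combined.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_churned, partitions))

total_churned = sum(partial_counts)
print(total_churned)
```

The pattern is split, process each piece independently, then combine. A cluster applies the same pattern across machines rather than threads, which is where the extra compute power and memory come from.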
10
Demo Overview
11
The Dataset
Telecom Customer Churn data
Gathered from Kaggle
Keep an eye on the dataset properties
12
Developing a Churn Prediction Model
Mount data from Azure Blob Storage
Exploratory Data Analysis
Reduce the Feature Set
Boolean Encoding
One-Hot Encoding
Vector Assembly
Model Development
Save the Model
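The three encoding steps in the middle of this list can be sketched in plain Python to show what each one does to a row. This is an illustration only, not the Spark code used in the demo: the column names, the values, and the helper functions are hypothetical, and Spark's own encoders may differ in detail (for example, in how many one-hot columns they emit).

```python
# Hypothetical rows standing in for the churn data set.
rows = [
    {"partner": "Yes", "contract": "Month-to-month", "monthly_charges": 70.5},
    {"partner": "No", "contract": "One year", "monthly_charges": 45.0},
    {"partner": "No", "contract": "Two year", "monthly_charges": 20.0},
]


def boolean_encode(value):
    """Boolean encoding: map a Yes/No string to 1.0 or 0.0."""
    return 1.0 if value == "Yes" else 0.0


def one_hot_encode(value, categories):
    """One-hot encoding: one slot per category, 1.0 in the matching slot."""
    return [1.0 if value == category else 0.0 for category in categories]


# Fix the category order once so every row is encoded the same way.
contract_categories = sorted({row["contract"] for row in rows})


def assemble(row):
    """Vector assembly: concatenate all encoded features into one flat vector."""
    return (
        [boolean_encode(row["partner"])]
        + one_hot_encode(row["contract"], contract_categories)
        + [row["monthly_charges"]]
    )


feature_vectors = [assemble(row) for row in rows]
for vector in feature_vectors:
    print(vector)
```

The model is then trained on the assembled vectors rather than on the original mixed-type columns, which is why the encoding steps come before model development.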
13
Demo
14
Machine Learning and Databricks
Despite a small data set, we can still develop a model that predicts customer churn.
However, the F1-score of the GBT model (as well as the others) is not ideal, which circles back to the quality of the data set.
When looking to implement machine learning, the most important part is ensuring that your company is prepared for the journey.
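For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it by hand in plain Python. The labels are made up for illustration and are not the demo's results, and the function name is hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
score = f1_score(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={score:.2f}")
```

In this made-up example the accuracy is higher than the F1-score, because most customers do not churn and correct "stayed" predictions raise accuracy without raising F1. That is one reason F1 is a common choice for churn problems, where the churned class is usually the minority.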
15
Thank you!
Daniel Woods
LinkedIn:
If you are able to, Talavant has a survey we’d like you to complete at
There will be a drawing for an Amazon Gift Card for those who complete the survey