Breeding Data Scientists

Presentation transcript:

Breeding Data Scientists
Danielle Dean, PhD, Senior Data Scientist Lead, Microsoft
Amy O’Connor, Business Value Enablement, Cloudera

Five changes in the world of the Data Scientist
1. More Data, Insights, Results
2. Organization & Culture
3. Data Engineering
4. Productivity Tools
5. Cloud Enabled

More Data, More Insights
Data is abundant, diverse, and shared freely, and so are the ways we store, process, and analyze it: streaming, machine learning, BI, ETL, modeling.

More Results
Working to cure cancer; rocket science; destroying human trafficking networks.
Organizations shown: top cancer research institutions, Thorn.

Organization & Culture: Sobering Statistics
“Only 27% of the big data projects are regarded as successful”
“Only 8% of the big data projects are regarded as VERY successful”
Only 13% of organizations have achieved full-scale production for their Big Data implementations
Source: CapGemini 2014
“Only 17% of survey respondents said they had a well-developed Predictive/Prescriptive Analytics program in place, while 80% said they planned on implementing such a program within five years”
Source: Dataversity 2015 Survey

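The Data Scientist is not one person
Venn diagram (source: Drew Conway) of three skill areas: Hacking Skills, Math and Statistical Knowledge, Substantive Expertise.
Overlaps: Machine Learning (hacking skills plus math and statistics), Traditional Research (math and statistics plus substantive expertise), Danger Zone (hacking skills plus substantive expertise), and Data Science at the center, where all three meet.
Additional label on the slide: Curiosity.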
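As a rough illustration of the "not one person" point, the sketch below treats Drew Conway's three skill areas as sets and looks up which region of the Venn diagram a team's combined skills fall into. The skill keys (`hacking`, `math_stats`, `substantive`), the function name `team_region`, and the example team are hypothetical names chosen for this sketch, not something taken from the talk.

```python
from typing import Dict, Set

# The three circles of the Venn diagram (key names are illustrative).
SKILLS = frozenset({"hacking", "math_stats", "substantive"})

# Regions formed by overlapping circles, as labelled in Conway's diagram.
REGIONS = {
    frozenset({"hacking", "math_stats"}): "Machine Learning",
    frozenset({"math_stats", "substantive"}): "Traditional Research",
    frozenset({"hacking", "substantive"}): "Danger Zone",
    frozenset({"hacking", "math_stats", "substantive"}): "Data Science",
}


def team_region(member_skills: Dict[str, Set[str]]) -> str:
    """Return the diagram region covered by the union of all members' skills."""
    combined = frozenset().union(*member_skills.values())
    unknown = combined - SKILLS
    if unknown:
        raise ValueError(f"unknown skills: {sorted(unknown)}")
    if combined in REGIONS:
        return REGIONS[combined]
    if len(combined) == 1:
        return f"{next(iter(combined))} only"
    return "no skills"


# No single member covers all three areas, but the team does.
team = {
    "engineer": {"hacking"},
    "statistician": {"math_stats"},
    "domain_expert": {"substantive"},
}
print(team_region(team))  # prints: Data Science
```

The lookup works on the team's combined skills rather than on any individual, which is the reading of the diagram the slide title suggests.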

The Data Scientist does not stand alone
Roles around the Data Scientist: Executive Sponsor; Data Engineer/ETL Engineer; Subject Matter Expert; Data Steward/SME; plus Product Owner, app developer, program manager, DevOps, etc.

The Data Scientist does not sit in a centralized org
Source: Gartner 2016

“How do I become a Data Scientist?”

Machine Learning & Data Science Conference, 4/14/2018 9:37 AM
Importance of Process
Data Science != Software Engineering. But we can learn a lot from it, especially on process. After all, failing to plan is planning to fail.
Data Science
1. Data Problem Formulation
2. Acquire Data Sources
3. Data exploration
4. Create analytics dataset
5. Modeling & Descriptive Analysis
6. Model evaluation and tuning
7. Model Deployment
Data Acquisition
1. Data Flow Architecture
2. Data Schema Architecture
2. Feature Extraction
3. Data Flow Implementation
4. Data Flow Validation
© 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
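One way to make the seven data science steps concrete is a small tracker that records which stage a project has reached. This is only a sketch: the class name `ProjectLog` and its methods are hypothetical, and enforcing a strict order is a simplifying assumption, since real projects often loop back to earlier stages.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stage names as listed on the slide, in slide order.
STAGES = [
    "Data Problem Formulation",
    "Acquire Data Sources",
    "Data exploration",
    "Create analytics dataset",
    "Modeling & Descriptive Analysis",
    "Model evaluation and tuning",
    "Model Deployment",
]


@dataclass
class ProjectLog:
    """Hypothetical tracker of completed lifecycle stages (strictly ordered)."""

    completed: List[str] = field(default_factory=list)

    def next_stage(self) -> Optional[str]:
        """Return the next stage to work on, or None when all are done."""
        if len(self.completed) < len(STAGES):
            return STAGES[len(self.completed)]
        return None

    def complete(self, stage: str) -> None:
        """Mark a stage as done; refuse to skip ahead."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.completed.append(stage)


log = ProjectLog()
log.complete("Data Problem Formulation")
print(log.next_stage())  # prints: Acquire Data Sources
```

Refusing to skip ahead is the code-level version of "failing to plan is planning to fail": modeling cannot be logged as done before the problem has been formulated and the data acquired.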

Four Pillars of the Team Data Science Process
1. Standard Project Lifecycle
2. Standardized Document Templates, Project Structure
3. Shared, Distributed Resources
4. Productivity Tools, Shared Utilities

Team Data Science Process at Microsoft
Data science virtual machines (DSVMs) as the fundamental development platform on cloud
Use Visual Studio Team Services (VSTS): work item tracking and scrum planning; Git repositories
Shared data science utilities in Git repository
Use cloud-based Azure resources as needed

Data Engineering – ready for ML?
The better the raw materials, the better the product.
Question is sharp. E.g. Predict whether component X will fail in the next Y days; clear path of action with answer.
Data measures what they care about. E.g. Identifiers at the level they are predicting.
Data is accurate. E.g. Failures are really failures, human labels on root causes; domain knowledge translated into process.
Data is connected. E.g. Machine information linkable to usage information.
A lot of data. E.g. Will be difficult to predict failure accurately with few examples.
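Two of these readiness conditions, "data is connected" and "a lot of data", can be checked mechanically before any modeling starts. The sketch below is a minimal example under stated assumptions: the function name `readiness_report`, its parameters, and the `min_failures` threshold are all hypothetical, and the right threshold depends on the problem.

```python
from typing import Dict, Iterable, Union


def readiness_report(
    machine_ids: Iterable[str],
    usage_ids: Iterable[str],
    failure_labels: Iterable[bool],
    min_failures: int = 50,  # assumed threshold, not from the talk
) -> Dict[str, Union[float, int, bool]]:
    """Summarize how well machine data links to usage data and how many
    failure examples are available for training."""
    machines = set(machine_ids)
    # "Data is connected": machines that also appear in the usage records.
    linked = machines & set(usage_ids)
    # "A lot of data": count of positive (failure) examples.
    n_failures = sum(1 for label in failure_labels if label)
    return {
        "linked_fraction": len(linked) / len(machines) if machines else 0.0,
        "failure_examples": n_failures,
        "enough_failures": n_failures >= min_failures,
    }


report = readiness_report(
    machine_ids=["m1", "m2", "m3", "m4"],
    usage_ids=["m1", "m2", "m9"],
    failure_labels=[True, False, False, True],
    min_failures=3,
)
print(report)
```

In this toy input only half of the machines can be linked to usage records and there are two failure examples, so the report flags both conditions as weak. The remaining conditions (a sharp question, accurate labels) need domain review rather than a script.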

A Bit more on Data Engineering
How do Data Scientists spend their time? (Chart; source: CrowdFlower)
Gartner estimates that poor quality of data costs an average organization $13.5 million per year, and yet data governance problems, which all organizations suffer from, are worsening.

A Bit more on Data Engineering
Data Ingestion (Kafka, Navigator, Search): Cloudera enables users to build real-time, end-to-end data pipelines in order to power their business. Leadership in Apache Spark and Kafka has made Cloudera a trusted resource for users who want to capture real-time, streaming, and time series data without being presented with gaps in security.
Data Processing (Spark, Hive): Cloudera is helping users accelerate their data pipelines with leadership in technologies like Apache Spark. Data processing in Cloudera Enterprise can help take processing windows from hours to minutes and enables faster access to data for a variety of users and skillsets.

Data Engineering/Science/Analyst Tools
Cloudera Certified Partners, grouped as: Data Science/Analytics; Data Analyst/BI

Flexible deployments: Cloud enabled (Cloudera Director)
Easy Administration: dynamic cluster lifecycle management; single pane of glass (multi-cluster view); consumption-based billing and metering
Enterprise-grade: integration across Cloudera Enterprise; management of CDH deployments at scale
Flexible Deployments: no cloud vendor lock-in (open plugin framework for IaaS platforms); scaling of provisioned clusters; spot instance provisioning

Cortana Intelligence Suite on Azure cloud platform
Flow: Data, then Intelligence, then Action
Data Sources: Apps; Sensors and devices; Data
Information Management: Data Factory; Data Catalog; Event Hubs
Big Data Stores: Data Lake Store; SQL Data Warehouse
Machine Learning and Analytics: Machine Learning; Data Lake Analytics; HDInsight (Hadoop and Spark); Stream Analytics
Intelligence: Cognitive Services; Bot Framework; Cortana
Dashboards & Visualizations: Power BI
Action: People (Web, Mobile, Bots); Apps; Automated Systems

More Data = More results!
Create a data-driven culture & DS processes
Careful checking and cleaning of data
Use the right tool for the job
Leverage the power of the cloud

Resources
Microsoft’s “Team Data Science Process” GitHub: http://aka.ms/tdsp
Productive utilities repository: https://github.com/Azure/Azure-TDSP-Utilities
Sign up for a free VSTS account: http://www.visualstudio.com
Complete Cloudera resource library: https://www.cloudera.com/resources.html
Coursera Data Science: http://www.coursera.org