Team Data Science Process (TDSP) for Data Scientists

Presentation on theme: "Team Data Science Process (TDSP) for Data Scientists"— Presentation transcript:

1 Team Data Science Process (TDSP) for Data Scientists
Debraj GuhaThakurta (Microsoft), Carlos Medina (Microsoft), Brad Johnson (NewSignature)
Microsoft Ignite 2017: Tue, Sept 26, 9:00 AM – 10:15 AM EDT, Hyatt Plaza International G
Companion session: TDSP for DevOps (BRK3277), Buck Woody, Thu, Sept 28, 10:45 AM – 12:00 PM EDT, Hyatt Plaza International G
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Session goals: Value for attendees
- Understand the process challenges in data science
- How the Team Data Science Process (TDSP) can help
- How organizations are adopting and using TDSP to deliver data science solutions to customers

3 Agenda
- TDSP: objectives, components & adoption
- Demo: TDSP in action
  - TDSP documentation
  - How to use TDSP in Azure Machine Learning
- Adoption and use:
  - Microsoft Consulting Services
  - NewSignature

4 Team Data Science Process: objectives, components & adoption
Debraj GuhaThakurta, Senior Data Scientist

5 The opportunity and challenge of data science in enterprises
Opportunity: 17% had a well-developed predictive/prescriptive analytics program in place, while 80% planned on implementing such a program within five years (Dataversity 2015 survey).
Challenge: Only 27% of big data projects are regarded as successful (Capgemini, 2014).
Tools and data platforms have matured, but there is still a major gap in executing on the potential.

6 One reason: Process challenge in Data Science
Challenge areas: Organization, Collaboration, Quality, Knowledge Accumulation, Agility
- Global teams: geographic locations
- Team growth: onboard new members rapidly
- Varied use cases: industries and use cases
- Diverse DS backgrounds: data scientists have diverse backgrounds and experience with tools and languages
“Intelligent” application (ML/AI) development has unique complexity not always encountered in other software development scenarios.

7 Why is a process useful?
- A process is a detailed sequence of activities necessary to perform specific business tasks.
- It is used to standardize procedures and establish best practices.
- Technology and tools are changing rapidly. A standardized process can provide continuity and stability of workflow.
Based on discussions with Luis Morinigo, Dir. IoT, NewSignature

8 Data Science can borrow processes from DevOps
- Integrated software development and operations (DevOps) has had much more time to mature, standardize, build in efficiency, and develop best practices.
- Data science has unique complexity, but it can learn standardized processes from DevOps.

9 DevOps: the three stage conversation
1 People, 2 Process, 3 Products
DevOps is the union of People (culture), Process, and Products to continuously deliver value to users.

10 DevOps process components
Stages: Configure, Code, Build, Test, Package, Deploy, Monitor
Practices: Infrastructure as Code (IaC), Automated Testing, Continuous Integration, Continuous Deployment, Release Management, App Performance Monitoring
AI / ML / DL
Adoption of DevOps in data science should help efficiency.
See: TDSP for DevOps, BRK3277, Buck Woody

11 TDSP objective
Integrate DevOps with the data science workflow to improve collaboration, quality, and productivity of data science teams.
TDSP = Data Science + DevOps
- Data Science: data ingestion, data exploration, modeling (experimentation), model deployment & consumption
- DevOps: Infrastructure as Code (IaC), Automated Testing, Agile

12 TDSP components for data science teams
- Standardized Data Science Lifecycle
- Project Structure, Templates & Roles
- Infrastructure
- Re-usable Data Science Utilities

13 Data Science lifecycle
Primary lifecycle stages:
- Business Understanding
- Data Acquisition and Understanding
- Modeling
- Deployment

14 TDSP lifecycle stages can be integrated with specific deliverables & checkpoints
- Business Understanding: Project Objective; Data, Target & Feature Definition; Data Dictionary
- Data Acquisition and Understanding: Data Quality Report; Architecture Diagram (initial draft)
- Modeling: Featurization; Modeling Report; Source code repo
- Deployment: Predictive Model; Final model reporting; Final deployment platform & architecture
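The stage-to-deliverable mapping on this slide can be treated as a simple checkpoint table. The following is a minimal sketch, not part of TDSP itself: the stage and deliverable names come from the slide, while `TDSP_DELIVERABLES`, `missing_deliverables`, and `checkpoint_passed` are hypothetical names introduced here for illustration.

```python
# Stage -> deliverables, copied from the slide. The helper functions below
# are hypothetical and only illustrate how a checkpoint could be evaluated.
TDSP_DELIVERABLES = {
    "Business Understanding": [
        "Project Objective",
        "Data, Target & Feature Definition",
        "Data Dictionary",
    ],
    "Data Acquisition and Understanding": [
        "Data Quality Report",
        "Architecture Diagram (initial draft)",
    ],
    "Modeling": [
        "Featurization",
        "Modeling Report",
        "Source code repo",
    ],
    "Deployment": [
        "Predictive Model",
        "Final model reporting",
        "Final deployment platform & architecture",
    ],
}


def missing_deliverables(stage, completed):
    """Return the deliverables of `stage` that are not in `completed`, in slide order."""
    done = set(completed)
    return [d for d in TDSP_DELIVERABLES[stage] if d not in done]


def checkpoint_passed(stage, completed):
    """A stage checkpoint passes when none of its deliverables are missing."""
    return not missing_deliverables(stage, completed)
```

A team might call `missing_deliverables("Modeling", ["Modeling Report"])` at a stage review to list what is still outstanding before moving on.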

15 Project structure and templates
- Working directory template
- Documentation template (example): business understanding & problem scope definition
- Clone the template repository to initiate a new project
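To make the idea of a standard working directory concrete, here is a minimal sketch that creates a project skeleton on disk. The slide does not list the template's folders, so every directory and file name below is an illustrative assumption that simply mirrors the lifecycle stages; in practice a team would clone its own template repository instead.

```python
from pathlib import Path

# Hypothetical layout: these folder names are assumptions for illustration,
# not the contents of the actual TDSP template repository.
TEMPLATE_DIRS = [
    "code/data_acquisition_and_understanding",
    "code/modeling",
    "code/deployment",
    "docs/project",
    "docs/data_report",
    "docs/model",
    "sample_data",
]


def scaffold_project(root):
    """Create the assumed directory skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for rel in TEMPLATE_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run on an existing project
        created.append(path)
    # Seed a placeholder charter so the documentation template has a starting point.
    charter = root / "docs" / "project" / "charter.md"
    if not charter.exists():
        charter.write_text("Project charter\n\nBusiness understanding and problem scope go here.\n")
    return created
```

Calling `scaffold_project("fraud-detection")` would create the folders under a new `fraud-detection` directory; running it again leaves existing files untouched.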

16 Project roles & tasks
Governance and Project Management
- Team lead: Git template repo & server management, access control
- Project lead: business understanding, create project, work items
Data Science and Engineering
- Data scientist: modeling, exploratory analysis
- Data engineer or architect: data ingestion, deployment, solution architecture

17 Agile work planning and execution template
Use the Agile work planning & execution template (data science specific):
- DS Projects: e.g. “Fraud Detection for Customer ABC”
- DS Stages: correspond to the stages in the TDSP lifecycle
- DS Stories: correspond to the lifecycle sub-stages
- DS Tasks: assignable code or document work items needed to complete a specific data science story
Scrum framework
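The Project, Stage, Story, Task hierarchy described above maps naturally onto nested records. The sketch below is one possible in-memory model under that assumption; the class names, fields, and the `open_tasks` helper are hypothetical and are not the schema of any work-tracking tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Task:
    """An assignable code or document work item."""
    title: str
    done: bool = False


@dataclass
class Story:
    """Corresponds to a lifecycle sub-stage."""
    name: str
    tasks: List[Task] = field(default_factory=list)

    def is_done(self) -> bool:
        # A story with no tasks is treated as not started rather than done.
        return bool(self.tasks) and all(t.done for t in self.tasks)


@dataclass
class Stage:
    """Corresponds to a TDSP lifecycle stage."""
    name: str
    stories: List[Story] = field(default_factory=list)


@dataclass
class Project:
    name: str
    stages: List[Stage] = field(default_factory=list)

    def open_tasks(self) -> List[Tuple[str, str, str]]:
        """List (stage, story, task) for every task that is not done yet."""
        return [
            (stage.name, story.name, task.title)
            for stage in self.stages
            for story in stage.stories
            for task in story.tasks
            if not task.done
        ]


project = Project(
    "Fraud Detection for Customer ABC",
    stages=[
        Stage(
            "Business Understanding",
            stories=[
                Story(
                    "Define objectives",
                    tasks=[Task("Write project charter", done=True), Task("List data sources")],
                )
            ],
        )
    ],
)
```

Here `project.open_tasks()` returns the single unfinished task together with its stage and story, which is the kind of roll-up a project lead would track.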

18 Tracking progress with Power BI dashboards
Power BI content pack for VSTS: tool for PM

19 Shared and distributed infrastructure
- Virtual machines (VMs) or clusters are disposable compute, added to projects as needed.
- A many-to-many relationship between data scientists, VMs, and projects is possible.
- Data is typically stored in cloud stores, such as blob storage or a database.
- Project artifacts and code are permanently stored in central Git (version control) repositories.

20 DSVM (Data Science Virtual Machine) Example cloud compute resource
An Azure virtual machine (VM) image pre-installed and configured with data science tools:
- Spark
- Microsoft R Server Developer Edition
- Anaconda Python distribution
- Jupyter notebook (with R and Python kernels)
- Visual Studio Community Edition
- Power BI Desktop
- SQL Server 2016 Developer Edition
- Machine learning and data analytics tools
- Deep learning toolkits

21 Collaborative development guidelines
TDSP Git template: version control and review
- Git is a version control system.
- Each repo contains the full change history.
- Git is used in a distributed way, with a single remote repo and several local repos (on a local machine or a VM).
Integrated Agile planning & code development

22 Re-usable data science utilities: Analytics
Interactive data exploration and reporting: IDEAR (Python, R, MRS)
- Data quality assessment
- Getting business insights from the data
- Association between variables
- Generating standardized data quality reports automatically
- Clustering
- Distribution assessment
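As a rough illustration of what an automated data quality assessment computes, the sketch below summarizes missing and distinct values per column using only the standard library. It is not IDEAR and does not reproduce its reports; `data_quality_report` and the rule that `None` and empty strings count as missing are assumptions made for this example.

```python
def data_quality_report(rows):
    """Summarize missing and distinct values per column for a list of dict rows.

    Assumption for this sketch: a value is "missing" when it is None, an empty
    string, or the column is absent from the row.
    """
    # Collect column names in first-seen order so the report is stable.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    n = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        missing = n - len(present)
        report[col] = {
            "missing": missing,
            "missing_pct": round(100.0 * missing / n, 1) if n else 0.0,
            "distinct": len(set(present)),
        }
    return report


sample = [
    {"age": 34, "state": "WA"},
    {"age": None, "state": "WA"},
    {"age": 51, "state": ""},
    {"age": 34},
]
quality = data_quality_report(sample)
```

For the sample rows, `quality["age"]` reports one missing value and two distinct values, while `quality["state"]` reports two missing values, since both the empty string and the absent key count as missing under the stated assumption.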

23 Re-usable data science utilities: Analytics - modeling
Automated modeling and reporting: AMAR (R)
- Predicted vs. actual (multiple algorithms)
- Feature importance (multiple algorithms)

24 TDSP in action: E2E worked-out samples
- Azure Machine Learning
- Azure HDInsight Spark
- Azure HDInsight Hadoop
- SQL Server with R and Python
- Azure SQL Data Warehouse
- Azure Data Lake

25 Adoption: How to stage (as needed)
Data science teams may stage adoption as follows:
Level 1
- One git repository per project
- Standard directory structure
- Standardized templates like charter, exit reports
- Planning and tracking of work items
Level 2
- Customize templates to fit team needs
- Create shared team utility repo (like IDEAR, AMAR)
Level 3
- Develop process to graduate code from projects to the shared team utility repo
- Develop E2E worked-out templates
- Use mature work planning and tracking system (e.g. Agile)
Level 4
- Link git branch with work items
- Code review
- Manage and version model and data assets
- Develop automated testing framework

26 Adoption: Organizations using TDSP
Microsoft
- Microsoft Consulting Services (MCS)
- AI & R Cloud Platform: Algorithms and Data Sciences (ADS) team
- Windows Devices Data Science team
Partners
- New Signature
- BlueGranite

27 Summary
TDSP components, guidelines, and E2E samples ease process challenges in data science solutions.
Data science challenges: Organization, Collaboration, Quality, Knowledge Accumulation, Agility
TDSP components: Standardized Data Science Lifecycle; Project Structure, Templates & Roles; Infrastructure; Re-usable Data Science Utilities
Guidelines & E2E worked-out samples

28 Demo
- TDSP documentation and E2E samples
- Using TDSP in Azure Machine Learning
Debraj GuhaThakurta

29 TDSP documentation: https://aka.ms/tdsp

30 Azure Machine Learning: New features (Preview)
Experimentation service
Lifecycle management and tracking of jobs running locally or in the cloud. Full dependency management lets you reproduce the same environment locally, then scale up or out to the cloud. Jobs can be dispatched locally, remotely on Data Science VMs, or on HDInsight clusters to leverage Spark. Run history is checkpointed in git, along with all of the metadata and output of the job, letting you go back in time to see how your experiments have evolved.
Model management service
The Model Management service enables model deployment, management, and monitoring for models built anywhere. It uses Docker, which makes it easy to deploy to single machines for dev/test environments, to scale out on top of Kubernetes clusters running in Azure Container Service, or to deploy to other places (on-premises, edge) that can run Docker containers. You control how the deployment environment is customized, whether you want to deploy 1000 services hosting your models on a cluster, or one service scaled out across 1000 nodes. Deployments and upgrades can be managed across a large repository of models.
Workbench application
The Workbench application is a desktop-based companion application that provides a data preparation experience, hosts Jupyter notebooks, and serves as the control panel for dispatching and monitoring your training jobs. The data preparation experience is tuned for data science, enabling easy exploration and transformation of your data. It was built in collaboration with teams in MSR. The run history feature enables rich visualization of your experiment history, as well as run comparison to see how things have changed over time.
Related sessions:
- First look at What’s New in Azure Machine Learning (BRK2270): Monday, September 25, 4:00 PM - 5:15 PM, Hyatt Regency Windermere X
- Operationalize your models with Azure Machine Learning (BRK2290): Tuesday, September 26, 4:00 PM - 5:15 PM, Hyatt Plaza International G
- Increase the rate of experimentation with Azure Machine Learning (BRK3319): Tuesday, September 26, 10:45 AM - 12:00 PM, Hyatt Plaza International I-K

31 Using TDSP within Azure Machine Learning

32 Using TDSP within Azure Machine Learning

33 TDSP template & samples in Azure ML gallery
- TDSP template for Azure Machine Learning (for creating your new projects)
- TDSP sample: US Income classification
- Biomedical entity recognition (NLP + DL), Spark and DSVM

34 TDSP Adoption in Microsoft Consulting Services (MCS)
Carlos Medina, Architect

35 Microsoft Consulting Services
MCS puts customer outcomes at the center of everything, with the mission to “Empower every person and every organization on the planet to achieve more”.
- 191 countries
- 46 languages
- 35,000 global partners
- 1/5 of Microsoft
- 21,000+ global employees
- 5,000+ consultants and enterprise architects
- 9,000+ professional services and customer support
- 3,000+ Premier field engineers
- 75% Fortune 1,000

36 Microsoft Consulting Services Solution Areas
- Modern Workplace: Modern Workplace enables the cultural and technology changes needed to empower your employees, connect the organization, and innovate securely across location and workstyle boundaries.
- Business Apps: Learn how Dynamics can help organizations digitize business-critical functions including relationship sales, talent and people processes, operations, customer service, field service, and more.
- Apps and Infrastructure: Gain a deeper understanding of hybrid and public cloud infrastructure and application development sales plays and solutions.
- Data and AI: The core currency of any business will be the ability to convert data into AI that drives competitive advantage. Check our solutions that include business insights, advanced analytics, and IoT. Data Insights Customer Use Cases and Scenarios.

37 Data & AI: What Do We Do?
Enable our customers to grow faster and perform better by freeing up the power of data and making it accessible and usable.
- Advanced Analytics, Internet of Things, and AI: Helping customers open new business value through capturing, storing, and analyzing enormous data volumes and bringing new solutions to market, connected to the intelligent cloud and focused on the analytics and insights of those solutions.
- Business Intelligence, Data Insights and Visualization: Helping customers drive decisions and actions from their data. Driving information worker productivity by making data, decision-making, and engaging customer and employee experiences available seamlessly across all devices and platforms.
- Data Platform Modernization: Helping customers maximize their value in our SQL and Azure platforms by modernizing data, solutions, and platforms, including migrating data and applications from competing platforms.

38 Data Analytics Platform Modern Definition
A collection of managed technologies, practices, and patterns that allow organizations to ingest, model, and serve analytics to consumers that drive action.
FOUNDATIONAL ANALYTICS
- Ingest, transform, store, and aggregate structured data
- Well understood and widely practiced
- Often inflexible and rigid, with scale problems and limited context scope
ADVANCED ANALYTICS
- Includes semi-structured and unstructured data at volume
- Enables advanced analytics with a robust ingestion, modeling, and serving layer
DATA CULTURE
- Historic > Predictive > Prescriptive
- Information, Action: decisions and actions are guided by analytics
- Predictions, Discovery, Advice, New Approaches
DATA ARCHITECTURE
- Regulations, Security, Governance
- Digital transformation via managed data exploration

39 Data Analytics Platform Organizational Challenges
TALENT WITH NO STRATEGY
- Operationalization knowledge and practices
- Marginalized and underused resources
- Underutilized acquired technology
SERVERS instead of SERVICES
- Splitting servers/solution focus
- Effort spent on "keeping lights blinking"
- Rigidity in answering timely business questions
NO DATA CULTURE
- Hippo challenge
- Those closest to the data are the farthest away from those with questions
- Importance and impact of data analytics is trivialized
REACTION VS STRATEGY
- Changes driven by necessity instead of desired outcomes
- Driven by technical groups instead of by business questions
- Ready, Fire, Aim
Governance, management, integrity, and protection against misuse or theft
Discovery, competitive research, used for digital transformation
Leveraging digital transformation for competitive gains means a balance between data security and discovery, starting with a single version of the truth.

40 Using TDSP in MCS SDMPlus
The Microsoft Services Delivery Methodology Library (SDM) is the foundational methodology for delivering services projects and engagements. It builds and expands upon the already successful concepts and principles introduced by the Microsoft Solutions Framework (MSF). It helps professionalize MCS services by:
- Providing consistent, repeatable processes
- Improving integration for more predictable experiences
- Increasing customer and partner satisfaction
- Improving engagement performance
- Reducing time to deliver a solution
- Accelerating team readiness

41 TDSP into SDMPlus Business Understanding
Main goals in this phase:
- Work with the customer and other stakeholders to understand and identify the business problem.
- Clearly and explicitly specify the model objective(s) as a sharp question that drives the customer engagement.
- Find relevant data sources that help answer the questions that define the objective(s) of the project.
- Establish the business strategy context within which this project exists. This strategic context provides a value setting for judging the project's success and for suggesting extended or not directly related projects in the future.
Main tasks to address:
- Define the business objective(s) as a sharp question.
- Identify known and available data sources, with exact or estimated data sizes, that help answer the question.
Deliverables:
- Project Objective: a (usually) one-page document clearly stating the question(s) of interest and how the expected answer will look.
- Data and Feature Definition: the initial version of the Data and Feature Definition document, quantifying the sources for the raw data.
- Data Dictionary: a document providing the description and the schema (data types, information on validation rules, if any) for the data used to answer the question.

42 TDSP into SDMPlus Data Acquisition & Understanding
Main goals in this phase:
- A clean, high-quality data set, located in the analytics environment and ready to model, whose relations to the target variables are understood.
- A solution architecture for the data pipeline that refreshes and scores data regularly.
Main tasks to address:
- Ingest the data into the target analytics environment.
- Explore the data to determine whether the data quality is adequate to answer the question.
- Set up a data pipeline to score new or regularly refreshed data.
Deliverables:
- Data Quality Report: a document detailing data requirements, quality (accuracy, connectedness), variable ranking and relevance to the target, and the ability to answer the question of interest.
- Analytics Solution Architecture Diagram (initial draft): a diagram or description of the data pipeline used to run scoring or predictions on new data once a model has been built.

43 TDSP into SDMPlus Data Modeling
Main goals in this phase:
- Optimal data features for the machine learning model.
- A machine learning model that predicts the business objective most accurately.
- Start the development activities to implement the data pipeline (optional, if scoped).
Main tasks to address:
- Feature engineering and selection: create data features from the raw data to facilitate model training.
- Model training: find the model that answers the question most accurately by comparing the candidates' success metrics, and determine whether that model is appropriate.
- Start developing the data pipeline that ingests the data into the storage environment for analytics.
Deliverables:
- Data and Feature Definitions: update the document with the feature sets developed for the modeling.
- Modeling Report: for each model that is tried, a standard report following a specified template.
- Source code (initial draft artifacts): the query-language or other programming source code that produces the model features and the model targets.
- Analytics Solution Architecture Diagram: update the document based on new information obtained from the analysis.

44 TDSP into SDMPlus Deployment & Operationalization
Main goal in this phase:
- Models and pipeline are deployed to a production or production-like environment.
Main tasks to address:
- Operationalize the model.
- Deploy the data pipeline components and test the end-to-end scenario.
Deliverables:
- Predictive model adapted for retraining.
- Final model reporting with deployment details.
- Final solution architecture document.
- The source code of the data pipeline and visualization artifacts.

45 TDSP in a modern approach Governance model in VSTS
Use Iterative/Agile to plan activities. Use VSTS to create all the steps and activities:
- Epic: corresponds to the project engagement. For example, “Create a Fraud Detection system”.
- Feature: corresponds to the project engagement and work-stream. If you are executing the data science work-stream, write “Predictive Model for Fraud Detection”.
- PBI/Story: corresponds to the step. For example, Business Understanding.
- Task: assignable code or document work items, or other activities, that need to be done to complete a specific story.
- Bug: usually a fix needed for existing code or documents, made when completing a task. A bug can escalate to a story or a task if it is caused by missing stages or tasks, respectively.
Use KANBAN to track the PBI/Stories:
- New: the PBI/Story needs to be executed.
- In Progress: the PBI/Story is in progress.
- Waiting/Feedback: the PBI/Story has been executed and is waiting for feedback from the next step. If the next step requires re-execution, the PBI should be moved back to the “In Progress” state.
- Done: the PBI/Story can move into Done only if the final predictive model responds to the objective.
Track the bugs using VSTS. They can be associated with the steps, and VSTS keeps the history.
The total effort can be sized as the sum of all the PBI/Stories or activities. Since the customer has access to the KANBAN dashboard, the actual work is transparent.
Use the end of each step as a milestone with the customer. Remember that all the steps can be moved to the Done state only if the final Data Model step is successfully executed.
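The Kanban rules above amount to a small state machine plus an effort roll-up. The sketch below is one way to express them, under stated assumptions: the four state names and the Done condition come from the slide, but the exact transition table, the `KanbanStory` class, and the `model_meets_objective` flag are hypothetical and do not reflect how VSTS implements its boards.

```python
from enum import Enum


class State(Enum):
    NEW = "New"
    IN_PROGRESS = "In Progress"
    WAITING_FEEDBACK = "Waiting/Feedback"
    DONE = "Done"


# Assumed transition table: the slide names the states and the move back from
# Waiting/Feedback to In Progress; the remaining arrows are an interpretation.
ALLOWED = {
    State.NEW: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.WAITING_FEEDBACK},
    State.WAITING_FEEDBACK: {State.IN_PROGRESS, State.DONE},
    State.DONE: set(),
}


class KanbanStory:
    """A PBI/Story with an effort estimate and a Kanban state."""

    def __init__(self, name, effort):
        self.name = name
        self.effort = effort
        self.state = State.NEW

    def move_to(self, new_state, model_meets_objective=False):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {new_state.value}")
        # Done is only reachable once the final predictive model meets the objective.
        if new_state is State.DONE and not model_meets_objective:
            raise ValueError("story cannot be Done until the model meets the objective")
        self.state = new_state


def total_effort(stories):
    """Size the engagement as the sum of all story efforts."""
    return sum(story.effort for story in stories)
```

Gating the Done transition on an explicit flag keeps the slide's rule visible in code: a story can sit in Waiting/Feedback indefinitely, and it closes only when the final model is confirmed to answer the objective.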

46 TDSP in a modern approach Delivering results
Extending from CRISP-DM, Microsoft’s Team Data Science Process generally follows these steps.
Phases: PRE-SPRINT, SPRINT 0, SPRINT 1 - N, OPERATIONALIZE
Activities shown in the slide diagram:
- Workshops to develop a BUSINESS UNDERSTANDING
- Workshops to develop a DATA UNDERSTANDING
- Solution design and system architecture planning
- Data engineering and DATA PREPARATION
- Advanced DATA PREPARATION and MODELING planning
- MODELING
- EVALUATION and eventually DEPLOYMENT
- CONTINUED ITERATIONS

47 Demo - Using SDMPlus Carlos Medina

48 Data Analytics Platform How it’s done
Stages: INGESTION (BATCH/SPEED), SCHEMATIZATION, MODELING, SERVING
Azure Analysis Services: multi-dimensional models and pre-aggregation for reporting
Traditional report authoring, views, and pre-aggregated entities
Excel, Power BI (and other) reporting tools for human usage
Azure Data Factory: hybrid cloud data ingestion and transformation
Structured data with changing schema, or unstructured data in “big” quantities
Data Warehouse: read-optimized schema for relational engines
All data stored in an Azure Data Lake or other platforms with no schema other than the original
Azure Data Lake Compute or other platforms allow creation of analysis data sets and application of schema
Excel, Power BI (and other) reporting tools for human usage, or inclusion of API calls from apps or other analytical services
Micro-batch: processed in memory and persisted to storage
Azure Machine Learning or other ML platforms
Streaming data from hot path persisted to cold storage or sent to serving layer
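The speed-path idea on this slide (micro-batches processed in memory, raw data persisted to cold storage, results sent to the serving layer) can be sketched in a few lines. This is a toy illustration under stated assumptions: plain Python lists stand in for the data lake and the serving layer, and the event field `value` and the function names are hypothetical, not part of any Azure API.

```python
from statistics import mean
from typing import Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group a stream of events into fixed-size micro-batches."""
    batch: List[Dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def run_speed_path(events: Iterable[Dict], size: int,
                   cold_store: List[Dict], serving: List[Dict]) -> None:
    """Persist raw events to cold storage and push an aggregate to serving."""
    for batch in micro_batches(events, size):
        cold_store.extend(batch)  # raw data kept with its original schema
        serving.append({
            "count": len(batch),
            "mean_value": mean(e["value"] for e in batch),
        })


cold_store: List[Dict] = []   # stand-in for a data lake
serving: List[Dict] = []      # stand-in for the serving layer
stream = ({"value": v} for v in [1, 2, 3, 4, 5])
run_speed_path(stream, size=2, cold_store=cold_store, serving=serving)
```

With five events and a batch size of two, the serving list receives three aggregates while the cold store keeps all five raw events.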

49 TDSP Adoption in NewSignature Brad Johnson, Senior Data Scientist


51 Applying TDSP: New Signature

52 The Connected Enterprise

53 Pillars of IoT & Advanced Analytics
Things Devices Connectivity Gateway Network Data Storage Streaming Analytics Machine Learning Insights → Action Visualizations Automation

54 Structure: The Data Science lifecycle

55 Tasks & artifacts

56 Case Study: Internet of Twizzlers
Can IoT & Machine Learning combine to help reduce variability in Twizzler production?

57 Problem definition
Objectives:
Predict directional change in weight using Extruder data (temperature, pressure, torque, etc.)
Use predictions to modulate the Extruder, reducing variability
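The two objectives can be illustrated with a minimal sketch: deriving a directional-change target from a weight series, and nudging a setpoint against a predicted drift. The slides do not describe the actual model or control rule, so everything here is an assumption: the function names, the `tolerance` and `step` parameters, and the idea that lowering the setpoint counteracts a predicted weight increase.

```python
from typing import List


def direction_labels(weights: List[float], tolerance: float = 0.0) -> List[int]:
    """Label each step +1 (weight rises), -1 (falls) or 0 (within tolerance)."""
    labels = []
    for previous, current in zip(weights, weights[1:]):
        delta = current - previous
        if delta > tolerance:
            labels.append(1)
        elif delta < -tolerance:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def adjust_setpoint(setpoint: float, predicted_direction: int,
                    step: float = 0.5) -> float:
    """Nudge a hypothetical extruder setpoint against the predicted drift."""
    return setpoint - step * predicted_direction


labels = direction_labels([10.0, 10.4, 10.4, 9.9])
new_setpoint = adjust_setpoint(100.0, predicted_direction=1)
```

In a real project the labels would be the training target for a classifier fed with extruder readings such as temperature, pressure, and torque; here they are computed directly from the weights only to show the shape of the target.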

58 Data acquisition & modeling
Diagram components: Extruder Data, IoT Hub, Azure ML Studio, Weight Data

59 Deployment
Diagram components: API Endpoint, Azure ML Studio, IoT Hub, Event Hubs, Extruder, Stream Analytics, Web Jobs

60 Results & takeaways
Key Results:
Proved predictive capability
Laid groundwork for other IoT & AA efforts
Where TDSP Helped:
Project structure
Common language for collaboration
Alignment to Azure ML

61 Thank you!
Lifecycle stages: Business Understanding, Data acquisition & understanding, Modeling (ML/AI), Deployment
https://aka.ms/tdsp
For issues and questions:
Companion session: TDSP for DevOps, BRK3277, Buck Woody, Sept 28, Thu, 10:45 AM – 12 PM EDT, Hyatt Plaza International G

