Presentation is loading. Please wait.

Presentation is loading. Please wait.

Building 1 million predictions per second using SQL-R

Similar presentations


Presentation on theme: "Building 1 million predictions per second using SQL-R"— Presentation transcript:

1 Building 1 million predictions per second using SQL-R
11/30/2017 1:54 AM Building 1 million predictions per second using SQL-R Amit Banerjee Sr. PM Manager, Data Group (TIGER) © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Agenda Data Science Process Bringing Analytics to Data
Demo Using Lending Club Data Optimization Tips

3 Data Science Process – CRISP-DM
Data scientist is able to perform all these steps in SQL Server 2016 CRISP-DM Model– General data science framework

4 In-Database Analytics at Scale
Jack Henry A leading provider for banking solutions for credit unions across Americas Business User 20M Vehicle Loans Age, Original Balance, Interest Rate, Loan Remaining Months, Credit Score R Store Predictions Visualize ColumnStore In-Database Analytics at Scale In-memory OLTP PowerBI Dashboard Prepare for analytics Use Case: Predict vehicle loan charge off (default) based on attributes like interest rate, credit scores etc Input: A subset of 20 million row of vehicle loan data in SQL Server - columns including branch location, customer profiles, interest rate, loan age etc.. Expected Result: Probability score of loans get charged off (Higher the score, higher the probability of loan get charged off) Build PowerBI report using probability score to show healthiness of vehicle loans across different branches Build what if scenario in business application

5 Using SQL Server R Services
11/30/2017 1:54 AM Using SQL Server R Services Bringing Analytics to the Data Data already in SQL Use T-SQL know-hows to do ETL Use the power of in-memory OLTP and column store indexing to enhance speed of ETL RevoScaleR package to provide parallelism and scale Making the data travel Data sources not in SQL Data sinks not in SQL Complex ETL needed Long running R script © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

6 sp_execute_external EXEC sp_execute_external_script
@language = N'language' , @script = N'script', @input_data_1 = ] 'input_data_1' [ = ] N'input_data_1_name' ] [ = 'output_data_1_name' ] [ = 0 | 1 ] [ = ] data_type [ OUT | OUTPUT ] [ ,...n ]' [ = ] 'value1' [ OUT | OUTPUT ] [ ,...n ] [ WITH <execute_option> ] [;] <execute_option>::= { { RESULT SETS UNDEFINED } | { RESULT SETS NONE } | { RESULT SETS ( <result_sets_definition> ) } } <result_sets_definition> ::= { ( { column_name data_type [ COLLATE collation_name ] [ NULL | NOT NULL ] } [,...n ] ) | AS OBJECT [ db_name . [ schema_name ] . | schema_name . ] {table_name | view_name | table_valued_function_name } | AS TYPE [ schema_name.]table_type_name } EXEC sp_execute_external_script @language = N'R' = N'iris_data <- iris;' = N'' = N'iris_data' WITH RESULT SETS (("Sepal.Length" float not null, "Sepal.Width" float not null, "Petal.Length" float not null, "Petal.Width" float not null, "Species" varchar(100))); END; go

7 Building a 1 million predictions per second
Fast Models Correct attribute selection Fast Ingestion Fast Reads Uniform resource usage Use R Studio to design your models or you could even use T-SQL to design your models and test

8 Building a 1 million predictions per second
Fast Models Correct attribute selection Fast Ingestion Fast Reads Uniform resource usage Important to select the right attributes to ensure that the correct attributes are selected

9 Building a 1 million predictions per second
Fast Models Correct attribute selection Fast Ingestion Fast Reads Uniform resource usage In-Memory OLTP can help here. Staging tables can be schema-only.

10 Building a 1 million predictions per second
Fast Models Correct attribute selection Fast Ingestion Fast Reads Uniform resource usage Most of these models will read large datasets. The I/O can be reduced which tends to be slowest part using Columnstore indexes

11 Building a 1 million predictions per second
Fast Models Correct attribute selection Fast Ingestion Fast Reads Uniform resource usage You want all the CPUs to be used in parallel. This can be achieved with Resource Governor group assignment for the available CPUs/NUMA nodes. Pick the right CPU optimized machines

12 SQL Server as Scoring Engine
Deployment Using: Triggers Powershell scripts SQL agent jobs

13 DEMO Using public dataset of Lending Club
Using G5 instance of Azure Data Science VM (DSVM) Following Data Science Process using SQL Server 2016 R Services

14 References Loan Classification using SQL Server 2016 R Services
A walkthrough of Loan Classification using SQL Server 2016 R Services Using MicrosoftML in SQL-Server GitHub SQL Server Samples

15 Microsoft Data Amp WHERE DATA GETS TO WORK
Put data, analytics and artificial intelligence into the heart of your solutions. Get the latest on big data and machine learning innovations. Join us online April 19, 2017 at 8AM PT microsoft.com/data-amp

16 Questions?


Download ppt "Building 1 million predictions per second using SQL-R"

Similar presentations


Ads by Google