How to Work with Large Datasets to Build Predictive Models
Girish Nathan, Misha Bilenko
Microsoft Azure Machine Learning

Presentation transcript:


Agenda
1. How to Work with Large Datasets
   - Sample dataset: NYC Taxi
   - HDInsight (Hadoop on Azure)
   - IPython notebook and HDInsight
2. Building Predictive Models
   - Azure ML Studio
   - Learning with Counts
3. Putting It All Together: Learning with Counts and HDInsight

Sample Data: NYC Taxi
- One-year log of NYC taxi rides
- 60 GB, publicly available
- Trip data (driver id, times, locations) and fare data (fare, tip, tolls)
- Rest of tutorial: data wrangling and tip prediction
- Tools: AzCopy, HDInsight, IPython, Azure ML Studio

HDInsight: Hadoop on Azure
- 100% Apache Hadoop as an Azure service
- Can deploy on Windows or Linux
- Provides MapReduce capability over big data in Azure blobs
- Head node: job and cluster monitoring
- Hive: SQL-like queries as an alternative to writing code

    SELECT Col1, COUNT(*) AS Count_Col1
    FROM Your_Table
    GROUP BY Col1
    ORDER BY Count_Col1 DESC
    LIMIT 10;
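To make the Hive query above concrete, here is a minimal Python sketch of what it computes: the ten most frequent values of a column, with their counts. The rows and the `top_values` helper are hypothetical and exist only for illustration; on a 60 GB dataset the aggregation would run in Hive on the cluster, not in memory.

```python
from collections import Counter


def top_values(rows, column, limit=10):
    """Return the `limit` most frequent values of `column` with their counts."""
    counts = Counter(row[column] for row in rows)
    return counts.most_common(limit)


# Hypothetical toy rows standing in for Your_Table.
rows = [
    {"Col1": "card"}, {"Col1": "cash"}, {"Col1": "card"},
    {"Col1": "card"}, {"Col1": "cash"}, {"Col1": "voucher"},
]

top = top_values(rows, "Col1")
print(top)
```

`Counter.most_common` plays the role of `GROUP BY ... ORDER BY ... DESC LIMIT`.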

IPython Notebook
- Web-based Python REPL environment
- Combines authoring, execution, visualization
- Can author and execute HDInsight Hive queries

Sample query (Python code snippet):

    # Python 2 methods of a helper class (class body not shown on the slide)
    import json
    import urllib2

    def submit_hive_query(self):
        # POST the Hive parameters and remember the returned job id
        response = urllib2.urlopen(self.url, self.hiveParams)
        data = json.load(response)
        self.hiveJobID = data['id']

    def query(self, queryString):
        self.submit_hive_query()

Example query string: SELECT * FROM sample_table LIMIT 10;
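The snippet on the slide uses Python 2's `urllib2`. As a rough Python 3 sketch of the same idea, the code below builds (but does not send) a POST request for a Hive query and parses a job id out of a JSON response. It assumes the cluster exposes the WebHCat (Templeton) `hive` resource with the `user.name`, `execute`, and `statusdir` parameters; the cluster URL, user name, and status directory are placeholders, and authentication is left out entirely.

```python
import json
import urllib.parse
import urllib.request


def build_hive_request(cluster_url, user, query, status_dir="/hivestatus"):
    """Build a POST request that would submit `query` to WebHCat (assumed endpoint)."""
    params = urllib.parse.urlencode({
        "user.name": user,        # assumed WebHCat parameter
        "execute": query,         # the Hive query string
        "statusdir": status_dir,  # placeholder output directory
    }).encode("utf-8")
    # Supplying `data` makes urllib issue a POST.
    return urllib.request.Request(cluster_url + "/templeton/v1/hive", data=params)


def parse_job_id(response_body):
    """Extract the job id from the JSON body returned on submission."""
    return json.loads(response_body)["id"]


# Placeholder cluster URL and user; nothing is sent over the network here.
request = build_hive_request(
    "https://example-cluster.invalid", "admin",
    "SELECT * FROM sample_table LIMIT 10;",
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)` once credentials are configured for the cluster.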

What is Azure ML Studio?
- Fully managed cloud service
- Browser-based authoring of dataflow
- Best-in-class machine learning algorithms
- Support for R/Python/SQL
- Collaborative data science
- Quickly deploy models as web services/REST APIs
- Publish to a gallery for collaboration with the community

Learning with Counts, a.k.a. DRACULA (Distributed Robust Algorithm for CoUnt-based LeArning)
Misha Bilenko
Microsoft Azure Machine Learning, Microsoft Research

Large-Scale Learning in Multi-Entity Domains
- Information retrieval
- Advertising, recommending, search: item, page/query, user
- Transaction classification
- Payment fraud: transaction, product, user
- Spam: message, sender, recipient
- Intrusion detection: session, system, user
- IoT: device, location

Example record:
    adid =
    adText = K2 ski sale!
    adURL =
    Userid = 0xb dd9b
    IP =
    Query = powder skis
    QCategories = {skiing, outdoor gear}

Large-Scale Learning in Multi-Entity Domains

Example record:
    adid:
    adText: Fall ski sale!
    adURL:
    userid: 0xb dd9b
    IP:
    query: powder skis
    qCategories: {skiing, outdoor gear}

Learning with Counts
[Figure: count table keyed by IP, ending in a REST row.]

Learning with Counts: aggregation
[Figure: count tables keyed by IP, by query (facebook, dozen roses, ...), by Query × AdId, and by IP[2] prefix (*.*), each ending in a REST row; a timeline labeled "Counting" runs up to T_now.]

Learning with Counts: combiner training
- Train a non-linear model on count-based features
- Counts, transforms, lookup properties
- Additional features can be injected
[Figure: count tables keyed by IP, query, and Query × AdId, each ending in a REST row, feed a "Train predictor" step on a timeline running to T_now; inputs are aggregated features, IsBackoff, and the original numeric features.]
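The following is a minimal sketch of how count-based features of this kind might be derived for one categorical column. It is not the Azure ML implementation: the `REST` bucket name, the `min_count` threshold, the smoothing prior, and the choice of a log-odds transform are all assumptions made for illustration. Values seen too rarely are folded into the REST row, and lookups that fall back to it set an `is_backoff` flag.

```python
import math
from collections import defaultdict

REST = "__REST__"  # hypothetical key for the shared bucket of rare values


def build_count_table(values, labels, min_count=2):
    """Count negatives/positives per value; fold rare values into REST."""
    raw = defaultdict(lambda: [0, 0])  # value -> [n_neg, n_pos]
    for value, label in zip(values, labels):
        raw[value][label] += 1  # label must be 0 or 1
    table = {REST: [0, 0]}
    for value, (neg, pos) in raw.items():
        if neg + pos >= min_count:
            table[value] = [neg, pos]
        else:
            table[REST][0] += neg
            table[REST][1] += pos
    return table


def count_features(table, value, prior=1.0):
    """Look up counts for `value`, backing off to REST when it is unknown."""
    is_backoff = value not in table
    neg, pos = table[REST] if is_backoff else table[value]
    return {
        "n_neg": neg,
        "n_pos": pos,
        "log_odds": math.log((pos + prior) / (neg + prior)),  # smoothed transform
        "is_backoff": int(is_backoff),
    }


# Hypothetical toy click log: query strings and click labels.
queries = ["facebook", "facebook", "facebook", "dozen roses", "dozen roses", "rare query"]
clicks = [1, 0, 1, 0, 0, 1]

table = build_count_table(queries, clicks)
print(count_features(table, "facebook"))
print(count_features(table, "never seen"))
```

The resulting feature dictionaries, together with any original numeric features, would then be the inputs to whatever non-linear learner serves as the combiner.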

Prediction with counts
- Counts are updated continuously
- Combiner re-training is infrequent
[Figure: count tables keyed by IP, query, and URL × Country (url 1, US; url 2, CA; url 3, FR), each ending in a REST row; a timeline marks T_train and T_now; inputs are aggregated features, IsBackoff, and the original numeric features.]
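As an illustrative sketch of this split, the class below keeps its count table up to date with every observed event while the combiner stays fixed between re-trainings. The class name, the composite key format, and the stand-in combiner (a smoothed positive rate rather than a trained non-linear model) are all hypothetical.

```python
from collections import defaultdict


def smoothed_rate(neg, pos):
    """Stand-in combiner: a smoothed positive rate instead of a trained model."""
    return (pos + 1) / (neg + pos + 2)


class CountingPredictor:
    """Counts change with every event; the combiner is only swapped on re-training."""

    def __init__(self, combiner):
        self.counts = defaultdict(lambda: [0, 0])  # key -> [n_neg, n_pos]
        self.combiner = combiner

    def observe(self, key, label):
        # Cheap continuous update: increment one counter (label is 0 or 1).
        self.counts[key][label] += 1

    def predict(self, key):
        # .get avoids creating an entry for keys that were never observed.
        neg, pos = self.counts.get(key, (0, 0))
        return self.combiner(neg, pos)


predictor = CountingPredictor(smoothed_rate)
before = predictor.predict("url1|US")
predictor.observe("url1|US", 1)
predictor.observe("url1|US", 1)
after = predictor.predict("url1|US")
print(before, after)
```

The prediction for a key shifts as its counts change, even though the combiner itself has not been retrained.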

What is great about Learning with Counts?
- State-of-the-art accuracy
- Good fit for map-reduce
- Modular (vs. monolithic)
  - Learner can be tuned/monitored/replaced in isolation
- Monitorable, debuggable (this is HUGE in practice!)
  - Temporal changes easy to monitor
  - Easy emergency recovery (remove bot attacks, etc.)
- Decomposable predictions
  - Error debugging (which feature can we blame...)

Learning with Counts: in Azure ML

Putting it all together
- HDInsight: large data storage and map-reduce processing
- Azure ML: cloud ML and analytics accessible anywhere
- Learning with Counts: intuitive, flexible large-scale ML solution

Thanks for your time

Useful links:
- Sign up for your free Azure ML trial
- Free tutorial on how to use Azure ML
- Need Azure ML for teaching in the classroom? Contact the speakers
- Other questions? Contact the speakers

Speakers: Misha Bilenko, Girish Nathan