Slide 7: How to build a predictive model?
1. Define Business Problem
2. Prepare Data
3. Develop Model Through Iterations
4. Deploy Model
5. Monitor Model’s Performance
Business Insights
Slide 8: DDSG Solving Real Problems: Sample Client Engagements
OEM – Unlicensed Devices
- ROI, INSIGHT / WINDOWS 8 DEVICES
- Analysis of ROI and development of actionable insight for marketing spend in OEM channels, including manufacturers, retailers, and distributors
LCA – Cybercrime Unit
- PIRACY DETECTION / REVENUE GROWTH OPPORTUNITY
- Analyzing current trends in piracy of MS products and building models to identify instances of pirated software
Industry Stats
Windows Telemetry
- SEGMENTATION / CYCLE TIME REDUCTION
- Build a utilization-based customer segmentation by analyzing the clickstream from the Windows Telemetry panel
MS.COM - Targeting
- TARGETING / SURFACE TABLET, WINDOWS PHONE 8
- Target visitors who showed an interest in Surface, Windows Phone, or Xbox on the basis of their MS.com/MS Store behavior
CRM Online
- CHURN PREDICTION / PROACTIVE SUBSCRIBER RETENTION
- Building a predictive churn model for CRM Online customers to help with retention
ISRM - Security
- SECURITY / INTRUSION DETECTION
- Enhance ISRM security monitoring and incident response capabilities. Detect potential threats on the Microsoft corporate network.
Slide 11: Preventing Network Intrusion with Machine Learning
Problem: Detect suspicious activity on the network servers early and eliminate the threat.
Methodology:
- File system to store massive volumes of security data.
- Fully automated workflow to drive the end-to-end data receiving and transformation process.
- Analysis and visualizations of Windows Events to identify pre-defined threat scenarios.
- Move from descriptive analytics to a mature predictive archetype.
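As an illustration of what a "pre-defined threat scenario" over Windows Events might look like, the sketch below flags accounts with a burst of failed logons. It is not the team's actual pipeline: the record layout `(timestamp, event_id, account)`, the function name `flag_bruteforce`, the threshold, and the window are all assumptions made for the example. Event IDs 4625 (failed logon) and 4624 (successful logon) are the standard Windows Security log IDs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

FAILED_LOGON = 4625  # Windows Security log: an account failed to log on


def flag_bruteforce(events, threshold=5, window=timedelta(minutes=10)):
    """Return accounts with at least `threshold` failed logons inside `window`.

    `events` is an iterable of hypothetical (timestamp, event_id, account) tuples.
    """
    recent = defaultdict(deque)  # account -> timestamps of recent failures
    flagged = set()
    for ts, event_id, account in sorted(events):
        if event_id != FAILED_LOGON:
            continue
        failures = recent[account]
        failures.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(account)
    return flagged


# Made-up sample data: "alice" fails 5 times in 5 minutes, "bob" once an hour.
start = datetime(2013, 1, 1, 9, 0)
events = [(start + timedelta(minutes=i), 4625, "alice") for i in range(5)]
events += [(start + timedelta(hours=i), 4625, "bob") for i in range(5)]
events.append((start, 4624, "carol"))  # 4624 = successful logon, ignored
suspects = flag_bruteforce(events)
```

A rule like this is descriptive; the move to predictive analytics mentioned on the slide would replace the fixed threshold with a model trained on labelled incidents.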
Slide 12: Churn Analysis
Problem: A business line is experiencing 36% churn annually
Findings:
- Under-utilization (low usage) is a key leading indicator
- Each 1% reduction in churn results in ~$342K impact
Methodology:
- 40% of the data is missing or incomplete
- Enumerated the key leading indicators (drivers) of churn and scored every subscription with a probability of churn
- Developed a Random Forest model with ~65% accuracy
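The dollar figure above can be turned into a quick back-of-the-envelope estimate. The sketch below assumes two things the slide does not state: that "1%" means one percentage point of annual churn, and that the impact scales linearly. The function name `churn_reduction_impact` and the 30% target are illustrative only.

```python
ANNUAL_CHURN = 0.36         # 36% annual churn (from the slide)
IMPACT_PER_POINT = 342_000  # ~$342K per 1% reduction (from the slide)


def churn_reduction_impact(target_churn, current_churn=ANNUAL_CHURN):
    """Estimated dollar impact of moving churn from `current_churn` to `target_churn`.

    Assumes a linear relationship and that "1%" is one percentage point.
    """
    points = (current_churn - target_churn) * 100
    return round(points * IMPACT_PER_POINT)


# Hypothetical target: bring churn down from 36% to 30% (6 points).
impact = churn_reduction_impact(0.30)
```

Under these assumptions, a six-point reduction would be worth roughly six times the per-point figure.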
Slide 14: New Targeting Models Developed for Surface and Windows Phone
Targeting Models Delivered
Windows Phone
- Provided a list of cookies that are more likely to land on a Windows Phone page
- Monthly scoring for 3 months
Surface
- Provided a list of cookies that are more likely to buy a Surface
- Monthly scoring since April 20, 2013
Slide 15: Big Data Analysis
- 5 months of logs from Microsoft.com
- Analysis conducted using Power BI, SQL Server, and Hadoop
- Path analysis
- Understand the big picture of your website’s logs
- Text mining on external and internal queries
- Recognize your users quickly before their behavior changes
- Big data clustering models for user segmentation
- Big data predictive models for user behavior / targeting
- Do this for any sub-site, campaign, user segment, etc.
- Leverage the big data platform for ongoing model refinement
- Geography analysis (by Microsoft’s PowerMap)
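Path analysis over web logs can be sketched on a small scale as counting the most frequent runs of consecutive pages per visitor. This is a toy version in plain Python, not the Power BI / SQL Server / Hadoop workflow named on the slide; the row layout `(visitor_id, timestamp, page)`, the function name `top_paths`, and the page names are hypothetical.

```python
from collections import Counter, defaultdict


def top_paths(rows, length=3, n=3):
    """Return the `n` most common runs of `length` consecutive pages.

    `rows` is an iterable of hypothetical (visitor_id, timestamp, page) tuples.
    """
    visits = defaultdict(list)
    # Sorting orders each visitor's pages by timestamp.
    for visitor, _ts, page in sorted(rows):
        visits[visitor].append(page)
    counts = Counter()
    for pages in visits.values():
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts.most_common(n)


# Made-up log rows for three visitors.
rows = [
    ("a", 1, "/home"), ("a", 2, "/surface"), ("a", 3, "/store"),
    ("b", 1, "/home"), ("b", 2, "/surface"), ("b", 3, "/store"),
    ("c", 1, "/home"), ("c", 2, "/windows"), ("c", 3, "/download"),
]
paths = top_paths(rows)
```

At the scale of five months of site logs, the same grouping and counting would be expressed as a distributed job rather than an in-memory loop.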
Slide 17: Text Analysis
Queries in Microsoft.com were logged during a specific time range. The engineering team wanted to know the popular “topics” in this collection of queries (documents).
A text mining tool pre-processed 3 million queries and constructed 25 thematic topics formed from keywords. The 5 most popular topics are listed below.

Internal (i.e. on direct Microsoft pages)
Category | Topic Id | Doc cutoff | Terms cutoff | Topic | Num of terms | Num of queries
Multiple | 5.0 | 5.032 | 0.397 | +window, +live, windowsmedia, xp, aspx | 26.0 |
 | 15.0 | 3.074 | 0.304 | xp, +window, sp3, xp service pack, +download | 44.0 |
 | 13.0 | 3.353 | 0.316 | +window, +vista, +installer, +mobile, +phone | 77.0 |
 | 2.0 | 5.804 | 0.432 | +medium, +player, +window, +download, +window | 19.0 |
 | 4.0 | 4.999 | 0.402 | +office, +microsoft office, microsoft, +mac, +download | 24.0 |

External (i.e. referrals from Google, Yahoo, etc.)
Category | Topic Id | Doc cutoff | Terms cutoff | Topic | Num of terms | Num of queries
Multiple | 5.0 | 8.793 | 0.367 | +window, +phone, +bit, +theme, +install | 177.0 |
 | 9.0 | 8.133 | 0.343 | microsoft, +microsoft office, +microsoft word, +microsoft essential, +microsoft outlook | 140.0 |
 | 10.0 | 7.305 | 0.337 | +window, +phone, +installer, +vista, +server | 174.0 |
 | 25.0 | 3.152 | 0.228 | +error, +server, +file, +code, sharepoint | 545.0 |
 | 8.0 | 7.818 | | +download, +free, +window, +explorer, microsoft | 128.0 |
Slide 18: Text Analysis
Windows OS users, internal queries
This chart shows groups of similar queries. The chart has 15 end nodes in total, one for each of 15 groups. Almost all of these groups are product related.
Slide 21: Data Science Team Composition
…a key resource for delivering value to the enterprise and your business
Team Experience:
- Our Academic Backgrounds: Applied Mathematics, Computer Science, Econometrics, Statistics, Engineering
- Our Professional Expertise: Financial Services, Telecommunications, Information Technology, Industrials/Manufacturing, Utilities, Healthcare, Marketing
Domain Experience:
- Forecasting/Modeling
- Demand Forecasting
- Predictive Modeling
- Demand-Driven Planning
- Credit Modeling
- Fraud Detection
- Consumer Relations
- Sentiment Analysis/Social Media
- Inventory Optimization
- Customer Acquisition/Segmentation
- Membership Portfolio Optimization
- Clickstream Data Analysis
- Data Science
- Design of Experiments
- Predictive Maintenance
- Machine Learning
- Big Data Analytics/Innovation
Slide 23: Best Practices
Data Science is a team sport
- Hire complementary skills to build a rounded team!
We need a hybrid Data Science team structure for best results
- A centralized team of Data Scientists to share and promote best practices
- Data Scientists in Line of Business groups for domain knowledge
The Data Science team needs to be a peer of the BI team, not inside it
- An analytics team should span descriptive, diagnostic, predictive, and prescriptive analytics
- BI covers only descriptive and diagnostic analytics
- A Data Scientist in a BI team may be under-utilized
Slide 24: Summary
- Introduced the Data & Decision Sciences Group
- Data Science at Microsoft
  - Cybercrime and antipiracy
  - Network intrusion
  - Customer churn prediction
  - Customer targeting models
- Building a Data Science team
Slide 30: Advanced Telemetry Analytics for Windows
Problem:
- We needed a behavioral customer segmentation for Windows and Office
- Very large volumes of telemetry data are collected: over 1.7 billion mouse clicks and 2.4 billion keystrokes
Findings:
- Successfully developed 7 user behavioral segments
- Prioritize investments around the activities people do most
Methodology:
- How can we effectively mine and extract meaning from the data?
- Used clustering techniques to segment data that included hardware, app usage, user data, and URLs visited
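The slide does not say which clustering technique was used. As one possible illustration, the sketch below runs plain k-means on a handful of made-up per-user feature vectors. The feature choice (hours of app usage per day, share of time in the browser), the function name `kmeans`, and `k=2` are all assumptions for the example; the actual work produced 7 segments from far richer telemetry.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical features per user: (app hours per day, share of time in browser).
# Real features would need scaling so that no single one dominates the distance.
users = [(0.5, 0.9), (0.7, 0.8), (0.6, 0.85), (6.0, 0.2), (6.5, 0.1), (5.8, 0.15)]
labels, centroids = kmeans(users, k=2)
```

Here the light, browser-heavy users and the heavy app users fall into separate segments, which is the kind of grouping that could then guide where to prioritize investment.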