Developing More Intelligent Applications Using Data Mining
Rafal Lukawiecki, Strategic Consultant, Project Botticelli Ltd


1 Developing More Intelligent Applications Using Data Mining
Rafal Lukawiecki, Strategic Consultant, Project Botticelli Ltd
rafal@projectbotticelli.co.uk

2 Objectives & Agenda
- Introduce the idea of using Data Mining for application development
- Explain DM terminology and concepts
- Give an overview of the available DM algorithms
- Show you a working example of an intelligent application

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. © 2007 Project Botticelli Ltd & Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

This session is partly based on the book "Data Mining with SQL Server 2005" by ZhaoHui Tang and Jamie MacLennan, and also on Jamie's presentations. Thank you to Jamie and to Donald Farmer for helping me prepare this session, to Roni Karassik for a slide, and to Marin Bezic for all the support.

3 Why Do We Need This Session?
- Our applications increasingly must handle unpredictable conditions without failing
- We are tired of writing thousands of lines of code just to handle errors
- We want apps to be more intelligent
- We need to harness Data Mining from within our apps

4 The Essence of DM for Intelligent Applications

5 Data Mining
- Technologies for analysing data and discovering (very) hidden patterns
- Uses a combination of statistics, probability analysis and database technologies
- Fairly young (less than 20 years old), but with clever algorithms developed through database research

6 DM Enables Predictive Analysis
[Diagram: Business Insight plotted against the Role of Software. Stages: Presentation, Exploration, Discovery. Role of software: Passive, Interactive, Proactive. Techniques shown: canned reporting, ad-hoc reporting, OLAP, data mining.]

7 DM and Business Intelligence
- BI is a larger field of technology, which utilises DM
- BI is geared towards end users, such as business owners and knowledge workers
- DM is an IT technology geared towards experts
- By the way: who is qualified to use DM today?

8 Value of Predictive Analysis
Typical applications of predictive analysis:
- Seek Profitable Customers
- Understand Customer Needs
- Anticipate Customer Churn
- Predict Sales & Inventory
- Build Effective Marketing Campaigns
- Detect and Prevent Fraud

9 New Opportunity: Intelligent Applications
Examples of intelligent applications:
- Adaptive user interface, based on past behaviour
- Input validation, based on accepted data rather than on fixed rules
- Business process validation: early detection of failure
Writing software this way is also known as Predictive Programming. In fact, this is all related to Artificial Intelligence, approached, in a way, backwards.
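To make "input validation based on accepted data" a little more concrete, here is a minimal Python sketch. It does not use Analysis Services or any DM algorithm; it simply counts how often each value of a field appeared in previously accepted records and treats a new value as suspicious when its share falls below a threshold. The class name `FrequencyValidator`, the threshold values and the country codes are hypothetical, chosen only for illustration.

```python
from collections import Counter


class FrequencyValidator:
    """Hypothetical validator that learns from accepted data instead of fixed rules."""

    def __init__(self, min_share=0.01):
        self.min_share = min_share  # values rarer than this share are flagged
        self.counts = Counter()
        self.total = 0

    def learn(self, accepted_values):
        # Record values that the business process previously accepted
        self.counts.update(accepted_values)
        self.total += len(accepted_values)

    def share(self, value):
        # Fraction of accepted records that carried this value
        return self.counts[value] / self.total if self.total else 0.0

    def is_plausible(self, value):
        return self.share(value) >= self.min_share


validator = FrequencyValidator(min_share=0.05)
validator.learn(["UK"] * 60 + ["IE"] * 35 + ["US"] * 4 + ["XX"] * 1)
print(validator.is_plausible("IE"))  # True
print(validator.is_plausible("XX"))  # False
```

Unlike a hard-coded rule, the set of plausible values here shifts as the accepted data changes, which is the property the slide is after; a real mining model would weigh many attributes at once rather than one field.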

10 What Is So Special?
- Application behaviour evolves and follows the data mining model...
- ...which is influenced by actual real-world events caused by your application!
- We are creating a feedback loop from the application, through its effects, back to the application
- The "trick" that connects the two is the discovery of newly emerging patterns and of old patterns disappearing: the very job of DM

11 DM Past and Present
- Traditional approaches from Microsoft's competitors are geared towards DM experts ("white-coat PhD statisticians"), limiting their usefulness to us
- DM tools are also fairly expensive
- Microsoft's approach is designed for developers with some database skills
- Multiple APIs and a working process designed from a developer's perspective
- DM is built into Microsoft SQL Server 2005 and 2008 at no extra cost

12 DM Technologies in SQL Server 2005
- Strong developer support, of course: DMX and OLE DB for Data Mining, XML for Analysis, .NET Framework 2.0/3.0/3.5
- Interoperability: PMML (Predictive Model Markup Language) for SAS, SPSS, IBM and Oracle
- Strong, patented algorithms from Microsoft Research labs

13 What Is New in SQL Server 2008? Data Mining Enhancements
In addition to many other new aspects of SQL Server:
- Enhanced mining structures
  - Easier to prepare and test your models
  - Models allow for cross-validation
  - Filtering
  - Use of incompatible models within a structure
- Algorithm changes
  - Improved Time Series algorithm combining the best of ARIMA and ARTXP
- "What-If" analysis
- Redistributable visualizers
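Cross-validation, listed above as a SQL Server 2008 enhancement, partitions the cases into k folds and tests the model k times, each time holding out a different fold. The Python sketch below shows only that partitioning idea; it is not how Analysis Services implements cross-validation, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_cases, k):
    """Split case indices 0..n_cases-1 into k (train, test) pairs.

    Each case appears in exactly one test fold; fold sizes differ by at most one.
    Cases are taken in order here; a real implementation would usually shuffle first.
    """
    if not 1 < k <= n_cases:
        raise ValueError("k must be between 2 and the number of cases")
    indices = list(range(n_cases))
    fold_sizes = [n_cases // k + (1 if i < n_cases % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


for train, test in k_fold_indices(7, 3):
    print(len(train), test)
# 4 [0, 1, 2]
# 5 [3, 4]
# 5 [5, 6]
```

Averaging the k test scores gives a steadier estimate of model quality than a single training/testing split, at the cost of training k models.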

14 Using DM to Build an Application

15 Data Mining Process
[Diagram: training data (DB data, client data or application data) is fed to the DM engine, producing a mining model. Data to predict (DB, client or application data, possibly "just one row") is passed with the mining model to the DM engine, which returns predicted data.]

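16 Server Mining Architecture
[Diagram: an Analysis Services server holds the mining model and the data mining algorithm and reads from a data source. Models are deployed from BI Dev Studio (Visual Studio). Your application, with its own app data, connects to the server through OLE DB, ADOMD or XMLA.]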

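17 Data Mining Extensions (DMX)
    -- Create a decision-tree model that predicts credit risk
    CREATE MINING MODEL CreditRisk
    (
        CustID      LONG KEY,
        Gender      TEXT DISCRETE,
        Income      LONG CONTINUOUS,
        Profession  TEXT DISCRETE,
        Risk        TEXT DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees

    -- Train the model on existing customers.
    -- [MyDataSource] is a hypothetical Analysis Services data source name.
    INSERT INTO CreditRisk (CustID, Gender, Income, Profession, Risk)
    OPENQUERY([MyDataSource],
        'SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers')

    -- Predict risk, and its probability, for new customers
    SELECT NewCustomers.CustomerID, CreditRisk.Risk,
           PredictProbability(CreditRisk.Risk)
    FROM CreditRisk
    PREDICTION JOIN
        OPENQUERY([MyDataSource],
            'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS NewCustomers
    ON  CreditRisk.Gender     = NewCustomers.Gender
    AND CreditRisk.Income     = NewCustomers.Income
    AND CreditRisk.Profession = NewCustomers.Profession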

18 Intelligent Application: Steps (a Simplified View)
1. Prepare the database for mining
2. Create and train the DM model on your data, consisting of both the inputs and the actual outcomes
3. Test the model. If OK...
4. ...the model predicts outcomes
5. Make application logic depend on predicted outcomes (if, case etc.)
6. Update (and validate) the model periodically as data evolves

19 Concepts
- Case: the set of attributes you want to analyse, e.g. Age, Gender, Annual Spending
- Nested case: a case containing a table column, e.g. Age, Gender, Annual Spending, Products, Purchases
- Case key: the unique ID of a case
- Data mining model: a container of patterns discovered by a DM algorithm in your data; typically defined like a relational table containing key columns, input columns and predictable columns
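The case and nested-case concepts map naturally onto a record that contains a list. The Python sketch below reuses the attribute names from the slide with made-up values; the `Case` class and its field types are assumptions for illustration, not an Analysis Services structure.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    cust_id: int              # case key: the unique ID of the case
    age: int                  # input column
    gender: str               # input column
    annual_spending: float    # input (or predictable) column
    # Nested table: one case can hold many purchased products
    products: list = field(default_factory=list)


case = Case(cust_id=980001, age=34, gender="Male", annual_spending=1250.0,
            products=["Sofa", "TV"])
print(case.cust_id, len(case.products))  # 980001 2
```

The point of the nested column is that one customer remains one case, however many products they bought, instead of being repeated once per purchase.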

20 Steps for Building a DM Model
1. Model creation: we define the columns for cases visually (BI Studio), using DMX, or from PMML
2. Model training: we feed in lots of data from a real database, or from an application behaviour log. Congratulations! We now have a model
3. Model testing: does our model make any sense? We test it with sample data to check its predictions. Testing data must be different from training data (this is easier to do in SQL Server 2008). If we get nonsense, we adjust the algorithm, its parameters, the model design, or even the data
4. Model prediction: we use the model on new data to predict outcomes. This is the new logic of our application!
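Step 3 is easy to get wrong if the same cases are used for both training and testing. The Python sketch below illustrates a holdout test with a deliberately trivial "model" that predicts the most common outcome seen for each input value. It does not reproduce any Microsoft algorithm; the function names and the made-up credit-risk rows are hypothetical.

```python
from collections import Counter, defaultdict


def train(rows):
    """Learn the most common outcome for each input value (a deliberately tiny model)."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for value, outcome in rows:
        by_value[value][outcome] += 1
        overall[outcome] += 1
    default = overall.most_common(1)[0][0]
    return {v: c.most_common(1)[0][0] for v, c in by_value.items()}, default


def predict(model, value):
    table, default = model
    return table.get(value, default)  # unseen inputs fall back to the overall majority


def accuracy(model, rows):
    hits = sum(predict(model, value) == outcome for value, outcome in rows)
    return hits / len(rows)


# Made-up (profession, risk) cases; training and testing data are kept separate
training = [("Clerk", "Low"), ("Clerk", "Low"), ("Pilot", "Low"),
            ("Artist", "High"), ("Artist", "High"), ("Artist", "Low")]
testing = [("Clerk", "Low"), ("Artist", "High"), ("Pilot", "High"), ("Chef", "Low")]

model = train(training)
print(accuracy(model, testing))  # 0.75
```

Scoring the model on `training` instead would flatter it, because it has already seen those cases; the held-out score is the one that says whether the model makes sense.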

21 CREATE MINING MODEL
    CREATE MINING MODEL <model name>
    (
        <column definitions>
    )
    USING <algorithm> [(<parameter list>)]
    [WITH DRILLTHROUGH]

22 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
    )
    USING Microsoft_Decision_Trees

23 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [CustID] TEXT KEY
    )
    USING Microsoft_Decision_Trees
Column definition: name, then data type. Data types: Text, Long, Double, Boolean, Date

24 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [CustID] TEXT KEY
    )
    USING Microsoft_Decision_Trees
Column definition: name, data type, then content type. Content types: Key, Key Time, Discrete, Continuous, Discretized

25 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [Viewer] TEXT KEY
    )
    USING Microsoft_Decision_Trees

26 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [Viewer] TEXT KEY,
        [Gender] TEXT DISCRETE
    )
    USING Microsoft_Decision_Trees

27 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [Viewer] TEXT KEY,
        [Gender] TEXT DISCRETE,
        [Marital Status] TEXT DISCRETE
    )
    USING Microsoft_Decision_Trees

28 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [Viewer] TEXT KEY,
        [Gender] TEXT DISCRETE,
        [Marital Status] TEXT DISCRETE,
        [Education] TEXT DISCRETE
    )
    USING Microsoft_Decision_Trees

29 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [Viewer] TEXT KEY,
        [Gender] TEXT DISCRETE,
        [Marital Status] TEXT DISCRETE,
        [Education] TEXT DISCRETE,
        [Home Ownership] TEXT DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees
Column usage: Predict, Predict Only

30 CREATE MINING MODEL
    CREATE MINING MODEL MyModel
    (
        [CustID] LONG KEY,
        [Gender] TEXT DISCRETE,
        [Marital Status] TEXT DISCRETE,
        [Education] TEXT DISCRETE,
        [Home Ownership] TEXT DISCRETE PREDICT,
        [Age] LONG CONTINUOUS,
        [Income] DOUBLE CONTINUOUS
    )
    USING Microsoft_Decision_Trees

31 Nested Tables
    CustID | Gender | Marital Status | Education    | Home Ownership
    980001 | Male   | Married        | Bachelors    | Rent
    980002 | Male   | Married        | Bachelors    | Own
    980003 | Female | Single         | Masters      | Own
    980004 | Male   | Single         | Some College | Own
    980005 | Female | Married        | Bachelors    | Rent
    980006 | Female | Married        | Masters      | Rent
Nested table entries (slide label "Movies"): Sofa, TV, Ladder, Boiler, Sofa, Lazygirl Recliner, Boiler, TV, DVD Player, Bedstead, TV, Bookstand, Yoga Mat, Vase

32 CREATE MINING MODEL: Nested
    CREATE MINING MODEL MyModel
    (
        [CustID] LONG KEY,
        [Gender] TEXT DISCRETE,
        [Marital Status] TEXT DISCRETE,
        [Education] TEXT DISCRETE,
        [Home Ownership] TEXT DISCRETE PREDICT,
        [Age] LONG CONTINUOUS,
        [Income] DOUBLE CONTINUOUS,
        [Products] TABLE
        (
            [Product Name] TEXT KEY
        )
    )
    USING Microsoft_Decision_Trees

33 Training
- Use the SQL-style INSERT INTO statement
- The model contains patterns, not data
- Use SHAPE syntax to create nested input rowsets
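SHAPE is needed because nested cases usually arrive as two flat tables: one row per customer and one row per purchase. As a rough illustration of what the nesting achieves (not of SHAPE's own syntax or semantics), the Python sketch below groups made-up purchase rows under their parent customer key. The `nest` function and the sample rows are hypothetical; the product names are borrowed from the nested-table example.

```python
from collections import defaultdict


def nest(customers, purchases):
    """Attach each customer's purchases as a nested list, keyed on CustID."""
    products_by_customer = defaultdict(list)
    for cust_id, product in purchases:
        products_by_customer[cust_id].append(product)
    # Customers with no purchases still become cases, with an empty nested table
    return [dict(row, Products=products_by_customer[row["CustID"]]) for row in customers]


customers = [{"CustID": 980001, "Gender": "Male"},
             {"CustID": 980003, "Gender": "Female"}]
purchases = [(980001, "Sofa"), (980003, "Boiler"), (980001, "TV")]

for case in nest(customers, purchases):
    print(case)
# {'CustID': 980001, 'Gender': 'Male', 'Products': ['Sofa', 'TV']}
# {'CustID': 980003, 'Gender': 'Female', 'Products': ['Boiler']}
```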

34 INSERT INTO
    INSERT INTO [MINING MODEL | MINING STRUCTURE] <model or structure name>
        [(<column list>)]
        <source data>
Source data can be: a data query, a DMX query, an MDX query, a stored procedure call, or a rowset parameter

35 Prediction
- Uses SQL "SELECT" semantics
- New JOIN type: "PREDICTION JOIN"
- Returned values can contain tables
- Subselect from nested tables

36 PREDICTION JOIN
    SELECT [TOP <n>] <select list>
    FROM <model>
    [ [NATURAL] PREDICTION JOIN <source data query> AS <alias>
      [ON <join mapping list>]
      [WHERE <condition>]
      [ORDER BY <expression>] ]

37 Data Mining APIs
Now that we have a model, let's talk to it on the SQL Server running Analysis Services. There are several APIs to choose from:
- ADO.NET
- OLE DB/DM: OLE DB for Data Mining
- ADOMD.NET: ADO Multidimensional for .NET
- AMO: Analysis Management Objects (Decision Support Objects, DSO, have been superseded by AMO)
And of course the ones you already know:
- DMX: Data Mining Extensions
- XMLA: XML for Analysis

38 DM Interfaces
[Diagram: the Analysis Server (msmdsrv.exe) hosts OLAP and the Data Mining server, Microsoft and third-party algorithms, and .NET stored procedures (using ADOMD.NET). Clients connect via XMLA over TCP/IP or XMLA over HTTP (across a WAN, from any platform, any device). C++, VB and .NET apps use OLE DB for OLAP/DM, ADO/DSO, ADOMD.NET or AMO.]

39 The Intelligent Bit in Our Apps
Your "if" statement will test the value returned from a prediction, typically a predicted probability or outcome. Steps:
1. Build a case (set of attributes) representing the transaction you are currently processing, e.g. a customer's shopping basket plus their shipping info
2. Execute a "SELECT ... PREDICTION JOIN" against the pre-loaded mining model
3. Read the returned attributes, especially the case probability for some outcome, e.g. probability > 50% that "TransactionOutcome=ShippingDeliveryFailure"
4. Your application has just made an intelligent decision!
5. Remember to refresh and retest the model regularly (daily?)
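Steps 3 and 4 boil down to an ordinary conditional on the returned probability. In the Python sketch below, `predict_delivery_failure` is a hypothetical stand-in for whatever call executes the PREDICTION JOIN (for example through ADOMD.NET) and returns the probability of `ShippingDeliveryFailure`. The 50% threshold comes from the slide; the routing labels and `fake_model` are assumptions for illustration.

```python
from typing import Callable, Dict


def choose_shipping(case: Dict[str, object],
                    predict_delivery_failure: Callable[[Dict[str, object]], float],
                    threshold: float = 0.5) -> str:
    """Route an order based on the predicted probability of delivery failure."""
    probability = predict_delivery_failure(case)  # e.g. a PredictProbability(...) result
    if probability > threshold:
        return "manual-review"  # hypothetical action for risky orders
    return "standard"


def fake_model(case):
    # Hypothetical stand-in for the mining model, for local testing only
    return 0.8 if case.get("Country") == "Unknown" else 0.1


print(choose_shipping({"Country": "Unknown"}, fake_model))  # manual-review
print(choose_shipping({"Country": "IE"}, fake_model))       # standard
```

Passing the prediction call in as a parameter keeps the decision logic testable without a live Analysis Services server, and lets the model be retrained or replaced without touching the "if".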

40 Demo: A More Intelligent Application, an Example of Predictive Programming

41 The Algorithms

42 Microsoft DM Algorithms
- Algorithms designed for open, wide use: classification, regression, segmentation, associations, prediction, text analysis and advanced data exploration
- A consistent and simple (nice!) API hides the complexity of the algorithms

43 Data Mining Algorithms
Algorithm: Description
- Decision Trees: calculates the odds of an outcome based on values in a training set
- Association Rules: helps identify relationships between various elements
- Naïve Bayes: clearly shows the differences in a particular variable for various data elements
- Sequence Clustering: groups or clusters data based on a sequence of previous events
- Time Series: analyzes and forecasts time-based data, combining the power of ARIMA for long-term prediction with the power of ARTXP (developed by Microsoft Research) for short-term prediction. Together, they optimize prediction accuracy. Greatly enhanced in SQL Server 2008.
- Neural Nets: seeks to uncover non-intuitive relationships in data
- Text Mining: analyzes unstructured text data
- Linear Regression: determines the relationship between columns in order to predict an outcome
- Logistic Regression: determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
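Association rules are commonly scored by support (how often items occur together) and confidence (how often the consequent appears given the antecedent). The Python sketch below computes both measures for single rules over a few made-up shopping baskets. It is a textbook illustration of those two measures, not the Microsoft Association Rules algorithm, and the function names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(b) for b in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base


baskets = [["Sofa", "TV"], ["TV", "DVD Player"], ["Sofa", "TV", "Boiler"],
           ["Boiler"], ["TV", "DVD Player", "Bookstand"]]

print(support(baskets, ["TV"]))                      # 0.8
print(confidence(baskets, ["DVD Player"], ["TV"]))  # 1.0
print(confidence(baskets, ["TV"], ["DVD Player"]))  # 0.5
```

Note the asymmetry: every DVD Player basket also contains a TV, but only half of the TV baskets contain a DVD Player, so the two rules deserve different confidence.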

44 Algorithm Matrix
[Table: algorithms (Time Series, Sequence Clustering, Neural Nets, Naïve Bayes, Logistic Regression, Linear Regression, Decision Trees, Clustering, Association Rules) mapped against tasks (Classification, Estimation, Segmentation, Association, Forecasting, Text Analysis, Advanced Data Exploration).]

45 Resources
- Sample code from this session on www.sqlserverdatamining.com
- Book by Jamie MacLennan and ZhaoHui Tang: "Data Mining with SQL Server 2005", Wiley 2005, ISBN 0-471-46261-6
- On-demand webcast at microsoft.com, event ID 1032273683: "Intelligent Applications: Embedding Data Mining in Your Application"
- Also: www.beyeblogs.com/donaldfarmer, blogs.msdn.com/jamiemac, www.microsoft.com/sql/technologies/dm, forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=81&SiteID=1
- SQL Server Books Online

46 Summary
- Data Mining is a powerful technology not yet fully discovered by developers
- It turns data into knowledge and decision-making logic
- SQL Server 2005 and 2008 Analysis Services have been created with developers in mind
- Build intelligent applications today, and let's discover what is still ahead of us in this brave new world

47 Resources
- Technical Communities, Webcasts, Blogs, Chats & User Groups: http://www.microsoft.com/communities/default.mspx
- Microsoft Learning and Certification: http://www.microsoft.com/learning/default.mspx
- Microsoft Developer Network (MSDN) & TechNet: http://microsoft.com/msdn, http://microsoft.com/technet
- Trial Software and Virtual Labs: http://www.microsoft.com/technet/downloads/trials/default.mspx
- New, as a pilot for 2007, the Breakout sessions will be available post event, in the TechEd Video Library, via the My Event page of the website
- Visit MSDN in the ATE Pavilion and get a FREE 180-day trial of MS Visual Studio Team System!

48 Complete your evaluation on the My Event pages of the website, at the CommNet or at the Feedback Terminals, to win!

49 © 2007 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.

