More value from data using Data Mining Allan Mitchell SQL Server MVP.


1 More value from data using Data Mining Allan Mitchell SQL Server MVP

2 Who am I SQL Server MVP SQL Server Consultant Joint author on Wrox Professional SSIS book Worked with SQL Server since version 6.5 www.SQLDTS.com and www.SQLIS.com Partner of SQL Know How

3 SQL Know How Dedicated to Microsoft SQL Server We are familiar trusted faces Provide – Consultancy large and small – Training public and private – Mentoring – Business Brain Storming

4 Today’s Schedule What is data mining (overview) Data mining terminology Myths around data mining Excel add-in for Office 2007 – Demo Setup – Demo Key Influencers – Demo Categories – Demo Make a Prediction – Demo “Other stuff” – if time Questions and answers

5 What is Data Mining “The process of using statistical techniques to discover subtle relationships between data items, and the construction of predictive models based on them. The process is not the same as just using an OLAP tool to find exceptional items. Generally, data mining is a very different and more specialist application than OLAP, and uses different tools from different vendors. Normally the users are different, too. OLAP vendors have had little success with their data mining efforts.” (Source: OLAP Report)

6 What does Data Mining Do? Explores Your Data Finds Patterns Performs Predictions [Diagram labels: Query, Reporting, Analysis; Data Mining; What, Why, How]

7 Comparative Benefits Predictive Projects versus Nonpredictive Projects Source: IDC, 2003

8 Data Mining terminology mining structure mining model mining algorithm training dataset testing dataset
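To make the last two terms on slide 8 concrete: a training dataset is the part of the historical data a mining model learns from, and a testing dataset is a held-back part used to check its predictions. The sketch below is a minimal illustration in plain Python, not how SSAS partitions data; the function name `split_dataset`, the 30 percent holdout, and the customer rows are all assumptions made for the example.

```python
import random

def split_dataset(rows, test_percent=30, seed=42):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = len(shuffled) * test_percent // 100
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical customer rows: (customer_id, income, risk)
customers = [(i, 20000 + 1500 * i, "High" if i % 3 == 0 else "Low") for i in range(10)]

training, testing = split_dataset(customers)
print(len(training), "training rows,", len(testing), "testing rows")
```

The model is built from `training` only; `testing` is then scored to see how often the predictions match the known outcomes.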

9 SQL Server 2005 Algorithms Decision Trees Clustering Time Series Sequence Clustering Association Naïve Bayes Neural Net Plus: Linear and Logistic Regression

10 Sequence Clustering Applied to – Click stream analysis – Customer segmentation with sequence data – Sequence prediction Mix of clustering and sequence technologies Group individuals based on their profiles including sequence data

11 Time Series Applied to – Forecast sales – Web hits prediction – Stock value estimation Patented technique from Microsoft Research Uses regression tree technology to describe and predict series values
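As a rough illustration of predicting series values from earlier values, the sketch below fits a first-order autoregression (each value predicted from the one before it) by least squares and rolls it forward. This is a deliberately simple stand-in and is not the Microsoft Time Series algorithm described on slide 11; the function names and the sales figures are assumptions made for the example.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = intercept + slope * y[t-1]."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def forecast(series, steps):
    """Roll the fitted model forward, feeding each forecast back in as input."""
    intercept, slope = fit_ar1(series)
    value, predictions = series[-1], []
    for _ in range(steps):
        value = intercept + slope * value
        predictions.append(value)
    return predictions

# Hypothetical monthly sales growing by about 10 percent per month
sales = [100.0, 110.0, 121.0, 133.1]
print(forecast(sales, steps=2))
```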

12 Clustering Applied to – Segmentation: Customer grouping, Mailing campaign – Also support classification and regression Expectation Maximization – Probabilistic Clustering K-Means – Distance based Clusters both discrete and continuous values – Discrete values are “binarized” Anomaly detection Check variable independence – “Predict Only” attributes not used for clustering
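The distance-based K-Means option and the "binarized" discrete values on slide 12 can be sketched as follows. This is a minimal plain-Python illustration, not the SSAS implementation: the helper names, the sample people, the choice of the first and last points as starting centroids, and the scaling of age to decades are all assumptions made for the example.

```python
import math

def binarize(value, categories):
    """Turn a discrete value into 0.0/1.0 flags, one per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def kmeans(points, centroids, iterations=10):
    """Plain K-Means: assign points to the nearest centroid, then recompute means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical people: (gender, age in years)
people = [("Male", 8), ("Male", 10), ("Female", 9),
          ("Male", 40), ("Female", 42), ("Female", 45)]

# Each point is [is_male, is_female, age in decades]
points = [binarize(g, ["Male", "Female"]) + [age / 10] for g, age in people]

centroids, clusters = kmeans(points, [points[0], points[-1]])
for centroid, cluster in zip(centroids, clusters):
    print([round(c, 2) for c in centroid], len(cluster), "members")
```

How the continuous attribute is scaled relative to the 0/1 flags decides which attribute dominates the distance, so the scaling is a modelling choice rather than a detail.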

13 Clustering Discrete [Figure labels: Male, Female, Son, Daughter, Parent, Age]

14 Clustering Anomaly Detection [Figure labels: Male, Female, Son, Daughter, Parent, Age]

15 dm data flow Cube Historical Dataset New Dataset Data Transform (SSIS) Reporting Mining Models Model Browsing Prediction LOB Application Cube

16 the steps to a successful model MS BOL

17 DMX
CREATE MINING MODEL CreditRisk (CustID LONG KEY, Gender TEXT DISCRETE, Income LONG CONTINUOUS, Profession TEXT DISCRETE, Risk TEXT DISCRETE PREDICT) USING Microsoft_Decision_Trees
INSERT INTO CreditRisk (CustID, Gender, Income, Profession, Risk) SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers
SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk) FROM CreditRisk PREDICTION JOIN NewCustomers ON CreditRisk.Gender = NewCustomers.Gender AND CreditRisk.Income = NewCustomers.Income AND CreditRisk.Profession = NewCustomers.Profession
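The three DMX statements on slide 17 follow a define, train, predict pattern. The sketch below mimics that pattern in plain Python with a toy frequency-count model; it is only a stand-in to show the flow and does not implement Microsoft_Decision_Trees. The class `TinyRiskModel`, the sample customers, and the decision to ignore Income for brevity are all assumptions made for the example.

```python
from collections import Counter, defaultdict

class TinyRiskModel:
    """Toy stand-in for a mining model: counts Risk per (Gender, Profession)."""

    def __init__(self):
        # Plays the role of CREATE MINING MODEL: declares what the model tracks.
        self.counts = defaultdict(Counter)
        self.overall = Counter()

    def train(self, cases):
        # Plays the role of INSERT INTO: feed historical cases to the model.
        for case in cases:
            self.counts[(case["Gender"], case["Profession"])][case["Risk"]] += 1
            self.overall[case["Risk"]] += 1

    def predict(self, case):
        # Plays the role of the PREDICTION JOIN: match input columns to the
        # model and return the most likely Risk with its observed frequency.
        key = (case["Gender"], case["Profession"])
        seen = self.counts[key] if key in self.counts else self.overall
        risk, hits = seen.most_common(1)[0]
        return risk, hits / sum(seen.values())

# Hypothetical historical customers
customers = [
    {"Gender": "M", "Profession": "Clerk", "Risk": "Low"},
    {"Gender": "M", "Profession": "Clerk", "Risk": "Low"},
    {"Gender": "M", "Profession": "Clerk", "Risk": "High"},
    {"Gender": "F", "Profession": "Engineer", "Risk": "Low"},
]

model = TinyRiskModel()
model.train(customers)
print(model.predict({"Gender": "M", "Profession": "Clerk"}))
```

The second value returned by `predict` corresponds loosely to what `PredictProbability` reports in the DMX query: how confident the model is in the predicted Risk for that case.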

18 Myths around data mining You have to be a propeller head It’s a new concept. Only works with SSAS cubes

19 Excel 2007 DM Add-in DM visualisation Table analysis Create session models/permanent models Connect to SSAS for full-blown models Intuitive interface

20 Demos setup key Influencers categories Make a prediction other sexy stuff

21 Resources Loads to be honest (DMX, API to name two things) Big Subject but very sexy

22 Contact Details allan.mitchell@konesans.com

