Business Intelligence Fundamentals: Data Mining

1 Business Intelligence Fundamentals: Data Mining
Ola Ekdahl, IT Mentors, 9/22/2018

2 Agenda
- Introducing Data Mining
- Integration with SQL Server Components
- Data Mining Programmability

3 Where Are We?
Microsoft BI Voyage
[Diagram labels: Data Warehouse, Data Sources, Data Marts, Staging Area, Manual Cleansing]

4 Module Overview
- Introducing Data Mining
- Integration with SQL Server 2008 Components
- Data Mining Programmability

5 Introducing Data Mining
- Purpose of Data Mining
- Business Scenarios
- SQL Server 2008 Data Mining
- Data Preparation
- Data Mining Process
- Data Mining Visualization

6 Purpose of Data Mining
- Addresses the problem of too much data and not enough information
- Enables data exploration, pattern discovery, and pattern prediction, which lead to knowledge discovery
- Forms a key part of a BI solution

Speaker notes: Data mining is a topic new to most students. In this slide, focus on the evolution of data management. Organizations today have become masters of data collection and storage. There is more data to store than ever before, and data is being stored for longer periods of time. Organizations are now focusing on exploring this wealth of data, with an objective of understanding the patterns it contains. These patterns can be used for prediction.

Have students think about where data mining may have affected them. Most probably do not realize its widespread use. Have them think about how Web ads might be specifically targeted to the logged-on user. Also, most students identify with the Amazon “those that bought this bought that” feature.

7 Business Scenarios
- Identifying responsive customers/unresponsive customers (also known as churn analysis)
- Detecting fraud
- Targeting promotions
- Managing risk
- Forecasting sales
- Cross-selling
- Segmenting customers

Speaker notes: Provide general examples to explain business applications of data mining:
- Responsive/unresponsive customers: Using segmentation, churn analysis identifies customers who have a high probability of departing. Cell phone carriers are particularly concerned with customers leaving to get a better deal. It usually costs more to gain a customer than to keep one, so these companies commonly offer attractive deals to maintain loyalty.
- Fraud detection: Have you ever had your bank or credit card company contact you to confirm an irregular transaction?
- Targeted promotions: Marketing departments can spend their budgets more wisely by targeting customers who are likely to purchase. A demonstration later in this module produces a solution for targeted promotions.
- Sales forecasting: Extrapolating historical sales can forecast future sales. A demonstration later in this module produces a solution for sales (quota) forecasting.
- Cross-selling: Understanding “what sells with what” is key to business success; Amazon.com commonly uses this technique.
- Customer segmentation: Data mining can group customers by similar attributes, such as region, age, or some other demographic data.

8 SQL Server 2008 Data Mining
- Hides the complexity of an advanced technology
- Includes full suite of algorithms to automatically extract information from data
- Handles large volumes of data and complex data
- Data can be sourced from relational and OLAP databases
- Uses standard programming interfaces
  - XMLA
  - DMX
- Delivers a complete framework for building and deploying intelligent applications

Speaker notes: Data mining is complex. This slide’s message is that there is no reason to avoid or discount the technology. SSAS 2008 data mining exposes standard programming interfaces that are wrapped in convenient and consistent object models (discussed later in the module). The SSAS 2008 data mining platform integrates well with other development frameworks. For example, developers can seamlessly integrate data mining results with Windows or Web applications. Finally, the platform is extensible, letting you plug in visualization viewers and algorithms (discussed later in the module).

Also, note that SSAS is bundled as part of SQL Server. Other data mining platforms have very expensive price tags attached to them. The important message here is that data mining is now affordable and approachable.

9 SQL Server 2008 Algorithms
- Decision Trees
  - The most popular data mining technique
  - Used for classification
- Clustering
  - Finds natural groupings inside data
- Sequence Clustering
  - Groups a sequence of discrete events into natural groups based on similarity
  - Use this algorithm to understand how visitors use your Web site

Speaker notes: Provide an example of each algorithm:
- Decision Trees: Identifying customers who are likely to purchase a bicycle
- Clustering: Grouping customers together by some potentially “hidden” likeness
- Sequence Clustering: Understanding how a visitor uses a Web site
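To make "finds natural groupings inside data" concrete, here is a minimal sketch of k-means, one common clustering technique, in plain Python. It is an illustration only and is not the Microsoft Clustering algorithm's implementation; the function name `kmeans_1d`, the starting centers, and the customer ages are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center (plain k-means in one dimension)."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the group of its closest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical customer ages containing two visible groupings.
ages = [22, 25, 27, 61, 64, 66]
centers, groups = kmeans_1d(ages, centers=[20, 70])
print(groups)
```

With these sample ages the younger and older customers end up in separate groups, which is the kind of "hidden likeness" the notes refer to.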

10 SQL Server 2008 Algorithms
- Naïve Bayes
  - Used for classification in similar scenarios to Decision Trees
- Linear Regression
  - Finds the best possible straight line through a series of points
  - Used for prediction analysis
- Logistic Regression
  - Fits to an exponential factor

Speaker notes: Provide an example of each algorithm:
- Naïve Bayes: Identifying customers who are likely to purchase a bicycle (again)
- Linear and logistic regression: Estimating relationships between variables, for example, body mass and muscle strength

Point out that an in-depth knowledge of each algorithm is not required for them to be useful, as the results are trustworthy if you have properly prepared data. Use the results with caution and validate with a subject matter expert.
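The "best possible straight line through a series of points" can be sketched with ordinary least squares. This is a minimal illustration under assumptions, not the SSAS Linear Regression implementation; the function `fit_line` and the body mass and strength figures are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: body mass (kg) against a muscle strength score.
mass = [60, 70, 80, 90]
strength = [30, 35, 40, 45]
slope, intercept = fit_line(mass, strength)

# Use the fitted line to predict the score for an unseen body mass.
predicted = slope * 75 + intercept
print(slope, intercept, predicted)
```

The fitted slope and intercept are the "pattern"; prediction is just evaluating the line at a new point.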

11 SQL Server 2008 Algorithms
- Association Rules
  - Supports market basket analysis to learn what products are purchased together
- Time Series
  - Forecasting algorithm used for short-term or long-term predictions of future values from a time series
  - Use multiple series to predict “what if” scenarios
- Neural Network
  - Used for classification and regression tasks
  - More sophisticated than Decision Trees and Naïve Bayes, this algorithm can explore extremely complex scenarios

Speaker notes: Provide an example of each algorithm:
- Association Rules: Understanding products that are purchased together in a single transaction
- Time Series: Deriving next year’s sales forecasts based on this year’s sales
- Neural Network: Advanced classification, such as that used in drug testing
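Market basket analysis usually rests on two measures: support (how often items appear together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both in plain Python. It illustrates the general idea only, not the Microsoft Association Rules algorithm; the helper names and the baskets are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent` in a basket."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets, one set of products per transaction.
baskets = [
    {"bike", "helmet"},
    {"bike", "helmet", "gloves"},
    {"bike"},
    {"helmet"},
]

rule_support = support(baskets, {"bike", "helmet"})
rule_confidence = confidence(baskets, {"bike"}, {"helmet"})
print(rule_support, rule_confidence)
```

A rule such as "bike implies helmet" is typically kept only when both measures exceed thresholds chosen by the analyst.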

12 Data Preparation
- Often significant amounts of effort are required to prepare data for mining
  - Transforming for cleaning and reformatting
  - Isolating and flagging abnormal data
  - Appropriately substituting missing values
  - Discretizing continuous values into ranges
  - Normalizing values between 0 and 1

Speaker notes: Considerable time should be dedicated to understanding the data and preparing it for mining. Having the right data to begin with is also important. It is important to stress this to ISVs who will be designing databases, and may need to consider including attributes that will be used to classify data. Emphasize that it is important to have a clear idea about what business problem is being solved, and to have defined business objectives.
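Three of the preparation steps listed above (substituting missing values, normalizing between 0 and 1, and discretizing into ranges) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: mean substitution and min-max scaling are only one possible choice for each step, and the function names, income figures, and band edges are hypothetical.

```python
def fill_missing(values):
    """Replace None with the mean of the known values (one simple substitution rule)."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def normalize(values):
    """Min-max scale values into the range 0 to 1 (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, edges, labels):
    """Map a continuous value to a label; `labels` has one more entry than `edges`."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


# Hypothetical yearly incomes with one missing entry.
incomes = [20000, None, 60000, 100000]
filled = fill_missing(incomes)
scaled = normalize(filled)
bands = [discretize(v, [40000, 80000], ["low", "medium", "high"]) for v in filled]
print(filled, scaled, bands)
```

In practice the substitution rule and the band edges should come from someone who understands the data, which is the point the notes make about business objectives.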

13 Data Mining Process
[Diagram labels: Design time, Process time, Query time, Mining Model]

Speaker notes: At design time, the developer creates the model. The model is like a database table. Its definition includes a specific algorithm and a collection of attributes and their data types. The model stores patterns discovered during processing.

14 Data Mining Process
[Diagram labels: Design time, Process time, Query time, Mining Model, Training Data, Data Mining Engine]

Speaker notes: At process time (or model training time), historical data is passed into the data mining engine. Patterns are extracted by the algorithm and stored in the model.

Mar-2008 Microsoft Developer & Platform Evangelism

15 Data Mining Process
[Diagram labels: Design time, Process time, Query time, Mining Model, Data Mining Engine, Predicted Data, Data to Predict]

Speaker notes: At query time, a dataset to be predicted is passed into the mining model. The data mining engine applies rules it found in the training step to a new dataset and assigns the prediction result for each input case. Generally, prediction is very fast and can be executed in real time.
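The three phases (design time, process time, query time) can be mirrored in a small Python analogy. This is a hypothetical sketch, not the SSAS object model or DMX: the class `MiningModel`, its methods, and the sample rows are invented, and the "algorithm" is just a majority vote per input value.

```python
from collections import Counter, defaultdict


class MiningModel:
    def __init__(self, input_column, predict_column):
        # Design time: declare the attributes; no patterns are stored yet.
        self.input_column = input_column
        self.predict_column = predict_column
        self.patterns = {}

    def process(self, training_rows):
        # Process time: extract patterns from historical data and store them.
        counts = defaultdict(Counter)
        for row in training_rows:
            counts[row[self.input_column]][row[self.predict_column]] += 1
        # Keep the most frequent outcome seen for each input value.
        self.patterns = {k: c.most_common(1)[0][0] for k, c in counts.items()}

    def predict(self, rows):
        # Query time: apply the stored patterns to new cases
        # (None when the input value was never seen in training).
        return [self.patterns.get(row[self.input_column]) for row in rows]


# Hypothetical training data: commute distance and whether a bike was bought.
history = [
    {"commute": "short", "buys_bike": True},
    {"commute": "short", "buys_bike": True},
    {"commute": "short", "buys_bike": False},
    {"commute": "long", "buys_bike": False},
]

model = MiningModel("commute", "buys_bike")  # design time
model.process(history)                       # process time
predictions = model.predict(                 # query time
    [{"commute": "short"}, {"commute": "long"}, {"commute": "medium"}]
)
print(predictions)
```

Because prediction is only a lookup against stored patterns, it is cheap compared with processing, which matches the note that prediction can run in real time.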

16 Data Mining Visualization
- In contrast to OLTP and OLAP queries, data mining queries typically extract previously unknown information
- Visualizations can effectively present data discoveries
- SQL Server 2008 provides algorithm-specific visualizations that you can use to
  - Test and explore models in Business Intelligence Development Studio
  - Embed into Windows Forms applications
- Developers can construct and plug in custom data mining viewers

Speaker notes: This slide discusses the presentation of data mining results. Let the students know that they will see some visualizations in the demos.

17 Integration with SQL Server 2008 Components
- Integration with SSIS
- Integration with SSAS
- Integration with SSRS

18 Integration with SSIS
- Perform data mining directly in the control flow or the data flow pipeline
- Configure “intelligent” packages based on data mining query results
- Enterprise Edition only

Speaker notes: This slide serves as a reminder of control flow tasks and data flow components, introduced in Module 03. You can achieve “intelligent” packages by having the package execution adapt to data mining results.

19 Integration with SSAS
- Create data mining models directly from OLAP stores
- Create dimensions from data mining models to slice cubes using discovered patterns
  - Decision Trees
  - Clustering
  - Association Rules

Speaker notes: SSAS presents data that can be consumed by a data mining model. The data mining model and the OLAP database must belong to the same Business Intelligence Development Studio project. As Module 06 noted, dimensions can be based on data mining models. This allows the analysis of data in a cube via data mining models. A good example to share with students is clustering customers and then assessing sales by those clusters.

20 Integration with SSRS
- Present data mining results in SSRS reports
  - Prediction queries
  - Content queries
  - Parameterized queries
- Use a data mining query builder to easily select results
- Apply grouping and aggregation to summarize results
- Distribute data mining results by using subscriptions

Speaker notes: SSRS can consume data mining query results. Report Designer includes a drag-and-drop query builder that writes the DMX statements. Standard SSRS features, such as grouping, aggregation, and subscriptions (distribution), then apply.

21 Data Mining Programmability
- SSAS Data Mining Programmability Overview
- Programming Interfaces
- Embedding SSAS Data Mining
- Extending SSAS Data Mining

22 SSAS Data Mining Programmability Overview
[Architecture diagram labels: C++ App, VB App, .NET App, Any App; OLE DB, ADO, ADOMD.NET, AMO; Any Platform, Any Device; WAN; XMLA Over TCP/IP; XMLA Over HTTP; Analysis Server; OLAP; Data Mining; Server ADOMD.NET; Data Mining Interfaces; .NET Stored Procedures; Microsoft Algorithms; Third-Party Algorithms]

Speaker notes: Use this slide to discuss the programming opportunities with SSAS data mining. The next two slides discuss AMO, ADOMD.NET, Server ADOMD, .NET Stored Procedures, and SSAS data mining extension opportunities.

23 Programming Interfaces
- AMO (Analysis Management Objects)
  - Administer database objects
  - Apply security
  - Manage processing
- ADOMD.NET
  - Connect to SSAS databases
  - Retrieve and manipulate data
- Server ADOMD.NET
  - Extend DMX by using .NET stored procedures

Speaker notes: This slide is similar to the slide for the UDM in Module 06.
- AMO: This API fulfills the same purpose discussed in Module 06.
- ADOMD.NET: In addition to retrieving DMX query results, this API is used to manipulate data and metadata.
- Server ADOMD.NET: This API has been designed to extend DMX itself. SSAS 2008 includes true server-side stored procedure support.

24 Embedding SSAS Data Mining
- Validate or repair user entry
- Integrate predictions
  - Targeted advertising
  - “Those that bought this book also purchased these books”
- Embed custom visualizations into Windows Forms applications to allow users to explore and understand model patterns
  - SSAS Data Mining ships with custom visualizations

Speaker notes: Emphasize that end users do not usually query data mining models directly. Rarely do they have the skills or tools to write DMX or the ability to comprehend the query results. Mention that data mining is more pervasive than many think, because it happens in the background.

This slide will be of particular interest to ISVs. Having covered the theory and capabilities of data mining, this slide focuses on how to embed data mining. This covers the out-of-the-box functionality, while the next slide introduces another interesting topic for ISVs: extensibility. You can mention that the lab for this module integrates predictions in the form of suggestions based on shopping cart content.

25 Extending SSAS Data Mining
- Stored procedures
- Enhanced Visual Studio data mining tools
- Plug-in algorithms
- Plug-in data mining viewers

Speaker notes: There are four main ways to extend the SSAS 2008 data mining system. Do not dwell on the specifics here; simply let the audience know the possibilities:
- Stored procedures
- Using the Visual Studio extensibility mechanisms to extend and enhance the data mining tools
- Developing plug-in algorithms
- Writing new viewers to visualize data mining models

26 Classifying Customers Likely to Purchase a Bicycle
DEMO

Speaker notes: Refer to the demonstration notes.

27 Resources
- www.microsoft.com/sql/technologies/dm
  - Links to technical resources, case studies, news, and reviews
- Site designed and maintained by the SQL Server Data Mining team
  - Live samples
  - Tutorials
  - Webcasts
  - Tips and tricks
  - FAQ
- Data Mining with SQL Server 2005, by ZhaoHui Tang and Jamie MacLennan

Speaker notes: The Web site contains a wealth of data mining information. Although written for SQL Server 2005, Data Mining with SQL Server 2005 remains relevant for use with SQL Server 2008 Data Mining. The time series algorithm was updated with greater functionality, which will be discussed in Part 2 of this course.

