Ahmed K. Ezzat, SQL Server 2008 and Data Mining Overview 1 Data Mining and Big Data.


1 Ahmed K. Ezzat, SQL Server 2008 and Data Mining Overview. Data Mining and Big Data

2 Outline
• MS SQL Server 2008 and Data Mining
• MS SQL Server 2008 and Data Mining Extensions (DMX)
• Using MS SQL Server Data Mining
• MS SQL Server Available Algorithms: Naïve Bayes, Decision Tree, Time Series, Clustering, Association Rules, Neural Networks and Logistic Regression

3 MS SQL Server 2008 and Data Mining

4 MS SQL Server 2008 and Data Mining: An Overview
• Hard drive capacity (filled by CRM, ERP, web server log records, etc.) has increased faster than processing power; data has outpaced the capability to process it, leaving organizations data-rich and knowledge-poor.
• The main purpose of data mining is to extract knowledge from the huge volume of data at hand.
• With a traditional RDBMS, you issue a query, including OLAP queries, to find answers to interesting questions.
• In contrast, with data mining you ask the question in terms of the data (and possibly a hypothesis) and let the data mining tools either verify your hypothesis or discover hypotheses you did not think of.

5 MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Classification: used for risk management, targeted advertising, etc. The goal is to find a model that describes the class attribute as a function of the input attributes. Algorithms include decision trees, neural networks, and Naïve Bayes.
• Clustering: typically unsupervised learning in which all attributes are treated equally. Most clustering algorithms are iterative and stop when the model converges, that is, when the cluster assignments become stable.
[Figures: Decision tree; Clustering]
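The iterate-until-stable behaviour described for clustering can be sketched with plain k-means. This is only an illustration in Python, not the Microsoft Clustering algorithm; the function name `k_means`, the caller-supplied `seeds`, and the sample points are assumptions made for the example.

```python
from math import dist


def k_means(points, seeds, max_iter=100):
    """Plain k-means on tuples of floats; returns (centroids, assignments)."""
    centroids = list(seeds)  # assumption: the caller picks the starting centroids
    assignments = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: cluster membership no longer changes
        assignments = new_assignments
        # Move each centroid to the mean of its current members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Hypothetical data: two visually obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (8.0, 8.0), (8.2, 7.8), (7.8, 8.2)]
centroids, assignments = k_means(points, seeds=[points[0], points[3]])
```

The loop stops as soon as an iteration leaves every point in the same cluster, which is one simple reading of "the model converges".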

6 MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Association (market basket analysis): in a sales setting, we would like to identify products that often appear in the same shopping basket, for cross-selling purposes.
• Regression: similar to classification, except that instead of looking for a pattern that describes a class, the goal is to find a pattern that determines a numerical value. Example: predict a coupon redemption rate based on the face value, etc.
[Figure: Product Association]
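As a rough illustration of the regression task, the sketch below fits a straight line by ordinary least squares and uses it to predict a numerical value. The coupon figures are invented for the example, and `fit_line` is a hypothetical helper, not something from SQL Server.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: coupon face value (dollars) vs. observed redemption rate.
face_values = [0.25, 0.50, 0.75, 1.00]
redemption_rates = [0.02, 0.04, 0.05, 0.08]

slope, intercept = fit_line(face_values, redemption_rates)
predicted_rate = slope * 1.25 + intercept  # estimate for an unseen face value
```

Unlike classification, the output here is a number on a continuous scale rather than a class label.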

7 MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Forecasting (predicting future values): What will the MSFT stock value be tomorrow? What will the sales amount of wine be next month?
• Sequence analysis: tries to find patterns in a series of events, called a sequence. The figure shows a web click sequence: each node is a URL category, and each line represents a transition between two categories, weighted by the probability of that transition.
[Figures: Time Series; Web Navigation Sequence]
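The weighted transitions in the web click example can be estimated by counting which category follows which. The sketch below assumes a simple first-order model (only the previous page matters); the session data and the function name are made up for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next category | current category) from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical click sessions, one list of URL categories per visitor.
sessions = [
    ["Home", "News", "Sports"],
    ["Home", "News", "Weather"],
    ["Home", "Sports"],
]
probabilities = transition_probabilities(sessions)
```

Each inner dictionary corresponds to the outgoing edges of one node in the navigation graph, with the edge weights summing to 1.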

8 MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Deviation analysis: used to find rare cases that behave very differently from the norm. Examples are credit card fraud detection, network intrusion detection, manufacturing error analysis, etc.
• There is no standard technique. Analysts usually apply decision tree, clustering, or neural network algorithms.
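Since the slide names no standard technique, the sketch below uses a plain z-score test as a stand-in to show what "very different from the norm" can mean in practice. This is an assumption chosen for brevity, not one of the algorithms listed above; the threshold and the transaction amounts are hypothetical.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transactions in dollars; one is far from the rest.
amounts = [20, 22, 19, 21, 20, 23, 18, 21, 500]
suspicious = find_outliers(amounts)
```

With very small samples a single extreme value inflates the standard deviation, so the threshold has to stay modest for the test to fire at all.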

9 MS SQL Server 2008 and Data Mining: Data Mining Project Cycle
• Business problem formulation
• Data collection
• Data cleaning and transformation
• Model building
• Model assessment
• Reporting and prediction

10 MS SQL Server 2008 and Data Mining Extensions (DMX)

11 MS SQL Server 2008 and Data Mining Extensions (DMX): An Overview
• DMX was created by the Microsoft OLAP team. It leverages OLE DB as the application programming interface (API) and was designed as a query language as close to SQL as possible while still meeting the needs of data mining.
• Over time, the target audience expanded to include .NET developers using C# or VB.NET, and OLE DB became less relevant.

12 MS SQL Server 2008 and Data Mining Extensions (DMX): The Data Mining Process
• First, define the problem.
• Create a mining model (an object).
• Provide training data to the model.
• Now you can provide new data and perform predictions (deductions) of information, using the patterns the algorithm discovered during training.
[Figure: The Data Mining Process]
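The create, train, and predict steps map onto three kinds of DMX statements. The sketch below simply holds one example of each as a Python string so the shape of the language is visible; the model name `CreditRisk`, the data source `Bank DB`, and all table and column names are hypothetical, and sending the statements to an Analysis Services server is left out.

```python
# Each entry pairs a step of the process with an illustrative DMX statement.
# All object names (CreditRisk, Bank DB, Customers, NewApplicants) are made up.
DMX_STEPS = {
    # Step 1: create the mining model and mark the column to predict.
    "create": """
        CREATE MINING MODEL [CreditRisk] (
            [CustomerId] LONG KEY,
            [Age]        LONG CONTINUOUS,
            [Income]     LONG CONTINUOUS,
            [Risk]       TEXT DISCRETE PREDICT
        ) USING Microsoft_Decision_Trees
    """,
    # Step 2: train the model by feeding it known cases.
    "train": """
        INSERT INTO [CreditRisk] ([CustomerId], [Age], [Income], [Risk])
        OPENQUERY([Bank DB],
            'SELECT CustomerId, Age, Income, Risk FROM Customers')
    """,
    # Step 3: join new cases to the trained model to get predictions.
    "predict": """
        SELECT t.[CustomerId], Predict([Risk]) AS PredictedRisk
        FROM [CreditRisk]
        PREDICTION JOIN
            OPENQUERY([Bank DB],
                'SELECT CustomerId, Age, Income FROM NewApplicants') AS t
        ON [CreditRisk].[Age] = t.[Age]
           AND [CreditRisk].[Income] = t.[Income]
    """,
}

for step, statement in DMX_STEPS.items():
    print(f"-- {step}")
    print(statement.strip())
```

The resemblance to SQL is deliberate: `CREATE`, `INSERT INTO`, and `SELECT` keep their familiar roles, with `PREDICTION JOIN` taking the place of an ordinary join.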

13 Using MS SQL Server Data Mining

14 Using MS SQL Server 2008 Data Mining: The BI Dev Studio
• The BI Dev Studio is a tool integrated into the MS Visual Studio shell to provide a complete development experience for BI.

15 Using MS SQL Server 2008 Data Mining: The BI Dev Studio
• Solution Explorer: where you manage your project and where objects are created
• Window tabs: let you switch between designer windows
• Designer window: where you edit and analyze your objects
• Designer tabs: the aspects of an object that you can edit or interact with
• Properties window: a context-sensitive window that displays the properties of the selected item
• BI menu: context-sensitive menus specific to Analysis Services objects, e.g., open the data source view (DSV)
• Output window: displays messages when you build and deploy projects

16 Using MS SQL Server 2008 Data Mining: Understanding Immediate & Offline Modes
• Immediate mode: more natural for data mining users; you are connected to an Analysis Services server.
  - When you open an object, you get the object from the server.
  - When you modify the object and save it, the object is immediately updated on the server.
• Offline mode: your project contains files that are stored on your client machine.
  - Modifications to objects are stored in XML format on your hard drive.
  - The model and objects are not reflected on the server until you decide to deploy them to the destination server.

17 Using MS SQL Server 2008 Data Mining: Creating & Modifying Data Sources
• After you open your project, you must describe your source data and then create mining structures and models.
• Two objects in Analysis Services act as interfaces to your data: the data source and the data source view (DSV).
• A data source is a simple object that consists of a connection string plus additional information indicating how to connect.
• A DSV is an abstraction layer that lets you modify the way you look at data sources.

18 Using MS SQL Server 2008 Data Mining: Exploring Data and Evaluating Models
• To help you learn and understand your data, the DSV Designer leverages controls from Office Web Components (OWC) to let you explore your data in different views.
• After organizing, modifying, selecting, and understanding the data you want to analyze, you can start to create data mining objects.
• Two important objects deal with data mining: mining structures and mining models.
  - Mining structure: defines the domain of a mining problem. In addition, a mining structure contains the list of mining models that use columns from the structure.
  - Mining model: applies a mining algorithm to the data in a mining structure.
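The relationship between the two objects can be pictured as a container and its children. The sketch below is a conceptual model only and does not reflect the Analysis Services object model; the class names, fields, and the `add_model` check are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MiningModel:
    """One algorithm applied to a subset of a structure's columns."""
    name: str
    algorithm: str
    columns: list


@dataclass
class MiningStructure:
    """Defines the problem domain (columns) and owns the models built on it."""
    name: str
    columns: list
    models: list = field(default_factory=list)

    def add_model(self, name, algorithm, columns):
        # A model may only use columns that the structure defines.
        unknown = [c for c in columns if c not in self.columns]
        if unknown:
            raise ValueError(f"columns not in structure: {unknown}")
        model = MiningModel(name, algorithm, list(columns))
        self.models.append(model)
        return model


# Hypothetical structure with two models sharing the same columns.
structure = MiningStructure("Customers", ["Age", "Income", "Risk"])
structure.add_model("RiskTree", "decision tree", ["Age", "Income", "Risk"])
structure.add_model("RiskBayes", "naive bayes", ["Income", "Risk"])
```

Keeping several models under one structure makes it straightforward to compare algorithms on the same columns.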

19 MS SQL Server Available Algorithms

20 MS SQL Server Available Algorithms
• Naïve Bayes: lets you create models with predictive abilities. It learns from evidence, using the correlation between the variable you are interested in and all other variables, e.g., figuring out whether a congressman is a Democrat or a Republican based on voting records.
• Decision Tree: one of the most popular data mining techniques because of its fast training performance and high degree of accuracy, e.g., classifying whether a loan applicant is high or low risk.
• Time Series: works on a series of data collected over successive increments of time or another sequence indicator. The main purpose is to forecast future points in the series based on past history.
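The congressman example can be sketched with a small categorical Naïve Bayes classifier. This is a textbook version with add-one (Laplace) smoothing, not the Microsoft Naïve Bayes implementation; the voting records and function names are invented for the example.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Count how often each feature value occurs with each class label."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    return class_counts, value_counts


def predict(model, row, n_values=2):
    """Return the label with the highest log-probability for `row`."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / total)  # prior
        for i, value in enumerate(row):
            # Add-one smoothing; n_values is the number of possible votes (y/n).
            score += log((value_counts[label][i][value] + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical voting records on three bills ("y" or "n").
votes = [("y", "n", "y"), ("y", "n", "n"), ("y", "y", "y"),
         ("n", "y", "n"), ("n", "y", "y"), ("n", "n", "n")]
parties = ["Democrat", "Democrat", "Democrat",
           "Republican", "Republican", "Republican"]

model = train_naive_bayes(votes, parties)
guess = predict(model, ("y", "n", "y"))
```

The "naïve" part is the assumption that each vote contributes independently to the score once the party is known.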

21 MS SQL Server Available Algorithms
• Clustering: finds natural groupings inside your data when such groupings are not obvious. In other words, it finds hidden variables that accurately classify your data. It is a good technology for discovering hidden patterns, but as usual you get the best answers when you ask your question the right way.
• Association Rules (market basket analysis): performs market basket analysis on your customers' transactions. You can learn which products are commonly purchased together and how likely a particular product is to be purchased along with another. A possible finding: 5% of your customers bought X, Y, and Z together, and 75% of the customers who bought X also bought Z. You could use this insight to manage stock levels, etc.
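The two percentages in the association example correspond to the usual support and confidence measures. The sketch below computes both for a handful of invented baskets; the function names are hypothetical and the code makes no attempt to search for rules the way the Microsoft Association Rules algorithm does.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
together = support(baskets, {"bread", "milk"})          # bought together
milk_given_bread = confidence(baskets, {"bread"}, {"milk"})
```

Support answers "how common is this combination?", while confidence answers "given the first product, how likely is the second?".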

22 MS SQL Server Available Algorithms
• Neural Networks and Logistic Regression: when a human mind analyzes a problem, the facts are weighted, and these weighted facts are then combined to lead to a conclusion. Neural networks are mathematical models of this process. They work by creating neural paths (relationships between inputs and outputs) that are used as patterns for further predictions.
• Training a neural network is more time-consuming than training other models. The complexity comes from the fact that (1) any or all inputs may be related somehow to any or all outputs, and (2) different combinations of inputs may be related differently to the outputs.

23 MS SQL Server Available Algorithms
• The MS Logistic Regression algorithm is a special case of a neural network: one with a single level of relationships. It is typically used by statisticians to model and predict the probability of events based on inputs.
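One way to read "a single level of relationships" is a network with no hidden layer: the inputs are weighted, summed, and passed through a sigmoid to yield a probability. The sketch below shows only that forward computation, under that assumption; the weights, bias, and input values are arbitrary numbers chosen for illustration, not trained parameters.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_probability(inputs, weights, bias):
    """Logistic regression: one weighted sum of the inputs, then a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical example: two inputs with hand-picked weights.
probability = predict_probability(inputs=[2.0, 1.0], weights=[0.8, -0.4], bias=-0.5)
```

A full neural network would insert one or more hidden layers between the inputs and this final step, which is where the extra training cost comes from.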

24 END

