Ahmed K. Ezzat, SQL Server 2008 and Data Mining Overview 1 Data Mining and Big Data.

Presentation transcript:


Outline
• MS SQL Server 2008 and Data Mining
• MS SQL Server 2008 and Data Mining Extensions (DMX)
• Using MS SQL Server Data Mining
• MS SQL Server Available Algorithms:
  - Naïve Bayes
  - Decision Tree
  - Time Series
  - Clustering
  - Association Rules
  - Neural Networks and Logistic Regression

MS SQL Server 2008 and Data Mining

MS SQL Server 2008 and Data Mining: An Overview
• Hard drive capacity (holding CRM, ERP, web server log records, etc.) has increased faster than processing power. Data has outpaced the capability to process it, leaving organizations data-rich and knowledge-poor.
• The main purpose of data mining is to extract knowledge from the large volume of data at hand.
• With a traditional RDBMS, you issue a query, including OLAP queries, to find answers to interesting questions.
• With data mining, in contrast, you pose the question in terms of the data (and possibly a hypothesis) and let the data mining tools either verify your hypothesis or discover hypotheses you did not think of.

MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Classification: used for risk management, targeted advertising, etc. The goal is to find a model that describes the class attribute as a function of the input attributes. Algorithms include decision trees, neural networks, and Naïve Bayes.
• Clustering: typically unsupervised learning in which all attributes are treated equally. Most clustering algorithms are iterative and stop when the model converges, that is, when the clusters become stable.
[Figures: Decision tree; Clustering]
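The iterate-until-stable behavior of clustering can be illustrated with a minimal k-means sketch on one-dimensional data. This is not the Microsoft Clustering algorithm; the function name `kmeans_1d`, the sample points, and the starting centers are assumptions made for illustration.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Cluster 1-D points; stop when cluster assignments no longer change."""
    centers = list(centers)  # work on a copy so the caller's list is untouched
    assignments = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assign each point to the index of its nearest center.
        new_assignments = [
            min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: the clusters are stable
        assignments = new_assignments
        # Move each center to the mean of its members (leave it if empty).
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assignments, iteration


# Hypothetical data: two obvious groups, deliberately poor starting centers.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, assignments, iterations = kmeans_1d(points, [0.0, 5.0])
print(centers, assignments, iterations)
```

The loop ends as soon as a pass produces the same assignments as the previous pass, which is the "model converges" condition described on the slide.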

MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Association (market basket analysis): in a sales situation, we would like to identify products that often appear in the same shopping basket, for cross-selling purposes.
• Regression: similar to classification, except that instead of looking for a pattern that describes a class, the goal is to find a pattern that determines a numerical value. Example: predict a coupon redemption rate based on the face value, etc.
[Figure: Product Association]
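The regression task can be sketched with an ordinary least-squares line fit. The coupon face values and redemption rates below are made-up numbers, and `fit_line` and `predict` are hypothetical helper names; a real model would use more inputs than the face value alone.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict a numerical value for a new input."""
    return slope * x + intercept


# Hypothetical training data: coupon face value (dollars) and the
# observed redemption rate (percent).
face_values = [1.0, 2.0, 3.0, 4.0]
redemption_rates = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(face_values, redemption_rates)
print(predict(slope, intercept, 5.0))
```

Unlike classification, the output here is a number on a continuous scale rather than one of a fixed set of classes.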

MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Forecasting (predicting future values): What will the MSFT stock value be tomorrow? What will the sales amount of wine be next month?
• Sequence analysis: tries to find patterns in a series of events, called a sequence. The figure shows a web click sequence: each node is a URL category, and each line represents a transition between two categories, weighted by the probability of that transition.
[Figures: Time Series; Web Navigation Sequence]
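The weights on a web navigation graph can be estimated by counting how often one URL category follows another. The sketch below assumes click sessions are available as ordered lists of category names; the session data and the function name `transition_probabilities` are illustrative only.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next category | current category) from click sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Pair every click with the click that follows it.
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return {
        state: {
            following: n / sum(followers.values())
            for following, n in followers.items()
        }
        for state, followers in counts.items()
    }


# Hypothetical sessions, each an ordered list of URL categories.
sessions = [
    ["Home", "News", "Sports"],
    ["Home", "News", "Weather"],
    ["Home", "Sports"],
    ["Home", "News", "Sports"],
]

probabilities = transition_probabilities(sessions)
print(probabilities["Home"])
```

Each inner dictionary corresponds to the outgoing lines of one node in the figure, and its values sum to 1.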

MS SQL Server 2008 and Data Mining: Data Mining Tasks
• Deviation analysis: used to find rare cases that behave very differently from the norm. Examples include credit card fraud detection, network intrusion detection, manufacturing error analysis, etc. There is no standard technique; analysts usually apply decision trees, clustering, or neural network algorithms.
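Since the slide names no standard technique, the sketch below swaps in a simple z-score rule, which is not one of the algorithms listed above, just to show what "very different from the norm" can mean numerically. The transaction amounts, the threshold of 2.0, and the name `find_outliers` are assumptions.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical card transaction amounts with one suspicious charge.
amounts = [20, 22, 19, 21, 20, 23, 500]
print(find_outliers(amounts))
```

A single extreme value inflates both the mean and the standard deviation, so this rule is fragile on small samples; that weakness is one reason model-based approaches are usually preferred.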

MS SQL Server 2008 and Data Mining: Data Mining Project Cycle
1. Business problem formulation
2. Data collection
3. Data cleaning and transformation
4. Model building
5. Model assessment
6. Reporting and prediction

MS SQL Server 2008 and Data Mining Extensions (DMX)

MS SQL Server 2008 and Data Mining Extensions (DMX): An Overview
• DMX was created by the Microsoft OLAP team, which used OLE DB as the application programming interface (API) and designed a query language as close to SQL as possible while meeting the needs of data mining.
• Over time, the target audience expanded to include .NET developers using C# or VB.NET, and OLE DB became less relevant.

MS SQL Server 2008 and Data Mining Extensions (DMX): The Data Mining Process
1. First, define the problem.
2. Create a mining model (an object).
3. Provide training data to the model.
4. Provide new data and perform predictions (deductions) using the patterns the algorithm discovered during training.
[Figure: The Data Mining Process]
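The create, train, and predict steps can be mimicked with a toy Python object. This is not DMX and not a Microsoft API: the class `MiningModel`, its methods, the column names, and the data are hypothetical, and the "algorithm" is just a majority vote per input value.

```python
from collections import Counter, defaultdict


class MiningModel:
    """Hypothetical stand-in for a mining model object (not a Microsoft API)."""

    def __init__(self, input_column, predict_column):
        # Step 2: create the model by naming the input and the target column.
        self.input_column = input_column
        self.predict_column = predict_column
        self._patterns = defaultdict(Counter)

    def train(self, rows):
        # Step 3: learn patterns by counting outcomes for each input value.
        for row in rows:
            self._patterns[row[self.input_column]][row[self.predict_column]] += 1

    def predict(self, row):
        # Step 4: return the most common outcome seen for this input value.
        seen = self._patterns.get(row[self.input_column])
        if not seen:
            return None  # no pattern was learned for this input
        return seen.most_common(1)[0][0]


# Hypothetical training data.
training_rows = [
    {"age_group": "young", "buys_bike": "yes"},
    {"age_group": "young", "buys_bike": "yes"},
    {"age_group": "young", "buys_bike": "no"},
    {"age_group": "senior", "buys_bike": "no"},
]

model = MiningModel("age_group", "buys_bike")
model.train(training_rows)
print(model.predict({"age_group": "young"}))
```

The point is the order of operations: the model exists as an object before it sees any data, and prediction only reuses what training stored.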

Using MS SQL Server Data Mining

Using MS SQL Server 2008 Data Mining: The BI Dev Studio
• The BI Dev Studio is a tool integrated into the MS Visual Studio shell to provide a complete development experience for BI.

Using MS SQL Server 2008 Data Mining: The BI Dev Studio
• Solution Explorer: where you manage your project and where objects are created
• Window tabs: let you switch between designer windows
• Designer window: where you edit and analyze your objects
• Designer tabs: the aspects of an object that you can edit or interact with
• Properties window: a context-sensitive window that displays the properties of the selected item
• BI menu: context-sensitive menus specific to Analysis Services objects, e.g., for opening the data source view (DSV)
• Output window: displays messages when you build and deploy projects

Using MS SQL Server 2008 Data Mining: Understanding Immediate & Offline Modes
• Immediate Mode: more natural for data mining users; you are connected to an Analysis Services server:
  - When you open an object, you get the object from the server
  - When you modify the object and save it, the object is immediately updated on the server
• Offline Mode: your project contains files that are stored on your client machine:
  - Modifications to objects are stored in XML format on your hard drive
  - The model and objects are not reflected on the server until you decide to deploy them to the destination server

Using MS SQL Server 2008 Data Mining: Creating & Modifying Data Sources
• After you open your project, you must describe your source data before you create mining structures and models.
• Two objects in Analysis Services act as interfaces to your data: the data source and the data source view (DSV).
• The data source is a simple object that consists of a connection string plus additional information indicating how to connect.
• The DSV is an abstraction layer that lets you modify the way you look at data sources.

Using MS SQL Server 2008 Data Mining: Exploring Data and Evaluating Models
• To help you learn and understand your data, the DSV Designer uses controls from Office Web Components (OWC) to let you explore the data in different views.
• After organizing, modifying, selecting, and understanding the data you want to analyze, you can start to create data mining objects.
• Two important objects deal with data mining: mining structures and mining models:
  - Mining structure: defines the domain of a mining problem. In addition, a mining structure contains the list of mining models that use columns from the structure
  - Mining model: applies a mining algorithm to the data in a mining structure

MS SQL Server Available Algorithms

MS SQL Server Available Algorithms
• Naïve Bayes: enables you to create models with predictive abilities. It learns from evidence, using the correlation between the variable you are interested in and all other variables, e.g., figuring out whether a congressman is a Democrat or a Republican based on voting records.
• Decision Tree: one of the most popular data mining techniques because of its fast training performance and high degree of accuracy, e.g., classifying whether a loan applicant is high or low risk.
• Time Series: consists of a series of data collected over successive increments of time or another sequence indicator. The main purpose is to forecast future series points based on past history.
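The congressman example can be sketched as a small Naïve Bayes classifier over yes/no votes. This is a textbook version with Laplace smoothing, not the Microsoft Naive Bayes implementation; the voting records, bill positions, and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class (feature, value) frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        for index, value in enumerate(record):
            feature_counts[label][(index, value)] += 1
    return class_counts, feature_counts


def predict_naive_bayes(model, record, values_per_feature=2):
    """Return the class with the highest log-probability for the record."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # prior probability of the class
        for index, value in enumerate(record):
            # Laplace smoothing keeps unseen votes from zeroing the product.
            seen = feature_counts[label][(index, value)]
            score += math.log((seen + 1) / (count + values_per_feature))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical votes ("Y" or "N") on three bills per member.
votes = [
    ("Y", "N", "Y"), ("Y", "N", "N"), ("Y", "Y", "Y"),
    ("N", "Y", "N"), ("N", "Y", "Y"), ("N", "N", "N"),
]
parties = ["Democrat"] * 3 + ["Republican"] * 3

model = train_naive_bayes(votes, parties)
print(predict_naive_bayes(model, ("Y", "N", "Y")))
```

The "naïve" part is the assumption that votes are independent given the party, which is why the per-vote probabilities can simply be multiplied (added, in log space).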

MS SQL Server Available Algorithms
• Clustering: finds natural groupings inside your data when such groupings are not obvious. In other words, it finds hidden variables that accurately classify your data. It is a good technology for discovering hidden patterns but, as usual, you get the best answers when you ask your question the right way.
• Association Rules (market basket analysis): performs market basket analysis on your customers' transactions. You can learn which products are commonly purchased together and how likely a particular product is to be purchased along with another. A possible outcome is: 5% of your customers have bought X, Y, and Z together, and 75% of the customers who bought X also bought Z. You could use this insight to manage stock levels, etc.
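The two percentages in the association example are usually called support and confidence. The sketch below computes both by brute-force counting over a hypothetical list of 20 baskets built to reproduce the 5% and 75% figures; it does not show how the Microsoft Association Rules algorithm searches for rules.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= set(basket))


def support(transactions, items):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions with the antecedent, the fraction that also have the consequent."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0  # the rule never applies
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / with_antecedent


# Hypothetical baskets: 4 contain X, 3 of those contain Z, 1 contains X, Y and Z.
baskets = [{"X", "Y", "Z"}, {"X", "Z"}, {"X", "Z"}, {"X", "Y"}]
baskets += [{"W"} for _ in range(16)]

print(support(baskets, {"X", "Y", "Z"}))
print(confidence(baskets, {"X"}, {"Z"}))
```

Support says how common a combination is overall, while confidence says how reliable the rule "X implies Z" is among customers who bought X.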

MS SQL Server Available Algorithms
• Neural Networks and Logistic Regression: the human mind analyzes a problem's facts and weighs them; the weighted facts are then combined to lead to a conclusion. Neural networks are mathematical models of this process. They work by creating neural paths (relationships between inputs and outputs) that are used as patterns for further predictions.
• Training a neural network is more time-consuming than training other models. The complexity comes from the fact that (1) any or all inputs may be related somehow to any or all outputs, and (2) different combinations of inputs may be related differently to the outputs.

MS SQL Server Available Algorithms
• The MS Logistic Regression algorithm is a special case of a neural network: one with a single level of relationships. It is typically used by statisticians to model and predict the probability of events based on inputs.
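The "single level of relationships" idea can be sketched as one neuron: a weighted sum of the inputs passed through a sigmoid to give a probability. The training loop below uses plain stochastic gradient descent, which is an assumption and not necessarily how the Microsoft algorithm is trained; the data, learning rate, and function names are illustrative.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """One neuron: weighted inputs plus bias, then the sigmoid."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weights and bias with stochastic gradient descent on log loss."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = predict_probability(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical data: one input, and the event occurs for larger values.
inputs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
outcomes = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(inputs, outcomes)
print(predict_probability(weights, bias, [0.0]))
print(predict_probability(weights, bias, [5.0]))
```

A full neural network stacks additional layers of such neurons between the inputs and the output, which is where the extra training cost comes from.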

END