Microsoft Enterprise Consortium Data Mining Concepts Introduction: The essential background. Prepared by David Douglas, University of Arkansas. Hosted by the University of Arkansas.

Presentation transcript:

Microsoft Enterprise Consortium
Data Mining Concepts
Introduction: The essential background
Prepared by David Douglas, University of Arkansas. Hosted by the University of Arkansas.

Modules in this Series
• The modules in this series support the use of the Microsoft SQL Server 2008 Business Intelligence Development Studio hosted at the University of Arkansas.
• This module is the introduction to data mining.
• The series includes both directed and undirected data mining modules.

Data Mining
What is data mining?
• “…the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data…” (Gartner Group)
• “…the analysis of observational data sets to find unsuspected relationships and to summarize data in novel ways…” (Hand et al.)
• “…is an interdisciplinary field bringing together techniques from machine learning, pattern recognition, statistics, databases, and visualization…” (Cabena et al.)
• … is the exploration and analysis of large quantities of data in order to discover previously unknown, meaningful, and actionable patterns and rules (adapted from Berry and Linoff)
Berry & Linoff (Data Miners)

Why Data Mining in a customer-centric organization?
• Data mining can strengthen a firm’s ability to form learning relationships with its customers.
• Factors other than data mining are also required to turn a product-oriented organization into a customer-centric one.
• To form a learning relationship with customers, a firm must:
  - Notice what its customers are doing (accomplished via transaction processing systems)
  - Remember what it and its customers have done over time (accomplished via data warehouses)
  - Learn from what was remembered (data mining)
  - Act on what it has learned (implementation)
Berry & Linoff (Data Miners)

Why Data Mining Now?
• Data are being produced
• Data are being stored in data warehouses
• Computing power is more affordable
• Competitive pressures are enormous
• Easy-to-use data mining software is available

A CRISP Data Mining Methodology?
Cross Industry Standard Process - DM

Cross Industry Standard Process - DM
• The iterative CRISP-DM process is shown in the outer circle
• The most significant dependencies between phases are shown
• The next phase depends on results from the preceding phase
• Returning to an earlier phase is possible before moving forward

CRISP-DM (cont)
(1) Business Understanding Phase
• Define business requirements and objectives
• Translate objectives into a data mining problem definition
• Prepare an initial strategy to meet the objectives
(2) Data Understanding Phase
• Collect data
• Assess data quality
• Perform exploratory data analysis (EDA)
(3) Data Preparation Phase
• Cleanse, prepare, and transform the data set
• Prepare the data for modeling in subsequent phases
• Select cases and variables appropriate for analysis
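The data preparation phase can be illustrated with a short sketch. This is a minimal example, assuming the records arrive as Python dictionaries. The helper `prepare` and the field names (`age`, `income`, `notes`) are hypothetical and are not part of the original module or of SQL Server. It selects complete cases, keeps only the chosen variables, and rescales them to the range 0 to 1.

```python
def prepare(records, variables):
    """Cleanse records and min-max scale the selected numeric variables."""
    # Select cases: keep only records with no missing value in the chosen variables.
    complete = [r for r in records if all(r.get(v) is not None for v in variables)]
    if not complete:
        return []
    # Transform: rescale each selected variable to the range [0, 1].
    lo = {v: min(r[v] for r in complete) for v in variables}
    hi = {v: max(r[v] for r in complete) for v in variables}
    prepared = []
    for r in complete:
        row = {}
        for v in variables:
            span = hi[v] - lo[v]
            row[v] = (r[v] - lo[v]) / span if span else 0.0
        prepared.append(row)
    return prepared


# Hypothetical raw data: the "notes" field is not selected for analysis.
raw = [
    {"age": 20, "income": 30000, "notes": "a"},
    {"age": 40, "income": None, "notes": "b"},  # missing income: case is dropped
    {"age": 60, "income": 90000, "notes": "c"},
]
clean = prepare(raw, ["age", "income"])
```

Min-max scaling is only one possible transformation. The appropriate choice depends on the modeling technique selected in the next phase.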

CRISP-DM (cont)
(4) Modeling Phase
• Select and apply one or more modeling techniques
• Calibrate model settings to optimize results
• Return to data preparation if the chosen technique requires it
(5) Evaluation Phase
• Evaluate one or more models for effectiveness
• Determine whether the defined objectives were achieved
• Make a decision regarding the data mining results before deploying to the field
(6) Deployment Phase
• Make use of the models created
• Simple deployment: generate a report
• Complex deployment: implement an additional data mining effort in another department
• In business, the customer often carries out deployment based on the model
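The modeling and evaluation phases can be sketched in a few lines. The example below is an illustration only, assuming a 1-nearest-neighbor classifier as the modeling technique and a holdout set for evaluation. The data, the function names, and the `TARGET_ACCURACY` threshold are all hypothetical stand-ins for a real business objective.

```python
def nearest_neighbor_predict(train, point):
    """Return the label of the training case closest to `point` (1-nearest neighbor)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda case: sq_dist(case[0], point))
    return label


def evaluate(train, holdout):
    """Fraction of holdout cases that the model classifies correctly."""
    hits = sum(1 for x, y in holdout if nearest_neighbor_predict(train, x) == y)
    return hits / len(holdout)


# Modeling phase: the "model" is simply the stored training cases.
train = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
         ((8.0, 8.0), "high"), ((9.0, 7.5), "high")]
# Evaluation phase: cases held back from training.
holdout = [((2.0, 1.0), "low"), ((7.0, 9.0), "high"), ((8.5, 8.5), "high")]

accuracy = evaluate(train, holdout)
TARGET_ACCURACY = 0.9  # hypothetical objective set during business understanding
ready_to_deploy = accuracy >= TARGET_ACCURACY
```

The final comparison mirrors the evaluation phase: the model moves on to deployment only if it meets the objective defined at the start of the project.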

Important Note
• The Need for Human Direction
  - Don’t be misled into believing that software can just automatically wander around in the data and produce significant results.
  - Automation is no substitute for human input. Humans need to be involved in every phase of the DM process.
  - George Grinstein, U. of Mass. at Lowell, puts it into perspective:
    Imagine a black box capable of answering any question it is asked. Any question. Will this eliminate our need for human participation as many suggest? Quite the opposite. The fundamental problem still comes down to a human interface issue. How do I phrase the question correctly? How do I set the parameters to get the solution that is applicable in the particular case I am interested in? How do I get the results in reasonable time and in a form that I can understand? Note that all the questions connect the discovery process to me, for my human consumption.

Four Fallacies of Data Mining (Louie, Nautilus Systems, Inc.)
• Fallacy 1
  - A set of tools can be turned loose on data repositories
  - It finds answers to all business problems
• Reality 1
  - No automatic data mining tools solve problems
  - Rather, data mining is a process (CRISP-DM)
  - It integrates into overall business objectives
• Fallacy 2
  - The data mining process is autonomous
  - It requires little oversight
• Reality 2
  - It requires significant intervention during every phase
  - After model deployment, new models require updates
  - Continuous evaluative measures are monitored by analysts

Four Fallacies of Data Mining (Louie, Nautilus Systems, Inc.)
• Fallacy 3
  - Data mining quickly pays for itself
• Reality 3
  - Return rates vary
  - They depend on startup, personnel, data preparation costs, etc.
• Fallacy 4
  - Data mining software is easy to use
• Reality 4
  - Ease of use varies across projects
  - Analysts must combine subject matter knowledge with the specific problem domain

Data Mining Tasks
• Description
• Supervised (directed): Estimation, Classification, Prediction
• Unsupervised (undirected): Clustering, Affinity Analysis
• Estimation and classification differ in the target variable: numeric or categorical
• Prediction differs from classification and estimation in that it concerns the future
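The distinction among the directed tasks can be expressed as a small decision rule. The sketch below is illustrative only. The function `directed_task` and its `about_future` flag are hypothetical names, and the rule assumes that a numeric target means estimation and any other target means classification.

```python
def directed_task(target_values, about_future=False):
    """Name the directed data mining task implied by a target variable."""
    # Prediction is estimation or classification applied to future values.
    if about_future:
        return "prediction"
    # Booleans are treated as categories, not as numbers.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in target_values
    )
    return "estimation" if numeric else "classification"


task_for_amounts = directed_task([120.5, 98.0, 143.2])   # numeric target
task_for_labels = directed_task(["churn", "stay"])        # categorical target
task_for_forecast = directed_task([120.5], about_future=True)
```

Undirected tasks (clustering and affinity analysis) have no target variable, so this rule does not apply to them.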

Matching Data Mining Tasks to Data Mining Algorithms
• Estimation: Multiple Linear Regression, Neural Networks
• Classification: Decision Trees, Logistic Regression, Neural Networks, k-Nearest Neighbor
• Prediction: Estimation and Classification for future values
• Clustering: k-means, Kohonen Self-Organizing Maps
• Affinity Analysis: Association Analysis, sometimes referred to as Market Basket Analysis
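As an illustration of affinity analysis, the sketch below computes the two standard measures used in market basket analysis, support and confidence, for a handful of baskets. The basket contents and function names are hypothetical, and this is a minimal example rather than the algorithm used by any particular product.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`.

    Assumes the antecedent appears in at least one basket.
    """
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical transactions, one set of items per basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

A rule such as "bread implies milk" is usually considered interesting only when both measures exceed thresholds chosen by the analyst.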