Data Mining and Application Part 1: Data Mining Fundamentals Part 2: Tools for Knowledge Discovery Part 3: Advanced Data Mining Techniques Part 4: Intelligent.

Presentation transcript:

Data Mining and Application
Part 1: Data Mining Fundamentals
Part 2: Tools for Knowledge Discovery
Part 3: Advanced Data Mining Techniques
Part 4: Intelligent Systems

Data Mining: A First View
Chapter 1

1.1 Data Mining: A Definition
The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Induction-based Learning
The process of forming general concept definitions by observing specific examples of concepts to be learned.
– Many televised golf tournaments are sponsored by online brokerage firms
– Advertise rap music in magazines for senior citizens
– Suspect a stolen credit card

Knowledge Discovery in Databases (KDD)
The application of the scientific method to data mining. Data mining is one step of the KDD process.

1.2 What Can Computers Learn?

Four Levels of Learning
Facts
– The sea is blue
Concepts
– Trees, rules, networks, and mathematical equations
Procedures
– A step-by-step course of action to achieve a goal
Principles
– General truths or laws

Concepts
Computers are good at learning concepts. Concepts are the output of a data mining session.
Three concept views
– Classical view
– Probabilistic view
– Exemplar view

Classical View
All concepts have definite defining properties.
IF Annual Income >= 30,000 & Years at Current Position >= 5 & Owns Home = True
THEN Good Credit Risk = True
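As a minimal sketch, the classical-view rule above can be written as a Boolean test in which every defining property must hold. The function and parameter names are illustrative assumptions, not part of the slides.

```python
def is_good_credit_risk(annual_income, years_at_position, owns_home):
    """Classical view: the concept applies only if ALL defining properties hold."""
    return (
        annual_income >= 30_000
        and years_at_position >= 5
        and owns_home is True
    )


# One instance that satisfies every condition, and one that fails a single condition.
print(is_good_credit_risk(32_000, 6, True))
print(is_good_credit_risk(32_000, 4, True))
```

Failing any single condition excludes the instance, which is what separates this view from the probabilistic and exemplar views.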

Probabilistic View
Concepts are represented by properties that are probable of concept members.
– The majority of good credit risks own their own home

Exemplar View
A given instance is determined to be an example of a particular concept if it is similar enough to known examples of that concept.
Good credit risk example:
– Annual Income = 32,000
– Number of Years at Current Position = 6
– Homeowner

Supervised Learning
Build a learner model using data instances of known origin.
Use the model to determine the outcome of new instances of unknown origin.

Decision Tree
A tree structure where nonterminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.
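The definition can be illustrated with a small sketch in which nonterminal nodes are dictionaries that test one attribute and terminal nodes are plain strings. The tree shape reuses the swollen glands and fever example from the production rules in this chapter; the dictionary layout and the names `tree` and `classify` are assumptions made for illustration.

```python
# Nonterminal node: {"attribute": name, "branches": {value: subtree}}
# Terminal node: a string holding the decision outcome.
tree = {
    "attribute": "swollen_glands",
    "branches": {
        True: "Strep Throat",
        False: {
            "attribute": "fever",
            "branches": {True: "Cold", False: "Allergy"},
        },
    },
}


def classify(node, instance):
    """Follow attribute tests from the root until a terminal node is reached."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node


print(classify(tree, {"swollen_glands": False, "fever": True}))
```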

Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
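One way to read the three production rules above is as a list of condition and conclusion pairs that is scanned until a rule fires. The sketch below assumes Yes/No values are encoded as Python booleans; `RULES` and `diagnose` are hypothetical names.

```python
# Each rule pairs a dict of required attribute values with a diagnosis.
RULES = [
    ({"swollen_glands": True}, "Strep Throat"),
    ({"swollen_glands": False, "fever": True}, "Cold"),
    ({"swollen_glands": False, "fever": False}, "Allergy"),
]


def diagnose(patient):
    """Return the conclusion of the first rule whose conditions all match."""
    for conditions, diagnosis in RULES:
        if all(patient.get(attr) == value for attr, value in conditions.items()):
            return diagnosis
    return None  # no rule applies, e.g. an attribute is missing


print(diagnose({"swollen_glands": False, "fever": False}))
```

Because the three rules are mutually exclusive, their order in the list does not change the result.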

Unsupervised Clustering
A data mining method that builds models from data without predefined classes.
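The slides do not name a particular clustering algorithm. As one possible illustration, the sketch below uses a bare-bones k-means loop to group instances without any class labels. The data points (age, annual income in thousands) are invented, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns one cluster index per point."""
    centroids = [list(p) for p in points[:k]]  # simplistic, deterministic seeding

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance.
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        return distances.index(min(distances))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]

    return [nearest(p) for p in points]


# Hypothetical customers: (age, annual income in thousands)
customers = [(25, 45), (27, 50), (62, 85), (60, 90)]
print(kmeans(customers, k=2))
```

No class attribute appears anywhere in the input; the groups emerge only from similarity between attribute values.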

Questions
– Can I develop a general profile of an online investor?
– Can I determine if a new customer who does not initially open a margin account is likely to do so in the future?
– Can I build a model able to accurately predict the average number of trades per month for a new investor?
– What characteristics differentiate female and male investors?

Candidate questions for unsupervised clustering
– What attribute similarities group customers?
– What differences in attribute values segment the customer database?

Three Clusters
IF (Conditions) Margin Account = Yes & Age = 20-29 & Annual Income = 40-59K
THEN Cluster = 1
Accuracy=0.8, Coverage=
Accuracy => rule confidence for all instances
– Ex. This rule will be erroneous in 20% of the instances that satisfy its conditions
Coverage => rule significance for the cluster
– Ex. 50% of the instances in the cluster satisfy the conditions

Other two rules
IF Account Type = Custodial & Favorite Recreation = Skiing & Annual Income = 80-90K
THEN Cluster = 2
Accuracy=0.95, coverage=0.35
IF Account Type = Joint & Trades/Month > 5 & Transaction Method = Online
THEN Cluster = 3
Accuracy=0.82, coverage=
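The two rule measures can be computed directly from labeled instances. In this sketch, accuracy is the fraction of instances satisfying the rule's conditions that belong to the named cluster, and coverage is the fraction of the cluster's instances that satisfy the conditions. The data set is made up so that the rule comes out 20% erroneous with half of the cluster covered; the function name and record layout are assumptions.

```python
def rule_accuracy_and_coverage(instances, condition, cluster):
    """Return (accuracy, coverage) of the rule 'IF condition THEN cluster'."""
    matching = [row for row in instances if condition(row)]
    in_cluster = [row for row in instances if row["cluster"] == cluster]
    correct = [row for row in matching if row["cluster"] == cluster]
    accuracy = len(correct) / len(matching) if matching else 0.0
    coverage = len(correct) / len(in_cluster) if in_cluster else 0.0
    return accuracy, coverage


# Hypothetical instances: 8 members of cluster 1 (4 with a margin account)
# plus 3 members of cluster 2 (1 with a margin account).
instances = (
    [{"margin_account": True, "cluster": 1}] * 4
    + [{"margin_account": False, "cluster": 1}] * 4
    + [{"margin_account": True, "cluster": 2}]
    + [{"margin_account": False, "cluster": 2}] * 2
)

accuracy, coverage = rule_accuracy_and_coverage(
    instances, lambda row: row["margin_account"], cluster=1
)
print(accuracy, coverage)
```

A rule can score high on one measure and low on the other, so both are normally reported together.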

1.3 Is Data Mining Appropriate for My Problem?

Data Mining or Data Query?
Shallow Knowledge
– Is factual
Multidimensional Knowledge
– Is factual and stored in a multidimensional format
Hidden Knowledge
– Patterns or regularities
Deep Knowledge
– Need some direction to find it

Data Mining vs. Data Query
Use data query if you already know, or almost know, what you are looking for.
Use data mining to find regularities in data that are not obvious.

1.4 Expert Systems or Data Mining?

Expert System
A computer program that emulates the problem-solving skills of one or more human experts.

Knowledge Engineer
A person trained to interact with an expert in order to capture their knowledge.

1.5 A Simple Data Mining Process Model
– Assembling the Data
– Mining the Data
– Interpreting the Results
– Result Application

Assembling the Data
The Data Warehouse
– Only data useful for decision support is extracted from the operational environment
Relational Databases and Flat Files

Mining the Data
– Supervised or unsupervised learning?
– Which instances will be used?
– Which attributes will be selected?
– How will the learning parameters be set?

Interpreting the Results
If the results are less than optimal, we can repeat the data mining step using new attributes and/or instances.

Result Application
Apply what has been discovered to new situations.
– Baby diapers and beer

1.6 Why Not Simple Search?
Nearest Neighbor Classifier
K-nearest Neighbor Classifier
Problems:
– Computation times
– Differentiating between relevant and irrelevant attributes
– Which attributes are able to differentiate the classes?
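A minimal k-nearest neighbor sketch makes the listed problems concrete. The training data, class labels, and the name `knn_classify` are invented for illustration; Euclidean distance over all attributes is an assumption, since the slides do not specify a distance measure.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Label a query by majority vote among its k closest training instances."""
    # Every query scans and sorts the whole training set (the computation-time problem).
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    # Every attribute contributes equally to the distance, relevant or not.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training instances: (attribute vector, class label)
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_classify(training, (1.1, 1.0)))
print(knn_classify(training, (5.0, 5.1)))
```

No model is built ahead of time, so the full cost is paid at classification time, and an irrelevant attribute with a large range can dominate the distance.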

1.7 Data Mining Applications
– Fraud Detection
– Health Care
– Business and Finance
– Scientific Applications
– Sports and Gaming

Customer Intrinsic Value
A customer's expected value based on the historical value of similar customers.
Once it is determined, an appropriate marketing strategy can be applied.
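The slides do not say how similarity or expected value is computed. One simple reading, sketched below under that assumption, averages the historical value of the k most similar past customers. The profiles (age, annual income in thousands), the values, and the name `intrinsic_value` are all hypothetical.

```python
import math


def intrinsic_value(history, profile, k=2):
    """Estimate expected value as the mean historical value of the k most similar customers.

    Assumes history is non-empty and that profiles are numeric tuples of equal length.
    """
    nearest = sorted(history, key=lambda c: math.dist(c["profile"], profile))[:k]
    return sum(c["value"] for c in nearest) / len(nearest)


# Hypothetical past customers: profile = (age, annual income in thousands)
history = [
    {"profile": (30, 50), "value": 1000.0},
    {"profile": (32, 55), "value": 1200.0},
    {"profile": (60, 120), "value": 5000.0},
]

print(intrinsic_value(history, (31, 52)))
```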
