Data Mining Functionalities / Data Mining Tasks: Concept/Class Description, Association, Classification.

Presentation transcript:

1 Data Mining Functionalities / Data Mining Tasks
- Concept/Class Description
- Association
- Classification
- Clustering

2 Mining Concept/Class Description

3 Objective
- Describes a given set of data concisely and in summary form, presenting interesting general properties of the data
  - Data generalisation
  - Characterisation and comparison

4 Data Generalisation-Based Characterisation
- Example: summer season sales strategy -> item_ID, name, brand, category, supplier, price
  Summarising a large set of items relating to the summer season
- Abstracts a large set of data in a database from a relatively low conceptual level to a higher conceptual level

5 Method/Approach: Attribute-Oriented Induction
General process:
- Collect the task-relevant data
- Perform generalisation based on an examination of the distinct values of each attribute

6 Attribute removal: an attribute with a large set of distinct values is removed if
- there is no generalisation operator on the attribute, OR
- its higher-level concepts are expressed in terms of other attributes
Attribute generalisation: an attribute with a large set of distinct values is generalised if
- there exists a set of generalisation operators on the attribute

7 Problems/Issues
- How large must 'a large set of distinct values for an attribute' be?
  - Attribute generalisation threshold: if the number of distinct values in an attribute is greater than the threshold, further attribute removal or generalisation should be performed

8
  - Generalised relation threshold: sets a threshold for the generalised relation. If the number of distinct tuples in the generalised relation is greater than the threshold, further generalisation should be performed. Otherwise, no further generalisation should be performed
  - Drilling down, rolling up
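
The removal and generalisation rules, together with the two thresholds, can be sketched in a few lines of Python. This is only an illustrative single-pass sketch, not the algorithm as published: the item data, the `OPERATORS` mapping, and the function names are all hypothetical, and each attribute is generalised by at most one level.

```python
from collections import Counter

# Hypothetical toy data: (name, brand, category, price) for summer items.
ITEMS = [
    ("Sun hat A", "Acme", "hat", 12.0),
    ("Sun hat B", "Acme", "hat", 15.0),
    ("Sandal X", "Bolt", "sandal", 30.0),
    ("Sandal Y", "Bolt", "sandal", 55.0),
    ("Swim short", "Acme", "swimwear", 25.0),
    ("Bikini", "Coral", "swimwear", 60.0),
]
ATTRIBUTES = ["name", "brand", "category", "price"]

# Hypothetical generalisation operators (one level of a concept hierarchy).
# An attribute missing from this dict has no generalisation operator.
CATEGORY_PARENT = {"hat": "accessory", "sandal": "footwear", "swimwear": "clothing"}
OPERATORS = {
    "category": lambda c: CATEGORY_PARENT[c],
    "price": lambda p: "cheap" if p < 28 else "expensive",
}


def attribute_oriented_induction(rows, attributes, operators, attr_threshold):
    """Remove or generalise each attribute, then merge identical tuples."""
    kept, columns = [], []
    for i, attr in enumerate(attributes):
        values = [row[i] for row in rows]
        if len(set(values)) > attr_threshold:
            if attr not in operators:
                continue  # attribute removal: no generalisation operator
            values = [operators[attr](v) for v in values]  # generalisation
        kept.append(attr)
        columns.append(values)
    # Identical generalised tuples are merged and their counts accumulated.
    return kept, Counter(zip(*columns))


def needs_further_generalisation(generalised, relation_threshold):
    """Generalised relation threshold check on the number of distinct tuples."""
    return len(generalised) > relation_threshold


kept, relation = attribute_oriented_induction(ITEMS, ATTRIBUTES, OPERATORS, 2)
for generalised_tuple, count in relation.items():
    print(generalised_tuple, count)
```

With an attribute threshold of 2, `name` and `brand` are removed, `category` and `price` are generalised, and the six items collapse into four generalised tuples with counts.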

9 Specifying attributes: too many or too few
  - Attribute relevance analysis: a measure is used to identify irrelevant or weakly relevant attributes that can be excluded from the concept description process

10 Comparison: Discriminating Between Different Classes
- Mines descriptions that distinguish a target class from its contrasting classes
General process:
- Generalisation is performed synchronously among all the classes compared

11 Topics:
- J. Han, Y. Fu. "Exploration of the power of attribute-oriented induction in data mining", Advances in Knowledge Discovery and Data Mining, 1996
- S. Chaudhuri and U. Dayal. "An overview of data warehousing and OLAP technology", ACM SIGMOD Record 26, 1997

12 Basic Technique
- Decision tree induction
  - internal node
  - branch
  - leaf node
- Algorithms: ID3, C4.5

13 Problems/Issues:
- Selecting the attribute to be tested -> attribute selection measure
- Overfitting the data -> tree pruning
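
One widely used attribute selection measure is information gain, which is the measure ID3 uses. The sketch below assumes categorical attributes and uses hypothetical toy data. It shows only how a split attribute could be chosen, not a full tree builder or pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: (outlook, windy) with a yes/no class label.
ROWS = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
LABELS = ["yes", "yes", "no", "no"]

# The attribute with the highest gain is tested at the current node.
best = max(range(2), key=lambda i: information_gain(ROWS, LABELS, i))
print("split on attribute index", best)
```

Here the first attribute separates the classes perfectly, so it has the higher gain and is selected.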

14 Bayesian Classification
- It is a statistical classifier
- It can predict class membership probabilities
- It is based on Bayes' theorem
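
As an illustration of predicting class membership probabilities with Bayes' theorem, here is a minimal naive Bayes sketch. The "naive" assumption that attributes are conditionally independent given the class is an addition for this example, as are the toy data and function names. No smoothing is applied, so a row that no class can explain gives a zero evidence term and a division error.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attr index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts):
    """Return P(class | row) for every class via Bayes' theorem."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # likelihood P(value | class), estimated from raw counts
            score *= value_counts[(label, i)][value] / n
        scores[label] = score
    evidence = sum(scores.values())  # P(row), the normalising constant
    return {label: s / evidence for label, s in scores.items()}


# Hypothetical toy data: (outlook, temperature) with a yes/no class label.
ROWS = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
        ("rain", "hot"), ("sunny", "mild")]
LABELS = ["yes", "yes", "yes", "no", "no"]

model = train(ROWS, LABELS)
probs = posterior(("sunny", "mild"), *model)
print(probs)
```

For the row `("sunny", "mild")` the unnormalised scores are 4/15 for "yes" and 1/10 for "no", so the predicted probability of "yes" is 8/11.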

15 Bayesian Belief Network
- Provides a graphical model of causal relationships
  - Joint conditional probability distribution
  - Also called: Bayesian network, belief network, probabilistic network
- Components:
  - Directed Acyclic Graph (DAG)
  - Conditional Probability Table (CPT)
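
The two components combine as follows: the joint probability of a full assignment is the product, over all variables, of each variable's CPT entry given its parents in the DAG. The network below (rain and sprinkler as parents of wet grass) and all its numbers are made up for illustration.

```python
# Hypothetical network: rain -> wet <- sprinkler.
# DAG: each variable maps to the tuple of its parents.
PARENTS = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

# CPTs: P(variable = True | parent values); the numbers are made up.
CPT = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {(True, True): 0.99, (True, False): 0.9,
            (False, True): 0.8, (False, False): 0.0},
}


def joint_probability(assignment):
    """P(assignment) as the product of P(x | parents(x)) over all variables."""
    prob = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob


# P(rain, no sprinkler, wet grass) = 0.2 * 0.9 * 0.9
print(joint_probability({"rain": True, "sprinkler": False, "wet": True}))
```

Summing `joint_probability` over all eight assignments gives 1, which is a quick sanity check on the tables.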

18 Prediction
- Used to predict continuous values
- Approach: regression techniques
  - Linear and multiple regression
  - Non-linear regression
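
The simplest of these techniques, linear regression with one predictor, can be sketched with the standard least-squares formulas. The function name and the data points are hypothetical, and the sketch assumes the x values are not all identical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data lying exactly on y = 1 + 2x.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(XS, YS)
prediction = intercept + slope * 5.0  # predicted continuous value at x = 5
print(intercept, slope, prediction)
```

Multiple regression extends the same idea to several predictors, typically solved in matrix form rather than with these scalar sums.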

19 Problems/Issues
- Estimating classifier accuracy
  - Effective methods for estimating classifier accuracy
  - k-fold cross-validation, sensitivity, specificity
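
A minimal sketch of these evaluation ideas follows. The function names are hypothetical, the folds are formed by simple interleaving without shuffling, and the sensitivity/specificity helper assumes both positive and negative examples are present.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def sensitivity_specificity(actual, predicted, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Each sample is used for testing exactly once across the k folds.
for train_idx, test_idx in k_fold_indices(6, 3):
    print("train", train_idx, "test", test_idx)

# Hypothetical predictions from some classifier.
ACTUAL = ["y", "y", "y", "n", "n"]
PREDICTED = ["y", "y", "n", "n", "y"]
print(sensitivity_specificity(ACTUAL, PREDICTED, "y"))
```

In practice a classifier would be trained on each `train` split and scored on the matching `test` split, with the per-fold scores averaged.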