Associative Classification (AC) Mining for a Personnel Scheduling Problem
Fadi Thabtah

Trainer scheduling problem
A schedule assigns courses (events) to resources: locations, staff (trainers) and timeslots.

Trainer scheduling problem
Assign a number of training courses (events) to a limited number of training staff, locations and timeslots. Each course has a numerical priority value, and each trainer is penalised depending on the travel distance.

Objective function
Maximise (total priority of scheduled events) - (total penalty for training staff).
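As a rough sketch, the objective can be evaluated over a candidate schedule as below. The `Event` and `Assignment` classes, and storing a precomputed `travel_penalty` on each assignment, are hypothetical modelling choices; the slides do not say how travel distance is turned into a penalty.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    name: str
    priority: int  # numerical priority value of the course

@dataclass(frozen=True)
class Assignment:
    event: Event
    trainer: str
    location: str
    timeslot: int
    travel_penalty: int  # hypothetical: trainer's travel penalty, precomputed from distance

def objective(schedule: list[Assignment]) -> int:
    """Total priority of scheduled events minus total trainer penalty (to be maximised)."""
    total_priority = sum(a.event.priority for a in schedule)
    total_penalty = sum(a.travel_penalty for a in schedule)
    return total_priority - total_penalty

excel = Event("Excel basics", 50)
sql = Event("SQL intro", 80)
schedule = [
    Assignment(excel, "T1", "Nottingham", 1, 10),
    Assignment(sql, "T2", "Leeds", 1, 25),
]
print(objective(schedule))  # 95, i.e. (50 + 80) - (10 + 25)
```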

Hyperheuristic approach
A hyperheuristic operates at a higher level of abstraction than a metaheuristic. You may think of it as a supervisor that, at each decision point, chooses which simple local search neighbourhood (low-level heuristic) to apply.

Low-level heuristics (LLHs)
LLHs are problem-oriented, represent simple methods used by human experts, and are easy to implement. Examples: add a new event to the schedule, swap two events in the schedule, or replace one event in the schedule with another.
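The three example LLHs can be sketched as small functions that return a perturbed copy of the schedule. Purely for illustration, a schedule is assumed to be a list of slots, each holding an event name or `None`; the function names and the `rng` parameter are hypothetical.

```python
import random

def add_event(schedule, candidates, rng):
    """LLH 'add': put a not-yet-scheduled event into a random empty slot."""
    new = list(schedule)
    empty = [i for i, e in enumerate(new) if e is None]
    pending = [e for e in candidates if e not in new]
    if empty and pending:
        new[rng.choice(empty)] = rng.choice(pending)
    return new

def swap_events(schedule, candidates, rng):
    """LLH 'swap': exchange the contents of two slots."""
    new = list(schedule)
    if len(new) >= 2:
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new

def replace_event(schedule, candidates, rng):
    """LLH 'replace': swap a scheduled event out for a not-yet-scheduled one."""
    new = list(schedule)
    filled = [i for i, e in enumerate(new) if e is not None]
    pending = [e for e in candidates if e not in new]
    if filled and pending:
        new[rng.choice(filled)] = rng.choice(pending)
    return new

rng = random.Random(0)
print(add_event([None, "A"], ["A", "B"], rng))  # ['B', 'A']
```

Returning copies keeps the current solution intact, so the hyperheuristic can still reject the perturbed one.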

Figure: the hyperheuristic takes the current solution, applies one of Low Level Heuristic 1, 2 or 3, and produces a perturbed solution.

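Building a schedule using a hyperheuristic
Starting from an initial solution, the hyperheuristic algorithm selects a low-level heuristic from the set of LLHs and applies it to obtain a perturbed solution. The perturbed solution becomes the current solution according to an acceptance criterion, while the objective value and CPU time are monitored throughout.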
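The loop in the diagram can be sketched as a simple random hyperheuristic. The uniform random choice of LLH, the two acceptance options (`"improving"`, which keeps non-worsening moves, and `"all"`, which keeps every move) and all parameter names are assumptions for illustration; the slides do not specify the selection or acceptance rule.

```python
import random
import time

def simple_hyperheuristic(initial, objective, llhs, accept="improving",
                          max_iters=1000, time_limit=None, seed=0):
    """Pick an LLH at random, apply it, and decide whether to keep the
    perturbed solution (maximisation). Each LLH has signature llh(solution, rng)."""
    rng = random.Random(seed)
    current, current_val = initial, objective(initial)
    best, best_val = current, current_val
    start = time.perf_counter()
    for _ in range(max_iters):
        if time_limit is not None and time.perf_counter() - start > time_limit:
            break  # CPU-time budget exhausted
        llh = rng.choice(llhs)
        candidate = llh(current, rng)
        cand_val = objective(candidate)
        # Acceptance criterion decides whether the perturbed solution becomes current
        if accept == "all" or cand_val >= current_val:
            current, current_val = candidate, cand_val
        if current_val > best_val:
            best, best_val = current, current_val
    return best, best_val

# Toy usage: integer solutions, three trivial LLHs, objective peaks at x = 10
llhs = [lambda x, r: x + 1, lambda x, r: x - 1, lambda x, r: x + 2]
best, val = simple_hyperheuristic(0, lambda x: -(x - 10) ** 2, llhs)
print(best, val)  # 10 0
```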

Advantages of hyperheuristics
Cheap and fast to implement.
Produce good-quality solutions, comparable to those obtained by hard-to-implement metaheuristic methods.
Require limited domain-specific knowledge.
Robust: can be applied effectively to a wide range of problems and problem instances.

Current hyperheuristic approaches
Simple hyperheuristics (Cowling et al., 2001-2002)
Choice-function-based hyperheuristics (Cowling et al., 2001-2002)
Hyperheuristics based on genetic algorithms (Cowling et al., 2002; Han et al., 2002)
Hybrid hyperheuristics (Cowling and Chakhlevitch, 2003-2004)
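A choice-function hyperheuristic scores every LLH and applies the highest-scoring one. The sketch below follows the general idea of combining an LLH's recent improvement (f1), its improvement when applied right after the previous LLH (f2) and the time since it was last called (f3). The fixed weights and the "last improvement only" bookkeeping are simplifying assumptions, not the exact function used by Cowling et al.

```python
class ChoiceFunction:
    """Simplified choice-function LLH selector (weights and scoring are illustrative)."""

    def __init__(self, n_llh, alpha=1.0, beta=0.5, delta=0.1):
        self.alpha, self.beta, self.delta = alpha, beta, delta
        self.f1 = [0.0] * n_llh                           # last improvement of each LLH
        self.f2 = [[0.0] * n_llh for _ in range(n_llh)]   # last improvement of j right after k
        self.last_called = [0] * n_llh                    # iteration each LLH was last used
        self.prev = None                                  # previously applied LLH
        self.t = 0                                        # iteration counter

    def score(self, j):
        pair = self.f2[self.prev][j] if self.prev is not None else 0.0
        idle = self.t - self.last_called[j]               # f3: favours rarely used LLHs
        return self.alpha * self.f1[j] + self.beta * pair + self.delta * idle

    def select(self):
        """Index of the highest-scoring LLH (ties go to the lowest index)."""
        return max(range(len(self.f1)), key=self.score)

    def update(self, j, improvement):
        """Record the objective change produced by applying LLH j."""
        self.t += 1
        self.f1[j] = improvement
        if self.prev is not None:
            self.f2[self.prev][j] = improvement
        self.last_called[j] = self.t
        self.prev = j
```

An LLH that keeps worsening the objective loses score, while the idle-time term eventually gives neglected LLHs another chance.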

Why data mining?
Scenario: while constructing a solution to the scheduling problem, the hyperheuristic must choose an appropriate LLH at each choice point, so an expert decision maker is needed (classification). There are two approaches:
1. Learn the performance of the LLHs from past schedules and use it to predict the appropriate LLH in the current one.
2. Learn and predict LLHs while the schedule is being constructed, so-called learning "on the fly".

Classification algorithm
Classification is a two-step process:
1. Classifier building: describe a set of predetermined classes (here, the LLHs) from the training data in the form of classification rules.
2. Classifier usage: calculate the error rate; if it is acceptable, apply the classifier to the test data.
Figure: training data (RowId, A1, A2, Class/LLH) are fed to the classification algorithm, which produces classification rules that are then applied to test data (RowId, A1, A2, Class).
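The two steps can be sketched as follows. The learner is a deliberately trivial stand-in (majority class per attribute-value combination, with a majority-class default), not an AC algorithm, and the rows, LLH labels and `MAX_ERROR` threshold are hypothetical.

```python
from collections import Counter, defaultdict

def build_classifier(training):
    """Step 1: one rule per (A1, A2) combination, predicting its majority class."""
    by_key = defaultdict(Counter)
    for attrs, label in training:
        by_key[attrs][label] += 1
    rules = {key: counts.most_common(1)[0][0] for key, counts in by_key.items()}
    default = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda attrs: rules.get(attrs, default)

def error_rate(classifier, data):
    """Step 2a: fraction of labelled rows the classifier gets wrong."""
    wrong = sum(classifier(attrs) != label for attrs, label in data)
    return wrong / len(data)

training = [(("x1", "y1"), "llh1"), (("x1", "y2"), "llh2"), (("x2", "y1"), "llh1"),
            (("x1", "y1"), "llh1"), (("x2", "y2"), "llh2")]
holdout = [(("x1", "y1"), "llh1"), (("x2", "y2"), "llh2"), (("x2", "y1"), "llh3")]

clf = build_classifier(training)
err = error_rate(clf, holdout)
MAX_ERROR = 0.4  # hypothetical acceptance threshold
if err <= MAX_ERROR:
    # Step 2b: error rate is acceptable, so classify unlabelled test rows
    print([clf(attrs) for attrs in [("x1", "y2"), ("x3", "y4")]])  # ['llh2', 'llh1']
```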

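Learning the performance of LLHs (hyperheuristic solution)
Each LLH is applied K times, and every application is logged with the attributes llh, oldpriority, newpriority, oldpenalty, newpenalty and applied; a sample row reads llh = 1, oldpriority = 71954, newpriority = 72054, oldpenalty = 793, newpenalty = 790. Data mining techniques then produce a set of If/Then rules that guides a derived hyperheuristic algorithm.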
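One way to turn logged LLH applications into training labels is to measure each application's change in the objective (priority minus penalty) and label a choice point with the LLH that improved it most. This labelling rule, the `LLHRecord` class and the second record are illustrative assumptions; the first record reuses the sample values from the slide, assuming its column order.

```python
from dataclasses import dataclass

@dataclass
class LLHRecord:
    llh: int
    old_priority: int
    new_priority: int
    old_penalty: int
    new_penalty: int

    def improvement(self):
        """Change in objective (priority minus penalty) caused by this application."""
        return (self.new_priority - self.new_penalty) - (self.old_priority - self.old_penalty)

def best_llh(records):
    """Hypothetical labelling rule: the LLH with the largest objective improvement."""
    return max(records, key=LLHRecord.improvement).llh

r1 = LLHRecord(1, 71954, 72054, 793, 790)
r2 = LLHRecord(2, 71954, 71954, 793, 700)  # made-up competing record
print(r1.improvement(), r2.improvement(), best_llh([r1, r2]))  # 103 93 1
```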

Association rules mining
Association rule mining is a strong tool that aims to find relationships between variables in a database. It is widely applied, especially in market basket analysis, to infer the presence of items from the presence of other items in a customer's shopping cart. Example: if a customer buys milk, what is the probability that he/she also buys cereal? Unlike classification, association rule mining has no pre-specified target class.
Advantages: item shelving, sales promotions, future planning.
Transactional database:
Transaction Id | Items | Time
12 | bread, milk, juice | 10:12
13 | bread, juice, milk | 12:13
14 | milk, beer, bread, juice | 13:22
15 | bread, eggs, milk | 13:26
16 | beer, basket, bread, juice | 15:11
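Support and confidence can be computed directly on the five transactions in the table, and a level-wise (Apriori-style) search finds the frequent itemsets. This is a minimal sketch: support is expressed as a fraction of transactions, and candidates are counted directly instead of being pruned by the subset check that full Apriori uses.

```python
from itertools import combinations

transactions = {
    12: {"bread", "milk", "juice"},
    13: {"bread", "juice", "milk"},
    14: {"milk", "beer", "bread", "juice"},
    15: {"bread", "eggs", "milk"},
    16: {"beer", "basket", "bread", "juice"},
}

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= items for items in transactions.values()) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def apriori(min_support):
    """Level-wise search: only frequent k-itemsets are joined into (k+1)-itemsets."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items if support({i}) >= min_support]
    frequent = list(level)
    while level:
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
    return frequent

print(support({"milk", "bread"}))       # 0.8
print(confidence({"milk"}, {"bread"}))  # 1.0
```

With a minimum support of 0.6, the frequent itemsets are {bread}, {juice}, {milk}, their three pairs, and {bread, juice, milk}.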

Associative Classification (AC)
AC is a special case of association rule mining that considers only the class label as the consequent of a rule. It derives a set of class association rules from the training data set that satisfy certain user constraints, i.e. support and confidence thresholds, in order to discover the correlations between objects and class labels. Examples: CBA, CPAR, CMAR.
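Once class association rules exist, a CBA-style classifier ranks them and predicts with the first rule whose condition matches. The sketch assumes CBA's usual precedence (higher confidence, then higher support, then earlier generation) and a user-supplied default class; the `CAR` tuple and the example rules are hypothetical.

```python
from typing import NamedTuple

class CAR(NamedTuple):
    condition: frozenset  # set of (attribute, value) pairs
    label: str            # class label, the rule consequent
    support: int
    confidence: float

def rank(rules):
    """Higher confidence first, then higher support; sorted() is stable, so ties keep generation order."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support))

def predict(ranked, instance, default):
    """Class of the highest-ranked rule whose condition the instance satisfies."""
    items = set(instance.items())
    for rule in ranked:
        if rule.condition <= items:
            return rule.label
    return default

r1 = CAR(frozenset({("A1", "x1")}), "llh1", 3, 0.75)
r2 = CAR(frozenset({("A1", "x1"), ("A2", "y1")}), "llh2", 2, 1.0)
ranked = rank([r1, r2])
print(predict(ranked, {"A1": "x1", "A2": "y1"}, "llh0"))  # llh2
```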

AC steps
Training data are fed to an associative classification algorithm, which finds the frequent ruleitems (attribute values that pass the support threshold) and derives the class association rules for the user.
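For small data sets the AC steps can be sketched by enumerating every condition of up to `max_len` attribute values, counting ruleitems, and keeping those that pass the support and confidence thresholds. Exhaustive enumeration is an assumption chosen for brevity; CBA, CMAR and MCAR use more efficient search strategies. Support is an absolute count of training objects, and the rows and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_cars(rows, labels, min_sup, min_conf, max_len=2):
    """Return class association rules as (condition, class, support, confidence)."""
    cond_count, ruleitem_count = Counter(), Counter()
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for k in range(1, max_len + 1):
            for cond in combinations(items, k):
                cond_count[cond] += 1
                ruleitem_count[(cond, label)] += 1
    cars = []
    for (cond, label), sup in ruleitem_count.items():
        if sup >= min_sup:                  # frequent ruleitem
            conf = sup / cond_count[cond]
            if conf >= min_conf:            # confident enough to become a rule
                cars.append((cond, label, sup, conf))
    return cars

rows = [{"A1": "x1", "A2": "y1"}, {"A1": "x1", "A2": "y2"},
        {"A1": "x1", "A2": "y1"}, {"A1": "x2", "A2": "y1"}]
labels = ["c1", "c2", "c1", "c2"]
cars = mine_cars(rows, labels, min_sup=2, min_conf=0.6)
for car in cars:
    print(car)
```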

Rule support and confidence
Given a training data set T and a rule R: P → c, where P is a set of attribute values and c is a class label:
The support of R, denoted sup(R), is the number of objects in T that match the condition P and have class label c.
The confidence of R, denoted conf(R), is the number of objects that match P and have class label c, divided by the number of objects that match P.
Any itemset whose support is larger than the user's minimum support is called a frequent itemset.
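For a rule R: P → c with condition P and class label c, the two definitions can be written compactly as below, where $P \subseteq t$ means that object $t$ matches P (notation assumed for illustration). For example, if 3 objects match P and 2 of them have class c, then sup(R) = 2 and conf(R) = 2/3.

```latex
\mathrm{sup}(R) = \bigl|\{\, t \in T : P \subseteq t \ \wedge\ \mathrm{class}(t) = c \,\}\bigr|
\qquad
\mathrm{conf}(R) = \frac{\mathrm{sup}(R)}{\bigl|\{\, t \in T : P \subseteq t \,\}\bigr|}
```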

Current developed techniques
MCAR (Thabtah et al., Proceedings of the 3rd IEEE International Conference on Computer Systems and Applications, pp. 1-7)
MMAC (Thabtah et al., Knowledge and Information Systems (2006) 00:1-21)
MCAR characteristics:
Combines two general data mining approaches (association rule mining and classification).
Suitable for traditional classification problems.
Employs a new method of finding the rules.
MMAC characteristics:
Produces classifiers that are suitable not only for traditional binary classification problems but also for multi-class label problems such as medical diagnosis and text classification.
Presents three evaluation accuracy measures.
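To illustrate what a multi-label rule can look like, the sketch attaches to each single attribute-value condition every class label that passes the support and confidence thresholds, ranked by confidence. This is a simplified, hypothetical illustration of a multi-label consequent, not the MMAC algorithm; the attribute `stage` and the LLH labels are made up.

```python
from collections import Counter, defaultdict

def multi_label_rules(rows, labels, min_sup, min_conf):
    """Map each (attribute, value) condition to a ranked list of (label, confidence)."""
    by_cond = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for item in row.items():
            by_cond[item][label] += 1
    rules = {}
    for cond, class_counts in by_cond.items():
        total = sum(class_counts.values())
        # most_common() sorts by count; equal counts keep first-seen order
        ranked = [(label, n / total) for label, n in class_counts.most_common()
                  if n >= min_sup and n / total >= min_conf]
        if ranked:
            rules[cond] = ranked
    return rules

rows = [{"stage": "early"}] * 4 + [{"stage": "late"}] * 2
labels = ["llh1", "llh2", "llh1", "llh2", "llh3", "llh3"]
rules = multi_label_rules(rows, labels, min_sup=2, min_conf=0.4)
print(rules[("stage", "early")])  # [('llh1', 0.5), ('llh2', 0.5)]
```

A rule such as "stage = early → llh1 or llh2" keeps both LLHs that performed equally well instead of discarding one.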

Data and experiments
Learning approach: learn the performance of the LLHs from past schedules to predict the appropriate LLH in the current one.
Thresholds: support = 5%, confidence = 40%.
Number of data sets: 12-16 UCI data sets and 9 solutions of the trainer scheduling problem.
Algorithms used: CBA (AC algorithm), MMAC (AC algorithm), C4.5 (decision tree algorithm), RIPPER (covering algorithm) and PART (hybrid classification algorithm).

Relative prediction accuracy of the MMAC accuracy measures with respect to PART

Relative prediction accuracy of the MMAC accuracy measures with respect to CBA
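Relative prediction accuracy is assumed here to be the accuracy difference divided by the baseline's accuracy, a common way to express such comparisons; the slides do not state the exact formula, and the example values are hypothetical.

```python
def relative_accuracy(acc_model, acc_baseline):
    """Relative gain of acc_model over acc_baseline (positive means the model is more accurate)."""
    if acc_baseline == 0:
        raise ValueError("baseline accuracy must be non-zero")
    return (acc_model - acc_baseline) / acc_baseline

print(relative_accuracy(90.0, 80.0))  # 0.125, i.e. 12.5% better than the baseline
```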

Number of Rules of CBA, PART and Top-label

Accuracy (%) for PART, RIPPER, CBA and MMAC on UCI data sets

Comparison between AC algorithms on 12 UCI data sets

MCAR vs. CBA and C4.5 on UCI data sets
Dataset | Classification accuracy (%): MCAR, CBA, C4.5 | Number of rules: MCAR, CBA, C4.5
Tic-tac | 100, 83.61 | 26, 25, 95
Balloon | | 3
Contact | 93.33, 83.33 | 9, 6, 4
Led7 | 72.32, 71.1, 73.34 | 192, 50, 37
Breast-cancer | 71.61, 69.66, 75.52 | 71, 45
Weather | | 5
Heart-c | 80.4, 79.87, 78.12 | 72, 44, 12
Heart-s | 81.35, 79.2, 81.29 | 31, 22, 2
Lymph | 78.5, 75.09, 83.78 | 48, 38
Mushroom | 97.65, 94.18 | 33
primary-tumour | 40.5, 25.47, 42.47 | 28, 1, 23
Vote | 90.1, 86.91, 88.27 | 84, 40
CRX | 83.05, 85.31, 80.72 | 97, 43, 54
Sick | 93.88, 93.9, 93.87 | 17, 10
Credit-Card | 70.26, 70.4, 71.8 | 162, 116
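The MCAR comparison can be summarised by counting wins on the data sets whose six values (three accuracies, three rule counts) are all present. The sketch assumes the column order given in the table header (MCAR, CBA, C4.5 for both groups); the `wins` and `mean` helpers are hypothetical.

```python
# (dataset, accuracy MCAR, CBA, C4.5, rules MCAR, CBA, C4.5)
complete_rows = [
    ("Led7", 72.32, 71.1, 73.34, 192, 50, 37),
    ("Heart-c", 80.4, 79.87, 78.12, 72, 44, 12),
    ("Heart-s", 81.35, 79.2, 81.29, 31, 22, 2),
    ("primary-tumour", 40.5, 25.47, 42.47, 28, 1, 23),
    ("CRX", 83.05, 85.31, 80.72, 97, 43, 54),
]
ACC_MCAR, ACC_CBA, ACC_C45, RULES_MCAR, RULES_CBA, RULES_C45 = range(1, 7)

def wins(rows, a, b):
    """Number of data sets on which column a is strictly greater than column b."""
    return sum(row[a] > row[b] for row in rows)

def mean(rows, col):
    """Average of one column over the rows."""
    return sum(row[col] for row in rows) / len(rows)

print(wins(complete_rows, ACC_MCAR, ACC_CBA), wins(complete_rows, ACC_MCAR, ACC_C45))  # 4 3
print(mean(complete_rows, RULES_MCAR), mean(complete_rows, RULES_CBA))  # 84.0 32.0
```

On these rows MCAR is more accurate than CBA on four of five data sets, but it also generates more rules, which connects to the rule-growth challenge noted in the conclusions.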

Conclusions
Associative classification is a promising approach in data mining.
Since more than one LLH could improve the objective function in the hyperheuristic, the classifier needs multi-label rules.
Associative classifiers can produce more accurate classification models than traditional classification algorithms such as decision trees and rule induction approaches.
One challenge in associative classification is the exponential growth in the number of rules, so pruning becomes essential.
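With m distinct attribute values there can be up to 2^m - 1 candidate conditions per class, so some pruning is needed. One well-known option is database coverage, popularised by CBA: rules are visited in rank order, a rule is kept only if it correctly classifies at least one training object not yet covered, and every object it covers is then removed. This is a simplified sketch (CBA's default-class selection is omitted), and the data and rules are hypothetical.

```python
def database_coverage(ranked_rules, rows, labels):
    """Prune a ranked list of (condition, label) rules by database coverage."""
    remaining = set(range(len(rows)))  # indices of training objects not yet covered
    kept = []
    for condition, label in ranked_rules:
        covered = {i for i in remaining if condition <= set(rows[i].items())}
        if any(labels[i] == label for i in covered):
            kept.append((condition, label))
            remaining -= covered  # objects covered by a kept rule are removed
        if not remaining:
            break
    return kept

rows = [{"A": "x"}, {"A": "x"}, {"A": "y"}]
labels = ["c1", "c1", "c2"]
ranked_rules = [
    (frozenset({("A", "x")}), "c1"),
    (frozenset({("A", "x")}), "c2"),  # redundant: its objects are already covered
    (frozenset({("A", "y")}), "c2"),
    (frozenset(), "c1"),              # never reached: all objects covered
]
print(len(database_coverage(ranked_rules, rows, labels)))  # 2
```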

Future work
Constructing a hyperheuristic approach for the personnel scheduling problem.
Investigating the use of multi-class label classification algorithms with a hyperheuristic.
Implementing new data mining techniques, based on dynamic learning, that suit scheduling and optimisation problems.
Investigating rule pruning in AC mining.

Questions?