
1 Associative Classification (AC) Mining for A Personnel Scheduling Problem
Fadi Thabtah

2 Trainer scheduling problem
Schedule courses (events) using the available resources: locations, staff (trainers) and timeslots.

3 Trainer scheduling problem
- Assign a number of training courses (events) to a limited number of training staff, locations and timeslots
- Each course has a numerical priority value
- Each trainer is penalised depending on the travel distance

4 Objective Function
Maximise the total priority of the scheduled events while minimising the total travel penalty incurred by the training staff.
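In symbols, a minimal sketch (the notation is ours, not from the slides): maximise the summed priorities of the scheduled events minus the summed trainer penalties.

```latex
\max \;\; \sum_{e \,\in\, \text{scheduled events}} \mathrm{priority}(e) \;-\; \sum_{t \,\in\, \text{trainers}} \mathrm{penalty}(t)
```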

5 Hyperheuristic approach
A hyperheuristic operates at a higher level of abstraction than a metaheuristic: think of it as a supervisor that manages the choice of simple local-search neighbourhoods (low-level heuristics) at any point in time.

6 Low-level heuristics
Problem-oriented: they represent simple methods used by human experts and are easy to implement. Examples:
- Add a new event to the schedule
- Swap two events in the schedule
- Replace one event in the schedule by another
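A minimal sketch of what these three moves could look like in code (the list-based schedule and the pool of unscheduled events are assumptions for illustration, not part of the original approach):

```python
import random

def add_event(schedule, unscheduled):
    """Add one randomly chosen unscheduled event to the schedule."""
    if unscheduled:
        schedule.append(unscheduled.pop(random.randrange(len(unscheduled))))
    return schedule

def swap_events(schedule):
    """Swap the positions (e.g. timeslots) of two events already in the schedule."""
    if len(schedule) >= 2:
        i, j = random.sample(range(len(schedule)), 2)
        schedule[i], schedule[j] = schedule[j], schedule[i]
    return schedule

def replace_event(schedule, unscheduled):
    """Replace one scheduled event by a randomly chosen unscheduled one."""
    if schedule and unscheduled:
        i = random.randrange(len(schedule))
        j = random.randrange(len(unscheduled))
        schedule[i], unscheduled[j] = unscheduled[j], schedule[i]
    return schedule
```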

7 Hyperheuristic
[Diagram: the hyperheuristic applies one of Low-Level Heuristic 1, 2 or 3 to the current solution to obtain a perturbed solution.]

8 Building a Schedule using a Hyperheuristic
[Diagram: starting from an initial solution, the hyperheuristic algorithm repeatedly selects a low-level heuristic from the available set, applies it to obtain a perturbed solution, observes CPU time and objective value, and updates the current solution according to the acceptance criterion.]
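A minimal sketch of that loop, assuming the objective is maximised and an accept-if-not-worse criterion (the objective function, acceptance rule and time budget are placeholders for whatever the real hyperheuristic uses):

```python
import copy
import random
import time

def hyperheuristic(initial_solution, low_level_heuristics, objective, time_budget=10.0):
    """Repeatedly select a low-level heuristic, perturb the current solution,
    and keep the perturbed solution when the acceptance criterion is met."""
    current = initial_solution
    current_value = objective(current)
    start = time.time()
    while time.time() - start < time_budget:        # CPU-time budget
        llh = random.choice(low_level_heuristics)   # choice point: which LLH to apply
        candidate = llh(copy.deepcopy(current))     # perturbed solution
        value = objective(candidate)
        if value >= current_value:                  # acceptance criterion (not worse)
            current, current_value = candidate, value
    return current, current_value
```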

9 Advantages of hyperheuristics
- Cheap and fast to implement
- Produce solutions of good quality (comparable to those obtained by hard-to-implement metaheuristic methods)
- Require limited domain-specific knowledge
- Robust: can be applied effectively to a wide range of problems and problem instances

10 Current Hyperheuristic Approaches
- Simple hyperheuristics (Cowling et al.)
- Choice-function-based hyperheuristics (Cowling et al., 2001–2002)
- Hyperheuristics based on genetic algorithms (Cowling et al., 2002; Han et al., 2002)
- Hybrid hyperheuristics (Cowling and Chakhlevitch)

11 Why Data Mining?
Scenario: while constructing a solution to the scheduling problem, the hyperheuristic manages the choice of an appropriate LLH at each choice point, so an expert decision maker is needed (a classification task). Two approaches:
- Learn the performance of the LLHs from past schedules to predict the appropriate LLH in the current one
- Learn and predict the LLH while the schedule is being constructed, so-called learning "on the fly"

12 Classification Algorithm
Classification is a two-step process:
1. Classifier building: learn a model that describes a set of predetermined classes from the training data
2. Classifier usage: estimate the error rate; if the error rate is acceptable, apply the classifier to new (test) data
[Diagram: a training table with attributes A1, A2 and a class/LLH label feeds the classification algorithm, which produces classification rules that are then applied to a test table.]
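A minimal sketch of the two steps with a generic decision-tree learner (scikit-learn is used only for illustration; the tiny attribute vectors and LLH labels are made up):

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy training data: attribute vectors (A1, A2) and a class / LLH label per row.
X = [[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 0], [1, 1], [0, 0]]
y = ["llh1", "llh2", "llh1", "llh2", "llh1", "llh2", "llh2", "llh1"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: classifier building -- learn a model describing the predetermined classes.
clf = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: classifier usage -- estimate the error rate; if acceptable, apply to new data.
error_rate = 1.0 - clf.score(X_test, y_test)
print("error rate:", error_rate)
print("predicted LLH for [1, 1]:", clf.predict([[1, 1]]))
```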

13 Learning the Performance of the LLHs
[Table: a log of a hyperheuristic solution applied K times, with columns llh, old priority, new priority, old penalty, new penalty and applied.]
Data mining techniques turn this log into a rule set (if/then rules) that guides a derived hyperheuristic algorithm.
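One plausible way to use such a log (a sketch under our own assumptions: the "improved" label, the feature set and the numbers are illustrative) is to label each LLH application by whether it improved the objective and hand the labelled rows to a rule learner:

```python
# Illustrative log rows: (llh, old_priority, new_priority, old_penalty, new_penalty).
log = [
    (1, 71954, 72054, 793, 790),   # LLH 1 raised the priority and lowered the penalty
    (2, 72054, 71054, 790, 761),   # LLH 2 lowered the priority but also the penalty
]

examples = []
for llh, old_p, new_p, old_pen, new_pen in log:
    # Label: did the move improve the objective (priority up, penalty down)?
    improved = (new_p - old_p) - (new_pen - old_pen) > 0
    examples.append({"llh": llh, "improved": improved})

# A rule learner or AC algorithm would then generalise these examples into an
# if/then rule set that the derived hyperheuristic consults at each choice point.
print(examples)
```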

14 Association Rule Mining
A strong tool that aims to find relationships between variables in a database. It is applied widely, especially in market basket analysis, to infer the presence of items from the presence of other items in the customer's shopping cart.
Example: if a customer buys milk, what is the probability that he/she also buys cereal?
Unlike classification, the target class is not pre-specified in association rule mining.
Advantages: item shelving, sales promotions, future planning.
Transactional database:
Transaction Id | Items | Time
12 | bread, milk, juice | 10:12
13 | bread, juice, milk | 12:13
14 | milk, beer, bread, juice | 13:22
15 | bread, eggs, milk | 13:26
16 | beer, basket, bread, juice | 15:11
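As a quick illustration, the support and confidence of a rule such as {bread} → {milk} can be counted directly from the five transactions above (a hand-rolled sketch; association-rule miners automate this over all candidate itemsets):

```python
# Transactions from the slide (item sets only; times omitted).
transactions = [
    {"bread", "milk", "juice"},
    {"bread", "juice", "milk"},
    {"milk", "beer", "bread", "juice"},
    {"bread", "eggs", "milk"},
    {"beer", "basket", "bread", "juice"},
]

antecedent, consequent = {"bread"}, {"milk"}
n_antecedent = sum(antecedent <= t for t in transactions)             # bread appears in 5
n_both = sum((antecedent | consequent) <= t for t in transactions)    # bread and milk in 4

support = n_both / len(transactions)   # 4/5 = 0.8
confidence = n_both / n_antecedent     # 4/5 = 0.8
print(f"support={support:.2f}, confidence={confidence:.2f}")
```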

15 Associative Classification (AC)
A special case of association rule mining that considers only the class label as the consequent of a rule. It derives from the training data set a set of class association rules that satisfy certain user constraints, i.e. support and confidence thresholds, in order to discover the correlations between objects and class labels. For example, a class association rule might read "IF outlook = sunny AND humidity = high THEN class = no". Example algorithms: CBA, CPAR, CMAR.

16 AC Steps
Training data → associative classification algorithm → frequent ruleitems (attribute values that pass the user-specified support threshold) → class association rules.
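A rough sketch of those steps for single-attribute-value ruleitems only (real AC algorithms such as CBA or MCAR also combine attribute values; the toy data and thresholds are illustrative):

```python
from collections import Counter

# Toy training data: attribute-value pairs plus a class label per row (illustrative).
rows = [
    ({"outlook": "sunny", "windy": "no"},  "llh1"),
    ({"outlook": "sunny", "windy": "yes"}, "llh2"),
    ({"outlook": "rain",  "windy": "no"},  "llh1"),
    ({"outlook": "rain",  "windy": "no"},  "llh1"),
    ({"outlook": "rain",  "windy": "yes"}, "llh2"),
]
min_support, min_confidence = 0.2, 0.6

item_counts = Counter()      # occurrences of each (attribute, value)
ruleitem_counts = Counter()  # occurrences of each ((attribute, value), class)
for attrs, label in rows:
    for av in attrs.items():
        item_counts[av] += 1
        ruleitem_counts[(av, label)] += 1

# Keep frequent ruleitems and turn the confident ones into class association rules.
for ((attr, value), label), count in ruleitem_counts.items():
    support = count / len(rows)
    confidence = count / item_counts[(attr, value)]
    if support >= min_support and confidence >= min_confidence:
        print(f"IF {attr}={value} THEN class={label}  (sup={support:.2f}, conf={confidence:.2f})")
```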

17 Rule support and confidence
Given a training data set T, for a rule R: A → c:
- The support of R, denoted sup(R), is the number of objects in T matching R's condition A and having the class label c
- The confidence of R, denoted conf(R), is the number of objects matching R's condition and having class label c divided by the number of objects matching R's condition
- Any item whose support exceeds the user-specified minimum support is called a frequent itemset
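In symbols, a standard formulation consistent with the slide (writing the rule as R: A → c over the training set T):

```latex
\mathrm{sup}(R) \;=\; \bigl|\{\, t \in T : t \text{ satisfies } A \ \text{and}\ \mathrm{class}(t) = c \,\}\bigr|,
\qquad
\mathrm{conf}(R) \;=\; \frac{\mathrm{sup}(R)}{\bigl|\{\, t \in T : t \text{ satisfies } A \,\}\bigr|}
```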

18 Currently Developed Techniques
MCAR (Thabtah et al., Proceedings of the 3rd IEEE International Conference on Computer Systems and Applications, pp. 1–7)
MMAC (Thabtah et al., Journal of Knowledge and Information Systems (2006) 00:1–21)
MCAR characteristics:
- Combines two general data mining approaches, i.e. association rule mining and classification
- Suitable for traditional classification problems
- Employs a new method of finding the rules
MMAC characteristics:
- Produces multi-label classifiers that are suitable not only for traditional binary classification problems but also for multi-class-label problems such as medical diagnosis and text classification
- Presents three evaluation accuracy measures

19 Data and Experiments
Learning approach: learn the performance of the LLHs from past schedules to predict the appropriate LLH in the current one
Thresholds: support = 5%, confidence = 40%
Data sets: UCI data sets and 9 solutions of the trainer scheduling problem
Algorithms used:
- CBA (AC algorithm)
- MMAC (AC algorithm)
- Decision tree algorithm (C4.5)
- Covering algorithm (RIPPER)
- Hybrid classification algorithm (PART)

20 Relative prediction accuracy with respect to PART for the accuracy measures of the MMAC algorithm

21 Relative prediction accuracy with respect to CBA for the accuracy measures of the MMAC algorithm

22 Number of Rules of CBA, PART and Top-label

23 Accuracy (%) for PART, RIPPER, CBA and MMAC on UCI data sets

24 Comparison between AC algorithms on 12 UCI data sets

25 MCAR vs. CBA and C4.5 on UCI data sets
[Table: for each data set, classification accuracy (%) for MCAR, CBA and C4.5, followed by number of rules for MCAR, CBA and C4.5. Values as extracted (some rows are incomplete):
Tic-tac: 100, 83.61, 26, 25, 95
Balloon: 3
Contact: 93.33, 83.33, 9, 6, 4
Led7: 72.32, 71.1, 73.34, 192, 50, 37
Breast-cancer: 71.61, 69.66, 75.52, 71, 45
Weather: 5
Heart-c: 80.4, 79.87, 78.12, 72, 44, 12
Heart-s: 81.35, 79.2, 81.29, 31, 22, 2
Lymph: 78.5, 75.09, 83.78, 48, 38
Mushroom: 97.65, 94.18, 33
Primary-tumour: 40.5, 25.47, 42.47, 28, 1, 23
Vote: 90.1, 86.91, 88.27, 84, 40
CRX: 83.05, 85.31, 80.72, 97, 43, 54
Sick: 93.88, 93.9, 93.87, 17, 10
Credit-Card: 70.26, 70.4, 71.8, 162, 116]

26 Conclusions
- Associative classification is a promising approach in data mining
- Since more than one LLH can improve the objective function in the hyperheuristic, multi-label rules are needed in the classifier
- Associative classifiers produce more accurate classification models than traditional classification algorithms such as decision trees and rule-induction approaches
- One challenge in associative classification is the exponential growth of the rule set, so pruning becomes essential

27 Future Work
- Constructing a hyperheuristic approach for the personnel scheduling problem
- Investigating the use of multi-class-label classification algorithms within a hyperheuristic
- Implementing new data mining techniques based on dynamic learning, suitable for scheduling and optimisation problems
- Investigating rule pruning in AC mining

28 Questions?

