Behrouz Minaei, William Punch


Using Genetic Algorithms for Data Mining Optimization in an Educational Web-based System
GECCO 2003
Behrouz Minaei, William Punch
Genetic Algorithm Research and Application Group
Department of Computer Science and Engineering, Michigan State University

Topics
- Problem Overview
- Classification Methods: Classifiers, Combination of Classifiers
- Weighting the features: Using GA to choose the best set of weights
- Experimental Results
- Conclusion & Next Steps
12/5/2018 GECCO 2003

Problem Overview
This research is part of the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA), an online educational system developed at Michigan State University (MSU). In LON-CAPA, we work with two kinds of large data sets:
- Educational resources: web pages, demonstrations, simulations, individualized problems, quizzes, and examinations.
- Information about the users who create, modify, assess, or use these resources.
Goals:
- Find classes of students, i.e. groups of students who use these online resources in a similar way.
- Predict, for any individual student, to which class he or she belongs.

Data Set: PHY183 SS02
- 227 students
- 12 homework sets
- 184 problems

Class Labels (3 ways): 2-Classes, 3-Classes, 9-Classes

Extracted Features
- Total number of correct answers (success rate)
- Success at the first try
- Number of attempts needed to get the correct answer
- Time spent until the answer was correct
- Total time spent on the problem
- Participation in the communication mechanisms

Classifiers
Non-Tree Classifiers (using MATLAB):
- Bayesian Classifier
- 1NN
- kNN
- Multi-Layer Perceptron
- Parzen Window
- Combination of Multiple Classifiers (CMC)
- Genetic Algorithm (GA)
Decision Tree-Based Software:
- C5.0 (RuleQuest << C4.5 << ID3)
- CART (Salford Systems)
- QUEST (Univ. of Wisconsin)
- CRUISE (uses an unbiased variable selection technique)

GA Optimizer vs. Classifier
Two ways to use a GA:
- Apply the GA directly as a classifier.
- Use the GA as an optimization tool for resetting the parameters of other classifiers.
Most applications of GAs in pattern recognition use the GA as an optimizer for some parameters of the classification process, and many researchers have used GAs for feature selection and feature extraction. Here, the GA is applied to find an optimal set of feature weights that improves classification accuracy.
Download the GA Toolbox for MATLAB from: http://www.shef.ac.uk/~gaipp/ga-toolbox/

Fitness/Evaluation Function
Five classifiers: Multi-Layer Perceptron (about 2 minutes per evaluation), Bayesian Classifier, 1NN, kNN, Parzen Window; their combination, CMC, takes about 3 seconds per evaluation.
Divide the data into training and test sets (10-fold cross-validation).
Fitness function: the classification performance achieved by the classifier.
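As a concrete illustration, here is a minimal Python sketch of such a fitness function (hypothetical; the original work uses MATLAB, the CMC classifier, and 10-fold cross-validation): it applies a candidate weight vector to the features and scores a 1NN classifier by leave-one-out accuracy.

```python
import math

def weighted_1nn_accuracy(data, labels, weights):
    """Fitness sketch: scale each feature by its GA-supplied weight,
    then score a 1-nearest-neighbour classifier by leave-one-out
    accuracy (a simplified stand-in for 10-fold CV with CMC)."""
    scaled = [[w * x for w, x in zip(weights, row)] for row in data]
    correct = 0
    for i, row in enumerate(scaled):
        # nearest neighbour among all other points (leave-one-out)
        best_j = min((j for j in range(len(scaled)) if j != i),
                     key=lambda j: math.dist(row, scaled[j]))
        if labels[best_j] == labels[i]:
            correct += 1
    return correct / len(scaled)
```

The GA maximizes this value over the weight vector, so a weight of zero effectively removes a feature from the classification.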

Individual Representation
The GA Toolbox supports binary, integer, and floating-point chromosome representations.
Chrom = crtrp(NIND, FieldDR) creates a random real-valued matrix of size NIND x d:
- NIND specifies the number of individuals in the population.
- FieldDR is a matrix of size 2 x d and contains the boundaries of each variable of an individual.
FieldDR = [ 0 0 0 0 0 0;   % lower bound
            1 1 1 1 1 1];  % upper bound
Chrom = [ 0.23 0.17 0.95 0.38 0.06 0.26;
          0.35 0.09 0.43 0.64 0.20 0.54;
          0.50 0.10 0.09 0.65 0.68 0.46;
          0.21 0.29 0.89 0.48 0.63 0.89 ]
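A Python sketch of what crtrp does (the real implementation is the MATLAB GA Toolbox function; the name and interface here only mirror it for illustration):

```python
import random

def crtrp(nind, field_dr):
    """Create a random real-valued population of nind individuals.
    field_dr is a pair (lower_bounds, upper_bounds), one entry per
    gene, mirroring the toolbox's 2 x d FieldDR matrix."""
    lower, upper = field_dr
    return [[random.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(nind)]
```

For the feature-weighting problem, each of the d genes is one feature weight in [0, 1].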

GA Parameters
GGAP = 0.3                               % generation gap
XOVR = 0.7                               % crossover rate
MUTR = 1/600                             % mutation rate
MAXGEN = 500                             % maximum number of generations
NIND = 1000                              % number of individuals
SEL_F = 'sus'                            % selection function
XOV_F = 'recint'                         % crossover function
MUT_F = 'mutbga'                         % mutation function
OBJ_F = 'ObjFn2' / 'ObjFn3' / 'ObjFn9'   % objective function (2-, 3-, 9-Classes)

Simple GA
% Create initial population
Chrom = crtrp(NIND, FieldD);
% Evaluate initial population
ObjV = ObjFn2(data, Chrom);
gen = 0;
while gen < MAXGEN,
    % Assign fitness values to entire population
    FitnV = ranking(ObjV);
    % Select individuals for breeding
    SelCh = select(SEL_F, Chrom, FitnV, GGAP);
    % Recombine selected individuals (crossover)
    SelCh = recombin(XOV_F, SelCh, XOVR);
    % Perform mutation on offspring
    SelCh = mutate(MUT_F, SelCh, FieldD, MUTR);
    % Evaluate offspring, call objective function
    ObjVSel = ObjFn2(data, SelCh);
    % Reinsert offspring into current population
    [Chrom, ObjV] = reins(Chrom, SelCh, 1, 1, ObjV, ObjVSel);
    gen = gen + 1;
    MUTR = MUTR + 0.001;
end
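The same generational loop can be sketched in self-contained Python. This is a simplified stand-in, not the toolbox code: truncation selection, arithmetic crossover, and uniform-reset mutation replace ranking/sus/recint/mutbga, and it maximizes a user-supplied objective function.

```python
import random

def simple_ga(obj_fn, n_ind, n_var, bounds, max_gen,
              ggap=0.3, xovr=0.7, mutr=0.02):
    """Minimal generational GA sketch over real-valued individuals."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n_var)]
           for _ in range(n_ind)]
    for _ in range(max_gen):
        pop.sort(key=obj_fn, reverse=True)      # best first
        n_sel = max(2, int(ggap * n_ind))
        parents = pop[:n_sel]                   # truncation selection
        offspring = []
        for _ in range(n_sel):
            p1, p2 = random.sample(parents, 2)
            if random.random() < xovr:          # arithmetic crossover
                a = random.random()
                child = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            for i in range(n_var):              # uniform-reset mutation
                if random.random() < mutr:
                    child[i] = random.uniform(lo, hi)
            offspring.append(child)
        # reinsertion: offspring replace the worst individuals (elitist)
        pop = pop[:n_ind - n_sel] + offspring
    return max(pop, key=obj_fn)
```

In the slides, obj_fn would be the classifier accuracy (ObjFn2/ObjFn3/ObjFn9) evaluated on the weighted features.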

Selection - Stochastic Universal Sampling
Stochastic universal sampling is implemented by obtaining a cumulative sum of the fitness vector, FitnV, and generating N equally spaced numbers between 0 and sum(FitnV). Only one random number is generated; all the others are equally spaced from that point. The indices of the selected individuals are determined by comparing the generated numbers with the cumulative-sum vector. The probability of an individual being selected is then given by its fitness divided by the total fitness of the population, FitnV(i) / sum(FitnV).
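A minimal Python sketch of stochastic universal sampling as described above (the function name and list-based interface are illustrative):

```python
import random

def sus(fitnv, nsel):
    """Stochastic universal sampling: one random starting point, then
    nsel equally spaced pointers over the cumulative fitness sum."""
    total = sum(fitnv)
    step = total / nsel
    start = random.uniform(0, step)          # the single random number
    pointers = [start + i * step for i in range(nsel)]
    # Walk the cumulative sum once, recording where each pointer lands
    selected, cum, idx = [], 0.0, 0
    for p in pointers:
        while cum + fitnv[idx] < p:
            cum += fitnv[idx]
            idx += 1
        selected.append(idx)
    return selected
```

Because the pointers are equally spaced, an individual with twice the average fitness is guaranteed roughly two copies, which gives SUS lower spread than repeated roulette-wheel draws.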

Crossover
Intermediate recombination: NewChrom = recint(OldChrom)
OldChrom = [0.23 0.17 0.95 0.38 0.82 0.19;   % parent1
            ...]                              % parent2
New values are produced by adding the scaled difference between the parent values to the first parent: offspring = parent1 + Alpha * (parent2 - parent1). An internal table of scaling factors, Alpha, is produced, e.g.
Alpha = [0.13 0.50 0.32 0.16 0.23 0.06;   % for offspring1
         0.12 0.54 0.44 0.26 0.27 0.10]   % for offspring2
Each row of NewChrom is built from parent1, parent2, and the corresponding row of Alpha.
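A Python sketch of per-gene intermediate recombination (one offspring per call; the toolbox's recint draws each scaling factor from roughly [-0.25, 1.25], which is assumed here):

```python
import random

def recint(parent1, parent2):
    """Intermediate recombination: each offspring gene is
    parent1 + alpha * (parent2 - parent1), with a fresh scaling
    factor alpha drawn per gene from [-0.25, 1.25]."""
    child = []
    for x1, x2 in zip(parent1, parent2):
        alpha = random.uniform(-0.25, 1.25)
        child.append(x1 + alpha * (x2 - x1))
    return child
```

Alpha values outside [0, 1] let offspring fall slightly outside the hypercube spanned by the parents, which counteracts the shrinking of the population's spread.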

Mutation
OldChrom = [0.23 0.81 0.17 0.66 0.28 0.30;
            0.43 0.96 0.25 0.64 0.10 0.23]
NewChrom = mutbga(OldChrom, FieldDR, [0.02 1.0]);
mutbga produces an internal mask table selecting which genes mutate, e.g.
MutMx = [0 0 0 1 0 1;
         0 0 1 0 0 0]
A second internal table, delta, specifies the normalized mutation step size, e.g.
delta = [0.02 0.02 0.02 0.02 0.02 0.02;
         0.01 0.01 0.01 0.01 0.01 0.01]
Each masked gene is shifted by its step size; NewChrom - OldChrom shows the resulting mutation steps (zero for unmutated genes).
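A Python sketch of BGA-style mutation for one individual (the fixed-fraction step size here is a simplified stand-in for mutbga's actual delta computation):

```python
import random

def mutbga(chrom, bounds, mut_rate=0.02, shrink=1.0):
    """BGA-style mutation sketch: each gene mutates with probability
    mut_rate (the mask); a mutated gene moves by +/- delta, where
    delta is a fraction of the variable's range scaled by shrink."""
    lower, upper = bounds
    new = []
    for gene, lo, hi in zip(chrom, lower, upper):
        if random.random() < mut_rate:            # mutation mask
            delta = 0.1 * (hi - lo) * shrink      # normalized step size
            gene = gene + random.choice([-1, 1]) * delta
            gene = min(max(gene, lo), hi)         # clip to bounds
        new.append(gene)
    return new
```

Scaling the step by the variable's range keeps mutations meaningful regardless of how each feature weight is bounded.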

Results without GA (performance %)
Classifier               2-Classes  3-Classes  9-Classes
Tree classifiers:
  C5.0                     80.3       56.8       25.6
  CART                     81.5       59.9       33.1
  QUEST                    80.5       57.1       20.0
  CRUISE                   81.0       54.9       22.9
Non-tree classifiers:
  Bayes                    76.4       48.6       23.0
  1NN                      76.8       50.5       29.0
  kNN                      82.3       50.4       28.5
  Parzen                   75.0       48.1       21.5
  MLP                      79.5       50.9        -
  CMC                      86.8       70.9       51.0

Results of using GA

GA Optimization Results

Summary
- Four classifiers were used to segregate the students.
- A combination of multiple classifiers leads to a significant accuracy improvement in all three cases.
- The features were weighted, and a genetic algorithm was used to minimize the error rate.
- The genetic algorithm improved prediction accuracy by more than 10% in the 2-Class and 3-Class cases and by more than 12% in the 9-Class case.

Next Steps
- Use evolutionary computation for feature extraction to find the optimal solution in LON-CAPA data mining.
- Use genetic programming to extract new features and improve prediction accuracy.
- Apply EAs to find frequency dependency and association rules among the groups of problems (mathematical, optional response, numerical, Java applet, etc.).

Questions