Basic concepts of Data Mining, Clustering and Genetic Algorithms
Tsai-Yang Jea
Department of Computer Science and Engineering, SUNY at Buffalo

Data Mining Motivation
Mechanical production of data creates a need for mechanical consumption of data.
Large databases = vast amounts of information.
The difficulty lies in accessing it.

KDD and Data Mining
KDD: Extraction of knowledge from data
–“non-trivial extraction of implicit, previously unknown & potentially useful knowledge from data”
Data Mining: Discovery stage of the KDD process

Data Mining Techniques
Query tools
Statistical techniques
Visualization
On-line analytical processing (OLAP)
Clustering
Classification
Decision trees
Association rules
Neural networks
Genetic algorithms
Any technique that helps to extract more out of data is useful.

What’s Clustering
Clustering is a kind of unsupervised learning.
Clustering is a method of grouping data that share similar trends and patterns.
Clustering of data is a method by which large sets of data are grouped into clusters of smaller sets of similar data.
–Example: [Figure: the data set after clustering]
Thus, clustering means grouping data, or dividing a large data set into smaller data sets that share some similarity.

The usage of clustering
Engineering sciences such as pattern recognition and artificial intelligence have been using the concepts of cluster analysis. Typical examples to which clustering has been applied include handwritten characters, samples of speech, fingerprints, and pictures.
In the life sciences (biology, botany, zoology, entomology, cytology, microbiology), the objects of analysis are life forms such as plants, animals, and insects. Clustering analysis may range from developing complete taxonomies to classifying species into subspecies, and subspecies can be subdivided further.
Clustering analysis is also widely used in the information, policy, and decision sciences. Its applications to documents include votes on political issues, market surveys, product surveys, surveys of sales programs, and R&D.

A Clustering Example
[Figure: records grouped into Cluster 1 to Cluster 4, annotated with the attribute values "Income: High, Children: 1, Car: Luxury"; "Income: Low, Children: 0, Car: Compact"; "Car: Sedan and Children: 3"; "Income: Medium, Children: 2, Car: Truck"]

Different ways of representing clusters
[Figure: four panels, (a) to (d), each showing a different representation of clusters over the items a to k]

K Means Clustering (Iterative distance-based clustering)
K means clustering is an effective algorithm to extract a given number of clusters of patterns from a training set.
Once done, the cluster locations can be used to classify patterns into distinct classes.

K means clustering (Cont.)
Select the k cluster centers randomly.
Store the k cluster centers.
Loop until the change in cluster means is less than the amount specified by the user.
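The transcript keeps the initialization and the stopping rule but not the body of the loop. The sketch below fills that gap with the standard assignment and update steps of k-means, which is an assumption about what the slide showed. The function name `k_means` and the parameters `tol` and `seed` are hypothetical.

```python
import math
import random


def k_means(points, k, tol=1e-6, seed=0):
    """Cluster `points` (tuples of floats) into k groups."""
    rng = random.Random(seed)
    # Select the k cluster centers randomly (here: k distinct data points).
    centers = rng.sample(points, k)
    while True:
        # Assumed loop body, step 1: assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Assumed loop body, step 2: move each center to the mean of its points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        # Stop when the change in cluster means is below the user's threshold.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(points, k=2)
print(labels)
```

On this well-separated toy data the first three points end up in one cluster and the last three in the other; which cluster gets which index depends on the random initialization.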

The drawbacks of K-means clustering
The final clusters represent only a local optimum, not a global one, and completely different final clusters can arise from differences in the initial randomly chosen cluster centers (fig. 1).
We have to know in advance how many clusters there will be.
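The dependence on the initial centers can be reproduced on a tiny data set. The sketch below is an illustration, not the example from the slides: it runs the standard k-means iteration from two hand-picked starting positions on the four corners of a wide rectangle. The helper name `lloyd` and the data are hypothetical.

```python
import math


def lloyd(points, centers, tol=1e-9):
    """Run k-means from the given initial centers; return (labels, sse)."""
    k = len(centers)
    while True:
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Sum of squared distances from each point to its cluster center.
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


# Four corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])  # splits left / right
bad = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])   # splits bottom / top
print(good)  # ([0, 0, 1, 1], 1.0)
print(bad)   # ([0, 1, 0, 1], 16.0)
```

Both runs are stable under further iterations, yet the second has a squared error sixteen times larger. This is the kind of local optimum the slide refers to.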

Drawback of K-means clustering (Cont.) Figure 1

Clustering with Genetic Algorithm
Introduction to Genetic Algorithms
Elements of GAs
Genetic Representation
Genetic operators

Introduction to GAs
Inspired by biological evolution.
Many operators mimic the process of biological evolution, including
–Natural selection
–Crossover
–Mutation

Elements of GAs
Individual (chromosome):
–A feasible solution to an optimization problem
Population
–Set of individuals
–Should be maintained in each generation

Elements of GAs
Genetic operators (crossover, mutation, …)
Define the fitness function.
–The fitness function takes a single chromosome as input and returns a measure of the goodness of the solution represented by the chromosome.
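These elements map directly onto a few lines of code. The sketch below is a toy setup and not taken from the slides: the chromosome is a bit string, and the objective (count the 1 bits, often called OneMax) is an assumption chosen only because it is easy to check by hand.

```python
import random

rng = random.Random(0)


def random_chromosome(length):
    """An individual (chromosome): one candidate solution, here a bit string."""
    return [rng.randint(0, 1) for _ in range(length)]


def fitness(chromosome):
    """One chromosome in, one goodness score out (toy objective: count of 1s)."""
    return sum(chromosome)


# The population: a set of individuals kept from generation to generation.
population = [random_chromosome(8) for _ in range(6)]
best = max(population, key=fitness)
print(best, fitness(best))
```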

Genetic Representation
The most important starting point for developing a genetic algorithm
Each gene has its own special meaning
Based on this representation, we can define
–fitness evaluation function,
–crossover operator,
–mutation operator.

Genetic Representation (Cont.)
Example 1
[Figure: a chromosome drawn as a bit string over the attributes Outlook (Overcast, Rain, Sunny), Wind (Strong, Normal), and PlayTennis (Yes, No), with labels marking a chromosome, a gene, and an allele value]
The chromosome encodes the rule: If Outlook is Overcast or Rain and Wind is Strong, then PlayTennis = Yes
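The bit values of the figure did not survive in the transcript, so the sketch below uses a common textbook scheme as an assumption: one bit per attribute value, set to 1 when the rule allows that value. The value order follows the labels on the slide. The names `ATTRIBUTES`, `encode`, and `decode` are hypothetical.

```python
# Attribute values in the order they appear on the slide.
ATTRIBUTES = [
    ("Outlook", ["Overcast", "Rain", "Sunny"]),
    ("Wind", ["Strong", "Normal"]),
    ("PlayTennis", ["Yes", "No"]),
]


def encode(rule):
    """Map {attribute: set of allowed values} to a bit string."""
    genes = []
    for name, values in ATTRIBUTES:
        allowed = rule[name]
        genes.append("".join("1" if v in allowed else "0" for v in values))
    return "".join(genes)


def decode(chromosome):
    """Inverse of encode: read each gene back into a set of allowed values."""
    rule, pos = {}, 0
    for name, values in ATTRIBUTES:
        gene = chromosome[pos:pos + len(values)]
        rule[name] = {v for v, bit in zip(values, gene) if bit == "1"}
        pos += len(values)
    return rule


# If Outlook is Overcast or Rain and Wind is Strong, then PlayTennis = Yes
rule = {"Outlook": {"Overcast", "Rain"},
        "Wind": {"Strong"},
        "PlayTennis": {"Yes"}}
chromosome = encode(rule)
print(chromosome)  # 1101010
```

Under this assumed scheme each attribute is a gene of fixed width, so crossover and mutation on the bit string always yield a string that decodes to some rule.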

Genetic Representation (Cont.)
Example 2 (in a clustering problem)
–Each chromosome represents a set of clusters; each gene represents an object; each allele value represents a cluster. Genes with the same allele value are in the same cluster.
[Figure: a chromosome whose genes correspond to the objects A, B, C, D, E, F, G]
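A minimal sketch of this encoding follows. The objects A to G come from the slide; the allele values in `chromosome` are made up for illustration because the figure's values are not in the transcript.

```python
from collections import defaultdict

objects = ["A", "B", "C", "D", "E", "F", "G"]
chromosome = [1, 2, 1, 3, 2, 1, 3]  # hypothetical allele value per gene


def clusters_of(objects, chromosome):
    """Group objects whose genes carry the same allele value."""
    groups = defaultdict(list)
    for obj, allele in zip(objects, chromosome):
        groups[allele].append(obj)
    return dict(groups)


print(clusters_of(objects, chromosome))
# {1: ['A', 'C', 'F'], 2: ['B', 'E'], 3: ['D', 'G']}
```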

Crossover
Exchange features of two individuals to produce two offspring (children).
Selected mates may have good properties that help them survive into the next generations.
So we can expect that exchanging features may produce other good individuals.

Crossover (cont.)
Single-point Crossover
Two-point Crossover
Uniform Crossover
[Figure: examples of each crossover type, with a crossover template]
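The three variants differ only in which positions are swapped between the parents. The sketch below shows one common way to write them for list-based chromosomes; the function names are hypothetical, and the random 0/1 `template` in `uniform` is an assumption about how the slide's crossover template is used.

```python
import random


def single_point(a, b, rng):
    """Swap the tails of the parents after one random cut point."""
    p = rng.randrange(1, len(a))
    return a[:p] + b[p:], b[:p] + a[p:]


def two_point(a, b, rng):
    """Swap the segment between two random cut points (needs length >= 3)."""
    p, q = sorted(rng.sample(range(1, len(a)), 2))
    return a[:p] + b[p:q] + a[q:], b[:p] + a[p:q] + b[q:]


def uniform(a, b, rng):
    """Choose each gene from one parent or the other using a random template."""
    template = [rng.randint(0, 1) for _ in a]
    c = [x if t else y for x, y, t in zip(a, b, template)]
    d = [y if t else x for x, y, t in zip(a, b, template)]
    return c, d


rng = random.Random(0)
zeros, ones = [0] * 8, [1] * 8
for op in (single_point, two_point, uniform):
    print(op.__name__, op(zeros, ones, rng))
```

Using an all-zeros and an all-ones parent makes it easy to see in the printed children which positions came from which parent.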

Mutation
Usually changes a single bit in a bit string.
This operator should be applied with very low probability.
[Figure: a bit string with a randomly chosen mutation point]
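A minimal sketch of single-bit mutation, assuming a chromosome stored as a list of 0/1 values. The name `mutate` and the parameter `rate` are hypothetical.

```python
import random


def mutate(chromosome, rate, rng):
    """With probability `rate`, flip one randomly chosen bit."""
    child = list(chromosome)               # leave the parent untouched
    if rng.random() < rate:
        point = rng.randrange(len(child))  # mutation point (random)
        child[point] = 1 - child[point]
    return child


rng = random.Random(0)
print(mutate([0, 1, 0, 1, 1, 0], rate=0.01, rng=rng))
```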

Typical Procedures
Crossover mates are probabilistically selected based on their fitness value.
The crossover point is randomly selected.
The mutation point is randomly selected.
Individuals are probabilistically selected.
[Figure: an old generation transformed into a new generation by these steps]
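Put together, one generation step might look like the sketch below. It assumes fitness-proportionate (roulette-wheel) selection, single-point crossover, and single-bit mutation. The slides do not name a specific selection scheme, so that choice, the toy fitness function, and all names here are assumptions.

```python
import random


def fitness(chromosome):
    """Toy objective: number of 1 bits, plus 1 so every weight is positive."""
    return 1 + sum(chromosome)


def select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]


def next_generation(population, fitness, rng, mutation_rate=0.05):
    """Build a new generation of the same size from the old one."""
    new_population = []
    while len(new_population) < len(population):
        mother = select(population, fitness, rng)
        father = select(population, fitness, rng)
        point = rng.randrange(1, len(mother))      # crossover point (random)
        for child in (mother[:point] + father[point:],
                      father[:point] + mother[point:]):
            if rng.random() < mutation_rate:
                i = rng.randrange(len(child))      # mutation point (random)
                child[i] = 1 - child[i]
            new_population.append(child)
    return new_population[:len(population)]


rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
for _ in range(40):
    population = next_generation(population, fitness, rng)
print(max(fitness(c) for c in population))
```

The children are fresh lists built by slicing, so mutating them in place does not alter the parents that remain in the old generation.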

How to apply a GA to a clustering problem
Preparing the chromosomes
Defining genetic operators
–Fusion: takes two unique allele values and combines them into a single allele value, combining two clusters into one.
–Fission: takes a single allele value and gives it a different random allele value, breaking a cluster apart.
Defining fitness functions
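The two clustering-specific operators can be sketched on the "one allele value per object" chromosome. The slide's wording for fission is open to more than one reading. The version below takes it to mean that a single randomly chosen gene receives a different allele value, possibly a brand-new one, which is an assumption. Allele values are assumed to be positive integers, and the function names are hypothetical.

```python
import random


def fusion(chromosome, rng):
    """Merge two distinct clusters by relabeling one allele value as another."""
    labels = sorted(set(chromosome))
    if len(labels) < 2:
        return list(chromosome)            # nothing to merge
    keep, drop = rng.sample(labels, 2)
    return [keep if allele == drop else allele for allele in chromosome]


def fission(chromosome, rng):
    """Give one randomly chosen gene a different allele value."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    # Candidates: every existing label range value plus one new label.
    candidates = [v for v in range(1, max(child) + 2) if v != child[i]]
    child[i] = rng.choice(candidates)
    return child


rng = random.Random(0)
parent = [1, 2, 1, 3, 2, 1, 3]
print(fusion(parent, rng))
print(fission(parent, rng))
```

Fusion lowers the number of clusters by one, while fission can raise it, so together they let the search vary the cluster count instead of fixing it in advance.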

Example: (Cont.)
Select the chromosomes according to the fitness function.
[Figure: an old generation transformed into a new generation by crossover, mutation, fusion, and fission]

Finally…