CIS 530 / 730: Artificial Intelligence
Computing & Information Sciences, Kansas State University
Lecture 34 of 42: Genetic and Evolutionary Computation; Discussion: GA, GP
Wednesday, 19 November 2008
William H. Hsu, Department of Computing and Information Sciences, KSU
KSOL course page: Course web site: Instructor home page:
Reading for next class: Sections 22.1, , Russell & Norvig, 2nd edition

Learning Hidden Layer Representations
Hidden Units and Feature Extraction
- Training procedure: find hidden unit representations that minimize error E
- Sometimes backprop will define new hidden features that are not explicit in the input representation x, but which capture properties of the input instances that are most relevant to learning the target function t(x)
- Hidden units express newly constructed features
- Change of representation: to a linearly separable D'
A Target Function (Sparse, aka 1-of-C, Coding)
- Can this be learned? (Why or why not?) See the sketch below.
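The sparse 1-of-C target here appears to be the classic 8 x 3 x 8 identity-network example (Mitchell, Chapter 4): eight one-hot inputs mapped to themselves through three hidden units, so backprop must invent a compact hidden code. The following is a minimal sketch, not the lecture's own code; the learning rate and epoch count are illustrative assumptions.

    # Sketch of the 8-3-8 "identity function" network: eight 1-of-8 vectors are
    # mapped to themselves through 3 sigmoid hidden units, so backprop must
    # construct a compact hidden encoding. Hyperparameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.eye(8)                         # eight 1-of-8 (sparse) training vectors
    T = X.copy()                          # target = input (identity function)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # small random initial weights, bias folded in as an extra column
    W1 = rng.uniform(-0.1, 0.1, (3, 9))   # input (+bias) -> hidden
    W2 = rng.uniform(-0.1, 0.1, (8, 4))   # hidden (+bias) -> output
    eta = 0.3

    for epoch in range(20000):
        Xb = np.hstack([X, np.ones((8, 1))])        # add bias input
        H = sigmoid(Xb @ W1.T)                      # hidden activations, shape (8, 3)
        Hb = np.hstack([H, np.ones((8, 1))])
        O = sigmoid(Hb @ W2.T)                      # outputs, shape (8, 8)

        # backpropagation of squared error
        delta_o = (T - O) * O * (1 - O)             # output-unit deltas
        delta_h = (delta_o @ W2[:, :3]) * H * (1 - H)
        W2 += eta * delta_o.T @ Hb
        W1 += eta * delta_h.T @ Xb

    print(np.round(H, 2))   # learned hidden encodings, often close to a distinct 3-bit code per input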

Training: Evolution of Error and Hidden Unit Encoding
[Plots: output error error_D(o_k) and the three hidden-unit values h_j(x), 1 <= j <= 3, versus training epochs]

Training: Weight Evolution
Input-to-Hidden Unit Weights and Feature Extraction
- Changes in first-layer weight values correspond to changes in the hidden-layer encoding and, consequently, in the output squared errors
- w_0 (the bias weight, analogue of the threshold in an LTU) converges to a value near 0
- Several changes occur in the first 1000 epochs (different encodings)
[Plot: input-to-hidden weights u_i1, 1 <= i <= 8, versus training epochs]

Convergence of Backpropagation
No Guarantee of Convergence to a Global Optimum Solution
- Compare: perceptron convergence (to the best h ∈ H, provided the target is in H, i.e., the data are linearly separable)
- Gradient descent reaches some local error minimum (perhaps not the global minimum)
- Possible improvements on backprop (BP); see the momentum sketch below
  - Momentum term (BP variant with a slightly different weight-update rule)
  - Stochastic gradient descent (BP algorithm variant)
  - Train multiple nets with different initial weights; find a good mixture
- Improvements on feedforward networks
  - Bayesian learning for ANNs (e.g., simulated annealing), covered later
  - Other global optimization methods that integrate over multiple networks
Nature of Convergence
- Initialize weights near zero, so the initial network is near-linear
- Increasingly non-linear functions become possible as training progresses
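As a concrete illustration of the momentum variant mentioned above, here is a minimal sketch of the modified weight-update rule. The parameter names eta (learning rate) and alpha (momentum) are conventional, not taken from the slide.

    # Momentum variant of the backprop weight update: the step blends the current
    # gradient with the previous step, which can carry the search across small
    # local minima and flat regions of the error surface.
    def update_with_momentum(w, grad, prev_step, eta=0.05, alpha=0.9):
        """w: weight array; grad: dE/dw; prev_step: the Delta_w applied last epoch."""
        step = -eta * grad + alpha * prev_step   # Delta_w(t) = -eta*grad + alpha*Delta_w(t-1)
        return w + step, step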

Overtraining in ANNs
Recall: Definition of Overfitting
- h' is worse than h on D_train but better on D_test
Overtraining: A Type of Overfitting
- Due to excessive training iterations
- Avoidance: stopping criterion (cross-validation: holdout or k-fold)
- Avoidance: weight decay
[Plots: error versus epochs, Examples 1 and 2]

Overfitting in ANNs
Other Causes of Overfitting Possible
- Number of hidden units is sometimes set in advance
- Too few hidden units ("underfitting")
  - ANNs with no growth
  - Analogy: underdetermined linear system of equations (more unknowns than equations)
- Too many hidden units
  - ANNs with no pruning
  - Analogy: fitting a quadratic polynomial with an approximator of degree >> 2
Solution Approaches (see the sketch below)
- Prevention: attribute subset selection (using a pre-filter or wrapper)
- Avoidance
  - Hold out a cross-validation (CV) set or split k ways (when to stop?)
  - Weight decay: decrease each weight by some factor on each epoch
- Detection/recovery: random restarts, addition and deletion of weights and units
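A minimal sketch of the two avoidance strategies named above, holdout-based early stopping and per-epoch weight decay. The training and validation routines are passed in as caller-supplied callables, and all parameter values are illustrative assumptions, not values from the lecture.

    # Early stopping on a holdout set, combined with per-epoch weight decay.
    # train_epoch(weights) -> updated weights after one backprop pass (caller-supplied)
    # holdout_error(weights) -> error on the held-out validation set (caller-supplied)
    def train_with_early_stopping(weights, train_epoch, holdout_error,
                                  decay=1e-4, patience=10, max_epochs=1000):
        best = (float("inf"), weights)
        since_best = 0
        for epoch in range(max_epochs):
            weights = train_epoch(weights)
            weights = [w * (1.0 - decay) for w in weights]   # weight decay: shrink every weight slightly
            err = holdout_error(weights)                     # cross-validation estimate of true error
            if err < best[0]:
                best, since_best = (err, weights), 0
            else:
                since_best += 1
                if since_best >= patience:                   # stop: holdout error no longer improving
                    break
        return best[1]                                       # weights with the lowest holdout error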

Example: Neural Nets for Face Recognition
- 30 x 32 inputs; outputs: Left, Straight, Right, Up
- 90% accurate learning of head pose; recognizing 1 of 20 faces
[Figures: hidden-layer weights after 1 epoch and after 25 epochs; output-layer weights (including w_0 = theta) after 1 epoch]

Example: NetTalk (Sejnowski and Rosenberg, 1987)
Early Large-Scale Application of Backprop
- Learning to convert text to speech
  - Acquired model: a mapping from letters to phonemes and stress marks
  - Output passed to a speech synthesizer
- Good performance after training on a vocabulary of ~1000 words
Very Sophisticated Input-Output Encoding
- Input: 7-letter window; determines the phoneme for the center letter, with context on each side; distributed (i.e., sparse) representation: 200 bits
- Output: units for articulatory modifiers (e.g., "voiced"), stress, closest phoneme; distributed representation
- 40 hidden units; weights total
Experimental Results
- Vocabulary: trained on 1024 of 1463 words (informal) and 1000 of (dictionary)
- 78% accuracy on the informal vocabulary, ~60% on the dictionary

NeuroSolutions Demo

PAC Learning: Definition and Rationale
Intuition
- Can't expect a learner to learn a concept exactly
  - Multiple consistent concepts
  - Unseen examples could have any label ("OK" to mislabel if "rare")
- Can't always approximate c closely (there is some probability that D is not representative)
Terms Considered
- Class C of possible concepts, learner L, hypothesis space H
- Instances X, each described by n attributes
- Error parameter ε, confidence parameter δ, true error error_D(h)
- size(c) = the encoding length of c, assuming some representation
Definition (restated formally below)
- C is PAC-learnable by L using H if, for all c ∈ C, all distributions D over X, all ε such that 0 < ε < 1/2, and all δ such that 0 < δ < 1/2, learner L will, with probability at least (1 - δ), output a hypothesis h ∈ H such that error_D(h) ≤ ε
- Efficiently PAC-learnable: L runs in time polynomial in 1/ε, 1/δ, n, and size(c)
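The definition above, restated in the usual notation (ε is the error parameter, δ the confidence parameter):

    % PAC-learnability: C is PAC-learnable by L using H iff, for every target
    % c in C, every distribution D over X, and every 0 < epsilon, delta < 1/2,
    % L outputs (with probability at least 1 - delta) some h in H whose true
    % error under D is at most epsilon.
    \forall c \in C,\ \forall D \text{ over } X,\ \forall\, 0 < \epsilon, \delta < \tfrac{1}{2}:
    \qquad \Pr\!\left[\, \mathrm{error}_D(h) \le \epsilon \,\right] \;\ge\; 1 - \delta
    % Efficient PAC-learnability additionally requires running time polynomial
    % in 1/epsilon, 1/delta, n, and size(c).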

PAC Learning: Results for Two Hypothesis Languages
Unbiased Learner (see the numeric check below)
- Recall: sample complexity bound m ≥ 1/ε (ln |H| + ln (1/δ))
- Sample complexity is not always polynomial
- Example: for the unbiased learner, |H| = 2^|X|
  - Suppose X consists of n booleans (binary-valued attributes): |X| = 2^n, |H| = 2^(2^n)
  - m ≥ 1/ε (2^n ln 2 + ln (1/δ))
  - Sample complexity for this H is exponential in n
Monotone Conjunctions
- Target function of the form x_1 ∧ x_2 ∧ … ∧ x_m
- Active learning protocol (learner gives query instances): n examples needed
- Passive learning with a helpful teacher: k examples (k literals in the true concept)
- Passive learning with randomly selected examples (proof to follow): m ≥ 1/ε (ln |H| + ln (1/δ)) = 1/ε (n ln 2 + ln (1/δ))
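Plugging the slide's two hypothesis spaces into the finite-|H| bound makes the contrast concrete. The parameter choices ε = δ = 0.1 and n = 10 below are illustrative assumptions, not values from the slide.

    # Evaluating the bound m >= (1/eps) * (ln|H| + ln(1/delta)) for the
    # unbiased learner versus monotone conjunctions over n boolean attributes.
    from math import log, ceil

    def sample_bound(ln_H, eps, delta):
        return ceil((1.0 / eps) * (ln_H + log(1.0 / delta)))

    n, eps, delta = 10, 0.1, 0.1
    ln_H_unbiased = (2 ** n) * log(2)   # |H| = 2^(2^n): every boolean function on n bits
    ln_H_monotone = n * log(2)          # |H| = 2^n: each variable is either in or out of the conjunction

    print(sample_bound(ln_H_unbiased, eps, delta))   # 7121 examples, even for n = 10
    print(sample_bound(ln_H_monotone, eps, delta))   # 93 examples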

PAC Learning: Monotone Conjunctions [1]
Monotone Conjunctive Concepts
- Suppose c ∈ C (and h ∈ H) is of the form x_1 ∧ x_2 ∧ … ∧ x_m
- n possible variables: each is either omitted or included (i.e., positive literals only)
Errors of Omission (False Negatives)
- Claim: the only possible errors are false negatives (h(x) = -, c(x) = +)
- Mistake iff (z ∈ h) ∧ (z ∉ c) ∧ (∃ x ∈ D_test . x(z) = false): then h(x) = -, c(x) = +
Probability of False Negatives
- Let z be a literal; let Pr(Z) be the probability that z is false in a positive x drawn from D
- If z is in the target concept (the correct conjunction c = x_1 ∧ x_2 ∧ … ∧ x_m), then Pr(Z) = 0
- Pr(Z) is the probability that a randomly chosen positive example has z = false (inducing a potential mistake, or deleting z from h if training is still in progress)
- error(h) ≤ Σ_{z ∈ h} Pr(Z)
[Figure: instance space X with concept c and hypothesis h]

PAC Learning: Monotone Conjunctions [2]
Bad Literals
- Call a literal z bad if Pr(Z) > ε = ε'/n
- A bad z does not belong in h, and is likely to be dropped (by appearing with value true in a positive x ∈ D), but has not yet appeared in such an example
Case of No Bad Literals
- Lemma: if there are no bad literals, then error(h) ≤ ε'
- Proof: error(h) ≤ Σ_{z ∈ h} Pr(Z) ≤ Σ_{z ∈ h} ε'/n ≤ ε' (worst case: all n literals z are in h)
Case of Some Bad Literals
- Let z be a bad literal
- Survival probability (probability that it will not be eliminated by a given example): 1 - Pr(Z) < 1 - ε'/n
- Survival probability over m examples: (1 - Pr(Z))^m < (1 - ε'/n)^m
- Worst-case survival probability over m examples (n bad literals): n (1 - ε'/n)^m
- Intuition: more chance of a mistake means a greater chance to learn

PAC Learning: Monotone Conjunctions [3]
Goal: An Upper Bound on the Worst-Case Survival Probability
- Choose m large enough that the probability of a bad literal z surviving across m examples is less than δ
- Pr(z survives m examples) ≤ n (1 - ε'/n)^m < δ
- Solve for m using the inequality 1 - x < e^(-x): n e^(-m ε'/n) < δ
- m > n/ε' (ln n + ln (1/δ)) examples are needed to guarantee the bounds
- This completes the proof of the PAC result for monotone conjunctions
- Nota bene: this is a specialization of m ≥ 1/ε (ln |H| + ln (1/δ)), with n/ε' = 1/ε
Practical Ramifications (checked numerically below)
- Suppose δ = 0.1, ε' = 0.1, n = 100: we need 6907 examples
- Suppose δ = 0.1, ε' = 0.1, n = 10: we need only 460 examples
- Suppose δ = 0.01, ε' = 0.1, n = 10: we need only 690 examples
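The practical figures above follow directly from the bound; a quick numeric check reproduces the slide's numbers (which are rounded down):

    # Reproducing the "practical ramifications" numbers from
    #   m > (n / eps') * (ln n + ln(1/delta)).
    from math import log

    def m_bound(n, eps_prime, delta):
        return (n / eps_prime) * (log(n) + log(1.0 / delta))

    print(m_bound(100, 0.1, 0.1))   # about 6907.8  -> "6907 examples" on the slide
    print(m_bound(10,  0.1, 0.1))   # about 460.5   -> "460 examples"
    print(m_bound(10,  0.1, 0.01))  # about 690.8   -> "690 examples"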

PAC Learning: k-CNF, k-Clause-CNF, k-DNF, k-Term-DNF
k-CNF (Conjunctive Normal Form) Concepts: Efficiently PAC-Learnable
- Conjunctions of any number of disjunctive clauses, each with at most k literals
- c = C_1 ∧ C_2 ∧ … ∧ C_m; C_i = l_1 ∨ l_2 ∨ … ∨ l_k; ln |k-CNF| = ln 2^((2n)^k) = Θ(n^k)
- Algorithm: reduce to learning monotone conjunctions over the n^k pseudo-literals C_i
k-Clause-CNF
- c = C_1 ∧ C_2 ∧ … ∧ C_k; C_i = l_1 ∨ l_2 ∨ … ∨ l_m; ln |k-Clause-CNF| = ln 3^(kn) = Θ(kn)
- Efficiently PAC-learnable? See below (k-Clause-CNF and k-Term-DNF are duals)
k-DNF (Disjunctive Normal Form)
- Disjunctions of any number of conjunctive terms, each with at most k literals
- c = T_1 ∨ T_2 ∨ … ∨ T_m; T_i = l_1 ∧ l_2 ∧ … ∧ l_k
k-Term-DNF: "Not" Efficiently PAC-Learnable (Kind Of, Sort Of…)
- c = T_1 ∨ T_2 ∨ … ∨ T_k; T_i = l_1 ∧ l_2 ∧ … ∧ l_m; ln |k-Term-DNF| = ln (k · 3^n) = Θ(n + ln k)
- Polynomial sample complexity, but not polynomial computational complexity (unless RP = NP)
- Solution: don't use H = C!  k-Term-DNF ⊆ k-CNF (so let H = k-CNF)

Consistent Learners
General Scheme for Learning
- Follows immediately from the definition of a consistent hypothesis
- Given: a sample D of m examples
- Find: some h ∈ H that is consistent with all m examples
- PAC: show that if m is large enough, a consistent hypothesis must be close enough to c
- Efficient PAC (and other COLT formalisms): show that the consistent hypothesis can be computed efficiently
Monotone Conjunctions
- Used an Elimination algorithm (compare: Find-S) to find a hypothesis h that is consistent with the training set (easy to compute; see the sketch below)
- Showed that with sufficiently many examples (polynomial in the parameters), h is close to c
- Sample complexity gives an assurance of "convergence to criterion" for specified m, and a necessary condition (polynomial in n) for tractability
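A minimal sketch of the Elimination algorithm referred to above for monotone conjunctions: keep exactly the variables that are true in every positive example; negative examples are never used to change h. The data representation (dicts of variable names) and the tiny data set are illustrative assumptions.

    # Elimination algorithm for monotone conjunctions: keep only the variables
    # that are true in every positive example. Because negatives never shrink h,
    # the only possible errors of the result are false negatives.
    def learn_monotone_conjunction(examples):
        """examples: iterable of (x, label) where x is a dict var -> bool."""
        h = None
        for x, label in examples:
            if not label:
                continue                                   # negatives are ignored
            if h is None:
                h = {v for v, val in x.items() if val}     # variables true in the first positive
            else:
                h = {v for v in h if x[v]}                 # drop variables falsified by this positive
        return h                                           # hypothesis: conjunction of surviving variables

    # Tiny usage example (hypothetical data): target concept is x1 AND x3
    data = [({"x1": True,  "x2": True,  "x3": True},  True),
            ({"x1": True,  "x2": False, "x3": True},  True),
            ({"x1": False, "x2": True,  "x3": True},  False)]
    print(learn_monotone_conjunction(data))                # {'x1', 'x3'}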

VC Dimension: Framework
Infinite Hypothesis Space?
- Preceding analyses were restricted to finite hypothesis spaces
- Some infinite hypothesis spaces are more expressive than others, e.g.:
  - rectangles vs. 17-sided convex polygons vs. general convex polygons
  - a linear threshold (LT) function vs. a conjunction of LT units
- Need a measure of the expressiveness of an infinite H other than its size
Vapnik-Chervonenkis Dimension: VC(H)
- Provides such a measure
- Analogous to |H|: there are bounds for sample complexity using VC(H)

VC Dimension: Shattering a Set of Instances
Dichotomies
- Recall: a partition of a set S is a collection of disjoint sets S_i whose union is S
- Definition: a dichotomy of a set S is a partition of S into two subsets S_1 and S_2
Shattering
- A set of instances S is shattered by hypothesis space H if and only if, for every dichotomy of S, there exists a hypothesis in H consistent with this dichotomy
- Intuition: a rich set of functions shatters a larger instance space
The "Shattering Game" (An Adversarial Interpretation)
- Your client selects an S (an instance space X)
- You select an H
- Your adversary labels S (i.e., chooses a point c from concept space C = 2^X)
- You must then find some h ∈ H that "covers" (is consistent with) c
- If you can do this for any c your adversary comes up with, H shatters S

VC Dimension: Examples of Shattered Sets
Three Instances Shattered (see figure)
Intervals
- Left-bounded intervals on the real axis, [0, a) for a ∈ R, a ≥ 0:
  - Sets of 2 points cannot be shattered
  - Given 2 points, the adversary can label them so that no hypothesis is consistent
- Intervals on the real axis, [a, b] with a, b ∈ R and b > a: can shatter 1 or 2 points, not 3 (see the check below)
- Half-spaces in the plane (points non-collinear): 1? 2? 3? 4?
[Figures: three instances shattered in instance space X; intervals [0, a) and [a, b] on the real line]
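The interval claims above can be checked by brute force: enumerate every dichotomy of a point set and test whether some closed interval [a, b] realizes it. This is an illustrative sketch, not part of the lecture.

    # Brute-force shattering check for closed intervals [a, b] on the real line.
    # If any interval is consistent with a labeling, then so is the tightest
    # interval that covers the positive points, so it suffices to test that one.
    from itertools import product

    def interval_shatters(points):
        for labels in product([False, True], repeat=len(points)):   # every dichotomy
            pos = [p for p, lab in zip(points, labels) if lab]
            if not pos:
                ok = True                                # an empty interval handles the all-negative case
            else:
                a, b = min(pos), max(pos)                # tightest interval covering the positives
                ok = all((a <= p <= b) == lab for p, lab in zip(points, labels))
            if not ok:
                return False
        return True

    print(interval_shatters([1.0, 2.0]))        # True: 2 points can be shattered
    print(interval_shatters([1.0, 2.0, 3.0]))   # False: the middle point cannot be the only negative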

Lecture Outline
Readings for Friday
- Finish Chapter 20, Russell and Norvig, 2nd edition
- Suggested: Chapter 1, , Goldberg; Sections 9.1 - 9.4, Mitchell
Evolutionary Computation
- Biological motivation: the process of natural selection
- Framework for search, optimization, and learning
Prototypical (Simple) Genetic Algorithm
- Components: selection, crossover, mutation
- Representing hypotheses as individuals in GAs
An Example: GA-Based Inductive Learning (GABIL)
GA Building Blocks (aka Schemas)
Taking Stock (Course Review)

Simple Genetic Algorithm (SGA)
Algorithm Simple-Genetic-Algorithm (Fitness, Fitness-Threshold, p, r, m)
  // p: population size; r: replacement rate (aka generation gap width); m: mutation rate
  P <- p random hypotheses                         // initialize population
  FOR each h in P DO f[h] <- Fitness(h)            // evaluate Fitness: hypothesis -> R
  WHILE (Max(f) < Fitness-Threshold) DO
    1. Select: probabilistically select (1 - r)p members of P to add to P_S
    2. Crossover:
       Probabilistically select (r · p)/2 pairs of hypotheses from P
       FOR each pair <h1, h2> DO P_S += Crossover(<h1, h2>)   // P_S[t+1] = P_S[t] + offspring
    3. Mutate: invert a randomly selected bit in m · p random members of P_S
    4. Update: P <- P_S
    5. Evaluate: FOR each h in P DO f[h] <- Fitness(h)
  RETURN the hypothesis h in P that has maximum fitness f[h]
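A minimal executable sketch of the SGA above for fixed-length bit strings, with fitness-proportionate (roulette-wheel) selection and single-point crossover. The default values of p, r, and m are illustrative, and the fitness function used in the last line (counting 1 bits) is a stand-in, not anything from the lecture.

    # Executable sketch of the Simple Genetic Algorithm for bit-string hypotheses.
    import random

    def sga(fitness, fitness_threshold, p=100, r=0.6, m=0.001, length=20, max_gens=500):
        pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(p)]
        for _ in range(max_gens):
            f = [fitness(h) for h in pop]                    # evaluate each hypothesis
            if max(f) >= fitness_threshold:
                break
            def select():                                    # roulette-wheel selection (returns a copy)
                return list(pop[random.choices(range(p), weights=f, k=1)[0]])
            survivors = [select() for _ in range(int((1 - r) * p))]
            children = []
            while len(survivors) + len(children) < p:        # crossover fills the remaining slots
                a, b = select(), select()
                point = random.randrange(1, length)          # single-point crossover
                children.append(a[:point] + b[point:])
                children.append(b[:point] + a[point:])
            pop = (survivors + children)[:p]
            for h in random.sample(pop, max(1, int(m * p))): # mutate m*p randomly chosen members
                i = random.randrange(length)
                h[i] ^= 1                                    # flip one randomly selected bit
        f = [fitness(h) for h in pop]
        return pop[f.index(max(f))]                          # best hypothesis found

    # Usage with a stand-in fitness function: maximize the number of 1 bits.
    print(sga(fitness=sum, fitness_threshold=20, length=20))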

GA-Based Inductive Learning (GABIL)
GABIL System [DeJong et al., 1993]
- Given: a concept learning problem and examples
- Learn: a disjunctive set of propositional rules
- Goal: results competitive with those of current decision tree learning algorithms (e.g., C4.5)
Fitness Function: Fitness(h) = (Correct(h))^2
Representation (see the encoding sketch below)
- Rules: IF a_1 = T ∧ a_2 = F THEN c = T; IF a_2 = T THEN c = F
- Bit string encoding: a_1 [10] . a_2 [01] . c [1] . a_1 [11] . a_2 [10] . c [0] = 1001111100
Genetic Operators
- Want variable-length rule sets
- Want only well-formed bit string hypotheses
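A small sketch of the GABIL encoding just described: each boolean attribute gets a two-bit membership substring (one bit per value, 1 meaning the value is allowed) and the class gets one bit, so each rule is 5 bits and a rule set is their concatenation. The helper name and data layout are hypothetical; the two encoded rules are the ones on the slide.

    # GABIL-style rule encoding for two boolean attributes and a boolean class.
    ATTRS = ["a1", "a2"]

    def encode_rule(constraints, c):
        """constraints: dict attr -> True/False/None (None = attribute unconstrained)."""
        bits = ""
        for a in ATTRS:
            v = constraints.get(a)
            bits += {True: "10", False: "01", None: "11"}[v]   # which attribute values satisfy the rule
        return bits + ("1" if c else "0")                      # class bit

    # The two rules from the slide:
    #   IF a1 = T AND a2 = F THEN c = T   ->  10 01 1
    #   IF a2 = T            THEN c = F   ->  11 10 0
    ruleset = encode_rule({"a1": True, "a2": False}, True) + \
              encode_rule({"a2": True}, False)
    print(ruleset)   # 1001111100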

Crossover: Variable-Length Bit Strings
Basic Representation
- Start with two rule-set strings h_1 and h_2 over the fields a_1 a_2 c a_1 a_2 c:
    h_1 = 1[ ]00
    h_2 = 0[1 1]
- Idea: allow crossover to produce variable-length offspring
Procedure
- 1. Choose crossover points for h_1, e.g., after bits 1 and 8
- 2. Now restrict the crossover points in h_2 to those that produce bit strings with well-defined semantics
Example
- Suppose we choose one such legal pair of crossover points for h_2
- Result: two well-formed, variable-length offspring (one with fewer rules, one with more)

GABIL Extensions
New Genetic Operators
- Applied probabilistically
- 1. AddAlternative: generalize the constraint on a_i by changing a 0 to a 1
- 2. DropCondition: generalize the constraint on a_i by changing every 0 to a 1
New Field
- Add fields to the bit string to decide whether to allow the above operators: a_1 a_2 c a_1 a_2 c AA DC
- So now the learning strategy also evolves! (aka a genetic wrapper)

GABIL Results
Classification Accuracy
- Compared to symbolic rule/tree learning methods: C4.5 [Quinlan, 1993], ID5R, AQ14 [Michalski, 1986]
- Performance of GABIL is comparable
  - Average performance on a set of 12 synthetic problems: 92.1% test accuracy
  - Symbolic learning methods ranged from 91.2% to 96.6%
Effect of Generalization Operators
- The result above is for GABIL without AA and DC
- Average test-set accuracy on the 12 synthetic problems with AA and DC: 95.2%

Building Blocks (Schemas)
Problem
- How to characterize the evolution of the population in a GA?
Goal
- Identify the basic building block of GAs
- Describe families of individuals
Definition: Schema
- A string containing 0, 1, * ("don't care")
- Typical schema: 10**0*
- Instances of the above schema: any string matching the fixed bits, e.g., 101101, 100100, …
Solution Approach
- Characterize the population by the number of instances representing each schema
- m(s, t) = number of instances of schema s in the population at time t

Selection and Building Blocks
Restricted Case: Selection Only
- f̄(t) = average fitness of the population at time t
- m(s, t) = number of instances of schema s in the population at time t
- û(s, t) = average fitness of the instances of schema s at time t
Quantities of Interest (written out below)
- Probability of selecting h in one selection step
- Probability of selecting an instance of s in one selection step
- Expected number of instances of s after n selections
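Written out in the standard form (as in Mitchell, Chapter 9), the three quantities of interest under selection alone are:

    % Selection-only quantities. \bar{f}(t): average population fitness;
    % \hat{u}(s,t): average fitness of instances of schema s; n: population size;
    % p_t: the population at time t.
    \Pr[\,h \text{ selected}\,] = \frac{f(h)}{\sum_{i=1}^{n} f(h_i)} = \frac{f(h)}{n\,\bar{f}(t)}
    \qquad
    \Pr[\,\text{an instance of } s \text{ selected}\,]
      = \sum_{h \in s \cap p_t} \frac{f(h)}{n\,\bar{f}(t)}
      = \frac{\hat{u}(s,t)}{n\,\bar{f}(t)}\, m(s,t)
    \qquad
    \mathbb{E}[\,m(s,t+1)\,] = \frac{\hat{u}(s,t)}{\bar{f}(t)}\, m(s,t)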

Schema Theorem
Theorem (stated below)
- m(s, t) = number of instances of schema s in the population at time t
- f̄(t) = average fitness of the population at time t
- û(s, t) = average fitness of the instances of schema s at time t
- p_c = probability of the single-point crossover operator
- p_m = probability of the mutation operator
- l = length of the individual bit strings
- o(s) = number of defined (non-"*") bits in s
- d(s) = distance between the rightmost and leftmost defined bits in s
Intuitive Meaning
- "The expected number of instances of a schema in the population tends toward its relative fitness"
- A fundamental theorem of GA analysis and design
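The theorem itself, in its standard single-point-crossover form, using the quantities defined above:

    % Schema theorem: selection grows fit schemas, while crossover and mutation
    % disrupt them in proportion to defining length d(s) and order o(s).
    \mathbb{E}[\,m(s,\,t+1)\,] \;\ge\;
    \frac{\hat{u}(s,t)}{\bar{f}(t)}\, m(s,t)
    \left(1 - p_c\,\frac{d(s)}{l-1}\right)
    \left(1 - p_m\right)^{o(s)}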

Genetic Programming
Readings / Viewings
- View GP videos 1-3
  - GP1: Genetic Programming: The Video
  - GP2: Genetic Programming: The Next Generation
  - GP3: Genetic Programming: Invention
  - GP4: Genetic Programming: Human-Competitive
- Suggested: Chapters 1-5, Koza
Previously
- Genetic and evolutionary computation (GEC)
- Generational vs. steady-state GAs; relation to simulated annealing, MCMC
- Schema theory and GA engineering overview
Today: GP Discussions
- Code bloat and potential mitigants: types, OOP, parsimony, optimization, reuse
- Genetic programming vs. human programming: similarities, differences

GP Flow Graph
[Figure adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez]

Structural Crossover
[Figure adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez]

Structural Mutation
[Figure adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez]

Terminology
Evolutionary Computation (EC): Models Based on Natural Selection
Genetic Algorithm (GA) Concepts
- Individual: single entity of the model (corresponds to a hypothesis)
- Population: collection of entities in competition for survival
- Generation: single application of the selection and crossover operations
- Schema, aka building block: descriptor of a GA population (e.g., 10**0*)
- Schema theorem: representation of a schema is proportional to its relative fitness
Simple Genetic Algorithm (SGA) Steps
- Selection
  - Proportionate (aka roulette wheel): P(individual) ∝ f(individual)
  - Tournament: let individuals compete in pairs or tuples; eliminate the unfit ones
- Crossover
  - Single-point: one crossover position; the parents exchange tails
  - Two-point: two crossover positions; the parents exchange the middle segments
  - Uniform: each bit is taken from either parent with equal probability
- Mutation: single-point ("bit flip"), multi-point

Summary Points
Evolutionary Computation
- Motivation: the process of natural selection
  - Limited population; individuals compete for membership
  - Method for parallelizing and stochastic search
- Framework for problem solving: search, optimization, learning
Prototypical (Simple) Genetic Algorithm (GA)
- Steps
  - Selection: reproduce individuals probabilistically, in proportion to fitness
  - Crossover: generate new individuals probabilistically, from pairs of "parents"
  - Mutation: modify the structure of an individual randomly
- How to represent hypotheses as individuals in GAs
An Example: GA-Based Inductive Learning (GABIL)
Schema Theorem: Propagation of Building Blocks
Next Lecture: Genetic Programming, The Movie