1 Genetic Programming

2 Outline
– What is Genetic Programming?
– GP for Symbolic Regression
– Other Representations for GP
– Example of GP for Knowledge Discovery

3 What is Genetic Programming?
Given a set of input-output data, genetic programming (GP) searches for a relationship (i.e., an expression, function, or program) between the input and the output.
GP algorithm (Koza, 1992):
1. Define a primitive set of possible functions (operators) and terminals (variables and constants) for the program.
2. Generate an initial population of random programs.
3. Repeat until termination:
○ Calculate the fitness of each program by running it on a set of fitness cases (input-output data).
○ Apply selection, crossover, and mutation.
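Steps 2 and 3 form a loop that does not depend on how programs are represented. The sketch below therefore takes the program generator, the error measure, and the two operators as parameters. The name run_gp and all parameter names are hypothetical. Several choices here are assumptions, because the slide does not fix them:
– tournament selection;
– crossover applied to every selected pair;
– a small mutation probability;
– a fixed generation budget as the termination rule.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

P = TypeVar("P")  # program representation, e.g. a parse tree

def run_gp(
    new_random_program: Callable[[random.Random], P],
    error: Callable[[P], float],  # fitness as an error: lower is better
    crossover: Callable[[P, P, random.Random], Tuple[P, P]],
    mutate: Callable[[P, random.Random], P],
    pop_size: int = 50,
    generations: int = 30,
    p_mutation: float = 0.1,
    tournament_size: int = 3,
    seed: int = 0,
) -> Tuple[Optional[P], float]:
    rng = random.Random(seed)
    # Step 2: initial population of random programs
    population: List[P] = [new_random_program(rng) for _ in range(pop_size)]
    best: Optional[P] = None
    best_err = float("inf")
    # Step 3: repeat until termination (assumed: fixed budget or a perfect fit)
    for _ in range(generations):
        # Fitness: run every program on the fitness cases
        scored = [(error(p), p) for p in population]
        for err, p in scored:
            if err < best_err:
                best, best_err = p, err
        if best_err == 0:
            break

        def select() -> P:
            # Tournament selection (assumed): best of a few randomly drawn individuals
            return min(rng.sample(scored, tournament_size), key=lambda s: s[0])[1]

        # Crossover on every selected pair, then occasional mutation of each child
        offspring: List[P] = []
        while len(offspring) < pop_size:
            for child in crossover(select(), select(), rng):
                offspring.append(mutate(child, rng) if rng.random() < p_mutation else child)
        population = offspring[:pop_size]
    return best, best_err
```

To get tree-based GP, pass in a random-tree generator, a subtree crossover, and a point mutation as the arguments. The loop itself stays the same.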

4 GP for Symbolic Regression
Example: curve fitting
– Given the following observations of the input and output of a system, find a curve that best fits the data.
The ideal function to be identified: $f(x) = x^{0.5} + \sin x$

5 Parse Tree Representation
An algebraic expression (program) is represented by its parse tree:
– Internal nodes are functions (operators)
– Leaf nodes are variables and constants, also known as terminals
Example: the expression $x^{0.5} + \sin x$ in prefix notation: (+ (^ x 0.5) (sin x))
[Figure: chromosome representation and parse tree representation]
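Here is a minimal sketch of a parse tree and its prefix form. It assumes a tree is stored as nested Python tuples of the form (function, child_1, ..., child_k), with variables and constants as leaves. The FUNCTIONS table and the helper names are illustrative, not part of the slides.

```python
import math
from typing import Dict, Union

# A parse tree is a terminal (variable name or numeric constant)
# or a tuple (function, child_1, ..., child_k).
Tree = Union[str, float, tuple]

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "^": lambda a, b: a ** b,
    "sin": math.sin,
}

def evaluate(tree: Tree, env: Dict[str, float]) -> float:
    if isinstance(tree, tuple):  # internal node: apply the function to evaluated children
        op, *children = tree
        return FUNCTIONS[op](*(evaluate(c, env) for c in children))
    if isinstance(tree, str):    # leaf: variable looked up in the environment
        return env[tree]
    return float(tree)           # leaf: constant

def to_prefix(tree: Tree) -> str:
    if isinstance(tree, tuple):
        op, *children = tree
        return "(" + " ".join([op] + [to_prefix(c) for c in children]) + ")"
    return str(tree)

# x^0.5 + sin x
target = ("+", ("^", "x", 0.5), ("sin", "x"))
print(to_prefix(target))             # (+ (^ x 0.5) (sin x))
print(evaluate(target, {"x": 4.0}))  # 2 + sin(4), approximately 1.2432
```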

6 Initialization
Syntactically correct trees should be generated randomly:
– For each node, an element is selected from the primitive set
– The root of a tree must be a function
– A node containing a function must have k child nodes, where k is the number of arguments of that function
– A node containing a variable or a constant has no children
Special care must be taken to restrict the maximum tree size in the initial population:
– Usually the depth of a tree is kept from exceeding a predetermined limit
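The initialization rules can be sketched as a recursive "grow"-style generator. The root is forced to be a function, each function node receives exactly k children, and nodes at the depth limit are forced to be terminals. The primitive set (FUNCTIONS, TERMINALS) is hypothetical, and so is the rule for choosing between a function and a terminal: selection proportional to set sizes.

```python
import random
from typing import Dict, List, Union

Tree = Union[str, float, tuple]

# Hypothetical primitive set: function name -> number of arguments, plus terminals
FUNCTIONS: Dict[str, int] = {"+": 2, "*": 2, "sin": 1}
TERMINALS: List[Union[str, float]] = ["x", 0.5, 1.0]

def random_tree(max_depth: int, rng: random.Random, depth: int = 0) -> Tree:
    """Generate a syntactically correct tree whose depth never exceeds max_depth (>= 1)."""
    if depth == 0:
        choose_function = True   # the root must be a function
    elif depth >= max_depth:
        choose_function = False  # depth limit reached: only a leaf is allowed
    else:
        choose_function = rng.random() < len(FUNCTIONS) / (len(FUNCTIONS) + len(TERMINALS))
    if not choose_function:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    k = FUNCTIONS[name]          # a function node gets exactly k child nodes
    return (name, *(random_tree(max_depth, rng, depth + 1) for _ in range(k)))

def depth(tree: Tree) -> int:
    """Depth counted in edges: a lone terminal has depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(c) for c in tree[1:])
    return 0
```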

7 Evaluation
The fitness of an individual program e can be obtained by calculating the sum of absolute errors over the fitness cases (smaller is better; 0 means a perfect fit):
$\mathit{fitness}(e) = \sum_{k=1}^{n} |y(k) - e(k)|$
where
y(k): value of the kth fitness case
e(k): value of individual program e on the kth fitness case
n: number of fitness cases
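The error sum translates directly into code. In this sketch a program is any callable from x to a value, for example a compiled parse tree. The fitness cases are hypothetical samples of the ideal function $x^{0.5} + \sin x$; they are not the slide's observations.

```python
import math
from typing import Callable, List, Tuple

def sum_abs_error(program: Callable[[float], float],
                  cases: List[Tuple[float, float]]) -> float:
    """Fitness = sum over the fitness cases of |y(k) - e(k)|; 0 means a perfect fit."""
    return sum(abs(y - program(x)) for x, y in cases)

def ideal(x: float) -> float:
    return x ** 0.5 + math.sin(x)

# Hypothetical fitness cases sampled from the ideal function (not the slide's data)
cases = [(x, ideal(x)) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

print(sum_abs_error(ideal, cases))                # 0.0
print(sum_abs_error(lambda x: x ** 0.5, cases))   # sum of |sin x| over the cases
```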

8 Reproductive Operators
Subtree crossover:
– Randomly pick a node in each parent tree and exchange the subtrees rooted at the selected nodes to generate two offspring
[Figure: one offspring]
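The sketch below implements subtree crossover on nested-tuple trees of the form (function, children...). It addresses each node by its path of child indices from the root. The helper names are hypothetical, and choosing crossover points uniformly over all nodes is an assumption.

```python
import random
from typing import List, Tuple, Union

Tree = Union[str, float, tuple]
Path = Tuple[int, ...]  # child indices from the root (index 0 of a tuple is the function)

def node_paths(tree: Tree, path: Path = ()) -> List[Path]:
    """Positions of all nodes in the tree, root first."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths += node_paths(child, path + (i,))
    return paths

def get_subtree(tree: Tree, path: Path) -> Tree:
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree: Tree, path: Path, new: Tree) -> Tree:
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(a: Tree, b: Tree, rng: random.Random) -> Tuple[Tree, Tree]:
    pa = rng.choice(node_paths(a))  # random crossover point in each parent
    pb = rng.choice(node_paths(b))
    sa, sb = get_subtree(a, pa), get_subtree(b, pb)
    # Exchange the subtrees rooted at the chosen nodes
    return replace_subtree(a, pa, sb), replace_subtree(b, pb, sa)
```

Because the parents are immutable tuples, they are left unchanged and can be selected again later.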

9 Reproductive Operators
Point mutation:
– Randomly pick a node in a tree and replace it with another compatible element from the primitive set (e.g., a function with the same number of arguments)
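Here is a sketch of point mutation on nested-tuple trees. It assumes "compatible" means a function with the same number of arguments, or any other terminal. Under that assumption the tree keeps its shape and exactly one node changes. The primitive set and function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple, Union

Tree = Union[str, float, tuple]

# Hypothetical primitive set; every arity has at least two functions
FUNCTIONS: Dict[str, int] = {"+": 2, "-": 2, "*": 2, "sin": 1, "cos": 1}
TERMINALS: List[Union[str, float]] = ["x", 0.5, 1.0]

def node_paths(tree: Tree, path: Tuple[int, ...] = ()) -> List[Tuple[int, ...]]:
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths += node_paths(child, path + (i,))
    return paths

def _mutate_at(tree: Tree, path: Tuple[int, ...], rng: random.Random) -> Tree:
    if path:  # descend to the chosen node, rebuilding the tuples on the way back
        i = path[0]
        return tree[:i] + (_mutate_at(tree[i], path[1:], rng),) + tree[i + 1:]
    if isinstance(tree, tuple):
        # Swap the function symbol for another one with the same arity; children are kept
        arity = FUNCTIONS[tree[0]]
        same_arity = [f for f, k in FUNCTIONS.items() if k == arity and f != tree[0]]
        return (rng.choice(same_arity),) + tree[1:] if same_arity else tree
    return rng.choice([t for t in TERMINALS if t != tree])  # swap a terminal

def point_mutation(tree: Tree, rng: random.Random) -> Tree:
    return _mutate_at(tree, rng.choice(node_paths(tree)), rng)
```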

10 Other Representations for GP
The representation based on prefix notation has some shortcomings:
– Chromosomes have variable length, which makes reproductive operators inconvenient to design
○ Simple crossover and mutation (e.g., single-point crossover and bit-flip mutation) may generate illegal offspring
○ Chromosomes must be transformed into parse trees before crossover or mutation
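The "illegal offspring" problem is easy to demonstrate. A prefix string is legal only if it encodes exactly one complete tree, which can be checked by counting the subtrees still owed. Plain one-point crossover on two legal parents can violate that rule. The arity table and function names below are hypothetical.

```python
from typing import Dict, List, Tuple

ARITY: Dict[str, int] = {"+": 2, "*": 2, "sin": 1}  # any other symbol is a terminal

def is_legal_prefix(genes: List[str]) -> bool:
    """True iff genes encode exactly one complete expression in prefix notation."""
    need = 1                         # subtrees still to be read
    for g in genes:
        if need == 0:
            return False             # extra symbols after a complete expression
        need += ARITY.get(g, 0) - 1  # g fills one slot and opens ARITY(g) new ones
    return need == 0

def one_point_crossover(a: List[str], b: List[str], cut: int) -> Tuple[List[str], List[str]]:
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

p1 = ["+", "x", "1"]  # (+ x 1)
p2 = ["sin", "x"]     # (sin x)
c1, c2 = one_point_crossover(p1, p2, 1)
print(c1, is_legal_prefix(c1))  # ['+', 'x'] False
print(c2, is_legal_prefix(c2))  # ['sin', 'x', '1'] False
```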

11 Gene Expression Programming
Gene expression programming (GEP) divides a fixed-length chromosome into a head and a tail:
– The head has h genes, each a function or a terminal
– The tail has t = h(k - 1) + 1 genes, all terminals, where k is the maximum number of arguments required by the functions
Example: h = 10, k = 2 gives t = 10(2 - 1) + 1 = 11
[Figure: chromosome divided into head and tail]
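Why t = h(k - 1) + 1? In the worst case all h head genes are k-ary functions. A tree with h internal nodes of arity k has hk + 1 nodes in total, so it needs hk + 1 - h = h(k - 1) + 1 terminal leaves. A tail of that length therefore always finishes the tree.

The sketch below builds random chromosomes under that rule. The function and terminal sets are hypothetical; all functions are binary, so k = 2.

```python
import random
from typing import Dict, List

FUNCTIONS: Dict[str, int] = {"+": 2, "-": 2, "*": 2, "/": 2}  # hypothetical, k = 2
TERMINALS: List[str] = ["a", "b"]

def tail_length(h: int, k: int) -> int:
    return h * (k - 1) + 1

def random_gep_chromosome(h: int, rng: random.Random) -> List[str]:
    k = max(FUNCTIONS.values())
    head = [rng.choice(list(FUNCTIONS) + TERMINALS) for _ in range(h)]  # functions or terminals
    tail = [rng.choice(TERMINALS) for _ in range(tail_length(h, k))]    # terminals only
    return head + tail

print(tail_length(10, 2))  # 11, matching the example above
```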

12 Gene Expression Programming
Decoding to a parse tree:
– Nodes are expanded breadth-first, starting from the root node
– Just before its expansion, each node is assigned the next gene from the chromosome, in left-to-right order
○ The number of child nodes generated equals the number of arguments of the node's function
○ Terminal nodes are not expanded
GEP always produces a legal expression as long as the tail contains only terminals:
– Ordinary crossover and mutation can be used
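The breadth-first decoding fits in a few lines. In this sketch each node takes the next gene, and a function node enqueues as many children as it has arguments. Genes left over after the tree is complete are ignored. The arity table, including "Q" as a one-argument function, is a hypothetical example.

```python
from collections import deque
from typing import Dict, List

ARITY: Dict[str, int] = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # hypothetical; others are terminals

def decode_gep(genes: List[str]) -> list:
    """Level-order decoding of a GEP chromosome; each node is [symbol, child_1, ..., child_k]."""
    it = iter(genes)
    root = [next(it)]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):  # terminals are not expanded
            child = [next(it)]                  # next gene, left to right
            node.append(child)
            queue.append(child)
    return root

def to_prefix(node: list) -> str:
    sym, *children = node
    return sym if not children else "(" + " ".join([sym] + [to_prefix(c) for c in children]) + ")"

# head "+*a" (h = 3) followed by tail "babb" (t = 4); the last two genes go unused
print(to_prefix(decode_gep(["+", "*", "a", "b", "a", "b", "b"])))  # (+ (* b a) a)
```

With a properly sized tail, next(it) never runs out of genes. That guarantee is exactly what the head/tail rule provides.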

13 Grammatical Evolution
Motivation:
– To evolve complete programs in an arbitrary language using an integer-array representation (O’Neill et al., 2001)
○ Each integer indicates the production rule (from a context-free grammar) to be applied during decoding
Context-free grammar (CFG):
– A formal grammar in which every production rule has the form V → w, where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (w can be empty)
– A CFG is used to describe the structure of a programming language
– A popular notation for CFGs is Backus-Naur Form (BNF)

14 Grammatical Evolution
Example: a BNF CFG
[Figure: example BNF grammar listing each production rule with its rule set and id]

15 Grammatical Evolution
CFG in BNF:
– Each symbol belongs to one of two sets
○ Terminal set T: symbols that appear in legal sentences of the CFG
○ Nonterminal set N: symbols that expand into terminals or nonterminals under the application of a set P of production rules
– A BNF expresses the CFG as a tuple ⟨N, T, P, S⟩, in which S is a predefined start symbol in N
– A BNF CFG provides the mechanism for decoding a genotype into a phenotype
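One way to hold the tuple ⟨N, T, P, S⟩ in code is sketched below. P maps each nonterminal to its ordered list of alternatives, so each alternative's list index serves as its rule id. The grammar G is a hypothetical arithmetic grammar, not the one from slide 14. The class and function names are also hypothetical.

```python
from typing import Dict, List, NamedTuple, Set

class BNFGrammar(NamedTuple):
    N: Set[str]                    # nonterminal set
    T: Set[str]                    # terminal set
    P: Dict[str, List[List[str]]]  # nonterminal -> alternatives (rule ids 0, 1, ...)
    S: str                         # start symbol

def is_well_formed(g: BNFGrammar) -> bool:
    if g.S not in g.N or g.N & g.T:
        return False
    for lhs, alternatives in g.P.items():
        if lhs not in g.N:         # context-free: the left side is a single nonterminal
            return False
        for rhs in alternatives:
            if any(sym not in g.N | g.T for sym in rhs):
                return False
    return set(g.P) == g.N         # every nonterminal has its own rule set

# Hypothetical grammar (not the one on slide 14)
G = BNFGrammar(
    N={"<expr>", "<op>", "<var>"},
    T={"+", "-", "*", "x", "y"},
    P={
        "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
        "<op>": [["+"], ["-"], ["*"]],
        "<var>": [["x"], ["y"]],
    },
    S="<expr>",
)
print(is_well_formed(G))  # True
```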

16 Grammatical Evolution
Example: decoding of a chromosome of length l
– Begin with the start symbol S as the current expression E, and set i ← 1
– Repeat until no nonterminal symbol is left in E:
○ Locate the leftmost nonterminal symbol e in E, and set k ← i mod l
○ From the rule set R(e) for e, retrieve the production rule p whose id is V_k mod |R(e)|, where V_k is the kth allele
○ Update E by expanding e according to p
○ i ← i + 1
(Note that the total number of production rules applied may be larger or smaller than l; when it is larger, the chromosome is reused from the beginning because k = i mod l wraps around.)
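The decoding loop can be sketched as follows, with a hypothetical grammar in place of the one on slide 14. The code uses 0-based counting, so the ith expansion (i = 0, 1, 2, ...) reads allele chromosome[i mod l]. The max_steps cap is an added assumption: it stops chromosomes whose wrapping never finishes the expression, and the function then returns None.

```python
from typing import Dict, List, Optional

# Hypothetical grammar: nonterminal -> ordered productions (rule ids 0, 1, ...)
RULES: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x"], ["y"]],
}

def ge_decode(chromosome: List[int], start: str = "<expr>",
              max_steps: int = 100) -> Optional[str]:
    """Map an integer chromosome to a sentence by always expanding the leftmost nonterminal."""
    expr = [start]
    l = len(chromosome)
    for i in range(max_steps):
        pos = next((j for j, s in enumerate(expr) if s in RULES), None)
        if pos is None:
            return " ".join(expr)                     # only terminals left
        choices = RULES[expr[pos]]                    # rule set R(e)
        production = choices[chromosome[i % l] % len(choices)]
        expr[pos:pos + 1] = production                # expand e in place
    return None if any(s in RULES for s in expr) else " ".join(expr)

print(ge_decode([0, 1, 0, 2, 1, 1]))  # x * y
print(ge_decode([0, 1, 0, 2, 1]))     # x * x  (six expansions, so the chromosome wraps)
```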

17 Grammatical Evolution
Example: decoding of a chromosome of length l

18 Example of GP for Knowledge Discovery
Chest-pain diagnosis (Bojarczuk et al., 2000)
– 12 diseases, 165 Boolean attributes
– A Boolean expression must be derived for predicting each disease, e.g.:
IF (starting factor is emotion) AND ((the pain lasts no more than seconds) OR ((the pain begins gradually) AND (the pain irradiates towards the upper left limb))) THEN (disease is stable angina)
– 138 fitness cases collected
○ For disease i, cases with the disease are positive examples and those without it are negative examples
○ For each disease, 90 cases are used for training and 48 for testing

19 Example of GP for Knowledge Discovery
Chest-pain diagnosis
– Primitive set:
○ Functions: AND, OR, NOT
○ Terminals: the symbols for the 165 attributes
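With this primitive set, every individual is a Boolean expression tree. The sketch below evaluates such a tree on one case. It encodes the stable-angina rule from slide 18 as an example; the attribute names are hypothetical stand-ins for the dataset's real attribute symbols.

```python
from typing import Dict, Union

Tree = Union[str, tuple]  # terminal attribute name or (function, args...)

def eval_bool(tree: Tree, case: Dict[str, bool]) -> bool:
    if isinstance(tree, str):
        return case[tree]                 # terminal: value of a Boolean attribute
    op, *args = tree
    if op == "NOT":
        return not eval_bool(args[0], case)
    if op == "AND":
        return eval_bool(args[0], case) and eval_bool(args[1], case)
    if op == "OR":
        return eval_bool(args[0], case) or eval_bool(args[1], case)
    raise ValueError(f"unknown function {op!r}")

# The stable-angina rule from slide 18; attribute names are hypothetical
stable_angina = ("AND", "starts_with_emotion",
                 ("OR", "lasts_seconds",
                        ("AND", "begins_gradually", "irradiates_upper_left_limb")))
```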

20 Example of GP for Knowledge Discovery
Chest-pain diagnosis
– Testing an expression on a fitness case gives one of four possible outcomes:

                 Predicted: Yes        Predicted: No
Actual: Yes      True positive (TP)    False negative (FN)
Actual: No       False positive (FP)   True negative (TN)

○ The true positive rate, or sensitivity: $Se = \frac{TP}{TP + FN}$
○ Specificity: $Sp = \frac{TN}{TN + FP}$
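Here is a sketch that counts the four outcomes and computes Se and Sp from parallel lists of predicted and actual labels. Returning 0.0 when a denominator is zero is an assumption, since the slide does not cover that case.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # predicted yes, actually yes
    fn = sum((not p) and a for p, a in pairs)        # predicted no, actually yes
    tn = sum((not p) and (not a) for p, a in pairs)  # predicted no, actually no
    fp = sum(p and (not a) for p, a in pairs)        # predicted yes, actually no
    se = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    return se, sp
```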

21 Example of GP for Knowledge Discovery
Chest-pain diagnosis
– Simplicity (Sy), a measure based on the size of a parse tree:
$Sy = \frac{\mathit{maxnodes} - 0.5\,\mathit{numnodes} - 0.5}{\mathit{maxnodes} - 1}$
where maxnodes is the maximum number of nodes allowed and numnodes is the number of nodes in the current parse tree (0.5 ≤ Sy ≤ 1)
– The fitness function: $\mathit{fitness} = Se \times Sp \times Sy$ (0 ≤ fitness ≤ 1)
– GP has shown better predictive performance than other rule-induction algorithms
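The combined fitness is sketched below under an assumption: simplicity is taken as Sy = (maxnodes - 0.5·numnodes - 0.5) / (maxnodes - 1). That formula gives exactly the stated range, with Sy = 1 for a single-node tree and Sy = 0.5 at maxnodes. The maxnodes value of 45 in the checks is an arbitrary illustrative choice.

```python
def simplicity(numnodes: int, maxnodes: int) -> float:
    """Assumed formula: 1.0 for a one-node tree, falling linearly to 0.5 at maxnodes."""
    return (maxnodes - 0.5 * numnodes - 0.5) / (maxnodes - 1)

def fitness(se: float, sp: float, numnodes: int, maxnodes: int) -> float:
    """Se x Sp x Sy; lies in [0, 1] when Se and Sp do."""
    return se * sp * simplicity(numnodes, maxnodes)
```

Multiplying the three factors means a rule must be sensitive, specific, and small at the same time. A zero in any one factor zeroes the whole fitness.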

