GENETIC ALGORITHMS AND GENETIC PROGRAMMING

1 GENETIC ALGORITHMS AND GENETIC PROGRAMMING

2 John R. Koza
Consulting Professor (Medical Informatics), Department of Medicine, School of Medicine
Consulting Professor, Department of Electrical Engineering, School of Engineering
Stanford University, Stanford, California 94305

3 DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.

4 GENETIC ALGORITHM (GA)
Generation 0                 Generation 1
Individuals    Fitness       Offspring
011            $3            111
001            $1            010
110            $6            110
010            $2            011

5 HAMBURGER RESTAURANT PROBLEM
Price:     1 = $0.50 price; 0 = $10.00 price
Drink:     1 = Coca Cola; 0 = Wine
Ambiance:  1 = Fast snappy service; 0 = Leisurely service with tuxedoed waiter

6 CHROMOSOME (GENOME) OF THE GLOBAL OPTIMUM
McDonald's: 111 ($0.50 price, Coca Cola, fast snappy service)

7 THE SEARCH SPACE
Alphabet size K=2, Length L=3
1  000
2  001
3  010
4  011
5  100
6  101
7  110
8  111
Size of search space: K^L = 2^L = 2^3 = 8
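The enumeration on this slide can be reproduced in a few lines. This is a minimal sketch; the names `ALPHABET`, `LENGTH`, and `enumerate_search_space` are illustrative and not part of the original deck.

```python
from itertools import product

ALPHABET = "01"  # alphabet of size K = 2
LENGTH = 3       # string length L = 3


def enumerate_search_space(alphabet: str, length: int) -> list[str]:
    """Return every fixed-length string over the alphabet, in counting order."""
    return ["".join(chars) for chars in product(alphabet, repeat=length)]


space = enumerate_search_space(ALPHABET, LENGTH)
print(len(space))  # K**L points in the search space
print(space)
```

Enumeration is only feasible because L is tiny; the count grows as K^L.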

8 IMPRACTICALITY OF RANDOM OR ENUMERATIVE SEARCH
81-bit problems are very small for GA.
However, even if L is as small as 81, the search space has 2^81 ≈ 2.4 × 10^24 points; for comparison, on the order of 10^27 nanoseconds have elapsed since the beginning of the universe 15 billion years ago.
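The magnitudes involved can be checked with integer arithmetic. The sketch below assumes a 365.25-day year and the slide's figure of 15 billion years; the variable names are illustrative.

```python
SEARCH_SPACE_SIZE = 2 ** 81  # number of 81-bit strings

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: Julian year
AGE_OF_UNIVERSE_YEARS = 15e9            # figure quoted on the slide
nanoseconds_elapsed = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR * 1e9

print(f"2^81                     = {SEARCH_SPACE_SIZE:.2e}")
print(f"nanoseconds since origin = {nanoseconds_elapsed:.2e}")

# Time needed to enumerate the space at one candidate per nanosecond.
years_to_enumerate = SEARCH_SPACE_SIZE / (SECONDS_PER_YEAR * 1e9)
print(f"years to enumerate at 1 candidate/ns = {years_to_enumerate:.2e}")
```

Even at one evaluation per nanosecond, enumerating all 81-bit strings would take tens of millions of years, which is the practical point of the slide.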

9 GA FLOWCHART

10 GENERATION 0
Generation 0
#    String   Fitness
1    011      3
2    001      1
3    110      6
4    010      2
Total         12
Worst         1
Average       3.00
Best          6

12 PROBABILISTIC SELECTION BASED ON FITNESS
Better individuals are preferred
Best is not always picked
Worst is not necessarily excluded
Nothing is guaranteed
Mixture of greedy exploitation and adventurous exploration
Similarities to simulated annealing (SA)

13 DARWINIAN FITNESS PROPORTIONATE SELECTION
Generation 0                             Mating pool
#    String   Fitness   f_i / Sum f      String   Fitness
1    011      3         .25              011      3
2    001      1         .08              110      6
3    110      6         .50              110      6
4    010      2         .17              010      2
Total         12                                  17
Worst         1                                   2
Average       3.00                                4.25
Best          6                                   6
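Fitness-proportionate (roulette-wheel) selection can be sketched as follows. The fitness function here, the binary string read as an integer, is an assumption inferred from the slide's figures (011 has fitness 3, 110 has fitness 6); the function names are illustrative.

```python
import random


def fitness(bits: str) -> int:
    """Illustrative fitness: the binary string read as an integer (011 -> 3)."""
    return int(bits, 2)


def selection_probabilities(population: list[str]) -> list[float]:
    """Each individual's share of the population's total fitness."""
    total = sum(fitness(ind) for ind in population)
    return [fitness(ind) / total for ind in population]


def select(population: list[str], rng: random.Random) -> str:
    """Spin the wheel once: walk the cumulative fitness until the spin is covered."""
    total = sum(fitness(ind) for ind in population)
    spin = rng.uniform(0, total)
    running = 0.0
    for ind in population:
        running += fitness(ind)
        if spin <= running:
            return ind
    return population[-1]  # guard against floating-point round-off


population = ["011", "001", "110", "010"]
probs = selection_probabilities(population)
print([round(p, 2) for p in probs])  # [0.25, 0.08, 0.5, 0.17]

rng = random.Random(0)
mating_pool = [select(population, rng) for _ in range(len(population))]
print(mating_pool)
```

Because selection is probabilistic, the mating pool differs from run to run; the best individual is favored but not guaranteed a place, and the worst is not necessarily excluded.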

15 MUTATION OPERATION
Parent chosen probabilistically based on fitness
Mutation point chosen at random
One offspring
Parent:          010
Mutation point:  --0
Offspring:       011
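A minimal sketch of the mutation operation on a binary string. The helper names `mutate_at` and `mutate` are illustrative, and positions are 0-based here, so the slide's third position is index 2.

```python
import random


def mutate_at(parent: str, point: int) -> str:
    """Flip the bit at the given 0-based position and return the offspring."""
    flipped = "1" if parent[point] == "0" else "0"
    return parent[:point] + flipped + parent[point + 1:]


def mutate(parent: str, rng: random.Random) -> str:
    """Pick the mutation point uniformly at random, then flip that bit."""
    return mutate_at(parent, rng.randrange(len(parent)))


print(mutate_at("010", 2))  # the slide's example: 010 -> 011
print(mutate("010", random.Random(0)))
```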

16 AFTER MUTATION OPERATION
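Generation 0                             Mating pool          Generation 1
#    String   Fitness   f_i / Sum f      String   Fitness     Crossover point   String   Fitness
1    011      3         .25              011      3
2    001      1         .08              110      6
3    110      6         .50              110      6
4    010      2         .17              010      2           ---               011      3
Total         12                                  17
Worst         1                                   2
Average       3.00                                4.25
Best          6                                   6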

17 CROSSOVER OPERATION
2 parents chosen probabilistically based on fitness
Parent 1:  011
Parent 2:  110

18 CROSSOVER (CONTINUED)
Interstitial point picked at random
2 remainders
2 offspring produced by crossover
Fragment 1:   01-        Fragment 2:   11-
Remainder 1:  --1        Remainder 2:  --0
Offspring 1:  111        Offspring 2:  010
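One-point crossover on fixed-length strings can be sketched as below. The function name and the convention that `point` counts the characters kept in each fragment are assumptions made for illustration.

```python
import random


def one_point_crossover(parent1: str, parent2: str, point: int) -> tuple[str, str]:
    """Split both parents after `point` characters and swap the remainders."""
    fragment1, remainder1 = parent1[:point], parent1[point:]
    fragment2, remainder2 = parent2[:point], parent2[point:]
    offspring1 = fragment2 + remainder1
    offspring2 = fragment1 + remainder2
    return offspring1, offspring2


# The slide's example: parents 011 and 110, interstitial point after position 2.
print(one_point_crossover("011", "110", 2))  # ('111', '010')

# In a run, the interstitial point is picked at random from 1 .. L-1.
rng = random.Random(0)
random_point = rng.randrange(1, 3)
print(one_point_crossover("011", "110", random_point))
```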

19 AFTER CROSSOVER OPERATION
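Generation 0                             Mating pool          Generation 1
#    String   Fitness   f_i / Sum f      String   Fitness     Crossover point   String   Fitness
1    011      3         .25              011      3           2                 111      7
2    001      1         .08              110      6           2                 010      2
3    110      6         .50              110      6
4    010      2         .17              010      2
Total         12                                  17
Worst         1                                   2
Average       3.00                                4.25
Best          6                                   6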

20 AFTER REPRODUCTION OPERATION
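Generation 0                             Mating pool          Generation 1
#    String   Fitness   f_i / Sum f      String   Fitness     Crossover point   String   Fitness
1    011      3         .25              011      3
2    001      1         .08              110      6
3    110      6         .50              110      6           ---               110      6
4    010      2         .17              010      2
Total         12                                  17
Worst         1                                   2
Average       3.00                                4.25
Best          6                                   6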

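22 GENERATION 1
Generation 0                             Mating pool          Generation 1
#    String   Fitness   f_i / Sum f      String   Fitness     Crossover point   String   Fitness
1    011      3         .25              011      3           2                 111      7
2    001      1         .08              110      6           2                 010      2
3    110      6         .50              110      6           ---               110      6
4    010      2         .17              010      2           ---               011      3
Total         12                                  17                                     18
Worst         1                                   2                                      2
Average       3.00                                4.25                                   4.5
Best          6                                   6                                      7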

23 PROBABILISTIC STEPS
The initial population is typically random
Probabilistic selection based on fitness
- Best is not always picked
- Worst is not necessarily excluded
Random picking of mutation and crossover points
Often, there is a probabilistic scenario as part of the fitness measure
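The probabilistic steps listed above can be combined into one generational loop. This is a sketch under stated assumptions, not the deck's implementation: the fitness function (string read as an integer), the operation probabilities (50% crossover, 25% mutation, 25% reproduction, mirroring the worked example), and all function names are illustrative.

```python
import random


def fitness(bits: str) -> int:
    return int(bits, 2)  # assumption: fitness is the string's integer value


def select(population, rng):
    """Fitness-proportionate selection; uniform if every fitness is zero."""
    weights = [fitness(ind) for ind in population]
    if sum(weights) == 0:
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]


def mutate(parent, rng):
    point = rng.randrange(len(parent))  # random mutation point
    flipped = "1" if parent[point] == "0" else "0"
    return parent[:point] + flipped + parent[point + 1:]


def crossover(parent1, parent2, rng):
    point = rng.randrange(1, len(parent1))  # random interstitial point
    return parent2[:point] + parent1[point:], parent1[:point] + parent2[point:]


def next_generation(population, rng, p_crossover=0.5, p_mutation=0.25):
    offspring = []
    while len(offspring) < len(population):
        r = rng.random()
        # Crossover yields two offspring, so it is used only when two slots remain;
        # otherwise the draw falls through to mutation.
        if r < p_crossover and len(offspring) + 2 <= len(population):
            offspring.extend(crossover(select(population, rng), select(population, rng), rng))
        elif r < p_crossover + p_mutation:
            offspring.append(mutate(select(population, rng), rng))
        else:
            offspring.append(select(population, rng))  # reproduction: unchanged copy
    return offspring


def run(generations=10, size=4, length=3, seed=0):
    rng = random.Random(seed)
    # Generation 0 is created at random.
    population = ["".join(rng.choice("01") for _ in range(length)) for _ in range(size)]
    for _ in range(generations):
        population = next_generation(population, rng)
    return population


final_population = run()
print(final_population, max(fitness(ind) for ind in final_population))
```

Nothing in the loop guarantees improvement from one generation to the next; changing `seed` gives a different trajectory.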

24 GENETIC PROGRAMMING

25 THE CHALLENGE
"How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?"
- Attributed to Arthur Samuel (1959)

26 CRITERION FOR SUCCESS
"The aim [is] ... to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence."
- Arthur Samuel (1983)

27 REPRESENTATIONS
Binary decision diagrams
Decision trees
Formal grammars
Coefficients for polynomials
Reinforcement learning tables
Conceptual clusters
Classifier systems
If-then production rules
Horn clauses
Neural nets
Bayesian networks
Frames
Propositional logic

28 A COMPUTER PROGRAM

29 GENETIC PROGRAMMING (GP)
GP applies the approach of the genetic algorithm to the space of possible computer programs.
Computer programs are the lingua franca for expressing the solutions to a wide variety of problems.
A wide variety of seemingly different problems from many different fields can be reformulated as a search for a computer program to solve the problem.

30 GP: MAIN POINTS
Genetic programming now routinely delivers high-return human-competitive machine intelligence.
Genetic programming is an automated invention machine.
Genetic programming has delivered a progression of qualitatively more substantial results in synchrony with five approximately order-of-magnitude increases in the expenditure of computer time.

31 GP FLOWCHART

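32 A COMPUTER PROGRAM IN C
int foo (int time)
{
    int temp1, temp2;
    /* temp1 is 3 when time exceeds 10, otherwise 4 */
    if (time > 10)
        temp1 = 3;
    else
        temp1 = 4;
    /* same computation as the program tree (+ 1 2 (IF (> TIME 10) 3 4)) */
    temp2 = temp1 + 1 + 2;
    return (temp2);
}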

33 OUTPUT OF C PROGRAM
Time       Output
0 to 10    7
11         6
12         6

34 PROGRAM TREE (+ 1 2 (IF (> TIME 10) 3 4))
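The program tree can be executed directly by a small recursive evaluator. In this sketch the S-expression is encoded as nested Python tuples; that encoding and the names `PROGRAM` and `evaluate` are assumptions made for illustration.

```python
# (+ 1 2 (IF (> TIME 10) 3 4)) written as nested tuples: (function, arg, arg, ...)
PROGRAM = ("+", 1, 2, ("IF", (">", "TIME", 10), 3, 4))


def evaluate(node, env):
    """Evaluate a program tree; `env` maps terminal names such as TIME to values."""
    if isinstance(node, tuple):
        op, *args = node
        if op == "IF":
            condition, then_branch, else_branch = args
            chosen = then_branch if evaluate(condition, env) else else_branch
            return evaluate(chosen, env)
        values = [evaluate(arg, env) for arg in args]
        if op == "+":
            return sum(values)
        if op == ">":
            return values[0] > values[1]
        raise ValueError(f"unknown function: {op}")
    if isinstance(node, str):
        return env[node]  # variable terminal
    return node           # constant terminal


for time in (0, 10, 11, 12):
    print(time, evaluate(PROGRAM, {"TIME": time}))
```

The tree yields 7 whenever TIME is at most 10 and 6 otherwise, since the IF subtree contributes 4 or 3 to the sum 1 + 2.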

35 CREATING RANDOM PROGRAMS
Creation.avi

36 CREATING RANDOM PROGRAMS
Available functions F = {+, -, *, %, IFLTE}
Available terminals T = {X, Y, Random-Constants}
The random programs are:
- Of different sizes and shapes
- Syntactically valid
- Executable
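Random program creation over this function and terminal set might look like the sketch below. Several details are assumptions not stated on the slide: `%` is treated as protected division (returning 1 when the divisor is 0), `IFLTE` as a four-argument "if a <= b then c else d", random constants are drawn from [-1, 1], and a branch stops early with probability 0.3.

```python
import random

ARITY = {"+": 2, "-": 2, "*": 2, "%": 2, "IFLTE": 4}  # function set F
TERMINALS = ["X", "Y", "CONST"]                       # terminal set T


def random_tree(rng, max_depth):
    """Grow a random tree; branches may end early, so sizes and shapes vary."""
    if max_depth == 0 or rng.random() < 0.3:
        terminal = rng.choice(TERMINALS)
        if terminal == "CONST":
            return round(rng.uniform(-1.0, 1.0), 3)  # random constant
        return terminal
    function = rng.choice(sorted(ARITY))
    children = [random_tree(rng, max_depth - 1) for _ in range(ARITY[function])]
    return (function, *children)


def evaluate(tree, x, y):
    """Execute a tree; every tree built by random_tree is valid input."""
    if not isinstance(tree, tuple):
        if tree == "X":
            return x
        if tree == "Y":
            return y
        return tree  # numeric constant
    op, *args = tree
    v = [evaluate(arg, x, y) for arg in args]
    if op == "+":
        return v[0] + v[1]
    if op == "-":
        return v[0] - v[1]
    if op == "*":
        return v[0] * v[1]
    if op == "%":
        return 1.0 if v[1] == 0 else v[0] / v[1]  # protected division
    return v[2] if v[0] <= v[1] else v[3]         # IFLTE


def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


rng = random.Random(1)
programs = [random_tree(rng, max_depth=4) for _ in range(5)]
for program in programs:
    print(depth(program), program, evaluate(program, 0.5, -0.5))
```

Because every function accepts any numeric argument and division is protected, any tree the generator produces is syntactically valid and executable.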

37 GP GENETIC OPERATIONS
Reproduction
Mutation
Crossover (sexual recombination)
Architecture-altering operations

38 MUTATION OPERATION Mutation.avi

39 MUTATION OPERATION
Select 1 parent probabilistically based on fitness
Pick point from 1 to NUMBER-OF-POINTS
Delete subtree at the picked point
Grow new subtree at the mutation point in same way as generated trees for initial random population (generation 0)
The result is a syntactically valid executable program
Put the offspring into the next generation of the population
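The mutation steps above can be sketched on tuple-encoded trees. The encoding, the reduced function set (all binary), the helper names, and the 0-based preorder numbering of points (the slide counts from 1) are assumptions for illustration.

```python
import random

FUNCTIONS = ["+", "-", "*", "%"]  # all binary in this sketch


def random_tree(rng, max_depth):
    """Grow a random subtree the same way generation 0 is created."""
    if max_depth == 0 or rng.random() < 0.3:
        return "X" if rng.random() < 0.5 else round(rng.uniform(-1.0, 1.0), 3)
    return (rng.choice(FUNCTIONS),
            random_tree(rng, max_depth - 1),
            random_tree(rng, max_depth - 1))


def count_points(tree):
    """NUMBER-OF-POINTS: every function and terminal in the tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_points(child) for child in tree[1:])


def replace_point(tree, index, new_subtree):
    """Return a copy of `tree` with the subtree at preorder `index` replaced."""
    if index == 0:
        return new_subtree
    op, *children = tree
    index -= 1  # skip the function at the root
    new_children = []
    for child in children:
        size = count_points(child)
        if 0 <= index < size:
            child = replace_point(child, index, new_subtree)
        index -= size
        new_children.append(child)
    return (op, *new_children)


def mutate(parent, rng, max_depth=2):
    """Delete the subtree at a random point and grow a new one in its place."""
    point = rng.randrange(count_points(parent))
    return replace_point(parent, point, random_tree(rng, max_depth))


parent = ("+", ("*", "X", "X"), 1.0)  # x*x + 1
print(count_points(parent))
print(mutate(parent, random.Random(0)))
```

Since the replacement is itself a complete tree, the offspring is always a well-formed program regardless of which point is picked.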

40 CROSSOVER OPERATION Crossover.avi

41 CROSSOVER OPERATION
Select 2 parents probabilistically based on fitness
Randomly pick a number from 1 to NUMBER-OF-POINTS for 1st parent
Independently randomly pick a number for 2nd parent
Identify the subtrees rooted at the two picked points
Exchange the two subtrees between the parents to produce the offspring
The result is a syntactically valid executable program
Put the offspring into the next generation of the population
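Subtree crossover can be sketched on tuple-encoded trees as follows. The parents and crossover points are those named on the later symbolic-regression slides ("+" of x + 1 and the left-most x of x^2 + 1); the exact tree shape used for x^2 + 1, the helper names, and the 0-based preorder point numbering are assumptions.

```python
def count_points(tree):
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_points(child) for child in tree[1:])


def get_point(tree, index):
    """Return the subtree rooted at preorder position `index` (0-based)."""
    if index == 0:
        return tree
    if not isinstance(tree, tuple):
        raise IndexError("point index out of range")
    index -= 1
    for child in tree[1:]:
        size = count_points(child)
        if index < size:
            return get_point(child, index)
        index -= size
    raise IndexError("point index out of range")


def replace_point(tree, index, new_subtree):
    """Return a copy of `tree` with the subtree at `index` replaced."""
    if index == 0:
        return new_subtree
    op, *children = tree
    index -= 1
    new_children = []
    for child in children:
        size = count_points(child)
        if 0 <= index < size:
            child = replace_point(child, index, new_subtree)
        index -= size
        new_children.append(child)
    return (op, *new_children)


def crossover(parent1, point1, parent2, point2):
    """Swap the subtrees rooted at the two picked points."""
    subtree1 = get_point(parent1, point1)
    subtree2 = get_point(parent2, point2)
    return (replace_point(parent1, point1, subtree2),
            replace_point(parent2, point2, subtree1))


parent_a = ("+", "X", 1.0)              # x + 1
parent_b = ("+", ("*", "X", "X"), 1.0)  # x*x + 1 (assumed tree shape)

# Point 0 of (a) is its root "+"; point 2 of (b) is the left-most x.
offspring1, offspring2 = crossover(parent_a, 0, parent_b, 2)
print(offspring1)  # 'X'
print(offspring2)  # ('+', ('*', ('+', 'X', 1.0), 'X'), 1.0), i.e. (x + 1)*x + 1
```

With these points, one offspring is the single terminal x and the other simplifies to x^2 + x + 1. In a run, both points would be drawn at random with something like `rng.randrange(count_points(parent))`.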

42 REPRODUCTION OPERATION
Select parent probabilistically based on fitness Copy it (unchanged) into the next generation of the population

43 FIVE MAJOR PREPARATORY STEPS FOR GP
Determining the set of terminals
Determining the set of functions
Determining the fitness measure
Determining the parameters for the run (population size, number of generations, minor parameters)
Determining the method for designating a result and the criterion for terminating a run

44 PREPARATORY STEPS
Objective: Find a computer program with one input (independent variable X) whose output equals the given data
1 Terminal set: T = {X, Random-Constants}
2 Function set: F = {+, -, *, %}
3 Fitness: The sum of the absolute value of the differences between the candidate program's output and the given data (computed over numerous values of the independent variable x from -1.0 to +1.0)
4 Parameters: Population size M = 4
5 Termination: An individual emerges whose sum of absolute errors is less than 0.1

45 SYMBOLIC REGRESSION
POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR GENERATION 0
[Figure: the 4 randomly created program trees]

46 SYMBOLIC REGRESSION x^2 + x + 1
FITNESS OF THE 4 INDIVIDUALS IN GEN 0
Individual   Program     Fitness
(a)          x + 1       0.67
(b)          x^2 + 1     1.00
(c)          2           1.70
(d)          x           2.67
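The four fitness values can be reproduced numerically. The sketch below assumes the error is accumulated as the area between each candidate and the target x^2 + x + 1 over [-1, 1], approximated with a midpoint rule; that reading matches the slide's figures, but the sampling scheme and the names used are assumptions.

```python
def target(x):
    return x * x + x + 1


CANDIDATES = {
    "(a) x + 1": lambda x: x + 1,
    "(b) x^2 + 1": lambda x: x * x + 1,
    "(c) 2": lambda x: 2.0,
    "(d) x": lambda x: x,
}


def fitness(program, samples=2000, low=-1.0, high=1.0):
    """Sum of absolute errors at the sample midpoints, scaled by the step width."""
    step = (high - low) / samples
    total = 0.0
    for k in range(samples):
        x = low + (k + 0.5) * step
        total += abs(program(x) - target(x))
    return total * step


for name, program in CANDIDATES.items():
    print(f"{name:12s} {fitness(program):.2f}")

# Termination test from the preparatory steps: error below 0.1.
print(fitness(target) < 0.1)
```

Lower values are better here: a program identical to the target scores 0 and satisfies the termination criterion.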

47 SYMBOLIC REGRESSION x^2 + x + 1
GENERATION 1
Copy of (a)
Mutant of (c), picking "2" as mutation point
First offspring of crossover of (a) and (b), picking "+" of parent (a) and left-most "x" of parent (b) as crossover points
Second offspring of crossover of (a) and (b), picking "+" of parent (a) and left-most "x" of parent (b) as crossover points

48 WALL-FOLLOWER

49 FITNESS

50 BEST OF GENERATION 57

51 SUBROUTINE DUPLICATION
Branch-duplication.avi

52 SUBROUTINE CREATION Branch-creation.avi

53 SUBROUTINE DELETION Branch-deletion.avi

54 ARGUMENT DUPLICATION Arg-duplication.avi

55 ARGUMENT DELETION Arg-deletion.avi

56 16 ATTRIBUTES OF A SYSTEM FOR AUTOMATICALLY CREATING COMPUTER PROGRAMS
Starts with "What needs to be done"
Tells us "How to do it"
Produces a computer program
Automatic determination of program size
Code reuse
Parameterized reuse
Internal storage
Iterations, loops, and recursions
Self-organization of hierarchies
Automatic determination of program architecture
Wide range of programming constructs
Well-defined
Problem-independent
Wide applicability
Scalable
Competitive with human-produced results

57 PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
Toy problems
Human-competitive non-patent results
20th-century patented inventions
21st-century patented inventions
Patentable new inventions

58 GP AS AN INVENTION MACHINE

59 NASA EVOLVED ANTENNA
To be on satellite to be launched in 2004

60 CHARACTERISTICS SUGGESTING USE OF GP
(1) discovering the size and shape of the solution, (2) reusing substructures, (3) discovering the number of substructures, (4) discovering the nature of the hierarchical references among substructures, (5) passing parameters to a substructure, (6) discovering the type of substructures (e.g., subroutines, iterations, loops, recursions, or storage), (7) discovering the number of arguments possessed by a substructure, (8) maintaining syntactic validity and locality by means of a developmental process, or (9) discovering a general solution in the form of a parameterized topology containing free variables

61 DESIGNING A GIRAFFE
Long neck
Long tongue
Vegetable-digesting enzymes in stomach
4 legs
Long legs
Brown coloration

62 THE DESIGN OF A GOOD GIRAFFE
Gene              Value         Type
Neck length       15.11 feet    Floating point
Tongue length     14 inches     Floating point
Carnivorous?      No            Boolean
Number of legs    4             Integer
Leg length        9.96 feet     Floating point
Coloration        Brown         Categorical

63 NON-LINEARITY: GIRAFFE
Taken one-by-one, some gene values found in a giraffe, such as the long neck, contribute (alone) negatively to fitness. A long neck:
- requires considerable material to construct
- requires considerable energy to maintain
- is prone to injury (thereby hurting rate of survival and reproduction)
Thus, maximizing any one variable will not lead to the global optimum solution

64 NON-LINEARITY (CONTINUED)
When the variables are taken in pairs (there are 15 possible pairs), many combinations of pairs (e.g., Long neck and long tongue) are doubly detrimental

65 NON-LINEARITY (CONTINUED)
But, certain combinations of traits, when taken together, are "co-adapted sets of alleles" that yield a very fit animal for eating high acacia leaves in the jungle environment, having good camouflage, having high escape velocity when faced with predators, and exploiting a niche (and avoiding competition) with other animals feeding on low-hanging vegetation

66 SEARCH METHODS IN GENERAL
Initial structure(s)
Fitness measure
Operations for creating new structures
Parameters
Termination criterion and method of designating the result

67 SPACE WITH MANY LOCAL OPTIMA

68 SEARCH METHODS
Blind random search does not use acquired information in deciding on the future direction of the search.
Hill climbing and gradient descent use acquired information; however, they are prone to becoming trapped on local optima.
The previous point is especially true for non-trivial search spaces.
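How hill climbing becomes trapped can be seen on a tiny one-dimensional landscape. The landscape values and the function name below are made up for illustration.

```python
# Hypothetical fitness landscape: local optimum at index 3, global optimum at index 11.
LANDSCAPE = [1, 2, 3, 4, 3, 2, 1, 2, 3, 5, 7, 9, 8, 6, 4, 2]


def hill_climb(start: int) -> int:
    """Move to the better neighbor until neither neighbor improves on the current point."""
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(LANDSCAPE)]
        best = max(neighbors, key=lambda i: LANDSCAPE[i])
        if LANDSCAPE[best] <= LANDSCAPE[current]:
            return current
        current = best


print(hill_climb(0))  # stops at index 3, the local optimum
print(hill_climb(8))  # reaches index 11, the global optimum
```

The outcome depends entirely on the starting point: a climber that starts on the left slope never discovers the higher peak on the right.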

69 7 DIFFERENCES BETWEEN GP AND ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING APPROACHES

70 REPRESENTATION
Genetic programming overtly conducts its search for a solution to the given problem in program space

71 ROLE OF POINT-TO-POINT TRANSFORMATIONS IN THE SEARCH
Genetic programming does not conduct its search by transforming a single point in the search space into another single point, but instead transforms a set of points into another set of points

72 ROLE OF HILL CLIMBING IN THE SEARCH
Genetic programming does not rely exclusively on greedy hill climbing to conduct its search, but instead allocates a certain number of trials, in a principled way, to choices that are known to be inferior

73 DETERMINISM IN THE SEARCH
Genetic programming conducts its search probabilistically

74 ROLE OF AN EXPLICIT KNOWLEDGE BASE
Genetic programming does NOT make use of a knowledge base

75 ROLE OF FORMAL LOGIC IN THE SEARCH
Genetic programming does not utilize formal logic in its search strategy. Contradictory alternatives are created and actively maintained.

76 UNDERPINNINGS OF THE TECHNIQUE
Biologically inspired

77 TURING (1948)
Turing made the connection between searches and the challenge of getting a computer to solve a problem without explicitly programming it in his 1948 essay "Intelligent Machinery":
"Further research into intelligence of machinery will probably be very greatly concerned with 'searches' ..."

78 TURING’S 3 APPROACHES TO MACHINE INTELLIGENCE (1948)
LOGIC-BASED SEARCH One approach that Turing identified is a search through the space of integers representing candidate computer programs.

79 TURING’S 3 APPROACHES (CONTINUED)
CULTURAL SEARCH A second approach is the "cultural search" which relies on knowledge and expertise acquired over a period of years from others (akin to present-day knowledge-based systems).

80 TURING’S 3 APPROACHES (CONTINUED)
GENETICAL OR EVOLUTIONARY SEARCH "There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being the survival value."

81 TURING (1950)
From Turing's 1950 paper "Computing Machinery and Intelligence" …
"We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications"

82 TURING (1950) (CONTINUED)
"Structure of the child machine = Hereditary material"
"Changes of the child machine = Mutations"
"Natural selection = Judgment of the experimenter"

