1
GENETIC ALGORITHMS AND GENETIC PROGRAMMING
2
John R. Koza
3
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
4
GENETIC ALGORITHM (GA)
Generation 0               Generation 1
Individuals    Fitness     Offspring
011            $3          111
001            $1          010
110            $6
010            $2
5
HAMBURGER RESTAURANT PROBLEM
Price: 1 = $0.50 price, 0 = $10.00 price
Drink: 1 = Coca Cola, 0 = Wine
Ambiance: 1 = Fast snappy service, 0 = Leisurely service with tuxedoed waiter
6
CHROMOSOME (GENOME) OF THE GLOBAL OPTIMUM
McDONALD's 111
7
THE SEARCH SPACE
Alphabet size K=2, Length L=3
1  000
2  001
3  010
4  011
5  100
6  101
7  110
8  111
Size of search space: K^L = 2^L = 2^3 = 8
8
IMPRACTICALITY OF RANDOM OR ENUMERATIVE SEARCH
81-bit problems are very small for GA. However, even if L is as small as 81, 2^81 ≈ 2.4 × 10^24, within a few orders of magnitude of the roughly 10^27 nanoseconds since the beginning of the universe 15 billion years ago.
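The arithmetic on this slide can be checked directly. A minimal sketch; the helper name `search_space_size` and the 365.25-day year are illustrative assumptions, not part of the original slides.

```python
def search_space_size(k: int, length: int) -> int:
    """Number of distinct strings of the given length over an alphabet of size k."""
    return k ** length

# The 3-bit restaurant problem: 2^3 candidate strategies.
small = search_space_size(2, 3)

# An 81-bit problem.
large = search_space_size(2, 81)

# Nanoseconds in 15 billion years (assumption: 365.25-day years).
ns_since_beginning = 15e9 * 365.25 * 24 * 3600 * 1e9

print(small)
print(f"{large:.2e}")
print(f"{ns_since_beginning:.2e}")
```

Python integers are arbitrary precision, so `2 ** 81` is computed exactly.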
9
GA FLOWCHART
10
GENERATION 0
      Generation 0
#     String   Fitness
1     011      3
2     001      1
3     110      6
4     010      2
Total          12
Worst          1
Average        3.00
Best           6
11
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
12
PROBABILISTIC SELECTION BASED ON FITNESS
Better individuals are preferred
Best is not always picked
Worst is not necessarily excluded
Nothing is guaranteed
Mixture of greedy exploitation and adventurous exploration
Similarities to simulated annealing (SA)
13
PROBABILISTIC SELECTION BASED ON FITNESS
14
DARWINIAN FITNESS PROPORTIONATE SELECTION
      Generation 0                        Mating pool
#     String   Fitness   Fitness/Total    String   Fitness
1     011      3         .25              011      3
2     001      1         .08              110      6
3     110      6         .50              110      6
4     010      2         .17              010      2
Total          12                                  17
Worst          1                                   2
Average        3.00                                4.25
Best           6                                   6
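Fitness-proportionate selection can be sketched in a few lines. This is an illustration, not the presenter's code: it assumes, as the values on the slides suggest, that the fitness of each 3-bit string equals its decimal value, and the function names are hypothetical.

```python
import random

population = ["011", "001", "110", "010"]

def fitness(bits: str) -> int:
    """Assumed fitness for the running example: decimal value of the bit string."""
    return int(bits, 2)

def selection_probabilities(pop):
    """Each individual's share of the population's total fitness."""
    total = sum(fitness(ind) for ind in pop)
    return [fitness(ind) / total for ind in pop]

def select_mating_pool(pop, rng):
    """Draw len(pop) parents, with replacement, in proportion to fitness."""
    weights = [fitness(ind) for ind in pop]
    return rng.choices(pop, weights=weights, k=len(pop))

probs = selection_probabilities(population)
print([round(p, 2) for p in probs])  # [0.25, 0.08, 0.5, 0.17]

pool = select_mating_pool(population, random.Random(0))
print(pool)
```

Because the draw is random, the best string is favored but never guaranteed a place in the mating pool, and the worst is never strictly excluded.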
15
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
16
MUTATION OPERATION
Parent chosen probabilistically based on fitness
Mutation point chosen at random
One offspring
Parent            010
Mutation point    --0
Offspring         011
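The mutation step above can be sketched as a single bit flip. The function names and the 0-based position convention are assumptions made for illustration.

```python
import random

def mutate(parent: str, point: int) -> str:
    """Flip the bit at a 0-based position and return the offspring."""
    flipped = "1" if parent[point] == "0" else "0"
    return parent[:point] + flipped + parent[point + 1:]

def random_mutation(parent: str, rng: random.Random) -> str:
    """Pick the mutation point uniformly at random, then flip that bit."""
    return mutate(parent, rng.randrange(len(parent)))

# The slide's example: the third bit of 010 is flipped.
print(mutate("010", 2))  # 011
print(random_mutation("010", random.Random(1)))
```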
17
AFTER MUTATION OPERATION
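      Generation 0                        Mating pool         Crossover   Generation 1
#     String   Fitness   Fitness/Total    String   Fitness    point       String   Fitness
1     011      3         .25              011      3
2     001      1         .08              110      6
3     110      6         .50              110      6
4     010      2         .17              010      2          ---         011      3
Total          12                                  17
Worst          1                                   2
Average        3.00                                4.25
Best           6                                   6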
18
CROSSOVER OPERATION
2 parents chosen probabilistically based on fitness
Parent 1    Parent 2
011         110
19
CROSSOVER (CONTINUED)
Interstitial point picked at random
2 remainders
2 offspring produced by crossover
Fragment 1     Fragment 2
01-            11-
Remainder 1    Remainder 2
--1            --0
Offspring 1    Offspring 2
111            010
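One-point crossover as shown on this slide can be sketched as follows. The function names are illustrative; `point` counts the leading bits kept in each fragment, an assumed convention.

```python
import random

def crossover(parent1: str, parent2: str, point: int):
    """One-point crossover: each offspring joins one parent's fragment
    with the other parent's remainder."""
    fragment1, remainder1 = parent1[:point], parent1[point:]
    fragment2, remainder2 = parent2[:point], parent2[point:]
    offspring1 = fragment2 + remainder1
    offspring2 = fragment1 + remainder2
    return offspring1, offspring2

def random_crossover(parent1: str, parent2: str, rng: random.Random):
    """Pick one of the len - 1 interstitial points at random."""
    return crossover(parent1, parent2, rng.randrange(1, len(parent1)))

# The slide's example: parents 011 and 110, point between bits 2 and 3.
print(crossover("011", "110", 2))  # ('111', '010')
```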
20
AFTER CROSSOVER OPERATION
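      Generation 0                        Mating pool         Crossover   Generation 1
#     String   Fitness   Fitness/Total    String   Fitness    point       String   Fitness
1     011      3         .25              011      3          2           111      7
2     001      1         .08              110      6          2           010      2
3     110      6         .50              110      6
4     010      2         .17              010      2
Total          12                                  17
Worst          1                                   2
Average        3.00                                4.25
Best           6                                   6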
21
AFTER REPRODUCTION OPERATION
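      Generation 0                        Mating pool         Crossover   Generation 1
#     String   Fitness   Fitness/Total    String   Fitness    point       String   Fitness
1     011      3         .25              011      3
2     001      1         .08              110      6
3     110      6         .50              110      6          ---         110      6
4     010      2         .17              010      2
Total          12                                  17
Worst          1                                   2
Average        3.00                                4.25
Best           6                                   6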
22
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
23
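GENERATION 1
      Generation 0                        Mating pool         Crossover   Generation 1
#     String   Fitness   Fitness/Total    String   Fitness    point       String   Fitness
1     011      3         .25              011      3          2           111      7
2     001      1         .08              110      6          2           010      2
3     110      6         .50              110      6          ---         110      6
4     010      2         .17              010      2          ---         011      3
Total          12                                  17                              18
Worst          1                                   2                               2
Average        3.00                                4.25                            4.5
Best           6                                   6                               7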
24
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
25
DEFINITION OF THE GENETIC ALGORITHM (GA)
The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and using operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
26
PROBABILISTIC STEPS
The initial population is typically random
Probabilistic selection based on fitness
- Best is not always picked
- Worst is not necessarily excluded
Random picking of mutation and crossover points
Often, there is a probabilistic scenario as part of the fitness measure
27
ANTENNA DESIGN
28
ANTENNA DESIGN
The problem (Altshuler and Linden 1998) is to determine the x-y-z coordinates of the 3-dimensional position of the ends (X1, Y1, Z1, X2, Y2, Z2, …, X7, Y7, Z7) of 7 straight wires so that the resulting 7-wire antenna satisfies certain performance requirements.
The first wire starts at feed point (0, 0, 0) in the middle of the ground plane.
The antenna must fit inside the 0.5λ cube.
29
ANTENNA GENOME
X1 Y1 Z1 X2 Y2 Z2 …
+0010 -1110 +0001 +0011 -1011
105-bit chromosome (genome)
Each x-y-z coordinate is represented by 5 bits (4-bit granularity for data plus a sign bit)
Total chromosome is 3 × 7 × 5 = 105 bits
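Decoding such a chromosome can be sketched as below. Several details are assumptions made for illustration: the sign bit is taken to come first with 1 meaning negative, the result is left as signed integers (the slides do not give the scaling to physical coordinates), and the example chromosome simply zero-pads the five genes shown on the slide.

```python
def decode_antenna(chromosome: str):
    """Split a 105-bit string into 7 wire endpoints of signed 4-bit magnitudes."""
    assert len(chromosome) == 3 * 7 * 5
    coords = []
    for i in range(0, len(chromosome), 5):
        gene = chromosome[i:i + 5]
        sign = -1 if gene[0] == "1" else 1   # assumed: leading bit 1 means negative
        coords.append(sign * int(gene[1:], 2))
    # Group consecutive (x, y, z) triples, one per wire endpoint.
    return [tuple(coords[j:j + 3]) for j in range(0, len(coords), 3)]

# The five genes shown on the slide (+0010 -1110 +0001 +0011 -1011),
# zero-padded to 105 bits purely for illustration.
genes = ["00010", "11110", "00001", "00011", "11011"]
chromosome = "".join(genes).ljust(105, "0")

endpoints = decode_antenna(chromosome)
print(endpoints[0])      # (2, -14, 1)
print(endpoints[1][:2])  # (3, -11)
```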
30
ANTENNA FITNESS
Antenna is for ground-to-satellite communications for cars and handsets.
We desire a near-uniform gain pattern 10° above the horizon.
Fitness is measured based on the antenna's radiation pattern. The radiation pattern is simulated by the Numerical Electromagnetics Code (NEC).
31
ANTENNA FITNESS
Fitness is the sum of the squares of the difference between the average gain and the antenna's gain.
The sum is taken for angles between -90° and +90° and all azimuth angles from 0° to 180°.
The smaller the value of fitness, the better.
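The fitness formula can be sketched as a sum of squared deviations from the average gain. The sample gain values are hypothetical; in the actual work the gains come from the NEC simulation, which is not reproduced here, and taking the average over the same sampled angles is an assumption.

```python
def antenna_fitness(gains):
    """Sum of squared deviations of each sampled gain from the average gain."""
    average = sum(gains) / len(gains)
    return sum((g - average) ** 2 for g in gains)

uniform = [2.0, 2.0, 2.0, 2.0]   # hypothetical gains at four sampled angles
lumpy = [0.0, 4.0, 0.0, 4.0]

print(antenna_fitness(uniform))  # 0.0
print(antenna_fitness(lumpy))    # 16.0
```

A perfectly uniform pattern scores 0, the best possible value, which matches "the smaller, the better".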
32
GRAPH OF ANTENNA FITNESS
33
U. S. PATENT 5,719,794
34
10-MEMBER TRUSS
35
10-MEMBER TRUSS
Prespecified topological arrangement of the 10 members, the load, and the wall (Goldberg and Samtani 1986).
Truss has 10 members (6 have a length of 30 feet and 4 have a length of 30√2 ≈ 42.4 feet).
The problem is to determine the cross-sectional areas (A1, …, A10) of each of the 10 members so as to minimize the weight of the material for a truss that supports the 2 loads.
The weight is based on volume (i.e., cross-sectional area × length).
36
TRUSS GENOME
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
0010 1110 0001 0011 1011 1111 1010
40-bit chromosome (genome)
4-bit granularity for truss diameters
0000 = smallest diameter
1111 = largest diameter
Total chromosome is 4 × 10 = 40 bits
37
TRUSS FITNESS
Two-part (multiobjective) fitness measure:
First, fitness is computed by taking the sum, over the 10 members, of the cross-sectional area of each member times the length of each member (30 feet or 30√2 ≈ 42.4 feet).
Second, a penalty (up to 10%) is imposed for violating the stress constraints. Stresses are computed using standard mechanical engineering techniques.
The smaller the total fitness, the better.
38
CELLULAR AUTOMATA
39
STATE TRANSITION TABLE
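Columns: #, WWW, WW, W, X, E, EE, EEE, Rule
Each of the 128 rows (0 to 127) pairs one combination of the 7 neighborhood cells with its rule bit, a0 through a127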
40
CELLULAR AUTOMATA 128-bit chromosome (genome) A0 A1 A2 … A127 a0 a1 a2
41
PROBLEM-SPECIFIC GENOMES
N M GENOME 1
42
GENETIC ALGORITHM USING VARIABLE-LENGTH STRINGS
5-WIRE ANTENNA (5 × 15 = 75 bits)
X1 Y1 Z1 … X5 Y5 Z5
+0010 -1110 +0001
4-WIRE ANTENNA (4 × 15 = 60 bits)
X1 Y1 Z1 … X4 Y4 Z4
+1010 -0110 +1101 +1001
43
GENETIC PROGRAMMING
44
THE CHALLENGE "How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?" Attributed to Arthur Samuel (1959)
45
CRITERION FOR SUCCESS
"The aim [is] ... to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence." Arthur Samuel (1983)
46
REPRESENTATIONS
Binary decision diagrams
Decision trees
Formal grammars
Coefficients for polynomials
Reinforcement learning tables
Conceptual clusters
Classifier systems
If-then production rules
Horn clauses
Neural nets
Bayesian networks
Frames
Propositional logic
47
A COMPUTER PROGRAM
48
GENETIC PROGRAMMING (GP)
GP applies the approach of the genetic algorithm to the space of possible computer programs.
Computer programs are the lingua franca for expressing the solutions to a wide variety of problems.
A wide variety of seemingly different problems from many different fields can be reformulated as a search for a computer program to solve the problem.
49
GP MAIN POINTS Genetic programming now routinely delivers high-return human-competitive machine intelligence. Genetic programming is an automated invention machine. Genetic programming has delivered a progression of qualitatively more substantial results in synchrony with five approximately order-of-magnitude increases in the expenditure of computer time.
50
DEFINITION OF “HIGH-RETURN”
The AI ratio (the “artificial-to-intelligence” ratio) of a problem-solving method is the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular problem.
51
DEFINITION OF “ROUTINE”
A problem solving method is routine if it is general and relatively little human effort is required to get the method to successfully handle new problems within a particular domain and to successfully handle new problems from a different domain.
52
CRITERIA FOR “HUMAN-COMPETITIVENESS”
Previously patented, an improvement over a patented invention, or patentable today
Publishable in its own right as a new scientific result, independent of the fact that the result was mechanically created
Holds its own in regulated competition against humans (or programs)
5 other similar criteria that are “arms-length” from the fields of AI, ML, GP
53
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
Toy problems Human-competitive non-patent results 20th-century patented inventions 21st-century patented inventions Patentable new inventions
54
GP FLOWCHART
55
A COMPUTER PROGRAM IN C
int foo (int time)
{
    int temp1, temp2;
    if (time > 10)
        temp1 = 3;
    else
        temp1 = 4;
    temp2 = temp1 + 1 + 2;
    return (temp2);
}
56
OUTPUT OF C PROGRAM Time Output 6 1 2 3 4 5 7 8 9 10 11 12
57
PROGRAM TREE (+ 1 2 (IF (> TIME 10) 3 4))
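A program tree of this kind can be evaluated recursively. In this minimal sketch the nested-tuple encoding and the `evaluate` function are illustrative assumptions, not the representation used in the original GP systems.

```python
def evaluate(node, env):
    """Recursively evaluate a program tree given variable bindings."""
    if isinstance(node, str):          # terminal: variable name
        return env[node]
    if not isinstance(node, tuple):    # terminal: numeric constant
        return node
    op, *args = node
    if op == "+":
        return sum(evaluate(a, env) for a in args)
    if op == ">":
        return evaluate(args[0], env) > evaluate(args[1], env)
    if op == "IF":
        condition, then_branch, else_branch = args
        chosen = then_branch if evaluate(condition, env) else else_branch
        return evaluate(chosen, env)
    raise ValueError(f"unknown function: {op}")

# (+ 1 2 (IF (> TIME 10) 3 4))
tree = ("+", 1, 2, ("IF", (">", "TIME", 10), 3, 4))

print(evaluate(tree, {"TIME": 5}))   # 7
print(evaluate(tree, {"TIME": 11}))  # 6
```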
58
CREATING RANDOM PROGRAMS
Creation.avi (creation.gif converted to AVI movie file)
59
CREATING RANDOM PROGRAMS
Available functions F = {+, -, *, %, IFLTE}
Available terminals T = {X, Y, Random-Constants}
The random programs are:
Of different sizes and shapes
Syntactically valid
Executable
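Random creation of syntactically valid trees from these sets can be sketched with a recursive "grow" procedure. The arities (IFLTE taking 4 arguments), the 0.3 chance of stopping early, the constant range, and the depth limit are all assumptions chosen for illustration.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "%": 2, "IFLTE": 4}   # name -> assumed arity
TERMINALS = ["X", "Y"]

def random_terminal(rng):
    """Return a variable or, about one third of the time, a random constant."""
    if rng.random() < 1 / 3:
        return round(rng.uniform(-1.0, 1.0), 3)
    return rng.choice(TERMINALS)

def grow(rng, max_depth):
    """Build a random, syntactically valid tree no deeper than max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return random_terminal(rng)
    name = rng.choice(list(FUNCTIONS))
    children = [grow(rng, max_depth - 1) for _ in range(FUNCTIONS[name])]
    return (name, *children)

def depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

rng = random.Random(42)
programs = [grow(rng, max_depth=4) for _ in range(4)]
for program in programs:
    print(program)
```

Every function node receives exactly as many subtrees as its arity, so each tree is valid by construction, and the trees come out in different sizes and shapes.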
60
GP GENETIC OPERATIONS Reproduction Mutation
Crossover (sexual recombination) Architecture-altering operations
61
MUTATION OPERATION Mutation.avi
62
MUTATION OPERATION
Select 1 parent probabilistically based on fitness
Pick point from 1 to NUMBER-OF-POINTS
Delete subtree at the picked point
Grow new subtree at the mutation point in same way as generated trees for initial random population (generation 0)
The result is a syntactically valid executable program
Put the offspring into the next generation of the population
63
CROSSOVER OPERATION Crossover.avi
64
CROSSOVER OPERATION
Select 2 parents probabilistically based on fitness
Randomly pick a number from 1 to NUMBER-OF-POINTS for 1st parent
Independently randomly pick a number for 2nd parent
Identify the subtrees rooted at the two picked points and swap them
The result is a syntactically valid executable program
Put the offspring into the next generation of the population
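Subtree crossover on program trees can be sketched as below. The nested-tuple encoding, the depth-first numbering of points starting at 1, and all function names are assumptions made for illustration.

```python
import random

def count_points(tree):
    """Number of points (functions and terminals) in the tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_points(child) for child in tree[1:])

def get_subtree(tree, index):
    """Return the subtree rooted at the 1-based, depth-first point `index`."""
    if index == 1:
        return tree
    index -= 1
    for child in tree[1:]:
        size = count_points(child)
        if index <= size:
            return get_subtree(child, index)
        index -= size
    raise IndexError("point out of range")

def replace_subtree(tree, index, new):
    """Return a copy of `tree` with the subtree at point `index` replaced."""
    if index == 1:
        return new
    index -= 1
    children = []
    for child in tree[1:]:
        size = count_points(child)
        if 1 <= index <= size:
            children.append(replace_subtree(child, index, new))
        else:
            children.append(child)
        index -= size
    return (tree[0], *children)

def crossover(parent1, parent2, rng):
    """Pick one point in each parent independently and swap the subtrees."""
    p1 = rng.randint(1, count_points(parent1))
    p2 = rng.randint(1, count_points(parent2))
    sub1, sub2 = get_subtree(parent1, p1), get_subtree(parent2, p2)
    return replace_subtree(parent1, p1, sub2), replace_subtree(parent2, p2, sub1)

a = ("+", "X", 1)
b = ("*", "X", "X")
print(crossover(a, b, random.Random(0)))
```

Because whole subtrees are exchanged, both offspring remain syntactically valid whatever points are picked.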
65
REPRODUCTION OPERATION
Select parent probabilistically based on fitness Copy it (unchanged) into the next generation of the population
66
FIVE MAJOR PREPARATORY STEPS FOR GP
1 Determining the set of terminals
2 Determining the set of functions
3 Determining the fitness measure
4 Determining the parameters for the run (population size, number of generations, minor parameters)
5 Determining the method for designating a result and the criterion for terminating a run
67
ILLUSTRATIVE GP RUN
68
SYMBOLIC REGRESSION
Independent variable X    Dependent variable Y
-1.00                     1.00
-0.80                     0.84
-0.60                     0.76
-0.40                     0.76
-0.20                     0.84
0.00                      1.00
0.20                      1.24
0.40                      1.56
0.60                      1.96
0.80                      2.44
1.00                      3.00
69
PREPARATORY STEPS
Objective: Find a computer program with one input (independent variable X) whose output equals the given data
1 Terminal set: T = {X, Random-Constants}
2 Function set: F = {+, -, *, %}
3 Fitness: The sum of the absolute value of the differences between the candidate program’s output and the given data (computed over numerous values of the independent variable x from –1.0 to +1.0)
4 Parameters: Population size M = 4
5 Termination: An individual emerges whose sum of absolute errors is less than 0.1
70
SYMBOLIC REGRESSION
POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR GENERATION 0
71
SYMBOLIC REGRESSION x^2 + x + 1
FITNESS OF THE 4 INDIVIDUALS IN GEN 0
Individual    Program    Fitness
(a)           x + 1      0.67
(b)           x^2 + 1    1.00
(c)           2          1.70
(d)           x          2.67
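The four fitness values can be reproduced numerically. The figures on the slide match the area between each candidate and the target x^2 + x + 1 over [-1, 1], so this sketch estimates that area with a midpoint rule. Treating the fitness as an integral, and the step count, are assumptions made for illustration.

```python
def target(x):
    return x * x + x + 1

candidates = {
    "x + 1": lambda x: x + 1,
    "x^2 + 1": lambda x: x * x + 1,
    "2": lambda x: 2.0,
    "x": lambda x: x,
}

def fitness(program, n=20000):
    """Midpoint-rule estimate of the integral of |program(x) - target(x)| on [-1, 1]."""
    width = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * width
        total += abs(program(x) - target(x)) * width
    return total

for name, program in candidates.items():
    print(name, round(fitness(program), 2))
```

Lower is better here, so x + 1 is the best of the four individuals in generation 0.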
72
SYMBOLIC REGRESSION x^2 + x + 1
GENERATION 1
Copy of (a)
Mutant of (c), picking “2” as mutation point
First offspring of crossover of (a) and (b), picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
Second offspring of crossover of (a) and (b), picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
73
CLASSIFICATION
74
GP TABLEAU – INTERTWINED SPIRALS
Objective: Create a program to classify a given point in the x-y plane to the red or blue spiral
1 Terminal set: T = {X,Y,Random-Constants}
2 Function set: F = {+,-,*,%,IFLTE,SIN,COS}
3 Fitness: The number of correctly classified points (0 – 194)
4 Parameters: M = 10,000. G = 51
5 Termination: An individual program scores 194
75
WALL-FOLLOWER
76
FITNESS
77
BEST OF GENERATION 57
78
BOX MOVER – BEST OF GEN 0
79
BOX MOVER GEN 45 – FITNESS CASE 1
80
TRUCK BACKER UPPER
81
TRUCK BACKER UPPER
4-Dimensional control problem:
horizontal position, x
vertical position, y
angle between trailer and horizontal, θt
angle between trailer and cab, θd
One control variable (steering wheel turn angle)
State transition equations map the 4 state variables into 1 output (the control variable)
Simulation run over many initial conditions and over hundreds of time steps
82
GENETIC PROGRAMMING: ON THE PROGRAMMING OF COMPUTERS BY MEANS OF NATURAL SELECTION (Koza 1992)
BBB4031 (converted to BMP)
83
2 MAIN POINTS FROM 1992 BOOK Virtually all problems in artificial intelligence, machine learning, adaptive systems, and automated learning can be recast as a search for a computer program. Genetic programming provides a way to successfully conduct the search for a computer program in the space of computer programs.
84
SOME RESULTS FROM 1992 BOOK Intertwined Spirals Truck Backer Upper
Broom Balancer Wall Follower Box Mover Artificial Ant Differential Games Inverse Kinematics Central Place Foraging Block Stacking Randomizer Cellular Automata Task Prioritization Image Compression Econometric Equation Optimization Boolean Function Learning Co-Evolution of Game-Playing Strategies
85
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
Toy problems Human-competitive non-patent results 20th-century patented inventions 21st-century patented inventions Patentable new inventions
86
COMPUTER PROGRAMS
Subroutines provide one way to REUSE code, possibly with different instantiations of the dummy variables (formal parameters)
Loops (and iterations) provide a 2nd way to REUSE code
Recursion provides a 3rd way to REUSE code
Memory provides a 4th way to REUSE the results of executing code
87
SYMBOLIC REGRESSION Fitness case L0 W0 H0 L1 W1 H1
Dependent variable D 1 3 4 7 2 5 54 10 9 600 8 6 312 111 -18 -171 363 -36 -24 45
88
EVOLVED SOLUTION
(- (* (* W0 L0) H0) (* (* W1 L1) H1))
89
DIFFERENCE IN VOLUMES
D = L0W0H0 – L1W1H1
90
AUTOMATICALLY DEFINED FUNCTION volume
91
AUTOMATICALLY DEFINED FUNCTION volume
(progn
  (defun volume (arg0 arg1 arg2)
    (values (* arg0 (* arg1 arg2))))
  (values (- (volume L0 W0 H0) (volume L1 W1 H1))))
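For readers less familiar with LISP, the same two-branch structure can be sketched in Python. The function names are illustrative. The sample inputs are the first fitness case as read from the table in this deck, where H1 = 3 is inferred from V1 = 30 rather than read directly.

```python
def volume(arg0, arg1, arg2):
    """Counterpart of the automatically defined function: product of its three arguments."""
    return arg0 * (arg1 * arg2)

def result_producing_branch(l0, w0, h0, l1, w1, h1):
    """Counterpart of the main branch: calls `volume` twice with different arguments."""
    return volume(l0, w0, h0) - volume(l1, w1, h1)

# First fitness case: V0 = 84, V1 = 30, D = 54.
print(volume(3, 4, 7))                             # 84
print(volume(2, 5, 3))                             # 30
print(result_producing_branch(3, 4, 7, 2, 5, 3))   # 54
```

The one definition is reused with two different instantiations of its dummy arguments, which is the point of the ADF.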
92
AUTOMATICALLY DEFINED FUNCTIONS
ADFs provide a way to REUSE code Code is typically reused with different instantiations of the dummy variables (formal parameters)
93
ADDITION OF V0 AND V1 Fitness case L0 W0 H0 L1 W1 H1 V0 V1 D 1 3 4 7 2
5 84 30 54 10 9 630 600 8 6 360 48 312 135 24 111 42 -18 180 -171 405 363 18 -36 96 120 -24 80 35 45
94
DIVIDE AND CONQUER BBB012 (BMP from EPS)
95
DIVIDE AND CONQUER Decompose a problem into sub-problems
Solve the sub-problems Assemble the solutions of the sub-problems into a solution for the overall problem
96
CHANGE OF REPRESENTATION
BBB013 (BMP from EPS)
97
CHANGE OF REPRESENTATION
Identify regularities Change the representation Solve the overall problem
98
ADF IMPLEMENTATION
Each overall program in the population includes a main result-producing branch (RPB) and a function-defining branch (i.e., automatically defined function, ADF)
In generation 0, create random programs with different ingredients for the RPB and the ADF
Terminal set for ADF typically contains dummy arguments (formal parameters), such as ARG0, ARG1, …
Function set of the RPB contains ADF0
ADFs are private and associated with a particular individual program in the population
99
ADF MUTATION
Select parent probabilistically on the basis of fitness
Pick a mutation point from either RPB or an ADF
Delete sub-tree rooted at the picked point
Grow a new sub-tree at the picked point composed of the allowable ingredients appropriate for the picked point
The offspring is a syntactically valid executable program
100
ADF CROSSOVER
Select 2 parents probabilistically on the basis of fitness
Pick a crossover point from either RPB or an ADF of the FIRST parent
The choice of crossover point in the SECOND parent is RESTRICTED to the picked RPB or to the picked ADF
The sub-trees are swapped
The offspring are syntactically valid executable programs
101
GENETIC PROGRAMMING II: AUTOMATIC DISCOVERY OF REUSABLE PROGRAMS (Koza 1994)
BBB4032 converted to BMP
102
MAIN POINTS OF 1994 BOOK
Scalability is essential for solving non-trivial problems in artificial intelligence, machine learning, adaptive systems, and automated learning.
Scalability can be achieved by reuse.
Genetic programming provides a way to automatically discover and reuse subprograms in the course of automatically creating computer programs to solve problems.
103
COMPUTER PROGRAMS
Subroutines provide one way to REUSE code, possibly with different instantiations of the dummy variables (formal parameters)
Loops (and iterations) provide a 2nd way to REUSE code
Recursion provides a 3rd way to REUSE code
Memory provides a 4th way to REUSE the results of executing code
104
MEMORY Settable (named) variables Indexed vector memory Matrix memory
Relational memory
105
LANGDON'S DATA STRUCTURES
Stacks Queues Lists Rings
106
COMPUTER PROGRAMS
Subroutines provide one way to REUSE code, possibly with different instantiations of the dummy variables (formal parameters)
Loops (and iterations) provide a 2nd way to REUSE code
Recursion provides a 3rd way to REUSE code
Memory provides a 4th way to REUSE the results of executing code
107
AUTOMATICALLY DEFINED ITERATION (ADI)
The overall program includes an iteration-performing branch (IPB) in addition to a result-producing branch (RPB) and function-defining branches (ADF).
There are no infinite loops because the iteration is performed over a known, fixed set:
protein or DNA sequence (of varying length)
time-series data
two-dimensional array of pixels
Memory is usually involved and is used to communicate between IPB, RPB, and ADF.
108
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
Goal is to classify a given protein segment as being a transmembrane domain or non-transmembrane area of the protein
109
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
(progn
  (defun ADF0 ()
    (values (ORN (ORN (ORN (I?) (H?)) (ORN (P?) (G?))) (ORN (ORN (ORN (Y?) (N?)) (ORN (T?) (Q?))) (ORN (A?) (H?))))))
  (defun ADF1 ()
    (values (ORN (ORN (ORN (A?) (I?)) (ORN (L?) (W?))) (ORN (ORN (T?) (L?)) (ORN (T?) (W?))))))
  (defun ADF2 ()
    (values (ORN (ORN (ORN (ORN (ORN (D?) (E?)) (ORN (ORN (ORN (D?) (E?)) (ORN (ORN (T?) (W?)) (ORN (Q?) (D?)))) (ORN (K?) (P?)))) (ORN (K?) (P?))) (ORN (T?) (W?))) (ORN (ORN (E?) (A?)) (ORN (N?) (R?))))))
  (progn
    (loop-over-residues (SETM0 (+ (- (ADF1) (ADF2)) (SETM3 M0))))
    (values (% (% M3 M0) (% (% (% (- L -0.53) (* M0 M0)) (+ (% (% M3 M0) (% (+ M0 M3) (% M1 M2))) M2)) (% M3 M0))))))
110
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
in-sample correlation of 0.976 out-of-sample correlation of 0.968 out-of-sample error rate 1.6%
111
AUTOMATICALLY DEFINED LOOP (ADL)
loop initialization branch, LIB loop condition branch, LCB loop body branch, LBB loop update branch, LUB
112
ADL
113
COMPUTER PROGRAMS
Subroutines provide one way to REUSE code, possibly with different instantiations of the dummy variables (formal parameters)
Loops (and iterations) provide a 2nd way to REUSE code
Recursion provides a 3rd way to REUSE code
Memory provides a 4th way to REUSE the results of executing code
114
AUTOMATICALLY DEFINED RECURSION (ADR)
recursion condition branch, RCB recursion body branch, RBB recursion update branch, RUB recursion ground branch, RGB
115
ADR
116
HUMAN-COMPETITIVE RESULTS (NOT RELATED TO PATENTS)
Transmembrane segment identification problem for proteins
Motifs for D–E–A–D box family and manganese superoxide dismutase family of proteins
Cellular automata rule for Gacs-Kurdyumov-Levin (GKL) problem
Quantum algorithm for the Deutsch-Jozsa “early promise” problem
Quantum algorithm for Grover’s database search problem
Quantum algorithm for the depth-two AND/OR query problem
Quantum algorithm for the depth-one OR query problem
Protocol for communicating information through a quantum gate
Quantum dense coding
Soccer-playing program that won its first two games in the 1997 Robo Cup competition
Soccer-playing program that ranked in the middle of field in 1998 Robo Cup competition
Antenna designed by NASA for use on spacecraft
Sallen-Key filter
117
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
Toy problems Human-competitive non-patent results 20th-century patented inventions 21st-century patented inventions Patentable new inventions
118
GENETIC PROGRAMMING III: DARWINIAN INVENTION AND PROBLEM SOLVING (Koza, Bennett, Andre, Keane 1999)
BBB4031 converted to a BMP
119
SUBROUTINE DUPLICATION
Branch-duplication.avi
120
SUBROUTINE CREATION Branch-creation.avi
121
SUBROUTINE DELETION Branch-deletion.avi
122
ARGUMENT DUPLICATION Arg-duplication.avi
123
ARGUMENT DELETION Arg-deletion.avi
124
16 ATTRIBUTES OF A SYSTEM FOR AUTOMATICALLY CREATING COMPUTER PROGRAMS
1 Starts with "What needs to be done"
2 Tells us "How to do it"
3 Produces a computer program
4 Automatic determination of program size
5 Code reuse
6 Parameterized reuse
7 Internal storage
8 Iterations, loops, and recursions
9 Self-organization of hierarchies
10 Automatic determination of program architecture
11 Wide range of programming constructs
12 Well-defined
13 Problem-independent
14 Wide applicability
15 Scalable
16 Competitive with human-produced results
125
GENETIC PROGRAMMING PROBLEM SOLVER (GPPS)
BBB657 (pulled from word file)
126
AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND SIZING OF ANALOG ELECTRICAL CIRCUITS BY MEANS OF DEVELOPMENTAL GENETIC PROGRAMMING
127
AUTOMATED CIRCUIT SYNTHESIS
The topology of a circuit includes specifying the gross number of components in the circuit, the type of each component (e.g., a capacitor), and a netlist specifying where each lead of each component is to be connected. Sizing involves specifying the values (typically numerical) of each of the circuit's components.
128
COMPONENT-CREATING FUNCTIONS
Resistor R function Capacitor C function Inductor L function Diode D function Transistor Q function (3-leaded)
129
COMPONENT-CREATING FUNCTIONS
Transistor.avi
130
TOPOLOGY-MODIFYING FUNCTIONS
SERIES division PARALLEL division VIA FLIP
131
TOPOLOGY-MODIFYING FUNCTIONS
Parallel-division.avi
132
DEVELOPMENT-CONTROLLING FUNCTIONS
END function NOP (No Operation) function SAFE_CUT function
133
THE INITIAL CIRCUIT BBB089 (BMP version)
134
DEVELOPMENTAL GP (LIST (C (– (– (– ) 0.880)) (series (flip end) (series (flip end) (L end) end) (L (– ) (L end)))) (flip (nop (L end)))))
135
CAPACITOR-CREATING FUNCTION
(LIST (C (– (– (– -0.113) 0.880)) (series (flip end) (series (flip end) (L – 0.277 end) end) (L (– 0.749) (L end)))) (flip (nop (L end)))))
136
CAPACITOR-CREATING FUNCTION
137
SERIES DIVISION FUNCTION
(LIST (C (– (– (– -0.113) 0.880)) (series (flip end) (series (flip end) (L – 0.277 end) end) (L (– 0.749) (L end)))) (flip (nop (L end)))))
138
SERIES DIVISION
139
DEVELOPMENTAL GP
140
EVALUATION OF FITNESS
141
DESIRED BEHAVIOR OF A LOWPASS FILTER
142
EVOLVED CAMPBELL FILTER
U. S. patent 1,227,113, George Campbell, American Telephone and Telegraph, 1917
143
EVOLVED ZOBEL FILTER
U. S. patent 1,538,964, Otto Zobel, American Telephone and Telegraph Company, 1925
144
EVOLVED SALLEN-KEY FILTER
BBB4021 (BMP version)
145
EVOLVED DARLINGTON EMITTER-FOLLOWER SECTION
U. S. patent 2,663,806, Sidney Darlington, Bell Telephone Laboratories, 1953
146
NEGATIVE FEEDBACK
147
HAROLD BLACK’S RIDE ON THE LACKAWANNA FERRY
Courtesy of Lucent Technologies
148
20th-CENTURY PATENTS
Campbell ladder topology for filters
Zobel “M-derived half section” and “constant K” filter sections
Crossover filter
Negative feedback
Cauer (elliptic) topology for filters
PID and PID-D2 controllers
Darlington emitter-follower section and voltage gain stage
Sorting network for seven items using only 16 steps
60 and 96 decibel amplifiers
Analog computational circuits
Real-time analog circuit for time-optimal robot control
Electronic thermometer
Voltage reference circuit
Philbrick circuit
NAND circuit
Simultaneous synthesis of topology, sizing, placement, and routing
149
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
Toy problems Human-competitive non-patent results 20th-century patented inventions 21st-century patented inventions Patentable new inventions
150
SIX POST-2000 PATENTED INVENTIONS
151
EVOLVED HIGH CURRENT LOAD CIRCUIT
BBB3321
152
REGISTER-CONTROLLED CAPACITOR CIRCUIT
BBB3418
153
LOW-VOLTAGE CUBIC CIRCUIT
BBB3218
154
VOLTAGE-CURRENT-CONVERSION CIRCUIT
BBB3507
155
LOW-VOLTAGE BALUN CIRCUIT
BBB3113
156
TUNABLE INTEGRATED ACTIVE FILTER
BBB3761
157
21st-CENTURY PATENTED INVENTIONS
Low-voltage balun circuit Mixed analog-digital variable capacitor circuit High-current load circuit Voltage-current conversion circuit Cubic function generator Tunable integrated active filter
158
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
Toy problems Human-competitive non-patent results 20th-century patented inventions 21st-century patented inventions Patentable new inventions
159
NOVELTY-DRIVEN EVOLUTION
Two factors in fitness measure:
1 Circuit’s behavior in the frequency domain
2 Largest number of nodes and edges (circuit components) of a subgraph of the given circuit that is isomorphic to a subgraph of a template representing the prior art
Graph isomorphism algorithm with the cost function being based on the number of shared nodes and edges (instead of just the number of nodes).
160
NOVELTY-DRIVEN EVOLUTION
For circuits not scoring the maximum number of hits (101), the fitness of a circuit is the product of the two factors. For circuits scoring 101 hits (100%-compliant individuals), fitness is the number of shared nodes and edges divided by 10,000.
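The two-case rule above can be written as a small function. This is a sketch under stated assumptions: the parameter names are hypothetical, `frequency_factor` stands for the frequency-domain factor, whose computation the slides do not spell out, and the sample numbers are invented.

```python
MAX_HITS = 101  # number of hits scored by a 100%-compliant circuit

def novelty_fitness(hits, frequency_factor, shared_nodes_and_edges):
    """Fitness for novelty-driven evolution (smaller is better, as elsewhere in the deck)."""
    if hits < MAX_HITS:
        # Not yet compliant: product of the two factors.
        return frequency_factor * shared_nodes_and_edges
    # Fully compliant: only similarity to the prior-art template counts.
    return shared_nodes_and_edges / 10_000

print(novelty_fitness(90, 3.5, 12))    # 42.0
print(novelty_fitness(101, 0.0, 12))   # 0.0012
```

Among compliant circuits, the one sharing the fewest nodes and edges with the prior-art template receives the smallest value.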
161
PRIOR ART TEMPLATE
162
NON-INFRINGING SOLUTION NO. 1
163
NON-INFRINGING SOLUTION NO. 5
164
GP AS AN INVENTION MACHINE
165
CIRCUIT-CONSTRUCTING PROGRAM TREE WITH ADFs
166
LOWPASS FILTER WITH ADFs
167
ADF0
168
AUTOMATIC SYNTHESIS OF CIRCUIT LAYOUT INCLUDING THE PLACEMENT OF COMPONENTS AND ROUTING OF WIRES ALONG WITH THE TOPOLOGY AND SIZING
169
CIRCUIT LAYOUT Circuit placement involves the assignment of each of the circuit's components to a particular physical location on a printed circuit board or silicon wafer. Routing involves the assignment of a particular physical location to the wires between the leads of the circuit's components.
170
LAYOUT Layout.avi
171
LAYOUT — GENERATION 0
172
100%-COMPLIANT LOWPASS FILTER GENERATION 25 WITH 5 CAPACITORS AND 11 INDUCTORS AREA OF 1775.2
BBB3837
173
100%-COMPLIANT LOWPASS FILTER GENERATION 30 WITH 10 INDUCTORS AND 5 CAPACITORS AREA OF 950.3
BBB3838
174
100%-COMPLIANT LOWPASS FILTER BEST-OF-RUN CIRCUIT OF GENERATION 138 WITH 4 INDUCTORS AND 4 CAPACITORS AREA OF 359.4 BBB3839
175
LAYOUT 60 DB AMPLIFIER BBB2272
176
AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND TUNING OF CONTROLLERS
177
PROGRAM TREE FOR A CONTROLLER
BBB2207
178
CONTROLLER BLOCKS gain integrator differentiator adder subtractor
multiplier differential input integrators inverter lead lag two-parameter lag absolute value limiter divider delay conditional operators (switches)
179
FUNCTION SET FOR CONTROLLER SYNTHESIS
F = {GAIN,INVERTER,LEAD,LAG,LAG2, DIFFERENTIAL_INPUT_INTEGRATOR, DIFFERENTIATOR, ADD_SIGNAL, SUB_SIGNAL,ADD_3_SIGNAL,ADF0, ADF1,ADF2,ADF3,ADF4}
180
TERMINAL SET FOR CONTROLLER SYNTHESIS
T = { REFERENCE_SIGNAL, CONTROLLER_OUTPUT, PLANT_OUTPUT}
181
CONSTRAINED SYNTACTIC STRUCTURE
A grammar that specifies what functions and terminals are allowed as arguments to particular functions. For example, the first argument of the GAIN function must be a value-setting subtree, whereas the second can be from the general pool of functions. Also known as strong typing.
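A constrained syntactic structure can be sketched as a table of argument types that both the tree generator and a validity check consult. The sketch below is an assumption-laden illustration: only the GAIN rule comes from the slide, while the arities of the other functions, the use of a random constant as a stand-in for a value-setting subtree, and all helper names are hypothetical.

```python
import random

# Argument kinds per function: "value" = value-setting subtree,
# "signal" = general pool of functions and terminals.
# Only the GAIN entry is taken from the slide; the rest are assumed.
SIGNATURES = {
    "GAIN": ("value", "signal"),
    "INVERTER": ("signal",),
    "DIFFERENTIATOR": ("signal",),
    "ADD_SIGNAL": ("signal", "signal"),
    "SUB_SIGNAL": ("signal", "signal"),
}
TERMINALS = ["REFERENCE_SIGNAL", "CONTROLLER_OUTPUT", "PLANT_OUTPUT"]


def grow(kind, depth, rng):
    """Randomly grow a subtree of the requested kind."""
    if kind == "value":
        # Stand-in for a value-setting subtree: a single numeric constant.
        return round(rng.uniform(-5.0, 5.0), 3)
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(SIGNATURES))
    return [name] + [grow(k, depth - 1, rng) for k in SIGNATURES[name]]


def is_valid(tree, kind="signal"):
    """Check that every argument position holds a subtree of the allowed kind."""
    if kind == "value":
        return isinstance(tree, float)
    if isinstance(tree, str):
        return tree in TERMINALS
    if not isinstance(tree, list) or not tree or tree[0] not in SIGNATURES:
        return False
    kinds = SIGNATURES[tree[0]]
    return len(tree) == len(kinds) + 1 and all(
        is_valid(arg, k) for arg, k in zip(tree[1:], kinds)
    )
```

Because `grow` only ever asks for the kind listed in `SIGNATURES`, every generated tree satisfies the grammar; crossover and mutation would need the same check to preserve it.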
182
TWO-LAG PLANT
183
8 FITNESS CASES The 8 elements of the fitness measure represent 2 × 2 × 2 = 8 choices: 2 different values of the plant's internal gain, K (1.0 and 2.0); 2 different values of the plant's time constant (0.5 and 1.0); and 2 different values for the height of the reference signal (rising from 0 to 1 volt or from 0 to 1 microvolt at t = 100 milliseconds).
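The fitness cases are the Cartesian product of the three two-valued choices, which a short Python sketch makes explicit. The constant names are hypothetical; the values are the ones listed on the slide.

```python
from itertools import product

PLANT_GAINS = (1.0, 2.0)          # plant's internal gain, K
TIME_CONSTANTS = (0.5, 1.0)       # plant's time constant
STEP_HEIGHTS_VOLTS = (1.0, 1e-6)  # reference step: 1 volt or 1 microvolt

# Every combination of gain, time constant, and step height.
FITNESS_CASES = list(product(PLANT_GAINS, TIME_CONSTANTS, STEP_HEIGHTS_VOLTS))

for gain, tau, height in FITNESS_CASES:
    print(f"K={gain}, time constant={tau}, step height={height} V")
```

The product of three two-element sets has 2 × 2 × 2 = 8 members, one per fitness case.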
184
FITNESS MEASURE For each of these 8 fitness cases, a transient analysis (time domain) is performed using the SPICE simulator. The contribution to fitness for each of the 8 elements is the integral of time-weighted absolute error (ITAE), where e(t) is the difference between the plant output and the reference signal. Multiplication by B (10^6 or 1) makes both reference signals equally influential. An additional weighting function, A, heavily penalizes non-compliant amounts of overshoot: A weights all variations up to 2% above the reference signal by 1.0, but bigger variations by 10.0.
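A weighted ITAE contribution of this kind can be approximated numerically from sampled waveforms. The sketch below is a rough illustration, not the original implementation: it assumes the samples already come from a transient simulation, that the reference is positive at every sample, that "more than 2% above the reference" is the overshoot test for A, and that a trapezoidal rule is an acceptable integrator. The function name and signature are hypothetical.

```python
def itae_contribution(times, plant_output, reference, b):
    """Approximate the integral of t * |e(t)| * A * B over the sampled interval.

    times, plant_output, reference: equal-length sequences of samples.
    b: scaling factor B (for example 1e6 for the microvolt step, 1 for the volt step).
    """

    def integrand(t, y, r):
        error = y - r
        # A = 10.0 for overshoot beyond 2% of the reference, otherwise 1.0 (assumed test).
        a = 10.0 if y > 1.02 * r else 1.0
        return t * abs(error) * a * b

    samples = list(zip(times, plant_output, reference))
    total = 0.0
    # Trapezoidal rule over consecutive sample pairs.
    for (t0, y0, r0), (t1, y1, r1) in zip(samples, samples[1:]):
        total += 0.5 * (integrand(t0, y0, r0) + integrand(t1, y1, r1)) * (t1 - t0)
    return total
```

With B chosen as the reciprocal of the step height, a response to the 1 microvolt step and a response of the same shape to the 1 volt step yield the same contribution, which is the equalizing effect described on the slide.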
185
EVOLVED CONTROLLER FOR TWO-LAG PLANT
BBB2210
186
LESS ITAE AND OVERSHOOT
Figure BBB2240 (overshoot, evolved controller)
187
BETTER DISTURBANCE REJECTION
Figure BBB2241 (disturbance rejection, evolved controller)
188
REVERSE ENGINEERING OF METABOLIC PATHWAYS
BBB2531
189
EVOLVED PATHWAY BBB2537
190
ANTENNA SYNTHESIS USING GP
(PROGN3 (TURN-RIGHT 0.125) (LANDMARK (REPEAT 2 (PROGN2 (DRAW 1.0 HALF-MM-WIRE) (DRAW 0.5 NO-WIRE))) (TRANSLATE-RIGHT )) BBB2400
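A program tree like the one above can be executed by a small turtle interpreter. The following Python sketch is hypothetical throughout: it assumes TURN-RIGHT takes an angle in radians (the slide does not state the unit), that DRAW moves the turtle forward and lays wire unless the material is NO-WIRE, that REPEAT runs its body a fixed number of times, and that LANDMARK restores the turtle's position and heading after its body finishes. TRANSLATE-RIGHT is left out because its arguments are not shown on the slide.

```python
import math


def run(program, state=None, wires=None):
    """Interpret a nested-tuple turtle program and return the wire segments drawn."""
    state = state if state is not None else {"x": 0.0, "y": 0.0, "heading": 0.0}
    wires = wires if wires is not None else []
    op, *args = program
    if op.startswith("PROGN"):
        # Sequencing: evaluate each argument in order.
        for sub in args:
            run(sub, state, wires)
    elif op == "TURN-RIGHT":
        # Clockwise turn; angle unit (radians) is an assumption.
        state["heading"] -= args[0]
    elif op == "DRAW":
        distance, material = args
        x1 = state["x"] + distance * math.cos(state["heading"])
        y1 = state["y"] + distance * math.sin(state["heading"])
        if material != "NO-WIRE":
            wires.append(((state["x"], state["y"]), (x1, y1), material))
        state["x"], state["y"] = x1, y1
    elif op == "REPEAT":
        count, body = args
        for _ in range(count):
            run(body, state, wires)
    elif op == "LANDMARK":
        # Execute the body, then return the turtle to where it started.
        saved = dict(state)
        run(args[0], state, wires)
        state.update(saved)
    else:
        raise ValueError(f"unsupported operation: {op}")
    return wires


# The slide's program without the TRANSLATE-RIGHT branch.
PROGRAM = (
    "PROGN2",
    ("TURN-RIGHT", 0.125),
    ("LANDMARK",
     ("REPEAT", 2,
      ("PROGN2",
       ("DRAW", 1.0, "HALF-MM-WIRE"),
       ("DRAW", 0.5, "NO-WIRE")))),
)
```

Calling `run(PROGRAM)` yields two wire segments separated by a gap, all along the turtle's heading after the initial turn.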
191
USING A TURTLE TO DRAW TWO-DIMENSIONAL ANTENNA
GP2dAntennaExample.avi
192
BEST-OF-RUN ANTENNA FROM GENERATION 90
BBB2397 (BMP from EPS)
193
3-DIMENSIONAL ANTENNA Antenna3axis.avi
194
NASA EVOLVED ANTENNA lohn-st5-evolved-antenna.gif (bmp version for power point) To be on satellite to be launched in 2004
195
OTHER STRUCTURES Bridge.avi
196
GENETIC NETWORK FOR lac operon
BBB2551 (BMP from EPS)
197
EVOLVED NETWORK (IF (< LACTOSE_LEVEL 9.139 ) (IF (<
REPRESSOR_LEVEL ) (IF (> GLUCOSE_LEVEL 5.491 ) 2.02 (IF (< CAP_LEVEL ) (IF (< CAP_LEVEL ) (IF (> LACTOSE_LEVEL ) (IF (> CAP_LEVEL ) (IF (> LACTOSE_LEVEL ) ) ) 0.0 ) (IF (> REPRESSOR_LEVEL ) (IF (< GLUCOSE_LEVEL ) 10.0 (IF (< REPRESSOR_LEVEL 4.268 ) ) ) ) ) ) ) (IF (> CAP_LEVEL 0.842 ) ) ) (IF (< CAP_LEVEL ) 2.022 (IF (< GLUCOSE_LEVEL ) (IF (> LACTOSE_LEVEL ) (IF (> LACTOSE_LEVEL 1.933 ) (IF (> GLUCOSE_LEVEL ) (IF (< GLUCOSE_LEVEL ) (IF (> CAP_LEVEL 1.208 ) ) ) 10.0 ) (IF (> GLUCOSE_LEVEL ) ) ) ) ) 1.982 ) ) )
198
IN C-STYLE PSEUDO CODE else if(LACTOSE_LEVEL < 9.139) { {
if(CAP_LEVEL < 1.769) LAC_mRNA_LEVEL = 2.022; } if(GLUCOSE_LEVEL < 2.382) LAC_mRNA_LEVEL = 10.0; LAC_mRNA_LEVEL = 1.982; } } } if(LACTOSE_LEVEL < 9.139) { if(REPRESSOR_LEVEL < 6.270) LAC_mRNA_LEVEL = 2.022; } else LAC_mRNA_LEVEL = 0.0;
199
PARAMETERIZED TOPOLOGIES
One of the most important characteristics of computer programs is that they ordinarily contain inputs (free variables) and conditional operations
200
PARAMETERIZED TOPOLOGY FOR LOWPASS FILTER
parameterizedLPF.avi
201
PARAMETERIZED TOPOLOGY FOR HIGHPASS FILTER
parameterizedHPF.avi
202
PARAMETERIZED TOPOLOGY FOR GENERAL-PURPOSE CONTROLLER
parameterizedEurekaController.avi
203
EVOLVED EQUATIONS FOR GENERAL-PURPOSE CONTROLLER
Parameterized31and34.gif (bmp version for power point)
204
EVOLVED EQUATIONS FOR GENERAL-PURPOSE CONTROLLER
Parameterized32and33.gif (bmp for power point)
205
PATENTABLE NEW INVENTIONS
PID tuning rules that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules. General-purpose controllers outperforming Ziegler-Nichols and Åström-Hägglund rules.
206
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
Toy problems; human-competitive non-patent results; 20th-century patented inventions; 21st-century patented inventions; patentable new inventions
207
PARALLELIZATION WITH SEMI-ISOLATED SUBPOPULATIONS
No BBB information, using image from word file
208
GP PARALLELIZATION Like Hormel, Get Everything Out of the Pig, Including the Oink Keep on Trucking It Takes a Licking and Keeps on Ticking The Whole is Greater than the Sum of the Parts
209
PETA-OPS Human brain operates at 10^12 neurons operating at 10^3 per second = 10^15 ops per second. 10^15 ops = 1 peta-op = 1 bs (brain second)
210
EVOLVABLE HARDWARE CORNER OF XILINX XC6216
211
FUNCTION UNIT FOR CELL OF XILINX XC6216
212
SORTING NETWORK
213
EVOLVED SORTING NETWORK
214
GP 1987–2002
System | Dates | Speed-up over first system | Human-competitive results | Problem category
Serial LISP | 1987–1994 | 1 (base) | | toy problems
64 transputers | 1994–1997 | 9 | 2 | human-competitive results not related to patented inventions
64 PowerPC’s | 1995–2000 | 204 | 12 | 20th-century patented inventions
70 Alpha’s | 1999–2001 | 1,481 | |
1,000 Pentium II’s | 2000–2002 | 13,900 | | 21st-century patented inventions
4-week runs on 1,000 Pentium II’s | | 130,000 | | patentable new inventions
215
PROMISING GP APPLICATION AREAS
Problem areas involving many variables that are interrelated in highly non-linear ways Inter-relationship of variables is not well understood Discovery of the size and shape of the solution is a major part of the problem "Black art" problems
216
PROMISING GP APPLICATION AREAS (CONTINUED)
Areas where you simply have no idea how to program a solution, but where you know what you want
217
PROMISING GP APPLICATION AREAS (CONTINUED)
Problem areas where a good approximate solution is satisfactory: design, control, bioinformatics, classification, data mining, system identification, forecasting
218
PROMISING GP APPLICATION AREAS (CONTINUED)
Areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data: genome, protein, and microarray data; satellite image data; astronomical data; petroleum databases; financial databases; medical records; marketing databases
219
PROMISING GP APPLICATION AREAS (CONTINUED)
Areas for which humans find it very difficult to write good programs: parallel computers, cellular automata, multi-agent strategies, field-programmable gate arrays, digital signal processors, swarm intelligence
220
CHARACTERISTICS SUGGESTING USE OF GP
(1) discovering the size and shape of the solution, (2) reusing substructures, (3) discovering the number of substructures, (4) discovering the nature of the hierarchical references among substructures, (5) passing parameters to a substructure, (6) discovering the type of substructures (e.g., subroutines, iterations, loops, recursions, or storage), (7) discovering the number of arguments possessed by a substructure, (8) maintaining syntactic validity and locality by means of a developmental process, or (9) discovering a general solution in the form of a parameterized topology containing free variables
221
DESIGNING A GIRAFFE Long neck Long tongue
Vegetable-digesting enzymes in stomach 4 legs Long legs Brown coloration
222
THE DESIGN OF A GOOD GIRAFFE
Trait | Value | Type
Neck length | 15.11 feet | Floating point
Tongue length | 14 inches | Floating point
Carnivorous? | No | Boolean
Number of legs | 4 | Integer
Leg length | 9.96 feet | Floating point
Coloration | Brown | Categorical
223
NON-LINEARITY — GIRAFFE
Taken one by one, some gene values found in a giraffe, such as the long neck, contribute (alone) negatively to fitness. A long neck requires considerable material to construct, requires considerable energy to maintain, and is prone to injury (thereby hurting rate of survival and reproduction). Thus, maximizing any one variable will not lead to the global optimum solution.
224
NON-LINEARITY (CONTINUED)
When the variables are taken in pairs (there are 15 possible pairs), many combinations of pairs (e.g., Long neck and long tongue) are doubly detrimental
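The count of 15 pairs follows from choosing 2 of the 6 giraffe design variables, which can be checked with a brief Python sketch. The trait labels are taken from the giraffe design slide; the variable names are hypothetical.

```python
from itertools import combinations

TRAITS = [
    "neck length",
    "tongue length",
    "carnivorous",
    "number of legs",
    "leg length",
    "coloration",
]

# All unordered pairs of distinct traits: C(6, 2) of them.
PAIRS = list(combinations(TRAITS, 2))
print(len(PAIRS))
```

Each pair would need its own evaluation if traits were studied two at a time, and the number of combinations grows quickly once triples and larger sets are considered.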
225
NON-LINEARITY (CONTINUED)
But, certain combinations of traits, when taken together, are "co-adapted sets of alleles" that yield a very fit animal for eating high acacia leaves in the jungle environment, having good camouflage, having high escape velocity when faced with predators, and exploiting a niche (and avoiding competition) with other animals feeding on low-hanging vegetation
226
SEARCH METHODS IN GENERAL
Initial structure(s) Fitness measure Operations for creating new structures Parameters Termination criterion and method of designating the result
227
SPACE WITH MANY LOCAL OPTIMA
228
SEARCH METHODS Blind random search does not use acquired information in deciding on the future direction of the search. Hill climbing and gradient descent use acquired information; however, they are prone to becoming trapped on local optima. The previous point is especially true for non-trivial search spaces.
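The trapping behaviour of greedy search can be seen on a tiny one-dimensional landscape. The sketch below is a toy illustration with made-up landscape values and a hypothetical `hill_climb` helper; it only shows that the end point depends on the starting point.

```python
# Toy landscape (values are illustrative): a local optimum at index 1
# and the global optimum at index 5.
LANDSCAPE = [1, 3, 2, 0, 4, 9, 5]


def hill_climb(landscape, start):
    """Greedy ascent: move to the better neighbor until no neighbor improves."""
    position = start
    while True:
        neighbors = [i for i in (position - 1, position + 1)
                     if 0 <= i < len(landscape)]
        best = max(neighbors, key=lambda i: landscape[i])
        if landscape[best] <= landscape[position]:
            return position  # no improving neighbor: a (possibly local) optimum
        position = best


for start in (0, 3):
    end = hill_climb(LANDSCAPE, start)
    print(f"start={start} -> stops at index {end} with value {LANDSCAPE[end]}")
```

Starting at index 0 the climber stops on the local peak, while starting at index 3 it reaches the global peak; nothing in the greedy rule lets it escape once it is on the lower hill.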
229
7 DIFFERENCES BETWEEN GP AND ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING APPROACHES
230
REPRESENTATION Genetic programming overtly conducts its search for a solution to the given problem in program space
231
ROLE OF POINT-TO-POINT TRANSFORMATIONS IN THE SEARCH
Genetic programming does not conduct its search by transforming a single point in the search space into another single point, but instead transforms a set of points into another set of points
232
ROLE OF HILL CLIMBING IN THE SEARCH
Genetic programming does not rely exclusively on greedy hill climbing to conduct its search, but instead allocates a certain number of trials, in a principled way, to choices that are known to be inferior
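One common way to give inferior individuals a nonzero but smaller share of trials is fitness-proportionate selection, sketched below in Python. This is an illustration of that particular scheme, not necessarily the one used in the runs described in this deck; the population reuses the four strings and fitness values from the Generation 0 slide, and the helper names are hypothetical.

```python
import random
from collections import Counter

# Individuals and fitness values from the Generation 0 example.
POPULATION = {"011": 3, "001": 1, "110": 6, "010": 2}


def selection_probabilities(population):
    """Probability of picking each individual, proportional to its fitness."""
    total = sum(population.values())
    return {ind: fit / total for ind, fit in population.items()}


def select(population, k, rng):
    """Draw k individuals with replacement, weighted by fitness."""
    individuals = list(population)
    weights = [population[ind] for ind in individuals]
    return rng.choices(individuals, weights=weights, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    print(selection_probabilities(POPULATION))
    print(Counter(select(POPULATION, 1200, rng)))
```

The best individual is favored but not guaranteed, and the worst keeps a small chance of being chosen, which is the mixture of exploitation and exploration the slide describes.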
233
DETERMINISM IN THE SEARCH
Genetic programming conducts its search probabilistically
234
ROLE OF AN EXPLICIT KNOWLEDGE BASE
Genetic programming does NOT make use of a knowledge base
235
ROLE OF FORMAL LOGIC IN THE SEARCH
Genetic programming does not utilize formal logic in its search strategy. Contradictory alternatives are created and actively maintained.
236
UNDERPINNINGS OF THE TECHNIQUE
Biologically inspired
237
TURING (1948) Turing made the connection between searches and the challenge of getting a computer to solve a problem without explicitly programming it in his 1948 essay "Intelligent Machinery": "Further research into intelligence of machinery will probably be very greatly concerned with 'searches' ..."
238
TURING’S 3 APPROACHES TO MACHINE INTELLIGENCE (1948)
LOGIC-BASED SEARCH One approach that Turing identified is a search through the space of integers representing candidate computer programs.
239
TURING’S 3 APPROACHES (CONTINUED)
CULTURAL SEARCH A second approach is the "cultural search", which relies on knowledge and expertise acquired over a period of years from others (akin to present-day knowledge-based systems).
240
TURING’S 3 APPROACHES (CONTINUED)
GENETICAL OR EVOLUTIONARY SEARCH "There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being the survival value."
241
TURING (1950) From Turing’s 1950 paper "Computing Machinery and Intelligence": "We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications"
242
TURING (1950) (CONTINUED)
"Structure of the child machine = Hereditary material
Changes of the child machine = Mutations
Natural selection = Judgment of the experimenter"