
1 Early work in intelligent systems: Alan Turing (1912–1954), Arthur Samuel (1901–1990)

2 Early work in intelligent systems
Alan Turing (1912–1954): father of computer science, mathematician, philosopher, WWII codebreaker, homosexual
– The Turing Machine
– The Turing Test (AI)

3 Early work in intelligent systems
Alan Turing (1950): "We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications:
– Structure of the child machine = hereditary material
– Changes of the child machine = mutations
– Natural selection = judgment of the experimenter"

4 Early work in intelligent systems
Arthur Samuel (1901–1990):
“How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?” (1959)
“The aim is to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence.” (1983)

5 Genetic Programming
Breed a population of computer programs to solve a given problem:
– An extension of genetic algorithms
– Selection, crossover, mutation

6 Preparatory Steps
John Koza: the human user supplies:
(1) The set of terminals (e.g., the independent variables of the problem, zero-argument functions, and random constants)
(2) The set of primitive functions for each branch of the program to be evolved
(3) The fitness measure
(4) The parameters for controlling the run
(5) The termination criteria

7 1. Terminal Set
– External inputs to the program
– Numerical constants (problem dependent): π, e, 0, 1, …, random numbers, …

8 2. Function Set
– Arithmetic functions
– Conditional branches (if statements)
– Problem-specific functions (controllers, filters, integrators, differentiators, circuit elements, …)

9 3. Fitness Measure
GP measures the fitness of each individual (computer program). Fitness is usually averaged over a variety of different cases:
– Program inputs
– Initial conditions
– Different environments

10 4. Control Parameters
– Population size (thousands or millions)
– Selection method
– Crossover probability
– Mutation probability
– Maximum program size
– Elitism option

11 5. Termination Criterion
– Maximum number of generations / real time
– Convergence of highest / mean fitness
– …

12 GP Flowchart
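The flowchart figure is not reproduced in this transcript. Below is a minimal Python sketch of the generational loop it depicts; `random_program`, `fitness`, `crossover`, and `mutate` are placeholder callables supplied by the user (the preparatory steps above), not names from the slides, and `fitness` is taken to be an error measure (lower is better).

```python
import random

def run_gp(pop_size, max_gens, p_c, p_m, target_error,
           random_program, fitness, crossover, mutate):
    """Generational GP loop: initialize, evaluate, breed, repeat.
    Assumes p_c + p_m <= 1; the remainder is reproduction (cloning)."""
    population = [random_program() for _ in range(pop_size)]
    for _ in range(max_gens):
        best = min(population, key=fitness)
        if fitness(best) < target_error:          # termination criterion
            return best
        def select():                             # binary tournament stand-in
            return min(random.sample(population, 2), key=fitness)
        offspring = []
        while len(offspring) < pop_size:
            r = random.random()
            if r < p_c:                           # crossover: two children
                offspring.extend(crossover(select(), select()))
            elif r < p_c + p_m:                   # mutation
                offspring.append(mutate(select()))
            else:                                 # reproduction (cloning)
                offspring.append(select())
        population = offspring[:pop_size]
    return min(population, key=fitness)
```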

13 Initialization

14 Prefix notation (Lisp): (Max (* x x) (+ x (* 3 y))) ≡ Max(x*x, x + 3*y)
Internal nodes (points) are functions; the links lead down to the terminals at the leaves.
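In Python, the same prefix expression can be held as nested tuples. This representation is an assumption used by the sketches that follow, not something the slides prescribe:

```python
# (Max (* x x) (+ x (* 3 y)))  is  Max(x*x, x + 3*y):
# internal nodes are functions, leaves are terminals (x, y, 3).
program = ("max", ("*", "x", "x"), ("+", "x", ("*", 3, "y")))
```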

15 Program Tree: (+ 1 2 (IF (> TIME 10) 3 4))
If TIME > 10 then x = 3, else x = 4; solution = 1 + 2 + x

16 Mutation
– Select one individual probabilistically
– Pick one point in the individual
– Delete the subtree at the chosen point
– Grow a new subtree at the mutation point in the same way as for the initial random population
– The result is a syntactically valid, executable program
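A sketch of subtree mutation over the nested-tuple representation above. `grow` is an assumed generator of random subtrees, the same one used to build the initial population:

```python
import random

def all_paths(tree, path=()):
    """Yield the path (sequence of child indices) of every node."""
    yield path
    if isinstance(tree, tuple):                     # a function node
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, grow):
    """Pick a point, delete the subtree there, grow a fresh one in its place."""
    point = random.choice(list(all_paths(tree)))
    return replace_subtree(tree, point, grow())
```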

17 Crossover
– Select two parents probabilistically based on fitness
– Randomly pick a node in the first parent (internal nodes are typically favored, e.g., 90% of the time)
– Independently and randomly pick a node in the second parent
– Swap the subtrees at the chosen nodes
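A matching sketch of subtree crossover, reusing `all_paths`, `get_subtree`, and `replace_subtree` from the mutation sketch; the 90% bias toward internal (function) nodes is the common setting the slide mentions:

```python
def crossover(parent1, parent2, p_internal=0.9):
    """Swap the subtrees rooted at independently chosen points."""
    def pick(tree):
        paths = list(all_paths(tree))
        internal = [p for p in paths if isinstance(get_subtree(tree, p), tuple)]
        if internal and random.random() < p_internal:
            return random.choice(internal)          # favor function nodes
        return random.choice(paths)
    pt1, pt2 = pick(parent1), pick(parent2)
    child1 = replace_subtree(parent1, pt1, get_subtree(parent2, pt2))
    child2 = replace_subtree(parent2, pt2, get_subtree(parent1, pt1))
    return child1, child2
```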

18 Reproduction
– Select an individual probabilistically based on fitness
– Copy it (unchanged) into the next generation of the population (cloning)

19 Example
Generate a computer program with one input x whose output equals the given data y (y = x² + x + 1):

Independent variable (x) | Dependent variable (y)
-1.00 | 1.00
-0.80 | 0.84
-0.60 | 0.76
-0.40 | 0.76
-0.20 | 0.84
 0.00 | 1.00
 0.20 | 1.24
 0.40 | 1.56
 0.60 | 1.96
 0.80 | 2.44
 1.00 | 3.00

20 Preparatory Steps
1. Terminal set: T = {x, random constants}
2. Function set: F = {+, -, *, %} (% denotes protected division)
3. Fitness: the sum of the absolute values of the differences between the program's output and the given data (low is good)
4. Parameters: population size M = 4
5. Termination: an individual emerges whose sum of absolute errors is less than 0.1
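A sketch of steps 1–3 for this example. Protected division is Koza's standard reading of %; `run(program, x)` stands in for the tree evaluator sketched later (slide 30):

```python
def protected_div(a, b):
    """The '%' of the function set: division, but a % 0 is defined as 1."""
    return a / b if b != 0 else 1.0

# The eleven fitness cases from slide 19 (y = x**2 + x + 1).
cases = [(i / 5.0, (i / 5.0) ** 2 + i / 5.0 + 1) for i in range(-5, 6)]

def sum_abs_error(program):
    """Fitness: sum of absolute errors over all cases (low is good)."""
    return sum(abs(run(program, x) - y) for x, y in cases)
```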

21 Initialization

22 Fitness Evaluation
Individual: (a) x + 1 | (b) x² + 1 | (c) 2x | (d) x
Fitness:    0.67      | 1.00       | 1.70   | 2.67
(Lower is better.)

23 Reproduction
– Copy (a), the most fit individual
– Mutate (c)

24 Crossover


26 Interpreting a program tree

27 { – [ + ( – 3 0 ) ( – x 1 ) ] [ / ( – 3 0 ) ( – x 2 ) ] } What does this evaluate as? What are the terminals, functions, and lists?

28 Interpreting a program tree
{ – [ + ( – 3 0 ) ( – x 1 ) ] [ / ( – 3 0 ) ( – x 2 ) ] } ⇒ [ (3 – 0) + (x – 1) ] – [ (3 – 0) / (x – 2) ]
Terminals = { 3, 0, x, 1, 2 }
Functions = { –, +, / }
Lists = ( – 3 0 ), [ + ( – 3 0 ) ( – x 1 ) ], …

29 Interpreting a program tree
recursion \Re*cur"sion\ (-shŭn), n. [L. recursio.] See recursion.

factorial(n):
    if n == 0 then return 1
    else return n * factorial(n – 1)

30 Interpreting a program tree
Recursive function EVAL:

if EXPR is a list then              // i.e., delimited by parentheses
    PROC = EXPR(1)
    VAL = PROC[ EVAL(EXPR(2)), EVAL(EXPR(3)), … ]
else                                // i.e., EXPR is a terminal
    if EXPR is a variable or constant then
        VAL = EXPR
    else                            // i.e., EXPR is a function with no arguments
        VAL = EXPR( )
end
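The same EVAL written as runnable Python over the nested-tuple trees used earlier; the function table is an assumed example set, not from the slides:

```python
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "%": lambda a, b: a / b if b != 0 else 1.0,   # protected division
    "max": max,
}

def eval_expr(expr, env):
    """Recursive EVAL: apply the function of a list to its evaluated
    arguments; look up variables in env; return constants as-is."""
    if isinstance(expr, tuple):                   # a list
        fn = FUNCTIONS[expr[0]]
        return fn(*(eval_expr(arg, env) for arg in expr[1:]))
    if isinstance(expr, str):                     # a variable terminal
        return env[expr]
    return expr                                   # a numeric constant

# eval_expr(("max", ("*", "x", "x"), ("+", "x", ("*", 3, "y"))),
#           {"x": 2, "y": 1})  ->  max(4, 2 + 3) = 5
```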

31 Can computer programs create new inventions?

32 GP Inventions
Two patents filed by Keane, Koza, and Streeter on July 12, 2002:
1. Creation of tuning rules for PID controllers that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules
2. Creation of 3 non-PID controllers that outperform a PID controller that uses the Ziegler-Nichols or Åström-Hägglund tuning rules

33 GP for Antenna Design
X-band antenna (Jason Lohn, NASA Ames):
– Wide beamwidth for a circularly polarized wave
– Wide bandwidth

34 The evolution of genetic programming

Hardware | Years | CPU power | Results
Texas Instruments LISP machine | 1987–1994 | 1 (base) | Example problems
64-node Transtech transputer | 1994–1997 | 9 | Human-competitive results
64-node Parsytec parallel computer | 1995–2000 | 22 | Reproduction of 20th-century patents
70-node Alpha parallel computer | 1999–2001 | 7 | Circuit synthesis
1,000-node Pentium II | 2000–2002 | 9 | Reproduction of 21st-century patents
1,000-node Pentium II (4 weeks of CPU time) | 2002 | 9 | Two new patents

35 GP Computational Effort
Human brain: 10^12 neurons × one operation per neuron per msec ⇒ 10^15 operations per second ⇒ 1 peta-op/sec = 1 brain second (B-sec)

Keane, Koza, and Streeter patents:
         | Pop. | Gens. | Hours | Nodes | MHz | B-sec
Patent 1 | 100K | 76    | 107   | 1K    | 350 | 135
Patent 2 | 100K | 325   | 1409  | 1K    | 350 | 1775
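The B-sec column is just nodes × clock rate × runtime, expressed in peta-ops; a quick check of the Patent 1 row:

```python
nodes, clock_hz, hours = 1_000, 350e6, 107
total_ops = nodes * clock_hz * hours * 3600   # operations over the whole run
print(total_ops / 1e15)                       # ≈ 134.8 B-sec (table: 135)
```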

36 When should you use GP?
– Problems involving many variables that are interrelated in highly nonlinear ways
– Relationships among the variables are not well understood
– Discovery of the size and shape of the solution is a major part of the problem
– “Black art” problems (controller tuning)
– Areas where you have no idea how to program a solution, but you know what you want

37 When should you use GP?
Problems where a good approximate solution is satisfactory:
– Design
– Control and estimation
– Bioinformatics
– Classification
– Data mining
– System identification
– Forecasting

38 When should you use GP?
Areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data:
– genome, protein, microarray data
– satellite image data
– astronomical data
– petroleum databases
– medical records
– marketing databases
– financial databases

39 Schema Theory for GP
The # symbol represents “don't care”. Example: the instances of H = ( + ( – # y ) # ) include:
( + ( – x y ) x ) → ( x – y ) + x
( + ( – x y ) y ) → ( x – y ) + y
( + ( – y y ) x ) → ( y – y ) + x
( + ( – y y ) y ) → ( y – y ) + y

40 Schema Theory for GP
Example: H = ( + ( – # y ) # )
Order o(H) = number of defined symbols. o(H) = ?
Length N(H) = number of symbols. N(H) = ?
Defining length L(H) = number of links joining defined symbols. L(H) = ?

41 Schema Theory for GP
All these schemas sample the program ( + ( – 2 x ) y ). What are the schema defining length L, order o, and length N?
[Figure: four example schemas drawn as trees, each with five symbols]

42 Schema Theory for GP
  | Schema 1 | Schema 2 | Schema 3 | Schema 4
L | 3        | 2        | 1        | 0
o | 4        | 2        | 2        | 1
N | 5        | 5        | 5        | 5
[Figure: the four schema trees from the previous slide]

43 Schema Theory for GP
How many schemas match a tree of length N? For example, consider the program ( + ( – 2 x ) ( – 3 y ) ).
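One reading, assuming (as the definition of G on the next slide suggests) that a schema must share the tree's exact shape: each of the N symbols is independently kept or replaced by #, giving 2^N matching schemas. A brute-force check on this 7-symbol program:

```python
from itertools import product

def schemas(tree):
    """All same-shaped schemas matching tree: every symbol is
    independently kept or turned into the don't-care '#'."""
    if isinstance(tree, tuple):
        child_options = [list(schemas(c)) for c in tree[1:]]
        for root in (tree[0], "#"):
            for kids in product(*child_options):
                yield (root,) + kids
    else:
        yield tree
        yield "#"

tree = ("+", ("-", 2, "x"), ("-", 3, "y"))
print(len(set(schemas(tree))))    # 128 = 2**7, and N = 7 here
```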

44 Schema Theory for GP
Definitions:
m(H, t) = number of instances of schema H at generation t
G = structure of schema H
For example, if H = ( + ( – # y ) # ) then G = ( # ( # # # ) # )

45 Schema Theory for GP
m(H, t) = number of instances of schema H at generation t
m(H, t+1/2) = number of instances after selection for crossover / mutation
m(H, t+1) = number of instances after crossover / mutation
Fitness-proportionate selection: m(H, t+1/2) = m(H, t) f(H, t) / f_ave
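A quick numeric instance of the selection step, with assumed values (a schema 20% fitter than average grows by 20% in expectation):

```python
M, m_H = 100, 10             # assumed population size and schema count
f_H, f_ave = 1.2, 1.0        # assumed schema fitness and population mean
m_half = m_H * f_H / f_ave   # expected count after selection: 12.0
```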

46 Schema Theory for GP
Crossover: two ways to destroy schema H:
1. Program h ∈ H crosses with a program g that has a different structure than G ⇒ event D1
2. Program h ∈ H crosses with a program g that has the same structure as G, but g ∉ H ⇒ event D2
Pr(crossover destruction) = Pr(D) = Pr(D1) + Pr(D2)

47 Crossover Destruction – Type 1
Parents: ( + ( – 2 x ) ( – 3 y ) ) and ( + x y )
Crossover results in ( + y ( – 3 y ) ) and ( + x ( – 2 x ) )
Both schemas are destroyed.
[Figure: the two parent trees and the two offspring trees]

48 Crossover Destruction – Type 2
If h = ( + x y ) ∈ H = ( # x y ) and g = ( g1 y x ) ∉ H, then crossover between the + and the x gives ( + y x ) and ( g1 x y ) ∈ H: schema preserved.
But if h = ( + x y ) ∈ H = ( + x # ) and g = ( g1 y x ) ∉ H, then crossover between the + and the x gives ( + y x ) and ( g1 x y ) ∉ H: schema destroyed (unless g1 = “+”).

49 Crossover Destruction – Type 1
Program h ∈ H crosses with a program g that has a different structure than G ⇒ event D1.
M = population size
Pr(D1) = Pr(D | g ∉ G) Pr(g ∉ G)
Pr(g ∉ G) = [ M – m(G, t+1/2) ] / M
Pr(D | g ∉ G) = P_diff

50 Crossover Destruction – Type 2
Program h ∈ H crosses with a program g that has the same structure as G but g ∉ H ⇒ event D2.
Pr(D2) = Pr(D | g ∈ G) Pr(g ∈ G)
Pr(g ∈ G) = m(G, t+1/2) / M
Pr(D | g ∈ G) = Pr(D | g ∉ H) Pr(g ∉ H | g ∈ G)
Pr(g ∉ H | g ∈ G) = [ m(G, t+1/2) – m(H, t+1/2) ] / m(G, t+1/2)

51 Crossover Destruction – Type 2
Pr(D | g ∉ H) ≤ L(H) / [ N(H) – 1 ]
Therefore,
Pr(D2) ≤ { L(H) / [ N(H) – 1 ] } × [ m(G, t+1/2) – m(H, t+1/2) ] / M

52 Crossover Destruction
Pr(D) = Pr(D1) + Pr(D2)
  ≤ { [ M – m(G, t+1/2) ] / M } P_diff + { L(H) / [ N(H) – 1 ] } × [ m(G, t+1/2) – m(H, t+1/2) ] / M

53 Crossover Destruction
Crossover occurs with probability p_c:
m(H, t+1) = (1 – p_c) m(H, t+1/2) + p_c m(H, t+1/2) [ 1 – Pr(D) ]
⇒ m(H, t+1) = m(H, t+1/2) [ 1 – p_c Pr(D) ]

54 Mutation Destruction
Pr(mutation destruction) = 1 – (1 – p_m)^o(H) ≈ p_m o(H)
m(H, t+1) = m(H, t+1/2) [ 1 – p_c Pr(D) ] [ 1 – p_m o(H) ]
Combine the previous results to obtain a lower bound for m(H, t+1).
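A quick numeric check of the linear approximation, with assumed values p_m = 0.01 and o(H) = 4:

```python
p_m, o_H = 0.01, 4
exact = 1 - (1 - p_m) ** o_H   # 0.0394... (probability some defined symbol mutates)
approx = p_m * o_H             # 0.04
print(exact, approx)           # the two agree closely for small p_m
```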

55 Schema Theory for GP
m(H, t+1) ≥ [ m(H, t) f(H, t) / f_ave ] [ 1 – p_m o(H) ]
  × [ 1 – p_c { ( 1 – m(G,t) f(G,t) / (M f_ave) ) P_diff
      + ( L(H) / [ N(H) – 1 ] ) × ( m(G,t) f(G,t) – m(H,t) f(H,t) ) / (M f_ave) } ]
Slightly more complex than the GA schema theorem.
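For readability, the same bound set in LaTeX, writing the population mean fitness f_ave as \bar f:

```latex
m(H,t+1) \ge \frac{m(H,t)\,f(H,t)}{\bar f}\,\bigl(1 - p_m\,o(H)\bigr)
\left[\, 1 - p_c \left\{
\left(1 - \frac{m(G,t)\,f(G,t)}{M\,\bar f}\right) P_{\text{diff}}
+ \frac{L(H)}{N(H)-1}\cdot\frac{m(G,t)\,f(G,t) - m(H,t)\,f(H,t)}{M\,\bar f}
\right\} \right]
```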

56 Schema Theory for GP
Simplification: early in the GP run (high diversity) we have:
Pr(D | g ∉ G) = P_diff ≈ 1
m(G,t) f(G,t) / (M f_ave) << 1

57 Schema Theory for GP
m(H, t+1) ≥ [ m(H, t) f(H, t) / f_ave ] ( 1 – p_m o(H) )
  × [ 1 – p_c { ( 1 – m(G,t) f(G,t) / (M f_ave) ) P_diff
      + ( L(H) / [ N(H) – 1 ] ) × ( m(G,t) f(G,t) – m(H,t) f(H,t) ) / (M f_ave) } ]
≈ [ m(H, t) f(H, t) / f_ave ] ( 1 – p_m o(H) )
  × [ 1 – p_c { 1 + ( L(H) / [ N(H) – 1 ] ) × ( – m(H,t) f(H,t) ) / (M f_ave) } ]

58 Schema Theory for GP
For short schemas, L(H) / [ N(H) – 1 ] << 1, so
[ m(H, t) f(H, t) / f_ave ] ( 1 – p_m o(H) )
  × [ 1 – p_c { 1 + ( L(H) / [ N(H) – 1 ] ) × ( – m(H,t) f(H,t) ) / (M f_ave) } ]
≈ [ m(H, t) f(H, t) / f_ave ] ( 1 – p_m o(H) ) ( 1 – p_c )

59 Schema Theory for GP
m(H, t+1) ≳ [ m(H, t) f(H, t) / f_ave ] ( 1 – p_m o(H) ) ( 1 – p_c )
Nearly the same as the GA schema theorem:
m(H, t+1) ≥ [ m(H, t) f(H, t) / f_ave ] ( 1 – p_m o(H) ) × ( 1 – p_c L(H) / [ N(H) – 1 ] )

60 GP references www.genetic-programming.org www.genetic-programming.com cswww.essex.ac.uk/staff/poli

