Machine Learning Evolutionary Algorithms


1 Machine Learning Evolutionary Algorithms

2 You are here

3 What is Evolutionary Computation?
An abstraction from the theory of biological evolution that is used to create optimization procedures or methodologies, usually implemented on computers, that are used to solve problems.

4 Brief History : the ancestors
1948, Turing: proposes “genetical or evolutionary search”
1962, Bremermann: optimization through evolution and recombination
1964, Rechenberg: introduces evolution strategies
1965, L. Fogel, Owens and Walsh: introduce evolutionary programming
1975, Holland: introduces genetic algorithms
1992, Koza: introduces genetic programming

5 Darwinian Evolution: Survival of the fittest
All environments have finite resources (i.e., can only support a limited number of individuals)
Life forms have basic instincts / lifecycles geared towards reproduction
Therefore some kind of selection is inevitable
Those individuals that compete for the resources most effectively have an increased chance of reproduction
Note: fitness in natural evolution is a derived, secondary measure, i.e., we (humans) assign a high fitness to individuals with many offspring

6 Darwinian Evolution: Summary
Population consists of a diverse set of individuals
Combinations of traits that are better adapted tend to increase representation in the population
  Individuals are “units of selection”
Variations occur through random changes yielding a constant source of diversity, coupled with selection means that:
  Population is the “unit of evolution”
Note the absence of a “guiding force”

7 The Concept of Natural Selection
Limited number of resources
Competition results in struggle for existence
Success depends on fitness
  fitness of an individual: how well-adapted an individual is to their environment. This is determined by their genes (blueprints for their physical and other characteristics).
Successful individuals are able to reproduce and pass on their genes

8 Crossing-over
Chromosome pairs align and duplicate
Inner pairs link at a centromere and swap parts of themselves
After crossing-over one of each pair goes into each gamete

9 Recombination (Crossing-Over)

10 Fertilisation
Sperm cell from Father
Egg cell from Mother
New person cell (zygote)

11 Mutation
Occasionally some of the genetic material changes very slightly during this process (replication error)
This means that the child might have genetic information not inherited from either parent
This can be:
  catastrophic: offspring is not viable (most likely)
  neutral: new feature does not influence fitness
  advantageous: strong new feature occurs
Redundancy in the genetic code forms a good way of error checking

12 Genetic code
All proteins in life on earth are composed of sequences built from 20 different amino acids
DNA is built from four nucleotides in a double helix spiral: purines A, G; pyrimidines T, C
Triplets of these form codons, each of which codes for a specific amino acid
Much redundancy:
  purines complement pyrimidines
  the DNA contains much rubbish
  4^3 = 64 codons code for 20 amino acids
genetic code = the mapping from codons to amino acids
For all natural life on earth, the genetic code is the same!

13 Motivations for Evolutionary Computation
The best problem solver known in nature is:
  the (human) brain that created “the wheel, New York, wars and so on” (after Douglas Adams’ Hitch-Hikers Guide)
  the evolution mechanism that created the human brain (after Darwin’s Origin of Species)
Answer 1 → neurocomputing
Answer 2 → evolutionary computing

14 Problem type 1 : Optimisation
We have a model of our system and seek inputs that give us a specified goal, e.g.
  time tables for university, call center, or hospital
  design specifications, etc.

15 EC
A population of individuals exists in an environment with limited resources
Competition for those resources causes selection of those fitter individuals that are better adapted to the environment
These individuals act as seeds for the generation of new individuals through recombination and mutation
The new individuals have their fitness evaluated and compete (possibly also with parents) for survival
Over time, natural selection causes a rise in the fitness of the population

16 General Scheme of EC

17 Pseudo-code for typical EA
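A minimal Python sketch of the loop described on slide 15 could look like the following. The names (`evolve`, `init`, `select_parents`, `recombine`, `mutate`), the default parameter values, the keep-the-best survivor rule and the OneMax toy problem are all illustrative assumptions, not the presenter's pseudo-code.

```python
import random

def evolve(init, fitness, select_parents, recombine, mutate,
           pop_size=20, generations=50, rng=None):
    """Generic EA loop: initialise, then repeat select -> vary -> replace."""
    rng = rng or random.Random()
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        parents = select_parents(population, scores, pop_size, rng)
        offspring = []
        for i in range(0, pop_size - 1, 2):
            a, b = recombine(parents[i], parents[i + 1], rng)
            offspring += [mutate(a, rng), mutate(b, rng)]
        # Survivor selection (assumed): keep the best pop_size of parents + offspring.
        merged = population + offspring
        population = sorted(merged, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

# Toy usage (hypothetical problem): OneMax, maximise the number of 1-bits.
def init(rng):
    return [rng.randint(0, 1) for _ in range(16)]

def onemax(bits):
    return sum(bits)

def binary_tournament(pop, scores, n, rng):
    return [max(rng.sample(pop, 2), key=onemax) for _ in range(n)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def bit_flip(bits, rng):
    return [1 - b if rng.random() < 1 / len(bits) else b for b in bits]

best = evolve(init, onemax, binary_tournament, one_point, bit_flip,
              rng=random.Random(0))
print(best, onemax(best))
```

Passing the operators in as functions keeps the loop independent of the representation, which matches the later point that selection works on whole individuals.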

18 What are the different types of EAs
Historically different flavours of EAs have been associated with different representations:
  Binary strings: Genetic Algorithms
  Real-valued vectors: Evolution Strategies
  Trees: Genetic Programming
  Finite state machines: Evolutionary Programming

19 Evolutionary Algorithms
Parameters of EAs may differ from one type to another. Main parameters:
  Population size
  Maximum number of generations
  Selection factor
  Mutation rate
  Cross-over rate
There are six main characteristics of EAs:
  Representation
  Selection
  Recombination
  Mutation
  Fitness function
  Survivor decision

20 Example: Discrete Representation (Binary alphabet)
Representation of an individual can be using discrete values (binary, integer, or any other system with a discrete set of values). Following is an example of binary representation. CHROMOSOME GENE

21 Evaluation (Fitness) Function
Represents the requirements that the population should adapt to
Also called quality function or objective function
Assigns a single real-valued fitness to each phenotype, which forms the basis for selection
  So the more discrimination (different values) the better
Typically we talk about fitness being maximised
  Some problems may be best posed as minimisation problems

22 Population
Holds (representations of) possible solutions
Usually has a fixed size and is a multiset of genotypes
Some sophisticated EAs also assert a spatial structure on the population, e.g., a grid
Selection operators usually take the whole population into account, i.e., reproductive probabilities are relative to the current generation

23 Parent Selection Mechanism
Assigns variable probabilities of individuals acting as parents depending on their fitness
Usually probabilistic
  high quality solutions more likely to become parents than low quality
  but not guaranteed
  even the worst in the current population usually has a non-zero probability of becoming a parent

24 Mutation
Acts on one genotype and delivers another
Element of randomness is essential and differentiates it from other unary heuristic operators
May guarantee connectedness of search space and hence convergence proofs

25 Recombination
Merges information from parents into offspring
Choice of what information to merge is stochastic
Most offspring may be worse, or the same as the parents
Hope is that some are better by combining elements of genotypes that lead to good traits

26 Survivor Selection (replacement)
Most EAs use fixed population size so need a way of going from (parents + offspring) to next generation
Often deterministic
  Fitness based: e.g., rank parents + offspring and take best
  Age based: make as many offspring as parents and delete all parents
Sometimes do combination

27 Example: Fitness proportionate selection
Expected number of times individual i (with fitness fi) is selected for mating is: fi / f̄, where f̄ is the mean fitness of the population
Better (fitter) individuals have:
  more space on the roulette wheel
  more chances to be selected
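Fitness-proportionate selection can be sketched in a few lines of Python. The function names are illustrative assumptions; `expected_copies` uses the standard result that, when n parents are drawn from n individuals, individual i is expected to be picked f_i divided by the mean fitness times. Fitness values are assumed to be non-negative with a positive sum.

```python
import random

def expected_copies(fitnesses):
    """Expected number of selections per individual over len(fitnesses) draws."""
    mean_f = sum(fitnesses) / len(fitnesses)
    return [f / mean_f for f in fitnesses]

def roulette_select(population, fitnesses, n, rng=random):
    """Draw n parents with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=n)

print(expected_copies([3, 1, 2]))
print(roulette_select(["A", "B", "C"], [3, 1, 2], n=3))
```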

28 Example: Tournament selection
Select k random individuals, without replacement
Take the best
  k is called the size of the tournament
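A minimal sketch of one tournament in Python; the function name and signature are assumptions made for illustration.

```python
import random

def tournament_select(population, fitness, k, rng=random):
    """Pick k distinct individuals at random and return the fittest of them."""
    contestants = rng.sample(population, k)  # sampling without replacement
    return max(contestants, key=fitness)

# Usage: individuals are plain numbers and fitness is the number itself.
winner = tournament_select(list(range(10)), fitness=lambda x: x, k=3)
print(winner)
```

Only the k sampled individuals are compared, so no population-wide statistic such as total fitness is needed.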

29 Example: Rank-based selection
Individuals are sorted on their fitness value from best to worst. The place in this sorted list is called rank. Instead of using the fitness value of an individual, the rank is used by a function to select individuals from this sorted list. The function is biased towards individuals with a high rank (= good fitness).

30 Example: Rank-based selection
Fitness: f(A) = 5, f(B) = 2, f(C) = 19
Rank: r(A) = 2, r(B) = 3, r(C) = 1
Function: h(A) = 3, h(B) = 1, h(C) = 5
Proportion on the roulette wheel: p(A) = 33.3%, p(B) = 11.1%, p(C) = 55.6%
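The rank example can be reproduced with a short Python sketch. The slide does not say which function h is used; the linear weight h = 2(n − r) + 1 below is an assumption chosen because it yields the 1/9, 3/9, 5/9 split on the roulette wheel, with the largest share going to the top-ranked individual C.

```python
def rank_selection_probs(fitness, weight=lambda rank, n: 2 * (n - rank) + 1):
    """Map {name: fitness} to {name: selection probability} using ranks only."""
    ordered = sorted(fitness, key=fitness.get, reverse=True)  # rank 1 = best
    n = len(ordered)
    h = {name: weight(rank, n) for rank, name in enumerate(ordered, start=1)}
    total = sum(h.values())
    return {name: h[name] / total for name in fitness}

probs = rank_selection_probs({"A": 5, "B": 2, "C": 19})
for name, p in probs.items():
    print(f"p({name}) = {p:.1%}")
```

Because only the ordering matters, replacing f(C) = 19 by any larger value leaves the probabilities unchanged.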

31 Initialisation / Termination
Initialisation usually done at random
  Need to ensure even spread and mixture of possible allele values
  Can include existing solutions, or use problem-specific heuristics, to “seed” the population
Termination condition checked every generation
  Reaching some (known/hoped for) fitness
  Reaching some maximum allowed number of generations
  Reaching some minimum level of diversity
  Reaching some specified number of generations without fitness improvement

32 Algorithm performance
Never draw any conclusion from a single run
  use statistical measures (averages, medians)
  from a sufficient number of independent runs
From the application point of view
  design perspective: find a very good solution at least once
  production perspective: find a good solution at almost every run
More detailed explanation of the actual problem tackled by the EA. Is the problem static or dynamic? Is the evaluation function subjective or noisy? The more diagrams or pictures here the better: show how the EA search works within the context of the application area

33 Genetic Algorithms

34 GA Overview
Developed: USA in the 1970s
Early names: J. Holland, K. DeJong, D. Goldberg
Typically applied to: discrete optimization
Attributed features:
  not too fast
  good heuristic for combinatorial problems
Special features:
  traditionally emphasizes combining information from good parents (crossover)
  many variants, e.g., reproduction models, operators

35 Genetic algorithms
Holland’s original GA is now known as the simple genetic algorithm (SGA)
Other GAs use different:
  Representations
  Mutations
  Crossovers
  Selection mechanisms

36 SGA technical summary tableau
Representation: Binary strings
Recombination: N-point or uniform
Mutation: Bitwise bit-flipping with fixed probability
Parent selection: Fitness-proportionate
Survivor selection: All children replace parents
Speciality: Emphasis on crossover

37 SGA reproduction cycle
1. Select parents for the mating pool (size of mating pool = population size)
2. Shuffle the mating pool
3. For each consecutive pair apply crossover with probability pc, otherwise copy parents
4. For each offspring apply mutation (bit-flip with probability pm independently for each bit)
5. Replace the whole population with the resulting offspring
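The reproduction cycle above can be sketched as a single Python function over lists of 0/1 genes. The function name, the default `pc = 0.7`, the default `pm = 1 / chromosome length` and the handling of an odd-sized population are assumptions for illustration; fitness values are assumed to be non-negative with a positive sum.

```python
import random

def sga_generation(population, fitness, pc=0.7, pm=None, rng=random):
    """One generational cycle of a simple GA over lists of 0/1 genes."""
    n, length = len(population), len(population[0])
    pm = 1.0 / length if pm is None else pm
    # 1. Fitness-proportionate parent selection; pool size = population size.
    weights = [fitness(ind) for ind in population]
    pool = rng.choices(population, weights=weights, k=n)
    # 2. Shuffle the mating pool.
    rng.shuffle(pool)
    # 3. One-point crossover on consecutive pairs with probability pc, else copy.
    offspring = []
    for i in range(0, n - 1, 2):
        a, b = pool[i][:], pool[i + 1][:]
        if rng.random() < pc:
            cut = rng.randrange(1, length)
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        offspring += [a, b]
    if n % 2:  # odd population (assumed rule): the unpaired parent is copied
        offspring.append(pool[-1][:])
    # 4. Independent bit-flip mutation of every gene with probability pm.
    offspring = [[1 - g if rng.random() < pm else g for g in ind]
                 for ind in offspring]
    # 5. The offspring replace the whole population.
    return offspring

rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
new_pop = sga_generation(pop, fitness=lambda ind: sum(ind) + 1, rng=rng)
```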

38 SGA operators: 1-point crossover
Choose a random point on the two parents
Split parents at this crossover point
Create children by exchanging tails
pc typically in range (0.6, 0.9)

39 SGA operators: mutation
Alter each gene independently with a probability pm
pm is called the mutation rate
  Typically between 1/pop_size and 1/chromosome_length

40 SGA operators: Selection
Main idea: better individuals get higher chance
  Chances proportional to fitness
  Implementation: roulette wheel technique
    Assign to each individual a part of the roulette wheel
    Spin the wheel n times to select n individuals
Example: fitness(A) = 3, fitness(B) = 1, fitness(C) = 2
  A: 3/6 = 50%
  B: 1/6 = 17%
  C: 2/6 = 33%

41 An example
Simple problem: max x^2 over {0, 1, …, 31}
GA approach:
  Representation: binary code, e.g. 01101 ↔ 13
  Population size: 4
  1-point xover, bitwise mutation
  Roulette wheel selection
  Random initialisation
We show one generational cycle done by hand
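The selection step of this example can be worked through in Python. The initial population below is hypothetical (the slide only says it is random and has four members); the code decodes each 5-bit string, evaluates x^2, and derives the roulette-wheel probability and expected number of copies for each individual.

```python
def decode(bits):
    """Interpret a 5-character bit string as an integer in 0..31."""
    return int(bits, 2)

def fitness(bits):
    """Objective of the example: maximise x squared."""
    return decode(bits) ** 2

population = ["01101", "11000", "01000", "10011"]  # hypothetical initial population
scores = [fitness(ind) for ind in population]
total = sum(scores)
probabilities = [s / total for s in scores]
expected = [s / (total / len(scores)) for s in scores]

for ind, s, p, e in zip(population, scores, probabilities, expected):
    print(f"{ind}  x={decode(ind):2d}  f={s:3d}  p={p:.2f}  expected copies={e:.2f}")
```

With this population the string 11000 (x = 24) holds almost half of the wheel, which illustrates how strongly fitness-proportionate selection favours the current best individual when fitness values are spread widely.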

42 x^2 example: selection

43 x^2 example: crossover

44 x^2 example: mutation

45 The simple GA
Shows many shortcomings, e.g.:
  Representation is too restrictive
  Mutation & crossover operators only applicable to bit-string & integer representations
  Selection mechanism sensitive to converging populations with close fitness values
  Generational population model (step 5 in SGA repr. cycle) can be improved with explicit survivor selection

46 Two-point Crossover
Two points are chosen in the strings
The material falling between the two points is swapped in the string for the two offspring

47 n-point crossover
Choose n random crossover points
Split along those points
Glue parts, alternating between parents
Generalisation of 1 point (still some positional bias)
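A minimal Python sketch of n-point crossover on list chromosomes; with n = 2 it gives two-point crossover and with n = 1 the one-point operator. The function name and signature are assumptions, and n must not exceed the chromosome length minus one.

```python
import random

def n_point_crossover(p1, p2, n, rng=random):
    """Cut both parents at n distinct random points and alternate the segments."""
    points = sorted(rng.sample(range(1, len(p1)), n)) + [len(p1)]
    c1, c2, start, swap = [], [], 0, False
    for end in points:
        a, b = (p2, p1) if swap else (p1, p2)
        c1 += a[start:end]
        c2 += b[start:end]
        start, swap = end, not swap
    return c1, c2

# Usage: all-zero and all-one parents make the inherited segments visible.
child1, child2 = n_point_crossover([0] * 8, [1] * 8, n=3)
print(child1, child2)
```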

48 Uniform crossover
Assign 'heads' to one parent, 'tails' to the other
Flip a coin for each gene of the first child
Make an inverse copy of the gene for the second child
Inheritance is independent of position
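Uniform crossover can be sketched as follows; the function name is an assumption and a fair coin is used for every gene.

```python
import random

def uniform_crossover(p1, p2, rng=random):
    """Flip a coin per gene: heads copies p1 into child 1, tails copies p2.

    Child 2 always receives the gene from the other parent.
    """
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            c1.append(g1)
            c2.append(g2)
        else:
            c1.append(g2)
            c2.append(g1)
    return c1, c2

child1, child2 = uniform_crossover(list("AAAAAA"), list("BBBBBB"))
print("".join(child1), "".join(child2))
```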

49 Cycle crossover example
Step 1: identify cycles Step 2: copy alternate cycles into offspring
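The two steps can be sketched in Python for permutation chromosomes. The function name and the example parents are assumptions for illustration; a cycle is traced by repeatedly looking up where the second parent's value sits in the first parent.

```python
def cycle_crossover(p1, p2):
    """Cycle crossover for two permutations of the same items."""
    n = len(p1)
    index_in_p1 = {value: i for i, value in enumerate(p1)}
    c1, c2 = [None] * n, [None] * n
    visited = [False] * n
    take_from_p1 = True
    for start in range(n):
        if visited[start]:
            continue
        # Step 1: trace the cycle that contains position `start`.
        cycle, i = [], start
        while not visited[i]:
            visited[i] = True
            cycle.append(i)
            i = index_in_p1[p2[i]]
        # Step 2: copy alternate cycles from alternate parents.
        for i in cycle:
            c1[i], c2[i] = (p1[i], p2[i]) if take_from_p1 else (p2[i], p1[i])
        take_from_p1 = not take_from_p1
    return c1, c2

parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
child1, child2 = cycle_crossover(parent1, parent2)
print(child1, child2)
```

Every gene keeps the position it had in one of the parents, so both children are valid permutations.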

50 Crossover OR mutation?
Exploration: Discovering promising areas in the search space, i.e. gaining information on the problem
Exploitation: Optimising within a promising area, i.e. using information
There is co-operation AND competition between them
Crossover is explorative: it makes a big jump to an area somewhere “in between” two (parent) areas
Mutation is exploitative: it creates random small diversions, thereby staying near (in the area of) the parent

51 Other representations
Gray coding of integers (still binary chromosomes)
  Gray coding is a mapping in which neighbouring integers differ in exactly one bit, so a small change in the phenotype is always reachable by a small change in the genotype (unlike standard binary coding)
  “Smoother” genotype-phenotype mapping makes life easier for the GA
Nowadays it is generally accepted that it is better to encode numerical variables directly as
  Integers
  Floating point variables
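A short Python sketch of the reflected binary Gray code shows the property in question. The function names are assumptions; the example compares 7 and 8, which differ in four bits in standard binary but in one bit under Gray coding.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert to_gray by folding in successive right shifts of the code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

for x in (7, 8):
    print(f"{x}: binary {x:05b}  gray {to_gray(x):05b}")
```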

52 Floating point mutations
Non-uniform mutations:
  Many methods proposed, such as time-varying range of change etc.
  Most schemes are probabilistic but usually only make a small change to value
  Most common method is to add a random deviate to each variable separately, taken from a N(0, σ) Gaussian distribution, and then curtail to range
  Standard deviation σ controls the amount of change (about 2/3 of drawings will lie in the range −σ to +σ)
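The most common scheme, Gaussian noise followed by clipping to the allowed range, can be sketched in Python as follows. The function name, the single shared standard deviation and the shared bounds `low` and `high` are simplifying assumptions.

```python
import random

def gaussian_mutation(x, sigma, low, high, rng=random):
    """Add N(0, sigma) noise to each variable, then clip to [low, high]."""
    return [min(high, max(low, xi + rng.gauss(0.0, sigma))) for xi in x]

mutated = gaussian_mutation([0.5, 2.0, 4.9], sigma=0.3, low=0.0, high=5.0)
print(mutated)
```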

53 Multiparent recombination
Recall that we are not restricted by the practicalities of nature
Noting that mutation uses 1 parent, and “traditional” crossover 2, the extension to a > 2 is natural to examine
Three main types:
  Based on allele frequencies, e.g., p-sexual voting, generalising uniform crossover
  Based on segmentation and recombination of the parents, e.g., diagonal crossover, generalising n-point crossover
  Based on numerical operations on real-valued alleles, e.g., center of mass crossover, generalising arithmetic recombination operators

54 Fitness Based Competition
Selection can occur in two places:
  Selection from current generation to take part in mating (parent selection)
  Selection from parents + offspring to go into next generation (survivor selection)
Selection operators work on the whole individual, i.e. they are representation-independent
Distinction between selection
  operators: define selection probabilities
  algorithms: define how probabilities are implemented

55 Tournament Selection
All methods above rely on global population statistics
  Could be a bottleneck esp. on parallel machines
  Relies on presence of external fitness function which might not exist: e.g. evolving game players
Informal procedure:
  Pick k members at random then select the best of these
  Repeat to select more individuals

56 Tournament Selection 2
Probability of selecting i will depend on:
  Rank of i
  Size of sample k
    higher k increases selection pressure
  Whether contestants are picked with replacement
    Picking without replacement increases selection pressure
  Whether fittest contestant always wins (deterministic) or this happens with probability p
For k = 2, time for fittest individual to take over population is the same as linear ranking with s = 2 • p

57 Survivor Selection
Most of the methods above are used for parent selection
Survivor selection can be divided into two approaches:
  Age-Based Selection
    e.g. SGA
    In SSGA can implement as “delete-random” (not recommended) or as first-in-first-out (a.k.a. delete-oldest)
  Fitness-Based Selection
    Using one of the methods above

58 Example application of order based GAs: JSSP
Precedence constrained job shop scheduling problem
  J is a set of jobs
  O is a set of operations
  M is a set of machines
  Able ⊆ O × M defines which machines can perform which operations
  Pre ⊆ O × O defines which operation should precede which
  Dur : O × M → ℝ defines the duration of o ∈ O on m ∈ M
The goal is now to find a schedule that is:
  Complete: all jobs are scheduled
  Correct: all conditions defined by Able and Pre are satisfied
  Optimal: the total duration of the schedule is minimal

59 Precedence constrained job shop scheduling GA
Representation: individuals are permutations of operations
Permutations are decoded to schedules by a decoding procedure
  take the first (next) operation from the individual
  look up its machine (here we assume there is only one)
  assign the earliest possible starting time on this machine, subject to
    machine occupation
    precedence relations holding for this operation in the schedule created so far
fitness of a permutation is the duration of the corresponding schedule (to be minimized)
use any suitable mutation and crossover
use roulette wheel parent selection on inverse fitness
Generational GA model for survivor selection
use random initialisation
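The decoding procedure can be sketched in Python under some stated simplifications: each operation has exactly one machine, an operation is appended after the last operation on its machine (no filling of earlier idle gaps), and only predecessors already present in the partial schedule constrain it. The function name, the data layout and the three-operation instance are hypothetical.

```python
def decode_schedule(permutation, machine, duration, pre):
    """Decode a permutation of operations into start times and total duration.

    machine[o]  -> the single machine that performs operation o
    duration[o] -> processing time of o on that machine
    pre         -> set of (a, b) pairs meaning a must finish before b starts
    """
    machine_free = {}  # machine -> time at which it becomes idle
    finish = {}        # operation -> completion time
    start = {}
    for op in permutation:
        # Only predecessors already in the partial schedule constrain op.
        ready = max((finish[a] for a, b in pre if b == op and a in finish),
                    default=0)
        begin = max(ready, machine_free.get(machine[op], 0))
        start[op] = begin
        finish[op] = begin + duration[op]
        machine_free[machine[op]] = finish[op]
    return start, max(finish.values())

machine = {"a1": "M1", "a2": "M2", "b1": "M1"}
duration = {"a1": 3, "a2": 2, "b1": 4}
pre = {("a1", "a2")}
start, makespan = decode_schedule(["a1", "b1", "a2"], machine, duration, pre)
print(start, makespan)
```

The returned total duration is the fitness to be minimised, so roulette wheel selection would work on its inverse.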

60 An Example Application to Transportation System Design
Taken from the ACM Student Magazine
  Undergraduate project of Ricardo Hoar & Joanne Penner
Vehicles on a road system
  Modelled as individual agents using ANT technology (also an AI technique)
Want to increase traffic flow
Uses a GA approach to evolve solutions to:
  The problem of timing traffic lights
  Optimal solutions only known for very simple road systems
Details not given about bit-string representation
  But traffic light switching times are real-valued numbers over a continuous space

61 Evolution strategies

62 ES quick overview
Developed: Germany in the 1970s
Early names: I. Rechenberg, H.-P. Schwefel
Typically applied to: numerical optimisation
Attributed features:
  fast
  good optimizer for real-valued optimisation
  relatively much theory
Special: self-adaptation of (mutation) parameters standard

63 Basic Evolution Strategy
1. Generate some random individuals
2. Select the p best individuals based on some selection algorithm (fitness function)
3. Use these p individuals to generate c children
4. Go to step 2, until the desired result is achieved (i.e. little difference between generations)

64 ES (cont)
Recombination & Mutation: These operators are very similar to those of GAs. The main difference is that the mutation amount is not constant in ES. Additional parameters, sigma, set the mutation amount and are themselves adapted during the run, so the mutation amount also moves closer to its optimum value.
Three main functions can be used for recombination:
  Arithmetic mean of the parents
  Geometric mean of the parents
  Discrete cross-over method

65 Encoding
Individuals are encoded as vectors of real numbers (object parameters)
  op = (o1, o2, o3, … , om)
The strategy parameters control the mutation of the object parameters
  sp = (s1, s2, s3, … , sm)
These two parameters constitute the individual’s chromosome

66 Object Parameter Mutation
affected by the strategy parameters (another vector of real numbers)
  op = (8, 12, 31, … , 5)
  opmutated = (8.2, 11.9, 31.3, … , 5.7)
  sp = (.1, .3, .2, … , .5)

67 Example: Mutation for real valued representation
Perturb values by adding some random noise
Often, a Gaussian/normal distribution N(0, σ) is used, where
  0 is the mean value
  σ is the standard deviation
and
  x'_i = x_i + N(0, σ_i)
for each parameter

68 Discrete Recombination
Similar to crossover of genetic algorithms
Equal probability of receiving each parameter from each parent
  Parent 1: (8, 12, 31, … , 5)
  Parent 2: (2, 5, 23, … , 14)
  Child: (2, 12, 31, … , 14)

69 Intermediate Recombination
Often used to adapt the strategy parameters
Each child parameter is the mean value of the corresponding parent parameters
  Parent 1: (8, 12, 31, … , 5)
  Parent 2: (2, 5, 23, … , 14)
  Child: (5, 8.5, 27, … , 9.5)
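Both ES recombination operators fit in a few lines of Python. The function names are assumptions; the example vectors are the values shown on the slides with the elided middle entries left out.

```python
import random

def discrete_recombination(p1, p2, rng=random):
    """Each child parameter is copied from either parent with equal probability."""
    return [rng.choice(pair) for pair in zip(p1, p2)]

def intermediate_recombination(p1, p2):
    """Each child parameter is the mean of the two parent parameters."""
    return [(a + b) / 2 for a, b in zip(p1, p2)]

parent1, parent2 = [8, 12, 31, 5], [2, 5, 23, 14]
mean_child = intermediate_recombination(parent1, parent2)
mixed_child = discrete_recombination(parent1, parent2)
print(mean_child, mixed_child)
```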

70 ES technical summary tableau
Representation: Real-valued vectors
Recombination: Discrete or intermediary
Mutation: Gaussian perturbation
Parent selection: Uniform random
Survivor selection: (μ,λ) or (μ+λ)
Specialty: Self-adaptation of mutation step sizes

71 Another historical example: the jet nozzle experiment
Task: to optimize the shape of a jet nozzle Approach: random mutations to shape + selection Initial shape Final shape

72 Evolution Process
p parents produce c children in each generation
Four types of processes:
  p,c
  p/r,c
  p+c
  p/r+c

73 p,c
p parents produce c children using mutation only (no recombination)
The fittest p children become the parents for the next generation
Parents are not part of the next generation
c ≥ p
p/r,c is the above with recombination

74 p+c
p parents produce c children using mutation only (no recombination)
The fittest p individuals (parents or children) become the parents of the next generation
p/r+c is the above with recombination
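The difference between the comma and plus schemes lies only in which pool the survivors are drawn from, as this Python sketch shows. The function names and the toy fitness (larger number is better) are assumptions for illustration.

```python
def comma_selection(parents, children, fitness, p):
    """(p,c): the next parents are the fittest p children; parents are discarded."""
    if len(children) < p:
        raise ValueError("(p,c) selection needs at least p children")
    return sorted(children, key=fitness, reverse=True)[:p]

def plus_selection(parents, children, fitness, p):
    """(p+c): the next parents are the fittest p of parents and children together."""
    return sorted(parents + children, key=fitness, reverse=True)[:p]

parents, children = [10, 9], [1, 2, 3, 4]
print(comma_selection(parents, children, fitness=lambda x: x, p=2))
print(plus_selection(parents, children, fitness=lambda x: x, p=2))
```

In this toy case the comma scheme drops the two good parents in favour of weaker children, while the plus scheme keeps them.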

75 When to Use GA’s vs. ES’s
Genetic Algorithms
  More important to find optimal solution (GA’s more likely to find global maximum; usually slower)
  Problem parameters can be represented as bit strings (computational problems)
Evolution Strategies
  “Good enough” solution acceptable (ES’s usually faster; can readily find local maximum)
  Problem parameters are real numbers (engineering problems)

76 Quiz
The following data set will be used to learn a decision tree for predicting whether students are lazy (L) or diligent (D) based on their weight (Normal or Underweight), their eye color (Amber or Violet) and the number of eyes they have (2 or 3 or 4). The following numbers may be helpful as you answer this problem without using a calculator: log2 0.1 = −3.32, log2 0.2 = −2.32, log2 0.3 = −1.73, log2 0.4 = −1.32, log2 0.5 = −1.
Draw the full decision tree learned for this data
What is the training set error?

77 Weight Eye Color Num. Eyes Output N A 2 L
N V L U V L U A D N A D N V D U A D

