Genetic Algorithms Jelena Mirković, Aleksandra Popović, Dražen Drašković, Veljko Milutinović School of Electrical Engineering, University of Belgrade Marko.


1 Genetic Algorithms Jelena Mirković, Aleksandra Popović, Dražen Drašković, Veljko Milutinović School of Electrical Engineering, University of Belgrade Marko Bajec Faculty of Computer & Information Science University of Ljubljana

2 What You Will Learn From This Tutorial
Part I
What is a genetic algorithm?
Principles of genetic algorithms.
How to design an algorithm?
Comparison of GAs and conventional algorithms.
Part II
Applications of GA
–GA and the Internet
–GA and image segmentation
–GA and system design
Part III
Genetic programming

3 Part I: GA Theory What are genetic algorithms? How to design a genetic algorithm?

4 Genetic Algorithm Is Not...
...gene coding

5 Genetic Algorithm Is...
...a computer algorithm that rests on the principles of genetics and evolution

6 Instead of Introduction...
Hill climbing
(Figure labels: local, global)

7 Instead of Introduction... (2)
Multi-climbers

8 Instead of Introduction... (3)
Genetic algorithm
(Figure: climbers comparing positions: “I am not at the top. My height is better!” “I am at the top. Height is...” “I will continue.”)

9 Instead of Introduction... (4)
Genetic algorithm, a few microseconds later

10 The GA Concept
A genetic algorithm (GA) introduces the principles of evolution and genetics into the search among possible solutions to a given problem.
The idea is to simulate the process found in natural systems.
This is done by creating, within a machine, a population of individuals represented by chromosomes: in essence, a set of character strings analogous to the DNA in our own chromosomes.

11 Survival of the Fittest
The main principle of evolution used in GA is “survival of the fittest”.
Good solutions survive, while bad ones die.

12 Nature and GA...
Nature reality    Genetic algorithm
Chromosome        String
Gene              Character
Locus             String position
Genotype          Population
Phenotype         Decoded structure

13 The History of GA
Cellular automata
–John Holland, University of Michigan, 1975.
Until the early 80s, the concept was studied theoretically.
In the 80s, the first “real world” GAs were designed.

14 Algorithmic Phases
Initialize the population
Select individuals for the mating pool
Perform crossover
Perform mutation
Insert offspring into the population
Stop? If no, return to selection; if yes, The End
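
The phases on this slide can be sketched as a single loop. This is only an illustrative sketch, not the authors' implementation: the bit-string genome, binary tournament selection, the 2% per-bit mutation rate, the fixed generation budget used as the "Stop?" test, and the name `run_ga` are all assumptions made for the example.

```python
import random


def run_ga(fitness, genome_len=16, pop_size=20, generations=50, seed=0):
    """Maximize `fitness` over bit lists; every parameter here is illustrative."""
    rng = random.Random(seed)
    # Initialize the population with random bit strings.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):  # "Stop?" is a fixed generation budget here
        # Select individuals for the mating pool (binary tournament, an assumption).
        pool = [max(rng.sample(population, 2), key=fitness)
                for _ in range(pop_size)]
        offspring = []
        for a, b in zip(pool[0::2], pool[1::2]):
            cut = rng.randrange(1, genome_len)  # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Mutation: flip each bit with a small probability.
                offspring.append([bit ^ (rng.random() < 0.02) for bit in child])
        # Insert offspring into the population, keeping the fittest individuals.
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:pop_size]
    return population[0]
```

For example, `run_ga(sum)` searches for the bit string with the most ones. Because the fittest individuals are always kept, the best fitness never decreases from one generation to the next.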

15 Designing GA...
How to represent genomes?
How to define the crossover operator?
How to define the mutation operator?
How to define the fitness function?
How to generate the next generation?
How to define stopping criteria?

16 Representing Genomes...
Representation                Example
string                        1 0 1 1 1 0 0 1
array of strings              http avala yubc net ~apopovic
tree - genetic programming    > bxor or c ba

17 Crossover
Crossover is a concept from genetics.
Crossover is sexual reproduction.
Crossover combines genetic material from two parents in order to produce superior offspring.
A few types of crossover:
–One-point
–Multiple-point

18 One-point Crossover Parent #1 Parent #2 0 1 5 3 5 4 7 6 7 6 2 4 2 3 0 1

19 One-point Crossover Parent #1 Parent #2 0 1 5 3 5 4 7 6 7 6 2 4 2 3 0 1
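
One-point crossover can be sketched in a few lines. The example below reads the slide's digits as two eight-gene parents and picks a cut point of 4; both of those choices are assumptions, since the transcript does not preserve the figure layout or the cut position. The function name `one_point_crossover` is hypothetical.

```python
def one_point_crossover(parent1, parent2, cut):
    """Swap the tails of two equal-length genomes after position `cut`."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if not 0 < cut < len(parent1):
        raise ValueError("cut must fall strictly inside the genome")
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


# Assumed reading of the slide: two parents of eight genes each.
p1 = [0, 1, 5, 3, 5, 4, 7, 6]
p2 = [7, 6, 2, 4, 2, 3, 0, 1]
c1, c2 = one_point_crossover(p1, p2, cut=4)
# c1 == [0, 1, 5, 3, 2, 3, 0, 1]
# c2 == [7, 6, 2, 4, 5, 4, 7, 6]
```

Multiple-point crossover follows the same idea with several cut positions, alternating which parent each segment is copied from.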

20 Mutation
Mutation introduces randomness into the population.
Mutation is asexual reproduction.
The idea of mutation is to reintroduce divergence into a converging population.
Mutation is performed on a small part of the population, in order to avoid entering an unstable state.

21 Mutation...
Parent: 11010100
Child:  01010101
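
Bit-flip mutation on a binary string can be sketched as follows. In the slide's example the child differs from the parent in the first and last bit, so the sketch flips exactly those positions; the function names and the per-bit `rate` parameter are assumptions made for illustration.

```python
import random


def flip_bits(genome, positions):
    """Return a copy of a bit string with the given positions inverted."""
    flipped = set(positions)
    return "".join(
        ("1" if bit == "0" else "0") if i in flipped else bit
        for i, bit in enumerate(genome)
    )


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate` (an assumed scheme)."""
    positions = [i for i in range(len(genome)) if rng.random() < rate]
    return flip_bits(genome, positions)


child = flip_bits("11010100", [0, 7])  # "01010101", as on the slide
```

A call such as `mutate("11010100", 0.02, random.Random(0))` applies a low mutation rate, so most calls leave the genome unchanged.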

22 About Probabilities...
The average probability for an individual to undergo crossover is, in most cases, about 80%.
The average probability for an individual to mutate is about 1-2%.
The probabilities of genetic operators follow the probabilities in natural systems.
The better solutions reproduce more often.

23 Fitness Function
The fitness function is an evaluation function that determines which solutions are better than others.
Fitness is computed for each individual.
The fitness function is application dependent.

24 Selection
The selection operation copies a single individual, probabilistically selected based on fitness, into the next generation of the population.
There are a few possible ways to implement selection:
–“Only the strongest survive”: choose the individuals with the highest fitness for the next generation.
–“Some weak solutions survive”: assign a probability that a particular individual will be selected for the next generation. This gives more diversity; some bad solutions might have good parts!

25 Selection - Survival of the Strongest
Previous generation: 0.93 0.51 0.72 0.31 0.12 0.64
Next generation:     0.93 0.72 0.64

26 Selection - Some Weak Solutions Survive
Previous generation: 0.93 0.51 0.72 0.31 0.12 0.64
Next generation:     0.93 0.72 0.64 0.12
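
The two selection strategies can be sketched using the fitness values from the slides. The function names are hypothetical, and fitness-proportional ("roulette wheel") sampling is only one possible way to let some weak solutions survive; the slides do not say which scheme was used.

```python
import random


def select_strongest(population, fitness, k):
    """'Only the strongest survive': keep the k fittest individuals."""
    return sorted(population, key=fitness, reverse=True)[:k]


def select_roulette(population, fitness, k, rng):
    """'Some weak solutions survive': sample k individuals with probability
    proportional to fitness (sampling with replacement, an assumption)."""
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=k)


scores = [0.93, 0.51, 0.72, 0.31, 0.12, 0.64]  # individuals stand in for their own fitness
strongest = select_strongest(scores, lambda s: s, 3)  # [0.93, 0.72, 0.64]
mixed = select_roulette(scores, lambda s: s, 4, random.Random(0))
```

With roulette selection a low-fitness individual such as 0.12 has a small but non-zero chance of being chosen, which is what preserves diversity.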

27 Mutation and Selection...
(Figure: solution distribution D plotted against phenotype; panels labeled Selection and Mutation.)

28 Stopping Criteria
The final problem is to decide when to stop execution of the algorithm.
There are two possible solutions to this problem:
–First approach: stop after producing a fixed number of generations.
–Second approach: stop when the improvement in average fitness over two generations is below a threshold.
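
Both stopping criteria can be combined in one small check. This is a sketch under stated assumptions: `avg_fitness_history` is a hypothetical list holding the average fitness of each generation produced so far, and higher fitness is taken to be better.

```python
def should_stop(avg_fitness_history, max_generations, threshold):
    """Return True when either stopping criterion from the slide is met."""
    # First approach: a fixed number of generations has been produced.
    if len(avg_fitness_history) >= max_generations:
        return True
    # Second approach: improvement over the last two generations is too small.
    if len(avg_fitness_history) >= 2:
        improvement = avg_fitness_history[-1] - avg_fitness_history[-2]
        if improvement < threshold:
            return True
    return False
```

In practice the second criterion is often checked over a window of several generations, because a single flat step does not necessarily mean the population has converged.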

29 GA vs. Ad-hoc Algorithms
                 Genetic Algorithm    Ad-hoc Algorithms
Speed            Slow *               Generally fast
Human work       Minimal              Long and exhaustive
Applicability    General              There are problems that cannot be solved analytically
Performance      Excellent            Depends
* Not necessarily!

30 Problems With GAs
Sometimes a GA is extremely slow, much slower than conventional algorithms.

31 Advantages of GAs
The concept is easy to understand.
Minimum human involvement.
The computer is not taught how to use an existing solution, but finds a new solution!
Modular, separate from the application.
Supports multi-objective optimization.
Always an answer; the answer gets better with time!
Inherently parallel; easily distributed.
Many ways to speed up and improve a GA-based application as knowledge about the problem domain is gained.
Easy to exploit previous or alternate solutions.

32 GA: An Example - Diophantine Equations
Diophantine equation (n=4): a*x + b*y + c*z + d*q = s
For given a, b, c, d, and s, find x, y, z, q.
Genome: (x, y, z, q)

33 GA: An Example - Diophantine Equations (2)
Crossover: ( 1, 2, 3, 4 ) and ( 5, 6, 7, 8 ) give ( 1, 6, 3, 4 ) and ( 5, 2, 7, 8 )
Mutation: ( 1, 2, 3, 4 ) gives ( 1, 2, 3, 9 )

34 GA: An Example - Diophantine Equations (3)
The first generation is randomly generated from numbers lower than the sum (s).
Fitness is defined as the absolute value of the difference between the computed total and the given sum: Fitness = abs(total - sum).
The algorithm enters a loop in which operators are applied to genomes: crossover, mutation, selection.
After a number of generations a solution is reached.
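
The Diophantine example can be put together as a short program. This is a minimal sketch, not the authors' code: the fitness follows the slide's abs(total - sum), so lower is better and 0 means solved. The crossover copies a single gene from the other parent and the mutation replaces one gene with a random value, mirroring the slide's examples. The population size, keeping the better half as the selection step, the generation limit, and the names `fitness` and `solve` are assumptions.

```python
import random


def fitness(genome, coeffs, s):
    """abs(total - sum) as on the slide; 0 means the equation is satisfied."""
    return abs(sum(c * g for c, g in zip(coeffs, genome)) - s)


def solve(coeffs, s, pop_size=40, max_generations=500, mutation_rate=0.02, seed=0):
    """Search for non-negative integers below s that satisfy the equation."""
    rng = random.Random(seed)
    n = len(coeffs)
    # First generation: random numbers lower than the sum s.
    population = [[rng.randrange(s) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(max_generations):
        population.sort(key=lambda g: fitness(g, coeffs, s))
        if fitness(population[0], coeffs, s) == 0:
            return population[0]
        survivors = population[: pop_size // 2]  # selection: keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = list(p1)
            i = rng.randrange(n)
            child[i] = p2[i]  # crossover: take one gene from the other parent
            if rng.random() < mutation_rate:
                child[rng.randrange(n)] = rng.randrange(s)  # mutation
            children.append(child)
        population = survivors + children
    return None  # no exact solution found within the generation limit
```

For instance, `solve([1, 2, 3, 4], 30)` looks for x, y, z, q with x + 2y + 3z + 4q = 30. The genome (1, 2, 3, 4) is one valid solution, since its fitness is 0. The search is stochastic, so a run may return a different solution or `None` if the limit is reached first.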

35 Some Applications of GAs GA Internet search Data mining Software guided circuit design Control systems design Stock price prediction Path finding Mobile robots search Optimization Trend spotting

36 Part II: Applications of GAs GA and the Internet GA and image segmentation GA and system design

37 Genetic Algorithm and the Internet

38 Introduction
GA can be used for intelligent Internet search.
GA is used in cases when the search space is relatively large.
GA is an adaptive search.
GA is a heuristic search method.

39 Search Engines & Web Crawlers
The search engine instructs the web crawler; the web crawler collects and stores data to the indexed DB; the indexed DB provides data for the search engine.
A web crawler is implemented as:
–Generic web crawler: get all pages; breadth-first algorithm
–Focused web crawler: get only the “best” pages...; best-first algorithm

40 Focused Web Crawlers
A focused web crawler performs domain-focused search and relies on:
Web analysis algorithms
–Content-based web analysis (e.g., Vector Space Model)
–Link-based web analysis (e.g., PageRank, HITS)
Web search algorithms
–Breadth-first search (depends on a good selection of seed pages; problems with performance!)
–Best-first search (tries to predict which links lead to quality pages; problem: “local search” (LS))

41 Using GA to Solve the LS Problem
List of search key words
Search
Generation 0: seed pages
Apply SELECTION, CROSSOVER, MUTATION
Generation 1
Apply SELECTION, CROSSOVER, MUTATION
Generation 2
.....
Generation n: result pages (CONVERGENCE)

42 Example of a GA algorithm
SELECTION
–Calculate the fitness value FV(c) for each page c;
–Randomly select members of the generation, taking into account the distribution of FV: the selected pages;
CROSSOVER
–Extract all URLs from the selected pages;
–Calculate the crossover value C(u) for each URL u: C(u) = sum(FV(p)) for all pages p that have a link to u;
MUTATION
–Select 3 random words from the domain lexicon;
–Search over popular search engines and extract the top x pages: the mutation pages;
–Create the next generation by combining the selected and mutation pages;
CONVERGENCE
–Repeat steps 1 to 3
–until the number of pages meeting the FV threshold reaches the pre-set number.
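
The crossover value C(u) from this slide can be computed directly from two mappings. This is a sketch under assumptions: `fitness_by_page` (page to FV) and `links_by_page` (page to outgoing URLs) are hypothetical data structures, and a page that links to the same URL several times is counted once.

```python
from collections import defaultdict


def crossover_values(fitness_by_page, links_by_page):
    """C(u) = sum of FV(p) over all selected pages p that link to URL u."""
    scores = defaultdict(float)
    for page, fv in fitness_by_page.items():
        # set() ensures repeated links on one page are counted only once.
        for url in set(links_by_page.get(page, ())):
            scores[url] += fv
    return dict(scores)


fv = {"p1": 0.5, "p2": 0.25}
links = {"p1": ["u1", "u2"], "p2": ["u2"]}
c = crossover_values(fv, links)  # {"u1": 0.5, "u2": 0.75}
```

URLs that are linked from several high-fitness pages receive the largest crossover values, so ranking by C(u) favors pages that good pages agree on.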

43 Algorithm Phases
Process the set of URLs given by the user
Select all links from the input set
Evaluate the fitness function for all genomes
Perform crossover, mutation, and reproduction
Satisfactory solution obtained?
The End

44 A System for the GA Internet Search
Essence: if “desperate,” do database mutation; if “happy,” do locality-based mutation.
Components: Control program, Agent, Spider, Generator, Topic, Space, Time
Data: Input set, Current set, Output set, Top data, Net data

45 Spider
Spider is a software package that picks up Internet documents from user-supplied input, to a depth specified by the user.
Spider takes one URL and fetches all links, and the documents they contain, to a predefined depth.
The fetched documents are stored on the local hard disk with the same structure as at the original location.
Spider’s task is to produce the first generation.
Spider is used during crossover and mutation.

46 Agent
Agent takes as input a set of URLs and calls Spider for every one of them, with depth 1.
Then Agent extracts keywords from each document and stores them on the local hard disk.

47 Generator
Generator generates a set of URLs from given keywords, using some conventional search engine.
It takes as input the desired topic, calls the Yahoo search engine, and submits a query looking for all documents covering the specific topic.
Generator stores the URL and topic of each web page in a database called topdata.

48 Topic
Topic uses the topdata DB in order to insert random URLs from the database into the current set.
Topic performs mutation.

49 Space
Space takes as input the current set from the Agent application and injects into it those URLs from the netdata database that appeared with the greatest frequency in the output sets of previous searches.

50 Time
Time takes a set of URLs from Agent and inserts the ones with the greatest frequency into the netdata DB.
The netdata DB consists of three fields: URL, topic, and count number.
The DB is updated in each algorithm iteration.
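
The netdata bookkeeping can be sketched with an in-memory counter. This is only an illustration of the three fields (URL, topic, count): the slides do not describe the actual database, so the `Counter` keyed by (URL, topic) and the function names `record_urls` and `most_frequent` are assumptions.

```python
from collections import Counter


def record_urls(netdata, urls, topic):
    """Time-like step: bump the count for every (URL, topic) pair seen."""
    for url in urls:
        netdata[(url, topic)] += 1


def most_frequent(netdata, topic, k):
    """Space-like step: return the k URLs seen most often for a topic."""
    rows = [(url, count) for (url, t), count in netdata.items() if t == topic]
    rows.sort(key=lambda row: (-row[1], row[0]))  # by count, then URL for ties
    return [url for url, _ in rows[:k]]


netdata = Counter()
record_urls(netdata, ["a", "b"], "ga")
record_urls(netdata, ["b", "c"], "ga")
top = most_frequent(netdata, "ga", 2)  # ["b", "a"]
```

Updating the counts in every iteration, as the slide describes, lets later searches on the same topic start from URLs that earlier searches kept returning.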

51 How Does the System Work?
Components: Control program, Agent, Spider, Generator, Topic, Space, Time
Data: Input set, Current set, Output set, Top data, Net data
(Figure legend: command flow, data flow)

52 GA and the Internet: Conclusion
GA for Internet search, in contrast to other GAs, is much faster and more efficient than conventional solutions, such as standard Internet search engines.

53 Conclusion: Evolution of Future Research

