Project 2: Classification Using Genetic Programming 2008. 10. 27 Kim, MinHyeok Biointelligence laboratory Artificial.

1 Project 2: Classification Using Genetic Programming
2008. 10. 27
Kim, MinHyeok (mhkim@bi.snu.ac.kr)
Biointelligence Laboratory
Artificial Intelligence

2 Contents
- Project outline
- Description of the data set
- Genetic Programming
  - Brief overview
  - Fitness function & selection methods
  - Classification with GP (in this project)
- Guide to writing reports
  - Style & contents
- Submission guide / Marking scheme
(C) 2008, SNU Biointelligence Laboratory

3 Outline
- Goal
  - Understand Genetic Programming (GP) more deeply
  - Practice researching and writing a paper
- Forest Fires problem (classification)
  - Predict whether a fire occurs or not
  - Use Genetic Programming
  - Estimate several statistics on the data set
- Data set
  - A variation of the ‘Forest Fires data set’
  - http://archive.ics.uci.edu/ml/datasets/Forest+Fires

4 Forest Fires Data Set
- Description
  - Database of 517 samples
    - You can use at most 500 samples for training
    - 17 samples for prediction
  - 12 attributes, plus the class label
    - X, Y, month, day, FFMC, DMC, DC, ISI, temp, RH, wind, rain, label
    - Integer or real values
  - Label (class)
    - Two classes
      - 0: a fire does not occur
      - 1: a fire occurs

5 Brief Summary of GP
- A kind of evolutionary algorithm
- Individuals are represented as tree structures
- You need to set up the following elements for a GP run
  - The set of terminals (input attributes, the class variable, constants)
  - The set of functions (numerical / conditional operators)
  - The fitness measure
  - The algorithm parameters
    - population size, maximum number of generations
    - crossover rate and mutation rate
    - maximum depth of GP trees, etc.
  - The method for designating a result and the criterion for terminating a run

6 GP Flowchart
[Figure: flowcharts of the GA loop and the GP loop]

7 Initialization
- A maximum initial tree depth D_max is set.
- Full method (each branch has depth = D_max):
  - nodes at depth d < D_max are randomly chosen from the function set F
  - nodes at depth d = D_max are randomly chosen from the terminal set T
- Grow method (each branch has depth ≤ D_max):
  - nodes at depth d < D_max are randomly chosen from F ∪ T
  - nodes at depth d = D_max are randomly chosen from T
- Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
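
The full and grow methods above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any of the recommended libraries: the function set, the terminal set, the nested-tuple tree encoding and the way depths are ramped from 1 to D_max are all assumptions made for the example.

```python
import random

# Hypothetical function set F (name -> arity) and terminal set T, for illustration only.
FUNCTIONS = {"+": 2, "-": 2, "*": 2}
TERMINALS = ["temp", "RH", "wind", "rain", 1.0]

def gen_tree(max_depth, method, rng, depth=0):
    """Return a terminal, or a tuple (function_name, child, child)."""
    if depth >= max_depth:
        use_terminal = True        # depth d = D_max: terminals only
    elif method == "full":
        use_terminal = False       # full, d < D_max: functions only
    else:
        # grow, d < D_max: draw uniformly from F ∪ T
        n_f, n_t = len(FUNCTIONS), len(TERMINALS)
        use_terminal = rng.randrange(n_f + n_t) >= n_f
    if use_terminal:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    children = tuple(gen_tree(max_depth, method, rng, depth + 1)
                     for _ in range(FUNCTIONS[name]))
    return (name,) + children

def tree_depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])

def ramped_half_and_half(pop_size, max_depth, rng):
    """Alternate full and grow; ramp the depth limit over 1..max_depth (assumed scheme)."""
    population = []
    for i in range(pop_size):
        method = "full" if i % 2 == 0 else "grow"
        depth_limit = 1 + (i // 2) % max_depth
        population.append(gen_tree(depth_limit, method, rng))
    return population

rng = random.Random(42)
population = ramped_half_and_half(10, 3, rng)
for tree in population:
    print(tree_depth(tree), tree)
```

Every full tree reaches its depth limit exactly, while grow trees may stop earlier, which is what gives the initial population its variety of shapes.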

8 Fitness Functions
- Relative squared error
- The number of outputs that are within a chosen tolerance (in %) of the correct value
- You can also try other well-defined fitness functions that suit the problem
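
A rough sketch of the two fitness measures named above. The slide's own formulas did not survive, so the definitions here are assumptions: relative squared error is taken in its usual form (squared error of the predictions divided by the squared error of always predicting the mean), and the tolerance is applied relative to the magnitude of the correct value. The function names are made up for the example.

```python
def relative_squared_error(predicted, actual):
    """Sum of squared errors, relative to that of predicting the mean (lower is better)."""
    mean = sum(actual) / len(actual)
    baseline = sum((a - mean) ** 2 for a in actual)
    if baseline == 0:
        raise ValueError("all target values are identical")
    error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return error / baseline

def hits_within(predicted, actual, tolerance_pct):
    """Count outputs within tolerance_pct percent of the correct value (higher is better)."""
    return sum(abs(p - a) <= tolerance_pct / 100 * abs(a)
               for p, a in zip(predicted, actual))

actual = [1.0, 2.0, 3.0]
print(relative_squared_error([2.0, 2.0, 2.0], actual))  # predicting the mean everywhere
print(hits_within([1.05, 2.5, 3.0], actual, 10))
```

Note that with 0/1 class labels a purely relative tolerance is degenerate for the label 0, so for this project the hit count is probably better applied after thresholding the output.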

9 Selection methods (1/2)
- Fitness proportional (roulette wheel) selection
  - The roulette wheel can be constructed as follows.
    - Calculate the total fitness F of the population (the sum of the fitness values of all chromosomes).
    - Calculate the selection probability p_k for each chromosome v_k (its fitness divided by F).
    - Calculate the cumulative probability q_k for each chromosome v_k (q_k = p_1 + ... + p_k).

10 Procedure: Proportional_Selection
- Generate a random number r from the range [0,1].
- If r ≤ q_1, then select the first chromosome v_1; else, select the kth chromosome v_k (2 ≤ k ≤ pop_size) such that q_(k-1) < r ≤ q_k.

k    p_k        q_k
1    0.082407   0.082407
2    0.110652   0.193059
3    0.131931   0.324989
4    0.121423   0.446412
5    0.072597   0.519009
6    0.128834   0.647843
7    0.077959   0.725802
8    0.102013   0.827802
9    0.083663   0.911479
10   0.088521   1.000000
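
The procedure above amounts to a search in the list of cumulative probabilities. The sketch below is one possible Python rendering; the function names are invented for the example, it assumes non-negative fitness values where larger is better, and it returns 0-based indices (index 0 is chromosome v_1).

```python
import bisect
import itertools
import random

def cumulative_probabilities(fitnesses):
    """Turn raw fitness values into the cumulative probabilities q_k."""
    total = sum(fitnesses)
    probabilities = [f / total for f in fitnesses]
    return list(itertools.accumulate(probabilities))

def proportional_select(cumulative, r):
    """Return the first index k with r <= q_k, i.e. q_(k-1) < r <= q_k."""
    index = bisect.bisect_left(cumulative, r)
    return min(index, len(cumulative) - 1)  # guard against rounding in the last q_k

# Cumulative probabilities q_1..q_10 as listed on the slide (q_1 equals p_1).
Q = [0.082407, 0.193059, 0.324989, 0.446412, 0.519009,
     0.647843, 0.725802, 0.827802, 0.911479, 1.000000]

rng = random.Random(0)
for _ in range(5):
    r = rng.random()
    print(f"r = {r:.6f} selects chromosome v_{proportional_select(Q, r) + 1}")
```

Using a binary search keeps each draw at O(log pop_size) once the cumulative list has been built for the generation.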

11 Selection methods (2/2)
- Tournament selection
  - Tournament size q
- Ranking-based selection
  - 2 ≤ … ≤ POP_SIZE
  - 1 ≤ η+ ≤ 2 and η- = 2 - η+
- Elitism
  - Preserves n good solutions into the next generation
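
As a rough illustration of the three ideas on this slide, the sketch below assumes that larger fitness is better and that the ranking scheme is linear ranking, where the best individual gets selection probability eta_plus / n and the worst gets (2 - eta_plus) / n. The function names and the parameter name eta_plus are chosen for the example and are not taken from any particular library.

```python
import random

def tournament_select(population, fitness, q, rng):
    """Draw q distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), q)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]

def linear_ranking_probabilities(n, eta_plus):
    """Selection probabilities by rank (index 0 = best), for n >= 2 and 1 <= eta_plus <= 2."""
    eta_minus = 2 - eta_plus
    return [(eta_plus - (eta_plus - eta_minus) * rank / (n - 1)) / n
            for rank in range(n)]

def elites(population, fitness, n):
    """Return the n fittest individuals, to be copied unchanged into the next generation."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in order[:n]]

population = ["a", "b", "c", "d"]
fitness = [0.2, 0.9, 0.5, 0.7]
rng = random.Random(1)
print(tournament_select(population, fitness, 2, rng))
print(linear_ranking_probabilities(4, 1.5))
print(elites(population, fitness, 2))
```

A larger tournament size q, or a larger eta_plus, increases the selection pressure; with eta_plus = 1 every rank is equally likely.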

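12 Classification with GP (in this project)
- Function regression
  - Search for a function f(x) such that
    - f(x) ≥ threshold t when y = 1
    - f(x) < threshold t when y = 0
- Converting to a Boolean value
[Figure: example GP trees. One is a Boolean expression built from ∧, ¬, ∨, =, > and < over terminals such as rain, RH, wind, 0 and 50; the other wraps an arithmetic tree f(x) (+ over FFMC and ISI) as IF f(x) > t THEN 1 ELSE 0]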
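
The thresholding step can be sketched as follows. The tree encoding (nested tuples), the operator table and the threshold value of 100 are assumptions made for the example; only the attribute names FFMC and ISI and the rule "class 1 when f(x) ≥ t" come from the slides.

```python
import operator

# Hypothetical function set; a real run would use whatever set the experiment defines.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, sample):
    """Evaluate a tree: tuples are (op, left, right), strings are attributes, numbers are constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, sample), evaluate(right, sample))
    if isinstance(tree, str):
        return sample[tree]
    return tree

def classify(tree, sample, threshold):
    """Class 1 (fire) when f(x) >= threshold, otherwise class 0 (no fire)."""
    return 1 if evaluate(tree, sample) >= threshold else 0

tree = ("+", "FFMC", "ISI")   # f(x) = FFMC + ISI
threshold = 100.0             # arbitrary threshold t, for illustration only
for sample in ({"FFMC": 92.9, "ISI": 9.2}, {"FFMC": 79.5, "ISI": 1.1}):
    print(sample, "->", classify(tree, sample, threshold))
```

The threshold can be fixed in advance or tuned on the training set; either way, the fitness of a tree can then be measured as its classification accuracy.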

13 What to do for the experiment?
- Select a library that implements GP
  - You can find various libraries written in C++/Java/Matlab
  - See the list of recommended libraries on the next page
- Build your own code for the experiment
  - Check the sample code and tutorials of the libraries for a quick start
  - Add comments to explain the flow of your program
- Caution
  - Running GP may take a long time

14 Recommended Libraries for GP
- C++
  - GPLib: http://www.cs.bham.ac.uk/~cmf/GPLib/index.html
- Java
  - JGAP: http://jgap.sourceforge.net/
  - ECJ: http://cs.gmu.edu/~eclab/projects/ecj/
- Matlab toolbox
  - GPLAB: http://gplab.sourceforge.net/
- More references
  - Implementations section of the Wikipedia article on Genetic Programming: http://en.wikipedia.org/wiki/Genetic_programming

15 Report Style
- English only!!
- Scientific journal style
  - How to Write A Paper in Scientific Journal Style and Format
  - http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWsections.html

Experimental process            Section of paper
What did I do in a nutshell?    Abstract
What is the problem?            Introduction
How did I solve the problem?    Materials and Methods
What did I find out?            Results
What does it mean?              Discussion
Who helped me out?              Acknowledgments (optional)
Whose work did I refer to?      Literature Cited
Extra information               Appendices (optional)

16 Report Contents (1/3)
- System description
  - Programming language and running environment used
- Result tables
- Analysis & discussion (Very Important!!)

Training     Average ± SD   Best   Worst
Setting 1    % ± %          %      %
Setting 2    % ± %          %      %
Setting 3    % ± %          %      %

Your prediction   1   2   …   16   17   Equation
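
One row of the training table can be filled in from the accuracies of repeated runs under a single setting. The sketch below uses made-up accuracy values and assumes the sample standard deviation is what the SD column asks for; the function name is invented for the example.

```python
import statistics

def summarize(accuracies):
    """Average, standard deviation, best and worst accuracy over repeated runs."""
    return {
        "average": statistics.mean(accuracies),
        "sd": statistics.stdev(accuracies),   # sample SD; needs at least two runs
        "best": max(accuracies),
        "worst": min(accuracies),
    }

# Hypothetical training accuracies (in %) from five runs of one setting.
runs = [71.2, 68.4, 73.0, 69.8, 70.6]
row = summarize(runs)
print(f"{row['average']:.1f} ± {row['sd']:.1f} %   best {row['best']:.1f} %   worst {row['worst']:.1f} %")
```

Because GP is stochastic, each setting should be run several times with different random seeds before its row is reported.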

17 Report Contents (2/3)
- Graphs
  - Average and maximum fitness versus generation
  - Tree size versus generation

18 Report Contents (3/3)
- Basic experiments
  - Changing the parameters for crossover and mutation
  - Various function sets: arithmetic, numerical
- Optional experiments
  - Various selection methods
  - Depth limitation
  - Population size, number of generations
  - Comparison to a neural network
  - …
- References

19 Submission Guide
- Due date: Nov. 19 (Wed) 18:00
- Submit both a hard copy and an e-mail
  - Hard copy submission to the office (301-417)
  - E-mail submission to mhkim@bi.snu.ac.kr
    - Subject: [AI Project1 Report] Student number, Name
    - Report + your source code with comments + executable file(s)
  - Length: the report should not exceed 12 pages.
- We are NOT interested in accuracy or your programming skill, but in your creativity and research ability.
- If you are not a C.S. major, a team project with a C.S. major student is possible (use the class board to find a partner and notify the TA (bhkim@bi.snu.ac.kr) of your team by Nov. 5)

20 Marking Scheme
- 5 points for programming
- 5 points for result prediction
- 30 points for experiment & analysis
  - 15 pts for experiments, 15 pts for analysis
- 10 points for report
- Late work
  - -10% per day
  - Maximum 7 days

21 Q&A

22 Test Data

         X  Y  month  day  FFMC  DMC    DC     ISI   temp  RH  wind  rain
Data01   6  5  9      3    92.9  133.3  699.6  9.2   26.4  21  4.5   0
Data02   6  3  11     2    79.5  3      106.7  1.1   11.8  31  4.5   0
Data03   4  3  7      4    93.2  114.4  560    9.5   30.2  22  4.9   0
Data04   6  5  6      1    90.4  93.3   298.1  7.5   19.1  39  5.4   0
Data05   6  3  4      7    91    14.6   25.6   12.3  13.7  33  9.4   0
Data06   5  4  4      7    91    14.6   25.6   12.3  17.6  27  5.8   0
Data07   4  3  5      5    89.6  25.4   73.7   5.7   18    40  4     0
Data08   7  5  10     1    91.7  48.5   696.1  11.1  16.1  44  4     0
Data09   8  6  3      5    91.7  33.3   77.5   9     8.3   97  4     0.2
Data10   7  5  8      2    96.1  181.1  671.2  14.3  27.3  63  4.9   6.4
Data11   6  5  9      6    91.2  94.3   744.4  8.4   15.4  57  4.9   0
Data12   8  6  8      1    92.1  207    672.6  8.2   21.1  54  2.2   0
Data13   7  4  9      5    88.2  55.2   732.3  11.6  15.2  64  3.1   0
Data14   4  3  9      2    91.9  111.7  770.3  6.5   15.9  53  2.2   0
Data15   3  6  9      7    92.4  124.1  680.7  8.5   17.2  58  1.3   0
Data16   3  6  9      1    90.9  126.5  686.5  7     15.6  66  3.1   0
Data17   9  9  7      2    85.8  48.3   313.4  3.9   18    42  2.7   0

