Download presentation
Presentation is loading. Please wait.
1
16 November, 2005 Statistics in HEP, Manchester 1
2
Liliana Teodorescu 2 Introduction to evolutionary computation Introduction to evolutionary computation Gene Expression Programming (GEP) Application of GEP for Event Selection Conclusion
3
Statistics in HEP, Manchester Liliana Teodorescu 3 Evolutionary computation simulates the natural evolution on a computer process leading to maintenance or increase of a population ability to survive and reproduce in a specific environment quantitatively measured by evolutionary fitness Goal of natural evolution - to generate a population of individuals with increasing fitness Goal of evolutionary computation - to generate a set of solutions (to a problem) of increasing quality
4
Statistics in HEP, Manchester Liliana Teodorescu 4 Individual – candidate solution to a problem Chromosome – representation of the candidate solution decodingencoding Gene – constituent entity of the chromosome Population – set of individuals/chromosomes Fitness function – representation of how good a candidate solution is Genetic operators – operators applied on chromosomes in order to create genetic variation (other chromosomes)
5
Statistics in HEP, Manchester Liliana Teodorescu 5 Natural evolution simulation - core of the evolutionary algorithms: optimisation algorithms (iteratively improve the quality of the solutions until an optimal/feasible solution is found) Initial population creation (randomly) Fitness evaluation (of each chromosome) Terminate? Selection of individuals (proportional with fitness) Reproduction (genetic operators) Replacement of the current population with the new one yes no Stop Start Run Problem definition Encoding of the candidate solution Fitness definition Run Decoding the best fitted chromosome = solution New generation Basic evolutionary algorithm
6
Statistics in HEP, Manchester Liliana Teodorescu 6 Genetic Algorithms (GA) (J. H. Holland, 1975) Genetic Programming (GP) (J. R. Koza, 1992) Gene Expression Programming (GEP) (C. Ferreira, 2001) Main differences Encoding method Reproduction method
7
Statistics in HEP, Manchester Liliana Teodorescu 7 Encoding Chromosome - fixed-length binary string (common technique) Gene - each bit of the string genes chromosome Reproduction Recombination (crossover) – exchanges parts of two chromosomes (usual rate 0.7) Mutation – changes the gene value (usual rate 0.001-0.0001) 100 111 11 01 Point choosen randomly 1001 10011 0 10011011
8
Statistics in HEP, Manchester Liliana Teodorescu 8 GP search for the computer program to solve the problem, not for the solution to the problem. Computer program - any computing language (in principle) - LISP (List Processor) (in practice) LISP - highly symbol-oriented a*b-c (-(*ab)c) - Mathematical expression S-expression Graphical representation of S-expression * c ab functions (+,*) and terminals (a,b,c) Chromosome: S-expression - variable length => more flexibility - sintax constraints => invalid expressions produced in the evolution process must be eliminated => waste of CPU Encoding Reproduction Recombination (crossover) and Mutation (usualy)
9
Statistics in HEP, Manchester Liliana Teodorescu 9 works with two entities: chromosomes and expression trees search for the computer program that solve the problem (as GP) Candidate solution represented by an expression tree (ET) (similar with GP tree) Q + * d - cab ET encoded in a chromosome: read ET from left to right and from top to bottom Q*-+abcd Q means sqrt Decoding the chromosome (translates the chromosome in an ET) first line of ET (root) – first element of the chromosome next line of ET – as many arguments needed by the element in the previous line Encoding
10
Statistics in HEP, Manchester Liliana Teodorescu 10 Chromosome – has one or more genes of equal length Gene – head: contains both functions and terminals (length h) - tail: contains only terminals (length t) t=h(n-1)+1 n – number of arguments of the function with the highest number of arguments e.g. set of functions: Q,*,/,-,+ set of terminals: a,b n=2; h=15 (choosen) =>t =16 => length of gene=15+16=31 *b+a-aQab+//+b+babbabbbababbaaa * b + - a Q a a ET ends before the end of the gene!
11
Statistics in HEP, Manchester Liliana Teodorescu 11 Reproduction Genetic operators applied on chromosoms not on ET => always produce sintactically correct structures! Recombination Mutation Transposition – a part of the chromosome moved to another part of the same chromosome e.g. Mutation: Q replaced with * * b + - a Q a a * b + - a * a a *b+a-aQab+//+b+babbabbbababbaaa b *b+a-a*ab+//+b+babbabbbababbaaa
12
Statistics in HEP, Manchester Liliana Teodorescu 12 This is the first try! GEP for event selection cuts/selection criteria finding classification problem (signal/background classification) statistical learning approach 4000-5000 training events (for classification rule extraction) 4000-5000 test events (others than training events) limitations imposed by APS 3.0 Data samples: Monte-Carlo simulation from BaBar experiment Ks production in e + e - (~10 GeV) Software resources APS 3.0 (Automatic Problem Solver) - commercial package (Windows based) - www.gepsoft.com - function finding, classification, time series analysis
13
Statistics in HEP, Manchester Liliana Teodorescu 13 Functions and constants to be used in the classification rules (cuts type rule) - AND, AND1 (x>=0 and y>=0 => 1 else 0), AND2 (x 1 else 0) -, =, =, != - floating point constants (-10,10) Data – variables usually used in a cut based analysis for selection - doca (distance of closest approach) - R XY, |R Z | (region around interaction point) - |cos( hel )| - SFL (Signed Flight Length) - Fsig (Flight Significance) - Pchi ( 2 probability of the vertex) - Mass (K S reconstructed mass) GEP parameters - fitness function: number of hits (events correctly classified) - gene length (head = 1-5) - no. of chromosomes per generation: 100 - no. of generations per run: 500-2000 - genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1
14
Statistics in HEP, Manchester Liliana Teodorescu 14 Training sample Total = 3985 events S = 3390 events B = 595 events 5 Variables - doca (distance of closest approach) - R XY, |R Z | (region around interaction point) - |cos( hel )| - SFL (Signed Flight Length) No. of genes = 1, Head length =2
15
Statistics in HEP, Manchester Liliana Teodorescu 15
16
Statistics in HEP, Manchester Liliana Teodorescu 16
17
Statistics in HEP, Manchester Liliana Teodorescu 17 ~92% ~8%
18
Statistics in HEP, Manchester Liliana Teodorescu 18 GEP is still in the R&D phase needs software development -> underway fast identification of powerful cuts signal/background separation of ~90% accuracy for both low and high background samples Can it go beyond that? Can it develop more complex selection criteria? GEP allows
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.