Presentation is loading. Please wait.

Presentation is loading. Please wait.

16 November, 2005 Statistics in HEP, Manchester 1.

Similar presentations


Presentation on theme: "16 November, 2005 Statistics in HEP, Manchester 1."— Presentation transcript:

1 16 November, 2005 Statistics in HEP, Manchester 1

2 Liliana Teodorescu 2 Introduction to evolutionary computation  Introduction to evolutionary computation  Gene Expression Programming (GEP)  Application of GEP for Event Selection  Conclusion

3 Statistics in HEP, Manchester Liliana Teodorescu 3  Evolutionary computation simulates the natural evolution on a computer process leading to maintenance or increase of a population ability to survive and reproduce in a specific environment quantitatively measured by evolutionary fitness  Goal of natural evolution - to generate a population of individuals with increasing fitness  Goal of evolutionary computation - to generate a set of solutions (to a problem) of increasing quality

4 Statistics in HEP, Manchester Liliana Teodorescu 4  Individual – candidate solution to a problem  Chromosome – representation of the candidate solution decodingencoding  Gene – constituent entity of the chromosome  Population – set of individuals/chromosomes  Fitness function – representation of how good a candidate solution is  Genetic operators – operators applied on chromosomes in order to create genetic variation (other chromosomes)

5 Statistics in HEP, Manchester Liliana Teodorescu 5 Natural evolution simulation - core of the evolutionary algorithms: optimisation algorithms (iteratively improve the quality of the solutions until an optimal/feasible solution is found) Initial population creation (randomly) Fitness evaluation (of each chromosome) Terminate? Selection of individuals (proportional with fitness) Reproduction (genetic operators) Replacement of the current population with the new one yes no Stop Start Run  Problem definition  Encoding of the candidate solution  Fitness definition  Run  Decoding the best fitted chromosome = solution New generation Basic evolutionary algorithm

6 Statistics in HEP, Manchester Liliana Teodorescu 6  Genetic Algorithms (GA) (J. H. Holland, 1975)  Genetic Programming (GP) (J. R. Koza, 1992)  Gene Expression Programming (GEP) (C. Ferreira, 2001) Main differences  Encoding method  Reproduction method

7 Statistics in HEP, Manchester Liliana Teodorescu 7  Encoding Chromosome - fixed-length binary string (common technique) Gene - each bit of the string genes chromosome  Reproduction Recombination (crossover) – exchanges parts of two chromosomes (usual rate 0.7) Mutation – changes the gene value (usual rate 0.001-0.0001) 100 111 11 01 Point choosen randomly 1001 10011 0 10011011

8 Statistics in HEP, Manchester Liliana Teodorescu 8 GP search for the computer program to solve the problem, not for the solution to the problem. Computer program - any computing language (in principle) - LISP (List Processor) (in practice) LISP - highly symbol-oriented a*b-c (-(*ab)c) - Mathematical expression S-expression Graphical representation of S-expression * c ab functions (+,*) and terminals (a,b,c) Chromosome: S-expression - variable length => more flexibility - sintax constraints => invalid expressions produced in the evolution process must be eliminated => waste of CPU  Encoding  Reproduction Recombination (crossover) and Mutation (usualy)

9 Statistics in HEP, Manchester Liliana Teodorescu 9  works with two entities: chromosomes and expression trees  search for the computer program that solve the problem (as GP) Candidate solution represented by an expression tree (ET) (similar with GP tree) Q + * d - cab ET encoded in a chromosome: read ET from left to right and from top to bottom Q*-+abcd Q means sqrt Decoding the chromosome (translates the chromosome in an ET) first line of ET (root) – first element of the chromosome next line of ET – as many arguments needed by the element in the previous line Encoding

10 Statistics in HEP, Manchester Liliana Teodorescu 10 Chromosome – has one or more genes of equal length Gene – head: contains both functions and terminals (length h) - tail: contains only terminals (length t) t=h(n-1)+1 n – number of arguments of the function with the highest number of arguments e.g. set of functions: Q,*,/,-,+ set of terminals: a,b n=2; h=15 (choosen) =>t =16 => length of gene=15+16=31 *b+a-aQab+//+b+babbabbbababbaaa * b + - a Q a a ET ends before the end of the gene!

11 Statistics in HEP, Manchester Liliana Teodorescu 11 Reproduction Genetic operators applied on chromosoms not on ET => always produce sintactically correct structures!  Recombination  Mutation  Transposition – a part of the chromosome moved to another part of the same chromosome e.g. Mutation: Q replaced with * * b + - a Q a a * b + - a * a a *b+a-aQab+//+b+babbabbbababbaaa b *b+a-a*ab+//+b+babbabbbababbaaa

12 Statistics in HEP, Manchester Liliana Teodorescu 12 This is the first try! GEP for event selection  cuts/selection criteria finding  classification problem (signal/background classification)  statistical learning approach  4000-5000 training events (for classification rule extraction)  4000-5000 test events (others than training events)  limitations imposed by APS 3.0 Data samples:  Monte-Carlo simulation from BaBar experiment  Ks production in e + e - (~10 GeV) Software resources  APS 3.0 (Automatic Problem Solver) - commercial package (Windows based) - www.gepsoft.com - function finding, classification, time series analysis

13 Statistics in HEP, Manchester Liliana Teodorescu 13 Functions and constants to be used in the classification rules (cuts type rule) - AND, AND1 (x>=0 and y>=0 => 1 else 0), AND2 (x 1 else 0) -, =, =, != - floating point constants (-10,10) Data – variables usually used in a cut based analysis for selection - doca (distance of closest approach) - R XY, |R Z | (region around interaction point) - |cos(  hel )| - SFL (Signed Flight Length) - Fsig (Flight Significance) - Pchi (  2 probability of the vertex) - Mass (K S reconstructed mass) GEP parameters - fitness function: number of hits (events correctly classified) - gene length (head = 1-5) - no. of chromosomes per generation: 100 - no. of generations per run: 500-2000 - genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1

14 Statistics in HEP, Manchester Liliana Teodorescu 14 Training sample Total = 3985 events S = 3390 events B = 595 events 5 Variables - doca (distance of closest approach) - R XY, |R Z | (region around interaction point) - |cos(  hel )| - SFL (Signed Flight Length) No. of genes = 1, Head length =2

15 Statistics in HEP, Manchester Liliana Teodorescu 15

16 Statistics in HEP, Manchester Liliana Teodorescu 16

17 Statistics in HEP, Manchester Liliana Teodorescu 17 ~92% ~8%

18 Statistics in HEP, Manchester Liliana Teodorescu 18 GEP  is still in the R&D phase  needs software development -> underway  fast identification of powerful cuts  signal/background separation of ~90% accuracy for both low and high background samples Can it go beyond that? Can it develop more complex selection criteria? GEP allows


Download ppt "16 November, 2005 Statistics in HEP, Manchester 1."

Similar presentations


Ads by Google