Structure of search space, complexity of stochastic combinatorial optimization algorithms and application to biological motif discovery. Robin Gras, INRIA.


1 Structure of search space, complexity of stochastic combinatorial optimization algorithms and application to biological motif discovery. Robin Gras, INRIA Rennes

2 Black box global combinatorial optimization
- Search for the maxima of a function F (the fitness) over integer variables.
- Search space C = set of all legal instantiations.
- Global = the search space is the Cartesian product of the domains of the variables.
- Black box = the analytical form of F is not known, but its value can be computed at each point.
- Many difficult bioinformatics problems (most of them NP-hard) can be represented as black box combinatorial optimization problems.
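As a minimal sketch of this setting, the code below treats the fitness as an opaque callable and enumerates the Cartesian product of the variable domains. The function `exhaustive_maxima`, the fitness `example_fitness` and the domains are hypothetical illustrations, not taken from the slides; exhaustive enumeration is only feasible for very small search spaces.

```python
import itertools

def exhaustive_maxima(fitness, domains):
    """Enumerate the whole search space C and return the best value
    together with every point that reaches it."""
    best_value, best_points = None, []
    for point in itertools.product(*domains):
        value = fitness(point)  # black box: we can only evaluate F
        if best_value is None or value > best_value:
            best_value, best_points = value, [point]
        elif value == best_value:
            best_points.append(point)
    return best_value, best_points

# Hypothetical fitness, used only to have something to evaluate.
def example_fitness(x):
    return -(x[0] - 2) ** 2 - (x[1] - 1) ** 2 + x[2]

# C is the Cartesian product of three small integer domains.
domains = [range(4), range(3), range(2)]
best_value, best_points = exhaustive_maxima(example_fitness, domains)
```

The number of evaluations is the product of the domain sizes, which is why the rest of the talk is about heuristics rather than enumeration.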

3 A few definitions
- Operator o: a move (exploration step) from a set of points Ec (a sample) to another set of points.
- Neighborhood Vo (given an operator): the set of all points reachable from a set Ec by one application of the operator.
- Landscape = triplet (C, F, Vo).
- Local maximum Mo (given an operator): X is a local maximum given o iff no point of its neighborhood has a higher fitness, i.e. $F(Y) \le F(X)$ for all $Y \in V_o(\{X\})$.
- Metaheuristic: a heuristic exploration algorithm for black box combinatorial optimization problems.
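These definitions can be illustrated with a small sketch. It assumes binary variables, an operator that acts on a single point by flipping one bit (the slides define operators on sets of points, so this is a simplification), and the usual reading of a local maximum as a point with no strictly better neighbor. All function names are hypothetical.

```python
def one_flip_neighbors(x):
    """Neighborhood of a bit tuple under the one-bit-flip operator."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def is_local_maximum(x, fitness, neighbors=one_flip_neighbors):
    """True if no neighbor of x has a strictly higher fitness."""
    return all(fitness(y) <= fitness(x) for y in neighbors(x))

def hill_climb(x, fitness, neighbors=one_flip_neighbors):
    """Steepest-ascent hill climbing: move to the best neighbor
    until the current point is a local maximum."""
    while True:
        best = max(neighbors(x), key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best

# Usage with a simple fitness (number of ones in the tuple).
result = hill_climb((0, 0, 0), sum)
```

Changing `neighbors` changes which points count as local maxima, which is the sense in which a landscape is the triplet (C, F, Vo).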

4 What is complexity? Not all difficult problems are equally difficult…

5 Influence of the maxima
- Basin of attraction of Mo: the set of points of C from which Mo is reachable by a sequence of applications of o.
- Study of basins of attraction.
- Study of the fitness cloud (fitness of neighboring points as a function of the fitness of the points they are reached from).
[Figure: plot of F(x) marking a global maximum and a local maximum; label: reverse hill-climber]
Results (empirical). Difficulty depends on:
- The number of global maxima compared with the size of C.
- The number of global maxima compared with the number of local maxima.
- The size and overlap of the basins of attraction.
Linked to the neighborhood!
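Basin sizes can be measured empirically on a small space by climbing from every point and counting where each climb ends. The sketch below is an illustration under assumptions: binary variables, a one-bit-flip neighborhood, and a deterministic steepest-ascent climber, so each point is assigned to exactly one maximum (the definition above allows basins to overlap). The 3-bit trap function `trap3` is a hypothetical example.

```python
import itertools
from collections import Counter

def one_flip_neighbors(x):
    """Neighborhood of a bit tuple under the one-bit-flip operator."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def climb(x, fitness):
    """Steepest-ascent hill climbing until no neighbor is strictly better."""
    while True:
        best = max(one_flip_neighbors(x), key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best

def basin_sizes(fitness, n):
    """Map each local maximum to the number of starting points that reach it."""
    return Counter(climb(x, fitness) for x in itertools.product((0, 1), repeat=n))

def trap3(x):
    """Hypothetical 3-bit trap: global maximum at 111, local maximum at 000."""
    u = sum(x)
    return 3 if u == 3 else 2 - u

sizes = basin_sizes(trap3, 3)
```

Here the local maximum 000 attracts as many starting points as the global maximum 111, which is what makes such a function hard for a hill-climber.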

6 Problem decomposability: epistasis
- A problem of size n that can be decomposed into n/k independent sub-problems of size k has a complexity in $2^k \cdot n/k$.
- Epistasis: the maximum number of variables on which each of the n variables depends ~ level of non-linearity → measures of non-linearity.
- Example: spin glasses.
- Function with a tunable level of epistasis: the NK-landscape, with N the size of the problem and K the number of dependencies.
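The cost of a decomposable problem can be checked with a short sketch: if the sub-problems are independent and known, each block of k binary variables can be solved exhaustively on its own. The function `solve_decomposable` is hypothetical and assumes binary variables and contiguous blocks of equal size.

```python
import itertools

def solve_decomposable(subfunctions, k):
    """Maximize a sum of independent sub-functions, one per block of k bits,
    by exhaustive search inside each block. Returns the solution and the
    number of sub-function evaluations."""
    solution, evaluations = [], 0
    for sub in subfunctions:
        best_block, best_value = None, None
        for block in itertools.product((0, 1), repeat=k):
            evaluations += 1
            value = sub(block)
            if best_value is None or value > best_value:
                best_block, best_value = block, value
        solution.extend(best_block)
    return tuple(solution), evaluations

# n = 12 variables split into n/k = 4 blocks of size k = 3.
solution, evaluations = solve_decomposable([sum] * 4, 3)
```

With n = 12 and k = 3 this uses 4 × 2^3 = 32 evaluations instead of the 2^12 = 4096 points of the full space; the difficulty in the black box setting is that the decomposition is not known in advance.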

7 Fitness and instantiation: deceptive functions
- Functions built to be difficult for a hill-climber.
- Functions that are almost linear except for a few points of the search space.
- Example: the trap-5 function.
[Figure: plot of Trap5(X)]
Independent of the neighborhood!
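The plot of the trap-5 function is not preserved in the transcript. The sketch below uses the definition commonly found in the literature, which is an assumption here: each block of 5 bits scores 5 when all bits are 1 and 4 minus the number of ones otherwise, so the slope leads away from the block optimum.

```python
def trap5_block(bits):
    """Score of one 5-bit block: 5 for all ones, otherwise 4 - (number of ones)."""
    u = sum(bits)
    return 5 if u == 5 else 4 - u

def trap5(x):
    """Sum of trap5_block over consecutive blocks of 5 bits."""
    if len(x) % 5 != 0:
        raise ValueError("length must be a multiple of 5")
    return sum(trap5_block(x[i:i + 5]) for i in range(0, len(x), 5))

all_ones = trap5((1,) * 10)       # global maximum
all_zeros = trap5((0,) * 10)      # deceptive local maximum
one_bit_short = trap5((1, 1, 1, 1, 0))
```

A block that is one bit away from the optimum has the lowest possible score, so local improvements push the search toward all zeros.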

8 Efficiency of evolutionary approaches
Two main strategies:
- A priori definition of the operators
- Discovery of the operators

9 Classic genetic algorithms (Holland 75)
- Exploration by sampling: population = sample.
- Evaluation, and bias by selection.
- Generation of a new population by application of operators.
- Experimental studies of efficiency and behavior → various conclusions.
- Theoretical studies: convergence proofs in simple cases (Goldberg 87); introduction of the notion of schemata (Bagley 67); computation of the efficiency on deceptive problems (Goldberg 93).
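The sample / select / recombine loop can be sketched as follows. This is a generic illustration, not the specific algorithm studied in the talk: binary tournament selection, one-point crossover, bit-flip mutation and all parameter values are assumptions chosen for brevity.

```python
import random

def genetic_algorithm(fitness, n, pop_size=40, generations=50,
                      mutation_rate=None, rng=None):
    """Minimal generational GA on bit lists of length n (n >= 2)."""
    rng = rng or random.Random(0)
    mutation_rate = 1.0 / n if mutation_rate is None else mutation_rate
    # Population = random sample of the search space.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def tournament():
        # Bias by selection: keep the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n - 1)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)

best = genetic_algorithm(sum, 20)
```

One-point crossover tends to keep neighboring positions together, so this kind of operator implicitly assumes that dependent variables sit next to each other in the encoding.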

10 Probabilistic model building algorithms
- Principle: discover the dependencies and the structure from the sample.
- First approach: model = univariate distribution of the variables (Mühlenbein and Voigt 96).
- BOA: Bayesian network (Pelikan and Goldberg 99).
- hBOA: Bayesian network + decision graph + ecological niches (Pelikan and Goldberg 00).
[Table: complexity of building the network, population generation, population size, number of generations, and global complexity]
- Validation on spin glasses, MAXSAT and deceptive functions.
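The simplest case, a univariate model, can be sketched in a few lines: estimate one marginal probability per variable from the selected individuals, then sample the next population from those marginals. The code is a generic illustration under assumptions (binary variables, truncation selection of the best half, probabilities clamped to keep some diversity); it is not BOA or hBOA, which learn a Bayesian network instead.

```python
import random

def univariate_eda(fitness, n, pop_size=60, generations=30, rng=None):
    """Minimal univariate model-building loop on bit lists of length n."""
    rng = rng or random.Random(0)
    probs = [0.5] * n          # model: one independent probability per variable
    best = None
    for _ in range(generations):
        # Sample a population from the current model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        # Re-estimate the model from the best half of the sample.
        selected = population[: pop_size // 2]
        probs = [sum(ind[i] for ind in selected) / len(selected) for i in range(n)]
        # Clamp so that no variable becomes fixed for good.
        probs = [min(max(p, 1.0 / n), 1.0 - 1.0 / n) for p in probs]
    return best, probs

best, probs = univariate_eda(sum, 12)
```

Because the model has no notion of dependency between variables, it cannot represent the structure of a deceptive block, which motivates the richer models listed above.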

11 Still limitations…
- Convergence proofs do not take into account the strong (greedy) heuristic used to build the network.
- The quality measure is not computed at each step.
- High global computation cost.
→ What are the consequences for the real efficiency of hBOA?

12 Efficiency of evolutionary algorithms on deceptive problems with a high epistasis level
- Problems of size 100 and 120, with sub-functions of size between 4 and 12.
- Adjacent configuration: $X_i = \{x_{(i-1)k+1}, x_{(i-1)k+2}, \dots, x_{(i-1)k+k}\}$ with $i \in \{1, \dots, m\}$.
- Non-adjacent configuration: $X_i = \{x_i, x_{i+m}, \dots, x_{i+(k-1)m}\}$ with $i \in \{1, \dots, m\}$.
- Classical genetic algorithm, non-adjacent configuration: 100% failure!
[Figure: results of the classical genetic algorithm with the adjacent configuration, non-deceptive function vs. deceptive function]
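The two index patterns on this slide can be generated directly. In the sketch below, m is the number of sub-functions, k their size, and indices are 1-based as on the slide. The function names are hypothetical and describe the pattern itself: blocks of consecutive indices versus indices spread with a stride of m.

```python
def contiguous_blocks(m, k):
    """X_i = {x_(i-1)k+1, ..., x_(i-1)k+k}: each block holds k consecutive indices."""
    return [[(i - 1) * k + j for j in range(1, k + 1)] for i in range(1, m + 1)]

def strided_blocks(m, k):
    """X_i = {x_i, x_(i+m), ..., x_(i+(k-1)m)}: block members are m positions apart."""
    return [[i + j * m for j in range(k)] for i in range(1, m + 1)]

contiguous = contiguous_blocks(3, 2)
strided = strided_blocks(3, 2)
```

Both layouts partition the same n = m·k variables; only the positions of the dependent variables differ, which matters for position-sensitive operators such as one-point crossover.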

13 hBOA behavior
Adjacent configuration:
- When the epistasis level is above 6, the global maximum is never reached within 100 generations.
- Computation time becomes prohibitive: more than 60 hours with an epistasis level of 12.
- Strong dependence on the structure of the deceptive function.
Non-adjacent configuration:
- Same results, so hBOA does not depend on the configuration.
→ Real capacity to detect and handle the dependencies, but the heuristics used do not allow it to reach the results demonstrated in theory.

14 Simple PMBGA algorithm dedicated to deceptive problems
- Test of several measures on bivariate frequencies: raw frequencies, conditional probability, mutual information and statistical implication.
- Algorithm, taking the deceptive property into account:
  - Random generation of the initial population
  - Tournament selection
  - Computation of the measures for each pair of variables
  - Building of a solution
- The solution is obtained directly in each case (adjacent or not) in a few minutes!
- With mixed sub-functions (deceptive and non-deceptive): impossible to discover the solution.
- hBOA: no difference with mixed sub-functions.
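One of the measures named above, mutual information on bivariate frequencies, can be sketched as follows. This is a generic illustration for binary variables, not the exact procedure of the talk; `mutual_information`, `rank_pairs` and the toy population are hypothetical.

```python
import math
from itertools import combinations

def mutual_information(population, i, j):
    """Mutual information (in bits) between variables i and j,
    estimated from their joint frequencies in the population."""
    n = len(population)
    joint = {}
    for ind in population:
        key = (ind[i], ind[j])
        joint[key] = joint.get(key, 0) + 1
    # Marginal frequencies derived from the joint counts.
    p_i = {a: sum(c for (x, _), c in joint.items() if x == a) / n for a in (0, 1)}
    p_j = {b: sum(c for (_, y), c in joint.items() if y == b) / n for b in (0, 1)}
    return sum((c / n) * math.log2((c / n) / (p_i[a] * p_j[b]))
               for (a, b), c in joint.items())

def rank_pairs(population):
    """All pairs of variables, most dependent first."""
    n_vars = len(population[0])
    return sorted(combinations(range(n_vars), 2),
                  key=lambda pair: mutual_information(population, *pair),
                  reverse=True)

# Toy sample: variables 0 and 1 always agree, variable 2 is independent of both.
population = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
mi_linked = mutual_information(population, 0, 1)
mi_free = mutual_information(population, 0, 2)
ranking = rank_pairs(population)
```

Pairs with high mutual information in the selected population are candidates for belonging to the same sub-function, whatever their positions in the encoding.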

15 Conclusions on black box problems
- Easy if there is no dependency.
- As soon as the epistasis level is 2 or above and the dependencies overlap, the problem is NP-hard.
- In general, the problem is NP-hard when the dependencies are not known.
- Two possible approaches:
  - Expert: use expert knowledge about the problem to build pertinent neighborhoods and operators.
  - Automatic: discover the dependencies by sampling and model building. Limited by the number of dependencies, and new heuristics are needed.

16 Further work: new algorithms
- Definition of a more structured benchmark than the NK-landscape.
- New PMBGA algorithm:
  - New probabilistic model.
  - New quality measure for the model.
  - More efficient heuristics for model building.
  - Learning of the number of dependencies.
- PMBGP algorithm:
  - Specialization for non-linear regression.
  - Use of dependency discovery.

