Structure of search space, complexity of stochastic combinatorial optimization algorithms and application to biological motif discovery. Robin Gras, INRIA.


1 Structure of search space, complexity of stochastic combinatorial optimization algorithms and application to biological motif discovery. Robin Gras, INRIA Rennes.

2 Black-box global combinatorial optimization. Search for the maxima of a function F (the fitness) over integer variables. Search space C = the set of all legal instantiations. Global = the search space is the Cartesian product of the domains of the variables. Black box = the analytical form of F is not known, but its value is computable at every point. Many difficult bioinformatics problems (most of them NP-hard) can be represented as black-box combinatorial optimization problems.
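The black-box setting can be sketched in a few lines of code; the OneMax fitness and the tiny 4-bit space below are illustrative assumptions, not part of the talk, but they show the contract: the optimizer may only evaluate F point by point, never inspect its formula.

```python
import itertools

def fitness(x):
    """Hypothetical black-box fitness (here OneMax: count of ones).
    The optimizer is assumed to see only the returned value, never
    the analytical form."""
    return sum(x)

def exhaustive_maxima(n, f):
    """Enumerate the Cartesian-product search space C = {0,1}^n and
    return the global maxima of f. Only feasible for tiny n; real
    problems need the stochastic algorithms discussed in the talk."""
    best, argmax = None, []
    for x in itertools.product((0, 1), repeat=n):
        v = f(x)
        if best is None or v > best:
            best, argmax = v, [x]
        elif v == best:
            argmax.append(x)
    return best, argmax

best, argmax = exhaustive_maxima(4, fitness)
```

The exhaustive loop makes explicit why the black-box setting is hard: without structure, nothing beats enumerating C, whose size grows exponentially with n.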

3 A few definitions. Operator o: a move (exploration) from a set of points Ec (a sample) to another set of points. Neighborhood Vo (given an operator): the set of all points reachable from a set Ec by one application of the operator. Landscape = the triplet (C, F, Vo). Local maximum Mo (given an operator): X is a local maximum for o iff F(Y) ≤ F(X) for all Y ∈ Vo({X}). Metaheuristic: a heuristic exploration algorithm for black-box combinatorial optimization problems.
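These definitions can be turned into a minimal sketch; the one-bit-flip operator and the OneMax fitness are illustrative choices, not prescribed by the slides.

```python
def neighborhood(x):
    """V_o({x}) for a one-bit-flip operator o: all points reachable
    from x by one application of o."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def is_local_maximum(x, f):
    """X is a local maximum for o iff F(Y) <= F(X) for every Y in V_o({X})."""
    return all(f(y) <= f(x) for y in neighborhood(x))

def hill_climb(x, f):
    """Greedy metaheuristic on the landscape (C, F, V_o): move to the
    best neighbor until a local maximum is reached."""
    while not is_local_maximum(x, f):
        x = max(neighborhood(x), key=f)
    return x

x = hill_climb((0, 1, 0, 0), sum)  # sum = OneMax, a placeholder fitness
```

Note how the landscape, and hence the set of local maxima, is defined only relative to the operator: changing `neighborhood` changes which points count as maxima.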

4 What is complexity? Not all difficult problems are equal…

5 Influence of the maxima. Basin of attraction of Mo: the set of points of C from which Mo is reachable by a sequence of applications of o. Study of basins of attraction. Study of the fitness cloud (the fitness values of neighbors as a function of the fitness values of the points from which they are reached). [Figure: fitness landscape F(x) showing the global maximum, a local maximum, and a reverse hill-climber.] Empirical results: number of global maxima compared with the size of C; number of global maxima compared with the number of local maxima; size and overlap of the basins of attraction. All linked to the neighborhood!
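On a toy landscape the basins of attraction can be measured exactly by climbing from every point of C; the two-peak fitness below is a hypothetical example chosen to produce one global and one local maximum, not a function from the talk.

```python
import itertools

def one_flip_neighbors(x):
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def climb(x, f):
    """Greedy hill-climber, used to decide which maximum each point reaches."""
    while True:
        best = max(one_flip_neighbors(x), key=f)
        if f(best) <= f(x):
            return x
        x = best

def basin_sizes(n, f):
    """Basin of attraction of each local maximum M_o: the number of points
    of C = {0,1}^n from which the climber reaches M_o."""
    basins = {}
    for x in itertools.product((0, 1), repeat=n):
        m = climb(x, f)
        basins[m] = basins.get(m, 0) + 1
    return basins

def two_peak(x):
    """Hypothetical fitness with the global maximum at all-ones and a
    slightly lower local maximum at all-zeros."""
    return max(sum(x), len(x) - sum(x) - 0.5)

basins = basin_sizes(4, two_peak)
```

For n = 4 the global maximum (1,1,1,1) attracts 11 of the 16 points and the local maximum (0,0,0,0) the other 5, illustrating how basin sizes, not just peak heights, determine how often a climber succeeds.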

6 Problem decomposability: epistasis. A problem of size n that is decomposable into n/k independent sub-problems of size k has a complexity in (n/k)·2^k. Epistasis: the maximum number of variables on which each of the n variables depends ~ the level of non-linearity → measures of non-linearity. Example: spin glasses. A function with a tunable level of epistasis: the NK-landscape, with N the size of the problem and K the number of dependencies.
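A minimal NK-landscape sketch makes the tunable-epistasis idea concrete; the circular neighbor structure and uniform random contribution tables are one common construction, assumed here for illustration.

```python
import random

def make_nk_landscape(n, k, seed=0):
    """NK-landscape with tunable epistasis: each of the N variables
    depends on itself and its K circular neighbors, so each variable
    has a random lookup table of 2^(K+1) contributions in [0, 1)."""
    rng = random.Random(seed)
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(x):
        total = 0.0
        for i in range(n):
            # Index variable i's table with the bits (x_i, ..., x_{i+k}).
            idx = 0
            for j in range(k + 1):
                idx = 2 * idx + x[(i + j) % n]
            total += tables[i][idx]
        return total / n  # average contribution, in [0, 1)

    return fitness

f = make_nk_landscape(8, 2)
v = f((1, 0, 1, 1, 0, 0, 1, 0))
```

With K = 0 the function is linear and trivial for a hill-climber; raising K increases the number of dependencies per variable and, with it, the ruggedness of the landscape.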

7 Fitness and instantiation: deceptive functions. Functions built to be difficult for a hill-climber: almost linear except for a few points of the search space. Example: the trap-5 function, where a 5-bit block with u ones contributes 5 if u = 5 and 4 − u otherwise. Independent of the neighborhood!
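The standard trap-5 sub-function from the deception literature can be written directly; the consecutive-block decomposition below corresponds to the adjacent configuration used later in the talk.

```python
def trap5(block):
    """Trap-5 sub-function: u = number of ones in a 5-bit block.
    Returns 5 when u = 5 (the isolated global optimum) and 4 - u
    otherwise, so everywhere else the gradient points toward the
    deceptive attractor u = 0."""
    u = sum(block)
    return 5 if u == 5 else 4 - u

def trap5_fitness(x):
    """Additively decomposed fitness: trap-5 summed over consecutive
    blocks of 5 variables (len(x) assumed to be a multiple of 5)."""
    return sum(trap5(x[i:i + 5]) for i in range(0, len(x), 5))

v_opt = trap5_fitness((1,) * 10)   # global optimum: all ones
v_trap = trap5_fitness((0,) * 10)  # deceptive attractor: all zeros
```

Note why this defeats a bit-flip hill-climber: inside a block, every step from u = 4 down to u = 0 strictly improves the fitness, so the climber is pulled away from the optimum regardless of the neighborhood used within the block.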

8 Efficiency of evolutionary approaches. Two main strategies: - A priori definition of the operators - Discovery of the operators

9 Classic genetic algorithms (Holland 75). Exploration by sampling: population = sample. Evaluation and bias by selection. Generation of a new population by application of operators. Experimental studies of the efficiency and the behavior → various conclusions. Theoretical studies: convergence proofs in simple cases (Goldberg 87); introduction of the notion of schemata (Bagley 67); computation of the efficiency on deceptive problems (Goldberg 93).
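The sample-select-recombine loop can be sketched as a minimal Holland-style GA; the parameter values and the OneMax fitness are illustrative assumptions, not settings from the talk.

```python
import random

def genetic_algorithm(f, n, pop_size=40, generations=60, seed=1):
    """Minimal classic GA: tournament selection, one-point crossover,
    bit-flip mutation. A sketch, not a tuned solver."""
    rng = random.Random(seed)
    p_mut = 1.0 / n
    # Population = sample of the search space.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def tournament():
        # Evaluation and bias by selection.
        a, b = rng.sample(pop, 2)
        return a if f(a) >= f(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt                               # new population
    return max(pop, key=f)

best = genetic_algorithm(sum, 20)  # OneMax as a placeholder fitness
```

On a separable problem like OneMax this works well; the point of the surrounding slides is that on deceptive problems fixed crossover operators break the very building blocks they should preserve.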

10 Probabilistic model-building algorithms. Principle: discover the dependencies and the structure using the sample. First: model = univariate distribution of the variables (Mühlenbein and Voigt 96). BOA: Bayesian network (Pelikan and Goldberg 99). hBOA: Bayesian network + decision graphs + ecological niches (Pelikan and Goldberg 00). Complexity components: building of the network, population generation, population size, number of generations, global complexity. Validation on spin glasses, MAXSAT and deceptive functions.
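The earliest model mentioned, a univariate distribution of the variables, fits in a few lines in the spirit of Mühlenbein and Voigt's UMDA; the parameter values and the probability bounds are illustrative assumptions.

```python
import random

def umda(f, n, pop_size=60, select=30, generations=40, seed=2):
    """Univariate PMBGA sketch: the model is just the marginal frequency
    of each bit in the selected sample; no dependency structure is
    learned, which is exactly its limitation on deceptive problems."""
    rng = random.Random(seed)
    probs = [0.5] * n
    best = None
    for _ in range(generations):
        # Sample a population from the current model.
        pop = [[1 if rng.random() < probs[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        pop.sort(key=f, reverse=True)
        if best is None or f(pop[0]) > f(best):
            best = pop[0]
        sel = pop[:select]  # truncation selection
        # Re-estimate the marginals from the selected individuals,
        # bounded away from 0 and 1 to preserve some diversity.
        probs = [min(0.95, max(0.05, sum(x[i] for x in sel) / select))
                 for i in range(n)]
    return best

best = umda(sum, 20)  # OneMax as a placeholder fitness
```

BOA and hBOA replace the independent marginals with a Bayesian network learned from the selected sample, which is what allows them to capture the inter-variable dependencies a univariate model cannot.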

11 Still limitations… The convergence proofs do not take into account the strong (greedy) heuristic used to build the network. The quality measure is not computed at each step. High global computation cost. → What are the consequences for the real efficiency of hBOA?

12 Efficiency of evolutionary algorithms on deceptive problems with a high epistasis level. Deceptive and non-deceptive sub-functions; problems of size 120 and 100, with sub-functions of size between 4 and 12. Adjacent configuration: X_i = {x_(i-1)k+1, x_(i-1)k+2, …, x_(i-1)k+k} with i ∈ {1, …, m}. Non-adjacent configuration: X_i = {x_i, x_i+m, …, x_i+(k-1)m} with i ∈ {1, …, m}. Classic genetic algorithm with the non-adjacent configuration: 100% failure!

13 hBOA behavior. Adjacent configuration: when the epistasis level is above 6, the global maximum is never reached within 100 generations. Computation time becomes prohibitive: more than 60 hours with an epistasis of 12. High dependency on the structure of the deceptive function. Non-adjacent configuration: same results, so hBOA does not depend on the configuration. → Real capacity to detect and handle the dependencies, but the heuristics used do not allow demonstrated results to be obtained.

14 A simple PMBGA algorithm dedicated to deceptive problems. Tests of several measures on bivariate frequencies: raw frequencies, conditional probability, mutual information and statistical implication. Algorithm, taking the deceptive property into account: random generation of the initial population; tournament selection; computation of the measures for each couple of variables; building of a solution. Direct obtaining of the solution in each case (adjacent or not) in a few minutes! With mixed sub-functions (deceptive and non-deceptive): impossible to discover the solution. hBOA: no difference with mixed sub-functions.
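One of the bivariate measures mentioned, mutual information, can be sketched as a plug-in estimate on a selected sample; the synthetic sample below (bit 1 copies bit 0, bit 2 is independent noise) is a hypothetical illustration of how a dependency stands out.

```python
import math
import random

def mutual_information(sample, i, j):
    """Empirical mutual information (in nats) between binary variables
    i and j of a sample, via plug-in frequency estimates."""
    n = len(sample)
    p_i, p_j = [0.0, 0.0], [0.0, 0.0]
    p_ij = [[0.0, 0.0], [0.0, 0.0]]
    for x in sample:
        p_i[x[i]] += 1 / n
        p_j[x[j]] += 1 / n
        p_ij[x[i]][x[j]] += 1 / n
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            if p_ij[a][b] > 0:
                mi += p_ij[a][b] * math.log(p_ij[a][b] / (p_i[a] * p_j[b]))
    return mi

rng = random.Random(3)
sample = []
for _ in range(500):
    b = rng.randint(0, 1)
    sample.append((b, b, rng.randint(0, 1)))  # bit 1 = bit 0, bit 2 free
mi_linked = mutual_information(sample, 0, 1)
mi_free = mutual_information(sample, 0, 2)
```

In a PMBGA, such a score computed for every couple of variables of the selected population reveals which variables belong to the same sub-function, which is the dependency structure the slide's algorithm exploits.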

15 Conclusions on black-box problems. Easy if there is no dependency. As soon as the epistasis level is 2 or above and there is overlap between dependencies, the problem is NP-hard. In general the problem is NP-hard when the dependencies are not known. Two possible approaches. Expert: use expert knowledge about the problem to build a pertinent neighborhood and operators. Automatic: discovery of the dependencies by sampling and model building; limited by the number of dependencies and requiring new heuristics.

16 Further work: new algorithms. Definition of a more structured benchmark than the NK-landscape. New PMBGA algorithm: new probabilistic model; new quality measure for the model; more efficient heuristics for model building; learning of the number of dependencies. PMBGP algorithm: specialization for non-linear regression; use of dependency discovery.