Diversity Loss in General Estimation of Distribution Algorithms
J. L. Shapiro, PPSN (Parallel Problem Solving From Nature) '06
BISCuit 2nd EDA Seminar


Abstract
We identify a class of EDAs for which universal results on the rate of diversity loss can be derived. Membership in the class (SML-EDA) requires two restrictions:
- In each generation, the new probability model is built using only data sampled from the current probability model.
- Maximum likelihood is used to set the model parameters.
For the needle-in-a-haystack problem, an algorithm in this class will never find the optimum unless the population size grows exponentially in the number of variables.

EDA (general template)
Initialize the EDA probability model (start with a random population of M vectors)
Repeat
  Select N vectors using a selection method
  Learn a new probability model from the selected population
  Sample M vectors from the probability model
Until the stopping criteria are satisfied
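
As a concrete instance of this template, here is a minimal UMDA-style sketch in Python (an illustration only; the function names and the flat fitness function are not from the paper). Because the model is rebuilt by maximum likelihood from data sampled from the current model, with no other memory, it falls in the SML-EDA class defined on the next slide.

    import numpy as np

    def umda(fitness, L, M=100, N=50, generations=100, rng=None):
        """Minimal UMDA: a univariate EDA over binary strings of length L."""
        if rng is None:
            rng = np.random.default_rng(0)
        p = np.full(L, 0.5)                             # initial model: uniform on {0,1}^L
        for _ in range(generations):
            pop = (rng.random((M, L)) < p).astype(int)  # sample M vectors from the model
            f = np.apply_along_axis(fitness, 1, pop)
            selected = pop[np.argsort(f)[-N:]]          # truncation selection of N vectors
            p = selected.mean(axis=0)                   # maximum-likelihood model update
        return p

    # On a flat landscape every vector has equal fitness, so selection is random and
    # the model drifts toward fixation (p_i -> 0 or 1) purely through sampling noise.
    print(umda(lambda x: 0.0, L=20))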

SML-EDA
The analysis rests on three assumptions about the restricted class of EDAs:
1. Data in generation t can affect data in generation t+1 only through the probability model.
2. The parameters of the estimated model are chosen using maximum likelihood.
3. The sample size M and the selected-population size N are in a constant ratio, independent of the number of variables L.
The EDAs for which assumptions 1 and 2 hold are called SML-EDAs (Simple Maximum-Likelihood EDAs). This class includes BOA, MIMIC, FDA, UMDA, and others.

Some Definitions for Diversity Loss
- Empirical frequency (the fraction of the population in which component i takes the value A):
    gamma_i(A) = (1/N) * sum over mu of delta(x_i^mu, A),
  where x^mu is population member mu and N is the population size.
- One diversity measure: v, the trace of the empirical covariance matrix. Component by component, it is the difference between the frequency with which two population members both take the value A and the frequency expected if they took the value A independently; for binary variables,
    v = sum over i of gamma_i(1) * (1 - gamma_i(1)).
- v = 0 when the value at every component has fixated; v takes its maximum value (L/4 for binary variables) for a random population.
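
These definitions are easy to compute directly (a sketch, using the binary-variable form of v above):

    import numpy as np

    def empirical_frequency(pop):
        """gamma_i(1): fraction of the population with value 1 at each component i."""
        return pop.mean(axis=0)

    def diversity(pop):
        """v: trace of the empirical covariance matrix of a binary population."""
        g = empirical_frequency(pop)
        return np.sum(g * (1.0 - g))

    rng = np.random.default_rng(0)
    print(diversity(rng.integers(0, 2, size=(100, 20))))  # random population: close to L/4 = 5
    print(diversity(np.ones((100, 20), dtype=int)))       # fully fixated population: 0.0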

Diversity Loss on a Flat Fitness Landscape
Theorem 1. For EDAs in the class SML-EDA on a flat landscape, the expected value of v is reduced in each generation:
  E[v_{t+1}] = (1 - 1/N) E[v_t].
Because the parameters are set by maximum likelihood, the empirical variance is reduced by a factor (1 - 1/N) from that of the parent population. This is a universal expected diversity loss for all SML-EDAs: E[v_t] = (1 - 1/N)^t v_0 decays with a characteristic time approximately equal to the population size N when N is large.
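
Theorem 1 can be checked by simulation (a sketch; on a flat landscape selection does nothing, so each generation simply resamples from the maximum-likelihood model):

    import numpy as np

    rng = np.random.default_rng(1)
    L, N, T, runs = 20, 50, 30, 2000
    v = np.zeros(T)
    for _ in range(runs):
        p = np.full(L, 0.5)
        for t in range(T):
            pop = (rng.random((N, L)) < p).astype(int)  # sample from the current model
            p = pop.mean(axis=0)                        # maximum-likelihood update
            v[t] += np.sum(p * (1.0 - p))               # diversity of this generation
    v /= runs
    predicted = (L / 4) * (1 - 1 / N) ** (1 + np.arange(T))
    print(np.max(np.abs(v - predicted)))                # small: Monte Carlo noise only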

A Universal Bound for the Minimum Population Size on the Needle Problem
Needle-in-a-haystack problem: there is one special state (the needle) with a high fitness value; all other states share the same low fitness value.
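
A minimal encoding of this landscape (a sketch; the choice of needle state and the two fitness values are arbitrary):

    import numpy as np

    def needle_fitness(x, needle):
        """High fitness at the single needle state, the same low fitness everywhere else."""
        return 1.0 if np.array_equal(x, needle) else 0.0

    L = 20
    needle = np.ones(L, dtype=int)
    # Under a uniform model each sample hits the needle with probability 2^-L, so one
    # generation of M samples finds it with probability roughly M * 2^-L.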

Definitions: let P_inf be the probability that the needle is never sampled, and let T be the time at which the needle is first sampled.
Theorem 2. In the limit N -> infinity and L -> infinity such that N^2 2^{-L} -> 0, P_inf -> 1 for any EDA in SML-EDA searching on the Needle problem.
The population size must therefore grow at least as fast as 2^{L/2} (the square root of the search-space size) for the optimum to be found.
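
To get a feel for this bound (assuming the 2^{L/2} scaling above), the minimum useful population size grows very quickly with the number of variables:

    for L in (20, 40, 60, 80):
        print(L, int(2 ** (L / 2)))  # minimum population scale ~ 2^(L/2)
    # 20 -> 1024
    # 40 -> 1048576
    # 60 -> 1073741824
    # 80 -> 1099511627776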

Proof of Theorem 2
Lemma 1. Let t* be a time chosen large enough (via Theorem 1) that the expected diversity E[v_{t*}] is very small. If the needle has not been found after a time t > t*, the probability that the needle will never be found is greater than 1 - ε.

Proof of Lemma 1. From Theorem 1 we can bound the probability that the diversity v_t is still appreciable at time t. Choose t* large enough that, with probability at least 1 - ε, v_{t*} is so small that there must be fixation at every component except possibly one. If L - 1 components are fixed, the needle will only ever be sampled if they are fixed at the values found in the needle. We are not certain that v_t is appropriately small, only that it probably is, which is where ε enters; t* is then calculated by solving these conditions simultaneously.
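
The step from Theorem 1 to a high-probability statement is Markov's inequality (a sketch; here a denotes the diversity threshold below which at most one component can remain unfixated):

    P(v_t > a) <= E[v_t] / a = (v_0 / a) (1 - 1/N)^t <= (v_0 / a) e^{-t/N},

so taking t* on the order of N ln(v_0 / (a ε)) makes this probability at most ε, which is how the characteristic time of order N enters the proof.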

Lemma 2. The probability that the needle is not found after t* steps obeys a lower bound: each of the M vectors sampled per generation hits the needle with probability of order 2^{-L}, so
  P(T > t*) >= 1 - (expected number of needle hits among the M t* samples drawn up to time t*).
Proof: write the probability that SML-EDA does not find the needle in time t* as a product over the samples drawn; the stated bound follows by substituting the value of t* into this expression.

So, the probability of never finding the needle can be decomposed as
  P(never find the needle) >= P(T > t*) * P(never find the needle | T > t*).
Combining Lemmas 1 and 2 gives a lower bound made up of three terms. If, in the limit, N grows sufficiently slowly that the third term vanishes, then the probability of never finding the needle goes to 1.
Thus, if N 2^{-L/2} -> 0 as L -> infinity, the needle will never be found.

Expected Runtime for the Needle Problem and the Limits of Universality
Corollary 1. If the needle is found, the time to find it is bounded above by t* with probability at least 1 - ε.
Beyond this, convergence conditions will not be universal; they will be particular to the EDA.