1
Topologically Adaptive Stochastic Search
I.E. Lagaris & C. Voglis
Department of Computer Science, University of Ioannina, Greece
2
Global Optimization
The goal is to find the global minimum (or minima) of an objective function inside a bounded domain, i.e. to solve $\min_{x \in S \subset \mathbb{R}^{n}} f(x)$. One way to do that is to find all the local minima and choose the global one (or ones) among them. A popular method of that kind is the so-called “Multistart”.
3
Local Optimization
Let a point $x \in S$. Starting from x, a local search procedure L reaches a local minimum $y$. This may be denoted as $y = L(x)$. Multistart applies a local optimization procedure repeatedly.
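To make the notation concrete, here is a minimal Python sketch (not from the slides) of a local search procedure L; L-BFGS-B from SciPy is used purely as a placeholder for whatever local method is actually chosen:

```python
import numpy as np
from scipy.optimize import minimize

def L(f, x, bounds=None):
    """Local search procedure: starting from x, return the local minimizer y = L(x)."""
    res = minimize(f, np.asarray(x, dtype=float), method="L-BFGS-B", bounds=bounds)
    return res.x
```

Later slides stress that this choice matters, because the local search determines the shape of the regions of attraction.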
4
Regions of Attraction
For a local search procedure L, the region of attraction of the minimum $y_i$ is defined by $A_i = \{\, x \in S : L(x) = y_i \,\}$. Observe the dependence on L.
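A region of attraction has no closed form; membership can only be probed by actually running the local search. A small sketch building on the L wrapper above (the helper name and tolerance are illustrative):

```python
def in_region_of_attraction(f, x, y_i, tol=1e-6):
    """Empirical membership test: x belongs to A_i iff the local search started at x returns y_i."""
    return np.linalg.norm(L(f, x) - y_i) < tol
```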
5
“IDEAL” MultiStart (IMS)
This is a version in which every local minimum is found only once. It assumes that, from the position of a minimum, its region of attraction may be directly determined. Since this assumption is false, IMS is of no practical value; however, it offers a framework and a target.
6
Ideal MultiStart (IMS)
1. Initialize: Set k = 1. Sample $x \in S$, compute $y_1 = L(x)$ and its region of attraction $A_1$.
2. Terminate if a stopping rule applies.
3. Sample $x \in S$.
4. Main Step: If $x \notin \bigcup_{i=1}^{k} A_i$, compute $y_{k+1} = L(x)$, determine $A_{k+1}$, and set $k \leftarrow k+1$.
5. Iterate: Go back to step 2.
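A skeleton of these steps (a sketch, not the authors' code); the oracle in_union, which tests whether a point lies in the union of the known regions of attraction, is exactly the piece that is unavailable in practice, as the next slide explains:

```python
def ideal_multistart(f, sample, in_union, stopping_rule):
    """Ideal MultiStart: a local search is started only from points lying outside
    the union of the regions of attraction of the minima found so far."""
    minima = [L(f, sample())]             # step 1: k = 1, first minimum
    while not stopping_rule(minima):      # step 2: termination test
        x = sample()                      # step 3: sample a new point
        if not in_union(x, minima):       # step 4: main step
            minima.append(L(f, x))        #   every such x yields a new minimum
        # step 5: iterate (go back to step 2)
    return minima
```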
7
Making IMS practical
Since the regions of attraction of the minima discovered so far are not known, it is not possible to determine whether a point belongs to their union or not. However, such a probability may be estimated, based on several assumptions. Hence, a stochastic modification may render IMS useful.
8
Stochastic modification of the main step
4. Main Step: Estimate the probability p that $x \notin \bigcup_{i=1}^{k} A_i$. Apply a local search with probability p. If the recovered minimum $y = L(x)$ is new, then set $k \leftarrow k+1$ and $y_k = y$.
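A sketch of the modified main step (helper names are assumptions; estimate_prob stands for the probability model developed on the following slides):

```python
import numpy as np

def stochastic_main_step(f, x, minima, estimate_prob, rng, tol=1e-6):
    """Start a local search from x with probability p; record the minimizer if it is new."""
    p = estimate_prob(x, minima)                   # probability that x is outside all known A_i
    if rng.random() < p:
        y = L(f, x)
        if all(np.linalg.norm(y - m) > tol for m in minima):
            minima.append(y)                       # a new local minimum was discovered
    return minima
```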
9
The probability estimation
An overestimated probability (p → 1) increases the computational cost and drives the algorithm towards the standard Multistart. An underestimated probability only causes an iteration delay, without significant computational cost (only sampling, no local search).
10
Probability model
If a sample point is close to an already known minimizer, the probability that it does not belong to its region of attraction is small, and zero in the limit of complete coincidence. It follows that this probability should vanish as the distance $z_i = \|x - y_i\|$ from the minimizer tends to zero.
11
Probability model
Let $z_i = \|x - y_i\|$. If $z_i > R_i$, $R_i$ being a radius such that $A_i$ is contained in the sphere $S(y_i, R_i)$, then certainly $x \notin A_i$. Hence the probability equals one for $z_i > R_i$.
12
Probability model
$$p_i(z) = \begin{cases} q(z), & 0 \le z \le r_i \\ P_3(z), & r_i < z < R_i \\ 1, & z \ge R_i \end{cases}$$
where $z = \|x - y_i\|$, $q(z)$ is the quadratic branch that vanishes at $z = 0$, and $P_3(z)$ is a cubic polynomial chosen so that both the probability and its derivative are continuous.
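A sketch of this piecewise model; the quadratic branch $a\,(z/r)^2$ and the cubic middle branch (a Hermite interpolant matching value and slope at z = r and z = R) are consistent with, but not verbatim from, the slides:

```python
def prob_not_in_RA(z, a, r, R):
    """Probability that a point at distance z from a known minimizer y_i does NOT
    lie in its region of attraction (sketch of the piecewise model above)."""
    if z <= r:
        return a * (z / r) ** 2        # assumed quadratic branch: 0 at z = 0, a at z = r
    if z >= R:
        return 1.0                     # outside the enclosing sphere: certainly not in A_i
    # Cubic Hermite interpolation on [r, R]: value a and slope 2a/r at z = r,
    # value 1 and slope 0 at z = R, so the model and its derivative are continuous.
    t = (z - r) / (R - r)
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * a + h10 * (R - r) * (2 * a / r) + h01 * 1.0 + h11 * (R - r) * 0.0
```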
13
Defining the model parameters
There are three parameters to specify for each minimizer, namely a, r, and R. All of them depend on the associated minimum $y_i$ and on the iteration count k, i.e. $a = a_i(k)$, $r = r_i(k)$, and $R = R_i(k)$.
14
Interpreting the model parameters
$r_i$ is the distance below which the probability descends quadratically; it depends on the size of the “valley”. As the algorithm proceeds, $y_i$ may be discovered repeatedly. Every time it is rediscovered, $r_i$ is increased in order to adapt to the local geometry.
15
Interpreting the model parameters
$a_i$ is the probability at $z_i = r_i$. As $y_i$ keeps being rediscovered, $a_i$ should be decreased to render a future rediscovery less probable. If $l_i$ is the number of times $y_i$ has been discovered so far, then $a_i$ is set as a decreasing function of $l_i$.
16
Choosing the model parameters
$r_i$ is increased as the algorithm proceeds and is safeguarded against degenerate values, η being the machine precision. $R_i$ is updated every time a local search rediscovers $y_i$.
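The exact update formulas are not reproduced here; the sketch below only illustrates the bookkeeping described above, with hypothetical update rules (growing $r_i$, shrinking $a_i$ with the rediscovery count $l_i$, and enlarging $R_i$ when $y_i$ is reached from farther away):

```python
class MinimizerRecord:
    """Bookkeeping for one discovered minimizer y_i; the update rules are illustrative only."""

    def __init__(self, y, z_start, a0=0.9):
        self.y = y                      # the minimizer itself
        self.l = 1                      # number of times y has been discovered
        self.r = 0.5 * z_start          # hypothetical initial radius of the quadratic branch
        self.R = z_start                # radius of a sphere assumed to contain A_i
        self.a = a0                     # probability at z = r

    def rediscovered(self, z_start):
        """Called when a local search started at distance z_start finds y again."""
        self.l += 1
        self.r = min(1.1 * self.r, self.R)   # hypothetical growth rule for r_i
        self.a = self.a / self.l             # hypothetical decrease of a_i with l_i
        self.R = max(self.R, z_start)        # enlarge the enclosing sphere if needed
```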
17
Gradient Information
In the case where $d = y_i - x$ is a descent direction, the probability is reduced by a factor $p_g \in [0, 1]$. $p_g$ is zero when d is collinear with the gradient $\nabla f(x)$, and one when it is perpendicular to it. This factor is a function of the angle between d and the gradient, and it is used only when $z_i \in [0.7\,r_i, 0.9\,r_i]$.
18
Ascending Gradient Rule
If the direction $d = y_i - x$ is not a descent direction at x, i.e. if $d^{T}\nabla f(x) \ge 0$, it signals that x is not “attracted” towards $y_i$, i.e. x does not fall inside its region of attraction. In this case no reduction is applied on account of $y_i$.
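A sketch of both gradient rules; the exact expression for $p_g$ is not shown on the slides, so the squared sine of the angle between d and the gradient is used below purely as an illustrative assumption (it is 0 for collinear vectors and 1 for perpendicular ones):

```python
import numpy as np

def gradient_factor(x, y_i, grad_fx, z, r):
    """Multiplicative factor applied to p_i using gradient information at x (sketch)."""
    d = y_i - x
    slope = np.dot(d, grad_fx)
    if slope >= 0:
        # Ascending gradient rule: x is not attracted towards y_i,
        # so no reduction is applied on account of y_i.
        return 1.0
    denom = np.dot(d, d) * np.dot(grad_fx, grad_fx)
    if denom == 0.0 or not (0.7 * r <= z <= 0.9 * r):
        return 1.0
    cos2 = slope**2 / denom
    return 1.0 - cos2   # assumed p_g: 0 when d is collinear with the gradient, 1 when perpendicular
```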
19
Asymptotic guarantee
The previous gradient rule, together with the model s(x), guarantees that asymptotically all minima will be found with probability one. Hence the global minimum will surely be recovered asymptotically.
20
Probability
Having estimated the individual probabilities $p_i$ that x does not belong to $A_i$, the probability that x lies outside all of the known regions of attraction can ideally be estimated as the product $\prod_{i=1}^{k} p_i(z_i)$. However, the product creates a problem, illustrated next.
21
[Figure: a sample point x lying inside two spheres centered at $y_i$ and $y_j$, while the local minimum that actually attracts x has not been discovered yet.]
The probability at x is reduced since it falls inside two spheres, centered at $y_i$ and $y_j$. Note that x will lead to a new minimum, so ideally its probability should have been high. This effect may be amplified in many dimensions.
22
Estimating the probability
To circumvent this problem we consider the estimate $p = p_{cn}(z_{cn})$, where the index “cn” stands for Closest Neighbor. Namely, we take into account only the closest minimizer.
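Combining the earlier sketches, the closest-neighbor estimate can be written as follows (records are MinimizerRecord-like objects carrying $y_i$, $a_i$, $r_i$, $R_i$; all names come from the sketches above, not from the slides):

```python
import numpy as np

def estimate_prob(x, records, grad_fx=None):
    """Closest-neighbor estimate of the probability that x lies outside every
    known region of attraction: only the nearest discovered minimizer is used."""
    if not records:
        return 1.0
    rec = min(records, key=lambda m: np.linalg.norm(x - m.y))   # closest neighbor
    z = np.linalg.norm(x - rec.y)
    p = prob_not_in_RA(z, rec.a, rec.r, rec.R)
    if grad_fx is not None:
        p *= gradient_factor(x, rec.y, grad_fx, z, rec.r)       # optional gradient information
    return p
```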
23
Local nature of the probability
The probability model is based on distances from the discovered minima. It is implicitly assumed that the closer a point is to a minimum, the greater the probability that it falls inside its region of attraction. This is not true for all local search procedures L.
24
Local search properties
Regions of attraction should contain the minimum and be contiguous. Ideally the regions of attraction should resemble the ones produced by a descent method with an infinitesimal step. Hence the local search should be carefully chosen: the local search dictates the shape of the regions of attraction.
25
Desired local search: Simplex, with a small initial opening.
26
Undesired local search: BFGS with a strong Wolfe line search.
27
Test functions: Ackley, Rastrigin, Griewangk, Shubert (plots: http://www.geatbx.com/docu/fcnindex-msh_f8_8-21.gif)
28
Rotated Quadratics
This test function is constructed so that its contours form non-convex domains. (C. Voglis, private communication)
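The exact construction is not reproduced on the slide; as one plausible illustration of the stated property, the pointwise minimum of several rotated quadratic bowls has level sets that are unions of ellipses, hence non-convex contour domains (this sketch is an assumption, not the original function):

```python
import numpy as np

def rotated_quadratics(x, centers, rotations, scales):
    """Illustrative stand-in: the pointwise minimum of rotated anisotropic quadratic
    bowls, whose contours bound non-convex (union-of-ellipse) domains."""
    x = np.asarray(x, dtype=float)
    values = []
    for c, Q, s in zip(centers, rotations, scales):
        d = Q @ (x - np.asarray(c, dtype=float))   # rotate into the bowl's principal axes
        values.append(float(np.dot(s * d, d)))     # anisotropic quadratic value
    return min(values)
```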
29
Preliminary results
30
Parallel processing
The described process uses a single sample point and performs a local search with some probability. If many points are sampled, multiple local searches may be performed in parallel, yielding a significant performance gain.
31
Parallel processing gain
Note, however, that the probability estimation will then be based on data that are updated in batches. This update delay is significant only in the first few rounds. A further gain is possible by using a clustering technique before the local searches are applied.
32
Clustering filter
1. Sample M points.
2. Estimate, for each point, the probability to start a local search (LS).
3. Decide from which points an LS will start.
4. Apply a clustering technique to these points and start an LS from only one point of each cluster.
5. Send the selected points to the available processors that will perform the LS.
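A minimal sketch of one such round, combining the earlier helpers; the greedy distance-based clustering and the process pool are illustrative choices, not the authors' implementation:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def clustering_filter_round(f, sample, records, estimate_prob, M, d_min, rng, n_workers=4):
    """One parallel round: sample M points, keep those that pass the probability test,
    thin them so that only one start point per cluster survives, then run the
    local searches in parallel."""
    points = [sample() for _ in range(M)]
    starters = [x for x in points if rng.random() < estimate_prob(x, records)]

    # Greedy distance-based clustering: a point is kept only if it lies at least
    # d_min away from every representative selected so far (one start per cluster).
    representatives = []
    for x in starters:
        if all(np.linalg.norm(x - c) >= d_min for c in representatives):
            representatives.append(x)

    # Perform the selected local searches in parallel on the available workers.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        minima = list(pool.map(L, [f] * len(representatives), representatives))
    return representatives, minima
```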