1
Topologically Adaptive Stochastic Search
I.E. Lagaris & C. Voglis
Department of Computer Science, University of Ioannina, Greece
2
Global Optimization
The goal is to find the global minimum (or minima) of an objective function inside a bounded domain, i.e. to solve $\min_{x \in S \subset \mathbb{R}^{n}} f(x)$. One way to do that is to find all the local minima and choose the global one (or ones) among them. A popular method of that kind is the so-called “Multistart”.
3
Local Optimization
Let a point $x \in S$. Starting from x, a local search procedure L reaches a local minimum $y$. This may be denoted as $y = L(x)$. Multistart applies a local optimization procedure repeatedly.
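To make the notation concrete, here is a minimal Python sketch (not from the slides) of a local search procedure L; L-BFGS-B from SciPy is used purely as a placeholder for whatever local method is actually chosen:

```python
import numpy as np
from scipy.optimize import minimize

def L(f, x, bounds=None):
    """Local search procedure: starting from x, return the local minimizer y = L(x)."""
    res = minimize(f, np.asarray(x, dtype=float), method="L-BFGS-B", bounds=bounds)
    return res.x
```

Later slides stress that this choice matters, because the local search determines the shape of the regions of attraction.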
4
Regions of Attraction
For a local search procedure L, the region of attraction of the minimum $y_i$ is defined by $A_i = \{\, x \in S : L(x) = y_i \,\}$. Observe the dependence on L.
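A region of attraction has no closed form; membership can only be probed by actually running the local search. A small sketch building on the L wrapper above (the helper name and tolerance are illustrative):

```python
def in_region_of_attraction(f, x, y_i, tol=1e-6):
    """Empirical membership test: x belongs to A_i iff the local search started at x returns y_i."""
    return np.linalg.norm(L(f, x) - y_i) < tol
```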
5
“IDEAL” MultiStart (IMS)
This is a version in which every local minimum is found only once. It assumes that, from the position of a minimum, its region of attraction may be directly determined. Since this assumption is false, IMS is of no practical value; however, it offers a framework and a target.
6
Ideal MultiStart (IMS)
1. Initialize: Set k = 1. Sample $x \in S$, compute $y_1 = L(x)$ and its region of attraction $A_1$.
2. Terminate if a stopping rule applies.
3. Sample $x \in S$.
4. Main Step: If $x \notin \bigcup_{i=1}^{k} A_i$, compute $y_{k+1} = L(x)$, determine $A_{k+1}$, and set $k \leftarrow k+1$.
5. Iterate: Go back to step 2.
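A skeleton of these steps (a sketch, not the authors' code); the oracle in_union, which tests whether a point lies in the union of the known regions of attraction, is exactly the piece that is unavailable in practice, as the next slide explains:

```python
def ideal_multistart(f, sample, in_union, stopping_rule):
    """Ideal MultiStart: a local search is started only from points lying outside
    the union of the regions of attraction of the minima found so far."""
    minima = [L(f, sample())]             # step 1: k = 1, first minimum
    while not stopping_rule(minima):      # step 2: termination test
        x = sample()                      # step 3: sample a new point
        if not in_union(x, minima):       # step 4: main step
            minima.append(L(f, x))        #   every such x yields a new minimum
        # step 5: iterate (go back to step 2)
    return minima
```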
7
Making IMS practical
Since the regions of attraction of the minima discovered so far are not known, it is not possible to determine whether a point belongs to their union or not. However, such a probability may be estimated, based on several assumptions. Hence, a stochastic modification may render IMS useful.
8
Stochastic modification of the main step
4. Main Step: Estimate the probability p that $x \notin \bigcup_{i=1}^{k} A_i$. Apply a local search with probability p. If the recovered minimum $y = L(x)$ is new, then set $k \leftarrow k+1$ and $y_k = y$.
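A sketch of the modified main step (helper names are assumptions; estimate_prob stands for the probability model developed on the following slides):

```python
import numpy as np

def stochastic_main_step(f, x, minima, estimate_prob, rng, tol=1e-6):
    """Start a local search from x with probability p; record the minimizer if it is new."""
    p = estimate_prob(x, minima)                   # probability that x is outside all known A_i
    if rng.random() < p:
        y = L(f, x)
        if all(np.linalg.norm(y - m) > tol for m in minima):
            minima.append(y)                       # a new local minimum was discovered
    return minima
```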
9
The probability estimation
An overestimated probability (p → 1) increases the computational cost and drives the algorithm towards the standard Multistart. An underestimated probability only causes an iteration delay, without significant computational cost (only sampling, no local search).
10
Probability model
If a sample point is close to an already known minimizer, the probability that it does not belong to its region of attraction is small, and zero in the limit of complete coincidence. It follows that this probability should vanish as the distance $z_i = \|x - y_i\|$ from the minimizer tends to zero.
11
Probability model
Let $z_i = \|x - y_i\|$. If $z_i > R_i$, $R_i$ being a radius such that $A_i$ is contained in the sphere $S(y_i, R_i)$, then certainly $x \notin A_i$. Hence the probability equals one for $z_i > R_i$.
12
Probability model
$$p_i(z) = \begin{cases} q(z), & 0 \le z \le r_i \\ P_3(z), & r_i < z < R_i \\ 1, & z \ge R_i \end{cases}$$
where $z = \|x - y_i\|$, $q(z)$ is the quadratic branch that vanishes at $z = 0$, and $P_3(z)$ is a cubic polynomial chosen so that both the probability and its derivative are continuous.
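A sketch of this piecewise model; the quadratic branch $a\,(z/r)^2$ and the cubic middle branch (a Hermite interpolant matching value and slope at z = r and z = R) are consistent with, but not verbatim from, the slides:

```python
def prob_not_in_RA(z, a, r, R):
    """Probability that a point at distance z from a known minimizer y_i does NOT
    lie in its region of attraction (sketch of the piecewise model above)."""
    if z <= r:
        return a * (z / r) ** 2        # assumed quadratic branch: 0 at z = 0, a at z = r
    if z >= R:
        return 1.0                     # outside the enclosing sphere: certainly not in A_i
    # Cubic Hermite interpolation on [r, R]: value a and slope 2a/r at z = r,
    # value 1 and slope 0 at z = R, so the model and its derivative are continuous.
    t = (z - r) / (R - r)
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * a + h10 * (R - r) * (2 * a / r) + h01 * 1.0 + h11 * (R - r) * 0.0
```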
13
Defining the model parameters
There are three parameters to specify for each minimizer, namely a, r, and R. All of them depend on the associated minimum $y_i$ and on the iteration count k, i.e. $a = a_i(k)$, $r = r_i(k)$, and $R = R_i(k)$.
14
Interpreting the model parameters
$r_i$ is the distance below which the probability descends quadratically; it depends on the size of the “valley”. As the algorithm proceeds, $y_i$ may be discovered repeatedly. Every time it is rediscovered, $r_i$ is increased in order to adapt to the local geometry.
15
Interpreting the model parameters
$a_i$ is the probability at $z_i = r_i$. As $y_i$ keeps being rediscovered, $a_i$ should be decreased to render a future rediscovery less probable. If $l_i$ is the number of times $y_i$ has been discovered so far, then $a_i$ is set as a decreasing function of $l_i$.
16
Choosing the model parameters
$r_i$ is increased as the algorithm proceeds and is safeguarded against degenerate values, η being the machine precision. $R_i$ is updated every time a local search rediscovers $y_i$.
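The exact update formulas are not reproduced here; the sketch below only illustrates the bookkeeping described above, with hypothetical update rules (growing $r_i$, shrinking $a_i$ with the rediscovery count $l_i$, and enlarging $R_i$ when $y_i$ is reached from farther away):

```python
class MinimizerRecord:
    """Bookkeeping for one discovered minimizer y_i; the update rules are illustrative only."""

    def __init__(self, y, z_start, a0=0.9):
        self.y = y                      # the minimizer itself
        self.l = 1                      # number of times y has been discovered
        self.r = 0.5 * z_start          # hypothetical initial radius of the quadratic branch
        self.R = z_start                # radius of a sphere assumed to contain A_i
        self.a = a0                     # probability at z = r

    def rediscovered(self, z_start):
        """Called when a local search started at distance z_start finds y again."""
        self.l += 1
        self.r = min(1.1 * self.r, self.R)   # hypothetical growth rule for r_i
        self.a = self.a / self.l             # hypothetical decrease of a_i with l_i
        self.R = max(self.R, z_start)        # enlarge the enclosing sphere if needed
```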
17
Gradient Information
In the case where $d = y_i - x$ is a descent direction, the probability is reduced by a factor $p_g \in [0, 1]$. $p_g$ is zero when d is collinear with the gradient $\nabla f(x)$, and one when it is perpendicular to it. This factor is a function of the angle between d and the gradient, and it is used only when $z_i \in [0.7\,r_i, 0.9\,r_i]$.
18
Ascending Gradient Rule
If the direction $d = y_i - x$ is not a descent direction at x, i.e. if $d^{T}\nabla f(x) \ge 0$, it signals that x is not “attracted” towards $y_i$, i.e. x does not fall inside its region of attraction. In this case no reduction is applied on account of $y_i$.
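A sketch of both gradient rules; the exact expression for $p_g$ is not shown on the slides, so the squared sine of the angle between d and the gradient is used below purely as an illustrative assumption (it is 0 for collinear vectors and 1 for perpendicular ones):

```python
import numpy as np

def gradient_factor(x, y_i, grad_fx, z, r):
    """Multiplicative factor applied to p_i using gradient information at x (sketch)."""
    d = y_i - x
    slope = np.dot(d, grad_fx)
    if slope >= 0:
        # Ascending gradient rule: x is not attracted towards y_i,
        # so no reduction is applied on account of y_i.
        return 1.0
    denom = np.dot(d, d) * np.dot(grad_fx, grad_fx)
    if denom == 0.0 or not (0.7 * r <= z <= 0.9 * r):
        return 1.0
    cos2 = slope**2 / denom
    return 1.0 - cos2   # assumed p_g: 0 when d is collinear with the gradient, 1 when perpendicular
```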
19
Asymptotic guarantee
The previous gradient rule, together with the model s(x), guarantees that asymptotically all minima will be found with probability one. Hence the global minimum will surely be recovered asymptotically.
20
Probability
Having estimated the individual probabilities $p_i$ that x does not belong to $A_i$, the probability that x lies outside all of the known regions of attraction can ideally be estimated as the product $\prod_{i=1}^{k} p_i(z_i)$. However, the product creates a problem, illustrated next.
21
[Figure: a sample point x lying inside two spheres centered at $y_i$ and $y_j$, while the local minimum that actually attracts x has not been discovered yet.]
The probability at x is reduced since it falls inside two spheres, centered at $y_i$ and $y_j$. Note that x will lead to a new minimum, so ideally its probability should have been high. This effect may be amplified in many dimensions.
22
Estimating the probability
To circumvent this problem we consider the estimate $p = p_{cn}(z_{cn})$, where the index “cn” stands for Closest Neighbor. Namely, we take into account only the closest minimizer.
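Combining the earlier sketches, the closest-neighbor estimate can be written as follows (records are MinimizerRecord-like objects carrying $y_i$, $a_i$, $r_i$, $R_i$; all names come from the sketches above, not from the slides):

```python
import numpy as np

def estimate_prob(x, records, grad_fx=None):
    """Closest-neighbor estimate of the probability that x lies outside every
    known region of attraction: only the nearest discovered minimizer is used."""
    if not records:
        return 1.0
    rec = min(records, key=lambda m: np.linalg.norm(x - m.y))   # closest neighbor
    z = np.linalg.norm(x - rec.y)
    p = prob_not_in_RA(z, rec.a, rec.r, rec.R)
    if grad_fx is not None:
        p *= gradient_factor(x, rec.y, grad_fx, z, rec.r)       # optional gradient information
    return p
```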
23
Local nature of the probability
The probability model is based on distances from the discovered minima. It is implicitly assumed that the closer a point is to a minimum, the greater the probability that it falls inside its region of attraction. This is not true for all local search procedures L.
24
Local search properties
Regions of attraction should contain the minimum and be contiguous. Ideally the regions of attraction should resemble the ones produced by a descent method with an infinitesimal step. Hence the local search should be carefully chosen: the local search dictates the shape of the regions of attraction.
25
Desired local search: Simplex, with a small initial opening.
26
Undesired local search: BFGS with a strong Wolfe line search.
27
Test functions: Ackley, Rastrigin, Griewangk, Shubert (plots: http://www.geatbx.com/docu/fcnindex-msh_f8_8-21.gif)
28
Rotated Quadratics
This test function is constructed so that its contours form non-convex domains. (C. Voglis, private communication)
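The exact construction is not reproduced on the slide; as one plausible illustration of the stated property, the pointwise minimum of several rotated quadratic bowls has level sets that are unions of ellipses, hence non-convex contour domains (this sketch is an assumption, not the original function):

```python
import numpy as np

def rotated_quadratics(x, centers, rotations, scales):
    """Illustrative stand-in: the pointwise minimum of rotated anisotropic quadratic
    bowls, whose contours bound non-convex (union-of-ellipse) domains."""
    x = np.asarray(x, dtype=float)
    values = []
    for c, Q, s in zip(centers, rotations, scales):
        d = Q @ (x - np.asarray(c, dtype=float))   # rotate into the bowl's principal axes
        values.append(float(np.dot(s * d, d)))     # anisotropic quadratic value
    return min(values)
```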
29
Preliminary results
30
Parallel processing
The described process uses a single sample point and performs a local search with some probability. If many points are sampled, multiple local searches may be performed in parallel, yielding a significant performance gain.
31
Parallel processing gain
Note, however, that the probability estimation will then be based on data that are updated in batches. This update delay is significant only in the first few rounds. A further gain is possible by using a clustering technique before the local searches are applied.
32
Clustering filter
1. Sample M points.
2. Estimate, for each point, the probability to start a local search (LS).
3. Decide from which points an LS will start.
4. Apply a clustering technique to these points and start an LS from only one point of each cluster.
5. Send the selected points to the available processors that will perform the LS.
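A minimal sketch of one such round, combining the earlier helpers; the greedy distance-based clustering and the process pool are illustrative choices, not the authors' implementation:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def clustering_filter_round(f, sample, records, estimate_prob, M, d_min, rng, n_workers=4):
    """One parallel round: sample M points, keep those that pass the probability test,
    thin them so that only one start point per cluster survives, then run the
    local searches in parallel."""
    points = [sample() for _ in range(M)]
    starters = [x for x in points if rng.random() < estimate_prob(x, records)]

    # Greedy distance-based clustering: a point is kept only if it lies at least
    # d_min away from every representative selected so far (one start per cluster).
    representatives = []
    for x in starters:
        if all(np.linalg.norm(x - c) >= d_min for c in representatives):
            representatives.append(x)

    # Perform the selected local searches in parallel on the available workers.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        minima = list(pool.map(L, [f] * len(representatives), representatives))
    return representatives, minima
```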