It is easy to find the minimum of if you are human. For a computer this takes more work:
The animation shows a simplified form of the Nelder-Mead algorithm: a simplex-based minimization algorithm that does not use any derivatives of . Such algorithms are easy to come up with for functions of one variable, e.g., the bisection method. But how to minimize a function of two variables?
A natural way to look for minimum is to slide along the graph in the direction opposite to ; this is the method of steepest descent. But for computational purposes we need a discrete process, not a continuous one. Instead of thinking of a point sliding down, think of a small tetrahedron tumbling down the graph of ; this is a discrete process of flips and flops. The process amounts to the triangle of contact being replaced by another triangle with an adjacent side. The triangle is flipped in the direction away from the highest vertex.
This is already a reasonable minimization algorithm: begin with a triangle ; find the values of at the vertices of ; reflect the triangle away from the highest value; if the reflected point has a smaller value, move there; otherwise stop.
But there’s a problem: the size of triangle never changes in this process. If is large, we won’t know where the minimum is even if eventually covers it. If is small, it will be moving in tiny steps.
Perhaps, instead of stopping when reflection does not work anymore, we should reduce the size of . It is natural to contract it toward the “best” vertex (the one with the smallest value of ), replacing two other vertices with the midpoints of corresponding sides. Then repeat. The stopping condition can be the values of at all vertices becoming very close to one another.
This looks clever, but the results are unspectacular. The algorithm is prone to converge to a non-stationary point where just by an accident the triangle attains a nearly horizontal position. The problem is that the triangle, while changing its size, does not change its shape to fit the geometry of the graph of .
The Nelder-Mead algorithm adapts the shape of the triangle by including the possibility of stretching while flipping. Thus, the triangle can grow smaller and larger, moving faster when the path is clear, or becoming very thin to fit into a narrow passage. Here is a simplified description:
- Begin with some triangle .
- Evaluate the function at each vertex. Call the vertices where is the worst one (the largest value of ) and is the best.
- Reflect about the midpoint of the good side . Let be the reflected point.
- If , then we consider moving even further in the same direction, extending the line beyond by half the length of . Choose between and based on where is smaller, and make the chosen point a new vertex of our triangle, replacing .
- Else, do not reflect and instead shrink the triangle toward .
- Repeat, stopping when we either exceed the number of iterations or all values of at the vertices of triangle become nearly equal.
(The full version of the Nelder-Mead algorithm also includes the comparison of with , and also involves trying a point inside the triangle.)
This is Rosenbrock’s function , one of standard torture tests for minimization algorithms. Its graph has a narrow valley along the parabola . At the bottom of the valley, the incline toward the minimum is relatively small, compared to steep walls surrounding the valley. The steepest descent trajectory quickly reaches the valley but dramatically slows down there, moving in tiny zig-zagging steps.
The algorithm described above gets within of the minimum in 65 steps.
In conclusion, Scilab code with this algorithm.
x = -0.4:0.1:1.6; y = -2:0.1:1.4 // viewing window [X,Y] = meshgrid(x,y); contour(x,y,f(X,Y)',30) // contour plot plot(,,'r+') // minimum point tol = 10^(-6) n = 0 T = [0, -1.5 ; 1.4, -1.5; 1.5, 0.5] // initial triangle for i=1:3 values(i) = f(T(i,1), T(i,2)) end while (%T) xpoly(T(:,1),T(:,2),'lines',1) // draw the triangle [values, index] = gsort(values) // sort the values T = T(index,:) if values(1)-values(3) < tol // close enough? mfprintf(6, "Minimum at (%.3f, %.3f)",T(3,1),T(3,2)) break end R = T(2,:) + T(3,:) - T(1,:) // reflected fR = f(R(1),R(2)) if fR < values(3) E = 1.5*T(2,:) + 1.5*T(3,:) - 2*T(1,:) // extended fE = f(E(1),E(2)) if fE < fR T(1,:)=E; values(1)=fE // pick extended else T(1,:)=R; values(1)=fR // pick reflected end else for i=1:2 T(i,:) = (T(i,:)+T(3,:))/2 // shrink values(i) = f(T(i,1), T(i,2)) end end n = n+1 if n >= 200 disp('Failed to converge'); break // too bad end end