The Nelder-Mead minimization algorithm

It is easy to find the minimum of {f(x,y) = x^2+16y^2} if you are human. For a computer this takes more work:

Search for the minimum of x^2+16y^2
Search for the minimum of x^2+16y^2

The animation shows a simplified form of the Nelder-Mead algorithm: a simplex-based minimization algorithm that does not use any derivatives of {f}. Such algorithms are easy to come up with for functions of one variable, e.g., the bisection method. But how to minimize a function of two variables?

A natural way to look for minimum is to slide along the graph in the direction opposite to {\nabla f}; this is the method of steepest descent. But for computational purposes we need a discrete process, not a continuous one. Instead of thinking of a point sliding down, think of a small tetrahedron tumbling down the graph of {f}; this is a discrete process of flips and flops. The process amounts to the triangle of contact being replaced by another triangle with an adjacent side. The triangle is flipped in the direction away from the highest vertex.

This is already a reasonable minimization algorithm: begin with a triangle {T}; find the values of {f} at the vertices of {T}; reflect the triangle away from the highest value; if the reflected point {R} has a smaller value, move there; otherwise stop.

But there’s a problem: the size of triangle never changes in this process. If {T} is large, we won’t know where the minimum is even if {T} eventually covers it. If {T} is small, it will be moving in tiny steps.

Perhaps, instead of stopping when reflection does not work anymore, we should reduce the size of {T}. It is natural to contract it toward the “best” vertex (the one with the smallest value of {f}), replacing two other vertices with the midpoints of corresponding sides. Then repeat. The stopping condition can be the values of {f} at all vertices becoming very close to one another.

This looks clever, but the results are unspectacular. The algorithm is prone to converge to a non-stationary point where just by an accident the triangle attains a nearly horizontal position. The problem is that the triangle, while changing its size, does not change its shape to fit the geometry of the graph of {f}.

The Nelder-Mead algorithm adapts the shape of the triangle by including the possibility of stretching while flipping. Thus, the triangle can grow smaller and larger, moving faster when the path is clear, or becoming very thin to fit into a narrow passage. Here is a simplified description:

  • Begin with some triangle {T}.
  • Evaluate the function {f} at each vertex. Call the vertices {W,G,B} where {W} is the worst one (the largest value of {f}) and {B} is the best.
  • Reflect {W} about the midpoint of the good side {GB}. Let {R} be the reflected point.
  • If {f(R)<f(B)}, then we consider moving even further in the same direction, extending the line {WR} beyond {R} by half the length of {WR}. Choose between {R} and {E} based on where {f} is smaller, and make the chosen point a new vertex of our triangle, replacing {W}.
  • Else, do not reflect and instead shrink the triangle toward {B}.
  • Repeat, stopping when we either exceed the number of iterations or all values of {f} at the vertices of triangle become nearly equal.

(The full version of the Nelder-Mead algorithm also includes the comparison of {R} with {G}, and also involves trying a point inside the triangle.)

Rosenbrock's function
Rosenbrock’s function

This is Rosenbrock’s function {f(x,y)=100(x^2-y)^2 + (x-1)^2}, one of standard torture tests for minimization algorithms. Its graph has a narrow valley along the parabola {y=x^2}. At the bottom of the valley, the incline toward the minimum {(1,1)} is relatively small, compared to steep walls surrounding the valley. The steepest descent trajectory quickly reaches the valley but dramatically slows down there, moving in tiny zig-zagging steps.

The algorithm described above gets within {0.001} of the minimum in 65 steps.

Minimizing Rosenbrock's function
Minimizing Rosenbrock’s function

In conclusion, Scilab code with this algorithm.

x = -0.4:0.1:1.6; y = -2:0.1:1.4          // viewing window  
[X,Y] = meshgrid(x,y); contour(x,y,f(X,Y)',30)  // contour plot
plot([1],[1],'r+')                             // minimum point
tol = 10^(-6)
n = 0
T = [0, -1.5 ; 1.4, -1.5; 1.5, 0.5]        //  initial triangle
for i=1:3 
    values(i) = f(T(i,1), T(i,2))
while (%T) 
    xpoly(T(:,1),T(:,2),'lines',1)         // draw the triangle  
    [values, index] = gsort(values)          // sort the values 
    T = T(index,:)       
    if values(1)-values(3) < tol            // close enough?
        mfprintf(6, "Minimum at (%.3f, %.3f)",T(3,1),T(3,2))
    R = T(2,:) + T(3,:) - T(1,:)             // reflected  
    fR = f(R(1),R(2))
    if fR < values(3)                         
        E = 1.5*T(2,:) + 1.5*T(3,:) - 2*T(1,:)  // extended  
        fE = f(E(1),E(2))
        if fE < fR 
            T(1,:)=E; values(1)=fE     // pick extended 
            T(1,:)=R; values(1)=fR     // pick reflected 
        for i=1:2
            T(i,:) = (T(i,:)+T(3,:))/2      // shrink
            values(i) = f(T(i,1), T(i,2))      
    n = n+1
    if n >= 200
        disp('Failed to converge'); break    //  too bad 

Life after Google Reader, day 2

Having exported the Shared/Starred items from Google Reader, I considered several ways of keeping them around, and eventually decided to put them right here (see “Reading Lists” in the main navigation bar). My reasons were:

  • This site is unlikely to disappear overnight, since I am paying WordPress to host it.
  • Simple, structured HTML is not much harder for computers to parse than JSON, and is easier for humans to use.
  • Someone else might be interested in the stuff that I find interesting.

First step was to trim down shared.json and starred.json, and flatten them into a comma-separated list. For this I used Google Refine, which, coincidentally, is another tool that Google no longer develops. (It is being continued by volunteers as an open-source project OpenRefine). The only attributes I kept were

  • Title
  • Hyperlink
  • Author(s)
  • Feed name
  • Timestamp
  • Summary

Springer feeds gave me more headache than anything else in this process. They bundled the authors, abstract, and other information into a single string, within which they were divided only by presentational markup. I cleaned up some of Springer-published entries, but also deleted many more than I kept. “Sorry, folks – your paper looks interesting, but it’s in Springer.”

Compared to the clean-up, conversion to HTML (in a Google Spreadsheet) took no time at all. I split the list into chunks 2008-09, 2010-11, 2012, and 2013, the latter being updated periodically. Decided not to do anything about LaTeX markup; much of it would not compile anyway.

Last question: how to keep the 2013 list up-to-date, given that Netvibes offers no export feature for marked items? jQuery to the rescue:

Now with a freehand circle!
Now with a freehand circle!

The script parses the contents of Read Later tab, extracts the items mentioned above, and wraps them into the same HTML format as the existing reading list.

$(document).ready(function() {
  function exportSaved() {
  var html='';
  $('.hentry.readlater').each(function() {
    var title = $('.entry-innerTitle', this).html().split('<span')[0];
    var hyperref = $('.entry-innerTitle', this).attr('href');
    var summary = ($('.entry-content', this).html()+'').replace('\n',' ');
    var timestamp = $('.entry-innerPublished', this).attr('time');
    var feed = $('.entry-innerFeedName', this).html();
    var author = ($('.author', this).html()+'').slice(5);
    var date = new Date(timestamp*1000);
    var month = ('0'+(date.getMonth()+1)).slice(-2);
    var day = ('0'+date.getDate()).slice(-2);
    var readableDate = date.getFullYear()+'-'+month+'-'+day;
        html = html+"<div class='list-item'><h4 class='title'><strong><a href='"+hyperref+"'>"+title+"</a></strong></h4><h4 class='author'>"+author+"</h4><h5 class='source'>"+feed+" : "+readableDate+"</h5><div class='summary' style='font-size:80%;line-height:140%;'>"+summary+"</div></div>";  
   $('export').insertAfter($('#switch-view')).click(function() { exportSaved(); });

The script accesses the page via an extension in Google Chrome. It labels HTML elements with classes for the convenience of future parsing, not for styling. I had to use inline styles to keep WordPress happy. Dates are formatted according to ISO 8601, making it possible to quickly jump to any month using in-page search for “yyyy-mm”. Which would not work with mm/dd/yyyy, of course.

Unfortunately, some publishers still insert the authors’ names in the summary. Unlike arXiv, which serves reasonably structured feeds, albeit with unnecessary hyperlinks in the author field. While I am unhappy with such things (and the state of the web in general), my setup works for me, and this concludes my two-post detour from mathematics.

xkcd 743: Infrastructures

Life after Google Reader

My first reaction to the news was to grab my data and remove Google Reader from bookmarks. I am not in the habit of clinging to doomed platforms. The downloaded zip archive has a few files with self-explanatory names. For example, subscriptions.xml is the list of subscriptions, which can be imported, for example, to Netvibes. Which is what I did, having found the Reader mode of Netvibes a reasonably close approximation. Some of the keyboard shortcuts work the same way (J and K), but most are different:

Netvibes reader shortcuts
Netvibes reader shortcuts

There is no native Android app, and the webapp does not play well with Opera Mobile. At least it works with the stock Android browser:

Netvibes version of “star” is “mark to real later”. But apparently, there is no way to export the list of such items. (Export feeds returns only the list of feeds.) The list of items of interest is more important to me than my collection of feeds.

Besides, I already have two sizable lists of such items, courtesy of Google: shared.json, from back when Google Reader had sensible sharing features; and starred.json from recent years. I guess my next step will be to work out a way to merge this data and keep it up-to-date and under my control in the future.

fourier transform according to Saturday Morning Breakfast Cereal

Although not (yet?) tenured, I decided to implement this transform anyway: see fourier transform. To determine the fouriest transform uniquely, in case of a tie the greatest base wins (this is reasonable because greater values of the base correspond to more compact representations).

Caution: JavaScript integers go only this far
Caution: JavaScript integers go only this far

If no base can beat the decimal, the fourier transform does not exist. I limited consideration to bases up to 36, because they yield convenient representations using the alphabet

Some examples:

number fourier transform(s) fouriest
42 does not exist does not exist
2013 43b22, 4bi21, 56047 43b22
13244 4104345 4104345
57885161 (too many to list) 54244025056

The last line is merely the exponent of the latest and greatest Mersenne prime. If you want to transform the number itself, go four it…

The transform may have a practical application. For example, my office phone number 4431487 is a pretty undistinguished number. But it has a unique fourier (hence fouriest) transform 524445347, which is easier to remember. This will come in handy when dialpads become equipped with a number base switch.

I added the line “Input interpreted as …” to give some indication of how JavaScript handles integers out of its 2^{53} range. For example, entering 9999999999999999 you will find the the code actually transforms 10000000000000000. This one is handled correctly: the fouriest transform is 2422441203251352401446. But in general, round-off errors may distort the results when the input is large. Do not rely on this code in the situations when the fourierness of a very large number is a matter of life and death.

Open problem: can a number of the form 444…4 have a fourier transform? It is conceivable, though unlikely, that its representation in a base smaller than 10 could have more 4s.

Web-based LaTeX to WordPress (and something about polynomials)

Recently I began using the excellent Python program LaTeX to WordPress (LaTeX2WP) by Luca Trevisan. Since some of my posts are written on computers where Python is not readily available, I put LaTeX2WP on PythonAnywhere as a web2py application. The downside is that settings are stuck at default (e.g., you get blackboard bold letters using \N, \Z, \R, \C, \Q). The upside is that the process simplifies to copy-paste-click-copy-paste.

The application is here: Web-based LaTeX2WP and it looks like this:


Although my modification of the source code is utterly trivial, according to GPL I should put it here. Namely, I merged and into one file, since web users can’t edit the style file anyway. Also replaced file input/output by a function call, which comes from web2py controller:

def index():
    import convert4
    form=FORM(TEXTAREA(_name='LaTeX', requires=IS_NOT_EMPTY()), BR(), INPUT(_type='submit', _value='Convert to WordPress'))
    produced = ''
    if form.accepts(request,session):
        produced = convert4.mainproc(form.vars.LaTeX)
    form2=FORM(TEXTAREA(_name='WP', value=produced))
    return dict(form=form, form2=form2)

which in turn populates the html file:

Enter LaTeX code here
Get WordPress code here

To beef up this post, I include sample output. It is based on an old write-up of my discussions with Paul Gustafson during REU 2006 at Texas A&M University. Unfortunately, the project was never finished. I would still like to know if {\partial}-equivalence of polynomials appeared in the literature; the definition looks natural enough.

Given two polynomials {p,q \in {\mathbb C}[z_1,\dots,z_n]}, write {q\preccurlyeq p} if there exists a differential operator {\mathcal T\in {\mathbb C}[\frac{\partial}{\partial z_1},\dots, \frac{\partial}{\partial z_n}]} such that {q=\mathcal Tp}. The relation {\preccurlyeq} is reflexive and transitive, but is not antisymmetric. If both {p\preccurlyeq q} and {q\preccurlyeq p} hold, we can say that {p} and {q} are {\partial}-equivalent.

A polynomial is {\partial}-homogeneous if it is {\partial}-equivalent to a homogeneous polynomial.

It turns out that it is easy to check the {\partial}-homogeneity of {p} by decomposing it into homogeneous parts

\displaystyle    p=p_0+p_1+\dots +p_d   \ \ \ \ \ (1)

where {p_k} is homogeneous of degree {k} and {p_d\not\equiv 0}.

Then use the following

Proposition 1.
The polynomial (1) is {\partial}-homogeneous if and only if {p_k\preccurlyeq p_d} for {k=0,\dots, d-1}.

Proof: Exercise. \Box

For example, the polynomial {p(z,w)=z^3-5w^3+2zw} is not {\partial}-homogeneous since {zw\not\preccurlyeq (z^3-5w^3)}. On the other hand, {q(z,w)=z^3-3z^2w+2zw} is {\partial}-homogeneous because {zw\preccurlyeq (z^3-3z^2w)}. In particular, {p} and {q} are not {\partial}-equivalent.

Proposition 1 also makes it clear that every polynomial in one variable is {\partial}-homogeneous. For univariate polynomials {\partial}-equivalence amounts to having the same degree.

WeBWork class roster import

One way to import classroster into WeBWork (at SU):

  1. Download the roster from Blackboard Grade Center and import it into a spreadsheet
  2. The first four columns A,B,C,D will be Last Name, First Name, UserName, Student ID.
  3. Append the column with the function

    (second row shown). Or


    if using OpenOffice.

  4. Using WeBWork file manager, create a file roster.lst and paste this new column into it.
  5. Use the Import Users command under Classlist editor.