is not such a great way to actually find the derivative numerically. Its symmetric version,

performs much better in computations. For example, consider the derivative at the point . We know that . Numerically, with , we get

(error ) versus

(error ).

Considering this, why don’t we ditch (1) altogether and adopt (2) as the definition of derivative? Just say that by definition,

whenever the limit exists.

This expands the class of differentiable functions: for example, becomes differentiable with . Which looks more like a feature than a bug: after all, has a minimum at , and the horizontal line through the minimum is the closest thing to the tangent line that it has.

Another example: the function

has under this definition, because

and

This example also makes sense: since , getting is expected. In fact, with the new definition we still have basic derivative rules: if are differentiable, then , , , are also differentiable (with the usual caveat about ) and the familiar formulas hold.

Let’s test the chain rule on the function . The rule says

Since , the product on the right is . On the other hand,

which implies, by a computation similar to the above, that . So, if we want to have the chain rule (3), we must accept that

This is where the desire for high numerical precision leads.

Plenty of other things go wrong with the symmetric definition:

Maximum or minimum of may be attained where exists and is nonzero.

A differentiable function may be discontinuous.

Having everywhere does not imply that is increasing.

(Related to previous post but can be read independently). The triangle inequality, one of the axioms of a metric space, can be visualized by coloring the vertices of a triangle by red and blue.

The inequality says that the sum of monochromatic distances must not exceed the sum of dichromatic distances. That is, for every assignment of the vertices to points in the space, the sum of all red-red and blue-blue distances does not exceed the sum of red-blue distances. An assignment is just a map from the set of vertices to the metric space ; it need not be injective.

But why stop at the triangle? We can take any number of points (vertices), color them in some way, and require the same polygonal inequality:

Already for the square we have two distinct plausible colorings to explore: even red-blue split

and predominantly red square

But it turns out that the second coloring is useless: the inequality it defines fails in every metric space with more than one point. More generally, suppose we have red points and blue ones, and . Pick two distinct points . Assign one red point to and all others to . The sum of monochromatic distances is while the sum of dichromatic distances is , which is strictly less.

So, we are limited to nearly-even colorings: those with . For even numbers of vertices this means even split, while odd numbers should be divided as evenly as possible: like 3+2 for the pentagon.

The inequalities turn out to be related. For every , the -gonal inequality implies the -gonal inequality, because if we assign two opposite-colored vertices to the same point, their contributions cancel out. More interestingly: when is odd, the -gonal inequality implies the -gonal inequality. Indeed, suppose we have points, evenly colored. Add an arbitrary th point. Whether the added point is blue or red, the -gonal inequality holds. Averaging these two inequalities, we see the effect of the added points canceling out.

So, if the -gonal inequality holds for all odd , it holds for all . This property is exactly the hypermetric property from the previous post. Except it was stated there in a different form:

for every finite sequence of points and every choice of integers such that . But if the point is repeated times, we can replace by . Then represent +1 as blue and -1 as red.

The hypermetric inequalities were introduced by John B. Kelly (a student of William Ted Martin) in late 1960s. He showed they are necessary for the space to be embeddable into the space . It would be great if they were also sufficient (and for some classes of spaces they are), but this is not so: a counterexample was given by Patrice Assouad in 1977.

It is also interesting to consider the -gonal inequalities for even only. By repetition of vertices, they are equivalent to requiring

for every finite sequence of points and every choice of integers such that . But of course then we have (1) for rational (just clear the denominators), hence for all real (by approximation), as long as they sum to . So, the requirement amounts to the matrix being negative semidefinite on the subspace . Such metrics are called metrics of negative type.

Their relation to embeddability of the space is well-known: is of negative type if and only if the “snowflake” isometrically embeds into a Hilbert space. In other words, we can “draw” any finite metric space of negative type in a Euclidean space, with the understanding that Euclidean distances represent the square roots of the actual distances. This embedding result is a 1935 theorem of Isaac Schoenberg who is also known for connecting dots naturally (introducing splines).

The Wikipedia article Metric (mathematics) offers a plenty of flavors of metrics, from common to obscure: ultrametric, pseudometric, quasimetric, semimetric, premetric, hemimetric and pseudoquasimetric (I kid you not).

One flavor it does not mention is a hypermetric. This is a metric on a set such that the inequality

holds for every finite sequence of points and every choice of integers such that . The requirement that be integers gives some combinatorial meaning to (1); this is not just some quadratic form being negative semidefinite.

As a warm-up, observe that (1) contains in it the triangle inequality: with and we get . But it appears that (1) says something about “polygons” with more vertices too.

To make (1) worth thinking about, it should be satisfied by some important metric space. Such as the real line , for example. It is not quite obvious that the inequality

holds for all reals and all integers adding up to . It helps to order the numbers: and focus on the contribution of a particular gap to the sum (2). The amount it contributes is multiplied by

because for every integer . This proves (2).

Now that we have one hypermetric space, , other such spaces can be created easily. If is any set and any function, consider , the pullback pseudometric on . By applying (2) to the numbers , we see that satisfies the hypermetric inequality. Since (1) is additive in , we can take any family of functions and add together the corresponding pseudometrics. Or even integrate them against a positive measure: .

For example, the plane is a hypermetric space, because the distance between two points and , besides the familiar form

can also be represented as an integral of the aforementioned pullbacks:

A similar integral representation holds in all dimensions; thus, all Euclidean spaces are hypermetric.

Okay, what is not a hypermetric then? For example, the cube distance induced by the norm is not, in dimensions 3 and higher. Specifically, (1) fails as the five-point inequality with . I’ll call it the pentagram inequality:

It says that for any five points in the space the sum of monochromatic distances does not exceed the sum of all bi-chromatic (red-blue) distances.

The pentagram inequality fails when are the columns of the matrix

(first three columns blue, the last two red). Indeed, the sum of monochromatic distances is while the sum of bichromatic distances is .

If the above example does not look conceptual enough, it’s because I found it via computer search. I don’t have much intuition for the pentagram inequality.

Anyway, the example delivers another proof that taking the maximum of three numbers is hard. More precisely, there is no isometric embedding of with the maximum metric into . Unlike the earlier proof, this one does not assume the embedding is linear.

A good reference for hypermetric inequalities is the book Geometry of cuts and metrics by Deza and Laurent.

This simple identity hold for any two real numbers :

Indeed, if realizes the maximum, then both and have the same sign as . After opening the absolute value signs, we get to cancel out.

So, (1) represents , also known as the norm, as the sum of absolute values of linear functions. Let’s try the same with . Since the right hand side of (1) is just the average of over all possible choices of signs, the natural thing to do is to average over all eight choices. The sign in front of can be taken to be , which simplifies the average to

Does this formula give ? Instead of trying random numbers, let’s just plot the unit ball for the norm given by (2). If the identity works, it will be a cube. I used Maple:

Although all terms of (2) look exactly the same, the resulting shape has both triangular and square faces. Where does the difference of shapes come from?

More importantly, is (2) really the best we can do? Is there some other sum of moduli of linear functions that will produce ?

— No.

Even if negative coefficients are allowed?

— Even then. (But you can come arbitrarily close.)

What if we allow integrals with respect to an arbitrary (signed) measure, as in

— Still no. But if is allowed to be a distribution of higher order (an object more singular than a measure), then a representation exists (W. Weil, 1976). Yes, one needs the theory of distributions to write the maximum of three numbers as a combination of linear functions.

I’ll only prove that there is no identity of the form

Indeed, such an identity amounts to having an isometric embedding . The adjoint operator is a submetry meaning that it maps the unit ball of onto the unit ball . The unit ball of is just a cube; all of its faces are centrally symmetric, and this symmetry is preserved by linear maps. But is an octahedron, with triangular faces. A contradiction.

An aside: what if instead of averaging over all choices (i.e., unimodular real coefficients) we take the average over all unimodular complex coefficients? This amounts to . I expected something nice from this norm, but

it’s a strange shape whose equation involves the complete elliptic integral of second kind. Yuck.

One way is to connect them with straight lines, creating a piecewise linear function:

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

among all functions with . As is often the case, the length functional can be replaced with the elastic energy

because the piecewise linear (and only it) minimizes it too.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could try to fit a polynomial function to these points, which is guaranteed to be smooth. With 11 points we need a 10th degree polynomial. The result is disappointing:

It is not natural for a curve connecting the points with to shoot up over . We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

and the function that minimizes it subject to conditions looks very nice indeed:

The Euler-Lagrange equation for the functional dictates that the fourth derivative of is zero in the intervals between the knots . Thus, is a piecewise cubic polynomial. Also, both and must be continuous for any function with integrable second derivative. More delicate analysis is required for , but it also can be shown to be continuous for minimizing function ; moreover, must vanish at the endpoints and . Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term to the piecewise linear interpolant . Here the spline is shown together with (green) and (magenta).

On each interval the correction term is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

on this interval. There are 20 coefficients , to find. At each of 9 knots 1,2,…9 we have two conditions: must have removable singularity and must jump by the amount opposite to the jump of . Since also vanishes at , there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of . Indeed, the values of at two consecutive knots determine the correction term within: and . This leaves equations (from the jumps of ) for unknowns. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

One can solve the system for within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second order difference of the sequence of y-values.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))
z = [0,-jumps/A,0]
allx = []; spl = []
for i=1:n
xL = a+h*(i-1)
xR = a+h*i
x = linspace(xL,xR,100)
linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h
cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
allx = [allx, x]
spl = [spl, linear + cor]
end
plot(allx, spl)
plot(a:h:b, y, 'r*')

A rectangular box (aka parallelepiped) looks like a sturdy object:

But this particular box, with dimensions 7.147 by 6.021 by 4.095, took me the better part of an hour to figure out.

It was a part of a numerical methods assignment: find the dimensions of a box with given volume , surface area , and diameter (i.e., space diagonal). Algebraic approach leads to pretty ugly expressions, which is the motivation for a numerical method. Specifically, the task was to apply the Newton-Raphson method to the map

Of course, I understood that not every triple is attainable. Also realized that the Jacobian of is degenerate when two of the coordinates coincide, which is a problem for the method. So I thought: let’s generate some random values that are not too close to one another, and give students the resulting parameters .

With , , and the parameters are , , and . Sure, a little rounding can’t hurt when numbers are of this size and we are quite far from the critical points of . So I put , and in one of the versions of the assignment.

But the Newton-Raphson method would not converge… because no such box exists! The rounding did hurt after all.

This motivated me to describe all attainable triples explicitly, which ended up being less of a chore than I expected. It helps to realize that , which reduces the search to the intersection of the sphere with the plane . This is a circle (called below), and the allowed range for is between the minimum and maximum of on .

This goes into Calculus 3 territory. Using Lagrange multipliers with two constraints looks like a tedious job. Instead, I decided to parametrize . Its center is where . The radius is . We also need an orthonormal basis of the subspace : the vectors

do the job.

So, the circle is parametrized by

This is not as bad as it looks: the product simplifies to

which tells us right away that the volume lies within

In terms of the original data the bounds for take the form

(And of course, cannot be negative even if the lower bound is.) It is easy to see that with equality only for a cube; however can be relatively small even when the box does not look very cube-ish. For the box pictured above, the tolerance is approximately ; after rounding and this drops to , and the desired volume of is way out of the allowable range .

Yes, the set of attainable triples is quite thin. Parallelepipeds are fragile objects: handle them with care.

The function may be nice and important as a part of trigonometric basis, but there is nothing exciting in its graph:

Let’s look at its iterations where is the number of iterations, not an exponent. Here is the graph of :

A rectangular pattern is already visible above; further iterations only make it stronger. For example, :

It may be impossible to see on the graph, but the rectangles are slightly apart from one another (though of course they are connected by the graph of continuous function). This is easier to see on the histogram of the values for , which contains two small gaps in addition to a large one:

What goes on here? The range of on , as well as the range of any of its iterates, is of course connected: it is the closed interval . But the second iterate also has two invariant subintervals, marked here by horizontal lines:

Namely, they are and . It is easy to see that and . The gap between and contains the repelling fixed point of , approximately . Every orbit except for the fixed point itself is pushed away from this point and is eventually trapped in the cycle between and .

But there is more. A closer look at the fourth iterate reveals smaller invariant subintervals of . Here is what it does on :

Here the gap contains a repelling fixed point of , approximately . The invariant subintervals of are and . Also, contains invariant subintervals and . These are the projections of the rectangles in the graph of onto the vertical axes.

No more such splitting occurs. The histogram of the values of iterates of indeed consists of four disjoint intervals. Can one get a Cantor-type set in this way, starting from some boring function?

I like WeBWork for many reasons besides being free and open-source. One minor feature is customization of classroster emails based on the recipients’s name and other information. Including grades via spreadsheet merge would not be FEPRA-compliant, but an email saying that an exam has been graded might be worded differently based on the score that the student received. I used conditional operators in a spreadsheet for this purpose, then merged the data into email.

On the other hand, there is no way to send email only to an algorithmically selected subset of the class, for example to those who have not yet started on the homework that’s due tonight, or those that appear to need extra help. (At least not in the version of WeBWork I am using, which is not the latest one.)

So I wrote a jQuery script that selects email recipients based on the data in a Google spreadsheet. The spreadsheet need not contain sensitive information: only usernames of addressees.

A couple of years ago I found the JSON encoding of a Google Spreadsheet completely opaque. With the experience, reading it became easy. It helps to request /feeds/list/ (rather than /feeds/cells/), so that the spreadsheet is treated as a list of rows rather than a list of cells. Each row becomes an object with “title” being the first column entry, and “content” being the rest of the row data (an object with column headers as keys and cell content as values). The relevant part of the JSON string is actually quite simple:

"title":{"type":"text","$t":"username"}

I process it by creating an object selectUser in which the keys are usernames of addresses. (Could also be an array, of course). This is what the command

selectUser[user.title.$t] = 1

does below. Then, parsing the elements of the relevant <select> list, the script selects those for which selectUser[$(item).attr('value')] is defined.

function selectStudents() {
$.get('https://spreadsheets.google.com/feeds/list/__spreadsheet ID here___/oda/public/basic?alt=json', function (data) {
var userList = data.feed.entry;
var i, user, selectUser = {}, item;
for (i = 0; user = userList[i]; i++) {
selectUser[user.title.$t] = 1;
}
$('select.ScrollingRecordList > option').each(function () {
item = this;
if (selectUser[$(item).attr('value')]) {
$(item).attr('selected', 'selected');
}
});
}, 'json');
}

To have this function called only when needed, my Chrome extension inserts a link “Select recipients” under “Email” in the WeBWork dashboard.

Oh, and you need to use the old format of Google Spreadsheet for this to work. The new (otherwise far superior) format does not currently offer JSON, as far as I know. If you have switched to the new format, any new spreadsheets will be created in it; however the existing ones remain in the old format, and so do their copies. The solution is to keep some old-format spreadsheet for this purpose, just as a source of copies.

Since the id of the spreadsheet is fixed in the script, all I have to do is to locate in the Drive and Import the .csv file generated by WeBWork scoring tool as a new sheet. Then sort and filter the data by the desired criteria, and copy the recipients of the message to the first sheet.

is approximate in general, but exact for linear functions (polynomials of degree at most one).

With two sample points we can integrate any cubic polynomial exactly. The choice of sample points is not obvious: they are to be placed at distance from the midpoint. On the example below, and , so the sample points are . The integral is equal to . One can say that each sample point has weight in this rule.

Three sample points are enough for polynomials of degrees up to and including five. This time, the weights attached to the sample points are not equal. The midpoint is used again, this time with the weight of . Two other sample points are at distance from the midpoint, and their weights are each. This contraption exactly integrates polynomials of degrees up to five.

Compare this with Simpson’s rule, which also uses three sample points but is exact only up to degree three.

The above are examples of Gaussian quadrature: for each positive integer , one can integrate polynomials of degree up to by taking samples at the right places, and weighing them appropriately.

Let’s move from the real line to the complex plane. If one accepts that the analog of interval is a disk in the plane, then quadrature becomes very simple: for any disk and any complex polynomials ,

where is the center of the disk and is its radius. One sample point is enough for all degrees! The proof is easy: rewrite in terms of powers of and integrate them in polar coordinates. The same works for any holomorphic function, as long as it is integrable in .

But maybe the disk is a unique such shape? Not at all: there are other such quadrature domains. A simple family of examples is Neumann ovals (Neumann as in “boundary condition”). Geometrically, they are ellipses inverted in a concentric circle. Analytically, they are (up to linear transformations) images of the unit disk under

This image, denoted below, looks much like an ellipse when is small:

Then it becomes peanut-shaped:

For it looks much like the union of two disks (but the boundary is smooth, contrary to what the plot suggests):

In each of these images, the marked points are the quadrature nodes. Let’s find out what they are and how they work.

Suppose is holomorphic and integrable in . By a change of variables,

Here is holomorphic, but is anti-holomorphic. We want to know what does when integrated against something holomorphic. Power series to the rescue:

hence

Multiply this by and integrate over in polar coordinates: the result is if is odd and

if is even. So, integration of against a power series produces the sum of over even powers only (with the factor of ). The process of dropping odd powers amounts to taking the even part of :

Yes, there’s something about that’s magic (key words: reproducing kernel, specifically Bergman kernel). Plug to conclude

So, the nodes are . They have equal weight, because . Final result:

Again, this is a two-point quadrature formula that is exact for all complex polynomials.