There are plenty of continuous functions such that . Besides the trivial examples and , one can take any equation that is symmetric in and has a unique solution for one variable in terms of the other. For example: leads to .

I can’t think of an explicit example that is also differentiable, but implicitly one can be defined by , for example. In principle, this can be made explicit by solving the cubic equation for , but I’d rather not.

I don’t know of any diffeomorphism such that both and have a nice explicit form.

Let’s change the problem to . There are still two trivial, linear solutions: and . Any other? The new equation imposes stronger constraints on : for example, it implies

But here is a reasonably simple nonlinear continuous example: define

and extend to all by . The result looks like this, with the line drawn in red for comparison.

To check that this works, notice that maps to , which the function maps to , and of course .

From the plot, this function may appear to be differentiable for , but it is not. For example, at the left derivative is while the right derivative is .
This could be fixed by picking another building block instead of , but not worth the effort. After all, the property is inconsistent with differentiability at as long as is nonlinear.

The plots were made in Sage, with the function f define thus:

def f(x):
if x == 0:
return 0
xa = abs(x)
m = math.floor(math.log(xa, 2))
if m % 2 == 0:
return math.copysign(2**(m + xa/2**m), x)
else:
return math.copysign(2**(m+1) * (math.log(xa, 2)-m+1), x)

I wanted some random examples of graphs where every vertex has degree between two given parameters, minDegree and maxDegree. Like this one:

The approach I took was very simple (and not suitable for construction of very large or very regular graphs). Each edge appears with probability p. If the minimal degree is too small, this probability is multiplied by 1.1. If the maximal degree is too big, the probability is divided by 1.1. Either way, the process repeats until success.

So far, this blog had too much Matlab/Scilab and not enough Python. I’ll try to restore the balance. Here, numpy generates random matrices and takes care of degree restrictions; networkx lays out the graph.

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
vertices = 15
minDegree = 3
maxDegree = 4
p = 0.5
success = False
while not success:
a = (np.random.random((vertices, vertices)) < p).astype(int)
a = np.maximum(a, np.matrix.transpose(a))
np.fill_diagonal(a, 0)
s = a.sum(axis=1)
success = True
if min(s) < minDegree:
success = False
p = p * 1.1
if max(s) > maxDegree:
success = False
p = p / 1.1
g = nx.Graph(a)
nx.draw_networkx(g)
plt.show()

Given a bunch of words, specifically the names of divisions of plants and bacteria, I’m going to use a truncated Singular Value Decomposition to separate bacteria from plants. This isn’t a novel or challenging task, but I like the small size of the example. A similar type of examples is classifying a bunch of text fragments by keywords, but that requires a lot more setup.

Here are 33 words to classify: acidobacteria, actinobacteria, anthocerotophyta, aquificae, bacteroidetes, bryophyta, charophyta, chlamydiae, chloroflexi, chlorophyta, chrysiogenetes, cyanobacteria, cycadophyta, deferribacteres, deinococcus-thermus, dictyoglomi, firmicutes, fusobacteria, gemmatimonadetes, ginkgophyta, gnetophyta, lycopodiophyta, magnoliophyta, marchantiophyta, nitrospirae, pinophyta, proteobacteria, pteridophyta, spirochaetes, synergistetes, tenericutes, thermodesulfobacteria, thermotogae.

As is, the task is too easy: we can recognize the -phyta ending in the names of plant divisions. Let’s jumble the letters within each word:

Not so obvious anymore, is it? Recalling the -phyta ending, we may want to focus on the presence of letter y, which is not so common otherwise. Indeed, the count of y letters is a decent prediction: on the following plot, green asterisks are plants and red are bacteria, the vertical axis is the count of letter Y in each word.

However, the simple count fails to classify several words: having 1 letter Y may or may not mean a plant. Instead, let’s consider the entire matrix of letter counts (here it is in a spreadsheet: 33 rows, one for each word; 26 columns, one for each letter.) So far, we looked at its 25th column in isolation from the rest of the matrix. Truncated SVD uncovers the relations between columns that are not obvious but express patterns such as the presence of letters p,h,t,a along with y. Specifically, write with unitary and diagonal. Replace all entries of , except the four largest ones, by zeros. The result is a rank-4 diagonal matrix . The product is a rank-4 matrix, which keeps some of the essential patterns in but de-emphasizes the accidental.

The entries of are no longer integers. Here is a color-coded plot of its 25th column, which still somehow corresponds to letter Y but takes into account the other letters with which it appears.

Plants are now cleanly separated from bacteria. Plots made in MATLAB as follows:

A=[3,1,2,1,1,0,0,0,2,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0;3,1,2,0,1,0,0,0,2,0,0,0,0,1,1,0,0,1,0,2,0,0,0,0,0,0;2,0,1,0,1,0,0,2,0,0,0,0,0,1,3,1,0,1,0,3,0,0,0,0,1,0;2,0,1,0,1,1,0,0,2,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0;1,1,1,1,3,0,0,0,1,0,0,0,0,0,1,0,0,1,1,2,0,0,0,0,0,0;1,1,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0,0,2,0;2,0,1,0,0,0,0,2,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0,0,1,0;2,0,1,1,1,0,0,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0;0,0,1,0,1,1,0,1,1,0,0,2,0,0,2,0,0,1,0,0,0,0,0,1,0,0;1,0,1,0,0,0,0,2,0,0,0,1,0,0,2,1,0,1,0,1,0,0,0,0,1,0;0,0,1,0,3,0,1,1,1,0,0,0,0,1,1,0,0,1,2,1,0,0,0,0,1,0;3,1,2,0,1,0,0,0,1,0,0,0,0,1,1,0,0,1,0,1,0,0,0,0,1,0;2,0,2,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,2,0;1,1,1,1,4,1,0,0,1,0,0,0,0,0,0,0,0,3,1,1,0,0,0,0,0,0;0,0,3,1,2,0,0,1,1,0,0,0,1,1,2,0,0,1,2,1,2,0,0,0,0,0;0,0,1,1,0,0,1,0,2,0,0,1,1,0,2,0,0,0,0,1,0,0,0,0,1,0;0,0,1,0,1,1,0,0,2,0,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,0;2,1,1,0,1,1,0,0,1,0,0,0,0,0,1,0,0,1,1,1,1,0,0,0,0,0;2,0,0,1,3,0,1,0,1,0,0,0,3,1,1,0,0,0,1,2,0,0,0,0,0,0;1,0,0,0,0,0,2,1,1,0,1,0,0,1,1,1,0,0,0,1,0,0,0,0,1,0;1,0,0,0,1,0,1,1,0,0,0,0,0,1,1,1,0,0,0,2,0,0,0,0,1,0;1,0,1,1,0,0,0,1,1,0,0,1,0,0,3,2,0,0,0,1,0,0,0,0,2,0;2,0,0,0,0,0,1,1,1,0,0,1,1,1,2,1,0,0,0,1,0,0,0,0,1,0;3,0,1,0,0,0,0,2,1,0,0,0,1,1,1,1,0,1,0,2,0,0,0,0,1,0;1,0,0,0,1,0,0,0,2,0,0,0,0,1,1,1,0,2,1,1,0,0,0,0,0,0;1,0,0,0,0,0,0,1,1,0,0,0,0,1,1,2,0,0,0,1,0,0,0,0,1,0;2,1,1,0,2,0,0,0,1,0,0,0,0,0,2,1,0,2,0,2,0,0,0,0,0,0;1,0,0,1,1,0,0,1,1,0,0,0,0,0,1,2,0,1,0,2,0,0,0,0,1,0;1,0,1,0,2,0,0,1,1,0,0,0,0,0,1,1,0,1,2,1,0,0,0,0,0,0;0,0,0,0,3,0,1,0,1,0,0,0,0,1,0,0,0,1,3,2,0,0,0,0,1,0;0,0,1,0,3,0,0,0,1,0,0,0,0,1,0,0,0,1,1,2,1,0,0,0,0,0;2,1,1,1,3,1,0,1,1,0,0,1,1,0,2,0,0,2,1,2,1,0,0,0,0,0;1,0,0,0,2,0,1,1,0,0,0,0,1,0,2,0,0,1,0,2,0,0,0,0,0,0];
[U, D, V] = svd(A);
D4 = D .* (D >= D(4, 4));
A4 = U * D4 * V';
plants = (A4(:, 25) > 0.8);
bacteria = (A4(:, 25) <= 0.8);
% the rest is output
words = 1:33;
hold on
plot(words(plants), A4(plants, 25), 'g*');
plot(words(bacteria), A4(bacteria, 25), 'r*');

Two metric spaces are bi-Lipschitz equivalent if there is a bijection and a constant such that
for all . In other words, must be Lipschitz with a Lipschitz inverse; this is asking more than just a homeomorphism (continuous with continuous inverse).

For example, the graph is not bi-Lipschitz equivalent to a line. The cusp is an obstruction:

Indeed, the points are separated by and lie at distance from it. So, the images would lie on opposite sides of , at distance from it. But then the distance between and would be of order , contradicting .

There are other pairs of spaces which are homeomorphic but not bi-Lipschitz equivalent, like a circle and Koch snowflake.

What is the simplest example of two spaces that are not known to be bi-Lipschitz equivalent? Probably: the unit ball and the unit sphere in an infinite-dimensional Hilbert space.

In finite dimensions these are not even homeomorphic: e.g., the sphere has a self-homeomorphism without fixed points, namely , while the ball has no such thing due to Brouwer’s fixed point theorem. But in the infinite-dimensional case and are homeomorphic. Moreover, there exists a Lipschitz map such that the displacement function is bounded below by a positive constant: no hope for anything like Brouwer’s fixed point theorem.

It’s hard to see what an obstruction to bi-Lipschitz equivalence could be: there are no cusps, nothing is fractal-like, dimensions sort of agree (both infinite), topologically they are the same… Here is a possible direction of attack (from Geometric Nonlinear Functional Analysis by Benyamini and Lindenstrauss). If a bi-Lipschitz map exists, then the composition is a Lipschitz involution of with displacement bounded from below. So, if such a map could be ruled out, the problem would be answered in the negative. As far as I know, it remains open.

A neat way to visualize a real number is to make a sunflower out of it. This is an arrangement of points with polar angles and polar radii (so that the concentric disks around the origin get the number of points proportional to their area). The prototypical sunflower has , the golden ratio. This is about the most uniform arrangement of points within a disk that one can get.

But nothing stops us from using other numbers. The square root of 5 is not nearly as uniform, forming distinct spirals.

The number begins with spirals, but quickly turns into something more uniform.

The number has stronger spirals: seven of them, due to approximation.

Of course, if was actually , the arrangement would have rays instead of spirals:

What if we used more points? The previous pictures have 500 points; here is with . The new pattern has 113 rays: .

Apéry’s constant, after beginning with five spirals, refuses to form rays or spirals again even with 3000 points.

The images were made with Scilab as follows, with an offset by 1/2 in the polar radius to prevent the center from sticking out too much.

n = 500
alpha = (sqrt(5)+1)/2
r = sqrt([1:n]-1/2)
theta = 2*%pi*alpha*[1:n]
plot(r.*cos(theta), r.*sin(theta), '*');
set(gca(), "isoview", "on")

The Richardson extrapolation is a simple yet extremely useful idea. Suppose we compute some quantity (such as an integral) where the error of computation depends on a positive parameter (such as step size). Let’s write where is the result of computation and is the error. Often, one can observe either numerically or theoretically that is proportional to some power of , say .

The quantity is an approximation to in which the main source of error is cancelled out. Thus, writing

should give us a much better approximation than either or were.

It is not quite intuitive that one can improve the accuracy of by mixing it with a less accurate approximation .

For a concrete example, let’s apply the trapezoidal rule to the integral with step size . The rule gives

which is quite wrong: the error is . Little surprise, considering the geometry of this approximation:

With the smaller step we get

with the error . The rule being of second order, this error is about times smaller.

The Richardson extrapolation gives

reducing the error by the factor of : it’s now only . And this did not require any additional evaluations of the integrand.

The formula may look familiar: it’s nothing but Simpson’s rule. But the above derivation is much easier than what is typically shown in a calculus textbook: fitting parabolas to the graph of and integrating those.

All this is nice. But what if we started with the Midpoint rule? Let’s find out, following the example above. With ,

which is off by (the Midpoint rule is generally twice as accurate as Trapezoidal). Then

with error . Extrapolate:

and the error drops to , slightly less than for Simpson’s rule.

So, the extrapolated-midpoint (EM) rule appears to be slighly more accurate than Simpson’s, with an equal number of evaluations. Why isn’t it in textbooks alongside Simpson’s rule, then?

A few factors make the EM rule impractical.

1. There is a negative coefficient, of . This means that the rule can give a negative result when integrating a positive function! For example, evaluates to with EM rule. Simpson’s rule, as pretty much any practical quadrature rule, preserves positivity.

2. The sample points for EM are less convenient than for Simpson’s rule.

3. The gain in accuracy isn’t that great. The reason that Midpoint outperforms Trapezoidal by the factor of has to do with how these rule approximate . Midpoint gives , Trapezoidal gives ; the former is twice as accurate. To compare Simpson’s and EM rules, we should consider since both are of the th order of accuracy: they evaluate cubic polynomials exactly. The results are for EM and for Simpson’s: these are nearly equidistant from the correct value . The difference in accuracy is less than .

Despite being impractical, the EM rule has an insight to offer. By moving two evaluation points from to , we made the coefficient of go from positive to negative . (The coefficients are determined by the requirement to evaluate exactly.) So, there must be an intermediate position at which the coefficient of is zero, and we don’t need to compute at all. This idea leads to the two-point Gaussian quadrature

which on the example beats both EM and Simpson’s rules in accuracy, while using fewer function evaluations.

After writing the above, I found that the Extrapolated Midpoint rule has a name: it is Milne’s rule, a member of the family of open Newton-Cotes formulas.