Given a real number ${a}$, one can find its positive part ${a^+}$ in multiple ways: by definition:

$\displaystyle a^+ = \begin{cases} a,\quad &a>0 \\ 0,\quad &a\le 0\end{cases}$

as a maximum, ${a^+ = \max(a,0)}$, or by using the absolute value: ${a^+=(a+|a|)/2}$.
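The three expressions agree, which is easy to confirm with a few lines of Python:

```python
def pos_def(a):
    # positive part by definition
    return a if a > 0 else 0

def pos_max(a):
    # positive part as a maximum
    return max(a, 0)

def pos_abs(a):
    # positive part via the absolute value
    return (a + abs(a)) / 2

for a in [-3.5, -1, 0, 2, 7.25]:
    assert pos_def(a) == pos_max(a) == pos_abs(a)
```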

This is boring.

You know what else is boring? Spreadsheets.

Google Sheets, in particular. Let’s suppose we have a bunch of numbers in a spreadsheet (in a range like A1:B3), and want a single-cell formula (using arrayformula) that gets the positive parts of all of them.

 1   -2
 4    0
-1    3

The definition-based approach, =arrayformula(if(A1:B3>0, A1:B3, 0)), does the job:

1    0
4    0
0    3

But it’s kind of redundant: the range has to be typed twice. This can be annoying if you are not just taking positive part once but computing something nested with them: ${(1-(2-a^+)^+)^+}$. Each application of positive part would double the length of the formula.

The absolute value approach, =arrayformula((A1:B3+abs(A1:B3))/2), has the same issue.

The maximum looks promising, because ${\max(a,0)}$ involves ${a}$ only once. Alas, =arrayformula(max(A1:B3,0)) returns a single number (4 in this example), because max just takes the maximum of all entries.

Is there a spreadsheet formula that returns the positive part in an array-compatible way, and uses the argument only once?

Yes, there is: =arrayformula(iferror(sqrt(A1:B3)^2, 0))

Negative arguments throw an error at the evaluation of the square root. This error is handled by the iferror wrapper, which returns its second argument in this case, namely 0. Otherwise, the value of the square root gets squared, bringing back the input.

This can be nested: for example, the formula

=arrayformula(1-iferror(sqrt(1-iferror(sqrt(A1:B3)^2, 0))^2, 0))

clamps the input values between 0 and 1, performing ${\min(1,\max(0,a))}$. Yes, it’s long — but if the input is itself a 100-character formula, this is still a simplification over the other approaches.
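For readers who prefer code to spreadsheets, here is a Python sketch of the same trick: math.sqrt raises an error on negative input, and the exception handler plays the role of iferror.

```python
import math

def positive_part(a):
    # sqrt fails on negative input; the handler returns 0, mimicking iferror
    try:
        return math.sqrt(a) ** 2
    except ValueError:
        return 0.0

def clamp01(a):
    # the nested formula 1 - (1 - a^+)^+ clamps a to [0, 1]
    return 1 - positive_part(1 - positive_part(a))
```

Each use of the argument appears exactly once, just as in the spreadsheet formula.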

# Rough isometries

An isometry is a map ${f\colon X\rightarrow Y}$ between two metric spaces ${X,Y}$ which preserves all distances: ${d_Y(f(x),f(x')) = d_X(x,x')}$ for all ${x,x'\in X}$. (After typing a bunch of such formulas, one tends to prefer shorter notation: ${|f(x)f(x')| = |xx'|}$, with the metric inferred from context.) It’s a popular exercise to prove that every isometry from a compact metric space into itself is surjective.

A rough isometry allows additive distortion of distances: ${|\,|f(x)f(x')| - |xx'|\,|\le \epsilon}$. (As contrasted with bi-Lipschitz maps, which allow multiplicative distortion). Rough isometries ignore small-scale features (in particular, they need not be continuous), but accurately capture the large scale structure of a space. This makes them convenient for studying hyperbolic spaces (trees and their relatives), where the large-scale structure is of interest.

If ${f}$ is an ${\epsilon}$-rough isometry, then the Gromov-Hausdorff distance ${d_{GH}}$ between ${X}$ and ${f(X)}$ is at most ${\epsilon/2}$. This follows from a convenient characterization of ${d_{GH}}$: it is equal to ${\frac12 \inf_R \textrm{dis}\,(R)}$ where the infimum is taken over all subsets ${R\subset X\times Y}$ that project surjectively onto each factor, and ${\textrm{dis}\,(R) = \sup \{|\,|xx'|-|yy'|\,| \colon xRy, \ x'Ry' \} }$.

Since small-scale features are ignored, it is not quite natural to talk about rough isometries being surjective. A natural concept for them is ${\epsilon}$-surjectivity, meaning that every point of ${Y}$ is within distance ${\le \epsilon}$ of ${f(X)}$. When this happens, we can conclude that ${d_{GH}(X,Y)\le 3\epsilon/2}$, because the Hausdorff distance between ${Y}$ and ${f(X)}$ is at most ${\epsilon}$.

Recalling that an isometry of a compact metric space ${X}$ into ${X}$ is automatically onto, it is natural to ask whether ${\epsilon}$-rough isometries of ${X}$ into ${X}$ are ${\epsilon}$-surjective. This, however, turns out to be false in general.

First example, a finite space: ${X=\{0,1,2,\dots,n\}}$ with the metric ${d(i,j) = i+j}$ (when ${i\ne j}$). Consider the backward shift ${f(i)=i-1}$ (and ${f(0)=0}$). By construction, this is a rough isometry with ${\epsilon=2}$. Yet, the point ${n}$ is at distance ${n}$ from ${f(X) = \{0,1,2,\dots, n-1\}}$.
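A quick Python check of this example (a sketch, with ${n=10}$):

```python
import itertools

n = 10
X = range(n + 1)
d = lambda i, j: 0 if i == j else i + j   # the metric d(i,j) = i + j
f = lambda i: max(i - 1, 0)               # backward shift

# f distorts every distance by at most 2, so it is a 2-rough isometry
distortion = max(abs(d(f(i), f(j)) - d(i, j))
                 for i, j in itertools.product(X, X))
assert distortion <= 2

# yet the point n is at distance n from the image f(X) = {0, ..., n-1}
dist_to_image = min(d(n, f(i)) for i in X)
assert dist_to_image == n
```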

The above metric can be thought of as the distance one has to travel from ${i}$ to ${j}$ with a mandatory visit to ${0}$. This makes it similar to the second example.

Second example, a geodesic space: ${X}$ is the graph shown below, a subset of ${\mathbb R^2}$ with the intrinsic (path) metric, i.e., the length of shortest path within the set.

Define ${f(x,y) = ((x-1)^+, (y-1)^+)}$ where ${{\,}^+}$ means the positive part. Again, this is a ${2}$-rough isometry. The omitted part, shown in red below, contains a ball of radius ${5}$ centered at ${(5,5)}$. Of course, ${5}$ can be replaced by any positive integer.

Both of these examples are intrinsically high-dimensional: they cannot be accurately embedded into ${\mathbb R^d}$ for low ${d}$ (using the restriction metric rather than the path metric). This raises a question: does there exist ${C=C(d)}$ such that for every compact subset ${K\subset \mathbb R^d}$, equipped with the restriction metric, every ${\epsilon}$-rough isometry ${f\colon K\rightarrow K}$ is ${C(d)\epsilon}$-surjective?

# Retraction by contraction

The Euclidean space ${\mathbb R^n}$ has a nice property: every closed convex subset ${C\subset \mathbb R^n}$ is the image of the whole space under a map ${f}$ that is simultaneously:

• a contraction, meaning ${\|f(a)-f(b)\|\le \|a-b\|}$ for all ${a,b\in\mathbb R^n}$;
• a retraction, meaning ${f(a)=a}$ for all ${a\in C}$.

Indeed, we can simply define ${f(x)}$ to be the (unique) nearest point of ${C}$ to ${x}$; it takes a bit of work to verify that ${f}$ is a contraction, but not much.

In other normed spaces, this nearest point projection does not work that well. For example, take ${X=\ell_1^2}$, the two-dimensional space with the Manhattan metric. Consider the line ${C=\{(2t,t)\colon t\in\mathbb R\}}$ which is a closed and convex set. The nearest point of ${C}$ to ${(2,0)}$ is ${(2,1)}$: moving straight up, since changing the first coordinate doesn’t pay off. Since ${(0,0)}$ remains fixed, the nearest point projection increases some pairwise distances, in this case from ${2}$ to ${3}$.

However, there is a contractive retraction onto this line, given by the formula ${T(x_1,x_2) = \left(\frac23(x_1+x_2), \frac13(x_1+x_2)\right)}$. Indeed, this is a linear map that fixes the line ${x_1=2x_2}$ pointwise and has norm ${1}$ because

$\displaystyle \|T(x)\|_1 = |x_1+x_2|\le \|x\|_1$
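A randomized sanity check of both properties (a sketch):

```python
import random

def T(x1, x2):
    # linear retraction onto the line x1 = 2 x2 in the Manhattan plane
    s = x1 + x2
    return (2 * s / 3, s / 3)

def d1(p, q):
    # Manhattan (l_1) distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

random.seed(0)
for _ in range(1000):
    a = (random.uniform(-5, 5), random.uniform(-5, 5))
    b = (random.uniform(-5, 5), random.uniform(-5, 5))
    assert d1(T(*a), T(*b)) <= d1(a, b) + 1e-12   # contraction
    t = random.uniform(-5, 5)
    p = (2 * t, t)                                # point on the line
    assert d1(T(*p), p) < 1e-12                   # fixed pointwise
```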

More generally, in every normed plane, every closed convex subset admits a contractive retraction. To prove this, it suffices to consider closed halfplanes, since a closed convex set is an intersection of such, and contractions form a semigroup. Furthermore, it suffices to consider lines, because having a contractive retraction onto a line, we can redefine it to be the identity map on one side of the line, and get a contractive retraction onto a halfplane.

Such a retraction onto a line, which is a linear map, is illustrated below.

Given the unit circle (black) and a line (blue), draw supporting lines (red) to the unit circle at the points where it meets the line. Project onto the blue line along the red ones. By construction, the image of the unit disk under this projection is contained in the unit disk. This precisely means that the map has operator norm ${1}$.

In spaces of dimensions ${3}$ or higher, there are closed convex subsets without a contractive retraction. For example, consider the plane in ${\ell_\infty^3}$ passing through the points ${A = (2,2,0)}$, ${B= (2,0,2)}$, and ${C= (0,2,2)}$. This plane has equation ${x_1+x_2+x_3=4}$. The point ${D=(1,1,1)}$ is at distance ${1}$ from each of A,B,C, and it does not lie on the plane. For any point E other than D, at least one of the distances AE, BE, CE exceeds 1. More precisely, the best place to project D is ${(4/3, 4/3, 4/3)}$ which is at distance ${4/3}$ from A, B, and C.
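The distances in this example can be checked directly:

```python
def d_inf(p, q):
    # the l_infinity (max) distance
    return max(abs(a - b) for a, b in zip(p, q))

A, B, C, D = (2, 2, 0), (2, 0, 2), (0, 2, 2), (1, 1, 1)
assert all(d_inf(D, P) == 1 for P in (A, B, C))   # D is 1 away from A, B, C
assert sum(D) != 4                                # D is off the plane x1+x2+x3 = 4

E = (4/3, 4/3, 4/3)                               # the best point of the plane
assert abs(sum(E) - 4) < 1e-12
assert all(abs(d_inf(E, P) - 4/3) < 1e-12 for P in (A, B, C))
```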

Two natural questions: (a) is there a nice characterization of normed spaces that admit a contractive retraction onto every closed convex subset? (b) what is the smallest constant ${L=L(n)}$ such that every ${n}$-dimensional normed space admits an ${L}$-Lipschitz retraction onto every closed convex subset?

(The answers may well be known, but not to me at present.)

# Spam-fighting strategy: rationing API requests

Based on a true story

Given: ${n}$ websites on which users can post stuff. For ${k=1,\dots, n}$ the ${k}$th site gets ${p_k}$ posts per day on average. Some of the posts are spam or otherwise harmful.

One of the ways to detect spam automatically is to send an API request to any of the sites, get post content and analyze it. The total number of such requests cannot exceed ${10000}$ per day, while the total number of posts, namely ${\sum p_k}$, is substantially larger.

This necessitates rationing. The current strategy is to wait until a site has ${a_k}$ new posts, and then request all of those at once. The upside is that only ${p_k/a_k}$ requests per day are needed. The downside is that spam may sit undetected for a period of time proportional to ${a_k/p_k}$.

One would like to minimize the mean lifetime of spam. Assuming for now that spam is uniformly distributed among the sites, this lifetime is proportional to

$\displaystyle \sum p_k \frac{a_k}{p_k} = \sum a_k$

So, we need to minimize ${\sum a_k}$ subject to ${\sum p_k/a_k \le 10000}$. The method of Lagrange multipliers says that the gradient of the objective function should be proportional to the gradient of the constraint. Hence,

$\displaystyle \frac{p_k}{a_k^2} = \lambda$

with ${\lambda}$ independent of ${k}$. This leads to ${a_k = \sqrt{p_k/\lambda}}$.

That is, the size of the API queue should be proportional to the square root of the level of activity.

An excellent proxy for overall activity on a site is QPD, questions per day. For example, the QPD ratio of Mathematics to Computer Science is currently 452:11, so the ratio of queue sizes should be about ${\sqrt{452/11} \approx 6.4}$ (in reality, the queue size is currently ${10}$ and ${2}$, respectively).

The value of ${\lambda}$ is easily found from the condition ${\sum p_k/a_k = 10000}$ (pushing the API quota to the limit), with the final result

$\displaystyle a_k =\sqrt{p_k} \, \frac{\sum_{i=1}^n\sqrt{p_i}}{ 10000 }$
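This formula is easy to put to work. Here is a sketch with hypothetical posts-per-day figures (the numbers below are made up for illustration):

```python
import math

QUOTA = 10000
p = [7956, 616, 163, 103, 84, 37, 11]   # hypothetical posts per day, per site

# optimal queue sizes: a_k proportional to sqrt(p_k), scaled to the full quota
total = sum(math.sqrt(q) for q in p)
a = [math.sqrt(q) * total / QUOTA for q in p]

# the quota is used exactly, and a_k / sqrt(p_k) is the same for every site
assert abs(sum(q / ak for q, ak in zip(p, a)) - QUOTA) < 1e-6
ratios = [ak / math.sqrt(q) for q, ak in zip(p, a)]
assert max(ratios) - min(ratios) < 1e-12
```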

In practice, spam is unevenly distributed. Let’s say that ${s_k}$ is the proportion of spam on the ${k}$th site. Then the mean lifetime is proportional to

$\displaystyle \sum p_k s_k \frac{a_k}{p_k} = \sum a_k s_k$

This time, the method of Lagrange multipliers yields

$\displaystyle \frac{p_k}{a_k^2} = \lambda s_k$

hence ${a_k = \sqrt{p_k/(\lambda s_k)}}$. That is, the size of the API queue should be proportional to ${\sqrt{p_k/s_k}}$.

Theoretically, the optimal size is

$\displaystyle a_k = \sqrt{p_k/s_k}\, \frac{\sum_{i=1}^n \sqrt{p_i s_i}}{10000}$

In practice, one does not know ${s_k}$ beyond some general idea of how spammy a site is. Drupal Answers gets a lot more spam than Computer Science; so even though its QPD exceeds that of CS by a factor of about four, its API queue size is set to the minimal possible value of ${1}$.

# Oscillatory explosion

Nonlinear differential equations don’t necessarily have globally defined solutions, even if the equation is autonomous (no time is involved explicitly). The simplest example is ${y'=y^2}$ with initial condition ${y(0)=1}$: the solution is ${y(t) = 1/(1-t )}$, which ceases to exist at ${t=1}$, escaping to infinity.

This kind of behavior can often be demonstrated without solving the ODE. Consider ${y''=y^3}$ with ${y(0)=1}$, ${y'(0)=0}$. I’ve no idea what the explicit solution is, but it’s clear that the quantity ${E(t) = \frac12 (y')^2 - \frac14 y^4}$ remains constant: indeed, ${E'(t) = y''y' - y^3 y' = 0}$. (Here, ${E}$ is analogous to the sum of kinetic and potential energy in physics.)

The solution ${y}$ is increasing and convex. Let ${t_n}$ be such that ${y(t_n)=n}$, so that ${t_1=0}$. The invariance of ${E}$ yields ${y'(t_n) = \sqrt{(n^4-1)/2}\ge 2^{-1/2}(n^2-1)}$. By the mean value theorem, ${t_{n+1}-t_n \le \sqrt{2}(n^2-1)^{-1}}$ for ${n=2,3,\dots}$. Since the series ${\sum \sqrt{2}(n^2-1)^{-1}}$ converges, the limit ${T = \lim_{n\rightarrow\infty }t_n}$ is finite; this is the blow-up time.
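A crude numerical confirmation (a hand-rolled RK4 sketch): the solution passes ${y=1000}$ well before ${t=2}$.

```python
def rk4_step(f, state, h):
    # one classical Runge-Kutta step for the system state' = f(state)
    k1 = f(state)
    k2 = f([s + h/2 * k for s, k in zip(state, k1)])
    k3 = f([s + h/2 * k for s, k in zip(state, k2)])
    k4 = f([s + h * k for s, k in zip(state, k3)])
    return [s + h/6 * (a + 2*b + 2*c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def f(state):
    # y'' = y^3 as a first-order system in (y, y')
    y, v = state
    return [v, y**3]

t, h, state = 0.0, 1e-4, [1.0, 0.0]   # y(0) = 1, y'(0) = 0
while state[0] < 1000 and t < 2:
    state = rk4_step(f, state, h)
    t += h

assert state[0] >= 1000 and t < 2     # blow-up well before t = 2
```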

But no blow-up occurs for the equation ${y''=-y^3}$, where the nonlinear term pushes back toward equilibrium. Indeed, the invariant energy is now ${E= \frac12 (y')^2 + \frac14 y^4}$, which implies that ${y}$ and ${y'}$ stay bounded. In the phase space ${(y,y')}$ the solution with initial values ${y(0)=1}$, ${y'(0)=0}$ traces out this curve:

Accordingly, the solution is a periodic function of ${t}$ (although this is not a trigonometric function):

Everything so far has been pretty straightforward. But here is a stranger animal: ${y'''=-y^3}$. As in the previous example, the nonlinear term pushes toward equilibrium. Using initial conditions ${y(0)=1}$, ${y'(0)=y''(0)=0}$, I get this numerical solution up to time ${T=5}$:

As in the previous example, ${y}$ oscillates. But the speed and amplitude of oscillation are increasing.

Rapidly increasing:

In the plane ${(y,y')}$ the solution spirals out:

The plots make it clear that the solution ceases to exist in finite time, but I don’t have a proof. The issue is that the function ${y}$ does not escape to infinity in just one direction. Indeed, if ${y(t_0)>0}$, then regardless of the values of ${y'}$ and ${y''}$ at ${t_0}$, the negative third derivative will eventually make the function ${y}$ decrease, and so ${y}$ will attain the value ${0}$ at some ${t_1>t_0}$. After that, the third derivative is positive, guaranteeing the existence of time ${t_2>t_1}$ when ${y}$ returns to ${0}$ again. I haven’t succeeded in proving that the limit ${\lim t_n}$ is finite, giving the time when oscillatory explosion occurs.

The plots were made in SageMathCloud:

var('t y y1 y2')
# y''' = -y^3 written as a first-order system in (y, y', y'')
P = desolve_system_rk4([y1, y2, -y^3], [y, y1, y2], ics=[0,1,0,0], ivar=t, end_points=5, step=0.01)
Q = [[i, j] for i, j, k, l in P]   # keep only the (t, y) pairs
list_plot(Q, plotjoined=True)

# Between a cubic spline and the line of regression

Given some data points ${(x_i,y_i)}$ one can devise various functions that extract information from them, such as an interpolating cubic spline, which passes through every given point, or the least-squares regression line, which is straight and typically passes through none of them.

A smoothing spline is something in between the above extremes: it insists on neither being a line (i.e., having zero second derivative) nor on passing through given points (i.e., having zero residuals). Instead, it minimizes a weighted sum of both things: the integral of second derivative squared, and the sum of residuals squared. Like this plot, where the red points are given but the spline chooses to interpolate the green ones:

I’ll demonstrate a version of a smoothing spline that might not be exactly canonical, but is very easy to implement in Matlab or Scilab (which I prefer to use). As in Connecting dots naturally, assume the knots ${a=x_0<x_1<\dots<x_n=b}$ are equally spaced at distance ${h=(b-a)/n}$. The natural cubic spline is determined by the values of its second derivative at ${x_1,\dots,x_{n-1}}$, denoted ${z_1,\dots,z_{n-1}}$. As explained earlier, these can be found from the linear system

$\displaystyle \frac{h}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & 4 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 4 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 4 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-1}\end{pmatrix} = - \frac{1}{h} \begin{pmatrix} \Delta^2 y_1 \\ \Delta^2 y_2 \\ \vdots \\ \Delta^2 y_{n-1}\end{pmatrix}$

where the column on the right contains the amounts by which the derivative of the piecewise linear interpolant through given points jumps at every knot. The notation ${\Delta^2y }$ means the second difference of ${(y_i)}$, for example ${\Delta^2y_1 =y_2-2y_1+y_0}$.

A smoothing spline is also a natural spline, but for a different set of points ${(x_i,\tilde y_i)}$. One has to find ${\tilde y_i}$ that minimize a weighted sum of ${\sum_i (\tilde y_i-y_i)^2}$ and of ${\int_a^b (f''(x))^2\,dx}$. The latter integral is easily computed in terms of ${z_i}$: it is equal to ${\frac{h}{3}\sum_{i=1}^{n}(z_i^2+z_{i-1}^2+z_iz_{i-1})}$. Since this quadratic form is comparable to ${\sum_{i=1}^{n}z_i^2}$, I’ll work with the latter instead.

The idea is to set up an underdetermined system for ${z_i}$ and ${\tilde y_i-y_i}$, and let Scilab find a least squares solution of that. Let’s introduce a weight parameter ${\lambda>0}$ that will determine the relative importance of curvature vs residuals. It is convenient to let ${\tilde y_i=y_i+\lambda h^2 \delta_i}$, so that both ${\delta_i}$ and ${z_i}$ (second derivative) scale the same way. The terms ${\delta_i}$ contribute to the linear system for ${z_i}$, since the right-hand side now has ${\tilde y_i}$ instead of ${y_i}$. This contribution is ${- \lambda h \Delta^2 \delta}$. Moving it to the left-hand side (since ${\delta_i}$ are unknowns) we obtain the following system.

$\displaystyle \begin{pmatrix} A & | & B \end{pmatrix} \begin{pmatrix} z \\ \delta \end{pmatrix} = - \frac{1}{h} \Delta^2 y$

where ${A}$ is the same tridiagonal matrix as above, and ${B}$ is the rectangular Laplacian-type matrix

$\displaystyle B = \frac{\lambda}{h} \begin{pmatrix} -1 & 2 & -1 & 0 & \ldots & 0 & 0 \\ 0 & -1 & 2 & -1 & \ldots & 0 & 0 \\ 0 & 0 & -1 & 2 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 2 & -1 \end{pmatrix}$

All in all, the system has ${2n }$ unknowns ${z_1,\dots,z_{n-1},\delta_0,\dots,\delta_n}$, and ${(n-1)}$ equations, reflecting the continuity of first derivative at each interior knot. The lsq command of Scilab finds the solution with the least sum of squares of the unknowns, which is what we are after.

Time for some examples. ${\lambda=0}$ and ${\lambda=0.05}$ can be seen above. Here are more:

As ${\lambda}$ increases, the spline approaches the regression line:

Finally, the Scilab code. It is almost the same as for natural spline; the difference is in five lines from B=... to newy=... The code after that is merely evaluation and plotting of the spline.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
lambda = 0.1

n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))

B = lambda/h*(diag(-2*ones(1,n+1))+diag(ones(1,n),1)+diag(ones(1,n),-1))
C = [A, B(2:$-1,:)]
sol = lsq(C,-jumps')'
z = [0,sol(1:n-1),0]
newy = y + lambda*h^2*sol(n:$)

allx = []
spl = []
for i=1:n
xL = a+h*(i-1)
xR = a+h*i
x = linspace(xL,xR,100)
linear = newy(i)*(xR-x)/h + newy(i+1)*(x-xL)/h
correction = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
allx = [allx, x]
spl = [spl, linear+correction]
end

plot(allx, spl)
plot(a:h:b, newy, 'g*')
plot(a:h:b, y, 'r*')
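For readers without Scilab, here is a NumPy sketch of the same construction (a translation under the same conventions, not the original code); np.linalg.lstsq returns the minimum-norm solution of an underdetermined system, playing the role of lsq.

```python
import numpy as np

def smoothing_spline_coeffs(y, a=0.0, b=10.0, lam=0.1):
    # returns z (second derivatives at the knots) and the adjusted values newy
    y = np.asarray(y, dtype=float)
    n = len(y) - 1
    h = (b - a) / n
    # tridiagonal matrix A, (n-1) x (n-1)
    A = (h / 6) * (4 * np.eye(n - 1) + np.eye(n - 1, k=1) + np.eye(n - 1, k=-1))
    # second-difference matrix B, (n-1) x (n+1), scaled as in the Scilab code
    B = (lam / h) * (np.eye(n - 1, n + 1) - 2 * np.eye(n - 1, n + 1, k=1)
                     + np.eye(n - 1, n + 1, k=2))
    C = np.hstack([A, B])
    rhs = -np.diff(y, 2) / h
    # minimum-norm least squares solution of the underdetermined system
    sol, *_ = np.linalg.lstsq(C, rhs, rcond=None)
    z = np.concatenate([[0.0], sol[:n - 1], [0.0]])   # natural: z_0 = z_n = 0
    newy = y + lam * h**2 * sol[n - 1:]
    return z, newy

z, newy = smoothing_spline_coeffs([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
```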

# Stack Exchange sites: questions and visits

After a year, it is time to update the Hertzsprung–Russell diagram of Stack Exchange sites. Here is the new and much improved version: a bubble chart made with Google Spreadsheets. The data is imported with the ImportXML function, which apparently means it will be automatically refreshed every two hours. Just in case, here is a (heavily zoomed out) static image. It’s not as usable as the interactive chart you see above, because one can’t read the text on obscured bubbles by hovering over them.

As a rule of thumb, the sites with at least 10000 visits/day and 10 questions/day are those that “made it”: they either graduated from beta or will probably graduate in foreseeable future (assuming the quality of content is decent and the community is somewhat capable of self-organization).

A year ago, I joked about Beer, Italian Language, and Expatriates being either closed or merged into one site. Neither has happened yet. Of the three, the Expatriates site now has the largest numbers of visits and questions, though the quality is often an issue. Italian Language caught up with Russian Language (not to be confused with Русский язык, a site recently imported from Hashcode.ru that differentiates itself by targeting native speakers rather than language learners). Of the three, Beer appears to have the least amount of energy at present.

Some of the sites at the bottom, around 100 visits/day, are very new and will not stay there for long. The future of Community Building seems less certain.

Here’s the technical part. The data is/are obtained from this list of sites with the command

=IMPORTXML("http://stackexchange.com/sites?view=list", "//div[contains(@class,'category')]/@class|//input/@value")

The XPath parameter consists of two parts: //div[contains(@class,'category')]/@class picks up the class of every div element with “category” in its list of classes. This is later used to categorize sites as Technology, Science, Life/Arts, etc. The other part is //input/@value, which grabs all other data: number of questions, number of answers, number of users… most of which I have no use for. To sort this column by data type I put =MOD(ROW()-2,9) next to it:

+------------------------------------------------+------+
|                      Raw                       | Item |
+------------------------------------------------+------+
| lv-item category-technology site-stackoverflow |    0 |
| 1217462400                                     |    1 |
| 9410733                                        |    2 |
| 15707906                                       |    3 |
| 73.95805406                                    |    4 |
| 4258590                                        |    5 |
| 7232805                                        |    6 |
| 7956.285714                                    |    7 |
| Stack Overflow                                 |    8 |
| lv-item category-technology site-serverfault   |    0 |
| 1241049600                                     |    1 |
+------------------------------------------------+------+

The columns C-F filter the content of A by item type. For example, D is the site name: =FILTER(A:A,B:B=8), etc. The Category column needs a bit of extra (routine) work to make it user-friendly. Otherwise, the table is finished:

+---------------------+-------------------+----------------+
|        Name         | Questions per day | Visits per day |
+---------------------+-------------------+----------------+
| Stack Overflow      | 7956.3            |        7232805 |
| Server Fault        | 103.1             |         353739 |
| Super User          | 162.9             |         734077 |
| Meta Stack Exchange | 21.8              |           8680 |
| Web Applications    | 12.7              |          44341 |
| Arqade              | 36.5              |         247448 |
| Webmasters          | 13.6              |          12433 |
| Seasoned Advice     | 6.9               |          80308 |
| Game Development    | 21.1              |          25836 |
| Photography         | 8.5               |          26560 |
| Cross Validated     | 84.4              |          63317 |
| Mathematics         | 616.4             |         150744 |
| Home Improvement    | 17.3              |          71148 |
+---------------------+-------------------+----------------+

The bubble chart uses this table for the log-log scale plot that you see above.

# The volume of the intersection of balls

Fix two positive numbers ${R}$ and ${r}$. The volume of the intersection of balls of radii ${R}$ and ${r}$ is a function of the distance ${d}$ between their centers. Is it a decreasing function?

Wolfram MathWorld gives a formula for this volume in the three-dimensional case

$\displaystyle V(d) = \frac{\pi}{12 d} (R+r-d)^2(d^2+2rd-3r^2+2Rd+6Rr-3R^2)$

(Here and in the sequel it is assumed that ${d\le R+r}$.) From this formula it is not at all obvious that ${V(d)}$ is decreasing. And this is just for three dimensions.
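One can at least check monotonicity numerically from this formula (a sketch with ${R=r=1}$, over ${0<d<2}$):

```python
import math

def V(d, R=1.0, r=1.0):
    # MathWorld's formula for the volume of the intersection of two balls
    return (math.pi / (12 * d)) * (R + r - d)**2 * (
        d**2 + 2*r*d - 3*r**2 + 2*R*d + 6*R*r - 3*R**2)

ds = [0.001 * k for k in range(1, 2000)]
vals = [V(d) for d in ds]
assert all(v1 > v2 for v1, v2 in zip(vals, vals[1:]))   # strictly decreasing
assert abs(vals[0] - 4 * math.pi / 3) < 1e-2            # near a full ball as d -> 0
```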

Let’s begin with the one-dimensional case, then. Consider the intervals ${[-R,R]}$ and ${[d-r,d+r]}$ where ${d\le R+r}$. A point ${x}$ belongs to their intersection if and only if

$\displaystyle \max(-R,d-r)\le x \le \min(R,d+r)$

Hence, the length of the intersection is ${L(d)=\min(R,d+r)-\max(-R,d-r)}$. Even this simple formula does not quite make it evident that the function is decreasing for ${d>0}$. If ${d+r>R}$ this is clear enough, but the case ${d+r\le R}$ takes a moment’s thought: then ${d-r\ge -d-r\ge -R}$, so ${L(d)=2r}$, which at least does not increase.

Fortunately, Fubini’s theorem reduces the general case to the one-dimensional one. Take any line ${L}$ parallel to the line connecting the centers. The intersection of each ball with ${L}$ is a line segment whose length is independent of ${d}$. The distance between the midpoints of these segments is ${d}$; thus, the length of the intersection is a decreasing function of ${d}$.

Gromov’s note Monotonicity of the volume of intersection of balls (1987) deals with the more general case of ${k\le n+1}$ balls in ${\mathbb R^n}$: if two sets of balls ${B(x_i,r_i)}$ and ${B(x_i',r_i)}$ satisfy ${\|x_i'-x_j'\|\le \|x_i-x_j\|}$ for all ${i,j}$, then the volume of ${\bigcap B(x_i,r_i)}$ does not exceed the volume of ${\bigcap B(x_i',r_i)}$.

The paper ends with

Question: Who is the author of [this result]? My guess is this was known to Archimedes. Undoubtedly the theorem can be located […] somewhere in the 17th century.

Meanwhile, the corresponding problem for the volume of the union remains open.

# Implicit Möbius strip

Reading the Wolfram MathWorld™ article on the Möbius strip I came across an equation that was news to me.

After the familiar parametric form

$\displaystyle x=(1+s\cos(t/2))\cos t,\quad y=(1+s\cos(t/2))\sin t, \quad z = s\sin(t/2) \qquad (1)$

it is said that “In this parametrization, the Möbius strip is therefore a cubic surface with equation…”

$\displaystyle y(x^2+y^2+z^2-1)-2z(x^2+y^2+x) =0 \qquad\qquad\qquad(2)$

That’s a neat cubic equation for sure. But… wouldn’t the sign of the left hand side of (2) (call it $F$) distinguish the “positive” and “negative” sides of the surface, contrary to its non-orientability?

I went to SageMathCloud™ to check and got this plot of (2):

That’s… not exactly the Möbius strip as I know it. But it’s true that (1) implies (2): the cubic surface contains the Möbius strip. Here is the plot of (1) superimposed on the plot of (2).
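That (1) implies (2) is easy to confirm numerically, by evaluating the left-hand side of (2) on a grid of parameter values:

```python
import math

def F(x, y, z):
    # left-hand side of the implicit equation (2)
    return y * (x**2 + y**2 + z**2 - 1) - 2 * z * (x**2 + y**2 + x)

for i in range(50):
    for j in range(50):
        t = 2 * math.pi * i / 50
        s = -0.5 + j / 49            # strip parameter s in [-1/2, 1/2]
        x = (1 + s * math.cos(t / 2)) * math.cos(t)
        y = (1 + s * math.cos(t / 2)) * math.sin(t)
        z = s * math.sin(t / 2)
        assert abs(F(x, y, z)) < 1e-12
```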

The self-intersection arises where the “positive” side ${F > 0}$ suddenly transitions to the “negative” side ${F < 0}$.

# Trigonometric approximation and the Clenshaw-Curtis quadrature

Trying to approximate a generic continuous function on ${[-1,1]}$ with the Fourier trigonometric series of the form ${\sum_n (A_n\cos \pi nx+B_n\sin \pi nx)}$ is in general not very fruitful. Here’s such an approximation to ${f(x)=\exp(x)}$, with the sum over ${n\le 4}$:

It’s better to make a linear change of variable: consider ${f(2x-1)}$ on the interval ${[0,1]}$, and use the formula for the cosine series. This results in ${\exp(2x-1)}$, which is approximated by the partial sum ${\sum_{n=0}^4 A_n\cos \pi nx}$ of its cosine Fourier series as follows.

But one can do much better with a different, nonlinear change of variable. Consider ${f(\cos x)}$ on the interval ${[0,\pi]}$, and again use the formula for the cosine series. This results in ${\exp(\cos x)}$, which is approximated by the partial sum ${\sum_{n=0}^4 A_n\cos nx}$ of its cosine Fourier series as follows.

Yes, I can’t see any difference either: the error is less than ${10^{-3}}$.

The composition with cosine improves approximation because ${f(\cos t)}$ is naturally a periodic function, with no jumps or corners in its graph. Fourier series, which are periodic by nature, follow such functions more easily.

A practical implication of this approximation of ${f(\cos t)}$ is the Clenshaw-Curtis integration method. It can be expressed in one line:

$\displaystyle \int_{-1}^1 f(x)\,dx = \int_0^\pi f(\cos t)\sin t\,dt \approx \int_0^\pi \sum_{n=0}^N a_n \cos nt \sin t\,dt = \sum_{n=0}^N \frac{(1+(-1)^n) a_n}{1-n^2}$

The integral ${\int_{-1}^1 f(x)\,dx}$ is approximated by summing ${2a_{2k}/(1-4k^2)}$, where ${a_{2k}}$ are the even-numbered cosine coefficients of ${f(\cos t)}$. (The odd-numbered terms vanish, since ${1+(-1)^n=0}$ for odd ${n}$; this also disposes of the zero denominator at ${n=1}$.) In the example with ${f(x)=\exp(x)}$, using just three coefficients yields

$\displaystyle \frac{2a_0}{1}+\frac{2a_2}{-3}+\frac{2a_4}{-15} \approx 2.350405$

while the exact integral is ${\approx 2.350402}$.

At first this doesn’t look practical at all: after all, the Fourier coefficients are themselves found by integration. But since ${f(\cos t)}$ is so close to a trigonometric polynomial, one can sample it at equally spaced points and apply the Fast Fourier transform to the result, quickly obtaining many coefficients at once. This is what the Clenshaw-Curtis quadrature does (at least in principle; a practical implementation may streamline these steps).
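Here is a minimal Python sketch of the method. For clarity it computes the coefficients with a plain trapezoidal cosine sum rather than an actual FFT (the trapezoidal rule is spectrally accurate for smooth periodic functions, so this loses nothing but speed); the parameters N and M are arbitrary choices.

```python
import math

def clenshaw_curtis(f, N=8, M=64):
    """Approximate the integral of f over [-1, 1].

    N is the number of cosine coefficients used; M+1 is the number of
    sample points of g(t) = f(cos t) on [0, pi].
    """
    ts = [math.pi * j / M for j in range(M + 1)]
    g = [f(math.cos(t)) for t in ts]
    w = [0.5] + [1.0] * (M - 1) + [0.5]          # trapezoid weights

    def coeff(n):
        # a_n = (2/pi) * integral of g(t) cos(nt) over [0, pi]; half that for n = 0
        integral = (math.pi / M) * sum(
            wj * gj * math.cos(n * tj) for wj, gj, tj in zip(w, g, ts))
        return integral / math.pi if n == 0 else 2 * integral / math.pi

    # only even n contribute: integral ≈ sum of 2 a_n / (1 - n^2)
    return sum(2 * coeff(n) / (1 - n * n) for n in range(0, N + 1, 2))
```

With N=4 (three even coefficients, as in the example above) this reproduces the value 2.350405 for ${f(x)=\exp(x)}$.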