Polynomial delta function

The function {k_1(x,y) = (3xy+1)/2} has a curious property: for any linear function {\ell}, and any point {y\in \mathbb R}, the integral {\int_{-1}^1 \ell(x)k_1(x,y)\,dx} evaluates to {\ell(y)}. This is easy to check using the fact that odd powers of {x} integrate to zero:

\displaystyle \frac12 \int_{-1}^1 (ax+b)(3xy+1)\,dx = \frac12 \int_{-1}^1 (3ax^2y+b)\,dx = \frac12(2ay+2b) = ay+b

More generally, for any integer {n\ge 0} there exists a unique symmetric polynomial {k_n(x,y)} that has degree {n} in {x} and {y} separately and satisfies {\int_{-1}^1 p(x)k_n(x,y)\,dx = p(y)} for all polynomials {p} of degree at most {n}. For example, {k_0(x,y)=1/2} (obviously) and

\displaystyle k_2(x,y)=\frac98+\frac32xy-\frac{15}{8}(x^2+y^2)+\frac{45}{8}x^2y^2
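Here is a quick symbolic sanity check of the reproducing property of {k_2}, in Python with sympy (a check I added; not part of the original computation):

from sympy import symbols, integrate, expand, Rational

x, y, a, b, c = symbols('x y a b c')
# k_2 as displayed above
k2 = Rational(9,8) + Rational(3,2)*x*y - Rational(15,8)*(x**2 + y**2) + Rational(45,8)*x**2*y**2
# integrating a generic quadratic against k_2 should return the same quadratic at y
p = a*x**2 + b*x + c
print(expand(integrate(p*k2, (x, -1, 1))))   # a*y**2 + b*y + c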

The formula is not really intuitive, and a 3D plot would not help matters much. To visualize {k_n}, I plotted {k_n(x,-3/4)}, {k_n(x,0)}, and {k_n(x,1/2)} below (green, red, and blue, respectively).

[Plots: Degree 1, Degree 2, Degree 4]

For {y\in [-1,1]} and large {n}, the function {k_n(\cdot, y)} approaches the Dirac delta at {y}, although the convergence is slow, especially when {|y|} is close to {1}. I don’t think there is anything good to be said about the case {|y|>1}.

[Plots: Degree 10, Degree 20]

The existence and uniqueness of {k_n} are a consequence of the Riesz representation of linear functionals on an inner product space. Indeed, polynomials of degree at most {n} form such a space {\mathbb P_n} with inner product {\langle p,q\rangle = \int_{-1}^1p(x)q(x)\,dx}, and the functional {p\mapsto p(y)} is linear for any fixed {y\in\mathbb R}. Hence, this functional can be written as {p\mapsto \langle p, k_y\rangle } for some {k_y\in\mathbb P_n}. The function {(x,y) \mapsto k_y(x)} is the reproducing kernel of this space. Its symmetry is not immediately obvious.

The Legendre polynomials {P_0,\dots,P_n} are an orthogonal basis of {\mathbb P_n}; more precisely, {\widetilde{P}_j = \sqrt{j+1/2}P_j} form an orthonormal basis. It’s a general fact about reproducing kernels that

\displaystyle k(x,y) = \sum_j \widetilde{P}_j(x)\widetilde{P}_j(y)

(which, incidentally, proves the symmetry {k(y,x)=k(x,y)}). Indeed, taking this sum as the definition of {k} and writing {p = \sum_{j=0}^n c_j \widetilde{P}_j}, we find

\displaystyle \langle p, k(\cdot, y)\rangle = \sum_j \widetilde{P}_j(y) \langle p, \widetilde{P}_j\rangle = \sum_j \widetilde{P}_j(y) c_j = p(y)

This is the Sage code used for the above plots.

var('y')   # x is predefined in Sage; y is not
n = 20
# reproducing kernel: k_n(x,y) = sum of (j+1/2) P_j(x) P_j(y) over j = 0..n
k = sum((j+1/2)*legendre_P(j,x)*legendre_P(j,y) for j in range(n+1))
plot(k.subs(y=-3/4),(x,-1,1),color='green') + plot(k.subs(y=0),(x,-1,1),color='red') + plot(k.subs(y=1/2),(x,-1,1),color='blue')

Higher degrees cause some numerical issues…

[Plot: Degree 22]

Post motivated by a Math.SE question.

Bounded convex function with no continuous boundary extension

Suppose {f\colon \Omega\rightarrow\mathbb R} is a convex function on a bounded convex domain {\Omega\subset\mathbb R^n}. Does it have a continuous extension to {\overline{\Omega}}?

Of course not if {f} is unbounded, like {f(x)=1/x} on the interval {(0,1)}. So let’s assume {f} is bounded. Then the answer is positive in one dimension, and easy to prove: {f} is monotone in some neighborhood of a boundary point, and being bounded, must have a finite limit there.

In higher dimensions, the above argument does not work anymore. And indeed, a bounded convex function on a nice domain (like a disk) may fail to have a limit at some boundary points.

The example I describe is geometrically natural, but doesn’t seem to have a neat closed form equation. Let {U} be the upper half of the open unit disk. Its boundary consists of the diameter {I} and the semicircle {C}. Define

\displaystyle    f(x,y) = \sup \{\ell(x,y) : \ell \text{ is affine }, \ \ell \le 0 \text{ on }I, \ \ell\le 1 \text{ on }C\}

Equivalently, take the convex hull of the set {(I\times \{0\} )\cup( C\times \{1\})} in {\mathbb R^3}, and let {z=f(x,y)} be the equation of its bottom surface.

This is a convex function by construction. It takes a bit of (routine) work to show that {f} has limit {0} everywhere on {I}, except the endpoints, and that it has limit {1} on {C}, again except the endpoints. Consequently, there is no limit of {f(x,y)} as {(x,y)\rightarrow (1,0)}.

Here’s a plot of {f}:

[Plot: convex function with values between 0 and 1]

To obtain it in a reasonably efficient way, I had to narrow down the class of affine functions without changing the supremum. Note that if {0<\sup_U \ell<1}, then dividing {\ell} by {\sup_U \ell} gives a better contributor to the supremum. (This increases {\ell} where it is positive, and the parts where it is negative do not matter anyway since {f\ge0}.)

Let {(\cos t, \sin t)} be the point of {C} where {\ell } attains the value {1}. Then {\nabla \ell} is parallel to the radius at that point. To fulfill the condition {\ell\le 0} on {I}, the function must decay quickly enough along the radius toward the center. The required rate is found by projecting {(\pm 1,0)} onto the radius: it is {(1-|\cos t|)^{-1}}. Hence, we only need to consider

\displaystyle    \ell(x,y) = \frac{x\cos t+y\sin t-|\cos t|}{1-|\cos t|}

The one-parameter supremum over {0<t<\pi} is not hard to evaluate numerically. Here is the Matlab code that produced the plot above.

[R,T] = meshgrid(0:.01:1, 0:.01:pi);
X = (2-R).*R.*cos(T); Y = (2-R).*R.*sin(T);   % polar-like grid on the half-disk
Z = zeros(size(X));
t = .001:.001:pi-.001;                        % parameter of the affine functions
for k = 1:length(t)
    Z = max(Z, (X*cos(t(k))+Y*sin(t(k))-abs(cos(t(k))))./(1-abs(cos(t(k)))));
end
Z = min(Z, 1);
surf(X, Y, Z)

The truncation step Z = min(Z, 1); would be unnecessary in exact arithmetic, but it helps to cut out floating point errors.

The factor (2-R) is attached to the polar coordinate transformation so that there are more polar circles near the boundary, where the function changes most rapidly.

Finally, note that the nonsmoothness of the domain {U} is not an issue here. The function {f} naturally extends to a convex function on the open unit disk, by letting {f=0} in the bottom half.

(Adapted from Extension of bounded convex function to boundary)

568 to 7

Since the number 7 is in the name of the blog (for some reason that I don’t remember), a limited amount of associated numerology seems appropriate, if only as a filler.

568 is the smallest positive integer whose seventh power is the sum of seven seventh powers.

\displaystyle     568^7=525^7+439^7+430^7+413^7+266^7+258^7+127^7
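The identity is easy to confirm with exact integer arithmetic, for instance in Python (a sanity check I added, not part of the original post):

# check 568^7 against the sum of the seven seventh powers above
terms = [525, 439, 430, 413, 266, 258, 127]
print(sum(t**7 for t in terms) == 568**7)   # True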

According to Wolfram MathWorld, it is not known whether the seventh power of a positive integer can be written as the sum of 3, 4, 5, or 6 seventh powers.

More numerology:

  • The fouriest transform of 568 is in base seven: {568=1441_7}
  • The digit 7 is conspicuously missing from the digits of 568

Ac-Cent-Tchu-Ate spreadsheet arrays

Given a real number {a}, one can find its positive part {a^+} in multiple ways. By definition,

\displaystyle  a^+ = \begin{cases} a,\quad &a>0 \\ 0,\quad &a\le 0\end{cases}

as a maximum, {a^+ = \max(a,0)}, or by using the absolute value: {a^+=(a+|a|)/2}.

This is boring.

You know what else is boring? Spreadsheets.

Google Sheets, in particular. Let’s suppose we have a bunch of numbers in a spreadsheet (in a range like A1:B3), and want a single-cell formula (using arrayformula) that gets the positive parts of all of them.

 1   -2
 4    0
-1    3

The definition-based approach, =arrayformula(if(A1:B3>0, A1:B3, 0)) does the job:

 1    0
 4    0
 0    3

But it’s kind of redundant: the range has to be typed twice. This can be annoying if you are not just taking the positive part once but computing something nested, such as {(1-(2-a^+)^+)^+}. Each application of the positive part would double the length of the formula.

The absolute value approach, =arrayformula((A1:B3+abs(A1:B3))/2), has the same issue.

The maximum looks promising, because {\max(a,0)} involves {a} only once. Alas, =arrayformula(max(A1:B3,0)) returns a single number (4 in this example), because max just takes the maximum of all entries.

Is there a spreadsheet formula that returns the positive part in an array-compatible way and uses the argument only once?

Yes, there is: =arrayformula(iferror(sqrt(A1:B3)^2, 0))

Negative arguments throw an error at the evaluation of the square root. This error is handled by the iferror wrapper, which returns its second argument in that case, namely 0. Otherwise, the value of the square root gets squared, recovering the input.

This can be nested: for example, the formula

=arrayformula(1-iferror(sqrt(1-iferror(sqrt(A1:B3)^2, 0))^2, 0))

clamps the input values between 0 and 1, performing {\min(1,\max(0,a))}. Yes, it’s long — but if the input is itself a 100-character formula, this is still a simplification over the other approaches.
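As a cross-check of the clamping identity {1-(1-a^+)^+=\min(1,\max(0,a))}, here is a tiny Python test (the inputs are dyadic, so floating-point arithmetic is exact):

def pos(a):                     # positive part a^+
    return max(a, 0.0)

for a in [-2.0, -0.5, 0.0, 0.25, 0.75, 1.0, 2.5]:
    assert 1 - pos(1 - pos(a)) == min(1.0, max(0.0, a))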

Rough isometries

An isometry is a map {f\colon X\rightarrow Y} between two metric spaces {X,Y} which preserves all distances: {d_Y(f(x),f(x')) = d_X(x,x')} for all {x,x'\in X}. (After typing a bunch of such formulas, one tends to prefer shorter notation: {|f(x)f(x')| = |xx'|}, with the metric inferred from context.) It’s a popular exercise to prove that every isometry from a compact metric space into itself is surjective.

A rough isometry allows additive distortion of distances: {|\,|f(x)f(x')| - |xx'|\,|\le \epsilon}. (As contrasted with bi-Lipschitz maps, which allow multiplicative distortion). Rough isometries ignore small-scale features (in particular, they need not be continuous), but accurately capture the large scale structure of a space. This makes them convenient for studying hyperbolic spaces (trees and their relatives), where the large-scale structure is of interest.

If {f} is an {\epsilon}-rough isometry, then the Gromov-Hausdorff distance {d_{GH}} between {X} and {f(X)} is at most {\epsilon/2}. This follows from a convenient characterization of {d_{GH}}: it is equal to {\frac12 \inf_R \textrm{dis}\,(R)} where the infimum is taken over all subsets {R\subset X\times Y} that project surjectively onto each factor, and {\textrm{dis}\,(R) = \sup \{|\,|xx'|-|yy'|\,| \colon xRy, \ x'Ry' \} }.

Since small-scale features are ignored, it is not quite natural to talk about rough isometries being surjective. A natural concept for them is {\epsilon}-surjectivity, meaning that every point of {Y} is within distance {\le \epsilon} of {f(X)}. When this happens, we can conclude that {d_{GH}(X,Y)\le 3\epsilon/2}, because the Hausdorff distance between {Y} and {f(X)} is at most {\epsilon}.

Recalling that an isometry of a compact metric space {X} into {X} is automatically onto, it is natural to ask whether {\epsilon}-rough isometries of {X} into {X} are {\epsilon}-surjective. This, however, turns out to be false in general.

First example, a finite space: {X=\{0,1,2,\dots,n\}} with the metric {d(i,j) = i+j} (when {i\ne j}). Consider the backward shift {f(i)=i-1} (and {f(0)=0}). By construction, this is a rough isometry with {\epsilon=2}. Yet, the point {n} is at distance {n} from {f(X) = \{0,1,2,\dots, n-1\}}.
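A brute-force verification of this example in Python, with the hypothetical choice n = 10:

n = 10
d = lambda i, j: 0 if i == j else i + j   # the metric on X = {0, 1, ..., n}
f = lambda i: max(i - 1, 0)               # backward shift

pts = range(n + 1)
# largest additive distortion over all pairs: 2, so f is a 2-rough isometry
print(max(abs(d(f(i), f(j)) - d(i, j)) for i in pts for j in pts))
# distance from the point n to the image f(X): n, so f is far from 2-surjective
print(min(d(n, f(i)) for i in pts))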

The above metric can be thought of as the distance one has to travel from {i} to {j} with a mandatory visit to {0}. This makes it similar to the second example.

Second example, a geodesic space: {X} is the graph shown below, a subset of {\mathbb R^2} with the intrinsic (path) metric, i.e., the length of shortest path within the set.

[Image: comb space]

Define {f(x,y) = ((x-1)^+, (y-1)^+)} where {{\,}^+} means the positive part. Again, this is a {2}-rough isometry. The omitted part, shown in red below, contains a ball of radius {5} centered at {(5,5)}. Of course, {5} can be replaced by any positive integer.

[Image: omitted part in red]

Both of these examples are intrinsically high-dimensional: they cannot be accurately embedded into {\mathbb R^d} for low {d} (using the restriction metric rather than the path metric). This raises a question: does there exist {C=C(d)} such that for every compact subset {K\subset \mathbb R^d}, equipped with the restriction metric, every {\epsilon}-rough isometry {f\colon K\rightarrow K} is {C(d)\epsilon}-surjective?

Retraction by contraction

The Euclidean space {\mathbb R^n} has a nice property: every closed convex subset {C\subset \mathbb R^n} is the image of the whole space under a map {f} that is simultaneously:

  • a contraction, meaning {\|f(a)-f(b)\|\le \|a-b\|} for all {a,b\in\mathbb R^n};
  • a retraction, meaning {f(a)=a} for all {a\in C}.

Indeed, we can simply define {f(x)} to be the (unique) nearest point of {C} to {x}; it takes a bit of work to verify that {f} is a contraction, but not much.
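A quick numerical illustration in Python, with the closed unit disk standing in for {C} (my choice of example; numpy is used for convenience):

import numpy as np

def proj(x):
    # nearest point of the closed unit disk to x
    r = np.linalg.norm(x)
    return x if r <= 1 else x / r

rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = 3 * rng.normal(size=2), 3 * rng.normal(size=2)
    # the projection never increases Euclidean distances
    assert np.linalg.norm(proj(a) - proj(b)) <= np.linalg.norm(a - b) + 1e-12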

In other normed spaces, this nearest point projection does not work that well. For example, take {X=\ell_1^2}, the two-dimensional space with the Manhattan metric. Consider the line {C=\{(2t,t)\colon t\in\mathbb R\}} which is a closed and convex set. The nearest point of {C} to {(2,0)} is {(2,1)}: moving straight up, since changing the first coordinate doesn’t pay off. Since {(0,0)} remains fixed, the nearest point projection increases some pairwise distances, in this case from {2} to {3}.

However, there is a contractive retraction onto this line, given by the formula {T(x_1,x_2) =    \left(\frac23(x_1+x_2), \frac13(x_1+x_2)\right)}. Indeed, this is a linear map that fixes the line {x_1=2x_2} pointwise and has norm {1} because

\displaystyle \|T(x)\|_1 = |x_1+x_2|\le \|x\|_1
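A numerical spot check in Python that this {T} does not increase {\ell_1} distances and fixes the line:

import numpy as np

T = lambda x: np.array([2, 1]) * (x[0] + x[1]) / 3   # the retraction above

rng = np.random.default_rng(1)
for _ in range(1000):
    a, b = rng.normal(size=2), rng.normal(size=2)
    assert np.abs(T(a) - T(b)).sum() <= np.abs(a - b).sum() + 1e-12
print(T(np.array([2.0, 1.0])))   # [2. 1.]: points of the line are fixed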

More generally, in every normed plane, every closed convex subset admits a contractive retraction. To prove this, it suffices to consider closed halfplanes, since a closed convex set is an intersection of such, and contractions form a semigroup. Furthermore, it suffices to consider lines, because having a contractive retraction onto a line, we can redefine it to be the identity map on one side of the line, and get a contractive retraction onto a halfplane.

Such a retraction onto a line, which is a linear map, is illustrated below.

[Image: every line in a normed plane is 1-complemented]

Given the unit circle (black) and a line (blue), draw supporting lines (red) to the unit circle at the points where it meets the line. Project onto the blue line along the red ones. By construction, the image of the unit disk under this projection is contained in the unit disk. This precisely means that the map has operator norm {1}.

In spaces of dimensions {3} or higher, there are closed convex subsets without a contractive retraction. For example, consider the plane in {\ell_\infty^3} passing through the points {A = (2,2,0)}, {B= (2,0,2)}, and {C= (0,2,2)}. This plane has equation {x_1+x_2+x_3=4}. The point {D=(1,1,1)} is at distance {1} from each of A,B,C, and it does not lie on the plane. For any point E other than D, at least one of the distances AE, BE, CE exceeds 1. More precisely, the best place to project D is {(4/3, 4/3, 4/3)} which is at distance {4/3} from A, B, and C.
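A crude grid search over the plane confirms the value {4/3} (a numerical check I added, not part of the original post):

import numpy as np

A, B, C = np.array([2, 2, 0]), np.array([2, 0, 2]), np.array([0, 2, 2])
# scan points E = (u, v, 4-u-v) of the plane and record the largest
# l_inf distance to A, B, C; the minimax value is 4/3, not 1
grid = np.linspace(0, 3, 301)
best = min(max(np.abs(np.array([u, v, 4 - u - v]) - P).max() for P in (A, B, C))
           for u in grid for v in grid)
print(best)   # approximately 4/3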

Two natural questions: (a) is there a nice characterization of normed spaces that admit a contractive retraction onto every closed convex subset? (b) what is the smallest constant {L=L(n)} such that every {n}-dimensional normed space admits an {L}-Lipschitz retraction onto every closed convex subset?

(The answers may well be known, but not to me at present.)

Spam-fighting strategy: rationing API requests

Based on a true story

Given: {n} websites on which users can post stuff. For {k=1,\dots, n} the {k}th site gets {p_k} posts per day on average. Some of the posts are spam or otherwise harmful.

One of the ways to detect spam automatically is to send an API request to any of the sites, get post content and analyze it. The total number of such requests cannot exceed {10000} per day, while the total number of posts, namely {\sum p_k}, is substantially larger.

This necessitates rationing. The current strategy is to wait until a site has {a_k} new posts, and then request all of those at once. The upside is that only {p_k/a_k} requests are needed. The downside is that spam may sit undetected for a period of time proportional to {a_k/p_k}.

One would like to minimize the mean lifetime of spam. Assuming for now that spam is uniformly distributed among the sites, this lifetime is proportional to

\displaystyle \sum p_k \frac{a_k}{p_k} = \sum a_k

So, we need to minimize {\sum a_k} subject to {\sum p_k/a_k \le 10000}. The method of Lagrange multipliers says that the gradient of the objective function should be proportional to the gradient of the constraint. Hence,

\displaystyle    \frac{p_k}{a_k^2} = \lambda

with {\lambda} independent of {k}. This leads to {a_k = \sqrt{p_k/\lambda}}.

That is, the size of the API queue should be proportional to the square root of the level of activity.

An excellent proxy for overall activity on a site is QPD, questions per day. For example, the QPD ratio of Mathematics to Computer Science is currently 452:11, so the ratio of their queue sizes should be about {\sqrt{452/11} \approx 6.4} (in reality, the queue sizes are currently {10} and {2}, respectively).

The value of {\lambda} is easily found from the condition {\sum p_k/a_k = 10000} (pushing the API quota to the limit), with the final result

\displaystyle    a_k =\sqrt{p_k} \, \frac{\sum_{i=1}^n\sqrt{p_i}}{ 10000 }
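A small worked example in Python, with made-up post rates (all numbers below are hypothetical):

from math import sqrt

Q = 10000                             # daily API request budget
p = [45200, 12000, 4000, 1100]        # hypothetical posts/day per site

S = sum(sqrt(pi) for pi in p)
a = [sqrt(pk) * S / Q for pk in p]    # optimal queue sizes
print([round(ak, 2) for ak in a])     # about [8.9, 4.59, 2.65, 1.39]
print(round(sum(pk / ak for pk, ak in zip(p, a))))   # 10000: budget used exactly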

In practice, spam is unevenly distributed. Let’s say that {s_k} is the proportion of spam on the {k}th site. Then the mean lifetime is proportional to

\displaystyle \sum p_k s_k \frac{a_k}{p_k} = \sum a_k s_k

This time, the method of Lagrange multipliers yields

\displaystyle    \frac{p_k}{a_k^2} = \lambda s_k

hence {a_k = \sqrt{p_k/(\lambda s_k)}}. That is, the size of the API queue should be proportional to {\sqrt{p_k/s_k}}.

Theoretically, the optimal size is

\displaystyle    a_k = \sqrt{p_k/s_k}\, \frac{\sum_{i=1}^n \sqrt{p_i s_i}}{10000}
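And the spam-weighted variant, continuing the hypothetical numbers from above:

from math import sqrt

Q = 10000
p = [45200, 12000, 4000, 1100]    # hypothetical posts/day per site
s = [0.01, 0.05, 0.02, 0.30]      # hypothetical spam fractions

S = sum(sqrt(pi * si) for pi, si in zip(p, s))
a = [sqrt(pk / sk) * S / Q for pk, sk in zip(p, s)]
print([round(ak, 2) for ak in a])    # spammier sites get smaller queues
print(round(sum(pk / ak for pk, ak in zip(p, a))))   # 10000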

In practice, one does not know {s_k} beyond some general idea of how spammy a site is. Drupal Answers gets a lot more spam than Computer Science; so even though its QPD exceeds that of CS by a factor of about four, its API queue size is set to the minimal possible value of {1}.