Measurability of Banach space valued functions

There is only an indirect proof of the existence of a function {f\colon [0, 1]\to \mathbb R} that is not Lebesgue measurable. But it’s easy to give an explicit example when the codomain of {f} is a Banach space: just let {b(t)} be the sequence of the binary digits of {t}, considered as an element of the sequence space {\ell_\infty}.

Why is {b} not measurable? Recall that a Banach space-valued function {f} is (Bochner) measurable iff there is a sequence of simple functions {\sum v_k \chi_{A_k}} (finite sum, measurable {A_k}, arbitrary vectors {v_k}) that converges to {f} almost everywhere. This property implies that, with an exception of a null set, the range of {f} lies in the separable subspace spanned by all the vectors {v_k} used in the sequence of simple functions. But {b} has the property {\|b(t)-b(s)\|=1} whenever {t\ne s}, so the image of any uncountable set under {b} is nonseparable.

Another way to look at this: on the interval [0, 1) the function {b} is injective and its range has discrete topology, which implies that every subset of [0, 1) is the preimage of some open subset of {\ell_\infty} under {b}.

The binary-digits functions can also be used to illustrate an issue with the duality of Lebesgue-Bochner spaces {L_p(0, 1; X)} where {X} is a Banach space and {1\le p<\infty}. (So, {f} belongs to this space iff it is Bochner measurable and the {L^p} norm of {\|f\|\colon [0, 1]\to [0, \infty)} is finite.) In general we do not have the expected relation {L_p(0, 1; X)^* = L_q(0, 1; X^*)} with {1/p+1/q=1}. The natural isometric embedding of {L_q(0, 1; X^*)} into {L_p(0, 1; X)^*} is still there: any {g\in L_q(0, 1; X^*)} acts on {L_p(0, 1; X)} by {f\mapsto \int \langle f(t), g(t) \rangle\, dt}. But unless {X^*} has the Radon–Nikodym property, these are more bounded linear functionals on {L_p(0, 1; X)}.

To construct such a functional, let {b_n(t)} be the {n}-th binary digit of {t}. Given {f\in L_1(0, 1; \ell_1)}, write it in coordinates as {(f_1, f_2, \dots)} and define {\varphi(f) = \sum_n \int_0^1 f_n b_n}. This is a bounded linear functional, since {|\varphi(f)|\le \sum_n \int_0^1 |f_n| = \|f\|}. But there is no function {g\in L_\infty(0, 1; \ell_\infty)} that represents it, i.e., {\varphi(f) = \int_0^1 \langle f(t), g(t)\rangle \,dt = \sum_n \int_0^1 f_n g_n }. Indeed, if such {g} existed then by considering {f} with only one nonzero coordinate, we find that {g_n} must be {b_n}, using the duality {L_1^* =  L_\infty} in the scalar case. But the function {[0, 1]\to \ell_\infty} with the components {(b_1, b_2, \dots)} is not measurable, as shown above.

This example, which applies to all {1\le p<\infty}, also serves as a reminder that the duality relation {L_p(0, 1; X)^* = L_q(0, 1; X^*)} depends on the dual space {X^*} having the Radon-Nikodym property (RNP), not {X} itself. Indeed, {X=\ell_1} has the RNP; its dual does not.

The importance of {X^*} having the RNP becomes clear once one tries to follow the usual proof of {L_p^*=L_q}. Given {\varphi\in L_p(0,1;X)^*}, we can define an {X^*}-valued measure {\tau} on {[0, 1]} by {\tau(A)(v) = \varphi( v\chi_A)} where {A\subset [0, 1]} is Lebesgue measurable and {v\in X}. This measure has reasonable finiteness and continuity properties coming from {\varphi} being a bounded functional. Still, the existence of a density {g\colon [0, 1]\to X^*} of the measure {\tau} depends on the structure of the Banach space {X^*}.

Immaculate perfection

— I am so lucky. I was born on April 4th 1944, that’s 4/4/44. If you add that up, it comes to 16, one six. One plus six is seven: luckiest number of all.
— You know your math.
— It’s more than math, Mike, it’s… immaculate perfection.

The value of 7 for the the digital root indeed seems preferable, at least on this blog. But looking at the dates formed by repeated digits is too anthropocentric (not to mention xkcd1179). Besides, mathematics already has a concept of perfect numbers: being equal to the sum of proper divisors.

So, are there any perfect numbers with digital root equal to 7, equivalently {n\equiv 7\bmod 9}? For even perfect numbers, one can expect that the Euclid-Euler theorem {n = 2^{p-1} (2^p-1)} will help with the answer, but even so, the solution turns out to be surprisingly simple.

A lucky coincidence is that {2^6\equiv 1\bmod 9} ({64 = 1 + 7\cdot 9}). Since all primes greater than 3 have remainder 1 or 5 mod 6, there are only a few cases to check:

  • {p\equiv 1\bmod 6}, hence {2^{p-1}\equiv 2^{1-1} \equiv 1 \bmod 9} and {2^p-1\equiv 2^1-1 \equiv 1\bmod 9}. In this case {n\equiv 1\cdot 1 \equiv 1 \bmod 9}
  • {p\equiv 5\bmod 6}, hence {2^{p-1}\equiv 2^{5-1} \equiv 7 \bmod 9} and {2^p-1\equiv 2^5-1 \equiv 4\bmod 9}. In this case {n\equiv 7\cdot 4 \equiv 1 \bmod 9}
  • Exceptional cases p=2, 3 correspond to n = 6, 28. The latter is congruent to 1 mod 9, matching the preceding cases.

So, every even perfect number except 6 is congruent to 1 mod 9. Should a perfect number be congruent to 7 mod 9, it has to be odd. An odd perfect number, should it exist, must be either divisible by 9 (not good for us) or congruent to 1 mod 12. (Reference: On the form of an odd perfect number). The condition {n\equiv 1 \bmod 12} is consistent with {n\equiv 7 \bmod 9}: for example, 25 satisfies both. Come to think of it, the sum of proper divisors of 25 is… not 25. But there might still be a 7 mod 9 perfect number out there.

What numbers k have the property that the sequence {n mod k : n is perfect} stabilizes? Assuming odd perfect numbers do not exist, every power of 2 has this property. So do 3 and 9 by the above (and therefore, their products with powers of 2). I don’t know any other such k, in particular n mod 27 does not appear to stabilize. Of course, it is impossible to prove any negative result here since we don’t know if there are infinitely many perfect numbers.

Bipartite metric spaces and positive kernels

Bipartite metric spaces provide simple examples of highly non-Euclidean metric spaces. Let {A} and {B} be two disjoint abstract sets, say, with {m} and {n} points respectively. Define a metric on the union {A\cup B} as follows, where {a, b, c} are some positive numbers:

  • The distance between distinct elements of {A} is {a}.
  • The distance between distinct elements of {B} is {b}.
  • The distance between two elements from different sets is {c}.

The triangle inequality is satisfied provided that {a, b\le 2c}. When {c} is large, the space is not very interesting: it is approximately the union of two simplexes placed at some large distance from each other. On the other hand, when {a=b=2c} we encounter a bizarre situation: every point of {B} is a “midpoint” between any two distinct points of {A}, and vice versa.


A function {K\colon X\times X \to \mathbb R} is called a positive kernel if the matrix {K(x_i, x_j)_{i, j=1}^n} is positive semidefinite for any choice of {n} and {x_1, \dots, x_n\in X}. A classical theorem of Schoenberg says that a metric space {(X, d)} is isometric to a subset of a Hilbert space if and only if {\exp(-\gamma  d^2)} is a positive kernel for all {\gamma >0}. One may ask: is there any function {f\colon [0, \infty)\to\mathbb R} such that {f\circ d} is a positive kernel on {X} for every metric space {(X, d)}?

Suppose {f} is such a function. Applying it to the bipartite metric defined above, we find that the following matrix must be positive semidefinite (the case {m=3}, {n=2} shown; in general the diagonal blocks have sizes {m\times m} and {n\times n}):

{\displaystyle \begin{pmatrix} f(0) & f(a) & f(a)  & f(c) & f(c) \\   f(a) & f(0) & f(a)  & f(c) & f(c) \\   f(a) & f(a) & f(0)  & f(c) & f(c) \\   f(c) & f(c) & f(c)  & f(0) & f(b) \\   f(c) & f(c) & f(c)  & f(b) & f(0)  \end{pmatrix} }

Test the PSD property against vectors of the form {(s, s, s, t, t)}, that is {m} entries equal to some number {s} followed by {n} entries equal to some number {t}. The result is the quadratic form {As^2 + Bt^2 + 2C^2st } where {A = (m^2-m) f(a)+mf(0)}, {B = (n^2-n)f(b)+nf(0)}, and {C = mnf(c)}.

For this quadratic form to be positive semidefinite, we need {A\ge 0} and {C^2\le AB} to hold. The inequality {A\ge 0} amounts to {f(a)\ge 0} and {f(0)\ge 0}, because {m} can be any positive integer. Simply put, {f} must be nonnegative.

The inequality {C^2\le AB} simplifies to

{ mn f(c)^2 \le ( (m-1) f(a)+f(0)) \cdot ((n-1)f(b)+ f(0)) }

When {m=n=1} and {a=b}, we obtain {f(c)\le f(0)}; so, {f} is maximized at {0}. Letting {m, n\to \infty} yields {f(c)^2 \le f(a)f(b)}. What does this mean?

If {c=(a+b)/2}, the inequality {f(c)^2 \le f(a)f(b)} amounts to log-convexity of {f}, and there are lots of log-convex functions. But crucially, our only constraint on {c} in the construction of a bipartite metric space is {c\ge \max(a, b)/2}. In the special case {a=b} we find that {f(c)\le f(a)} whenever {c\ge a/2} (and {a>0}). On one hand, {f} must be nonincreasing because all {c\ge a} are allowed. On the other hand, {f(a/2)\le f(a)} implies that the sequence {k\mapsto f(2^k)} is nondecreasing for {k\in\mathbb Z}. These conditions can be satisfied together only if {f} is constant on {(0, \infty)}.

And there you have it, a complete description of functions {f\colon [0, \infty)\to\mathbb R} that transform every distance matrix into a positive semidefinite matrix: {f\equiv c\ge 0} on {(0, \infty)}, and {f(0)\ge c}. It is immediate that such {f} indeed works for all metric spaces, not just bipartite ones.

Noncontracting Jordan curves

A simple closed curve {\Gamma} on the plane can be parameterized by a homeomorphism {f\colon \mathbb T\to \Gamma} in infinitely many ways. It is natural to look for “nice” parameterizations, like nonexpanding ones in the previous post. This time, let us look for noncontracting parameterizations: {|f(a)-f(b)|\ge |a-b|} for all {a, b\in \mathbb T}. Note that Euclidean distance is used here, not arclength. And we still want {f} to be a homeomorphism. Its noncontracting property simply means that the inverse {f^{-1}} is nonexpanding aka 1-Lipschitz.

What are some necessary conditions for the existence of a noncontracting parameterization? We can mimic the three from the earlier post Nonexpanding Jordan curves, with similar proofs:

  1. The curve must have length at least {2\pi}.
  2. The curve must have diameter at least 2.
  3. The curve must enclose some disk of radius 1. (Apply Kirszbraun’s theorem to {f^{-1}} and note that the resulting Lipschitz map G of the plane will attain 0 somewhere, by a topological degree / winding number argument. Any point where G = 0 works as the center of such a disk.)

This time, the 3rd item supersedes both 1 and 2. Yet, the condition it presents is not sufficient for the existence of a noncontracting parameterization. A counterexample is a disk with a “comb-over”:

combover

Indeed, suppose that {g=f^{-1}} is a nonexpanding map from this curve onto the unit circle. Let {1 = A < B < C} be the three points at which the curve meets the positive x-axis. Since every point of the curve is within small distance from its arc AB, it follows that {g(AB)} is a large subarc of {\mathbb T} that covers almost all of the circle. But the same argument applies to the arcs {g(BC)} and {g(CA)}, a contradiction.

No matter how large the enclosed disk is, a tight combover around it can force arbitrarily large values of the Lipschitz constant of {f^{-1}}. Sad!

What about sufficient conditions?

  1. It is sufficient for the curve to be star-shaped with respect to the origin, in addition to enclosing the unit disk. (Equivalent statement: {\Gamma} has a polar equation {r = r(\theta)} in which {r \ge 1}.) Indeed, the nearest-point projection onto a convex set is a nonexpanding map, and projecting {\Gamma} onto the unit circle in this way gives the desired parameterization.
  2. It is sufficient for the curve to have curvature bounded by 1. I am not going into this further, because second derivatives should not be needed to control a Lipschitz constant.

Recall that for nonexpanding parameterizations, the length of the curve was a single quantity that mostly answered the existence question (length at most 4 — yes; length greater than {2\pi} — no). I do not know of such a quantity for the noncontracting case.

Extreme values of a reproducing kernel for polynomials

For every nonnegative integer {n} there exists a (unique) polynomial {K_n(x, y)} of degree {n} in {x} and {y} separately with the following reproducing property:

{\displaystyle p(x) = \int_{-1}^1 K_n(x, y)p(y)\,dy}

for every polynomial {p} of degree at most {n}, and for every {x}. For example, {K_1(x, y)= (3xy+1)/2}; other examples are found in the post Polynomial delta function.

This fact gives an explicit pointwise bound on a polynomial in terms of its integral on an interval:

{\displaystyle |p(x)| \le M_n(x) \int_{-1}^1 |p(y)|\,dy}

where {M_n(x) = \sup\{|K(x, y)| \colon y\in [-1, 1]\}}. For example, {M_1(x) = (3|x|+1)/2}.

Although in principle {x} could be any real or complex number, it makes sense to restrict attention to {x\in [-1, 1]}, where integration takes place. This leads to the search for extreme values of {K} on the square {Q=[-1, 1]\times [-1, 1]}. Here is how this function looks for {n=1, 2, 3}:

K1
Degree 1
K2
Degree 2
K3
Degree 3

The symmetries {K(x, y)=K(-x, -y) = K(y, x)} are evident here.

Explicitly,

{\displaystyle K_n(x, y) = \sum_{k=0}^n \frac{2k+1}{2} P_k(x)P_k(y)}

where {P_k} is the Legendre polynomial of degree {k} and the factor {(2k+1)/2} is included to make the polynomials an orthonormal set in {L^2(-1, 1)}. Since {P_k} oscillates between {-1} and {1}, it follows that

{\displaystyle |K_n(x, y)|\le \sum_{k=0}^n \frac{2k+1}{2} = \frac{(n+1)^2}{2}}

and this bound is attained at {K(1, 1)=K(-1,-1)=(n+1)^2/2} because {P_k(1)=1} and {P_k(-1)=(-1)^k}.

Is

{\displaystyle K_n(-1, 1) =\sum_{k=0}^n (-1)^k\frac{2k+1}{2} = (-1)^n \frac{n+1}{2}}

the minimum value of {K} on the square {Q}? Certainly not for even {n}. Indeed, differentiating the sum

{\displaystyle S_n(x) = K_n(x, 1) = \sum_{k=0}^n \frac{2k+1}{2} P_k(x)}

with respect to {x} and using {P_k'(-1) =(-1)^{k-1}k(k+1)/2}, we arrive at

{\displaystyle S_n'(-1) = (-1)^{n-1} \frac{n(n^2+3n+2)}{4}}

which is negative if {n} is even, ruling out this point as a minimum.

What about odd {n}, then: is it true that {K_n \ge -(n+1)/2} on the square {Q}?

{n=1}: yes, {K_1(x, y) = (3xy+1)/2 \ge (-3+1)/2 = -1} is clear enough.

{n=3}: the inequality {K_3\ge -2} is also true… at least numerically. It can be simplified to {35(xy)^3 + 9(xy)^2 + 15xy \ge (21x+21y+3)(x^2+y^2)} but I do not see a way forward from there. At least on the boundary of {Q} it can be shown without much work:

{\displaystyle K_3(x, 1) + 2 = \frac{5}{4}(x+1)(7x^2-4x+1)}

The quadratic term has no real roots, which is easy to check.

{n=5}: similar story, the inequality {K_5\ge -3} appears to be true but I can only prove it on the boundary, using

{\displaystyle K_5(x, 1)+3 = \frac{21}{16}(x + 1)(33 x^4 - 18x^3 - 12x^2 + 2x + 3)}

The quartic term has no real roots, which is not so easy to check.

{n=7}: surprisingly, {K_7(4/5, 1) = -2229959/500000} which is about {-4.46}, disproving the conjectural bound {K_7\ge -4}. This fact is not at all obvious from the plot.

K7
Degree 7 kernel

Questions:

  • Is {K_n \ge -Cn} on the square {Q = [-1, 1]\times [-1, 1]} with some universal constant {C}?
  • Is the minimum of {K_n} on {Q} always attained on the boundary of {Q}?
  • Can {M_n(x) = \sup\{|K(x, y)| \colon y\in [-1, 1]\}} be expressed in closed form, at least for small degrees {n}?

Institutions ranked by the number of AMS Fellows, 2019 edition

Only those with the count of 3 or greater are included. Not counting the deceased. Considering CUNY as a single institution. Source of data.

    1. Rutgers The State University of New Jersey New Brunswick : 44
    2. University of California, Los Angeles : 38
    3. Massachusetts Institute of Technology : 34
    4. University of California, Berkeley : 34
    5. University of Michigan : 32
    6. Cornell University : 26
    7. Princeton University : 26
    8. University of Illinois, Urbana-Champaign : 24
    9. University of Wisconsin, Madison : 24
    10. Brown University : 23
    11. Stanford University : 22
    12. University of Texas at Austin : 22
    13. New York University, Courant Institute : 21
    14. The City University of New York : 21
    15. University of California, San Diego : 21
    16. University of Illinois at Chicago : 21
    17. University of Chicago : 20
    18. University of Washington : 20
    19. Stony Brook University : 19
    20. Texas A&M University : 19
    21. University of California, Santa Barbara : 19
    22. University of Minnesota-Twin Cities : 18
    23. University of Pennsylvania : 17
    24. Duke University : 16
    25. Indiana University, Bloomington : 16
    26. Purdue University : 16
    27. University of Maryland : 16
    28. Georgia Institute of Technology : 15
    29. Northwestern University : 15
    30. Ohio State University, Columbus : 15
    31. Pennsylvania State University : 15
    32. University of California, Irvine : 13
    33. University of Utah : 13
    34. Johns Hopkins University, Baltimore : 12
    35. University of British Columbia : 12
    36. Boston University : 11
    37. Harvard University : 11
    38. University of Notre Dame : 11
    39. University of Toronto : 11
    40. Eidgenössische Technische Hochschule Zürich (ETH Zürich) : 10
    41. University of North Carolina at Chapel Hill : 10
    42. University of Virginia : 10
    43. Vanderbilt University : 10
    44. Brandeis University : 9
    45. University of California, Davis : 9
    46. University of Georgia : 9
    47. Columbia University : 8
    48. Institute for Advanced Study : 8
    49. Rice University : 8
    50. Tel Aviv University : 8
    51. University of Oregon : 8
    52. California Institute of Technology : 7
    53. Ecole Polytechnique Fédérale de Lausanne (EPFL) : 7
    54. Michigan State University : 7
    55. Microsoft Research : 7
    56. North Carolina State University : 7
    57. University of Nebraska-Lincoln : 7
    58. University of Oxford : 7
    59. University of Southern California : 7
    60. Williams College : 7
    61. Carnegie Mellon University : 6
    62. The Hebrew University of Jerusalem : 6
    63. Université Pierre et Marie Curie (Paris VI) : 6
    64. University of Arizona : 6
    65. University of Rochester : 6
    66. Harvey Mudd College : 5
    67. Northeastern University : 5
    68. Temple University : 5
    69. Université Paris-Diderot : 5
    70. University of California, Riverside : 5
    71. University of Colorado, Boulder : 5
    72. Virginia Polytechnic Institute and State University : 5
    73. Boston College : 4
    74. Florida State University : 4
    75. Louisiana State University, Baton Rouge : 4
    76. NYU Polytechnic School of Engineering : 4
    77. Norwegian University of Science and Technology : 4
    78. Rutgers The State University of New Jersey Newark : 4
    79. University of Cambridge : 4
    80. University of Connecticut, Storrs : 4
    81. University of Missouri-Columbia : 4
    82. University of Tennessee, Knoxville : 4
    83. University of Warwick : 4
    84. Washington University : 4
    85. Weizmann Institute of Science : 4
    86. Yale University : 4
    87. Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences : 3
    88. Australian National University : 3
    89. Barnard College, Columbia University : 3
    90. Emory University : 3
    91. Imperial College : 3
    92. Københavns Universitet : 3
    93. KTH Royal Institute of Technology : 3
    94. Mathematical Institute, Oxford University : 3
    95. McGill University : 3
    96. Pomona College : 3
    97. Shanghai Jiao Tong University : 3
    98. Tsinghua University : 3
    99. Tufts University : 3
    100. Università degli Studi di Milano : 3
    101. Université Paris-Sud (Paris XI) : 3
    102. Université de Montréal : 3
    103. University of Iowa : 3
    104. University of Melbourne : 3
    105. University of Memphis : 3

 

Numerical integration visualized

scipy.integrate.quad is a popular method of numerical integration with Python. Let’s see how it chooses the points at which to evaluate the function being integrated. Begin with a simple example, the exponential function.

exp1
exp(x) on [-1, 1]
The blue dots indicate the evaluation points, their y-coordinate being the order of evaluation. So, it begins with x=0, continues with something close to -1, etc. The function, drawn in red, is not to its true vertical scale.

The placement of dots gives away the method of integration: it is the Gauss-Kronrod quadrature with 10 Gauss nodes and 21 Kronrod nodes, abbreviated (G10, K21).  The Gauss nodes are included in the Kronrod nodes, and of course the function is not evaluated there again. The evaluation process is slightly out of order in that 0 is a Kronrod node but comes first, followed by 10 Gauss nodes, followed by 10 remaining Kronrod nodes. The process ends there, as the comparison of G10 and K21 results shows the necessary precision was reached.

The square root on [0, 1] is not such a nice function, so (G10, K21) does not reach the required precision at once. The interval is bisected again and again until it does.

sqrt
sqrt(x) on [0, 4]
  Surely the cube root is even worse. Or is it?

cbrt
cbrt(x) on [-1, 1]
The nodes are symmetric about the midpoint of the interval of integration. So for any odd function on an interval symmetric about 0 we get G10 = 0 and K21 = 0, and the process stops at once. To see the effect of the cube root singularity, one has to use a non-symmetric interval such as [-1, 2].

cbrt12
cbrt(x) on [-1, 2]
That’s still fewer subdivisions than for the square root: cancellation between the left and right neighborhoods of 0 still helps. Let’s look at smooth functions next.

sinxsq
sin(x^2) on [0, 7]
Rapid oscillations forces subdivisions here. There are also other reasons for subdivision, such as the loss of analyticity:

exp_1x
exp(-1/x) on [0, 1]
Although exp(-1/x) is infinitely differentiable on this interval, the fact that it is not analytic at 0 makes it a more difficult integrand that exp(x). Finally, an analytic function with no oscillation which still needs a bunch of subintervals:

runge
1/(1+x^2) on [-5, 5]
This is the standard example used to illustrate the Runge phenomenon. Although we are not interpolating here, numerical integration is also influenced by the function having a small radius of convergence of its Taylor series. After all, G10 can be thought of as degree-9 interpolation of the given function at the Gauss nodes (the zeros of a Legendre polynomial),  with the formula returning the integral of the interpolating polynomial.

The code used to plot these things:

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
global eval_points
f = lambda x: np.exp(x)
def f_int(x):
    eval_points.append(x)
    return f(x)

eval_points = []
a, b = -1, 1
quad(f_int, a, b)
n = len(eval_points)
t = np.linspace(a, b, 1000)
y = f(t)
yy = n*(y - y.min())/(y.max() - y.min())
plt.plot(t, yy, 'r') 
plt.plot(eval_points, np.arange(0, n), '.')
plt.show()