Modes of convergence on abstract (and not so abstract) sets

In how many ways can a series of real-valued functions on an abstract set converge? Having no measure on the domain eliminates the infinitude of modes of convergence based on integral norms. I can think of five modes of convergence of {\sum f_n} where {f_n\colon X\to \mathbb R}:

  • (P) Pointwise convergence: {\sum f_n(x)} converges for each {x\in X}.
  • (U) Uniform convergence: the partial sums {S_n} of the series converge to some function {f\colon X\to\mathbb R} uniformly, i.e., {\sup |S_n-f| \to 0}.
  • (PA) Pointwise convergence of absolute values: {\sum |f_n(x)|} converges for each {x\in X}.
  • (UA) Uniform convergence of absolute values: like uniform, but for {\sum |f_n|}.
  • (M) Weierstrass M-test convergence: {\sum \sup |f_n| } converges.

Implications (all easy): (M) implies (UA), which implies both (U) and (PA). Neither (U) nor (PA) implies the other one, but each of them implies (P).

Perhaps (U) and (PA) deserve an illustration, being incomparable. Let {X = [0, 1]}. The constant functions {f_n(x)=(-1)^n/n} form a series that converges uniformly but not in the sense (PA). In the opposite direction, a series of triangles with height 1 and disjoint supports converges (PA) but not (U).

UA

Notably, the sum of the latter series is not a continuous function. This had to happen: by Dini’s theorem, if a series of continuous functions is (PA)-convergent and its sum is continuous, then it is (UA)-convergent. This “self-improving” property of (PA) convergence will comes up again in the second part of this post.

From abstract sets to normed spaces

In functional analysis, the elements of a normed space {E} can often be usefully interpreted as functions on the unit ball of the dual space {E^*}. Indeed, each {x\in E} induces {f_x(z) = z(x)} for {z\in E^*}. Applying the aforementioned modes of convergence to {\sum x_n} with {x_n\in E}, we arrive at

  • (P) ⇔ Convergence in the weak topology of E.
  • (U) ⇔ Convergence in the norm topology of E.
  • (PA) ⇔ Unconditional convergence in the weak topology of E.
  • (UA) ⇔ Unconditional convergence in the norm topology of E.
  • (M) ⇔ Absolute convergence, i.e., {\sum \|x_n\|} converges.

The equivalences (P), (U), (M) are straightforward exercises, but the unconditional convergence merits further discussion. For one thing, there are subtly different approaches to formalizing the concept. Following “Normed Linear Spaces” by M. M. Day, let’s say that a series {\sum x_n} is

  • (RC) Reordered convergent if there exists {x} such that {\sum x_{\pi(n)} =x} for every bijection {\pi:\mathbb{N}\to\mathbb{N}}
  • (UC) Unordered convergent if there exists {x} such that for every neighborhood {U} of {x} there exists a finite set {E\subset \mathbb{N}} with the property that {\sum_{n\in F}x_n\in U} for every finite set {F} containing {E}.
  • (SC) Subseries convergent if for every increasing sequence of integers {(n_k)} the series {\sum x_{n_k}} converges.
  • (BC) Bounded-multiplier convergent if for every bounded sequence of scalars {(a_n)}, the series {\sum a_n x_n} converges.

In a general locally convex space, (BC) ⇒ (SC) ⇒ (UC) ⇔ (RC). The issue with reversing the first two implications is that they involve the existence of a sum for some new series, and if the space lacks completeness, the sum might fail to exist for no good reason. All four properties are equivalent in sequentially complete spaces (those where every Cauchy sequence converges).

Let’s prove that interpretation of (PA) stated above, using the (BC) form of unconditional convergence. Suppose {\sum x_n} converges in the sense (PA), that is for every linear functional {z} the series {\sum |z(x_n)|} converges. Then it’s clear that {\sum a_n x_n} has the same property for any bounded scalar sequence {(a_n)}. That is, (PA) implies bounded-multiplier convergence in the weak topology. Conversely, suppose {\sum x_n} enjoys weak bounded-multiplier convergence and let {z\in E^*}. Multiplying each {x_n} by a suitable unimodular factor {a_n} we can get {z(a_n x_n) > 0} for all {n}. Now the weak convergence of {\sum a_n x_n} yields the pointwise convergence of {\sum |z(x_n)|}.

A theorem of Orlicz, proved in the aforementioned book by Day, says that (SC) convergence in the weak topology of a Banach space is equivalent to (SC) convergence in the norm topology. Thanks to completeness, in the norm topology of a Banach space all forms of unconditional convergence are equivalent. The upshot is that (PA) automatically upgrades to (UA) in the context of the elements of a Banach space being considered as functions on the dual unit ball.

Retraction by contraction

The Euclidean space {\mathbb R^n} has a nice property: every closed convex subset {C\subset \mathbb R^n} is the image of the whole space under a map {f} that is simultaneously:

  • a contraction, meaning {\|f(a)-f(b)\|\le \|a-b\|} for all {a,b\in\mathbb R^n};
  • a retraction, meaning {f(a)=a} for all {a\in C}.

Indeed, we can simply define {f(x)} to be the (unique) nearest point of {C} to {x}; it takes a bit of work to verify that {f} is a contraction, but not much.


In other normed spaces, this nearest point projection does not work that well. For example, take {X=\ell_1^2}, the two-dimensional space with the Manhattan metric. Consider the line {C=\{(2t,t)\colon t\in\mathbb R\}} which is a closed and convex set. The nearest point of {C} to {(2,0)} is {(2,1)}: moving straight up, since changing the first coordinate doesn’t pay off. Since {(0,0)} remains fixed, the nearest point projection increases some pairwise distances, in this case from {2} to {3}.

However, there is a contractive retraction onto this line, given by the formuls {T(x_1,x_2) =    \left(\frac23(x_1+x_2), \frac13(x_1+x_2)\right)}. Indeed, this is a linear map that fixes the line {x_1=2x_2} pointwise and has norm {1} because

\displaystyle \|T(x)\|_1 = |x_1+x_2|\le \|x\|_1


More generally, in every normed plane, every closed convex subset admits a contractive retraction. To prove this, it suffices to consider closed halfplanes, since a closed convex set is an intersection of such, and contractions form a semigroup. Furthermore, it suffices to consider lines, because having a contractive retraction onto a line, we can redefine it to be the identity map on one side of the line, and get a contractive retraction onto a halfplane.

Such a retraction onto a line, which is a linear map, is illustrated below.

Every line in a normed plane is 1-complemented
Every line in a normed plane is 1-complemented

Given the unit circle (black) and a line (blue), draw supporting lines (red) to the unit circle at the points where it meets the line. Project onto the blue line along the red ones. By construction, the image of the unit disk under this projection is contained in the unit disk. This precisely means that the map has operator norm {1}.


In spaces of dimensions {3} or higher, there are closed convex subsets without a contractive retraction. For example, consider the plane in {\ell_\infty^3} passing through the points {A = (2,2,0)}, {B= (2,0,2)}, and {C= (0,2,2)}. This plane has equation {x_1+x_2+x_3=4}. The point {D=(1,1,1)} is at distance {1} from each of A,B,C, and it does not lie on the plane. For any point E other than D, at least one of the distances AE, BE, CE exceeds 1. More precisely, the best place to project D is {(4/3, 4/3, 4/3)} which is at distance {4/3} from A, B, and C.


Two natural questions: (a) is there a nice characterization of normed spaces that admit a contractive retraction onto every closed convex subset? (b) what is the smallest constant {L=L(n)} such that every {n}-dimensional normed space admits an {L}-Lipschitz retraction onto every closed convex subset?

(The answers may well be known, but not to me at present.)

The shortest circle is a hexagon

Let {\|\cdot\|} be some norm on {{\mathbb R}^2}. The norm induces a metric, and the metric yields a notion of curve length: the supremum of sums of distances over partitions. The unit circle {C=\{x\in \mathbb R^2\colon \|x\|=1\}} is a closed curve; how small can its length be under the norm?

For the Euclidean norm, the length of unit circle is {2\pi\approx 6.28}. But it can be less than that: if {C} is a regular hexagon, its length is exactly {6}. Indeed, each of the sides of {C} is a unit vector with respect to the norm defined by {C}, being a parallel translate of a vector connecting the center to a vertex.

Hexagon as unit disk
Hexagon as unit disk

To show that {6} cannot be beaten, suppose that {C} is the unit circle for some norm. Fix a point {p\in C}. Draw the circle {\{x\colon \|x-p\|=1\}}; it will cross {C} at some point {q}. The points {p,q,q-p, -p, -q, p-q} are vertices of a hexagon inscribed in {C}. Since every side of the hexagon has length {1}, the length of {C} is at least {6}.

It takes more effort to prove that the regular hexagon and its affine images, are the only unit circles of length {6}; a proof can be found in Geometry of Spheres in Normed Spaces by Juan Jorge Schäffer.

Nonlinear Closed Graph Theorem

If a function {f\colon {\mathbb R}\rightarrow {\mathbb R}} is continuous, then its graph {G_f = \{(x,f(x))\colon x\in{\mathbb R}\}} is closed. The converse is false: a counterexample is given by any extension of {y=\tan x} to the real line.

Closed graph, not continuous
Closed graph, not continuous

The Closed Graph Theorem of functional analysis states that a linear map between Banach spaces is continuous whenever its graph is closed. Although the literal extension to nonlinear maps fails, it’s worth noting that linear maps are either continuous or discontinuous everywhere. Hence, if one could show that a nonlinear map with a closed graph has at least one point of continuity, that would be a nonlinear version of the Closed Graph Theorem.


Here is an example of a function with a closed graph and an uncountable set of discontinuities. Let {C\subset {\mathbb R}} be a closed set with empty interior, and define

\displaystyle f(x) = \begin{cases} 0,\quad & x\in C \\ \textrm{dist}\,(x,C)^{-1},\quad & x\notin C \end{cases}

For a general function, the set of discontinuities is an Fσ set. When the graph is closed, we can say more: the set of discontinuities is closed. Indeed, suppose that a function {f} is bounded in a neighborhood of {a} but is not continuous at {a}. Then there are two sequences {x_n\rightarrow a} and {y_n\rightarrow a} such that both sequences {f(x_n)} and {f(y_n)} converge but have different limits. Since at least one of these limits must be different from {f(a)}, the graph of {f} is not closed. Conclusion: a function with a closed graph is continuous at {a} if and only if it is bounded in a neighborhood of {a}. In particular, the set of discontinuities is closed.


Furthermore, the set of discontinuities has empty interior. Indeed, suppose that {f} is discontinuous at every point of a nontrivial closed interval {[a,b]}. Let {A_n = G_f \cap ([a,b]\times [-n,n])}; this is a closed bounded set, hence compact. Its projection onto the {x}-axis is also compact, and this projection is exactly the set {B_n=\{x\in [a,b] : |f(x)|\le n\}}. Thus, {B_n} is closed. The set {B_n} has empty interior, since otherwise {f} would be continuous at its interior points. Finally, {\bigcup B_n=[a,b]}, contradicting the Baire Category theorem.

Summary: for closed-graph functions on {\mathbb R}, the sets of discontinuity are precisely the closed sets with empty interior. In particular, every such function has a point of continuity. The proof works just as well for maps from {\mathbb R^n} to any metric space.


However, the above result does not extend to the setting of Banach spaces. Here is an example of a map {F\colon X\rightarrow X} on a Banach space {X} such that {\|F(x)-F(y)\|=1} whenever {x\ne y}; this property implies that the graph is closed, despite {F} being discontinuous everywhere.

Let {X} the space of all bounded functions {\phi \colon (0,1]\rightarrow\mathbb R} with the supremum norm. Let {(q_n)_{n=1}^\infty} be an enumeration of all rational numbers. Define the function {\psi =F(\phi )} separately on each subinterval {(2^{-n},2^{1-n}]}, {n=1,2,\dots} as

{\displaystyle \psi(t) = \begin{cases} 1 \quad &\text{if } \phi (2^nt-1) > q_n \\ 0 \quad &\text{if } \phi (2^n t-1)\le q_n\end{cases}}

For any two distinct elements {\phi_1,\phi_2} of {X} there is a point {s\in (0,1]} and a number {n\in\mathbb N} such that {q_n} is strictly between {\phi_1(s)} and {\phi_2(s)}. According to the definition of {F} this implies that the functions {F(\phi_1)} and {F(\phi_2)} take on different values at the point {t=2^{-n}(s+1)}. Thus the norm of their difference is {1}.


So much for Nonlinear Closed Graph Theorem. However, the space {X} in the above example is nonseparable. Is there an nowhere continuous map between separable Banach spaces such that its graph is closed?

Binary intersection property, and not fixing what isn’t broken

A metric space has the binary intersection property if every collection of closed balls has nonempty intersection unless there is a trivial obstruction: the distance between centers of two balls exceeds the sum of their radii. In other words, for every family of points {x_\alpha\in X} and numbers {r_\alpha>0} such that {d(x_\alpha,x_\beta)\le r_\alpha+r_\beta} for all {\alpha,\beta} there exists {x\in X} such that {d(x_\alpha,x)\le r_\alpha} for all {\alpha}.

For example, {\mathbb R} has this property: {x=\inf_\alpha (x_\alpha+r_\alpha)} works. But {\mathbb R^2} does not:

Failure of the binary intersection property
Failure of the binary intersection property

The space of bounded sequences {\ell^\infty} has the binary intersection property, and so does the space {B[0,1]} of all bounded functions {f:[0,1]\rightarrow\mathbb R} with the supremum norm. Indeed, the construction for {\mathbb R} generalizes: given a family of bounded functions {f_\alpha} and numbers {r_\alpha>0} as in the definition, let {f(x)=\inf_\alpha (f_\alpha(x)+r_\alpha)}.


The better known space of continuous functions {C[0,1]} has the finite version of binary intersection property, because for a finite family, the construction {\inf_\alpha (f_\alpha(x)+r_\alpha)} produces a continuous function. However, the property fails without finiteness, as the following example shows.

Example. Let {f_n\in C[0,1]} be a function such that {f_n(x)=-1} for {x\le \frac12-\frac1n}, {f_n(x)=1} for {x\ge \frac12+\frac1n}, and {f_n} is linear in between.

Since {\|f_n-f_m\| \le 1} for all {n,m}, we can choose {r_n=1/2} for all {n}. But if a function {f} is such that {\|f-f_n\|\le \frac12} for all {n}, then {f(x) \le -\frac12} for {x<\frac12} and {f(x) \ge \frac12} for {x>\frac12}. There is no continuous function that does that.

More precisely, for every {f\in C[0,1]} we have {\liminf_{n\to\infty} \|f-f_n\|\ge 1 } because {f(x)\approx f(1/2)} in a small neighborhood of {1/2}, while {f_n} change from {1} to {-1} in the same neighborhood when {n} is large.


Given a discontinuous function, one can approximate it with a continuous function in some way: typically, using a mollifier. But such approximations tend to change the function even if it was continuous to begin with. Let’s try to not fix what isn’t broken: look for a retraction of {B[0,1]} onto {C[0,1]}, that is a map {\Phi:B[0,1]\rightarrow C[0,1]} such that {\Phi(f)=f} for all {f\in C[0,1]}.

The failure of binary intersection property, as demonstrated by the sequence {(f_n)} above, implies that {\Phi} cannot be a contraction. Indeed, let {f(x)= \frac12 \,\mathrm{sign}\,(x-1/2)}. This is a discontinuous function such that {\|f-f_n\|\le 1/2} for all {n}. Since {\liminf_{n\to\infty} \|\Phi(f)-f_n\|\ge 1}, it follows that {\Phi} cannot be {L}-Lipschitz with a constant {L<2}.


It was known for a while that there is a retraction from {B[0,1]} onto {C[0,1]} with the Lipschitz constant at most {20}: see Geometric Nonlinear Functional Analysis by Benyamini and Lindenstrauss. In a paper published in 2007, Nigel Kalton proved the existence of a {2}-Lipschitz retraction.

Graphical embedding

This post continues the theme of operating with functions using their graphs. Given an integrable function {f} on the interval {[0,1]}, consider the region {R_f} bounded by the graph {y=f(x)}, the axis {y=0}, and the vertical lines {x=0}, {x=1}.

Total area under and over the graph is the L1 norm
Total area under and over the graph is the L1 norm

The area of {R_f} is exactly {\int_0^1 |f(x)|\,dx}, the {L^1} norm of {f}. On the other hand, the area of a set is the integral of its characteristic function,

\displaystyle    \chi_f = \begin{cases}1, \quad x\in R_f, \\ 0,\quad x\notin R_f \end{cases}

So, the correspondence {f\mapsto \chi_f } is a map from the space of integrable functions on {[0,1]}, denoted {L^1([0,1])}, to the space of integrable functions on the plane, denoted {L^1(\mathbb R^2)}. The above shows that this correspondence is norm-preserving. It also preserves the metric, because integration of {|\chi_f-\chi_g|} gives the area of the symmetric difference {R_f\triangle R_g}, which in turn is equal to {\int_0^1 |f-g| }. In symbols:

\displaystyle    \|\chi_f-\chi_g\|_{L^1} = \int |\chi_f-\chi_g| = \int |f-g| = \|f-g\|_{L^1}

Distance between two functions in terms of their graphs
Distance between two functions in terms of their graphs

The map {f\mapsto \chi_f} is nonlinear: for example {2f} is not mapped to {2 \chi_f} (the function that is equal to 2 on the same region) but rather to a function that is equal to 1 on a larger region.

So far, this nonlinear embedding did not really offer anything new: from one {L^1} space we got into another. It is more interesting (and more difficult) to embed things into a Hilbert space such as {L^2(\mathbb R^2)}. But for the functions that take only the values {0,1,-1}, the {L^2} norm is exactly the square root of the {L^1} norm. Therefore,

\displaystyle    \|\chi_f-\chi_g\|_{L^2} = \sqrt{\int |\chi_f-\chi_g|^2} =    \sqrt{\int |\chi_f-\chi_g|} = \sqrt{\|f-g\|_{L^1}}

In other words, raising the {L^1} metric to power {1/2} creates a metric space that is isometric to a subset of a Hilbert space. The exponent {1/2} is sharp: there is no such embedding for the metric {d(f,g)=\|f-g\|_{L^1}^{\alpha} } with {\alpha>1/2}. The reason is that {L^1}, having the Manhattan metric, contains geodesic squares: 4-cycles where the distances between adjacent vertices are 1 and the diagonal distances are equal to 2. Having such long diagonals is inconsistent with the parallelogram law in Hilbert spaces. Taking the square root reduces the diagonals to {\sqrt{2}}, which is the length they would have in a Hilbert space.

This embedding, and much more, can be found in the ICM 2010 talk by Assaf Naor.

Maximum of three numbers: it’s harder than it sounds

This simple identity hold for any two real numbers {x,y}:

\displaystyle   \max(|x|,|y|) = \frac12\,(|x+y|+|x-y|)   \ \ \ \ \  \ \ \ \ \ (1)

Indeed, if {|x|} realizes the maximum, then both {x+y} and {x-y} have the same sign as {x}. After opening the absolute value signs, we get {y} to cancel out.

So, (1) represents {\max(|x|,|y|)}, also known as the {\ell_\infty} norm, as the sum of absolute values of linear functions. Let’s try the same with {\max(|x|,|y|,|z|)}. Since the right hand side of (1) is just the average of {|\pm x \pm y|} over all possible choices of {\pm } signs, the natural thing to do is to average {|\pm x \pm y \pm z|} over all eight choices. The sign in front of {x} can be taken to be {+}, which simplifies the average to

\displaystyle    \frac14\,(|x+y+z|+|x+y-z|+|x-y+z|+|x-y-z|)    \ \ \ \ \  \ \ \ \ \ (2)

Does this formula give {\max(|x|,|y|,|z|)}? Instead of trying random numbers, let’s just plot the unit ball for the norm given by (2). If the identity works, it will be a cube. I used Maple:

with(plots): f:=(abs(x+y+z)+abs(x+y-z)+abs(x-y-z)+abs(x-y+z))/4:
implicitplot3d(f=1,x=-1/4..1/4,y=-1/4..1/4,z=-1/4..1/4,grid=[25,25,25]);
My precious!
My precious!

Close, but no cube. Luckily, this is my favorite Archimedean solid, the cuboctahedron.

Although all terms of (2) look exactly the same, the resulting shape has both triangular and square faces. Where does the difference of shapes come from?

More importantly, is (2) really the best we can do? Is there some other sum of moduli of linear functions that will produce {\max(|x|,|y|,|z|)}?

— No.

Even if negative coefficients are allowed?

— Even then. (But you can come arbitrarily close.)

What if we allow integrals with respect to an arbitrary (signed) measure, as in

\displaystyle    \iiint |\alpha x+\beta y+\gamma z|\,d \mu(\alpha, \beta, \gamma)    \ \ \ \ \  \ \ \ \ \ (3)

— Still no. But if {\mu} is allowed to be a distribution of higher order (an object more singular than a measure), then a representation exists (W. Weil, 1976). Yes, one needs the theory of distributions to write the maximum of three numbers as a combination of linear functions.

I’ll only prove that there is no identity of the form

\displaystyle  \max(|x|,|y|,|z|) = \sum_{i=1}^N |\alpha_i x+\beta_i y+ \gamma_i z|

Indeed, such an identity amounts to having an isometric embedding {T\colon \ell_\infty^3 \rightarrow \ell_1^N}. The adjoint operator {T^* \colon \ell_\infty^N \rightarrow \ell_1^3} is a submetry meaning that it maps the unit ball of {\ell_\infty^N } onto the unit ball {\ell_1^3}. The unit ball of {\ell_\infty^N } is just a cube; all of its faces are centrally symmetric, and this symmetry is preserved by linear maps. But {\ell_1^3} is an octahedron, with triangular faces. A contradiction. {\ \Box}

An aside: what if instead of averaging {|\pm x \pm y|} over all {\pm } choices (i.e., unimodular real coefficients) we take the average over all unimodular complex coefficients? This amounts to {\|(x,y)\| = \frac{1}{2\pi} \int_0^{2\pi} |x+e^{it}y|\,dt}. I expected something nice from this norm, but

Complex averaging in real space
Complex averaging in real space

it’s a strange shape whose equation involves the complete elliptic integral of second kind. Yuck.