Visualizing the Hardy norms on polynomials

The Hardy norm of a polynomial {f} is

{\displaystyle \|f\|_p = \left( \int_0^{1} |f(e^{2\pi it})|^p \,dt \right)^{1/p} }

with the usual convention

{\displaystyle \|f\|_\infty = \sup_{[0, 1]} |f(e^{2\pi it})| }

(This applies more generally to holomorphic functions, but polynomials suffice for now.) The quantity {\|f\|_p} makes sense for {0<p\le \infty} but is not actually a norm when {p<1}.

When restricted to polynomials of degree at most {d}, the Hardy norm provides a norm on {\mathbb C^{d+1}}, which we can try to visualize. In the special case {p=2} the Hardy norm agrees with the Euclidean norm by Parseval’s theorem. But what is it for other values of {p}?

Since the space {\mathbb C^{d+1}} has {2d+2} real dimensions, it is hard to visualize unless {d} is very small. When {d=0}, the norm we get on {\mathbb C^1} is just the Euclidean norm regardless of {p}. The first nontrivial case is {d=1}. That is, we consider the norm

{\displaystyle \|(a, b)\|_p = \left( \int_0^{1} |a + b e^{2\pi it}|^p \,dt \right)^{1/p} }

Since the integral on the right does not depend on the arguments of the complex numbers {a, b}, we lose nothing by restricting attention to {(a, b)\in \mathbb R^2}. (Note that this would not be the case for degrees {d\ge 2}.)

One easy case is {\|(a, b)\|_\infty = \sup_{[0, 1]} |a+be^{2\pi i t}| = |a|+|b|} since we can find {t} for which the triangle inequality becomes an equality. For the values {p\ne 2,\infty} numerics will have to do: we can use them to plot the unit ball for each of these norms. Here it is for {p=4}:

p = 4

And {p=10}:

p = 10

And {p=100} which is pretty close to the rotated square that we would get for {p=\infty}.

p = 100
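Pictures like these take only a few lines of numerics. Here is a minimal Python sketch, not necessarily the code behind the plots above: `hardy_norm` replaces the integral by an equally spaced Riemann sum, and `unit_ball_boundary` rescales unit directions so that they land on the unit sphere. The function names and the sample counts are arbitrary choices made for illustration.

```python
import cmath
import math

def hardy_norm(a, b, p, n=2048):
    """Approximate the Hardy p-norm of the polynomial a + b*z on the unit circle."""
    values = [abs(a + b * cmath.exp(2j * math.pi * k / n)) for k in range(n)]
    if math.isinf(p):
        return max(values)
    return (sum(v ** p for v in values) / n) ** (1 / p)

def unit_ball_boundary(p, directions=180):
    """Points (a, b) in R^2 of Hardy p-norm 1, one for each sampled direction."""
    points = []
    for j in range(directions):
        angle = 2 * math.pi * j / directions
        a, b = math.cos(angle), math.sin(angle)
        r = hardy_norm(a, b, p)
        points.append((a / r, b / r))
    return points

print(hardy_norm(3, 4, 2))         # approximately 5, the Euclidean norm of (3, 4)
print(hardy_norm(3, 4, math.inf))  # approximately 7 = |3| + |4|
```

Feeding the output of `unit_ball_boundary(p)` to any plotting tool draws the corresponding curve.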

In the opposite direction, {p=1} brings a surprise: the Hardy {1}-norm is strictly convex, unlike the usual {1}-norm.

p = 1

This curve already appeared on this blog in “Maximum of three numbers: it’s harder than it sounds”, where I wrote

it’s a strange shape whose equation involves the complete elliptic integral of second kind. Yuck.

Well, that was 6 years ago; with time I grew to appreciate this shape and the elliptic integrals.
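
For the record, the elliptic integral enters as follows (a short derivation, not a quote from the earlier post). The norm depends only on {|a|} and {|b|}, so take {a, b\ge 0}, not both zero, and put {m = 4ab/(a+b)^2}. Since {|a+be^{i\theta}|^2 = (a+b)^2 - 4ab\sin^2(\theta/2)}, the substitution {\varphi=\pi t} gives

{\displaystyle \|(a, b)\|_1 = \frac{2}{\pi}\,(a+b)\,E(m), \qquad E(m) = \int_0^{\pi/2} \sqrt{1-m\sin^2\varphi}\,d\varphi }

where {E} is the complete elliptic integral of the second kind, written here as a function of the parameter {m} (some references use the modulus {k=\sqrt{m}} instead). Two sanity checks: {b=0} gives {E(0)=\pi/2} and norm {a}; {a=b=1} gives {E(1)=1} and norm {4/\pi}.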

Meanwhile, surprises continue: when {p=1/2 < 1}, the Hardy norm is an actual norm, with a convex unit ball.

p = 0.5

Same holds for {p=1/5}:

p = 0.2

And even for {p=1/100}:

p = 0.01

And here are all these norms together: from the outside in, the values of {p} are 0.01, 0.2, 0.5, 1, 2, 4, 10, 100. Larger {p} makes a larger Hardy norm, hence a smaller unit ball: the opposite behavior to the usual {p}-norms {(|a|^p + |b|^p)^{1/p}}.

hardy_all
All together now
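
Both observations, the convexity for {p<1} and the nesting of the unit balls, can be sanity-checked numerically. The sketch below is only evidence on a coarse sample, not a proof; `max_midpoint_norm` and the choice of 12 directions are assumptions made for illustration. It computes the largest norm of a midpoint of two points of the unit sphere, which cannot exceed 1 when the unit ball is convex.

```python
import cmath
import math

def hardy_norm(a, b, p, n=2048):
    """Riemann-sum approximation of the Hardy p-norm of a + b*z, for finite p > 0."""
    total = sum(abs(a + b * cmath.exp(2j * math.pi * k / n)) ** p for k in range(n))
    return (total / n) ** (1 / p)

def max_midpoint_norm(p, directions=12):
    """Largest Hardy p-norm of a midpoint of two sampled points of the unit sphere."""
    sphere = []
    for j in range(directions):
        angle = 2 * math.pi * j / directions
        a, b = math.cos(angle), math.sin(angle)
        r = hardy_norm(a, b, p)
        sphere.append((a / r, b / r))
    worst = 0.0
    for a1, b1 in sphere:
        for a2, b2 in sphere:
            worst = max(worst, hardy_norm((a1 + a2) / 2, (b1 + b2) / 2, p))
    return worst

exponents = [0.01, 0.2, 0.5, 1, 2, 4, 10, 100]
norms = [hardy_norm(1.0, 0.5, p) for p in exponents]
print(norms == sorted(norms))   # checks that the norm of a fixed vector grows with p
print(max_midpoint_norm(0.5))   # should not exceed 1, up to rounding
print(max_midpoint_norm(0.2))
```

For contrast, the usual quasinorm {(|a|^p + |b|^p)^{1/p}} with {p=1/2} fails this test badly: the midpoint of {(1,0)} and {(0,1)} has quasinorm {2}.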

I think this is a pretty neat picture: although the shapes look vaguely like {\ell^p} balls, the fact that the range of the exponent is {[0, \infty] } instead of {[1, \infty]} means there is something different in the behavior of these norms. For one thing, the Hardy norm with {p=1} turns out to be almost isometrically dual to the Hardy norm with {p=4}: this became the subject of a paper and then another one.

Orthogonality in normed spaces

For a vector {x} in a normed space {X}, define the orthogonal complement {x^\perp} to be the set of all vectors {y} such that {\|x+ty\|\ge \|x\|} for all scalars {t}. In an inner product space (real or complex), this agrees with the usual definition of orthogonality because {\|x+ty\|^2 - \|x\|^2 = 2\,\mathrm{Re}\,\langle x, ty\rangle + o(t)} as {t\to 0}, and the right-hand side can be nonnegative only if {\langle x, y\rangle=0}.

Let’s see what properties of orthogonal complement survive in a general normed space. For one thing, {x^\perp=X} if and only if {x=0}. Another trivial property is that {0\in x^\perp} for all {x}. More importantly, {x^\perp} is a closed set that contains some nonzero vectors (as long as {\dim X\ge 2}).

  • Closed because the complement is open: if {\|x+ty\| < \|x\|} for some {t}, the same will be true for vectors close to {y}.
  • Contains a nonzero vector because the Hahn-Banach theorem provides a norming functional for {x}, i.e., a unit-norm linear functional {f\in X^*} such that {f(x)=\|x\|}. Any {y\in \ker f} is orthogonal to {x}, because {\|x+ty\|\ge f(x+ty) = f(x) = \|x\|}.

In general, {x^\perp} is not a linear subspace; it need not even have empty interior. For example, consider the orthogonal complement of the first basis vector in the plane with {\ell_1} (taxicab) metric: it is {\{(x, y)\colon |y|\ge |x|\}}.

The orthogonal complement of a horizontal vector in the taxicab plane

This example also shows that orthogonality is not symmetric in general normed spaces: {(1,1)\in (1,0)^\perp} but {(1,0)\notin (1,1)^\perp}. This is why I avoid using notation {y \perp x} here.
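
The taxicab example is easy to check by machine. In the sketch below (the names `taxicab` and `in_perp` are made up for illustration), the function {t\mapsto \|x+ty\|_1} is convex and piecewise linear, so for {y\ne 0} its minimum is attained at one of the breakpoints {t=-x_i/y_i}; it suffices to test the defining inequality there.

```python
def taxicab(v):
    """The l_1 norm of a vector given as a tuple of real numbers."""
    return sum(abs(c) for c in v)

def in_perp(x, y, tol=1e-12):
    """Check whether y lies in the orthogonal complement of x for the l_1 norm.

    Only the breakpoints t = -x_i / y_i need to be tested, because the convex
    piecewise linear function t -> ||x + t*y|| attains its minimum at one of them.
    """
    breakpoints = [-xi / yi for xi, yi in zip(x, y) if yi != 0]
    return all(
        taxicab(tuple(xi + t * yi for xi, yi in zip(x, y))) >= taxicab(x) - tol
        for t in breakpoints
    )

print(in_perp((1, 0), (1, 1)))  # True
print(in_perp((1, 1), (1, 0)))  # False
```

The same function works in any dimension, but only for the {\ell_1} norm; other norms would need a different way to minimize over {t}.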

In fact, {x^\perp} is the union of kernels of all norming functionals of {x}, so it is only a linear subspace when the norming functional is unique. Containment in one direction was already proved. Conversely, suppose {y\in x^\perp} and define a linear functional {f} on the span of {x,y} so that {f(ax+by) = a\|x\|}. By construction, {f} has norm 1. Its Hahn-Banach extension is a norming functional for {x} that vanishes on {y}.

Consider {X=L^p[0,1]} as an example. A function {f} satisfies {1\in f^\perp} precisely when its {p}th moment is minimal among all translates {f+c}. This means, by definition, that its “{L^p}-estimator” is zero. In the special cases {p=1,2,\infty} the {L^p} estimator is known as the median, mean, and midrange, respectively. Increasing {p} gives more influence to outliers, so {1\le p\le 2} is the more useful range for it.
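
To see the three classical estimators come out of one definition, think of a finite sample as a step function on {[0,1]} that takes each value on an interval of equal length. The following sketch (the name `lp_estimator` and the data are made up for illustration) minimizes the {p}th moment over shifts by ternary search, which is justified for {p\ge 1} because the objective is then convex in the shift.

```python
def lp_estimator(sample, p, iterations=200):
    """Approximate the constant c minimizing the l^p distance from the sample to c.

    Assumes p >= 1, so that the objective is convex in c and ternary search
    applies. Pass p = float("inf") for the sup norm.
    """
    def objective(c):
        deviations = [abs(x - c) for x in sample]
        if p == float("inf"):
            return max(deviations)
        return sum(d ** p for d in deviations)

    lo, hi = min(sample), max(sample)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    return (lo + hi) / 2

data = [1, 2, 4, 8, 100]
for p in (1, 2, float("inf")):
    print(p, lp_estimator(data, p))
```

For this sample the three answers are close to the median 4, the mean 23, and the midrange 50.5, which shows how strongly the outlier 100 pulls the estimator as {p} grows.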

Quasi-projections and isometries

A typical scenario: given a subset {E} of a metric space {X} and a point {x\in X}, we look for a point {y\in E} that is nearest to {x}: that is, {d(x, y) = \mathrm{dist}\,(x, E)}. Such a point is generally not unique: for example, if {E} is the graph of the cosine function and {x = (\pi, \pi/2)}, then both {(\pi/2, 0)} and {(3\pi/2, 0)} qualify as nearest to {x}. This makes the nearest-point projection onto {E} discontinuous: moving {x} slightly to the left or to the right will make its projection onto {E} jump from one point to another. Not good.

Discontinuous nearest-point projection
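
The two nearest points in the cosine example can be confirmed by brute force. This is a rough grid search, assuming it is enough to scan {[0, 2\pi]}: points of the graph with abscissa outside this interval are at distance more than {\pi} from {(\pi, \pi/2)}, while the minimum is about {2.22}.

```python
import math

def squared_distance(s, point=(math.pi, math.pi / 2)):
    """Squared Euclidean distance from `point` to the point (s, cos s) of the graph."""
    return (s - point[0]) ** 2 + (math.cos(s) - point[1]) ** 2

# Scan the graph over [0, 2*pi] on a uniform grid that contains pi/2 and 3*pi/2.
n = 20000
grid = [2 * math.pi * k / n for k in range(n + 1)]
best = min(squared_distance(s) for s in grid)
minimizers = [s for s in grid if squared_distance(s) < best + 1e-9]

print(math.sqrt(best))                   # close to pi / sqrt(2)
print(min(minimizers), max(minimizers))  # close to pi/2 and 3*pi/2
```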

Even when the nearest point projection is well-defined and continuous, it may not be the kind of projection we want. For example, in a finite-dimensional normed space with strictly convex norm we have a continuous nearest-point projection onto any linear subspace, but it is in general a nonlinear map.

Let’s say that {P\colon X\to E} is a quasi-projection if {d(x, P(x)) \le C \mathrm{dist}\,(x, E)} for some constant {C} independent of {x}. Such maps are much easier to construct: indeed, every Lipschitz continuous map {P\colon X\to E} such that {P(x)=x} for {x \in E} is a quasi-projection. For example, one quasi-projection onto the graph of cosine is the map {(x, y)\mapsto (x, \cos x)} shown below.

Continuous quasi-projection
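
For this particular map the constant can be made explicit. Since {|\cos x_1-\cos x_2|\le |x_1-x_2|}, the map is Lipschitz with constant at most {\sqrt{2}} in the Euclidean metric, and the triangle inequality then gives {C\le 1+\sqrt{2}}. The sketch below compares the two distances at a few sample points; the helper names and the grid-based distance are illustrative assumptions, not a careful computation.

```python
import math

def quasi_projection(point):
    """Vertical quasi-projection (x, y) -> (x, cos x) onto the graph of cosine."""
    x, _ = point
    return (x, math.cos(x))

def distance_to_graph(point, n=8001):
    """Grid approximation (from above) of the distance from `point` to the graph.

    Only abscissas within |y| + 1 of x are scanned: the nearest point cannot be
    farther away horizontally than the vertical distance |y - cos x| <= |y| + 1.
    """
    x, y = point
    half_width = abs(y) + 1
    best = float("inf")
    for k in range(n):
        s = x - half_width + 2 * half_width * k / (n - 1)
        best = min(best, math.hypot(s - x, math.cos(s) - y))
    return best

def distortion(point):
    """Ratio d(x, P(x)) / dist(x, E) for a point that is not on the graph."""
    image = quasi_projection(point)
    moved = math.hypot(image[0] - point[0], image[1] - point[1])
    return moved / distance_to_graph(point)

samples = [(math.pi, math.pi / 2), (0.3, 2.0), (-1.0, -0.5), (2.0, 3.0)]
for pt in samples:
    print(pt, distortion(pt))
```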

If {X} is a Banach space and {E} is its closed subspace, then any bounded idempotent linear operator {P} with range {E} is a quasi-projection onto {E} (with {C=\|I-P\|}, since {x-Px = (I-P)(x-e)} for every {e\in E}). Not every closed subspace admits such an operator but many do (these are complemented subspaces; they include all subspaces of finite dimension and all closed subspaces of finite codimension). By replacing “nearest” with “close enough” we gain linearity. And even some subspaces that are not linearly complemented admit a continuous quasi-projection.

Here is a neat fact: if {M} and {N} are subspaces of a Euclidean space and {\dim M = \dim N}, then there exists an isometric quasi-projection of {M} onto {N} with constant {C=\sqrt{2}}. This constant is best possible: for example, an isometry from the {y}-axis onto the {x}-axis has to send {(0, 1)} to one of {(\pm 1, 0)}, thus moving it by distance {\sqrt{2}}.

An isometry must incur sqrt(2) distance cost

Proof. Let {k} be the common dimension of {M} and {N}. Fix some orthonormal bases in {M} and {N}. In these bases, the orthogonal (nearest-point) projection from {M} to {N} is represented by some {k\times k} matrix {P} of norm at most {1}. We need an orthogonal {k\times k} matrix {Q} such that the map {M\to N} that it defines is a {\sqrt{2}}-quasi-projection. What exactly does this condition mean for {Q}?

Let’s say {x\in M}, {y\in N} is the orthogonal projection of {x} onto {N}, and {z\in N} is where we want to send {x} by an isometry. Our goal is {\|x-z\|\le \sqrt{2}\|x-y\|}, in addition to {\|z\|=\|x\|}. Squaring and expanding inner products yields {2\langle x, y\rangle - \langle x, z \rangle \le \|y\|^2}. Since both {y} and {z} are in {N}, we can replace {x} on the left by its projection {y}. So, the goal simplifies to {\|y\|^2 \le \langle y, z\rangle}. Geometrically, this means placing {z} so that its projection onto the line through {y} lies on the continuation of this line beyond {y}.
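
Spelled out, the expansion uses {\|z\|=\|x\|} on the left:

{\displaystyle \|x-z\|^2 = 2\|x\|^2 - 2\langle x, z\rangle, \qquad 2\|x-y\|^2 = 2\|x\|^2 - 4\langle x, y\rangle + 2\|y\|^2 }

Comparing the two sides and dividing by {2} gives {2\langle x, y\rangle - \langle x, z \rangle \le \|y\|^2}. The replacement of {x} by {y} is justified by {\langle x, y\rangle = \|y\|^2} and {\langle x, z\rangle = \langle y, z\rangle}, both of which hold because {x-y} is orthogonal to {N}.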

So far so good, but the disappearance of {x} from the inequality is disturbing. Let’s bring it back by observing that {\|y\|^2 \le \langle y, z\rangle} is equivalent to {4(\|y\|^2 - \langle y, z\rangle) + \|z\|^2 \le \|x\|^2}, which is simply {\|2y-z\| \le \|x\|}. So that’s what we want to do: map {x} so that the distance from its image to {2y} does not exceed {\|x\|}. In terms of matrices and their operator norm, this means {\|2P-Q\|\le 1}.

It remains to show that every square matrix of norm at most {2} (such as {2P} here) is within distance {1} of some orthogonal matrix. Let {2P = U\Sigma V^T} be the singular value decomposition, with {U, V} orthogonal and {\Sigma} a diagonal matrix with the singular values of {2P} on the diagonal. Since the singular values of {2P} are between {0} and {2}, it follows that {\|\Sigma-I\|\le 1}. Hence {\|2P - UV^T\|\le 1}, and taking {Q=UV^T} concludes the proof.

(The proof is based on a Stack Exchange post by user hypernova.)
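
Here is a small numerical illustration of the last step for two planes in {\mathbb R^3}. To stay within the standard library, the sketch swaps the SVD for the Newton iteration {X\mapsto (X+X^{-T})/2}, which converges to the same orthogonal factor {UV^T} when the matrix is invertible; this is a substitution made for the example, not the method of the proof, and it breaks down when {P} is singular. The planes, the angles and all helper names are made up.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def op_norm(A):
    """Operator norm of a real 2x2 matrix: square root of the top eigenvalue of A^T A."""
    G = matmul(transpose(A), A)
    tr = G[0][0] + G[1][1]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2)

def orthogonal_polar_factor(A, iterations=50):
    """Newton iteration X <- (X + X^{-T}) / 2; assumes A is invertible."""
    X = [row[:] for row in A]
    for _ in range(iterations):
        d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
        inv_t = [[X[1][1] / d, -X[1][0] / d], [-X[0][1] / d, X[0][0] / d]]
        X = [[(X[i][j] + inv_t[i][j]) / 2 for j in range(2)] for i in range(2)]
    return X

# M is the xy-plane with a rotated orthonormal basis; N is the xy-plane tilted by phi.
phi, beta = math.pi / 3, 0.7
N_basis = [(1.0, 0.0, 0.0), (0.0, math.cos(phi), math.sin(phi))]
M_basis = [(math.cos(beta), math.sin(beta), 0.0), (-math.sin(beta), math.cos(beta), 0.0)]

# Matrix of the orthogonal projection M -> N in these bases.
P = [[sum(n[k] * m[k] for k in range(3)) for m in M_basis] for n in N_basis]
A = [[2 * P[i][j] for j in range(2)] for i in range(2)]
Q = orthogonal_polar_factor(A)
diff = [[A[i][j] - Q[i][j] for j in range(2)] for i in range(2)]

print(op_norm(diff))  # at most 1, up to rounding
```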

The shortest circle is a hexagon

Let {\|\cdot\|} be some norm on {{\mathbb R}^2}. The norm induces a metric, and the metric yields a notion of curve length: the supremum of sums of distances over partitions. The unit circle {C=\{x\in \mathbb R^2\colon \|x\|=1\}} is a closed curve; how small can its length be under the norm?

For the Euclidean norm, the length of the unit circle is {2\pi\approx 6.28}. But it can be less than that: if {C} is a regular hexagon, its length is exactly {6}. Indeed, each of the sides of {C} is a unit vector with respect to the norm defined by {C}, being a parallel translate of a vector connecting the center to a vertex.

Hexagon as unit disk
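
The count is easy to confirm numerically. In the sketch below (helper names are illustrative) the hexagon has vertices at angles {0^\circ, 60^\circ, \dots, 300^\circ} on the Euclidean unit circle, and its norm is computed from the three directions normal to its sides.

```python
import math

# Unit normals to the three pairs of opposite sides of the regular hexagon
# with vertices at angles 0, 60, ..., 300 degrees and circumradius 1.
NORMALS = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in (30, 90, 150)]
INRADIUS = math.sqrt(3) / 2

def hex_norm(v):
    """Norm on R^2 whose unit circle is the regular hexagon described above."""
    return max(abs(v[0] * nx + v[1] * ny) for nx, ny in NORMALS) / INRADIUS

def euclidean_norm(v):
    return math.hypot(v[0], v[1])

def polygon_length(vertices, norm):
    """Length of a closed polygon, measured with the given norm."""
    total = 0.0
    for k in range(len(vertices)):
        p, q = vertices[k], vertices[(k + 1) % len(vertices)]
        total += norm((q[0] - p[0], q[1] - p[1]))
    return total

def regular_polygon(sides):
    return [(math.cos(2 * math.pi * k / sides), math.sin(2 * math.pi * k / sides))
            for k in range(sides)]

print(polygon_length(regular_polygon(6), hex_norm))           # 6, up to rounding
print(polygon_length(regular_polygon(1000), euclidean_norm))  # slightly below 2*pi
```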

To show that {6} cannot be beaten, suppose that {C} is the unit circle for some norm. Fix a point {p\in C}. Draw the circle {\{x\colon \|x-p\|=1\}}; it will cross {C} at some point {q}. The points {p,q,q-p, -p, -q, p-q} are vertices of a hexagon inscribed in {C}. Since every side of the hexagon has length {1}, the length of {C} is at least {6}.

It takes more effort to prove that the regular hexagon and its affine images are the only unit circles of length {6}; a proof can be found in Geometry of Spheres in Normed Spaces by Juan Jorge Schäffer.

Diameter vs radius, part II

A set {A} in a metric space {X} has diameter {\mathrm{diam}\, A=\sup_{a,b\in A} |a-b|} and radius {\mathrm{rad}\, A = \inf_{x\in X}\sup_{a\in A} |a-x|}. It’s easier to find the radius by expressing it in a different form: the smallest (or infimal) value of {r} such that {\bigcap_{a\in A} \overline{B}(a,r)\ne\varnothing}, where {\overline{B}(a,r)} is the closed ball of radius {r} with center {a}.

Suppose {X} is a normed linear space and {f\colon A\to X} is a map that does not increase distances (hence does not increase the diameter of any set). I already said that the radius may increase under {f}, but my example was incorrect. Here is a correct one: the set {A} consists of 3 points in red.

Radius increases under a nonexpanding map

The blue hexagon is the unit sphere in this space. The three points in red have distance 2 from one another. So do their images under {f}, but the radius increases from 1 to {4/3}. The set {f(A)} consists of three vertices of the regular hexagon (in green) circumscribed about the blue one; each of these vertices has norm {4/3}, and by symmetry the origin is an optimal center for {f(A)}.

I think this example is as bad as it gets in two dimensions: that is, we should have {\mathrm{rad}\, f(A) \le \frac{4}{3} \mathrm{rad}\, A} in any 2-dimensional normed space. Informally, the worst case is when the unit ball is triangular, which it can’t be because of the symmetry requirement. The hexagon is the next worst thing.

In higher dimensions the constant cannot be smaller than {\frac{4}{3}}, since the above construction can be implemented in a subspace. I don’t know whether the constant grows with dimension or not (either way it can’t exceed 2, as remarked before).