Quasi-projections and isometries

A typical scenario: given a subset {E} of a metric space {X} and a point {x\in X}, we look for a point {y\in E} that is nearest to {x}: that is, {d(x, y) = \mathrm{dist}\,(x, E)}. Such a point is generally not unique: for example, if {E} is the graph of the cosine function and {x = (\pi, \pi/2)}, then both {(\pi/2, 0)} and {(3\pi/2, 0)} qualify as nearest to {x}. This makes the nearest-point projection onto {E} discontinuous: whichever of the two points we choose, moving {x} slightly to the left or to the right makes its projection onto {E} jump from one point to the other. Not good.

[Image proj1: Discontinuous nearest-point projection]

Even when the nearest-point projection is well-defined and continuous, it may not be the kind of projection we want. For example, in a finite-dimensional normed space with a strictly convex norm we have a continuous nearest-point projection onto any linear subspace, but it is in general a nonlinear map.
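To see the nonlinearity concretely, here is a small numerical sketch (the norm, the line, and the test points are my own choices, not from the text): in the {\ell^4} norm on {\mathbb{R}^3}, the nearest-point projection onto the line spanned by {(1,1,1)} fails to be additive.

% A numerical sketch (hypothetical example): nearest-point projection in the
% l^4 norm onto the line spanned by v = (1,1,1) is continuous but not additive.
v = [1; 1; 1];
proj = @(x) fminsearch(@(t) sum(abs(x - t*v).^4), 0);  % coefficient t of the nearest point t*v
a = [1; 0; 0]; b = [0; 1; 0];
[proj(a + b), proj(a) + proj(b)]   % approximately 0.5575 vs 0.8851: not equal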

Let’s say that {P\colon X\to E} is a quasi-projection if {d(x, P(x)) \le C \mathrm{dist}\,(x, E)} for some constant {C} independent of {x}. Such maps are much easier to construct: every Lipschitz continuous map {P\colon X\to E} such that {P(x)=x} for {x \in E} is a quasi-projection. Indeed, if {P} is {L}-Lipschitz, then for every {y\in E} we have {d(x, P(x)) \le d(x, y) + d(P(y), P(x)) \le (1+L)\,d(x, y)}; taking the infimum over {y\in E} gives {C = 1+L}. For example, one quasi-projection onto the graph of cosine is the map {(x, y)\mapsto (x, \cos x)} shown below.

[Image proj2: Continuous quasi-projection]
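As a quick numerical check (the sample window and test point below are illustrative), this map moves the point {(\pi, \pi/2)} by a modest multiple of its distance to the graph; since {(x, y)\mapsto (x, \cos x)} is {\sqrt{2}}-Lipschitz, the constant {C = 1+\sqrt{2}} works for every point.

% Comparing the displacement under P(x,y) = (x, cos x) with the distance to the
% graph, for the point from the text (a sanity check, not a proof).
t  = linspace(-4*pi, 4*pi, 200000);         % dense sample of the graph of cosine
x0 = pi; y0 = pi/2;
moved = abs(y0 - cos(x0));                  % distance from (x0, y0) to P(x0, y0)
d     = min(hypot(t - x0, cos(t) - y0));    % distance to the graph, approximately
moved / d                                   % about 1.16, well under C = 1 + sqrt(2)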

If {X} is a Banach space and {E} is its subspace, then any idempotent operator {Q} with range {E} is a quasi-projection onto {E}: since {Q(y)=y} for {y\in E}, we have {x - Q(x) = (I-Q)(x-y)} for every {y\in E}, hence {\|x - Q(x)\| \le \|I-Q\|\,\mathrm{dist}\,(x, E)}. Not every subspace admits such an operator, but many do (these are the complemented subspaces; they include all subspaces of finite dimension or finite codimension). By replacing “nearest” with “close enough” we gain linearity. And even some subspaces that are not linearly complemented admit a continuous quasi-projection.
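A tiny concrete case (my own example): the idempotent matrix below projects {\mathbb{R}^2} onto the {x}-axis along the direction {(1,-1)}, with constant {C = \|I-Q\| = \sqrt{2}}.

% An oblique (non-nearest-point) linear projection as a quasi-projection.
Q = [1 1; 0 0];                        % idempotent: Q^2 = Q, range = x-axis
x = [3; 2];
moved = norm(x - Q*x);                 % how far x travels under Q
d     = abs(x(2));                     % distance from x to the x-axis
[moved, norm(eye(2) - Q) * d]          % the first does not exceed the second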

Here is a neat fact: if {M} and {N} are subspaces of a Euclidean space and {\dim M = \dim N}, then there exists an isometric quasi-projection of {M} onto {N} with constant {C=\sqrt{2}}. This constant is best possible: for example, an isometry from the {y}-axis onto the {x}-axis has to send {(0, 1)} to one of {(\pm 1, 0)}, thus moving it by distance {\sqrt{2}}.

[Image proj3: An isometry must incur a sqrt(2) distance cost]

Proof. Let {k} be the common dimension of {M} and {N}. Fix some orthonormal bases in {M} and {N}. In these bases, the orthogonal (nearest-point) projection from {M} to {N} is represented by some {k\times k} matrix {P} of norm at most {1}. We need an orthogonal {k\times k} matrix {Q} such that the map {M\to N} that it defines is a {\sqrt{2}}-quasi-projection. What exactly does this condition mean for {Q}?

Let’s say {x\in M}, {y\in N} is the orthogonal projection of {x} onto {N}, and {z\in N} is where we want to send {x} by an isometry. Our goal is {\|x-z\|\le \sqrt{2}\|x-y\|}, in addition to {\|z\|=\|x\|}. Squaring and expanding inner products yields {2\langle x, y\rangle - \langle x, z \rangle \le \|y\|^2}. Since both {y} and {z} are in {N}, we can replace {x} on the left by its projection {y}. So, the goal simplifies to {\|y\|^2 \le \langle y, z\rangle}. Geometrically, this means placing {z} so that its projection onto the line through {y} lies on the continuation of this line beyond {y}.
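Spelling out the squaring step: using {\|z\| = \|x\|}, the inequality {\|x-z\|^2 \le 2\|x-y\|^2} expands to {2\|x\|^2 - 2\langle x, z\rangle \le 2\|x\|^2 - 4\langle x, y\rangle + 2\|y\|^2}; cancelling {2\|x\|^2} and dividing by {2} leaves {2\langle x, y\rangle - \langle x, z \rangle \le \|y\|^2}.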

So far so good, but the disappearance of {x} from the inequality is disturbing. Let’s bring it back by observing that {\|y\|^2 \le \langle y, z\rangle} is equivalent to {4(\|y\|^2 - \langle y, z\rangle) + \|z\|^2 \le \|x\|^2}, which is simply {\|2y-z\| \le \|x\|}. So that’s what we want to do: map {x} so that the distance from its image to {2y} does not exceed {\|x\|}. In terms of matrices and their operator norm, this means {\|2P-Q\|\le 1}.

It remains to show that every square matrix of norm at most {2} (such as {2P} here) is within distance {1} of some orthogonal matrix. Let {2P = U\Sigma V^T} be the singular value decomposition, with {U, V} orthogonal and {\Sigma} a diagonal matrix with the singular values of {2P} on the diagonal. Since the singular values of {2P} are between {0} and {2}, it follows that {\|\Sigma-I\|\le 1}. Hence {\|2P - UV^T\|\le 1}, and taking {Q=UV^T} concludes the proof.

(The proof is based on a Stack Exchange post by user hypernova.)
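The construction is easy to test numerically. Below is a sketch in MATLAB (the ambient dimension, seed, and variable names are my own choices): it builds random subspaces {M, N}, forms {Q} from the SVD of {2P}, and checks both {\|2P-Q\|\le 1} and the {\sqrt{2}} quasi-projection property at a random point.

% Numerical check of the sqrt(2) isometric quasi-projection (illustrative setup).
n = 6; k = 3;                          % ambient dimension and dim M = dim N (arbitrary)
rng(1);                                % seed fixed for reproducibility
BM = orth(randn(n, k));                % orthonormal basis of M, as columns
BN = orth(randn(n, k));                % orthonormal basis of N
P  = BN' * BM;                         % orthogonal projection M -> N in these bases; norm(P) <= 1
[U, ~, W] = svd(2 * P);
Q = U * W';                            % orthogonal matrix with norm(2P - Q) <= 1
norm(2 * P - Q)                        % at most 1, as in the proof
u = randn(k, 1);                       % coordinates of a random x in M
x = BM * u;  z = BN * (Q * u);         % x and its image under the isometry
y = BN * (BN' * x);                    % nearest point to x in N
[norm(x - z), sqrt(2) * norm(x - y)]   % the first number does not exceed the second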

Classifying words: a tiny example of SVD truncation

Given a bunch of words, specifically the names of divisions of plants and bacteria, I’m going to use a truncated Singular Value Decomposition to separate bacteria from plants. This isn’t a novel or challenging task, but I like the small size of the example. A similar type of example is classifying a bunch of text fragments by keywords, but that requires a lot more setup.

Here are 33 words to classify: acidobacteria, actinobacteria, anthocerotophyta, aquificae, bacteroidetes, bryophyta, charophyta, chlamydiae, chloroflexi, chlorophyta, chrysiogenetes, cyanobacteria, cycadophyta, deferribacteres, deinococcus-thermus, dictyoglomi, firmicutes, fusobacteria, gemmatimonadetes, ginkgophyta, gnetophyta, lycopodiophyta, magnoliophyta, marchantiophyta, nitrospirae, pinophyta, proteobacteria, pteridophyta, spirochaetes, synergistetes, tenericutes, thermodesulfobacteria, thermotogae.

As is, the task is too easy: we can recognize the -phyta ending in the names of plant divisions. Let’s jumble the letters within each word:

aaabccdeiiort, aaabcceiinortt, aacehhnoooprttty, aacefiiqu, abcdeeeiorstt, abhoprtyy, aachhoprty, aacdehilmy, cefhilloorx, achhlooprty, ceeeghinorssty, aaabcceinorty, aaccdhoptyy, abcdeeeefirrrst, cccdeehimnoorsstuu, cdgiilmooty, cefiimrstu, aabcefiorstu, aadeeegimmmnostt, agghiknopty, aeghnoptty, acdhilooopptyy, aaghilmnoopty, aaachhimnoprtty, aeiinoprrst, ahinoppty, aabceeiooprrtt, adehiopprtty, aceehioprsst, eeeginrssstty, ceeeinrsttu, aabcdeeefhilmoorrsttu, aeeghmoortt
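Incidentally, the scrambled forms above are just the letters of each name in alphabetical order, with the hyphen dropped; this is easy to reproduce:

% Reproducing the scramble for a couple of the names (illustrative).
names = {'acidobacteria', 'bryophyta'};     % two of the 33 names
jumbled = cellfun(@(w) char(sort(double(w(w ~= '-')))), names, 'UniformOutput', false)
% yields {'aaabccdeiiort', 'abhoprtyy'}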

Not so obvious anymore, is it? Recalling the -phyta ending, we may want to focus on the presence of the letter y, which is not so common otherwise. Indeed, the count of y letters is a decent predictor: on the following plot, green asterisks are plants and red asterisks are bacteria; the vertical axis is the count of the letter y in each word.

[Image svd1: Count of the letter y in each word]

However, the simple count fails to classify several words: having one letter y may or may not mean a plant. Instead, let’s consider the entire matrix {A} of letter counts (here it is in a spreadsheet: 33 rows, one for each word, and 26 columns, one for each letter). So far, we have looked at its 25th column in isolation from the rest of the matrix. Truncated SVD uncovers relations between columns that are not obvious individually but express patterns, such as the letters p, h, t, a appearing together with y. Specifically, write {A = UDV^T} with {U,V} orthogonal and {D} diagonal. Replace all entries of {D}, except the four largest ones, by zeros. The result is a rank-4 diagonal matrix {D_4}. The product {A_4 = UD_4V^T} is a rank-4 matrix, which keeps some of the essential patterns in {A} but de-emphasizes the accidental ones.
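For completeness, here is one way such a count matrix could be assembled (a sketch; the variable names are mine, and the matrix {A} is supplied directly in the code below):

% Assembling a letter-count matrix from a cell array of lowercase names.
names = {'acidobacteria', 'bryophyta'};          % stand-in for the full list of 33
A = zeros(numel(names), 26);
for i = 1:numel(names)
    w = names{i};
    w = w(w >= 'a' & w <= 'z');                  % keep letters only (drops the hyphen)
    A(i, :) = histc(double(w), double('a'):double('z'));  % counts of a..z
end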

The entries of {A_4} are no longer integers. Here is a color-coded plot of its 25th column, which still somehow corresponds to the letter y but now takes into account the other letters with which it appears.

[Image svd2: The same column of the letter-count matrix, after truncation]

Plants are now cleanly separated from bacteria. The plots were made in MATLAB as follows:


A=[3,1,2,1,1,0,0,0,2,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0;3,1,2,0,1,0,0,0,2,0,0,0,0,1,1,0,0,1,0,2,0,0,0,0,0,0;2,0,1,0,1,0,0,2,0,0,0,0,0,1,3,1,0,1,0,3,0,0,0,0,1,0;2,0,1,0,1,1,0,0,2,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0;1,1,1,1,3,0,0,0,1,0,0,0,0,0,1,0,0,1,1,2,0,0,0,0,0,0;1,1,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0,0,2,0;2,0,1,0,0,0,0,2,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0,0,1,0;2,0,1,1,1,0,0,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0;0,0,1,0,1,1,0,1,1,0,0,2,0,0,2,0,0,1,0,0,0,0,0,1,0,0;1,0,1,0,0,0,0,2,0,0,0,1,0,0,2,1,0,1,0,1,0,0,0,0,1,0;0,0,1,0,3,0,1,1,1,0,0,0,0,1,1,0,0,1,2,1,0,0,0,0,1,0;3,1,2,0,1,0,0,0,1,0,0,0,0,1,1,0,0,1,0,1,0,0,0,0,1,0;2,0,2,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,2,0;1,1,1,1,4,1,0,0,1,0,0,0,0,0,0,0,0,3,1,1,0,0,0,0,0,0;0,0,3,1,2,0,0,1,1,0,0,0,1,1,2,0,0,1,2,1,2,0,0,0,0,0;0,0,1,1,0,0,1,0,2,0,0,1,1,0,2,0,0,0,0,1,0,0,0,0,1,0;0,0,1,0,1,1,0,0,2,0,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,0;2,1,1,0,1,1,0,0,1,0,0,0,0,0,1,0,0,1,1,1,1,0,0,0,0,0;2,0,0,1,3,0,1,0,1,0,0,0,3,1,1,0,0,0,1,2,0,0,0,0,0,0;1,0,0,0,0,0,2,1,1,0,1,0,0,1,1,1,0,0,0,1,0,0,0,0,1,0;1,0,0,0,1,0,1,1,0,0,0,0,0,1,1,1,0,0,0,2,0,0,0,0,1,0;1,0,1,1,0,0,0,1,1,0,0,1,0,0,3,2,0,0,0,1,0,0,0,0,2,0;2,0,0,0,0,0,1,1,1,0,0,1,1,1,2,1,0,0,0,1,0,0,0,0,1,0;3,0,1,0,0,0,0,2,1,0,0,0,1,1,1,1,0,1,0,2,0,0,0,0,1,0;1,0,0,0,1,0,0,0,2,0,0,0,0,1,1,1,0,2,1,1,0,0,0,0,0,0;1,0,0,0,0,0,0,1,1,0,0,0,0,1,1,2,0,0,0,1,0,0,0,0,1,0;2,1,1,0,2,0,0,0,1,0,0,0,0,0,2,1,0,2,0,2,0,0,0,0,0,0;1,0,0,1,1,0,0,1,1,0,0,0,0,0,1,2,0,1,0,2,0,0,0,0,1,0;1,0,1,0,2,0,0,1,1,0,0,0,0,0,1,1,0,1,2,1,0,0,0,0,0,0;0,0,0,0,3,0,1,0,1,0,0,0,0,1,0,0,0,1,3,2,0,0,0,0,1,0;0,0,1,0,3,0,0,0,1,0,0,0,0,1,0,0,0,1,1,2,1,0,0,0,0,0;2,1,1,1,3,1,0,1,1,0,0,1,1,0,2,0,0,2,1,2,1,0,0,0,0,0;1,0,0,0,2,0,1,1,0,0,0,0,1,0,2,0,0,1,0,2,0,0,0,0,0,0];
[U, D, V] = svd(A);                       % full SVD of the 33-by-26 count matrix
D4 = D .* (D >= D(4, 4));                 % zero out all but the four largest singular values
A4 = U * D4 * V';                         % rank-4 approximation of A
plants = (A4(:, 25) > 0.8);               % threshold the 25th column (letter y)
bacteria = (A4(:, 25) <= 0.8);
  % the rest is output
words = 1:33;
hold on
plot(words(plants), A4(plants, 25), 'g*');     % plants in green
plot(words(bacteria), A4(bacteria, 25), 'r*'); % bacteria in red