
Takagi (高木) curves are fractals that are somehow less known than Cantor sets and Sierpinski carpets, yet they can also be useful as (counter-)examples. The general 高木 curve is the graph y=f(x) of a function f that is built from triangular waves. The nth generation wave has equation y=2^{-n} \lbrace 2^n x \rbrace where \lbrace\cdot\rbrace means the distance to the nearest integer. Six of these waves are pictured below.

Triangular Waves

Summation over n creates the standard 高木 curve T, also known as the blancmange curve:

\displaystyle y=\sum_{n=0}^{\infty} 2^{-n} \lbrace 2^n x\rbrace

Standard Takagi curve

Note the prominent cusps at dyadic rationals: more on this later.

General 高木 curves are obtained by attaching coefficients c_n to the terms of the above series. The simplest of these, and the one of most interest to me, is the alternating 高木 curve T_{alt}:

\displaystyle y=\sum_{n=0}^{\infty} (-2)^{-n} \lbrace 2^n x\rbrace

Alternating Takagi curve

The alternation of signs destroys the cusps that are so prominent in T. Quantitatively speaking, the diameter of any subarc of T_{alt} is bounded by the distance between its endpoints times a fixed constant. The curves with this property are called quasiarcs, and they are precisely the quasiconformal images of line segments.
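Both curves are easy to experiment with numerically by truncating the series. Here is a quick Python sketch (my own, not part of the post's toolchain; the function name and cutoff are my choices):

```python
def takagi(x, alternating=False, terms=52):
    """Partial sum of sum_n c_n 2^{-n} {2^n x}, where {.} is the distance
    to the nearest integer and c_n = 1 (standard T) or (-1)^n (T_alt)."""
    total = 0.0
    for n in range(terms):
        y = (2.0 ** n * x) % 1.0
        frac = min(y, 1.0 - y)                 # distance to nearest integer
        sign = (-1.0) ** n if alternating else 1.0
        total += sign * frac / 2.0 ** n
    return total
```

For example, takagi(1/3) returns 2/3 up to roundoff, since \lbrace 2^n/3\rbrace=1/3 for every n, and the alternating sum at the same point is (1/3)\sum (-1/2)^n = 2/9.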

Both T and T_{alt} have infinite length. More precisely, the length of the nth generation of either curve is between \sqrt{(n+1)/2} and \sqrt{n+1}+1. Indeed, the derivative of x\mapsto 2^{-k}\lbrace 2^k x\rbrace is just the Rademacher function r_k. Therefore, the total variation of the sum \sum_{k=0}^n c_k 2^{-k}\lbrace 2^k x\rbrace is the L^1 norm of \sum_{k=0}^n c_k r_k. With c_k=\pm 1 the sharp form of the Хинчин inequality from the previous post yields

\displaystyle 2^{-1/2}\sqrt{n+1} \le \left\|\sum_{k=0}^n c_k r_k\right\|_{L^1} \le \sqrt{n+1}

For the upper bound I added 1 to account for the horizontal direction. Of course, the bound of real interest is the lower one, which proves unrectifiability. So far, a construction involving these curves has shed a tiny bit of light on the following question:

Which sets K\subset \mathbb R^n have the property that any quasiconformal image of K contains a rectifiable curve?

I won’t go (yet) into the reasons why this question arose. Any set with nonempty interior has the above property, since quasiconformal maps are homeomorphisms. A countable union of lines in the plane does not; this is what 高木 curves helped to show. The wide gap between these results remains to be filled.
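Back to the length bounds: the nth partial sum is piecewise linear with slope r_0+\dots+r_n on dyadic intervals of length 2^{-(n+1)}, so its graph length can be computed exactly. A Python check of my own (not from the post):

```python
import math

def generation_length(n):
    """Length of the graph of sum_{k=0}^n 2^{-k}{2^k x} over [0,1].
    On each dyadic interval of length 2^{-(n+1)} the slope is the sum
    of the Rademacher values r_0(t), ..., r_n(t) at the midpoint t."""
    m = 2 ** (n + 1)
    total = 0.0
    for j in range(m):
        t = (j + 0.5) / m                      # midpoint, exact in binary
        slope = sum((-1) ** (int(t * 2 ** (k + 1)) & 1) for k in range(n + 1))
        total += math.sqrt(1 + slope ** 2) / m
    return total
```

For n=0 this returns \sqrt{2} (the single triangular wave), and for larger n the value stays between \sqrt{(n+1)/2} and \sqrt{n+1}+1, as claimed.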

The Khintchine inequality

Today’s technology should make it possible to use the native transcription of names like Хинчин without inventing numerous ugly transliterations. The inequality is extremely useful in both analysis and probability: it says that the L^p norm of a linear combination of Rademacher functions (see my post on the Walsh basis) can be computed from its coefficients, up to a multiplicative error that depends only on p. Best of all, this works even for the troublesome p=1; in fact for all 0<p<\infty. Formally stated, the inequality is

\displaystyle A_p\sqrt{\sum c_n^2} \le \left\|\sum c_n r_n\right\|_{L^p} \le B_p\sqrt{\sum c_n^2}

where the constants A_p,B_p depend only on p. The orthogonality of Rademacher functions tells us that A_2=B_2=1, but what are the other constants? They were not found until almost 60 years after the inequality was proved. The precise values, established by Haagerup in 1982, behave in a somewhat unexpected way. Actually, only A_p does. The upper bound is reasonably simple:

\displaystyle B_p=\begin{cases} 1, \qquad 0<p\le 2 \\ \sqrt{2}\left[\Gamma(\frac{p+1}{2})/\sqrt{\pi}\right]^{1/p},  \qquad 2<p<\infty \end{cases}

The lower bound takes an unexpected turn:

\displaystyle A_p=\begin{cases} 2^{\frac{1}{2}-\frac{1}{p}},\qquad 0<p\le p_0 \\  \sqrt{2}\left[\Gamma(\frac{p+1}{2})/\sqrt{\pi}\right]^{1/p}, \qquad p_0<p<2 \\  1,\qquad 2\le p<\infty \end{cases}

The value of p_0 is determined by the continuity of A_p, and is not far from 2: precisely, p_0\approx 1.84742. Looks like a bug in the design of the Universe.
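In fact p_0 can be computed from the formulas above: setting the two expressions for A_p equal and raising both sides to the power p reduces the equation to \Gamma(\frac{p+1}{2})=\sqrt{\pi}/2, which is satisfied by p=2 and by one smaller root. A bisection sketch of my own:

```python
import math

def find_p0(lo=1.0, hi=1.9, iters=60):
    """Solve Gamma((p+1)/2) = sqrt(pi)/2 on (1, 1.9) by bisection.
    (The other root of this equation is p = 2, since Gamma(3/2) = sqrt(pi)/2.)"""
    target = math.sqrt(math.pi) / 2
    g = lambda p: math.gamma((p + 1) / 2) - target
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

The two roots exist because \Gamma dips below \sqrt{\pi}/2 between them, attaining its minimum near 1.4616; the bisection lands on p_0\approx 1.84742.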

Rademacher series

For a concrete example, I took random coefficients c_0,\dots,c_4 and formed the linear combination shown above. Then I computed its L^p norm and the bounds in the Khintchine inequality. The norm is in red, the lower bound is green, the upper bound is yellow.

Two-sided bounds

It’s a tight squeeze near p=2
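Since a finite combination of Rademacher functions is constant on dyadic intervals, its L^p norm can be computed exactly by enumerating sign patterns; here is a Python version of that computation (my own sketch):

```python
import itertools

def rademacher_lp_norm(c, p):
    """Exact L^p[0,1] norm of sum c_k r_k: the sum takes the value
    sum c_k e_k on a set of measure 2^{-n} for each sign pattern e."""
    n = len(c)
    mean = sum(abs(sum(ck * ek for ck, ek in zip(c, e))) ** p
               for e in itertools.product((-1, 1), repeat=n)) / 2 ** n
    return mean ** (1 / p)
```

At p=2 this recovers \sqrt{\sum c_n^2} exactly, and c=(1,1) with p=1 attains the lower bound A_1\sqrt{2}=1.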

Another orthonormal basis: Hermite functions

This is an orthonormal basis for L^2(\mathbb R). Since the measure of \mathbb R is infinite, functions will have to decay at infinity in order to be in L^2. The Hermite functions are
\displaystyle \Phi_n(x)=(2^n n! \sqrt{\pi})^{-1/2} H_n(x)e^{-x^2/2}
where H_n is the nth Hermite polynomial, defined by
\displaystyle H_n(x)=(-1)^n e^{x^2} \left(\frac{d}{dx}\right)^n e^{-x^2}.
The goal is to prove that the functions \Phi_n can be obtained from x^n e^{-x^2/2} via the Gram-Schmidt process. (They actually form a basis, but I won’t prove that.)

One can observe that the term e^{-x^2/2} would be unnecessary if we considered the weighted space L^2(\mathbb R, w) with weight w(x)=e^{-x^2} and the inner product \langle f,g\rangle=\int_{\mathbb R} fg\,w\,dx. In this language, we orthogonalize the sequence of monomials \lbrace x^n\rbrace\subset L^2(\mathbb R, w) and get the ON basis of polynomials \{c_n H_n\} with c_n = (2^n n! \sqrt{\pi})^{-1/2} being a normalizing constant. But since weighted spaces were never introduced in class, I’ll proceed with the original formulation. First, an unnecessary graph of \Phi_0,\dots,\Phi_4; the order is red, green, yellow, blue, magenta.

Hermite Functions

Claim 1. H_n is a polynomial of degree n with the leading term 2^n x^n. Proof by induction, starting with H_0=1. Observe that

\displaystyle H_{n+1}=- e^{x^2} \frac{d}{dx}\left(e^{-x^2} H_n\right) =2x H_n - H_n'

where the first term has degree n+1 and the second n-1. So, their sum has degree exactly n+1, and the leading coefficient is 2^{n+1}. Claim 1 is proved.
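The recurrence in this proof also gives a quick way to generate the coefficients of H_n. A Python sketch of mine:

```python
def hermite(n):
    """Coefficient list [a_0, ..., a_n] of H_n, computed from the
    recurrence H_{n+1} = 2x H_n - H_n' derived above."""
    H = [1]                                         # H_0 = 1
    for _ in range(n):
        two_x_H = [0] + [2 * a for a in H]          # multiply by 2x
        dH = [k * H[k] for k in range(1, len(H))]   # differentiate
        dH += [0] * (len(two_x_H) - len(dH))        # pad to equal length
        H = [a - b for a, b in zip(two_x_H, dH)]
    return H
```

For instance hermite(3) gives 8x^3-12x, and the leading coefficient is always 2^n, in agreement with Claim 1.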

In particular, Claim 1 tells us that the span of the \Phi_0,\dots,\Phi_n is the same as the span of \lbrace x^k e^{-x^2/2}\colon 0\le k\le n\rbrace.

Claim 2. \Phi_m\perp \Phi_n for m\ne n. We may assume m<n. Must show \int_{\mathbb R} H_m(x) H_n(x) e^{-x^2}\,dx=0. Since H_m is a polynomial of degree m<n, it suffices to prove

(*) \displaystyle \int_{\mathbb R} x^k H_n(x) e^{-x^2}\,dx=0 for integers 0\le k<n.

Rewrite (*) as \int_{\mathbb R} x^k \left(\frac{d}{dx}\right)^n e^{-x^2} \,dx=0 and integrate by parts repeatedly, throwing the derivatives onto x^k until the poor guy can't handle it anymore and dies. No boundary terms appear because e^{-x^2} decays superexponentially at infinity, easily beating any polynomial factors. Claim 2 is proved.

Combining Claim 1 and Claim 2, we see that \Phi_n belongs to the (n+1)-dimensional space \mathrm{span}\,\lbrace x^k e^{-x^2/2}\colon 0\le k\le n\rbrace, and is orthogonal to the n-dimensional subspace \mathrm{span}\,\lbrace x^k e^{-x^2/2}\colon 0\le k\le n-1\rbrace. Since the “Gram-Schmidtization” of x^n e^{-x^2/2} has the same properties, we conclude that \Phi_n agrees with this “Gram-Schmidtization” up to a scalar factor.

It remains to prove that the scalar factor is unimodular (\pm 1 since we are over reals).

Claim 3. \langle \Phi_n, \Phi_n\rangle=1 for all n. To this end we must show \int_{\mathbb R} H_n(x)H_n(x)e^{-x^2}\,dx =2^n n! \sqrt{\pi}. Expand the first factor H_n into monomials, use (*) to kill the degrees less than n, and recall Claim 1 to obtain
\int_{\mathbb R} H_n(x)H_n(x)e^{-x^2}\,dx = 2^n \int_{\mathbb R} x^n H_n(x)e^{-x^2}\,dx = (-1)^n 2^n\int_{\mathbb R} x^n \left(\frac{d}{dx}\right)^n e^{-x^2} \,dx.
As in the proof of Claim 2, we integrate by parts throwing the derivatives onto x^n. After n integrations the result is
2^n \int_{\mathbb R} n! e^{-x^2} \,dx = 2^n n! \sqrt{\pi}, as claimed.
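Both the orthogonality relation (*) and the value of this integral are easy to confirm numerically. The sketch below (mine) evaluates H_n pointwise via the standard three-term recurrence H_{k+1}=2xH_k-2kH_{k-1} (a known identity for these polynomials, not derived in the post) and applies the midpoint rule:

```python
import math

def H(n, x):
    """Evaluate H_n(x) via the recurrence H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2 * x * h1 - 2 * k * h0
    return h1

def moment(m, n, R=8.0, steps=100000):
    """Midpoint-rule value of the integral of H_m H_n e^{-x^2} over [-R, R];
    the tail beyond |x| = 8 is negligible thanks to the Gaussian decay."""
    h = 2 * R / steps
    total = 0.0
    for i in range(steps):
        x = -R + (i + 0.5) * h
        total += H(m, x) * H(n, x) * math.exp(-x * x)
    return total * h
```

Then moment(n, n) is close to 2^n n!\sqrt{\pi}, while moment(m, n) for m\ne n is close to zero.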

P.S. According to Wikipedia, these are the “physicists’ Hermite polynomials”. The “probabilists’ Hermite polynomials” are normalized to have the leading coefficient 1.

The Walsh basis

If you ask a random passerby to give you an orthonormal basis for L^2[0,1], they will probably respond with e_n(t)=\exp(2\pi i nt), n\in \mathbb Z. There is a lot to like about this exponential basis: most importantly, it diagonalizes the \frac{d}{dt} operator: \frac{d}{dt}e_n=2\pi i n e_n. This property makes the exponential basis indispensable in the study of differential equations. However, I prefer to describe the Walsh basis, which has several advantages:

  • the basis functions take just two values \pm 1, which simplifies the computation of coefficients
  • the proof of the basis property is easier than for the exponential basis
  • there is a strong connection to probability: the Walsh expansion can be seen as conditional expectation, and the partial sums form a Doob martingale
  • partial sums converge a.e. for any L^1 function, which is not the case for the exponential basis.

First, introduce the Rademacher functions r_n=\mathrm{sign}\, \sin (2^{n+1} \pi t), n=0,1,2,\dots (The enumeration is slightly different from what I used in class.) These are r_0,r_1,r_2,r_3:

Rademacher functions

Alternatively, one can define r_n as the function which takes the values +1,-1 alternately on the dyadic intervals \displaystyle \bigg[\frac{j}{2^{n+1}},\frac{j+1}{2^{n+1}}\bigg).

To define the nth Walsh function W_n, express the index n as the sum of powers of 2, i.e., n=2^{p_1}+2^{p_2}+\dots and let W_n=r_{p_1}r_{p_2}\dots . For example, W_{13}=r_3r_2r_0 because 13=2^3+2^2+2^0. Since the binary representation is unique, the definition makes sense. We also have W_0=1 because the product of an empty set of numbers is 1.
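This definition translates directly into code. A Python sketch of mine, which reads off the Rademacher factors from the binary expansion of n:

```python
def walsh(n, t):
    """W_n(t) for t in [0,1): the product of r_p(t) over the set bits p of n,
    where r_p(t) = (-1)^(floor(2^(p+1) t) mod 2)."""
    value, p = 1, 0
    while n:
        if n & 1:
            value *= (-1) ** (int(t * 2 ** (p + 1)) & 1)
        n >>= 1
        p += 1
    return value
```

For example, walsh(13, t) agrees with r_3(t)r_2(t)r_0(t) at every point, and distinct Walsh functions are orthogonal already as vectors on a fine enough dyadic grid.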

In class I checked that the set \lbrace W_n\colon n=0,1,2,\dots\rbrace is orthonormal. Also, for any integer k\ge 0 the linear span of \lbrace W_n\colon 0\le n< 2^k \rbrace is the space V_k of all functions that are constant on the dyadic intervals of length 2^{-k}. This follows by observing that \lbrace W_n\colon 0\le n< 2^k \rbrace\subset V_k and that the dimension of V_k is 2^k.

To prove that the Walsh basis is indeed a basis, suppose that h\in L^2[0,1] is orthogonal to all W_n. Since h\perp V_k for all k, the integral of h over any dyadic interval is zero (note that the characteristic function of any dyadic interval belongs to some V_k). But any subinterval I\subset [0,1] can be written as a disjoint countable union of dyadic intervals: just take all dyadic intervals that are contained in I. (You don't necessarily get the right type of endpoints, but as long as we work with integrals, the difference between open and closed intervals is immaterial.) Thus, the integral of h over any subinterval of [0,1] is zero. By the Lebesgue differentiation theorem, for a.e. t we have \displaystyle h(t)=\lim_{\delta\to 0}\frac{1}{2\delta}\int_{t-\delta}^{t+\delta} h =0. Thus h=0 as required.

The proof is even simpler if we use the non-centered form of the Lebesgue differentiation theorem: for a.e. t the average \frac{1}{b-a}\int_a^b h approaches h(t) as a,b\to t in such a way that a\le t\le b. Armed with this theorem, we can consider the sequence of dyadic intervals containing t, and immediately obtain h(t)=0 a.e.

Having proved that \lbrace W_n\rbrace is a basis, let’s expand something in it. For example, this moderately ugly function f:

Ugly function

I used Maple to compute the coefficients c_n=\langle f, W_n\rangle and plotted the partial sums \sum_{n=0}^N c_n W_n for N=1,3,7,15:

Partial sums

Such partial sums (those that use 2^k basis functions) are particularly nice: they are obtained simply by averaging f over each dyadic interval of length 2^{-k}. In probability theory this is known as conditional expectation. The conditional expectation is a contraction in any L^p space, including L^1 which gives so much trouble to the exponential basis. The highly oscillatory parts of f are killed by the dyadic averaging; in contrast, when integrated against the exponentials, they may cleverly accumulate and destroy the convergence of partial sums.
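Here is a small Python illustration of the averaging property (my own sketch, not the Maple computation): for a function given by its values on eight dyadic intervals, the partial sum with four Walsh terms reproduces the averages over the four intervals of length 1/4.

```python
def walsh(n, t):
    # W_n(t) as a product of Rademacher values read off the bits of n
    value, p = 1, 0
    while n:
        if n & 1:
            value *= (-1) ** (int(t * 2 ** (p + 1)) & 1)
        n >>= 1
        p += 1
    return value

def walsh_partial_sum(f_vals, k):
    """Partial sum with 2^k terms for f constant on len(f_vals) dyadic
    intervals; returns the partial sum's values on the same intervals."""
    m = len(f_vals)
    coeffs = [sum(f_vals[j] * walsh(n, j / m) for j in range(m)) / m
              for n in range(2 ** k)]
    return [sum(c * walsh(n, j / m) for n, c in enumerate(coeffs))
            for j in range(m)]
```

With f_vals = [3, 1, 4, 1, 5, 9, 2, 6] and k=2 the result is the pairwise averages [2, 2, 2.5, 2.5, 7, 7, 4, 4]: exactly the conditional expectation with respect to the dyadic intervals of length 1/4.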

Playing with iterated function systems

After Colin Carroll posted several fractal experiments with Matlab, I decided to do something of the sort. One difference is that I use Scilab, an open-source alternative to Matlab.

The first experiment: drawing the Sierpinski carpet using the Chaos Game. Namely, given a finite family of strict contractions f_1,\dots,f_r\colon \mathbb R^2\to \mathbb R^2 and an initial point p_0, plot the sequence p_{n+1}=f_{j_n}(p_n), where j_n \in \{1,\dots,r\} is chosen randomly at each step. To simplify matters, let f_j be the similarity transformation with scaling factor s\in (0,1) and the fixed point v_j.

A canonical example is: v_1,v_2,v_3 are the vertices of equilateral triangle, s=1/2. This produces the fractal known as the Sierpinski gasket. For a different example, set s=1/3 and let v_1,\dots,v_8 be the vertices of square together with midpoints of its sides. The resulting fractal is known as the Sierpinski carpet.

Sierpinski Carpet

This image was obtained by calling the Scilab function given below as Scarpet(1/3, 100000). The function is essentially a translation of Colin’s code to Scilab. Caution: if you copy and paste this code, watch out for line breaks and encoding of quote marks.

function Scarpet(scale,steps)
    sides = 8;                 // vertices and edge midpoints of the square
    b = 1 - scale;             // makes f_j(p) = scale*p + b*v_j fix v_j
    x = [1,0,-1,-1,-1,0,1,1];
    y = [1,1,1,0,-1,-1,-1,0];
    point = zeros(steps,2);
    vert = grand(1,steps,'uin',1,sides);   // random similarity at each step
    for j = 2:steps
        point(j,:) = scale*point(j-1,:) + b*[x(vert(j)),y(vert(j))];
    end
    plot(point(:,1), point(:,2), '.');
endfunction

Regardless of the choice of initial point p_0, the set of cluster points of the sequence (p_n) is exactly the invariant set K, namely the unique nonempty compact set such that K=\bigcup_{j=1}^r f_j(K). This is proved, for example, in the book Integral, Probability, and Fractal Measures by Gerald Edgar.

The scaling factor s=1/3 for the carpet is chosen so that the images of the original square under the eight similarities touch, but do not overlap. With a smaller factor the fractal looks like dust (a totally disconnected set), while with s\ge 1/2 it becomes a solid square. The intermediate range 1/3<s<1/2 is tricky: I think that K has measure zero, but can’t even prove that it’s nowhere dense.

It’s also possible to draw K in the opposite way, by removing points rather than adding them. To this end, let P be the convex hull of the set \{v_1,\dots,v_r\}; that is, a solid convex polygon. It’s not hard to see that K\subset P. Therefore, \bigcup_{j=1}^r f_j(K)\subset \bigcup_{j=1}^r f_j(P), but since the set on the left is K itself, we get K\subset \bigcup_{j=1}^r f_j(P). By induction, K=\bigcap_{n=1}^{\infty} P_n where P_0=P and P_{n+1}=\bigcup_{j=1}^r f_j(P_n).

Fat Sierpinski gasket: s=3/5 instead of 1/2

The above example is ifs(3,3/5,11), calling the Scilab code below.

function ifs(sides,scale,steps)
    b=1-scale; t=2*%pi*(1:sides)/sides; x=cos(t); y=sin(t);
    xpols=x'; ypols=y';    // each column holds the vertices of one polygon
    for j=2:steps
        xpols=scale*xpols; ypols=scale*ypols;
        xpolsnew=[]; ypolsnew=[];
        for k=1:sides
            xpolsnew=[xpolsnew xpols+b*x(k)*ones(xpols)];
            ypolsnew=[ypolsnew ypols+b*y(k)*ones(ypols)];
        end
        xpols=xpolsnew; ypols=ypolsnew;
    end
    a=gca(); a.data_bounds=[-1,-1;1,1];
    xfpolys(xpols,ypols);    // fill one polygon per column
endfunction

The final example is an “upper bound” for the fat pentagonal fractal that Colin created with the Chaos Game: the points v_1,\dots,v_5 are the vertices of regular pentagon, and s=1/2. The function was called as ifs(5,1/2,8). Again, I think that the invariant set has measure zero, but can’t even prove that the interior is empty. (Or find a reference where this is already done.)

Pentagonal set with s=1/2

For the sake of completeness

Let’s prove the completeness of \ell^p. The argument consists of two steps.

Claim 1. Suppose X is a normed space in which every absolutely convergent series converges; that is, \sum_{n=1}^{\infty} x_n converges whenever x_n\in X are such that \sum_{n=1}^{\infty} \|x_n\| converges. Then the space is complete.

Proof. Take a Cauchy sequence \{y_n\}\subset X. For j=1,2,\dots find an integer n_j such that \|y_n-y_m\|<2^{-j} as long as n,m\ge n_j. (This is possible because the sequence is Cauchy.) Also let n_0=1 and consider the series \sum_{j=1}^{\infty} (y_{n_{j}}-y_{n_{j-1}}). By the hypothesis this series converges. Its partial sums simplify (telescope) to y_{n_j}-y_1. Hence the subsequence \{y_{n_j}\} has a limit. It remains to apply a general theorem about metric spaces: if a Cauchy sequence has a convergent subsequence, then the entire sequence converges. This proves Claim 1.

Claim 2. Every absolutely convergent series in \ell^p converges.

Proof. The elements of \ell^p are functions from \mathbb N to \mathbb C, so let’s write them as such: f_j\colon \mathbb N\to \mathbb C. (This avoids confusion of indices.) Suppose the series \sum_{j=1}^{\infty} \|f_j\| converges. Then for any n the series \sum_{j=1}^{\infty} |f_j(n)| also converges, by the Comparison Test. Hence \sum_{j=1}^{\infty} f_j(n) converges (absolutely convergent implies convergent for series of real or complex numbers). Let f(n) = \sum_{j=1}^{\infty} f_j(n). So far the convergence is only pointwise, so we are not done. We still have to show that the series converges in \ell^p, that is, its tails have small \ell^p norm: \sum_{n=1}^\infty |\sum_{j=k}^{\infty} f_j(n)|^p \to 0 as k\to\infty.

What we need now is a dominating function, so that we can apply the Dominated Convergence Theorem. Namely, we need a function g\colon \mathbb N\to [0,\infty) such that
(1) \sum_{n=1}^{\infty} g(n)<\infty, and
(2) |\sum_{j=k}^{\infty} f_j(n)|^p \le g(n) for all k,n.

Set g=(\sum_{j=1}^{\infty} |f_j|)^p. Then (2) follows from the triangle inequality. Also, g is the increasing limit of functions g_k =(\sum_{j=1}^k |f_j|)^p, for which we have
\sum_n g_k(n) \le (\sum_{j=1}^k \|f_j\|)^p \le (\sum_{j=1}^{\infty} \|f_j\|)^p<\infty
using the triangle inequality in \ell^p. Therefore, \sum_n g(n)<\infty by the Monotone Convergence Theorem.

Almost norming functionals, Part 2

Let E be a real Banach space with the dual E^*. Fix \delta\in (0,1) and call a linear functional e^*\in E^* almost norming for e if \|e\|=\|e^*\|=1 and e^*(e)\ge \delta. In Part 1 I showed that in any Banach space there exists a continuous selection of almost norming functionals. Here I will prove that there is no uniformly continuous selection in \ell_1.

Claim. Let S be the unit sphere in \ell_1^n, the n-dimensional \ell_1-space. Suppose that f\colon S\to \ell_{\infty}^n is a map such that f(e) is almost norming for e in the above sense. Then the modulus of continuity \omega_f satisfies \omega_f(2/n)\ge 2\delta.

(If a uniformly continuous selection were available in \ell_1, it would yield selections in \ell_1^n with a modulus of continuity independent of n.)

Proof. Write f=(f_1,\dots,f_n). For any \epsilon\in \{-1,1\}^n we have n^{-1}\epsilon \in S, hence

\sum\limits_{i=1}^n \epsilon_i f_i(n^{-1}\epsilon)\ge n\delta for all \epsilon\in \{-1,1\}^n. Sum over all \epsilon and change the order of summation:

\sum\limits_{i=1}^n \sum\limits_{\epsilon}\epsilon_i f_i(n^{-1}\epsilon)\ge n2^n\delta

There exists i\in\{1,2,\dots,n\} such that

\sum\limits_{\epsilon}\epsilon_i f_i(n^{-1}\epsilon) \ge 2^n \delta

Fix this i from now on. Define \tilde \epsilon to be the same \pm vector as \epsilon, but with the ith component flipped. Rewrite the previous sum as

\sum\limits_{\epsilon} -\epsilon_i f_i(n^{-1}\tilde \epsilon)\ge 2^n\delta

and add them together:

\sum\limits_{\epsilon}\epsilon_i [f_i(n^{-1}\epsilon)-f_i(n^{-1}\tilde \epsilon)]\ge 2^{n+1}\delta

Since \|n^{-1}\epsilon-n^{-1}\tilde \epsilon\|=2/n, it follows that 2^n \omega_f(2/n) \ge 2^{n+1}\delta, as claimed.

A relation between polynomials

This is a brief foray into algebra from a 2006 REU project at Texas A&M.

Given two polynomials P,Q \in \mathbb C[z_1,\dots,z_n], we write Q\preccurlyeq P if there is a differential operator T\in \mathbb C[\frac{\partial}{\partial z_1},\dots, \frac{\partial}{\partial z_n}] such that Q=T P.

The relation \preccurlyeq is reflexive and transitive, but is not antisymmetric. If both P\preccurlyeq Q and Q\preccurlyeq P hold, we say that P and Q are \partial-equivalent, denoted P\thicksim Q.

A polynomial is \partial-homogeneous if it is \partial-equivalent to a homogeneous polynomial. Obviously, any polynomial in one variable has this property. Polynomials in more than one variable usually do not have it.

The interesting thing about \partial-homogeneous polynomials is that they are refinable, meaning that one has a nontrivial identity of the form P(z)=\sum_{j\in\mathbb Z^n} c_{j} P(\lambda z-j) where c_{j}\in \mathbb C, j\in \mathbb Z^n, and only finitely many of the coefficients c_j are nonzero. The value of \lambda does not matter as long as |\lambda|\ne 0,1. Conversely, every \lambda-refinable polynomial is \partial-homogeneous.
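As a toy illustration (my own example, not from the project): P(z)=z^2+z is \partial-homogeneous in one variable, since applying 1+\frac12\frac{d}{dz} to z^2 gives z^2+z, and matching coefficients produces the refinement identity P(z)=\frac38 P(2z)-\frac18 P(2z-1) with \lambda=2. A quick numerical check:

```python
def P(z):
    return z * z + z

def refined(z):
    # refinement of P with lambda = 2, c_0 = 3/8, c_1 = -1/8
    # (coefficients found by matching powers of z on both sides)
    return (3 / 8) * P(2 * z) - (1 / 8) * P(2 * z - 1)
```

The two sides agree identically; since both are polynomials of degree 2, checking three points would already suffice.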

Controlled bilipschitz extension

A map f\colon X\to Y is L-bilipschitz if L^{-1} |a-b| \le |f(a)-f(b)| \le L |a-b| for all a,b\in X. This definition makes sense if X and Y are general metric spaces, but let’s suppose they are subsets on the plane \mathbb R^2.

Definition 1. A set A\subset \mathbb R^2 has the BL extension property if any bilipschitz map f\colon A\to\mathbb R^2 can be extended to a bilipschitz map F\colon \mathbb R^2\to\mathbb R^2. (Extension means that F is required to agree with f on A.)

Lines and circles have the BL extension property. This was proved in early 1980s independently by Tukia, Jerison and Kenig, and Latfullin.

Definition 2. A set A\subset \mathbb R^2 has the controlled BL extension property if there exists a constant C such that any L-bilipschitz map f\colon A\to\mathbb R^2 can be extended to a C L-bilipschitz map F\colon \mathbb R^2\to\mathbb R^2.

Clearly, Definition 2 asks for more than Definition 1. I can prove that a line has the controlled BL extension property, even with a modest constant such as C=2000. (Incidentally, one cannot take C=1.) I still can’t prove the controlled BL extension property for a circle.

Update: extension from line is done in this paper and from the circle in this one.