Derivations and the curvature tensor

Let {M} be a Riemannian manifold with Riemannian connection {\nabla}. A connection is a thing that knows how to differentiate a vector field {Y} in the direction of a vector field {X}; the result is denoted by {\nabla_X Y} and is also a vector field. For consistency of notation, it is convenient to write {\nabla_X f } for the derivative of scalar function {f} in the direction {X}, even though this derivative does not need a connection: vector fields are born with the ability to differentiate functions.

The pairs {(f,Y)}, with {f} a scalar function and {Y} a vector field, form a funky nonassociative algebra {\mathcal{A}} described in the previous post. And {\nabla_X} is a derivation on this algebra, because

  • {\nabla_X(fY) = f\nabla_X Y + (\nabla_X f)Y } by the definition of a connection
  • {\nabla_X\langle Y, Z\rangle = \langle \nabla_X Y, Z\rangle + \langle Y, \nabla_X Z\rangle} by the metric property of the Riemannian connection.

Recall that the commutator of two derivations is a derivation. Or just check this again: if {{}'} and {{}^\dag} are derivations, then

{ {(ab)'}^\dag = (a'b+ab')^\dag = {a'}^\dag b+a'b^\dag +a^\dag b'+a{b'}^\dag }
{  {(ab)^\dag}' = (a^\dag b+ab^\dag)' = {a^\dag }' b+a^\dag b' +a' b^\dag +a{b^\dag}' }

and the difference {{(ab)'}^\dag-{(ab)^\dag}'} simplifies to what it should be.
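
Spelled out: the mixed terms {a'b^\dag} and {a^\dag b'} appear in both expansions and cancel, which leaves the Leibniz rule for the commutator.

```latex
{(ab)'}^\dag - {(ab)^\dag}' = \left({a'}^\dag - {a^\dag}'\right) b + a \left({b'}^\dag - {b^\dag}'\right)
```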

Thus, for any pair {X,Y} of vector fields the commutator {\nabla_X \nabla_Y-\nabla_Y\nabla_X} is a derivation on {\mathcal{A}}. The torsion-free property of the connection tells us how it works on functions:

\displaystyle  (\nabla_X \nabla_Y-\nabla_Y\nabla_X) f =   \nabla_{[X,Y]}f=\nabla_{\nabla_XY}f -\nabla_{\nabla_YX}f

Subtracting the derivation {\nabla_{[X,Y]}} from the commutator, we get a derivation that kills scalar functions,

\displaystyle  R(X,Y) = \nabla_X \nabla_Y-\nabla_Y\nabla_X - \nabla_{[X,Y]}

But a derivation that kills scalar functions is linear over functions:

\displaystyle R(X,Y)(fZ) = R(X,Y)(f)\, Z + f\,R(X,Y)Z = f\,R(X,Y)Z

In plain terms, {R(X,Y)} processes any given vector field {Z} pointwise, applying some linear operator {L_p} to the vector {Z_p} at every point {p} of the manifold. No derivatives of {Z} are actually taken, either of first or of second order.

Moreover, the derivation property immediately implies that {R(X,Y)} is a skew-symmetric operator: for any vector fields {Z,W}

\displaystyle  \langle R(X,Y)Z,W\rangle + \langle R(X,Y)W,Z\rangle  = R(X,Y)\langle Z,W\rangle =0

because {R(X,Y)} kills scalar functions.

The other kind of skew-symmetry was evident from the beginning: {R(X,Y)=-R(Y,X)} by definition.

What is not yet evident is that {R(X,Y)} is also a tensor in {X} and {Y}, that is, it does not differentiate the direction fields themselves. To prove this, write {R(X,Y)=\nabla_{X,Y}^2-\nabla_{Y,X}^2} where \displaystyle  \nabla_{X,Y}^2 = \nabla_X \nabla_Y - \nabla_{\nabla_X Y}  should be thought of as the pointwise second-order derivative in the directions {X,Y} (i.e., the result of plugging two direction vectors into the Hessian matrix). By symmetry, it suffices to show that {\nabla_{X,Y}^2} is a tensor in {X} and {Y}. For {X}, this is clear from the definition of connection. Concerning {Y}, we have

{ \nabla_{X,fY}^2 = \nabla_X (f\nabla_{Y}) - \nabla_{(\nabla_X f) Y+f\nabla_X Y} }
{= f \nabla_X \nabla_{Y} + (\nabla_X f )\nabla_{Y} - (\nabla_X f) \nabla_{ Y} - f \nabla_{\nabla_X Y}  }
{= f \nabla_{X,Y}^2  }

That’s it, we have a tensor that takes three vector fields {X,Y,Z} and produces another one, denoted {R(X,Y)Z}. Now I wonder if there is a way to use the language of derivations to give a slick proof of the first Bianchi identity, {R(X,Y)Z+R(Y,Z)X+R(Z,X)Y=0}.

To avoid having two picture-less posts in a row, here is something completely unrelated:

New design of the Command key?

This is the image of the unit circle {|z|=1} under the polynomial {z^3-\sqrt{3}\,\bar z}. Which area is larger: red or green? Answer hidden below.

They are equal.
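
A quick numerical sanity check, offered only as a sketch. The shoelace formula gives the signed area enclosed by a closed curve, with each region counted according to its winding number. Assuming the red and green regions are the ones the curve winds around in opposite directions (a guess, since the picture is not reproduced here), equal areas mean that the signed area vanishes. The name `signedArea` and the sample count are arbitrary choices.

```javascript
// Signed area enclosed by w(t) = z^3 - sqrt(3) * conj(z), z = e^{it},
// approximated by the shoelace formula on n sample points.
function signedArea(n) {
  var sqrt3 = Math.sqrt(3);
  function point(k) {
    var t = 2 * Math.PI * k / n;
    // z^3 = (cos 3t, sin 3t) and conj(z) = (cos t, -sin t)
    return [Math.cos(3 * t) - sqrt3 * Math.cos(t),
            Math.sin(3 * t) + sqrt3 * Math.sin(t)];
  }
  var area = 0;
  for (var k = 0; k < n; k++) {
    var p = point(k), q = point(k + 1);
    area += (p[0] * q[1] - q[0] * p[1]) / 2;
  }
  return area;
}

console.log(signedArea(10000)); // close to zero
```

The exact statement behind this: for {w=\sum a_n e^{int}} the signed area is {\pi\sum n|a_n|^2}, which here equals {\pi(3\cdot 1-1\cdot 3)=0}.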

Derivations

A map {D} is a derivation if it satisfies the Leibniz rule: {D(ab)=D(a)b+aD(b)}. To make sense out of this, we need to be able to

  • multiply arguments of {D} together
  • multiply values of {D} by arguments of {D}
  • add the results

For example, if {D\colon R\to M} where {R} is a ring and {M} is a two-sided module over {R}, then all of the above makes sense. In practice it often happens that {M=R}. In this case, the commutator (Lie bracket) of two derivations {D_1,D_2} is defined as {[D_1,D_2]=D_1\circ D_2-D_2\circ D_1} and turns out to be a derivation as well. If {R} is also an algebra over a field {K}, then {K}-linearity of {D} can be added to the requirements of being a derivation, but I am not really concerned about that.

What I am concerned about is that two of my favorite instances of the Leibniz rule are not explicitly covered by the ring-to-module derivations. Namely, for smooth functions {\varphi\colon{\mathbb R}\rightarrow{\mathbb R}}, {F\colon{\mathbb R}\rightarrow{\mathbb R}^n} and {G\colon{\mathbb R}\rightarrow{\mathbb R}^n} we have

\displaystyle     (\varphi F)' = \varphi' F + \varphi F' \quad \text{and} \quad (F\cdot G)' = F'\cdot G+F\cdot G'    \ \ \ \ \ (1)

Of course, {{\mathbb R}^n} could be any {{\mathbb R}}-vector space {V} with an inner product.

It seems that the most economical way to fit (1) into the algebraic concept of derivation is to equip the vector space {{\mathbb R}\oplus V} with the product

\displaystyle   (\alpha,u)(\beta,v)= (\alpha\beta+u\cdot v, \alpha v+\beta u)  \ \ \ \ \ (2)

making it a commutative algebra over {{\mathbb R}}. Something tells me to put {-u\cdot v} there, but I resist. Actually, I should have said “commutative nonassociative algebra”:

\displaystyle    \{(\alpha,u)(\beta,v)\}(\gamma,w) = (\alpha\beta+u\cdot v, \alpha v+\beta u) (\gamma,w) \\ \\    = (\alpha\beta\gamma+\gamma u\cdot v+ \alpha v\cdot w+\beta u\cdot w,    \alpha\beta w + \gamma \alpha v+ \gamma \beta u +(u\cdot v) w)

Everything looks nice, except for the last term {(u\cdot v) w}, which destroys associativity.
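
A small sketch makes the failure concrete. The helper names `dot` and `mul` are made up for this illustration, and {V={\mathbb R}^2} is chosen only to keep the example short; an element {(\alpha,u)} is stored as `[alpha, [u1, u2]]`.

```javascript
// Product (2) on R ⊕ V with V = R^2; an element (alpha, u) is stored as [alpha, [u1, u2]].
function dot(u, v) {
  return u[0] * v[0] + u[1] * v[1];
}

function mul(x, y) {
  var alpha = x[0], u = x[1], beta = y[0], v = y[1];
  return [alpha * beta + dot(u, v),
          [alpha * v[0] + beta * u[0], alpha * v[1] + beta * u[1]]];
}

var x = [0, [1, 0]];
var y = [0, [1, 0]];
var z = [0, [0, 1]];

console.log(JSON.stringify(mul(x, y)), JSON.stringify(mul(y, x))); // commutative: same result twice
console.log(JSON.stringify(mul(mul(x, y), z))); // (xy)z = [0,[0,1]]
console.log(JSON.stringify(mul(x, mul(y, z)))); // x(yz) = [0,[0,0]]
```

Here {(xy)z} keeps the vector {(u\cdot v)w}, while {x(yz)} vanishes because {v\cdot w=0}.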

Now we can consider maps {{\mathbb R}\rightarrow {\mathbb R}\oplus V}, which are formal pairs of scalar functions and vector-valued functions. The derivative acts component-wise {(\varphi,F)'=(\varphi',F')} and according to (1), it is indeed a derivation:

\displaystyle     \left\{(\varphi,F)(\psi,G)\right\}'= (\varphi,F)'(\psi,G)+(\varphi,F)(\psi,G)'   \ \ \ \ \ (3)

Both parts of (1) are included in (3) as special cases {(\varphi,0)(0,F)} and {(0,F)(0,G)}.

If (2) has a name, I do not know it. Clifford algebras do a similar thing and are associative, but they are also larger. If I just want to say that (1) is a particular instance of a derivation on an algebra, (2) looks like the right algebra structure to use (maybe with {-u\cdot v} if you insist). If {V} has no inner product, the identity {(\varphi F)' = \varphi' F + \varphi F'} can still be expressed via (2) using the trivial inner product {u\cdot v=0}.

Fourth order obstacle problem

Having solved the obstacle problem for a string, let us turn to a more difficult one, in which an elastic string is replaced with an elastic rod (or plate, if we are looking at a cross-section). Elastic rods resist bending much in the same way that strings don’t. This can be modeled by minimizing the bending energy

\displaystyle B(u) = \frac12 \int_{-2}^2 u''(x)^2 \,dx

subject to the boundary conditions {u(-2)=0=u(2)}, {u'(-2)=0=u'(2)}, and the same obstacle as before: {u(x)\le -\sqrt{1-x^2}}. The boundary conditions for {u'} mean that the rod is clamped on both ends.

As before, the obstacle permits one-sided variations {u+\varphi} with { \varphi\le 0} smooth and compactly supported. The linear term of {B(u+\varphi)} is {\int u''\varphi''}, which after double integration by parts becomes {\int u^{(4)} \varphi}. Since the minimizer satisfies {B(u+\varphi)-B(u)\ge 0}, the conclusion is {\int u^{(4)} \varphi \ge 0 } whenever {\varphi\le 0}. Therefore, {u^{(4)}\le 0} everywhere, at least in the sense of distributions. In the parts where the rod does not touch the obstacle, we can take variations of either sign and obtain {u^{(4)}=0}; that is, {u} is a cubic polynomial there.

So far everything looks similar to the previous post. But the fourth derivative of the obstacle function {-\sqrt{1-x^2}} is {3(4x^2+1)/(1-x^2)^{7/2}}, which is positive. Since the minimizer {u} must satisfy {u^{(4)}\le 0}, it cannot assume the shape of the obstacle. The contact can happen only at isolated points.
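
For the record, the successive derivatives of {g(x)=-\sqrt{1-x^2}} on {(-1,1)} are:

```latex
g'(x) = \frac{x}{(1-x^2)^{1/2}}, \quad
g''(x) = \frac{1}{(1-x^2)^{3/2}}, \quad
g'''(x) = \frac{3x}{(1-x^2)^{5/2}}, \quad
g^{(4)}(x) = \frac{3(1-x^2)+15x^2}{(1-x^2)^{7/2}} = \frac{3(4x^2+1)}{(1-x^2)^{7/2}}
```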

Therefore, {u} is a cubic spline with knots at the contact points and at {\pm 2}. The distributional derivative {u^{(4)}} consists of negative point masses placed at the contact points. Integrating twice, we find that {u''} is a piecewise affine concave function; in particular it is continuous. The minimizer will be {C^2}-smooth in {(-2,2)}.

How many contact points are there? If only one, then by symmetry it must be at {x=0}, and the only three-knot cubic spline that satisfies the boundary conditions and passes through {(0,-1)} with zero derivative is {(-1/4)(1+|x|)(2-|x|)^2}. But it does not stay below the obstacle:

Three-knot spline fails the obstacle condition

With a smaller circle, or a longer bar, the one-contact (three-knot) spline would work. For example, on {[-3,3]}:

With a longer bar, a three-knot spline solves the problem
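
Both cases can be checked numerically. The sketch below assumes the natural generalization of the one-contact spline to a rod clamped at {\pm\ell}: the cubic Hermite piece {-(1+2s)(1-s)^2} with {s=|x|/\ell}, which reduces to {(-1/4)(1+|x|)(2-|x|)^2} when {\ell=2}. The names `oneContactSpline` and `maxGap` are made up for this illustration.

```javascript
// One-contact spline for a rod clamped at x = -len and x = len:
// value -1 and zero slope at 0, value 0 and zero slope at the ends.
function oneContactSpline(x, len) {
  var s = Math.abs(x) / len;
  return -(1 + 2 * s) * (1 - s) * (1 - s);
}

// Largest value of u(x) - (-sqrt(1 - x^2)) on a grid over [-1, 1].
// A positive result means the rod pokes through the obstacle.
function maxGap(len) {
  var n = 2000, worst = -Infinity;
  for (var k = 0; k <= n; k++) {
    var x = -1 + 2 * k / n;
    var gap = oneContactSpline(x, len) + Math.sqrt(Math.max(0, 1 - x * x));
    worst = Math.max(worst, gap);
  }
  return worst;
}

console.log(maxGap(2)); // positive: the obstacle condition fails on [-2, 2]
console.log(maxGap(3)); // not positive: the spline stays below the obstacle on [-3, 3]
```

Near {x=0} the spline behaves like {-1+3x^2/\ell^2} and the obstacle like {-1+x^2/2}, so the one-contact shape can only work when {\ell^2\ge 6}.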

But with our parameters we look for two contact points. By symmetry, the middle piece of the spline must be of the form {q(x)=cx^2+d}. The other two will be {p(x)=(ax+b)(2-x)^2} and {p(-x)}, also by symmetry and to satisfy the boundary conditions at {\pm 2}. At the positive knot {x_0} the following must hold:

\displaystyle p(x_0)=q(x_0)=-\sqrt{1-x_0^2}, \quad p'(x_0)=q'(x_0)=\frac{x_0}{\sqrt{1-x_0^2}}, \quad p''(x_0)=q''(x_0)

where the last condition comes from the fact that {u''} is concave and therefore continuous. With five equations and five unknowns, Maple finds solutions in closed form. One of them has {x_0=0}, {a=b=-1/4} as above, and is not what we want. The other has {x_0=(\sqrt{10}-2)/3\approx 0.3874} and coefficients such as {a=-\frac{1}{3645}\sqrt{505875+164940\sqrt{10}}}. Ugly, but it works:

Minimizer
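
The closed form is easy to cross-check without Maple. Eliminating {a,b,c,d} from the five conditions, with the slope at the knot equal to the derivative {x_0/\sqrt{1-x_0^2}} of the obstacle, leaves the quadratic {3x_0^2+4x_0-2=0} for the nonzero knot. This elimination is a hand computation offered as a sketch, not the Maple output, and the variable names are illustrative.

```javascript
// Two-contact minimizer on [-2, 2]: q(x) = c x^2 + d for |x| <= x0,
// p(|x|) = (a|x| + b)(2 - |x|)^2 for x0 <= |x| <= 2.
var x0 = (Math.sqrt(10) - 2) / 3;        // positive root of 3x^2 + 4x - 2 = 0
var root = Math.sqrt(1 - x0 * x0);       // the obstacle equals -root at the knot
var h = 2 - x0;

var c = 1 / (2 * root);                  // from q'(x0) = x0 / root
var d = -root - c * x0 * x0;             // from q(x0) = -root
var a = -4 * c / (3 * h * h);            // from matching p', p'' at x0 and p'(2) = 0
var b = -root / (h * h) - a * x0;        // from p(x0) = -root

function u(x) {
  var t = Math.abs(x);
  return t <= x0 ? c * t * t + d : (a * t + b) * (2 - t) * (2 - t);
}

var aClosed = -Math.sqrt(505875 + 164940 * Math.sqrt(10)) / 3645;
console.log(x0, a, aClosed);             // a and aClosed agree
console.log(u(0) + 1);                   // small negative number: the rod hangs just below the circle at x = 0
```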

This time, the bar does stay below the obstacle, touching it only at two points. The amount by which it comes off the obstacle in the middle is very small. Here is the difference {u(x)-(-\sqrt{1-x^2})}:

Deviation from the obstacle

And this is the second derivative {u''}.

Second derivative is Lipschitz continuous

Again, the minimizer has a higher degree of regularity (Lipschitz continuous second derivative) than a generic element of the function space in which minimization takes place (square-integrable second derivative).

If the rod is made shorter (and the obstacle stays the same), the two-contact nature of the solution becomes more pronounced.

Shorter rod

Assuming the rod stays in one piece, of course.

Second order obstacle problem

Imagine a circular (or cylindrical, in cross-section) object being supported by an elastic string. Like this:

Obstacle problem

To actually compute the equilibrium mass-string configuration, I would have to take some values for the mass of the object and for the resistance of the string. Instead, I simply chose the position of the object: it is the unit circle with center at {(0,0)}. It remains to find the equilibrium shape of the string. The shape is described by equation {y=u(x)} where {u} minimizes the appropriate energy functional subject to boundary conditions {u(-2)=0=u(2)} and the obstacle {u(x)\le -\sqrt{1-x^2}}. The functional could be the length

\displaystyle  L(u) = \int_{-2}^2 \sqrt{1+u'(x)^2}\,dx

or its quadratization

\displaystyle E(u) = \frac12 \int_{-2}^2 u'(x)^2 \,dx

The second one is nicer because it yields linear Euler-Lagrange equation/inequality. Indeed, the obstacle permits one-sided variations {u+\varphi} with { \varphi\le 0} smooth and compactly supported. The linear term of {E(u+\varphi)} is {\int u'\varphi'}, which after integration by parts becomes {-\int u'' \varphi}. Since the minimizer satisfies {E(u+\varphi)-E(u)\ge 0}, the conclusion is {\int u'' \varphi \le 0 } whenever {\varphi\le 0}. Therefore, {u''\ge 0} everywhere (at least in the sense of distributions), which means {u} is a convex function. In the parts where the string is free, we can do variation of either sign and obtain {u''=0}; that is, {u} is an affine function there.

The convexity of {u} in the part where it touches the obstacle is consistent with the shape of the obstacle: the string can assume the same shape as the obstacle.

The function {u} can now be determined geometrically: the only way the function can come off the circle, stay convex, and meet the boundary condition is by leaving the circle along the tangents that pass through the endpoints {(\pm 2,0)}. This is the function pictured above. Its derivative is continuous: Lipschitz continuous, to be precise.

First derivative is Lipschitz continuous
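
Explicitly (a short computation, included as a check): the tangent line from {(2,0)} touches the unit circle at {(1/2,-\sqrt{3}/2)}, where the radius is perpendicular to the tangent, and it has slope {1/\sqrt{3}}. By symmetry,

```latex
u(x) = \begin{cases} -\sqrt{1-x^2}, & |x|\le 1/2; \\ (|x|-2)/\sqrt{3}, & 1/2\le |x|\le 2 \end{cases}
```

At {x=1/2} both pieces have the value {-\sqrt{3}/2} and the slope {1/\sqrt{3}}, while {u''} jumps from {(1-x^2)^{-3/2}=8/(3\sqrt{3})} to {0}.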

The second derivative does not exist at the transition points. Still, the minimizer has a higher degree of regularity (Lipschitz continuous derivative) than a generic element of the function space in which minimization takes place (square-integrable derivative).

As a bonus, the minimizer of energy {E} turns out to minimize the length {L} as well.

All in all, this was an easy problem. Next post will be on its fourth-order version.

Biased and unbiased mollification

When we want to smoothen (mollify) a given function {f:{\mathbb R}\rightarrow{\mathbb R}}, the standard recipe suggests: take the {C^{\infty}}-smooth bump function

\displaystyle    \varphi(t) = \begin{cases} c\, \exp\{-1/(1-t^2)\}\quad & |t|<1; \\   0 \quad & |t|\ge 1 \end{cases}

where {c} is chosen so that {\int_{{\mathbb R}} \varphi=1} (for the record, {c\approx 2.2523}).

Standard bump

Make the bump narrow and tall: {\varphi_{\delta}(t)=\delta^{-1}\varphi(t/\delta)}. Then define {f_\delta = f*\varphi_\delta}, that is

\displaystyle    f_\delta(x) = \int_{\mathbb R} f(x-t) \varphi_\delta(t)\,dt = \int_{\mathbb R} f(t) \varphi_\delta(x-t)\,dt

The second form of the integral makes it clear that {f_\delta} is infinitely differentiable. And it is easy to prove that for any continuous {f} the approximation {f_\delta} converges to { f} uniformly on compact subsets of {{\mathbb R}}.

The choice of the particular mollifier given above is quite natural: we want a {C^\infty} function with compact support (to avoid any issues with fast-growing functions {f}), so it cannot be analytic. And functions like {\exp(-1/t)} are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?

Well, one may ask for a good rate of convergence. If {f} is an ugly function like {f(x)=\sqrt{|x|}}, then we probably should not expect fast convergence. But it could be something like {f(x)=|x|^7}, a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start {f\in C^6} that we are given?

No, it will not. The limiting factor is not the lack of seventh derivative at {x=0}; it is the presence of (nonzero) second derivative at {x\ne 0}. To study this effect in isolation, consider the function {f(x)=x^2}, which has nothing beyond the second derivative. Here it is together with {f_{0.1}}: the red and blue graphs are nearly indistinguishable.

Good approximation

But upon closer inspection, {f_{0.1}} misses the target by almost {2\cdot 10^{-3}}. And not only around the origin: the difference {f_{0.1}-f} is constant.

But it overshoots the target

With {\delta=0.01} the approximation is better.

Better approximation

But upon closer inspection, {f_{0.01}} misses the target by almost {2\cdot 10^{-5}}.

Still overshoots

And so it continues, with the error of order {\delta^2}. And here is where it comes from:

\displaystyle f_\delta(0) = \int_{\mathbb R} t^2\varphi_\delta(t)\,dt = \delta^{-1} \int_{\mathbb R} t^2\varphi(t/\delta)\,dt  = \delta^{2} \int_{\mathbb R} s^2\varphi(s)\,ds

The root of the problem is the nonzero second moment {\int_{\mathbb R} s^2\varphi(s)\,ds \approx 0.158}. But of course, this moment cannot be zero if {\varphi} does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as {\varphi}, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with nonzero second derivative better than to the order {\delta^2}.
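
The constants quoted above are easy to reproduce. This is only a sketch: it uses a plain midpoint rule on the bump {\exp\{-1/(1-t^2)\}} (the exponent has to be negative for the function to vanish at {\pm 1}), and `bump`, `integrate`, and the number of nodes are arbitrary choices made for illustration.

```javascript
// Unnormalized bump exp(-1/(1 - t^2)) on (-1, 1), zero elsewhere.
function bump(t) {
  return Math.abs(t) < 1 ? Math.exp(-1 / (1 - t * t)) : 0;
}

// Midpoint rule for the integral of g over [-1, 1].
function integrate(g, n) {
  var h = 2 / n, sum = 0;
  for (var k = 0; k < n; k++) {
    sum += g(-1 + (k + 0.5) * h);
  }
  return sum * h;
}

var n = 200000;
var c = 1 / integrate(bump, n);
var secondMoment = c * integrate(function (s) { return s * s * bump(s); }, n);

console.log(c);                   // normalizing constant, about 2.2523
console.log(secondMoment);        // about 0.158
console.log(0.01 * secondMoment); // constant bias f_delta - f for f(x) = x^2 and delta = 0.1
```

For {f(x)=x^2} the first moment of {\varphi_\delta} vanishes by symmetry, so {f_\delta(x)-f(x)} equals {\delta^2} times the second moment at every {x}, not only at the origin.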

Let’s find a mollifier without such limitations; that is, with zero moments of all positive orders. One way to do it is to use the Fourier transform. Since {\int_{\mathbb R} t^n \varphi(t)\,dt } is a multiple of {\widehat{\varphi}^{(n)}(0)}, it suffices to find a nice function {\psi} such that {\psi(0)=1} and {\psi^{(n)}(0)=0} for {n\ge 1}; the mollifier will be the inverse Fourier transform of {\psi}.

As an example, I took something similar to the original {\varphi}, but with a flat top:

\displaystyle  \psi(\xi) = \begin{cases} 1 \quad & |\xi|\le 0.1; \\    \exp\{1-1/(1-(|\xi|-0.1)^2)\} \quad & 0.1<|\xi|<1.1; \\  0\quad & |\xi|\ge 1.1  \end{cases}

Fourier transform of unbiased mollifier

The inverse Fourier transform of {\psi} is a mollifier that reproduces all polynomials exactly: {p*\varphi = p} for any polynomial. Here it is:

Unbiased mollifier

Since I did not make {\psi} very carefully (its second derivative is discontinuous at {\pm 0.1}), the mollifier {\varphi} has a moderately heavy tail. With a more careful construction it would decay faster than any power of {t}. However, it cannot be compactly supported. Indeed, if {\varphi} were compactly supported, then {\widehat{\varphi}} would be real-analytic; that is, represented by its power series centered at {\xi=0}. But that power series is

\displaystyle 1+0+0+0+0+0+0+0+0+0+\dots

The idea of using negative weights in the process of averaging {f} looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.

Credit: I got the idea of this post from the following remark by fedja on MathOverflow:

The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond {C^2} and {\delta^2} with them.

Entropic uncertainty

Having considered the SMBC version of the Fourier transform, it is time to take a look at the traditional one:

\displaystyle \widehat{f}(\xi)=\int_{{\mathbb R}}f(x)e^{-2\pi i \xi x}\,dx

(I am not going to worry about the convergence of any integrals in this post.) It is obvious that for any {\xi\in{\mathbb R}}

\displaystyle |\widehat{f}(\xi)|\le \int_{{\mathbb R}}|f(x)|\,dx

which can be tersely stated as {\|\widehat{f}\|_\infty\le \|f\|_1} using the {L^p} norm notation. A less obvious, but more important, relation is {\|\widehat{f}\|_2= \|f\|_2}. Interpolating between {p=1} and {p=2} we obtain the Hausdorff-Young inequality {\|\widehat{f}\|_q\le \|f\|_p} for {1\le p\le 2}. Here and in what follows {q=p/(p-1)}.

Summarizing the above, the function {r(p)=\|\widehat{f}\|_q/\|f\|_p} does not exceed {1} on the interval {[1,2]} and attains the value {1} at {p=2}. This brings back the memories of Calculus I and the fun we had finding the absolute maximum of a function on a closed interval. More specifically, it brings the realization that {r'(2)\ge 0}. (I do not worry about differentiability either.)

What does the inequality {r'(2)\ge 0} tell us about {f}? When writing it out, it is better to work with {(\log r)' = r'/r}, avoiding another memory of Calculus I: the Quotient Rule.

\displaystyle    \log r = \frac{1}{q} \log \int_{{\mathbb R}} |\widehat{f}(\xi )|^q \,d\xi   -\frac{1}{p} \log \int_{{\mathbb R}} |f(x)|^{p} \,dx

To differentiate this, we have to recall {(a^p)'=a^p \log a}, but nothing more unpleasant happens. Normalizing {\|f\|_2=1}, so that both logarithms vanish at {p=2}, we get

\displaystyle    (\log r)'(2) =  - \frac12 \int_{{\mathbb R}} |\widehat{f}(\xi )|^2 \,\log |\widehat{f}(\xi)|\,d\xi - \frac12 \int_{{\mathbb R}} |f(x)|^2 \,\log |f(x)|\,dx

Here the integral with {\widehat{f}} gets the minus sign from the chain rule: {(p/(p-1))'=-1} at {p=2}. In terms of the Shannon entropy {H(\phi)=-\int |\phi|\log |\phi| }, the inequality {(\log r)'(2)\ge 0} becomes simply

\displaystyle    H(|f|^2)+H(|\widehat{f}|^2)\ge 0    \ \ \ \ \ (1)

Inequality (1) was proved by I. Hirschman in 1957, and I followed his proof above. The left side of (1) is known as the entropic uncertainty (or Hirschman uncertainty) of {f}. As Hirschman himself conjectured, (1) is not sharp: it can be improved to

\displaystyle    H(|f|^2)+H(|\widehat{f}|^2)\ge 1-\log 2   \ \ \ \ \ (2)

The reason is that the Hausdorff-Young inequality {r(p)\le 1} is itself not sharp for {1<p<2}. It took about twenty years until W. Beckner proved the sharp form of the Hausdorff-Young inequality in his Ph.D. thesis (1975):

\displaystyle    r(p) \le \sqrt{p^{1/p}/q^{1/q}}   \ \ \ \ \ (3)

Here is the plot of the upper bound in (3):

Sharp constant
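
The bound in (3) is easy to tabulate. In this sketch `beckner` is a made-up name for the right-hand side of (3), and the slope at {p=2} is estimated by a central difference (formally extending the formula to {p} slightly larger than 2) and compared with the value {(1-\log 2)/4} used below.

```javascript
// Right-hand side of (3): sqrt(p^(1/p) / q^(1/q)) with q = p / (p - 1).
function beckner(p) {
  var q = p / (p - 1);
  return Math.sqrt(Math.pow(p, 1 / p) / Math.pow(q, 1 / q));
}

[1.2, 1.4, 1.6, 1.8, 2].forEach(function (p) {
  console.log(p, beckner(p));
});

// Central-difference estimate of the slope at p = 2.
var step = 1e-4;
var slope = (beckner(2 + step) - beckner(2 - step)) / (2 * step);
console.log(slope, (1 - Math.log(2)) / 4); // the two numbers agree to several digits
```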

Since the graph of {r} stays below this curve and touches it at {(2,1)}, the derivative {r'(2)} is no less than the slope of the curve at {p=2}, which is {(1-\log 2)/4}. Recalling that {H(|f|^2)+H(|\widehat{f}|^2)=4r'(2)}, we arrive at (2).
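
A Gaussian shows that (2) cannot be improved further. This is a standard computation, included here as a check: take {f(x)=2^{1/4}e^{-\pi x^2}}, so that {\|f\|_2=1} and {\widehat{f}=f}. Then {|f|^2} is the normal density with variance {\sigma^2=1/(4\pi)}, whose entropy is {\frac12\log(2\pi e\sigma^2)}, and

```latex
H(|f|^2)+H(|\widehat{f}|^2) = 2\cdot \frac12 \log\left(2\pi e\cdot\frac{1}{4\pi}\right) = \log\frac{e}{2} = 1-\log 2
```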

The best known form of the uncertainty principle is due to H. Weyl:

\displaystyle    \|\,x f\,\|_2 \cdot \|\,\xi \widehat{f}\,\|_2 \ge \frac{1}{4\pi}\|f\|_2^2   \ \ \ \ \ (4)

Although (4) can be derived from (2), this route is rather inefficient: Beckner’s theorem is hard, while a direct proof of (4) takes only a few lines: integration by parts {\int |f(x)|^2\,dx = -\int x \frac{d}{dx}|f(x)|^2\,dx }, chain rule and the Cauchy-Schwarz inequality.

But we can take another direction and use (1) (not the hard, sharp form (2)) to obtain the following inequality, also due to Hirschman: for every {\alpha>0} there is {C_\alpha>0} such that

\displaystyle    \|\,|x|^\alpha f\,\|_2 \cdot \|\,|\xi|^\alpha \widehat{f}\,\|_2 \ge C_\alpha \|f\|_2^2   \ \ \ \ \ (5)

It is convenient to normalize {f} so that {\|f\|_2=1}. This makes {\rho =|f|^2} a probability distribution (and {|\widehat f |^2 } as well). Our goal is to show that for any probability distribution {\rho}

\displaystyle    H(\rho) \le \alpha^{-1} \log \int |x|^\alpha \rho(x)\,dx + B_\alpha    \ \ \ \ \ (6)

where {B_\alpha} depends only on {\alpha}. Clearly, (1) and (6) imply (5).

A peculiar feature of (6) is that {x} appears in the integral on the right, but not on the left. This naturally makes one wonder how (6) behaves under scaling {\rho_\lambda(x)=\lambda \rho (\lambda x)}. Well, wonder no more—

\displaystyle    H(\rho_\lambda)= H(\rho) - \log \lambda \int \rho = H(\rho)- \log\lambda

and

\displaystyle     \int |x|^\alpha \rho_\lambda (x)\,dx =\lambda^{-\alpha}    \int |x|^\alpha \rho (x)\,dx

Thus, both sides of (6) change by {-\log \lambda}. The inequality passed the scaling test, and now we turn scaling to our advantage by making {\int |x|^\alpha \rho(x)\,dx =1}. This reduces (6) to {H(\rho)\le B_\alpha}.

Now comes a clever trick (due to Beckner): introduce another probability measure {d\gamma = A_\alpha \exp(-|x|^\alpha/\alpha)\,dx} where {A_\alpha} is a normalizing factor. Let {\phi(x) = A_\alpha^{-1}\exp( |x|^\alpha/\alpha)\,\rho (x)}, so that {\int \phi\,d\gamma=1}. By Jensen’s inequality,

\displaystyle    \int \phi \log \phi \,d\gamma \ge \int \phi \,d\gamma \cdot \log \int \phi \,d\gamma =0

On the other hand,

\displaystyle    \int \phi \log \phi \,d\gamma = \int \rho \log \phi \,dx    = -\log A_\alpha+\alpha^{-1} -H(\rho)

and we have the desired bound {H(\rho)\le B_\alpha}.
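
The constants can be made explicit. Substituting {t=x^\alpha/\alpha} turns the normalizing integral into a Gamma function, and the last display gives {B_\alpha=\alpha^{-1}-\log A_\alpha}:

```latex
A_\alpha^{-1} = \int_{\mathbb R} e^{-|x|^\alpha/\alpha}\,dx = 2\alpha^{1/\alpha-1}\,\Gamma(1/\alpha),
\qquad
B_\alpha = \frac{1}{\alpha} + \log\left(2\alpha^{1/\alpha-1}\,\Gamma(1/\alpha)\right)
```

For {\alpha=2} this gives {A_2^{-1}=\sqrt{2\pi}} and {B_2=\frac12\log(2\pi e)}, the entropy of the standard normal distribution, which is the extremal case under the constraint {\int x^2\rho(x)\,dx=1}.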

Biographical note

Although Wikipedia refers to Isidore Isaac Hirschman, Jr. as a living person, he died in 1990. From the MAA Digital Library:

Halmos photographed analyst Isidore Hirschman (1922-1990) in June of 1960. Hirschman earned his Ph.D. in 1947 from Harvard with the dissertation “Some Representation and Inversion Problems for the Laplace Transform,” written under David Widder. After writing ten papers together, Hirschman and Widder published the book The Convolution Transform in 1955 (Princeton University Press; now available from Dover Publications). Hirschman spent most of his career (1949-1978) at Washington University in St. Louis, Missouri, where he published mainly in harmonic analysis and operator theory.

Further reading:

  1. Beckner, William. “Inequalities in Fourier analysis.” Ann. of Math. (2) 102.1 (1975): 159-182.
  2. Cowling, Michael G., and John F. Price. “Bandwidth versus time concentration: the Heisenberg-Pauli-Weyl inequality.” SIAM Journal on Mathematical Analysis 15.1 (1984): 151-165.
  3. Folland, Gerald B., and Alladi Sitaram. “The uncertainty principle: a mathematical survey.” Journal of Fourier Analysis and Applications 3.3 (1997): 207-238.
  4. Hirschman Jr, I. I. “A note on entropy.” American Journal of Mathematics 79.1 (1957): 152-156.

Life after Google Reader, day 2

Having exported the Shared/Starred items from Google Reader, I considered several ways of keeping them around, and eventually decided to put them right here (see “Reading Lists” in the main navigation bar). My reasons were:

  • This site is unlikely to disappear overnight, since I am paying WordPress to host it.
  • Simple, structured HTML is not much harder for computers to parse than JSON, and is easier for humans to use.
  • Someone else might be interested in the stuff that I find interesting.

First step was to trim down shared.json and starred.json, and flatten them into a comma-separated list. For this I used Google Refine, which, coincidentally, is another tool that Google no longer develops. (It is being continued by volunteers as the open-source project OpenRefine.) The only attributes I kept were

  • Title
  • Hyperlink
  • Author(s)
  • Feed name
  • Timestamp
  • Summary

Springer feeds gave me more headache than anything else in this process. They bundled the authors, abstract, and other information into a single string, within which they were divided only by presentational markup. I cleaned up some of Springer-published entries, but also deleted many more than I kept. “Sorry, folks – your paper looks interesting, but it’s in Springer.”

Compared to the clean-up, conversion to HTML (in a Google Spreadsheet) took no time at all. I split the list into chunks 2008-09, 2010-11, 2012, and 2013, the latter being updated periodically. Decided not to do anything about LaTeX markup; much of it would not compile anyway.

Last question: how to keep the 2013 list up-to-date, given that Netvibes offers no export feature for marked items? jQuery to the rescue:

Now with a freehand circle!

The script parses the contents of the Read Later tab, extracts the items mentioned above, and wraps them into the same HTML format as the existing reading list.

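$(document).ready(function() {
  // Turn every item on the Read Later tab into reading-list HTML and log it.
  function exportSaved() {
    var html = '';
    $('.hentry.readlater').each(function() {
      // Keep only the part of the title markup that precedes the trailing <span>.
      var title = $('.entry-innerTitle', this).html().split('<span')[0];
      var hyperref = $('.entry-innerTitle', this).attr('href');
      // +'' guards against a missing element; /\n/g replaces every newline, not only the first.
      var summary = ($('.entry-content', this).html() + '').replace(/\n/g, ' ');
      var timestamp = $('.entry-innerPublished', this).attr('time');
      var feed = $('.entry-innerFeedName', this).html();
      // Drop the first five characters of the author field.
      var author = ($('.author', this).html() + '').slice(5);
      // The timestamp is treated as seconds; Date expects milliseconds.
      var date = new Date(timestamp * 1000);
      var month = ('0' + (date.getMonth() + 1)).slice(-2);
      var day = ('0' + date.getDate()).slice(-2);
      var readableDate = date.getFullYear() + '-' + month + '-' + day;
      html += "<div class='list-item'>" +
        "<h4 class='title'><strong><a href='" + hyperref + "'>" + title + "</a></strong></h4>" +
        "<h4 class='author'>" + author + "</h4>" +
        "<h5 class='source'>" + feed + " : " + readableDate + "</h5>" +
        "<div class='summary' style='font-size:80%;line-height:140%;'>" + summary + "</div>" +
        "</div>";
    });
    console.log(html);
  }
  // Assumption: the export control is a plain <button>; $('export') on its own would match nothing.
  $('<button>export</button>').insertAfter($('#switch-view')).click(function() { exportSaved(); });
});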

The script accesses the page via an extension in Google Chrome. It labels HTML elements with classes for the convenience of future parsing, not for styling. I had to use inline styles to keep WordPress happy. Dates are formatted according to ISO 8601, making it possible to quickly jump to any month using in-page search for “yyyy-mm”. Which would not work with mm/dd/yyyy, of course.
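
The date handling can be isolated into a small helper. This is a sketch with a made-up name, `isoDate`; like the script above, it treats the timestamp as seconds and formats it in the browser's local time zone.

```javascript
// Format a Unix timestamp (in seconds) as yyyy-mm-dd in local time.
function isoDate(timestamp) {
  var date = new Date(timestamp * 1000);
  var month = ('0' + (date.getMonth() + 1)).slice(-2); // getMonth() is zero-based
  var day = ('0' + date.getDate()).slice(-2);
  return date.getFullYear() + '-' + month + '-' + day;
}

// Local noon on 5 March 2013, converted to seconds.
var noon = new Date(2013, 2, 5, 12).getTime() / 1000;
console.log(isoDate(noon)); // 2013-03-05
```

Zero-padded dates in this form also sort chronologically when compared as plain strings.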

Unfortunately, some publishers still insert the authors’ names in the summary, unlike arXiv, which serves reasonably structured feeds, albeit with unnecessary hyperlinks in the author field. While I am unhappy with such things (and the state of the web in general), my setup works for me, and this concludes my two-post detour from mathematics.

xkcd 743: Infrastructures
