# Entropic uncertainty

Having considered the SMBC version of the Fourier transform, it is time to take a look at the traditional one: $\displaystyle \widehat{f}(\xi)=\int_{{\mathbb R}}f(x)e^{-2\pi i \xi x}\,dx$

(I am not going to worry about the convergence of any integrals in this post.) It is obvious that for any ${\xi\in{\mathbb R}}$ $\displaystyle |\widehat{f}(\xi)|\le \int_{{\mathbb R}}|f(x)|\,dx$

which can be tersely stated as ${\|\widehat{f}\|_\infty\le \|f\|_1}$ using the ${L^p}$ norm notation. A less obvious, but more important, relation is ${\|\widehat{f}\|_2= \|f\|_2}$. Interpolating between ${p=1}$ and ${p=2}$ we obtain the Hausdorff-Young inequality ${\|\widehat{f}\|_q\le \|f\|_p}$ for ${1\le p\le 2}$. Here and in what follows ${q=p/(p-1)}$.

Summarizing the above, the function ${r(p)=\|\widehat{f}\|_q/\|f\|_p}$ does not exceed ${1}$ on the interval ${[1,2]}$ and attains the value ${1}$ at ${p=2}$. This brings back the memories of Calculus I and the fun we had finding the absolute maximum of a function on a closed interval. More specifically, it brings the realization that ${r'(2)\ge 0}$. (I do not worry about differentiability either.)

What does the inequality ${r'(2)\ge 0}$ tell us about ${f}$? When writing it out, it is better to work with ${(\log r)' = r'/r}$, avoiding another memory of Calculus I: the Quotient Rule. $\displaystyle \log r = \frac{1}{q} \log \int_{{\mathbb R}} |\widehat{f}(\xi )|^q \,d\xi -\frac{1}{p} \log \int_{{\mathbb R}} |f(x)|^{p} \,dx$

To differentiate this, we have to recall ${(a^p)'=a^p \log a}$, but nothing more unpleasant happens. Normalizing ${\|f\|_2=1}$ (hence ${\|\widehat{f}\|_2=1}$), so that both logarithms vanish at ${p=2}$, we get $\displaystyle (\log r)'(2) = - \frac12 \int_{{\mathbb R}} |\widehat{f}(\xi )|^2 \,\log |\widehat{f}(\xi)|\,d\xi - \frac12 \int_{{\mathbb R}} |f(x)|^2 \,\log |f(x)|\,dx$

Here the integral with ${\widehat{f}}$ gets the minus sign from the chain rule: ${(p/(p-1))'=-1}$ at ${p=2}$. In terms of the Shannon entropy ${H(\phi)=-\int |\phi|\log |\phi| }$, the inequality ${(\log r)'(2)\ge 0}$ becomes simply $\displaystyle H(|f|^2)+H(|\widehat{f}|^2)\ge 0 \ \ \ \ \ (1)$

Inequality (1) was proved by I. Hirschman in 1957, and I followed his proof above. The left side of (1) is known as the entropic uncertainty (or Hirschman uncertainty) of ${f}$. As Hirschman himself conjectured, (1) is not sharp: it can be improved to $\displaystyle H(|f|^2)+H(|\widehat{f}|^2)\ge 1-\log 2 \ \ \ \ \ (2)$
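A quick numerical sanity check of (2), offered as a sketch rather than a proof. The test function ${f(x)=2^{1/4}e^{-\pi x^2}}$ is my choice: it has ${\|f\|_2=1}$ and is its own Fourier transform under the convention above, so the two entropies coincide. The helper names, the integration window and the midpoint rule are illustrative assumptions.

```python
import math

def entropy(density, lo=-10.0, hi=10.0, n=20_000):
    """Approximate H(rho) = -∫ rho log rho dx with the midpoint rule."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        rho = density(lo + (k + 0.5) * h)
        if rho > 0.0:  # treat 0 * log 0 as 0
            total -= rho * math.log(rho) * h
    return total

def gaussian_sq(x):
    """|f(x)|^2 for f(x) = 2^(1/4) exp(-pi x^2), which has L^2 norm 1."""
    return math.sqrt(2.0) * math.exp(-2.0 * math.pi * x * x)

# f equals its own Fourier transform, so H(|f|^2) = H(|f^|^2).
uncertainty = 2.0 * entropy(gaussian_sq)
sharp_bound = 1.0 - math.log(2.0)

print(f"entropic uncertainty of the Gaussian: {uncertainty:.6f}")
print(f"1 - log 2:                            {sharp_bound:.6f}")
```

The two printed numbers should agree to within the quadrature error, which is consistent with Gaussians being the case of equality in (2).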

The reason is that the Hausdorff-Young inequality ${r(p)\le 1}$ is itself not sharp for ${1<p<2}$. It took about twenty years until W. Beckner proved the sharp form of the Hausdorff-Young inequality in his Ph.D. thesis (1975): $\displaystyle r(p) \le \sqrt{p^{1/p}/q^{1/q}} \ \ \ \ \ (3)$

Here is the plot of the upper bound in (3):

Since the graph of ${r}$ stays below this curve and touches it at ${(2,1)}$, the derivative ${r'(2)}$ is no less than the slope of the curve at ${p=2}$, which is ${(1-\log 2)/4}$. Recalling that ${H(|f|^2)+H(|\widehat{f}|^2)=4r'(2)}$, we arrive at (2).
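The slope claim is easy to check numerically. The sketch below evaluates the bound in (3) and estimates its derivative at ${p=2}$ by a central difference. The function names and the step size are my own choices. The formula itself extends smoothly past ${p=2}$, which is what makes a two-sided difference legitimate here.

```python
import math

def beckner_bound(p):
    """Right side of (3): sqrt(p^(1/p) / q^(1/q)) with q = p/(p-1), p > 1."""
    q = p / (p - 1.0)
    return math.sqrt(p ** (1.0 / p) / q ** (1.0 / q))

def slope_at_2(h=1e-5):
    """Central-difference estimate of the slope of the bound at p = 2."""
    return (beckner_bound(2.0 + h) - beckner_bound(2.0 - h)) / (2.0 * h)

expected = (1.0 - math.log(2.0)) / 4.0

# A few values of the bound, standing in for the plot.
for p in (1.1, 1.25, 1.5, 1.75, 2.0):
    print(f"p = {p:4.2f}   bound = {beckner_bound(p):.4f}")

print(f"numerical slope at p = 2: {slope_at_2():.6f}")
print(f"(1 - log 2)/4:            {expected:.6f}")
```

The bound stays below ${1}$ for ${1<p<2}$ and equals ${1}$ at ${p=2}$, so the curve does touch the point ${(2,1)}$.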

The best known form of the uncertainty principle is due to H. Weyl: $\displaystyle \|\,x f\,\|_2 \cdot \|\,\xi \widehat{f}\,\|_2 \ge \frac{1}{4\pi}\|f\|_2^2 \ \ \ \ \ (4)$

Although (4) can be derived from (2), this route is rather inefficient: Beckner’s theorem is hard, while a direct proof of (4) takes only a few lines: integration by parts ${\int |f(x)|^2\,dx = -\int x \frac{d}{dx}|f(x)|^2\,dx }$, chain rule and the Cauchy-Schwarz inequality.
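For the record, here are those few lines written out. This is a sketch under assumptions not spelled out above: ${f}$ is smooth and decays fast enough for the boundary term to vanish, and the last step uses Plancherel together with the identity ${\widehat{f'}(\xi)=2\pi i\xi\,\widehat{f}(\xi)}$. $\displaystyle \|f\|_2^2 = -\int x\,\frac{d}{dx}|f(x)|^2\,dx = -2\,\mathrm{Re}\int x\,\overline{f(x)}\,f'(x)\,dx \le 2\,\|\,x f\,\|_2\,\|f'\|_2 = 4\pi\,\|\,x f\,\|_2\,\|\,\xi \widehat{f}\,\|_2$

Dividing by ${4\pi}$ gives (4).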

But we can take another direction and use (1) (not the hard, sharp form (2)) to obtain the following inequality, also due to Hirschman: for every ${\alpha>0}$ there is ${C_\alpha>0}$ such that $\displaystyle \|\,|x|^\alpha f\,\|_2 \cdot \|\,|\xi|^\alpha \widehat{f}\,\|_2 \ge C_\alpha \|f\|_2^2 \ \ \ \ \ (5)$

It is convenient to normalize ${f}$ so that ${\|f\|_2=1}$. This makes ${\rho =|f|^2}$ a probability distribution (and ${|\widehat f |^2 }$ as well). Our goal is to show that for any probability distribution ${\rho}$ $\displaystyle H(\rho) \le \alpha^{-1} \log \int |x|^\alpha \rho(x)\,dx + B_\alpha \ \ \ \ \ (6)$

where ${B_\alpha}$ depends only on ${\alpha}$. Clearly, (1) and (6) imply (5).
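To spell out the "clearly" (a sketch; the constant below is one admissible choice, not claimed to be optimal): with ${\|f\|_2=1}$, apply (6) with exponent ${2\alpha}$ to ${|f|^2}$ and to ${|\widehat{f}|^2}$, noting that ${\int |x|^{2\alpha}|f(x)|^2\,dx=\|\,|x|^\alpha f\,\|_2^2}$. Adding the two inequalities and using (1), $\displaystyle 0\le H(|f|^2)+H(|\widehat{f}|^2)\le \alpha^{-1}\log\Big(\|\,|x|^\alpha f\,\|_2\cdot\|\,|\xi|^\alpha \widehat{f}\,\|_2\Big)+2B_{2\alpha}$

which rearranges to (5) with ${C_\alpha=\exp(-2\alpha B_{2\alpha})}$. The case of general ${f}$ follows because both sides of (5) scale like ${\|f\|_2^2}$.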

A peculiar feature of (6) is that ${x}$ appears in the integral on the right, but not on the left. This naturally makes one wonder how (6) behaves under scaling ${\rho_\lambda(x)=\lambda \rho (\lambda x)}$. Well, wonder no more— $\displaystyle H(\rho_\lambda)= H(\rho) - \log \lambda \int \rho = H(\rho)- \log\lambda$

and $\displaystyle \int |x|^\alpha \rho_\lambda (x)\,dx =\lambda^{-\alpha} \int |x|^\alpha \rho (x)\,dx$

Thus, both sides of (6) change by ${-\log \lambda}$. The inequality passed the scaling test, and now we turn scaling to our advantage by making ${\int |x|^\alpha \rho(x)\,dx =1}$. This reduces (6) to ${H(\rho)\le B_\alpha}$.

Now comes a clever trick (due to Beckner): introduce another probability measure ${d\gamma = A_\alpha \exp(-|x|^\alpha/\alpha)\,dx}$ where ${A_\alpha}$ is a normalizing factor. Let ${\phi(x) = A_\alpha^{-1}\exp( |x|^\alpha/\alpha)\,\rho (x)}$, so that ${\int \phi\,d\gamma=1}$. By Jensen’s inequality, $\displaystyle \int \phi \log \phi \,d\gamma \ge \int \phi \,d\gamma \cdot \log \int \phi \,d\gamma =0$

On the other hand, $\displaystyle \int \phi \log \phi \,d\gamma = \int \rho \log \phi \,dx = -\log A_\alpha+\alpha^{-1} -H(\rho)$

and we have the desired bound ${H(\rho)\le B_\alpha}$ with ${B_\alpha=\alpha^{-1}-\log A_\alpha}$.
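The constant can be made explicit. The sketch below assumes the closed form ${A_\alpha^{-1}=\int \exp(-|x|^\alpha/\alpha)\,dx=2\alpha^{1/\alpha-1}\Gamma(1/\alpha)}$, which comes from the substitution ${t=x^\alpha/\alpha}$ and is not stated above. It then tests ${H(\rho)\le B_\alpha}$ on a uniform density, which is my own choice of example; the function names are illustrative.

```python
import math

def B(alpha):
    """B_alpha = 1/alpha - log A_alpha, using the assumed closed form
    1/A_alpha = 2 * alpha^(1/alpha - 1) * Gamma(1/alpha)."""
    inv_A = 2.0 * alpha ** (1.0 / alpha - 1.0) * math.gamma(1.0 / alpha)
    return 1.0 / alpha + math.log(inv_A)

def uniform_entropy(alpha):
    """Entropy of the uniform density on [-a, a], where a is chosen so that
    ∫ |x|^alpha rho dx = a^alpha / (alpha + 1) equals 1."""
    a = (alpha + 1.0) ** (1.0 / alpha)
    return math.log(2.0 * a)

for alpha in (0.5, 1.0, 2.0, 4.0):
    print(f"alpha = {alpha:3.1f}   H(uniform) = {uniform_entropy(alpha):.4f}"
          f"   B_alpha = {B(alpha):.4f}")
```

For ${\alpha=2}$ the value is ${B_2=\frac12\log(2\pi e)}$, the entropy of the standard normal density. This is as it should be: Jensen's inequality becomes an equality when ${\phi\equiv 1}$, that is, when ${\rho}$ is the density of ${\gamma}$.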

## Biographical note

Although Wikipedia refers to Isidore Isaac Hirschman, Jr. as a living person, he died in 1990. From the MAA Digital Library:

Halmos photographed analyst Isidore Hirschman (1922-1990) in June of 1960. Hirschman earned his Ph.D. in 1947 from Harvard with the dissertation “Some Representation and Inversion Problems for the Laplace Transform,” written under David Widder. After writing ten papers together, Hirschman and Widder published the book The Convolution Transform in 1955 (Princeton University Press; now available from Dover Publications). Hirschman spent most of his career (1949-1978) at Washington University in St. Louis, Missouri, where he published mainly in harmonic analysis and operator theory.