Polynomial delta function

The function {k_1(x,y) = (3xy+1)/2} has a curious property: for any linear function {\ell}, and any point {y\in \mathbb R}, the integral {\int_{-1}^1 \ell(x)k_1(x,y)\,dx} evaluates to {\ell(y)}. This is easy to check using the fact that odd powers of {x} integrate to zero:

\displaystyle \frac12 \int_{-1}^1 (ax+b)(3xy+1)\,dx = \frac12 \int_{-1}^1 (3ax^2y+b)\,dx = \frac12(2ay+2b) = ay+b

More generally, for any integer {n\ge 0} there exists a unique symmetric polynomial {k_n(x,y)} that has degree {n} in {x} and {y} separately and satisfies {\int_{-1}^1 p(x)k_n(x,y)\,dx = p(y)} for all polynomials {p} of degree at most {n}. For example, {k_0(x,y)=1/2} (obviously) and

\displaystyle k_2(x,y)=\frac98+\frac32xy+\frac{15}{8}(x^2+y^2)+\frac{45}{8}x^2y^2

The formula is not really intuitive, and a 3d plot would not help the matter much. To visualize {k_n}, I plotted {k_n(x,-3/4)}, {k_n(x,0)}, and {k_n(x,1/2)} below (green, red, blue respectively).

Degree 1
Degree 1
Degree 2
Degree 2
Degree 4
Degree 4

For {y\in [-1,1]} and large {n}, the function {k_n(\cdot, y)} approaches the Dirac delta at {y}, although the convergence is slow, especially when {|y|} is close to {1}. I don’t think there is anything good to be said about the case {|y|>1}.

Degree 10
Degree 10
Degree 20
Degree 20

The existence and uniqueness of {k_n} are a consequence of the Riesz representation of linear functionals on an inner product space. Indeed, polynomials of degree at most {n} form such a space {\mathbb P_n} with inner product {\langle p,q\rangle = \int_{-1}^1p(x)q(x)\,dx}, and the functional {p\mapsto p(y)} is linear for any fixed {y\in\mathbb R}. Hence, this functional can be written as {p\mapsto \langle p, k_y\rangle } for some {k_y}. The function {(x,y) \mapsto k_x(y)} is a reproducing kernel for this space. Its symmetry is not immediately obvious.

The Legendre polynomials {P_0,\dots,P_n} are an orthogonal basis of {\mathbb P_n}; more precisely, {\widetilde{P}_j = \sqrt{j+1/2}P_j} form an orthonormal basis. It’s a general fact about reproducing kernels that

\displaystyle k(x,y) = \sum_j \widetilde{P}_j(x)\widetilde{P}_j(y)

(which, incidentally, proves the symmetry {k(y,x)=k(x,y)}). Indeed, taking this sum as the definition of {k} and writing {p = \sum_{j=0}^n c_j \widetilde{P}_j}, we find

\displaystyle \langle p, k(\cdot, y)\rangle = \sum_j \widetilde{P}_j(y) \langle p, \widetilde{P}_j\rangle = \sum_j \widetilde{P}_j(y) c_j = p(y)

This is the Sage code used for the above plots.

n = 20
k = sum([(j+1/2)*legendre_P(j,x)*legendre_P(j,y) for j in range(0,n+1)])
plot(k(x,y=-3/4),(x,-1,1),color='green') + plot(k(x,y=0),(x,-1,1),color='red') +  plot(k(x,y=1/2),(x,-1,1),color='blue')

Higher degrees cause some numerical issues…

Degree 22
Degree 22

Post motivated by Math.SE question

B-splines and probability

If one picks two real numbers {X_1,X_2} from the interval {[0,1]} (independent, uniformly distributed), their sum {S_2=X_1+X_2} has the triangular distribution.

Also known as the hat function
Also known as the hat function

The sum {S_3} of three such numbers has a differentiable probability density function:

Piecewise quadratic, C1 smooth
Piecewise quadratic, C1 smooth

And the density of {S_4=X_1+X_2+X_3+X_4} is smoother still: the p.d.f. has two
continuous derivatives.

This begins to look normal
This begins to look normal

As the number of summands increases, these distributions converge to normal if they are translated and scaled properly. But I am not going to do that. Let’s keep the number of summands to four at most.

The p.d.f. of {S_n} is a piecewise polynomial of degree {n-1}. Indeed, for {S_1=X_1} the density is piecewise constant, and the formula

\displaystyle  S_n(x) = \int_{x-1}^x S_{n-1}(t)\,dt

provides the inductive step.

For each {n}, the translated copies of function {S_n} form a partition of unity:

\displaystyle  \sum_{k\in\mathbb Z}S_n(x-k)\equiv 1

The integral recurrence relation gives an easy proof of this:

\displaystyle  \sum_{k\in\mathbb Z}\int_{x-k-1}^{x-k} S_{n-1}(t)\,dt = \int_{\mathbb R} S_{n-1}(t)\,dt = 1

And here is the picture for the quadratic case:

Piecewise quadratic partition of unity
Piecewise quadratic partition of unity

A partition of unity can be used to approximate functions by piecewise polynomials: just multiply each partition element by the value of the function at the center of the corresponding interval, and add the results.

Doing this with {S_2} amounts to piecewise linear interpolation: the original function {f(x)=e^{-x/2}} is in blue, the weighted sum of hat functions in red.

Using the function exp(-x/2)
PL interpolation of exp(-x/2)

With {S_4} we get a smooth curve.

Approximating exp(-x/2) with a cubic B-spline
Approximating exp(-x/2) with a cubic B-spline

Unlike interpolating splines, this curve does not attempt to pass through the given points exactly. However, it has several advantages over interpolating splines:

  • Is easier to calculate; no linear system to solve;
  • Yields positive function for positive data;
  • Yields monotone function for monotone data

Squarish polynomials

For some reason I wanted to construct polynomials approximating this piecewise constant function {f}:

How to approximate this with polynomials?
So square

Of course approximation cannot be uniform, since the function is not continuous. But it can be achieved in the sense of convergence of graphs in the Hausdorff metric: their limit should be the “graph” shown above, with the vertical line included. In concrete terms, this means for every {\epsilon>0} there is {N} such that for {n\ge N} the polynomial {p_n} satisfies

\displaystyle  |p_n-f|\le \epsilon\quad \text{ on }\ [0,2]\setminus [1-\epsilon,1+\epsilon]

and also

\displaystyle  -\epsilon\le  p_n  \le 1+\epsilon\quad \text{ on }\ [1-\epsilon,1+\epsilon]

How to get such {p_n} explicitly? I started with the functions {f_m(x) = \exp(-x^m)} when {m} is large. The idea is that as {m\rightarrow\infty}, the limit of {\exp(-x^m)} is what is wanted: {1} when {x<1}, {0} when {x>1}. Also, for each {m} there is a Taylor polynomial {T_{m,n}} that approximates {f_m} uniformly on {[0,2]}. Since the Taylor series is alternating, it is not hard to find suitable {n}. Let’s shoot for {\epsilon=0.01} in the Taylor remainder and see where this leads:

  • Degree {7} polynomial for {\exp(-x)}
  • Degree {26} polynomial for {\exp(-x^2)}
  • Degree {69} polynomial for {\exp(-x^3)}
  • Degree {180} polynomial for {\exp(-x^4)}
  • Degree {440} polynomial for {\exp(-x^5)}

The results are unimpressive, though:

Taylor polynomials of exp(-x^m) are not so square
Taylor polynomials of exp(-x^m) are not so square

To get within {0.01} of the desired square-ness, we need {\exp(-1.01^m)<0.01}. This means {m\ge 463}. Then, to have the Taylor remainder bounded by {0.01} at {x=2}, we need {2^{463n}/n! < 0.01}. Instead of messing with Stirling’s formula, just observe that {2^{463n}/n!} does not even begin to decrease until {n} exceeds {2^{463}}, which is more than {10^{139}}. That’s a … high degree polynomial. I would not try to ask a computer algebra system to plot it.

Bernstein polynomials turn out to work better. On the interval {[0,2]} they are given by

\displaystyle    p_n(x) = 2^{-n} \sum_{k=0}^n f(2k/n) \binom{n}{k} x^k (2-x)^{n-k}

To avoid dealing with {f(1)}, it is better to use odd degrees. For comparison, I used the same or smaller degrees as above: {7, 25, 69, 179, 439}.

Squarish Bernstein polynomials
Squarish Bernstein polynomials

Looks good. But I don’t know of a way to estimate the degree of Bernstein polynomial required to obtain Hausdorff distance less than a given {\epsilon} (say, {0.01}) from the square function.

Quadrature rules and quadrature domains

The midpoint rule of numerical integration

\displaystyle \int_a^b f(x)\,dx \approx (b-a)f\left(\frac{a+b}{2}\right)

is approximate in general, but exact for linear functions (polynomials of degree at most one).

Midpoint Rule
Midpoint Rule

With two sample points we can integrate any cubic polynomial exactly. The choice of sample points is not obvious: they are to be placed at distance {\dfrac{1}{2\sqrt{3}}(b-a)} from the midpoint. On the example below, {a=0} and {b=1}, so the sample points are {x_{1,2} = \dfrac12 \pm \dfrac{1}{2\sqrt{3}}}. The integral is equal to {\dfrac12 f(x_1)+\dfrac12 f(x_2)}. One can say that each sample point has weight {1/2} in this rule.

Two-point quadrature: areas of yellow rectangles add up to the integral
Two-point quadrature: areas of yellow rectangles add up to the integral

Three sample points are enough for polynomials of degrees up to and including five. This time, the weights attached to the sample points are not equal. The midpoint is used again, this time with the weight of {4/9}. Two other sample points are at distance {\dfrac{\sqrt{3}}{2\sqrt{5}}(b-a)} from the midpoint, and their weights are {5/18} each. This contraption exactly integrates polynomials of degrees up to five.

Three-point quadrature
Three-point quadrature: rectangles of unequal width

Compare this with Simpson’s rule, which also uses three sample points but is exact only up to degree three.

The above are examples of Gaussian quadrature: for each positive integer {n}, one can integrate polynomials of degree up to {2n-1} by taking {n} samples at the right places, and weighing them appropriately.

Let’s move from the real line to the complex plane. If one accepts that the analog of interval {(a,b)} is a disk in the plane, then quadrature becomes very simple: for any disk {D} and any complex polynomials {p},

\displaystyle   \iint_D p = \pi r^2 p(c)

where {c} is the center of the disk and {r} is its radius. One sample point is enough for all degrees! The proof is easy: rewrite {p} in terms of powers of {(z-c)} and integrate them in polar coordinates. The same works for any holomorphic function, as long as it is integrable in {D}.

Disk: a quadrature domain with one node
Disk: a quadrature domain with one node

But maybe the disk is a unique such shape? Not at all: there are other such quadrature domains. A simple family of examples is Neumann ovals (Neumann as in “boundary condition”). Geometrically, they are ellipses inverted in a concentric circle. Analytically, they are (up to linear transformations) images of the unit disk {\mathbb{D}} under

\displaystyle  \varphi(z)=\frac{z}{1-c^2 z^2}\quad (0\le c<1)

This image, denoted {\Omega } below, looks much like an ellipse when {c} is small:

Neumann oval with c=0.3
Neumann oval with c=0.3

Then it becomes peanut-shaped:

Neumann oval with c=0.6
Neumann oval with c=0.6

For {c\approx 1} it looks much like the union of two disks (but the boundary is smooth, contrary to what the plot suggests):

Neumann oval with c=0.95
Neumann oval with c=0.95

In each of these images, the marked points are the quadrature nodes. Let’s find out what they are and how they work.

Suppose {f} is holomorphic and integrable in {\Omega}. By a change of variables,

\displaystyle  \iint_{\Omega } f = \iint_{\mathbb D} (f\circ \varphi)\, \varphi' \,\overline{\varphi'}

Here { (f\circ \varphi) \varphi'} is holomorphic, but { \overline{\varphi'} } is anti-holomorphic. We want to know what { \overline{\varphi'} } does when integrated against something holomorphic. Power series to the rescue:

\displaystyle    \varphi(z) = z\sum_{n=0}^\infty c^{2n} z^{2n} = \sum_{n=0}^\infty c^{2n} z^{2n+1}


\displaystyle  \overline{\varphi'(z)} = \sum_{n=0}^\infty c^{2n} (2n+1) \bar z^{2n}

Multiply this by {z^{k}} and integrate over {\mathbb D} in polar coordinates: the result is {0} if {k} is odd and

\displaystyle    2\pi c^{k} (k+1) \int_0^1 r^{2k} r\,dr= \pi c^{k}

if {k} is even. So, integration of {\overline{\varphi'(z)}} against a power series {g(z) = \sum a_k z^k} produces the sum of {a_{k} c^{k}} over even powers {k} only (with the factor of {\pi}). The process of dropping odd powers amounts to taking the even part of {g}:

\displaystyle  \iint_{\mathbb D} g\, \overline{\varphi'} = \frac{\pi}{2} ( g(c)+ g(-c))

Yes, there’s something about { \overline{\varphi'} } that’s magic (key words: reproducing kernel, specifically Bergman kernel). Plug {g= (f\circ \varphi)\, \varphi' } to conclude

\displaystyle  \iint_{\Omega } f = \frac{\pi}{2} \Big\{f(\varphi(c))\, \varphi'(c) + f(\varphi(-c) ) \,\varphi'(-c)\Big\}

So, the nodes are {\varphi(\pm c) = \pm c/(1-c^4)}. They have equal weight, because {\varphi'(\pm c) = \dfrac{1+c^4}{(1-c^4)^2}}. Final result:

\displaystyle  \iint_{\Omega } f = \frac{\pi (1+c^4)}{2 (1-c^4)^2}\, \left\{f \left( \frac{c}{1-c^4} \right) + f \left(- \frac{c}{1-c^4} \right) \right\}

Again, this is a two-point quadrature formula that is exact for all complex polynomials.

As a bonus, put {f\equiv 1} to find that the area of {\Omega} is {\dfrac{\pi (1+c^4)}{ (1-c^4)^2}}.

Biased and unbiased mollification

When we want to smoothen (mollify) a given function {f:{\mathbb R}\rightarrow{\mathbb R}}, the standard recipe suggests: take the {C^{\infty}}-smooth bump function

\displaystyle    \varphi(t) = \begin{cases} c\, \exp\{1/(1-t^2)\}\quad & |t|<1; \\   0 \quad & |t|\ge 1 \end{cases}

where {c} is chosen so that {\int_{{\mathbb R}} \varphi=1} (for the record, {c\approx 2.2523}).

Standard bump
Standard bump

Make the bump narrow and tall: {\varphi_{\delta}(t)=\delta^{-1}\varphi(t/\delta)}. Then define {f_\delta = f*\varphi_\delta}, that is

\displaystyle    f_\delta(x) = \int_{\mathbb R} f(x-t) \varphi_\delta(t)\,dt = \int_{\mathbb R} f(t) \varphi_\delta(x-t)\,dt

The second form of the integral makes it clear that {f_\delta} is infinitely differentiable. And it is easy to prove that for any continuous {f} the approximation {f_\delta} converges to { f} uniformly on compact subsets of {{\mathbb R}}.

The choice of the particular mollifier given above is quite natural: we want a {C^\infty} function with compact support (to avoid any issues with fast-growing functions {f}), so it cannot be analytic. And functions like {\exp(-1/t)} are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?

Well, one may ask for a good rate of convergence. If {f} is an ugly function like {f(x)=\sqrt{|x|}}, then we probably should not expect fast convergence. But is could be something like {f(x)=|x|^7}; a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start {f\in C^6} that we are given?

No, it will not. The limiting factor is not the lack of seventh derivative at {x=0}; it is the presence of (nonzero) second derivative at {x\ne 0}. To study this effect in isolation, consider the function {f(x)=x^2}, which has nothing beyond the second derivative. Here it is together with {f_{0.1}}: the red and blue graphs are nearly indistinguishable.

Good approximation

But upon closer inspection, {f_{0.1}} misses the target by almost {2\cdot 10^{-3}}. And not only around the origin: the difference {f_{0.1}-f} is constant.

But it overshoots the target

With {\delta=0.01} the approximation is better.

Better approximation
Better approximation

But upon closer inspection, {f_{0.01}} misses the target by almost {2\cdot 10^{-5}}.

Still overshoots
Still overshoots

And so it continues, with the error of order {\delta^2}. And here is where it comes from:

\displaystyle f_\delta(0) = \int_{\mathbb R} t^2\varphi_\delta(t)\,dt = \delta^{-1} \int_{\mathbb R} t^2\varphi(t/\delta)\,dt  = \delta^{2} \int_{\mathbb R} s^2\varphi(s)\,ds

The root of the problem is the nonzero second moment {\int_{\mathbb R} s^2\varphi(s)\,ds \approx 0.158}. But of course, this moment cannot be zero if {\varphi} does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as {\varphi}, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with nonzero second derivative better than to the order {\delta^2}.

Let’s find a mollifier without such limitations; that is, with zero moments of all orders. One way to do it is to use the Fourier transform. Since {\int_{\mathbb R} t^n \varphi(t)\,dt } is a multiple of {\widehat{\varphi}^{(n)}(0)}, it suffices to find a nice function {\psi} such that {\psi(0)=1} and {\psi^{(n)}(0)=0} for {n\ge 1}; the mollifier will be the inverse Fourier transform of {\psi}.

As an example, I took something similar to the original {\varphi}, but with a flat top:

\displaystyle  \psi(\xi) = \begin{cases} 1 \quad & |\xi|\le 0.1; \\    \exp\{1-1/(1-(|\xi|-0.01)^2)\} \quad & 0.1<|\xi|<1.1\\  0\quad & |\xi|\ge 1.1  \end{cases}

Fourier transform of unbiased mollifier
Fourier transform of unbiased mollifier

The inverse Fourier transform of {\psi} is a mollifier that reproduces all polynomials exactly: {p*\varphi = p} for any polynomial. Here it is:

Unbiased mollifier
Unbiased mollifier

Since I did not make {\psi} very carefully (its second derivative is discontinuous at {\pm 0.01}), the mollifier {\varphi} has a moderately heavy tail. With a more careful construction it would decay faster than any power of {t}. However, it cannot be compactly supported. Indeed, if {\varphi} were compactly supported, then {\widehat{\varphi}} would be real-analytic; that is, represented by its power series centered at {\xi=0}. But that power series is

\displaystyle 1+0+0+0+0+0+0+0+0+0+\dots

The idea of using negative weights in the process of averaging {f} looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.

Credit: I got the idea of this post from the following remark by fedja on MathOverflow:

The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond {C^2} and {\delta^2} with them.

Web-based LaTeX to WordPress (and something about polynomials)

Recently I began using the excellent Python program LaTeX to WordPress (LaTeX2WP) by Luca Trevisan. Since some of my posts are written on computers where Python is not readily available, I put LaTeX2WP on PythonAnywhere as a web2py application. The downside is that settings are stuck at default (e.g., you get blackboard bold letters using \N, \Z, \R, \C, \Q). The upside is that the process simplifies to copy-paste-click-copy-paste.

The application is here: Web-based LaTeX2WP and it looks like this:


Although my modification of the source code is utterly trivial, according to GPL I should put it here. Namely, I merged latex2wp.py and latex2wpstyle.py into one file convert4.py, since web users can’t edit the style file anyway. Also replaced file input/output by a function call, which comes from web2py controller:

def index():
    import convert4
    form=FORM(TEXTAREA(_name='LaTeX', requires=IS_NOT_EMPTY()), BR(), INPUT(_type='submit', _value='Convert to WordPress'))
    produced = ''
    if form.accepts(request,session):
        produced = convert4.mainproc(form.vars.LaTeX)
    form2=FORM(TEXTAREA(_name='WP', value=produced))
    return dict(form=form, form2=form2)

which in turn populates the html file:

Enter LaTeX code here
Get WordPress code here

To beef up this post, I include sample output. It is based on an old write-up of my discussions with Paul Gustafson during REU 2006 at Texas A&M University. Unfortunately, the project was never finished. I would still like to know if {\partial}-equivalence of polynomials appeared in the literature; the definition looks natural enough.

Given two polynomials {p,q \in {\mathbb C}[z_1,\dots,z_n]}, write {q\preccurlyeq p} if there exists a differential operator {\mathcal T\in {\mathbb C}[\frac{\partial}{\partial z_1},\dots, \frac{\partial}{\partial z_n}]} such that {q=\mathcal Tp}. The relation {\preccurlyeq} is reflexive and transitive, but is not antisymmetric. If both {p\preccurlyeq q} and {q\preccurlyeq p} hold, we can say that {p} and {q} are {\partial}-equivalent.

A polynomial is {\partial}-homogeneous if it is {\partial}-equivalent to a homogeneous polynomial.

It turns out that it is easy to check the {\partial}-homogeneity of {p} by decomposing it into homogeneous parts

\displaystyle    p=p_0+p_1+\dots +p_d   \ \ \ \ \ (1)

where {p_k} is homogeneous of degree {k} and {p_d\not\equiv 0}.

Then use the following

Proposition 1.
The polynomial (1) is {\partial}-homogeneous if and only if {p_k\preccurlyeq p_d} for {k=0,\dots, d-1}.

Proof: Exercise. \Box

For example, the polynomial {p(z,w)=z^3-5w^3+2zw} is not {\partial}-homogeneous since {zw\not\preccurlyeq (z^3-5w^3)}. On the other hand, {q(z,w)=z^3-3z^2w+2zw} is {\partial}-homogeneous because {zw\preccurlyeq (z^3-3z^2w)}. In particular, {p} and {q} are not {\partial}-equivalent.

Proposition 1 also makes it clear that every polynomial in one variable is {\partial}-homogeneous. For univariate polynomials {\partial}-equivalence amounts to having the same degree.