Improving the Wallis product

The Wallis product for {\pi}, as seen on Wikipedia, is

{\displaystyle 2\prod_{k=1}^\infty \frac{4k^2}{4k^2-1}  = \pi \qquad \qquad (1)}

Historical significance of this formula notwithstanding, one has to admit that this is not a good way to approximate {\pi}. For example, the product up to {k=10} is

{\displaystyle 2\,\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{1\cdot 3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21} =\frac{137438953472}{44801898141} }

And all we get for this effort is the lousy approximation {\pi\approx \mathbf{3.0677}}.

But it turns out that (1) can be dramatically improved with a little tweak. First, let us rewrite the partial products in (1) in terms of double factorials. This can be done in two ways: either

{\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1}  =  (4n+2) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2  \qquad \qquad (2)}

or

{\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1}  =  \frac{2}{2n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2  \qquad \qquad (3)}

Seeing how badly (2) underestimates {\pi}, it is natural to bump it up: replace {4n+2} with {4n+3}:

{\displaystyle \pi \approx b_n= (4n+3) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2  \qquad \qquad (4)}

Now with {n=10} we get {\mathbf{3.1407}} instead of {\mathbf{3.0677}}. The error is down by two orders of magnitude, and all we had to do was to replace the factor of {4n+2=42} with {4n+3=43}. In particular, the sizes of the numerator and denominator hardly change:

{\displaystyle b_{10}=43\, \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21\cdot 21} }
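
These numbers are easy to reproduce; here is a quick Scilab check of mine (not part of the original text):

// Sketch: the partial product (1) and the tweaked value (4) for n = 10
n = 10;
w = 2;
for k = 1:n
    w = w*(4*k^2)/(4*k^2-1);   // Wallis partial product, about 3.0677
end
b = w*(4*n+3)/(4*n+2);         // b_n from (4), about 3.1407
disp([w, b]);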

Approximation (4) differs from (2) by the additional term {\left(\frac{(2n)!!}{(2n+1)!!}\right)^2}, which decreases to zero. Therefore, it is not obvious whether the sequence {b_n} is increasing. To prove that it is, observe that the ratio {b_{n+1}/b_n} is

{\displaystyle  \frac{4n+7}{4n+3}\left(\frac{2n+2}{2n+3}\right)^2}

which is greater than 1 because

{\displaystyle  (4n+7)(2n+2)^2 - (4n+3)(2n+3)^2 = 1 >0 }

Sweet cancellation here. Incidentally, it shows that if we used {4n+3+\epsilon} instead of {4n+3}, the sequence would overshoot {\pi} and no longer be increasing.
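
For the record, the cancellation spelled out:

{\displaystyle (4n+7)(2n+2)^2-(4n+3)(2n+3)^2 = (16n^3+60n^2+72n+28)-(16n^3+60n^2+72n+27) = 1 }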

The formula (3) can be similarly improved. The fraction {2/(2n+1)} is secretly {4/(4n+2)}, which should be replaced with {4/(4n+1)}. The resulting approximation for {\pi}

{\displaystyle c_n =  \frac{4}{4n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2  \qquad \qquad (5)}

is about as good as {b_n}, but it approaches {\pi} from above. For example, {c_{10}\approx \mathbf{3.1425}}.

The proof that {c_n} is decreasing is familiar: the ratio {c_{n+1}/c_n} is

{\displaystyle  \frac{4n+1}{4n+5}\left(\frac{2n+2}{2n+1}\right)^2}

which is less than 1 because

{\displaystyle  (4n+1)(2n+2)^2 - (4n+5)(2n+1)^2 = -1 <0 }

Sweet cancellation once again.

Thus, {b_n<\pi<c_n} for all {n}. The midpoint of this containing interval provides an even better approximation: for example, {(b_{10}+c_{10})/2 \approx \mathbf{3.1416}}. The plot below displays the quality of approximation as the logarithm of the absolute error:

Logarithm of approximation error
  • yellow dots show the error of Wallis partial products (2)-(3)
  • blue is the error of {b_n}
  • red is for {c_n}
  • black is for {(b_n+c_n)/2}

And all we had to do was to replace {4n+2} with {4n+3} or {4n+1} in the right places.
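
For readers who want to regenerate such a plot, here is a small Scilab sketch of mine (not the original plotting code); it uses the relations {b_n=\frac{4n+3}{4n+2}\,w_n} and {c_n=\frac{4n+2}{4n+1}\,w_n}, where {w_n} is the partial product in (1):

N = 50;
w = zeros(1,N); b = zeros(1,N); c = zeros(1,N);
p = 2;
for n = 1:N
    p = p*(4*n^2)/(4*n^2-1);   // Wallis partial product w_n
    w(n) = p;
    b(n) = p*(4*n+3)/(4*n+2);  // approximation (4)
    c(n) = p*(4*n+2)/(4*n+1);  // approximation (5)
end
clf();
plot(1:N, log10(abs(w-%pi)), 'y');         // Wallis partial products
plot(1:N, log10(abs(b-%pi)), 'b');         // b_n
plot(1:N, log10(abs(c-%pi)), 'r');         // c_n
plot(1:N, log10(abs((b+c)/2-%pi)), 'k');   // midpoint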

Condition number and maximal rotation angle

The condition number {K(A)} of an invertible matrix {A} is the product of norms {\|A\|\,\|A^{-1}\|}, or, in deciphered form, the maximum of the ratio {|Au|/|Av|} taken over all unit vectors {u,v}. It comes up a lot in numerical linear algebra and in optimization problems. One way to think of the condition number is in terms of the image of the unit ball under {A}, which is an ellipsoid. The ratio of the longest and shortest axes of the ellipsoid is {K(A)}.
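
As a quick illustration (my example, not from the original text), both descriptions are easy to compute in Scilab: {\|A\|\,\|A^{-1}\|} coincides with the ratio of the largest and smallest singular values, that is, of the semi-axes of the ellipsoid.

A = [3 1; 0 2];                              // any invertible matrix
s = svd(A);                                  // semi-axes of the image of the unit ball
disp([norm(A)*norm(inv(A)), s(1)/s($)]);     // both values equal K(A)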

But for positive definite matrices {A} the condition number can also be understood in terms of rotation. First, note that the positivity of {\langle Av,v\rangle } says precisely that the angle between {v} and {Av} is always less than {\pi/2}. Let {\gamma} be the maximum of such angles taken over all {v\ne 0} (or over all unit vectors, the same thing). Then

\displaystyle  K(A)=\frac{1+\sin\gamma }{1-\sin\gamma} \ \ \ \  \ \ \ \ \ (1)

One can also make (1) a little less geometric by introducing {\delta=\delta(A)} as the largest number such that {\langle Av,v\rangle \ge \delta|Av|\,|v|} holds for all vectors {v}. Then {\delta=\cos \gamma}, and (1) takes the form

\displaystyle  K=\frac{1+\sqrt{1-\delta^2}}{1-\sqrt{1-\delta^2}} =  \left(\frac{1+\sqrt{1-\delta^2}}{\delta}\right)^2  \ \ \ \ \ \ \ \ \ \ (2)
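
Identity (1)-(2) can also be tested numerically. The following sketch of mine samples unit vectors to estimate {\gamma} and {\delta} for a random positive definite matrix; the sampling only approximates {\gamma}, so the agreement is approximate.

// Numerical check of (1)-(2) for a random positive definite A
B = grand(3,3,"def"); A = B*B' + eye(3,3);    // positive definite by construction
M = 20000;
cosines = zeros(1,M);
for k = 1:M
    v = grand(3,1,"nor",0,1); v = v/norm(v);  // random unit vector
    cosines(k) = (v'*A*v)/norm(A*v);          // cosine of the angle between v and Av
end
d = min(cosines);                             // sampled estimate of delta = cos(gamma)
disp([cond(A), ((1+sqrt(1-d^2))/d)^2]);       // the two sides of (2), approximately equal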

Could this be of any use? The inequality

\displaystyle \langle Av,v\rangle \ge \delta\,|Av|\,|v| \ \ \ \ \ \ \ \ \ \ (3)

is obviously preserved under addition of matrices. Therefore, it is preserved by integration. In particular, if the Hessian of a twice differentiable convex function {u} satisfies (3) at every point, then integration along a line segment from {x} to {y} yields

\displaystyle    \langle \nabla u(x)- \nabla u(y),x-y\rangle \ge \delta\,|\nabla u(x)-\nabla u(y)|\,|x-y|  \ \ \ \ \ \ \ \ \ \ (4)
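
Spelled out (with {H=D^2u}), the integration step is

\displaystyle  \nabla u(x)-\nabla u(y)=\int_0^1 H(y+t(x-y))\,(x-y)\,dt

and pairing with {x-y}, applying (3) to each {H(y+t(x-y))}, and using {\left|\int_0^1 w\,dt\right|\le \int_0^1 |w|\,dt} yields (4).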

Conversely, if {u} is a twice differentiable convex function such that (4) holds, then (by differentiation) its Hessian satisfies (3), and therefore admits a uniform bound on the condition number by virtue of (2). Thus, for such functions inequality (4) is equivalent to uniform boundedness of the condition number of the Hessian.

But the Hessian itself does not appear in (4). Condition (4) expresses “uniform boundedness of the condition number of the Hessian” without requiring {u} to be twice differentiable. As a simple example, take {u(x_1,x_2)=|x|^{4/3}}. The Hessian matrix is

\displaystyle  \frac{4}{9}|x|^{-8/3} \begin{pmatrix} x_1^2+3x_2^2 & -2x_1x_2 \\ -2x_1x_2 & 3x_1^2+x_2^2 \end{pmatrix}

The eigenvalues are {\frac{4}{3}|x|^{-2/3}} and {\frac{4}{9}|x|^{-2/3}}. Thus, even though the eigenvalues blow up at the origin and decay at infinity, the condition number of the Hessian remains equal to {3}. Well, except that the second derivatives do not exist at the origin. But if we use the form (4) instead, with {\delta = \sqrt{3}/2}, then non-differentiability becomes a non-issue.
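
Here is a quick numerical sanity check of this example (a sketch of mine, approximating the Hessian by central finite differences at an arbitrarily chosen point):

deff("z = u(a,b)", "z = (a^2+b^2)^(2/3)");   // u(x) = |x|^(4/3)
x1 = 1.3; x2 = -0.7; h = 1e-4;
H = zeros(2,2);
H(1,1) = (u(x1+h,x2) - 2*u(x1,x2) + u(x1-h,x2))/h^2;
H(2,2) = (u(x1,x2+h) - 2*u(x1,x2) + u(x1,x2-h))/h^2;
H(1,2) = (u(x1+h,x2+h) - u(x1+h,x2-h) - u(x1-h,x2+h) + u(x1-h,x2-h))/(4*h^2);
H(2,1) = H(1,2);
e = real(spec(H));           // eigenvalues of the Hessian
disp(max(e)/min(e));         // approximately 3, regardless of the chosen point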

Let’s prove (2). It suffices to work in two dimensions, because both {K(A)} and {\delta(A)} are determined by the restrictions of {A} to two-dimensional subspaces. In two dimensions we can represent the linear map {A} as {z\mapsto \alpha z+\beta \bar z} for some complex numbers {\alpha,\beta}. Actually, {\alpha} is real and positive because {A} is symmetric positive definite. As {z} runs through unimodular complex numbers, the maximum of {|\alpha z+\beta \bar z|} is {\alpha+|\beta|} and the minimum is {\alpha-|\beta|}. Therefore, {K(A)=\frac{1+|\beta|/\alpha}{1-|\beta|/\alpha}}.

When {|z|=1}, the angle {\gamma} that the vector {\alpha z+\beta \bar z} forms with {z} is equal to the argument of {\bar z (\alpha z+\beta \bar z)=\alpha+\beta \bar z^2}. The latter is maximized when {0, \alpha, \alpha+\beta \bar z^2} form a right triangle with hypotenuse {\alpha}.

Proof by picture

Hence, {\sin\gamma = |\beta|/\alpha}. This proves {K(A)=\frac{1+\sin\gamma}{1-\sin\gamma}}, which is (1), and (2) follows.


There are similar observations in the literature on quasiconformality of monotone maps, including an inequality similar to (2) (for general matrices), but I have not seen either (1) or (2) stated as an identity for positive definite matrices.

Collisions II

Continuation of the preceding post, in which I considered the ODE system

\displaystyle    \dot{x}_i = \sum_{j\ne i}\frac{x_j-x_i}{|x_j-x_i|},\quad i=1,\dots,n

with the initial conditions {x_i(0)=p_i}, where {p_1,\dots,p_n} are distinct points in a Euclidean space. This can be seen as the gradient flow (in the space of all {n}-point subsets) of the function {\sum_{i\ne j}|x_i-x_j|}. Not surprisingly, minimization of the sum of distances leads to collisions. And they happen as quickly as they possibly could: here is a precise statement.

Lemma 1. Let {t_c} be the time of the first collision. Then

{\displaystyle \frac{1}{2(n-1)}\,\delta\le t_c\le \frac{1}{2}\,\delta}

where {\displaystyle \delta=\min_{i\ne j}|p_i-p_j|}.

The lower bound follows at once from {|\dot{x}_i|\le n-1}: each pairwise distance shrinks at rate at most {2(n-1)}. To prove the upper bound on {t_c}, we need the concept of monotonicity. A map {F\colon{\mathbb R}^d\rightarrow{\mathbb R}^d} is called monotone if {\langle F(a)-F(b),a-b\rangle \ge 0} for all {a,b\in{\mathbb R}^d}. It is not hard to prove that the gradient of any convex function is monotone. In particular, {F(x)=x/|x|} is a monotone map, being the gradient of the convex function {x\mapsto |x|} (the lack of smoothness at the origin need not concern us here).
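
For completeness, here is a direct verification that {F(x)=x/|x|} is monotone: for nonzero {a,b},

\displaystyle  \left\langle \frac{a}{|a|}-\frac{b}{|b|},\ a-b\right\rangle = |a|+|b|-\langle a,b\rangle\left(\frac{1}{|a|}+\frac{1}{|b|}\right) \ge |a|+|b|-|a|\,|b|\left(\frac{1}{|a|}+\frac{1}{|b|}\right)=0

by the Cauchy-Schwarz inequality.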

Renumbering the points, we may assume {|p_1-p_2|=\delta}. Consider the function {\varphi(t)=|x_1(t)-x_2(t)|} for {0<t<t_c}. Differentiation yields

\displaystyle \dot{\varphi}=|x_1-x_2|^{-1}\langle \dot{x}_1-\dot{x}_2, x_1-x_2\rangle.

This inner product consists of the term

\displaystyle \langle F(x_2-x_1)-F(x_1-x_2), x_1-x_2\rangle = -2|x_1-x_2|

and the sum over {j=3,\dots,n} of

\displaystyle \langle F(x_j-x_1)-F(x_j-x_2), x_1-x_2\rangle \\ = - \langle F(x_j-x_1)-F(x_j-x_2),\ (x_j-x_1)-(x_j-x_2)\rangle \\ \le 0.

Thus, {\dot{\varphi} \le -2} for {0<t<t_c}, which implies {t_c\le \varphi(0)/2}. {\Box}

It is worth noting that if the process is stopped at the moment of first collision, it defines a map between symmetric products {r\colon ({\mathbb R}^d)^{n}\rightarrow ({\mathbb R}^d)^{n-1}} which fixes {({\mathbb R}^d)^{n-1}} pointwise. The latter statement can now be made quantitative: if a set {A\in ({\mathbb R}^d)^{n}} is at distance {\epsilon} from {({\mathbb R}^d)^{n-1}}, then {d_H(r(A),A)\le (n-1)\,\epsilon}. Indeed, in the notation of Lemma 1 we have {\delta=2 \epsilon}. Since the first collision occurs at a time {t_c\le \epsilon}, it follows that {|x_i(t_c)-p_i|\le (n-1) t_c\le (n-1)\,\epsilon}.

Time for some pictures. I changed the Scilab routine so that after each collision it drops one of the collided particles from the process and restarts. As predicted, this did not change the qualitative picture. The main difference is that the trajectories are now visibly non-smooth: angles form when a collision happens elsewhere.

4 points
5 points
6 points

In the above examples some trajectories cross in space, but not in space-time.

7 points
8 points
9 points

While the above plots may look like hierarchical clustering trees, solving a system of {n} nonlinear ODEs is hardly a practical algorithm for clustering {n} points.

The updated Scilab routine is below. This time I called it with random parameters:

 gravity2(grand(1,7,"unf",0,2),grand(1,7,"def"),300) 
function gravity2(vx,vy,Nsteps)
    // Euler scheme for the unit-attraction flow; after a collision one of the
    // two particles involved is dropped (frozen) and the process continues.
    h = 0.002;                       // time step
    Npoints = length(vx);
    moving = ones(vx);               // 1 while a particle is still in the process
    tolerance = h*(Npoints-1)/3;     // distance below which a collision is declared
    x = zeros(Nsteps,Npoints);       // trajectories: x(m,i), y(m,i) give the
    y = zeros(Nsteps,Npoints);       // position of particle i at step m
    x(1,:) = vx;
    y(1,:) = vy;
    for m = 1:(Nsteps-1)
        x(m+1,:) = x(m,:);
        y(m+1,:) = y(m,:);
        for i = 1:Npoints
            if moving(i)==0 then continue
            end
            for j = (i+1):Npoints
                if moving(j)==0 then continue
                end
                v = [x(m,j),y(m,j)]-[x(m,i),y(m,i)];   // vector from particle i to particle j
                if norm(v) < tolerance then
                    moving(j)=0;                       // collision: drop particle j
                else
                    // each particle of the pair takes a unit-speed step toward the other
                    x(m+1,i) = x(m+1,i) + h*v(1)/norm(v);
                    x(m+1,j) = x(m+1,j) - h*v(1)/norm(v);
                    y(m+1,i) = y(m+1,i) + h*v(2)/norm(v);
                    y(m+1,j) = y(m+1,j) - h*v(2)/norm(v);
                end
            end
        end
    end
    clf();
    plot(x(:,:),y(:,:),'o');         // trajectories traced by markers
endfunction