# Vectorization of integration

A frequently asked question about numerical integration is: how to speed up the process of computing ${\int_a^b f(x,p)\,dx}$ for many values of parameter ${p}$? Running an explicit for over the values of ${p}$ seems slow… is there a way to vectorize this, performing integration on an array of functions at once (which in reality means, pushing the loop to a lower level language)?

Usually, the answer is some combination of “no” and “you are worrying about the wrong thing”. Integration routines are typically adaptive, meaning they pick evaluation points based on the function being integrated. Different values of the parameter will require different evaluation points to achieve the desired precision, so vectorization is out of question. This applies, for example, to quad method of SciPy integration module.

Let’s suppose we insist on vectorization and are willing to accept a non-adaptive method. What are our options, considering SciPy/NumPy? I will compare

The test case is ${\int_0^1 (p+1)x^p\,dx}$ with ${p=0,0.01,\dots,99.99}$. Theoretically all these are equal to ${1}$. But most of these functions are not analytic at ${0}$, and some aren’t even differentiable there, so their integration is not a piece of cake.

Keeping in mind that Romberg’s integration requires ${2^k}$ subintervals, let’s use ${1024}$ equal subintervals, hence ${1025}$ equally spaced sample points. Here is the vectorized evaluation of all our functions at these points, resulting in a 2D array “values”:

import numpy as np
import scipy.integrate as si
x = np.linspace(0, 1, 1025).reshape(1, -1)
dx = x[0,1] - x[0,0]
p = np.arange(0, 100, 0.01).reshape(-1, 1)
values = (p+1)*x**p


This computation took 0.799 seconds on my AWS instance (t2.nano). Putting the results of evaluation together takes less time:

• Trapezoidal rule np.trapz(values, dx=dx) took 0.144 seconds and returned values ranging from 0.99955 to 1.00080.
• Simpson’s rule si.simps(values, dx=dx) took 0.113 seconds and returned values ranging from 0.99970 to 1.0000005.
• Romberg quadrature si.romb(values, dx=dx) was fastest at 0.0414 seconds, and also most accurate, with output between 0.99973 and 1.000000005.

Conclusions so far:

• SciPy’s implementation of Romberg quadrature is surprisingly fast, considering that this algorithm is the result of repeated Richardson extrapolation of the trapezoidal rule (and Simpson’s rule is just the result of the first extrapolation step). It’d be nice to backport whatever makes Romberg so effective back to Simpson’s rule.
• The underestimation errors 0.999… are greatest when ${p}$ is near zero, so the integrand is very nonsmooth at ${0}$. The lack of smoothness renders Richardson extrapolation ineffective, hence all three rules have about the same error here.
• The overestimation errors are greatest when ${p}$ is large. The function is pretty smooth then, so upgrading from trapezoidal to Simpson to Romberg brings substantial improvements.

All of this is nice, but the fact remains that non-vectorized adaptive integration is both faster and much more accurate. The following loop with quad, which uses adaptive Clenshaw-Curtis quadrature, runs in 0.616 seconds (less than it took to evaluate our functions on a grid) and returns values between 0.99999999995 and 1.00000000007. Better to use exponential notation here: ${1-5\cdot 10^{-11} }$ and ${1+7\cdot 10^{-11}}$.

funcs = [lambda x, p=p_: (p+1)*x**p for p_ in np.arange(0, 100, 0.01)]
result = [si.quad(fun, 0, 1)[0] for fun in funcs]


An adaptive algorithm shines when ${p}$ is small, by placing more evaluation points near zero. It controls both over- and under- estimates equally well, continuing to add points until the required precision is reached.

The last hope of non-adaptive methods is Gaussian quadrature, which is implemented in SciPy as fixed_quad (“fixed” referring to the number of evaluation points used). With 300 evaluation points, the integration takes 0.263 seconds (excluding the computation of nodes and weights, which can be cached) and the results are between ${1-2\cdot 10^{-12}}$ and ${1+2\cdot 10^{-7}}$. This is twice as fast as a loop with quad, and more accurate for large ${p}$ — but sadly, not so accurate for small ${p}$. As said before, ${x^p}$ with ${p}$ near zero is really a showcase for adaptive methods.