A frequently asked question about numerical integration is: how can one speed up the computation of an integral for many values of a parameter p? Running an explicit *for* loop over the values of p seems slow… is there a way to *vectorize* this, performing integration on an array of functions at once (which in reality means pushing the loop down to a lower-level language)?

Usually, the answer is some combination of “no” and “you are worrying about the wrong thing”. Integration routines are typically *adaptive*, meaning they pick evaluation points based on the function being integrated. Different values of the parameter will require different evaluation points to achieve the desired precision, so vectorization is out of the question. This applies, for example, to the `quad` function of SciPy's integration module.

Let’s suppose we insist on vectorization and are willing to accept a non-adaptive method. What are our options within SciPy/NumPy? I will compare:

- Trapezoidal rule, `numpy.trapz`
- Simpson’s rule, `scipy.integrate.simps`
- Romberg’s method, `scipy.integrate.romb`
- Gaussian quadrature, `scipy.integrate.fixed_quad`

The test case is the integral of (p+1)x^p over [0, 1], with 0 ≤ p ≤ 100. Theoretically, all of these integrals are equal to 1. But most of these functions are not analytic at x = 0, and some aren’t even differentiable there, so their integration is not a piece of cake.

Keeping in mind that Romberg’s integration requires 2^k subintervals, let’s use 1024 equal subintervals, hence 1025 equally spaced sample points. Here is the vectorized evaluation of all our functions at these points, resulting in a 2D array `values`:

```python
import numpy as np
import scipy.integrate as si

x = np.linspace(0, 1, 1025).reshape(1, -1)
dx = x[0, 1] - x[0, 0]
p = np.arange(0, 100, 0.01).reshape(-1, 1)
values = (p + 1) * x**p
```

This computation took 0.799 seconds on my AWS instance (t2.nano). Putting the results of evaluation together takes less time:

- Trapezoidal rule, `np.trapz(values, dx=dx)`, took 0.144 seconds and returned values ranging from 0.99955 to 1.00080.
- Simpson’s rule, `si.simps(values, dx=dx)`, took 0.113 seconds and returned values ranging from 0.99970 to 1.0000005.
- Romberg quadrature, `si.romb(values, dx=dx)`, was fastest at 0.0414 seconds, and also most accurate, with output between 0.99973 and 1.000000005.
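For readers on current SciPy, the three calls above can be reproduced in one self-contained sketch; note that `trapz` and `simps` have been renamed `trapezoid` and `simpson` in newer releases (the old aliases were eventually removed):

```python
import numpy as np
from scipy.integrate import trapezoid, simpson, romb

# 1025 = 2**10 + 1 equally spaced samples, as romb requires
x = np.linspace(0, 1, 1025).reshape(1, -1)
dx = x[0, 1] - x[0, 0]
p = np.arange(0, 100, 0.01).reshape(-1, 1)
values = (p + 1) * x**p  # each row integrates to exactly 1

for rule in (trapezoid, simpson, romb):
    result = rule(values, dx=dx)  # one call integrates all 10000 rows
    print(rule.__name__, np.abs(result - 1).max())
```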

Conclusions so far:

- SciPy’s implementation of Romberg quadrature is surprisingly fast, considering that this algorithm is the result of repeated Richardson extrapolation of the trapezoidal rule (and Simpson’s rule is just the result of the first extrapolation step). It’d be nice to backport whatever makes Romberg so effective back to Simpson’s rule.
- The underestimation errors 0.999… are greatest when p is near zero, where the integrand is very nonsmooth at x = 0. The lack of smoothness renders Richardson extrapolation ineffective, hence all three rules have about the same error here.
- The overestimation errors are greatest when p is large. The integrand is pretty smooth then, so upgrading from the trapezoidal rule to Simpson's to Romberg brings substantial improvements.
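The extrapolation relationship mentioned above can be verified directly: one Richardson step applied to two trapezoidal sums, with steps h and h/2, reproduces Simpson's rule exactly. A quick check on a smooth integrand, exp(x) on [0, 1]:

```python
import numpy as np
from scipy.integrate import trapezoid, simpson

# One Richardson extrapolation step on the trapezoidal rule:
# S = (4*T(h/2) - T(h)) / 3 equals composite Simpson's rule.
x = np.linspace(0, 1, 9)            # step h/2 = 1/8
f = np.exp(x)
t_half = trapezoid(f, x)            # T(h/2), using all 9 points
t_full = trapezoid(f[::2], x[::2])  # T(h),   using every other point
richardson = (4 * t_half - t_full) / 3
print(richardson, simpson(f, x=x))  # the two agree to machine precision
```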

All of this is nice, but the fact remains that non-vectorized adaptive integration is both faster and much more accurate. The following loop with `quad`, which uses adaptive Gauss–Kronrod quadrature (from the Fortran library QUADPACK), runs in 0.616 seconds (less than it took to evaluate our functions on a grid) and returns values between 0.99999999995 and 1.00000000007. Better to use exponential notation here: 1 − 5·10⁻¹¹ and 1 + 7·10⁻¹¹.

```python
funcs = [lambda x, p=p_: (p + 1) * x**p for p_ in np.arange(0, 100, 0.01)]
result = [si.quad(fun, 0, 1)[0] for fun in funcs]
```

An adaptive algorithm shines when p is small, by placing more evaluation points near zero. It controls both over- and underestimates equally well, continuing to add points until the required precision is reached.
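As an aside, newer SciPy versions (1.4+) offer `scipy.integrate.quad_vec`, an adaptive routine for array-valued integrands; all components share one set of subdivision points, so it sits between the loop above and full per-parameter adaptivity. A minimal sketch:

```python
import numpy as np
from scipy.integrate import quad_vec

p = np.arange(0, 100, 0.01)

# The integrand returns a whole array for each scalar x, so a single
# adaptive run integrates all 10000 functions at once.
result, error = quad_vec(lambda x: (p + 1) * x**p, 0, 1)
print(result.shape, np.abs(result - 1).max())
```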

The last hope of non-adaptive methods is Gaussian quadrature, which is implemented in SciPy as `fixed_quad` (“fixed” referring to the number of evaluation points used). With 300 evaluation points, the integration takes 0.263 seconds (excluding the computation of nodes and weights, which can be cached). This is twice as fast as the loop with `quad`, and more accurate for large p, but sadly not so accurate for small p. As said before, x^p with p near zero is really a showcase for adaptive methods.
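A minimal sketch of the `fixed_quad` approach described above (variable names follow the earlier snippets): the integrand broadcasts over a column of p values, and `fixed_quad` sums along the last axis, so a single call integrates all rows at once.

```python
import numpy as np
from scipy.integrate import fixed_quad

p = np.arange(0, 100, 0.01).reshape(-1, 1)

# n=300 Gauss-Legendre nodes; broadcasting p (10000, 1) against the
# nodes (300,) gives a (10000, 300) array, summed along the last axis.
result, _ = fixed_quad(lambda x: (p + 1) * x**p, 0, 1, n=300)
print(np.abs(result - 1).max())
```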