## Sequential characterization and removability for uniform continuity

There is a useful sequential characterization of continuity in metric spaces. Let ${f\colon X\to Y}$ be a map between metric spaces. If for every convergent sequence ${x_n\to p}$ in ${X}$ we have ${f(x_n)\to f(p)}$ in ${Y}$, then ${f}$ is continuous. And the converse is true as well.

Uniformly continuous functions also have a useful property related to sequences: if ${f\colon X\to Y}$ is uniformly continuous and ${\{x_n\}}$ is a Cauchy sequence in ${X}$, then ${\{f(x_n)\}}$ is a Cauchy sequence in ${Y}$. However, this property does not characterize uniform continuity. For example, if ${X=Y=\mathbb R}$, then Cauchy sequences are the same as convergent sequences, and therefore any continuous function preserves the Cauchy-ness of sequences—it does not have to be uniformly continuous.

Let us say that two sequences ${\{x_n\}}$ and ${\{x_n'\}}$ are equivalent if the distance from ${x_n}$ to ${x_n'}$ tends to zero. The sequential characterization of uniform continuity is: ${f\colon X\to Y}$ is uniformly continuous if and only if for any two equivalent sequences ${\{x_n\}}$ and ${\{x_n'\}}$ in ${X}$, their images ${\{f(x_n)\}}$ and ${\{f(x_n')\}}$ are equivalent in ${Y}$. The proof of this claim is straightforward.

In the special case when ${\{x_n'\}}$ is a constant sequence, the sequential characterization of uniform continuity reduces to the sequential characterization of continuity.

A typical example of the use of this characterization is the proof that a continuous function on a compact set is uniformly continuous: pick two equivalent sequences with non-equivalent images, pass to suitable subsequences, get a contradiction with continuity.

Here is a different example. To state it, introduce the notation ${N_r(p) = \{x\colon d(x, p).

Removability Theorem. Let ${f\colon X\to Y}$ be continuous. Suppose that there exists ${p\in X}$ such that for every ${r>0}$, the restriction of ${f}$ to ${X\setminus N_r(p)}$ is uniformly continuous. Then ${f}$ is uniformly continuous on ${X}$.

This is a removability result because from having a certain property on subsets of ${X}$ we get it on all of ${X}$. To demonstrate its use, let ${X=[0, \infty)}$ with the standard metric, ${p=0}$, and ${f(x)=\sqrt{x}}$. The uniform continuity of ${f}$ on ${X\setminus N_r(p)}$ follows immediately from the derivative ${f'}$ being bounded on that set (so, ${f}$ is Lipschitz continuous there). By the removability theorem, ${f}$ is uniformly continuous on ${X}$.

Before proving the theorem, let us restate the sequential characterization in an equivalent form (up to passing to subsequences): ${f\colon X\to Y}$ is uniformly continuous if and only if for any two equivalent sequences ${\{x_n\}}$ and ${\{x_n'\}}$ there exist equivalent subsequences ${\{f(x_{n_k})\}}$ and ${\{f(x_{n_k}')\}}$, with the same choice of indices ${n_k}$ in both.

Proof of the theorem. Suppose ${\{x_n\}}$ and ${\{x_n'\}}$ are equivalent sequences in ${X}$. If ${x_n\to p}$, then ${x_n'\to p}$ as well, and the continuity of ${f}$ at ${p}$ implies that both ${\{f(x_n)\}}$ and ${\{f(x_n')\}}$ converge to ${p}$, hence are equivalent sequences. If ${x_n\not\to p}$, then by passing to a subsequence we can achieve ${d(x_n, p)\ge r }$ for some constant ${r>0}$. By the triangle inequality, for sufficiently large ${n}$ we have ${d(x_n', p)\ge r/2}$. Since ${f}$ is uniformly continuous on ${X\setminus N_{r/2}(p)}$, it follows that ${\{f(x_n)\}}$ and ${\{f(x_n')\}}$ are equivalent.

## Recursive randomness of reals: summing a random decreasing sequence

In a comment to Recursive randomness of integers Rahul pointed out a continuous version of the same problem: pick ${x_0}$ uniformly in ${[0,1]}$, then ${x_1}$ uniformly in ${[0,x_0]}$, then ${x_2}$ uniformly in ${[0, x_1]}$, etc. What is the distribution of the sum ${S=\sum_{n=0}^\infty x_n}$?

The continuous version turns out to be easier to analyse. To begin with, it’s equivalent to picking uniformly distributed, independent ${y_n\in [0,1]}$ and letting ${x_n = y_0y_1\cdots y_n}$. Then the sum is

${\displaystyle y_0+y_0y_1 + y_0y_1y_2 + \cdots }$

which can be written as

${\displaystyle y_0(1+y_1(1+y_2(1+\cdots )))}$

So, ${S}$ is a stationary point of the random process ${X\mapsto (X+1)U}$ where ${U}$ is uniformly distributed in ${[0,1]}$. Simply put, ${S}$ and ${(S+1)U}$ have the same distribution. This yields the value of ${E[S]}$ in a much simpler way than in the previous post:

${E[S]=E[(S+1)U] = (E[S] + 1) E[U] = (E[S] + 1)/2}$

hence ${E[S]=1}$.

We also get an equation for the cumulative distribution function ${C(t) = P[S\le t]}$. Indeed,

${\displaystyle P[S\le t] = P[(S+1)U \le t] = P[S \le t/U-1]}$

The latter probability is ${\int_0^1 P[S\le t/u-1]\,du = \int_0^1 C(t/u-1)\,du}$. Conclusion: ${C(t) = \int_0^1 C(t/u-1)\,du}$. Differentiate to get an equation for the probability density function ${p(t)}$, namely ${p(t) = \int_0^1 p(t/u-1)\,du/u}$. It’s convenient to change the variable of integration to ${v = t/u-1}$, which leads to

${\displaystyle p(t) = \int_{t-1}^\infty p(v)\,\frac{dv}{v+1}}$

Another differentiation turns the integral equation into a delay differential equation,

${\displaystyle p'(t) = - \frac{p(t-1)}{t}}$

Looks pretty simple, doesn’t it? Since the density is zero for negative arguments, it is constant on ${[0,1]}$. This constant, which I’ll denote ${\gamma}$, is ${\int_{0}^\infty p(v)\,\frac{dv}{v+1}}$, or simply ${E[1/(S+1)]}$. I couldn’t get an analytic formula for ${\gamma}$. My attempt was ${E[1/(S+1)] = \sum_{n=0}^\infty (-1)^n M_n}$ where ${M_n=E[S^n]}$ are the moments of ${S}$. The moments can be computed recursively using ${E[S^n] = E[(S+1)^n]E[U^n]}$, which yields

${\displaystyle M_n=\frac{1}{n} \sum_{k=0}^{n-1} \binom{n}{k}M_k}$

The first few moments, starting with ${M_0}$, are 1, 1, 3/2, 17/6, 19/3, 81/5, 8351/80… Unfortunately the series ${\sum_{n=0}^\infty (-1)^n M_n}$ diverges, so this approach seems doomed. Numerically ${\gamma \approx 0.5614}$ which is not far from the Euler-Mascheroni constant, hence the choice of notation.

On the interval (1,2) we have ${p'(t) = -\gamma/t}$, hence

${p(t) = \gamma(1-\log t)}$ for ${1 \le t \le 2}$.

The DDE gets harder to integrate after that… on the interval ${[2,3]}$ the solution already involves the dilogarithm (Spence’s function):

${\displaystyle p(t) = \gamma(1+\pi^2/12 - \log t + \log(t-1)\log t + \mathrm{Spence}\,(t))}$

following SciPy’s convention for Spence. This is as far as I went… but here is an experimental confirmation of the formulas obtained so far (link to full size image).

To generate a sample from distribution S, I begin with a bunch of zeros and repeat “add 1, multiply by U[0,1]” many times. That’s it.

import numpy as np
import matplotlib.pyplot as plt
trials = 10000000
terms = 10000
x = np.zeros(shape=(trials,))
for _ in range(terms):
np.multiply(x+1, np.random.uniform(size=(trials,)), out=x)
_ = plt.hist(x, bins=1000, normed=True)
plt.show()

I still want to know the exact value of ${\gamma}$… after all, it’s also the probability that the sum of our random decreasing sequence is less than 1.

### Update

The constant I called “${\gamma}$” is in fact ${\exp(-\gamma)}$ where ${\gamma}$ is indeed Euler’s constant… This is what I learned from the Inverse Symbolic Calculator after solving the DDE (with initial value 1) numerically, and calculating the integral of the solution. From there, it did not take long to find that

Oh well. At least I practiced solving delay differential equations in Python. There is no built-in method in SciPy for that, and although there are some modules for DDE out there, I decided to roll my own. The logic is straightforward: solve the ODE on an interval of length 1, then build an interpolating spline out of the numeric solution and use it as the right hand side in the ODE, repeat. I used Romberg’s method for integrating the solution; the integration is done separately on each interval [k, k+1] because of the lack of smoothness at the integers.

import numpy as np
from scipy.integrate import odeint, romb
from scipy.interpolate import interp1d
numpoints = 2**12 + 1
solution = [lambda x: 1]
integrals = [1]
for k in range(1, 15):
y0 = solution[k-1](k)
t = np.linspace(k, k+1, numpoints)
rhs = lambda y, x: -solution[k-1](np.clip(x-1, k-1, k))/x
y = odeint(rhs, y0, t, atol=1e-15, rtol=1e-13).squeeze()
solution.append(interp1d(t, y, kind='cubic', assume_sorted=True))
integrals.append(romb(y, dx=1/(numpoints-1)))
total_integral = sum(integrals)
print("{:.15f}".format(1/total_integral))


As a byproduct, the program found the probabilities of the random sum being in each integer interval:

• 56.15% in [0,1]
• 34.46% in [1,2]
• 8.19% in [2,3]
• 1.1% in [3,4]
• 0.1% in [4,5]
• less than 0.01% chance of being greater than 5

## Multipliers preserving series convergence

The Comparison Test shows that if ${\sum a_n}$ is an absolutely convergent series, and ${\{b_n\}}$ is a bounded sequence, then ${\sum a_nb_n}$ converges absolutely. Indeed, ${|a_nb_n|\le M|a_n|}$ where ${M}$ is such that ${|b_n|\le M}$ for all ${n}$.

With a bit more effort one can prove that this property of preserving absolute convergence is equivalent to being a bounded sequence. Indeed, if ${\{b_n\}}$ is unbounded, then for every ${k}$ there is ${n_k}$ such that ${|b_{n_k}|\ge 2^k}$. We can ensure ${n_k > n_{k-1}}$ since there are infinitely many candidates for ${n_k}$. Define ${a_n=2^{-k}}$ if ${n = n_k}$ for some ${k}$, and ${a_n=0}$ otherwise. Then ${\sum a_n}$ converges but ${\sum a_nb_n}$ diverges because its terms do not approach zero.

What if we drop “absolutely”? Let’s say that a sequence ${\{b_n\}}$ preserves convergence of series if for every convergent series ${\sum a_n}$, the series ${\sum a_n b_n}$ also converges. Being bounded doesn’t imply this property: for example, ${b_n=(-1)^n}$ does not preserve convergence of the series ${\sum (-1)^n/n}$.

Theorem. A sequence ${\{b_n\}}$ preserves convergence of series if and only if it has bounded variation, meaning ${\sum |b_n-b_{n+1}| }$ converges.

For brevity, let’s say that ${\{b_n\}}$ is BV. Every bounded monotone sequence is BV because the sum ${\sum |b_n-b_{n+1}| }$ telescopes. On the other hand, ${(-1)^n}$ is not BV, and neither is ${(-1)^n/n}$. But ${(-1)^n/n^p}$ is for ${p>1}$. The following lemma describes the structure of BV sequences.

Lemma 1. A sequence ${\{b_n\}}$ is BV if and only if there are two increasing bounded sequences ${\{c_n\}}$ and ${\{d_n\}}$ such that ${b_n=c_n-d_n}$ for all ${n}$.

Proof. If such ${c_n,d_n}$ exist, then by the triangle inequality $\displaystyle \sum_{n=1}^N |b_n-b_{n+1}| = \sum_{n=1}^N (|c_n-c_{n +1}| + |d_{n+1}-d_n|) = \sum_{n=1}^N (c_{n+1}-c_n) + \sum_{n=1}^N (d_{n+1}-d_n)$ and the latter sums telescope to ${c_{N+1}-c_1 + d_{N+1}-d_1}$ which has a limit as ${N\rightarrow\infty}$ since bounded monotone sequences converge.

Conversely, suppose ${\{b_n\}}$ is BV. Let ${c_n = \sum_{k=1}^{n-1}|b_k-b_{k+1}|}$, understanding that ${c_1=0}$. By construction, the sequence ${\{c_n\}}$ is increasing and bounded. Also let ${d_n=c_n-b_n}$; as a difference of bounded sequences, this is bounded too. Finally,

$\displaystyle d_{n+1}-d_n = c_{n+1} -c_n + b_n - b_{n+1} = |b_n-b_{n+1}|+ b_n - b_{n+1} \ge 0$

which shows that ${\{d_n\}}$ is increasing.

To construct a suitable example where ${\sum a_nb_n}$ diverges, we need another lemma.

Lemma 2. If a series of nonnegative terms ${\sum A_n}$ diverges, then there is a sequence ${c_n\rightarrow 0}$ such that the series ${\sum c_n A_n}$ still diverges.

Proof. Let ${s_n = A_1+\dots+A_n}$ (partial sums); then ${A_n=s_n-s_{n-1}}$. The sequence ${\sqrt{s_n}}$ tends to infinity, but slower than ${s_n}$ itself. Let ${c_n=1/(\sqrt{s_n}+\sqrt{s_{n-1}})}$, so that ${c_nA_n = \sqrt{s_n}-\sqrt{s_{n-1}}}$, and we are done: the partial sums of ${\sum c_nA_n}$ telescope to ${\sqrt{s_n}}$.

Proof of the theorem, Sufficiency part. Suppose ${\{b_n\}}$ is BV. Using Lemma 1, write ${b_n=c_n-d_n}$. Since ${a_nb_n = a_nc_n - a_n d_n}$, it suffices to prove that ${\sum a_nc_n}$ and ${\sum a_nd_n}$ converge. Consider the first one; the proof for the other is the same. Let ${L=\lim c_n}$ and write ${a_nc_n = La_n - a_n(L-c_n)}$. Here ${\sum La_n}$ converges as a constant multiple of ${\sum a_n}$. Also, ${\sum a_n(L-c_n)}$ converges by the Dirichlet test: the partial sums of ${\sum a_n}$ are bounded, and ${L-c_n}$ decreases to zero.

Proof of the theorem, Necessity part. Suppose ${\{b_n\}}$ is not BV. The goal is to find a convergent series ${\sum a_n}$ such that ${\sum a_nb_n}$ diverges. If ${\{b_n\}}$ is not bounded, then we can proceed as in the case of absolute convergence, considered above. So let’s assume ${\{b_n\}}$ is bounded.

Since ${\sum_{n=1}^\infty |b_n-b_{n+1}|}$ diverges, by Lemma 2 there exists ${\{c_n\} }$ such that ${c_n\rightarrow 0}$ and ${\sum_{n=1}^\infty c_n|b_n-b_{n+1}|}$ diverges. Let ${d_n}$ be such that ${d_n(b_n-b_{n+1}) = c_n|b_n-b_{n+1}|}$; that is, ${d_n}$ differs from ${c_n}$ only by sign. In particular, ${d_n\rightarrow 0}$. Summation by parts yields

$\displaystyle { \sum_{n=1}^N d_n(b_n-b_{n+1}) = \sum_{n=2}^N (d_{n}-d_{n-1})b_n + d_1b_1-d_Nb_{N+1} }$

As ${N\rightarrow\infty}$, the left hand side of does not have a limit since ${\sum d_n(b_n-b_{n+1})}$ diverges. On the other hand, ${d_1b_1-d_Nb_{N+1}\rightarrow d_1b_1}$ since ${d_N\rightarrow 0}$ while ${b_{N+1}}$ stays bounded. Therefore, ${\lim_{N\rightarrow\infty} \sum_{n=2}^N (d_{n}-d_{n-1})b_n}$ does not exist.

Let ${a_n= d_n-d_{n-1}}$. The series ${\sum a_n}$ converges (by telescoping, since ${\lim_{n\rightarrow\infty} d_n}$ exists) but ${\sum a_nb_n}$ diverges, as shown above.

In terms of functional analysis the preservation of absolute convergence is essentially the statement that ${(\ell_1)^* = \ell_\infty}$. Notably, the ${\ell_\infty}$ norm of ${\{b_n\}}$, i.e., ${\sup |b_n|}$, is the number that controls by how much ${\sum |a_nb_n|}$ can exceed ${\sum |a_n|}$.

I don’t have a similar quantitative statement for the case of convergence. The BV space has a natural norm too, ${\sum |b_n-b_{n-1}|}$ (interpreting ${b_0}$ as ${0}$), but it’s not obvious how to relate this norm to the values of the sums ${\sum a_n}$ and ${\sum a_nb_n}$.

## Iterating the logistic map: limsup of nonperiodic orbits

Last time we found that when a sequence with ${x_1\in (0,1)}$ and ${x_{n+1} = 4x_n(1-x_n)}$ does not become periodic, its upper limit ${\limsup x_n}$ must be at least ${\approx 0.925}$. This time we’ll see that ${\limsup x_n}$ can be as low as ${(2+\sqrt{3})/4\approx 0.933}$ and determine for which ${x_1}$ it is equal to 1.

The quadratic polynomial ${f(x)=4x(1-x)}$ maps the interval ${[0,1]}$ onto itself. Since the linear function ${g(x) = 1-2x}$ maps ${[0,1]}$ onto ${[-1,1]}$, it follows that the composition ${h=g\circ f\circ g^{-1}}$ maps ${[-1,1]}$ onto ${[-1,1]}$. This composition is easy to compute: ${h(x) = 2x^2-1 }$.

We want to know whether the iteration of ${f}$, starting from ${x_1}$, produces numbers arbitrarily close to ${1}$. Since ${f\circ f \circ \cdots \circ f = g^{-1}\circ h \circ h \circ \cdots \circ h\circ g}$ the goal is equivalent to finding whether the iteration of ${h}$, starting from ${g(x_1)}$, produces numbers arbitrarily close to ${g(1) = -1}$. To shorten formulas, let’s write ${h_n}$ for the ${n}$th iterate of ${h}$, for example, ${h_3 = h\circ h\circ h}$.

So far we traded one quadratic polynomial ${f}$ for another, ${h}$. But ${h}$ satisfies a nice identity: ${h(\cos t)=2\cos^2 t-1 = \cos(2t)}$, hence ${h_n(\cos t) = \cos (2^n t)}$ for all ${n\in\mathbb N}$. It’s convenient to introduce ${\alpha = \frac{1}{\pi}\cos^{-1}(1-2x_1)}$, so that ${ h_n(g(x_1)) = h_n(\cos 2\pi \alpha ) = \cos(2^n\cdot 2\pi \alpha) }$.

The problem becomes to determine whether the numbers ${2^n\cdot 2\pi \alpha}$ come arbitrarily close to ${\pi}$, modulo an integer multiple of ${2\pi}$. Dividing by ${2\pi}$ rephrases this as: does the fractional part of ${2^n \alpha}$ come arbitrarily close to ${1/2}$?

A number that is close to ${1/2}$ has the binary expansion beginning either with ${0.01111111\dots}$ or with ${0.10000000\dots}$. Since the binary expansion of ${2^n\alpha}$ is just the binary expansion of ${\alpha}$ shifted ${n}$ digits to the left, the property ${\limsup x_n=1}$ is equivalent to the following: for every ${k\in\mathbb N}$ the binary expansion of ${\alpha}$ has infinitely many groups of the form “1 followed by k zeros” or “0 followed by k ones”.

A periodic expansion cannot have the above property; this, ${\alpha}$ must be irrational. The property described above can then be simplified to “irrational and has arbitrarily long runs of the same digit”, since a long run of ${0}$s will be preceded by a ${1}$, and vice versa.

For example, combining the pairs 01 and 10 in some non-periodic way, we get an irrational number ${\alpha}$ such that the fractional part of ${2^n\alpha}$ does not get any closer to 1/2 than ${0.01\overline{10}_2 = 5/12}$ or ${0.10\overline{01}_2 = 7/12}$. Hence, ${\cos 2^n 2\pi \alpha \ge -\sqrt{3}/2}$, which leads to the upper bound ${x_n\le (2+\sqrt{3})/4\approx 0.933}$ for the sequence with the starting value ${x_1=(1-\cos\pi\alpha)/2}$.

Let us summarize the above observations about ${\limsup x_n}$.

Theorem: ${\limsup x_n=1}$ if and only if (A) the number ${\alpha = \frac{1}{\pi}\cos^{-1}(1-2x_1)}$ is irrational, and (B) the binary expansion of ${\alpha}$ has arbitrarily long runs of the same digit.

Intuitively, one expects that a number that satisfies (A) will also satisfy (B) unless it was constructed specifically to fail (B). But to verify that (B) holds for a given number is not an easy task.

As a bonus, let’s prove that for every rational number ${y\in (-1,1)}$, except 0, 1/2 and -1/2, the number ${\alpha = \frac{1}{\pi}\cos^{-1}y}$ is irrational. This will imply, in particular, that ${x_1=1/3}$ yields a non-periodic sequence. The proof follows a post by Robert Israel and requires a lemma (which could be replaced with an appeal to Chebyshev polynomials, but the lemma keeps things self-contained).

Lemma. For every ${n\in \mathbb N}$ there exists a monic polynomial ${P_n}$ with integer coefficients such that ${P_n(2 \cos t) = 2\cos nt }$ for all ${t}$.

Proof. Induction, the base case ${n=1}$ being ${P_1(x)=x}$. Assuming the result for integers ${\le n}$, we have ${2 \cos (n+1)t = e^{i(n+1)t} + e^{-i(n+1)t} }$ ${ = (e^{int} + e^{-int})(e^{it} + e^{-it}) - (e^{i(n-1)t} + e^{-i(n-1)t}) }$ ${ = P_n(2 \cos t) (2\cos t) - P_{n-1}(2\cos t) }$
which is a monic polynomial of ${2\cos t}$. ${\Box}$

Suppose that there exists ${n}$ such that ${n\alpha \in\mathbb Z}$. Then ${2\cos(\pi n\alpha)=\pm 2}$. By the lemma, this implies ${P_n(2\cos(\pi \alpha)) =\pm 2}$, that is ${P_n(2y)=\pm 2}$. Since ${2y}$ is a rational root of a monic polynomial with integer coefficients, the Rational Root Theorem implies that it is an integer. ${\Box}$

## A limsup exercise: iterating the logistic map

Define the sequence ${\{x_n\}}$ as follows: ${x_1=1/3}$ and ${x_{n+1} = 4x_n(1-x_n)}$ for ${n=1,2,\dots}$. What can we say about its behavior as ${n\rightarrow\infty}$?

The logistic map ${f(x)=4x(1-x)}$ leaves the interval [0,1] invariant (as a set), so ${0\le x_n\le 1}$ for all ${n}$. There are two fixed points: 0 and 3/4.

Can ${x_n}$ ever be 0? If ${n}$ is the first index this happens, then ${x_{n-1}}$ must be ${1}$. Working backwards, we find ${x_{n-2}=1/2}$, and ${x_{n-3}\in \{1/2 \pm \sqrt{2}/4\}}$. But this is impossible since all elements of the sequence are rational. Similarly, if ${n}$ is the first index when ${x_n = 3/4}$, then ${x_{n-1}=1/4}$ and ${x_{n-2}\in \{1/2\pm \sqrt{3}/4\}}$, a contradiction again. Thus, the sequence never stabilizes.

If ${x_n}$ had a limit, it would have to be one of the two fixed points. But both are repelling: ${f'(x) = 4 - 8x}$, so ${|f'(0)|=4>1 }$ and ${|f'(3/4)| = 2 > 1}$. This means that a small nonzero distance to a fixed point will increase under iteration. The only way to converge to a repelling fixed point is to hit it directly, but we already know this does not happen. So the sequence ${\{x_n\}}$ does not converge.

But we can still consider its upper and lower limits. Let’s try to estimate ${S = \limsup x_n}$ from below. Since ${f(x)\ge x}$ for ${x\in [0,3/4]}$, the sequence ${\{x_n\}}$ increases as long as ${x_n\le 3/4}$. Since we know it doesn’t have a limit, it must eventually break this pattern, and therefore exceed 3/4. Thus, ${S\ge 3/4}$.

This can be improved. The second iterate ${f_2(x)=f(f(x))}$ satisfies ${f_2(x)\ge x}$ for ${x}$ between ${3/4}$ and ${a = (5+\sqrt{5})/8 \approx 0.9}$. So, once ${x_n>3/4}$ (which, by above, happens infinitely often), the subsequence ${x_n, x_{n+2}, x_{n+4},\dots}$ increases until it reaches ${a}$. Hence ${S\ge a}$.

The bound ${\limsup x_n\ge a}$ is best possible if the only information about ${x_1}$ is that the sequence ${x_n}$ does not converge. Indeed, ${a}$ is a periodic point of ${f}$, with the corresponding iteration sequence ${\{(5+ (-1)^n\sqrt{5})/8\}}$.

Further improvement is possible if we recall that our sequence is rational and hence cannot hit ${a}$ exactly. By doubling the number of iterations (so that the iterate also fixes ${a}$ but also has positive derivative there) we arrive at the fourth iterate ${f_4}$. Then ${f_4(x)\ge x}$ for ${a\le x\le b}$, where ${b }$ is the next root of ${f_4(x)-x}$ after ${a}$, approximately ${0.925}$. Hence ${S\ge b}$.

This is a colorful illustration of the estimation process (made with Sage): we are covering the line ${y=x}$ with the iterates of ${f}$, so that each subsequent one rises above the line the moment the previous one falls. This improves the lower bound on ${S}$ from 0.75 to 0.9 to 0.92.

Although this process can be continued, the gains diminish so rapidly that it seems unlikely one can get to 1 in this way. In fact, one cannot because we are not using any properties of ${x_1}$ other than “the sequence ${x_n}$ is not periodic.” And it’s not true that ${\limsup x_n = 1}$ for every non-periodic orbit of ${f}$. Let’s return to this later.

## Sequences and nets

According to the (Banach-)Alaoglu theorem, for any Banach space $X$ the closed unit ball of $X^*$ is compact in the weak* topology (the weakest/coarsest topology that makes all evaluation functionals $f\mapsto f(x)$ continuous).

For example, take $X=\ell_{\infty}$. The dual space $\ell_{\infty}^*$ contains an isometric copy of $\ell_1$ because $\ell_\infty^*=\ell_1^{**}$. The sequence $x_n=n^{-1}(e_1+\dots+e_n)$, where $e_n$ are the standard basis vectors, is contained in the unit sphere of $\ell_1$. Should $(x_n)$ have a weak*-convergent subsequence? Maybe it should, but it does not.

Indeed, take any subsequence $(x_{n_k})$. If necessary, choose a further subsequence so that $n_{k+1}\ge 3n_k$ for all $k$. Define

$\displaystyle y=\sum_{k}(-1)^k\sum_{n_{k-1}< j\le n_k} e_j$

where we set $n_0=0$. Two things to notice here: (1) $y\in \ell_{\infty}$; and (2) at least 2/3 of the coefficients of $e_j$, $1\le j\le n_k$, have the sign $(-1)^k$. Hence, $\langle x_{n_k},y\rangle\le -1/3$ when $k$ is odd and $\langle x_{n_k},y\rangle\ge 1/3$ when $k$ is even. This shows that $(x_{n_k})$ does not converge in the weak* topology.

The above does not contradict the Banach-Alaoglu theorem. Since $\ell_\infty$ is not separable, the weak* topology on the unit ball of its dual is not metrizable. The compactness can be stated in terms of nets instead of sequences: every bounded net in $X^*$ has a convergent subnet. In particular, the sequence $(x_n)$ has a convergent subnet (which is not a sequence). I personally find subnets a recipe for erroneous arguments. So I prefer to say: the infinite set $\{x_n\}$ has a cluster point $x$; namely, every neighborhood of $x$ contains some $x_n$. You can use the reverse inclusion of neighborhoods to define a subnet, but I’d rather not to. Everything we want to know about $x$ can be easily proved from the cluster point definition. For example,

• $\|x\|_{X^*}=1$
• $\langle x, 1_{\infty}\rangle =1$ where $1_{\infty}$ stands for the $\ell_\infty$ vector with all coordinates 1.
• $\langle x, y\rangle = \langle x, Sy\rangle$ where $S$ is the shift operator on $\ell_\infty$