When the digits of pi go to 11

There is an upward trend in the digits of {\pi}. I just found it using Maple.

X := [0, 1, 2, 3, 4, 5, 6, 7, 8]:
Y := [3, 1, 4, 1, 5, 9, 2, 6, 5]:
LinearFit([1, n], X, Y, n);

2.20000000000000+.450000000000000*n

Here the digits are enumerated beginning with the {0}th, which is {3}. The regression line {y = 2.2 + 0.45n} predicts that the {20}th digit of {\pi} is approximately {11}.

It goes to 11

But maybe my data set is too small. Let's throw in one more digit; that ought to be enough. The next digit turns out to be {3}, and it hurts my trend: the new regression line {y=2.67+0.27n} has a smaller slope, and it crosses the old one at {n\approx 2.7}.
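
Here is the same computation as a quick sketch in R, using lm in place of Maple's LinearFit:

y9  <- c(3, 1, 4, 1, 5, 9, 2, 6, 5)    # first nine digits of pi, starting from the 0th digit (3)
y10 <- c(y9, 3)                         # one more digit appended
n9  <- 0:8
n10 <- 0:9

fit9  <- lm(y9  ~ n9)                   # gives y = 2.2  + 0.45 n
fit10 <- lm(y10 ~ n10)                  # gives y = 2.67 + 0.27 n

# x-coordinate where the two regression lines cross: about 2.67
(coef(fit10)[1] - coef(fit9)[1]) / (coef(fit9)[2] - coef(fit10)[2])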

Next digit, not as good

But we all know that a {3} can easily be changed to an {8}. The old “professor, you totaled the scores on my exam incorrectly” trick. Finding a moment when none of the {\pi}-obsessed people are looking, I change the decimal expansion of {\pi} to {3.141592658\dots}. The new trend looks even better than the old one: the regression line becomes steeper, and it crosses the old one at the point {n\approx 2.7}.

Much better!

What, {2.7} again? Is this a coincidence? I try changing the {9}th digit to other numbers, and plot the resulting regression lines.

What is going on?

All intersect at the same spot. The hidden magic of {\pi} is uncovered.
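
The picture can be checked without any plotting; a short R sketch loops over the ten candidate digits and computes where each new regression line crosses the original nine-digit line:

n9   <- 0:8
y9   <- c(3, 1, 4, 1, 5, 9, 2, 6, 5)
n10  <- 0:9
fit9 <- lm(y9 ~ n9)

# For each possible 9th digit d, fit the ten-point line and intersect it
# with the original nine-point line.
sapply(0:9, function(d) {
  fit10 <- lm(c(y9, d) ~ n10)
  (coef(fit10)[1] - coef(fit9)[1]) / (coef(fit9)[2] - coef(fit10)[2])
})
# all ten values are the same, about 2.67, whatever digit is appended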

(Thanks to Vincent Fatica for the idea of this post.)

5 thoughts on “When the digits of pi go to 11”

  1. To be more precise, Vincent Fatica observed (empirically, and so conjectured) that if the regression line for N data points and the regression line for N+1 data points (where the additional point (x, y) is added to the original data) meet in a single point, i.e., the two lines are not parallel, then that point is independent of the value of y.

  2. It is interesting to examine the marginal effect of an additional point on the regression line. Let the initial independent variables be x_i and the response variables y_i, with i ranging from 1 to n. The regression line goes through the center of mass (\bar{x}, \bar{y}); this, together with its slope, completely determines the line. Setting the first derivatives to 0, we get the analytic formula for the slope, k=\frac{E(xy)-\bar{x}\bar{y}}{E(x^2)-\bar{x}^2}. The regression line is thus \frac{y-\bar{y}}{x-\bar{x}}=k. Now, adding another point (x_{n+1}, y_{n+1}), I get the perturbed slope k'=\frac{E(xy)-\bar{x}\bar{y}+\frac{(\bar{x}-x_{n+1})(\bar{y}-y_{n+1})}{n+1}}{E(x^2)-\bar{x}^2+\frac{(\bar{x}-x_{n+1})^2}{n+1}}. This is intuitive: if the new point (x_{n+1}, y_{n+1}) is reasonably close to the original center of mass, or if the sample size n is large enough, then the effect of the additional point on the slope of the regression line is negligible. Because the new line cuts through the new center of mass (\bar{x}', \bar{y}')=(\frac{n\bar{x}+x_{n+1}}{n+1}, \frac{n\bar{y}+y_{n+1}}{n+1}), we can write it as \frac{y-\bar{y}'}{x-\bar{x}'}=k'. The lines intersect at \displaystyle x=\frac{k\bar{x}-k'\bar{x}'+\bar{y}'-\bar{y}}{k-k'}. However, I have trouble seeing why this is independent of y_{n+1} (one way to see the cancellation is sketched after this thread).

  3. Proof by simulation? 🙂

    I wrote the following R snippet to simulate the scenario using Normal random variates.

     
    #-------------------------------------------------------------------------
    # A function that will return the x-coordinate of the intersection
    # for the new (x,y) pair
    #-------------------------------------------------------------------------
    SequentialSimpleLinRegressionSimulate <- 
    function(NewX, NewY, N=100, muX=0, sigX=100, muY=0, sigY=100, SEED=123)
    	{
    	set.seed(SEED)
    	OriginalXvec <- rnorm(N, mean = muX, sd=sigX)
    	OriginalYvec <- rnorm(N, mean = muY, sd=sigY)
    	NewXvec <- c(OriginalXvec, NewX)
    	NewYvec <- c(OriginalYvec, NewY)
    	RegrModelOriginalData <- lm(OriginalYvec ~ OriginalXvec)
    	Intercept1 <- 
    		coef(summary(RegrModelOriginalData))["(Intercept)","Estimate"]
    	Slope1 <- 
    		coef(summary(RegrModelOriginalData))["OriginalXvec","Estimate"]
    	RegrModelNewData <- lm(NewYvec ~ NewXvec)
    	Intercept2 <- 
    		coef(summary(RegrModelNewData))["(Intercept)","Estimate"]
    	Slope2 <- 
    		coef(summary(RegrModelNewData))["NewXvec","Estimate"]
    	xCoordLinesIntersect <- (Intercept2-Intercept1)/(Slope1-Slope2)
    	return(xCoordLinesIntersect)
    	}
    #-------------------------------------------------------------------------
    # Fixing x=0 for example, vary y from -100K to +100K by 5K's
    #-------------------------------------------------------------------------
    for (TestY in seq(-100000, 100000, 5000))
    	{
    	print(
    	c(TestY,
    	SequentialSimpleLinRegressionSimulate(0,TestY)))
    	}
    #-------------------------------------------------------------------------
    

    Following is the output:
    -100000 921.4815
    -95000 921.4815
    -90000 921.4815
    -85000 921.4815
    -80000 921.4815
    -75000 921.4815
    -70000 921.4815
    -65000 921.4815
    -60000 921.4815
    -55000 921.4815
    -50000 921.4815
    -45000 921.4815
    -40000 921.4815
    -35000 921.4815
    -30000 921.4815
    -25000 921.4815
    -20000 921.4815
    -15000 921.4815
    -10000 921.4815
    -5000 921.4815
    0 921.4815
    5000 921.4815
    10000 921.4815
    15000 921.4815
    20000 921.4815
    25000 921.4815
    30000 921.4815
    35000 921.4815
    40000 921.4815
    45000 921.4815
    50000 921.4815
    55000 921.4815
    60000 921.4815
    65000 921.4815
    70000 921.4815
    75000 921.4815
    80000 921.4815
    85000 921.4815
    90000 921.4815
    95000 921.4815
    100000 921.4815

  4. If Maple is correct, the x coordinate of the intersection is
    - (ax - c) / (a - nx)
    where x is the x coordinate of the added (n+1st) point, a is Sum(x,1..n) and c is Sum(x^2,1..n).
    That says the x coordinate of the intersection is independent of y.
    ftp://lucky.syr.edu/math/regress.zip contains a DOCX file and an MW file (neither highly polished) showing what I did.

  5. As Prof. Hyune-Ju Kim points out, this “conjecture” follows from Cook (Detection of Influential Observation in Linear Regression, Technometrics, 19(1), 1977, pp. 15-18). Combining equations (5) and (6), that paper shows at the bottom left of p. 16 that, with and without an observation i, the difference in parameter estimates, (\widehat{\alpha}-\widehat{\alpha}_{(-i)}, \widehat{\beta}-\widehat{\beta}_{(-i)}), is r_i times a function of the x's, where r_i is the residual. The residual term gets cancelled when the x-coordinate of the intersection (intercept difference divided by slope difference) is computed. This leaves a function of the x-values.
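
A minimal sketch of the cancellation in comment 2, writing (u, v) for the added point and S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2, S_{xy}=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), so that k=S_{xy}/S_{xx}:

\displaystyle k'=\frac{S_{xy}+\frac{n}{n+1}(u-\bar{x})(v-\bar{y})}{S_{xx}+\frac{n}{n+1}(u-\bar{x})^2}, \qquad \bar{x}'-\bar{x}=\frac{u-\bar{x}}{n+1}, \qquad \bar{y}'-\bar{y}=\frac{v-\bar{y}}{n+1}.

Equating \bar{y}+k(x-\bar{x}) with \bar{y}'+k'(x-\bar{x}') gives (k-k')(x-\bar{x})=(\bar{y}'-\bar{y})-k'(\bar{x}'-\bar{x}); after substituting the expressions above, the factor (u-\bar{x})S_{xy}-(v-\bar{y})S_{xx} appears in both the numerator and the denominator and cancels, leaving

\displaystyle x=\bar{x}-\frac{S_{xx}}{n(u-\bar{x})},

which involves only the x-values. With a=\sum x_i and c=\sum x_i^2 this is exactly the formula -(au-c)/(a-nu) of comment 4, and for the \pi data (n=9, \bar{x}=4, S_{xx}=60, u=9) it gives 4-\frac{60}{45}=\frac{8}{3}\approx 2.67.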
