There is an upward trend in the digits of $\pi$. I just found it using Maple.

```
with(Statistics):   # LinearFit lives in the Statistics package
X := [0, 1, 2, 3, 4, 5, 6, 7, 8]:
Y := [3, 1, 4, 1, 5, 9, 2, 6, 5]:
LinearFit([1, n], X, Y, n);
2.20000000000000+.450000000000000*n
```

Here the digits are enumerated beginning with the 0th, which is 3. The regression line predicts that the $n$th digit of $\pi$ is approximately $2.2 + 0.45n$.

But maybe my data set is too small. Let’s throw in one more digit; that ought to be enough. The next digit turns out to be 3, and this hurts my trend. The new regression line has a smaller slope, and it crosses the old one at $n = 8/3$.
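For readers who want to check the arithmetic, here is a quick verification in R (my addition; the post itself used Maple):

```
# digits of pi = 3.141592653..., indexed from 0
n0 <- 0:8
y0 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5)

old <- coef(lm(y0 ~ n0))               # 2.20 + 0.45*n
new <- coef(lm(c(y0, 3) ~ c(n0, 9)))   # append the 9th digit, which is 3

# two lines meet where old[1] + old[2]*x = new[1] + new[2]*x
x_star <- (new[1] - old[1]) / (old[2] - new[2])
c(x_star, old[1] + old[2] * x_star)    # 2.666667 3.400000, i.e. (8/3, 3.4)
```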

But we all know that a 3 can be easily changed to an 8. The old “professor, you totaled the scores on my exam incorrectly” trick. Finding a moment when none of the $\pi$-obsessed people are looking, I change the decimal expansion of $\pi$ to $3.141592658\ldots$. The new trend looks even better than the old: the regression line becomes steeper, and it crosses the old one at the point $(8/3, 3.4)$.

What, again? Is this a coincidence? I try changing the 9th digit to other numbers and plot the resulting regression lines.

All intersect at the same spot. The hidden magic of $\pi$ is uncovered.
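A small R sketch of this experiment (again my addition, not the original plotting code) refits the line for every possible value of the 9th digit and prints the crossing with the original line:

```
n0 <- 0:8
y0 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5)
old <- coef(lm(y0 ~ n0))

for (d in 0:9) {                       # every candidate 9th digit
  new <- coef(lm(c(y0, d) ~ c(n0, 9)))
  x_star <- (new[1] - old[1]) / (old[2] - new[2])
  cat(d, x_star, old[1] + old[2] * x_star, "\n")
}
# every row reports the same crossing point: 2.666667 3.4
```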

(Thanks to Vincent Fatica for the idea of this post.)

To be more precise, Vincent Fatica observed (empirically, and so conjectured) that if the regression line for $N$ data points and the regression line for $N+1$ data points (where the additional point $(x, y)$ is added to the original data) meet in one point, then that point is independent of the value of $y$.

It is interesting to examine the marginal effect of an additional point on the regression line. Let the initial independent variables be $x_i$ and the response variables $y_i$, with $i$ ranging from $1$ to $n$. The regression line goes through the center of mass, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. This together with its slope completely determines the regression line. Setting the first derivatives of the sum of squared errors to 0, we get the analytic formula for the slope, $\beta = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$. The regression line is thus $y = \bar{y} + \beta(x - \bar{x})$. Now, adding another point $(x_{n+1}, y_{n+1})$, I get the perturbed slope $\beta' = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y}) + \frac{n}{n+1}(x_{n+1} - \bar{x})(y_{n+1} - \bar{y})}{\sum_i (x_i - \bar{x})^2 + \frac{n}{n+1}(x_{n+1} - \bar{x})^2}$. This is intuitive in that if the new point is reasonably close to the original center of mass, or if the sample size $n$ is large enough, then the effect of that additional point on the slope of the regression line is negligible. Because the new line now cuts through the new center of mass $(\bar{x}', \bar{y}')$, we can write the new line as $y = \bar{y}' + \beta'(x - \bar{x}')$. The lines intersect at $x^{*} = \frac{(\bar{y}' - \beta'\bar{x}') - (\bar{y} - \beta\bar{x})}{\beta - \beta'}$. However, I have trouble seeing why this is independent of $y_{n+1}$.
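As a concrete check of these formulas (my arithmetic, not part of the original comment), take the digits-of-$\pi$ data from the post: the nine points $(0,3), (1,1), \dots, (8,5)$ give $\bar{x} = 4$, $\bar{y} = 4$, $\sum_i (x_i - \bar{x})^2 = 60$ and $\sum_i (x_i - \bar{x})(y_i - \bar{y}) = 27$. Adding the point $(9, 3)$,

$$\beta = \frac{27}{60} = 0.45, \qquad \beta' = \frac{27 + \frac{9}{10}(9-4)(3-4)}{60 + \frac{9}{10}(9-4)^2} = \frac{22.5}{82.5} = \frac{3}{11},$$

$$x^{*} = \frac{(3.9 - \frac{3}{11}\cdot 4.5) - (4 - 0.45\cdot 4)}{0.45 - \frac{3}{11}} = \frac{2.6727\ldots - 2.2}{0.1772\ldots} = \frac{8}{3},$$

matching the crossing observed in the post.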

Proof by simulation? 🙂

I wrote the following R snippet to simulate the scenario using Normal random variates.
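Here is a minimal sketch of such a snippet (my reconstruction, not the original code; the seed and sample size are assumptions, and the constant 921.4815 in the output below came from the original random data, so a rerun produces a different constant):

```
set.seed(1)                 # arbitrary; the original seed is unknown
n  <- 20                    # assumed size of the base sample
x  <- rnorm(n)              # Normal random variates
y  <- rnorm(n)
x1 <- rnorm(1)              # x-coordinate of the added point, fixed once

old <- coef(lm(y ~ x))      # regression line for the n original points
for (y1 in seq(-100000, 100000, by = 5000)) {
  new <- coef(lm(c(y, y1) ~ c(x, x1)))                   # line with extra point
  cat(y1, (new[1] - old[1]) / (old[2] - new[2]), "\n")   # y1 and crossing x
}
```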

Following is the output; the first column is the added point’s $y$-value (from $-100000$ to $100000$ in steps of $5000$), the second is the $x$-coordinate of the intersection:

```
-100000 921.4815
 -95000 921.4815
 -90000 921.4815
    ...
  95000 921.4815
 100000 921.4815
```

The intersection sits at $x = 921.4815$ for every single $y$-value.

If Maple is correct, the $x$-coordinate of the intersection is

$$-\frac{ax - c}{a - nx},$$

where $x$ is the $x$-coordinate of the added $(n+1)$st point, $a = \sum_{i=1}^{n} x_i$, and $c = \sum_{i=1}^{n} x_i^{2}$. That says the $x$-coordinate of the intersection is independent of $y$.

ftp://lucky.syr.edu/math/regress.zip contains a DOCX file and an MW file (neither highly polished) showing what I did.
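A quick sanity check of this formula in R (mine, not from the linked files), using the digits-of-$\pi$ example where the point with $x = 9$ was added to $n = 9$ points:

```
a <- sum(0:8)                # a = Sum(x, 1..n) = 36
c <- sum((0:8)^2)            # c = Sum(x^2, 1..n) = 204
-(a * 9 - c) / (a - 9 * 9)   # 2.666667, the 8/3 crossing seen in the post
```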

As Prof. Hyune-Ju Kim points out, this “conjecture” follows from Cook (Detection of Influential Observation in Linear Regression, Technometrics, V19(1), 1977, pp. 15-18). Combining equations (5) and (6), that paper shows at the bottom left of p. 16 that with and without an observation $(x_i, y_i)$, the difference in parameter estimates $\hat\beta - \hat\beta_{(i)}$ is $r_i$ times a function of the $x$’s, where $r_i$ is the residual. The residual term gets cancelled when the $x$-coordinate of the intersection (intercept difference divided by slope difference) is computed. This leaves a function of the $x$-values.
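Spelled out (my paraphrase of the cancellation): the deletion formulas give

$$\hat\alpha_{(i)} - \hat\alpha = r_i\, f(x_1,\dots,x_n), \qquad \hat\beta_{(i)} - \hat\beta = r_i\, g(x_1,\dots,x_n)$$

for some functions $f, g$ of the $x$-values alone, so the lines $\hat\alpha + \hat\beta x$ and $\hat\alpha_{(i)} + \hat\beta_{(i)} x$ meet at

$$x^{*} = -\frac{\hat\alpha_{(i)} - \hat\alpha}{\hat\beta_{(i)} - \hat\beta} = -\frac{f(x_1,\dots,x_n)}{g(x_1,\dots,x_n)},$$

where $r_i$ cancels; varying $y_i$ changes only $r_i$, so $x^{*}$ stays put.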