Moderatorship connections between Stack Exchange sites

Consider the graph in which the vertices are Stack Exchange sites, connected if two sites share a moderator. What does this graph look like? As being a moderator correlates with some level of expertise or at least interest in the topic, we can expect the structure of the graph to highlight topical connections between the sites.

Unsurprisingly, there is a dominant component (55 sites) centered at Stack Overflow. (Click the image for the larger version.)

SO_component
Stack Overflow component

Its radius is 6: every site can be joined to Stack Overflow by a path of length at most 6. The five sites at distance 6 from SO are Lifehacks, Martial Arts, Physical Fitness, Space Exploration, and Travel.

By the triangle inequality, the diameter of the SO component is at most 12. In fact it is exactly 12: it takes 12 steps to go from Space to either Fitness, Lifehacks, or Martial Arts.

The center of the SO component is not unique: Web Applications is another site from which every other one can be reached in 6 steps. And it has an advantage over SO in that only four sites in the component are at distance 6 from Web Apps. Naturally, these are the four sites that realize the diameter 12 of the component.

That said, Stack Overflow is the vertex of highest degree (8), meaning that Stack Overflow moderators also take part in moderating eight other sites.

What about other components? Excluding isolated vertices, we have the following picture of small components.

other_components
Other nontrivial components

The largest of these is the “language component”: English Language & Usage, English Language Learners, Japanese Language, Portuguese Language, and less logically, Sustainable Living. Other notable components are Unix (with bioinformatics thrown in) and Mathematics.


Source of the information: on the page listing moderators grouped by users I executed a JavaScript one-liner

JSON.stringify(Array.from(document.querySelectorAll('.mods-summary-list')).filter(e=>e.children.length>1).map(e=>Array.from(e.children).map(x=>x.hostname.split('.')[0])))

which gave a list of connections. The rest was done in Python:

import networkx as nx
from itertools import chain, combinations
connections = # copied JS output
edges = list(chain.from_iterable(combinations(c, 2) for c in connections))
G = nx.Graph()
G.add_edges_from(edges)
components = sorted(list(nx.connected_components(G)), key=len)
main_component = G.subgraph(components[-1])
pos = nx.kamada_kawai_layout(main_component)
nx.draw_networkx(main_component, pos=pos, node_size=100)

I used different layout algorithms for the dominant component and for the rest. The default “spring” layout makes the SO component too crowded, but is okay for small components:

other_components = G.subgraph(chain.from_iterable(components[:-1]))
pos = nx.spring_layout(other_components, k=0.25) 
nx.draw_networkx(other_components, pos=pos, node_size=100)

The rest was done by various functions of the NetworkX library such as

nx.degree(main_component)
nx.center(main_component)
nx.diameter(main_component)
nx.periphery(main_component)
nx.radius(main_component)
nx.shortest_path_length(main_component, source="stackoverflow")
nx.shortest_path_length(main_component, source="webapps")

The infinitely big picture: tanh-tanh scale

When plotting the familiar elementary functions like x2 or exp(x), we only see whatever part of the infinitely long curve fits in the plot window. What if we could see the entire curve at once?

The double-tanh scale can help with that. The function u = tanh(x) is a diffeomorphism of the real line onto the interval (-1, 1). Its inverse, arctanh or artanh or arth or {\tanh^{-1}x} or {\frac12 \log((1+x)/(1-x))}, whatever you prefer to call it, does the opposite. So, conjugating any function {f\colon \mathbb R\to \mathbb R} by the hyperbolic tangent produces a function {g\colon (-1, 1)\to (-1,1)} which we can plot in its entirety. Let’s try this.

Out of linear functions y = kx, only y=x and y=-x remain lines.

linear
Linear functions

The powers of x, from 1 to 4, look mostly familiar:

x1-4
x, x^2, x^3, x^4

Sine, cosine, and tangent functions are not periodic anymore:

sincostan
sin(x), cos(x), tan(x)

The exponential function looks concave instead of convex, although I don’t recommend trying to prove this by taking the second derivative of its tanh-conjugate.

exp
exp(x)

The Gaussian loses its bell-shaped appearance and becomes suspiciously similar to a semicircle.

gaussian
exp(-x^2/2)/sqrt(2pi)

This raises the question: which function does appear as a perfect semi-circle of radius 1 on the tanh-tanh scale? Turns out, it is {f(x) = \log|\coth(x/2)|}. Here it is shown in the normal coordinate system.

logcoth
log|coth(x/2)| without tanh conjugation

 

Graphs with an invertible incidence matrix

Graphs are often encoded by their adjacency matrix: a symmetric matrix where {1} in the entry {(i,j)} means there is an edge between vertices labeled {i} and {j}. Another useful encoding is the incidence matrix, in which rows correspond to edges and columns to vertices (or the other way around). Having {1} in the entry {(i,j)} means that {i}th edge is incident to {j}th vertex. Simply put, each row contains exactly two nonzero entries, which indicate the endpoints of an edge.

An incidence matrix is generally rectangular. It is the square matrix when the graph has as many edges as vertices. For example, the 4-cycle has incidence matrix

{M = \displaystyle \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}}

(up to relabeling of vertices). This matrix is not invertible. Indeed, {M(1,-1,1,-1)^T=0}, which is a reflection of the fact that the graph is bipartite. Indeed, for any bipartite graph the {\pm 1} vector describing a 2-coloring of the vertices belongs to the kernel of the incidence matrix.

The incidence matrix of an odd cycle is invertible, however. What other such “invertible” graphs exist? Clearly, the disjoint union of “invertible” graphs is itself invertible, so we can focus on connected graphs.

There are no invertible graphs on 1 or 2 vertices. For 3 through 8 vertices, exhaustive search yields the following counts:

  • 1 graph on 3 vertices (3-cycle)
  • 1 graph on 4 vertices (3-cycle with an edge attached)
  • 4 graphs on 5 vertices
  • 8 graphs on 6 vertices
  • 23 graphs on 7 vertices
  • 55 graphs on 8 vertices

The numbers match the OEIS sequence A181688 but it’s not clear whether this is for real or a coincidence. The brute force search takes long for more than 8 vertices. It’s probably better to count such graphs by using their (folklore?) characterization as “trees planted on an odd cycle”, described by Chris Godsil on MathOverflow.

Some examples:

graph7_4

graph8_52

graph8_7

The examples were found and plotted by this Python script.

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from itertools import combinations

def is_invertible(graph, n):
    matrix = []
    for i in range(n):
        row = [0]*n
        row[graph[i][0]] = 1
        row[graph[i][1]] = 1
        matrix.append(row)
    return abs(np.linalg.det(matrix)) > 0.5

n = 6   # number of vertices
graphs = combinations(combinations(range(n), 2), n)
inv = []
for g in graphs:
    if is_invertible(g, n):
        gr = nx.Graph(g)
        if nx.is_connected(gr) and not any([nx.is_isomorphic(gr, gr2) for gr2 in inv]):
            inv.append(gr)
            nx.draw_networkx(gr)
            plt.savefig("graph{}_{}.png".format(n, len(inv)))
            plt.clf()

print("There are {} connected invertible graphs with {} vertices".format(len(inv), n))

Random graphs with two-sided degree bounds

I wanted some random examples of graphs where every vertex has degree between two given parameters, minDegree and maxDegree. Like this one:

15 vertices, degree from 3 to 4
15 vertices, degree from 3 to 4

The approach I took was very simple (and not suitable for construction of very large or very regular graphs). Each edge appears with probability p. If the minimal degree is too small, this probability is multiplied by 1.1. If the maximal degree is too big, the probability is divided by 1.1. Either way, the process repeats until success.

g20-1-5
20 vertices, degree from 1 to 5

So far, this blog had too much Matlab/Scilab and not enough Python. I’ll try to restore the balance. Here, numpy generates random matrices and takes care of degree restrictions; networkx lays out the graph.

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

vertices = 15
minDegree = 3
maxDegree = 4
p = 0.5
success = False

while not success:
    a = (np.random.random((vertices, vertices)) < p).astype(int)
    a = np.maximum(a, np.matrix.transpose(a))
    np.fill_diagonal(a, 0)
    s = a.sum(axis=1)
    success = True
    if min(s) < minDegree:
        success = False
        p = p * 1.1
    if max(s) > maxDegree:
        success = False
        p = p / 1.1
g = nx.Graph(a)
nx.draw_networkx(g)
plt.show()

One more time:

g25-2-4
25 vertices, degree from 2 to 4