Stack Exchange sites: questions and visits

After a year, it is time to update the Hertzsprung–Russell diagram of Stack Exchange sites. Here is the new and much improved version: a bubble chart made with Google Spreadsheets. The data is imported with ImportXML function, which apparently means it will be automatically refreshed every two hours. Just in case, here is a (heavily zoomed out) static image. It’s not as usable as the interactive chart you see above, because one can’t read the text on obscured bubbles by hovering over them.

Static snapshot of the diagram
Static snapshot of the diagram

As a rule of thumb, the sites with at least 10000 visits/day and 10 questions/day are those that “made it”: they either graduated from beta or will probably graduate in foreseeable future (assuming the quality of content is decent and the community is somewhat capable of self-organization).


A year ago, I joked about Beer, Italian Language, and Expatriates being either closed or merged into one site. Neither thing happened yet. Of the three, Expatriates site now has the largest numbers of visits and questions, though the quality is often an issue. Italian Language caught up with Russian Language (not to be confused with Русский язык, which is a recently imported site from Hashcode.ru and differentiates itself by targeting native speakers rather than language learners). Of the three, Beer appears to have the least amount of energy at present.

Some of the sites at the bottom, around 100 visits/day, are very new and will not stay there for long. The future of Community Building seems less certain.


Here’s the technical part. The data is/are obtained from this list of sites with the command

=IMPORTXML("http://stackexchange.com/sites?view=list", "//div[contains(@class,'category')]/@class|//input/@value")

The XPath parameter consists of two: //div[contains(@class,'category')]/@class picks up the class of every div element with “category” in its list of classes. This is later used to categorize as Technology, Science, Life/Arts, etc. The other part is //input/@value, which grabs all other data: number of questions, number of answers, number of users… most of which I have no use for. To sort this column by data type I put MOD(ROW()-2,9) next to it:

+------------------------------------------------+------+
|                      Raw                       | Item |
+------------------------------------------------+------+
| lv-item category-technology site-stackoverflow |    0 |
| 1217462400                                     |    1 |
| 9410733                                        |    2 |
| 15707906                                       |    3 |
| 73.95805406                                    |    4 |
| 4258590                                        |    5 |
| 7232805                                        |    6 |
| 7956.285714                                    |    7 |
| Stack Overflow                                 |    8 |
| lv-item category-technology site-serverfault   |    0 |
| 1241049600                                     |    1 |
+------------------------------------------------+------+

The columns C-F filter the content of A by item type. For example, D is the site name: =FILTER(A:A,B:B=8), etc. The Category column needs a bit of extra (routine) work to make it user-friendly. Otherwise, the table is finished:

+---------------------+-------------------+----------------+
|        Name         | Questions per day | Visits per day |
+---------------------+-------------------+----------------+
| Stack Overflow      | 7956.3            |        7232805 |
| Server Fault        | 103.1             |         353739 |
| Super User          | 162.9             |         734077 |
| Meta Stack Exchange | 21.8              |           8680 |
| Web Applications    | 12.7              |          44341 |
| Arqade              | 36.5              |         247448 |
| Webmasters          | 13.6              |          12433 |
| Seasoned Advice     | 6.9               |          80308 |
| Game Development    | 21.1              |          25836 |
| Photography         | 8.5               |          26560 |
| Cross Validated     | 84.4              |          63317 |
| Mathematics         | 616.4             |         150744 |
| Home Improvement    | 17.3              |          71148 |
+---------------------+-------------------+----------------+

The bubble chart uses this table for the log-log scale plot that you see above.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s