Given: $n$ websites on which users can post stuff. The $j$th site gets $a_j$ posts per day on average. Some of the posts are spam or otherwise harmful.
One of the ways to detect spam automatically is to send an API request to any of the sites, get the post content, and analyze it. The total number of such requests cannot exceed $M$ per day, while the total number of posts, namely $\sum_j a_j$, is substantially larger.
This necessitates rationing. The current strategy is to wait until site $j$ has $q_j$ new posts, and then request all of those at once. The upside is that only $a_j/q_j$ requests per day are needed. The downside is that spam may sit undetected for a period of time proportional to $q_j/a_j$.
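The trade-off in the batching strategy can be sketched numerically. In this minimal sketch the function names and the sample figures (120 posts per day, batches of 10) are made up for illustration; the only thing taken from the text is that a site with some daily post count and some batch size needs (posts per day)/(batch size) requests per day, while a batch takes (batch size)/(posts per day) days to fill.

```python
def requests_per_day(posts_per_day: float, queue_size: float) -> float:
    """API requests per day when new posts are fetched in batches of queue_size."""
    return posts_per_day / queue_size


def fill_time_days(posts_per_day: float, queue_size: float) -> float:
    """Days needed to accumulate queue_size new posts.

    The time spam sits undetected is proportional to this quantity.
    """
    return queue_size / posts_per_day


# Hypothetical site: 120 posts per day, fetched 10 at a time.
print(requests_per_day(120, 10))  # 12.0
print(fill_time_days(120, 10))    # one twelfth of a day, i.e. two hours
```

Doubling the batch size halves the number of requests but doubles the fill time, which is why the queue sizes have to be balanced across sites.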
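One would like to minimize the mean lifetime of spam. Assuming for now that spam is uniformly distributed among the sites, site $j$ contributes spam in proportion to its $a_j$ posts per day, and each such post waits for a time proportional to $q_j/a_j$, where $q_j$ is the queue size. So this lifetime is proportional to
$$\sum_j a_j \cdot \frac{q_j}{a_j} = \sum_j q_j.$$
So, we need to minimize $\sum_j q_j$ subject to $\sum_j a_j/q_j = M$, where $M$ is the daily request quota. The method of Lagrange multipliers says that the gradient of the objective function should be proportional to the gradient of the constraint. Hence (absorbing the sign into the multiplier),
$$1 = \lambda \frac{a_j}{q_j^2}$$
with $\lambda$ independent of $j$. This leads to $q_j = \sqrt{\lambda a_j}$.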
That is, the size of the API queue should be proportional to the square root of the level of activity.
An excellent proxy for overall activity on a site is QPD, questions per day. For example, the QPD ratio of Mathematics to Computer Science is currently 452:11, so the ratio of queue sizes should be about $\sqrt{452/11} \approx 6.4$ (in reality, the queue size is currently and , respectively).
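The arithmetic behind this ratio is a one-liner. The sketch below only uses the two QPD figures quoted above and the square-root rule; the variable names are illustrative.

```python
import math

# QPD figures quoted in the text: Mathematics vs. Computer Science.
qpd_math, qpd_cs = 452, 11

# Under the square-root rule, queue sizes scale as the square root of activity.
ratio = math.sqrt(qpd_math / qpd_cs)
print(round(ratio, 2))  # 6.41
```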
The value of $\lambda$ is easily found from the condition $\sum_j a_j/q_j = M$ (pushing the API quota to the limit), with the final result
$$q_j = \frac{\sqrt{a_j}}{M} \sum_k \sqrt{a_k}.$$
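As a sanity check, the closed form can be evaluated on toy numbers. This is a minimal sketch, assuming the queue size for each site is the square root of its activity times the sum of all such square roots, divided by the daily quota. The function names, activity levels, and quota are hypothetical, and the sizes are real-valued, so any rounding to integers or minimum queue size would have to be handled separately.

```python
import math
from typing import Sequence


def optimal_queue_sizes(posts_per_day: Sequence[float], quota: float) -> list[float]:
    """Queue sizes proportional to sqrt(activity), scaled to use the whole quota."""
    total_root = sum(math.sqrt(a) for a in posts_per_day)
    return [math.sqrt(a) * total_root / quota for a in posts_per_day]


def requests_used(posts_per_day: Sequence[float], queue_sizes: Sequence[float]) -> float:
    """Total API requests per day for the given queue sizes."""
    return sum(a / q for a, q in zip(posts_per_day, queue_sizes))


# Hypothetical activity levels (posts per day) and daily quota.
activity = [400.0, 100.0, 25.0]
quota = 35.0

sizes = optimal_queue_sizes(activity, quota)
print(sizes)                           # [20.0, 10.0, 5.0]
print(requests_used(activity, sizes))  # 35.0, i.e. the quota is used exactly

# For comparison: one common queue size that spends the same quota.
uniform = sum(activity) / quota        # 15.0
print(sum(sizes), 3 * uniform)         # 35.0 45.0
```

With these made-up numbers the square-root allocation gives a total queue length of 35, against 45 for a uniform queue size that spends the same quota, so the objective being minimized is indeed smaller.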
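In practice, spam is unevenly distributed. Let’s say that $p_j$ is the proportion of spam on the $j$th site. Then the mean lifetime is proportional to
$$\sum_j p_j a_j \cdot \frac{q_j}{a_j} = \sum_j p_j q_j.$$
This time, the method of Lagrange multipliers yields
$$p_j = \lambda \frac{a_j}{q_j^2},$$
hence $q_j = \sqrt{\lambda a_j / p_j}$. That is, the size of the API queue should be proportional to $\sqrt{a_j/p_j}$.
Theoretically, the optimal size is
$$q_j = \frac{1}{M} \sqrt{\frac{a_j}{p_j}} \sum_k \sqrt{a_k p_k}.$$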
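The spam-weighted version can be sketched the same way. This assumes the queue size for a site is the square root of (activity / spam proportion), times the sum over all sites of the square root of (activity × spam proportion), divided by the daily quota. The function name and all numbers below are hypothetical, and every spam proportion must be strictly positive for the formula to make sense.

```python
import math
from typing import Sequence


def weighted_queue_sizes(
    posts_per_day: Sequence[float],
    spam_fraction: Sequence[float],
    quota: float,
) -> list[float]:
    """Queue sizes proportional to sqrt(activity / spam fraction), scaled to the quota.

    Each spam fraction must be > 0, otherwise the division below fails.
    """
    total = sum(math.sqrt(a * p) for a, p in zip(posts_per_day, spam_fraction))
    return [
        math.sqrt(a / p) * total / quota
        for a, p in zip(posts_per_day, spam_fraction)
    ]


# Hypothetical sites: a busy, clean one and a quieter, spammier one.
activity = [400.0, 100.0]
spam = [0.01, 0.04]
quota = 20.0

sizes = weighted_queue_sizes(activity, spam, quota)
used = sum(a / q for a, q in zip(activity, sizes))
print(sizes, used)
```

For these made-up inputs the sizes come out close to 40 and 10, and the requests add up to the quota of 20. The spammier site gets the shorter queue even though the square-root rule alone would only halve it.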
In practice, one does not know $p_j$ beyond some general idea of how spammy a site is. Drupal Answers gets a lot more spam than Computer Science; so even though its QPD exceeds that of CS by a factor of about four, its API queue size is set to the minimal possible value of .