35 categories of Stack Overflow comments

Google’s BigQuery datasets now include the Stack Overflow data dump, with the text of over 50 million comments posted on the site. What do these comments say? I picked the most frequent ones and grouped them by topic. The counts are underestimates: there is only so much time I was willing to spend organizing synonymous comments.
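
The counting-and-grouping step described above could look roughly like the sketch below. It is only an illustration, not the method actually used for this post: the `CATEGORY_OF` table, the `normalize` rule, and the sample comments are all hypothetical, and the real grouping was done by hand on the most frequent comment texts.

```python
import re
from collections import Counter

# Hypothetical lookup from normalized comment text to a category label.
# These few entries are illustrative; the real grouping was manual.
CATEGORY_OF = {
    "thank you": "Thank you",
    "thank you very much": "Thank you",
    "thanks a lot": "Thank you",
    "perfect thanks": "Thank you",
    "youre welcome": "You are welcome",
    "you are welcome": "You are welcome",
    "post your code": "Post your code",
    "please post your code": "Post your code",
}


def normalize(text: str) -> str:
    """Lowercase, keep only ASCII letters, digits and spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", "", text.lower())
    return " ".join(text.split())


def count_categories(comments):
    """Count comments per category; comments with no known category are skipped."""
    counts = Counter()
    for comment in comments:
        category = CATEGORY_OF.get(normalize(comment))
        if category is not None:
            counts[category] += 1
    return counts


# Made-up sample input, not taken from the data dump.
sample = [
    "Thanks a lot :)",
    "You’re welcome!",
    "Thank you......",
    "Post your code.",
    "Perfect, thanks!",
    "It works like a charm",
]
for category, n in count_categories(sample).most_common():
    print(category, n)
```

Because unmatched comments are skipped, any synonym missing from the lookup table is simply not counted, which is one way the totals end up as underestimates.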

  1. “Thank you” comments (128960 in total) are the most common by far. Typical forms: Thank you very/so much!, Thanks a lot :), Perfect, thanks! The popularity of the emoticon in the second form is attributable to the minimum length requirement for comments: they must contain at least 15 characters. The laziest way to pad the text is probably Thank you……
  2. “You are welcome” (50090), presumably posted in response to group 1 comments. You’re welcome. You’re welcome! You’re welcome 🙂 Users need that punctuation or emoticon to reach 15 characters, although those who don’t contract “you are” don’t have this problem.
  3. “Updated answer” (30979) invites whoever raised objections to the previous version of the answer to read it again.
  4. “What is your question?” (20830) is the most common type of critical comment on questions.
  5. “This is not an answer” (17306) is the most common criticism of answers; usually posted automatically by reviewers. Typical form: This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. Another one is for questions posted in the answer box: If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context.
  6. “What error are you getting?” (13439) is a request for debugging information.
  7. “What have you tried?” (12640) often comes with the link whathaveyoutried.com and is a sufficiently notorious kind of comment that Stack Overflow software deletes it if anyone “flags” it. It’s easy to cast flags automatically, so I have substantially reduced the number of such comments since this data dump was uploaded. Further context: Should Stack Overflow (and Stack Exchange in general) be awarding “A”s for Effort?
  8. “Post your code” (11486) can sometimes be a form of “what have you tried”; at other times it’s a logical response to someone posting an error message without the code producing it. Can you post your code? Post your code. Please post your code. Show your code. Where is your code? And so on.
  9. “It does not work” (9674) – either the question author or someone else with the same issue did not benefit from the solution. Maybe it’s wrong, maybe they used it wrong.
  10. “This is a link-only answer” (9501) usually comes from reviewers in the standard form While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes.
  11. “I updated the question” (8060), presumably in response to critical comments.
  12. “Why the downvote(s)?” (6005) asks whoever voted down the post to explain their position. Usually fruitless; if the voter wanted to say something, they would have already.
  13. “This is a duplicate question” (3859) is inserted automatically when someone moves for a question to be marked as a duplicate. Such comments are normally deleted automatically when the required number of close-votes is reached; but some remain. The most common by far is possible duplicate of What is a Null Pointer Exception, and how do I fix it?
  14. “I edited your title” (3795) is directed at users who title their questions like “Java: How to read a CSV file?”, using a part of the title as a tag. Standard form: I have edited your title. Please see, “Should questions include “tags” in their titles?”, where the consensus is “no, they should not”.
  15. “Post an MCVE” (3775) – the line on which the error is thrown is probably not enough to diagnose the problem; on the other hand, a wall of code with an entire program is too much. One of the standard forms: Questions seeking debugging help (“why isn’t this code working?”) must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example.
  16. “That is correct” (3158) usually refers to a statement made in another comment.
  17. “It works” (3109) is the counterpart of group 9 above. Often used with “like a charm”, but do charms actually work?
  18. “What do you mean?” (2998) – for when an exchange in comments leads to more confusion.
  19. “What tool are you using?” (2649) indicates that the question author forgot to specify the language, OS, or DBMS they are using.
  20. “Good answer” (2607) – various forms of praise: This should be the accepted answer. This is the correct answer. Excellent answer! The first form additionally indicates that the question author did not pick the best answer as “accepted”.
  21. “This question is off-topic” (2377) is a template for close votes with a custom explanation. For some years Stack Overflow used This question appears to be off-topic because… but then switched to the more assertive I’m voting to close this question as off-topic because…
  22. “This is a low quality answer” (2003) is a response to answers that contain nothing but code, perhaps preceded by “try this”. Example: While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
  23. “Is this homework?” (1995) is not a particularly fruitful type of comment.
  24. “Does this work?” (1916) is meant to obtain some response from a question asker who has not yet acknowledged the answer.
  25. “The link is dead” (1250) is a major reason why group 10 comments exist.
  26. “http://stackoverflow.com/help/ . . .” (1117) and nothing but the link. Directs to one of the Help Center articles such as “How to ask”. Maybe there should also be “How to Comment”.
  27. “Thanks are discouraged” (1046) … so all those group 1 comments aren’t meant to be. But this is mostly about posts rather than comments. Unlike forum sites, we don’t use “Thanks”, or “Any help appreciated”, or signatures on Stack Overflow. See “Should ‘Hi’, ‘thanks,’ taglines, and salutations be removed from posts?”.
  28. “Format your code” (967) – yes, please. Select the code block and press Ctrl-K. Thanks in advance. Oops, forgot about the previous group.
  29. “What doesn’t work?” (926) is a response to vague comments of group 9.
  30. “Don’t use mysql_* functions” (693) or Russian hackers will pwn your site. Comes with a link-rich explanation: Please, don’t use `mysql_*` functions in new code. They are no longer maintained and are officially deprecated. See the red box? Learn about prepared statements instead, and use PDO or MySQLi - this article will help you decide which. If you choose PDO, here is a good tutorial.
  31. “Add tags” (625) often comes up in the context of database questions. Which RDBMS is this for? Please add a tag to specify whether you’re using `mysql`, `postgresql`, `sql-server`, `oracle` or `db2` – or something else entirely.
  32. “Improve title” (585) is like group 14, but invites the user to edit the title instead of doing it for them.
  33. “Use modern JOIN syntax” (301) bemoans obsolete ways of dealing with databases. Bad habits to kick : using old-style JOINs – that old-style *comma-separated list of tables* style was replaced with the *proper* ANSI `JOIN` syntax in the ANSI-92 SQL Standard (more than 20 years ago) and its use is discouraged.
  34. “More SQL woes” (272) is another template: Side note: you should not use the `sp_` prefix for your stored procedures. Microsoft has reserved that prefix for its own use (see *Naming Stored Procedures*), and you do run the risk of a name clash sometime in the future. It’s also bad for your stored procedure performance. It’s best to just simply avoid `sp_` and use something else as a prefix – or no prefix at all!
  35. “I have the same problem” (241) is a kind of comment that really should not exist.
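
The 15-character minimum mentioned in groups 1 and 2 explains why some short comments pick up trailing punctuation or an emoticon. The sketch below assumes the limit is a plain character count with no trimming or markup handling; the site's actual counting rules may differ, and `MIN_COMMENT_LENGTH` and `is_long_enough` are names made up for this illustration.

```python
# Minimum comment length as stated in the list above; treated here as a
# plain character count, which is an assumption about how the site counts.
MIN_COMMENT_LENGTH = 15


def is_long_enough(comment: str) -> bool:
    """Return True if the comment has at least MIN_COMMENT_LENGTH characters."""
    return len(comment) >= MIN_COMMENT_LENGTH


examples = [
    "You're welcome",
    "You're welcome!",
    "You are welcome",
    "Thanks a lot",
    "Thanks a lot :)",
]
for text in examples:
    print(f"{text!r}: {len(text)} characters, allowed={is_long_enough(text)}")
```

Under this assumption the contracted form falls one character short on its own, while the uncontracted form just reaches the limit.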