(Originally published on meta.stackoverflow Stack Exchange by Jon Ericson.)


I'm going to be using statistics from the week of October 8 to 14, 2017. There wasn't anything odd about that week as far as I can tell. Looking at a week avoids the weekend lull in asking that would come from looking at a day. That week there were about 145 million page views. A little less by Quantcast numbers:

(Image: Quantcast traffic chart)

And a little more by the Google Analytics numbers we include in the site analytics page available to 25k users. Both sources of traffic data estimate visits for the week at about 57 million. A visit could be someone who came to get help with a problem they are trying to solve or someone who wants to participate on the site.

95% of those page visits are via search (also a 25k-only link) and the vast majority of those are Google searches. (There's a chance some of those are people who type 'stackoverflow' in their browser's search bar. But those are potentially offset by the miscategorization of duckduckgo.com traffic.) So we can guess that about 54 million visits were from people who had some sort of problem they needed help with. (Could be more, of course. It's not unusual for me to do one search after another while working on a coding project.)

There were 60,007 questions that week, which is about 0.1% of the search traffic. 2585 of those questions have been closed as duplicates so far, but that likely undercounts search failure by a large margin. Even so, it's obvious that most people who visit either find the answer they are looking for via search or give up before asking their question on Stack Overflow. We can certainly encourage people to do more searching, but it's not the most promising line of attack.
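As a rough check of the arithmetic in the last few paragraphs, the funnel from visits to questions can be recomputed from the quoted figures. This is only a sketch: the variable names are mine, and the inputs are the rounded numbers from the text, not fresh data.

```python
# Figures quoted in the post for the week of October 8 to 14, 2017.
VISITS = 57_000_000          # estimated visits (both traffic sources roughly agree)
SEARCH_SHARE = 0.95          # share of visits that arrive via search
QUESTIONS = 60_007           # questions asked that week
CLOSED_AS_DUPLICATE = 2_585  # of those, closed as duplicates so far

search_visits = VISITS * SEARCH_SHARE
questions_per_search_visit = QUESTIONS / search_visits
duplicate_share = CLOSED_AS_DUPLICATE / QUESTIONS

print(f"search visits:              {search_visits / 1e6:.0f} million")
print(f"questions per search visit: {questions_per_search_visit:.2%}")
print(f"closed as duplicate:        {duplicate_share:.1%}")
```

Roughly one question is asked for every thousand search visits, and only a small fraction of those questions end up closed as duplicates.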

For A/B testing we define "bad questions" as those that have:

  • a negative score (13,394) or
  • been closed (6,127) or
  • been deleted (8,042).

Altogether, there were 18,946 bad questions by that definition. (All of these numbers are from this query. As always, please do your own analysis.) 32% badness is a lot less than Sturgeon's law would predict, but we can probably do better. I'd say we have done better, as "badness" was much smaller (19%) seven years ago. But quite a bit of the increase came from an increase in question downvoting as a result of removing the reputation penalty.
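The three criteria overlap, which is why the individual counts add up to more than 18,946. Here is a minimal sketch of the definition as a predicate, plus the arithmetic on the quoted counts. The `Question` class and its field names are hypothetical and are not the schema used by the actual query.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical fields for illustration only.
    score: int
    closed: bool
    deleted: bool


def is_bad(q: Question) -> bool:
    """A question is "bad" if it meets at least one of the three criteria."""
    return q.score < 0 or q.closed or q.deleted


# Counts quoted in the post.
NEGATIVE, CLOSED, DELETED = 13_394, 6_127, 8_042
BAD_TOTAL = 18_946
QUESTIONS = 60_007

sum_of_criteria = NEGATIVE + CLOSED + DELETED
# Criterion matches beyond the first one per question (a question that is
# downvoted, closed and deleted contributes two to this figure).
extra_matches = sum_of_criteria - BAD_TOTAL
badness = BAD_TOTAL / QUESTIONS

print(f"sum of per-criterion counts: {sum_of_criteria}")
print(f"extra matches from overlap:  {extra_matches}")
print(f"badness:                     {badness:.1%}")
```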

There are a number of blocks and filters in place to slow down askers and encourage them to ask better:

attempts    users  action                                                                  
--------    -----  ------
   60007    47914  asked
   13629     4743  Unformatted code
    9219     2987  Question with too much code and little context
    8770     3164  Question rejected due to low quality
    6285     3496  Question blocked
    5666     3925  CAPTCHA test failed
    5147     1396  New user question throttle
    1355      944  Question title word filter tripped
    1282      422  Post blocked for containing a link that requires code, without any code
     860      472  Post blocked for too many links
     449      187  Attempted to post duplicate question
     448      214  A post was blocked for having > 1 image
     216       53  Post blocked because IP blocked for spam
      58        6  Daily question limit reached
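One way to read the table is to look at attempts per user for each filter, and at blocked attempts as a share of all attempts. The sketch below copies the rows from the table. It assumes that every row other than "asked" is a separate blocked attempt and that "asked" counts the questions that got through. The post does not confirm that, so treat the share as a rough estimate.

```python
# (attempts, users, action) rows copied from the table above, minus "asked".
BLOCKED = [
    (13629, 4743, "Unformatted code"),
    (9219, 2987, "Question with too much code and little context"),
    (8770, 3164, "Question rejected due to low quality"),
    (6285, 3496, "Question blocked"),
    (5666, 3925, "CAPTCHA test failed"),
    (5147, 1396, "New user question throttle"),
    (1355, 944, "Question title word filter tripped"),
    (1282, 422, "Post blocked for containing a link that requires code, without any code"),
    (860, 472, "Post blocked for too many links"),
    (449, 187, "Attempted to post duplicate question"),
    (448, 214, "A post was blocked for having > 1 image"),
    (216, 53, "Post blocked because IP blocked for spam"),
    (58, 6, "Daily question limit reached"),
]
ASKED = 60007  # the "asked" row

# Filters that the same users hit most often come first.
for attempts, users, action in sorted(BLOCKED, key=lambda r: r[0] / r[1], reverse=True):
    print(f"{attempts / users:5.1f} attempts/user  {action}")

blocked_attempts = sum(attempts for attempts, _, _ in BLOCKED)
# Assumption: blocked rows and "asked" are disjoint events.
blocked_share = blocked_attempts / (blocked_attempts + ASKED)
print(f"blocked attempts: {blocked_attempts} ({blocked_share:.0%} of all attempts)")
```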

Clearly, these don't always prevent a bad question from landing on the site. When the system detects that the user has just dumped code in the body of the question, it gives this error:

It looks like your post is mostly code; please add some more details.

But there's no way to systematically tell if added text is useful details or just words to get around the filter. That's the bad news. The good news is we can use this data to estimate the causes of bad questions:

  1. Code dumps
  2. Grammar, spelling and other "quality" problems
  3. Problem titles
  4. Misusing links
  5. Asking too many questions or too many poor questions too quickly
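To see why the code-dump filter is easy to sidestep, consider a toy version of such a check. This is not Stack Overflow's actual heuristic, which the post does not describe. The function names, the 4-space rule for code lines and the 0.8 threshold are all assumptions made for illustration.

```python
def code_fraction(body: str) -> float:
    """Fraction of non-blank characters on Markdown code lines (4-space indent)."""
    code = text = 0
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither code nor text
        if line.startswith("    "):
            code += len(stripped)
        else:
            text += len(stripped)
    total = code + text
    return code / total if total else 0.0


def looks_mostly_code(body: str, threshold: float = 0.8) -> bool:
    """Hypothetical filter: flag posts whose code fraction exceeds the threshold."""
    return code_fraction(body) > threshold


# A one-line question followed by twenty lines of code trips the filter...
dump = "Why doesn't this work?\n\n" + "    x = compute(y)\n" * 20
# ...but the same post padded with filler text does not.
padded = dump + "\nplease help " * 30

print(looks_mostly_code(dump), looks_mostly_code(padded))
```

A ratio like this can measure how much text was added, but not whether the text says anything useful.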

Once the system has a pretty good idea that a user is not going to ask good questions, it starts blocking that user. In that week, 3496 users were prevented from asking at least once. We've made it harder to bypass the block by creating a new account, but it obviously happens sometimes.

Ultimately, we have millions of people visiting the site. While there are many prompts, blocks and limits slowing down questions, the small percentage of visitors who make it through turns out to be a large absolute number. For better or worse, the barriers to asking mean that the most determined (or, perhaps, desperate) users actually post. I'm sure you know decent programmers who have no interest in posting on Stack Overflow because of our reputation for harsh criticism. Those folks don't ask, but probably would produce better than average questions. Meanwhile, the people who just don't care about anything but the chance to get help from a programmer are willing to put up with just about any crap we throw at them.


One theory that people often raise when it comes to the cause of bad questions is that we could reduce the problem by just not answering bad questions. There's a good deal of sense in that argument. Research on artificial societies does suggest that a relatively small amount of reinforcement can encourage undesirable behavior in a population. Bad questions are answered about 28% of the time compared to 53% for all questions. Depending on what you value, a 28% chance of getting an answer on Stack Overflow might be worth putting up with even more filters, blocks, warnings, downvotes, critical comments and so on.
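The two answer rates imply a rate for questions that are not "bad", since the overall rate is a weighted average of the two groups. The sketch below assumes the 28% and 53% figures describe the same set of questions as the 18,946 out of 60,007 bad-question count. The post does not say so explicitly, so the result is only indicative.

```python
# Figures quoted in the post.
BAD_SHARE = 18_946 / 60_007  # share of questions counted as "bad"
ANSWER_RATE_BAD = 0.28       # bad questions that get an answer
ANSWER_RATE_ALL = 0.53       # all questions that get an answer

# all = bad_share * bad_rate + (1 - bad_share) * other_rate, solved for other_rate
answer_rate_other = (ANSWER_RATE_ALL - BAD_SHARE * ANSWER_RATE_BAD) / (1 - BAD_SHARE)

print(f"implied answer rate for other questions: {answer_rate_other:.0%}")
```

Under that assumption, questions outside the "bad" group are answered a bit less than two thirds of the time, which is more than twice the rate for bad questions.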

A good way to prevent people from answering bad questions would be to close them more quickly. But another way would be to increase the absolute number of good, answerable questions. Think about why people answer questions. If you have a bit of time and would like to help someone out or earn reputation, you might look around for a Stack Overflow question to answer. Sorting by upvotes will show you good questions that probably already have answers or, if not, are difficult to solve. These aren't likely to be productive in terms of helping others or reputation, so answerers need to look further down the list to find unanswered (but answerable) questions.

I think most users would rather answer good questions, but don't feel their contributions will be worthwhile. I certainly noticed that about myself when I first started contributing to Stack Overflow and, more recently, when I tried to get a sock puppet to 1k. Adding filters and blocks and warnings has signalled to sensible people who can read the signs that we don't want their questions. We are running out of obstacles we can throw in people's way. So that's why I'm glad we are tackling the other end of the problem: encouraging askers to post more useful questions.


Some people have expressed surprise that so many asking attempts have been blocked. This isn't really anything new:

(Image: chart of the percentage of question attempts blocked)

For those who are curious, the spike in June and July of 2016 came from the "Post blocked for too many links" filter. Basically, spam. Other than that, the most common hurdle for people is posting unformatted code. (Again, the data from October 8 to 14, 2017 is typical in this regard.) I didn't dig into the data to be sure, but I suspect most of those folks eventually get around the block by either fixing their code formatting or fiddling with the text of the question until the algorithm is satisfied. Either way, we can do better.


Please direct comments to the original post.