Why are there so many bad questions?
(Originally published on meta.stackoverflow Stack Exchange by Jon Ericson.)
I'm going to be using statistics from the week of October 8 to 14, 2017. There wasn't anything odd about that week as far as I can tell. Looking at a week avoids the weekend lull in asking that would come from looking at a day. That week there were about 145 million page views: a little less by Quantcast's numbers and a little more by the Google Analytics numbers we include in the site analytics page available to 25k users.
Both sources of traffic data estimate visits for the week at about 57 million. A visit could mean someone who came to get help with a problem they are trying to solve or someone who wants to participate on the site.
95% of those visits are via search (also a 25k-only link) and the vast majority of those are Google searches. (There's a chance some of those are people who type 'stackoverflow' in their browser's search bar. But those are potentially offset by the miscategorization of duckduckgo.com traffic.) So we can guess that about 54 million visits were from people who had some sort of problem they needed help with. (Could be more, of course. It's not unusual for me to do one search after another while working on a coding project.)
There were 60,007 questions that week, which is about 0.1% of the search traffic. 2,585 of those questions have been closed as duplicates so far, but that likely undercounts search failure by a large margin. Even so, it's obvious that most people who visit either find the answer they are looking for via search or give up before asking their question on Stack Overflow. We can certainly encourage people to do more searching, but it's not the most promising line of attack.
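As a rough check on the ratio of questions to search visits, here is a minimal sketch using only the figures quoted above. The constant and function names are made up for illustration, and the 57 million and 95% inputs are themselves estimates, so the result is only as good as those.

```python
# Figures quoted in the post for the week of October 8 to 14, 2017.
VISITS = 57_000_000          # estimated visits for the week
SEARCH_SHARE = 0.95          # share of visits arriving via search
QUESTIONS = 60_007           # questions asked that week
CLOSED_AS_DUPLICATE = 2_585  # questions closed as duplicates so far


def ask_rate(questions: int, visits: int, search_share: float) -> float:
    """Questions asked per search-driven visit (hypothetical helper)."""
    return questions / (visits * search_share)


search_visits = VISITS * SEARCH_SHARE
rate = ask_rate(QUESTIONS, VISITS, SEARCH_SHARE)
duplicate_share = CLOSED_AS_DUPLICATE / QUESTIONS

print(f"search visits:              {search_visits:,.0f}")
print(f"questions per search visit: {rate:.2%}")
print(f"closed as duplicate:        {duplicate_share:.1%}")
```

Roughly one search visit in a thousand ends in a question, and only a few percent of those questions have been closed as duplicates.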
For A/B testing we define "bad questions" as those that have:
- a negative score (13,394) or
- been closed (6,127) or
- been deleted (8,042).
Altogether, there were 18,946 bad questions by that definition. (All of these numbers are from this query. As always, please do your own analysis.) At 32%, badness is a lot less than Sturgeon's law would predict, but we can probably do better. I'd say we have done better, as "badness" was much smaller (19%) seven years ago. But quite a bit of the increase came from an increase in question downvoting as a result of removing the reputation penalty.
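The three criteria overlap, which is why the total is smaller than the sum of the three counts. A minimal sketch of that arithmetic, using only the numbers quoted above (the variable names are illustrative, and the per-pair overlaps are not given in the post, so only the aggregate can be recovered):

```python
TOTAL_QUESTIONS = 60_007
NEGATIVE_SCORE = 13_394
CLOSED = 6_127
DELETED = 8_042
BAD_QUESTIONS = 18_946  # questions meeting at least one criterion

# Summing the categories counts a question once per criterion it meets.
category_sum = NEGATIVE_SCORE + CLOSED + DELETED

# The surplus is the number of extra category memberships. A question that
# meets two criteria contributes 1 here; one that meets all three contributes 2.
extra_memberships = category_sum - BAD_QUESTIONS

badness = BAD_QUESTIONS / TOTAL_QUESTIONS

print(f"sum of categories: {category_sum:,}")
print(f"extra memberships: {extra_memberships:,}")
print(f"badness:           {badness:.1%}")
```

So a substantial share of bad questions trip more than one criterion, for example by being downvoted and then closed or deleted.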
There are a number of blocks and filters in place to slow down askers and encourage them to ask better:
attempts  users  action
--------  -----  ------
   60007  47914  asked
   13629   4743  Unformatted code
    9219   2987  Question with too much code and little context
    8770   3164  Question rejected due to low quality
    6285   3496  Question blocked
    5666   3925  CAPTCHA test failed
    5147   1396  New user question throttle
    1355    944  Question title word filter tripped
    1282    422  Post blocked for containing a link that requires code, without any code
     860    472  Post blocked for too many links
     449    187  Attempted to post duplicate question
     448    214  A post was blocked for having > 1 image
     216     53  Post blocked because IP blocked for spam
      58      6  Daily question limit reached
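One way to read the table is to divide attempts by users for each filter. The sketch below does that with the rows copied from the table. Treating the ratio as "average retries per affected user" is an assumption on my part, not something the data states, and the function name is hypothetical.

```python
# (attempts, users, action) rows copied from the table of filters above.
FILTER_ROWS = [
    (13629, 4743, "Unformatted code"),
    (9219, 2987, "Question with too much code and little context"),
    (8770, 3164, "Question rejected due to low quality"),
    (6285, 3496, "Question blocked"),
    (5666, 3925, "CAPTCHA test failed"),
    (5147, 1396, "New user question throttle"),
    (1355, 944, "Question title word filter tripped"),
    (1282, 422, "Post blocked for containing a link that requires code, without any code"),
    (860, 472, "Post blocked for too many links"),
    (449, 187, "Attempted to post duplicate question"),
    (448, 214, "A post was blocked for having > 1 image"),
    (216, 53, "Post blocked because IP blocked for spam"),
    (58, 6, "Daily question limit reached"),
]


def attempts_per_user(rows):
    """Return (action, attempts / users) pairs, highest ratio first."""
    ratios = [(action, attempts / users) for attempts, users, action in rows]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


ranked = attempts_per_user(FILTER_ROWS)
for action, ratio in ranked:
    print(f"{ratio:5.2f}  {action}")

# Successful posts for comparison: 60,007 questions from 47,914 users.
asked_ratio = 60007 / 47914
print(f"{asked_ratio:5.2f}  asked")
```

The daily question limit tops the list, but with only 6 users behind it; ratios from the smaller rows deserve less weight than those from the large ones.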
Clearly, these don't always prevent a bad question from landing on the site. When the system detects that the user has just dumped code in the body of the question, it gives this error:
It looks like your post is mostly code; please add some more details.
But there's no way to systematically tell if added text is useful details or just words to get around the filter. That's the bad news. The good news is we can use this data to estimate the causes of bad questions:
- Code dumps
- Grammar, spelling and other "quality" problems
- Problem titles
- Misusing links
- Asking too many questions or too many poor questions too quickly
Once the system has a pretty good idea that a user is not going to ask good questions, it starts blocking that user. In that week, 3496 users were prevented from asking at least once. We've made it harder to bypass the block by creating a new account, but it obviously happens sometimes.
Ultimately, we have millions of people visiting the site. While there are many prompts, blocks and limits slowing down questions, the small percentage of visitors who make it through turns out to be a large absolute number. For better or worse, the barriers to asking mean that the most determined (or, perhaps, desperate) users actually post. I'm sure you know decent programmers who have no interest in posting on Stack Overflow because of our reputation for harsh criticism. Those folks don't ask, but probably would produce better than average questions. Meanwhile, the people who just don't care about anything but the chance to get help from a programmer are willing to put up with just about any crap we throw at them.
One theory that people often raise when it comes to the cause of bad questions is that we could reduce the problem by just not answering bad questions. There's a good deal of sense in that argument. Research on artificial societies does suggest that a relatively small amount of reinforcement can encourage undesirable behavior in a population. Bad questions are answered about 28% of the time compared to 53% for all questions. Depending on what you value, asking on Stack Overflow with a 28% chance of getting an answer might be worth putting up with even more filters, blocks, warnings, downvotes, critical comments and so on.
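The two answer rates also imply a rate for questions that are not bad. The sketch below backs it out as a weighted average. It assumes that both percentages describe the same set of questions and that the share of bad questions is the 18,946 out of 60,007 quoted earlier; neither assumption is confirmed here, so treat the result as a rough estimate. The helper name is hypothetical.

```python
TOTAL_QUESTIONS = 60_007
BAD_QUESTIONS = 18_946
ANSWER_RATE_ALL = 0.53  # answer rate quoted for all questions
ANSWER_RATE_BAD = 0.28  # answer rate quoted for bad questions


def implied_rate_for_rest(rate_all, rate_subset, subset_share):
    """Solve rate_all = share * rate_subset + (1 - share) * rate_rest for rate_rest."""
    return (rate_all - subset_share * rate_subset) / (1 - subset_share)


bad_share = BAD_QUESTIONS / TOTAL_QUESTIONS
rate_not_bad = implied_rate_for_rest(ANSWER_RATE_ALL, ANSWER_RATE_BAD, bad_share)

print(f"share of bad questions:        {bad_share:.1%}")
print(f"implied rate for the others:   {rate_not_bad:.1%}")
```

Under those assumptions, questions that avoid the "bad" label are answered well over half the time, which is the gap the "stop answering bad questions" argument leans on.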
A good way to prevent people from answering bad questions would be to close them more quickly. But another way would be to increase the absolute number of good, answerable questions. Think about why people answer questions. If you have a bit of time and would like to help someone out or earn reputation, you might look around for a Stack Overflow question to answer. Sorting by upvotes will show you good questions that probably already have answers or, if not, are difficult to solve. These aren't likely to be productive in terms of helping others or reputation, so answerers need to look further down the list to find unanswered (but answerable) questions.
I think most users would rather answer good questions, but don't feel their contributions will be worthwhile. I certainly noticed that about myself when I first started contributing to Stack Overflow and, more recently, when I tried to get a sock puppet to 1k. Adding filters and blocks and warnings has signalled to sensible people who can read the signs that we don't want their questions. We are running out of obstacles we can throw in people's way. So that's why I'm glad we are tackling the other end of the problem: encouraging askers to post more useful questions.
Some people have expressed surprise that so many asking attempts have been blocked. This isn't really anything new:
For those who are curious, the spike in June and July of 2016 came from the "Post blocked for too many links" filter. Basically, spam. Other than that, the most common hurdle for people is posting unformatted code. (Again, the data from October 8 to 14, 2017 is typical in this regard.) I didn't dig into the data to be sure, but I suspect most of those folks eventually get around the block by either fixing their code formatting or fiddling with the text of the question until the algorithm is satisfied. Either way, we can do better.
Please direct comments to the original post.