Why are answer rates dropping?
(Originally published on meta.stackoverflow Stack Exchange by Jon Ericson.)
Questions really are being answered less often, and are getting fewer answers, than they were in the past:
If you want to look at the query, it's on SEDE. I've filtered out confounding factors:
- I'm only considering answers within 30 days of the question being asked. This eliminates the effect of ancient unanswered questions getting their first answer years later. (2020 data is a bit skewed because we haven't experienced the entire 30-day window just yet.)
- I'm not looking at questions with a negative score, or those that were closed or deleted within the 30-day window. We know there is an increase in bad questions, so I want to remove that factor from this analysis. (Not remove it entirely, as you will read. But just remove the obvious problem with bad questions not being answerable.)
- I'm using our standard definition of "answered" which doesn't count answers that haven't been either upvoted or accepted. Somebody has to have vouched for at least one answer or we don't consider the question answered yet.
In case it isn't clear, unanswered_rate is the ratio of questions that didn't get a positively scored or accepted answer within the 30-day window. answer_rate counts the average number of answers with a positive score or that have been accepted. So not only are there more questions that don't get answered, questions are also less likely to get multiple answers. One thing that has changed is that questions have narrowed considerably in scope over the years. It wasn't unusual for questions to get dozens of answers in 2008, but that's pretty rare these days.
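To make the two metrics concrete, here is a minimal Python sketch of the filters and definitions described above. It is not the SEDE query itself: the Question and Answer record shapes and all function names are hypothetical, and it assumes answer_rate is averaged over the same filtered set of questions that unanswered_rate uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

WINDOW = timedelta(days=30)  # the 30-day window described in the post


@dataclass
class Answer:  # hypothetical record shape, not the SEDE schema
    created: datetime
    score: int
    accepted: bool = False


@dataclass
class Question:  # hypothetical record shape, not the SEDE schema
    created: datetime
    score: int
    closed: Optional[datetime] = None
    deleted: Optional[datetime] = None
    answers: List[Answer] = field(default_factory=list)


def in_window(question: Question, when: Optional[datetime]) -> bool:
    """True if the event happened within 30 days of the question being asked."""
    return when is not None and when - question.created <= WINDOW


def is_eligible(q: Question) -> bool:
    """Drop negatively scored questions and those closed or deleted in the window."""
    if q.score < 0:
        return False
    return not (in_window(q, q.closed) or in_window(q, q.deleted))


def vouched_answers(q: Question) -> List[Answer]:
    """Answers posted in the window that were upvoted or accepted."""
    return [a for a in q.answers
            if in_window(q, a.created) and (a.score > 0 or a.accepted)]


def rates(questions: List[Question]) -> Optional[Tuple[float, float]]:
    """Return (unanswered_rate, answer_rate), or None if nothing is eligible."""
    eligible = [q for q in questions if is_eligible(q)]
    if not eligible:
        return None
    counts = [len(vouched_answers(q)) for q in eligible]
    unanswered_rate = sum(1 for c in counts if c == 0) / len(eligible)
    answer_rate = sum(counts) / len(eligible)  # assumption: mean per eligible question
    return unanswered_rate, answer_rate
```

Under these assumptions, a question whose only answer arrives on day 40 still counts as unanswered, and a downvoted question is excluded from both metrics even if it has an upvoted answer.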
I don't think the reason behind the trend is very complicated:
The questions line includes all questions, including those that were downvoted, closed and/or deleted. The answerers line counts how many people have contributed at least one upvoted answer in the year. Questions have far outpaced people willing to answer them since the beginning, and we've seen fewer answerers since the peak (320,124) in 2016. Each question represents a measure of work to answer, close or edit into shape. As users (such as yours truly) tire of doing that work, we need to replace them with new volunteers. We've simply fallen behind in the last two years.
In my association with the site over the last ten years, I've proposed a number of reasons why people might leave:
- Extrinsic motivation of the reputation system is unsustainable.
- New questions are likely to be more narrowly focused on debugging than in the past.
- The company and the community have an unhealthy relationship.
This isn't a comprehensive list, but the upshot is there are many reasons people stop answering and not all of them are fixable with our current toolset. Even as question rates have dropped in the past two years (-10%), the number of users who have answered has fallen faster (-23%). As long as new users - retiring users < 0, we're going to have a problem. (Unless you like the idea of more questions of varying quality with fewer people to handle them, of course.)
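As a rough illustration of what those two percentages imply together, the sketch below computes the relative change in questions per answerer. The function name is hypothetical, and treating the ratio as a per-answerer workload is a simplification of mine, not a figure from the data.

```python
def relative_load_change(question_change: float, answerer_change: float) -> float:
    """Relative change in questions per answerer, given fractional changes in each."""
    return (1 + question_change) / (1 + answerer_change) - 1


# Figures quoted above: questions down 10%, answerers down 23% over two years.
change = relative_load_change(-0.10, -0.23)
print(f"{change:.1%}")  # prints 16.9%
```

In other words, even with fewer questions coming in, each remaining answerer would face roughly 17% more questions than two years earlier if the work were shared evenly.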
Naturally Meta has focused on retiring users: "If we could just fix a set of problems on the site, we'd stop losing valuable contributors." But that's only half the problem. Inevitably people will retire, even if it's just that they are retiring in the real world and spending their time playing with the grandchildren instead of answering Stack Overflow questions. From 2008 to 2012, Stack Overflow saw meteoric growth, then leveled off until 2015 or so. In the last few years we saw a drop-off in the number of new answerers:
Ignore the last point, obviously. We're a long way from knowing what will happen this year. That said, our plans for 2019 include initiatives such as the Ask a Question wizard to reduce bad questions (and hopefully increase good ones) and Custom Question Lists to help users find questions to answer. There are a few other projects we're considering that I hope will be announced soon. However, if these technical tools are to work, we'll also need to make cultural adjustments. As Clay Shirky has pointed out:
But you cannot completely program social issues either. So you can't separate the two things, and you also can't specify all social issues in technology. The group is going to assert its rights somehow, and you're going to get this mix of social and technological effects.
Please direct comments to the original post.