(Originally published on meta Stack Exchange by Jon Ericson.)


If you look around the internet, you'll find that:

"99.9 percent of comments are either spam, off-message or simply wasted electrons." (ithacaindy, obviously part of the 0.1%.)

Stack Exchange does not have that problem. Thanks to flagging, our spam and offensive comments have a half-life of minutes. So the goal of this proposal is not to mess with a good thing. Instead, I propose we increase our information density by hiding comments that are not bad, but just trivial and likely obsolete.

Currently, on Stack Overflow and other non-meta sites, the five highest-voted comments are displayed in chronological order, and the sixth and subsequent comments are hidden behind a link. On posts with many comments, this means the earliest comments tend to be shown even when they are not particularly useful or friendly. On posts with five or fewer comments, every comment is a nearly permanent fixture, even after an edit or the passage of time has made it obsolete. To rectify the situation, I propose using:

Comment Weight

Each comment is given a weight from 0 to 29, the sum of three components:

  • Score: one point per upvote, up to a maximum of 10.
  • Length: one point for every 15 characters beyond the 15-character minimum, capped at 9.
  • Age: 10 minus the comment's age in days, never going below 0.

A comment with no votes, fewer than 30 characters, and an age of 10 days or more has a weight of zero. A 150-character comment that gets ten votes on its first day will temporarily weigh in at the maximum of 29.

Comments with a weight below 10 are hidden.[1] All remaining comments are shown in chronological order.[2]
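For anyone who wants to experiment, here is a minimal Python sketch of the weight and the display rule. The `Comment` record, `comment_weight`, `visible_comments`, and the `cap` parameter are hypothetical names, and the sketch assumes length is counted in characters and age in whole days elapsed. The optional `cap` is one possible reading of the long-thread edge case in footnote 2: keeping only the heaviest qualifying comments is an assumption, not part of the proposal.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Assumptions, not part of the proposal: length is counted in characters and
# age is the number of whole days since the comment was posted.
MIN_COMMENT_LENGTH = 15
DISPLAY_THRESHOLD = 10


@dataclass
class Comment:  # hypothetical record type
    created: datetime
    votes: int
    text: str


def comment_weight(votes: int, length: int, age_days: int) -> int:
    """Return a weight from 0 to 29: score + length + age points."""
    score_points = min(votes, 10)  # one point per vote, up to the tenth
    length_points = min(max(0, (length - MIN_COMMENT_LENGTH) // 15), 9)
    age_points = max(0, 10 - age_days)  # 10 on day 0, reaching 0 on day 10
    return score_points + length_points + age_points


def visible_comments(comments: List[Comment], now: datetime,
                     cap: Optional[int] = None) -> List[Comment]:
    """Return comments weighing at least 10, in chronological order.

    `cap` is a hypothetical option for very long threads: if set, only the
    `cap` heaviest qualifying comments are kept (an assumption).
    """
    def weight(c: Comment) -> int:
        return comment_weight(c.votes, len(c.text), (now - c.created).days)

    shown = [c for c in comments if weight(c) >= DISPLAY_THRESHOLD]
    if cap is not None:
        shown = sorted(shown, key=weight, reverse=True)[:cap]
    return sorted(shown, key=lambda c: c.created)


# The two worked examples from the text:
print(comment_weight(votes=0, length=29, age_days=11))   # 0
print(comment_weight(votes=10, length=150, age_days=0))  # 29

now = datetime(2013, 11, 4)
thread = [
    Comment(datetime(2013, 10, 1), 0, "Thanks, that worked!"),  # old, short, unvoted
    Comment(datetime(2013, 10, 2), 3, "x" * 120),  # 120 characters, 3 votes
    Comment(datetime(2013, 11, 4), 0, "Did you try restarting?"),  # posted today
]
print([c.votes for c in visible_comments(thread, now)])  # [3, 0]
```

The old, short, unvoted comment drops out; the 120-character comment with three votes stays (3 + 7 + 0 = 10), and the brand-new comment is shown on the strength of its age points alone.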

Let's look at each factor individually:

Age

Every comment starts with 10 age points, which decay to zero over its first 10 days. That means every comment is displayed for at least one day, and most[3] will be displayed longer. Several proposals have suggested hiding old comments; the common thread is that most comments stop mattering once the person they were addressed to has seen them.[4] Under the top-five system, comments are assumed to be valuable unless there are too many of them. Under the weighting system, comments must demonstrate their value in order to be displayed. The age factor gives comments the time they need to gather support.
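Because score and length points never decay, the age rule determines exactly how long a comment stays up without further support. Writing F for a comment's score points plus length points, its weight on day d (for d up to 10) is F + 10 − d, which is at least 10 exactly when d ≤ F; if F is already 10 or more, the comment never hides. A minimal sketch, again assuming whole-day ages and using a hypothetical helper name, reproduces the one-day minimum and the figure in footnote 3 for a median-length comment:

```python
from typing import Optional


def last_visible_day(votes: int, length: int) -> Optional[int]:
    """Return the oldest age, in whole days, at which the comment still weighs
    at least 10, or None if it never drops below that threshold."""
    score_points = min(votes, 10)
    length_points = min(max(0, (length - 15) // 15), 9)
    fixed = score_points + length_points  # the part of the weight that never decays
    if fixed >= 10:
        return None  # visible permanently
    # On day d the weight is fixed + (10 - d), which is >= 10 exactly when d <= fixed.
    return fixed


print(last_visible_day(votes=0, length=20))   # 0: shown on its first day only
print(last_visible_day(votes=0, length=113))  # 6: a median-length comment
print(last_visible_day(votes=4, length=113))  # None: 4 + 6 points never decay
```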

Length

Lukas Mathis conducted a survey that showed that, statistically, longer comments tend to be better. While I could find no other studies to support or refute this claim, it does match my experience.[5] Not every long comment is worth keeping around, but as comment length approaches 150 characters or so, the odds get a lot better. The length factor is capped at 9 so that a commenter can't pad a comment to force its display: once its age points run out, even the longest comment needs an upvote from a second person to remain visible.
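A minimal sketch of the length component on its own (the helper name is hypothetical) shows where the cap bites: a comment earns its ninth and final length point at 150 characters, so padding it out toward the 600-character comment limit earns nothing extra.

```python
def length_points(length: int) -> int:
    """One point per 15 characters beyond the 15-character minimum, capped at 9."""
    return min(max(0, (length - 15) // 15), 9)


for n in (20, 30, 113, 150, 600):
    print(n, length_points(n))
# 20 0
# 30 1
# 113 6
# 150 9
# 600 9
```

Once its age points are gone, a comment with no votes therefore tops out at 9, one point short of the display threshold of 10.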

Score

It's very difficult to properly evaluate comments in isolation from their post. This is where users come in. Only people can tell which comments deserve pride of place. Voting allows you to decide whether or not a comment gets shown to future readers.

People aren't perfect: in my observation, votes on short comments tend to mean "Funny", while votes on longer comments tend to mean (to take a page from Slashdot) "Insightful", "Interesting", or "Informative". The comments we want to keep around combine length and upvotes.
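To make that combination concrete: after ten days a comment's weight is just its score points plus its length points, so the number of upvotes it needs to stay visible for good is 10 minus its length points. A sketch with a hypothetical helper name, under the same assumptions as above:

```python
def min_votes_to_persist(length: int) -> int:
    """Upvotes a comment needs to stay visible once its age points reach zero.

    After 10 days the weight is score points + length points, and it must
    reach the display threshold of 10 on those alone.
    """
    length_points = min(max(0, (length - 15) // 15), 9)
    return 10 - length_points  # between 1 and 10, since length points max out at 9


for n in (20, 30, 113, 150, 600):
    print(n, min_votes_to_persist(n))
# 20 10
# 30 9
# 113 4
# 150 1
# 600 1
```

A short quip needs the full ten votes to survive, while a 150-character explanation needs only one.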

How you can help

As a baseline, as of November 4, 2013, the top-five scheme hides 3,059,691 of the 24,136,126 undeleted comments on Stack Overflow. The weight algorithm would hide 22,517,301, which is 93% of comments compared with 13% now. I feel confident that much of that is noise, but some signal is bound to be lost as well.
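The percentages follow directly from the counts above. A quick check, which also shows how many comments each scheme would leave visible:

```python
total = 24_136_126             # undeleted Stack Overflow comments, November 4, 2013
hidden_top_five = 3_059_691    # hidden by the current top-five scheme
hidden_by_weight = 22_517_301  # hidden by the weight algorithm

print(f"top five: {hidden_top_five / total:.0%} hidden, {total - hidden_top_five:,} shown")
print(f"weight:   {hidden_by_weight / total:.0%} hidden, {total - hidden_by_weight:,} shown")
# top five: 13% hidden, 21,076,435 shown
# weight:   93% hidden, 1,618,825 shown
```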

I've written a query that displays the comments the weight algorithm would show. Please take a few minutes to explore some posts with problematic comments.[6] Fork my query and tweak the algorithm, then write up your findings in an answer below. Let us know if you find significantly useful comments that would be lost, or long comment threads that this weighting algorithm would make even noisier.


  1. That is, they will be behind the "add / show X more comments" link.

  2. There is a significant edge case surrounding very long, very heated comment threads. If we don't cap the number of comments somehow, some posts will show as many as 55(!) comments. (See the top 100, which includes election posts.) The simplest solution is to keep the current caps (5 for main sites and 15 for meta sites). I'm certainly open to other ways of keeping comments from overwhelming answers in these cases.

  3. The median comment length is 113 characters, which earns 6 length points, so half of all comments will be shown for at least 6 days.

  4. If you need convincing, take a look at 5 random comments. This query picks only short comments (fewer than 100 characters), for reasons explained shortly.

  5. Take a sample of 5 long comments and see for yourself.

  6. For instance, back in August, Toby Allen pointed out “a whole load of snarky irrelevant and unhelpful comments”. Many of the worst have since been deleted, but plenty of not-so-useful comments remain.


Please direct comments to the original post.