A Coherent Rating System for Scientific Learning Resources

Submitted by gsadmin on April 12th, 2011

The time has come: we need to figure out a great rating scheme. This post isn’t so much a call for help as it is a way for me to organize my thoughts before I seek assistance. I’m guessing you’ll find the CliffsNotes version of this post on Stack Overflow or Reddit in the very near future.

It’s very likely that this particular problem has a pre-existing solution that I’m not familiar with, so I’m going to assume one exists and that we don’t need to reinvent the wheel here. However, it’s important to get this right from the start, as there’s no easy way to go back and fix it.

The Mission

Our site relies on registered, logged-in users to vote on each item within our system. Some of these items are content pages on our site and some of them are links to other resources. The rating system should work the same for both types.

Content items under each individual subcategory (only two levels of categories: parent and child) need to be sorted and displayed in ascending or descending order based on their rating.

This rating, as I see it, needs to be an average rating based on all votes over time. The votes are whole numbers from 1 to 5, inclusive, while the rating displays one decimal place. There can be, in theory, an unlimited number of votes on each item.

New Items

Items submitted to the site will collect in a moderation queue before being made public. We want to be as aggressive as possible when it comes to spam, particularly in our content, and the only way to ensure this is to manually review each item. An item becomes “published” when a moderator votes on the particular item (one vote causes the system to change the content item’s status). Here’s the first problem.

To publish a content item, a vote must be placed for it (no problem). But what should that vote be? Here are my two thoughts:

  • Publishing could just set the rating to “not rated.” The first person to rate that content would start the ball rolling.
  • Publishing an item could set the rating right in the middle, 3 out of 5. This sets a middle standard and allows content to float or sink from that middle.

After writing this out, I believe I’m partial to the second option. This would give new content a better chance and avoid gaming of the system (voting your own content as a 5 right when it’s published).
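To make the difference between the two options concrete, here is a small arithmetic sketch. It assumes the moderator’s publishing vote simply counts as one ordinary vote of 3 in a plain average; the helper name `average` and the constant `SEED_VOTE` are made up for illustration and are not part of the existing system.

```python
def average(votes):
    """Plain mean of a list of 1-5 votes; None when there are no votes yet."""
    return sum(votes) / len(votes) if votes else None


SEED_VOTE = 3  # assumption: the publishing vote counts as one ordinary vote of 3

# Option 1: published as "not rated", then the submitter votes 5 on their own item.
unseeded = average([5])

# Option 2: published with a seed vote of 3, then the same self-vote of 5.
seeded = average([SEED_VOTE, 5])

print(unseeded, seeded)  # prints: 5.0 4.0
```

Under these assumptions a single self-vote of 5 lands the item at 4.0 rather than 5.0, which is the dampening effect described above.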

Rating on Existing Items

The voting system that is in place records each vote placed by each user. When the list of content links is displayed, each story calls forth all of its votes, calculates an average of some sort, then displays this number. I’m guessing that this system is what prevents duplicate votes.

Here’s the rub: we need to display content based on its overall rating and, right now, this rating is not being stored. I need to figure out the best way to store this rating and I’m tempted to just go with the simplest thing I can think of: create a new average based on the new vote. An example:

If a content link has been voted on 20 times (the system records the number of votes) then the rating is the average score of those 20 votes. If another vote happens, we calculate the rating as such:

(current rating * current number of votes + new vote) / new total number of votes

Each new vote should calculate and then store a new average in the system for that particular link. It seems simple but I wonder if I’m missing something…
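The update rule above can be sketched in a few lines. This is only an illustration under my own assumptions: the function name `update_rating` is hypothetical, the stored average is kept unrounded, and rounding to one decimal place happens only at display time.

```python
def update_rating(current_rating, vote_count, new_vote):
    """Fold one new 1-5 vote into a stored average.

    Returns the new (unrounded) average and the new vote count.
    """
    if not 1 <= new_vote <= 5:
        raise ValueError("votes must be whole numbers from 1 to 5")
    new_count = vote_count + 1
    new_rating = (current_rating * vote_count + new_vote) / new_count
    return new_rating, new_count


# 20 votes averaging 4.0, then one new vote of 5.
rating, count = update_rating(4.0, 20, 5)
print(f"{rating:.1f} from {count} votes")  # prints: 4.0 from 21 votes
```

One thing to watch for: if the stored value were the rounded one-decimal rating, rounding error would feed back into every later update. Keeping the full-precision average (or the running sum of votes plus the count) avoids that.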

Inherent in this line of thinking is the question: is a completely democratic method of voting the right way to sort content? The system has a way to rate members; is it a good idea to incorporate this user rating into the voting? At present, I’m not too familiar with the user rating system and its inherent flaws, so I’m a bit wary about plugging this mystery system into something that already works.

Edit: I think I like the idea of a Bayesian approach. Source:  http://stackoverflow.com/questions/1411199/what-is-a-better-way-to-sort-by-a-5-star-rating

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:

  • R = average rating for the content item
  • v = number of votes for the content item
  • m = minimum votes required to be published (1)
  • C = the mean vote across all items on the site (will likely just go with the middle, 3)
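Written out as code, the formula is short. The sketch below is a direct transcription using the symbols from the list; the function name `weighted_rating` and the descriptive parameter names are my own, and the defaults (m = 1, C = 3) are the tentative values mentioned above, not settled choices.

```python
def weighted_rating(avg_rating, votes, min_votes=1, prior_mean=3.0):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C

    avg_rating = R, votes = v, min_votes = m, prior_mean = C.
    """
    if votes < 0 or min_votes <= 0:
        raise ValueError("votes must be >= 0 and min_votes must be > 0")
    total = votes + min_votes
    return (votes / total) * avg_rating + (min_votes / total) * prior_mean


print(weighted_rating(5.0, 1))              # prints: 4.0
print(round(weighted_rating(4.5, 500), 3))  # prints: 4.497
```

With m = 1 and C = 3 this is arithmetically the same as averaging the v real votes together with one extra vote of 3, since (v × R + 3) ÷ (v + 1) expands to the same two terms.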

Displaying Items

The biggest issue here is how the items should be displayed. A common problem among rating sites is poor sorting of the rated content links. Something with 1 vote of 5 should not be higher than one of 500 votes with an average of 4.5. I think some of this will be handled by the published vote of three (see above) but possibly not.

By using the Bayesian equation above, I think some of this is taken care of, especially if we give it a few extra significant figures. The weight on the first term, v ÷ (v+m), climbs from zero toward one as the total number of votes, v, grows, so WR moves toward the item’s own average, R. The weight on the second term, m ÷ (v+m), shrinks toward zero at the same time, so C matters less and less as votes accumulate.
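A quick sorting experiment shows how much the choice of m matters. The three items below are made-up sample data, and `rank` is a hypothetical helper that sorts in descending order by the weighted rating with C fixed at 3.

```python
def weighted_rating(R, v, m=1, C=3.0):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C"""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical sample items: R is the plain average, v the number of votes.
items = [
    {"title": "one vote of 5", "R": 5.0, "v": 1},
    {"title": "three votes of 5", "R": 5.0, "v": 3},
    {"title": "500 votes averaging 4.5", "R": 4.5, "v": 500},
]


def rank(items, m):
    """Return item titles sorted from highest to lowest weighted rating."""
    ordered = sorted(
        items,
        key=lambda item: weighted_rating(item["R"], item["v"], m=m),
        reverse=True,
    )
    return [item["title"] for item in ordered]


for m in (1, 10):
    print(m, rank(items, m))
```

With m = 1 the single vote of 5 drops to 4.0, below the 500-vote item at roughly 4.497, which is the behavior we want. But three votes of 5 score exactly 4.5 and still edge out the 500-vote item. With m = 10 the 500-vote item ranks first at about 4.47. So the equation helps, but m = 1 is a weak pull toward the middle; a larger m may be needed if thinly voted items should not outrank well-established ones.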

Still, this article talks about sorting based on rating and points out a few sites that do it wrong.

http://www.evanmiller.org/how-not-to-sort-by-average-rating.html