November 5, 2017 - Added a recency multiplier so that games played recently weigh more than games played much earlier. I did this because I felt the ratings weren't catching up to a team's "true" ability quickly enough as more games came into play.
Elo-ish is a system I developed back in 2016 for measuring the strength of college football teams. I've always enjoyed the relative simplicity of Elo ratings and the way you can derive a win probability from the rating itself.
However, I feel like Elo itself isn’t particularly well suited for college football. The Elo system was originally developed for chess, a game in which a competitor may play hundreds, if not thousands, of games over the course of a career. In college football this is not the case; there are only 12 regular season games, plus a conference championship and bowl games if you’re good enough. At most, a team will play 15 games within the course of a season, not nearly enough for you to gain a real sense of how strong they are. There are theoretical ways to overcome this, but I didn’t feel satisfied with them when testing.
The other piece of Elo-ish is something called least squares. In essence, what you do is take the observed game result and compare it to the expected game result. If the expected game result is in line with the observed result, ratings don’t change very much. However, if there is an upset or a massive victory when only a close one was expected, the ratings will change quite a bit.
Least squares also has the benefit of “linking” teams in a way that Elo doesn’t. Linking refers to the idea that all teams are connected in some way to another team by the result of a game. This allows ratings to change even without playing; one of your opponents that was rated very highly could lose badly to a team they were supposed to beat. While it may seem unfair, this will cause your rating to decline as well, since the system will now see your schedule as weaker than it once was. In a pure Elo system, if you’re rated 1700, you’re rated 1700, no matter if your opponents turn out to be overrated or underrated.
This method of having teams be interconnected to each other allows for a better measure of a team’s true strength.
First off, we need a measure of a team's performance in a game or a series of games. To get it, we compare the game's margin of victory to the margins of all the other games in the season through the cumulative normal distribution, which yields a "game score." The game score is then multiplied by 400 to get the winner's performance rating for that single game, and the loser simply gets the negative of the winner's value. This is based on the "performance rating" used in Elo to provisionally rate chess players.
Using the cumulative normal distribution for the game score is beneficial in a few ways. For one, it gives diminishing returns on margin of victory, so teams can't simply run up the score to improve their rating. At the same time, it doesn't cap margins: it still takes in information from massive blowouts instead of cutting things off at an arbitrary point, just not so much that they skew the ratings. It also scales with how big or small the average margin of victory was in a given year, so if we ran this back across the years, we could compare teams from different eras. For example, a 14-point win in 2016 is probably not equivalent to a 14-point win in 1939.
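The game-score step can be sketched in a few lines. This is an illustration, not the site's actual code: the text doesn't say exactly how the season's margins are summarized, so the sketch assumes a normal distribution fitted with the sample mean and standard deviation of every margin of victory in the season. The function name `game_performance` and the sample margins are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def game_performance(margin, season_margins, scale=400):
    """Return (winner_performance, loser_performance) for one game.

    Assumption: the margin is compared against a normal distribution fitted
    to all margins of victory in the season (sample mean and std. deviation).
    """
    dist = NormalDist(mean(season_margins), stdev(season_margins))
    game_score = dist.cdf(margin)  # a value between 0 and 1
    winner = scale * game_score  # winner's single-game performance rating
    return winner, -winner  # the loser gets the negative of the winner's value


# Hypothetical margins of victory from one season.
margins = [3, 7, 16, 25, 29]
print(game_performance(16, margins))  # margin equal to the season mean -> (200.0, -200.0)
```

Because the normal CDF flattens out in the tails, each extra point of margin in a blowout adds less to the performance rating than a point near the season average does, which is the diminishing-returns behavior described in the next paragraph.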
We then add up each team's performance ratings, divide by the number of games it has played, and add 1500 to put the result on an Elo scale.
After we've done this, we use least squares to compare the game results. In essence, we take the actual game difference and compare it to what we'd expect to happen given the ratings of the two teams.
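The averaging step is simple enough to show directly. This is a minimal sketch; `provisional_rating` is a hypothetical name, and the input is assumed to be the list of a team's single-game performance ratings on the 400-scaled basis.

```python
from statistics import mean


def provisional_rating(performances, base=1500):
    """Average a team's single-game performance ratings and shift onto the Elo scale."""
    return base + mean(performances)


# Hypothetical team with two wins and one loss.
print(provisional_rating([200, -100, 200]))  # 1600
```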
First, we need the actual game difference. This is simply the winner's game score minus the loser's game score, which is effectively the winner's game score multiplied by 2. Then we need the expected game difference, which is the winner's rating minus the loser's rating, plus or minus a home field constant that I'll explain shortly. We then subtract the actual game difference from the expected game difference and square the result to get rid of negative numbers, which yields the squared error.
The squared errors from every game are added together to get the sum of squared errors. To get the final ratings, we minimize the sum of squared errors, with the constraint that the average of all teams must equal 1500. This is why the method is called "least squares": we're looking for the smallest possible sum of squared errors.
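A sketch of the squared error for a single game follows. Two assumptions are baked in and are not stated explicitly in the text: the game score is taken on the Elo-ish scale (already multiplied by 400) so that it can be compared with a rating difference, and the home constant is added when the winner was at home and subtracted when the loser was. The names `squared_error` and the `site` labels are hypothetical.

```python
HOME_FIELD = 50  # Elo-ish points (see the home field discussion)


def squared_error(winner_rating, loser_rating, winner_performance, site):
    """Squared error for one game.

    site: "winner_home", "loser_home" or "neutral" (hypothetical labels).
    winner_performance is assumed to be on the Elo-ish scale (game score x 400).
    """
    actual = 2 * winner_performance  # winner's value minus the loser's (its negative)
    hfa = {"winner_home": HOME_FIELD, "loser_home": -HOME_FIELD, "neutral": 0}[site]
    expected = winner_rating - loser_rating + hfa
    return (expected - actual) ** 2


# A 1600 team beats a 1500 team at home with a performance rating of 100:
# expected difference 150, actual difference 200.
print(squared_error(1600, 1500, 100, "winner_home"))  # 2500
```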
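The text doesn't say which optimizer finds the minimum, so the sketch below uses Gauss-Seidel sweeps (a form of coordinate descent), a technique chosen here only for illustration. Each sweep sets a team's rating to the value that zeroes the derivative of the sum of squared errors with every other rating held fixed; because shifting all ratings by a constant doesn't change any error, the ratings are then re-centered on 1500. It assumes every team is linked to every other through some chain of games, and that actual differences are already on the Elo-ish scale. `fit_ratings` and the tuple layout are hypothetical.

```python
from collections import defaultdict

HOME_FIELD = 50  # Elo-ish points


def fit_ratings(games, base=1500, sweeps=500):
    """Minimize the sum of squared errors while keeping the average rating at `base`.

    games: list of (winner, loser, actual_diff, site) tuples, with site one of
    "winner_home", "loser_home" or "neutral" (hypothetical labels).
    """
    # Re-express every game from each team's point of view.
    views = defaultdict(list)  # team -> [(opponent, margin, home_adj)]
    for winner, loser, diff, site in games:
        hfa = {"winner_home": HOME_FIELD, "loser_home": -HOME_FIELD, "neutral": 0}[site]
        views[winner].append((loser, diff, hfa))
        views[loser].append((winner, -diff, -hfa))

    ratings = {team: float(base) for team in views}
    for _ in range(sweeps):
        # Best rating for each team given its opponents' current ratings.
        for team, rows in views.items():
            ratings[team] = sum(ratings[opp] + margin - hfa for opp, margin, hfa in rows) / len(rows)
        # Errors ignore a constant shift, so re-center the average on `base`.
        shift = base - sum(ratings.values()) / len(ratings)
        for team in ratings:
            ratings[team] += shift
    return ratings


# Hypothetical season: A beats B by 100, B beats C by 100, all at neutral sites.
print(fit_ratings([("A", "B", 100, "neutral"), ("B", "C", 100, "neutral")]))
```

The re-centering step is what enforces the "average equals 1500" constraint; without it the solution would only be determined up to a constant.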
That home field advantage constant I mentioned earlier? It's worth about 50 Elo-ish points, or about 2.5 to 3 points on the scoreboard, which is in line with what is generally considered the home field advantage in college football. For other sports, it may be different.
The ratings page has four columns: rank, team, rating, and power. The rank and the team are pretty self-explanatory. The "rating" is, well, the rating. This is used to make all the projections you see on the site. Elo-ish ratings are always set so that the average is 1500, and higher ratings are better than lower ones.
Elo-ish ratings are used to figure out the percentage chance that your team has to win. For example, if Alabama, a 2000-rated team, were to play Arkansas, a 1500-rated team, at a neutral site, Alabama's chance of winning would be:
$$P(\text{Alabama wins}) = \frac{1}{1 + 10^{(1500 - 2000)/400}} \approx 94.7\%$$
This is the same equation the Elo rating system uses to figure out the chance that one team or competitor beats another. The 400 in the divisor can be varied depending on the sport; my NFL projections, for example, use a divisor of around 600. For the NBA and college football, the divisor is set to 400.
In general, a team rated 100 points better would have a 64% chance to win, a team rated 400 points better would have a 91% chance of winning, and a team rated 800 points better would have a 99% chance of winning. Note that this is slightly different for NFL projections due to the different divisor.
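The win-probability formula translates directly into code. This is a minimal sketch of the standard Elo expected-score equation with the divisor exposed as a parameter; `win_probability` is a hypothetical name, and home field is ignored here (neutral site).

```python
def win_probability(rating, opp_rating, divisor=400):
    """Elo-style chance that a team rated `rating` beats `opp_rating` at a neutral site."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / divisor))


print(round(win_probability(2000, 1500), 3))  # Alabama vs. Arkansas: 0.947
for gap in (100, 400, 800):
    print(gap, round(win_probability(1500 + gap, 1500), 2))  # 0.64, 0.91, 0.99
```

Passing a larger divisor, such as the roughly 600 mentioned for the NFL, flattens the curve so the same rating gap translates into a smaller edge.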
The last part of the ratings is something called "power," or power ratings. Subtracting one team's power rating from another's gives you the number of points the first team is expected to win by at a neutral site. The power ratings depend on the sport: a 1700 Elo-ish rating may equal a power rating of 105 in one sport but 120 in another. This is because the power ratings come from a linear regression that best matches rating differences to margins of victory and outputs how many points better or worse than average a team is.
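One way to sketch the conversion to power ratings is below. The exact regression isn't specified, so this assumes a least-squares line through the origin (two equal teams at a neutral site are expected to tie), fitted on neutral-site rating differences and margins; the function names and sample data are hypothetical. The result is shifted so the average team sits at 100.

```python
def fit_points_per_rating_point(rating_diffs, margins):
    """Least-squares slope of margin on rating difference, with no intercept."""
    num = sum(x * y for x, y in zip(rating_diffs, margins))
    den = sum(x * x for x in rating_diffs)
    return num / den


def power_rating(rating, slope, rating_avg=1500, power_avg=100):
    """Points better or worse than an average team, centered on 100."""
    return power_avg + slope * (rating - rating_avg)


# Hypothetical neutral-site games: rating differences and observed margins.
slope = fit_points_per_rating_point([100, 200, -50], [5, 10, -2.5])
print(slope)  # about 0.05 points per Elo-ish point in this made-up data
print(power_rating(1700, slope))  # about 110
```

Because the slope is fitted per sport, the same Elo-ish rating can map to different power ratings in different sports, as described above.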
The power ratings are set with 100 as the average. This is a completely arbitrary number; I just don’t like negative numbers, and the power ratings could have been set with any number as the average. I chose 100 because it’s a nice, clean, even number.
The projections on my site show a few main things: the date, the home team and the away team (with the home team always listed first), the home spread, and the home team's percentage chance to win.
The home spread follows Vegas custom. A negative number in the home spread column means the home team is actually the favorite, and a positive number means it is actually the underdog. Why Vegas decided to do it like this, I have no idea, but I have it like this to avoid confusion for people who are familiar with reading Vegas spreads.
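The sign flip is easy to get backwards, so here is a small sketch. It assumes the projected home margin is the power-rating difference plus a home field advantage expressed in points (for example, the 50 Elo-ish points converted with the power-rating slope), and zero at a neutral site. The function name `home_spread` and the parameter `home_field_points` are hypothetical.

```python
def home_spread(home_power, away_power, home_field_points=0.0):
    """Vegas-style home spread: negative when the home team is favored.

    home_field_points is a hypothetical input: the home edge in points
    (0 at a neutral site).
    """
    expected_home_margin = home_power - away_power + home_field_points
    return -expected_home_margin  # Vegas lists the favorite with a minus sign


print(home_spread(110, 100, 3))  # -13: the home team is a 13-point favorite
print(home_spread(95, 100))  # 5.0: the home team is a 5-point underdog
```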
The home win probability uses the equation discussed earlier in the "Interpreting Ratings" section to give the percentage chance that the home team wins. To get the chance that the away team wins, simply subtract the home win probability from 1.
Projections are done with the site of the game in mind. A home team is given a 50-point Elo-ish boost to its rating, while games at a neutral site give no boost to either team.
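Putting the home boost and the win-probability equation together gives a projection like the sketch below. It follows the text (50-point boost for the home team, none at a neutral site, away chance equal to 1 minus the home chance); the function name `project` is hypothetical.

```python
HOME_BOOST = 50  # Elo-ish points added to the home team's rating


def project(home_rating, away_rating, neutral=False, divisor=400):
    """Return (home_win_prob, away_win_prob) for one game."""
    boost = 0 if neutral else HOME_BOOST
    p_home = 1 / (1 + 10 ** ((away_rating - (home_rating + boost)) / divisor))
    return p_home, 1 - p_home


print(project(1500, 1500, neutral=True))  # (0.5, 0.5)
print(round(project(1500, 1500)[0], 3))  # 0.571: evenly rated teams, home edge only
```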
Projections on the site are sortable, so you can easily look up which teams have the best and worst chances of winning, or sort the home and away teams alphabetically.