Continued from
Goko's Rating System, Part 1: ... in a formula!
What we were told:
Back in January 2013,
CEO Ted Griggs (correction: non-CEO Trisha Brooke)
told us that Goko's rating system tracks μ and σ for each player and displays your rating as μ-2σ.
Qvist immediately guessed that they were running TrueSkill, but other forum members were more skeptical.
Mr. Griggs complicated the matter by
also telling us that they use an "Elo-type rating system." Ordinary Elo doesn't track σ for each player, so Mr. Griggs had to mean that they use some fancy-pants Elo derivative. That's bad news for figuring out the system, since it could easily be something obscure or even proprietary. Then again, maybe Mr. Griggs is a manager, not a mathematician...
What we know:
Regardless of what Mr. Griggs or anyone else tells us, there are a few things we can be sure of:
- The rating system can handle games with more than two players.
- The ratings are based on who came in what place, rather than VPs or anything else Dominion-specific.
And unless their responders on getsatisfaction are
completely full of crap...
- The system really does track μ and σ for each player.
- It really does display some "conservative" measure like μ-kσ.
- It's actually supposed to be possible for your rating to go down after a win.
Maybe there are dozens of rating systems that fit this description, but TrueSkill is the only one I could find outside of academic publications, let alone download in a ready-made implementation.
Throw in the facts that Goko's devs never seemed expert/mathy enough to have implemented a totally novel system, and that they had the working example of Isotropic in front of them when they developed Goko... and well, it would have been pretty surprising if their system had turned out to be anything other than TS or some modified version.
How to be sure?
If we're really, really lucky, the rating system itself might be implemented client-side. It's hard to imagine that even Goko developers are that crazy, but given their track record, I figured I'd better check... alas, no such luck. Fortunately, there's a way to reveal μ and σ at least. Just invoke the plausibly-named FS.Connection.getRating()... and you get a usage error. But the error is friendly enough that you can guess at what the right usage might be.
You can query your Pro rating like this:
// conn is presumably the client's FS.Connection instance
conn.getRating({
    version: 1,
    playerId: mtgRoom.getLocalPlayer().getId(),
    ratingSystemId: mtgRoom.options.ratingSystemPro
}).then(function (resp) {
    // ratingData carries the raw mean (μ) and SD (σ)
    console.log(resp.ratingData);
});
>> Object {SD: 263.14332861731685, mean: 6801.097762744435}
Great! So my real rating is {μ=6801, σ=263}. And my displayed rating is 6275, which is indeed μ-2σ.
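As a quick check of that arithmetic, here is a minimal sketch. The function name displayedRating is mine, not Goko's, and rounding to the nearest integer is an assumption that happens to match the number shown in the client.

```javascript
// Conservative display rating: mean minus k standard deviations.
// k = 2 matches the numbers in this post; the name and the rounding
// step are assumptions, not anything taken from Goko's code.
function displayedRating(ratingData, k = 2) {
  return ratingData.mean - k * ratingData.SD;
}

const mine = { SD: 263.14332861731685, mean: 6801.097762744435 };
console.log(Math.round(displayedRating(mine))); // 6275
```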
Running the same function on an opponent yields his rating too. So we can get pre-game ratings, plug them into TS or Glicko or any other system and see how its rating changes compare to Goko's. The only problem is that we have no idea what parameters to use for those systems.
Bringing out the heavy artillery
Back in June 2013,
DStu proposed approximating Goko's rating system using numerical optimization. His idea was to get a good approximation of Goko's system by finding a set of TS parameters that yielded similar results. Of course, since they turned out to be running exactly TS, the technique actually yields perfect results. I tried
implementing his approach.
TrueSkill has five parameters: μ₀ and σ₀ for new player ratings, and β, τ, and ε for updating ratings after a game. (ε, the draw margin, is normally derived from a draw probability, which is why the output below reports draw_prob.) I gathered Pro rating changes from some real games, plugged them into a numerical optimizer, and asked it to find the TS parameters that minimize the rating prediction error.
Estimating update parameters using Game 1:
Finished.
Error-minimizing TrueSkill Parameters:
beta: 1375.00
tau: 27.50
draw_prob: 0.05
Residual Error: 0.0000
Remember back when you did problems from a textbook, and the right answer was always some nice clean-looking number? I've missed that.
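For anyone who wants to poke at this themselves, here is a rough sketch of the idea, not the script I actually ran. It assumes the standard closed-form TrueSkill update for a two-player game with a clear winner, approximates the normal CDF with a textbook erf formula, and replaces the real optimizer with a brute-force grid over β only. Every function name is made up for illustration, and the opponent's rating is invented; the "observed" result is generated from the known parameters so the search has something to recover.

```javascript
// Sketch only: two-player TrueSkill update (winner/loser, no draw)
// plus a brute-force fit of beta. Names and data are hypothetical.

// Standard normal density.
function pdf(x) {
  return Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
}

// Standard normal CDF via the Abramowitz-Stegun erf approximation
// (formula 7.1.26, absolute error around 1e-7).
function cdf(x) {
  const z = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * z);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-z * z);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Inverse CDF by bisection; slow but dependency-free.
function ppf(p) {
  let lo = -10, hi = 10;
  for (let i = 0; i < 200; i++) {
    const mid = (lo + hi) / 2;
    if (cdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// One update. Ratings use the same {mean, SD} shape as ratingData.
function trueskillUpdate(winner, loser, p) {
  // Draw margin derived from the draw probability (two players).
  const eps = ppf((p.drawProb + 1) / 2) * Math.SQRT2 * p.beta;
  const vw = winner.SD ** 2 + p.tau ** 2; // tau inflates variance first
  const vl = loser.SD ** 2 + p.tau ** 2;
  const c2 = 2 * p.beta ** 2 + vw + vl;
  const c = Math.sqrt(c2);
  const d = (winner.mean - loser.mean - eps) / c;
  const v = pdf(d) / cdf(d); // mean correction factor
  const w = v * (v + d);     // variance correction factor
  return {
    winner: { mean: winner.mean + (vw / c) * v, SD: Math.sqrt(vw * (1 - (vw / c2) * w)) },
    loser: { mean: loser.mean - (vl / c) * v, SD: Math.sqrt(vl * (1 - (vl / c2) * w)) }
  };
}

// Sum of squared gaps between predicted and observed post-game ratings.
function fitError(games, p) {
  let err = 0;
  for (const g of games) {
    const pred = trueskillUpdate(g.pre.winner, g.pre.loser, p);
    for (const who of ["winner", "loser"]) {
      err += (pred[who].mean - g.post[who].mean) ** 2;
      err += (pred[who].SD - g.post[who].SD) ** 2;
    }
  }
  return err;
}

// Stand-in for a real optimizer: try every candidate beta, keep the best.
function fitBeta(games, fixed, candidates) {
  let best = null;
  for (const beta of candidates) {
    const err = fitError(games, { ...fixed, beta });
    if (best === null || err < best.err) best = { beta, err };
  }
  return best;
}

// Made-up game: the "observed" result is generated from known parameters.
const truth = { beta: 1375, tau: 27.5, drawProb: 0.05 };
const pre = {
  winner: { mean: 6801.1, SD: 263.1 },
  loser: { mean: 6500, SD: 300 } // invented opponent
};
const games = [{ pre, post: trueskillUpdate(pre.winner, pre.loser, truth) }];
const candidates = [];
for (let b = 1000; b <= 2000; b += 25) candidates.push(b);
const best = fitBeta(games, { tau: 27.5, drawProb: 0.05 }, candidates);
console.log(best);
```

The same update also shows how a rating can drop after a win: when a heavy favourite beats a much weaker player, μ barely moves while τ still inflates σ, so μ-2σ can dip slightly.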
As for μ₀ and σ₀, we can run getRating() on a brand new player, but it turns out that it doesn't actually return anything until you've played at least one game. That's okay... the same optimization approach can be inverted to find μ₀ and σ₀. Just play a game against a brand new player, note your rating change, and plug in the β, τ, and ε that we already found.
Estimating initialization paramters: mu0, sigma0 using Game 2:
Finished.
Error-minimizing TrueSkill Parameters:
mu0: 5500.00
sigma0: 2250.00
Residual Error: 0.0000
Those are exactly the values that Mr. Griggs told us in Jan 2013, which makes it plausible that Goko has been using TS with these same parameters since day one.
Just to be sure, we should test the parameters on another game:
Testing Parameters using Game 3:
Expected post-game ratings:
A: 6822.35 +/- 262.66
B: 7074.08 +/- 266.68
Observed post-game ratings:
A: 6822.35 +/- 262.66
B: 7074.08 +/- 266.68
Residual Error: 0.0000
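The residual can be recomputed from the numbers printed above. This is a sketch under one assumption: the error metric here is a root-sum-square over every μ and σ, which may not be the exact metric my script used. The function name is hypothetical.

```javascript
// Root-sum-square gap between predicted and observed ratings.
// The metric is an assumption; any sensible distance gives 0 here.
function residualError(expected, observed) {
  let sum = 0;
  for (const name of Object.keys(expected)) {
    sum += (expected[name].mean - observed[name].mean) ** 2;
    sum += (expected[name].SD - observed[name].SD) ** 2;
  }
  return Math.sqrt(sum);
}

// Values copied from the Game 3 output above.
const expected = { A: { mean: 6822.35, SD: 262.66 }, B: { mean: 7074.08, SD: 266.68 } };
const observed = { A: { mean: 6822.35, SD: 262.66 }, B: { mean: 7074.08, SD: 266.68 } };
console.log(residualError(expected, observed).toFixed(4)); // 0.0000
```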
Incidentally, have you ever looked at
the original photo for that meme? I'm pretty sure the kid is eating sand.
Don't miss our final installment...
Goko's Rating System, Part 3: Goko vs. Isotropish!