I hereby declare a ceasefire. I can anticipate the next three or four posts and I doubt that any of them will be particularly helpful. Let's keep this thread focused.
There seem to be three reasonable options:
- Sort by mu with a cutoff based on variance or number of games
- Sort by mu-k*sigma for some k between 1 and 3
- Implement the isotropic leaderboard's algorithm and be done with it
I'm pretty sure that nobody wants the fourth option, sorting by mu with no cutoff.
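For concreteness, the first two options might look something like this (a sketch with made-up field names, not code from any of the sites mentioned):

```python
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    mu: float      # estimated skill
    sigma: float   # uncertainty in that estimate
    games: int

def by_mu_with_cutoff(players, min_games=20):
    """Option 1: sort by mu, hiding players with too few games.
    The min_games threshold is an arbitrary illustrative value."""
    return sorted((p for p in players if p.games >= min_games),
                  key=lambda p: -p.mu)

def by_conservative(players, k=1):
    """Option 2: sort by mu - k*sigma for some k."""
    return sorted(players, key=lambda p: -(p.mu - k * p.sigma))
```

Under option 2 with k=3, a veteran at mu=30, sigma=1 (conservative rating 27) outranks a lucky newcomer at mu=35, sigma=8 (conservative rating 11), even though the newcomer's raw mu is higher.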
If I understand correctly, the purpose of sorting by mu-k*sigma is the same as that of implementing a cutoff. In both cases, the goal is to keep the top of the leaderboard from being filled up by mediocre players who have been lucky in a small number of games. Either option deviates from a rating system's one truly objective goal: estimating the probability that any given player beats another.
Microsoft Research appears to advocate the mu-k*sigma approach, but they don't take a strong stance on what k should be. Using any k>0 means sorting players by a deliberate underestimate of their actual skill, and the size of that underestimate grows with k. With k=3, a player's rank derives from the skill level that we're about 99.9% confident is below their actual skill. To me, that seems a little excessive and possibly unfair to new players. This is what the leaderboard on drunkensailor.org is doing now, and it seems to be what Goko does as well.
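The confidence attached to a given k is just the standard normal CDF evaluated at k, since the skill belief is Gaussian. A quick check in plain Python (nothing site-specific here):

```python
import math

def confidence(k):
    """One-sided confidence that true skill exceeds mu - k*sigma,
    assuming a Gaussian belief over skill: Phi(k)."""
    return 0.5 * (1 + math.erf(k / math.sqrt(2)))

for k in (1, 2, 3):
    print(f"k={k}: {confidence(k):.4f}")
# k=1: 0.8413, k=2: 0.9772, k=3: 0.9987
```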
Isotropic used k=1. In other words, a player's rank derived from the skill level that was 84% certain to be below their actual skill level. This is still conservative, but not nearly as brutal to new/lucky players as mu-3*sigma. Iso also used an unusually high starting uncertainty: sigma=mu instead of sigma=mu/3. I'm not sure what the motivation for this was, but it explains why Iso had "levels" as high as 53 and as low as -35, while mine runs from 29 to -3.
Finally, note that sigma appears to converge to about 0.80 with my standard TrueSkill implementation. A great many of the experienced players have sigmas between 0.79 and 0.82, and the lowest sigma in all of Goko is 0.78. On Iso, uncertainties never seem to have gotten below 6.5, and they didn't converge nearly as uniformly.
None of this makes sense to me. Intuitively, I would have expected uncertainties to converge asymptotically to zero. I also wouldn't have expected my uncertainties to converge any more uniformly than Iso's did. Are these anomalies evidence of a failure in TrueSkill, in my parameters, in my code, or in my intuition?
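For what it's worth, one candidate explanation for the floor is the dynamics parameter tau: standard TrueSkill inflates each player's variance by tau^2 before every game, so sigma settles where that inflation balances the shrinkage from the update rather than going to zero. Here's a crude simulation of repeated 1v1 games between two evenly matched players, using the standard win/loss update equations and the usual default parameters (mu=25, sigma=25/3, beta=25/6, tau=25/300); this is a sketch, not my actual leaderboard code:

```python
import math

def pdf(x): return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
def cdf(x): return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def update_1v1(winner, loser, beta, tau):
    """One TrueSkill win/loss update (no draws). Players are (mu, sigma^2)."""
    mu_w, s2_w = winner
    mu_l, s2_l = loser
    # Dynamics noise added before each game -- this is what keeps sigma > 0.
    s2_w += tau * tau
    s2_l += tau * tau
    c2 = 2 * beta * beta + s2_w + s2_l
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)
    w = v * (v + t)
    return ((mu_w + s2_w / c * v, s2_w * (1 - s2_w / c2 * w)),
            (mu_l - s2_l / c * v, s2_l * (1 - s2_l / c2 * w)))

mu0, sigma0 = 25.0, 25.0 / 3
beta, tau = 25.0 / 6, 25.0 / 300
a = (mu0, sigma0 ** 2)
b = (mu0, sigma0 ** 2)
for game in range(5000):
    # Alternate wins so the two players stay evenly matched.
    if game % 2 == 0:
        a, b = update_1v1(a, b, beta, tau)
    else:
        b, a = update_1v1(b, a, beta, tau)
print(math.sqrt(a[1]))  # stalls near 0.8 instead of going to 0
```

Setting tau=0 in the same simulation lets sigma keep shrinking toward zero (roughly like 1/sqrt(n)), which matches the intuition; with the default tau it stalls near 0.8, which at least resembles the 0.78-0.82 cluster above. If the leaderboard uses a fixed tau for everyone, a uniform floor would also explain why my sigmas converge more uniformly than Iso's did.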