Thanks all for your help. I think I've got the iso TS algorithm now, or at least a close and reasonable variant.
Yed covered most of it but just to clarify, at the end of every day, the sigma parameter for each player is adjusted by the gamma parameter. Gamma is the same for every player, every day and is equal to σ0 / 100 or 0.08333...
This makes sense mathematically, it's consistent with other TS descriptions, and it appears to be what dougz's code does.
I had thought that it was just added to the end-of-day sigma, but it seems it's figured in using the equation sigma_new = sqrt(sigma_old^2 + gamma^2). Actually, that way makes more sense, as it gives diminishing returns on how fast sigma increases.
This is a plausible algorithm, but I don't see it in dougz's code or in his description, nor in others' TS descriptions or implementations. The term (σ0^2 + γ^2) does make an appearance in the ordinary TS algorithm (what Dangauthier et al. call "Vanilla" TrueSkill), but it's only in the per-game prior for each player at the top of the factor graph. That affects the post-game values of σ, but only after passing through the whole factor graph and the Bayesian update from the game result.
It's possible that dougz was doing this after each day's results as a modification of TS, though again I don't think that's what the description says. In any case, I just don't like it all that much. I suppose I'll include it if dougz says that it's what isotropic was using and/or if it gives a big boost to predictive performance.
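For concreteness, here is a minimal sketch of the end-of-day adjustment from the quoted equation, sigma_new = sqrt(sigma_old^2 + gamma^2). It illustrates that variant only and is not taken from dougz's code. The function name is made up, and σ0 = 25/3 is an assumption chosen to match the quoted γ = 0.08333.

```python
import math

def inflate_sigma(sigma_old, gamma, days=1):
    """Apply sigma_new = sqrt(sigma_old^2 + gamma^2) once per day.

    Repeating the update n times adds n * gamma^2 to the variance,
    so the closed form below equals n successive daily updates.
    """
    return math.sqrt(sigma_old ** 2 + days * gamma ** 2)

# Assumed values: sigma0 = 25/3 so that gamma = sigma0 / 100 = 0.08333...
SIGMA0 = 25.0 / 3.0
GAMMA = SIGMA0 / 100.0

# The variance grows by the same amount every day, so the bump in sigma
# itself shrinks as sigma gets larger.
for sigma in (1.0, 3.0, SIGMA0):
    bump = inflate_sigma(sigma, GAMMA) - sigma
    print(f"sigma={sigma:.4f}  one-day increase={bump:.6f}")
```

The contrast with plain addition (sigma_old + gamma) is that the increase here depends on the current sigma: a player who is already very uncertain barely moves, while a well-established player's sigma grows comparatively faster.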
-------
I ran sublee's TS implementation on the Goko data with μ0=σ0=25, β=σ0/2, τ=σ0/100, and ε such that the draw probability is 5%. The parameters that generate my current leaderboard are μ0=25, σ0=μ0/3, β=σ0/2, τ=σ0/100, and ε such that the draw probability is 1.75%. Using μ0=σ0 is a little unusual, and 1.75% is the draw probability in my dataset, but these differences don't seem to matter much for either the ordering or the predictive performance (see below).
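As a rough sketch of how the two parameter sets relate, the snippet below derives β, τ, and ε from μ0, σ0, and a draw probability. It assumes the usual TrueSkill relation between draw probability and draw margin, ε = Φ⁻¹((p + 1) / 2) · sqrt(n) · β with n the total number of players in the game, and it assumes two-player games. The helper names are hypothetical, and this is not sublee's code.

```python
import math
from statistics import NormalDist

def draw_margin(draw_probability, beta, total_players=2):
    """Draw margin (epsilon) implied by a draw probability.

    Assumed relation: epsilon = ppf((p + 1) / 2) * sqrt(n) * beta.
    """
    ppf = NormalDist().inv_cdf
    return ppf((draw_probability + 1.0) / 2.0) * math.sqrt(total_players) * beta

def make_params(mu0, sigma0, draw_probability):
    """Bundle the parameters named in the post; beta and tau follow from sigma0."""
    beta = sigma0 / 2.0
    tau = sigma0 / 100.0
    return {
        "mu0": mu0,
        "sigma0": sigma0,
        "beta": beta,
        "tau": tau,
        "epsilon": draw_margin(draw_probability, beta),
    }

# The two parameter sets described above.
might_be_iso = make_params(mu0=25.0, sigma0=25.0, draw_probability=0.05)
current_board = make_params(mu0=25.0, sigma0=25.0 / 3.0, draw_probability=0.0175)

for name, params in (("might-be-iso", might_be_iso), ("current", current_board)):
    print(name, {key: round(value, 4) for key, value in params.items()})
```

Because β and τ are both fixed fractions of σ0, tripling σ0 rescales the whole skill axis, which is one reason the ordering would be expected to change little between the two sets.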
Note that what dougz calls τ isn't the "precision" τ := 1/σ^2 but the σ-adjusting term we've been talking about. For reasons I cannot fathom, the TS papers and sublee's code use the symbol τ for both of these values. In dougz's code, he sensibly renames the σ-adjustment term to γ (as in the equation at the top of this post).
The mean binomial deviance I calculated from the Goko sample was basically the same regardless of which parameters I used, 0.379 and 0.373 respectively. So that's not strong evidence that either set of parameters is superior. I don't know what mbd Glicko or Elo would generate on this data set, but I expect it's worse. I also expect that there's plenty of room for improvement here.
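For anyone wanting to run the same comparison with Glicko or Elo, here is a hedged sketch of a mean binomial deviance calculation. The log base and the clipping are assumptions (base 10 with predictions clipped to [0.01, 0.99] is one common convention) and may not match how the figures above were computed. The win-probability formula is the common TrueSkill approximation that ignores the draw margin, and both function names are made up for illustration.

```python
import math
from statistics import NormalDist

def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta):
    """Approximate P(A beats B) from two TrueSkill ratings, ignoring draws."""
    spread = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return NormalDist().cdf((mu_a - mu_b) / spread)

def mean_binomial_deviance(predictions, outcomes, clip=0.01, base=10.0):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all games.

    predictions: predicted probability that the first player wins
    outcomes:    1.0 for a win, 0.0 for a loss, 0.5 for a draw
    clip, base:  assumed conventions, see the note above
    """
    if len(predictions) != len(outcomes) or not predictions:
        raise ValueError("need equal-length, non-empty sequences")
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, clip), 1.0 - clip)  # keep log() finite
        total -= y * math.log(p, base) + (1.0 - y) * math.log(1.0 - p, base)
    return total / len(predictions)

# Lower is better; always predicting 0.5 gives -log10(0.5) under base 10.
baseline = mean_binomial_deviance([0.5, 0.5, 0.5], [1.0, 0.0, 0.5])
print(f"coin-flip baseline: {baseline:.5f}")
```

Predictions should be made from each player's rating before the game is processed, so the score reflects out-of-sample performance rather than fit.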
If I assume that isotropic was displaying μ +/- 3σ as the skill range and also using that number for the level, then the leaderboard resulting from my might-be-iso parameters looks a fair amount like the iso leaderboard, and it's ordered a lot like my current implementation:
Stef - Level 57: 67.64 +/- 10.10 ( 893 games)
Mic Qsenoch - Level 53: 63.87 +/- 10.19 ( 837 games)
LESPEUTERE - Level 51: 62.05 +/- 10.24 ( 960 games)
nomnomnom - Level 50: 64.44 +/- 14.42 ( 118 games)
Rene Kuroi - Level 49: 59.74 +/- 10.65 ( 274 games)
Geronimoo - Level 48: 58.51 +/- 10.18 ( 717 games)
Andrew Iannaccone - Level 48: 58.10 +/- 9.99 (1154 games)
Wandering Winder - Level 48: 58.07 +/- 10.06 ( 854 games)
jog - Level 47: 58.04 +/- 10.09 ( 862 games)
Obi Wan Bonogi - Level 46: 57.10 +/- 10.12 ( 587 games)
Fabian - Level 45: 56.12 +/- 10.27 ( 366 games)
Boodaloo - Level 45: 58.44 +/- 12.94 ( 134 games)
Tao Chen - Level 45: 62.69 +/- 17.44 ( 86 games)
kenyou2859 - Level 45: 55.59 +/- 10.45 ( 394 games)
Rabid - Level 44: 54.98 +/- 10.11 ( 508 games)
SheCantSayNo - Level 44: 54.81 +/- 9.95 (1781 games)
blueblimp - Level 44: 55.34 +/- 10.89 ( 261 games)
HiveMindEmulator - Level 44: 56.90 +/- 12.53 ( 149 games)
eliegel - Level 44: 54.33 +/- 10.30 ( 398 games)
Stealth Tomato - Level 43: 54.07 +/- 10.24 ( 573 games)
shark_bait - Level 43: 54.48 +/- 10.78 ( 243 games)
Jeebus - Level 42: 53.45 +/- 10.54 ( 300 games)
manzi - Level 42: 53.03 +/- 10.18 ( 356 games)
Mike Harris.0001 - Level 42: 53.08 +/- 10.24 ( 423 games)
The major remaining difference is that the 3σ values are higher and more tightly clustered than on the iso board, but that's consistent with the smaller number and variation in number of games played.
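The rows above can be reproduced from μ and σ with a small sketch like the one below. It assumes the level is the floor of μ - 3σ, which is consistent with the listed rows (for example, 67.64 - 10.10 = 57.54 gives Level 57) but is not confirmed as isotropic's exact rule. The function names are hypothetical.

```python
import math

def level(mu, sigma):
    """Assumed level rule: lower end of the mu +/- 3*sigma range, rounded down."""
    return math.floor(mu - 3.0 * sigma)

def format_row(name, mu, sigma, games):
    """Render one leaderboard line in the same layout as the list above."""
    return (f"{name} - Level {level(mu, sigma)}: "
            f"{mu:.2f} +/- {3.0 * sigma:.2f} ({games:4d} games)")

# sigma is recovered from the displayed 3-sigma value for illustration.
print(format_row("Stef", 67.64, 10.10 / 3.0, 893))
print(format_row("Tao Chen", 62.69, 17.44 / 3.0, 86))
```

This also shows why a player like Tao Chen can have a high μ but a middling level: with only 86 games the 3σ penalty is still large.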
Summary:
I find it pretty plausible that isotropic was using "Vanilla" TrueSkill with the parameters above. The difference between those parameters and the ones I would choose doesn't seem large, and I see no real justification for any major deviation from the standard algorithm. Until I hear otherwise from dougz or unless someone else has a compelling argument to the contrary, I'm going to change my leaderboard to use those parameters.