Dominion Strategy Forum


Author Topic: Goko's Rating System, Part 2: Reverse Engineering  (Read 8786 times)


ragingduckd

  • Board Moderator
  • *
  • Offline Offline
  • Posts: 1059
  • Respect: +3527
    • View Profile
Goko's Rating System, Part 2: Reverse Engineering
« on: April 13, 2014, 06:21:08 pm »
+47

Continued from Goko's Rating System, Part 1: ... in a formula!

What we were told:

Back in January 2013, non-CEO Trisha Brooke (not CEO Ted Griggs, as I first wrote) told us that Goko's rating system tracks μ and σ for each player and displays your rating as μ-2σ.  Qvist immediately guessed that they were running TrueSkill, but other forum members were more skeptical.

Mr. Griggs complicated the matter by telling us that they use an "Elo-type rating system."  Ordinary Elo doesn't track σ for each player, so Mr. Griggs had to mean that they use some fancy-pants Elo derivative.  That's bad news for figuring out the system, since it could easily be something obscure or even proprietary.  Then again, maybe Mr. Griggs is a manager, not a mathematician...

What we know:

Regardless of what Mr. Griggs or anyone else tells us, there are a few things we can be sure of:
  • The rating system can handle games with more than two players.
  • The ratings are based on who came in what place, rather than VPs or anything else Dominion-specific.
And unless their responders on getsatisfaction are completely full of crap...
  • The system really does track μ and σ for each player.
  • It really does display some "conservative" measure like μ-kσ.
  • It's actually supposed to be possible for your rating to go down after a win.
Maybe there are dozens of rating systems that fit this description, but TrueSkill is the only one I could find outside of academic publications, let alone download in a ready-made implementation.

Throw in the facts that Goko's devs never seemed expert/mathy enough to have implemented a totally novel system, and that they had the working example of Isotropic in front of them when they developed Goko... and well, it would have been pretty surprising if their system had turned out to be anything other than TS or some modified version.

How to be sure?

If we're really, really lucky, the rating system itself might be implemented client-side.  It's hard to imagine that even Goko developers are that crazy, but given their track record, I figured I'd better check... alas, no such luck.  Fortunately, there's a way to reveal μ and σ at least.  Just invoke the plausibly-named FS.Connection.getRating()... and you get a usage error.  But the error is friendly enough that you can guess at what the right usage might be.

You can query your Pro rating like this:

Code: [Select]
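// conn is assumed to be an FS.Connection instance (the object whose getRating() is described above)
conn.getRating({
    version: 1,
    playerId: mtgRoom.getLocalPlayer().getId(),
    ratingSystemId: mtgRoom.options.ratingSystemPro
}).then(function (resp) {
    // ratingData holds the hidden mean (μ) and SD (σ)
    console.log(resp.ratingData);
});
// >> Object {SD: 263.14332861731685, mean: 6801.097762744435}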

Great!  So my real rating is {μ=6801, σ=263}.  And my displayed rating is 6275, which is indeed μ-2σ.
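
The μ-2σ check is easy to reproduce.  Here's a minimal sketch, assuming the ratingData fields shown above (mean, SD).  The displayedRating name is made up for illustration and isn't part of Goko's client, and rounding to the nearest integer is a guess at how the display works:

```javascript
// Hypothetical helper: Goko's displayed rating as mu - 2*sigma.
// Rounding to the nearest integer is an assumption, not confirmed behavior.
function displayedRating(ratingData) {
  return Math.round(ratingData.mean - 2 * ratingData.SD);
}

// Values copied from the getRating() output above.
const mine = { SD: 263.14332861731685, mean: 6801.097762744435 };
console.log(displayedRating(mine)); // 6275
```

The same formula applied to the new-player values found later in this post (μ0 = 5500, σ0 = 2250) gives 1000.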

Running the same function on an opponent yields his rating too.  So we can get pre-game ratings, plug them into TS or Glicko or any other system and see how its rating changes compare to Goko's.  The only problem is that we have no idea what parameters to use for those systems.

Bringing out the heavy artillery

Back in June 2013, DStu proposed approximating Goko's rating system using numerical optimization.  His idea was to get a good approximation of Goko's system by finding a set of TS parameters that yielded similar results.  Of course, since they turned out to be running exactly TS, the technique actually yields perfect results.  I tried implementing his approach.

TrueSkill has five parameters: μ0 and σ0 for new player ratings and β, τ, and ε for updating ratings after a game.  (The optimizer output below expresses ε, the draw margin, through the draw probability, draw_prob.)  I gathered Pro rating changes from some real games, plugged them into a numerical optimizer, and asked it to find TS parameters to minimize the rating prediction error.

Code: [Select]
Estimating update parameters using Game 1:
Finished.

Error-minimizing TrueSkill Parameters:
beta:      1375.00
tau:         27.50
draw_prob:    0.05

Residual Error: 0.0000

Remember back when you did problems from a textbook, and the right answer was always some nice clean-looking number?  I've missed that.
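
For anyone curious what "find the parameters that minimize the prediction error" looks like in practice, here's a rough sketch.  It is not the optimizer that produced the output above: it's a toy grid search over β only, with τ and ε held fixed, run on a synthetic game whose pre-game ratings are made up.  update() is the textbook two-player TrueSkill update for a decisive (non-draw) game, ε is passed directly as a draw margin, erf uses the Abramowitz-Stegun approximation, and every function name is hypothetical:

```javascript
// Standard normal density and CDF (erf via Abramowitz-Stegun 7.1.26).
function pdf(x) { return Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI); }
function erf(x) {
  const s = x < 0 ? -1 : 1, a = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * a);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                 - 0.284496736) * t + 0.254829592) * t;
  return s * (1 - poly * Math.exp(-a * a));
}
function cdf(x) { return 0.5 * (1 + erf(x / Math.SQRT2)); }

// Textbook two-player TrueSkill update, winner beats loser, no draw.
// p = { beta, tau, eps }, where eps is the draw margin.
function update(winner, loser, p) {
  const vw = winner.SD ** 2 + p.tau ** 2;   // variances after the dynamics step
  const vl = loser.SD ** 2 + p.tau ** 2;
  const c = Math.sqrt(2 * p.beta ** 2 + vw + vl);
  const x = (winner.mean - loser.mean - p.eps) / c;
  const v = pdf(x) / cdf(x);                // mean correction factor
  const w = v * (v + x);                    // variance correction factor
  return {
    winner: { mean: winner.mean + (vw / c) * v,
              SD: Math.sqrt(vw * (1 - (vw / (c * c)) * w)) },
    loser:  { mean: loser.mean - (vl / c) * v,
              SD: Math.sqrt(vl * (1 - (vl / (c * c)) * w)) },
  };
}

// Sum of squared differences over both players' mean and SD.
function sqError(a, b) {
  return (a.winner.mean - b.winner.mean) ** 2 + (a.winner.SD - b.winner.SD) ** 2
       + (a.loser.mean - b.loser.mean) ** 2 + (a.loser.SD - b.loser.SD) ** 2;
}

// Try each candidate beta and keep the one with the smallest error.
function fitBeta(pre, observed, fixed, candidates) {
  let best = null;
  for (const beta of candidates) {
    const err = sqError(update(pre.winner, pre.loser, { ...fixed, beta }), observed);
    if (best === null || err < best.err) best = { beta, err };
  }
  return best;
}

// Synthetic example: made-up pre-game ratings, "observed" result generated
// with beta = 1375 so that the search has a known answer to recover.
const pre = { winner: { mean: 6801.1, SD: 263.1 }, loser: { mean: 7050, SD: 267 } };
const observed = update(pre.winner, pre.loser, { beta: 1375, tau: 27.5, eps: 0 });
const candidates = [];
for (let b = 1000; b <= 2000; b += 25) candidates.push(b);
const fit = fitBeta(pre, observed, { tau: 27.5, eps: 0 }, candidates);
console.log(fit.beta, fit.err); // 1375 0
```

A real fit would search β, τ, and ε jointly over real before/after ratings, and would need the multiplayer and draw cases too.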

As for μ0 and σ0, we can run getRating() on a brand new player, but it turns out that it doesn't actually return anything until you've played at least one game.  That's okay... the same optimization approach can be inverted to find μ0 and σ0.  Just play a game against a brand new player, note your rating change, and plug in the β, τ, and ε that we already found.

Code: [Select]
Estimating initialization paramters: mu0, sigma0 using Game 2:
Finished.

Error-minimizing TrueSkill Parameters:
mu0:       5500.00
sigma0:    2250.00

Residual Error: 0.0000

Those are exactly the values that Trisha Brooke told us in Jan 2013, which makes it plausible that Goko has been using TS with these same parameters since day one.

Just to be sure, we should test the parameters on another game:

Code: [Select]
Testing Parameters using Game 3:

Expected post-game ratings:
 A: 6822.35 +/- 262.66
 B: 7074.08 +/- 266.68
Observed post-game ratings:
 A: 6822.35 +/- 262.66
 B: 7074.08 +/- 266.68

Residual Error: 0.0000
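
As a sketch of what that residual could be: the snippet below takes the root-mean-square difference over every mean and SD.  That's one plausible definition, assumed here for illustration; the exact metric behind the output above isn't shown, and residualError is a made-up name.

```javascript
// Ratings copied from the Game 3 output above.
const expected = [
  { mean: 6822.35, SD: 262.66 },
  { mean: 7074.08, SD: 266.68 },
];
const observedGame3 = [
  { mean: 6822.35, SD: 262.66 },
  { mean: 7074.08, SD: 266.68 },
];

// Root-mean-square difference over all means and SDs (assumed metric).
function residualError(pred, obs) {
  let sum = 0;
  for (let i = 0; i < pred.length; i++) {
    sum += (pred[i].mean - obs[i].mean) ** 2 + (pred[i].SD - obs[i].SD) ** 2;
  }
  return Math.sqrt(sum / (2 * pred.length));
}

console.log(residualError(expected, observedGame3).toFixed(4)); // "0.0000"
```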



Incidentally, have you ever looked at the original photo for that meme?  I'm pretty sure the kid is eating sand.

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!
« Last Edit: April 14, 2014, 10:51:15 pm by ragingduckd »
Logged
Salvager Extension | Isotropish Leaderboard | Game Data | Log Search & other toys | Salvager Bug Reports

Salvager not working for me at all today. ... Please help! I can't go back to playing without it like an animal!

Qvist

  • Mountebank
  • *****
  • Offline Offline
  • Posts: 2400
  • Shuffle iT Username: Qvist
  • Respect: +4085
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #1 on: April 14, 2014, 03:06:21 am »
+24

Qvist immediately guessed that they were running TrueSkill, but other forums members were more skeptical.

I expect apologies in form of upvotes.  :P

AI, thanks for doing this. This analysis is really amazing and worth a lot. I'm looking forward to part 3.

DStu

  • Margrave
  • *****
  • Offline Offline
  • Posts: 2627
  • Respect: +1490
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #2 on: April 14, 2014, 04:06:04 am »
0

Qvist immediately guessed that they were running TrueSkill, but other forums members were more skeptical.

I expect apologies in form of upvotes.  :P

AI, thanks for doing this. This analysis is really amazing and worth a lot. I'm looking forward to part 3.
Can't remember if I was sceptical, but on the other hand I also can't remember having proposed reverse engineering, so just have an upvote...
Logged

Holger

  • Minion
  • *****
  • Offline Offline
  • Posts: 741
  • Respect: +466
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #3 on: April 14, 2014, 12:22:34 pm »
+1

Great job on cracking Goko's system! Congratulations :D


Back in January 2013, CEO Ted Griggs told us that Goko's rating system tracks μ and σ for each player and displays your rating as μ-2σ.  Qvist immediately guessed that they were running TrueSkill, but other forums members were more skeptical.

[...]
Code: [Select]
Error-minimizing TrueSkill Parameters:
mu0:       5500.00
sigma0:    2250.00

Residual Error: 0.0000

Those are exactly the values that Mr. Griggs told us in Jan 2013, which makes it plausible that Goko has been using TS with these same parameters since day one.
Just two nit-pickings: It was trisha_brooke, not Mr. Griggs, who told us the μ and σ details in the linked post (unless Griggs is the "engineer" she referred to).
Logged

ragingduckd

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #4 on: April 14, 2014, 12:27:57 pm »
0

Great job on cracking Goko's system! Congratulations :D


Back in January 2013, CEO Ted Griggs told us that Goko's rating system tracks μ and σ for each player and displays your rating as μ-2σ.  Qvist immediately guessed that they were running TrueSkill, but other forums members were more skeptical.

[...]
Code: [Select]
Error-minimizing TrueSkill Parameters:
mu0:       5500.00
sigma0:    2250.00

Residual Error: 0.0000

Those are exactly the values that Mr. Griggs told us in Jan 2013, which makes it plausible that Goko has been using TS with these same parameters since day one.
Just two nit-pickings: It was trisha_brooke, not Mr.Griggs, who told us the μ and σ details in the linked post (unless Griggs is the "engineer" she referred to).

Ah.  Yes, I think you're right.  I had assumed it was all part of the Q&A.

What is your second nit-pick?  ;)
Logged

Holger

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #5 on: April 14, 2014, 12:55:51 pm »
0

Just two nit-pickings: It was trisha_brooke, not Mr.Griggs, who told us the μ and σ details in the linked post (unless Griggs is the "engineer" she referred to).

Ah.  Yes, I think you're right.  I had assumed it was all part of the Q&A.

What is your second nit-pick?  ;)

There's no other, just that it should twice read Trisha instead of Griggs.  ;)
Logged

Donald X.

  • Dominion Designer
  • *****
  • Offline Offline
  • Posts: 6364
  • Respect: +25699
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #6 on: April 15, 2014, 05:50:37 pm »
+4

You know if in the end you have a convincing argument for it working some other way, and you pass this on to Making Fun, they may very well change it. They are not going to have a lot invested in however it works; it just needs to satisfy the people that care about it.
Logged

Polk5440

  • Torturer
  • *****
  • Offline Offline
  • Posts: 1708
  • Respect: +1788
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #7 on: April 15, 2014, 06:16:34 pm »
+5

You know if in the end you have a convincing argument for it working some other way, and you pass this on to Making Fun, they may very well change it. They are not going to have a lot invested in however it works; it just needs to satisfy the people that care about it.

A challenge! Pick the optimal TrueSkill parameters.
Logged

ragingduckd

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #8 on: April 17, 2014, 01:57:35 am »
+3

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

I'm sorry for slow-rolling you guys.  Part 3 is coming soon.  I was delayed by taxes and other aspects of real life.
Logged

WalrusMcFishSr

  • Minion
  • *****
  • Offline Offline
  • Posts: 642
  • An enormous walrus the size of Antarctica
  • Respect: +1793
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #9 on: April 17, 2014, 08:28:05 am »
+6

I could imagine that reverse-engineering the IRS takes rather more effort than this
Logged
My Dominion videos: http://www.youtube.com/user/WalrusMcFishSr   <---Bet you can't click on that!

Kirian

  • Adventurer
  • ******
  • Offline Offline
  • Posts: 7096
  • Shuffle iT Username: Kirian
  • An Unbalanced Equation
  • Respect: +9412
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #10 on: May 21, 2014, 10:23:57 pm »
+2

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

I'm sorry for slow-rolling you guys.  Part 3 is coming soon.  I was delayed by taxes and other aspects of real life.

Tick tick tick...
Logged
Kirian's Law of f.DS jokes:  Any sufficiently unexplained joke is indistinguishable from serious conversation.

michaeljb

  • Board Moderator
  • *
  • Offline Offline
  • Posts: 1422
  • Shuffle iT Username: michaeljb
  • Respect: +2114
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #11 on: May 21, 2014, 10:59:25 pm »
+15

Logged
🚂 Give 18xx games a chance 🚂

michaeljb

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #12 on: June 24, 2014, 03:02:51 pm »
0

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

http://daycalc.appspot.com/04/13/2014
Logged

JW

  • Jester
  • *****
  • Offline Offline
  • Posts: 979
  • Shuffle iT Username: JW
  • Respect: +1792
    • View Profile
Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #13 on: July 11, 2014, 01:23:32 pm »
+2

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

The recent issue with games quitting on the load screen (leading to Goko ranking loss, but not Isotropish rating loss) and the bots being at the top of the pro leaderboard should make it clear that Isotropish is better. Though it would still be interesting to see data prior to those issues just to see if Goko does halfway decently.
Logged

Holger

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #14 on: September 10, 2014, 09:06:13 am »
+2

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

The recent issue with games quitting on the load screen (leading to Goko ranking loss, but not Isotropish rating loss) and the bots being at the top of the pro leaderboard should make it clear that Isotropish is better. Though it would still be interesting to see data prior to those issues just to see if Goko does halfway decently.

The bots' leaderboard mess-up was a recent temporary (?) bug, not an integral part of Goko's rating system. Of course Goko mustn't count stacked Adventure games for the Pro rating...  :o

I also hope that AI will still get around to posting "Part 3". I'm not quite certain if Goko's ranking (minus the bugs) is worse than Isotropish's; Isotropish uses an extremely high initial uncertainty, making the board very conservative wrt new players...
Logged

ragingduckd

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #15 on: September 10, 2014, 09:31:43 am »
+2

I also hope that AI will still get around to posting "Part 3". I'm not quite certain if Goko's ranking (minus the bugs) is worse than Isotropish's; Isotropish uses an extremely high initial uncertainty, making the board very conservative wrt new players...

A couple-hundred games tends to drop a new player into the normal uncertainty range.  For example, here are the top active players with fewer than 300 games.  Most veteran players end up with an uncertainty (3*sigma) around 10.

      pname      |   mu    | 3*sigma  | numgames |   level
-----------------+---------+----------+----------+-----------
 awall           | 61.3037 |  15.2337 |      129 |   46.0700
 Marin           | 54.5226 |  11.6589 |      173 |   42.8635
 Jean-Michel     | 50.8940 |  10.7637 |      249 |   40.1303
 DG              | 59.1405 |  19.8219 |       46 |   39.3186
 Holger          | 50.2388 |  11.5032 |      236 |   38.7356
 Drab Emordnilap | 49.7575 |  11.7360 |      172 |   38.0214
 Wisper          | 47.9380 |  11.4375 |      191 |   36.5004
 loppo           | 45.3395 |  10.4670 |      295 |   34.8726
 Young Nick      | 45.6504 |  10.8579 |      224 |   34.7927
 GeoLib          | 44.7361 |  10.9362 |      210 |   33.7999
 Käkkäräfasaani  | 46.4833 |  12.8319 |      133 |   33.6513
 Shinigami       | 43.3002 |  10.4169 |      298 |   32.8833
 Madman          | 44.2269 |  11.6328 |      201 |   32.5940
 Simon (DK)      | 45.5164 |  14.3709 |      104 |   31.1455
 SawneyBean      | 41.6755 |  10.7028 |      247 |   30.9726
 Yaju            | 50.4640 |  19.9638 |       49 |   30.5000
 mpsprs          | 43.9237 |  13.5030 |      112 |   30.4206
 BeeeeJ          | 41.3244 |  10.9389 |      227 |   30.3855
 MrFrog          | 41.3698 |  11.4582 |      216 |   29.9117
 Trevor Pasanen  | 41.6197 |  11.9886 |      166 |   29.6312
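
Reading the columns above, the level appears to be mu minus the 3*sigma uncertainty.  That's inferred from the table rather than stated outright, so treat it as an assumption; the level function name is made up.  A quick check on a few rows, which should agree to within rounding of the displayed digits:

```javascript
// A few rows copied from the table above: [pname, mu, 3*sigma, listed level].
const rows = [
  ["awall", 61.3037, 15.2337, 46.0700],
  ["DG", 59.1405, 19.8219, 39.3186],
  ["Holger", 50.2388, 11.5032, 38.7356],
];

// Assumed formula: level = mu - 3*sigma.
function level(mu, threeSigma) {
  return mu - threeSigma;
}

// Print the computed level next to the listed one for comparison.
for (const [pname, mu, threeSigma, listed] of rows) {
  console.log(pname, level(mu, threeSigma).toFixed(4), listed.toFixed(4));
}
```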


Also bear in mind that brand new players often really are changing their skill level, which affects how quickly their uncertainty drops.

IMO, Isotropish's bigger weakness is how long it takes to figure out that a veteran player is improving, rather than just having a lucky streak.  More on this shortly... I really am going to post part 3.  I've been sitting on a near-finished version for quite a while.
« Last Edit: September 10, 2014, 09:35:31 am by ragingduckd »
Logged

Polk5440

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #16 on: September 10, 2014, 04:12:49 pm »
+1

More on this shortly... I really am going to post part 3.  I've been sitting on a near-finished version for quite a while.

Are you writing a paper out of it or something? Might not be a bad idea....
Logged

Holger

Re: Goko's Rating System, Part 2: Reverse Engineering
« Reply #17 on: September 11, 2014, 08:19:44 am »
0

I also hope that AI will still get around to posting "Part 3". I'm not quite certain if Goko's ranking (minus the bugs) is worse than Isotropish's; Isotropish uses an extremely high initial uncertainty, making the board very conservative wrt new players...

A couple-hundred games tends to drop a new player into the normal uncertainty range.  For example, here are the top active players with fewer than 300 games. 

Yeah, I'm one of them. :)  I would prefer to see "normal" uncertainties after only a few dozen games, not hundreds. This way a relatively new player could have a rating closely corresponding to his (often changing) skill, and not just have an "automatic" rating increase for playing lots of games until he reaches a hundred games.  (TrueSkill does account for changing skills by increasing the uncertainty slightly after each game; and a new online player may well be a veteran RL player.) As a starting value for new players, probably something in the middle between Isotropish's leaderboard level -75 and Goko's 1,000 rating would be best...

Quote
IMO, Isotropish's bigger weakness is how long it takes to figure out that a veteran player is improving, rather than just having a lucky streak.  More on this shortly... I really am going to post part 3.  I've been sitting on a near-finished version for quite a while.

There's no need to fine-tune it endlessly before posting; there's an edit button after all ;).  I'd really like to read even a half-finished analysis; if necessary, you could also split part 3 in two and post only the first half now...
Logged