Topic: Goko's Rating System, Part 2: Reverse Engineering (Read 8786 times)

ragingduckd · « **on:** April 13, 2014, 06:21:08 pm »

Continued from Goko's Rating System, Part 1: ... in a formula!

What we were told:

Back in January 2013, ~~CEO Ted Griggs~~ non-CEO Trisha Brooke told us that Goko's rating system tracks μ and σ for each player and displays your rating as μ-2σ. Qvist immediately guessed that they were running TrueSkill, but other forum members were more skeptical.

Mr. Griggs complicated the matter by also telling us that they use an "Elo-type rating system." Ordinary Elo doesn't track σ for each player, so Mr. Griggs had to mean that they use some fancy-pants Elo derivative. That's bad news for figuring out the system, since it could easily be something obscure or even proprietary. Then again, maybe Mr. Griggs is a manager, not a mathematician...

What we know:

Regardless of what Mr. Griggs or anyone else tells us, there are a few things we can be sure of:

The rating system can handle games with more than two players.
The ratings are based on who came in what place, rather than VPs or anything else Dominion-specific.

And unless their responders on getsatisfaction are completely full of crap...

The system really does track μ and σ for each player.
It really does display some "conservative" measure like μ-kσ.
It's actually supposed to be possible for your rating to go down after a win.

Maybe there are dozens of rating systems that fit this description, but TrueSkill is the only one I could find outside of academic publications, let alone download in a ready-made implementation.

Throw in the facts that Goko's devs never seemed expert/mathy enough to have implemented a totally novel system, and that they had the working example of Isotropic in front of them when they developed Goko... and well, it would have been pretty surprising if their system had turned out to be anything other than TS or some modified version.

How to be sure?

If we're really, really lucky, the rating system itself might be implemented client-side. It's hard to imagine that even Goko developers are that crazy, but given their track record, I figured I'd better check... alas, no such luck. Fortunately, there's a way to reveal μ and σ at least. Just invoke the plausibly-named FS.Connection.getRating()... and you get a usage error. But the error is friendly enough that you can guess at what the right usage might be.

You can query your Pro rating like this:

Code: [Select]

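// conn is assumed to be the client's FS.Connection instance.
// Ask the server for the local player's Pro rating.
conn.getRating({
    version: 1,
    playerId: mtgRoom.getLocalPlayer().getId(),
    ratingSystemId: mtgRoom.options.ratingSystemPro
}).then(function (resp) {
    // ratingData holds the hidden rating: mean (μ) and SD (σ)
    console.log(resp.ratingData);
});
>> Object {SD: 263.14332861731685, mean: 6801.097762744435}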

Great! So my real rating is {μ=6801, σ=263}. And my displayed rating is 6275, which is indeed μ-2σ.
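As a quick check of that arithmetic, here is a minimal sketch that recomputes the displayed rating from the ratingData object above. The function name displayedRating and its k parameter are made up for illustration; only the mean and SD fields come from the server response.

```javascript
// Conservative "displayed" rating: mean minus k standard deviations.
// k = 2 is the multiplier reported for Goko in this thread.
function displayedRating(ratingData, k = 2) {
  return ratingData.mean - k * ratingData.SD;
}

// The ratingData object logged by the query above.
const mine = { SD: 263.14332861731685, mean: 6801.097762744435 };

console.log(Math.round(displayedRating(mine))); // 6275
```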

Running the same function on an opponent yields his rating too. So we can get pre-game ratings, plug them into TS or Glicko or any other system and see how its rating changes compare to Goko's. The only problem is that we have no idea what parameters to use for those systems.
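To snapshot several players' ratings before a game, the same query can be wrapped in a small helper. This is only a sketch: fetchRatings is a hypothetical name, and it assumes that getRating accepts any player's id in the same way it accepts the local player's, as described above.

```javascript
// Hypothetical helper: look up {mu, sigma} for several players at once.
// `conn` and `ratingSystemId` are the same objects used in the query above;
// `playerIds` is an array of player ids obtained elsewhere.
function fetchRatings(conn, ratingSystemId, playerIds) {
  return Promise.all(playerIds.map(function (playerId) {
    return conn.getRating({
      version: 1,
      playerId: playerId,
      ratingSystemId: ratingSystemId
    }).then(function (resp) {
      return {
        playerId: playerId,
        mu: resp.ratingData.mean,
        sigma: resp.ratingData.SD
      };
    });
  }));
}

// Usage inside the Goko client (opponentId is a placeholder for an id
// you have looked up yourself):
// fetchRatings(conn, mtgRoom.options.ratingSystemPro,
//              [mtgRoom.getLocalPlayer().getId(), opponentId])
//   .then(function (ratings) { console.log(ratings); });
```

Calling it once before a game and once after gives the pre-game and post-game pairs needed for the comparison.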

Bringing out the heavy artillery

Back in June 2013, DStu proposed approximating Goko's rating system using numerical optimization. His idea was to get a good approximation of Goko's system by finding a set of TS parameters that yielded similar results. Of course, since they turned out to be running exactly TS, the technique actually yields perfect results. I tried implementing his approach.

TrueSkill has five parameters: μ₀ and σ₀ for new player ratings and β, τ, and ε for updating ratings after a game (ε, the draw margin, is set through a draw probability, which is why the output below lists draw_prob). I gathered Pro rating changes from some real games, plugged them into a numerical optimizer, and asked it to find TS parameters to minimize the rating prediction error.

Code: [Select]

Estimating update parameters using Game 1:
Finished.

Error-minimizing TrueSkill Parameters:
beta:      1375.00
tau:         27.50
draw_prob:    0.05

Residual Error: 0.0000

Remember back when you did problems from a textbook, and the right answer was always some nice clean-looking number? I've missed that.
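For anyone who wants to try the fitting step themselves, here is a rough sketch of the function an optimizer would minimize. It is not the script used for the output above. It uses the textbook closed-form TrueSkill update for a two-player game with a decisive result, a low-precision erf approximation, and a synthetic game whose pre-game ratings are invented. All function and field names are illustrative.

```javascript
// Standard normal density and distribution function.
function pdf(x) { return Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI); }
function erf(x) {
  // Abramowitz & Stegun 7.1.26; absolute error around 1.5e-7.
  const s = x < 0 ? -1 : 1, a = Math.abs(x), t = 1 / (1 + 0.3275911 * a);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return s * (1 - poly * Math.exp(-a * a));
}
function cdf(x) { return 0.5 * (1 + erf(x / Math.SQRT2)); }

// Invert cdf by bisection (cdf is increasing); good enough for a sketch.
function inverseCdf(p) {
  let lo = -10, hi = 10;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (cdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// One two-player game with a winner and a loser; ratings are {mu, sigma}.
function updateTwoPlayer(winner, loser, params) {
  const { beta, tau, drawProb } = params;
  // Draw margin for two players, derived from the draw probability.
  const eps = inverseCdf((drawProb + 1) / 2) * Math.SQRT2 * beta;
  const vw = winner.sigma ** 2 + tau ** 2; // variances after the dynamics step
  const vl = loser.sigma ** 2 + tau ** 2;
  const c = Math.sqrt(2 * beta ** 2 + vw + vl);
  const d = (winner.mu - loser.mu - eps) / c;
  const v = pdf(d) / cdf(d);
  const w = v * (v + d);
  return {
    winner: { mu: winner.mu + (vw / c) * v,
              sigma: Math.sqrt(vw * (1 - (vw / (c * c)) * w)) },
    loser: { mu: loser.mu - (vl / c) * v,
             sigma: Math.sqrt(vl * (1 - (vl / (c * c)) * w)) }
  };
}

// Sum of squared differences between predicted and observed post-game ratings.
function predictionError(params, game) {
  const p = updateTwoPlayer(game.preWinner, game.preLoser, params);
  return (p.winner.mu - game.postWinner.mu) ** 2
       + (p.winner.sigma - game.postWinner.sigma) ** 2
       + (p.loser.mu - game.postLoser.mu) ** 2
       + (p.loser.sigma - game.postLoser.sigma) ** 2;
}

// Synthetic round trip: generate a game from known parameters, then score guesses.
const truth = { beta: 1375, tau: 27.5, drawProb: 0.05 };
const pre = { preWinner: { mu: 6800, sigma: 260 }, preLoser: { mu: 7100, sigma: 270 } };
const post = updateTwoPlayer(pre.preWinner, pre.preLoser, truth);
const game = { ...pre, postWinner: post.winner, postLoser: post.loser };

console.log(predictionError(truth, game));                        // 0
console.log(predictionError({ ...truth, beta: 1500 }, game) > 0); // true
```

Any general-purpose minimizer can then search over beta, tau, and drawProb for the lowest predictionError. To match real data to four decimal places you would want a more accurate erf than this one.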

As for μ₀ and σ₀, we can run getRating() on a brand new player, but it turns out that it doesn't actually return anything until you've played at least one game. That's okay... the same optimization approach can be inverted to find μ₀ and σ₀. Just play a game against a brand new player, note your rating change, and plug in the β, τ, and ε that we already found.

Code: [Select]

Estimating initialization paramters: mu0, sigma0 using Game 2:
Finished.

Error-minimizing TrueSkill Parameters:
mu0:       5500.00
sigma0:    2250.00

Residual Error: 0.0000

Those are exactly the values that Mr. Griggs told us in Jan 2013, which makes it plausible that Goko has been using TS with these same parameters since day one.

Just to be sure, we should test the parameters on another game:

Code: [Select]

Testing Parameters using Game 3:

Expected post-game ratings:
 A: 6822.35 +/- 262.66
 B: 7074.08 +/- 266.68
Observed post-game ratings:
 A: 6822.35 +/- 262.66
 B: 7074.08 +/- 266.68

Residual Error: 0.0000
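The comparison itself is easy to reproduce. The sketch below recomputes a residual from the Game 3 numbers printed above. The exact error metric is not stated in the output, so the root of the summed squared differences used here is an assumption.

```javascript
// Ratings as [mu, sigma] pairs, copied from the Game 3 output above.
const expected = { A: [6822.35, 262.66], B: [7074.08, 266.68] };
const observed = { A: [6822.35, 262.66], B: [7074.08, 266.68] };

// Assumed error metric: root of the summed squared differences.
function residualError(expected, observed) {
  let sum = 0;
  for (const player of Object.keys(expected)) {
    for (let i = 0; i < expected[player].length; i++) {
      sum += (expected[player][i] - observed[player][i]) ** 2;
    }
  }
  return Math.sqrt(sum);
}

console.log(residualError(expected, observed).toFixed(4)); // 0.0000
```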

Incidentally, have you ever looked at the original photo for that meme? I'm pretty sure the kid is eating sand.

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

Qvist · « **Reply #1 on:** April 14, 2014, 03:06:21 am »

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm

Qvist immediately guessed that they were running TrueSkill, but other forum members were more skeptical.

I expect apologies in the form of upvotes.

AI, thanks for doing this. This analysis is really amazing and worth a lot. I'm looking forward to part 3.

DStu · « **Reply #2 on:** April 14, 2014, 04:06:04 am »

Quote from: Qvist on April 14, 2014, 03:06:21 am

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm
Qvist immediately guessed that they were running TrueSkill, but other forum members were more skeptical.

I expect apologies in the form of upvotes.

AI, thanks for doing this. This analysis is really amazing and worth a lot. I'm looking forward to part 3.

Can't remember if I was sceptical, but on the other hand I also can't remember having proposed reverse engineering, so just have an upvote...

Holger · « **Reply #3 on:** April 14, 2014, 12:22:34 pm »

Great job on cracking Goko's system! Congratulations

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm

Back in January 2013, CEO Ted Griggs told us that Goko's rating system tracks μ and σ for each player and displays your rating as μ-2σ. Qvist immediately guessed that they were running TrueSkill, but other forum members were more skeptical.

[...]
Code: [Select]
Error-minimizing TrueSkill Parameters:
mu0:       5500.00
sigma0:    2250.00

Residual Error: 0.0000
Those are exactly the values that Mr. Griggs told us in Jan 2013, which makes it plausible that Goko has been using TS with these same parameters since day one.

Just two nit-pickings: It was trisha_brooke, not Mr. Griggs, who told us the μ and σ details in the linked post (unless Griggs is the "engineer" she referred to).

ragingduckd · « **Reply #4 on:** April 14, 2014, 12:27:57 pm »

Quote from: Holger on April 14, 2014, 12:22:34 pm

Great job on cracking Goko's system! Congratulations

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm
Back in January 2013, CEO Ted Griggs told us that Goko's rating system tracks μ and σ for each player and displays your rating as μ-2σ. Qvist immediately guessed that they were running TrueSkill, but other forum members were more skeptical.

[...]
Code: [Select]
Error-minimizing TrueSkill Parameters:
mu0:       5500.00
sigma0:    2250.00

Residual Error: 0.0000
Those are exactly the values that Mr. Griggs told us in Jan 2013, which makes it plausible that Goko has been using TS with these same parameters since day one.
Just two nit-pickings: It was trisha_brooke, not Mr. Griggs, who told us the μ and σ details in the linked post (unless Griggs is the "engineer" she referred to).

Ah. Yes, I think you're right. I had assumed it was all part of the Q&A.

What is your second nit-pick?

Holger · « **Reply #5 on:** April 14, 2014, 12:55:51 pm »

Quote from: ragingduckd on April 14, 2014, 12:27:57 pm

Quote from: Holger on April 14, 2014, 12:22:34 pm
Just two nit-pickings: It was trisha_brooke, not Mr. Griggs, who told us the μ and σ details in the linked post (unless Griggs is the "engineer" she referred to).

Ah. Yes, I think you're right. I had assumed it was all part of the Q&A.

What is your second nit-pick?

There's no other, just that it should twice read Trisha instead of Griggs.

Donald X. · « **Reply #6 on:** April 15, 2014, 05:50:37 pm »

You know if in the end you have a convincing argument for it working some other way, and you pass this on to Making Fun, they may very well change it. They are not going to have a lot invested in however it works; it just needs to satisfy the people that care about it.

Polk5440 · « **Reply #7 on:** April 15, 2014, 06:16:34 pm »

Quote from: Donald X. on April 15, 2014, 05:50:37 pm

You know if in the end you have a convincing argument for it working some other way, and you pass this on to Making Fun, they may very well change it. They are not going to have a lot invested in however it works; it just needs to satisfy the people that care about it.

A challenge! Pick the optimal TrueSkill parameters.

ragingduckd · « **Reply #8 on:** April 17, 2014, 01:57:35 am »

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

I'm sorry for slow-rolling you guys. Part 3 is coming soon. I was delayed by taxes and other aspects of real life.

WalrusMcFishSr · « **Reply #9 on:** April 17, 2014, 08:28:05 am »

I could imagine that reverse-engineering the IRS takes rather more effort than this

Kirian · « **Reply #10 on:** May 21, 2014, 10:23:57 pm »

Quote from: ragingduckd on April 17, 2014, 01:57:35 am

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm
Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

I'm sorry for slow-rolling you guys. Part 3 is coming soon. I was delayed by taxes and other aspects of real life.

Tick tick tick...

michaeljb · « **Reply #11 on:** May 21, 2014, 10:59:25 pm »

michaeljb · « **Reply #12 on:** June 24, 2014, 03:02:51 pm »

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

http://daycalc.appspot.com/04/13/2014

JW · « **Reply #13 on:** July 11, 2014, 01:23:32 pm »

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm

Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

The recent issue with games quitting on the load screen (leading to Goko ranking loss, but not Isotropish rating loss) and the bots being at the top of the pro leaderboard should make it clear that Isotropish is better. Though it would still be interesting to see data prior to those issues just to see if Goko does halfway decently.

Holger · « **Reply #14 on:** September 10, 2014, 09:06:13 am »

Quote from: JW on July 11, 2014, 01:23:32 pm

Quote from: ragingduckd on April 13, 2014, 06:21:08 pm
Don't miss our final installment... Goko's Rating System, Part 3: Goko vs. Isotropish!

The recent issue with games quitting on the load screen (leading to Goko ranking loss, but not Isotropish rating loss) and the bots being at the top of the pro leaderboard should make it clear that Isotropish is better. Though it would still be interesting to see data prior to those issues just to see if Goko does halfway decently.

The bots' leaderboard mess-up was a recent temporary (?) bug, not an integral part of Goko's rating system. Of course Goko mustn't count stacked Adventure games for the Pro rating...

I also hope that AI will still get around to posting "Part 3". I'm not quite certain if Goko's ranking (minus the bugs) is worse than Isotropish's; Isotropish uses an extremely high initial uncertainty, making the board very conservative wrt new players...

ragingduckd · « **Reply #15 on:** September 10, 2014, 09:31:43 am »

Quote from: Holger on September 10, 2014, 09:06:13 am

I also hope that AI will still get around to posting "Part 3". I'm not quite certain if Goko's ranking (minus the bugs) is worse than Isotropish's; Isotropish uses an extremely high initial uncertainty, making the board very conservative wrt new players...

A couple-hundred games tends to drop a new player into the normal uncertainty range. For example, here are the top active players with fewer than 300 games. Most veteran players end up with an uncertainty (3*sigma) around 10.

| pname | mu | 3*sigma | numgames | level |
|---|---|---|---|---|
| awall | 61.3037 | 15.2337 | 129 | 46.0700 |
| Marin | 54.5226 | 11.6589 | 173 | 42.8635 |
| Jean-Michel | 50.8940 | 10.7637 | 249 | 40.1303 |
| DG | 59.1405 | 19.8219 | 46 | 39.3186 |
| Holger | 50.2388 | 11.5032 | 236 | 38.7356 |
| Drab Emordnilap | 49.7575 | 11.7360 | 172 | 38.0214 |
| Wisper | 47.9380 | 11.4375 | 191 | 36.5004 |
| loppo | 45.3395 | 10.4670 | 295 | 34.8726 |
| Young Nick | 45.6504 | 10.8579 | 224 | 34.7927 |
| GeoLib | 44.7361 | 10.9362 | 210 | 33.7999 |
| Käkkäräfasaani | 46.4833 | 12.8319 | 133 | 33.6513 |
| Shinigami | 43.3002 | 10.4169 | 298 | 32.8833 |
| Madman | 44.2269 | 11.6328 | 201 | 32.5940 |
| Simon (DK) | 45.5164 | 14.3709 | 104 | 31.1455 |
| SawneyBean | 41.6755 | 10.7028 | 247 | 30.9726 |
| Yaju | 50.4640 | 19.9638 | 49 | 30.5000 |
| mpsprs | 43.9237 | 13.5030 | 112 | 30.4206 |
| BeeeeJ | 41.3244 | 10.9389 | 227 | 30.3855 |
| MrFrog | 41.3698 | 11.4582 | 216 | 29.9117 |
| Trevor Pasanen | 41.6197 | 11.9886 | 166 | 29.6312 |

Also bear in mind that brand new players often really are changing their skill level, which affects how quickly their uncertainty drops.

IMO, Isotropish's bigger weakness is how long it takes to figure out that a veteran player is improving, rather than just having a lucky streak. More on this shortly... I really am going to post part 3. I've been sitting on a near-finished version for quite a while.

Polk5440 · « **Reply #16 on:** September 10, 2014, 04:12:49 pm »

Quote from: ragingduckd on September 10, 2014, 09:31:43 am

More on this shortly... I really am going to post part 3. I've been sitting on a near-finished version for quite a while.

Are you writing a paper out of it or something? Might not be a bad idea....

Holger · « **Reply #17 on:** September 11, 2014, 08:19:44 am »

Quote from: ragingduckd on September 10, 2014, 09:31:43 am

Quote from: Holger on September 10, 2014, 09:06:13 am
I also hope that AI will still get around to posting "Part 3". I'm not quite certain if Goko's ranking (minus the bugs) is worse than Isotropish's; Isotropish uses an extremely high initial uncertainty, making the board very conservative wrt new players...

A couple-hundred games tends to drop a new player into the normal uncertainty range. For example, here are the top active players with fewer than 300 games.

Yeah, I'm one of them.

I would prefer to see "normal" uncertainties after only a few dozen games, not hundreds. This way a relatively new player could have a rating closely corresponding to his (often changing) skill, and not just have an "automatic" rating increase for playing lots of games until he reaches a hundred games. (TrueSkill does account for changing skills by increasing the uncertainty slightly after each game; and a new online player may well be a veteran RL player.) As a starting value for new players, probably something in the middle between Isotropish's leaderboard level -75 and Goko's 1,000 rating would be best...
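To put a number on that per-game uncertainty increase: in standard TrueSkill, τ² is added to each player's σ² before the game is scored. The sketch below applies that step to the σ and the fitted τ = 27.50 from the first post, purely as an illustration; inflateSigma is a made-up name.

```javascript
// TrueSkill's dynamics step: before each game, sigma^2 grows by tau^2.
function inflateSigma(sigma, tau) {
  return Math.sqrt(sigma * sigma + tau * tau);
}

const sigmaBefore = 263.14332861731685; // SD from the first post
const tau = 27.5;                       // tau fitted in the first post

console.log(inflateSigma(sigmaBefore, tau).toFixed(2)); // 264.58
```

So with these values σ rises by about 1.4 before each game, and the game result then pulls it back down.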

Quote

IMO, Isotropish's bigger weakness is how long it takes to figure out that a veteran player is improving, rather than just having a lucky streak. More on this shortly... I really am going to post part 3. I've been sitting on a near-finished version for quite a while.

There's no need to fine-tune it endlessly before posting; there's an edit button after all. I'd really like to read even a half-finished analysis; if necessary, you could also split part 3 in two and post only the first half now...

Dominion Strategy Forum
