Topic: Isotropish Leaderboard (alternative to Goko Pro) (Read 145481 times)

ragingduckd · « **on:** July 18, 2013, 08:00:06 am »

http://isotropish.com is an implementation of Microsoft's TrueSkill rating system. It is meant to be similar to the rating system that was used by Isotropic, but it isn't quite faithful (see below).

TrueSkill parameters:
- Initial rating: μ=25, σ=25
- Adjustment: β=25, τ=25/100, draw rate = 5%
- Uses the Python package from trueskill.org
- Sorted by "Level", defined as μ - 3σ. Click column names for alternative sorting.
- Games with bots count; games with guests don't
- Only 2-player "Pro" games count
- No daily increase in σ

I originally meant for these parameters to replicate those of the isotropic leaderboard, but there was some confusion at the time about just what those parameters were:

From Isotropic's FAQ (no longer available):

Quote from: http://dominion.isotropic.org/faq/

New players are assigned a skill of "25 ± 25", which is to say, we don't really have any idea what that person's skill is.
...
I've set β = 25, γ = σ₀ / 100 (applied daily), and the draw probability at 5%

From a PM with dougz:

Quote from: dougz on July 31, 2013, 12:22:38 am

No, I start with μ=25, σ=25/3 like everyone else. But since I display "μ ± 3σ" on the leaderboard it shows up as 25 ± 25.

I may not have ever actually had the clamping on the upper end.

Choice of 5%: pulled out of thin air.

Yes, there was a small daily increase in σ. (Moved 1% of the way back to 25/3, I think.) I didn't want people to be able to camp out on the leaderboard by getting to a good position and then not playing.

Based on all this, it looks like dougz was probably using μ₀=25, σ₀=25/3, β = 25, draw_prob=5%, τ=?, and a daily increase in uncertainty of either (0.01)σ or (0.01)(σ₀-σ).

β = 25 is much larger than the TrueSkill default of σ₀/2 = 25/6. The default for τ is σ₀/100 = 25/300, but I don't have anything definite about what value was used for Isotropic. The differences between Isotropic and Isotropish are likely due to the values used for β and τ, and to the fact that Isotropish does not have the daily increase in σ. The different σ₀ values have some effect too, but can't explain the differences in σ for players with thousands of games.
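To make the two candidate readings of the daily σ increase concrete, here is a minimal Python sketch. The function names and the starting value of 1.0 are made up for illustration, and neither reading is confirmed as what Isotropic actually ran; the second one follows dougz's "moved 1% of the way back to 25/3" wording.

```python
SIGMA0 = 25 / 3  # starting uncertainty reported by dougz
RATE = 0.01      # the "1%" daily step


def grow_proportional(sigma, rate=RATE):
    """Reading 1: sigma increases by (0.01) * sigma each day."""
    return sigma + rate * sigma


def grow_toward_start(sigma, rate=RATE, sigma0=SIGMA0):
    """Reading 2: sigma increases by (0.01) * (sigma0 - sigma) each day."""
    return sigma + rate * (sigma0 - sigma)


def idle(step, sigma, days):
    """Apply a daily step function for a number of days without play."""
    for _ in range(days):
        sigma = step(sigma)
    return sigma


start = 1.0  # arbitrary illustrative sigma for an established player
print(round(idle(grow_proportional, start, 30), 3))
print(round(idle(grow_toward_start, start, 30), 3))
```

After 30 idle days the first reading gives a sigma of about 1.35 and the second about 2.91. The second reading can never exceed σ₀, while the first keeps growing unless it is capped somewhere.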

No need to post expressing your personal preferences for these parameters. I'm doing some testing now and will switch to the best-performing values shortly.

TrojH · « **Reply #1 on:** July 18, 2013, 08:55:16 am »

I'd prefer that the Trueskill rating for each person be shown as mu +- (3*sigma), not mu +- sigma. But that's just me.

jsh357 · « **Reply #2 on:** July 18, 2013, 09:04:37 am »

When is it updated? daily?

ragingduckd · « **Reply #3 on:** July 18, 2013, 09:29:43 am »

Quote from: jsh357 on July 18, 2013, 09:04:37 am

When is it updated? daily?

Every 10 min

yed · « **Reply #4 on:** July 18, 2013, 12:27:20 pm »

Good work! Thanks!

WanderingWinder · « **Reply #5 on:** July 18, 2013, 01:32:41 pm »

I assume that you are using the default values for Beta and Tau and all that jazz?
Okay, so if you have Beta=25/6, there are a couple things: first, the values of sigma for the different players are quickly falling to the point that they matter almost nothing. If we take even the highest guy from the top 200 (daniel greif, currently #69, sigma = 2.34), against someone with the seemingly typical sigma of .8, the overall difference in the sigma of the distribution is from 4.32 to 4.85, which at most is going to have a 2.79% change in the expected winrate, which only happens if the opponent is 4.57 points of mu away from this fellow - corresponding to a jump from 82.7% to 85.5%.
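For anyone who wants to check that arithmetic, here is a small Python sketch. The function name is hypothetical, and the spread sqrt(β² + σ_a² + σ_b²) is an assumption chosen because it reproduces the 4.32 and 4.85 figures above; the usual TrueSkill formulation uses 2β² in that sum, which would give somewhat different percentages.

```python
from math import sqrt
from statistics import NormalDist

BETA = 25 / 6  # the default beta assumed in the post above


def win_probability(mu_diff, sigma_a, sigma_b, beta=BETA):
    """P(A beats B) for a given mu gap, using sqrt(beta^2 + sigma_a^2 + sigma_b^2)."""
    spread = sqrt(beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return NormalDist().cdf(mu_diff / spread)


typical = win_probability(4.57, 0.8, 0.8)      # both players at sigma 0.8
uncertain = win_probability(4.57, 2.34, 0.8)   # one player at sigma 2.34
print(f"{typical:.1%} vs {uncertain:.1%}, gap {typical - uncertain:.2%}")
```

The same function is all a "enter mu and sigma for both players" page would need.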

Furthermore, it should be very easy to whip up the expected winrate in a match between any two players. Maybe another page where you can enter the mu and sigma for both of you, and it spits out the result? Of course, it would be most nice to do it on name vs name, but I don't know how much work that is - the thing I have suggested should be very very simple.

Okay, lastly, my pet peeve, the mu-3*sigma thing. Basically, it is systematically underrating players with high uncertainties. Especially against strong players. If you look at it, the system expects #204 to do better against Stef than it does #6.

Oh, also, you are rating guests, just not games against them. I don't know if you want to clean that up or not.

ragingduckd · « **Reply #6 on:** July 18, 2013, 04:22:28 pm »

Quote from: WanderingWinder on July 18, 2013, 01:32:41 pm

If we take even the highest guy from the top 200 (daniel greif, currently #69, sigma = 2.34), against someone with the seemingly typical sigma of .8, the overall difference in the sigma of the distribution is from 4.32 to 4.85, which at most is going to have a 2.79% change in the expected winrate, which only happens if the opponent is 4.57 points of mu away from this fellow - corresponding to a jump from 82.7% to 85.5%.

My intuition is failing me here. Is this obviously undesirable? Is fixing it a matter of calibrating sigma or of using a different family of curves?

Quote

Furthermore, it should be very easy to whip up the expected winrate in a match between any two players. Maybe another page where you can enter the mu and sigma for both of you, and it spits out the result? Of course, it would be most nice to do it on name vs name, but I don't know how much work that is - the thing I have suggested should be very very simple.

Good idea. I have a mapping from player names to current ratings, so neither approach is difficult.

Quote

Okay, lastly, my pet peeve, the mu-3*sigma thing. Basically, it is systematically underrating players with high uncertainties. Especially against strong players. If you look at it, the system expects #204 to do better against Stef than it does #6.

That's true. But do you really want the ordering to reflect the probability of beating Stef? For top-whatever players that's probably a good metric, but for the median player it might not be so great.

In any case, here's the mu-sorted top-25:


 games_played |       pname       |   mu    | sigma  
--------------+-------------------+---------+--------
            7 | lettukastike      | 36.7890 | 3.7491
            7 | Tydude            | 36.6604 | 3.8999
            7 | Guest_213286      | 36.5075 | 4.8567
            5 | ShivaBowl         | 36.3048 | 4.8451
            7 | jonts             | 36.0121 | 3.9443
            6 | Tommy             | 35.9250 | 4.0632
            8 | Meteora           | 35.9202 | 4.0425
            2 | Tomas Ramanauskas | 35.6200 | 5.6838
            4 | Lilly Roettgen    | 35.2591 | 5.2667
            4 | w455up            | 35.0957 | 4.7752
            5 | Guest_860460      | 35.0173 | 4.5824
            1 | gianthsiao        | 34.7435 | 6.0963
            4 | Jason Orne        | 34.7036 | 4.4389
           11 | Robert Birks      | 34.5059 | 4.9000
            5 | Saphy89           | 34.4803 | 5.4340
            7 | Schub             | 34.4702 | 3.9738
            3 | Bahguerra         | 34.4432 | 5.3661
            8 | rinshan           | 34.3814 | 3.6950
            3 | JEL64             | 34.3160 | 5.5604
            2 | Nightwing         | 34.2906 | 5.3101
            2 | RobinRaven        | 34.2896 | 5.7889
            3 | GamingNoise       | 34.2733 | 4.8472
           13 | daniel greif      | 34.2320 | 2.3373
            7 | QuidProBro        | 34.1935 | 4.4836
            3 | Focastars         | 34.1898 | 5.0783

Limiting it to players with 30+ eligible games makes it more palatable:


 games_played |       pname       |   mu    | sigma  
--------------+-------------------+---------+--------
           86 | Tao Chen          | 32.3903 | 1.1416
          111 | nomnomnom         | 32.3275 | 0.9555
          748 | Stef              | 32.1135 | 0.7955
          134 | Boodaloo          | 32.0649 | 0.8679
          657 | Mic Qsenoch       | 31.6395 | 0.7953
          490 | Stealth Tomato    | 31.5151 | 0.8173
          212 | Rene Kuroi        | 31.4289 | 0.8053
          675 | Wandering Winder  | 31.3437 | 0.7959
          531 | Obi Wan Bonogi    | 31.2981 | 0.8044
          681 | hiroki            | 31.2741 | 0.7907
         1580 | SheCantSayNo      | 31.2577 | 0.7875
          138 | HiveMindEmulator  | 31.2219 | 0.8641
          439 | Rabid             | 31.2037 | 0.8033
           53 | Holger            | 31.1264 | 1.2410
          581 | Geronimoo         | 31.1062 | 0.8065
          690 | yudai214          | 31.0915 | 0.7872
           51 | First             | 31.0688 | 1.1839
           41 | yuuna_tu          | 30.9809 | 1.2839
           48 | awall             | 30.8608 | 1.4834
          718 | LESPEUTERE        | 30.8312 | 0.8034
          485 | iriho             | 30.8213 | 0.7930
          419 | jaybeez           | 30.7857 | 0.7980
         1113 | Andrew Iannaccone | 30.7585 | 0.7888
           46 | faw               | 30.7349 | 1.2166
           33 | ooo               | 30.7265 | 1.4631

Quote

Oh, also, you are rating guests, just not games against them. I don't know if you want to clean that up or not.

Thanks. It looks like the problem is that some of the guest players are slipping past my filtering. Goko seems to have used several different guest-naming conventions. That'll be easy to fix for future games, but it might be a while before I clean up the existing data.

WanderingWinder · « **Reply #7 on:** July 18, 2013, 04:31:58 pm »

Quote from: ragingduckd on July 18, 2013, 04:22:28 pm

Quote from: WanderingWinder on July 18, 2013, 01:32:41 pm
If we take even the highest guy from the top 200 (daniel greif, currently #69, sigma = 2.34), against someone with the seemingly typical sigma of .8, the overall difference in the sigma of the distribution is from 4.32 to 4.85, which at most is going to have a 2.79% change in the expected winrate, which only happens if the opponent is 4.57 points of mu away from this fellow - corresponding to a jump from 82.7% to 85.5%.

My intuition is failing me here. Is this obviously undesirable? Is fixing it a matter of calibrating sigma or of using a different family of curves?

Oh, this isn't meant to be a criticism at all, just something to note.

It does play a little into my mu-3*sigma point, insofar as the difference between 1.0 and 0.75 is basically nothing to the system, whereas the .75 difference in mu that it is equating to is pretty significant.
Your first listing of straight mu sort is obviously not the most desirable thing ever, but I actually don't see how the second is only 'a little more palatable': to me, this IS the leaderboard. I don't know about cutting off at 30 games - I would probably cut off at sigma of 1.5 or 1 or something. But basically, yeah, that's what I would go with.

ragingduckd · « **Reply #8 on:** July 18, 2013, 04:46:15 pm »

Quote from: WanderingWinder on July 18, 2013, 04:31:58 pm

Your first listing of straight mu sort is obviously not the most desirable thing ever, but I actually don't see how the second is only 'a little more palatable': to me, this IS the leaderboard

Ok, enormously more palatable.

Quote

I don't know about cutting off at 30 games - I would probably cut off at sigma of 1.5 or 1 or something. But basically, yeah, that's what I would go with.

I like N games better than a sigma cutoff. The only thing worse than slipping down the leaderboard after winning a game would be falling off of it entirely. I do wish there were a cleaner solution.

SCSN · « **Reply #9 on:** July 18, 2013, 04:50:03 pm »

Quote from: WanderingWinder on July 18, 2013, 04:31:58 pm

Your first listing of straight mu sort is obviously not the most desirable thing ever

lol, I love this kind of understatement.

Quote

I don't know about cutting off at 30 games - I would probably cut off at sigma of 1.5 or 1 or something.

If you're going with this I have a strong preference for cutting off at 1, as I've never seen any of those guys with sigma > 1, while I've seen all the players with sigma < 1 quite a lot, and played all of them except HME.

WanderingWinder · « **Reply #10 on:** July 18, 2013, 04:54:04 pm »

Quote from: ragingduckd on July 18, 2013, 04:46:15 pm

Quote from: WanderingWinder on July 18, 2013, 04:31:58 pm
Your first listing of straight mu sort is obviously not the most desirable thing ever, but I actually don't see how the second is only 'a little more palatable': to me, this IS the leaderboard

Ok, enormously more palatable.

Quote
I don't know about cutting off at 30 games - I would probably cut off at sigma of 1.5 or 1 or something. But basically, yeah, that's what I would go with.

I like N games better than a sigma cutoff. The only thing worse than slipping down the leaderboard after winning a game would be falling off of it entirely. I do wish there were a cleaner solution.

For some reason I was thinking you were still using a time cutoff. While surely it's not really going to make much difference, as whatever sigma cutoff you would pick would be something that an established player isn't going to fall off of without a serious amount of game fixing, I agree with you here - my thought was that you could fall off based on inactivity, which seems fine, but that is not the case, so yeah, games makes more sense.

Fabian · « **Reply #11 on:** July 18, 2013, 05:22:20 pm »

Any chance you could add the "games played" to the leaderboard?

Very nice overall.

ednever · « **Reply #12 on:** July 18, 2013, 08:51:44 pm »

FWIW it looks a lot better for me.

I just clawed my way up to #10 on the Goko board, but I did it without really challenging myself. I post a new game and basically accept anyone with a rating above 4500 or so. Which means a lot of games with people below 5000 rating.

I'm learning that that level on goko corresponds to about 15 on iso?
It's low enough that I rarely lose, even when making pretty significant mistakes. (One of the few I lost, I messed up the interface and kept trashing cards with my Jack that I was trying to play. Including a King's Court. And it was still a close game!)

At the same time the few games I've played against the top players have not gone great. I've only won a handful.

Whatever system goko uses you can move up the ladder to the very top by playing people far below your level and not losing very often (I think I gain about 20 points on a win and lose about 70 on a loss).
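A quick back-of-the-envelope check on those numbers, as a Python sketch. The function name is made up, and the 20 and 70 point figures are just the rough estimates from the paragraph above, not anything known about Goko's actual formula.

```python
def break_even_win_rate(gain_per_win, loss_per_loss):
    """Win rate at which expected rating change per game is zero."""
    return loss_per_loss / (gain_per_win + loss_per_loss)


rate = break_even_win_rate(20, 70)
print(f"{rate:.1%}")
```

With +20 per win and -70 per loss, anything above roughly a 78% win rate against that pool of opponents keeps pushing the rating up.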

On the Trueskill board I'm in #20 which seems a lot more reasonable given how I've been playing.

Nice work.

Ed

blueblimp · « **Reply #13 on:** July 18, 2013, 11:54:33 pm »

For what it's worth, if you want to emulate the isotropic leaderboard exactly, then the web archive still has the settings for it:
http://web.archive.org/web/20130116154350/http://dominion.isotropic.org/faq/

Quote

In a nutshell: skill is measured on a scale that goes from roughly 0 to 50 points. (Actually skill can be any number, but 99.8% of players should fall in the 0–50 range.) The skill range column is a 99.8% confidence interval — the system is 99.8% sure your true skill lies somewhere in that range. New players are assigned a skill of "25 ± 25", which is to say, we don't really have any idea what that person's skill is. As you play more, your mean skill moves up or down and the range gets smaller as the system believes it has a better estimate of your skill.

The level column is the low end of your range, rounded down to an integer and clamped to the range [0, 50]. If we ignore the clamping, it is a conservative skill estimate in the sense that we are 99.9% confident that you are at least that skillful.

Because Dominion has a lot of randomness (it's not uncommon for a low-skill player to beat a high-skill player, through fortunate shuffling), it takes a relatively long time to change your skill — the system needs to see a lot of examples of you winning before it accepts that it's not just due to luck. (For those interested in the details, I've set β = 25, γ = σ₀ / 100 (applied daily), and the draw probability at 5%.)
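Read literally, the FAQ's level column can be sketched in a few lines of Python. The function name is hypothetical, and this follows only the FAQ wording quoted above; dougz's PM in the first post suggests the upper clamp may never have been applied in practice.

```python
import math


def isotropic_level(mu, sigma):
    """Low end of the mu ± 3*sigma range, rounded down and clamped to [0, 50]."""
    low_end = mu - 3 * sigma
    return max(0, min(50, math.floor(low_end)))


print(isotropic_level(32.1135, 0.7955))  # Stef's mu and sigma from the table in this thread
print(isotropic_level(25, 25))           # a negative low end is clamped to 0
```

Dropping the `min(50, ...)` part gives the variant without the upper clamp.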

HiveMindEmulator · « **Reply #14 on:** July 19, 2013, 12:41:20 am »

Quote from: WanderingWinder on July 18, 2013, 01:32:41 pm

It does play a little into my mu-3*sigma point, insofar as the difference between 1.0 and 0.75 is basically nothing to the system, whereas the .75 difference in mu that it is equating to is pretty significant.
Your first listing of straight mu sort is obviously not the most desirable thing ever, but I actually don't see how the second is only 'a little more palatable': to me, this IS the leaderboard. I don't know about cutting off at 30 games - I would probably cut off at sigma of 1.5 or 1 or something. But basically, yeah, that's what I would go with.

The problem is that the leaderboard means different things to different people. Using the ATP as an example, Andy Murray has been playing very well over the past year, but he didn't play in the French Open this year, so he can't be #1. This doesn't mean you shouldn't expect him to be able to beat Djokovic in the US Open, it just means he hasn't done enough to deserve to be #1, based on the meaning of the ATP rankings. The rankings are not predictive, they are accomplishment-based.

Some people like it this way, and some people don't. Psychologically, sorting by mu makes it kind of crazy. People are already upset at how much one game affects your Goko rating, so I shudder to think how they'll feel about a leaderboard sorted by mu. However, the people who play the most often don't really want to be rewarded for simply playing more games, but only for playing better, so these people may tend to prefer to sort by mu.

Maybe a good compromise would be to provide both? Have the leaderboard be the standard Trueskill thing, but make it sortable by mu, so people can see that if they want?

yed · « **Reply #15 on:** July 19, 2013, 08:48:13 am »

Feature request:
Add "grey level numbers" when a lot of people are in one level. Like isotropic:
http://dominion.isotropic.org/leaderboard/

Schneau · « **Reply #16 on:** July 19, 2013, 07:28:11 pm »

Why are there a bunch of players with ".0000" after their username?

Lightning edit: After some more perusal, it looks like it's for duplicated usernames. It seems strange that Goko would allow those!

ragingduckd · « **Reply #17 on:** July 19, 2013, 08:01:27 pm »

Quote from: blueblimp on July 18, 2013, 11:54:33 pm

For what it's worth, if you want to emulate the isotropic leaderboard exactly, then the web archive still has the settings for it:
http://web.archive.org/web/20130116154350/http://dominion.isotropic.org/faq/

Thanks. I'm considering it. It would be nice for consistency's sake, if nothing else.

Quote from: HiveMindEmulator on July 19, 2013, 12:41:20 am

Maybe a good compromise would be to provide both? Have the leaderboard be the standard Trueskill thing, but make it sortable by mu, so people can see that if they want?

I like the idea of providing variations, but I also fear that this way lies madness. One unofficial leaderboard is bad enough...

Quote from: Schneau on July 19, 2013, 07:28:11 pm

Why are there a bunch of players with ".0000" after their username?

Lightning edit: After some more perusal, it looks like it's for duplicated usernames. It seems strange that Goko would allow those!

I was wondering about that. Are you sure? I haven't looked at their hashes.

Schneau · « **Reply #18 on:** July 19, 2013, 09:37:57 pm »

Quote from: ragingduckd on July 19, 2013, 08:01:27 pm

Quote from: Schneau on July 19, 2013, 07:28:11 pm
Why are there a bunch of players with ".0000" after their username?

Lightning edit: After some more perusal, it looks like it's for duplicated usernames. It seems strange that Goko would allow those!

I was wondering about that. Are you sure? I haven't looked at their hashes.

I'm not sure. It just so happened that the first one I searched for (John Wyatt) had a duplicate. But, a bunch of others don't. So, now I'm much less sure, but it's a decent hypothesis if others with the same name just use a custom username.

rspeer · « **Reply #19 on:** July 19, 2013, 10:35:18 pm »

Quote from: SheCantSayNo on July 18, 2013, 04:50:03 pm

If you're going with this I have a strong preference for cutting off at 1, as I've never seen any of those guys with sigma > 1, while I've seen all the players with sigma < 1 quite a lot, and played all of them except HME.

If you're getting unknown players at the top of the leaderboard, setting a hard cutoff isn't the answer. It means that you think these people actually have much more rating uncertainty than what you calculated.

If you haven't switched from (mu - sigma) to (mu - 3*sigma) for the ranking, certainly do that. If you have, there might be other parameters you need to change.

WanderingWinder · « **Reply #20 on:** July 19, 2013, 10:46:02 pm »

Quote from: rspeer on July 19, 2013, 10:35:18 pm

Quote from: SheCantSayNo on July 18, 2013, 04:50:03 pm
If you're going with this I have a strong preference for cutting off at 1, as I've never seen any of those guys with sigma > 1, while I've seen all the players with sigma < 1 quite a lot, and played all of them except HME.

If you're getting unknown players at the top of the leaderboard, setting a hard cutoff isn't the answer. It means that you think these people actually have much more rating uncertainty than what you calculated.

The issue with this is, he isn't saying they should be ranked lower because they aren't as good, but because he specifically hasn't played against or seen them. This could even just be time zone issues. It's really not a basis for anything.

Quote

If you haven't switched from (mu - sigma) to (mu - 3*sigma) for the ranking, certainly do that.

Why? Why why why why why?

blueblimp · « **Reply #21 on:** July 19, 2013, 11:52:28 pm »

Quote from: WanderingWinder on July 19, 2013, 10:46:02 pm

Quote
If you haven't switched from (mu - sigma) to (mu - 3*sigma) for the ranking, certainly do that.

Why? Why why why why why?

My justification for leaderboards being conservative: people in general prefer to be conservative when proclaiming who's the best at anything. Otherwise, any newcomer on a hot streak would be proclaimed the best, only to be immediately dethroned as it turns out it was mostly luck. Requiring good performance over a period of many games is more stable.

Subtracting N*sigma from mu is just a way of formalizing that conservatism.

Kirian · « **Reply #22 on:** July 20, 2013, 12:08:36 am »

Quote from: blueblimp on July 19, 2013, 11:52:28 pm

Quote from: WanderingWinder on July 19, 2013, 10:46:02 pm
Quote
If you haven't switched from (mu - sigma) to (mu - 3*sigma) for the ranking, certainly do that.

Why? Why why why why why?
My justification for leaderboards being conservative: people in general prefer to be conservative when proclaiming who's the best at anything. Otherwise, any newcomer on a hot streak would be proclaimed the best, only to be immediately dethroned as it turns out it was mostly luck. Requiring good performance over a period of many games is more stable.

Subtracting N*sigma from mu is just a way of formalizing that conservatism.

I think we should go for mu - 50*sigma personally.

rspeer · « **Reply #23 on:** July 20, 2013, 03:06:39 am »

WW, your plaintive cries of "why" have straightforward answers. (edit: brain fart on who I was addressing)

You subtract some multiple of sigma because you need a confidence interval. You don't want false positives at the top of the leaderboard. If you have lots of false positives, it's not a leaderboard, it's a luckyboard, and everyone can recognize that and suggest blunt fixes by filtering out people who don't meet other criteria.

That multiple is 3*sigma because you want a 99% confidence interval, which handwavily means that 1 player in the top 100 will be there by a lucky fluke. You could choose a different number, sure. 2*sigma would probably be acceptable. 1*sigma gets into the silly range. The 50*sigma that you trollishly suggest is deep, deep into the silly range and you know it -- such a leaderboard would not function at all.

ragingduckd · « **Reply #24 on:** July 20, 2013, 04:58:02 am »

I hereby declare a ceasefire. I can anticipate the next three or four posts and I doubt that any of them will be particularly helpful. Let's keep this thread focused.

There seem to be three reasonable options:

Sort by mu with a cutoff based on variance or number of games
Sort by mu-k*sigma for some k between 1 and 3
Implement the isotropic leaderboard's algorithm and be done with it

I'm pretty sure that nobody wants the fourth option, sorting by mu with no cutoff.

If I understand correctly, the purpose of sorting by mu-k*sigma is the same as that of implementing a cutoff. In both cases, the goal is to keep the top of the leaderboard from being filled up by mediocre players who have been lucky in a small number of games. Either option deviates from a rating system's one truly objective goal: estimating the probability that any given player beats another.
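To see how the first two options behave on real rows, here is a small Python sketch using five players copied from the tables earlier in the thread. The function names and the tuple layout are made up for illustration; this is not the site's actual query.

```python
# (games_played, name, mu, sigma), copied from the tables earlier in the thread
PLAYERS = [
    (7, "lettukastike", 36.7890, 3.7491),
    (13, "daniel greif", 34.2320, 2.3373),
    (86, "Tao Chen", 32.3903, 1.1416),
    (111, "nomnomnom", 32.3275, 0.9555),
    (748, "Stef", 32.1135, 0.7955),
]


def rank_by_mu(players, min_games=0, max_sigma=float("inf")):
    """Option 1: sort by mu after dropping players who fail a cutoff."""
    eligible = [p for p in players if p[0] >= min_games and p[3] <= max_sigma]
    return [p[1] for p in sorted(eligible, key=lambda p: p[2], reverse=True)]


def rank_by_level(players, k):
    """Option 2: sort everyone by the conservative estimate mu - k*sigma."""
    return [p[1] for p in sorted(players, key=lambda p: p[2] - k * p[3], reverse=True)]


print(rank_by_mu(PLAYERS, min_games=30))
print(rank_by_mu(PLAYERS, max_sigma=1.0))
print(rank_by_level(PLAYERS, k=1))
print(rank_by_level(PLAYERS, k=3))
```

On these five rows, k=3 puts Stef first and lettukastike last, the exact reverse of the plain mu sort, while k=1 still leaves the two low-game players on top.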

Microsoft Research appears to advocate the mu-k*sigma approach, but they don't take a strong stance on what k should be used. Using any k>0 means sorting players by a deliberate underestimate of their actual skill, but the degree of that underestimate varies with k. With k=3, a player's rank derives from the skill level that we're 99% confident is below their actual skill. To me, that seems a little excessive and possibly unfair to new players. This is what the leaderboard on drunkensailor.org is doing now, and it seems to be what Goko does as well.

Isotropic used k=1. In other words, a player's rank derived from the skill level that was roughly 84% certain to be below their actual skill level. This is still conservative, but not nearly as brutal to new/lucky players as mu-3*sigma. Iso also used an unusually high starting uncertainty: sigma=mu instead of sigma=mu/3. I'm not sure what the motivation for this was, but it explains why Iso had "levels" as high as 53 and as low as -35, while mine runs from 29 to -3.
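The confidence attached to each k is just the standard normal CDF evaluated at k, assuming the belief about a player's skill is normal with mean mu and standard deviation sigma. A minimal Python check (the function name is made up):

```python
from statistics import NormalDist


def confidence_above_level(k):
    """Probability that true skill exceeds mu - k*sigma under a normal belief."""
    return NormalDist().cdf(k)


for k in (1, 2, 3):
    print(k, f"{confidence_above_level(k):.2%}")
```

That gives about 84.1% for k=1, 97.7% for k=2 and 99.87% for k=3, so the k=3 figure is closer to the "99.9% confident" wording in the Isotropic FAQ than to 99%.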

Finally, note that sigma appears to converge to 0.80 with my standard TrueSkill implementation. A great many of the experienced players have sigmas between 0.79 and 0.82, and the lowest sigma in all of Goko is 0.78. On Iso, uncertainties never seem to have gotten below 6.5, and they didn't converge nearly as uniformly.

None of this makes sense to me. Intuitively, I would have expected uncertainties to converge asymptotically to zero. I also wouldn't have expected my uncertainties to converge any more uniformly than Iso's did. Are these anomalies evidence of a failure in TrueSkill, in my parameters, in my code, or in my intuition?
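One possible piece of the puzzle, offered as a sketch rather than a diagnosis: with τ > 0, TrueSkill adds τ² to a player's variance before every game, so sigma cannot shrink to zero and instead settles where the per-game shrinkage balances the added τ². The Python below is a simplified stand-in, not the trueskill.org package. It assumes the default parameters (σ₀ = 25/3, β = σ₀/2, τ = σ₀/100), no draw margin, and an opponent with identical mu and sigma in every game, so the skill gap stays at zero; the function name is made up.

```python
from math import sqrt
from statistics import NormalDist

SIGMA0 = 25 / 3
BETA = SIGMA0 / 2
TAU = SIGMA0 / 100


def sigma_after_games(n_games, sigma=SIGMA0, beta=BETA, tau=TAU):
    """Track one player's sigma over n_games against an equally rated twin."""
    std_normal = NormalDist()
    # With equal mu the normalized skill gap t is 0, so w = v * (v + t) = v**2.
    v = std_normal.pdf(0.0) / std_normal.cdf(0.0)
    w = v * v
    variance = sigma ** 2
    for _ in range(n_games):
        variance += tau ** 2                        # uncertainty added before each game
        c_squared = 2 * beta ** 2 + 2 * variance    # both players share the same variance
        variance *= 1 - (variance / c_squared) * w  # shrinkage from observing a result
    return sqrt(variance)


print(round(sigma_after_games(2000), 3))
print(round(sigma_after_games(2000, tau=0.0), 3))
```

Under these assumptions sigma levels off near 0.79 after a couple of thousand games, close to the 0.78 to 0.82 range described above, while setting τ to zero lets it keep shrinking toward zero. Because the floor depends only on β and τ, it would also be about the same for every active player, which might account for the uniformity.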
