There seem to be three reasonable options:
- Sort by mu with a cutoff based on variance or number of games
- Sort by mu-k*sigma for some k between 1 and 3
- Implement the isotropic leaderboard's algorithm and be done with it
I'm pretty sure that nobody wants the fourth option, sorting by mu with no cutoff.
I actually do want option number four. The only problem with it is that the system is probably quite wrong: playing one game, no matter how well you do, shouldn't get you a high enough rating to sit at number one on the leaderboard. That's actually an empirical question, though - does this make for a better rating system than the alternative or not? You all react against the mu sort because you think it should probably be worse. I tend to agree with that line of thought, but it's an empirical question, and if we're right about it, that really just means the whole rating system is bad. Well, I suppose I would probably prefer an 'active' leaderboard, such that you fall off after a certain period of inactivity, but still, I definitely want a mu sort.
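To make the options concrete, here is how the different sort keys treat a toy leaderboard (the players and numbers are invented for illustration): a lucky newcomer tops a raw mu sort, while the cutoff and the conservative sort each handle them differently.

```python
# Toy leaderboard entries: (name, mu, sigma, games_played). All invented.
players = [
    ("veteran",  30.0, 1.0, 2000),
    ("newcomer", 35.0, 7.0,    1),   # one lucky win, huge uncertainty
    ("regular",  28.0, 1.5,  300),
]

# Option 4: raw mu sort, no cutoff -- the newcomer tops the board.
by_mu = sorted(players, key=lambda p: -p[1])

# Option 1: mu sort with a minimum-games cutoff (threshold is arbitrary).
MIN_GAMES = 20
by_mu_cutoff = sorted((p for p in players if p[3] >= MIN_GAMES),
                      key=lambda p: -p[1])

# Option 2: conservative mu - k*sigma sort; k = 3 as discussed below.
k = 3
by_conservative = sorted(players, key=lambda p: -(p[1] - k * p[2]))
```

The cutoff hides the newcomer entirely; the conservative sort keeps them visible but ranks them last until their sigma shrinks.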
If I understand correctly, the purpose of sorting by mu-k*sigma is the same as that of implementing a cutoff. In both cases, the goal is to keep the top of the leaderboard from being filled up by mediocre players who have been lucky in a small number of games. Either option deviates from a rating system's one truly objective goal: estimating the probability that any given player beats another.
Well, I don't actually think this is the goal of either, particularly of the mu-k*sigma sort, where I think the goal is to spur more playing. But you're quite right on the goal of a rating system, and this is really why I would want a mu sort - you are expected to be better than every player below you and worse than every player above you. This isn't the case with the current system.
Microsoft Research appears to advocate the mu-k*sigma approach,
If they have any research saying this, it's market research. Seriously, they put this in their general information about the system, but you don't see it in the scholarly papers, and there's really no statistical backing for it.
but they don't take a strong stance on what k should be used. Using any k>0 means sorting players by a deliberate underestimate of their actual skill, but the degree of that underestimate varies with k. With k=3, a player's rank derives from the skill level that we're 99.9% confident is below their actual skill. To me, that seems a little excessive and possibly unfair to new players. This is what the leaderboard on drunkensailor.org is doing now, and it seems to be what Goko does as well.
We actually still don't really know what Goko does. For sure they have some uncertainty thing such that playing more helps your rating, but for all I know it actually gets folded into a single rating number and not separated out as a mu and sigma kind of thing.
Isotropic used k=1.
Actually, iso used k = 3. You might be confused because the numbers they showed were mu+/-3*sigma, so it looks like they just subtracted the two numbers. But the second number displayed was 3*sigma, not just sigma.
In other words, a player's rank derived from the skill level that was 84% certain to be below their actual skill level.
Except that this assumes that players' skills are normally distributed, which isn't true. But I've covered this.
This is still conservative, but not nearly as brutal to new/lucky players as mu-3*sigma. Iso also used an unusually high starting uncertainty: sigma=mu instead of sigma=mu/3. I'm not sure what the motivation for this was, but it explains why Iso had "levels" as high as 53 and as low as -35, while mine runs from 29 to -3.
As I've explained above, you have this wrong, because they displayed 3*sigma and not sigma. But actually yours running from -3 to 29 is not something in your favor - if the ratings were actually normally distributed, then based on your number of players you would have a much bigger range (of mu!) than you do.
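Under the Gaussian assumption being disputed above, the confidence figures both sides quote come straight from the standard normal CDF; a quick check:

```python
from math import erf, sqrt

def cdf(x):
    """Standard normal CDF: P(Z < x)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Confidence that true skill exceeds mu - k*sigma, *if* the posterior
# really is Gaussian (the caveat above applies):
confidence = {k: cdf(k) for k in (1, 2, 3)}
# k=1 -> ~84%, k=2 -> ~97.7%, k=3 -> ~99.87%
```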
Finally, note that sigma appears to converge to 0.80 with my standard TrueSkill implementation. A great many of the experienced players have sigmas between 0.79 and 0.82, and the lowest sigma in all of Goko is 0.78. On Iso, uncertainties never seem to have gotten below 6.5, and they didn't converge nearly as uniformly.
They don't actually converge to 0.80; it's just very hard to get lower than that by playing the way people actually do. I would have to look, but you would either need higher draw rates, or you'd need to do something like play very weak players a lot and win a lot. It comes down to how the updating equations work: there's enough uncertainty in each game that you can't get below this in practice. I don't think sigmas should ever go down to 0, though, because you really can't ever be totally sure of someone's skill. Anyway, iso's were higher because they incremented upward a little bit with every day that passed (and with every game? I can't recall exactly), which meant that to get them very low, you not only needed to do what you need to do for your system, you needed to play a heckuva lot, all the time.
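The floor described here falls out of the standard two-player TrueSkill update equations. A sketch of the mechanism, using assumed default parameters (not the actual Goko or iso constants) and ignoring draws: sigma shrinks after every game, but the dynamics term tau re-inflates it first, so the two eventually balance.

```python
from math import erf, exp, pi, sqrt

def pdf(x):
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# v and w are the usual TrueSkill mean/variance correction factors
# for a win with draw margin 0.
def v(t):
    return pdf(t) / cdf(t)

def w(t):
    return v(t) * (v(t) + t)

MU0, SIGMA0 = 25.0, 25.0 / 3   # TrueSkill defaults (assumed here)
BETA = 25.0 / 6                # per-game performance noise
TAU = 25.0 / 300               # dynamics: uncertainty re-added each game

def winner_update(mu, sigma, mu_opp, sigma_opp):
    """Two-player TrueSkill update for the winner."""
    s2 = sigma ** 2 + TAU ** 2             # inflate sigma before the game
    c2 = 2 * BETA ** 2 + s2 + sigma_opp ** 2
    c = sqrt(c2)
    t = (mu - mu_opp) / c
    mu_new = mu + (s2 / c) * v(t)
    s2_new = s2 * (1 - (s2 / c2) * w(t))
    return mu_new, sqrt(s2_new)

# Beat an evenly matched opponent 1000 times: sigma drops quickly at
# first, then flattens at a floor where the per-game shrinkage just
# cancels the tau added before each game -- it never reaches zero.
mu, sigma = MU0, SIGMA0
history = []
for _ in range(1000):
    mu, sigma = winner_update(mu, sigma, mu, SIGMA0)
    history.append(sigma)
```

With a larger tau (as in iso's daily increments) the floor sits correspondingly higher.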
Anyway, the real thing to me is that the proof is in the pudding. You go with the system that best measures things, and the only way we have of telling that is its predictions, so you go with the system that predicts best. Since you are actually only making predictions centred on the value of mu, that is what you should be sorting by.
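To see why the predictions are centred on mu: in the standard TrueSkill win-probability formula, the sigmas only pull the predicted probability toward 50%, they never flip which player is favoured. A sketch, with an assumed default beta:

```python
from math import erf, sqrt

BETA = 25.0 / 6   # TrueSkill's default performance noise (an assumption)

def cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def p_win(mu_a, sigma_a, mu_b, sigma_b):
    """Predicted probability that player A beats player B (draws ignored)."""
    c = sqrt(2 * BETA ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return cdf((mu_a - mu_b) / c)

# Whoever has the higher mu is always the predicted winner; larger
# sigmas just push the prediction closer to a coin flip.
edge_confident = p_win(26, 1, 25, 1)   # small sigmas: sharper prediction
edge_uncertain = p_win(26, 8, 25, 8)   # big sigmas: nearer 50%, still > 50%
```

So a mu sort and a "sort by predicted favourite" are the same ordering, which is the argument being made here.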
Edit: Incidentally, it has been suggested at points in the past that I make such comments in a self-serving way. That is pretty clearly not the case here: if I have counted correctly, this change would help me relative to 8 players, leave me unchanged relative to 139 (besides myself), and hurt me relative to 7,153.