Topic: Beyond cheaters, this is why we have the isotropish leaderboard (Read 11705 times)

jl8e · « **Reply #25 on:** March 29, 2014, 10:33:07 pm »

Quote from: flies on March 29, 2014, 06:56:11 pm

My goko rating will go down about 60 points if I lose to a player ~800 pts lower than me. If that happens five times in a row, I've lost 300 points. That volatility is not what I'd want for maximum accuracy.

If you’re losing five times running to players rated significantly below you, your rating is too high, and should be dropping significantly.

Dominion ratings, whatever system is in use, are going to show volatility, because Dominion is a high-variance game. It’s not like chess, where at some point, the higher-ranked player simply is not going to lose. No matter how good someone is, against an average player they’re still going to lose occasionally because of luck. If the skill difference means they win 95% of the time, then in a perfectly-balanced rating system their rating is going to drop significantly when they do lose. Specifically, it’s going to drop by 19 * x, where x is however much they would gain by winning.
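The 19 * x figure follows from requiring the expected rating change to be zero when the rating is already correct: p * gain = (1 - p) * loss. A minimal sketch of that arithmetic (the function name is made up for illustration, and it assumes a system that is "perfectly balanced" in exactly this zero-expected-change sense):

```python
def break_even_loss(win_prob: float, gain_per_win: float) -> float:
    """Points lost per defeat so that the expected rating change is zero.

    Solves win_prob * gain = (1 - win_prob) * loss for loss.
    """
    if not 0.0 < win_prob < 1.0:
        raise ValueError("win_prob must be strictly between 0 and 1")
    return win_prob / (1.0 - win_prob) * gain_per_win


# A 95% favourite who gains x = 1 point per win loses about 19 per defeat.
print(round(break_even_loss(0.95, 1.0), 6))
```

The ratio grows quickly as the favourite gets stronger: at 99% it is 99 to 1, which is why a single upset can look so dramatic.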

popsofctown · « **Reply #26 on:** March 30, 2014, 12:23:59 am »

I dunno man, you give Celestial Chameleon the right kingdom against a weaker player..

flies · « **Reply #27 on:** March 30, 2014, 12:03:35 pm »

Quote from: WanderingWinder on March 29, 2014, 07:21:52 pm

Quote from: flies on March 29, 2014, 06:56:11 pm
Quote
My goko rating will go down about 60 points if I lose to a player ~800 pts lower than me. If that happens five times in a row, I've lost 300 points. That volatility is not what I'd want for maximum accuracy.
But 60 points and 800 points and 300 points don't necessarily mean anything. If it was losing .6 points against someone rated 8 below you, would it be a problem? Well, this is the same thing, they are just showing you extra digits.

Right now 300 points would drop me from the 11th ranked player to #33. This feels wrong. Rankings are better for scale insofar as the scale is not arbitrary per se, but the meaning of a rank difference depends on how many players are ranked. (There are about 300 players above lvl 29 on isotropish, ~500 above 5000 on goko.)

We don't really know how to rank skill of players. We have no truth to compare to. We could, in principle, devise bots with a known "skill" (whereby they'd decide at the outset who'd win/lose based on some function of a scalar "skill" difference) and see what isotropish would do vs goko, and that would help. But we're not going to do that.

I appreciate your skepticism, WW, but I can't shake the feeling that goko's ranking is too volatile. Whatever my skill is, five games out of 1079 shouldn't change my estimation of it that much (how many of those are over the last month I'm not sure, 50?).

Quote

TrueSkill models things as a difference of normal distributions, with the combined normal having an extra parameter for intrinsic variability of the game.

If you think the analysis here, where the odds are given as an exponential function of TS difference, is mistaken I'd be interested to hear why.

WanderingWinder · « **Reply #28 on:** March 30, 2014, 12:31:53 pm »

Quote from: flies on March 30, 2014, 12:03:35 pm

Quote from: WanderingWinder on March 29, 2014, 07:21:52 pm
Quote from: flies on March 29, 2014, 06:56:11 pm
Quote
My goko rating will go down about 60 points if I lose to a player ~800 pts lower than me. If that happens five times in a row, I've lost 300 points. That volatility is not what I'd want for maximum accuracy.
But 60 points and 800 points and 300 points don't necessarily mean anything. If it was losing .6 points against someone rated 8 below you, would it be a problem? Well, this is the same thing, they are just showing you extra digits.

Right now 300 points would drop me from the 11th ranked player to #33. This feels wrong. Rankings are better for scale insofar as the scale is not arbitrary per se, but the meaning of a rank difference depends on how many players are ranked. (There are about 300 players above lvl 29 on isotropish, ~500 above 5000 on goko.)

We don't really know how to rank skill of players. We have no truth to compare to. We could, in principle, devise bots with a known "skill" (whereby they'd decide at the outset who'd win/lose based on some function of a scalar "skill" difference) and see what isotropish would do vs goko, and that would help. But we're not going to do that.

But this isn't true. We DO know a way to rank the skill of players - look at game results. Rating systems can (and generally do) give specific numeric predictions of game outcomes, e.g. Jimmy has a 72% chance of winning against Bob. Then you compare these results against what happens and see what has the best accuracy/least error. There's actually more than one way to measure accuracy, and you can argue their merits, but you can certainly do it by one of them.
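Two common ways to score predictions against results are the Brier score (mean squared error) and log loss. Here is a rough sketch of both; the predictions and outcomes are invented purely for illustration, and which measure is "right" is exactly the kind of thing one could argue about:

```python
from math import log


def brier_score(predictions, outcomes):
    """Mean squared error between predicted win probability and result (1 = win, 0 = loss)."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)


def log_loss(predictions, outcomes, eps=1e-12):
    """Average negative log-likelihood of the results; punishes confident misses harder."""
    total = 0.0
    for p, o in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log(0) never happens
        total -= o * log(p) + (1 - o) * log(1.0 - p)
    return total / len(predictions)


# Hypothetical data: two rating systems predicting the same four games.
results = [1, 1, 0, 1]
system_a = [0.72, 0.60, 0.40, 0.55]
system_b = [0.95, 0.90, 0.80, 0.50]

for name, preds in (("A", system_a), ("B", system_b)):
    print(name, round(brier_score(preds, results), 4), round(log_loss(preds, results), 4))
```

Lower is better for both. With enough real game logs, the same comparison could in principle be run between goko's system and isotropish.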

As for dropping you 22 spots, well, maybe that's right - seems to me that it thinks that you and these other guys are pretty closely bunched as is, and it wouldn't take much to flip it from "You're very slightly better than them" to "You're very slightly worse". It's certainly plausible.

Quote

I appreciate your skepticism, WW, but I can't shake the feeling that goko's ranking is too volatile. Whatever my skill is, five games out of 1079 shouldn't change my estimation of it that much (how many of those are over the last month I'm not sure, 50?).

I mean, sure, you can feel that way. My main point is that right now, all anyone has to go on either way is just feeling.

Quote

Quote
TrueSkill models things as a difference of normal distributions, with the combined normal having an extra parameter for intrinsic variability of the game.
If you think the analysis here, where the odds are given as an exponential function of TS difference, is mistaken I'd be interested to hear why.

I'm apparently missing something in that link, as what I see there is that skills are measured by Normal (AKA Gaussian) distributions. I don't see the exponential function showing up in what is actually used; I see it given as a comparison to other models (Elo as currently implemented most places, though not actually what Elo himself originally proposed), which have logistic bases. But for sure it says they use Normal distributions there. (See the 9th, 13th, and 14th posts there, as well as the linked paper).

flies · « **Reply #29 on:** March 30, 2014, 04:05:39 pm »

Quote

We DO know a way to rank the skill of players...

now i'm confused. if there is a way to rank the skill of players more accurately than TS or whatever goko does, why don't we use it?

Quote

As for dropping you 22 spots, well, maybe that's right - seems to me that it thinks that you and these other guys are pretty closely bunched as is, and it wouldn't take much to flip it from "You're very slightly better than them" to "You're very slightly worse". It's certainly plausible.

this is reasonable.

Quote

I'm apparently missing something in that link, as what I see there is that skills are measured by Normal (AKA Gaussian) distributions. I don't see the exponential function showing up in what is actually used; I see it given as a comparison to other models (Elo as currently implemented most places, though not actually what Elo himself originally proposed), which have logistic bases. But for sure it says they use Normal distributions there. (See the 9th, 13th, and 14th posts there, as well as the linked paper).

Ok, it's taken me a long time (months) to grok all this, and I haven't got my head entirely wrapped around it, but the exponential odds referred to in the link is certainly not what TS does. I'd be very interested to work out exactly what the win prediction under TS is, but I haven't got the time to work that out at the moment.

WanderingWinder · « **Reply #30 on:** March 30, 2014, 04:30:54 pm »

Quote from: flies on March 30, 2014, 04:05:39 pm

Quote
We DO know a way to rank the skill of players...
now i'm confused. if there is a way to rank the skill of players more accurately than TS or whatever goko does, why don't we use it?

Well, I'm not actually saying we know a way that is better than TS/what goko does/etc. I'm saying that those are methods, and we can measure how good they are. You're never going to find a perfect method, but you can measure different methods to see how good they are, and look for whichever thing is the best. At least, you can do that in principle, though nobody is actually measuring their accuracy right now.

Quote

Quote
As for dropping you 22 spots, well, maybe that's right - seems to me that it thinks that you and these other guys are pretty closely bunched as is, and it wouldn't take much to flip it from "You're very slightly better than them" to "You're very slightly worse". It's certainly plausible.
this is reasonable.

Quote
I'm apparently missing something in that link, as what I see there is that skills are measured by Normal (AKA Gaussian) distributions. I don't see the exponential function showing up in what is actually used; I see it given as a comparison to other models (Elo as currently implemented most places, though not actually what Elo himself originally proposed), which have logistic bases. But for sure it says they use Normal distributions there. (See the 9th, 13th, and 14th posts there, as well as the linked paper).
Ok, it's taken me a long time (months) to grok all this, and I haven't got my head entirely wrapped around it, but the exponential odds referred to in the link is certainly not what TS does. I'd be very interested to work out exactly what the win prediction under TS is, but I haven't got the time to work that out at the moment.


The 'exponential odds thing' is what is used by most every Elo system nowadays, as well as several variants. It's probably the most common nowadays, and it's relatively easy to compute.

TS uses this (simplified to a 2-player game here):
P_A_Wins = Normal_CDF(Mu_A - Mu_B, Sqrt(Sigma_A^2+Sigma_B^2+Sigma_Game^2))

where P_A_Wins is the probability player A wins and Normal_CDF is the cumulative distribution function of the Normal (or Gaussian) distribution, which you can't compute in closed form. You have to do it numerically, which used to effectively mean we looked it up in a table, though nowadays people do it on computers. Googling, the first thing that plops up is this: http://www.danielsoper.com/statcalc3/calc.aspx?id=53 though you can do it in e.g. Excel if you have that. The parametrization I give above is Normal(X, Standard Deviation), sometimes you see it as Mean, Variance. Uh, what else? Oh, I guess if the thing wants a mean, put it in as 0. And if you're trying to do it off of isotropish ratings, then you'll want to know that the displayed uncertainty is 3*sigma, not just straight sigma. I don't remember exactly what it uses for Sigma_game; 25/6 seems to be what I remember, but uh, that certainly could be wrong.

ragingduckd · « **Reply #31 on:** April 05, 2014, 06:17:25 pm »

Quote from: WanderingWinder on March 30, 2014, 04:30:54 pm

TS uses this (simplified to a 2-player game here):
P_A_Wins = Normal_CDF(Mu_A - Mu_B, Sqrt(Sigma_A^2+Sigma_B^2+Sigma_Game^2))

I think there's a small typo here. The σ²_Game needs a factor of 2, since removing the player uncertainties (σ_A = σ_B = 0) should yield the standard Gaussian Elo result: Φ(μ_A - μ_B, √2·β)

Also, if you want a non-zero draw probability, throw in a draw margin ε. For Isotropish, ε≈2.2, which corresponds to a (somewhat inaccurate) draw rate of 5%. More empirically accurate would be ε≈0.78 (1.75%).
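For anyone who wants to check those ε values, here is a sketch of the usual TrueSkill relation between draw rate and draw margin for a two-player game, p_draw = 2·Φ(ε / (√2·β)) − 1. The value β = 25 is an assumption, not something stated in this thread; it is used only because it reproduces both quoted figures. The function names are made up.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def draw_margin(draw_prob: float, beta: float) -> float:
    """Draw margin epsilon giving the requested draw rate between two equal players."""
    return STD_NORMAL.inv_cdf((draw_prob + 1.0) / 2.0) * sqrt(2.0) * beta


def draw_probability(epsilon: float, beta: float) -> float:
    """Inverse of draw_margin: the draw rate implied by a given margin."""
    return 2.0 * STD_NORMAL.cdf(epsilon / (sqrt(2.0) * beta)) - 1.0


BETA = 25.0  # ASSUMPTION: chosen because it matches the numbers quoted above
print(round(draw_margin(0.05, BETA), 2))    # margin for a 5% draw rate
print(round(draw_margin(0.0175, BETA), 2))  # margin for a 1.75% draw rate
```

If the real β differs, the margins scale linearly with it, so the ratio between the two ε values stays the same.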

WanderingWinder · « **Reply #32 on:** April 05, 2014, 06:49:41 pm »

Quote from: ragingduckd on April 05, 2014, 06:17:25 pm

Quote from: WanderingWinder on March 30, 2014, 04:30:54 pm
TS uses this (simplified to a 2-player game here):
P_A_Wins = Normal_CDF(Mu_A - Mu_B, Sqrt(Sigma_A^2+Sigma_B^2+Sigma_Game^2))

I think there's a small typo here. The σ²_Game needs a factor of 2, since removing the player uncertainties (σ_A = σ_B = 0) should yield the standard Gaussian Elo result: Φ(μ_A - μ_B, √2·β)

Also, if you want a non-zero draw probability, throw in a draw margin ε. For Isotropish, ε≈2.2, which corresponds to a (somewhat inaccurate) draw rate of 5%. More empirically accurate would be ε≈0.78 (1.75%).

1st, this was all off the top of my head, so it wasn't entirely precise. Forgot the 2, though it's not the most accurate thing to say that this is the "standard Gaussian Elo result" - all the actually-running Elo systems use a Logistic function, not a Gaussian; to be fair, Elo originally proposed Gaussian distributions, but... he doesn't actually say that this is the right figure; he explains in section 8.23 of his book (which is sitting on the arm of my chair as I type this) how you might get that figure, but eventually doesn't model players' variances separately, arguing that even if they're far different, it ends up not making so much difference. And indeed, in the Elo system, as he proposed it, there is just one variance that gets used, and it's an entirely irrelevant scale factor (well, ok, it makes a difference, but it's a scale factor - basically all it does is make the numbers different, without changing the predictions of the system, similar to a "double it, double the gaps between the ratings" (though not actually quite this simple)).

And actually, your explanation of where the 2 comes from doesn't make sense - the entire reason a two is there is because there are two players, and when you take a difference between two Gaussians, you get a Gaussian with mean equal to the difference of the means and variance equal to the sum of the variances, which if you have equal variances (sigmaA = sigmaB = sigma) is simply sigmaA^2 + sigmaB^2 = 2*sigma^2. That's actually why the 2 is there.
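The variance-sum rule is easy to sanity-check numerically. A quick simulation sketch, assuming independent draws (the means, sigma, sample size, and seed are arbitrary illustration values):

```python
import random
from statistics import fmean, pvariance


def simulate_difference(mu_a, sigma_a, mu_b, sigma_b, n=100_000, seed=0):
    """Sample A - B for independent Gaussians; return the sample mean and variance."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = [rng.gauss(mu_a, sigma_a) - rng.gauss(mu_b, sigma_b) for _ in range(n)]
    return fmean(diffs), pvariance(diffs)


# Equal sigmas of 3: theory says mean 5 - 3 = 2 and variance 3^2 + 3^2 = 2 * 9 = 18.
mean, variance = simulate_difference(5.0, 3.0, 3.0, 3.0)
print(round(mean, 2), round(variance, 2))
```

The sample values should land close to 2 and 18, up to sampling noise.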

As for the draw probability: yes, if you want to model the chance of a draw explicitly, you need a parameter. But I was following the common convention of treating a draw as a simultaneous half-win and half-loss. To put it more simply, I'm giving not the probability of a win but the expected score, which is equivalent to (Wins + Draws/2)/(Games).

But yeah, there should be a 2 there, and at least some forms of TS do explicitly model draws differently, so there is that.
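Putting the pieces together, here is a sketch of the two-player expected score with the factor of 2 included. It needs nothing beyond the Python standard library. The Sigma_Game value of 25/6 is the half-remembered figure from earlier in the thread and may well be wrong, the example ratings are invented, and the helper for isotropish uncertainties relies on the "displayed uncertainty is 3*sigma" remark above.

```python
from math import sqrt
from statistics import NormalDist


def sigma_from_displayed(displayed_uncertainty: float) -> float:
    """Convert a displayed uncertainty of 3*sigma back to sigma."""
    return displayed_uncertainty / 3.0


def expected_score(mu_a, sigma_a, mu_b, sigma_b, sigma_game):
    """Expected score for A (a draw counts as half a win), two-player case.

    Normal CDF with mean 0 and standard deviation
    sqrt(sigma_a^2 + sigma_b^2 + 2 * sigma_game^2), evaluated at mu_a - mu_b.
    """
    spread = sqrt(sigma_a ** 2 + sigma_b ** 2 + 2.0 * sigma_game ** 2)
    return NormalDist(mu=0.0, sigma=spread).cdf(mu_a - mu_b)


SIGMA_GAME = 25.0 / 6.0  # ASSUMPTION: recalled from memory in the thread, unverified

# Hypothetical players: A at 40 +/- 6 displayed, B at 30 +/- 9 displayed.
p = expected_score(40.0, sigma_from_displayed(6.0),
                   30.0, sigma_from_displayed(9.0),
                   SIGMA_GAME)
print(round(p, 3))
```

Swapping the two players gives exactly one minus this value, which is a handy check that the sign of the difference is the right way round.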

Dominion Strategy Forum
