Goko's rating system is swingy as hell. Since human skill changes only slowly over time, any estimate of it that's all over the place is horrible.
There are a few problems with this: 1) we don't actually know if it's swingy, really, because we don't know how much 5 points or 50 points or 500 points mean (admittedly, this isn't exactly a point in the system's favor)
We do know this from looking at the relative rankings. To give an example: earlier today I was briefly above Stef in the Goko rankings after winning a few games in a row, i.e. according to their rating system I would be a favorite in our next match-up, even though we have both played thousands of games on their site, over a hundred of them against each other. Goko's conclusion is clearly absurd, because there's no doubt in my (or isotropish's) mind that Stef is the better player. For some more examples, see this post by Andrew.
Well, I can go rebut that post if you want. Also, isotropish had Stef below you for a while several days ago, so there's some doubt in its mind (and even if we ignore that, it has you two within about one standard deviation right now, so it thinks there's a reasonable chance). Further, I guess I think players' skills move faster than you do, because I certainly wouldn't cite thousands of games as if they're all relevant. E.g., since the beginning of the month, you've played 99 games, 13 against Stef. He's played 168. You're 7-6 against him this month. Last month, you were 12-8 against him. No games in January. 1-3 in December. November, 3-2. Based on these results, anyway, you're pretty clearly better than him. I don't know what hundreds of games from ancient times you're banking on to support your "he's better than me", at least heads-up.
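For what it's worth, the month-by-month records quoted above can simply be tallied (a quick sketch — the per-month numbers come from the paragraph above, the aggregation is mine):

```python
# Head-to-head records quoted above, as (wins, losses) per month
records = {
    "this month": (7, 6),
    "last month": (12, 8),
    "December": (1, 3),
    "November": (3, 2),
}

wins = sum(w for w, _ in records.values())
losses = sum(l for _, l in records.values())
total = wins + losses

print(f"aggregate record: {wins}-{losses} ({wins / total:.1%} over {total} games)")
```

That's a winning record over the recent stretch, which is all the argument above leans on.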
2) While skill (presumably, though I don't have better than anecdotal evidence) doesn't change rapidly over time for most players, this doesn't mean that the best estimate of said skill isn't going to move around a bit
It should move around a bit when it has little evidence, but when it has hundreds or even thousands of games on you including a ton of recent data points, it should be changing very conservatively. And Goko's system isn't just moving around a bit, it's bouncing wildly to the tune of white noise.
When there are a ton of recent data points, it ought to move towards what those say, ignoring to some extent the older data. And again, you don't know how much it's moving around, because you only see points, and you don't know how much points are worth. It could be that all this random fluctuation is just bumping between 50.001% and 49.999%. Yes, you have rank data, but you don't have how MUCH it favors anyone against anyone else.
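To make the "you don't know how much points are worth" point concrete: Goko hasn't published any mapping from point gaps to win probabilities, which is exactly the problem. Purely for illustration, here's what various gaps would imply under an assumed Elo-style logistic curve with the conventional scale of 400 — both the functional form and the scale are guesses, not anything Goko has documented:

```python
def win_prob(point_diff, scale=400):
    """Win probability for the higher-rated player under an assumed
    Elo-style logistic mapping; `scale` is a guess, not Goko's value."""
    return 1 / (1 + 10 ** (-point_diff / scale))

for diff in (5, 50, 500):
    print(f"{diff:>4}-point gap -> {win_prob(diff):.1%} favorite")
```

Under these assumptions a 5-point swing barely moves the implied probability off 50%, while 500 points would be a heavy favorite — but with a different scale the same point swings could mean almost anything, which is the point.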
3) I don't actually agree with your assessment that their system is 'swingy as hell' - it actually doesn't seem to move that much at all to me. On the other hand, the isotropish ratings seem INCREDIBLY sluggish.
We clearly have very different expectations here, because I think the isotropish ratings are much too volatile still. After having played thousands of games and a sufficient volume recently, it shouldn't be possible to change by a few levels within a single day: the prior of your skill having changed significantly over a very short time-span should be close to zero (by the nature of skill acquisition and decay), so that any significant deviation from expectation over a small sample should be judged as a fluke and thus only very slightly affect ratings.
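The "prior of your skill having changed should be close to zero" idea can be made concrete with a toy one-dimensional Gaussian tracker (my sketch, not isotropish's actual TrueSkill machinery): when the assumed per-game drift in true skill is tiny, the posterior variance collapses after many games, and even a wildly surprising result barely moves the estimate.

```python
def update(mu, var, obs, obs_var=100.0, drift_var=0.01):
    """One Kalman-style Gaussian update of a skill estimate.
    drift_var is the assumed per-game variance of true skill change;
    setting it near zero is what makes the estimate conservative."""
    var += drift_var                   # skill may have drifted slightly
    gain = var / (var + obs_var)       # how much to trust the new result
    mu += gain * (obs - mu)
    var *= (1 - gain)
    return mu, var

mu, var = 25.0, 70.0    # vague starting estimate
for _ in range(1000):   # long history of results around true skill 30
    mu, var = update(mu, var, 30.0)

before = mu
mu, var = update(mu, var, -50.0)  # one wildly surprising game
print(f"one upset moved the estimate by {mu - before:.3f}")
```

All the constants here are made up; the takeaway is only the shape of the behavior — with near-zero drift variance, a single shock game shifts a well-established estimate by well under a point.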
Well, when I lose something like 15 out of 20 against someone (who was ranked reasonably high to start with) and it still has me at like 80% against them, I have to assume isotropish is too stodgy.
To be more clear here, the math doesn't really back you up. If my model has it as a 2% chance Bob beats Tim in any game, and Bob beats Tim 10 games in a row, there's one chance in something on the order of 10^17 of that happening. Your model is wrong, and needs to move. I don't care if you have 10k games, your ratings need to move significantly. Obviously, this is a pretty dramatic example, but even in more realistic scenarios you can pretty quickly get to things that are 1 in a thousand or worse, very very easily. Now it's possible that you just had that random luck pop up, but I think it's more likely that the players' skills weren't accurately recorded, possibly because of at least some degree of skill change.
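The arithmetic behind both examples is easy to check — the 2% figure is the hypothetical above, and the 15-of-20 at 80% is the scenario from a few paragraphs up:

```python
from math import comb

# Bob, a 2% underdog per game, winning 10 straight:
p_streak = 0.02 ** 10

# An 80% favorite winning 5 or fewer of 20 games (i.e. losing 15 of 20):
p_tail = sum(comb(20, k) * 0.8**k * 0.2**(20 - k) for k in range(6))

print(f"10 straight wins at 2%:       about 1 in {1 / p_streak:.0e}")
print(f"<= 5 wins of 20 at 80%:       about 1 in {1 / p_tail:,.0f}")
```

The second number — roughly one in several million — is why "your model is wrong and needs to move" is the sensible reading of that 15-of-20 run, rather than writing it off as variance.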
To make this a testable prediction: I predict that a running 30-day average of the isotropish ratings will be a significantly better predictor of the outcome of match-ups between players than the ratings as they currently are.
You think a rating taken as an average of the last 30 days of isotropish ratings will be a significantly better predictor of the outcome of match-ups between players than... the current isotropish ratings? There are lots of holes here that would need to be filled before this can be considered a testable prediction. First, how are you averaging? Arithmetic mean of mu and arithmetic mean of sigma? Do you have the historical data to calculate this? How do you time-average the data, since it updates in real time? Over what time period are you taking the measurement? Perhaps most important, how do you want to define "better predictor"? You need some kind of error function.
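On the "better predictor" point: the standard choices would be something like log-loss or Brier score over a fixed evaluation window. A sketch of the comparison machinery — the probabilities and outcomes below are made up for illustration, not actual rating data:

```python
from math import log

def log_loss(preds, outcomes):
    """Mean negative log-likelihood of the observed outcomes; lower is better."""
    return -sum(log(p if o else 1 - p) for p, o in zip(preds, outcomes)) / len(preds)

def brier(preds, outcomes):
    """Mean squared error of the predicted probabilities; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

# Hypothetical win probabilities from the two candidate systems for the
# same five games (1 = first player won), purely to show the comparison.
outcomes = [1, 0, 1, 1, 0]
current  = [0.70, 0.60, 0.55, 0.80, 0.50]  # current isotropish ratings
averaged = [0.65, 0.45, 0.60, 0.75, 0.40]  # 30-day-averaged ratings

print("log-loss:", log_loss(current, outcomes), "vs", log_loss(averaged, outcomes))
print("brier:   ", brier(current, outcomes), "vs", brier(averaged, outcomes))
```

Whichever system scores lower on the chosen error function over a pre-registered window of games would win the bet; picking the function (and the window) up front is exactly the hole that needs filling.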