Topic: Goko rating system (Read 20857 times)

ftl · « **on:** April 29, 2013, 02:43:51 pm »

They posted some stuff about the rating system, so I'm reposting it here. It was originally at https://getsatisfaction.com/goko/topics/summary_of_rating_calculations .

Quote from: jqs

Each player gets a rating between 0 and 10,000. All registered players are rated the same way, including bots. The system for calculating ratings is complicated and depends on many factors:
* Ratings are adjusted based on what place you came in, rather than how many victory points you won by. If you resign a game, it is treated as a loss.

* Ratings are also adjusted based on the ratings of your opponents. Beating a higher-rated opponent will help your rating much more than beating a lower-rated opponent.

* The system is conservative with its calculations, so that it is reasonably sure you deserve the rating before it awards it to you. In other words, your rating may be low when you haven't played many games and rise as the system gains more confidence that you deserve the higher rating.

* Your rating will fluctuate more when the system has little data about how skillful you are. As you play more, the system's confidence in its estimate grows and your rating will be affected less by results. Unexpected results, such as a win over a much higher-rated opponent, affect ratings the most.
* Your rating will fluctuate more with results against players with very established ratings.

They also responded to some questions, including why you could sometimes lose points after a win:

Quote from: Lord Humanton

>So why exactly can you lose points by winning?
The purpose of a rating system is to estimate your skill so it can be used predictively, not to make you feel good about winning. Suppose you have played one game, losing to a 6000 player. The rating system has to assign you some rating arbitrarily lower than 6000, say it picks 5000. Now suppose in your second game you beat a player who has a rating of 2000. Your overall data is:
* loss vs. 6000
* win vs. 2000
From that limited data, a logical guess for your rating would be right in the middle: 4000. That's all mathematically fine, but to the player, all he sees is that he just lost 1000 points (from 5000 to 4000) by winning. This is an extreme example and very simplified compared to what actually happens, but it illustrates the principle, which is a feature of most rating systems.

That said, we no longer allow you to lose points by winning. Instead, we're perverting the rating system a little to prevent this, only because it upsets players who have a hard time believing it's the best thing to happen in order to model their ability.

>Can't you just give us the exact formula you use?
Sorry, it's too complicated a system to plop into a formula, otherwise we would. But you shouldn't stress out over ratings or their calculation. It's better to understand the above principles if you're curious, use ratings as an aid to matching up opponents, and then just go out and enjoy playing the game!

Watno · « **Reply #1 on:** April 29, 2013, 02:56:54 pm »

I really wonder how they calculate their ratings if you can't put it into a formula^^

ftl · « **Reply #2 on:** April 29, 2013, 03:25:28 pm »

The main things I am curious about in the rating system formula are:

1) How much does "playing a lot" versus "actually being good" factor into a high rating?
2) ...does it matter who you play? On average, will you gain more rating for playing lower- or higher-ranked opponents? Obviously, for playing a lower-ranked opponent, you gain fewer points if you win and lose more if you lose, but you also have a higher probability of winning; does that balance out right, so that the expected value is the same whether you play up or play down?

...I'm mistrustful is because right now, I'm #3 in the pro ratings, and I completely don't believe that I'm *actually* the third-best player on Goko. I suspect the rating system is doing something wrong to put me there; I'm just not sure what. I was never particularly high on Iso...
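For question 2, standard Elo gives a clean answer, provided its logistic curve matches the real win probabilities: the expected change per game is K·(p - E), which is zero whenever the true win probability p equals the predicted score E, whether you play up or down. This is a sketch under that calibration assumption, using chess's 400-point scale and K = 32; Goko's 0 to 10,000 scale and update rule were never published, so it says nothing definitive about Goko's system.

```python
def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B on the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def expected_change(r_a: float, r_b: float, p_win: float, k: float = 32.0) -> float:
    """Average Elo change for A per game if A really wins with probability p_win."""
    e = elo_expected(r_a, r_b)
    gain_if_win = k * (1.0 - e)   # always positive: a win never costs points in Elo
    loss_if_lose = k * (0.0 - e)  # always negative
    return p_win * gain_if_win + (1.0 - p_win) * loss_if_lose  # equals k * (p_win - e)

player = 1800.0
for opponent in (1400.0, 1800.0, 2200.0):
    p = elo_expected(player, opponent)  # assume calibration: true win rate equals E
    change = expected_change(player, opponent, p)
    # change is zero up to floating-point rounding, so print its magnitude
    print(f"vs {opponent:.0f}: P(win)={p:.3f}, |expected change|={abs(change):.6f}")
```

The balance holds only if the curve is right. If the true win rate against much weaker opponents is above or below E (for example, if luck caps how often a favorite can win), then playing down becomes systematically profitable or costly.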

WanderingWinder · « **Reply #3 on:** April 29, 2013, 03:34:34 pm »

That's a load of crap. Let's run down the problems:
Having lower and upper caps is not good: artificial cut-offs and truncations distort things, no question.
The rest of the first blob is fine, except: "The system is conservative with its calculations, so that it is reasonably sure you deserve the rating before it awards it to you. In other words, your rating may be low when you haven't played many games and rise as the system gains more confidence that you deserve the higher rating."
Now, being conservative is fine (well, conservative is a relative term, but whatever: being more conservative than, say, expectations is fine and probably good). BUT being conservative does NOT mean giving you a lower rating when it's less sure; it means giving you a wider range at both the top and the bottom. Every argument you can make about the top you can also make about the bottom. Of course, it's possible to have an underlying asymmetric curve which would lead to differences; however, there is absolutely NO substantive evidence to support this, and it would have been incredibly irresponsible of them to do the amount of groundbreaking research needed to fit such a curve well.
But ok, this is mostly a gimmick to get you to play more, and to continue to encourage you as you go on, as the overall trend of the ratings is upwards. And well, it's not an uncommon gimmick, so I will try not to complain too much about it, even though it drives me nuts.
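The two readings of "conservative" here can be made concrete. TrueSkill-style leaderboards commonly display μ - 3σ, a lower bound that rises as the uncertainty σ shrinks even when the skill estimate μ never moves, which matches Goko's description; the alternative argued for above reports the symmetric range μ ± kσ. Goko never said which, if either, it uses, so this is an illustration with assumed TrueSkill-like numbers; `Belief` and the function names are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Belief:
    mu: float     # the system's best guess at your skill
    sigma: float  # how unsure it is about that guess

def displayed_lower_bound(b: Belief, k: float = 3.0) -> float:
    """Conservative display: only the pessimistic end of the range."""
    return b.mu - k * b.sigma

def displayed_range(b: Belief, k: float = 3.0) -> tuple[float, float]:
    """Symmetric display: uncertainty widens both ends equally."""
    return (b.mu - k * b.sigma, b.mu + k * b.sigma)

# Same skill estimate, shrinking uncertainty as more games are played.
for sigma in (8.0, 5.0, 2.0):
    b = Belief(mu=25.0, sigma=sigma)
    print(sigma, displayed_lower_bound(b), displayed_range(b))
```

With μ fixed at 25, the lower-bound display climbs from 1 to 19 purely because σ falls, while the symmetric range just narrows around 25.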

However, it's the second bit that is particularly... let's just say wrong.

"The purpose of a rating system is to estimate your skill so it can be used predictively, not to make you feel good about winning." This is actually fine, though I actually doubt that it's true. But if you want to be able to use the thing predictively, then why do you not allow people to, you know, predict anything with it? Like, have some way to convert rating differences into expected win percentage? Or at the very least, show going into a matchup what the system thinks the win% is. Because if you don't make predictions, you aren't being predictive.
So saying that the ratings aren't supposed to make you feel good is a true if not-very-good-for-PR statement. But support it!
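Turning a rating difference into a win percentage is a one-liner in TrueSkill-style systems. Below is a minimal sketch of the two-player, no-draw win probability under TrueSkill's Gaussian model, assuming TrueSkill's default performance noise β = 25/6; Goko never published its own model or parameters, so the numbers are illustrative only.

```python
import math

BETA = 25.0 / 6.0  # assumed per-game performance noise (TrueSkill's default)

def win_probability(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float,
                    beta: float = BETA) -> float:
    """P(A beats B) when skill ~ Normal(mu, sigma^2) and each game adds Normal(0, beta^2) noise."""
    spread = math.sqrt(2.0 * beta**2 + sigma_a**2 + sigma_b**2)
    z = (mu_a - mu_b) / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# Same means, different certainty: more uncertainty pulls the prediction toward 50%.
print(win_probability(30.0, 1.0, 25.0, 1.0))
print(win_probability(30.0, 6.0, 25.0, 6.0))
```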

"Suppose you have played one game, losing to a 6000 player."
Ok.
"The rating system has to assign you some rating arbitrarily lower than 6000,"
This is absolutely false. The system doesn't have to give you a rating at all, particularly in this situation. Indeed, an entirely reasonable - and in some cases used - approach is to not assign a rating at all until (at least) the winning percentage of a player is neither 0 nor 1.
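The point about withholding a rating while the win percentage is 0 or 1 has a concrete basis in the usual logistic (Elo/Bradley-Terry) model: with only losses, the likelihood keeps increasing as the rating falls, so there is no best finite guess. The same model, if every game is weighted equally and treated as simultaneous, also reproduces Lord Humanton's midpoint of 4000 for one loss to a 6000 and one win over a 2000. This is a sketch assuming a chess-style 400-point logistic scale; it is not Goko's method, which was never published.

```python
import math
from typing import Optional

SCALE = 400.0  # assumed Elo-style scale; Goko's actual scale and curve are unknown

def expected(r: float, opp: float) -> float:
    """Logistic win probability for rating r against rating opp (overflow-safe)."""
    x = (r - opp) * math.log(10) / SCALE
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def mle_rating(results: list[tuple[float, int]]) -> Optional[float]:
    """Maximum-likelihood rating from (opponent_rating, score) pairs, score 1 = win, 0 = loss.

    All games are weighted equally and treated as simultaneous (no recency).
    Returns None when only wins or only losses were observed: the likelihood then
    keeps growing as the rating runs off to +/- infinity, so no finite maximum exists.
    """
    scores = [s for _, s in results]
    if not scores or all(s == 1 for s in scores) or all(s == 0 for s in scores):
        return None

    def slope(r: float) -> float:
        # Derivative of the log-likelihood up to a positive factor; strictly decreasing in r.
        return sum(s - expected(r, opp) for opp, s in results)

    # With at least one win and one loss (and opponents well inside this window),
    # slope > 0 at lo and slope < 0 at hi, so bisection brackets the root.
    lo, hi = -1e6, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(mle_rating([(6000.0, 0)]))               # None: one loss, no finite estimate
print(mle_rating([(6000.0, 0), (2000.0, 1)]))  # about 4000, the midpoint
```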

"say it picks 5000. Now suppose in your second game you beat a player who has a rating of 2000. Your overall data is:
* loss vs. 6000
* win vs. 2000"
Actually, this isn't true. Or rather, it isn't thorough. At the very least, our overall data contains not only these things, but that the win vs. a 2000 came AFTER the loss vs. the 6000.

"From that limited data, a logical guess for your rating would be right in the middle: 4000." I guess you could say that that is "a logical guess", but in reality, it's a bad guess. This treats all data as if it happened simultaneously, which throws out the highly important recency information. Actually it's hard to know what we know without knowing what 1000 points mean. But if we look at it in a very very clear way, we can see that this is wonky. So after one game, we have some (vague) idea of what X's rating is, and we've (perhaps questionably) said that our best guess is 5000. Now this guy wins against a 2000. And our estimation of his skill goes down? This is telling me that his previous sum total of results (a loss against a 6000) is better than his current result (a win against a 2000).

What he is saying here is that it's better to lose against a 6000 than to win against a 2000.

Actually, if we look at it more closely, it seems that he's suggesting that winning against someone is a result that will always pull you towards their rating +1000 points, and a loss will always pull you toward their rating -1000 points. However, it doesn't take much thought to realize that this is one of the stupidest ideas for a rating system you've ever heard of, particularly when they allow ranges of values which are 10,000 points wide, and that a 2000-point spread here effectively represents at least a 100% win rate. Actually, it's worse than that, really. Because if this is the case, then imagine you have a player who is rated 3000 and another rated 5000. If the second guy plays the first, his rating *necessarily* decreases, and the other guy's *necessarily* increases. That is, your expectation is that the 5000 needs to win something like 200% of the time to *maintain* his current rating, and the other guy needs to win something like -100% of the time to maintain his. This is a linear curve, and it appalls me how bad of an idea it is.
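The linear reading can be checked directly. The sketch below is a hypothetical model (`pull_update`, `alpha`, and `offset` are names invented for illustration, not anything Goko described): a win moves you part way toward the opponent's rating + 1000, a loss toward the opponent's rating - 1000. Solving for the win rate at which the expected move is zero gives 150% for the 5000-rated player and -50% for the 3000-rated one under this literal reading; both fall outside 0 to 100%, which is the impossibility being described.

```python
def pull_update(rating: float, opponent: float, won: bool,
                alpha: float = 0.5, offset: float = 1000.0) -> float:
    """Hypothetical rule: move a fraction alpha of the way toward opponent +/- offset."""
    target = opponent + offset if won else opponent - offset
    return rating + alpha * (target - rating)

def break_even_win_rate(rating: float, opponent: float, offset: float = 1000.0) -> float:
    """Win probability p that makes the expected pull_update change zero (independent of alpha).

    p * (opponent + offset) + (1 - p) * (opponent - offset) = rating, solved for p.
    """
    return (rating - opponent + offset) / (2.0 * offset)

print(pull_update(5000.0, 3000.0, won=True))  # 4500.0: the favorite drops even after winning
print(break_even_win_rate(5000.0, 3000.0))    # 1.5, i.e. 150%
print(break_even_win_rate(3000.0, 5000.0))    # -0.5, i.e. -50%
```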

"That's all mathematically fine," Actually, it rather clearly is NOT.

"but to the player, all he sees is that he just lost 1000 points (from 5000 to 4000) by winning."
And to the system, this guy just achieved the best result possible given his situation, and it now estimates him to be a worse player for it. Indeed, if this is the system, no one ever has incentive to play anyone worse than 1000 points below them, ratings-wise, as they can't even tread water. And the system thinks that by agreeing to the game, you're a worse player.

"This is an extreme example and very simplified compared to what actually happens, "
Ok...
"but it illustrates the principle, which is a feature of most rating systems. "
This is total B******t. Most rating systems aren't nearly this bad. Please, provide any evidence for this. (Now of course, I am sure there are systems out there which have this supposed feature, but the VAST MAJORITY of them do not).

"That said, we no longer allow you to lose points by winning. Instead, we're perverting the rating system a little to prevent this, only because it upsets players who have a hard time believing it's the best thing to happen in order to model their ability. "
Okay, now you're just contradicting yourself. You said that the point of the rating system is to have predictive power and not to make you feel good about yourself. Now you're saying that you're changing your system because people are upset? Which is it? Because you're actually doubly wrong here. This is not a perversion of the rating system (well, it probably is in some sense, i.e. it's less true to the system's terrible foundations, but not in the sense of making the system better/worse in terms of predictions), and moreover it doesn't follow the principles you supposedly uphold.

" >Can't you just give us the exact formula you use?"
Now actually, I don't have a problem with not giving the exact formula. In the ideal world, you would have it. But there are a number of legitimate reasons for not giving it, so that's fine.

"Sorry, it's too complicated a system to plop into a formula, otherwise we would."
But this is again total baloney. I mean, go look at some of the rating systems that are out there. Go look at Glicko-Boost or something. You're telling me it's *more complicated* than that? What 'geniuses' do you have who came up with such an elaborate formula, and how can you afford to pay them? I can't even fathom something that can qualify as a rating system which spits out NUMBERS and is supposed to be numerically predictive and CAN'T be plopped into a formula. Would it be a complicated formula? Yes, of course. Could someone else figure it out? Are you seriously trying to tell me that you're arrogant enough to think that nobody ever could work it out? No, you have other reasons for this, and it sticks in my craw that you give this "it's too complicated to put in a formula" nonsense rather than actually just giving your real reasons, which may well be legit, but which, because you won't give them to us, start to look less so.
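For a sense of scale: Glicko-Boost itself is more involved, but its predecessor, Glickman's Glicko-1, fits in a couple dozen lines, including the "your rating moves more while the system is unsure" behavior Goko describes (the RD term). Below is a sketch of the single-rating-period Glicko-1 update; it omits the step that inflates RD between periods. The check uses the worked example from Glickman's own description of the system (rating 1500, RD 200, three games), which comes out near 1464 and 151.4.

```python
import math

Q = math.log(10) / 400.0

def g(rd: float) -> float:
    """Glicko-1 attenuation factor: results against uncertain opponents count for less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)

def expected(r: float, r_j: float, rd_j: float) -> float:
    """Expected score against an opponent rated r_j with deviation rd_j."""
    return 1.0 / (1.0 + 10.0 ** (-g(rd_j) * (r - r_j) / 400.0))

def glicko1_update(r: float, rd: float,
                   games: list[tuple[float, float, float]]) -> tuple[float, float]:
    """games: (opponent_rating, opponent_rd, score) for one rating period.

    Returns (new_rating, new_rd). Omits the between-period RD increase.
    """
    inv_d2 = Q**2 * sum(g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1.0 - expected(r, r_j, rd_j))
                        for r_j, rd_j, _ in games)
    denom = 1.0 / rd**2 + inv_d2  # larger RD means a smaller 1/rd^2 and a bigger step
    surprise = sum(g(rd_j) * (s - expected(r, r_j, rd_j)) for r_j, rd_j, s in games)
    return r + (Q / denom) * surprise, math.sqrt(1.0 / denom)

print(glicko1_update(1500.0, 200.0, [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]))
```

Because each game contributes g·(s - E) and a win has s = 1 > E, a single win can only raise the rating in this system.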

"But you shouldn't stress out over ratings or their calculation. It's better to understand the above principles if you're curious, use ratings as an aid to matching up opponents, and then just go out and enjoy playing the game!"
Now I'm fine with a "ratings don't matter that much" standpoint. But now you are telling me what I should stress out over and what I should enjoy?

Watno · « **Reply #4 on:** April 29, 2013, 04:00:35 pm »

Don't tell us, tell them.

WanderingWinder · « **Reply #5 on:** April 29, 2013, 04:02:17 pm »

Quote from: Watno on April 29, 2013, 04:00:35 pm

Don't tell us, tell them.

Working definition of insanity.

Lightning edit: And in all honesty, they have bigger fish to fry, even though this fries me.

ooksoo · « **Reply #6 on:** April 29, 2013, 04:06:00 pm »

Quote from: Watno on April 29, 2013, 04:00:35 pm

Don't tell us, tell them.

agreed

Kirian · « **Reply #7 on:** April 29, 2013, 05:57:59 pm »

Quote from: ftl on April 29, 2013, 02:43:51 pm

>Can't you just give us the exact formula you use?
Sorry, it's too complicated a system to plop into a formula, otherwise we would.

I... but... how....

COMPUTERS DO NOT WORK THAT WAY

DStu · « **Reply #8 on:** April 30, 2013, 01:41:52 am »

Quote from: WanderingWinder on April 29, 2013, 03:34:34 pm

"That's all mathematically fine," Actually, it rather clearly is NOT.

Mathematically, it's maybe alright; it's just that the model they apply the math to seems to be crap.

WanderingWinder · « **Reply #9 on:** April 30, 2013, 08:04:30 am »

Quote from: DStu on April 30, 2013, 01:41:52 am

Quote from: WanderingWinder on April 29, 2013, 03:34:34 pm
"That's all mathematically fine," Actually, it rather clearly is NOT.
Mathematically, it's maybe alright; it's just that the model they apply the math to seems to be crap.

Well, but I would say that is the math. If you have:

You have 60 feet of fencing and you want to build a rectangular fence with fencing on three sides and the natural barrier of a river on the fourth, what dimensions should you build the fence with to cover the maximum area?

And you answer with "I have a system. Every side is equal length. And 60/3 = 20 feet on each side." Is that 'mathematically fine'? Because 60/3 does come out as 20. But the math really is in the set-up. And that's what they have wrong.

DStu · « **Reply #10 on:** April 30, 2013, 08:18:40 am »

First, I was joking and not really confident of it myself, so not much point in arguing.

Anyway, I think what is math and what is model depends on what axioms you see as given. You can have a "rating system" with absurd requirements, fulfil them and make perfectly fine math on them, in this case your model i.e. the requirements is crap. Or you can have good requirements and apply bad math on it. From our view on goko these are probably indistinguishable.
You can probably also say that the task "build a good rating system" includes identifying the right requirements. Probably depends on what you see as "the task".

Edit: @example: Here of course you have a mathematical error. There is a well-defined task and you make an error in one step assuming the heuristics solves the problem. But I don't think there is such a well-defined task in "build a rating system". Defining what that actually means could be understood as modelling.

LastFootnote · « **Reply #11 on:** April 30, 2013, 11:28:07 am »

Quote from: WanderingWinder on April 30, 2013, 08:04:30 am

You have 60 feet of fencing and you want to build a rectangular fence with fencing on three sides and the natural barrier of a river on the fourth, what dimensions should you build the fence with to cover the maximum area?

And you answer with "I have a system. Every side is equal length. And 60/3 = 20 feet on each side." Is that 'mathematically fine'? Because 60/3 does come out as 20. But the math really is in the set-up. And that's what they have wrong.

Oof, I'm ashamed to admit I had to relearn some algebra for that one. To think I got a B.A. in Mathematics not 10 years ago.

pinkymadigan · « **Reply #12 on:** April 30, 2013, 11:46:42 am »

Regarding figuring out the proper answer to the river/fence example: what is the proper setup? Assuming I could vary the lengths in 1-foot increments, I wrote a quick script to find the maximum configuration, which gave me sides of 15, 30, 15 for an area of 450 square feet. But from a mathematical standpoint, what's the proper way to set up that problem? This seems like something I should know, but these days when I have a math problem like this I find it easier to program something to find the answer for me.
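The script itself wasn't posted; a minimal sketch of the brute-force approach described (whole-foot lengths only, river as the fourth side) might look like this. The function name and return shape are illustrative.

```python
def best_fence(total: int = 60) -> tuple[int, int, int]:
    """Try every whole-foot split of the fencing; the river is the fourth side.

    Returns (area, perpendicular_side, parallel_side).
    """
    best = (0, 0, 0)
    for side in range(1, total // 2):  # each of the two sides perpendicular to the river
        parallel = total - 2 * side    # the side opposite the river
        area = side * parallel
        if area > best[0]:
            best = (area, side, parallel)
    return best

print(best_fence())  # (450, 15, 30)
```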

Kirian · « **Reply #13 on:** April 30, 2013, 11:56:03 am »

pinky:

Call the sides perpendicular to the river length x. Then the side parallel to the river has length 60 - 2x. The area of the enclosure is x(60 - 2x) = 60x - 2x². To find the maximum of that quadratic, we determine the derivative and find the zero of the derivative, which is the highest point on the quadratic because the squared term is negative. So:

(d/dx)(60x - 2x²) = 60 - 4x

Find the zero: 60 - 4x = 0... which means x = 60/4 = 15 ft; that's the two perpendicular sides. Then the parallel side is 60 - 2x = 30 ft.

pinkymadigan · « **Reply #14 on:** April 30, 2013, 12:26:30 pm »

Quote from: Kirian on April 30, 2013, 11:56:03 am

pinky:

Call the sides perpendicular to the river length x. Then the side parallel to the river has length 60 - 2x. The area of the enclosure is x(60 - 2x) = 60x - 2x². To find the maximum of that quadratic, we determine the derivative and find the zero of the derivative, which is the highest point on the quadratic because the squared term is negative. So:

(d/dx)(60x - 2x²) = 60 - 4x

Find the zero: 60 - 4x = 0... which means x = 60/4 = 15 ft; that's the two perpendicular sides. Then the parallel side is 60 - 2x = 30 ft.

Yep, that all looks familiar.

Thanks. I gotta get out of programming financial stuff. It's boring, and seeing stuff like this really makes me wish I had ended up in simulation programming, where you really get to flex your math muscles all the time.

WanderingWinder · « **Reply #15 on:** April 30, 2013, 01:19:53 pm »

Quote from: DStu on April 30, 2013, 08:18:40 am

First, I was joking and not really confident of it myself, so not much point in arguing.

Anyway, I think what is math and what is model depends on what axioms you see as given. You can have a "rating system" with absurd requirements, fulfil them and make perfectly fine math on them, in this case your model i.e. the requirements is crap. Or you can have good requirements and apply bad math on it. From our view on goko these are probably indistinguishable.
You can probably also say that the task "build a good rating system" includes identifying the right requirements. Probably depends on what you see as "the task".

Edit: @example: Here of course you have a mathematical error. There is a well-defined task and you make an error in one step assuming the heuristics solves the problem. But I don't think there is such a well-defined task in "build a rating system". Defining what that actually means could be understood as modelling.

The problem I have here is that they define what it means - they say it is supposed to be predictive of future matches. And actually, by any criteria I've ever heard for a rating system, the math doesn't add up.

By what you're saying, ANY math could be correct.

qmech · « **Reply #16 on:** April 30, 2013, 02:33:45 pm »

There's a cute trick for the river and fence. If you reflect your fence in the river then you're just building rectangular fences, and it's easy to show that the square has the largest area among rectangles with the same perimeter.
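The reflection argument written out, with x for each side perpendicular to the river and y for the side parallel to it:

```latex
\begin{aligned}
&\text{Fence: } 2x + y = 60, \qquad \text{enclosed area } A = xy.\\
&\text{Reflected: a closed rectangle with sides } 2x \text{ and } y,\\
&\qquad \text{perimeter } 2(2x) + 2y = 120, \qquad \text{area } 2xy = 2A.\\
&\text{With } a = 2x,\ b = y,\ a + b = 60:\quad ab \le \left(\tfrac{a+b}{2}\right)^2 = 900,\ \text{equality iff } a = b.\\
&\Rightarrow\ 2x = y = 30, \qquad x = 15, \qquad A = 15 \cdot 30 = 450.
\end{aligned}
```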

theory · « **Reply #17 on:** April 30, 2013, 02:44:28 pm »

Quote from: qmech on April 30, 2013, 02:33:45 pm

There's a cute trick for the river and fence. If you reflect your fence in the river then you're just building rectangular fences, and it's easy to show that the square has the largest area among rectangles with the same perimeter.

I think this one belongs in The Book.

DStu · « **Reply #18 on:** April 30, 2013, 02:44:37 pm »

I think we both agree on what they have done wrong, and whether that could possibly be called math or model or whatever is quite unimportant.

dsc · « **Reply #19 on:** January 28, 2014, 11:04:44 pm »

I don't understand why they spent time on this at all. It's basically a solved problem. ELO has been around since the 30's; it's used for a bunch of serious competitive games (like Chess, Go, some MLB player ratings, etc). Microsoft's TrueSkill[1], the multiplayer variant, has plenty of solid open-source implementations[2]. I think we all know this because Iso used it.

But like, even if TrueSkill doesn't suit your fancy, skill-estimation rating systems are well-studied and there are tons of them. You really need a good reason to come up with a new one. If their argument was "we decided TrueSkill didn't suit our needs and here's a careful discussion of the differences" then I'd be totally open to blazing some trails. For example, Blizzard didn't use it for the StarCraft 2 ladders[3]. They wanted matching to maintain a sort of camaraderie if you played frequently. So they traded off on precision of skill-estimation in favor of making the humans have more fun. That seems reasonable.

But that's not what we get. We're told that math is hard and everybody should just chill out or go shopping. In fact, we're explicitly told that the system wasn't there to make us happy, which seems suspicious. In fact, it mostly sounds to me like "we didn't do our homework (also we can't keep our servers up haha suckers have some lag!!1)". I, for one, do not find this argument compelling.

[1] http://research.microsoft.com/en-us/projects/trueskill/
[2] https://github.com/search?q=trueskill -- The implementation isotropic used: https://github.com/dougz/trueskill
[3] http://www.sirlin.net/blog/2010/7/24/analyzing-starcraft-2s-ranking-system.html
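To give a sense of what those implementations compute, here is a minimal sketch of the TrueSkill update for a single two-player game with a winner and a loser, using the defaults from Microsoft's description (μ = 25, σ = 25/3, β = 25/6). It is simplified: no draw margin and no dynamics term τ, so it is not a drop-in replacement for the dougz/trueskill code isotropic used. One property relevant to this thread: the winner's mean always goes up, because v(t) is always positive.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # per-game performance noise (TrueSkill default)

@dataclass
class Rating:
    mu: float = 25.0
    sigma: float = 25.0 / 3.0

def _pdf(t: float) -> float:
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def _cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def rate_win(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Return updated (winner, loser) after one decisive two-player game (no draws)."""
    c = math.sqrt(2.0 * BETA**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    # Mean-shift factor, always > 0. Extreme upsets (t far below zero) make _cdf
    # underflow; production code computes v(t) in a numerically stable way.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)  # variance-shrink factor, between 0 and 1
    new_winner = Rating(winner.mu + winner.sigma**2 / c * v,
                        winner.sigma * math.sqrt(1.0 - winner.sigma**2 / c**2 * w))
    new_loser = Rating(loser.mu - loser.sigma**2 / c * v,
                       loser.sigma * math.sqrt(1.0 - loser.sigma**2 / c**2 * w))
    return new_winner, new_loser

print(rate_win(Rating(), Rating()))
```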

yed · « **Reply #20 on:** January 29, 2014, 04:08:49 am »

Comparing with Isotropish when it was up, they probably use TrueSkill with different parameters (and some bugs). I think they can't say so, to avoid licensing problems.

Schneau · « **Reply #21 on:** January 29, 2014, 07:10:54 am »

Since this thread was zombied, might as well make use of it. Here's a quote from the new forum FAQ:

Quote from: http://forum.makingfun.com/showthread.php?4179-Leaderboards-amp-Ratings

What should I do if I'm losing so badly, I just want to end the game?
If it is a two-player game, you should resign the game. You can resign by clicking on the options button and then choosing Resign. Resigning a two-player game is fine etiquette: you admit you have lost the game and your opponent's time is not wasted playing out to a foregone conclusion. On the other hand, simply closing the browser tab or leaving the window so it times out is known as "quitting" and is considered bad sportsmanship. In this case, your opponent must wait for you to move until the system finally determines you are not there and adjudicates the game. In games with three or more players, leaving the game in the middle ruins the dynamics of the game for the remaining players, so please stick it out to the end in those games.

Is "ruins the dynamics of the game" a euphemism for "quits the game entirely because it's coded poorly", or do they not know that this is what happens?

WanderingWinder · « **Reply #22 on:** January 29, 2014, 07:22:42 am »

Quote from: dsc on January 28, 2014, 11:04:44 pm

I don't understand why they spent time on this at all. It's basically a solved problem. ELO has been around since the 30's; it's used for a bunch of serious competitive games (like Chess, Go, some MLB player ratings, etc). Microsoft's TrueSkill[1], the multiplayer variant, has plenty of solid open-source implementations[2]. I think we all know this because Iso used it.

Minor point: Elo (which I prefer to capitalizing it ELO, considering it's named after a man) has been around since the 50s. It was first implemented in 1960.

And actually, there have been problems raised with all of these systems, and they look for alternatives for these reasons. Of course, I wouldn't think Goko are really the people to solve them, especially considering where they have been with staffing and how many bigger problems they have. But there are reasons why you might want a different system, to be fair.

factotumjack · « **Reply #23 on:** January 29, 2014, 11:34:02 pm »

Just wanted to mention how much, as a grad student in Statistics, this thread makes me feel at home.

I knew this game attracted a lot of math fans, but this really drove it home for me.

dsc · « **Reply #24 on:** January 30, 2014, 08:50:30 am »

1. raar brains.
2. It's not so much the perfection of the existing solutions as the fact that Goko has a bevy of bigger and more obvious fish to fry. Like, say, keeping the servers up. Or reducing lag somewhere below geologic timescales. As a software engineer, `pip install py-trueskill` sounds pretty great when I've been awake for 3 days patching things with duct-tape and tears.

Dominion Strategy Forum
