Topic: The WW Rating System (Read 31229 times)

WanderingWinder · « **on:** April 21, 2013, 02:58:53 pm »

Current Rating List

Rank ID Rating Uncertainty Name

  1  13    6345   791 lespeutere
  2   9    4724  1684 qmech
  3   6    3400   796 Watno
  4   1    2944  1616 WanderingWinder
  5  19    1096   779 SheCantSayNo
  6  15     908  2140 A Drowned Kernel
  7  22       0  2500 Slyfox
  8  18       0  2500 snorkelbike
  9  14       0  2500 jonts26
 10  10    -147  2357 ftl
 11   4    -298  2418 Tables
 12  12   -1287  2254 Qvist
 13   2   -2484  2382 michaeljb
 14   8   -2664  2363 Schneau
 15   7   -2998  2454 Kirian
 16  21   -3808  2266 gman314
 17   3   -4529  2065 Twistedarcher
 18   5   -4647  2284 jsh357
 19  17   -4794  2288 Beyond Awesome
 20  20   -4849  1967 Rabid
 21  16   -5907  2392 serakfalcon
 22  11   -8546  1655 markusin

For general info (used to be here), please see the next post.

WanderingWinder · « **Reply #1 on:** April 21, 2013, 02:59:13 pm »

Dissatisfaction with the Goko ratings, as well as (and quite frankly, more importantly than) my own personal interest in rating systems has led to me developing my own system, which I am here calling the WW system. It only works for 2-player games. Hypothetically, there are extensions for more players, BUT I would need to do some math to extend things (well, mostly to check that they are extended correctly), and I would want to keep separate ratings anyway, as the game is very different with different numbers of people! So okay, anyway, I might potentially possibly do that down the road at some indeterminate point in the future. But not now.

I'm not going to divulge all the particular formulae, at least not yet. I may want to patent the system, as it's of professional interest to me. Having said that, it's not like I'm doing a ton of stuff, particularly with the version I'm starting out with here, that's super groundbreaking from the conceptual level that players are generally interested in. So I will explain the basic tenets of the system here. I will also show you how to figure out your expected win% in any matchup, since right now that is just going to be based on the logistic curve.

Also, feel free to ask me whatever questions you have about things you don't understand (e.g. Why did this number do this after that happened? Or why is this a bigger difference than that?).

There are three numbers you're going to see that are associated with each player.
ID - this is an integer that lets me identify you in the system. It has no statistical significance.

Rating - this is the big important number to look at. After every game, the system will re-evaluate your rating. If you won, your rating will go up; if you lost, it will go down. How MUCH it moves is dependent on a few different things. One is how much the system expected you to win (which is based on your rating). The more it expected you to win, the fewer points you can gain, and the more you can lose, and vice versa. Exactly how much your rating actually moves is dependent on the third number, the uncertainty. If you and your opponent had the same uncertainty, then your rating change is a straight multiplication of the uncertainty with the percentage. If they were different, then your rating will change somewhat less than your uncertainty if theirs was higher than yours, or somewhat more than your uncertainty if theirs was lower than yours.

Uncertainty - this is an ancillary number. It tells us by how much, on average, you can expect your rating to change in the next game. *It is not a standard deviation.* It changes a lot. First, every day it retreats back towards its original value of 2500. The closer you are to that value, the slower this retreat happens. Second, like your rating, it changes after every game. Since each game gives us more information, playing a game will *generally* make the uncertainty decrease. However, the amount of change will depend on how unlikely the system thought the result of the game was - the more likely, the faster your uncertainty decreases, etc. As it's set up right now, it will generally take a result that the system thinks is about 40% likely (or less) to actually make your uncertainty go up. Because your uncertainty can go up, it is possible to exceed the original value. Also, it's THEORETICALLY possible for you to get arbitrarily large uncertainties this way, BUT this requires your uncertainty to essentially be continually increasing, which is something that ought not happen for nearly long enough to see this unless you're trying to engineer it to be so - in which case, you're going to stick out very blatantly as having colluded on your results.

Now some notes on the actual numbers you're going to see. First of all, the initial rating, and ergo the 'average' rating (at least, the average at first - this won't necessarily remain the case), is going to be 0. So lots of people are going to have negative ratings - get over it. The rating is just a number, and moreover, the multiplicative scale is entirely irrelevant. Someone who has a rating of 10,000 is not 'twice as good' as someone who has a rating of 5,000. They're just 5,000 points higher. If you actually want to know your expected winning% against any particular person, here's how to calculate it:

Win% = (1.0001^dR)/(1+(1.0001^dR)), where ^ means taking the thing on the left to the power of the thing on the right, and dR is the difference of your ratings - specifically, your rating minus the other guy's.

As you can see, only the difference between two ratings has any significance. Okay, furthermore, this is a pretty wide rating scale. If you take first-player advantage to be around 55-45, then that's about 2000 points (roughly). So a few hundred points here and there don't mean much at all. It takes a little over 4000 points to get a 60-40 advantage. Almost 11,000 for a 75-25 split, and 22,000 for 90-10. Nearly 30,000 points to get a 95-5 difference, and over 45,000 to get to 99%. I fully expect that there will be tens of thousands of point gaps very quickly, and they can get reversed pretty quickly as well. One of the nice features of the system is that if you get significantly better, the system will take notice relatively more quickly than many other systems - sustained outperformance of expectations can lead to an acceleration of how fast you gain points. So it's not all that hard to turn things around.
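For anyone who would rather not do the exponent by hand, here is a minimal Python sketch of the win% formula above, plus its inverse (how big a gap you need for a given win%). The function names are made up for illustration; only the 1.0001 base and the formula itself come from the post.

```python
import math

BASE = 1.0001  # base of the logistic curve given in the post


def win_probability(my_rating, opp_rating):
    """Expected win fraction: BASE^dR / (1 + BASE^dR), with dR = mine - theirs."""
    d_r = my_rating - opp_rating
    odds = BASE ** d_r
    return odds / (1.0 + odds)


def gap_for_probability(p):
    """Rating gap dR at which the formula gives win probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p)) / math.log(BASE)


# Reproduce the rough point gaps quoted in the post.
for target in (0.55, 0.60, 0.75, 0.90, 0.95, 0.99):
    print(f"{target:.0%}: about {gap_for_probability(target):,.0f} points")

# Example matchup using two ratings from the current list (6345 vs 4724).
print(f"{win_probability(6345, 4724):.1%}")
```

Note that only the difference of the two ratings enters the calculation, so adding the same constant to everyone's rating changes nothing.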

Please note that right now, there is no systemic first player advantage. I could potentially incorporate this on a later pass, though I do have some misgivings, but I would want some data on the size of the thing first.

I should note that the system has never had a practical test - this will be the first. There are also a couple of things that need tweaking (how fast the uncertainty moves, but the biggest actually being the underlying win estimation function - I don't think logistic is terribly, er, right, for Dominion especially, but without more empirical data, it's the best null starting point).

Anyway, here's what I would like for you guys to do:
Sign up for the system. I will give you an ID. Every day (I will try to do this in the mornings), I will update the rating list with the games from the previous day. What that means I need from you guys is the results. In order for this to work in a way that doesn't make me pull my remaining hair out of my head, I am going to need it formatted particularly:
ID1 ID2 Result ReasonForGameEnd Notes

Where ID1 is the ID of the player who went first in the game, ID2 is that of the player who went second, result is either 1, 0.5, or 0, depending on whether the first player won, drew (i.e. it was a tie, you both rejoiced in your shared victory), or lost, respectively (PLEASE NOTE, this is the FIRST PLAYER, i.e. the ID of the guy with ID1, NOT NECESSARILY YOU). ReasonForGameEnd will be a number from 0-7. 0 is for resignation, 1 is for a 3 pile ending, 2 is for provinces running out, 3 is for both a 3-pile ending AND province, 4 is for colonies running out, 5 is for colonies and 3 piles, 6 is for colonies and provinces, and 7 is for colonies, provinces, and 3 piles (A free +1 to whoever figures out the logic here). Notes can include whatever you want to put - the score, the number of turns, who took the last turn, which piles ran out, the kingdom, strategic notes, a link to the log, what lobby the game was played in, whatever you want, SO LONG AS THERE ISN'T A LINE BREAK.

Please note the spaces between the 5 quantities here. They are absolutely essential. You cannot lack these spaces, nor can you have other spaces, EXCEPT in the notes, where anything is fine.
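To make the format concrete, here is a rough Python sketch of how one result line could be checked and read. This is not the actual program behind the ratings (that has not been posted); the names `GameResult`, `parse_result_line` and `describe_end_reason` are hypothetical, and the sample line at the bottom is made up. The end-reason decoding assumes the 0-7 codes are sums of 1 (3 piles), 2 (provinces) and 4 (colonies), which is what the list of codes works out to.

```python
from dataclasses import dataclass

# Flags implied by the 0-7 codes: each ending condition is one bit.
THREE_PILES, PROVINCES, COLONIES = 1, 2, 4


@dataclass
class GameResult:
    first_id: int    # ID of the player who went first
    second_id: int   # ID of the player who went second
    result: float    # 1, 0.5 or 0, from the FIRST player's point of view
    end_reason: int  # 0-7
    notes: str       # free text, may contain spaces


def parse_result_line(line):
    """Parse 'ID1 ID2 Result ReasonForGameEnd Notes' (single spaces only)."""
    # Split at most 4 times so that spaces inside the notes are preserved.
    parts = line.rstrip("\n").split(" ", 4)
    if len(parts) < 4:
        raise ValueError("need at least ID1 ID2 Result ReasonForGameEnd")
    id1, id2, result, reason = parts[:4]
    notes = parts[4] if len(parts) == 5 else ""
    if result not in ("1", "0.5", "0"):
        raise ValueError(f"result must be 1, 0.5 or 0, got {result!r}")
    end_reason = int(reason)
    if not 0 <= end_reason <= 7:
        raise ValueError(f"reason must be 0-7, got {end_reason}")
    # int() rejects empty strings, so a doubled space between fields fails here.
    return GameResult(int(id1), int(id2), float(result), end_reason, notes)


def describe_end_reason(code):
    """Turn a 0-7 code into the list of ending conditions it encodes."""
    if code == 0:
        return ["resignation"]
    names = []
    if code & THREE_PILES:
        names.append("3 piles")
    if code & PROVINCES:
        names.append("provinces")
    if code & COLONIES:
        names.append("colonies")
    return names


game = parse_result_line("1 2 0.5 3 made-up example, 20 turns each")
print(game, describe_end_reason(game.end_reason))
```

Splitting on a single space (rather than on any whitespace) is a deliberate choice here: it makes extra spaces between the first four fields an error instead of silently accepting them.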

The easiest way I think to do this will be to put the results in a Google Doc Spreadsheet. I've set one of these up, and when you sign up, I can PM you the link to it. Someone with more experience with Google docs can hopefully help me with any issues I have there, as I haven't really used them before. However, unless there are some particular issues with simultaneous editing, it seems to be pretty intuitive, so I don't anticipate any particular problems. Anyway, every day will be its own sheet, which I will lock after that day. I will probably create a few days' worth of sheets in advance, in case I am late getting around to it, and also for everything before I wake up.

PLEASE ALSO NOTE THAT THE ORDERING OF GAMES MATTERS somewhat, so please make sure all of your games are reported in order. It doesn't matter if you post your game after someone else who finished after you, so long as each player's games are in order - throw that order off, and your rating will come out wrong somehow. Also, please make sure EACH GAME IS IN THERE JUST ONCE. If it's in there twice, I will assume you played twice with the same outcome, and rate both. Now, I would encourage both players to check and make sure the thing is put in correctly, but don't double-rate the thing in the meantime.

My suggestion would be to put in the IDs as soon as the game is made, to hold your spot for the game, and then fill in the results afterwards.

Feel free to put names in rather than IDs TEMPORARILY (i.e. you want to finish out a string of games with the same guy before putting all the results in all nice-like), but don't leave them in there, as I haven't written a program to parse this information.

I would suggest a particular lobby for these games - say Outpost II, which should otherwise be largely deserted. You can put your ID stamp on the game. But however you want to find matches, I don't care, so long as both players are in the system. Also, these don't have to be Goko games - they can be any games you want, so long as both players are in the system and agree to it.

I may have some bugs some days which stop me from running the ratings immediately. Please try not to fret too much over this - I do have other things to do, and a bug is going to probably take longer than the time I have to devote to this, at least until I can get a break, or more likely, home from work (unless I am lucky and it's incredibly simple). The effect on your rating will be nothing, except your inability to see it for a while.

Also, you can probably expect about a day's worth of lag between signing up and being 'active', i.e. having your name appear in the list and me getting the google doc stuff to you.

WanderingWinder · « **Reply #2 on:** April 21, 2013, 02:59:25 pm »

Reserved for Player List

ID   Name         Goko Username
1   WanderingWinder         WanderingWinder
2   michaeljb         michaeljb
3   Twistedarcher         Twistedarcher
4   Tables         Tables
5   jsh357         JSH357
6   Watno         Watno
7   Kirian         Kirian
8   Schneau         Schneau
9   qmech         qmech
10   ftl         ftl
11   markusin         markusin
12   Qvist         Qvist
13   lespeutere         LESPEUTERE
14   jonts26         jonts
15   A Drowned Kernel         A Drowned Kernel
16   serakfalcon         a mad mongoose - marvelously
17   Beyond Awesome         Beyond Awesome
18   snorkelbike         Snorkelbike
19   SheCantSayNo         SheCantSayNo
20   Rabid         Rabid
21   gman314         Graham Ward
22   Slyfox         Slyfox
23

WanderingWinder · « **Reply #3 on:** April 21, 2013, 02:59:38 pm »

Yesterday's Rating List

Rank ID Rating Uncertainty Name

  1  13    6345   774 lespeutere
  2   9    4724  1676 qmech
  3   6    3400   778 Watno
  4   1    1902  1655 WanderingWinder
  5  19     937   772 SheCantSayNo
  6  15     908  2136 A Drowned Kernel
  7  22       0  2500 Slyfox
  8  18       0  2500 snorkelbike
  9  14       0  2500 jonts26
 10  10    -147  2355 ftl
 11   4    -298  2417 Tables
 12  17   -1170  2386 Beyond Awesome
 13  12   -1287  2252 Qvist
 14   2   -2484  2380 michaeljb
 15   8   -2664  2361 Schneau
 16   7   -2998  2454 Kirian
 17  21   -3808  2264 gman314
 18   3   -4529  2061 Twistedarcher
 19   5   -4647  2281 jsh357
 20  20   -4849  1961 Rabid
 21  16   -5907  2391 serakfalcon
 22  11   -8546  1647 markusin

WanderingWinder · « **Reply #4 on:** April 21, 2013, 02:59:52 pm »

Reserved for miscellany

WanderingWinder · « **Reply #5 on:** April 21, 2013, 03:01:50 pm »

Oh, please let us know your goko username, particularly if it's different from your forum name here, when you sign up.

michaeljb · « **Reply #6 on:** April 21, 2013, 03:08:25 pm »

Sign me up; still michaeljb on Goko.

Twistedarcher · « **Reply #7 on:** April 21, 2013, 03:09:31 pm »

I'll sign up - Twistedarcher on Goko. If anyone wants to play games and get in on the rating system, I'll be in Outpost II.

Tables · « **Reply #8 on:** April 21, 2013, 03:15:19 pm »

Logic behind the game end is binary (do I need to explain more than that?)

It sounds interesting. Sign me up.

Also: While negative and positive rating probably mean nothing to us, they can be very unpleasant to a casual gamer. For that reason you might want to consider making the scaling such that almost everyone will be positive.

jsh357 · « **Reply #9 on:** April 21, 2013, 03:21:26 pm »

JSH357

Watno · « **Reply #10 on:** April 21, 2013, 03:25:11 pm »

I kinda hoped you would generate the ratings from the logs, which would mean much more data.
This way just games against other people in the ratings will count?

Anyway, sign me up (Watno)

Kirian · « **Reply #11 on:** April 21, 2013, 03:26:56 pm »

Quote from: Tables on April 21, 2013, 03:15:19 pm

Logic behind the game end is binary (do I need to explain more than that?)

I should hope you don't need to. I immediately thought of chmod RWX permissions...

OK, sign me up. I don't know just how many games I'll get into the system as I tend to play at weird times, and I'm quite busy lately and until the end of May at least.

Schneau · « **Reply #12 on:** April 21, 2013, 03:28:29 pm »

Agreed with Watno. But, sign me up anyway (Schneau).

I have to say, one thing that often annoys me in things like rating systems (and other areas like board game points) is when numbers are really big without needing to be. You could divide all ratings by 1000 here and get numbers more in the double digits range, which is easier for people to understand than numbers in the tens of thousands range. You could even keep the actual ratings behind the scenes the same, but just divide them by 1000 when reporting them.

Watno · « **Reply #13 on:** April 21, 2013, 03:30:47 pm »

What happens on a tie? No rating change?
Also, anyone up for games?

WanderingWinder · « **Reply #14 on:** April 21, 2013, 03:32:21 pm »

Quote from: Watno on April 21, 2013, 03:30:47 pm

What happens on a tie? No rating change?
Also, anyone up for games?

On a tie, put it in as 0.5 in the result, and the person with the higher rating will see a rating decrease, whilst the person with the lower rating will see an increase.
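A small sketch of why that is, in Python: a tie counts as 0.5, and the higher-rated player was expected to score more than 0.5, so they underperformed. This only shows the direction of the change; the size depends on the uncertainty formulas, which haven't been posted. The function names are made up, and "unchanged" for exactly equal ratings is an assumption, not something stated above.

```python
def expected_score(rating, opp_rating, base=1.0001):
    """Expected result from the logistic win% formula in the general info post."""
    odds = base ** (rating - opp_rating)
    return odds / (1.0 + odds)


def tie_direction(rating, opp_rating):
    """Which way `rating` moves after a 0.5 result against `opp_rating`."""
    surprise = 0.5 - expected_score(rating, opp_rating)
    if surprise < 0:
        return "down"       # expected more than a draw
    if surprise > 0:
        return "up"         # expected less than a draw
    return "unchanged"      # assumption: equal ratings, draw exactly as expected


print(tie_direction(6345, 4724), tie_direction(4724, 6345))
```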

Tables · « **Reply #15 on:** April 21, 2013, 03:38:14 pm »

I presume given you've protected only the first sheet, you're okay with us entering our data in the second? I've entered the data for the first game, and once again, my in-sig claim of best dominion player in the world is confirmed.

WanderingWinder · « **Reply #16 on:** April 21, 2013, 03:46:18 pm »

So, about the grabbing from the logs. Well, yeah, but there are a couple of issues. One is actually scraping all their data to do that: without asking them or all the individuals, I don't want to run afoul of anyone. Bigger (since I generally presume nobody would really care) is that this is really practically difficult. I mean, I need to write something to go through all their logs (I have absolutely no idea how to access them except in the exact game I'm in, at the moment), scrape out only the ones which are pro 2p games, with no guests, have a database(!) of all those users, and then run through, what, thousands, maybe tens of thousands, of games every day? It's not something that I really have that much free time to sink into, and it's not something that I am actually set up for logistically, either.

Negative Ratings: Yes, beginners may not like them, but I maintain my 'deal with it' attitude. There are a number of reasons. 0 is quite a nice number for the centre, and really any other number is wonky. Also, there are some very small practical considerations dealing with how numbers are treated computationally. Perhaps most important, I really have little idea as to the number to add - what should it be, I cannot say. It's always going to be possible to get a negative, unless you truncate, which is something I am VERY averse to. Of course, if I add, oh, a billion points to everyone's score, it's hard to imagine anyone hitting negative. But at this point, things look really weird - most everyone is sitting on 999,9XX,XXX or something.

Dividing by 1000: I prefer precision in my ratings. If someone moves up 40 points here, I imagine they'll want to see some swing larger than 0. Surely people understand the large numbers as well as they do the small ones? I mean, part of the reason I *like* large numbers is that people won't think of it as like a box of crayons or something - these are abstract numbers without tangible relation and should be thought of as such.

WanderingWinder · « **Reply #17 on:** April 21, 2013, 03:46:28 pm »

Quote from: Tables on April 21, 2013, 03:38:14 pm

I presume given you've protected only the first sheet, you're okay with us entering our data in the second? I've entered the data for the first game, and once again, my in-sig claim of best dominion player in the world is confirmed.

Yes, that's the idea.

qmech · « **Reply #18 on:** April 21, 2013, 03:55:56 pm »

Should be interesting. I'm qmech on Goko.

SCSN · « **Reply #19 on:** April 21, 2013, 03:59:28 pm »

Count me in.

Quote

ReasonForGameEnd will be a number from 0-7. 0 is for resignation, 1 is for a 3 pile ending, 2 is for provinces running out, 3 is for both a 3-pile ending AND province, 4 is for colonies running out, 5 is for colonies and 3 piles, 6 is for colonies and provinces, and 7 is for colonies, provinces, and 3 piles (A free +1 to whoever figures out the logic here).

3 = 1+2, 5 = 4+1, 6 = 4+2, 7 = 4+3 = 5+2

Schneau · « **Reply #20 on:** April 21, 2013, 04:04:19 pm »

Quote from: WanderingWinder on April 21, 2013, 03:46:18 pm

Dividing by 1000: I prefer precision in my ratings. If someone moves up 40 points here, I imagine they'll want to see some swing larger than 0. Surely people understand the large numbers as well as they do the small ones? I mean, part of the reason I *like* large numbers is that people won't think of it as like a box of crayons or something - these are abstract numbers without tangible relation and should be thought of as such.

Ah, but I wasn't suggesting that you have to round ratings to integers - you could include as many decimal places as you like if you want people to see rating changes. You are thinking like a mathematician or engineer (not that I blame you - I'm one too!). But, thinking as a psychologist or designer, people are better at understanding smaller numbers than larger numbers. Similar arguments could be made for avoiding negative ratings, but I'll let others make those arguments.

Are you storing ratings as integers or real numbers? It seems like integers would be a bit wonky, since you appear to be using real numbers in the ratings formulas.

ftl · « **Reply #21 on:** April 21, 2013, 04:24:45 pm »

I mean, at the moment the details of reporting don't matter; those things will matter when/if the rating system is open to the general public instead of just those who manually report results.

If it ever gets included in a product of some sort, you can manipulate the true rating into a "Display rating" which has all the UI properties necessary, but the meat of the system would still be the behind-the-scenes accurate estimator of skill, so for the moment especially it's probably better to just use that.

I'd be fine with being included in the rating (Goko username ftl), but I don't play that often and so I don't know whether I'll ever end up in a rated game with someone here... I'll check in on Outpost II when I'm looking for a game though.

WanderingWinder · « **Reply #22 on:** April 21, 2013, 04:29:36 pm »

I am keeping decimals (I wouldn't say real numbers exactly - computers can't store them, and I'm not using some fancy mathematical software that keeps anything like 'exact' values for things like root 2 or pi), but what I post will be rounded to the nearest integer.

Also, I wasn't suggesting you were suggesting rounding. Indeed, I don't really think people relate to 24.839 any better than 24,839 (or vice versa for those of you in parts of the world that would switch commas and periods in relation to what my U.S. education taught). But to the extent that they do, it is because they relate to physical objects, yes? This is the reason people are better at understanding smaller numbers - it's hard for you or me to have a conception of what a trillion is, because we just don't come across a trillion of anything we can observe in our lives. So we don't have much of an intuitive understanding of it, but we sort of abstractly know, huh, that's a large number. But if it were 10, we have some kind of better understanding, because there are lots of situations where we interact with 10 of something - like fingers.

But the abstract I-don't-really-have-a-conception-of-this is GOOD to my mind, because I don't want people to have an intuitive feel for the numbers, exactly because they AREN'T intuitive things. You can still understand that one number is larger than another even with the vague abstract understanding, and that's all I want them to feel like they have without consulting the formula (which gives you a fairly intuitive percentage), because the ratings are abstractions with no actual direct representation.

There's a similar argument about negative numbers - if everything is positive... like someone with a 1400 chess rating can easily think 'aha! I am half as good as the best players in the world, since my rating is half theirs' - but of course this is wrong, because it's not on a ratio scale. But one of my hopes is that a negative number, or the commonality of such, will stop people from thinking this, because such thinking doesn't really make sense with negatives.

tl;dr I don't want people making those (false) connections!

Schneau · « **Reply #23 on:** April 21, 2013, 04:41:44 pm »

I'm in Outpost II if anyone wants a WW rated game. (I don't have any cards)

rrenaud · « **Reply #24 on:** April 21, 2013, 05:26:04 pm »

I think DStu has downloaded all the Goko logs and probably has a system set up to do so.
