Dissatisfaction with the Goko ratings, as well as (and quite frankly, more importantly than) my own personal interest in rating systems has led to me developing my own system, which I am here calling the WW system. It only works for 2-player games. Hypothetically, there are extensions for more players, BUT I would need to do some math to extend things (well, mostly to check that they are extended correctly), and I would want to keep separate ratings anyway, as the game is very different with different numbers of people! So okay, anyway, I might potentially possibly do that down the road at some indeterminate point in the future. But not now.
I'm not going to divulge all the particular formulae, at least not yet. I may want to patent the system, as it's of professional interest to me. Having said that, it's not like I'm doing a ton of stuff, particularly with the version I'm starting out with here, that's super groundbreaking at the conceptual level that players are generally interested in. So I will explain the basic tenets of the system here. I will also show you how to figure out your expected win% in any matchup, since right now that is just going to be based on the logistic curve.
Also, feel free to ask me whatever questions you have about things you don't understand (e.g. Why did this number of mine do this after that happened? Or why is this a bigger difference than that?)
There are three numbers you're going to see that are associated with each player.
ID - this is an integer that lets me identify you in the system. It has no statistical significance.
Rating - this is the big important number to look at. After every game, the system will re-evaluate your rating. If you won, your rating will go up; if you lost, it will go down. How MUCH it moves depends on a few different things. One is how much the system expected you to win (which is based on the ratings). The more it expected you to win, the fewer points you can gain, and the more you can lose, and vice versa. Exactly how much your rating actually moves depends on the third number, the uncertainty. If you and your opponent had the same uncertainty, then your rating change is a straight multiplication of the uncertainty with the percentage. If they were different, then your rating will change somewhat less than your uncertainty if theirs was higher than yours, or somewhat more than your uncertainty if theirs was lower than yours.
Uncertainty - this is an ancillary number. It tells us by how much, on average, you can expect your rating to change in the next game. *It is not a standard deviation.* It changes a lot. First, every day it retreats back towards its original value of 2500. The closer you are to that value, the slower this retreat happens. Second, like your rating, it changes after every game. Since each game gives us more information, playing a game will *generally* make the uncertainty decrease. However, the amount of change will depend on how unlikely the system thought the result of the game was - the more likely, the faster your uncertainty decreases, etc. As it's set up right now, it will generally take a result that the system thinks is about 40% likely (or less) to actually make your uncertainty go up. Because your uncertainty can go up, it is possible to exceed the original value. Also, it's THEORETICALLY possible for you to get arbitrarily large uncertainties this way, BUT this requires your uncertainty to be increasing essentially continually, which is something that ought not to happen for nearly long enough to see this unless you're trying to engineer it to be so - in which case, you're going to stick out very blatantly as having colluded on your results.
Now some notes on the actual numbers you're going to see. First of all, the initial rating, and ergo the 'average' rating (at least, the average at first - this won't necessarily remain the case), is going to be 0. So lots of people are going to have negative ratings - get over it. The rating is just a number, and moreover, the multiplicative scale is entirely irrelevant. Someone who has a rating of 10,000 is not 'twice as good' as someone who has a rating of 5,000. They're just 5,000 points higher. If you actually want to know your expected winning% against any particular person, here's how to calculate it:
Win% = (1.0001^dR)/(1+(1.0001^dR)), where ^ means taking the thing on the left to the power of the thing on the right, and dR is the difference of your ratings - specifically, your rating minus the other guy's.
As you can see, only the difference between two ratings has any significance. Okay, furthermore, this is a pretty wide rating scale. If you take first player advantage to be around 55-45, then that's about 2000 points (roughly). So a few hundred points here and there don't mean much at all. It takes a little over 4000 points to get a 60-40 advantage. Almost 11,000 for a 75-25 split, and 22,000 for 90-10. Nearly 30,000 points to get a 95-5 difference, and over 45,000 to get to 99%. I fully expect that there will be tens of thousands of point gaps very quickly, and they can get reversed pretty quickly as well. One of the nice features of the system is that if you get significantly better, the system will take notice relatively more quickly than many other systems - sustained outperformance of expectations can lead to an acceleration of how fast you gain points. So it's not all that hard to turn things around.
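For anyone who would rather not punch the formula into a calculator, here is a minimal Python sketch of it. The function names expected_win and gap_for are made up for illustration; only the 1.0001 base and the formula itself come from this post. gap_for is simply the formula solved for dR, so you can check the point gaps quoted above.

```python
import math

BASE = 1.0001  # base of the logistic curve given in the post


def expected_win(my_rating, their_rating):
    """Expected win fraction: BASE^dR / (1 + BASE^dR), with dR = mine - theirs."""
    d_r = my_rating - their_rating
    x = BASE ** d_r
    return x / (1.0 + x)


def gap_for(win_fraction):
    """Rating gap dR that gives the requested expected win fraction.

    This is the formula above solved for dR (an illustrative helper,
    not something the rating system itself is known to use).
    """
    return math.log(win_fraction / (1.0 - win_fraction)) / math.log(BASE)


if __name__ == "__main__":
    for p in (0.55, 0.60, 0.75, 0.90, 0.95, 0.99):
        print(f"{p:.0%} expected win needs a gap of about {gap_for(p):,.0f} points")
```

Note that expected_win(a, b) and expected_win(b, a) always add up to 1, and that adding the same number to both ratings changes nothing, which is the "only the difference matters" point.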
Please note that right now, there is no systemic first player advantage. I could potentially incorporate this on a later pass, though I do have some misgivings, but I would want some data on the size of the thing first.
I should note that the system has never had a practical test - this will be the first. There are also a couple of things that need tweaking: how fast the uncertainty moves, and, the biggest one, the underlying win estimation function. I don't think logistic is terribly, er, right, for Dominion especially, but without more empirical data, it's the best null starting point.
Anyway, here's what I would like for you guys to do:
Sign up for the system. I will give you an ID. Every day (I will try to do this in the mornings), I will update the rating list with the games from the previous day. What that means I need from you guys is the results. In order for this to work in a way that doesn't make me pull my remaining hair out of my head, I am going to need it formatted particularly:
ID1 ID2 Result ReasonForGameEnd Notes
Where ID1 is the ID of the player who went first in the game, ID2 is that of the player who went second, and Result is either 1, 0.5, or 0, depending on whether the first player won, drew (i.e. it was a tie, you both rejoiced in your shared victory), or lost, respectively (PLEASE NOTE, this is the FIRST PLAYER, i.e. the guy listed as ID1, NOT NECESSARILY YOU). ReasonForGameEnd will be a number from 0-7. 0 is for resignation, 1 is for a 3 pile ending, 2 is for provinces running out, 3 is for both a 3-pile ending AND provinces, 4 is for colonies running out, 5 is for colonies and 3 piles, 6 is for colonies and provinces, and 7 is for colonies, provinces, and 3 piles (A free +1 to whoever figures out the logic here). Notes can include whatever you want to put - the score, the number of turns, who took the last turn, which piles ran out, the kingdom, strategic notes, a link to the log, what lobby the game was played in, whatever you want, SO LONG AS THERE ISN'T A LINE BREAK.
Please note the spaces between the 5 quantities here. They are absolutely essential. You cannot lack these spaces, nor can you have other spaces, EXCEPT in the notes, where anything is fine.
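If you want to check a line before you post it, here is a rough Python sketch of a checker for the format above. To be clear, this is NOT the program that actually reads the results, just an illustration: the names GameResult, parse_result, and REASONS are hypothetical, and I am assuming IDs are non-negative integers and that an empty Notes field is acceptable.

```python
from typing import NamedTuple

# Lookup table copied from the ReasonForGameEnd codes listed in the post.
REASONS = {
    0: "resignation",
    1: "3 piles",
    2: "provinces",
    3: "3 piles and provinces",
    4: "colonies",
    5: "colonies and 3 piles",
    6: "colonies and provinces",
    7: "colonies, provinces, and 3 piles",
}


class GameResult(NamedTuple):
    first_id: int    # ID1: the player who went first
    second_id: int   # ID2: the player who went second
    result: float    # 1, 0.5 or 0, from the FIRST player's point of view
    reason: int      # ReasonForGameEnd, 0-7
    notes: str       # free text; spaces allowed, line breaks not


def parse_result(line):
    """Parse 'ID1 ID2 Result ReasonForGameEnd Notes', raising ValueError if malformed."""
    if "\n" in line.strip("\n"):
        raise ValueError("notes must not contain a line break")
    # Split on single spaces, at most 4 times, so the notes keep their own spaces.
    parts = line.strip().split(" ", 4)
    if len(parts) < 4:
        raise ValueError("expected at least ID1 ID2 Result ReasonForGameEnd")
    id1, id2, result, reason = parts[:4]
    notes = parts[4] if len(parts) == 5 else ""  # assumption: notes may be empty
    if not (id1.isdigit() and id2.isdigit()):
        raise ValueError("IDs must be integers separated by exactly one space")
    if result not in ("1", "0.5", "0"):
        raise ValueError("Result must be 1, 0.5 or 0")
    if not reason.isdigit() or int(reason) not in REASONS:
        raise ValueError("ReasonForGameEnd must be a number from 0 to 7")
    return GameResult(int(id1), int(id2), float(result), int(reason), notes)


if __name__ == "__main__":
    game = parse_result("12 34 0.5 1 tie, 20 turns each")
    print(game, "->", REASONS[game.reason])
```

A doubled space between the first four fields shows up as an empty field and gets rejected, which matches the "no other spaces" rule.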
The easiest way I think to do this will be to put the results in a Google Doc Spreadsheet. I've set one of these up, and when you sign up, I can PM you the link to it. Someone with more experience with Google docs can hopefully help me with any issues I have there, as I haven't really used them before. However, unless there are some particular issues with simultaneous editing, it seems to be pretty intuitive, so I don't anticipate any particular problems. Anyway, every day will be its own sheet, which I will lock after that day. I will probably create a few days' worth of sheets in advance, in case I am late getting around to it, and also for everything before I wake up.
PLEASE ALSO NOTE THAT THE ORDERING OF GAMES MATTERS somewhat, so please make sure all of your games are reported in order. It doesn't matter if you post your game after someone else who finished after you, so long as each player's games are in order - throw that order off, and your rating will get thrown off somehow. Also, please make sure EACH GAME IS IN THERE JUST ONCE. If it's in there twice, I will assume you played twice with the same outcome, and rate both. Now, I would encourage both players to check and make sure the thing is put in correctly, but don't double-rate the thing in the meantime.
My suggestion would be to put in the IDs as soon as the game is made, to hold your spot for the game, and then fill in the results afterwards.
Feel free to put names in rather than IDs TEMPORARILY (i.e. you want to finish out a string of games with the same guy before putting all the results in all nice-like), but don't leave them in there, as I haven't written a program to parse this information.
I would suggest a particular lobby for these games - say Outpost II, which should otherwise be largely deserted. You can put your ID stamp on the game. But however you want to find matches, I don't care, so long as both players are in the system. Also, these don't have to be Goko games - they can be any games you want, so long as both players are in the system and agree to it.
I may have some bugs some days which stop me from running the ratings immediately. Please try not to fret too much over this - I do have other things to do, and a bug is going to probably take longer than the time I have to devote to this, at least until I can get a break, or more likely, home from work (unless I am lucky and it's incredibly simple). The effect on your rating will be nothing, except your inability to see it for a while.
Also, you can probably expect about a day's worth of lag between signing up and being 'active', i.e. having your name appear in the list and me getting the google doc stuff to you.