Dominion Strategy Forum
Topic started by: ragingduckd on July 17, 2013, 08:54:38 am
-
Aside from a few outliers like Boodaloo, the boards look pretty similar to me. Are there other aberrations I'm missing?
Goko Top 100:
rank | pname | rating
------+--------------------+--------
1 | Stef | 6931
2 | nomnomnom | 6919
3 | hiroki | 6801
4 | Rene Kuroi | 6782
5 | Mic Qsenoch | 6760
6 | SheCantSayNo | 6757
7 | Stealth Tomato | 6668
8 | Wandering Winder | 6640
9 | Tao Chen | 6636
10 | Geronimoo | 6609
11 | Obi Wan Bonogi | 6607
12 | jaybeez | 6555
13 | LESPEUTERE | 6534
14 | ednever | 6524
15 | jog | 6512
16 | Rabid | 6511
17 | HiveMindEmulator | 6502
18 | Fabian | 6488
19 | blueblimp | 6465
20 | awaclus | 6461
21 | Andrew Iannaccone | 6433
22 | flyingkuyt | 6398
23 | iriho | 6379
24 | jhovall_goko | 6370
25 | PitrPicko | 6365
26 | yuuna_tu | 6327
27 | AQUAREAF | 6305
28 | Mike Harris.0001 | 6292
29 | yudai214 | 6278
30 | yed | 6267
31 | Slyfox | 6262
32 | eliegel | 6257
33 | 2.71828..... | 6223
34 | SM.SM | 6222
35 | wicket | 6213
36 | TrickStaR | 6197
37 | Monsieur X | 6189
38 | shark_bait | 6173
39 | manzi | 6172
40 | DominionKing | 6154
41 | Eevee | 6153
42 | nnn | 6152
43 | Robz888 | 6151
44 | fiu | 6148
45 | Tom Collett | 6140
46 | theParty | 6123
47 | minased | 6117
48 | GwinnR | 6081
49 | sami1 | 6080
50 | Perry Green | 6078
51 | Zan | 6077
52 | kenyou2859 | 6072
53 | heron | 6068
54 | dudeabides | 6067
55 | markusin | 6067
56 | faust | 6049
57 | Psyduck | 6044
58 | RTT | 6043
59 | David Hunter | 6041
60 | Lotoreo | 6035
61 | dawn_harbor | 6034
62 | A Drowned Kernel | 6033
63 | Jeebus | 6030
64 | Powerman | 6014
65 | Qvist | 6013
66 | sangatsu | 5999
67 | Watno | 5997
68 | Trojan Horse | 5994
69 | HampusEriksson | 5986
70 | Booyakasha | 5985
71 | WhiteRabbit1981 | 5982
72 | Warrior | 5980
73 | Emeric | 5972
74 | D_dreamer | 5969
75 | 7MiKL7 | 5954
76 | M1 | 5947
77 | andwilk | 5946
78 | microman | 5943
79 | Indur | 5926
80 | astrosity | 5925
81 | Titandrake | 5921
82 | todo_boss | 5916
83 | Kazuhiro Kobayashi | 5898
84 | Александр Логинов | 5883
85 | Egor Kulikov | 5866
86 | Silverfinger | 5862
87 | Lekkit | 5848
88 | mullinKAI | 5839
89 | dscarpac | 5836
90 | moharimo | 5820
91 | kilgoretrout103 | 5802
92 | zporiri | 5801
93 | Magicarp | 5801
94 | heatthespurs | 5800
95 | Dominionologist | 5787
96 | Vampyroteuthis | 5780
97 | Masschy | 5773
98 | hirotashi | 5771
99 | Johannes Dorn | 5754
100 | Polk5440 | 5748
TrueSkill* Top 100:
rank | pname | mu | sigma
------+------------------------------------+----------+----------
1 | Stef | 60.5675 | 3.9775
2 | Boodaloo | 60.3245 | 4.3395
3 | nomnomnom | 61.6375 | 4.7775
4 | Rene Kuroi | 58.6050 | 4.0585
5 | Mic Qsenoch | 58.1975 | 3.9765
6 | SheCantSayNo | 57.2280 | 3.9360
7 | Stealth Tomato | 57.5755 | 4.0865
8 | Wandering Winder | 57.1345 | 3.9650
9 | Tao Chen | 61.9515 | 5.7080
10 | hiroki | 55.9995 | 3.9410
11 | Rabid | 56.0185 | 4.0165
12 | Geronimoo | 55.5715 | 4.0710
13 | HiveMindEmulator | 55.5580 | 4.3235
14 | Andrew Iannaccone | 54.2025 | 3.9435
15 | Obi Wan Bonogi | 54.0950 | 3.9965
16 | TrickStaR | 54.1255 | 4.0345
17 | jaybeez | 53.9285 | 3.9900
18 | LESPEUTERE | 53.9790 | 4.0120
19 | Fabian | 53.2665 | 4.0085
20 | jog | 53.0150 | 3.9470
21 | PitrPicko | 53.0575 | 3.9795
22 | awaclus | 53.0230 | 3.9855
23 | blueblimp | 53.1740 | 4.0875
24 | ednever | 53.0205 | 4.1355
25 | Qvist | 52.0220 | 3.9885
26 | Perry Green | 51.6780 | 4.0900
27 | manzi | 51.2640 | 3.9720
28 | heron | 51.4135 | 4.0245
29 | flyingkuyt | 51.0310 | 3.9370
30 | Jeebus | 51.1155 | 3.9995
31 | eliegel | 51.0905 | 4.0280
32 | jhovall_goko | 50.8355 | 3.9635
33 | iriho | 50.5480 | 3.9405
34 | Psyduck | 52.0175 | 4.4565
35 | sami1 | 50.6900 | 4.0330
36 | Mike Harris.0001 | 50.5940 | 4.0170
37 | Lekkit | 50.4200 | 3.9610
38 | shark_bait | 50.5130 | 4.0145
39 | Slyfox | 50.2800 | 3.9390
40 | wicket | 50.2745 | 3.9620
41 | Masschy | 50.3730 | 4.0520
42 | zporiri | 50.8940 | 4.2580
43 | yudai214 | 49.7325 | 3.9385
44 | SM.SM | 49.7215 | 3.9380
45 | Warrior | 49.7705 | 4.0085
46 | Monsieur X | 49.7705 | 4.0085
47 | yed | 49.4895 | 3.9150
48 | AQUAREAF | 49.5920 | 3.9750
49 | faust | 49.6090 | 3.9980
50 | kenyou2859 | 49.8205 | 4.0760
51 | First | 55.3440 | 5.9195
52 | theParty | 49.4880 | 3.9730
53 | Robz888 | 49.2785 | 3.9455
54 | GwinnR | 49.3625 | 4.0500
55 | Kevin O'Brien | 51.3655 | 4.7700
56 | Eevee | 48.9310 | 3.9615
57 | Holger | 55.6320 | 6.2050
58 | wsc | 51.4760 | 4.8355
59 | gagnerouperdretelleestlaquestion? | 49.9420 | 4.3300
60 | Powerman | 49.0620 | 4.0590
61 | dudeabides | 48.8435 | 3.9910
62 | florrat | 52.1135 | 5.0945
63 | David Hunter | 48.5990 | 3.9965
64 | Schlippy | 49.4220 | 4.3225
65 | nnn | 48.5010 | 4.0255
66 | markusin | 49.0665 | 4.2170
67 | andwilk | 51.4850 | 5.0315
68 | sangatsu | 48.3820 | 4.0030
69 | Lotoreo | 48.0110 | 3.9415
70 | pâté de campagne | 48.2605 | 4.0425
71 | 2.71828..... | 48.0130 | 3.9660
72 | daniel greif | 71.1600 | 11.6865
73 | Bulec | 48.9830 | 4.2995
74 | RTT | 47.9095 | 3.9550
75 | scott pilgrim | 48.3400 | 4.1020
76 | yuuna_tu | 54.9045 | 6.4195
77 | qmech | 51.0900 | 5.1830
78 | kn1tt3r | 47.4915 | 3.9880
79 | faw | 53.6745 | 6.0830
80 | Pneumatiker | 49.8545 | 4.8275
81 | minased | 47.2105 | 3.9560
82 | Titandrake | 47.2465 | 3.9700
83 | A Drowned Kernel | 47.1445 | 3.9425
84 | p4ddy0d00rs | 47.1735 | 3.9640
85 | Teenage Raistlin | 67.2485 | 10.7155
86 | Tom Collett | 46.8370 | 3.9375
87 | dawn_harbor | 47.1200 | 4.0330
88 | Cruxis | 47.2785 | 4.0865
89 | fiu | 48.6615 | 4.5505
90 | Troninho | 46.8575 | 3.9520
91 | dnkywin | 47.3565 | 4.1335
92 | jsh357 | 47.2225 | 4.1070
93 | WhiteRabbit1981 | 46.6635 | 3.9475
94 | houroku | 46.7355 | 3.9800
95 | Hao Chu | 47.0940 | 4.1345
96 | Watno | 46.3765 | 3.9390
97 | Pex Golder | 47.8285 | 4.4575
98 | CopperCopper | 47.1805 | 4.2640
99 | Zan | 46.1630 | 3.9355
100 | DominionKing | 46.1270 | 3.9265
*TrueSkill implementation details: (anything insane here?)
- Using the package at https://pypi.python.org/pypi/trueskill
- Including only 2-player "Pro" games, excluding games with guests
- initial mu=75, sigma=25 (calibrated to mimic the isotropic leaderboard)
- draw rate=0.0175 (the empirical average)
- ranked by sigma - 3*mu
- no rating degradation over time
Edit: Sorry guys, I posted this after a night of insomnia. Some corrections and clarifications:
Ranked by mu - 3*sigma. Thanks for noting this, HME. WW, is mu - sigma a more accepted metric? The Microsoft research page here (http://research.microsoft.com/en-us/projects/trueskill/) said that k=3 was "common."
The initial rating was mu=25, sigma=8.33. I then scaled both up around the mean to give numbers that I could more easily compare to the Iso leaderboard. For each player, the numbers I listed are actually mu' = 25 + 5*(mu-25) and sigma' = 5*sigma. I don't know what made me think this was a good idea or why I thought it was equivalent to mu=75, sigma=25. I'm undoing this.
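As a sketch, the display transformation just described and its inverse (the inverse comes up again later in the thread):

```python
# The display scaling described above: mu' = 25 + 5*(mu - 25), sigma' = 5*sigma.
def scale(mu, sigma):
    return 25 + 5 * (mu - 25), 5 * sigma

# Its inverse, for "unscaling" the listed numbers back to package units.
def unscale(mu_s, sigma_s):
    return 25 + (mu_s - 25) / 5, sigma_s / 5
```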
By "no degradation over time" i meant that I wasn't doing Goko's accelerating increase uncertainty every day thing. I'm sure that some degradation is appropriate, but I suspect that Goko's method is too rapid. I also really doubt that the change should accelerate over time, and it should certainly stop asymptotically.
-
I knew my presence in the top 100 was too good to be true. Thanks for the reality check. :)
-
The TrueSkill leaderboard is clearly better. CLEARLY. They need to change it to this immediately.
-
The TrueSkill leaderboard is clearly better. CLEARLY. They need to change it to this immediately.
but is it "strictly" better? ;)
-
Aside from a few outliers like Boodaloo, the boards look pretty similar to me. Are there other aberrations I'm missing?
Boodaloo is an alt of mine. I stopped using it so it fell from the Goko leaderboard.
-
I'm 92 on the Goko leaderboard, but 42 on the TrueSkill one. That's a pretty big gap, no? (I did, however, lose a lot of games the past couple of days, so that could explain the big drop I've had on the Goko leaderboard recently.)
-
Yeah, looks pretty similar. For all the complaints about and issues with the Goko ranking system, it seems to pass the smell test.
I assume you mean mu-3*sigma.
-
I'm 92 on the Goko leaderboard, but 42 on the TrueSkill one. That's a pretty big gap, no? (I did, however, lose a lot of games the past couple of days, so that could explain the big drop I've had on the Goko leaderboard recently.)
I don't think anyone has been complaining about the accuracy of the Goko leaderboard (and if someone has, I've missed it); it's the swinginess that makes people complain. And it is very swingy indeed: it's not uncommon to go 30-40 ranks up or down on the leaderboard in a couple of hours.
EDIT: Though I assume it's less common if you've played a lot of games. I have fewer than 500 games, I think.
-
The thing with Iso is that it updated only once per day, so the swinginess was harder to catch.
-
Great work!
initial mu=75, sigma=25 (calibrated to mimic the isotropic leaderboard)
This should be 25.
I'm curious as to why the sigmas are way lower than iso's uncertainties.
-
I'm curious as to why the sigmas are way lower than iso's uncertainties.
"no rating degradation over time" should explain a lot of it. Iso increased the uncertainty every day, to prevent the inherent trend of TrueSkill sigmas to converge towards zero (under certain conditions, at least).
-
I'm curious as to why the sigmas are way lower than iso's uncertainties.
"No rating degradation over time" should explain a lot of it. Iso increased the uncertainty every day, to counteract the inherent tendency of TrueSkill sigmas to converge towards zero (under certain conditions, at least).
Because iso's uncertainties were 3*sigma rather than just sigma.
-
By 'no rating degradation over time', do you mean no time-based adjustments to mu, or to anything?
-
Looking back over the TrueSkill documentation, I'm still vexed trying to figure out what their Beta factor actually does. As far as I can tell, they're more or less modelling the ratings as normal distributions, and the Tau factor has to do with their updating scheme. But Beta... well, apparently Beta is supposed to be related to how much skill it takes for player A to have an expected win rate of X over player B (with different sources giving X as 80% and 75.6%; I assume 80% is just some chosen round number, but I have no idea how they get 75.6). Like, I think it's just a scale factor? If that's the case, then (updating fanciness aside) we really do just have normal distributions. And if we have normal distributions, we can do some math on that; that's not too bad. (The other issue I have is that it's vexingly hard to find their actual math - there are lots of handwaving explanations but relatively few equations. When I go and try to look at their papers on the subject, as best I can tell, Beta is the actual standard deviation variable, but this doesn't seem to make sense - mostly, I don't understand why there is both Beta and Sigma, or how they are different.)
Anyway, if this is Normal, then we can look at some things. For instance, let's look at Stef and Mic Qsenoch. Obviously Stef is higher-rated here. Fine. How often would you expect him to win here?
So here's how the system calculates it. We have the difference in ratings, which is 2.37. Then we need to calculate the standard deviation; a little background knowledge of statistics tells us that the difference of two normal distributions is a normal distribution with mean equal to the difference of means and variance equal to the sum of variances, so sigma_total = (sigma_1^2 + sigma_2^2)^(1/2), or in this case about 5.624. This puts Stef at... .4214 standard deviations above Mic Qsenoch, for an expected winrate of... .6633. Now, I don't know about you, but though Stef is really good, I don't think it's anywhere near accurate to say he should be winning that matchup 2/3 of the time....
Edit: Okay, found another paper which makes it look like the Beta factor is some kind of strange shape parameter: the minimum variance, such that even if both players magically have sigmas of zero, the overall variance is still Beta squared, as the actual variance being used is sigma_1^2 + sigma_2^2 + Beta^2. This is somewhat interesting at least. Assuming the package default Beta = 25/6 ≈ 4.1667 was used in this implementation, this affects the calculations above somewhat. The standard deviation would now actually be 6.99958, meaning Stef would have only a .3386 standard deviation advantage, which would get him to an expected winrate of only .6325 - which nevertheless still seems high to me.
-
^The difference between beta and sigma is that sigma is uncertainty in the estimate of player level, and beta is uncertainty in the result of the game given knowledge of the player level. So the probability of Stef beating Mic is the probability that a normal random variable with mean 0 and variance beta^2 is less than their difference in skill: P(N(m1, s1^2) - N(m2, s2^2) - N(0, b^2) > 0) = P(N(m1 - m2, s1^2 + s2^2 + b^2) > 0), as you have calculated.
The value of beta is based on the randomness of the game, and will be smaller for games whose outcome is determined more by skill than luck. It's hard to derive a good value for beta though, and in reality it may be different for different levels of play. Trueskill cannot account for this.
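For concreteness, a minimal sketch of the win-rate calculation being discussed, using the pypi package's default Beta = 25/6. Note that the posts in this thread add Beta^2 to the variance once, while the TrueSkill paper gives each player's performance its own Beta^2 term (n_beta=2 below):

```python
import math

BETA = 25.0 / 6.0  # pypi trueskill package default

def phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_prob(mu1, sigma1, mu2, sigma2, n_beta=1):
    # P(player 1 beats player 2); n_beta=1 matches the calculation above.
    var = sigma1**2 + sigma2**2 + n_beta * BETA**2
    return phi((mu1 - mu2) / math.sqrt(var))

# Stef vs. Mic Qsenoch with the scaled numbers from the leaderboard above:
# win_prob(60.5675, 3.9775, 58.1975, 3.9765) -> ~0.633
```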
-
I made some corrections to my original post. Sorry for any confusion.
I don't fully understand the details of TrueSkill or Heungsub Lee's Python implementation, nor do I really plan to. The package is open-source, though, and I'm happy to implement variations.
It would be particularly nice to have a means of calibrating parameters and comparing predictive accuracy, if anyone is willing to contribute that code. I believe WW has described how to do this somewhere.
-
Look at that tremendously sexy #7 on both leaderboards. Look at it real close before I finally quit this game I swear to god I'll manage to pull myself away eventually ooh I bet I can create a Plaza engine on this board.
-
Looking back over the TrueSkill documentation, I'm still vexed trying to figure out what their Beta factor actually does. As far as I can tell, they're more or less modelling the ratings as normal distributions, and the Tau factor has to do with their updating scheme. But Beta... well, apparently Beta is supposed to be related to how much skill it takes for player A to have an expected win rate of X over player B (with different sources giving X as 80% and 75.6%; I assume 80% is just some chosen round number, but I have no idea how they get 75.6). Like, I think it's just a scale factor? If that's the case, then (updating fanciness aside) we really do just have normal distributions. And if we have normal distributions, we can do some math on that; that's not too bad. (The other issue I have is that it's vexingly hard to find their actual math - there are lots of handwaving explanations but relatively few equations. When I go and try to look at their papers on the subject, as best I can tell, Beta is the actual standard deviation variable, but this doesn't seem to make sense - mostly, I don't understand why there is both Beta and Sigma, or how they are different.)
Anyway, if this is Normal, then we can look at some things. For instance, let's look at Stef and Mic Qsenoch. Obviously Stef is higher-rated here. Fine. How often would you expect him to win here?
So here's how the system calculates it. We have the difference in ratings, which is 2.37. Then we need to calculate the standard deviation; a little background knowledge of statistics tells us that the difference of two normal distributions is a normal distribution with mean equal to the difference of means and variance equal to the sum of variances, so sigma_total = (sigma_1^2 + sigma_2^2)^(1/2), or in this case about 5.624. This puts Stef at... .4214 standard deviations above Mic Qsenoch, for an expected winrate of... .6633. Now, I don't know about you, but though Stef is really good, I don't think it's anywhere near accurate to say he should be winning that matchup 2/3 of the time....
Edit: Okay, found another paper which makes it look like the Beta factor is some kind of strange shape parameter: the minimum variance, such that even if both players magically have sigmas of zero, the overall variance is still Beta squared, as the actual variance being used is sigma_1^2 + sigma_2^2 + Beta^2. This is somewhat interesting at least. Assuming the package default Beta = 25/6 ≈ 4.1667 was used in this implementation, this affects the calculations above somewhat. The standard deviation would now actually be 6.99958, meaning Stef would have only a .3386 standard deviation advantage, which would get him to an expected winrate of only .6325 - which nevertheless still seems high to me.
The funny thing about you picking me and Stef to say that Trueskill overestimates the likelihood of his winning is that based on your numbers, Trueskill is doing a fantastic job. Stef's record against me is 49-25-1 for a win rate of 0.65333. He is my bane.
-
I made some corrections to my original post. Sorry for any confusion.
As to your question in these revisions, mu-sigma isn't accepted, nor is mu-3*sigma. OK, I have seen mu-3*sigma a decent bit, but it is 'common' only because Microsoft has pushed it out there as the standard for their TrueSkill, which is, in my estimation, mostly a way of trying to push people to play more, so as to get higher profits.
IF you believe that these are reasonable estimates of the actual skill of the participants (something which seems quite suspect to me, actually), then mu-3*sigma gives a 99.865% chance that the player's skill is at least at that level. But 2*sigma would give a 97.7% chance, and 1*sigma an 84.1% chance. The more important thing is that these are one-sided - you could just as easily add the sigmas and have very good chances of the skill being beneath the stated level. Really, I don't see any reason not to just go by straight-up mu, which is the central number and 'best guess' of the system, if you want a single number for a rating.
I don't fully understand the details of TrueSkill or Heungsub Lee's Python implementation, nor do I really plan to. The package is open-source, though, and I'm happy to implement variations.
It would be particularly nice to have a means of calibrating parameters and comparing predictive accuracy, if anyone is willing to contribute that code. I believe WW has described how to do this somewhere.
I eventually dug around and found a paper which gives, well, not a perfect explanation of the system, but enough that I now have a good feel for the distribution they're using, and I figure I could probably get a pretty good idea of how their updating works if I cared to. If there are serious questions, probably someone here can generally answer them.
As for the parameters, I am looking into what curve is going to be best for this, but it is fairly deep on my priority list at the moment, and moreover I am trying to write the program in a very general way, such that I can use it for many different endeavors (and not just Dominion). For sure I will give an update when I have one, but I suspect this will be months...
One thing to note is that no matter what they are doing with the Beta factor, you still end up with normal distributions, and well, I have my doubts about the normal fitting well here. Eh, maybe it does. But I would at least try higher (relative to the mu and sigma you are using) values of beta. Basically what this does, as described above, is lessen the impact of any particular rating difference.
Oh, and for more evidence that this system is REALLY wrong: Stef vs Mic Q is bad enough (sure, Stef has Mic Q's number so far, so that sort of matches, but I seriously must believe that this is basically luck), but if we take it down to the number 100 guy on the list, we see... Stef favored to win just over 98%(!) of the time!!! I mean, folks, he is good, but he isn't *that* good.
-
IF you believe that these are reasonable estimates of the actual skill of the participants (something which seems quite suspect to me, actually), then mu-3*sigma gives a 99.865% chance that the player's skill is at least at that level. But 2*sigma would give a 97.7% chance, and 1*sigma an 84.1% chance. The more important thing is that these are one-sided - you could just as easily add the sigmas and have very good chances of the skill being beneath the stated level. Really, I don't see any reason not to just go by straight-up mu, which is the central number and 'best guess' of the system, if you want a single number for a rating.
When I sort by just mu, I get a leaderboard that consists largely of unknowns: people who have played a couple dozen games and won against strong players. Maybe that's correct in the sense that they really are the most likely players to win in any given match, but it's not really the leaderboard I want to see. It's just too noisy.
I eventually dug around and found a paper which gives, well, not a perfect explanation of the system, but enough that I now have a good feel for the distribution they're using, and I figure I could probably get a pretty good idea of how their updating works if I cared to. If there are serious questions, probably someone here can generally answer them.
Cool. Can you link to it?
Oh, and for more evidence that this system is REALLY wrong: Stef vs Mic Q is bad enough (sure, Stef has Mic Q's number so far, so that sort of matches, but I seriously must believe that this is basically luck), but if we take it down to the number 100 guy on the list, we see... Stef favored to win just over 98%(!) of the time!!! I mean, folks, he is good, but he isn't *that* good.
This is compelling, but have you adjusted for my screwup? You need to "unscale" the mu/sigma values I gave above if you're going to use the package's default beta and tau for this calculation. Adjust to mu = 25 + (mu'-25)/5 and sigma = sigma'/5.
-
Do you have historical logs of the goko ratings? You could compute how often TrueSkill and Goko are right at predicting the winner of games.
-
Do you have historical logs of the goko ratings? You could compute how often TrueSkill and Goko are right at predicting the winner of games.
I'd need a way to map from Goko ratings to win probabilities.
-
Do you have historical logs of the goko ratings? You could compute how often TrueSkill and Goko are right at predicting the winner of games.
I'd need a way to map from Goko ratings to win probabilities.
And I think that by now we all know what the issue with doing that is...
-
Trying to get a probabilistic model out of Goko ratings isn't worth the trouble. Just count the matches where TrueSkill ranks A > B and Goko ranks B > A, and then see who actually wins.
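As a sketch of that counting scheme (all the game-record field names here are hypothetical):

```python
def compare_predictors(games):
    # games: iterable of dicts holding each system's pre-game ratings and
    # the outcome. Games where the two systems agree tell us nothing.
    ts_right = goko_right = 0
    for g in games:
        ts_picks_a = g['ts_a'] > g['ts_b']        # TrueSkill's pick
        goko_picks_a = g['goko_a'] > g['goko_b']  # Goko's pick
        if ts_picks_a == goko_picks_a:
            continue
        if ts_picks_a == g['a_won']:
            ts_right += 1
        else:
            goko_right += 1
    return ts_right, goko_right
```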
-
IF you believe that these are reasonable estimates of the actual skill of the participants (something which seems quite suspect to me, actually), then mu-3*sigma gives a 99.865% chance that the player's skill is at least at that level. But 2*sigma would give a 97.7% chance, and 1*sigma an 84.1% chance. The more important thing is that these are one-sided - you could just as easily add the sigmas and have very good chances of the skill being beneath the stated level. Really, I don't see any reason not to just go by straight-up mu, which is the central number and 'best guess' of the system, if you want a single number for a rating.
When I sort by just mu, I get a leaderboard that consists largely of unknowns: people who have played a couple dozen games and won against strong players. Maybe that's correct in the sense that they really are the most likely players to win in any given match, but it's not really the leaderboard I want to see. It's just too noisy.
Ah, yes, this problem. It's not the most elegant thing ever, but I would suggest throwing out, as too unpredictable or 'provisional' or something, everyone whose sigma is over some threshold. Glancing at your leaderboard as it is, I would suggest perhaps sigma = 5 or sigma = 7.5 as a cutoff.
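Concretely, something like this (a sketch; the player records and the cutoff value are illustrative, with sigma in the scaled units shown in the tables above):

```python
def established_leaderboard(players, sigma_cutoff=5.0, k=3):
    # Drop 'provisional' players whose sigma is still above the cutoff,
    # then rank the rest by the conservative mu - k*sigma score.
    kept = [p for p in players if p.sigma <= sigma_cutoff]
    return sorted(kept, key=lambda p: p.mu - k * p.sigma, reverse=True)
```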
I eventually dug around and found a paper which gives, well, not a perfect explanation of the system, but enough that I now have a good feel for the distribution they're using, and I figure I could probably get a pretty good idea of how their updating works if I cared to. If there are serious questions, probably someone here can generally answer them.
Cool. Can you link to it?
http://research.microsoft.com/pubs/74417/NIPS2007_0931.pdf
I'm not sure how many of you will find it useful, as I did, and to how many it will be gobbledygook.
Oh, and for more evidence that this system is REALLY wrong: Stef vs Mic Q is bad enough (sure, Stef has Mic Q's number so far, so that sort of matches, but I seriously must believe that this is basically luck), but if we take it down to the number 100 guy on the list, we see... Stef favored to win just over 98%(!) of the time!!! I mean, folks, he is good, but he isn't *that* good.
This is compelling, but have you adjusted for my screwup? You need to "unscale" the mu/sigma values I gave above if you're going to use the package's default beta and tau for this calculation. Adjust to mu = 25 + (mu'-25)/5 and sigma = sigma'/5.
I recalculated this, but the very important thing is: when the system is actually doing the calculations, as you have it implemented, which parameters was it using? Was this 'screw-up' just a change you made in displaying things at the end, or was it something that was in there for any of the calculations? If it was in there for (any of) the calculations, then recalculating is in a way pointless - we want to test what the system actually says. But if it's just a thing you did at the end, then... first of all, this would effectively make the change of enlarging beta, as I had suggested. And we get this: Stef over Mic Q, expectation is 54.37%. Stef over #100 guy, expectation is 74.84%. These are at least both plausible enough that I would actually want to look at data before really proclaiming they're wrong, but I do have my suspicions that just any normal curve is going to have the problem of too steep 'shoulders'.
-
But if it's just a thing you did at the end, then... first of all, this would effectively make the change of enlarging beta, as I had suggested. And we get this: Stef over Mic Q, expectation is 54.37%. Stef over #100 guy, expectation is 74.84%. These are at least both plausible enough that I would actually want to look at data before really proclaiming they're wrong, but I do have my suspicions that just any normal curve is going to have the problem of too steep 'shoulders'.
Yup, just a change at the end. The live leaderboard doesn't do any similar nonsense.
Those numbers make a lot more sense.
-
Just stopped in to say that this is some hardcore shit right here. I had no idea that I have been playing against so many math wizards. Keep up the great work and thanks for the isotropish ratings, I like them.
~Scheme From The Bottom
-
Do you have the ability to create a separate ranking for each player when going first and when going second?
Obviously going first and second evens out after a certain number of games, so there's no reason to take that into account in the ranking formula, but it would be an interesting measure of first player advantage.
-
Do you have the ability to create a separate ranking for each player when going first and when going second?
Obviously going first and second evens out after a certain number of games, so there's no reason to take that into account in the ranking formula, but it would be an interesting measure of first player advantage.
I haven't been parsing this out, but I am now (it was maybe 4 extra lines of code). So I'll have data on it as it accumulates, or when I get around to reparsing it out of the old games.
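For the curious, a rough sketch of what that split tracking might look like with the pypi trueskill package (the dictionaries and the record_game helper are illustrative, not the live leaderboard's actual code):

```python
import trueskill

env = trueskill.TrueSkill(draw_probability=0.0175)
as_first = {}   # player name -> Rating for games where they went first
as_second = {}  # player name -> Rating for games where they went second

def record_game(p1, p2, p1_won, drawn=False):
    # p1 went first, p2 went second; update each player's seat-specific rating.
    r1 = as_first.setdefault(p1, env.create_rating())
    r2 = as_second.setdefault(p2, env.create_rating())
    if p1_won or drawn:
        r1, r2 = trueskill.rate_1vs1(r1, r2, drawn=drawn, env=env)
    else:
        r2, r1 = trueskill.rate_1vs1(r2, r1, env=env)
    as_first[p1], as_second[p2] = r1, r2
```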