Dominion Strategy Forum
Archive => Archive => GokoDom => Topic started by: blueblimp on March 22, 2012, 04:41:05 am
-
Update 2012-04-07. (http://forum.dominionstrategy.com/index.php?topic=2031.msg33684#msg33684)
(Edit: Given the revised tournament format, this post is now using the correct tournament structure to predict the chance of advancing to the bracket championship.)
Introduction
Using the same TrueSkill algorithm and parameters that are used to create the leaderboard, I've run some simulations to see who's likely to win their groups. There's no particular reason that this would be accurate (even assuming I didn't make an error), so don't take it too seriously.
Notes
- Ignorentmen, rspeer, and Tonks77 are missing from the current leaderboard, so they get the default 25 +/- 25 rating--meaning their results are not very predictive at all. If any of you have a name with a reasonable rating on it, I can add that to the simulation to get more interesting results.
- More than 1% of the time, there is more than a 2-way tie for the winner of some group. Since there isn't a rule for this case, I don't handle it correctly.
- I randomly sample player skills once before each simulation and hold them fixed throughout the simulation. Hopefully this is a reasonable way to interpret the TrueSkill ratings.
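That per-simulation sampling step can be sketched as follows. (The player names and ratings here are illustrative placeholders, not the real leaderboard values.)

```python
import random

# Illustrative (mu, sigma) leaderboard ratings -- not the real values.
ratings = {"PlayerA": (50.2, 3.5), "PlayerB": (37.8, 2.8)}

def sample_true_skills(ratings, rng=random):
    """Draw one fixed true skill per player from N(mu, sigma^2).

    Done once at the start of each simulated tournament; the sampled
    skills are then held fixed for every game in that simulation.
    """
    return {name: rng.gauss(mu, sigma) for name, (mu, sigma) in ratings.items()}
```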
Results
Below are the results from 10,000 simulations. Here's how to read them: the percentage is the fraction of simulations in which that player won their group, "mu" is the mean skill of that player (as shown on the leaderboard), and "sigma" is the standard deviation of that player's skill (one third of the +/- shown on the leaderboard).
Brackets:
Groups:
Ranking:
48%: shark_bait (mu = 50.2, sigma = 3.5)
17%: dghunter79 (mu = 45.6, sigma = 2.4)
16%: Titandrake (mu = 45.0, sigma = 3.4)
14%: O (mu = 44.9, sigma = 2.4)
2%: antony (mu = 37.8, sigma = 2.8)
2%: gorgonstar (mu = 38.0, sigma = 2.5)
1%: ednever (mu = 36.0, sigma = 2.9)
0%: Insomniac-X (mu = 27.1, sigma = 4.0)
Ranking:
35%: tlloyd (mu = 48.3, sigma = 2.8)
20%: perdhapley (mu = 45.9, sigma = 2.1)
17%: Axxle (mu = 44.7, sigma = 3.7)
13%: Jorbles (mu = 43.6, sigma = 3.7)
9%: [MAD] Mergus (mu = 42.1, sigma = 2.8)
3%: yuma (mu = 37.9, sigma = 3.0)
2%: Coheed (mu = 38.0, sigma = 2.6)
1%: Ignorentmen (mu = 25.0, sigma = 8.3)
Ranking:
42%: NinjaBus (mu = 50.5, sigma = 2.6)
21%: michaeljb (mu = 46.9, sigma = 3.6)
12%: mikemike (mu = 44.7, sigma = 3.0)
11%: A_S00 (mu = 44.5, sigma = 2.8)
8%: BJ Penn (mu = 43.7, sigma = 2.0)
6%: mnavratil (mu = 42.5, sigma = 2.9)
1%: CarpeDeezNuts (mu = 35.5, sigma = 2.5)
0%: ebEliminator (mu = 27.6, sigma = 3.2)
Ranking:
76%: RisingJaguar (mu = 55.9, sigma = 3.6)
13%: greatexpectations (mu = 46.1, sigma = 3.1)
10%: Mean Mr Mustard (mu = 45.4, sigma = 2.5)
1%: Brando Commando (mu = 38.1, sigma = 2.9)
0%: ^_^_^_^ (mu = 33.9, sigma = 3.0)
0%: rspeer (mu = 25.0, sigma = 8.3)
0%: Nicki Menagerie (mu = 30.1, sigma = 2.6)
0%: AHoppy (mu = 14.5, sigma = 3.8)
Groups:
Ranking:
40%: Geronimoo (mu = 51.5, sigma = 3.0)
36%: Rabid (mu = 50.9, sigma = 2.6)
11%: lespeutere (mu = 45.6, sigma = 2.4)
9%: MrEevee (mu = 44.9, sigma = 2.6)
2%: Dubdubdubdub (mu = 39.1, sigma = 3.9)
1%: luliin (mu = 37.0, sigma = 3.7)
0%: Mangsky (mu = 27.0, sigma = 3.0)
0%: Nucleus (mu = 26.4, sigma = 3.0)
Ranking:
61%: Fabian (mu = 55.9, sigma = 2.7)
27%: Brathannes (mu = 51.2, sigma = 2.5)
5%: StickaRicka (mu = 44.3, sigma = 3.3)
3%: Lekkit (mu = 42.6, sigma = 2.7)
3%: JanErik (mu = 41.9, sigma = 3.8)
1%: ugasoft (mu = 37.1, sigma = 3.7)
0%: Tonks77 (mu = 25.0, sigma = 8.3)
0%: ArjanB (mu = 29.4, sigma = 3.5)
Ranking:
59%: WanderingWinder (mu = 51.3, sigma = 2.8)
15%: Robz888 (mu = 44.8, sigma = 2.3)
9%: Mic Qsenoch (mu = 42.6, sigma = 2.6)
9%: blueblimp (mu = 42.4, sigma = 2.9)
5%: Masticore (mu = 39.7, sigma = 3.5)
2%: pops (mu = 37.7, sigma = 3.2)
1%: zxcvbn2 (mu = 33.5, sigma = 3.5)
0%: angrybirds (mu = 17.4, sigma = 6.8)
Ranking:
66%: jonts26 (mu = 54.3, sigma = 3.1)
14%: DG (mu = 45.2, sigma = 5.2)
6%: Kirian (mu = 42.7, sigma = 3.6)
6%: Graystripe77 (mu = 42.5, sigma = 3.6)
5%: The Real ~~**Young Nick**~~ (mu = 41.6, sigma = 3.7)
3%: andwilk (mu = 40.6, sigma = 3.6)
0%: elahrairah13 (mu = 33.4, sigma = 2.7)
0%: fit1one (mu = 22.5, sigma = 2.9)
-
So just to understand: TS gives you a probability that A wins against B; you used this to sample who wins, simulated all the matches, and repeated this 10,000 times?
Do you still have the average number of points?
edit: Concerning the unknowns: On the player pages on cr, it should be possible to find out the day of their last game, and at http://bggdl.square7.ch/leaderboard/leaderboard-2011-11-20.html you should find the leaderboard of that day. If you want, you can adjust the variance; I don't know how much it increases per day, but that should be easy to find out...
-
I was quite busy over the last week and so did not play any games for about a week. But I think tonight I will play some games, so I will be back on the leaderboard tomorrow.
One more suggestion: maybe you could factor in the results of previous games between two players in the same group?
-
To improve the simulation, you should let the trueskill parameters drift during a given simulation. As an underrated player overperforms, you should think you initially underestimated their skill. Call it the RisingJaguar effect if you want. This will tend to make the predictions less strong, giving more favor to underdogs and less favor to the top rated players.
Fivethirtyeight does a great job explaining it here, http://fivethirtyeight.blogs.nytimes.com/2011/03/14/how-we-made-our-n-c-a-a-picks/#
Suppose that Duke is facing St. Peter’s. What is the likelihood that St. Peter’s upsets Duke?
Actually, it depends on what round they’re playing in. Our model thinks if St. Peter’s played Duke in the first round, it would have only about a 3 percent chance of winning the game, before accounting for any geographic advantage.
But suppose, instead, that Duke and St. Peter’s were on opposite sides of the bracket, and somehow meet in the national championship game. What odds would St. Peter’s have then?
From the standpoint of Bayesian probability, they’d be quite a lot better (about 8 percent rather than 3 percent, our model thinks). This is because Duke and St. Peter’s meeting in the national championship is conditional on something else happening — namely, St. Peter’s winning five consecutive tournament games to get to the championship, each as a heavy underdog. If you looked up the power rating for St. Peter’s after these five games had been played, it would be quite a bit higher than it is now. Duke’s rating might also be a little bit higher. But because Duke would be favorites in most of these games, the effect would not be so profound.
Our model accounts for these conditional probabilities, and as a result, has a somewhat more optimistic view of lower-seeded teams in later rounds. To take an extreme case, it thinks that Princeton is “only” about a 40,000-to-1 underdog to win the tournament, whereas without this adjustment it would have considered them as a 300,000-to-1 underdog. Alternatively, it thinks that a No. 9 seed, Illinois, has a 0.7 percent chance to win the tournament, rather than 0.4 percent without this adjustment.
-
And also, great job, even if you don't change anything :)
-
I'm not sure that this is the same.
In your example, the "real world" gives some new input. Somebody might be better than he is rated, and thus reality corrects you by letting him win. It might have been luck, you can't tell, but it also might have been skill.
In the simulations, nobody wins because he has more skill than he was accounted for, everything is just luck by design.
-
That's okay, I am sure that this is the same ;).
You are trying to simulate the real world. Just like people thought RJ was a lot better than his rating gave him credit for once he reached the semis, or St. Peter's when it (hypothetically) reaches the NCAA finals, the quality of the simulation results would be improved by accounting for the conditional probabilities*.
*This assumes the initial trueskills are accurate; if indeed they are too clumped and a priori understate the probability of higher-rated players winning, then this will make that effect worse. Of course, there is a symmetry: if they are too spread out, this will help.
-
rrenaud, I don't understand how conditional probabilities can come into play if we assume the initial trueskills are accurate. If the skill level estimate is correct, why would there be a need to adjust it?
-
The ratings can be accurate and still have some uncertainty in them.
Just like I think the leaderboard from two days ago is going to be a pretty good predictor of tomorrow's performance (it's accurate), I'd still rather use yesterday's leaderboard (it's probably more accurate).
-
Ok then we're using accurate in different ways I guess. Thanks.
-
Also, another way of thinking about this:
What is the chance that someone good enough to make it to the finals of this tournament is going to be good enough to win it?
Well, probably, pretty high.
What is the chance that a level 10 can win the tournament? Pretty damn low!
But what is the chance that a level 10 who made it to the finals can win the tournament? It's not all that bad. Sure, they probably aren't as good as a level 42 who is in the finals, but there is (almost) no way they are really a level 10--after all, they made it to the finals! You just told me you think someone who is good enough to make it to the finals could very well win it.
-
I've decided that now that I've seen I have a 0% chance of winning, I'm gonna win the tournament. Before I wasn't motivated but now, now I'm an underdog. And never underestimate the power of the underdog.
-
I've noticed some people play a lot better in these tournaments than when you play them in a random auto-match.
-
I never ever thought I'd see 538 quoted here on f.ds. We don't even have a politics flamewar forum! But I always forget Nate does other predictions too.
-
So just to understand: TS gives you a probability that A wins against B; you used this to sample who wins, simulated all the matches, and repeated this 10,000 times?
Do you still have the average number of points?
edit: Concerning the unknowns: On the player pages on cr, it should be possible to find out the day of their last game, and at http://bggdl.square7.ch/leaderboard/leaderboard-2011-11-20.html you should find the leaderboard of that day. If you want, you can adjust the variance; I don't know how much it increases per day, but that should be easy to find out...
Yes, that's how it works. In fact, TS additionally gives a probability of tying, and I use that too. I didn't track the average number of points in that run, unfortunately. I'll re-run the simulation later today (incorporating the most recent historical leaderboard data I can find).
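The loop confirmed above can be sketched like this. It's a simplified stand-in for the real simulator: it treats each pairing as a single game, omits TrueSkill's tie probability, and breaks group ties by iteration order; BETA and the example ratings are assumptions for illustration.

```python
import random
from collections import Counter

BETA = 25.0  # per-game performance noise, per the post

# Hypothetical (mu, sigma) ratings for a small group -- not real values.
ratings = {"A": (50.0, 3.0), "B": (45.0, 3.0), "C": (40.0, 3.0), "D": (35.0, 3.0)}

def simulate_group(ratings, n_sims=10_000, rng=random):
    """Monte Carlo estimate of each player's chance to win the group.

    Each simulation: (1) draw one fixed true skill per player from
    N(mu, sigma^2); (2) play a single round-robin where each game is
    decided by performance draws t ~ N(s, BETA^2); (3) the player with
    the most wins takes the group (ties broken by iteration order,
    unlike the real simulator).
    """
    players = list(ratings)
    group_wins = Counter()
    for _ in range(n_sims):
        skills = {p: rng.gauss(*ratings[p]) for p in players}
        wins = Counter()
        for i, a in enumerate(players):
            for b in players[i + 1:]:
                t_a = rng.gauss(skills[a], BETA)
                t_b = rng.gauss(skills[b], BETA)
                wins[a if t_a > t_b else b] += 1
        group_wins[max(players, key=lambda p: wins[p])] += 1
    return {p: group_wins[p] / n_sims for p in players}
```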
I was quite busy over the last week and so did not play any games for about a week. But I think tonight I will play some games, so I will be back on the leaderboard tomorrow.
One more suggestion: maybe you could factor in the results of previous games between two players in the same group?
Using additional information (like head-to-head records) in the simulation would require a lot more modelling effort. I'll consider it though.
One serious modelling omission is that I didn't have any way to account for the effect of first-player advantage. This is significant enough that it's the first thing I'd try to correct.
To improve the simulation, you should let the trueskill parameters drift during a given simulation. As an underrated player overperforms, you should think you initially underestimated their skill. Call it the RisingJaguar effect if you want. This will tend to make the predictions less strong, giving more favor to underdogs and less favor to the top rated players.
Fivethirtyeight does a great job explaining it here, http://fivethirtyeight.blogs.nytimes.com/2011/03/14/how-we-made-our-n-c-a-a-picks/#
Suppose that Duke is facing St. Peter’s. What is the likelihood that St. Peter’s upsets Duke?
Actually, it depends on what round they’re playing in. Our model thinks if St. Peter’s played Duke in the first round, it would have only about a 3 percent chance of winning the game, before accounting for any geographic advantage.
But suppose, instead, that Duke and St. Peter’s were on opposite sides of the bracket, and somehow meet in the national championship game. What odds would St. Peter’s have then?
From the standpoint of Bayesian probability, they’d be quite a lot better (about 8 percent rather than 3 percent, our model thinks). This is because Duke and St. Peter’s meeting in the national championship is conditional on something else happening — namely, St. Peter’s winning five consecutive tournament games to get to the championship, each as a heavy underdog. If you looked up the power rating for St. Peter’s after these five games had been played, it would be quite a bit higher than it is now. Duke’s rating might also be a little bit higher. But because Duke would be favorites in most of these games, the effect would not be so profound.
Our model accounts for these conditional probabilities, and as a result, has a somewhat more optimistic view of lower-seeded teams in later rounds. To take an extreme case, it thinks that Princeton is “only” about a 40,000-to-1 underdog to win the tournament, whereas without this adjustment it would have considered them as a 300,000-to-1 underdog. Alternatively, it thinks that a No. 9 seed, Illinois, has a 0.7 percent chance to win the tournament, rather than 0.4 percent without this adjustment.
I believe this is already accounted for in the TrueSkill model. There are two reasons the simulation could be wrong about the player's true skill (under the TrueSkill model):
- Uncertainty: The mean skill on the leaderboard was not a good estimate of the player's true skill, due to not enough (recent) data.
- Drift: The player's true skill changes during the course of the tournament.
Uncertainty is accounted for at the beginning of each simulation, by taking the player's TrueSkill rating and using it to randomly select a true skill. This allows for dark horses--the "RisingJaguar effect". It shouldn't be necessary to account for this further, unless we doubt the accuracy of the leaderboard's model.
I ignore drift in the simulation. My justification is that it seems unlikely that a player's skill will significantly increase or decrease during the relatively short duration of the tournament. Besides, I don't trust the Isotropic drift model very much. That said, I'll consider implementing drift.
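If drift were implemented, one simple option would be a Gaussian random walk applied to the sampled true skills between rounds. A sketch (TAU is a made-up value; I don't know what drift parameter Isotropic actually uses):

```python
import random

TAU = 1.0  # hypothetical per-round drift magnitude (made up)

def drift_skills(true_skills, tau=TAU, rng=random):
    """Apply one step of Gaussian random-walk drift: s' = s + N(0, tau^2).

    This models skill changing between rounds independently of game
    results (unlike 538's result-driven updating discussed below).
    """
    return {name: s + rng.gauss(0.0, tau) for name, s in true_skills.items()}
```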
Digression about 538's model...
In 538's case, their simulation assumes that a pre-calculated power rank gives the true skill. That means their model does not have uncertainty built in, so they have to account for this somehow. I disagree with their method for accounting for uncertainty, because rather than perturbing the initial true skill, they are updating the true skill based on samples generated using that true skill, which is odd.
To see why this is weird, imagine a team goes on a win streak. If I observed this, I would revise my belief about the team's skill, since I likely underrated them. However, I don't think the team's true skill actually improved because of the win streak--I was simply wrong about their true skill before the streak, and their true skill could drift in any direction over its course. What 538's model does is assume that wins make teams better and losses make teams worse, which is a form of the hot-hand fallacy (http://en.wikipedia.org/wiki/Hot-hand_fallacy).
Effectively, 538's skill updating introduces a form of skill drift caused by game results. I wouldn't be surprised if this does a decent job of accounting for skill drift, despite the questionable choice of causation. I doubt it does a good job of accounting for initial uncertainty, though: under this system, teams would tend to perform very close to their power rank in their first-round series, which ignores the dark-horse effect of a low seed upsetting a high seed in the first round.
Edit: To be clear, despite this criticism, 538's predictions for the NCAA are sure to be far more accurate than my predictions for IsoDom. The reason is that they tuned a model for the purpose of predicting game outcomes, whereas I'm just using TrueSkill with Isotropic parameters, and TrueSkill only has game outcome prediction as an indirect goal. That, plus Nate Silver is incomparably better at statistics than I am.
-
I must be missing something here... You've simulated this 10,000 times to conclude that the people higher up on the Isotropic leaderboard are more likely to win their groups than the people lower down? I don't really understand why the 10,000 simulations were needed...
-
I must be missing something here... You've simulated this 10,000 times to conclude that the people higher up on the Isotropic leaderboard are more likely to win their groups than the people lower down? I don't really understand why the 10,000 simulations were needed...
Well, one thing is that the leaderboard doesn't tell you the chance of each player making it out of the group. It's somewhat interesting to me that I apparently have a 9% chance to advance from my group (under the assumptions of the simulation).
It seems to me that you're objecting to the idea of simulating game/sport tournament results in general... which is fair enough, but in that case I don't understand your confusion.
-
I'll try to trick you guys' intuition into believing the right answer. Consider the case of an incoming contender with no previous data. You just have that wide, mu = 25, big-sigma^2 initial distribution for the player's skill.
Do you still want to ignore the information of her getting into round X when making predictions about her getting into round X + 1?
At least for this case, you agree that taking the intermediate tournament results into account will help your prediction, right?
-
I'll try to trick you guys' intuition into believing the right answer. Consider the case of an incoming contender with no previous data. You just have that wide, mu = 25, big-sigma^2 initial distribution for the player's skill.
Do you still want to ignore the information of her getting into round X when making predictions about her getting into round X + 1?
At least for this case, you agree that taking the intermediate tournament results into account will help your prediction, right?
It depends what you mean by "intermediate tournament results". Using real intermediate tournament results will help, yes, because they give us more information about the player's true skill. Using fake intermediate tournament results will not help at all, because the only information they contain is information we already have.
The key is that there are really two sources of randomness here:
- Our uncertainty about the true skill of the player.
- The luck involved in performance in individual games (some combination of things like shuffle luck, off days, etc.).
It seems to me that the most reasonable way to deal with this is as follows:
- At the beginning of the simulation, from each player's mean skill mu and standard deviation sigma, generate a true skill s for that player. This simulates our uncertainty about the world as it exists now.
- In each game, using the true skills s we generated and using the fixed standard deviation parameter beta=25, generate performance values t to determine the result of the game. This simulates our uncertainty about the future.
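Under this generative model, the head-to-head win probability (ignoring ties) has a closed form: the performance difference t_a - t_b is distributed N(s_a - s_b, 2*beta^2), so P(A wins) = Phi((s_a - s_b) / (beta * sqrt(2))). A sketch using the beta = 25 above:

```python
import math

BETA = 25.0  # fixed performance standard deviation, per the post

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_probability(s_a, s_b, beta=BETA):
    """P(A beats B) when each performance is drawn from N(s, beta^2):
    t_a - t_b ~ N(s_a - s_b, 2*beta^2), so evaluate Phi at the
    standardized skill difference."""
    return phi((s_a - s_b) / (beta * math.sqrt(2.0)))
```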
Then there are two scenarios for how a low-rated player might beat a high-rated player in an early round:
- The first scenario is that the low-rated player was actually better than we thought. In other words, in this simulation, mu was an underestimate of our generated value s. In this case, we don't need to update s after the series, because it is already large (and that's why the player won the series).
- The second scenario is that the low-rated player got good shuffle luck, the high-rated player had an off day, or similar situations. In other words, the performance values of the low-rated player during the series happened to be high, or the performance values of the high-rated player during the series happened to be low. In this case, we shouldn't update s after the series, because there's no reason to think these factors will influence later games.
Either way, we don't update s based on game results. (We might choose to update s based on some model of skill drift, but that's a different issue.)
-
Here's another way of putting it (from a friend):
In a simulation, we choose reality. In real life, we don't know reality, so we have a belief, which might be wrong. But the parameter we choose in a simulation can't be wrong, because it's reality, within the context of the simulation.
-
Now that I realize I was mistaken about the tournament structure, here are the results if we are just interested in how likely it is for each player to reach the 4-player series in their group. (I'm not confident enough to simulate the outcomes of 4-player games, since I think for most players, their ratings are mostly based on their 2-player performance.)
Brackets:
Groups:
Ranking:
91% (avg pts 60): shark_bait
77% (avg pts 55): dghunter79
74% (avg pts 54): O
72% (avg pts 54): Titandrake
31% (avg pts 46): gorgonstar
29% (avg pts 45): antony
23% (avg pts 44): ednever
2% (avg pts 31): Insomniac-X
Ranking:
87% (avg pts 58): tlloyd
76% (avg pts 55): perdhapley
64% (avg pts 53): Axxle
63% (avg pts 52): Jorbles
51% (avg pts 50): [MAD] Mergus
28% (avg pts 45): yuma
27% (avg pts 45): Coheed
5% (avg pts 30): Ignorentmen
Ranking:
90% (avg pts 59): NinjaBus
71% (avg pts 54): michaeljb
68% (avg pts 53): mikemike
62% (avg pts 52): A_S00
49% (avg pts 50): BJ Penn
48% (avg pts 49): mnavratil
11% (avg pts 41): CarpeDeezNuts
1% (avg pts 30): ebEliminator
Ranking:
99% (avg pts 68): RisingJaguar
86% (avg pts 58): greatexpectations
83% (avg pts 57): Mean Mr Mustard
65% (avg pts 53): rspeer
44% (avg pts 49): Brando Commando
17% (avg pts 43): ^_^_^_^
6% (avg pts 38): Nicki Menagerie
0% (avg pts 22): AHoppy
Groups:
Ranking:
95% (avg pts 62): Geronimoo
92% (avg pts 61): Rabid
81% (avg pts 56): lespeutere
73% (avg pts 54): MrEevee
29% (avg pts 46): Dubdubdubdub
28% (avg pts 46): luliin
1% (avg pts 32): Nucleus
1% (avg pts 31): Mangsky
Ranking:
98% (avg pts 65): Fabian
92% (avg pts 59): Brathannes
61% (avg pts 50): StickaRicka
53% (avg pts 49): Lekkit
46% (avg pts 48): JanErik
28% (avg pts 44): Tonks77
20% (avg pts 42): ugasoft
2% (avg pts 32): ArjanB
Ranking:
96% (avg pts 63): WanderingWinder
75% (avg pts 55): Robz888
66% (avg pts 53): blueblimp
65% (avg pts 53): Mic Qsenoch
44% (avg pts 49): Masticore
33% (avg pts 47): pops
21% (avg pts 44): zxcvbn2
0% (avg pts 22): angrybirds
Ranking:
98% (avg pts 66): jonts26
73% (avg pts 54): DG
55% (avg pts 50): The Real ~~**Young Nick**~~
53% (avg pts 50): Kirian
52% (avg pts 49): Graystripe77
51% (avg pts 49): andwilk
17% (avg pts 42): elahrairah13
0% (avg pts 28): fit1one
-
So basically Fit1one is screwed?
-
I could get used to this so called RisingJaguar Effect.
-
I could get used to this so called RisingJaguar Effect.
Don't think you'll enjoy it ever again... Well, maybe right after the isotropic replacement is put in place, we all will
-
I could get used to this so called RisingJaguar Effect.
Don't think you'll enjoy it ever again... Well, maybe right after the isotropic replacement is put in place, we all will
He might not get to enjoy it directly, but it's kinda nice to end up with it named after him. Dark horse low seed wins his division, then two months later is riding the top of the leaderboard? I think the RJ Effect works.
-
Thanks for running these sims. I was wondering what my odds were. I think the calculated odds are about right. I've been getting quite an education from the articles and posts on this board. But now I'm playing against the people writing those articles. This tournament is going to be quite a challenge for me.
-
I could get used to this so called RisingJaguar Effect.
Don't think you'll enjoy it ever again... Well, maybe right after the isotropic replacement is put in place, we all will
He might not get to enjoy it directly, but it's kinda nice to end up with it named after him. Dark horse low seed wins his division, then two months later is riding the top of the leaderboard? I think the RJ Effect works.
As I was following the DS Championships after I got knocked out, I thought it was very fitting that his name was Rising Jaguar.
-
I could get used to this so called RisingJaguar Effect.
Don't think you'll enjoy it ever again... Well, maybe right after the isotropic replacement is put in place, we all will
He might not get to enjoy it directly, but it's kinda nice to end up with it named after him. Dark horse low seed wins his division, then two months later is riding the top of the leaderboard? I think the RJ Effect works.
As I was following the DS Championships after I got knocked out, I thought it was very fitting that his name was Rising Jaguar.
I have heard this a few times :)
I wonder how many people on this forum even know where RisingJaguar comes from...
-
I have heard this a few times :)
I wonder how many people on this forum even know where RisingJaguar comes from...
Even if they didn't, google provides!
It will never beat Hadouken! though!
Or even my personal favourite 'GET OVER HERE'
-
I wonder how many people on this forum even know where RisingJaguar comes from...
Being a Street Fighter player, I thought of Adon the first time I saw the name. :)
-
(http://images4.wikia.nocookie.net/__cb20100824130334/streetfighter/images/8/81/Rising_jaguar.jpg) (http://streetfighter.wikia.com/index.php?title=Rising_Jaguar&image=Rising_jaguar-jpg)
-
Antony is really showing the RisingJaguar effect is still here :)
#ShamelessPlug
-
Update!
Notes
- This simulation was run with the most recent leaderboard, since that is the most accurate available estimate of player skill. (The one exception is ^_^_^_^, who isn't on the most recent leaderboard, so his skill comes from an old leaderboard.) This means that change in win chance comes from some combination of tournament results and change in rating.
- 3-way ties are still handled incorrectly, but it happens infrequently enough that it shouldn't matter at this level of accuracy.
Current Standings
Players are ranked by points per game. In the table below, PPG means points per game, PTS means total points, and GMS means games played.
PPG PTS GMS NAME
Brackets:
Groups:
Ranking:
1.5 31 21 antony
1.4 20 14 O
1.3 18 14 dghunter79
1.0 14 14 shark_bait
0.9 13 14 ednever
0.9 12 14 Tmoiy
0.6 9 14 Insomniac-X
0.4 9 21 Titandrake
Ranking:
1.2 25 21 [MAD] Mergus
1.1 16 14 Axxle
1.1 16 14 Jorbles
1.1 15 14 ignorantmen
0.9 13 14 yuma
0.9 13 14 Coheed
0.9 12 14 tlloyd
0.8 16 21 perdhapley
Ranking:
1.4 20 14 NinjaBus
1.3 18 14 michaeljb
1.1 16 14 ebEliminator
1.0 14 14 mikemike
1.0 14 14 A_S00
0.9 12 14 BJ Penn
0.7 10 14 CarpeDeezNuts
0.6 8 14 mnavratil
Ranking:
1.7 12 7 Mean Mr Mustard
1.4 20 14 RisingJaguar
1.3 18 14 rspeer
0.9 12 14 Brando Commando
0.9 12 14 greatexpectations
0.9 6 7 Nicki Menagerie
0.3 2 7 AHoppy
0.3 2 7 ^_^_^_^
Groups:
Ranking:
1.3 18 14 lespeutere
1.1 16 14 MrEevee
1.1 16 14 Rabid
1.1 8 7 Dubdubdubdub
0.9 12 14 Geronimoo
0.9 12 14 Nucleus
0.9 6 7 luliin
0.7 10 14 Mangsky
Ranking:
1.3 18 14 Lekkit
1.1 24 21 Fabian
1.1 16 14 StickaRicka
1.1 23 21 ugasoft
1.0 14 14 JanErik
0.9 13 14 Tonks77
0.8 11 14 AngBoy
0.5 7 14 ArjanB
Ranking:
1.4 20 14 Mic Qsenoch
1.4 10 7 WanderingWinder
1.1 16 14 Robz888
1.1 16 14 Masticore
1.0 20 21 blueblimp
0.9 13 14 angrybirds
0.7 15 21 zxcvbn2
0.3 2 7 Voltaire
Ranking:
1.6 22 14 jonts26
1.1 8 7 DG
1.1 16 14 Tha Trillest Young Nick
1.0 14 14 Kirian
1.0 14 14 Graystripe77
0.9 12 14 andwilk
0.8 16 21 fit1one
0.7 10 14 elahrairah13
Revised Predictions
Brackets:
Groups:
Ranking:
41% (avg pts 61): O
23% (avg pts 58): shark_bait
22% (avg pts 58): antony
13% (avg pts 56): dghunter79
1% (avg pts 43): Tmoiy
0% (avg pts 42): ednever
0% (avg pts 40): Titandrake
0% (avg pts 30): Insomniac-X
Ranking:
28% (avg pts 54): [MAD] Mergus
23% (avg pts 53): tlloyd
19% (avg pts 52): Jorbles
19% (avg pts 52): Axxle
3% (avg pts 45): Coheed
3% (avg pts 42): ignorantmen
2% (avg pts 46): perdhapley
2% (avg pts 44): yuma
Ranking:
72% (avg pts 63): NinjaBus
17% (avg pts 54): michaeljb
4% (avg pts 49): mikemike
4% (avg pts 50): A_S00
2% (avg pts 48): BJ Penn
0% (avg pts 42): CarpeDeezNuts
0% (avg pts 41): ebEliminator
0% (avg pts 40): mnavratil
Ranking:
77% (avg pts 70): RisingJaguar
20% (avg pts 63): Mean Mr Mustard
1% (avg pts 54): greatexpectations
1% (avg pts 51): rspeer
0% (avg pts 48): Brando Commando
0% (avg pts 41): ^_^_^_^
0% (avg pts 39): Nicki Menagerie
0% (avg pts 23): AHoppy
Groups:
Ranking:
51% (avg pts 63): Rabid
22% (avg pts 59): Geronimoo
18% (avg pts 57): lespeutere
7% (avg pts 54): MrEevee
1% (avg pts 45): Dubdubdubdub
0% (avg pts 41): luliin
0% (avg pts 35): Nucleus
0% (avg pts 34): Mangsky
Ranking:
44% (avg pts 61): Fabian
26% (avg pts 58): Lekkit
13% (avg pts 56): JanErik
9% (avg pts 54): StickaRicka
6% (avg pts 54): ugasoft
1% (avg pts 48): Tonks77
0% (avg pts 28): AngBoy
0% (avg pts 30): ArjanB
Ranking:
54% (avg pts 62): WanderingWinder
24% (avg pts 58): Robz888
18% (avg pts 57): Mic Qsenoch
3% (avg pts 50): Masticore
2% (avg pts 51): blueblimp
0% (avg pts 34): Voltaire
0% (avg pts 37): angrybirds
0% (avg pts 38): zxcvbn2
Ranking:
69% (avg pts 65): jonts26
18% (avg pts 56): DG
9% (avg pts 54): Tha Trillest Young Nick
2% (avg pts 48): Kirian
2% (avg pts 49): Graystripe77
0% (avg pts 43): andwilk
0% (avg pts 40): elahrairah13
0% (avg pts 33): fit1one
Missing Games
Some games from Week 1 and Week 2 are still missing. (Week 3 has tons of missing games since it's still in progress.)
Week 1 missing games:
WanderingWinder vs. Voltaire
Week 2 missing games:
AHoppy vs. Mean Mr. Mustard
Nicki Menagerie vs. ^_^_^_^
Dubdubdubdub vs. luliin
Greystripe77 vs. DG
Data
Since the tournament result data is not in a very handy format, I mostly created my own by going through the posts. (So there may be errors.) In particular, I made sure to use exact isotropic usernames, because otherwise the simulator gets confused. I'm putting the data here in case anyone else wants to use it.
Week 1:
Week One Match-ups:
Pacific Group:
Tmoiy vs. ednever 2-4-1
shark_bait vs. antony 2-5-0
dghunter79 vs. Insomniac-X 5-0-2
O vs. Titandrake 6-1-0
Mountain Group
[MAD] Mergus vs. yuma 4-2-1
Axxle vs. tlloyd 4-3-0
Jorbles vs. perdhapley 5-2-0
Coheed vs. ignorantmen 2-4-1
Central Group:
BJ Penn vs. NinjaBus 2-5-0
ebEliminator vs. mikemike 2-3-2
CarpeDeezNuts vs. A_S00 2-5-0
michaeljb vs. mnavratil 5-2-0
Eastern Group:
Mean Mr Mustard vs. ^_^_^_^ 6-1-0
rspeer vs. AHoppy 6-1-0
Brando Commando vs. Nicki Menagerie 4-3-0
RisingJaguar vs. greatexpectations 5-2-0
Eurasia Group:
Geronimoo vs. lespeutere 3-4-0
MrEevee vs. Rabid 3-4-0
luliin vs. Nucleus 3-4-0
Dubdubdubdub vs. Mangsky 4-3-0
Central Europe Group:
Fabian vs. JanErik 4-3-0
AngBoy vs. ArjanB 4-2-1
Tonks77 vs. ugasoft 3-3-1
Lekkit vs. StickaRicka 5-2-0
Eastern America Group
WanderingWinder vs. Voltaire
zxcvbn2 vs. angrybirds 2-4-1
blueblimp vs. Mic Qsenoch 3-4-0
Robz888 vs. Masticore 4-3-0
Atlantic Group:
Tha Trillest Young Nick vs. Graystripe77 4-3-0
elahrairah13 vs. Kirian 3-4-0
andwilk vs. DG 3-4-0
fit1one vs. jonts26 6-1-0
Week 2:
Week Two Match-ups:
Earthquake Group
antony vs. ednever 5-2-0
Insomniac-X vs. Tmoiy 3-3-1
Titandrake vs. shark_bait 2-5-0
O vs. dghunter79 4-3-0
Avalanche Group
tlloyd vs. [MAD] Mergus 3-4-0
perdhapley vs. yuma 3-4-0
ignorantmen vs. Axxle 3-4-0
Coheed vs. Jorbles 4-3-0
Tornado Group
mikemike vs. BJ Penn 3-4-0
A_S00 vs. NinjaBus 2-5-0
mnavratil vs. ebEliminator 2-5-0
michaeljb vs. CarpeDeezNuts 4-3-0
Jersey Shore Group
AHoppy vs. Mean Mr Mustard
Nicki Menagerie vs. ^_^_^_^
greatexpectations vs. rspeer 4-3-0
RisingJaguar vs. Brando Commando 5-2-0
Eurasia Group:
Geronimoo vs. Rabid 3-4-0
lespeutere vs. Nucleus 5-2-0
MrEevee vs. Mangsky 5-2-0
Dubdubdubdub vs. luliin
Central Europe Group:
Fabian vs. ArjanB 6-1-0
JanErik vs. ugasoft 4-3-0
AngBoy vs. StickaRicka 1-6-0
Lekkit vs. Tonks77 4-3-0
Eastern America Group
WanderingWinder vs. angrybirds 5-2-0
Voltaire vs. Mic Qsenoch 1-6-0
zxcvbn2 vs. Masticore 2-5-0
Robz888 vs. blueblimp 4-3-0
Atlantic Group:
Tha Trillest Young Nick vs. Kirian 4-3-0
Graystripe77 vs. DG
elahrairah13 vs. jonts26 2-5-0
fit1one vs. andwilk 4-3-0
Week 3:
Week Three Match-ups:
Pacific Coast Group
ednever vs. Insomniac-X
antony vs. Titandrake 5-1-1
Tmoiy vs. O
shark_bait vs. dghunter79
What's a Coast? Group
[MAD] Mergus vs. perdhapley 4-3-0
tlloyd vs. ignorantmen
yuma vs. Coheed
Axxle vs. Jorbles
Gulf Coast Group
BJ Penn vs. A_S00
mikemike vs. mnavratil
NinjaBus vs. michaeljb
ebEliminator vs. CarpeDeezNuts
Atlantic Coast Group
Mean Mr Mustard vs. Nicki Menagerie
AHoppy vs. greatexpectations
^_^_^_^ vs. RisingJaguar
rspeer vs. Brando Commando
Eurasia Group:
Geronimoo vs. Nucleus
Rabid vs. Mangsky
lespeutere vs. Dubdubdubdub
MrEevee vs. luliin
Central Europe Group:
Fabian vs. ugasoft 2-5-0
ArjanB vs. StickaRicka
JanErik vs. Lekkit
AngBoy vs. Tonks77
Eastern America Group
WanderingWinder vs. Mic Qsenoch
angrybirds vs. Masticore
Voltaire vs. Robz888
zxcvbn2 vs. blueblimp 3-4-0
Atlantic Group:
Young Nick vs. DG
Kirian vs. jonts26
Graystripe77 vs. fit1one 4-3-0
elahrairah13 vs. andwilk
-
You've got some weird stuff going on in my division. I believe I should have 22 points and fit1one should have 16. And in the week 2 results it should be jonts26 over elahrairah13, 5 games to 2.
-
You've got some weird stuff going on in my division. I believe I should have 22 points and fit1one should have 16. And in the week 2 results it should be jonts26 over elahrairah13, 5 games to 2.
Thanks, my bad. In week 1, I typo'd as fit1one winning 6-1-0 over you, when it was actually you winning 6-1-0. The week 2 results are correct (and match what you say in your post).
I'll fix the post above momentarily.
Edit: Maybe not surprisingly, fixing this typo makes a huge difference to your group's win chances. It takes you from 26% to 69%!
-
:minor necro:
I'll try to trick your intuition into believing the right answer. Consider the case of an incoming contender with no previous data. You just have that wide initial distribution (mu = 25, big sigma^2) for the player's skill.
Do you still want to ignore the information that she got into round X when making predictions about her getting into round X + 1?
At least for this case, you agree that taking the intermediate tournament results into account will help your prediction, right?
blueblimp is entirely correct. I believe what's happening here, rrenaud, is that your intuition is saying "surely updating on the fact that she made it to round X moves our belief about her skill up, because there's far more probability mass in (she got to round X with high skill) than (she got to round X with low skill)". This is entirely accurate in a world where we have uncertainty about her skill. blueblimp's model is different. The initial mu = 25, sigma^2 large distribution is a measure of our uncertainty about her skill. We have two basic options:
1. Calculate the odds of her winning round 1 while incorporating that uncertainty, then update our uncertainty to one thing in the branch of the problem where we suppose she won and to another thing in the branch where we suppose she lost. Keep going until we have the probabilities of all n! outcomes, then combine the outcomes that share the same winner.
2. Instead of propagating and updating the uncertainty, resolve it artificially right now. Sample a single point randomly from that distribution and suppose that is her actual skill. No uncertainty left. Now do the math as if we had no skill uncertainty, find the distribution of winners, record it, repeat the whole thing a bajillion times, and combine the resulting distributions to get the win probabilities.
(between these options, it's possible to narrow down player skills by sampling and then propagate the uncertainty, but I think that's not useful if our uncertainties start out nice and normal)
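For concreteness, option 2 can be sketched in a few lines of Python. The group, the (mu, sigma) ratings, and the seven-games-per-match format below are made-up illustrative values, not real tournament data, and I'm using the 4^(X/25) win model that appears elsewhere in this thread rather than TrueSkill's own performance model:

```python
import random

# Hypothetical (name, mu, sigma) ratings -- illustrative only.
PLAYERS = [
    ("strong", 50.0, 3.5),
    ("mid_a", 45.0, 3.0),
    ("mid_b", 44.0, 2.5),
    ("weak", 30.0, 4.0),
]

def win_prob(diff):
    # The 4^(X/25) model: chance the side with skill advantage `diff` wins a game.
    p = 4 ** (diff / 25)
    return p / (p + 1)

def simulate(trials=20000, games_per_match=7, seed=0):
    rng = random.Random(seed)
    names = [name for name, _, _ in PLAYERS]
    wins = dict.fromkeys(names, 0)
    for _ in range(trials):
        # Option 2: resolve each player's skill ONCE per simulated tournament,
        # then treat it as fixed for every game in the round robin.
        skills = {name: rng.gauss(mu, sigma) for name, mu, sigma in PLAYERS}
        points = dict.fromkeys(names, 0)
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                a, b = names[i], names[j]
                p = win_prob(skills[a] - skills[b])
                for _ in range(games_per_match):
                    if rng.random() < p:
                        points[a] += 1
                    else:
                        points[b] += 1
        # Ties are broken arbitrarily by max(); the post notes the real
        # tie rules aren't handled exactly either.
        wins[max(points, key=points.get)] += 1
    return {name: wins[name] / trials for name in names}

group_win_chance = simulate()
```

Sampling each skill once per trial (rather than once per game) is the key move: it makes one player's results correlated across the whole round robin, which is what the fixed-but-unknown-skill interpretation of the ratings calls for.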
Which of these two is better? I think 2. converges to 1. in the limit and is much easier to code... Let's see what 1. would look like. I'll make the same false (but convenient) assumption that TrueSkill makes about uncertainty, which is that we can keep it normal the whole way.
One round: New (25, 8.33^2) versus blueblimp (42.4, 2.3^2)
The probability that blueblimp's skill is X greater than New's follows N(42.4-25, 2.3^2+8.33^2), and a player with X greater skill wins with probability 4^(X/25)/(4^(X/25)+1). Plugging into Wolfram Alpha (1), that's a 71.5% chance blueblimp wins, and their new skill distributions according to the rank calculator (2) are
blueblimp wins: New (24.258, 7.802^2) and blueblimp (42.457, 2.291^2).
New wins: New (38.998, 5.594^2) and blueblimp (41.332, 2.253^2).
Now just do that lots of times and you can get the full table! A language with a numerical integration module will make this easier. :D
(1) http://www.wolframalpha.com/input/?i=int%281%2Fsqrt%282*pi%29*1%2F%282.3%5E2%2B8.33%5E2%29%5E.5*e%5E-%28%28X-%2842.4-25%29%29%5E2%2F%282*%282.3%5E2%2B8.33%5E2%29%29%29*4%5E%28X%2F25%29%2F%284%5E%28X%2F25%29%2B1%29%2CX%3D-inf..inf%29
(2) http://atom.research.microsoft.com/trueskill/rankcalculator.aspx
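If you'd rather not lean on Wolfram Alpha, the integral in (1) is easy to reproduce with a plain midpoint rule. This is just a sketch of that one computation in pure Python (no external libraries), marginalizing the 4^(X/25) win model over the skill-difference distribution N(42.4 - 25, 2.3^2 + 8.33^2):

```python
import math

MU_DIFF = 42.4 - 25.0          # mean skill difference, blueblimp minus New
VAR_DIFF = 2.3 ** 2 + 8.33 ** 2  # variances add for a difference of normals

def win_given_diff(x):
    # Win probability for the side with skill advantage x, per the thread's model.
    p = 4 ** (x / 25)
    return p / (p + 1)

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def win_probability(steps=20000):
    # Midpoint rule over +/- 8 standard deviations of the difference.
    sd = math.sqrt(VAR_DIFF)
    lo, hi = MU_DIFF - 8 * sd, MU_DIFF + 8 * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += normal_pdf(x, MU_DIFF, VAR_DIFF) * win_given_diff(x)
    return total * h

p_blueblimp_wins = win_probability()  # comes out near the 71.5% quoted above
```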
-
Thanks for your post. I don't follow your method 1, though. Does the approach you describe calculate exact probabilities of all outcomes?
-
Thanks for your post. I don't follow your method 1, though. Does the approach you describe calculate exact probabilities of all outcomes?
Yep. Or it would be exact, except that we're not doing a true update of our prior normal probability; instead we're finding the means and variances of the resulting distributions and calling the results normal distributions with those means and variances.
-
Thanks for your post. I don't follow your method 1, though. Does the approach you describe calculate exact probabilities of all outcomes?
Yep. Or it would be exact, except that we're not doing a true update of our prior normal probability; instead we're finding the means and variances of the resulting distributions and calling the results normal distributions with those means and variances.
I think I see now, thanks. Unfortunately it's impractical here because of the high number of games played. For a single series, it could probably work.
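For a single game, that moment-matching update can be sketched numerically. The code below multiplies New's N(25, 8.33^2) prior by the likelihood of the observed loss and reads off the posterior mean and standard deviation. It fixes blueblimp's skill at 42.4 instead of integrating over his (small) uncertainty, and it uses the 4^(X/25) win model from this thread rather than TrueSkill's own, so the numbers will differ a bit from the rank-calculator values quoted earlier:

```python
import math

PRIOR_MU, PRIOR_SIGMA = 25.0, 8.33  # New's prior skill distribution
OPPONENT = 42.4                      # blueblimp's skill, held fixed for simplicity

def lose_prob(skill):
    # Chance New loses to the opponent, under the 4^(X/25) model.
    p = 4 ** ((OPPONENT - skill) / 25)
    return p / (p + 1)

def moment_matched_posterior(steps=20000):
    lo = PRIOR_MU - 8 * PRIOR_SIGMA
    hi = PRIOR_MU + 8 * PRIOR_SIGMA
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    # Unnormalized posterior: prior density times likelihood of the observed loss.
    ws = [
        math.exp(-(x - PRIOR_MU) ** 2 / (2 * PRIOR_SIGMA ** 2)) * lose_prob(x)
        for x in xs
    ]
    z = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / z
    return mean, math.sqrt(var)

mean, sigma = moment_matched_posterior()
```

The qualitative behavior matches the worked example above: losing to a stronger player nudges New's mean down a little and shrinks the uncertainty slightly, because a loss was already the expected outcome.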
-
One thing that was often on my mind during this whole tournament was this prediction thread. I thought it was a fun way to quantify how often a player should win. This seems like a nice time to revisit the results; below I have bolded the winners.
Brackets:
Groups:
Ranking:
41% (avg pts 61): O
23% (avg pts 58): shark_bait
22% (avg pts 58): antony
13% (avg pts 56): dghunter79
1% (avg pts 43): Tmoiy
0% (avg pts 42): ednever
0% (avg pts 40): Titandrake
0% (avg pts 30): Insomniac-X
Ranking:
28% (avg pts 54): [MAD] Mergus
23% (avg pts 53): tlloyd
19% (avg pts 52): Jorbles
19% (avg pts 52): Axxle
3% (avg pts 45): Coheed
3% (avg pts 42): ignorantmen
2% (avg pts 46): perdhapley
2% (avg pts 44): yuma
Ranking:
72% (avg pts 63): NinjaBus
17% (avg pts 54): michaeljb
4% (avg pts 49): mikemike
4% (avg pts 50): A_S00
2% (avg pts 48): BJ Penn
0% (avg pts 42): CarpeDeezNuts
0% (avg pts 41): ebEliminator
0% (avg pts 40): mnavratil
Ranking:
77% (avg pts 70): RisingJaguar
20% (avg pts 63): Mean Mr Mustard
1% (avg pts 54): greatexpectations
1% (avg pts 51): rspeer
0% (avg pts 48): Brando Commando
0% (avg pts 41): ^_^_^_^
0% (avg pts 39): Nicki Menagerie
0% (avg pts 23): AHoppy
Groups:
Ranking:
51% (avg pts 63): Rabid
22% (avg pts 59): Geronimoo
18% (avg pts 57): lespeutere
7% (avg pts 54): MrEevee
1% (avg pts 45): Dubdubdubdub
0% (avg pts 41): luliin
0% (avg pts 35): Nucleus
0% (avg pts 34): Mangsky
Ranking:
44% (avg pts 61): Fabian
26% (avg pts 58): Lekkit
13% (avg pts 56): JanErik
9% (avg pts 54): StickaRicka
6% (avg pts 54): ugasoft
1% (avg pts 48): Tonks77
0% (avg pts 28): AngBoy
0% (avg pts 30): ArjanB
Ranking:
54% (avg pts 62): WanderingWinder
24% (avg pts 58): Robz888
18% (avg pts 57): Mic Qsenoch
3% (avg pts 50): Masticore
2% (avg pts 51): blueblimp
0% (avg pts 34): Voltaire
0% (avg pts 37): angrybirds
0% (avg pts 38): zxcvbn2
Ranking:
69% (avg pts 65): jonts26
18% (avg pts 56): DG
9% (avg pts 54): Tha Trillest Young Nick
2% (avg pts 48): Kirian
2% (avg pts 49): Graystripe77
0% (avg pts 43): andwilk
0% (avg pts 40): elahrairah13
0% (avg pts 33): fit1one
-
Note that those predictions included two-and-a-half weeks' worth of results.