Topic: Some Statistics on Ratings (Read 5411 times)

markus · « **on:** February 07, 2018, 05:27:45 pm »

For this analysis, I’m using the same data as Scavenger. (If you don’t know it, check it out!)
I’m using all rated 2-player games played until January 29th.

Rating System

I’ll mostly talk about mu, so let’s start with a quick summary of the rating system. You can also find more info here and in the links contained therein.

1) mu (µ): this is the best measure of your skill and everyone starts with mu=0. It’s a relative measure and the expected win percentage between two players mostly depends on the difference between the two players’ mu. For example, a difference of 1 corresponds to about 73% chance of winning (ties always count as half a win). Here’s a graph that shows this probability in general:

2) phi (ϕ): the second parameter measures the uncertainty around the skill mu. In 95% of the cases a player’s true skill should lie in the interval [mu-2*phi,mu+2*phi]. Players start with phi=0.75.

3) Level: the level is simply calculated as 50+7.5*(mu-2*phi). It is therefore a conservative measure of your skill as it takes the lower bound of the interval given above. That also means that players with fewer games (recently) are on average underrated in terms of their level. But you can’t sit on your high level after some (lucky) wins.

4) sigma (σ): this is a measure of the stability of your skill. Players start with sigma=0.033 and it doesn’t move much, because the stability of mu is hard to estimate given the few games per rating period (=1 day). Given this assumed parameter, the skill of a typical player gains or loses about 0.033 per day. This makes the estimate of the skill less certain when a player doesn’t play (much), so phi increases.
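
As a rough sketch of points 1) and 3): assuming the win probability is a plain logistic function of the mu difference (this ignores the small correction Glicko-2 applies for the opponent's phi), the 73% figure and the starting level can be reproduced as follows. The function names are only for illustration.

```python
import math

def win_probability(mu_diff):
    """Expected score of a player whose mu is mu_diff above the opponent's.
    Assumption: plain logistic curve, opponent's phi ignored."""
    return 1.0 / (1.0 + math.exp(-mu_diff))

def level(mu, phi):
    """Level as defined in the post: 50 + 7.5 * (mu - 2 * phi)."""
    return 50 + 7.5 * (mu - 2 * phi)

print(round(win_probability(1.0), 3))  # 0.731, the "about 73%" from point 1)
print(level(0.0, 0.75))                # 38.75, the level of a brand-new player
```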

How does the rating change?
In theory, it’s simple: mu increases if you win more games than you were expected to. Scavenger also calculates that for you. How much mu changes also depends on your uncertainty phi. The more certain your rating is, the less it will change.
In particular, the formula is:
mu_change = phi^2 * (actual_wins - expected_wins)
So, if your phi=0.2, winning or losing a game makes a difference of mu=0.04 (or level=0.3). If you were expected to win with 75%, then winning adds mu=0.01 and losing subtracts mu=0.03.
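
The arithmetic above can be checked with a few lines. This is only a sketch of the simplified formula as stated in the post; it treats phi as fixed during the update, and the level figure ignores the simultaneous change in phi.

```python
def mu_change(phi, actual_wins, expected_wins):
    """Simplified update from the post: phi^2 * (actual - expected)."""
    return phi ** 2 * (actual_wins - expected_wins)

phi = 0.2
print(round(mu_change(phi, 1, 0.75), 4))  # 0.01  (win as a 75% favourite)
print(round(mu_change(phi, 0, 0.75), 4))  # -0.03 (loss as a 75% favourite)

# Gap between winning and losing one game, in mu and in level (7.5 per mu)
gap = mu_change(phi, 1, 0.75) - mu_change(phi, 0, 0.75)
print(round(gap, 4), round(7.5 * gap, 4))  # 0.04 0.3
```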

Uncertainty phi decreases with each game played and increases due to sigma. If your opponent is closer to your skill, phi will decrease more, as the result is more informative (what matters is win_probability*(1-win_probability)). If you play a constant number of games per day, your phi will converge to a certain value. (If you play less afterwards, it will increase again, and vice versa.)
For example, if you play 1/5/10 games per day, phi will end up around 0.26/0.17/0.15.
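
A minimal sketch of that convergence, under assumptions that are mine rather than the post's: a Glicko-2 style daily update (variance grows by sigma^2, then each game adds win_probability*(1-win_probability) of information), every opponent evenly matched (win probability 0.5), and the opponent's phi ignored. Under these idealised assumptions it gives roughly 0.26/0.17/0.14; real opponents are not always evenly matched, which makes each game less informative and the long-run phi somewhat higher.

```python
import math

SIGMA = 0.033  # daily volatility, the starting value quoted in the post

def steady_state_phi(games_per_day, win_prob=0.5, phi=0.75, days=2000):
    """Iterate the daily phi update until it settles.
    Assumptions: constant sigma, identical opponents, opponent phi ignored."""
    info = games_per_day * win_prob * (1 - win_prob)  # information per day
    for _ in range(days):
        inflated_var = phi ** 2 + SIGMA ** 2          # uncertainty grows overnight
        phi = 1.0 / math.sqrt(1.0 / inflated_var + info)  # and shrinks with games
    return phi

for n in (1, 5, 10):
    print(n, round(steady_state_phi(n), 2))
```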

Games Played

Here’s the number of those rated 2-player games recorded per day and the number of what I defined as “active players”, i.e. having played at least 10 games in the last 30 days.
Edit: the number of games in the left graph should be halved because each game is counted for each player, hence twice.
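
For illustration, here is one way the two counts could be computed. The data layout (one `(player, date)` row per player per game) and the exact window boundaries are assumptions for this sketch, not the format of the actual data dump.

```python
from collections import Counter
from datetime import date, timedelta

def active_players(player_games, today, min_games=10, window_days=30):
    """Players with at least min_games rows in the last window_days days.
    player_games: list of (player, game_date), one row per player per game."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(p for p, d in player_games if cutoff < d <= today)
    return {p for p, n in counts.items() if n >= min_games}

def games_on(player_games, day):
    """Number of 2-player games on a day; rows are halved because each
    game appears once for each of its two players."""
    rows = sum(1 for _, d in player_games if d == day)
    return rows // 2

# Made-up example: "a" and "b" play each other 10 times on one day
day = date(2018, 1, 29)
rows = [("a", day), ("b", day)] * 10
print(games_on(rows, day), sorted(active_players(rows, day)))  # 10 ['a', 'b']
```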

There are around ~~20,000~~ 10,000 games played per day and active players are around 5,000. You can notice the reduction in games played in late October, when Nocturne preview was available.

Distribution of Skill

Here’s the histogram of the current skill of all players, of active players only, and the one weighted by the number of games played (in that one, mu is the value on the day the game was played):

The following heat maps show which players get matched most frequently. The right one zooms in on games with at least one player having mu=1.5:

You can see above that the distribution is not centred on mu=0 anymore, but the average is negative. Here is how the average has evolved since the start of the leaderboard:

First, let me be clear that this decline is not a big problem, because what matters is not the absolute value of mu but the difference between two players.
But what’s the reason? As described above, the change of mu depends on the difference between actual wins and expected wins and phi. The former is symmetric: if player 1 outperforms expectations, player 2 underperforms by the same amount. But phi can differ between the two. In particular, if the underperformer has a higher phi than the overperformer, mu of the underperformer will fall more than mu of the overperformer increases and average mu falls. This could happen, because new players (high phi) are doing worse than expected (mu=0) or players that have been away for some time (higher phi) are playing worse than before.
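
A small numerical illustration of this asymmetry, using the simplified update formula from earlier in the post. The phi values are made up for the example: a new player at the starting phi=0.75 loses an even game to an established player with phi=0.2.

```python
def mu_change(phi, actual_wins, expected_wins):
    """Simplified update from the post: phi^2 * (actual - expected)."""
    return phi ** 2 * (actual_wins - expected_wins)

new_player_phi, regular_phi = 0.75, 0.2    # illustrative values only

loser = mu_change(new_player_phi, 0, 0.5)  # new player loses a 50/50 game
winner = mu_change(regular_phi, 1, 0.5)    # established player wins it

print(round(winner, 5))           # 0.02
print(round(loser, 5))            # -0.28125
print(round(winner + loser, 5))   # -0.26125: total mu in the system fell
```

The surprise (actual minus expected) is the same size for both players, but the high-phi player moves about 14 times as far, so the sum of the two changes is negative.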

Two things to note are the breaks in the red curve of active players above: at the end of May, the decline stopped when the matching system was changed to make the default match more even (smaller level difference allowed). The second break was at the end of July, when the parameters of the rating system were changed. That increased the level of new players to 38.75 and made matches of new players with experienced positive-mu players more likely.
(Note: I calculated each player’s mu from the start using the current parameters, such that there’s no break in the method. Lowering starting phi from 2 to 0.75 helped to keep average mu more stable, because new players don’t lose that much rating on their first losses anymore. If I calculated today’s ratings with the original parameters, the average would be at -0.85 for all players and -0.6 for active players.)

To round this off, here are the upper percentiles and how they have evolved:

Beat the Expectation?
If you want to increase your mu, you need to play better than expected. A question that regularly comes up is whether it’s more beneficial to play a better or weaker opponent. For that I look at the difference between expected outcome and actual outcome for different bins of level difference (I use level here, because that’s what you can set in your matching options). I restrict the sample to the better player being at least level 45. The result is the left panel of this graph:

It shows that a better player slightly underperforms when facing a weaker player. But the difference is hardly significant: playing someone 8 levels higher would give you a 1% better outcome than playing someone 8 levels lower. Therefore, when averaged over all players, the theoretical win probability shown in the first graph matches the outcome well. Some players might still do better when facing someone stronger or weaker.
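
The binning itself can be sketched like this. The record layout, the bin width and the sample numbers are all assumptions for illustration; the post does not state which bins were used.

```python
from collections import defaultdict

def overperformance_by_bin(games, bin_width=2):
    """Average (actual - expected) per bin of level difference.
    games: iterable of (level_diff, expected, actual), all seen from the
    better player's side; actual is 1, 0.5 or 0."""
    sums = defaultdict(lambda: [0.0, 0])   # bin -> [total surprise, count]
    for level_diff, expected, actual in games:
        b = int(level_diff // bin_width) * bin_width
        sums[b][0] += actual - expected
        sums[b][1] += 1
    return {b: total / n for b, (total, n) in sorted(sums.items())}

# Made-up records, not real data
sample = [(1.0, 0.55, 1), (1.5, 0.55, 0), (8.2, 0.80, 1)]
for b, avg in overperformance_by_bin(sample).items():
    print(b, round(avg, 3))
```

A value below zero in a bin means the better player won less often than the rating predicted for that level difference.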

The right graph shows the overperformance in the n-th game of a player on a given day (only using players with already 100 games). You might think that it’s harder to focus on many games in a row, but that graph doesn’t show a strong effect, either. The caveat is that I can only use the rating day, so I can’t see whether there have been some hours of break between games. If someone plays around 0:00 UTC, their games also count for two days.
What you can see from the right graph is that there is an outperformance on average for those players with 100+ games. That means that those players tend to increase their rating when they play. So let’s have a look at the correlation between games played and skill in the following heat map:

There is a mildly positive relationship between the total number of games played and a player’s mu. But you can also see that there’s a lot of variance and playing many games is not sufficient for becoming a good player. Hence, you might want to spend some time on the other sections of this forum or the discord channel.

Cave-o-sapien · « **Reply #1 on:** February 07, 2018, 06:10:19 pm »

This is a really nice writeup and compilation of that discord data dump.

I think the correlation between games played and µ is hiding some important information: the historical record of all games played on other systems. Of course you don't have that data, and maybe it's missing completely at random, but I doubt it.

markus · « **Reply #2 on:** February 08, 2018, 05:31:18 am »

Quote from: Cave-o-sapien on February 07, 2018, 06:10:19 pm

I think the correlation between games played and µ is hiding some important information: the historical record of all games played on other systems. Of course you don't have that data, and maybe it's missing completely at random, but I doubt it.

That is true to some extent. I only took accounts with at least 100 games. By then mu should be about where the starting skill is due to previous experience. That explains the dispersion at the left end. But the other problem is that I'm just looking at the cross-section of players right now.

So let me attempt something different that aims at seeing how a player's skill changes over time. Here I'm only taking the 1745 players with at least 1000 games and looking at how their mu has changed since game 100:

Chappy7 · « **Reply #3 on:** February 08, 2018, 04:22:34 pm »

My little brain can't handle this post

timchen · « **Reply #4 on:** February 09, 2018, 10:22:21 am »

Why is the update of mu proportional to phi^2?

markus · « **Reply #5 on:** February 09, 2018, 10:45:25 am »

Quote from: timchen on February 09, 2018, 10:22:21 am

Why is the update of mu proportional to phi^2?

It's intuitive that the update should increase in phi as higher uncertainty about the skill makes you update your beliefs more when new information comes in. But I guess your question is why it's the square. For that you'd have to consult the Glicko paper.
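
For reference, a sketch of where the square shows up in the published Glicko-2 update. The symbols g, E, v, s_j and phi* are Glicko-2's own notation and are not used elsewhere in this thread; the simplified formula in the first post corresponds to treating g as 1.

```latex
\phi' = \left( \frac{1}{\phi^{*2}} + \frac{1}{v} \right)^{-1/2},
\qquad
\mu' = \mu + \phi'^{2} \sum_j g(\phi_j)\,\bigl(s_j - E(\mu, \mu_j, \phi_j)\bigr)
```

One way to read this: phi^2 is a variance, and in an approximately Gaussian update the mean shifts by the (posterior) variance times the slope of the log-likelihood. For a logistic win probability, that slope is g*(s - E), which is the "actual minus expected" term.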

gloures · « **Reply #6 on:** July 23, 2018, 03:26:10 am »

One thing I see empirically from a few players who seem to outperform their apparent skill on the ladder: when the difference in level is really big (at least 15+, I think), players seem to outperform their expectation. There were a few people who seemed to ladder effectively by only playing far weaker opponents (even without board rigging).

markus · « **Reply #7 on:** July 23, 2018, 04:57:45 am »

If someone only plays people 10 or 15 levels below, it's not possible to judge whether they outperform or not, because there would only be observations against weak players.

In general, the rating converges to a level such that it is on average right for the set of opponents you play. So strictly speaking, with the data I can only say that people who play opponents of different skill levels don't perform better in a subset of those opponents.

Luther · « **Reply #8 on:** November 15, 2018, 02:17:28 pm »

How do you calculate your wins and losses from these numbers??

markus · « **Reply #9 on:** November 15, 2018, 03:20:05 pm »

Quote from: Luther on November 15, 2018, 02:17:28 pm

How do you calculate your wins and losses from these numbers??

You can check Scavenger for your past games and the record: http://dominion.lauxnet.com/scavenger/

Dominion Strategy Forum
