
**Dominion Online at Shuffle iT / Some Statistics on Ratings**

« **on:** February 07, 2018, 05:27:45 pm »

For this analysis, I’m using the same data as Scavenger. (If you don’t know it, check it out!)

I’m using all rated 2-player games played until January 29th.

**Rating System**

I’ll mostly talk about mu, so let’s start with a quick summary of the rating system. You can also find more info here and in the links contained therein.

**1) mu (µ)**: this is the best measure of your skill, and everyone starts with mu=0. It’s a relative measure: the expected win percentage between two players mostly depends on the difference between their mu values. For example, a difference of 1 corresponds to about a 73% chance of winning (ties always count as half a win). Here’s a graph that shows this probability in general:

**2) phi (ϕ)**: the second parameter measures the uncertainty around the skill mu. In 95% of cases, a player’s true skill should lie in the interval [mu-2*phi, mu+2*phi]. Players start with phi=0.75.

**3) Level**: the level is simply calculated as 50+7.5*(mu-2*phi). It is therefore a conservative measure of your skill, as it takes the lower bound of the interval given above. That also means that players who have played few games (recently) are on average underrated in terms of their level, and that you can’t sit on a high level after a few (lucky) wins.

**4) sigma (σ)**: this is a measure of the stability of your skill. Players start with sigma=0.033, and it doesn’t move much, because the stability of mu is hard to estimate from the few games in a rating period (=1 day). Given this assumed parameter, the skill of a typical player gains or loses about 0.033 per day. This makes the skill estimate less certain when a player doesn’t play (much), so phi increases.

**How does the rating change?**

In theory, it’s simple: mu increases if you win more games than you were expected to. Scavenger also calculates that for you. How much mu changes also depends on your uncertainty phi: the more certain your rating is, the less it will change.

In particular, the formula is:

mu_change = phi^2 * (actual_wins - expected_wins)

So, if your phi=0.2, the gap between winning and losing a single game is mu=0.04 (or level=0.3). If you were expected to win with 75% probability, then winning adds mu=0.01 and losing subtracts mu=0.03.
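To make the definitions and the update rule concrete, here is a minimal sketch. The function names are mine, and the win probability assumes a plain logistic curve in the mu difference that ignores the opponent’s phi; that is an assumption, chosen because it reproduces the 73% figure quoted above.

```python
import math

def expected_win(mu_a, mu_b):
    """Win probability of player A, assuming a logistic curve in the mu difference."""
    return 1.0 / (1.0 + math.exp(-(mu_a - mu_b)))

def level(mu, phi):
    """Level as defined in the post: the lower bound mu - 2*phi, rescaled."""
    return 50 + 7.5 * (mu - 2 * phi)

def mu_change(phi, actual_wins, expected_wins):
    """Simplified update from the post: phi^2 * (actual - expected)."""
    return phi ** 2 * (actual_wins - expected_wins)

print(round(expected_win(1.0, 0.0), 3))   # 0.731
print(level(0.0, 0.75))                   # 38.75, the level of a new player
print(round(mu_change(0.2, 1, 0.75), 4))  # 0.01
print(round(mu_change(0.2, 0, 0.75), 4))  # -0.03
```

The last two lines reproduce the phi=0.2 example: a win at 75% expectation is worth far less than a loss costs.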

Uncertainty phi decreases with each game played and increases due to sigma. If your opponent is closer to your skill, phi will decrease more, as the result is more informative (what matters is win_probability*(1-win_probability)). If you play a constant number of games per day, your phi will converge to a certain value. (If you play less afterwards, it will increase again, and vice versa.)

For example, if you play 1/5/10 games per day, phi will end up around 0.26/0.17/0.15.
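The convergence can be sketched by iterating a Glicko-2 style daily update of phi until it settles. This is only an illustration under simplifying assumptions that are mine, not the post’s: every opponent is evenly matched, the opponent’s own uncertainty is ignored, and sigma stays fixed at 0.033.

```python
import math

SIGMA = 0.033  # volatility per rating day, as quoted in the post

def steady_state_phi(games_per_day, win_prob=0.5, sigma=SIGMA, days=2000):
    """Iterate a daily phi update until it settles.

    Assumptions: constant win_prob against every opponent, opponent
    uncertainty ignored, sigma fixed.
    """
    phi = 0.75  # starting uncertainty of a new player
    # Information gained per day: games * p * (1 - p)
    info = games_per_day * win_prob * (1 - win_prob)
    for _ in range(days):
        phi_star_sq = phi ** 2 + sigma ** 2              # uncertainty grows between days
        phi = 1.0 / math.sqrt(1.0 / phi_star_sq + info)  # and shrinks with games played
    return phi

for n in (1, 5, 10):
    print(n, round(steady_state_phi(n), 3))
```

Under these assumptions the values come out near the ones quoted above; small differences are to be expected, because real opponents are not perfectly matched and the sketch leaves out parts of the full system.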

**Games Played**

Here’s the number of rated 2-player games recorded per day and the number of what I defined as “active players”, i.e. those who have played at least 10 games in the last 30 days.

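*Edit: the number of games in the left graph should be halved because each game is counted for each player, hence twice.*

There are around ~~20,000~~ 10,000 games played per day, and active players number around 5,000. You can notice the reduction in games played in late October, when the Nocturne preview was available.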

**Distribution of Skill**

Here’s the histogram of the current skill of all players, of only active players, and one weighted by the number of games played (in that one, mu is the value on the day the game was played):

The following heat maps show which players get matched most frequently. The right one zooms in on games with at least one player having mu=1.5:

You can see above that the distribution is not centred on mu=0 anymore, but the average is negative. Here is how the average has evolved since the start of the leaderboard:

First, let me be clear that this decline is not a big problem, because what matters is not the absolute value of mu but the difference between two players.

But what’s the reason? As described above, the change of mu depends on phi and on the difference between actual wins and expected wins. The latter is symmetric: if player 1 outperforms expectations, player 2 underperforms by the same amount. But phi can differ between the two. In particular, if the underperformer has a higher phi than the overperformer, the underperformer’s mu falls by more than the overperformer’s mu rises, and average mu falls. This could happen because new players (high phi) are doing worse than expected (mu=0), or because players who have been away for some time (higher phi) are playing worse than before.
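A single game is enough to see the asymmetry. The pairing below is hypothetical: a new player with phi=0.75 loses to an established player with phi=0.2 in a game the ratings considered a coin flip, and both are updated with the simplified formula from above.

```python
def mu_change(phi, actual, expected):
    """Simplified update from the post: phi^2 * (actual - expected)."""
    return phi ** 2 * (actual - expected)

# Hypothetical game: the ratings expect a 50/50 outcome and the newcomer loses.
expected = 0.5
newcomer_delta = mu_change(0.75, 0, expected)    # large drop because phi is high
veteran_delta = mu_change(0.2, 1, 1 - expected)  # small gain because phi is low

print("newcomer:", round(newcomer_delta, 5))
print("veteran: ", round(veteran_delta, 5))
print("net change in total mu:", round(newcomer_delta + veteran_delta, 5))
```

The newcomer loses roughly 0.28 while the veteran gains only 0.02, so the sum of both ratings drops even though the surprise was the same size on both sides.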

Note the two breaks in the red curve of active players above. At the end of May, the decline stopped when the matching system was changed to make the default match more even (a smaller level difference allowed). The second break was at the end of July, when the parameters of the rating system were changed. That increased the level of new players to 38.75 and made matches between new players and experienced players with positive mu more likely.

(Note: I calculated each player’s mu from the start using the current parameters, so that there’s no break in the method. Lowering starting phi from 2 to 0.75 helped to keep average mu more stable, because new players don’t lose as much rating on their first losses anymore. If I calculated today’s ratings with the original parameters, the average would be at -0.85 for all players and -0.6 for active players.)

To round this off, here are the upper percentiles and how they have evolved:

**Beat the Expectation?**

If you want to increase your mu, you need to play better than expected. A question that regularly comes up is whether it’s more beneficial to play a better or a weaker opponent. For that, I look at the difference between expected outcome and actual outcome for different bins of level difference (I use level here, because that’s what you can set in your matching options). I restrict the sample to games where the better player is at least level 45. The result is the left panel of this graph:

It shows that a better player slightly underperforms when facing a weaker player. But the difference is hardly significant: playing someone 8 levels higher would give you a 1% better outcome than playing someone 8 levels lower. Therefore, when averaged over all players, the theoretical win probability shown in the first graph matches the outcome well. Some players might still do better when facing someone stronger or weaker.
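The binning behind such a comparison could look roughly like the sketch below. The record layout, the bin width and the sample rows are all made up for illustration; this is not the code used for the graph.

```python
from collections import defaultdict

def overperformance_by_bin(games, bin_width=2):
    """Average (actual - expected) score per level-difference bin.

    `games` holds (level_diff, expected, actual) tuples seen from one
    player's side; this layout is a hypothetical one.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level_diff, expected, actual in games:
        bin_start = (level_diff // bin_width) * bin_width  # lower edge of the bin
        sums[bin_start] += actual - expected
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Made-up rows, only to show the shape of the computation.
sample = [(-8, 0.30, 0), (-8, 0.30, 1), (0, 0.50, 1), (1, 0.52, 0), (8, 0.70, 1)]
print(overperformance_by_bin(sample))
```

A positive value in a bin means players in that bin scored better than the rating system predicted; values near zero across all bins are what the graph shows on average.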

The right graph shows the overperformance in the n-th game of a player on a given day (only using players who already have 100 games). You might think that it’s harder to focus over many games in a row, but that graph doesn’t show a strong effect, either. The caveat is that I can only use the rating day, so I can’t see whether there was a break of some hours between games. And if someone plays around 0:00 UTC, their games are split across two rating days.

What you can see from the right graph is that there is an outperformance on average for those players with 100+ games. That means that those players tend to increase their rating when they play. So let’s have a look at the correlation between games played and skill in the following heat map:

There is a mildly positive relationship between the total number of games played and a player’s mu. But you can also see that there’s a lot of variance, and playing many games is not sufficient for becoming a good player. Hence, you might want to spend some time on the other sections of this forum or on the Discord channel.