Dominion Strategy Forum


Topic: Some Statistics on Ratings (Read 425 times)


markus

  • Golem
  • ****
  • Posts: 178
  • Shuffle iT Username: markus
  • Respect: +158
Some Statistics on Ratings
« on: February 07, 2018, 05:27:45 pm »
+29

For this analysis, I'm using the same data as Scavenger. (If you don't know it, check it out!)
I'm using all rated 2-player games played until January 29th.

Rating System

I'll mostly talk about mu, so let's start with a quick summary of the rating system. You can find more info here and in the links contained therein.

1) mu (µ): this is the best measure of your skill, and everyone starts with mu=0. It's a relative measure: the expected win percentage between two players depends mostly on the difference between the two players' mu. For example, a difference of 1 corresponds to about a 73% chance of winning (ties always count as half a win). Here's a graph that shows this probability in general:


2) phi (ϕ): the second parameter measures the uncertainty around the skill mu. In 95% of cases, a player's true skill should lie in the interval [mu-2*phi, mu+2*phi]. Players start with phi=0.75.

3) Level: the level is simply calculated as 50+7.5*(mu-2*phi). It is therefore a conservative measure of your skill, as it takes the lower bound of the interval given above. That also means that players with fewer games (recently) are, on average, underrated in terms of their level. But you can't sit on your high level after some (lucky) wins.

4) sigma (σ): this is a measure of the stability of your skill. Players start with sigma=0.033, and it doesn't move much, because the stability of mu is hard to estimate given the few games per rating period (= 1 day). Under this assumed parameter, a typical player's skill moves up or down by about 0.033 per day. As a result, the skill estimate becomes less certain when a player doesn't play (much), and phi increases.
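A minimal Python sketch of the four quantities above, assuming Shuffle iT uses the standard Glicko-2 expected-score formula. The g(phi_opp) factor is that assumption: it is why the win probability "mostly" (not only) depends on the mu difference, since an uncertain opponent pulls the prediction towards 50%. The function names are hypothetical, and sigma is treated as a constant.

```python
import math

# Starting values quoted above; sigma is assumed constant in this sketch.
MU_START, PHI_START, SIGMA = 0.0, 0.75, 0.033


def g(phi_opp: float) -> float:
    """Glicko-2 damping factor (assumed): equals 1 when the opponent's phi is 0."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi_opp ** 2 / math.pi ** 2)


def win_probability(mu: float, mu_opp: float, phi_opp: float = 0.0) -> float:
    """Expected score (ties count as half a win) against an opponent."""
    return 1.0 / (1.0 + math.exp(-g(phi_opp) * (mu - mu_opp)))


def skill_interval(mu: float, phi: float) -> tuple[float, float]:
    """Interval that should contain the true skill in about 95% of cases."""
    return mu - 2.0 * phi, mu + 2.0 * phi


def level(mu: float, phi: float) -> float:
    """Displayed level: 50 + 7.5 times the lower end of the skill interval."""
    return 50.0 + 7.5 * skill_interval(mu, phi)[0]


def phi_after_idle_days(phi: float, days: int, sigma: float = SIGMA) -> float:
    """Uncertainty after `days` rating periods without games:
    each period adds sigma^2 to the variance phi^2."""
    return math.sqrt(phi ** 2 + days * sigma ** 2)


if __name__ == "__main__":
    print(f"{win_probability(1.0, 0.0):.3f}")   # 0.731, the ~73% quoted above
    print(level(MU_START, PHI_START))           # 38.75 for a brand-new player
    print(round(phi_after_idle_days(0.15, 30), 3))
```

The last line illustrates why a high level can't simply be kept by not playing: a player at phi=0.15 who takes a month off comes back with phi of roughly 0.23, which under the level formula costs about 1.3 levels at the same mu.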

How does the rating change?
In theory, it's simple: mu increases if you win more games than you were expected to. Scavenger also calculates that for you. How much mu changes also depends on your uncertainty phi: the more certain your rating is, the less it will change.
In particular, the formula is:
mu_change = phi^2 * (actual_wins - expected_wins)
So, if your phi=0.2, winning or losing a game makes a difference of mu=0.04 (or level=0.3). If you were expected to win with 75%, then winning adds mu=0.01 and losing subtracts mu=0.03.
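The arithmetic above can be checked with a short sketch of the simplified update. Note that this is the post's approximation: in standard Glicko-2 (assumed here) the factor is the updated phi and each term is also scaled by g(phi_opp), which is close to 1 for established opponents. The function name is hypothetical.

```python
def mu_change(phi: float, actual: float, expected: float) -> float:
    """Simplified update: phi^2 * (actual - expected).
    actual is 1 for a win, 0.5 for a tie, 0 for a loss."""
    return phi ** 2 * (actual - expected)


if __name__ == "__main__":
    phi, expected = 0.2, 0.75
    win = mu_change(phi, 1.0, expected)
    loss = mu_change(phi, 0.0, expected)
    print(f"win: {win:+.3f}, loss: {loss:+.3f}, swing: {win - loss:.3f}")
    # win: +0.010, loss: -0.030, swing: 0.040
    print(f"swing in level: {7.5 * (win - loss):.2f}")
    # swing in level: 0.30
```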

Uncertainty phi decreases with each game played and increases due to sigma. If your opponent is closer to your skill, phi will decrease more, as the result is more informative (what matters is win_probability*(1-win_probability)). If you play a constant number of games per day, your phi will converge to a certain value. (If you play less afterwards, it will increase again, and vice versa.)
For example, if you play 1/5/10 games per day, phi will end up around 0.26/0.17/0.15.
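A minimal sketch of this convergence, assuming the standard Glicko-2 variance update with a constant sigma=0.033, opponents of exactly equal skill (win probability 0.5), and g(phi_opp) treated as 1. Under those simplifications it settles at roughly 0.26, 0.17 and 0.14 for 1, 5 and 10 games per day; real opponents are never perfectly even, which lowers win_probability*(1-win_probability) and nudges the values up slightly. The function name is hypothetical.

```python
import math

SIGMA = 0.033  # volatility from the post, assumed constant here


def steady_state_phi(games_per_day: int, win_prob: float = 0.5,
                     sigma: float = SIGMA, days: int = 5000) -> float:
    """Iterate the daily variance update until phi settles.

    Assumes every game is played against an opponent the player beats with
    probability win_prob, and treats g(phi_opp) as 1.
    """
    info_per_game = win_prob * (1.0 - win_prob)  # largest for even matches
    phi = 0.75  # starting uncertainty
    for _ in range(days):
        phi_star = math.sqrt(phi ** 2 + sigma ** 2)        # inflate by sigma
        precision = 1.0 / phi_star ** 2 + games_per_day * info_per_game
        phi = 1.0 / math.sqrt(precision)                    # shrink by games played
    return phi


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, round(steady_state_phi(n), 3))
```

Passing a lopsided win_prob (say 0.75) shows the point about opponent strength: each game carries less information, so phi settles higher.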


Games Played

Here's the number of those rated 2-player games recorded per day and the number of what I defined as "active players", i.e. players who have played at least 10 games in the last 30 days.
Edit: the number of games in the left graph should be halved because each game is counted for each player, hence twice.

There are around 10,000 games played per day (corrected from 20,000, see the edit above), and there are around 5,000 active players. You can notice the reduction in games played in late October, when the Nocturne preview was available.
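The two counting rules above, sketched in Python. The exact window boundary (games strictly after today minus 30 days, up to and including today) is an assumption, as are the function names and input formats.

```python
from datetime import date, timedelta


def is_active(game_dates: list[date], today: date,
              min_games: int = 10, window_days: int = 30) -> bool:
    """'Active player': at least min_games rated games in the last window_days days."""
    start = today - timedelta(days=window_days)
    recent = sum(1 for d in game_dates if start < d <= today)
    return recent >= min_games


def distinct_games(games_per_player: dict[str, int]) -> float:
    """Every 2-player game is recorded once per participant, so halve the total."""
    return sum(games_per_player.values()) / 2


if __name__ == "__main__":
    today = date(2018, 1, 29)
    print(is_active([date(2018, 1, 20)] * 10, today))  # True
    print(is_active([date(2017, 12, 1)] * 50, today))  # False: all games too old
    print(distinct_games({"a": 3, "b": 2, "c": 1}))    # 3.0
```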


Distribution of Skill

Here's the histogram of the current skill of all players, of only active players, and weighted by the number of games played (in the weighted one, mu is the value on the day the game was played):


The following heat maps show which players get matched most frequently. The right one zooms in on games with at least one player having mu=1.5:



You can see above that the distribution is not centred on mu=0 anymore, but the average is negative. Here is how the average has evolved since the start of the leaderboard:


First, let me be clear that this decline is not a big problem, because what matters is not the absolute value of mu but the difference between two players.
But what's the reason? As described above, the change of mu depends on phi and on the difference between actual wins and expected wins. The latter is symmetric: if player 1 outperforms expectations, player 2 underperforms by the same amount. But phi can differ between the two. In particular, if the underperformer has a higher phi than the overperformer, the underperformer's mu will fall more than the overperformer's mu increases, and average mu falls. This could happen because new players (high phi) are doing worse than expected (mu=0), or because players who have been away for some time (higher phi) are playing worse than before.
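A hypothetical two-player example of this effect, using the simplified update mu_change = phi^2*(actual - expected) from above: a new player (phi=0.75) and an established player (phi=0.2), both at mu=0, so each is expected to score 0.5. The numbers are illustrative, not taken from the data.

```python
def mu_change(phi: float, actual: float, expected: float) -> float:
    """Simplified update: phi^2 * (actual - expected)."""
    return phi ** 2 * (actual - expected)


if __name__ == "__main__":
    # Hypothetical match: both players at mu=0, so each is expected to score 0.5.
    newcomer = mu_change(0.75, actual=0.0, expected=0.5)  # new player loses
    veteran = mu_change(0.20, actual=1.0, expected=0.5)   # established player wins
    print(f"newcomer {newcomer:+.3f}, veteran {veteran:+.3f}")
    print(f"change in the pair's average mu: {(newcomer + veteran) / 2:+.3f}")
```

The loser drops by about 0.28 while the winner gains only 0.02, so the pair's average mu falls by about 0.13 even though the score deviations themselves cancel exactly.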

Something to note are the two breaks in the red curve of active players above. At the end of May, the decline stopped when the matching system was changed to make the default match more even (smaller level difference allowed). The second break was at the end of July, when the parameters of the rating system were changed. That increased the level of new players to 38.75 and made matches between new players and experienced players with positive mu more likely.
(Note: I calculated each player's mu from the start using the current parameters, so that there's no break in the method. Lowering the starting phi from 2 to 0.75 helped keep average mu more stable, because new players don't lose as much rating on their first losses anymore. If I calculated today's ratings with the original parameters, the average would be at -0.85 for all players and -0.6 for active players.)

To round this off, here are the upper percentiles and how they have evolved:



Beat the Expectation?
If you want to increase your mu, you need to play better than expected. A question that regularly comes up is whether it's more beneficial to play a better or a weaker opponent. To answer it, I look at the difference between expected outcome and actual outcome for different bins of level difference (I use level here, because that's what you can set in your matching options). I restrict the sample to games where the better player is at least level 45. The result is the left panel of this graph:
 

It shows that a better player slightly underperforms when facing a weaker player. But the difference is hardly significant: playing someone 8 levels higher would give you a 1% better outcome than playing someone 8 levels lower. Therefore, when averaged over all players, the theoretical win probability shown in the first graph matches the outcome well. Some players might still do better when facing someone stronger or weaker.
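A sketch of the binning described above. The input format (one tuple per game from a focal player's perspective, with level_diff = opponent's level minus own level), the bin width and the function name are assumptions; the level-45 filter would be applied before calling it.

```python
from collections import defaultdict


def overperformance_by_bin(games, bin_width: float = 4.0) -> dict[float, float]:
    """Average (actual - expected) score per bin of level difference.

    games: iterable of (level_diff, expected, actual) tuples, one per game, with
    level_diff = opponent's level - own level, expected = model win probability,
    actual = 1 (win), 0.5 (tie) or 0 (loss).
    Bins are [k * bin_width, (k + 1) * bin_width), keyed by their lower edge.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level_diff, expected, actual in games:
        k = int(level_diff // bin_width)  # floor, so negative differences bin correctly
        totals[k] += actual - expected
        counts[k] += 1
    return {k * bin_width: totals[k] / counts[k] for k in sorted(counts)}


if __name__ == "__main__":
    # Toy input for illustration only, not real game data.
    toy_games = [(-6, 0.7, 1.0), (-5, 0.7, 0.0), (6, 0.3, 0.0), (7, 0.3, 1.0)]
    print(overperformance_by_bin(toy_games))
```

A value near zero in every bin is what "the theoretical win probability matches the outcome well" looks like in this representation.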

The right graph shows the overperformance in the n-th game of a player on a given day (only using players who already have 100 games). You might think that it's harder to focus on many games in a row, but that graph doesn't show a strong effect either. The caveat is that I can only use the rating day, so I can't see whether there have been some hours of break between games. If someone plays around 0:00 UTC, a single session is also split across two days.
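A sketch of how the n-th game of a day can be assigned, assuming game timestamps are available and that the rating day is the UTC calendar date (function and inputs are hypothetical). The example shows the caveat: one session spanning midnight UTC is split into two "days".

```python
from collections import defaultdict
from datetime import datetime, timezone


def game_index_within_day(timestamps: list[datetime]) -> list[int]:
    """For each game in chronological order, return its position (1, 2, ...)
    within the player's rating day, taken here to be the UTC calendar date."""
    games_so_far = defaultdict(int)
    indices = []
    for ts in sorted(timestamps):
        day = ts.astimezone(timezone.utc).date()
        games_so_far[day] += 1
        indices.append(games_so_far[day])
    return indices


if __name__ == "__main__":
    # One uninterrupted session that crosses midnight UTC.
    session = [
        datetime(2018, 1, 28, 23, 30, tzinfo=timezone.utc),
        datetime(2018, 1, 28, 23, 50, tzinfo=timezone.utc),
        datetime(2018, 1, 29, 0, 10, tzinfo=timezone.utc),
        datetime(2018, 1, 29, 0, 30, tzinfo=timezone.utc),
    ]
    print(game_index_within_day(session))  # [1, 2, 1, 2]
```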
What you can also see from the right graph is that players with 100+ games overperform on average. That means that those players tend to increase their rating when they play. So let's have a look at the correlation between games played and skill in the following heat map:

There is a mildly positive relationship between the total number of games played and a player's mu. But you can also see that there's a lot of variance, and playing many games is not sufficient for becoming a good player. Hence, you might want to spend some time on the other sections of this forum or the Discord channel.
« Last Edit: February 14, 2018, 05:20:17 am by markus »

Cave-o-sapien

  • Minion
  • *****
  • Posts: 694
  • Respect: +1130
Re: Some Statistics on Ratings
« Reply #1 on: February 07, 2018, 06:10:19 pm »
0

This is a really nice writeup and compilation of that Discord data dump.

I think the correlation between games played and µ is hiding some important information: the historical record of all games played on other systems. Of course you don't have that data, and maybe it's missing completely at random, but I doubt it.

 

markus

  • Golem
  • ****
  • Posts: 178
  • Shuffle iT Username: markus
  • Respect: +158
Re: Some Statistics on Ratings
« Reply #2 on: February 08, 2018, 05:31:18 am »
+5

Quote from: Cave-o-sapien
"I think the correlation between games played and µ is hiding some important information: the historical record of all games played on other systems. Of course you don't have that data, and maybe it's missing completely at random, but I doubt it."
That is true to some extent. I only took accounts with at least 100 games. By then, mu should roughly have caught up with the skill a player brought from previous experience. That explains the dispersion at the left end. But the other problem is that I'm only looking at the current cross-section of players.

So let me attempt something different, which aims to show how a player's skill changes over time. Here I'm only taking the 1745 players with at least 1000 games and looking at how their mu has changed since game 100:
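A sketch of that computation, assuming a per-player list of mu after each rated game; the data layout and the function name are hypothetical.

```python
def mu_change_since_game(mu_history: dict[str, list[float]],
                         start_game: int = 100,
                         min_games: int = 1000) -> dict[str, float]:
    """Change in mu between game `start_game` and each player's latest game.

    mu_history maps player -> list of mu after each rated game
    (index 0 is after game 1). Players with fewer than min_games are skipped.
    """
    return {
        player: history[-1] - history[start_game - 1]
        for player, history in mu_history.items()
        if len(history) >= min_games
    }


if __name__ == "__main__":
    # Toy histories for illustration only.
    toy = {
        "alice": [0.0] * 99 + [0.5] * 900 + [1.2],  # 1000 games
        "bob": [0.1] * 500,                          # too few games, skipped
    }
    print(mu_change_since_game(toy))
```

Measuring from game 100 rather than game 1 keeps the initial climb from mu=0 towards a player's prior skill out of the comparison.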

Chappy7

  • Conspirator
  • ****
  • Posts: 204
  • Shuffle iT Username: Chappy7
  • Respect: +257
Re: Some Statistics on Ratings
« Reply #3 on: February 08, 2018, 04:22:34 pm »
+2

My little brain can't handle this post

timchen

  • Minion
  • *****
  • Posts: 699
  • Shuffle iT Username: allfail
  • Respect: +229
Re: Some Statistics on Ratings
« Reply #4 on: February 09, 2018, 10:22:21 am »
0

Why is the update of mu proportional to phi^2?

markus

  • Golem
  • ****
  • Posts: 178
  • Shuffle iT Username: markus
  • Respect: +158
Re: Some Statistics on Ratings
« Reply #5 on: February 09, 2018, 10:45:25 am »
0

Quote from: timchen
"Why is the update of mu proportional to phi^2?"
It's intuitive that the update should increase in phi as higher uncertainty about the skill makes you update your beliefs more when new information comes in. But I guess your question is why it's the square. For that you'd have to consult the Glicko paper.
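For reference, these are the standard Glicko-2 update equations for one rating period, as given in Glickman's description of the system. That Shuffle iT implements them exactly is an assumption; s_j is the actual score against opponent j (1, 0.5 or 0), and sigma' is the volatility after its own update step, which barely moves here. In Bayesian terms, phi'^2 is the approximate posterior variance of the skill and the sum is the slope of the log-likelihood of the results at mu, so the step is variance times slope, which is where the square comes from. With one game and g(phi_j) close to 1, the last equation reduces to the phi^2*(actual - expected) rule in the first post.

```latex
\begin{aligned}
g(\phi_j) &= \frac{1}{\sqrt{1 + 3\phi_j^2/\pi^2}}, &
E_j &= \frac{1}{1 + \exp\left(-g(\phi_j)\,(\mu - \mu_j)\right)}, \\
v &= \Big[\sum_j g(\phi_j)^2\, E_j\,(1 - E_j)\Big]^{-1}, &
\phi^* &= \sqrt{\phi^2 + \sigma'^2}, \\
\phi' &= \Big(\frac{1}{\phi^{*2}} + \frac{1}{v}\Big)^{-1/2}, &
\mu' &= \mu + \phi'^2 \sum_j g(\phi_j)\,(s_j - E_j).
\end{aligned}
```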