**Dominion General Discussion / Dominion Log Statistics**

« **on:** September 17, 2018, 06:10:36 pm »

We’ve had some fun with that already on the Dominion Discord and I thought it was time to write up a summary.

Using ceviri’s tool Woodcutter (http://ceviri.me/woodcutter/), I’ve logged games of top players played since the start of this year. A game qualified if at least one player had a skill (mu) of at least 1.9 at the time. For all conclusions that you draw from this, keep in mind that this is really the right tail of the skill distribution (top 0.9%).

In addition, about 12% of the logged games were included because specific players played them. I’m dropping all games that ended before turn 3 and those with more than 2 events/landmarks. The result is about 24,000 games that I use, and for which the logs can be found on Woodcutter.

The results of the log analysis can be found starting from this google sheet: https://docs.google.com/spreadsheets/d/1M2L7hcY3sbA33OwuZhgPYJWVlMFgJYBdK8cnkbJHmbo/edit#gid=0

Summary of the results can be found for each card in the form of images in this album: https://1drv.ms/a/s!AgOcGYxKWHVDnXKCXradFogAJnMu

Some caveats first: for about 5% of the games, grabbing the log was not successful. Some of these failures might be because I lost connection. More worrisome are the ones that are not dropped at random: Smugglers had a bug that made some of its games unloadable; when the last decision is an autoplay by the bot (primarily Changeling) the game can’t be loaded; and there are some internal errors. There’s also a bug with Band of Misfits and Overlord: when one of them is in play at game end, it counts as a copy of the card it is imitating, so I try to exclude those games when that matters.

A limitation of the logs is that the last decision is not recorded. That could be an innocuous “end buy phase”, but also buying the last Province.

In this post, I primarily want to describe what I did and what you can find there. There’s a lot of data so expect to find some outliers, if you start searching for them. For example, it’s intuitive that it’s good to have a 5-2 opening on a board that has Witch. That it’s good to have 5-2 on a board with Fountain is more likely to be noise.

Let’s start with the information included in the graphs, using Rebuild as an example:

There are 741 boards with Rebuild.

The first player has won 62% (more precisely, if both players had the same strength, the first player would win 62%). This is slightly higher than the 59% estimated across all boards. But the standard error of this estimate is 2.1%, so it’s not a (statistically) significant deviation from the usual first player advantage. This is an observation that I’ve made more generally: changes in first player advantage (FPA) tend to be small and there is little signal relative to the noise, so don’t read too much into them, even if it makes sense that FPA should be higher on Rebuild boards. On the flip side, “little signal” means that we can be relatively sure that there are no cards which make the first player win 70% or more of the games.
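To make the significance statement concrete, here is a minimal sketch of the comparison. It treats the 59% baseline as a fixed number and uses a normal approximation, both of which are simplifying assumptions, and the function name is purely illustrative.

```python
def z_score(estimate, baseline, std_error):
    """Number of standard errors between an estimate and a fixed baseline."""
    return (estimate - baseline) / std_error

# Numbers from the Rebuild example: 62% vs. 59% with a standard error of 2.1%.
z = z_score(0.62, 0.59, 0.021)

# Two-sided test at the 5% level under a normal approximation.
significant = abs(z) > 1.96

print(f"z = {z:.2f}, significant: {significant}")
```

The difference is about 1.4 standard errors, which is below the usual threshold of 1.96.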

What I call the “skill multiplier” is 0.93 for Rebuild, which indicates that it favours the weaker player, as it’s less than 1. The motivation for this estimate is that, in theory, the win probability of a player with skill difference ∆mu is given by winprob=1/(1+exp(-∆mu)). The skill multiplier is the factor that multiplies the skill difference in this formula such that the observed results on Rebuild boards are explained best: winprob=1/(1+exp(-∆mu*skill_multiplier)). A value less than 1 means that the difference in mu between the players is effectively shrunk: the better player wins less often than they should according to their skill advantage. For example, a player with a positive ∆mu=1 (that is 7.5 levels) should win 73.1% of the games in general, but only wins 71.7% on Rebuild boards. Again, I show the standard error for this estimate, which shows that it’s not significantly smaller than 1. Also note that the estimated skill multiplier across all boards is 0.94, so better players generally tend to underperform a bit. (My short explanation would be that for top players their skill estimate mu is too swingy: my mu has fluctuated between 1.9 and 2.3 this year and I don’t believe that a lot of this was actual skill changes. As a result, when my mu is low after a bad streak I outperform expectations and vice versa.)
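The formula above, and one way a multiplier of this kind could be estimated, can be sketched as follows. This is not the code used for the sheets: the fit is a plain one-parameter maximum-likelihood estimate by Newton's method, the function names are made up for illustration, and the games are simulated rather than taken from the logs.

```python
import math
import random

def win_prob(delta_mu, skill_multiplier=1.0):
    """Win probability of a player whose mu is delta_mu above the opponent's."""
    return 1.0 / (1.0 + math.exp(-delta_mu * skill_multiplier))

def score_and_information(games, k):
    """Gradient and negative second derivative of the log-likelihood at k."""
    grad = info = 0.0
    for delta_mu, won in games:
        p = win_prob(delta_mu, k)
        grad += (won - p) * delta_mu
        info += p * (1.0 - p) * delta_mu ** 2
    return grad, info

def fit_skill_multiplier(games, iterations=25):
    """Maximum-likelihood estimate and standard error via Newton's method."""
    k = 1.0
    for _ in range(iterations):
        grad, info = score_and_information(games, k)
        k += grad / info
    _, info = score_and_information(games, k)
    return k, 1.0 / math.sqrt(info)

# Formula check with the numbers from the text.
print(round(win_prob(1.0), 3))        # 0.731
print(round(win_prob(1.0, 0.93), 3))  # 0.717

# Simulated games (NOT real log data) with a true multiplier of 0.93.
random.seed(0)
games = []
for _ in range(20000):
    delta_mu = random.uniform(-2.0, 2.0)
    won = 1 if random.random() < win_prob(delta_mu, 0.93) else 0
    games.append((delta_mu, won))

estimate, std_error = fit_skill_multiplier(games)
print(f"multiplier = {estimate:.3f} +/- {std_error:.3f}")
```

With far fewer games per card than in this simulation, the standard error is much larger, which is why a value like 0.93 is hard to tell apart from 1.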

Next on the top left are the usual game endings with that card on the board. As the last decision is not logged, the classification might not always be exact, but I’m following these rules: if there’s at most 1 Province or Colony in supply it counts as a Province ending; if the supply is at most 1 card (not Province or Colony) away from a three-pile ending it counts as such. All other games count as resignations. Note that some games will be classified as both Province and 3-pile ending (more than should be in reality) and some 3-piles might wrongly count as resignations (e.g. two Ports left in supply that are bought with the last buy for a 3-pile). Over all games there are 39% Province endings, 28% 3-piles, and 35% resignations; the shares add up to more than 100% because of the double classification. Governor leads to many Province and few 3-pile endings and Goons is the other way around. Tournament games have a high rate of resignations.
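The classification rules above can be sketched as a small function. The input format (a dict from pile name to cards left in the last logged state) is a hypothetical stand-in for whatever the logs actually provide, and the sketch assumes 2-player games where three empty piles end the game.

```python
def classify_ending(supply):
    """Classify a game ending from the last logged supply counts.

    supply: dict mapping pile name -> cards left (hypothetical format).
    Returns a set of labels, since a game can match both ending rules.
    """
    big_vp = ("Province", "Colony")
    labels = set()

    # At most 1 Province or Colony left counts as a Province ending.
    if any(supply[name] <= 1 for name in big_vp if name in supply):
        labels.add("province")

    # At most 1 card away from three empty piles (ignoring Province/Colony).
    others = sorted(count for name, count in supply.items()
                    if name not in big_vp)
    if len(others) >= 3 and sum(others[:3]) <= 1:
        labels.add("three-pile")

    # Everything else is treated as a resignation.
    if not labels:
        labels.add("resignation")
    return labels

# Two Ports left plus two empty piles: really a possible 3-pile,
# but the rule labels it a resignation.
print(classify_ending({"Province": 6, "Port": 2, "Curse": 0, "Estate": 0}))
```

Returning a set rather than a single label mirrors the double counting mentioned in the text.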

In the bottom panels there are histograms with the share of games in which each player gains a certain number of copies of the card (left) and with the difference in the number of gains between the players (right).

You can roughly see whether that led to more wins or losses from the colours in those bars and the top panels have the details: first, the blue coloured lines show the estimate for the win rate with a certain number of gains as well as the 95% confidence interval. For Rebuild this suggests that a player who doesn’t gain any Rebuild wins more than 50% of the games and a player with 1 Rebuild wins fewer than 50% of the games. But this might reflect that better players are more likely to skip Rebuild and they would also win more often if they go for Rebuild. To take out this effect, I estimate the version corrected for skill in green. This version uses the skill difference as an explanatory variable such that the result is an estimate for how well the players do against an equal opponent. This reduces the effect of playing without Rebuild to basically 0 (49% win rate). The right panel shows the same using the difference in gains instead of the absolute number – in the case of Rebuild there’s nothing statistically significant there.
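One way such a skill correction could work is a logistic model with the skill difference as the explanatory variable, where the intercept gives the win rate against an equal opponent. The sketch below is an assumption about the general approach, not the actual code behind the graphs; the function names are illustrative and the games are simulated.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def corrected_win_rate(games, iterations=25):
    """Fit logit P(win) = a + b * delta_mu and return sigmoid(a).

    games: list of (delta_mu, won) pairs for one group of players,
    e.g. everyone who gained no copy of a card. sigmoid(a) is the
    estimated win rate of that group against an equal opponent.
    """
    a, b = 0.0, 1.0
    for _ in range(iterations):
        ga = gb = haa = hab = hbb = 0.0
        for delta_mu, won in games:
            p = sigmoid(a + b * delta_mu)
            w = p * (1.0 - p)
            ga += won - p
            gb += (won - p) * delta_mu
            haa += w
            hab += w * delta_mu
            hbb += w * delta_mu ** 2
        det = haa * hbb - hab * hab
        if det <= 1e-12:
            break
        # Newton step: solve the 2x2 system by hand.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return sigmoid(a)

# Simulated group (NOT real log data): players who are stronger than
# their opponents on average, but win only 49% against an equal opponent.
random.seed(1)
true_a = math.log(0.49 / 0.51)
games = []
for _ in range(20000):
    delta_mu = random.gauss(0.3, 0.5)
    won = 1 if random.random() < sigmoid(true_a + 0.94 * delta_mu) else 0
    games.append((delta_mu, won))

raw = sum(won for _, won in games) / len(games)
corrected = corrected_win_rate(games)
print(f"raw win rate: {raw:.3f}, corrected for skill: {corrected:.3f}")
```

In this simulated group the raw win rate is above 50% only because the players are better than their opponents on average; the corrected estimate comes out close to the 49% that was built in.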

Some gain statistics are also summarized on the top:

- How many copies are gained on average by the first and second player (also conditional on gaining at least one)?
- How often is at least one copy gained by one or both players?
- How often is at least one copy gained by one or both players in the opening, which I count as everything that happens before the first player’s turn 3?
- How often does the only player who gains it (in general or in the opening) win the game? (This is again for the raw win rate and the one that corrects for skill differences. It also includes the standard errors for the estimates.)

Some thoughts on interpreting those numbers:

- For cards that are widely accessible (e.g. Jack of All Trades) a win rate below 50% for the only player to gain it suggests that usually the player who skips it is correct.
- For cards that are difficult to gain because they are in limited supply (e.g. Tournament Prizes) or expensive (e.g. King’s Court), a positive effect of gaining them is not necessarily because players don’t realize the card’s strength. I would rather see it as a measure of how good it is to be the one who built the deck to gain them or got lucky enough to do so.
- Gaining Provinces or buying Salt the Earth is a direct sign that the player scores or is in a position to win the game. The reason for this would have to be found somewhere else most likely. Similarly, cards that are often gained to 3-pile for a win (e.g. Candlestick Maker) show a positive effect for the player gaining a lot of them.

Finally, let me remind you that this only uses the games of the top, and you would likely find different results for lower ranked players.

So much for the summary stats for each card. Most of the underlying information and much more can be found on the Google Sheets linked above. I hope that they are more or less self-explanatory for someone who wants to dig deeper, so I’ll just point out what else can be found there. First, there are tabs on that sheet with stats for the whole database. Then, there are separate sheets for the different players who have a bunch of games (or who were interested in them). Those are linked from the overview tab. Most useful for a general audience are:

- better player: presenting the stats from the perspective of the player with the higher mu.
- winner: presenting the stats from the perspective of the winner of each game.
- 5-2 opening: limits the sample to the games in which exactly 1 player had a 5-2 opening (more precisely either drew 0 or 3 Estates/Shelters for their turn 1) and presents the stats from their perspective.
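The 5-2 criterion from the last item is easy to express in code. The sketch assumes the standard starting deck of 7 Coppers and 3 Estates or Shelters with a 5-card opening hand; the function names are illustrative.

```python
from math import comb

def is_5_2_opening(estates_or_shelters_in_first_hand):
    """True if the turn-1 hand holds 0 or 3 Estates/Shelters."""
    return estates_or_shelters_in_first_hand in (0, 3)

def prob_5_2(deck_size=10, junk=3, hand=5):
    """Chance of a 5-2 (or 2-5) split from a shuffled starting deck."""
    total = comb(deck_size, hand)
    # No Estate/Shelter in the first hand: all 5 cards are Coppers.
    none = comb(deck_size - junk, hand)
    # All 3 Estates/Shelters in the first hand, plus 2 Coppers.
    all_junk = comb(junk, junk) * comb(deck_size - junk, hand - junk)
    return (none + all_junk) / total

p = prob_5_2()
# Exactly one of two players opens 5-2, assuming independent shuffles.
exactly_one = 2 * p * (1 - p)
print(f"5-2 chance: {p:.3f}, exactly one player: {exactly_one:.3f}")
```

Under these assumptions a single player opens 5-2 in one out of six games.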

On the general sheets, there are tabs with:

- opening gains: e.g. double Ambassador was opened 17% of the time by the better player and 15% of the time by the subsequent winner.
- gain 1st: which card is the first one that is bought / gained / trashed a certain number of times by the better player? (Ambassador is the first card to be bought twice on 32% of Ambassador boards.)
- gain 1st Qvist: does the same within the Qvist-cost-categories (Ambassador was the first $3-cost card to be bought twice on 40% of Ambassador boards.)
- empty piles: shows the chance of each kingdom pile running out and the distribution of game end conditions.
- gains&plays: how often is each card gained, how often are its copies played? (The better player gains 1.6 Ambassadors and plays 8.8 Ambassadors in the average game, making it 5.5 plays for each Ambassador until game end.)
- impact factor: this was motivated by the discussion in this thread. It tries to measure how different boards with a certain card are, in terms of buying, gaining and trashing, compared to the average board. For that purpose, it compares how much the probability of each other card being bought, or the average number of copies bought, changes. It also shows which cards are affected the most in a positive or negative way. Note that for normal card pairs only about 20 games with the combo are in the database, so the effect must be strong to show up. Nevertheless, the usual suspects for power combos do show up.
- impacted: shows which cards drive the impact factor the most. Intuitively, those are the cards that are most board dependent (e.g. Peddler for the number of times bought, Fortress for the number of times trashed).

The individual sheets have a tab that compares the buys / gains / trashes of the named baseline player with their opponents and one tab that shows the distribution of the number of buys / gains / trashes of each card.

Then they have the “gain 1st” and “gain 1st Qvist” tabs for that player only.

The boards tab has some aggregate statistics for boards with that card: average number of turns, average number of buys / gains / trashes. Then it has the first player advantage (not a lot of effect there) and the change in the win probability for that player. For the named players and the 5-2 opener it shows how much they outperformed expectations when that card is on the board. (e.g. being the only player to open 5-2 on a Witch board gives you a 15% outperformance, that is a 65% win chance against an equal opponent with random start.) For the better player sheet this column shows the skill multiplier (whether skill difference is more or less important on those boards).

I also tried to classify cards on the better-boards sheet in terms of being village, draw, trasher, gainer (and +buy), alt-VP, attack and types of attack. The idea was to see how the presence, absence or combination of these affects the win probability. Now, you could fill threads discussing how to classify the cards; my first try was to give each card a value of 0, 0.5, or 1 and then round down (if there’s only a 0.5 Village on the board, like Necropolis, the board counts as not having Villages).

Finally, the logged game numbers used for the sheet with the kingdoms are on the last tab.

Have fun with the numbers and let me know what else you'd like to see!