Topic: Dominion Log Statistics (Read 9775 times)

markus · « **on:** September 17, 2018, 06:10:36 pm »

We’ve had some fun with that already on the Dominion Discord and I thought it was time to write up a summary.
Using ceviri’s tool woodcutter (http://ceviri.me/woodcutter/) I’ve logged games of top players played since the start of this year. Games qualified if at least one player had a skill (= mu) of at least 1.9 at the time. For all conclusions that you draw from this, keep in mind that this is really the right tail of the skill distribution (top 0.9%).
In addition, about 12% of the logged games are included because specific players played them. I’m dropping all games that ended before turn 3 and those with more than 2 events/landmarks. The result is about 24,000 games that I use, and for which the logs can be found on Woodcutter.

The results of the log analysis can be found starting from this google sheet: https://docs.google.com/spreadsheets/d/1M2L7hcY3sbA33OwuZhgPYJWVlMFgJYBdK8cnkbJHmbo/edit#gid=0
Summary of the results can be found for each card in the form of images in this album: https://1drv.ms/a/s!AgOcGYxKWHVDnXKCXradFogAJnMu

Some caveats first: for about 5% of the games grabbing the log was not successful. Some of these failures might be because I lost connection. More worrisome are the ones that are not randomly dropped: Smugglers had a bug that made some of its games unloadable; when the last decision is an autoplay by the bot (primarily Changeling) the game can’t be loaded; and there are some internal errors. There’s also a bug with Band of Misfits and Overlord: if one of them is in play at game end, it is counted as a copy of the card it is imitating, so I try to exclude those games when that matters.
A limitation of the logs is that the last decision is not recorded. That could be an innocuous “end buy phase”, but also buying the last Province.

In this post, I primarily want to describe what I did and what you can find there. There’s a lot of data, so expect to find some outliers if you start searching for them. For example, it’s intuitive that it’s good to have a 5-2 opening on a board that has Witch. That it’s good to have 5-2 on a board with Fountain is more likely to be noise.

Let’s start with the information included in the graphs, using Rebuild as an example:

There are 741 boards with Rebuild.
The first player has won 62% (more precisely, if both players had the same strength, the first player would win 62%). This is slightly higher than the 59% estimated across all boards. But the standard error of this estimate is 2.1%, so it’s not a (statistically) significant deviation from the usual first player advantage. This is an observation that I’ve made more generally: changes in first player advantage (FPA) tend to be small and there is little signal relative to the noise, so don’t read too much into them, even if it makes sense that FPA should be higher on Rebuild boards. On the flip side, “little signal” means that we can be relatively sure that there are no cards which make the first player win 70% or more of the games.
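As a rough illustration of why 62% is not a significant deviation from 59%, here is a minimal sketch. It assumes the all-boards estimate of 59% can be treated as a fixed baseline and uses a plain normal approximation; the function name is made up for this example and is not taken from the sheets.

```python
def z_score(estimate, baseline, std_error):
    """Number of standard errors between an estimate and a fixed baseline."""
    return (estimate - baseline) / std_error


# Numbers from the Rebuild example: 62% first-player win rate,
# 59% across all boards, standard error 2.1%.
z = z_score(0.62, 0.59, 0.021)

# Assumption: two-sided test at the 5% level, i.e. |z| must exceed about 1.96.
significant = abs(z) > 1.96

print(f"z = {z:.2f}, significant at 5%: {significant}")
```

The difference is about 1.4 standard errors, which is inside the usual two-standard-error band.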

What I call the “skill multiplier” is 0.93 for Rebuild, which indicates that it favours the weaker player as it’s less than 1. The motivation for this estimate is that in theory the win probability of a player with skill difference ∆mu is given by winprob=1/(1+exp(-∆mu)). The skill multiplier is the factor that multiplies the skill difference in this formula such that the observed results on Rebuild boards are explained best: winprob=1/(1+exp(-∆mu*skill_multiplier)). A value less than 1 means that the difference in mu between the players is effectively shrunk: the better player wins less than they should according to their skill advantage. For example, a player with a positive ∆mu=1 (that is 7.5 levels) should win 73.1% of the games in general, but only wins 71.7% on Rebuild boards. Again, I show the standard error for this estimate, which shows that it’s not significantly smaller than 1. Also note that the estimated skill multiplier across all boards is 0.94, such that better players always tend to underperform a bit. (My short explanation would be that for top players their skill estimate mu is too swingy: my mu has fluctuated between 1.9 and 2.3 this year and I don’t believe that a lot of this was actual skill changes. As a result, when my mu is low after a bad streak I outperform expectations and vice versa.)
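The win probability formula above can be checked with a few lines of Python. This is only a sketch of the formula as written in the post, not the estimation code behind the sheets; the function name `win_prob` is made up for this example.

```python
import math


def win_prob(delta_mu, skill_multiplier=1.0):
    """winprob = 1 / (1 + exp(-delta_mu * skill_multiplier))."""
    return 1.0 / (1.0 + math.exp(-delta_mu * skill_multiplier))


# A player with a skill advantage of delta_mu = 1:
baseline = win_prob(1.0)        # multiplier 1, the theoretical win rate
rebuild = win_prob(1.0, 0.93)   # multiplier 0.93, as estimated for Rebuild boards

print(f"theory: {baseline:.1%}, Rebuild boards: {rebuild:.1%}")
```

Plugging in other multipliers shows how quickly the effect grows: the same call with 1.3 or 0.7 moves the win rate by several percentage points in either direction.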

Next on the top left are the usual game endings with that card on the board. As the last decision is not logged, the classification might not always be exact, but I’m following the rules: if there’s at most 1 Province or Colony in supply it counts as Province ending; if the supply is at most 1 card (not Province or Colony) away from a three-pile ending it counts as such. All other games count as resignation. Note that some games will be classified as both Province and 3-pile ending (more than should be in reality) and some 3-piles might wrongly count as resignation (e.g. two Ports left in supply that are bought with last buy for 3-pile). Over all games there are 39% Province endings, 28% 3-piles, and 35% resignations. Governor leads to many Province and few 3-pile endings and Goons is the other way around. Tournament games have a high rate of resigns.
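The classification rules above could be sketched roughly as follows. This is one reading of the rules, not the actual code: it assumes the final supply is available as a mapping from pile name to cards left, and it interprets "at most 1 card away from a three-pile ending" as "the three smallest non-Province/Colony piles hold at most 1 card in total". The function name and the data layout are hypothetical.

```python
VICTORY_PILES = ("Province", "Colony")


def classify_ending(supply):
    """Return the ending labels for a final supply state.

    supply: dict mapping pile name -> number of cards left (assumed layout).
    A game can get both labels, as noted in the post.
    """
    labels = []

    # Province ending: at most 1 Province or Colony left in supply.
    if any(supply[p] <= 1 for p in VICTORY_PILES if p in supply):
        labels.append("province")

    # Three-pile ending: at most 1 card (not Province/Colony) away from
    # three empty piles (assumed interpretation, see lead-in).
    others = sorted(n for pile, n in supply.items() if pile not in VICTORY_PILES)
    if len(others) >= 3 and sum(others[:3]) <= 1:
        labels.append("three-pile")

    # Everything else counts as a resignation.
    return labels or ["resignation"]


# The Port example: two Ports left, bought with the last (unlogged) buy.
ports = classify_ending({"Province": 4, "Port": 2, "Curse": 0, "Village": 0, "Smithy": 6})
print(ports)
```

Under this reading the two-Ports game ends up as a resignation, which is the misclassification described in the paragraph.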

In the bottom panels there is the histogram with the share of games in which each player gains a certain number (left) and for the difference in the number of gains (right).
You can roughly see whether that led to more wins or losses from the colours in those bars and the top panels have the details: first, the blue coloured lines show the estimate for the win rate with a certain number of gains as well as the 95% confidence interval. For Rebuild this suggests that a player who doesn’t gain any Rebuild wins more than 50% of the games and a player with 1 Rebuild wins fewer than 50% of the games. But this might reflect that better players are more likely to skip Rebuild and they would also win more often if they go for Rebuild. To take out this effect, I estimate the version corrected for skill in green. This version uses the skill difference as an explanatory variable such that the result is an estimate for how well the players do against an equal opponent. This reduces the effect of playing without Rebuild to basically 0 (49% win rate). The right panel shows the same using the difference in gains instead of the absolute number – in the case of Rebuild there’s nothing statistically significant there.

Some gain statistics are also summarized on the top:

How many copies are gained on average by the first and second player (also conditional on gaining at least one)?
How often is at least one copy gained by one or both players?
How often is at least one copy gained by one or both players in the opening, which I count as everything that happens before the first player’s turn 3?
How often does the only player who gains it - in general or in the opening – win the game? (This is again for the raw win rate and the one that corrects for skill differences. It also includes the standard errors for the estimates.)

Some thoughts on interpreting those numbers:

For cards that are widely accessible (e.g. Jack of All Trades) a win rate below 50% for the only player to gain it suggests that usually the player who skips it is correct.
For cards that are difficult to gain because they are in limited supply (e.g. Tournament Prizes) or expensive (e.g. King’s Court), a positive effect of gaining them is not necessarily because players don’t realize the card’s strength. I would rather see it as a measure for how good it is to be the one that built the deck / got lucky to gain them.
Gaining Provinces or buying Salt the Earth is a direct sign that the player scores or is in a position to win the game. The reason for this would have to be found somewhere else most likely. Similarly, cards that are often gained to 3-pile for a win (e.g. Candlestick Maker) show a positive effect for the player gaining a lot of them.

Finally, let me remind you that this only uses the games of the top, and you would likely find different results for lower ranked players.

So much for the summary stats for each card. Most of the underlying information and much more can be found on the google sheets starting from here: I hope that it is more or less self-explanatory for someone who wants to dig deeper. I’ll just point out what else can be found there. First there are tabs on that sheet with stats for the whole database. Then, there are separate sheets for the different players that have a bunch of games (or were interested in them). Those are linked from the overview tab. Most useful for a general audience are:

better player: presenting the stats from the perspective of the player with the higher mu.
winner: presenting the stats from the perspective of the winner of each game.
5-2 opening: limits the sample to the games in which exactly 1 player had a 5-2 opening (more precisely either drew 0 or 3 Estates/Shelters for their turn 1) and presents the stats from their perspective.

On the general sheets, there is a tab with

opening gains: e.g. double Ambassador was opened 17% of the times by the better player and 15% by the subsequent winner.
gain 1st: which card is the first one that is bought / gained / trashed a certain number of times by the better player? (Ambassador is the first card to be bought twice on 32% of Ambassador boards.)
gain 1st Qvist: does the same within the Qvist-cost-categories (Ambassador was the first $3-cost card to be bought twice on 40% of Ambassador boards.)
empty piles: has the chance of each pile in the kingdom of running out and the distribution of game end conditions.
gains&plays: how often is each card gained, how often are its copies played? (The better player gains 1.6 Ambassadors and plays 8.8 Ambassadors in the average game, making it 5.5 plays for each Ambassador until game end.)
impact factor: this was motivated by the discussion in this thread. It tries to measure how different boards with a certain card are in terms of buying, gaining and trashing compared to the average board. For that purpose, it compares how much the probability of all other cards to be bought or the average number of cards to be bought changes. It also shows which cards are affected the most in a positive or negative way. Note that for normal card pairs only about 20 games with the combo are in the database, so the effect must be strong to show up. Nevertheless, the usual suspects for power combos do show up.
impacted tab shows which cards drive the impact factor the most. Intuitively, those are the cards that are most board dependent. (e.g. Peddler for number of being bought, Fortress for number of being trashed).

The individual sheets have a tab that compares the buys / gains / trashes of the named baseline player with their opponents and one tab that shows the distribution of the number of buys / gains / trashes of each card.
Then they have the “gain 1st" and “gain 1st Qvist" tabs for that player only.
The boards tab has some aggregate statistics for boards with that card: average number of turns, average number of buys / gains / trashes. Then it has the first player advantage (not a lot of effect there) and the change in the win probability for that player. For the named players and the 5-2 opener it shows how much they outperformed expectations when that card is on the board. (e.g. being the only player to open 5-2 on a Witch board gives you a 15% outperformance, that is a 65% win chance against an equal opponent with random start.) For the better player sheet this column shows the skill multiplier (whether skill difference is more or less important on those boards).

I also tried to classify cards on the better-boards sheet in terms of being village, draw, trasher, gainer (and +buy), alt-VP, attack and types of attack. The idea was to see how the presence or absence / combination of these affects the win probability. Now, you could fill threads discussing the classification of individual cards; my first try was to give each card a value of 0, 0.5, or 1 and then round down (if there’s only a 0.5 Village on the board like Necropolis, the board counts as not having Villages).
Finally the logged game numbers used for the sheet with the kingdoms are on the last tab.

Have fun with the numbers and let me know what else you'd like to see!

trivialknot · « **Reply #1 on:** September 17, 2018, 09:51:58 pm »

That's really neat!

I haven't looked through the spreadsheet yet, and I was just browsing the images. A few random observations...
-People who gained Dame Anna apparently didn't have a significantly improved chance of winning.
-Going by win rates for only one player receiving, the best boons are Earth, Field, Sea, and Wind (57%), and the worst ones are Moon, Mountain, and Sky (54%). But the confidence intervals are all 1.4% so they're all pretty close.
-The worst hexes to receive are Greed (43%), War (44%), and Misery (44%), and the least bad ones are Envy (48%), Delusion (47%), Bad Omens (47%), and Haunting (47%). Confidence intervals are 1.5%. (Edit: as pointed out by markus, these are error bars not confidence intervals)
-Trusty Steed is the most popular prize (gained 77% of the time), closely followed by Followers (71%). Princess is 57%, Diadem is 32%, and Bag of Gold is 31%.
-Save is bought 5-6 times per game on average. That's more than Alms (4-5 times), so I think it might be the card with the highest buy/gain/receive rate.

Hey, is it possible to sort the cards by skill multiplier?

faust · « **Reply #2 on:** September 18, 2018, 01:47:28 am »

What is very interesting to me is that a lot of trashers have negative gain advantage. Amulet is at -11%, Raze even at -18%, and Chapel and Steward both at -4% (numbers where only 1 player gains it). That seems to indicate that trashing is overvalued in the current metagame.

markus · « **Reply #3 on:** September 18, 2018, 02:21:58 am »

Quote from: trivialknot on September 17, 2018, 09:51:58 pm

Hey, is it possible to sort the cards by skill multiplier?

This is in the sheets found here.
(It's called skill factor there and also includes games with Band of Misfits and Overlord as the only problem with them is identifying how many were gained.)
You can have "temporary filters" in google sheet to sort it directly there, but if you want to crunch the numbers a bit more I'd recommend downloading the sheet.

Top cards for the skill factor are Mountain Pass, Secret Cave, Donate, Bishop, Peasant. (going up to 1.3 such that you'd win for example 78.6% instead of 73.1%)
Bottom cards are Swindler, Chariot Race, Familiar, Fool, Hunting Grounds (going down to 0.7 such that you'd win for example 66.8% instead of 73.1%)

I just want to point out the +/- numbers are standard errors such that you have to add/subtract the number twice to get about a 95% confidence interval. And then keep in mind that with 400 estimates, you'd expect to see 20 that are outside of this interval - and that the large/small numbers in a top/bottom list are more likely to be affected by noise.
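To make the arithmetic concrete, here is a small sketch that turns an estimate with its standard error into an approximate 95% interval and counts how many of 400 estimates would be expected to fall outside such an interval by chance. The 57% +/- 1.4% input is the boon figure quoted earlier in the thread; the helper name is made up for this example.

```python
def approx_ci95(estimate, std_error):
    """Approximate 95% interval: estimate plus/minus two standard errors."""
    return estimate - 2 * std_error, estimate + 2 * std_error


# Example: a 57% win rate reported with a standard error of 1.4%.
low, high = approx_ci95(0.57, 0.014)
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")

# With 400 independent estimates, about 5% land outside their interval
# purely by chance.
n_estimates = 400
expected_outside = round(n_estimates * 0.05)
print(f"expected estimates outside their interval: {expected_outside}")
```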

Cave-o-sapien · « **Reply #4 on:** September 18, 2018, 03:11:51 am »

Quote from: faust on September 18, 2018, 01:47:28 am

What is very interesting to me is that a lot of trashers have negative gain advantage. Amulet is at -11%, Raze even at -18%, and Chapel and Steward both at -4% (numbers where only 1 player gains it). That seems to indicate that trashing is overvalued in the current metagame.

Or people are choosing the wrong trasher when presented with several options.

faust · « **Reply #5 on:** September 18, 2018, 11:00:30 am »

Another surprising bit of data: Tax has a completely average first player advantage.

trivialknot · « **Reply #6 on:** September 18, 2018, 11:20:59 am »

Castles!

The likelihood that each castle will be gained are, in order:
73%, 60%, 51%, 47%, 43%, 38%, 33%, 26%.

Lower-ranked players are more likely to gain each Castle, so I think the most common situation is a lower-ranked player going for the Castles pile, and the higher-ranked player tactically swiping a few. Which ones are the best swiping targets?

Gain % for higher-ranked player / gain % for lower-ranked player:
35/38, 25/35, 22/29, 23/23, 19/23, 16/22, 16/17, 15/11

So the favorite swiping targets appear to be Humble, Haunted, Grand, and King's.
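One way to reproduce that ranking is to divide the higher-ranked player's gain rate by the lower-ranked player's for each Castle and sort. This sketch assumes the numbers above are listed in pile order (Humble first, King's last); the ratio as a measure of "swiping" is just one possible choice.

```python
# Castles in pile order (assumption: the percentages above follow this order).
castles = ["Humble", "Crumbling", "Small", "Haunted",
           "Opulent", "Sprawling", "Grand", "King's"]

# Gain % for the higher-ranked and lower-ranked player, from the list above.
higher = [35, 25, 22, 23, 19, 16, 16, 15]
lower = [38, 35, 29, 23, 23, 22, 17, 11]

# Ratio > 1 means the higher-ranked player gains that Castle more often.
ratios = {name: h / l for name, h, l in zip(castles, higher, lower)}

# The four Castles with the highest ratio.
top4 = sorted(ratios, key=ratios.get, reverse=True)[:4]

for name in top4:
    print(f"{name}: {ratios[name]:.2f}")
```

King's Castle is the only one the higher-ranked player gains more often than the lower-ranked player.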

trivialknot · « **Reply #7 on:** September 18, 2018, 11:36:39 am »

Quote from: faust on September 18, 2018, 01:47:28 am

What is very interesting to me is that a lot of trashers have negative gain advantage. Amulet is at -11%, Raze even at -18%, and Chapel and Steward both at -4% (numbers where only 1 player gains it). That seems to indicate that trashing is overvalued in the current metagame.

I'm not sure that's true of trashers in general. Sentry is at +13%, Plan is +3%, Cemetery is +1%. There isn't an easy way to look at them all together though, so I'm not sure.

Raze in particular is interesting, because it's -18% if you correct for skill, and -1% if you don't correct for skill. That suggests to me that Raze really is overrated by higher-ranked players.

Awaclus · « **Reply #8 on:** September 18, 2018, 12:16:26 pm »

Quote from: faust on September 18, 2018, 11:00:30 am

Another surprising bit of data: Tax has a completely average first player advantage.

That's not that surprising IMO. A lot of the time, you just buy the exact same cards in the opening as your opponent does regardless of Tax.

trivialknot · « **Reply #9 on:** September 20, 2018, 07:41:59 pm »

Another question: can you extract statistics on Mountain Pass bids?

markus · « **Reply #10 on:** September 21, 2018, 02:45:02 am »

You should join Discord where I already posted that in the past.

First bidder is usually the player that didn't gain first Province:

Games with Mountain Pass bidding taking place: 73%
Average turns before bidding taking place: 22.8
First bidder wins bid: 40%
Bid winner wins game: 55%
First bidder wins game: 39%
Average winning bid: 14.2
Median winning bid: 14

aku_chi · « **Reply #11 on:** February 01, 2019, 08:46:21 am »

markus is still collecting stats and presenting them better than ever! I recently made a video where I talk about how to find the stats, interpret them, and why I find them valuable. Hopefully this interests some people.

Honkeyfresh · « **Reply #12 on:** July 20, 2022, 06:25:52 pm »

Can we revive this and see all the data on the new cards/expansions since this thread etc?

Oh and if possible can you link to where we can input cards and pull data in the same way? I'd really like it and always feel bad asking others to do the homework for all the random stoned musings that I request left and right.

Oh and this is amazing AF. Really interesting to see. I would be very interested to see how cards errata changes affected their utility, as I have noticed a lot that even knowing the errata has changed my emotional autopilot causes me to forgo some cards based on muscle memory.

Honkeyfresh · « **Reply #13 on:** July 20, 2022, 06:33:58 pm »

Quote from: faust on September 18, 2018, 01:47:28 am

What is very interesting to me is that a lot of trashers have negative gain advantage. Amulet is at -11%, Raze even at -18%, and Chapel and Steward both at -4% (numbers where only 1 player gains it). That seems to indicate that trashing is overvalued in the current metagame.

wow. How is it possible that chapel could have a negative advantage? Might be that a lot of new players don't use it correctly (like not trashing 4 coppers e.g)

There are definitely times where raze/amulet/lookout can be just amazing and just a shitty stone in ur deck. But chapel just seems like it's never bad.

trivialknot · « **Reply #14 on:** July 20, 2022, 07:02:41 pm »

The Dominion Statistics are actively updated and maintained. They're regularly referenced in Discord, where there's a dedicated chatbot command to summon a graphical summary of statistics for any card. However, unless you're on Discord it's a tool that's easy to miss. It should honestly have its own Dominion Wiki page.

AJD · « **Reply #15 on:** July 20, 2022, 07:47:16 pm »

Oh gosh, I had completely forgotten this even existed! By all means, please do create a wiki page for it.

Honkeyfresh · « **Reply #16 on:** July 20, 2022, 08:11:23 pm »

Quote from: trivialknot on July 20, 2022, 07:02:41 pm

The Dominion Statistics are actively updated and maintained. They're regularly referenced in Discord, where there's a dedicated chatbot command to summon a graphical summary of statistics for any card. However, unless you're on Discord it's a tool that's easy to miss. It should honestly have its own Dominion Wiki page.

oooh neat-o. Thanks a bunch!

Honkeyfresh · « **Reply #17 on:** July 20, 2022, 08:12:32 pm »

Quote from: AJD on July 20, 2022, 07:47:16 pm

Oh gosh, I had completely forgotten this even existed! By all means, please do create a wiki page for it.

I'm almost afraid to pan through all the slides on the onedrive link since they contain outdated info and i'd just spend hours browsing the old data b/c it's so neat.

Awaclus · « **Reply #18 on:** July 21, 2022, 01:51:35 am »

Quote from: Honkeyfresh on July 20, 2022, 06:33:58 pm

wow. How is it possible that chapel could have a negative advantage? Might be that a lot of new players don't use it correctly (like not trashing 4 coppers e.g)

There are definitely times where raze/amulet/lookout can be just amazing and just a shitty stone in ur deck. But chapel just seems like it's never bad.

It might be that a lot of new players don't use it correctly, but that's not the reason why the stats are like this, since they only take into account games with players with µ >= 1.9 and I don't think very many of them ever get matched against n00bs.

If we look at the up-to-date DomBot stats, it turns out that being the only player to open Chapel gives you a 49%+/-2.7% chance of winning when corrected for skill, which is to say that the disadvantage there is not statistically significant. The disadvantage from being the only player to gain Chapel, which is statistically significant at 44%+/-3.1% when corrected for skill, therefore mostly comes from games where the only player who gained Chapel gained it after the opening, which is a thing that people do as a desperate attempt to recover when they've fallen behind in thinning e.g. because their non-Chapel trasher missed the shuffle a lot or they got junked a lot or something, and it makes complete sense that they would have fewer wins in that scenario.

Dominion Strategy Forum
