So I wanted to say some more things here, because this has been bugging me a bit.

First of all, it wasn't my intention to try to insult or attack grbsmd or his work here. It seems some people have gotten that impression, and this was not my intention. Most specifically to grbsmd himself, if you feel this way, I really am sorry for that.

I also feel like, while my gut told me that something was wrong with the claim that 'these are significant findings', I didn't present a terribly good explanation of why that isn't true. I'm going to try to do that now.

The details of the bootstrap procedure aren't really important (they would be in a peer-reviewed scholarly paper, but whatever, this isn't one, so we can just take your word here that you did it 'a lot'). You don't need to give standard deviations on this, though (I'm presuming what you quoted are actually standard errors, but again, whatever, no big deal), because when you're bootstrapping, you can just give the exact percentage of bootstraps that crossed your threshold. Also, standard deviations of a metric like r don't really make sense, as it is a summary statistic; it would be like asking what the standard deviation of mean player skill is, or the standard deviation of the maximum. But anyway, that's not the point: even if I don't believe what 10 SE implies, these numbers are almost certainly different from zero in a 'statistically significant' sense. But basically all that means is that we're really sure they're different from zero. It doesn't tell us how far from zero they are.
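To make the bootstrap point concrete, here is a rough sketch of what I mean by reporting the share of bootstraps that cross a threshold. The data is completely made up (I don't have the real numbers), and `pearson_r`, `bootstrap_fraction_above`, the threshold of 0 and the 1000 resamples are all just my own illustrative choices, not whatever procedure was actually used:

```python
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample with no spread
    return sxy / (sxx * syy) ** 0.5


def bootstrap_fraction_above(xs, ys, threshold=0.0, n_boot=1000, seed=0):
    """Share of bootstrap resamples (players drawn with replacement)
    whose correlation exceeds `threshold`."""
    rng = random.Random(seed)
    n = len(xs)
    hits = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        if pearson_r([xs[i] for i in idx], [ys[i] for i in idx]) > threshold:
            hits += 1
    return hits / n_boot


# Made-up data (assumption): skill depends only weakly on one card's gain rate.
rng = random.Random(42)
gain_rate = [rng.random() for _ in range(500)]
skill = [0.2 * g + rng.gauss(0, 0.3) for g in gain_rate]

r = pearson_r(gain_rate, skill)
frac = bootstrap_fraction_above(gain_rate, skill)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, bootstraps with r > 0: {frac:.1%}")
```

The share of bootstraps above zero tells you how sure you are that the effect exists; r^2 tells you how big it is. Those are two different questions.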

The biggest point I have to make is my original one: these numbers show that players' ability to correctly rate the general strength of cards is not a very big chunk of how strong they are. I'm going to quote the rebuttal of my original statement here:

So, the biggest thing to note here is that all of the numbers are tiny. You're not finding anything significant. Well, maybe statistically significant, but not practically so.

I'd argue that finding correlations this large is actually fairly substantial. If we assume that how often you buy a card is independent of other cards (which is a fairly reasonable assumption as far as independence assumptions go, since in full random the chance of getting any two specific cards in a kingdom is ~0.002%), then the r^2 values range from 0.7% to 4%. This means that statistically, I can explain 4% of the variation in skill among players simply by looking at how often the player buys Governor. If you sum up the top 20 cards on the weighted list, that explains 29% of the variance in the skill.

That's huge. This doesn't even include things like how cards are played once they're bought, when to start greening, etc. So the fact that we can explain so much of the variance in skill simply by a how often a few cards are bought is a really big deal.

First of all, as has been pointed out, some of this is down to cause vs effect. I usually win when I get more provinces. Is that because I value province more? No - in fact, I'm pretty sure I value province less than most players. But when I get more, I am just more likely to win. It's like the old John Madden quote "the team that scores more points - well, they usually win the game".

Moreover, I don't think you're looking at these numbers correctly. I assume that you are taking gain rate when available in the kingdom, rather than cards bought per game (in which case you'd really only get information about set ownership, drowning out most everything else). Which means you can't really combine all these different values together. Moreover, you can't add the 'variance explained' at all. If you wanted to do that, you would want to multiply: 96% of the variance remains unexplained after the first card, and 96% of that after the second, which leaves a bit more than 92% unexplained after two. The difference is pretty small for two cards, but once you're compounding 200 times, it will add up.
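Here is the arithmetic as a quick sketch, comparing straight addition against multiplying the unexplained shares. The function names are mine, and the 200 cards at r^2 = 1% each is a hypothetical input just to show the compounding, not a real figure from the list. Note that multiplying still leans on the independence assumption, so treat it as the generous version of the calculation:

```python
import math


def naive_sum(r2_values):
    """Add the r^2 values directly (the approach I'm objecting to)."""
    return sum(r2_values)


def compounded_explained(r2_values):
    """1 minus the product of the unexplained shares (1 - r^2)."""
    return 1.0 - math.prod(1.0 - r2 for r2 in r2_values)


# Two cards at r^2 = 4% each, as in the example above.
two_cards = [0.04, 0.04]
print(naive_sum(two_cards))             # adds up to 8%
print(compounded_explained(two_cards))  # about 7.84%, so ~92.2% unexplained

# Hypothetical: 200 cards at r^2 = 1% each.
many_cards = [0.01] * 200
print(naive_sum(many_cards))             # roughly 2, i.e. 200% "explained"
print(compounded_explained(many_cards))  # stays below 100%
```

Straight addition can sail past 100%, which is a pretty good sign that it isn't a valid way to combine these.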

Most importantly, though, you really can't combine these together at all. You ran a whole bunch of correlations, each between a single card's gain rate and player skill. This gives you a bunch of different things. However, what you WANT to do is run one multiple correlation. You really should only get one combined r. And the independence assumption breaks down, HARD. If I don't buy an A, that means I probably bought a B. The things are absolutely related to each other. Again, though, some of what you'll see on the right is that better players buy more stuff overall, but that is because they are better, not why they are better. On the left, you're going to end up seeing that overall, something like 5% (or less) of the variance in skill is explained by knowing if a card is good or bad. On the right, it will be somewhat higher, but again, I think this difference is mostly down to the cause/effect imbalance.
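As a toy illustration of why one multiple correlation is not the same as a pile of single ones, here is a sketch with two cards whose gain rates are related to each other. Everything in it is invented: I'm assuming one hidden trait drives both gain rates and part of skill, and the effect sizes are arbitrary. The combined R^2 uses the standard two-predictor formula built from the pairwise correlations:

```python
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def multiple_r2_two_predictors(a, b, y):
    """R^2 from regressing y on a and b together, via pairwise correlations."""
    r_ay, r_by, r_ab = pearson_r(a, y), pearson_r(b, y), pearson_r(a, b)
    return (r_ay**2 + r_by**2 - 2 * r_ay * r_by * r_ab) / (1 - r_ab**2)


# Made-up data (assumption): a hidden trait feeds both gain rates and skill.
rng = random.Random(0)
n = 5000
trait = [rng.gauss(0, 1) for _ in range(n)]
gain_a = [t + rng.gauss(0, 1) for t in trait]
gain_b = [t + rng.gauss(0, 1) for t in trait]
skill = [0.3 * t + rng.gauss(0, 1) for t in trait]

r2_a = pearson_r(gain_a, skill) ** 2
r2_b = pearson_r(gain_b, skill) ** 2
r2_both = multiple_r2_two_predictors(gain_a, gain_b, skill)

print(f"sum of single-card r^2: {r2_a + r2_b:.4f}")
print(f"one multiple R^2:       {r2_both:.4f}")
```

In this setup the two cards are mostly measuring the same thing, so adding their r^2 values double counts it. With a different relationship between the cards the sum could be off in the other direction, which is exactly why you need the one combined number rather than a sum.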

I mean, a quick back-of-the-napkin calculation shows that, because you only ever have 10 kingdom cards (yes, there are edge cases), if you multiply across a kingdom, even if it's close to the highest-scoring one, you're going to get 1 - 0.99^10 ~= 9.6% of the variation in skill between players coming from knowing which cards are more rawly powerful than the others. Once you correctly take the lack of independence into consideration, and/or take an average set of 10 cards, I expect that will come down a lot from even this.

The thing is, yes, some cards are better than others, but there is far more skill in knowing what is good or bad on a particular board. And then more skill yet in knowing how to sequence things, adjust to the gamestate and opponents' plans, etc. Knowing raw card strength is just a really small thing, and one that's pretty easy to pick up on.