Another way to crudely estimate this effect in percentage terms is the following.
Assume the following:
- Every player has a true "win probability" (x1 or x2 below): their win % against an average player in a long match where both players go first equally often. These win probabilities are roughly normally distributed over the population, with some standard deviation s.
- There is a true first-player advantage t, and win probabilities are linear, so that if x1 plays x2 and goes first, x1 wins with probability w(x1) = t + x1 - x2 + 1/2.
Now suppose we sample two players x1 and x2 randomly from the game-population. Assume they played an average player in their last game. Then x1 goes first with probability (1-x1+x2)/2 and x2 goes first with probability (1+x1-x2)/2.
For that pair, the win rate of whoever goes first is t + 1/2 - (x2-x1)^2.
Averaged over all pairs, this is the overall reported "first player win percentage".
x2-x1 is normally distributed around zero, and if the two players are sampled independently, the expected value of its square is its variance, which is twice the variance of each part: E[(x2-x1)^2] = 2s^2.
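For anyone who wants to check the algebra, here is the calculation spelled out. The shorthand d = x2 - x1 is mine, not part of the model, and the last line assumes x1 and x2 are drawn independently.

```latex
\begin{align*}
d &= x_2 - x_1 \\
P(x_1 \text{ first}) &= \tfrac{1+d}{2}, & w(x_1) &= t + \tfrac12 - d \\
P(x_2 \text{ first}) &= \tfrac{1-d}{2}, & w(x_2) &= t + \tfrac12 + d \\
P(\text{first player wins}) &= \tfrac{1+d}{2}\left(t + \tfrac12 - d\right)
                              + \tfrac{1-d}{2}\left(t + \tfrac12 + d\right) \\
  &= t + \tfrac12 + \tfrac{(1+d)(-d) + (1-d)\,d}{2} \\
  &= t + \tfrac12 - d^2 \\
E[d^2] &= \operatorname{Var}(x_1) + \operatorname{Var}(x_2) = 2s^2
\end{align*}
```

So the reported first-player win rate comes out as t + 1/2 - 2s^2 on average, which is where the suppression term comes from.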
A quick, sloppy look at win percentages around rank 100 suggests s~0.08 or so, which gives 2s^2 = 0.0128, so the suppression of t from the player selection bias would be around 1.3% (win pct). So if overall CR data suggests that the first player wins 55-45 (what is the actual number?), an unbiased number would be perhaps 56-44 or so.
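Since s is only eyeballed, it may help to see how sensitive the correction is to it. A minimal sketch: the function names are made up for illustration, the 0.55 is just the placeholder figure from above (not a measured number), and the alternative s values are arbitrary.

```python
def selection_bias(s):
    """Suppression of the measured first-player edge under this model: 2 * s^2."""
    return 2 * s ** 2


def corrected_first_player_rate(reported_rate, s):
    """Add the suppression back to a reported first-player win rate."""
    return reported_rate + selection_bias(s)


if __name__ == "__main__":
    reported = 0.55  # placeholder, not an actual measured value
    for s in (0.06, 0.08, 0.10):  # arbitrary spread around the eyeballed 0.08
        bias = selection_bias(s)
        fixed = corrected_first_player_rate(reported, s)
        print(f"s={s:.2f}  bias={bias:.2%}  corrected first-player rate={fixed:.2%}")
```

The correction grows with the square of s, so a modest error in the eyeballed spread moves the answer noticeably.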
Now this model is pretty crude, but it would be easily adapted to Monte Carlo if someone had different priors on the composition of the playing population and really wanted to know the answer to this.
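Here is roughly what such a Monte Carlo could look like. It is only a sketch of the model as stated above. Centering skill at 0.5, clamping probabilities to [0, 1], and the default t = 0.05 are my assumptions, and the function name is made up. To try a different prior, swap out the two `gauss` draws.

```python
import random


def simulate_first_player_win_rate(t=0.05, s=0.08, n_games=200_000, seed=0):
    """Estimate the observed first-player win rate under the crude model."""
    rng = random.Random(seed)

    def clamp(p):
        # The linear model can stray outside [0, 1] for extreme skill gaps.
        return min(1.0, max(0.0, p))

    first_wins = 0
    for _ in range(n_games):
        # Assumed prior: skills are normal around 0.5 with std dev s.
        x1 = rng.gauss(0.5, s)
        x2 = rng.gauss(0.5, s)

        # Selection step from the model: x1 goes first w.p. (1 - x1 + x2) / 2.
        if rng.random() < clamp((1 - x1 + x2) / 2):
            first, second = x1, x2
        else:
            first, second = x2, x1

        # Linear win model: the first player wins w.p. t + first - second + 1/2.
        if rng.random() < clamp(t + first - second + 0.5):
            first_wins += 1

    return first_wins / n_games


if __name__ == "__main__":
    t, s = 0.05, 0.08
    observed = simulate_first_player_win_rate(t=t, s=s)
    print(f"simulated first-player win rate: {observed:.4f}")
    print(f"closed-form t + 1/2 - 2s^2:      {0.5 + t - 2 * s * s:.4f}")
```

With normal skills the simulated rate should land close to the closed form, up to sampling noise. The interesting runs are the ones with a skewed or fat-tailed population, where the 2s^2 shortcut no longer applies directly.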