The clearest heuristic argument I can give rests on the saturation effects I described above (like the 2 max vs. 3 max).
Consider the limiting case where a superstar wins 99% of his 2p games against average players. Each average player has a 1 in 100 chance of winning the lottery or hitting the Hail Mary. You might expect the superstar to win about 98% of his 3p games (he just needs to dodge 2 independent, rare events instead of 1), so his 2p and 3p win rates, scaled by the number of players, would be about 1.98 and 2.94 respectively.
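The arithmetic for this limiting case can be sketched as below. It assumes each average opponent independently has the same upset chance, which is the simplification used in the paragraph above; `normalized_win_rate` is a name made up for this illustration.

```python
def normalized_win_rate(upset_p, num_players):
    """Win probability times number of players.

    Assumption: the superstar wins exactly when none of the
    (num_players - 1) opponents pulls off an independent upset
    of probability upset_p.
    """
    win_p = (1.0 - upset_p) ** (num_players - 1)
    return win_p * num_players


for n in (2, 3, 4):
    print("%dp: %.4f" % (n, normalized_win_rate(0.01, n)))
```

Raising the player count keeps pushing the scaled figure up, but it can never exceed the number of players, which is the saturation at work.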
The behavior persists at smaller skill differentials, but obviously to a lesser degree. I suggest you run/tweak this small program until you believe me, or until you can come up with a similarly small program that shows that I am wrong.
#!/usr/bin/python
import random

# Each player's result is a uniform draw on [0, max); highest draw wins.
GOOD_P_MAX = 120
AVG_P_MAX = 100


def sim(strength_list):
    """Play one game and return the index of the winner."""
    outcomes = [random.random() * s for s in strength_list]
    winner = max(outcomes)
    # ignore ties, will basically never happen with random floats anyway.
    return outcomes.index(winner)


def sim_many(strength_list, N):
    """Play N games; return each player's win rate times the player count."""
    ret = [0 for i in strength_list]
    for i in range(N):
        # A win is worth len(strength_list), so an even field scores 1.0 each.
        ret[sim(strength_list)] += len(strength_list)
    for ind, v in enumerate(ret):
        ret[ind] = v / float(N)
    return ret


print(sim_many([GOOD_P_MAX, AVG_P_MAX], 100000))
print(sim_many([GOOD_P_MAX, AVG_P_MAX, AVG_P_MAX], 100000))
# sanity check that order doesn't matter
print(sim_many([AVG_P_MAX, AVG_P_MAX, GOOD_P_MAX], 100000))
output:
[1.1701600000000001, 0.82984000000000002]
[1.3330200000000001, 0.83103000000000005, 0.83594999999999997]
[0.83675999999999995, 0.83099999999999996, 1.3322400000000001]
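As a cross-check on the simulated numbers, the same uniform-draw model can be solved exactly. What follows is a minimal sketch, not part of the original program: it assumes one strong player against otherwise identical average players and that the strong player's max is at least the average max. The name `exact_normalized` is made up for this illustration.

```python
from fractions import Fraction


def exact_normalized(good_max, avg_max, num_players):
    """Exact win rate times player count for the strong player.

    Assumptions: draws are uniform on [0, max), there are
    (num_players - 1) identical average opponents, and
    good_max >= avg_max.
    """
    g, a, n = Fraction(good_max), Fraction(avg_max), num_players
    # A draw above avg_max wins outright: probability (g - a) / g.
    # A draw x below avg_max must beat n - 1 uniforms on [0, a);
    # integrating (x / a) ** (n - 1) / g over [0, a] gives a / (g * n).
    win_p = (g - a) / g + a / (g * n)
    return win_p * n


for n in (2, 3):
    print("%dp: %.4f" % (n, float(exact_normalized(120, 100, n))))
```

For 120 vs. 100 this gives exactly 7/6 in 2p and 4/3 in 3p, in line with the simulated 1.170 and 1.333.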