Okay. Think I'm counting right again now? 3p pod means 216 order-invariant possible result lines, with 8 of these giving perfect scores, for a ~3.7% chance. 4p pod means 13824 lines with 216 of them giving perfect scores, for a rate of only 1.5625%. As the number of possibilities grows, though, it becomes increasingly difficult to count them, because there's LOTS of different cases, especially for the middle ones like 18 points.
So first you are trying to calculate the chance for a pod to have at least one player qualify, given some threshold.
Note that
8/216=1/27=1/81*3 (3 players each have 1/81 chance to win 4 straight games)
216/13824=1/64=1/256*4
is identical to the first table you gave. Yes, further down the numbers will start to differ, for an obvious reason: a probability cannot be larger than one, so you cannot just multiply that number by 3 or 4. The real cause, though, is that when more than one player can qualify, a naive multiplication counts those outcomes multiple times. It is not that one sequence can sometimes prevent another (for example, when one player scores 1111 the others can't). In fact, when the qualifying outcomes are mutually exclusive, the probability is exactly that number multiplied by the number of players.
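If you want to check the two identities above by brute force, here is a minimal sketch. It assumes equally skilled players (every finishing order of a game equally likely) and 4 games per pod, and it enumerates all full result lines rather than the 216 or 13824 order-invariant ones, so the raw counts differ but the ratios should match. The function name is just something I made up for illustration.

```python
from fractions import Fraction
from itertools import permutations, product


def prob_some_player_sweeps(n_players, n_games=4):
    """Exact chance that at least one player finishes 1st in every game.

    Assumption: all finishing orders are equally likely in each game.
    """
    # Each order lists the players from 1st place to last place.
    orders = list(permutations(range(n_players)))
    hits = total = 0
    for games in product(orders, repeat=n_games):
        total += 1
        winners = {order[0] for order in games}
        if len(winners) == 1:  # the same player won every game
            hits += 1
    return Fraction(hits, total)


print(prob_some_player_sweeps(3))  # 1/27, i.e. 3 * 1/81
print(prob_some_player_sweeps(4))  # 1/64, i.e. 4 * 1/256
```

Only one player can sweep a pod, so these events are mutually exclusive, which is why multiplying the single-player chance by the number of players is exact here.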
Perhaps a better question to ask is: given a threshold, what is the expected number of players per pod who qualify? This question is also easier to answer. More importantly, the answer is the same as if we treated each player's record as independent.
The reason is as follows. Let R denote the record of a player (e.g. something like 1-4-3-2), with R1 the record of the first player, and so on. In a 4p game, R1, R2, R3, R4 are four correlated random variables; say, if R1=1111 then R2 cannot be 1234. Notice there are conditional probabilities as well; e.g. given R1=1221, the chance for R2 to be 2134 is different from what it would be without knowing R1=1221.
We can encode all this information in the joint probability distribution P(R1,R2,R3,R4): P(1111,1234,R3,R4)=0, etc.
Let us also define the threshold function Q(R), which returns 1 when a record qualifies and 0 otherwise. For example, if threshold=22 (with 6-4-2-0 scoring), then Q(1111)=Q(1112)=Q(1121)=Q(1211)=Q(2111)=1 and Q is zero for every other record.
The expected number of qualified players is then <Q(R1)+Q(R2)+Q(R3)+Q(R4)>.
Now here is an important observation. When we know nothing about the other players, the probability of, say, P1 having a certain record is the same as in our previous calculation; let us call it p(R). (That is, assuming equally skilled players, the chance to score 1111 is 1/256, 1112 and the like 4/256, etc.) In equations it reads
sum_(R2,R3,R4)P(R1,R2,R3,R4)=p(R1).
So, even though the records are not independent (i.e., P(R1,R2,R3,R4)!=p(R1)p(R2)p(R3)p(R4)), the marginal probability of each record is unchanged. Intuitively, since you know nothing about the others, correlations don't matter.
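Here is a rough sketch that checks the marginal claim by enumerating the whole joint distribution for a 4p pod. It assumes 4 games and that every finishing order is equally likely; the helper name `record` is mine, not anything standard.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

N_PLAYERS, N_GAMES = 4, 4
# order[k] is the player who finished in place k+1 in one game.
orders = list(permutations(range(N_PLAYERS)))


def record(games, player):
    """Record such as (1, 4, 3, 2): the player's placing in each game."""
    return tuple(order.index(player) + 1 for order in games)


marginal = Counter()  # how often player 0 ends up with each record
both_sweep = 0        # outcomes where players 0 and 1 both score 1111
total = 0
for games in product(orders, repeat=N_GAMES):
    total += 1
    r1 = record(games, 0)
    marginal[r1] += 1
    if r1 == (1, 1, 1, 1) and record(games, 1) == (1, 1, 1, 1):
        both_sweep += 1

p_1111 = Fraction(marginal[(1, 1, 1, 1)], total)
print(p_1111)                                # 1/256, same as the single-player calculation
print(len(marginal), set(marginal.values())) # 256 records, all equally frequent
print(Fraction(both_sweep, total))           # 0, whereas p(1111)*p(1111) = 1/65536
```

So summing the joint distribution over the other players gives back the single-player p(R), even though the joint probability clearly does not factor into a product.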
The expected number of players who qualify can then be calculated, using linearity of expectation, as
<Q(R1)+Q(R2)+Q(R3)+Q(R4)>=<Q(R1)>+<Q(R2)>+<Q(R3)>+<Q(R4)>=4 <Q(R)>.
That is, the expected number of qualified players per pod is just the chance you tabulated times the number of players.
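To make this concrete, here is a small sketch that computes both the expected number of qualifiers and the chance of at least one qualifier for a 4p pod. It assumes 4 games, 6-4-2-0 scoring, and equally likely finishing orders; `pod_stats` is a name I picked for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

POINTS = (6, 4, 2, 0)  # points for 1st..4th place (6-4-2-0 scoring)
N_PLAYERS, N_GAMES = 4, 4


def pod_stats(threshold):
    """Return (expected number of qualifiers, chance of at least one qualifier)."""
    orders = list(permutations(range(N_PLAYERS)))
    total = qualifiers = pods_with_qualifier = 0
    for games in product(orders, repeat=N_GAMES):
        scores = [0] * N_PLAYERS
        for order in games:
            for place, player in enumerate(order):
                scores[player] += POINTS[place]
        n_qualified = sum(s >= threshold for s in scores)
        total += 1
        qualifiers += n_qualified
        pods_with_qualifier += n_qualified > 0
    return Fraction(qualifiers, total), Fraction(pods_with_qualifier, total)


expected_22, any_22 = pod_stats(22)
expected_20, any_20 = pod_stats(20)
print(expected_22, any_22)           # 5/64 5/64: 4 * 5/256, qualifiers are mutually exclusive
print(expected_20)                   # 15/64: 4 * 15/256
print(any_20 < expected_20)          # True: two players can both reach 20
```

At threshold 22 only one player per pod can qualify, so the two numbers coincide. At threshold 20 two players can qualify together (e.g. 1122 and 2211), so the chance of at least one qualifier drops below 4 times the single-player chance, while the expected count still equals it.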
Now, as we can see, the difference in chance between 3p and 4p oscillates, which means the scoring does not favor 3p pods in particular. (I do agree that if you look at the table in detail there are more spots favoring 3p; but I think this is coincidental. That is, you can change it by changing the number of games played. At the very top it does of course favor 3p, which is not surprising: it is easier to get first place in 3p, and at the top you need first place.) There is a factor of 4/3 from the number of players, but this is expected. More players in a pod should generate more qualified players.
But the point is, a 3p pod is more likely to produce players at these higher point thresholds, even if any particular player may not be so much favoured. Which is important insofar as the 3p pods increase the threshold more quickly than 4p pods do.
Not sure this is right. As tabulated, a 3p pod is more likely to produce 24-pt players but less likely to produce 22 and up. At 20-21 they are almost equal. Anyway, this goes back to the same table. If you say that table favors 3p, then OK, it favors 3p.
If I add an extra player to the pod, it is more likely that the top player in that pod will score worse, even with the added points injected. Certainly you see this at the top end, which is what is really important.
Not sure how you reach that conclusion with two cancelling factors (harder to win vs. more points for 2nd place), but this comes back to the question of whether a player's expected score represents their skill well across different numbers of players. This scoring method ensures that it works for an average player. For players at the higher end, it depends on the game itself, so yeah, maybe your feeling is right, but I haven't found a convincing argument for it.