Still nowhere near a solution
I think the causal diagram here is:
Number Distribution -> P1 Action
|
-----------------------> P2 Action
So by observing the number distribution, P2 can learn up to the maximum entropy (~5.5 bits per box) about P1's Action. But this by itself is completely useless, because we're not graded on how similar or different P1's and P2's actions look. I mean, you could also just fix them in advance.
I don't think it is possible, given this diagram, to learn more about the number distribution. Therefore, I think that P(P2 found the right number) = 0.5, and ditto for any other player: I think all 100 players will have a 0.5 chance of finding their number.
If this is true, the trick lies entirely in making the successes correlated, i.e. P(P2 found her number | P1 found her number) > 0.5. Not completely sure about this, though.
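
As a sanity check on the two probabilities above, here is a minimal sketch that computes them exactly for the simplest case, where both players' box choices are fixed in advance. It assumes 100 boxes, 50 opened per player, and numbers placed uniformly at random. The function name `success_probabilities` and the two example box sets are made up for illustration, and it says nothing about strategies where a player's next box depends on what she has already seen.

```python
from fractions import Fraction

def success_probabilities(n_boxes, boxes_p1, boxes_p2):
    """Exact success probabilities for two players with fixed box sets.

    Assumption: numbers are placed uniformly at random, so the boxes holding
    P1's and P2's numbers form a uniformly random ordered pair of distinct boxes.
    """
    s1, s2 = set(boxes_p1), set(boxes_p2)
    p1_hits = p2_hits = both_hits = total = 0
    for a in range(n_boxes):          # box holding P1's number
        for b in range(n_boxes):      # box holding P2's number
            if a == b:                # two numbers cannot share a box
                continue
            total += 1
            p1_hits += a in s1
            p2_hits += b in s2
            both_hits += (a in s1) and (b in s2)
    p1 = Fraction(p1_hits, total)
    p2 = Fraction(p2_hits, total)
    p2_given_p1 = Fraction(both_hits, p1_hits)  # P(P2 succeeds | P1 succeeds)
    return p1, p2, p2_given_p1

# Both players open the same 50 boxes.
same = success_probabilities(100, range(50), range(50))
# The players open disjoint halves.
disjoint = success_probabilities(100, range(50), range(50, 100))
print(same)
print(disjoint)
```

Under these assumptions each player succeeds with probability exactly 1/2 in both cases, while P(P2 found her number | P1 found her number) comes out as 49/99 for identical box sets and 50/99 for disjoint ones. So choices fixed in advance barely move the conditional probability in either direction.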