:minor necro:
I'll try to trick your intuition into believing the right answer. Consider the case of an incoming contender with no previous data: all you have is that wide initial distribution for the player's skill, mu = 25 with a big sigma^2.
Do you still want to ignore the information that she got into round X when making predictions about her getting into round X + 1?
At least for this case, you agree that taking the intermediate tournament results into account will help your prediction, right?
blueblimp is entirely correct. I believe what's happening here, rrenaud, is that your intuition is saying "surely updating on the fact that she made it to round X moves our belief about her skill up, because there's far more probability mass in (she got to round X with high skill) than in (she got to round X with low skill)". That is entirely accurate in a world where we have uncertainty about her skill.

blueblimp's model is different. The initial mu=25, large sigma^2 distribution is a measure of our uncertainty about her skill. We have two basic options:
1. Calculate the odds of her winning round 1 while incorporating that uncertainty, then update our uncertainty to one thing in the branch of the problem where we suppose she won and to another thing in the branch where we suppose she lost. Keep going until we have the probabilities of all n! outcomes, then combine those with the same winners.
2. Instead of propagating and updating the uncertainty, resolve it artificially right now. Sample a single point randomly from that distribution and suppose that is her actual skill. No uncertainty left. Now do the math as if we had no skill uncertainty, find the distribution of winners, record it, repeat the whole thing a bajillion times, and combine the resulting distributions to get the win probabilities.
(between these options, it's possible to narrow down player skills by sampling and then propagate the uncertainty, but I think that's not useful if our uncertainties start out nice and normal)
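A minimal sketch of option 2, assuming a hypothetical four-player single-elimination bracket. Only New's and blueblimp's (mu, sigma) priors come from this thread; "C" and "D" are made-up players for illustration, and the win model is the 4^(X/25)/(4^(X/25)+1) rule used below:

```python
import random

# Skill priors as (mu, sigma). New and blueblimp are from this thread;
# C and D are invented so the bracket has four players.
players = {
    "New": (25.0, 8.33),
    "blueblimp": (42.4, 2.3),
    "C": (30.0, 5.0),
    "D": (35.0, 4.0),
}

def win_prob(skill_a, skill_b):
    # A player X points stronger wins with probability 4^(X/25)/(4^(X/25)+1).
    return 1.0 / (1.0 + 4.0 ** ((skill_b - skill_a) / 25.0))

def simulate_bracket(bracket, skills):
    # Single elimination: play adjacent pairs until one player remains.
    # Assumes the field size is a power of two.
    while len(bracket) > 1:
        bracket = [
            a if random.random() < win_prob(skills[a], skills[b]) else b
            for a, b in zip(bracket[::2], bracket[1::2])
        ]
    return bracket[0]

def tournament_win_probs(n_sims):
    wins = {name: 0 for name in players}
    names = list(players)
    for _ in range(n_sims):
        # Resolve the uncertainty artificially: one skill sample per player,
        # then simulate as if skills were known exactly.
        skills = {n: random.gauss(mu, sigma) for n, (mu, sigma) in players.items()}
        wins[simulate_bracket(names, skills)] += 1
    return {n: w / n_sims for n, w in wins.items()}

random.seed(0)
probs = tournament_win_probs(20_000)
print(probs)
```

With enough simulations the relative frequencies converge to the championship probabilities under the prior.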
Which of these two is better? I think 2. converges to 1. in the limit and is much easier to code... Let's see what 1. would look like. I'll make the false (but convenient) assumption that TrueSkill makes about uncertainty, which is that we can keep the distributions normal the whole way.
One round: New (25, 8.33^2) versus blueblimp (42.4, 2.3^2)
The difference X between blueblimp's skill and New's follows N(42.4-25, 2.3^2+8.33^2), and a player with X greater skill wins with probability 4^(X/25)/(4^(X/25)+1). Plugging into Wolfram Alpha (1), that's a 71.5% chance blueblimp wins, and their new skill distributions according to (2) are
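The integral in (1) is easy to reproduce numerically; here's a sketch that applies Simpson's rule to the skill-difference distribution:

```python
import math

# Skill priors: New ~ N(25, 8.33^2), blueblimp ~ N(42.4, 2.3^2).
# The difference (blueblimp - New) ~ N(mu_d, var_d).
mu_d = 42.4 - 25.0
var_d = 2.3 ** 2 + 8.33 ** 2

def win_prob_given_diff(x):
    # A player x points stronger wins with probability 4^(x/25)/(4^(x/25)+1).
    return 1.0 / (1.0 + 4.0 ** (-x / 25.0))

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Simpson's rule over +/- 8 standard deviations of the difference.
lo = mu_d - 8 * math.sqrt(var_d)
hi = mu_d + 8 * math.sqrt(var_d)
n = 2000  # must be even for Simpson's rule
h = (hi - lo) / n
total = 0.0
for i in range(n + 1):
    x = lo + i * h
    w = 1 if i in (0, n) else (4 if i % 2 else 2)
    total += w * normal_pdf(x, mu_d, var_d) * win_prob_given_diff(x)
p_bb_wins = total * h / 3

print(round(p_bb_wins, 3))
```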
blueblimp wins: New (24.258, 7.802^2) and blueblimp (42.457, 2.291^2).
New wins: New (38.998, 5.594^2) and blueblimp (41.332, 2.253^2).
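For comparison, the branch updates can also be computed without the keep-it-normal assumption by putting both priors on a grid and doing the Bayes update directly. This sketch uses only the 4^(X/25) win model stated above, so its numbers needn't match the TrueSkill calculator's exactly:

```python
import math

def grid(mu, sigma, n=400):
    # Discretize N(mu, sigma^2) over +/- 5 sigma into (points, weights).
    xs = [mu - 5 * sigma + i * (10 * sigma / (n - 1)) for i in range(n)]
    ws = [math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) for x in xs]
    z = sum(ws)
    return xs, [w / z for w in ws]

def win_prob(diff):
    # A player `diff` points stronger wins with prob 4^(diff/25)/(4^(diff/25)+1).
    return 1.0 / (1.0 + 4.0 ** (-diff / 25.0))

xs_new, p_new = grid(25.0, 8.33)   # New's prior
xs_bb, p_bb = grid(42.4, 2.3)      # blueblimp's prior

# P(blueblimp wins) and the joint posterior on that branch.
p_win = 0.0
post_new = [0.0] * len(xs_new)
post_bb = [0.0] * len(xs_bb)
for i, (a, pa) in enumerate(zip(xs_bb, p_bb)):
    for j, (b, pb) in enumerate(zip(xs_new, p_new)):
        like = pa * pb * win_prob(a - b)   # prior * P(blueblimp wins | skills)
        p_win += like
        post_bb[i] += like
        post_new[j] += like

post_new = [p / p_win for p in post_new]
post_bb = [p / p_win for p in post_bb]

def mean_sd(xs, ps):
    m = sum(x * p for x, p in zip(xs, ps))
    v = sum((x - m) ** 2 * p for x, p in zip(xs, ps))
    return m, math.sqrt(v)

m_new, s_new = mean_sd(xs_new, post_new)
m_bb, s_bb = mean_sd(xs_bb, post_bb)
print(round(p_win, 3), (m_new, s_new), (m_bb, s_bb))
```

On the blueblimp-wins branch, New's posterior mean drops below 25 and blueblimp's rises above 42.4, which is the qualitative shape of the updates above.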
Now just do that lots of times and you can get the full table! Any language with a numerical integration library will make this easier.
(1)
http://www.wolframalpha.com/input/?i=int%281%2Fsqrt%282*pi%29*1%2F%282.3%5E2%2B8.33%5E2%29%5E.5*e%5E-%28%28X-%2842.4-25%29%29%5E2%2F%282*%282.3%5E2%2B8.33%5E2%29%29%29*4%5E%28X%2F25%29%2F%284%5E%28X%2F25%29%2B1%29%2CX%3D-inf..inf%29
(2)
http://atom.research.microsoft.com/trueskill/rankcalculator.aspx