WW, without giving away your intellectual property, can you tell us how your algorithm compares to other Bayesian skill-rating systems (e.g. TrueSkill), either theoretically or in simulated competitions?
Short answer: no.
Long answer: how would I do this? What do you mean by "compares", exactly? And what data would I test it on? Simulated competitions really don't make sense, and it's not as if there's a bunch of real data sitting around. Or maybe there is, but I haven't gone through it, and I haven't worked through a full implementation of these other systems myself. And what metric would I measure by?
I don't think your questions have obvious answers, but you'll want to address them if you're seriously thinking about patenting or publishing your algorithm. It's probably worth looking at the approaches used to test TrueSkill and the Glicko system.
TrueSkill:
http://research.microsoft.com/pubs/67956/NIPS2006_0688.pdf
Glicko:
http://tennis-skill-rankings.googlecode.com/hg-history/c977c53a3af2913e780e39666fe1a272cc298319/links/glicko.pdf
Herbrich & Graepel compare TrueSkill to Elo on predictive performance, match quality, win probability, and convergence rates. They're a bit imprecise about how they calculate their metrics, but their basic criteria are:
- Ratings differences should be a reliable way of predicting the outcome of a match.
- A match between similarly rated players should be (relatively) likely to result in a draw.
- Each player's long-run record against similarly rated players should be close to 50%.
- Ratings should converge quickly.
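The first criterion can be made concrete with a small sketch. This is my own illustration, not Herbrich & Graepel's actual metric: it scores Elo-style ratings by the log-loss of the implied win probabilities over a set of recorded matches (lower is better).

```python
import math

def elo_win_prob(r_a, r_b, scale=400.0):
    """Standard Elo logistic win probability for player A over B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))

def mean_log_loss(matches, ratings):
    """Average negative log-likelihood of observed outcomes.

    matches: list of (player_a, player_b, outcome), outcome = 1 if A won, 0 if B won.
    ratings: dict mapping player id -> current rating.
    A reliable rating system assigns high probability to the outcomes
    that actually occurred, giving a low score.
    """
    total = 0.0
    for a, b, outcome in matches:
        p = elo_win_prob(ratings[a], ratings[b])
        total += -(outcome * math.log(p) + (1 - outcome) * math.log(1 - p))
    return total / len(matches)

# Toy check: with A rated 400 points above B, a win by A costs little,
# a loss by A costs a lot.
ratings = {"A": 1800.0, "B": 1400.0}
print(mean_log_loss([("A", "B", 1)], ratings))  # ~0.095
print(mean_log_loss([("A", "B", 0)], ratings))  # ~2.40
```

You could compute the same score for any other rating system that outputs win probabilities, which makes cross-system comparisons straightforward.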
They work with experimental data from the beta testing of Halo 2, which is team-based and so not appropriate here. I'll bet you could get your hands on a large database of online chess matches, though.
Glickman uses simulated data where outcomes are determined by a random draw of performance p_i from a normal distribution N(s_i, b), where players i = 1, 2 have individual skill levels s_i and a shared performance variance b. That's essentially the model Elo is designed for, but Glickman modifies it by assuming that skill levels vary over time, following a Bayesian random walk with standard deviation v. He evaluates Glicko by the average deviation between its estimates of s_i and v and their actual values.
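Glickman's generative model is simple to reproduce. A minimal sketch, with parameter values chosen arbitrarily and b and v treated as standard deviations for convenience (Glickman's paper has the exact parameterization):

```python
import random

def simulate_series(n_periods, s1=0.0, s2=0.0, b=1.0, v=0.1, seed=0):
    """Glickman-style simulated data for two players.

    Each period, performances are drawn from N(s_i, b) and the higher
    draw wins; between periods each skill takes a Gaussian random-walk
    step with standard deviation v. Returns the outcome sequence
    (1 = player 1 wins) and the true skill trajectory, so a rating
    system's estimates of s_i can be scored against the truth.
    """
    rng = random.Random(seed)
    outcomes, skills = [], []
    for _ in range(n_periods):
        p1 = rng.gauss(s1, b)  # performance draws
        p2 = rng.gauss(s2, b)
        outcomes.append(1 if p1 > p2 else 0)
        skills.append((s1, s2))
        s1 += rng.gauss(0.0, v)  # random walk in skill
        s2 += rng.gauss(0.0, v)
    return outcomes, skills
```

Feeding `outcomes` to a rating system and comparing its estimates against `skills` gives exactly the kind of deviation measure Glickman reports.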
There's a fair amount of other academic work on the topic, but these two seem to be the best-received (or at least the most cited). Their tests aren't gospel, but they're a reasonable place to start.