Dominion Strategy Forum

Dominion => Dominion Online at Shuffle iT => Dominion General Discussion => Goko Dominion Online => Topic started by: ThaddeusB on September 02, 2014, 05:45:27 pm

Title: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 02, 2014, 05:45:27 pm
I am working on a log scraper/database tool for Goko and am interested in what kind of stats questions people may have. Ultimately there will hopefully be a Councilroom style web interface, but don't get too excited as that is a long ways off. Progress is slow so far due to both the complexities of Dominion and Goko's inconsistent logs... When complete I should be able to answer all kinds of things as I am capturing each turn, not just kingdom cards and players. To help get the data structure right, I am asking for the type of questions that you are curious about - they can be specific or vague. For example, how often does the player with more actions played win? What is the damage done by card x missing the 2nd shuffle? What is the most VP scored in a loss? And so on. Some things are a lot harder to code for than others (CR stuff is mostly easy), and some things may be impossible to answer if not pre-planned for, so I'm asking for feedback in advance. Thanks!
Title: Re: working on a new stats database - what questions do you want answered
Post by: Awaclus on September 02, 2014, 06:09:53 pm
First player advantage for each card would be pretty cool. Assuming that you're going to implement the CR features already.
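
A rough sketch of how first player advantage per card could be computed, assuming each parsed game is a dict with the hypothetical keys `kingdom` (list of card names) and `winner` (0 for the first player, 1 for the second, `None` for a tie). None of these names come from the actual tool.

```python
from collections import defaultdict

def first_player_advantage(games):
    """Return {card: share of decided games won by the first player}."""
    # card -> [first-player wins, decided games with that card in the kingdom]
    tally = defaultdict(lambda: [0, 0])
    for game in games:
        if game["winner"] is None:  # ties are ignored in this sketch
            continue
        for card in game["kingdom"]:
            tally[card][1] += 1
            if game["winner"] == 0:
                tally[card][0] += 1
    return {card: wins / total for card, (wins, total) in tally.items()}
```

A value well above 0.5 for a card would suggest that going first matters more in kingdoms containing it.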
Title: Re: working on a new stats database - what questions do you want answered
Post by: rrenaud on September 02, 2014, 06:10:59 pm
One suggestion on implementation.  Don't worry about working around all the bugs/craziness you find in the logs.  Detect a problem and just throw the whole game away rather than adding hacks to the parsing code.  Aim to parse say, 95% of the logs that are reasonably well formed.  Keep track of how many game logs you are discarding, but be willing to lose a few.

Writing log parsers and working around bugs sucks.
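
A minimal sketch of the "detect and discard" approach. The log format here (one `player: action` pair per line) is a made-up toy format for illustration, not Goko's real one, and `MalformedLog`, `parse_game` and `parse_all` are hypothetical names.

```python
class MalformedLog(Exception):
    """Raised as soon as a log does not look the way the parser expects."""

def parse_game(text):
    # Toy format (an assumption): every non-empty line is "player: action".
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(":", 1)
        if len(parts) != 2 or not parts[1].strip():
            raise MalformedLog(line)  # no workaround, just give up on this game
        events.append((parts[0].strip(), parts[1].strip()))
    if not events:
        raise MalformedLog("empty log")
    return events

def parse_all(logs):
    """Parse every log, discarding broken games and counting the discards."""
    parsed, discarded = [], 0
    for text in logs:
        try:
            parsed.append(parse_game(text))
        except MalformedLog:
            discarded += 1
    return parsed, discarded
```

Comparing `discarded` against `len(logs)` shows whether the parser is still within something like the 95% target.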
Title: Re: working on a new stats database - what questions do you want answered
Post by: silverspawn on September 02, 2014, 06:20:03 pm
First player advantage for each card would be pretty cool. Assuming that you're going to implement the CR features already.
agreed

I don't think the "when does more action played win" is particularly interesting, since low level play will dominate here, and it doesn't really tell much at all.
Title: Re: working on a new stats database - what questions do you want answered
Post by: rrenaud on September 02, 2014, 06:28:26 pm
One thing to learn from CR.

Everyone second guesses the global stats because there are lots of games from crappy players.

Pick some decent heuristic metric for "good player", (say, in top 200 at time of game), and in addition to the global stats, provide stats just from that subpool of players.
Title: Re: working on a new stats database - what questions do you want answered
Post by: Beyond Awesome on September 02, 2014, 06:53:48 pm
One thing to learn from CR.

Everyone second guesses the global stats because there are lots of games from crappy players.

Pick some decent heuristic metric for "good player", (say, in top 200 at time of game), and in addition to the global stats, provide stats just from that subpool of players.

Second this. Also, use the Isotropish leaderboard to determine top 200, not the Goko board for obvious reasons.

First turn advantage of cards as has been said.

Plus win rate with a card vs. not getting said card

Also, data on opening splits
Title: Re: working on a new stats database - what questions do you want answered
Post by: theblankman on September 02, 2014, 07:15:08 pm
Plus win rate with a card vs. not getting said card
Along the same lines, is there significant correlation between winning and:
- gaining your first copy of a card before opponent
- gaining more copies of a card than opponent
- playing a card for the first time before opponent
- playing a card more times than opponent

(Hypothesis: These will all turn out heavily in favor of my least favorite card: Cultist :) )
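
One way the first of those questions might be answered, as a sketch only. It assumes each game is a dict with a hypothetical `gains` list of `(turn, player, card)` tuples in chronological order and a `winner` player index. A player who is the only one to ever gain the card also counts as gaining it first.

```python
def first_gain_win_rate(games, card):
    """Share of games won by whoever gained the first copy of `card`."""
    wins = total = 0
    for game in games:
        # The first matching entry in chronological order is the first gainer.
        first = next((p for _, p, c in game["gains"] if c == card), None)
        if first is None:  # nobody gained the card in this game
            continue
        total += 1
        if game["winner"] == first:
            wins += 1
    return wins / total if total else None
```

The same loop shape would work for "played first" by swapping the gains list for a list of plays.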
Title: Re: working on a new stats database - what questions do you want answered
Post by: Awaclus on September 02, 2014, 07:36:32 pm
Plus win rate with a card vs. not getting said card
Along the same lines, is there significant correlation between winning and:
- gaining your first copy of a card before opponent
- gaining more copies of a card than opponent
- playing a card for the first time before opponent
- playing a card more times than opponent

(Hypothesis: These will all turn out heavily in favor of my least favorite card: Cultist :) )
Actually the "first time before opponent" stats could be graphs where X is the turn the card is first bought/gained and Y is the advantage it gives for the player who did it.
Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 02, 2014, 07:53:16 pm
One thing to learn from CR.

Everyone second guesses the global stats because there are lots of games from crappy players.

Pick some decent heuristic metric for "good player", (say, in top 200 at time of game), and in addition to the global stats, provide stats just from that subpool of players.

Yes, that had occurred to me.  I will do one of two things - either I'll retroactively calculate the TrueSkill ratings myself or I'll use the current Iso leaderboard.  The first is a bit more work, but would allow filtering based on a player's rating at game time, which should be more accurate than current rating.  Either way, it'll be something that will be a parameter that can be adjusted to preference.
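
As a sketch of what such an adjustable parameter could look like, assuming each game record carries a hypothetical `ratings_at_game_time` list with one rating per player:

```python
def filter_games(games, min_rating, require_both=True):
    """Keep games whose players meet `min_rating` at game time.

    With require_both=True every player must qualify; with False,
    one qualifying player is enough.
    """
    check = all if require_both else any
    return [
        g for g in games
        if check(r >= min_rating for r in g["ratings_at_game_time"])
    ]
```

Any stat could then be computed on `filter_games(games, threshold)` instead of on the full pool.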
Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 02, 2014, 07:57:20 pm

Also, data on opening splits

Do you mean win % with different opening pairs, win rate with 5/2 vs. 4/3, or something else?
Title: Re: working on a new stats database - what questions do you want answered
Post by: Beyond Awesome on September 02, 2014, 08:00:49 pm

Also, data on opening splits

Do you mean win % with different opening pairs, win rate with 5/2 vs. 4/3, or something else?

I mean both of what you just said.
Title: Re: working on a new stats database - what questions do you want answered
Post by: theblankman on September 02, 2014, 08:38:45 pm
Plus win rate with a card vs. not getting said card
Along the same lines, is there significant correlation between winning and:
- gaining your first copy of a card before opponent
- gaining more copies of a card than opponent
- playing a card for the first time before opponent
- playing a card more times than opponent

(Hypothesis: These will all turn out heavily in favor of my least favorite card: Cultist :) )
Actually the "first time before opponent" stats could be graphs where X is the turn the card is first bought/gained and Y is the advantage it gives for the player who did it.
Another possibility for X is number of turns between player A's first gain or play and B's.  This might be instructive in a case like Mercenary, i.e. is it a bigger deal if I play Merc three turns before you vs one turn. 

Meanwhile I thought of another: Correlation between the presence of a given card in a kingdom, and the variety of cards gained during games, i.e. which cards statistically lead to high-variety or low-variety decks (I have some suspects, like Cultist and Rebuild for low variety, Fairgrounds for high variety, but in this kind of analysis the surprises are often the interesting part). 
Title: Re: working on a new stats database - what questions do you want answered
Post by: Awaclus on September 02, 2014, 08:54:30 pm
Plus win rate with a card vs. not getting said card
Along the same lines, is there significant correlation between winning and:
- gaining your first copy of a card before opponent
- gaining more copies of a card than opponent
- playing a card for the first time before opponent
- playing a card more times than opponent

(Hypothesis: These will all turn out heavily in favor of my least favorite card: Cultist :) )
Actually the "first time before opponent" stats could be graphs where X is the turn the card is first bought/gained and Y is the advantage it gives for the player who did it.
Another possibility for X is number of turns between player A's first gain or play and B's.  This might be instructive in a case like Mercenary, i.e. is it a bigger deal if I play Merc three turns before you vs one turn. 

Meanwhile I thought of another: Correlation between the presence of a given card in a kingdom, and the variety of cards gained during games, i.e. which cards statistically lead to high-variety or low-variety decks (I have some suspects, like Cultist and Rebuild for low variety, Fairgrounds for high variety, but in this kind of analysis the surprises are often the interesting part).
I'm pretty sure Cultist results in high variety, as it adds up to 5 unique cards to the game and usually puts them all in the players' decks, and works well in engines. Rebuild is definitely one that leads to low variety, and Scout and others that basically remove one card slot from the kingdom by taking it and being completely useless.

One thing to learn from CR.

Everyone second guesses the global stats because there are lots of games from crappy players.

Pick some decent heuristic metric for "good player", (say, in top 200 at time of game), and in addition to the global stats, provide stats just from that subpool of players.

Yes, that had occurred to me.  I will do one of two things - either I'll retroactively calculate the TrueSkill ratings myself or I'll use the current Iso leaderboard.  The first is a bit more work, but would allow filtering based on a player's rating at game time, which should be more accurate than current rating.  Either way, it'll be something that will be a parameter that can be adjusted to preference.
And I guess for some (all?) of these stats, it shouldn't count if only one player is "good".
Title: Re: working on a new stats database - what questions do you want answered
Post by: SCSN on September 02, 2014, 09:04:26 pm
It would be really nice if you could associate with each log the rating of both players before the game so that you can calculate the predicted win % (the isotropish code is on AI's github I think).

Once you have that you can calculate the "skill factor" of each card: for each game containing card A you assign the value "predicted win% of player who won" to that game. Then just take the average of that number over all kingdoms containing A. For a high-skill card like King's Court or Golem this number will be a lot higher than for a card that levels the playing field like Familiar or Cultist. I think this method is better than simply counting the number of upsets as it takes into account the magnitude of an upset.

Ideally I'd like a way to dynamically choose what data to include for those numbers, e.g. only games played between players who both have at least a specifiable rating.
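
A sketch of the "skill factor" calculation described above. It assumes the predicted win probability has already been attached to each game as a hypothetical `p0_win_prob` field (the first player's predicted chance of winning), with `winner` being 0 or 1, or `None` for ties. How that probability is derived from ratings is left out here.

```python
def skill_factor(games, card):
    """Average predicted win probability of the actual winner,
    over all decided games whose kingdom contains `card`."""
    values = []
    for game in games:
        if card not in game["kingdom"] or game["winner"] is None:
            continue
        p0 = game["p0_win_prob"]
        # Probability that had been assigned to whoever ended up winning.
        values.append(p0 if game["winner"] == 0 else 1.0 - p0)
    return sum(values) / len(values) if values else None
```

For example, two Golem games where the winners had been given 0.7 and 0.6 would yield a skill factor of 0.65. A big upset pulls the average down more than a small one, which is the point of weighting by magnitude.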
Title: Re: working on a new stats database - what questions do you want answered
Post by: theblankman on September 02, 2014, 11:14:58 pm
I'm pretty sure Cultist results in high variety, as it adds up to 5 unique cards to the game and usually puts them all in the players' decks, and works well in engines. Rebuild is definitely one that leads to low variety, and Scout and others that basically remove one card slot from the kingdom by taking it and being completely useless.
I suppose it technically does, but anecdotally I think it causes "low variety strategies" where both players mainly want to buy and play lots of Cultists.  In that particular case I'd amend the criterion from just "gained" to "gained by choice, not due to an attack played by an opponent."  Or "gained during the player's turn," which is probably close enough and easier to implement. 
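
The "gained during the player's turn" version might look like this sketch, assuming gains are recorded as hypothetical `(turn_owner, gainer, card)` tuples:

```python
def own_turn_variety(gains, player):
    """Number of distinct cards `player` gained during their own turns."""
    return len({
        card
        for turn_owner, gainer, card in gains
        if gainer == player and turn_owner == player
    })
```

Ruins handed out by an opponent's Cultist are gained on the opponent's turn, so they drop out of the count.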
Title: Re: working on a new stats database - what questions do you want answered
Post by: Beyond Awesome on September 03, 2014, 12:08:36 am
Maybe the impact of the opening cards missing the first shuffle can also be calculated.
Title: Re: working on a new stats database - what questions do you want answered
Post by: Hydrad on September 03, 2014, 12:13:28 am
This feels like it will take a really long time to code all of these statistics in.
Title: Re: working on a new stats database - what questions do you want answered
Post by: Beyond Awesome on September 03, 2014, 12:21:54 pm
This feels like it will take a really long time to code all of these statistics in.

True. Well, whatever is easiest to code first, I guess. I am curious to see just about any stats of DA and Guilds cards.
Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 03, 2014, 01:12:37 pm
Thanks for all the feedback so far. It will be very helpful as development continues. I especially like the idea of using expected win %, as I think that will generate better ideas of card/opening/whatever strength.
 
This feels like it will take a really long time to code all of these statistics in.

The key is to intelligently set up the database from the beginning. Then it's just a matter of figuring out the query necessary to get the info you want. Something like adding player ratings is more work, but just once and then it can be tied to every stat.

don't get too excited as that is a long ways off. Progress is slow so far

As of this morning 96/204 action cards are fully implemented and tested after about 1 month of work. That is not to say it is only half done - new cards get easier and easier to add as new effects get rarer. I'd guess the scraping is 2/3rds done. Then answers can start being generated as the web interface is built and back data filled in.
Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 09, 2014, 07:12:36 pm
Update: 120 action cards and most Kingdom treasures now implemented.  Counterfeit proved to be especially difficult, requiring several hours of effort to get right... While doing Counterfeit I came across the following game, with the craziest turn yet tested: http://gokosalvager.com/static/logprettifier.html?20140721/log.50a4d41ee4b03214bb781b08.1405960866300.txt#1-16

Among other things, it includes a 40 coin play of Diadem.

Title: Re: working on a new stats database - what questions do you want answered
Post by: soulnet on September 10, 2014, 12:53:31 pm
There are probably a lot of people here that can code SQL (like me) and a significant percentage that either do not want to or cannot code a parser, host a web service, code a UI, etc (like me).

So a good way to have a lot of statistics up and going in a short period of time would be:

Build the original database of game summaries with some of the data points and well indexed.
Have some sort of simple social network: allow people to write SQL select queries and save those queries, similar to what Dominiate allows for strategies. Then, each person can just play with the SQL and the favorites get saved and displayed nicely.
Maybe add some prewritten nice queries or predicates to be used in queries, like "consider only games among top 100 players" or something like that.

I would spend some time coding SQL for this, but I do not have time to learn and use the skills necessary for all the rest.

EDIT: If you worry about security, you can consider password-protecting adding custom SQL queries and sharing the password only with specific people you trust from the forums (or from somewhere else).
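
To give an idea of the kind of saved query this could mean, here is a sketch using Python's built-in `sqlite3` module. The two-table schema (`game_players`, `gains`), the sample rows and the function names are all invented for illustration and say nothing about how the real database will be laid out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE game_players (game_id INTEGER, player TEXT, rating REAL, won INTEGER);
CREATE TABLE gains (game_id INTEGER, player TEXT, card TEXT);
"""

# Win rate of players who gained a given card, counting only games
# in which every player is rated at least the given minimum.
QUERY = """
SELECT AVG(gp.won)
FROM game_players gp
WHERE EXISTS (SELECT 1 FROM gains g
              WHERE g.game_id = gp.game_id
                AND g.player = gp.player
                AND g.card = ?)
  AND gp.game_id IN (SELECT game_id FROM game_players
                     GROUP BY game_id HAVING MIN(rating) >= ?)
"""

def build_demo_db():
    """Create an in-memory database with a few made-up games."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO game_players VALUES (?, ?, ?, ?)", [
        (1, "alice", 30, 1), (1, "bob", 28, 0),
        (2, "carol", 10, 1), (2, "dave", 35, 0),
        (3, "alice", 30, 0), (3, "dave", 35, 1),
    ])
    conn.executemany("INSERT INTO gains VALUES (?, ?, ?)", [
        (1, "alice", "Witch"), (2, "dave", "Witch"),
        (3, "alice", "Witch"), (3, "dave", "Witch"),
    ])
    return conn

def win_rate_with_card(conn, card, min_rating):
    return conn.execute(QUERY, (card, min_rating)).fetchone()[0]
```

The "only games among good players" predicate is the `HAVING MIN(rating) >= ?` subquery, which is the sort of prewritten piece that could be shared between queries.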
Title: Re: working on a new stats database - what questions do you want answered
Post by: blueblimp on September 10, 2014, 02:04:57 pm
Or just make available for download some kind of aggregated parsed game logs in a structured format. The two most imposing things for doing analysis on the logs, IMO, are collecting the game logs and parsing them.

On the original topic of the thread, I thought the most interesting and unexplored statistics in councilroom were the conditioned-gain and conditioned-buy statistics, such as "in what proportion of games with both Fool's Gold and Mine in the kingdom does a particular player gain Mine at some point".
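
A sketch of a conditioned-gain statistic like the Fool's Gold/Mine example, assuming each game has a hypothetical `kingdom` list and a `gains_by_player` list holding one set of gained card names per player:

```python
def conditioned_gain_rate(games, gained, required):
    """Among players in games whose kingdom contains all `required` cards,
    the proportion who gained `gained` at some point."""
    hits = total = 0
    needed = set(required)
    for game in games:
        if not needed <= set(game["kingdom"]):
            continue
        for player_gains in game["gains_by_player"]:
            total += 1
            if gained in player_gains:
                hits += 1
    return hits / total if total else None
```

Called as `conditioned_gain_rate(games, "Mine", ["Fool's Gold", "Mine"])`, it gives the proportion asked about above.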
Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 19, 2014, 03:45:12 pm
Update: About 160 action cards are now implemented, minus a small list of special interactions/scenarios I have to track down logs for to see how Goko handles them. I estimate the parser part of the project should be done the first week of October.
Title: Re: working on a new stats database - what questions do you want answered
Post by: Polk5440 on September 20, 2014, 09:11:34 am
So, what is the benefit from doing this from scratch? It's easier? Did you try to talk to the Council Room guys to see where they are at, or whether you could actually get Council Room up and running without having to start over?
Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 20, 2014, 04:49:14 pm
So, what is the benefit from doing this from scratch? It's easier? Did you try to talk to the Council Room guys to see where they are at, or whether you could actually get Council Room up and running without having to start over?

Isotropic and Goko logs are completely different, so there isn't really any option other than to start from scratch as far as a parser goes. I am also enabling different stats by tracking each hand card by card.
Title: Re: working on a new stats database - what questions do you want answered
Post by: Polk5440 on September 24, 2014, 12:35:51 pm
So, what is the benefit from doing this from scratch? It's easier? Did you try to talk to the Council Room guys to see where they are at, or whether you could actually get Council Room up and running without having to start over?

Isotropic and Goko logs are completely different, so there isn't really any option other than to start from scratch as far as a parser goes. I am also enabling different stats by tracking each hand card by card.


I agree if you had to start with where Council Room was for Iso, it's not worth inquiring, but I thought there was significant progress made towards getting Council Room to work with Goko, already. In fact, I thought they were almost done and the project just dropped because of a lack of manpower. Is this not right?

Contact info here: http://forum.dominionstrategy.com/index.php?topic=9198.msg283469#msg283469

or see comment here: http://forum.dominionstrategy.com/index.php?topic=8163.msg371865#msg371865

Anyone know for sure who is working on Council Room??? Or maybe I am delusional???



Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on September 29, 2014, 12:43:52 pm
So, what is the benefit from doing this from scratch? It's easier? Did you try to talk to the Council Room guys to see where they are at, or whether you could actually get Council Room up and running without having to start over?

Isotropic and Goko logs are completely different, so there isn't really any option other than to start from scratch as far as a parser goes. I am also enabling different stats by tracking each hand card by card.


I agree if you had to start with where Council Room was for Iso, it's not worth inquiring, but I thought there was significant progress made towards getting Council Room to work with Goko, already. In fact, I thought they were almost done and the project just dropped because of a lack of manpower. Is this not right?

Contact info here: http://forum.dominionstrategy.com/index.php?topic=9198.msg283469#msg283469

or see comment here: http://forum.dominionstrategy.com/index.php?topic=8163.msg371865#msg371865

Anyone know for sure who is working on Council Room??? Or maybe I am delusional???

To answer the original question, I didn't attempt to use Council Room code because I didn't know a rewrite was attempted. Now that I know about it, it's too late to do me any good since I'm mostly done with my parser. I did however look into it some and it is not really clear to me if anyone actually started on a rewrite anyway - there was a lot of talk about it, but I didn't see any links to a new github or anything like that in the group chat. Regardless it is pretty clear no one has done any work on it in more than a year.

I'm now over 170 action cards complete, about 30 to go; still on track for an early October completion.
Title: Re: working on a new stats database - what questions do you want answered
Post by: ThaddeusB on October 17, 2014, 08:09:36 pm
I was unable to work on this the last two weeks or so (damn paid work interfering), but work has resumed.  About a dozen cards remain to be implemented.
Title: Re: working on a new stats database - what questions do you want answered
Post by: flies on November 03, 2014, 02:30:46 pm
i just wanted to thank you for making the effort!