Topic: Provincial: Cool looking Dominion AI (Read 22847 times)

Polk5440 · « **Reply #25 on:** February 23, 2013, 08:43:11 am »

Quote

A natural question to ask is what is the ratio of complex to simple kingdoms. It turns out that complex kingdoms are definitely the exception and the vast majority of kingdoms are simple, converging directly to a very good strategy in a small number of iterations. Out of a thousand kingdoms, I was only able to find about five with non-dominating behavior.

Even if finding the best strategy is difficult more often, this makes me a little sad.

Polk5440 · « **Reply #26 on:** February 23, 2013, 09:01:57 am »

Also, Matt, if you are fixing bugs, I noticed that Talisman does not actually gain you a copy of a card. (The instance I had was: play Talisman, buy Treasure Map, and I gained only one Treasure Map.)

techmatt · « **Reply #27 on:** February 23, 2013, 05:43:37 pm »

Quote

Shouldn't the hard-coded strategies eventually get outcompeted, and then fall out of the opponent pool?

Except in very weird circumstances, they will (likely) eventually get weeded out as you suspect. I did something like this with big-money variants, but those are fairly simple so don't really have much impact. Using well-trained AIs could definitely help improve training times, although it already trains reasonably fast.

Quote

Do you have a measure of how many distinct kingdom cards dominant strategies tend to use?

Averaged over a few hundred games, it's typically 4-6 cards, although sometimes it's just a simple "if you have exactly 2$, buy a cellar" sort of thing.

Quote

If it tends to be low, you could imagine saving dominant strategies into a database, and then finding historical dominant strategies as initial candidates for a given set. If it's not low, you can imagine optimizing to prune cards out of a strategy so that it uses only 3 distinct and still maximizes its chances against the full strategies. These would then be good initial candidate strategies to any applicable sets.

Current plans are along these lines. At present the initial seed for the 1st generation is just random mutations of big money. Instead, it could be the X-best strategies from the Y-most similar kingdoms in the training set. That way, if we had say, Wharf or Mountebank, the initial pool will be flooded with these cards. Might quash some good/interesting strategies that manage to avoid these cards, but more likely than not it would converge to something similar with fewer generations.
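A rough sketch of that seeding step, in Python. Everything here is an assumption made for illustration: the `database` layout (each kingdom mapped to its strategies, best first), a strategy represented as a plain buy-priority tuple, and kingdom similarity measured as the number of shared cards. None of it is taken from Provincial's actual code.

```python
from typing import Dict, FrozenSet, List, Tuple

Kingdom = FrozenSet[str]
Strategy = Tuple[str, ...]  # hypothetical: cards in buy-priority order

# Cards that are available in every game, so they never need filtering out.
BASIC_CARDS = {"Copper", "Silver", "Gold", "Estate", "Duchy", "Province", "Curse"}


def similarity(a: Kingdom, b: Kingdom) -> int:
    """Assumed similarity measure: number of kingdom cards in common."""
    return len(a & b)


def seed_pool(target: Kingdom,
              database: Dict[Kingdom, List[Strategy]],
              x_best: int,
              y_nearest: int) -> List[Strategy]:
    """Take the x_best strategies from each of the y_nearest most similar kingdoms."""
    nearest = sorted(database, key=lambda k: similarity(target, k), reverse=True)
    pool: List[Strategy] = []
    for kingdom in nearest[:y_nearest]:
        for strategy in database[kingdom][:x_best]:
            # Drop cards that do not exist in the target kingdom.
            usable = tuple(c for c in strategy if c in target or c in BASIC_CARDS)
            if usable and usable not in pool:
                pool.append(usable)
    return pool
```

The resulting pool would then be mutated and evolved as usual; it only replaces the "random mutations of big money" starting point.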

Rabid · « **Reply #28 on:** February 23, 2013, 05:49:23 pm »

Thanks for posting about this techmatt, really interesting AI you have made.
Would you mind sharing its Isotropic user name by any chance?
I would be interested in seeing some of its old games on Council Room.
http://councilroom.com/

techmatt · « **Reply #29 on:** February 23, 2013, 05:52:33 pm »

Quote

Even if finding the best strategy is difficult more often, this makes me a little sad.

I'll admit I was a bit surprised myself. The reason I made the big "tournament matrix" on the right was mostly because I expected it to actually be rare that a single strategy dominated. Naturally, there might be some difference between this AI and a near-optimal player, but I suspect that even in the case of a very good player there is still typically a single dominant strategy that cleverly balances offense vs. defense (and any other compromise that needs to be made).

More concretely, if you look at the tournament matrix in these two kingdoms:
http://graphics.stanford.edu/~mdfisher/Images/Dominion/simpleGameA.png
http://graphics.stanford.edu/~mdfisher/Images/Dominion/simpleGameB.png
They all have the same "strictly dominating" structure; A > B > C > D > E, and not something more "rock-paper-scissors" like A > B > C > A.
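For anyone who wants to check this on their own tournament results, here is a small Python sketch. It assumes a hypothetical `win_rate` matrix in which `win_rate[i][j]` is the fraction of games strategy `i` wins against strategy `j`. It relies on the fact that a round-robin with a clear winner in every pairing forms a strict chain exactly when the strategies beat 0, 1, ..., n-1 opponents respectively; any rock-paper-scissors cycle breaks that pattern.

```python
from typing import List, Optional, Sequence


def dominance_order(win_rate: Sequence[Sequence[float]],
                    names: Sequence[str]) -> Optional[List[str]]:
    """Return the strategies from strongest to weakest if they form a strict
    chain (A > B > C > ...), or None if there is a cycle or an exact tie."""
    n = len(names)
    # For each strategy, count how many opponents it beats more often than not.
    beaten = [
        sum(1 for j in range(n) if j != i and win_rate[i][j] > 0.5)
        for i in range(n)
    ]
    if sorted(beaten) != list(range(n)):
        return None
    order = sorted(range(n), key=lambda i: beaten[i], reverse=True)
    return [names[i] for i in order]


chain = [[0.5, 0.7, 0.9],
         [0.3, 0.5, 0.6],
         [0.1, 0.4, 0.5]]
cycle = [[0.5, 0.6, 0.4],
         [0.4, 0.5, 0.6],
         [0.6, 0.4, 0.5]]
print(dominance_order(chain, ["A", "B", "C"]))  # a strict chain
print(dominance_order(cycle, ["A", "B", "C"]))  # rock-paper-scissors
```

Note that the 0.5 threshold ignores sampling noise; with few games per pairing, a near-even matchup could be misclassified either way.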

techmatt · « **Reply #30 on:** February 23, 2013, 06:22:20 pm »

Quote

Also, Matt, if you are fixing bugs, I noticed that Talisman does not actually gain you a copy of a card. (The instance I had was: play Talisman, buy Treasure Map, and I gained only one Treasure Map.)

I am fixing bugs, thanks for pointing this one out. Sometimes when I change things to accommodate new cards something I've already implemented/tested gets broken and I only find out because the AI refuses to pick the card (which is a pretty wise choice, when talisman does nothing).

Quote

Would you mind sharing its Isotropic user name by any chance?

At the moment I don't have a very good interface to Isotropic; I have to manually copy and paste the entire webpage *at each decision* into a special "Isotropic dominion state machine extractor" program, which parses the page and spits out the AI's decision, which I then have to manually translate back into the Isotropic interface. It's pretty hilariously inefficient, which is why I haven't run many games, and I decided to just use the username of one of my friends; I'll poke him while I decide whether I want to work out a cleaner Isotropic interface (which I suspect won't be worth the effort, both because Isotropic is going down and because an AI isn't as interesting as human opponents).

rrenaud · « **Reply #31 on:** February 23, 2013, 07:06:22 pm »

It doesn't surprise me that most games of Dominion have one dominant strategy. Dominion gets a lot of its replayability from the variable kingdom set selection; I doubt it would be as good a game with a fixed set of 10 kingdom cards (see the 'A Few Acres of Snow' balance debacle for evidence).

Have you considered sending Goko an email and working on integrating your AI into their system?

soulnet · « **Reply #32 on:** February 23, 2013, 07:15:07 pm »

Quote from: techmatt on February 23, 2013, 06:22:20 pm

an AI isn't as interesting as human opponents).

I strongly disagree: having an AI that plays incredibly fast lets you test things that are hard to test with humans. If there were an AI on Iso I would have played it a lot when I don't have a lot of time to spend or want to have the ability to cancel games in the middle without annoying a fellow player. The only reason I play the several games I have on my computer instead of playing on Iso all the time is this flexibility that computers give, as I don't feel a need to consider their feelings (please refrain from making comments with Matrix/Terminator/other sci-fi induced jokes).

EDIT: It is also great when you are new and are trying to learn all the cards at once, which was my case, as I went into Iso having only played Base IRL a few times. Reading and understanding all the cards before a game can be annoying to other players who want a fast casual game.

techmatt · « **Reply #33 on:** February 23, 2013, 10:06:56 pm »

Quote

Have you considered sending Goko an email and working on integrating your AI into their system?

I've thought about it but haven't contacted them yet. Toying around with the Goko code, integration seems like it would be fairly straightforward. Getting wider playability is also nice for any statistical approach (I'm naturally familiar with councilroom's data-mining, but it's still difficult or impossible to fully reconstruct the state at each time-step). However, I was planning on finishing implementing the rest of the cards and switching to non-copyrighted versions of the artwork before I contact them.

Quote

I don't have a lot of time to spend or want to have the ability to cancel games in the middle without annoying a fellow player

Quote

It is also great when you are new and are trying to learn all the cards at once

These are good reasons, although it's unfortunate (but understandable) that Isotropic is going down, as writing an "API interceptor" for that would be simpler than integrating with Goko. Of course Goko already has an AI; the main point would be making playing the AI actually challenging. Provincial isn't as good as it could be yet, but integrating its "pre-game buy analysis" with Goko's "mid-game card play awareness" is likely to be fairly effective. That said, a few of the AI's advantages are a bit brutal (with Wishing Well, for example, it can just pick the most likely card left in its deck, which it can always know precisely).
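To make the Wishing Well point concrete, here is a minimal Python sketch of the "name the most likely card" rule. The function name and the `deck` / `seen` inputs are assumptions for illustration: `deck` is every card the player owns and `seen` is everything known not to be in the draw pile (hand, cards in play, discard pile). It only computes the most probable card; it does not consider whether drawing that card is actually desirable.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def wishing_well_guess(deck: Iterable[str],
                       seen: Iterable[str]) -> Optional[Tuple[str, float]]:
    """Return (card, probability) for the most likely top card of the draw
    pile, or None if the draw pile is empty."""
    # Whatever is owned but not accounted for must still be in the draw pile.
    remaining = Counter(deck) - Counter(seen)
    total = sum(remaining.values())
    if total == 0:
        return None
    card, count = remaining.most_common(1)[0]
    return card, count / total


deck = ["Copper"] * 7 + ["Estate"] * 3 + ["Wishing Well", "Silver"]
seen = ["Wishing Well", "Copper", "Copper", "Estate", "Silver"]
print(wishing_well_guess(deck, seen))  # Copper, with probability 5/7
```

A human could do the same bookkeeping in principle; the AI simply never loses count.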

Toskk · « **Reply #34 on:** February 23, 2013, 10:49:46 pm »

Quote from: techmatt on February 23, 2013, 10:06:56 pm

That said, a few of the AI's advantages are a bit brutal (with Wishing Well, for example, it can just pick the most likely card left in its deck, which it can always know precisely).

Even brutal would be better than Goko's current AI implementation of Wishing Well, though. I've watched that AI play Scout, followed by Wishing Well, and fail to pick the card on top.

Donald X. · « **Reply #35 on:** February 24, 2013, 06:55:00 am »

Quote from: techmatt on February 23, 2013, 10:06:56 pm

These are good reasons, although it's unfortunate (but understandable) that Isotropic is going down, as writing an "API interceptor" for that would be simpler than integrating with Goko. Of course Goko already has an AI; the main point would be making playing the AI actually challenging. Provincial isn't as good as it could be yet, but integrating its "pre-game buy analysis" with Goko's "mid-game card play awareness" is likely to be fairly effective. That said, a few of the AI's advantages are a bit brutal (with Wishing Well, for example, it can just pick the most likely card left in its deck, which it can always know precisely).

Of course sometimes you want to make sure you don't draw the card, and sometimes you only want to draw the card if it's dead, and sometimes you want to gamble on something more helpful than the most common card.

Goko has said that they will let people write bots for them. Dunno when that will happen.

It has always seemed to me, and this seems obvious but why not say it, that the best approach to tactics is to have two algorithms. One is full of ad hoc logic and just makes each decision as best it can, like what Goko has. The other algorithm considers each possible move, and for each move it plays out the rest of the game however many times you can do quickly enough, with each player using the ad hoc logic, and then it picks the move that did the best. This will let you do clever things that would normally be wrong, but won't find a combination of two normally wrong things.

Strategy seems much harder, since it necessarily involves combinations, and you only have so much time between turns. Pick-order seems good for the early game; obv. the endgame has important factors missing there, and sometimes the game can end any moment on a huge turn that runs out piles. You might just want an unrelated endgame algorithm. And then it seems great to re-evaluate your strategy as of turn 4, with where your initial buys ended up being the major factor but obv. endless other things about the specific state contributing.

@Toskk: They know about the Scout/Wishing Well thing; obv. they simply don't track their knowledge of the top cards.

Schneau · « **Reply #36 on:** February 24, 2013, 08:07:10 am »

Quote from: Donald X. on February 24, 2013, 06:55:00 am

It has always seemed to me, and this seems obvious but why not say it, that the best approach to tactics is to have two algorithms. One is full of ad hoc logic and just makes each decision as best it can, like what Goko has. The other algorithm considers each possible move, and for each move it plays out the rest of the game however many times you can do quickly enough, with each player using the ad hoc logic, and then it picks the move that did the best. This will let you do clever things that would normally be wrong, but won't find a combination of two normally wrong things.

You are essentially describing a version of the minimax algorithm adjusted for nondeterministic games. The main problem here is that the branching grows exponentially for each decision, as well as exponentially for each random event (like what cards you draw). So, you can only look so far into the future in a short period of time.

Donald X. · « **Reply #37 on:** February 24, 2013, 08:49:54 am »

Quote from: Schneau on February 24, 2013, 08:07:10 am

Quote from: Donald X. on February 24, 2013, 06:55:00 am
It has always seemed to me, and this seems obvious but why not say it, that the best approach to tactics is to have two algorithms. One is full of ad hoc logic and just makes each decision as best it can, like what Goko has. The other algorithm considers each possible move, and for each move it plays out the rest of the game however many times you can do quickly enough, with each player using the ad hoc logic, and then it picks the move that did the best. This will let you do clever things that would normally be wrong, but won't find a combination of two normally wrong things.

You are essentially describing a version of the minimax algorithm adjusted for nondeterministic games. The main problem here is that the branching grows exponentially for each decision, as well as exponentially for each random event (like what cards you draw). So, you can only look so far into the future in a short period of time.

No, not what I am describing. We consider each possible move, and for each possible move we play out the rest of the game. However! When playing the rest of the game we *do not* consider each possible move - we use the ad hoc logic there to make decisions. It's not recursive.

Schneau · « **Reply #38 on:** February 24, 2013, 09:10:04 am »

Quote from: Donald X. on February 24, 2013, 08:49:54 am

Quote from: Schneau on February 24, 2013, 08:07:10 am
Quote from: Donald X. on February 24, 2013, 06:55:00 am
It has always seemed to me, and this seems obvious but why not say it, that the best approach to tactics is to have two algorithms. One is full of ad hoc logic and just makes each decision as best it can, like what Goko has. The other algorithm considers each possible move, and for each move it plays out the rest of the game however many times you can do quickly enough, with each player using the ad hoc logic, and then it picks the move that did the best. This will let you do clever things that would normally be wrong, but won't find a combination of two normally wrong things.

You are essentially describing a version of the minimax algorithm adjusted for nondeterministic games. The main problem here is that the branching grows exponentially for each decision, as well as exponentially for each random event (like what cards you draw). So, you can only look so far into the future in a short period of time.
No, not what I am describing. We consider each possible move, and for each possible move we play out the rest of the game. However! When playing the rest of the game we *do not* consider each possible move - we use the ad hoc logic there to make decisions. It's not recursive.

Ah, I see. The thing is, this doesn't gain you much (or anything) over just using the ad hoc logic, since the future-looking part of it assumes that the ad hoc logic will be used, but when that point in the actual game happens, the look-forward search will happen. I could see it gaining a little, but using the ad hoc logic for the forward-looking search makes the process highly dependent on the quality of the ad hoc logic - and if your ad hoc logic is that good, you might as well use that instead!

Donald X. · « **Reply #39 on:** February 24, 2013, 09:43:38 am »

Quote from: Schneau on February 24, 2013, 09:10:04 am

Ah, I see. The thing is, this doesn't gain you much (or anything) over just using the ad hoc logic, since the future-looking part of it assumes that the ad hoc logic will be used, but when that point in the actual game happens, the look-forward search will happen. I could see it gaining a little, but using the ad hoc logic for the forward-looking search makes the process highly dependent on the quality of the ad hoc logic - and if your ad hoc logic is that good, you might as well use that instead!

I don't agree with your argument at all. This could turn into one of those things I regret pouring time into, so so much for that.

Schneau · « **Reply #40 on:** February 24, 2013, 11:21:13 am »

Quote from: Donald X. on February 24, 2013, 09:43:38 am

Quote from: Schneau on February 24, 2013, 09:10:04 am
Ah, I see. The thing is, this doesn't gain you much (or anything) over just using the ad hoc logic, since the future-looking part of it assumes that the ad hoc logic will be used, but when that point in the actual game happens, the look-forward search will happen. I could see it gaining a little, but using the ad hoc logic for the forward-looking search makes the process highly dependent on the quality of the ad hoc logic - and if your ad hoc logic is that good, you might as well use that instead!
I don't agree with your argument at all. This could turn into one of those things I regret pouring time into, so so much for that.

That's fine - I could very well be wrong! I do research in artificial intelligence, but do not have much background in game AI.

GwinnR · « **Reply #41 on:** February 24, 2013, 12:07:24 pm »

I've played with this a few times now and I want to run a simulation tournament, but I don't get how it works.
I go to "game options" and write the required cards down, but when I want to "run tournament" it doesn't take the cards I wrote.
Can someone tell me what I'm doing wrong?

techmatt · « **Reply #42 on:** February 24, 2013, 01:19:48 pm »

Quote

No, not what I am describing. We consider each possible move, and for each possible move we play out the rest of the game. However! When playing the rest of the game we *do not* consider each possible move - we use the ad hoc logic there to make decisions. It's not recursive.

Donald's argument is fine and is a pretty good algorithm for Dominion. The more general version of this strategy is called Monte-Carlo tree search, and is what's used for games with really large branching factors like Go. The main advantage in the case of Dominion is that you can avoid the need to define a "state value function" that numerically decides how good each of the many trade-offs in the game is (what is the relative value of getting the "discard down to 3" Militia effect vs. depleting 3 victory cards from the top of your deck for a better next turn?). Since Dominion games aren't that long, Donald is just suggesting running simulations that go to the end of the game directly, after making a single decision, and choosing the option with the best expected value.

If you look at the top-right figure of http://sander.landofsand.com/publications/Monte-Carlo_Tree_Search_-_A_New_Framework_for_Game_AI.pdf, this algorithm is just skipping the expansion phase: Select -> Simulate -> Backpropagate, and repeat until you feel confident enough about your choice.
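In code, the one-ply version looks roughly like the Python sketch below. The callback names (`legal_moves`, `apply_move`, `rollout`) are hypothetical placeholders, not anything from Provincial or Goko; `rollout` stands in for "play out the rest of the game with every player using the ad hoc logic" and returns a score such as 1.0 for a win. The toy game at the bottom is there only so the sketch runs and has nothing to do with Dominion.

```python
import random
from typing import Callable, List, Optional, Sequence


def choose_move(state: List[str],
                legal_moves: Callable[[List[str]], Sequence[str]],
                apply_move: Callable[[List[str], str], List[str]],
                rollout: Callable[[List[str], random.Random], float],
                n_playouts: int = 200,
                rng: Optional[random.Random] = None) -> Optional[str]:
    """Try each move once, play the rest out heuristically, keep the best average."""
    if rng is None:
        rng = random.Random()
    best_move, best_score = None, float("-inf")
    for move in legal_moves(state):
        # Only this first decision is searched; the rollout is not recursive.
        total = sum(rollout(apply_move(state, move), rng) for _ in range(n_playouts))
        average = total / n_playouts
        if average > best_score:
            best_move, best_score = move, average
    return best_move


# Toy stand-in game: "safe" always scores 1, "risky" scores 3 one time in five.
def toy_moves(state: List[str]) -> Sequence[str]:
    return ["safe", "risky"]


def toy_apply(state: List[str], move: str) -> List[str]:
    return state + [move]


def toy_rollout(state: List[str], rng: random.Random) -> float:
    if state[-1] == "safe":
        return 1.0
    return 3.0 if rng.random() < 0.2 else 0.0


print(choose_move([], toy_moves, toy_apply, toy_rollout, 500, random.Random(0)))
```

The quality of the result depends on how closely the rollout policy resembles real play, and on running enough playouts per move for the averages to separate.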

A simple extension to the algorithm would be to consider branching out slightly deeper (probably just to depth 2 or 3), and having certain decisions where the heuristic is just called directly (example: generally when playing treasure cards, all this simulation work is just going to be wasted). The real complication is when deciding what to buy -- this could work, but it's equally likely to result in some really weird (and probably bad) purchases; there isn't a clear "heuristic choice" for buys. For that (I'd argue) you need to use a much more sophisticated engine (which is the part I was focusing on, for this reason).

Quote

You might just want an unrelated endgame algorithm

You probably would, but even this can do reasonably well. For example, to answer the end-game question of "I have 5 coins; should I buy a Lab or a Duchy?", this algorithm will give you the ad hoc expectation of which is more likely to win you the game. I'd bet this is pretty well correlated with the "correct" answer. Naturally, for things like buying out a 3rd pile to force an early end, you'd need something more specialized.

techmatt · « **Reply #43 on:** February 24, 2013, 01:21:43 pm »

Quote

I go to "game options" and write the required cards down, but when I want to "run tournament" it doesn't take the cards I wrote.
Can someone tell me what I'm doing wrong?

Click on "New Kingdom Cards" after you enter the required cards. This will update the current game (you should see it reflected back in the "Main" window), then click Run Tournament.

GwinnR · « **Reply #44 on:** February 24, 2013, 01:24:08 pm »

Quote from: techmatt on February 24, 2013, 01:21:43 pm

Quote
I go to "game options" and write the required cards down, but when I want to "run tournament" it doesn't take the cards I wrote.
Can someone tell me what I'm doing wrong?

Click on "New Kingdom Cards" after you enter the required cards. This will update the current game (you should see it reflected back in the "Main" window), then click Run Tournament.

Thanks! Works now! Very cool job you've done there!

Edit: But not all of the required cards are used, are they?

Donald X. · « **Reply #45 on:** February 24, 2013, 01:38:11 pm »

Quote from: techmatt on February 24, 2013, 01:19:48 pm

The real complication is when deciding what to buy -- this could work, but it's equally likely to result in some really weird (and probably bad) purchases; there isn't a clear "heuristic choice" for buys. For that (I'd argue) you need to use a much more sophisticated engine (which is the part I was focusing on, for this reason).

Yes, I would do the thing I said for short-term "tactics," more specifically decisions on how to play out turns, other than what to buy/gain; but it doesn't work for long-term "strategy," more specifically buys/gains, again because for strategy you need to consider combinations of decisions, which this doesn't.

techmatt · « **Reply #46 on:** February 24, 2013, 01:50:14 pm »

Quote

Edit: But not all of the required cards are used, are they?

If you specify fewer than 10 cards, it will just pick the remaining ones randomly. It doesn't currently have every card in every published expansion, though.
You could also go into data/parameters.txt and set useCustomCards to true, if you want more variety. You can browse the custom cards in data/custom, although they're disabled by default.

GwinnR · « **Reply #47 on:** February 24, 2013, 04:12:35 pm »

Quote from: techmatt on February 24, 2013, 01:50:14 pm

Although it doesn't currently have every card in every published expansion.

OK, that was what I meant. I was a little astonished, because I wanted to try a set and it only ever picked up three of its cards.

Tables · « **Reply #48 on:** February 25, 2013, 06:36:36 pm »

If this were implemented into Goko, could the algorithm be sped up by say starting off using Lord Bottington's rules, then running the learning stuff?

techmatt · « **Reply #49 on:** February 25, 2013, 10:22:05 pm »

Quote

If this were implemented into Goko, could the algorithm be sped up by say starting off using Lord Bottington's rules, then running the learning stuff?

Yes, although the simpler solution is just to "pre-train" it on 10,000+ randomly sampled kingdoms instead. Then training would only take a few seconds, as you run the best AIs of nearby kingdoms against each other to decide which are the best. In this case, you wouldn't really need to run the learning phase.
