Topic: Idea for an AI - state evaluation (Read 3321 times)

gman314 · « **on:** August 15, 2013, 12:10:23 pm »

A big problem for any attempted Dominion AI is that there are a lot of possible options on each turn, and it's hard to tell what's the best one. This is particularly true of Buy options; generally it's fairly easy to play a given hand in the best way. Since playing is relatively easy, the rest of this discussion will assume that the AI has good play rules and will focus on buy phase decisions.

My idea for an AI is one which imagines putting one card into its deck and determines what its likelihood of winning is. It does so for each possible card it could buy (you can probably cut some cards; for instance, why consider Copper when you can get Silver?) and then buys the card that gives it the highest likelihood of winning. Now, how does it evaluate a state and all the hypothetical states? To evaluate any game state, it uses some sort of (basic) strategy to play out the rest of the game and determines its own probability of winning. For instance, the play-out strategy could just be to follow BMU buy rules for the rest of the game from whatever deck it now has. It also assumes that the opponent uses the same play-out strategy from its current deck.
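To make the idea concrete, here is a minimal sketch of the evaluate-by-play-out loop. Everything in it is an assumption for illustration: the card table is cut down to basic treasures and victory cards, `big_money_buy` is a crude stand-in for real BMU rules, each side simply reshuffles all of its cards before the play-out, and the function names are made up.

```python
import random

# Hypothetical, stripped-down card table: name -> (cost, coins, victory points).
CARDS = {
    "Copper": (0, 1, 0), "Silver": (3, 2, 0), "Gold": (6, 3, 0),
    "Estate": (2, 0, 1), "Duchy": (5, 0, 3), "Province": (8, 0, 6),
}

def big_money_buy(coins):
    """Stand-in play-out rule: Province at 8+, Gold at 6+, Silver at 3+."""
    if coins >= 8:
        return "Province"
    if coins >= 6:
        return "Gold"
    return "Silver" if coins >= 3 else None

def draw_hand(player, rng, size=5):
    """Draw up to `size` cards, reshuffling the discard pile when needed."""
    hand = []
    while len(hand) < size:
        if not player["deck"]:
            if not player["discard"]:
                break
            player["deck"], player["discard"] = player["discard"], []
            rng.shuffle(player["deck"])
        hand.append(player["deck"].pop())
    return hand

def play_out(my_cards, opp_cards, provinces, rng, max_turns=80):
    """Finish one game with both sides on the play-out rule; True if we win."""
    # Simplification: each side reshuffles all of its cards before continuing.
    players = [{"deck": [], "discard": list(c)} for c in (my_cards, opp_cards)]
    turn = 1  # we have just bought, so the opponent (index 1) moves next
    while provinces > 0 and turn <= max_turns:
        player = players[turn % 2]
        hand = draw_hand(player, rng)
        buy = big_money_buy(sum(CARDS[c][1] for c in hand))
        player["discard"].extend(hand)
        if buy:
            player["discard"].append(buy)
            if buy == "Province":
                provinces -= 1
        turn += 1
    scores = [sum(CARDS[c][2] for c in p["deck"] + p["discard"]) for p in players]
    return scores[0] > scores[1]  # ties count as not winning

def win_probability(my_cards, opp_cards, provinces, n_sims=200, seed=0):
    """Fraction of play-outs won from the given (hypothetical) state."""
    rng = random.Random(seed)
    wins = sum(play_out(my_cards, opp_cards, provinces, rng) for _ in range(n_sims))
    return wins / n_sims

def choose_buy(my_cards, opp_cards, coins, provinces, n_sims=200):
    """Imagine gaining each affordable card (or nothing); keep the best."""
    def value(card):
        cards = my_cards + ([card] if card else [])
        left = provinces - (1 if card == "Province" else 0)
        return win_probability(cards, opp_cards, left, n_sims)
    options = [None] + [c for c, (cost, _, _) in CARDS.items() if cost <= coins]
    return max(options, key=value)
```

Calling `choose_buy(deck, opp_deck, coins=8, provinces=4)` with two lists of card names returns the card (or `None`) with the best estimated win rate. A real version would need the full supply, proper end conditions, and the actual deck/discard split rather than a full reshuffle.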

This approach will obviously learn a strategy on any board, as its approach is independent of the cards available. BMU is an obvious play-out strategy, because it's a baseline available on any board, although other strategies are possible, including more complex options. In the case of difficult play decisions, this approach can be used as well, by imagining playing out the hand several ways with a random deck and determining the one which is the best according to some evaluation (possibly most $, but also possibly the play which yields the best possible deck state by some evaluation - this is important for TfB as you may want to Remodel a Gold or something like that). In the case of a turn with multiple buys, it can just use the buy determination model several times, with the available $ decreasing with each iteration. Here it is important to allow the bot to buy nothing, as it could otherwise end up getting a bunch of Copper by deciding that buying Copper was better than buying Curse.
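The multi-buy loop with a "buy nothing" option could look roughly like the sketch below. `evaluate` is a hypothetical callback that returns the estimated win probability after gaining a given tuple of cards this turn; how it is computed (play-outs or otherwise) is left open, and the function name `buy_phase` is made up.

```python
def buy_phase(coins, buys, costs, evaluate):
    """Greedy multi-buy: pick one card per pass, shrinking the budget each time.

    costs: dict mapping card name -> cost.
    evaluate: hypothetical callback, tuple of gained cards -> win probability.
    """
    gained = ()
    for _ in range(buys):
        # Baseline is "buy nothing more"; a card must strictly beat it.
        best, best_value = None, evaluate(gained)
        for card, cost in costs.items():
            if cost <= coins:
                value = evaluate(gained + (card,))
                if value > best_value:
                    best, best_value = card, value
        if best is None:
            break  # nothing affordable improves on passing
        gained += (best,)
        coins -= costs[best]
    return list(gained)
```

With a toy evaluator in which Copper and Curse both lower the estimate, the second buy is simply skipped instead of being spent on the "less bad" Copper:

```python
delta = {"Curse": -0.05, "Copper": -0.01, "Silver": 0.03}  # made-up numbers
toy = lambda gained: 0.5 + sum(delta[c] for c in gained)
buy_phase(3, 2, {"Curse": 0, "Copper": 0, "Silver": 3}, toy)  # ['Silver']
```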

A major downside of this approach is that it will likely skip some good strategies. For instance, since Ironworks isn't good for BMU, it would likely ignore something like Ironworks-Gardens if BMU were its play-out strategy. Also, I would be very interested in seeing what kinds of decks it builds. It may skip Villages entirely because they don't help BMU, but it may also take an approach of getting some terminals like Smithy and Militia and then discovering that maybe it should pick up a Village and then get into some sort of engine. It might also completely underrate +Buys for similar reasons.

A nice upside of this approach is that it will likely automatically play PPR, as this is just part of playing out the state to consider the effect of buying a Province when there are two left.

ftl · « **Reply #1 on:** August 15, 2013, 12:52:41 pm »

Quote from: gman314 on August 15, 2013, 12:10:23 pm

A big problem for any attempted Dominion AI is that there are a lot of possible options on each turn, and it's hard to tell what's the best one. This is particularly true of Buy options; generally it's fairly easy to play a given hand in the best way. Since playing is relatively easy, the rest of this discussion will assume that the AI has good play rules and will focus on buy phase decisions.

A big part of what makes Dominion hard is that these things which seem separate are actually intertwined.

For example, play decisions have to be intertwined with buy decisions. When you have one action left and two terminals in hand, which one you play depends heavily on, among other things, what you want to buy - if you have a Saboteur and a terminal silver in hand, whether you play the sab could easily depend on whether you have enough money to afford whatever you want to buy anyway. (Or, Militia vs Count, or Noble Brigand vs a terminal silver, and so on.)

It's not just the fact that there are so many decisions. It's that decisions you think are independent at first glance really aren't.

The 'evaluation function' for how good your deck is depends on your long-term strategy. If you're going for Vineyards, then a deck full of Pearl Divers, Oases, Hamlets, and a few potions is pretty darn good. But if you're using an evaluation function to determine your long-term strategy, then that dependence is circular - if you put in a BMU-like evaluation function, you'll come out with a BMU-like strategy.

blueblimp · « **Reply #2 on:** August 15, 2013, 04:50:32 pm »

This ought to be viable and is mainly a way to improve the tactical & medium-term play of existing bots, not to invent a strategy from whole cloth (except for simple BM+X).

One problem is that the play-out rules need to be pretty good for your opponent too, which you don't control, or else the results will be complete garbage. Still, this could do okay on kingdoms where both players are going for Provinces, or if there are multiple play-out rules available and the simulator tries to pick the most appropriate one.

Another tricky thing about this approach, in general, is that it's hard to avoid having the bot cheat by making decisions based on unknown information. The simplest form is the order of the decks, but that's simple enough to resolve by re-shuffling before each simulation (but be careful to leave known top & bottom cards where they are). A nastier example: imagine a game with Menagerie, Militia, and Moat. Now imagine you played a Militia and your opponent didn't reveal a Moat. Also imagine that your buying decision depends on whether your opponent has a Moat in hand. You don't know whether they have a Moat in hand, because they could have chosen not to reveal it, but the fact that they didn't reveal one does (in a conditional probability sense) make it less likely that they had one in their hand in the first place.
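The conditional-probability point can be written out with Bayes' rule. This is only a sketch: `prior_moat` (the chance a Moat was in hand before the attack) and `reveal_rate` (how often the opponent reveals a Moat when holding one) are both assumed inputs that the bot would have to estimate somehow.

```python
def prob_moat_given_no_reveal(prior_moat, reveal_rate):
    """Posterior chance the opponent holds a Moat after not revealing one.

    prior_moat: assumed probability a Moat was in hand before the attack.
    reveal_rate: assumed probability of revealing a Moat when holding one.
    """
    hidden = prior_moat * (1.0 - reveal_rate)  # had a Moat but kept it hidden
    absent = 1.0 - prior_moat                  # had no Moat to reveal
    if hidden + absent == 0.0:
        raise ValueError("no reveal is impossible under these assumptions")
    return hidden / (hidden + absent)
```

If the opponent never reveals, the posterior equals the prior; if they always reveal, it drops to zero; anything in between lands below the prior, which is the "less likely" effect described above.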

That problem can be worked around by making sure the simulation is set up using only information that's known to the bot. Sometimes that will mean a reduction in bot strength, but that's better than unintentional cheating. Also, it'll be necessary to take that approach anyway if the bot is ever meant to play online.
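For the deck-order case, re-shuffling only the unknown part of a draw pile before each simulation might look like this. The function name and the convention that index 0 is the top of the pile are assumptions for the sketch.

```python
import random

def determinize(draw_pile, known_top=0, known_bottom=0, rng=random):
    """Return a copy of the pile with only the unknown middle reshuffled.

    Assumed convention: index 0 is the top card of the pile.
    known_top / known_bottom: how many cards at each end the bot has seen.
    """
    n = len(draw_pile)
    if known_top + known_bottom > n:
        raise ValueError("more known cards than cards in the pile")
    top = draw_pile[:known_top]
    middle = draw_pile[known_top:n - known_bottom]
    bottom = draw_pile[n - known_bottom:]
    rng.shuffle(middle)  # slicing made a new list, so the input is untouched
    return top + middle + bottom
```

Each simulation would start from a fresh `determinize(...)` result, so the bot never benefits from the true hidden order while still using the cards it legitimately knows about.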

One final note is that sometimes a single buy just won't make such a big difference that it's easy to notice in a small number of simulations. This might not be a problem, since if that's the case then maybe it wasn't too important a decision anyway, and also a well-optimized simulator should be able to run through sims of BM+X pretty quickly.
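How many simulations count as "small" can be estimated with the usual normal approximation for a win rate. This is a back-of-the-envelope sketch, not a rigorous test, and both function names are made up.

```python
import math

def win_rate_margin(p, n_sims, z=1.96):
    """Approximate 95% margin of error on a win rate measured over n_sims games."""
    return z * math.sqrt(p * (1.0 - p) / n_sims)

def sims_to_detect(delta, p=0.5, z=1.96):
    """Rough simulations per option needed to resolve a win-rate gap of `delta`.

    Treats the two estimates as independent, so the variance of their
    difference is about 2 * p * (1 - p) / n.
    """
    return math.ceil(2.0 * p * (1.0 - p) * (z / delta) ** 2)
```

Under these assumptions, 100 play-outs give a margin of about ±10 percentage points, and telling apart two buys that differ by 2 points of win rate takes on the order of 4,800 play-outs each.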

I say go for it, although the fact that BM+X is so weak in the current state of the game makes it a lot less interesting to study than it used to be.

DG · « **Reply #3 on:** August 15, 2013, 07:30:33 pm »

Quote

To evaluate any game state, it uses some sort of (basic) strategy to play out the rest of the game and determines its own probability of winning.

There is a problem right here in the simplest part of the solution. There is no obvious milestone for predicting the end of the game. A strategy that gives six Provinces by turn 16 may sound good, but if they come three in turn 15 and three in turn 16 then it may all be too late. On the other hand, a strategy that can buy one VP card a turn might always lose to a strategy that can close out with an extra buy. This complexity is present even without considering three-pile endings and alternate VP.

If you construct a play strategy for the opponent and then simulate two decks playing against each other (to resolve the winning criteria) you have more than doubled the complexity of the problem (of strategy creation) and probably lowered the accuracy of the results. Following the Provincial AI method from the start is probably more practical.

rrwoods · « **Reply #4 on:** August 15, 2013, 07:39:01 pm »

Biggest problem I can see with this is: each time you attempt to compute your current win percentage, you need to play out the game (basically) including all future hands and buys (and gains). Are you going to test every possible deck ordering on every shuffle?

ftl · « **Reply #5 on:** August 15, 2013, 08:14:06 pm »

The problem isn't the number of deck shuffles to test; you can simulate high enough numbers of those fast enough, as Geronimoo's simulator has shown. The problem is that this approach is fundamentally recursive. You need to simulate future decisions, and if you're making decisions via simulation, well... that's bad. If you ever try to do a simulation-within-a-simulation, that's entirely unmanageable.
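A rough count shows how fast the nested version blows up. The model and all default numbers here are illustrative assumptions: one decision compares `options` cards with `sims` play-outs each, and every play-out contains `decisions` future buys that are themselves made one nesting level down.

```python
def playouts_needed(depth, options=10, sims=100, decisions=20):
    """Play-outs required for ONE buy decision at a given nesting depth.

    depth 0: the decision comes from a hard-coded rule, so no play-outs.
    depth d: options * sims play-outs, each containing `decisions` buys
             that are made at depth d - 1.
    """
    if depth == 0:
        return 0
    inner = playouts_needed(depth - 1, options, sims, decisions)
    return options * sims * (1 + decisions * inner)
```

With these made-up defaults, depth 1 (hard-coded play-outs) needs 1,000 games per decision, while depth 2 (a simulation within a simulation) already needs 20,001,000.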

Whereas if you make future decisions by some hard-coded method, that method is going to basically determine your strategy - because that will lead to making "the buy that fits best with the hard-coded simple simulation rules". If you say "make this decision assuming the rest of the game you're playing BMU", then you'll never buy any cards that don't work with BMU or go for any strategy that doesn't transition gracefully into BMU.

pst · « **Reply #6 on:** August 15, 2013, 09:04:20 pm »

Quote from: gman314 on August 15, 2013, 12:10:23 pm

For instance, the play-out strategy could just be to follow BMU buy rules for the rest of the game from whatever deck it now has.

Then when determining whether to buy a Treasure Map you'll see what happens if you never buy a second Treasure Map; when determining whether to buy a Fool's Gold you'll see what happens if you never buy a second Fool's Gold; when determining whether to buy a Potion you'll see what happens if you never use that Potion for anything; etc.

blueblimp · « **Reply #7 on:** August 15, 2013, 10:36:55 pm »

Quote from: pst on August 15, 2013, 09:04:20 pm

Quote from: gman314 on August 15, 2013, 12:10:23 pm
For instance, the play-out strategy could just be to follow BMU buy rules for the rest of the game from whatever deck it now has.

Then when determining whether to buy a Treasure Map you'll see what happens if you never buy a second Treasure Map; when determining whether to buy a Fool's Gold you'll see what happens if you never buy a second Fool's Gold; when determining whether to buy a Potion you'll see what happens if you never use that Potion for anything; etc.

Yes, which is why it needs to be added to an existing bot. This sort of approach is a complement to approaches based around evolving buy rules. Evolved buy rules may not be great at subtle buy ordering tweaks and reacting to your opponent's play, whereas that's exactly what this AI technique is good at.

popsofctown · « **Reply #8 on:** September 13, 2013, 08:44:04 am »

BMU would never be the best playout for the rest of the game. It needs to be more like, "What if I put this Envoy in my deck? I have Envoy tagged as useful for BM-dead draw. What are my win chances with BM-dead draw after I add the Envoy to my deck? What are my win chances with the same Envoy, but with Village-Envoy-> double Province?

What are my win chances if I buy the Spice Merchant instead and use my Peddlerspam directives?"

Then you still have the problem that the bot can't discover new archetypes or identify those unusual situations where you are kinda doing two archetypes at once. But it's a pretty decent paradigm.
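The tagging paradigm could be sketched as scoring every (card, archetype) pair and keeping the best. `estimate_win` is a hypothetical callback (for example, the fraction of play-outs won when following that archetype's rules), and the tag table is whatever the bot author has hand-written.

```python
def best_buy_with_archetypes(candidates, archetypes_for, estimate_win):
    """Score each (card, archetype) pair and return the best pair, or None.

    candidates: cards the bot can afford this turn.
    archetypes_for: dict mapping a card to the play-out plans it is tagged for.
    estimate_win: hypothetical callback (card, archetype) -> win probability.
    """
    pairs = [
        (card, plan)
        for card in candidates
        for plan in archetypes_for.get(card, [])
    ]
    if not pairs:
        return None  # no candidate is tagged for any known archetype
    return max(pairs, key=lambda pair: estimate_win(*pair))
```

For example, with made-up win estimates:

```python
tags = {"Envoy": ["BM-dead draw", "Village-Envoy"], "Spice Merchant": ["Peddlerspam"]}
wins = {("Envoy", "BM-dead draw"): 0.52, ("Envoy", "Village-Envoy"): 0.47,
        ("Spice Merchant", "Peddlerspam"): 0.55}
best_buy_with_archetypes(["Envoy", "Spice Merchant"], tags,
                         lambda card, plan: wins[(card, plan)])
# ('Spice Merchant', 'Peddlerspam')
```

A card with no tags is never considered, which is the same limitation noted above: the bot can only pick among archetypes someone has already described to it.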

Dominion Strategy Forum
