Topic: Simulation Challenge - Endgame Polish (Read 3809 times)

WanderingWinder · « **on:** January 11, 2012, 08:06:25 pm »

I'm looking for an improvement to the Big Money Ultimate bot. For the record, here's the standing champion BMU I have (made a small improvement earlier today):

Code: [Select]

<player name="Big Money Ultimate"
 author="WanderingWinder"
 description="The optimized strategy that buys only treasure.">
 <type name="UserCreated"/>
 <type name="TwoPlayer"/>
 <type name="SingleCard"/>
 <type name="Province"/>
 <type name="BigMoney"/>
 <type name="Bot"/>
 <type name="Optimized"/>
   <buy name="Province">
      <condition>
         <left type="getTotalMoney"/>
         <operator type="greaterThan" />
         <right type="constant" attribute="18.0"/>
      </condition>
   </buy>
   <buy name="Duchy">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="4.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="2.0"/>
      </condition>
   </buy>
   <buy name="Gold"/>
   <buy name="Duchy">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="6.0"/>
      </condition>
   </buy>
   <buy name="Silver"/>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="3.0"/>
      </condition>
   </buy>
</player>

The way I'm thinking of doing this is in the endgame. I talk about the idea in this post, about halfway down. Basically, what I'm thinking about is this: if there are two provinces left, and the duchies are split 5-3, the estates are basically going to be irrelevant. The leader needs to get one province before the trailer gets two, and the trailer needs to grab both. So you shouldn't really be buying estates.
I'm sure there are similar scenarios with the duchies, where you should keep preferring gold later than you normally would, but off the top of my head, I can't come up with any specifics.
Here's my first stab at doing this for the estates:

Code: [Select]

<player name="Big Money UltimateB"
 author="WanderingWinder"
 description="The optimized strategy that buys only treasure.">
 <type name="UserCreated"/>
 <type name="TwoPlayer"/>
 <type name="SingleCard"/>
 <type name="Province"/>
 <type name="BigMoney"/>
 <type name="Bot"/>
 <type name="Optimized"/>
   <buy name="Province">
      <condition>
         <left type="getTotalMoney"/>
         <operator type="greaterThan" />
         <right type="constant" attribute="18.0"/>
      </condition>
   </buy>
   <buy name="Duchy">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="4.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="equalTo" />
         <right type="constant" attribute="2.0"/>
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="smallerThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="plus" attribute="2.0" />
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="greaterThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="minus" attribute="2.0" />
      </condition>
      <condition>
         <left type="countCardsInSupply" attribute="Duchy"/>
         <operator type="equalTo" />
         <right type="constant" attribute="0.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="equalTo" />
         <right type="constant" attribute="1.0"/>
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="smallerThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="plus" attribute="2.0" />
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="greaterThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="minus" attribute="2.0" />
      </condition>
      <condition>
         <left type="countCardsInSupply" attribute="Duchy"/>
         <operator type="equalTo" />
         <right type="constant" attribute="0.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="equalTo" />
         <right type="constant" attribute="2.0"/>
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="smallerThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="plus" attribute="11.0" />
      </condition>
      <condition>
         <left type="countCardsInSupply" attribute="Duchy"/>
         <operator type="equalTo" />
         <right type="constant" attribute="0.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="equalTo" />
         <right type="constant" attribute="1.0"/>
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="smallerThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="plus" attribute="5.0" />
      </condition>
      <condition>
         <left type="countCardsInSupply" attribute="Duchy"/>
         <operator type="equalTo" />
         <right type="constant" attribute="0.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="equalTo" />
         <right type="constant" attribute="2.0"/>
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="greaterThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="minus" attribute="11.0" />
      </condition>
      <condition>
         <left type="countCardsInSupply" attribute="Duchy"/>
         <operator type="equalTo" />
         <right type="constant" attribute="0.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="equalTo" />
         <right type="constant" attribute="1.0"/>
      </condition>
      <condition>
         <left type="countVP"/>
         <operator type="greaterThan" />
         <right type="countMAXOpponentVP"/>
         <extra_operation type="minus" attribute="5.0" />
      </condition>
      <condition>
         <left type="countCardsInSupply" attribute="Duchy"/>
         <operator type="equalTo" />
         <right type="constant" attribute="0.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="2.0"/>
      </condition>
      <condition>
         <left type="countCardsInSupply" attribute="Duchy"/>
         <operator type="greaterThan" />
         <right type="constant" attribute="0.0"/>
      </condition>
   </buy>
   <buy name="Gold"/>
   <buy name="Duchy">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="6.0"/>
      </condition>
   </buy>
   <buy name="Silver"/>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="3.0"/>
      </condition>
   </buy>
</player>
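
To see which VP differentials actually trigger an Estate purchase, the Estate entries that sit ahead of Gold can be modelled in a few lines of Python. This is only a sketch under assumptions the thread does not confirm: conditions inside one buy entry are ANDed, separate buy entries are alternatives, and extra_operation adjusts the right-hand value. The function name is made up for illustration, and the last Estate entry (after Silver) is left out.

```python
# Hypothetical model of the Estate entries ahead of Gold in "Big Money UltimateB".
# Assumptions (not confirmed by the thread): conditions inside one <buy> are ANDed,
# separate <buy> entries are alternatives, and <extra_operation> adjusts the
# right-hand value of the comparison.

def estate_rule_fires(provinces_left, duchies_left, vp_diff):
    """vp_diff stands for countVP - countMAXOpponentVP."""
    if duchies_left > 0:
        # Last Estate entry before Gold: Provinces <= 2 and Duchies still in supply.
        return provinces_left <= 2
    bands = {2: (2, 11), 1: (2, 5)}  # provinces_left -> (narrow band, wide band)
    if provinces_left not in bands:
        return False
    narrow, wide = bands[provinces_left]
    return (
        -narrow < vp_diff < narrow   # entries using plus 2 and minus 2 together
        or vp_diff < wide            # entries using only "plus"
        or vp_diff > -wide           # entries using only "minus"
    )

# Check every differential from -40 to +40 with the Duchy pile empty.
always = all(estate_rule_fires(p, 0, d) for p in (1, 2) for d in range(-40, 41))
print(always)  # True
```

If that reading of the format is right, the "plus" and "minus" entries between them cover every differential, so these entries would collapse to "Provinces <= 2", the same as the single Estate entry in the original bot. Under a different reading of extra_operation the result changes, so this is worth checking against the simulator itself.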

It beat what I had at the top when I did an ultimate simulation, but only by 28 games out of 100,000.
Anybody want to try to do better?
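
As a rough check on how much a 28-game margin means, here is a minimal sketch that expresses the margin in standard errors, assuming two equally strong bots and independent games. The function name and the tie_rate parameter are illustrative, not part of the simulator.

```python
from math import sqrt

def margin_in_standard_errors(win_margin, games, tie_rate=0.0):
    """How many standard errors a win margin is from zero for two equal bots."""
    # Each game adds +1, -1 or 0 (tie) to the margin. For equal bots the
    # per-game variance is (1 - tie_rate), so the margin's variance is
    # games * (1 - tie_rate).
    standard_error = sqrt(games * (1.0 - tie_rate))
    return win_margin / standard_error

z = margin_in_standard_errors(28, 100_000)
print(round(z, 2))  # 0.09
```

A margin of about 0.09 standard errors is well inside the noise. If the true edge really were 28 games per 100,000, it would take on the order of 50 million games for it to reach two standard errors.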

rod- · « **Reply #1 on:** January 11, 2012, 08:54:44 pm »

Is PPR already hardcoded somewhere into the suicide mechanism? I vaguely recall it was, so I will operate under that assumption... If not, it's clearly a large improvement to be made.

I'd almost think that province buys shouldn't be automatic every time you have 8, but codified based on the number of provinces left.

If 8-4 provinces are left, just buy it.
If 3 left, only buy if vp differential <-18 (this number is a bit sketchy, but when you're behind 4 provinces to 1...you only stand a chance if you get a 5-3 duchy split and they never get a province, so why not try to get that 5-3 split ASAP instead of giving them the extra turns)

(Probably not worth a whole lot, but it has to come up a few times in 10000 games)

If 2 left, buy if vp differential >0 (>=0 if you're 2nd player) or if between -7 and -8.5 (you want to win if you get both of them in 2 turns, but assume your opponent will at least manage a duchy, and obviously you buy it if you're already winning)

Also, endgame sort of depends on your opponent's last several turns, particularly in BM. Maybe there needs to be a GetOpponentTotalMoney function?

In fact, if a GOTM function were implemented, there should also be one for OpponentTotalMoneyPlayedThisShuffle, and OpponentCardsInDeck / OpponentCardsPlayedThisShuffle so you can do something like:
If (GOTM-OTMPTS) / (OCID-OCPTS) <8, Go for it, else take the duchy.
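
As written, the ratio is money per unseen card, which would essentially never reach 8. A minimal sketch of the test, assuming the intended comparison is the expected value of a 5-card hand drawn from the opponent's unseen cards (all function and parameter names here are hypothetical, since none of these functions exist in the simulator):

```python
def expected_next_hand_money(total_money, money_played, cards_in_deck,
                             cards_played, hand_size=5):
    """Expected coin value of the opponent's next hand from the unseen cards."""
    unseen_cards = cards_in_deck - cards_played
    unseen_money = total_money - money_played
    if unseen_cards < hand_size:
        # A reshuffle is due before the next full hand; fall back to the
        # money density of the whole deck (a simplifying assumption).
        return hand_size * total_money / cards_in_deck
    return hand_size * unseen_money / unseen_cards

def go_for_it(total_money, money_played, cards_in_deck, cards_played):
    """The test above, read as: the opponent's expected next hand is below $8."""
    return expected_next_hand_money(
        total_money, money_played, cards_in_deck, cards_played) < 8

# Opponent has 30 cards worth $40 in total; 20 cards worth $30 are already seen,
# leaving $10 in 10 cards, so the expected next hand is $5.
print(go_for_it(40, 30, 30, 20))  # True
```

An expected value is a blunt instrument here, since what matters is the chance of the hand reaching $8, not its average.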

WanderingWinder · « **Reply #2 on:** January 11, 2012, 09:14:04 pm »

Quote from: rod- on January 11, 2012, 08:54:44 pm

Is PPR already hardcoded somewhere into the suicide mechanism? I vaguely recall it was, so I will operate under that assumption... If not, it's clearly a large improvement to be made.

I'd almost think that province buys shouldn't be automatic every time you have 8, but codified based on the number of provinces left.

If 8-4 provinces are left, just buy it.
If 3 left, only buy if vp differential <-18 (this number is a bit sketchy, but when you're behind 4 provinces to 1...you only stand a chance if you get a 5-3 duchy split and they never get a province, so why not try to get that 5-3 split ASAP instead of giving them the extra turns)

(Probably not worth a whole lot, but it has to come up a few times in 10000 games)

If 2 left, buy if vp differential >0 (>=0 if you're 2nd player) or if between -7 and -8.5 (you want to win if you get both of them in 2 turns, but assume your opponent will at least manage a duchy, and obviously you buy it if you're already winning)

Also, endgame sort of depends on your opponent's last several turns, particularly in BM. Maybe there needs to be a GetOpponentTotalMoney function?

In fact, if a GOTM function were implemented, there should also be one for OpponentTotalMoneyPlayedThisShuffle, and OpponentCardsInDeck / OpponentCardsPlayedThisShuffle so you can do something like:
If (GOTM-OTMPTS) / (OCID-OCPTS) <8, Go for it, else take the duchy.

Implementing PPR actually hurts most big money situations, because by the time you're that green, you don't get back to $8 all that often (either of you).
Doing what you say at the end is something you should actually do when playing, but even a few steps more complex than what I'm proposing here.
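
How often a green deck gets back to $8 can be checked for any particular deck by enumerating every possible hand. The sketch below treats the hand as a uniform random draw from the whole deck, which ignores where the deck is in its shuffle, and the deck composition is an assumption picked only for illustration.

```python
from itertools import combinations

def chance_of_hitting(deck_values, target=8, hand_size=5):
    """Exact probability that a random hand is worth at least `target` coins."""
    hits = total = 0
    # combinations() treats each card position as distinct, so every
    # physical hand is counted exactly once.
    for hand in combinations(deck_values, hand_size):
        total += 1
        hits += sum(hand) >= target
    return hits / total

# Hypothetical late-game Big Money deck: 7 Copper, 6 Silver, 4 Gold
# and 10 Victory cards (worth $0 in hand).
green_deck = [1] * 7 + [2] * 6 + [3] * 4 + [0] * 10
print(chance_of_hitting(green_deck))
```

Swapping in the deck lists from actual simulated games would show how quickly this probability falls as the Victory cards pile up.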

rod- · « **Reply #3 on:** January 11, 2012, 09:30:27 pm »

I suppose I can use the old simulator for Big Money. I had resigned myself to the simulator not working for me anymore, because the windows don't work properly in OS X in the recent versions... I'll play around with it for a while.

DG · « **Reply #4 on:** January 11, 2012, 09:38:24 pm »

I'm assuming you're just wanting this to beat other money bots. If the optimisation goes too deep it can lose ground as a baseline deck.

rod- · « **Reply #5 on:** January 11, 2012, 09:52:15 pm »

Humorously, the BMU bot in this old version of the simulator trounces the original BMU bot you have listed above, solely on the back of buying duchies way sooner.

Maybe the bot's been inbred a little too much already.

WanderingWinder · « **Reply #6 on:** January 11, 2012, 10:02:42 pm »

Quote from: rod- on January 11, 2012, 09:52:15 pm

Humorously, the BMU bot in this old version of the simulator trounces the original BMU bot you have listed above, solely on the back of buying duchies way sooner.

Maybe the bot's been inbred a little too much already.

Really? Can you provide that bot?

rod- · « **Reply #7 on:** January 11, 2012, 10:07:58 pm »

Code: [Select]

<player name="BM - Big Money Ultimate">
   <buy name="Province"/>
   <buy name="Duchy">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="5.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="2.0"/>
      </condition>
   </buy>
   <buy name="Gold"/>
   <buy name="Silver"/>
</player>

It's also possible that I just missed the duchy -> gold -> duchy trigger order in the other bot when I was clumsily copying it into this old version. I'm a klutz!

WanderingWinder · « **Reply #8 on:** January 11, 2012, 10:14:57 pm »

Quote from: rod- on January 11, 2012, 10:07:58 pm

Code: [Select]

<player name="BM - Big Money Ultimate">
   <buy name="Province"/>
   <buy name="Duchy">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="5.0"/>
      </condition>
   </buy>
   <buy name="Estate">
      <condition>
         <left type="countCardsInSupply" attribute="Province"/>
         <operator type="smallerOrEqualThan" />
         <right type="constant" attribute="2.0"/>
      </condition>
   </buy>
   <buy name="Gold"/>
   <buy name="Silver"/>
</player>

It's also possible that I just missed the duchy -> gold -> duchy trigger order in the other bot when I was clumsily copying it into this old version. You'd think that it would be able to read XML, but it can't.

My bot beats that 47-44. I'm somewhat disconcerted that the margin is so small...

DG · « **Reply #9 on:** January 12, 2012, 08:05:02 pm »

The margins are small because
- the games are largely decided by a straightforward 5/3 split on provinces (or then duchies)
- cards bought under the changed endgame rules may never be drawn again before the game ends
- the conditions only apply in a very small percentage of games
- even if the conditions are met and the relevant new cards are drawn they may not change the game result

I spent a while looking at a Steward/Tournament simulation, and many interesting continuations emerged based on the prizes claimed. Unfortunately, because 70% of the games were landslides decided on Tournament luck, the simulator couldn't measure the other 30% to any accuracy. Something similar will be happening here.

rod- · « **Reply #10 on:** January 12, 2012, 08:45:29 pm »

Filtering such shuffleluck-based "noise" out from simulations ought to be reasonably straightforward, and probably necessary for full optimization of things.

Add a few "# of times this buy condition is activated" statistics, and then allow filtering based on whether or not a particular condition was activated in a particular game, and you can find out what the "Right" decision is in situations as minor as some of the things we're talking about.

Whether or not the sims should focus on this instead of actually working out how to properly play Dominion is questionable, but it's possible that you could bootstrap up from the above: you could train a sim how to make the "Best" play (say, Steward trashing vs $2 vs +2 cards) in a particular gamestate by simply running a few million mirror games, filtering down to those where that particular decision was made, and seeing what the win percentages are in each case.
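
The filtering idea can be sketched as follows, assuming the simulator could export one record per game with the winner and the set of buy conditions that fired. That record format and every name below are hypothetical; the simulator does not provide them as far as this thread shows.

```python
from collections import defaultdict

def win_rate_by_activation(games, bot, condition):
    """Split one bot's win rate by whether a given buy condition ever fired."""
    tally = defaultdict(lambda: [0, 0])  # fired? -> [wins, games played]
    for game in games:
        fired = condition in game["conditions_fired"]
        tally[fired][1] += 1
        tally[fired][0] += game["winner"] == bot
    return {fired: wins / played for fired, (wins, played) in tally.items()}

# Tiny made-up log, only to show the shape of the data.
games = [
    {"winner": "B", "conditions_fired": {"estate_close_race"}},
    {"winner": "A", "conditions_fired": {"estate_close_race"}},
    {"winner": "B", "conditions_fired": set()},
    {"winner": "B", "conditions_fired": set()},
]
print(win_rate_by_activation(games, "B", "estate_close_race"))
# {True: 0.5, False: 1.0}
```

Games where a late-game condition fires are not a random sample (they tend to be close games), so the useful comparison is between two bots on the same filtered subset, not between the two subsets.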

Zakharov · « **Reply #11 on:** January 21, 2012, 04:48:42 am »

I was thinking the best way to play the endgame is to just run a minimax search over the next N turns and use the result of that to guide your decision. Of course, you can't do that with just XML.

blueblimp · « **Reply #12 on:** January 21, 2012, 05:18:25 pm »

Quote from: Zakharov on January 21, 2012, 04:48:42 am

I was thinking the best way to play the endgame is to just run a minimax search over the next N turns and use the result of that to guide your decision. Of course, you can't do that with just XML.

A straightforward minimax approach will run into a couple problems:

1. Randomness & imperfect information. These are just plain difficult to handle in minimax. My guess is that the best way to deal with this is to pretend that players draw randomly from their decks at the beginning of their turns (rather than shuffling and then drawing from the top during their clean-up phases) and also require that players show all their treasure in hand when playing, even though that's not required by the rules, since that's usually how people play on isotropic.

By doing this, the game becomes perfect information and the randomness is spread out a bit. But you're still going to have huge branching factors from the random draws, and I believe the randomness will prevent use of alpha-beta pruning. Both of these factors limit your search depth a lot.

One nice aspect of big money is that the number of possible states is a bit restricted, so you might be able to get some mileage out of combining duplicate states.

2. What's your evaluation function? It can't be VPs because the shallow search (because of problem #1) might cause the bot to start preparing for greening too early. Any other hand-coded evaluation function is going to introduce human opinions about strategy, and avoiding those was the point of resorting to minimax in the first place. I imagine the right way to go here might be Monte Carlo evaluation (like what's used for Go) or machine learning (like what's used for Backgammon).
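
A toy version of the search described above, stripped down to chance nodes only: the two-Provinces-left race from the first post, where the leader needs one Province and the trailer needs two. It assumes each player hits $8 with a fixed probability every turn (the "draw randomly from the deck" simplification), merges duplicate states with a cache, and uses a placeholder value of 0.5 at the search horizon in place of a real evaluation function. All names and numbers are illustrative.

```python
from functools import lru_cache

def leader_win_probability(p_leader, p_trailer, trailer_needs=2, max_plies=200):
    """Chance the leader buys one Province before the trailer buys `trailer_needs`."""

    @lru_cache(maxsize=None)  # identical states are evaluated only once
    def value(needs, leader_to_move, plies_left):
        if needs == 0:
            return 0.0   # the trailer took every Province it needed
        if plies_left == 0:
            return 0.5   # placeholder evaluation at the search horizon
        if leader_to_move:
            # Chance node: the leader hits $8 and wins, or misses and passes.
            return p_leader + (1 - p_leader) * value(needs, False, plies_left - 1)
        hit = value(needs - 1, True, plies_left - 1)
        miss = value(needs, True, plies_left - 1)
        return p_trailer * hit + (1 - p_trailer) * miss

    return value(trailer_needs, True, max_plies)

# Both players hit $8 half the time and the leader moves first.
print(round(leader_win_probability(0.5, 0.5), 3))  # 0.889
```

A real endgame search would add decision nodes (Province vs Duchy vs Estate vs Gold) on top of these chance nodes, and that is where the branching factor and the choice of evaluation function start to matter.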
