I wonder if AlphaZero can learn Dominion.
It learned chess and Go purely through self-play; the only input was the rules.
The question is whether randomness and hidden information are a limit for AlphaZero.
https://en.m.wikipedia.org/wiki/AlphaZero
I have not read the paper; it would take me a lot of time to understand it...
The common element uniting the three different games that AlphaZero learned (Chess, Go, and Shogi) is that they are all played on square boards where only one piece can occupy a space at a time. Chess is played on an 8×8 board, Go on a 19×19 board, and Shogi on a 9×9 board. From the relevant part of the paper:
The input to the neural network is an N × N × (MT + L) image stack that represents state
using a concatenation of T sets of M planes of size N × N.
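To make that shape concrete, here is a small sketch of the formula in Python. The function name is my own, and the chess values (M = 14, T = 8, L = 7) are the figures as I recall them from the paper, so treat them as assumptions rather than a citation.

```python
def input_stack_shape(n, m, t, l):
    """Return (N, N, M*T + L), the input shape described in the quote."""
    return (n, n, m * t + l)

# Assumed chess values: 8x8 board, M=14 planes per time step,
# T=8 time steps of history, L=7 constant-valued extra planes.
chess_shape = input_stack_shape(n=8, m=14, t=8, l=7)
print(chess_shape)  # (8, 8, 119)
```

The point is that every input dimension is fixed once N, M, T, and L are chosen, which is what makes a grid game such a natural fit.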
It seems like the optimal way to represent a Dominion board is fundamentally not an N × N plane. There are issues to contend with, such as the fact that the order of cards matters in your deck and your In-Play area, but not in your hand, your discard pile, the trash, or your Tavern mat. Additionally, the order of the Kingdom card piles does not matter, but the order of the cards within those piles can matter (Knights and Split Piles; and the order of Split Pile cards can't even be assumed, thanks to Encampment as well as Ambassador shenanigans).
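As a rough illustration of the ordered versus unordered distinction, here is a minimal Python sketch. The four-card vocabulary and both function names are hypothetical, made up for this example; this is not how AlphaZero or any existing Dominion AI encodes state.

```python
from collections import Counter

# Hypothetical card vocabulary, for illustration only.
CARDS = ["Copper", "Estate", "Village", "Smithy"]

def encode_unordered(zone):
    """Count vector: one entry per card name, order within the zone ignored."""
    counts = Counter(zone)
    return [counts[card] for card in CARDS]

def encode_ordered(zone):
    """Sequence of card indices: position within the zone is preserved."""
    return [CARDS.index(card) for card in zone]

cards_a = ["Copper", "Smithy", "Copper"]
cards_b = ["Smithy", "Copper", "Copper"]

# As a hand, both orderings are the same state.
assert encode_unordered(cards_a) == encode_unordered(cards_b)
# As a deck, they are different states, because the draw order differs.
assert encode_ordered(cards_a) != encode_ordered(cards_b)
```

Neither encoding looks like a stack of N × N planes: the count vector has one dimension per card name, and the ordered sequence has a length that changes as the game goes on.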
Also, the Deck and In-Play area have no theoretical size limit. There is no official rule stating that you can't have more than X cards in your deck or in play (and such a rule will probably never exist, because it kinda goes against the spirit of Dominion). AlphaZero needs to be trained on a single board size at a time. The only way to get around that would be to calculate the theoretical biggest deck you could ever construct. Such a setup would probably include Black Market, Rats, Young Witch (because she adds Bane cards), a Looter (for Ruins), Urchin (for Mercenary), Hermit (for Madman), a card that gains Spoils, a Potion cost, etc. The end result would be a number one or two orders of magnitude bigger than the board sizes that AlphaZero has played on, which would make it inconvenient to represent to the AI. Consider for a moment that it will almost never build a deck anywhere near as big as the biggest possible deck, yet we need it to be able to do so just to keep it from breaking should it ever encounter such a thing. The "image stack" of Dominion is not so basic, so movement from one "space" on the Dominion "board" to another can't generally be represented on a grid the way AlphaZero represents moves.
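To show what "sizing for the biggest possible deck" costs in practice, here is a minimal Python sketch of fixed-capacity padding. The capacity of 3000, the 30-card deck, the PAD marker, and the function names are all arbitrary choices of mine for illustration; I have not calculated the real maximum deck size.

```python
PAD = -1  # hypothetical marker for an empty slot

def pad_deck(deck_indices, capacity):
    """Pad an ordered deck (list of card indices) out to a fixed capacity."""
    if len(deck_indices) > capacity:
        raise ValueError("deck exceeds the fixed capacity")
    return deck_indices + [PAD] * (capacity - len(deck_indices))

def padding_fraction(padded):
    """Fraction of slots that hold no card at all."""
    return padded.count(PAD) / len(padded)

typical_deck = list(range(30))                  # 30 placeholder card indices
padded = pad_deck(typical_deck, capacity=3000)  # arbitrary illustrative capacity
print(padding_fraction(padded))                 # 0.99
```

With those made-up numbers, 99% of the deck input is empty slots in an ordinary game, and the encoding still breaks the moment a deck grows past the chosen capacity.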
Now, I don't know how important the grid structure actually is to AlphaZero's success. For all I know, it could be the case that the AlphaZero methodology can handle Dominion if only it's reworked to be able to represent the inputs properly (though personally, I doubt it). But the point is that AlphaZero is not currently equipped to handle such a game, and it would take a deliberate effort on DeepMind's part to retool it significantly.