What RL algorithm should I try for a multi-agent card game?

Hello everyone,
I’m currently learning reinforcement learning (RL) and artificial intelligence (AI), and I’ve experimented with DQN and Double DQN in PyTorch.

I’m interested in implementing a multi-agent card game and would appreciate any advice on the best approach. Given that it’s an imperfect information game, I suspect simpler methods may not be effective.

I’ve developed a proof of concept with four Double DQN agents playing against each other. Each model is assigned to one player and makes decisions based on the state, which consists of the counts of each card rank on the table (13 values), the counts of each card rank in its hand (13 values), and additional game information.
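For concreteness, here's a minimal sketch of how such a state vector might be assembled. The function name, the rank indexing, and the extra features are placeholders I made up, not the actual implementation:

```python
import torch

NUM_RANKS = 13  # one count per card rank

def encode_state(table_counts, hand_counts, extra_info):
    """Concatenate per-rank counts on the table, per-rank counts in hand,
    and any extra scalar game features into one flat observation tensor.

    table_counts, hand_counts: length-13 lists of integer counts
    extra_info: list of additional scalar features (hypothetical)
    """
    assert len(table_counts) == NUM_RANKS and len(hand_counts) == NUM_RANKS
    return torch.tensor(table_counts + hand_counts + extra_info,
                        dtype=torch.float32)

# Example: empty table, a hand with three cards across two ranks,
# and one extra feature (e.g. number of players still in the round)
state = encode_state([0] * 13, [2, 0, 1] + [0] * 10, [4.0])
```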

Unfortunately, the agents haven’t learned much so far, so I’m exploring other options. You can check out the game here: Presidents.

Additionally, I’ve encountered a problem where the model often selects invalid actions, either because it doesn’t hold enough cards of the chosen rank or because the play doesn’t match what’s on the table.

Have you considered using policy gradient methods like PPO? From what I’ve heard, they are among the most reliable approaches.

Why use a separate agent for each player? When all agents/players are playing the same game with identical observation and action spaces and the same objective, they are effectively interchangeable. Self-play with a single shared policy is likely more sample-efficient than training N separate agents.
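To make the self-play idea concrete, here's a rough sketch where one shared policy acts for every seat. `env`, `policy`, and the `reset`/`step` signatures are hypothetical placeholders for whatever your Presidents implementation exposes:

```python
import torch

def collect_selfplay_episode(env, policy, num_players=4):
    """Roll out one game in which the same network acts for all players."""
    trajectories = {p: [] for p in range(num_players)}  # per-seat experience
    obs, current_player = env.reset()
    done = False
    while not done:
        with torch.no_grad():
            action = policy.act(obs)  # same weights, whichever seat is acting
        trajectories[current_player].append((obs, action))
        obs, current_player, done = env.step(action)
    return trajectories
```

Because every seat's experience updates the same weights, the agent is always training against an up-to-date copy of itself.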

Regarding algorithms, the choice may not be critical, but PPO is generally a reliable option.
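For reference, the heart of PPO is just the clipped surrogate objective, which fits in a few lines; a minimal sketch (the clip value is the commonly used default, not something specific to this game):

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized.

    ratio = pi_new(a|s) / pi_old(a|s); clipping keeps each update from
    moving the policy too far from the one that collected the data.
    """
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.mean(torch.min(ratio * advantages, clipped * advantages))
```

Libraries like Stable-Baselines3 or RLlib also ship well-tested PPO implementations if you'd rather not roll your own.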

I initially opted for multiple agents to introduce more diversity into the other players’ actions, hoping the policy would generalize to the game itself rather than to a specific opponent. However, this may be unnecessary when using PPO with self-play.

If your agents are having trouble finding valid actions, consider masking invalid actions: it simplifies the underlying MDP and improves learning efficiency. Masking avoids wasting computation on moves that can never be played, and since the game rules are known, it isn’t cheating.
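A common way to implement this in PyTorch is to set the logits of invalid actions to -inf before building the action distribution, so illegal moves get zero probability. A sketch, where `valid_mask` is assumed to come from your game's rules:

```python
import torch

def masked_action(logits, valid_mask):
    """Sample an action with invalid choices masked out.

    logits: raw policy outputs, shape (num_actions,)
    valid_mask: boolean tensor, True where the action is legal
    """
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    dist = torch.distributions.Categorical(logits=masked_logits)
    return dist.sample()
```

The same mask has to be applied when computing log-probabilities during the policy update, so acting and training stay consistent.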

If the game is similar to poker or other betting games, you might consider DeepCFR or other CFR (Counterfactual Regret Minimization) derivatives. Plain CFR can be very memory- and compute-intensive because it relies on traversing the full game tree. Google DeepMind’s recent “Student of Games” also combines deep neural networks with CFR, but uses partial (depth-limited) tree traversal to address some of these costs.
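For intuition, the core of every CFR variant is the regret-matching update, which is only a few lines; a sketch (the game-tree traversal and counterfactual value bookkeeping are the expensive parts omitted here):

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Pick actions in proportion to their positive cumulative regret.

    cumulative_regrets: array of per-action regrets accumulated over
    iterations; falls back to uniform when nothing has positive regret.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))
```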