Expansion at the border
As detailed in the previous blog articles, we jointly train the individual agents at the border of the map. As you can see below, we obtain an agent that performs well at small scale: it has learnt to prioritise conquering the highly productive squares (the bright ones).
To assess the confidence of our agent, we can look at the entropy of the learnt policy. For convenience, we implemented an interface that displays the softmax probabilities at time t when you click on an agent. It shows the five probabilities associated with the NORTH, EAST, SOUTH, WEST and STILL moves, and how the agent greedily selects the most likely one.
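For reference, here is a minimal sketch of the computation behind that display, assuming the policy head outputs five raw logits per square (the function and argument names are illustrative, not our exact code):

```python
import numpy as np

MOVES = ["NORTH", "EAST", "SOUTH", "WEST", "STILL"]

def policy_summary(logits):
    """Softmax probabilities, entropy, and greedy move for one square.

    logits : length-5 array of raw policy outputs (hypothetical name).
    """
    z = logits - np.max(logits)            # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()    # softmax over the 5 moves
    entropy = -np.sum(probs * np.log(probs + 1e-12))  # low entropy = confident
    greedy = MOVES[int(np.argmax(probs))]  # the move the agent actually plays
    return probs, entropy, greedy
```

A near-uniform distribution (entropy close to log 5) flags a square where the agent is unsure, which is exactly what the interface lets us spot at a glance.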
The Dijkstra algorithm
The power of Dijkstra
So far, we have dealt with the border squares by learning a policy with a neural network.
The Dijkstra algorithm, which runs here in linear time, gives us the ability to handle the squares in the middle of the map:
Now, only the border squares' behaviour is determined by our trained policy. We adopt a deterministic strategy for the interior of the map.
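As an illustration, here is a minimal sketch of a multi-source Dijkstra that routes interior squares toward the border. It assumes each square has a traversal cost and that the map wraps around (drop the modulo if yours does not); the costs array and the cost model are assumptions for the example, not our exact implementation:

```python
import heapq

def dijkstra_from_border(costs, border):
    """Cheapest path cost from the border to every square of the map.

    costs  : 2D list of per-square traversal costs (illustrative cost model).
    border : iterable of (row, col) source squares on the territory border.
    """
    h, w = len(costs), len(costs[0])
    dist = [[float("inf")] * w for _ in range(h)]
    heap = []
    for r, c in border:                      # all border squares start at 0
        dist[r][c] = 0
        heapq.heappush(heap, (0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue                         # stale heap entry, skip it
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = (r + dr) % h, (c + dc) % w   # assumed toroidal wrap
            nd = d + costs[nr][nc]
            if nd < dist[nr][nc]:
                dist[nr][nc] = nd
                heapq.heappush(heap, (nd, nr, nc))
    return dist
```

Each interior square can then deterministically move to the neighbour with the smallest distance. Note that with a priority queue the worst case is O(N log N); if all traversal costs are equal, the search degenerates into a breadth-first search, which is genuinely linear.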