A simple approach

Detailing the approach step by step

In this section, we explain the rules of the game and our strategy for training the agent. To start simple, we try to conquer a 3*3 map on which we are the only player (see below). As we can see, the trained agent is already quite efficient at conquering the map.

conquermap

How does it start?

Each player starts with a single square of the map and can decide either:

To stay, in order to increase the strength of its square (action = STILL).
To move onto (and conquer) a neighboring square (action = NORTH, SOUTH, EAST, WEST).

Conquering is only possible once the square’s strength is high enough. A wise bot would therefore first wait for its strength to increase before attacking an adjacent square, since squares don’t produce when they attack.

To conquer a square, we must move in its direction with a strictly greater strength (action = NORTH, SOUTH, EAST, WEST).
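The rules above can be sketched in a few lines of Python. This is only an illustration: the `Action` enum, its integer values, and the helpers `can_conquer` and `choose_action` are hypothetical names introduced here, not part of the starter kit.

```python
from enum import Enum


class Action(Enum):
    # The five actions named in the text; the integer values are arbitrary here.
    STILL = 0
    NORTH = 1
    EAST = 2
    SOUTH = 3
    WEST = 4


def can_conquer(attacker_strength: int, target_strength: int) -> bool:
    """A move conquers the target only if the attacker is strictly stronger."""
    return attacker_strength > target_strength


def choose_action(own_strength: int, neighbor_strengths: dict) -> Action:
    """Attack the first neighbor that can be conquered, otherwise wait (STILL).

    `neighbor_strengths` maps a direction (an Action) to the strength of the
    square lying in that direction.
    """
    for action, strength in neighbor_strengths.items():
        if can_conquer(own_strength, strength):
            return action
    return Action.STILL
```

With equal strengths the move does not conquer, so `choose_action` keeps returning `Action.STILL` until the square has grown strictly stronger than at least one neighbor.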

The white numbers on the map below represent the current strength of the squares. On the left is a snapshot of the initial state of the game. On the right, you can see the strength of the blue square increase over time, because our agent decides to stay (action = STILL).

the strength map

The increase in strength is computed according to a fixed production map. In our example, the blue square’s strength increases by 4 at each turn. Each square has a different production rate, shown by the white numbers below the squares (see below). The left again shows a snapshot of the initial game, and the right shows the game’s dynamics.

production map
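As a small worked example of this growth rule, the sketch below adds the square's production to its strength on every turn spent on STILL. The function names are hypothetical, and the sketch deliberately ignores any strength cap the real game may apply.

```python
def strength_after_waiting(strength: int, production: int, turns: int) -> int:
    """Strength of a square that plays STILL for `turns` consecutive turns."""
    for _ in range(turns):
        strength += production
    return strength


def turns_until_conquerable(strength: int, production: int, target_strength: int) -> int:
    """Number of STILL turns needed before strength is strictly above the target."""
    if production <= 0 and strength <= target_strength:
        raise ValueError("a square with no production can never catch up")
    turns = 0
    while strength <= target_strength:
        strength += production
        turns += 1
    return turns
```

For the blue square in the example (production 4), `strength_after_waiting(0, 4, 3)` gives 12, and a neighbor of strength 10 becomes conquerable after 3 turns of waiting.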

This production map is invariant over time, and it is information we should use to train our agent. Since we are interested in maximizing our production, we should intuitively train our agent to target squares with a high production rate. On the other hand, we should also consider the strength map, since squares with low strength are easier to conquer.
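To make this trade-off concrete, here is one possible hand-written score that favors high production and low strength. It is only an illustrative heuristic, not what the trained agent computes; the function name and the `production / (strength + 1)` formula are assumptions made for this sketch.

```python
import numpy as np


def target_scores(production, strength):
    """Score each square: higher production and lower strength give a higher score.

    The +1 in the denominator is an arbitrary choice that avoids dividing by
    zero for squares with strength 0.
    """
    production = np.asarray(production, dtype=float)
    strength = np.asarray(strength, dtype=float)
    return production / (strength + 1.0)
```

For instance, a square with production 4 and strength 9 scores 0.4, while a square with production 1 and strength 0 scores 1.0, so this heuristic would prefer the weak square despite its lower production.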

The Agent

We will teach our agent with:

The successive Game States.
The agent’s Moves (initially random).
The corresponding Reward for each Move (that we have to compute).

For now, the Game State is a (3 * 3) * 3 matrix, i.e. (width * height) * n_features, the features being:

The Strength of the Square
The Production of the Square
The Owner of the Square

matrix
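A minimal sketch of how such a state could be assembled with NumPy, assuming the three maps are available as 3*3 arrays. The function name `build_state` and the feature order (strength, production, owner) are choices made for this example.

```python
import numpy as np


def build_state(strength, production, owner):
    """Stack the three 3*3 feature maps into one array of shape (3, 3, 3).

    The last axis indexes the features: 0 = strength, 1 = production, 2 = owner.
    """
    layers = [np.asarray(m, dtype=np.float32) for m in (strength, production, owner)]
    return np.stack(layers, axis=-1)
```

If the model expects a flat input, `build_state(...).reshape(-1)` yields a vector of 27 values.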

The Reward

As for the reward, we focus on production. Since each conquered square increases the total production of our land, the action leading to a conquest is rewarded according to the production rate of the conquered square. This strategy gives the highest rewards to the conquest of highly productive squares.
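One way to compute such a reward is to compare the owner map before and after a move and sum the production of the newly owned squares. The sketch below assumes our player is identified by `player_id = 1` in the owner map; the function name and that convention are hypothetical.

```python
import numpy as np


def conquest_reward(owner_before, owner_after, production, player_id=1):
    """Sum the production of the squares conquered between two game states."""
    owner_before = np.asarray(owner_before)
    owner_after = np.asarray(owner_after)
    # A square counts as conquered if we own it now but did not own it before.
    newly_owned = (owner_after == player_id) & (owner_before != player_id)
    return float(np.asarray(production)[newly_owned].sum())
```

A move that conquers nothing (for example STILL) receives a reward of 0 under this definition.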

Current results

We train over 500 games and see the total reward obtained improve significantly over time.

[Figure: screenshot, 2017-09-26]

On the left, you can observe the behaviour of the original, untrained bot, which takes random actions, whereas on the right, you can see the trained bot.

Halite Challenge

A fork of the Halite Starting Kit, aimed at providing an interface and debugging tools for reinforcement learning (RL) strategies.
