Part 2 — The Board as Numbers

Let’s talk about what happens between “the game sends you JSON” and “the network decides a move.” The bridge between those two things is the board encoding — converting the structured game state into a grid of numbers that a neural network can read.

If Part 1 was about the API contract, this part is about the data contract. What format does the network actually need?

What a tensor is

The game sends you a JSON object with lists, nested structs, strings — rich structure. The network wants numbers. Specifically, it wants a tensor.

A tensor is a multi-dimensional array of numbers. A 1D tensor is a vector. A 2D tensor is a matrix. A 3D tensor is a stack of matrices. That’s all.

In Burn, a tensor is Tensor<Backend, D> where Backend is the execution backend (CPU, CUDA, etc.) and D is the number of dimensions. Under the hood it’s a blob of f32 values with a shape. The shape (3, 11, 11) means “3 channels, each 11 rows by 11 columns” — a stack of three 11×11 grids.

That’s the form our board representation takes: a stack of board-sized grids, one per feature.
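To make "a blob of f32 values with a shape" concrete, here is a minimal sketch in plain Rust. The `Grid3` type and its methods are hypothetical names made up for illustration; this is not how Burn stores tensors internally, only a picture of the idea.

```rust
/// A hypothetical minimal 3D "tensor": a flat f32 buffer plus a shape.
/// Illustration only; Burn's real Tensor type is backend-specific.
struct Grid3 {
    data: Vec<f32>,
    shape: [usize; 3], // (channels, rows, cols)
}

impl Grid3 {
    /// Allocate a zero-filled buffer large enough for the given shape.
    fn zeros(shape: [usize; 3]) -> Self {
        let len = shape[0] * shape[1] * shape[2];
        Grid3 { data: vec![0.0; len], shape }
    }

    /// Row-major offset of element (channel, row, col) in the flat buffer.
    fn offset(&self, c: usize, row: usize, col: usize) -> usize {
        let [_, rows, cols] = self.shape;
        c * rows * cols + row * cols + col
    }

    /// Write one cell.
    fn set(&mut self, c: usize, row: usize, col: usize, value: f32) {
        let i = self.offset(c, row, col);
        self.data[i] = value;
    }

    /// Read one cell.
    fn get(&self, c: usize, row: usize, col: usize) -> f32 {
        self.data[self.offset(c, row, col)]
    }
}
```

A shape of (3, 11, 11) therefore needs 3 × 11 × 11 = 363 floats, and the last cell of the last channel sits at offset 362.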

Feature planes

The trick is deciding what goes in each channel. Each channel is a grid the same size as the board, where each cell is either 0 or 1 (or sometimes a number like health / 100).

Here’s what we’ll use:

Channel  Contents
0        Open space (1 for every valid board square)
1        Food (1 on food squares)
2        My body excluding head (1 on body squares that aren’t the head)
3        My head (1 on my head square)
4        Enemy bodies (1 on all enemy body squares)
5        Enemy heads (1 on all enemy head squares)
6        Hazard (dangerous squares; 1 where damage applies)
7        My health normalized (my_health / 100, repeated across all squares)

Eight channels. The network gets an (8, height, width) tensor.

Why this layout? Three things worth explaining:

Channel 0: open space. Without this, an empty board square and a position outside the board would look identical: 0 in every channel. Setting every valid square to 1.0 makes the board boundary explicit. When the tensor is sized exactly to the board this channel is all ones, so it matters most once the input is padded beyond the board edge, as convolutional layers commonly do.

Channels 2 and 3: body without head, head alone. The head square is part of the body in the game data: it appears in body and as head. But we split them across channels. Channel 2 holds the body excluding the head. Channel 3 holds only the head. This gives the network a clean signal: channel 2 is “squares I can’t move to” and channel 3 is “where I am right now.” If the head appeared in both channels, the network would have to learn that the overlapping cell is one position counted twice; splitting them makes the representation unambiguous.

Channel 7: health everywhere. Health is a single scalar on the Snake struct, but every channel has to be a full board-sized grid. We broadcast it across the entire channel, so every square gets the same value. This is a little wasteful but keeps the tensor shape regular, and it lets the network pick up associations such as “low health → take fewer risks” the same way it learns spatial patterns.
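The two less obvious choices (head split out of the body, health broadcast to a full plane) can be sketched in a few lines of plain Rust. The `Channel` enum, `NUM_CHANNELS`, `body_without_head`, and `health_plane` are hypothetical names invented for this illustration, and points are simplified to `(x, y)` tuples.

```rust
/// Hypothetical channel indices matching the channel table.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Channel {
    OpenSpace = 0,
    Food = 1,
    MyBody = 2,
    MyHead = 3,
    EnemyBodies = 4,
    EnemyHeads = 5,
    Hazard = 6,
    MyHealth = 7,
}

const NUM_CHANNELS: usize = 8;

/// Body squares minus the head: the squares that go into channel 2.
fn body_without_head(body: &[(i32, i32)], head: (i32, i32)) -> Vec<(i32, i32)> {
    body.iter().copied().filter(|&p| p != head).collect()
}

/// Channel 7: one scalar, normalized to [0, 1], repeated over every square.
fn health_plane(health: u32, height: usize, width: usize) -> Vec<f32> {
    vec![health as f32 / 100.0; height * width]
}
```

Naming the channels this way is optional; the point is that channel 2 never contains the head square, and channel 7 carries no spatial information at all.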

A concrete example

Let’s encode a tiny 5×5 board to see it in action:

Board (5×5, Battlesnake coordinate system — y increases upward):

  x: 0  1  2  3  4
y:4  .  .  .  ⭐ .  (food at (3,4))
   3  .  .  .  .  .
   2  .  🐍 🐍 .  .  (body at (1,2) and (2,2))
   1  .  .  🐍 .  .  (head at (2,1))
   0  .  .  .  .  .

My snake: head at (2,1), body: (2,1), (2,2), (1,2)
No enemies. No hazards.

Encoded (one channel at a time):

Channel 0 (open space):
1 1 1 1 1    ← y=4
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1    ← y=0, every valid square is 1.0

Channel 1 (food):
0 0 0 1 0    ← y=4, (3,4) has food
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0    ← y=0

Channel 2 (my body, excluding head):
0 0 0 0 0    ← y=4
0 0 0 0 0
0 1 1 0 0    ← y=2, (1,2) and (2,2) — body segments only
0 0 0 0 0    ← y=1, head at (2,1) is NOT in this channel
0 0 0 0 0    ← y=0

Channel 3 (my head only):
0 0 0 0 0    ← y=4
0 0 0 0 0
0 0 0 0 0
0 0 1 0 0    ← y=1, (2,1) — head gets its own channel
0 0 0 0 0    ← y=0

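Channels 4–6 (enemy bodies, enemy heads, hazards): all zeros

Channel 7 (health): my_health / 100 on every square, for example all 1.0 at full health (100)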

The network sees an 8×5×5 blob of numbers (8 channels, each 5×5). It learns to associate patterns in these channels with good and bad moves.

(Real games use 11×11 boards. The same encoding works, just with larger grids.)
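The example can be reproduced without any tensor library. The sketch below is a standalone illustration, not the tutorial's encoder: `encode_example` and `render_channel` are hypothetical helpers, the snake is assumed to be at full health, and the buffer layout assumed is index = c·h·w + y·w + x. Rows are stored with y = 0 first, so `render_channel` prints them in reverse to match the diagrams, which put y = 4 at the top.

```rust
/// Encode the 5×5 example into a flat buffer of 8 planes (no Burn types).
/// Layout assumption: index = c * h * w + y * w + x.
fn encode_example() -> (Vec<f32>, usize, usize) {
    let (h, w) = (5usize, 5usize);
    let n = h * w;
    let mut data = vec![0.0_f32; 8 * n];

    let food = [(3usize, 4usize)];
    let body = [(2usize, 1usize), (2, 2), (1, 2)]; // head listed first
    let head = body[0];
    let health = 100.0_f32; // assumption: full health

    for i in 0..n {
        data[i] = 1.0; // channel 0: every valid square
        data[7 * n + i] = health / 100.0; // channel 7: broadcast health
    }
    for &(x, y) in &food {
        data[n + y * w + x] = 1.0; // channel 1: food
    }
    for &(x, y) in &body {
        // Head goes to channel 3, every other segment to channel 2.
        let c = if (x, y) == head { 3 } else { 2 };
        data[c * n + y * w + x] = 1.0;
    }
    (data, h, w)
}

/// Render one channel as text, highest y on the first line (as in the diagrams).
fn render_channel(data: &[f32], c: usize, h: usize, w: usize) -> String {
    let n = h * w;
    (0..h)
        .rev()
        .map(|y| {
            (0..w)
                .map(|x| format!("{}", data[c * n + y * w + x]))
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling `render_channel(&data, 3, h, w)` on the result gives the head-only grid, with the single 1 on the y = 1 row at x = 2.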

The implementation

We’ll put the encoder in snake-ml/src/lib.rs. It takes the deserialized Board and Snake structs and produces a Burn tensor. We use the autodiff backend so the tensor can carry gradients for training:

use burn::backend::Autodiff;
use burn::tensor::{Tensor, TensorData};
use serde::Deserialize;

type Backend = Autodiff<burn::backend::Cpu>;

#[derive(Debug, Clone, Deserialize)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

#[derive(Debug, Clone, Deserialize)]
pub struct Snake {
    pub id: String,
    pub body: Vec<Point>,
    pub health: u32,
    pub head: Point,
}

#[derive(Debug, Clone, Deserialize)]
pub struct Board {
    pub width: u32,
    pub height: u32,
    #[serde(default)]
    pub food: Vec<Point>,
    #[serde(default)]
    pub snakes: Vec<Snake>,
    #[serde(default)]
    pub hazards: Vec<Point>,
}

/// Convert a board + my_snake into a feature-plane tensor.
/// Shape: (8, height, width)
/// Channels: open_space, food, my_body, my_head, enemy_bodies, enemy_heads, hazards, health
pub fn encode_board<B: burn::tensor::backend::Backend>(board: &Board, my_snake: &Snake, device: &B::Device) -> Tensor<B, 3> {
    let h = board.height as usize;
    let w = board.width as usize;
    let n = h * w;

    // Flat buffer: 8 channels × h × w, indexed as channel * n + y * w + x
    let mut data = vec![0.0_f32; 8 * n];

    for y in 0..h {
        for x in 0..w {
            let idx = y * w + x;

            // Channel 0: open space, set to 1.0 for all valid squares
            data[idx] = 1.0;

            // Channel 1: food
            if board.food.iter().any(|f| f.x == x as i32 && f.y == y as i32) {
                data[n + idx] = 1.0;
            }

            // My body (channel 2): body segments excluding head.
            // The head gets its own channel (3), so channel 2 is pure "squares I occupy but am not at."
            if my_snake.body.iter().any(|p| {
                p.x == x as i32 && p.y == y as i32
                    && !(p.x == my_snake.head.x && p.y == my_snake.head.y)
            }) {
                data[2 * n + idx] = 1.0;
            }

            // My head (channel 3)
            if my_snake.head.x == x as i32 && my_snake.head.y == y as i32 {
                data[3 * n + idx] = 1.0;
            }

            // Enemy bodies (channel 4) and heads (channel 5).
            // Note: body includes the head, so enemy heads are also set in channel 4.
            for snake in &board.snakes {
                if snake.id == my_snake.id {
                    continue;
                }
                for point in &snake.body {
                    if point.x == x as i32 && point.y == y as i32 {
                        data[4 * n + idx] = 1.0;
                    }
                }
                if snake.head.x == x as i32 && snake.head.y == y as i32 {
                    data[5 * n + idx] = 1.0;
                }
            }

            // Hazards (channel 6)
            if board.hazards.iter().any(|hz| hz.x == x as i32 && hz.y == y as i32) {
                data[6 * n + idx] = 1.0;
            }
        }
    }

    // Channel 7: my health normalized to [0, 1], broadcast to every square
    let health = my_snake.health as f32 / 100.0;
    for idx in 0..n {
        data[7 * n + idx] = health;
    }

    // Build the tensor: shape (8, h, w), on the device the caller passed in.
    // Shape dimensions are usize, so h and w are used directly.
    Tensor::from_data(TensorData::new(data, [8, h, w]), device)
}

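Run it and you’ll get a tensor with shape [8, height, width]. The flat buffer is read in row-major order with the channel axis first, so element (c, y, x) sits at index c * h * w + y * w + x. Row 0 of each plane is y = 0, which means the diagrams above (drawn with y = 4 at the top) list the rows in the reverse of storage order.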

One thing worth calling out: we encode hazards even though they’re usually empty in standard Battlesnake rules. We include them because some game modes use them, and it costs us nothing to handle the general case.

Flattening for the MLP

The network in Part 3 is a multi-layer perceptron (MLP) — it takes a flat input vector, not a 3D tensor. So after encoding, we’ll flatten:

// h and w are the usize board dimensions from encode_board
let flat: Tensor<Backend, 1> = tensor.reshape([8 * h * w]);
// (8*h*w,) = (968,) for an 11×11 board
// To add a batch dimension for the network:
// let batched = flat.reshape([1, 8 * h * w]);  // (1, 968)

For an 11×11 board with 8 channels, that’s 8 * 11 * 11 = 968 input values. Manageable.
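Once the input is flat, it helps to be able to say which feature a given position in the vector came from. The sketch below shows the arithmetic in both directions; `flatten_index` and `unflatten_index` are hypothetical helpers, and they assume the row-major layout index = c·h·w + y·w + x.

```rust
/// Flat position of cell (c, y, x) in an (8, h, w) encoding,
/// assuming row-major layout with the channel axis first.
fn flatten_index(c: usize, y: usize, x: usize, h: usize, w: usize) -> usize {
    c * h * w + y * w + x
}

/// Inverse mapping: recover (channel, y, x) from a flat position.
fn unflatten_index(index: usize, h: usize, w: usize) -> (usize, usize, usize) {
    let n = h * w; // cells per channel
    (index / n, (index % n) / w, index % w)
}
```

On an 11×11 board the last input, position 967, is channel 7 (health) at y = 10, x = 10.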

We’ll use this flatten-and-encode step often enough that it’s worth a helper:

/// Encode and flatten. Shape: (8 * H * W,) = (968,) for an 11×11 board.
pub fn encode_board_flat<B: burn::tensor::backend::Backend>(board: &Board, my_snake: &Snake, device: &B::Device) -> Vec<f32> {
    let tensor = encode_board::<B>(board, my_snake, device);
    let len = 8 * board.height as usize * board.width as usize;
    let flat: Tensor<B, 1> = tensor.reshape([len]);
    let data = flat.to_data();
    // Fails only if the backend's float element type is not f32.
    data.to_vec::<f32>().expect("backend float type should be f32")
}

/// Encode and return flat encoding with a unit marker for tuple destructuring.
/// The () is a convenience: some call sites destructure with `let (flat, _) = ...`
/// when they only need the flat vector but the API shape allows future extension.
pub fn encode_board_and_flat<B: burn::tensor::backend::Backend>(
    board: &Board,
    my_snake: &Snake,
    device: &B::Device,
) -> (Vec<f32>, ()) {
    // B cannot be inferred from `&B::Device`, so name it explicitly.
    (encode_board_flat::<B>(board, my_snake, device), ())
}

These are the three encoding functions you’ll see throughout the tutorial: encode_board (returns a Tensor), encode_board_flat (returns a Vec<f32>), and encode_board_and_flat (returns a (Vec<f32>, ()) for destructuring convenience in the self-play training loop).

If this feels wasteful — we’re duplicating the health channel across all squares — you’re right. Part 9 will show a better architecture using convolutional layers that share spatial weights. But for now, the flat representation works fine and keeps the code simple.

What we have

JSON game state  ──encode_board()──▶  Tensor(8, 11, 11)
                                          │
                                          ▼ reshape
                                     Vec<f32> (968 values)
                                          │
                                          ▼ Part 3 network
                                     Move probabilities

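The encoding is deterministic: the same board state always produces the same tensor. That’s important: the network can only learn from signal in the data, and signal requires consistency. It is not fully reversible, though: the planes record which squares are occupied, but not the order of body segments or which enemy owns which square.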
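A quick way to probe these properties is to encode the same state twice and compare, then encode two different states that occupy the same squares. The sketch below uses a hypothetical standalone `encode_self` that builds only the two "my snake" planes (body without head, then head) in plain Rust; it assumes the head is the first element of the body list and is not the Burn-based encoder.

```rust
/// Standalone sketch of channels 2 and 3 only: plane 0 is the body without
/// the head, plane 1 is the head. Layout: plane * h * w + y * w + x.
/// Assumes body[0] is the head.
fn encode_self(body: &[(usize, usize)], h: usize, w: usize) -> Vec<f32> {
    let n = h * w;
    let mut planes = vec![0.0_f32; 2 * n];
    let head = body[0];
    for &(x, y) in body {
        let plane = if (x, y) == head { 1 } else { 0 };
        planes[plane * n + y * w + x] = 1.0;
    }
    planes
}
```

Two snakes coiled around the same 2×2 block with the same head, one clockwise and one counter-clockwise, produce identical planes. The encoding is consistent from call to call, but the order of the body segments cannot be read back out of it.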

Next up: we feed that 968-element vector into a neural network and see what comes out.

Next: Part 3 — Your First Network