Inside the Transformer

From prompt to output - 5 chapters

Follow a prompt through every stage of LLM inference. Watch the data flow from raw text to generated response, step by step.

From text to numbers

Tokenization

Your prompt is just a string of characters. The model doesn't understand English; it operates on numbers. The tokenizer splits your text into subword tokens, typically using Byte-Pair Encoding (BPE), and maps each piece to an integer ID.

Most modern LLMs use BPE with vocabularies of ~32k-128k tokens. Common words like "the" are single tokens; rare words get split into pieces. "unhappiness" might become ["un", "happiness"]. Spaces often become part of the next token: " cat" not "cat".
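To make the splitting concrete, here is a minimal sketch of BPE encoding in Python. The merge table, the vocabulary, and the names `MERGES`, `RANK`, `VOCAB`, `bpe_word`, and `encode` are all invented for illustration. Real tokenizers learn tens of thousands of merges and usually work on bytes rather than characters, so treat this as a toy under those assumptions, not as any particular model's tokenizer.

```python
import re

# Hypothetical merge rules, listed in the order they were "learned".
# Earlier rules have higher priority when encoding.
MERGES = [("t", "h"), ("th", "e"), ("a", "t"), ("c", "at"), (" ", "cat")]
RANK = {pair: i for i, pair in enumerate(MERGES)}

# Hypothetical vocabulary: single characters first, then one entry per merge.
BASE = list(" abceht")
VOCAB = {tok: i for i, tok in enumerate(BASE + [a + b for a, b in MERGES])}


def bpe_word(word):
    """Split one word into subword tokens by applying merges in rank order."""
    parts = list(word)  # start from individual characters
    while len(parts) > 1:
        # Collect every adjacent pair that has a merge rule.
        candidates = [
            (RANK[pair], i)
            for i, pair in enumerate(zip(parts, parts[1:]))
            if pair in RANK
        ]
        if not candidates:
            break
        # Apply the highest-priority (lowest-rank) merge, leftmost first.
        _, i = min(candidates)
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


def encode(text):
    """Return (tokens, ids). Characters outside BASE raise KeyError in this toy."""
    tokens = []
    # Pre-tokenize so a leading space stays attached to the word that follows it.
    for word in re.findall(r" ?\S+|\s+", text):
        tokens.extend(bpe_word(word))
    return tokens, [VOCAB[t] for t in tokens]


print(encode("the cat"))  # (['the', ' cat'], [8, 11])
print(encode("the bat"))  # (['the', ' ', 'b', 'at'], [8, 0, 2, 9])
```

In this toy vocabulary, "the" and " cat" each collapse into a single token because merge rules cover them fully, while " bat" has no matching rules beyond "at" and falls apart into three pieces. That mirrors the common-word versus rare-word behavior described above.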



Continue learning

Tokenizer Training

How BPE vocabularies are built — the step before tokenization in the transformer.


Modern Techniques

Chain-of-thought reasoning, MoE routing, and multimodal extensions to the transformer.


The Inference Engine

KV cache, PagedAttention, and batching — how transformers run fast at scale.
