Inside the Transformer
From prompt to output - 5 chapters
Follow a prompt through every stage of LLM inference. Watch the data flow from raw text to generated response, step by step.
01 From text to numbers
Tokenization
Your prompt is just a string of characters. The model doesn't understand English; it operates on numbers. The tokenizer splits your text into subword tokens, typically with Byte-Pair Encoding (BPE), and maps each piece to an integer ID from a fixed vocabulary.
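As a rough illustration of the step described above, here is a minimal sketch of BPE encoding for a single word. The merge rules and vocabulary are invented for this example (real tokenizers learn tens of thousands of merges from data), and the names `MERGES`, `VOCAB`, `bpe_encode`, and `to_ids` are hypothetical rather than any library's API.

```python
# Toy BPE encoder. MERGES and VOCAB are made up for illustration only.

# Ordered merge rules: earlier rules have higher priority.
MERGES = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
RANKS = {pair: rank for rank, pair in enumerate(MERGES)}

# Hypothetical vocabulary mapping each token string to an integer ID.
VOCAB = {tok: i for i, tok in enumerate(
    ["h", "e", "l", "o", "w", "he", "ll", "hell", "hello"]
)}

NO_MERGE = float("inf")


def bpe_encode(word: str) -> list[str]:
    """Start from single characters and repeatedly apply the best-ranked merge."""
    tokens = list(word)
    while len(tokens) > 1:
        # Rank every adjacent pair; pairs without a merge rule get NO_MERGE.
        ranked = [
            (RANKS.get((left, right), NO_MERGE), i)
            for i, (left, right) in enumerate(zip(tokens, tokens[1:]))
        ]
        best_rank, i = min(ranked)
        if best_rank == NO_MERGE:
            break  # no applicable merge remains
        # Replace the two pieces at positions i and i + 1 with their merge.
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


def to_ids(tokens: list[str]) -> list[int]:
    """Look up each token string in the toy vocabulary."""
    return [VOCAB[t] for t in tokens]


if __name__ == "__main__":
    for word in ["hello", "hellow", "hole"]:
        pieces = bpe_encode(word)
        print(word, pieces, to_ids(pieces))
```

In this toy setup a word fully covered by the merge rules collapses to one token, while a word the rules don't cover stays split into smaller pieces, which mirrors how common words end up as single tokens and rare words as several.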
Most modern LLMs use BPE with vocabularies of roughly 32k to 256k tokens. Common words like "the" are single tokens; rare words get split into pieces. "unhappiness" might become ["un", "happiness"]. A leading space usually becomes part of the token that follows it, so " cat" and "cat" are different tokens.
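The leading-space convention can be sketched with a small pre-tokenization step, which splits text into chunks before any BPE merges are applied. The regular expression below is a simplified stand-in for the patterns real tokenizers use (those also handle contractions, digits, and Unicode classes), and `pretokenize` is a hypothetical helper name.

```python
import re

# Simplified pre-tokenization pattern (an assumption, not a real tokenizer's):
#   " ?\w+"       a word, with one optional leading space attached
#   " ?[^\w\s]+"  a run of punctuation, with one optional leading space
#   "\s+"         any remaining whitespace
PRETOKENIZE = re.compile(r" ?\w+| ?[^\w\s]+|\s+")


def pretokenize(text: str) -> list[str]:
    """Split text into chunks, keeping a single leading space with each word."""
    return PRETOKENIZE.findall(text)


if __name__ == "__main__":
    pieces = pretokenize("The cat sat.")
    print(pieces)

    # The same word with and without a leading space is a different chunk,
    # so it would map to a different token ID.
    print(pretokenize("cat cat"))

    # Splitting is lossless: joining the chunks restores the input.
    print("".join(pieces) == "The cat sat.")
```

Because the space travels with the word, the tokenizer needs no separate whitespace token between most words, and decoding is a plain concatenation of the pieces.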
Continue learning
Tokenizer Training
How BPE vocabularies are built — the step before tokenization in the transformer.
Modern Techniques
Chain-of-thought reasoning, MoE routing, and multimodal extensions to the transformer.
The Inference Engine
KV cache, PagedAttention, and batching — how transformers run fast at scale.