
Inside the Transformer

From prompt to output - 5 chapters

Follow a prompt through every stage of LLM inference. Watch the data flow from raw text to generated response, step by step.

01
From text to numbers

Tokenization

Your prompt is just a string of characters. The model doesn't understand English; it operates on numbers. The tokenizer splits your text into subword tokens, typically using Byte-Pair Encoding (BPE), and maps each piece to an integer ID.
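To make the string-to-integers step concrete, here is a minimal sketch. It assumes a tiny hand-written vocabulary (`VOCAB` is hypothetical, not taken from any real model) and uses greedy longest-match lookup, which is simpler than real BPE but shows the same end result: text in, a list of integer IDs out.

```python
# Hypothetical toy vocabulary: piece -> integer ID (not from any real model).
VOCAB = {"The": 0, " cat": 1, " sat": 2, " un": 3, "happi": 4, "ness": 5, ".": 6}
ID_TO_PIECE = {token_id: piece for piece, token_id in VOCAB.items()}


def encode(text: str) -> list[int]:
    """Split text by greedy longest-match against VOCAB and return the IDs."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos = end
                break
        else:
            # A real tokenizer falls back to smaller units (e.g. bytes);
            # this sketch simply gives up.
            raise ValueError(f"no vocabulary piece matches at position {pos}")
    return ids


def decode(ids: list[int]) -> str:
    """Map IDs back to their pieces and join them."""
    return "".join(ID_TO_PIECE[i] for i in ids)


ids = encode("The cat sat.")
print(ids)          # [0, 1, 2, 6]
print(decode(ids))  # The cat sat.
```

The model only ever sees the list `[0, 1, 2, 6]`; decoding is just the reverse lookup, so the round trip gives back the original string.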

Most modern LLMs use BPE or a close variant, with vocabularies of roughly 32k to 128k tokens (some newer models go higher). Common words like "the" are single tokens; rare words get split into pieces. "unhappiness" might become ["un", "happiness"]. A leading space usually becomes part of the token that follows it, so the vocabulary contains " cat" rather than "cat".
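The merge idea behind BPE can be sketched in a few lines. This is a toy, character-level version under stated assumptions: the corpus is five hypothetical words, ties between equally frequent pairs are broken by first occurrence, and the helper names (`learn_merges`, `apply_merge`, `tokenize`) are made up for illustration. Production tokenizers work on bytes, pre-split the text, and learn tens of thousands of merges.

```python
from collections import Counter


def apply_merge(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_merges(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Repeatedly merge the most frequent adjacent pair in the corpus."""
    # Each word starts as a sequence of single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[tuple(apply_merge(list(symbols), best))] += freq
        corpus = merged
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in order, to a new word."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Hypothetical corpus; the leading space is part of each word.
merges = learn_merges([" the", " the", " the", " then", " cat"], num_merges=3)
print(tokenize(" the", merges))   # [' the']
print(tokenize(" then", merges))  # [' the', 'n']
print(tokenize("the", merges))    # ['t', 'h', 'e']
```

The frequent word " the" collapses into a single token, the rarer " then" is split into pieces, and "the" without its leading space matches none of the learned merges, which is why " cat" and "cat" are different tokens in practice.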


forwardpass.dev

An interactive educational project visualizing how LLM inference, training, and deployment work - from raw text to generated response.

Further reading

  • "Attention Is All You Need" - Vaswani et al., 2017
  • "Language Models are Few-Shot Learners" - Brown et al., 2020
  • "The Illustrated Transformer" - Jay Alammar
  • "Neural Networks: Zero to Hero" - Andrej Karpathy

Built with

  • Next.js + TypeScript
  • Framer Motion
  • Tailwind CSS
  • js-tiktoken
Everything runs in your browser - no data is sent to any server.