
From Prompt to Output

What happens inside a language model after you hit Enter? Explore every stage - from raw text to generated response - through interactive visualizations.

Recommended starting point

Inside the Transformer

Trace your tokens through every stage of inference - tokenization, embeddings, attention, sampling - with interactive visualizations, Python code, and playgrounds.

" cat" → 8415

The Attention Mechanism

Deep dive — 8 chapters

Why attention exists, scaled dot-product, multi-head, positional encoding, GQA, MLA, sparse attention, and hybrid architectures.

8 chapters
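
As a taste of what the scaled dot-product chapter covers, here is a minimal sketch in plain Python. It assumes a single head, no masking, and no learned projections; the function names are illustrative and not taken from the site's own code.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 1.0], [1.0, 1.0]]
    v = [[0.0, 2.0], [4.0, 6.0]]
    print(scaled_dot_product_attention(q, k, v))
```

When every key is identical, the weights are uniform and the output is just the mean of the value vectors, which makes a handy sanity check.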

Embeddings

From lookup tables to semantic search

How token IDs become vectors, why similar words cluster in space, Word2Vec, contextual representations, sentence embeddings, and RAG.

7 chapters

Tokenizer Training

Building the vocabulary

How BPE builds a vocabulary from raw bytes. Merge rules, byte-level encoding, vocabulary design, and special tokens.

6 chapters
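
The core idea of byte-level BPE training can be sketched in a few lines of Python. This is a toy version under simplifying assumptions: no pre-tokenization regex, no special tokens, and ties between equally frequent pairs broken by first occurrence. The names `merge` and `train_bpe` are illustrative.

```python
from collections import Counter

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules starting from raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))  # base vocabulary: byte values 0..255
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so further merges would not compress
        merges[pair] = next_id
        ids = merge(ids, pair, next_id)
        next_id += 1
    return ids, merges

if __name__ == "__main__":
    ids, merges = train_bpe("low lower lowest", 3)
    print(ids)
    print(merges)
```

Each learned rule maps a pair of existing token IDs to a new ID, so the vocabulary grows by one entry per merge while the encoded sequence gets shorter.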

Training & Fine-tuning

How models learn

Pre-training, SFT, RLHF, GRPO, DPO - from raw networks to assistants.

6 chapters

Modern Techniques

Beyond the basics

MoE, reasoning chains, tool use, multimodal, and long context - what's new in LLMs.

5 chapters

Quantization

Making models fit

FP16 to INT4: compress LLM weights 2-4x. GPTQ, AWQ, bitsandbytes, GGUF - methods, tradeoffs, and practical deployment.

6 chapters
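
To make the compression idea concrete, here is a minimal sketch of symmetric absmax round-to-nearest quantization in plain Python. It is the naive baseline rather than GPTQ or AWQ, which use calibration data to choose better quantized values. It assumes one scale per weight group, and the function names are illustrative.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        scale = 1.0  # all-zero group: any scale works
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [v * scale for v in q]

if __name__ == "__main__":
    w = [0.7, -0.3, 0.0, 0.12]
    q, scale = quantize_absmax(w, bits=4)
    print(q, scale)
    print(dequantize(q, scale))
```

Storing 4-bit integers instead of 16-bit floats is where the 4x figure comes from, before counting the small overhead of the per-group scale. The rounding error per weight is at most half the scale.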

The Inference Engine

Serving at scale

Batching, KV cache, PagedAttention, FlashAttention, speculative decoding, and production serving frameworks.

11 chapters

Architecture Explorer

Compare model internals

GPT-2, LLaMA 3, Mistral, Gemma 2, Phi-3, Qwen 2.5 - watch data flow through each architecture with animated layer diagrams.

6 models

Everything runs in your browser - no data is sent to any server, no API keys needed.
Built with Next.js, TypeScript, Framer Motion, D3.js, and js-tiktoken.

forwardpass.dev

An interactive educational project visualizing how LLM inference, training, and deployment work - from raw text to generated response.

Further reading

  • "Attention Is All You Need" - Vaswani et al., 2017
  • "Language Models are Few-Shot Learners" - Brown et al., 2020
  • "The Illustrated Transformer" - Jay Alammar
  • "Neural Networks: Zero to Hero" - Andrej Karpathy

Built with

  • Next.js + TypeScript
  • Framer Motion
  • Tailwind CSS
  • js-tiktoken