From Prompt to Output

What happens inside a language model after you hit Enter? Explore every stage, from raw text to generated response, through interactive visualizations.

Recommended starting point

Inside the Transformer

Trace your tokens through every stage of inference (tokenization, embeddings, attention, and sampling) with interactive visualizations, Python code, and playgrounds.

The Attention Mechanism

Deep dive — 8 chapters

Why attention exists, scaled dot-product, multi-head, positional encoding, GQA, MLA, sparse attention, and hybrid architectures.
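
As a taste of what the scaled dot-product chapter covers, here is a minimal pure-Python sketch of a single attention head. The function names are illustrative and not taken from the guide; real implementations use batched tensor operations, masking, and learned projections, all of which this sketch omits.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for plain lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

When every key is identical, the weights are uniform and the output is simply the mean of the value vectors, which makes a handy sanity check.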

Embeddings

From lookup tables to semantic search

How token IDs become vectors, why similar words cluster in space, Word2Vec, contextual representations, sentence embeddings, and RAG.

Tokenizer Training

Building the vocabulary

How BPE builds a vocabulary from raw bytes. Merge rules, byte-level encoding, vocabulary design, and special tokens.
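
The core training loop can be sketched in a few lines. This is a simplified illustration, assuming byte-level input and new token IDs starting at 256; `train_bpe` and `merge_pair` are hypothetical names, and production tokenizers add pre-tokenization, special tokens, and much faster pair counting.

```python
from collections import Counter

def merge_pair(ids, pair, new_id):
    # Replace every non-overlapping occurrence of `pair` with `new_id`.
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    # Start from raw UTF-8 bytes, so the base vocabulary is IDs 0..255.
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break
        # Pick the most frequent adjacent pair (first seen wins ties).
        best = max(pair_counts, key=pair_counts.get)
        new_id = 256 + step
        merges[best] = new_id
        ids = merge_pair(ids, best, new_id)
    return ids, merges
```

Calling `train_bpe("aaabdaaabac", 1)` learns a single merge rule for the byte pair `(97, 97)`, that is, "aa", and shortens the sequence from 11 IDs to 9.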

Training & Fine-tuning

How models learn

Pre-training, SFT, RLHF, GRPO, and DPO: from raw networks to assistants.

6 chapters

Modern Techniques

Beyond the basics

MoE, reasoning chains, tool use, multimodal models, and long context: what's new in LLMs.

5 chapters

Quantization

Making models fit

From FP16 to INT4: compress LLM weights 2-4x. Covers GPTQ, AWQ, bitsandbytes, and GGUF, including methods, tradeoffs, and practical deployment.
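
The 2-4x figure follows from bit widths: 16-bit weights stored in 8 bits take half the space, and in 4 bits a quarter (before the small overhead of storing scales). The sketch below shows naive symmetric round-to-nearest quantization with one scale per tensor. It is not GPTQ or AWQ, which choose quantized values more carefully, and the function names are illustrative.

```python
def quantize_symmetric(weights, bits):
    # Symmetric range: for 4 bits, integers in [-7, 7] (the -8 slot is unused).
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the valid range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floating-point weights.
    return [v * scale for v in q]

def compression_ratio(src_bits, dst_bits):
    # Ignores the storage cost of scales and other metadata.
    return src_bits / dst_bits
```

For example, `quantize_symmetric([0.7, -0.3, 0.0, 0.1], 4)` maps the weights to the integers `[7, -3, 0, 1]` with a scale of roughly 0.1, and `compression_ratio(16, 4)` gives 4.0.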

The Inference Engine

Serving at scale

Batching, KV cache, PagedAttention, FlashAttention, speculative decoding, and production serving frameworks.

11 chapters

Architecture Explorer

Compare model internals

GPT-2, LLaMA 3, Mistral, Gemma 2, Phi-3, and Qwen 2.5: watch data flow through each architecture with animated layer diagrams.

6 models

Everything runs in your browser: no data is sent to any server, and no API keys are needed.
Built with Next.js, TypeScript, Framer Motion, D3.js, and js-tiktoken.