Architecture Explorer

Watch data flow through real architectures. Click any layer to pause the flow and see internal operations animate step by step.

GPT-2

124M parameters - 12 layers - multi-head attention (MHA) - GELU - learned absolute position embeddings
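As a rough check on the 124M figure, the parameters can be tallied layer by layer. This is a minimal sketch, not the site's own code: `gpt2_param_count` is a hypothetical helper, and the hidden size (768), vocabulary size (50257), and context length (1024) are the commonly published GPT-2 small hyperparameters, assumed here because the page does not state them.

```python
# Hypothetical helper: counts parameters for a GPT-2-style decoder.
# The defaults are the commonly published GPT-2 small hyperparameters,
# assumed for illustration (only "124M" and "12 layers" appear on the page).
def gpt2_param_count(n_layer=12, d_model=768, vocab=50257, n_ctx=1024):
    tok_emb = vocab * d_model   # token embeddings, reused as the output head (weight tying)
    pos_emb = n_ctx * d_model   # learned absolute position embeddings
    layer_norm = 2 * d_model    # scale + bias
    # Attention: fused Q/K/V projection plus output projection, each with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, apply GELU, project back; each linear has a bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    block = 2 * layer_norm + attn + mlp
    final_norm = layer_norm
    return tok_emb + pos_emb + n_layer * block + final_norm

total = gpt2_param_count()
print(f"{total:,}")  # 124,439,808
```

Because the output head shares the token embedding matrix, it contributes no additional parameters to the total.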

[Diagram: transformer block repeated x12, with residual connections]
  • Pre-norm: LayerNorm is applied before each sub-layer, plus a final LayerNorm (the original Transformer used post-norm)
  • Learned position embeddings (not RoPE)
  • Weight tying between input embeddings and output head

forwardpass.dev

An interactive educational project visualizing how LLM inference, training, and deployment work - from raw text to generated response.

Further reading

  • "Attention Is All You Need" - Vaswani et al., 2017
  • "Language Models are Few-Shot Learners" - Brown et al., 2020
  • "The Illustrated Transformer" - Jay Alammar
  • "Neural Networks: Zero to Hero" - Andrej Karpathy

Built with

  • Next.js + TypeScript
  • Framer Motion
  • Tailwind CSS
  • js-tiktoken
Everything runs in your browser - no data is sent to any server.