
Transformer Self-Attention

Audience: Software Engineer
Category: Computer Science

Description

Visualizes the scaled dot-product self-attention mechanism in a Transformer. Input tokens are projected to query (Q), key (K), and value (V) matrices. Attention scores are computed as the dot products Q×K^T and shown as a heatmap, softmax is applied to each row of scores, and the output is computed as the resulting weighted sum of the rows of V. The full attention formula is displayed.
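
The steps in the description can be sketched in a few lines of plain Python. This is a minimal, single-head illustration and not the code behind the animation. It assumes the standard formula Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, where d_k is the key dimension. The helper names (matmul, transpose, softmax, self_attention), the toy input x, and the identity projection matrices are all hypothetical choices made for the example.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Return (weights, output) for single-head scaled dot-product self-attention."""
    # Project the input tokens to queries, keys, and values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    # Raw scores Q x K^T, scaled by sqrt(d_k).
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    # Row-wise softmax gives the attention weights (the heatmap values).
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted sum of the rows of V.
    return weights, matmul(weights, v)

# Hypothetical toy input: 3 tokens, embedding size 2, identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, output = self_attention(x, identity, identity, identity)
```

Here weights is a 3×3 matrix whose rows each sum to 1, which is what a heatmap of attention would plot, and output has one row per input token. A real Transformer would use learned projection matrices and typically several heads in parallel.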

