Transformer Self-Attention

Audience: Software Engineer
Category: Computer Science

Description

Visualizes the scaled dot-product self-attention mechanism in a Transformer. Input tokens are projected to the Q, K, and V matrices; attention scores are computed as the dot products QK^T and shown as a heatmap; softmax is applied to each row of scores to produce attention weights; and the output is computed as the weighted sum of the rows of V. The full attention formula, Attention(Q, K, V) = softmax(QK^T / √d_k) V, is displayed.
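
The steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the animation: it assumes a single attention head with no masking or batching, and the helper names (`matmul`, `self_attention`, `random_matrix`), the matrix sizes, and the random projection weights are all hypothetical choices made for the example.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

def softmax(row):
    """Softmax over one row; subtracting the max avoids overflow in exp."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (no mask, no batch)."""
    # Project the input tokens to queries, keys, and values.
    Q = matmul(X, W_q)
    K = matmul(X, W_k)
    V = matmul(X, W_v)
    # Scores: every query dotted with every key, scaled by 1/sqrt(d_k).
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    scores = [[s * scale for s in row] for row in matmul(Q, transpose(K))]
    # Row-wise softmax turns scores into weights that sum to 1 per token.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted sum of the rows of V.
    output = matmul(weights, V)
    return output, weights

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Assumed toy sizes: 4 tokens, model width 8, head width 3.
rng = random.Random(0)
n_tokens, d_model, d_k = 4, 8, 3
X = random_matrix(n_tokens, d_model, rng)
W_q = random_matrix(d_model, d_k, rng)
W_k = random_matrix(d_model, d_k, rng)
W_v = random_matrix(d_model, d_k, rng)

output, weights = self_attention(X, W_q, W_k, W_v)
```

Here `weights` is the 4×4 matrix that a heatmap like the one in the animation would display (row i shows how strongly token i attends to each token), and `output` has one d_k-dimensional row per input token. In practice the projection matrices are learned rather than random, and a library such as NumPy or PyTorch would replace the list-based matrix helpers.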
