AnimG

Transformer Self-Attention

Description

Visualizes the scaled dot-product self-attention mechanism in a Transformer. Input tokens are projected to Q, K, V matrices, attention scores are computed via Q×K^T dot products shown as a heatmap, softmax is applied, and the output is computed as a weighted sum with V. The full attention formula is displayed.
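The pipeline described above (project to Q/K/V, score, softmax, weighted sum) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the animation code itself: the token count matches the five example tokens, while the embedding size, key dimension, and random projection weights are assumptions standing in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tokens, d_model, d_k = 5, 16, 8        # 5 tokens: "The cat sat on mat"
X = rng.normal(size=(n_tokens, d_model))  # token embeddings

# Projection matrices (random stand-ins for learned weights)
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v       # projections

scores = Q @ K.T / np.sqrt(d_k)           # scaled dot products

# Row-wise softmax: each row is one token's attention distribution
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V                      # weighted sum of values

print(weights.shape, output.shape)        # (5, 5) (5, 8)
print(weights.sum(axis=-1))               # each row sums to (approximately) 1
```

The 5×5 `weights` matrix is exactly what the animation renders as the attention heatmap: row i shows how token i distributes its attention over all five tokens.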

Phases

#  Phase Name          Duration  Description
1  Intro               3s        Title and input tokens displayed
2  Q K V Projection    8s        Token embeddings projected to Q, K, V matrices
3  Attention Scores    10s       Q×K^T dot products shown as a heatmap grid
4  Scale & Softmax     6s        Divide by sqrt(d_k), apply softmax; rows sum to 1
5  Output Computation  8s        Weighted sum with V; output tokens shown
6  Full Formula        6s        Attention(Q,K,V) formula displayed
7  Multi-Head Note     5s        Brief note: multiple heads run in parallel
8  Outro               4s        Summary
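Phase 7's multi-head note can be sketched the same way: each head runs the full attention computation in a lower-dimensional subspace, and the head outputs are concatenated. The head count and dimensions below are assumptions chosen for illustration, not values from the animation.

```python
import numpy as np

rng = np.random.default_rng(1)

def attention(Q, K, V):
    """Scaled dot-product attention: scores, row-wise softmax, weighted sum."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

n_tokens, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads               # each head works in a smaller subspace
X = rng.normal(size=(n_tokens, d_model))

heads = []
for h in range(n_heads):
    # Per-head projections (random stand-ins for learned weights)
    W_q = rng.normal(size=(d_model, d_head))
    W_k = rng.normal(size=(d_model, d_head))
    W_v = rng.normal(size=(d_model, d_head))
    heads.append(attention(X @ W_q, X @ W_k, X @ W_v))

out = np.concatenate(heads, axis=-1)      # heads run in parallel, then concatenate
print(out.shape)                          # (5, 16)
```

Because `n_heads * d_head == d_model`, the concatenated output has the same width as the input, which is why heads can run in parallel without changing the layer's interface.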

Layout

+--------------------------------------------------+
|  Title: Transformer Self-Attention               |
+--------------------------------------------------+
|                                                  |
|  Tokens: [The] [cat] [sat] [on] [mat]            |
|                                                  |
|  Q matrix  K matrix  V matrix  (3 column panels) |
|                                                  |
|  Attention heatmap (5×5 grid):                   |
|  softmax(QK^T / sqrt(d_k))                       |
|                                                  |
|  Output = attn_weights × V                       |
|                                                  |
|  Formula (bottom):                               |
|  Attention(Q,K,V) = softmax(QK^T/√d_k)V          |
+--------------------------------------------------+

Area Descriptions

  • Top: Input token boxes
  • Center left: Q, K, V matrix panels
  • Center right: Attention heatmap
  • Bottom: Formula display

Assets & Dependencies

  • Fonts: LaTeX / sans-serif
  • Manim version: ManimCE 0.19.1

Notes

  • Token boxes colored distinctly (different hues)
  • Attention heatmap uses color gradient from dark (low) to bright (high attention)
  • Show which token attends most to which (highlight max attention in each row)
  • sqrt(d_k) scaling annotated with reasoning (keeps dot-product magnitudes bounded so the softmax does not saturate into near-one-hot rows, where gradients vanish)
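The scaling note can be demonstrated numerically. Without the sqrt(d_k) divisor, dot products of d_k-dimensional vectors grow like sqrt(d_k) in magnitude, pushing the softmax toward a one-hot row where gradients vanish. The base scores and d_k below are illustrative values, not taken from the animation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d_k = 512
base = np.array([0.2, -0.1, 0.5, 0.1, -0.3])  # O(1) "intended" score differences

# Unscaled dot products grow like sqrt(d_k), saturating the softmax:
unscaled = softmax(base * np.sqrt(d_k))  # nearly one-hot -> tiny gradients
# Dividing by sqrt(d_k) recovers logits of order 1:
scaled = softmax(base)                   # smooth distribution

print(unscaled.round(3))
print(scaled.round(3))
```

The heatmap contrast in Phase 4 reflects exactly this: scaled rows stay soft enough that every token's contribution remains visible.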
Audience: Software Engineer
Category: CS