Encoder-Decoder Attention

Cross-attention in which the decoder's queries attend to the encoder's output representations, introduced in the original Transformer for sequence-to-sequence tasks. Each decoder layer attends over the full encoder output, so every generated token can condition on the entire source sequence; it remains standard in modern encoder-decoder models such as T5 and BART.
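A minimal single-head sketch of the mechanism, assuming NumPy and randomly initialized projection matrices (`W_q`, `W_k`, `W_v` are illustrative names, not from any specific library): queries come from decoder states, while keys and values come from encoder outputs.

```python
import numpy as np

def cross_attention(decoder_states, encoder_outputs, W_q, W_k, W_v):
    """Scaled dot-product cross-attention: queries from the decoder,
    keys and values from the encoder outputs."""
    Q = decoder_states @ W_q                      # (T_dec, d_k)
    K = encoder_outputs @ W_k                     # (T_enc, d_k)
    V = encoder_outputs @ W_v                     # (T_enc, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (T_dec, T_enc)
    # Softmax over encoder positions: each decoder position gets a
    # distribution over the source sequence.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (T_dec, d_v)

rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 4, 4
dec = rng.normal(size=(3, d_model))               # 3 decoder positions
enc = rng.normal(size=(5, d_model))               # 5 encoder positions
out = cross_attention(dec, enc,
                      rng.normal(size=(d_model, d_k)),
                      rng.normal(size=(d_model, d_k)),
                      rng.normal(size=(d_model, d_v)))
print(out.shape)  # one context vector per decoder position
```

Note that the output has one row per decoder position regardless of source length, which is what lets the decoder generate a target sequence of a different length than the input.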

In this vault

Backlinks