Attention Mechanism
The transformer operation that lets each token focus on the other tokens most relevant to it.
Self-attention is the core operation of the transformer. For each token, the model computes a weighted average over all tokens in the sequence (including the token itself). The weights are not fixed parameters: they are computed on the fly from learned query and key projections, so each token can pull in whatever else in the input is most useful to it.
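A minimal NumPy sketch of a single attention head, assuming the standard scaled dot-product formulation (function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence x of shape (n, d_model).

    w_q, w_k, w_v are the learned projection matrices; the attention
    weights themselves are recomputed for every input from query-key
    similarity, they are not stored parameters.
    """
    q = x @ w_q                                 # queries, (n, d_k)
    k = x @ w_k                                 # keys,    (n, d_k)
    v = x @ w_v                                 # values,  (n, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (n, n) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v        # each output row is a weighted average of values

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(x, w_q, w_k, w_v)          # shape (4, 8)
```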
Multi-head attention runs many such operations in parallel, each with its own learned projections, then combines the results. This lets a layer attend to multiple kinds of relationships at once: syntactic, semantic, and positional.
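Building on the sketch above, multi-head attention reduces to running several heads and concatenating their outputs. This is a simplified illustration: real implementations use smaller per-head dimensions (typically d_model / num_heads) and apply a final learned output projection, omitted here:

```python
def multi_head_attention(x, heads):
    """heads: a list of (w_q, w_k, w_v) triples, one per head.

    Each head attends with its own projections; outputs are concatenated
    along the feature axis.  (An output projection mixing the heads back
    to d_model, standard in practice, is left out for brevity.)
    """
    return np.concatenate(
        [self_attention(x, wq, wk, wv) for wq, wk, wv in heads],
        axis=-1,
    )

# Toy usage: two heads over the same 4-token sequence.
heads = [tuple(rng.normal(size=(8, 8)) for _ in range(3)) for _ in range(2)]
out = multi_head_attention(x, heads)            # shape (4, 16)
```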
Attention is also the bottleneck that makes long contexts expensive: naive attention is O(n²) in sequence length. Techniques like FlashAttention, sliding-window attention, and KV-cache reuse have made today's massive context windows practical.
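To make the cost argument concrete, here is a hypothetical sketch of the mask behind sliding-window attention: each token attends only to a fixed-width window of predecessors, so work scales with n × window rather than n². (Production kernels skip the masked blocks entirely rather than materializing an n × n mask as done here.)

```python
def sliding_window_mask(n, window):
    """Boolean (n, n) mask: token i may attend to tokens j satisfying
    i - window < j <= i (a causal window).  Masked positions would be
    set to -inf in the score matrix before the softmax.
    """
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(5, 2).astype(int))    # band of width 2 below the diagonal
```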