Context Window
The maximum number of tokens a model can attend to in a single request, including the response.
The context window is the working memory of a language model. Anything outside it is invisible to the model on this request.
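Because the window covers both input and output, a response budget has to be reserved up front. A minimal sketch of that budget check, where the window size and the 4-characters-per-token heuristic are illustrative assumptions (a real tokenizer gives exact counts):

```python
CONTEXT_WINDOW = 128_000   # assumed window size, in tokens
CHARS_PER_TOKEN = 4        # rough heuristic for English text

def estimate_tokens(text: str) -> int:
    """Cheap token estimate; swap in a real tokenizer in production."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_window(prompt: str, max_response_tokens: int) -> bool:
    """True if the prompt plus the reserved response budget fit the window."""
    return estimate_tokens(prompt) + max_response_tokens <= CONTEXT_WINDOW

fits_in_window("hello " * 1000, max_response_tokens=4096)
```

Anything the check rejects must be trimmed, summarized, or retrieved selectively before the request is sent.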
In 2026 most frontier models support 128K–1M token windows. That's enough to fit hundreds of pages of documentation, an entire small codebase, or a full year of email threads. Prompt caching makes large contexts much more affordable by reusing already-processed prefixes.
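The intuition behind prompt caching can be sketched in a few lines: when many requests share a long prefix (say, the same system prompt plus documentation), the expensive prefix processing is done once and reused. The cache key and the `process_prefix` stand-in below are illustrative assumptions, not a real provider API:

```python
import hashlib

_prefix_cache: dict[str, str] = {}

def process_prefix(prefix: str) -> str:
    """Stand-in for the expensive prefill pass over the shared prefix."""
    return f"<state:{len(prefix)} chars>"

def run_request(shared_prefix: str, user_suffix: str) -> tuple[str, bool]:
    """Return (result, cache_hit): reuse the prefix state when it is cached."""
    key = hashlib.sha256(shared_prefix.encode()).hexdigest()
    hit = key in _prefix_cache
    if not hit:
        _prefix_cache[key] = process_prefix(shared_prefix)
    return _prefix_cache[key] + "|" + user_suffix, hit
```

The first request over a given prefix pays full cost; every later request that shares it hits the cache, which is why providers typically price cached prefix tokens far below fresh ones.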
Long context isn't free. Latency, cost, and, perhaps surprisingly, quality can all degrade as the window fills; models often attend less reliably to material buried in the middle of a long context. A common pattern is hybrid: use retrieval-augmented generation (RAG) to retrieve only the relevant chunks instead of stuffing everything into context.
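The hybrid pattern above can be sketched as: score stored chunks against the query and place only the top-k into the prompt. The word-overlap scorer here is an illustrative stand-in for a real embedding-based retriever:

```python
def score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk (toy relevance score)."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

chunks = [
    "The context window includes both input and output tokens.",
    "Prompt caching reuses already-processed prefixes.",
    "Retrieval picks only the chunks relevant to the query.",
]
prompt_context = "\n".join(retrieve("how does retrieval pick chunks", chunks))
```

Only `prompt_context` goes into the request, so the window stays small regardless of how large the underlying corpus grows.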