Drainpipe Knowledge Base
What is an LLM Context Window?
An LLM Context Window is the maximum amount of text, measured in “tokens,” that a large language model (LLM) can consider at any one time when generating a response. It is often described as the model’s “working memory.”
How It Works
- Tokens: LLMs don’t read words; they process “tokens,” which are chunks of text (e.g., words, parts of words, or punctuation). For example, the word “unbelievable” might be broken into three tokens: “un”, “believ”, and “able”. The context window is measured by the total number of these tokens (see the tokenizer sketch after this list).
- Input and Output: The context window must hold the user’s prompt, the ongoing conversation history, and any documents or data the model has been given (such as in a RAG system). The model also uses a portion of this window for the text it is generating.
- The “Working Memory” Limit: If a conversation or a document exceeds the size of the context window, the model starts to “forget” the oldest parts of the input. This can lead to a loss of coherence, as the model may no longer remember key details from earlier in the conversation (see the history-trimming sketch after this list).
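To make the token example concrete, here is a minimal sketch using the open-source tiktoken tokenizer. The exact split depends on the model’s tokenizer, so the fragments printed for “unbelievable” are illustrative rather than definitive:

```python
# Minimal sketch: counting and inspecting tokens with the tiktoken library
# (pip install tiktoken). The exact token split varies by model/encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI models

text = "unbelievable"
token_ids = enc.encode(text)

print(f"{len(token_ids)} tokens")
for tid in token_ids:
    # Decode each token id back into its text fragment.
    print(repr(enc.decode([tid])))
```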
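And here is a hedged sketch of the “working memory” limit in practice: when the accumulated history exceeds the budget, the oldest messages are dropped first. The `count_tokens` helper is a hypothetical stand-in; a real system would count tokens with the model’s own tokenizer.

```python
# Sketch of keeping a chat history inside a fixed token budget (illustrative).

def count_tokens(message: dict) -> int:
    # Hypothetical stand-in: rough estimate of ~4 characters per token.
    # A real implementation would use the model's tokenizer.
    return max(1, len(message["content"]) // 4)

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest messages until the remaining history fits the budget."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # the oldest message is "forgotten" first
    return trimmed

history = [
    {"role": "user", "content": "Summarise the installation guide for me."},
    {"role": "assistant", "content": "It covers prerequisites, setup, and troubleshooting..."},
    {"role": "user", "content": "Now just the troubleshooting steps, please."},
]
print(trim_history(history, max_tokens=20))
```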
Key Implications
- Longer Conversations and Documents: A larger context window allows for more detailed and longer conversations, as well as the ability to summarize or analyze entire books or extensive codebases in a single prompt.
- Cost and Speed: The size of the context window has a direct impact on the computational resources required. A larger window can make the model slower and more expensive to run, as it has to process more information with each turn.
- RAG vs. Context Window: While larger context windows are a significant advancement, they are not a replacement for RAG. RAG allows an LLM to access and ground its responses in a vast amount of data that is far too large to fit into any context window. The context window is where the retrieved information is placed for the LLM to use (see the sketch below).
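As a rough illustration of that last point, the sketch below shows retrieved passages being placed into a prompt that must fit the context window. The `retrieve` function is a hypothetical stand-in for a real vector-store query, and the character budget is illustrative; this is not a production RAG pipeline.

```python
# Illustrative sketch: placing retrieved passages into the context window.

def retrieve(query: str, k: int = 3) -> list[str]:
    # Hypothetical retriever; a real system would query a vector store or search index.
    corpus = [
        "Context windows are measured in tokens.",
        "RAG retrieves relevant passages from an external corpus.",
        "Larger context windows increase compute cost.",
    ]
    return corpus[:k]

def build_prompt(query: str, max_context_chars: int = 2000) -> str:
    """Assemble retrieved passages plus the user question into a single prompt."""
    passages = retrieve(query)
    context = ""
    for p in passages:
        # Stop adding passages once the (illustrative) budget is reached.
        if len(context) + len(p) > max_context_chars:
            break
        context += p + "\n"
    return (
        "Use the following context to answer.\n\n"
        f"{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

print(build_prompt("How does RAG relate to the context window?"))
```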