Drainpipe Knowledge Base
What is an LLM Context Window?
An LLM Context Window is the maximum amount of text, measured in “tokens,” that a large language model (LLM) can consider at any one time when generating a response. It is often described as the model’s “working memory.”
How It Works
- Tokens: LLMs don’t read words; they process “tokens,” which are chunks of text (e.g., words, parts of words, or punctuation). For example, the word “unbelievable” might be broken into three tokens: “un”, “believ”, and “able”. The context window is measured by the total number of these tokens (see the tokenizer sketch after this list).
- Input and Output: The context window must hold the user’s prompt, the ongoing conversation history, and any documents or data the model has been given (such as in a RAG system). The model also uses a portion of this window for the text it is generating.
- The “Working Memory” Limit: If a conversation or a document exceeds the size of the context window, the model starts to “forget” the oldest parts of the input. This can lead to a loss of coherence, as the model may no longer remember key details from earlier in the conversation (see the history-trimming sketch after this list).
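To make the token example concrete, here is a minimal sketch using the open-source tiktoken tokenizer. The exact split depends on the model’s tokenizer, so the fragments printed for “unbelievable” are illustrative rather than definitive:

```python
# Minimal sketch: counting and inspecting tokens with the tiktoken library
# (pip install tiktoken). The exact token split varies by model/encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI models

text = "unbelievable"
token_ids = enc.encode(text)

print(f"{len(token_ids)} tokens")
for tid in token_ids:
    # Decode each token id back into its text fragment.
    print(repr(enc.decode([tid])))
```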
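And here is a hedged sketch of the “working memory” limit in practice: when the accumulated history exceeds the budget, the oldest messages are dropped first. The `count_tokens` helper is a hypothetical stand-in; a real system would count tokens with the model’s own tokenizer.

```python
# Sketch of keeping a chat history inside a fixed token budget (illustrative).

def count_tokens(message: dict) -> int:
    # Hypothetical stand-in: rough estimate of ~4 characters per token.
    # A real implementation would use the model's tokenizer.
    return max(1, len(message["content"]) // 4)

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest messages until the remaining history fits the budget."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # the oldest message is "forgotten" first
    return trimmed

history = [
    {"role": "user", "content": "Summarise the installation guide for me."},
    {"role": "assistant", "content": "It covers prerequisites, setup, and troubleshooting..."},
    {"role": "user", "content": "Now just the troubleshooting steps, please."},
]
print(trim_history(history, max_tokens=20))
```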
Key Implications
- Longer Conversations and Documents: A larger context window allows for more detailed and longer conversations, as well as the ability to summarize or analyze entire books or extensive codebases in a single prompt.
- Cost and Speed: The size of the context window has a direct impact on the computational resources required. A larger window can make the model slower and more expensive to run, as it has to process more information with each turn.
- RAG vs. Context Window: While larger context windows are a significant advancement, they are not a replacement for RAG. RAG allows an LLM to access and ground its responses in a vast amount of data that is far too large to fit into any context window. The context window is where the retrieved information is placed for the LLM to use (see the sketch below).
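As a rough illustration of that last point, the sketch below shows retrieved passages being placed into a prompt that must fit the context window. The `retrieve` function is a hypothetical stand-in for a real vector-store query, and the character budget is illustrative; this is not a production RAG pipeline.

```python
# Illustrative sketch: placing retrieved passages into the context window.

def retrieve(query: str, k: int = 3) -> list[str]:
    # Hypothetical retriever; a real system would query a vector store or search index.
    corpus = [
        "Context windows are measured in tokens.",
        "RAG retrieves relevant passages from an external corpus.",
        "Larger context windows increase compute cost.",
    ]
    return corpus[:k]

def build_prompt(query: str, max_context_chars: int = 2000) -> str:
    """Assemble retrieved passages plus the user question into a single prompt."""
    passages = retrieve(query)
    context = ""
    for p in passages:
        # Stop adding passages once the (illustrative) budget is reached.
        if len(context) + len(p) > max_context_chars:
            break
        context += p + "\n"
    return (
        "Use the following context to answer.\n\n"
        f"{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

print(build_prompt("How does RAG relate to the context window?"))
```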