Every chapter in this series so far has assumed a prompt written entirely by hand, with every token deliberately chosen and placed. The previous chapter showed that the order and quality of those tokens determine the explicit sense frozen at each position in the substrate. But in real-world systems, most of what enters the substrate during prefill was never written by the person who designed the prompt. This chapter examines what happens when a prompt is composed from multiple sources, and why the discipline of managing that composition has become the central engineering challenge in working with large language models.
The Composed Prompt
When you type a message into a chat interface, it feels like you are writing the prompt. You are not. What actually enters the model during prefill is a composed sequence — assembled from multiple sources before your message ever reaches the model.
A system prompt is prepended, defining the model's role, tone, and constraints. Your message is inserted after it. If you are mid-conversation, the previous turns (the chat history) are included as well. In more complex applications, retrieved documents, tool results, fetched data, and even the output of other models may be added. All of this is assembled into a single token sequence by a prompt template — a static structure with placeholders that are filled with dynamic content at runtime. The rendered result is what enters prefill.
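As a rough illustration of that assembly step, the sketch below renders a template with four placeholders using Python's standard `string.Template`. The section markers, the placeholder names (`system_prompt`, `history`, `retrieved`, `user_message`), and the ordering are assumptions made for the example; real chat templates vary by model and framework.

```python
from string import Template

# Hypothetical template: a static structure with placeholders for dynamic content.
PROMPT_TEMPLATE = Template(
    "[system]\n$system_prompt\n"
    "[history]\n$history\n"
    "[retrieved]\n$retrieved\n"
    "[user]\n$user_message\n"
)

def render_prompt(system_prompt, history, retrieved, user_message):
    """Fill the template; the returned string is what would be tokenized for prefill."""
    return PROMPT_TEMPLATE.substitute(
        system_prompt=system_prompt,
        history="\n".join(history),
        retrieved="\n".join(retrieved),
        user_message=user_message,
    )

rendered = render_prompt(
    system_prompt="You are a concise support assistant.",
    history=["user: Hi", "assistant: Hello, how can I help?"],
    retrieved=["Doc 1: Refunds are processed within 5 days."],
    user_message="How long do refunds take?",
)
print(rendered)
```

Only the first argument is authored ahead of time. The other three are whatever the conversation and the retrieval pipeline happen to supply on a given request.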
The user sees a text box and a response. The substrate that produces that response was composed from many sources, each contributing tokens, each producing K and V vectors with the same permanence as everything else in the substrate. The developer controls the template — the structure and the ordering. But the dynamic content that fills it at runtime is selected by pipelines, not written by hand. Its quality is uncertain by nature.
Context Engineering
The practice of deliberately designing what enters the substrate — choosing what to include, where to place it, how to structure it, and what to leave out — has a name. The field has converged on the term context engineering.
Prompt engineering, the earlier discipline, focused on the craft of writing effective instructions. Context engineering expands the scope. It treats the entire composed prompt as an engineering problem: not just what the instructions say, but what sources of context surround them, how those sources are selected and filtered, where they are positioned in the sequence, and how their presence shapes the substrate that the model will read from during generation.
The core insight of context engineering is that the model's context window is a finite resource with diminishing returns. Every token that enters prefill occupies a position in the substrate. Every position produces K and V vectors that persist and compete for attention during decode. Adding more context does not always help. Context that carries strong, relevant signal enriches the substrate. Context that is irrelevant, redundant, or poorly placed dilutes it. The engineering challenge is to compose the prompt so that the substrate is as dense with useful signal as possible.
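One simple way to act on that constraint is to rank candidate context by relevance and admit only what fits a fixed token budget. The sketch below is a minimal version under loud assumptions: the relevance scores are taken as given from some upstream retriever, the function name `select_context` is hypothetical, and tokens are approximated by whitespace-separated words where a real system would use the model's tokenizer.

```python
def select_context(chunks, budget_tokens):
    """Greedily keep the highest-scoring chunks that fit within the token budget.

    `chunks` is a list of (text, relevance_score) pairs. The scores are assumed
    to come from an upstream retriever that is outside this sketch.
    """
    def count_tokens(text):
        # Crude stand-in for a tokenizer: one token per whitespace-separated word.
        return len(text.split())

    selected, used = [], 0
    for text, _score in sorted(chunks, key=lambda c: c[1], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget_tokens:
            selected.append(text)
            used += cost
    return selected, used

chunks = [
    ("Refunds are processed within five business days.", 0.92),
    ("Our office dog is named Biscuit.", 0.05),
    ("Refund requests require an order number.", 0.81),
]
selected, used = select_context(chunks, budget_tokens=13)
print(selected, used)
```

With a budget of 13 word-tokens, the two refund-related chunks fill the budget and the low-scoring chunk is left out. Greedy selection is only one possible policy; it is used here because it is easy to read.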
This is not an abstract principle. It is a direct consequence of the mechanics described throughout this series. The quality of K vectors at each position determines what the model can find. The quality of V vectors determines what information gets delivered. The arrangement of sources within the composed prompt determines the explicit sense at every position. Context engineering is the discipline of controlling these outcomes by designing the composition.
Attention Span
None of this is as foreign as it might sound. Teachers, authors, and speechwriters have always been context engineers — they just never called it that.
A good teacher does not dump an entire semester of material onto the board on the first day. They sequence it. The most foundational concept comes first, because everything that follows will be understood in the context of what was established earlier. They place the critical idea at the moment when the students are most receptive, not buried in the middle of a dense lecture where attention has already drifted. They repeat key points at deliberate intervals — not because the students forgot, but because repetition at the right moment reinforces what matters.
An author structures a chapter the same way. The opening paragraph establishes the frame. The strongest argument is placed where it will carry the most weight. Supporting details are ordered so that each one builds on the last. A skilled author knows that a reader's attention is finite, and that what you place early colors how everything after it is read.
A speechwriter is perhaps the most disciplined practitioner of all. Every word in a speech competes for a listener's limited attention. The opening line sets the tone for everything that follows. The key message is placed where it will land hardest. Transitions are crafted to carry the audience from one idea to the next without losing the thread. A speech that front-loads its weakest material and buries its thesis at the end will fail — not because the content is wrong, but because the audience's attention was spent on the wrong things at the wrong time.
The model's attention mechanism is not human attention. It is a mathematical operation — a dot product between Queries and Keys, followed by a softmax-weighted blend of Values. But the constraint it operates under is remarkably similar: a finite budget that must be spent wisely. Every position in the substrate competes for that budget. What you place first shapes how everything after it is processed. What you bury in the middle may never be attended to when it matters. The intuitions that teachers, authors, and speechwriters have refined over centuries — about ordering, emphasis, and the cost of wasted attention — apply directly to the composition of a prompt.
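The finite budget can be made concrete with a toy calculation. The sketch below computes standard scaled dot-product attention weights for one query against a set of keys, using hand-picked two-dimensional vectors that are purely illustrative. Because softmax weights always sum to one, every additional key takes some share away from the others.

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product scores passed through a softmax; the weights sum to 1."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

query = [1.0, 0.0]
relevant_key = [4.0, 0.0]    # strongly aligned with the query
distractor_key = [2.0, 0.0]  # weakly aligned; stands in for low-value context

few = attention_weights(query, [relevant_key] + [distractor_key] * 3)
many = attention_weights(query, [relevant_key] + [distractor_key] * 30)

print(f"weight on relevant key with 3 distractors:  {few[0]:.3f}")
print(f"weight on relevant key with 30 distractors: {many[0]:.3f}")
```

The relevant key is identical in both runs, yet its share of the attention falls as distractors are added. Real models have many heads and layers and can learn to suppress irrelevant positions, so this is an intuition for the budget rather than a prediction of model behavior.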
The System Prompt: The Foundation of the Substrate
The system prompt is the first component in nearly every composed prompt. It occupies the earliest positions in the token sequence, which means every token that follows is computed having already seen it. Its own K and V vectors are shaped only by the system prompt itself: each position sees its own token and the system prompt tokens that precede it. In return, the system prompt shapes the explicit sense of everything that comes after.
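This is not a minor positional advantage. It is the strongest possible position in the substrate. Tokens at the beginning of the sequence influence the K vectors of every subsequent token at every layer after the first, because each of those layers computes its K vectors from hidden states that attention in the layers below has already mixed with earlier positions. The system prompt is the foundation on which the rest of the substrate is built.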
A well-designed system prompt establishes the core signals that should persist throughout generation: role, tone, constraints, output format, and behavioral boundaries. These signals become part of the explicit sense at every downstream position. When the model later encounters a user message or a retrieved document, the K and V vectors at those positions are computed in the context of the system prompt's influence. The system prompt does not merely instruct the model. It colors the substrate.
Why Position Within the System Prompt Matters
Because the system prompt itself is processed sequentially during prefill, the arrangement of content within it follows the same mechanical rules that govern the rest of the substrate. Tokens that appear early in the system prompt shape the explicit sense of tokens that appear later. The internal ordering is not cosmetic.
Consider two versions of the same system prompt. In the first, the role definition comes first, followed by constraints, followed by output format instructions. In the second, the output format instructions come first, followed by constraints, followed by the role definition.
In the first version, the role definition colors the K vectors of the constraint tokens and the format tokens. The constraints are understood in the context of the role. The format instructions are understood in the context of both the role and the constraints. Each layer builds on the one before it, and the explicit sense at each position is enriched by everything preceding it.
In the second version, the format instructions are computed first, with no role context and no constraints. Their K vectors advertise output structure in isolation. The constraints that follow are computed having seen only format details, not the role they serve. The role definition arrives last, occupying the position with the richest preceding context but unable to reach back and reshape the K vectors at the earlier positions where format and constraints were already frozen.
The tokens are identical. The information is the same. But the substrate is different, because the explicit sense at each position was shaped by a different preceding context. The first arrangement produces a coherent substrate where each component reinforces the next. The second produces a fragmented one where early positions lack the context that would have given them precise meaning.
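The two arrangements can be written down as a small composition sketch. The section texts and the helper name `build_system_prompt` are hypothetical and chosen only to make the comparison concrete.

```python
# Hypothetical system prompt sections, keyed by purpose.
SECTIONS = {
    "role": "You are a senior tax advisor for small businesses.",
    "constraints": "Never give legal advice. Cite the relevant form number.",
    "format": "Answer in at most three bullet points.",
}

def build_system_prompt(order):
    """Join the named sections in the given order, one per paragraph."""
    return "\n\n".join(SECTIONS[name] for name in order)

role_first = build_system_prompt(["role", "constraints", "format"])
format_first = build_system_prompt(["format", "constraints", "role"])

# The same set of sections appears in both prompts...
same_content = sorted(role_first.split("\n\n")) == sorted(format_first.split("\n\n"))
# ...but the rendered sequences differ, so each section has a different prefix.
same_sequence = role_first == format_first
print(same_content, same_sequence)
```

Keeping sections as named pieces and passing the order explicitly makes the ordering a visible, testable decision instead of an accident of how the prompt file was last edited.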
This is not conjecture. Empirical studies have repeatedly found that input ordering affects LLM performance, with content near the start of the context often receiving disproportionate influence. This primacy effect appears across many models, which suggests it stems from the transformer architecture rather than from the quirks of any one model. The mechanical explanation is the one described throughout this series: causal attention means earlier tokens shape the K vectors of later tokens, but not the reverse.
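The one-directional nature of causal attention can be checked with a toy single-layer example. The sketch below is deliberately simplified: queries, keys, and values are all the raw input vectors (identity projections, an assumption made for brevity), so the only thing it demonstrates is the mask. In a real model, the outputs computed here would feed the next layer, which is where the K and V vectors of later positions pick up the influence of earlier tokens.

```python
import math

def causal_attention(xs):
    """Single-head causal self-attention over a list of vectors.

    Toy assumption: queries, keys, and values are the raw inputs themselves.
    """
    dim = len(xs[0])
    outputs = []
    for i, query in enumerate(xs):
        visible = xs[: i + 1]  # causal mask: position i sees positions 0..i only
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return outputs

base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed_last = [[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0]]
changed_first = [[0.0, 2.0], [0.0, 1.0], [1.0, 1.0]]

out_base = causal_attention(base)
out_last = causal_attention(changed_last)
out_first = causal_attention(changed_first)

# Editing the last token leaves every earlier position untouched.
later_edit_is_invisible = out_last[:2] == out_base[:2]
# Editing the first token changes the positions that come after it.
earlier_edit_propagates = out_first[1] != out_base[1] and out_first[2] != out_base[2]
print(later_edit_is_invisible, earlier_edit_propagates)
```

Both checks come out true: a change at the end of the sequence cannot reach backward, while a change at the start reaches every later position.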
This is why experienced practitioners treat system prompt design as an exercise in deliberate ordering, not just content selection. The choice of what to say first is a choice about what will color everything else. It is the earliest and most consequential composition decision in the entire prompt.
The System Prompt as Controlled Context
There is one property of the system prompt that distinguishes it from every other source in a composed prompt: the developer controls it completely.
The system prompt is authored — written, tested, and refined by the person who designs the application. Its tokens are known in advance. Its structure is deliberate. Every word can be chosen for the specific K and V vectors it will produce during prefill. This is the one component of the composed prompt where the developer has full control over signal quality.
Every other source of context that enters the substrate — conversation history, retrieved documents, tool output, user messages — is dynamic. It changes with every request. It is selected or generated by pipelines, not written by hand. The developer designs the pipeline but does not control the specific tokens it produces. The signal quality of dynamic context is uncertain by nature.
This is why the system prompt carries an outsized responsibility. It is the one place in the composition where the developer can guarantee signal quality. The explicit sense it establishes at the earliest positions in the substrate becomes the contextual foundation that every dynamic source will be computed against. If the system prompt is precise, the dynamic content that follows inherits a strong contextual field. If the system prompt is vague or poorly ordered, the dynamic content enters a weaker field, and its own K and V vectors are correspondingly less precise.
The system prompt is the foundation. Everything else is built on top of it.
