The previous chapter described inference as a mechanical process — prefill builds the KV substrate, decode reads from it one token at a time. That description is accurate, but it leaves a question unanswered: what is the substrate actually doing during generation, beyond serving as a lookup table for faster attention?
The answer matters because it determines how to think about every prompt you will ever write. The substrate is not a storage trick. It is the medium in which the prompt's meaning becomes computationally active. And what the prompt's meaning amounts to, once you look closely, is intent.
What the Substrate Carries
The substrate does not hold the prompt. It holds the prompt as processed. The difference is substantial. A prompt on its own is a sequence of tokens — inert, uninterpreted, waiting to be read. The substrate is what the model has made of those tokens after every layer of the transformer has processed them: a structured representation distributed across every Key and Value vector at every layer and every position. The substrate is not a byproduct of inference. It is the form on which inference operates.
Intent Is What the Prompt Encodes
If the substrate is the computationally active form of the prompt's meaning, then the prompt is where meaning gets introduced. Nothing reaches the substrate except through the prompt. The model's weights provide capability — the trained ability to process language, to follow instructions, to maintain tone, to reason. But capability is not purpose. What the model should do on any given call is determined entirely by what was written into the prompt.
What the prompt encodes, when you look at what it is really doing, is intent.
Intent is the purpose behind the prompt — the job it is asking the model to do, the direction it is pointing the generation in. Every prompt carries intent, whether the person writing it thinks in those terms or not. Asking a question carries the intent of getting an answer. Requesting a summary carries the intent of receiving condensed information in a particular form. Establishing a role — "you are an expert analyst" — carries the intent of shaping how every request that follows gets handled. A prompt without intent is a prompt that is not asking for anything.
Intent is not a separate thing that rides alongside the words of a prompt. It is what the words of a purposeful prompt are. When you write "summarize this document in three bullet points," the intent is not hidden behind the instruction — the instruction is the intent, rendered in language. When the substrate is built from that prompt, the intent is what gets compiled into the Key and Value vectors across every layer. The substrate carries the intent because the substrate is made of the prompt, and the prompt was made of intent.
Why This Framing Matters
Mechanically, the generation is driven by the attention mechanism — Queries reading Keys, weighting Values, passing through the feed-forward network, projecting to logits, sampling. That is the causal chain, and it is real. The model has no separate circuit that reads purpose.
But the attention mechanism operates on the substrate, and the substrate was shaped by the prompt. What gets weighted strongly, what gets weighted weakly, what flows forward to shape the next token — all of it depends on what the prompt composed into the substrate during prefill. And what the prompt composed into the substrate is the intent.
So both statements are true, and they are not in tension. The attention mechanism drives the generation. Intent drives the generation, through the substrate that the attention mechanism reads from. The first is the mechanical account. The second is the account that matters for anyone trying to get useful output from the model.
A reader who only understood the mechanical account would be left thinking: the model attends to whatever it attends to, and I hope it attends well. A reader who understands the intent account knows that attention will do its work on whatever substrate the prompt produces, and therefore the prompt is where the leverage is. The model cannot do better than the intent the substrate carries. If the intent is clear, the substrate can support a clear generation. If the intent is muddled, the substrate carries that muddle into every attention read that follows.
This is the framing the rest of the series rests on. The mechanics are the how. Intent is the what.
