Explicit and Implicit Sense: What Is Frozen at a Position and What Must Be Reconstructed

In the previous chapter, we established that the K vector at a token’s position is not a single object but a full column of K vectors through the depth of the substrate — one K vector per layer, each representing a different degree of refinement, all shaped during prefill by every token that preceded it. We also saw that when preceding context is strong, the entire column is sharp and precise, and when preceding context is weak, the entire column remains ambiguous. This chapter gives those two conditions a name — and explains what the model does when it must recover meaning that was never frozen in a single position.

Explicit Sense

Every token position in the KV substrate carries an explicit sense — the frozen content of its K and V vectors, across every layer of the substrate. As established in the previous chapter, this is not a single vector but a full column through the depth of the substrate, each layer holding a different degree of refinement. Once computed, none of it changes.

If the preceding context was strong, the explicit sense is sharp and precise.

In the sentence “A dry log crackled and split as the campfire grew,” the K vector for “log” was computed having seen only “A dry.” The word “campfire” arrives later and cannot reach back to that position. Even so, the explicit sense at “log” is already clear: a piece of timber.

If the preceding context was weak, the explicit sense is ambiguous.

In the sentence “The log crackled and split in the heat,” the K vector for “log” was computed having seen only “The.” The explicit sense at that position is unresolved — it could be timber, a system file, a logarithm, a ship’s record.

In both cases, the explicit sense is frozen. It is whatever it is. Strong or ambiguous, sharp or vague, it does not change after the moment of computation. The K and V vectors at that position are set.
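The two cases above can be imitated with a deliberately tiny toy. This is a sketch under loud assumptions, not how a real model computes K and V: the two-dimensional "meaning" vectors in TOY_EMBED are hand-picked, and explicit_sense is a hypothetical stand-in that simply averages the prefix. What it shares with the real mechanism is the one property this section depends on: a position is computed only from the tokens up to and including itself, and is never revised afterward.

```python
# Hypothetical 2-d "meaning" vectors: (timber, software). Hand-picked, not learned.
TOY_EMBED = {
    "the": (0.0, 0.0),
    "a": (0.0, 0.0),
    "dry": (1.0, 0.0),
    "log": (0.5, 0.5),       # ambiguous on its own
    "crackled": (1.0, 0.0),
}


def explicit_sense(tokens, i):
    """Toy stand-in for the frozen content at position i.

    Assumption: a plain average over tokens[0..i]. A real model uses learned
    attention over many layers, but shares the property that position i never
    sees tokens after i.
    """
    prefix = [TOY_EMBED[t] for t in tokens[: i + 1]]
    return tuple(sum(v[d] for v in prefix) / len(prefix) for d in range(2))


def prefill(tokens):
    """Build the toy substrate once; the returned tuple cannot be mutated."""
    return tuple(explicit_sense(tokens, i) for i in range(len(tokens)))


weak = prefill(["the", "log"])
weak_extended = prefill(["the", "log", "crackled"])
strong = prefill(["a", "dry", "log"])

print(weak[1])           # "log" after weak context: timber and software tie
print(weak_extended[1])  # identical: the later token "crackled" changes nothing here
print(strong[2])         # "log" after "a dry": leans toward timber
```

In the toy, "crackled" would resolve the ambiguity, but it arrives too late to alter what was stored at "log". That is the sense in which explicit sense is frozen.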

The Triad: K, Q, and V

To understand how the model recovers from ambiguity during decode (the generation phase), the full attention mechanism needs to be visible — not just the K vector, but the three vectors that work together at every layer, every time a new token reads from the substrate.

K (Key): the advertisement. Each position’s K vector declares what that token carries, given everything that preceded it. It is the signal that says: this is what I contain, this is what I’m about.
Q (Query): the question. The currently decoding token generates a Q vector that expresses what it needs. It is the signal that says: I am looking for information about X.
V (Value): the payload. Each position’s V vector carries the content that gets delivered when attention selects it. If K is the label on the envelope, V is the letter inside.

Here is how they collaborate. The Q vector from the current token is compared against the K vector at every position in the substrate. The comparison is a dot product, and it produces a raw score for each position: how relevant is what that position advertises to what the current token needs? The scores are then scaled and passed through a softmax, which turns them into attention weights that sum to one. Positions whose K vectors closely match the Query receive high weight. Positions that don’t match receive low weight¹.

But K vectors do not deliver the information. They select it. The information that actually flows forward to shape the next token prediction comes from the V vectors — blended together according to those attention weights. K determines relevance. V delivers content. Q asks the question.
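The collaboration described above can be sketched for a single Query. This minimal example assumes the standard scaled dot-product form of attention (dot-product scores divided by the square root of the vector length, then a softmax). The function names attend and softmax, and the hand-picked two-dimensional vectors, are illustrative assumptions rather than anything taken from a particular model.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to one."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """One Query reads the substrate: Q.K picks positions, V supplies content."""
    scale = math.sqrt(len(query))
    # K determines relevance: one scaled dot-product score per cached position.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # V delivers content: a weighted sum of the value vectors.
    width = len(values[0])
    blend = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
    return weights, blend


# Hypothetical hand-picked vectors for two cached positions.
keys = [(1.0, 0.0), (0.0, 1.0)]
values = [(10.0, 0.0), (0.0, 10.0)]
query = (4.0, 0.0)  # points the same way as the first position's key

weights, blend = attend(query, keys, values)
print(weights)  # most of the weight lands on the first position
print(blend)    # so the blend is dominated by the first position's value
```

Note that the keys never appear in the output. They only decide how much of each value vector enters the blend, which is the envelope-and-letter division of labor in miniature.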

Implicit Sense

With the full triad in view, the model’s handling of ambiguity becomes clear.

When the explicit sense of “log” is ambiguous, because it appeared before any disambiguating context, the downstream tokens that followed carry their own K and V vectors. “Crackled,” “split,” and “heat” were each computed having seen “log” plus every token that came before them, so “heat” has seen “crackled” and “split,” but not the reverse. Their representations carry the contextual resolution that the K vector at “log” itself lacks.

During decode, the attention mechanism reads across the full substrate. The Q vector from the currently decoding token expresses what it needs. That Query is compared against K vectors at every position. Where it finds strong matches — positions whose K vectors advertise relevant content — the corresponding V vectors deliver their payload into the blend.

Critically, the resolution of “log” does not need to come from the “log” position itself. The downstream tokens — “crackled,” “split,” “heat” — advertise through their K vectors that they carry information about physical combustion. When the Query is looking for the meaning of the scene, those downstream K vectors match, and their V vectors deliver the signal that resolves the ambiguity. The model assembles a working understanding of what “log” means not from the V vector at the “log” position alone, but from the collective pattern of V vectors read from across the substrate, selected by K vectors that matched the Query.

This assembled understanding is implicit sense. It is not stored in any single position. It is not frozen in any vector. It is reconstructed dynamically, every time a Query attends to the substrate. It exists only in the blend — in the weighted sum of V vectors, selected by the Q·K matchmaking process, drawn from positions that together carry a meaning that no single position contains on its own.
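A toy read over the sentence “The log crackled and split in the heat” shows how such a blend can carry a meaning that the “log” position does not. Everything here is an assumption made for illustration: the two dimensions (combustion, software), the hand-picked vectors, the simplification that each position's K and V are the same vector, and the Query that happens to ask about combustion.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one Query (standard form, toy sizes)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(values[0])
    blend = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
    return weights, blend


# Hypothetical frozen vectors; dimensions are (combustion, software).
# Toy simplification: each position's K and V are the same vector.
substrate = {
    "The": (0.0, 0.0),
    "log": (0.5, 0.5),       # ambiguous: frozen before any resolving context
    "crackled": (1.0, 0.0),  # computed after "log", leans toward combustion
    "split": (1.0, 0.0),
    "heat": (1.0, 0.0),
}
tokens = list(substrate)
keys = values = [substrate[t] for t in tokens]

query = (3.0, 0.0)  # hypothetical Query asking about the physical scene
weights, implicit_sense = attend(query, keys, values)

for token, weight in zip(tokens, weights):
    print(f"{token:>9}: {weight:.2f}")
print("blend:", implicit_sense)
```

The vector stored at “log” is still an even split between the two readings. The blend is not: it leans strongly toward combustion, because the weight fell on downstream positions. Nothing in the substrate was modified to get there, and the blend exists only for the duration of this one read.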

Enrichment: When Context Does More Than Disambiguate

When preceding context is strong, the explicit sense at a token’s position carries more than simple disambiguation. It carries enrichment.

“I opened the server log and found the error immediately.”

“The server” does not merely tell the model that “log” is a system file. Together with “I opened,” it establishes a technical context, a deliberate operation, a computing environment. The K vector for “log” advertises not just “system event file” but “a file being accessed by someone for a purpose within a technical workflow.” The V vector at that position carries correspondingly rich content, and when a future Query selects it, all of those associations are delivered in a single read.

“A dry log crackled and split as the campfire grew.”

“A dry” does not merely tell the model that “log” is timber. It tells the model this is a particular kind of timber — desiccated, ready to burn, the kind that cracks and sparks. The K vector advertises this enriched meaning. The V vector carries it. Both were shaped during prefill by the relationships between the preceding tokens, folding in associations that no single preceding token contributed on its own.

This enrichment happens because attention during prefill does not merely disambiguate — it synthesizes. At each layer, attention takes in the relationships between all tokens processed so far and folds those relationships into the K and V vectors at each position at that layer’s degree of refinement. The result is that a single position carries meaning computed from the interactions of many tokens, at every level of depth in the substrate. The explicit sense is not just disambiguated. It is dense with associations, from the shallowest degree of refinement to the deepest.
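The layer-by-layer folding described above can be sketched with a toy causal attention stack. The assumptions are heavy and deliberate: no learned projections (each position's Q, K, and V are all just its current state), a bare residual add, no normalization, and hand-picked two-dimensional vectors whose dimensions stand for (timber, software). The names causal_layer and prefill_column are hypothetical.

```python
import math


def causal_layer(states):
    """One toy attention layer: position i mixes in positions 0..i only."""
    out = []
    for i, q in enumerate(states):
        visible = states[: i + 1]  # causal mask: nothing after position i
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in visible]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        mixed = [
            sum(e / total * v[j] for e, v in zip(exps, visible))
            for j in range(len(q))
        ]
        # Residual add: the layer refines the state instead of replacing it.
        out.append(tuple(a + b for a, b in zip(q, mixed)))
    return out


def prefill_column(inputs, n_layers):
    """Return column[layer][position]; layer 0 holds the raw inputs."""
    column = [list(inputs)]
    for _ in range(n_layers):
        column.append(causal_layer(column[-1]))
    return column


DRY, SERVER, LOG = (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)  # hypothetical embeddings

campfire = prefill_column([DRY, LOG], n_layers=2)
server = prefill_column([SERVER, LOG], n_layers=2)

for layer in range(3):
    print(layer, campfire[layer][1], server[layer][1])
```

At layer 0 the two “log” states are identical, since the raw token is the same. From layer 1 onward they diverge, each leaning toward the context that preceded it, and the lean carries through the depth of the column. The magnitudes grow only because the toy omits normalization.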

Why This Matters for Prompt Design

The distinction between explicit and implicit sense has a direct and practical consequence: front-loading context produces strong explicit sense — concentrated, precise, encoded directly in the full column of K and V vectors at the token position that matters, at every degree of refinement. Relying on downstream context forces the model to reconstruct meaning through implicit sense — distributed across multiple positions, assembled dynamically by the Q·K·V mechanism during decode.

Both work. But they are not equal.

When explicit sense is strong, the model reads meaning directly from where it lives. The K vector at that position accurately advertises what the token carries, and the V vector delivers rich, concentrated content that the Query reads cleanly.

When explicit sense is weak, the model must do additional work — the Query must search across many positions, weighting K vectors at downstream locations, assembling understanding from V vectors scattered across the substrate. The result is often correct, but the signal is distributed rather than concentrated. Ambiguity at a key position creates noise that the model must work around rather than build upon.

This is why the words you choose, and the order you place them in, determine not just what the model reads — but what it understands. Every token you write shapes the explicit sense of every token that follows it. And the richer that explicit sense is, the less reconstruction the model must perform to arrive at the meaning you intended.

The Q, K, and V vectors perform the same work in both prefill and decode: driving attention, refining the residual stream, and building the substrate layer by layer. What differs is what happens after. During prefill, the prompt tokens are already given, so no token is selected at those positions and the pipeline’s work there ends once the substrate is built; only the final prompt position needs to reach the exit, to produce the first generated token. During decode, every step continues through the language model head to produce logits, probabilities, and a selected token. It is the same forward pass, with one phase largely bypassing the exit and the other depending on it. This is what makes your prompt permanent: every token you force into the substrate during prefill is processed but never questioned, and everything the model generates afterward is shaped by what you chose to put in the prefill.

In the next chapter, we turn from how the substrate is built to what gets built into it. The previous chapters assumed a prompt written entirely by hand — every token deliberately chosen and placed. But in real-world systems, much of what enters the substrate during prefill is content the designer never wrote. Retrieved documents, fetched data, previous model output — all inserted into a prompt template at runtime and compiled into the substrate with the same permanence as everything else. The next chapter examines what happens when external content enters the substrate through a template placeholder, and why the position and structure of that entry point determines whether the result is enrichment or noise.

1. The separation between selecting and delivering is not just a pedagogical convenience. Anthropic’s mechanistic interpretability research has formally decomposed each attention head into two independent circuits: a QK circuit that determines which positions receive attention, and an OV circuit that determines what information flows when they do. The individual Q, K, and V vectors are intermediate products within these two circuits, but the separation between finding and delivering is architecturally real. See “A Mathematical Framework for Transformer Circuits” (Elhage et al., 2021).