Prompt Templates: Shaping the KV Substrate


Ben Um • April 9, 2026

In the previous chapters we saw how the prefill phase builds the KV substrate: the Key and Value vectors computed for every prompt token at every layer, which each token generated afterward attends to. The quality and structure of this KV substrate largely determines how consistent, reliable, and useful the model’s output will be.

This raises an important question: Is there a systematic way to shape the KV substrate before generation even begins? The answer is yes — and that method is the use of prompt templates.

The Origin of Prompt Templates

When large language models first became widely accessible in 2020–2021, prompting was largely ad hoc. Users would write instructions and requests directly, occasionally adding a few examples. While this often produced impressive results, the outputs were highly variable. The same request could yield different formats, tones, or levels of detail depending on small changes in wording.

Over time, it became clear that the prefill phase — during which the entire prompt is processed in parallel — is extremely sensitive to structure. The exact arrangement and wording of the input has a major impact on the resulting KV substrate. Practitioners began to treat prompts not as one-off messages, but as something that could be deliberately designed and reused.

By late 2021 and into 2022, two important developments crystallized the value of prompt templates:

  1. The demonstrated power of few-shot prompting (shown at scale by GPT-3 in 2020), which established that a handful of clear input–output examples in the prompt could strongly guide the model’s behavior.
  2. The practical needs of real-world applications, where inconsistent output formats made automation difficult and unreliable.

Developers started creating reusable prompt “skeletons” with placeholders for variable content. What began as simple string substitution evolved into structured templates supporting examples, constraints, and output specifications. This approach was later formalized in libraries such as LangChain (first released in late 2022), and the term “prompt template” became standard.
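To make the idea of a reusable skeleton concrete, here is a minimal Python sketch of a few-shot template. It assumes a plain-text layout of our own choosing (instruction, numbered examples, output specification, then the variable query); the names `Example` and `build_few_shot_prompt` are hypothetical and do not come from LangChain or any other library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One input-output demonstration for a few-shot prompt (hypothetical type)."""
    input_text: str
    output_text: str


def build_few_shot_prompt(
    instruction: str,
    examples: list[Example],
    output_spec: str,
    query: str,
) -> str:
    """Render a fixed skeleton: instruction, demonstrations, output spec,
    and finally the variable query followed by an open "Output:" line."""
    lines = [instruction.strip(), ""]
    for i, ex in enumerate(examples, start=1):
        # Each demonstration uses the same Input/Output labels as the final query.
        lines += [f"Example {i}", f"Input: {ex.input_text}", f"Output: {ex.output_text}", ""]
    lines += [output_spec.strip(), "", f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each input as positive, negative, or neutral.",
    examples=[
        Example("The update fixed every crash I had.", "positive"),
        Example("Support never answered my ticket.", "negative"),
    ],
    output_spec="Answer with a single lowercase word.",
    query="The package arrived on Tuesday.",
)
print(prompt)
```

Only the `query` argument changes between calls. The instruction, examples, and output specification are identical every time, which is what makes the rendered prompt, and therefore its prefill, consistent from one request to the next.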

The core insight was this: instead of hoping for a good KV substrate by chance, we could design the prompt in advance to reliably produce a higher-quality substrate every time it is rendered.

Mechanics of a Productive Template

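A prompt template is a static piece of text containing placeholders that are replaced with actual values at runtime. Common placeholder syntaxes include Mustache-style {{variable}} tags; Jinja2, which uses the same {{ variable }} form and adds control structures such as {% for %} loops and {% if %} conditionals; and language-native string formatting, such as Python’s str.format with single-brace {variable} fields.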
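The {{variable}} form is simple enough to render without a templating library. The following is a minimal Python sketch using only the standard library. `render` is a hypothetical helper, and treating a missing value as an error is a design choice of this sketch, not standard Mustache behavior (Mustache renders a missing variable as an empty string).

```python
import re

# Matches {{name}} with optional inner whitespace, e.g. {{ user_text }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every {{name}} placeholder with values[name].

    Raises KeyError if a placeholder has no value, so a missing variable
    fails loudly instead of leaving a literal "{{name}}" in the prompt.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        # A replacement returned from a function is inserted literally.
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


skeleton = "Text: {{user_text}}\n\nReturn ONLY a JSON object."
print(render(skeleton, {"user_text": "The battery lasts two days."}))
```

Because the pattern matches only double-brace tags, single braces pass through unchanged, so a literal JSON object inside a template is left intact. With Python’s str.format, those literal braces would have to be escaped as {{ and }}.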

Here is a simple example of a productive template:

You are an expert analyst. Analyze the following text and return your response in valid JSON format only.

Text: {{user_text}}

Instructions:
- Identify the main sentiment
- Extract up to 4 key points
- Rate confidence from 0.0 to 1.0
- Summarize the text in one short sentence

Return ONLY a JSON object with the following structure:
{
  "sentiment": "positive|negative|neutral",
  "confidence": number,
  "key_points": ["point 1", "point 2", ...],
  "summary": "one short sentence"
}
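Because the template pins down an exact output structure, a program can check each response mechanically. The Python sketch below assumes the model returns only the JSON object (no surrounding prose or code fences) and enforces the constraints the template states: one of three sentiment labels, a confidence between 0.0 and 1.0, at most four key points, and a string summary. `validate_analysis` is a hypothetical name, and rejecting extra keys is this sketch’s own strictness choice.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
EXPECTED_KEYS = {"sentiment", "confidence", "key_points", "summary"}


def validate_analysis(raw: str) -> dict:
    """Parse a model reply and check it against the example template's schema.

    Returns the parsed object, or raises ValueError naming the first violation.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"expected keys {sorted(EXPECTED_KEYS)}, got {sorted(data)}")

    sentiment = data["sentiment"]
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")

    confidence = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")

    points = data["key_points"]
    if not isinstance(points, list) or len(points) > 4:
        raise ValueError("key_points must be a list of at most 4 items")
    if not all(isinstance(p, str) for p in points):
        raise ValueError("every key point must be a string")

    if not isinstance(data["summary"], str):
        raise ValueError("summary must be a string")

    return data


reply = (
    '{"sentiment": "positive", "confidence": 0.85, '
    '"key_points": ["long battery life", "fast charging"], '
    '"summary": "The reviewer is pleased with the battery."}'
)
result = validate_analysis(reply)
print(result["sentiment"], result["confidence"])
```

When ValueError is raised, a caller might resend the rendered prompt or route the item for manual review; either way, malformed output is caught before it reaches downstream code.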

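The analyst template illustrates several mechanical properties that effective templates tend to share: a fixed role statement (“You are an expert analyst”), a clearly labeled slot for the variable input (Text: {{user_text}}), explicit, enumerated instructions, and an exact specification of the output format.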

When a template is rendered and passed to the model, the resulting KV substrate is no longer accidental. It has been deliberately shaped by the structure of the template.

Implications: Creating Outputs with Discrete Intent

The most powerful aspect of prompt templates is their ability to define and reliably produce outputs with discrete intent.

Without a template, the model’s output is drawn from a broad probability distribution over possible responses. A well-designed template narrows this distribution significantly by embedding strong structural signals directly into the KV substrate during prefill. The decode phase then operates on a context that is already biased toward a specific category of response with clear, unambiguous intent.

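Well-crafted templates make it possible to create outputs that follow a predictable format, fall within a well-defined category of response, and can be consumed directly by downstream code, the kind of automation that inconsistent formats had made unreliable.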

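Because templates act during the prefill phase, they influence the KV substrate at the earliest possible point in the inference process. Every token generated during decoding attends to the Key and Value vectors computed from the template, so its structure conditions each step of generation. This often gives templates more leverage than trying to enforce structure later during decoding.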

Templates do not eliminate the model’s reasoning ability or creativity. Instead, they constrain the form that the output takes. In doing so, they transform an open-ended generation process into something closer to a repeatable, well-defined output with discrete intent — precisely what most practical applications require.

This is why prompt templates became one of the earliest and most fundamental techniques in serious LLM development. They are not just a labor-saving device; they are a practical method for deliberately shaping the KV substrate and, by extension, the entire inference process that follows.