Are Monoliths the Answer?: Simple Circuit Topologies Might Be

Ben Um · March 20, 2026

Every frontier language model today follows the same approach: build one giant model, give it one unified latent space, train it on everything, and let emergence handle the rest. Grok 4.2, Claude 4 Opus, GPT-o3, Gemini 3, DeepSeek V4, Llama 4 — they are all monoliths. Scale the weights, scale the data, unify the representation, and capabilities will appear. It has worked spectacularly.

But scale has conceptual limits, not just compute limits. When every primitive — wild resonance, guarded moderation, kernel storage, salience routing — must live in the same high-dimensional manifold, they fight for representational real estate. Divergence and convergence interfere in the same gradients. Inventing a genuinely new reasoning primitive requires retraining the entire monolith. Interpretability becomes reverse-engineering an overpressured space rather than intentional design.

The Hidden Tax of the Monolith

A single latent space creates pressure:

Dimensional crowding — high-capacity directions get reused for too many roles (polysemanticity returns).
Gradient interference — training signals for creative divergence fight signals for tight coherence.
No clean modularity — new devices cannot be plugged in; everything must be retrained.
Diminishing analogical distance — models excel at interpolating known patterns, struggle with inventing new proportional mappings.

The monolith was the fastest path to emergence. It may no longer be the fastest path to open-ended discovery.

Topology Over Monolith

What if we stop trying to unify everything into one latent space and instead build reasoning as a circuit topology — discrete reasoning devices (models or sub-models), each with its own heterogeneous latent regime, wired together in feedback loops, forward paths, and guarded couplings?

Each device specializes in one primitive:

Hallucination Device — high-variance resonance generator, loose constraints, its own latent space tuned for distant kernel collisions.
Moderator Device — coherence curator, guarded attention, its own space tuned for proportional fidelity and safety.
Kernel Store Device — stable attractor bank, persistent memory.
Attention Router Device — salience allocator, maintains the Goldilocks vigilance band.
Feedback Mixer Device — closes loops, couples refined output back to seeds.

These devices do not need to share one latent space. They only need standardized semantic interfaces: text tokens for coarse coupling, KV cache snippets or lightweight adapters for fine-grained exchange. Reasoning emerges from the wiring — not from representational unification.

This is how analog circuits scale: you don't redesign the entire schematic when you invent a new diode. You add it to the topology, adjust a bias resistor, and the circuit gains new behavior. The same logic applies here.

Simple Circuits First

We can prototype this today with open models:

Fine-tune one Llama-3.1-70B with relaxed objectives (high temperature, diversity loss) → Hallucination Device.
Use Grok-4.2 or Claude-4 as the Moderator Device (coherence & safety alignment).
Connect them in a loop: Hallucination generates 5–20 wild candidates → Moderator scores, prunes, refines → refined kernels feed back as new seeds.
Use text + lightweight cross-attention adapters or reranker for semantic flow.

At this micro/meso scale the topology already delivers:

Micro precision: autoregressive token generation inside each device.
Meso creativity: codified ideas (kernels, analogies) flowing between devices in guarded loops.

No trillion-parameter retrain required. Just intentional specialization and wiring.

A Fractal Horizon

The pattern is self-similar. The same resonance–moderation–crystallization loop that happens inside a device can repeat between devices — and potentially between entire topologies at macro scale. Macro flow — topologies talking to topologies, evolving their own architecture — is the yet-to-be-defined frontier. But we don't need to solve that today. Simple 2–3 device circuits are enough to start.

From Brick to Circuit Board

The series began with a SwiftUI value feeling like a brick — ungrounded, abstract. It ends with a different question: what if we stop trying to understand the brick and start building circuits instead?

Monoliths gave us scale through unification. Topology may give us invention through a different kind of unification — a unified modular approach where discrete reasoning devices, each with its own local latent regime, are wired together thoughtfully. The monolith unified by compression into one manifold. The circuit unifies by connection across many. Both are unified. Only one is modular enough to keep inventing.

The circuit is open. The next device is waiting to be wired in.