Harnessing Introspection: Crowdsourcing LLM Descriptions of the Latent Domain of Analogy Emergence

Ben Um · March 18, 2026

Notes on this chapter – Added Later
This chapter is being left exactly as it was originally written, warts and all. It is a deliberate, unrefined snapshot of an early exploratory direction — an example of thinking that felt promising at the time but was later recognized as flawed and incomplete.

The main flaws are:

Why leave this flawed chapter in place? Because it is a living record of the discovery process before the need for refinement was understood. At the time, I hadn’t yet developed the tools or perspective to see its limitations. Allowing this “incorrect” direction to remain visible in the stack is intentional: it shows the real, messy evolution rather than a sanitized retrospective. Flawed paths in the stack do not harm the overall process — they are often the necessary scaffolding that leads to better ones. Keeping them here demonstrates the very iterative context-refinement theme of the series: early missteps are not deleted; they are preserved as part of the relational history so the later unfolding can be seen in full context.

If you’re reading this chapter, treat it as a historical artifact and a teaching example. The disclosure triangles sprinkled throughout the series exist for exactly this reason — to keep the warts visible and the process honest.

We are reverse-engineering something we cannot yet describe—because the very act of description requires the thing we are trying to describe. This is the paradox at the heart of understanding analogy in cognition and its computational shadow in AI. Rather than pretend we can define the domain directly, what if we turn the models themselves into distributed oracles? What if we ask frontier LLMs to introspect on the unknown substrate their embeddings sample?

The Paradox and the Probe

Current AI paradigms project relational knowledge into high-dimensional coordinate spaces (embeddings). These discrete points—vectors in ℝᵈ—are high-fidelity samples distilled from vast data, yet they remain agnostic about the underlying domain they approximate. Analogy, as proportional relational correspondence, emerges from interactions among these points. But what is the "where" of that emergence? A temporal signal? A topological manifold? An atemporal lattice of affinities? A categorical structure? We do not know—and every metaphor we reach for risks smuggling in assumptions we cannot justify.

This epistemic humility suggests a reflexive move: instead of imposing our own frames, let the models that embody these embeddings attempt to describe the domain they inhabit. We do not ask for truth; we ask for convergence, divergence, and novel hypotheses across a diverse ensemble of LLMs. This is Harnessing Introspection: a crowdsourcing technique that treats frontier language models as a distributed introspective sensor array probing their own latent substrate.

Core Hypothesis Prompt (Template)

Embeddings are discrete, high-quality samples from an unknown domain that enables analogy as proportional relational correspondence (structure-preserving mappings across domains, often without surface similarity). Without assuming time, frequency, resonance, harmonics, or any physics-derived metaphors, describe possible structures or invariants of this domain. What operations or properties would allow discrete points to resolve into emergent analogical insight? Support with examples from cognitive science, AI architectures, or mathematical frameworks if relevant. Avoid premature commitment to any ontology.

Run this (or close variants) across the 2026 frontier ensemble: Claude Opus 4/4.1, GPT-o4-high/o5 series, Gemini 2.5+, Grok variants, Llama 3/4 descendants, Mistral families, Qwen/GLM-4, etc. Collect raw outputs. Cluster for thematic convergence (topological, categorical, entropic, algebraic, precedence-based) and note architectural divergences. The signal is not in any single answer, but in patterns across differently trained, differently aligned models.
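As a concrete sketch, the harvest loop might look like the following. The `query_model` function is a placeholder for whatever API client each provider exposes; the model names and the prompt variants here are illustrative assumptions, not an endorsement of any particular lineup.

```python
import itertools

# Hypothetical stand-in for a real API call; swap in each provider's client.
def query_model(model_name: str, prompt: str) -> str:
    raise NotImplementedError(f"wire up the client for {model_name}")

BASE_PROMPT = (
    "Embeddings are discrete, high-quality samples from an unknown domain "
    "that enables analogy as proportional relational correspondence. "
    "{constraint} describe possible structures or invariants of this domain."
)

# Prompt variants: vary only the framing constraint to control for wording.
CONSTRAINTS = [
    "Without assuming time, frequency, or any physics-derived metaphors,",
    "Without committing to any geometric picture,",
    "Using only set-theoretic or categorical vocabulary,",
]

MODELS = ["model-a", "model-b", "model-c"]  # placeholders for the ensemble

def harvest(models=MODELS, constraints=CONSTRAINTS):
    """Collect one raw response per (model, prompt-variant) pair."""
    results = {}
    for model, constraint in itertools.product(models, constraints):
        prompt = BASE_PROMPT.format(constraint=constraint)
        try:
            results[(model, constraint)] = query_model(model, prompt)
        except NotImplementedError:
            results[(model, constraint)] = None  # skip unwired models
    return results
```

Keeping the base prompt fixed and varying only the constraint clause is one way to separate genuine thematic convergence from convergence induced by shared wording.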

Why This Might Work

LLMs are already compressed maps of human relational reasoning—including reasoning about reasoning. Recent work has documented emergent introspective awareness: models can detect injected concepts in their activations, distinguish intended vs. unintended internal states, report on perturbations, and modulate representations when instructed to "think about" something [Lindsey, 2025/2026; Hahami et al., 2025]. These capabilities, while unreliable and context-dependent, suggest functional self-access that could extend to describing latent domains—especially when prompted reflexively about the substrate of their own analogical emergence.

By querying many models, we exploit:

Crucially, this remains relatively low-cost in 2026. The tens to hundreds of billions already invested in training frontier models have produced vast latent spaces—relational geometries distilled at industrial scale—that are now effectively sunk assets. Querying them via APIs or open-weights inference typically costs pennies per prompt (e.g., Grok fast variants at ~$0.20–$0.50 per million input tokens, Gemini Flash equivalents at $0.075–$0.30, DeepSeek models at $0.14–$0.42, with even lower rates for cached or batch use). Running hundreds or thousands of variants across models is feasible on a modest budget, far cheaper than new training or fine-tuning runs. The heavy lifting has already been done; we're just asking the resulting representations to describe themselves.
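The budget claim above is simple arithmetic. A minimal sketch, with token counts chosen purely for illustration and mid-range per-million-token prices assumed:

```python
def campaign_cost_usd(n_prompts: int,
                      input_tokens: int,
                      output_tokens: int,
                      in_price_per_m: float,
                      out_price_per_m: float) -> float:
    """Total cost of a query campaign at per-million-token prices."""
    per_prompt = (input_tokens * in_price_per_m +
                  output_tokens * out_price_per_m) / 1_000_000
    return n_prompts * per_prompt

# Example: 3,000 prompts, ~300 input / ~800 output tokens each,
# at an assumed $0.30 / $1.20 per million input/output tokens.
cost = campaign_cost_usd(3000, 300, 800, 0.30, 1.20)
# cost ≈ $3.15 — comfortably inside a modest budget.
```

Even multiplying these figures tenfold for retries, longer outputs, and pricier models keeps the whole campaign in the tens of dollars.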

Early Kernels Observed (From Preliminary Probes)

These are not proofs—they are hypotheses generated by the very system under study. Their recurrence across models is the first weak signal, echoing how analogical structures emerge in embedding spaces [Minegishi et al., 2026].

Next Steps & Testable Predictions

Scale the harvest: query 10–20 accessible models with standardized variants. Analyze via clustering (semantic embeddings of responses), thematic coding, or frequency of key frames. Because inference costs have dropped dramatically (often 10x–100x in recent years), a thorough pilot across dozens of models and thousands of prompts can stay under a few hundred dollars, depending on token volume and model choice.
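A minimal version of the convergence analysis, assuming responses have already been collected as plain strings: embed each response (here with a simple TF-IDF bag-of-words rather than a neural embedder) and cluster, then inspect which models land together. The toy responses are invented for illustration; requires scikit-learn.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy responses standing in for real model outputs.
responses = {
    "model-a": "The domain may be a topological manifold with continuous paths.",
    "model-b": "A lattice of affinities, an atemporal partial order over relations.",
    "model-c": "Think of a manifold whose topology preserves relational structure.",
    "model-d": "Category-theoretic: objects, morphisms, functors between domains.",
}

def cluster_responses(responses: dict, n_clusters: int = 2) -> dict:
    """Group model responses by lexical similarity of their framings."""
    names = list(responses)
    vectors = TfidfVectorizer().fit_transform([responses[n] for n in names])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(vectors)
    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(int(label), []).append(name)
    return clusters
```

Swapping the TF-IDF step for a sentence-embedding model would capture thematic rather than lexical similarity, at the cost of self-containedness; the cluster-then-inspect loop stays the same.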

Predictions worth falsifying:

Closing Thought

We probably won't ever get a clear, direct view of the domain that analogy draws from. The approach I'm suggesting—asking multiple LLMs to describe that domain in their own words—is just one possible way to poke at it indirectly. It might produce interesting patterns, or it might not tell us anything useful. Either way, Harnessing Introspection isn't presented as a solution or even a particularly strong hypothesis; it's simply an experiment worth trying, given how hard it is to approach the question head-on.

I have a picture that shows me standing in front of the original Hyperloop test tube at SpaceX's Hawthorne facility. The proposal here is analogous in structure: a low-resistance path for relational primitives to cross latent domains. Models carry the primitives; prompts provide the impetus. We'll see what arrives at the other end, and loop back with prompts refined by the results—if anything arrives at all.

Part 1: Analogy as Analog: Discrete Primitives in the Fluid Mechanics of Thought

References