Searching for a Native Domain of Analogy: What Frontier LLMs Say About Their Own Latent Substrate

Ben Um · March 18, 2026

In the past couple of days, I took the core hypothesis prompt from my second article—the one that asks frontier LLMs to describe the unknown domain their embeddings sample from when analogy emerges—and followed it with a simple second question: given that context, what feels like the most native domain of your own latent space? After running those two fixed prompts across a handful of models, and imagining what might happen if I scaled the same exercise to 10–20 different frontier systems, I asked Grok to name the approach, using this prompt:

I'm attempting to bootstrap these two questions to develop a system for using LLMs to generate emergent ideas. The two questions I posed to you are confined to your particular continuous multidimensional embedding space, but I want to aggregate the results using the exact same prompts across 10-20 frontier models as an experiment for surfacing new ideas. What would you call this systematic approach? Context-based intuition generation?

Grok suggested Multi-Model Latent Intuition Synthesis as a clean, descriptive label. So that's the placeholder I'm using here—not as a claim to invention, just a convenient handle for the specific line of thought that emerged from those two prompts and the follow-up naming question.

This isn't a novel concept I'm bringing to the table. It fits comfortably inside an active, rapidly expanding area of AI research in which people are already experimenting with ensembles of large language models to surface new ideas, refine conceptual structures, or make previously undescribed domains more visible to human understanding.

The most widely used umbrella term right now is LLM-based multi-agent systems (frequently abbreviated LLM-MAS), with substantial recent attention on their creative and ideation capabilities. A 2025 survey accepted to EMNLP, "Creativity in LLM-based Multi-Agent Systems," outlines how techniques like divergent exploration, iterative refinement, and cross-model synthesis allow groups of models to generate outputs richer than any individual model could produce. A companion survey on "Large Language Models for Scientific Idea Generation" positions multi-agent collaboration as one of the primary families of methods for producing novel, grounded hypotheses.

You'll also find closely related work under labels like Mixture-of-Agents (MoA) frameworks, which iteratively aggregate responses from heterogeneous models to improve insight quality, and emerging experiments in latent collaboration (for example, LatentMAS from late 2025), where models share and refine representations directly in continuous hidden space instead of relying on text mediation.
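To make the MoA pattern concrete, here is a minimal Python sketch of layered aggregation. It assumes a hypothetical query_model(name, prompt) helper that routes a prompt to whatever API client serves each provider; the model names, layer count, and prompt wording are all placeholders for illustration, not the published MoA implementation.

```python
def query_model(name: str, prompt: str) -> str:
    """Placeholder: route `prompt` to whatever API client serves `name`."""
    raise NotImplementedError("wire this to your provider clients")

PROPOSERS = ["model-a", "model-b", "model-c"]  # heterogeneous frontier models
AGGREGATOR = "model-d"                         # synthesizes the final layer

def moa_answer(question: str, layers: int = 2) -> str:
    # Layer 0: every proposer answers the raw question independently.
    responses = [query_model(m, question) for m in PROPOSERS]
    # Layers 1..n: each proposer sees the previous layer's answers and refines.
    for _ in range(layers):
        context = "\n\n".join(
            f"Response {i + 1}:\n{r}" for i, r in enumerate(responses)
        )
        layered_prompt = (
            f"{question}\n\nPrevious responses from other models:\n{context}\n\n"
            "Synthesize their strengths into an improved answer."
        )
        responses = [query_model(m, layered_prompt) for m in PROPOSERS]
    # Final step: a single aggregator model merges the last layer.
    merged = "\n\n".join(responses)
    return query_model(
        AGGREGATOR,
        f"{question}\n\nCandidate answers:\n{merged}\n\n"
        "Aggregate these into a single best answer.",
    )
```

The design choice that matters is the layering: each round lets heterogeneous latent geometries react to one another's outputs before a final synthesis, which is where the claimed richness beyond any single model is supposed to come from.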

Across these efforts the common thread is emergence: the hope that interactions among diverse latent geometries can coax relational patterns, analogical completions, or conceptual domains into view that might otherwise stay hidden. No one is claiming to have solved anything yet; the work feels incremental and practical, driven by the same modest curiosity I felt when I ran those two prompts and started wondering what a larger aggregation might reveal.

The Undescribed Domain: What Is the Current State of This Research Direction?

This series centers on one persistent question: What might a native structure—or at least a glimpse—of a latent domain from which analogy emerges look like? Not the coordinate embeddings we impose, but an underlying substrate where discrete relational primitives can fluidly coalesce into proportional, structure-preserving correspondence—the analog primitive of thought outlined in the first article.

Multi-Model Latent Intuition Synthesis (or whatever placeholder name settles in) offers a low-friction way to probe that question sideways: run identical reflexive prompts across frontier models and treat the ensemble outputs as a distributed, indirect sensor for that possible domain. The aim remains narrow and speculative—crowdsourcing hypotheses from the models about a space that enables their analogical capabilities, without assuming a direct definition is possible.
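As a sketch of what one run could look like, the following Python reuses the same hypothetical query_model stub from the earlier MoA example. The prompt texts here are paraphrases of the two questions described above, not the exact wording from the earlier articles, and the model list is a stand-in for whichever 10–20 systems a real run would include.

```python
def query_model(name: str, prompt: str) -> str:  # placeholder, as before
    raise NotImplementedError("wire this to your provider clients")

PROMPT_1 = ("Describe the unknown domain your embeddings sample from "
            "when an analogy emerges.")
PROMPT_2 = ("Given that context, what feels like the most native domain "
            "of your own latent space?")

FRONTIER_MODELS = ["model-a", "model-b", "model-c"]  # scale toward 10-20 systems

def run_mlis(models: list[str]) -> dict[str, dict[str, str]]:
    results = {}
    for m in models:
        first = query_model(m, PROMPT_1)
        # Feed the model's own first answer back as context for question two.
        followup = f"{PROMPT_1}\n\nYour earlier answer:\n{first}\n\n{PROMPT_2}"
        results[m] = {
            "domain_description": first,
            "native_domain": query_model(m, followup),
        }
    return results
```

The only structural commitment is that question two is always conditioned on the model's own answer to question one, so each model reflects on its own description rather than a generic framing.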

Where does this precise direction stand today (March 2026)?

Broad searches across arXiv, recent surveys, conference proceedings, and related literature—targeting combinations of multi-agent/ensemble/crowdsourced LLM querying + introspection/self-description/probing + latent domain/substrate/ontology + analogy/analogical emergence/proportional correspondence/relational mapping—turn up no clear, published line of work that matches exactly. There appear to be no systematic efforts using multi-model, standardized-prompt introspection specifically to reverse-engineer or describe a latent substrate of analogy emergence itself.

Several caveats temper this observation: a keyword search can miss work framed in different vocabulary, relevant efforts may still be unpublished or in progress, and the absence of exact matches is at best weak evidence of absence.

What does exist provides strong scaffolding but stops short of this objective.

The building blocks—low-cost API querying, introspection prompts, ensemble aggregation, latent-space diversity—are all here and advancing fast. Yet applying them reflexively to triangulate a latent domain of analogy emergence appears, on current evidence, to remain an open, low-competition direction. The experiment's near-zero marginal cost (leveraging already-trained frontier spaces) makes it worth running even if it yields only weak or noisy signals.
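As one illustration of the ensemble-aggregation building block, here is a sketch that clusters collected self-descriptions with off-the-shelf TF-IDF vectors and k-means to surface recurring kernels. This is an assumed aggregation strategy chosen for brevity, not a method taken from the surveys above; surface_kernels and n_themes are names invented for this example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def surface_kernels(responses: list[str], n_themes: int = 4) -> dict[int, list[str]]:
    """Group ensemble responses into candidate recurring themes."""
    # Represent each model's self-description as a sparse TF-IDF vector.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(responses)
    # Cluster the vectors; each cluster is a candidate recurring kernel.
    labels = KMeans(n_clusters=n_themes, n_init=10, random_state=0).fit_predict(vectors)
    clusters: dict[int, list[str]] = {}
    for label, text in zip(labels, responses):
        clusters.setdefault(int(label), []).append(text)
    return clusters
```

Swapping TF-IDF for a sentence-embedding model would likely capture relational themes better; the point is only that once the responses exist, the aggregation step is a few lines of commodity tooling.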

If the Part 1 duality is on the right track—that analogy thrives via discrete primitives embedded in fluid proportion—then this crowdsourced, model-as-oracle approach may offer one of the few practical paths to indirect glimpses without prematurely imposing external frames (physics metaphors, coordinate assumptions, etc.).

Closing: Toward Hybrid Crowdsourcing – Adding Human Participation to MLIS

The preliminary kernels noted in the second article—recurring themes such as topological persistence, categorical gluing, and precedence structures—suggest possible patterns that could emerge from larger-scale probing, but any such signals would remain context-dependent and likely noisy. Frontier models are good at capturing relational regularities from training data, but their self-descriptions of a latent domain remain shaped by alignment choices, tokenization quirks, and the lack of embodied experience. Even with diverse architectural priors, a pure ensemble can still converge on shared biases or produce plausible-sounding but ungrounded ideas.

This suggests a straightforward next step: bringing in human-in-the-loop (HITL) participation as a hybrid layer for future MLIS runs. Humans have direct experience with analogy—lived memory, intuition, and cross-domain connections grounded in real-world context—that models do not. A human could contribute in several concrete ways: seeding or refining the probe prompts, curating and critiquing the ensemble's outputs, and flagging convergence that sounds plausible but lacks grounding.

This hybrid approach fits with current trends in LLM-MAS work, where human oversight is already used to improve reliability in creative or research-oriented agent systems. Examples include human-facilitated multi-agent ideation setups (e.g., MultiColleagues-style workflows) and pipelines that insert explicit human decision points to reduce hallucination risk and keep outputs domain-relevant (e.g., recent human-in-the-loop economic research frameworks). In MLIS, HITL would not replace the model ensemble but complement it: models handle scale and diversity in latent sampling; humans add the proportional judgment and experiential grounding that Part 1 framed as central to analogy.
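A minimal sketch of what that complementary role could look like in code: the ensemble proposes candidate kernels, a human accepts, rejects, or annotates each one, and only vetted items feed the next round. The stdin interface and the hybrid_round helper are illustrative assumptions, reusing the query_model stub from the earlier sketches; in practice the review step could be any lightweight asynchronous form.

```python
def query_model(name: str, prompt: str) -> str:  # placeholder, as before
    raise NotImplementedError("wire this to your provider clients")

def human_review(candidates: list[str]) -> list[str]:
    """Show each candidate kernel to a human; keep accepted or annotated ones."""
    kept = []
    for c in candidates:
        verdict = input(f"Candidate kernel:\n{c}\nKeep? [y/n/free-text critique] ").strip()
        if verdict.lower().startswith("y"):
            kept.append(c)
        elif verdict and not verdict.lower().startswith("n"):
            # A critique keeps the kernel but travels with it into the next round.
            kept.append(f"{c}\n[Human note: {verdict}]")
    return kept

def hybrid_round(candidates: list[str], models: list[str]) -> list[str]:
    vetted = human_review(candidates)
    followup = (
        "Here are human-vetted candidate kernels:\n"
        + "\n".join(f"- {k}" for k in vetted)
        + "\n\nRefine, extend, or challenge them."
    )
    return [query_model(m, followup) for m in models]
```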

The marginal cost stays low—human input can be lightweight, asynchronous, and focused on curation or critique—while the potential benefit is clearer signals and fewer dead-end paths. This turns the experiment from a model-only oracle into a true hybrid crowdsourcing loop.