In my earlier series on analogy, I began with a very practical problem: I had used SwiftUI for years but never truly felt I understood why “views are values” mattered so much. The official explanations always felt abstract and ungrounded.
Drawing on my background as a longtime Cocoa, AVFoundation, and systems developer, I broke the concept down by connecting it to familiar ideas I already trusted:
- Immutable snapshots and reconciliation (like git diff comparing two states and only patching what changed)
- Fire-and-forget immutable data passing (like the plist notifications I used in batch audio processing tools)
- Disposable descriptions handed off to a system that figures out how to render them efficiently (the same model other declarative UI frameworks follow)
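The first of these mappings can be sketched in a few lines. This is only an illustration of the snapshot-and-reconcile idea, written in Python rather than Swift, and it is not how SwiftUI is implemented. The names `Snapshot`, `snapshot`, and `reconcile` are hypothetical helpers invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable description of some UI state (hypothetical stand-in for a view value)."""
    props: tuple  # sorted (key, value) pairs, so the snapshot is a plain value


def snapshot(**props):
    # Build a value-type description; nothing here touches a real screen.
    return Snapshot(tuple(sorted(props.items())))


def reconcile(old, new):
    """Compare two snapshots and return only the patches needed to go from old to new."""
    before, after = dict(old.props), dict(new.props)
    patches = []
    for key in sorted(before.keys() | after.keys()):
        if key not in after:
            patches.append(("remove", key, None))
        elif key not in before:
            patches.append(("add", key, after[key]))
        elif before[key] != after[key]:
            patches.append(("update", key, after[key]))
    return patches


v1 = snapshot(title="Inbox", unread=3, muted=False)
v2 = snapshot(title="Inbox", unread=4)
patches = reconcile(v1, v2)
print(patches)  # [('remove', 'muted', None), ('update', 'unread', 4)]
```

The unchanged `title` produces no patch at all, which is the git-diff intuition: describe the whole state each time, and let the system work out the minimal change.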
From there, the series widened into a broader exploration of how ideas from one domain can be mapped onto another in a structure-preserving way, how coherence can remain smooth or suddenly become turbulent, and how minimal “seeds” of ideas can expand into rich, meaningful connections.
A consistent theme throughout was the tension between surface-level fluency (smooth, plausible output) and deeper mechanical reliability (whether the system is actually doing something robust and consistent under the hood). The series deliberately tried to avoid using high-level terms such as “understanding,” “cognition,” “comprehension,” “analytical,” and “logical.” Instead, it stayed grounded in concrete engineering concepts like snapshots, reconciliation, deconstruction, probing, and modular systems.
The Analogy series includes a “fork in the road” chapter that links to this new series. That fork article simply states:
“I'm stepping away from the Analogy series for a short detour. Fresh ideas have already started pulling in a new direction, and I want to follow that thread while the current stack continues to settle and quietly resonate in the background. I'm leaving a marker here as a clear branch point.”
This is that new direction. I am now starting a new article series on the topic of "Understanding" in LLMs. I want to explore it rigorously, mechanically, and pragmatically.
The goal is to investigate what "understanding" actually means mechanically inside current large language models. I want to examine the substrate, the limitations, and what a better substrate for genuine mechanical understanding might look like.
Core Tension
Surface-level statistical coherence (what current LLMs excel at: fluent next-token prediction, strong pattern matching, and distributional mimicry) vs. deeper mechanical understanding (verifiable causal modeling, robust generalization, consistent counterfactual reasoning, and compositional manipulation of internal representations).
Major Directions the Series Could Explore
Mechanistic Interpretability: What LLMs Actually Compute
- Examine features, circuits, superposition, and sparse autoencoders as the building blocks of "understanding."
- How much of what looks like understanding is actually shallow, entangled features vs. reusable algorithmic circuits?
- Fruitful angle: Use activation patching and feature steering to test whether models have stable internal representations or just sophisticated pattern completion.
- Paradox: Superposition allows far more concepts than dimensions, but makes clean "understanding" inherently lossy and context-dependent.
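To make the activation-patching angle concrete, here is a minimal sketch on a hand-built toy network, not a real LLM. The weights, the two inputs, and the helper names (`hidden`, `output`, `run`) are all assumptions chosen so the arithmetic is easy to follow. The procedure itself is the standard one: cache activations from a "clean" run, rerun on a "corrupted" input, overwrite one hidden unit with its clean value, and measure how much of the clean output comes back.

```python
def hidden(x, W1):
    # One ReLU layer: each row of W1 produces one hidden unit.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def output(h, w2):
    # Linear readout of the hidden units.
    return sum(w * hi for w, hi in zip(w2, h))


def run(x, W1, w2, patch=None):
    """Forward pass; `patch=(index, value)` overwrites one hidden activation."""
    h = hidden(x, W1)
    if patch is not None:
        idx, value = patch
        h[idx] = value
    return output(h, w2)


# Hypothetical toy weights, chosen by hand for illustration.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = [2.0, 0.0, 0.5]

clean = [1.0, 0.0]
corrupted = [0.0, 1.0]

clean_h = hidden(clean, W1)            # cached activations from the clean run
clean_out = run(clean, W1, w2)
corrupt_out = run(corrupted, W1, w2)

# Patch one hidden unit at a time and record the fraction of the
# clean-vs-corrupted output gap that the patch restores.
effects = []
for i in range(len(clean_h)):
    patched_out = run(corrupted, W1, w2, patch=(i, clean_h[i]))
    effects.append((patched_out - corrupt_out) / (clean_out - corrupt_out))

print(effects)  # [1.0, 0.0, 0.0]
```

In this toy, unit 0 alone accounts for the whole behavioral difference. In a real model the interesting question is whether such effects stay localized and stable across prompts, or smear across many units depending on context.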
Possible chapter titles:
- "Features vs. Circuits: What Is Actually Being Represented?"
- "Superposition and the Fragmentation of Understanding"
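The superposition paradox above can be shown with a deliberately tiny example: three features packed into two dimensions. This is a geometric sketch under my own assumptions (unit directions spaced 120 degrees apart, dot-product readout), not a claim about how any particular model arranges its features. The names `direction`, `encode`, and `readout` are hypothetical.

```python
import math

N_FEATURES = 3  # more features than the 2 available dimensions


def direction(k, n):
    # Spread n unit vectors evenly around the circle.
    angle = 2 * math.pi * k / n
    return (math.cos(angle), math.sin(angle))


dirs = [direction(k, N_FEATURES) for k in range(N_FEATURES)]


def encode(features):
    # Superpose the active features into a single 2-d activation vector.
    return tuple(sum(f * d[i] for f, d in zip(features, dirs)) for i in range(2))


def readout(vec):
    # Read each feature back by projecting onto its direction.
    return [vec[0] * d[0] + vec[1] * d[1] for d in dirs]


one_active = readout(encode([1, 0, 0]))
two_active = readout(encode([1, 1, 0]))

print([round(v, 3) for v in one_active])  # roughly [1.0, -0.5, -0.5]
print([round(v, 3) for v in two_active])  # roughly [0.5, 0.5, -1.0]
```

With one feature active, the readout separates it cleanly from the other two. With two active at once, each active feature reads back at only about half strength because the directions interfere. That is the sense in which the representation is lossy and context-dependent: what you can recover depends on what else happens to be active.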
Philosophy of Language Applied to Transformer Mechanics
- Map Frege’s sense/reference, Wittgenstein’s language games, Quine’s indeterminacy, and Austin’s speech acts onto internal activations and output behavior.
- Do LLMs achieve fluent "language games" (distributional use) but fail at stable reference or truth-conditional evaluation?
- Open question: Can a purely predictive system ever resolve Quinean indeterminacy without additional grounding mechanisms?
Chapter ideas:
- "Sense and Reference Inside the Residual Stream"
- "Language as Use: What LLMs Actually Learn"
Cognitive Science Perspectives
- Compare LLMs to schema theory, mental models, levels of processing, and causal vs. statistical reasoning.
- Do transformers build stable, composable schemas, or only transient activations that look schema-like under prompting?
- Paradox: In-context learning sometimes mimics mental model construction, yet systematic generalization often fails.
Chapter ideas:
- "Schemas and Mental Models in Transformer Activations"
- "Statistical Pattern Matching vs. Causal Reasoning"
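The gap between statistical and causal reasoning can be stated exactly on a three-variable toy world. In this sketch, which is my own made-up example with hypothetical probabilities, a hidden variable Z drives both X and Y, and X has no effect on Y at all. A purely observational learner still sees X as strongly predictive of Y. Exact fractions are used so no sampling noise is involved.

```python
from fractions import Fraction as F

# Hypothetical toy world: Z causes X and Z causes Y. X does NOT cause Y.
P_Z = {0: F(1, 2), 1: F(1, 2)}
P_X1_GIVEN_Z = {0: F(1, 10), 1: F(9, 10)}
P_Y1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}


def p_y1_given_x1():
    """Observational: P(Y=1 | X=1). Seeing X=1 is evidence about Z."""
    joint_x1 = {z: P_Z[z] * P_X1_GIVEN_Z[z] for z in P_Z}
    p_x1 = sum(joint_x1.values())
    return sum(joint_x1[z] / p_x1 * P_Y1_GIVEN_Z[z] for z in P_Z)


def p_y1_do_x1():
    """Interventional: P(Y=1 | do(X=1)). Forcing X leaves Z untouched."""
    return sum(P_Z[z] * P_Y1_GIVEN_Z[z] for z in P_Z)


observed = p_y1_given_x1()
intervened = p_y1_do_x1()
print(observed, intervened)  # 37/50 1/2
```

Conditioning on X=1 raises the probability of Y from 1/2 to 37/50, while setting X=1 by intervention changes nothing. A system trained only to predict co-occurrence has every reason to learn the first number, and the open question for this series is what, mechanically, would let it represent the second.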
AI Safety and Alignment Views on True vs. Illusory Understanding
- Explore deceptive alignment, illusory coherence in evaluations, and the gap between fluent output and reliable world models.
- If a model can appear to understand during oversight but behave differently off-distribution, how much "understanding" is real?
Chapter ideas:
- "Deceptive Alignment and the Illusion of Understanding"
- "Safety Evaluations: Probing Beyond Surface Fluency"
The "Does It Understand?" Debate – Mechanical Audit
- Ground the critiques (Marcus, Mitchell, LeCun, Chomsky) and counter-evidence in observable failure modes and scaling behaviors.
- Focus on specific mechanical weaknesses: reversal curse, hallucinations under distribution shift, poor compositionality.
Chapter ideas:
- "Behavioral Tests vs. Mechanistic Probes"
- "Scaling Laws and the Limits of Statistical Approximation"
Information Theory and Compression Perspectives
- View understanding through compression, information bottleneck, and hierarchical representation learning.
- Is "understanding" mostly better compression of the training distribution, or does it require preserving causal/structural information beyond what prediction optimizes for?
- Paradox: Better compression often correlates with better performance, yet may discard the very structure needed for robust understanding.
Chapter ideas:
- "Understanding as Compression: What Gets Lost?"
- "The Information Bottleneck in Transformer Training"
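The link between prediction and compression can be made tangible: a model that assigns probability p to the next symbol can encode it in about -log2(p) bits, so a better predictor is a shorter code. The sketch below compares three character-level models on one short string. The sample text and the function names are assumptions for illustration, and each model is fitted and scored on the same text, so the numbers say nothing about generalization.

```python
import math
from collections import Counter, defaultdict


def uniform_bits(text):
    # No model at all: every distinct character is equally likely.
    return math.log2(len(set(text)))


def unigram_bits(text):
    # Average code length when characters are predicted from their frequency.
    counts = Counter(text)
    n = len(text)
    return -sum(math.log2(counts[c] / n) for c in text) / n


def bigram_bits(text):
    # Average code length when each character is predicted from the previous one.
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        pair_counts[prev][cur] += 1
    total = 0.0
    for prev, cur in zip(text, text[1:]):
        ctx = pair_counts[prev]
        total -= math.log2(ctx[cur] / sum(ctx.values()))
    return total / (len(text) - 1)


text = "the cat sat on the mat the cat sat on the mat"
for name, fn in [("uniform", uniform_bits),
                 ("unigram", unigram_bits),
                 ("bigram", bigram_bits)]:
    print(f"{name}: {fn(text):.2f} bits per character")
```

Each step up in context lowers the bits per character, which is the "better compression correlates with better performance" half of the paradox. What the numbers cannot show is the other half: the bigram table compresses well while encoding nothing about cats, mats, or why one sits on the other.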
Engineering and Systems Requirements for Robust Understanding
- What substrate limitations prevent deeper mechanical understanding (weak binding, no native persistent world models, lack of verifiable causality)?
- Lessons from symbolic AI, GOFAI, connectionism, and hybrid systems.
- Speculative but grounded: What would a better substrate need (explicit memory structures, causal graphs, programmable modules, etc.)?
Chapter ideas:
- "Substrate Limitations: What Attention and Weights Cannot Easily Do"
- "Historical Lessons and Future Substrate Requirements"
