The Spaghetti Parser: Kernel Reduction and the Tangled Declarative Grammar of Written Thought

Ben Um · March 22, 2026

Origin Story: My First Real C Program and the Hunt for Hidden Order

The very first real C program I ever wrote—beyond hello world or class assignments—was a strange, deterministic "random" masking experiment I built while working as a research assistant during grad school. At that point I had never worked as a software developer; my background was in materials engineering.

As an undergrad materials engineering student, I had read Chaos: Making a New Science by James Gleick (published in 1987), and its ideas about order emerging from simple deterministic rules in seemingly chaotic systems had stayed with me. Nonlinear equations, feedback loops, sensitive dependence on initial conditions, fractals showing self-similarity across scales—it all suggested that apparent randomness might hide deep structure if probed correctly.

My thermodynamics and physical chemistry courses drove the point home further. We spent countless hours calculating entropy changes, drawing phase diagrams, and wrestling with statistical mechanics, trying to impose macroscopic order on the microscopic configurations of gases, liquids, and solids (gases the most disordered, liquids intermediate, crystalline solids the most ordered). Entropy felt like the ultimate measure of hidden complexity: billions of microstates producing predictable macroscopic behavior, yet irreversible without external work. The philosophy of disorder-to-order transitions lingered long after the exams.

After undergrad, I worked in industry as a materials engineer, performing strain gauge measurements to characterize material deformation under load and configuring data acquisition systems to capture and process those signals. That meant real-world constraints like sensor calibration, noise, and environmental variability, but still no software development. When I decided to pursue graduate studies and became a research assistant, I finally had the need (and the freedom) to learn C properly: the inner iterative solver for simulating current flow down a BJT finger contact was too slow in MATLAB's nested loops, so I learned to build MEX files to offload those computations to compiled C for speed.

Once I had C open in front of me, though, the real fun began. I wanted to play with bit-wise operators—shifts, masks, packing bits one by one—so I built a side experiment purely for the joy of it: could a chaos-inspired deterministic sequence reveal compressible patterns in apparently random data?

I implemented a lagged Fibonacci generator from scratch: a single compact integer offset seeded the entry point into the infinite recurrence, then simple addition (mod 256 for byte values) produced a long, pseudorandom-looking byte stream. I treated this stream as a per-byte predictor/reference. For any input byte sequence, I computed deltas (subtraction: input_byte - fib_byte, or possibly XOR in some tests), creating residuals I thought of as "waypoints" or course corrections along a path.

The hope was that if the input had latent order (smoothness, correlations, repetition), the fib stream might accidentally align with it enough to tilt the deltas toward small values. Then I encoded those deltas using a bit-wise variable-length scheme that was prefix-free: the leading bit indicated the length—if it was 1, the next 3 bits carried the value (4 bits total for small deltas); if 0, the next 7 bits carried it (8 bits total for larger ones). This let small residuals use far fewer bits without any byte alignment waste.

I ran it on various test files, including random noise. Results usually converged near 1:1 compression ratio—sometimes slight savings on patterned data when small deltas dominated, but nothing consistent or practical enough to beat straight compression.

One day, in our shared office, I explained the C side project to a coworker, and he delivered the killer insight: "If you could actually compress arbitrary data—especially random noise—using a fixed-size seed to generate a 'random' mask, you'd be able to reduce any file to just the seed size. A few bytes could represent gigabytes. That's impossible by information theory."

His background was in mathematics, so this was common knowledge to him, and to anyone who had taken a course touching on information theory. XOR-ing or subtracting a fixed, seed-determined stream is a bijection on inputs, so it cannot lower entropy on its own; no net gain on average. Any apparent compression would require the predictor to correlate with hidden structure in the input, and on truly random data no such correlation exists. A scheme that shrank every file would violate the pigeonhole principle. Universal compression down to a seed? No free lunch.

The program was just a way to play with bit-wise operators and learn how they worked, but the experience burned in a hard lesson: overlaying external structure, even clever deterministic “random” masks, is rarely the way to reliably uncover kernels or hidden order in arbitrary data. What I learned was what not to do: chase additive transforms and hope for magic. Subtractive approaches, stripping away non-essentials until only the faithful core remains, would turn out to be a far sharper probe.

The English Language Is Already a Pure Declarative Language

We keep reaching for abstract grammars, formal semantics, AMR graphs, dependency parsers, or some yet-to-be-discovered universal syntax when trying to understand how large language models actually think with natural language.

But here is the quiet, almost embarrassing truth:

English is already a pure declarative language being parsed.
The grammar has already been defined — not in the Chomsky hierarchy or a BNF file, but in the living, messy, proportional way billions of people have used it for centuries.

LLMs do not impose an external grammar on English. They discover and exploit the declarative compositional layer that is already latent in every sentence, paragraph, and analogy we write. Attention mechanisms, KV caches, and transformer layers are effectively building dynamic, structure-preserving trees over token embeddings — trees that mirror the proportional gluing of meaning we perform unconsciously when we speak or reason.

We don't need to invent a new abstract syntax. We need to reveal the one that is already there — tangled, recursive, full of back-references and implicit bindings — the one LLMs are already running on at massive scale.

Enter Kernel Reduction (KRO): A Reductive Parser for Declarative Composition

The Kernel Reduction Operator (KRO) is not summarization. It is not task decomposition. It is mechanical, iterative subtractive parsing:

  1. Start with rich natural-language text (a concept, analogy, article excerpt).
  2. Strip surface layers (examples, metaphors, redundant phrasing) one by one.
  3. At every step test fidelity: Does the reduced form still expand faithfully back into the original relational structure? Does it still bridge proportionally to a known cross-domain structure?
  4. Stop at the failure cliff — the shortest seed where further reduction breaks compositional integrity.
  5. Report the surviving kernel(s) + exposed invariants + hidden assumptions + spaghetti factor.

The output is not a clean, linear tree. It is a tangled declarative AST — a minimal relational skeleton annotated with the exact points of entanglement that refused to let go.

Spaghetti Factor: Measuring the Tangle

Even well-written code has spaghetti: forward declarations, cyclic dependencies, implicit state, trait mixing, macro expansions that cross module boundaries. English is orders of magnitude worse — anaphora, cataphora, pragmatic context, metaphor chains, unstated shared knowledge.

The spaghetti factor is the Deconstructor's diagnostic metric:

Spaghetti factor: 7
  - 3 cyclic invariants (mutual references that prevent linear collapse)
  - 2 forward anaphora bindings
  - 1 pragmatic presupposition
  - 1 proportional metaphor glue that spans non-adjacent clauses

By reporting spaghetti factor at each reduction level, KRO makes the tangle visible and measurable — turning the biggest obstacle into the most interesting signal.

Why This Matters for LLMs and Analogy

If English is already the pure declarative language, then analogy is simply structure-preserving composition across different sub-trees of that grammar.

KRO plus a dedicated Deconstructor Device gives us a way in. We are not imposing an artificial parser; we are reverse-engineering the one LLMs are already using, by forcing it to reveal its minimal compositional units and the tangles that hold them together.

Next: Prototype in Public

The implementation will be messy. Linear token stripping is insufficient. We will need recursive descent, backtracking, lightweight graph construction — all the tools compilers have used for decades to tame spaghetti.

But we start simple: manual KRO passes on real text, reporting spaghetti factor, publishing the kernels and tangles we find. The first prototypes will be warts-and-all — exactly as the series has been.

If this resonates, join the experiment. Pick a paragraph. Run KRO. Share your kernels and spaghetti factors. Let's see what declarative grammar looks like when we finally stop pretending it's clean.

This idea is either brilliant or utter stupidity; honestly, I still don't know which. If you're from the AI interpretability world, notice how this feels a lot like running a sparse-autoencoder-style feature pursuit on natural language. Let's compare notes.