Why LLMs Remind Me of Pinball
A token falls through a weighted machine: guided, partly stochastic, mathematically rigid, and still hard to predict path-by-path.
- LLMs
- Mental Models
- Systems Thinking
- Product
LLMs feel like pinball to me.
You drop in one token, and it falls through a machine full of structure. The output looks lively and surprising, but the machine is still doing math all the way down.
The ball is the next token
At each step, the model predicts a distribution over possible next tokens. Then one token is selected. That selected token becomes the next ball in motion.
Do this repeatedly and you get a sentence, then a paragraph, then a worldview-shaped answer.
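That loop can be sketched in a few lines of Python. Everything here is a toy stand-in invented for illustration: the four-word vocabulary, the hard-coded logits table, and the function names. A real model computes logits from the full token history with a forward pass through the network.

```python
import math
import random

# Toy stand-in for a trained model: the last token alone determines the
# logits over a tiny vocabulary. A real LLM conditions on the whole history.
VOCAB = ["the", "ball", "bounces", "."]
TOY_LOGITS = {
    "the":     [0.1, 2.0, 0.1, 0.1],   # after "the", "ball" is likely
    "ball":    [0.1, 0.1, 2.0, 0.3],
    "bounces": [0.2, 0.2, 0.1, 2.0],
    ".":       [2.0, 0.1, 0.1, 0.5],
}

def softmax(logits):
    # Turn raw logits into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(start, steps, rng):
    tokens = [start]
    for _ in range(steps):
        probs = softmax(TOY_LOGITS[tokens[-1]])
        # Sample one token; it becomes the next ball in motion.
        tokens.append(rng.choices(VOCAB, weights=probs)[0])
    return tokens

print(generate("the", 5, random.Random(0)))
```

Each pass through the loop is one drop of the ball: a distribution, a draw, a new state.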
The machine is the network
A pinball table has geometry, bumpers, rails, friction, angles. An LLM has layers, weights, attention paths, normalization, and decoding settings.
Both systems constrain motion. Both systems make some trajectories easy and others unlikely.
The flashy lights are not the mechanism. They are byproducts of the mechanism.
Same with chat output: the style, excitement, and confidence are surface effects of matrix operations applied in sequence.
Where “non-deterministic” fits
People say LLMs are random. That is only half-true.
Given fixed model weights, fixed prompt, fixed context window, fixed seed, and fixed decoding settings, you can reproduce behavior. Change sampling temperature, top-p, or seed and you change which local branch gets chosen.
That is pinball-like:
- The table is fixed.
- The ball can still take different local paths.
- Tiny differences early can cascade into visibly different outcomes.
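The temperature effect is easy to show directly: temperature rescales the logits before softmax, sharpening or flattening the distribution the next ball is drawn from. A minimal sketch, with logits invented for illustration:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def with_temperature(logits, temperature):
    # Dividing logits by the temperature sharpens (<1) or flattens (>1)
    # the resulting distribution.
    return softmax([x / temperature for x in logits])

logits = [2.0, 1.0, 0.2]
for t in (0.2, 1.0, 2.0):
    probs = [round(p, 3) for p in with_temperature(logits, t)]
    print(f"temperature={t}: {probs}")
```

At low temperature the top token dominates and decoding approaches greedy; at high temperature the distribution flattens and more local paths become plausible. Top-p works differently, truncating the low-probability tail before sampling, but the consequence is the same: the table is unchanged, and which branch gets taken shifts.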
Prompts are paddles and lane guides
You do have control, just not total control.
System prompts, structure, examples, tool constraints, and output schemas are your paddles. You cannot dictate every bounce, but you can dramatically change the region of state space the ball is likely to visit.
That is why prompt design works. Not because language is magic, but because constraints reshape trajectory.
Why even the machine designer can’t predict every ball
The builder of a pinball machine knows the parts, but cannot tell you the exact path of every future ball.
Same with model designers. They know the training pipeline, the architecture, and the evaluation behavior. They still cannot pre-compute every token path in every future conversation.
Not because rules are absent. Because the interaction surface is enormous.
The equation view
At an abstract level, each step is still:
next_token ~ sample(softmax(f(weights, prompt, context, token_history)))
That is it. No ghost in the shell required.
But “just an equation” does not mean “trivial to reason about” in live use.
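The pseudocode above maps almost line-for-line onto runnable Python. Here `f` is a toy stand-in that deterministically maps its inputs to logits; in a real model it is the full forward pass through the network. All names and values below are invented for illustration:

```python
import math
import random

VOCAB_SIZE = 8

def f(weights, prompt, context, token_history):
    # Toy stand-in for the forward pass: deterministically (within one
    # process) maps the inputs to a logit vector. A real model does this
    # with layers, attention, and the learned weights.
    seed = hash((weights, prompt, context, tuple(token_history)))
    rng = random.Random(seed)
    return [rng.uniform(-2.0, 2.0) for _ in range(VOCAB_SIZE)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(weights, prompt, context, token_history, sampler_rng):
    probs = softmax(f(weights, prompt, context, token_history))
    return sampler_rng.choices(range(VOCAB_SIZE), weights=probs)[0]

# Fixed weights, prompt, context, history, and sampler seed: reproducible.
history = [3, 1]
a = next_token("w", "p", "c", history, random.Random(0))
b = next_token("w", "p", "c", history, random.Random(0))
print(a == b)
```

Hold every input and the sampler seed fixed and the same token comes out; change any one of them and the draw can land on a different branch. That is the whole claim behind "reproducible, yet hard to predict path-by-path."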
Why this model helps builders
Pinball is a good mental model because it avoids two bad extremes:
- Anthropomorphism: “the model wants this”
- Dismissal: “it is just autocomplete so nothing matters”
A better framing:
- it is mechanical,
- it is steerable,
- it is path-sensitive,
- and it can still surprise you in production.
For product work, that implies:
- design good rails (input contracts, output schemas),
- add paddles (guardrails, tool boundaries, retry strategies),
- instrument outcomes (not just token usage),
- treat weird outputs as trajectory bugs, not mystical events.
That gives you better systems and less AI theater.