How I Think About Agents in Product
A grounded framework for agents: workflows with memory, tools, supervision, and operational accountability.
- Agents
- AI Systems
- Strategy
Agents are increasingly framed as autonomous coworkers. That framing is exciting and often misleading.
My framing is simpler:
An agent is a workflow with memory, tools, and risk.
The core agent loop
Most useful agents execute some variant of:
- observe context
- decide next step
- act via tool
- report outcome
- update memory/state
If you can’t describe your loop this clearly, you don’t have an agent system yet.
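The loop above can be sketched directly. This is a minimal illustration, not a real framework: `decide`, `run_tool`, and `Memory` are hypothetical stand-ins for whatever planner, tool layer, and state store you actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical state store; real systems would persist this."""
    events: list = field(default_factory=list)

    def update(self, outcome):
        self.events.append(outcome)


def run_agent(task, decide, run_tool, max_steps=10):
    memory = Memory()
    for _ in range(max_steps):
        context = {"task": task, "history": memory.events}  # observe context
        step = decide(context)                              # decide next step
        if step is None:                                    # nothing left to do
            break
        outcome = run_tool(step)                            # act via tool
        memory.update({"step": step, "outcome": outcome})   # report + update memory
    return memory.events
```

The `max_steps` bound matters: an agent loop without a step budget is an outage waiting to happen.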
What separates toy agents from production agents
Production agents need:
- explicit permissions
- bounded action spaces
- approval gates for risky steps
- complete event logs
- rollback mechanics
Without these, you get automation theater plus operational anxiety.
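The first four requirements fit naturally in a single guarded dispatch point. A minimal sketch, assuming an `ALLOWED_TOOLS` allowlist and an `approve` callback supplied by the host application (both names are illustrative):

```python
import time

# Explicit permissions + bounded action space: only listed tools are callable.
ALLOWED_TOOLS = {
    "read_file": {"risky": False},
    "delete_file": {"risky": True},
}

EVENT_LOG = []  # complete event log: every attempt is recorded


def guarded_call(tool, args, run, approve):
    if tool not in ALLOWED_TOOLS:                      # explicit permissions
        raise PermissionError(f"tool {tool!r} not in allowlist")
    if ALLOWED_TOOLS[tool]["risky"] and not approve(tool, args):  # approval gate
        EVENT_LOG.append({"t": time.time(), "tool": tool, "status": "denied"})
        return None
    result = run(tool, args)
    EVENT_LOG.append({"t": time.time(), "tool": tool,
                      "args": args, "status": "ok"})
    return result
```

Rollback is the one piece this sketch omits, because it is tool-specific: each risky tool needs its own compensating action, and the event log is what makes replaying or reversing those actions possible.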
Agent architecture choice I recommend first
Start with a single-agent orchestrator + deterministic tools before multi-agent choreography.
Reasons:
- easier debugging
- clearer ownership
- lower coordination overhead
- fewer emergent failure modes
Multi-agent only pays off once single-agent constraints become the bottleneck.
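The recommended starting shape is small enough to sketch. One orchestrator owns the plan; the tools are plain deterministic functions (the tool names and planner here are illustrative, not a real agent framework):

```python
# Deterministic tools: same input, same output, trivially testable.
TOOLS = {
    "word_count": lambda text: len(text.split()),
    "first_line": lambda text: text.splitlines()[0] if text else "",
}


def orchestrate(plan, text):
    """Single point of dispatch: one owner, one place to debug."""
    results = {}
    for tool_name in plan:
        results[tool_name] = TOOLS[tool_name](text)
    return results
```

Because every action flows through one dispatch point, failures have a single owner and a single log to read, which is exactly the debugging and ownership advantage listed above.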
Memory should be designed, not improvised
There are three memory layers I like:
- session memory (current task)
- working memory (recent patterns/preferences)
- durable memory (explicit approved facts)
Conflating these is a common source of weird behavior.
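Keeping the layers separate mostly means giving each one its own write path. A minimal sketch with illustrative field names; the point is the distinct write rules, not the schema:

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    session: list = field(default_factory=list)  # current task; cleared per task
    working: dict = field(default_factory=dict)  # recent patterns/preferences
    durable: dict = field(default_factory=dict)  # explicit approved facts only

    def remember(self, key, value, approved=False):
        if approved:
            self.durable[key] = value   # durable writes require approval
        else:
            self.working[key] = value   # everything else stays soft

    def end_task(self):
        self.session.clear()            # session state never leaks into durable
```

The "weird behavior" from conflating layers is usually a missing rule like the `approved` flag: an offhand remark from one session silently becomes a permanent fact.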
What to measure for agents
- task success rate
- supervision load (how often humans intervene)
- error severity distribution
- recovery time after failure
- user trust trend over repeated usage
Together, these tell you whether the agent is actually helping or quietly creating hidden labor for the humans around it.
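The first two metrics fall straight out of a per-task event log. A minimal sketch, assuming a hypothetical record shape with `succeeded` and `human_interventions` fields:

```python
def agent_metrics(tasks):
    """tasks: list of dicts like {"succeeded": bool, "human_interventions": int}."""
    n = len(tasks)
    return {
        # fraction of tasks completed without failure
        "task_success_rate": sum(t["succeeded"] for t in tasks) / n,
        # average human interventions per task: the hidden-labor signal
        "supervision_load": sum(t["human_interventions"] for t in tasks) / n,
    }
```

Error severity, recovery time, and trust trends need richer events (timestamps, severity labels, repeat-usage cohorts), but they come from the same log, which is another reason complete event logs are non-negotiable.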
Final stance
I’m bullish on agents. I’m just allergic to magical thinking.
Treat agents like junior teammates:
- define scope
- instrument behavior
- review outputs
- coach continuously
That’s when they become genuinely useful.