The GPT-3.5 to GPT-4 Inflection Point

The public AI timeline did not move in smooth increments. It moved in jumps, and GPT-3.5 to GPT-4 was the first mainstream one many builders felt directly.

Two anchor dates still matter:

November 30, 2022: ChatGPT shipped on GPT-3.5.
March 14, 2023: GPT-4 launched.

But they were the start of a broader pattern, not the end of it.

What GPT-3.5 changed

GPT-3.5 made language interfaces mainstream overnight. The key unlock was not perfect answers. It was conversational usability at scale.

For product builders, that changed three assumptions:

Users would tolerate probabilistic outputs if the UX was fast and legible.
Writing quality became a core product surface, not just a model artifact.
“Good enough” assistants could create daily habits even with visible flaws.

What GPT-4 changed

GPT-4 raised the floor for reasoning-heavy tasks and long-context synthesis. This is where many teams learned a hard lesson: upgrading model quality does not fix weak product design.

The teams that won were the ones that paired better models with:

tighter prompts,
explicit response formats,
guardrails and recovery flows,
and real usage instrumentation.
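
As a rough sketch of how those four pieces can fit together, assume a hypothetical `call_model` function standing in for whichever model API a team uses. The snippet requests an explicit JSON shape, validates it, retries on malformed output, falls back to a safe reply, and counts outcomes. Every name here (`REQUIRED_KEYS`, `FALLBACK`, `metrics`, `answer`) is illustrative and not taken from any real SDK.

```python
import json
from collections import Counter
from typing import Callable, Optional

# Hypothetical names throughout; call_model stands in for any model API.
REQUIRED_KEYS = {"answer", "confidence"}  # the explicit response format
FALLBACK = {"answer": "Sorry, I could not produce a reliable answer.", "confidence": 0.0}

metrics = Counter()  # minimal usage instrumentation


def parse_response(raw: str) -> Optional[dict]:
    """Return the parsed reply if it matches the expected format, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def answer(prompt: str, call_model: Callable[[str], str], max_attempts: int = 2) -> dict:
    """Guardrail plus recovery flow: retry on malformed output, then fall back."""
    for _ in range(max_attempts):
        parsed = parse_response(call_model(prompt))
        if parsed is not None:
            metrics["ok"] += 1
            return parsed
        metrics["malformed"] += 1
    metrics["fallback"] += 1
    return FALLBACK
```

The point of the design is that a stronger model only lowers the `malformed` and `fallback` counts; the product still needs the path that handles them.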

The pattern repeated beyond GPT

You can see the same cycle in later releases too:

Claude 3 / 3.5,
Gemini 1.5,
GPT-4o,
DeepSeek-V3 and R1,
and the next open-model waves after that.

Different model families, same product dynamic:

Demo shock.
Over-promising by product teams.
Reliability pain in production.
A new wave of disciplined builders who add constraints and eval loops.
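
The eval loops in that last step can be very small. Here is a minimal sketch, assuming a model is any function from prompt to text and each eval case is a prompt paired with a pass/fail check. The names `run_evals` and `safe_to_ship` and the 0.02 tolerance are assumptions made for illustration, not a standard.

```python
from typing import Callable, List, Tuple

# Hypothetical eval case: (prompt, check), where check returns True on a pass.
EvalCase = Tuple[str, Callable[[str], bool]]


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of eval cases the model passes."""
    if not cases:
        return 0.0
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def safe_to_ship(candidate_rate: float, baseline_rate: float, tolerance: float = 0.02) -> bool:
    """Allow a model swap only if the pass rate does not regress beyond the tolerance."""
    return candidate_rate >= baseline_rate - tolerance
```

Running the same cases against the current model and the new one turns "the new model feels better" into a comparison of two pass rates.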

The core mistake is always the same: treating model capability as a product strategy.

Practical takeaway when any model drops

When a new model drops, I run the same checklist before touching roadmap scope:

Question	Why it matters
Does this reduce user effort in an existing workflow?	Prevents novelty-driven detours
What error class does it actually improve?	Forces a measurable claim
What fallback is now required?	Better models can still fail badly
Does this unlock a simpler UI?	UX simplification is often the real value

If we cannot answer these in one meeting, we do not expand scope.
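
For teams that prefer the gate written down, the checklist can be encoded as a small record. This is only a sketch: the class name `ModelDropReview` and its field names are made up here to mirror the four questions, and an empty string is assumed to mean "not answered".

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ModelDropReview:
    # One field per checklist question; an empty answer means it is still open.
    reduces_effort_in_existing_workflow: str = ""
    error_class_improved: str = ""
    fallback_required: str = ""
    simpler_ui_unlocked: str = ""

    def unanswered(self) -> List[str]:
        """List the questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def may_expand_scope(self) -> bool:
        """Scope expands only when every question has an answer."""
        return not self.unanswered()
```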

Bottom line

GPT-3.5 and GPT-4 were not just model milestones. They reset user expectations for speed, fluency, and usefulness.

Later model families confirmed the same pattern.

The lesson was never “just use the latest model.” The lesson is: model jumps reward teams that can redesign the whole product loop quickly.

Best,
Oli
April 18, 2024