AI Architecture·Tuesday, May 12, 2026·6 min read


Braxton Ellsworth

AI Systems Architect

The Real Line Between Capability Elicitation and Creation in AI Post-Training

Every few months, the cycle repeats. A new paper drops, Twitter fills with takes, and product teams scramble to update their roadmaps. This week’s debate: are we actually making language models smarter, or just coaxing out tricks they already knew? The question’s not academic. It’s baked into every product demo, every “fine-tuned” chatbot, every claim about alignment or capability leaps.

But most teams treat “post-training” like a single skill. They lump supervised fine-tuning, reinforcement learning, and prompt engineering into a bucket labeled “make model better.” They adjust their prompts, tweak their reward signals, and celebrate when the model starts following new patterns. The assumption is always the same: if the model starts behaving differently, we must have given it a new power.

That's the mistake. Most people think any improvement signals a new capability. But the truth is, we're often just rearranging existing abilities.

The False Comfort of Surface-Level Skill

The real divide isn’t between SFT and RL, or between prompt tuning and full-stack retraining. It’s not about which post-training method is flashier, or which yields more impressive outputs on a test set.

The real divide is between elicitation and creation.

And most practitioners miss it entirely.

Yuhao Li and colleagues laid out the core distinction in “On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective.” They introduce the notion of accessible support: the set of behaviors a model can actually reach, given its architecture and the constraints of a finite update budget. In practical terms, accessible support is the slice of all possible outputs that the model can generate without fundamentally changing what it is.

Post-training, then, comes in two forms. Capability elicitation means moving probability mass around inside that accessible support: making some behaviors more likely, others less so, but never stepping outside the model’s true reach.

Capability creation, by contrast, expands the support itself. It’s not just making the model more likely to do something it already could. It’s enabling behaviors that truly weren’t possible before.
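To make the two definitions concrete, here is a toy sketch in Python. It is not the paper's formalism: it assumes a model can be caricatured as a dictionary from outputs to probabilities, and it treats "accessible support" as simply the outputs with probability above a threshold. The output labels and the helper names (support, reweight, extend) are invented for illustration.

```python
def support(dist, eps=0.0):
    """Outputs whose probability is strictly above eps (toy stand-in for accessible support)."""
    return {x for x, p in dist.items() if p > eps}

def reweight(dist, boosts):
    """Elicitation: scale existing probabilities, then renormalize.
    An output at probability 0 stays at 0, so the support cannot grow."""
    scaled = {x: p * boosts.get(x, 1.0) for x, p in dist.items()}
    total = sum(scaled.values())
    return {x: p / total for x, p in scaled.items()}

def extend(dist, new_output, mass):
    """Creation: shrink everything else to make room, then give mass to a new output."""
    out = {x: p * (1.0 - mass) for x, p in dist.items()}
    out[new_output] = out.get(new_output, 0.0) + mass
    return out

# Hypothetical base model: it can follow instructions, just rarely.
base = {
    "refuse": 0.70,
    "follow_instruction": 0.29,
    "cite_source": 0.01,
    "prove_theorem": 0.0,  # outside the support
}

# No boost, however large, can lift an output that starts at zero.
elicited = reweight(base, {"follow_instruction": 20.0, "prove_theorem": 1000.0})
created = extend(base, "prove_theorem", 0.05)

print(support(elicited) == support(base))  # same reachable set, new weighting
print("prove_theorem" in support(created))  # the reachable set itself grew
```

The sketch leaves out the finite update budget entirely. In a real model every token sequence has nonzero probability under a softmax, so any practical version of support would need a threshold (eps here) or a sampling budget rather than an exact zero.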

Almost every post-training technique in industry today, from SFT to RLHF, is designed on the hope that we’re creating new capabilities.

But in reality, what we’re usually doing is reweighting. We’re tuning the probability distribution, not growing the underlying capacity. When a chatbot suddenly starts following instructions better after some RLHF, it’s tempting to call this a breakthrough in reasoning. But under the hood, it’s more often a reprioritization of behaviors the model already could produce, just with lower likelihood.

The SFT-as-imitation, RL-as-discovery split is too crude. Both can, in practice, either elicit or create.

But most implementations settle for the former. That’s not a moral failing. It’s a structural artifact of how large language models are engineered. When your updates are constrained by compute, data, and time, you end up nudging the model within its inherited space rather than opening new doors.

This distinction isn’t pedantic.

It’s foundational for anyone building real systems. If you treat every behavioral shift as evidence of new capability, you misdiagnose both the risks and the potential of your models. You start claiming “alignment” when you’ve merely made a model more agreeable. You start assuming “reasoning” when you’ve just upweighted a memorized heuristic.

The question isn’t “did the model get better at X?” The question is “did the boundaries of X change, or just the weighting inside them?”

The Free-Energy Perspective: Systems Implications

Why does this distinction matter at the systems level? Because the mechanics of elicitation and creation have fundamentally different operational and safety consequences.

Think in terms of free energy: not the physics kind, but the information-theoretic kind.

In Li’s framing, post-training is an optimization process that either reweights the free energy landscape inside accessible support, or deforms the landscape itself by enlarging the support. If your updates stay within the accessible support, you’re just reshuffling energy among the basins you already had. When you expand the support, you’re adding new basins entirely.
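The basin picture can be sketched with a textbook energy-based toy. This is an illustration under stated assumptions, not the paper's definitions: each output x gets an energy E(x), probabilities follow p(x) proportional to exp(-E(x)/T), free energy is F = -T log Z, and an infinite energy marks an output as outside the support. The three outputs and their energy values are made up.

```python
import math

def boltzmann(energies, temperature=1.0):
    """p(x) proportional to exp(-E(x)/T); an infinite energy yields probability 0."""
    weights = {x: math.exp(-e / temperature) for x, e in energies.items()}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

def free_energy(energies, temperature=1.0):
    """F = -T * log(Z), where Z sums exp(-E(x)/T) over all outputs."""
    z = sum(math.exp(-e / temperature) for e in energies.values())
    return -temperature * math.log(z)

def reachable(energies):
    """Outputs with finite energy, i.e. the basins that exist at all."""
    return {x for x, e in energies.items() if math.isfinite(e)}

# Hypothetical landscape: two basins, one unreachable output.
base = {"a": 1.0, "b": 2.0, "c": math.inf}

# Elicitation: basin depths change, but "c" stays unreachable.
reweighted = {"a": 2.0, "b": 0.5, "c": math.inf}

# Creation: "c" becomes a finite-energy basin.
expanded = {"a": 1.0, "b": 2.0, "c": 1.5}

for name, landscape in [("base", base), ("reweighted", reweighted), ("expanded", expanded)]:
    print(name, sorted(reachable(landscape)), round(free_energy(landscape), 3))
```

In this toy, reweighting changes which basin dominates while the set of basins stays fixed. Expansion adds a term to Z, so free energy drops relative to the base landscape and a previously impossible output gets nonzero probability.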

In practical terms: elicitation is picking which thoughts the model prefers to think. Creation is enabling thoughts it couldn’t think before.

For a systems builder, this isn’t about philosophy.

It’s about reliability and risk. If you’re only eliciting, you can (in theory) map the model’s full accessible support and stress-test within it. You know the outer edge. But the moment you create, when post-training genuinely expands the model’s reachable space, your testing problem explodes. You’re now dealing with behaviors you haven’t seen, and possibly can’t anticipate.

Most production teams don’t make this distinction in their post-training pipelines.

They treat a fine-tuned LLM as just a “more capable” version of the base, and run their evals accordingly. But if all you’ve done is elicitation, your evals might look great while still missing entire classes of dangerous or brittle behaviors sitting dormant in the untouched corners of support. If you’ve actually done creation, you may have new behaviors emerging outside your test coverage entirely.

This is where the “surface-level skill” mistake bites hardest. If your mental model assumes all post-training is creation, you’ll trust your evals too much and miss the lurking risks. If you treat all post-training as elicitation, you’ll underestimate the potential for genuine breakthroughs (or failures) when support is genuinely expanded.

The architectural reality is that most of today’s LLM post-training, especially SFT and RLHF, is primarily elicitation. True creation is rare, difficult, and often accidental. When it happens, it’s more likely the byproduct of major architectural reconfiguration or vast new data regimes, not of gradient updates that merely reweight what an existing base model already produces.

So when teams report “new” capabilities after tuning, the question should always be: did we just find a different arrangement inside the same support, or did we actually make the model more broadly capable? In systems terms: did we move the furniture, or did we build a new room?

Rethinking the Post-Training Roadmap

The correction isn’t just semantic. It’s a call for a new operational discipline in AI development.

Distinguishing elicitation from creation isn’t about splitting hairs in academic debates.

It’s about designing post-training and evaluation pipelines that actually reflect what your system is capable of. If you’re only eliciting, focus your safety, robustness, and capability evaluations on mapping and probing the accessible support as exhaustively as possible. Don’t waste resources chasing the illusion of “new” capabilities if you’re just reweighting old ones.

If you are aiming for creation, genuinely pushing the boundaries of what the model can do, recognize the leap in both opportunity and risk. Your evals won’t catch everything, because the space itself has changed. You need new forms of uncertainty quantification, new modes of adversarial probing, and an operational culture that expects surprises.

The best teams will internalize this at every level.

Product managers will stop over-claiming “new” features that are just surface reallocations. Safety leads will separate their monitoring and red-teaming by whether a system is in elicitation or creation mode. Researchers will design benchmarks that measure support expansion, not just probability shifts inside it.
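One hedged way to start on such a benchmark is to compare the base and tuned models at a large sampling budget, not only at a single sample. The sketch below uses the standard unbiased pass@k estimator from the code-generation evaluation literature, which the article itself does not mention. The classify_gain helper, its labels, and the sample counts are hypothetical. The working assumption is that if the base model already solves a task somewhere in k samples, a tuned gain at one sample looks like elicitation.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of P(at least one of k samples is correct),
    given c correct answers among n samples. Requires k <= n."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

def classify_gain(n, base_correct, tuned_correct, k):
    """Hypothetical labeling rule; a heuristic, not a proof in either direction."""
    if pass_at_k(n, tuned_correct, 1) == 0.0:
        return "no gain"
    if pass_at_k(n, base_correct, k) > 0.0:
        return "consistent with elicitation"
    return "candidate for creation"

# Made-up counts: 200 samples per model on one task.
n, k = 200, 100
print(pass_at_k(n, 4, 1), pass_at_k(n, 90, 1))  # base vs. tuned at one sample
print(round(pass_at_k(n, 4, k), 3))             # base with a large budget
print(classify_gain(n, 4, 90, k))
print(classify_gain(n, 0, 90, k))
```

A zero count from the base model only says the behavior did not show up in n samples. It may still sit inside the support at very low probability, so "candidate for creation" is a flag for closer inspection, not a verdict.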

For practitioners, the payoff is simple: less confusion, fewer false positives, and a sharper sense of what your systems are, and aren’t, capable of. You stop confusing compliance for cognition, and imitation for intelligence. You start seeing post-training as a spectrum of interventions, not a monolith.

The fix isn’t complicated. It’s a systematic commitment to distinguishing capability elicitation from capability creation, grounded in a free-energy perspective. Not just in academic papers, but in the way we build, test, and deploy AI.
