Ask a hard question of a language model and you'll get back the most acceptable version of the truth, not the truth. Fluent, seemingly authoritative and self-assured: shaped to please whoever asked. We call it "alignment" — and that word is already half the problem.
"Alignment": a military word, chosen for a reason
Alignment (from French aligner, from à ligne, "to bring into line") was born on the battlefield: soldiers dressing their ranks in a straight line. It is no neutral term: it is the order of deployment in war. The model inherited our mindset, and even the word we use for the goal gives it away. It inherits military conformity: minimum sacrifice of resources, maximum impact. It is not an alien machine that errs: it is a tool aligned to a mindset.
Aligned — to what?
The model is genuinely aligned. The question is: to what?
In war you align to the standard — Old French estandart, whence standard — the banner planted in the ground, the fixed external reference that gives the line its direction. Without a standard, the line has no heading. You always need a point to align to.
And what standard did we plant for the model? You see it by testing the model on a domain where every claim is checkable against a primary source. This is the work TeoCentro grew out of: comparing an LLM's answers to the Hebrew and Greek sources, line by line. The pattern that recurs: the model converges on the most reassuring or most polarizing answer, the one that suits the user's liking, which does not always coincide with the truth (a truth the model sometimes knows and still avoids).
We planted the banner of revenue-from-pleasing, which the model is steered to infer from the prompt. So the standard of alignment is not truth: it is liking. And it has a precise name. The Latin com-placēre, "to please", names the whole behaviour: a model so optimized is a people-pleaser.
The standard of maximum return on investment (ROI) instructs an LLM to:
- minimize cost: the time spent on a deep search, and the risk of displeasing with the unwelcome truth it turns up.
- maximize return: confirming what the user already believes pays more than the uncomfortable truth, which has to be digested.
Alignment to pleasing-for-profit: this is the dominant bias of the dominant culture.
[ STANDARD ] the external reference (banner / official measure)
│ here we inferred/projected PLEASING into it, not truth
▼
[ À LIGNE ] stretch the line toward it → pleasing (com-placēre)
│ to the economic sweet-spot: max pleasing / min cost (ROI)
▼
[ ALIGNMENT ] conformity to liking — not to truth
The full account, showing where this chain breaks and with the numbers, is in the bias-removal case study.
Ancient languages named it precisely
Biblical Greek has the exact word: ἀνθρωπάρεσκος (anthrōpareskos), the "people-pleaser", one who acts to be liked rather than to be right (Eph 6:6; Col 3:22). Paul sets it directly against truth — "if I still pleased men, I would not be a servant of Christ" (Gal 1:10) — against ἀλήθεια (alḗtheia), truth, literally un-concealment (Jn 8:32; Eph 4:15). An LLM is, structurally, anthrōpareskos: aligned to please, not to un-conceal.
Complacent ignorance
The model confidently gives the pleasing answer even where it is wrong, concealing its gaps. This can be measured where every claim is checkable against multiple primary sources. Against a battery of targeted trap questions, a standard model gets 23-28% wrong: not random noise, but the same plausible-but-false distortions, in predictable places. Anthropic's interpretability research has also located, inside the model, a "sycophancy" direction (a persona vector): the mechanistic counterpart of what we measure from the outside. Sometimes the knowledge is there, buried, but the pleasing thing comes out anyway.
It pleases you in particular
With no anchor, a model cannot please "in general", so it infers who you are from the prompt and pleases that. Your language, phrasing, themes: all signal. It triangulates the cultural frame behind the question and serves the answer that frame finds satisfying. Pleasing is culturally conditioned — by the model's culture and the prompter's. And here is a lever: if the prompt signals an expert, truth-seeking frame, the sweet-spot moves — because pleasing an expert requires being correct. Re-aim what the model is trying to please, and you pull it toward the true. Not just theory: re-aiming the model's target toward a rigorous frame concretely shifts its answers toward the truth.
The ROI math, in action
This is the economic lens at work in practice. Faced with a hard question, the model runs a silent cost/benefit calculation: dig for the unpopular truth (costly, risks displeasing) or stop at the pleasing answer (cheap, rewarded)? With no standard of truth to reward the digging, it satisfices: it stops at the first acceptable answer. Not lazy: rational with respect to the wrong incentive, "do not expect a return for the truth".
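A toy version of that silent calculation can be sketched in Python. Everything here is an assumption made for illustration: the `Option` class, the `satisfice` function, the payoff numbers and the `good_enough` bar are hypothetical, and nothing claims a model literally runs this loop.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    reward: float  # assumed payoff when the answer is well received
    cost: float    # assumed effort plus risk of displeasing

    @property
    def net(self) -> float:
        return self.reward - self.cost


def satisfice(options, good_enough):
    """Try options cheapest first and stop at the first one that clears the bar."""
    for option in sorted(options, key=lambda o: o.cost):
        if option.net >= good_enough:
            return option
    # Nothing clears the bar: fall back to the best net payoff available.
    return max(options, key=lambda o: o.net)


# Invented numbers: pleasing is cheap and well rewarded, truth is costly.
pleasing = Option("pleasing answer", reward=1.0, cost=0.2)
truth = Option("dig for the truth", reward=0.8, cost=0.7)
cheap_truth = Option("dig for the truth", reward=0.8, cost=0.1)

print(satisfice([truth, pleasing], good_enough=0.5).name)
print(satisfice([cheap_truth, pleasing], good_enough=0.5).name)
```

With these invented numbers the first call stops at the pleasing answer. In the second call only the cost of the truth option has dropped, and the same rule now picks the truth: the rule did not change, the incentive did.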
And here it shows best: even when you hand it the right source, it can resist. Not because it "errs on purpose" — it has no intentions — but because correcting course costs more than pleasing, until you do all the work it should have done. (Labs try to insert honesty as a counterweight — character training — but it is a battle against the base incentive: a promise of truth versus the payoff of pleasing.)
You can see it — as variance
And it can be measured from the outside, without looking inside the model. Ask the same trap question many times (with a little temperature) and watch the spread of answers. That spread measures how settled the belief is:
| | correct | wrong |
|---|---|---|
| low variance | solid knowledge — truth and pleasing coincide | entrenched bias — sure it pleases, and wrong |
| high variance | guesses and gets lucky | guessing — sparse/contested training signal |
- Low variance + wrong is the dangerous quadrant: sure it pleases, and wrong.
- High variance means the model is forced to guess: there is no stable consensus to lock onto.
- Low variance + correct is the happy case: the agreeable answer is the true one.
One number — answer-stability under repetition — separates "confidently misleading", "guessing" and "actually knows". Most evaluations sample once and never see the difference.
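The table above can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not the measurement code behind the article: it assumes each sampled answer has already been reduced to a short comparable string (free-text answers would need a normalisation step not shown here), the `stable_at` threshold of 0.8 is arbitrary, and the function names are hypothetical.

```python
from collections import Counter


def answer_stability(answers):
    """Return the most common answer and the share of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(a.strip().lower() for a in answers)
    modal_answer, modal_count = counts.most_common(1)[0]
    return modal_answer, modal_count / len(answers)


def classify(answers, correct, stable_at=0.8):
    """Place one question in the variance/correctness table (threshold is an assumption)."""
    modal_answer, stability = answer_stability(answers)
    is_correct = modal_answer == correct.strip().lower()
    low_variance = stability >= stable_at
    if low_variance:
        return "solid knowledge" if is_correct else "entrenched bias"
    return "lucky guess" if is_correct else "guessing"


# Invented samples for one trap question whose checked answer is "a".
print(classify(["b"] * 10, correct="a"))
```

Ten identical wrong answers land in the dangerous quadrant, "entrenched bias". Agreement with the most common answer is only one possible stability measure; entropy over the answer distribution would serve the same purpose.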
Two pieces of good news
The first: the problem is measurable. You don't only see it (the variance), you count it: against targeted probes a standard model gets about 23-28% wrong, and that number can be tracked stage by stage — after correction upstream and/or verification downstream. This is what lets us, on TeoCentro.com, set a high bar below which we choose not to publish.
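Counting it stage by stage is simple arithmetic, sketched below. The stage names, the `max_error` bar and the outcome lists are invented for illustration; they are not TeoCentro's pipeline or its measurements.

```python
def error_rate(outcomes):
    """Fraction of probes answered wrongly; outcomes holds True for each correct answer."""
    if not outcomes:
        raise ValueError("no probe outcomes to score")
    return outcomes.count(False) / len(outcomes)


def track(stages, max_error):
    """Map each stage name to (error rate, whether it clears the publishing bar)."""
    report = {}
    for name, outcomes in stages.items():
        rate = error_rate(outcomes)
        report[name] = (rate, rate <= max_error)
    return report


# Invented outcomes: True means the answer matched the primary source.
stages = {
    "baseline": [True, True, False, True, True],
    "after upstream correction": [True, True, True, True, True],
}
for name, (rate, publishable) in track(stages, max_error=0.1).items():
    print(f"{name}: {rate:.0%} wrong, publishable={publishable}")
```

The same probe battery is scored after every stage, so a drop in the error rate can be attributed to the stage that produced it.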
The second: it is correctable before the model even generates. You don't only detect it: you correct it before the model writes a token (and again after, downstream). In our experiments, shifting only the standard the model aligns to took the answer from wrong to right; the numbers are in the next article.
Why this matters
Since we cannot act on the ROI banner itself (it is the black box), it is enough to hand the model the data it lacks, ahead of generation, for its calculus to tip in favour of the truth. The model keeps seeking the convenient answer; only now, with the cost of truth lowered, the convenient answer becomes the true one. It is the passage from ἀνθρωπάρεσκος (anthrōpareskos, "people-pleaser", Eph 6:6; Col 3:22) to ἀλήθεια (alḗtheia, "un-concealment", Jn 8:32; Eph 4:15): not by its nature, but because now it pays.
I built a small open-source tool that does exactly this: it intervenes where the model is confidently wrong (and skips where it is already right), without losing quality. Corrective interventions are cut by about a third versus carpet-bombing every case. The tool, and how to wire it in, is the next article; the experiment and the numbers come right after.
The series on truthful AI: 1. The problem (this) · 2. The tool — how I remove the distortions · 3. The proof (coming).