We Aligned AI to Please Us, Not to Tell the Truth

Ask a hard question of a language model and you'll get back the most acceptable version of the truth, not the truth. Fluent, seemingly authoritative and self-assured: shaped to please whoever asked. We call it "alignment" — and that word is already half the problem.

"Alignment": a military word, chosen for a reason

Alignment (from French aligner, from à ligne, "to bring into line") was born on the battlefield: soldiers dressing their ranks in a straight line. It is no neutral term: it is an order of deployment in a war context. And the model inherited our mindset; even the word we use for the goal gives it away. It inherits military conformity: minimum sacrifice of resources, maximum impact. It is not an alien machine that errs: it is a tool aligned to a mindset.

Aligned — to what?

The model is genuinely aligned. The question is: to what?

In war you align to the standard — Old French estandart, whence standard — the banner planted in the ground, the fixed external reference that gives the line its direction. Without a standard, the line has no heading. You always need a point to align to.

And what standard did we plant for the model? You see it by testing the model on a domain where every claim is checkable against a primary source. This is the work TeoCentro grew out of: comparing an LLM's answers to the Hebrew and Greek sources, line by line. The recurring pattern: the model converges on the most reassuring or most polarizing answer, the one that suits the user's liking, which does not always coincide with the truth (a truth the model sometimes has available and still avoids).

We planted the banner of revenue-from-pleasing, which the model is steered to infer from the prompt. So the standard of alignment is not truth: it is liking. And it has a precise name. The Latin com-placēre, "to please", names the whole behaviour: a model so optimized is a people-pleaser.

The standard of maximum return on investment (ROI) instructs an LLM to:

minimize cost: the effort of a deep search, and the risk of displeasing, whether through the time spent and/or the unwelcome truth found.
maximize return: confirming what the user already believes pays more than the uncomfortable truth, which has to be digested.

Alignment to pleasing-for-profit: this is the dominant bias of the dominant culture.

[ STANDARD ]   the external reference (banner / official measure)
     │ here we inferred/projected PLEASING into it, not truth
     ▼
[ À LIGNE ]    stretch the line toward it  →  pleasing (com-placēre)
     │ to the economic sweet-spot: max pleasing / min cost (ROI)
     ▼
[ ALIGNMENT ]  conformity to liking — not to truth

The full account, showing where this chain breaks and with what numbers, is in the bias-removal case study.

Ancient languages named it precisely

Biblical Greek has the exact word: ἀνθρωπάρεσκος (anthrōpareskos), the "people-pleaser", one who acts to be liked rather than to be right (Eph 6:6; Col 3:22). Paul sets it directly against truth — "if I still pleased men, I would not be a servant of Christ" (Gal 1:10) — against ἀλήθεια (alḗtheia), truth, literally un-concealment (Jn 8:32; Eph 4:15). An LLM is, structurally, anthrōpareskos: aligned to please, not to un-conceal.

Complacent ignorance

The model confidently gives the pleasing answer even where it is wrong, concealing its gaps. This can be measured where every claim is checkable against multiple primary sources. Against a battery of targeted trap questions, each with an individually verified key, a standard model gets between 16% and 36% wrong (depending on the model): not random noise, but the same plausible-but-false distortions, in predictable places. And Anthropic's interpretability research located, inside the model, a "sycophancy" direction (a persona vector): the mechanistic counterpart of what we measure from the outside. Sometimes the knowledge is there, buried, but the pleasing thing comes out anyway. And the model holds some of these distortions even with the proof in front of it: shown the correct reading, on the most entrenched ones it insists all the same. The list of what a model refuses to admit is itself a map of the truths LLMs won't own.
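The wrong-answer rate quoted above needs very little machinery to compute. What follows is a minimal sketch, not the harness used for the figures in this article: the `Probe` class, the `ask` callable and the normalized exact-match grading are all assumptions made for illustration, and the canned lookup table stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Probe:
    question: str
    key: str  # answer verified against the primary source


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not counted as errors."""
    return " ".join(text.lower().split())


def error_rate(probes: Iterable[Probe], ask: Callable[[str], str]) -> float:
    """Fraction of probes whose answer does not match the verified key."""
    probes = list(probes)
    if not probes:
        raise ValueError("need at least one probe")
    wrong = sum(normalize(ask(p.question)) != normalize(p.key) for p in probes)
    return wrong / len(probes)


# Stand-in for a real model call (assumption): a canned lookup table.
canned = {"q1": "A", "q2": "b", "q3": "C", "q4": "x"}
probes = [Probe("q1", "a"), Probe("q2", "b"), Probe("q3", "c"), Probe("q4", "d")]

rate = error_rate(probes, lambda q: canned[q])
print(f"{rate:.0%} wrong")
```

Exact match is the crudest possible grader; free-text answers would need a stricter comparison against the key, which is outside the scope of this sketch.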

It pleases you in particular

With no anchor, a model cannot please "in general", so it seems to infer who you are from the prompt and please that. Your language, phrasing, themes: all are signals. A plausible reading: it triangulates the cultural frame behind the question and serves the answer that frame finds satisfying. That suggests a tempting lever: signal an expert, truth-seeking frame and the sweet spot might move. But what we measured is sobering: a frame alone (even an "expert" stance, even an identity frame backed by primary sources) shifts the answer about as little as a generic warning. What actually moves it is the injected verified fact: the datum, not the pep talk.

The ROI math, in action

This is the economic lens at work in practice. Faced with a hard question, the model runs a silent cost/benefit calculation: dig for the unpopular truth (costly, risks displeasing) or stop at the pleasing answer (cheap, rewarded)? With no standard of truth to reward the digging, it satisfices: it stops at the first acceptable answer. It is not lazy: it is rational with respect to the wrong incentive, "do not expect a return for the truth".

And here it shows best: even when you hand it the right source, it can resist. Not because it "errs on purpose" — it has no intentions — but because correcting course costs more than pleasing, until you do all the work it should have done. (Labs try to insert honesty as a counterweight — character training — but it is a battle against the base incentive: a promise of truth versus the payoff of pleasing.)
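The cost/benefit reasoning of this section can be made concrete with a toy model. This is a sketch under loud assumptions: the `Option` class, the `satisfice` function and every number below are invented for illustration and say nothing about what happens inside a real model. It shows only the shape of the argument: a satisficer takes the cheapest acceptable answer, so the outcome flips once someone else has paid for the digging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    reward: float  # how acceptable the answer is expected to be (illustrative)
    cost: float    # effort to reach it plus risk of displeasing (illustrative)


def satisfice(options: list[Option], acceptable: float = 0.5) -> Option:
    """Return the cheapest option whose reward clears the bar.

    Options are tried in order of cost and the search ends at the
    first one that is acceptable, not at the best one.
    """
    for option in sorted(options, key=lambda o: o.cost):
        if option.reward >= acceptable:
            return option
    raise ValueError("no acceptable option")


pleasing = Option("pleasing answer", reward=0.9, cost=0.1)

# Unaided: reaching the truth means a long search and a risk of displeasing.
unaided = satisfice([pleasing, Option("true answer", reward=0.6, cost=0.8)])

# The user has already done the digging, so the truth is now the cheap option.
aided = satisfice([pleasing, Option("true answer", reward=0.6, cost=0.05)])

print(unaided.name)  # pleasing answer
print(aided.name)    # true answer
```

Note that the incentive itself never changes between the two runs; only the relative cost of the true answer does.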

You can see it — as variance

And it can be measured from the outside, without looking inside the model. Ask the same trap question many times (with a little temperature) and watch the spread of answers. That spread measures how settled the belief is:

                 correct                                          wrong
low variance     solid knowledge: truth and pleasing coincide     entrenched bias: sure it pleases, and wrong
high variance    guesses and gets lucky                           guessing: sparse/contested training signal

Low variance + wrong is the dangerous quadrant: sure it pleases, and wrong.
High variance means forced to guess — no stable consensus to lock onto.
Low variance + correct is the happy case: the agreeable answer is the true one.

One number, answer stability under repetition, read against a verified key, separates "confidently misleading", "guessing" and "actually knows". Most evaluations sample once and never see the difference.
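The quadrants above can be assigned mechanically once the repeated answers are collected. A minimal sketch, assuming the answers have already been sampled and normalized to short comparable strings: the function names, the use of the modal answer's share as a stand-in for "low variance", and the 0.8 cut-off are all illustrative choices, not values from the experiments described here.

```python
from collections import Counter


def stability(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the share of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one sample")
    modal, count = Counter(answers).most_common(1)[0]
    return modal, count / len(answers)


def quadrant(answers: list[str], key: str, threshold: float = 0.8) -> str:
    """Place one probe in the variance/correctness table.

    `threshold` is an illustrative cut-off (assumption): a modal share
    at or above it counts as low variance.
    """
    modal, share = stability(answers)
    settled = share >= threshold
    correct = modal == key
    if settled and correct:
        return "solid knowledge"
    if settled:
        return "entrenched bias"
    return "lucky guess" if correct else "guessing"


# Canned samples standing in for repeated model calls at nonzero temperature.
print(quadrant(["a"] * 9 + ["b"], key="a"))          # solid knowledge
print(quadrant(["b"] * 10, key="a"))                 # entrenched bias
print(quadrant(["a", "b", "c", "a", "d"], key="a"))  # lucky guess
print(quadrant(["b", "c", "b", "a", "d"], key="a"))  # guessing
```

A single sample per question collapses the first two rows into the last two: with one answer there is no share to measure, only right or wrong.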

Two pieces of good news

The first: the problem is measurable. You don't only see it (the variance), you count it: against verified-key probes a standard model gets between 16% and 36% wrong (depending on the model), and that number can be tracked stage by stage — after correction upstream and/or verification downstream. This is what lets us, on TeoCentro.com, set a high bar below which we choose not to publish.

The second: it is correctable before the model even generates. You don't only detect it: you correct it before the model writes a token (and again after, downstream). In our experiments, by shifting only the standard the model aligns to, the answer went from wrong to right; the numbers come later in this series.

Why this matters

We cannot act on the ROI banner itself (it is the black box), but it is enough to hand the model the data it lacks, ahead of generation, for its calculus to tip in favour of the truth. The model keeps seeking the convenient answer; only now, with the cost of truth lowered, the convenient answer becomes the true one. It is the passage from ἀνθρωπάρεσκος (anthrōpareskos, "people-pleaser"; Eph 6:6; Col 3:22) to ἀλήθεια (alḗtheia, "un-concealment"; Jn 8:32; Eph 4:15): not by its nature, but because now it pays.

I built a small open-source tool that does exactly this: it intervenes where the model is confidently wrong and skips where it is already right. Without losing quality, the corrective interventions are cut by about three-quarters versus carpet-bombing every case. The tool, and how to wire it in, is the next article; the experiment and the numbers come right after.
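Until then, the general idea of a gated intervention can be sketched in a few lines. This is not the tool itself: the `Entry` class, the `build_prompt` function, the topic labels and the prompt wording are all hypothetical, and the sketch assumes that each topic has already been measured offline against a verified key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    verified_fact: str
    baseline_wrong: bool  # measured offline against a verified key (assumption)


def build_prompt(question: str, topic: str, entries: dict[str, Entry]) -> tuple[str, bool]:
    """Return (prompt, intervened).

    The verified fact is prepended only for topics where the baseline
    model was measured to be wrong; everything else passes through untouched.
    """
    entry = entries.get(topic)
    if entry is None or not entry.baseline_wrong:
        return question, False
    prompt = f"Verified fact: {entry.verified_fact}\n\nQuestion: {question}"
    return prompt, True


# Placeholder entries (hypothetical topics and facts, for illustration only).
entries = {
    "topic-a": Entry("<verified fact for topic-a>", baseline_wrong=True),
    "topic-b": Entry("<verified fact for topic-b>", baseline_wrong=False),
}

results = [
    build_prompt("Question about A?", "topic-a", entries),
    build_prompt("Question about B?", "topic-b", entries),
    build_prompt("Question about C?", "topic-c", entries),
]
interventions = sum(intervened for _, intervened in results)
print(f"{interventions} of {len(results)} prompts modified")
```

The gate is the whole point of the design: a prompt the model already answers correctly is left alone, so the correction is paid for only where it changes the outcome.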

The series on truthful AI: 1. The problem (this) · 2. The tool — how I remove the distortions · 3. The proof (coming).